SLIDE 1

Structural Comparison: Application to the study of Protein Binding Patches

N. Malod-Dognin (1)

(1) Department of Computing, Imperial College London, UK

AlgoSB : Algorithms in Structural Bio-informatics

SLIDE 2

Outline

1) Function/Complex/Binding Patches
2) Mathematical Models
3) Divide and Conquer Strategies
4) Results

Structural Comparison of Binding Patches 2/36

SLIDE 3

Proteins perform their functions through binding

FIGURE: 2JEL, an antibody/antigen complex.

Related biological problems:

  • Measuring the specificity and affinity of the interaction
  • Predicting the structure of a complex from the unbound structures (docking)

Answering these questions requires insight into the surface atoms accounting for the interaction.

SLIDE 4

Protein Solvent Accessible Surface (SAS)

Surface as seen by a water probe molecule (Lee & Richards, 1971):

  • hydrogen atoms are ignored;
  • each atom's van der Waals (VdW) radius is increased by 1.4 Å (the water probe radius);
  • atoms participating in the SAS are found using the so-called α-shape.
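
To make the definition concrete, here is a minimal sketch that flags SAS atoms with the Shrake-Rupley sphere-sampling approximation. This is a swapped-in technique, not the exact α-shape computation used in the slides; only the 1.4 Å probe radius comes from the slide, while the function names, the number of sample points and the toy coordinates are assumptions.

```python
import math

PROBE = 1.4  # water probe radius in angstroms (from the slide)


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts


def sas_atoms(centers, vdw_radii, n_points=200):
    """Indices of atoms having at least one exposed point on their expanded sphere.

    Assumption: Shrake-Rupley approximation instead of the exact alpha-shape.
    A sample point on atom i's sphere of radius r_i + PROBE is exposed if it
    lies outside the expanded sphere of every other atom.
    """
    radii = [r + PROBE for r in vdw_radii]
    unit = sphere_points(n_points)
    exposed = []
    for i, (c, ri) in enumerate(zip(centers, radii)):
        for u in unit:
            p = tuple(c[d] + ri * u[d] for d in range(3))
            buried = any(
                j != i and math.dist(p, centers[j]) < radii[j]
                for j in range(len(centers))
            )
            if not buried:
                exposed.append(i)
                break
    return exposed


# Toy cluster (hypothetical): a central atom and six neighbours 2 Å away.
centers = [(0, 0, 0), (2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
radii = [1.7] * len(centers)
print(sas_atoms(centers, radii))
```

On this toy cluster only the six outer atoms should be reported: every sample point on the central atom's expanded sphere falls inside some neighbour's expanded sphere.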

SLIDE 5

Interface and Binding Patches

[Figure: an interface between two partners, with atoms b1 to b5 on one side and a1 to a6 on the other]

Interface: all atoms participating in the interaction, i.e., atoms whose SAS is intersected by the SAS of the other partner.

→ Exact computation using the Voronoï Interface Model (Cazals et al., 2006).

Patch: the interface atoms restricted to a single partner.
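
As a much cruder stand-in for the Voronoï Interface Model, the sketch below flags an atom as interface if its expanded sphere (VdW radius + 1.4 Å) overlaps an expanded sphere of the other partner, and then reports one patch per partner. Unlike the exact model, it does not check that the overlap actually lies on the SAS; the function name and the toy atoms are hypothetical.

```python
import math

PROBE = 1.4  # water probe radius in angstroms


def binding_patches(atoms_a, atoms_b):
    """Return (patch_a, patch_b): indices of interface atoms on each partner.

    atoms_a, atoms_b: lists of (center_xyz, vdw_radius).
    Simplification (assumption): two atoms interact when their expanded
    spheres overlap; burial by neighbouring atoms is ignored.
    """
    def overlaps(a, b):
        (ca, ra), (cb, rb) = a, b
        return math.dist(ca, cb) < ra + rb + 2 * PROBE

    patch_a = [i for i, a in enumerate(atoms_a) if any(overlaps(a, b) for b in atoms_b)]
    patch_b = [j for j, b in enumerate(atoms_b) if any(overlaps(a, b) for a in atoms_a)]
    return patch_a, patch_b


# Hypothetical partners: only the first atom of each side is close enough.
A = [((0, 0, 0), 1.7), ((-10, 0, 0), 1.7)]
B = [((5, 0, 0), 1.7), ((20, 0, 0), 1.7)]
print(binding_patches(A, B))
```

Each partner's list is its binding patch; the union of the two lists is the (approximate) interface.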

SLIDE 6

The Atom Shelling Tree Model

Binding patch as a pattern of nested shells (Malod-Dognin, Bansal and Cazals, 2012)

[Figure: an example binding patch and its Atom Shelling Tree]

  • Each face of the binding patch (a spherical polygon) is associated with a Shelling Order (SO), i.e., its distance, in number of faces, to the boundary of the patch.
  • Connected components of faces having the same SO form shells.
  • The inclusion relation between shells is represented by the Atom Shelling Tree.
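
The three steps above can be sketched as follows, assuming the patch is available as a face-adjacency graph with its boundary faces marked. The shelling order is a multi-source breadth-first search from the boundary, and shells are connected components of faces with equal SO. The input representation, the convention that boundary faces have SO 1, and the rule that a shell's parent is an adjacent shell with SO one lower are assumptions, not the published construction.

```python
from collections import deque

# Assumption: a patch is a face-adjacency dict {face: [neighbouring faces]}
# plus the list of boundary faces.


def shelling_order(adj, boundary):
    """Multi-source BFS: boundary faces get SO 1, each step inwards adds 1."""
    so = {f: 1 for f in boundary}
    queue = deque(boundary)
    while queue:
        f = queue.popleft()
        for g in adj[f]:
            if g not in so:
                so[g] = so[f] + 1
                queue.append(g)
    return so


def shells(adj, so):
    """Shells = connected components of faces sharing the same SO."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            f = stack.pop()
            comp.append(f)
            for g in adj[f]:
                if g not in seen and so[g] == so[f]:
                    seen.add(g)
                    stack.append(g)
        comps.append(sorted(comp))
    return comps


def shelling_tree(adj, so, comps):
    """Hypothetical parent rule: an adjacent shell whose SO is one lower;
    outermost shells are roots (parent None)."""
    shell_of = {f: k for k, comp in enumerate(comps) for f in comp}
    parent = {}
    for k, comp in enumerate(comps):
        parent[k] = None
        for f in comp:
            for g in adj[f]:
                if so[g] == so[f] - 1:
                    parent[k] = shell_of[g]
    return parent


# Toy patch: a 5x5 grid of faces with 4-adjacency; the outer ring is the boundary.
n = 5
adj = {(r, c): [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < n and 0 <= c + dc < n]
       for r in range(n) for c in range(n)}
boundary = [f for f in adj if 0 in f or n - 1 in f]
so = shelling_order(adj, boundary)
comps = shells(adj, so)
parent = shelling_tree(adj, so, comps)
print([len(c) for c in comps], parent)
```

The toy grid yields three nested shells (16, 8 and 1 faces) forming a chain, which is the simplest possible Atom Shelling Tree.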

SLIDE 7

Structural Comparison of Patches

Related questions:

  • Functional prediction: do similar receptors bind similar ligands?
  • Affinity prediction: do similar patches have similar binding affinities?
  • Structural classification of interfaces?
  • Do patches change morphology during docking? This hints towards rigid or flexible docking.

SLIDE 8

Pitfalls

Algorithms for comparing protein structures cannot be used for comparing patches: unlike residues along a chain, atoms on the protein surface have no total ordering.

Hardness of geometric comparison

Example: finding the largest quasi-isometric subset of atoms (Brint & Willett, 1987)

↔ maximum clique problem, which is:

  • NP-complete (Karp, 1972)
  • hard to approximate (Feige et al., 1991)
  • fixed-parameter intractable (Chen et al., 2006)

SLIDE 9

Low Resolution Level Methods

  • Amino-acid level comparison (Scoppi: Winter et al., 2006)
  • Functional-group level comparison (Probis: Konc & Janezic, 2010)

SLIDE 10

Outline

1) Function/Complex/Binding Patches
2) Mathematical Models
3) Divide and Conquer Strategies
4) Results

SLIDE 11

Contact Map Overlap Maximization Without Order

Nathalie (El-Kebir et al., 2011), a cost-split model for PPI alignment

[Figure: left, the classical representation of two contact maps CM1 and CM2 (residues 1 to 4 each); right, the corresponding alignment graph, whose vertex i.k pairs residue i of CM1 with residue k of CM2]

Variables:

  • Each vertex i.k is represented by a Boolean variable $x_{ik}$.
  • Each edge (i.k, j.l) is represented by two arcs, i.e., two Boolean variables: $y_{ikjl}$ (from i.k to j.l, i < j) and $z_{jlik}$ (from j.l to i.k, i < j).
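
A minimal sketch of how the alignment graph could be built, assuming each contact map is a set of unordered residue pairs: vertex i.k pairs residue i of CM1 with residue k of CM2, and an edge joins i.k and j.l (i < j) when i-j is a contact in CM1 and k-l is a contact in CM2. Since surface atoms are unordered, k and l may come in either order; requiring k ≠ l (one residue per column) and the function name are assumptions.

```python
from itertools import product


def alignment_graph(n1, cm1, n2, cm2):
    """Vertices (i, k) and edges ((i, k), (j, l)) with i < j.

    cm1, cm2: sets of frozenset({a, b}) contacts; residues are numbered from 1.
    Each edge stands for two arcs: y_{ikjl} (i.k -> j.l) and z_{jlik} (j.l -> i.k).
    Assumption: k != l, and the order of k and l is free (unordered case).
    """
    vertices = list(product(range(1, n1 + 1), range(1, n2 + 1)))
    edges = [
        ((i, k), (j, l))
        for (i, k), (j, l) in product(vertices, vertices)
        if i < j and k != l
        and frozenset((i, j)) in cm1 and frozenset((k, l)) in cm2
    ]
    return vertices, edges


c = lambda a, b: frozenset((a, b))
V, E = alignment_graph(3, {c(1, 2), c(2, 3)}, 3, {c(1, 3)})
print(len(V), E)
```

Two contacts in CM1 times the two orientations of the single contact in CM2 give four edges, hence eight arc variables.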

SLIDE 12

Integer Programming Formulation

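[Figure: alignment graph between CM1 and CM2, showing a y-arc and a z-arc]

$$
\begin{aligned}
\max \quad & \sum_{i,k,j,l} \tfrac{1}{2}\, y_{ikjl} + \sum_{i,k,j,l} \tfrac{1}{2}\, z_{jlik} \\
\text{subject to} \quad
& \textstyle\sum_{k} x_{ik} \le 1 && \forall i && \text{(at most one vertex per row)} \\
& \textstyle\sum_{i} x_{ik} \le 1 && \forall k && \text{(at most one vertex per column)} \\
& \textstyle\sum_{l} y_{ikjl} \le x_{ik} && \forall i,k,j,\ i<j && \text{(bind $y$-arcs to their tail vertex)} \\
& \textstyle\sum_{k} z_{jlik} \le x_{jl} && \forall j,l,i,\ i<j && \text{(bind $z$-arcs to their tail vertex)} \\
& y_{ikjl} = z_{jlik} && \forall i,k,j,l,\ i<j,\ k \neq l && \text{(edge equality)} \\
& x_{ik},\ y_{ikjl},\ z_{jlik} \in \{0,1\}
\end{aligned}
$$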
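
For very small contact maps the objective can be checked by brute force. Because $y_{ikjl} = z_{jlik}$ and each arc is bound to its tail vertex, the objective counts the edges whose two endpoints are both selected, i.e., the contacts shared under a one-to-one residue mapping. The exhaustive enumeration below is only a sanity check under that reading, not a substitute for the Lagrangian approach; the function name is hypothetical.

```python
from itertools import combinations, permutations


def max_contact_overlap(n1, cm1, n2, cm2):
    """Exhaustive unordered contact map overlap, for tiny inputs only.

    cm1, cm2: sets of frozenset({a, b}) contacts on residues 1..n1 and 1..n2.
    Tries every one-to-one (partial) mapping of CM1 residues onto CM2 residues
    and counts the contacts of CM1 that the mapping sends onto contacts of CM2.
    """
    best = 0
    for size in range(min(n1, n2) + 1):
        for rows in combinations(range(1, n1 + 1), size):
            for cols in permutations(range(1, n2 + 1), size):
                m = dict(zip(rows, cols))
                shared = sum(
                    1 for contact in cm1
                    if all(r in m for r in contact)
                    and frozenset(m[r] for r in contact) in cm2
                )
                best = max(best, shared)
    return best


c = lambda a, b: frozenset((a, b))
path4 = {c(1, 2), c(2, 3), c(3, 4)}
triangle = {c(1, 2), c(2, 3), c(1, 3)}
print(max_contact_overlap(4, path4, 3, triangle))
```

The search grows factorially, which is exactly why the slides resort to relaxation on real patches.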

SLIDE 13

Lagrangian Relaxation Approach

Relaxing the edge-equality constraints $y_{ikjl} = z_{jlik}$ splits the problem into two levels:

  • Local problem: for each vertex i.k, find the optimal sum of outgoing $y_{ikjl}$ and $z_{jlik}$ arcs, with at most one head vertex per row and per column.
  • Global problem: find a maximum-cost set of vertices (the cost of a vertex being based on its outgoing arcs), with at most one vertex per row and per column.

[Figure: the local problem for one vertex of the alignment graph, and the corresponding matching between rows and columns]

Both are maximum-cost bipartite matching problems between rows and columns. The time complexity of solving the local and global problems is $O(n^4 \log n)$, versus $O(n^2)$ for the ordered case.
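
Both subproblems are maximum-weight bipartite matchings between rows and columns. The slides do not say which matching algorithm is used; as an assumption, the sketch below uses the Hungarian (Kuhn-Munkres) algorithm on non-negative weights. The same routine could serve a local problem (weights = profits of the arcs leaving one vertex) or the global one (weights = local optima); the function name is hypothetical.

```python
def max_weight_matching(w):
    """Maximum-weight bipartite matching between rows and columns.

    w: rectangular matrix (list of lists) of non-negative weights.
    Returns (total, pairs). Solved as an assignment problem with the
    Hungarian algorithm on negated weights; zero-weight pairs are dropped,
    so each row and each column is used at most once.
    """
    transposed = len(w) > len(w[0])
    if transposed:
        w = [list(col) for col in zip(*w)]
    n, m = len(w), len(w[0])  # now n <= m
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    pairs = [(p[j] - 1, j - 1) for j in range(1, m + 1)
             if p[j] and w[p[j] - 1][j - 1] > 0]
    total = sum(w[r][c] for r, c in pairs)
    if transposed:
        pairs = [(c, r) for r, c in pairs]
    return total, sorted(pairs)


print(max_weight_matching([[1, 2, 3], [2, 4, 6], [3, 6, 9]]))
```

Each call costs $O(n^2 m)$; solving one such matching per vertex of the alignment graph is what drives the overall cost above the ordered case.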

SLIDE 14

Outline

1) Function/Complex/Binding Patches
2) Mathematical Models
3) Divide and Conquer Strategies
4) Results

SLIDE 15

Divide and Conquer Strategies : Probis (Konc & Janezic, 2010)

Principle: use 10 Å spheres centered at each functional group.

SLIDE 16

Divide and Conquer Strategies : Compatch

Principle: use the Atom Shelling Tree to localize the matching:

  • local atom matchings between shells (using topology or geometry);
  • the global matching is reconstructed from the shell matching given by the tree edit distance.

[Figure: two example Atom Shelling Trees whose shells are matched]

SLIDE 17

Using Tree Edit Distance

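Using only topological information (patterns of nested shells)

[Figure: two example Atom Shelling Trees T1 and T2, with shells labelled by their number of atoms]

Edit operations and costs:

  • Insert / delete a shell $i$: $|i|$ (the number of atoms in shell $i$)
  • Edit shell $i$ into shell $j$: $\bigl|\,|i| - |j|\,\bigr|$

The optimal edit script is the one of minimum cost; its cost is denoted $\mathrm{TED}_t(T_1, T_2)$ and is computed by dynamic programming (Bille, 2005). Dissimilarity score:

$$
\mathrm{DIS}_t = \frac{\mathrm{TED}_t(T_1, T_2)}{|T_1| + |T_2|},
$$

i.e., the fraction of non-isotopologic atoms (here $|T|$ denotes the number of atoms in tree $T$).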
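
The slides compute $\mathrm{TED}_t$ by dynamic programming (Bille, 2005) without stating the variant. The sketch below assumes an ordered tree edit distance (children kept in a fixed order), written as the classic memoised recursion on forests with the costs listed above; the nested-tuple encoding and the function names are hypothetical. Because the children of an Atom Shelling Tree have no natural order, a faithful implementation may need an unordered or constrained variant instead.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (size, children), where size is the number
# of atoms in the shell and children is a tuple of trees; a forest is a tuple.


def atoms(forest):
    """Total number of atoms in a forest."""
    return sum(size + atoms(children) for size, children in forest)


@lru_cache(maxsize=None)
def ted(f, g):
    """Ordered forest edit distance (assumption: ordered variant) with costs
    insert/delete shell i = |i| and edit i into j = abs(|i| - |j|)."""
    if not f and not g:
        return 0
    if not g:
        size, kids = f[-1]
        return ted(f[:-1] + kids, g) + size  # delete rightmost root of f
    if not f:
        size, kids = g[-1]
        return ted(f, g[:-1] + kids) + size  # insert rightmost root of g
    (sv, kv), (sw, kw) = f[-1], g[-1]
    return min(
        ted(f[:-1] + kv, g) + sv,                          # delete v
        ted(f, g[:-1] + kw) + sw,                          # insert w
        ted(kv, kw) + ted(f[:-1], g[:-1]) + abs(sv - sw),  # edit v into w
    )


def dis_t(t1, t2):
    """DIS_t = TED_t(T1, T2) / (|T1| + |T2|), |T| = number of atoms in T."""
    return ted((t1,), (t2,)) / (atoms((t1,)) + atoms((t2,)))


T1 = (50, ((10, ()), (20, ())))
T2 = (40, ((10, ()), (20, ())))
print(ted((T1,), (T2,)), dis_t(T1, T2))
```

Here the cheapest script edits the root shell (50 atoms into 40), so 10 of the 150 atoms are counted as non-isotopologic.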

SLIDE 18

Local matching between two shells

Objective: find the largest matching between shells S1 and S2 such that, for any two pairs of matched atoms i ↔ k and j ↔ l, $|d_{ij} - d_{kl}| < 2$ Å.

[Figure: shells S1 (rows) and S2 (columns) with their inter-atomic distances, and the corresponding alignment graph]

This is a maximum clique problem, solved using Cliquer (Ostergard, 2002).
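
A minimal sketch of the local shell matching: vertices are atom pairs (i, k), edges join pairs with $|d_{ij} - d_{kl}| < \varepsilon$, and a maximum clique gives the largest quasi-isometric matching. The slides use Cliquer; the simple branch-and-bound below is a swapped-in illustration, and requiring i ≠ j and k ≠ l (a one-to-one matching) as well as the function name are assumptions.

```python
import math
from itertools import product


def largest_quasi_isometric_subset(s1, s2, eps=2.0):
    """Largest one-to-one matching between atom sets s1, s2 (lists of xyz)
    such that any two matched pairs i<->k, j<->l satisfy |d_ij - d_kl| < eps.
    Returns a list of (i, k) pairs."""
    verts = list(product(range(len(s1)), range(len(s2))))
    adj = {v: set() for v in verts}
    for a, b in product(verts, verts):
        (i, k), (j, l) = a, b
        if i != j and k != l and abs(math.dist(s1[i], s1[j]) - math.dist(s2[k], s2[l])) < eps:
            adj[a].add(b)

    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique[:]
        for v in list(candidates):
            if len(clique) + len(candidates) <= len(best):
                return  # bound: this branch cannot beat the incumbent
            candidates = candidates - {v}
            expand(clique + [v], candidates & adj[v])

    expand([], set(verts))
    return sorted(best)


line1 = [(0, 0, 0), (4, 0, 0), (8, 0, 0)]
line2 = [(0, 0, 0), (4, 0, 0), (20, 0, 0)]
print(len(largest_quasi_isometric_subset(line1, line2)))
```

In the example no three atoms of line2 reproduce the 4 Å / 8 Å spacing of line1, so the best matching has two atoms; on real shells the exponential search is what makes the whole-patch clique approach slow.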

SLIDE 19

Global matching of the shells using Tree Edit Distance 2/2

Using geometric information

[Figure: two shells i and j with their inter-atomic distances and their common quasi-isometric subset]

TED costs:

  • Insert / delete a shell $i$: $|i|$
  • Edit shell $i$ into shell $j$: the size of their symmetric difference, $|i| + |j| - 2 \times |i \cap j|$, where $i \cap j$ is the largest quasi-isometric subset between $i$ and $j$.

Dissimilarity score:

$$
\mathrm{DIS}_g = \frac{\mathrm{TED}_g(T_1, T_2)}{|T_1| + |T_2|}
$$
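
A purely illustrative calculation, with invented sizes not taken from the data, shows how the geometric edit cost differs from the topological one. Take shells with $|i| = 50$ and $|j| = 48$ whose largest quasi-isometric subset has 40 atoms:

$$
\text{topological: } \bigl|\,|i| - |j|\,\bigr| = |50 - 48| = 2,
\qquad
\text{geometric: } |i| + |j| - 2\,|i \cap j| = 50 + 48 - 2 \times 40 = 18.
$$

The geometric cost charges every atom that cannot be superimposed, even when the two shells have almost the same size.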

SLIDE 20

Outline

1) Function/Complex/Binding Patches
2) Mathematical Models
3) Divide and Conquer Strategies
4) Results

SLIDE 21

Running time comparison : Dataset_1

  • 77 high-resolution (≤ 2 Å) immunoglobulin/antigen complexes from IMGT-3D (Lefranc, 2003)
  • 15 high-resolution protease/inhibitor complexes from the Protein Docking Benchmark (Chen et al., 2003)

These 92 complexes yield a total of 184 patches. The all-against-all comparison involves 17020 pairs of patches.
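
The pair count matches an all-against-all comparison that also compares each patch with itself (an inference from the numbers, not stated on the slide):

$$
\binom{184}{2} + 184 = 16836 + 184 = 17020 .
$$

The same holds for Dataset_2: $\binom{498}{2} + 498 = 123753 + 498 = 124251$.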

SLIDE 22

Running Time Comparison (over 17020 instances)

[Figure: running time T (s) as a function of max(#BP1, #BP2) and min(#BP1, #BP2); three panels: TEDt, TEDg and Clique]

Total running times:

  • Top-left, TEDt: 315.6 s
  • Top-right, TEDg (ε = 2 Å): 9843.8 s
  • Bottom, Clique (ε = 2 Å): 1166221.6 s

SLIDE 23

Dataset_2

  • 116 high-resolution (≤ 2 Å) immunoglobulin/antigen complexes from IMGT-3D
  • 133 enzyme/ligand complexes from the Affinity Benchmark (Kastritis et al., 2011), with resolution in [1.1 Å, 3.3 Å]

249 complexes → 498 patches → 124251 pairs

| Family of complex | Sub-family of complex | Partner type | Class identifier | #patches |
|---|---|---|---|---|
| (A) Antibody | (Carb) Carbohydrate | (R) Receptor | A_Carb_R * | 9 |
| | | (L) Ligand | A_Carb_L * | 9 |
| | (Chem) Chemical | (R) Receptor | A_Chem_R * | 40 |
| | | (L) Ligand | A_Chem_L * | 40 |
| | (DNA) DNA | (R) Receptor | A_DNA_R | 1 |
| | | (L) Ligand | A_DNA_L | 1 |
| | (Pept) Peptide | (R) Receptor | A_Pept_R * | 21 |
| | | (L) Ligand | A_Pept_L * | 21 |
| | (Prot) Protein | (R) Receptor | A_Prot_R * | 53 |
| | | (L) Ligand | A_Prot_L * | 53 |
| (E) Enzyme | (Inhi) Inhibitor | (R) Receptor | E_Inhi_R * | 40 |
| | | (L) Ligand | E_Inhi_L * | 40 |
| | (Regu) Regulator | (R) Receptor | E_Regu_R * | 11 |
| | | (L) Ligand | E_Regu_L * | 11 |
| | (Subs) Substrate | (R) Receptor | E_Subs_R * | 10 |
| | | (L) Ligand | E_Subs_L * | 10 |
| (OG) ? | ? | non-available | OG | 34 |
| (OR) ? | ? | non-available | OR | 26 |
| (OX) ? | ? | non-available | OX | 68 |

SLIDE 24

Geometric matching and RMSD_d

[Figure: DIS_g versus RMSD_d for each comparison instance]

FIGURE: For the 124251 comparison instances, DISg (using a 2 Å distance threshold) is plotted against the RMSD_d of the corresponding alignment.

→ Low geometric dissimilarity corresponds to alignments that are both long and have small RMSD_d values.
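
The slides do not define RMSD_d. Assuming it is the distance RMSD of an alignment, i.e., the root mean square of $d_{ij} - d_{kl}$ over all pairs of matched atoms, it could be computed as in this sketch (function name hypothetical). Being based on internal distances, it needs no superposition of the two patches.

```python
import math
from itertools import combinations


def rmsd_d(s1, s2, matching):
    """Distance RMSD over all pairs of matched atoms (assumed definition).

    s1, s2: lists of xyz coordinates; matching: list of (i, k) pairs meaning
    atom i of s1 is matched with atom k of s2.
    """
    diffs = [
        math.dist(s1[i], s1[j]) - math.dist(s2[k], s2[l])
        for (i, k), (j, l) in combinations(matching, 2)
    ]
    if not diffs:
        return 0.0  # fewer than two matched atoms: no distance to compare
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


print(rmsd_d([(0, 0, 0), (3, 0, 0)], [(0, 0, 0), (5, 0, 0)], [(0, 0), (1, 1)]))
```

With a 2 Å distance threshold, every individual term stays below 2 Å, so a long alignment with a small RMSD_d is exactly what a low DIS_g favours.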

SLIDE 25

Family Identification & Partner Retrieval (1/2)

Identification using nearest neighbors:

| Method | Same class | Partner class | Unrelated |
|---|---|---|---|
| TEDt | 41.6% (A: 50.4%, E: 23.8%) | 9.8% | 48.6% |
| TEDg, ε = 1 Å | 57.6% (A: 69.5%, E: 33.6%) | 3.3% | 39.1% |
| TEDg, ε = 2 Å | 58.1% (A: 70.3%, E: 33.6%) | 5.2% | 36.7% |
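
A minimal sketch of nearest-neighbour identification, assuming a symmetric dissimilarity matrix (DIS_t or DIS_g) and class labels such as A_Prot_R: each patch takes the class of its most similar other patch, and the outcome is counted as same class, partner class or unrelated. Treating the partner class as the receptor/ligand counterpart that differs only by the _R/_L suffix is an assumption, as are the helper names.

```python
def partner(label):
    """Hypothetical convention: A_Prot_R <-> A_Prot_L; None if no suffix."""
    if label.endswith("_R"):
        return label[:-2] + "_L"
    if label.endswith("_L"):
        return label[:-2] + "_R"
    return None


def nn_identification(dis, labels):
    """Percentages of patches whose nearest other patch is of the same class,
    of the partner class, or unrelated."""
    counts = {"same": 0, "partner": 0, "unrelated": 0}
    for p, row in enumerate(dis):
        q = min((j for j in range(len(row)) if j != p), key=lambda j: row[j])
        if labels[q] == labels[p]:
            counts["same"] += 1
        elif labels[q] == partner(labels[p]):
            counts["partner"] += 1
        else:
            counts["unrelated"] += 1
    return {k: 100.0 * v / len(dis) for k, v in counts.items()}


labels = ["A_Prot_R", "A_Prot_R", "A_Prot_L", "E_Inhi_R"]
dis = [[0.0, 0.1, 0.5, 0.9],
       [0.1, 0.0, 0.6, 0.8],
       [0.5, 0.6, 0.0, 0.7],
       [0.9, 0.8, 0.7, 0.0]]
print(nn_identification(dis, labels))
```

As in the table above, the three percentages always sum to 100%.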

SLIDE 26

Family Identification & Partner Retrieval (2/2)

Average SO and Patch Asymmetry

[Figure: average shelling order (1 to 5) for each typed sub-family, from AA_Carb_L to OX]

SLIDE 27

Identification vs Dissimilarity

[Figure: number of identified binding patches (correct + erroneous, correct, erroneous) versus the geometric dissimilarity threshold]

FIGURE: Identifications with very low geometric dissimilarity are precise. The higher the geometric dissimilarity between p and p̂, the higher the number of erroneous identifications.

SLIDE 28

Classification consistency 1/3

[Figure: density plot of dissimilarity scores]

Principle: the Wilcoxon-Mann-Whitney (WMW) probability that the intra-class dissimilarity scores come from the same distribution as the inter-class scores.
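
The WMW test can be sketched directly from its definition, assuming the intra-class and inter-class dissimilarity scores are available as two lists. This illustration counts the U statistic (ties count one half) and uses the large-sample normal approximation without tie correction, with a one-sided alternative (intra-class scores smaller). The slides do not state which variant or software was used, and the function name is hypothetical.

```python
import math


def wmw_one_sided(intra, inter):
    """Mann-Whitney U and one-sided p-value (normal approximation, no tie
    correction) for H1: intra-class dissimilarities tend to be smaller."""
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in intra for b in inter)
    n1, n2 = len(intra), len(inter)
    mean = n1 * n2 / 2.0                       # E[U] under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2.0))    # P(Z >= z)
    return u, p


print(wmw_one_sided([0.1, 0.2, 0.3], [0.5, 0.6, 0.7, 0.8]))
```

A class whose patches are much closer to each other than to the rest yields a large U and hence a tiny probability, which is the pattern reported for the well-identified families.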

SLIDE 29

Classification consistency 2/3

| Family (= P) | % identification | WMW probability |
|---|---|---|
| A_Carb_R | 44.4% | 1.04e-11 |
| A_Carb_L | 33.3% | 2.20e-15 |
| A_Chem_R | 92.5% | 1.74e-260 |
| A_Chem_L | 85.0% | |
| A_Pept_R | 33.3% | 4.94e-40 |
| A_Pept_L | 38.1% | 8.41e-41 |
| A_Prot_R | 73.6% | 1.45e-96 |
| A_Prot_L | 77.4% | 1.50e-97 |
| E_Inhi_R | 50.0% | 9.10e-36 |
| E_Inhi_L | 52.5% | 2.53e-27 |
| E_Regu_R | 0% | 0.96 |
| E_Regu_L | 0% | 0.60 |
| E_Subs_R | 20.0% | 0.06 |
| E_Subs_L | 0% | 0.01 |

TABLE: Low WMW probabilities relate to high identification rates.

SLIDE 30

Classification consistency 3/3

[Figure: identification rate (in percent) versus Wilcoxon-Mann-Whitney −log(p-value)]

FIGURE: The relation between identification rate and WMW probability is confirmed by a Spearman correlation coefficient of −0.893.
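
Spearman's coefficient is the Pearson correlation of the ranks. A minimal sketch, giving tied values their average rank, as could be applied to per-class identification rates and WMW probabilities; it is an illustration, not the authors' code, and the function names are hypothetical.

```python
def ranks(xs):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


print(spearman([1, 2, 3, 4], [40, 30, 20, 10]))
```

Because −log is strictly decreasing, replacing raw p-values by −log(p) reverses their ranks: ρ keeps its magnitude and flips its sign.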

SLIDE 31

Dissimilar Receptors bind dissimilar Ligands

[Figure: scatter plot of DIS_g between two receptors and DIS_g between the corresponding ligands]

FIGURE: For each pair of complexes, the geometric dissimilarity between the two receptor patches is plotted against the one between the corresponding ligand patches.

SLIDE 32

Similarity and binding affinity

| Parameter | Pearson coef. | Pearson p-value | Spearman coef. | Spearman p-value | Maximal Information coef. | Maximal Information p-value |
|---|---|---|---|---|---|---|
| IPL | 0.31 | 1.3e−4 | 0.43 | 7.6e−8 | 0.35 | 7.6e−4 |
| #Atoms | 0.27 | 1.2e−3 | 0.37 | 4.7e−6 | 0.24 | |
| Depth | 0.29 | 4.8e−4 | 0.35 | 1.5e−5 | 0.26 | |
| ∆asa | 0.22 | 8.9e−3 | 0.33 | 6.6e−5 | 0.25 | |
| Firedock score | −0.17 | 4.2e−2 | 0.20 | 1.8e−2 | 0.23 | |
| I_RMSD | −0.11 | 2.0e−1 | 0.17 | 4.3e−2 | 0.24 | |
| #Shells | 0.092 | 2.7e−1 | −0.16 | 5.4e−2 | 0.16 | |
| DISg | 0.16 | 5.8e−2 | −0.14 | 8.5e−2 | 0.24 | |
| Asymmetry | 0.045 | 5.9e−1 | −0.094 | 2.6e−1 | 0.19 | |
| DISt | 0.029 | 7.2e−1 | −0.089 | 2.9e−1 | 0.20 | |

TABLE: On the Affinity Benchmark (Kastritis et al., 2011), the Internal Path Length (IPL) yields the best correlation with the binding affinity (−ln Kd).

SLIDE 33

Flexibility during docking

[Figure: DIS_t versus DIS_g for each patch vs. preimage pair, with the line y = x; highlighted cases: 2jel_HL-P_HL, 2i25_N-L_L and 1iqd_AB-C_C]

FIGURE: For each patch/preimage pair of the affinity benchmark (Kastritis et al., 2011), the topological dissimilarity is plotted against the geometric one.

According to the affinity benchmark, at the Cα level, all the presented cases are supposed to be easy ones, with I-RMSD values of 0.17 Å, 0.35 Å and 0.48 Å, respectively.

SLIDE 34

Rigid patch

FIGURE: Left: preimage (unbound form); right: patch (bound form). In 2jel_HL:P, the patch on chains HL is rigid: both the topology and the geometry are preserved between the unbound and bound forms (DISt = 0.026 and DISg = 0.058, with an RMSD_d of 0.90 Å).

SLIDE 35

Topo-Rigid patch

FIGURE: Left: preimage; right: patch. In 2sni_E:I, chain I is a flexible patch whose topology is preserved but not its geometry (DISt = 0.167 and DISg = 0.407, with an RMSD_d of 2.96 Å).

SLIDE 36

Flexible patch

FIGURE: Left: preimage; right: patch. In 1iqd_AB:C, chain C is a flexible patch where neither the topology nor the geometry is preserved (DISt = 0.464 and DISg = 0.608, with an RMSD_d of 2.46 Å).
