SLIDE 1

Introduction Global optimisation

The interval branch-and-prune algorithm for the protein structure determination by NMR

Thérèse E Malliavin

Unité de Bioinformatique Structurale, Institut Pasteur and UMR CNRS 3528, Paris, France

DIMACS Workshop on Distance Geometry: Theory and Applications Rutgers University, 26-29 July 2016

Thérèse E Malliavin Protein structure
SLIDE 2

Introduction Global optimisation Structural biology Order/disorder in structural biology

Experimental techniques for structural biology

[Figure: experimental techniques arranged along axes of molecular size and disorder: cryo-EM microscopy, nuclear magnetic resonance, X-ray crystallography, small-angle X-ray scattering]

SLIDE 3

The infancy of structural biology

Model of pig insulin: Dorothy Hodgkin, 1967 (Image credit: Science Museum).
Sausage model of myoglobin: John Kendrew, 1957 (Image credit: Science Museum).
John Kendrew with the ‘forest of rods’ model of myoglobin (Image credit: MRC Laboratory of Molecular Biology).
Source: kathryngamer.co.uk/blog

SLIDE 4

Structural bioinformatics

SLIDE 5

Low ordered or disordered biomolecular structure

Degrees of order: « Unique » conformation; conformational exchange; Intrinsically Disordered Proteins (IDP)

Examples: nitrogen regulatory protein C (NtrC), Vanatta...Pande, Nat Comm 2014; Bobela...Schneider, Biomolecules 2015; Conotoxin, 1IEN, Sharp...Lewis, Nat Neurosci 2001

Techniques: X-ray crystallography; Nuclear Magnetic Resonance (solid-state, solution); small-angle X-ray scattering; fluorescence resonance energy transfer

SLIDE 6

Exploration of AC conformational space

AC inactive state: larger range of explored gyration radii

[Figure: active enzyme and inactive enzyme, Bordetella pertussis]

Dynamic light scattering: Karst et al, Biochem 2010

Cortes-Ciriano, Bouvier, Nilges, Maragliano, Malliavin. Temperature Accelerated Molecular Dynamics with Soft-Ratcheting Criterion Orients Enhanced Sampling by Low-Resolution Information. J Chem Theory Comput 2015.

SLIDE 7

Exploration of AC conformational space

AC inactive state Larger range of explored gyration radii Clustered compact conformations

[Figure: radius of gyration Rg (Å), axis from 22.0 to 26.0, for conformation clusters 1 to 10]

Cortes-Ciriano, Bouvier, Nilges, Maragliano, Malliavin. Temperature Accelerated Molecular Dynamics with Soft-Ratcheting Criterion Orients Enhanced Sampling by Low-Resolution Information. J Chem Theory Comput 2015.

SLIDE 8

Exploration of AC conformational space

AC inactive state Larger range of explored gyration radii Clustered compact conformations

[Figure: conformation clusters 1 to 10]

Cortes-Ciriano, Bouvier, Nilges, Maragliano, Malliavin. Temperature Accelerated Molecular Dynamics with Soft-Ratcheting Criterion Orients Enhanced Sampling by Low-Resolution Information. J Chem Theory Comput 2015.
SLIDE 9

Introduction Global optimisation interval branch and prune Efficiency with exact distances Interval distances and folded proteins

Global optimisation

Global optimization is distinguished from regular optimization by its focus on finding the maximum or minimum over all input values, as opposed to finding local minima or maxima.
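To make the distinction concrete, here is a minimal sketch in Python. It is not taken from the talk: the one-dimensional function `f`, the starting point and the step size are illustrative assumptions. Gradient descent started at x = 1.5 settles in a local minimum, whereas a systematic scan of the whole interval reaches the lower, global one.

```python
import numpy as np

def f(x):
    # Illustrative function with two minima (an assumption, not from the talk).
    return x**4 - 3 * x**2 + x

def df(x):
    # Analytical derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, step=0.01, n_iter=2000):
    """Plain gradient descent: it only reaches the minimum of the basin containing x0."""
    x = x0
    for _ in range(n_iter):
        x -= step * df(x)
    return x

def exhaustive_scan(lo=-2.0, hi=2.0, n=4001):
    """Systematic scan of the whole interval: returns the best point on the grid."""
    grid = np.linspace(lo, hi, n)
    return float(grid[np.argmin(f(grid))])

x_local = gradient_descent(1.5)   # stays in the right-hand basin
x_global = exhaustive_scan()      # lands in the left-hand, deeper basin
```

The systematic scan plays the role that the tree enumeration plays in branch-and-prune: the whole search space is covered instead of a single basin.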

Martin, Zhou, Donald. Systematic solution to homo-ligomeric structures determined by NMR. Proteins 2015.

SLIDE 10

Distance Geometry problem

Problem Spheres intersection

Malliavin, Mucherino, Nilges. Distance geometry in structural biology: new perspectives. In: Distance Geometry: Theory, Methods and Applications, Mucherino, Lavor, Liberti, Maculan (Eds.), Springer 2013.

SLIDE 11

Distance Geometry problem

Problem Spheres intersection Tree: branch and prune

[Figure: reference atoms C, HN and CA, at distances d1, d2 and d3 from the unknown atom]

Search for the position of an atom X such that:

dist(X,C) = d1, dist(X,HN) = d2, dist(X,CA) = d3
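The three-sphere intersection stated above can be sketched numerically. The following is a minimal sketch, assuming exact distances and three non-collinear reference atoms; the function name `intersect_three_spheres` and its arguments are illustrative and do not come from the talk. In the generic case it returns two positions that are mirror images of each other with respect to the plane of the three reference atoms, which is why the search tree branches.

```python
import numpy as np

def intersect_three_spheres(p1, p2, p3, d1, d2, d3):
    """Return the points (0, 1 or 2) at distances d1, d2, d3 from p1, p2, p3."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    # Local orthonormal frame: ex along p1->p2, ey in the plane of the three centres.
    d = np.linalg.norm(p2 - p1)
    ex = (p2 - p1) / d
    i = np.dot(ex, p3 - p1)
    ey = p3 - p1 - i * ex
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    j = np.dot(ey, p3 - p1)
    # Coordinates of X in the local frame, from differences of the sphere equations.
    x = (d1**2 - d2**2 + d**2) / (2 * d)
    y = (d1**2 - d3**2 + i**2 + j**2 - 2 * i * x) / (2 * j)
    z2 = d1**2 - x**2 - y**2
    if z2 < -1e-9:
        return []                      # the three spheres do not intersect
    z = np.sqrt(max(z2, 0.0))
    base = p1 + x * ex + y * ey
    if z == 0.0:
        return [base]                  # degenerate case: a single point in the plane
    return [base + z * ez, base - z * ez]
```

With interval distances the spheres become spherical shells, and the two points become two regions that have to be sampled.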

Malliavin, Mucherino, Nilges. Distance geometry in structural biology: new perspectives. In: Distance Geometry: Theory, Methods and Applications, Mucherino, Lavor, Liberti, Maculan (Eds.), Springer 2013.

SLIDE 12

Distance Geometry problem

Problem Spheres intersection Tree: branch and prune Atoms recursive ordering
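The tree search over recursively ordered atoms can be sketched as a depth-first enumeration. This is a generic sketch under stated assumptions, not the iBP implementation: `branch_and_prune`, `candidates` and `is_feasible` are hypothetical names, and the toy instance is one-dimensional to keep the example short.

```python
def branch_and_prune(n_atoms, candidates, is_feasible):
    """Depth-first enumeration of all placements that pass the pruning test."""
    solutions = []

    def recurse(partial):
        if len(partial) == n_atoms:
            solutions.append(list(partial))      # a leaf: one complete conformation
            return
        for position in candidates(partial):      # branching step
            if is_feasible(partial, position):    # pruning step
                partial.append(position)
                recurse(partial)
                partial.pop()

    recurse([])
    return solutions

# Toy 1D instance (hypothetical): consecutive atoms are 1 apart (branching
# distance) and atoms k and k+2 must be 2 apart (pruning distance).
def toy_candidates(partial):
    if not partial:
        return [0.0]                              # first atom fixed at the origin
    return [partial[-1] + 1.0, partial[-1] - 1.0]

def toy_feasible(partial, position):
    return len(partial) < 2 or abs(position - partial[-2]) == 2.0

solutions = branch_and_prune(4, toy_candidates, toy_feasible)
```

Without the pruning test this toy tree has 8 leaves; with it, only the two mirror-image chains survive.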

Malliavin, Mucherino, Nilges. Distance geometry in structural biology: new perspectives. In: Distance Geometry: Theory, Methods and Applications, Mucherino, Lavor, Liberti, Maculan (Eds.), Springer 2013.

SLIDE 13

Distance Geometry problem

Problem Spheres intersection Tree: branch and prune Atoms recursive ordering

Malliavin, Mucherino, Nilges. Distance geometry in structural biology: new perspectives. In: Distance Geometry: Theory, Methods and Applications, Mucherino, Lavor, Liberti, Maculan (Eds.), Springer 2013.

SLIDE 14

Atom recursive ordering and branching distances

Distances corresponding to a chemical bond, or to two chemical bonds connected by a bond angle
Distances that can be exact or interval

Case without cycle: atoms i, i-1, i-2, i-3

Case with cycle: atom i = atom i-3, with atoms i-1 and i-2, so that
d(i-3,i) = 0, d(i-3,i-2) = d(i,i-2), cos(ω) = 1, sin(ω) = 0

Atom ordering and branching

[Figure: the two possible positions of atom i relative to atoms i-1, i-2 and i-3, one with positive sin(ω) and one with negative sin(ω)]
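The branching on the sign of sin(ω) can be illustrated with a short sketch. It assumes that the bond length `r`, the bond angle `theta` at atom i-1 and the value of cos(ω) are already known (for instance derived from the distances to atoms i-1, i-2 and i-3); the function name and parameters are illustrative, and no particular sign convention for ω is claimed.

```python
import numpy as np

def branch_positions(a, b, c, r, theta, cos_omega):
    """Two candidate positions for atom i given atoms i-3 (a), i-2 (b), i-1 (c).

    r: distance between atom i-1 and atom i
    theta: angle (radians) at atom i-1 between atoms i-2 and i
    cos_omega: cosine of the torsion angle around the b-c axis
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    bc = (c - b) / np.linalg.norm(c - b)      # unit vector along the torsion axis
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)                    # unit normal to the plane (a, b, c)
    m = np.cross(n, bc)                       # completes the orthonormal frame
    sin_omega = np.sqrt(max(0.0, 1.0 - cos_omega**2))
    positions = []
    for s in (sin_omega, -sin_omega):         # the two branches of the tree
        positions.append(
            c - r * np.cos(theta) * bc
            + r * np.sin(theta) * (cos_omega * m + s * n)
        )
    return positions
```

Both candidates lie at the same distances from the three previous atoms and differ only by a reflection through their plane. When sin(ω) = 0, as in the case with cycle, the two candidates coincide and there is no branching.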

SLIDE 15

Helical peptides with long-range distance restraints

Exploration of conformations using few long-range distance restraints
Superposition to Protein Data Bank target structures

[Figure: 2KXA (aa), 2KSL (aaaa)]

Cassioli, Bardiaux, Bouvier, Mucherino, Alves, Liberti, Nilges, Lavor C, Malliavin TE. BMC Bioinformatics 2015

SLIDE 16

Helical peptides with long-range distance restraints

Exploration of conformations using few long-range distance restraints
Superposition to Protein Data Bank target structures

[Figure: superposed structures of 2KSL and 2KXA, with N and C termini labelled]

PROCHECK            2KXA   2KSL
Core                63.6   75.5
Allowed             36.4   14.2
Generously Allowed   0.0    9.2
Disallowed           0.0    1.0

Cassioli, Bardiaux, Bouvier, Mucherino, Alves, Liberti, Nilges, Lavor C, Malliavin TE. BMC Bioinformatics 2015

SLIDE 17

Validation set of proteins

PDB   Nres  SecStruct   EXP
1CEY  128   ababababab  NMR
2F05  85    aaaa        NMR
2KSL  51    aaaa        NMR
2KXA  24    aa          NMR
2LJ0  65    bbbbb       NMR
2LVR  30    bba         NMR
2LXZ  32    bbb         NMR
2M5X  40    bbab        NMR
2MC6  73    bbba        NMR
2MDI  56    bbb         NMR
2MGV  65    abbb        NMR
2MH2  64    abaabb      NMR
2MJ6  90    aabbbb      NMR
2MLA  37    babb        NMR
2MNI  92    baabba      NMR
2MP1  77    bbba        NMR
2MW9  33    bbb         NMR
2MXE  47    bbaab       NMR
2N17  56    bbab        NMR
2N2Q  54    babb        NMR
2RUP  58    bbb         NMR
4BYA  75    aaaa        NMR
4OU0  66    abaabb      XR
4RBX  32    bbb         XR

Calculations using exact distances and intervals, determined from a limited informative set of restraints.

SLIDE 18

Short-range exact distances are used

[Figure: residue i and residue i+1 with atoms H, C, O, CB; exact branching distances and exact pruning distances]

Exact pruning distances: involved residues

p = 5,4: (i,i) (i,i+1)
p = 6,4: (i,i) (i,i+1)
p = 7,4: (i,i) (i,i+1)
p = 8,4: (i,i) (i,i+1) (i,i+2)
p = 9: (i,i+1)
p = 10: (i,i+1) (i,i+2)
p = 11: (i,i+1) (i,i+2)
p = 12: (i,i+1) (i,i+2)

SLIDE 19

Calculation scheme for exact distances

Pruning distances (exact distances) + Branching distances (exact d14 distances) → iBP → Set of protein conformations (Filtering RMSD > 2 Å)

Found: One conformation found by iBP has an RMSD < 2 Å w.r.t. the PDB structure.
Mirror found: One conformation found by iBP has an RMSD < 2 Å w.r.t. the mirror image of the PDB structure.
Partially found: At least 50% of the structure of one conformation found by iBP has an RMSD < 2 Å w.r.t. the PDB structure.
Not found: None of the previous cases.
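The four outcome classes can be sketched with a standard Kabsch superposition. This is a hedged sketch, not the analysis code used for the talk: the function names are illustrative, coordinates are assumed to be N x 3 arrays with matching atom order, and "at least 50% of the structure" is interpreted here as any contiguous window covering half of the atoms, which is an assumption.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Force a proper rotation (det = +1) so that mirror images are not superposed.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

def classify(conformations, reference, cutoff=2.0):
    """Return 'found', 'mirror found', 'partially found' or 'not found'."""
    mirror = reference * np.array([-1.0, 1.0, 1.0])   # reflect through the yz plane
    n = len(reference)
    half = (n + 1) // 2                               # window covering at least 50%
    mirror_hit = partial_hit = False
    for X in conformations:
        if kabsch_rmsd(X, reference) < cutoff:
            return "found"
        if kabsch_rmsd(X, mirror) < cutoff:
            mirror_hit = True
        for start in range(n - half + 1):
            w = slice(start, start + half)
            if kabsch_rmsd(X[w], reference[w]) < cutoff:
                partial_hit = True
                break
    if mirror_hit:
        return "mirror found"
    if partial_hit:
        return "partially found"
    return "not found"
```

The determinant correction matters here: without it the superposition would be allowed to include a reflection, and a mirror-image conformation would be reported as "found".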

SLIDE 20

Detection of PDB structure if five conformations are generated

[Table: colour-coded outcome for each protein and each restraint set]

Columns (restraint types): p=5,4, p=6,4, p=7,4 (i,i+1); p=8,4 (i,i+1,i+2); p=9 (i,i+1); p=10, p=11, p=12 (i,i+1,i+2)

Rows (PDB, Nres, SecStruct): 1CEY 128 ababababab; 2F05 85 aaaa; 2KSL 51 aaaa; 2KXA 24 aa; 2LJ0 65 bbbbb; 2LVR 30 bba; 2LXZ 32 bbb; 2M5X 40 bbab; 2MC6 73 bbba; 2MDI 56 bbb; 2MGV 65 abbbb; 2MH2 64 abaabb; 2MJ6 90 aabbbb; 2MLA 37 babb; 2MNI 92 baabba; 2MP1 77 bbba; 2MW9 33 bbb; 2MXE 47 bbaab; 2N17 56 bbab; 2N2Q 54 babb; 2RUP 58 bbb; 4BYA 75 aaaa; 4OU0 66 abaabb; 4RBX 32 bbb

Legend: Not found, Partially found, Found, Mirror found

SLIDE 21

Detection of PDB structure if chirality is switched off

[Table: colour-coded outcome for each protein and each restraint set]

Columns (restraint types): p=5,4, p=6,4, p=7,4 (i,i+1); p=8,4 (i,i+1,i+2); p=9 (i,i+1); p=10, p=11, p=12 (i,i+1,i+2)

Rows (PDB, Nres, SecStruct): 1CEY 128 ababababab; 2F05 85 aaaa; 2KSL 51 aaaa; 2KXA 24 aa; 2LJ0 65 bbbbb; 2LVR 30 bba; 2LXZ 32 bbb; 2M5X 40 bbab; 2MC6 73 bbba; 2MDI 56 bbb; 2MGV 65 abbbb; 2MH2 64 abaabb; 2MJ6 90 aabbbb; 2MLA 37 babb; 2MNI 92 baabba; 2MP1 77 bbba; 2MW9 33 bbb; 2MXE 47 bbaab; 2N17 56 bbab; 2N2Q 54 babb; 2RUP 58 bbb; 4BYA 75 aaaa; 4OU0 66 abaabb; 4RBX 32 bbb

Legend: Not found, Partially found, Found, Mirror found

SLIDE 22

Detection of PDB structure if five conformations are generated

[Table: colour-coded outcome for each protein and each restraint set]

Columns (restraint types): p=5,4, p=6,4, p=7,4 (i,i+1); p=8,4 (i,i+1,i+2); p=9 (i,i+1); p=10, p=11, p=12 (i,i+1,i+2)

Rows (PDB, Nres, SecStruct): 1CEY 128 ababababab; 2F05 85 aaaa; 2KSL 51 aaaa; 2KXA 24 aa; 2LJ0 65 bbbbb; 2LVR 30 bba; 2LXZ 32 bbb; 2M5X 40 bbab; 2MC6 73 bbba; 2MDI 56 bbb; 2MGV 65 abbbb; 2MH2 64 abaabb; 2MJ6 90 aabbbb; 2MLA 37 babb; 2MNI 92 baabba; 2MP1 77 bbba; 2MW9 33 bbb; 2MXE 47 bbaab; 2N17 56 bbab; 2N2Q 54 babb; 2RUP 58 bbb; 4BYA 75 aaaa; 4OU0 66 abaabb; 4RBX 32 bbb

Legend: Not found, Partially found, Found, Mirror found

SLIDE 23

Conformational exploration for short-range restraints

PDB   nbres  SecStruct  rank       RMSD (Å)  Number of saved  Number of      CPU
                                             conformations    parsed leaves
2F05  85     aaaa       5.37×10^3  0.19      3.75×10^7        5.36×10^8      3d20h
2M5X  40     bbab       1.66×10^3  0.08      4.09×10^3        1.63×10^5      10s
2MGV  65     abbb       5.07×10^2  0.09      5.12×10^2        1.63×10^5      19s
2MXE  47     bbaab      1.92×10^3  0.11      2.04×10^3        3.27×10^3      30s
2N17  56     bbab       2.04×10^5  0.17      10^6             > 9×10^6       6d5h
2N2Q  54     babb       1.77×10^3  0.42      1.79×10^3        2.68×10^8      1d19h
2RUP  58     bbb        22         0.09      64               64             9s
4BYA  75     aaaa       9.99×10^3  0.19      10^4             > 8×10^5       90s
4OU0  66     abaabb     1.10×10^5  0.12      10^6             > 1×10^7       21mn

Pruning distances p=8,4: (i,i) (i,i+1) (i,i+2)

SLIDE 24

One example of exploration

Bouvier, Desdouits, Ferber, Blondel, Nilges. Bioinformatics 2015
Bouvier, Duclert-Savatier, Desdouits, Meziane-Cherif, Blondel, Courvalin, Nilges, Malliavin. JCIM 2014

SLIDE 25

Which distance intervals?

[Figure: residue i and residue i+1 with atoms H, C, O, CB]

Exact branching distances due to bond and bond angle geometry
Interval branching distances: H-i/HA-i, HA-i/N-(i+1), CA-i/CA-(i+1)
Interval pruning distances between Cα atoms

SLIDE 26

Strict branching intervals obtained using ω distributions in secondary structure elements

Atom pair      SecStruct  cos(ω)         sin(ω)
HA-i/N-(i+1)   H          [-1,-0.9]      +
HA-i/N-(i+1)   E          [0.9,1]        +/-
HA-i/N-(i+1)   L          [-1,1]         +/-
H-i/HA-i       H          [-0.75,-0.4]   -
H-i/HA-i       E          [-1,-0.9]      +/-
H-i/HA-i       L          [-1,-0.9]      +/-
CA-i/CA-(i+1)  H          [-1,-0.97]     +
CA-i/CA-(i+1)  E          [-1,-0.97]     +
CA-i/CA-(i+1)  L          [-1,-0.97]     +
N-i/C-i        H          bond           •
N-i/C-i        E          bond           •
N-i/C-i        L          bond           •
N-i/H-i        H          bond           +
N-i/H-i        E          bond           +
N-i/H-i        L          bond           +

SLIDE 27

Distribution of distances between atoms Cα

[Figure: distance (Å) between Cα atoms as a function of the difference in residue number, one panel per protein: 2KXA, 2LVR, 2MW9, 2KSL, 4RBX, 2LXZ, 2MLA, 4BYA, 2MH2, 4OU0]

SLIDE 28

Distribution of distances between atoms Cα

SLIDE 29

Distribution of distances between atoms Cα

SLIDE 30

Calculation scheme for intervals

Branching distances (interval d14 with discretization 0.4 Å, Strict SecStr) + Pruning distances (Cα-Cα distance intervals from protein global shape) → iBP → Set of protein conformations (Filtering RMSD > 4 Å)
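One way to read "interval d14 with discretization 0.4 Å" is that each distance interval is replaced by a finite set of sampled distances no more than 0.4 Å apart. The sketch below illustrates that reading only; the sampling rule (evenly spaced, endpoints included) and the function name are assumptions, not a description of the iBP code.

```python
import numpy as np

def discretize_interval(lower, upper, step=0.4):
    """Sample [lower, upper] with evenly spaced distances at most `step` apart."""
    if upper < lower:
        raise ValueError("upper bound must not be smaller than lower bound")
    # Number of sub-intervals needed so that the spacing does not exceed `step`.
    n = int(np.ceil((upper - lower) / step - 1e-9))
    if n <= 0:
        return np.array([float(lower)])           # exact distance: a single sample
    return np.linspace(lower, upper, n + 1)

samples = discretize_interval(2.0, 3.2)           # hypothetical d14 interval in Å
# Each sampled distance can be realised with either sign of sin(ω),
# so an interval multiplies the number of branches at that atom.
branches_per_atom = 2 * len(samples)
```

This is why interval distances make the tree much wider than exact distances: an exact d14 distance gives 2 branches per atom, whereas the hypothetical interval above gives 8.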

SLIDE 31

Conformational exploration for short-range restraints

PDB   nbres  SecStruct  rank       RMSD (Å)  Number of saved  Number of      CPU
                                             conformations    parsed leaves
2KXA  24     aa         12         2.36      512              3.27×10^5      1mn
2LVR  30     bba        7.95×10^4  2.23      200000           2.88×10^8      15h55mn
2MW9  33     bbb        1.86×10^5  3.93      200000           3.14×10^8      11h35mn
2KSL  51     aaaa       1.91×10^5  4.39      200000           8.49×10^7      16h20mn
4RBX  32     bbb        1.60×10^5  4.77      200000           1.84×10^9      4d15h40mn
2LXZ  32     bbb        1.48×10^4  5.69      200000           2.15×10^9      5d8h
2MLA  37     babb       1.92×10^4  5.33      200000           2.68×10^9      6d9h50mn
4OU0  66     abaabb     1.05×10^5  6.55      200000           1.64×10^9      6d14h20mn
2MH2  64     abaabb     1.44×10^4  8.48      200000           1.34×10^9      6d1h55mn
4BYA  75     aaaa       1.73×10^4  8.68      200000           1.05×10^7      8h18mn

SLIDE 32

Some examples of calculations with intervals and pruning

2KSL, aaaa, 51 res: conformation 191546, RMSD = 4.4 Å
2KXA, aa, 24 res: conformation 12, RMSD = 2.4 Å
2LVR, bba, 30 res: conformation 79488, RMSD = 2.2 Å
2MW9, bbb, 33 res: conformation 186093, RMSD = 3.9 Å
4OU0, abaabb, 66 res: conformation 105084, RMSD = 6.5 Å

SLIDE 33

Some examples of calculations with intervals and pruning

2LXZ, bbb, 32 res: conformation 14812, RMSD = 5.7 Å
2MH2, abaabb, 64 res: conformation 14410, RMSD = 5.6 Å (res 20-63)
2MLA, babb, 37 res: conformation 19241, RMSD = 4.7 Å (res 11-35)
4BYA, aaaa, 75 res: conformation 17316, RMSD = 4.0 Å (res 40-74)
4RBX, bbb, 32 res: conformation 160089, RMSD = 5.7 Å

SLIDE 34

Problem complexity

[Figure: Log10(Number_of_leaves) as a function of the number of residues (nres). Series: exact distances without pruning; intervals without pruning; strict intervals without pruning. Fitted curves: 10^(1.3*nres), 10^(1.5*nres), 10^(0.7*nres)]

discretization factor: 0.4 Å

SLIDE 35

Problem complexity

N = maximum number of leaves without pruning
ns = number of parsed leaves using pruning
NF = number of leaves to be parsed without pruning

Number of parsed leaves without pruning = N/NF
Reduction factor due to pruning: F = ns/(N/NF)
Predicted maximum number of leaves with pruning = F·N
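The arithmetic of the reduction factor can be followed on a small worked example. The sketch below simply transcribes the three relations above, working in log10 because N can be far too large for ordinary floating-point numbers; the function name and the input values are made-up illustrations, not results from the talk.

```python
import math

def predicted_log10_leaves_with_pruning(log10_N, log10_NF, ns):
    """Transcribe F = ns / (N / NF) and prediction = F * N, in log10 units."""
    log10_parsed_without_pruning = log10_N - log10_NF      # log10(N / NF)
    log10_F = math.log10(ns) - log10_parsed_without_pruning
    return log10_F + log10_N                               # log10(F * N)

# Made-up numbers: N = 10^20, NF = 10^12, ns = 10^5 parsed leaves with pruning.
# Then N / NF = 10^8, F = 10^5 / 10^8 = 10^-3, and F * N = 10^17.
log10_prediction = predicted_log10_leaves_with_pruning(20.0, 12.0, 1e5)
```

Working with logarithms also matches the way the complexity plots are drawn, with Log10(Number_of_leaves) on the vertical axis.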

discretization factor: 0.4 Å

SLIDE 36

Problem complexity

[Figure: Log10(Number_of_leaves) as a function of the number of residues (nres). Series: exact distances without pruning; intervals without pruning; strict intervals without pruning; CA-CA pruning. Fitted curves: 10^(1.3*nres), 10^(1.5*nres), 10^(0.7*nres), 10^(0.3*nres)]

discretization factor: 0.4 Å
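The fitted curves all have the form 10^(a*nres), so comparing two of them reduces to comparing slopes. The sketch below evaluates these laws; pairing each exponent with a series by legend order is an assumption, since the extracted text does not state which curve belongs to which series.

```python
import math

# Slope a of the fitted law log10(number of leaves) = a * nres.
# The pairing of slopes and series follows the legend order (an assumption).
FITTED_SLOPES = {
    "exact distances without pruning": 1.3,
    "intervals without pruning": 1.5,
    "strict intervals without pruning": 0.7,
    "CA-CA pruning": 0.3,
}

def log10_leaves(nres, slope):
    """Log10 of the number of leaves predicted by the law 10^(slope * nres)."""
    return slope * nres

def log10_gain(nres, slope_before, slope_after):
    """Orders of magnitude saved when the slope drops from slope_before to slope_after."""
    return (slope_before - slope_after) * nres

# For a 50-residue protein, going from slope 0.7 to slope 0.3 saves about
# 20 orders of magnitude, although 10^(0.3 * 50) = 10^15 leaves remain.
gain_50 = log10_gain(50, 0.7, 0.3)
```

Even the smallest slope remains exponential in the number of residues, which is consistent with the long CPU times reported for the interval calculations.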

SLIDE 37

Conclusions and perspectives

iBP provides an algorithm designed for the systematic exploration of the configuration space of protein conformations
iBP is efficient for the exploration of protein conformational space with limited sets of exact short-range inter-atomic distances
The case of distance intervals could be addressed using a strict definition of secondary structure elements coupled to pruning based on global distance distributions

SLIDE 38

Acknowledgments

Michael Nilges, I. Pasteur
Leo Liberti, E. Polytechnique
Carlile Lavor, UNICAMP
Andrea Cassioli, E. Polytechnique
Benjamin Bardiaux, I. Pasteur
Bradley Worley, I. Pasteur
Guillaume Bouvier, I. Pasteur
Mohamed Machat, I. Pasteur
Antonio Mucherino, U. Rennes

Funding: ANR bip:bip, ERC BayCells; Pasteur Fondation, CNRS, I. Pasteur