SLIDE 1 Bruce Donald Laboratory
Jeff Martin
Geometric arrangement algorithms for protein structure determination
http://www.cs.duke.edu/donaldlab/
SLIDE 2
Protein structure determination
Protein Synthesis
SLIDE 3 Protein structure determination
Protein Synthesis
Primary sequence
ANNTTGFTRIIKAAGYSWKGLRAAWINEAAF RQEGVAVLLAVVIACWLDVDAITRVLLISSV MLVMIVEILNSAIEAVVDRIGSEYHELSGRAK DMGSAAVLIAIIVAVITWCILLWSHFG
SLIDE 4 Protein structure determination
Protein Synthesis Protein Folding
Primary sequence
ANNTTGFTRIIKAAGYSWKGLRAAWINEAAF RQEGVAVLLAVVIACWLDVDAITRVLLISSV MLVMIVEILNSAIEAVVDRIGSEYHELSGRAK DMGSAAVLIAIIVAVITWCILLWSHFG
SLIDE 5 Protein structure determination
Protein Synthesis Protein Folding
Primary sequence
Experimental measurements
ANNTTGFTRIIKAAGYSWKGLRAAWINEAAF RQEGVAVLLAVVIACWLDVDAITRVLLISSV MLVMIVEILNSAIEAVVDRIGSEYHELSGRAK DMGSAAVLIAIIVAVITWCILLWSHFG
SLIDE 6
Why solve protein structures?
Structure determines function! Hence, studying structure helps understand function
SLIDE 7 How is protein function important?
Disease mechanisms
MPER: HIV surface protein
P Reardon, et al., (in preparation)
SLIDE 8 How is protein function important?
Disease mechanisms
MPER: HIV surface protein
T Zhou, I Georgiev, et al., Science, 2010
VRC01: HIV antibody
P Reardon, et al., (in preparation)
SLIDE 9 How is protein function important?
New drugs
Disease mechanisms
MPER: HIV surface protein
VRC01: HIV antibody
GrsA PheA: helps manufacture antibiotics
CY Chen, I Georgiev, et al., PNAS, 2009
T Zhou, I Georgiev, et al., Science, 2010
P Reardon, et al., (in preparation)
SLIDE 10
One protein function: a molecular assembly line
The domains have specific tasks, but how do they work?
Protein Antibiotic
SLIDE 11
One protein function: a molecular assembly line
Solving the structure for a domain shows us how it works.
Protein Antibiotic
SLIDE 12
One protein function: a molecular assembly line
If we modify a domain, the protein could perform a new function.
Protein Antibiotic
SLIDE 13 Protein redesign relies on protein structure
Cheng-Yu Chen, Ivelin Georgiev, Amy C. Anderson, Bruce R. Donald, Computational structure-based redesign of enzyme activity, PNAS 2009
Switched specificity from Phenylalanine to Leucine!
Perhaps we can engineer molecules to make new drugs?
Predicted binding model for Leucine
Redesign!
SLIDE 14
Experimental methods
For atomic-precision protein structure determination
Nuclear Magnetic Resonance spectroscopy (NMR)
[Figure: Duke NMR Center]
X-ray Crystallography
[Figure: Crystals of Insulin]
NMR is often preferred because measurements are made in solution.
SLIDE 15 How are structures traditionally solved by NMR?
- 13Cα chemical shifts
- 13Cβ chemical shifts
- 13CO chemical shifts
- 1Hα chemical shifts
- 1HN chemical shifts
- 3JHNHα couplings
- 1H-15N RDCs
- 15N R2
- Nitroxide spin-label PREs
- ATCUN PREs
- NOEs
- O2-induced 13C paramagnetic shifts
- Tryptophan indole solvent accessibility
Experimental Restraints
SLIDE 16 Geometric restraints on protein structure
Restraints on distance:
- Nuclear Overhauser Effect (NOE)
- Paramagnetic Relaxation Enhancement (PRE)
- 13Cα chemical shifts
- 13Cβ chemical shifts
- 13CO chemical shifts
- 1Hα chemical shifts
- 1HN chemical shifts
- 3JHNHα couplings
- 1H-15N RDCs
- 15N R2
- Nitroxide spin-label PREs
- ATCUN PREs
- NOEs
- O2-induced 13C paramagnetic shifts
- Tryptophan indole solvent accessibility
Experimental Restraints
SLIDE 17 Geometric restraints on protein structure
Restraints on orientation:
- Residual Dipolar Couplings (RDCs)
- 13Cα chemical shifts
- 13Cβ chemical shifts
- 13CO chemical shifts
- 1Hα chemical shifts
- 1HN chemical shifts
- 3JHNHα couplings
- 1H-15N RDCs
- 15N R2
- Nitroxide spin-label PREs
- ATCUN PREs
- NOEs
- O2-induced 13C paramagnetic shifts
- Tryptophan indole solvent accessibility
Experimental Restraints
SLIDE 18 Geometric restraints on protein structure
Restraints on orientation:
- Residual Dipolar Couplings (RDCs)
- 13Cα chemical shifts
- 13Cβ chemical shifts
- 13CO chemical shifts
- 1Hα chemical shifts
- 1HN chemical shifts
- 3JHNHα couplings
- 1H-15N RDCs
- 15N R2
- Nitroxide spin-label PREs
- ATCUN PREs
- NOEs
- O2-induced 13C paramagnetic shifts
- Tryptophan indole solvent accessibility
Experimental Restraints
SLIDE 19 How are structures traditionally solved by NMR?
- 13Cα chemical shifts
- 13Cβ chemical shifts
- 13CO chemical shifts
- 1Hα chemical shifts
- 1HN chemical shifts
- 3JHNHα couplings
- 1H-15N RDCs
- 15N R2
- Nitroxide spin-label PREs
- ATCUN PREs
- NOEs
- O2-induced 13C paramagnetic shifts
- Tryptophan indole solvent accessibility
Experimental Restraints
Simulated Annealing: molecular dynamics simulation and energy minimization
SLIDE 20
Structure determination of protein complexes
traditionally uses simulated annealing as well
SLIDE 21
Simulated annealing is based on heuristics
Stochastic search
SLIDE 22
Simulated annealing is based on heuristics
Stochastic search
Simulation & Minimization
SLIDE 23
Simulated annealing is based on heuristics
Stochastic search
Simulation & Minimization
Convergence not guaranteed
SLIDE 24
Protein complexes are composed of subunits
Homodimers have 2 identical subunits
Homo-oligomers have n identical subunits
SLIDE 25
Structure determination of protein complexes
using divide and conquer instead
SLIDE 26
Divide and conquer
de novo structure determination
SLIDE 27
Divide and conquer
de novo structure determination
Oligomeric assembly
SLIDE 28
Related work in structure determination by solution NMR
Chris Bailey-Kellogg, et al., 2000; Wang and Donald, 2004; Wang, Mettu, and Donald, 2006; Zeng, Tripathy, Zhou, and Donald, 2008; Zeng, et al., 2009; Zeng, Zhou, and Donald, 2011; Zeng, Roberts, Zhou, Donald, 2011; Tripathy, Zeng, Zhou and Donald, 2011; Nilges, 1993; Nilges, 1995; Nilges, et al., 1997; Meiler, et al., 2000; Fowler, et al., 2000; Tian, Valafar, and Prestegard, 2001; Herrmann, Güntert, and Wüthrich, 2002; Wedemeyer, Rohl, and Scheraga, 2002; Rieping, et al., 2007; Bardiaux, et al., 2009
Heuristic
de novo determination
Provable
Potluri, et al., 2006; Potluri, et al., 2007; Martin, Yan, Bailey-Kellogg, Zhou, and Donald, Protein Science, 2011; Martin, Yan, Bailey-Kellogg, Zhou, and Donald, J Comp Bio, 2011
Oligomeric assembly
Wang, Lozano-Pérez, and Tidor, 1998; Wang, Bansal, Jiang, and Prestegard, 2008
* polynomial time algorithms
SLIDE 29 Full citations for Donald lab work
Bailey-Kellogg C, Widge A, Kelley JJ, Berardi MJ, Bushweller JH, Donald BR. The NOESY jigsaw: automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data. J Comput Biol. 2000;7(3-4):537-58. PMID: 11108478.
Martin JW, Yan AK, Bailey-Kellogg C, Zhou P, Donald BR. A geometric arrangement algorithm for structure determination of symmetric protein homo-oligomers from NOEs and RDCs. J Comput Biol. 2011 Nov;18(11):1507-23. PMID: 22035328.
Martin JW, Yan AK, Bailey-Kellogg C, Zhou P, Donald BR. A graphical method for analyzing distance restraints using residual dipolar couplings for structure determination of symmetric protein homo-oligomers. Protein Sci. 2011 Jun;20(6):970-85. doi: 10.1002/pro.620. PMID: 21413097.
Donald BR, Martin J. Automated NMR Assignment and Protein Structure Determination using Sparse Dipolar Coupling Constraints. Prog Nucl Magn Reson Spectrosc. 2009 Aug 1;55(2):101-127. PMID: 20160991.
Wang L, Donald BR. Exact solutions for internuclear vectors and backbone dihedral angles from NH residual dipolar couplings in two media, and their application in a systematic search algorithm for determining protein backbone structure. J Biomol NMR. 2004 Jul;29(3):223-42. PMID: 15213422.
Wang L, Mettu RR, Donald BR. A polynomial-time algorithm for de novo protein backbone structure determination from nuclear magnetic resonance data. J Comput Biol. 2006 Sep;13(7):1267-88. PMID: 17037958.
Zeng J, Tripathy C, Zhou P, Donald BR. A Hausdorff-based NOE assignment algorithm using protein backbone determined from residual dipolar couplings and rotamer patterns. Comput Syst Bioinformatics Conf. 2008;7:169-81. PMID: 19642278.
Zeng J, Boyles J, Tripathy C, Wang L, Yan A, Zhou P, Donald BR. High-resolution protein structure determination starting with a global fold calculated from exact solutions to the RDC equations. J Biomol NMR. 2009 Nov;45(3):265-81. Epub 2009 Aug 27. PMID: 19711185.
Zeng J, Zhou P, Donald BR. Protein side-chain resonance assignment and NOE assignment using RDC-defined backbones without TOCSY data. J Biomol NMR. 2011 Aug;50(4):371-95. Epub 2011 Jun 25. PMID: 21706248.
Zeng J, Roberts KE, Zhou P, Donald BR. A Bayesian approach for determining protein side-chain rotamer conformations using unassigned NOE data. J Comput Biol. 2011 Nov;18(11):1661-79. Epub 2011 Oct 4. PMID: 21970619.
Tripathy C, Zeng J, Zhou P, Donald BR. Protein loop closure using orientational restraints from NMR data. Proteins. 2011 Sep 26. doi: 10.1002/prot.23207. PMID: 22161780.
Potluri S, Yan AK, Chou JJ, Donald BR, Bailey-Kellogg C. Structure determination of symmetric homo-oligomers by a complete search of symmetry configuration space, using NMR restraints and van der Waals packing. Proteins. 2006 Oct 1;65(1):203-19. PMID: 16897780.
Potluri S, Yan AK, Donald BR, Bailey-Kellogg C. A complete algorithm to resolve ambiguity for intersubunit NOE assignment in structure determination of symmetric homo-oligomers. Protein Sci. 2007 Jan;16(1):69-81. PMID: 17192589.
SLIDE 30 Protein complexes tend to be symmetric
50–70% of known complexes are symmetric homo-oligomers
Levy, et al. Assembly reflects evolution of protein complexes. Nature, 453(7199):1262–1265, June 2008.
SLIDE 31 Protein complexes tend to be symmetric
50–70% of known complexes are symmetric homo-oligomers
Levy, et al. Assembly reflects evolution of protein complexes. Nature, 453(7199):1262–1265, June 2008.
SLIDE 32
Protein oligomers with cyclic symmetry
cyclic symmetry (C5) (C3)
SLIDE 33 Benefits of symmetry
cyclic symmetry (C5) (C3)
If the subunit structure is known, the oligomer structure is completely specified by the symmetry axis
Just need the axis orientation and position relative to the subunit
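The claim above can be illustrated with a minimal sketch: given one subunit's coordinates, an axis direction, and a point on the axis, the other subunits of a C_n oligomer follow by rotating through multiples of 2π/n. The function names, argument names, and toy coordinates are hypothetical and are not taken from DISCO.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues rotation matrix for `angle` radians about the direction `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    # Cross-product (skew-symmetric) matrix of the unit axis.
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def assemble_cyclic_oligomer(subunit, axis_dir, axis_point, n):
    """Return n copies of `subunit` (N x 3 array) related by C_n symmetry."""
    subunit = np.asarray(subunit, dtype=float)
    axis_point = np.asarray(axis_point, dtype=float)
    copies = []
    for k in range(n):
        R = rotation_about_axis(axis_dir, 2.0 * np.pi * k / n)
        # Rotate about the axis line: shift to the axis, rotate, shift back.
        copies.append((subunit - axis_point) @ R.T + axis_point)
    return copies

# Toy two-atom "subunit" and a z-aligned axis through the origin (made-up values).
subunit = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 1.0]])
trimer = assemble_cyclic_oligomer(subunit, axis_dir=[0, 0, 1], axis_point=[0, 0, 0], n=3)
```

The only unknowns in this sketch are `axis_dir` and `axis_point`, which is why the remaining slides concentrate on computing those two quantities.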
SLIDE 34
Oligomeric assembly using DISCO
- Assemble oligomer using the symmetry
- Compute symmetry axis orientation
- Compute symmetry axis position
SLIDE 35
Oligomeric assembly using DISCO
- Assemble oligomer using the symmetry
- Compute symmetry axis orientation
- Compute symmetry axis position
SLIDE 36 Geometric restraints on protein structure
Restraints on orientation:
- Residual Dipolar Couplings (RDCs)
Principal order frame
SLIDE 37 Geometric restraints on protein structure
Restraints on orientation:
- Residual Dipolar Couplings (RDCs)
Principal order frame
SLIDE 38
Geometry of RDC curves
RDC measurement (scalar)
Bond orientation (vector)
Alignment tensor (matrix)
SLIDE 39
Geometry of RDC curves
RDC measurement (scalar)
Bond orientation (vector)
Alignment tensor (matrix)
Rotation
Scaling
SLIDE 40
Geometry of RDC curves
RDC measurement (scalar)
Bond orientation (vector)
Alignment tensor (matrix)
Rotation
Scaling
After rotation:
SLIDE 41
Geometry of RDC curves
RDC measurement (scalar)
Bond orientation (vector)
Alignment tensor (matrix)
Rotation
Scaling
After rotation:
SLIDE 42
Geometry of RDC curves
RDC measurement (scalar)
Bond orientation (vector)
Alignment tensor (matrix)
Rotation
Scaling
After rotation: Quadric surface!
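In the standard formulation, which the labels on these slides appear to follow, an RDC is a quadratic form of the unit bond vector in the alignment tensor. The symbols r, v, S, D_max, R, and Λ below are notation assumed for this note and are not taken from the slides. Diagonalizing S (rotation R, scaling Λ) and writing the bond vector in the rotated frame gives a quadric in (x, y, z):

```latex
\begin{aligned}
r &= D_{\max}\,\mathbf{v}^{\mathsf T} S\,\mathbf{v}, \qquad \lVert \mathbf{v} \rVert = 1,\\
S &= R\,\Lambda\,R^{\mathsf T}, \qquad \Lambda = \operatorname{diag}(S_{xx}, S_{yy}, S_{zz}),\\
(x, y, z)^{\mathsf T} &= R^{\mathsf T}\mathbf{v}
  \;\Longrightarrow\;
  r = D_{\max}\left(S_{xx}x^{2} + S_{yy}y^{2} + S_{zz}z^{2}\right).
\end{aligned}
```

Intersecting this quadric with the unit sphere x^2 + y^2 + z^2 = 1 leaves a curve of bond orientations consistent with one measured RDC.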
SLIDE 43
Solving for symmetry axis orientation
Molecular frame
Principal order frame (defined by )
SLIDE 44
Solving for symmetry axis orientation
Molecular frame
is unknown
Principal order frame (defined by )
SLIDE 45
Solving for symmetry axis orientation
Molecular frame
is unknown
Principal order frame (defined by )
Can solve for (and ) with at least 5 RDCs
SLIDE 46
Solving for symmetry axis orientation
Molecular frame
Principal order frame
z-axis of alignment tensor is parallel to the symmetry axis
is unknown
Can solve for (and ) with at least 5 RDCs
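A symmetric, traceless 3x3 alignment tensor has five independent entries, which is why at least five RDCs are needed. The sketch below is a common least-squares approach, assumed here for illustration rather than taken from DISCO: fit the five entries from known bond vectors, then diagonalize to obtain the principal order frame. All names, the unit `d_max`, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_alignment_tensor(bond_vectors, rdcs, d_max=1.0):
    """Least-squares fit of a symmetric, traceless tensor S from
    r = d_max * v^T S v, using at least 5 RDCs."""
    v = np.asarray(bond_vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    # Unknowns: Sxx, Syy, Sxy, Sxz, Syz (Szz = -Sxx - Syy by tracelessness).
    A = np.column_stack([x * x - z * z, y * y - z * z,
                         2 * x * y, 2 * x * z, 2 * y * z])
    b = np.asarray(rdcs, dtype=float) / d_max
    sxx, syy, sxy, sxz, syz = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, -sxx - syy]])

def principal_order_frame(S):
    """Eigen-decompose S; columns are ordered so that |Sxx| <= |Syy| <= |Szz|."""
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(np.abs(eigenvalues))
    return eigenvalues[order], eigenvectors[:, order]

# Synthetic check with a made-up tensor and 8 random bond vectors.
rng = np.random.default_rng(0)
bonds = rng.normal(size=(8, 3))
S_true = np.array([[1.0, 0.5, 0.2],
                   [0.5, -3.0, 0.1],
                   [0.2, 0.1, 2.0]]) * 1e-4
unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
rdcs = np.einsum("ni,ij,nj->n", unit, S_true, unit)

S_fit = fit_alignment_tensor(bonds, rdcs)
eigenvalues, frame = principal_order_frame(S_fit)
z_axis = frame[:, 2]  # candidate symmetry axis direction, per the slide above
```

In this sketch the last column of `frame` plays the role of the z-axis of the alignment tensor, which the slide identifies with the symmetry axis direction.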
SLIDE 47
Oligomeric assembly using DISCO
- Assemble oligomer using the symmetry
- Compute symmetry axis orientation
- Compute symmetry axis position
SLIDE 48 Geometric restraints on protein structure
Restraints on distance:
- Nuclear Overhauser Effect (NOE)
- Paramagnetic Relaxation Enhancement (PRE)
SLIDE 49
Solving for the symmetry axis position
Consider the geometry of an inter-subunit distance restraint:
SLIDE 50
Solving for the symmetry axis position
Consider the geometry of an inter-subunit distance restraint:
SLIDE 51
Solving for the symmetry axis position
Consider the geometry of an inter-subunit distance restraint:
SLIDE 52
Analytical solution for the green annulus
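One way to derive such an annulus, offered as a sketch under stated assumptions rather than as the formula on the slide: assume coordinates are rotated so the symmetry axis is parallel to z, and let t be the unknown 2D point where the axis crosses the xy-plane. For atoms p and q in the same subunit, the copy of q in the k-th neighboring subunit is q rotated by θ = 2πk/n about the axis. Because |(I - R)x| = 2|sin(θ/2)||x| for a 2D rotation R, the bound d_min ≤ |p - q'| ≤ d_max becomes an annulus in t. All function and variable names are hypothetical.

```python
import math
import numpy as np

def rot2(theta):
    """2D rotation matrix."""
    return np.array([[math.cos(theta), -math.sin(theta)],
                     [math.sin(theta),  math.cos(theta)]])

def symmetric_copy(q, t, k, n):
    """Rotate atom q by 2*pi*k/n about a z-parallel axis through the 2D point t."""
    q = np.asarray(q, dtype=float)
    t = np.asarray(t, dtype=float)
    xy = rot2(2.0 * math.pi * k / n) @ (q[:2] - t) + t
    return np.array([xy[0], xy[1], q[2]])

def restraint_annulus(p, q, k, n, d_min, d_max):
    """Axis positions t with d_min <= |p - symmetric_copy(q, t, k, n)| <= d_max.
    Returns (center, r_inner, r_outer), or None if no position can satisfy it."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    theta = 2.0 * math.pi * k / n
    R = rot2(theta)
    dz = p[2] - q[2]              # unchanged by rotation about z
    if d_max < abs(dz):
        return None
    # p_xy - q'_xy = (I - R)(center - t), with (I - R) center = p_xy - R q_xy.
    center = np.linalg.solve(np.eye(2) - R, p[:2] - R @ q[:2])
    scale = 2.0 * abs(math.sin(theta / 2.0))
    r_outer = math.sqrt(d_max ** 2 - dz ** 2) / scale
    r_inner = math.sqrt(max(d_min ** 2 - dz ** 2, 0.0)) / scale
    return center, r_inner, r_outer

# Made-up atoms in a trimer (n = 3), restrained to within 5 distance units.
p = np.array([3.0, 1.0, 0.5])
q = np.array([-1.0, 2.0, -0.5])
center, r_inner, r_outer = restraint_annulus(p, q, k=1, n=3, d_min=0.0, d_max=5.0)

# An axis placed on the outer circle puts the two atoms exactly d_max apart.
t_on_boundary = center + np.array([r_outer, 0.0])
boundary_distance = np.linalg.norm(p - symmetric_copy(q, t_on_boundary, 1, 3))
```

The closed form avoids any search: each distance restraint with a fixed subunit assignment maps directly to a center and two radii.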
SLIDE 53
Symmetry causes uncertainty
For homo-trimers and higher, subunit assignments for distance restraints are not known
Subunit ambiguity
SLIDE 54
Symmetry causes uncertainty
For homo-trimers and higher, subunit assignments for distance restraints are not known
Subunit ambiguity
SLIDE 55
Encoding subunit ambiguity
Each assignment generates a constraint annulus
Annuli are combined using set union
SLIDE 56
Analysis of multiple distance restraints
DISCO computes the Maximally Satisfying Regions by analyzing the arrangement of the unions of annuli
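To make the set-union encoding concrete, the sketch below counts how many restraints a single candidate axis position satisfies, where a restraint counts as satisfied if the point lies in any of its annuli. This is a pointwise check for illustration only; DISCO analyzes the exact arrangement instead of testing individual points. The data layout and names are hypothetical.

```python
import math

def in_annulus(t, annulus):
    """True if the 2D point t lies in the annulus ((cx, cy), r_inner, r_outer)."""
    (cx, cy), r_inner, r_outer = annulus
    dist = math.hypot(t[0] - cx, t[1] - cy)
    return r_inner <= dist <= r_outer

def restraint_satisfied(t, annuli):
    """A restraint is a union of annuli, one per possible subunit assignment."""
    return any(in_annulus(t, a) for a in annuli)

def depth(t, restraints):
    """Number of restraints satisfied at the point t."""
    return sum(restraint_satisfied(t, annuli) for annuli in restraints)

# Two made-up restraints: the first has two possible assignments, the second one.
restraints = [
    [((0.0, 0.0), 1.0, 2.0), ((5.0, 0.0), 0.0, 1.0)],
    [((1.0, 0.0), 0.0, 2.0)],
]

depth_a = depth((1.5, 0.0), restraints)    # inside both unions
depth_b = depth((5.0, 0.0), restraints)    # inside the first union only
depth_c = depth((10.0, 10.0), restraints)  # outside everything
```

Points whose depth equals the largest value attained anywhere in the plane are the ones that belong to a maximally satisfying region.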
SLIDE 57
Efficient computation of the MSRs
Compute the arrangement from the circular curves bounding the annuli in CGAL
SLIDE 58
Efficient computation of the MSRs
Compute face depths using BFS
Search dual graph
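The BFS step can be sketched on a hand-built dual graph. The representation is an assumption made for illustration: each face maps to a list of (neighbor, delta) pairs, where delta is the change in depth when crossing the shared boundary curve. It does not use CGAL and is not DISCO's data structure.

```python
from collections import deque

def face_depths(dual_graph, start_face, start_depth=0):
    """Breadth-first search over the dual graph, propagating depth across edges."""
    depths = {start_face: start_depth}
    queue = deque([start_face])
    while queue:
        face = queue.popleft()
        for neighbor, delta in dual_graph[face]:
            if neighbor not in depths:
                depths[neighbor] = depths[face] + delta
                queue.append(neighbor)
    return depths

def maximally_satisfying_faces(depths):
    """Faces whose depth equals the maximum depth found."""
    best = max(depths.values())
    return sorted(face for face, d in depths.items() if d == best)

# Toy arrangement: two overlapping disks A and B produce four faces.
dual_graph = {
    "outside": [("A_only", +1), ("B_only", +1)],
    "A_only":  [("outside", -1), ("A_and_B", +1)],
    "B_only":  [("outside", -1), ("A_and_B", +1)],
    "A_and_B": [("A_only", -1), ("B_only", -1)],
}

depths = face_depths(dual_graph, "outside")
best_faces = maximally_satisfying_faces(depths)
```

Each face and each dual edge is visited a constant number of times, so the search cost is linear in the size of the dual graph.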
SLIDE 59
Efficient computation of the MSRs
SLIDE 60
Efficient computation of the MSRs
SLIDE 61
Efficient computation of the MSRs
SLIDE 62
Efficient computation of the MSRs
SLIDE 63
Efficient computation of the MSRs
SLIDE 64
Polynomial time computation of the MSRs
Let n be the number of distance restraints
Let a be the max number of assignments per distance restraint
There are O(an) curves in the arrangement
Search dual graph
SLIDE 65
Polynomial time computation of the MSRs
Let n be the number of distance restraints
Let a be the max number of assignments per distance restraint
There are O(an) curves in the arrangement
O(a^2 n^2) faces, O(a^2 n^2) edges
Search dual graph
Expected O(a^2 n^2) time
SLIDE 66
Polynomial time computation of the MSRs
Let n be the number of distance restraints
Let a be the max number of assignments per distance restraint
There are O(an) curves in the arrangement
O(a^2 n^2) faces, O(a^2 n^2) edges
O(a^2 n^2) nodes, O(a^2 n^2) edges
Search dual graph
Expected O(a^2 n^2) time
SLIDE 67
Polynomial time computation of the MSRs
Let n be the number of distance restraints
Let a be the max number of assignments per distance restraint
There are O(an) curves in the arrangement
O(a^2 n^2) faces, O(a^2 n^2) edges
O(a^2 n^2) nodes, O(a^2 n^2) edges
Compute face depths using BFS in expected O(a^2 n^2) time
Search dual graph
Expected O(a^2 n^2) time
SLIDE 68
Polynomial time computation of the MSRs
Let n be the number of distance restraints
Let a be bounded by a small constant
There are O(n) curves in the arrangement
O(n^2) faces, O(n^2) edges
O(n^2) nodes, O(n^2) edges
Compute face depths using BFS in expected O(n^2) time
Search dual graph
Expected O(n^2) time
SLIDE 69
Oligomeric assembly using DISCO
- Assemble oligomer using the symmetry
- Compute symmetry axis orientation
- Compute symmetry axis position
SLIDE 70 Oligomeric assembly using DISCO
Compute symmetry axis position
DISCO guarantees:
- Computes all satisfying symmetry axis positions
- Runs in polynomial time
SLIDE 71 Validation using experimental data
DAGK
- Membrane protein
- Homotrimer
- 121x3 residues
- 67 orientational restraints
- 23 distance restraints
Van Horn et al., 2009
Ground Truth aka Reference Structure
PDB: 2KDC Model 1
SLIDE 72
Symmetry axis orientations computed by DISCO
Using 67 restraints on orientation
SLIDE 73
Symmetry axis positions computed by DISCO
Using 23 uncertain distance restraints
SLIDE 74
Individual structure evaluation
SLIDE 75
Ensemble of structures computed by DISCO
Reference DISCO
All structures within 1.5 Å backbone RMSD
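For context on the RMSD figure, the sketch below computes the RMSD between two matched coordinate sets after optimal rigid superposition, using the Kabsch algorithm. The slides do not say how the backbone RMSD was computed, so this method is an assumption; the function name and the toy coordinates are hypothetical.

```python
import numpy as np

def superposed_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P0 = P - P.mean(axis=0)       # remove translation
    Q0 = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy check: a rotated and translated copy superposes exactly.
P = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])   # 90 degrees about z
Q_same = P @ Rz.T + np.array([3.0, -2.0, 5.0])
rmsd_same = superposed_rmsd(P, Q_same)

# Moving one atom leaves a nonzero residual.
Q_moved = P.copy()
Q_moved[3] = [0.0, 0.0, 2.0]
rmsd_moved = superposed_rmsd(P, Q_moved)
```

For a backbone RMSD, `P` and `Q` would hold only the matched backbone atoms of the two structures being compared.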
SLIDE 76 Acknowledgements
We are very grateful for funding from:
The National Institutes of Health
DISCO is open source software