SLIDE 1 The computational Core of Molecular Modeling. (The What? Why? And How? ) Alexey Onufriev, Dept of Computer Science;
- Dept. of Physics and the GBCB program.
CS Seminar. Feb 09.
SLIDE 2 Thanks to:
Faculty co-authors: T.M. Murali, M. Prisant, L. Heath, C. Simmerling Student co-authors:
- J. Gordon, J. Myers, A. Fenley, J. Ruscio,
- R. Anandakrishnan, D. Kumar, M. Shukla, V. Sojia
Sponsors: NIH, VT. Support: System-X team,
- N. Polys (myoglobin graphics)
SLIDE 3
- a. Rational Drug Design
- b. From atomic motion to biological function
- c. The “grand challenge” of computational
science: the Protein Folding Problem.
Will focus on the computational side
- f solving problems in the following areas:
SLIDE 4
in vivo in vitro in virtuo The emergence of “in virtuo” Science.
SLIDE 5 Biological function = f( 3D molecular structure )
…A T G C … DNA sequence Protein structure Bilogical function
Key challenges: Biomolecular structures are complex (e.g. compared to crystal solids). Biology works on many time scales. Experiments can only go so far.
A solution: Computational methods. Model movements
How can we predict/understand/modify function if we know the structure? Can we predict the structure?
SLIDE 6 “All things are made of atoms, and everything that living things do can be understood in terms of the wigglings and jigglings of atoms”
Suggests the approach: model what nature does, i.e. let the molecule evolve with time according to underlying physics laws.
The paradigm:
SLIDE 7 A protein
surface. Atomic resolution
SLIDE 8 Theme 1. Rational (structure- based) design of new medicines:
Picture: Design of a HIV protease inhibitor. Hornak, V.; Okur, A., Rizzo, R. and Simmerling, C.,
“HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations”,
- Proc. Nat. Acad. Sci. USA, 103:915-920 (2006)
SLIDE 9
Example: rational drug design.
Drug agent e.g:viral protease (chops up proteins) If you block the enzymes function – you kill the virus.
SLIDE 10 Example of successful computer-aided (rational) drug design: One of the drugs that helped slow down the AIDS epidemic
(part of anti-retro viral cocktail).
The drug blocks the function of a key viral protein. To design the drug, one needs a precise 3D structure of that protein.
SLIDE 11 A computational challenge:
Need high quality, complete protein
- structures. But the experiment (X-
ray) does not ``see” hydrogen atoms. These have to be placed computationally.
SLIDE 12 +
X1 X2 X3
+ +
W23 ΔGk(pH) = Σ
xi k (kT ln10 pH - Δ Ei calc) + 1/2ΣΣ
ΣΣxi
kxj k Wij
N N N
i i j
Combinatorial explosion problem: A molecule with N sites, each of which can be occupied by an H+ atom (or be empty), has 2N possible states.
All of the possible charge (protonation) states must be taken into
- account. For a typical protein with 100
ionizable amino-acids this means 2100 ~ 1030 possible variants to consider! And we need to find the minimum energy state among them (for the experts: what we really
need is, of course, the partition function Z, see below. Which is an even more computationally demanding job)
Xi = 1 or 0 (occupied or empty in state k) k – protonation state (out of 2N ) Free energy Matrix of site-site interactions.
Z = Σall states exp(-ΔGk/kT)
trouble
SLIDE 13 Beating the combinatorics: the clustering approach. Total number of protonation states = 2N = 64 Cluster 1, N=3 Cluster 2, N=3 Solution: neglect the ``weak” edges. Cluster the strong ones. After clustering: Total number of states = 23 + 23 = 16 < 64. Example: a protein with N = 6 ionizable groups.
Every site interacts with every other site, a complete graph
SLIDE 14 http://biophysics.cs.vt.edu/H++
A web-server that adds hydrogens to molecular
- structures. Launched by Onufriev’s group in
June 2005. ~1000 registered users since.
SLIDE 15
Intrigued?
Suggested reading: “The Many roles of computation in drug discovery”, W. Jorgensen, Sceince 303, 1813 (2004). + references therein.
SLIDE 16 THEME II The protein folding challenge.
MET—ALA—ALA—ASP—GLU—GLU--….
Experiment: amino acid sequence uniquely determines proteinʼs 3D shape (ground state).
Nature does it all the time. Can we?
Amino-acid sequence – translated genetic code.
How?
Why bother: proteinʼs shape determines its biological function.
SLIDE 17
Protein Structure in 3 steps. Amino-acid #1 Amino-acid #2 Peptide bond Step 1. Two amino-acids together (di-peptide)
SLIDE 18
Step 2: Most flexible degrees of freedom:
Protein Structure in 3 steps.
SLIDE 19 φ1 φ2 φ3 φ4 A protein is simply a chain of amino-acids: Each configuration {Φ1, Φ2, …Φκ} has some
- energy. The folded (biologically functional) protein
has the lowest possible energy - global minimum. So just find this conformation by some kind of a minimization algorithm… what’s he big deal?
SLIDE 20 The magnitude of the protein folding challenge:
φ1 φ2 φ3 φ4
A typical protein is a chain of ~ 100 mino acids. Assume that each amino acid can take up only 10 conformations (vast underestimation) Total number of possible conformations: 10100 Suppose each energy estimate is just 1 float point operation. Suppose you have a Penta-Flop supercomputer.
An exhaustive search for the global minimum would take 1085 seconds ~ 3*1078 years. Age of the Universe ~ 2*1010 years.
Enormous number of the possible conformations of the polypeptide chain
SLIDE 21 Finding a global minimum in a multidimensional case is easy only when the landscape is smooth. No matter where you start (1, 2 or 3), you quickly end up at the bottom -
- the Native (N), functional state of
the protein.
Free energy Folding coordinate
1 2 3
Adopted from Ken Dill’s web site at UCSF
SLIDE 22 Realistic landscapes are much more complex, with multiple local minima – folding traps. Proteins “trapped” in those minima may lead to disease, such as Altzheimer’s
Adopted from Ken Dill’s web site at UCSF
SLIDE 23 Adopted from Dobson, NATURE 426, 884 2003
SLIDE 24 Since minimization won’t work, choose an alternative.
Do what Nature does: just let it fold
- n its own, at normal temperature.
Method: Molecular Dynamics
SLIDE 25 Each atom moves by Newton’s 2nd Law: F = ma
E =
+
x
Y Principles of Molecular Dynamics (MD): F = dE/dr
System’s energy
Kr2
Bond stretching + A/r12 – B/r6 VDW interaction
+ Q1Q2/r
Electrostatic forces
Bond
spring
SLIDE 26 Can compute statistical averages, fluctuations; Analyze side chain movements, Cavity dynamics, Domain motion, Etc. Now we have positions of all atoms as a function of time.
SLIDE 27 PRICIPLE: Given positions of each atom x(t) at time t, its position at next time-step t + Δt is given by: x(t + Δt) x(t) + v(t) Δt + ½*F/m * (Δt)2 Key parameter: integration time step Δt. Controls accuracy and speed
- f numerical integration routines.
Molecular Dynamics: Smaller Δt – more accurate, but need more steps. How many needed to simulate biology? How many can one afford? force
SLIDE 28 As a result, we can not quite get into the “biological” time scales.
Characteristic time scales [sec] 10-14 10-6 100 H-C bond vibration biology
For stability, Δt must be at least an order of magnitude less than the fastest motion, i.e Δt ~ 10-15 s. Example: to simulate folding of the fastest folding protein, at least 10-6/10-15 = 109 steps will be needed .
Protein folding Currently accessible times Time-step, Δt
SLIDE 29 The bottleneck of the methodology: computation of long-range interactions.
Electrostatic interactions fall of as inverse distance between atoms. Too strong to neglect. Need to account for all of them. Very expensive. Up to 99%
- f total cost for a protein.
SLIDE 30
Virginia Tech’s supercomputer, System-X Massive parallel machines help.
SLIDE 31 Processor #1 Processor #2 Processor #3 Processor #4 The “worst” problem for parallel computations: X1, F(on X1) X2, F(on X2) Force acting
depends upon positions of every other atom in the system. Computed coordinates have to be communicated between all processors at each step
SLIDE 32 Without approximations, computation of long-range electrostatic forces will cost you O(N2), where the number of atoms N may be as large as N ~ 106. Too expensive for large systems. After coarse-graining costs can be as low as Nlog(N) Works because macromolecules are naturally partitioned into hierarchical levels: atoms -> amino-acids -> proteins -> complexes..
Every atom interacts with every other atom, a complete graph
+3 A solution: combine charges (vertices) into groups, that is use coarse-graining.
SLIDE 33
SLIDE 34
SLIDE 35 Simulated Refolding pathway
- f the 46-residue protein. Molecular
dynamics based on AMBER-7
NB: due to the absence of viscosity, folding occurs on much shorter time-scale than in an experiment.
Movie available at: www.scripps.edu/~onufriev/RESEARCH/in_virtuo.html
1 3 5 0 1 2 3 4 5 6
SLIDE 36 Intrigued?
Suggested articles: 1.“Protein Folding and Misfolding”, C. Dobson, Nature 426, 884 (2003).
- 2. “Design of a Novel Globular Protein Fold
with Atomic-level Accuracy, Kuhlman et al. , Science, 302, 1364, (2003) + references therein.
SLIDE 37
Trivia: it is myoglobin (with oxygen bound to it) that gives fresh meat its red color. The protein stores oxygen.
Theme III: how does oxygen get inside the protein myoglobin?
SLIDE 38 Myoglobin – the “hydrogen atom” of structural biology
Exactly what are the migration pathways of small non-polar ligands (O2 , CO, NO) inside the protein? A 50 year old question.
O2
SLIDE 39
The heme group – the heart of myoglobin.
O2 |
SLIDE 40 Myoglobin: protein responsible for oxygen storage in all breathing organisms.
O2
Fe
?
How does Oxygen,
ligands get in to bind to Fe atom in its core?
SLIDE 41
After 50 years of research many questions still remain. Example: one or many pathways? Single dominant pathway, “HIS gate” e.g. Olson et al. Multiple pathays, e.g. Elber and Karplus
SLIDE 42 Two complementary approaches:
- #1 Simulate explicit ligand (CO)
trajectories via microseconds of room temperature, explicit solvent MD.
- #2 Compute dynamic volume fluctuations
in the protein.
SLIDE 43 Approach #1: Straightforward MD
(explicit solvent, PME, NPT, 300K, 7 µs total)
1. Simulation type #1. Model photo- detachement of carbon monoxyde (CO)
- ligand. 20 trajectories , 90 ns each. The
ligand starts at the docking site (Fe) and comes out, sometimes.
- 2. Simulation type #2. Model diffusion of the
ligand from the outside. The ligand starts in the solvent, and sometimes diffuses all the way to the docking site inside. 48 trajectories, each 90 ns long.
SLIDE 44 Raw Result #1: a sample of simulated trajectories that show ligand migration paths Photodetachement
Initially at the binding site
CO initially in the water
multiple individual trajectories combined in
SLIDE 45
Summary of the MD trajectories:
Concern: Have we explored ALL the pathways? Challenge: can we get the same pathways cheaper? Pathway #1 Pathway #2
SLIDE 46 pathway
Approach #2. PATHFINDER – dynamic free volume analysis. Run a separate, much shorter (15 ns) MD in which the ligand does not move. In each snapshot, identify cavities large enough to hold the ligand. Purely steric
- criterion. Combine the cavities over the entire trajectory, determine connectivity.
timeline
SLIDE 47
The challenge: protein shape is very complex, internal cavities are hiding within, also very complex shape. Connectivity to the outside is a non-trivia issue. In this case, it is critical to have a SIMPLE and robust algorithm, to have just ONE complexity to deal with, not a product of the two: (problem complexity) x (algorithm complexity).
SLIDE 48
Free volume analysis (PATHFINDER)
SLIDE 49
Pathways from volume fluctuations (PATHFINDER):
SLIDE 50 Combining the two approaches:
Explicit trajectories (MD) Free volume fluctuations
- J. Ruscio et al. “Atomic level computational identification of ligand migration
pathways between solvent and binding site in myoglobin, Proc. Natl. Acad. Sci. USA, 105, 9204 (2008)
SLIDE 51
The end result: oxygen diffusion pathways in myoglobin:
The pathways (blue) occur inside the rigid scaffold structures (helices, white).
SLIDE 52
The law of (bio/phys/chem)- computational science:
accuracy speed easy hard genius
SLIDE 53
Lunch time