CS Seminar. Feb 09. Alexey Onufriev, Dept of Computer Science


  1. The computational Core of Molecular Modeling. (The What? Why? And How?) CS Seminar. Feb 09. Alexey Onufriev, Dept of Computer Science; Dept. of Physics and the GBCB program.

  2. Thanks to: Faculty co-authors: T.M. Murali, M. Prisant, L. Heath, C. Simmerling Student co-authors: J. Gordon, J. Myers, A. Fenley, J. Ruscio, R. Anandakrishnan, D. Kumar, M. Shukla, V. Sojia Sponsors: NIH, VT. Support: System-X team, N. Polys (myoglobin graphics)

  3. Will focus on the computational side of solving problems in the following areas: (a) rational drug design; (b) from atomic motion to biological function; (c) the “grand challenge” of computational science: the protein folding problem.

  4. The emergence of “in virtuo” science: in vivo, in vitro, in virtuo.

  5. Biological function = f(3D molecular structure). DNA sequence (…A T G C…) -> protein structure -> biological function. How can we predict/understand/modify function if we know the structure? Can we predict the structure? Key challenges: biomolecular structures are complex (e.g. compared to crystalline solids); biology works on many time scales; experiments can only go so far. A solution: computational methods. Model movements of individual atoms.

  6. The paradigm: “All things are made of atoms, and everything that living things do can be understood in terms of the wigglings and jigglings of atoms” (R. Feynman). Suggests the approach: model what nature does, i.e. let the molecule evolve with time according to the underlying laws of physics.

  7. A protein on a surface. Atomic resolution

  8. Theme 1. Rational (structure-based) design of new medicines. Picture: design of an HIV protease inhibitor. Hornak, V., Okur, A., Rizzo, R. and Simmerling, C., “HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations”, Proc. Natl. Acad. Sci. USA, 103:915-920 (2006).

  9. Example: rational drug design. If you block the enzyme's function, you kill the virus. [Figure labels: drug; agent, e.g. viral protease (chops up proteins).]

  10. Example of successful computer-aided (rational) drug design: one of the drugs that helped slow down the AIDS epidemic (part of the antiretroviral cocktail). The drug blocks the function of a key viral protein. To design the drug, one needs a precise 3D structure of that protein.

  11. A computational challenge: need high quality, complete protein structures. But the experiment (X-ray) does not “see” hydrogen atoms. These have to be placed computationally.

  12. Combinatorial explosion problem: A molecule with N sites, each of which can be occupied by a proton (H+) or be empty, has $2^N$ possible protonation (charge) states. All of them must be taken into account. For a typical protein with 100 ionizable amino acids this means $2^{100} \approx 10^{30}$ possible variants to consider! And we need to find the minimum energy state among them (for the experts: what we really need is, of course, the partition function Z, which is an even more computationally demanding job). Free energy of protonation state k (out of $2^N$): $\Delta G^k(\mathrm{pH}) = \sum_{i}^{N} x_i^k \,(kT \ln 10 \cdot \mathrm{pH} - \Delta E_i^{\mathrm{calc}}) + \frac{1}{2} \sum_{i}^{N} \sum_{j}^{N} x_i^k W_{ij} x_j^k$, where $x_i^k = 1$ or 0 (site i occupied or empty in state k) and $W_{ij}$ is the matrix of site-site interactions; the double sum is where the trouble comes from. Partition function: $Z = \sum_{k} \exp(-\Delta G^k / kT)$, summed over all $2^N$ states.
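
The brute-force approach described on this slide can be sketched in a few lines of Python. This is only an illustration for small N: the function names, the value of kT (about 0.593 kcal/mol near room temperature) and any input numbers are assumptions made for the sketch, not values from the talk.

```python
import itertools
import math

KT = 0.593  # assumed thermal energy in kcal/mol (roughly 298 K)


def state_free_energy(x, pH, dE_calc, W, kT=KT):
    """Free energy of one protonation state x, a tuple of 0/1 occupancies."""
    n = len(x)
    # Single-site term: sum_i x_i * (kT ln10 pH - dE_i)
    single = sum(x[i] * (kT * math.log(10.0) * pH - dE_calc[i]) for i in range(n))
    # Pairwise term: 1/2 sum_i sum_j x_i W_ij x_j
    pair = 0.5 * sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
    return single + pair


def enumerate_states(pH, dE_calc, W, kT=KT):
    """Visit all 2^N states; return (best state, its energy, relative Z, state count)."""
    n = len(dE_calc)
    states = list(itertools.product((0, 1), repeat=n))
    energies = [state_free_energy(x, pH, dE_calc, W, kT) for x in states]
    g_min = min(energies)
    # Z is computed relative to the lowest state to avoid overflow in exp().
    z_rel = sum(math.exp(-(g - g_min) / kT) for g in energies)
    best = states[energies.index(g_min)]
    return best, g_min, z_rel, len(states)
```

The list of states doubles with every added site, so this enumeration is practical only for a few dozen sites at most, which is the point the slide makes.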

  13. Beating the combinatorics: the clustering approach. Example: a protein with N = 6 ionizable groups. Total number of protonation states = $2^N = 64$. Every site interacts with every other site: a complete graph. Solution: neglect the “weak” edges and cluster the strong ones, here into Cluster 1 (N = 3) and Cluster 2 (N = 3). After clustering: total number of states = $2^3 + 2^3 = 16 < 64$.
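
One simple way to realize the idea on this slide is to drop every edge whose interaction is below a cutoff and treat the connected components that remain as clusters. The sketch below assumes exactly that; the threshold rule, the union-find bookkeeping and the function names are illustrative choices, and the clustering actually used by the group may differ.

```python
def cluster_sites(W, threshold):
    """Group sites connected by 'strong' edges, i.e. |W_ij| >= threshold."""
    n = len(W)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(W[i][j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def count_states(clusters):
    """States to enumerate when each cluster is treated independently."""
    return sum(2 ** len(c) for c in clusters)
```

For six sites that split into two strongly coupled triples, `count_states` gives 16, compared with 64 for the full complete graph.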

  14. http://biophysics.cs.vt.edu/H++ A web-server that adds hydrogens to molecular structures. Launched by Onufriev’s group in June 2005. ~1000 registered users since.

  15. Intrigued? Suggested reading: “The Many Roles of Computation in Drug Discovery”, W. Jorgensen, Science 303, 1813 (2004). + references therein.

  16. THEME II. The protein folding challenge. Nature does it all the time. Can we? Amino-acid sequence: translated genetic code. MET-ALA-ALA-ASP-GLU-GLU-… How? Experiment: the amino acid sequence uniquely determines the protein's 3D shape (ground state). Why bother: the protein's shape determines its biological function.

  17. Protein Structure in 3 steps. Step 1. Two amino-acids together (di-peptide) Peptide bond Amino-acid #1 Amino-acid #2

  18. Protein Structure in 3 steps. Step 2: Most flexible degrees of freedom:

  19. A protein is simply a chain of amino-acids. Each configuration $\{\phi_1, \phi_2, \ldots, \phi_k\}$ has some energy. The folded (biologically functional) protein has the lowest possible energy: the global minimum. So just find this conformation by some kind of minimization algorithm… what's the big deal?

  20. The magnitude of the protein folding challenge: an enormous number of possible conformations of the polypeptide chain. A typical protein is a chain of ~100 amino acids. Assume that each amino acid can take up only 10 conformations (a vast underestimation). Total number of possible conformations: $10^{100}$. Suppose each energy estimate is just 1 floating-point operation. Suppose you have a peta-flop ($10^{15}$ operations per second) supercomputer. An exhaustive search for the global minimum would take $10^{85}$ seconds ~ $3 \times 10^{77}$ years. Age of the Universe ~ $1.4 \times 10^{10}$ years.
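
The estimate on this slide is easy to check. The sketch below assumes a machine rate of $10^{15}$ operations per second, which is the rate that reproduces the slide's $10^{85}$ seconds; the function and parameter names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def exhaustive_search_years(n_residues=100, conformations_per_residue=10, flops=1e15):
    """Years needed to evaluate every conformation at one operation each."""
    total_conformations = conformations_per_residue ** n_residues  # exact integer
    seconds = total_conformations / flops
    return seconds / SECONDS_PER_YEAR
```

With the default arguments the result is a little over $3 \times 10^{77}$ years, dozens of orders of magnitude beyond the age of the Universe.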

  21. Finding a global minimum in a multidimensional case is easy only when the landscape is smooth. No matter where you start (1, 2 or 3), you quickly end up at the bottom: the Native (N), functional state of the protein. [Figure: free energy vs. folding coordinate, with starting points 1, 2 and 3.] Adapted from Ken Dill's web site at UCSF.

  22. Realistic landscapes are much more complex, with multiple local minima: folding traps. Proteins “trapped” in those minima may lead to disease, such as Alzheimer's. Adapted from Ken Dill's web site at UCSF.
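
The contrast between the smooth and the rugged landscape can be illustrated with plain gradient descent on a toy one-dimensional energy function. The double-well function below is invented purely for the illustration and has nothing to do with a real protein; it has a global minimum near x = -1.04 and a shallower local minimum near x = 0.96.

```python
def energy(x):
    """Toy double-well 'landscape' (hypothetical, for illustration only)."""
    return x**4 - 2.0 * x**2 + 0.3 * x


def gradient(x):
    """Analytical derivative dE/dx of the toy energy."""
    return 4.0 * x**3 - 4.0 * x + 0.3


def minimize(x0, step=0.01, n_steps=5000):
    """Plain gradient descent: always move downhill from the starting point."""
    x = x0
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x
```

Starting from x = -2 the descent reaches the global minimum, while starting from x = 2 it stops in the local one and never crosses the barrier: a one-dimensional version of a folding trap.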

  23. Adapted from Dobson, Nature 426, 884 (2003).

  24. Since minimization won’t work, choose an alternative. Do what Nature does: just let it fold on its own, at normal temperature. Method: Molecular Dynamics

  25. Principles of Molecular Dynamics (MD): Each atom moves by Newton's 2nd Law: $F = ma$, with $F = -dE/dr$. System's energy: $E = Kr^2 + A/r^{12} - B/r^6 + Q_1 Q_2/r + \ldots$, where $Kr^2$ is bond stretching (the bond as a spring), $A/r^{12} - B/r^6$ is the VDW interaction, and $Q_1 Q_2/r$ gives the electrostatic forces.
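
As a minimal sketch of how the energy terms on this slide translate into forces, the functions below evaluate the three terms for a single pair of atoms at distance r and differentiate them analytically. Lumping all terms onto one pair, the function names and the unit-free parameters are simplifications assumed here; a real force field applies the bond term only to bonded pairs, measures stretching from an equilibrium length, and includes further terms.

```python
def pair_energy(r, K=0.0, A=0.0, B=0.0, q1=0.0, q2=0.0):
    """E(r) = K r^2 + A/r^12 - B/r^6 + q1 q2 / r, the terms listed on the slide."""
    return K * r**2 + A / r**12 - B / r**6 + q1 * q2 / r


def pair_force(r, K=0.0, A=0.0, B=0.0, q1=0.0, q2=0.0):
    """Radial force F = -dE/dr, obtained by differentiating each term."""
    dE_dr = 2.0 * K * r - 12.0 * A / r**13 + 6.0 * B / r**7 - q1 * q2 / r**2
    return -dE_dr
```

A positive value means the atoms are pushed apart. For the VDW term alone the force vanishes at $r = (2A/B)^{1/6}$, the bottom of the well.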

  26. Now we have positions of all atoms as a function of time. Can compute statistical averages and fluctuations; analyze side chain movements, cavity dynamics, domain motion, etc.

  27. Molecular Dynamics: PRINCIPLE: Given the position x(t) of each atom at time t, its position at the next time step $t + \Delta t$ is given by: $x(t + \Delta t) \approx x(t) + v(t)\,\Delta t + \frac{1}{2} (F/m) (\Delta t)^2$, where F is the force on the atom. Key parameter: the integration time step $\Delta t$. It controls the accuracy and speed of the numerical integration routines. Smaller $\Delta t$: more accurate, but more steps are needed. How many are needed to simulate biology? How many can one afford?
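
The position update on this slide can be turned into a complete integration step once a rule for updating the velocity is chosen. The sketch below assumes the velocity Verlet scheme for that part, which is a common choice but is not stated on the slide; the one-dimensional setup and all names are illustrative.

```python
def md_step(x, v, force, m, dt):
    """Advance one particle by dt; force(x) returns the force at position x."""
    f_old = force(x)
    # Position update from the slide: x + v*dt + 1/2 * F/m * dt^2
    x_new = x + v * dt + 0.5 * f_old / m * dt**2
    # Velocity update (velocity Verlet, assumed): average of old and new forces
    f_new = force(x_new)
    v_new = v + 0.5 * (f_old + f_new) / m * dt
    return x_new, v_new


def simulate(x0, v0, force, m, dt, n_steps):
    """Run n_steps of dynamics; return the trajectory and the final velocity."""
    x, v = x0, v0
    trajectory = [x]
    for _ in range(n_steps):
        x, v = md_step(x, v, force, m, dt)
        trajectory.append(x)
    return trajectory, v
```

For example, `simulate(1.0, 0.0, lambda x: -x, 1.0, 0.01, 628)` follows a unit-mass particle on a spring for roughly one period. Rerunning with a larger `dt` shows the trade-off the slide mentions: fewer steps, but a less accurate (and eventually unstable) trajectory.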

  28. As a result, we cannot quite get into the “biological” time scales. [Figure: axis of characteristic time scales in seconds, marked at $10^{-14}$, $10^{-6}$ and $10^{0}$, with labels for the time step $\Delta t$, H-C bond vibration, protein folding, currently accessible times, and biology.] For stability, $\Delta t$ must be at least an order of magnitude less than the fastest motion, i.e. $\Delta t \sim 10^{-15}$ s. Example: to simulate folding of the fastest folding protein, at least $10^{-6}/10^{-15} = 10^{9}$ steps will be needed.

  29. The bottleneck of the methodology: computation of long-range interactions. Electrostatic interactions fall off as the inverse distance between atoms. Too strong to neglect. Need to account for all of them. Very expensive: up to 99% of the total cost for a protein.

  30. Massively parallel machines help. Virginia Tech's supercomputer, System-X.

  31. The “worst” problem for parallel computations: the force acting on each atom depends upon the positions of every other atom in the system. Computed coordinates have to be communicated between all processors at each step. [Figure: Processors #1 to #4; Processor #1 holds X1 and F(on X1), Processor #2 holds X2 and F(on X2).]

  32. Without approximations, computation of long-range electrostatic forces will cost you $O(N^2)$, where the number of atoms N may be as large as N ~ $10^6$. Too expensive for large systems. Every atom interacts with every other atom: a complete graph. A solution: combine charges (vertices) into groups, that is, use coarse-graining. After coarse-graining, costs can be as low as $N \log N$. Works because macromolecules are naturally partitioned into hierarchical levels: atoms -> amino-acids -> proteins -> complexes.
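
A single-level version of the grouping idea can be sketched as follows: interactions inside a group are summed exactly, while each pair of different groups is replaced by one interaction between their net charges placed at the group centers. This is a deliberately crude illustration under those assumptions (one level of grouping, geometric centers, unit-free Coulomb energy, made-up function names). It does not by itself reach the $N \log N$ cost, which requires applying the grouping hierarchically, and it is accurate only when the groups are far apart compared to their size.

```python
import math


def direct_energy(pos, q):
    """Exact pairwise Coulomb energy: every atom with every other, O(N^2)."""
    e = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            e += q[i] * q[j] / math.dist(pos[i], pos[j])
    return e


def coarse_grained_energy(pos, q, groups):
    """Exact inside each group; one net-charge interaction per pair of groups."""
    e = 0.0
    summaries = []  # (geometric center, net charge) for each group
    for g in groups:
        for a in range(len(g)):
            for b in range(a + 1, len(g)):
                i, j = g[a], g[b]
                e += q[i] * q[j] / math.dist(pos[i], pos[j])
        center = tuple(sum(pos[i][k] for i in g) / len(g) for k in range(3))
        summaries.append((center, sum(q[i] for i in g)))
    for a in range(len(summaries)):
        for b in range(a + 1, len(summaries)):
            (ca, qa), (cb, qb) = summaries[a], summaries[b]
            e += qa * qb / math.dist(ca, cb)
    return e
```

Here `groups` is a list of lists of atom indices, for instance one list per amino acid. Two well-separated groups of three atoms need 9 inter-group terms in the direct sum and only 1 in the coarse-grained one.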

  33. Simulated refolding pathway of the 46-residue protein. Movie available at: www.scripps.edu/~onufriev/RESEARCH/in_virtuo.html Molecular dynamics based on AMBER-7. NB: due to the absence of viscosity, folding occurs on a much shorter time-scale than in an experiment.

  34. Intrigued? Suggested articles: 1. “Protein Folding and Misfolding”, C. Dobson, Nature 426, 884 (2003). 2. “Design of a Novel Globular Protein Fold with Atomic-Level Accuracy”, Kuhlman et al., Science 302, 1364 (2003). + references therein.
