CS Seminar. Feb 09. Alexey Onufriev, Dept of Computer Science; - - PowerPoint PPT Presentation

cs seminar feb 09
SMART_READER_LITE
LIVE PREVIEW

CS Seminar. Feb 09. Alexey Onufriev, Dept of Computer Science; - - PowerPoint PPT Presentation

The computational Core of Molecular Modeling. (The What? Why? And How? ) CS Seminar. Feb 09. Alexey Onufriev, Dept of Computer Science; Dept. of Physics and the GBCB program. Thanks to: Faculty co-authors: T.M. Murali, M. Prisant, L.


slide-1
SLIDE 1

The computational Core of Molecular Modeling. (The What? Why? And How? ) Alexey Onufriev, Dept of Computer Science;

  • Dept. of Physics and the GBCB program.

CS Seminar. Feb 09.

slide-2
SLIDE 2

Thanks to:

Faculty co-authors: T.M. Murali, M. Prisant, L. Heath, C. Simmerling Student co-authors:

  • J. Gordon, J. Myers, A. Fenley, J. Ruscio,
  • R. Anandakrishnan, D. Kumar, M. Shukla, V. Sojia

Sponsors: NIH, VT. Support: System-X team,

  • N. Polys (myoglobin graphics)
slide-3
SLIDE 3
  • a. Rational Drug Design
  • b. From atomic motion to biological function
  • c. The “grand challenge” of computational

science: the Protein Folding Problem.

Will focus on the computational side

  • f solving problems in the following areas:
slide-4
SLIDE 4

in vivo in vitro in virtuo The emergence of “in virtuo” Science.

slide-5
SLIDE 5

Biological function = f( 3D molecular structure )

…A T G C … DNA sequence Protein structure Bilogical function

Key challenges: Biomolecular structures are complex (e.g. compared to crystal solids). Biology works on many time scales. Experiments can only go so far.

A solution: Computational methods. Model movements

  • f individual atoms.

How can we predict/understand/modify function if we know the structure? Can we predict the structure?

slide-6
SLIDE 6

“All things are made of atoms, and everything that living things do can be understood in terms of the wigglings and jigglings of atoms”

  • R. Feynmann

Suggests the approach: model what nature does, i.e. let the molecule evolve with time according to underlying physics laws.

The paradigm:

slide-7
SLIDE 7

A protein

  • n a

surface. Atomic resolution

slide-8
SLIDE 8

Theme 1. Rational (structure- based) design of new medicines:

Picture: Design of a HIV protease inhibitor. Hornak, V.; Okur, A., Rizzo, R. and Simmerling, C.,

“HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations”,

  • Proc. Nat. Acad. Sci. USA, 103:915-920 (2006)
slide-9
SLIDE 9

Example: rational drug design.

Drug agent e.g:viral protease (chops up proteins) If you block the enzymes function – you kill the virus.

slide-10
SLIDE 10

Example of successful computer-aided (rational) drug design: One of the drugs that helped slow down the AIDS epidemic

(part of anti-retro viral cocktail).

The drug blocks the function of a key viral protein. To design the drug, one needs a precise 3D structure of that protein.

slide-11
SLIDE 11

A computational challenge:

Need high quality, complete protein

  • structures. But the experiment (X-

ray) does not ``see” hydrogen atoms. These have to be placed computationally.

slide-12
SLIDE 12

+

X1 X2 X3

+ +

W23 ΔGk(pH) = Σ

xi k (kT ln10 pH - Δ Ei calc) + 1/2ΣΣ

ΣΣxi

kxj k Wij

N N N

i i j

Combinatorial explosion problem: A molecule with N sites, each of which can be occupied by an H+ atom (or be empty), has 2N possible states.

All of the possible charge (protonation) states must be taken into

  • account. For a typical protein with 100

ionizable amino-acids this means 2100 ~ 1030 possible variants to consider! And we need to find the minimum energy state among them (for the experts: what we really

need is, of course, the partition function Z, see below. Which is an even more computationally demanding job)

Xi = 1 or 0 (occupied or empty in state k) k – protonation state (out of 2N ) Free energy Matrix of site-site interactions.

Z = Σall states exp(-ΔGk/kT)

trouble

slide-13
SLIDE 13

Beating the combinatorics: the clustering approach. Total number of protonation states = 2N = 64 Cluster 1, N=3 Cluster 2, N=3 Solution: neglect the ``weak” edges. Cluster the strong ones. After clustering: Total number of states = 23 + 23 = 16 < 64. Example: a protein with N = 6 ionizable groups.

Every site interacts with every other site, a complete graph

slide-14
SLIDE 14

http://biophysics.cs.vt.edu/H++

A web-server that adds hydrogens to molecular

  • structures. Launched by Onufriev’s group in

June 2005. ~1000 registered users since.

slide-15
SLIDE 15

Intrigued?

Suggested reading: “The Many roles of computation in drug discovery”, W. Jorgensen, Sceince 303, 1813 (2004). + references therein.

slide-16
SLIDE 16

THEME II The protein folding challenge.

MET—ALA—ALA—ASP—GLU—GLU--….

Experiment: amino acid sequence uniquely determines proteinʼs 3D shape (ground state).

Nature does it all the time. Can we?

Amino-acid sequence – translated genetic code.

How?

Why bother: proteinʼs shape determines its biological function.

slide-17
SLIDE 17

Protein Structure in 3 steps. Amino-acid #1 Amino-acid #2 Peptide bond Step 1. Two amino-acids together (di-peptide)

slide-18
SLIDE 18

Step 2: Most flexible degrees of freedom:

Protein Structure in 3 steps.

slide-19
SLIDE 19

φ1 φ2 φ3 φ4 A protein is simply a chain of amino-acids: Each configuration {Φ1, Φ2, …Φκ} has some

  • energy. The folded (biologically functional) protein

has the lowest possible energy - global minimum. So just find this conformation by some kind of a minimization algorithm… what’s he big deal?

slide-20
SLIDE 20

The magnitude of the protein folding challenge:

φ1 φ2 φ3 φ4

A typical protein is a chain of ~ 100 mino acids. Assume that each amino acid can take up only 10 conformations (vast underestimation) Total number of possible conformations: 10100 Suppose each energy estimate is just 1 float point operation. Suppose you have a Penta-Flop supercomputer.

An exhaustive search for the global minimum would take 1085 seconds ~ 3*1078 years. Age of the Universe ~ 2*1010 years.

Enormous number of the possible conformations of the polypeptide chain

slide-21
SLIDE 21

Finding a global minimum in a multidimensional case is easy only when the landscape is smooth. No matter where you start (1, 2 or 3), you quickly end up at the bottom -

  • the Native (N), functional state of

the protein.

Free energy Folding coordinate

1 2 3

Adopted from Ken Dill’s web site at UCSF

slide-22
SLIDE 22

Realistic landscapes are much more complex, with multiple local minima – folding traps. Proteins “trapped” in those minima may lead to disease, such as Altzheimer’s

Adopted from Ken Dill’s web site at UCSF

slide-23
SLIDE 23

Adopted from Dobson, NATURE 426, 884 2003

slide-24
SLIDE 24

Since minimization won’t work, choose an alternative.

Do what Nature does: just let it fold

  • n its own, at normal temperature.

Method: Molecular Dynamics

slide-25
SLIDE 25

Each atom moves by Newton’s 2nd Law: F = ma

E =

+

  • + …

x

Y Principles of Molecular Dynamics (MD): F = dE/dr

System’s energy

Kr2

Bond stretching + A/r12 – B/r6 VDW interaction

+ Q1Q2/r

Electrostatic forces

Bond

spring

slide-26
SLIDE 26

Can compute statistical averages, fluctuations; Analyze side chain movements, Cavity dynamics, Domain motion, Etc. Now we have positions of all atoms as a function of time.

slide-27
SLIDE 27

PRICIPLE: Given positions of each atom x(t) at time t, its position at next time-step t + Δt is given by: x(t + Δt)  x(t) + v(t) Δt + ½*F/m * (Δt)2 Key parameter: integration time step Δt. Controls accuracy and speed

  • f numerical integration routines.

Molecular Dynamics: Smaller Δt – more accurate, but need more steps. How many needed to simulate biology? How many can one afford? force

slide-28
SLIDE 28

As a result, we can not quite get into the “biological” time scales.

Characteristic time scales [sec] 10-14 10-6 100 H-C bond vibration biology

For stability, Δt must be at least an order of magnitude less than the fastest motion, i.e Δt ~ 10-15 s. Example: to simulate folding of the fastest folding protein, at least 10-6/10-15 = 109 steps will be needed .

Protein folding Currently accessible times Time-step, Δt

slide-29
SLIDE 29

The bottleneck of the methodology: computation of long-range interactions.

Electrostatic interactions fall of as inverse distance between atoms. Too strong to neglect. Need to account for all of them. Very expensive. Up to 99%

  • f total cost for a protein.
slide-30
SLIDE 30

Virginia Tech’s supercomputer, System-X Massive parallel machines help.

slide-31
SLIDE 31

Processor #1 Processor #2 Processor #3 Processor #4 The “worst” problem for parallel computations: X1, F(on X1) X2, F(on X2) Force acting

  • n each atom

depends upon positions of every other atom in the system. Computed coordinates have to be communicated between all processors at each step

slide-32
SLIDE 32

Without approximations, computation of long-range electrostatic forces will cost you O(N2), where the number of atoms N may be as large as N ~ 106. Too expensive for large systems. After coarse-graining costs can be as low as Nlog(N) Works because macromolecules are naturally partitioned into hierarchical levels: atoms -> amino-acids -> proteins -> complexes..

Every atom interacts with every other atom, a complete graph

+3 A solution: combine charges (vertices) into groups, that is use coarse-graining.

slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35

Simulated Refolding pathway

  • f the 46-residue protein. Molecular

dynamics based on AMBER-7

NB: due to the absence of viscosity, folding occurs on much shorter time-scale than in an experiment.

Movie available at: www.scripps.edu/~onufriev/RESEARCH/in_virtuo.html

1 3 5 0 1 2 3 4 5 6

slide-36
SLIDE 36

Intrigued?

Suggested articles: 1.“Protein Folding and Misfolding”, C. Dobson, Nature 426, 884 (2003).

  • 2. “Design of a Novel Globular Protein Fold

with Atomic-level Accuracy, Kuhlman et al. , Science, 302, 1364, (2003) + references therein.

slide-37
SLIDE 37

Trivia: it is myoglobin (with oxygen bound to it) that gives fresh meat its red color. The protein stores oxygen.

Theme III: how does oxygen get inside the protein myoglobin?

slide-38
SLIDE 38

Myoglobin – the “hydrogen atom” of structural biology

Exactly what are the migration pathways of small non-polar ligands (O2 , CO, NO) inside the protein? A 50 year old question.

O2

slide-39
SLIDE 39

The heme group – the heart of myoglobin.

O2 |

slide-40
SLIDE 40

Myoglobin: protein responsible for oxygen storage in all breathing organisms.

O2

Fe

?

How does Oxygen,

  • r other small non-polar

ligands get in to bind to Fe atom in its core?

slide-41
SLIDE 41

After 50 years of research many questions still remain. Example: one or many pathways? Single dominant pathway, “HIS gate” e.g. Olson et al. Multiple pathays, e.g. Elber and Karplus

slide-42
SLIDE 42

Two complementary approaches:

  • #1 Simulate explicit ligand (CO)

trajectories via microseconds of room temperature, explicit solvent MD.

  • #2 Compute dynamic volume fluctuations

in the protein.

slide-43
SLIDE 43

Approach #1: Straightforward MD

(explicit solvent, PME, NPT, 300K, 7 µs total)

1. Simulation type #1. Model photo- detachement of carbon monoxyde (CO)

  • ligand. 20 trajectories , 90 ns each. The

ligand starts at the docking site (Fe) and comes out, sometimes.

  • 2. Simulation type #2. Model diffusion of the

ligand from the outside. The ligand starts in the solvent, and sometimes diffuses all the way to the docking site inside. 48 trajectories, each 90 ns long.

slide-44
SLIDE 44

Raw Result #1: a sample of simulated trajectories that show ligand migration paths Photodetachement

  • experiment. CO

Initially at the binding site

CO initially in the water

multiple individual trajectories combined in

  • ne slide
slide-45
SLIDE 45

Summary of the MD trajectories:

Concern: Have we explored ALL the pathways? Challenge: can we get the same pathways cheaper? Pathway #1 Pathway #2

slide-46
SLIDE 46

pathway

Approach #2. PATHFINDER – dynamic free volume analysis. Run a separate, much shorter (15 ns) MD in which the ligand does not move. In each snapshot, identify cavities large enough to hold the ligand. Purely steric

  • criterion. Combine the cavities over the entire trajectory, determine connectivity.

timeline

slide-47
SLIDE 47

The challenge: protein shape is very complex, internal cavities are hiding within, also very complex shape. Connectivity to the outside is a non-trivia issue. In this case, it is critical to have a SIMPLE and robust algorithm, to have just ONE complexity to deal with, not a product of the two: (problem complexity) x (algorithm complexity).

slide-48
SLIDE 48

Free volume analysis (PATHFINDER)

slide-49
SLIDE 49

Pathways from volume fluctuations (PATHFINDER):

slide-50
SLIDE 50

Combining the two approaches:

Explicit trajectories (MD) Free volume fluctuations

  • J. Ruscio et al. “Atomic level computational identification of ligand migration

pathways between solvent and binding site in myoglobin, Proc. Natl. Acad. Sci. USA, 105, 9204 (2008)

slide-51
SLIDE 51

The end result: oxygen diffusion pathways in myoglobin:

The pathways (blue) occur inside the rigid scaffold structures (helices, white).

slide-52
SLIDE 52

The law of (bio/phys/chem)- computational science:

accuracy speed easy hard genius

slide-53
SLIDE 53

Lunch time