How computation has changed research in chemistry and biology
Peter Schuster
Institut für Theoretische Chemie, Universität Wien, Austria and The Santa Fe Institute, Santa Fe, New Mexico, USA
IWR - 25 Jahre-Jubiläum Heidelberg, 21. – 22.02.2013
Web-Page for further information: http://www.tbi.univie.ac.at/~pks
Some technological revolutions in 20th century science: 1. molecular spectroscopy, 2. micro-technology, 3. electronic computation, 4. molecular revolution in biology, 5. computational quantum chemistry, and 6. holistic chemistry of biological entities.
Electronics 38 (8), 4-7,1965
Gordon E. Moore, 1929 -
Exponential increase in hardware power
Martin Grötschel, 1948 -
Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later – in 2003 – the same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1000 was due to increased processor speed, whereas a factor of roughly 43000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30000 for mixed integer programming between 1991 and 2008.
J.P. Holdren, E. Lander, H. Varmus. Designing a digital future: Federally funded research and development in networking and information technology. President's Council of Advisors on Science and Technology, Washington, DC, p. 71, 2010
PCAST Report to the President, 2010. Progress in Algorithms Beats Moore's Law.
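The factors quoted in the report can be checked with a few lines of arithmetic. The sketch below assumes a year of 365.25 days and takes the hardware factor of 1000 as given; the variable names are illustrative only.

```python
# Figures quoted in the report: 82 years in 1988 versus roughly 1 minute in 2003.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: a year of 365.25 days

total_speedup = 82 * MINUTES_PER_YEAR               # overall improvement factor
hardware_factor = 1_000                             # quoted share of processor speed
algorithm_factor = total_speedup / hardware_factor  # what remains for algorithms

print(f"total speedup     ~ {total_speedup:.3g}")
print(f"algorithm speedup ~ {algorithm_factor:.3g}")
```

The product of the two partial factors, 1000 and roughly 43000, reproduces the overall factor of roughly 43 million.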
Four selected examples
1. Parameter determination in chemical kinetics
2. Design of ribonucleic acid (RNA) structures
3. Kinetic folding of RNA molecules
4. Modeling evolution
1. Parameter determination in chemical kinetics
Michaelis-Menten mechanism of enzyme reactions
L. Michaelis, M. Menten. Die Kinetik der Invertin-Wirkung. Biochemische Zeitschrift 49, 333-369, 1913
$$ \frac{d[P]}{dt} = v([S]) = \frac{v_{max} \cdot [S]}{K_M + [S]} $$
$$ K_M = \frac{k_d + k_r}{k_f}, \quad v([S]) = k_r \cdot [ES] \quad \text{and} \quad v_{max} = k_r \cdot [E]_0 $$
basic assumptions: $k_r \ll k_d$ and $[E]_0 \ll [S]_0$
Linearization of a hyperbola:
$$ v([S]) = \frac{v_{max} \cdot [S]}{K_M + [S]} $$
Lineweaver-Burk: 1/v = f(1/[S])
Eadie-Hofstee: v = f(v/[S])
Scatchard: v/[S] = f(v)
Hanes: [S]/v = f([S])
Hill: log(v/(vmax – v)) = f(log [S])
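As an illustration of the first linearization, the following sketch recovers vmax and KM from a straight-line fit of 1/v against 1/[S]. The data are hypothetical and noise-free, and all function names are chosen here for illustration.

```python
def michaelis_menten(s, v_max, k_m):
    """Rate v([S]) = v_max * [S] / (K_M + [S])."""
    return v_max * s / (k_m + s)


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate, rates):
    """Estimate (v_max, K_M) from 1/v = (K_M / v_max) * (1/[S]) + 1/v_max."""
    slope, intercept = linear_fit([1 / s for s in substrate],
                                  [1 / v for v in rates])
    v_max = 1 / intercept
    return v_max, slope * v_max


# Hypothetical, noise-free measurements in arbitrary units.
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [michaelis_menten(s, v_max=2.0, k_m=1.5) for s in substrate]
v_max_est, k_m_est = lineweaver_burk(substrate, rates)
print(v_max_est, k_m_est)
```

With noisy data the reciprocal transformation changes the weight of the measurement errors, which is one reason why a direct nonlinear fit of the hyperbola is usually preferred when computation is cheap.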
The Lineweaver-Burk plot of Michaelis-Menten kinetics
Source: Wikipedia, “Enzymkinetik”
Validity of the Michaelis-Menten approximation
The forward problem of chemical reaction kinetics
The inverse problem of chemical reaction kinetics
Parameter identification and determination is an ill-posed problem
Inverse problem solution techniques
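$$ F(q) = y; \quad q \ \text{parameter vector}, \quad y^\delta \ \text{(noisy) data}; \quad q \in Q \ \text{and} \ y^\delta \in Y $$
$$ \| y^\delta - F(q) \|_Y^2 \to \min_{q \in Q} $$
ill-conditioned problem
$$ \| y^\delta - F(q) \|_Y^2 + \alpha \, R(q, q_0) \to \min_{q \in Q} \quad \text{with} \quad R(q, q_0) = \| q - q_0 \|_Q^2 $$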
regularization term R, here Tikhonov regularization, with q0 being an initial parameter guess and α the regularization parameter
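The effect of the regularization term can be seen in a small numerical sketch. It replaces the nonlinear forward operator F of a kinetic model by a hypothetical linear model F(q) = A q with two nearly collinear columns, so that the minimizer has a closed form. The matrix, the noise, α and q0 are all made-up illustration values.

```python
def tikhonov_2d(A, y, q0, alpha):
    """Minimise ||y - A q||^2 + alpha * ||q - q0||^2 for two parameters.

    Solves the normal equations (A^T A + alpha I) q = A^T y + alpha q0
    with Cramer's rule. alpha = 0 gives plain least squares.
    """
    # Entries of the symmetric 2x2 matrix A^T A + alpha I
    m11 = sum(r[0] * r[0] for r in A) + alpha
    m12 = sum(r[0] * r[1] for r in A)
    m22 = sum(r[1] * r[1] for r in A) + alpha
    # Right-hand side A^T y + alpha q0
    b1 = sum(r[0] * yi for r, yi in zip(A, y)) + alpha * q0[0]
    b2 = sum(r[1] * yi for r, yi in zip(A, y)) + alpha * q0[1]
    det = m11 * m22 - m12 * m12
    return ((m22 * b1 - m12 * b2) / det, (m11 * b2 - m12 * b1) / det)


# Hypothetical ill-conditioned model: the two columns are almost identical.
A = [[1.0, 1.0], [1.0, 1.001], [1.0, 0.999]]
q_true = (1.0, 2.0)
noise = [0.01, -0.01, 0.01]
y_noisy = [r[0] * q_true[0] + r[1] * q_true[1] + d for r, d in zip(A, noise)]

q_plain = tikhonov_2d(A, y_noisy, q0=(0.0, 0.0), alpha=0.0)   # no regularization
q_reg = tikhonov_2d(A, y_noisy, q0=(1.2, 1.8), alpha=0.01)    # with initial guess q0
print("least squares:", q_plain)
print("Tikhonov     :", q_reg)
```

In this toy setting, noise of size 0.01 in the data moves the unregularized estimate far away from the true parameters, whereas the regularized estimate stays close to the initial guess and to the true values. How α should be chosen for a real kinetic model is not addressed here.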
Parameter identification and determination as an inverse problem
2. Design of ribonucleic acid (RNA) structures
[Figure: chemical formula of an RNA strand from the 5'-end to the 3'-end, a ribose-phosphate backbone with Na counterions carrying the nucleobases N1 to N4, where Nk = A, U, G, C; next to it the secondary structure drawing of an RNA molecule.]
RNA structure: the molecular phenotype
The notion of structure
The minimum free energy structures on a discrete space of conformations
[Figure: free energy ΔG of the conformations S0(h) to S9(h), marking the minimum of free energy and the suboptimal conformations.]
RNA sequence → RNA structure of minimal free energy
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Empirical parameters
Biophysical chemistry: thermodynamics and kinetics
From RNA sequence to structure
linear programming
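The step from sequence to structure can be illustrated with a deliberately simplified stand-in. The sketch below uses Nussinov-style dynamic programming, which maximizes the number of nested base pairs instead of minimizing a free energy built from empirical parameters. The minimum loop size and the set of allowed pairs are assumptions of this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumption: at least three unpaired bases inside a hairpin loop


def max_base_pairs(seq):
    """Maximal number of nested base pairs in seq (Nussinov-style recursion)."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, leaving room for a loop.
            for k in range(i, j - MIN_LOOP):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(max_base_pairs("GGGAAAUCC"))
```

A realistic folding routine replaces the pair count by loop-dependent free energy contributions and adds a traceback that returns the structure itself.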
RNA sequence → RNA structure of minimal free energy
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Inverse folding of RNA: biotechnology, design of biomolecules with predefined structures and functions
Inverse folding algorithm: iterative determination of a sequence for the given secondary structure
From RNA structure to sequence
Linear programming
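An iterative search for a sequence needs a starting point that can form the target structure at all. The sketch below shows only this first ingredient: it builds one sequence that is compatible with a target given in dot-bracket notation. The notation, the choice of base pairs and all function names are assumptions of this example, and the iterative refinement (mutate, refold, compare with the target) is not shown.

```python
def pair_table(structure):
    """Map every paired position of a dot-bracket string to its partner."""
    partner, stack = {}, []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            opening = stack.pop()
            partner[opening], partner[pos] = pos, opening
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def compatible_sequence(structure, pair_cycle=("GC", "CG", "AU", "UA"), unpaired="A"):
    """Build one sequence whose paired positions can all form base pairs."""
    partner = pair_table(structure)
    seq = [unpaired] * len(structure)
    count = 0
    for i in sorted(partner):
        j = partner[i]
        if i < j:  # visit every pair once, at its opening position
            seq[i], seq[j] = pair_cycle[count % len(pair_cycle)]
            count += 1
    return "".join(seq)


def is_compatible(seq, structure):
    """True if every pair required by the structure is an allowed base pair."""
    allowed = {"GC", "CG", "AU", "UA", "GU", "UG"}
    return all(seq[i] + seq[j] in allowed
               for i, j in pair_table(structure).items())


target = "((((...))))"
start = compatible_sequence(target)
print(start, is_compatible(start, target))
```

Compatibility is necessary but not sufficient: a compatible sequence may still fold into a different structure of lower free energy, which is why the determination has to be iterative.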
ViennaRNA Package:
Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, Sebastian Bonhoeffer, Manfred Tacker, and Peter Schuster. Fast folding and comparison of RNA secondary structures. Mh. Chem. 125:167-188, 1994
Ronny Lorenz, Stephan H. Bernhart, Christian Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F. Stadler, and Ivo L. Hofacker. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6:26, 2011
A mapping and its inversion
$$ \psi(I_j) = S_k $$
$$ G_k = \psi^{-1}(S_k) \equiv \{\, I_j \mid \psi(I_j) = S_k \,\} $$
Space of genotypes: { I1, I2, I3, I4, ... , IN }; Hamming metric
Space of phenotypes: { S1, S2, S3, S4, ... , SM }; metric (not required)
N ≫ M: many genotypes, one phenotype
3. Kinetic folding of RNA molecules
Extension of the notion of structure
[Figure: free energy ΔG along a "reaction coordinate" between two local minima that are separated by a saddle point, together with the corresponding "barrier tree".]
Definition of a "barrier tree"
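A barrier tree can be built by flooding the energy landscape from low to high energy and noting where basins merge. The following sketch applies this idea to a toy landscape of five states on a chain. The energies and the neighborhood relation are hypothetical, and the code is meant as one common way to construct such a tree, not as the procedure used for the figures in this talk.

```python
def barrier_tree(energy, neighbors):
    """Flood the landscape by increasing energy and record basin merges.

    Returns (local_minima, merges); every merge is a tuple
    (saddle_energy, surviving_minimum, absorbed_minimum).
    """
    basin = {}            # union-find parent pointers over flooded states
    minima, merges = [], []

    def find(x):
        while basin[x] != x:
            basin[x] = basin[basin[x]]  # path halving
            x = basin[x]
        return x

    for state in sorted(energy, key=energy.get):
        roots = {find(nb) for nb in neighbors[state] if nb in basin}
        basin[state] = state
        if not roots:
            minima.append(state)        # no flooded neighbor: a local minimum
            continue
        # The deepest basin absorbs the state and every other basin it touches.
        ordered = sorted(roots, key=energy.get)
        deepest = ordered[0]
        basin[state] = deepest
        for other in ordered[1:]:
            merges.append((energy[state], deepest, other))
            basin[other] = deepest
    return minima, merges


# Hypothetical landscape: a chain a - b - c - d - e, energies in kcal/mole.
energy = {"a": -5.0, "b": -2.0, "c": -4.0, "d": -1.0, "e": -3.0}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
minima, merges = barrier_tree(energy, neighbors)
print(minima)   # local minima in order of increasing energy
print(merges)   # saddle energies at which basins are joined
```

The height of the barrier that a minimum has to cross is the saddle energy of its merge minus its own energy, here 2.0 kcal/mole for both c and e.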
Interconversion of suboptimal structures
Computation of kinetic folding
[Figure: sequence and alternative hairpin conformations of the RNA switch JN1LH, with nucleotide positions 3, 13, 23, 33 and 44 marked from the 5'-end to the 3'-end; free energies shown: -28.6, -31.8, -28.2 and -28.6 kcal·mol^-1.]
J.H.A. Nagel, C. Flamm, I.L. Hofacker, K. Franke, M.H. de Smit, P. Schuster, and C.W.A. Pleij. Structural parameters affecting the kinetic competition of RNA hairpin formation. Nucleic Acids Res. 34:3568-3576 (2006)
An experimental RNA switch
[Figure: barrier tree with the numbered local minima 1 to 50 as leaves; vertical axis: free energy in kcal/mole, from -26.0 to -50.0; branches annotated with barrier heights between 1.44 and 6.8 kcal/mole.]
J1LH barrier tree
4. Modeling evolution
Sewall Wright's fitness landscape as a metaphor for Darwinian evolution
Sewall Wright. 1932. The roles of mutation, inbreeding, crossbreeding and selection in evolution. In: D.F. Jones, ed. Proceedings of the Sixth International Congress on Genetics. Vol. 1, 356-366. Ithaca, NY.
The multiplicity of gene replacements with two alleles on each locus
+ ........ wild type
a ........ alternative allele on locus A
...
abcde ... alternative alleles on all five loci
Sewall Wright. 1988. Surfaces of selective value revisited. American Naturalist 131:115-123
Sewall Wright, 1889 - 1988
Evolution is hill climbing of populations or subpopulations Sewall Wright. 1988. Surfaces of selective value revisited. American Naturalist 131:115-123
The logic of DNA (or RNA) replication
Accuracy of replication: Q = q1 q2 q3 q4 …
Evolution in the test tube: G.F. Joyce, Angew.Chem.Int.Ed. 46 (2007), 6420-6436
Sol Spiegelman, 1914 - 1983
Kinetics of RNA replication
C.K. Biebricher, M. Eigen, W.C. Gardiner, Jr. Biochemistry 22:2544-2559, 1983
Christof K. Biebricher, 1941-2009
Manfred Eigen, 1927 -
$$ \frac{dx_j}{dt} = \sum_{i=1}^{n} W_{ji} \, x_i - x_j \, \Phi ; \quad j = 1, 2, \ldots , n $$
$$ W_{ji} = Q_{ji} \cdot f_i , \quad \Phi = \sum_{i=1}^{n} f_i \, x_i \Big/ \sum_{i=1}^{n} x_i $$
Mutation and (correct) replication as parallel chemical reactions
M. Eigen. 1971. Naturwissenschaften 58:465
M. Eigen & P. Schuster. 1977. Naturwissenschaften 64:541, 65:7 and 65:341
quasispecies
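The stationary solution of the mutation-selection equation is the dominant eigenvector of the matrix W. The sketch below computes it by power iteration for a single peak landscape over binary sequences. It assumes a uniform error rate p per position, so that Q_ji = p^d (1-p)^(n-d) with d the Hamming distance, and it uses the short chain length n = 6 only to keep the example fast. The fitness values 1.1 and 1.0 follow the landscapes shown later in the talk.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def quasispecies(n, p, f_master=1.1, f_other=1.0, iterations=500):
    """Stationary mutant distribution, i.e. dominant eigenvector of W.

    W_ji = Q_ji * f_i with Q_ji = p^d * (1 - p)^(n - d), where d is the
    Hamming distance between sequences i and j. The master sequence is the
    all-zero sequence and has index 0 in the returned list.
    """
    seqs = list(product((0, 1), repeat=n))
    fitness = [f_master] + [f_other] * (len(seqs) - 1)
    W = [[p ** hamming(sj, si) * (1 - p) ** (n - hamming(sj, si)) * fitness[i]
          for i, si in enumerate(seqs)]
         for sj in seqs]
    x = [1.0 / len(seqs)] * len(seqs)
    for _ in range(iterations):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        total = sum(x)
        x = [xi / total for xi in x]   # keep the concentrations normalized
    return x


for p in (0.0, 0.005, 0.05, 0.5):
    print(f"p = {p:5.3f}  master fraction = {quasispecies(6, p)[0]:.4f}")
```

At p = 0 only the master sequence survives, and at p = 0.5 all sequences are equally frequent. In between, the master fraction drops as p grows, although for such a short chain the transition is smoother than the sharp error threshold seen for longer sequences.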
The error threshold in replication and mutation
The paradigm of structural biology
The simplified model
Model fitness landscapes I
single peak landscape, step linear landscape
Stationary population or quasispecies as a function of the mutation or error rate p
[Figure: stationary mutant distribution plotted against the error rate p = 1-q from 0.00 to 0.10, with regions labelled "Quasispecies" and "Uniform distribution".]
Error threshold on the single peak landscape
Error threshold on the step linear landscape
Rugged fitness landscapes over individual binary sequences with n = 10
single peak landscape, "realistic" landscape
Random distribution of fitness values: d = 1.0 and s = 637
Error threshold on "realistic" landscapes: n = 10, f0 = 1.1, fn = 1.0, d = 0.5
s = 541, s = 637, s = 919
s = 541, s = 919, s = 637
Error threshold on "realistic" landscapes: n = 10, f0 = 1.1, fn = 1.0, d = 0.995
s = 919, s = 541, s = 637
Error threshold on "realistic" landscapes: n = 10, f0 = 1.1, fn = 1.0, d = 1.0
Complexity in molecular evolution
W = Q · F; largest eigenvalue λ0 and eigenvector ξ0
Diagonalization of matrix W: "complicated but not complex"
Fitness landscape: "complex"; mutation matrix: ("complex")
Sequence → structure: "complex"
Mutation and selection
The new biology provides a hitherto unknown challenge for mathematicians, computer scientists, and theoretical biologists for mainly two reasons: the enormous amount of data and the complexity of structure and dynamics.
"I was taught in the pregenomic era to be a hunter. I learnt how to identify the wild beasts and how to go out, hunt them down and kill them. We are now urged to be gatherers, to collect everything lying around and put it into storehouses. Someday, it is assumed, someone will come and sort through the storehouses, discard all the junk, and keep the rare finds. The only difficulty is how to recognize them."
Sydney Brenner, 1927 -
Sydney Brenner. Hunters and gatherers. The Scientist 16(4): 14, 2002
The "big data" problem in bioinformatics
Theory – mathematics and computation – cannot remove complexity, but it shows what kind of "regular" behavior can be expected and what experiments have to be done to get a grasp on the irregularities.
Manfred Eigen, 1927 -
Preface to E. Domingo, C.R. Parrish, J.J. Holland, eds. Origin and Evolution of Viruses. Academic Press 2008
Theory, mathematics and complexity
Coworkers
Peter Stadler, Bärbel M. Stadler, Universität Leipzig, GE
Paul E. Phillipson, University of Colorado at Boulder, CO
Heinz Engl, Philipp Kügler, James Lu, Stefan Müller, RICAM Linz, AT
Jord Nagel, Kees Pleij, Universiteit Leiden, NL
Walter Fontana, Harvard Medical School, MA
Martin Nowak, Harvard University, MA
Christian Reidys, University of Southern Denmark, Odense, DK
Christian Forst, University of Texas, Southwestern Medical Center, TX
Thomas Wiehe, Ulrike Göbel, Walter Grüner, Stefan Kopp, Jaqueline Weber, Institut für Molekulare Biotechnologie, Jena, GE
Ivo L. Hofacker, Christoph Flamm, Andreas Svrček-Seiler, Universität Wien, AT
Kurt Grünberger, Michael Kospach, Andreas Wernitznig, Stefanie Widder, Stefan Wuchty, Jan Cupal, Stefan Bernhart, Lukas Endler, Ulrike Langhammer, Rainer Machne, Ulrike Mückstein, Erich Bornberg-Bauer, Universität Wien, AT
Universität Wien
Acknowledgement of support
Fonds zur Förderung der wissenschaftlichen Forschung (FWF): Projects No. 09942, 10578, 11065, 13093, 13887, and 14898
Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF): Project No. Mat05
Jubiläumsfonds der Österreichischen Nationalbank: Project No. Nat-7813
European Commission: Contracts No. 98-0189, 12835 (NEST)
Austrian Genome Research Program – GEN-AU: Bioinformatics Network (BIN)
Österreichische Akademie der Wissenschaften
Siemens AG, Austria
Universität Wien and the Santa Fe Institute