Evolutionary Optimization at the Molecular Level
Peter Schuster
Institut für Theoretische Chemie, Universität Wien, Austria and The Santa Fe Institute, Santa Fe, New Mexico, USA
Physikalisches Kolloquium TU Wien, 28.11.2005
Web-Page for further information: http://www.tbi.univie.ac.at/~pks
Genotype, Genome Phenotype
Unfolding of the genotype
Highly specific environmental conditions Developmental program
Collection of genes
Evolution explains the origin of species and their interactions
Genotype, Genome
GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACTGAAGATCTGGAGGTCCTGTGTTCGATCCACAGAATTCGCACCA
Phenotype
Unfolding of the genotype
Highly specific environmental conditions
James D. Watson and Francis H.C. Crick
Biochemistry, molecular biology, structural biology, molecular evolution, molecular genetics, systems biology, bioinformatics
Hemoglobin sequence: Gerhard Braunitzer
The exciting RNA story: evolution of RNA molecules, ribozymes and splicing, the idea of an RNA world, selection of RNA molecules, RNA editing, the ribosome is a ribozyme, small RNAs and RNA switches.
Omics
‘The new biology is the chemistry of living matter’
Molecular evolution: Linus Pauling and Emile Zuckerkandl
Manfred Eigen, Max Perutz, John Kendrew
Three necessary conditions for Darwinian evolution are: 1. Multiplication, 2. Variation, and 3. Selection. Variation through mutation and recombination operates on the genotype whereas the phenotype is the target of selection. One important property of the Darwinian scenario is that variations in the form of mutations or recombination events occur uncorrelated with their effects on the selection process. All conditions can be fulfilled not only by cellular organisms but also by nucleic acid molecules in suitable cell-free experimental assays.
Columns: generation time | selection and adaptation, 10 000 generations | genetic drift in small populations, 10^6 generations | genetic drift in large populations, 10^7 generations (h = hours, d = days, a = years)
RNA molecules | 10 sec | 27.8 h = 1.16 d | 115.7 d | 3.17 a
RNA molecules | 1 min | 6.94 d | 1.90 a | 19.01 a
Bacteria | 20 min | 138.9 d | 38.03 a | 380 a
Bacteria | 10 h | 11.40 a | 1 140 a | 11 408 a
Multicellular organisms | 10 d | 274 a | 27 380 a | 273 800 a
Multicellular organisms | 20 a | 200 000 a | 2 × 10^7 a | 2 × 10^8 a
Time scales of evolutionary change
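The entries of the time-scale table are plain products of generation time and number of generations. The following minimal sketch recomputes a few of them; the function names and the year length of 365.25 days are assumptions made for this illustration, not part of the talk.

```python
# Seconds per unit; "a" (year) is assumed to be 365.25 days.
SECONDS = {"sec": 1.0, "min": 60.0, "h": 3600.0, "d": 86400.0, "a": 365.25 * 86400.0}


def duration(generations, generation_time, unit):
    """Total time in seconds needed for the given number of generations."""
    return generations * generation_time * SECONDS[unit]


def in_unit(seconds, unit):
    """Convert a time span in seconds into the requested unit."""
    return seconds / SECONDS[unit]


if __name__ == "__main__":
    # RNA molecules, 10 sec per generation, 10 000 generations (in hours)
    print(round(in_unit(duration(10**4, 10, "sec"), "h"), 1))
    # Bacteria, 20 min per generation, 10^6 generations (in years)
    print(round(in_unit(duration(10**6, 20, "min"), "a"), 2))
```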
Functions of RNA molecules
RNA as scaffold for supramolecular complexes (ribosome)
RNA as transmitter of genetic information: DNA ...AGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUC... → messenger-RNA → protein (transcription, translation)
RNA as working copy of genetic information
RNA is modified by epigenetic control: RNA editing, alternative splicing of messenger RNA
RNA is the catalytic subunit in supramolecular complexes
RNA as regulator of gene expression: gene silencing by small interfering RNAs, allosteric control of transcribed RNA
Riboswitches controlling transcription and translation through metabolites
The RNA world as a precursor of the current DNA + protein biology
RNA as catalyst: ribozyme
RNA as adapter molecule: genetic code (GAC ... CUG ... leu)
RNA as carrier of genetic information: RNA viruses and retroviruses, RNA evolution in vitro
Evolutionary biotechnology: RNA aptamers, artificial ribozymes, allosteric ribozymes
1. RNA sequences and structures
2. Neutral networks
3. Evolutionary optimization of structure
4. Suboptimal structures and kinetic folding
5. Comparison of kinetic folding and evolution
1. RNA sequences and structures
[Figure: sugar-phosphate backbone of an RNA chain from the 5'-end to the 3'-end with the nucleobases N1, N2, N3, N4 and Na counterions; $N_k \in \{A, U, G, C\}$]
[Figure: secondary structure graph with 5'-end and 3'-end; sequence fragments as drawn: GCGGAU AUUCGC UUA AGUUGGGA G CUGAAGA AGGUC UUCGAUC A ACCA GCUC GAGC CCAGA UCUGG CUGUG CACAG]
Definition of RNA structure
Number of sequences: $N = 4^n$; number of structures: $N_S < 3^n$
Criterion: Minimum free energy (mfe)
Rules: _ ( _ ) _ with base pairs from {AU,CG,GC,GU,UA,UG}
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
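The symbolic notation can be read with a simple stack. The sketch below, with hypothetical function names, turns a dot-bracket string into a list of base pairs and checks whether a sequence obeys the pairing rules listed above; it only tests compatibility and does not compute a minimum free energy structure.

```python
# Allowed base pairs as listed on the slide.
ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def parse_pairs(structure):
    """Return the base pairs (i, j), 0-based with i < j, of a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every base pair of the structure is one of the allowed pairs."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS
               for i, j in parse_pairs(structure))


if __name__ == "__main__":
    # Sequence and one structure taken from the list of suboptimal structures in this talk.
    print(parse_pairs("....((((........))))"))
    print(is_compatible("CUGCGGCUUUGGCUCUAGCC", "....((((........))))"))
```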
Conventional definition of RNA secondary structures
\[ S_{n+1} = S_n + \sum_{j=1}^{n-1} S_{j-1} \cdot S_{n-j} \]
Counting the numbers of structures of chain length n n+1
M.S. Waterman, T.F. Smith (1978) Math.Bioscience 42:257-266
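The Waterman-Smith recursion $S_{n+1} = S_n + \sum_{j=1}^{n-1} S_{j-1} S_{n-j}$ can be evaluated directly. In the sketch below the start value $S_0 = 1$ (the empty chain has exactly one structure) is an assumption that the slide does not state, and the function name is hypothetical.

```python
def count_structures(n_max):
    """Return [S_0, ..., S_n_max] for S_{n+1} = S_n + sum_{j=1}^{n-1} S_{j-1} * S_{n-j}."""
    S = [1]  # assumed start value S_0 = 1
    for n in range(n_max):
        # Either base n+1 is unpaired (S_n) or it pairs with base j, which splits
        # the chain into an outer part of length j-1 and an enclosed part of length n-j.
        S.append(S[n] + sum(S[j - 1] * S[n - j] for j in range(1, n)))
    return S


if __name__ == "__main__":
    print(count_structures(8))
```

The enclosed part always has a length of at least one, so this count allows hairpin loops with a single unpaired base.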
Restrictions on physically acceptable mfe-structures: $\lambda \ge 3$ and $\sigma \ge 2$
Size restriction of elements: (i) hairpin loop: $n_{\mathrm{loop}} \ge \lambda$, (ii) stack: $n_{\mathrm{stack}} \ge \sigma$
\[ S_{m+1} = \Xi_{m+1} + \Phi_{m-1} \]
\[ \Xi_{m+1} = S_m + \sum_{k=\lambda+2\sigma-2}^{m-2} \Phi_k \cdot S_{m-k-1} \]
\[ \Phi_{m+1} = \sum_{k=\sigma-1}^{\lfloor (m-\lambda+1)/2 \rfloor} \Xi_{m-2k+1} \]
$S_n$: number of structures of a sequence with chain length $n$
Recursion formula for the number of physically acceptable stable structures
I.L.Hofacker, P.Schuster, P.F. Stadler. 1998. Discr.Appl.Math. 89:177-207
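A minimal sketch of how a recursion of this kind can be evaluated, assuming the three-term form $S_{m+1} = \Xi_{m+1} + \Phi_{m-1}$, $\Xi_{m+1} = S_m + \sum_{k=\lambda+2\sigma-2}^{m-2} \Phi_k S_{m-k-1}$, $\Phi_{m+1} = \sum_{k=\sigma-1}^{\lfloor (m-\lambda+1)/2 \rfloor} \Xi_{m-2k+1}$. The start values ($S_0 = 1$, $\Phi_0 = 0$, $\Phi_j = 0$ for $j < 0$) and the function name are assumptions of this illustration, valid for $\lambda \ge 1$ and $\sigma \ge 1$.

```python
def count_acceptable(n_max, lam=3, sigma=2):
    """Return [S_0, ..., S_n_max] with hairpin loops >= lam and stacks >= sigma."""
    S = [0] * (n_max + 1)    # all acceptable structures of length n
    Xi = [0] * (n_max + 1)   # structures whose first and last base are not paired to each other
    Phi = [0] * (n_max + 1)  # interiors of length n that may be enclosed by a base pair
    S[0] = 1                 # assumed start value: the empty chain
    for m in range(n_max):
        # The last base is unpaired, or it closes an interior of length k.
        Xi[m + 1] = S[m] + sum(Phi[k] * S[m - k - 1]
                               for k in range(lam + 2 * sigma - 2, m - 1))
        # k further pairs are stacked directly inside the enclosing pair.
        Phi[m + 1] = sum(Xi[m - 2 * k + 1]
                         for k in range(sigma - 1, (m - lam + 1) // 2 + 1))
        S[m + 1] = Xi[m + 1] + (Phi[m - 1] if m >= 1 else 0)
    return S


if __name__ == "__main__":
    print(count_acceptable(12))
    print(count_acceptable(8, lam=1, sigma=1))
```

With `lam=1` and `sigma=1` the numbers coincide with those of the unrestricted Waterman-Smith recursion, which gives a quick consistency check for the sketch.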
RNA sequence → RNA structure of minimal free energy
RNA folding: Structural biology, spectroscopy of biomolecules, understanding molecular function
Empirical parameters
Biophysical chemistry: thermodynamics and kinetics
Sequence, structure, and design
[Figure: free energy G of the conformations S0 to S9 of one sequence (5'-end to 3'-end); S0 is the minimum of free energy, S1 to S9 are suboptimal conformations]
The minimum free energy structures on a discrete space of conformations
Sequence space
[Figure: two aligned sequences $I_1$ and $I_2$ that differ at four positions (left blank here): CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG.....]
Hamming distance: $d_H(I_1, I_2) = 4$
(i) $d_H(I_1, I_1) = 0$
(ii) $d_H(I_1, I_2) = d_H(I_2, I_1)$
(iii) $d_H(I_1, I_3) \le d_H(I_1, I_2) + d_H(I_2, I_3)$
The Hamming distance between sequences induces a metric in sequence space
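As an illustration, the Hamming distance and its three metric properties can be checked on a few short sequences. The sequences below are made up for this sketch and do not come from the talk.

```python
def hamming(a, b):
    """Number of positions at which two sequences of equal length differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical example sequences.
    i1, i2, i3 = "GACUAG", "GGCUAC", "AGCUAC"
    print(hamming(i1, i2))                                       # distance between i1 and i2
    print(hamming(i1, i1) == 0)                                  # (i) identity
    print(hamming(i1, i2) == hamming(i2, i1))                    # (ii) symmetry
    print(hamming(i1, i3) <= hamming(i1, i2) + hamming(i2, i3))  # (iii) triangle inequality
```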
Sequence space and structure space
Two measures of distance in shape space: Hamming distance between structures, dH(Si,Sj) and base pair distance, dP(Si,Sj)
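Both distances can be computed from the dot-bracket notation. The sketch below assumes that the Hamming distance compares the two strings symbol by symbol and that the base pair distance counts the pairs that occur in exactly one of the two structures; the function names are hypothetical.

```python
def pair_set(structure):
    """Set of base pairs (i, j), 0-based, of a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def structure_hamming(s1, s2):
    """d_H: number of positions with different dot-bracket symbols."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


def base_pair_distance(s1, s2):
    """d_P: number of base pairs contained in exactly one of the two structures."""
    return len(pair_set(s1) ^ pair_set(s2))


if __name__ == "__main__":
    # Two structures from the list of suboptimal structures shown later in the talk.
    a = "....((((........))))"
    b = "....(((..........)))"
    print(structure_hamming(a, b), base_pair_distance(a, b))
```

Opening a single base pair changes two symbols but only one pair, which is why the two measures differ for the example.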
2. Neutral networks
RNA sequence → RNA structure of minimal free energy
RNA folding: Structural biology, spectroscopy of biomolecules, understanding molecular function
Inverse Folding Algorithm: Iterative determination of a sequence for the given secondary structure
Sequence, structure, and design
Inverse folding of RNA: Biotechnology, design of biomolecules with predefined structures and functions
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Minimum free energy criterion Inverse folding
1st 2nd 3rd trial 4th 5th
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
A mapping and its inversion
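Mapping from sequences to structures, with $\psi$ denoting the mapping: $\psi(I_j) = S_k$
Neutral network as the preimage of a structure: $G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$
Space of genotypes: $\{ I_1, I_2, I_3, I_4, \dots, I_N \}$; Hamming metric
Space of phenotypes: $\{ S_1, S_2, S_3, S_4, \dots, S_M \}$; metric (not required)
$N > M$ (more sequences than structures)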
Degree of neutrality of neutral networks and the connectivity threshold
A multi-component neutral network formed by a rare structure: $\lambda < \lambda_{cr}$
A connected neutral network formed by a common structure: $\lambda > \lambda_{cr}$
Reference for postulation and in silico verification of neutral networks
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures
n = 100, stem-loop structures n = 30
RNA secondary structures and Zipf’s law
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures 3. Shape space covering of common structures
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures 3. Shape space covering of common structures 4. Neutral networks of common structures are connected
RNA 9:1456-1463, 2003
Evidence for neutral networks and shape space covering
Evidence for neutral networks and intersection of aptamer functions
AUGC, n = 100
Columns: folding constraint | degree of neutrality λ | mean length of path h
Unconstrained fold | 0.33 | > 95
Cofold with one sequence | 0.32 | 75
Cofold with two sequences | 0.18 | 40
Folding constraints, degree of neutrality and lengths of neutral path
3. Evolutionary optimization of structure
Evolution in silico
W. Fontana, P. Schuster, Science 280 (1998), 1451-1455
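Replication rate constant: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$ with $\Delta d_S^{(k)} = d_H(S_k, S_\tau)$, the Hamming distance between the structure $S_k$ and the target structure $S_\tau$
Selection constraint: Population size, N = # RNA molecules, is controlled by the flow: $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
Mutation rate: p = 0.001 per site and replication
The flow reactor as a device for studies of evolution in vitro and in silico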
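A small numerical sketch of these two ingredients, under stated assumptions: the replication rate constant is taken to have the form $f = \gamma / (\alpha + d)$, where $d$ is the Hamming distance between a structure and the target structure, and the values of `alpha` and `gamma` are hypothetical. The probability of an error-free copy, $(1 - p)^n$, is the standard expression for independent point mutations and is not stated on the slide; the chain length 76 is likewise an assumption of this example.

```python
def replication_rate(structure, target, alpha=0.01, gamma=1.0):
    """f = gamma / (alpha + d), with d the Hamming distance to the target structure.

    alpha and gamma are hypothetical illustration values.
    """
    if len(structure) != len(target):
        raise ValueError("structures must have equal length")
    distance = sum(a != b for a, b in zip(structure, target))
    return gamma / (alpha + distance)


def error_free_probability(chain_length, p=0.001):
    """Probability that a chain is copied without any point mutation."""
    return (1.0 - p) ** chain_length


if __name__ == "__main__":
    target = "....((((........))))"
    print(replication_rate(target, target))                  # largest rate, reached at the target
    print(replication_rate("....(((..........)))", target))  # smaller rate at distance 2
    print(round(error_free_probability(76), 4))
```

Structures closer to the target replicate faster, which is what drives the population towards the target in the flow reactor.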
Phenylalanyl-tRNA as target structure Randomly chosen initial structure
[Figure: a population of genotypes I1, I2, ..., In with fitness values f1, f2, ..., fn; mutation (matrix Q) creates a new genotype In+1, the genotype-phenotype mapping assigns its structure, and the evaluation of the phenotype yields its fitness value fn+1]
Evolutionary dynamics including molecular phenotypes
In silico optimization in the flow reactor: Evolutionary Trajectory
28 neutral point mutations during a long quasi-stationary epoch
Transition-inducing point mutations change the molecular structure
Neutral point mutations leave the molecular structure unchanged
Neutral genotype evolution during phenotypic stasis
Evolutionary trajectory
Spreading of the population on neutral networks
Drift of the population center in sequence space
Spreading and evolution of a population on a neutral network: snapshots at t = 150, 170, 200, 350, 500, 650, 820, 825, 830, 835, 840, 845, 850, 855
Mount Fuji
Example of a smooth landscape on Earth
Dolomites Bryce Canyon
Examples of rugged landscapes on Earth
Genotype Space Fitness
Start of Walk End of Walk
Evolutionary optimization in absence of neutral paths in sequence space
Genotype Space Fitness
Start of Walk End of Walk Random Drift Periods Adaptive Periods
Evolutionary optimization including neutral paths in sequence space
Grand Canyon
Example of a landscape on Earth with ‘neutral’ ridges and plateaus
4. Suboptimal structures and kinetic folding
The Folding Algorithm
A sequence I specifies an energy ordered set of compatible structures S(I):
S(I) = {S0 , S1 , … , Sm , O}
A trajectory Tk(I) is a time ordered series of structures in S(I). A folding trajectory is defined by starting with the open chain O and ending with the global minimum free energy structure S0 or a metastable structure Sk which represents a local energy minimum:
T0(I) = {O , S (1) , … , S (t-1) , S (t) , S (t+1) , … , S0}
Tk(I) = {O , S (1) , … , S (t-1) , S (t) , S (t+1) , … , Sk}
Master equation
\[ \frac{dP_k}{dt} = \sum_{i=0}^{m+1} \bigl( P_{ik}(t) - P_{ki}(t) \bigr) = \sum_{i=0}^{m+1} k_{ik} P_i - P_k \sum_{i=0}^{m+1} k_{ki}, \qquad k = 0, 1, \dots, m+1 \]
Transition probabilities Pij(t) = Prob{Si→Sj} are defined by
\[ P_{ij}(t) = P_i(t)\, k_{ij} = P_i(t)\, \exp(-\Delta G_{ij}/2RT) \,/\, \Sigma_i \]
\[ P_{ji}(t) = P_j(t)\, k_{ji} = P_j(t)\, \exp(-\Delta G_{ji}/2RT) \,/\, \Sigma_j \]
\[ \Sigma_k = \sum_{i=1,\, i \ne k}^{m+2} \exp(-\Delta G_{ki}/2RT) \]
The symmetric rule for transition rate parameters is due to Kawasaki (K. Kawasaki, Diffusion constants near the critical point for time-dependent Ising models. Phys.Rev. 145:224-230, 1966).
Formulation of kinetic RNA folding as a stochastic process
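The master equation with Kawasaki-type rates can be integrated numerically for a handful of states. The sketch below makes several simplifying assumptions that are not part of the talk: every state can convert into every other state (a real move set only connects neighboring structures), $\Delta G_{ij} = G_j - G_i$, the integration uses plain Euler steps, and $RT = 0.6163$ kcal/mol corresponds to 37 °C. The free energies are taken from the list of suboptimal structures shown in this talk; all function names are hypothetical.

```python
import math

RT = 0.6163  # kcal/mol at 310.15 K, with R = 1.987e-3 kcal/(mol K)


def kawasaki_rates(energies):
    """k[i][j] = exp(-(G_j - G_i) / 2RT) / Sigma_i, where Sigma_i sums over all j != i."""
    n = len(energies)
    rates = []
    for i in range(n):
        weights = [0.0 if j == i else math.exp(-(energies[j] - energies[i]) / (2.0 * RT))
                   for j in range(n)]
        total = sum(weights)
        rates.append([w / total for w in weights])
    return rates


def integrate(energies, p0, dt=0.01, steps=2000):
    """Euler integration of dP_k/dt = sum_i k_ik P_i - P_k sum_i k_ki."""
    k = kawasaki_rates(energies)
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        dp = [sum(k[i][s] * p[i] for i in range(n)) - p[s] * sum(k[s])
              for s in range(n)]
        p = [p[s] + dt * dp[s] for s in range(n)]
    return p


if __name__ == "__main__":
    # Three low-energy structures plus the open chain (0.00), which holds all probability at t = 0.
    energies = [-4.30, -3.50, -3.10, 0.00]
    print([round(x, 3) for x in integrate(energies, [0.0, 0.0, 0.0, 1.0])])
```

Because each row of the rate matrix is normalized by its own $\Sigma_i$, the long-time distribution of this toy model is not exactly the Boltzmann distribution; the sketch is meant to show the bookkeeping of gain and loss terms, not to reproduce the results cited in the talk.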
Corresponds to base pair distance: dP(S1,S2) Base pair formation and base pair cleavage moves for nucleation and elongation of stacks
Base pair closure, opening and shift corresponds to Hamming distance: dH(S1,S2) Base pair shift move of class 1: Shift inside internal loops or bulges
Two measures of distance in shape space: Hamming distance between structures, dH(Si,Sj) and base pair distance, dP(Si,Sj)
[Figure: free energy G of the conformations Sh, S1(h), S2(h), S5(h), S6(h), S7(h), S9(h); Sh is a local minimum, the others are suboptimal conformations]
Search for local minima in conformation space
[Figure: free energy G along a "reaction coordinate": the local minimum Sk and a second local minimum are separated by a saddle point T; the same two minima and the saddle point drawn as a "barrier tree"]
Definition of a 'barrier tree'
CUGCGGCUUUGGCUCUAGCC
....((((........)))) -4.30
(((.(((....))).))).. -3.50
(((..((....))..))).. -3.10
..........(((....))) -2.80
..(((((....)))...)). -2.20
....(((..........))) -2.20
((..(((....)))..)).. -2.00
..((.((....))....)). -1.60
....(((....)))...... -1.60
.....(((........))). -1.50
.((.(((....))).))... -1.40
....((((..(...).)))) -1.40
.((..((....))..))... -1.00
(((.(((....)).)))).. -0.90
(((.((......)).))).. -0.90
....((((..(....))))) -0.80
.....((....))....... -0.80
..(.(((....))))..... -0.60
....(((....)).)..... -0.60
(((..(......)..))).. -0.50
..(((((....)).)..)). -0.50
..(.(((....))).).... -0.40
..((.......))....... -0.30
..........((......)) -0.30
...........((....)). -0.30
(((.(((....)))).)).. -0.20
....(((.(.......)))) -0.20
....(((..((....))))) -0.20
..(..((....))..).... 0.00
.................... 0.00
.(..(((....)))..)... 0.10
M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.
Arrhenius kinetics
Exact solution of the master equation
M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.
[Figure: free energy levels of the structures S0 to S10 of one sequence; S0 and S1 are kinetic structures]
Minimum free energy structure: one sequence - one structure
Suboptimal structures: many suboptimal structures, partition function
Kinetic structures: metastable structures, conformational switches
RNA secondary structures derived from a single sequence
Structure $S_k$
Neutral network $G_k$
Compatible set $C_k$
$G_k \subset C_k$
The compatible set Ck of a structure Sk consists of all sequences which form Sk either as their minimum free energy structure (the neutral network Gk) or as one of their suboptimal structures.
Structure $S_0$, Structure $S_1$
The intersection of two compatible sets is always non-empty: $C_0 \cap C_1 \neq \emptyset$
Reference for the definition of the intersection and the proof of the intersection theorem
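One constructive way to see why the intersection cannot be empty is sketched below; it is an illustration and not necessarily the argument of the reference shown on the slide. Every position has at most one partner in each structure, so the union of both sets of base pairs consists of paths and cycles of even length. Such a graph can be colored with two colors, and assigning G to one color and U to the other makes every base pair of both structures a GU or UG pair. The function names are hypothetical.

```python
from collections import deque


def pair_table(structure):
    """partner[i] is the pairing partner of position i, or None if i is unpaired."""
    partner, stack = [None] * len(structure), []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def sequence_in_intersection(s0, s1):
    """Return a G/U sequence that is compatible with both dot-bracket structures."""
    if len(s0) != len(s1):
        raise ValueError("structures must have equal length")
    p0, p1 = pair_table(s0), pair_table(s1)
    color = [None] * len(s0)
    for start in range(len(s0)):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:  # breadth-first two-coloring of one path or cycle
            v = queue.popleft()
            for w in (p0[v], p1[v]):
                if w is not None and color[w] is None:
                    color[w] = 1 - color[v]
                    queue.append(w)
    return "".join("GU"[c] for c in color)


if __name__ == "__main__":
    # Two structures from the list of suboptimal structures shown earlier.
    print(sequence_in_intersection("....((((........))))", "(((.(((....))).))).."))
```

The resulting sequence is only guaranteed to be compatible with both structures; whether either of them is its minimum free energy structure is a separate question.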
[Figure: the RNA molecule JN1LH (labels 1D, 2D, R) drawn in alternative conformations from the 5'-end to the 3'-end; free energies given in the figure: -28.6, -31.8, -28.2 and -28.6 kcal·mol^-1]
J.H.A. Nagel, C. Flamm, I.L. Hofacker, K. Franke, M.H. de Smit, P. Schuster, and C.W.A. Pleij. Structural parameters affecting the kinetic competition of RNA hairpin formation, Nucleic Acids Res., in press 2005.
An RNA switch
[Figure: barrier tree with local minima numbered 1 to 50 and barrier heights at the branching points; vertical axis: free energy [kcal / mole], from -26.0 to -50.0]
J1LH barrier tree
A ribozyme switch
E.A. Schultes, D.P. Bartel, Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: An artificial ligase (A) and a natural cleavage ribozyme of hepatitis delta virus (B)
The sequence at the intersection: An RNA molecule which is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
5. Comparison of kinetic folding and evolution
Kinetic Folding
Compatible structures: set of structures compatible with a given sequence
Stability restriction: conformation space
Folding trajectory in conformation space: time-ordered series of structures
Folding process: average over trajectories on the ensemble level
Criterion: minimizing free energy
Evolutionary optimization
Compatible sequences: set of sequences compatible with a given structure
mfe restriction: neutral network
Genealogy on a neutral network: time-ordered series of sequences
Optimization process: average over genealogies on the population level
Criterion: maximizing fitness
Acknowledgement of support
Fonds zur Förderung der wissenschaftlichen Forschung (FWF): Projects No. 09942, 10578, 11065, 13093, 13887, and 14898
Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF): Project No. Mat05
Jubiläumsfonds der Österreichischen Nationalbank: Project No. Nat-7813
European Commission: Contracts No. 98-0189, 12835 (NEST)
Austrian Genome Research Program GEN-AU: Bioinformatics Network (BIN)
Österreichische Akademie der Wissenschaften
Berlin-Brandenburgische Akademie der Wissenschaften
Siemens AG, Austria
Universität Wien and the Santa Fe Institute
Coworkers
Peter Stadler, Bärbel M. Stadler, Universität Leipzig, GE
Paul E. Phillipson, University of Colorado at Boulder, CO
Heinz Engl, Philipp Kügler, James Lu, Stefan Müller, RICAM Linz, AT
Jord Nagel, Kees Pleij, Universiteit Leiden, NL
Walter Fontana, Harvard Medical School, MA
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Ulrike Göbel, Walter Grüner, Stefan Kopp, Jaqueline Weber, Institut für Molekulare Biotechnologie, Jena, GE
Ivo L. Hofacker, Christoph Flamm, Andreas Svrček-Seiler, Universität Wien, AT
Kurt Grünberger, Michael Kospach, Andreas Wernitznig, Stefanie Widder, Michael Wolfinger, Stefan Wuchty, Universität Wien, AT
Jan Cupal, Stefan Bernhart, Lukas Endler, Ulrike Langhammer, Rainer Machne, Ulrike Mückstein, Hakim Tafer, Thomas Taylor, Universität Wien, AT