The Role of Neutral Networks in Evolutionary Optimization: RNA Structures as an Example
Peter Schuster
Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien
Understanding Complex Systems, Urbana (IL), 17.–20.05.2004
Web-Page for further information: http://www.tbi.univie.ac.at/~pks
Conformational and mutational landscapes of biomolecules as well as fitness landscapes of evolutionary biology are rugged.
[Figure: fitness plotted over genotype space, with the start and the end of a walk marked]
Adaptive or non-descending walks on rugged landscapes commonly end at one of the low-lying local maxima.
[Figure: fitness plotted over genotype space, with the start and the end of a walk marked]
Selective neutrality in the form of neutral networks plays an active role in evolutionary optimization and enables populations to reach high local maxima or even the global optimum.
1. What is a neutral network? 2. RNA secondary structures and neutrality 3. Optimization on neutral networks 4. Some experiments with RNA molecules
1. What is a neutral network?
"... Variations neither useful nor injurious would not be affected by natural selection, and would be left either a fluctuating element, as perhaps we see in certain polymorphic species, or would ultimately become fixed, owing to the nature of the organism and the nature of the conditions. ..."
Charles Darwin, Origin of species (1859)
The molecular clock of evolution
Motoo Kimura's population genetics of neutral evolution. Evolutionary rate at the molecular level. Nature 217: 624-626, 1968. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK, 1983.
A mapping and its inversion
$\psi(I_j) = S_k$, $G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$
Space of genotypes: $\mathcal{I} = \{ I_1, I_2, I_3, I_4, \ldots, I_N \}$; Hamming metric
Space of phenotypes: $\mathcal{S} = \{ S_1, S_2, S_3, S_4, \ldots, S_M \}$; metric (not required)
$N \gg M$
[Figure: two aligned copies of the sequence CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG..... that differ by point mutations at four positions]
Hamming distance: $d_H(I_1, I_2) = 4$
(i) $d_H(I_1, I_1) = 0$
(ii) $d_H(I_1, I_2) = d_H(I_2, I_1)$
(iii) $d_H(I_1, I_3) \le d_H(I_1, I_2) + d_H(I_2, I_3)$
The Hamming distance between genotypes induces a metric in sequence space
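The three metric properties listed on this slide can be checked by brute force on a small sequence space. The sketch below is only an illustration: the function names and the choice of all dimers over the AUGC alphabet are assumptions, not part of the talk.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_metric_on(seqs) -> bool:
    """Check identity, symmetry, and the triangle inequality on all triples."""
    for a, b, c in product(seqs, repeat=3):
        if (hamming(a, b) == 0) != (a == b):
            return False
        if hamming(a, b) != hamming(b, a):
            return False
        if hamming(a, c) > hamming(a, b) + hamming(b, c):
            return False
    return True


# All 16 dimers over the natural alphabet (an illustrative, tiny sequence space).
ALL_DIMERS = ["".join(p) for p in product("AUGC", repeat=2)]
```

Calling `is_metric_on(ALL_DIMERS)` runs through all 4096 triples; larger chain lengths work the same way but grow quickly.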
$S_k = \psi(I_{\cdot})$, $f_k = f(S_k)$
Sequence space → structure space → real numbers
Mapping from sequence space into structure space and into function
The pre-image of the structure Sk in sequence space is the neutral network Gk
Neutral networks are sets of sequences that form the same object in a phenotype space. The neutral network $G_k$ is, for example, the pre-image of the structure $S_k$ in sequence space: $G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$. The set is converted into a graph by connecting all pairs of sequences of Hamming distance one.
Neutral networks of small biomolecules can be computed by exhaustive folding of complete sequence spaces, i.e. of all RNA sequences of a given chain length $n$. Their number, $N = 4^n$, grows exponentially with chain length and soon becomes prohibitive for numerical computation.
Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches that of the neutral network to be studied.
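The random graph approach described here can be sketched for very short chains. In the illustration below, a fixed number of sequences is drawn at random from the full sequence space, sequences of Hamming distance one are connected, and the connected components are listed. All function names and the use of uniform sampling are assumptions made for this sketch.

```python
import random
from itertools import product


def random_neutral_set(n, alphabet, size, rng):
    """Draw `size` distinct sequences of length n uniformly at random."""
    all_seqs = ["".join(p) for p in product(alphabet, repeat=n)]
    return set(rng.sample(all_seqs, size))


def neighbors(seq, alphabet):
    """Yield all sequences at Hamming distance one from seq."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def components(nodes, alphabet):
    """Connected components of the graph induced on `nodes`, largest first."""
    unseen = set(nodes)
    result = []
    while unseen:
        start = unseen.pop()
        stack, comp = [start], {start}
        while stack:
            cur = stack.pop()
            for nb in neighbors(cur, alphabet):
                if nb in unseen:
                    unseen.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        result.append(comp)
    return sorted(result, key=len, reverse=True)
```

With `rng = random.Random(0)`, for example, `components(random_neutral_set(4, "AUGC", 50, rng), "AUGC")` returns the components of a random 50-node set; the first entry is the largest component.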
Random graph approach to neutral networks: sketch of sequence space, shown at steps 0, 1, 2, 3, 4, 5, 10, 15, 25, 50, 75, and 100
$\lambda_j = 12/27 = 0.444$, $\bar{\lambda}_k = \dfrac{\sum_{j \in G_k} \lambda_j(k)}{|G_k|}$
$G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$
$\bar{\lambda}_k > \lambda_{cr}$: network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$: network $G_k$ is not connected
The parameter $\kappa$ is the size of the alphabet underlying the strings in sequence space.

κ    λ_cr
2    0.5
3    0.423
4    0.370
Mean degree of neutrality λ and connectivity of neutral networks
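The tabulated thresholds follow directly from $\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$, and the single-sequence example $\lambda_j = 12/27$ corresponds to 12 neutral neighbours out of 27. A minimal sketch for reproducing these numbers (function names are assumptions of this illustration):

```python
def connectivity_threshold(kappa: int) -> float:
    """lambda_cr = 1 - kappa**(-1/(kappa-1)) for an alphabet of size kappa."""
    if kappa < 2:
        raise ValueError("alphabet size must be at least 2")
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def mean_neutrality(local_values) -> float:
    """Average the local degrees of neutrality lambda_j over a network."""
    values = list(local_values)
    if not values:
        raise ValueError("need at least one sequence")
    return sum(values) / len(values)


def is_connected_by_threshold(mean_lambda: float, kappa: int) -> bool:
    """Random-graph criterion: connected if mean neutrality exceeds lambda_cr."""
    return mean_lambda > connectivity_threshold(kappa)
```

For instance, a network with mean neutrality 0.444 lies above the four-letter threshold of 0.370 but below the two-letter threshold of 0.5.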
Giant Component
A neutral network below connectivity threshold
A connected neutral network
2. RNA secondary structures and neutrality
[Figure: chemical structure of an RNA chain from the 5'-end to the 3'-end, showing the sugar-phosphate backbone with Na counterions and the nucleobases N1 to N4, where N_k = A, U, G, or C; next to it, the secondary structure of a tRNA with positions 10 to 70 marked]
Definition of RNA structure
[Figure: a tRNA sequence, written from the 5'-end to the 3'-end with positions 10 to 70 marked, and its secondary structure]
GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA
Definition and physical relevance of RNA secondary structures
RNA secondary structures are listings of Watson-Crick and GU wobble base pairs that are free of knots and pseudoknots.
"Secondary structures are folding intermediates in the formation of full three-dimensional structures." D. Thirumalai, N. Lee, S.A. Woodson, and D.K. Klimov, Annu. Rev. Phys. Chem. 52:751-762 (2001).
James D. Watson and Francis H.C. Crick, Nobel Prize 1962
1953 – 2003: fifty years double helix
Stacking of base pairs in nucleic acid double helices (B-DNA)
[Figure: Watson-Crick type base pairs, U=A and C≡G, with atom numbering, C1'-C1' distances of 10.72 Å and 10.44 Å, and angles of 54.4 and 55.7 and of 56.2 and 57.4]
[Figure: wobble base pairs G=U and U=G, showing their deviation from Watson-Crick geometry]
RNA sequence → RNA structure of minimal free energy (empirical parameters; biophysical chemistry: thermodynamics and kinetics)
Sequence, structure, and design
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Inverse folding of RNA: biotechnology, design of biomolecules with predefined structures and functions
[Figure labels: hairpin loop, stack, free end, joint, internal loop, bulge, multiloop]
Elements of RNA secondary structures as used in free energy calculations
[Figure: an RNA sequence folded into its secondary structure, 5'-end to 3'-end]
Free energy of stacking < 0
$\Delta G_0^{300} = \sum_{\text{stacks of base pairs}} g_{ij,kl} + \sum_{\text{hairpin loops}} h(n_l) + \sum_{\text{bulges}} b(n_b) + \sum_{\text{internal loops}} i(n_i) + \ldots$
Folding of RNA sequences into secondary structures of minimal free energy, $\Delta G_0^{300}$
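Minimum free energy folding uses empirical stacking and loop parameters that are not reproduced in these slides. As a rough stand-in, the sketch below uses a different and much simpler technique, Nussinov-style base-pair maximization: it finds the largest number of nested Watson-Crick and GU pairs under a minimal hairpin loop size. It illustrates the dynamic-programming idea only; it is not the energy model of the talk, and all names are assumptions.

```python
# Watson-Crick and GU wobble pairs.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic programming)."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in CAN_PAIR:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For the hairpin `GGGAAACCC` the three G-C pairs enclose a loop of exactly three bases, so the function reports three pairs.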
[Figure: conformations S0(h) to S9(h) arranged by free energy ΔG; S0(h) is the minimum of free energy, S1(h) to S9(h) are suboptimal conformations]
The minimum free energy structures on a discrete space of conformations
[Figure: the tRNA sequence shown above, drawn from 5'-end to 3'-end as sequence, as secondary structure, and in symbolic notation]
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
Hamming distance: $d_H(S_1, S_2) = 4$
(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$
The Hamming distance between structures in parentheses notation forms a metric in structure space
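In parentheses notation a structure is a string in which matching "(" and ")" mark a base pair and "." marks an unpaired base. The sketch below parses such a string into base pairs and computes the Hamming distance between two structure strings; the function names and the example structures are assumptions made for illustration.

```python
def parse_dot_bracket(structure: str):
    """Return the sorted list of base pairs (i, j), 0-based, of a structure string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing parenthesis")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced opening parenthesis")
    return sorted(pairs)


def structure_distance(s1: str, s2: str) -> int:
    """Hamming distance between two structures in parentheses notation."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))
```

For example, `((...))` and `(.....)` differ at two positions, so their structure distance is 2.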
Minimal hairpin loop size: $n_{lp} \ge 3$. Minimal stack length: $n_{st} \ge 2$.
Recursion formula for the number of acceptable RNA secondary structures
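The recursion itself did not survive in this text. As a hedged illustration of how such a count works, the sketch below implements the simpler classical recursion that enforces only the minimal hairpin loop size: the last base is either unpaired or paired with an earlier base $k$, which splits the chain into two independent parts. The minimal stack length constraint is not enforced here, so these numbers are larger than the count of acceptable structures referred to on the slide.

```python
def count_structures(n: int, min_loop: int = 3) -> int:
    """Number of secondary structures on n bases with hairpin loops >= min_loop.

    Recursion: N(m) = N(m-1) + sum_k N(k-1) * N(m-k-1), where base m pairs with
    base k and at least min_loop bases lie between them. Stack length is NOT
    constrained (a simplification relative to the slide).
    """
    if n < 0:
        raise ValueError("chain length must be non-negative")
    counts = [1] * (n + 1)
    for length in range(1, n + 1):
        total = counts[length - 1]  # last base unpaired
        for k in range(1, length - min_loop):  # last base paired with base k
            total += counts[k - 1] * counts[length - k - 1]
        counts[length] = total
    return counts[n]
```

With `min_loop=3`, chains of up to four bases admit only the open structure, and a chain of six bases admits four structures: the open chain plus the single pairs (1,5), (2,6), and (1,6).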
Computed numbers of minimum free energy structures over different nucleotide alphabets
P. Schuster, Molecular insights into evolution of phenotypes. In: J. Crutchfield & P. Schuster, Evolutionary Dynamics. Oxford University Press, New York 2003, pp. 163-215.
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Criterion of Minimum Free Energy
Sequence Space Shape Space
Reference for postulation and in silico verification of neutral networks
Evolution in silico: W. Fontana, P. Schuster, Science 280 (1998), 1451-1455
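$\lambda_j = 12/27 = 0.444$, $\bar{\lambda}_k = \dfrac{\sum_{j \in G_k} \lambda_j(k)}{|G_k|}$
$G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$
$\bar{\lambda}_k > \lambda_{cr}$: network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$: network $G_k$ is not connected
Alphabet size $\kappa$: AUGC corresponds to $\kappa = 4$

κ    λ_cr     Alphabets
2    0.5      GC, AU
3    0.423    GUC, AUG
4    0.370    AUGC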
Mean degree of neutrality and connectivity of neutral networks
A connected neutral network formed by a common structure
Giant Component
A multi-component neutral network formed by a rare structure
[Figure: a structure and a compatible sequence, drawn from 5'-end to 3'-end]
Single nucleotides (A, U, G, C): single bases are varied independently.
Base pairs (AU, UA, GC, CG, GU, UG): base pairs are varied in strict correlation.
[Figure: a structure with several compatible sequences]
[Figure: a structure with an incompatible sequence]
[Figure: the neutral network $G_k$ of the structure $S_k$ as a subset of the compatible set $C_k$: $G_k \subset C_k$]
The compatible set $C_k$ of a structure $S_k$ consists of all sequences that form $S_k$ either as their minimum free energy structure (these make up the neutral network $G_k$) or as one of their suboptimal structures.
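Membership in the compatible set can be tested without folding: a sequence is compatible with a structure if every base pair of the structure is occupied by one of the six allowed pairs (AU, UA, GC, CG, GU, UG). A minimal sketch, assuming the structure is given in parentheses notation; the function name and example sequences are illustrative.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_compatible(seq: str, structure: str) -> bool:
    """True if every base pair of `structure` can be formed by `seq`."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    stack = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing parenthesis")
            i = stack.pop()
            if (seq[i], seq[j]) not in ALLOWED_PAIRS:
                return False
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced opening parenthesis")
    return True
```

Compatibility is necessary but not sufficient for membership in the neutral network: a compatible sequence may still fold into a different minimum free energy structure.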
Structure $S_0$, Structure $S_1$
The intersection of two compatible sets is always non-empty: $C_0 \cap C_1 \neq \emptyset$
Reference for the definition of the intersection and the proof of the intersection theorem
[Figure: barrier tree with the local minima numbered 1 to 49 and barrier labels 5.10, 3.30, 7.40, 3.10, and 5.90; the two (meta)stable conformations are S0 and S1, and S0 to S10 are suboptimal structures; kinetic folding is considered for a finite folding time t]
A typical energy landscape of a sequence with two (meta)stable conformations
3. Optimization on neutral networks
[Figure: flow reactor with stock solution and reaction mixture]
Replication rate constant: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$, with $\Delta d_S^{(k)} = d_H(S_k, S_\tau)$
Selection constraint: the number of RNA molecules is controlled by the flow, $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
The flow reactor as a device for studies of evolution in vitro and in silico
Evaluation of RNA secondary structures yields replication rate constants
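The replication rate constant on this slide decreases with the Hamming distance between a structure and the target structure. The sketch below evaluates a rate of the form $f_k = \gamma / (\alpha + \Delta d_S^{(k)})$; the symbols for the two constants are partly lost in this text, so the names `gamma` and `alpha` and their default values of 1.0 are assumptions chosen purely for illustration.

```python
def replication_rate(structure: str, target: str,
                     alpha: float = 1.0, gamma: float = 1.0) -> float:
    """f_k = gamma / (alpha + d_H(S_k, S_target)); alpha, gamma are hypothetical."""
    if len(structure) != len(target):
        raise ValueError("structures must have equal length")
    if alpha <= 0:
        raise ValueError("alpha must be positive to keep the rate finite")
    # Hamming distance between the two structures in parentheses notation.
    distance = sum(a != b for a, b in zip(structure, target))
    return gamma / (alpha + distance)
```

With these defaults, a structure identical to the target replicates at rate 1.0, and each additional mismatching position lowers the rate.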
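Mutation → genotype-phenotype mapping, $S_j = \psi(I_j)$ → evaluation of the phenotype, $f_j = f(S_j)$
[Figure: sequences $I_1, \ldots, I_n$ with replication rate constants $f_1, \ldots, f_n$; mutation, described by the matrix $Q$, produces a new sequence $I_{n+1}$ with rate constant $f_{n+1}$]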
Evolutionary dynamics including molecular phenotypes
[Figure: a randomly chosen initial structure and phenylalanyl-tRNA as the target structure, each drawn from 5'-end to 3'-end]
In silico optimization in the flow reactor: trajectory (physicists' view)
[Figure: evolutionary trajectory; average structure distance to target, $\Delta d_S$ (10 to 50), plotted against time (arbitrary units, 250 to 1250)]
[Figure, shown in six stages: final part of the evolutionary trajectory, average structure distance to target $\Delta d_S$ against time (up to 1250), with relay steps 36 to 44 marked by number]
End conformation of optimization: 44
Reconstruction of the last step: 43 → 44
Reconstruction of the last-but-one step: 42 → 43 (→ 44)
Reconstruction of step 41 → 42 (→ 43 → 44)
Reconstruction of step 40 → 41 (→ 42 → 43 → 44)
Reconstruction of the relay series: steps 39 to 44, evolutionary process versus reconstruction
Change in RNA sequences during the final five relay steps, 39 → 44 (transition-inducing point mutations and neutral point mutations are marked)
In silico optimization in the flow reactor: trajectory and relay steps
[Figure: evolutionary trajectory and relay steps; average structure distance to target, $\Delta d_S$ (10 to 50), plotted against time (arbitrary units, 250 to 1250)]
[Figure: evolutionary trajectory between times 250 and 500 (arbitrary units), average structure distance to target $\Delta d_S$ between 10 and 20, with the uninterrupted presence of relay steps 08 to 14]
28 neutral point mutations during a long quasi-stationary epoch (transition-inducing point mutations and neutral point mutations are marked)
Neutral genotype evolution during phenotypic stasis
Variation in genotype space during optimization of phenotypes
Mean Hamming distance within the population and drift velocity of the population center in sequence space.
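The mean Hamming distance within a population measures how far the population has spread in sequence space. A minimal sketch, assuming the population is given as a list of equal-length sequences and averaging over all distinct pairs; the function names and the toy population are illustrative.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(population) -> float:
    """Average Hamming distance over all distinct pairs of individuals."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)
```

For the toy population `["AAAA", "AAAU", "AAUU"]` the pairwise distances are 1, 2, and 1, giving a mean of 4/3. Duplicate sequences count as separate individuals, as they would in a population sample.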
Spread of population in sequence space during a quasistationary epoch: snapshots at t = 150, 170, 200, 350, 500, 650, 820, 825, 830, 835, 840, 845, 850, and 855
Movies of optimization trajectories over the AUGC and the GC alphabet
Alphabet    Runtime    Transitions    Main transitions    No. of runs
AUGC          385.6       22.5            12.6               1017
GUC           448.9       30.5            16.5                611
GC           2188.3       40.0            20.6                107
Statistics of trajectories and relay series (mean values of log-normal distributions).
AUGC neutral networks of tRNAs are near the connectivity threshold; GC neutral networks are far below it.
Mount Fuji
Example of a smooth landscape on Earth
Dolomites Bryce Canyon
Examples of rugged landscapes on Earth
[Figure: fitness plotted over genotype space, with the start and the end of the walk marked]
Evolutionary optimization in absence of neutral paths in sequence space
[Figure: fitness plotted over genotype space, with the start and the end of the walk marked; random drift periods alternate with adaptive periods]
Evolutionary optimization including neutral paths in sequence space
Grand Canyon
Example of a landscape on Earth with ‘neutral’ ridges and plateaus
Neutral ridges and plateaus
4. Some experiments with RNA molecules
Structure $S_0$, Structure $S_1$
The intersection of two compatible sets is always non-empty: $C_0 \cap C_1 \neq \emptyset$
A ribozyme switch
E.A. Schultes, D.P. Bartel, Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis delta virus (B)
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
J.H.A. Nagel, J. Møller-Jensen, C. Flamm, K.J. Öistämö, J. Besnard, I.L. Hofacker, A.P. Gultyaev, M.H. de Smit, P. Schuster, K. Gerdes and C.W.A. Pleij. The refolding mechanism of the metastable structure in the 5’-end of the hok mRNA of plasmid R1, submitted 2004. J.H.A. Nagel, C. Flamm, I.L. Hofacker, K. Franke, M.H. de Smit, P. Schuster, and C.W.A. Pleij. Structural parameters affecting the kinetic competition of RNA hairpin formation, in press 2004.
[Figure: barrier tree with the local minima numbered 1 to 49, barrier labels 5.1, 3.3, 7.4, 3.1, and 5.9, and the two structures S0 and S1]
Barrier tree of a sequence forming two distinct hairpin structures
RNA 9:1456-1463, 2003
Evidence for neutral networks and shape space covering
Evidence for neutral networks and intersection of aptamer functions
Acknowledgement of support
Fonds zur Förderung der wissenschaftlichen Forschung (FWF): Projects No. 09942, 10578, 11065, 13093, 13887, and 14898
Jubiläumsfonds der Österreichischen Nationalbank: Project No. Nat-7813
European Commission: Project No. EU-980189
Austrian Genome Research Program – GEN-AU
Siemens AG, Austria
The Santa Fe Institute and the Universität Wien
The software for producing RNA movies was developed by Robert Giegerich and coworkers at the Universität Bielefeld.
Universität Wien
Coworkers
Walter Fontana, Santa Fe Institute, NM
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Peter Stadler, Bärbel Stadler, Universität Leipzig, GE
Jord Nagel, Kees Pleij, Universiteit Leiden, NL
Ivo L. Hofacker, Christoph Flamm, Universität Wien, AT
Andreas Wernitznig, Michael Kospach, Universität Wien, AT
Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder
Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty
Stefan Bernhart, Lukas Endler
Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE
Walter Grüner, Stefan Kopp, Jaqueline Weber