slide-1
SLIDE 1
slide-2
SLIDE 2

Evolutionary Optimization of Molecules

From Mathematical Modeling to Confirmation by Experiment

Peter Schuster
Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien
DMV-Jahrestagung 2002, Halle an der Saale, 16.–21.09.2002

slide-3
SLIDE 3

The Darwinian principle of optimization rests on three prerequisites.

  • 1. Reproduction of organisms through multiplication of phenotypes. Reproductive efficiency is measured as the number of fertile offspring, or fitness.
  • 2. Variation of genotypes through copying errors and recombination. The genotypes or genomes are the carriers of genetic information.
  • 3. Selection through differences in the fitness of phenotypes.

Two additional prerequisites:

  • 4. A sufficiently large number of different genotypes and a sufficiently large diversity of phenotypes.
  • 5. A relation between genotypes and phenotypes that supports optimization.

The relation between genotypes and phenotypes is understood as a mapping from a space of genotypes into a space of phenotypes.

slide-4
SLIDE 4

The reason for the success and universal applicability of the Darwinian principle of optimization is, at the same time, the reason for its decisive limitation: the internal structures of the reproducing units enter only in the form of fitness parameters. It is immaterial whether molecules, non-autonomous or autonomous organisms, colonies, multicellular organisms, or societies are being multiplied. In this form, the theory of biological evolution offers only a purely ordering, macroscopic description of the observable phenomena.

slide-5
SLIDE 5

1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory

slide-6
SLIDE 6

For non-zero mutation rates (q<1 or p>0), the Darwinian principle of optimization can be understood only as an optimization heuristic. It holds only in part of the simplex of relative concentrations. With increasing mutation rate p, the part of concentration space in which the optimization principle holds becomes ever smaller. Analogously, for the selection-recombination model, Fisher's optimization criterion is valid only when restricted to the one-gene (single locus) model.

slide-7
SLIDE 7

Evolutionary Optimization of Molecules

From mathematical models to confirmation by experiment

Peter Schuster
Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien
DMV-Jahrestagung 2002, Halle an der Saale, 16.–21.09.2002

slide-8
SLIDE 8

The Darwinian principle of optimization is built on three prerequisites:

  • 1. Reproduction of organisms through multiplication of phenotypes. Efficiency of reproduction is measured as fitness, which is tantamount to the number of fertile descendants that are brought into the next generation.
  • 2. Variation of genotypes through copying errors and recombination. The genotypes or genomes are the carriers of genetic information.
  • 3. Selection through differences in the fitness of phenotypes.

Two additional prerequisites:

  • 4. A large enough number of genotypes and a sufficiently large reservoir of diversity of phenotypes.
  • 5. A relation between genotypes and phenotypes that supports optimization through variation and selection.

The relation between genotypes and phenotypes is understood as a mapping from a space of genotypes onto a space of phenotypes.

slide-9
SLIDE 9

The basis for the success and universal applicability of the Darwinian principle of optimization is, at the same time, its most serious limitation: the internal structures of the reproducing units are addressed only in terms of fitness parameters. Therefore, it does not matter whether multiplication concerns molecules, non-autonomous or autonomous cells, colonies, multicellular organisms, or societies. In this form, the theory of biological evolution can provide only a macroscopic description and classification, as well as ordering relations, of the observed phenomena.

slide-10
SLIDE 10

1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory

slide-11
SLIDE 11

1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory

slide-12
SLIDE 12

[Figure: replication scheme with the steps synthesis, complex dissociation, synthesis; plus and minus strands shown with their 5' and 3' ends]

Complementary replication as the simplest copying mechanism of RNA. Complementarity is determined by Watson-Crick base pairs: G≡C and A=U

slide-13
SLIDE 13

Replication reactions, with [A] = a = constant:

$$ (A) + I_i \xrightarrow{\,f_i\,} 2\,I_i ; \quad i = 1,2,\ldots,n $$

$$ \frac{dx_i}{dt} = f_i x_i - x_i \Phi = x_i\,(f_i - \Phi) ; \quad \Phi = \sum_{j} f_j x_j ; \quad \sum_{j} x_j = 1 ; \quad i,j = 1,2,\ldots,n $$

$$ [I_i] = x_i \ge 0 ; \quad f_m = \max\{ f_j ;\ j = 1,2,\ldots,n \} ; \quad x_m(t) \to 1 \ \text{for} \ t \to \infty $$

Reproduction of organisms or replication of molecules as the basis of selection

slide-14
SLIDE 14

Selection equation: [I_i] = x_i ≥ 0, f_i > 0

$$ \frac{dx_i}{dt} = x_i\,(f_i - \phi), \quad i = 1,2,\ldots,n ; \qquad \sum_{i=1}^{n} x_i = 1 ; \qquad \phi = \sum_{j=1}^{n} f_j x_j = \bar f $$

Mean fitness or dilution flux, φ(t), is a non-decreasing function of time:

$$ \frac{d\phi}{dt} = \sum_{i=1}^{n} f_i \frac{dx_i}{dt} = \overline{f^2} - \left(\bar f\right)^2 = \mathrm{var}\{f\} \ge 0 $$

Solutions are obtained by integrating factor transformation:

$$ x_i(t) = \frac{x_i(0)\,\exp(f_i t)}{\sum_{j=1}^{n} x_j(0)\,\exp(f_j t)} ; \quad i = 1,2,\ldots,n $$
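The selection equation and its closed-form solution can be checked numerically. The following is a minimal sketch, not part of the talk: the three fitness values, the initial concentrations, and the explicit Euler step size are arbitrary choices made for illustration.

```python
import math

def mean_fitness(x, f):
    """phi = sum_j f_j x_j, the mean fitness (dilution flux)."""
    return sum(fi * xi for fi, xi in zip(f, x))

def euler_step(x, f, dt):
    """One explicit Euler step of dx_i/dt = x_i (f_i - phi)."""
    phi = mean_fitness(x, f)
    return [xi + dt * xi * (fi - phi) for xi, fi in zip(x, f)]

def closed_form(x0, f, t):
    """x_i(t) = x_i(0) exp(f_i t) / sum_j x_j(0) exp(f_j t)."""
    fmax = max(f)  # shift exponents by the largest fitness to avoid overflow
    w = [xi * math.exp((fi - fmax) * t) for xi, fi in zip(x0, f)]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical example values (assumptions, not taken from the slides).
f = [1.0, 1.5, 2.0]
x0 = [0.90, 0.09, 0.01]
dt, steps = 0.001, 10000

x = list(x0)
phis = []
for _ in range(steps):
    phis.append(mean_fitness(x, f))
    x = euler_step(x, f, dt)

exact = closed_form(x0, f, dt * steps)
print("numerical:", [round(v, 4) for v in x])
print("closed form:", [round(v, 4) for v in exact])
```

In this sketch the recorded values of the mean fitness never decrease along the trajectory, and the variant with the largest f_i takes over the population, in line with x_m(t) → 1.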

slide-15
SLIDE 15

s = (f_2 − f_1) / f_1 ; f_2 > f_1 ; x_1(0) = 1 − 1/N ; x_2(0) = 1/N

[Figure: fraction of the advantageous variant (0 to 1) versus time in generations (0 to 1000) for s = 0.1, s = 0.02, and s = 0.01]

Selection of advantageous mutants in populations of N = 10 000 individuals
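For two variants the closed-form solution of the selection equation reduces to a logistic curve. The sketch below evaluates it for the three selection coefficients named on the slide. It assumes that one generation corresponds to the time unit 1/f_1, which is an interpretation made here and not stated in the source.

```python
import math

def mutant_fraction(t, s, N, f1=1.0):
    """x_2(t) for x_2(0) = 1/N and f_2 = f_1 (1 + s).

    From x_2/x_1 = exp(s f_1 t) / (N - 1) and x_1 + x_2 = 1.
    """
    return 1.0 / (1.0 + (N - 1) * math.exp(-s * f1 * t))

def half_time(s, N, f1=1.0):
    """Time at which the advantageous variant reaches a fraction of one half."""
    return math.log(N - 1) / (s * f1)

N = 10_000
for s in (0.1, 0.02, 0.01):
    print(f"s = {s}: x_2 = 1/2 after about {half_time(s, N):.0f} generations")
```

Under this assumption the half time scales as ln(N − 1)/s, so a tenfold smaller advantage needs a tenfold longer time.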

slide-16
SLIDE 16

[Figure: copying of a plus strand into a minus strand with errors; three types of mutation are illustrated: point mutation, insertion, deletion]

Mutations in nucleic acids represent the mechanism of variation of genotypes.

slide-17
SLIDE 17

Theory of molecular evolution

  • M.Eigen, Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58 (1971), 465-526
  • C.J.Thompson, J.L.McBride, On Eigen's theory of the self-organization of matter and the evolution of biological macromolecules. Math.Biosci. 21 (1974), 127-142
  • B.L.Jones, R.H.Enns, S.S.Rangnekar, On the theory of selection of coupled macromolecular systems. Bull.Math.Biol. 38 (1976), 15-28
  • M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part A: Emergence of the hypercycle. Naturwissenschaften 64 (1977), 541-565
  • M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part B: The abstract hypercycle. Naturwissenschaften 65 (1978), 7-41
  • M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part C: The realistic hypercycle. Naturwissenschaften 65 (1978), 341-369
  • J.Swetina, P.Schuster, Self-replication with errors - A model for polynucleotide replication. Biophys.Chem. 16 (1982), 329-345
  • J.S.McCaskill, A localization threshold for macromolecular quasispecies from continuously distributed replication rates. J.Chem.Phys. 80 (1984), 5194-5202
  • M.Eigen, J.McCaskill, P.Schuster, The molecular quasispecies. Adv.Chem.Phys. 75 (1989), 149-263
  • C.Reidys, C.Forst, P.Schuster, Replication and mutation on neutral networks. Bull.Math.Biol. 63 (2001), 57-94

slide-18
SLIDE 18

A template I_j is copied into any of the sequences I_1, I_2, ..., I_n, with [A] = a = constant:

$$ (A) + I_j \xrightarrow{\,f_j Q_{ji}\,} I_i + I_j ; \quad i = 1,2,\ldots,n $$

$$ \frac{dx_i}{dt} = \sum_{j} f_j Q_{ji}\, x_j - x_i \Phi ; \quad \Phi = \sum_{j} f_j x_j ; \quad \sum_{j} x_j = 1 ; \quad \sum_{i} Q_{ij} = 1 ; \quad [I_i] = x_i \ge 0 ; \quad i = 1,2,\ldots,n $$

$$ Q_{ij} = (1-p)^{\,l - d(i,j)}\; p^{\,d(i,j)} $$

p .......... Error rate per digit
d(i,j) .... Hamming distance between I_i and I_j
l ........... Chain length of the polynucleotide

Chemical kinetics of replication and mutation as parallel reactions
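The mutation matrix can be built directly from the Hamming distance. The sketch below does this for binary sequences, where the uniform error model with error rate p per digit gives exactly the expression (1 − p)^(l − d) p^d. The chain length 4 and p = 0.05 are illustrative assumptions. For a four-letter alphabet the error would additionally have to be shared among the three wrong letters, which this sketch does not cover.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def mutation_matrix(l, p):
    """Q[i][j] = (1 - p)^(l - d(i,j)) * p^d(i,j) over all binary sequences of length l."""
    seqs = ["".join(bits) for bits in product("01", repeat=l)]
    Q = [[(1 - p) ** (l - hamming(a, b)) * p ** hamming(a, b) for b in seqs]
         for a in seqs]
    return seqs, Q

# Hypothetical example values (assumptions, not taken from the slides).
seqs, Q = mutation_matrix(l=4, p=0.05)
column_sums = [sum(Q[i][j] for i in range(len(seqs))) for j in range(len(seqs))]
print("largest deviation of a column sum from 1:",
      max(abs(s - 1.0) for s in column_sums))
```

Because the Hamming distance is symmetric, Q is symmetric as well, so both the row sums and the column sums equal one in this model.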

slide-19
SLIDE 19

[Figure: stationary distribution plotted against the error rate p = 1 − q, from 0.00 to 0.10, with regions labeled "Quasispecies" and "Uniform distribution"]

Quasispecies as a function of the replication accuracy q

slide-20
SLIDE 20

[Figure: concentration plotted over sequence space, showing the master sequence and the surrounding mutant cloud]

The molecular quasispecies in sequence space

slide-21
SLIDE 21

Mutation-selection equation: [I_i] = x_i ≥ 0, f_i > 0, Q_ij ≥ 0

$$ \frac{dx_i}{dt} = \sum_{j=1}^{n} f_j Q_{ji}\, x_j - x_i\,\phi, \quad i = 1,2,\ldots,n ; \qquad \sum_{i=1}^{n} x_i = 1 ; \qquad \phi = \sum_{j=1}^{n} f_j x_j = \bar f $$

Solutions are obtained after integrating factor transformation by means of an eigenvalue problem:

$$ x_i(t) = \frac{\sum_{k=0}^{n-1} \ell_{ik}\, c_k(0)\, \exp(\lambda_k t)}{\sum_{j=1}^{n} \sum_{k=0}^{n-1} \ell_{jk}\, c_k(0)\, \exp(\lambda_k t)} ; \quad i = 1,2,\ldots,n ; \qquad c_k(0) = \sum_{i=1}^{n} h_{ki}\, x_i(0) $$

$$ W \doteq \{ f_i Q_{ij} ;\ i,j = 1,2,\ldots,n \} ; \quad L = \{ \ell_{ij} ;\ i,j = 1,2,\ldots,n \} ; \quad L^{-1} = H = \{ h_{ij} ;\ i,j = 1,2,\ldots,n \} $$

$$ L^{-1} \cdot W \cdot L = \Lambda = \{ \lambda_k ;\ k = 0,1,\ldots,n-1 \} $$
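The stationary solution of the mutation-selection equation is the eigenvector belonging to the largest eigenvalue. The sketch below approximates it by power iteration for binary sequences of length 5. The single-peak fitness landscape (one master sequence with f = 10, all others with f = 1) and the iteration count are assumptions chosen for illustration, and the matrix is applied in the form x_i ← Σ_j Q_ij f_j x_j, which coincides with the equation above because Q is symmetric in the uniform error model.

```python
from itertools import product

def quasispecies(l, p, f_master=10.0, f_other=1.0, iterations=200):
    """Stationary mutant distribution by power iteration (binary sequences)."""
    seqs = list(product((0, 1), repeat=l))
    n = len(seqs)
    f = [f_master] + [f_other] * (n - 1)  # seqs[0], all zeros, is the master

    def q(a, b):
        d = sum(u != v for u, v in zip(a, b))  # Hamming distance
        return (1 - p) ** (l - d) * p ** d

    W = [[q(seqs[i], seqs[j]) * f[j] for j in range(n)] for i in range(n)]
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [yi / total for yi in y]  # renormalize onto the simplex
    return x

master = {p: quasispecies(5, p)[0] for p in (0.01, 0.05, 0.2, 0.5)}
for p, x0 in master.items():
    print(f"p = {p}: stationary fraction of master sequence = {x0:.4f}")
```

In this toy landscape the master fraction falls as the error rate grows, and at p = 0.5 every sequence is produced with equal probability, so the distribution is uniform.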

slide-22
SLIDE 22

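[Figure: the concentration simplex with corners e_1, e_2, e_3, coordinates x_1, x_2, x_3, and the eigenvectors ℓ_0, ℓ_1, ℓ_2]

The quasispecies on the concentration simplex

$$ S_3 = \Big\{ x_i \ge 0,\ i = 1,2,3 ;\ \sum_{i=1}^{3} x_i = 1 \Big\} $$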

slide-23
SLIDE 23

In the case of non-zero mutation rates (p>0 or q<1) the Darwinian principle of optimization of mean fitness can be understood only as an optimization heuristic. It is valid only on part of the concentration simplex. There are other well-defined areas where the mean fitness decreases monotonically or where it may show non-monotonic behavior. The volume of the part of the simplex where mean fitness is non-decreasing in the conventional sense decreases with increasing mutation rate p. In systems with recombination a similar restriction holds for Fisher's "universal selection equation". Its global validity is restricted to the one-gene (single locus) model.

slide-24
SLIDE 24

1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory

slide-25
SLIDE 25

Theory of genotype-phenotype mapping

  • P.Schuster, W.Fontana, P.F.Stadler, I.L.Hofacker, From sequences to shapes and back: A case study in RNA secondary structures. Proc.Roy.Soc.London B 255 (1994), 279-284
  • W.Grüner, R.Giegerich, D.Strothmann, C.Reidys, I.L.Hofacker, P.Schuster, Analysis of RNA sequence structure maps by exhaustive enumeration. I. Neutral networks. Mh.Chem. 127 (1996), 355-374
  • W.Grüner, R.Giegerich, D.Strothmann, C.Reidys, I.L.Hofacker, P.Schuster, Analysis of RNA sequence structure maps by exhaustive enumeration. II. Structure of neutral networks and shape space covering. Mh.Chem. 127 (1996), 375-389
  • C.M.Reidys, P.F.Stadler, P.Schuster, Generic properties of combinatory maps. Bull.Math.Biol. 59 (1997), 339-397
  • I.L.Hofacker, P.Schuster, P.F.Stadler, Combinatorics of RNA secondary structures. Discr.Appl.Math. 89 (1998), 177-207
  • C.M.Reidys, P.F.Stadler, Combinatory landscapes. SIAM Review 44 (2002), 3-54

slide-26
SLIDE 26

Genotype-phenotype relations are highly complex and only the simplest cases can be studied. One example is the folding of RNA sequences into RNA structures represented in coarse-grained form as secondary structures. The RNA genotype-phenotype relation is understood as a mapping from the space of RNA sequences into a space of RNA structures.

slide-27
SLIDE 27

[Figure: an RNA molecule shown in four representations: sequence, secondary structure, symbolic notation, and tertiary structure; 5'-end and 3'-end marked, positions numbered 10 to 70]

The RNA secondary structure is a listing of GC, AU, and GU base pairs. It is understood in contrast to the full 3D or tertiary structure at the resolution of atomic coordinates. RNA secondary structures are biologically relevant. They are, for example, conserved in evolution.

slide-28
SLIDE 28

RNA Minimum Free Energy Structures

Efficient algorithms based on dynamic programming are available for the computation of secondary structures for given sequences. Inverse folding algorithms compute sequences for given secondary structures.

M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981)
Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.)
I.L.Hofacker, W.Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P.Schuster. Mh.Chem. 125:167-188 (1994)
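To illustrate the dynamic programming idea, the sketch below maximizes the number of base pairs instead of minimizing free energy. This is a deliberately simplified stand-in (Nussinov-style pair counting), not the Zuker-Stiegler algorithm and not the Vienna RNA Package. The function name, the minimum hairpin loop size of three, and the example sequence are assumptions made here.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def max_pairs_fold(seq, min_loop=3):
    """Return (number of pairs, dot-bracket string) with the maximum pair count."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs in seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)

pairs, structure = max_pairs_fold("GGGAAAUCCC")
print(pairs, structure)
```

Energy-based folding follows the same recursion pattern over subsequences, but scores loops and stacks with measured thermodynamic parameters instead of counting pairs.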

slide-29
SLIDE 29

[Figure: inverse folding under the minimum free energy criterion; five trials (1st to 5th) with the following sequences]

  • UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
  • GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
  • UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
  • CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
  • GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG

The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.

slide-30
SLIDE 30

[Figure: the five sequences UUUAGCC..., GUGAGCG..., UUAGCGA..., CAUUGGU..., and GUAGGCC... in sequence space are mapped into shape space by the criterion of minimum free energy]

Sequence Space, Shape Space

slide-31
SLIDE 31

The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, which are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks. Neutral networks are represented by graphs in sequence space.

slide-32
SLIDE 32

[Figure: two aligned sequences of the form CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG..... that differ at four marked positions]

Hamming distance: d_H(S_1,S_2) = 4

  (i)   d_H(S_1,S_1) = 0
  (ii)  d_H(S_1,S_2) = d_H(S_2,S_1)
  (iii) d_H(S_1,S_3) ≤ d_H(S_1,S_2) + d_H(S_2,S_3)

The Hamming distance induces a metric in sequence space
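The three metric properties can be verified by brute force on a small sequence space. The sketch below does so for all 64 AUGC sequences of length three. The choice of length is an assumption made only to keep the enumeration small.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

space = ["".join(s) for s in product("AUGC", repeat=3)]  # 4^3 = 64 sequences

identity = all(hamming(a, a) == 0 for a in space)
symmetry = all(hamming(a, b) == hamming(b, a) for a in space for b in space)
triangle = all(hamming(a, c) <= hamming(a, b) + hamming(b, c)
               for a in space for b in space for c in space)
print(identity, symmetry, triangle)
```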

slide-33
SLIDE 33

[Figure: four sequences that differ in their last positions, connected by moves of Hamming distance d_H = 1 and d_H = 2]

Single point mutations as moves in sequence space

slide-34
SLIDE 34

[Figure: sequence space of binary sequences drawn as a graph whose nodes are numbered by decimal equivalents up to 31 and arranged by mutant class]

Binary sequences are encoded by their decimal equivalents: C = 0 and G = 1, for example, "0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.

Sequence space of binary sequences of chain length n = 5

slide-35
SLIDE 35

S_k = ψ(I.) : sequence space → phenotype space
f_k = f(S_k) : phenotype space → non-negative numbers

Mapping from sequence space into phenotype space and into fitness values

slide-36
SLIDE 36

S_k = ψ(I.) : sequence space → phenotype space
f_k = f(S_k) : phenotype space → non-negative numbers

slide-37
SLIDE 37

S_k = ψ(I.) : sequence space → phenotype space
f_k = f(S_k) : phenotype space → non-negative numbers

The pre-image of the structure S_k in sequence space is the neutral network G_k

slide-38
SLIDE 38

Neutral networks are sets of sequences forming the same structure. G_k is the pre-image of the structure S_k in sequence space:

$$ G_k = \psi^{-1}(S_k) \doteq \{\, I_j \mid \psi(I_j) = S_k \,\} $$

The set is converted into a graph by connecting all sequences of Hamming distance one. Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. This number, N = 4^n, becomes very large with increasing length and is prohibitive for numerical computations. Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
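The random graph construction can be sketched in a few lines. The example below uses binary sequences of length 10 encoded as integers, so that a single point mutation is a bit flip. The binary alphabet, the chain length, the network sizes, and the fixed random seed are all assumptions made for illustration, whereas the talk refers to RNA sequences over four letters.

```python
import random
from collections import deque

def random_neutral_network(n, size, seed=0):
    """Insert `size` randomly chosen nodes into the space of 2^n binary sequences."""
    rng = random.Random(seed)
    return set(rng.sample(range(2 ** n), size))

def components(nodes, n):
    """Connected components when sequences at Hamming distance one are joined."""
    unseen = set(nodes)
    result = []
    while unseen:
        start = unseen.pop()
        comp = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for bit in range(n):
                w = v ^ (1 << bit)  # flip one bit = one point mutation
                if w in unseen:
                    unseen.remove(w)
                    comp.add(w)
                    queue.append(w)
        result.append(comp)
    return sorted(result, key=len, reverse=True)

n = 10
sparse = components(random_neutral_network(n, 100), n)
dense = components(random_neutral_network(n, 800), n)
full = components(random_neutral_network(n, 2 ** n), n)
print("sparse network: components =", len(sparse), "largest =", len(sparse[0]))
print("dense network:  components =", len(dense), "largest =", len(dense[0]))
```

Comparing the sparse and the dense case shows how the component structure depends on the fraction of sequence space that belongs to the network.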

slide-39
SLIDE 39

Random graph approach to neutral networks Sketch of sequence space Step 00

slide-40
SLIDE 40

Random graph approach to neutral networks Sketch of sequence space Step 01

slide-41
SLIDE 41

Random graph approach to neutral networks Sketch of sequence space Step 02

slide-42
SLIDE 42

Random graph approach to neutral networks Sketch of sequence space Step 03

slide-43
SLIDE 43

Random graph approach to neutral networks Sketch of sequence space Step 04

slide-44
SLIDE 44

Random graph approach to neutral networks Sketch of sequence space Step 05

slide-45
SLIDE 45

Random graph approach to neutral networks Sketch of sequence space Step 10

slide-46
SLIDE 46

Random graph approach to neutral networks Sketch of sequence space Step 15

slide-47
SLIDE 47

Random graph approach to neutral networks Sketch of sequence space Step 25

slide-48
SLIDE 48

Random graph approach to neutral networks Sketch of sequence space Step 50

slide-49
SLIDE 49

Random graph approach to neutral networks Sketch of sequence space Step 75

slide-50
SLIDE 50

Random graph approach to neutral networks Sketch of sequence space Step 100

slide-51
SLIDE 51

$$ G_k = \psi^{-1}(S_k) \doteq \{\, I_j \mid \psi(I_j) = S_k \,\} $$

Degree of neutrality of a single sequence (example): λ_j = 12/27

$$ \bar\lambda_k = \frac{\sum_{j \in G_k} \lambda_j(k)}{|G_k|} $$

Connectivity threshold:

$$ \lambda_{cr} = 1 - \kappa^{-\frac{1}{\kappa - 1}} $$

  λ̄_k > λ_cr .... network G_k is connected
  λ̄_k < λ_cr .... network G_k is not connected

Alphabet size κ (AUGC: κ = 4):

  κ    λ_cr
  2    0.5
  3    0.4226
  4    0.3700

Mean degree of neutrality and connectivity of neutral networks
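The tabulated thresholds follow from λ_cr = 1 − κ^(−1/(κ−1)). The short sketch below recomputes them; the function name is a label chosen here.

```python
def connectivity_threshold(kappa):
    """lambda_cr = 1 - kappa^(-1/(kappa - 1)) for alphabet size kappa."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))

thresholds = {kappa: round(connectivity_threshold(kappa), 4) for kappa in (2, 3, 4)}
for kappa, value in thresholds.items():
    print(f"kappa = {kappa}: lambda_cr = {value:.4f}")

# The example degree of neutrality 12/27 lies above the four-letter threshold.
print(12 / 27 > connectivity_threshold(4))
```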

slide-52
SLIDE 52

Giant Component

A multi-component neutral network

slide-53
SLIDE 53

A connected neutral network

slide-54
SLIDE 54

[Figure: two sequences drawn on the same secondary structure, one labeled "Compatible" and one labeled "Incompatible"; 5'-end and 3'-end marked]

Compatibility of sequences with structures

A sequence is compatible with its minimum free energy structure and all its suboptimal structures.
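Compatibility can be tested without any folding: every pair of positions that the structure joins must hold nucleotides able to pair. The sketch below assumes the dot-bracket notation for secondary structures and the GC, AU, and GU pairing rules; the function names and example sequences are hypothetical.

```python
ALLOWED = {"GC", "CG", "AU", "UA", "GU", "UG"}

def base_pairs(structure):
    """List of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def is_compatible(sequence, structure):
    """True if every base pair of the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in ALLOWED
               for i, j in base_pairs(structure))

print(is_compatible("GGGAAACCC", "(((...)))"))  # all three pairs are G-C
print(is_compatible("GGGAAAGGG", "(((...)))"))  # G-G cannot pair
```

Compatibility is a necessary condition only: a compatible sequence may still fold into a different minimum free energy structure.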

slide-55
SLIDE 55

[Figure: the neutral network G_k drawn inside the compatible set C_k]

The compatible set C_k of a structure S_k consists of all sequences which form S_k as their minimum free energy structure (neutral network G_k) or as one of their suboptimal structures.

slide-56
SLIDE 56

[Figure: one sequence drawn in two conformations, the minimum free energy conformation S_0 and the suboptimal conformation S_1; 3'-end marked]

A sequence at the intersection of two neutral networks is compatible with both structures

slide-57
SLIDE 57

[Figure: the neutral networks G_1 and G_2 inside their compatible sets C_1 and C_2, which overlap]

The intersection of two compatible sets is always non-empty: C_1 ∩ C_2 ≠ ∅

slide-58
SLIDE 58

1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory

slide-59
SLIDE 59

Optimization of RNA molecules in silico

  • W.Fontana, P.Schuster, A computer model of evolutionary optimization. Biophysical Chemistry 26 (1987), 123-147
  • W.Fontana, W.Schnabl, P.Schuster, Physical aspects of evolutionary optimization and adaptation. Phys.Rev.A 40 (1989), 3301-3321
  • M.A.Huynen, W.Fontana, P.F.Stadler, Smoothness within ruggedness. The role of neutrality in adaptation. Proc.Natl.Acad.Sci.USA 93 (1996), 397-401
  • W.Fontana, P.Schuster, Continuity in evolution. On the nature of transitions. Science 280 (1998), 1451-1455
  • W.Fontana, P.Schuster, Shaping space. The possible and the attainable in RNA genotype-phenotype mapping. J.Theor.Biol. 194 (1998), 491-515
  • B.M.R.Stadler, P.F.Stadler, G.P.Wagner, W.Fontana, The topology of the possible: Formal spaces underlying patterns of evolutionary change. J.Theor.Biol. 213 (2001), 241-274

slide-60
SLIDE 60

[Figure: two secondary structures with positions numbered 10 to 70 between the 5'-end and the 3'-end]

Randomly chosen initial structure; phenylalanyl-tRNA as target structure

slide-61
SLIDE 61

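[Figure: flow reactor with inflow of stock solution into the reaction mixture]

Fitness function:

$$ f_k = \frac{\gamma}{\alpha + \Delta d_S^{(k)}} ; \qquad \Delta d_S^{(k)} = d_S(I_k, I_\tau) $$

The flow reactor as a device for studies of evolution in vitro and in silico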

slide-62
SLIDE 62

[Figure: concentration plotted over sequence space, showing the master sequence, the mutant cloud, and "off-the-cloud" mutations]

The molecular quasispecies in sequence space

slide-63
SLIDE 63

[Figure: cycle of three steps. Mutation: the genotypes I_1, I_2, ..., I_n are varied according to the mutation matrix Q and a new genotype I_{n+1} appears. Genotype-phenotype mapping: S = ψ(I). Evaluation of the phenotype: f = ƒ(S), giving the fitness values f_1, f_2, ..., f_n, f_{n+1}]

Evolutionary dynamics including molecular phenotypes

slide-64
SLIDE 64

In silico optimization in the flow reactor: Trajectory (biologists' view)

[Figure: evolutionary trajectory; x-axis: time (arbitrary units), 0 to 1250; y-axis: average distance from initial structure, 0 to 50]

slide-65
SLIDE 65

In silico optimization in the flow reactor: Trajectory (physicists' view)

[Figure: evolutionary trajectory; x-axis: time (arbitrary units), 0 to 1250; y-axis: average structure distance to target d_S, 0 to 50]

slide-66
SLIDE 66

[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structure of relay step 44 is shown]

End conformation of optimization

slide-67
SLIDE 67

[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structures of relay steps 44 and 43 are shown]

Reconstruction of the last step 43 → 44

slide-68
SLIDE 68

[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structures of relay steps 44, 43, and 42 are shown]

Reconstruction of the last-but-one step 42 → 43 (→ 44)

slide-69
SLIDE 69

[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structures of relay steps 44 to 41 are shown]

Reconstruction of step 41 → 42 (→ 43 → 44)

slide-70
SLIDE 70

[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structures of relay steps 44 to 40 are shown]

Reconstruction of step 40 → 41 (→ 42 → 43 → 44)

slide-71
SLIDE 71

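[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structures of relay steps 44 to 39 are shown, with arrows for the direction of the evolutionary process and of the reconstruction]

Reconstruction of the relay series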

slide-72
SLIDE 72

[Figure legend: transition inducing point mutations; neutral point mutations]

Change in RNA sequences during the final five relay steps 39 → 44

slide-73
SLIDE 73

In silico optimization in the flow reactor: Trajectory and relay steps

[Figure: evolutionary trajectory with relay steps; x-axis: time (arbitrary units), 0 to 1250; y-axis: average structure distance to target d_S, 0 to 50]

slide-74
SLIDE 74

In silico optimization in the flow reactor: Uninterrupted presence

[Figure: evolutionary trajectory, uninterrupted presence, and relay steps; x-axis: time (arbitrary units), 0 to 1250; y-axis: average structure distance to target d_S, 0 to 50]

slide-75
SLIDE 75

[Figure: section of the trajectory up to time 500 (arbitrary units) around relay steps 08 to 14; curves: evolutionary trajectory, uninterrupted presence, number of relay step; y-axis: average structure distance to target d_S; legend: transition inducing point mutations, neutral point mutations]

Neutral genotype evolution during phenotypic stasis

slide-76
SLIDE 76

[Figure: section of the trajectory between times 750 and 1250 (arbitrary units) with the structures of relay steps 18, 19, 20, 21, 26, 28, 29, and 31; curves: evolutionary trajectory, uninterrupted presence, number of relay step (20 to 35); y-axis: average structure distance to target d_S]

A random sequence of minor or continuous transitions in the relay series

slide-77
SLIDE 77

[Figure: structures of relay steps 18, 19, 20, 21, 26, 28, 29, and 31]

A random sequence of minor or continuous transitions in the relay series

slide-78
SLIDE 78

[Figure: three types of transition: elongation of stacks, shortening of stacks, opening of constrained stacks (multi-loop)]

Minor or continuous transitions: Occur frequently on single point mutations

slide-79
SLIDE 79

In silico optimization in the flow reactor: Uninterrupted presence

[Figure: evolutionary trajectory, uninterrupted presence, and relay steps; x-axis: time (arbitrary units), 0 to 1250; y-axis: average structure distance to target d_S, 0 to 50]

slide-80
SLIDE 80

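[Figure: final part of the evolutionary trajectory with relay steps; axes: time (up to 1250), average structure distance to target d_S, number of relay step (36 to 44); the structures of relay steps 36, 37, and 38 are shown, with the main transition leading to the clover leaf marked]

Reconstruction of a main transition 36 → 37 (→ 38)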

slide-81
SLIDE 81

In silico optimization in the flow reactor: Main transitions

[Figure: evolutionary trajectory with relay steps and main transitions marked; x-axis: time (arbitrary units), 0 to 1250; y-axis: average structure distance to target d_S, 0 to 50]

slide-82
SLIDE 82

[Figure: five types of transition: shift, roll-over, flip, double flip, and closing of constrained stacks (multi-loop)]

Main or discontinuous transitions: Structural innovations, occur rarely on single point mutations

slide-83
SLIDE 83

In silico optimization in the flow reactor

[Figure: evolutionary trajectory, uninterrupted presence, relay steps, and main transitions; x-axis: time (arbitrary units), 0 to 1250; y-axis: average structure distance to target d_S, 0 to 50]

slide-84
SLIDE 84

The one-error neighborhood of the neutral network G_k corresponding to the structure S_k is defined by

$$ \Psi(S_k) = \{\, S_j \mid S_j = \psi(I_i) \,\wedge\, d_h(I_i, I_m) = 1,\ I_m \in G_k \,\} $$

Let γ_jk be the number of points at which the two neutral networks G_k and G_j are in Hamming distance one contact, with γ_jk = γ_kj. The probability of occurrence of S_j in the neighborhood of S_k is then given by

$$ \rho(S_j; S_k) = \frac{\gamma_{jk}}{l\,(\kappa - 1)\,|G_k|} $$

We note that this probability is not symmetric, ρ(S_j;S_k) ≠ ρ(S_k;S_j), except when the two networks are of equal size, |G_k| = |G_j|. The definition of a statistical neighborhood of the structure S_k allows for a precise distinction between frequent and rare neighbors. Frequent neighbors are contained in the statistical neighborhood

$$ \Phi(S_k) = \{\, S_j \in \Psi(S_k) \mid \rho(S_j; S_k) \text{ exceeds a threshold} \,\} $$
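The asymmetry of the neighborhood probability is easy to see with numbers. The sketch below divides the number of Hamming distance one contacts between two networks by l (κ − 1) |G_k|, the total number of one-error neighbors of the sequences in G_k. The symbol names γ and κ, the function name, and all numerical values are assumptions chosen for illustration, not data from the talk.

```python
def neighbor_probability(contacts, chain_length, alphabet_size, network_size):
    """Probability of finding S_j among the one-error neighbors of G_k.

    contacts     : number of Hamming distance one contacts between G_j and G_k
    network_size : |G_k|, the size of the network whose neighborhood is probed
    """
    return contacts / (chain_length * (alphabet_size - 1) * network_size)

# Hypothetical example: a large network G_k and a small network G_j.
contacts = 120          # same count seen from both sides
l, kappa = 10, 4
size_k, size_j = 400, 40

rho_j_given_k = neighbor_probability(contacts, l, kappa, size_k)
rho_k_given_j = neighbor_probability(contacts, l, kappa, size_j)
print(rho_j_given_k, rho_k_given_j)
```

With these numbers the small structure is a rare neighbor of the large one, while the large structure is a frequent neighbor of the small one.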
slide-85
SLIDE 85

[Figure: log-log plot of frequency of occurrence (10^-6 to 10^-1) versus rank (1 to 10^5) for the structures in the neighborhood of the tRNA clover leaf (positions numbered 10 to 70 from the 5'-end to the 3'-end); frequent neighbors correspond to minor transitions, rare neighbors to main transitions]

Probability of occurrence of different structures in the mutational neighborhood of tRNAphe

slide-86
SLIDE 86

Statistics of evolutionary trajectories

[Table, values not preserved. Columns: population size N; number of replications <n_rep>; number of transitions <n_tr>; number of main transitions <n_dtr>]

The number of main transitions or evolutionary innovations is constant.

slide-87
SLIDE 87

[Figure: the phenotypes S_1^(j), S_2^(j), S_3^(j), ..., S_m^(j) connected to the phenotype S_k^(j) by transition probabilities P]

Transition probabilities determining the presence of phenotype S_k^(j) in the population

slide-88
SLIDE 88

[Figure: birth-and-death process on the states x = 1, 2, 3, ..., N−1, N with birth rates λ(x) = λx + ν(N − x) and death rates μ(x) = μx; inset: particle number X(t) versus time t with the intervals T_{1,0} and T_{0,1}]

Calculation of transition probabilities by means of a birth-and-death process with immigration

slide-89
SLIDE 89

[Figure: the transition scheme between the phenotypes S_1^(j), S_2^(j), S_3^(j), ..., S_m^(j) and S_k^(j), together with an expression for N_sat^(j) involving p and l]

slide-90
SLIDE 90

[Figure: structures at relay steps 00, 09, 31, and 44]

Three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure corresponding to three main transitions.

slide-91
SLIDE 91

Stable tRNA clover leaf structures built from binary, GC-only, sequences exist. The corresponding sequences are readily found through inverse folding. Optimization by mutation and selection in the flow reactor has so far always been unsuccessful.

[Figure: tRNA clover leaf structure, positions numbered 10 to 70 from the 5'-end to the 3'-end]

The neutral network of the tRNA clover leaf in GC sequence space is not connected, whereas the corresponding neutral network in AUGC sequence space is very close to the critical connectivity threshold, λ_cr. In AUGC sequence space, both inverse folding and optimization in the flow reactor are successful.

The success of optimization depends on the connectivity of neutral networks.

slide-92
SLIDE 92

Main results of computer simulations of molecular evolution

  • No trajectory was reproducible in detail. Sequences of target structures were always different. Nevertheless, solutions of the same quality are almost always achieved.
  • Transitions between molecular phenotypes represented by RNA structures can be classified with respect to the induced structural changes. Highly probable minor transitions are opposed by main transitions with low probability of occurrence.
  • Main transitions represent important innovations in the course of evolution.
  • The number of minor transitions decreases with increasing population size.
  • The number of main transitions or evolutionary innovations is approximately constant for given start and stop structures.
  • Not all known structures are accessible through evolution in the flow reactor. An example is the tRNA clover leaf for GC-only sequences.

slide-93
SLIDE 93

1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory

slide-94
SLIDE 94

                                 Generation time   10 000 generations   10^6 generations   10^7 generations
RNA molecules                    10 sec            27.8 h = 1.16 d      115.7 d            3.17 a
                                 1 min             6.94 d               1.90 a             19.01 a
Bacteria                         20 min            138.9 d              38.03 a            380 a
                                 10 h              11.40 a              1 140 a            11 408 a
Higher multicellular organisms   10 d              274 a                27 380 a           273 800 a
                                 20 a              200 000 a            2 × 10^7 a         2 × 10^8 a

Generation times and evolutionary timescales
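The entries of the table are products of generation time and number of generations. The sketch below recomputes a few of them. The unit labels follow the table, and a year of 365.25 days is an assumption made here that reproduces values such as 3.17 a and 19.01 a.

```python
SECONDS = {"sec": 1, "min": 60, "h": 3600, "d": 86400, "a": 365.25 * 86400}

def elapsed(generation_time, unit, generations):
    """Total duration in seconds of the given number of generations."""
    return generation_time * SECONDS[unit] * generations

def pretty(seconds):
    """Express a duration in the largest unit that gives a value of at least one."""
    for unit in ("a", "d", "h", "min"):
        if seconds >= SECONDS[unit]:
            return f"{seconds / SECONDS[unit]:.2f} {unit}"
    return f"{seconds:.2f} sec"

for value, unit in ((10, "sec"), (1, "min"), (20, "min"), (10, "h"), (10, "d"), (20, "a")):
    row = [pretty(elapsed(value, unit, g)) for g in (10**4, 10**6, 10**7)]
    print(f"{value} {unit}:", ", ".join(row))
```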

slide-95
SLIDE 95

Evolution of RNA molecules based on Qβ phage

  • D.R.Mills, R.L.Peterson, S.Spiegelman, An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc.Natl.Acad.Sci.USA 58 (1967), 217-224
  • S.Spiegelman, An approach to the experimental analysis of precellular evolution. Quart.Rev.Biophys. 4 (1971), 213-253
  • C.K.Biebricher, Darwinian selection of self-replicating RNA molecules. Evolutionary Biology 16 (1983), 1-52
  • C.K.Biebricher, W.C.Gardiner, Molecular evolution of RNA in vitro. Biophysical Chemistry 66 (1997), 179-192
  • G.Strunk, T.Ederhof, Machines for automated evolution experiments in vitro based on the serial transfer concept. Biophysical Chemistry 66 (1997), 193-202

slide-96
SLIDE 96

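[Figure: an RNA sample is added to the first test tube of stock solution and then transferred serially over time through tubes 1, 2, 3, 4, 5, 6, ..., 69, 70]

Stock solution: Qβ RNA-replicase, ATP, CTP, GTP and UTP, buffer

The serial transfer technique applied to RNA evolution in vitro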

slide-97
SLIDE 97

Reproduction of the original figure of the serial transfer experiment with Qβ RNA

D.R.Mills, R.L.Peterson, S.Spiegelman, An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc.Natl.Acad.Sci.USA 58 (1967), 217-224

slide-98
SLIDE 98

Decrease in mean fitness due to quasispecies formation

The increase in RNA production rate during a serial transfer experiment

slide-99
SLIDE 99

Evolutionary design of RNA molecules

A.D.Ellington, J.W.Szostak, In vitro selection of RNA molecules that bind specific ligands. Nature 346 (1990), 818-822

C.Tuerk, L.Gold, SELEX - Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (1990), 505-510

D.P.Bartel, J.W.Szostak, Isolation of new ribozymes from a large pool of random sequences. Science 261 (1993), 1411-1418

R.D.Jenison, S.C.Gill, A.Pardi, B.Polisky, High-resolution molecular discrimination by RNA. Science 263 (1994), 1425-1429
slide-100
SLIDE 100

[Figure: selection cycle. Labels: Amplification, Diversification, Selection, Genetic Diversity; decision "Desired properties?" with branches yes / no]

Selection cycle used in applied molecular evolution to design molecules with predefined properties
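The logic of such a cycle can be sketched in a few lines. The example below is only an illustration under strong assumptions: the "desired property" is replaced by a hypothetical score (the number of positions matching an arbitrary target string), whereas a real experiment measures binding or catalysis. All names (`score`, `mutate`, `selection_cycle`) and parameter values are illustrative.

```python
import random

ALPHABET = "ACGU"


def score(seq, target):
    """Hypothetical stand-in for a measured property: positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rate, rng):
    """Diversification: with probability `rate` a position is redrawn at random
    (the redrawn nucleotide may coincide with the old one)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in seq)


def selection_cycle(target, pool_size=200, keep=20, rate=0.05, max_cycles=500, seed=1):
    """Run selection, amplification and diversification until the property is reached."""
    rng = random.Random(seed)
    pool = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pool_size)]
    for cycle in range(1, max_cycles + 1):
        # Selection: keep the best-scoring molecules.
        pool.sort(key=lambda s: score(s, target), reverse=True)
        selected = pool[:keep]
        # Desired properties? yes: leave the cycle.
        if score(selected[0], target) == len(target):
            return selected[0], cycle
        # no: amplify the selected molecules and diversify the copies.
        copies = [rng.choice(selected) for _ in range(pool_size - keep)]
        pool = selected + [mutate(s, rate, rng) for s in copies]
    return max(pool, key=lambda s: score(s, target)), max_cycles


if __name__ == "__main__":
    best, cycles = selection_cycle("GGCACUUGAC")
    print(best, cycles)
```

The selected molecules are carried over unchanged, so the best score in the pool never decreases from one cycle to the next.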

slide-101
SLIDE 101

[Figure: chromatographic column; retention of binders, elution of binders]

The SELEX technique for the evolutionary design of aptamers

slide-102
SLIDE 102

[Figure: sequence of the aptamer (5'-end to 3'-end) in open-chain form and folded into its secondary structure]

Formation of secondary structure of the tobramycin binding RNA aptamer

l = 27: 4^l = 1.801 × 10^16 possible different sequences

L. Jiang, A. K. Suri, R. Fiala, D. J. Patel, Chemistry & Biology 4:35-50 (1997)
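The size of sequence space quoted for chain length l = 27 follows directly from four possible nucleotides per position. A short check, with the function name `sequence_space_size` chosen for illustration:

```python
def sequence_space_size(length, alphabet_size=4):
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


n = sequence_space_size(27)
print(n)            # 18014398509481984
print(f"{n:.3e}")   # 1.801e+16
```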
slide-103
SLIDE 103

The three-dimensional structure of the tobramycin aptamer complex

L. Jiang, A. K. Suri, R. Fiala, D. J. Patel, Chemistry & Biology 4:35-50 (1997)

slide-104
SLIDE 104

A ribozyme switch

E.A.Schultes, D.P.Bartel, One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science 289 (2000), 448-452

slide-105
SLIDE 105

Reference for the definition of the intersection and the proof of the intersection theorem

slide-106
SLIDE 106

[Figure: one sequence (3'-end marked) folded into two structures: minimum free energy conformation S0 and suboptimal conformation S1]

A sequence at the intersection of two neutral networks is compatible with both structures
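Compatibility in this sense can be checked mechanically: a sequence is compatible with a structure if every pair of positions that the structure joins can form a base pair. The sketch below assumes structures written in dot-bracket notation and the pairs AU, GC and GU as allowed; the toy sequence and structures are invented for illustration and are not the ones shown on the slide.

```python
# Allowed base pairs (Watson-Crick pairs plus the GU wobble pair).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_list(structure):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def compatible(sequence, structure):
    """True if every paired position of the structure can form a base pair."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in PAIRS for i, j in pair_list(structure))


if __name__ == "__main__":
    seq = "GGGGAACCCCCC"          # toy sequence
    s0 = "((((....))))"           # toy structure 0
    s1 = "((....))...."           # toy structure 1
    print(compatible(seq, s0), compatible(seq, s1))   # True True
```

A sequence for which both calls return True lies in the intersection of the two sets of compatible sequences. Compatibility alone does not say which of the structures is the minimum free energy conformation.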

slide-107
SLIDE 107

[Figure: barrier tree; numbered local minima connected through energy barriers, with the conformations S0 and S1 marked]

Barrier tree of a sequence which switches between two conformations

slide-108
SLIDE 108

Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis-δ-virus (B)
slide-109
SLIDE 109

The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures

slide-110
SLIDE 110

Two neutral walks through sequence space with conservation of structure and catalytic activity

slide-111
SLIDE 111

Reference for postulation and in silico verification of neutral networks

slide-112
SLIDE 112

Coworkers

Walter Fontana, Santa Fe Institute, NM

Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM

Peter Stadler, Universität Leipzig, GE

Ivo L.Hofacker, Christoph Flamm, Universität Wien, AT

Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT

Michael Kospach, Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder

Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty

Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE

Walter Grüner, Stefan Kopp, Jaqueline Weber

slide-113
SLIDE 113

Variation in genotype space during optimization of phenotypes

slide-114
SLIDE 114

"... Variations neither useful nor injurious would not be affected by natural selection, and would be left either a fluctuating element, as perhaps we see in certain polymorphic species, or would ultimately become fixed, owing to the nature of the organism and the nature of the conditions. ..."

Charles Darwin, Origin of species (1859)

slide-115
SLIDE 115

[Figure: fitness plotted over genotype space; labels: Start of Walk, End of Walk, Random Drift Periods, Adaptive Periods]

Evolution in genotype space sketched as a non-descending walk in a fitness landscape
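A non-descending walk accepts every mutation that leaves fitness unchanged (random drift) or increases it (adaptation) and rejects every mutation that lowers it. The sketch below assumes single point mutations and a user-supplied fitness function; the coarse-grained example fitness is hypothetical and only serves to create plateaus on which neutral steps are possible.

```python
import random


def non_descending_walk(fitness, start, steps, alphabet="ACGU", seed=0):
    """Walk through sequence space without ever decreasing fitness.

    Returns the final sequence and the fitness value after every step.
    """
    rng = random.Random(seed)
    current = start
    trace = [fitness(current)]
    for _ in range(steps):
        # Propose a point mutation at a random position.
        pos = rng.randrange(len(current))
        mutant = current[:pos] + rng.choice(alphabet) + current[pos + 1:]
        # Accept neutral and advantageous mutants, reject deleterious ones.
        if fitness(mutant) >= fitness(current):
            current = mutant
        trace.append(fitness(current))
    return current, trace


if __name__ == "__main__":
    # Hypothetical fitness: GC content in steps of three nucleotides.
    def example_fitness(seq):
        return sum(c in "GC" for c in seq) // 3

    final, trace = non_descending_walk(example_fitness, "AUAUAUAUAUAU", steps=200)
    print(final, trace[0], trace[-1])
```

Stretches of constant values in `trace` correspond to drift periods, jumps to adaptive periods.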

slide-116
SLIDE 116

[Figure: notions of RNA structure on a free energy axis. Minimum free energy structure S0 (T = 0 K, t → ∞); suboptimal structures S0, S1, ..., S10 (T > 0 K, t → ∞); kinetic structures (T > 0 K, t finite), illustrated by the barrier tree with S0 and S1 marked]

Different notions of RNA structure including suboptimal conformations

slide-117
SLIDE 117

[Figure: sequence and secondary structure of the hammerhead ribozyme with the cleavage site marked; 5'- and 3'-ends and end groups (ppp, OH) labelled]

The "hammerhead" ribozyme

The smallest known catalytically active RNA molecule

slide-118
SLIDE 118

Sequence of mutants from the intersection to both reference ribozymes