Inverse Folding and Sequence-Structure Maps of Ribonucleic Acids




slide-1
SLIDE 1
slide-2
SLIDE 2

Inverse Folding and Sequence-Structure Maps of Ribonucleic Acids

Peter Schuster Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien Inverse Problem Workshop IPAM, UCLA, 22.10.2003

slide-3
SLIDE 3

Web-Page for further information: http://www.tbi.univie.ac.at/~pks

slide-4
SLIDE 4

1. The role of RNA in the cell and the notion of structure
2. RNA folding
3. Inverse folding of RNA
4. Sequence structure maps, neutral networks, and intersection
5. Reference to experimental data
6. Concluding remarks

slide-5
SLIDE 5

1. The role of RNA in the cell and the notion of structure

slide-6
SLIDE 6

Functions of RNA molecules

  • RNA as scaffold for supramolecular complexes (ribosome)
  • RNA as adapter molecule (genetic code: GAC ... CUG ... leu)
  • RNA as transmitter of genetic information (DNA → messenger-RNA → protein, by transcription and translation; messenger-RNA: ...AGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUC...)
  • RNA as working copy of genetic information
  • RNA as carrier of genetic information (RNA viruses and retroviruses)
  • RNA as information carrier in evolution in vitro and evolutionary biotechnology
  • RNA as catalyst (ribozyme)
  • The RNA world as a precursor of the current DNA + protein biology
  • RNA as regulator of gene expression (gene silencing by small interfering RNAs)
  • RNA is modified by epigenetic control: RNA editing, alternative splicing of messenger RNA
  • RNA is the catalytic subunit in supramolecular complexes

slide-7
SLIDE 7

[Figure: chemical formula of an RNA strand from the 5'-end to the 3'-end with nucleobases N1 ... N4 (Nk = A, U, G, C) and Na+ counterions, next to the secondary structure of a tRNA with positions numbered 10 to 70.]

Definition of RNA structure

slide-8
SLIDE 8

RNA sequence → RNA structure
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function. It relies on empirical parameters and on biophysical chemistry (thermodynamics and kinetics).

RNA structure → RNA sequence
Inverse folding of RNA: biotechnology, design of biomolecules with predefined structures and functions.

Sequence, structure, and function

slide-9
SLIDE 9

Definition and physical relevance of RNA secondary structures

RNA secondary structures are listings of Watson-Crick and GU wobble base pairs that are free of knots and pseudoknots.

"Secondary structures are folding intermediates in the formation of full three-dimensional structures."

D. Thirumalai, N. Lee, S.A. Woodson, and D.K. Klimov. Annu. Rev. Phys. Chem. 52:751-762 (2001)

slide-10
SLIDE 10

[Figure: the sequence and its secondary structure, positions numbered 10 to 70 from the 5'-end to the 3'-end.]

Sequence: GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA

The RNA secondary structure lists the double-helical stretches, or stacks, of a folded single-stranded molecule.

slide-11
SLIDE 11

The three-dimensional structure of a short double helical stack of B-DNA

James D. Watson, 1928- , and Francis Crick, 1916- , Nobel Prize 1962

1953 – 2003 fifty years double helix

slide-12
SLIDE 12

Canonical Watson-Crick base pairs: cytosine – guanine and uracil – adenine

W.Saenger, Principles of Nucleic Acid Structure, Springer, Berlin 1984

slide-13
SLIDE 13

[Figure: the sequence and its secondary structure, positions numbered 10 to 70 from the 5'-end to the 3'-end.]

Sequence: GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA

slide-14
SLIDE 14

[Figure: the sequence, its secondary structure, and its symbolic notation, each drawn from the 5'-end to the 3'-end with positions numbered 10 to 70.]

Sequence: GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA

A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
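The equivalence between the symbolic notation and the graph can be made concrete with a small sketch. It assumes the symbolic notation is the parentheses (dot-bracket) string used elsewhere in these slides, with `(` and `)` marking the two partners of a base pair and `.` an unpaired base. The function name `pair_table` is a hypothetical choice for this illustration.

```python
def pair_table(structure):
    """Return table[i] = index of the partner of base i, or -1 if i is unpaired."""
    table = [-1] * len(structure)
    open_positions = []  # positions of '(' that still wait for their ')'
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            i = open_positions.pop()
            table[i], table[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return table


print(pair_table("((..))"))  # [5, 4, -1, -1, 1, 0]
```

Because every closing parenthesis is matched with the most recent open one, the resulting pairs are nested by construction, which is the pseudoknot-free property of secondary structures.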
slide-15
SLIDE 15

Tertiary elements in RNA structure

1. Different classes of pseudoknots
2. Different classes of non-Watson-Crick base pairs
3. Base triplets, G-quartets, A-platforms, etc.
4. End-on-end stacking of double helices
5. Divalent metal ion complexes, Mg2+, etc.
6. Other interactions involving phosphate, 2'-OH, etc.

slide-16
SLIDE 16

Tertiary elements in RNA structure

1. Different classes of pseudoknots

slide-17
SLIDE 17

[Figure: an "H-type pseudoknot" and "kissing loops", each drawn from the 5'-end to the 3'-end.]

··((((····· [[ ·))))····(((((·]] ·····))))) ···

Two classes of pseudoknots in RNA structures

slide-18
SLIDE 18

Tertiary elements in RNA structure

2. Different classes of non-Watson-Crick base pairs

slide-19
SLIDE 19

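Twelve families of base pairs: each base offers three edges (Watson-Crick / Hoogsteen / sugar edge), which gives six edge-to-edge combinations, and each combination occurs in cis or trans orientation.

N.B. Leontis, E. Westhof. Geometric nomenclature and classification of RNA base pairs. RNA 7:499-512, 2001.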
slide-20
SLIDE 20

Tertiary elements in RNA structure

4. End-on-end stacking of double helices

slide-21
SLIDE 21

[Figure: two drawings of the tRNA, positions numbered 10 to 70 from the 5'-end to the 3'-end.]

End-on-end stacking of double helical regions yields the L-shape of tRNAphe

slide-22
SLIDE 22

2. RNA folding

slide-23
SLIDE 23

How to compute RNA secondary structures

Efficient algorithms based on dynamic programming are available for computing the minimum free energy structure and many suboptimal secondary structures of a given sequence.

M. Zuker and P. Stiegler. Nucleic Acids Res. 9:133-148 (1981)
M. Zuker. Science 244:48-52 (1989)

Equilibrium partition function and base pairing probabilities in Boltzmann ensembles of suboptimal structures.

J.S. McCaskill. Biopolymers 29:1105-1119 (1990)

The Vienna RNA Package provides in addition: inverse folding (computing sequences for given secondary structures), computation of melting profiles from partition functions, all suboptimal structures within a given energy interval, barrier trees of suboptimal structures, kinetic folding of RNA sequences, RNA-hybridization and RNA/DNA-hybridization through cofolding of sequences, alignment, etc.

I.L. Hofacker, W. Fontana, P.F. Stadler, L.S. Bonhoeffer, M. Tacker, and P. Schuster. Mh. Chem. 125:167-188 (1994)
S. Wuchty, W. Fontana, I.L. Hofacker, and P. Schuster. Biopolymers 49:145-165 (1999)
C. Flamm, W. Fontana, I.L. Hofacker, and P. Schuster. RNA 6:325-338 (2000)

Vienna RNA Package: http://www.tbi.univie.ac.at

slide-24
SLIDE 24

[Figure: an RNA sequence over G, U, A, C folded into a secondary structure, 5'-end to 3'-end.]

Folding of RNA sequences into secondary structures of minimal free energy, $\Delta G_0^{300}$

slide-25
SLIDE 25

[Figure: the folded sequence drawn as a graph with nodes i, j, k, l marked, 5'-end to 3'-end.]

Edges: $i \cdot j,\ k \cdot l \in S$ .... base pairs

(i) $i \cdot (i+1) \in S$ .... backbone
(ii) number of base pairs per node $\in \{0, 1\}$
(iii) if $i \cdot j \in S$ and $k \cdot l \in S$, then $i < k < j \iff i < l < j$ .... pseudoknot exclusion

Folding of RNA sequences into secondary structures of minimal free energy, $\Delta G_0^{300}$

slide-26
SLIDE 26

[Figure: the folded sequence, 5'-end to 3'-end.]

free energy of stacking < 0

$$\Delta G_0^{300} = \sum_{\text{stacks of base pairs}} g_{ij,kl} + \sum_{\text{hairpin loops}} h(n_l) + \sum_{\text{bulges}} b(n_b) + \sum_{\text{internal loops}} i(n_i) + \cdots$$

Folding of RNA sequences into secondary structures of minimal free energy, $\Delta G_0^{300}$

slide-27
SLIDE 27

[Figure: secondary structures annotated with their elements: stack, hairpin loop, internal loop, bulge, multiloop, joint, and free end.]

Elements of RNA secondary structures as used in free energy calculations

slide-28
SLIDE 28

Maximum matching

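An example of a dynamic programming computation of the maximum number of base pairs. Backtracking yields the structure(s).

[Figure: the segment [i, j+1] split at position k into the subsegments [i, k-1] and [k+1, j] with maximum matchings $X_{i,k-1}$ and $X_{k+1,j}$.]

$$X_{i,j+1} = \max\Bigl\{ X_{i,j},\ \max_{i \le k \le j-1} \bigl( (X_{i,k-1} + 1 + X_{k+1,j})\, \rho_{k,j+1} \bigr) \Bigr\}$$

Here $\rho_{k,j+1} = 1$ if bases $k$ and $j+1$ can form a pair, and $0$ otherwise.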

Minimum free energy computations are based on empirical energies

GGCGCGCCCGGCGCC GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGG
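The maximum-matching computation can be sketched in a few lines of Python. This is a minimal illustration of the base-pair-counting recursion, not the energy-based folding used for real predictions. The function name `max_matching`, the 0-based indexing, and the `min_loop` parameter are assumptions of this sketch; `min_loop=1` (at least one unpaired base enclosed by every pair) appears consistent with the maximum-matching table shown for the first example sequence.

```python
# Allowed pairs: Watson-Crick plus GU wobble (rho = 1 for these, 0 otherwise).
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def max_matching(seq, min_loop=1):
    """Fill X[i][j] = maximum number of base pairs on the segment i..j (0-based)."""
    n = len(seq)
    X = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):           # j - i, shortest segments first
        for i in range(n - span):
            j = i + span
            best = X[i][j - 1]                    # base j stays unpaired
            for k in range(i, j - min_loop):      # base j pairs with base k
                if seq[k] + seq[j] in PAIRS:
                    left = X[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + X[k + 1][j - 1])
            X[i][j] = best
    return X


X = max_matching("GGCGCGCCCGGCGCC")
print(X[0][14])  # maximum number of base pairs of the whole sequence: 6
```

The table is filled in order of increasing segment length, so every entry on the right-hand side of the recursion is already available when it is needed.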


slide-29
SLIDE 29

Maximum matching

       j    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   i        G  G  C  G  C  G  C  C  C  G  G  C  G  C  C
   1   G    *  *  1  1  1  1  2  3  3  3  4  4  5  6  6
   2   G       *  *     1  1  2  2  2  3  3  4  4  5  6
   3   C          *  *     1  1  1  2  3  3  3  4  5  5
   4   G             *  *     1  1  2  2  2  3  4  5  5
   5   C                *  *     1  1  2  2  3  4  4  4
   6   G                   *  *  1  1  1  2  3  3  3  4
   7   C                      *  *     1  2  2  2  2  3
   8   C                         *  *  1  1  1  2  2  2
   9   C                            *  *  1  1  2  2  2
  10   G                               *  *  1  1  1  2
  11   G                                  *  *     1  1
  12   C                                     *  *     1
  13   G                                        *  *  1
  14   C                                           *  *
  15   C                                              *

An example of a dynamic programming computation of the maximum number of base pairs. Backtracking yields the structure(s).

$$X_{i,j+1} = \max\Bigl\{ X_{i,j},\ \max_{i \le k \le j-1} \bigl( (X_{i,k-1} + 1 + X_{k+1,j})\, \rho_{k,j+1} \bigr) \Bigr\}$$

Minimum free energy computations are based on empirical energies

GGCGCGCCCGGCGCC GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGG

slide-30
SLIDE 30

[Figure: free energy $\Delta G$ of the conformations of one sequence (5'-end to 3'-end): the minimum of free energy $S_0^{(h)}$ and the suboptimal conformations $S_1^{(h)}, \ldots, S_9^{(h)}$.]

The minimum free energy structures on a discrete space of conformations

slide-31
SLIDE 31

[Figure: free energy $\Delta G$ along a "reaction coordinate" with two local minima $S_k$ and $S_l$ separated by the saddle point $T_{lk}$, and the corresponding "barrier tree".]

Definition of a 'barrier tree'

slide-32
SLIDE 32

[Figure: barrier tree of the suboptimal structures $S_0, \ldots, S_{10}$ with local minima numbered 1 to 49 and barrier heights 3.10, 3.30, 5.10, 5.90, and 7.40. Labels: kinetic folding, suboptimal structures, finite folding time.]

A typical energy landscape of a sequence with two (meta)stable conformations

slide-33
SLIDE 33

Kinetics of RNA refolding between a long-lived metastable conformation and the minimum free energy structure

slide-34
SLIDE 34

3. Inverse folding of RNA

slide-35
SLIDE 35

Inverse folding of RNA secondary structures

Minimum free energy criterion

The idea of the inverse folding algorithm is to search for sequences that form a given RNA secondary structure under the minimum free energy criterion.

slide-36
SLIDE 36

Structure

slide-37
SLIDE 37

[Figure: the structure and a compatible sequence, each drawn from the 5'-end to the 3'-end.]

Structure and compatible sequence

slide-38
SLIDE 38

[Figure: the structure and a compatible sequence, each drawn from the 5'-end to the 3'-end.]

Structure and compatible sequence

slide-39
SLIDE 39

[Figure: the structure and a compatible sequence, each drawn from the 5'-end to the 3'-end.]

Structure and compatible sequence

Single nucleotides: A, U, G, C

Single bases are varied independently

slide-40
SLIDE 40

[Figure: the structure and a compatible sequence, each drawn from the 5'-end to the 3'-end.]

Structure and compatible sequence

Base pairs: AU, UA, GC, CG, GU, UG

Base pairs are varied in strict correlation

slide-41
SLIDE 41

[Figure: the structure and two compatible sequences, each drawn from the 5'-end to the 3'-end.]

Structure and compatible sequences

slide-42
SLIDE 42

[Figure: the structure and an incompatible sequence, each drawn from the 5'-end to the 3'-end.]

Structure and incompatible sequence
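Compatibility of a sequence with a structure can be checked mechanically: every position pair that the structure joins must carry one of the six allowed base pairs, while unpaired positions may carry any nucleotide. The sketch below assumes the structure is given in parentheses notation and is balanced; the names `pair_table` and `is_compatible` are hypothetical.

```python
ALLOWED_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_table(structure):
    """table[i] = partner of position i, or -1 if unpaired (balanced input assumed)."""
    table, open_positions = [-1] * len(structure), []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            i = open_positions.pop()
            table[i], table[j] = j, i
    return table


def is_compatible(sequence, structure):
    """True if every base pair of the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    for i, j in enumerate(pair_table(structure)):
        if j > i and sequence[i] + sequence[j] not in ALLOWED_PAIRS:
            return False
    return True


print(is_compatible("GGGAAACCC", "(((...)))"))  # True
print(is_compatible("GGGAAACCA", "(((...)))"))  # False: G and A cannot pair
```

Compatibility is only a necessary condition: a compatible sequence can form the structure, but the structure need not be its minimum free energy structure.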

slide-43
SLIDE 43

[Figure: 2D sketch of sequence space; neighbouring sequences differ by one point mutation ($d_H = 1$), sequences two moves apart have $d_H = 2$ (city-block distance in sequence space).]

Single point mutations as moves in sequence space

slide-44
SLIDE 44

[Figure: hypercube of dimension n = 5 whose corners carry the decimal codes of the binary sequences, arranged by mutant class 1 to 5, i.e. by Hamming distance from the reference sequence "0".]

Binary sequences are encoded by their decimal equivalents: C = 0 and G = 1, for example, "0" ≡ 00000 ≡ CCCCC, "14" ≡ 01110 ≡ CGGGC, "29" ≡ 11101 ≡ GGGCG, etc.

Decimal coding of binary sequences

Sequence space of binary sequences of chain length n = 5
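The decimal coding can be reproduced with a few helper functions. This is an illustrative sketch; the function names are hypothetical, and the mutant class is taken to be the Hamming distance from the all-C reference sequence, i.e. the number of G's.

```python
def encode(sequence):
    """Decimal code of a binary sequence with C = 0 and G = 1."""
    return int(sequence.replace("C", "0").replace("G", "1"), 2)


def decode(number, n):
    """Binary sequence of chain length n that belongs to a decimal code."""
    return format(number, f"0{n}b").replace("0", "C").replace("1", "G")


def mutant_class(number):
    """Number of G's, i.e. Hamming distance from the reference sequence CC...C."""
    return bin(number).count("1")


def neighbours(number, n):
    """Codes of the n one-error mutants: the hypercube corners joined to `number`."""
    return [number ^ (1 << position) for position in range(n)]


print(encode("CGGGC"), encode("GGGCG"))  # 14 29
print(decode(14, 5), mutant_class(29))   # CGGGC 4
print(sorted(neighbours(0, 5)))          # [1, 2, 4, 8, 16]
```

Flipping one bit with XOR corresponds to one point mutation, so every corner of the hypercube of dimension n has exactly n neighbours.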

slide-45
SLIDE 45

[Figure: two sequences $I_1$ and $I_2$ of the form CGTCGTTACAATTTA·GTTATGTGCGAATTC·CAAATT·AAAA·ACAAGAG..... that differ in the four positions marked ·.]

Hamming distance $d_H(I_1, I_2) = 4$

(i) $d_H(I_1, I_1) = 0$
(ii) $d_H(I_1, I_2) = d_H(I_2, I_1)$
(iii) $d_H(I_1, I_3) \le d_H(I_1, I_2) + d_H(I_2, I_3)$

The Hamming distance between sequences induces a metric in sequence space
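The three metric properties can be checked by brute force on a small sequence space. The sketch below is an illustration, not a proof; the function names and the choice of all AUGC sequences of length 2 as test space are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which two sequences of equal length differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_metric_axioms(space):
    """Check properties (i) to (iii) for every triple of sequences in `space`."""
    for a, b, c in product(space, repeat=3):
        if hamming(a, a) != 0 or hamming(a, b) != hamming(b, a):
            return False
        if hamming(a, c) > hamming(a, b) + hamming(b, c):
            return False
    return True


space = ["".join(letters) for letters in product("AUGC", repeat=2)]
print(satisfies_metric_axioms(space))  # True
```

The same function applies unchanged to structures written in parentheses notation, since those are also strings of equal length over a fixed alphabet.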

slide-46
SLIDE 46

[Figure legend: target structure $S_k$; initial trial sequences; target sequence; stop sequence of an unsuccessful trial; intermediate compatible sequences.]

Approach to the target structure Sk in the inverse folding algorithm

slide-47
SLIDE 47

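Inverse folding algorithm

$I_0 \to I_1 \to I_2 \to I_3 \to I_4 \to \ldots \to I_k \to I_{k+1} \to \ldots \to I_t$
$S_0 \to S_1 \to S_2 \to S_3 \to S_4 \to \ldots \to S_k \to S_{k+1} \to \ldots \to S_t$

$I_{k+1} = M_k(I_k)$ and $\Delta d_S(S_k, S_{k+1}) = d_S(S_{k+1}, S_t) - d_S(S_k, S_t) < 0$

$M$ .... base or base pair mutation operator
$d_S(S_i, S_j)$ .... distance between the two structures $S_i$ and $S_j$
'Unsuccessful trial' .... termination after $n$ steps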
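The walk described on this slide can be sketched as follows. This is a toy illustration and not the Vienna RNA Package implementation: as a stand-in for minimum free energy folding it uses a hypothetical `toy_fold` that returns one maximum-matching structure, and the structure distance is taken as the Hamming distance between parentheses strings. All function names, the minimal loop size of 3, and the default step limit are assumptions of this sketch.

```python
import random

PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]
BASES = "AUGC"


def pair_table(structure):
    table, open_positions = [-1] * len(structure), []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            i = open_positions.pop()
            table[i], table[j] = j, i
    return table


def toy_fold(seq, min_loop=3):
    """Toy stand-in for MFE folding: one structure with the maximum number of pairs."""
    n = len(seq)
    X = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            X[i][j] = X[i][j - 1]
            for k in range(i, j - min_loop):
                if seq[k] + seq[j] in PAIRS:
                    left = X[i][k - 1] if k > i else 0
                    X[i][j] = max(X[i][j], left + 1 + X[k + 1][j - 1])
    structure, todo = ["."] * n, [(0, n - 1)]
    while todo:  # backtracking through the filled table
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if X[i][j] == X[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = X[i][k - 1] if k > i else 0
            if seq[k] + seq[j] in PAIRS and left + 1 + X[k + 1][j - 1] == X[i][j]:
                structure[k], structure[j] = "(", ")"
                todo += [(i, k - 1), (k + 1, j - 1)]
                break
    return "".join(structure)


def structure_distance(s1, s2):
    return sum(a != b for a, b in zip(s1, s2))


def mutate(seq, table, rng):
    """Mutation operator M: change one unpaired base or one whole base pair."""
    new, i = list(seq), rng.randrange(len(seq))
    j = table[i]
    if j == -1:
        new[i] = rng.choice(BASES)
    else:
        new[min(i, j)], new[max(i, j)] = rng.choice(PAIRS)
    return new


def inverse_fold(target, fold=toy_fold, max_steps=2000, seed=0):
    """Return a sequence that folds into `target`, or None for an unsuccessful trial."""
    rng, table = random.Random(seed), pair_table(target)
    seq = [rng.choice(BASES) for _ in target]       # random compatible start
    for i, j in enumerate(table):
        if j > i:
            seq[i], seq[j] = rng.choice(PAIRS)
    dist = structure_distance(fold("".join(seq)), target)
    for _ in range(max_steps):
        if dist == 0:
            break
        trial = mutate(seq, table, rng)
        d = structure_distance(fold("".join(trial)), target)
        if d < dist:                                # accept only moves with negative distance change
            seq, dist = trial, d
    return "".join(seq) if dist == 0 else None


print(inverse_fold("((((....))))"))
```

Because only strictly improving moves are accepted, a walk can get stuck before reaching the target; this corresponds to the unsuccessful trial that is terminated after a fixed number of steps.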

slide-48
SLIDE 48

Minimum free energy criterion

Inverse folding of RNA secondary structures

[Figure: five successive trials, 1st to 5th.]

The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.

slide-49
SLIDE 49

4. Sequence structure maps, neutral networks, and intersection

slide-50
SLIDE 50

Minimal hairpin loop size: $n_{lp} \ge 3$
Minimal stack length: $n_{st} \ge 2$

Recursion formula for the number of acceptable RNA secondary structures
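A simplified counting recursion illustrates the idea. The sketch below enforces only the minimal hairpin loop size; the minimal stack length of the slide is not implemented, so the numbers it produces are upper bounds for the count of acceptable structures. The function name and the decomposition (last base unpaired, or paired with an earlier base k) are assumptions of this sketch.

```python
def count_structures(n, min_loop=3):
    """Number of secondary structures on n bases with hairpin loops of at least min_loop bases.

    Simplification: stacks of any length are allowed (no minimal stack length).
    """
    counts = [1] * (n + 1)                       # counts[m] for chains of m bases
    for m in range(1, n + 1):
        total = counts[m - 1]                    # base m is unpaired
        for k in range(1, m - min_loop):         # base m pairs with base k (1-based)
            # k - 1 bases to the left of the pair, m - k - 1 bases enclosed by it
            total += counts[k - 1] * counts[m - k - 1]
        counts[m] = total
    return counts[n]


print([count_structures(n, min_loop=1) for n in range(9)])
# [1, 1, 1, 2, 4, 8, 17, 37, 82]
print(count_structures(7), 4 ** 7)  # 8 structures versus 16384 sequences
```

Even without the stack constraint the number of structures grows much more slowly than the number of sequences, $4^n$.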

slide-51
SLIDE 51

Computed numbers of minimum free energy structures over different nucleotide alphabets

P. Schuster, Molecular insights into evolution of phenotypes. In: J. Crutchfield & P. Schuster, Evolutionary Dynamics. Oxford University Press, New York 2003, pp. 163-215.

slide-52
SLIDE 52

Hamming distance $d_H(S_1, S_2) = 4$

(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$

The Hamming distance between structures in parentheses notation forms a metric in structure space

slide-53
SLIDE 53

RNA sequences as well as RNA secondary structures can be visualized as objects in metric spaces. At constant chain length the sequence space is a (generalized) hypercube.

The mapping from RNA sequences into RNA secondary structures is many-to-one. Hence, it is redundant and not invertible.

RNA sequences that are mapped onto the same RNA secondary structure are neutral with respect to structure. The pre-images of structures in sequence space are neutral networks. They can be represented by graphs where the edges connect sequences of Hamming distance $d_H = 1$.

slide-54
SLIDE 54
slide-55
SLIDE 55

$S_k = \psi(I_\cdot)$

$f_k = f(S_k)$

Sequence space → structure space → real numbers

Mapping from sequence space into structure space and into function

slide-56
SLIDE 56

$S_k = \psi(I_\cdot)$

$f_k = f(S_k)$

Sequence space → structure space → real numbers

slide-57
SLIDE 57

$S_k = \psi(I_\cdot)$

$f_k = f(S_k)$

Sequence space → structure space → real numbers

The pre-image of the structure Sk in sequence space is the neutral network Gk

slide-58
SLIDE 58

Neutral networks are sets of sequences forming the same structure. $G_k$ is the pre-image of the structure $S_k$ in sequence space:

$G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$

The set is converted into a graph by connecting all sequences of Hamming distance one.

Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. This number, $N = 4^n$, becomes very large with increasing length and is prohibitive for numerical computations.

Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
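Exhaustive folding of a complete sequence space can be illustrated on a very small scale. The sketch below is a toy: it uses the two-letter GC alphabet to keep the space at $2^n$ sequences, and a hypothetical `toy_fold` based on maximum matching instead of minimum free energy folding. All names and the chain length 8 are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import product

PAIRS = {"GC", "CG"}  # two-letter alphabet keeps the sequence space small


def toy_fold(seq, min_loop=3):
    """Toy stand-in for MFE folding: one structure with the maximum number of pairs."""
    n = len(seq)
    X = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            X[i][j] = X[i][j - 1]
            for k in range(i, j - min_loop):
                if seq[k] + seq[j] in PAIRS:
                    left = X[i][k - 1] if k > i else 0
                    X[i][j] = max(X[i][j], left + 1 + X[k + 1][j - 1])
    structure, todo = ["."] * n, [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if X[i][j] == X[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = X[i][k - 1] if k > i else 0
            if seq[k] + seq[j] in PAIRS and left + 1 + X[k + 1][j - 1] == X[i][j]:
                structure[k], structure[j] = "(", ")"
                todo += [(i, k - 1), (k + 1, j - 1)]
                break
    return "".join(structure)


def neutral_networks(n, alphabet="GC"):
    """Map every structure to its pre-image: the set of sequences folding into it."""
    networks = defaultdict(set)
    for letters in product(alphabet, repeat=n):
        seq = "".join(letters)
        networks[toy_fold(seq)].add(seq)
    return networks


def components(network, alphabet="GC"):
    """Split a neutral network into components connected by Hamming distance one."""
    unseen, parts = set(network), []
    while unseen:
        start = unseen.pop()
        part, frontier = {start}, [start]
        while frontier:
            seq = frontier.pop()
            for i in range(len(seq)):
                for base in alphabet:
                    neighbour = seq[:i] + base + seq[i + 1:]
                    if neighbour in unseen:
                        unseen.remove(neighbour)
                        part.add(neighbour)
                        frontier.append(neighbour)
        parts.append(part)
    return parts


networks = neutral_networks(8)
for structure, members in sorted(networks.items(), key=lambda item: -len(item[1])):
    print(structure, len(members), len(components(members)))
```

Each printed line shows a structure, the size of its pre-image, and the number of connected components of the corresponding graph.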

slide-59
SLIDE 59

Random graph approach to neutral networks

[Figure sequence on slides 59 to 70: sketch of sequence space at steps 0, 1, 2, 3, 4, 5, 10, 15, 25, 50, 75, and 100 of the random insertion of nodes.]

slide-71
SLIDE 71

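$\lambda_j = 12/27 = 0.444$, $\quad \bar{\lambda}_k = \dfrac{\sum_{j=1}^{|G_k|} \lambda_j(k)}{|G_k|}$

$G_k = \psi^{-1}(S_k) = \bigcup_j \{ I_j \mid \psi(I_j) = S_k \}$

Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$

Alphabet size $\kappa$: AUGC gives $\kappa = 4$

$\bar{\lambda}_k > \lambda_{cr}$ .... network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$ .... network $G_k$ is not connected

  κ    λcr      Alphabets
  2    0.5      GC, AU
  3    0.423    GUC, AUG
  4    0.370    AUGC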

Mean degree of neutrality and connectivity of neutral networks
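The tabulated thresholds follow directly from $\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$, as the short check below shows. The function names are hypothetical; the connectivity prediction simply compares a mean degree of neutrality with the threshold, as stated on the slide.

```python
def connectivity_threshold(kappa):
    """Critical degree of neutrality for an alphabet of kappa letters."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def predicted_connected(mean_neutrality, kappa):
    """Random graph prediction: connected if the mean neutrality exceeds the threshold."""
    return mean_neutrality > connectivity_threshold(kappa)


for kappa in (2, 3, 4):
    print(kappa, round(connectivity_threshold(kappa), 3))
# 2 0.5
# 3 0.423
# 4 0.37

print(predicted_connected(12 / 27, 4))  # True: 0.444 > 0.370
```

The threshold decreases with growing alphabet size, so at the same degree of neutrality a four-letter network is more likely to be connected than a two-letter one.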

slide-72
SLIDE 72

A connected neutral network

slide-73
SLIDE 73

Giant Component

A multi-component neutral network

slide-74
SLIDE 74

Reference for postulation and in silico verification of neutral networks

slide-75
SLIDE 75

[Figure: the structure $S_k$, its neutral network $G_k$, and its compatible set $C_k$, with $G_k \subset C_k$.]

The compatible set $C_k$ of a structure $S_k$ consists of all sequences that form $S_k$ either as their minimum free energy structure (the neutral network $G_k$) or as one of their suboptimal structures.

slide-76
SLIDE 76

[Figure: the compatible sets of structure $S_0$ and structure $S_1$.]

The intersection of two compatible sets is always non-empty: $C_0 \cap C_1 \neq \emptyset$
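One way to see why the intersection cannot be empty is constructive; the sketch below illustrates that argument and is not necessarily the proof referred to in the talk. Join two positions whenever they are paired in either structure. Every position then has at most two neighbours, and any cycle alternates between pairs of the two structures, so it has even length. Such a graph can always be coloured with two colours, here G and C, and every pair of either structure becomes GC or CG. The function names are hypothetical, and balanced parentheses strings of equal length are assumed.

```python
from collections import deque


def pair_table(structure):
    table, open_positions = [-1] * len(structure), []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            i = open_positions.pop()
            table[i], table[j] = j, i
    return table


def intersection_sequence(s0, s1):
    """Build a GC-only sequence that is compatible with both structures."""
    if len(s0) != len(s1):
        raise ValueError("structures must have the same length")
    t0, t1 = pair_table(s0), pair_table(s1)
    seq = [None] * len(s0)
    for start in range(len(seq)):
        if seq[start] is not None:
            continue
        seq[start] = "G"
        queue = deque([start])
        while queue:                              # two-colour one component
            i = queue.popleft()
            for j in (t0[i], t1[i]):
                if j != -1 and seq[j] is None:
                    seq[j] = "C" if seq[i] == "G" else "G"
                    queue.append(j)
    return "".join(seq)


def is_compatible(sequence, structure):
    return all(j < i or sequence[i] + sequence[j] in {"GC", "CG"}
               for i, j in enumerate(pair_table(structure)))


s0, s1 = "((((...)))).", ".((((...))))"
seq = intersection_sequence(s0, s1)
print(seq, is_compatible(seq, s0), is_compatible(seq, s1))
```

The constructed sequence is compatible with both structures; whether one of them is its minimum free energy structure is a separate question that needs folding.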

slide-77
SLIDE 77

Reference for the definition of the intersection and the proof of the intersection theorem

slide-78
SLIDE 78

[Figure: one sequence drawn in two conformations, the minimum free energy conformation $S_0$ and the suboptimal conformation $S_1$.]

A sequence at the intersection of two neutral networks is compatible with both structures

slide-79
SLIDE 79

[Figure: barrier tree with local minima numbered 1 to 49 and barrier heights 3.10, 3.30, 5.10, 5.90, and 7.40; basin '0' belongs to the minimum free energy structure $S_0$, basin '1' to the long-lived metastable structure $S_1$.]

Barrier tree for two long-lived structures

slide-80
SLIDE 80
slide-81
SLIDE 81

5. Reference to experimental data

slide-82
SLIDE 82

A ribozyme switch

E.A. Schultes, D.P. Bartel, Science 289 (2000), 448-452

slide-83
SLIDE 83

Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis delta virus (B)
slide-84
SLIDE 84

The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures

slide-85
SLIDE 85

Two neutral walks through sequence space with conservation of structure and catalytic activity

slide-86
SLIDE 86

Nature 402, 323-325, 1999

Catalytic activity in the AUG alphabet

slide-87
SLIDE 87

Nature 420, 841-844, 2002

Catalytic activity in the DU alphabet

slide-88
SLIDE 88

[Figure: clover-leaf secondary structure of tRNAphe, positions numbered 10 to 70 from the 5'-end to the 3'-end.]

RNA clover-leaf secondary structures of sequences with chain length n = 76

slide-89
SLIDE 89

[Figure legend: target structure $S_k$; initial trial sequences; target sequence; stop sequence of an unsuccessful trial; intermediate compatible sequences.]

Approach to the target structure Sk in the inverse folding algorithm

slide-90
SLIDE 90

[Figure: cloverleaf secondary structures, positions numbered 10 to 70 from the 5'-end to the 3'-end.]

Probability of successful trials in inverse folding

  Alphabet    Structure 1      Structure 2      Structure 3
  AU          -                -                0.051 ± 0.006
  AUG         -                0.003 ± 0.001    0.374 ± 0.016
  AUGC        0.794 ± 0.007    0.884 ± 0.008    0.982 ± 0.004
  UGC         0.548 ± 0.011    0.628 ± 0.012    0.818 ± 0.012
  GC          0.067 ± 0.007    0.086 ± 0.008    0.127 ± 0.006

Accessibility of cloverleaf RNA secondary structures through inverse folding
slide-91
SLIDE 91

[Figure: cloverleaf secondary structures, positions numbered 10 to 70 from the 5'-end to the 3'-end.]

Degree of neutrality

  Alphabet    Structure 1      Structure 2      Structure 3
  AU          -                -                0.073 ± 0.032
  AUG         -                0.217 ± 0.051    0.201 ± 0.056
  AUGC        0.275 ± 0.064    0.279 ± 0.063    0.313 ± 0.058
  UGC         0.263 ± 0.071    0.257 ± 0.070    0.250 ± 0.064
  GC          0.052 ± 0.033    0.057 ± 0.034    0.068 ± 0.034

Degree of neutrality of cloverleaf RNA secondary structures over different alphabets
slide-92
SLIDE 92

6. Concluding remarks

slide-93
SLIDE 93

Concluding remarks

1. At constant chain length the number of RNA sequences exceeds the number of secondary structures by orders of magnitude.
2. The pre-images of common structures in sequence space are extended and connected neutral networks.
3. The intersection of the sets of compatible sequences of two structures is always non-empty.
4. Inverse folding allows for the design of RNA molecules with predefined structures and properties.

slide-94
SLIDE 94

Acknowledgement of support

Fonds zur Förderung der wissenschaftlichen Forschung (FWF): Projects No. 09942, 10578, 11065, 13093, 13887, and 14898
Jubiläumsfonds der Österreichischen Nationalbank: Project No. Nat-7813
European Commission: Project No. EU-980189
Siemens AG, Austria
The Santa Fe Institute and the Universität Wien

The software for producing RNA movies was developed by Robert Giegerich and coworkers at the Universität Bielefeld

Universität Wien

slide-95
SLIDE 95

Coworkers

Universität Wien

Walter Fontana, Santa Fe Institute, NM
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Peter Stadler, Bärbel Stadler, Universität Leipzig, GE
Ivo L. Hofacker, Christoph Flamm, Universität Wien, AT
Andreas Wernitznig, Michael Kospach, Universität Wien, AT
Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder
Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty
Andreas DeStefano
Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE
Walter Grüner, Stefan Kopp, Jaqueline Weber

slide-96
SLIDE 96

Web-Page for further information: http://www.tbi.univie.ac.at/~pks

slide-97
SLIDE 97