Prediction and Analysis of RNA Secondary Structures
Peter Schuster Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien RNA Secondary Structures in Dijon Dijon, 24.– 26.06.2002
Three-dimensional structure of phenylalanyl-transfer-RNA
RNA Secondary Structures and their Properties
RNA secondary structures are listings of Watson-Crick and GU wobble base pairs that are free of knots and pseudoknots. Secondary structures are folding intermediates in the formation of full three-dimensional structures.
D.Thirumalai, N.Lee, S.A.Woodson, and D.K.Klimov. Annu.Rev.Phys.Chem. 52:751-762 (2001)
[Figure panels: Sequence, Secondary Structure, Symbolic Notation; each drawn from the 5'-end to the 3'-end with positions 10 to 70 marked]
Definition and formation of the secondary structure of phenylalanyl-tRNA
[Figure labels: 5'-end, 3'-end; positions 10 to 70]
Circle representation of tRNAphe
[Figure labels: 5'-end, 3'-end, virtual root]
Tree representation of tRNAphe
[Figure labels: 5'-end, 3'-end; positions 10 to 76]
Mountain representation of tRNAphe
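The mountain representation can be computed directly from a symbolic structure string: the height at each position is the number of base pairs that enclose it. The sketch below assumes that the symbolic notation is the dot-bracket form; the function name `mountain_heights` and the toy hairpin are illustrative and not taken from the slides.

```python
def mountain_heights(dot_bracket: str) -> list[int]:
    """Return the mountain-plot height at every position of a dot-bracket string.

    Convention assumed here: '(' raises the height starting at its own position,
    ')' lowers it after its own position, '.' leaves it unchanged.
    """
    heights = []
    level = 0
    for symbol in dot_bracket:
        if symbol == "(":
            level += 1
            heights.append(level)
        elif symbol == ")":
            heights.append(level)
            level -= 1
            if level < 0:
                raise ValueError("unbalanced structure: too many ')'")
        elif symbol == ".":
            heights.append(level)
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    if level != 0:
        raise ValueError("unbalanced structure: too many '('")
    return heights


toy_hairpin = "((..((...))..))"  # hypothetical structure, not tRNAphe
print(mountain_heights(toy_hairpin))
# prints [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 3, 2, 2, 2, 1]
```

Plateaus correspond to unpaired stretches, and ascending or descending slopes correspond to the two strands of a stack.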
Mountain representation used in structure prediction of medium size RNA molecules
Mountain representation used in structure prediction of large RNA molecules
[Figure: three notions of RNA structure on a free energy axis. Minimum free energy structure S0 (T = 0 K, t → ∞); suboptimal structures S0 to S10 (T > 0 K, t → ∞); kinetic structures (T > 0 K, t finite). Also shown: a barrier tree with numbered local minima, the conformations S0 and S1, and energy labels ranging from 2.60 to 7.40.]
Different notions of RNA structure
RNA Minimum Free Energy Structures
Efficient algorithms based on dynamic programming are available for computing secondary structures for given sequences. Inverse folding algorithms compute sequences for given secondary structures.
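To illustrate the dynamic programming idea, here is a minimal sketch that maximizes the number of base pairs (the Nussinov recursion). This is a deliberately simplified stand-in: the algorithms cited on this slide minimize free energy with a loop-based energy model, which is not reproduced here. The names `fold_max_pairs`, `PAIRS` and `MIN_LOOP` are assumptions made for the example.

```python
# Allowed pairs: Watson-Crick and GU wobble
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin loop


def fold_max_pairs(seq: str) -> str:
    """Return one dot-bracket structure with the maximum number of base pairs."""
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    inner = best[i + 1][k - 1]
                    outer = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + inner + outer)
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                inner = best[i + 1][k - 1]
                outer = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + inner + outer:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return "".join(structure)


print(fold_max_pairs("GGGAAAUCC"))  # prints (((...)))
```

The table is filled in order of increasing subsequence length, so every entry depends only on shorter subsequences that are already solved. Energy-based folding follows the same pattern with loop energies in place of the pair count.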
M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981) Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.) I.L.Hofacker, W. Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P. Schuster. Mh.Chem. 125:167-188 (1994)
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
[Figure: inverse folding under the minimum free energy criterion, showing trials 1 to 5]
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
Criterion of Minimum Free Energy
Sequence Space Shape Space
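[Figure: four sequences that differ only in their last two positions (....GCUC....CA, ....GCUC....GU, ....GCUC....GA, ....GCUC....CU), with Hamming distances d_H = 1, d_H = 1 and d_H = 2 marked between them]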
Point mutations as moves in sequence space
CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG..... (S1 and S2 agree on these segments and differ at four positions; substituted letters on the slide: G, A, G, T, A, C, A, C)
Hamming distance: $d_H(S_1, S_2) = 4$
(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$
The Hamming distance induces a metric in sequence space
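The three properties can be checked by brute force on a small sequence space. The sketch below is illustrative only; the function names and the choice of length 2 for the exhaustive check are assumptions made for the example.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_metric_axioms(alphabet: str = "ACGU", length: int = 2) -> bool:
    """Check properties (i) to (iii) exhaustively for all sequences of the given length."""
    seqs = ["".join(p) for p in product(alphabet, repeat=length)]
    for s1 in seqs:
        if hamming(s1, s1) != 0:  # (i) zero distance to itself
            return False
        for s2 in seqs:
            if hamming(s1, s2) != hamming(s2, s1):  # (ii) symmetry
                return False
            for s3 in seqs:
                # (iii) triangle inequality
                if hamming(s1, s3) > hamming(s1, s2) + hamming(s2, s3):
                    return False
    return True


print(hamming("GCUCGA", "GCUCCU"))  # prints 2
print(satisfies_metric_axioms())    # prints True
```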
[Figure: sequence space of binary sequences drawn as a graph with nodes labelled 0 to 31 and arranged by mutant class 1 to 5]
Binary sequences are encoded by their decimal equivalents, with C = 0 and G = 1; for example, "0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.
Sequence space of binary sequences of chain length n=5
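The decimal encoding and the mutant classes can be reproduced in a few lines. The sketch assumes that C stands for 0 and G for 1, and that the mutant class of a sequence is its Hamming distance from the reference sequence "0"; the function names are illustrative.

```python
from collections import defaultdict


def to_sequence(number: int, n: int = 5) -> str:
    """Decode a decimal label into a binary sequence over {C, G} (C = 0, G = 1)."""
    bits = format(number, f"0{n}b")
    return bits.replace("0", "C").replace("1", "G")


def mutant_classes(n: int = 5) -> dict[int, list[int]]:
    """Group all 2**n labels by their Hamming distance from the all-C sequence."""
    classes = defaultdict(list)
    for number in range(2 ** n):
        classes[bin(number).count("1")].append(number)
    return dict(classes)


def neighbours(number: int, n: int = 5) -> list[int]:
    """All labels reachable by a single point mutation (one bit flip)."""
    return [number ^ (1 << k) for k in range(n)]


print(to_sequence(14), to_sequence(29))  # prints CGGGC GGGCG
print(mutant_classes()[1])               # prints [1, 2, 4, 8, 16]
print([len(v) for v in mutant_classes().values()])  # prints [1, 5, 10, 10, 5, 1]
```

The class sizes are the binomial coefficients for n = 5, and every point mutation moves a sequence to an adjacent mutant class.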
$S_k = \psi(I_{.})$
$f_k = f(S_k)$
Sequence space → Phenotype space → Non-negative numbers
Mapping from sequence space into phenotype space and into fitness values
Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length n. The number of sequences, N = 4^n, becomes very large with increasing chain length and soon makes exhaustive numerical computation prohibitive. Neutral networks can instead be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
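The random graph construction can be sketched on a small space. For brevity the example uses binary sequences encoded as integers rather than the four-letter RNA alphabet, which is an assumption of the sketch; two nodes are connected when they differ by one point mutation. Function names and parameter values are illustrative.

```python
import random


def random_neutral_network(n: int, size: int, seed: int = 0) -> set[int]:
    """Insert random nodes into binary sequence space until `size` nodes are present."""
    if not 0 <= size <= 2 ** n:
        raise ValueError("size must lie between 0 and 2**n")
    rng = random.Random(seed)
    network: set[int] = set()
    while len(network) < size:
        network.add(rng.randrange(2 ** n))
    return network


def components(network: set[int], n: int) -> list[set[int]]:
    """Connected components under single point mutations, largest first."""
    unseen = set(network)
    result = []
    while unseen:
        start = unseen.pop()
        component = {start}
        frontier = [start]
        while frontier:
            node = frontier.pop()
            for k in range(n):
                neighbour = node ^ (1 << k)  # flip one position
                if neighbour in unseen:
                    unseen.remove(neighbour)
                    component.add(neighbour)
                    frontier.append(neighbour)
        result.append(component)
    return sorted(result, key=len, reverse=True)


network = random_neutral_network(n=10, size=300, seed=1)
sizes = [len(c) for c in components(network, n=10)]
print(len(sizes), "components; largest has", sizes[0], "nodes")
```

Repeating the experiment for different network sizes shows how the largest component grows relative to the rest as the fraction of neutral sequences increases.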
Random graph approach to neutral networks: sketch of sequence space (animation frames for steps 0, 1, 2, 3, 4, 5, 10, 15, 25, 50, 75 and 100)
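Neutral network of structure $S_k$ (pre-image of $S_k$ in sequence space): $G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$
Local degree of neutrality (example): $\lambda_j = 12/27$
Mean degree of neutrality: $\bar{\lambda}_k = \frac{1}{|G_k|} \sum_{j \in G_k} \lambda_j(k)$
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$
Alphabet size $\kappa$: AUGC, $\kappa = 4$
$\bar{\lambda}_k > \lambda_{cr}$: network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$: network $G_k$ is not connected

κ    λcr
2    0.5
3    0.4226
4    0.3700

Mean degree of neutrality and connectivity of neutral networks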
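The tabulated thresholds can be reproduced numerically. The sketch assumes that the connectivity threshold has the form λcr = 1 − κ^(−1/(κ−1)), which matches the three listed values; the function names and the sample neutrality value 12/27 are used for illustration.

```python
def connectivity_threshold(kappa: int) -> float:
    """Critical mean degree of neutrality for an alphabet of size kappa."""
    if kappa < 2:
        raise ValueError("alphabet size must be at least 2")
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def mean_neutrality(local_values: list[float]) -> float:
    """Average of the local degrees of neutrality over a neutral network."""
    if not local_values:
        raise ValueError("the network must contain at least one sequence")
    return sum(local_values) / len(local_values)


def is_connected(mean_lambda: float, kappa: int) -> bool:
    """Random graph prediction: connected if the mean neutrality exceeds the threshold."""
    return mean_lambda > connectivity_threshold(kappa)


for kappa in (2, 3, 4):
    print(kappa, round(connectivity_threshold(kappa), 4))

print(is_connected(12 / 27, kappa=4))  # prints True
```

The threshold falls as the alphabet grows, so a given mean degree of neutrality is more likely to yield a connected network for AUGC sequences than for binary ones.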
Giant Component
A multi-component neutral network
A connected neutral network
Suboptimal RNA Secondary Structures
Michael Zuker. On finding all suboptimal foldings of an RNA molecule. Science 244 (1989), 48-52 Stefan Wuchty, Walter Fontana, Ivo L. Hofacker, Peter Schuster. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49 (1999), 145-165
[Figure: minimum free energy structure of the sequence, drawn from the 5'-end to the 3'-end]
Sequence: AAAGGGCACAGGGUGAUUUCAAUAAUUUUA
Total number of structures including all suboptimal conformations, stable and unstable (with ΔG0 > 0): #conformations = 1 416 661
Example of a small RNA molecule: n=30
Density of states of suboptimal structures of the RNA molecule with the sequence: AAAGGGCACAGGGUGAUUUCAAUAAUUUUA
Partition Function of RNA Secondary Structures
John S. McCaskill. The equilibrium function and base pair binding probabilities for RNA secondary structure. Biopolymers 29 (1990), 1105-1119 Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, Peter Schuster. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie 125 (1994), 167-188
Example of a small RNA molecule with two low-lying suboptimal conformations which contribute substantially to the partition function
UUGGAGUACACAACCUGUACACUCUUUC
Example of a small RNA molecule: n=28
[Figure: three conformations of the sequence UUGGAGUACACAACCUGUACACUCUUUC, each drawn from the 5'-end to the 3'-end]
Minimum free energy configuration: ∆G = -5.39 kcal/mole
First suboptimal configuration: ∆E(0→1) = 0.50 kcal/mole
Second suboptimal configuration: ∆E(0→2) = 0.55 kcal/mole
"Dot plot" of the minimum free energy structure (lower triangle) and the partition function (upper triangle) of a small RNA molecule (n=28) with low energy suboptimal configurations
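The weight of low-lying conformations in the partition function can be estimated from their energy gaps. The sketch below restricts the sum to the three conformations quoted for the n=28 example (energies 0, 0.50 and 0.55 kcal/mole relative to the minimum), whereas the full partition function sums over all structures. The temperature of 37 °C and the function name are assumptions.

```python
import math

R_KCAL = 1.98717e-3  # gas constant in kcal/(mol K)


def boltzmann_populations(relative_energies: list[float],
                          temperature_k: float = 310.15) -> list[float]:
    """Equilibrium populations of states from their energies in kcal/mole."""
    rt = R_KCAL * temperature_k
    weights = [math.exp(-energy / rt) for energy in relative_energies]
    partition_sum = sum(weights)  # partition function restricted to the given states
    return [weight / partition_sum for weight in weights]


# Minimum free energy, first and second suboptimal configuration
populations = boltzmann_populations([0.0, 0.50, 0.55])
for label, p in zip(("S0", "S1", "S2"), populations):
    print(f"{label}: {p:.3f}")
```

Within this three-state approximation the minimum free energy conformation accounts for only a little over half of the population, which is why the suboptimal conformations are visible in the upper triangle of the dot plot.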
Phenylalanyl-tRNA as an example for the computation of the partition function
tRNAphe without modified bases
First suboptimal configuration: ∆E(0→1) = 0.43 kcal/mole
tRNAphe with modified bases
First suboptimal configuration: ∆E(0→1) = 0.94 kcal/mole
Sequence as shown on the slide (5'-end to 3'-end): GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA
[Figure: dot plots and structure drawings for both cases]
Kinetic Folding of RNA Secondary Structures
Christoph Flamm, Walter Fontana, Ivo L. Hofacker, Peter Schuster. RNA folding kinetics at elementary step resolution. RNA 6:325-338, 2000
Christoph Flamm, Ivo L. Hofacker, Sebastian Maurer-Stroh, Peter F. Stadler, Martin Zehl. Design of multistable RNA molecules. RNA 7:254-265, 2001
The Folding Algorithm
A sequence I specifies an energy ordered set of compatible structures S(I):
S(I) = {S0 , S1 , … , Sm , O}
A trajectory Tk(I) is a time ordered series of structures in S(I). A folding trajectory is defined by starting with the open chain O and ending with the global minimum free energy structure S0 or a metastable structure Sk which represents a local energy minimum:
T0(I) = {O , S (1) , … , S (t-1) , S (t) , S (t+1) , … , S0} Tk(I) = {O , S (1) , … , S (t-1) , S (t) , S (t+1) , … , Sk}
Transition probabilities $P_{ij}(t) = \mathrm{Prob}\{S_i \to S_j\}$ are defined by
$P_{ij}(t) = P_i(t)\, k_{ij} = P_i(t)\, \exp(-\Delta G_{ij}/2RT) / \Sigma_i$
$P_{ji}(t) = P_j(t)\, k_{ji} = P_j(t)\, \exp(-\Delta G_{ji}/2RT) / \Sigma_j$
with the normalization
$\Sigma_i = \sum_{k=1,\, k \ne i}^{m+2} \exp(-\Delta G_{ik}/2RT)$
The symmetric rule for transition rate parameters is due to Kawasaki (K. Kawasaki, Diffusion constants near the critical point for time-dependent Ising models. Phys.Rev. 145:224-230, 1966).
Formulation of kinetic RNA folding as a stochastic process
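A toy version of this stochastic process can be simulated in a few lines. The sketch assumes that ΔG_ij is the free energy of S_j minus that of S_i and normalizes each rate over all other structures of the set, as in the formulas above; practical implementations usually restrict the targets to structures reachable by a single move. The three-state energy table is hypothetical.

```python
import math
import random

R_KCAL = 1.98717e-3  # gas constant in kcal/(mol K)


def kawasaki_rates(energies: dict[str, float],
                   temperature_k: float = 310.15) -> dict[str, dict[str, float]]:
    """Normalized symmetric-rule rates k_ij = exp(-dG_ij / 2RT) / Sigma_i."""
    rt = R_KCAL * temperature_k
    rates = {}
    for i, g_i in energies.items():
        raw = {j: math.exp(-(g_j - g_i) / (2 * rt))
               for j, g_j in energies.items() if j != i}
        sigma_i = sum(raw.values())
        rates[i] = {j: value / sigma_i for j, value in raw.items()}
    return rates


def fold_trajectory(energies: dict[str, float], start: str, target: str,
                    seed: int = 0, max_steps: int = 10_000) -> list[str]:
    """Sample a trajectory from `start` until `target` is reached or max_steps is hit."""
    rng = random.Random(seed)
    rates = kawasaki_rates(energies)
    state = start
    trajectory = [state]
    for _ in range(max_steps):
        if state == target:
            break
        targets = list(rates[state])
        weights = [rates[state][t] for t in targets]
        state = rng.choices(targets, weights=weights)[0]
        trajectory.append(state)
    return trajectory


# Hypothetical free energies in kcal/mole: open chain O, metastable S1, ground state S0
toy_energies = {"O": 0.0, "S1": -2.0, "S0": -5.0}
print(fold_trajectory(toy_energies, start="O", target="S0"))
```

Because the rule is symmetric, the ratio of forward and backward raw rates between two structures equals the Boltzmann factor of their energy difference.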
[Figure labels: base pair formation, base pair cleavage]
Base pair formation and base pair cleavage moves for nucleation and elongation of stacks
Base pair shift move of class 1: Shift inside internal loops or bulges
Base pair shift move of class 2: Shift involving free ends
Examples of rearrangements through consecutive shift moves
Mean folding curves for three small RNA molecules with different folding behavior
[Figure: a conformation S_h and its neighbouring conformations S_1(h), S_2(h), S_5(h), S_6(h), S_7(h), S_9(h) plotted against free energy G; labels: local minimum, suboptimal conformations]
Search for local minima in conformation space
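The search for local minima, and for the lowest saddle that separates two of them, can be illustrated on a toy landscape. In the sketch below conformations are nodes of a graph, edges are single moves, and the saddle height is the smallest possible maximum energy along any connecting path; this is the quantity a barrier tree records. The six-state landscape and all names are hypothetical.

```python
import heapq


def local_minima(energy: dict[str, float],
                 neighbours: dict[str, list[str]]) -> list[str]:
    """States whose energy is lower than that of every neighbour."""
    return sorted(s for s in energy
                  if all(energy[s] < energy[t] for t in neighbours[s]))


def saddle_height(energy: dict[str, float], neighbours: dict[str, list[str]],
                  start: str, goal: str) -> float:
    """Lowest achievable maximum energy along any path from start to goal."""
    best = {start: energy[start]}
    heap = [(energy[start], start)]
    while heap:
        height, state = heapq.heappop(heap)
        if state == goal:
            return height
        if height > best[state]:
            continue  # outdated heap entry
        for nxt in neighbours[state]:
            candidate = max(height, energy[nxt])
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    raise ValueError("goal is not reachable from start")


# Hypothetical landscape: energies in kcal/mole and the move graph
toy_energy = {"a": -5.0, "b": -1.0, "c": -3.0, "d": 0.5, "e": -2.0, "f": 2.0}
toy_moves = {"a": ["b", "f"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d", "f"], "f": ["a", "e"]}

print(local_minima(toy_energy, toy_moves))             # prints ['a', 'c', 'e']
print(saddle_height(toy_energy, toy_moves, "a", "c"))  # prints -1.0
print(saddle_height(toy_energy, toy_moves, "a", "e"))  # prints 0.5
```

The barrier for leaving a minimum is the saddle height minus the energy of that minimum; for example, escaping from c towards a costs 2.0 kcal/mole in this toy landscape.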
[Figure: free energy ΔG0 plotted against a "reaction coordinate" for two local minima S_k and S_ℓ separated by the saddle point T_ℓk, next to the corresponding "barrier tree"]
I1 = ACUGAUCGUAGUCAC
[Figure: folding curves for S0, S1, S2, S3 and the open chain O]
Example of an inefficiently folding small RNA molecule with n = 15
I2 = AUUGAGCAUAUUCAC
[Figure: folding curves for S0, S1, S4, S2, S3 and the open chain O]
Example of an easily folding small RNA molecule with n = 15
I3 = CGGGCUAUUUAGCUG
[Figure: folding curves for S0, S1, S2, S3 and the open chain O]
Example of an easily folding and especially stable small RNA molecule with n = 15
Folding dynamics of the sequence GGCCCCUUUGGGGGCCAGACCCCUAAAAAGGGUC
[Figure: minimum free energy conformation S0 and suboptimal conformation S1 of the same sequence, with the 3'-end marked]
One sequence is compatible with two structures
[Figure: barrier tree with numbered local minima, the conformations S0 and S1, and energy labels ranging from 2.60 to 7.40]
Barrier tree of a sequence with two conformations
[Figure panels: modified, unmodified]
Folding dynamics of tRNAphe with and without modified nucleotides
Barrier tree of tRNAphe without modified nucleotides
Coworkers
Walter Fontana, Santa Fe Institute, NM
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Peter Stadler, Universität Leipzig, GE
Ivo L.Hofacker, Christoph Flamm, Universität Wien, AT
Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT
Michael Kospach, Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder
Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty
Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE
Walter Grüner, Stefan Kopp, Jaqueline Weber