Diversity and Plasticity of RNA
Beyond the One-Sequence-One-Structure Paradigm
Peter Schuster Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien Chemistry towards Biology Portorož, 8.– 12.09.2002
The chemical formula of RNA consisting of nucleobases, ribose rings, phosphate groups, and sodium counterions
[Figure: RNA backbone drawn from the 5'-end to the 3'-end: ribose rings linked by phosphate groups, nucleobases N1 to N4 (Nk = A, U, G, C), and Na+ counterions]
Magnesium ions play a special role: they act as coordination centers that are indispensable for the formation of full three-dimensional structures
[Figure: a tRNA molecule shown from the 5'-end to the 3'-end, with positions 10 to 70 marked]
Crystallography NMR, FRET, ...... Biochemical probing Structure prediction and chemical
The one sequence – one structure paradigm
One day, when biomolecular structures were understood in sufficient detail, we would be able to design molecules with predefined structures and for a priori given purposes. Biomolecular structures are not fully understood yet, but the lack of knowledge in structure and function can be compensated by applying selection methods.
[Figure: sequence of the RNA aptamer, 5' to 3'; A = adenylate, U = uridylate, C = cytidylate, G = guanylate]
Combinatorial diversity of sequences: $N = 4^{27} = 1.801 \times 10^{16}$ possible different sequences
Combinatorial diversity of heteropolymers illustrated by means of an RNA aptamer that binds to the antibiotic tobramycin
Number of (different) sequences created by common-scale random synthesis: $10^{15}$ to $10^{16}$.
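The count quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `sequence_count` is introduced here for illustration, and the coverage estimate assumes (optimistically) that every molecule in a synthesized pool is different.

```python
# Number of distinct RNA sequences of chain length n over the alphabet {A, U, G, C}.
def sequence_count(n: int, alphabet_size: int = 4) -> int:
    return alphabet_size ** n

n_aptamer = 27  # chain length used on the slide
total = sequence_count(n_aptamer)
print(f"4^{n_aptamer} = {total} (about {total:.3e})")

# Fraction of sequence space that a random pool could cover at best,
# assuming all molecules in the pool are different (an idealization).
for pool_size in (10**15, 10**16):
    coverage = min(1.0, pool_size / total)
    print(f"pool of {pool_size:.0e} molecules: coverage <= {coverage:.3f}")
```

Under these assumptions a pool of this size can cover only part of the sequence space for n = 27, and the uncovered share grows by a factor of four with every added nucleotide.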
Taming of sequence diversity through selection and evolutionary design of RNA molecules
A.D.Ellington, J.W.Szostak, In vitro selection of RNA molecules that bind specific ligands. Nature 346 (1990), 818-822
C.Tuerk, L.Gold, SELEX - Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (1990), 505-510
D.P.Bartel, J.W.Szostak, Isolation of new ribozymes from a large pool of random sequences. Science 261 (1993), 1411-1418
R.D.Jenison, S.C.Gill, A.Pardi, B.Polisky, High-resolution molecular discrimination by RNA. Science 263 (1994), 1425-1429
[Figure: Selection cycle. Steps: amplification, diversification (genetic diversity), selection. Test "desired properties?": if no, the cycle is repeated; if yes, the cycle ends]
Selection cycle used in applied molecular evolution to design molecules with predefined properties
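The loop of selection, amplification and diversification can be sketched as a toy simulation. Everything specific here is an assumption made for illustration: the target motif `TARGET`, the match-counting `fitness` function (a stand-in for a real binding assay), the pool size and the mutation rate are hypothetical and do not come from the talk.

```python
import random

ALPHABET = "AUGC"
TARGET = "GGGAAACCC"  # hypothetical motif; stands in for "desired properties"

def fitness(seq: str) -> int:
    """Toy fitness: number of positions that match the hypothetical target."""
    return sum(a == b for a, b in zip(seq, TARGET))

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Diversification: each position is redrawn at random with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in seq)

def selection_cycle(pop_size: int = 200, rounds: int = 30,
                    rate: float = 0.05, seed: int = 1) -> list[int]:
    rng = random.Random(seed)
    # Initial genetic diversity: a pool of random sequences.
    pool = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    best_per_round = []
    for _ in range(rounds):
        pool.sort(key=fitness, reverse=True)
        survivors = pool[: pop_size // 2]                    # selection
        copies = [mutate(s, rate, rng) for s in survivors]   # amplification + diversification
        pool = survivors + copies                            # survivors are kept unchanged
        best_per_round.append(max(map(fitness, pool)))
    return best_per_round

history = selection_cycle()
print(history)
```

Because the selected molecules are carried over unchanged, the best fitness in the pool cannot decrease from one round to the next in this sketch; a laboratory protocol need not have that property.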
[Figure: chromatographic column; retention of binders, followed by elution of binders]
The SELEX technique for the evolutionary design of aptamers
[Figure: the aptamer sequence, 5' to 3', shown before and after formation of its secondary structure]
Formation of secondary structure of the tobramycin binding RNA aptamer
L. Jiang, A. K. Suri, R. Fiala, D. J. Patel, Chemistry & Biology 4:35-50 (1997)
The three-dimensional structure of the tobramycin aptamer complex
L. Jiang, A. K. Suri, R. Fiala, D. J. Patel, Chemistry & Biology 4:35-50 (1997)
Mapping RNA sequences onto RNA structures
The attempt to investigate this mapping is understood as a search for the relations between all possible $4^n$ sequences and all thermodynamically stable structures, which are the structures of minimal free energy. Sequence-structure mappings of RNA molecules were studied by a variety of different experimental and in silico techniques.
[Figure: a tRNA molecule in four representations: sequence, secondary structure, tertiary structure, symbolic notation]
What is an RNA structure? The secondary structure is a listing of base pairs, and it is understood in contrast to the full 3D-structure dealing with atomic coordinates. An intermediate state of structural details is provided by RNA threading or other toy models.
RNA Secondary Structures and their Properties
RNA secondary structures are listings of Watson-Crick and GU wobble base pairs, which are free of knots and pseudoknots. Secondary structures are folding intermediates in the formation of full three-dimensional structures.
D.Thirumalai, N.Lee, S.A.Woodson, and D.K.Klimov. Annu.Rev.Phys.Chem. 52:751-762 (2001)
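The definition of a secondary structure as a listing of base pairs can be made concrete with the common dot-bracket form of the symbolic notation. The sketch below assumes that notation (matching parentheses for paired positions, dots for unpaired ones); the function names and the example sequence are introduced here for illustration only.

```python
# Watson-Crick pairs plus the GU wobble pair, in both orientations.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Turn a dot-bracket string into a sorted list of base pairs (0-based indices)."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)

def is_compatible(seq: str, structure: str) -> bool:
    """True if every listed pair is a Watson-Crick or GU pair in this sequence."""
    return len(seq) == len(structure) and all(
        (seq[i], seq[j]) in ALLOWED for i, j in base_pairs(structure)
    )

example_seq = "GGGAAAUCC"        # hypothetical example, not from the talk
example_struct = "(((...)))"
print(base_pairs(example_struct))
print(is_compatible(example_seq, example_struct))
```

Matching parentheses can only describe nested pairs, so a structure written this way is free of pseudoknots by construction.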
RNA Minimum Free Energy Structures
Efficient algorithms based on dynamic programming are available for the computation of secondary structures for given sequences. Inverse folding algorithms compute sequences for given secondary structures.
M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981)
Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.)
I.L.Hofacker, W.Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P.Schuster. Mh.Chem. 125:167-188 (1994)
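The dynamic programming idea can be illustrated with a deliberately simplified stand-in: maximizing the number of base pairs (a Nussinov-style recursion) instead of minimizing free energy. This is not the energy model of the algorithms cited above and not the interface of the Vienna RNA Package; the function names, the minimum hairpin loop size of 3 and the example sequence are assumptions for illustration.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def fill_table(seq: str, min_loop: int = 3) -> list[list[int]]:
    """best[i][j] = maximum number of nested pairs in the subsequence i..j."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                      # case 1: j stays unpaired
            for k in range(i, j - min_loop):            # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best

def traceback(seq: str, best: list[list[int]], min_loop: int = 3) -> str:
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                                    # too short to hold a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)

seq = "GGGAAAUCC"  # hypothetical example, not from the talk
table = fill_table(seq)
print(table[0][-1], traceback(seq, table))
```

The table has n² entries and each entry scans up to n split points, so the sketch runs in cubic time, the same order as the standard energy-based recursions.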
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Criterion of Minimum Free Energy
Sequence Space Shape Space
Many sequences form the same minimum free energy secondary structure
Mapping from sequence space into phenotype space and into fitness values
Sequence space → phenotype space: $S_k = \psi(I_{.})$
Phenotype space → non-negative numbers: $f_k = f(S_k)$
A connected neutral network
Giant Component
A multi-component neutral network
[Figure: barrier tree with numbered local minima and barrier heights, connecting the conformations S0 and S1]
[Figure: free energy levels of the conformations S0 to S10]
Minimum free energy structure: T = 0 K, t → ∞
Suboptimal structures: T > 0 K, t → ∞
Kinetic structures: T > 0 K, t finite
Different notions of RNA structure including suboptimal conformations
Partition Function of RNA Secondary Structures
John S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29 (1990), 1105-1119
Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, Peter Schuster. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie 125 (1994), 167-188
Example of a small RNA molecule with two low-lying suboptimal conformations which contribute substantially to the partition function
UUGGAGUACACAACCUGUACACUCUUUC
Example of a small RNA molecule: n=28
"Dot plot" of the minimum free energy structure (lower triangle) and the partition function (upper triangle) of a small RNA molecule (n=28) with low energy suboptimal configurations
[Figure: dot plot; both axes are labelled with the sequence UUGGAGUACACAACCUGUACACUCUUUC, 5' to 3']
Minimum free energy configuration: ∆G = -5.39 kcal/mole
First suboptimal configuration: ∆E(0→1) = 0.50 kcal/mole
Second suboptimal configuration: ∆E(0→2) = 0.55 kcal/mole
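Why two low-lying suboptimal conformations matter can be seen from their Boltzmann weights. The sketch below uses the three energies quoted for the n=28 example (-5.39 kcal/mole for the minimum free energy configuration, and 0.50 and 0.55 kcal/mole above it) and restricts the sum to these three states, which is an assumption made for illustration: the full partition function runs over all secondary structures. The temperature of 37 °C is also an assumption.

```python
import math

R = 1.98720e-3   # gas constant in kcal/(mol K)
T = 310.15       # 37 degrees Celsius in kelvin (assumed, not stated on the slide)

# Energies in kcal/mole, taken from the n=28 example.
energies = {
    "S0 (mfe)": -5.39,
    "S1": -5.39 + 0.50,
    "S2": -5.39 + 0.55,
}

def boltzmann_probabilities(levels: dict[str, float], temperature: float = T) -> dict[str, float]:
    """Relative populations of the listed states only (a truncated partition function)."""
    beta = 1.0 / (R * temperature)
    e_min = min(levels.values())
    # Shifting by the lowest energy keeps the exponentials well scaled.
    weights = {name: math.exp(-beta * (e - e_min)) for name, e in levels.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}

probs = boltzmann_probabilities(energies)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Within this three-state truncation the minimum free energy configuration accounts for only a little more than half of the population, which is the sense in which the suboptimal configurations contribute substantially.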
[Figure: the tRNA in three representations: sequence, secondary structure, symbolic notation]
Phenylalanyl-tRNA as an example for the computation of the partition function
tRNAphe without modified bases: first suboptimal configuration, ∆E(0→1) = 0.43 kcal/mole
tRNAphe with modified bases: first suboptimal configuration, ∆E(0→1) = 0.94 kcal/mole
[Figure: dot plots for both cases; the axes are labelled with the sequence GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA, 5' to 3']
Kinetic Folding of RNA at Elementary Step Resolution
The RNA folding process is resolved to base pair closure, base pair cleavage and base pair shift. The kinetic folding behavior is determined by computing a sufficiently large ensemble of individual folding trajectories and taking an average over them. The folding behavior is illustrated by barrier trees showing the path of lowest energy between two local minima of free energy.
C.Flamm, W.Fontana, I.L.Hofacker and P.Schuster. RNA, 6:325-338 (2000)
closure shift cleavage
Move set for elementary steps in kinetic RNA folding
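The move set defines which structures are neighbours of a given structure. The sketch below enumerates neighbours for closure and cleavage only; the shift move is left out to keep the example short. Dot-bracket notation, the minimum hairpin loop size of 3, the function names and the example sequence are assumptions for illustration and are not taken from the cited work.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pairs_of(structure: str) -> set[tuple[int, int]]:
    """Base pairs of a dot-bracket string as a set of (i, j) with i < j."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs

def to_dot_bracket(pairs: set[tuple[int, int]], n: int) -> str:
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)

def neighbours(seq: str, structure: str, min_loop: int = 3) -> list[str]:
    """All structures one closure or one cleavage away (shift moves omitted)."""
    n = len(seq)
    pairs = pairs_of(structure)
    paired = {k for p in pairs for k in p}
    result = set()
    # Cleavage: open one existing base pair.
    for p in pairs:
        result.add(to_dot_bracket(pairs - {p}, n))
    # Closure: add one new pair between unpaired, complementary positions.
    for i in range(n):
        if i in paired:
            continue
        for j in range(i + min_loop + 1, n):
            if j in paired or (seq[i], seq[j]) not in ALLOWED:
                continue
            # The new pair must not cross an existing pair (no pseudoknots).
            if any(k < i < l < j or i < k < j < l for k, l in pairs):
                continue
            result.add(to_dot_bracket(pairs | {(i, j)}, n))
    return sorted(result)

for s in neighbours("GGGAAAUCC", "((.....))"):
    print(s)
```

A folding trajectory is then a walk on this neighbourhood graph, with each step chosen according to rates that depend on the free energy change; the rate model is outside the scope of this sketch.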
Mean folding curves for three small RNA molecules with n=15 and very different folding behavior
[Figure: free energy G of suboptimal conformations; a conformation Sh with its neighbours S1(h), S2(h), S5(h), S6(h), S7(h), S9(h); local minimum marked]
Search for local minima in conformation space
[Figure: free energy G0 along a "reaction coordinate" between local minima Sk, connected through a saddle point Tk, shown next to the corresponding "barrier tree"]
I1 = ACUGAUCGUAGUCAC
[Figure: barrier tree with the conformations S0, S1, S2, S3 and O]
Example of an inefficiently folding small RNA molecule with n = 15
I2 = AUUGAGCAUAUUCAC
[Figure: barrier tree with the conformations S0, S1, S4, S2, S3 and O]
Example of an easily folding small RNA molecule with n = 15
I3 = CGGGCUAUUUAGCUG
[Figure: barrier tree with the conformations S0, S1, S2, S3 and O]
Example of an easily folding and especially stable small RNA molecule with n = 15
Folding dynamics of the sequence GGCCCCUUUGGGGGCCAGACCCCUAAAAAGGGUC
[Figure: the sequence folded into the minimum free energy conformation S0 and into the suboptimal conformation S1; 3'-end marked]
One sequence is compatible with two structures
[Figure: barrier tree with numbered local minima and barrier heights, connecting the conformations S0 and S1]
Barrier tree of a sequence with two conformations
Is there experimental evidence for structural multiplicity of RNA sequences?
Are there RNA molecules with multiple functions? How can RNA molecules with multiple functions be designed?
The "hammerhead" ribozyme
[Figure: secondary structure of the ribozyme with the cleavage site marked; strand ends labelled 5' (ppp, OH) and 3' (OH)]
The smallest known catalytically active RNA molecule
A ribozyme switch
E.A.Schultes, D.P.Bartel, One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis-δ-virus (B)
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Reference for the definition of the intersection and the proof of the intersection theorem
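For secondary structures, the existence of a sequence that is compatible with two given structures can be sketched constructively. The argument assumed here: every position has at most one partner per structure, so the union of the two pairing patterns consists of paths and cycles of even length and can be two-coloured with complementary bases. The code uses only G and C for paired positions and A elsewhere; this is an illustrative sketch, not the construction from the reference on the slide, and the two example structures are hypothetical. Compatibility only means that the required pairs can form, not that the sequence actually folds into either structure.

```python
from collections import deque

def pairs_of(structure: str) -> list[tuple[int, int]]:
    """Base pairs of a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs

def intersection_sequence(s1: str, s2: str) -> str:
    """Return a sequence whose bases can form every pair of s1 and every pair of s2."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    n = len(s1)
    partners = [[] for _ in range(n)]
    for s in (s1, s2):
        for i, j in pairs_of(s):
            partners[i].append(j)
            partners[j].append(i)
    seq = [None] * n
    for start in range(n):
        if seq[start] is not None:
            continue
        if not partners[start]:
            seq[start] = "A"            # unpaired in both structures
            continue
        seq[start] = "G"
        queue = deque([start])
        while queue:                    # breadth-first two-colouring of one component
            u = queue.popleft()
            want = "C" if seq[u] == "G" else "G"
            for v in partners[u]:
                if seq[v] is None:
                    seq[v] = want
                    queue.append(v)
                elif seq[v] != want:
                    raise AssertionError("odd cycle: not expected for two structures")
    return "".join(seq)

def is_compatible(seq: str, structure: str) -> bool:
    return all({seq[i], seq[j]} == {"G", "C"} for i, j in pairs_of(structure))

struct_a = "(((....)))......"   # hypothetical structures of length 16
struct_b = "......(((....)))"
common = intersection_sequence(struct_a, struct_b)
print(common, is_compatible(common, struct_a), is_compatible(common, struct_b))
```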
Two neutral walks through sequence space with conservation of structure and catalytic activity
Sequence of mutants from the intersection to both reference ribozymes
Reference for postulation and in silico verification of neutral networks
From RNA secondary structures to full three-dimensional structures. Example: Phenylalanyl-transfer-RNA
What are the prospects for RNA structure modelling and elaborate sequence-structure analysis? Secondary structures are based on the identification of base pairs with defined and only marginally varying geometries that fit into A- or A'-type helices. By now a great variety of other classifiable base pairs have been found by crystallography and NMR. They can be readily included in structure prediction methods that are similar to the current algorithms for conventional secondary structures. What is needed, however, is the determination of thermodynamic parameters for these unconventional base-base interactions, as was done in the nineteen-seventies for DNA and RNA double helical and loop structures. So far these data are scarce, except for H-type pseudoknots and end-to-end stacking of helices. It seems that the prediction of RNA structures will be an easier task than that of proteins.
Classification of purine-pyrimidine base pairs
Classification of purine-purine base pairs
Classification of pyrimidine-pyrimidine base pairs
General classification of base pairs
N.B.Leontis and E. Westhof, RNA 7:499-512 (2001)