How innovation occurs in evolution of molecules
Peter Schuster Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien Evolutionary innovation Praha, 30.05.2002
Darwinian principle is based on three functions:
- Reproduction efficiency expressed by fitness of phenotypes.
- Variation of genotypes through imperfect copying and recombination.
- Selection of phenotypes based on differences in fitness.
Two additional features are required:
- Large reservoirs of genotypes and sufficiently rich repertoires of phenotypes.
- Mapping of genotypes into phenotypes with suitable properties.
The genotypes or genomes of individuals are DNA or RNA sequences. They change from generation to generation through mutation and recombination. Species are reproductively related ensembles of individuals. Genotypes unfold into phenotypes, being molecular structures, viruses or organisms, which are the targets of the evolutionary selection process.
The most common mutations are point mutations, which consist of single nucleotide exchanges. The Hamming distance between two sequences is the minimal number of single nucleotide exchanges that converts one sequence into the other.
[Figure: RNA sequence drawn from 5'-end to 3'-end; A = adenylate, U = uridylate, C = cytidylate, G = guanylate]
Genotype: The sequence of an RNA molecule consisting of monomers chosen from four classes, A, U, G, and C.
Phenotype: Three-dimensional structure of phenylalanyl transfer-RNA
Hydrogen bonds
Hydrogen bonding between nucleotide bases is the principle of template action of RNA and DNA.
[Figure: plus strand and minus strand with 5' and 3' ends; steps labelled Synthesis, Complex Dissociation, Synthesis]
Complementary replication as the simplest copying mechanism of RNA
[Figure: a plus strand copied into minus strands carrying a point mutation, an insertion and a deletion]
Mutations represent the mechanism of variation in nucleic acids.
[Figure: RNA aptamer sequence of 27 nucleotides, 5'-end to 3'-end; A = adenylate, U = uridylate, C = cytidylate, G = guanylate]
Combinatorial diversity of sequences: $N = 4^{\ell}$ ; $4^{27} = 1.801 \times 10^{16}$ possible different sequences
Combinatorial diversity of heteropolymers illustrated by means of an RNA aptamer that binds to the antibiotic tobramycin
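The count of possible sequences can be checked with a few lines of Python. This is a minimal sketch; the function name `sequence_space_size` is an illustrative assumption, not part of the presentation.

```python
def sequence_space_size(length: int, kappa: int = 4) -> int:
    """Number of distinct sequences of a given length over an alphabet of size kappa."""
    return kappa ** length

# A 27-nucleotide RNA over the four-letter alphabet AUGC
n27 = sequence_space_size(27)
print(n27)            # exact integer count
print(f"{n27:.3e}")   # the same number in scientific notation
```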
Sequence space → Phenotype space → Non-negative numbers: $S_k = \psi(I_{\cdot})$ ; $f_k = f(S_k)$
Mapping from sequence space into phenotype space and into fitness values
Evolution of RNA molecules based on Qβ phage
D.R.Mills, R.L.Peterson, S.Spiegelman, An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc.Natl.Acad.Sci.USA 58 (1967), 217-224
S.Spiegelman, An approach to the experimental analysis of precellular evolution. Quart.Rev.Biophys. 4 (1971), 213-253
C.K.Biebricher, Darwinian selection of self-replicating RNA molecules. Evolutionary Biology 16 (1983), 1-52
C.K.Biebricher, W.C.Gardiner, Molecular evolution of RNA in vitro. Biophysical Chemistry 66 (1997), 179-192
Stock solution: Qβ RNA-replicase, ATP, CTP, GTP and UTP, buffer
[Figure: an RNA sample is passed along the time axis through transfers 1, 2, 3, 4, 5, 6, ..., 69, 70]
The serial transfer technique applied to RNA evolution in vitro
The increase in RNA production rate during a serial transfer experiment
A ribozyme switch
E.A.Schultes, D.P.Bartel, One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis-δ-virus (B)
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Reference for the definition of the intersection and the proof of the intersection theorem
Two neutral walks through sequence space with conservation of structure and catalytic activity
Sequence of mutants from the intersection to both reference ribozymes
Reference for postulation and in silico verification of neutral networks
No new principle will declare itself from below a heap of facts.
Sir Peter Medawar, 1985
$dx_j/dt = k_j x_j - x_j \Phi$ ; $\Phi = \sum_i k_i x_i$ ; $\sum_i x_i = 1$
$[A] = a = \text{constant}$
$(A) + I_j \rightarrow 2\,I_j$ with rate constant $k_j$ ; $j = 1, 2, \dots, n$
$k_m = \max\{k_j ;\ j = 1, 2, \dots, n\}$ ; $x_m(t) \to 1$ for $t \to \infty$
$s = (k_{m+1} - k_m)/k_m$
Selection of the „fittest“ or fastest replicating species
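The selection equation, in which each fraction x_j grows with its rate constant k_j and is diluted by the mean production Φ = Σ k_i x_i, can be integrated numerically. The following is a minimal sketch using explicit Euler steps; the rate constants, initial fractions, step size and the name `simulate_selection` are illustrative assumptions rather than values from the presentation.

```python
def simulate_selection(k, x0, dt=0.01, steps=20000):
    """Euler integration of dx_j/dt = x_j * (k_j - Phi), with Phi = sum_i k_i x_i."""
    x = list(x0)
    for _ in range(steps):
        phi = sum(ki * xi for ki, xi in zip(k, x))          # mean replication rate
        x = [xi + dt * xi * (ki - phi) for ki, xi in zip(k, x)]
        total = sum(x)
        x = [xi / total for xi in x]                        # renormalise against round-off
    return x

rates = [1.0, 1.1, 1.2]       # hypothetical replication rate constants k_j
start = [0.98, 0.01, 0.01]    # the fastest replicator starts as a rare variant
final = simulate_selection(rates, start)
print([round(v, 6) for v in final])
```

In this sketch the species with the largest rate constant takes over the population, which is the behaviour summarised by x_m(t) → 1.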
[Plot: fraction of advantageous variant (0 to 1) versus time in generations (0 to 1000); curves for s = 0.1, s = 0.02 and s = 0.01]
Selection of advantageous mutants in populations of N = 10 000 individuals
Theory of molecular evolution
M.Eigen, Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58 (1971), 465-523
M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part A: Emergence of the hypercycle. Naturwissenschaften 64 (1977), 541-565
M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part B: The abstract hypercycle. Naturwissenschaften 65 (1978), 7-41
M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part C: The realistic hypercycle. Naturwissenschaften 65 (1978), 341-369
J.S.McCaskill, A localization threshold for macromolecular quasi-species from continuously distributed replication rates. J.Chem.Phys. 80 (1984), 5194-5205
M.Eigen, J.McCaskill, P.Schuster, The molecular quasispecies. Adv.Chem.Phys. 75 (1989), 149-263
C.Reidys, C.Forst, P.Schuster, Replication and mutation on neutral networks. Bull.Math.Biol. 63 (2001), 57-94
$(M) + I_j \rightarrow I_i + I_j$ with rate $k_j Q_{ij}$ ; $i = 1, 2, \dots, n$
$\sum_i Q_{ij} = 1$
$Q_{ij} = (1-p)^{n-d(i,j)}\, p^{d(i,j)}$
p ...... error rate per digit
d(i,j) ...... Hamming distance between $I_i$ and $I_j$
$dx_j/dt = \sum_i k_i Q_{ji} x_i - x_j \Phi$ ; $\Phi = \sum_i k_i x_i$ ; $\sum_i x_i = 1$
Chemical kinetics of replication and mutation as parallel reactions
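The stationary solution of the replication-mutation kinetics is the dominant eigenvector of the matrix with elements Q_ji k_i. The sketch below finds it by power iteration for a small binary sequence space, using the uniform error model in which each digit is miscopied with probability p. The binary alphabet, the single-peak choice of rate constants (`k_master`, `k_other`) and the function names are simplifying assumptions made for illustration only.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(u != v for u, v in zip(a, b))

def quasispecies(n, p, k_master=10.0, k_other=1.0, iterations=300):
    """Stationary mutant distribution for binary sequences of length n."""
    seqs = ["".join(bits) for bits in product("01", repeat=n)]
    master = "0" * n
    k = [k_master if s == master else k_other for s in seqs]
    # q[j][i]: probability that copying template i yields sequence j
    q = [[(1 - p) ** (n - hamming(sj, si)) * p ** hamming(sj, si) for si in seqs]
         for sj in seqs]
    x = [1.0 / len(seqs)] * len(seqs)
    for _ in range(iterations):
        y = [sum(q[j][i] * k[i] * x[i] for i in range(len(seqs)))
             for j in range(len(seqs))]
        total = sum(y)                     # plays the role of the flux Phi
        x = [v / total for v in y]
    return dict(zip(seqs, x))

low_error = quasispecies(n=5, p=0.01)
random_copy = quasispecies(n=5, p=0.5)
print(round(low_error["00000"], 4), round(random_copy["00000"], 4))
```

With a small error rate the master sequence dominates the mutant cloud; at p = 0.5 copying is random and, in this toy model, every sequence is equally frequent.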
[Figure: concentration plotted over sequence space, showing the master sequence and its mutant cloud]
The molecular quasispecies in sequence space
The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. Variation is restricted to point mutations. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, which are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks. Neutral networks are represented by graphs in sequence space.
RNA secondary structures and their properties
RNA secondary structures are listings of Watson-Crick and GU wobble base pairs, which are free of knots and pseudoknots. Secondary structures are folding intermediates in the formation of full three-dimensional structures.
D.Thirumalai, N.Lee, S.A.Woodson, and D.K.Klimov. Annu.Rev.Phys.Chem. 52:751-762 (2001)
[Figure: sequence, secondary structure and symbolic notation of phenylalanyl-tRNA, each drawn from 5'-end to 3'-end with positions 10 to 70 marked]
Definition and formation of the secondary structure of phenylalanyl-tRNA
RNA minimum free energy structures
Efficient algorithms based on dynamic programming are available for the computation of secondary structures for given sequences. Inverse folding algorithms compute sequences for given secondary structures.
M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981)
Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.)
I.L.Hofacker, W.Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P.Schuster. Mh.Chem. 125:167-188 (1994)
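To illustrate the dynamic programming idea, here is a deliberately simplified sketch: it maximises the number of nested base pairs (a Nussinov-style recursion) instead of minimising free energy, so it is not the algorithm of Zuker and Stiegler and not the Vienna RNA Package. The function name `max_base_pairs` and the minimum hairpin loop size of 3 are assumptions made for this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested Watson-Crick and GU pairs in seq."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j]: maximum number of pairs within the subsequence i..j
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                    # case 1: position j is unpaired
            for k in range(i, j - min_loop):          # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0

print(max_base_pairs("GGGAAAUCCC"))
```

Because the table is filled from short spans to long ones, every subproblem is solved once, which is what makes the cubic-time recursion feasible.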
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Criterion of Minimum Free Energy
Sequence Space Shape Space
[Figure: sequence fragments differing in one or two positions, connected by edges labelled $d_H = 1$, $d_H = 1$ and $d_H = 2$]
Point mutations as moves in sequence space
[Figure: two aligned sequences that are identical except at four marked positions]
Hamming distance $d_H(S_1, S_2) = 4$
(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$
The Hamming distance induces a metric in sequence space
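The three metric properties can be checked by brute force on a small sequence space. This is a minimal sketch; the function name `hamming_distance` and the choice of all AUGC sequences of length 3 are illustrative assumptions.

```python
from itertools import product

def hamming_distance(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))

# Exhaustive check on all 64 AUGC sequences of length 3
space = ["".join(t) for t in product("AUGC", repeat=3)]
identity = all(hamming_distance(s, s) == 0 for s in space)
symmetry = all(hamming_distance(a, b) == hamming_distance(b, a)
               for a in space for b in space)
triangle = all(hamming_distance(a, c) <= hamming_distance(a, b) + hamming_distance(b, c)
               for a in space for b in space for c in space)
print(identity, symmetry, triangle)
```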
[Figure: the 32 binary sequences, labelled 0 to 31, arranged by mutant class 1 to 5]
Binary sequences are encoded by their decimal equivalents: C = 0 and G = 1, for example, "0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.
Sequence space of binary sequences of chain length n = 5
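The decimal encoding and the grouping into mutant classes can be reproduced in a few lines. In this sketch a mutant class is taken to be the set of sequences at a given Hamming distance from the reference sequence 0 (CCCCC); that reading, and the helper names `to_cg` and `mutant_classes`, are assumptions made for illustration.

```python
def to_cg(number, length=5):
    """Decode a decimal label into a CG sequence, with C = 0 and G = 1."""
    bits = format(number, f"0{length}b")
    return bits.replace("0", "C").replace("1", "G")

def mutant_classes(length=5):
    """Group all labels by their Hamming distance from the all-C reference."""
    classes = {}
    for number in range(2 ** length):
        distance = bin(number).count("1")   # number of G digits
        classes.setdefault(distance, []).append(number)
    return classes

classes = mutant_classes()
print(to_cg(14), to_cg(29))
print({d: len(members) for d, members in classes.items()})
```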
Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. This number, $N = 4^{\ell}$, becomes very large with increasing length and is prohibitive for numerical computations. Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
Random graph approach to neutral networks: sketch of sequence space (animation steps 00, 01, 02, 03, 04, 05, 10, 15, 25, 50, 75 and 100)
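The random graph construction can be sketched for a small binary sequence space: nodes are drawn at random, two nodes are joined when they differ in a single digit, and the connected components are then listed. The binary alphabet, the integer encoding of sequences and the function names are assumptions chosen to keep the example short.

```python
import random

def random_neutral_network(length, size, rng):
    """Insert `size` distinct random nodes (binary sequences as integers)."""
    return set(rng.sample(range(2 ** length), size))

def components(nodes, length):
    """Connected components; nodes are adjacent at Hamming distance 1."""
    unseen = set(nodes)
    result = []
    while unseen:
        start = unseen.pop()
        comp, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for bit in range(length):
                neighbour = node ^ (1 << bit)      # flip one digit
                if neighbour in unseen:
                    unseen.remove(neighbour)
                    comp.add(neighbour)
                    stack.append(neighbour)
        result.append(comp)
    return sorted(result, key=len, reverse=True)

rng = random.Random(1)
network = random_neutral_network(length=8, size=60, rng=rng)
sizes = [len(c) for c in components(network, 8)]
print(sizes)   # component sizes, largest first
```

Raising `size` relative to the 256 available sequences makes a single large component increasingly likely, which is the qualitative picture behind a connectivity threshold.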
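$G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$
$\lambda_j = 12/27$ , $\bar{\lambda}_k = \frac{\sum_{j \in G_k} \lambda_j(k)}{|G_k|}$
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$
Alphabet size $\kappa$: AUGC gives $\kappa = 4$
$\bar{\lambda}_k > \lambda_{cr}$ .... network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$ .... network $G_k$ is not connected
$\kappa$    $\lambda_{cr}$
2    0.5
3    0.4226
4    0.3700
Mean degree of neutrality and connectivity of neutral networks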
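The tabulated threshold values follow directly from λ_cr = 1 − κ^(−1/(κ−1)). A short sketch reproduces them; the function names are illustrative assumptions.

```python
def connectivity_threshold(kappa):
    """Critical mean degree of neutrality for an alphabet of size kappa."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))

def is_connected(mean_neutrality, kappa):
    """Random graph prediction: connected if the mean neutrality exceeds the threshold."""
    return mean_neutrality > connectivity_threshold(kappa)

for kappa in (2, 3, 4):
    print(kappa, round(connectivity_threshold(kappa), 4))
```

For example, a mean degree of neutrality of 12/27 ≈ 0.444 would lie above the AUGC threshold of 0.3700 but below the binary threshold of 0.5.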
Giant Component
A multi-component neutral network
A connected neutral network
Optimization of RNA molecules in silico
W.Fontana, P.Schuster, A computer model of evolutionary optimization. Biophysical Chemistry 26 (1987), 123-147
W.Fontana, W.Schnabl, P.Schuster, Physical aspects of evolutionary optimization and adaptation. Phys.Rev.A 40 (1989), 3301-3321
M.A.Huynen, W.Fontana, P.F.Stadler, Smoothness within ruggedness. The role of neutrality in adaptation. Proc.Natl.Acad.Sci.USA 93 (1996), 397-401
W.Fontana, P.Schuster, Continuity in evolution. On the nature of transitions. Science 280 (1998), 1451-1455
W.Fontana, P.Schuster, Shaping space. The possible and the attainable in RNA genotype-phenotype mapping. J.Theor.Biol. 194 (1998), 491-515
[Figure: concentration plotted over sequence space, showing the master sequence, the mutant cloud and “off-the-cloud” mutations]
The molecular quasispecies in sequence space
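[Diagram: cycle of mutation (matrix Q with elements $Q_{\ell j}$), genotype-phenotype mapping $S_\ell = \psi(I_\ell)$ and evaluation of the phenotype $f_\ell = f(S_\ell)$; genotypes $I_1, I_2, I_3, I_4, I_5, \dots, I_n$ with fitness values $f_1, f_2, f_3, f_4, f_5, \dots, f_n$ are extended by a new genotype $I_\ell$ to $I_1, \dots, I_{n+1}$ with $f_1, \dots, f_{n+1}$]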
Evolutionary dynamics including molecular phenotypes
[Figure: flow reactor with stock solution and reaction mixture]
Fitness function for optimization in the flow reactor: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$ ; $\Delta d_S^{(k)} = d_S(I_k, I_\tau)$
The flow reactor as a device for studies of evolution in vitro and in silico
[Plot: in silico optimization in the flow reactor, evolutionary trajectory; average structure distance to target $d_S$ (10 to 50) versus time in arbitrary units (250 to 1250)]
Mean Hamming distance Position in sequence space Mean distance to target
Variation of the population in genotype space during optimization of phenotypes
[Figure: snapshots at t = 1.200, 1.360, 1.600, 4.000, 6.400, 6.640, 6.696 and 6.840; time t in $10^6$ replications]
Spreading of a population during neutral evolution on a fitness plateau
Initial structure (relay step 00) and final conformation (relay step 44) of the optimization process
[Plot, repeated on each of the following slides: evolutionary trajectory, average structure distance to target $d_S$ versus time, with relay steps 36 to 44 marked]
Reconstruction of the last step 43 → 44
Reconstruction of last-but-one step 42 → 43 (→ 44)
Reconstruction of step 41 → 42 (→ 43 → 44)
Reconstruction of step 40 → 41 (→ 42 → 43 → 44)
Reconstruction of the relay series: evolutionary process and reconstruction for relay steps 44, 43, 42, 41, 40 and 39
Change in RNA sequences during the final five relay steps 39 → 44 (transition inducing point mutations and neutral point mutations)
[Plot: in silico optimization in the flow reactor, trajectory and relay steps; average structure distance to target $d_S$ versus time (arbitrary units)]
[Plot: in silico optimization in the flow reactor, uninterrupted presence; same axes, showing the evolutionary trajectory, uninterrupted presence and relay steps]
[Plot: enlargement around relay steps 08 to 14, showing uninterrupted presence, evolutionary trajectory and number of relay step; labels: relay step 09, transition inducing point mutations, neutral point mutations]
Neutral genotype evolution during phenotypic stasis
„...Variations neither useful nor injurious would not be affected by natural selection, and would be left either a fluctuating element, as perhaps we see in certain polymorphic species, or would ultimately become fixed, owing to the nature of the organism and the nature of the conditions. ...“
Charles Darwin, Origin of species (1859)
[Figure: fitness plotted over genotype space; labels: start of walk, end of walk, random drift periods, adaptive periods]
Evolution in genotype space sketched as a non-descending walk in a fitness landscape
[Plot: relay steps 18, 19, 20, 21, 26, 28, 29 and 31; average structure distance to target $d_S$ versus time (arbitrary units), showing uninterrupted presence, evolutionary trajectory and number of relay step]
A random sequence of minor or continuous transitions in the relay series
Minor or continuous transitions (occur frequently on single point mutations): elongation of stacks, shortening of stacks, opening of constrained stacks (multi-loop)
[Log-log plot: frequency of occurrence ($10^{-6}$ to $10^{-1}$) versus rank (up to $10^{5}$); regions labelled common neighbors and minor transitions; inset: tRNA structure from 5'-end to 3'-end]
Probability of occurrence of different structures in the mutational neighborhood of tRNA$^{phe}$
[Plot: in silico optimization in the flow reactor, uninterrupted presence; evolutionary trajectory and relay steps, average structure distance to target $d_S$ versus time (arbitrary units)]
[Figure: structures at relay steps 36, 37 and 38; major transition leading to clover leaf]
Reconstruction of a major transition 36 → 37 (→ 38)
[Figure: evolutionary process and reconstruction for relay steps 44 to 39, followed by steps 38, 37 and 36; major transition leading to clover leaf]
Final reconstruction 36 → 44
Major or discontinuous transitions (structural innovations, occur rarely on single point mutations): shift, roll-over, flip, double flip, closing of constrained stacks (multi-loop)
[Log-log plot: frequency of occurrence ($10^{-6}$ to $10^{-1}$) versus rank (up to $10^{5}$); regions labelled rare neighbors and major transitions; inset: tRNA structure from 5'-end to 3'-end]
Probability of occurrence of different structures in the mutational neighborhood of tRNA$^{phe}$
[Plot: in silico optimization in the flow reactor, major transitions; evolutionary trajectory with relay steps and major transitions, average structure distance to target $d_S$ versus time (arbitrary units)]
[Plot: in silico optimization in the flow reactor; relay steps, major transitions, uninterrupted presence and evolutionary trajectory on the same axes]
[Diagram: phenotypes $S_1^{(j)}, S_2^{(j)}, S_3^{(j)}, \dots, S_m^{(j)}$ connected to $S_k^{(j)}$ by transition probabilities P]
Transition probabilities determining the presence of phenotype $S_k^{(j)}$ in the population
[Diagram: chain of states 1, 2, 3, ..., N-1, N with rates λ, μ and ν between neighboring states]
$\lambda(x) = \lambda x + \nu (N - x)$ ; $\mu(x) = \mu x$
[Plot: particle number X(t) versus time t, with passage times $T_{1,0}$ and $T_{0,1}$]
Calculation of transition probabilities by means of a birth-and-death process with immigration
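A birth-and-death process with immigration can be simulated step by step with a standard stochastic simulation (Gillespie-type) scheme. The sketch below assumes a birth rate λx + ν(N − x) and a death rate μx, and it additionally assumes that no birth occurs once x = N; these rate expressions, the boundary rule, the parameter values and the function name are illustrative assumptions, not a statement of the model used in the presentation.

```python
import random

def simulate_birth_death(n_max, lam, mu, nu, x0, t_end, rng):
    """Sample one trajectory (time, particle number) up to time t_end."""
    t, x = 0.0, x0
    path = [(t, x)]
    while t < t_end:
        # assumed rates: reproduction plus immigration, capped at n_max
        birth = lam * x + nu * (n_max - x) if x < n_max else 0.0
        death = mu * x
        total = birth + death
        if total == 0.0:
            break                          # absorbing state, nothing can happen
        t += rng.expovariate(total)        # exponential waiting time
        if t >= t_end:
            break
        x += 1 if rng.random() * total < birth else -1
        path.append((t, x))
    return path

path = simulate_birth_death(n_max=10, lam=0.5, mu=1.0, nu=0.2, x0=1,
                            t_end=20.0, rng=random.Random(42))
print(len(path), path[-1])
```

Transition probabilities or passage times could then be estimated by averaging over many such trajectories with different seeds.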
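[Diagram: transition probabilities P into phenotype $S_k^{(j)}$ from $S_1^{(j)}, S_2^{(j)}, S_3^{(j)}, \dots, S_m^{(j)}$, shown together with an expression for $N_{sat}^{(j)}$ involving p and l]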
[Figure: structures at relay steps 00, 09, 31 and 44]
Three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure
Stable tRNA clover leaf structures built from binary, GC-only, sequences exist. The corresponding sequences are readily found through inverse folding. Optimization by mutation and selection in the flow reactor has so far always been unsuccessful.
[Figure: tRNA clover leaf secondary structure, 5'-end to 3'-end, positions 10 to 70]
The neutral network of the tRNA clover leaf in GC sequence space is not connected, whereas the corresponding neutral network in AUGC sequence space is very close to the critical connectivity threshold, $\lambda_{cr}$. Here, both inverse folding and optimization in the flow reactor are successful.
The success of optimization depends on the connectivity of neutral networks.
Main results of computer simulations of molecular evolution
- Individual trajectories are not reproducible. The sequences of the target structures obtained and the relay series were different. Nevertheless, solutions of comparable or the same quality are almost always achieved.
- Transitions between molecular phenotypes represented by RNA structures can be classified with respect to the induced structural changes. Minor transitions of high probability of occurrence are opposed by major transitions of low probability.
- Major transitions represent the relevant structural innovations in the course of molecular evolution.
- The number of minor transitions decreases with increasing population size.
- The number of major transitions or structural innovations is approximately constant for given start and stop structures.
- Not all structures are accessible through evolution in the flow reactor.
Coworkers
Walter Fontana, Santa Fe Institute, NM Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM Peter F. Stadler, Universität Wien, AT Ivo L. Hofacker Christoph Flamm Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT Michael Kospach, Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder, Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE Walter Grüner, Stefan Kopp, Jaqueline Weber
Evolutionary design of RNA molecules
A.D.Ellington, J.W.Szostak, In vitro selection of RNA molecules that bind specific ligands. Nature 346 (1990), 818-822
C.Tuerk, L.Gold, SELEX - Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (1990), 505-510
D.P.Bartel, J.W.Szostak, Isolation of new ribozymes from a large pool of random sequences. Science 261 (1993), 1411-1418
R.D.Jenison, S.C.Gill, A.Pardi, B.Polisky, High-resolution molecular discrimination by RNA. Science 263 (1994), 1425-1429
[Diagram: selection cycle. Genetic diversity → selection → desired properties? If yes, the cycle ends; if no, amplification and diversification follow and the cycle repeats]
Selection cycle used in applied molecular evolution to design molecules with predefined properties
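The loop structure of such a selection cycle can be sketched as a toy program. Here the "desired property" is replaced by a stand-in: the Hamming distance of a sequence to a known target string, which a real experiment would not have (selection there works through an assay such as binding). The target, pool size, error rate, the rule of keeping the best quarter and the function name are all illustrative assumptions.

```python
import random

def selection_cycle(target, pool_size, error_rate, max_rounds, rng, alphabet="AUGC"):
    """Toy cycle of selection, amplification and diversification."""
    def distance(seq):
        return sum(a != b for a, b in zip(seq, target))

    # Genetic diversity: start from a random pool
    pool = ["".join(rng.choice(alphabet) for _ in target) for _ in range(pool_size)]
    history = []
    for _ in range(max_rounds):
        pool.sort(key=distance)                      # selection by the stand-in assay
        history.append(distance(pool[0]))
        if history[-1] == 0:                         # desired properties? yes: stop
            break
        survivors = pool[: max(1, pool_size // 4)]
        pool = list(survivors)                       # survivors are kept unchanged
        while len(pool) < pool_size:                 # amplification with copying errors
            parent = rng.choice(survivors)
            child = "".join(rng.choice(alphabet) if rng.random() < error_rate else base
                            for base in parent)
            pool.append(child)                       # diversification
    return min(pool, key=distance), history

best, history = selection_cycle("GGGAAAUCCC", pool_size=40, error_rate=0.1,
                                max_rounds=50, rng=random.Random(0))
print(best, history)
```

Because the survivors are carried over unchanged, the best distance recorded per round can only stay the same or decrease in this sketch.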
[Figure: chromatographic column; retention of binders, elution of binders]
The SELEX technique for the evolutionary design of aptamers
[Figure: sequence of the tobramycin binding RNA aptamer, 5'-end to 3'-end, shown as an open chain and folded into its secondary structure]
Formation of secondary structure of the tobramycin binding RNA aptamer
L.Jiang, A.K.Suri, R.Fiala, D.J.Patel, Chemistry & Biology 4:35-50 (1997)
The three-dimensional structure of the tobramycin aptamer complex
L.Jiang, A.K.Suri, R.Fiala, D.J.Patel, Chemistry & Biology 4:35-50 (1997)
[Figure: hammerhead ribozyme strands with 5' and 3' ends, OH and ppp termini, and the cleavage site marked]
The "hammerhead" ribozyme