RNA Bioinformatics Beyond the One Sequence-One Structure Paradigm



SLIDE 1
SLIDE 2

RNA Bioinformatics Beyond the One Sequence-One Structure Paradigm Peter Schuster

Institut für Theoretische Chemie, Universität Wien, Austria and The Santa Fe Institute, Santa Fe, New Mexico, USA

2008 Molecular Informatics and Bioinformatics Collegium Budapest, 27.– 29.03.2008

SLIDE 3

Web page for further information: http://www.tbi.univie.ac.at/~pks

SLIDE 4

1. Computation of RNA equilibrium structures
2. Inverse folding and neutral networks
3. Evolutionary optimization of structure
4. Suboptimal conformations and kinetic folding

SLIDE 5
  • 1. Computation of RNA equilibrium structures

2. Inverse folding and neutral networks
3. Evolutionary optimization of structure
4. Suboptimal conformations and kinetic folding

SLIDE 6

[Figure: the RNA chain as a sequence of nucleotides N1, N2, N3, N4, … linked by phosphate groups (with Na⁺ counterions), Nk ∈ {A, U, G, C}, running from the 5'-end to the 3'-end; below, a tRNA cloverleaf drawn with its sequence segments (GCGGAU, AUUCGC, UUA, AGUUGGGA, …).]

Definition of RNA structure

SLIDE 7

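A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs: an unpaired position is written as '.', and the two partners of a base pair as '(' and ')'.
Number of sequences of chain length n: N = 4^n; number of structures: N_S < 3^n (each position carries one of three symbols).
Criterion: minimum free energy (mfe)
Rules: allowed base pairs {AU, CG, GC, GU, UA, UG}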
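The notation and the pairing rules can be checked mechanically. The following Python sketch parses a dot-bracket string into its list of base pairs and tests whether a sequence can form the structure under the six allowed pairs; the function names `pair_list` and `is_compatible` are illustrative, not taken from any package.

```python
# Allowed base pairs, as listed on the slide.
ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def pair_list(structure: str) -> list:
    """Parse a dot-bracket string into a sorted list of 0-based base pairs (i, j)."""
    stack = []
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"invalid symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every base pair of the structure is one of the allowed pairs."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS for i, j in pair_list(structure))


print(pair_list("((...))"))                      # [(0, 6), (1, 5)]
print(is_compatible("GGGAAACCC", "(((...)))"))   # True
print(is_compatible("GGGAAAAAC", "(((...)))"))   # False: G-A is not an allowed pair
```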

SLIDE 8

Conventional definition of RNA secondary structures

SLIDE 9

H-type pseudoknot

SLIDE 10

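$$S_{n+1} = S_n + \sum_{j=1}^{n-1} S_{j-1} \cdot S_{n-j}, \qquad S_0 = S_1 = 1$$

Either nucleotide n+1 is unpaired (S_n structures), or it pairs with nucleotide j, which splits the chain into two independent parts of lengths j−1 and n−j.

Counting the number of structures of chain length n → n+1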

M.S. Waterman, T.F. Smith (1978) Math.Bioscience 42:257-266
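The recursion can be evaluated directly. A minimal Python sketch, assuming the initial values S_0 = S_1 = 1; the optional `min_hairpin` parameter is an illustrative generalization (not on the slide) that demands more unpaired nucleotides inside each hairpin loop:

```python
def count_structures(n_max: int, min_hairpin: int = 1) -> list:
    """Return [S_0, S_1, ..., S_n_max] from S_{n+1} = S_n + sum_j S_{j-1} * S_{n-j}.

    With min_hairpin = 1 the inner sum runs over j = 1, ..., n-1 as in the
    formula above; larger values (an assumption, not on the slide) require
    at least min_hairpin unpaired nucleotides inside every hairpin loop.
    """
    s = [1] * (n_max + 1)                       # S_0 = S_1 = 1
    for n in range(1, n_max):
        total = s[n]                            # nucleotide n+1 stays unpaired
        # nucleotide n+1 pairs with j; the loop j+1..n then has n-j >= min_hairpin positions
        for j in range(1, n - min_hairpin + 1):
            total += s[j - 1] * s[n - j]
        s[n + 1] = total
    return s


print(count_structures(8))   # [1, 1, 1, 2, 4, 8, 17, 37, 82]
```

The counts grow exponentially, but far more slowly than the 4^n sequences of the same length, so on average many sequences share one structure.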

SLIDE 11

Restrictions on physically acceptable mfe structures: hairpin loops of at least λ = 3 unpaired nucleotides and stacks of at least σ = 2 base pairs

SLIDE 12

Size restrictions of elements: (i) hairpin loop, λ ≥ 3; (ii) stack, σ ≥ 2

[Equation: coupled recursions for S_n, the number of structures of a sequence with chain length n, and the auxiliary quantities Φ and Ξ, depending on λ and σ.]

Recursion formula for the number of physically acceptable stable structures

I.L.Hofacker, P.Schuster, P.F. Stadler. 1998. Discr.Appl.Math. 89:177-207
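For short chains the effect of the size restrictions can be checked by brute force. The Python sketch below is not the recursion of Hofacker, Schuster and Stadler; it simply enumerates all structures whose hairpin loops have at least `min_hairpin` unpaired nucleotides and discards those containing a stack shorter than `min_stack` pairs, assuming a stack is a maximal run of consecutively stacked base pairs. It is exponential in n and meant only as a cross-check for small n.

```python
from functools import lru_cache
from itertools import product


def all_structures(n: int, min_hairpin: int) -> list:
    """Enumerate every secondary structure on n positions as a frozenset of 0-based pairs (i, j)."""

    @lru_cache(maxsize=None)
    def on_interval(i: int, j: int) -> tuple:
        if j < i:
            return (frozenset(),)                      # empty interval: only the open chain
        result = list(on_interval(i, j - 1))           # position j stays unpaired
        for k in range(i, j - min_hairpin):            # j pairs with k; loop k+1..j-1 has >= min_hairpin positions
            for left, inner in product(on_interval(i, k - 1), on_interval(k + 1, j - 1)):
                result.append(left | inner | {(k, j)})
        return tuple(result)

    return list(on_interval(0, n - 1))


def shortest_stack(pairs: frozenset) -> int:
    """Length of the shortest stack (maximal run of stacked pairs); 0 if there are no pairs."""
    lengths = []
    for i, j in pairs:
        if (i - 1, j + 1) not in pairs:                # (i, j) is the outermost pair of its stack
            length = 1
            while (i + length, j - length) in pairs:
                length += 1
            lengths.append(length)
    return min(lengths, default=0)


def count_acceptable(n: int, min_hairpin: int = 3, min_stack: int = 2) -> int:
    """Count structures obeying both size restrictions (the open chain always counts)."""
    return sum(1 for s in all_structures(n, min_hairpin) if not s or shortest_stack(s) >= min_stack)


print([count_acceptable(n) for n in range(4, 11)])
```

With `min_hairpin=1, min_stack=1` the enumeration reproduces the unrestricted counts S_n.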

SLIDE 13

RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA

Empirical parameters from biophysical chemistry: thermodynamics and kinetics
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function

RNA structure of minimal free energy

Sequence, structure, and design

SLIDE 14

[Figure: free energy G plotted over a discrete space of conformations S0(h), S1(h), …, S9(h); S0(h) marks the minimum of free energy, the remaining structures are suboptimal conformations.]

The minimum free energy structures on a discrete space of conformations

SLIDE 15

Elements of RNA secondary structures as used in free energy calculations

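$$\Delta G_{300} = \sum_{\text{stacks of base pairs}} g_{ij,kl} + \sum_{\text{hairpin loops}} h(n_l) + \sum_{\text{bulges}} b(n_b) + \sum_{\text{internal loops}} i(n_i) + \cdots$$

Here g_{ij,kl} is the free energy of stacking base pair (k,l) on base pair (i,j), and h, b and i are the contributions of hairpin loops, bulges and internal loops with n_l, n_b and n_i unpaired nucleotides.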
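The additive model can be illustrated by decomposing a dot-bracket structure into the loop types of the formula. In the Python sketch below every base pair (i, j) is classified by the pairs it directly encloses: none gives a hairpin loop, one pair with no unpaired nucleotides in between gives a stack, one pair with unpaired nucleotides on one side gives a bulge, on both sides an internal loop, and two or more pairs a multiloop (one of the terms hidden in '⋯'). All energy values are hypothetical placeholders, not empirical parameters, and the exterior loop is ignored.

```python
from collections import Counter

# Hypothetical toy parameters in kcal/mol, NOT empirical nearest-neighbor values.
STACK = -2.0        # g_ij,kl: one value for every stacked pair (real values depend on the pairs)
MULTILOOP = 3.4     # placeholder for one of the terms summarized by "..."


def hairpin(n: int) -> float:   # h(n_l)
    return 4.0 + 0.1 * n


def bulge(n: int) -> float:     # b(n_b)
    return 3.0 + 0.2 * n


def internal(n: int) -> float:  # i(n_i)
    return 1.0 + 0.3 * n


def partner_table(structure: str) -> list:
    """partner[p] = index paired with p, or -1 if p is unpaired."""
    partner = [-1] * len(structure)
    stack = []
    for j, c in enumerate(structure):
        if c == "(":
            stack.append(j)
        elif c == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def decompose(structure: str):
    """Classify the loop closed by every base pair and add up the toy energies."""
    partner = partner_table(structure)
    kinds = Counter()
    energy = 0.0
    for i, j in enumerate(partner):
        if j <= i:                       # unpaired, or the 3' end of a pair already visited
            continue
        inner, p = [], i + 1             # pairs directly enclosed by (i, j)
        while p < j:
            if partner[p] > p:
                inner.append((p, partner[p]))
                p = partner[p] + 1
            else:
                p += 1
        if not inner:
            kinds["hairpin"] += 1
            energy += hairpin(j - i - 1)
        elif len(inner) == 1:
            k, l = inner[0]
            left, right = k - i - 1, j - l - 1   # unpaired nucleotides on either side
            if left == right == 0:
                kinds["stack"] += 1
                energy += STACK
            elif left == 0 or right == 0:
                kinds["bulge"] += 1
                energy += bulge(left + right)
            else:
                kinds["internal"] += 1
                energy += internal(left + right)
        else:
            kinds["multiloop"] += 1
            energy += MULTILOOP
    return kinds, energy


print(decompose("((.((...))))"))
```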

SLIDE 16

Maximum matching

An example of a dynamic programming computation of the maximum number of base pairs. Back tracking yields the structure(s).

[Figure: if position j+1 pairs with position k, the interval [i, j+1] splits into the independent subintervals [i, k−1] and [k+1, j].]

$$X_{i,j+1} = \max\Big\{ X_{i,j},\ \max_{i \le k \le j-1} \big( X_{i,k-1} + (1 + X_{k+1,j})\,\rho_{k,j+1} \big) \Big\}$$

where X_{i,j} is the maximum number of base pairs in the subsequence from position i to position j, and ρ_{k,l} = 1 if the nucleotides at positions k and l can form a base pair and 0 otherwise.

[Table: the matrix X_{i,j} for the sequence GGCGCGCCCGGCGCC (n = 15); entries with j ≤ i + 1 are marked '*' (no base pair possible); the first row reads X_{1,3}, …, X_{1,15} = 1 1 1 1 2 3 3 3 4 4 5 6 6, so the maximum number of base pairs is X_{1,15} = 6.]

Minimum free energy computations are based on empirical energies
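The recursion and the back tracking can be sketched in Python as follows (Nussinov-style maximum matching). Assumptions: 0-based indices, so `X[i][j]` corresponds to X_{i+1,j+1} on the slide; only G·C pairs count, since the example sequence contains no U; and a hairpin loop needs at least one unpaired nucleotide, so entries with j − i ≤ 1 stay 0 (the '*' entries).

```python
def max_matching(seq: str, pairs=frozenset({"GC", "CG"}), min_loop: int = 1) -> list:
    """X[i][j] = maximum number of base pairs in seq[i..j] (0-based, inclusive)."""
    n = len(seq)
    X = [[0] * n for _ in range(n)]                    # spans j - i <= min_loop stay 0 ('*' on the slide)
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = X[i][j - 1]                         # case 1: position j unpaired
            for k in range(i, j - min_loop):           # case 2: j pairs with k, hairpin loop >= min_loop
                if seq[k] + seq[j] in pairs:           # rho_{k,j} = 1
                    left = X[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + X[k + 1][j - 1])
            X[i][j] = best
    return X


def traceback(seq: str, X: list, pairs=frozenset({"GC", "CG"}), min_loop: int = 1) -> str:
    """Recover one optimal structure in dot-bracket notation by back tracking through X."""
    out = ["."] * len(seq)
    todo = [(0, len(seq) - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if X[i][j] == X[i][j - 1]:                     # j can be left unpaired
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = X[i][k - 1] if k > i else 0
            if seq[k] + seq[j] in pairs and X[i][j] == left + 1 + X[k + 1][j - 1]:
                out[k], out[j] = "(", ")"
                todo += [(i, k - 1), (k + 1, j - 1)]
                break
    return "".join(out)


seq = "GGCGCGCCCGGCGCC"
X = max_matching(seq)
print(X[0][14], traceback(seq, X))   # X[0][14] corresponds to X_{1,15} on the slide
```

When several structures reach the maximum, this back tracking returns only one of them; listing all of them requires exploring every branch that attains the optimum.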

SLIDE 17

1. Computation of RNA equilibrium structures

  • 2. Inverse folding and neutral networks

3. Evolutionary optimization of structure
4. Suboptimal conformations and kinetic folding

SLIDE 18

RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA

RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Inverse folding algorithm: iterative determination of a sequence for the given secondary structure

RNA structure of minimal free energy

Inverse folding of RNA: Biotechnology, design of biomolecules with predefined structures and functions

Sequence, structure, and design

SLIDE 19

Compatibility of sequences and structures

SLIDE 20

Compatibility of sequences and structures

SLIDE 21

Inverse folding algorithm

I0, I1, I2, I3, I4, …, Ik, Ik+1, …, It
S0, S1, S2, S3, S4, …, Sk, Sk+1, …, St

$$I_{k+1} = M_k(I_k) \quad \text{and} \quad \Delta d_S(S_k, S_{k+1}) = d_S(S_{k+1}, S_t) - d_S(S_k, S_t) < 0$$

M … base or base pair mutation operator
d_S(S_i, S_j) … distance between the two structures S_i and S_j
'Unsuccessful trial' … termination after n steps
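The adaptive walk can be sketched in Python. To keep the sketch self-contained, the folding map is replaced by a maximum base-pair matching (the hypothetical `fold`, NOT minimum free energy folding), d_S is taken as the Hamming distance between dot-bracket strings, and 'unsuccessful trial' is interpreted as a limit (`max_failures`) on consecutive mutations that do not bring the structure closer to the target. Mutations follow the slide: base mutations at unpaired positions and base pair mutations (one allowed pair replaced by another) at paired positions, so the sequence stays compatible with the target.

```python
import random

PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def fold(seq: str, min_loop: int = 3) -> str:
    """Stand-in folding map (hypothetical): maximum base-pair matching, NOT mfe folding."""
    n = len(seq)
    X = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            X[i][j] = X[i][j - 1]
            for k in range(i, j - min_loop):
                if seq[k] + seq[j] in PAIRS:
                    left = X[i][k - 1] if k > i else 0
                    X[i][j] = max(X[i][j], left + 1 + X[k + 1][j - 1])
    out, todo = ["."] * n, [(0, n - 1)]
    while todo:                                        # back tracking
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if X[i][j] == X[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = X[i][k - 1] if k > i else 0
            if seq[k] + seq[j] in PAIRS and X[i][j] == left + 1 + X[k + 1][j - 1]:
                out[k], out[j] = "(", ")"
                todo += [(i, k - 1), (k + 1, j - 1)]
                break
    return "".join(out)


def hamming(a: str, b: str) -> int:
    """d_S: Hamming distance between two dot-bracket strings."""
    return sum(x != y for x, y in zip(a, b))


def inverse_fold(target: str, max_failures: int = 200, seed: int = 1) -> tuple:
    """Adaptive walk: accept a mutation only if d_S to the target strictly decreases."""
    rng = random.Random(seed)
    partner, stack = {}, []
    for j, c in enumerate(target):
        if c == "(":
            stack.append(j)
        elif c == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    seq = [rng.choice("ACGU") for _ in target]         # random start I_0 ...
    for i, j in partner.items():
        if i < j:                                      # ... made compatible with the target
            seq[i], seq[j] = rng.choice(sorted(PAIRS))
    dist, failures = hamming(fold("".join(seq)), target), 0
    while dist > 0 and failures < max_failures:        # stop after max_failures unsuccessful trials in a row
        trial = list(seq)
        pos = rng.randrange(len(seq))
        if pos in partner:                             # base pair mutation keeps the pair compatible
            i, j = sorted((pos, partner[pos]))
            trial[i], trial[j] = rng.choice(sorted(PAIRS))
        else:                                          # base mutation at an unpaired position
            trial[pos] = rng.choice([b for b in "ACGU" if b != seq[pos]])
        new_dist = hamming(fold("".join(trial)), target)
        if new_dist < dist:
            seq, dist, failures = trial, new_dist, 0
        else:
            failures += 1
    return "".join(seq), dist


print(inverse_fold("((((...))))"))
```

A returned distance of 0 means the walk found a sequence whose (toy) folding matches the target; a positive value corresponds to an unsuccessful trial.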

SLIDE 22

Approach to the target structure Sk in the inverse folding algorithm

SLIDE 23

The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.

SLIDE 24

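A mapping and its inversion

Space of genotypes: $\mathcal{I} = \{I_1, I_2, I_3, I_4, \ldots, I_N\}$; Hamming metric
Space of phenotypes: $\mathcal{S} = \{S_1, S_2, S_3, S_4, \ldots, S_M\}$; metric (not required)

$$S_k = \Psi(I_j), \qquad G_k = \Psi^{-1}(S_k) = \bigcup_j \{\, I_j \mid \Psi(I_j) = S_k \,\}$$

G_k is the neutral network of S_k: the set of all genotypes that the mapping Ψ sends to S_k.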
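Neutral networks G_k can be computed by brute force for very short chains. Because a real folding algorithm would not fit in a short sketch, the Python code below uses a hypothetical toy map `toy_fold` as Ψ (it only closes pairs (i, n−1−i) from the outside inwards); the functions accept any other map passed as `fold`. `neutral_network` enumerates all 4^n genotypes, and `neutral_neighbors` lists the one-point mutants (Hamming distance 1) whose structure is unchanged.

```python
from itertools import product

PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def toy_fold(seq: str, min_loop: int = 3) -> str:
    """Hypothetical toy genotype-phenotype map, NOT an energy-based folding:
    close pairs (i, n-1-i) from the outside inwards until the first
    non-pairing position or until the hairpin loop would drop below min_loop."""
    n = len(seq)
    out = ["."] * n
    i = 0
    while (n - 1 - i) - i - 1 >= min_loop and seq[i] + seq[n - 1 - i] in PAIRS:
        out[i], out[n - 1 - i] = "(", ")"
        i += 1
    return "".join(out)


def neutral_network(structure: str, fold=toy_fold) -> list:
    """G_k: all sequences of the same length that the map sends to the given structure."""
    return ["".join(s) for s in product("ACGU", repeat=len(structure))
            if fold("".join(s)) == structure]


def neutral_neighbors(seq: str, fold=toy_fold) -> list:
    """One-point mutants (Hamming distance 1) that keep the structure of seq."""
    target = fold(seq)
    return [seq[:i] + b + seq[i + 1:]
            for i in range(len(seq)) for b in "ACGU"
            if b != seq[i] and fold(seq[:i] + b + seq[i + 1:]) == target]


print(len(neutral_network("((...))")))       # 2304 = 6 * 6 * 4**3
print(len(neutral_neighbors("GCAAAGC")))     # 11 of the 21 one-point mutants
```

For this toy map, 11 of the 21 one-point mutants of GCAAAGC are neutral: the 9 loop mutations plus one substitution in each pair that creates a G·U or U·G pair.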

SLIDE 25
SLIDE 26

1. Computation of RNA equilibrium structures
2. Inverse folding and neutral networks

  • 3. Evolutionary optimization of structure

4. Suboptimal conformations and kinetic folding

SLIDE 27

Phenylalanyl-tRNA as target structure; structure of a randomly chosen initial sequence

SLIDE 28

Evolution in silico

W. Fontana, P. Schuster, Science 280 (1998), 1451-1455

SLIDE 29

Evolution of RNA molecules as a Markov process and its analysis by means of the relay series

SLIDE 30
SLIDE 31
SLIDE 32
SLIDE 33
SLIDE 34
SLIDE 35
SLIDE 36
SLIDE 37
SLIDE 38
SLIDE 39
SLIDE 40
SLIDE 41

SLIDE 42

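Replication rate constant: $f_k = \gamma / [\alpha + d_S^{(k)}]$ with $d_S^{(k)} = d_H(S_k, S)$, the Hamming distance between the structure $S_k$ and the target structure $S$ (γ and α are positive constants)
Selection constraint: the population size, N = # RNA molecules, is controlled by the flow: $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
Mutation rate: p = 0.001 per site and replication

The flow reactor as a device for studies of evolution in vitro and in silico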
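The two ingredients of the flow reactor model can be written down in a few lines of Python. This is a sketch under assumptions: γ and α get placeholder values (the slide does not state them), d_S^(k) is computed as the Hamming distance between dot-bracket strings, and the probability of an error-free copy of n sites is taken as (1 − p)^n, i.e. independent errors at every site.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equally long dot-bracket strings."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return sum(x != y for x, y in zip(a, b))


def replication_rate(structure: str, target: str, gamma: float = 1.0, alpha: float = 0.1) -> float:
    """f_k = gamma / (alpha + d_S^(k)); gamma and alpha are hypothetical placeholder values."""
    return gamma / (alpha + hamming(structure, target))


def error_free_copy(n: int, p: float = 0.001) -> float:
    """Probability that all n sites are copied correctly at error rate p per site."""
    return (1.0 - p) ** n


print(replication_rate("((...))", "((...))"))   # 10.0: the target structure replicates fastest
print(replication_rate("(.....)", "((...))"))   # about 0.48 (distance 2)
print(error_free_copy(100))                     # about 0.90 at p = 0.001
```

The rate falls off hyperbolically with the distance to the target, so structures closer to the target are favored by selection, while α keeps the rate finite when the target is reached.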

SLIDE 43

In silico optimization in the flow reactor: Evolutionary Trajectory

SLIDE 44

28 neutral point mutations during a long quasi-stationary epoch
Transition-inducing point mutations change the molecular structure
Neutral point mutations leave the molecular structure unchanged

Neutral genotype evolution during phenotypic stasis

SLIDE 45
SLIDE 46
SLIDE 47

A sketch of optimization on neutral networks

SLIDE 48

Randomly chosen initial structure; phenylalanyl-tRNA as target structure

SLIDE 49

Application of molecular evolution to problems in biotechnology

SLIDE 50

1. Computation of RNA equilibrium structures
2. Inverse folding and neutral networks
3. Evolutionary optimization of structure

  • 4. Suboptimal conformations and kinetic folding
SLIDE 51

RNA secondary structures derived from a single sequence

SLIDE 52

An algorithm for the computation of all suboptimal structures of RNA molecules using the same concept for retrieval as applied in the sequence alignment algorithm by M.S. Waterman and T.F. Smith. Math.Biosci. 42:257-266, 1978.

SLIDE 53

An algorithm for the computation of RNA folding kinetics

SLIDE 54

The Folding Algorithm

A sequence I specifies an energy ordered set of compatible structures S(I):

S(I) = {S0 , S1 , … , Sm , O}

A trajectory Tk(I) is a time ordered series of structures in S(I). A folding trajectory is defined by starting with the open chain O and ending with the global minimum free energy structure S0 or a metastable structure Sk which represents a local energy minimum:

$$T_0(I) = \{O, S^{(1)}, \ldots, S^{(t-1)}, S^{(t)}, S^{(t+1)}, \ldots, S_0\}$$
$$T_k(I) = \{O, S^{(1)}, \ldots, S^{(t-1)}, S^{(t)}, S^{(t+1)}, \ldots, S_k\}$$

Kinetic equation

$$\frac{dP_k}{dt} = \sum_{i} \big( P_i(t)\,k_{ik} - P_k(t)\,k_{ki} \big) = \sum_{i} P_i(t)\,k_{ik} - P_k(t) \sum_{i} k_{ki}$$

where the sums run over all structures in S(I).

Transition rate parameters P_ij(t) are defined by

$$P_{ij}(t) = P_i(t)\,k_{ij} = P_i(t)\,\frac{\exp(-\Delta G_{ij}/2RT)}{\Sigma_i}, \qquad P_{ji}(t) = P_j(t)\,k_{ji} = P_j(t)\,\frac{\exp(-\Delta G_{ji}/2RT)}{\Sigma_j}$$

with $\Sigma_i = \sum_{k \neq i} \exp(-\Delta G_{ki}/2RT)$.

The symmetric rule for transition rate parameters is due to Kawasaki (K. Kawasaki, Diffusion constants near the critical point for time dependent Ising models. Phys.Rev. 145:224-230, 1966).

Formulation of kinetic RNA folding as a stochastic process and by reaction kinetics
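The kinetic equation can be integrated numerically for a small landscape. The Python sketch below rests on stated assumptions: ΔG_ij is taken as G_j − G_i (no explicit barrier), the rates are the unnormalized Kawasaki rates k_ij = exp(−ΔG_ij/2RT), leaving out the normalization by Σ_i, transitions are allowed only between structures listed as neighbors, and time advances in simple forward-Euler steps. Because these rates satisfy detailed balance, the long-time probabilities approach the Boltzmann weights exp(−G_k/RT)/Z.

```python
import math


def kawasaki_rate(g_from: float, g_to: float, rt: float) -> float:
    """Kawasaki rule k = exp(-(G_to - G_from) / 2RT); assumes Delta G_ij = G_j - G_i (no explicit barrier)."""
    return math.exp(-(g_to - g_from) / (2.0 * rt))


def integrate_master_equation(energies, moves, p0, rt=0.6, dt=0.01, t_end=50.0):
    """Forward-Euler integration of dP_k/dt = sum_i (P_i k_ik - P_k k_ki).

    energies: free energies G_k (kcal/mol), one per structure
    moves:    unordered pairs (i, k) of structures connected by a single move
    p0:       initial probabilities, e.g. all weight on the open chain
    rt:       RT in kcal/mol (0.6 is a rough, assumed value)
    """
    n = len(energies)
    rates = [[0.0] * n for _ in range(n)]             # rates[i][k] = k_ik, zero unless (i, k) is a move
    for i, k in moves:
        rates[i][k] = kawasaki_rate(energies[i], energies[k], rt)
        rates[k][i] = kawasaki_rate(energies[k], energies[i], rt)
    p = list(p0)
    for _ in range(int(round(t_end / dt))):
        dp = [sum(p[i] * rates[i][k] - p[k] * rates[k][i] for i in range(n)) for k in range(n)]
        p = [pk + dt * dpk for pk, dpk in zip(p, dp)]
    return p


# Hypothetical three-state landscape: open chain O (0.0), S1 (-1.0), S0 (-2.0)
print(integrate_master_equation([0.0, -1.0, -2.0], {(0, 1), (1, 2)}, [1.0, 0.0, 0.0]))
```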

SLIDE 55

[Figure: free energy G over conformation space with the structures Sh, S1(h), S2(h), S5(h), S6(h), S7(h) and S9(h); a local minimum and the suboptimal conformations around it are marked.]

Search for local minima in conformation space

SLIDE 56

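[Figure: left, free energy G along a "reaction coordinate" connecting two local minima Sk and Sℓ through the saddle point Tkℓ; right, the same situation drawn as a "barrier tree", in which Sk and Sℓ are leaves joined at the height of the saddle point.]

Definition of a 'barrier tree'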
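A barrier tree can be built by "flooding" the landscape from the bottom. The Python sketch below assumes the landscape is given explicitly as a list of energies and a symmetric neighbor relation (for RNA, for example, structures that differ by one base pair). States are added in order of increasing energy: a state without lower neighbors starts a new basin (a local minimum), and a state that touches two basins is a saddle point at which the basin with the higher minimum merges into the lower one. Each merge is one internal node of the barrier tree. It is a generic union-find sketch, not the implementation of any particular program.

```python
def barrier_tree(energies, neighbors):
    """Flooding sketch.

    energies:  list of free energies, one per state
    neighbors: dict state -> list of adjacent states (assumed symmetric)
    Returns (minima, merges); each merge is (higher_min, lower_min, saddle, saddle_energy),
    so the barrier of higher_min is saddle_energy - energies[higher_min].
    """
    order = sorted(range(len(energies)), key=lambda s: energies[s])
    parent = {}      # union-find forest over the states flooded so far
    lowest = {}      # root -> lowest local minimum of that basin

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]          # path halving
            s = parent[s]
        return s

    minima, merges = [], []
    for s in order:
        roots = {find(t) for t in neighbors.get(s, []) if t in parent}
        if not roots:                              # no lower neighbor: new local minimum
            parent[s] = s
            lowest[s] = s
            minima.append(s)
            continue
        basins = sorted(roots, key=lambda r: energies[lowest[r]])
        keep = basins[0]                           # basin with the deepest minimum survives
        for r in basins[1:]:                       # s is a saddle point joining basins
            merges.append((lowest[r], lowest[keep], s, energies[s]))
            parent[r] = keep
        parent[s] = keep
    return minima, merges


# Toy one-dimensional landscape: state i is adjacent to i - 1 and i + 1
energies = [2.0, 0.0, 3.0, -1.0, 1.5, 0.5, 2.5]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(energies)] for i in range(len(energies))}
print(barrier_tree(energies, neighbors))
# ([3, 1, 5], [(5, 3, 4, 1.5), (1, 3, 2, 3.0)])
```

In the toy example, minimum 5 joins the global minimum 3 over a barrier of 1.0, and minimum 1 joins over a barrier of 3.0.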

SLIDE 57

CUGCGGCUUUGGCUCUAGCC
....((((........))))  -4.30  S0
(((.(((....))).)))..  -3.50  S1
(((..((....))..)))..  -3.10
..........(((....)))  -2.80
..(((((....)))...)).  -2.20
....(((..........)))  -2.20
((..(((....)))..))..  -2.00
..((.((....))....)).  -1.60
....(((....)))......  -1.60
.....(((........))).  -1.50
.((.(((....))).))...  -1.40
....((((..(...).))))  -1.40
.((..((....))..))...  -1.00
(((.(((....)).))))..  -0.90
(((.((......)).)))..  -0.90
....((((..(....)))))  -0.80
.....((....)).......  -0.80
..(.(((....)))).....  -0.60
....(((....)).).....  -0.60
(((..(......)..)))..  -0.50
..(((((....)).)..)).  -0.50
..(.(((....))).)....  -0.40
..((.......)).......  -0.30
..........((......))  -0.30
...........((....)).  -0.30
(((.(((....)))).))..  -0.20
....(((.(.......))))  -0.20
....(((..((....)))))  -0.20
..(..((....))..)....   0.00
....................   0.00
.(..(((....)))..)...   0.10
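The listed structures can be compared directly. A short Python sketch, assuming the usual move set of inserting or deleting a single base pair: under that move set, the base pair distance (the number of pairs present in exactly one of the two structures) between S0 and S1 is the length of the shortest refolding path.

```python
PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def base_pairs(structure: str) -> set:
    """Set of 0-based base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], set()
    for j, c in enumerate(structure):
        if c == "(":
            stack.append(j)
        elif c == ")":
            pairs.add((stack.pop(), j))
    return pairs


def bp_distance(a: str, b: str) -> int:
    """Number of base pairs that occur in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


seq = "CUGCGGCUUUGGCUCUAGCC"
s0 = "....((((........))))"   # -4.30, S0
s1 = "(((.(((....))).)))..   ".strip()   # -3.50, S1
for s in (s0, s1):            # both structures are compatible with the sequence
    assert all(seq[i] + seq[j] in PAIRS for i, j in base_pairs(s))
print(bp_distance(s0, s1))    # 10
```

S0 and S1 share no base pair, so the distance is 10: refolding from S1 to S0 requires opening all six pairs of S1 and closing the four pairs of S0.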

M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.

SLIDE 58


SLIDE 59

Arrhenius kinetics
M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.

SLIDE 60

Arrhenius kinetics: exact solution of the kinetic equation
M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.

SLIDE 61

[Figure: the designed switch JN1LH and its alternative conformations, drawn from the 5'-end to the 3'-end with sequence positions 3, 13, 23, 33 and 44 marked; free energies shown: −28.6, −31.8, −28.2 and −28.6 kcal·mol⁻¹.]

Design of an RNA switch

SLIDE 62

[Figure: barrier tree of the local minima (numbered 1 to 50) on a free energy axis from −26.0 to −50.0 kcal/mole, with the barrier heights in kcal/mole labeled at the branches.]

J1LH barrier tree

J.H.A. Nagel, C. Flamm, I.L. Hofacker, K. Franke, M.H. de Smit, P. Schuster, and C.W.A. Pleij. Nucleic Acids Res. 34:3568-3576 (2006)

SLIDE 63

A ribozyme switch

E.A. Schultes, D.P. Bartel, Science 289 (2000), 448-452

SLIDE 64

Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of the hepatitis δ virus (B)

SLIDE 65

The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures

SLIDE 66

Two neutral walks through sequence space with conservation of structure and catalytic activity

SLIDE 67

Acknowledgement of support

Fonds zur Förderung der wissenschaftlichen Forschung (FWF): Projects No. 09942, 10578, 11065, 13093, 13887, and 14898
Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF): Project No. Mat05
Jubiläumsfonds der Österreichischen Nationalbank: Project No. Nat-7813
European Commission: Contracts No. 98-0189, 12835 (NEST)
Austrian Genome Research Program – GEN-AU: Bioinformatics Network (BIN)
Österreichische Akademie der Wissenschaften
Siemens AG, Austria
Universität Wien and the Santa Fe Institute

Universität Wien

SLIDE 68

Coworkers

Walter Fontana, Harvard Medical School, MA
Christian Forst, Christian Reidys, Los Alamos National Laboratory, NM
Peter Stadler, Bärbel Stadler, Universität Leipzig, GE
Jord Nagel, Kees Pleij, Universiteit Leiden, NL
Christoph Flamm, Ivo L. Hofacker, Andreas Svrček-Seiler, Universität Wien, AT
Stefan Bernhart, Jan Cupal, Lukas Endler, Kurt Grünberger, Michael Kospach, Ulrike Langhammer, Rainer Machne, Ulrike Mückstein, Hakim Tafer, Andreas Wernitznig, Stefanie Widder, Michael Wolfinger, Stefan Wuchty, Dilmurat Yusuf, Universität Wien, AT
Ulrike Göbel, Walter Grüner, Stefan Kopp, Jaqueline Weber, Institut für Molekulare Biotechnologie, Jena, GE

