Modeling and studying RNA secondary structure, Eugène Asarin, LIAFA



slide-1
SLIDE 1

Modeling and studying RNA secondary structure

Eugène Asarin LIAFA, CNRS & Univ. Paris Diderot

slide-2
SLIDE 2

Credits

Co-authors, partners and teachers:

  • Vassily Lyubetsky, Alexander Seliverstov (IITP)
  • Thierry Cachat, Tayssir Touili (LIAFA)

Sponsor:

  • CNRS/RAS convention d’échanges

EVOLVER/REVERA

Special thanks to:

  • Hervé Isambert (I. Curie)
slide-3
SLIDE 3

Disclaimer

  • I am not a bioinformatician (yet)
  • I am (still) a computer scientist with a verification background
    – everything is a transition system
    – one should explore its state space in a smart way
  • This talk
    – More informatics than biology+physics
    – More models than solutions
    – More questions and speculations than answers

slide-4
SLIDE 4

Motivating example: Classical Attenuation Regulation

slide-5
SLIDE 5

Transcription and Translation

DNA A G C T G C

slide-6
SLIDE 6

Transcription and Translation

[Figure labels: Gene, DNA, Polymerase, RNA, Ribosome, Amino Acids]

Transcription: DNA to RNA, done by Polymerase
Translation: RNA to Amino Acids, done by Ribosome

slide-7
SLIDE 7

Gene Expression

Gene DNA RNA Ribosome Amino Acids

The Gene is expressed if the Polymerase reaches the Gene

slide-8
SLIDE 8

Classical Attenuation Regulation

Gene DNA RNA Ribosome

Depends on the structure of the RNA between Ribosome and Polymerase

slide-9
SLIDE 9

RNA Secondary Structure

A C A C U G G C U C A C C U U C G G G U G G G C C U U U C U C G

RNA: a sequence of nucleotides A, G, C, U with links A-U and G-C

[Figure: the sequence above folded into a Helix]
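The pairing rule on this slide (links A-U and G-C) can be sketched in a few lines of Python. The names `can_pair` and `is_helix` and the 0-based, inclusive index convention are assumptions made for illustration and do not come from the talk. Only the two link types named on the slide are considered.

```python
# Only the links named on the slide: A-U and G-C (in either direction).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(x: str, y: str) -> bool:
    """True if nucleotides x and y can form a link."""
    return (x, y) in PAIRS


def is_helix(seq: str, a: int, b: int, c: int, d: int) -> bool:
    """Check that seq[a..b] pairs with seq[c..d] read backwards.

    Indices are 0-based and inclusive (a convention chosen for this sketch).
    """
    if not (0 <= a <= b < c <= d < len(seq)) or b - a != d - c:
        return False
    return all(can_pair(seq[a + i], seq[d - i]) for i in range(b - a + 1))


# A fragment of the sequence shown on the slide: CACC | UUCG | GGUG
FRAGMENT = "CACCUUCGGGUG"
print(is_helix(FRAGMENT, 0, 3, 8, 11))  # True: C-G, A-U, C-G, C-G
```

The check is purely combinatorial: it says whether a helix is possible, not whether it is energetically favourable.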

slide-10
SLIDE 10

RNA Secondary Structure

(a simplified view)

This structure is dynamic: it changes very fast and can cause the slippage of the Polymerase

slide-11
SLIDE 11

Polymerase Slippage

Gene RNA Ribosome

T-rich region: connection of Pol and DNA weakens

slide-12
SLIDE 12

Polymerase Slippage

Gene RNA Ribosome

T-rich region: connection of Pol and DNA weakens

The Polymerase slips and the Gene is not expressed: Termination

slide-13
SLIDE 13

Polymerase Slippage

Gene RNA Ribosome

T-rich region: connection of Pol and DNA weakens

The Polymerase reaches the Gene and the Gene is expressed: Antitermination

Each of these two situations can happen with some probability.

slide-14
SLIDE 14

Regulation mechanism : causal chain

  • Concentration of a product (say trp)
  • Speed of Ribosome
  • Dynamics of secondary structure
  • Probability of Polymerase slippage
  • Gene expression
  • (production of trp)
slide-15
SLIDE 15

Wanted

  • A model of the dynamics of the RNA secondary structure, capable of predicting the probability of gene expression.
    – Should be quantitative
    – Should represent transient behaviours (steady state not enough)
    – Should be validated by biological data on regulation

slide-16
SLIDE 16

Other motivations

  • Other kinds of regulation
  • Other alternative behaviors/structures on an RNA, e.g. in ribozymes
  • Scientific curiosity
  • Interesting transfers between Bio and Info
slide-17
SLIDE 17

Models, Analyses, Tools

slide-18
SLIDE 18

Tools (2 examples)

  • Rnamodel - Lyubetsky et al.
  • Kinefold – Isambert et al.
slide-19
SLIDE 19

The approach: Markov chain

  • Features
    – The sequence is fixed.
    – States of the MC: all possible secondary structures on this sequence (or part of it)
    – Transitions: simple events (see below)
    – Rates: determined by E
  • Difficulties
    – As usual, many parameters are difficult to find
    – The Markov Chain is huge and complex
    – Only Monte-Carlo simulation is possible
    – It is still heavy and slow

slide-20
SLIDE 20

Some recipes

  • Find a succinct structured representation of MC
  • Use on-the-fly state generation
  • Use symbolic representations
  • Use abstractions
  • Use acceleration
  • Use other advanced techniques – perfect simulation etc.

  • Think
slide-21
SLIDE 21

A succinct representation

We suggest Probabilistic Rewriting Systems

slide-22
SLIDE 22

The idea?

  • Represent the RNA secondary structure by a term
  • Represent the dynamics of the secondary structure by rewriting rules

slide-23
SLIDE 23

The set of windows

RNA Ribosome Polymerase

w = (R,P)
R: position of the Ribosome in the RNA
P: position of the Polymerase in the RNA
W = {w = (R,P) s.t. 13 ≤ R ≤ P ≤ l}
l: the length of the regulatory region

slide-24
SLIDE 24

The helices

A C B D f = (A,B,C,D)

slide-25
SLIDE 25

The hypohelices

A C B D f = (A,B,C,D) g = (E,F,G,H) F E G H

f(g)

slide-26
SLIDE 26

The structure as a term

A C B D f = (A,B,C,D) w =(R,P) P R

w(f)

slide-27
SLIDE 27

The structure as a term

A C B D f = (A,B,C,D) g = (E,F,G,H) w =(R,P) P R

w(f , g)

E G F H

slide-28
SLIDE 28

The structure as a term

f j k h g R P

w = (R,P)
w(f, g(h,k), j)
The order is not important
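One possible encoding of such terms is sketched below, assuming that "the order is not important" means the sub-terms of a node form an unordered collection. The class `Term` and the helpers `term` and `show` are hypothetical names introduced for this sketch. Labels stand in for the actual window and helix coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A window or helix label together with its nested sub-helices."""
    label: str
    children: frozenset = frozenset()  # unordered: w(f, g) equals w(g, f)


def term(label, *children):
    """Build a term from a label and any number of sub-terms."""
    return Term(label, frozenset(children))


def show(t):
    """Print a term in the slide's notation, with children sorted for stability."""
    if not t.children:
        return t.label
    inner = ", ".join(sorted(show(c) for c in t.children))
    return f"{t.label}({inner})"


t1 = term("w", term("f"), term("g", term("h"), term("k")), term("j"))
t2 = term("w", term("j"), term("g", term("k"), term("h")), term("f"))
print(show(t1))   # w(f, g(h, k), j)
print(t1 == t2)   # True
```

Using `frozenset` makes terms hashable, so they can serve directly as dictionary keys for the states of the Markov chain.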

slide-29
SLIDE 29

The Dynamics as a Probabilistic Rewriting System

slide-30
SLIDE 30

Extension and Reduction of a helix

A C B D H G F E

f = (A,B,C,D) g = (E,F,G,H)

Rewriting rule: f → g

One rule for multiple contexts

slide-31
SLIDE 31

(De)-Composition of a Hypohelix

f j k h

slide-32
SLIDE 32

f j k h g

Rewriting rule: w(f, h, k, j) → w(f, g(h, k), j)
Meta Rewriting rule: w(…, h, k) → w(…, g(h, k))

(De)-Composition of a Helix

slide-33
SLIDE 33

f j k h m

(De)-Composition of a Helix

slide-34
SLIDE 34

f j k h g m

Meta Rewriting rule: m(…, h, k) → m(…, g(h, k))

(De)-Composition of a Helix

One rule for multiple contexts
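The composition rule, in which a new helix g forms around sibling helices h and k inside any context m, can be sketched as a function on terms. This is a minimal illustration, assuming terms are encoded as `(label, frozenset_of_children)` pairs. The names `compose`, `decompose` and `leaf` are hypothetical, and the geometric check that g really encloses h and k, as well as the rates, are left out.

```python
def leaf(label):
    """A helix with no nested sub-helices."""
    return (label, frozenset())


def compose(parent, new_label, enclosed):
    """Composition: the enclosed children of parent become children of a new helix."""
    label, children = parent
    enclosed = frozenset(enclosed)
    if not enclosed <= children:
        raise ValueError("enclosed terms must be direct children of the parent")
    return (label, (children - enclosed) | {(new_label, enclosed)})


def decompose(parent, helix):
    """Decomposition: remove a helix and hand its children back to the parent."""
    label, children = parent
    if helix not in children:
        raise ValueError("helix must be a direct child of the parent")
    return (label, (children - {helix}) | helix[1])


f, h, k, j = leaf("f"), leaf("h"), leaf("k"), leaf("j")
m = ("m", frozenset({f, h, k, j}))
m2 = compose(m, "g", [h, k])          # m(f, h, k, j) becomes m(f, g(h, k), j)
print(decompose(m2, ("g", frozenset({h, k}))) == m)  # True
```

Because the function only looks at the direct children of `parent`, the same code applies whether the context is a window w or a helix m, which is the point of having one rule for multiple contexts.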

slide-35
SLIDE 35

The Window Movement

RNA Ribosome Polymerase

w = (R,P)
Movement of the Ribosome: (R, P)(x) → (R+3, P)(x’)
Movement of the Polymerase: (R, P)(x) → (R, P+1)(x)
Termination: Slippage of the Polymerase: (R, P)(x)

Three rules for multiple contexts

slide-36
SLIDE 36

Rates of the rewriting Rules

  • Rates of the Markov Chain determined by E. Two terms
    – Stacking energy (easy)
    – Free energy (a bit obscure)
  • Movements of Rib and Pol – even more obscure
slide-37
SLIDE 37

Other optimisations

Implemented or not

slide-38
SLIDE 38

On-the-fly (everybody does it)

  • States of the MC are created only when visited.
  • Efficient data structures for the states are still needed
  • Efficient algorithms are needed to find all the successors of a given state (and the rates)
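On-the-fly generation can be sketched as a thin cache around a successor function: a state and its outgoing transitions are computed the first time the state is visited and never before. The class name `LazyChain` and the toy successor function `walk` are assumptions made for this sketch and are not taken from Rnamodel or Kinefold.

```python
class LazyChain:
    """Markov chain whose states are materialised only when visited."""

    def __init__(self, successors):
        self._successors = successors   # function: state -> iterable of (next_state, rate)
        self._cache = {}                # visited state -> tuple of (next_state, rate)

    def moves(self, state):
        """Return the outgoing transitions of state, computing them at most once."""
        if state not in self._cache:
            self._cache[state] = tuple(self._successors(state))
        return self._cache[state]

    def visited(self):
        """Number of states created so far."""
        return len(self._cache)


def walk(n):
    """Toy successor function on the integers (a stand-in for the rewriting rules)."""
    return [(n + 1, 1.0), (n - 1, 2.0)]


chain = LazyChain(walk)
chain.moves(0)
chain.moves(1)
chain.moves(0)            # served from the cache
print(chain.visited())    # 2
```

States must be hashable for the cache to work, which is one reason the choice of data structure for the states matters.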

slide-39
SLIDE 39

Concretely

repeat 1000 times
    s = empty window
    repeat
        find all successors s’ of s and the rates s -> s’
        s = a random s’
    until expressed or aborted
output statistics
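The loop on this slide can be written as a small runnable Monte-Carlo sketch. It assumes that "a random s’" means a successor chosen with probability proportional to its rate. The toy chain `RATES`, its numeric rates and the state names are invented for illustration and have no biological meaning.

```python
import random


def simulate(successors, start, is_final, rng):
    """One run: follow random transitions until a final state is reached."""
    s = start
    while not is_final(s):
        moves = successors(s)                       # list of (next_state, rate)
        states = [t for t, _ in moves]
        rates = [r for _, r in moves]
        s = rng.choices(states, weights=rates)[0]   # probability = rate / total rate
    return s


def estimate(successors, start, is_final, runs=1000, seed=0):
    """Repeat the run and return the observed frequency of each final state."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        end = simulate(successors, start, is_final, rng)
        counts[end] = counts.get(end, 0) + 1
    return {s: n / runs for s, n in counts.items()}


# Hypothetical toy chain standing in for the RNA model.
RATES = {
    "start": [("folded", 2.0), ("expressed", 1.0)],
    "folded": [("start", 1.0), ("aborted", 1.0)],
}
FINAL = {"expressed", "aborted"}

stats = estimate(lambda s: RATES[s], "start", lambda s: s in FINAL)
print(stats)
```

For this toy chain the exact probability of expression is 1/2, so the printed frequencies should be close to 0.5 each. The sketch does not sample holding times, since the outcome (expressed or aborted) depends only on the sequence of jumps.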
slide-40
SLIDE 40

Symbolic representation

(not used yet)

A data structure (close to a formula) to represent the current set of states or probability distribution.

  • Very successful in the verification domain
  • We tried probabilistic tree automata, they explode
  • Thinking
slide-41
SLIDE 41

Abstraction

(everybody uses it in a naive way)

  • Aim: to have fewer states
  • Idea: to group together several states
  • Rnamodel and Kinefold: only maximal helices stored.
  • We try to group some close helices: 800000->300000 states
  • Abstraction-based algorithms should be done systematically

slide-42
SLIDE 42

Abstraction (what if)

  • Biological description is very abstract (terminator, antiterminator, noise)
  • What if … a model is possible at this level?

  • To think
slide-43
SLIDE 43

Acceleration

  • Problem: many fast transitions without any progress, major events are rare
  • Partial solution: group similar states together

  • A better one : Isambert’s clustering
  • To think more
slide-44
SLIDE 44

Advanced simulation methods

  • « Perfect simulation » - nice but only for steady state
  • Other methods from Performance Evaluation – mostly for N^k

  • To read and to think
slide-45
SLIDE 45

More complex structures

slide-46
SLIDE 46

Other structures

  • Helices can be « flipped » - not a big deal

  • They can be pseudoknotted
  • Biologically relevant
slide-47
SLIDE 47

Consequences for modeling and analysis

  • What are legal configurations?
  • How to compute free energy?
  • Data structure: a set of helices, a tree with shortcuts, a colored graph?

  • Alphabet and state space much bigger…
  • Abstractions more involved.
  • Kinefold – yes, Rnamodel – in progress
slide-48
SLIDE 48

Dreams

  • To find simple abstract models
  • To analyze them w/o Monte-Carlo
  • To transfer techniques from Verification and Performance evaluation and backwards

slide-49
SLIDE 49

Questions?

slide-50
SLIDE 50

Continuous Markov Chain

  • Probability for moving from s to s’ within t time units is:

$1 - e^{-\lambda(s,s') \cdot t}$

slide-51
SLIDE 51

Continuous Markov Chain

  • Probability for moving from s to s’ within t time units is:

$\frac{\lambda(s,s')}{E(s)} \left(1 - e^{-E(s) \cdot t}\right)$, where $E(s) = \sum_{s''} \lambda(s,s'')$
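A small numeric sketch of this probability follows, assuming the usual race semantics of continuous-time Markov chains: the exit rate E(s) is the sum of the outgoing rates of s, and the probability of moving to s’ within t time units is (rate to s’ / E(s)) · (1 − e^(−E(s)·t)). The function names and the rates in the example are illustrative assumptions.

```python
import math


def exit_rate(rates):
    """E(s): sum of the rates of all transitions leaving the state."""
    return sum(rates.values())


def prob_move_within(rates, target, t):
    """Probability of taking the transition to target within t time units."""
    total = exit_rate(rates)
    return rates[target] / total * (1.0 - math.exp(-total * t))


# Hypothetical outgoing rates of one state s.
rates = {"a": 1.0, "b": 3.0}
for target in rates:
    print(target, prob_move_within(rates, target, t=0.5))
```

With a single successor the expression reduces to 1 − e^(−rate·t). With several successors, the probabilities over all targets sum to 1 − e^(−E(s)·t), the probability of leaving s at all within t.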