RNA-RNA Interaction Prediction with Stochastic Grammars

SLIDE 1

RNA-RNA Interaction Prediction with Stochastic Grammars

Sebastian Wild Markus Nebel Anika Scheid

{s_wild,nebel}@cs.uni-kl.de

Department of Computer Science (Fachbereich Informatik)

  • 5 October 2012
  • 10th Herbstseminar der Bioinformatik (autumn seminar on bioinformatics)

Sebastian Wild RNA-RNA Interaction Prediction 2012/10/05 1 / 12

SLIDE 2

RNA secondary structure: model

  • primary structure: word over {a, c, g, u}
  • secondary structure: parentheses word over {(, |, )}, where matching parentheses mark a base pair and | marks an unpaired base

gcccugauagcguagucacuagcgagucuguauucuaagaagaucacugaggguucgcgggg
| ( ( ( ( ( | ( ( ( ( | | | ) ) ) ) | | ( ( | | ( ( ( | | | | ) ) ) ) ) | | | ( ( | | ( ( | | | | | ) ) | ) ) | | ) ) ) ) )
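A minimal Python sketch (not from the talk) of how a parentheses word is read as a set of base pairs: a stack matches each ")" with the most recent unmatched "(". The function name `base_pairs` is hypothetical, and positions are 0-based.

```python
def base_pairs(structure):
    """Return the base pairs (i, j), i < j, 0-based, encoded by a parentheses
    word over {'(', '|', ')'}; raise ValueError if it is not well formed."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)                 # wait for the matching ')'
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # innermost open '(' matches
        elif symbol != "|":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


sequence = "gcccugauagcguagucacuagcgagucuguauucuaagaagaucacugaggguucgcgggg"
# The structure from the slide, written without the separating spaces.
structure = "|(((((|((((|||))))||((||(((||||)))))|||((||((|||||))|))||)))))"
assert len(sequence) == len(structure)
pairs = base_pairs(structure)
print(len(pairs), pairs[0])  # 18 (1, 61)
```

Because every ")" closes the innermost open "(", a parentheses word can never encode crossing base pairs.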

SLIDE 3

RNA also interacts!

[Figure: joint structures of the interacting RNA pairs CopA/CopT and fhlA/OxyS]

Bacterial antisense RNAs interact non-trivially, forming “knotted” joint structures.
Goal: predict the whole interaction structure, not only the interaction sites.

SLIDE 4

RNA-RNA interaction problem (RIP)

Two RNA molecules interact, i.e. they form a joint secondary structure; the task is to predict the “best” possible one.
RIP is NP-complete in general, so the admissible structures are restricted:

1. Exclude pseudoknots (internal and external).
2. Exclude zig-zags.
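A small sketch of the first restriction for a single molecule, assuming its base pairs are given as arcs (i, j) with i < j: two arcs form a pseudoknot exactly when they cross. The helper `crossing_pairs` is hypothetical; the external case and zig-zags are not covered by this sketch.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of arcs (i, j), (k, l) with i < k < j < l.

    `pairs` are base pairs (i, j), i < j, of one molecule; a crossing pair
    of arcs is a pseudoknot, which the restricted model excludes.
    """
    crossings = []
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        # sorted() guarantees i <= k, so only this one pattern can cross
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


print(crossing_pairs([(0, 9), (2, 5)]))  # nested arcs: []
print(crossing_pairs([(0, 5), (2, 8)]))  # crossing arcs: [((0, 5), (2, 8))]
```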

SLIDE 5

Example joint secondary structure

A joint structure can be encoded as a “2D-word”: a pair of an upper and a lower word over
  • ( ) matching internal bond
  • [ ] matching external bond (between the two molecules)
  • | unpaired base

( ( | | ) ) ( ( ) ) [ [ [ [ [ ( ( ( ) ) ) ] ] ] ] ]

( ( | | ) ) [ [ [ ( ( [ [ ) )
( ] ] ] ( ( ] ] ) ) )
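A minimal Python sketch of decoding a 2D-word into its bonds. It assumes a convention the slide does not spell out: "[" occurs only in the upper word, "]" only in the lower word, and the k-th "[" is bonded to the k-th "]" (both read left to right), which is what a rule emitting "[" and "]" in one step produces. The name `decode_2d_word` is hypothetical; positions are 0-based, and the example words are the two-line rendering from the slide with spaces removed.

```python
def decode_2d_word(upper, lower):
    """Return (internal bonds of upper, internal bonds of lower, external
    bonds); an external bond (p, q) links upper[p] with lower[q]."""

    def internal(word):
        stack, bonds = [], []
        for pos, symbol in enumerate(word):
            if symbol == "(":
                stack.append(pos)
            elif symbol == ")":
                if not stack:
                    raise ValueError(f"unmatched ')' at position {pos}")
                bonds.append((stack.pop(), pos))
        if stack:
            raise ValueError(f"unmatched '(' at position {stack[-1]}")
        return sorted(bonds)

    # Assumed convention: k-th '[' (upper) pairs with k-th ']' (lower).
    opens = [pos for pos, s in enumerate(upper) if s == "["]
    closes = [pos for pos, s in enumerate(lower) if s == "]"]
    if len(opens) != len(closes):
        raise ValueError("external brackets do not match up")
    return internal(upper), internal(lower), list(zip(opens, closes))


up, low, ext = decode_2d_word("((||))[[[(([[))", "(]]]((]])))")
print(up)   # [(0, 5), (1, 4), (9, 14), (10, 13)]
print(low)  # [(0, 10), (4, 9), (5, 8)]
print(ext)  # [(6, 1), (7, 2), (8, 3), (11, 6), (12, 7)]
```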

SLIDE 6

Stochastic CFG

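Stochastic context-free grammars $G = (N, \Sigma, R, S, P)$:
  • nonterminals $N$
  • terminal alphabet $\Sigma$
  • rule set $R \subset N \times (N \cup \Sigma)^\star$
  • start nonterminal $S \in N$
  • $P : R \to [0, 1]$: rule probabilities

Probability of a derivation tree: the product of the probabilities of the rules used.

For structure prediction:
  • $\Sigma = \{\, |_b,\ (_b,\ )_b : b \in \{a, c, g, u\} \,\}$, i.e. each terminal is a structure symbol annotated with its base
  • the grammar is unambiguous w.r.t. structure: each secondary structure has exactly one derivation tree, so the most likely derivation tree yields the most likely structure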
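A minimal sketch of "probability of a derivation tree = product of the used rules' probabilities". The toy grammar, its made-up probabilities and the tree format (lhs, rhs, children) are assumptions for illustration; the base annotations of the terminals are omitted for brevity.

```python
import math

# Hypothetical toy SCFG; probabilities are made up (they sum to 1 for S).
P = {
    ("S", ("|",)): 0.2,
    ("S", ("|", "S")): 0.4,
    ("S", ("(", "S", ")")): 0.2,
    ("S", ("(", "S", ")", "S")): 0.2,
}
NONTERMINALS = {lhs for lhs, _ in P}


def tree_probability(tree):
    """Probability of a derivation tree (lhs, rhs, children): the product of
    the probabilities of all rules used in it."""
    lhs, rhs, children = tree
    return P[(lhs, rhs)] * math.prod(tree_probability(c) for c in children)


def tree_yield(tree):
    """The terminal word derived by the tree; `children` holds one subtree
    per nonterminal of rhs, in left-to-right order."""
    _, rhs, children = tree
    subtrees = iter(children)
    return "".join(tree_yield(next(subtrees)) if sym in NONTERMINALS else sym
                   for sym in rhs)


# Derivation tree of "(|)|": S -> ( S ) S, then S -> | twice.
leaf = ("S", ("|",), [])
example = ("S", ("(", "S", ")", "S"), [leaf, leaf])
print(tree_yield(example), tree_probability(example))
```

Here the result is 0.2 · 0.2 · 0.2, up to floating-point rounding.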

SLIDE 8

Structure Prediction with Formal Grammars

Grammar: terminals ACGU with structure annotations, used as a parametric stochastic model.

Training phase:
  • training data: sequences with known structure, e.g. ggcggugcca with ( ( ( | | | ) ) ) |
  • compute the parse trees of these secondary structures
  • the relative rule frequencies in the parse trees give the rule probabilities

Prediction:
  • prediction data: a sequence, e.g. auggugggugccaa
  • compute the most likely parse tree for the sequence
  • it induces the most likely structure, here | ( ( ( ( | | | | ) ) ) ) |

Both phases can be done with the same stochastic parser.
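A minimal sketch of the training step "relative rule frequencies": count how often each rule occurs in the parse trees of the training structures and normalize per left-hand side, P(A → α) = #(A → α) / #(rules with left side A). The tree format (lhs, rhs, children), the function names and the two toy training trees are hypothetical.

```python
from collections import Counter


def count_rules(tree, counts):
    """Add every rule use of the parse tree (lhs, rhs, children) to counts."""
    lhs, rhs, children = tree
    counts[(lhs, rhs)] += 1
    for child in children:
        count_rules(child, counts)


def relative_frequencies(trees):
    """Estimate P(A -> alpha) as #(A -> alpha) / #(uses of rules for A)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    per_lhs = Counter()
    for (lhs, _), c in counts.items():
        per_lhs[lhs] += c
    return {rule: c / per_lhs[rule[0]] for rule, c in counts.items()}


leaf = ("S", ("|",), [])
t1 = ("S", ("|", "S"), [leaf])       # parse tree of the structure "||"
t2 = ("S", ("(", "S", ")"), [leaf])  # parse tree of the structure "(|)"
probs = relative_frequencies([t1, t2])
print(probs)
```

Rules never seen in the training trees get no entry at all; a practical implementation would need some form of smoothing for them.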

SLIDE 10

2-dimensional CFG

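2-dimensional context-free grammars $G = (N, \Sigma, R, S, P)$:
  • nonterminals $N$
  • terminal alphabet $\Sigma$
  • rule set $R \subset N \times \bigl(N \cup (\Sigma^\star)^2\bigr)^\star$
  • start nonterminal $S \in N$
  • $P : R \to [0, 1]$: rule probabilities

Compared to SCFGs, only the rule set changes: right-hand sides contain pairs of terminal words (upper word, lower word) instead of single terminals.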

SLIDES 11-22

A Simplistic Grammar for RIP

Rules (a terminal pair $\langle x, y \rangle$ contributes $x$ to the upper and $y$ to the lower word):
  • upper molecule: $S \to \langle |, \varepsilon\rangle$, $S \to \langle |, \varepsilon\rangle\, S$, $S \to \langle (, \varepsilon\rangle\, S\, \langle ), \varepsilon\rangle$, $S \to \langle (, \varepsilon\rangle\, S\, \langle ), \varepsilon\rangle\, S$
  • lower molecule: $S \to \langle \varepsilon, |\rangle$, $S \to \langle \varepsilon, |\rangle\, S$, $S \to \langle \varepsilon, (\rangle\, S\, \langle \varepsilon, )\rangle$, $S \to \langle \varepsilon, (\rangle\, S\, \langle \varepsilon, )\rangle\, S$
  • external bonds: $S \to \langle [, ]\rangle$, $S \to \langle [, ]\rangle\, S$

Example derivation (terminals written in the order they are generated):

S ⇒ | S ⇒ | ( S ) S ⇒ | ( ( S ) ) S ⇒ | ( ( [ ] S ) ) S ⇒ | ( ( [ ] [ ] S ) ) S ⇒ | ( ( [ ] [ ] [ ] ) ) S ⇒ | ( ( [ ] [ ] [ ] ) ) | S ⇒ | ( ( [ ] [ ] [ ] ) ) | |

The | symbols belong to the upper word, the parentheses to the lower word, and each [ ] is an upper [ bonded to a lower ]. The resulting 2D-word is:

| [ [ [ | |
( ( ] ] ] ) )
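A minimal sketch that replays the example derivation, assuming the reading of the rules given on this slide: a right-hand side is a sequence of the nonterminal S and terminal pairs (upper, lower), with "" standing for ε. The rule labels such as `low_pair_S` and the function `derive` are hypothetical names.

```python
# Simplistic RIP grammar as data ("" plays the role of epsilon).
RULES = {
    "up_unpaired":    [("|", "")],
    "up_unpaired_S":  [("|", ""), "S"],
    "up_pair":        [("(", ""), "S", (")", "")],
    "up_pair_S":      [("(", ""), "S", (")", ""), "S"],
    "low_unpaired":   [("", "|")],
    "low_unpaired_S": [("", "|"), "S"],
    "low_pair":       [("", "("), "S", ("", ")")],
    "low_pair_S":     [("", "("), "S", ("", ")"), "S"],
    "external":       [("[", "]")],
    "external_S":     [("[", "]"), "S"],
}


def derive(rule_names):
    """Apply the rules as a leftmost derivation starting from S and return
    the resulting 2D-word as (upper, lower)."""
    form = ["S"]
    for name in rule_names:
        pos = form.index("S")            # leftmost nonterminal
        form[pos:pos + 1] = RULES[name]  # replace it by the right-hand side
    if "S" in form:
        raise ValueError("derivation is not finished")
    upper = "".join(up for up, _ in form)
    lower = "".join(low for _, low in form)
    return upper, lower


example = derive(["up_unpaired_S", "low_pair_S", "low_pair",
                  "external_S", "external_S", "external",
                  "up_unpaired_S", "up_unpaired"])
print(example)  # ('|[[[||', '((]]]))')
```

Each rule contributes exactly one structural element (an unpaired base, an internal bond or an external bond), so every derivation of this 2D-word uses eight rules.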

SLIDE 23

Used Grammar

[Figure: rules of the grammar used]

SLIDE 24

Implementation

  • stochastic parser, independent of the grammar
  • mixture of an Earley parser and dynamic programming
  • fast, hand-written implementation in C++

RNA pair                          n     m    runtime   memory
DIS / DIS                         35    35   2 min     300 MB
CopA / CopT                       56    57   1 h       2 GB
ompA / MicA                       137   72   2 d       18 GB
U2 and U6 snRNAs in yeast [21]    144   95   1 week    34 GB

(n, m: lengths of the two RNA sequences)

SLIDE 25

Summary

This talk:
  • 2D-CFGs give a stochastic model for RNA joint structures (only a slight extension of CFGs is needed).
  • Earley parsing can be used to train the model and to compute most likely structures.
  • An efficient implementation is available.

Open problems:
  • obtaining good training data
  • a full empirical evaluation of the prediction quality

SLIDE 26

Earley-Parsing

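Earley parsing does not need a normal form; here it is defined as a formal calculus in terms of items.

An item $[i, j, A \to \alpha \bullet \beta]$ can be derived iff
$$S \Rightarrow^* w_{1,i-1}\, A\gamma \Rightarrow w_{1,i-1}\, \alpha\beta\gamma \Rightarrow^* w_{1,j-1}\, \beta\gamma;$$
in particular, $\alpha$ derives $w_i \cdots w_{j-1}$.

[Figure: derivation tree from S to A → α • β, with α spanning w_i ⋯ w_{j−1}]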

SLIDE 27

Earley-Parser for SCFGs

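Items: $[i, j, A \to \alpha \bullet \beta]$, derivable iff
$$S' \Rightarrow^* w_{1,i-1}\, A\gamma \Rightarrow w_{1,i-1}\, \alpha\beta\gamma \Rightarrow^* w_{1,j-1}\, \beta\gamma$$

Start item: $[1, 1, S' \to \bullet S]$; goal item: $[1, n+1, S' \to S \bullet]$.

Derivation rules:
  • Scanner: $\dfrac{[i, j-1, A \to \alpha \bullet w_{j-1}\beta]}{[i, j, A \to \alpha w_{j-1} \bullet \beta]}$
  • Predictor: $\dfrac{[i, j, A \to \alpha \bullet B\gamma]}{[j, j, B \to \bullet \beta]}$ for every rule $B \to \beta$
  • Completer: $\dfrac{[i, r, A \to \alpha \bullet B\gamma] \qquad [r, j, B \to \beta \bullet]}{[i, j, A \to \alpha B \bullet \gamma]}$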
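A minimal Python sketch of the three derivation rules used as a Viterbi-style parser: each item stores the probability of its best partial derivation, so the goal item ends up holding the probability of the most likely parse tree. It assumes a toy grammar without ε-rules (made-up probabilities) and 0-based positions; it is an illustration, not the authors' C++ implementation.

```python
# Hypothetical toy SCFG without epsilon rules; probabilities are made up.
RULES = [
    ("S", ("|",), 0.2),
    ("S", ("|", "S"), 0.4),
    ("S", ("(", "S", ")"), 0.2),
    ("S", ("(", "S", ")", "S"), 0.2),
]


def earley_viterbi(word, rules, start="S"):
    """Probability of the most likely parse tree of `word` (0.0 if none).

    An item (i, r, dot) in chart[j] is the slide's [i+1, j+1, A -> alpha . beta]
    in 0-based form: the first `dot` symbols of rule r derive word[i:j].
    Assumes no epsilon rules, so completed items always have i < j.
    """
    rules = [("S'", (start,), 1.0)] + list(rules)  # rule 0: S' -> S
    nonterminals = {lhs for lhs, _, _ in rules}
    n = len(word)
    chart = [{} for _ in range(n + 1)]  # item -> best probability found
    chart[0][(0, 0, 0)] = 1.0           # start item [1, 1, S' -> . S]

    for j in range(n + 1):
        changed = True
        while changed:  # predictor and completer in column j up to a fixpoint
            changed = False
            for (i, r, dot), p in list(chart[j].items()):
                lhs, rhs, _ = rules[r]
                if dot < len(rhs) and rhs[dot] in nonterminals:
                    # predictor: [j, j, B -> . beta] for every rule of B
                    for r2, (lhs2, _, p2) in enumerate(rules):
                        if lhs2 == rhs[dot] and (j, r2, 0) not in chart[j]:
                            chart[j][(j, r2, 0)] = p2
                            changed = True
                elif dot == len(rhs):
                    # completer: advance the items in column i that wait for lhs
                    for (k, r2, dot2), q in list(chart[i].items()):
                        rhs2 = rules[r2][1]
                        if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                            key, value = (k, r2, dot2 + 1), q * p
                            if value > chart[j].get(key, 0.0):
                                chart[j][key] = value
                                changed = True
        if j < n:
            # scanner: move the dot over the terminal word[j]
            for (i, r, dot), p in chart[j].items():
                rhs = rules[r][1]
                if dot < len(rhs) and rhs[dot] == word[j]:
                    key = (i, r, dot + 1)
                    chart[j + 1][key] = max(p, chart[j + 1].get(key, 0.0))

    return chart[n].get((0, 0, 1), 0.0)  # goal item [1, n+1, S' -> S .]


for w in ["||", "(|)|", "(|"]:
    print(w, earley_viterbi(w, RULES))
```

Only the score is returned; recovering the most likely tree itself would additionally need a back-pointer per item.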

SLIDE 28

Earley-Parser for 2D-CFGs

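Items: $\left[\begin{smallmatrix} i & j \\ k & l \end{smallmatrix},\ A \to \begin{smallmatrix} \alpha_1 \bullet \beta_1 \\ \alpha_2 \bullet \beta_2 \end{smallmatrix}\right]$, derivable iff
$$S' \Rightarrow^* \begin{pmatrix} u_{1,i-1} \\ v_{1,k-1} \end{pmatrix} A\gamma \Rightarrow \begin{pmatrix} u_{1,i-1} \\ v_{1,k-1} \end{pmatrix} \alpha\beta\gamma \Rightarrow^* \begin{pmatrix} u_{1,j-1} \\ v_{1,l-1} \end{pmatrix} \beta\gamma$$
($u$: upper word, $v$: lower word; $B_1$ and $B_2$ denote the upper and lower part of a nonterminal $B$.)

Start item: $\left[\begin{smallmatrix} 1 & 1 \\ 1 & 1 \end{smallmatrix},\ S' \to \begin{smallmatrix} \bullet S_1 \\ \bullet S_2 \end{smallmatrix}\right]$; goal item: $\left[\begin{smallmatrix} 1 & n+1 \\ 1 & m+1 \end{smallmatrix},\ S' \to \begin{smallmatrix} S_1 \bullet \\ S_2 \bullet \end{smallmatrix}\right]$.

Derivation rules:
  • Scanner (upper; lower analogous):
$$\frac{\left[\begin{smallmatrix} i & j-1 \\ k & l \end{smallmatrix},\ A \to \begin{smallmatrix} \alpha_1 \bullet u_{j-1}\beta_1 \\ \alpha_2 \bullet \beta_2 \end{smallmatrix}\right]}{\left[\begin{smallmatrix} i & j \\ k & l \end{smallmatrix},\ A \to \begin{smallmatrix} \alpha_1 u_{j-1} \bullet \beta_1 \\ \alpha_2 \bullet \beta_2 \end{smallmatrix}\right]}$$
  • Predictor:
$$\frac{\left[\begin{smallmatrix} i & j \\ k & l \end{smallmatrix},\ A \to \begin{smallmatrix} \alpha_1 \bullet B_1\gamma_1 \\ \alpha_2 \bullet B_2\gamma_2 \end{smallmatrix}\right]}{\left[\begin{smallmatrix} j & j \\ l & l \end{smallmatrix},\ B \to \begin{smallmatrix} \bullet \beta_1 \\ \bullet \beta_2 \end{smallmatrix}\right]}$$
  • Completer:
$$\frac{\left[\begin{smallmatrix} i & r \\ k & s \end{smallmatrix},\ A \to \begin{smallmatrix} \alpha_1 \bullet B_1\gamma_1 \\ \alpha_2 \bullet B_2\gamma_2 \end{smallmatrix}\right] \qquad \left[\begin{smallmatrix} r & j \\ s & l \end{smallmatrix},\ B \to \begin{smallmatrix} \beta_1 \bullet \\ \beta_2 \bullet \end{smallmatrix}\right]}{\left[\begin{smallmatrix} i & j \\ k & l \end{smallmatrix},\ A \to \begin{smallmatrix} \alpha_1 B_1 \bullet \gamma_1 \\ \alpha_2 B_2 \bullet \gamma_2 \end{smallmatrix}\right]}$$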
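A minimal Python sketch of the 2D items as a Viterbi-style parser for the simplistic RIP grammar, with a made-up uniform probability of 0.1 per rule. It swaps in one simplification: instead of separate upper and lower scanners, a terminal pair is read in both words in one step, which fits rules whose right-hand sides are sequences of S and terminal pairs. An item (i, k, r, dot) stored at position (j, l) corresponds to the slide's item with upper span i+1..j+1 and lower span k+1..l+1 (0-based here). The names `earley_2d`, `RHS` and `RULES` are hypothetical.

```python
# Simplistic RIP grammar: right-hand sides mix "S" and terminal pairs
# (upper, lower); "" stands for epsilon.  Probability 0.1 per rule is made up.
RHS = [
    [("|", "")], [("|", ""), "S"],
    [("(", ""), "S", (")", "")], [("(", ""), "S", (")", ""), "S"],
    [("", "|")], [("", "|"), "S"],
    [("", "("), "S", ("", ")")], [("", "("), "S", ("", ")"), "S"],
    [("[", "]")], [("[", "]"), "S"],
]
RULES = [("S", rhs, 0.1) for rhs in RHS]


def earley_2d(upper, lower, rules, start="S"):
    """Probability of the most likely derivation of the 2D-word (upper, lower),
    or 0.0 if there is none.  An item (i, k, r, dot) in chart[(j, l)] says:
    the first `dot` symbols of rule r derive upper[i:j] and lower[k:l].
    Every rule emits at least one terminal, so completed spans are non-empty.
    """
    rules = [("S'", [start], 1.0)] + list(rules)  # rule 0: S' -> S
    n, m = len(upper), len(lower)
    chart = {(j, l): {} for j in range(n + 1) for l in range(m + 1)}
    chart[(0, 0)][(0, 0, 0, 0)] = 1.0  # start item

    # Lexicographic order on (j, l): every position an item completes from,
    # and every position the scanner writes from, is processed earlier.
    for j in range(n + 1):
        for l in range(m + 1):
            items = chart[(j, l)]
            changed = True
            while changed:  # predictor and completer up to a fixpoint
                changed = False
                for (i, k, r, dot), p in list(items.items()):
                    lhs, rhs, _ = rules[r]
                    if dot < len(rhs) and isinstance(rhs[dot], str):
                        # predictor: start every rule of the nonterminal here
                        for r2, (lhs2, _, p2) in enumerate(rules):
                            if lhs2 == rhs[dot] and (j, l, r2, 0) not in items:
                                items[(j, l, r2, 0)] = p2
                                changed = True
                    elif dot == len(rhs):
                        # completer: advance items at (i, k) waiting for lhs
                        for (i2, k2, r2, d2), q in list(chart[(i, k)].items()):
                            rhs2 = rules[r2][1]
                            if d2 < len(rhs2) and rhs2[d2] == lhs:
                                key, value = (i2, k2, r2, d2 + 1), q * p
                                if value > items.get(key, 0.0):
                                    items[key] = value
                                    changed = True
            # scanner: read a terminal pair in both words at once
            for (i, k, r, dot), p in items.items():
                rhs = rules[r][1]
                if dot < len(rhs) and isinstance(rhs[dot], tuple):
                    up, low = rhs[dot]
                    if upper.startswith(up, j) and lower.startswith(low, l):
                        target = chart[(j + len(up), l + len(low))]
                        key = (i, k, r, dot + 1)
                        target[key] = max(p, target.get(key, 0.0))

    return chart[(n, m)].get((0, 0, 0, 1), 0.0)  # goal item S' -> S .


print(earley_2d("|[[[||", "((]]]))", RULES))
```

For the example 2D-word every derivation uses eight rules, so the sketch returns about 0.1⁸, although the grammar is ambiguous (upper and lower rules can be interleaved in different orders).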

SLIDE 29

Implementation Note: Dense Grammars

Number of items: $\Theta\!\left(\tfrac{1}{4} n^2 m^2\right)$ for $n$ and $m$ the lengths of the two RNA sequences (per dotted rule, $i \le j$ allows about $n^2/2$ and $k \le l$ about $m^2/2$ index combinations).

Structure prediction grammars are dense: most items are derivable, so the values of all items are computed à la dynamic programming.
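A back-of-the-envelope check that relates the item count to the memory column of the implementation table: divide the measured memory by the leading term n²m²/4. The reading of MB and GB as 10⁶ and 10⁹ bytes is an assumption, and `bytes_per_item` is a hypothetical helper.

```python
# (RNA pair, n, m, memory in bytes) from the implementation table.
# Assumption: MB and GB are read as 10**6 and 10**9 bytes.
RUNS = [
    ("DIS/DIS", 35, 35, 300e6),
    ("CopA/CopT", 56, 57, 2e9),
    ("ompA/MicA", 137, 72, 18e9),
    ("U2/U6", 144, 95, 34e9),
]


def bytes_per_item(n, m, memory):
    """Measured memory divided by the leading item count n^2 m^2 / 4."""
    return memory / (n**2 * m**2 / 4)


for name, n, m, memory in RUNS:
    print(f"{name:10s} {n**2 * m**2 / 4:14,.0f} items  "
          f"{bytes_per_item(n, m, memory):5.0f} bytes/item")
```

The ratio stays between about 725 and 800 bytes per item, i.e. the measured memory grows roughly in proportion to the number of items. This is a reading of four data points, not a result stated in the talk.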