SLIDE 1

S.Will, 18.417, Fall 2011

Prediction of RNA-RNA-Interaction

Can Alkan, Emre Karakoc, Joseph H. Nadeau, S. Cenk Sahinalp, Kaizhong Zhang. RNA-RNA interaction prediction and antisense RNA target search. JCB 2006

  • define the RNA-RNA interaction prediction problem (RIP), with and without pseudoknots (PKs)
  • prove NP-completeness, even without PKs, for the base-pair energy model and for more complex models (reduction from “longest common subsequence of multiple binary strings”, mLCS)

SLIDE 2

Relation between PK-Prediction and RIP

  • RNAcofold: concatenate RNAs A and B, predict PK-free structure
  • specific restrictions on the structure of the interaction complex
  • Can we apply pseudoknot prediction to the concatenation? How does this differ from the Alkan algorithm?
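One way to see why the concatenation question matters: an interaction such as a kissing hairpin, which is nested within each RNA, produces crossing base pairs once A and B are written as a single sequence. The following is a minimal Python sketch, assuming base pairs are given as 0-based index pairs on the concatenation AB. The function name and all coordinates are hypothetical illustrations, not taken from the slides or from RNAcofold.

```python
def has_crossing(pairs):
    """Return True if two base pairs (i, j) and (k, l) satisfy i < k < j < l."""
    ordered = sorted((min(p), max(p)) for p in pairs)
    for a, (i, j) in enumerate(ordered):
        # Pairs later in the sorted list start at or after position i.
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False

# Hypothetical kissing-hairpin complex: A is positions 0-10, B is positions 11-25.
stem_a = [(0, 10), (1, 9)]             # hairpin stem inside A
stem_b = [(15, 25), (16, 24)]          # hairpin stem inside B
kissing = [(4, 21), (5, 20), (6, 19)]  # loop-loop pairs between A and B

print(has_crossing(stem_a + stem_b))            # False: both hairpins are nested
print(has_crossing(stem_a + stem_b + kissing))  # True: the complex crosses in AB
```

A PK-free predictor run on the concatenation can therefore never return such a complex, which is one reason to consider pseudoknot prediction on AB instead.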

SLIDE 3

Semiautomatic RNA 3D Structure Modeling

Bruce A. Shapiro, Yaroslava G. Yingling, Wojciech Kasprzak and Eckart Bindewald. Bridging the gap in RNA structure prediction. Current Opinion in Structural Biology, 2007.

SLIDE 4

An automated pipeline: MC-Fold/MC-Sym

Marc Parisien & Francois Major. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008.

SLIDE 5

Potential obstacles

  • Reliability of secondary structure prediction
    → prediction from alignments, covariance
  • Pseudoknots
    → pseudoknot prediction → covariance analysis of large multiple alignments
  • Non-canonical base pairs
    → experimental loop energies? learn from 3D structures!
  • 3D motifs (due to non-canonical base pairs)
    → learn from 3D structures, isostericity

SLIDE 6

Non-canonical Base Pairs, 3D-Motifs and Isostericity

Aurélie Lescoute, Neocles B. Leontis, Christian Massire and Eric Westhof. Recurrent structural RNA motifs, Isostericity Matrices and sequence alignments. NAR 2005.

SLIDE 7

Non-Canonical Base Pairs

Leontis, N.B. and Westhof, E. Geometric nomenclature and classification of RNA base pairs. RNA 2001

SLIDE 8

Back to MC-Fold/MC-Sym

  • NCMs: Nucleotide Cyclic Motifs from PDB (531 structures)
  • MC-Fold predicts secondary structure including non-canonical base pairs by merging NCMs

  • Probability-based scoring

Pr[structure|seq] = Pr[NCMs|seq] × Pr[junctions|NCMs] × Pr[hinges|junctions] × Pr[pairs|hinges]

  • predict sub-optimals
SLIDE 9

Prediction Performance of MC-Fold

Predicted base pairs (%)   RNAsubopt          CONTRAfold           MC-Fold
                           (Thermodynamics)   (Machine learning)   (NCM)
False positives            6.7                7.5                  17.9
False negatives            25.2               26.9                 10.1
True positives             74.8               73.1                 89.9
Canonicals                 88.4               86.3                 94.7
Non-canonicals             N/A                1.4                  62.1
Matthews                   82.8               81.4                 86.6

Matthews = sqrt( TP/(TP+FN) × TP/(TP+FP) )

1968 base pairs (1665 Watson-Crick) in 264 hairpins from 182 different PDB structures
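The Matthews row appears to be the geometric mean of sensitivity, TP/(TP+FN), and positive predictive value, TP/(TP+FP). The short Python check below is a sketch under that assumption. The function name is hypothetical, and the inputs are the rounded percentages from the table, so agreement with the reported values is only expected to within rounding.

```python
import math

def matthews_approx(tp, fn, fp):
    """Geometric mean of sensitivity and positive predictive value."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return math.sqrt(sensitivity * ppv)

# (true positives, false negatives, false positives) in percent, from the table
table = {
    "RNAsubopt":  (74.8, 25.2, 6.7),
    "CONTRAfold": (73.1, 26.9, 7.5),
    "MC-Fold":    (89.9, 10.1, 17.9),
}
reported = {"RNAsubopt": 82.8, "CONTRAfold": 81.4, "MC-Fold": 86.6}

for name, (tp, fn, fp) in table.items():
    score = 100 * matthews_approx(tp, fn, fp)
    print(f"{name}: computed {score:.2f}, reported {reported[name]}")
```

Under this reading, MC-Fold's higher score comes from its much lower false-negative rate, which outweighs its higher false-positive rate.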

SLIDE 10

MC-Sym

  • libraries of 3D-fragments for each NCM
  • solve combinatorial puzzle, satisfy steric/RMSD constraints
  • Las Vegas algorithm (no exhaustive enumeration, could fail to produce a solution)
  • run time in pipeline: 24 h
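
The slides do not spell out MC-Sym's search procedure, so the following is only a toy Python sketch of the general idea: a randomized backtracking search with a step budget. Any assembly it returns satisfies every constraint, but it may give up and return nothing. Integers stand in for 3D fragments, and the function name, the `compatible` predicate and the budget are all hypothetical.

```python
import random

def las_vegas_assemble(libraries, compatible, max_steps=10_000, seed=None):
    """Pick one fragment per position; return a valid assembly or None."""
    rng = random.Random(seed)
    steps = 0

    def extend(partial):
        nonlocal steps
        if len(partial) == len(libraries):
            return list(partial)
        candidates = list(libraries[len(partial)])
        rng.shuffle(candidates)        # random order, not a fixed enumeration
        for frag in candidates:
            steps += 1
            if steps > max_steps:      # budget exhausted: fail without an answer
                return None
            # Stand-in for the steric/RMSD checks against fragments placed so far.
            if all(compatible(prev, frag) for prev in partial):
                result = extend(partial + [frag])
                if result is not None:
                    return result
        return None

    return extend([])

# Toy fragment libraries, one per motif position.
libraries = [[0, 5, 9], [1, 6], [2, 7]]
close_enough = lambda a, b: abs(a - b) <= 2
print(las_vegas_assemble(libraries, close_enough, seed=1))
```

Which valid assembly comes back depends on the random order, and a small `max_steps` trades completeness for a bounded run time.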
SLIDE 11

Example Predictions of MC-Fold/MC-Sym

[Figure: example predictions, panels a-e, showing sequences and secondary structures of the modeled RNAs. Parisien & Major, Nature 2008]

SLIDE 12

Rfam / Infernal

  • Infernal ("Inference of RNA alignments"): scan genomic data for RNA family members
  • important tool for Rfam: Rfam 10.1 (June 2011, 1973 families), http://rfam.sanger.ac.uk/
  • in Rfam: ’hand-curated’ seed alignments ⇒ full alignments
  • use Stochastic Context-Free Grammars (SCFGs) to model RNA families
  • model of a family: Covariance Model (CM)

[Figure: input multiple alignment (human, mouse, orc) with its [structure] annotation line, and the example structure derived from it]

SLIDE 13

Infernal

Construct grammatical description

[Figure: consensus structure and the guide tree built from it. Guide-tree nodes: ROOT 1, MATL 2, MATL 3, BIF 4; first branch BEGL 5, MATP 6, MATP 7, MATR 8, MATP 9, MATL 10, MATL 11, MATL 12, MATL 13, END 14; second branch BEGR 15, MATL 16, MATP 17, MATP 18, MATL 19, MATP 20, MATL 21, MATL 22, MATL 23, END 24]

SLIDE 14

Infernal

  • Construct CM from guide tree
  • Expand nodes of guide tree: add match, insertion, and deletion states

  • learn transition and output probabilities from alignment
  • CM comparable to profile HMM for protein families (Pfam)
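
The node-to-state expansion can be sketched in a few lines of Python. The per-node state sets below are read off the state numbering shown on this slide (for example, MATP 6 becomes MP 12, ML 13, MR 14, D 15, IL 16, IR 17), and the guide tree is the 24-node example from the slides. The names `NODE_STATES`, `GUIDE_TREE` and `expand` are hypothetical, and transition and emission probabilities are not modeled.

```python
# State types created per guide-tree node type (split set first, then inserts).
NODE_STATES = {
    "ROOT": ["S", "IL", "IR"],
    "BIF":  ["B"],
    "BEGL": ["S"],
    "BEGR": ["S", "IL"],
    "MATP": ["MP", "ML", "MR", "D", "IL", "IR"],
    "MATL": ["ML", "D", "IL"],
    "MATR": ["MR", "D", "IR"],
    "END":  ["E"],
}

# The 24 guide-tree nodes of the example, in node-number order.
GUIDE_TREE = (
    "ROOT MATL MATL BIF "
    "BEGL MATP MATP MATR MATP MATL MATL MATL MATL END "
    "BEGR MATL MATP MATP MATL MATP MATL MATL MATL END"
).split()

def expand(guide_tree):
    """Return (state type, state number, node type, node number) tuples."""
    states = []
    for node_number, node_type in enumerate(guide_tree, start=1):
        for state_type in NODE_STATES[node_type]:
            states.append((state_type, len(states) + 1, node_type, node_number))
    return states

states = expand(GUIDE_TREE)
print(len(states))  # 81 states, S 1 through E 81
print([f"{s} {n}" for s, n, _, node in states if node == 6])
```

Only pair-emitting nodes (MATP) need the full six-state set, and single-residue nodes (MATL, MATR) get three states each, which keeps the model size linear in the alignment length.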

[Figure: the 24 guide-tree nodes expanded into 81 CM states, numbered S 1 to E 81. Each match node contributes a "split set" plus insert states, for example:
  MATP 6 → split set MP 12, ML 13, MR 14, D 15; inserts IL 16, IR 17
  MATP 7 → split set MP 18, ML 19, MR 20, D 21; inserts IL 22, IR 23
  MATR 8 → split set MR 24, D 25; insert IR 26]