RNA Secondary Structure Prediction (allowed pairs: G-C, A-U, G-U)

LinearFold: Linear-Time Approximate RNA Folding by 5’-to-3’ dynamic programming and beam search


  1. LinearFold: Linear-Time Approximate RNA Folding by 5’-to-3’ dynamic programming and beam search
 x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
 y (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
 [Figure: secondary-structure drawing of the example sequence, nucleotides numbered 1 to 76]
 Liang Huang * (Oregon State University & Baidu Research USA)
 Joint work with He Zhang **, Dezhong Deng **, Kai Zhao, Kaibo Liu, David Hendrix and David Mathews
 ISMB 2019 Proceedings Talk
 * corresponding author ** equal contribution

  3. LinearFold: Linear-Time Approximate RNA Folding by 5’-to-3’ dynamic programming and beam search
 first O(n) (approx.) RNA folding algorithm & server (linearfold.org), with even higher accuracy than O(n^3) algorithms

  4. RNA Secondary Structure Prediction
 allowed pairs: G-C, A-U, G-U
 example: transfer RNA (tRNA)
 assume no crossing pairs (no pseudoknots)
 input x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

  5. RNA Secondary Structure Prediction
 input x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
 output y (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
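The dot-bracket output can be decoded mechanically: each "(" opens a pair that its matching ")" closes, and "." marks an unpaired nucleotide. The Python sketch below recovers the pairs for the tRNA example and checks each one against the allowed pairs G-C, A-U and G-U. The names `ALLOWED_PAIRS` and `parse_pairs` are illustrative and do not come from the talk.

```python
# Allowed base pairs from the slide, in both orientations.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"),
                 ("U", "A"), ("G", "U"), ("U", "G")}


def parse_pairs(x, y):
    """Return the sorted (i, j) pairs (1-based) that dot-bracket y encodes for sequence x."""
    if len(x) != len(y):
        raise ValueError("sequence and structure must have the same length")
    stack, pairs = [], []
    for j, symbol in enumerate(y, start=1):
        if symbol == "(":
            stack.append(j)                      # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()                      # most recent unmatched '('
            if (x[i - 1], x[j - 1]) not in ALLOWED_PAIRS:
                raise ValueError(f"{x[i - 1]}{i}-{x[j - 1]}{j} is not an allowed pair")
            pairs.append((i, j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


x = "GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA"
y = "(((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))...."
pairs = parse_pairs(x, y)
print(len(pairs), pairs[0])
```

Because a stack can only match nested brackets, this notation cannot express crossing pairs, which is the "no pseudoknots" assumption. For the tRNA example the sketch finds 21 pairs, the outermost joining nucleotides 1 and 72.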

  6. RNA Secondary Structure Prediction
 [Figure: secondary-structure drawing of the tRNA for input x and output y, nucleotides numbered 1 to 76]

  7. RNA Secondary Structure Prediction
 [Figure: secondary-structure drawing of the tRNA, nucleotides numbered 1 to 76, labeled "parse tree"]

  8. RNA Secondary Structure Prediction
 [Figure: the tRNA secondary structure ("parse tree") next to a sentence parse tree, (S (NP (DT the) (NN man)) (VP (VB bit) (NP (DT the) (NN dog)))), both marked O(n^3)]
 problem: standard structure prediction algorithms are way too slow: O(n^3)
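To illustrate where a cubic running time comes from, here is a minimal Python sketch of Nussinov-style base-pair maximization. It is a deliberate simplification: it only counts allowed pairs, uses no energy model, and enforces no minimum hairpin-loop length. The function name `nussinov_max_pairs` is made up for this example.

```python
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"),
                 ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(x):
    """Maximum number of non-crossing allowed pairs in x (simplified scoring)."""
    n = len(x)
    # best[i][j] = maximum number of pairs within the half-open span x[i:j]
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):               # O(n) span lengths
        for i in range(0, n - length + 1):       # O(n) start positions
            j = i + length
            score = best[i][j - 1]               # case 1: x[j-1] stays unpaired
            for k in range(i, j - 1):            # O(n) partners for x[j-1]
                if (x[k], x[j - 1]) in ALLOWED_PAIRS:
                    # case 2: x[k] pairs with x[j-1]; the left part and the
                    # enclosed part are solved independently (no crossing pairs)
                    score = max(score, best[i][k] + best[k + 1][j - 1] + 1)
            best[i][j] = score
    return best[0][n]


print(nussinov_max_pairs("GGGAAACCC"))  # 3
```

The three nested loops over span length, start position and pairing partner give the O(n^3) cost.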

  9. RNA Secondary Structure Prediction
 solution: adapt my linear-time dynamic programming algorithms from parsing
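The 5’-to-3’ idea can be sketched in Python under a simplified pair-counting score: scan the sequence left to right, keep for every position a table of the spans that end there, and after each step keep only the `beam` highest-ranked spans. This is not LinearFold's implementation or scoring model. The function name `linear_fold_sketch`, the ranking rule (span score plus best prefix score) and the choice to always retain the span starting at position 0 are assumptions made for this illustration.

```python
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"),
                 ("U", "A"), ("G", "U"), ("U", "G")}


def linear_fold_sketch(x, beam=100):
    """Approximate maximum number of non-crossing allowed pairs, scanning 5' to 3'."""
    n = len(x)
    columns = [{0: 0}]   # columns[j][i] = best pair count found for the span x[i:j]
    prefix = [0]         # prefix[i]     = best pair count found for x[:i]
    for j in range(n):
        new = {j + 1: 0}                          # empty span: a pair may open after x[j]
        for i, inside in columns[j].items():
            if inside > new.get(i, -1):           # skip: x[j] stays unpaired
                new[i] = inside
            if i >= 1 and (x[i - 1], x[j]) in ALLOWED_PAIRS:
                # pop: x[i-1] pairs with x[j] around x[i:j]; attach each
                # surviving span that ends just before x[i-1]
                for k, left in columns[i - 1].items():
                    score = left + inside + 1
                    if score > new.get(k, -1):
                        new[k] = score
        prefix.append(new[0])
        if len(new) > beam:                       # beam pruning (assumed ranking rule)
            ranked = sorted(new, key=lambda i: prefix[i] + new[i], reverse=True)
            kept = set(ranked[:beam]) | {0}       # always keep the full prefix span
            new = {i: new[i] for i in kept}
        columns.append(new)
    return columns[n][0]


print(linear_fold_sketch("GGGAAACCC"))  # 3 (no pruning happens at this length)
```

With at most `beam + 1` spans kept per position, the work per step depends only on the beam size, so total time grows linearly with sequence length for a fixed beam. When the beam is large enough that nothing is pruned, the search is exact for this simplified score. With a small beam the result is a lower bound on it.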

  10. Results: LinearFold is Much Faster and More Accurate
 [Figure, panel A: running time (seconds) vs. sequence length (0 to 3000 nt). Existing ones: Vienna RNAfold ~n^2.4, CONTRAfold MFE ~n^2.2. Our work: LinearFold-V ~n^1.2, LinearFold-C ~n^1.1]
 [Figure, further panels: Precision and Recall (axis range 40 to 80), standard O(n^3) search vs. LinearFold O(n) search; the category labels on the x-axis are not recoverable]

  11. From Linguistics to Biology
 x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
 y (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....

  13. Computational Linguistics => Computational Biology
 (timeline columns: linguistics, compiler theory, comp. linguistics, computational biology)
 1955 Chomsky: context-free grammars (CFGs)
 1958 Backus & Naur: CFGs for program. lang.
 1964 Cocke, 1965 Kasami, 1967 Younger: CKY, bottom-up dynamic programming, O(n^3) for all CFGs
 [Figure: parse tree (S (NP (DT the) (NN man)) (VP (VB bit) (NP (DT the) (NN dog))))]
 1978 Nussinov; 1981 Zuker & Stiegler: O(n^3) RNA folding

  14. Computational Linguistics => Computational Biology
 1965 Knuth: LR parsing for restricted CFGs: O(n)
 [Figure: parse tree of the statement "x = y + 3;" with nodes :=, id (x), +, id (y), const (3)]

  15. Computational Linguistics => Computational Biology
 1986 Tomita: Generalized LR for all CFGs: O(n^3)

  16. Computational Linguistics => Computational Biology
 2010 Huang & Sagae: O(n) (approx.) DP for all CFGs
