Rfam Faster Genome Annotation of Input (hand-tuned): IRE (partial - PowerPoint PPT Presentation

Rfam Faster Genome Annotation of • Input (hand-tuned): IRE (partial seed alignment): Non-coding RNAs Without – MSA Hom.sap. GUUCCUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCUUC.UUCAACAGUGUUUGGAUGGAAC – SS_cons Loss of Accuracy Hom.sap. UUUCCUGUUUCAACAGUGCUUGGA.GGAAC – Score Thresh T Hom.sap. UUUAUC..AGUGACAGAGUUCACU.AUAAA Hom.sap. UCUCUUGCUUCAACAGUGUUUGGAUGGAAC – Window Len W Zasha Weinberg Hom.sap. AUUAUC..GGGAACAGUGUUUCCC.AUAAU Hom.sap. UCUUGC..UUCAACAGUGUUUGGACGGAAG • Output: & W.L. Ruzzo Hom.sap. UGUAUC..GGAGACAGUGAUCUCC.AUAUG Hom.sap. AUUAUC..GGAAGCAGUGCCUUCC.AUAAU – CM Recomb ‘04 Cav.por. UCUCCUGCUUCAACAGUGCUUGGACGGAGC – scan results Mus.mus. UAUAUC..GGAGACAGUGAUCUCC.AUAUG Mus.mus. UUUCCUGCUUCAACAGUGCUUGAACGGAAC Mus.mus. GUACUUGCUUCAACAGUGUUUGAACGGAAC Rat.nor. UAUAUC..GGAGACAGUGACCUCC.AUAUG Rat.nor. UAUCUUGCUUCAACAGUGUUUGGACGGAAC SS_cons <<<<<...<<<<<......>>>>>.>>>>> Covariance CM’s are good, but slow Model Rfam Reality Our Work Rfam Goal EMBL EMBL EMBL Key difference of CM vs HMM: Pair states emit paired symbols, Z Blast corresponding to base-paired nucleotides; 16 emission probabilities here. CM CM CM junk junk hits hits hits 1 month, ~2 months, 10 years, 1000 computers 1000 computers 1000 computers

Oversimplified CM CM to HMM (for pedagogical purposes only) A A A A C C C C A A G G G G C C U U U U G G – – – – U U – – CM HMM A A A A C C C C A A G G G G C C U U U U G G – – – – U U – – 25 emisions per state 5 emissions per state, 2x states Key Issue: 25 scores � 10 Viterbi/Forward Scoring A A A A • Path � defines transitions/emissions C C C C CM HMM G G G G • Score( � ) = product of “probabilities” on � U U U U – – – – • NB: ok if “probs” aren’t, e.g. !"# • E.g. in CM, emissions are odds ratios vs 0th- order background • For any nucleotide sequence x: • Need: log Viterbi scores CM � HMM – Viterbi-score(x) = max{ score( � ) | � emits x} – Forward-score(x) = !$ score( � ) | � emits x}

P AA � L A + R A P AC � L A + R C Key Issue: 25 scores � 10 Rigorous Filtering P AG � L A + R G P AU � L A + R U P A– � L A + R – … • Any scores satisfying the linear A A A A C C C C CM HMM inequalities give rigorous filtering G G G G U U U U – – – – NB:HMM not a prob. model Proof: • Need: log Viterbi scores CM � HMM CM Viterbi path score � “corresponding” HMM path score P AA � L A + R A P CA � L C + R A … P AC � L A + R C P CC � L C + R C … � Viterbi HMM path score P AG � L A + R G P CG � L C + R G … (even if it does not correspond to any CM path) P AU � L A + R U P CU � L C + R U … P A– � L A + R – P C– � L C + R – … Some scores filter better Optimizing filtering P UA = 1 � L U + R A • For any nucleotide sequence x: P UG = 4 � L U + R G Viterbi-score(x) = max{ score( � ) | � emits x } Forward-score(x) = !$ score( � ) | � emits x } Assuming ACGU � 25% • Expected Forward Score Option 1: Opt 1: E(L i , R i ) = ! x Forward-score(x)*Pr(x) L U = R A = R G = 2 L U + (R A + R G )/2 = 4 – NB: E is a function of L i , R i only Under 0th-order background model • Optimization: Option 2: Opt 2: Minimize E(L i , R i ) subject to score L.I.s L U = 0, R A = 1, R G = 4 L U + (R A + R G )/2 = 2.5 – This is heuristic (“forward � � Viterbi � � filter � ”) – But still rigorous because “subject to score L.I.s”

Calculating E(L i , R i ) Minimizing E(L i , R i ) E(L i , R i ) = ! x Forward-score(x)*Pr(x) • Calculate E(L i , R i ) symbolically , in terms of emission scores, so we can do • Forward-like: for every state, calculate partial derivatives for numerical convex expected score for all paths ending optimization algorithm there, easily calculated from expected � E ( L 1 , L 2 ,...) scores of predecessors & transition/ emission probabilities/scores � L i Results: buried treasures Estimated Filtering Efficiency # found # found # new (139 Rfam 4.0 families) rigorous Name BLAST filter + CM + CM Filtering # families # families Pyrococcus snoRNA 57 180 123 fraction (compact) (expanded) Iron response element 201 322 121 < 10 -4 105 110 Histone 3’ element 1004 1106 102 Purine riboswitch 69 123 54 10 -4 - 10 -2 8 17 Retron msr 11 59 48 .01 - .10 11 3 Hammerhead I 167 193 26 Hammerhead III 251 264 13 .10 - .25 2 2 U4 snRNA 283 290 7 S-box 128 131 3 .25 - .99 6 4 U6 snRNA 1462 1464 2 U5 snRNA 199 200 1 .99 - 1.0 7 3 U7 snRNA 312 313 1

Results: With additional work # with # with rigorous # new BLAST+CM filter series + CM Rfam tRNA 58609 63767 5158 Group II 5708 6039 331 intron tRNAscan- 608 729 121 SE (human) tmRNA 226 247 21 Lysine 60 71 11 riboswitch And more…

Rfam Faster Genome Annotation of Input (hand-tuned): IRE (partial - PowerPoint PPT Presentation

Rfam Faster Genome Annotation of Input (hand-tuned): IRE (partial seed alignment): Non-coding RNAs Without MSA Hom.sap. GUUCCUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCUUC.UUCAACAGUGUUUGGAUGGAAC SS_cons Loss of Accuracy Hom.sap.

COMP364: Manipula0ng Rfam data with Biopython Jrme Waldisphl,

Bacterial Genome Annotation Lucile Soler Annotation course 9 th -11 th may 2017 Bacterial genome

Genome Annotation The steps in genome sequencing Generate genome sequence Assembly ORF

Whole Genome Analysis and Annotation Adam Siepel Biological Statistics & Computational

Annotation Processing in a Kotlin World Zac Sweers @pandanomic Annotation Processing in a

FASTER TRANSFORMER Bo Yang Hsueh, 2019/12/18 AGENDA What is Faster Transformer Introduce the

Genome Reassembly From Fragments 7 January 2019 OSU CSE 1 Genome A genome is the encoding

Genome Sequencing & Analysis Core Resource Olivier Fedrigo Friday, October 19, 12 Reference

Quantifying gene expression Genome Sequence reads GTF (annotation)? FASTQ (+reference

Annotation and Evaluation Diana Maynard, Niraj Aswani University of Sheffield University of

Lecture 2 Annotation tools & Segmentation Summary of Part 1 Annotation theory

Systematic Annotation Mark Voorhies 4/5/2012 Mark Voorhies Systematic Annotation Review RTFM

Assessing annotation Assessing annotation consistency in the Gene consistency in the Gene

Introduction Detecting Errors in Effects of Annotation Errors Detecting Errors in Corpus

Web Annotations Building the Experience Annotation An annotation is something added. It is not

Visualizing ENCODE Data in the UCSC Genome Browser Pauline Fujita, Ph.D. UCSC Genome Bioinformatics

Can You Afford Not To? Au Augu gust 1 13 th th , 201 2019 Chris ristoph pher B Bro rown

Machine Translation May 28, 2013 Christian Federmann Saarland University

Delegates Assembly Membership Update October 28, 2020 Lostant Community Library - Update New

Cdigo funcional em Java: Superando o hype Eder Ignatowicz @ederign Lambda Expressions 101

FairRoot Status &

version 0.1 Svelte. .

Local optimization of JavaScript code FBIO FARZAT MRCIO BARROS MRCIO BARROS MRCIO

Building fast web applications Thierry Sans Users respond to speed Amazon found every 100ms

Sambuz

Useful Links

Newsletter

Mail Us

Rfam Faster Genome Annotation of Input (hand-tuned): IRE (partial - PowerPoint PPT Presentation

Rfam Faster Genome Annotation of Input (hand-tuned): IRE (partial seed alignment): Non-coding RNAs Without MSA Hom.sap. GUUCCUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCUUC.UUCAACAGUGUUUGGAUGGAAC SS_cons Loss of Accuracy Hom.sap.

COMP364: Manipula0ng Rfam data with Biopython Jrme Waldisphl,

Bacterial Genome Annotation Lucile Soler Annotation course 9 th -11 th may 2017 Bacterial genome

Genome Annotation The steps in genome sequencing Generate genome sequence Assembly ORF

Whole Genome Analysis and Annotation Adam Siepel Biological Statistics &amp; Computational

Annotation Processing in a Kotlin World Zac Sweers @pandanomic Annotation Processing in a

FASTER TRANSFORMER Bo Yang Hsueh, 2019/12/18 AGENDA What is Faster Transformer Introduce the

Genome Reassembly From Fragments 7 January 2019 OSU CSE 1 Genome A genome is the encoding

Genome Sequencing &amp; Analysis Core Resource Olivier Fedrigo Friday, October 19, 12 Reference

Quantifying gene expression Genome Sequence reads GTF (annotation)? FASTQ (+reference

Annotation and Evaluation Diana Maynard, Niraj Aswani University of Sheffield University of

Lecture 2 Annotation tools &amp; Segmentation Summary of Part 1 Annotation theory

Systematic Annotation Mark Voorhies 4/5/2012 Mark Voorhies Systematic Annotation Review RTFM

Assessing annotation Assessing annotation consistency in the Gene consistency in the Gene

Introduction Detecting Errors in Effects of Annotation Errors Detecting Errors in Corpus

Web Annotations Building the Experience Annotation An annotation is something added. It is not

Visualizing ENCODE Data in the UCSC Genome Browser Pauline Fujita, Ph.D. UCSC Genome Bioinformatics

Can You Afford Not To? Au Augu gust 1 13 th th , 201 2019 Chris ristoph pher B Bro rown

Machine Translation May 28, 2013 Christian Federmann Saarland University

Delegates Assembly Membership Update October 28, 2020 Lostant Community Library - Update New

Cdigo funcional em Java: Superando o hype Eder Ignatowicz @ederign Lambda Expressions 101

FairRoot Status &amp;

version 0.1 Svelte. .

Local optimization of JavaScript code FBIO FARZAT MRCIO BARROS MRCIO BARROS MRCIO

Building fast web applications Thierry Sans Users respond to speed Amazon found every 100ms

Sambuz

Useful Links

Newsletter

Mail Us

Whole Genome Analysis and Annotation Adam Siepel Biological Statistics & Computational

Genome Sequencing & Analysis Core Resource Olivier Fedrigo Friday, October 19, 12 Reference

Lecture 2 Annotation tools & Segmentation Summary of Part 1 Annotation theory

FairRoot Status &