Gene Prediction with AUGUSTUS


  1. Gene Prediction with AUGUSTUS
     Ingo Bulla, Institut für Mathematik und Informatik, Universität Greifswald
     Genome annotation: challenges in eukaryotes and consequences for evolutionary genomics, 13 February 2018

  2. About the speaker
     • PhD in mathematics on a non-applied topic; switched to bioinformatics in 2006
     • Main research topics: sequence analysis, phylogeny, evolution, epidemiology and public health of HIV
     • Now working with Mario Stanke (developer of AUGUSTUS) on improving the algorithm used by AUGUSTUS
     • Limited experience in genomics; has applied AUGUSTUS only once in a research project
       → during the lunch talk, the speaker will be on a Skype call with Mario Stanke or Katharina Hoff (long-time AUGUSTUS user, implementer of BRAKER) in case questions come up that he cannot answer
     • Research engineer (ingénieur de recherche) in Perpignan from 1 April onward, in a wet-lab group (Christoph Grunau, Guillaume Mitta)

  3. Outline
     1 Overview of Gene Prediction
     2 With RNA-Seq: RGASP Assessment, BRAKER1
     3 Homology-based

  4. Structural Genome Annotation Problem
     Input
     • genome assembly (or assemblies)
     • extrinsic evidence, e.g. from RNA-Seq, MS/MS, protein databases
     Output
     • start and end positions of genes, CDS, exons and introns (.gff)
     Example: 12,600 bp from the alga Chlamydomonas reinhardtii (with JGI)
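The output named above is a GFF file. As a minimal sketch, assuming a single-transcript gene on the forward (+) strand whose CDS segments are given as 1-based inclusive coordinates, the Python below writes GFF3-style CDS lines and fills in the phase column (how many bases to skip at the start of a segment to reach the next codon start). The function name `cds_to_gff`, the source label and the `Parent` ID are hypothetical; this is not how AUGUSTUS itself writes its output.

```python
def cds_to_gff(seqid, cds_segments, source="example", gene_id="g1"):
    """Return GFF3-style CDS lines for one forward-strand transcript.

    cds_segments: list of (start, end) tuples, 1-based and inclusive,
    sorted by start. Only the '+' strand is handled in this sketch.
    """
    lines = []
    coding_so_far = 0  # coding bases contained in all previous segments
    for start, end in cds_segments:
        # If earlier segments ended mid-codon, the codon must be completed
        # first; the phase is the number of bases that do so.
        phase = (3 - coding_so_far % 3) % 3
        lines.append("\t".join([
            seqid, source, "CDS", str(start), str(end), ".", "+",
            str(phase), f"Parent={gene_id}.t1",
        ]))
        coding_so_far += end - start + 1
    if coding_so_far % 3 != 0:
        raise ValueError("total CDS length is not a multiple of 3")
    return lines


if __name__ == "__main__":
    # Segment lengths 52, 60 and 29 add up to 141 bp = 47 codons.
    for line in cds_to_gff("chr1", [(100, 151), (201, 260), (301, 329)]):
        print(line)
```

On the reverse strand the phases would have to be accumulated from the highest coordinate downward, which this sketch deliberately leaves out.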

  5. Example Application
     iBeetle: RNAi screen for the beetle Tribolium castaneum
     1. predict genes
     2. design primers based on the prediction
     3. produce dsRNA for each gene
     4. knock down each gene in the larval and pupal stages
     5. observe the phenotype
     6. study the function of selected genes

  6. Major Approaches to Protein-Coding Gene Prediction
     approach (extrinsic evidence used): programs
     • ab initio (none): GeneMark, AUGUSTUS, SNAP, FGENESH
     • transcript-based (transcript sequences, e.g. RNA-Seq): BRAKER, Exonerate, AUGUSTUS, mGene
     • protein homology (protein sequences): AUGUSTUS-PPX, GeneWise, Exonerate
     • comparative, de novo (additional, unannotated genomes): AUGUSTUS, CONTRAST, N-SCAN
     • proteogenomics (peptides from mass spectrometry): AUGUSTUS
     • combiners/selectors (other gene predictions + transcript sequences + proteins + ?): JIGSAW, GLEAN, MAKER2, PASA
     State of the art usually requires a combination of approaches: for every part of a gene, use all evidence available for that gene or region.

  7. Single-species gene finding: the 1-species graph
     Assumptions: no alternative splicing, no gene overlap
     • the graph represents all candidate gene structures
     • nodes: exon candidates (EC)
     • edges: introns and intergenic regions
     • each path from s to t is one gene structure
     • single-species gene finding in linear time: longest-path algorithm
     [Figure: example graph from source s to sink t with scored exon candidates on the forward and reverse strands, connected by intron edges of each phase (intron+0, intron+1, intron+2), an explicit intron, and intergenic-region edges]
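The longest-path step can be sketched in Python. This assumes the graph is already built as a directed acyclic graph whose nodes are listed in topological order (for the 1-species graph, sorting nodes by genomic position gives such an order, because every edge points downstream), with scores on both exon-candidate nodes and intron/intergenic edges. The function name and the toy scores are made up for illustration and are not AUGUSTUS's scoring. Each edge is relaxed exactly once, so the run time is linear in the number of nodes plus edges.

```python
from math import inf


def best_gene_structure(order, node_score, edges):
    """Highest-scoring path from order[0] (s) to order[-1] (t) in a DAG.

    order: all node names in topological order, starting with s, ending with t
    node_score: dict node -> score (exon candidates; missing nodes score 0)
    edges: dict node -> list of (successor, edge_score); every successor
           must appear in `order`
    """
    source, sink = order[0], order[-1]
    best = {v: -inf for v in order}
    prev = {}
    best[source] = node_score.get(source, 0.0)
    for u in order:                      # visit nodes in topological order
        if best[u] == -inf:
            continue                     # u is not reachable from s
        for v, w in edges.get(u, []):    # relax each outgoing edge once
            cand = best[u] + w + node_score.get(v, 0.0)
            if cand > best[v]:
                best[v] = cand
                prev[v] = u
    if best[sink] == -inf:
        return -inf, []                  # no gene structure reaches t
    path = [sink]                        # walk back along the best choices
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[sink], path[::-1]


if __name__ == "__main__":
    order = ["s", "e1", "e2", "e3", "t"]
    node_score = {"e1": 6, "e2": 12, "e3": 9}
    edges = {
        "s": [("e1", 0), ("e2", 0), ("t", 0)],
        "e1": [("e2", -2), ("e3", 1)],   # intron edges
        "e2": [("e3", 3), ("t", 0)],
        "e3": [("t", 0)],                # intergenic edge to the sink
    }
    print(best_gene_structure(order, node_score, edges))
```

With these toy scores the best path is s, e1, e2, e3, t with total score 28.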

  8. Gene finder AUGUSTUS
     • developed since 2002 (PI: Mario Stanke)
     • based on a conditional random field (a generalization of the HMM)
     • probabilistic model of gene structures given signals, CDS and evidence
     • yields the most likely gene structure or a sample of likely ones
     Some genome annotation collaborations using AUGUSTUS:
     • Aedes aegypti, yellow fever mosquito (dengue fever): Science, 2007
     • Brugia malayi, parasitic worm causing elephantiasis: Science, 2007
     • Tribolium castaneum, red flour beetle, pest and model organism: Nature, 2008
     • Schistosoma mansoni, parasite causing bilharziosis: Nature, 2009
     • Coprinus cinereus, fungus: PNAS, 2010
     • Nasonia vitripennis, wasp: Science, 2010
     • Amphimedon queenslandica, sponge: Nature, 2010
     • Culex pipiens, common mosquito: Science, 2010
     • Ricinus communis, castor bean: Nature Biotechnology, 2010
     • Chlamydomonas reinhardtii, green alga: Proteomics, 2011
     • Galdieria sulphuraria, red alga: Science, 2013
     • Arabidopsis thaliana, plant model organism: PNAS, 2008
     • Heliconius melpomene, butterfly: Nature, 2012
     • Apis mellifera, honey bee: BMC Genomics, 2014
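Finding "the most likely gene structure" is a decoding problem. AUGUSTUS's real model has many states, length distributions and evidence terms, so the following is only a toy sketch: a two-state HMM (noncoding `N`, coding `C`) with made-up probabilities in which coding sequence is assumed to be GC-rich, decoded with the standard Viterbi algorithm in log space. All parameter names and values are illustrative assumptions.

```python
import math

# Toy parameters (assumptions for illustration only).
TOY_STATES = "NC"
TOY_START = {"N": 0.9, "C": 0.1}
TOY_TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
TOY_EMIT = {
    "N": dict.fromkeys("ACGT", 0.25),
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most likely state path for seq and its log-probability."""
    # V[i][s]: best log-probability of any path ending in state s at position i
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(seq)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for being in state s at position i.
            r_best, score = max(
                ((r, V[i - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            V[i][s] = score + math.log(emit_p[s][seq[i]])
            back[i][s] = r_best
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return "".join(reversed(path)), V[-1][last]


if __name__ == "__main__":
    seq = "A" * 10 + "GC" * 10 + "A" * 10
    print(viterbi(seq, TOY_STATES, TOY_START, TOY_TRANS, TOY_EMIT)[0])
```

Drawing a sample of likely structures instead of the single best one would need the forward algorithm plus stochastic traceback, which is not shown here.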

  9. Outline: 2 With RNA-Seq (RGASP Assessment, BRAKER1)

  10. Three Major Approaches to Gene Finding with RNA-Seq
     [Figure: RNA-Seq reads are either aligned to the genome (approaches A and B) or assembled de novo (approach C); labels distinguish protein-coding from noncoding genes]
     A evidence integration into a gene finder (e.g. AUGUSTUS, FGENESH, mGene, GENEID)
       1. align reads to the genome first
       2. integrate evidence from coverage and spliced alignments into the gene finder
     B purely alignment-based (e.g. Cufflinks)
       1. align reads to the genome first
       2. construct transcripts from spliced alignments (no gene finding)
     C de novo assembly of reads (e.g. Trinity, TransDecoder, Velvet + AUGUSTUS)
       1. assemble transcriptome reads into transcript contigs
       2. use the contigs for gene finding, or just align them
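Approach A (step 2) relies on spliced alignments. In SAM/BAM files a spliced read carries `N` operations in its CIGAR string, each marking a skipped reference region, i.e. a candidate intron. The Python sketch below is a hypothetical helper, not part of AUGUSTUS (which ships its own `bam2hints` tool for turning BAM files into hints); it only shows how intron coordinates follow from one alignment's mapping position and CIGAR.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def introns_from_cigar(pos, cigar):
    """Intron intervals implied by one spliced alignment.

    pos: 1-based leftmost mapping position (the SAM POS field)
    cigar: CIGAR string, e.g. "50M200N50M"
    Returns a list of (start, end) tuples, 1-based and inclusive.
    """
    introns = []
    ref = pos  # current reference position
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped region = intron in RNA-Seq alignments
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


if __name__ == "__main__":
    print(introns_from_cigar(1000, "50M200N30M10D20M300N40M"))
```

Insertions (`I`) and clipping (`S`, `H`) do not move along the reference, while deletions (`D`) do, which is why they are treated differently in the loop.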

  11. AUGUSTUS using RNA-Seq
     Using RNA-Seq only (on human):
     • spliced alignments are used to predict alternative splicing
     • the ab initio model dominates where there is little or no evidence
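Why spliced alignments reveal alternative splicing can be illustrated with a small sketch: two different introns that overlap on the same strand cannot both belong to one transcript, so if the RNA-Seq data support both, at least two transcript variants must exist. The helper below is hypothetical and only lists such conflicting pairs; it does not reproduce how AUGUSTUS builds alternative transcripts.

```python
from itertools import combinations


def conflicting_introns(introns):
    """Pairs of distinct, overlapping introns (1-based inclusive, same strand).

    Overlapping introns are mutually exclusive within one transcript, so each
    returned pair is evidence for alternative splicing.
    """
    unique = sorted(set(introns))  # duplicate reads support the same intron
    pairs = []
    for a, b in combinations(unique, 2):  # quadratic, fine for a sketch
        if a[0] <= b[1] and b[0] <= a[1]:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Exon skipping: the long intron spans both short ones.
    print(conflicting_introns([(100, 200), (300, 400), (100, 400)]))
```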

  12. RGASP: RNA-Seq Genome Annotation Assessment Project
     Assessment of transcript reconstruction methods for RNA-seq. Steijger et al., Nature Methods, Nov. 2013
     • assessed the progress of automatic gene building using RNA-Seq
     • part of the ENCODE project
     • 17 participating groups submitted predictions, all on the same data

  13. Excerpt of RGASP assessment results on human
     Calling transcripts and proteins, best results:
     • fly: transcript sensitivity 24%, gene sensitivity 49% (AUGUSTUS)
     • worm: transcript sensitivity 48%, gene sensitivity 61% (TRANSOMICS)
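The accuracy measures in this assessment and in the BRAKER1 plots follow the usual gene-finding convention: sensitivity = TP / (TP + FN), the fraction of reference features that are predicted, and specificity = TP / (TP + FP), the fraction of predicted features that are correct (what other fields call precision). As a minimal sketch, assuming each transcript is reduced to a hashable key such as its tuple of exon coordinates and that a prediction counts only on an exact match, transcript-level values can be computed as below; the exact matching rules used by RGASP may differ.

```python
def sensitivity_specificity(reference, predicted):
    """Feature-level sensitivity and specificity in percent.

    reference, predicted: iterables of hashable features (e.g. tuples of
    exon coordinates); a predicted feature is a true positive only if it
    matches a reference feature exactly.
    """
    ref, pred = set(reference), set(predicted)
    tp = len(ref & pred)
    sensitivity = 100.0 * tp / len(ref) if ref else 0.0     # TP / (TP + FN)
    specificity = 100.0 * tp / len(pred) if pred else 0.0  # TP / (TP + FP)
    return sensitivity, specificity


if __name__ == "__main__":
    t1 = ((100, 200), (300, 400))
    t2 = ((1000, 1100),)
    t3 = ((2000, 2100), (2200, 2300))
    t4 = ((5000, 5300),)
    t5 = ((2000, 2100), (2250, 2300))  # wrong acceptor: not an exact match
    print(sensitivity_specificity([t1, t2, t3, t4], [t1, t2, t5]))
```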

  14. Why was the accuracy not better?
     Problems: intronic transcription, self-similarity of the genome

  15. Reminder: RNA-Seq does not give you the protein sequence
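Because a transcript contig or spliced alignment only yields the RNA sequence, the coding region still has to be located. As a toy sketch of that step (in the spirit of ORF callers such as TransDecoder, but not its algorithm), the Python below finds the longest ATG-to-stop open reading frame on the forward strand of a transcript. It ignores the reverse strand, ORFs lacking a start or stop codon, and any coding-potential scoring.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Longest ATG...stop ORF on the forward strand.

    Returns (start, end) as a 0-based, end-exclusive interval that includes
    the stop codon, or None if no complete ORF exists.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None  # first ATG of the current open stretch in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


if __name__ == "__main__":
    tx = "CCATGAAATTTTAGGG"
    s, e = longest_orf(tx)
    print(s, e, tx[s:e])
```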

  16. BRAKER1
     Collaboration with former competitor Mark Borodovsky (GeneMark)
     • the MAKER2 pipeline uses GeneMark and AUGUSTUS
     • why not combine ourselves
       • GeneMark-ET, which self-trains on RNA-Seq, and
       • AUGUSTUS, which predicts with RNA-Seq?
     • easy to use: braker.pl [OPTIONS] --genome=genome.fa --bam=rnaseq.bam
     • fast (1 day for fly on 1 CPU)

  17. GeneMark-ET (2014): unsupervised training of parameters
     • anchors from RNA-Seq are used for training
     • GeneMark does not use RNA-Seq for prediction
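The idea of training anchors can be sketched as a filter over RNA-Seq introns: keep only well-supported introns whose genomic sequence starts with the canonical GT donor and ends with the canonical AG acceptor (forward strand only). The function name, the minimum read support of 3 and the GT-AG restriction are illustrative assumptions, not GeneMark-ET's actual selection criteria.

```python
def anchor_introns(genome, intron_support, min_reads=3):
    """Select well-supported canonical GT-AG introns as training anchors.

    genome: forward-strand sequence of one chromosome (string)
    intron_support: dict (start, end) -> number of supporting reads,
                    with 1-based inclusive coordinates
    min_reads: hypothetical support threshold
    """
    anchors = []
    for (start, end), reads in sorted(intron_support.items()):
        if reads < min_reads:
            continue  # too little RNA-Seq evidence
        intron = genome[start - 1:end]  # 1-based inclusive -> 0-based slice
        if intron[:2].upper() == "GT" and intron[-2:].upper() == "AG":
            anchors.append((start, end))
    return anchors


if __name__ == "__main__":
    genome = "AAAA" + "GTCCCCAG" + "AAAA" + "GTTTTTTC" + "AAA"
    # (5, 12) is GT...AG; (17, 24) ends in TC and is rejected.
    print(anchor_introns(genome, {(5, 12): 10, (17, 24): 10}))
```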

  18. BRAKER1 Pipeline
     [Figure: pipeline diagram, not preserved]

  19. Comparing BRAKER1 to MAKER2 (using RNA-Seq only)
     [Figure: accuracy difference BRAKER1 minus MAKER2 (percentage points) for gene, transcript and exon sensitivity and specificity; panels for C. elegans, D. melanogaster, S. pombe and A. thaliana, each showing the BRAKER1 GeneMark-ET and BRAKER1 AUGUSTUS predictions]

  20. Accuracy of BRAKER1
     [Figure: gene, transcript and exon sensitivity and specificity (%) of the BRAKER1 GeneMark-ET and BRAKER1 AUGUSTUS predictions for C. elegans, D. melanogaster, A. thaliana and S. pombe]

  21. Outline: 3 Homology-based

  22. Homology-Based Gene-Finding Approaches
     • genome MSA, using conservation: e.g. N-SCAN, CONTRAST
     • simultaneous genome annotation: e.g. AUGUSTUS, GSA-MPSA
     • single protein alignment: e.g. GeneWise, Exonerate
     • protein MSA: e.g. AUGUSTUS-PPX
     [Diagram also marks conserved non-coding regions]
