
BCOOL-Trans: Accurate and variant-preserving correction for RNA-seq


  1. BCOOL-Trans: Accurate and variant-preserving correction for RNA-seq. Camille Marchet and Antoine Limasset. Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL. SeqBio 2018, Rouen.

  2. Introduction. Tools to study RNA-seq: most assembly/quantification methods and some variant calling methods are k-mer based. Correctors are mostly k-mer based and rely on "solidity".

  3. Introduction: RNA-seq correction challenges

  4. Introduction: RNA-seq correction challenges

  5. Motivations

  6. Motivations

  7. Motivations

  8. Motivations

  9. State of the art: k-mer spectrum. Main idea: find abundant (trusted) k-mers in the dataset, then replace untrusted k-mers in reads with trusted ones.
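The main idea on slide 9 can be illustrated with a minimal sketch. It is not the method of any particular corrector: the function names (`count_kmers`, `correct_read`), the `min_count` threshold, and the rule "apply a substitution only when exactly one single-base change yields a trusted k-mer" are all assumptions made for illustration. Reverse complements, indels, and quality scores are ignored.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def correct_read(read, counts, k, min_count):
    """Replace untrusted k-mers by trusted ones using single substitutions.

    A k-mer is 'trusted' (solid) when it occurs at least min_count times.
    A substitution is applied only if it is the unique single-base change
    that turns the untrusted k-mer into a trusted one (hypothetical rule).
    """
    bases = list(read)
    for i in range(len(bases) - k + 1):
        kmer = "".join(bases[i:i + k])
        if counts[kmer] >= min_count:
            continue  # already trusted, nothing to do
        candidates = []
        for j in range(k):
            for b in "ACGT":
                if b == kmer[j]:
                    continue
                alt = kmer[:j] + b + kmer[j + 1:]
                if counts[alt] >= min_count:
                    candidates.append((j, b))
        if len(candidates) == 1:
            j, b = candidates[0]
            bases[i + j] = b  # later k-mers see the corrected base
    return "".join(bases)

# Toy dataset: five error-free copies and one read with a single substitution.
truth = "ACGTTGCAAGGCT"
noisy = "ACGTTGCTAGGCT"
reads = [truth] * 5 + [noisy]
counts = count_kmers(reads, k=5)
fixed = correct_read(noisy, counts, k=5, min_count=2)
```

The sketch also shows why the approach relies on abundance: a k-mer from a lowly expressed transcript can fall below `min_count` and be treated like an error, which is the RNA-seq difficulty the talk addresses.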

  10. State of the art: RNA correction. Strategy included in assemblers, KisSplice [Sacomoto et al. 2012], and Rcorrector [Song et al. 2015].

  11. BCOOL [Limasset et al. 2018]: main concepts

  12. BCOOL: improvements in read correction. Map reads on unitigs: better handling of close errors (less than k apart). Use a large k: handles repeated regions. Results after correction of genomic reads (*correction ratio = by how much the number of errors was divided).

  13. BCOOL-Trans enhancements. 1- Work with all k-mers: graph construction scales to dozens of billions of k-mers; rare k-mers are kept; overlaps are easier to find.

  14. BCOOL-Trans enhancements. 2- Work with large k-mers and remove only tips

  15. BCOOL-Trans enhancements. 2- Work with large k-mers and remove only tips

  16. BCOOL-Trans enhancements. 3- Advanced tip removal
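Slides 14 to 16 name tip removal without giving details, so the following is only a generic sketch of the textbook operation on a de Bruijn graph, not BCOOL-Trans's actual procedure. Here a tip is taken to be a dead-end path of at most `max_tip_len` k-mers hanging off a branching node. The names `kmers_of`, `build_graph`, `remove_tips`, and `max_tip_len` are hypothetical, and the sketch ignores reverse complements, abundance, and tips that start (rather than end) a path.

```python
def kmers_of(seq, k):
    """All k-mers of a sequence, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_graph(kmers):
    """Node-centric de Bruijn graph: edge u -> v when u[1:] == v[:-1]."""
    succ = {u: [u[1:] + b for b in "ACGT" if u[1:] + b in kmers] for u in kmers}
    pred = {u: [b + u[:-1] for b in "ACGT" if b + u[:-1] in kmers] for u in kmers}
    return succ, pred

def remove_tips(kmers, max_tip_len):
    """Drop short dead-end paths attached to a branching node."""
    kmers = set(kmers)
    succ, pred = build_graph(kmers)
    to_remove = set()
    for start in kmers:
        if succ[start]:
            continue  # not a dead end
        path, node = [start], start
        # Walk backwards while the path stays simple and short enough.
        while len(pred[node]) == 1 and len(path) <= max_tip_len:
            parent = pred[node][0]
            if len(succ[parent]) > 1:
                to_remove.update(path)  # parent branches: path is a tip
                break
            path.append(parent)
            node = parent
    return kmers - to_remove

# Toy graph: one main path plus a two-k-mer tip branching after GTTGC.
k = 5
main = kmers_of("ACGTTGCAAGGCTCA", k)
tip = kmers_of("GTTGCTT", k) - main
cleaned = remove_tips(main | tip, max_tip_len=3)
```

Because only dead ends are removed, branching structures that rejoin the graph (bubbles, which may carry real variants) are left untouched in this sketch, in line with the slide's "remove only tips".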

  17. BCOOL-Trans enhancements. 4- Mapping strategy

  18. BCOOL-Trans enhancements. 4- Mapping strategy

  19. BCOOL-Trans enhancements. 5- Paired-end reads merging

  20. Correction quality: proof of concept.
      Data: mouse transcriptome, 100M reads simulated with FluxSimulator [Griebel et al. 2012], 1% error rate.
      Mock BCOOL-Trans: "cleaned" graph + BCOOL's mapping module.
      BCOOL-TransN means: all true k-mers + erroneous k-mers occurring more than N times in the cleaned graph.

      Corrector      Recall  Precision  Correction ratio*  % Erroneous reads
      BFC            58.76   96.16      2.34               30.18
      Rcorrector     93.37   99.80      14.68              4.34
      BCOOL-Trans5   92.75   97.75      10.64              5.58
      BCOOL-Trans7   98.41   13.7       94.20              4.17

      *Correction ratio = by how much the number of errors was divided.
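To make the metrics on slide 20 concrete, here is a small sketch of how they could be computed on simulated data where the true sequence is known. Only the correction ratio is defined on the slide; the recall and precision formulas below are the usual per-base definitions and are an assumption about what the authors measured. The function name `correction_metrics` is hypothetical, and the sketch handles substitutions only (reads of equal length).

```python
def correction_metrics(truths, raws, correcteds):
    """Per-base correction metrics over aligned (truth, raw, corrected) reads.

    TP: erroneous base that was fixed.
    FP: correct base that was turned into an error.
    FN: erroneous base that is still wrong after correction.
    """
    tp = fp = fn = 0
    for truth, raw, corrected in zip(truths, raws, correcteds):
        for t, r, c in zip(truth, raw, corrected):
            if r != t and c == t:
                tp += 1
            elif r == t and c != t:
                fp += 1
            elif r != t and c != t:
                fn += 1
    errors_before = tp + fn
    errors_after = fn + fp
    return {
        "recall": tp / errors_before if errors_before else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        # "By how much the number of errors was divided" (slide footnote).
        "correction_ratio": errors_before / errors_after if errors_after else None,
    }

# Toy example: 4 errors in the raw read; 3 are fixed, 1 remains, 1 is introduced.
metrics = correction_metrics(
    truths=["ACGTACGTAC"],
    raws=["TGGTACCTTC"],
    correcteds=["ACGAACGTTC"],
)
```

Under these definitions a corrector can have high recall yet a modest correction ratio if it introduces many new errors, which is why the table reports both.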

  21. Results: paired-end merging. Data: mouse transcriptome, paired-end reads (2x150 nt), 4,422,720 reads simulated with FluxSimulator. Results: percentage of merged pairs: 97.3454%; non-trivial pair-mapping rate: 76%.

  22. Discussion: expected outcomes. After BCOOL-Trans correction: speedup; scaling; fewer errors in data; more signal to detect rare and significant; longer "merged reads", giving more context for assembly, variant calling... Other applications: metagenomic data, metatranscriptomic data.

  23. Future work: development. Graph construction: relative abundance tipping with an adaptive threshold (cf. Rcorrector); distance-related tipping (cf. rnaSPAdes [Bushmanova et al. 2018, bioRxiv]). Graph alignment: use partial read mapping; multiple starting anchors.

  24. Future work: experiments. Benchmark against the main state-of-the-art methods (Rcorrector, BayesHammer [Nikolenko et al. 2013], BFC [Li 2015], ...). Assess impacts on assembly, variant calling, quantification, and differential expression. Assess the impact (in particular of merged reads) on hybrid long-read correction.

  25. Conclusion. BCOOL-Trans is a work-in-progress RNA-seq corrector. It is scalable and uses new strategies to preserve variants.
