BCOOL-Trans: Accurate and variant-preserving correction for RNA-seq

SLIDE 1

BCOOL-Trans Accurate and variant-preserving correction for RNA-seq

Camille Marchet and Antoine Limasset

  • Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL

SeqBio 2018 Rouen

1 / 25

SLIDE 2

Introduction

Tools to study RNA-seq:

  • Most assembly/quantification methods and some variant calling methods are k-mer based
  • Correctors are mostly k-mer based
  • They rely on "solidity"

SLIDE 3

Introduction: RNA-seq correction challenges

SLIDE 4

Introduction: RNA-seq correction challenges

SLIDE 5

Motivations

SLIDE 6

Motivations

SLIDE 7

Motivations

SLIDE 8

Motivations

SLIDE 9

State of the art: k-mer spectrum

Main idea

  • Find abundant/trusted k-mers in the dataset
  • Replace untrusted k-mers in reads with trusted ones
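As a rough illustration of the k-mer spectrum idea above, the following minimal Python sketch counts k-mers, calls a k-mer trusted when its count reaches a threshold, and tries single-base substitutions to replace untrusted k-mers. The function names, the `min_count` threshold, and the greedy substitution strategy are assumptions made for illustration; this is not the implementation of any of the cited tools, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count):
    """Trusted ("solid") k-mers: those seen at least min_count times (assumed rule)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}


def kmers_covering(read, pos, k):
    """All k-mers of the read that contain position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(read) - k)
    return [read[i:i + k] for i in range(start, end + 1)]


def correct_read(read, solid, k):
    """Greedy sketch: substitute a base when that makes every covering k-mer solid."""
    for pos in range(len(read)):
        if all(km in solid for km in kmers_covering(read, pos, k)):
            continue  # position already supported by trusted k-mers
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if all(km in solid for km in kmers_covering(candidate, pos, k)):
                read = candidate
                break
    return read


# Toy usage: five error-free reads and one read with a single substitution.
reads = ["ACGTACCGTTAG"] * 5 + ["ACGTAACGTTAG"]
solid = solid_kmers(count_kmers(reads, 4), min_count=2)
print(correct_read("ACGTAACGTTAG", solid, 4))
```

A single abundance threshold is the simplest possible notion of solidity; under uneven coverage, rare but genuine k-mers fall below the threshold and are treated as errors.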

SLIDE 10

State of the art: RNA correction

Strategy included in assemblers / KISSPLICE [Sacomoto et al. 2012] / Rcorrector [Song et al. 2015]:

SLIDE 11

BCOOL [Limasset et al. 2018]: main concepts

SLIDE 12

BCOOL: improvements in read correction

  • Map reads on unitigs: better handling of errors that are close to each other (less than k apart)
  • Use large k: handles repeated regions

Results after correction of genomic reads:

*Correction ratio = factor by which the number of errors was divided
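The correction ratio in the footnote above can be made concrete with a small sketch. It assumes the usual base-level definitions, which the slides do not spell out: TP = erroneous bases correctly fixed, FN = erroneous bases left uncorrected, FP = bases wrongly modified. Under that assumption there are TP + FN errors before correction and FN + FP after. The function names are hypothetical.

```python
def correction_metrics(tp, fp, fn):
    """Recall, precision and correction ratio from base-level counts (assumed definitions)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    errors_before = tp + fn   # every error initially present
    errors_after = fn + fp    # errors left behind plus errors introduced
    return recall, precision, errors_before / errors_after


def ratio_from_recall_precision(recall, precision):
    """Same ratio, expressed per initial error, from recall and precision as fractions."""
    fn = 1.0 - recall
    fp = recall * (1.0 - precision) / precision
    return 1.0 / (fn + fp)


# Toy usage with made-up counts: 100 initial errors, 90 fixed, 2 new errors introduced.
print(correction_metrics(tp=90, fp=2, fn=10))
```

With these assumed definitions, a recall of 93.37% and a precision of 99.80% (the Rcorrector row of the proof-of-concept table later in the deck) give a ratio of roughly 14.7, close to the reported 14.68.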

SLIDE 13

BCOOL-Trans enhancements

1- Work with all k-mers

  • Graph construction scales to dozens of billions of k-mers
  • Keep rare k-mers
  • Easier to find overlaps

SLIDE 14

BCOOL-Trans enhancements

2- Work with large k-mers and remove only tips

SLIDE 15

BCOOL-Trans enhancements

2- Work with large k-mers and remove only tips
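To illustrate what removing only tips can mean, here is a minimal sketch on a de Bruijn graph stored as a plain set of k-mers. A tip is taken to be a short dead-end path that attaches to a node having another way out (or in). The length-only criterion, the `max_tip_len` parameter, and the function names are assumptions for illustration; the slides do not detail BCOOL-Trans's actual tipping rules, and reverse complements are ignored.

```python
def successors(kmer, kmers):
    """k-mers in the set that overlap kmer by k-1 bases on the right."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    """k-mers in the set that overlap kmer by k-1 bases on the left."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]


def walk_tip(start, kmers, toward, away, max_len):
    """Walk from dead end `start` toward the graph; return the tip path or None."""
    path = [start]
    node = start
    while len(path) <= max_len:
        nxt = toward(node, kmers)
        if len(nxt) != 1:
            return None        # isolated node or branching path: not a simple tip
        node = nxt[0]
        if len(away(node, kmers)) > 1:
            return path        # reached a junction: path is a tip of length <= max_len
        path.append(node)
    return None                # dead-end path too long to be called a tip


def remove_tips(kmers, max_tip_len):
    """Return a copy of the k-mer set without dead-end paths of at most max_tip_len k-mers."""
    kmers = set(kmers)
    for start in sorted(kmers):
        if start not in kmers:
            continue
        tip = None
        if not successors(start, kmers):
            tip = walk_tip(start, kmers, predecessors, successors, max_tip_len)
        elif not predecessors(start, kmers):
            tip = walk_tip(start, kmers, successors, predecessors, max_tip_len)
        if tip:
            kmers.difference_update(tip)
    return kmers


# Toy usage: k-mers of one sequence plus a two-k-mer erroneous branch (GTTC, TTCA).
seq = "AACCGGTTACGA"
true_kmers = {seq[i:i + 4] for i in range(len(seq) - 3)}
print(sorted(remove_tips(true_kmers | {"GTTC", "TTCA"}, max_tip_len=2)))
```

Because this sketch looks at length only, a genuine transcript end shorter than `max_tip_len` would be removed as well; abundance-aware or distance-aware criteria are one way to address that.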

SLIDE 16

BCOOL-Trans enhancements

3- Advanced tip removal

SLIDE 17

BCOOL-Trans enhancements

4- Mapping strategy

SLIDE 18

BCOOL-Trans enhancements

4- Mapping strategy

SLIDE 19

BCOOL-Trans enhancements

5- Paired-end reads merging

SLIDE 20

Correction quality proof of concept

Data

  • Mouse transcriptome
  • 100M reads simulated with Flux Simulator [Griebel et al. 2012]
  • 1% error rate

Mock BCOOL-Trans

  • "Cleaned" graph + BCOOL's mapping module
  • BCOOL-TransN means: the cleaned graph contains all true k-mers plus the erroneous k-mers occurring more than N times

Corrector      Recall   Precision   Correction ratio*   % Erroneous reads
BFC            58.76    96.16       2.34                30.18
Rcorrector     93.37    99.80       14.68               4.34
BCOOL-Trans5   92.75    97.75       10.64               5.58
BCOOL-Trans7   94.20    98.41       13.7                4.17

*Correction ratio = factor by which the number of errors was divided
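The mock BCOOL-TransN k-mer set described on this slide can be sketched as follows. The sketch assumes that "true" k-mers are those of the reference transcripts used for the simulation and that occurrences are counted in the simulated reads; both are assumptions, as are the function names. Reverse complements are ignored.

```python
from collections import Counter


def kmers_of(seq, k):
    """List the k-mers of a sequence (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mock_cleaned_kmers(reference_transcripts, reads, k, n):
    """All true k-mers, plus erroneous k-mers seen more than n times in the reads."""
    true_kmers = {km for t in reference_transcripts for km in kmers_of(t, k)}
    read_counts = Counter(km for r in reads for km in kmers_of(r, k))
    kept_errors = {
        km for km, c in read_counts.items()
        if km not in true_kmers and c > n
    }
    return true_kmers | kept_errors


# Toy usage: one transcript, three exact reads, and a few erroneous reads.
reference = ["AACCGGTTACGA"]
reads = reference * 3 + ["AACCGGTTCA"] * 2 + ["TACCGGTT"]
print(sorted(mock_cleaned_kmers(reference, reads, k=4, n=1)))
```

Raising N removes more erroneous k-mers from the mock graph, which mimics a cleaner graph-construction step.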

SLIDE 21

Results: paired-end merging

Data

  • Mouse transcriptome
  • Paired-end reads (2x150 nt)
  • 4,422,720 reads simulated with Flux Simulator

Results:

  • Percentage of merged pairs: 97.3454
  • Non-trivial pair-mapping rate: 76%

SLIDE 22

Discussion: expected outcomes

After BCOOL-Trans correction:

  • Speedup
  • Scaling
  • Fewer errors in data
  • More signal to detect rare and significant
  • Longer "merged reads": more context for assembly, variant calling...

Other applications

  • Metagenomic data
  • Metatranscriptomic data

SLIDE 23

Future work: development

Graph construction

  • Relative abundance tipping, adaptive threshold (cf. Rcorrector)
  • Distance-related tipping (cf. rnaSPAdes [Bushmanova et al. 2018, bioRxiv])

Graph alignment

  • Use partial read mapping
  • Multiple starting anchors

SLIDE 24

Future work: experiments

  • Benchmark against the main state-of-the-art methods (Rcorrector, BayesHammer [Nikolenko et al. 2013], BFC [Li 2015]...)
  • Assess impacts on assembly, variant calling, quantification, differential expression
  • Assess impact (in particular of merged reads) on hybrid long-read correction

SLIDE 25

Conclusion

BCOOL-Trans is a work-in-progress RNA-seq corrector. It is scalable and uses new strategies to better preserve variants.
