
CONSENT: Scalable self-correction of long reads with multiple sequence alignment
Pierre Morisse (1), Camille Marchet (2), Antoine Limasset (2), Arnaud Lefebvre (1), Thierry Lecroq (1)
(1) Normandie Univ, UNIROUEN, LITIS, Rouen 76000, France
(2) Lille Univ, CNRS, CRIStAL, Lille 59000, France
JOBIM 2019, Nantes, July 5th

Outline: Introduction, Workflow, Experiments, Conclusion

Introduction: Context
- 2011: inception of third-generation sequencing technologies
- Two main actors: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)
- Sequencing of much longer reads, tens of kbp on average
- Expected to solve various problems in the genome assembly field
- But also very noisy (10-30% error rates), most errors being indels

Introduction: Error correction
- Correction: an efficient way to handle these errors
- Two approaches:
  - Hybrid correction (makes use of complementary short reads)
  - Self-correction (corrects the long reads solely based on the information they contain)

Introduction: Self-correction
- Third-generation sequencing technologies evolve fast:
  - Error rates have greatly decreased, and now reach 10-12% on average
  - Read length is ever-growing, especially with ONT ultra-long reads (up to 1 Mbp)
- Error correction is still the first step of many analysis projects
- Self-correction is now much more developed

Introduction: Self-correction
State of the art:
1. Compute overlaps between the long reads (LRs)
2. Compute consensus from the overlaps

Introduction
Pseudo multiple sequence alignment (MSA):
- Build a directed acyclic graph (DAG) to represent the pseudo MSA and compute consensus
De Bruijn graph (DBG):
- Divide the alignments into small windows
- Correct the windows independently with DBGs
[Figure: example alignments of reads R1 to R3 illustrating both approaches]

Introduction: Contribution
- Major issue: no self-correction tool scales to ONT ultra-long reads
- We introduce CONSENT, a new self-correction method that:
  - Combines the two previous approaches (MSA + DBG)
  - Computes actual MSA
  - Compares well to the state of the art, and scales better
  - Is also able to polish contigs

Pre-treatment
- Overlap the long reads
- Currently with Minimap2 [Li, 2018]
- But not dependent on the aligner

First step: retrieve alignment piles
1. Select a long read to correct (A)
2. Retrieve overlapping long reads
3. Get the alignment pile
4. Trim the alignment pile
[Figure: read A with its pile of overlapping reads R1 to R6, before and after trimming]
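
As an illustration of pile retrieval, here is a minimal Python sketch, not CONSENT's actual code. It assumes the overlaps are available as PAF lines (the default output format of Minimap2) and keeps, for each overlapping read, only the interval it covers on A. The function name `alignment_pile` and the `PileEntry` tuple are hypothetical, and equating "aligned interval on A" with the trimmed pile of the slides is an assumption.

```python
from typing import Iterable, List, Tuple

# (name of the overlapping read, start on A, end on A); hypothetical layout
PileEntry = Tuple[str, int, int]


def alignment_pile(paf_lines: Iterable[str], read_name: str) -> List[PileEntry]:
    """Collect the reads overlapping `read_name`, as intervals on that read."""
    pile = []
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns
            continue
        qname, qstart, qend = fields[0], int(fields[2]), int(fields[3])
        tname, tstart, tend = fields[5], int(fields[7]), int(fields[8])
        if qname == tname:  # ignore self-overlaps
            continue
        if qname == read_name:
            pile.append((tname, qstart, qend))
        elif tname == read_name:
            pile.append((qname, tstart, tend))
    # Sort by start, then end, on A
    return sorted(pile, key=lambda entry: (entry[1], entry[2]))


paf = [
    "A\t1000\t0\t600\t+\tR1\t800\t200\t800\t550\t600\t60",
    "R2\t900\t100\t900\t-\tA\t1000\t150\t950\t700\t800\t60",
    "R3\t500\t0\t500\t+\tR4\t700\t0\t500\t450\t500\t60",
]
print(alignment_pile(paf, "A"))
```

The third record involves neither side of A and is skipped; the other two yield one pile entry each, whichever side of the overlap A appears on.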

Second step: divide piles into windows
For correction, we will only consider windows that:
- Have a fixed length
- Are supported by at least c reads
Example: on the previous example, with c = 4
[Figure: pile of read A with reads R1 to R6, showing the windows that are kept]
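
The window selection above can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than CONSENT's implementation: the pile is given as half-open (start, end) intervals on A, windows tile A without overlapping, and a read supports a window only if it spans it entirely. The name `supported_windows` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on read A


def supported_windows(read_len: int, pile: List[Interval],
                      window_len: int, c: int) -> List[Interval]:
    """Return the fixed-length windows of A spanned by at least c pile reads."""
    windows = []
    for start in range(0, read_len - window_len + 1, window_len):
        end = start + window_len
        # Assumption: a read supports a window only if it covers it fully
        support = sum(1 for s, e in pile if s <= start and e >= end)
        if support >= c:
            windows.append((start, end))
    return windows


# Six overlapping reads on a read A of length 1000 (made-up coordinates)
pile = [(0, 600), (0, 450), (100, 900), (150, 1000), (300, 1000), (500, 1000)]
print(supported_windows(1000, pile, window_len=100, c=4))
```

With c = 4, the two windows at the start of A and the last one are dropped because fewer than four reads span them.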

Third step: compute consensus of a window
2. Compute consensus
- Compute MSA of the sequences
- Compute consensus from the MSA
- Unlike other methods, actual MSA is computed ⇒ POA [Lee et al., 2002]
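
To make "consensus from the MSA" concrete, here is a minimal Python sketch that takes an already computed MSA as gapped rows and returns a column-wise majority vote. It is not POA, which derives the consensus from the alignment graph; majority voting is a simpler stand-in chosen for illustration, and the name `msa_consensus` is hypothetical.

```python
from collections import Counter
from typing import List


def msa_consensus(rows: List[str], gap: str = "-") -> str:
    """Column-wise majority vote over equal-length gapped rows."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all MSA rows must have the same length")
    consensus = []
    for column in zip(*rows):
        # On ties, Counter keeps the symbol seen first in the column
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:  # a majority of gaps means no base is emitted
            consensus.append(symbol)
    return "".join(consensus)


rows = [
    "ACCA-GGT",
    "AC-AAGGT",
    "ACCAAGGT",
    "ACCAAG-T",
]
print(msa_consensus(rows))
```

Each row carries a single error at a different column, so the vote recovers the same base everywhere the rows disagree.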

Third step: compute consensus of a window
POA (Partial Order Alignment)
- Multiple sequence alignment strategy based on partial order graphs
- Two interests:
  1. Computes actual multiple sequence alignment
  2. Directly builds the DAG representing the multiple sequence alignment

Third step: compute consensus of a window
Segmentation strategy
- In practice, we use windows of a few hundred bases
- POA is time-consuming, even on such windows
- We developed a segmentation strategy
- Compute MSA and consensus for smaller sequences ⇒ faster

Third step: compute consensus of a window
Segmentation strategy
1. Compute shared anchors between the window's sequences
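
One possible way to compute shared anchors is sketched below in Python. The slide does not define anchors precisely, so this sketch assumes an anchor is a k-mer that occurs exactly once in a sequence and is found in at least `min_seqs` sequences of the window. The function names and the `-1` convention for "absent or repeated" are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def unique_kmer_positions(seq: str, k: int) -> Dict[str, int]:
    """Map each k-mer occurring exactly once in seq to its position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def shared_anchors(seqs: List[str], k: int, min_seqs: int) -> Dict[str, List[int]]:
    """Return anchors with their position in each sequence (-1 if unusable)."""
    per_seq = [unique_kmer_positions(s, k) for s in seqs]
    support = Counter(kmer for positions in per_seq for kmer in positions)
    return {
        kmer: [positions.get(kmer, -1) for positions in per_seq]
        for kmer, n in support.items()
        if n >= min_seqs
    }


seqs = ["ACGTTGCA", "ACGTAGCA", "ACGTTGCA"]
print(shared_anchors(seqs, k=4, min_seqs=3))
```

Only `ACGT` is found once in all three sequences here; lowering `min_seqs` to 2 also keeps the k-mers shared by the first and third sequences.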

Third step: compute consensus of a window
Segmentation strategy
2. Search for the longest chain of anchors such that, for every pair of consecutive anchors A_i, A_{i+1}:
   1. A_i is followed by A_{i+1} in at least N sequences
   2. A_{i+1} is never followed by A_i
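
The chain search can be sketched as a small dynamic program in Python. The assumptions are loud ones: anchors are given with one position per sequence (-1 when absent), "followed by" means "occurs at a smaller position in the same sequence", and candidate anchors are ordered by mean position so that the search always terminates. None of this is taken from CONSENT's source, and the names are hypothetical.

```python
from typing import Dict, List


def follows(a: List[int], b: List[int]) -> int:
    """Number of sequences in which anchor a occurs before anchor b."""
    return sum(1 for pa, pb in zip(a, b) if 0 <= pa < pb)


def longest_anchor_chain(positions: Dict[str, List[int]], n: int) -> List[str]:
    """Longest chain where each anchor precedes the next in >= n sequences
    and the next anchor never precedes it."""
    def mean_pos(anchor: str) -> float:
        present = [p for p in positions[anchor] if p >= 0]
        return sum(present) / len(present)

    # Assumption: ordering by mean position gives a fixed left-to-right scan
    anchors = sorted(
        (a for a in positions if any(p >= 0 for p in positions[a])),
        key=mean_pos,
    )
    if not anchors:
        return []
    best = [1] * len(anchors)   # best[j]: longest chain ending at anchors[j]
    prev = [-1] * len(anchors)  # back-pointers to rebuild the chain
    for j in range(len(anchors)):
        for i in range(j):
            a, b = positions[anchors[i]], positions[anchors[j]]
            if follows(a, b) >= n and follows(b, a) == 0 and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    j = max(range(len(anchors)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(anchors[j])
        j = prev[j]
    return chain[::-1]


positions = {
    "A1": [0, 0, 0],
    "A2": [10, 12, 11],
    "A3": [20, 5, 21],   # precedes A2 in the second sequence
    "A4": [30, 31, -1],  # absent from the third sequence
}
print(longest_anchor_chain(positions, n=2))
```

In this example A3 cannot directly follow A2, because A3 precedes A2 in one sequence, so the chain goes through A1, A2 and A4.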

Third step: compute consensus of a window
Segmentation strategy
3. Compute MSA / consensus for sequences bordered by anchors
[Figure: one consensus (cons.) computed for each segment between consecutive anchors]

Fourth step: polish the window's consensus
Approach
- Consensus ⇒ solid k-mers in uppercase, weak k-mers in lowercase
  GATCGGGTcatTGCCCGTGTTTATGCGTgtg
- Build a DBG from the window's sequences
- Correct lowercase regions
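
The uppercase/lowercase marking can be illustrated with a short Python sketch. It assumes a k-mer is solid when it occurs at least `min_count` times in the window's sequences, and that a consensus base is weak when no solid k-mer covers it; both rules, like the name `mark_weak_regions`, are assumptions. The DBG-based correction of the lowercase regions is not shown.

```python
from collections import Counter
from typing import List


def mark_weak_regions(consensus: str, seqs: List[str], k: int, min_count: int) -> str:
    """Uppercase bases covered by a solid k-mer, lowercase the others."""
    # Count every k-mer of the window's sequences (assumed uppercase)
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    cons = consensus.upper()
    solid = [False] * len(cons)
    for i in range(len(cons) - k + 1):
        if counts[cons[i:i + k]] >= min_count:
            for j in range(i, i + k):
                solid[j] = True
    return "".join(b if ok else b.lower() for b, ok in zip(cons, solid))


window_seqs = ["GATCGGGTATTGCC"] * 3
print(mark_weak_regions("GATCGGGTCATTGCC", window_seqs, k=4, min_count=2))
```

Here the consensus carries one extra base that none of the window's sequences support, so only that base ends up in lowercase.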

Fifth step: anchor the consensus to the read
- By alignment
- Local alignment, around the positions of the window
- Repeat with other windows
