Long-read error correction: a survey and qualitative comparison
Pierre Morisse (1), Arnaud Lefebvre (2), Thierry Lecroq (2)
(1) Normandie Université, UNIROUEN, INSA Rouen, LITIS, 76000 Rouen, France.
(2) Normandie Université, UNIROUEN, LITIS, Rouen 76000, France.

Outline: Introduction, Survey, Experiments, Conclusion

Context
- 2011: inception of third-generation sequencing technologies
- Two main actors: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)
- Sequencing of much longer reads, tens of kbp on average
- Expected to solve various problems in the genome assembly field
- Very noisy (10-30% error rates), most errors being indels

Morisse et al. Long-read correction survey

Error correction
Correction: an efficient way to handle these errors
Two approaches:
1. Hybrid correction (makes use of complementary short reads)
2. Self-correction (relies only on long reads)

Hybrid correction
- Long reads + short reads, sequenced for the same individual
- Use the short reads to correct the long reads
Three main approaches:
1. Short reads alignment
2. Contigs alignment
3. De Bruijn graphs (DBG)

1) Short reads alignment: overview [figure not recovered]

2) Contigs alignment: overview [figure not recovered]

3) De Bruijn graphs: overview [figure with source (src) and destination (dst) labels; remaining layout not recoverable]
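The src and dst labels on the De Bruijn graph slide suggest a path search between two anchor nodes of the graph. The Python sketch below illustrates that general idea under stated assumptions: the graph is built from error-free short reads, the anchors are two (k-1)-mers assumed to be trusted, and the path between them spells a replacement for the noisy long-read region. The function names, reads, and value of k are hypothetical and are not taken from any of the surveyed tools.

```python
from collections import defaultdict, deque


def build_dbg(short_reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in short_reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_path(graph, src, dst, max_len):
    """Breadth-first search from src to dst.

    Returns the sequence spelled by the first path found, or None when no
    path exists within max_len bases.
    """
    queue = deque([(src, src)])
    seen = {src}
    while queue:
        node, seq = queue.popleft()
        if node == dst:
            return seq
        if len(seq) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + nxt[-1]))
    return None


# Hypothetical short reads covering the region ACGTTGCATGGA.
short_reads = ["ACGTTGCA", "TTGCATGGA"]
graph = build_dbg(short_reads, k=4)

# Anchors assumed to flank an erroneous region of a long read.
bridge = find_path(graph, src="ACG", dst="GGA", max_len=20)
no_bridge = find_path(graph, src="ACG", dst="AAA", max_len=20)
print(bridge, no_bridge)
```

Breadth-first search returns a shortest path, which is only one possible selection rule; a real corrector would also need to cope with branching, sequencing errors in the short reads, and regions with no path at all.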

Hybrid correction: 17 available methods

Method      Approach              Release
PBcR        SR alignment          2012
LSC         SR alignment          2012
ECTools     Contigs alignment     2014
LoRDEC      DBG                   2014
Proovread   SR alignment          2014
Nanocorr    SR alignment          2015
NaS         SR alignment          2015
CoLoRMap    SR alignment          2016
Jabba       DBG                   2016
LSCplus     SR alignment          2016
HALC        Contigs alignment     2017
HECIL       SR alignment          2017
Hercules    Hidden Markov models  2017
FMLRC       DBG                   2018
HG-CoLoR    SR alignment + DBG    2018
MiRCA       Contigs alignment     2018
ParLECH     DBG                   2019

Self-correction
Only uses the information contained in the long reads
State of the art:
1. Overlap the long reads
2. Compute a consensus from the overlaps
Two approaches:
1. Pseudo multiple sequence alignment (MSA)
2. De Bruijn graphs

1) Pseudo MSA: overview [figure: reads R1, R2 and R3 with numeric weights between 1 and 3; remaining layout not recoverable]
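To make the consensus step of the pseudo MSA approach concrete, here is a deliberately simplified Python sketch. It assumes the overlapping reads have already been aligned and padded with a gap character, then takes a majority vote per column. The three rows are hypothetical and are not a transcription of the slide's figure; actual self-correction tools use more elaborate alignment structures and weighting.

```python
from collections import Counter


def majority_consensus(aligned_rows, gap="."):
    """Column-wise majority vote over gap-padded, equal-length rows.

    Columns where the gap symbol wins are dropped from the consensus.
    Ties are resolved in favour of the symbol seen first in row order.
    """
    length = len(aligned_rows[0])
    if any(len(row) != length for row in aligned_rows):
        raise ValueError("all rows must have the same aligned length")
    consensus = []
    for column in zip(*aligned_rows):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical pre-aligned rows: each row carries a different indel error.
rows = [
    "ACCAAGG.T",
    "AC.AAGGGT",
    "ACCAA.G.T",
]
result = majority_consensus(rows)
print(result)
```

Each row contains an error that the other two rows outvote, which is why sequencing depth matters so much for self-correction.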

2) De Bruijn graphs: overview [figure: read sequences labelled R1, R2 and R3, with gaps shown as dots; remaining layout not recoverable]

Self-correction: 12 available methods

Method       Approach          Release
PBcR-BLASR   Pseudo MSA        2013
PBDAGCon     Pseudo MSA        2013
Sprai        Pseudo MSA        2014
PBcR-MHAP    Pseudo MSA        2015
FalconSense  Pseudo MSA        2016
Sparc        Pseudo MSA        2016
Canu         Pseudo MSA        2017
Daccord      DBG               2017
LoRMA        DBG               2017
MECAT        Pseudo MSA        2017
FLAS         Pseudo MSA        2018
CONSENT      Pseudo MSA + DBG  2019

Problem
- Today: 29 tools are available
- Each of them claims to be the best...
- ... But what is the truth?

A truth
Dataset characteristics have a huge impact on correction:
- Read length
- Error rate
- Sequencing depth
- Organism complexity

Datasets
We gathered a total of 20 datasets with varying:
- Complexity (from bacteria to human)
- Sequencing technologies (PacBio and ONT)
- Error rates (12 to 44%)
- Sequencing depths (20x to 100x)
- Read lengths (a few kbp to a few hundred kbp)

Minimalist benchmark
To lighten the presentation, we only study:

Dataset             Number of reads  Error rate (%)  Coverage  Number of bases
Simulated PacBio data
S. cerevisiae 30x   45,198           12.28           30x       371 Mbp
C. elegans 30x      366,416          12.28           30x       3,006 Mbp
S. cerevisiae 60x   90,397           12.28           60x       742 Mbp
C. elegans 60x      732,832          12.28           60x       6,011 Mbp
Real ONT data
A. baylyi           89,011           29.91           106x      381 Mbp
S. cerevisiae real  205,923          44.51           95x       1,173 Mbp

Tools compared:
Hybrid correction: CoLoRMap, LoRDEC, HG-CoLoR
Self-correction: MECAT, Daccord, CONSENT
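The benchmark table does not list read lengths, but an approximate mean read length can be derived from its "Number of reads" and "Number of bases" columns. The short Python sketch below does that arithmetic. The figures are copied from the table, and because base counts are rounded to the nearest Mbp, the results are estimates rather than values reported by the authors.

```python
# (number of reads, number of bases in Mbp), copied from the benchmark table.
datasets = {
    "S. cerevisiae 30x": (45_198, 371),
    "C. elegans 30x": (366_416, 3_006),
    "S. cerevisiae 60x": (90_397, 742),
    "C. elegans 60x": (732_832, 6_011),
    "A. baylyi": (89_011, 381),
    "S. cerevisiae real": (205_923, 1_173),
}


def mean_read_length(num_reads, bases_mbp):
    """Approximate mean read length in bp: total bases divided by read count."""
    return bases_mbp * 1_000_000 / num_reads


mean_lengths = {
    name: round(mean_read_length(reads, mbp))
    for name, (reads, mbp) in datasets.items()
}

for name, length in mean_lengths.items():
    print(f"{name}: ~{length} bp")
```

Under this estimate the simulated PacBio reads average about 8.2 kbp, while the two real ONT datasets average roughly 4.3 kbp and 5.7 kbp.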

Scenarios
1. Low error rate, low coverage (30x S. cerevisiae, C. elegans)
2. Low error rate, medium coverage (60x S. cerevisiae, C. elegans)
3. High error rate, high coverage (real A. baylyi, S. cerevisiae)

Aim
For each scenario, identify:
- Is hybrid correction or self-correction better suited?
- Which method performs best?
