Nanopore Sequencing Technology and Tools for Genome Assembly: - PowerPoint PPT Presentation

Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions
Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan and Onur Mutlu
Contact: dsenol@andrew.cmu.edu
February 16, 2019

Nanopore Sequencing & Tools
Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan, and Onur Mutlu. "Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions." Briefings in Bioinformatics (2018).

Executive Summary
- Motivation: Nanopore sequencing is an emerging and promising technology because it generates long reads and is portable.
- Problem:
  - The technology has high error rates.
  - Tools are critically important to 1) overcome the high error rates of the technology, and 2) enable fast, real-time data analysis.
- Goal: Analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data.
- Key Contributions:
  - Analysis of the tools in multiple dimensions: accuracy, performance, memory usage and scalability.
  - New bottlenecks and tradeoffs that different combinations of tools lead to.
  - Guidelines for both practitioners and tool developers.

Outline
- Background and Motivation
  - Nanopore Sequencing Technology
  - Comparison with Prior Technologies
  - Nanopore Genome Assembly Pipeline
  - Our Goal
- Experimental Methodology
- Results and Analysis
- Conclusion

Nanopore Sequencing Technology
- Nanopore sequencing is an emerging and promising single-molecule DNA sequencing technology.
- The first nanopore sequencing device, MinION, was made commercially available by Oxford Nanopore Technologies (ONT) in May 2014.
  - Inexpensive
  - Long read length (> 882 Kbp)
  - Produces data in real time
  - Pocket-sized and portable

Nanopore Sequencing
- A nanopore is a nano-scale hole.
- In nanopore sequencers, an ionic current passes through the nanopores.
- When the DNA strand passes through the nanopore, the sequencer measures the change in current.
- This change is used to identify the bases in the strand, because the different bases have different electrochemical structures.

Why Nanopore Sequencing?

(Prior) High-Throughput Sequencing Technologies
- Require an amplification step before the sequencing process,
- Require labeling of the DNA or nucleotide for detection during sequencing,
- Generate billions of short but accurate reads,
- Provide high throughput, high speed and low cost,
- Suffer from massive amounts of data and short reads, which pose challenges due to the repetitive sequences in the genome.

Nanopore Sequencing Technology
- Does not require an amplification step before the sequencing process,
- Does not require any labeling of the DNA or nucleotide for detection during sequencing,
- Allows sequencing of very long reads, and
- Provides portability, low cost and high throughput.
- One major drawback: high error rates (~10-15%)

Nanopore Genome Assembly Pipeline
Raw signal data
  -> Basecalling
DNA reads
  -> Read-to-Read Overlap Finding
Overlaps
  -> Assembly
Draft assembly
  -> Read Mapping (Optional)
Mappings of reads against draft assembly
  -> Polishing (Optional)
Improved assembly

Our Goal
- Comprehensively analyze the multiple steps and the associated state-of-the-art tools in genome assembly pipelines using nanopore sequence data in terms of accuracy, performance, memory usage, and scalability.
- Reveal bottlenecks and trade-offs that different combinations of tools lead to.
- Provide guidelines for both practitioners, such that they can determine the appropriate tools and tool combinations that can satisfy their goals, and tool developers, such that they can make design choices to improve current and future tools.

Outline
- Background and Motivation
- Experimental Methodology
- Results and Analysis
- Conclusion

Experimental Methodology

Experimental Methodology (cont.)

Accuracy Metrics
- Average Identity
  - Percentage similarity between the assembly and the reference genome
  - Higher (≃100%) is preferred
- Coverage
  - Ratio of the number of aligned bases in the reference genome to the length of the reference genome
  - Higher (≃100%) is preferred
- Number of mismatches
  - Total number of single-base differences between the assembly and the reference genome
  - Lower (≃0) is preferred
- Number of indels
  - Total number of insertions and deletions between the assembly and the reference genome
  - Lower (≃0) is preferred

Performance Metrics
- Wall clock time
- Peak memory usage
- Parallel speedup
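As a rough illustration of how the metrics on this slide could be computed, the Python sketch below derives them from a single gapped pairwise alignment. It is not the evaluation tooling used in the study. The function names, the `-` gap symbol, the choice to count indels per gapped column, and the identity denominator (all non-empty alignment columns) are assumptions made for this example. Parallel speedup is taken here as the usual ratio of single-threaded to multi-threaded wall clock time, which the slide does not spell out.

```python
from dataclasses import dataclass


@dataclass
class AccuracyMetrics:
    identity_pct: float   # percentage similarity between assembly and reference
    coverage_pct: float   # aligned reference bases / reference length
    mismatches: int       # single-base differences
    indels: int           # inserted or deleted bases (counted per gapped column)


def accuracy_metrics(ref_aln: str, asm_aln: str, ref_length: int,
                     gap: str = "-") -> AccuracyMetrics:
    """Compute the four accuracy metrics from one gapped alignment.

    ref_aln and asm_aln are the two equal-length rows of the alignment
    (hypothetical input format); ref_length is the full reference length.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("alignment rows must have equal length")
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")

    matches = mismatches = indels = 0
    for ref_base, asm_base in zip(ref_aln, asm_aln):
        if ref_base == gap and asm_base == gap:
            continue                      # empty column, carries no information
        if ref_base == gap or asm_base == gap:
            indels += 1                   # insertion or deletion
        elif ref_base == asm_base:
            matches += 1
        else:
            mismatches += 1

    columns = matches + mismatches + indels
    aligned_ref_bases = matches + mismatches   # reference bases paired with a base
    identity = 100.0 * matches / columns if columns else 0.0
    coverage = 100.0 * aligned_ref_bases / ref_length
    return AccuracyMetrics(identity, coverage, mismatches, indels)


def parallel_speedup(single_thread_seconds: float,
                     multi_thread_seconds: float) -> float:
    """Speedup of a multi-threaded run over the single-threaded run."""
    return single_thread_seconds / multi_thread_seconds


if __name__ == "__main__":
    # Toy alignment: one mismatch, one insertion, one deletion.
    print(accuracy_metrics("ACGT-ACGTA", "ACCTTAC-TA", ref_length=10))
    print(parallel_speedup(120.0, 40.0))
```

Real evaluations typically work from whole-genome alignments, where alignment tools may define identity and coverage slightly differently from this toy version.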

Outline
- Background and Motivation
- Experimental Methodology
- Results and Analysis
  - Basecalling Tools
    - Accuracy
    - Performance
  - Read-to-Read Overlap Finding Tools
  - Assembly Tools
  - Read Mapping and Polishing Tools (optional)
- Conclusion

Nanopore Genome Assembly Pipeline
Raw signal data
  -> Basecalling (Tools: Metrichor, Nanonet, Scrappie, Nanocall, DeepNano)
DNA reads
  -> Read-to-Read Overlap Finding (Tools: GraphMap, Minimap)
Overlaps
  -> Assembly (Tools: Canu, Miniasm)
Draft assembly
  -> Read Mapping (Tools: BWA-MEM, Minimap, (GraphMap))
Mappings of reads against draft assembly
  -> Polishing (Tools: Nanopolish, Racon)
Improved assembly

Basecalling Tools
- Metrichor
  - ONT's cloud-based basecaller
  - Uses recurrent neural networks (RNN) for basecalling
- Nanonet
  - ONT's offline and open-source alternative to Metrichor
  - Uses RNN for basecalling
- Scrappie
  - ONT's newest basecaller, which explicitly addresses basecalling errors in homopolymer regions
- Nanocall [David+, Bioinformatics 2016]
  - Uses Hidden Markov Models (HMM) for basecalling
- DeepNano [Boža+, PLoS ONE 2017]
  - Uses RNN for basecalling

Nanopore Genome Assembly Pipeline
Raw signal data
  -> Basecalling (Tools: Metrichor, Nanonet, Scrappie, Nanocall, DeepNano)
DNA reads
  -> Read-to-Read Overlap Finding (Tools: GraphMap, Minimap)
Overlaps
  -> Assembly (Tools: Canu, Miniasm)
Draft assembly
  -> Read Mapping (Tools: BWA-MEM, Minimap, (GraphMap))
Mappings of reads against draft assembly
  -> Polishing (Tools: Nanopolish, Racon)
Improved assembly

Evaluated pipelines
- Pipeline A: [Basecalling tool] + Canu
- Pipeline B: [Basecalling tool] + GraphMap + Miniasm
- Pipeline C: [Basecalling tool] + Minimap + Miniasm
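Pipelines A, B and C on this slide each take any of the five basecallers as a first stage, which gives 15 basecaller-and-pipeline scenarios. The Python sketch below simply enumerates those combinations. The data layout and the `scenarios` function name are assumptions made for illustration; no tool is actually invoked.

```python
from itertools import product

# Basecalling tools listed on the slide.
BASECALLERS = ["Metrichor", "Nanonet", "Scrappie", "Nanocall", "DeepNano"]

# Pipeline label -> (overlap finder, assembler). Pipeline A lists only Canu
# after the basecaller on the slide, so its overlap finder is left as None.
PIPELINES = {
    "A": (None, "Canu"),
    "B": ("GraphMap", "Miniasm"),
    "C": ("Minimap", "Miniasm"),
}


def scenarios():
    """Return (pipeline label, 'tool + tool + ...') for every combination."""
    result = []
    for basecaller, (label, (overlapper, assembler)) in product(
            BASECALLERS, PIPELINES.items()):
        steps = [basecaller]
        if overlapper is not None:
            steps.append(overlapper)
        steps.append(assembler)
        result.append((label, " + ".join(steps)))
    return result


if __name__ == "__main__":
    for label, description in scenarios():
        print(f"Pipeline {label}: {description}")
```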

Basecalling: Accuracy
Accuracy Analysis Results for Basecalling Tools
[Figure: identity (%), coverage (%), number of mismatches (Kbp) and number of indels (Kbp) for Metrichor, Scrappie, Nanocall, DeepNano and Nanonet, each under pipelines A, B and C.]
Observation 1-a: Metrichor, Nanonet and Scrappie have similar identity and coverage trends among all of the evaluated scenarios.

Basecalling: Accuracy
Accuracy Analysis Results for Basecalling Tools
[Figure: identity (%), coverage (%), number of mismatches (Kbp) and number of indels (Kbp) for Metrichor, Scrappie, Nanocall, DeepNano and Nanonet, each under pipelines A, B and C.]
Observation 1-b: However, Nanocall and DeepNano cannot reach these three basecallers' accuracies: they have lower identity and lower coverage.
