
Accelerating Sequencing with GPU Computing and Deep Learning Johnny Israeli

COMPUTE TRENDS
[Figure: log-scale performance chart, 10^2 to 10^7. Trend labels: GPU-Computing perf, 1.5X per year; Single-threaded perf, 1.5X per year and 1.1X per year. Stack labels: Applications, Algorithms, Systems, CUDA, Architecture.]

COMPUTE TRENDS: Publications

Sequencing Trends

SEQUENCING TRENDS: Sequencing Data Growing in Volume and Complexity
● Rise of Single Cell Data
● Decreasing Cost
● Increasing Read Length

SEQUENCING TRENDS

SEQUENCING TRENDS: Worldwide Annual Sequencing Capacity
[Figure: log-scale capacity axis from 10^12 to 10^21, plotted over the years 2000 to 2025.]

Sequencing Data Types

SEQUENCING TRENDS: Genomics *ENA Database

SEQUENCING TRENDS: Transcriptomics *ENA Database

SEQUENCING TRENDS: Epigenomics *ENA Database

SEQUENCING TRENDS: Nanopore Long Read Sequencing *ENA Database

Variant Calling

Variant Calling
Workflow: Sequence DNA, generate Illumina reads, map to reference
Reference: TGGATTTGAAAAC G GAGCAAATGACTG
Reads:
TGGATTTGAAAAC G GAGCAAATGACTG
TGGATTTGAAAAC G GAGCAAATGACTG
TGGATTTGAAAAC A GAGCAAATGACTG
TGGATTTGAAAAC A GAGCAAATGACTG
TGGATTTGAAAAC A GAGCAAATGACTG
TGGATTTGAAAAC G GAGCAAATGACTG
Likely heterozygous variant
● Identify sites with potential mismatch
● True variants or instrument errors?
● SNPs or insertions or deletions?
● Heterozygous or homozygous variants?
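The questions on this slide can be illustrated with a minimal sketch that counts bases per position across the mapped reads and labels a site by its alternate-allele fraction. The function names and the 0.2 to 0.8 heterozygous range are assumptions made for illustration; they are not taken from the talk or from any variant caller, and real callers also model base quality, mapping quality, and indels.

```python
from collections import Counter

# Reference and reads from the slide, with the spacing around the variable base removed.
REFERENCE = "TGGATTTGAAAACGGAGCAAATGACTG"
READS = [
    "TGGATTTGAAAACGGAGCAAATGACTG",
    "TGGATTTGAAAACGGAGCAAATGACTG",
    "TGGATTTGAAAACAGAGCAAATGACTG",
    "TGGATTTGAAAACAGAGCAAATGACTG",
    "TGGATTTGAAAACAGAGCAAATGACTG",
    "TGGATTTGAAAACGGAGCAAATGACTG",
]


def pileup_counts(reads, position):
    """Count the bases that the reads show at one reference position."""
    return Counter(read[position] for read in reads)


def candidate_sites(reference, reads):
    """Return positions where at least one read disagrees with the reference."""
    return [
        pos
        for pos, ref_base in enumerate(reference)
        if any(base != ref_base for base in pileup_counts(reads, pos))
    ]


def classify_site(reference, reads, position, het_range=(0.2, 0.8)):
    """Label a site by alternate-allele fraction (hypothetical thresholds)."""
    counts = pileup_counts(reads, position)
    depth = sum(counts.values())
    alt = {base: n for base, n in counts.items() if base != reference[position]}
    if not alt:
        return "reference"
    fraction = max(alt.values()) / depth
    if fraction < het_range[0]:
        return "likely error"
    if fraction <= het_range[1]:
        return "heterozygous"
    return "homozygous alt"


if __name__ == "__main__":
    for site in candidate_sites(REFERENCE, READS):
        print(site, dict(pileup_counts(READS, site)), classify_site(REFERENCE, READS, site))
```

The sketch assumes the reads are already aligned to the reference without gaps, which is why a plain index lookup stands in for a real pileup.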

Example Pileup Input Data
[Figure: pileup image; axes: Read Index, Position; annotation: Heterozygous SNP.]

GATK Variant Calling Pipeline
Variant calling pipeline: Align to Reference; Sort, Mark Duplicates, Calibrate; Call Variants; Joint Call; Filter Variants

Accelerated GATK Variant Calling Pipeline
Variant calling pipeline: Align to Reference; Sort, Mark Duplicates, Calibrate; Call Variants; Joint Call; Filter Variants
Parabricks: Alignment; Preprocessing; Variant Calling; Joint Genotyping; Variant Processing

Accelerated Variant Calling Pipelines
Whole Genome Processing in Minutes
Tools: Alignment + Preprocessing, Haplotype Caller, Mutect2, GenotypeGVCF, DeepVariant
Parabricks pipelines: Germline, Copy Number, Somatic
Stages: Alignment; Preprocessing; Variant Calling; Joint Genotyping; Variant Processing

Deep Averaging Network (DAN)

DAN Development
● PyTorch-based 1D model
● Learned embeddings of bases
● Encoding variant proposals
● Downsample easy variant candidates during training
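As a rough illustration of the deep averaging idea, the sketch below embeds each base, averages the embeddings, and passes the average through one hidden layer and a softmax. It is not the model from the talk: the slide describes a PyTorch-based model, while this version uses only the standard library, random untrained weights, and a hypothetical three-class label set (`hom-ref`, `het`, `hom-alt`). All names and dimensions are assumptions.

```python
import math
import random

BASES = "ACGT"
CLASSES = ("hom-ref", "het", "hom-alt")  # hypothetical label set, not from the slides


def init_params(embed_dim=4, hidden_dim=8, seed=0):
    """Create random, untrained parameters; a real model would learn these."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": dict(zip(BASES, matrix(len(BASES), embed_dim))),
        "hidden": matrix(hidden_dim, embed_dim),
        "out": matrix(len(CLASSES), hidden_dim),
    }


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dan_forward(bases, params):
    """Embed each base, average the embeddings, then apply ReLU layer and softmax."""
    vectors = [params["embed"][b] for b in bases]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    hidden = [max(0.0, h) for h in matvec(params["hidden"], mean)]
    return softmax(matvec(params["out"], hidden))


if __name__ == "__main__":
    params = init_params()
    probabilities = dan_forward("GGGAAA", params)
    print(dict(zip(CLASSES, probabilities)))
```

Because the embeddings are averaged, the output does not depend on the order of the input bases, which is the defining property of this kind of network.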

Variant Calling Errors

Variant Calling Error Breakdown

ATAC Sequencing

DNA: Open and Closed
● Closed DNA: inactive
● Open DNA: active
● Open DNA changes affect development & disease

ATAC Sequencing: Mapping Open DNA Sites
Workflow: Sequence Open DNA; Map & Count Reads
[Figure: read-count track with two regions labeled "Open DNA site".]
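The map-and-count step can be sketched as building a per-base coverage track from mapped read intervals and reporting runs of positions whose depth reaches a threshold. This is a simplified illustration, not the method used in the talk: the half-open `(start, end)` interval convention, the function names, and the fixed `min_depth` threshold are all assumptions, and real peak callers use statistical models rather than a hard cutoff.

```python
def coverage_track(read_intervals, genome_length):
    """Per-base read depth from half-open (start, end) intervals.

    Assumes 0 <= start < end <= genome_length for every interval.
    """
    diff = [0] * (genome_length + 1)
    for start, end in read_intervals:
        diff[start] += 1   # depth rises where a read begins
        diff[end] -= 1     # and falls just past where it ends
    coverage, depth = [], 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def call_open_sites(coverage, min_depth):
    """Return half-open (start, end) runs where depth >= min_depth."""
    sites, start = [], None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            sites.append((start, pos))
            start = None
    if start is not None:
        sites.append((start, len(coverage)))
    return sites


if __name__ == "__main__":
    reads = [(2, 6), (3, 7), (4, 8), (12, 15), (13, 16), (12, 16)]
    track = coverage_track(reads, genome_length=20)
    print(track)
    print(call_open_sites(track, min_depth=2))
```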

ATAC-seq Limits
ATAC-seq signal degrades due to:
• Less sequencing
• Low quality sample preparation
• Small cell populations

AtacWorks SDK: AI-Denoised ATAC-seq Data Processing
Workflow: Sequence Open DNA; Map, Align, Count; Low Quality Sequencing; Denoised with AtacWorks AI
[Figure: signal tracks labeled High Quality Sequencing and Low Quality Sequencing.]

AtacWorks Model: Denoising + Open Chromatin Identification
● Input: noisy ATAC-seq data
● Network: Resblock 1 through Resblock 7; each resblock stacks Conv and ReLU layers with a residual connection (⊕)
● Output: predicted coverage (evaluation: MSE, Pearson correlation)
● Output: predicted open chromatin (evaluation: AUPRC)
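The two metrics named for the predicted coverage track, MSE and Pearson correlation, can be written out directly. The sketch below is a plain standard-library version for illustration; the function names and toy tracks are assumptions, and it says nothing about how AtacWorks itself computes or aggregates these values.

```python
import math


def mse(predicted, target):
    """Mean squared error between two equal-length coverage tracks."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def pearson(x, y):
    """Pearson correlation coefficient; undefined if either track is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    clean = [0, 1, 4, 9, 4, 1, 0]          # toy high-coverage track
    rescaled = [2 * c + 1 for c in clean]  # same shape, different scale
    print(mse(rescaled, clean), pearson(rescaled, clean))
```

The toy example shows why both metrics are reported: a track with the right shape but the wrong scale has perfect correlation and still a large squared error.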

Denoising Low Sequencing Data
AtacWorks identifies open chromatin from low-coverage data
[Figure: signal tracks for 50 Million Reads, 1 Million Reads, and 1 Million Reads + AtacWorks.]

Genome-wide Sequencing Reduction
AtacWorks Reduces Sequencing Requirements 3x
[Figure: 1M Reads versus 1M Reads + AtacWorks.]

Denoising Low Quality Sample
AtacWorks improves signal-to-noise ratio in low quality samples
[Figure: x-axis: Distance from transcription start site.]

Denoising Single Cell ATAC-seq Data
AtacWorks Improves Open DNA Detection From Few Cells
[Figure: Open DNA Detection auPRC for 90 Cells versus 90 Cells With AtacWorks.]
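The auPRC metric on this slide summarizes how well per-position scores rank true open sites ahead of closed ones. One common estimator is average precision, sketched below under stated assumptions: labels are 1 for open and 0 for closed, tied scores are not treated specially, and the function name is hypothetical. The slides do not say which estimator was used.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision values at each true positive.

    Positions are ranked by descending score; ties keep their input order.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


if __name__ == "__main__":
    labels = [1, 0, 1, 0]           # toy ground truth: open (1) or closed (0)
    scores = [0.9, 0.8, 0.7, 0.1]   # toy predicted probabilities
    print(average_precision(scores, labels))
```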

AtacWorks SDK
● SDK on Clara Genomics: https://github.com/clara-genomics/AtacWorks
● AtacWorks Preprint: https://www.biorxiv.org/content/10.1101/829481v1
● Reduce Sequencing Cost
● Improve Sample Quality
● Increase Single Cell Resolution
[Figure panels: 1M Reads, 1M Reads + AtacWorks, 90 Cells, 90 Cells + AtacWorks.]

Genome Assembly

Long Read De Novo Assembly
● Step 1: Mapping to detect overlaps between reads
● Step 2: Overlap graph traversal to generate draft genomes
● Step 3: Error correction to polish genomes
[Figure labels: Original reads, Draft genome; example sequences ACTCGGTCATTCGTGCTTTATC and GCGTTATCGTCTACTTCGT.]
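Steps 1 and 2 can be sketched on toy data: find suffix-to-prefix overlaps between reads, record them as graph edges, and greedily merge the best-overlapping pair until no overlaps remain. This is a teaching simplification, not the algorithm behind the tools in the talk. It assumes error-free reads and exact matches, whereas real long reads contain errors and need approximate overlap detection; the function names and the `min_len` parameter are hypothetical.

```python
def suffix_prefix_overlap(left, right, min_len=3):
    """Length of the longest exact overlap of left's suffix with right's prefix."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def overlap_graph(reads, min_len=3):
    """Step 1: edges (i, j) -> overlap length for every ordered read pair."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                size = suffix_prefix_overlap(a, b, min_len)
                if size:
                    edges[(i, j)] = size
    return edges


def greedy_draft(reads, min_len=3):
    """Step 2 (simplified): repeatedly merge the pair with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        edges = overlap_graph(contigs, min_len)
        if not edges:
            break
        (i, j), size = max(edges.items(), key=lambda item: item[1])
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


if __name__ == "__main__":
    toy_reads = ["ACTCGGTC", "GGTCATTC", "ATTCGTGC"]  # made-up, error-free reads
    print(overlap_graph(toy_reads))
    print(greedy_draft(toy_reads))
```

Step 3, polishing, is omitted here because it needs the reads realigned to the draft and a consensus method.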

Genome Assembly Workflow
Genome assembly pipeline stages: Overlap, Assemble, Align, Polish, DL Polish
Tools: MiniMap, MiniASM, Racon x 5, Medaka

Accelerated Genome Assembly Workflow: Before ClaraGenomicsAnalysis
Genome assembly pipeline stages: Overlap, Assemble, Align, Polish, DL Polish
Tools: MiniMap, MiniASM, Racon x 5, Medaka
GPU modules: cuDNN

Accelerated Genome Assembly Workflow: ClaraGenomicsAnalysis 0.1
Genome assembly pipeline stages: Overlap, Assemble, Align, Polish, DL Polish
Tools: MiniMap, MiniASM, Racon x 5, Medaka
GPU modules: cudaPOA, cuDNN

Accelerated Genome Assembly Workflow: ClaraGenomicsAnalysis 0.2
Genome assembly pipeline stages: Overlap, Assemble, Align, Polish, DL Polish
Tools: MiniMap, MiniASM, Racon x 5, Medaka
GPU modules: cudaAligner, cudaPOA, cuDNN

Accelerated Genome Assembly Workflow: ClaraGenomicsAnalysis 0.3
Genome assembly pipeline stages: Overlap, Assemble, Align, Polish, DL Polish
Tools: MiniMap, MiniASM, Racon x 5, Medaka
GPU modules: cudaMapper, cudaAligner, cudaPOA, cuDNN

ClaraGenomicsAnalysis SDK: Enabling Accelerated Genome Assembly
Assembly pipeline stages: Overlap, Assemble, Align, Polish, DL Polish
Tools: MiniMap, MiniASM, Racon x 5, Medaka
GPU modules: cudaMapper, cudaAligner, cudaPOA, cuDNN
[Figure: Bacteria Genome Assembly Acceleration, Azure v32 CPU versus V100 GPU.]
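To make the Align stage concrete, the sketch below computes the edit distance between two sequences with the classic dynamic-programming recurrence, which is the kind of pairwise comparison that alignment stages build on. It is a CPU reference written for illustration only: it does not use or mirror the cudaAligner API, and the assumption that the Align stage reduces to global pairwise alignment with unit costs is mine, not the slides'.

```python
def edit_distance(query, target):
    """Unit-cost edit distance (substitution, insertion, deletion) via DP.

    Keeps only one row of the table at a time, so memory is O(len(target)).
    """
    previous = list(range(len(target) + 1))  # distance from empty query prefix
    for i, q_base in enumerate(query, start=1):
        current = [i]  # distance from query[:i] to the empty target prefix
        for j, t_base in enumerate(target, start=1):
            substitution = previous[j - 1] + (q_base != t_base)
            deletion = previous[j] + 1       # drop a base from the query
            insertion = current[j - 1] + 1   # add a base from the target
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

The inner loop has a dependency along each row, which is one reason GPU implementations restructure the computation rather than port this loop directly.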

CLARA GENOMICS SW
Open Source CUDA-Accelerated Sequencing Analysis Tools
● Applications: Reference Applications (Basecalling, Genome Assembly, AI-Denoised ATAC-seq); Integration with 3rd Party Applications and Workflows
● Optimized C++ and Python APIs: ClaraGenomicsAnalysis SDK (C++ API, Python API); AtacWorks SDK (Transfer Learning, Inference)
● CUDA Accelerated HPC and Deep Learning Modules: cudaAligner, cudaMapper, cudaPOA, Genomics I/O, Reference Models
● CUDA

Useful Links
• Parabricks: https://www.parabricks.com
• ClaraGenomicsAnalysis
  • SDK on GitHub: https://github.com/clara-genomics/ClaraGenomicsAnalysis
  • C++ API Examples: cudapoa, cudaaligner
  • Python API Examples: cudapoa, cudaaligner
• AtacWorks
  • SDK on GitHub: https://github.com/clara-genomics/AtacWorks
  • AtacWorks Preprint: https://www.biorxiv.org/content/10.1101/829481v1
• 3rd party integrations:
  • Racon: https://github.com/lbcb-sci/racon
  • Raven: https://github.com/lbcb-sci/raven
  • Bonito: https://github.com/nanoporetech/bonito
• Additional GPU Accelerated Genomics Applications:
  • Kipoi Model Zoo: https://ngc.nvidia.com/catalog/containers/hpc:kipoi
  • SigProfiler: https://github.com/AlexandrovLab/SigProfilerExtractor
