Genesis: A Hardware Acceleration Framework for Genomic Data Analysis

The 47th IEEE International Symposium on Computer Architecture
Genesis: A Hardware Acceleration Framework for Genomic Data Analysis
Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U Gyeong Song, Young H. Oh, Krste Asanovic, Jae W. Lee, Lisa Wu Wills
Seoul National University

Genomics and Genome Sequencing
▪ DNA (deoxyribonucleic acid): the chemical compound containing the instructions an organism needs to develop, live, and reproduce.
  • DNA is made of two paired strands. Each base pair is represented with a single character (A, C, G, or T) that corresponds to the nucleotide base on one strand of that pair.
▪ DNA sequencing (genome sequencing): the process of identifying the base-pair sequence of a DNA molecule.
▪ Why is it important?
  • Can identify whether a person is susceptible to a specific disease
  • Can identify the type/variant of a cancer
  • Can be used for genetics research
  • Also used for COVID-19 research (e.g., identification of the virus, virus variant analysis)
[Figure: DNA structure showing A-T and G-C base pairs and the backbone. Source: U.S. National Library of Medicine]
Ham et al., Genesis: A Hardware Acceleration Framework for Genomic Data Analysis. Berkeley Architecture Research, APEX Lab @ Duke, ARC Lab @ SNU

Genomics and Genome Sequencing
▪ Genome sequencing used to be very expensive and time-consuming.
  • The Human Genome Project cost $2.7 billion and took 13 years.
▪ Next-Generation Sequencing (NGS) technology enabled rapid sequencing of a whole genome.
  • Whole genome sequencing now costs $300-$700 [1] and takes less than an hour per genome [2].
▪ Genome sequencing comes with a huge computational demand.
  • Data obtained from genome sequencing instruments (i.e., raw reads) needs to be processed with various algorithms.
  • This process is called Secondary Analysis.
[Figure: Cost of Genome Sequencing. Source: U.S. National Human Genome Institute]
[1] https://nebula.org/whole-genome-sequencing/
[2] https://sapac.illumina.com/systems/sequencing-platforms/novaseq/specifications.html

Advent of Hardware Accelerators for Genome Sequencing
[Figure: GATK4 Best Practices data preprocessing pipeline runtime breakdown across Alignment, Mark Duplicates, Metadata Update, and Base Quality Score Recalibration, measured on an 8-core Intel Xeon. Segment values as shown: 10.0%, 15.4%, 9.3%, 63.4%. Miscellaneous stages accounting for 1.9% of the runtime are omitted.]
▪ Complex stages such as alignment take most of the runtime and have therefore been the target of many hardware accelerators.
  • GenAx [ISCA '18], Darwin [ASPLOS '18], Guo et al. [FCCM '19]
  • Other complex stages such as variant calling (downstream) have been accelerated as well.
▪ The advent of hardware accelerators shifts the bottleneck to simple data-manipulation operations.

Advent of Hardware Accelerators for Genome Sequencing
[Figure: GATK4 Best Practices data preprocessing pipeline runtime breakdown with accelerated alignment, across Alignment (0.7%), Mark Duplicates, Metadata Update, and Base Quality Score Recalibration, measured on an 8-core Intel Xeon. Remaining segment values as shown: 27.2%, 41.8%, 24%. Miscellaneous stages accounting for 1.9% of the runtime are omitted.]
▪ Complex stages such as alignment take most of the runtime and have therefore been the target of many hardware accelerators.
  • GenAx [Fujiki et al., ISCA '18], Darwin [Turakhia et al., ASPLOS '18], [Guo et al., FCCM '19]
  • Other complex stages such as variant calling (downstream) have been accelerated as well.
▪ The advent of hardware accelerators shifts the bottleneck to simple data-manipulation operations.
  • Assuming GenAx throughput (4058K reads/s), alignment takes only 0.7% of the total data preprocessing runtime.
  • Data-manipulation operations account for 93% of the total runtime.
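The bottleneck shift can be illustrated with a small renormalization sketch. It assumes that the 63.4% segment in the CPU-only breakdown is alignment (the only stage large enough to take most of the runtime) and that accelerating alignment leaves every other stage's absolute time unchanged. The function name and the speedup factors are hypothetical, and the sketch is not expected to reproduce the measured percentages on the slide exactly.

```python
def share_after_speedup(share: float, speedup: float) -> float:
    """Return a stage's new share of total runtime after only it is sped up.

    share:   the stage's original fraction of total runtime (0..1)
    speedup: factor by which only that stage gets faster (hypothetical input)
    """
    if not 0.0 <= share <= 1.0 or speedup <= 0.0:
        raise ValueError("share must be in [0, 1] and speedup must be positive")
    accelerated = share / speedup   # time of the stage after acceleration
    untouched = 1.0 - share         # all other stages keep their original time
    return accelerated / (untouched + accelerated)


if __name__ == "__main__":
    alignment_share = 0.634  # assumed: the 63.4% segment is alignment
    for speedup in (1, 10, 100, 1000):  # illustrative values, not measurements
        new_share = share_after_speedup(alignment_share, speedup)
        print(f"{speedup:>5}x alignment speedup -> alignment is {new_share:.1%} of runtime")
```

As the alignment share shrinks, the unchanged data-manipulation stages make up almost all of the remaining runtime, which is the motivation for accelerating them.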

Genesis: A Hardware Acceleration Framework for Genomic Data Analysis
Genesis is a framework that enables users to easily design a cloud-deployable hardware accelerator for genomic data-manipulation operations.
1. A user utilizes the Genesis SQL Frontend to represent the target data-manipulation operation in a way that can be easily mapped to hardware.
2. Components in the Genesis Hardware Library (configurable accelerator building blocks) are used to construct a dataflow pipeline for the specified SQL query.
3. The Genesis Backend automatically augments the pipeline with parallelism, deploys it on a cloud FPGA, and lets the user access it through a high-level API.

Presentation Outline
▪ Genomics and Genome Sequencing
▪ Genesis: A Hardware Acceleration Framework for Genomic Data Analysis
  • Genesis SQL Frontend
  • Genesis Hardware Library
  • Genesis Backend
  • Genesis-generated HW accelerators
▪ Evaluation
▪ Conclusions

Genesis SQL Interface
▪ Observation: Most simple data-manipulation operations for genomic data can be easily represented with a SQL query [1,2] on genomic data represented in tabular form.
▪ Key Data Types: Reference and Reads
  • Reference: A reference genome sequence for an individual organism of a species (e.g., human)
  • (Aligned) Reads: A fragment of the genome sequence measured using sequencing instruments, with some metadata

                0000000000111111111122222222223333
                0123456789012345678901234567890123 ...
Reference       AGTTTAGTACCATAGCTAGCTGAAGGAACCAGTA ...
Read1 (0-15)    AGTGTAGTACCCTAGC
Read2 (12-27)               TA-CTAGATGATGGAA
Read3 (18-33)                     GCTGAAGGAACCAGTA

[1] Massie et al., ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing, UC Berkeley Tech Report, 2013
[2] Kozanitis et al., GenAp: a distributed SQL interface for genomic data, BMC Bioinformatics, 2016

Genesis SQL Interface (Tabular Data Representation)
▪ Observation: Most simple data-manipulation operations for genomic data can be easily represented with a SQL query [1,2] on genomic data represented in tabular form.
▪ Key Data Types: References and Reads

Reference Table (Simplified)
  POS  SEQ
  0    AGTTTAGTACCATAGCTAGCTGAAGGAACCAGTA

Reads Table
  POS  SEQ               CIGAR
  0    AGTGTAGTACCCTAGC  16M
  12   TACTAGATGATGGAA   2M,1D,13M
  18   GCTGAAGGAACCAGTA  16M

CIGAR is metadata representing alignment information. For the second read, 2M,1D,13M means 2 aligned (M), 1 deleted (D), 13 aligned (M):

  1111111122222222
  2345678901234567
  TA-CTAGATGATGGAA
  2M 1D 13M

[1] Massie et al., ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing, UC Berkeley Tech Report, 2013
[2] Kozanitis et al., GenAp: a distributed SQL interface for genomic data, BMC Bioinformatics, 2016
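A minimal sketch of how the CIGAR column relates a read to reference positions, using the three example reads from the slide. The `Read` type and the `parse_cigar` and `reference_span` helpers are hypothetical names chosen for illustration, not part of Genesis, and only the M and D operations shown on the slide are handled.

```python
import re
from typing import List, NamedTuple, Tuple


class Read(NamedTuple):
    pos: int    # leftmost reference position the read aligns to
    seq: str    # bases actually present in the read (no gap characters)
    cigar: str  # alignment metadata, e.g. "2M,1D,13M"


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs; only M and D are handled."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)\s*([MD])", cigar)]


def reference_span(read: Read) -> Tuple[int, int]:
    """Return the inclusive (start, end) reference positions the read covers."""
    # Both M (aligned) and D (deleted) advance along the reference.
    covered = sum(n for n, _ in parse_cigar(read.cigar))
    return read.pos, read.pos + covered - 1


READS = [
    Read(0, "AGTGTAGTACCCTAGC", "16M"),
    Read(12, "TACTAGATGATGGAA", "2M,1D,13M"),
    Read(18, "GCTGAAGGAACCAGTA", "16M"),
]

if __name__ == "__main__":
    for read in READS:
        print(read.cigar, parse_cigar(read.cigar), reference_span(read))
```

The second read stores only 15 bases but covers 16 reference positions (12-27), because the deleted position consumes a reference position without consuming a base from the read.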

Genesis SQL Interface (Operations)
▪ (Common) Supported SQL Operations: Select, Where, GroupBy, Join, Limit (i.e., select a subset of rows), Count, Sum, etc.
▪ Additional Supported Operations: PosExplode & ReadExplode

Reference Table (Simplified)
  POS  SEQ
  0    AGTTTAGTACCATAGCTAGCTGAAGGAACCAGTA

Reads Table
  POS  SEQ               CIGAR
  0    AGTGTAGTACCCTAGC  16M
  12   TACTAGATGATGGAA   2M,1D,13M
  18   GCTGAAGGAACCAGTA  16M

PosExplode(Reference.POS, Reference.SEQ) and ReadExplode(Reads.POS, Reads.SEQ, Reads.CIGAR) produce one row per position:

  Reference     Read#1        Read#2        Read#3
  POS  SEQ      POS  SEQ      POS  SEQ      POS  SEQ
  0    A        0    A        12   T        18   G
  1    G        1    G        13   A        19   C
  2    T        2    T        14   _        20   T
  3    T        3    G        15   C        21   G
  ...  ...      ...  ...      ...  ...      ...  ...
  33   A        15   C        27   A        33   A
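The two explode operations can be sketched in software as follows. This is an illustration of the row-per-position semantics shown on the slide, not the Genesis implementation: the function names mirror the slide's operations, only the M and D CIGAR operations are handled, and emitting `_` for a deleted position follows the slide's Read#2 table.

```python
import re
from typing import Iterator, Tuple

Row = Tuple[int, str]  # (POS, SEQ) with a single base per row


def pos_explode(pos: int, seq: str) -> Iterator[Row]:
    """Yield one (position, base) row per base of a reference sequence."""
    for offset, base in enumerate(seq):
        yield pos + offset, base


def read_explode(pos: int, seq: str, cigar: str) -> Iterator[Row]:
    """Yield one (position, base) row per reference position a read covers."""
    ref_pos = pos   # current reference position
    seq_idx = 0     # next unread base in the read
    for length, op in re.findall(r"(\d+)\s*([MD])", cigar):
        for _ in range(int(length)):
            if op == "M":            # aligned: emit the read's base
                yield ref_pos, seq[seq_idx]
                seq_idx += 1
            else:                    # deleted: no base in the read
                yield ref_pos, "_"
            ref_pos += 1


if __name__ == "__main__":
    reference = "AGTTTAGTACCATAGCTAGCTGAAGGAACCAGTA"
    print(list(pos_explode(0, reference))[:4])
    print(list(read_explode(12, "TACTAGATGATGGAA", "2M,1D,13M"))[:4])
```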

Genesis SQL Interface (Example App.)
Example Application: Compute the number of base pair mismatches between the reference and each read.

Reference
  POS  SEQ
  0    AGTTTAGTACCATAGCTAGCTGAAG...

Reads
  POS  SEQ               CIGAR
  0    AGTGTAGTACCCTAGC  16M
  12   TACTAGATGATGGAA   2M,1D,13M
  18   GCTGAAGGAACCAGTA  16M

Dataflow: PosExplode (Reference) produces REF and ReadExplode (Read #1) produces READ. The two are combined with an Inner Join on POS, followed by Count Mismatch. The flow then repeats from ReadExplode with a different read.

Joined rows for Read #1:
  POS  REF.SEQ  READ.SEQ
  0    A        A
  1    G        G
  2    T        T
  3    T        G
  ...  ...      ...
  15   C        C
Count Mismatch result for Read #1: 2

Step #1
  CREATE TABLE REF AS
    PosExplode(Reference.POS, Reference.SEQ) FROM Reference
  CREATE TABLE READ AS
    ReadExplode(R.POS, R.SEQ, R.CIGAR) FROM R
Step #2
  CREATE TABLE RefRead AS
    SELECT READ.SEQ, REF.SEQ FROM READ
    INNER JOIN (SELECT * FROM REF LIMIT 0, 15)
    ON READ.POS = REF.POS
Step #3
  SELECT SUM(READ.SEQ != REF.SEQ)
    FROM RefRead

Overall loop:
  FOR R IN Reads:
    /* Step 1 */
    /* Step 2 */
    INSERT INTO Output /* Step 3 */
  END LOOP;
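The three steps can be mirrored in plain Python to check the example by hand. This is a software sketch of the query's logic, not the accelerator that Genesis generates. All names are hypothetical. The inner join is modeled as a dictionary lookup on POS, so the LIMIT on REF is implicit, and a deleted position (`_`) is assumed to count as a mismatch because it differs from the reference base.

```python
import re
from typing import Dict, List, Tuple

REFERENCE = (0, "AGTTTAGTACCATAGCTAGCTGAAGGAACCAGTA")  # (POS, SEQ)
READS = [  # (POS, SEQ, CIGAR)
    (0, "AGTGTAGTACCCTAGC", "16M"),
    (12, "TACTAGATGATGGAA", "2M,1D,13M"),
    (18, "GCTGAAGGAACCAGTA", "16M"),
]


def pos_explode(pos: int, seq: str) -> Dict[int, str]:
    """Step 1 (REF): map each reference position to its base."""
    return {pos + i: base for i, base in enumerate(seq)}


def read_explode(pos: int, seq: str, cigar: str) -> List[Tuple[int, str]]:
    """Step 1 (READ): one (position, base) row per covered reference position."""
    rows, ref_pos, seq_idx = [], pos, 0
    for length, op in re.findall(r"(\d+)\s*([MD])", cigar):
        for _ in range(int(length)):
            if op == "M":
                rows.append((ref_pos, seq[seq_idx]))
                seq_idx += 1
            else:
                rows.append((ref_pos, "_"))  # deleted base placeholder
            ref_pos += 1
    return rows


def count_mismatches(ref: Dict[int, str], read_rows: List[Tuple[int, str]]) -> int:
    """Steps 2 and 3: inner join on POS, then count rows whose bases differ."""
    return sum(1 for pos, base in read_rows if pos in ref and ref[pos] != base)


if __name__ == "__main__":
    ref = pos_explode(*REFERENCE)
    output = [count_mismatches(ref, read_explode(*read)) for read in READS]
    print(output)
```

For Read #1 this yields 2 mismatches (positions 3 and 11), which agrees with the count shown in the dataflow figure.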
