
GenAx: A Genome Sequence Accelerator (Daichi Fujiki et al.)



  1. GenAx: A Genome Sequence Accelerator
     Daichi Fujiki et al.
     Presented by: Amani Alkayyali, Ben Cyr
     EECS 573 GenAx Paper Presentation

  2. Genome Sequencing
     ● DNA: Thymine, Cytosine, Adenine, Guanine
     ● Genome Sequencing: Determining T, C, A, G order
     ● Genome Sequencing Goals:
        ○ Understanding entire DNA sequence as a system
     [Figure: the four bases (Thymine, Adenine, Cytosine, Guanine) and a letter-string analogy, shown first as one unbroken string and then split into fragments such as "thequick" and "quietlydreaming"]

  3. Uses
     ● Individualized treatment and personalized medicine
        ○ Understanding an individual’s cancer cell mutations
     ● Understanding causes of diseases
     https://rnsights.com/the-push-for-personalized-medicine/

  4. Methods
     ● Steps of genome sequencing:
        ○ Break into small pieces (reads) at random positions
        ○ Determine the sequence
        ○ Figure out which pieces fit together (read alignment)
     ● Two approaches:
        ○ Clone-by-Clone
        ○ Whole Genome Sequencing

  5. Current State: Genome Sequencing and Computing
     ● Expensive
        ○ 2001: $3 billion for the first human genome sequencing
     ● Requires several hundred to thousands of CPU hours
     ● Large output
        ○ Data from 1 million genomes amounts to over 300 petabytes
     ● Moore’s Law tapering motivates hardware acceleration
     ● BWA-MEM: Burrows-Wheeler Aligner
        ○ Broad Institute’s standard software for read alignment

  6. Goals
     ● Smaller seeds → More parallelism
     ● Improving locality of data access
     ● Improve upon Smith-Waterman and Levenshtein Automata (LA)
        ○ Improve scaling
     ● Accelerator for read alignment
     ● Resolve issues from variants and sequencing errors

  7. Sequence Aligners
     ● Edit distance: number of deletions, insertions, or substitutions
     ● Seeding: finding potential matches
     ● Seed extension: finding best match
     [Figure: reference genome and read; (1) Seeding, (2) Seed Extension; read alignment]
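As an illustration of the edit distance defined on slide 7, here is a minimal sketch of the textbook dynamic-programming computation. It is not the method GenAx uses (the slides describe an automaton-based approach, Silla); the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca
                curr[j - 1] + 1,                  # insert cb
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # One deleted base separates these two short sequences.
    print(edit_distance("ACGT", "AGT"))
```

The table has one cell per pair of prefixes, so the work grows with the product of the two lengths.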

  8. Seeding Algorithm
     ● Seeding locates the potential match locations
     ● Finds the “seeds” for seed extension phase
        ○ “k-mers”: string matches of length k
        ○ Super Maximal Exact Matches (SMEMs): maximum-length match extending from a k-mer
     ● Key Idea: Intersect sets of k-mers until the longest match is found.
     [Figure: pipeline diagram with stages Seeding and Seed Extension]

  9. Seeding Algorithm K = 4

  10. Seeding Algorithm K = 4

  11. Seeding Algorithm K = 4
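The key idea from the Seeding Algorithm slides (intersect the hit sets of successive k-mers until the longest match is found) can be sketched roughly as follows, using K = 4 as on the figure slides. This is a simplified illustration, not the paper's SMEM algorithm: the function names, the dictionary-based index, and the choice to slide the k-mer window one base at a time are all assumptions made here.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the set of positions where it starts."""
    index = defaultdict(set)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].add(pos)
    return index


def longest_exact_match(read: str, start: int, index: dict, k: int):
    """Longest exact match of the read beginning at read[start].

    Returns (match_length, sorted reference start positions).
    """
    candidates = set(index.get(read[start:start + k], ()))
    if not candidates:
        return 0, []
    length = k
    offset = 1  # assumption: slide the k-mer window by one base per step
    while start + offset + k <= len(read):
        hits = index.get(read[start + offset:start + offset + k], ())
        # Shift each hit back by the offset so it names the position where
        # the whole match would begin, then intersect with the survivors.
        narrowed = candidates & {h - offset for h in hits}
        if not narrowed:
            break
        candidates = narrowed
        length = offset + k
        offset += 1
    return length, sorted(candidates)


if __name__ == "__main__":
    reference = "TTACGTACGGA"
    idx = build_index(reference, k=4)
    print(longest_exact_match("ACGTACGC", 0, idx, k=4))
```

Each intersection can only shrink the candidate set, so the loop stops as soon as no reference position is consistent with every k-mer seen so far.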

  12. Seeding Accelerator
      ● Index and Position Tables are kept in large SRAM blocks
      ● Intersection computed with Content Addressable Memory (CAM)
         ○ A CAM reports very quickly whether a given value is stored in it
         ○ Small CAM table with 512 entries
         ○ When k = 12 (average case), there are usually fewer than 500 matches
      ● If there are more than 512 indices, use binary search
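A software analogy for the intersection step on slide 12 is sketched below: a Python `set` stands in for the CAM's membership lookup, and `bisect` provides the binary-search fallback once the candidate list exceeds 512 entries. Which list is loaded into the CAM, and the function and constant names, are assumptions made here for illustration; the slide gives only the 512-entry limit and the fallback.

```python
from bisect import bisect_left

CAM_CAPACITY = 512  # CAM table size given on the slide


def intersect(candidates: list, hits: list, offset: int) -> list:
    """Keep the candidate match positions that the next k-mer also supports.

    candidates: sorted reference positions where the match so far begins.
    hits: sorted reference positions of the next k-mer, `offset` bases later.
    """
    wanted = [h - offset for h in hits]  # still sorted after the shift
    if len(candidates) <= CAM_CAPACITY:
        cam = set(candidates)  # stands in for the CAM's membership check
        return [w for w in wanted if w in cam]
    # Too many candidates for the CAM: binary search the sorted list instead.
    result = []
    for w in wanted:
        i = bisect_left(candidates, w)
        if i < len(candidates) and candidates[i] == w:
            result.append(w)
    return result


if __name__ == "__main__":
    print(intersect([2, 10, 20], [5, 13, 30], offset=3))
```

Both branches return the same result; only the lookup mechanism differs.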

  13. Silla: String Independent Local Levenshtein Automata
      ● Seed extension algorithm
      ● Finite-state automata
      ● Traceback: trace of edits needed to align
      ● Scored using an affine gap function
      ● Insertions, deletions, substitutions
      ● 3D vs 2D Silla
      ● Merging confluence paths
      [Figure: pipeline diagram with stages Seeding and Seed Extension]
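To make "scored using an affine gap function" concrete: under an affine scheme, opening a gap costs more than extending one, so a single long gap is penalized less than several short ones. The sketch below scores an alignment written as a string of operations. The operation letters (`M` match, `X` mismatch, `I` insertion, `D` deletion) and the default score values are illustrative assumptions, not values taken from the slides.

```python
def affine_score(ops: str, match: int = 1, mismatch: int = 4,
                 gap_open: int = 6, gap_extend: int = 1) -> int:
    """Score an alignment given as a string of M/X/I/D operations.

    A gap of length n costs gap_open + n * gap_extend.
    """
    score = 0
    prev = None
    for op in ops:
        if op == "M":
            score += match
        elif op == "X":
            score -= mismatch
        elif op in ("I", "D"):
            if op != prev:
                score -= gap_open  # first base of a new gap pays the opening cost
            score -= gap_extend    # every gap base pays the extension cost
        else:
            raise ValueError(f"unknown operation: {op!r}")
        prev = op
    return score


if __name__ == "__main__":
    # One gap of length 2 versus two separate gaps of length 1.
    print(affine_score("MMIIMM"), affine_score("MMIMIM"))
```

With these illustrative values, the single two-base gap scores higher than the two one-base gaps, which is the behavior an affine penalty is meant to produce.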

  14. Silla: String Independent Local Levenshtein Automata

  15. Silla: String Independent Local Levenshtein Automata

  16. SillaX: Silla Accelerator
      ● Edit distance, affine gap penalty, traceback
      ● State = processing element, communicates with neighbor
      ● Retro comparison = two shift registers
      ● Scoring → Clipping
      ● Composable Subgrids
      ● Verified on human genome
      ● 62.9x speedup over Smith-Waterman

  17. GenAx
      ● Combine seeding accelerator and SillaX
      ● Direct replacement for BWA-MEM software sequence aligner

  18. GenAx Architecture

  19. GenAx Performance Test
      ● Compared with two other sequence aligners
         ○ Intel Xeon Processor running BWA-MEM (128 GB DDR4)
         ○ Nvidia TITAN Xp running CUSHAW2
      ● Synthesized and simulated GenAx with 28nm process
      ● Used real human genome reference from dataset
         ○ 800 million reads at 101 base pairs / read
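For a sense of scale, the read set on slide 19 can be converted to total bases and an approximate coverage depth. The genome length used below (about 3.1 billion bases) is an assumption added here, not a figure from the slides, and the helper names are illustrative.

```python
READS = 800_000_000                   # from the slide
BASES_PER_READ = 101                  # from the slide
ASSUMED_GENOME_BASES = 3_100_000_000  # assumption: approximate human genome length


def total_bases(reads: int, read_length: int) -> int:
    """Total number of sequenced bases in the read set."""
    return reads * read_length


def coverage(sequenced_bases: int, genome_bases: int) -> float:
    """Average number of times each genome position is covered."""
    return sequenced_bases / genome_bases


if __name__ == "__main__":
    bases = total_bases(READS, BASES_PER_READ)
    print(bases, round(coverage(bases, ASSUMED_GENOME_BASES), 1))
```

Under that assumed genome length, the data set works out to roughly 26x coverage.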

  20. Performance Results
      ● GenAx vs BWA-MEM
         ○ 31.7x Speedup
         ○ 12x less power
         ○ ~10 Hrs vs. ~300 Hrs
      ● Even better vs GPU
         ○ 72.4x Speedup
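The rounded runtimes and the reported speedup on slide 20 can be checked against each other with a short calculation. The helper name is illustrative, and the 300-hour baseline is the slide's approximate figure, so the result is only a consistency check.

```python
def implied_hours(baseline_hours: float, speedup: float) -> float:
    """Runtime implied by dividing a baseline runtime by a speedup factor."""
    return baseline_hours / speedup


if __name__ == "__main__":
    # ~300 hours for BWA-MEM and a 31.7x speedup, both from the slide.
    print(round(implied_hours(300.0, 31.7), 1))
```

The result is about 9.5 hours, in line with the "~10 Hrs" quoted on the slide.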

  21. Conclusion and Contributions
      ● Silla:
         ○ Computes edit distance between two strings
         ○ String independent and local communication
      ● SillaX:
         ○ Accelerator for Silla supporting traceback
      ● GenAx:
         ○ SillaX + Seeding Accelerator
         ○ Drop-in replacement for BWA-MEM software

  22. Discussion Questions
      ● GenAx might take large performance hits when handling certain inputs (e.g. large K-edit distances, many “k-mer” seeds). Is it worth using GenAx even if it is not flexible enough to handle these edge cases?
      ● Are composable systems (many small systems combined to form one large system) a good solution for scaling?
      ● The authors ran the performance test on one specific genome and read configuration. Do you think this is enough to show the usefulness of GenAx?
