Dataflow Acceleration of Smith-Waterman with Traceback



  1. Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing
     Konstantina Koliogeorgi*, Nils Voss†, Sotiria Fytraki†, Sotirios Xydis*, Georgi Gaydadjiev†, Dimitrios Soudris*
     * National Technical University of Athens, Greece, {konstantina, sxydis, dsoudris}@microlab.ntua.gr
     † Maxeler Technologies UK, {nvoss, sfytraki, georgi}@maxeler.com
     FPL 2019 Conference

  2. Genome Sequencing
     • The genome represents the entire genetic information of an organism
     • Next-Generation Sequencing technologies make it possible to compare an individual's genome to a reference genome
     • Typical genomic workflow, e.g. SeqMule
     • Short read alignment: reads are ~100 bases long
     • Workflows operate on huge amounts of data
     [Figure: SeqMule workflow diagram. Stage labels: QC assessment on input sequences, alignment, extract alignment coverage statistics, variant calling, consensus calls.]
     [Figure: "WES Analysis Workflows: Bowtie2 in SeqMule". Time in hrs, split into Bowtie2 and Other, for the SomaticVarscan, SomaticSamtools and Normal workflows.]
     [Figure: "Bowtie2 WES Workflows: Time for Increasing Input Size". Time in hrs for input sizes of 10 GB, 14.8 GB and 19.3 GB across the SomaticVarscan, SomaticSamtools and NormalVarscan workflows.]
     • Aligners are the bottleneck in the workflow and therefore need acceleration
     8 September 2019

  3. Problem Statement
     • Most aligners use the seed-and-extend model
     • Fragment reads into short pieces (seeds) that align exactly to the genome
     • Extend seeds to a full alignment with Smith-Waterman
     • Smith-Waterman
     • Matrix fill stage followed by traceback
     • Takes up 60% of total time (55% + 5% respectively)
     • Distributed over hundreds of tasks per read
     • Calling and data transfer overhead
     [Figure: % of total reads (y-axis, 0 to 60) against number of calls (x-axis, 1 to 200), with an "83%" annotation.]
     • Challenge
     • Co-designed solution to avoid overhead
     • Extract parallelism to further boost performance
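The two stages named on this slide, matrix fill followed by traceback, can be sketched in software as below. This is a minimal reference sketch, not the accelerator design from the talk: it assumes an affine-gap formulation with the three matrices E, H and F, and the function name `smith_waterman` and all scoring parameters (`match`, `mismatch`, `gap_open`, `gap_extend`) are hypothetical values chosen for illustration.

```python
NEG_INF = float("-inf")

def smith_waterman(read, ref, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Local alignment with affine gaps; returns (score, aligned_read, aligned_ref).

    Scoring values are illustrative assumptions, not taken from the slides.
    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ending with a gap in the read
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ending with a gap in the reference
    best, bi, bj = 0, 0, 0

    # Matrix fill stage.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Traceback stage: walk the matrices in reverse from the best cell.
    a_read, a_ref = [], []
    i, j, state = bi, bj, "H"
    while True:
        if state == "H":
            if H[i][j] == 0:
                break  # a local alignment starts where the score drops to zero
            s = match if read[i - 1] == ref[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                a_read.append(read[i - 1])
                a_ref.append(ref[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # move left: gap in the read
            a_read.append("-")
            a_ref.append(ref[j - 1])
            if E[i][j] != E[i][j - 1] - gap_extend:
                state = "H"  # this move opened the gap
            j -= 1
        else:  # state "F", move up: gap in the reference
            a_read.append(read[i - 1])
            a_ref.append("-")
            if F[i][j] != F[i - 1][j] - gap_extend:
                state = "H"
            i -= 1
    return best, "".join(reversed(a_read)), "".join(reversed(a_ref))
```

For example, `smith_waterman("ACGT", "TTACGTTT")` finds the exact four-base match embedded in the reference. In a seed-and-extend aligner a routine of this kind is invoked many times per read, which is where the calling and transfer overhead mentioned above comes from.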

  4. Standalone Optimized Dataflow Implementation
     • Matrix fill calculates matrices E, H, F
     • Traceback traverses the matrices in reverse order to construct the alignment path
     [Figure: matrix-fill and traceback example over a score matrix. Labels: "n-th antidiagonal", "1st row", "reference stream", "Matrix-Fill", "Traceback", "up, left elements: current checks", "upleft element: next check"; legend entries "not yet computed" and "past checks".]
     • Interleaving data scheme
     • Interlace data from subsequent read-reference pairs
     • Double buffering
     • Operate in pipeline fashion
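The antidiagonal processing order and the interleaving of read-reference pairs can be illustrated with a small scheduling sketch. It relies on a general property of the Smith-Waterman recurrence: a cell depends only on its up, left and up-left neighbours, so all cells on one antidiagonal are independent of each other. The round-robin policy and the names `antidiagonals` and `interleaved_schedule` are assumptions made for illustration; the slides do not specify how the hardware orders the interleaved data.

```python
from itertools import zip_longest

def antidiagonals(n, m):
    """Yield the cells (i, j) of an n x m matrix grouped by antidiagonal i + j.

    An n x m matrix has n + m - 1 antidiagonals; cells within one group
    have no dependencies on each other.
    """
    for d in range(n + m - 1):
        lo = max(0, d - m + 1)
        hi = min(n - 1, d)
        yield [(i, d - i) for i in range(lo, hi + 1)]

def interleaved_schedule(shapes):
    """Round-robin the antidiagonals of several independent pairs.

    `shapes` is a list of (read_length, reference_length) tuples. Between two
    consecutive antidiagonals of the same pair, the schedule issues work from
    the other pairs, which is one way to keep a pipeline busy while a
    dependency is still in flight (an assumed policy, for illustration only).
    """
    waves = [antidiagonals(n, m) for n, m in shapes]
    for step in zip_longest(*waves):
        for pair_id, cells in enumerate(step):
            if cells is not None:  # shorter pairs finish earlier
                yield pair_id, cells
```

Calling `list(interleaved_schedule([(2, 2), (1, 1)]))` alternates between the two pairs until the smaller one is exhausted, then drains the larger one.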

  5. Proposed Integration Architecture
     Key Architectural Decisions
     • Move traceback to hardware to alleviate transfer cost
     • Major software restructuring to constrain the number of accelerator calls
     [Figure: integration architecture. Labels: Reads; chain of seed-extend alignments (SmW blocks); interleaving into L-interleaved pairs (1st, 2nd, ..., L-th); PCIe; processing elements PE0, PE1, ..., PEn; Traceback; PCIe; Alignments (e.g. "C-CTACC", "ACGT--CG", "ACGTGCC"). Phase labels: data gathering phase, HW execution phase, data distribution phase.]
     Results
     • 18x speedup standalone
     • 1.55x speedup end to end
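The gap between the standalone and end-to-end figures can be put in context with Amdahl's law. The sketch below is a back-of-the-envelope estimate, not a result from the talk: it assumes that the 60% share from the problem statement and the 18x standalone speedup refer to the same baseline run, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the runtime is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

# Assumed inputs, taken from the slides: Smith-Waterman accounts for 60% of
# total time and the standalone kernel is 18x faster.
bound_with_18x = amdahl_speedup(0.60, 18.0)          # roughly 2.3x
bound_if_free = amdahl_speedup(0.60, float("inf"))   # 2.5x, kernel time reduced to zero

print(f"ideal end-to-end speedup with an 18x kernel: {bound_with_18x:.2f}x")
print(f"limit with an infinitely fast kernel:        {bound_if_free:.2f}x")
```

Under these assumptions the reported 1.55x end-to-end figure sits below the ideal bound, which is consistent with the calling and data transfer overheads that the architectural decisions on this slide aim to reduce.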

  6. Thank you for your attention!
