
SLIDE 1

Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing

Konstantina Koliogeorgi*, Nils Voss†, Sotiria Fytraki†, Sotirios Xydis*, Georgi Gaydadjiev†, Dimitrios Soudris*

* National Technical University of Athens, Greece, {konstantina, sxydis, dsoudris}@microlab.ntua.gr
† Maxeler Technologies, UK, {nvoss, sfytraki, georgi}@maxeler.com

FPL 2019 Conference

SLIDE 2

[Figures: "Bowtie2 in SeqMule WES Workflows" and "WES Analysis Bowtie2 Workflows": time in hrs spent in Bowtie2 vs. other stages for the SomaticVarscan, SomaticSamtools, and NormalVarscan workflows; "Time for Increasing Input Size": inputs of 10 GB, 14.8 GB, and 19.3 GB]

Genome Sequencing

  • A genome represents the entire genetic information of an organism
  • Next-Generation Sequencing (NGS) technologies make it possible to compare an individual's genome to a reference genome
  • Typical genomic workflow: e.g. SeqMule
  • Short-read alignment: reads are ~100 bases long
  • Operates on huge amounts of data
  • Aligners are the bottleneck of the workflow, so they need acceleration

[Figure: SeqMule workflow stages: QC assessment of the n input sequences, Alignment, Variant Calling, Extract consensus calls, Alignment coverage statistics]

8 September 2019

SLIDE 3

Problem Statement

  • Most aligners use the seed-and-extend model
  • Fragment reads into short pieces (seeds) that align exactly to the genome
  • Extend the seeds to a full alignment with Smith-Waterman
  • Smith-Waterman: Matrix Fill stage followed by Traceback
  • Takes up 60% of total time (55% and 5%, respectively)
  • Distributed over hundreds of tasks per read, which incurs call and data-transfer overhead
  • Challenge: a co-designed solution that avoids this overhead
  • Extract parallelism to further boost performance
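The seed stage of the seed-and-extend model can be illustrated with a small sketch. It assumes a plain hash-based k-mer index and a hypothetical seed length `k`; real aligners such as Bowtie2 use an FM-index instead, and `build_kmer_index` and `find_seeds` are hypothetical names. Each returned hit is a candidate that the extend stage (Smith-Waterman) would then turn into a full alignment.

```python
from __future__ import annotations
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (read_offset, reference_position) pairs where a k-mer of the read
    matches the reference exactly. Non-overlapping seeds are taken, stepping by k."""
    seeds = []
    for off in range(0, len(read) - k + 1, k):
        for ref_pos in index.get(read[off:off + k], []):
            seeds.append((off, ref_pos))
    return seeds
```

For the read `"ACGTTAGC"` against the reference `"ACGTACGTTAGC"` with `k = 4`, the seeds are `(0, 0)`, `(0, 4)` and `(4, 8)`; two of them lie on the same diagonal (`ref_pos - read_offset == 4`), which is where the read actually aligns, and each would be handed to a separate extend task.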

[Figure: distribution of Smith-Waterman calls per read; x-axis: number of calls (1 to 200); y-axis: % of total reads; annotated value: 83%]


SLIDE 4

Standalone Optimized Dataflow Implementation

  • Matrix Fill calculates the matrices E, H, and F
  • Traceback traverses the matrices in reverse order to construct the alignment path
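As a sequential software reference for the two stages, the sketch below fills E, H and F and then traces back. It assumes the standard Gotoh affine-gap recurrences (E for gaps in the read, F for gaps in the reference, H for the best local score) and hypothetical scoring parameters; it is not the dataflow design from the slides, and Bowtie2's exact scoring may differ. A gap of length g costs `gap_open + (g - 1) * gap_extend`.

```python
NEG = -10**9  # stands in for minus infinity while keeping integer arithmetic


def smith_waterman_affine(read, ref, match=2, mismatch=-2, gap_open=3, gap_extend=1):
    """Local alignment with affine gaps. Returns (score, aligned_read, aligned_ref)."""
    m, n = len(read), len(ref)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    E = [[NEG] * (n + 1) for _ in range(m + 1)]  # gap in the read (horizontal move)
    F = [[NEG] * (n + 1) for _ in range(m + 1)]  # gap in the reference (vertical move)

    def s(i, j):
        return match if read[i - 1] == ref[j - 1] else mismatch

    # Matrix Fill: each cell depends on its up, left and up-left neighbours.
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0, H[i - 1][j - 1] + s(i, j), E[i][j], F[i][j])
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Traceback: walk back from the best cell, tracking which matrix we are in.
    read_aln, ref_aln = [], []
    i, j, state = bi, bj, "H"
    while True:
        if state == "H":
            if H[i][j] == 0:  # local alignment starts here
                break
            if H[i][j] == H[i - 1][j - 1] + s(i, j):
                read_aln.append(read[i - 1])
                ref_aln.append(ref[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # gap in the read, consume one reference base
            read_aln.append("-")
            ref_aln.append(ref[j - 1])
            state = "H" if E[i][j] == H[i][j - 1] - gap_open else "E"
            j -= 1
        else:  # gap in the reference, consume one read base
            read_aln.append(read[i - 1])
            ref_aln.append("-")
            state = "H" if F[i][j] == H[i - 1][j] - gap_open else "F"
            i -= 1
    return best, "".join(reversed(read_aln)), "".join(reversed(ref_aln))
```

With these parameters, `smith_waterman_affine("AAGGCCCC", "AAGGTCCCC")` opens a single gap opposite the `T` and returns a score of 13. Storing all three matrices for Traceback is what makes the host-side version costly in memory and, when split across host and accelerator, in transfer volume.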

[Figure: Matrix Fill computes the score matrix one anti-diagonal at a time (from the 1st row to the nth anti-diagonal) as the reference streams in; cells marked "not yet computed" are still pending. Traceback then walks back through the filled matrix: the up and left elements are the current checks, the up-left element is the next check, and visited cells are past checks.]
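The anti-diagonal order in the figure works because a cell (i, j) only needs its up, left and up-left neighbours, all of which lie on earlier anti-diagonals. The sketch below, using hypothetical helper names, groups cells by anti-diagonal `d = i + j` and checks that no cell in a group depends on another cell of the same group, so each group could be computed in parallel. Mapping one processing element per matrix row is a common way to exploit this in hardware; whether the slides' design uses exactly that mapping is an assumption here.

```python
from __future__ import annotations


def antidiagonal_schedule(m: int, n: int) -> list[list[tuple[int, int]]]:
    """Group the cells (i, j), 1 <= i <= m, 1 <= j <= n, by anti-diagonal i + j.
    An m x n matrix has m + n - 1 anti-diagonals."""
    waves = []
    for d in range(2, m + n + 1):
        lo, hi = max(1, d - n), min(m, d - 1)
        waves.append([(i, d - i) for i in range(lo, hi + 1)])
    return waves


def respects_dependencies(waves) -> bool:
    """True if every cell's up, left and up-left neighbours (inside the matrix)
    were scheduled in an earlier wave."""
    done = set()
    for wave in waves:
        for i, j in wave:
            for dep in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if dep[0] >= 1 and dep[1] >= 1 and dep not in done:
                    return False
        done.update(wave)
    return True
```

For a 3 x 4 matrix this yields 6 waves, the widest containing 3 independent cells, so the fill takes m + n - 1 steps instead of m * n when enough processing elements are available.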

  • Interleaving data scheme: interlace data from subsequent read-reference pairs
  • Double buffering: operate in a pipelined fashion
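Interleaving can be sketched as a round-robin merge of the streams of several read-reference pairs: element k of every pair is issued before element k + 1 of any pair. Assuming a cell update takes L pipeline stages, interleaving L pairs means consecutive cycles work on independent pairs, so a result is ready by the time the same pair's next element arrives. The function names and the `PAD` filler for unequal lengths are hypothetical.

```python
from __future__ import annotations
from itertools import zip_longest

PAD = None  # hypothetical filler for pairs whose streams are shorter


def interleave(streams: list[list[str]]) -> list[tuple[int, str | None]]:
    """Round-robin interleave L streams, tagging each item with its pair index."""
    out = []
    for column in zip_longest(*streams, fillvalue=PAD):
        for pair_id, item in enumerate(column):
            out.append((pair_id, item))
    return out


def deinterleave(items, num_pairs: int) -> list[list[str]]:
    """Invert interleave(), dropping the padding."""
    streams = [[] for _ in range(num_pairs)]
    for pair_id, item in items:
        if item is not PAD:
            streams[pair_id].append(item)
    return streams
```

For example, interleaving `["A", "C"]` and `["G", "T", "A"]` issues `A, G, C, T, PAD, A`, with each item tagged by the pair it belongs to.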


SLIDE 5

Proposed Integration Architecture

Key Architectural Decisions

  • Move Traceback to hardware to reduce the data-transfer cost
  • Major software restructuring to constrain the number of accelerator calls
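A rough estimate shows why running Traceback on the accelerator lowers the transfer cost. Assume a read of length m, a reference window of length n, and a hypothetical b bytes per stored traceback pointer or per alignment operation. If the host performs Traceback, the accelerator has to return a pointer for every matrix cell; if the accelerator performs it, only the alignment path comes back, and a local alignment path has at most m + n steps:

```latex
\text{bytes}_{\text{host traceback}} = m \cdot n \cdot b
\qquad \text{vs.} \qquad
\text{bytes}_{\text{HW traceback}} \le (m + n) \cdot b
```

With reads of ~100 bases, an assumed window of n = 100 and b = 1, that is 10,000 bytes versus at most 200 bytes per alignment.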

Results

  • 18x speedup standalone
  • 1.55x speedup end to end
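The two speedups can be related through Amdahl's law. As a back-of-the-envelope check, assume the 60% Smith-Waterman share from the Problem Statement slide (p = 0.6) applies to the end-to-end run, and the 18x standalone speedup applies to that entire share:

```latex
S_{\text{end-to-end}} \le \frac{1}{(1 - p) + p / S_{\text{SW}}}
  = \frac{1}{0.4 + 0.6 / 18} \approx 2.31
```

The reported 1.55x lies below this idealised bound. One plausible reason is that the bound ignores work that stays outside the kernel, such as the data gathering and distribution phases and the PCIe transfers.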

[Figure: integration architecture. Data gathering phase: the chains of seed-extend Smith-Waterman (SmW) alignments of the 1st to Lth reads are interleaved into L-interleaved read-reference pairs and sent over PCIe. HW execution phase: processing elements PE0 to PEn compute the alignments, including Traceback. Data distribution phase: the resulting alignments (e.g. C-CTACC, ACGT--CG, ACGTGCC) are returned over PCIe to the 1st to Lth reads.]
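The gather, execute and distribute phases amount to replacing many small accelerator calls with one batched call. The sketch below shows that restructuring on the host side under stated assumptions: `run_batched`, the `Pair` layout, and the `accelerator` callable are hypothetical, and the callable stands in for the dataflow engine (including interleaving and Traceback), which is not modelled here.

```python
from __future__ import annotations
from typing import Callable, List, Tuple

# (read segment, reference window) for one seed-extend task; a hypothetical layout
Pair = Tuple[str, str]


def run_batched(task_chains: List[List[Pair]],
                accelerator: Callable[[List[Pair]], List[str]]) -> List[List[str]]:
    """Gather the seed-extend tasks of L reads, make one accelerator call,
    and route each alignment back to the read that produced it."""
    # Data gathering phase: flatten the per-read chains, remembering each task's owner.
    batch: List[Pair] = []
    owner: List[int] = []
    for read_idx, chain in enumerate(task_chains):
        batch.extend(chain)
        owner.extend([read_idx] * len(chain))
    # HW execution phase: a single call for the whole batch instead of one per task.
    alignments = accelerator(batch)
    # Data distribution phase: hand results back in their original per-read order.
    per_read: List[List[str]] = [[] for _ in task_chains]
    for read_idx, alignment in zip(owner, alignments):
        per_read[read_idx].append(alignment)
    return per_read
```

With hundreds of tasks per read, batching L reads this way turns L x (tasks per read) calls into one, which is the call and transfer overhead the restructuring targets.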


SLIDE 6

Thank you for your attention!
