Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing


SLIDE 1

Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing

Konstantina Koliogeorgi*, Nils Voss†, Sotiria Fytraki†, Sotirios Xydis*, Georgi Gaydadjiev†, Dimitrios Soudris*

* National Technical University of Athens, Greece, {konstantina, sxydis, dsoudris}@microlab.ntua.gr
† Maxeler Technologies UK, {nvoss, sfytraki, georgi}@maxeler.com

FPL 2019 Conference

SLIDE 2

[Figures: "Bowtie2 in SeqMule WES Workflows" and "WES Analysis Bowtie2 Workflows: Time for Increasing Input Size". Axes: Workflow (SomaticVarscan, SomaticSamtools, NormalVarscan) vs. Time in hrs. Legends: Bowtie2 / Other; 10 GB / 14.8 GB / 19.3 GB]

Genome Sequencing

Genome represents the entire genetic information of an organism
Next-Generation Sequencing technologies allow comparing an individual's genome to a reference genome

Typical genomic workflow, e.g. SeqMule
Short read alignment: reads ~100 bases long
Operates on huge amounts of data
Aligners are the bottleneck in the workflow => in need of acceleration!

[Figure: workflow diagram. Labels: n input sequences, QC assessment, Alignment, Alignment coverage statistics, Variant Calling, Extract consensus calls]

8 September 2019

SLIDE 3

Problem Statement

Most Aligners utilize Seed & Extend Model
Fragment reads into short pieces (seeds) that align exactly to genome
Extend seeds to full alignment with Smith-Waterman
Smith-Waterman
Matrix Fill stage followed by Traceback
Takes up 60% of total time (55% + 5% respectively)
Distributed over hundreds of tasks per read
Calling & data transfer overhead
Challenge
Co-designed Solution to avoid overhead
Extract parallelism to further boost performance
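
The seed step of the Seed & Extend model can be illustrated with a toy example. This is only a sketch under stated assumptions: the function name `find_seeds`, the seed length `k`, and the hash-based k-mer index are hypothetical choices for illustration. Production aligners such as Bowtie2 use an FM-index rather than a hash table, and the extend step would then run Smith-Waterman around each hit.

```python
from collections import defaultdict


def find_seeds(read, reference, k):
    """Return (read_pos, ref_pos) pairs where a length-k seed of the read
    matches the reference exactly. Toy hash index, not an FM-index."""
    # Index every k-mer of the reference by its start position.
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)

    hits = []
    # Fragment the read into non-overlapping seeds of length k.
    for read_pos in range(0, len(read) - k + 1, k):
        seed = read[read_pos:read_pos + k]
        for ref_pos in index.get(seed, []):
            hits.append((read_pos, ref_pos))
    return hits


print(find_seeds("ACGTAC", "TTACGTACTT", 3))
```

Each hit is a candidate location; a single read can produce many of them, which is one way a read ends up triggering many separate extension calls.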

[Figure: % of total reads vs. number of calls (1 to 200); annotation: 83%]


SLIDE 4

Standalone Optimized Dataflow Implementation

Matrix Fill calculates matrices E, H, F
Traceback traverses matrices in reverse order to construct alignment path
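
The two stages can be sketched in software as follows. This is a minimal reference sketch, not the dataflow implementation: it assumes E, H and F are the usual affine-gap (Gotoh) matrices, where H holds the best local score and E and F hold scores of alignments ending in a gap. The scoring values (`match`, `mismatch`, `gap_open`, `gap_extend`) are hypothetical defaults, with a gap of length g costing `gap_open + (g - 1) * gap_extend`.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Local alignment with affine gaps. Returns (score, aligned_read, aligned_ref)."""
    n, m = len(read), len(ref)
    neg_inf = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # gap in the read
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # gap in the reference
    best, best_pos = 0, (0, 0)

    # Matrix Fill: compute E, F and H for every cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: walk the matrices in reverse from the best-scoring cell.
    i, j = best_pos
    state = "H"
    out_read, out_ref = [], []
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break  # start of the local alignment
            s = match if read[i - 1] == ref[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                out_read.append(read[i - 1])
                out_ref.append(ref[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            out_read.append("-")
            out_ref.append(ref[j - 1])
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # the gap was opened at this cell
            j -= 1
        else:
            out_read.append(read[i - 1])
            out_ref.append("-")
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    return best, "".join(reversed(out_read)), "".join(reversed(out_ref))


print(smith_waterman("AAAATTTT", "AAAACCTTTT"))
```

Note that Traceback needs the results of Matrix Fill for the whole matrix, which is why the data either has to be returned to the host or kept next to the compute units.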

[Figure: Matrix Fill. Labels: PE (x4), reference stream, 1st row, nth antidiagonal, n, n + m − 1, "Matrix-Fill not yet computed"]

[Figure: Traceback. Legend: up, left elements: current checks; upleft element: next check; past checks]
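
The antidiagonal ordering in the Matrix Fill figure can be made concrete with a small sketch. Each cell (i, j) depends only on its left, upper and upper-left neighbours, so all cells with the same i + j are independent and can be computed in parallel. The helper `antidiagonals` and the convention that the matrix has `n` rows and `m` columns are assumptions for illustration; how these map onto the read and reference lengths in the slides is not stated here.

```python
def antidiagonals(n, m):
    """Yield the cells (i, j) of an n x m matrix grouped by antidiagonal d = i + j."""
    for d in range(n + m - 1):
        first_row = max(0, d - m + 1)
        last_row = min(n, d + 1)
        yield [(i, d - i) for i in range(first_row, last_row)]


for step, cells in enumerate(antidiagonals(3, 4)):
    print(step, cells)
```

An n x m matrix has n + m − 1 antidiagonals, so a wavefront schedule finishes in n + m − 1 steps instead of n · m when enough processing elements are available.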

Interleaving Data Scheme
Interlace data from subsequent read-reference pairs
Double Buffering
Operate in pipeline fashion
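
One possible reading of the interleaving scheme is a round-robin merge of L independent read-reference pairs, so that consecutive pipeline slots belong to different pairs and no slot has to wait on the result of its predecessor. The sketch below only illustrates that data layout; the function names and the assumption of equal-length (for example padded) streams are hypothetical and not taken from the slides.

```python
def interleave(streams):
    """Round-robin merge of equal-length streams: s0[0], s1[0], ..., s0[1], s1[1], ..."""
    lengths = {len(s) for s in streams}
    if len(lengths) > 1:
        raise ValueError("streams must have equal length (pad shorter ones first)")
    return [item for group in zip(*streams) for item in group]


def deinterleave(stream, num_streams):
    """Inverse of interleave: recover the individual streams."""
    return [stream[k::num_streams] for k in range(num_streams)]


mixed = interleave(["ACG", "TTT"])
print(mixed)
print(deinterleave(mixed, 2))
```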


SLIDE 5

Proposed Integration Architecture

Key Architectural Decisions

Move Traceback to hardware to alleviate transfer cost
Major software restructure to constrain the number of accelerator calls

Results

18x speedup standalone
1.55x speedup end to end
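
As a back-of-envelope check, Amdahl's law relates the standalone and end-to-end figures. This calculation is not from the slides; it assumes that the 60% share of total time attributed to Smith-Waterman and the 18x standalone speedup apply to the same workload, and it ignores any remaining call and transfer overhead.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the runtime is accelerated by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# 60% of total time accelerated 18x (assumed to be the same workload).
print(round(amdahl_speedup(0.60, 18), 2))  # 2.31
```

Under these assumptions the end-to-end speedup cannot exceed about 2.31x, and even an infinitely fast accelerator would be capped at 2.5x, so the end-to-end result is bounded mainly by the part of the aligner that stays in software.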

[Figure: integration architecture. Reads 1st, 2nd, ..., Lth, each with a chain of seed-extend alignments (SmW calls). Data gathering phase: interleaving into L-interleaved pairs, transfer over PCIe. HW Execution phase: PE0, PE1, ..., PEn followed by Traceback. Data distribution phase: transfer over PCIe, Alignments 1st, 2nd, ..., Lth (e.g. C-CTACC, ACGT--CG, ACGTGCC)]


SLIDE 6

Thank you for your attention!
