SLIDE 1

FindeR: Accelerating FM-Index-based Exact Pattern Matching in Genomic Sequences through ReRAM Technology

Farzaneh Zokaee and Lei Jiang

Indiana University Bloomington

3rd HPCA Workshop on ACCELERATOR ARCHITECTURE IN COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

SLIDE 2

Executive summary

  • 1. Designing a processing-in-memory (PIM) accelerator for genome sequence analysis
  • Read alignment uses the FM-Index algorithm to find the exact locations of reads in the reference genome.
  • 2. Problem:
  • Accessing memory and finding exact matches with the FM-Index for the huge number of generated reads (billions of reads).
  • 3. Proposed solution: speeding up the FM-Index
  • FindeR: a ReRAM-based processing-in-memory architecture
  • Removes the cost of transferring data between the CPU and memory
  • Hardware/algorithm co-design → operation parallelism ↑
  • 4. Results:
  • Throughput: 83% to 30K× improvement over the state of the art.
  • Throughput/power: 3.5× to 42.5K× improvement over the state of the art.

SLIDE 3

Genome sequencing pipeline

Sequencers: Illumina, PacBio, Nanopore

  • Illumina HiSeq2000: short reads (100 bp) with an error rate of 1%
  • PacBio and Nanopore: long reads (1k bp) with error rates of 15-40%

[Figure: organic DNA (~3.2B bp; bases A, T, C, G) sequenced into reads]

SLIDE 4

Genome sequencing cost decreases

[Wetterstrand_GSP’19] available at www.genome.gov/sequencingcostsdata

[Chart: cost per megabase, $0.00 to $0.07, from Aug-13 to Jun-20]

SLIDE 5

Genome sequencing pipeline

Onur Mutlu, Processing Data Where It Makes Sense in Modern Computing Systems: Enabling In-Memory Computation, 17 September 2018, Cordoba, HiPerNav Workshop 2018 Keynote

  • 1. Sequencing: Illumina HiSeq2000 produces billions of short reads
  • 2. Read alignment: each short read is aligned against the reference genome
  • 3. Variant calling
  • 4. Discovery

[Figure: example short reads, and a dynamic-programming matrix aligning a short read to the reference genome]

SLIDE 6

The pipeline latency matters!

[MolecularTesting_2019] available at www.mycancergenome.org/content/page/molecular-testing

Genome sequencing for tumor profiling

  • Variants → prioritize anti-cancer therapy and direct patient management
  • Life or death? Which type?

Such a test takes several days to weeks!

SLIDE 7

Bottleneck in genome sequencing pipeline

  • Genome sequencing: 300 million bases/minute
  • Read alignment: 2 million bases/minute

Bottlenecked in alignment!

Onur Mutlu, Processing Data Where It Makes Sense in Modern Computing Systems: Enabling In-Memory Computation, 17 September 2018, Cordoba, HiPerNav Workshop 2018 Keynote

SLIDE 8

The explosion in genomic data capacity

[Chart: cumulative number of human genomes, 2000 to 2030 (projection), log scale from 10^0 to 10^10; series: Sanger, Illumina, PacBio, Nanopore; compared with Moore’s Law]

[Stephens_PLoSBiol2015]

SLIDE 9

Read alignment

[Figure: short reads to be aligned against a reference sequence]

SLIDE 10

Read alignment

[Figure: short reads aligned against the reference; annotated operation: hit]

SLIDE 11

Read alignment

[Figure: short reads aligned against the reference; annotated operations: hit, insert]

SLIDE 12

Read alignment

[Figure: short reads aligned against the reference; annotated operations: hit, insert, delete]

SLIDE 13

Read alignment

[Figure: short reads aligned against the reference; annotated operations: hit, insert, delete, substitute]

SLIDE 14

Read alignment

[Figure: short reads aligned against the reference; annotated operations: hit, insert, delete, substitute]

Read alignment = Seed-and-Extend:
  • Seeding: find exact matches (FM-Index)
  • Seed extension: find inexact matches

Seeding is slow due to the FM-Index search algorithm.

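Seed extension has to score the hit / insert / delete / substitute operations on the read-alignment slides. One common way to do that is the classic edit-distance dynamic program, sketched here in C as an illustration only: FindeR accelerates the seeding step, not this one, and `edit_distance` and `MAX_LEN` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 128 /* hypothetical length limit for this sketch */

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Minimum number of insertions, deletions, and substitutions that turn
 * `read` into `ref`; hits (matching bases) cost nothing. */
int edit_distance(const char *read, const char *ref) {
    size_t m = strlen(read), n = strlen(ref);
    assert(m < MAX_LEN && n < MAX_LEN);
    static int d[MAX_LEN][MAX_LEN]; /* d[i][j]: read[0..i) vs ref[0..j) */
    for (size_t i = 0; i <= m; i++) d[i][0] = (int)i;
    for (size_t j = 0; j <= n; j++) d[0][j] = (int)j;
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            int diag = d[i - 1][j - 1] + (read[i - 1] != ref[j - 1]); /* hit or substitute */
            int up = d[i - 1][j] + 1;   /* read has an extra base (insert) */
            int left = d[i][j - 1] + 1; /* read is missing a base (delete) */
            d[i][j] = min3(diag, up, left);
        }
    }
    return d[m][n];
}
```

For example, `edit_distance("ATCGTA", "ATCCGTA")` returns 1: the read lacks one C.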
SLIDE 15

Burrows-Wheeler transform

Ref: A T C C G T $

Rotations of Ref (i = start offset of the rotation):
  0  A T C C G T $
  6  $ A T C C G T
  5  T $ A T C C G
  4  G T $ A T C C
  3  C G T $ A T C
  2  C C G T $ A T
  1  T C C G T $ A

SLIDE 17

Burrows-Wheeler transform

Ref: A T C C G T $

Rotations sorted lexicographically ($ < A < C < G < T); the BWT is the last column:
  6  $ A T C C G T
  0  A T C C G T $
  2  C C G T $ A T
  3  C G T $ A T C
  4  G T $ A T C C
  5  T $ A T C C G
  1  T C C G T $ A

BWT: T $ T C C G A

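A minimal C sketch of the transform on these slides: sort all rotations of a `$`-terminated text and read off the last column. It relies on plain ASCII order already matching `$ < A < C < G < T`; the rotation sort is quadratic and meant for illustration only (production indexers build a suffix array directly), and `bwt` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *g_text; /* text whose rotations are being sorted */
static size_t g_len;

/* Order two rotations, given by their start offsets, symbol by symbol.
 * Plain ASCII order already gives $ < A < C < G < T. */
static int cmp_rotation(const void *pa, const void *pb) {
    size_t a = *(const size_t *)pa, b = *(const size_t *)pb;
    for (size_t k = 0; k < g_len; k++) {
        unsigned char ca = (unsigned char)g_text[(a + k) % g_len];
        unsigned char cb = (unsigned char)g_text[(b + k) % g_len];
        if (ca != cb) return (int)ca - (int)cb;
    }
    return 0;
}

/* Writes BWT(text) to out (strlen(text) + 1 bytes) and the sorted rotation
 * offsets to rows (strlen(text) entries). text must end with one '$'. */
void bwt(const char *text, char *out, size_t *rows) {
    g_text = text;
    g_len = strlen(text);
    assert(g_len > 0 && text[g_len - 1] == '$');
    for (size_t i = 0; i < g_len; i++) rows[i] = i;
    qsort(rows, g_len, sizeof rows[0], cmp_rotation);
    for (size_t i = 0; i < g_len; i++)
        out[i] = text[(rows[i] + g_len - 1) % g_len]; /* last column */
    out[g_len] = '\0';
}
```

For `"ATCCGT$"` this produces `T$TCCGA` and the row order 6, 0, 2, 3, 4, 5, 1, matching the sorted rotations on the slide.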
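SLIDE 18

FM-Index

Ref: A T C C G T $    BWT: T $ T C C G A

Occ(S, i): number of occurrences of symbol S in BWT[0..i-1]

  i   A  C  G  T
  0   0  0  0  0
  1   0  0  0  1
  2   0  0  0  1
  3   0  0  0  2
  4   0  1  0  2
  5   0  2  0  2
  6   0  2  1  2
  7   1  2  1  2

Count(S): number of symbols in Ref smaller than S ($ < A < C < G < T)

  A  C  G  T
  1  2  4  5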

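Both tables can be derived from the BWT alone. The sketch below (hypothetical function `build_fm_tables`; columns A, C, G, T; the single `$` counted only as the smallest symbol) rebuilds the Occ and Count values listed on the slide.

```c
#include <assert.h>
#include <string.h>

#define SIGMA 4 /* columns A, C, G, T; the '$' sentinel gets no column */

/* Column index of a nucleotide, or -1 for '$'. */
static int sym_index(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* occ needs strlen(bwt) + 1 rows: occ[i][s] = occurrences of s in bwt[0..i).
 * count[s] = number of text symbols smaller than s, '$' included. */
void build_fm_tables(const char *bwt, int occ[][SIGMA], int count[SIGMA]) {
    size_t n = strlen(bwt);
    assert(strchr(bwt, '$') != NULL);
    memset(occ[0], 0, sizeof occ[0]);
    for (size_t i = 0; i < n; i++) {
        memcpy(occ[i + 1], occ[i], sizeof occ[i]);
        int s = sym_index(bwt[i]);
        if (s >= 0) occ[i + 1][s]++;
    }
    int smaller = 1; /* the single '$' precedes every nucleotide */
    for (int s = 0; s < SIGMA; s++) {
        count[s] = smaller;
        smaller += occ[n][s];
    }
}
```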
SLIDE 21

FM-Index

The full Occ(S, i) table needs 8 entries (i = 0 to 7) per symbol!

SLIDE 23

FM-Index

Ref: A T C C G T $    BWT: T $ T C C G A

  • Full Occ(S, i) table: 8 entries per symbol!
  • Sampled table: store a tag only every 4 rows (i = 0 and i = 4), i.e. 2 entries per symbol
  • Each memory row holds a tag (Count(S) + Occ(S, i) at the sampled row) next to its 4-symbol BWT bucket; the remaining counts are recovered by counting in the bucket

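The tag scheme can be sketched in C as follows, assuming (as the LFM listing's `x/4` and `x % 4` suggest) one tag row per 4 BWT symbols, where each tag already folds in Count(S). `SampledFM`, `sampled_build`, `sampled_lf`, and `MAX_BUCKETS` are hypothetical names; the real FindeR memory layout may differ.

```c
#include <assert.h>
#include <string.h>

#define SIGMA 4
#define BUCKET 4       /* BWT symbols per bucket, matching x/4 and x%4 in LFM */
#define MAX_BUCKETS 64 /* hypothetical capacity for this sketch */

static int sym_index(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1; /* '$' */
    }
}

typedef struct {
    const char *bwt;
    size_t n;
    int tag[MAX_BUCKETS][SIGMA]; /* tag[b][s] = Count(s) + Occ(s, 4*b) */
} SampledFM;

void sampled_build(SampledFM *fm, const char *bwt) {
    fm->bwt = bwt;
    fm->n = strlen(bwt);
    assert(fm->n / BUCKET < MAX_BUCKETS);
    int count[SIGMA], total[SIGMA] = {0}, occ[SIGMA] = {0};
    for (size_t i = 0; i < fm->n; i++) {
        int s = sym_index(bwt[i]);
        if (s >= 0) total[s]++;
    }
    int smaller = 1; /* '$' sorts first */
    for (int s = 0; s < SIGMA; s++) {
        count[s] = smaller;
        smaller += total[s];
    }
    /* Walk rows 0..n and keep a tag only at every 4th row. */
    for (size_t i = 0; i <= fm->n; i++) {
        if (i % BUCKET == 0)
            for (int s = 0; s < SIGMA; s++)
                fm->tag[i / BUCKET][s] = count[s] + occ[s];
        if (i < fm->n) {
            int s = sym_index(bwt[i]);
            if (s >= 0) occ[s]++;
        }
    }
}

/* LF mapping in the style of the slide's LFM: the bucket's tag plus the
 * matches among the first x % 4 symbols of that bucket. */
int sampled_lf(const SampledFM *fm, size_t x, int s) {
    assert(x <= fm->n && s >= 0 && s < SIGMA);
    size_t b = x / BUCKET;
    int co = 0;
    for (size_t j = 0; j < x % BUCKET; j++)
        if (sym_index(fm->bwt[b * BUCKET + j]) == s) co++;
    return co + fm->tag[b][s];
}
```

For the 7-symbol BWT this stores exactly two tag rows (i = 0 and i = 4), and `sampled_lf` reproduces Count(S) + Occ(S, x) from the full table.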
SLIDE 24

Backward search

00 BackwardSearch(BWT, Q){
01   int low = 0;
02   int high = max_occ;
03   for (int i = len - 1; i >= 0; i--){
04     low = LFM(BWT[low/4], Q[i], low);
05     high = LFM(BWT[high/4], Q[i], high);
06     if (low >= high) return;
07   }
08 }

09 int LFM(BWT[x/4], Q[index], x){
10   int co = 0;
11   int tag = TAG[x/4][Q[index]];
12   for (int j = 0; j < x % 4; j++)
13     if (BWT[x/4][j] == Q[index]) co++;
14   return co + tag;
15 }

Ref: A T C C G T $    Query: C G T    BWT: T $ T C C G A

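A runnable C sketch of the same backward search, processing the query from its last symbol. To stay short it computes Occ with a linear scan instead of the sampled tags (avoiding that scan is exactly what the tags are for); `backward_search` and `occ` are hypothetical names, and `high = n` plays the role of `max_occ`.

```c
#include <assert.h>
#include <string.h>

static int sym_index(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1; /* '$' */
    }
}

/* Occ(s, i): occurrences of s in bwt[0..i), by linear scan. */
static int occ(const char *bwt, size_t i, int s) {
    int c = 0;
    for (size_t k = 0; k < i; k++)
        if (sym_index(bwt[k]) == s) c++;
    return c;
}

/* Writes the matching row interval [*lo, *hi) and returns the number of
 * occurrences of q (0 if q does not occur). */
int backward_search(const char *bwt, const char *q, size_t *lo, size_t *hi) {
    size_t n = strlen(bwt);
    int count[4];
    int smaller = 1; /* one '$' precedes every nucleotide */
    for (int s = 0; s < 4; s++) {
        count[s] = smaller;
        smaller += occ(bwt, n, s);
    }
    size_t low = 0, high = n;
    for (size_t i = strlen(q); i-- > 0;) { /* last query symbol first */
        int s = sym_index(q[i]);
        assert(s >= 0);
        low = (size_t)(count[s] + occ(bwt, low, s));
        high = (size_t)(count[s] + occ(bwt, high, s));
        if (low >= high) {
            *lo = *hi = low;
            return 0;
        }
    }
    *lo = low;
    *hi = high;
    return (int)(high - low);
}
```

For Query C G T the interval narrows from [0, 7) to [5, 7), then [4, 5), then [3, 4): one match, in sorted row 3 (rotation C G T $ A T C), i.e. Ref offset 3.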
SLIDE 25

Problem: operations in backward search

04 low = LFM(BWT[low/4], Q[i], low);
05 high = LFM(BWT[high/4], Q[i], high);

  • Random memory accesses due to pointer chasing

Processing-in-memory!

SLIDE 26

Problem: operations in backward search

04 low = LFM(BWT[low/4], Q[i], low);
05 high = LFM(BWT[high/4], Q[i], high);

  • Random memory accesses due to pointer chasing → Processing-in-memory!

12 for (int j = 0; j < x % 4; j++)
13   if (BWT[x/4][j] == Q[index]) co++;

  • Counting a symbol S in a string → Hamming distance between “SSSSS” and the string (count = string length minus distance)

Hardware/algorithm co-design → operation parallelism ↑

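The counting-as-Hamming-distance idea can be illustrated in software on a packed 4-symbol bucket, using the 2-bit encoding the later slides use (A: 00, C: 01, G: 10, T: 11). This is a bit-level analogue only, not the ReRAM circuit, which uses the array and an ADC instead; `pack4`, `code2`, and `count_in_bucket` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit codes from the slides: A=00, C=01, G=10, T=11; -1 otherwise. */
static int code2(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Packs four symbols into one byte, symbol j in bits 2j and 2j+1. */
uint8_t pack4(const char *sym) {
    uint8_t packed = 0;
    for (int j = 0; j < 4; j++) {
        int c = code2(sym[j]);
        assert(c >= 0);
        packed = (uint8_t)(packed | (c << (2 * j)));
    }
    return packed;
}

/* Counts symbol code s among the first k (0..4) symbols of a packed bucket:
 * XOR with s replicated into every field ("SSSS"), treat a field with any
 * set bit as a symbol mismatch, and return k minus that Hamming distance. */
int count_in_bucket(uint8_t bucket, int s, int k) {
    assert(s >= 0 && s < 4 && k >= 0 && k <= 4);
    uint8_t diff = (uint8_t)(bucket ^ (s * 0x55)); /* 0x55 = 01010101 */
    int distance = 0;
    for (int j = 0; j < k; j++)
        if ((diff >> (2 * j)) & 0x3) distance++;
    return k - distance;
}
```

The slide's example, counting G in “CG”, is `count_in_bucket(pack4("CGAA"), 2, 2)`: the distance between “GG” and “CG” is 1, so the count is 2 - 1 = 1.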
SLIDE 27

Solution: ReRAM Hamming Distance Unit

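SLIDE 28

ReRAM basics

  • Cell structure: a metal oxide layer between two metal layers, with a voltage V applied across it
  • SET → low resistivity; RESET → high resistivity
  • A Form operation is applied first to a fresh cell, which starts in the high-resistivity state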

SLIDE 29

ReRAM-based Hamming Distance Unit

Example: counting G in “CG” via the Hamming distance between “GG” and “CG”

[Figure: ReRAM array (word-lines, bit-lines) read out through an ADC]

SLIDE 31

ReRAM-based Hamming Distance Unit

Example: counting G in “CG” via the Hamming distance between “GG” and “CG”

[Figure: ReRAM array (word-lines, bit-lines) read out through an ADC; cell states: HR HR HR HR (HR = high resistance)]

SLIDE 32

ReRAM-based Hamming Distance Unit

Encoding: A: 00, C: 01, G: 10, T: 11

Example: counting G in “CG” via the Hamming distance between “GG” and “CG”

[Figure: ReRAM array (word-lines, bit-lines) read out through an ADC; cell states: HR HR HR HR]

SLIDE 33

ReRAM-based Hamming Distance Unit

Encoding: A: 00, C: 01, G: 10, T: 11

Example: counting G in “CG” via the Hamming distance between “GG” and “CG”

[Figure: ReRAM array (word-lines, bit-lines) read out through an ADC; labels: C G G G and 1 1 1 1; cell states: HR HR HR HR]

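SLIDE 34

ReRAM-based Hamming Distance Unit

Encoding: A: 00, C: 01, G: 10, T: 11

Example: counting G in “CG” via the Hamming distance between “GG” and “CG”

[Figure: ReRAM array (word-lines, bit-lines) read out through an ADC; labels: C G G G and 1 1 1 1; cell states: HR HR HR HR LR LR (HR = high resistance, LR = low resistance); values shown: 2, 1]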

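SLIDE 35

ReRAM-based Hamming Distance Unit

Encoding: A: 00, C: 01, G: 10, T: 11

Example: counting G in “CG” via the Hamming distance between “GG” and “CG”

[Figure: ReRAM array (word-lines, bit-lines) read out through an ADC; labels: C G G G and 1 1 1 1; cell states: HR HR HR HR LR LR HR HR HR HR (HR = high resistance, LR = low resistance)]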

SLIDE 36

ReRAM Lookup Table-based Adder

  • Two lookup tables, one for Cin = 0 and one for Cin = 1, each addressed from 0 to 255 in both dimensions
  • Row address: A7 ~ A0; column address: B7 ~ B0; output: Z7 ~ Z0
  • Used for the final addition co + tag in LFM:

09 int LFM(BWT[x/4], Q[index], x){
10   int co = 0;
11   int tag = TAG[x/4][Q[index]];
12   for (int j = 0; j < x % 4; j++)
13     if (BWT[x/4][j] == Q[index]) co++;
14   return co + tag;
15 }

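A software model of the lookup-table adder, as a sketch under two assumptions not stated on the slide: each table entry also keeps a carry-out bit next to Z7 ~ Z0, and a wider addition chains four 8-bit lookups (which would fit the "4 reads" listed for the LUT adder in the pipeline). `lut`, `lut_init`, `lut_read`, and `lut_add32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two 256 x 256 tables (Cin = 0 and Cin = 1).
 * Each entry keeps a + b + Cin: bits 7..0 are Z7 ~ Z0, and bit 8 is a
 * carry-out that this sketch assumes is stored alongside. */
static uint16_t lut[2][256][256];

void lut_init(void) {
    for (int cin = 0; cin < 2; cin++)
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                lut[cin][a][b] = (uint16_t)(a + b + cin);
}

/* One table read: row address A7 ~ A0 = a, column address B7 ~ B0 = b. */
static uint16_t lut_read(int cin, uint8_t a, uint8_t b) {
    assert(cin == 0 || cin == 1);
    return lut[cin][a][b];
}

/* 32-bit addition as four chained 8-bit reads, least significant byte first. */
uint32_t lut_add32(uint32_t x, uint32_t y) {
    uint32_t z = 0;
    int carry = 0;
    for (int byte = 0; byte < 4; byte++) {
        uint16_t r = lut_read(carry, (uint8_t)(x >> (8 * byte)),
                              (uint8_t)(y >> (8 * byte)));
        z |= (uint32_t)(r & 0xFFu) << (8 * byte);
        carry = r >> 8; /* selects the Cin = 1 table for the next byte */
    }
    return z;
}
```

Call `lut_init()` once; afterwards `lut_add32(0x12345678u, 0x0FEDCBA9u)` yields `0x22222221u`, with carries propagating through all four lookups.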
SLIDE 37

Pipeline Design

  • Pointer fetch (low/high/Q[i]): 1 cycle, 1 read
  • FM-Index mem: 1 cycle, 1 read
  • RHU: 1 set, 1 read, 1 reset = 3 cycles
  • ADC: 1 cycle
  • LUT Adder: 4 cycles, 4 reads → updated low/high

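The stage costs above allow a quick latency estimate. This C sketch assumes the stage order and cycle counts read off the pipeline diagram (its garbled layout makes the order an assumption) and an idealized pipeline in which each stage accepts a new LF step as soon as it finishes the previous one. Successive LF steps of one query depend on each other, so such overlap would have to come from interleaving independent work such as different reads. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int cycles;
} Stage;

/* Stage order and cycle counts as read off the pipeline diagram. */
static const Stage kStages[] = {
    {"Pointer fetch", 1},
    {"FM-Index mem", 1},
    {"RHU (set, read, reset)", 3},
    {"ADC", 1},
    {"LUT adder (4 reads)", 4},
};
static const size_t kNumStages = sizeof kStages / sizeof kStages[0];

/* Cycles for one LF step to pass through every stage. */
int step_latency(void) {
    int total = 0;
    for (size_t i = 0; i < kNumStages; i++) total += kStages[i].cycles;
    return total;
}

/* The slowest stage bounds how often a new LF step can enter. */
int bottleneck_cycles(void) {
    assert(kNumStages > 0);
    int worst = 0;
    for (size_t i = 0; i < kNumStages; i++)
        if (kStages[i].cycles > worst) worst = kStages[i].cycles;
    return worst;
}

/* Ideal cycles for `steps` independent LF steps flowing through the pipeline. */
long pipelined_cycles(long steps) {
    assert(steps > 0);
    return step_latency() + (steps - 1) * (long)bottleneck_cycles();
}
```

Under these assumptions one LF step takes 10 cycles end to end, and the 4-cycle LUT adder sets the rate at which further independent steps can enter.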
SLIDE 38

Results

SLIDE 39

Short read alignment

                      CPU     GPU     ASIC    FPGA    FindeR
Die size (mm²)        14.3K   1.6K    352     14.8K   1.1K
Main memory (GB)      128     6       1.3     48      -
Power (W)             130     258     3.1     247     9.09
Throughput            68K     150K    379K    1.5M    10.07M
Throughput/Watt       523     581     121K    6.2K    1.18M

  • FindeR improves throughput by 10× over the FPGA
  • FindeR improves throughput/watt by 9.75× over the ASIC

SLIDE 40

Long Read Seeding

Columns: Throughput and Throughput/Watt (performance); Sensitivity and Precision (quality)
FindeR-PAC 2.9K 1.64K 95.95% 95.95%
FindeR-ONT 98.11% 99.10%
Darwin-PAC 3.9K 0.26K
Darwin-ONT
99.71% 99.91% 98.2% 99.1%

  • Darwin uses power-hungry SRAM buffers
  • FindeR improves Throughput/Watt by 5.3×

SLIDE 41

Long Read Seeding

  • FindeR improves quality by using an FM-Index-based error correction technique with SMEM seeding

SLIDE 43

Short Read Alignment

Algorithms compared: Hash Table, Dynamic, Automata, FM-Index

                      RADAR   BioCAM   Race    RCAM    GenAx   FindeR
Die size (mm²)        120     9.8K     450     383     4.6K    1.1K
Power (W)             12.5    153      24.3    6.6K    20      9.09
Throughput            125     186.8K   2.1M    177K    973     3.86M
Throughput/Watt       10      1.2K     86K     26      48.65   424.6K

Off-chip memory (GB): 8, 120
Function: Seeding, Seed Extension, Both

  • FindeR improves throughput by 83% to 30K×
  • FindeR improves throughput/watt by 3.5× to 42.5K×
