Generalized Pattern Matching Micro-Engine
Yuanwei Fang*, Raihan Rasool‡, Dilip Vasudevan*, Andrew A. Chien*†
* University of Chicago, † Argonne National Laboratory, ‡ King Faisal University
6/24/2014 UNIVERSITY OF CHICAGO
High-speed network: 100 Gb/s
Growing number of patterns: 6000 Snort rules
Speed requirement: > 75 Tera DFAops/s
Power budget: 200 W
Energy efficiency requirement: > 375 Gops/J
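The speed and efficiency targets above can be reproduced from the link rate and the rule count. The sketch below assumes that every rule's DFA takes one transition per input byte; that model is an assumption made here for illustration, since the slide states only the resulting numbers.

```python
# Back-of-the-envelope check of the packet inspection targets.
# Assumption (not stated on the slide): each rule's DFA performs
# one transition (one DFAop) per input byte.

LINK_RATE_BITS_PER_S = 100e9   # 100 Gb/s network
NUM_RULES = 6000               # Snort rules
POWER_BUDGET_W = 200           # watts

bytes_per_s = LINK_RATE_BITS_PER_S / 8
dfa_ops_per_s = bytes_per_s * NUM_RULES
ops_per_joule = dfa_ops_per_s / POWER_BUDGET_W

print(f"{dfa_ops_per_s / 1e12:.0f} Tera DFAops/s")
print(f"{ops_per_joule / 1e9:.0f} Gops/J")
```

Under that assumption, 12.5 GB/s times 6000 rules gives 75 Tera DFAops/s, and dividing by the 200 W budget gives 375 Gops/J, matching the stated requirements.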
Genome size: 130 G base pairs
Bioinformatics database: millions of species
Speed requirement: > 1 Tera DFAops/s
Power budget: 200 W
Energy efficiency requirement: > 5 Gops/J
Intel Xeon E5-2600: 17 G DFAops/s at 130 W, i.e. 0.13 Gops/J
Target workload:
M input characters (M DFA transitions per rule)
N DFA rules, each run over the same M input characters
Goal: compute the N x M transitions efficiently
Approach: parallelize DFA execution; fused instruction
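As a point of reference for the N x M workload, a plain software baseline can be sketched as two nested loops. This is a minimal illustration and not the GenPM implementation; the function name `run_all`, the dictionary-based transition tables, and the two example DFAs are all hypothetical.

```python
# Software baseline: run N DFAs over M input characters, one transition
# at a time. All names here are illustrative, not taken from the slides.

def run_all(dfas, text):
    """dfas: list of (table, start, accepting), where table[state] maps a
    character to the next state and missing characters fall back to start.
    Returns (per-DFA match flags, total number of transitions)."""
    transitions = 0
    matched = []
    for table, start, accepting in dfas:          # N rules
        state = start
        hit = False
        for ch in text:                           # M characters
            state = table[state].get(ch, start)   # one DFAop
            transitions += 1
            hit = hit or state in accepting
        matched.append(hit)
    return matched, transitions


# Two toy rules: "contains ab" and "contains c".
dfa_ab = ({0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}, 0, {2})
dfa_c = ({0: {"c": 1}, 1: {"c": 1}}, 0, {1})

matched, transitions = run_all([dfa_ab, dfa_c], "xxabx")
print(matched, transitions)
```

With N = 2 rules and M = 5 characters the loop performs 10 transitions, each one a dependent table lookup. That serial chain is the cost the parallel DFA execution is meant to reduce.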
The Generalized Pattern Matching Micro-Engine (GenPM) is one micro-engine in the 10x10 approach.
[Figure: 10x10 core block diagram. Units: Basic RISC CPU, Micro-engine 2, Micro-engine 3, Micro-engine 4, GenPM, Micro-engine 6, Micro-engine 7, Micro-engine 8, each labeled with an I-Cache, above a Shared L1 Data Cache and Local Memory.]
[Figure: GenPM datapath, repeated over several animation steps. Labels: String buffer (holding input characters a, b, c), Current State, address, Local Mem, Next State, Acc_Vec, Accept, and a final CHECK on Acc_Vec. Register ports are labeled A, D, Q1, Q4, ENB.]
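The labels in the datapath figure (Current State, String buffer, address, Local Mem, Next State, Acc_Vec) suggest a table-driven DFA step: the current state and the next input character form an address into local memory, and the entry read back supplies the next state and an accept bit. The sketch below models that reading in software. The address formula, the `(next_state, accept)` entry layout, and the one-bit-per-step `acc_vec` are assumptions, not details confirmed by the slides.

```python
# A software model of one table-lookup DFA step. The memory layout is
# assumed for illustration only.

ALPHABET = 256  # one column per byte value (assumption)


def build_local_mem(num_states, edges, accepting):
    """Flatten a DFA into a list indexed by state * ALPHABET + byte.
    Each entry is (next_state, accept_flag); missing edges go to state 0."""
    mem = [(0, 0 in accepting)] * (num_states * ALPHABET)
    for (state, byte), nxt in edges.items():
        mem[state * ALPHABET + byte] = (nxt, nxt in accepting)
    return mem


def step(local_mem, current_state, byte):
    address = current_state * ALPHABET + byte   # "address"
    next_state, accept = local_mem[address]     # "Local Mem" read
    return next_state, accept


def scan(local_mem, data, start=0):
    """Feed bytes through the DFA, collecting one accept bit per step."""
    state, acc_vec = start, []
    for byte in data:                           # "String buffer"
        state, accept = step(local_mem, state, byte)
        acc_vec.append(accept)                  # "Acc_Vec"
    return state, acc_vec


# Toy DFA that accepts whenever the input so far ends in "ab".
A, B = ord("a"), ord("b")
mem = build_local_mem(3, {(0, A): 1, (1, A): 1, (1, B): 2, (2, A): 1}, {2})
final_state, acc_vec = scan(mem, b"cab")
print(final_state, acc_vec, any(acc_vec))       # any() plays the CHECK role
```

A single check over `acc_vec` after several steps is cheaper than testing for acceptance after every character, which is one plausible reason for keeping an accept vector.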
[Figure: analogy between SIMD and GenPM. A single SSE ADD instruction performs several additions (+) at once; a single GMVSNEXT instruction performs several DFAops at once.]
Data movement
Multi-step parallel DFA execution
Find precise matching position
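One way to read the last two items together: the DFA advances over a block of several characters per step and reports only whether any accept occurred, and the exact match offset is recovered afterwards by replaying just the blocks that hit. The sketch below follows that reading. It is an assumption about how the phases relate, and `run_block`, `find_matches`, and `step_length` are hypothetical names.

```python
# Multi-step scanning with a slow path for exact match positions.
# Illustrative only; not the GenPM instruction semantics.

def run_block(table, state, block, accepting, start=0):
    """Advance one DFA over a whole block; return (state, any_accept)."""
    hit = False
    for ch in block:
        state = table[state].get(ch, start)
        hit = hit or state in accepting
    return state, hit


def find_matches(table, accepting, text, step_length=8, start=0):
    """Return the end offsets of all matches, scanning step_length
    characters per step and replaying only the blocks that hit."""
    positions = []
    state = start
    for base in range(0, len(text), step_length):
        block = text[base:base + step_length]
        entry_state = state
        state, hit = run_block(table, state, block, accepting, start)
        if hit:
            # Slow path: replay this block one character at a time to
            # recover the precise offsets.
            s = entry_state
            for i, ch in enumerate(block):
                s = table[s].get(ch, start)
                if s in accepting:
                    positions.append(base + i)
    return positions


# Toy DFA for "ab"; state 2 is accepting.
table = {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}
text = "xxabxxxxxxab"
for k in (1, 8, 16):
    print(k, find_matches(table, {2}, text, step_length=k))
```

The reported offsets do not depend on the step length, including when a match straddles a block boundary, because the state carried into each block is saved before the block runs.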
Speedup versus RISC, by step length

Step length   GenPM_8way   GenPM_64way
1             36           289
8             243          1947
16            300          2498
Energy improvement versus RISC, by step length

Step length   GenPM_8way   GenPM_64way
1             31           213
8             151          861
16            174          980
[Figure: throughput per watt (Gops/J) versus step length (1, 8, 16) for GenPM_8way and GenPM_64way.]
Scaled to a 75 W chip, GenPM delivers > 2.6 Tera DFAops/second.
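The scaled figure can be placed next to the earlier numbers with a little arithmetic. The sketch below uses only values stated in the slides (2.6 Tera DFAops/s at 75 W, 0.13 Gops/J for the Xeon E5-2600, and the 5 and 375 Gops/J efficiency requirements). Because 2.6 Tera is given as a lower bound, the results are lower bounds too.

```python
# Energy-efficiency comparison using only numbers from the slides.

GENPM_OPS_PER_S = 2.6e12      # > 2.6 Tera DFAops/s (lower bound)
GENPM_POWER_W = 75            # scaled chip power
XEON_OPS_PER_J = 0.13e9       # Intel Xeon E5-2600
GENOME_TARGET = 5e9           # > 5 Gops/J requirement
NETWORK_TARGET = 375e9        # > 375 Gops/J requirement

genpm_ops_per_j = GENPM_OPS_PER_S / GENPM_POWER_W
gain_over_xeon = genpm_ops_per_j / XEON_OPS_PER_J

print(f"GenPM: {genpm_ops_per_j / 1e9:.1f} Gops/J")
print(f"vs Xeon: {gain_over_xeon:.0f}x")
print("meets genome target:", genpm_ops_per_j > GENOME_TARGET)
print("meets network target:", genpm_ops_per_j > NETWORK_TARGET)
```

This works out to at least about 34.7 Gops/J, roughly 267 times the Xeon figure. That lower bound clears the 5 Gops/J genome requirement, while the 375 Gops/J network requirement is about an order of magnitude beyond it.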
[Figure: total energy breakdown (0% to 100%) by component (LM, L1_I, L1_D, L2, DRAM, Core) for RISC, GenPM_8B_1S, GenPM_8B_8S, GenPM_8B_16S, GenPM_64B_1S, GenPM_64B_8S, GenPM_64B_16S.]
LM_max = 83%
ASIC: [Brodie et al., ISCA 2006], [Titan IC Systems RXP], [Cisco SCE]
FPGA: [Yang Xu et al., ANCS 2011], [T. Song et al., INFOCOM 2008], [I. Sourdis et al., VLSI 2008]
CPU: [Mytkowicz et al., ASPLOS 2014], [Intel Hyperscan]
GPU: [G. Vasiliadis et al., CCS 2011], [C. H. Lin et al., INFOCOM 2012]
SoC: [C. Johnson et al., ISSCC 2010], [Cavium Octeon], [IBM PowerEN]
pattern matching workloads
programmable core