ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata



slide-1
SLIDE 1

ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata

Kevin A Angstadt∗, Arun Subramaniyan∗, Elaheh Sadredini†, Reza Rahimi†, Kevin Skadron†, Westley Weimer∗, Reetuparna Das∗

∗Computer Science and Engineering, University of Michigan
†Department of Computer Science, University of Virginia

This work is funded, in part, by the NSF (1763674, 1619098, CAREER-1652294 and CCF-1629450); Air Force (FA8750-17-2-0079); and CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

slide-2
SLIDE 2

Processing Growing Quantities of Data

  • 2.5 quintillion bytes of data/day
  • Analysis/manipulation requires deserialization
  • Most data use recursively-nested grammars
  • XML
  • JSON
  • Poor performance on CPU
  • High branching

Source: Domo — Data Never Sleeps 5.0

slide-3
SLIDE 3

XML Nesting

<course>
  <footnote></footnote>
  <sln>10637</sln>
  <prefix>ACCTG</prefix>
  <crs>230</crs>
  <lab></lab>
  <sect>01</sect>
  <title>INT FIN ACCT</title>
  <credit>3.0</credit>
  <days>TU,TH</days>
  <times>
    <start>7:45</start>
    <end>9</end>
  </times>
  <place>
    <bldg>TODD</bldg>
    <room>230</room>
  </place>
  <instructor>B. MCELDOWNEY</instructor>
  <limit>0112</limit>
  <enrolled>0108</enrolled>
</course>

slide-4
SLIDE 4

XML Nesting
Subtree Mining

[Figure: example trees with nodes labeled A, B, and C]

slide-5
SLIDE 5

XML Nesting
Subtree Mining
Parsing

[Figure: parse tree of int * ( int + int ) with nodes S, Exp, and Term]

S → Exp ⊣
Exp → Term + Exp ∣ Term
Term → int * Term ∣ ( Exp ) ∣ int
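The grammar on this slide can be recognized with a small recursive-descent parser. The sketch below is only an illustration of why such grammars need a stack (here, the Python call stack); the function name `parse`, the token spelling `"int"`, and the use of `"$"` for the end marker ⊣ are assumptions made for the example, not part of the talk.

```python
def parse(tokens):
    """Return True if the token list derives from S -> Exp ⊣ (sketch)."""
    pos = 0

    def peek():
        # "$" is a hypothetical stand-in for the end-of-input marker ⊣.
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def term():
        # Term -> int * Term | ( Exp ) | int
        if peek() == "(":
            expect("(")
            exp()            # nested expression: recursion depth tracks nesting
            expect(")")
        else:
            expect("int")
            if peek() == "*":
                expect("*")
                term()

    def exp():
        # Exp -> Term + Exp | Term
        term()
        if peek() == "+":
            expect("+")
            exp()

    try:
        exp()
        return peek() == "$"   # S -> Exp ⊣: all input must be consumed
    except SyntaxError:
        return False


print(parse(["int", "*", "(", "int", "+", "int", ")"]))
print(parse(["(", "int"]))
```

Each open parenthesis adds a frame to the call stack and each close parenthesis removes one, which is the behavior a pushdown automaton models with an explicit stack.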

slide-6
SLIDE 6

Processing Growing Quantities of Data

  • Automata/RegEx help tame analysis of big data sets
  • Frequent Itemset/Pattern Mining
  • NLP Part-of-Speech Tagging
  • Data Deduplication
  • Ensemble-Based Classification
  • Particle Physics Analyses
  • Growing number of architectural solutions

Source: Domo — Data Never Sleeps 5.0

slide-7
SLIDE 7

Automata/RegEx Processing Platforms

[Figure: design-space chart; labels: Custom ASIC, Existing Architecture, Von Neumann, Spatial-Reconfigurable]

slide-8
SLIDE 8

Automata/RegEx Processing Platforms

Platforms: HyperScan, PCRE, Becchi et al., VASim, iNFAnt2, DFAGE, REAPR, Micron AP, UAP, HARE, IBM PowerEN, PAP, Cache Automaton

slide-9
SLIDE 9

Automata/RegEx Processing Platforms

CPU-Based: HyperScan, PCRE, Becchi et al., VASim

slide-10
SLIDE 10

Automata/RegEx Processing Platforms

GPU-Based: iNFAnt2, DFAGE

slide-11
SLIDE 11

Automata/RegEx Processing Platforms

FPGA-Based: REAPR

slide-12
SLIDE 12

Automata/RegEx Processing Platforms

  • Finite automata are fundamentally limited in the kinds and complexity of analyses they support
  • ASPEN is a new processor, inspired by automata processing, that supports a richer computational model

slide-14
SLIDE 14
ASPEN Supports Richer Analyses

  • Accelerated in-SRAM Pushdown ENgine
  • Scalable processing engine that uses LLC slices to accelerate Pushdown Automata computation
  • Custom five-stage datapath using SRAM lookups can process up to one byte per cycle
  • Optimizing compiler supports existing grammars, packs states efficiently, and reduces the number of processing stalls
  • Provides additional cache when not in use

slide-15
SLIDE 15
  • Pushdown Automata Refresher
  • Architectural Design of ASPEN
  • Why LLC?
  • Datapath innovation
  • Optimizations
  • Evaluation
  • XML Parsing
  • Subtree Mining

Overview of this Talk


Source: https://www.flickr.com/photos/10623456@N02/45022262771 CC BY-NC 2.0

slide-16
SLIDE 16

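Pushdown Automata Refresher

[Figure: DPDA state diagram with a STACK. Each state is labeled with an input symbol, a top-of-stack match (* matches any symbol), and a stack action:]

  • 0, *: Pop 0, Push ‘0’
  • 1, *: Pop 0, Push ‘1’
  • c, *: Pop 0, No Push
  • 1, 1: Pop 1, No Push
  • 0, 0: Pop 1, No Push
  • ε, ⊥: Pop 0, No Push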

slide-17
SLIDE 17

Pushdown Automata Refresher

Finite State Control

slide-18
SLIDE 18

Pushdown Automata Refresher

Stack Memory

slide-19
SLIDE 19

Pushdown Automata Refresher

Input Symbol Match

slide-20
SLIDE 20

Pushdown Automata Refresher

Top of Stack Match

slide-21
SLIDE 21

Pushdown Automata Refresher

Stack Actions

slide-22
SLIDE 22

Pushdown Automata Refresher

Deterministic Pushdown Automata (DPDA) avoid stack divergence, but still support parsing of most common languages

slide-23
SLIDE 23

Recognizing Palindromes with a Middle Character

Input: 01010c01010

[Animation, slides 23 to 36: the DPDA consumes the input one symbol at a time, pushing each symbol before the middle ‘c’ and popping one matching symbol for each symbol after it. The input is accepted (✓) once it is exhausted with only ⊥ left on the stack.]

slide-37
SLIDE 37
Mapping DPDA Efficiently to Hardware

  • ASPEN supports homogeneous DPDA
  • All transitions to a state occur on the same input character, stack comparison, and stack operation
  • Similar in nature to homogeneous NFAs
  • Same expressive power as standard DPDA
  • State increase is quadratic in the worst case with a fixed alphabet
  • Allows for efficient mapping to hardware resources
  • Transitions decoupled from input/stack matches
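One straightforward way to obtain a homogeneous DPDA is to split every state by the label on its incoming transitions. The sketch below shows that construction; it is an illustration under stated assumptions, not necessarily the algorithm the ASPEN compiler uses. The function `homogenize`, the tuple layout of a label, the state names `q0` to `q2`, and the empty string standing in for ε are all hypothetical, and accepting states are left out for brevity.

```python
def homogenize(transitions, start):
    """Split states so that every state has exactly one incoming label.

    transitions: iterable of (src, label, dst), where label is a hashable
    tuple (input symbol, top-of-stack match, pop count, pushed symbol).
    Returns (states, edges, initial); a homogeneous state is written as
    the pair (original destination state, incoming label).
    """
    transitions = list(transitions)
    outgoing = {}
    for src, label, dst in transitions:
        outgoing.setdefault(src, set()).add((dst, label))
    # One homogeneous state per distinct (destination, label) pair.
    states = {(dst, label) for _, label, dst in transitions}
    # A copy of state q keeps all of q's outgoing transitions.
    edges = {h: outgoing.get(h[0], set()) for h in states}
    # States reachable in one step from the original start state.
    initial = outgoing.get(start, set())
    return states, edges, initial


# Conventional DPDA for palindromes with a middle 'c' (illustrative encoding).
PALINDROME = [
    ("q0", ("0", "*", 0, "0"), "q0"),
    ("q0", ("1", "*", 0, "1"), "q0"),
    ("q0", ("c", "*", 0, None), "q1"),
    ("q1", ("0", "0", 1, None), "q1"),
    ("q1", ("1", "1", 1, None), "q1"),
    ("q1", ("", "⊥", 0, None), "q2"),
]

states, edges, initial = homogenize(PALINDROME, "q0")
print(len(states), len(initial))
```

Because each resulting state carries its own input match, stack match, and stack action, the transition structure (`edges`) no longer needs to store any labels, which is the decoupling named in the last bullet.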

slide-38
SLIDE 38

Five Steps of DPDA Execution Per Cycle

Input: 01010c01010

slide-39
SLIDE 39

Five Steps of DPDA Execution Per Cycle

  • 1. Input Match
  • 2. Stack Match
  • 3. Action Lookup
  • 4. Stack Update
  • 5. State Transition

[Animation, slides 39 to 43: steps 1 through 5 are highlighted in turn on the DPDA diagram]
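The five steps can be mirrored in a short software model. The sketch below runs a homogeneous DPDA for palindromes with a middle ‘c’ and annotates where each step happens. It is a functional model only, not a description of the hardware; the state names (`push0`, `mid`, and so on), the table layout, and the empty string used for ε are assumptions made for the example.

```python
BOTTOM, ANY, EPS = "⊥", "*", ""   # illustrative encodings

# name: (input symbol, top-of-stack match, pop count, symbol to push or None)
STATES = {
    "push0": ("0", ANY, 0, "0"),
    "push1": ("1", ANY, 0, "1"),
    "mid":   ("c", ANY, 0, None),
    "pop0":  ("0", "0", 1, None),
    "pop1":  ("1", "1", 1, None),
    "end":   (EPS, BOTTOM, 0, None),
}
FIRST_HALF = ["push0", "push1", "mid"]
SECOND_HALF = ["pop0", "pop1", "end"]
SUCCESSORS = {
    "push0": FIRST_HALF, "push1": FIRST_HALF,
    "mid": SECOND_HALF, "pop0": SECOND_HALF, "pop1": SECOND_HALF,
    "end": [],
}
ACCEPTING = {"end"}


def accepts(text):
    stack = [BOTTOM]
    candidates = FIRST_HALF      # states that may fire next
    current = None
    i = 0                        # position in the input
    while True:
        tos = stack[-1]
        fired = None
        for name in candidates:
            symbol, want_tos, _, _ = STATES[name]
            # Step 1, input match (an ε-state matches without consuming input).
            input_ok = symbol == EPS or (i < len(text) and text[i] == symbol)
            # Step 2, stack match against the current top of stack.
            stack_ok = want_tos == ANY or want_tos == tos
            if input_ok and stack_ok:
                fired = name     # deterministic: at most one candidate matches
                break
        if fired is None:
            break
        # Step 3, action lookup for the state that fired.
        symbol, _, pop, push = STATES[fired]
        if symbol != EPS:
            i += 1
        # Step 4, stack update: pop first, then push.
        del stack[len(stack) - pop:]
        if push is not None:
            stack.append(push)
        # Step 5, state transition: successors become the next candidates.
        current = fired
        candidates = SUCCESSORS[fired]
    return current in ACCEPTING and i == len(text)


print(accepts("01010c01010"), accepts("01c01"))
```

The ε-state `end` is the only step that does not consume an input symbol, which is the kind of step that shows up as a stall when the goal is one input byte per cycle.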

slide-44
SLIDE 44
Implementing ASPEN in LLC

  • ASPEN repurposes LLC slices for pushdown automata computation
  • Location in LLC supports tighter coupling with CPU operations than a dedicated accelerator
  • PDA often part of a larger workflow
  • ASPEN is similar to an auxiliary functional unit in the CPU (such as an FPU or vector unit)
  • SRAM arrays in LLC already support necessary operations for DPDA execution

slide-45
SLIDE 45
Where is ASPEN?

  • ASPEN uses 2 arrays per bank
  • 240 states per bank
  • Full connectivity within bank
  • Global switch and stack in CBOX for large DPDA

[Figure: LLC slice; labels: CBOX, Way 1, Way 2, Way 20, ASPEN, ASPEN G-switch, ASPEN G-stack]

slide-46
SLIDE 46
Stack Match in SRAM

  • Check all states against top of stack
  • One column of SRAM/state
  • Input TOS as row address
  • “1”: match; “0”: no match
  • Intersect with currently active states

[Figure: SRAM array (rows and columns indexed up to 255) with Row Decoder and 4:1 column mux; inputs: Top of Stack (TOS) and Active State Vector; output: Active States Matching Stack]

slide-47
SLIDE 47

Stack Match in SRAM

[Figure build: columns are labeled with states (S0, S1, S2, S3, S253, S254, S255) and rows with stack symbols (*, 1, ⊥)]

slide-48
SLIDE 48

Stack Match in SRAM

[Figure build: cells where a state matches the stack symbol are filled with 1]
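The lookup described in the bullets (one column per state, the top-of-stack symbol as the row address, then an intersection with the active states) can be modeled with integers used as bit vectors. This is a minimal sketch under assumptions: the helper names `build_stack_match_rows` and `stack_match`, the six-state example, and the bit ordering (state i is bit i) are illustrative and not taken from the talk.

```python
def build_stack_match_rows(state_tos, alphabet):
    """Build one bit-row per stack symbol.

    state_tos[i] is the stack symbol that state i requires on top of the
    stack, or "*" if any symbol matches. Bit i of rows[sym] is 1 when
    state i matches stack symbol sym (one column per state).
    """
    rows = {}
    for sym in alphabet:
        bits = 0
        for col, want in enumerate(state_tos):
            if want == "*" or want == sym:
                bits |= 1 << col
        rows[sym] = bits
    return rows


def stack_match(rows, tos, active):
    # Read the row addressed by the top of stack, then intersect it
    # with the vector of currently active states.
    return rows[tos] & active


# States 0-2 accept any TOS; states 3, 4, 5 need '0', '1', '⊥' respectively.
rows = build_stack_match_rows(["*", "*", "*", "0", "1", "⊥"], "01⊥")
active = 0b111000                      # states 3, 4, and 5 are active
print(format(stack_match(rows, "1", active), "06b"))
```

A single row read answers the match question for every state at once, so the cost of the step does not grow with the number of states mapped to the array.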

slide-49
SLIDE 49

ASPEN Datapath: 240 States per Two SRAM Arrays

  • 1. Input Match
  • 2. Stack Match
  • 3. Action Lookup
  • 4. Stack Update
  • 5. State Transition

[Figure: datapath across SRAM Array0 and SRAM Array1; labels: 8-bit Input, Row Decoder, 4:1 column mux, EN, IM Vector, SM Vector, Active State Vector, 256 x 256 6-T SRAM L-Switch, Push Sym., Pop #, TOS, Local TOS, Stack Pointer (TOS + 1); bus widths: 8b, 240b, 256b]

slide-50
SLIDE 50

ASPEN Datapath: 240 States per Two SRAM Arrays

  • 1. Input Matching: One Column per State

slide-51
SLIDE 51

ASPEN Datapath: 240 States per Two SRAM Arrays

  • 2. Stack Matching: One Column per State

slide-52
SLIDE 52

ASPEN Datapath: 240 States per Two SRAM Arrays

  • 3. Stack Actions: One Row per State

slide-53
SLIDE 53

ASPEN Datapath: 240 States per Two SRAM Arrays

  • 4. Local Stack: One Row per Entry

slide-54
SLIDE 54

ASPEN Datapath: 240 States per Two SRAM Arrays

  • 5. Reconfigurable Transition Matrix

slide-55
SLIDE 55
Optimizations

Goal: Reduce the number of stalls while processing input

  • Average of 65% reduction in epsilon states

Epsilon Merging:
  Before: [A-Z], *: Pop 0, No Push, followed by ε, *: Pop 1, Push ‘a’
  After: [A-Z], *: Pop 1, Push ‘a’

Multipop:
  Before: four consecutive states ε, *: Pop 1, No Push
  After: one state ε, *: Pop 4, No Push
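The multipop rewrite can be sketched as a pass over a linear chain of states. The code below is an illustration only: it assumes the chain has no branches in or out of the merged states, it does not model any hardware limit on the pop count, and the names `merge_multipop`, `is_plain_pop`, and the tuple layout are hypothetical.

```python
EPS, ANY = "", "*"   # illustrative encodings of ε and the wildcard


def is_plain_pop(state):
    """True for an ε-state that matches any TOS, pops, and pushes nothing."""
    symbol, tos, pop, push = state
    return symbol == EPS and tos == ANY and pop > 0 and push is None


def merge_multipop(chain):
    """Collapse runs of consecutive plain-pop ε-states into one state.

    chain: list of (input symbol, TOS match, pop count, pushed symbol),
    assumed to be a straight line of states with no other predecessors
    or successors.
    """
    merged = []
    for state in chain:
        if merged and is_plain_pop(state) and is_plain_pop(merged[-1]):
            # Add this state's pop count to the previous ε-state.
            merged[-1] = (EPS, ANY, merged[-1][2] + state[2], None)
        else:
            merged.append(state)
    return merged


chain = [("a", ANY, 1, "a")] + [(EPS, ANY, 1, None)] * 4
print(merge_multipop(chain))
```

In this example four ε-states, each of which would cost a step without consuming input, become a single ε-state that pops four symbols.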

slide-56
SLIDE 56
Evaluation: Two Real-World Applications

  • XML Parsing
  • Common to many data analyses (needed to read input data)
  • Step in a larger pipeline: Tokenization, Parsing, Validation, DOM construction
  • Pipelined with Cache Automaton for tokenization
  • Single Large DPDA with Global Stack
  • Frequent Subtree Mining
  • Task of identifying subtrees occurring above a threshold frequency in a corpus of trees
  • Common in recommendation systems, packet routing, NLP, etc.
  • Many Small DPDA with Local Stacks

slide-57
SLIDE 57

Experimental Methodology

Component         Max Frequency   Operating Frequency
ASPEN             880 MHz         850 MHz
Cache Automaton   4 GHz           3.4 GHz

  • Baseline Evaluation
  • CPU: 2.6 GHz dual-socket Intel Xeon E5-2697-v3 (28 cores total)
  • GPU: NVIDIA TITAN Xp
  • Performance and Power: PAPI, Intel RAPL, NVIDIA nvprof
  • ASPEN Simulation:
  • METIS graph partitioning framework
  • VASim modified for cycle-accurate DPDA simulation
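The operating frequency above, combined with the earlier statement that the datapath can process up to one byte per cycle, gives a simple upper bound on throughput. The sketch below works through that arithmetic; the function names and the stall count in the example are hypothetical and are not measurements from the talk.

```python
def peak_throughput_mb_s(freq_mhz, bytes_per_cycle=1.0):
    """Upper bound: MHz times bytes per cycle gives 10^6 bytes per second."""
    return freq_mhz * bytes_per_cycle


def effective_throughput_mb_s(freq_mhz, input_bytes, stall_cycles):
    """Throughput when some cycles (e.g. ε-steps) consume no input."""
    total_cycles = input_bytes + stall_cycles
    return freq_mhz * input_bytes / total_cycles


# 850 MHz is the ASPEN operating frequency from the table above.
print(peak_throughput_mb_s(850))
# Hypothetical workload: 1000 input bytes with 250 stall cycles.
print(effective_throughput_mb_s(850, 1000, 250))
```

The gap between the two numbers is what stall-reducing optimizations aim to close.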
slide-58
SLIDE 58
XML Parsing: Parabix, Ximpleware, UW XML

  • ASPEN is 13-18x faster (on average) than popular CPU parsers
  • Performance did not vary significantly with complexity of XML
  • Optimizations and tokenization hide ε-stalls

[Figure: speedup (normalized to Xerces) by markup density (Low (< 0.3), Medium (0.3-0.7), High (> 0.7), Average) for Xerces, Expat, ASPEN, and ASPEN-MP]

slide-59
SLIDE 59
Frequent Subtree Mining

  • ASPEN is (on average) 67x faster than CPUs and 6x faster than GPUs for the end-to-end application
  • Performance on ASPEN is independent of tree size and complexity
  • No ε-transitions

Dataset    Automata Alphabets   Stack Alphabets   Stack Size
T1M        16                   17                29
T2M        38                   39                49
TREEBANK   100                  101               110

[Figure: speedup (normalized to TreeMatcher) on T1M, T2M, and TREEBANK for CPU-Kernel, CPU-Full, GPU-Kernel, and GPU-Full]

slide-60
SLIDE 60

Conclusion

slide-61
SLIDE 61

Conclusion

  • ASPEN: Processor for DPDA Acceleration

slide-62
SLIDE 62

Conclusion

  • Supports Processing of Recursively-Nested and Tree-Structured Data

slide-63
SLIDE 63

Conclusion

  • Homogeneous Deterministic Pushdown Automaton

slide-64
SLIDE 64

Conclusion

  • Custom 5-Stage Datapath