SLIDE 1

Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures

David Sidler, Zsolt István, Muhsen Owaida, Gustavo Alonso

Systems Group, Dept. of Computer Science, ETH Zürich

Systems Group, Dept. of Computer Science, ETH Zürich, SIGMOD 2017, May 16, 2017

SLIDE 4

Increasing amount of user-generated data

| Query (WHERE clause) | MonetDB response time (s) | DBx response time (s) |
|---|---|---|
| LIKE ’%Alan%Turing%Cheshire%’ | 0.02 | 0.43 |
| REGEXP_LIKE(’Alan.*Turing.*Cheshire’) | 0.36 | 8.86 |

Databases are not suitable for complex text queries!

SLIDE 7

Accelerators to the rescue

Using GPUs [1,2] or Xeon Phi [3] to accelerate string matching:

  • High speed-up
  • Data must already reside on the accelerator, otherwise data movement reduces the acceleration benefit
  • Change of data layout
  • Performance depends on pattern complexity
  • Integration into the database engine often unclear

[1] E. Sitaridi, K. Ross, GPU-Accelerated string matching for database applications, VLDB Journal, Oct. 2016
[2] C.-H. Lin, et al., Accelerating regular expression matching using hierarchical parallel machines on GPU, GLOBECOM’11
[3] E. Sitaridi, O. Polychroniou, K. Ross, SIMD-Accelerated regular expression matching, DAMON’16

Data partitioning/movement hinders widespread adoption of database accelerators!

SLIDE 9

New hybrid architectures are emerging

IBM Power8 + CAPI

Source: Heterogeneous computing on POWER, Cesar Diniz Maciel, IBM

Intel Xeon+FPGA

Source: Intel Xeon+FPGA Platform for the Data Center, PK Gupta, Intel

Eliminate the issue of data movement/partitioning


SLIDE 12

Intel Xeon+FPGA prototype platform

[Figure: Xeon E5 with its memory controller and memory, connected via QPI to a Stratix V FPGA that holds an FPGA cache and the user logic]

Version 1 (used in this work):
  • Xeon E5 + Stratix V, connected via QPI
  • Bandwidth: 6.5 GB/s read-heavy, 3 GB/s read/write

Version 2:
  • Larger bandwidth (1×QPI, 2×PCIe)
  • Larger FPGA
  • FPGA in the same package (single socket)

Disclaimer

This is an experimental system provided by Intel; any results presented are generated using pre-production hardware and software, and may not reflect the performance of production or future systems.

SLIDE 13

FPGA (Field Programmable Gate Array)

[Figure: FPGA fabric made of logic blocks (CLB) and on-chip memory (BRAM), connected by programmable routing]

  • Reprogrammable: arbitrary circuits can be loaded onto the FPGA
  • Once programmed, it acts similar to an integrated circuit (at a lower clock frequency)
  • Logic blocks (around 100,000)
  • Fast on-chip memory (36K each)

SLIDE 14

Parameterizable Regular Expression Engine


SLIDE 23

Regular Expression in Hardware

  • A regular expression can be mapped to a non-deterministic finite automaton (NFA)
  • NFAs can be executed efficiently on FPGAs [4,5]

Regular expression: (ab+|ba+)c

Input: adabbbc

[Figure: two automata for (ab+|ba+)c, one with states S0 to S4 (S0 has a * self-loop) and one with states S0 to S5 that adds fallback transitions such as ¬a∧¬b and ¬a∧¬b∧¬c; the active states are highlighted as the input a, ad, ada, adab, adabb, adabbb, adabbbc is consumed one character at a time]

[4] R. Sidhu, V. Prasanna, Fast regular expression matching using FPGAs, FCCM’01
[5] L. Woods, J. Teubner, Complex event detection at wire speed with FPGAs, VLDB’10

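The animation advances the automaton one input character at a time while several states can be active at once. A minimal Python sketch of that idea, assuming our own state numbering (S1: read a, S2: ab+ loop, S3: read b, S4: ba+ loop, S5: accept) rather than the exact layout of the figure: the active states are kept in a bit vector and all of them are updated together for every character, mirroring how an FPGA implementation can update every state in parallel.

```python
# Hypothetical state numbering for (ab+|ba+)c; it need not match the slide's figure.
# S0 keeps a '*' self-loop so that a match may start at any position.
TRANSITIONS = [
    (0, "a", 1), (0, "b", 3),  # S0 -a-> S1, S0 -b-> S3
    (1, "b", 2), (2, "b", 2),  # ab+: S1 -b-> S2, S2 loops on b
    (3, "a", 4), (4, "a", 4),  # ba+: S3 -a-> S4, S4 loops on a
    (2, "c", 5), (4, "c", 5),  # the trailing c leads to the accepting state S5
]
START, ACCEPT = 0, 5


def step(active: int, ch: str) -> int:
    """Compute the next set of active states (one bit per state) for one character."""
    nxt = 1 << START  # the start state stays active because of its '*' self-loop
    for src, label, dst in TRANSITIONS:
        if active & (1 << src) and ch == label:
            nxt |= 1 << dst
    return nxt


def trace(text: str) -> list[list[int]]:
    """Return the list of active states after each character, like the animation."""
    active, history = 1 << START, []
    for ch in text:
        active = step(active, ch)
        history.append([s for s in range(ACCEPT + 1) if active & (1 << s)])
    return history


def first_match_end(text: str) -> int:
    """Return the 1-based position where the first match ends, or -1 if none."""
    for pos, states in enumerate(trace(text), start=1):
        if ACCEPT in states:
            return pos
    return -1
```

For the slide's input, trace("adabbbc") ends with [0, 5]: the final c moves the automaton into its accepting state, so first_match_end("adabbbc") returns 7.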
SLIDE 24

Complexity vs Hardware resources

Regular expression: SIGMOD.*(Chicago|Raleigh)

[Figure: NFA with one state per character (S0 as start state up to S19), spelling out SIGMOD, a * self-loop, and the two branches Chicago and Raleigh]

  • Resource usage and routing are crucial factors in FPGA development
  • FPGA resource usage grows with regular expression complexity
  • If the NFA becomes too large, routing (connecting) its resources might not be possible

⇒ Compress the NFA

SLIDE 28

NFA compression

Regular expression: SIGMOD.*(Chicago|Raleigh)

[Figure: NFA with one state per character, S0 (start) to S19]

Extracted sequences:

  • SIGMOD
  • Chicago
  • Raleigh

[Figure: compressed NFA with only three states, S0 (start), S1 and S2, whose transitions are labeled with the sequences SIGMOD, * and Chicago/Raleigh]

Decouple character encoding from state transitions in the NFA [6]

[6] J. Teubner, L. Woods, Skeleton automata for FPGAs: reconfiguring without reconstructing, SIGMOD’12

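Extracting the literal sequences amounts to a single scan over the expression. A minimal Python sketch, assuming a restricted syntax (literals, .*, parentheses and |) that covers the SIGMOD.*(Chicago|Raleigh) example but not general regular expressions; extract_sequences is a hypothetical helper, not part of the authors' code.

```python
def extract_sequences(regex: str) -> list[str]:
    """Split a restricted regex into maximal literal character sequences.

    Only literals, '.*', '(', '|' and ')' are supported; escapes, character
    classes and quantifiers other than '.*' are out of scope for this sketch.
    """
    sequences: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(regex):
        if regex.startswith(".*", i):
            separator_len = 2
        elif regex[i] in "()|":
            separator_len = 1
        else:
            current.append(regex[i])  # extend the current literal run
            i += 1
            continue
        if current:  # a separator ends the current literal run
            sequences.append("".join(current))
            current = []
        i += separator_len
    if current:
        sequences.append("".join(current))
    return sequences
```

Each extracted sequence can then be matched by one compressed state instead of one state per character: the 20 characters of SIGMOD, Chicago and Raleigh collapse into three sequences.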
SLIDE 30

Character Encoder

[Figure: each input character is compared against the configured characters ’S’, ’I’, ’G’, ’M’, ’O’, ’D’, ’C’, . . .; the Character Encoder's outputs feed the states S0, S1, S2]

  • Enables compression of the NFA by chaining characters into sequences
  • Can check for ranges by comparing against an upper and a lower value
  • Can support case-insensitivity or collations (e.g., a, ae, ä)

The Character Encoder can be parametrized at runtime.

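A minimal Python sketch of such an encoder, assuming each slot stores a lower and an upper bound plus a case-insensitivity flag (CharSlot and encode are hypothetical names). A single character is simply a range whose bounds are equal; multi-character collations such as ä/ae are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharSlot:
    """One configurable encoder entry: matches characters in [lo, hi]."""
    lo: str
    hi: str
    case_insensitive: bool = False

    def matches(self, ch: str) -> bool:
        candidates = {ch, ch.lower(), ch.upper()} if self.case_insensitive else {ch}
        # A range check needs only two comparisons: against the lower and upper bound.
        return any(self.lo <= c <= self.hi for c in candidates)


def encode(slots: list[CharSlot], ch: str) -> int:
    """Return a bit vector with bit i set if slot i matches the input character."""
    bits = 0
    for i, slot in enumerate(slots):
        if slot.matches(ch):
            bits |= 1 << i
    return bits
```

With slots [CharSlot("0", "9"), CharSlot("s", "s", case_insensitive=True), CharSlot("D", "D")], the digit 7 sets bit 0, both "s" and "S" set bit 1, and only an uppercase "D" sets bit 2. Changing the slots changes what is matched without changing the encoder logic, which is the property that makes runtime parametrization possible.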
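SLIDE 34

Runtime parametrization

Regular expression: (a|b).*c

[Figure: the NFA for (a|b).*c next to its hardware configuration. The Encoder maps the input character onto the character slots C1 to C4, here configured as ’a’, ’b’ and ’c’; a trigger matrix (character slots × states S1 to S4) selects which characters activate each state; a state-transition matrix over the fully connected state graph S1 to S4 selects the enabled transitions. The animation highlights the configured characters and the bits set in both matrices.]

Data flow: Input character → Encoder → Characters → Triggers → State Transitions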

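The same fixed logic can run different expressions if only the tables change. A minimal Python sketch of this runtime parametrization, under several assumptions that are ours rather than the slide's: a state becomes active when it is enabled (it hangs off the start state, or an active predecessor has a transition to it) and one of its trigger characters matches; a hypothetical wildcard slot "*" models the "." in ".*"; and the states are numbered 0 to 2 instead of S1 to S4.

```python
from dataclasses import dataclass

ANY = "*"  # hypothetical wildcard slot that matches every character (models '.')


@dataclass
class EngineConfig:
    chars: list[str]              # character slots C1..Cn
    triggers: list[list[int]]     # triggers[state][slot] = 1: slot can activate state
    transitions: list[list[int]]  # transitions[src][dst] = 1: src enables dst
    start: set[int]               # states enabled directly by the start state S0
    accept: set[int]              # accepting states


def encode(cfg: EngineConfig, ch: str) -> list[int]:
    """Character encoder: one bit per configured character slot."""
    return [1 if slot == ANY or slot == ch else 0 for slot in cfg.chars]


def run(cfg: EngineConfig, text: str) -> int:
    """Return the 1-based position where the first match ends, or -1."""
    n = len(cfg.triggers)
    active = [0] * n
    for pos, ch in enumerate(text, start=1):
        enc = encode(cfg, ch)
        nxt = []
        for s in range(n):
            enabled = s in cfg.start or any(
                active[p] and cfg.transitions[p][s] for p in range(n)
            )
            fired = any(t and e for t, e in zip(cfg.triggers[s], enc))
            nxt.append(1 if enabled and fired else 0)
        active = nxt
        if any(active[s] for s in cfg.accept):
            return pos
    return -1


# (a|b).*c with hypothetical states 0: after (a|b), 1: inside .*, 2: after c
CFG = EngineConfig(
    chars=["a", "b", "c", ANY],
    triggers=[[1, 1, 0, 0],   # state 0: 'a' or 'b'
              [0, 0, 0, 1],   # state 1: any character
              [0, 0, 1, 0]],  # state 2: 'c'
    transitions=[[0, 1, 1],   # 0 -> 1 (enter .*), 0 -> 2 (.* matches empty)
                 [0, 1, 1],   # 1 -> 1 (loop), 1 -> 2
                 [0, 0, 0]],
    start={0},
    accept={2},
)
```

Switching to another expression only means passing a different EngineConfig; the run function stays the same, much as the circuit stays the same on the FPGA while only its parameters change.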
SLIDE 36

Configuration vector

[Figure: the configured characters (’a’, ’b’, ’c’), the trigger matrix and the state-transition matrix are all set from a single configuration vector]

Configuration vector: 0x61 0x62 0x63 0x00 0xC0 0x32 0x80 0x08 . . . (the first three bytes are the ASCII codes of ’a’, ’b’ and ’c’)

Configuring the Regex Engine takes only 2 cycles.

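Only the first bytes of the vector on the slide are self-explanatory. A minimal Python sketch of how such a vector could be serialized, assuming four character slots with unused slots set to 0x00 and the trigger and transition bits packed MSB-first after them; the real bit order is not given on the slide, so the sketch does not attempt to reproduce the remaining bytes, and both function names are hypothetical.

```python
def pack_bits(bits: list[int]) -> bytes:
    """Pack a flat list of 0/1 values MSB-first into bytes, zero-padding the last byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def build_config_vector(chars: list[str], num_slots: int,
                        triggers: list[list[int]],
                        transitions: list[list[int]]) -> bytes:
    """Serialize characters, trigger matrix and transition matrix (hypothetical layout)."""
    # One byte per character slot; unused slots are filled with 0x00.
    char_part = bytes(ord(c) for c in chars) + bytes(num_slots - len(chars))
    flat = [b for row in triggers for b in row] + [b for row in transitions for b in row]
    return char_part + pack_bits(flat)
```

Because the whole vector is a handful of bytes, loading it is cheap compared with reprogramming the FPGA, which is consistent with the short configuration time reported on the slide.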
SLIDE 37

Assembly of a Regex Engine

[Figure: a String router splits the input stream into strings and distributes them over several NFAs, each with its own Input Fifo and Result Fifo; a Result Merger combines the results]

Example: the input arrives as JohnSmit | h123Barb | ara0Alex. The strings JohnSmith123 and Barbara0 are processed by different NFAs, which report Match, pos=12 and No Match, respectively.

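A minimal Python sketch of the router/merger structure, assuming round-robin routing and using Python's re module as a stand-in for the hardware NFA. The pattern [0-9]{3} is a hypothetical choice (the slide does not state the expression) picked so that JohnSmith123 matches with the match ending at position 12; reading pos as the end position is also an assumption.

```python
import re
from collections import deque


def route_and_match(strings: list[str], pattern: str, num_engines: int = 4):
    """Distribute strings over per-NFA FIFOs, match them, and merge results in order.

    Python's re module stands in for the hardware NFA; round-robin routing is an assumption.
    """
    regex = re.compile(pattern)
    input_fifos = [deque() for _ in range(num_engines)]
    for idx, s in enumerate(strings):  # string router
        input_fifos[idx % num_engines].append((idx, s))

    result_fifos = [deque() for _ in range(num_engines)]
    for in_fifo, out_fifo in zip(input_fifos, result_fifos):  # one NFA per FIFO pair
        while in_fifo:
            idx, s = in_fifo.popleft()
            m = regex.search(s)
            out_fifo.append((idx, ("Match", m.end()) if m else ("No Match", None)))

    merged = {}
    for out_fifo in result_fifos:  # result merger restores the input order
        merged.update(out_fifo)
    return [merged[i] for i in range(len(strings))]
```

Tagging every string with its index is what lets the merger restore the original order even though the NFAs finish their strings independently.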
SLIDE 38

Integration into Database


SLIDE 39

Integration into MonetDB

  • Column store
  • Simple data layout
  • Minimize memory bandwidth overhead
  • UDF can operate on columns
  • Strings are stored in a heap

[Figure: the string column stores, per OID, an Offset into the String Heap; each heap entry holds metadata (meta), the string itself (e.g., John Doe, Stras..) and padding (pad)]

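A heavily simplified Python sketch of this layout, assuming a 2-byte length as the per-entry metadata and 8-byte alignment. MonetDB's actual string heap format differs, so the names and sizes here are hypothetical. The sketch illustrates that reading a string means following an offset from the fixed-width column into the variable-length heap.

```python
import struct

ALIGN = 8  # assumed alignment of heap entries


def build_string_column(values: list[str]) -> tuple[list[int], bytes]:
    """Return (offset column, string heap) for a list of strings."""
    heap = bytearray()
    offsets = []
    for v in values:
        data = v.encode("utf-8")
        offsets.append(len(heap))              # the column stores only the offset
        heap += struct.pack("<H", len(data))   # 'meta': 2-byte length (assumption)
        heap += data
        heap += bytes(-len(heap) % ALIGN)      # 'pad' up to the next aligned offset
    return offsets, bytes(heap)


def read_string(heap: bytes, offset: int) -> str:
    """Follow an offset into the heap and decode the string stored there."""
    (length,) = struct.unpack_from("<H", heap, offset)
    start = offset + 2
    return heap[start:start + length].decode("utf-8")
```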
SLIDE 43

System Overview

[Figure: on the CPU, MonetDB with the UDF; in CPU-FPGA shared memory, the MonetDB columns, the job queue, and the job parameters and status; on the FPGA, Centaur with the job distributor (Job Dist.) and Regex Engines 1 to 4]

  • Database extended with a hardware UDF (HUDF)
  • Centaur [7] bridges the gap between the database and the hardware operators
  • Each engine can process data at 6.4 GB/s

[7] M. Owaida, D. Sidler, Centaur: A Framework for Hybrid CPU-FPGA Databases, FCCM’17

SLIDE 46

Execution Walkthrough

[Figure: system overview (MonetDB, UDF, user query; shared memory with MonetDB columns, result columns, job queue, parameters and status; Centaur, job distributor and Regex Engines 1 to 4 on the FPGA) with steps 1 to 3 marked]

  1. A query containing a regular expression is submitted.
  2. MonetDB calls the Hardware UDF.
  3. The UDF converts the regular expression into a configuration vector and allocates the result column.

SLIDE 49

Execution Walkthrough

[Figure: execution walkthrough diagram with steps 4 to 6 marked]

  4. Centaur allocates memory for the job parameters and the job status.
  5. The job is enqueued into the job queue.
  6. The Job Distributor fetches the job from the job queue and assigns it to an idle Regex Engine.

SLIDE 52

Execution Walkthrough

[Figure: execution walkthrough diagram with steps 7 to 9 marked]

  7. The Regex Engine reads the parameters from shared memory, configures itself with the configuration vector and starts execution.
  8. After termination, the done bit is set.
  9. The UDF waits on the done bit and then hands the result column over to MonetDB.

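The nine steps form a simple producer/consumer protocol around shared memory. A minimal Python sketch of that protocol, where everything is a stand-in chosen for illustration: a queue.Queue plays both the job queue and the job distributor (an idle engine thread takes the next job), Python threads play the Regex Engines, the re module plays the NFA, and a threading.Event plays the done bit. Job, regex_engine and hardware_udf are hypothetical names.

```python
import queue
import re
import threading
from dataclasses import dataclass, field


@dataclass
class Job:
    pattern: str       # stands in for the configuration vector (step 3)
    column: list[str]  # input column in shared memory
    result: list[int]  # result column allocated by the UDF (step 3)
    done: threading.Event = field(default_factory=threading.Event)  # done bit (steps 8-9)


def regex_engine(job_queue: "queue.Queue[Job | None]") -> None:
    """One engine: take a job whenever idle, process it, then signal completion."""
    while True:
        job = job_queue.get()         # step 6: an idle engine receives the next job
        if job is None:               # sentinel used only to stop this simulation
            return
        rx = re.compile(job.pattern)  # step 7: configure the engine
        for i, s in enumerate(job.column):
            job.result[i] = 1 if rx.search(s) else 0
        job.done.set()                # step 8: set the done bit


def hardware_udf(job_queue: "queue.Queue[Job | None]", pattern: str,
                 column: list[str]) -> list[int]:
    """Software side of the call: prepare the job, enqueue it, wait for the done bit."""
    job = Job(pattern, column, result=[0] * len(column))  # steps 3-4
    job_queue.put(job)                # step 5
    job.done.wait()                   # step 9
    return job.result
```

In use, a few regex_engine threads are started on a shared queue, hardware_udf is called like a normal function, and one None per thread shuts the engines down. The caller only blocks on its own job's done flag, so several clients can share the engines concurrently.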
SLIDE 53

Evaluation


SLIDE 54

Evaluation - Queries

Q1: SELECT count(*) FROM address_table WHERE address_string LIKE '%Strasse%';
Q2: SELECT count(*) FROM address_table WHERE REGEXP_LIKE(address_string, '(Strasse|Str\.).*(8[0-9]{4})');
Q3: SELECT count(*) FROM address_table WHERE REGEXP_LIKE(address_string, '[0-9]+(USD|EUR|GBP)');
Q4: SELECT count(*) FROM address_table WHERE REGEXP_LIKE(address_string, '[A-Za-z]{3}\:[0-9]{4}');

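To make the four predicates concrete, a small Python sketch evaluates them with the re module over a few made-up address strings. The rows are hypothetical, not the evaluation data set; REGEXP_LIKE is treated as an unanchored search and LIKE '%Strasse%' as a substring test.

```python
import re

# The four predicates from Q1-Q4, evaluated with Python's re module.
PREDICATES = {
    "Q1": re.compile(re.escape("Strasse")),  # LIKE '%Strasse%' is a plain substring test
    "Q2": re.compile(r"(Strasse|Str\.).*(8[0-9]{4})"),
    "Q3": re.compile(r"[0-9]+(USD|EUR|GBP)"),
    "Q4": re.compile(r"[A-Za-z]{3}\:[0-9]{4}"),
}


def count_matches(rows: list[str], query: str) -> int:
    """Equivalent of SELECT count(*) FROM address_table WHERE <predicate>."""
    return sum(1 for row in rows if PREDICATES[query].search(row))
```

With the hypothetical rows "Leopold Strasse 12, 80802 Muenchen", "Haupt Str. 5, 8001 Zuerich", "Invoice 250EUR", "Ticket ZRH:2017" and "Bahnhof Strasse 1, 3011 Bern", Q1 counts two rows and Q2 to Q4 one row each; the second row fails Q2 because 8001 has only three digits after the 8.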
SLIDE 55

Evaluation - Microbenchmark

[Figure: response time [ms] for Q1 to Q4, broken down into Database, UDF (software part), Config. Gen., Centaur and Hardware Processing]

Query over a small relation with 10K tuples

SLIDE 56

Evaluation - Throughput

[Figure: two log-scale plots of queries/s versus number of clients (1 to 10) for Q1, Q2, Q3, Q4 and the mix Q1,2,3,4; left: MonetDB vs. FPGA, right: DBx vs. FPGA]

SLIDE 57

Evaluation - TPC-H Q13

[Figure: response time [s] of TPC-H Q13 in its original and a case-insensitive variant, MonetDB vs. FPGA]

Scaling factor set to 0.1 due to limited memory space

SLIDE 59

Comparison to Accelerators

| | GPU [1] | GPU [2] | Xeon Phi [3] | Our work |
|---|---|---|---|---|
| Regex evaluation | No | Yes | Yes | Yes |
| Complexity-independent performance | Yes | No | No | Yes |
| Throughput, local data [GB/s] | 60-70 | 10-15 | 30-40 | 25.6* |
| Throughput, host data [GB/s] | – | 1-5 | – | 6.4 |
| Architecture | fast GDDR memory | fast GDDR memory | 60-70 cores, GDDR5 memory | specialized core, direct memory access |

* Without the memory bandwidth limitation (4 Regex Engines × 6.4 GB/s)

[1] E. Sitaridi, K. Ross, GPU-Accelerated string matching for database applications, VLDB Journal, Oct. 2016
[2] C.-H. Lin, et al., Accelerating regular expression matching using hierarchical parallel machines on GPU, GLOBECOM’11
[3] E. Sitaridi, O. Polychroniou, K. Ross, SIMD-Accelerated regular expression matching, DAMON’16

Your next CPU might come with an FPGA!

SLIDE 60

Visit our Demo!

More Information: systems.ethz.ch/fpga/db acceleration Code on GitHub: github.com/fpgasystems/dobbiodb
