

SIMD Vectorized Hashing for Grouped Aggregation

Bala Gurumurthy, David Broneske, Marcus Pinnecke, Gabriel Campero Durand and Gunter Saake

04.09.2018

OVGU Presentation, 16.05.2017


Grouped Aggregation

  • A commonly used and time-consuming operation
  • All input must be consumed before even a single output can be produced
  • Faster input processing means higher throughput
  • Improving the underlying technique therefore improves efficiency

Based on the analysis by Boncz et al. [1]


SIMD Capability in Modern Processors

  • SIMD: Single Instruction, Multiple Data
  • Allows vectorized execution in modern processors
  • Reduces the overall execution time of an operation
  • SIMD has been shown to increase the throughput of DBMS operations by orders of magnitude [2] [3]

(Figure: SIMD-accelerated selection [3])


SIMD for Grouped Aggregation

  • SIMD acceleration of hashing techniques improves throughput

(Figure: SIMD + Group-By = High throughput)

How can SIMD be incorporated into grouped aggregation? What is the impact of SIMD?


Hash-Based Aggregation

  • Grouped aggregation is commonly implemented using hashing techniques
  • Hashing separates the groups into buckets
  • Aggregation is done within each bucket
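The bucket-then-aggregate idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `grouped_count`, the toy bucket function `key % num_buckets` and the per-bucket lists are all hypothetical, and count() stands in for the aggregate. The techniques studied in this deck use open addressing rather than per-bucket lists.

```python
from typing import Dict, List


def grouped_count(keys: List[int], num_buckets: int = 8) -> Dict[int, int]:
    """Return {key: count()} by first separating the keys into buckets."""
    # Each bucket holds [key, aggregate] entries for the groups hashed to it.
    buckets: List[List[List[int]]] = [[] for _ in range(num_buckets)]
    for key in keys:
        bucket = buckets[key % num_buckets]  # toy bucket function (assumption)
        for entry in bucket:
            if entry[0] == key:              # group exists: update its aggregate
                entry[1] += 1
                break
        else:                                # first occurrence: create the group
            bucket.append([key, 1])
    # Aggregation happened per bucket; merge the buckets into one result.
    return {key: count for bucket in buckets for key, count in bucket}
```

For example, `grouped_count([3, 1, 11, 3, 3])` returns `{1: 1, 3: 3, 11: 1}`: keys 3 and 11 share a bucket (3 % 8 == 11 % 8) but keep separate aggregates because entries are compared by key.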

Hash-Based Aggregation: Example

(Figure: input key 3 is mapped by the hash function h(x) to a slot of a table with the columns Key and Aggregate)

Collision: Increasing Complexity

  • Not all keys have a unique location
  • Two keys might hash to the same slot
  • The hash table must then be probed for an alternative location

(Figure: keys 1 and 11 map via h(x) to the same slot; finding an alternative location for 11 takes 4 probes)

Probing is time-consuming.
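The cost of a collision can be made concrete by counting probes. The Python sketch below assumes a plain list as the table, 0 as the empty marker (matching the evaluation's rule that 0 is not a valid key), a toy hash `x % size`, and linear probing as the probe sequence; `probe` is a hypothetical helper.

```python
from typing import List, Tuple

EMPTY = 0  # assumption: 0 marks an empty slot, so 0 is never a key


def probe(table: List[int], key: int) -> Tuple[int, int]:
    """Return (slot, number_of_probes) for `key` using linear probing.

    The returned slot holds either `key` itself or the first empty slot,
    i.e. the place where `key` would be inserted.
    """
    size = len(table)
    start = key % size                       # toy hash h(x) = x mod size
    for probes in range(1, size + 1):
        slot = (start + probes - 1) % size   # wrap around at the table end
        if table[slot] == key or table[slot] == EMPTY:
            return slot, probes
    raise RuntimeError("table is full and does not contain the key")
```

With keys 1, 21 and 31 in slots 1 to 3 of a 10-slot table (all three hash to slot 1), looking up key 11 returns `(4, 4)`: four probes before the empty slot 4 is reached.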

SIMD for Probing

  • Multiple slots are probed at once using SIMD
  • Reduces the overall number of probes

(Figure: the same lookup via h(x) now needs only 1 probe)
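A rough Python model of SIMD probing follows. Each list of `W` values stands for one vector register and each list comprehension for one lane-wise instruction; the AVX2 intrinsics named in the comments are only indicative. `W = 8`, the toy hash and the empty marker 0 are assumptions, and wrap-around at the table end is handled with a modulo that real vector code would likely avoid, for example by padding the table.

```python
from typing import List, Tuple

EMPTY = 0  # assumption: 0 marks an empty slot
W = 8      # lanes per vector: 8 x 32-bit keys fit one 256-bit AVX2 register


def block_probe(table: List[int], key: int) -> Tuple[int, int]:
    """Return (slot, number_of_vector_probes), checking W slots per probe."""
    size = len(table)
    start = key % size                          # toy hash (assumption)
    broadcast = [key] * W                       # cf. _mm256_set1_epi32(key)
    for probes in range(1, size // W + 2):
        base = start + (probes - 1) * W
        lanes = [table[(base + i) % size] for i in range(W)]  # one vector load
        # Lane-wise equality tests, cf. _mm256_cmpeq_epi32 on real hardware.
        hit = [a == b for a, b in zip(lanes, broadcast)]
        empty = [a == EMPTY for a in lanes]
        for i in range(W):                      # first lane that ends the search
            if hit[i] or empty[i]:
                return (base + i) % size, probes
    raise RuntimeError("table is full and does not contain the key")
```

With keys 1, 17 and 33 in slots 1 to 3 of a 16-slot table (all three hash to slot 1), looking up key 49 takes four scalar probes, whereas `block_probe` returns `(4, 1)`: a single vector probe.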


SIMD Accelerated Hash Probing

  • Each hashing technique has its own collision-resolution mechanism
  • We use open-addressing hashing techniques
  • These have a constant hash table size
  • This makes them suitable for SIMD

Hashing techniques used:
  ➢ Cuckoo hashing
  ➢ Linear probing
  ➢ Two-choice hashing
  ➢ Hopscotch hashing


Cuckoo Hashing

  • Stores keys in multiple hash tables
  • On a collision, the current value is swapped into an alternative table
  • Swaps might form a loop; this is resolved using a threshold
  • Has constant look-up time

(Figure: insertion of input key 1)


Grouped Aggregation using Cuckoo Hashing

  • Probe for the key in each hash table
  • If found, update the aggregate
  • Otherwise, insert the key
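The three steps above can be sketched as a scalar Python routine for count(). Assumptions, none taken from the slides: two tables, the multipliers in `A`, the swap threshold `MAX_KICKS = 32`, and raising an error when the threshold is hit, where a real implementation would rehash or grow the tables.

```python
from typing import Dict, List, Optional, Tuple

A = (2654435761, 40503)  # hypothetical multipliers, one per hash table
MAX_KICKS = 32           # hypothetical swap threshold that breaks eviction loops

Entry = Optional[Tuple[int, int]]  # (key, count), or None for an empty slot


def h(t: int, key: int, size: int) -> int:
    """Multiplicative hash for table t, wrapped to 32 bits, then mod size."""
    return ((A[t] * key) & 0xFFFFFFFF) % size


def cuckoo_count(keys: List[int], size: int = 16) -> Dict[int, int]:
    tables: List[List[Entry]] = [[None] * size for _ in range(2)]
    for key in keys:
        # 1) Probe for the key in each hash table; if found, update the count.
        for t in range(2):
            pos = h(t, key, size)
            entry = tables[t][pos]
            if entry is not None and entry[0] == key:
                tables[t][pos] = (key, entry[1] + 1)
                break
        else:
            # 2) Otherwise insert (key, 1); an occupant is swapped out to the
            #    other table, and so on, until an empty slot is reached.
            item: Entry = (key, 1)
            t = 0
            for _ in range(MAX_KICKS):
                pos = h(t, item[0], size)
                item, tables[t][pos] = tables[t][pos], item
                if item is None:
                    break
                t = 1 - t
            else:
                raise RuntimeError("swap threshold reached; rehash needed")
    return {e[0]: e[1] for table in tables for e in table if e is not None}
```

With `size=16`, keys 1 and 17 collide in the first table; inserting 17 swaps key 1 into the second table, and `cuckoo_count([1, 17, 1, 17, 17])` ends with a count of 2 for key 1 and 3 for key 17.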


SIMD Optimization of Cuckoo Hashing

  • The hash function is computed for multiple tables at once
  • Multiple slots are probed in parallel
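A Python model of the vectorized hash step: the key is broadcast to one lane per table, all lanes are multiplied at once, and the candidate slots are then compared in parallel. The two multipliers and the helper names are hypothetical, and the intrinsics in the comments only indicate the kind of AVX2 instruction involved.

```python
from typing import List, Sequence

# Hypothetical odd multipliers: one per hash table, i.e. one per SIMD lane.
MULTIPLIERS = [2654435761, 2246822519]
MASK32 = 0xFFFFFFFF


def vector_hash(key: int, table_size: int) -> List[int]:
    """Bucket position of `key` in every table, computed lane by lane."""
    lanes = [key] * len(MULTIPLIERS)                      # broadcast the key
    # Lane-wise 32-bit multiply (low half kept, cf. _mm256_mullo_epi32).
    products = [(k * a) & MASK32 for k, a in zip(lanes, MULTIPLIERS)]
    # AVX2 has no integer modulo; with a power-of-two size this is an AND.
    return [p % table_size for p in products]


def vector_probe(tables: Sequence[Sequence[int]], key: int) -> List[bool]:
    """Probe the candidate slot of every table in parallel."""
    positions = vector_hash(key, len(tables[0]))
    # One gather fetches the candidate slot of each table (cf. a gather such
    # as _mm256_i32gather_epi32 over tables stored back to back).
    candidates = [tables[t][pos] for t, pos in enumerate(positions)]
    return [c == key for c in candidates]                 # lane-wise compare
```

`vector_hash(1, 16)` returns `[1, 7]`, the candidate buckets of key 1 in the two tables.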


Linear Probing

  • Straightforward approach to collision resolution
  • Probes the hash table linearly for an alternative location
  • When an empty location is encountered, the key is inserted


SIMD Optimization of Linear Probing

  • Multiple slots are probed at once
  • The comparison mask is used to update the aggregate


Two-Choice Hashing

  • Improvement on linear probing
  • Two hash functions for the same hash table (can be more)
  • If all candidate slots are occupied, probing is done from both slots

(Figure: key 11 mapped into the same table by the two hash functions)
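Two-choice hashing for count() can be sketched as follows. How the two probe sequences are combined is not specified on the slide; this sketch assumes they are visited alternately (h1, h2, h1+1, h2+1, ...). The multipliers, the 0 empty marker and the SoA layout are also assumptions.

```python
from typing import Dict, List

EMPTY = 0                        # assumption: 0 marks an empty slot
A1, A2 = 2654435761, 2246822519  # hypothetical multipliers for h1 and h2


def two_choice_count(keys: List[int], size: int = 16) -> Dict[int, int]:
    table_keys = [EMPTY] * size   # SoA layout: keys and counts in
    counts = [0] * size           # separate arrays (assumption)
    for key in keys:
        h1 = ((A1 * key) & 0xFFFFFFFF) % size
        h2 = ((A2 * key) & 0xFFFFFFFF) % size
        # Visit both probe sequences alternately: h1, h2, h1+1, h2+1, ...
        for i in range(2 * size):
            slot = ((h1 if i % 2 == 0 else h2) + i // 2) % size
            if table_keys[slot] == key:      # group found: update aggregate
                counts[slot] += 1
                break
            if table_keys[slot] == EMPTY:    # empty slot: insert the group
                table_keys[slot], counts[slot] = key, 1
                break
        else:
            raise RuntimeError("table full")
    return {k: c for k, c in zip(table_keys, counts) if k != EMPTY}
```

In a 16-slot table, keys 1, 17 and 33 all map to slot 1 under h1. Key 17 lands in its h2 slot, while 33 finds both of its slots taken and is placed one step further along its h1 sequence. Because slots are never freed, a later lookup follows the same alternating order and meets the key before any empty slot.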


SIMD Optimization of Two-Choice Hashing

  • Reuses the SIMD hash function from cuckoo hashing
  • Reuses the SIMD probing from linear probing

Hopscotch Hashing

  • Has a limited probe length, known as the neighborhood
  • A key is always located within neighborhood distance of its hash slot
  • If no slot is available within the neighborhood, the table is rearranged by swapping keys

(Figure: neighborhood size 3 with keys x (0), y (1), z (2); a new key (0) is inserted after keys are swapped)
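The neighborhood rule and the swap step can be sketched in Python. This is a simplified illustration: the toy hash `key % size`, the 0 empty marker, `H = 3` taken from the slide example, and a swap strategy that tries the farthest movable key first are all choices made here, not details from the slides.

```python
from typing import Dict, List

EMPTY = 0  # assumption: 0 marks an empty slot
H = 3      # neighborhood size, as in the slide example


class Hopscotch:
    """Minimal hopscotch table for count() aggregation (illustrative sketch)."""

    def __init__(self, size: int = 16) -> None:
        self.size = size
        self.keys: List[int] = [EMPTY] * size
        self.counts: List[int] = [0] * size

    def home(self, key: int) -> int:
        return key % self.size             # toy hash function (assumption)

    def dist(self, a: int, b: int) -> int:
        return (b - a) % self.size         # forward distance from slot a to b

    def add(self, key: int) -> None:
        start = self.home(key)
        # 1) The key can only live within H slots of its home: probe them.
        for d in range(H):
            slot = (start + d) % self.size
            if self.keys[slot] == key:
                self.counts[slot] += 1
                return
        # 2) Probe linearly, also beyond the neighborhood, for an empty slot.
        free = next((s % self.size for s in range(start, start + self.size)
                     if self.keys[s % self.size] == EMPTY), None)
        if free is None:
            raise RuntimeError("table full")
        # 3) Swap earlier keys forward until the empty slot is in the neighborhood.
        while self.dist(start, free) >= H:
            for back in range(H - 1, 0, -1):
                cand = (free - back) % self.size
                # The key at cand may move only if `free` stays in its neighborhood.
                if self.dist(self.home(self.keys[cand]), free) < H:
                    self.keys[free], self.counts[free] = self.keys[cand], self.counts[cand]
                    self.keys[cand], self.counts[cand] = EMPTY, 0
                    free = cand
                    break
            else:
                raise RuntimeError("no valid swap; the table would need a resize")
        self.keys[free], self.counts[free] = key, 1

    def result(self) -> Dict[int, int]:
        return {k: c for k, c in zip(self.keys, self.counts) if k != EMPTY}
```

With keys 1 and 17 (home slot 1), 3 (home 3) and 4 (home 4) in slots 1 to 4, adding 33 (home 1) finds the first empty slot at 5, outside its neighborhood. Key 3 moves from slot 3 to slot 5, and 33 takes slot 3.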


SIMD Optimization of Hopscotch Hashing

  • SIMD for probing
    • Uses SIMD linear probing
  • SIMD for insertion
    • Swaps multiple values at once:
      • Collect the values using SIMD gather
      • Swap them
      • Store them back


Evaluation Setup

  • Environment
    • Processor: Intel Xeon E5-2630 (octa-core)
    • OS: Linux
    • SIMD: AVX2 instruction set
  • Grouped aggregation
    • Keys: 32-bit integers (0 is not a valid key)
    • Aggregation: count()
  • Hashing technique
    • Hash function: multiplicative hashing, h(x) = A·x % tableSize
    • A: Knuth's number
  • Data distributions
    • Uniform random
    • Sequential
    • Unique random
    • Moving cluster
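The hash function from the setup can be written out as follows. Two details are assumptions: the value of Knuth's number (taken here as 2654435761, a prime close to 2^32 divided by the golden ratio) and that A·x wraps around at 32 bits, as an unsigned 32-bit multiplication in C would.

```python
KNUTH_A = 2654435761  # assumption: prime close to 2**32 / golden ratio
MASK32 = 0xFFFFFFFF   # keys and products treated as unsigned 32-bit integers


def mult_hash(x: int, table_size: int) -> int:
    """h(x) = (A * x) % tableSize, with A * x wrapped to 32 bits as in C."""
    if x == 0:
        raise ValueError("0 is not a valid key")  # mirrors the setup slide
    return ((KNUTH_A * x) & MASK32) % table_size
```

`[mult_hash(k, 1000) for k in (1, 2, 3)]` gives `[761, 226, 987]`. With a power-of-two table size, `% tableSize` keeps only the low bits of the product.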


Unique Random Distribution

  • Serial linear probing has the worst execution time
  • Serial hopscotch hashing is efficient
  • Vectorized two-choice hashing competes with hopscotch hashing


Uniform Random Distribution

  • Both serial and vectorized hopscotch hashing have poor efficiency
    • Vectorized hopscotch hashing has the worst execution time
  • Vectorized two-choice hashing has the best execution time

Other distributions show the same characteristics.


Overall Speed-up

  • SIMD has a negative impact on hopscotch hashing
    • Nearly 2x slower
    • Due to random accesses during insertion
  • Linear probing benefits most from SIMD


Summary

  • Hash probing reduces the efficiency of grouped aggregation
  • SIMD-accelerated probing does not always improve efficiency
    • Specifically, SIMD has a negative impact on hopscotch hashing
    • Linear probing achieves up to 3.5x speed-up due to SIMD
  • Hashing-technique-specific parameters can also improve efficiency
  • The improvement can be extended further by using SIMD to insert multiple keys


Questions? Thank You


References

[1] Boncz, P., Neumann, T., & Erling, O. (2014). TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. Springer International Publishing.
[2] Polychroniou, O., Raghavan, A., & Ross, K. A. (2015). Rethinking SIMD Vectorization for In-Memory Databases. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15).
[3] Broneske, D., Meister, A., & Saake, G. (2017). Hardware-Sensitive Scan Operator Variants for Compiled Selection Pipelines. BTW.


SIMD Cuckoo Hashing – Table Structure

  • Table structure
    • Packed key and payload
    • Each bucket contains one key/payload pack
    • Pack size = SIMD vector size
  • Ross et al. explored SIMD probing [4]
SIMD Accelerated Cuckoo Hashing

Adapted from [6]

  • Stages: hash function computation (multiplicative hashing), then table probe
  • The search key K is duplicated across all SIMD lanes
  • The hash function returns two bucket positions
  • The values in the table are probed and compared with the search key
  • The comparison results (masks) are added to the payloads in the slots
  • If all masks are 0, the key is inserted
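The probe steps above can be modeled in Python on packed buckets. Assumptions: each bucket is one 8-lane vector holding four interleaved (key, payload) pairs, the two multipliers in `A` are hypothetical, 0 marks an empty key lane, and the insertion path simply fills the first free lane instead of performing cuckoo eviction.

```python
from typing import List

EMPTY = 0   # assumption: 0 marks an empty key lane
PAIRS = 4   # 4 packed (key, payload) pairs = 8 x 32-bit lanes = one AVX2 vector
A = (2654435761, 2246822519)  # hypothetical multipliers for the two hashes


def new_table(buckets: int) -> List[List[int]]:
    # Each bucket is one packed vector: [k0, p0, k1, p1, k2, p2, k3, p3].
    return [[EMPTY, 0] * PAIRS for _ in range(buckets)]


def simd_cuckoo_count(table: List[List[int]], key: int) -> None:
    n = len(table)
    # The hash step returns two bucket positions (deduplicated so that a key
    # whose two hashes coincide is not counted twice in this sketch).
    positions = list(dict.fromkeys(((a * key) & 0xFFFFFFFF) % n for a in A))
    found = False
    for pos in positions:
        bucket = table[pos]
        # Compare the key lanes with the duplicated search key: 1 = match.
        mask = [1 if bucket[2 * i] == key else 0 for i in range(PAIRS)]
        # Add the mask to the payload lanes, incrementing the matching count.
        # (On AVX2 a match lane is all ones, i.e. -1, so real code would
        # subtract the mask or AND it with 1 first.)
        for i in range(PAIRS):
            bucket[2 * i + 1] += mask[i]
        found = found or any(mask)
    if not found:
        # All masks were 0: insert into the first empty key lane. Cuckoo
        # eviction is deliberately left out of this sketch.
        for pos in positions:
            bucket = table[pos]
            for i in range(PAIRS):
                if bucket[2 * i] == EMPTY:
                    bucket[2 * i], bucket[2 * i + 1] = key, 1
                    return
        raise RuntimeError("both buckets full; an eviction would be needed")
```

After counting the keys 1, 17, 1, 8, 1 into a 16-bucket table, bucket 1 starts with `[1, 3, 17, 1]`: key 1 with count 3 and key 17 with count 1.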


Linear Probing

  • Table structure
    • SoA (Structure of Arrays) format
    • Keys and payloads are stored in separate arrays
    • Each payload is stored at the position of its key

Linear Probing – SIMD Code Optimization

  • The search key K is duplicated across all SIMD lanes
  • A scalar hash function selects the table bucket
  • The slot values are compared with the search key
  • The comparison results are added to the payloads
  • Depending on the comparison result:
    • Equality: return
    • Inequality: search the next slots
    • Empty location: insert the value
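The steps above can be modeled in Python over an SoA table with separate `keys` and `counts` arrays. `W = 8`, the toy hash and 0 as the empty marker are assumptions; each list comprehension stands for one vector instruction, and the modulo wrap-around at the table end is a simplification.

```python
from typing import List

EMPTY = 0  # assumption: 0 marks an empty slot (0 is not a valid key)
W = 8      # lanes per vector: 8 x 32-bit keys for AVX2


def simd_lp_count(keys: List[int], counts: List[int], key: int) -> None:
    """Count one occurrence of `key` in an SoA linear-probing table."""
    size = len(keys)
    if size < W:
        raise ValueError("table must have at least W slots")
    start = key % size                            # scalar hash (toy, assumption)
    for block in range(size // W + 1):
        idx = [(start + block * W + i) % size for i in range(W)]
        lanes = [keys[j] for j in idx]            # load W keys as one vector
        mask = [1 if k == key else 0 for k in lanes]  # compare with broadcast K
        for j, m in zip(idx, mask):               # add the comparison results
            counts[j] += m                        # to the payloads
        if any(mask):                             # equality: done
            return
        empty = [j for j, k in zip(idx, lanes) if k == EMPTY]
        if empty:                                 # empty location: insert
            keys[empty[0]], counts[empty[0]] = key, 1
            return
        # inequality in every lane: continue with the next W slots
    raise RuntimeError("table full")
```

Counting the keys 1, 17, 1, 33, 17, 1 into a 16-slot table leaves keys `[1, 17, 33]` in slots 1 to 3 with counts `[3, 2, 1]`.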

SIMD Accelerated Hopscotch Hashing

  • SIMD for probing
    • Uses SIMD linear probing
    • Probes for the key within the neighborhood
    • Probes for empty space outside it
  • SIMD for insertion
    • Starts when an empty space is found
    • Previous values are swapped until the empty space is inside the neighborhood:
      • A swap array holds the keys to swap; SIMD gather collects these keys from the hash table
      • A SIMD shift moves the keys one step
      • Using the positions, the new values are written back
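One possible reading of the gather, shift and write-back steps, modeled in Python. The helper name `shift_insert` and the layout of the swap array are hypothetical, and the caller is assumed to have checked that every moved key stays within its own neighborhood. On AVX2 the gather could use `_mm256_i32gather_epi32` and the one-lane shift a permute such as `_mm256_permutevar8x32_epi32`. AVX2 has no scatter instruction, so the write-back would be scalar stores, or one contiguous store when the positions are consecutive.

```python
from typing import List

EMPTY = 0  # assumption: 0 marks an empty slot


def shift_insert(table: List[int], swap_pos: List[int], new_key: int) -> None:
    """Move the keys at swap_pos one step along the chain and insert new_key.

    swap_pos (the "swap array") lists slot positions from the insertion slot
    up to and including the empty slot, e.g. [3, 4, 5] where slot 5 is empty.
    """
    if table[swap_pos[-1]] != EMPTY:
        raise ValueError("last position must be the empty slot")
    gathered = [table[p] for p in swap_pos]   # gather the keys in one step
    shifted = [new_key] + gathered[:-1]       # shift every key one lane along
    for p, k in zip(swap_pos, shifted):       # write back using the positions
        table[p] = k
```

With keys 30 and 40 in slots 3 and 4 and slot 5 empty, `shift_insert(table, [3, 4, 5], 99)` leaves `[99, 30, 40]` in slots 3 to 5. Payloads would be shifted in the same way.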