SLIDE 1

Hardware-Sensitive Scan Operator Variants for Compiled Selection Pipelines

David Broneske, Andreas Meister, Gunter Saake

Databases and Software Engineering (DBSE), University of Magdeburg

SLIDE 2

Introduction: Query Compilation

Example query plan:

    γ sum(A*B)
      ⋈ lo_orderdate = d_datekey
        σ lo_discount …, lo_quantity   (Lineorder)
        σ d_year=1993                  (Dates)

SLIDE 4

Introduction: Query Compilation

  • Bandwidth-bound -> compute-bound
  • Possibility for code optimizations

SLIDE 6

Motivating Examples

Branching (List. 1: Branching scan):

    for (int i = 0; i < input_size; ++i) {
        if (col[i] < pred)
            agg += agg_col[i];
    }
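A self-contained version of the branching scan might look as follows. This is a minimal sketch, assuming 32-bit integer columns stored in `std::vector` and a 64-bit accumulator; the function name `branching_scan` and the column types are illustrative assumptions, not taken from the slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Branching scan: sums agg_col[i] for every row whose col[i] is below pred.
// Assumption: col and agg_col have the same length.
std::int64_t branching_scan(const std::vector<std::int32_t>& col,
                            const std::vector<std::int32_t>& agg_col,
                            std::int32_t pred) {
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (col[i] < pred) {  // data-dependent branch, may be mispredicted
            agg += agg_col[i];
        }
    }
    return agg;
}
```

Whether the compiler keeps the branch or rewrites the loop into branch-free or vectorized code depends on the compiler and its flags, so measurements of this variant should be checked against the generated assembly.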

SLIDE 7

Motivating Examples

Predicated:

    for (int i = 0; i < input_size; ++i) {
        agg += agg_col[i] * (col[i] < pred);
    }
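The predicated loop can be made runnable in the same way. The sketch below assumes 32-bit integer columns and a 64-bit accumulator; `predicated_scan` is a hypothetical name chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Predicated scan: every row contributes agg_col[i] * 0 or agg_col[i] * 1,
// so the loop body contains no data-dependent branch.
// Assumption: col and agg_col have the same length.
std::int64_t predicated_scan(const std::vector<std::int32_t>& col,
                             const std::vector<std::int32_t>& agg_col,
                             std::int32_t pred) {
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < col.size(); ++i) {
        // (col[i] < pred) converts to 0 or 1 and acts as a multiplier.
        agg += static_cast<std::int64_t>(agg_col[i]) * (col[i] < pred);
    }
    return agg;
}
```

The trade-off is that the aggregate column is read and multiplied for every row, including rows that do not qualify.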

SLIDE 8

Motivating Examples

SIMD [ZR02]:

    for (int i = 0; i < simd_size; ++i) {
        // compare SIMD_LENGTH values at once; one mask bit per lane
        mask = SIMD_COMP(simd_col[i], pred);
        if (mask) {
            for (int j = 0; j < SIMD_LENGTH; ++j) {
                if ((mask >> j) & 1)
                    agg += agg_col[i * SIMD_LENGTH + j];
            }
        }
    }
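One way to make the SIMD scan concrete is with SSE2 intrinsics, where `SIMD_COMP` becomes a lane-wise compare followed by a movemask. This is a sketch under assumptions: 32-bit signed columns, four lanes per 128-bit register, and a scalar fallback when SSE2 is not available at compile time. The name `simd_branching_scan` is hypothetical, and the slides do not say which intrinsics the authors used.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// SIMD scan with a branch on the comparison mask.
// Assumption: col and agg_col have the same length.
std::int64_t simd_branching_scan(const std::vector<std::int32_t>& col,
                                 const std::vector<std::int32_t>& agg_col,
                                 std::int32_t pred) {
    std::int64_t agg = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    constexpr std::size_t kLanes = 4;  // 4 x 32-bit values per 128-bit register
    const __m128i pred_vec = _mm_set1_epi32(pred);
    for (; i + kLanes <= col.size(); i += kLanes) {
        const __m128i values =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(&col[i]));
        // Lane is all-ones where values < pred, all-zeros otherwise.
        const __m128i lt = _mm_cmplt_epi32(values, pred_vec);
        // Collapse the four lanes into a 4-bit mask, one bit per lane.
        const int mask = _mm_movemask_ps(_mm_castsi128_ps(lt));
        if (mask) {  // skip the whole vector when no lane qualifies
            for (std::size_t j = 0; j < kLanes; ++j) {
                if ((mask >> j) & 1) agg += agg_col[i + j];
            }
        }
    }
#endif
    // Scalar tail (and full fallback without SSE2).
    for (; i < col.size(); ++i) {
        if (col[i] < pred) agg += agg_col[i];
    }
    return agg;
}
```

The comparison itself is branch-free, but the result is still consumed by branches on the mask, so this variant remains sensitive to selectivity.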

SLIDE 9

Motivating Examples

[Figure a) Single Predicate: response time in ms over selectivity for Branching Scan, SIMD Scan, Predicated Scan, and Predicated SIMD Scan]
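The figure legend names a Predicated SIMD Scan for which the slides show no listing. One plausible form, given here only as a sketch, uses the comparison result as a bit mask on the aggregate values and adds the masked vector without branching. The SSE2 intrinsics, the widening to 64-bit partial sums, and the name `simd_predicated_scan` are assumptions and not taken from the slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Predicated SIMD scan: no branch on the comparison result.
// Assumption: col and agg_col have the same length.
std::int64_t simd_predicated_scan(const std::vector<std::int32_t>& col,
                                  const std::vector<std::int32_t>& agg_col,
                                  std::int32_t pred) {
    std::int64_t agg = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i pred_vec = _mm_set1_epi32(pred);
    __m128i acc = _mm_setzero_si128();  // two 64-bit partial sums
    for (; i + 4 <= col.size(); i += 4) {
        const __m128i values =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(&col[i]));
        const __m128i aggs =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(&agg_col[i]));
        // All-ones lanes keep the aggregate value, all-zeros lanes clear it.
        const __m128i kept = _mm_and_si128(_mm_cmplt_epi32(values, pred_vec), aggs);
        // Sign-extend the four 32-bit lanes to 64 bits before accumulating.
        const __m128i sign = _mm_srai_epi32(kept, 31);
        acc = _mm_add_epi64(acc, _mm_unpacklo_epi32(kept, sign));
        acc = _mm_add_epi64(acc, _mm_unpackhi_epi32(kept, sign));
    }
    std::int64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    agg = lanes[0] + lanes[1];
#endif
    // Scalar predicated tail (and full fallback without SSE2).
    for (; i < col.size(); ++i) {
        agg += static_cast<std::int64_t>(agg_col[i]) * (col[i] < pred);
    }
    return agg;
}
```

Because every vector is loaded, masked, and added regardless of the comparison outcome, the cost of this sketch does not depend on selectivity.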

SLIDE 11

Motivating Examples

[Figures b) Query Q1 and c) Query Q6: response time over selectivity, shown next to a) Single Predicate with the same four scan variants]

SLIDE 12

Motivating Examples

  • Query Q1: 8 aggregates, 1 filter predicate

SLIDE 13

Motivating Examples

  • Query Q6: 1 aggregate, 3 filter predicates

SLIDE 14

Motivating Examples

When to use which scan variant?

SLIDE 15

Evaluation Setup

Variants:

  • Scalar vs. SIMD
  • Branching vs. Predication

Evaluation Criteria:

  • Number of predicates
  • Number of aggregates inside loop

Workload & Machine:

  • TPC-H LineItem table, SF 10
  • Intel Xeon E5-2630 v3 with SSE4.2
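To sweep selectivity in an evaluation like this one, the filter column has to be generated so that a chosen fraction of rows satisfies the predicate. The sketch below is not the authors' benchmark harness. It assumes values 0 to 999 in shuffled order, so that `col[i] < pred` selects `pred / 1000` of the rows when the column length is a multiple of 1000. The helpers `make_column`, `count_matches`, and `time_scan_ms` are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Builds a column holding the values 0..999 equally often, in shuffled order,
// so that qualifying rows are spread unpredictably over the column.
std::vector<std::int32_t> make_column(std::size_t n, std::uint32_t seed) {
    std::vector<std::int32_t> col(n);
    for (std::size_t i = 0; i < n; ++i) {
        col[i] = static_cast<std::int32_t>(i % 1000);
    }
    std::mt19937 rng(seed);
    std::shuffle(col.begin(), col.end(), rng);
    return col;
}

// Counts the rows that satisfy col[i] < pred, to verify the selectivity.
std::size_t count_matches(const std::vector<std::int32_t>& col, std::int32_t pred) {
    return static_cast<std::size_t>(std::count_if(
        col.begin(), col.end(), [pred](std::int32_t v) { return v < pred; }));
}

// Runs the given callable once and returns the elapsed wall time in ms.
template <typename Scan>
double time_scan_ms(Scan&& scan) {
    const auto start = std::chrono::steady_clock::now();
    scan();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A real measurement would repeat each run several times and make sure the compiler cannot discard the scan result.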

SLIDE 16

Number of Predicates

[Figure: Branching Scan, time in ms over selectivity of P1 and # of predicates]

SLIDE 17

Number of Predicates

[Figure: time in ms over selectivity of P1 and # of predicates, one panel each for Branching Scan, SIMD Scan, Predicated Scan, and SIMD Predicated Scan]

SLIDE 18

Number of Predicates

Results:

  • For one predicate, SIMD does not pay off

SLIDE 19

Number of Predicates

Results:

  • The more predicates, the better SIMD performs
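A sketch of how several predicates can be evaluated in the scalar branching and the SIMD variant may help to read these results. It assumes conjunctive `<` predicates, one per filter column, 32-bit signed values, and SSE2; the names `branching_scan_multi` and `simd_scan_multi` are hypothetical and the code is not taken from the authors' repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

using Column = std::vector<std::int32_t>;

// Scalar branching scan: stops checking a row at the first failing predicate.
// Assumption: all columns have the same length; preds[p] belongs to cols[p].
std::int64_t branching_scan_multi(const std::vector<Column>& cols,
                                  const std::vector<std::int32_t>& preds,
                                  const Column& agg_col) {
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < agg_col.size(); ++i) {
        bool keep = true;
        for (std::size_t p = 0; p < cols.size() && keep; ++p) {
            keep = cols[p][i] < preds[p];
        }
        if (keep) agg += agg_col[i];
    }
    return agg;
}

// SIMD scan: each predicate is one vector compare, masks are combined with AND.
std::int64_t simd_scan_multi(const std::vector<Column>& cols,
                             const std::vector<std::int32_t>& preds,
                             const Column& agg_col) {
    std::int64_t agg = 0;
    std::size_t i = 0;
    const std::size_t n = agg_col.size();
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i lt = _mm_set1_epi32(-1);  // start with all lanes qualifying
        for (std::size_t p = 0; p < cols.size(); ++p) {
            const __m128i values =
                _mm_loadu_si128(reinterpret_cast<const __m128i*>(&cols[p][i]));
            lt = _mm_and_si128(lt, _mm_cmplt_epi32(values, _mm_set1_epi32(preds[p])));
        }
        const int mask = _mm_movemask_ps(_mm_castsi128_ps(lt));
        for (std::size_t j = 0; mask != 0 && j < 4; ++j) {
            if ((mask >> j) & 1) agg += agg_col[i + j];
        }
    }
#endif
    // Scalar tail (and full fallback without SSE2).
    for (; i < n; ++i) {
        bool keep = true;
        for (std::size_t p = 0; p < cols.size() && keep; ++p) {
            keep = cols[p][i] < preds[p];
        }
        if (keep) agg += agg_col[i];
    }
    return agg;
}
```

In the SIMD variant each additional predicate adds one load, one compare, and one AND per four rows, while the single branch on the combined mask stays the same.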

SLIDE 20

Work Inside the Loop

[Figure: time in ms over selectivity of P1 and # of aggregates, one panel each for Branching Scan, SIMD Scan, Predicated Scan, and SIMD Predicated Scan]

SLIDE 21

Work Inside the Loop

Results:

  • More aggregates, less impact of branch misprediction

SLIDE 22

Work Inside the Loop

Results:

  • The more aggregates, the better branching scans perform at low selectivity
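The effect of the number of aggregates can be illustrated by comparing the per-row work of both scalar variants. The following is a sketch under assumptions (one filter column, plain sums as aggregates, 32-bit values); `branching_multi_agg` and `predicated_multi_agg` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Column = std::vector<std::int32_t>;

// Branching: the aggregates are only touched for qualifying rows.
// Assumption: col and all aggregate columns have the same length.
std::vector<std::int64_t> branching_multi_agg(const Column& col, std::int32_t pred,
                                              const std::vector<Column>& agg_cols) {
    std::vector<std::int64_t> aggs(agg_cols.size(), 0);
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (col[i] < pred) {
            for (std::size_t k = 0; k < agg_cols.size(); ++k) {
                aggs[k] += agg_cols[k][i];
            }
        }
    }
    return aggs;
}

// Predicated: every aggregate column is read and multiplied for every row.
std::vector<std::int64_t> predicated_multi_agg(const Column& col, std::int32_t pred,
                                               const std::vector<Column>& agg_cols) {
    std::vector<std::int64_t> aggs(agg_cols.size(), 0);
    for (std::size_t i = 0; i < col.size(); ++i) {
        const std::int64_t keep = col[i] < pred;  // 0 or 1
        for (std::size_t k = 0; k < agg_cols.size(); ++k) {
            aggs[k] += agg_cols[k][i] * keep;
        }
    }
    return aggs;
}
```

With k aggregates, the predicated loop performs k multiply-adds per row regardless of selectivity, whereas the branching loop pays them only for qualifying rows, which is in line with the results stated above.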

SLIDE 23

Decision Trees

Number of Predicates:

  • selectivity < 0.05
      • #predicates < 4: Branching Scan
      • #predicates >= 4: SIMD Branching
  • selectivity >= 0.05
      • #predicates < 2: Predicated Scan
      • #predicates >= 2: SIMD Predicated

Number of Aggregates:

  • #aggregates: < 6, >= 6
  • selectivity: < 0.1, >= 0.1
  • selectivity: < 0.05, >= 0.05
  • Leaves under each selectivity split: SIMD Branching, SIMD Predicated
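The tree for the number of predicates could be encoded as a small selection function. The sketch assumes one reading of the flattened slide text: selectivity is tested first at 0.05, then the number of predicates is tested at 4 on the low-selectivity side and at 2 on the high-selectivity side. That structure, the enum `ScanVariant`, and the function `choose_by_predicates` are assumptions for illustration.

```cpp
// Scan variants named as leaves of the decision tree.
enum class ScanVariant { BranchingScan, SimdBranching, PredicatedScan, SimdPredicated };

// Assumed tree: selectivity at the root, #predicates at the second level.
ScanVariant choose_by_predicates(double selectivity, int num_predicates) {
    if (selectivity < 0.05) {
        // Few qualifying rows: branching variants.
        return num_predicates < 4 ? ScanVariant::BranchingScan
                                  : ScanVariant::SimdBranching;
    }
    // Many qualifying rows: predicated variants.
    return num_predicates < 2 ? ScanVariant::PredicatedScan
                              : ScanVariant::SimdPredicated;
}
```

The thresholds come from measurements on one machine, so a query compiler would presumably need to recalibrate them for other hardware.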

SLIDE 24

Conclusion

  • Pipeline code for filter-&-aggregate pipelines¹
  • Decision trees as a result of our evaluation in the paper
  • SIMD outperforms scalar variants for several predicates
  • Increasing number of aggregates slows down predicated variants

Future Work

  • Hash table put / probe (joins, groupings)
  • Automatic calibration for query compilation

¹ http://git.iti.cs.ovgu.de/dbronesk/BTW-Pipeline-Variants

SLIDE 25

References

[BBS14] David Broneske, Sebastian Breß, and Gunter Saake. Database Scan Variants on Modern CPUs: A Performance Study. In: Proceedings of the 2nd International Workshop on In-Memory Data Management and Analytics (IMDM), Lecture Notes in Computer Science, pp. 97–111. Springer, 2014.

[ZR02] Jingren Zhou and Kenneth A. Ross. Implementing Database Operations Using SIMD Instructions. In: SIGMOD, pp. 145–156, 2002.

SLIDE 26

Selectivity of Two Predicates

Decision tree:

  • selectivity1: < 0.05, >= 0.05
  • selectivity2: < 0.05, >= 0.05
  • Leaves: Bitwise AND, Conditional AND, SIMD Predicated
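The two scalar leaves differ only in how the two predicates are combined. The sketch below assumes `<` predicates on two 32-bit filter columns; `conditional_and_scan` and `bitwise_and_scan` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Column = std::vector<std::int32_t>;

// Conditional AND (&&): the second predicate is only evaluated when the first
// one holds, which can result in up to two branches per row.
// Assumption: all columns have the same length.
std::int64_t conditional_and_scan(const Column& col1, std::int32_t pred1,
                                  const Column& col2, std::int32_t pred2,
                                  const Column& agg_col) {
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < agg_col.size(); ++i) {
        if (col1[i] < pred1 && col2[i] < pred2) agg += agg_col[i];
    }
    return agg;
}

// Bitwise AND (&): both predicates are always evaluated and combined into a
// single condition, so there is one branch per row on the combined result.
std::int64_t bitwise_and_scan(const Column& col1, std::int32_t pred1,
                              const Column& col2, std::int32_t pred2,
                              const Column& agg_col) {
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < agg_col.size(); ++i) {
        if ((col1[i] < pred1) & (col2[i] < pred2)) agg += agg_col[i];
    }
    return agg;
}
```

Both functions return the same result; an optimizing compiler may turn one form into the other, so the difference should be confirmed in the generated code before measuring.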

SLIDE 27

Selectivity of Two Predicates

[Figure: time in ms over selectivity of P1 and selectivity of P2, one panel each for Conditional AND Scan, Bitwise AND Scan, SIMD Scan, Predicated Scan, and SIMD Predicated Scan]