Hardware-Sensitive Scan Operator Variants for Compiled Selection Pipelines
David Broneske, Andreas Meister, Gunter Saake
University of Magdeburg
Databases and Software Engineering
Introduction: Query Compilation
γ sum(A*B)
  ⋈ lo_orderdate = d_datekey
    σ lo_discount …, lo_quantity (Lineorder)
    σ d_year=1993 (Dates)
Execution shifts from bandwidth-bound to compute-bound
This opens up the possibility for code optimizations
Branching:
for (int i = 0; i < input_size; ++i) {
    if (col[i] < pred)
        agg += agg_col[i];
}
Predicated:
for (int i = 0; i < input_size; ++i) {
    agg += agg_col[i] * (col[i] < pred);
}
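SIMD [ZR02]:
for (int i = 0; i < simd_size; ++i) {
    // SIMD_COMP compares all SIMD_LENGTH lanes of one register against pred
    // and yields a bitmask with one bit per lane
    mask = SIMD_COMP(simd_col[i], pred);
    if (mask) {
        for (int j = 0; j < SIMD_LENGTH; ++j) {
            if ((mask >> j) & 1)
                agg += agg_col[i * SIMD_LENGTH + j];  // scalar index of lane j in register i
        }
    }
}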
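The SIMD variant above relies on a placeholder SIMD_COMP, and the Predicated SIMD Scan appears in the figure legends without code. The following is a minimal sketch of how both could look with SSE2 intrinsics. It assumes 32-bit integer columns and the predicate col[i] < pred; the function names, the 64-bit widening of partial sums, and the scalar fallback for builds without SSE2 are illustrative assumptions and not the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

using i32 = std::int32_t;
using i64 = std::int64_t;
constexpr std::size_t kLanes = 4;  // four 32-bit values per 128-bit register

// Scalar reference: the branching scan from the slide.
i64 branching_scan(const std::vector<i32>& col,
                   const std::vector<i32>& agg_col, i32 pred) {
    assert(col.size() == agg_col.size());
    i64 agg = 0;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (col[i] < pred) agg += agg_col[i];
    }
    return agg;
}

// SIMD branching scan: compare four values at once, then walk the bitmask.
i64 simd_scan(const std::vector<i32>& col,
              const std::vector<i32>& agg_col, i32 pred) {
    assert(col.size() == agg_col.size());
    i64 agg = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i pred_vec = _mm_set1_epi32(pred);
    for (; i + kLanes <= col.size(); i += kLanes) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&col[i]));
        __m128i lt = _mm_cmplt_epi32(v, pred_vec);         // all-ones where col < pred
        int mask = _mm_movemask_ps(_mm_castsi128_ps(lt));  // one bit per lane
        if (mask) {
            for (std::size_t j = 0; j < kLanes; ++j) {
                if ((mask >> j) & 1) agg += agg_col[i + j];
            }
        }
    }
#endif
    for (; i < col.size(); ++i) {  // scalar tail, or full fallback without SSE2
        if (col[i] < pred) agg += agg_col[i];
    }
    return agg;
}

// Predicated SIMD scan: no branch on the mask; non-qualifying lanes are zeroed.
i64 simd_predicated_scan(const std::vector<i32>& col,
                         const std::vector<i32>& agg_col, i32 pred) {
    assert(col.size() == agg_col.size());
    i64 agg = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i pred_vec = _mm_set1_epi32(pred);
    __m128i acc = _mm_setzero_si128();  // two 64-bit partial sums
    for (; i + kLanes <= col.size(); i += kLanes) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&col[i]));
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&agg_col[i]));
        __m128i kept = _mm_and_si128(a, _mm_cmplt_epi32(v, pred_vec));
        __m128i sign = _mm_srai_epi32(kept, 31);  // sign bits for widening to 64 bit
        acc = _mm_add_epi64(acc, _mm_unpacklo_epi32(kept, sign));
        acc = _mm_add_epi64(acc, _mm_unpackhi_epi32(kept, sign));
    }
    i64 parts[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(parts), acc);
    agg = parts[0] + parts[1];
#endif
    for (; i < col.size(); ++i) {  // scalar predicated tail
        agg += static_cast<i64>(agg_col[i]) * (col[i] < pred);
    }
    return agg;
}
```

All three functions should return the same sum for the same input; they differ only in how the predicate result controls the aggregation.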
Figure: response time in ms over selectivity for the Branching Scan, SIMD Scan, Predicated Scan, and Predicated SIMD Scan; panels a) Single Predicate, b) Query Q1, c) Query Q6.
Scalar vs. SIMD
Branching vs. predication
Number of predicates
Number of aggregates inside the loop
TPC-H LineItem table, SF 10
Intel Xeon E5-2630 v3 with SSE4.2
Figure: time in ms over Selectivity P1 and # Predicates for the Branching Scan, SIMD Scan, Predicated Scan, and SIMD Predicated Scan.
For a single predicate, SIMD does not pay off
The more predicates, the better the SIMD variants perform
Figure: time in ms over Selectivity P1 and # Aggregates for the Branching Scan, SIMD Scan, Predicated Scan, and SIMD Predicated Scan.
With more aggregates, branch misprediction has less impact
The more aggregates, the better the branching scans perform at low selectivity
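One way to see why the number of aggregates inside the loop matters: a branching scan skips the work for all aggregates when a tuple fails the predicate, whereas a predicated scan performs one multiply-add per aggregate for every tuple. The sketch below illustrates this with several aggregate columns. The function names and the column layout (one vector per aggregate column) are assumptions made for illustration and are not taken from the evaluated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Column = std::vector<std::int32_t>;

// Branching: the inner loop over aggregates runs only for qualifying tuples.
std::vector<std::int64_t> branching_multi_agg(const Column& col,
                                              const std::vector<Column>& agg_cols,
                                              std::int32_t pred) {
    std::vector<std::int64_t> aggs(agg_cols.size(), 0);
    for (const Column& c : agg_cols) assert(c.size() == col.size());
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (col[i] < pred) {
            for (std::size_t k = 0; k < agg_cols.size(); ++k) {
                aggs[k] += agg_cols[k][i];
            }
        }
    }
    return aggs;
}

// Predicated: the inner loop runs for every tuple; the comparison result
// (0 or 1) is multiplied into each aggregate update.
std::vector<std::int64_t> predicated_multi_agg(const Column& col,
                                               const std::vector<Column>& agg_cols,
                                               std::int32_t pred) {
    std::vector<std::int64_t> aggs(agg_cols.size(), 0);
    for (const Column& c : agg_cols) assert(c.size() == col.size());
    for (std::size_t i = 0; i < col.size(); ++i) {
        const std::int64_t keep = col[i] < pred;
        for (std::size_t k = 0; k < agg_cols.size(); ++k) {
            aggs[k] += agg_cols[k][i] * keep;
        }
    }
    return aggs;
}
```

Both functions return the same sums; at low selectivity the branching version executes the inner loop for only a small fraction of the tuples.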
Decision trees:
Aggregates tree: #aggregates (< 6, >= 6); selectivity (< 0.1, >= 0.1) and selectivity (< 0.05, >= 0.05); leaves SIMD Branching, SIMD Predicated
Predicates tree: selectivity (< 0.05, >= 0.05); #predicates (< 4, >= 4) with leaves Branching Scan, SIMD Branching; #predicates (< 2, >= 2) with leaves Predicated Scan, SIMD Predicated
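A decision tree of this kind could be turned into a variant selector for a query compiler. The sketch below encodes one possible reading of the predicates tree: a selectivity below 0.05 leads to the #predicates split at 4 (Branching Scan or SIMD Branching), and a selectivity of 0.05 or more leads to the split at 2 (Predicated Scan or SIMD Predicated). The pairing of branches to leaves is an assumption based on the order of the labels on the slide, and the enum and function names are hypothetical.

```cpp
#include <cassert>

enum class ScanVariant { Branching, SimdBranching, Predicated, SimdPredicated };

// Assumed reading of the predicates tree; thresholds are taken from the slide,
// the branch-to-leaf pairing is an interpretation.
ScanVariant choose_by_predicates(double selectivity, int num_predicates) {
    assert(selectivity >= 0.0 && selectivity <= 1.0);
    assert(num_predicates >= 1);
    if (selectivity < 0.05) {
        return num_predicates < 4 ? ScanVariant::Branching
                                  : ScanVariant::SimdBranching;
    }
    return num_predicates < 2 ? ScanVariant::Predicated
                              : ScanVariant::SimdPredicated;
}
```

A compiler would call such a function with an estimated selectivity before generating the pipeline code; on other hardware the thresholds would likely need to be re-measured.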
Hash table put / probe (joins, groupings)
Pipeline code for filter-&-aggregate pipelines: http://git.iti.cs.ovgu.de/dbronesk/BTW-Pipeline-Variants
Decision trees as a result of our evaluation in the paper
SIMD outperforms scalar variants for several predicates
Increasing number of aggregates slows down predicated variants
Automatic calibration for query compilation
[BBS14] David Broneske, Sebastian Breß, and Gunter Saake. Database Scan Variants on Modern CPUs: A Performance Study. In: Proceedings of the 2nd International Workshop on In-Memory Data Management and Analytics (IMDM), Lecture Notes in Computer Science, pp. 97–111. Springer, 2014.
[ZR02] Jingren Zhou and Kenneth A. Ross. Implementing Database Operations Using SIMD Instructions. In: SIGMOD, pp. 145–156, 2002.
Decision tree for two predicates: splits on selectivity1 (< 0.05, >= 0.05) and selectivity2 (< 0.05, >= 0.05); leaves Bitwise AND, Conditional AND, SIMD Predicated
Figure: time in ms over Selectivity P1 and Selectivity P2 for the Conditional AND Scan, Bitwise AND Scan, SIMD Scan, Predicated Scan, and SIMD Predicated Scan.
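The Conditional AND and Bitwise AND scans are named on these slides without code. The following sketch assumes the common meaning of the two terms: Conditional AND combines the predicates with the short-circuiting && operator, so the second predicate is evaluated only if the first holds, while Bitwise AND evaluates both predicates and combines the results with &, leaving a single branch. The function names and the two-column signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Column = std::vector<std::int32_t>;

// Conditional AND: short-circuit evaluation, potentially one branch per predicate.
std::int64_t conditional_and_scan(const Column& col1, const Column& col2,
                                  const Column& agg_col,
                                  std::int32_t pred1, std::int32_t pred2) {
    assert(col1.size() == col2.size() && col1.size() == agg_col.size());
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < col1.size(); ++i) {
        if (col1[i] < pred1 && col2[i] < pred2) agg += agg_col[i];
    }
    return agg;
}

// Bitwise AND: both comparisons are always evaluated, then combined with &,
// so only the combined result is branched on.
std::int64_t bitwise_and_scan(const Column& col1, const Column& col2,
                              const Column& agg_col,
                              std::int32_t pred1, std::int32_t pred2) {
    assert(col1.size() == col2.size() && col1.size() == agg_col.size());
    std::int64_t agg = 0;
    for (std::size_t i = 0; i < col1.size(); ++i) {
        if ((col1[i] < pred1) & (col2[i] < pred2)) agg += agg_col[i];
    }
    return agg;
}
```

Both scans compute the same sum. An optimizing compiler may transform one form into the other, so a measurement should confirm which machine code is actually generated.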