Columnstore and B+ tree – Are Hybrid Physical Designs Important?
1
Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ - - PowerPoint PPT Presentation
Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 ? C O L C O L B+ tree B+ tree B+ tree & Columnstore on same table = Hybrid design 4 ? C O L C O L B+ tree B+ tree Are Hybrid
1
B+ tree
2
B+ tree
C O L
3
B+ tree B+ tree
C O L C O L
B+ tree & Columnstore on same table = Hybrid design
4
B+ tree B+ tree
C O L
C O L
Are Hybrid Designs important and which workloads can benefit? 5
B+ tree
C O L
6
C O L
B+ tree
8
B+ tree
Selectivity Sort order Updates Mix: Scans & Updates Concurrency
C O L
▪ ▪ ▪
▪ ▪ ▪
9
1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) B+ tree hot
SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col
10
1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) B+ tree hot
SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col
11
0.2%
Skip data effectively & run single-threaded
1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) Col hot B+ tree hot
SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col
0.03%
12
Superior performance of Columnstore scans
1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) Col cold B+ tree cold Col hot B+ tree hot
SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col
B+ tree helps for low selectivity & slower storage
0.03%
13
1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) Col cold B+ tree cold Col hot B+ tree hot
SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col
B+ tree helps for low selectivity & slower storage
0.03% 10%
14
10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 100 10,000 1,000,000 Execution time (millisec) # of groups B+ tree Col
SELECT col1, sum(col2) FROM table GROUP BY col1 20 GB, 2 int col, vary number of distinct values in col1 B+ tree sorted on col1
15
10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 100 10,000 1,000,000 Execution time (millisec) # of groups B+ tree Col
SELECT col1, sum(col2) FROM table GROUP BY col1 20 GB, 2 int col, vary number of distinct values in col1 B+ tree sorted on col1
16
Scanning & hashing Col faster than reading sorted B+ tree
10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 100 10,000 1,000,000 Execution time (millisec) # of groups B+ tree Col
Sort order of B+ tree beneficial for scarce query memory
SELECT col1, sum(col2) FROM table GROUP BY col1 20 GB, 2 int col, vary number of distinct values in col1 B+ tree sorted on col1
17
▪ ▪ ▪ ▪
Primary Columnstores
18
Delete buffer Key not in
▪ ▪ ▪
Secondary Columnstores
19
Update top 10 rows TPC-H 30 GB, 10 concurrent queries, Read Committed
1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Execution time (millisec) Percentage (%) for scans and updates
20
TPC-H 30 GB, 10 concurrent queries, Read Committed Hybrid design
Update top 10 rows TPC-H 30 GB, 10 concurrent queries, Read Committed
1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Execution time (millisec) Percentage (%) for scans and updates
21
B+ trees cheaper than Columnstores for pure updates
TPC-H 30 GB, 10 concurrent queries, Read Committed Hybrid design
Update top 10 rows, Select sum of quantity & price for a single shipdate from lineitem table TPC-H 30 GB, 10 concurrent queries, Read Committed
1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Total execution time (millisec) Percentage (%) for scans and updates
TPC-H 30 GB, 10 concurrent queries, Read Committed
22
Secondary Columnstore: balance small updates & large scans
Database Server
Query Optimizer
Output what-if
24
Create Index Drop Index … Workload, Constraints (e.g. storage budget)
Database Server
Query Optimizer
Output what-if
25
Create Index Drop Index … Workload, Constraints (e.g. storage budget)
Database Server
Query Optimizer
Output what-if
26
Create Index Drop Index … Workload, Constraints (e.g. storage budget)
Database Server
Query Optimizer
Output what-if
27
Create Index Drop Index … Workload, Constraints (e.g. storage budget)
28
Hypothetical Columnstores
29
Hypothetical Columnstores
30
combined space of Columnstores & B+ trees
31
combined space of Columnstores & B+ trees
SQL Server 2017 CTP
▪ ▪
▪ ▪
▪
▪
DTA
▪
▪ ▪
▪
33
34
0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of query plans Hybrid Query Plans
35
0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of query plans Hybrid Query Plans Hybrid Query Plans (Same Table)
7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)
Hybrid Vs B+ tree only Hybrid Vs Columnstore only
TPC-DS benchmark 100 GB
36
7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)
Hybrid Vs B+ tree only Hybrid Vs Columnstore only
TPC-DS benchmark 100 GB
37
7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)
Hybrid Vs B+ tree only Hybrid Vs Columnstore only
TPC-DS benchmark 100 GB
38
7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)
39
Hybrid designs significantly improve decision support workload
▪ ▪ ▪ 20X ▪ 10X
Columnstore-only design Hybrid design Col scan catalog_sales Col scan web_sales concatenate Col scan date_dim hash join B+tree seek web_sales B+tree seek date_dim B+tree seek date_dim B+tree seek catalog_sales nested-loop join nested- loop join
3 2 3 3 11 1 4 10 20 0.5 0.8 1.2 1.5 2 5 10 >10
Bins of Speedup (wall-clock time) Hybrid Vs B+ tree only (Snapshot Isolation)
1K warehouses
41
3 2 3 3 11 1 4 10 20 0.5 0.8 1.2 1.5 2 5 10 >10
Bins of Speedup (wall-clock time) Hybrid Vs B+ tree only (Snapshot Isolation)
1K warehouses
2X slowdown of transactions & 10X speed-up of analytics
42
▪
▪
▪
10X 0X
▪ DTA ▪
▪ ▪
44
▪ ▪ ▪ ▪ ▪ ▪ ▪
47
VECTORIZED COMPRESSED LATE MATERIALIZATION
SORTED GLOBALLY FAST MODIFICATIONS MEMORY EFFICIENT
C O L
B+ tree Selectivity values Sortedness Concurrency
48
B+ tree
C O L
streaming via sortedness Leverage both B+ tree & Col Batch-mode & compression point lookups & short scans large & fast scans balance scans & updates
49
▪ ▪ ▪
50
A B 3 3 1 1 3 1 3 1 A B 1 3 3 1 3 1 3 1 A 0, 1 1, 1 3, 4 B 0, 3 1, 3 A B 30 0.0 30 0.1 0.0 10 0.0 30 0.1 30 0.1
GEE estimator groups that occur once in the sample are scaled by total_size / sample_size (e.g. [0,1] and [1,1]), other groups are counted once in total
51
▪ ▪ ▪ ▪
52
Work- load DB Size (GB) # of tables Max table size (GB)
per table # of queries
per plan Cust1 172 23 63.8 14.1 36 7.2 29.1 Cust2 44.6 614 44.6 23.5 40 8.1 28.3 Cust3 138.4 3394 79.8 26.3 40 8.7 24.1 Cust4 93 22 54.8 20.3 24 6.9 24.4 Cust5 9.83 474 1.52 5.5 47 23.1 57.7 TPC-DS 87.7 24 34.9 17.2 97 7.9 28.2 TPC-CH 11 27
53
▪ ▪ ▪ ▪
54
▪
▪ ▪ ▪ ▪
▪
▪
▪
55
▪ ▪
▪
Delete buffer Key not in
57
58
PRIMARY B+ TREE & SECONDARY COLUMNSTORE COLUMNSTORE PRIMARY STORAGE & SECONDARY B+ TREE
B+ tree
C O L C O L
B+ tree
SORTED B+ TREE NOT SORTED GLOBALLY COLUMNSTORE
B+ tree
C O L
short range scans & lookups scarce memory large scans & bulk updates Balance updates & scans small storage footprint Mixed workload
59
▪ B+
B+ tr trees ees
▪ ▪ Second
econdar ary y CSIs Is
▪ Hybr
brid id physic ysical l desi signs gns
B+ tree
C O L
60
▪ ▪ ▪ ▪ ▪
61
Large data & design space Complex storage
A 0.03 0.02 0.02 0.03 0.02 A 3 2 2 3 2 A 2 2 2 3 3 A 2, 3 3, 2
Input Encode Sort Compress
Sample based techniques 1) Build index on samples 2) Model full index
62
Build index on samples Model full index
63
64
1 10 100 1,000 10,000 100,000 1,000,000 0.01 0.02 5.12 40.98 Execution time (millisec) % of updated rows
Primary & Secondary Columnstores comparable for large updates
UPDATE N rows lineitem WHERE l_shipdate = {1} TPC-H 30 GB, single thread
65
66
0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of queries Hybrid Plans (Same Query) Hybrid Plans (Same Table)
1 10 100 1,000 10,000 100,000 1,000,000 0.01 0.02 5.12 40.98 Execution time (millisec) % of updated rows
Primary Columnstores incur high cost for small updates
19 X UPDATE N rows lineitem WHERE l_shipdate = {1} TPC-H 30 GB, single thread
67
Cheaper updates for B+ trees than for Columnstores
1 10 100 1,000 10,000 100,000 1,000,000 0.01 0.02 5.12 40.98 Execution time (millisec) % of updated rows
UPDATE N rows lineitem WHERE l_shipdate = {1} TPC-H 30 GB, single thread
68
Large data & design space Complex Columnar storage
A 0.03 0.02 0.02 0.03 0.02 A 3 2 2 3 2 A 2 2 2 3 3 A 2, 3 3, 2
Input Encode Sort Compress
Sample based techniques At least as hard as distinct value estimation 69
70
0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of query plans Hybrid Query Plans Hybrid Qeury Plans (Same Table)
Large data & design space Complex Columnar storage
A 0.03 0.02 0.02 0.03 0.02 A 3 2 2 3 2 A 2 2 2 3 3 A 2, 3 3, 2
Input Encode Sort Compress
Sample based techniques At least as hard as distinct value estimation 71
Update top 10 rows, Select sum of quantity & price for a single shipdate from lineitem table TPC-H 30 GB, 10 concurrent queries, Read Committed
1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 2, update: 98 scan: 3, update: 97 scan: 4, update: 96 scan: 5, update: 95 Execution time (millisec) Percentage (%) for scans and updates
72
Secondary Columnstore: balance small updates & large scans
73
Concept Access Path Selection Hybrid designs B+ tree Main memory
General (disk- based) Scans Shared Non-shared DB engine Prototype SQL Server Main focus Model Concurrency DTA and many workloads Physical designs Columnstore and Secondary B+tree Hybrid Physical Designs
74
B+ tree clustered
linenumber
Ship date
Primary B+ tree (base table) Secondary Columnstore Secondary B+tree on ship date Heap file base table Secondary Columnstore Secondary B+tree on ship date Covers all columns
HEAP
FILE
Ship date
Covers all columns
75
B+ tree clustered
linenumber
Ship date
Primary B+ tree (base table) Secondary Columnstore Secondary B+ tree on ship date Primary Columnstore Secondary B+ tree on order/line Secondary B+ tree on ship date Covers all columns
Ship date
Covers all columns
B+ tree
linenumber
DB Engine
Query Optimizer
Create Index Drop Index Create View …
Output
C O S T
what-if
1) Select candidates 2) Merge Indexes 3) Enumerate
Workload Constraints
budget
76
DB Engine
Query Optimizer
Create Index Drop Index Create View …
Output
C O S T
what-if
1) Select candidates 2) Merge Indexes 3) Enumerate
Workload Constraints
budget
77
DB Engine
Query Optimizer
Create Index Drop Index Create View …
Output
C O S T
what-if
1) Select candidates 2) Merge Indexes 3) Enumerate
Workload Constraints
budget
78
DB Engine
Query Optimizer
Create Index Drop Index Create View …
Output
C O S T
what-if
1) Select candidates 2) Merge Indexes 3) Enumerate
Workload Constraints
budget
79
80
Source: Real-Time Analytical Processing with SQL Server – Paul Larson, Adrian Birka, Eric Hanson, Weiyun Huang, Michal Novakiewicz, Vassilis Papadimos (Microsoft) Primary Columnstore & Secondary B+ tree index:
efficiently (e.g. Primary Key)