Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ - - PowerPoint PPT Presentation

are hybrid physical
SMART_READER_LITE
LIVE PREVIEW

Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ - - PowerPoint PPT Presentation

Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 ? C O L C O L B+ tree B+ tree B+ tree & Columnstore on same table = Hybrid design 4 ? C O L C O L B+ tree B+ tree Are Hybrid


slide-1
SLIDE 1

Columnstore and B+ tree – Are Hybrid Physical Designs Important?

1

slide-2
SLIDE 2

B+ tree

2

slide-3
SLIDE 3

B+ tree

C O L

3

slide-4
SLIDE 4

B+ tree B+ tree

C O L C O L

B+ tree & Columnstore on same table = Hybrid design

4

?

slide-5
SLIDE 5

B+ tree B+ tree

C O L

?

C O L

Are Hybrid Designs important and which workloads can benefit? 5

slide-6
SLIDE 6

B+ tree

Hybrid design = B+ tree & Columnstore

C O L

6

slide-7
SLIDE 7

Hybrid design = B+ tree & Columnstore

slide-8
SLIDE 8

C O L

B+ tree

8

B+ tree

Selectivity Sort order Updates Mix: Scans & Updates Concurrency

C O L

slide-9
SLIDE 9

▪ ▪ ▪

▪ ▪ ▪

9

slide-10
SLIDE 10

1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) B+ tree hot

SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col

10

slide-11
SLIDE 11

1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) B+ tree hot

SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col

11

0.2%

Skip data effectively & run single-threaded

slide-12
SLIDE 12

1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) Col hot B+ tree hot

SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col

0.03%

12

Superior performance of Columnstore scans

slide-13
SLIDE 13

1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) Col cold B+ tree cold Col hot B+ tree hot

SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col

B+ tree helps for low selectivity & slower storage

0.03%

13

slide-14
SLIDE 14

1 10 100 1,000 10,000 100,000 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Execution time (millisec) Selectivity (%) Col cold B+ tree cold Col hot B+ tree hot

SELECT sum(col1) FROM table WHERE col1 < {1} 10 GB, 1 int col

B+ tree helps for low selectivity & slower storage

0.03% 10%

14

slide-15
SLIDE 15

10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 100 10,000 1,000,000 Execution time (millisec) # of groups B+ tree Col

SELECT col1, sum(col2) FROM table GROUP BY col1 20 GB, 2 int col, vary number of distinct values in col1 B+ tree sorted on col1

15

slide-16
SLIDE 16

10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 100 10,000 1,000,000 Execution time (millisec) # of groups B+ tree Col

SELECT col1, sum(col2) FROM table GROUP BY col1 20 GB, 2 int col, vary number of distinct values in col1 B+ tree sorted on col1

16

Scanning & hashing Col faster than reading sorted B+ tree

slide-17
SLIDE 17

10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 100 10,000 1,000,000 Execution time (millisec) # of groups B+ tree Col

Sort order of B+ tree beneficial for scarce query memory

SELECT col1, sum(col2) FROM table GROUP BY col1 20 GB, 2 int col, vary number of distinct values in col1 B+ tree sorted on col1

17

slide-18
SLIDE 18

▪ ▪ ▪ ▪

Primary Columnstores

18

slide-19
SLIDE 19

Delete buffer Key not in

▪ ▪ ▪

Secondary Columnstores

19

slide-20
SLIDE 20

Update top 10 rows TPC-H 30 GB, 10 concurrent queries, Read Committed

1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Execution time (millisec) Percentage (%) for scans and updates

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

20

TPC-H 30 GB, 10 concurrent queries, Read Committed Hybrid design

slide-21
SLIDE 21

Update top 10 rows TPC-H 30 GB, 10 concurrent queries, Read Committed

1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Execution time (millisec) Percentage (%) for scans and updates

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

21

B+ trees cheaper than Columnstores for pure updates

TPC-H 30 GB, 10 concurrent queries, Read Committed Hybrid design

slide-22
SLIDE 22

Update top 10 rows, Select sum of quantity & price for a single shipdate from lineitem table TPC-H 30 GB, 10 concurrent queries, Read Committed

1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Total execution time (millisec) Percentage (%) for scans and updates

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

TPC-H 30 GB, 10 concurrent queries, Read Committed

22

Secondary Columnstore: balance small updates & large scans

slide-23
SLIDE 23

Hybrid design = B+ tree & Columnstore 1.

  • 1. Micro-benchmarks
slide-24
SLIDE 24

DTA

Database Server

Query Optimizer

Output what-if

24

Create Index Drop Index … Workload, Constraints (e.g. storage budget)

slide-25
SLIDE 25

DTA

Database Server

Query Optimizer

Output what-if

25

Create Index Drop Index … Workload, Constraints (e.g. storage budget)

slide-26
SLIDE 26

DTA

Database Server

Query Optimizer

Output what-if

26

Create Index Drop Index … Workload, Constraints (e.g. storage budget)

slide-27
SLIDE 27

DTA

Database Server

Query Optimizer

Output what-if

27

Create Index Drop Index … Workload, Constraints (e.g. storage budget)

slide-28
SLIDE 28

28

  • 1. Costing queries on

Hypothetical Columnstores

slide-29
SLIDE 29

29

  • 1. Costing queries on

Hypothetical Columnstores

  • 2. Per-column size estimation
  • stay within storage budget
  • per-column access cost
  • hard problem
slide-30
SLIDE 30

30

  • 1. Optimal designs searched over:

combined space of Columnstores & B+ trees

slide-31
SLIDE 31

31

  • 1. Optimal designs searched over:

combined space of Columnstores & B+ trees

  • 2. Released as part of

SQL Server 2017 CTP

slide-32
SLIDE 32

Hybrid design = B+ tree & Columnstore 1.

  • 1. Micro-benchmarks

2.

  • 2. Auto Recommend Hybrid Designs
slide-33
SLIDE 33

▪ ▪

▪ ▪

DTA

▪ ▪

33

slide-34
SLIDE 34

34

0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of query plans Hybrid Query Plans

slide-35
SLIDE 35

35

0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of query plans Hybrid Query Plans Hybrid Query Plans (Same Table)

slide-36
SLIDE 36

7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)

Hybrid Vs B+ tree only Hybrid Vs Columnstore only

TPC-DS benchmark 100 GB

36

slide-37
SLIDE 37

7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)

Hybrid Vs B+ tree only Hybrid Vs Columnstore only

TPC-DS benchmark 100 GB

37

slide-38
SLIDE 38

7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)

Hybrid Vs B+ tree only Hybrid Vs Columnstore only

TPC-DS benchmark 100 GB

38

slide-39
SLIDE 39

7 5 10 5 10 27 23 10 4 16 46 11 4 4 1 11 10 20 30 40 50 0.5 0.8 1.2 1.5 2 5 10 >10 # of Queries Bins of Speedup (CPU time)

39

Hybrid designs significantly improve decision support workload

slide-40
SLIDE 40

▪ ▪ ▪ 20X ▪ 10X

Columnstore-only design Hybrid design Col scan catalog_sales Col scan web_sales concatenate Col scan date_dim hash join B+tree seek web_sales B+tree seek date_dim B+tree seek date_dim B+tree seek catalog_sales nested-loop join nested- loop join

slide-41
SLIDE 41

3 2 3 3 11 1 4 10 20 0.5 0.8 1.2 1.5 2 5 10 >10

  • No. of Queries

Bins of Speedup (wall-clock time) Hybrid Vs B+ tree only (Snapshot Isolation)

1K warehouses

41

slide-42
SLIDE 42

3 2 3 3 11 1 4 10 20 0.5 0.8 1.2 1.5 2 5 10 >10

  • No. of Queries

Bins of Speedup (wall-clock time) Hybrid Vs B+ tree only (Snapshot Isolation)

1K warehouses

2X slowdown of transactions & 10X speed-up of analytics

42

slide-43
SLIDE 43
slide-44
SLIDE 44

10X 0X

▪ DTA ▪

▪ ▪

44

slide-45
SLIDE 45
slide-46
SLIDE 46
slide-47
SLIDE 47

▪ ▪ ▪ ▪ ▪ ▪ ▪

47

slide-48
SLIDE 48

VECTORIZED COMPRESSED LATE MATERIALIZATION

SORTED GLOBALLY FAST MODIFICATIONS MEMORY EFFICIENT

C O L

B+ tree Selectivity values Sortedness Concurrency

48

slide-49
SLIDE 49

B+ tree

C O L

streaming via sortedness Leverage both B+ tree & Col Batch-mode & compression point lookups & short scans large & fast scans balance scans & updates

49

slide-50
SLIDE 50

▪ ▪ ▪

50

slide-51
SLIDE 51

A B 3 3 1 1 3 1 3 1 A B 1 3 3 1 3 1 3 1 A 0, 1 1, 1 3, 4 B 0, 3 1, 3 A B 30 0.0 30 0.1 0.0 10 0.0 30 0.1 30 0.1

GEE estimator groups that occur once in the sample are scaled by total_size / sample_size (e.g. [0,1] and [1,1]), other groups are counted once in total

51

slide-52
SLIDE 52

▪ ▪ ▪ ▪

52

slide-53
SLIDE 53

Work- load DB Size (GB) # of tables Max table size (GB)

  • Avg. #
  • f cols

per table # of queries

  • Avg. #
  • f joins
  • Avg. #
  • f ops

per plan Cust1 172 23 63.8 14.1 36 7.2 29.1 Cust2 44.6 614 44.6 23.5 40 8.1 28.3 Cust3 138.4 3394 79.8 26.3 40 8.7 24.1 Cust4 93 22 54.8 20.3 24 6.9 24.4 Cust5 9.83 474 1.52 5.5 47 23.1 57.7 TPC-DS 87.7 24 34.9 17.2 97 7.9 28.2 TPC-CH 11 27

53

slide-54
SLIDE 54

▪ ▪ ▪ ▪

54

slide-55
SLIDE 55

▪ ▪ ▪ ▪

55

slide-56
SLIDE 56

▪ ▪

Delete buffer Key not in

57

slide-57
SLIDE 57

58

PRIMARY B+ TREE & SECONDARY COLUMNSTORE COLUMNSTORE PRIMARY STORAGE & SECONDARY B+ TREE

B+ tree

C O L C O L

B+ tree

SORTED B+ TREE NOT SORTED GLOBALLY COLUMNSTORE

slide-58
SLIDE 58

B+ tree

C O L

short range scans & lookups scarce memory large scans & bulk updates Balance updates & scans small storage footprint Mixed workload

59

slide-59
SLIDE 59

▪ B+

B+ tr trees ees

▪ ▪ Second

econdar ary y CSIs Is

▪ Hybr

brid id physic ysical l desi signs gns

B+ tree

C O L

60

slide-60
SLIDE 60

▪ ▪ ▪ ▪ ▪

61

slide-61
SLIDE 61

Large data & design space Complex storage

A 0.03 0.02 0.02 0.03 0.02 A 3 2 2 3 2 A 2 2 2 3 3 A 2, 3 3, 2

Input Encode Sort Compress

Sample based techniques 1) Build index on samples 2) Model full index

62

slide-62
SLIDE 62

Build index on samples Model full index

63

slide-63
SLIDE 63

64

slide-64
SLIDE 64

1 10 100 1,000 10,000 100,000 1,000,000 0.01 0.02 5.12 40.98 Execution time (millisec) % of updated rows

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

Primary & Secondary Columnstores comparable for large updates

UPDATE N rows lineitem WHERE l_shipdate = {1} TPC-H 30 GB, single thread

65

slide-65
SLIDE 65

66

0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of queries Hybrid Plans (Same Query) Hybrid Plans (Same Table)

slide-66
SLIDE 66

1 10 100 1,000 10,000 100,000 1,000,000 0.01 0.02 5.12 40.98 Execution time (millisec) % of updated rows

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

Primary Columnstores incur high cost for small updates

19 X UPDATE N rows lineitem WHERE l_shipdate = {1} TPC-H 30 GB, single thread

67

slide-67
SLIDE 67

Cheaper updates for B+ trees than for Columnstores

1 10 100 1,000 10,000 100,000 1,000,000 0.01 0.02 5.12 40.98 Execution time (millisec) % of updated rows

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

UPDATE N rows lineitem WHERE l_shipdate = {1} TPC-H 30 GB, single thread

68

slide-68
SLIDE 68

Large data & design space Complex Columnar storage

A 0.03 0.02 0.02 0.03 0.02 A 3 2 2 3 2 A 2 2 2 3 3 A 2, 3 3, 2

Input Encode Sort Compress

Sample based techniques At least as hard as distinct value estimation 69

slide-69
SLIDE 69

70

0% 20% 40% 60% 80% 100% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Percentage of query plans Hybrid Query Plans Hybrid Qeury Plans (Same Table)

slide-70
SLIDE 70

Large data & design space Complex Columnar storage

A 0.03 0.02 0.02 0.03 0.02 A 3 2 2 3 2 A 2 2 2 3 3 A 2, 3 3, 2

Input Encode Sort Compress

Sample based techniques At least as hard as distinct value estimation 71

slide-71
SLIDE 71

Update top 10 rows, Select sum of quantity & price for a single shipdate from lineitem table TPC-H 30 GB, 10 concurrent queries, Read Committed

1 10 100 1,000 10,000 100,000 1,000,000 scan: 0,update: 100 scan: 1, update: 99 scan: 2, update: 98 scan: 3, update: 97 scan: 4, update: 96 scan: 5, update: 95 Execution time (millisec) Percentage (%) for scans and updates

  • Pri. B+ tree
  • Pri. B+ tree with Sec. Col
  • Pri. Col

72

Secondary Columnstore: balance small updates & large scans

slide-72
SLIDE 72

73

Concept Access Path Selection Hybrid designs B+ tree Main memory

  • ptimized

General (disk- based) Scans Shared Non-shared DB engine Prototype SQL Server Main focus Model Concurrency DTA and many workloads Physical designs Columnstore and Secondary B+tree Hybrid Physical Designs

slide-73
SLIDE 73

74

B+ tree clustered

  • n order &

linenumber

Ship date

Primary B+ tree (base table) Secondary Columnstore Secondary B+tree on ship date Heap file base table Secondary Columnstore Secondary B+tree on ship date Covers all columns

HEAP

FILE

Ship date

Covers all columns

slide-74
SLIDE 74

75

B+ tree clustered

  • n order &

linenumber

Ship date

Primary B+ tree (base table) Secondary Columnstore Secondary B+ tree on ship date Primary Columnstore Secondary B+ tree on order/line Secondary B+ tree on ship date Covers all columns

Ship date

Covers all columns

B+ tree

  • n order &

linenumber

slide-75
SLIDE 75

DTA

DB Engine

Query Optimizer

Create Index Drop Index Create View …

Output

C O S T

what-if

1) Select candidates 2) Merge Indexes 3) Enumerate

Workload Constraints

  • Storage

budget

76

slide-76
SLIDE 76

DTA

DB Engine

Query Optimizer

Create Index Drop Index Create View …

Output

C O S T

what-if

1) Select candidates 2) Merge Indexes 3) Enumerate

Workload Constraints

  • Storage

budget

77

slide-77
SLIDE 77

DTA

DB Engine

Query Optimizer

Create Index Drop Index Create View …

Output

C O S T

what-if

1) Select candidates 2) Merge Indexes 3) Enumerate

Workload Constraints

  • Storage

budget

78

slide-78
SLIDE 78

DTA

DB Engine

Query Optimizer

Create Index Drop Index Create View …

Output

C O S T

what-if

1) Select candidates 2) Merge Indexes 3) Enumerate

Workload Constraints

  • Storage

budget

79

slide-79
SLIDE 79

80

Source: Real-Time Analytical Processing with SQL Server – Paul Larson, Adrian Birka, Eric Hanson, Weiyun Huang, Michal Novakiewicz, Vassilis Papadimos (Microsoft) Primary Columnstore & Secondary B+ tree index:

  • enforce constraints

efficiently (e.g. Primary Key)