A Comparison of Knives for Bread Slicing Alekh Jindal*, Endre - - PowerPoint PPT Presentation

a comparison of knives for bread slicing
SMART_READER_LITE
LIVE PREVIEW

A Comparison of Knives for Bread Slicing Alekh Jindal*, Endre - - PowerPoint PPT Presentation

A Comparison of Knives for Bread Slicing Alekh Jindal*, Endre Palatinus, Vladimir Pavlov, Jens Dittrich Information Systems Group Saarland University *MIT CSAIL Data Layout 2 Row-layout CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT


slide-1
SLIDE 1

A Comparison of Knives for Bread Slicing

Alekh Jindal*, Endre Palatinus, Vladimir Pavlov, Jens Dittrich Information Systems Group Saarland University *MIT CSAIL

slide-2
SLIDE 2

Data Layout

2

slide-3
SLIDE 3

Row-layout

3

CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT PHONE ACCTBAL 1234556 DE G.W. 43 E1.4

  • 1234

€987,513 2334444 GB I.N. 22 OX13

  • 332

€10,522 1123234 US M.S. 22 CA16

  • 1233

€6,452 2323454 DE J.D. 43 E1.3 CST_LOW 54443 €399 2311555 GB A.M. 12 WA154

  • 23442

€46,523 1231235 NL T.V. 42 AM3321

  • 1123

€180,000

slide-4
SLIDE 4

Column-layout

4

CUSTKEY 1234556 2334444 1123234 2323454 2311555 1231235 NATIONKEY DE GB US DE GB NL NAME G.W. I.N. M.S. J.D. A.M. T.V. MKTSEGMENT 43 22 22 43 12 42 ADDRESS E1.4 OX13 CA16 E1.3 WA154 AM3321 PHONE 1234 332 1233 54443 23442 1123 COMMENT

  • CST_LOW
  • ACCTBAL

€987,513 €10,522 €6,452 €399 €46,523 €180,000

slide-5
SLIDE 5

Column-grouping

5

CUSTKEY 1234556 2334444 1123234 2323454 2311555 1231235 NATIONKEY DE GB US DE GB NL NAME G.W. I.N. M.S. J.D. A.M. T.V. MKTSEGMENT 43 22 22 43 12 42 ADDRESS COMMENT E1.4

  • OX13
  • CA16
  • E1.3

CST_LOW WA154

  • AM3321
  • PHONE

ACCTBAL 1234 €987,513 332 €10,522 1233 €6,452 54443 €399 23442 €46,523 1123 €180,000

slide-6
SLIDE 6

The Vertical Partitioning Problem

  • Given a workload and a cost function
  • Provide a complete and disjunct partitioning
  • f the set of attributes of a table
slide-7
SLIDE 7

Workload

7

slide-8
SLIDE 8

CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

8

slide-9
SLIDE 9

Selectivity?

9

High selectivity Low selectivity Indexes Vertical partitioning

slide-10
SLIDE 10

Vertical Partitioning in Legacy Row-Stores

10

slide-11
SLIDE 11

TPC-H Customer

CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT PHONE ACCTBAL

P1

CUSTKEY

P2

NATIONKEY

P3

NAME

P4

MKTSEGMENT

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

11

slide-12
SLIDE 12

SELECT Name, Address, Acctbal FROM Customer

P1

CUSTKEY

P2

NATIONKEY

P3

NAME

P4

MKTSEGMENT

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

12

slide-13
SLIDE 13

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

13

slide-14
SLIDE 14

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

14

slide-15
SLIDE 15

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

15

slide-16
SLIDE 16

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

16

slide-17
SLIDE 17

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

17

slide-18
SLIDE 18

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

18

slide-19
SLIDE 19

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

19

slide-20
SLIDE 20

Database Buffer

NAME ADDRESS COMMENT PHONE ACCTBAL

P3

NAME

P5

ADDRESS COMMENT

P6

PHONE ACCTBAL

SELECT Name, Address, Acctbal FROM Customer

20

slide-21
SLIDE 21

Classification of VP algorithms

21

slide-22
SLIDE 22

22

Starting Point

Whole workload

Starting Point

Attribute subset

Starting Point

Query subset

Search Strategy

Brute force

Search Strategy

Top-down

Search Strategy

Bottom-up

Candidate Pruning

No pruning

Candidate Pruning

Threshold-based

slide-23
SLIDE 23

23

AutoPart HillClimb HYRISE Navathe O2P Trojan Brute Force

Starting Point

Whole workload

Starting Point

Attribute subset

Starting Point

Query subset

Search Strategy

Brute force

Search Strategy

Top-down

Search Strategy

Bottom-up

Candidate Pruning

No pruning

Candidate Pruning

Threshold-based

slide-24
SLIDE 24

HillClimb Example

slide-25
SLIDE 25

Iteration 1

CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

slide-26
SLIDE 26

Iteration 2

CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

slide-27
SLIDE 27

Iteration 3

CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

slide-28
SLIDE 28

Granularity

FILE

Granularity

DATA PAGE

Granularity

DATABASE BLOCK

Hardware

HARD DISK

Hardware

MAIN MEMORY

Workload

OFFLINE

Workload

ONLINE

Replication

NONE

Replication

FULL

Replication

PARTIAL

System

CUSTOM

System

COST MODEL

System

OPEN SOURCE

Settings for VP Algos

28

slide-29
SLIDE 29

Settings for VP Algos

29

Granularity

FILE

Granularity

DATA PAGE

Granularity

DATABASE BLOCK

Hardware

HARD DISK

Hardware

MAIN MEMORY

Workload

OFFLINE

Workload

ONLINE

Replication

NONE

Replication

FULL

Replication

PARTIAL

System

CUSTOM

System

COST MODEL

System

OPEN SOURCE

slide-30
SLIDE 30

Settings for VP Algos

30

AutoPart HillClimb HYRISE Navathe O2P Trojan

Granularity

FILE

Granularity

DATA PAGE

Granularity

DATABASE BLOCK

Hardware

HARD DISK

Hardware

MAIN MEMORY

Workload

OFFLINE

Workload

ONLINE

Replication

NONE

Replication

FULL

Replication

PARTIAL

System

CUSTOM

System

COST MODEL

System

OPEN SOURCE

slide-31
SLIDE 31

Experimental Results

31

slide-32
SLIDE 32

TPC-H Query Runtimes

150 300 450 600 AutoPart HillClimb HYRISE Navathe O2P Trojan BruteForce Column 393 381 381 506 481 387 381 381

  • Estd. workload runtime (s)

Row 2,058

32

slide-33
SLIDE 33

Unnecessary Data Read

0% 25% 50% 75% 100% AutoPart HillClimb HYRISE Navathe O2P Trojan BruteForce Column Row 83.81% 0% 0.80% 0.91% 25.37% 21.34% 0.00% 0.80% 0.80% Unnecessary data read (%)

33

slide-34
SLIDE 34

Average #Tuple Reconstruction Joins

1 2 3 AutoPart HillClimb HYRISE Navathe O2P Trojan BruteForce Column Row

  • Avg. tuple reconstruction joins

34

slide-35
SLIDE 35

Try another Benchmark

[O’Neil et al: Star Schema Benchmark]

35

Layout TPC-H SSB AutoPart 3.71% 5.29% HillClimb 3.71% 5.29% HYRISE 1.58% 5.27% Navathe

  • 21.47%

1.64% O2P

  • 27.74%

1.64% Trojan 3.71% 0.05% BruteForce 3.71% 5.29%

Improvement over Column-layout

slide-36
SLIDE 36

Try another Cost Model

[Grund et al: HYRISE, VLDB’10]

36

Layout HDD MM AutoPart 3.71% 0.00% HillClimb 3.71% 0.00% HYRISE 1.58% 0.00% Navathe

  • 21.47%
  • 15.07%

O2P

  • 27.74%
  • 15.53%

Trojan 3.71% 0.00% BruteForce 3.71% 0.00%

Improvement over Column-layout

slide-37
SLIDE 37

Try it in DBMS-X

Compression Row Column HillClimb Default (LZO or Delta) 1652 s 377 s 450 s Dictionary 1265 s 511 s 532 s

37

Actual Workload Runtimes

slide-38
SLIDE 38

Buffer Size is Crucial

0% 25% 50% 75% 100% 125% 150% 0.01 0.1 1 10 100 1,000 10,000 Normalized estd. costs (%) Buffer Size (MB, log scale)

HillClimb Navathe Materialized views Column

38

slide-39
SLIDE 39

Summary

  • Buffer size is crucial
  • Column layout is good enough
  • HillClimb is the best

VP algorithm