Merging Whats Cracked, Cracking Whats Merged Adaptive Indexing in - - PowerPoint PPT Presentation

merging what s cracked cracking what s merged
SMART_READER_LITE
LIVE PREVIEW

Merging Whats Cracked, Cracking Whats Merged Adaptive Indexing in - - PowerPoint PPT Presentation

Merging Whats Cracked, Cracking Whats Merged Adaptive Indexing in Main-Memory Column-Stores Stratos Idreos, Stefan Manegold, (CWI) Harumi Kuno, Goetz Graefe (HP Labs) PVLDB 2011, 4(9) Physical design problem Database systems perform


slide-1
SLIDE 1

Merging What’s Cracked, Cracking What’s Merged

Adaptive Indexing in Main-Memory Column-Stores

Stratos Idreos, Stefan Manegold, (CWI) Harumi Kuno, Goetz Graefe (HP Labs) PVLDB 2011, 4(9)

slide-2
SLIDE 2

Physical design problem

which indexes to build?

  • n which data parts?

and when to build them? Database systems perform efficiently

  • nly after proper tuning...

DBA without cracking

slide-3
SLIDE 3

Physical Design

Sample Workload

Timeline

slide-4
SLIDE 4

Physical Design

Sample Workload Analyze Performance

Timeline

slide-5
SLIDE 5

Physical Design

Sample Workload Analyze Performance Prepare Estimated physical design

Timeline

slide-6
SLIDE 6

Physical Design

Sample Workload Analyze Performance Prepare Estimated physical design

Timeline

Queries

slide-7
SLIDE 7

Physical Design

Sample Workload Analyze Performance Prepare Estimated physical design

Timeline

Queries

Complex and time consuming process

slide-8
SLIDE 8

Physical Design

Sample Workload Analyze Performance Prepare Estimated physical design

Timeline

Queries

Complex and time consuming process

?

Dynamic Workloads Very Large Databases?

slide-9
SLIDE 9

Dynamic environments

idle time workload knowledge

slide-10
SLIDE 10

Dynamic environments

idle time workload knowledge

some problem cases

slide-11
SLIDE 11

Dynamic environments

idle time workload knowledge

  • Not enough idle time to finish proper tuning

some problem cases

slide-12
SLIDE 12

Dynamic environments

  • By the time we finish tuning, the workload changes

idle time workload knowledge

  • Not enough idle time to finish proper tuning

some problem cases

slide-13
SLIDE 13

Dynamic environments

  • By the time we finish tuning, the workload changes
  • No index support during tuning

idle time workload knowledge

  • Not enough idle time to finish proper tuning

some problem cases

slide-14
SLIDE 14

Dynamic environments

  • By the time we finish tuning, the workload changes
  • No index support during tuning
  • Not all data parts are equally useful

idle time workload knowledge

  • Not enough idle time to finish proper tuning

some problem cases

slide-15
SLIDE 15

Database Cracking

Remove all tuning, physical design steps but still get similar performance as a fully tuned system

How? Design new auto-tuning kernels

(operators, plans, structures, etc.) For dynamic environments:

DBA with cracking

slide-16
SLIDE 16

Database Cracking

no monitoring no preparation no human involvement no external tools no full indexes

slide-17
SLIDE 17

Database Cracking

no monitoring no preparation no human involvement no external tools

Continuous on-the-fly physical reorganization

no full indexes

slide-18
SLIDE 18

Database Cracking

no monitoring no preparation no human involvement no external tools

Continuous on-the-fly physical reorganization

no full indexes

partial, incremental, adaptive indexing

slide-19
SLIDE 19

Database Cracking

no monitoring no preparation no human involvement no external tools

Continuous on-the-fly physical reorganization designed for modern column-stores

no full indexes

partial, incremental, adaptive indexing

slide-20
SLIDE 20

Database Cracking

Each query is treated as an advice

  • n how data should be stored
slide-21
SLIDE 21

Cracking Example

Database Cracking CIDR 2007

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

slide-22
SLIDE 22

Cracking Example

Database Cracking CIDR 2007

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

slide-23
SLIDE 23

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-24
SLIDE 24

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-25
SLIDE 25

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-26
SLIDE 26

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-27
SLIDE 27

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-28
SLIDE 28

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-29
SLIDE 29

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Database Cracking CIDR 2007

slide-30
SLIDE 30

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate Result tuples

Database Cracking CIDR 2007

slide-31
SLIDE 31

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Gain knowledge

  • n how data is
  • rganized

Physically reorganize based on the selection predicate Result tuples

Database Cracking CIDR 2007

slide-32
SLIDE 32

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Gain knowledge

  • n how data is
  • rganized

Dynamically/on-the-fly within the select-operator

Physically reorganize based on the selection predicate Result tuples

Database Cracking CIDR 2007

slide-33
SLIDE 33

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-34
SLIDE 34

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-35
SLIDE 35

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-36
SLIDE 36

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-37
SLIDE 37

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-38
SLIDE 38

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-39
SLIDE 39

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-40
SLIDE 40

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-41
SLIDE 41

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-42
SLIDE 42

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

Physically reorganize based on the selection predicate Result tuples

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-43
SLIDE 43

Cracking Example

Each query is treated as an advice on how data should be stored

from R where R.A > 10 and R.A < 14 select * Q2: select * from R where R.A > 7 and R.A <= 16 Q1: 1 3 6 7 9 8 13 12 11 14 16 19 Piece 5: 16 < A Piece 3: 10 < A < 14 Piece 1: A <= 7 Piece 2: 7 < A <= 10 Piece 4: 14 <= A <= 16 Cracker column of A Cracker column of A 10 < A < 14 14 <= A A <= 10 Piece 1: Piece 3: Piece 2: (in−place) (copy) Q1 Q2 Column A 13 16 4 9 2 12 7 1 19 3 14 11 8 6 4 9 2 7 1 3 8 6 13 12 11 16 19 14 4 2

The more we crack, the more we learn

Physically reorganize based on the selection predicate Result tuples

Dynamically/on-the-fly within the select-operator

Database Cracking CIDR 2007

slide-44
SLIDE 44

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted Response time (milli secs)

Database Cracking, SIGMOD 09

slide-45
SLIDE 45

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted Response time (milli secs)

Normal MonetDB selection cracking

Database Cracking, SIGMOD 09

slide-46
SLIDE 46

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted

Preparation cost 3-14 minutes

Response time (milli secs)

Presorted MonetDB Normal MonetDB selection cracking

Database Cracking, SIGMOD 09

slide-47
SLIDE 47

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted

Preparation cost 3-14 minutes

Response time (milli secs)

Presorted MonetDB MonetDB with sideways cracking Normal MonetDB selection cracking

Database Cracking, SIGMOD 09

slide-48
SLIDE 48

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted

Preparation cost 3-14 minutes

Response time (milli secs)

Presorted MonetDB MonetDB with sideways cracking Normal MonetDB selection cracking

Database Cracking, SIGMOD 09

slide-49
SLIDE 49

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted

Preparation cost 3-14 minutes

Response time (milli secs)

Presorted MonetDB MonetDB with sideways cracking Normal MonetDB selection cracking

Database Cracking, SIGMOD 09

slide-50
SLIDE 50

Self-organizing behavior (TPC-H)

70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000

  • Sel. Crack
  • Sid. Crack

MonetDB Presorted MySQL Presorted

Preparation cost 3-14 minutes

Response time (milli secs)

Presorted MonetDB MonetDB with sideways cracking Normal MonetDB selection cracking

Database Cracking, SIGMOD 09

slide-51
SLIDE 51

Indexing Overview

workload analysis index building query processing

  • ffline indexing

workload analysis workload analysis

slide-52
SLIDE 52

Indexing Overview

workload analysis index building query processing

  • ffline indexing
  • nline indexing

workload analysis index building query processing workload analysis workload analysis

slide-53
SLIDE 53

Indexing Overview

workload analysis index building query processing

  • ffline indexing
  • nline indexing

adaptive indexing workload analysis index building query processing workload analysis workload analysis adaptive indexing

slide-54
SLIDE 54

Indexing Overview

workload analysis index building query processing

  • ffline indexing
  • nline indexing

adaptive indexing workload analysis index building query processing workload analysis workload analysis adaptive indexing

workload knowledge idle time adaptive

  • nline
  • ffline
slide-55
SLIDE 55

Database Cracking

Each query is treated as an advice on how data should be stored

CIDR’07 Selection cracking SIGMOD’07 Updates SIGMOD’09 Sideways and partial cracking

Can be thought of as an incremental quicksort

slide-56
SLIDE 56

Database Cracking

Each query is treated as an advice on how data should be stored

CIDR’07 Selection cracking SIGMOD’07 Updates SIGMOD’09 Sideways and partial cracking The core cracking algorithm is extremely lazy

Can be thought of as an incremental quicksort

slide-57
SLIDE 57

Adaptive Merging

Incremental sort via external merge sort steps 100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-58
SLIDE 58

Adaptive Merging

Incremental sort via external merge sort steps 100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-59
SLIDE 59

Adaptive Merging

Incremental sort via external merge sort steps

select(A,50,100)

100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-60
SLIDE 60

Adaptive Merging

Incremental sort via external merge sort steps

sort

select(A,50,100)

100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-61
SLIDE 61

Adaptive Merging

Incremental sort via external merge sort steps

sort sort

select(A,50,100)

100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-62
SLIDE 62

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort

select(A,50,100)

100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-63
SLIDE 63

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-64
SLIDE 64

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

binary search 100

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-65
SLIDE 65

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

binary search 100 binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-66
SLIDE 66

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

binary search 100 binary search binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-67
SLIDE 67

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

binary search 100 binary search binary search binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-68
SLIDE 68

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

50 100 binary search 100 binary search binary search binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-69
SLIDE 69

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

50 100 binary search 100 binary search binary search binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-70
SLIDE 70

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

50 100 binary search 100 binary search binary search binary search

sorted

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno

slide-71
SLIDE 71

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100)

50 100 binary search 100 binary search binary search binary search

sorted

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-72
SLIDE 72

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100) select(A,55,70)

50 100 100

sorted

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-73
SLIDE 73

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100) select(A,55,70)

50 100 50 100 100

sorted

binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-74
SLIDE 74

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

100

sorted

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-75
SLIDE 75

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

50 100 100

sorted

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-76
SLIDE 76

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

50 100 100

sorted

binary search binary search binary search binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-77
SLIDE 77

Adaptive Merging

Incremental sort via external merge sort steps

sort sort sort sort

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

50 100 150 170 100

sorted

binary search binary search binary search binary search

EDBT’10, SMDB’10, Goetz Graefe and Harumi Kuno Initial Final

slide-78
SLIDE 78

Questions

  • Adaptive merging in column-stores?
  • Adaptive merging Vs Cracking?
  • Can we learn from both AM and Cracking?
slide-79
SLIDE 79

Performance Analysis

0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Query sequence c) All queries (cf., Fig. 10b) Scan Sort AM Crack

Cumulative Average (secs)

10K random selections selectivity 10% random value ranges in a 30 million integer column

set-up

slide-80
SLIDE 80

Performance Analysis

0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Query sequence c) All queries (cf., Fig. 10b) Scan Sort AM Crack

Cumulative Average (secs)

10K random selections selectivity 10% random value ranges in a 30 million integer column

set-up

slide-81
SLIDE 81

Performance Analysis

0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Query sequence c) All queries (cf., Fig. 10b) Scan Sort AM Crack

Cumulative Average (secs)

10K random selections selectivity 10% random value ranges in a 30 million integer column

set-up

slide-82
SLIDE 82

Performance Analysis

0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Query sequence c) All queries (cf., Fig. 10b) Scan Sort AM Crack

Cumulative Average (secs)

10K random selections selectivity 10% random value ranges in a 30 million integer column

set-up AM: high init overhead but fast convergence

slide-83
SLIDE 83

Performance Analysis

0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Query sequence c) All queries (cf., Fig. 10b) Scan Sort AM Crack

Cumulative Average (secs)

10K random selections selectivity 10% random value ranges in a 30 million integer column

set-up AM: high init overhead but fast convergence Crack: low init overhead but slow convergence

slide-84
SLIDE 84

Questions

Adaptive merging and Cracking are extremes

slide-85
SLIDE 85

Questions

What is there in between? Adaptive merging and Cracking are extremes

slide-86
SLIDE 86

Crack-Crack

vary initialization and incremental steps taken 100

slide-87
SLIDE 87

Crack-Crack

vary initialization and incremental steps taken 100

slide-88
SLIDE 88

Crack-Crack

vary initialization and incremental steps taken

select(A,50,100)

100

slide-89
SLIDE 89

Crack-Crack

vary initialization and incremental steps taken

crack

select(A,50,100)

100

slide-90
SLIDE 90

Crack-Crack

vary initialization and incremental steps taken

crack crack

select(A,50,100)

100

slide-91
SLIDE 91

Crack-Crack

vary initialization and incremental steps taken

crack crack crack

select(A,50,100)

100

slide-92
SLIDE 92

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100)

100

slide-93
SLIDE 93

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100)

50 100 100

slide-94
SLIDE 94

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100)

50 100 100

slide-95
SLIDE 95

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100)

50 100 100

not sorted

slide-96
SLIDE 96

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100) select(A,55,70)

50 100 100

not sorted

slide-97
SLIDE 97

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100) select(A,55,70)

50 100 50 100 100

not sorted

crack

slide-98
SLIDE 98

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

100

not sorted

slide-99
SLIDE 99

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

50 100 100

not sorted

slide-100
SLIDE 100

Crack-Crack

vary initialization and incremental steps taken

crack crack crack crack

select(A,50,100) select(A,55,70)

50 100 50 100

select(A,150,170)

50 100 150 170 crack 100 crack crack crack

not sorted

slide-101
SLIDE 101

Adaptive Indexing

Sort Radix Crack final partitions fast − convergence − slow initial partitions high − overhead − low HSC HSR HSS HRS HRR HRC HCC HCR HCS Crack Radix Sort slow − convergence − fast low − overhead − high

slide-102
SLIDE 102

Adaptive Indexing

Sort Radix Crack final partitions fast − convergence − slow initial partitions high − overhead − low HSC HSR HSS HRS HRR HRC HCC HCR HCS Crack Radix Sort slow − convergence − fast low − overhead − high

slide-103
SLIDE 103

Adaptive Indexing

Sort Radix Crack final partitions fast − convergence − slow initial partitions high − overhead − low HSC HSR HSS HRS HRR HRC HCC HCR HCS Crack Radix Sort slow − convergence − fast low − overhead − high

slide-104
SLIDE 104

Adaptive Indexing

Sort Radix Crack final partitions fast − convergence − slow initial partitions high − overhead − low HSC HSR HSS HRS HRR HRC HCC HCR HCS Crack Radix Sort slow − convergence − fast low − overhead − high

slide-105
SLIDE 105

Adaptive Indexing

Sort Radix Crack final partitions fast − convergence − slow initial partitions high − overhead − low HSC HSR HSS HRS HRR HRC HCC HCR HCS Crack Radix Sort slow − convergence − fast low − overhead − high

slide-106
SLIDE 106

Adaptive Indexing

1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) b) (scan) Hybrid: Crack Crack Crack Radix Crack Sort 1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) a) (scan) Scan Cracking Adaptive Merging Full Index

1 10 100 1000

Queries

slide-107
SLIDE 107

Adaptive Indexing

1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) b) (scan) Hybrid: Crack Crack Crack Radix Crack Sort 1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) a) (scan) Scan Cracking Adaptive Merging Full Index

1 10 100 1000

Queries

slide-108
SLIDE 108

Adaptive Indexing

1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) b) (scan) Hybrid: Crack Crack Crack Radix Crack Sort 1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) a) (scan) Scan Cracking Adaptive Merging Full Index

1 10 100 1000

Queries

slide-109
SLIDE 109

Adaptive Indexing

1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) b) (scan) Hybrid: Crack Crack Crack Radix Crack Sort 1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) a) (scan) Scan Cracking Adaptive Merging Full Index

1 10 100 1000

Queries

slide-110
SLIDE 110

Adaptive Indexing

1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) b) (scan) Hybrid: Crack Crack Crack Radix Crack Sort 1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) a) (scan) Scan Cracking Adaptive Merging Full Index

1 10 100 1000

Queries

slide-111
SLIDE 111

Adaptive Indexing

1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) b) (scan) Hybrid: Crack Crack Crack Radix Crack Sort 1e-05 0.0001 0.001 0.01 0.1 1 10 100 Response time (secs) a) (scan) Scan Cracking Adaptive Merging Full Index

1 10 100 1000

Queries

slide-112
SLIDE 112

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

slide-113
SLIDE 113

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

slide-114
SLIDE 114

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

slide-115
SLIDE 115

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

slide-116
SLIDE 116

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

Disk Concurrency

slide-117
SLIDE 117

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

Disk Concurrency

More active is best

slide-118
SLIDE 118

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

Disk Concurrency Updates

More active is best

slide-119
SLIDE 119

Adaptive Indexing

How many queries before the index fully supports a random query? none Cost of first query relative to in- memory scan effort Full Index 1x Adaptive Merging Database Cracking 10 100 1000 2x 5x 10x Ideal Hybrid never Scan Bad Hybrid CC CR CS

Initialization Vs convergence tradeoff

Disk Concurrency Updates

More active is best More lazy is best

slide-120
SLIDE 120

More in the paper...

Column-store design details and concerns Selectivity effects Concurrency control examples and more...

slide-121
SLIDE 121

Ongoing and open topics

Disk based Concurrency control

Multi-cores Compression Workload robustness Aggregations Row-stores Pipelining Optimizer rules

Adaptive Indexing + Auto tuning tools

...and many more...

slide-122
SLIDE 122

Ongoing and open topics

Disk based Concurrency control

Multi-cores Compression Workload robustness Aggregations Row-stores Pipelining Optimizer rules

Adaptive Indexing + Auto tuning tools

...and many more...

Thank you!