Relax and Let the Database do the Partitioning Online Alekh Jindal, - - PowerPoint PPT Presentation

relax and let the database do the partitioning online
SMART_READER_LITE
LIVE PREVIEW

Relax and Let the Database do the Partitioning Online Alekh Jindal, - - PowerPoint PPT Presentation

Information Systems Group Relax and Let the Database do the Partitioning Online Alekh Jindal, Jens Dittrich - presented by Stefan Schuh VLDB International Workshop on Real-Time Business Intelligence September 2, 2011 Thursday, September 1,


slide-1
SLIDE 1

Information Systems Group

Relax and Let the Database do the Partitioning Online

Alekh Jindal, Jens Dittrich

VLDB International Workshop on Real-Time Business Intelligence

  • presented by Stefan Schuh

September 2, 2011

Thursday, September 1, 2011

slide-2
SLIDE 2

Motivation: Offline Physical Database Design

2

  • ffline workload analysis
  • ffline transformation

BI Applications Database Advisory Tools DBA

Thursday, September 1, 2011

slide-3
SLIDE 3

Offline Design Cheats!

  • Workloads infrequently change over time

3

  • DBAs always available
  • Physical design once-in-a-while process
  • DBAs make perfect decisions

Thursday, September 1, 2011

slide-4
SLIDE 4

Motivation: Offline Physical Database Design

4

  • ffline workload analysis
  • ffline transformation

BI Applications Database Advisory Tools DBA

Thursday, September 1, 2011

slide-5
SLIDE 5

Motivation: Online Physical Database Design

5 BI Applications Database

Sub-Problem Proposed Solution Indexing Online Indexing Database Cracking Adaptive Indexing Materialized Views Dynamic Materialized Views Partitioning

WE!

Thursday, September 1, 2011

slide-6
SLIDE 6

Challenges in Online Partitioning

  • Collecting online workload

6

  • Analyzing workload online
  • Creating partitions online
  • Querying with online workload analysis

Thursday, September 1, 2011

slide-7
SLIDE 7

Challenges in Online Partitioning

  • Collecting online workload

6

  • Analyzing workload online
  • Creating partitions online
  • Querying with online workload analysis

Thursday, September 1, 2011

slide-8
SLIDE 8

What is the Workload?

  • offline approach: take the last query log as

workload (static)

7

  • online approach: collect incoming queries in a

window and slide it when more queries come (dynamic)

Thursday, September 1, 2011

slide-9
SLIDE 9

What is the Workload?

  • offline approach: take the last query log as

workload (static)

7

  • online approach: collect incoming queries in a

window and slide it when more queries come (dynamic)

Q1 Q2 Q3 Q4 Q5

Window Size = 5

Thursday, September 1, 2011

slide-10
SLIDE 10

What is the Workload?

  • offline approach: take the last query log as

workload (static)

7

  • online approach: collect incoming queries in a

window and slide it when more queries come (dynamic)

Q1 Q2 Q3 Q4 Q5 Q6

Window Size = 5

Thursday, September 1, 2011

slide-11
SLIDE 11

What is the Workload?

  • offline approach: take the last query log as

workload (static)

7

  • online approach: collect incoming queries in a

window and slide it when more queries come (dynamic)

Q1 Q2 Q3 Q4 Q5 Q6 Q7

Window Size = 5

Thursday, September 1, 2011

slide-12
SLIDE 12

What is the Workload?

  • offline approach: take the last query log as

workload (static)

7

  • online approach: collect incoming queries in a

window and slide it when more queries come (dynamic)

Q1 Q2 Q3 Q4 Q5 Q6 Q7

Window Size = 5

.....

Thursday, September 1, 2011

slide-13
SLIDE 13

a1 a2 a3 a4 a5 a6

How to Express the Partitioning Problem?

  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6

8

set Pu

Thursday, September 1, 2011

slide-14
SLIDE 14

a1 a2 a3 a4 a5 a6

How to Express the Partitioning Problem?

  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6

8

set Pu

  • ordering e.g. a3 a2 a1 a5 a4 a6
  • rdering ⇥,

(

a1 a2 a3 a4 a5 a6 a3 a2 a1 a5 a4 a6

set Pu

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

Thursday, September 1, 2011

slide-15
SLIDE 15

a1 a2 a3 a4 a5 a6

How to Express the Partitioning Problem?

  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6

8

a1 a2 a3 a4 a5 a6

1 0 1 ] [

  • Split line, Split vector e.g. [01001]

set Pu

ctor S

  • ordering e.g. a3 a2 a1 a5 a4 a6
  • rdering ⇥,

(

a1 a2 a3 a4 a5 a6 a3 a2 a1 a5 a4 a6

set Pu

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

Thursday, September 1, 2011

slide-16
SLIDE 16

a1 a2 a3 a4 a5 a6

How to Express the Partitioning Problem?

  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6

8

a1 a2 a3 a4 a5 a6 1 0 1 ] [

  • Split line, Split vector e.g. [01001]
  • Partition e.g. (a1, a2)

set Pu

ctor S

artition pm,r(S,)

  • ordering e.g. a3 a2 a1 a5 a4 a6
  • rdering ⇥,

(

a1 a2 a3 a4 a5 a6 a3 a2 a1 a5 a4 a6

(a1 , a2)

a1 a2 a3 a4 a5 a6

set Pu

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

Thursday, September 1, 2011

slide-17
SLIDE 17

a1 a2 a3 a4 a5 a6

How to Express the Partitioning Problem?

  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6

8

a1 a2 a3 a4 a5 a6 1 0 1 ] [

  • Split line, Split vector e.g. [01001]
  • Partition e.g. (a1, a2)
  • Partitioning scheme e.g. (a1, a2), (a3, a4, a5), (a6)

set Pu

ctor S

artition pm,r(S,)

scheme P(S,)

  • ordering e.g. a3 a2 a1 a5 a4 a6
  • rdering ⇥,

(

a1 a2 a3 a4 a5 a6 a3 a2 a1 a5 a4 a6

(a1 , a2)

a1 a2 a3 a4 a5 a6

(a1 , a2) (a3 , a4, a5) (a6) { }

a1 a2 a3 a4 a5 a6

set Pu

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

Thursday, September 1, 2011

slide-18
SLIDE 18

What about Horizontal Partitioning?

  • Just rotate the table by 90 degrees

9

a1 a2 a3 a4 a5 a6 r6 r5 r4 r3 r2 r1 a1 a2 a3 a4 a5 a6

set Pu

  • abstraction allows us to solve both problems
  • can be attributes, row-ranges, or any other table slice

set Pu

Thursday, September 1, 2011

slide-19
SLIDE 19

Partitioning Problem: What to Analyze?

10

  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6
  • Split line, Split vector e.g. [01001]
  • Partition e.g. (a1, a2)
  • Problem statement

Find , such that:

set Pu

ctor S

artition pm,r(S,)

scheme P(S,)

S⇥ = argmin

S

Cest.

  • Wtk, P(S, )

S⇥

  • Workload
  • ordering e.g. a3 a2 a1 a5 a4 a6
  • rdering ⇥,

(

ad Wtk

set Pu

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • rdering ⇥,

(

  • Partitioning scheme e.g. (a1, a2), (a3, a4, a5), (a6)

Thursday, September 1, 2011

slide-20
SLIDE 20

How to Analyze the Workload?

11

  • offline approach: create affinity matrix and

cluster it once, as proposed by Navathe et. al.

  • online approach: leverage the affinity idea, but

dynamically update and cluster the affinity matrix

Step 1: Finding Partitioning Unit Ordering

Thursday, September 1, 2011

slide-21
SLIDE 21

Offline Partitioning Unit Ordering

  • Create affinity matrix having attributes co-occurrences

12

  • Cluster affinity matrix to maximize the affinity measure

⇥ ⇥ as M(⇥) = x

i=1

x

j=1 A(ai, aj)[A(ai, aj−1)+A(ai, aj+1)].

) = A(a , a ) = A(a , a ) = A(a , a )=0. For the

PartKey SuppKey Quantity PartKey 8 5 6 SuppKey 5 8 4 Quantity 6 4 9 PartKey Quantity SuppKey PartKey 8 6 5 Quantity 6 9 4 SuppKey 5 4 8 PartKey SuppKey Quantity PartKey 8 5 6 SuppKey 5 8 4 Quantity 6 4 9

e M(⇥) matrix has e M(⇥) matrix has

= 404 = 440

Thursday, September 1, 2011

slide-22
SLIDE 22

Online Partitioning Unit Ordering

  • Update only the referenced in affinity matrix

13

  • Re-cluster only the referenced in affinity matrix

SuppKey PartKey Quantity SuppKey 6 9 6 PartKey 4 6 9 Quantity 9 6 4 PartKey Quantity SuppKey PartKey 8 6 5 Quantity 6 9 4 SuppKey 5 4 8 PartKey Quantity SuppKey PartKey 9 6 6 Quantity 6 9 4 SuppKey 6 4 9

+48

PartKey Quantity SuppKey PartKey 9 6 6 Quantity 6 9 4 SuppKey 6 4 9

set Pu set Pu

(PartKey, SuppKey)

Thursday, September 1, 2011

slide-23
SLIDE 23

How to Analyze the Workload?

14

  • offline approach: consider all possible split vectors

(brute force)

Step 2: Enumerating Split Vectors

a1 a2 a3 .. ... an

is 2n1.

Complexity:

Thursday, September 1, 2011

slide-24
SLIDE 24

How to Analyze the Workload?

15

  • offline approach: consider all possible split vectors

(brute force)

  • online approach: One-dimensional Online

Partitioning (O2P) Algorithm

Technique 1: prune non-referenced partitioning units Technique 2: consider split vectors greedily Technique 3: save previous best split vectors using

dynamic programming

Step 2: Enumerating Split Vectors

Thursday, September 1, 2011

slide-25
SLIDE 25

Partitioning Unit Pruning

Idea: Prune the unused (non-referenced) in at most two separate partitions

16

a1 a2 a3 .. ... an

Complexity: For p leading and q trailing unused

set Pu

is 2npq1.

set Pu

Thursday, September 1, 2011

slide-26
SLIDE 26

Idea: Mark only one (best) split vector at a time

17

a1 a2 a3 .. ... an a1 a2 a3 .. ... an a1 a2 a3 .. ... an

Complexity: worst case n2

Greedy Split Vector Enumeration

Thursday, September 1, 2011

slide-27
SLIDE 27

Idea: save best split vectors in un-split partitions

18

a1 a2 a3 .. ... an a1 a2 a3 .. ... an a1 a2 a3 .. ... an

Best Best

Dynamic Programming

Thursday, September 1, 2011

slide-28
SLIDE 28

How to Amortize Partitioning Analysis?

  • offline approach: querying after computing and

creating partitions

19

Query Cost Analysis Cost Query Cost Analysis Cost

  • online approach:
  • ption1: interleave queries with partitioning analysis
  • ption2: queries in a separate thread

Thursday, September 1, 2011

slide-29
SLIDE 29

Goals of the Experiments

20

  • How much is O2P faster?
  • Does greedy partitioning hurt Quality?
  • Will our approach work on real systems?
  • Can such a system adapt to changing workload ?

Thursday, September 1, 2011

slide-30
SLIDE 30

Dynamic Workload

  • Mix of OLTP and OLAP style queries
  • OLTP: 1% selectivity and 75-100% attributes
  • OLAP: 10% selectivity and 1-25% attributes
  • Vary the fraction of OLTP-OLAP over time

21

Thursday, September 1, 2011

slide-31
SLIDE 31

Does Greedy Partitioning Hurt Quality?

22

Quality: Ratio of expected query costs of optimal partitioning and the partitioning produced by the algorithm

Customer Lineitem Optimal Navathe O2P Optimal Navathe O2P Quality

100% 99.29% 92.76% 100% 97.45% 95.80%

Iterations

100% 14.60% 2.28% 100% 2.42% 0.14%

We can see that

2

significantly reduces the number

Thursday, September 1, 2011

slide-32
SLIDE 32

How much is O2P Faster?

Setup: TPC-H Lineitem table, 10,000 queries in total

23

0.001 0.01 0.1 1 10 100 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Analysis Time (sec) Fraction of OLAP against OLTP queries NV/HC O2Pp O2Ppg O2pgd

Thursday, September 1, 2011

slide-33
SLIDE 33

Can such a System Adapt to Changing Workload ?

Setup: Universal relation de-normalized from TPC-H schema *, SF 1

24

17.8978616

10 20 30 40 50 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Workload Execution Time (sec) Fraction of OLAP against OLTP queries No Partitioning Full Vertical Partitioning AutoStore (O2Ppgd) AutoStore (O2Ppgdm) AutoStore (O2Ppgda)

* Constant-Time Query Processing, V. Raman et.al., ICDE 2008

Thursday, September 1, 2011

slide-34
SLIDE 34

Will our Approach Work on Real System?

Setup: TPC-H Customer table, SF 1, BerkeleyDB

25 25.5781685

750 1500 2250 3000

No Partitioning Full Vertical Partitioning AutoStore

22.818 1324.88 2113.447 2290.16

Execution Time Analysis Time

Cumulative Workload Execution Time (sec)

Thursday, September 1, 2011

slide-35
SLIDE 35

So Whats the Point Again?

  • Workloads infrequently change
  • ver time

26

  • DBAs always available
  • Physical design once-in-a-while

process

  • DBAs make perfect decisions

Thursday, September 1, 2011

slide-36
SLIDE 36

Summary

27

Motivation: Online Physical Database Design

5 BI Applications Database Sub-Problem Proposed Solution Indexing Online Indexing Database Cracking Adaptive Indexing Materialized Views Dynamic Materialized Views Partitioning

WE!

Partitioning Problem: What to Analyze?

10
  • Partitioning unit e.g. a1, a2, a3, a4, a5, a6
  • Split line, Split vector e.g. [01001]
  • Partition e.g. (a1, a2)
  • Partitioning scheme e.g. (a1, a2), (a3, a4), (a5, a6)
  • Problem statement

Find , such that:

set Pu

ctor S

artition pm,r(S,)

scheme P(S,)

S⇥ = argmin

S

Cest.

  • Wtk, P(S, )

S⇥

  • Workload
  • ordering e.g. a3 a2 a1 a5 a4 a6
  • rdering ⇥,

(

ad Wtk

set Pu

  • rdering ⇥,

(

  • rdering ⇥,
(
  • rdering ⇥,
(
  • rdering ⇥,
(
  • rdering ⇥,
(
  • rdering ⇥,
(

Online Partitioning Unit Ordering

  • Update only the referenced in affinity matrix
13
  • Re-cluster only the referenced in affinity matrix
SuppKey PartKey Quantity SuppKey 6 9 6 PartKey 4 6 9 Quantity 9 6 4 PartKey Quantity SuppKey PartKey 8 6 5 Quantity 6 9 4 SuppKey 5 4 8 PartKey Quantity SuppKey PartKey 9 6 6 Quantity 6 9 4 SuppKey 6 4 9 +48 PartKey Quantity SuppKey PartKey 9 6 6 Quantity 6 9 4 SuppKey 6 4 9

set Pu set Pu

(PartKey, SuppKey)

How to Analyze the Workload?

15
  • offline approach: consider all possible split vectors

(brute force)

  • online approach: One-dimensional Online

Partitioning (O2P) Algorithm

  • prune non-referenced partitioning units
  • consider split vectors greedily
  • save previous best split vectors using dynamic programming

Step 2: Enumerating Split Vectors

How much is O2P Faster?

Setup: TPC-H Lineitem table, 10,000 queries in total

23 0.001 0.01 0.1 1 10 100 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Analysis Time (sec) Fraction of OLAP against OLTP queries NV/HC O2Pp O2Ppg O2pgd

Can such a System Adapt to Changing Workload ?

Setup: Universal relation de-normalized from TPC-H schema, 11 attributes, SF 1

24 10 20 30 40 50 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Workload Execution Time (sec) Fraction of OLAP against OLTP queries No Partitioning Full Vertical Partitioning AutoStore (O2Ppgd) AutoStore (O2Ppgdm) AutoStore (O2Ppgda)

Thursday, September 1, 2011