Skew-Aware Automatic Database Partitioning in Shared- Nothing, - - PowerPoint PPT Presentation

skew aware automatic
SMART_READER_LITE
LIVE PREVIEW

Skew-Aware Automatic Database Partitioning in Shared- Nothing, - - PowerPoint PPT Presentation

Skew-Aware Automatic Database Partitioning in Shared- Nothing, Parallel OLTP Systems SIGMOD 2012, Pavlo et al. Hefu Chai Credit Part of slides from Andy Pavlo There is a saying Girls are really only interested in two things. They


slide-1
SLIDE 1

Skew-Aware Automatic Database Partitioning in Shared- Nothing, Parallel OLTP Systems

SIGMOD 2012, Pavlo et al. Hefu Chai

slide-2
SLIDE 2

Credit

  • Part of slides from Andy Pavlo
slide-3
SLIDE 3

There is a saying…

  • Girls are really only interested in two things. They want a guy that is

good looking, or they want a guy that really knows a lot about databases. Andy Pavlo

slide-4
SLIDE 4

4

Client Application

Database Cluster

Procedure Name Input Parameters

Transaction Execution

Database Cluster

Transaction Result

slide-5
SLIDE 5

5

slide-6
SLIDE 6

Existing database partitioning Techniques

  • Notion of data declustering
  • Overhead of maintaining transaction consistency
  • Lock contention

Not applicable to OLTP systems !

slide-7
SLIDE 7

Fa Fast st Re Repe petitive titive Sma Small ll

OLT OLTP Tr P Transac ansactio tions ns

slide-8
SLIDE 8

We need an approach that supports…

  • Stored Procedure
  • Load balancing in the presence of time-varying skew
  • Complex schemas
  • Deployments with larger number of partitions
slide-9
SLIDE 9

Automatic Database Design Tool for Parallel Systems

Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems

SIGMOD 2012

slide-10
SLIDE 10

What are the key issues

  • Distributed transactions
  • Temporal workload skew
slide-11
SLIDE 11

25,000 50,000 75,000 100,000 125,000 150,000 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64

txn/s Partitions

No Distributed Txns 20% Distributed Txns

TPC-C NewOrder

11

Distributed transactions

slide-12
SLIDE 12

What are the key issues

  • Distributed transactions
  • Temporal workload skew
slide-13
SLIDE 13

Temporal workload skew

  • Think about the example of Wikipedia
  • Even though the average load of the cluster for the entire day is uniform, the

load across the cluster for any point is unbalanced

  • Static Skew Vs. Temporal Skew
slide-14
SLIDE 14

CUSTOME R ORDERS ITEM CUSTOME R ORDERS ITEM CUSTOME R ORDERS ITEM CUSTOME R ORDERS ITEM

Skew Estimator DTxn Estimator

Schema Workload

  • DDL

DDL

SELECT * FROM WARE REHO HOUSE USE WHERE W_ID = 10; INSERT INTO ORDERS ERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WARE REHOUSE HOUSE WHERE W_ID = 10; INSERT INTO ORDERS ERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WARE REHO HOUSE USE WHERE W_ID = 10; SELECT * FROM DISTRICT RICT D_W_ID = 10 AND D_ID =9; INSERT INTO ORDERS ERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10; SELECT * FROM DISTRICT WHERE D_W_ID = 10 AND D_ID =9; INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID,…) VALUES (10, 9, 12345,…);

DDL DDL

CUSTOMER ORDERS ITEM

14

slide-15
SLIDE 15
  • Maintain the tradeoff

between distributed transactions and temporal skew

  • Extend design space

to include replicated secondary indexes

  • Organically handling

stored procedure routing Large Neighborhood Search Skew-Aware Cost Model

slide-16
SLIDE 16

For each table:

  • Horizontally partition
  • Replicate on all partitions
  • Replicate a secondary index for a subset of its column
  • Effectively route incoming transaction requests

What are the design options

slide-17
SLIDE 17
  • _id
  • _c_id

_id

  • _w_id

id …

78703 1004 5

  • 78704

1002 3

  • 78705

1006 7

  • 78706

1005 6

  • 78707

1005 6

  • 78708

1003 12

  • c_id

c_w_id id c_last st …

1001 5 RZA

  • 1002

3 GZA

  • 1003

12 Raekwon

  • 1004

5 Deck

  • 1005

6 Killah

  • 1006

7 ODB

  • CUSTOMER

ORDERS

CUSTOMER ORDERS CUSTOMER ORDERS CUSTOMER ORDERS

17

Horizontal Partitioning

slide-18
SLIDE 18

CUSTOMER ORDERS CUSTOMER ORDERS CUSTOMER ORDERS

ITEM

i_id i_name i_price ice …

603514 XXX 23.99

  • 267923

XXX 19.99

  • 475386

XXX 14.99

  • 578945

XXX 9.98

  • 476348

XXX 103.49

  • 784285

XXX 69.99

  • ITEM

ITEM ITEM

18

Table Replication

slide-19
SLIDE 19

CUSTOMER ORDERS CUSTOMER ORDERS CUSTOMER ORDERS ITM ITEM ITEM

CUSTOMER

c_id c_w_id id c_last st …

1001 5 RZA

  • 1002

3 GZA

  • 1003

12 Raekwon

  • 1004

5 Deck

  • 1005

6 Killah

  • 1006

7 ODB

  • 19

Secondary Index

slide-20
SLIDE 20

CUSTOMER ORDERS CUSTOMER ORDERS CUSTOMER ORDERS ITEM ITEM ITEM

Client Application NewOrder(5, “Method Man”, 1234)

20

Stored Procedure Routing

slide-21
SLIDE 21

What are the key technique contributions

  • Large-Neighborhood Search
  • Skew-Aware Cost Model
slide-22
SLIDE 22

Input

Workload

  • Schema

DD DD L

Initial Design

Large-Neighborhood Search

22

  • Select the most frequently accessed column for horizontal partitioning
  • Greedily replicate read-only tables until no space left
  • Select next most frequently accessed, read-only column as secondary
  • Index attribute
  • Select the routing parameter for stored procedures

Initial Design

slide-23
SLIDE 23

Initial Design Relaxation

Large-Neighborhood Search

23

  • Allow LNS to escape a local minimum and jump to a new neighborhood of potential solutions
  • Horticulture must decide:
  • How many tables to relax
  • Which tables to relax
  • What design options will be examined for each relaxed table

Relaxation

slide-24
SLIDE 24

Best Design Relaxation Local Search Restart

Large-Neighborhood Search

24

Local Search

slide-25
SLIDE 25

What are the key technique contributions

  • Large-Neighborhood Search
  • Skew-Aware Cost Model
slide-26
SLIDE 26

Distributed Transactions Workload Skew Factor

+

Cost Model

slide-27
SLIDE 27

Skew-Aware Cost Model

  • Accentuates the properties that are important in a DB
  • Compute quickly
  • Estimate the cost of an incomplete design
  • The cost estimates must increase monotonically as more variables

are set

slide-28
SLIDE 28
  • Measure
  • How much workload executes as a single-partition transactions
  • How uniformly load is distributed across the cluster

Skew-Aware Cost Model

Tradeoff!

slide-29
SLIDE 29

Skew-Aware Cost Model

Total number of partitions accessed divided by total number of partitions could have been accessed, and scale it up.

Coordinator Cost

slide-30
SLIDE 30

Skew-Aware Cost Model

To avoid time varying skew, divide W into finite intervals

Skew Factor

slide-31
SLIDE 31

Incomplete Designs

  • Query that references a table with an unset attribute in a design as being unknown
  • For each unknown query:
  • Coordinator Cost: Assume that any unknown query is single-partitioned
  • Skew Factor: Assume that unknown queries execute on all partitions in the cluster
  • ‘Unknown’ change to ‘known’
  • ‘Known’ cannot change to ‘Unknown’

monotonically increase!

slide-32
SLIDE 32

Optimizations

  • Access Graphs
  • Workload Compression
slide-33
SLIDE 33

Vertex: Table Edge: tables are co-accessed Weight of edges: the number of times the queries forming the relationship Access Graph

slide-34
SLIDE 34

Optimizations

  • Access Graphs
  • Workload Compression
slide-35
SLIDE 35
  • combine sets of similar queries in individual transactions into fewer weighted records
  • combine similar transactions into a smaller number of weighted records in the same manner

Workload Compression

slide-36
SLIDE 36

TA TATP TP TPC TPC-C TPC TPC-C C Sk Skew ewed ed

(txn/s /s)

+88% +88% +16% +16% +183% +183%

Horticulture State-of-the-Art

Th Throu roughp ghput ut

10,000 10,000 20,000 20,000 30,000 30,000 40,000 40,000 50,000 50,000 60, 60,000 000 4 8 16 16 32 32 64 64 2, 2,000 000 4, 4,000 000 6, 6,000 000 8, 8,000 000 10,000 10,000 12,000 12,000 14, 14,000 000 4 8 16 16 32 32 64 64 10,000 10,000 20,000 20,000 30,000 30,000 40,000 40,000 50,000 50,000 60,000 60,000 70,000 70,000 80, 80,000 000 4 8 16 16 32 32 64 64

slide-37
SLIDE 37

TATP SEATS TPC-C TPC-C Skewed AuctionMark TPC-E

Se Searc arch Tim h Times es

% Single-Partitioned Transactions

slide-38
SLIDE 38

38

Andy: it works !