Low Overhead Concurrency Control for Partitioned Main Memory - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Low Overhead Concurrency Control for Partitioned Main Memory Databases


Evan P. C. Jones Daniel J. Abadi Samuel Madden

slide-2
SLIDE 2

Banks

  • Payment Processing
  • Airline Reservations
  • E-Commerce
  • Web 2.0
slide-3
SLIDE 3

Problem:

  • Millions of transactions per second
slide-5
SLIDE 5

Problem:

  • Millions of transactions per second = $$$$
slide-6
SLIDE 6

Alternative: H-Store Project

  • Redesign specifically for OLTP
  • Prototype: ~10X throughput
  • Idea: Remove unneeded features

Source: Stonebraker et al., “The End of an Architectural Era”, VLDB 2007.

slide-7
SLIDE 7

H-Store: High Throughput OLTP

  • Redesign DB specifically for OLTP
  • Prototype: ~10X throughput
  • Main memory database
  • Concurrency control consumes ~30-40% of CPU time

slide-8
SLIDE 8

CPU Cycle Breakdown for Shore on TPC-C New Order

[Figure: bar chart of CPU cycles by component; the axis and category labels did not survive extraction. Callout: concurrency control accounts for roughly 30-40% of the cycles.]

Source: Harizopoulos, Abadi, Madden and Stonebraker, “OLTP Through the Looking Glass, and What We Found There”, SIGMOD 2008

slide-10
SLIDE 10

Speculative Concurrency Control

  • Eliminate fine-grained access tracking (locks or read/write sets)
  • Eliminate undo logs (where possible)
  • Up to 2X faster than locking for appropriate workloads

slide-11
SLIDE 11

Why Support Concurrency?

Traditional reasons for concurrency, and the H-Store answer to each:

  • Use idle resources (disk stalls, user stalls): main memory, stored procedures
  • Physical resources (multiple CPUs, multiple disks): partition per core
  • Long-running txns: don't do them

slide-12
SLIDE 12

H-Store: Single thread engine

Assumptions:

  • Database divided into partitions
  • Transactions access one partition (mostly)
  • Mapping procedures to partitions is given
  • Total data fits in memory of N machines
  • Partitions are replicated on 2 machines
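The assumptions above can be illustrated with a minimal sketch. It is not H-Store code: the `Partition` class, the modulo-based `partition_for` mapping, and the `deposit` procedure are hypothetical names chosen for illustration. The point is only that when each partition is owned by one thread and a procedure touches one partition, procedures can run one after another with no locks.

```python
from collections import deque


class Partition:
    """One partition: a private key-value store plus a FIFO of pending work."""

    def __init__(self, pid):
        self.pid = pid
        self.data = {}
        self.queue = deque()

    def submit(self, procedure, *args):
        """Queue a stored procedure call for this partition."""
        self.queue.append((procedure, args))

    def run_all(self):
        """Run queued procedures one at a time, in arrival order.

        No locks are taken: nothing else touches self.data while a
        procedure runs, because the partition is single-threaded.
        """
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.data, *args))
        return results


def partition_for(key, num_partitions):
    """Hypothetical mapping; the slides only say the mapping is given."""
    return key % num_partitions


def deposit(data, key, amount):
    """Example single-partition stored procedure (an assumption)."""
    data[key] = data.get(key, 0) + amount
    return data[key]


partitions = [Partition(i) for i in range(2)]
for key, amount in [(1, 10), (2, 5), (1, 7)]:
    partitions[partition_for(key, 2)].submit(deposit, key, amount)
results = [p.run_all() for p in partitions]
```

Here the two deposits to key 1 land on the same partition and are applied in order, while the deposit to key 2 runs independently on the other partition.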

slide-13
SLIDE 13

System Overview

[Diagram: clients, each with a client library, connect to H-Store, which consists of a coordinator plus Partition 1 and Partition 2, each with a primary and a backup.]

slide-14
SLIDE 14

Single Partition Transaction

[Sequence diagram, built up over slides 14-19: a client, the partition's primary, and its backup exchange messages numbered 1 to 4, with two "execute" steps.]

slide-20
SLIDE 20

Single Partition Transaction

[Diagram, built up over slides 20-24: the system overview (clients with client libraries, a coordinator, and Partitions 1 and 2, each with a primary and a backup) annotated with messages numbered 1 to 4.]

slide-31
SLIDE 31

Not Perfectly Partitionable?

  • Example: users and groups
  • Many applications are mostly partitionable
  • e.g. TPC-C: 11% multi-partition transactions

slide-32
SLIDE 32

Distributed Transactions

  • Need two-phase commit (consensus)
  • Simple solution: block until the transaction finishes
  • Introduces network stall (bad)
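The blocking approach can be sketched roughly as follows. This is a simplified, in-process illustration, not the protocol from the talk: the `Participant` class, its `blocked` flag, and the "no negative values" voting rule are assumptions made so the example has something to vote on. The `blocked` flag marks the window in which a real partition would sit idle waiting on the network.

```python
class Participant:
    """A partition taking part in two-phase commit (illustrative only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.pending = None    # buffered writes of the in-flight transaction
        self.blocked = False   # True while waiting for the decision

    def prepare(self, writes):
        """Phase 1: validate and buffer the writes, then vote."""
        if any(value < 0 for value in writes.values()):  # hypothetical rule
            return False
        self.pending = writes
        self.blocked = True    # blocking scheme: no other work until decided
        return True

    def finish(self, commit):
        """Phase 2: apply or discard the buffered writes, then unblock."""
        if commit and self.pending is not None:
            self.data.update(self.pending)
        self.pending = None
        self.blocked = False


def two_phase_commit(fragments):
    """fragments is a list of (participant, writes) pairs."""
    votes = [participant.prepare(writes) for participant, writes in fragments]
    decision = all(votes)
    for participant, _ in fragments:
        participant.finish(decision)
    return decision


p1 = Participant({"x": 1})
p2 = Participant({"y": 2})
committed = two_phase_commit([(p1, {"x": 5}), (p2, {"y": 6})])
aborted = two_phase_commit([(p1, {"x": 9}), (p2, {"y": -1})])
```

In the second call one participant votes no, so neither partition applies its writes.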

slide-33
SLIDE 33

Blocking Multi-Partition

[Diagram, built up over slides 33-35: the system overview annotated with a message numbered 1 and then two messages numbered 2.]

slide-36
SLIDE 36

Blocking Multi-Partition

[Sequence diagram, built up over slides 36-44: a client, the coordinator, the P1 primary, and the P1 backup exchange messages numbered 1 to 6 (two are numbered 6), with two "execute" steps. The final frame marks a "network stall".]

slide-45
SLIDE 45

Blocking Multi-Partition

[Diagram, built up over slides 45-49: the system overview annotated with messages numbered 1 to 6 between the coordinator and both partitions.]

slide-50
SLIDE 50

Two-Phase Locking

+ Execute non-conflicting txns during stall
+ No need to order in advance
– Locking overhead
– Deadlocks

Optimization: turn off locks and undo logging when no multi-partition transactions are active
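To make the first point concrete, here is a minimal lock-table sketch. It is an assumption-laden illustration rather than the system's lock manager: `LockTable`, the "S"/"X" mode names, and the transaction ids are hypothetical, and waiting and deadlock detection are left out. A transaction whose keys do not conflict with a stalled multi-partition transaction gets its locks and can run; a conflicting one is refused.

```python
class LockTable:
    """Shared/exclusive locks per key; all locks released at transaction end."""

    def __init__(self):
        self.locks = {}   # key -> (mode, set of holder transaction ids)

    def acquire(self, txn, key, exclusive):
        """Return True if the lock is granted, False on a conflict."""
        mode, holders = self.locks.get(key, (None, set()))
        if not holders or holders == {txn}:
            # Free, or held only by this transaction (possible upgrade).
            new_mode = "X" if exclusive or mode == "X" else "S"
            self.locks[key] = (new_mode, {txn})
            return True
        if mode == "S" and not exclusive:
            holders.add(txn)   # shared locks are compatible
            return True
        return False           # caller would have to wait here

    def release_all(self, txn):
        """Release every lock held by txn (done at commit or abort)."""
        for key in list(self.locks):
            _, holders = self.locks[key]
            holders.discard(txn)
            if not holders:
                del self.locks[key]


table = LockTable()
mp_gets_x = table.acquire("mp", "x", exclusive=True)       # stalled txn holds x
sp1_blocked = table.acquire("sp1", "x", exclusive=False)   # conflicts with mp
sp1_gets_y = table.acquire("sp1", "y", exclusive=False)    # no conflict
sp2_shares_y = table.acquire("sp2", "y", exclusive=False)  # shared with sp1
sp2_upgrade = table.acquire("sp2", "y", exclusive=True)    # sp1 also holds y
table.release_all("mp")
sp1_after_release = table.acquire("sp1", "x", exclusive=False)
```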

slide-51
SLIDE 51

Speculative CC

While waiting for commit/abort, speculatively execute other transactions

+ No locks; no read/write sets
– Need global transaction order
– Cascading aborts
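A rough sketch of the idea, under stated assumptions: the `SpeculativePartition` class, the `write` callback, and the choice to re-execute speculated transactions immediately after an abort are illustrative and not taken from the talk. Each transaction records an undo entry per write. If the oldest (multi-partition) transaction aborts, everything executed after it is undone as well, newest first, which is the cascading abort listed above.

```python
_MISSING = object()   # marks "key did not exist before the write"


class SpeculativePartition:
    def __init__(self, data=None):
        self.data = dict(data or {})
        self.uncommitted = []   # (name, fn, undo) tuples, oldest first

    def execute(self, name, fn):
        """Run fn and keep an undo record for every write it makes."""
        undo = []

        def write(key, value):
            undo.append((key, self.data.get(key, _MISSING)))
            self.data[key] = value

        fn(self.data, write)
        self.uncommitted.append((name, fn, undo))

    def resolve(self, commit):
        """Apply the decision for the oldest uncommitted transaction.

        Returns the names of the transactions whose results can be released.
        """
        speculated = self.uncommitted[1:]
        if not commit:
            # Cascading abort: undo everything, newest write first.
            for _, _, undo in reversed(self.uncommitted):
                for key, old in reversed(undo):
                    if old is _MISSING:
                        self.data.pop(key, None)
                    else:
                        self.data[key] = old
            self.uncommitted = []
            # Re-run the speculated transactions against the restored state.
            for name, fn, _ in speculated:
                self.execute(name, fn)
        done = [name for name, _, _ in self.uncommitted]
        self.uncommitted = []
        return done


def add_five(data, write):
    write("x", data["x"] + 5)


def double(data, write):
    write("x", data["x"] * 2)


part = SpeculativePartition({"x": 10})
part.execute("mp", add_five)      # multi-partition txn, awaiting decision
part.execute("sp", double)        # speculated on top of mp's write
speculative_value = part.data["x"]
released_after_abort = part.resolve(commit=False)

other = SpeculativePartition({"x": 10})
other.execute("mp", add_five)
other.execute("sp", double)
released_after_commit = other.resolve(commit=True)
```

After the abort, `sp` has been re-run against the original value of `x`, so the effect of `mp` is gone. In the commit case both results are released together without any re-execution.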

slide-52
SLIDE 52

Speculative Multi-Partition

[Sequence diagram, built up over slides 52-57: a client, the coordinator, the P1 primary, and the P1 backup exchange messages numbered 1 to 6 (two are numbered 6). The second "execute" step is added before the two messages numbered 6.]

slide-58
SLIDE 58

Speculation Limitation

Transactions with multiple “rounds” of work: need network stall

Example:

  1. Read x on partition 1, y on partition 2
  2. Update x = f(x, y); y = f(x, y)
slide-59
SLIDE 59

Speculative Multi-Partition

[Diagram, built up over slides 59-66: the system overview annotated with messages numbered 1 to 6 between the coordinator and both partitions.]

slide-67
SLIDE 67

Microbenchmark

[Diagram: a client load generator and a coordinator in front of Partition 1 and Partition 2, each with a primary and a backup.]

Two partitions of a single table:

(id INTEGER PRIMARY KEY, value INTEGER)

slide-68
SLIDE 68

Microbenchmark

  • Single partition transaction: read/write keys on one partition
  • Multi-partition transaction: access half of the keys from each partition
  • Single partition work = multi-partition work
  • No deadlocks, no aborts, no conflicts
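A load generator for a workload of this shape might look like the sketch below. The function name `make_transaction` and the parameter values (12 keys per transaction, 1000 keys per partition) are assumptions for illustration; the slides do not state them. What the sketch preserves is that both transaction types touch the same number of keys, and that a multi-partition transaction splits its keys evenly across the partitions. It does not model the "no conflicts" property, which would need each client to work on its own keys.

```python
import random


def make_transaction(rng, multi_fraction, keys_per_txn=12,
                     num_partitions=2, keys_per_partition=1000):
    """Return the (partition, key) pairs one transaction reads and writes."""
    if rng.random() < multi_fraction:
        # Multi-partition: the same total work, split evenly.
        share = keys_per_txn // num_partitions
        return [(p, rng.randrange(keys_per_partition))
                for p in range(num_partitions)
                for _ in range(share)]
    # Single-partition: all keys on one randomly chosen partition.
    p = rng.randrange(num_partitions)
    return [(p, rng.randrange(keys_per_partition))
            for _ in range(keys_per_txn)]


rng = random.Random(0)
multi = make_transaction(rng, multi_fraction=1.0)
single = make_transaction(rng, multi_fraction=0.0)
```

Sweeping `multi_fraction` from 0.0 to 1.0 gives the range on the "% Multi-Partition" axis used in the results.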

slide-69
SLIDE 69

[Chart: throughput (transactions/s, up to 30000) versus % multi-partition transactions (0 to 100). Series: Locking, Speculation, Blocking. Annotation: "Locking 40% Conflicts".]

slide-70
SLIDE 70

[Chart: transactions/second (up to 30000) versus multi-partition transactions (0% to 100%). Series: Speculation, Locking, Blocking.]

slide-71
SLIDE 71

TPC-C Based

  • ~11% multi-partition transactions
  • More complex locking
  • Many conflicts
  • Some deadlocks
  • Some aborts

slide-72
SLIDE 72

[Chart: throughput (transactions/s, up to 30000) versus warehouses (2 to 20). Series: Locking, Speculation, Blocking.]

slide-73
SLIDE 73

[Chart: transactions/second (up to 25000) versus warehouses (2 to 20). Series: Speculation, Blocking, Locking.]

slide-74
SLIDE 74

Speculative CC

better for “mostly partitionable” apps on main memory DBs

  • Up to 2X throughput
  • No locking overhead
  • No deadlocks

slide-75
SLIDE 75

[Chart: transactions/second (up to 30000) versus multi-partition transactions (0% to 100%). Series: Speculation with 0%, 1%, 3%, 5%, and 10% aborts; Blocking with 10% aborts; Locking with 10% aborts.]

slide-76
SLIDE 76

[Chart: transactions/second (up to 30000) versus multi-partition transactions (0% to 100%). Series: Locking with 0%, 20%, 60%, and 100% conflict; Speculation; Blocking.]

slide-77
SLIDE 77

[Chart: transactions/second (up to 30000) versus multi-partition transactions (0% to 100%). Series: Speculation, Blocking, Locking.]

slide-78
SLIDE 78

[Chart: transactions/second (up to 35000) versus multi-partition transactions (0% to 100%). Series: Model Blocking, Model Speculate, Model Locking, Measured Blocking, Measured Speculate, Measured Locking.]

slide-79
SLIDE 79

[Chart: transactions/second (up to 35000) versus multi-partition transactions (0% to 100%). Series: Speculation, Blocking, Locking.]

slide-80
SLIDE 80

[Chart: transactions/second (up to 30000) versus multi-partition transactions (0% to 100%). Series: Speculation, Local Speculation, Blocking, Locking, Optimistic.]

slide-81
SLIDE 81

[Chart: transactions/second (up to 25000) versus warehouses (2 to 20). Series: Speculation, Blocking, Locking, Optimistic.]