High Performance Transactions via Early Write Visibility



slide-1
SLIDE 1

High Performance Transactions via Early Write Visibility

Jose Faleiro Daniel Abadi Joseph Hellerstein

slide-2
SLIDE 2

Serializability is our gold standard

Developers focus on individual transaction correctness
System ensures correctness under concurrency
Developers can focus on application logic

slide-3
SLIDE 3

Elephant in the room:
Serializability is the exception
Weak isolation is the norm

slide-4
SLIDE 4

Serializability in practice…

… is snapshot isolation
… is not the default

slide-5
SLIDE 5

Non-modular applications: changing anything changes everything

slide-6
SLIDE 6

Non-modular applications: changing anything changes everything
Silent data corruption

slide-7
SLIDE 7

Silent data corruption
Non-modular applications: changing anything changes everything
Security bugs

slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10

“The hacker discovered that if you place several withdrawals all in practically the same instant, they will get processed at more or less the same time. This will result in a negative balance, but valid insertions into the database…”
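The anomaly in the quote can be reproduced without any database. The sketch below is only an illustration under assumed names (`withdraw`, `run_serial`, `run_interleaved` and the account layout are hypothetical, not taken from the talk): each withdrawal checks the balance, pauses, and then applies its update, so interleaved withdrawals all pass a check made against a stale balance.

```python
def withdraw(account, amount):
    """One withdrawal, written as a generator so a caller can pause it
    between the balance check and the update."""
    if account["balance"] < amount:      # check against the value read now
        return                           # insufficient funds: do nothing
    yield                                # other withdrawals may run here
    account["balance"] -= amount         # update based on a possibly stale check
    account["withdrawals"].append(amount)


def run_serial(account, amounts):
    """Run each withdrawal to completion before starting the next."""
    for amount in amounts:
        for _ in withdraw(account, amount):
            pass


def run_interleaved(account, amounts):
    """Let every withdrawal perform its check before any of them updates."""
    txns = [withdraw(account, amount) for amount in amounts]
    for txn in txns:
        next(txn, None)   # run up to the pause (or finish, if refused)
    for txn in txns:
        next(txn, None)   # resume and apply the update


serial = {"balance": 100, "withdrawals": []}
run_serial(serial, [80, 80])

racy = {"balance": 100, "withdrawals": []}
run_interleaved(racy, [80, 80])

print(serial["balance"], racy["balance"])
```

Run serially, the second withdrawal is refused; interleaved, both are recorded and the balance goes negative, which is the "negative balance, but valid insertions" outcome described in the quote.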

slide-11
SLIDE 11

(Second!) Elephant in the room:
Very little progress towards addressing the gap

slide-12
SLIDE 12

The real hurdle is the recoverability mechanism

slide-13
SLIDE 13

Recoverability + isolation

Strong isolation mechanisms have limited mileage
Due to:
    Recoverability mechanisms
    Isolation level specifications

slide-14
SLIDE 14

Recoverability + isolation

Limitation is independent of isolation level implementation

slide-15
SLIDE 15

Recoverability + isolation

Limitation is independent of isolation level implementation

Includes all modern concurrency control protocols based on 2PL, OCC, MVCC, Timestamp ordering

slide-16
SLIDE 16

This talk

State-of-the-art recoverability mechanisms fundamentally limit strong isolation levels
New recoverability mechanism based on deterministic execution

slide-17
SLIDE 17

Recoverability

Committed transactions must read committed data
Required of popular isolation levels:
    Read committed, snapshot isolation, repeatable read, serializable
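As a rough illustration of the rule "committed transactions must read committed data", the sketch below checks a history for recoverability. The history encoding and the name `is_recoverable` are assumptions made for this example; the check is deliberately simplified (it tracks only the most recent writer of each key and does not model undoing an aborted writer's updates).

```python
def is_recoverable(history):
    """Return True if every transaction commits only after all the
    transactions it read from have committed.

    history is a list of tuples:
      ("w", txn, key)  write      ("r", txn, key)  read
      ("c", txn)       commit     ("a", txn)       abort
    """
    last_writer = {}   # key -> transaction that most recently wrote it
    reads_from = {}    # reader -> set of writers it has read from
    committed = set()

    for op in history:
        kind, txn = op[0], op[1]
        if kind == "w":
            last_writer[op[2]] = txn
        elif kind == "r":
            writer = last_writer.get(op[2])
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif kind == "c":
            # Committing after reading from a not-yet-committed (or
            # aborted) writer violates recoverability.
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
        # Aborts ("a") need no bookkeeping here: an aborted writer simply
        # never enters `committed`.
    return True


ok_history = [("w", "T1", "x"), ("r", "T2", "x"), ("c", "T1"), ("c", "T2")]
bad_history = [("w", "T1", "x"), ("r", "T2", "x"), ("c", "T2"), ("c", "T1")]
print(is_recoverable(ok_history), is_recoverable(bad_history))
```

In the second history T2 commits before the transaction it read from, so a later abort of T1 would leave a committed transaction that observed data that never existed.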

slide-18
SLIDE 18

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Recoverability mechanisms
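For readers who want to run the example, here is a minimal Python rendering of the `Purchase` pseudocode. The in-memory `db` dictionary, the `Abort` exception class, and the function name `purchase` are assumptions made for this sketch; the slides do not specify a storage API.

```python
class Abort(Exception):
    """Raised when the transaction's own logic decides to abort."""


def purchase(db, item_id, cust_id):
    item = db["items"][item_id]
    if item["count"] == 0:
        # The only point at which the transaction logic itself can abort.
        raise Abort(f"item {item_id} is out of stock")
    item["count"] -= 1
    db["bills"].append((cust_id, item_id, item["price"]))
    db["history"].append((cust_id, item_id))


db = {
    "items": {7: {"count": 1, "price": 25}},
    "bills": [],
    "history": [],
}

purchase(db, item_id=7, cust_id=1)      # succeeds, stock drops to 0
try:
    purchase(db, item_id=7, cust_id=2)  # aborts before writing anything
    second_aborted = False
except Abort:
    second_aborted = True
```

Note that once the `count == 0` check has passed, nothing later in the function raises `Abort`; the rest of the talk builds on this observation.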

slide-19
SLIDE 19

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Recoverability mechanisms

slide-20
SLIDE 20

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Recoverability mechanisms

slide-21
SLIDE 21

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Recoverability mechanisms

slide-22
SLIDE 22

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()

Recoverability mechanisms

slide-23
SLIDE 23

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1

Recoverability mechanisms

slide-24
SLIDE 24

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()

Recoverability mechanisms

slide-25
SLIDE 25

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()

Recoverability mechanisms

slide-26
SLIDE 26

Recoverability mechanisms

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Strawmen:
    Wait until commit
        Limited throughput
    Expose writes immediately
        Cascaded rollbacks

slide-27
SLIDE 27

Recoverability mechanisms

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

State-of-the-art: Group commit
    Wait until end of execution, not commit
    Readers “share fate” with writers
    Durable write latency does not limit throughput
    Cascaded rollbacks restricted to failures

slide-28
SLIDE 28

State-of-the-art: Group commit

slide-29
SLIDE 29

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Writes made visible at the end of txn’s execution

slide-30
SLIDE 30

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Write visibility delay

slide-31
SLIDE 31

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Write visibility delay

Serializability: Conflicting txns must wait
Read Committed: Conflicting txns can read old values

slide-32
SLIDE 32

Serializable

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

slide-33
SLIDE 33

Read committed

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Item

Read

slide-34
SLIDE 34

Impact of delayed write visibility

slide-35
SLIDE 35

Impact of delayed write visibility

Workload:
    10 read-modify-write txns
    1 hot, 9 cold
    Vary position of hot update

slide-36
SLIDE 36

Impact of delayed write visibility

Cold update
Cold update
. . .
Cold update
Cold update
Hot update

Workload:
    10 read-modify-write txns
    1 hot, 9 cold
    Vary position of hot update

slide-37
SLIDE 37

Impact of delayed write visibility

Cold update
Cold update
. . .
Cold update
Cold update
Hot update

Workload:
    10 read-modify-write txns
    1 hot, 9 cold
    Vary position of hot update

Write visibility delay: 0

slide-38
SLIDE 38

Impact of delayed write visibility

Cold update
Cold update
. . .
Hot update
Cold update
Cold update

Workload:
    10 read-modify-write txns
    1 hot, 9 cold
    Vary position of hot update

Write visibility delay: 2

slide-39
SLIDE 39

Impact of delayed write visibility

Hot update
Cold update
Cold update
. . .
Cold update
Cold update

Workload:
    10 read-modify-write txns
    1 hot, 9 cold
    Vary position of hot update

Write visibility delay: 9
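The three layouts above suggest a simple reading of "write visibility delay": the number of updates that still have to execute after the hot update before the transaction finishes. The sketch below encodes that reading; it assumes each transaction performs ten updates (one hot, nine cold), and the helper names `make_txn` and `write_visibility_delay` are invented for illustration.

```python
def make_txn(hot_position, n_updates=10):
    """Build a transaction as a list of updates with one hot update
    at index hot_position and cold updates everywhere else."""
    updates = ["cold"] * n_updates
    updates[hot_position] = "hot"
    return updates


def write_visibility_delay(updates):
    """Count the updates that run after the hot update. With writes made
    visible only at the end of execution, this is how long the hot
    record's new value stays hidden from conflicting transactions."""
    return len(updates) - 1 - updates.index("hot")


hot_last = write_visibility_delay(make_txn(9))    # hot update is the final one
hot_third_last = write_visibility_delay(make_txn(7))
hot_first = write_visibility_delay(make_txn(0))
print(hot_last, hot_third_last, hot_first)
```

Under this reading, the three layouts shown on the slides correspond to delays of 0, 2, and 9.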

slide-40
SLIDE 40

Impact of delayed write visibility

[Plot: throughput (txns/sec, 0 K to 400 K) vs. write visibility delay (1 to 9); series: Read Committed, Serializable]

slide-41
SLIDE 41

Impact of delayed write visibility

30% drop in Read Committed vs. 3x drop in Serializable

slide-42
SLIDE 42

Metaphor credit: Bill Thies

Evolution of transaction processing

Concurrency control Recoverability

slide-43
SLIDE 43

Metaphor credit: Bill Thies

Evolution of transaction processing

Concurrency control Recoverability

slide-44
SLIDE 44

Why delayed write visibility?

Database systems have the flexibility to arbitrarily abort transactions

slide-45
SLIDE 45

Why delayed write visibility?

Abort statements
Constraint violations
Deadlocks
Failures
Validation errors
Resource constraints

slide-46
SLIDE 46

Why delayed write visibility?

State induced:
    Abort statements
    Constraint violations
System induced:
    Deadlocks
    Failures
    Validation errors
    Resource constraints

slide-47
SLIDE 47

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Write visibility delay

slide-48
SLIDE 48

Why delayed write visibility?

State induced:
    Abort statements
    Constraint violations
System induced:
    Deadlocks
    Failures
    Validation errors
    Resource constraints

slide-49
SLIDE 49

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Early write visibility

slide-50
SLIDE 50

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Item’s count update is visible here

Early write visibility

slide-51
SLIDE 51

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Early write visibility

slide-52
SLIDE 52

Challenges

Delayed write visibility for serializability
    Long duration write locks (e.g., 2PL)
    Write buffering (e.g., OCC, MVCC, T/O)

Intrinsic system induced aborts
    Dynamic locking can deadlock
    Abort on serialization errors (OCC, MVCC, T/O)

slide-53
SLIDE 53

Early write visibility is incompatible with existing isolation mechanisms

slide-54
SLIDE 54

Deterministic execution

slide-55
SLIDE 55

Deterministic execution

T0 T1 T2 T3 T4

slide-56
SLIDE 56

Deterministic execution

T0 T1 T2 T3 T4
T0 T1 T2 T4 T3

Determine legal schedule

slide-57
SLIDE 57

Deterministic execution

T0 T1 T2 T3 T4
T0 T1 T2 T4 T3

Determine legal schedule
Execute

slide-58
SLIDE 58

Deterministic execution

T0 T1 T2 T3 T4
T0 T1 T2 T4 T3

Determine legal schedule
Execute

Serializable “by definition”
No system induced aborts
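The two-phase idea (determine a legal schedule, then execute it) can be sketched as follows. This is a toy planner under stated assumptions, not the talk's system: each transaction declares its read and write sets up front, the batch arrives in an agreed order, and the names `conflicts` and `plan_waves` are hypothetical. A transaction becomes runnable once every earlier conflicting transaction has run, so no deadlock or validation abort can arise.

```python
def conflicts(a, b):
    """Two transactions conflict if one writes something the other
    reads or writes."""
    return bool(
        a["writes"] & (b["reads"] | b["writes"]) or b["writes"] & a["reads"]
    )


def plan_waves(batch):
    """Group an ordered batch into waves. Transactions in the same wave
    do not wait on each other; each wave runs after the previous one."""
    # A transaction depends only on EARLIER conflicting transactions,
    # so the dependency graph is acyclic by construction.
    deps = {
        i: {j for j in range(i) if conflicts(batch[j], batch[i])}
        for i in range(len(batch))
    }
    done, waves = set(), []
    while len(done) < len(batch):
        ready = [i for i in deps if i not in done and deps[i] <= done]
        waves.append([batch[i]["name"] for i in ready])
        done.update(ready)
    return waves


batch = [
    {"name": "T0", "reads": set(), "writes": {"a"}},
    {"name": "T1", "reads": {"a"}, "writes": {"b"}},
    {"name": "T2", "reads": {"c"}, "writes": {"c"}},
    {"name": "T3", "reads": {"b"}, "writes": {"b"}},
    {"name": "T4", "reads": {"c"}, "writes": {"d"}},
]
waves = plan_waves(batch)
print(waves)
```

Flattening the waves gives one legal execution order that is equivalent to running the batch in its input order; the read and write sets here are made up for the example and are not the ones behind the schedule shown on the slide.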

slide-59
SLIDE 59

Piece wise visibility

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Item Bills Hist

slide-60
SLIDE 60

Piece wise visibility

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Item Bills Hist

Data dependencies

slide-61
SLIDE 61

Piece wise visibility

Purchase(item_id, cust_id):
    item = items_tbl[item_id]
    if item.count == 0:
        Abort()
    item.count -= 1
    bills.insert(cust_id, item_id, item.price)
    history.insert(cust_id, item_id)

Pieces: Item (abortable), Bills (non-abortable), Hist (non-abortable)

Edges: data dependencies, commit dependencies

slide-62
SLIDE 62

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

slide-63
SLIDE 63

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

RVP

RVPs (rendezvous points) implement a lightweight commit protocol

slide-64
SLIDE 64

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

C=2

RVPs (rendezvous points) implement a lightweight commit protocol

slide-65
SLIDE 65

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

C=1

RVPs (rendezvous points) implement a lightweight commit protocol

slide-66
SLIDE 66

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

C=0

RVPs (rendezvous points) implement a lightweight commit protocol

slide-67
SLIDE 67

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

C=-1

RVPs (rendezvous points) implement a lightweight commit protocol

slide-68
SLIDE 68

Piece wise visibility

Item1 Bills Hist

Data dependencies Commit dependencies

Item0

RVP

RVPs (rendezvous points) implement a lightweight commit protocol
Abortable writes visible after commit decision
Non-abortable writes visible after they execute
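The two visibility rules can be illustrated with a small counter-based sketch. Everything here is an assumption for illustration: the class name `PieceWiseTxn`, the dictionary `store`, and the choice to count outstanding abortable pieces down to zero. It does not reproduce the exact counter encoding shown on the slides (C=2 down to C=-1), and it models a commit dependency simply by refusing to run a non-abortable piece before the commit decision.

```python
class PieceWiseTxn:
    """One transaction split into abortable and non-abortable pieces."""

    def __init__(self, store, n_abortable):
        self.store = store              # globally visible state
        self.remaining = n_abortable    # abortable pieces still undecided
        self.aborted = False
        self.buffered = {}              # abortable writes, not yet visible

    @property
    def commit_decided(self):
        return not self.aborted and self.remaining == 0

    def run_abortable(self, writes, ok):
        """Run a piece that may still abort the transaction."""
        if self.aborted:
            return
        if not ok:                      # e.g. item.count == 0
            self.aborted = True
            self.buffered.clear()       # nothing was ever exposed
            return
        self.buffered.update(writes)
        self.remaining -= 1
        if self.remaining == 0:
            # Rendezvous reached: the transaction can no longer abort,
            # so the abortable pieces' writes become visible now.
            self.store.update(self.buffered)
            self.buffered.clear()

    def run_non_abortable(self, writes):
        """Run a piece that cannot abort; its writes are visible as soon
        as it executes, but only after the commit decision."""
        if not self.commit_decided:
            raise RuntimeError("commit decision not reached yet")
        self.store.update(writes)


store = {}
txn = PieceWiseTxn(store, n_abortable=2)
txn.run_abortable({"item0.count": 4}, ok=True)
visible_after_first = dict(store)       # still empty: one piece undecided
txn.run_abortable({"item1.count": 0}, ok=True)
txn.run_non_abortable({"bills": [("cust", "item0")]})
```

The point of the design is that later pieces of a long transaction no longer hold back the visibility of earlier writes once the transaction is certain to commit.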

slide-69
SLIDE 69

Executing pieces

T0 T1 T2 T3 T4
T0 T1 T2 T4 T3

Determine legal schedule
Execute

Replace transactions with pieces
Use Piece Wise Visibility rules

slide-70
SLIDE 70

Conflict information

PWV can use coarse-grained conflict information

E.g., partitions, foreign keys

Conventional wisdom: Coarse-grained conflict info significantly constrains concurrency
PWV isolates at finer granularity: pieces, not transactions
Significantly reduces blocking due to conflicts

slide-71
SLIDE 71

Isn’t this transaction chopping?

Chopping
[Diagram: graph with S and C edges]
No conflicts on more than one piece

slide-72
SLIDE 72

Isn’t this transaction chopping?

Chopping: [Diagram: graph with S and C edges]

PWV: [Diagram: T0 T1 T2 T3 T4 scheduled as T0 T1 T2 T4 T3]

slide-73
SLIDE 73

TPC-C: NewOrder + Payment

[Plot, 1 warehouse: throughput (txns/sec, 0.0 M to 1.0 M) vs. number of CPU cores (4 to 40); series: RC, OCC, Locking]

slide-74
SLIDE 74

TPC-C: NewOrder + Payment

[Plot, 1 warehouse: throughput (txns/sec, 0.0 M to 1.0 M) vs. number of CPU cores (4 to 40); series: PWV, RC, OCC, Locking]

slide-75
SLIDE 75

Conclusions

Serializability is fundamentally limited by recoverability mechanisms
Recoverability limited by DB flexibility to arbitrarily abort txns
Take this flexibility away!
Writes can be made visible early without the risk of rolling back
Piece wise visibility achieves this vision using deterministic execution

slide-76
SLIDE 76

Conclusions

Serializability is fundamentally limited by recoverability mechanisms
Recoverability limited by DB flexibility to arbitrarily abort txns
Take this flexibility away!
Writes can be made visible early without the risk of rolling back
Piece wise visibility achieves this vision using deterministic execution

http://jmfaleiro.com
jose.faleiro@yale.edu

slide-77
SLIDE 77

Scalability

[Plot, # warehouses = # cores: throughput (txns/sec, 0.0 M to 2.0 M) vs. number of CPU cores (4 to 40); series: RC, OCC, Locking]

slide-78
SLIDE 78

Scalability

[Plot, # warehouses = # cores: throughput (txns/sec, 0.0 M to 2.0 M) vs. number of CPU cores (4 to 40); series: PWV, RC, OCC, Locking]

slide-79
SLIDE 79

Effect of aborts

[Plot, low contention: throughput (txns/sec, 0.0 M to 1.0 M) vs. commit point (2 to 20); series: PWV, RC, OCC, Locking]

slide-80
SLIDE 80

Effect of aborts

[Plot, high contention: throughput (txns/sec, 0.0 M to 1.0 M) vs. commit point (2 to 20); series: PWV, RC, OCC, Locking]

slide-81
SLIDE 81

Txn chopping

[Plot, 1 warehouse: throughput (txns/sec, 0.0 M to 1.0 M) vs. number of CPU cores (4 to 40); series: PWV, RC, OCC, Locking, Chopping]

slide-82
SLIDE 82

Txn chopping

[Plot, 1 warehouse: throughput (txns/sec, 0.0 M to 1.0 M) vs. number of CPU cores (4 to 40); series: PWV, RC, OCC, Locking, Chopping, Chopping-comm]

slide-83
SLIDE 83

Related work

Exploit semantics beyond r/w conflicts:

Sagas, escrow, logical undo, and more recent variants (e.g., Doppel)

Build on existing concurrency control protocols:

Transaction chopping and recent incarnations (IC3, Salt, Runtime Pipelining, DGCC)
Restricted piece-wise interleavings at runtime