SLIDE 1

Granola: Low-Overhead Distributed Transaction Coordination

James Cowling and Barbara Liskov, MIT CSAIL

SLIDE 2

Granola

An infrastructure for building distributed storage applications. A distributed transaction coordination protocol that provides strong consistency without locking.

SLIDE 3

Diagram: distributed storage, with Clients 1 and 2 issuing requests to Repositories 1, 2, and 3.

SLIDE 4

Why Transactions?

Atomic operations allow users and developers to ignore concurrency. Distributed atomic operations allow data to span multiple repositories:

  • avoids inconsistency between repositories

SLIDE 5

Distributed transactions are hard

Tension between consistency and performance

SLIDE 6

Opting for Consistency

Two-phase commit with strict two-phase locking

Distributed databases, Sinfonia, etc.

SLIDE 7

Opting for Consistency

Two-phase commit with strict two-phase locking

Distributed databases, Sinfonia, etc.

High transaction cost: multiple message delays, forced log writes, locking/logging overhead (≈30-40%*)

* OLTP Through the Looking Glass, and What We Found There. Harizopoulos et al., SIGMOD '08

SLIDE 8

Opting for Performance

No distributed transactions

SimpleDB, Bigtable, CRAQ, etc.

Weak consistency models

Dynamo, Cassandra, MongoDB, HBase, PNUTS, etc.

SLIDE 9

Opting for Performance

No distributed transactions

SimpleDB, Bigtable, CRAQ, etc.

Weak consistency models

Dynamo, Cassandra, MongoDB, HBase, PNUTS, etc.

Place the burden of consistency on the application developer

SLIDE 10

Where we come in…

Strong Consistency and High Performance (for a large class of transactions)

SLIDE 11

Introduce a new transaction class which lets us provide consistency without locking

SLIDE 12

MOTIVATION · TRANSACTION MODEL · PROTOCOL · EVALUATION

SLIDE 13

One-Round Transactions

Expressed in one round of communication between clients and repositories. Execute to completion at each participant.

SLIDE 14

General Operations

Transactions are uninterpreted by Granola, and can execute arbitrary operations

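As a rough sketch of this model (hypothetical names, not the actual Granola API): the client library ships an opaque operation to the participant repositories in a single round, and only the server application interprets it.

    import java.util.List;

    // Hypothetical sketch of one-round, uninterpreted operations:
    // Granola never parses `op`; only the server application does.
    interface ServerApplication {
        // Runs the operation to completion, with no client round-trips.
        byte[] run(byte[] op);
    }

    interface GranolaClientLibrary {
        // One round: send the opaque operation to the participant
        // repositories and receive the result.
        byte[] invoke(byte[] op, List<String> participantRepositories);
    }
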
SLIDE 15

Transaction Classes

Single-Repository

  • execute entirely on one repository

Distributed Transactions

  • Coordinated
  • Independent

SLIDE 16

Coordinated Transactions

Commit only if all participants vote to commit. Example:

  • Transfer $50 between accounts

SLIDE 17

Independent Transactions

Transactions where all participants will make the same commit/abort decision. Examples:

  • Add 1% interest to each bank balance.
  • Compute the total amount of money in the bank.

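A minimal sketch of why such a transaction needs no locking (hypothetical names and schema): each participant computes the same deterministic decision, here "always commit", from its own state.

    import java.util.Map;

    // Hypothetical independent transaction: add 1% interest to every
    // balance held at this participant. The commit decision is a
    // deterministic function of local state (always commit), so all
    // participants agree without coordinating on the outcome.
    class AddInterest {
        void run(Map<String, Long> balancesInCents) {
            balancesInCents.replaceAll((account, cents) -> cents + cents / 100);
        }
    }
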
SLIDE 18

Independent Transactions

Evidence these are common in OLTP workloads:

  • Any read-only distributed transaction
  • Transactions that always commit
  • Atomically update replicated data
  • Where the commit decision is a deterministic function of shared state

SLIDE 19

Example: TPC-C

The TPC-C benchmark can be expressed entirely using single-repository and independent transactions

e.g., the new_order transaction only aborts if given an invalid item number, which can be computed locally if we replicate the Item table

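A sketch of that local abort check (hypothetical names; the replicated Item table is from the slide): each participant evaluates new_order's only abort condition deterministically, so all participants reach the same decision.

    import java.util.Set;

    // Hypothetical vote logic treating new_order as an independent
    // transaction: commit iff every ordered item number exists in the
    // locally replicated Item table.
    class NewOrderCheck {
        static boolean vote(int[] itemIds, Set<Integer> replicatedItemTable) {
            for (int id : itemIds) {
                if (!replicatedItemTable.contains(id)) {
                    return false; // invalid item: every participant aborts
                }
            }
            return true; // every participant commits
        }
    }
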
SLIDE 20

MOTIVATION · TRANSACTION MODEL · PROTOCOL · EVALUATION

SLIDE 21

Architecture diagram: the Client Application invokes a transaction through the Granola Client Library; the client and repository libraries handle coordination; the Granola Repository Library runs the transaction against the Server Application and returns the result to the client.

SLIDE 22

Replication

Diagram: each repository is implemented as a replicated group, with one primary and multiple backups.

SLIDE 23

Repository Modes

Primarily in Timestamp Mode

  • Single-repository, Independent

Occasionally in Locking Mode

  • When coordinated transactions are required

SLIDE 24

Timestamps

Each transaction is assigned a timestamp. Each repository executes transactions in timestamp order. Timestamps define a global serial order.

SLIDE 25

Key Questions

How do we assign timestamps in a scalable, fault-tolerant way? How do we make sure we always execute in timestamp order?

SLIDE 26

Single-Repository Protocol

Clients present the highest timestamp they have observed. The repository chooses a timestamp higher than the client timestamp, any previous transaction, and its clock value. The repository executes in timestamp order, then sends the response and timestamp to the client.

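A minimal sketch of this assignment rule (hypothetical field names, not the authors' code):

    import java.time.Clock;

    class RepositoryTimestamps {
        private final Clock clock = Clock.systemUTC();
        private long lastAssigned = 0;

        // Choose a timestamp above the client's highest observed
        // timestamp, every previously assigned timestamp, and the
        // local clock value.
        synchronized long assign(long clientTimestamp) {
            long ts = Math.max(clientTimestamp,
                               Math.max(lastAssigned, clock.millis())) + 1;
            lastAssigned = ts;
            return ts; // the transaction then runs in timestamp order
        }
    }
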
SLIDE 27

Pipeline: Assign timestamp -> Log transaction -> Run.

Assign timestamp: choose a timestamp higher than previous transactions.

SLIDE 28

Pipeline: Assign timestamp -> Log transaction -> Run.

Assign timestamp: choose a timestamp higher than previous transactions. Log transaction: run the replication protocol to record the timestamp.

SLIDE 29

Pipeline: Assign timestamp -> Log transaction -> Run.

Assign timestamp: choose a timestamp higher than previous transactions. Log transaction: run the replication protocol to record the timestamp. Run: execute in timestamp order and send the result and timestamp to the client.

SLIDE 30

Independent Protocol

Clients present the highest timestamp they have observed. Each repository chooses a proposed timestamp higher than its clock value and previous transactions. The repositories vote to determine the highest timestamp. Each repository executes in timestamp order, then sends the timestamp to the client.

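A sketch of the voting step (hypothetical names; replicate and broadcastVote stand in for the replication protocol and participant messaging, and RepositoryTimestamps is the sketch from the single-repository protocol above):

    import java.util.Collection;
    import java.util.Collections;

    class IndependentVoting {
        // Each participant proposes, logs, and broadcasts a timestamp.
        long propose(long clientTimestamp, RepositoryTimestamps ts) {
            long proposal = ts.assign(clientTimestamp);
            replicate(proposal);     // record via the replication protocol
            broadcastVote(proposal); // send to the other participants
            return proposal;
        }

        // The final timestamp is the highest proposal among all votes,
        // so every participant converges on the same serial position.
        long finalTimestamp(Collection<Long> votes) {
            return Collections.max(votes);
        }

        private void replicate(long proposal) { /* hypothetical */ }
        private void broadcastVote(long proposal) { /* hypothetical */ }
    }
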
SLIDE 31

Pipeline: Propose timestamp -> Log transaction -> Vote -> Pick final timestamp -> Run.

Propose timestamp: choose a proposed timestamp higher than previous transactions. Log transaction: run the replication protocol to record the proposed timestamp. Vote: send the proposed timestamp to the other participants. Pick final timestamp: the highest timestamp from among the votes. Run: execute in timestamp order and send the result and final timestamp to the client.

SLIDE 32

Timestamp Constraint

A transaction won't execute until it has the lowest timestamp of all concurrent transactions. This guarantees a global serial execution order.

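A sketch of that check (hypothetical Txn type): the queued transaction runs only when no other pending transaction holds a lower timestamp, since a lower proposal could still be finalized ahead of it.

    import java.util.Collection;

    class ExecutionGate {
        // Execute t only if it has the strictly lowest timestamp among
        // all transactions still queued at this repository. Assumes
        // timestamp ties are broken deterministically elsewhere.
        static boolean canExecute(Txn t, Collection<Txn> queue) {
            for (Txn other : queue) {
                if (other != t && other.timestamp() < t.timestamp()) {
                    return false; // an earlier transaction must run first
                }
            }
            return true;
        }
    }

    interface Txn { long timestamp(); }
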
SLIDE 33

Example 1: clients Alice and Bob, with Repository 1 and Repository 2. Both repositories start with an empty Queue and History.

SLIDE 34

T1 arrives at both repositories. Repository 1 Queue: T1. Repository 2 Queue: T1.

SLIDE 35

Each repository proposes a timestamp for T1. Repository 1 Queue: T1 [prop. ts: 9]. Repository 2 Queue: T1 [prop. ts: 3].

SLIDE 36

The repositories exchange votes: Repository 1 sends Vote T1 [9], Repository 2 sends Vote T1 [3].

SLIDE 37

Repository 1 picks the final timestamp: Queue: T1 [final ts: 9]. Repository 2 still has Queue: T1 [prop. ts: 3]; the Vote T1 [9] message is delayed in transit (…).

SLIDE 38

A new transaction T2 arrives at Repository 2. Repository 1 Queue: T1 [final ts: 9]. Repository 2 Queue: T1 [prop. ts: 3]; Vote T1 [9] still in transit.

SLIDE 39

Repository 2 assigns T2 a timestamp: Queue: T1 [prop. ts: 3], T2 [ts: 5]. Repository 1 Queue: T1 [final ts: 9].

SLIDE 40

T2 [ts: 5] cannot execute yet: T1's proposed timestamp (3) is lower, so T2 waits until T1's final timestamp is known. Vote T1 [9] still in transit.

SLIDE 41

Vote T1 [9] arrives at Repository 2. Repository 1 Queue: T1 [final ts: 9]. Repository 2 Queue: T1 [prop. ts: 3], T2 [ts: 5].

SLIDE 42

Repository 2 applies the vote and finalizes T1's timestamp: Queue: T2 [ts: 5], T1 [final ts: 9]. Repository 1 Queue: T1 [final ts: 9].

SLIDE 43

T2 now has the lowest timestamp at Repository 2 and executes: Queue: T1 [final ts: 9]; History: T2 [ts: 5]. Repository 1 Queue: T1 [final ts: 9].

SLIDE 44

Both repositories execute T1. Repository 1 History: T1 [final ts: 9]. Repository 2 History: T2 [ts: 5], T1 [final ts: 9].

SLIDE 45

Final state: Repository 1 History: T1 [final ts: 9]. Repository 2 History: T2 [ts: 5], T1 [final ts: 9].

Global serial order: T2 -> T1

SLIDE 46

Choosing timestamps

The client-provided timestamp guarantees a transaction will be serialized after any transaction it observed

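A sketch of the client-side bookkeeping behind this guarantee (hypothetical names): the client remembers the highest timestamp in any reply and presents it with its next request, so that request is assigned a strictly higher timestamp.

    class ClientTimestampTracker {
        private long highestObserved = 0;

        // Presented with every request.
        long observed() { return highestObserved; }

        // Called on every reply carrying the transaction's timestamp.
        void record(long replyTimestamp) {
            highestObserved = Math.max(highestObserved, replyTimestamp);
        }
    }
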
SLIDE 47

Example 2, from a similar starting point: Repository 1 Queue: T1 [final ts: 9]. Repository 2 Queue: T1 [prop. ts: 3]; Vote T1 [9] in transit (…).

SLIDE 48

A new transaction T2 is submitted to Repository 1. Repository 1 Queue: T1 [final ts: 9]. Repository 2 Queue: T1 [prop. ts: 3]; Vote T1 [9] still in transit.

SLIDE 49

Repository 1 assigns T2 timestamp 10: Queue: T1 [final ts: 9], T2 [ts: 10]. The client observes T2 [ts: 10]. Repository 2 Queue: T1 [prop. ts: 3]; Vote T1 [9] still in transit.

SLIDE 50

The client now submits T3 to Repository 2, presenting its latest observed timestamp [latest ts: 10]. Repository 1 Queue: T1 [final ts: 9], T2 [ts: 10]. Repository 2 Queue: T1 [prop. ts: 3].

SLIDE 51

Repository 2 assigns T3 a timestamp higher than the client's latest observed timestamp: Queue: T1 [prop. ts: 3], T3 [ts: 11]. Repository 1 Queue: T1 [final ts: 9], T2 [ts: 10].

SLIDE 52

The vote arrives and Repository 2 finalizes T1: Queue: T1 [final ts: 9], T3 [ts: 11]. Repository 1 Queue: T1 [final ts: 9], T2 [ts: 10].

SLIDE 53

Final state: Repository 1: T1 [final ts: 9], T2 [ts: 10]. Repository 2: T1 [final ts: 9], T3 [ts: 11].

Global serial order: T1 -> T2 -> T3

SLIDE 54

Where are we now?

Timestamp-based transaction coordination. Still to come:

  • Interoperability with coordinated transactions
  • Recovery from failures

SLIDE 55

Coordinated Transactions

The application determines the commit/abort vote. This requires locking, to ensure the vote isn't invalidated.

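A sketch of the locking-mode prepare step (hypothetical names, reusing the Txn interface from the earlier sketch): holding locks is what keeps a participant's vote valid until the outcome is decided, which is exactly the cost independent transactions avoid.

    // Hypothetical prepare phase for a coordinated transaction.
    class CoordinatedPrepare {
        Vote prepare(Txn t, LockManager locks, Application app) {
            // Acquire locks so concurrent transactions cannot
            // invalidate the vote before the decision arrives.
            if (!locks.tryLockAll(t)) {
                return Vote.ABORT;
            }
            return app.canCommit(t) ? Vote.COMMIT : Vote.ABORT;
        }
    }

    enum Vote { COMMIT, ABORT }
    interface LockManager { boolean tryLockAll(Txn t); }
    interface Application { boolean canCommit(Txn t); }
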
SLIDE 56

Protocol changes

Prepare phase

  • application acquires locks and determines its vote

Timestamp voting

  • repository can commit transactions out of timestamp order

SLIDE 57

Protocol changes

Prepare phase, where the application acquires locks and determines its vote. The repository can commit transactions out of timestamp order.

Timestamp Constraint: timestamps still match the serial order, even if execution happens out of timestamp order

SLIDE 58

Diagram: Clients 1 and 2 with Repositories 1, 2, and 3. A coordinated transaction involves repositories running in Locking Mode, while an independent transaction runs against a repository in Timestamp Mode.

SLIDE 59

Effect on other Transactions

Independent transactions are processed using the locking-mode protocol. No locks are held for single-repository transactions.

SLIDE 60

MOTIVATION · TRANSACTION MODEL · PROTOCOL · EVALUATION

SLIDE 61

Experimental Setup

Implemented as a Java library. Deployed on 20 servers:

  • 2005-vintage Xeon machines, colocated on the same LAN

SLIDE 62

Experimental Setup

Throughput was CPU-bound:

  • > 65,000 tps on microbenchmarks
  • latency kept around 1-2 ms

We compare against an extended version of Sinfonia

SLIDE 63

TPC-C

Implemented a distributed TPC-C benchmark using a non-distributed codebase. Partitioned such that all transactions are single-repository or independent.

SLIDE 64

Scalability

SLIDE 65

Distributed transactions

SLIDE 66

Performance Gains

  • No locking or undo logging: 40% lower CPU overhead, no aborts or retries
  • 3 one-way message delays (vs 4)
  • 1 forced log write

SLIDE 67

Conclusion

Granola provides strong consistency and high performance for distributed transactions. It exploits independent transactions to provide coordination without locking.