Granola: Low-Overhead Distributed Transaction Coordination
James Cowling and Barbara Liskov, MIT CSAIL
Granola
An infrastructure for building distributed storage applications
A distributed transaction coordination protocol that provides strong consistency without locking
[Diagram: Clients 1 and 2 issue transactions to distributed storage spanning Repositories 1, 2, 3, …]
Why Transactions?
Atomic operations allow users and developers to ignore concurrency
Distributed atomic operations allow data to span multiple repositories
- avoids inconsistency between repositories
Distributed transactions are hard
Tension between consistency and performance
Opting for Consistency
Two-phase commit with strict two-phase locking
Distributed databases, Sinfonia, etc.
High transaction cost: multiple message delays, forced log writes, locking/logging overhead (≈30-40%*)
* OLTP through the looking glass, and what we found there. Harizopoulos et al., SIGMOD '08
Opting for Performance
No distributed transactions
SimpleDB, Bigtable, CRAQ, etc.
Weak consistency models
Dynamo, Cassandra, MongoDB, HBase, PNUTS, etc.
Places the burden of consistency on the application developer
Where we come in…
Strong Consistency and High Performance (for a large class of transactions)
Introduce a new transaction class that lets us provide consistency without locking
MOTIVATION TRANSACTION MODEL PROTOCOL EVALUATION
One-Round Transactions
Expressed in one round of communication between clients and repositories
Execute to completion at each participant
General Operations
Transactions are uninterpreted by Granola, and can execute arbitrary operations
Transaction Classes
Single-Repository
execute entirely on one repository
Distributed Transactions
- Coordinated
- Independent
Coordinated Transac9ons
Commit only if all par<cipants vote to commit Example:
- Transfer $50 between accounts
Independent Transactions
Transactions where all participants will make the same commit/abort decision
Examples:
- Add 1% interest to each bank balance.
- Compute total amount of money in the bank.
Independent Transactions
Evidence these are common in OLTP workloads
- Any read-only distributed transaction
- Transactions that always commit
- Atomically update replicated data
- Where commit decision is a deterministic function of shared state
Example: TPC‐C
The TPC-C benchmark can be expressed entirely using single-repository and independent transactions
e.g., the new_order transaction only aborts on an invalid item number
the abort check can be computed locally if we replicate the Item table
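The local commit check above can be sketched as follows (names are illustrative, not TPC-C's schema or Granola's API): because the Item table is replicated at every repository, each participant evaluates the same deterministic predicate and all reach the same commit/abort decision.

```java
import java.util.Set;

// Sketch: with the Item table replicated everywhere, the new_order
// abort condition (invalid item number) is a deterministic function of
// shared state, so new_order qualifies as an independent transaction.
final class NewOrderCheck {
    /** Vote commit iff every ordered item id exists in the replicated Item table. */
    static boolean newOrderVote(Set<Integer> itemTable, int[] orderedItems) {
        for (int id : orderedItems) {
            if (!itemTable.contains(id)) {
                return false; // invalid item number: every participant votes abort
            }
        }
        return true; // every participant independently votes commit
    }
}
```

Every repository running this check over the same replicated table produces the same vote, so no coordination round is needed to agree on the outcome.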
MOTIVATION TRANSACTION MODEL PROTOCOL EVALUATION
[Diagram: the client application invokes a transaction through the Granola client library, which coordinates with the Granola repository library; the repository library runs the transaction against the server application and returns the result to the client]
Replication
Each repository is implemented as a replica group: a primary and backups
Repository Modes
Primarily in Timestamp Mode
- Single-repository, Independent
Occasionally in Locking Mode
- When coordinated transactions are required
Timestamps
Each transaction is assigned a timestamp
Each repository executes transactions in timestamp order
Timestamps define a global serial order
Key Questions
How do we assign timestamps in a scalable, fault-tolerant way?
How do we make sure we always execute in timestamp order?
Single‐Repository Protocol
Clients present the highest timestamp they have observed
Repository chooses a timestamp higher than the client timestamp, any previous transaction, and its clock value
Repository executes in timestamp order, sends response and timestamp to client

Assign timestamp
- Choose timestamp higher than previous transactions
Log transaction
- Run replication protocol to record timestamp
Run
- Execute in timestamp order
- Send result and timestamp to the client
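The timestamp-assignment step can be sketched as below (class and method names are assumptions, not Granola's actual API): the repository picks a value strictly higher than the client's highest observed timestamp, every timestamp it has assigned before, and its local clock.

```java
import java.util.function.LongSupplier;

// Sketch of timestamp assignment in the single-repository protocol.
// Taking the max over (client timestamp, last assigned timestamp,
// clock) and adding 1 guarantees the new timestamp exceeds all three,
// keeping assignments monotonic and roughly tracking real time.
final class SingleRepoTimestamps {
    private long lastTs = 0;       // highest timestamp assigned so far
    private final LongSupplier clock;

    SingleRepoTimestamps(LongSupplier clock) {
        this.clock = clock;
    }

    /** Assign a timestamp, given the highest timestamp the client has observed. */
    synchronized long assign(long clientTs) {
        long ts = Math.max(Math.max(clientTs, lastTs), clock.getAsLong()) + 1;
        lastTs = ts;
        return ts;
    }
}
```

Including the clock value keeps timestamps loosely synchronized across repositories, which matters once independent transactions vote on a shared final timestamp.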
Independent Protocol
Clients present the highest timestamp they have observed
Repository chooses a proposed timestamp higher than its clock value and previous transactions
Repositories vote to determine the highest timestamp
Repository executes in timestamp order, sends timestamp to client

Propose timestamp
- Choose proposed timestamp higher than previous transactions
Log transaction
- Run replication protocol to record proposed timestamp
Vote
- Send proposed timestamp to the other participants
Pick final timestamp
- Highest timestamp from among votes
Run
- Execute in timestamp order
- Send result and final timestamp to client
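The voting step above reduces to a deterministic max over proposals, which can be sketched as follows (names are mine, not Granola's API): every participant computes the same final timestamp from the same set of votes, so all agree on the serial position without locking.

```java
// Sketch of the timestamp vote for independent transactions: each
// participant proposes a timestamp; the final timestamp is the highest
// proposal among all votes, computed identically at every repository.
final class IndependentVote {
    static long finalTimestamp(long[] proposals) {
        long max = Long.MIN_VALUE;
        for (long p : proposals) {
            max = Math.max(max, p);
        }
        return max;
    }
}
```

For instance, with proposals 9 and 3 (as in the walkthrough that follows the timestamp constraint), every participant settles on final timestamp 9.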
Timestamp Constraint
Won't execute a transaction until it has the lowest timestamp of all concurrent transactions
Guarantees a global serial execution order
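One way to read this constraint as code (structure and names assumed, not Granola's implementation): a queued transaction may run only once its timestamp is final and is the lowest in the queue. Since proposed timestamps can only grow during voting, a final timestamp below every queued timestamp can never be overtaken.

```java
import java.util.List;

// Sketch of the execute-in-order check implied by the timestamp
// constraint: hold a transaction until its timestamp is final and no
// other queued transaction could end up with a lower timestamp.
final class ExecuteCheck {
    static final class Queued {
        final long ts;          // proposed or final timestamp
        final boolean isFinal;  // true once voting has completed
        Queued(long ts, boolean isFinal) { this.ts = ts; this.isFinal = isFinal; }
    }

    /** True if txn may execute now: final timestamp, lowest in the queue. */
    static boolean canExecute(Queued txn, List<Queued> queue) {
        if (!txn.isFinal) return false;
        for (Queued other : queue) {
            if (other != txn && other.ts < txn.ts) return false;
        }
        return true;
    }
}
```

In the example below, T2 (timestamp 5) runs before T1 (final timestamp 9) because 5 is the lowest timestamp in Repository 2's queue.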
Example 1:
- Alice submits independent transaction T1 to Repositories 1 and 2; Repository 1 proposes timestamp 9, Repository 2 proposes timestamp 3, and they exchange votes.
- Repository 1 receives both votes and fixes the final timestamp at the highest proposal: 9.
- Meanwhile Bob submits single-repository transaction T2 to Repository 2, which assigns it timestamp 5. T2 must wait: T1's proposed timestamp (3) is lower, and T1's final timestamp is not yet known.
- The vote arrives at Repository 2, fixing T1's final timestamp at 9. Now T2 (5) is lowest, so Repository 2 executes T2, then T1.
Global serial order: T2 -> T1
Choosing timestamps
Client-provided timestamp guarantees the transaction will be serialized after any transaction it observed
Example 2:
- Repository 1 has executed T1 with final timestamp 9; Repository 2 still holds T1 at proposed timestamp 3, the vote still in flight.
- A client submits single-repository transaction T2 to Repository 1, which assigns it timestamp 10 and executes it.
- A client that has observed timestamp 10 submits T3 to Repository 2, which assigns timestamp 11, higher than the client's latest observed timestamp.
- T1's vote arrives at Repository 2, fixing its final timestamp at 9; since 9 < 11, Repository 2 executes T1 before T3.
Global serial order: T1 -> T2 -> T3
Where are we now?
Timestamp-based transaction coordination
Still to address:
- Interoperability with coordinated transactions
- Recovery from failures
Coordinated Transactions
Application determines the commit/abort vote
Requires locking to ensure the vote isn't invalidated
Protocol changes
- Prepare phase, where the application acquires locks and determines its vote
- Timestamp voting
- Repository can commit transactions out of timestamp order
Timestamp Constraint: Timestamps still match the serial order, even if execution happens out of timestamp order
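The prepare phase can be sketched as below (names are illustrative, not Granola's actual API): the repository acquires locks on the transaction's keys, the application supplies its vote, and holding the locks keeps that vote valid until the final decision; the transaction commits only if every participant voted commit.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of the coordinated-transaction prepare phase and the
// all-participants commit rule. A lock conflict forces an abort vote;
// rollback of partially acquired locks is omitted for brevity.
final class CoordinatedPrepare {
    private final Set<String> locked = new HashSet<>();

    /** Acquire locks and vote; a lock conflict forces an abort vote. */
    boolean prepare(List<String> keys, boolean appVotesCommit) {
        for (String k : keys) {
            if (!locked.add(k)) return false; // conflict: vote abort
        }
        return appVotesCommit;
    }

    /** Commit only if all participants vote commit. */
    static boolean decide(boolean... votes) {
        for (boolean v : votes) {
            if (!v) return false;
        }
        return true;
    }
}
```

This is exactly the locking that independent transactions avoid: their vote is a deterministic function of shared state, so no lock is needed to keep it valid.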
[Diagram: repositories participating in a coordinated transaction run in locking mode, while a repository serving only an independent transaction stays in timestamp mode]
Effect on other Transactions
Independent transactions are processed using the locking-mode protocol
No locks are held for single-repository transactions
MOTIVATION TRANSACTION MODEL PROTOCOL EVALUATION
Experimental Setup
Implemented as a Java library
Deployed on 20 servers
- 2005-vintage Xeon machines colocated on the same LAN
Experimental Setup
Throughput was CPU bound
- > 65,000 tps on microbenchmarks
- latency kept around 1-2 ms
We compare against an extended version of Sinfonia