Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control
Jialin Li, Ellis Michael, Dan R. K. Ports
Web services and applications rely on distributed storage systems
[Diagram: a client executing a transaction on Shard 1, Shard 2, and Shard 3 in successive message rounds: req, prepare, commit]
without coordination in the normal case
unreplicated system on TPC-C
minimal performance penalties
A new architecture that divides the responsibility for transactional guarantees in a new way …leveraging the datacenter network to order messages within and across shards …and a co-designed transaction protocol with minimal coordination.
[Diagram: conventional layered architecture. Atomic Commitment (2PC) spans the shards; each shard runs Concurrency Control (2PL) on top of Replication (Paxos) over three replicas.]
[Diagram: the guarantees these layers provide: ordering (within shard), reliability (within shard), ordering (across shards), reliability (across shards), isolation.]
[Diagram: Eris maps these guarantees onto three components, split between application and network: Multi-sequencing, Independent Transaction Protocol, General Transaction Protocol.]
delivered to multiple destination shards
[Animation: receivers A, B, and C. T1 is addressed to (ABC) and T2 to (AB). Successive frames show the messages being delivered to the receivers, including frames in which a message is dropped (DROP).]
destination multicast groups
sequenced atomically across all recipient groups
[Animation: receivers A, B, and C, and a sequencer that keeps one counter per receiver group, initially A0 B0 C0.]
T1 (ABC) reaches the sequencer: counters become A1 B1 C1, and T1 is stamped A1 B1 C1.
T2 (AB) reaches the sequencer: counters become A2 B2 C1, and T2 is stamped A2 B2.
T3 (A) reaches the sequencer: counters become A3 B2 C1, and T3 is stamped A3.
[Final frame: a message is dropped on its way to a receiver (DROP).]
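The counter walkthrough above can be sketched in a few lines of Python. This is only an illustrative model, not the P4 or network-processor implementation: the class name `MultiSequencer` and its `stamp` method are hypothetical, and the sketch assumes a single sequencer handling one message at a time.

```python
from dataclasses import dataclass, field


@dataclass
class MultiSequencer:
    """Hypothetical model of the sequencer: one counter per destination group."""
    counters: dict = field(default_factory=dict)

    def stamp(self, txn_id, groups):
        # Increment the counter of every destination group in one step and
        # attach the new values to the message as its stamp.
        stamp = {}
        for group in groups:
            self.counters[group] = self.counters.get(group, 0) + 1
            stamp[group] = self.counters[group]
        return txn_id, stamp


seq = MultiSequencer({"A": 0, "B": 0, "C": 0})
t1 = seq.stamp("T1", "ABC")  # stamped A1 B1 C1
t2 = seq.stamp("T2", "AB")   # stamped A2 B2
t3 = seq.stamp("T3", "A")    # stamped A3
print(t1, t2, t3)
print(seq.counters)          # counters end at A3 B2 C1
```

Because every destination group's counter advances in the same step, any two messages that share a destination group are stamped in the same relative order for all of their common recipients.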
✤ Programmable switches, written in P4
✤ Middlebox prototype using network processors
drop detection
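As a rough illustration of how per-group sequence numbers make drops detectable, the sketch below has a receiver compare each incoming stamp against the next number it expects. The `Receiver` class and its fields are hypothetical names introduced for this example; what a real receiver does once it finds a gap is not modelled here.

```python
class Receiver:
    """Hypothetical receiver for one group; tracks the next expected number."""

    def __init__(self, group):
        self.group = group
        self.next_seq = 1
        self.delivered = []

    def on_message(self, txn_id, stamp):
        """Return the sequence numbers found missing (empty list if none)."""
        seq = stamp[self.group]
        if seq < self.next_seq:
            return []  # duplicate or stale message: ignore it
        missing = list(range(self.next_seq, seq))
        if missing:
            return missing  # hold the message back until the gap is resolved
        self.delivered.append(txn_id)
        self.next_seq = seq + 1
        return []


a = Receiver("A")
gap_after_t1 = a.on_message("T1", {"A": 1, "B": 1, "C": 1})
gap_after_t3 = a.on_message("T3", {"A": 3})  # A2 never arrived
print(gap_after_t1, gap_after_t3, a.delivered)
```

Here receiver A delivers T1, then sees A3 while still expecting A2, so it reports A2 as missing instead of delivering T3 out of order.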
transactions?
Eris supports two types of transactions
✤ One-shot (stored procedures)
✤ No cross-shard dependencies
✤ Proposed by H-Store [VLDB ’07] and Granola [ATC ’12]
Table tb, one row on each of three shards:

Name     Salary
Alice    600
Bob      350
Charlie  400

The same transaction is sent to all three shards:

START TRANSACTION
UPDATE tb t1 SET t1.Salary = t1.Salary + 100 WHERE t1.Salary < 500
COMMIT

Rows changed:

Name     Salary
Bob      450
Charlie  500

A transaction whose condition reads rows from every shard:

START TRANSACTION
UPDATE tb t1 SET t1.Salary = t1.Salary + 100 WHERE 500 < (SELECT AVG(t2.Salary) FROM tb t2)
COMMIT
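The difference between the two SQL transactions can be made concrete with a small Python sketch. It is an illustration under assumptions: the mapping of rows to "Shard 1" through "Shard 3" and the function names are hypothetical, and plain dictionaries stand in for the table.

```python
def raise_low_salaries(rows):
    """One-shot procedure: it only needs the rows of the shard it runs on."""
    return {name: salary + 100 if salary < 500 else salary
            for name, salary in rows.items()}


def average_salary(all_shards):
    """The AVG subquery: it needs the rows of every shard."""
    salaries = [s for rows in all_shards.values() for s in rows.values()]
    return sum(salaries) / len(salaries)


# Assumed placement of the three rows; the slides do not say which shard holds which row.
shards = {
    "Shard 1": {"Alice": 600},
    "Shard 2": {"Bob": 350},
    "Shard 3": {"Charlie": 400},
}

# Each shard applies the first transaction using only its local data.
after = {shard: raise_low_salaries(rows) for shard, rows in shards.items()}
print(after)

# The second transaction's condition cannot be evaluated by any single shard.
print(average_salary(shards))
```

Every shard can run `raise_low_salaries` without hearing from the others, which is what the "no cross-shard dependencies" bullet requires. The averaging condition, by contrast, depends on data held by all three shards.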
Many applications consist entirely of independent transactions (e.g. TPC-C)
consistent order guarantees serializability
server failures?
[Animation: the client sends a transaction through the sequencer to Shard 1, Shard 2, and Shard 3; the shards are drawn with three learners and six replicas in total. Normal case: 1 round trip, no coordination.]
[Animation: handling a dropped message. A, B, and C have all received T1 (ABC) A1 B1 C1, and A has received T3 (A) A3, but T2 (AB) A2 B2 was dropped before reaching A, leaving a gap at A2.]
Case 1, another receiver has the message: the failure coordinator asks the other receivers "Received A2?". One answers "Not Found"; the other holds T2 (AB) A2 B2, and A obtains its copy of T2.
Case 2, no receiver has the message: both other receivers answer "Not Found". The failure coordinator sends "Drop A2" to A, B, and C; A fills the slot with a NO OP, and the drop is recorded ("Drops: A2").
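The two recovery cases can be sketched as follows. This is a deliberately simplified model under stated assumptions: `Shard`, `recover`, and the `"NO-OP"` marker are hypothetical names, the failure coordinator is a plain function, and the sketch ignores replication, concurrency, and failures of the coordinator itself.

```python
class Shard:
    """Hypothetical shard state: received messages indexed by (group, seq)."""

    def __init__(self, name):
        self.name = name
        self.log = {}       # (group, seq) -> transaction id or "NO-OP"
        self.drops = set()  # slots agreed to be permanently dropped

    def receive(self, txn_id, stamp):
        # The stamp carries the sequence number of every destination group,
        # so a shard can later answer queries such as "Received A2?".
        for group, seq in stamp.items():
            self.log[(group, seq)] = txn_id


def recover(slot, requester, shards):
    """Hypothetical failure-coordinator logic for one missing slot."""
    for shard in shards:
        if shard is requester:
            continue
        txn = shard.log.get(slot)  # "Received A2?"
        if txn is not None:
            requester.log[slot] = txn  # case 1: forward the found message
            return txn
    for shard in shards:               # case 2: everyone said "Not Found"
        shard.drops.add(slot)
    requester.log[slot] = "NO-OP"
    return "NO-OP"


def make_shards(b_has_t2):
    a, b, c = Shard("A"), Shard("B"), Shard("C")
    for s in (a, b, c):
        s.receive("T1", {"A": 1, "B": 1, "C": 1})
    a.receive("T3", {"A": 3})
    if b_has_t2:
        b.receive("T2", {"A": 2, "B": 2})
    return a, b, c


a1, b1, c1 = make_shards(b_has_t2=True)
case1 = recover(("A", 2), a1, [a1, b1, c1])

a2, b2, c2 = make_shards(b_has_t2=False)
case2 = recover(("A", 2), a2, [a2, b2, c2])
print(case1, case2, c2.drops)
```

In the first run B still holds T2, so A receives it and can continue in order. In the second run nobody holds T2, so all shards record the drop and A places a no-op in slot A2.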
Designated learner (DL) failure:
from previous views
Sequencer failure:
start the new epoch in consistent states
Cavium Octeon CN6880
[Figure: throughput (txns/sec, axis 0K to 1,200K) on distributed independent transactions for Lock-Store, TAPIR, Granola, Eris, and NT-UR]
Eris outperforms Lock-Store, TAPIR and Granola by more than 3X
Eris achieves throughput within 10% of NT-UR
More than 70% reduction in latency compared to Lock-Store, and within 10% latency of NT-UR
[Figure: throughput (txns/sec, axis 0K to 1,200K) on distributed general transactions for Lock-Store, TAPIR, Granola, Eris, and NT-UR]
Eris maintains throughput within 10% of NT-UR
[Figure: throughput (txns/sec, axis 0K to 240K) on the TPC-C benchmark for Lock-Store, TAPIR, Granola, Eris, and NT-UR]
7.6X and 6.4X higher throughput than Lock-Store and TAPIR
within 3% throughput
[Figure: throughput (txns/sec, axis 0K to 1,800K) versus packet drop rate (0.01%, 0.1%, 1%, 10%) for Eris, Lock-Store, TAPIR, Granola, and NT-UR]
Co-designing distributed systems with the network: NetPaxos [SOSR ’15]
Sequencers for transaction processing: Calvin [SIGMOD ’12]
Independent and other restricted transaction models: Calvin [SIGMOD ’12]
✤ An in-network concurrency control mechanism that establishes a consistent order of transactions across shards
✤ An efficient protocol that ensures reliable delivery of independent transactions
✤ A general transaction layer atop independent transaction processing
minimal performance overhead