Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control



slide-1
SLIDE 1

Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control

Jialin Li, Ellis Michael, Dan R. K. Ports

slide-2
SLIDE 2

Web services and applications rely on distributed storage systems

slide-7
SLIDE 7

Partitioned for Scalability, Replicated for Availability

slide-10
SLIDE 10

Participants: Client, Shard 1, Shard 2, Shard 3

Existing transactional systems: extensive coordination

slide-11
SLIDE 11

Participants: Client, Shard 1, Shard 2, Shard 3

Messages: req, prepare, ok, commit

Existing transactional systems: extensive coordination

slide-14
SLIDE 14

In this talk … Eris

  • Processes independent transactions without coordination in the normal case
  • Performance within 3% of a nontransactional, unreplicated system on TPC-C
  • Strongly consistent, fault-tolerant transactions with minimal performance penalties

slide-15
SLIDE 15

Key Contributions

A new architecture that divides the responsibility for transactional guarantees in a new way…

…leveraging the datacenter network to order messages within and across shards…

…and a co-designed transaction protocol with minimal coordination.

slide-16
SLIDE 16

Traditional Layered Approach

[Figure, slides 16-20: Atomic Commitment (2PC) above two Concurrency Control (2PL) boxes, each above a Replication (Paxos) box with three replicas. Successive frames label the guarantees the stack provides: Ordering (within shard), Reliability (within shard), Isolation, Ordering (across shards), Reliability (across shards).]

slide-21
SLIDE 21

Traditional Layered Approach: Ordering (across shards), Isolation, Ordering (within shard), Reliability (within shard), Reliability (across shards)

Eris: Multi-sequencing, Independent Transaction Protocol, General Transaction Protocol

A new way to divide the responsibilities for different guarantees

slide-22
SLIDE 22

[Figure: the same comparison, with the Eris layers labeled Application and Network]

A new way to divide the responsibilities for different guarantees

slide-23
SLIDE 23

Outline

  • 1. Introduction
  • 2. In-Network Concurrency Control
  • 3. Transaction Model
  • 4. Eris Protocol
  • 5. Evaluation
slide-24
SLIDE 24

In-Network Concurrency Control Goals

  • Globally consistent ordering across messages delivered to multiple destination shards

  • No reliable delivery guarantee
  • Recipients can detect dropped messages
slide-25
SLIDE 25

[Figure, slides 25-31: receivers A, B, and C. Transaction T1 is addressed to A, B, and C; T2 is addressed to A and B. One delivery is marked DROP.]

slide-32
SLIDE 32

Multi-Sequenced Groupcast

  • Groupcast: message header specifies a set of destination multicast groups
  • Multi-sequenced groupcast: messages are sequenced atomically across all recipient groups
  • Sequencer keeps a counter for each group
  • Extends OUM in NOPaxos [OSDI ’16]
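
The per-group counters can be illustrated with a minimal Python sketch. It is not the P4 or middlebox implementation; the names `Sequencer`, `Stamped`, and `stamp` are hypothetical and chosen only for illustration. The sketch assumes the sequencer handles one message at a time, so incrementing all destination counters is trivially atomic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Stamped:
    """A groupcast message plus one sequence number per destination group."""
    payload: str
    stamps: Dict[str, int]


class Sequencer:
    """Hypothetical single-threaded sequencer with one counter per group."""

    def __init__(self, groups: Iterable[str]):
        self.counters = {g: 0 for g in groups}

    def stamp(self, payload: str, dests: Iterable[str]) -> Stamped:
        dests = list(dests)
        # Advance the counter of every destination group before the
        # message leaves, so all recipients see a consistent multi-stamp.
        for g in dests:
            self.counters[g] += 1
        return Stamped(payload, {g: self.counters[g] for g in dests})


seq = Sequencer("ABC")
t1 = seq.stamp("T1", "ABC")  # addressed to groups A, B, C
t2 = seq.stamp("T2", "AB")   # addressed to groups A, B
t3 = seq.stamp("T3", "A")    # addressed to group A only
```

With three messages addressed to (ABC), (AB), and (A), the counters end at A3 B2 C1, matching the walkthrough in the slides.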
slide-33
SLIDE 33

[Figure, slides 33-51: the sequencer stamps each groupcast message with one counter per destination group]

  • Sequencer counters start at A0 B0 C0
  • T1 (ABC) arrives: counters become A1 B1 C1; T1 is stamped A1 B1 C1 and forwarded to A, B, and C
  • T2 (AB) arrives: counters become A2 B2 C1; T2 is stamped A2 B2 and forwarded to A and B
  • T3 (A) arrives: counters become A3 B2 C1; T3 is stamped A3 and forwarded to A
  • Final frame: one delivery is marked DROP

slide-52
SLIDE 52

Network Implementation

  • Groupcast routing using OpenFlow
  • Sequencer implementations:
    ✤ Programmable switches, written in P4
    ✤ Middlebox prototype using network processors

  • Global epoch number for sequencer failures
slide-53
SLIDE 53

What have we accomplished so far?

  • Consistently ordered groupcast primitive with drop detection
  • How do we go from multi-sequenced groupcast to transactions?
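
Drop detection follows from the per-group sequence numbers: a receiver that sees a stamp higher than the one it expects knows something was lost. The sketch below is a hedged illustration, not the Eris receiver code. The class `Receiver` and its fields are hypothetical, and holding back out-of-order messages until the gap is resolved is an assumption made to keep the example simple.

```python
class Receiver:
    """Hypothetical receiver for one group, e.g. "A"."""

    def __init__(self, group):
        self.group = group
        self.next_seq = 1    # next sequence number expected for this group
        self.pending = {}    # seq -> payload, received but not yet deliverable
        self.delivered = []  # payloads delivered in sequence order

    def receive(self, payload, stamps):
        """Accept a stamped message; return the sequence numbers still missing."""
        self.pending[stamps[self.group]] = payload
        # Deliver everything that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        # Any number between next_seq and the highest pending stamp is a gap.
        highest = max(self.pending, default=0)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]


a = Receiver("A")
gaps_after_t1 = a.receive("T1", {"A": 1, "B": 1, "C": 1})
gaps_after_t3 = a.receive("T3", {"A": 3})          # A2 never arrived
gaps_after_t2 = a.receive("T2", {"A": 2, "B": 2})  # late copy fills the gap
```

Receiver A reports slot 2 as missing when T3 (stamped A3) arrives before anything stamped A2, which is the situation the protocol then has to resolve.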

slide-54
SLIDE 54

Outline

  • 1. Introduction
  • 2. In-Network Concurrency Control
  • 3. Transaction Model
  • 4. Eris Protocol
  • 5. Evaluation
slide-55
SLIDE 55

Transaction Model

Eris supports two types of transactions

  • Independent transactions:
    ✤ One-shot (stored procedures)
    ✤ No cross-shard dependencies
    ✤ Proposed by H-Store [VLDB ’07] and Granola [ATC ’12]

  • Fully general transactions
slide-56
SLIDE 56

Independent Transaction

The table tb is partitioned across three shards, one row each:

Name     Salary
Alice    600
Bob      350
Charlie  400

The same transaction runs at every shard:

START TRANSACTION
UPDATE tb t1 SET t1.Salary = t1.Salary + 100 WHERE t1.Salary < 500
COMMIT

Result:

Name     Salary
Bob      450
Charlie  500

slide-59
SLIDE 59

Independent Transaction

START TRANSACTION
UPDATE tb t1 SET t1.Salary = t1.Salary + 100 WHERE 500 < (SELECT AVG(t2.Salary) FROM tb t2)
COMMIT

Not independent! (The AVG subquery reads salaries from every shard.)

slide-62
SLIDE 62

Independent Transaction

Many applications consist entirely of independent transactions (e.g. TPC-C)
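
The salary example can be sketched in Python to show what makes a transaction independent: each shard decides its own update using only local rows. The shard layout (one dict per shard) and the function names are assumptions for illustration and do not come from the Eris code base.

```python
def raise_low_salaries(rows):
    """One-shot transaction body: uses only the rows stored on this shard."""
    for name, salary in rows.items():
        if salary < 500:
            rows[name] = salary + 100


def global_average(shards):
    """Needs rows from every shard, so a transaction built on it is NOT independent."""
    salaries = [s for rows in shards for s in rows.values()]
    return sum(salaries) / len(salaries)


# Hypothetical layout: one row of table tb per shard.
shards = [{"Alice": 600}, {"Bob": 350}, {"Charlie": 400}]

avg_before = global_average(shards)  # requires a cross-shard read

# Independent transaction: every shard runs the same procedure locally,
# with no messages exchanged between shards.
for rows in shards:
    raise_low_salaries(rows)
```

After the loop, Bob and Charlie are raised to 450 and 500 while Alice stays at 600, as on the slide. The AVG-based variant cannot be evaluated this way because its condition depends on data held by other shards.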

slide-63
SLIDE 63

Why independent transactions?

  • No coordination/communication across shards
  • Executing them serially at each shard in a consistent order guarantees serializability
  • Multi-sequenced groupcast establishes such an order
  • How to handle message drops and sequencer/server failures?
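
A toy example shows why the consistent order matters. The two operations below are hypothetical stand-ins for transactions that touch the same item on two shards; they are not taken from the talk.

```python
def add_ten(x):
    return x + 10


def double(x):
    return x * 2


def run(order, x=1):
    """Execute the transactions serially, in the given order, on one shard."""
    for txn in order:
        x = txn(x)
    return x


# Both shards apply the transactions in the same order: states agree.
consistent = [run([add_ten, double]), run([add_ten, double])]

# The shards disagree on the order: the combined state matches no serial order.
mixed = [run([add_ten, double]), run([double, add_ten])]
```

In the `mixed` case one shard behaves as if `add_ten` ran first and the other as if `double` ran first, so no single serial schedule explains both. Delivering the transactions in one agreed order rules this out.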

slide-64
SLIDE 64

Outline

  • 1. Introduction
  • 2. In-Network Concurrency Control
  • 3. Transaction Model
  • 4. Eris Protocol
  • 5. Evaluation
slide-65
SLIDE 65

Normal Case

[Figure, slides 65-71: Client, Sequencer, and Shards 1-3, drawn with three Learner nodes and six Replica nodes. The final frames add the annotations "1 round trip" and "no coordination".]

slide-72
SLIDE 72

How to handle dropped messages?

[Figure, slides 72-77: receivers A, B, and C with messages T1 (ABC) stamped A1 B1 C1, T2 (AB) stamped A2 B2, and T3 (A) stamped A3. One delivery is marked DROP.]

Global coordination problem

slide-78
SLIDE 78

The Failure Coordinator

[Figure, slides 78-85: receivers A, B, and C and the Failure Coordinator. T1 (ABC) A1 B1 C1 appears three times, T3 (A) A3 once, and T2 (AB) A2 B2 once, with one delivery marked DROP. The frames show "Received A2?" queries and a "Not Found" reply; a second copy of T2 (AB) A2 B2 then appears and the DROP mark is removed.]

slide-86
SLIDE 86

The Failure Coordinator

[Figure, slides 86-91: no copy of T2 is shown. The frames show two "Received A2?" queries and two "Not Found" replies, then three "Drop A2" messages. A NO OP entry appears and "Drops: A2" is recorded twice.]
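
The two outcomes in these frames can be sketched as one decision function. This is a simplified, hypothetical model rather than the protocol from the paper: `resolve_drop`, `logs`, and `drops` are invented names, and the sketch ignores replication, retries, and failures of the coordinator itself.

```python
def resolve_drop(group, seq, logs, drops):
    """Decide the fate of the message stamped (group, seq).

    logs:  receiver name -> list of (payload, stamps) received so far
    drops: receiver name -> set of (group, seq) slots agreed to be dropped
    Returns the recovered payload, or None if the slot is declared dropped.
    """
    # Ask every receiver whether it holds a message with this stamp.
    for log in logs.values():
        for payload, stamps in log:
            if stamps.get(group) == seq:
                # Found: forward it to each destination that lacks it.
                for dest in stamps:
                    if (payload, stamps) not in logs[dest]:
                        logs[dest].append((payload, stamps))
                return payload
    # Not found anywhere: every receiver records the slot as dropped,
    # so the affected group can fill it with a no-op.
    for name in logs:
        drops[name].add((group, seq))
    return None


s1, s2, s3 = {"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 2}, {"A": 3}

# Case 1: B received T2, A did not.
logs1 = {"A": [("T1", s1), ("T3", s3)], "B": [("T1", s1), ("T2", s2)], "C": [("T1", s1)]}
drops1 = {name: set() for name in logs1}
found = resolve_drop("A", 2, logs1, drops1)

# Case 2: nobody received T2.
logs2 = {"A": [("T1", s1), ("T3", s3)], "B": [("T1", s1)], "C": [("T1", s1)]}
drops2 = {name: set() for name in logs2}
lost = resolve_drop("A", 2, logs2, drops2)
```

In the first case the missing message is recovered from B and delivered to A. In the second, all three receivers agree that slot A2 is dropped, which keeps their view of the order consistent.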

slide-92
SLIDE 92

Designated Learner and Sequencer Failures

Designated learner (DL) failure:

  • View change based protocol
  • Ensures new DL learns all committed transactions from previous views

Sequencer failure:

  • Higher epoch number from the new sequencer
  • Epoch change ensures all replicas across all shards start the new epoch in consistent states
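
How a receiver might react to epoch numbers can be sketched as follows. The class `EpochTracker` and its return labels are hypothetical, and the sketch assumes that sequence numbers restart at 1 in each new epoch; the slides do not state this.

```python
class EpochTracker:
    """Hypothetical per-group receiver state: current epoch and next sequence number."""

    def __init__(self):
        self.epoch = 1
        self.next_seq = 1

    def classify(self, epoch, seq):
        """Classify an incoming message stamped (epoch, seq)."""
        if epoch < self.epoch:
            return "stale"         # sent by a sequencer that has been replaced
        if epoch > self.epoch:
            return "epoch-change"  # new sequencer: run the epoch change first
        if seq == self.next_seq:
            self.next_seq += 1
            return "deliver"
        return "gap" if seq > self.next_seq else "duplicate"

    def start_epoch(self, epoch):
        """Called once the epoch change has brought all replicas to a consistent state."""
        self.epoch = epoch
        self.next_seq = 1          # assumption: numbering restarts in each epoch


r = EpochTracker()
first = r.classify(1, 1)       # normal delivery in epoch 1
newer = r.classify(2, 1)       # message from a new sequencer
r.start_epoch(2)
old = r.classify(1, 2)         # late message from the old sequencer
resumed = r.classify(2, 1)     # delivery resumes in epoch 2
```

The point of the sketch is the ordering rule: a message from a lower epoch is never processed once a higher epoch has started, and a message from a higher epoch is held until the epoch change completes.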

slide-93
SLIDE 93

Can we process non-independent transactions efficiently?

slide-94
SLIDE 94

Can we process non-independent transactions efficiently?

Yes, by dividing them into multiple independent transactions (see the paper!)

slide-95
SLIDE 95

Outline

  • 1. Introduction
  • 2. In-Network Concurrency Control
  • 3. Transaction Model
  • 4. Eris Protocol
  • 5. Evaluation
slide-96
SLIDE 96

Evaluation Setup

  • 3-level fat-tree topology testbed
  • 15 shards, 3 replicas per shard
  • 2.5 GHz Intel Xeon E5-2680 servers
  • Middlebox sequencer implementation using Cavium Octeon CN6880

  • YCSB+T and TPC-C workloads
slide-97
SLIDE 97

Comparison Systems

  • Lock-Store (2PC + 2PL + Paxos)
  • TAPIR [SOSP ’15]
  • Granola [ATC ’12]
  • Non-transactional, unreplicated (NT-UR)
slide-98
SLIDE 98

Eris performs well on independent transactions

[Bar chart: throughput (txns/sec, axis 0K to 1,200K) on distributed independent transactions for Lock-Store, TAPIR, Granola, Eris, and NT-UR]

  • Eris outperforms Lock-Store, TAPIR and Granola by more than 3X
  • Eris achieves throughput within 10% of NT-UR
  • More than 70% reduction in latency compared to Lock-Store, and within 10% latency of NT-UR

slide-102
SLIDE 102

Eris also performs well on general transactions

[Bar chart: throughput (txns/sec, axis 0K to 1,200K) on distributed general transactions for Lock-Store, TAPIR, Granola, Eris, and NT-UR]

  • Eris maintains throughput within 10% of NT-UR

slide-104
SLIDE 104

Eris excels at complex transactional applications

[Bar chart: throughput (txns/sec, axis 0K to 240K) on the TPC-C benchmark for Lock-Store, TAPIR, Granola, Eris, and NT-UR]

  • 7.6X and 6.4X higher throughput than Lock-Store and TAPIR
  • Within 3% throughput of NT-UR
slide-107
SLIDE 107

Eris is resilient to network anomalies

[Line chart: throughput (txns/sec, axis 0K to 1,800K) versus packet drop rate (0.01%, 0.1%, 1%, 10%) for Eris, Lock-Store, TAPIR, Granola, and NT-UR]

slide-109
SLIDE 109

Related Work

Co-designing distributed systems with the network

  • NOPaxos [OSDI ’16], Speculative Paxos [NSDI ’15], NetPaxos [SOSR ’15]

Sequencers for transaction processing

  • Hyder [CIDR ’11], vCorfu [NSDI ’17], Calvin [SIGMOD ’12]

Independent and other restricted transaction models

  • H-Store [VLDB ’07], Granola [ATC ’12], Calvin [SIGMOD ’12]

slide-110
SLIDE 110

Conclusion

  • A new division of responsibility for transaction processing
    ✤ An in-network concurrency control mechanism that establishes a consistent order of transactions across shards
    ✤ An efficient protocol that ensures reliable delivery of independent transactions
    ✤ A general transaction layer atop independent transaction processing
  • Result: strongly consistent, fault-tolerant transactions with minimal performance overhead