SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines
Hanyu Zhao*, Quanlu Zhang, Zhi Yang*, Ming Wu, Yafei Dai*
*Peking University, Microsoft Research
Replication for Fault Tolerance
Replication in the Wide Area
- Reducing wide-area latency for clients
[Diagram: clients reach a nearby replica in 20 ms instead of a distant one in 150 ms]
Keeping the Replicated State Consistent
[Diagram: one replica stores “Having fun at SoCC!” while another stores “Having fun at OSDI!”: the replicas are inconsistent!]
State Machine Replication (SMR)
Execute the same sequence of commands in the same order
[Diagram: every replica applies A = 1, A = 2, A = 3 in order and ends in the same state, A = 3]
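The determinism argument above can be sketched in a few lines (names like `apply_log` are mine, not from the talk): replicas that apply the same command log in the same order necessarily reach the same state.

```python
# Illustrative sketch: a replica is a state machine that applies a log in order.
def apply_log(commands):
    state = {}
    for key, value in commands:  # execute commands strictly in log order
        state[key] = value
    return state

log = [("A", 1), ("A", 2), ("A", 3)]
replicas = [apply_log(log) for _ in range(3)]  # three identical replicas
```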
Paxos
- A distributed agreement protocol
- Tolerates F failures given 2F+1 replicas
- Chooses a single command for each command slot using a Paxos instance
[Diagram: across three replicas, Paxos instances 1, 2, and 3 decide A = 1, A = 2, A = 3 for slots 1, 2, and 3]
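The per-slot safety property can be modeled as a write-once register (a sketch of the outcome only, not of the accept/ACK message exchange; the `Slot` class is illustrative):

```python
# Each Paxos instance decides one command for its slot; a decided slot
# never changes, no matter what is proposed later.
class Slot:
    def __init__(self):
        self.chosen = None
    def propose(self, cmd):
        if self.chosen is None:  # the first successful proposal wins the slot
            self.chosen = cmd
        return self.chosen       # later proposals learn the decided value

log = [Slot() for _ in range(3)]
first = log[0].propose("A = 1")
again = log[0].propose("A = 9")  # a conflicting later proposal
```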
Centralized SMR
- Liveness property of Paxos: there should not be multiple replicas proposing commands in the same instance simultaneously
[Diagram: three replicas propose A = 1, A = 2, A = 3 into the same instance and conflict]
- Solution: a single stable leader proposes all commands
[Diagram: the stable leader alone proposes A = 1, A = 2, A = 3, so there are no conflicts]
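A tiny sketch of why a stable leader restores liveness (the `Leader` class is illustrative): with a single proposer, no two commands ever contend for the same instance.

```python
# Only the leader hands out slots, so slot assignments never collide.
class Leader:
    def __init__(self):
        self.next_slot = 0
    def propose(self, cmd):
        slot = self.next_slot
        self.next_slot += 1
        return slot, cmd

leader = Leader()
assignments = [leader.propose(c) for c in ("A = 1", "A = 2", "A = 3")]
```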
Drawbacks of Centralized SMR
- Potential performance bottleneck
- Low throughput: every command funnels through the single leader
- High wide-area latency: a client far from the leader pays 200 ms instead of 20 ms to a nearby replica

Centralized SMR offers limited performance; can decentralized SMR deliver high performance?
Decentralizing SMR
[Diagram: replicas R0, R1, R2 each propose their own commands A = 0, A = 1, A = 2]
Replicas should propose commands in different command slots. How to order them?
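One simple answer, sketched below, is the static, Mencius-style scheme (the function name is mine): replica i pre-owns slots i, i + N, i + 2N, ..., so proposals from different replicas can never collide.

```python
# Round-robin slot ownership: each replica proposes only into its own slots.
def owned_slots(replica_id, n_replicas, count):
    return [replica_id + k * n_replicas for k in range(count)]

# Three replicas, four slots each: the slot sets partition 0..11.
slots = [set(owned_slots(r, 3, 4)) for r in range(3)]
```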
Static Ordering
- The system runs at the speed of the slowest replica
[Diagram: A = 1 and A = 3 are committed, but execution is blocked waiting for the straggler to fill the slot of A = 2]
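The straggler problem can be made concrete with a sketch (names are mine): commands execute strictly in slot order, so one empty slot blocks every later slot even if those commands are already replicated.

```python
# Execution advances only through the contiguous filled prefix of the log.
def executable_prefix(slots):
    prefix = []
    for cmd in slots:
        if cmd is None:   # the straggler has not filled its slot yet
            break
        prefix.append(cmd)
    return prefix

# R0 and R2 have committed, but the straggler R1's slot is still empty:
runnable = executable_prefix(["A = 1", None, "A = 3"])
```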
Dependency-based Ordering
- Ordering overhead under contention
[Diagram: conflicting commands A = 1, A = 2, A = 3 record dependencies on one another and must be sorted before execution]
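A sketch of where the overhead comes from in dependency-based (EPaxos-style) ordering (the helper is illustrative): each new command must record every earlier conflicting command as a dependency, so under contention the dependency sets keep growing.

```python
# A command on a key depends on every earlier command touching that key.
def record_deps(history, key):
    deps = [i for i, k in enumerate(history) if k == key]  # conflicts so far
    history.append(key)
    return deps

history = []
# 100% contention: four consecutive commands all touch key "A".
dep_sizes = [len(record_deps(history, "A")) for _ in range(4)]
```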
Drawbacks of Decentralized SMR
- Extra coordination for ordering => performance degradation
- Lower throughput
- Higher latency

Centralized SMR has limited performance; decentralized SMR has poor performance stability.

Semi-Decentralized SMR
SDPaxos: high performance and strong performance stability
SDPaxos Intuition
[Diagram: replicas R0, R1, R2 each replicate their own commands A = 0, A = 1, A = 2; separately, an ordering step arranges them into the global sequence R0, R1, R2]
Centralizing Ordering
[Diagram: a replica tells the sequencer “I want to propose a command”; the sequencer (one of R0, R1, R2) assigns it a position in the global order]
- Dynamic leadership establishment (stragglers won’t block others)
- All commands are serialized (no conflicts)
- Ordering is more lightweight than replicating commands
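The division of labor can be sketched as follows (class and field names are illustrative, not the paper's): each replica replicates its own commands, while the sequencer only decides which replica fills each global slot, never the command payload itself.

```python
# The sequencer's log records replica ids, not commands.
class Sequencer:
    def __init__(self):
        self.order = []                 # global slot i -> proposing replica
    def assign(self, replica_id):
        self.order.append(replica_id)
        return len(self.order) - 1      # the global slot just assigned

def merge(order, queues):
    # Replay: the i-th assignment says whose next command fills slot i.
    cursors = {r: 0 for r in queues}
    log = []
    for r in order:
        log.append(queues[r][cursors[r]])
        cursors[r] += 1
    return log

seq = Sequencer()
for proposer in ("R0", "R1", "R0"):     # proposal requests reach the sequencer
    seq.assign(proposer)
log = merge(seq.order, {"R0": ["A = 0", "A = 2"], "R1": ["A = 1"]})
```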
SDPaxos: The Basic Protocol
[Diagram: R0 proposes; R2 is the sequencer. On a client request for command A, R0 sends C-accept(A) to replicate A without an execution order and gathers C-ACK(A)s; the sequencer sends O-accept(R0) to assign A to the next slot and gathers O-ACK(R0)s]
1.5 round trips
Reducing Latency for 3 Replicas
[Diagram: same flow as the basic protocol, but R0 and R2 (the sequencer) together have already constituted a majority, so the extra half round of O-ACKs is unnecessary]
1 round trip
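The majority count behind this shortcut can be checked in two lines: with 3 replicas, the proposer plus the sequencer already form a majority, so once both hold the command and its order, no further round is needed; with 5 replicas two nodes are not enough.

```python
# Majority size for a replica group of n.
def majority(n_replicas):
    return n_replicas // 2 + 1

three_ok = 2 >= majority(3)   # proposer + sequencer suffice with 3 replicas
five_ok = 2 >= majority(5)    # but not with 5: those two nodes may both fail
```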
Reducing Latency for 5 Replicas
[Diagram: R0 proposes; R2 is the sequencer among R0–R4. With the 3-replica shortcut, the assignment O-accept(R0) is known only to R0 and R2, so it can be lost if R0 and R2 fail]
[Diagram: instead, the sequencer combines messages as C-accept & O-accept, acknowledged by C-ACK & O-ACK]
Assignments for the sequencer can be seen by a majority in just one round trip
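A sketch of the majority argument for the combined messages (node names are from the slide, the counting is mine): after one combined broadcast, the sequencer plus any two acknowledging peers means 3 of 5 nodes know both the command and its order assignment.

```python
def majority(n_replicas):
    return n_replicas // 2 + 1

# One combined round: the sequencer R2 sends C-accept & O-accept together.
ackers = {"R0", "R1"}              # the first two C-ACK & O-ACK arrivals
know_both = ackers | {"R2"}        # plus the sequencer itself
safe = len(know_both) >= majority(5)
```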
Handling Failures for 5 Replicas
[Diagram: failure and recovery scenarios among replicas R0–R4 with R0 as sequencer; the detailed approach is in the paper]
More Details in the Paper
- The detailed protocol and fault tolerance approach
- Reads bypassing Paxos
- Leveraging the centralized ordering to perform fast and safe reads
- Performance optimizations
- Lightening the load of ordering
- Straggler detection
- …
Experimental Setup
- Baselines
- Multi-Paxos
- Mencius
- EPaxos
- Workload: a replicated key-value store
- Testbed: Amazon EC2 m4.large instances
- Wide-area experiments: CA, OR, OH, IRE, SEL
Performance Stability against Stragglers
[Chart: throughput (ops/sec, 20,000–120,000 axis) of Multi-Paxos, Mencius, SDPaxos-N, and SDPaxos-S with a straggler; annotations: 67.2%, 47.7%, 20.0%, 28.2%, 1.6x]
Performance Stability against Contention
[Chart: throughput (ops/sec, 30,000–75,000 axis) vs. contention rate (0%–100%) for EPaxos-3, EPaxos-5, SDPaxos-3, and SDPaxos-5; annotation: 1.35x]
Wide-area Latency
- SDPaxos achieves the optimal number of round trips
- SDPaxos’s latency depends on the distance to the sequencer (IRE)
- SDPaxos’s latency is not impacted by stragglers or contention
[Chart: latency (ms) per site]
Conclusion
- The first semi-decentralized SMR protocol
- High performance
- Strong performance stability
- One round trip under realistic configurations tolerating one or two failures
- High throughput and low latency with stragglers, under contention, or in ideal cases
Q & A