SLIDE 1

SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines

Hanyu Zhao*, Quanlu Zhang†, Zhi Yang*, Ming Wu†, Yafei Dai*

* Peking University † Microsoft Research

SLIDE 2

Peking University, Microsoft Research

Replication for Fault Tolerance

SLIDE 3

Replication in the Wide Area

  • Reducing wide-area latency for clients

[Figure: latency labels 20 ms and 150 ms]

SLIDE 4

Keeping the Replicated State Consistent

“Having fun at SoCC!” “Having fun at OSDI!”

Inconsistent!

SLIDE 5

State Machine Replication (SMR)

Execute the same sequence of commands in the same order

[Figure: three replicas each apply A = 1, A = 2, A = 3 in that order, and each ends with A = 3]
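A minimal sketch of the idea on this slide, assuming commands are simple key-value assignments. The `apply_log` helper and the tuple encoding are illustrative and not taken from the talk. Replicas that apply the same log in the same order reach the same state, while a different order can diverge.

```python
def apply_log(log):
    """Apply (key, value) assignment commands in order and return the final state."""
    state = {}
    for key, value in log:
        state[key] = value
    return state


# The command sequence from the slide: A = 1, A = 2, A = 3.
log = [("A", 1), ("A", 2), ("A", 3)]

# Three replicas executing the same sequence in the same order agree.
replica_states = [apply_log(log) for _ in range(3)]

# A replica that executes the same commands in another order diverges.
reordered_state = apply_log([("A", 3), ("A", 1), ("A", 2)])
```

Here every entry of `replica_states` ends with A = 3, whereas `reordered_state` ends with A = 2, which is the inconsistency SMR is meant to rule out.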

SLIDE 6

Paxos

  • A distributed agreement protocol
  • Tolerates F failures given 2F+1 replicas
  • Chooses a single command for each command slot using a Paxos instance

[Figure: Paxos instance 1 chooses A = 1 on all three replicas]

SLIDE 7

Paxos

  • A distributed agreement protocol
  • Tolerates F failures given 2F+1 replicas
  • Chooses a single command for each command slot using a Paxos instance

[Figure: Paxos instance 2 chooses A = 2 for the second slot on all three replicas]

SLIDE 8

Paxos

  • A distributed agreement protocol
  • Tolerates F failures given 2F+1 replicas
  • Chooses a single command for each command slot using a Paxos instance

[Figure: Paxos instance 3 chooses A = 3 for the third slot on all three replicas]
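The "F failures given 2F+1 replicas" bullet can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names. It relies only on the standard fact that Paxos decisions need a majority quorum and that any two majorities overlap.

```python
from itertools import combinations


def replicas_needed(f):
    """Replicas required to tolerate f failures (2F+1)."""
    return 2 * f + 1


def majority(n):
    """Smallest quorum size such that any two quorums of n replicas overlap."""
    return n // 2 + 1


def majorities_intersect(n):
    """Check by enumeration that every pair of majority quorums shares a replica."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a, b in combinations(quorums, 2))
```

For example, `replicas_needed(2)` is 5 and `majority(5)` is 3, so a five-replica group keeps a quorum after two failures.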

SLIDE 9

Centralized SMR

  • Liveness property of Paxos:
  • There should not be multiple replicas proposing commands in the same instance simultaneously

[Figure: replicas propose A = 1, A = 2, and A = 3 in the same instance. Conflict!]

SLIDE 10

Centralized SMR

  • Liveness property of Paxos:
  • There should not be multiple replicas proposing commands in the same instance simultaneously

[Figure: a stable leader proposes A = 1, A = 2, and A = 3]

SLIDE 11

Drawbacks of Centralized SMR

  • Potential performance bottleneck
  • Low throughput

SLIDE 12

Drawbacks of Centralized SMR

  • Potential performance bottleneck
  • Low throughput
  • High wide-area latency

[Figure: latency labels 20 ms and 200 ms]

SLIDE 13

Drawbacks of Centralized SMR

  • Potential performance bottleneck
  • Low throughput
  • High wide-area latency

Centralized SMR: limited performance

SLIDE 14

Drawbacks of Centralized SMR

  • Potential performance bottleneck
  • Low throughput
  • High wide-area latency

Centralized SMR: limited performance
Decentralized SMR: high performance?

SLIDE 15

Decentralizing SMR

Replicas should propose commands in different command slots. How to order them?

[Figure: replicas R0, R1, R2 with command A = 0]

SLIDE 16

Decentralizing SMR

Replicas should propose commands in different command slots. How to order them?

[Figure: replicas R0, R1, R2 with commands A = 0 and A = 1 in separate slots]

SLIDE 17

Decentralizing SMR

Replicas should propose commands in different command slots. How to order them?

[Figure: replicas R0, R1, R2 with commands A = 0, A = 1, and A = 2 in separate slots]

SLIDE 18

Static Ordering

  • The system runs at the speed of the slowest one

[Figure: commands A = 1, A = 2, A = 3; a straggler leaves the other replicas blocked]
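To see why static ordering runs at the speed of the slowest replica, here is a small sketch. It assumes slots are pre-assigned round-robin (slot s belongs to replica s mod N) and that execution must follow slot order. Both the assignment rule and the helper names are assumptions made for illustration; the slide does not spell them out.

```python
def slot_owner(slot, num_replicas):
    """Static round-robin assignment: slot s belongs to replica s mod N (assumed)."""
    return slot % num_replicas


def executable_prefix(filled_slots, num_slots):
    """Slots that can execute: the longest prefix with no unfilled slot."""
    prefix = []
    for slot in range(num_slots):
        if slot not in filled_slots:
            break  # a gap blocks every later slot
        prefix.append(slot)
    return prefix


NUM_REPLICAS = 3
STRAGGLER = 2  # hypothetical slow replica R2

# R0 and R1 have each filled two of their slots; R2 has filled none.
filled = {s for s in range(6) if slot_owner(s, NUM_REPLICAS) != STRAGGLER}
runnable = executable_prefix(filled, 6)
```

Slots 3 and 4 are filled, but `runnable` stops at slot 1 because slot 2 belongs to the straggler.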

SLIDE 19

Dependency-based Ordering

  • Ordering overhead under contention

[Figure: concurrent commands A = 1, A = 2, and A = 3 on the same object]

SLIDE 20

Dependency-based Ordering

  • Ordering overhead under contention

[Figure: commands A = 1, A = 3, A = 2]
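A rough sketch of why contention adds ordering work, assuming two commands conflict when they touch the same key and that each command depends on the conflicting commands seen before it. This is a single-observer simplification with hypothetical names. Protocols such as EPaxos gather dependencies across replicas and must also handle cycles, which this sketch does not attempt.

```python
from graphlib import TopologicalSorter


def dependencies(commands):
    """Map each command id to the earlier commands that touch the same key."""
    deps = {}
    seen_by_key = {}
    for cmd_id, key in commands:
        deps[cmd_id] = set(seen_by_key.get(key, ()))
        seen_by_key.setdefault(key, []).append(cmd_id)
    return deps


def dependency_count(deps):
    """Total number of dependency edges that must be tracked and agreed on."""
    return sum(len(d) for d in deps.values())


# No contention: every command touches a different key.
independent = dependencies([("c1", "A"), ("c2", "B"), ("c3", "C")])

# Full contention: every command touches key A.
contended = dependencies([("c1", "A"), ("c2", "A"), ("c3", "A")])

# Execution order must respect the dependency graph.
execution_order = list(TopologicalSorter(contended).static_order())
```

With independent commands there are no edges to track. With three commands on the same key there are three, and the execution order is forced.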

SLIDE 21

Drawbacks of Decentralized SMR

  • Extra coordination for ordering => performance degradation
  • Lower throughput
  • Higher latency

Centralized SMR: limited performance
Decentralized SMR: poor performance stability

SLIDE 22

Drawbacks of Decentralized SMR

  • Extra coordination for ordering => performance degradation
  • Lower throughput
  • Higher latency

Semi-Decentralized SMR

SDPaxos: high performance, strong performance stability

SLIDE 23

SDPaxos Intuition

[Figure: replicas R0, R1, R2 with commands A = 0, A = 1, and A = 2 in separate slots]

SLIDE 24

SDPaxos Intuition

[Figure: replicas R0, R1, R2 with commands A = 0, A = 1, and A = 2; a separate ordering of R0, R1, R2 yields the sequence A = 0, A = 1, A = 2]

SLIDE 25

Centralizing Ordering

  • Dynamic leadership establishment (stragglers won’t block others)
  • All commands are serialized (no conflicts)
  • Ordering is more lightweight than replicating

[Figure: R0, R1, and R2, with one replica acting as the sequencer; a replica tells it “I want to propose a command”]
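One way to picture the split between decentralized replication and centralized ordering is sketched below. It assumes the sequencer's log holds only replica IDs, one per global slot, and that each replica keeps its own list of replicated commands. The data layout and the `global_order` name are assumptions for illustration, not the paper's data structures.

```python
from collections import defaultdict


def global_order(ordering_log, command_logs):
    """Merge per-replica command logs into one sequence.

    ordering_log: replica IDs chosen by the sequencer, one per global slot.
    command_logs: replica ID -> that replica's commands in proposal order.
    """
    next_index = defaultdict(int)
    order = []
    for replica in ordering_log:
        log = command_logs.get(replica, [])
        i = next_index[replica]
        if i >= len(log):
            break  # the command for this slot has not been replicated yet
        order.append(log[i])
        next_index[replica] += 1
    return order


command_logs = {"R0": ["A = 0"], "R1": ["A = 1"], "R2": ["A = 2"]}
serialized = global_order(["R0", "R1", "R2"], command_logs)

# A slot assigned to R1 whose command has not arrived stops execution there.
partial = global_order(["R0", "R1", "R1"], command_logs)
```

Because each slot records only whose command comes next, the ordering entries stay small regardless of command size, which matches the "more lightweight than replicating" bullet.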

SLIDE 26

SDPaxos: The Basic Protocol

[Figure: message flow among R0, R1, and R2 (Sequencer)]

  • Client request for command A
  • C-accept (A), C-ACK (A): replicating A to others w/o execution order
  • O-accept (R0), O-ACK (R0): assigning A to the next slot

1.5 round trips

SLIDE 27

Reducing Latency for 3 Replicas

[Figure: message flow among R0, R1, and R2 (Sequencer)]

  • Client request for command A
  • C-accept (A), C-ACK (A): replicating A to others w/o execution order
  • O-accept (R0), O-ACK (R0): assigning A to the next slot

R0 and R2 have constituted a majority

SLIDE 28

Reducing Latency for 3 Replicas

[Figure: message flow among R0, R1, and R2 (Sequencer)]

  • Client request for command A
  • C-accept (A), C-ACK (A): replicating A to others w/o execution order
  • O-accept (R0), O-ACK (R0): assigning A to the next slot

1 round trip

R0 and R2 have constituted a majority
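The 1.5 versus 1 round-trip figures can be reproduced by counting one-way message hops on the critical path at R0. This is a reading of the message diagram under the simplifying assumption that every hop takes the same time, which does not hold in a real wide-area deployment. The function and constant names are hypothetical.

```python
HOPS_PER_ROUND_TRIP = 2


def commit_round_trips(replication_hops, ordering_hops):
    """A command is ready once both paths finish, so the slower one dominates."""
    return max(replication_hops, ordering_hops) / HOPS_PER_ROUND_TRIP


# Replication path at R0: C-accept out, C-ACK back.
REPLICATION_HOPS = 2

# Basic protocol: C-accept reaches the sequencer, O-accept goes out,
# and an O-ACK comes back.
basic = commit_round_trips(REPLICATION_HOPS, ordering_hops=3)

# Three replicas: R0 can stop at the sequencer's O-accept, because R0 and R2
# already form a majority.
three_replicas = commit_round_trips(REPLICATION_HOPS, ordering_hops=2)
```

Under these assumptions `basic` is 1.5 and `three_replicas` is 1.0, matching the numbers on the slides.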

SLIDE 29

Reducing Latency for 5 Replicas

[Figure: message flow among R0, R1, R2 (Sequencer), R3, and R4 with C-accept (A), C-ACK (A), and O-accept (R0)]

This assignment can be lost if R0 and R2 fail
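The difference between three and five replicas comes down to whether the replicas that know the slot assignment form a majority. The check below is a sketch with hypothetical helper names. The holder set {R0, R2} is taken from the slides.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def assignment_is_durable(num_replicas, holders):
    """True if the slot assignment is stored on a majority of replicas."""
    return len(set(holders)) >= majority(num_replicas)


def survives(holders, failed):
    """True if at least one replica holding the assignment is still alive."""
    return bool(set(holders) - set(failed))


holders = {"R0", "R2"}  # command leader plus sequencer

durable_with_three = assignment_is_durable(3, holders)
durable_with_five = assignment_is_durable(5, holders)
lost_after_failures = not survives(holders, failed={"R0", "R2"})
```

Two holders are a majority of three but not of five, and with five replicas two simultaneous failures are within the tolerated budget, so the assignment needs to reach more replicas before it counts as safe.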

SLIDE 30

Reducing Latency for 5 Replicas

[Figure: R0, R1, R2 (Sequencer), R3, and R4 exchange combined C-accept & O-accept and C-ACK & O-ACK messages]

Assignments for the sequencer can be seen by a majority in just one round trip

SLIDE 31

Handling Failures for 5 Replicas

[Figure: failure scenarios among R0 (Seq), R1, R2, R3, and R4]

SLIDE 32

Handling Failures for 5 Replicas

[Figure: failure scenarios among R0 (Seq), R1, R2, R3, and R4]

SLIDE 33

More Details in the Paper

  • The detailed protocol and fault tolerance approach
  • Reads bypassing Paxos
  • Leveraging the centralized ordering to perform fast and safe reads
  • Performance optimizations
  • Lightening the load of ordering
  • Straggler detection

SLIDE 34

Experimental Setup

  • Baselines
  • Multi-Paxos
  • Mencius
  • EPaxos
  • Workload: a replicated key-value store
  • Testbed: Amazon EC2 m4.large instances
  • Wide-area experiments: CA, OR, OH, IRE, SEL

SLIDE 35

Performance Stability against Stragglers

[Figure: bar chart of throughput (ops / sec) for Multi-Paxos, Mencius, SDPaxos-N, and SDPaxos-S; annotations: 67.2%, 47.7%, 20.0%, 28.2%, 1.6x]

SLIDE 36

Performance Stability against Contention

[Figure: throughput (ops / sec) versus contention rate (0%, 5%, 25%, 50%, 75%, 100%) for EPaxos-3, EPaxos-5, SDPaxos-3, and SDPaxos-5; annotation: 1.35x]

SLIDE 37

Wide-area Latency

  • SDPaxos achieves the optimal number of round trips
  • SDPaxos’s latency depends on the distance to the sequencer (IRE)
  • SDPaxos’s latency is not impacted by stragglers or contention

[Figure: latency (ms)]

SLIDE 38

Conclusion

  • The first semi-decentralized SMR protocol
  • High performance
  • Strong performance stability
  • One round trip under realistic configurations tolerating one or two failures
  • High throughput and low latency with stragglers, under contention, or in ideal cases

SLIDE 39

Q & A