Liberating distributed consensus, Heidi Howard @ Cambridge


slide-1
SLIDE 1

Liberating distributed consensus

Heidi Howard @ Cambridge University heidi.howard@cl.cam.ac.uk @heidiann360 heidihoward.co.uk

slide-2
SLIDE 2

Distributed Dream

Performance - scalability, low latency, high throughput, low cost, energy efficiency, versatility, adaptability
Reliability - fault-tolerance, dependability, high availability, self-healing, geo-replicated
Correctness - consistency, bug-free, easy to understand

2

slide-3
SLIDE 3

3

[JACM’85] [PODC’89]

slide-4
SLIDE 4

4

[CSUR’16]

slide-5
SLIDE 5

Deciding a single value

In this talk, we will reach agreement over a single value.

The system consists of:

  • servers which store the value
  • clients which propose values and learn the decided value

5

slide-6
SLIDE 6

This is not a blockchain talk

6

slide-7
SLIDE 7

Requirements of consensus

Safety - All clients must learn the same decided value
Progress - Eventually, all clients must learn the decided value

7

slide-8
SLIDE 8

Requirements of consensus

Safety - All clients must learn the same decided value
Progress - Eventually, all clients must learn the decided value

7

Safety must hold even in unreliable and asynchronous systems

slide-9
SLIDE 9

8

[TOCS’98]

slide-10
SLIDE 10

Theory perspective

9

“The Paxos algorithm, when presented in plain English, is very simple.”
“The Paxos algorithm … is among the simplest and most obvious of distributed algorithms”
“… this consensus algorithm follows almost unavoidably from the properties we want it to satisfy.”
Leslie Lamport, Paxos Made Simple

slide-11
SLIDE 11

Engineering perspective

10

“Paxos is exceptionally difficult to understand… few people succeed in understanding it, and only with great effort. …”
“… we found few people who were comfortable with Paxos, even among seasoned researchers.”
“We concluded that Paxos does not provide a good foundation either for system building or for education.”
Diego Ongaro and John Ousterhout, In Search of an Understandable Consensus Algorithm

slide-12
SLIDE 12

Limitations of Paxos

11

slide-13
SLIDE 13

Limitations of Paxos

11

Paxos is subtle

slide-14
SLIDE 14

Limitations of Paxos

11

Paxos is subtle Paxos is slow

slide-15
SLIDE 15

Back to basics

12

slide-16
SLIDE 16

Back to basics

12

Immutability

slide-17
SLIDE 17

Back to basics

12

Immutability Generality

slide-18
SLIDE 18

Today’s Talk

13

slide-19
SLIDE 19

Today’s Talk

13

Part 1 We reframe the problem of distributed consensus.

slide-20
SLIDE 20

Today’s Talk

13

Part 1 We reframe the problem of distributed consensus. Part 2 We generalise the Paxos algorithm.

slide-21
SLIDE 21

Today’s Talk

13

Part 1 We reframe the problem of distributed consensus. Part 2 We generalise the Paxos algorithm. Part 3 We introduce the All aboard algorithm.

slide-22
SLIDE 22

Part 1 Distributed consensus using write-once registers

14

slide-23
SLIDE 23

S0

Single server

15

slide-24
SLIDE 24

S0

Single server

15

C0

slide-25
SLIDE 25

S0

Single server

15

C0

PROPOSE(A)

slide-26
SLIDE 26

S0

Single server

15

C0

PROPOSE(A)

A

slide-27
SLIDE 27

S0

Single server

15

C0

PROPOSE(A) ACCEPTED(A)

A

slide-28
SLIDE 28

S0

Single server

15

C0

PROPOSE(A) ACCEPTED(A)

C1

A

slide-29
SLIDE 29

S0

Single server

15

C0

PROPOSE(A) ACCEPTED(A)

C1

PROPOSE(B)

A

slide-30
SLIDE 30

S0

Single server

15

C0

PROPOSE(A) ACCEPTED(A)

C1

PROPOSE(B) ACCEPTED(A)

A
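The single-server exchange above can be sketched as a write-once register. This is a minimal illustration, not code from the talk; the class name `WriteOnceServer` and the use of `None` for "unwritten" are assumptions made for the example.

```python
class WriteOnceServer:
    """A single server holding one write-once register (hypothetical name)."""

    def __init__(self):
        self.value = None  # None means the register is still unwritten

    def propose(self, value):
        # Only the first proposal is written; later proposals leave the register unchanged.
        if self.value is None:
            self.value = value
        # Reply with whatever the register holds, mirroring ACCEPTED(...) on the slides.
        return self.value


s0 = WriteOnceServer()
first = s0.propose("A")   # C0 proposes A and learns A
second = s0.propose("B")  # C1 proposes B but learns A
```

Because the register never changes once written, every client that contacts S0 learns the same value, which is the safety property in miniature.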

slide-31
SLIDE 31

Multiple servers

16

S0 S1 S2

slide-32
SLIDE 32

Multiple servers

16

C0 S0 S1 S2

slide-33
SLIDE 33

Multiple servers

16

C0 S0 S1 S2

PROPOSE(A) PROPOSE(A) PROPOSE(A)

slide-34
SLIDE 34

Multiple servers

16

C0 S0 S1 S2

PROPOSE(A)

A A A

PROPOSE(A) PROPOSE(A)

slide-35
SLIDE 35

Multiple servers

17

C0 S0 S1 S2

ACCEPTED(A)

A A A

ACCEPTED(A)

slide-36
SLIDE 36

Multiple servers

18

S0 S1 S2

A A A

slide-37
SLIDE 37

Multiple servers

18

S0 S1 S2

A A A

C1

slide-38
SLIDE 38

Multiple servers

18

S0 S1 S2

A A A

C1

PROPOSE(B) PROPOSE(B) PROPOSE(B)

slide-39
SLIDE 39

Multiple servers

19

S0 S1 S2

A A A

C1

slide-40
SLIDE 40

Multiple servers

19

S0 S1 S2

A A A

C1

ACCEPTED(A) ACCEPTED(A)

slide-41
SLIDE 41

Split Votes

20

S0 S1 S2

A B C

C0 C1 C2

slide-42
SLIDE 42

Multiple write-once registers

21

C0 C1 S0

A

C2

S1

B …

S2

C … A A

  • S0

A

slide-43
SLIDE 43

Example state table

22

slide-44
SLIDE 44

Example state table

22

S0 S1 S2 R0 A B C R1 A A

  • R2

A

slide-45
SLIDE 45

Example state table

22

S0 S1 S2 R0 A B C R1 A A

  • R2

A

  • Server

Epochs Register sets Servers

slide-46
SLIDE 46

Making decisions

23

slide-47
SLIDE 47

Making decisions

23

A value is decided when it has been written to the same register on a subset of servers, known as a quorum.
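As a rough sketch of this definition, the function below scans a state table and a quorum table and reports every quorum whose servers all hold the same value in the same register. The data layout (nested dicts, `None` for an unwritten or nil register) and the name `decided_values` are assumptions for illustration, and the sample state is made up rather than taken from a slide.

```python
def decided_values(state, quorums):
    """Return {(register, quorum): value} for each quorum that has decided.

    state:   dict register -> dict server -> value (missing or None = unwritten / nil)
    quorums: dict register -> list of frozensets of servers
    """
    decisions = {}
    for reg, register_quorums in quorums.items():
        row = state.get(reg, {})
        for quorum in register_quorums:
            values = {row.get(server) for server in quorum}
            # Decided only if every server in the quorum holds the same non-empty value.
            if len(values) == 1 and None not in values:
                decisions[(reg, quorum)] = next(iter(values))
    return decisions


majorities = [frozenset({"S0", "S1"}), frozenset({"S1", "S2"}), frozenset({"S0", "S2"})]
state = {
    0: {"S0": "A", "S1": "B", "S2": "C"},  # split vote: nothing decided in R0
    1: {"S0": "A", "S1": "A"},             # S2 has not written R1
}
result = decided_values(state, {0: majorities, 1: majorities})
```

Here only the quorum {S0,S1} of register 1 has decided, and it decided A.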

slide-48
SLIDE 48

Example quorum table

24

Register   Quorums
R0         {S0,S1}
R1         {S2,S3}
R2+        {S0,S1} {S2,S3}

slide-49
SLIDE 49

Example decision table

25

Register   Quorum     Decision?
R0         {S0,S1}    No
R1         {S2,S3}    Yes A
R2         {S0,S1}    Any
           {S2,S3}    Maybe A

slide-50
SLIDE 50

Example decision table

25

Register   Quorum     Decision?   Meaning
R0         {S0,S1}    No          No decision can be made by this quorum
R1         {S2,S3}    Yes A
R2         {S0,S1}    Any
           {S2,S3}    Maybe A

slide-51
SLIDE 51

Example decision table

25

Register   Quorum     Decision?   Meaning
R0         {S0,S1}    No          No decision can be made by this quorum
R1         {S2,S3}    Yes A       This quorum decided A
R2         {S0,S1}    Any
           {S2,S3}    Maybe A

slide-52
SLIDE 52

Example decision table

25

Register   Quorum     Decision?   Meaning
R0         {S0,S1}    No          No decision can be made by this quorum
R1         {S2,S3}    Yes A       This quorum decided A
R2         {S0,S1}    Any         This quorum can decide any value
           {S2,S3}    Maybe A

slide-53
SLIDE 53

Example decision table

25

Register   Quorum     Decision?   Meaning
R0         {S0,S1}    No          No decision can be made by this quorum
R1         {S2,S3}    Yes A       This quorum decided A
R2         {S0,S1}    Any         This quorum can decide any value
           {S2,S3}    Maybe A     If a decision is made by this quorum, it will decide A

slide-54
SLIDE 54

Putting it all together

26

slide-55
SLIDE 55

Putting it all together

26

Quorums R0+ {S0,S1} {S1,S2} {S0,S2}

slide-56
SLIDE 56

Putting it all together

26

S0 S1 S2 R0

  • A

A R1

  • A

Quorums R0+ {S0,S1} {S1,S2} {S0,S2}

slide-57
SLIDE 57

Putting it all together

26

S0 S1 S2 R0

  • A

A R1

  • A

Register   Quorums
R0+        {S0,S1} {S1,S2} {S0,S2}

Register   Quorum     Decision?
R0         {S0,S1}    No
           {S0,S2}    No
           {S1,S2}    Yes A
R1         {S0,S1}    No
           {S0,S2}    No
           {S1,S2}    Maybe A

slide-58
SLIDE 58

Putting it all together

27

slide-59
SLIDE 59

Putting it all together

27

Quorums R0 {S0,S1,S2,S3} R1+ {S0,S1} {S2,S3}

slide-60
SLIDE 60

Putting it all together

27

S0 S1 S2 S3 R0 B B A R1

  • A

A R2 A A

Register   Quorums
R0         {S0,S1,S2,S3}
R1+        {S0,S1} {S2,S3}

slide-61
SLIDE 61

Putting it all together

27

S0 S1 S2 S3 R0 B B A R1

  • A

A R2 A A

Register   Quorums
R0         {S0,S1,S2,S3}
R1+        {S0,S1} {S2,S3}

Register   Quorum           Decision?
R0         {S0,S1,S2,S3}    No
R1         {S0,S1}          No
           {S2,S3}          Yes A
R2         {S0,S1}          Yes A
           {S2,S3}          Any

slide-62
SLIDE 62

We can decide multiple values

28

slide-63
SLIDE 63

We can decide multiple values

28

S0 S1 S2 S3 R0

  • A

A R1 C C A A

Register   Quorums
R0         {S0,S1,S2,S3}
R1+        {S0,S1} {S2,S3}

Register   Quorum           Decision?
R0         {S0,S1,S2,S3}    No
R1         {S0,S1}          Yes C
           {S2,S3}          Yes A

slide-64
SLIDE 64

We can decide multiple values

29

      S0   S1   S2
R0    C    A    A
R1    B    B    A

Register   Quorums
R0+        {S0,S1} {S1,S2} {S0,S2}

Register   Quorum     Decision?
R0         {S0,S1}    No
           {S0,S2}    No
           {S1,S2}    Yes A
R1         {S0,S1}    Yes B
           {S0,S2}    No
           {S1,S2}    No

slide-65
SLIDE 65

Safety

30

Before a client writes a value to register i it must ensure that no other values could be decided in register sets 0 to i.

slide-66
SLIDE 66

Part 2 Generalising Paxos

31

slide-67
SLIDE 67

Safety

32

Before a client writes a value to register i it must ensure that:

  1. No other values could be decided in register set i
  2. No other values could be decided in register sets 0 to i-1

slide-68
SLIDE 68

Register allocation rule

We allocate registers to clients round robin and require clients to write at most one value to each of their allocated registers.

Client   Registers
C0       R0, R3, …
C1       R1, R4, …
C2       R2, R5, …
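The round-robin allocation in the table can be written down directly. This is a small sketch; the function names `allocated_registers` and `owner` are made up for the example.

```python
from itertools import count, islice


def allocated_registers(client_index, num_clients):
    """Yield the register numbers allocated round robin to one client."""
    return count(client_index, num_clients)


def owner(register, num_clients):
    """Return the index of the only client allowed to write this register set."""
    return register % num_clients


first_three_for_c1 = list(islice(allocated_registers(1, 3), 3))  # R1, R4, R7
owner_of_r5 = owner(5, 3)                                        # C2
```

Since each register set has exactly one owner and the owner writes at most one value to it, two different values can never compete within the same register set.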

slide-69
SLIDE 69

Safety

Before a client writes a value to register i it must ensure that:

  1. No other values could be decided in register set i
  2. No other values could be decided in register sets 0 to i-1

34

Register allocation rule

slide-70
SLIDE 70

Value selection rule

We require clients to read one register from each quorum of register sets 0 to i-1 and ensure that:

  1. All of the registers are written, and
  2. If any registers contain values, the client must write the value from the greatest register.
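One way to read the value selection rule is sketched below. It assumes the caller has already read one register from each quorum of register sets 0 to i-1; the `NIL` marker, the `reads` layout, and returning `None` to mean "must not write yet" are all choices made for this illustration.

```python
NIL = "nil"  # assumed marker for a register written with nil; None means unwritten


def select_value(reads, own_value):
    """Pick the value a client may write, or None if it must not write yet.

    reads: dict register_number -> list of register contents that were read
    """
    # Rule 1: every register that was read must already be written.
    if any(c is None for contents in reads.values() for c in contents):
        return None
    # Rule 2: adopt the value from the greatest register that holds a value.
    for register in sorted(reads, reverse=True):
        for content in reads[register]:
            if content != NIL:
                return content
    # No register holds a value, so the client is free to write its own.
    return own_value


adopted = select_value({0: [NIL, "A"], 1: [NIL, NIL]}, "B")
free_choice = select_value({0: [NIL], 1: [NIL]}, "B")
blocked = select_value({0: [None]}, "B")
```

Under the register allocation rule all values within one register set are equal, so taking the first value found in the greatest register is enough.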

35

slide-71
SLIDE 71

Safety

Before a client writes a value to register i it must ensure that:

  1. No other values could be decided in register set i
  2. No other values could be decided in register sets 0 to i-1

36

Register allocation rule Value selection rule

slide-72
SLIDE 72

Classic Paxos

Paxos is a two phase consensus algorithm.

  • Phase one ensures the safety of phase two.
  • Phase two writes a value to the servers to achieve consensus.

37

slide-73
SLIDE 73

Classic Paxos

Paxos is a two phase consensus algorithm.

  • Phase one ensures the safety of phase two.
  • Phase two writes a value to the servers to achieve consensus.

37

Quorums R0+ {S0,S1} {S1,S2} {S0,S2}

slide-74
SLIDE 74

Classic Paxos - Phase one

38

slide-75
SLIDE 75

Classic Paxos - Phase one

  • The client chooses an allocated register set i and sends PREPARE(i) to all servers.

38

slide-76
SLIDE 76

Classic Paxos - Phase one

  • The client chooses an allocated register set i and sends PREPARE(i) to all servers.
  • Each server writes nil in any unwritten registers from 0 to i-1 and replies with the register number j and value w of the greatest non-nil register using PROMISED(i,j,w).

38

slide-77
SLIDE 77

Classic Paxos - Phase one

  • The client chooses an allocated register set i and sends PREPARE(i) to all servers.
  • Each server writes nil in any unwritten registers from 0 to i-1 and replies with the register number j and value w of the greatest non-nil register using PROMISED(i,j,w).
  • When PROMISED(i,…) has been received from a quorum of servers, the client chooses the value v from the greatest register or its own value if none exists.

38

slide-78
SLIDE 78

Classic Paxos - Phase two

39

slide-79
SLIDE 79

Classic Paxos - Phase two

  • The client sends PROPOSE(i,v) to all servers.

39

slide-80
SLIDE 80

Classic Paxos - Phase two

  • The client sends PROPOSE(i,v) to all servers.
  • Each server checks if register i is unwritten. If so, it writes the value v to register i and replies with ACCEPTED(i).

39

slide-81
SLIDE 81

Classic Paxos - Phase two

  • The client sends PROPOSE(i,v) to all servers.
  • Each server checks if register i is unwritten. If so, it writes the value v to register i and replies with ACCEPTED(i).
  • The client terminates when ACCEPTED(i) has been received from a quorum of servers.
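The two phases can be put together in a small in-memory sketch. It is a simplification under stated assumptions: messages are plain method calls, every server replies immediately, and the names `Server`, `run_client`, and `NIL` are invented for the example rather than taken from the talk.

```python
NIL = "nil"  # assumed marker for a register that was written with nil


class Server:
    def __init__(self):
        self.registers = {}  # register number -> value or NIL; absent = unwritten

    def prepare(self, i):
        # Phase one: write nil into every unwritten register below i.
        for j in range(i):
            self.registers.setdefault(j, NIL)
        # Report the greatest non-nil register, or (None, None) if there is none.
        written = [(j, v) for j, v in self.registers.items() if v != NIL]
        j, w = max(written, key=lambda jv: jv[0], default=(None, None))
        return ("PROMISED", i, j, w)

    def propose(self, i, v):
        # Phase two: write v only if register i is still unwritten.
        if i not in self.registers:
            self.registers[i] = v
            return ("ACCEPTED", i)
        return None


def run_client(i, own_value, servers, quorum_size):
    promises = [s.prepare(i) for s in servers]
    non_nil = [p for p in promises if p[2] is not None]
    # Adopt the value from the greatest register reported, else use our own.
    v = max(non_nil, key=lambda p: p[2])[3] if non_nil else own_value
    acks = [a for a in (s.propose(i, v) for s in servers) if a is not None]
    return v if len(acks) >= quorum_size else None


servers = [Server() for _ in range(3)]
learned_by_c1 = run_client(1, "A", servers, quorum_size=2)  # C1 uses register set 1
learned_by_c2 = run_client(2, "B", servers, quorum_size=2)  # C2 uses register set 2
```

C2 wanted B, but phase one reveals A in register 1, so C2 writes A to register 2 and both clients learn A.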

39

slide-82
SLIDE 82

Example - Phase one

40

S0 S1 S2 S0 S1 S2 R0 R1 R2 R3

slide-83
SLIDE 83

Example - Phase one

40

C1

PREPARE(R1)

S0 S1 S2 S0 S1 S2 R0 R1 R2 R3

slide-84
SLIDE 84

Example - Phase one

41

C1 S0 S1 S2 S0 S1 S2 R0

  • R1

R2 R3

slide-85
SLIDE 85

Example - Phase one

41

C1

PROMISED(R1)

S0 S1 S2 S0 S1 S2 R0

  • R1

R2 R3

PROMISED(R1)

slide-86
SLIDE 86

Example - Phase two

42

C1

PROPOSE(R1,A)

S0 S1 S2 S0 S1 S2 R0

  • R1

R2 R3

slide-87
SLIDE 87

Example - Phase two

43

C1 S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 R3

slide-88
SLIDE 88

Example - Phase two

43

C1

ACCEPTED(R1)

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 R3

slide-89
SLIDE 89

Example - Phase one

44

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 R3 C2

slide-90
SLIDE 90

Example - Phase one

44

PREPARE(R2)

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 R3 C2

slide-91
SLIDE 91

Example - Phase one

45

PROMISED(R2,R1,A)

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 R3 C2

PROMISED(R2,R1,A)

slide-92
SLIDE 92

Example - Phase two

46

PROPOSE(R2,A)

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 R3 C2

slide-93
SLIDE 93

Example - Phase two

47

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 A A A R3 C2

slide-94
SLIDE 94

Example - Phase two

47

ACCEPTED(R2)

S0 S1 S2 S0 S1 S2 R0

  • R1

A A A R2 A A A R3 C2

ACCEPTED(R2)

slide-95
SLIDE 95

Quorum intersection

48

slide-96
SLIDE 96

Quorum intersection

Original requirement
Paxos requires that a quorum of servers participate in each of its two phases and that any two quorums must intersect.

48

slide-97
SLIDE 97

Quorum intersection

Original requirement
Paxos requires that a quorum of servers participate in each of its two phases and that any two quorums must intersect.

Revised requirement
A client using register i must get at least one server from each quorum of registers 0 to i-1 to participate in phase one.
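The revised requirement amounts to a simple check on the set of servers that replied in phase one. The sketch below reuses the example quorum table from Part 1 (R0 {S0,S1}, R1 {S2,S3}, R2+ both); the function names are assumptions for illustration.

```python
def quorums_for(register):
    """Example quorum table: R0 {S0,S1}, R1 {S2,S3}, R2+ both quorums."""
    q01, q23 = frozenset({"S0", "S1"}), frozenset({"S2", "S3"})
    if register == 0:
        return [q01]
    if register == 1:
        return [q23]
    return [q01, q23]


def phase_one_ok(i, participants):
    """True if participants include a server from every quorum of registers 0..i-1."""
    return all(participants & q for j in range(i) for q in quorums_for(j))


ok_for_r1 = phase_one_ok(1, {"S0"})            # only the R0 quorum {S0,S1} matters
too_few_for_r2 = phase_one_ok(2, {"S0"})       # misses the R1 quorum {S2,S3}
ok_for_r2 = phase_one_ok(2, {"S0", "S3"})      # one server from each earlier quorum
```

Note that {S0,S3} is not itself a quorum of any register set, yet it is enough for phase one of register 2, which is how the revised requirement is weaker than the original one.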
slide-98
SLIDE 98

Part 3 All aboard consensus

49

slide-99
SLIDE 99

Current Reality

50

                                     Classic Paxos   Multi Paxos
Minimum round trips?                 2               1
Which client can decide the value?   Any             Leader only

slide-100
SLIDE 100

Current Reality

50

                                     Classic Paxos   Multi Paxos
Minimum round trips?                 2               1
Which client can decide the value?   Any             Leader only

Can we design an algorithm in which any client can achieve consensus in just 1 round trip?

slide-101
SLIDE 101

Designing for today

51

slide-102
SLIDE 102

Designing for today

  • 1. Failures are rare.

51

slide-103
SLIDE 103

Designing for today

  • 1. Failures are rare.
  • 2. Each host is a client and server.

51

slide-104
SLIDE 104

All aboard - Quorum table

52

Registers partitioned at R9

Register         Quorums
R0, R1, …, R9    {S0,S1,S2}                 All servers
R10+             {S0,S1} {S1,S2} {S0,S2}    Majority quorums

slide-105
SLIDE 105

All aboard - Algorithm

53

slide-106
SLIDE 106

All aboard - Algorithm

Fast path [R0 - R9]
Client executes phase one locally, followed by phase two with all servers.

slide-107
SLIDE 107

All aboard - Algorithm

Fast path [R0 - R9]
Client executes phase one locally, followed by phase two with all servers.

Slow path [R10 +]
Client executes classic Paxos with majority quorums for both phases.
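The All aboard quorum table can be expressed as a function of the register number. This is a sketch under assumptions: the function name, the `partition` parameter, and the choice to list only minimal majorities are made up for the example.

```python
from itertools import combinations


def all_aboard_quorums(register, servers, partition=9):
    """Quorums for a register set: all servers up to the partition, majorities after it."""
    if register <= partition:
        # Fast path: the only quorum is the full set of servers.
        return [frozenset(servers)]
    # Slow path: every minimal majority of the servers is a quorum.
    majority = len(servers) // 2 + 1
    return [frozenset(c) for c in combinations(sorted(servers), majority)]


servers = {"S0", "S1", "S2"}
fast = all_aboard_quorums(0, servers)
slow = all_aboard_quorums(10, servers)
```

This also suggests why phase one can run locally on the fast path: every quorum of registers R0 to R9 contains all servers, so a single server (the client's own host, given that each host is both client and server) already provides one member of each earlier quorum.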

slide-108
SLIDE 108

All aboard - Summary

54

slide-109
SLIDE 109

All aboard - Summary

Pros

  • Any client can terminate in just one round trip (provided all servers are up).

slide-110
SLIDE 110

All aboard - Summary

Pros

  • Any client can terminate in just one round trip (provided all servers are up).

Cons

  • The fast path has increased the quorum size from majority to all.
  • More round trips are needed if a server is slow/unavailable.

slide-111
SLIDE 111

Lessons learned

55

slide-112
SLIDE 112

Lessons learned

Immutability and generality can change our perspective on distributed consensus.

55

slide-113
SLIDE 113

Lessons learned

Immutability and generality can change our perspective on distributed consensus.
Paxos can relax its quorum intersection requirements. Utilising different quorum tables can produce different tradeoffs.

55

slide-114
SLIDE 114

Lessons learned

Immutability and generality can change our perspective on distributed consensus.
Paxos can relax its quorum intersection requirements. Utilising different quorum tables can produce different tradeoffs.
Paxos with majorities is a single point on a broad and diverse spectrum of consensus algorithms.

55

slide-115
SLIDE 115

This is just the beginning

Today, we focused on Paxos and its quorums. We can use these tools to do much more.
Learn more in our latest draft: A generalised solution to distributed consensus.

56

slide-116
SLIDE 116

Q & A

57

Heidi Howard heidi.howard@cl.cam.ac.uk @heidiann360 heidihoward.co.uk