RSM & Paxos Consensus Trilogy - Episode II Replicated State - - PowerPoint PPT Presentation



slide-1
SLIDE 1

RSM & Paxos

Consensus Trilogy - Episode II

slide-2
SLIDE 2

Replicated State Machine

slide-3
SLIDE 3

What is the problem?

slide-4
SLIDE 4

Fault Tolerance by Replication

KV-Store Client

slide-5
SLIDE 5

Fault Tolerance by Replication

KV-Store

set("s0", ...)

Client

slide-6
SLIDE 6

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...)

Client

slide-7
SLIDE 7

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

slide-9
SLIDE 9

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠

slide-10
SLIDE 10

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

slide-12
SLIDE 12

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠ get("s1")->...

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

slide-13
SLIDE 13

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠ get("s1")->...

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

  • Replica takes over on failure.

slide-14
SLIDE 14

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠ get("s1")->...

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

  • Replica takes over on failure.

  • Or in other scenarios.
slide-15
SLIDE 15

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠ get("s1")->...

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

  • Replica takes over on failure.

  • Or in other scenarios.
  • Challenge:
slide-16
SLIDE 16

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠ get("s1")->...

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

  • Replica takes over on failure.

  • Or in other scenarios.
  • Challenge:
  • Ensure replicas are equivalent.
slide-17
SLIDE 17

Fault Tolerance by Replication

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client

get("s1")->☠ get("s1")->...

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store

  • Replica takes over on failure.

  • Or in other scenarios.
  • Challenge:
  • Ensure replicas are equivalent.
  • Why?
slide-18
SLIDE 18

Replication Requirements

  • Replicas must have the same state/be equivalent.
slide-19
SLIDE 19

Replication Requirements

  • Replicas must have the same state/be equivalent.
  • A simple way to build this out
slide-20
SLIDE 20

Replication Requirements

  • Replicas must have the same state/be equivalent.
  • A simple way to build this out
  • Ensure software running at each replica is deterministic.
slide-21
SLIDE 21

Replication Requirements

  • Replicas must have the same state/be equivalent.
  • A simple way to build this out
  • Ensure software running at each replica is deterministic.
  • Ensure commands/operations are executed in the same order.
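The two requirements above can be illustrated with a minimal sketch. The `KVStore` class and `run` helper are hypothetical names, not part of the slides; the point is only that a deterministic state machine fed the same commands in the same order reaches the same state, while a different order may not.

```python
from dataclasses import dataclass, field


@dataclass
class KVStore:
    """A deterministic state machine: its state depends only on the commands applied."""
    data: dict = field(default_factory=dict)

    def apply(self, command):
        op, key, *rest = command
        if op == "set":
            self.data[key] = rest[0]
            return None
        if op == "get":
            return self.data.get(key)
        raise ValueError(f"unknown operation: {op}")


def run(log):
    """Start from the empty state and apply every command in log order."""
    replica = KVStore()
    for command in log:
        replica.apply(command)
    return replica.data


log = [("set", "s1", 25), ("set", "s1", 42), ("set", "s1", 1729)]

# Same commands, same order: the replicas end up equivalent.
same_order = run(log) == run(list(log))

# Same commands, different order: the replicas diverge.
diverged = run(log) != run(list(reversed(log)))
```

Anything non-deterministic inside `apply` (wall-clock time, random numbers, iteration over unordered data) would break the first property even with a shared order.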
slide-22
SLIDE 22

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
slide-23
SLIDE 23

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
slide-24
SLIDE 24

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
  • Depends on what you are running.
slide-25
SLIDE 25

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
  • Depends on what you are running.
  • What does it mean to be deterministic?
slide-26
SLIDE 26

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
  • Depends on what you are running.
  • What does it mean to be deterministic?
  • Depends on what you are running.
slide-27
SLIDE 27

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
  • Depends on what you are running.
  • What does it mean to be deterministic?
  • Depends on what you are running.
  • State machines are an abstraction over these details.
slide-28
SLIDE 28

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
  • Depends on what you are running.
  • What does it mean to be deterministic?
  • Depends on what you are running.
  • State machines are an abstraction over these details.
  • Think back to ADTs from linearizability.
slide-29
SLIDE 29

Ordering

slide-30
SLIDE 30

What is the Problem?

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store Client

set("s2", ...) set("s1", ...) get("s1")->...

Client

set("s1", ...) get("s0")->...

slide-31
SLIDE 31

What is the Problem?

KV-Store

set("s0", ...) set("s1", ...) get("s0")->...

Client KV-Store Client

set("s2", ...) set("s1", ...) get("s1")->...

Client

set("s1", ...) get("s0")->...

In what order should these commands be run?

slide-32
SLIDE 32

A Possible Solution

KV-Store Client KV-Store Client Client

slide-33
SLIDE 33

A Possible Solution

KV-Store Client KV-Store Client Client

set("s1", 25)

slide-34
SLIDE 34

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client Client

set("s1", 25)

slide-35
SLIDE 35

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25)

slide-36
SLIDE 36

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25) set("s1", 25)

slide-37
SLIDE 37

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25) set("s1", 25) set("s1", 42)

slide-38
SLIDE 38

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25) set("s1", 25) set("s1", 42) set("s1", 1729)

slide-39
SLIDE 39

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25) set("s1", 25) set("s1", 42) set("s1", 1729) set("s1", 1729)

slide-40
SLIDE 40

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25) set("s1", 25) set("s1", 42) set("s1", 1729) set("s1", 1729) set("s1", 42)

slide-47
SLIDE 47

A Possible Solution

KV-Store

set("s1", 42)

Client KV-Store Client

set("s1", 1729)

Client

set("s1", 25) set("s1", 42) set("s1", 25) set("s1", 1729) set("s1", 42) set("s1", 25) set("s1", 1729)
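One way to read this solution is as a single ordering oracle that stamps every command with a sequence number before replicas apply it. The sketch below assumes that reading; `Sequencer` and `Replica` are illustrative names, and the oracle here is a single process with no fault tolerance of its own.

```python
import itertools


class Sequencer:
    """Hypothetical ordering oracle: stamps each command with the next sequence number."""

    def __init__(self):
        self._next = itertools.count()

    def order(self, command):
        return next(self._next), command


class Replica:
    def __init__(self):
        self.data = {}
        self.pending = {}      # sequence number -> command, not yet applied
        self.next_seq = 0      # next sequence number this replica may apply

    def deliver(self, seq, command):
        """Buffer the command, then apply everything that is now in order."""
        self.pending[seq] = command
        while self.next_seq in self.pending:
            _, key, value = self.pending.pop(self.next_seq)
            self.data[key] = value
            self.next_seq += 1


oracle = Sequencer()
stamped = [oracle.order(("set", "s1", v)) for v in (25, 42, 1729)]

r1, r2 = Replica(), Replica()
for seq, cmd in stamped:             # messages arrive in order
    r1.deliver(seq, cmd)
for seq, cmd in reversed(stamped):   # messages arrive in reverse order
    r2.deliver(seq, cmd)
```

Both replicas finish with the same value for `s1` regardless of network arrival order, because only the oracle's numbering decides the order of application.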

slide-49
SLIDE 49

How to build fault-tolerant oracles?

slide-50
SLIDE 50

What Do We Need?

  • Agreement on operation order.
slide-51
SLIDE 51

What Do We Need?

  • Agreement on operation order.
  • Validity to ensure operations executed were actually issued.
slide-52
SLIDE 52

Consensus Protocols

  • Termination: All correct nodes eventually decide on a value to output.
slide-53
SLIDE 53

Consensus Protocols

  • Termination: All correct nodes eventually decide on a value to output.
  • Agreement: All decided nodes decide on the same value.
slide-54
SLIDE 54

Consensus Protocols

  • Termination: All correct nodes eventually decide on a value to output.
  • Agreement: All decided nodes decide on the same value.
  • Validity: The decision must be one of the inputs.
slide-55
SLIDE 55

Consensus Protocols

  • Termination: All correct nodes eventually decide on a value to output.
  • Eventual Agreement: All decided nodes eventually decide on the same value.
  • Validity: The decision must be one of the inputs.
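The three properties can be phrased as checks over one finished run. This is only a sketch: `check_decisions` is a hypothetical helper, `None` stands for "has not decided", and the set of correct nodes is supplied by the caller.

```python
def check_decisions(inputs, decisions, correct):
    """Evaluate termination, agreement and validity for one run.

    inputs:    node -> value it proposed
    decisions: node -> value it decided, or None if it has not decided
    correct:   set of nodes that did not fail
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        "termination": all(decisions.get(n) is not None for n in correct),
        "agreement": len(set(decided)) <= 1,
        "validity": all(v in inputs.values() for v in decided),
    }


inputs = {"a": "cake", "b": "ice cream", "c": "cake"}

# Node c crashed before deciding; a and b are correct and agree on an input.
ok = check_decisions(inputs, {"a": "cake", "b": "cake", "c": None},
                     correct={"a", "b"})

# Here b decides a value nobody proposed, and correct node c never decides.
bad = check_decisions(inputs, {"a": "cake", "b": "pie", "c": None},
                      correct={"a", "b", "c"})
```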
slide-56
SLIDE 56

Welcome to Paxos

slide-57
SLIDE 57

Outline

  • Going to go over single-decree Paxos.
  • Based on Lamport's paper. The idea is to understand when and why it works.
  • Then look at how to apply this idea to build out an RSM.
slide-59
SLIDE 59

Single Decree Paxos

slide-60
SLIDE 60

Three Types of Participants

Proposers

slide-61
SLIDE 61

Three Types of Participants

Proposers Acceptors

slide-62
SLIDE 62

Three Types of Participants

Proposers Acceptors Learners

slide-63
SLIDE 63

Three Types of Participants

  • Proposers: propose the values to choose from.
  • Acceptors
  • Learners

slide-64
SLIDE 64

Three Types of Participants

  • Proposers: propose the values to choose from.
  • Acceptors: decide which value is ultimately chosen.
  • Learners

slide-65
SLIDE 65

Three Types of Participants

  • Proposers: propose the values to choose from.
  • Acceptors: decide which value is ultimately chosen.
  • Learners: are told what decision was made and can then act on it.

slide-66
SLIDE 66

Paxos: Requirements

  • Validity: Acceptors should only choose values that are proposed.
slide-67
SLIDE 67

Paxos: Requirements

  • Validity: Acceptors should only choose values that are proposed.
  • Agreement: Only one value should be chosen.
slide-68
SLIDE 68

Achieving Agreement

  • Relies on both proposers and acceptors.
slide-69
SLIDE 69

Achieving Agreement

  • Relies on both proposers and acceptors.
  • Acceptors make sure that a chosen value cannot be forgotten.
slide-70
SLIDE 70

Achieving Agreement

  • Relies on both proposers and acceptors.
  • Acceptors make sure that a chosen value cannot be forgotten.
  • How?
slide-71
SLIDE 71

Achieving Agreement

  • Relies on both proposers and acceptors.
  • Acceptors make sure that a chosen value cannot be forgotten.
  • How?
  • Proposers make sure that they don't try to override a chosen value.
slide-72
SLIDE 72

Achieving Agreement

  • Relies on both proposers and acceptors.
  • Acceptors make sure that a chosen value cannot be forgotten.
  • How?
  • Proposers make sure that they don't try to override a chosen value.
  • How?
slide-73
SLIDE 73

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].
slide-74
SLIDE 74

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].

Proposal: (id, value)

slide-75
SLIDE 75

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].
  • Need to make sure proposals are totally ordered.

Proposal: (id, value)

slide-76
SLIDE 76

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].
  • Need to make sure proposals are totally ordered.
  • If some proposal with ID i and value v is chosen then

Proposal: (id, value)

slide-77
SLIDE 77

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].
  • Need to make sure proposals are totally ordered.
  • If some proposal with ID i and value v is chosen then
  • all proposals with ID > i must also have value v.

Proposal: (id, value)

slide-78
SLIDE 78

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].
  • Need to make sure proposals are totally ordered.
  • If some proposal with ID i and value v is chosen then
  • all proposals with ID > i must also have value v.

Proposal: (id, value) (1, a) (2, b) (3, a) (4, a) (5, a) Chosen
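A small sketch of the invariants, under two assumptions that the flattened slide does not make explicit: IDs are built as (round, machine ID) tuples, and (3, a) is the chosen proposal in the example. The helper names are illustrative.

```python
def make_id(round_number, machine_id):
    """Assumed ID layout (round, machine ID): unique per machine, totally ordered."""
    return (round_number, machine_id)


# Python compares tuples lexicographically, which gives the total order for free.
ordered = make_id(1, "a") < make_id(1, "b") < make_id(2, "a")


def holds_invariant(proposals, chosen_id):
    """True if every proposal with a higher ID carries the chosen proposal's value."""
    chosen_value = proposals[chosen_id]
    return all(value == chosen_value
               for pid, value in proposals.items() if pid > chosen_id)


# The (id, value) pairs from the slide, with plain integers as IDs.
proposals = {1: "a", 2: "b", 3: "a", 4: "a", 5: "a"}
ok_if_3_chosen = holds_invariant(proposals, 3)    # True
ok_if_1_chosen = holds_invariant(proposals, 1)    # False: proposal 2 carries "b"
```

The second check shows why the example is consistent only if the chosen proposal comes after (2, b).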

slide-79
SLIDE 79

Paxos Protocol: Phase 1

prepare (1, a) prepare (1, a) prepare (1, a)

a b c

Prepare Message: prepare <proposal ID>

Proposal ID: (<index>, <Sequence #>)

Want to propose cake Proposal: (0, z) Accepted: ∅ Proposal: (0, z) Accepted: ∅ Proposal: (0, z) Accepted: ∅ Proposal: (0, z) Accepted: ∅

slide-80
SLIDE 80

Paxos Protocol: Phase 1

promise (1, a) ∅ promise (1, a) ∅ promise (1, a) ∅

a b c

Promise Message: promise <proposal ID> <accepted value>

Want to propose cake Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅ Proposal: (0, z) Accepted: ∅

slide-81
SLIDE 81

Paxos Protocol: Phase 2

accept (1, a) cake accept (1, a) cake accept (1, a) cake

a b c

Accept Message: accept <proposal ID> <value>

Want to propose cake Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅ Proposal: (0, z) Accepted: ∅

slide-82
SLIDE 82

Paxos Protocol: Phase 2

a b c accepted cake

Want to propose cake Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (0, z) Accepted: ∅

slide-83
SLIDE 83

Paxos Protocol: Phase 1

prepare (1, b) prepare (1, b) prepare (1, b)

a b c

Prepare Message: prepare <proposal ID>

Want to propose ice cream Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (0, z) Accepted: ∅

slide-84
SLIDE 84

Paxos Protocol: Phase 1

promise (1, b) cake promise (1, b) cake promise (1, b) ∅

a 😟b c

Promise Message: promise <proposal ID> <accepted value>

Proposal: (1, a) Accepted: cake Proposal: (1, b) Accepted: cake Proposal: (1, b) Accepted: cake Proposal: (1, b) Accepted: ∅ Want to propose ice cream

slide-85
SLIDE 85

Paxos Protocol: Phase 2

accept (1, b) cake accept (1, b) cake accept (1, b) cake

a b c

Accept Message: accept <proposal ID> <value>

Want to propose ice cream Proposal: (1, a) Accepted: cake Proposal: (1, b) Accepted: cake Proposal: (1, b) Accepted: cake Proposal: (1, b) Accepted: ∅

slide-86
SLIDE 86

Paxos Protocol: Phase 2

a b c

Proposal: (1, a) Accepted: cake Proposal: (1, b) Accepted: cake Proposal: (1, b) Accepted: cake Proposal: (1, b) Accepted: cake
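The walk-through above can be replayed with a compact, single-process sketch of single-decree Paxos. It is a simplification under stated assumptions: messages are direct method calls with no loss or reordering, proposal IDs are (round, proposer name) tuples, and `Acceptor` and `propose` are illustrative names rather than anything defined in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ProposalID = Tuple[int, str]   # assumed layout: (round number, proposer name)


@dataclass
class Acceptor:
    promised: ProposalID = (0, "z")             # highest ID promised so far
    accepted_id: Optional[ProposalID] = None
    accepted_value: Optional[str] = None

    def on_prepare(self, pid):
        """Phase 1: promise to ignore lower IDs and report any accepted value."""
        if pid > self.promised:
            self.promised = pid
            return ("promise", self.accepted_id, self.accepted_value)
        return None

    def on_accept(self, pid, value):
        """Phase 2: accept unless a higher ID has been promised since."""
        if pid >= self.promised:
            self.promised = pid
            self.accepted_id, self.accepted_value = pid, value
            return True
        return False


def propose(pid, wanted, acceptors):
    """Run both phases against all acceptors; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.on_prepare(pid) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # Adopt the value accepted under the highest ID, if any acceptor reported one.
    reported = [(aid, val) for _, aid, val in promises if aid is not None]
    value = max(reported)[1] if reported else wanted
    acks = sum(a.on_accept(pid, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose((1, "a"), "cake", acceptors)         # nothing accepted yet
second = propose((1, "b"), "ice cream", acceptors)   # must adopt "cake"
```

As in the slides, the second proposer wanted ice cream but ends up proposing cake, because its promises report that cake was already accepted.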

slide-87
SLIDE 87

Paxos: Some Questions

  • Why must a proposer adopt the value accepted under the highest proposal ID returned in Phase 1?
slide-88
SLIDE 88

Paxos: Some Questions

a b c

Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (1, b) Accepted: cannoli Proposal: (1, b) Accepted: cannoli Proposal: (1, b) Accepted: cannoli

slide-89
SLIDE 89

Paxos: Some Questions

a b c

Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (1, b) Accepted: cannoli Proposal: (1, b) Accepted: cannoli Proposal: (1, b) Accepted: cannoli

Is it possible to reach this situation?

slide-90
SLIDE 90

Paxos: Some Questions

a b c

Proposal: (1, a) Accepted: cake Proposal: (1, a) Accepted: cake Proposal: (1, b) Accepted: cannoli Proposal: (1, b) Accepted: cannoli Proposal: (1, b) Accepted: cannoli Want to propose cake

prepare (1, c) prepare (1, c) prepare (1, c)

slide-91
SLIDE 91

Paxos: Non-Termination

slide-92
SLIDE 92

Paxos Protocol: Phase 1

prepare (1, a) prepare (1, a) prepare (1, a)

a b c

Proposal: (0, z) Accepted: ∅ Proposal: (0, z) Accepted: ∅ Proposal: (0, z) Accepted: ∅

slide-93
SLIDE 93

Paxos Protocol: Phase 1

promise (1, a) ∅ promise (1, a) ∅ promise (1, a) ∅

a b c

Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅

slide-94
SLIDE 94

Paxos Protocol: Phase 1

prepare (1, b) a b c prepare (1, b) prepare (1, b)

Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅ Proposal: (1, a) Accepted: ∅

slide-95
SLIDE 95

Paxos Protocol: Phase 1

promise (1, b) ∅ promise (1, b) ∅ promise (1, b) ∅

a b c

Proposal: (1, b) Accepted: ∅ Proposal: (1, b) Accepted: ∅ Proposal: (1, b) Accepted: ∅

slide-96
SLIDE 96

Paxos Protocol: Phase 1

a b c

Proposal: (1, b) Accepted: ∅ Proposal: (1, b) Accepted: ∅ Proposal: (1, b) Accepted: ∅

Accept for (1, a) will fail.

slide-97
SLIDE 97

Paxos Protocol: Phase 1

prepare (2, a) prepare (2, a) prepare (2, a)

a b c

Proposal: (1, b) Accepted: ∅ Proposal: (1, b) Accepted: ∅ Proposal: (1, b) Accepted: ∅

slide-98
SLIDE 98

Paxos Protocol: Phase 1

promise (2, a) ∅ promise (2, a) ∅ promise (2, a) ∅

a b c

Proposal: (2, a) Accepted: ∅ Proposal: (2, a) Accepted: ∅ Proposal: (2, a) Accepted: ∅

slide-99
SLIDE 99

Paxos Protocol: Phase 1

a b c

Proposal: (2, a) Accepted: ∅ Proposal: (2, a) Accepted: ∅ Proposal: (2, a) Accepted: ∅

Accept for (1, b) will fail.
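The duel between the two proposers can be replayed in a few lines. This is a sketch with a deliberately adversarial schedule: each proposer's prepare always lands just before the other's accept. The `Acceptor` class and helper names are assumptions made for illustration.

```python
class Acceptor:
    def __init__(self):
        self.promised = (0, "z")
        self.accepted = None

    def prepare(self, pid):
        if pid > self.promised:
            self.promised = pid
            return True
        return False

    def accept(self, pid, value):
        if pid >= self.promised:
            self.promised, self.accepted = pid, value
            return True
        return False


acceptors = [Acceptor() for _ in range(3)]


def prepare_all(pid):
    # Build the full list first so every acceptor sees the message.
    return all([a.prepare(pid) for a in acceptors])


def accept_all(pid, value):
    return any([a.accept(pid, value) for a in acceptors])


rejected = []
round_number = 1
prepare_all((round_number, "a"))
for _ in range(5):
    # b preempts a before a's accept arrives, so a's accept is rejected ...
    prepare_all((round_number, "b"))
    rejected.append(not accept_all((round_number, "a"), "cake"))
    # ... then a retries with a higher round and preempts b in turn.
    round_number += 1
    prepare_all((round_number, "a"))
    rejected.append(not accept_all((round_number - 1, "b"), "ice cream"))
```

Every accept in this schedule is rejected and no acceptor ever accepts a value, however many rounds are run. Safety is never violated; only progress is lost.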

slide-100
SLIDE 100

How to Resolve this Problem?

  • Elect a leader.
slide-101
SLIDE 101

How to Resolve this Problem?

  • Elect a leader.
  • Introduce random timeouts to ensure someone eventually wins.
slide-102
SLIDE 102

How to Resolve this Problem?

  • Elect a leader.
  • Introduce random timeouts to ensure someone eventually wins.
  • Leader is the only proposer (by and large).
slide-103
SLIDE 103

How to Resolve this Problem?

  • Elect a leader.
  • Introduce random timeouts to ensure someone eventually wins.
  • Leader is the only proposer (by and large).
  • Still need acceptors and quorum to make sure future leaders don't forget.
slide-104
SLIDE 104

How to Resolve this Problem?

  • Elect a leader.
  • Introduce random timeouts to ensure someone eventually wins.
  • Leader is the only proposer (by and large).
  • Still need acceptors and quorum to make sure future leaders don't forget.
  • Elect a new leader in response to failure/timeout/etc.
slide-105
SLIDE 105

Extending to State Machine

slide-106
SLIDE 106

What is Going on With This?

  • Return to RSMs: we want a consensus algorithm to decide order of operations.
slide-107
SLIDE 107

What is Going on With This?

  • Return to RSMs: we want a consensus algorithm to decide order of operations.
  • Without knowing all operations a priori, so we are not deciding just one value.
slide-108
SLIDE 108

What is Going on With This?

  • Return to RSMs: we want a consensus algorithm to decide order of operations.
  • Without knowing all operations a priori, so we are not deciding just one value.
  • Model sequence of commands as an array with slots.
slide-109
SLIDE 109

What is Going on With This?

  • Return to RSMs: we want a consensus algorithm to decide order of operations.
  • Without knowing all operations a priori, so we are not deciding just one value.
  • Model sequence of commands as an array with slots.
  • "Run" an instance of Paxos for each slot in this array.
slide-110
SLIDE 110

But Use a Leader

  • Rather than doing this naively, we are going to rely on a leader.
slide-111
SLIDE 111

But Use a Leader

  • Rather than doing this naively, we are going to rely on a leader.
  • Allow the leader to skip Phase 1 (prepare/promise) for each new slot.
slide-112
SLIDE 112

Multi Paxos: Phase 1

a b c Can I be leader?

slide-113
SLIDE 113

Multi Paxos: Phase 1

a b c Can I be leader?

Ballot: (0,z) Ballot: (0,z) Ballot: (0,z)

p1a(a, (1,a)) p1a(a, (1,a)) p1a(a, (1,a))

slide-114
SLIDE 114

Multi Paxos: Phase 1

a b c Can I be leader?

Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...]

p1b((1,a), accepted) p1b((1,a), accepted) p1b((1,a), accepted)

slide-115
SLIDE 115

Multi Paxos: Phase 1

a b c

Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...]

slide-116
SLIDE 116

Multi Paxos: Phase 2

a b c

Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...]

p2a(a, <(1,a), 1, x>) p2a(a, <(1,a), 1, x>) p2a(a, <(1,a), 1, x>)

slide-117
SLIDE 117

Multi Paxos: Phase 2

a b c

Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...]

p2b((1, a)) p2b((1, a)) p2b((1, a))

slide-118
SLIDE 118

Multi Paxos: Phase 2

a b c

Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...]

p2a(a, <(1,a), 2, y>) p2a(a, <(1,a), 2, y>) p2a(a, <(1,a), 2, y>)

slide-119
SLIDE 119

Multi Paxos: Phase 2

a b c

Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...] Ballot: (1, a) Accepted: [...]

p2b((1, a)) p2b((1, a)) p2b((1, a))

slide-120
SLIDE 120

Multi Paxos: Phase 1

a b c Can I be leader?

Ballot: (1,a) Ballot: (1,a) Ballot: (1,a)

p1a(b, (1,b)) p1a(b, (1,b)) p1a(b, (1,b))

slide-121
SLIDE 121

Multi Paxos: View Change

a b c

Ballot: (1,b) Ballot: (1,b) Ballot: (1,b)

slide-122
SLIDE 122

Multi Paxos: View Change

a b c

Ballot: (1, b) Accepted: [...] Ballot: (1, b) Accepted: [...] Ballot: (1, b) Accepted: [...]

p2b((1, b)) p2b((1, b)) p2b((1, b))

slide-123
SLIDE 123

Multi Paxos: View Change

a b c

Ballot: (1, b) Accepted: [...] Ballot: (1, b) Accepted: [...] Ballot: (1, b) Accepted: [...]
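The Multi-Paxos exchange, including the view change, can be sketched as follows. The method names mirror the p1a/p1b/p2a/p2b messages on the slides, but the classes, the in-process calls, and the recovery rule in `elect` (re-propose, per slot, the command accepted under the highest ballot) are assumptions of this sketch rather than details taken from the deck.

```python
class Acceptor:
    def __init__(self):
        self.ballot = (0, "z")
        self.accepted = {}    # slot -> (ballot, command)

    def p1a(self, ballot):
        """Leader election: adopt a higher ballot, report everything accepted so far."""
        if ballot > self.ballot:
            self.ballot = ballot
        return ("p1b", self.ballot, dict(self.accepted))

    def p2a(self, ballot, slot, command):
        """Accept a command for one slot if the sender's ballot is still current."""
        if ballot >= self.ballot:
            self.ballot = ballot
            self.accepted[slot] = (ballot, command)
        return ("p2b", self.ballot)


class Leader:
    def __init__(self, name, acceptors):
        self.name, self.acceptors = name, acceptors
        self.ballot = None
        self.log = {}         # slot -> command this leader saw chosen

    def elect(self, round_number):
        """Phase 1, run once per leadership term rather than once per slot."""
        ballot = (round_number, self.name)
        replies = [a.p1a(ballot) for a in self.acceptors]
        granted = [r for r in replies if r[1] == ballot]
        if len(granted) <= len(self.acceptors) // 2:
            return False
        self.ballot = ballot
        # For each slot, keep the command accepted under the highest ballot.
        best = {}
        for _, _, accepted in granted:
            for slot, (b, cmd) in accepted.items():
                if slot not in best or b > best[slot][0]:
                    best[slot] = (b, cmd)
        for slot, (_, cmd) in best.items():
            self.propose(slot, cmd)
        return True

    def propose(self, slot, command):
        """Phase 2 only: True if a majority accepted under this leader's ballot."""
        replies = [a.p2a(self.ballot, slot, command) for a in self.acceptors]
        acks = sum(1 for r in replies if r[1] == self.ballot)
        chosen = acks > len(self.acceptors) // 2
        if chosen:
            self.log[slot] = command
        return chosen


acceptors = [Acceptor() for _ in range(3)]
a = Leader("a", acceptors)
a.elect(1)
a.propose(1, "x")
a.propose(2, "y")

b = Leader("b", acceptors)
b.elect(1)                    # ballot (1, "b") beats (1, "a"): view change
stale = a.propose(3, "w")     # a still uses (1, "a"), so acceptors refuse
```

After the view change, the new leader's log contains the commands chosen under the old leader, and the old leader's next proposal is refused because its ballot is no longer current.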

slide-124
SLIDE 124

Interface

  • As an aside: how does one build a reusable version of this system?
  • Most common abstraction now: build a key-value store.
  • Popularized by Chubby at Google, which implemented Multi-Paxos.
  • Can use key-value store to implement locks, indicate what is alive, etc.
  • Often extended with leases to make sure state is cleaned up despite failures.
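A lock with a lease can be sketched on top of a key-value interface as below. `LeaseKV` is a hypothetical in-memory stand-in for a replicated store, not the API of Chubby or any real system, and time is passed in explicitly so the example stays deterministic.

```python
class LeaseKV:
    """Hypothetical in-memory stand-in for a consensus-backed key-value store."""

    def __init__(self):
        self._data = {}   # key -> (value, expiry time)

    def _live(self, key, now):
        entry = self._data.get(key)
        return entry if entry and entry[1] > now else None

    def put_if_absent(self, key, value, ttl, now):
        """Write the key only if it is missing or its lease has expired."""
        if self._live(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def get(self, key, now):
        entry = self._live(key, now)
        return entry[0] if entry else None


kv = LeaseKV()
got_a = kv.put_if_absent("lock/db", "client-a", ttl=10, now=0)         # acquires
got_b = kv.put_if_absent("lock/db", "client-b", ttl=10, now=5)         # held by a
holder_at_5 = kv.get("lock/db", now=5)
got_b_later = kv.put_if_absent("lock/db", "client-b", ttl=10, now=12)  # lease expired
```

If client-a crashes without releasing the lock, the lease still expires and client-b can acquire it. In a replicated deployment each `put_if_absent` would be a command ordered through the log, so all replicas agree on who holds the lock.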
slide-125
SLIDE 125

Summary

  • Replicated state machines are a powerful abstraction for fault tolerance.
  • However, they require an oracle that can order commands across all replicas.
  • Enter consensus protocols.