Paxos Week: Return of the State Machine Doug Woos Logistics notes - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Paxos Week: Return of the State Machine

Doug Woos

slide-2
SLIDE 2

Logistics notes

  • No in-class lecture Monday
  • Problem Set 2 due tonight
  • Lab 3 out

slide-3
SLIDE 3

Paxos Made Simple Discussion

  • Paxos vs. Primary/Backup
  • Paxos vs. 2PC
  • What about reconfig?
  • The story of Paxos

slide-4
SLIDE 4

State machine replication

Reminder: we want all replicas to agree on the order of ops
Can think of the operations as a log

Op1 Op2 Op3 Op4 Op5 Op6
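
A minimal sketch of why agreeing on the log is enough: if every replica replays the same operations in the same slot order, every replica ends in the same state. The `Op` tuple shape, the `apply_log` helper, and the reading of "App" as append are assumptions made for illustration; they are not taken from the lab code.

```python
from typing import Dict, List, Tuple

# Hypothetical op encoding: (kind, key, value), e.g. ("Put", "k1", "v1").
Op = Tuple[str, str, str]


def apply_log(log: List[Op]) -> Dict[str, str]:
    """Replay an agreed-upon log, in slot order, against an empty key/value store."""
    store: Dict[str, str] = {}
    for kind, key, value in log:
        if kind == "Put":
            store[key] = value
        elif kind == "Append":
            # Append to the existing value, treating a missing key as "".
            store[key] = store.get(key, "") + value
        else:
            raise ValueError(f"unknown op {kind!r}")
    return store
```

Two replicas that call `apply_log` on the same list get equal stores; replaying the same ops in a different order can give a different store, which is why the order has to be agreed on.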

slide-5
SLIDE 5

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

slide-6
SLIDE 6

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

I want to do “Put k1 v1”

I want to do “Put k2 v2”

slide-7
SLIDE 7

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

I want to do “Put k1 v1”

I want to do “Put k2 v2”

Paxos for Op1

slide-8
SLIDE 8

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

Put k1 v1

I want to do “Put k2 v2”

slide-9
SLIDE 9

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

Put k1 v1

I want to do “Put k2 v2”

Paxos for Op2

slide-10
SLIDE 10

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

Put k1 v1 Put k2 v2

slide-11
SLIDE 11

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

Put k1 v1 Put k2 v2

Paxos?

slide-12
SLIDE 12

Lab 3

Paxos = Paxos Made Simple

slide-13
SLIDE 13

Lab 3

Paxos =

Phase 1

  • Send prepare messages
  • Pick value to accept

Phase 2

  • Send accept messages
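
The two phases above can be sketched for a single slot as follows. This is an in-memory illustration with no networking or failures; `SimpleAcceptor`, `propose`, and the integer ballots are hypothetical names chosen here, not the Lab 3 interface.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SimpleAcceptor:
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Phase 1: promise to ignore lower ballots, report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[SimpleAcceptor], ballot: int, my_value: str) -> Optional[str]:
    """Run both phases once; return the chosen value, or None if preempted."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: send prepare messages.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None
    # Pick value to accept: the highest-ballot value already accepted, else ours.
    prior = [p for p in promises if p is not None]
    value = max(prior, key=lambda p: p[0])[1] if prior else my_value
    # Phase 2: send accept messages.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= majority else None
```

Note that a second proposer with a higher ballot does not get its own value chosen for this slot; it re-proposes whatever was already accepted.
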
slide-14
SLIDE 14

Can we do better?

Phase 1: “leader election”

  • Deciding whose value we will use

Phase 2: “commit”

  • Leader makes sure it’s still leader, commits value

What if we split these phases?

  • Lets us do operations with one round-trip
slide-15
SLIDE 15

Op1 Op2 Op3 Op4 Op5 Op6

S1 S3 S2

Put k1 v1 Put k2 v2

PMMC

slide-16
SLIDE 16

Roles in PMMC

Replicas (like learners)

  • Keep log of operations, state machine, configs

Leaders (like proposers)

  • Get elected, drive the consensus protocol

Acceptors (simpler than in Paxos Made Simple!)

  • “Vote” on leaders
slide-17
SLIDE 17

A note about ballots in PMMC

(leader, seqnum) pairs
Isomorphic to the system we discussed Mon, Wed

1 2 3

0, 4, 8, 12, 16, …
1, 5, 9, 13, 17, …
2, 6, 10, 14, 18, …
3, 7, 11, 15, 19, …

slide-18
SLIDE 18

A note about ballots in PMMC

(leader, seqnum) pairs
Isomorphic to the system we discussed Mon, Wed

1 2 3

(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), …
(1, 0), (1, 1), (1, 2), (1, 3), (1, 4), …
(2, 0), (2, 1), (2, 2), (2, 3), (2, 4), …
(3, 0), (3, 1), (3, 2), (3, 3), (3, 4), …

slide-19
SLIDE 19

A note about ballots in PMMC

(leader, seqnum) pairs
Isomorphic to the system we discussed Mon, Wed

1 2 3

0.0, 1.0, 2.0, 3.0, 4.0, …
0.1, 1.1, 2.1, 3.1, 4.1, …
0.2, 1.2, 2.2, 3.2, 4.2, …
0.3, 1.3, 2.3, 3.3, 4.3, …
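
A small sketch of the isomorphism between the round-robin integers and the dotted `seqnum.leader` form. It assumes four leaders, as in the rows above, and that ballots compare by seqnum first and leader id second (so 0.1 beats 0.0). The function names are made up for this example.

```python
from typing import Tuple

NUM_LEADERS = 4  # assumption: matches the four rows on the slides


def to_pair(n: int, num_leaders: int = NUM_LEADERS) -> Tuple[int, int]:
    """Map a round-robin integer ballot to a (seqnum, leader) pair."""
    seqnum, leader = divmod(n, num_leaders)
    return seqnum, leader


def to_int(seqnum: int, leader: int, num_leaders: int = NUM_LEADERS) -> int:
    """Inverse of to_pair."""
    return seqnum * num_leaders + leader


def dotted(seqnum: int, leader: int) -> str:
    """Render a pair in the slides' seqnum.leader notation."""
    return f"{seqnum}.{leader}"
```

Python compares tuples lexicographically, so sorting by `to_pair(n)` gives the same order as sorting the integers themselves, which is the sense in which the two systems are isomorphic.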

slide-20
SLIDE 20

Paxos Made Moderately Complex Made Simple

slide-21
SLIDE 21

Paxos Made Moderately Complex Made Simple

slide-22
SLIDE 22

Acceptors

Acceptor ballot_num: 0 accepted:[]

slide-23
SLIDE 23

Acceptors

Acceptor ballot_num: _ accepted:[] p1a(0.1)

slide-24
SLIDE 24

Acceptors

Acceptor ballot_num: 0.1 accepted:[] p1a(0.1)

slide-25
SLIDE 25

Acceptors

Acceptor ballot_num: 0.1 accepted:[] p1a(0.1) p1b([])

slide-26
SLIDE 26

Acceptors

Acceptor ballot_num: 0.1 accepted:[]

slide-27
SLIDE 27

Acceptors

Acceptor ballot_num: 0.1 accepted:[] p1a(0.0)

slide-28
SLIDE 28

Acceptors

Acceptor ballot_num: 0.1 accepted:[] p1a(0.0) Nope!

slide-29
SLIDE 29

Acceptors

Acceptor ballot_num: 0.1 accepted:[]

slide-30
SLIDE 30

Acceptors

Acceptor ballot_num: 0.1 accepted:[] p2a(<0.1, 0, A>)

slide-31
SLIDE 31

Acceptors

Acceptor ballot_num: 0.1 accepted:[<0.1, 0, A>] p2a(<0.1, 0, A>)

slide-32
SLIDE 32

Acceptors

Acceptor ballot_num: 0.1 accepted:[<0.1, 0, A>] p2a(<0.1, 0, A>) OK!

slide-33
SLIDE 33

Acceptors

Acceptor ballot_num: 0.1 accepted:[<0.1, 0, A>]

slide-34
SLIDE 34

Acceptors

Acceptor ballot_num: 0.1 accepted:[<0.1, 0, A>] p2a(<0.0, 0, B>)

slide-35
SLIDE 35

Acceptors

Acceptor ballot_num: 0.1 accepted:[<0.1, 0, A>] p2a(<0.0, 0, B>) Nope!

slide-36
SLIDE 36

Acceptors

Acceptor ballot_num: 0.1 accepted:[<0.1, 0, A>]

slide-37
SLIDE 37

Acceptors

  • Ballot numbers increase
  • Only accept values from current ballot
  • Never remove ballots
  • If a value v is chosen by a majority on ballot b, then any value accepted by any acceptor in the same slot on ballot b’ > b has the same value
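
The acceptor walkthrough above can be sketched as a small class. This is an illustration under assumptions: ballots are `(seqnum, leader)` tuples (so 0.1 is `(0, 1)`), the initial `ballot_num` is a sentinel below every real ballot, and replies are plain return values rather than messages. "Nope!" on the slides corresponds to getting back a ballot other than the one you sent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Ballot = Tuple[int, int]           # (seqnum, leader)
PValue = Tuple[Ballot, int, str]   # <ballot, slot, command>


@dataclass
class Acceptor:
    ballot_num: Ballot = (-1, -1)  # sentinel: lower than any real ballot
    accepted: List[PValue] = field(default_factory=list)

    def on_p1a(self, ballot: Ballot) -> Tuple[Ballot, List[PValue]]:
        """Adopt the ballot if it is higher; reply with current ballot and accepted pvalues."""
        if ballot > self.ballot_num:
            self.ballot_num = ballot       # ballot numbers only increase
        return self.ballot_num, list(self.accepted)

    def on_p2a(self, pvalue: PValue) -> Ballot:
        """Accept only pvalues from the current ballot; reply with current ballot."""
        if pvalue[0] == self.ballot_num:
            self.accepted.append(pvalue)   # accepted pvalues are never removed
        return self.ballot_num
```
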
slide-38
SLIDE 38

Paxos Made Moderately Complex Made Simple

slide-39
SLIDE 39

Paxos Made Moderately Complex Made Simple

slide-40
SLIDE 40

Leader: Getting Elected

Leader active: false ballot_num: 0.0 proposals: []

slide-41
SLIDE 41

Leader: Getting Elected

Leader active: false ballot_num: 0.0 proposals: [] Acceptor Acceptor Acceptor p1a(0.0)

slide-42
SLIDE 42

Leader: Getting Elected

Leader active: false ballot_num: 0.0 proposals: [] Acceptor Acceptor Acceptor Nope! Nope!

slide-43
SLIDE 43

Leader: Getting Elected

Leader active: false ballot_num: 1.0 proposals: [] Acceptor Acceptor Acceptor

slide-44
SLIDE 44

Leader: Getting Elected

Leader active: false ballot_num: 1.0 proposals: [] Acceptor Acceptor Acceptor Or…

slide-45
SLIDE 45

Leader: Getting Elected

Leader active: false ballot_num: 0.0 proposals: [] Acceptor Acceptor Acceptor OK([])! OK([])!

slide-46
SLIDE 46

Leader: Getting Elected

Leader active: true ballot_num: 0.0 proposals: [] Acceptor Acceptor Acceptor
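
A sketch of the election step shown above: send p1a to every acceptor, become active on a majority of OKs, otherwise bump the ballot. Leaders and acceptors are plain dicts here and `p1a` / `run_for_office` are hypothetical helpers; a real leader would also collect the accepted pvalues carried in the OK replies and would wait before retrying.

```python
from typing import Dict, List, Tuple

Ballot = Tuple[int, int]  # (seqnum, leader)


def p1a(acceptor: Dict, ballot: Ballot) -> Ballot:
    """Hypothetical acceptor handler: adopt higher ballots, reply with the current one."""
    if ballot > acceptor["ballot_num"]:
        acceptor["ballot_num"] = ballot
    return acceptor["ballot_num"]


def run_for_office(leader: Dict, acceptors: List[Dict]) -> None:
    """One election attempt on the leader's current ballot."""
    replies = [p1a(a, leader["ballot_num"]) for a in acceptors]
    oks = sum(1 for r in replies if r == leader["ballot_num"])
    if oks > len(acceptors) // 2:
        leader["active"] = True
    else:
        # Preempted: pick a seqnum above everything we heard about.
        highest_seq = max(r[0] for r in replies)
        leader["active"] = False
        leader["ballot_num"] = (highest_seq + 1, leader["ballot_num"][1])
```

With acceptors already at ballot 0.1, a leader on 0.0 is refused and moves to 1.0; a second attempt on 1.0 then succeeds.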

slide-47
SLIDE 47

Paxos Made Moderately Complex Made Simple

slide-48
SLIDE 48

Paxos Made Moderately Complex Made Simple

slide-49
SLIDE 49

Leader: Handling proposals

Leader active: true ballot_num: 0.0 proposals: [] Acceptor Acceptor Acceptor Replica Op1 should be A (A = “Put k1 v1”)

slide-50
SLIDE 50

Leader: Handling proposals

Leader active: true ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor Replica

slide-51
SLIDE 51

Leader: Handling proposals

Leader active: true ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor p2a(<0.0, 1, A>) Replica

slide-52
SLIDE 52

Leader: Handling proposals

Leader active: true ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor Replica Nope! Nope!

slide-53
SLIDE 53

Leader: Handling proposals

Leader active: false ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor Replica

slide-54
SLIDE 54

Leader: Handling proposals

Leader active: false ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor Replica Or…

slide-55
SLIDE 55

Leader: Handling proposals

Leader active: true ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor Replica OK! OK!

slide-56
SLIDE 56

Leader: Handling proposals

Leader active: true ballot_num: 0.0 proposals: [<1, A>] Acceptor Acceptor Acceptor Replica Replica Replica Op1 is A
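
A sketch of the proposal path above: an active leader records the proposal, sends p2a to the acceptors, and announces a decision to the replicas once a majority says OK. If a majority refuses, it has been preempted and goes inactive. Dict-based state and the `p2a` / `handle_proposal` names are assumptions for illustration; replicas are modelled as `slot -> command` dicts.

```python
from typing import Dict, List, Tuple

Ballot = Tuple[int, int]           # (seqnum, leader)
PValue = Tuple[Ballot, int, str]   # <ballot, slot, command>


def p2a(acceptor: Dict, pvalue: PValue) -> Ballot:
    """Hypothetical acceptor handler: accept only pvalues from the current ballot."""
    if pvalue[0] == acceptor["ballot_num"]:
        acceptor["accepted"].append(pvalue)
    return acceptor["ballot_num"]


def handle_proposal(leader: Dict, acceptors: List[Dict], replicas: List[Dict],
                    slot: int, command: str) -> bool:
    """Try to get `command` decided in `slot`; return True if it was decided."""
    # Only one value per ballot and slot: the first proposal for a slot sticks.
    leader["proposals"].setdefault(slot, command)
    if not leader["active"]:
        return False
    pvalue = (leader["ballot_num"], slot, leader["proposals"][slot])
    replies = [p2a(a, pvalue) for a in acceptors]
    oks = sum(1 for r in replies if r == leader["ballot_num"])
    if oks > len(acceptors) // 2:
        for replica in replicas:
            replica[slot] = pvalue[2]   # decision(slot, command)
        return True
    leader["active"] = False            # "Nope!" from a majority: preempted
    return False
```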

slide-57
SLIDE 57

Paxos Made Moderately Complex Made Simple

slide-58
SLIDE 58

Election revisited

Acceptor ballot_num: 2.1 accepted:[<2.1, 1, A>] Leader active: false ballot_num: 3.0 proposals: [<1, B>]

slide-59
SLIDE 59

Election revisited

Acceptor ballot_num: 2.1 accepted:[<2.1, 1, A>] Leader active: false ballot_num: 3.0 proposals: [<1, B>] p1a(3.0)

slide-60
SLIDE 60

Election revisited

Acceptor ballot_num: 3.0 accepted:[<2.1, 1, A>] Leader active: false ballot_num: 3.0 proposals: [<1, B>]

slide-61
SLIDE 61

Election revisited

Acceptor ballot_num: 3.0 accepted:[<2.1, 1, A>] Leader active: false ballot_num: 3.0 proposals: [<1, B>] OK([<2.1, 1, A>])

slide-62
SLIDE 62

Election revisited

Acceptor ballot_num: 3.0 accepted:[<2.1, 1, A>] Leader active: true ballot_num: 3.0 proposals: [<1, A>]

slide-63
SLIDE 63

Leaders

  • Only propose one value per ballot and slot
  • If a value v is chosen by a majority on ballot b, then any value proposed by any leader in the same slot on ballot b’ > b has the same value
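
The "Election revisited" step, where the new leader's proposal for slot 1 changes from B to A, can be sketched as a merge. For each slot, the command from the highest-ballot pvalue reported by the acceptors overrides the leader's own proposal. `adopt` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

Ballot = Tuple[int, int]           # (seqnum, leader)
PValue = Tuple[Ballot, int, str]   # <ballot, slot, command>


def adopt(proposals: Dict[int, str], pvalues: List[PValue]) -> Dict[int, str]:
    """Merge accepted pvalues into the leader's proposals on election."""
    best: Dict[int, Tuple[Ballot, str]] = {}
    for ballot, slot, command in pvalues:
        # Keep only the highest-ballot pvalue per slot.
        if slot not in best or ballot > best[slot][0]:
            best[slot] = (ballot, command)
    merged = dict(proposals)
    for slot, (_, command) in best.items():
        merged[slot] = command   # previously accepted values win over our own
    return merged
```

Slots with no reported pvalue keep the leader's own proposal, so only slots that might already be chosen are overridden.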

slide-64
SLIDE 64

Paxos Made Moderately Complex Made Simple

slide-65
SLIDE 65

Paxos Made Moderately Complex Made Simple

slide-66
SLIDE 66

Replicas

Op1 Op2 Op3 Op4 Op5 Op6 Put k1 v1 Put k2 v2

Replica

slide-67
SLIDE 67

Replicas

Op1 Op2 Op3 Op4 Op5 Op6 Put k1 v1 Put k2 v2 App k1 v1 App k2 v2

slot_out slot_in Replica

slide-68
SLIDE 68

Replicas

Op1 Op2 Op3 Op4 Op5 Op6 Put k1 v1 Put k2 v2 App k1 v1 App k2 v2

slot_out slot_in Replica Leader decision(3, “App k1 v1”)

slide-69
SLIDE 69

Replicas

Op1 Op2 Op3 Op4 Op5 Op6 Put k1 v1 Put k2 v2 App k1 v1 App k2 v2

slot_out slot_in Replica Leader

slide-70
SLIDE 70

Replicas

Op1 Op2 Op3 Op4 Op5 Op6 Put k1 v1 Put k2 v2 App k1 v1 App k2 v2

slot_out slot_in Replica Leader decision(4, “Put k3 v3”)

slide-71
SLIDE 71

Replicas

Op1 Op2 Op3 Op4 Op5 Op6 Put k1 v1 Put k2 v2 App k1 v1 Put k3 v3

slot_out slot_in Replica Leader

App k2 v2

propose(5, “App k2 v2”)
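
A sketch of the replica behaviour in this walkthrough: `slot_in` is the next slot to propose in, `slot_out` the next slot to execute, and a proposal that loses its slot to a different decision is re-proposed in a later slot. The `Replica` class, its `outbox` list (standing in for propose messages sent to leaders), and the method names are assumptions; duplicate-command detection and the reconfiguration window are left out.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.slot_in = 1                      # next slot in which to propose
        self.slot_out = 1                     # next slot to execute
        self.proposals: Dict[int, str] = {}   # our outstanding proposals by slot
        self.decisions: Dict[int, str] = {}   # decided commands by slot
        self.executed: List[str] = []         # commands applied, in order
        self.outbox: List[Tuple[int, str]] = []  # propose(slot, command) messages

    def request(self, command: str) -> None:
        """Propose a client command in the first slot we know nothing about."""
        while self.slot_in in self.decisions or self.slot_in in self.proposals:
            self.slot_in += 1
        self.proposals[self.slot_in] = command
        self.outbox.append((self.slot_in, command))
        self.slot_in += 1

    def decision(self, slot: int, command: str) -> None:
        """Record a decision, then execute every contiguous decided slot."""
        self.decisions[slot] = command
        while self.slot_out in self.decisions:
            decided = self.decisions[self.slot_out]
            mine = self.proposals.pop(self.slot_out, None)
            self.executed.append(decided)
            self.slot_out += 1
            if mine is not None and mine != decided:
                self.request(mine)   # lost the slot: retry in a later one
```

Replaying the slides: with “App k2 v2” proposed for slot 4 and `decision(4, “Put k3 v3”)` arriving, the replica executes “Put k3 v3” and sends `propose(5, “App k2 v2”)`.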

slide-72
SLIDE 72

Paxos Made Moderately Complex Made Simple

slide-73
SLIDE 73

When to run for office

When should a leader try to get elected?

  • At the beginning of time
  • When the current leader seems to have failed

The paper describes an algorithm based on pinging the leader and timing out
If you get preempted, don’t immediately try for election again!
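
One way to avoid retrying immediately is to keep a per-leader timeout that grows when you are preempted and shrinks slowly while things are quiet, plus some random jitter so competing leaders do not retry in lockstep. This is a generic backoff sketch, not the paper's exact algorithm; all constants and function names are made up.

```python
import random


def next_timeout(timeout: float, preempted: bool, *, factor: float = 2.0,
                 step: float = 0.01, floor: float = 0.05,
                 ceiling: float = 5.0) -> float:
    """Grow the timeout multiplicatively on preemption, shrink it linearly otherwise."""
    if preempted:
        timeout *= factor
    else:
        timeout -= step
    # Clamp to [floor, ceiling] seconds.
    return min(ceiling, max(floor, timeout))


def election_delay(timeout: float, rng: random.Random) -> float:
    """How long to wait before the next election attempt, with jitter."""
    return rng.uniform(0.5 * timeout, 1.5 * timeout)
```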

slide-74
SLIDE 74

Reconfiguration

All replicas must agree on who the leaders and acceptors are
How do we do this?

slide-75
SLIDE 75

Reconfiguration

All replicas must agree on who the leaders and acceptors are
How do we do this?

  • Use the log!
  • Commit a special reconfiguration command
  • New config applies after WINDOW slots
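
A sketch of the WINDOW rule, assuming a reconfiguration command decided in slot s governs slots s + WINDOW and later, and that a replica may only propose in slots below `slot_out + WINDOW` (so the configuration for any slot it proposes in is already known). The value of `WINDOW` and the helper names are hypothetical.

```python
from typing import Dict

WINDOW = 5  # hypothetical value


def config_for_slot(slot: int, initial: str, reconfigs: Dict[int, str],
                    window: int = WINDOW) -> str:
    """Return the config governing `slot`.

    `reconfigs` maps the slot where a reconfiguration command was decided
    to the config it installs.
    """
    config = initial
    for decided_at in sorted(reconfigs):
        if decided_at + window <= slot:
            config = reconfigs[decided_at]
    return config


def may_propose(slot_in: int, slot_out: int, window: int = WINDOW) -> bool:
    """A replica only proposes in slots whose configuration is already fixed."""
    return slot_in < slot_out + window
```
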
slide-76
SLIDE 76

Reconfiguration

What if we need to reconfigure now and client requests aren’t coming in?

slide-77
SLIDE 77

Reconfiguration

What if we need to reconfigure now and client requests aren’t coming in?

  • Commit no-ops until WINDOW is cleared
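
A tiny sketch of "clearing the window", under the assumption that a reconfiguration decided in slot s takes effect at slot s + WINDOW: every slot before that one has to be decided, so the idle slots are filled with no-ops. `noop_slots` is a hypothetical helper.

```python
from typing import List


def noop_slots(reconfig_slot: int, next_free_slot: int, window: int) -> List[int]:
    """Slots to fill with no-ops so the new config can take effect."""
    first_slot_under_new_config = reconfig_slot + window
    return list(range(next_free_slot, first_slot_under_new_config))
```

For example, with the reconfiguration in slot 3, a window of 5, and slot 4 as the next free slot, slots 4 through 7 need no-ops.
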
slide-78
SLIDE 78

Other complications

State simplifications

  • Can track much less information, esp. on replicas

Garbage collection

  • Unbounded memory growth is bad
  • Lab 3: track finished slots across all instances, garbage collect when everyone is ready

Read-only commands

  • Can’t just read from replica (why?)
  • But, don’t need their own slot
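
A sketch of the garbage-collection idea above: each peer reports the highest slot it is finished with, and a slot can be forgotten once every peer has finished with it. The `gc` function and the `done` map are illustrative assumptions, not the Lab 3 interface.

```python
from typing import Dict


def gc(log: Dict[int, str], done: Dict[str, int]) -> int:
    """Forget slots that every peer has finished with.

    `done` maps peer name to the highest slot that peer no longer needs.
    Returns the lowest slot number that is still kept.
    """
    safe = min(done.values())   # the slowest peer bounds what can be dropped
    for slot in [s for s in log if s <= safe]:
        del log[slot]
    return safe + 1
```

One slow or partitioned peer holds back collection for everyone, which is the cost of waiting until everyone is ready.
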
slide-79
SLIDE 79