RSM & Paxos Consensus Trilogy - Episode II Replicated State - - PowerPoint PPT Presentation



slide-1
SLIDE 1

RSM & Paxos

Consensus Trilogy - Episode II

slide-2
SLIDE 2

Replicated State Machine

slide-3
SLIDE 3

What is the problem?

slide-4
SLIDE 4

Fault Tolerance by Replication

[Diagram: a client issues set("s0", ...), set("s1", ...), get("s0") -> ... to a KV-Store. The store then fails (get("s1") -> ☠), and a second KV-Store replica answers get("s1") -> ...]

  • Replica takes over on failure.
  • Or in other scenarios.
  • Challenge:
  • Ensure replicas are equivalent.
  • Why?
slide-5
SLIDE 5

Replication Requirements

  • Replicas must have the same state/be equivalent.
  • A simple way to build this out
  • Ensure software running at each replica is deterministic.
  • Ensure commands/operations are executed in the same order.
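
The two requirements above can be illustrated with a minimal sketch. The command format `("set", key, value)` and the helper `apply_log` are assumptions made for this example, not part of the slides. Replicas that apply the same commands in the same order end up equivalent; a replica that applies them in a different order does not.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # ("set", key, value); hypothetical command format


def apply_log(log: List[Command]) -> Dict[str, int]:
    """Deterministically apply a sequence of commands to an empty store."""
    state: Dict[str, int] = {}
    for op, key, value in log:
        if op == "set":
            state[key] = value
    return state


commands = [("set", "s1", 42), ("set", "s1", 1729), ("set", "s1", 25)]

# Same commands, same order: the replicas end in the same state.
replica_a = apply_log(commands)
replica_b = apply_log(commands)

# Same commands, different order: this replica diverges.
replica_c = apply_log(list(reversed(commands)))

print(replica_a == replica_b)  # True
print(replica_a == replica_c)  # False
```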
slide-6
SLIDE 6

Determinism

  • Ensure that equivalent replicas executing the same operation remain equivalent.
  • What does it mean to be equivalent?
  • Depends on what you are running.
  • What does it mean to be deterministic?
  • Depends on what you are running.
  • State machines are an abstraction over these details.
  • Think back to ADTs from linearizability.
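
As a rough illustration of what determinism means for one particular state machine, the sketch below models a key-value store whose `touch` operation reads replica-local state. The class `KVStateMachine`, the `touch` operation, and the `local_clock` field are all hypothetical; they stand in for any input that can differ between replicas.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class KVStateMachine:
    """A key-value store modeled as a state machine (hypothetical sketch)."""

    local_clock: int  # stands in for any replica-local input, e.g. wall-clock time
    data: Dict[str, int] = field(default_factory=dict)

    def apply(self, command: Tuple) -> None:
        op = command[0]
        if op == "set":
            # Deterministic: the new state depends only on the old state and the command.
            _, key, value = command
            self.data[key] = value
        elif op == "touch":
            # Nondeterministic: the result depends on which replica runs the command.
            _, key = command
            self.data[key] = self.local_clock
        else:
            raise ValueError(f"unknown operation: {op}")


# Two replicas that start out equivalent but have different local clocks.
a, b = KVStateMachine(local_clock=100), KVStateMachine(local_clock=105)
a.apply(("touch", "s0"))
b.apply(("touch", "s0"))
diverged = a.data != b.data  # True: same command, different resulting state

# One possible fix: resolve the timestamp once and ship it inside the command.
c, d = KVStateMachine(local_clock=100), KVStateMachine(local_clock=105)
for replica in (c, d):
    replica.apply(("set", "s0", 100))
equivalent = c.data == d.data  # True
```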
slide-7
SLIDE 7

Ordering

slide-8
SLIDE 8

What is the Problem?

[Diagram: three clients send commands to two KV-Store replicas.]

  • Client: set("s0", ...), set("s1", ...), get("s0") -> ...
  • Client: set("s2", ...), set("s1", ...), get("s1") -> ...
  • Client: set("s1", ...), get("s0") -> ...

In what order should these commands be run?

slide-9
SLIDE 9

A Possible Solution

[Diagram: three clients send set("s1", 42), set("s1", 1729), and set("s1", 25) to two KV-Store replicas.]

Commands shown at the replicas: set("s1", 25), set("s1", 42), set("s1", 1729), set("s1", 1729), set("s1", 42)

slide-11
SLIDE 11

A Possible Solution

[Diagram: three clients send set("s1", 42), set("s1", 1729), and set("s1", 25) to two KV-Store replicas.]

Both replicas show the commands in the same order: set("s1", 42), set("s1", 25), set("s1", 1729)

slide-12
SLIDE 12

How to build fault-tolerant oracles?

slide-13
SLIDE 13

What Do We Need?

  • Agreement on operation order.
  • Validity to ensure operations executed were actually issued.
slide-14
SLIDE 14

Consensus Protocols

  • Termination: All correct nodes eventually decide on a value to output.
  • Agreement: All decided nodes decide on the same value.
  • Validity: The decision must be one of the inputs.
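
Agreement and validity are safety properties, so they can be checked against a snapshot of what each node has decided so far. The sketch below does this; the function names and the dictionary layout (node name to value, with `None` meaning "not yet decided") are assumptions for the example.

```python
from typing import Dict, Optional


def agreement_holds(decisions: Dict[str, Optional[str]]) -> bool:
    """Agreement: every node that has decided picked the same value."""
    decided = {value for value in decisions.values() if value is not None}
    return len(decided) <= 1


def validity_holds(inputs: Dict[str, str], decisions: Dict[str, Optional[str]]) -> bool:
    """Validity: every decided value was some node's input."""
    proposed = set(inputs.values())
    return all(value in proposed for value in decisions.values() if value is not None)


inputs = {"n1": "cake", "n2": "ice cream", "n3": "cake"}

good = {"n1": "cake", "n2": "cake", "n3": None}            # n3 has not decided yet
split = {"n1": "cake", "n2": "ice cream", "n3": None}      # violates agreement
invented = {"n1": "cannoli", "n2": "cannoli", "n3": None}  # violates validity

print(agreement_holds(good), validity_holds(inputs, good))          # True True
print(agreement_holds(split), validity_holds(inputs, split))        # False True
print(agreement_holds(invented), validity_holds(inputs, invented))  # True False
```

Termination is a liveness property: no finite snapshot can show that it has been violated, which is why it is not checked here.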
slide-15
SLIDE 15

Consensus Protocols

  • Termination: All correct nodes eventually decide on a value to output.
  • Eventual Agreement: All decided nodes eventually decide on the same value.
  • Validity: The decision must be one of the inputs.
slide-16
SLIDE 16

Welcome to Paxos

slide-17
SLIDE 17

Outline

  • Going to go over single-decree Paxos.
  • Lamport's paper. Idea is to understand when and why it works.
  • Then look at how to apply this idea to build out an RSM.
slide-19
SLIDE 19

Single Decree Paxos

slide-20
SLIDE 20

Three Types of Participants

Proposers Acceptors Learners

slide-21
SLIDE 21

Three Types of Participants

  • Proposers: Propose values that should be selected from.
  • Acceptors: Decide what value is ultimately accepted.
  • Learners: Are told what decision was made and can then act on the decision.

slide-22
SLIDE 22

Paxos: Requirements

  • Validity: Acceptors should only choose values that are proposed.
  • Agreement: Only one value should be chosen.
slide-23
SLIDE 23

Achieving Agreement

  • Relies on both proposers and acceptors.
  • Acceptors make sure that a chosen value cannot be forgotten.
  • How?
  • Proposers make sure that they don't try to override a chosen value.
  • How?
slide-24
SLIDE 24

Paxos Invariants

  • Each proposal has a unique ID. [For example use machine ID to ensure this].
  • Need to make sure proposals are totally ordered.
  • If some proposal with ID i and value v is chosen, then all proposals with ID > i must also have value v.

Proposal: (id, value)

(1, a) (2, b) (3, a) (4, a) (5, a)

[Diagram: one of these proposals is labeled "Chosen".]
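
One way to get unique, totally ordered proposal IDs is to pair a counter with the machine ID and compare the pairs field by field. The sketch below assumes that layout (the field names `index` and `machine` are illustrative) and also checks the invariant on the example sequence, assuming for illustration that the proposal with ID 3 is the chosen one.

```python
from typing import List, NamedTuple, Tuple


class ProposalID(NamedTuple):
    index: int    # counter, compared first
    machine: str  # machine ID, breaks ties between proposers


def next_id(highest_seen: ProposalID, me: str) -> ProposalID:
    """Return an ID for `me` that is larger than the highest ID seen so far."""
    return ProposalID(highest_seen.index + 1, me)


def invariant_holds(proposals: List[Tuple[int, str]], chosen_id: int) -> bool:
    """Check that every proposal with a higher ID than the chosen one has its value."""
    chosen_value = dict(proposals)[chosen_id]
    return all(value == chosen_value for pid, value in proposals if pid > chosen_id)


# Tuples compare field by field, so these IDs are totally ordered.
ordered = ProposalID(0, "z") < ProposalID(1, "a") < ProposalID(1, "b") < ProposalID(2, "a")
print(ordered)                              # True
print(next_id(ProposalID(1, "b"), "a"))     # ProposalID(index=2, machine='a')

history = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "a")]
print(invariant_holds(history, chosen_id=3))  # True
print(invariant_holds(history, chosen_id=1))  # False: proposal 2 carries b
```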

slide-25
SLIDE 25

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. The proposer wants to propose cake and sends three prepare (1, a) messages.]

Prepare Message: prepare <proposal ID>
Proposal ID: (<index>, <Sequence #>)

Node state:
  • Proposal: (0, z) Accepted: ∅
  • Proposal: (0, z) Accepted: ∅
  • Proposal: (0, z) Accepted: ∅
  • Proposal: (0, z) Accepted: ∅

slide-26
SLIDE 26

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three promise (1, a) ∅ messages are returned to the proposer, which wants to propose cake.]

Promise Message: promise <proposal ID> <accepted value>

Node state:
  • Proposal: (1, a) Accepted: ∅
  • Proposal: (1, a) Accepted: ∅
  • Proposal: (1, a) Accepted: ∅
  • Proposal: (0, z) Accepted: ∅

slide-27
SLIDE 27

Paxos Protocol: Phase 2

[Diagram: nodes a, b, c. The proposer, which wants to propose cake, sends three accept (1, a) cake messages.]

Accept Message: accept <proposal ID> <value>

Node state:
  • Proposal: (1, a) Accepted: ∅
  • Proposal: (1, a) Accepted: ∅
  • Proposal: (1, a) Accepted: ∅
  • Proposal: (0, z) Accepted: ∅

slide-28
SLIDE 28

Paxos Protocol: Phase 2

[Diagram: nodes a, b, c. The proposer that wanted to propose cake is told: accepted cake.]

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, a) Accepted: cake
  • Proposal: (0, z) Accepted: ∅

slide-29
SLIDE 29

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. A second proposer wants to propose ice cream and sends three prepare (1, b) messages.]

Prepare Message: prepare <proposal ID>

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, a) Accepted: cake
  • Proposal: (0, z) Accepted: ∅

slide-30
SLIDE 30

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Two promise (1, b) cake messages and one promise (1, b) ∅ message are returned to the proposer that wants to propose ice cream; node b is marked 😟.]

Promise Message: promise <proposal ID> <accepted value>

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, b) Accepted: cake
  • Proposal: (1, b) Accepted: cake
  • Proposal: (1, b) Accepted: ∅

slide-31
SLIDE 31

Paxos Protocol: Phase 2

[Diagram: nodes a, b, c. The proposer that wanted to propose ice cream sends three accept (1, b) cake messages.]

Accept Message: accept <proposal ID> <value>

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, b) Accepted: cake
  • Proposal: (1, b) Accepted: cake
  • Proposal: (1, b) Accepted: ∅

slide-32
SLIDE 32

Paxos Protocol: Phase 2

[Diagram: nodes a, b, c.]

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, b) Accepted: cake
  • Proposal: (1, b) Accepted: cake
  • Proposal: (1, b) Accepted: cake
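
The cake and ice cream walkthrough can be replayed with a small in-process sketch of single-decree Paxos. This is a simplification under stated assumptions: messages are plain method calls, nothing is lost or delayed, and the names `Acceptor`, `propose`, `on_prepare`, and `on_accept` are illustrative rather than taken from the slides. Four acceptors are used to mirror the four state boxes; each proposer reaches only three of them.

```python
from typing import List, NamedTuple, Optional, Tuple


class ProposalID(NamedTuple):
    index: int
    machine: str


class Acceptor:
    def __init__(self) -> None:
        self.promised = ProposalID(0, "z")  # initial state shown in the slides
        self.accepted: Optional[Tuple[ProposalID, str]] = None

    def on_prepare(self, pid: ProposalID):
        """Phase 1: promise `pid` if it is the highest seen; report any accepted value."""
        if pid > self.promised:
            self.promised = pid
            return ("promise", pid, self.accepted)
        return None  # ignore proposals that are not newer than the current promise

    def on_accept(self, pid: ProposalID, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered proposal has been promised."""
        if pid >= self.promised:
            self.promised = pid
            self.accepted = (pid, value)
            return True
        return False


def propose(pid: ProposalID, wanted: str, reachable: List[Acceptor],
            cluster_size: int) -> Optional[str]:
    """Run both phases against the reachable acceptors; return the chosen value."""
    majority = cluster_size // 2 + 1
    promises = [p for p in (a.on_prepare(pid) for a in reachable) if p is not None]
    if len(promises) < majority:
        return None
    earlier = [accepted for _, _, accepted in promises if accepted is not None]
    # If any acceptor already accepted a value, adopt the one with the highest ID.
    value = max(earlier, key=lambda entry: entry[0])[1] if earlier else wanted
    acks = sum(a.on_accept(pid, value) for a in reachable)
    return value if acks >= majority else None


nodes = [Acceptor() for _ in range(4)]
first = propose(ProposalID(1, "a"), "cake", nodes[:3], len(nodes))
second = propose(ProposalID(1, "b"), "ice cream", nodes[1:], len(nodes))
print(first, second)  # cake cake
```

The second proposer wanted ice cream but ends up proposing cake, because two of the promises it received reported cake as already accepted.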

slide-33
SLIDE 33

Paxos: Some Questions

  • Why do proposers need to pick the last accepted value returned in Phase 1?
slide-34
SLIDE 34

Paxos: Some Questions

[Diagram: nodes a, b, c.]

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, b) Accepted: cannoli
  • Proposal: (1, b) Accepted: cannoli
  • Proposal: (1, b) Accepted: cannoli

Is it possible to reach this situation?

slide-35
SLIDE 35

Paxos: Some Questions

[Diagram: nodes a, b, c. A proposer wants to propose cake and sends three prepare (1, c) messages.]

Node state:
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, a) Accepted: cake
  • Proposal: (1, b) Accepted: cannoli
  • Proposal: (1, b) Accepted: cannoli
  • Proposal: (1, b) Accepted: cannoli

slide-36
SLIDE 36

Paxos: Non-Termination

slide-37
SLIDE 37

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three prepare (1, a) messages are sent.]

Node state:
  • Proposal: (0, z) Accepted: ∅ (all three nodes)

slide-38
SLIDE 38

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three promise (1, a) ∅ messages are returned.]

Node state:
  • Proposal: (1, a) Accepted: ∅ (all three nodes)

slide-39
SLIDE 39

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three prepare (1, b) messages are sent.]

Node state:
  • Proposal: (1, a) Accepted: ∅ (all three nodes)

slide-40
SLIDE 40

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three promise (1, b) ∅ messages are returned.]

Node state:
  • Proposal: (1, b) Accepted: ∅ (all three nodes)

slide-41
SLIDE 41

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c.]

Node state:
  • Proposal: (1, b) Accepted: ∅ (all three nodes)

Accept for (1, a) will fail.

slide-42
SLIDE 42

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three prepare (2, a) messages are sent.]

Node state:
  • Proposal: (1, b) Accepted: ∅ (all three nodes)

slide-43
SLIDE 43

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c. Three promise (2, a) ∅ messages are returned.]

Node state:
  • Proposal: (2, a) Accepted: ∅ (all three nodes)

slide-44
SLIDE 44

Paxos Protocol: Phase 1

[Diagram: nodes a, b, c.]

Node state:
  • Proposal: (2, a) Accepted: ∅ (all three nodes)

Accept for (1, b) will fail.
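
The non-termination scenario can be reproduced with a short simulation. It assumes an adversarial schedule in which each proposer's prepare always reaches every acceptor before the rival's accept does; the `duel` function and the value names are illustrative.

```python
from typing import List, Optional, Tuple

ProposalID = Tuple[int, str]


class Acceptor:
    def __init__(self) -> None:
        self.promised: ProposalID = (0, "z")
        self.accepted: Optional[str] = None

    def prepare(self, pid: ProposalID) -> bool:
        if pid > self.promised:
            self.promised = pid
            return True
        return False

    def accept(self, pid: ProposalID, value: str) -> bool:
        if pid >= self.promised:
            self.promised = pid
            self.accepted = value
            return True
        return False


def duel(steps: int) -> Tuple[int, List[Optional[str]]]:
    """Interleave two proposers so each prepare lands before the rival's accept."""
    acceptors = [Acceptor() for _ in range(3)]
    failed_accepts = 0
    pending: Optional[Tuple[ProposalID, str]] = None  # Phase 1 done, accept not yet sent
    for step in range(steps):
        name = "a" if step % 2 == 0 else "b"
        pid = (step // 2 + 1, name)  # (1, a), (1, b), (2, a), (2, b), ...
        # The new prepare reaches every acceptor first and is promised...
        for acc in acceptors:
            acc.prepare(pid)
        # ...so the rival's delayed accept, carrying a lower ID, is rejected everywhere.
        if pending is not None:
            old_pid, old_value = pending
            results = [acc.accept(old_pid, old_value) for acc in acceptors]
            if not any(results):
                failed_accepts += 1
        pending = (pid, f"value-from-{name}")
    return failed_accepts, [acc.accepted for acc in acceptors]


failed, accepted = duel(steps=10)
print(failed, accepted)  # 9 [None, None, None]
```

However many steps are run under this schedule, every accept fails and no acceptor ever accepts a value.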

slide-45
SLIDE 45

How to Resolve this Problem?

  • Elect a leader.
  • Introduce random timeouts to ensure someone eventually wins.
  • Leader is the only proposer (by and large).
  • Still need acceptors and quorum to make sure future leaders don't forget.
  • Elect a new leader in response to failure/timeout/etc.
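
A rough sketch of the random-timeout idea follows. The slides only say "random timeouts"; the doubling window, the parameter values, and the model in which a proposer wins once it gets a head start longer than `round_time` are all assumptions added for this example.

```python
import random
from typing import Optional, Tuple


def retry_delay(attempt: int, rng: random.Random,
                base: float = 0.05, cap: float = 2.0) -> float:
    """Random delay before retrying; the window doubles with each failed attempt."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def race(rng: random.Random, round_time: float = 0.02,
         max_attempts: int = 1000) -> Optional[Tuple[str, int]]:
    """Two proposers retry until one starts early enough to finish both phases alone."""
    for attempt in range(max_attempts):
        delay_a = retry_delay(attempt, rng)
        delay_b = retry_delay(attempt, rng)
        # The earlier proposer wins if its two phases complete before the rival wakes up.
        if abs(delay_a - delay_b) > round_time:
            return ("a" if delay_a < delay_b else "b", attempt + 1)
    return None  # no winner within max_attempts


result = race(random.Random(0))
print(result)  # (winner, number of attempts); the exact values depend on the seed
```

Randomization does not make termination certain, but the chance that the two proposers keep colliding shrinks quickly as the delay window grows.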
slide-46
SLIDE 46

Extending to State Machine

slide-47
SLIDE 47

What is Going on With This?

  • Return to RSMs: we want a consensus algorithm to decide order of operations.
  • Without knowing all operations a priori, so we are not deciding just one value.
  • Model sequence of commands as an array with slots.
  • "Run" an instance of Paxos for each slot in this array.
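
The slot array can be sketched as follows, with each slot standing for the outcome of one Paxos instance. The class `SlottedLog`, the 1-based slot numbering, and the command format are assumptions for this example. A replica applies a command only when every earlier slot has been decided, so all replicas execute commands in slot order.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # ("set", key, value); hypothetical command format


class SlottedLog:
    def __init__(self) -> None:
        self.decided: Dict[int, Command] = {}  # slot -> command chosen for that slot
        self.next_slot = 1                     # lowest slot not yet applied
        self.state: Dict[str, int] = {}

    def on_decided(self, slot: int, command: Command) -> List[int]:
        """Record a decision and apply every command that is now contiguous."""
        self.decided[slot] = command
        applied = []
        while self.next_slot in self.decided:
            _, key, value = self.decided[self.next_slot]
            self.state[key] = value
            applied.append(self.next_slot)
            self.next_slot += 1
        return applied


log = SlottedLog()
print(log.on_decided(2, ("set", "s1", 1729)))  # []: slot 1 is still undecided
print(log.on_decided(1, ("set", "s1", 42)))    # [1, 2]
print(log.state)                               # {'s1': 1729}
```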
slide-48
SLIDE 48

But Use a Leader

  • Rather than doing this naively, we are going to rely on a leader.
  • Allow leader to avoid the promise phase.
slide-49
SLIDE 49

Multi Paxos: Phase 1

[Diagram: nodes a, b, c. One node asks: Can I be leader?]

slide-50
SLIDE 50

Multi Paxos: Phase 1

[Diagram: nodes a, b, c. One node asks "Can I be leader?" and sends three p1a(a, (1,a)) messages.]

Node state:
  • Ballot: (0,z) (all three nodes)

slide-51
SLIDE 51

Multi Paxos: Phase 1

[Diagram: nodes a, b, c. The node asking "Can I be leader?" receives three p1b((1,a), accepted) messages.]

Node state:
  • Ballot: (1, a) Accepted: [...] (all three nodes)

slide-52
SLIDE 52

Multi Paxos: Phase 1

[Diagram: nodes a, b, c.]

Node state:
  • Ballot: (1, a) Accepted: [...] (all three nodes)

slide-53
SLIDE 53

Multi Paxos: Phase 2

[Diagram: nodes a, b, c. Three p2a(a, <(1,a), 1, x>) messages are sent.]

Node state:
  • Ballot: (1, a) Accepted: [...] (all three nodes)

slide-54
SLIDE 54

Multi Paxos: Phase 2

[Diagram: nodes a, b, c. Three p2b((1, a)) messages are returned.]

Node state:
  • Ballot: (1, a) Accepted: [...] (all three nodes)

slide-55
SLIDE 55

Multi Paxos: Phase 2

[Diagram: nodes a, b, c. Three p2a(a, <(1,a), 2, y>) messages are sent.]

Node state:
  • Ballot: (1, a) Accepted: [...] (all three nodes)

slide-56
SLIDE 56

Multi Paxos: Phase 2

[Diagram: nodes a, b, c. Three p2b((1, a)) messages are returned.]

Node state:
  • Ballot: (1, a) Accepted: [...] (all three nodes)

slide-57
SLIDE 57

Multi Paxos: Phase 1

[Diagram: nodes a, b, c. One node asks "Can I be leader?" and sends three p1a(b, (1,b)) messages.]

Node state:
  • Ballot: (1,a) (all three nodes)

slide-58
SLIDE 58

Multi Paxos: View Change

[Diagram: nodes a, b, c.]

Node state:
  • Ballot: (1,b) (all three nodes)

slide-59
SLIDE 59

Multi Paxos: View Change

[Diagram: nodes a, b, c. Three p2b((1, b)) messages are returned.]

Node state:
  • Ballot: (1, b) Accepted: [...] (all three nodes)

slide-60
SLIDE 60

Multi Paxos: View Change

[Diagram: nodes a, b, c.]

Node state:
  • Ballot: (1, b) Accepted: [...] (all three nodes)
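
The Multi-Paxos walkthrough can be condensed into a sketch in which a leader runs Phase 1 once per ballot and then only Phase 2 per slot. The message names p1a, p1b, p2a, and p2b follow the slides; everything else (the classes, direct method calls in place of a network, and the way a new leader re-proposes previously accepted commands) is an assumption for this example.

```python
from typing import Dict, List, Tuple

Ballot = Tuple[int, str]                  # (index, leader name)
Accepted = Dict[int, Tuple[Ballot, str]]  # slot -> (ballot, command)


class Acceptor:
    def __init__(self) -> None:
        self.ballot: Ballot = (0, "z")
        self.accepted: Accepted = {}

    def p1a(self, ballot: Ballot) -> Tuple[Ballot, Accepted]:
        """Leadership request; the return value plays the role of p1b."""
        if ballot > self.ballot:
            self.ballot = ballot
        return self.ballot, dict(self.accepted)

    def p2a(self, ballot: Ballot, slot: int, command: str) -> Ballot:
        """Accept request for one slot; the return value plays the role of p2b."""
        if ballot >= self.ballot:
            self.ballot = ballot
            self.accepted[slot] = (ballot, command)
        return self.ballot


class Leader:
    def __init__(self, name: str, acceptors: List[Acceptor]) -> None:
        self.name, self.acceptors = name, acceptors
        self.ballot: Ballot = (0, name)
        self.active = False

    def _majority(self, replies: List[Ballot]) -> bool:
        return sum(reply == self.ballot for reply in replies) > len(self.acceptors) // 2

    def become_leader(self, index: int) -> bool:
        """Phase 1, run once per ballot rather than once per slot."""
        self.ballot = (index, self.name)
        replies = [acc.p1a(self.ballot) for acc in self.acceptors]
        self.active = self._majority([ballot for ballot, _ in replies])
        if self.active:
            # View change: re-propose the highest-ballot command reported for each slot.
            best: Accepted = {}
            for ballot, accepted in replies:
                if ballot != self.ballot:
                    continue  # this acceptor did not adopt our ballot
                for slot, entry in accepted.items():
                    if slot not in best or entry[0] > best[slot][0]:
                        best[slot] = entry
            for slot, (_, command) in sorted(best.items()):
                self.propose(slot, command)
        return self.active

    def propose(self, slot: int, command: str) -> bool:
        """Phase 2 only: an active leader skips the promise phase."""
        if not self.active:
            return False
        replies = [acc.p2a(self.ballot, slot, command) for acc in self.acceptors]
        self.active = self._majority(replies)  # a higher ballot in a reply means preemption
        return self.active


acceptors = [Acceptor() for _ in range(3)]
a, b = Leader("a", acceptors), Leader("b", acceptors)

a.become_leader(1)        # p1a(a, (1, a)) and p1b replies
a.propose(1, "x")         # p2a(a, <(1, a), 1, x>)
a.propose(2, "y")         # p2a(a, <(1, a), 2, y>)
b.become_leader(1)        # p1a(b, (1, b)); b re-proposes x and y under (1, b)
late = a.propose(3, "w")  # False: a has been preempted by ballot (1, b)
```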

slide-61
SLIDE 61

Interface

  • As an aside: how does one build a reusable version of this system?
  • Most common abstraction now: build a key-value store.
  • Popularized by Chubby at Google, which implemented Multi-Paxos.
  • Can use key-value store to implement locks, indicate what is alive, etc.
  • Often extended with leases to make sure state is cleaned up despite failures.
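
To make the lock and lease bullets concrete, the sketch below builds a lock on top of a key-value store whose entries expire. `LeaseKV` is a hypothetical single-process stand-in for a replicated store, not the API of Chubby or of any real system, and time is passed in explicitly as `now` so the behavior is easy to follow.

```python
from typing import Dict, Optional, Tuple


class LeaseKV:
    """In-memory stand-in for a replicated key-value store with leases (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, lease expiry time)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            return None  # missing, or the lease has expired
        return entry[0]

    def put_if_absent(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Acquire: succeeds only if no live entry exists for the key."""
        if self.get(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def renew(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Extend the lease, but only for the current holder."""
        if self.get(key, now) != value:
            return False
        self._data[key] = (value, now + ttl)
        return True


kv = LeaseKV()
got_a = kv.put_if_absent("lock/leader", "a", ttl=10, now=0)     # True: a holds the lock
got_b = kv.put_if_absent("lock/leader", "b", ttl=10, now=5)     # False: lease still live
kept_a = kv.renew("lock/leader", "a", ttl=10, now=8)            # True: lease now runs to 18
# a fails and stops renewing; the lease expires and b can take over.
retry_b = kv.put_if_absent("lock/leader", "b", ttl=10, now=19)  # True
```

The same key doubles as a liveness indicator: a live entry means the holder renewed recently, and the entry disappears on its own once the holder stops renewing.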
slide-62
SLIDE 62

Summary

  • Replicated state machines are a powerful abstraction for fault tolerance.
  • However, they require an oracle that can order commands across all replicas.
  • Enter consensus protocols.