Planning for Change in a Formal Verification of the Raft Consensus Protocol



slide-1
SLIDE 1

Planning for Change 
 in a Formal Verification of the 
 Raft Consensus Protocol

James Wilcox Steve Anton Zach Tatlock Mike Ernst Tom Anderson Doug Woos
slide-2
SLIDE 2

Contributions

First formal proof of Raft’s safety

first verified implementation!

Large-scale Verdi case study

stress test; reverification inevitable

Proof engineering lessons

affinity lemmas, etc.

slide-3
SLIDE 3

Distributed Systems

slide-4
SLIDE 4

Reliably deliver procrastination

slide-5
SLIDE 5

Also serious infrastructure

slide-6
SLIDE 6

One day last summer...

slide-7
SLIDE 7

One day last summer...

slide-8
SLIDE 8

One day last summer...

slide-9
SLIDE 9

How distributed systems fail

slide-10
SLIDE 10

Related Work

IronFleet [SOSP15]

liveness, log compaction, serialization

EventML [LADA12, AVoCS15]

language for verified distributed systems

Verdi [PLDI15]

network semantics, transformers, higher-order

slide-11
SLIDE 11

Verdi background

Network semantics

operational semantics define network behavior

Verified system transformers (VSTs)

prove property transfer to adversarial network

[Figure: diagram labeled VST with six App boxes]
slide-12
SLIDE 12

Big Picture

Past: Verdi Framework

compositional fault tolerance

Present: Verified Raft

critical piece of infrastructure

Future:

dynamically upgrading systems

program logic

slide-13
SLIDE 13

Outline

Verification Challenge

state machine replication

Raft Algorithm

implemented in Verdi

Proof Overview

and lessons learned

slide-14
SLIDE 14

Replication for fault tolerance

critical components must not fail

slide-15
SLIDE 15

Replication for fault tolerance

available if more than n/2 nodes are up

replicas must be consistent with each other
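
A minimal Python sketch of the availability condition, assuming "n/2 nodes" means a strict majority as in Raft. The function names `majority` and `is_available` are illustrative and do not come from the talk.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def is_available(n: int, up: int) -> bool:
    """A cluster of n replicas can make progress when a majority is up."""
    return up >= majority(n)


# Any two majorities of the same cluster share at least one node,
# which is what lets replicas stay consistent across failures.
def majorities_overlap(n: int) -> bool:
    return 2 * majority(n) > n
```

With five replicas the cluster tolerates two failures; with four it tolerates only one.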

slide-16
SLIDE 16

Replication for fault tolerance

slide-17
SLIDE 17

Replication correctness

slide-18
SLIDE 18

Replication correctness

linearizability

cluster presents consistent order of operations to clients
slide-19
SLIDE 19

Internal Correctness

linearizability follows from internal correctness: state machine safety

slide-20
SLIDE 20

Goal: Verify Raft

Reduce linearizability to State Machine Safety [PLDI15]

Prove State Machine Safety

slide-21
SLIDE 21

Goal: Verify Raft

Lin. SMS LOC 45k 5k

slide-22
SLIDE 22

Outline

Verification Challenge

state machine replication

Raft Algorithm

implemented in Verdi

Proof Overview

and lessons learned

slide-23
SLIDE 23

Formalizing the network

state of the world

packets in flight
history of I/O
data @ nodes
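
One way to picture the three components of the world state is as a record. This is a hedged Python sketch; the class name `World` and its field names are assumptions made for illustration, not Verdi's actual Coq definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Packet = Tuple[str, str, Any]  # (src, dst, message)


@dataclass
class World:
    # packets in flight: a multiset, modeled here as a list
    packets: List[Packet] = field(default_factory=list)
    # data @ nodes: one local state per node name
    state: Dict[str, Any] = field(default_factory=dict)
    # history of I/O: externally visible events, oldest first
    trace: List[Any] = field(default_factory=list)


w = World(state={"a": 0, "b": 0})
w.packets.append(("a", "b", "ping"))
```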

slide-24
SLIDE 24

Formalizing the network

slide-25
SLIDE 25

Formalizing the network

slide-26
SLIDE 26

Defining network semantics

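Deliver

$$\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}$$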

slide-27
SLIDE 27

Defining network semantics

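Deliver

$$\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}$$

Duplicate

$$\frac{p \in P}{(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus \{p\},\ \Sigma,\ T)}$$

Drop

$$\frac{}{(\{p\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P,\ \Sigma,\ T)}$$

Timeout

$$\frac{H_{tmt}(n, \Sigma[n]) = (\sigma', o, P') \qquad \Sigma' = \Sigma[n \mapsto \sigma']}{(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle tmt, o \rangle)}$$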

slide-28
SLIDE 28

Defining network semantics

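Deliver

$$\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}$$

Duplicate

$$\frac{p \in P}{(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus \{p\},\ \Sigma,\ T)}$$

Drop

$$\frac{}{(\{p\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P,\ \Sigma,\ T)}$$

Timeout

$$\frac{H_{tmt}(n, \Sigma[n]) = (\sigma', o, P') \qquad \Sigma' = \Sigma[n \mapsto \sigma']}{(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle tmt, o \rangle)}$$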

systems defined by handlers
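
The four rules (Deliver, Duplicate, Drop, Timeout) can be read as functions from one world (P, Σ, T) to the next, parameterized by the system's two handlers. The Python sketch below is an illustration under assumptions: the toy counter handlers `h_net` and `h_tmt` and the node names are made up here, and lists stand in for multisets.

```python
from typing import Any, Dict, List, Tuple

Packet = Tuple[str, str, Any]                            # (src, dst, msg)
World = Tuple[List[Packet], Dict[str, Any], List[Any]]   # (P, Sigma, T)


def deliver(w: World, i: int, h_net) -> World:
    """Remove packet i, run the network handler at its destination."""
    P, S, T = w
    src, dst, m = P[i]
    s2, out, sent = h_net(dst, S[dst], src, m)
    return (P[:i] + P[i + 1:] + sent, {**S, dst: s2}, T + [out])


def duplicate(w: World, i: int) -> World:
    """The network may deliver a second copy of packet i later."""
    P, S, T = w
    return (P + [P[i]], S, T)


def drop(w: World, i: int) -> World:
    """The network may lose packet i."""
    P, S, T = w
    return (P[:i] + P[i + 1:], S, T)


def timeout(w: World, n: str, h_tmt) -> World:
    """Node n's timer fires; run its timeout handler."""
    P, S, T = w
    s2, out, sent = h_tmt(n, S[n])
    return (P + sent, {**S, n: s2}, T + [("tmt", out)])


# Toy system (hypothetical): "inc" bumps a counter and is acknowledged.
def h_net(dst, state, src, m):
    if m == "inc":
        return state + 1, state + 1, [(dst, src, "ack")]
    return state, None, []


def h_tmt(n, state):
    peer = "b" if n == "a" else "a"
    return state, None, [(n, peer, "inc")]


w0: World = ([], {"a": 0, "b": 0}, [])
w1 = timeout(w0, "a", h_tmt)   # a sends inc to b
w2 = duplicate(w1, 0)          # the network duplicates the packet
w3 = deliver(w2, 0, h_net)     # b handles the first copy
w4 = deliver(w3, 0, h_net)     # b handles the second copy
```

In this run the duplicated packet increments the counter twice, which is the kind of behavior a verified system must either tolerate or rule out.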

slide-29
SLIDE 29

Implementing Raft

[Figure: timeline of Term 1, Term 2, Term 3, ...; each term is labeled with an election phase and a replication phase]

slide-30
SLIDE 30

Implementing Raft: Leader Election

Candidate → Followers: ReqVote
Followers → Candidate: Vote

[Figure: timeline of Term 1, Term 2, Term 3, ...]
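
A simplified Python sketch of the ReqVote / Vote exchange, assuming the standard Raft rule of at most one vote per node per term. It omits the log up-to-date check that full Raft performs, and all names (`Node`, `handle_req_vote`, `wins`) are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    name: str
    term: int = 0
    voted_for: Optional[str] = None


def handle_req_vote(node: Node, candidate: str, term: int) -> bool:
    """Return True if node grants its vote to candidate for this term."""
    if term > node.term:
        # Entering a newer term clears any earlier vote.
        node.term, node.voted_for = term, None
    if term == node.term and node.voted_for in (None, candidate):
        node.voted_for = candidate
        return True
    return False


def wins(votes: Set[str], cluster_size: int) -> bool:
    """A candidate becomes leader with votes from a strict majority."""
    return len(votes) > cluster_size // 2


nodes = {n: Node(n) for n in "abc"}
votes_a = {n for n in "abc" if handle_req_vote(nodes[n], "a", 1)}
votes_c = {n for n in "abc" if handle_req_vote(nodes[n], "c", 1)}
```

Because each node votes once per term, a second candidate in the same term cannot also collect a majority.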

slide-31
SLIDE 31

Implementing Raft

[Figure: timeline of Term 1, Term 2, Term 3, ...]

slide-32
SLIDE 32

Implementing Raft: Log Replication

Leader → Followers: Append
Followers → Leader: AppendAck

[Figure: timeline of Term 1, Term 2, Term 3, ...]

Leader commits entry when receiving n/2 acks
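
A hedged Python sketch of the commit rule, assuming the leader counts itself and commits the highest index stored on a strict majority. Full Raft also requires the entry to be from the leader's current term, which is omitted here. `commit_index` and `match_index` are illustrative names.

```python
from typing import Dict


def commit_index(match_index: Dict[str, int], cluster_size: int) -> int:
    """Largest log index known to be stored on a strict majority.

    match_index maps each node (leader included) to the highest index
    it has acknowledged; nodes missing from the map count as 0.
    """
    ranked = sorted(match_index.values(), reverse=True)
    ranked += [0] * (cluster_size - len(ranked))
    # The value at position cluster_size // 2 is held by at least
    # cluster_size // 2 + 1 nodes, i.e. a strict majority.
    return ranked[cluster_size // 2]
```

For a five-node cluster with acknowledged indices 7, 7, 5, 3, 0, the leader may commit up to index 5.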

slide-33
SLIDE 33

Outline

Verification Challenge

state machine replication

Raft Algorithm

implemented in Verdi

Proof Overview

and lessons learned

slide-34
SLIDE 34

Verifying Raft: Show linearizability

slide-35
SLIDE 35

Verifying Raft: Approach

slide-36
SLIDE 36

State Machine Safety

Nodes agree about committed entries

proof by induction on an execution

since only committed entries executed
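
As a concrete reading of "nodes agree about committed entries", the Python sketch below checks the property on a single snapshot of the cluster. It assumes each node's committed entries form a prefix of its log; the names `state_machine_safety`, `logs`, and `commit` are illustrative and not taken from the Coq development.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (term, command)


def state_machine_safety(logs: Dict[str, List[Entry]],
                         commit: Dict[str, int]) -> bool:
    """True if every pair of nodes agrees on the entries both have committed.

    commit[n] is the number of committed entries at node n.
    """
    names = list(logs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = min(commit[a], commit[b])
            if logs[a][:shared] != logs[b][:shared]:
                return False
    return True


ok = state_machine_safety(
    {"a": [(1, "x"), (1, "y")], "b": [(1, "x"), (2, "z")]},
    {"a": 1, "b": 1},
)
bad = state_machine_safety(
    {"a": [(1, "x"), (1, "y")], "b": [(1, "x"), (2, "z")]},
    {"a": 2, "b": 2},
)
```

The logs may diverge after the committed prefix (as in `ok`), but never inside it (as in `bad`).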

slide-37
SLIDE 37

State Machine Safety: Proof

I ⇒ I

not inductive!

slide-38
SLIDE 38

State Machine Safety: Proof

I ⇒ I

I true initially

I preserved

Lemma, Lemma, Lemma, …

90 invariants in total
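
The proof pattern on this slide can be written out as follows. This is a sketch under assumptions: $w_0$ (the initial world), $\rightsquigarrow$ (one step of the network semantics), and $\mathit{SMS}$ (State Machine Safety) are notation chosen here, and $I$ stands for the conjunction of the strengthened invariants.

```latex
\begin{align*}
& I(w_0)
  && \text{(true initially)} \\
& \forall w\, w'.\ I(w) \land w \rightsquigarrow w' \Rightarrow I(w')
  && \text{(preserved)} \\
& \forall w.\ I(w) \Rightarrow \mathit{SMS}(w)
  && \text{(strong enough)} \\[4pt]
& \text{hence}\quad \forall w.\ w_0 \rightsquigarrow^{*} w \Rightarrow \mathit{SMS}(w)
  && \text{(by induction on the execution)}
\end{align*}
```

State Machine Safety alone fails the second obligation, which is why it must be strengthened with additional lemmas until the conjunction is inductive.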

slide-39
SLIDE 39

The burden of proof

P⇒ P P with ghost state

P true initially P preserved

Lemma Lemma … Lemma

Re-verification is the primary challenge:

  • invariants are not inductive
  • not-yet-verified code is wrong
  • need additional invariants
slide-40
SLIDE 40

The burden of proof

P⇒ P P with ghost state

P true initially P preserved

Lemma Lemma … Lemma

Re-verification is the primary challenge

Proof engineering techniques help:

  • affinity lemmas
  • intermediate reachability
  • structural tactics
  • information hiding
slide-41
SLIDE 41

Ghost State: Example

Capture all entries received by a node

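[Figure: the Leader sends Append to the Follower. Each node is shown with two fields, Log (real) and allEntries (ghost). Values shown for Log: A,B,C; [A],B,C; A,D; A,B,C. Values shown for allEntries: {A,D}; {A,B,C,D}; {A,B,C}.]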
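
A small Python sketch of the idea, assuming the ghost field is updated alongside the real log and never shrinks. The append rule is simplified (it always overwrites the suffix after `prev_len`), and `Follower` and `handle_append` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Follower:
    log: List[str] = field(default_factory=list)         # real state
    all_entries: Set[str] = field(default_factory=set)   # ghost state


def handle_append(node: Follower, prev_len: int, entries: List[str]) -> None:
    """Keep the first prev_len entries, then install the leader's entries."""
    node.log = node.log[:prev_len] + entries   # real update: may discard entries
    node.all_entries |= set(entries)           # ghost update: only ever grows


f = Follower(log=["A", "D"], all_entries={"A", "D"})
handle_append(f, 1, ["B", "C"])
```

After the append the real log has forgotten D, but the ghost set still records that D was once received.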

slide-42
SLIDE 42

Affinity Lemmas: Example

Affinity Lemma: every invariant of entries in logs is an invariant of entries in allEntries

$$(\forall e \in \mathit{log}.\ e.\mathit{term} > 0) \;\Longrightarrow\; (\forall e \in \mathit{allEntries}.\ e.\mathit{term} > 0)$$

slide-43
SLIDE 43

Affinity Lemmas: Example

Affinity Lemma: every invariant of entries in logs is an invariant of entries in allEntries

$$(\forall e \in \mathit{log}.\ P\ e) \;\Longrightarrow\; (\forall e \in \mathit{allEntries}.\ P\ e)$$
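
The intuition behind the lemma is that every element of allEntries was in the log at the moment it was recorded. The Python sketch below checks this on random executions of a simplified append rule; it is a test harness written for illustration, not a proof, and `run` and `ever_in_log` are names invented here.

```python
import random


def run(steps: int, seed: int):
    """Simulate random appends and track which entries were ever in the log."""
    rng = random.Random(seed)
    log, all_entries, ever_in_log = [], set(), set()
    for _ in range(steps):
        prev_len = rng.randint(0, len(log))
        entries = [(rng.randint(1, 3), rng.choice("ABCD"))
                   for _ in range(rng.randint(0, 2))]
        log = log[:prev_len] + entries   # real update
        all_entries |= set(entries)      # ghost update
        ever_in_log |= set(log)          # history of the real log
    return all_entries, ever_in_log


all_e, ever = run(200, seed=0)
```

Since the ghost set coincides with the history of the log, a predicate such as `term > 0` that holds of every log entry in every state also holds of every element of the ghost set.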

slide-44
SLIDE 44

Affinity Lemmas

Ex 1: Relate ghost state to real state

transfer properties once and for all

Ex 2: Relate current messages to past

response => past request

slide-45
SLIDE 45

Structured Handlers: Example

handler = update_state ; respond

net --handler--> net’

net --update_state--> net_i --respond--> net’

slide-46
SLIDE 46

Structured Handlers: Example

handler = update_state ; respond

net --handler--> net’

net --update_state--> net_i --respond--> net’

[I holds at net and net’]

slide-47
SLIDE 47

Structured Handlers: Example

handler = update_state ; respond

net --handler--> net’

net --update_state--> net_i --respond--> net’

[I holds at net, net_i, and net’]
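
A Python sketch of a handler split into two phases with the invariant checked at the intermediate state as well as at the endpoints. The toy state (a term and an outbox of terms) and the invariant are assumptions made for illustration; in the Coq development these checks are proof obligations, not runtime assertions.

```python
from typing import Any, Dict

Net = Dict[str, Any]


def update_state(net: Net, msg_term: int) -> Net:
    """Phase 1: adopt the larger term."""
    return {**net, "term": max(net["term"], msg_term)}


def respond(net: Net, msg_term: int) -> Net:
    """Phase 2: send a reply stamped with the current term."""
    return {**net, "outbox": net["outbox"] + [net["term"]]}


def invariant(net: Net) -> bool:
    """No outgoing message carries a term newer than the node's own."""
    return all(t <= net["term"] for t in net["outbox"])


def handler(net: Net, msg_term: int) -> Net:
    assert invariant(net)
    net_i = update_state(net, msg_term)
    assert invariant(net_i)      # holds at the intermediate state
    net2 = respond(net_i, msg_term)
    assert invariant(net2)
    return net2


result = handler({"term": 1, "outbox": [1]}, 3)
```

Proving the invariant once per phase means a change to `respond` leaves the reasoning about `update_state` untouched.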

slide-48
SLIDE 48

The burden of proof

P⇒ P P with ghost state

P true initially P preserved

Lemma Lemma … Lemma

Re-verification is the primary challenge

Proof engineering techniques help:

  • affinity lemmas
  • intermediate reachability
  • structural tactics
  • information hiding
slide-49
SLIDE 49

Contributions

First formal proof of Raft’s safety

first verified implementation!

Large-scale Verdi case study

stress test; reverification inevitable

Proof engineering lessons

affinity lemmas, etc.

slide-50
SLIDE 50
slide-51
SLIDE 51

Planning for Change 
 in a Formal Verification of the 
 Raft Consensus Protocol

James Wilcox Steve Anton Zach Tatlock Michael Ernst Tom Anderson Doug Woos
slide-52
SLIDE 52