Verifying Distributed Programs via Canonical Sequentialization - PowerPoint PPT Presentation



SLIDE 1

Klaus von Gleissenthall Joint work with Alexander Bakst, Ranjit Jhala and Rami Gökhan Kıcı


Verifying Distributed Programs via Canonical Sequentialization

SLIDE 2

Writing distributed programs

A bug appears…

Issue: random hangs / deadlock in mono

SLIDE 3

… haunts you …

Writing distributed programs

  • occurs in about 10% of our runs

Issue: random hangs / deadlock in mono

SLIDE 4

… then you write some more code…

Writing distributed programs

moved to version 4.8.0.483.

SLIDE 5

… and the bug disappears…

Writing distributed programs

yet to reproduce the issue in 4.8.0.483
moved to version 4.8.0.483.

SLIDE 6

…leaving you hoping it stays gone.

should be more confident in a few weeks

Writing distributed programs

yet to reproduce the issue in 4.8.0.483
moved to version 4.8.0.483.

SLIDE 7

Can we catch all deadlocks during the compile-edit cycle?

A better world

SLIDE 8

coord :: Transaction -> Int -> SymSet ProcessId -> Process ()
coord transaction n nodes = do
    fold query () nodes
    n_ <- fold countVotes 0 nodes
    if n == n_
      then forEach nodes commit ()
      else forEach nodes abort ()
    forEach nodes expect :: Ack
  where
    query () pid = do { me <- myPid; send pid (pid, transaction) }
    countVotes x nodes = do
      msg <- expect :: Vote
      case msg of
        Accept _ -> return (x + 1)
        Reject   -> return x

acceptor :: Process ()
acceptor = do
  me <- myPid
  (who, transaction) <- expect :: (ProcessId, Transaction)
  vote <- chooseVote transaction
  send who vote

A better world

check: unmatched send, unmatched receive, sent wrong response address. Let's fix it!

SLIDE 9

coord :: Transaction -> Int -> SymSet ProcessId -> Process ()
coord transaction n nodes = do
    fold query () nodes
    n_ <- fold countVotes 0 nodes
    if n == n_
      then forEach nodes commit ()
      else forEach nodes abort ()
    forEach nodes expect :: Ack
  where
    query () pid = do { me <- myPid; send pid (me, transaction) }
    countVotes x nodes = do
      msg <- expect :: Vote
      case msg of
        Accept _ -> return (x + 1)
        Reject   -> return x

acceptor :: Process ()
acceptor = do
  me <- myPid
  (who, transaction) <- expect :: (ProcessId, Transaction)
  vote <- chooseVote transaction
  send who vote

A better world

check: proof that no deadlocks can occur!

A better world

SLIDE 10

This talk: Brisk

  • Fast enough for interactive use
  • Proves absence of deadlocks
  • Provides counterexamples
  • Restricted computation model

SLIDE 11

But Expressive Enough to Implement:

  • Map Reduce
  • Distributed File System
  • Work Stealing

Restricted computation model
SLIDE 12

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 13

The Problems

SLIDE 14

Example: Two phase commit (2PC)

coordinator nodes

Goal: Commit Transaction to all nodes

SLIDE 15

coordinator sends data

Example: Two phase commit (2PC) Phase 1

depending on the value, votes to commit or abort

SLIDE 16

depending on the value, each node votes to commit or abort

commits if no one voted to abort, aborts otherwise

Example: Two phase commit (2PC) Phase 1

SLIDE 17

coordinator sends decision to commit (or abort); nodes commit the transaction

Phase 2

Example: Two phase commit (2PC)

SLIDE 18

nodes send acknowledgement (ACK); done

Phase 2

Example: Two phase commit (2PC)

SLIDE 19

Does the implementation deadlock?

Do sends match receives?

How to verify 2PC?

SLIDE 20

How to verify 2PC?

SLIDE 21

processes execute at different speeds; messages may travel at different speeds

How to verify 2PC?

Problem: Asynchrony

Races trigger different behaviors

SLIDE 22

… …

we don’t know how many nodes there will be until runtime

How to verify 2PC?

Problem: Unbounded Processes

SLIDE 23

How to verify 2PC?

  • Testing? No guarantees
  • Proofs? High user burden
  • Model checking? Infinite number of states

SLIDE 24

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 25

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 26

The Key Idea Canonical Sequentialization

SLIDE 27


Canonical Sequentialization

Don’t enumerate execution orders…

… Reason about single representative execution
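To make the contrast concrete, here is a small illustrative sketch (plain Python, not Brisk; process names and the program encoding are made up): a two-process ping/pong exchange where we enumerate every interleaving of the steps and check that each feasible one halts in the same state, so a single representative execution suffices.

```python
from itertools import permutations

# p sends ping to q, then receives; q receives, then sends pong back
PROG = {
    "p": [("send", "q", "ping"), ("recv", "w")],
    "q": [("recv", "v"), ("send", "p", "pong")],
}

def run(schedule):
    """Execute one interleaving; return the halting state, or None if the
    schedule makes some recv block on an empty mailbox (infeasible)."""
    mail = {"p": [], "q": []}
    state, pc = {}, {"p": 0, "q": 0}
    for proc in schedule:
        op = PROG[proc][pc[proc]]
        if op[0] == "send":
            mail[op[1]].append(op[2])
        elif not mail[proc]:
            return None                      # recv would block: infeasible
        else:
            state[op[1]] = mail[proc].pop(0)
        pc[proc] += 1
    return state

# all interleavings that respect each process's program order
feasible = [run(s) for s in set(permutations("ppqq"))]
feasible = [tuple(sorted(f.items())) for f in feasible if f is not None]

# every feasible schedule halts in the same state as the canonical one
assert set(feasible) == {(("v", "ping"), ("w", "pong"))}
```

Instead of case-splitting on execution orders, one canonical order stands in for all of them.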

SLIDE 28

Canonical Sequentialization

  • 1. Send transaction it wants to commit
  • 2. Send votes
  • 3. Relay decision
  • 4. Send acknowledgments

Example 2PC
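Read sequentially, the four numbered steps become one straight-line program. A hedged sketch in Python (a model of the idea, not the Brisk implementation; `votes` and all other names are hypothetical):

```python
def two_phase_commit(transaction, votes):
    """Sequentialized 2PC: one in-order pass per phase, no interleavings.
    `votes` maps node id -> True (commit) / False (abort)."""
    nodes = sorted(votes)
    # 1. coordinator sends the transaction it wants to commit
    inbox = {n: transaction for n in nodes}
    # 2. each node sends back its vote
    ballots = [votes[n] for n in inbox]
    # 3. coordinator relays the decision
    decision = "commit" if all(ballots) else "abort"
    # 4. each node sends an acknowledgment
    acks = [(n, "ack") for n in nodes]
    return decision, len(acks)

# commits only when every node voted to commit
assert two_phase_commit("tx", {"n1": True, "n2": True}) == ("commit", 2)
assert two_phase_commit("tx", {"n1": True, "n2": False}) == ("abort", 2)
```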

SLIDE 29

Work stealing queue

A Trickier Example

Canonical Sequentialization

SLIDE 30


queue assigns work

workers perform tasks

coordinator collects results

Work stealing queue

SLIDE 31

idle workers ask for work; the queue assigns an item; workers compute results and send them to the coordinator

Work stealing queue

SLIDE 32

queue assigns task to arbitrary worker

arbitrary worker picks result from set

Sequentialized: for each item, an arbitrary worker computes the result, writes it to the result set, and sends it to the master
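As a hedged model of why this works (Python, all names hypothetical): whichever worker takes each item, the coordinator's final result set is the same, so one sequential loop per item is a faithful representative.

```python
import random

def work_steal(items, workers, seed):
    """For each item, an arbitrary worker computes the result and sends it
    to the master's result set. Which worker takes which item is the only
    nondeterminism in this toy model."""
    rng = random.Random(seed)
    results = set()
    for item in items:
        worker = rng.choice(workers)       # arbitrary idle worker asks for work
        results.add((item, item * item))   # worker computes, result to master
    return results

items, workers = [1, 2, 3], ["w1", "w2", "w3"]
# every assignment policy halts with the same result set
outcomes = {frozenset(work_steal(items, workers, seed)) for seed in range(5)}
assert len(outcomes) == 1
```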

SLIDE 33

How can sequentialization help verify programs?

SLIDE 34

compute its canonical sequentialization: a simpler, sequential program

  • same halting states implies deadlock freedom
  • use to prove additional properties
  • no sequentialization means likely wrong

How can sequentialization help verify programs?

SLIDE 35

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 36

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 37

The Implementation

SLIDE 38

The Implementation

  • 1. Restrict Computation Model
  • 2. Sequentialize by Rewriting
SLIDE 39

Races yield equivalent outcomes

Symmetric Nondeterminism

  • 1. Restrict Computation Model
SLIDE 40

data: no race

Example: Phase 1 of 2PC

coordinator sends transaction

Symmetric Nondeterminism

SLIDE 41

Race: processes are symmetric

Example: Phase 1 of 2PC

Send vote

Symmetric Nondeterminism

same outcome?
SLIDE 42

Symmetry means invariance under a transformation: one object is invariant under rotation when viewed from above, but not the other.

Symmetry

SLIDE 43

Permuting process identifiers yields equivalent halting states

In Distributed Systems

Symmetry

[Norris and Dill 1996]

SLIDE 44

Name the processes n1, n2, n3. Permuting n1 and n2 yields equivalent halting states.

Symmetry

Example: Phase 1 of 2PC

SLIDE 45

(msg, id) <- recv; suppose we receive (commit, n1), i.e., pick n1. We choose between picking n1 and n2: did we lose any states?

Example: Phase 1 of 2PC

Symmetric Nondeterminism

SLIDE 46

No! If we pick n2 and receive (commit, n2), we can permute the ids so that the resulting state has the same behavior as (commit, n1): both end up in the same state.

Example: Phase 1 of 2PC

Symmetric Nondeterminism
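The permutation argument fits in a few lines (a Python model with a hypothetical state encoding, not Brisk's): renaming n1 and n2 maps the "picked n1 first" state onto the "picked n2 first" state, so exploring one of them is enough.

```python
def swap_ids(state, a, b):
    """Rename process ids a <-> b everywhere in a halting state
    (both in the keys and inside the stored message tuples)."""
    ren = {a: b, b: a}
    r = lambda x: ren.get(x, x)
    return {r(k): tuple(r(x) for x in v) for k, v in state.items()}

# halting state after the coordinator happened to receive n1's vote first...
after_pick_n1 = {"coord": ("commit", "n1"), "n1": ("voted",), "n2": ("voted",)}
# ...or n2's vote first
after_pick_n2 = {"coord": ("commit", "n2"), "n1": ("voted",), "n2": ("voted",)}

# equivalent up to permuting the symmetric ids n1 and n2
assert swap_ids(after_pick_n1, "n1", "n2") == after_pick_n2
```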

SLIDE 47

How can we use symmetry to sequentialize?

SLIDE 48

data: receive directly after sending, no race

Example: Phase 1 of 2PC

coordinator sends transaction

[Lipton75]

Symmetric Nondeterminism

SLIDE 49


Example: Phase 1 of 2PC

Symmetric Nondeterminism

SLIDE 50

Race: what now? Processes are symmetric.

Example: Phase 1 of 2PC

Send vote: equivalent outcomes, pick any!

Symmetric Nondeterminism

SLIDE 51


Example: Phase 1 of 2PC

Symmetric Nondeterminism

SLIDE 52

The Implementation

  • 1. Restrict Computation Model
  • 2. Sequentialize by Rewriting
SLIDE 53

(by example)

  • 2. Sequentialize by Rewriting
SLIDE 54

p: send q ping; w <- recv q
||
q: v <- recv p; send p pong

first rewrite step: q: v <- ping

Example 1: p, q are in parallel

  • 2. Sequentialize by Rewriting
SLIDE 55

Sequentialization:

q: v <- ping ; p: w <- pong

Example 1: p, q are in parallel

  • 2. Sequentialize by Rewriting
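The rewrite itself can be sketched as a toy rewriter (Python, my own program encoding and "proc: var <- value" notation, not Brisk's): whenever one process's head is a send and the other's is the matching receive, the pair collapses into a plain assignment.

```python
def rewrite(prog_p, prog_q):
    """Toy sequentialization-by-rewriting for two processes: replace each
    matching send/recv pair at the heads with an assignment annotated with
    the receiving process. Returns None when the heads don't match."""
    p, q = list(prog_p), list(prog_q)
    seq = []
    while p or q:
        if p and q and p[0][0] == "send" and q[0][0] == "recv":
            _, _, payload = p.pop(0)
            _, var, at = q.pop(0)
            seq.append(f"{at}: {var} <- {payload}")
        elif p and q and q[0][0] == "send" and p[0][0] == "recv":
            _, _, payload = q.pop(0)
            _, var, at = p.pop(0)
            seq.append(f"{at}: {var} <- {payload}")
        else:
            return None   # no sequentialization: likely a bug

    return seq

# p: send q ping; w <- recv   ||   q: v <- recv; send p pong
p = [("send", "q", "ping"), ("recv", "w", "p")]
q = [("recv", "v", "q"), ("send", "p", "pong")]
assert rewrite(p, q) == ["q: v <- ping", "p: w <- pong"]
```

A program whose sends and receives cannot be paired up this way gets no sequentialization, which is exactly the "likely wrong" signal the talk describes.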
SLIDE 56

p: for q in qs do send q ping; w <- recv q end
||
q∈qs: v <- recv p; send p pong

loop over set of symmetric processes

Example 2: p, qs = {q1…qn} are in parallel

  • 2. Sequentialize by Rewriting
SLIDE 57

p: for q in qs do send q ping; w <- recv q end
||
q∈qs: v <- recv p; send p pong

Generalize: arbitrary iteration order

for q in qs do q: v <- ping ; p: w <- pong end

Example 2: p, qs = {q1…qn} are in parallel

  • 2. Sequentialize by Rewriting
SLIDE 58

p: for q in qs do send q ping end ; for q in qs do w <- recv qs end
||
q∈qs: v <- recv p; send p pong

two loops

Example 3

  • 2. Sequentialize by Rewriting
SLIDE 59

p: for q in qs do send q ping end ; for q in qs do w <- recv qs end
||
q∈qs: v <- recv p; send p pong

first rewrite: for q in qs do q: v <- ping end

Example 3

  • 2. Sequentialize by Rewriting
SLIDE 60

partially sequentialized:
for q in qs do q: v <- ping end ;
( p: for q in qs do w <- recv qs end  ||  q∈qs: send p pong )

fully sequentialized (symmetry checked):
for q in qs do q: v <- ping end ; for q in qs do p: w <- pong end

Example 3

  • 2. Sequentialize by Rewriting
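To see that the two-loop sequentialization is faithful, a hedged simulation (Python model, hypothetical names, not Brisk): run p's two loops against n symmetric responders under random delivery orders and compare halting states, up to the order pongs arrive, with the sequential loops.

```python
import random

def run_async(n, seed):
    """p: send ping to every q, then receive n pongs (arriving in any order).
    Each symmetric responder replies with a tagged pong."""
    rng = random.Random(seed)
    qs = [f"q{i}" for i in range(n)]
    v = {q: "ping" for q in qs}          # first loop: every q received ping
    arrivals = qs[:]
    rng.shuffle(arrivals)                # pongs race on the way back
    w = [("pong", q) for q in arrivals]  # second loop: p receives n pongs
    return (sorted(v.items()), sorted(w))

def run_seq(n):
    """Canonical sequentialization: two sequential for-loops, fixed order."""
    qs = [f"q{i}" for i in range(n)]
    v = {q: "ping" for q in qs}
    w = [("pong", q) for q in qs]
    return (sorted(v.items()), sorted(w))

# every delivery order halts in the same state (up to receive order)
assert all(run_async(3, s) == run_seq(3) for s in range(10))
```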
SLIDE 61

The Implementation

  • 1. Restrict Computation Model
  • 2. Sequentialize by Rewriting
SLIDE 62

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 63

Outline: The Problems, The Key Idea, The Implementation, The Evaluation

SLIDE 64

The Evaluation

SLIDE 65

Implemented in a Haskell library

  • communication primitives like send / receive / foreach
  • computes the canonical sequentialization
  • provides a counterexample when sequentialization fails

Brisk

The Evaluation

SLIDE 66

The Evaluation

Textbook algorithms

Name        Time (ms)
ConcDB        20
DistDB        20
Firewall      30
LockServer    30
MapReduce     30
Parikh        20
Registry      30
TwoBuyers     20
2PC           50
WorkSteal     40
Theque       100

Variant of DISCO distributed filesystem; Map/Reduce framework. Fast enough for interactive use.

SLIDE 67

Summary

Reason about representative sequentialization

symmetric races + sequentialization = verify deadlock freedom in tens of milliseconds

symmetric races produce equivalent outcomes

SLIDE 68

What’s next

  • larger class of properties
  • larger program class
  • faults

SLIDE 69

Backup slides

SLIDE 70

2PC: Faults

what about faults? A node may go down and come back up later; crash/recover to the same state is just more asynchrony

SLIDE 71

2PC: Faults

Paxos made simple

SLIDE 72

File System

immutable mutable

SLIDE 73

Real consequences

It is random. We have tried for 6 months to narrow down … … to no avail. Issue tracker: random hangs / deadlock in mono

https://bugzilla.xamarin.com/show_bug.cgi?id=42665

It occurs in about 10% of our runs.

SLIDE 74

What happened to the bug?

Normally, 100,000 runs would hang 20% of the runs. This issue still occurs in mono-4.6.2.16 however we have yet to reproduce the issue in 4.8.0.483. So far, 100,000 runs has produced no hangs.

https://bugzilla.xamarin.com/show_bug.cgi?id=42665

I should be more confident in a few more weeks (fingers are still crossed right now).

SLIDE 75

Summary

  • Programmers don’t case split on execution orders
  • Our approach: compute a sequentialization; fast (~20-100 ms) automated proofs
  • Correct programs often have an equivalent sequentialization

SLIDE 76

Help writing distributed programs

We want to prove absence of deadlocks: quick feedback while compiling, no manual proofs.

SLIDE 77

Canonical Sequentialization in Brisk

Our approach: compute a canonical sequentialization, a simpler, sequential program

  • same halting states; existence implies deadlock freedom
  • check additional safety properties
  • no sequentialization = likely wrong

SLIDE 78

Results

Micro benchmarks (Firewall, Map Reduce, 2PC) and a distributed file system: <= 100 ms each.

SLIDE 79

Outline: The Problems, The Dream, The Key Idea, The Implementation, The Evaluation