Practical Formal Verification of MPI and Thread Programs Ganesh - - PowerPoint PPT Presentation




SLIDE 1

Practical Formal Verification of MPI and Thread Programs. Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT 84112. A Half-Day Tutorial Proposed for ICS 2009. http://www.cs.utah.edu/formal_verification. Supported by NSF CNS 0509379 and Microsoft.


SLIDE 2

Organization (1)

  • Introduction to Formal Verification
    – What are “classic” FV approaches?
    – How have FV tools evolved to become like debuggers?
  • The role of Dynamic Verification methods in this achievement
  • Practical Formal Verification of MPI programs using ISP
    – Interception of MPI calls
    – Orchestration of the scheduler to achieve coverage of relevant interleavings
  • Practical Formal Verification of Pthread programs
    – The role of static analysis
    – Differences with ISP in terms of interleaving scheduling


SLIDE 3

Organization (2)

  • Demonstration of ISP (MPI verification) on the development of a distributed Matrix Multiplication example
    – The development environment
    – Detection of deadlocks and assertion violations
    – Viewing completes-before relations that show how scheduling can proceed on different platforms
    – Demo of Visual Studio and Eclipse front-ends of ISP
  • Demonstration of Inspect (Pthread verification) on the development of producer/consumer code
    – Detection of races, assertion violations, and debugging
    – How interleaving control works in practice
    – Demo of the Emacs environment for debugging (Eclipse planned for the future)
  • All demos will be run on LiveCDs that attendees can use during the tutorial and take with them


SLIDE 4

Organization (3)

  • Scaling ISP (MPI verification)
    – Discussion of large examples verified so far
    – Discussion of future directions
      • Parametric verification of large examples
      • Scaling by integrating strength-reduction methods
  • Scaling Inspect
    – Discussion of current large examples
    – The binary instrumentation route
    – Discussion of future directions
      • Verifying thread pools and other large-scale organizations
      • Verifying in the presence of other APIs
      • Mixed MPI / thread verification directions


SLIDE 5

[Pictures: SiCortex 5832-processor system (courtesy SiCortex); IBM Blue Gene (picture courtesy IBM); LLNL’s petascale machine “Roadrunner” (AMD Opteron CPUs and IBM PowerX Cell)]

Why is MPI Important?

  • Almost the default choice for large-scale parallel simulations
  • Huge support base
  • Very mature codes exist in MPI – cannot easily be re-implemented
  • Performs critical simulations in Science and Engineering

– Weather / earthquake prediction, computational chemistry, …, parallel model checking, …

SLIDE 6

Illustration of an MPI Program: Integration

if (my_rank == 0) {            // MASTER: proc of rank 0
  total = integral;
  for (source = 0; source < p; source++) {
    MPI_Recv(integral, from source, …);
    total = total + integral;
  }
} else {                       // SLAVE: procs of rank 1, 2, …
  MPI_Send(region_integral, to proc 0, …);
}

5/8/2009

Master adds-up Integrals of sub-regions

SLIDE 7

Illustration of error-prone nature of MPI


Yikes – mismatch! Will deadlock!

if (my_rank == 0) {            // MASTER: proc of rank 0
  total = integral;
  for (source = 0; source < p; source++) {
    MPI_Recv(integral, from source, …);
    total = total + integral;
  }
} else {                       // SLAVE: procs of rank 1, 2, …
  MPI_Send(region_integral, to proc 0, …);
}

SLIDE 8

Why are MPI programs error-prone?

  • API includes > 300 functions
  • Each function has complex semantics
  • Typical programs employ a tenth of these functions

– It is a different tenth for each application type

  • Non-blocking MPI functions (for performance)
  • Out of order completion (for performance)
  • Non-deterministic communication matches
  • Wait, Probe, Barrier, Broadcast, communication spaces, types, …


SLIDE 9

MPI Bugs of interest : Deadlocks

P0: s(P1); r(P1);
P1: s(P0); r(P0);

Can deadlock if “head to head” sends are done with insufficient buffering
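This “head to head” pattern can be checked mechanically. Below is a toy Python model of zero-buffer (fully synchronous) send/receive semantics; it is a sketch for illustration, not ISP itself, and the encoding `('s', peer)` / `('r', peer)` and the name `deadlocks` are made up here.

```python
def deadlocks(progs):
    """progs: rank -> list of ('s', peer) sends and ('r', peer) receives.
    Zero-buffer semantics: a send completes only by rendezvous with a
    matching receive that its peer is currently blocked at."""
    pc = {r: 0 for r in progs}                      # per-rank program counter
    while any(pc[r] < len(progs[r]) for r in progs):
        heads = {r: progs[r][pc[r]] for r in progs if pc[r] < len(progs[r])}
        pair = next(((r, peer) for r, (op, peer) in heads.items()
                     if op == 's' and heads.get(peer) == ('r', r)), None)
        if pair is None:
            return True     # every rank is blocked; no rendezvous is possible
        s, d = pair
        pc[s] += 1
        pc[d] += 1
    return False

# Head-to-head sends (the slide's pattern): P0 and P1 both send first.
head_to_head = {0: [('s', 1), ('r', 1)], 1: [('s', 0), ('r', 0)]}
# Reordering one rank breaks the cyclic wait.
fixed = {0: [('s', 1), ('r', 1)], 1: [('r', 0), ('s', 0)]}
```

With buffering the sends may complete and mask the bug, which is exactly why the deadlock is platform-dependent.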

SLIDE 10

MPI Bugs of interest : Deadlocks


P0: Broadcast; Barrier;
P1: Barrier; Broadcast;

Can deadlock if Collective function calls are incorrectly placed

SLIDE 11

P0: r(*); r(P1);
P1: s(P0);
P2: s(P0);

OK: the programmer expected this communication match

MPI Bugs of interest : Communication Races

SLIDE 12

P0: r(*); r(P1);
P1: s(P0);
P2: s(P0);

OK: the programmer expected r(*) to match P2’s send, so that r(P1) matches P1’s send
NOK: the programmer did not expect r(*) to match P1’s send; then r(P1) can never match, which deadlocks

MPI Bugs of interest : Communication Races
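Such wildcard races can be found by exhaustively exploring every possible match, which is the essence of dynamic verification. The sketch below is a toy explorer in the spirit of (but far simpler than) ISP; the encoding and the name `outcomes` are hypothetical.

```python
def outcomes(progs):
    """Explore every possible pairing of ('r', src) receives (src may be '*')
    with ('s', dst) sends under synchronous semantics; return the set of
    outcomes observed over all schedules."""
    results = set()

    def step(pc):
        heads = {r: progs[r][pc[r]] for r in progs if pc[r] < len(progs[r])}
        if not heads:
            results.add('ok')           # every rank ran to completion
            return
        matched = False
        for r, (op, src) in heads.items():
            if op != 'r':
                continue
            for s, (op2, dst) in heads.items():
                if op2 == 's' and dst == r and src in ('*', s):
                    matched = True      # branch: commit this match, recurse
                    pc2 = dict(pc)
                    pc2[r] += 1
                    pc2[s] += 1
                    step(pc2)
        if not matched:
            results.add('deadlock')     # ranks remain, nothing can match

    step({r: 0 for r in progs})
    return results

# The slide's race: P0 posts r(*) then r(P1); P1 and P2 each send to P0.
race = {0: [('r', '*'), ('r', 1)], 1: [('s', 0)], 2: [('s', 0)]}
```

One interleaving succeeds and another deadlocks, so a single lucky test run proves nothing.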

SLIDE 13

Resource Leak Pattern…


P0

  • some_allocation_operation

FORGOTTEN DEALLOCATION !!

SLIDE 14

Assertion Violations


Assert statements planted in sequential code

SLIDE 15

What approach to take?

SLIDE 16

Goal: Address Current Programming Realities

  • Code written using mature libraries (MPI, OpenMP, PThreads, …)
  • API calls made from real programming languages (C, Fortran, C++)
  • Runtime semantics determined by realistic compilers and runtimes

Model building and model maintenance have HUGE costs (I would assert: “impossible in practice”) and do not ensure confidence!!


SLIDE 21
  • Users want a standard push-button debugger-like interface

– But one that offers coverage guarantees and deeper insights

  • Existing testing methods are woefully inadequate!

Why dynamic analysis?


  ✗ Testing: causes omissions
  ✗ Static analysis: causes false alarms
  ✗ Model-based verification: models unknown
  √ Dynamic Verification:
    • No omissions
    • No false alarms
    • No need for modeling
SLIDE 22

The “Crooked Barrier” example

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from ANY); MPI_Barrier;

Can P1’s Isend (non-blocking send) match P2’s Irecv (non-blocking receive) ?

SLIDE 23

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from ANY); MPI_Barrier;

The “Crooked Barrier” example

Yes!


SLIDE 29

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from ANY); MPI_Barrier;

Our Dynamic Analysis must be aware of the fact that MPI_Barrier can complete before Isend / Irecv …

The “Crooked Barrier” example
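The barrier-aware reasoning above can be re-enacted in a few lines. The sketch below is a toy explorer (not ISP's actual POE algorithm) in which Isend/Irecv are merely *posted* and can complete after the barrier; the op encoding and the name `wildcard_senders` are made up for illustration.

```python
def wildcard_senders(progs):
    """progs: rank -> ops; ('S', dst) = MPI_Isend, ('R', src) = MPI_Irecv
    (both non-blocking: they are only *posted*), 'B' = MPI_Barrier.
    Returns every sender that some schedule lets the wildcard Irecv match."""
    senders = set()

    def step(pc, sends, recvs):
        # choice 1: complete a compatible posted send/receive pair
        for i, (src, dst) in enumerate(sends):
            for j, (rr, frm) in enumerate(recvs):
                if dst == rr and frm in ('*', src):
                    if frm == '*':
                        senders.add(src)
                    step(pc, sends[:i] + sends[i+1:], recvs[:j] + recvs[j+1:])
        # choice 2: post the next non-blocking op of some rank
        for r in progs:
            if pc[r] < len(progs[r]) and progs[r][pc[r]] != 'B':
                op = progs[r][pc[r]]
                pc2 = dict(pc)
                pc2[r] += 1
                if op[0] == 'S':
                    step(pc2, sends + [(r, op[1])], recvs)
                else:
                    step(pc2, sends, recvs + [(r, op[1])])
        # choice 3: every rank sits at a barrier, so all cross together
        if all(pc[r] < len(progs[r]) and progs[r][pc[r]] == 'B' for r in progs):
            step({r: pc[r] + 1 for r in progs}, sends, recvs)

    step({r: 0 for r in progs}, [], [])
    return senders

# The crooked barrier: P1's Isend comes *after* the barrier, yet it can
# still match P2's Irecv(from ANY), which stays pending across the barrier.
crooked = {0: [('S', 2), 'B'], 1: ['B', ('S', 2)], 2: [('R', '*'), 'B']}
```

Both P0 and P1 show up as possible matches, which is exactly why the analysis must know that MPI_Barrier can complete before Isend/Irecv do.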

SLIDE 32

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from ANY); MPI_Barrier;

Notice that while we can issue { P0’s Isend , P2’s Irecv } we cannot issue { P1’s Isend, P2’s Irecv }

  • because P1’s Isend can’t be issued WITHOUT ALSO issuing P0’s Isend
  • but then we have no control over the MPI runtime’s match selection

Why the standard-type of “replay different matches” does not work in presence of Barrier

SLIDE 33

Approach: Issue MPI Operations Out of Order!

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from ANY); MPI_Barrier;

SLIDE 34

First, Fire the Barrier statements!

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from ANY); MPI_Barrier;

SLIDE 35

Then rewrite “ANY” to “from P0” and do one interleaving

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from P0); MPI_Barrier;

Pursue this interleaving

SLIDE 36

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from P0); MPI_Barrier;

Pursue this interleaving

  • Dynamic Rewriting Forces MPI Runtime to schedule the way we want
  • Several such techniques to ensure “progress” across different MPI libraries

Then rewrite “ANY” to “from P0” and do one interleaving

SLIDE 37

P0: MPI_Isend(to P2); MPI_Barrier;
P1: MPI_Barrier; MPI_Isend(to P2);
P2: MPI_Irecv(from P1); MPI_Barrier;

Then restart and pursue this interleaving

Then rewrite “ANY” to “from P1” and do the other

SLIDE 38

Workflow of ISP

[diagram: MPI Program → Interposition Layer → Executable (Proc1, Proc2, …, Procn) → Run → MPI Runtime; the Scheduler generates ALL RELEVANT schedules (Mazurkiewicz traces)]

SLIDE 39

POE

P0: Barrier; Isend(1, req); Wait(req)
P1: Irecv(*, req); Barrier; Recv(2); Wait(req)
P2: Isend(1, req); Wait(req); Barrier

[animation: the Scheduler intercepts the processes’ calls and issues them (“sendNext”) to the MPI Runtime]


SLIDE 42

POE

P0: Barrier; Isend(1, req); Wait(req)
P1: Irecv(*, req); Barrier; Recv(2); Wait(req)
P2: Isend(1, req); Wait(req); Barrier

[animation: after the barriers are issued, the Scheduler collects Wait(req), Recv(2), and Isend(1); no match-set remains: Deadlock!]

SLIDE 43

POE Contributions

  • ISP (using POE) is the ONLY dynamic model checker for MPI
  • Insightful formulation of the reduction algorithm
    – MPI semantics
    – Distinguishing Match and Complete events
    – Completes-before
    – Prioritized execution giving reduction, and a guarantee of maximal senders matching receivers
  • Works really well
    – Large examples (e.g. ParMETIS, 14 KLOC) finish in one interleaving
    – Even if wildcard receives are used, POE often finds that no non-determinism arises
    – Valuable byproduct: removal of functionally irrelevant barriers
      • If a Barrier does not help introduce orderings that confine non-determinism…
      • …then one can remove the barrier


SLIDE 44

Visual Studio and Java GUI; Eclipse is planned

SLIDE 45

Present Situation

  • ISP: a push-button dynamic verifier for MPI programs
  • Finds deadlocks, resource leaks, assertion violations
    – Code-level model checking: no manual model building
    – Guarantees soundness for one test input
    – Works on MacOS, Linux, Windows
    – Works with MPICH2, OpenMPI, MS MPI
    – Verifies 14 KLOC in seconds
  • ISP is available for download:
    http://cs.utah.edu/formal_verification/ISP-release


SLIDE 46

RESULTS USING ISP

  • The only push-button model checker for MPI/C programs
    – (the only other model-checking approach is MPI-SPIN)
  • Testing misses deadlocks even on a page of code
    – See http://www.cs.utah.edu/formal_verification/ISP_Tests
  • ISP is meant as a safety net during manual optimizations
    – A programmer takes liberties that they would otherwise not
    – Value amply evident even when tuning the Matrix Mult code
  • Deadlock found in one test of MADRE (3K LOC)
    – Later found to be a known deadlock


SLIDE 47

RESULTS USING ISP

  • Handled these examples
    – IRS (Sequoia benchmarks), ParMETIS (14K LOC), MADRE (3K LOC)
  • Working on these examples
    – MPI-BLAST, ADLB
  • There is significant outreach work remaining
    – The user community of MPI is receptive to FV
    – But they really have no experience evaluating a model checker
    – We are offering many tutorials this year
      • ICS 2009
      • EuroPVM / MPI 2009 (likely)
      • Applying for Cluster 2009
      • Applying for Super Computing 2009


SLIDE 48

Example of MPI Code (Mat Mat Mult)

[figure: A × B = C]

SLIDE 49

Example of MPI Code (Mat Mat Mult)

[figure: A × B = C; every process participates in MPI_Bcast]

SLIDE 50

Example of MPI Code (Mat Mat Mult)

[figure: A × B = C; the master MPI_Sends rows of A, each slave MPI_Recvs its row]


SLIDE 59

Salient Code Features

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

// Master (rank 0); slaves (ranks 1-4)

SLIDE 60

Salient Code Features

if (myid == master) {
  ...
  MPI_Bcast(b, brows*bcols, MPI_FLOAT, master, …);
  ...
} else {   // All slaves do this
  ...
  MPI_Bcast(b, brows*bcols, MPI_FLOAT, master, …);
  ...
}

SLIDE 61

Salient Code Features

if (myid == master) {
  ...
  for (i = 0; i < numprocs-1; i++) {
    for (j = 0; j < acols; j++) { buffer[j] = a[i*acols+j]; }
    MPI_Send(buffer, acols, MPI_FLOAT, i+1, …);  // blocks till buffer is copied into the system buffer
    numsent++;
  }
} else {   // slaves
  ...
  while (1) {
    ...
    MPI_Recv(buffer, acols, MPI_FLOAT, master, …);
    ...
  }
}

SLIDE 62

Handling Rows >> Processors …

[figure: MPI_Send / MPI_Recv] Send the next row to the first slave, which by now must be free.

slide-63
SLIDE 63

Handling Rows >> Processors …

[figure: MPI_Send / MPI_Recv] OR send the next row to the first slave that returns an answer!

SLIDE 64

Salient Code Features

if (myid == master) {
  ...
  for (i = 0; i < crows; i++) {
    MPI_Recv(ans, ccols, MPI_FLOAT, FROM FIRST-PROCESSOR, ...);
    ...
    if (numsent < arows) {
      for (j = 0; j < acols; j++) { buffer[j] = a[numsent*acols+j]; }
      MPI_Send(buffer, acols, MPI_FLOAT, BACK TO FIRST-PROCESSOR, ...);
      numsent++;
      ...
    }
  }
}


SLIDE 66

Optimization

if (myid == master) {
  ...
  for (i = 0; i < crows; i++) {
    MPI_Recv(ans, ccols, MPI_FLOAT, FROM ANYBODY, ...);
    ...
    if (numsent < arows) {
      for (j = 0; j < acols; j++) { buffer[j] = a[numsent*acols+j]; }
      MPI_Send(buffer, acols, MPI_FLOAT, BACK TO THAT BODY, ...);
      numsent++;
      ...
    }
  }
}

Shows that wildcard receives can arise quite naturally …
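The master/slave loop above can be re-enacted sequentially to see why the wildcard helps. This is a toy Python sketch, not MPI code; the names `master`, `worker`, and the round-robin choice of "whichever slave answers" are assumptions made for illustration.

```python
from collections import deque

def worker(row, b):
    # one slave's job: multiply its row of A against all of B
    return [sum(x * b[k][j] for k, x in enumerate(row))
            for j in range(len(b[0]))]

def master(a, b, numslaves):
    """Mimics the optimized master: one row per slave to start, then each
    answer ("MPI_Recv FROM ANYBODY") frees that slave for the next row
    ("MPI_Send BACK TO THAT BODY")."""
    todo = deque(enumerate(a))
    in_flight = {}                      # slave id -> (row index, row)
    c = [None] * len(a)
    for s in range(numslaves):
        if todo:
            in_flight[s] = todo.popleft()
    while in_flight:
        s = next(iter(in_flight))       # some slave returns an answer
        i, row = in_flight.pop(s)
        c[i] = worker(row, b)
        if todo:
            in_flight[s] = todo.popleft()
    return c

a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
```

The result is the same for any number of slaves; only the schedule of row completions differs, which is the non-determinism a wildcard receive introduces.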

SLIDE 67

Further Optimization

if (myid == master) {
  ...
  for (i = 0; i < crows; i++) {
    MPI_Recv(ans, ccols, MPI_FLOAT, FROM ANYBODY, ...);
    ...
    if (numsent < arows) {
      for (j = 0; j < acols; j++) { buffer[j] = a[numsent*acols+j]; }
      // … here, wait for the previous Isend (if any) to finish …
      MPI_Isend(buffer, acols, MPI_FLOAT, BACK TO THAT BODY, ...);
      numsent++;
      ...
    }
  }
}

SLIDE 68


Run Visual Studio ISP Plug-in Demo

SLIDE 69


Slides on Inspect

SLIDE 70

Inspect’s Workflow


School of Computing, University of Utah

[diagram: Multithreaded C Program → (instrumentation) → Instrumented Program → (compiled against a thread-library wrapper) → Executable; at run time, threads 1..n exchange request/permit messages with the Scheduler]

http://www.cs.utah.edu/~yuyang/inspect

SLIDE 71

Overview of the source transformation done by Inspect

[diagram: Multithreaded C Program → inter-procedural, flow-sensitive, context-insensitive alias analysis → thread-escape analysis → intra-procedural dataflow analysis → source-code transformation → Instrumented Program]

SLIDE 72

Result of instrumentation

// BEFORE INSTRUMENTATION
void * Philosopher(void * arg) {
  int i;
  i = (int) arg;
  ...
  pthread_mutex_lock(&mutexes[i%3]);
  ...
  while (permits[i%3] == 0) {
    printf("P%d : tryget F%d\n", i, i%3);
    pthread_cond_wait(...);
  }
  ...
  permits[i%3] = 0;
  ...
  pthread_cond_signal(&conditionVars[i%3]);
  pthread_mutex_unlock(&mutexes[i%3]);
  return NULL;
}

// AFTER INSTRUMENTATION
void *Philosopher(void *arg) {
  int i;
  pthread_mutex_t *tmp;
  inspect_thread_start("Philosopher");
  i = (int) arg;
  tmp = &mutexes[i % 3];
  ...
  inspect_mutex_lock(tmp);
  ...
  while (1) {
    __cil_tmp43 = read_shared_0(&permits[i % 3]);
    if (! __cil_tmp32) { break; }
    __cil_tmp33 = i % 3;
    ...
    tmp___0 = __cil_tmp33;
    ...
    inspect_cond_wait(...);
  }
  ...
  write_shared_1(&permits[i % 3], 0);
  ...
  inspect_cond_signal(tmp___25);
  ...

SLIDE 73

Inspect animation

[diagram: the program under test runs with a visible-operation interceptor; it exchanges “action request” / “thread permission” messages with the Scheduler over Unix domain sockets; the Scheduler maintains a message buffer, a state stack, and the DPOR algorithm]

SLIDE 74

How does Inspect avoid being killed by the exponential number of thread interleavings ??


SLIDE 75

p threads with n actions each: #interleavings = (n·p)! / (n!)^p

[figure: Thread 1 … Thread p, each executing actions 1..n]

  • p = R, n = 1  →  R! interleavings
  • p = 3, n = 5  →  ≈ 10^6 interleavings
  • p = 3, n = 6  →  ≈ 17 × 10^6 interleavings
  • p = 4, n = 5  →  ≈ 10^10 interleavings
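The counts quoted above follow directly from the multinomial formula, and are easy to reproduce (the function name `interleavings` is chosen here for illustration):

```python
from math import factorial

def interleavings(p, n):
    # p threads with n (mutually ordered) actions each:
    # number of schedules = (n*p)! / (n!)**p
    return factorial(n * p) // factorial(n) ** p

# the orders of magnitude quoted on the slide
counts = {(3, 5): interleavings(3, 5),   # ~10^6
          (3, 6): interleavings(3, 6),   # ~17 * 10^6
          (4, 5): interleavings(4, 5)}   # ~10^10
```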
SLIDE 76

How does Inspect avoid being killed by the exponential number of thread interleavings? Ans: Inspect uses Dynamic Partial Order Reduction (DPOR). Basically, it interleaves threads ONLY when dependencies exist between thread actions!!


SLIDE 77

A concrete example of interleaving reductions



SLIDE 79

On the HUGE importance of DPOR


// BEFORE INSTRUMENTATION
void * thread_A(void* arg) {
  pthread_mutex_lock(&mutex);
  A_count++;
  pthread_mutex_unlock(&mutex);
}
void * thread_B(void * arg) {   // its own lock and its own counter
  pthread_mutex_lock(&lock);
  B_count++;
  pthread_mutex_unlock(&lock);
}

// AFTER INSTRUMENTATION (transitions are shown as bands); thread_B is similar
void *thread_A(void *arg) {
  void *__retres2;
  int __cil_tmp3;
  int __cil_tmp4;
  inspect_thread_start("thread_A");
  inspect_mutex_lock(&mutex);
  __cil_tmp4 = read_shared_0(&A_count);
  __cil_tmp3 = __cil_tmp4 + 1;
  write_shared_1(&A_count, __cil_tmp3);
  inspect_mutex_unlock(&mutex);
  __retres2 = (void *)0;
  inspect_thread_end();
  return (__retres2);
}

  • ONE interleaving with DPOR
  • 252 = (10!) / (5!)2 without DPOR
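Why one interleaving suffices here can be seen with a toy Mazurkiewicz-trace computation (a sketch, not Inspect's DPOR): group interleavings into equivalence classes by commuting adjacent independent actions into a canonical order. The names and the independence relation below are made up for this illustration.

```python
def interleavings(a, b):
    # all ways to merge two action sequences, preserving each thread's order
    if not a or not b:
        return {tuple(a or b)}
    return ({(a[0],) + rest for rest in interleavings(a[1:], b)} |
            {(b[0],) + rest for rest in interleavings(a, b[1:])})

def canonical(word, independent):
    # trace representative: bubble independent adjacent actions into a
    # fixed order until no swap applies
    w = list(word)
    changed = True
    while changed:
        changed = False
        for i in range(len(w) - 1):
            if independent(w[i], w[i + 1]) and w[i + 1] < w[i]:
                w[i], w[i + 1] = w[i + 1], w[i]
                changed = True
    return tuple(w)

A = [('A', i) for i in range(5)]      # thread_A: 5 actions on mutex/A_count
B = [('B', i) for i in range(5)]      # thread_B: 5 actions on lock/B_count
disjoint = lambda x, y: x[0] != y[0]  # the threads share no lock or variable

all_runs = interleavings(A, B)
classes = {canonical(w, disjoint) for w in all_runs}
```

With disjoint locks and counters every cross-thread pair commutes, so the 252 interleavings collapse into a single trace; if every pair were dependent, each interleaving would be its own class.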
SLIDE 80

More eye-popping numbers

  • bzip2smp has 6000 lines of code split among 6 threads
  • Roughly, it has a theoretical max number of interleavings on the order of
    – (6000!) / (1000!)^6 == ??
    – This is the execution space that a testing tool foolishly tries to navigate
    – bzip2smp with Inspect finished in 51,000 interleavings over a few hours
    – THIS IS THE RELEVANT SET OF INTERLEAVINGS
      • MORE FORMALLY: its Mazurkiewicz trace set

SLIDE 81

Dynamic Partial Order Reduction (DPOR) “animatronics”

P0: lock(y) … unlock(y)
P1: lock(x) … unlock(x)
P2: lock(x) … unlock(x)

[animation: DPOR explores one order, e.g. L0 U0 L1 U1 L2 U2, then backtracks and explores L0 U0 L2 U2 L1 U1]

SLIDE 82

Another DPOR animation (to help show how DDPOR works…)


SLIDE 83

A Simple DPOR Example

{}, {}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

{ BT }, { Done }

SLIDE 84

t0: lock {}, {}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 85

t0: lock t0: unlock {}, {}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 86

t0: lock t0: unlock t1: lock {}, {}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 87

t0: lock t0: unlock t1: lock {t1}, {t0}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 88

t0: lock t0: unlock t1: lock t1: unlock t2: lock {t1}, {t0} {}, {}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 89

t0: lock t0: unlock t1: lock t1: unlock t2: lock {t1}, {t0} {t2}, {t1}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 90

t0: lock t0: unlock t1: lock t1: unlock t2: lock t2: unlock {t1}, {t0} {t2}, {t1}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 91

t0: lock t0: unlock t1: lock t1: unlock t2: lock {t1}, {t0} {t2}, {t1}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 92

t0: lock t0: unlock {t1}, {t0} {t2}, {t1}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 93

t0: lock t0: unlock t2: lock {t1,t2}, {t0} {}, {t1, t2}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 94

t0: lock t0: unlock t2: lock t2: unlock {t1,t2}, {t0} {}, {t1, t2}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 95

t0: lock t0: unlock {t1,t2}, {t0} {}, {t1, t2}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 96

{t2}, {t0,t1}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 97

t1: lock t1: unlock {t2}, {t0, t1}

t0: lock(t) unlock(t) t1: lock(t) unlock(t) t2: lock(t) unlock(t)

A Simple DPOR Example

{ BT }, { Done }

SLIDE 98

This is how DDPOR works

  • Once the backtrack set gets populated, it ships work descriptions to other nodes
  • We obtain distributed model checking using MPI
  • Once we figured out a crucial heuristic (SPIN 2007), we have managed to get linear speed-up… so far…

SLIDE 99

We have devised a work-distribution scheme (SPIN 2007)

[diagram: workers a and b exchange “request unloading”, “idle node id”, “work description”, and “report result” messages with a central load balancer]

SLIDE 100

Speedup on aget

SLIDE 101

Speedup on bbuf

SLIDE 102

Conclusions

  • Dynamic analysis is catching on for very good reasons
    – Pioneered by the VeriSoft effort (Godefroid, POPL 1997)
    – CHESS
    – MODIST
    – Our work on MCAPI
  • Integration of threading, messaging, reactive behaviors…
  • Engineering dynamic analyzers can be greatly facilitated if API designers are considerate to dynamic-verification tool builders

  • Immense user interface design opportunities to facilitate debugging
