Practical Formal Verification of MPI and Thread Programs Ganesh - PowerPoint PPT Presentation

First, Fire the Barrier statements!

  P0                  P1                  P2
  ---                 ---                 ---
  MPI_Isend(to P2)    MPI_Barrier         MPI_Irecv(from ANY)
  MPI_Barrier         MPI_Isend(to P2)    MPI_Barrier

Then rewrite “ANY” to “from P0” and do one interleaving

  P0                  P1                  P2
  ---                 ---                 ---
  MPI_Isend(to P2)    MPI_Barrier         MPI_Irecv(from P0)
  MPI_Barrier         MPI_Isend(to P2)    MPI_Barrier

  Pursue this interleaving
  • Dynamic Rewriting Forces MPI Runtime to schedule the way we want
  • Several such techniques to ensure “progress” across different MPI libraries

Then rewrite “ANY” to “from P1” and do the other

  P0                  P1                  P2
  ---                 ---                 ---
  MPI_Isend(to P2)    MPI_Barrier         MPI_Irecv(from P1)
  MPI_Barrier         MPI_Isend(to P2)    MPI_Barrier

  Then restart and pursue this interleaving
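The rewriting step above can be sketched in plain C, without an MPI runtime. This is only an illustration under stated assumptions: `pending_send`, `match_set` and `ANY_SOURCE` are hypothetical names (not ISP's code and not the MPI constant `MPI_ANY_SOURCE`), and every listed send is assumed to be available to the receive. The function collects the senders a receive could match; for a wildcard receive, each returned sender corresponds to one rewrite ("from P0", "from P1") and hence one interleaving to pursue.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_SOURCE (-1)  /* hypothetical stand-in for a wildcard source */

/* One send that has been issued but not yet matched (hypothetical model). */
typedef struct {
    int from;  /* rank of the sender */
    int to;    /* rank of the destination */
} pending_send;

/* Write to `out` the distinct senders whose pending send could match a
 * receive posted by `receiver` with source `source` (a rank or ANY_SOURCE).
 * Returns the number of candidates written, at most `out_cap`. */
size_t match_set(const pending_send *sends, size_t nsends,
                 int receiver, int source, int *out, size_t out_cap) {
    assert(sends != NULL || nsends == 0);
    size_t n = 0;
    for (size_t i = 0; i < nsends; i++) {
        if (sends[i].to != receiver) continue;
        if (source != ANY_SOURCE && sends[i].from != source) continue;
        int seen = 0;
        for (size_t k = 0; k < n; k++) {
            if (out[k] == sends[i].from) seen = 1;
        }
        if (!seen && n < out_cap) out[n++] = sends[i].from;
    }
    return n;
}
```

For the three-process example, the pending sends are P0 to P2 and P1 to P2, so a wildcard receive on P2 yields the candidates 0 and 1: one run with the receive rewritten to "from P0", then a restart with "from P1".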

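Workflow of ISP

  [Diagram: the MPI Program is compiled into an Executable and Run as Proc 1, Proc 2, …, Proc n. An Interposition Layer sits between these processes and the MPI Runtime. It is driven by a Scheduler that generates ALL RELEVANT schedules (Mazurkiewicz Traces).]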

POE Scheduler

  P0                 P1                 P2
  Isend(1, req)      Irecv(*, req)      Barrier
  Barrier            Barrier            Isend(1, req)
  Wait(req)          Recv(2)            Wait(req)
                     Wait(req)

  [Animation, four frames: the processes hand their MPI calls to the POE Scheduler (sendNext), which sits above the MPI Runtime. The scheduler first collects Isend(1), then Irecv(*) and the Barriers. In the last frame it has rewritten Irecv(*) to Irecv(2), Recv(2) is left with "No Match-Set", and the scheduler reports "Deadlock!"]
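The deadlock in the last frame can be reproduced with a small, self-contained model. This is a sketch under assumptions, not the POE algorithm: `explore_rewrites`, `outcome` and `ANY_SOURCE` are hypothetical names, each sender is assumed to issue at most one send, and all sends are treated as available to every receive of one process. It tries each possible match for the receives in program order and counts how many choices complete and how many leave a receive with an empty match-set.

```c
#include <assert.h>
#include <stdbool.h>

#define ANY_SOURCE (-1)  /* hypothetical stand-in for a wildcard source */
#define MAX_SENDS  8

typedef struct {
    int completed;   /* choices in which every receive found a send */
    int deadlocked;  /* choices in which some receive had an empty match-set */
} outcome;

/* Match receive number r, then recurse on the remaining receives. */
static void explore_from(const int *send_from, bool *used, int nsends,
                         const int *recv_src, int nrecvs, int r, outcome *out) {
    if (r == nrecvs) {
        out->completed++;
        return;
    }
    bool matched = false;
    for (int i = 0; i < nsends; i++) {
        if (used[i]) continue;
        if (recv_src[r] != ANY_SOURCE && recv_src[r] != send_from[i]) continue;
        matched = true;
        used[i] = true;   /* this send is consumed by receive r */
        explore_from(send_from, used, nsends, recv_src, nrecvs, r + 1, out);
        used[i] = false;  /* undo, then try the next candidate */
    }
    if (!matched) out->deadlocked++;  /* empty match-set: the receive blocks forever */
}

/* send_from[i] is the rank issuing send i; recv_src[r] is the source named by
 * the r-th receive of the receiving process (a rank or ANY_SOURCE). */
outcome explore_rewrites(const int *send_from, int nsends,
                         const int *recv_src, int nrecvs) {
    assert(nsends >= 0 && nsends <= MAX_SENDS);
    bool used[MAX_SENDS] = { false };
    outcome out = { 0, 0 };
    explore_from(send_from, used, nsends, recv_src, nrecvs, 0, &out);
    return out;
}
```

With sends from P0 and P2 and the receives Irecv(*) followed by Recv(2) on P1, one choice completes (the wildcard takes P0's send) and one deadlocks (the wildcard takes P2's send, leaving Recv(2) without a sender).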

POE Contributions
• ISP (using POE) is the ONLY dynamic model checker for MPI
• Insightful formulation of reduction algorithm
  – MPI Semantics
  – Distinguishing Match and Complete events
  – Completes-before
  – Prioritized execution giving reduction, and guarantee of maximal senders matching receivers
• Works really well
  – Large examples (e.g. ParMETIS – 14 KLOC) finish in one interleaving
  – Even if wildcard receives are used, POE often finds that no non-determinism arises
  – Valuable byproduct: Removal of Functionally Irrelevant Barriers
    • If Barrier does not help introduce orderings that confine non-determinism
    • ..then one can remove the barrier

Visual Studio and Java GUI; Eclipse is planned

Present Situation
• ISP: a push-button dynamic verifier for MPI programs
• Find deadlocks, resource leaks, assertion violations
  – Code level model checking – no manual model building
  – Guarantee soundness for one test input
  – Works for MacOS, Linux, Windows
  – Works for MPICH2, OpenMPI, MS MPI
  – Verifies 14KLOC in seconds
• ISP is available for download: http://cs.utah.edu/formal_verification/ISP-release

RESULTS USING ISP
• The only push-button model checker for MPI/C programs
  – (the only other model checking approach is MPI-SPIN)
• Testing misses deadlocks even on a page of code
  – See http://www.cs.utah.edu/formal_verification/ISP_Tests
• ISP is meant as a safety-net during manual optimizations
  – A programmer takes liberties that they would otherwise not
  – Value amply evident even when tuning the Matrix Mult code
• Deadlock found in one test of MADRE (3K LOC)
  – Later found to be a known deadlock

RESULTS USING ISP
• Handled these examples
  – IRS (Sequoia Benchmarks), ParMETIS (14K LOC), MADRE (3K LOC)
• Working on these examples
  – MPI-BLAST, ADLB
• There is significant outreach work remaining
  – The user community of MPI is receptive to FV
  – But they really have no experience evaluating a model checker
  – We are offering many tutorials this year
    • ICS 2009
    • EuroPVM / MPI 2009 (likely)
    • Applying for Cluster 2009
    • Applying for Super Computing 2009

Example of MPI Code (Mat Mat Mult)

  [Figure, animation frames: a matrix product drawn as "X =". Successive frames overlay MPI_Bcast, then MPI_Send with MPI_Recv, then MPI_Recv with MPI_Send.]

Salient Code Features

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  [Diagram labels: Master (rank 0); Slaves (ranks 1-4)]

Salient Code Features

  if (myid == master) {
    ...
    MPI_Bcast(b, brows*bcols, MPI_FLOAT, master, …);
    ...
  } else { // All Slaves do this
    ...
    MPI_Bcast(b, brows*bcols, MPI_FLOAT, master, …);
    ...
  }

Salient Code Features

  if (myid == master) {
    ...
    for (i = 0; i < numprocs-1; i++) {
      for (j = 0; j < acols; j++) {
        buffer[j] = a[i*acols+j];
      }
      MPI_Send(buffer, acols, MPI_FLOAT, i+1, …);  // Block till buffer is copied into System Buffer
      numsent++;
    }
  }
  else { // slaves
    ...
    while (1) {
      ...
      MPI_Recv(buffer, acols, MPI_FLOAT, master, …);
      ...
    }
  }

Handling Rows >> Processors …

  [Figure labels: MPI_Recv, MPI_Send]
  Send Next Row to First Slave, which by now must be free
  OR
  Send Next Row to First Slave that returns the answer!

Salient Code Features

  if (myid == master) {
    ...
    for (i = 0; i < crows; i++) {
      MPI_Recv(ans, ccols, MPI_FLOAT, FROM FIRST-PROCESSOR, ...);
      ...
      if (numsent < arows) {
        for (j = 0; j < acols; j++) {
          buffer[j] = a[numsent*acols+j];
        }
        MPI_Send(buffer, acols, MPI_FLOAT, BACK TO FIRST-PROCESSOR, ...);
        numsent++;
        ...
      }
    }
  }

Optimization

  Shows that wildcard receives can arise quite naturally …

  if (myid == master) {
    ...
    for (i = 0; i < crows; i++) {
      // FROM ANYBODY: a wildcard receive (MPI_ANY_SOURCE)
      MPI_Recv(ans, ccols, MPI_FLOAT, FROM ANYBODY, ...);
      ...
      if (numsent < arows) {
        for (j = 0; j < acols; j++) {
          buffer[j] = a[numsent*acols+j];
        }
        // BACK TO THAT BODY: the sender recorded in the receive's status (MPI_SOURCE)
        MPI_Send(buffer, acols, MPI_FLOAT, BACK TO THAT BODY, ...);
        numsent++;
        ...
      }
    }
  }

Further Optimization

  if (myid == master) {
    ...
    for (i = 0; i < crows; i++) {
      MPI_Recv(ans, ccols, MPI_FLOAT, FROM ANYBODY, ...);
      ...
      if (numsent < arows) {
        for (j = 0; j < acols; j++) {
          buffer[j] = a[numsent*acols+j];
        }
        … here, wait for previous Isend (if any) to finish …
        MPI_Isend(buffer, acols, MPI_FLOAT, BACK TO THAT BODY, ...);
        numsent++;
        ...
      }
    }
  }

Run Visual Studio ISP Plug-in Demo

Slides on Inspect

Inspect’s Workflow
http://www.cs.utah.edu/~yuyang/inspect

  [Diagram: Multithreaded C Program → (instrumentation) → Instrumented Program → (compile) → Executable. The executable runs thread 1 … thread n on top of a Thread Library Wrapper; the threads exchange request/permit messages with the Scheduler.]

  School of Computing, University of Utah, 5/8/2009

Overview of the source transformation done by Inspect

  Multithreaded C Program
    → Inter-procedural Flow-sensitive Context-insensitive Alias Analysis
    → Thread Escape Analysis
    → Intra-procedural Dataflow Analysis
    → Source code transformation
    → Instrumented Program

Result of instrumentation

  Original program:

  void * Philosopher(void * arg){
    int i;
    i = (int)arg;
    ...
    pthread_mutex_lock(&mutexes[i%3]);
    ...
    while (permits[i%3] == 0) {
      printf("P%d : tryget F%d\n", i, i%3);
      pthread_cond_wait(...);
    }
    ...
    permits[i%3] = 0;
    ...
    pthread_cond_signal(&conditionVars[i%3]);
    pthread_mutex_unlock(&mutexes[i%3]);
    return NULL;
  }

  Instrumented program:

  void *Philosopher(void *arg )
  { int i ;
    pthread_mutex_t *tmp ;
    {
    inspect_thread_start("Philosopher");
    i = (int )arg;
    tmp = & mutexes[i % 3];
    …
    inspect_mutex_lock(tmp);
    …
    while (1) {
      __cil_tmp43 = read_shared_0(& permits[i % 3]);
      if (! __cil_tmp32) {
        break;
      }
      __cil_tmp33 = i % 3;
      …
      tmp___0 = __cil_tmp33;
      …
      inspect_cond_wait(...);
    }
    ...
    write_shared_1(& permits[i % 3], 0);
    ...
    inspect_cond_signal(tmp___25);
    ...

Inspect animation

  [Diagram: the Program under test, through its Visible operation interceptor, sends thread action requests to the Scheduler and receives permission in return. Both sides communicate through a Message Buffer over Unix domain sockets. The Scheduler runs DPOR and keeps a State stack.]

How does Inspect avoid being killed by the exponential number of thread interleavings ??

p threads with n actions each: #interleavings = (n·p)! / (n!)^p

  [Figure: Thread 1 …. Thread p, each listing its actions 1:, 2:, 3:, 4:, …, n:]

• p = R, n = 1: R! interleavings
• p = 3, n = 5: about 10^6 interleavings
• p = 3, n = 6: about 17 * 10^6 interleavings
• p = 4, n = 5: about 10^10 interleavings
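The counts on this slide can be checked with a few lines of C. This is a minimal sketch; the function names `binomial` and `interleavings` are illustrative choices. It uses the identity (n·p)! / (n!)^p = C(n, n) · C(2n, n) · … · C(pn, n), which avoids computing large factorials. There is no overflow check, so it is only meant for small p and n such as the values quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* C(m, k), computed with exact integer steps: after step i the running
 * value equals C(m - k + i, i), so every division is exact. */
static uint64_t binomial(unsigned m, unsigned k) {
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; i++) {
        r = r * (m - k + i) / i;
    }
    return r;
}

/* Number of interleavings of p threads with n actions each:
 * (n*p)! / (n!)^p, built up one thread at a time. */
uint64_t interleavings(unsigned p, unsigned n) {
    assert(p >= 1 && n >= 1);
    uint64_t total = 1;
    for (unsigned t = 1; t <= p; t++) {
        /* choose which n of the first t*n slots belong to thread t */
        total *= binomial(t * n, n);
    }
    return total;
}
```

The exact values behind the rounded figures are 756756 for p = 3, n = 5; 17153136 for p = 3, n = 6; and 11732745024 for p = 4, n = 5.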

Ans: Inspect uses Dynamic Partial Order Reduction (DPOR). Basically, it interleaves threads ONLY when dependencies exist between thread actions !!

A concrete example of interleaving reductions

On the HUGE importance of DPOR

  BEFORE INSTRUMENTATION

  void * thread_A(void* arg)
  {
    pthread_mutex_lock(&mutex);
    A_count++;
    pthread_mutex_unlock(&mutex);
  }

  void * thread_B(void * arg)
  {
    pthread_mutex_lock(&lock);
    B_count++;
    pthread_mutex_unlock(&lock);
  }

  AFTER INSTRUMENTATION (transitions are shown as bands)

  void *thread_A(void *arg ) // thread_B is similar
  { void *__retres2 ;
    int __cil_tmp3 ;
    int __cil_tmp4 ;
    {
    inspect_thread_start("thread_A");
    inspect_mutex_lock(& mutex);
    __cil_tmp4 = read_shared_0(& A_count);
    __cil_tmp3 = __cil_tmp4 + 1;
    write_shared_1(& A_count, __cil_tmp3);
    inspect_mutex_unlock(& mutex);
    __retres2 = (void *)0;
    inspect_thread_end();
    return (__retres2);
    }
  }

• ONE interleaving with DPOR
• 252 = (10!) / (5!)^2 without DPOR
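Why a single interleaving suffices here comes down to a dependence check between transitions of different threads. The sketch below is an assumption-laden illustration, not Inspect's implementation: the `transition` record, `op_kind` and `dependent` are hypothetical names, and the rule used is the usual conservative one (same object, and not both reads).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_LOCK, OP_UNLOCK, OP_READ, OP_WRITE } op_kind;

/* One visible operation, as an instrumented program might report it. */
typedef struct {
    int thread;          /* id of the thread performing the operation */
    op_kind kind;
    const void *object;  /* address of the mutex or shared variable */
} transition;

/* Conservative dependence relation: two transitions must be ordered both
 * ways by the explorer only if this returns true. */
bool dependent(const transition *a, const transition *b) {
    assert(a != NULL && b != NULL);
    if (a->thread == b->thread) return true;     /* program order */
    if (a->object != b->object) return false;    /* disjoint objects commute */
    if (a->kind == OP_READ && b->kind == OP_READ) return false;
    return true;                                 /* same mutex, or a write is involved */
}
```

In the example above, thread_A touches only `mutex` and `A_count` while thread_B touches only `lock` and `B_count`, so no pair of transitions from the two threads is dependent and there is nothing to reorder.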

More eye-popping numbers
• bzip2smp has 6000 lines of code split among 6 threads
• Roughly, it has a theoretical max number of interleavings of the order of (6000!) / (1000!)^6 == ??
  – This is the execution space that a testing tool foolishly tries to navigate
  – bzip2smp with Inspect finished in 51,000 interleavings over a few hours
  – THIS IS THE RELEVANT SET OF INTERLEAVINGS
    • MORE FORMALLY: its Mazurkiewicz trace set

Dynamic Partial Order Reduction (DPOR) “animatronics”

  P0             P1             P2
  lock(y)        lock(x)        lock(x)
  …………..         …………..         …………..
  unlock(y)      unlock(x)      unlock(x)

  [Animation: two interleavings are drawn, L0 U0 L1 U1 L2 U2 and L0 U0 L2 U2 L1 U1, where Li and Ui label the lock and unlock of Pi.]

Another DPOR animation (to help show how DDPOR works…)

A Simple DPOR Example

  t0: lock(t); unlock(t)
  t1: lock(t); unlock(t)
  t2: lock(t); unlock(t)

  Each state is annotated with { BT }, { Done }.

  [Animation frames:]
   1. The initial state is annotated {}, {}. Nothing has executed yet.
   2. Executed: t0: lock.
   3. Executed: t0: lock, t0: unlock.
   4. Executed: t0: lock, t0: unlock, t1: lock.
   5. The initial state becomes {t1}, {t0}.
   6. Executed: t0: lock, t0: unlock, t1: lock, t1: unlock, t2: lock. The state after t0: unlock is annotated {}, {}.
   7. The state after t0: unlock becomes {t2}, {t1}.
   8. Executed: t2: unlock, completing this interleaving.
   9. Backtrack: t2: unlock is removed.
  10. Backtrack to the state after t0: unlock, annotated {t2}, {t1}.
  11. From there, executed: t2: lock. The initial state becomes {t1,t2}, {t0}; the state after t0: unlock becomes {}, {t1, t2}.
  12. Executed: t2: unlock, …
  13. Backtrack to the state after t0: unlock.
  14. Backtrack to the initial state, now annotated {t2}, {t0,t1}.
  15. From the initial state, executed: t1: lock, t1: unlock, …
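The backtrack and done bookkeeping walked through above can be sketched as a small C explorer. This is a simplified illustration under loud assumptions, not Inspect's algorithm: all names (`program`, `explorer`, `count_executions`) are hypothetical, every thread runs exactly lock(m); unlock(m) on one mutex, and the happens-before refinement of full DPOR is omitted. Because of that, it adds backtrack points eagerly, as soon as a thread's pending lock is seen to race with an earlier lock, rather than in the exact order of the frames.

```c
#include <assert.h>
#include <string.h>

#define MAX_THREADS 4
#define MAX_DEPTH   16

typedef struct {
    int nthreads;
    int mutex_of[MAX_THREADS];  /* the one mutex each thread locks and unlocks */
} program;

typedef struct {
    const program *prog;
    int reduce;                     /* 1: DPOR-style backtracking, 0: try every enabled thread */
    int pc[MAX_THREADS];            /* 0: before lock, 1: holding the lock, 2: finished */
    int step_thread[MAX_DEPTH];     /* thread executed at each depth of the current path */
    int step_is_lock[MAX_DEPTH];
    unsigned enabled_at[MAX_DEPTH]; /* threads enabled in each state of the current path */
    unsigned backtrack[MAX_DEPTH];  /* the BT set of each state, as a bitmask */
    long executions;
} explorer;

static unsigned enabled_mask(const explorer *e) {
    const program *g = e->prog;
    unsigned mask = 0;
    for (int p = 0; p < g->nthreads; p++) {
        if (e->pc[p] == 2) continue;
        int blocked = 0;
        for (int q = 0; q < g->nthreads; q++) {
            if (e->pc[p] == 0 && q != p && e->pc[q] == 1 &&
                g->mutex_of[q] == g->mutex_of[p]) blocked = 1;
        }
        if (!blocked) mask |= 1u << p;
    }
    return mask;
}

static void explore(explorer *e, int depth) {
    const program *g = e->prog;
    unsigned enabled = enabled_mask(e);
    if (enabled == 0) {  /* every thread has finished (these programs cannot deadlock) */
        e->executions++;
        return;
    }
    e->enabled_at[depth] = enabled;
    /* Race detection: a pending lock by p races with the latest earlier lock
     * of the same mutex by another thread; ask that state to also try p. */
    for (int p = 0; p < g->nthreads; p++) {
        if (e->pc[p] != 0) continue;
        for (int i = depth - 1; i >= 0; i--) {
            int q = e->step_thread[i];
            if (q != p && e->step_is_lock[i] && g->mutex_of[q] == g->mutex_of[p]) {
                if (e->enabled_at[i] & (1u << p)) e->backtrack[i] |= 1u << p;
                else e->backtrack[i] |= e->enabled_at[i];
                break;
            }
        }
    }
    unsigned done = 0;  /* the Done set of this state */
    e->backtrack[depth] = e->reduce ? (enabled & (~enabled + 1u)) : enabled;
    while ((e->backtrack[depth] & ~done) != 0) {
        unsigned todo = e->backtrack[depth] & ~done;
        int p = 0;
        while (!(todo & (1u << p))) p++;
        done |= 1u << p;
        e->step_thread[depth] = p;
        e->step_is_lock[depth] = (e->pc[p] == 0);
        e->pc[p]++;
        explore(e, depth + 1);  /* may grow backtrack[] of this or earlier states */
        e->pc[p]--;
    }
}

/* Number of complete executions explored for `prog`. */
long count_executions(const program *prog, int reduce) {
    explorer e;
    assert(prog->nthreads >= 1 && prog->nthreads <= MAX_THREADS);
    memset(&e, 0, sizeof e);
    e.prog = prog;
    e.reduce = reduce;
    explore(&e, 0);
    return e.executions;
}
```

For the three threads that all lock t, every lock races with every other, so both modes explore all 6 lock orders. For a program in which one thread locks y while two others lock x, the reduced mode explores 2 executions where the exhaustive mode explores 30.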

This is how DDPOR works
• Once the backtrack set gets populated, ships work description to other nodes
• We obtain distributed model checking using MPI
• Once we figured out a crucial heuristic (SPIN 2007) we have managed to get linear speedup….. so far….

We have devised a work-distribution scheme (SPIN 2007)

  [Diagram: a load balancer, worker a and worker b exchange the messages "Request unloading", "idle node id", "work description" and "report result".]

Speedup on aget

  [Figure: speedup plot, not recoverable from the extracted text]

Practical Formal Verification of MPI and Thread Programs
Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT 84112
A Half-Day Tutorial Proposed for ICS 2009
http://www.cs.utah.edu/formal_verification
