 
Automated Repair of Concurrency Bugs
Ben Liblit, with Guoliang Jin and Shan Lu

We need reliable software
• People’s daily lives now depend on reliable software
• Software companies spend a lot of resources on debugging
  ◦ More than 50% of effort goes to finding and fixing bugs
  ◦ Around $300 billion per year

Concurrency bugs hurt
• It is an increasingly parallel world
• Concurrency bugs in history

Multi-threaded program
• Concurrent programs under the shared-memory model
  ◦ Programs execute multiple interacting threads in parallel
  ◦ Threads communicate via shared memory
  ◦ Shared-memory accesses should be well synchronized
[Figure: threads 1 to 4 running on cores 1 to 4 of a multicore chip, each core with its own cache, all backed by shared memory]

An example concurrency bug

    Thread 1                          Thread 2
    if (ptr != NULL) {
                                      ptr = NULL;
        ptr->field = 1;
    }

    Result: segmentation fault

The interleaving space
• The interleaving space is huge, and only some interleavings are bad
• For this example:
  ◦ Thread 2 runs before the check: no failure
  ◦ Thread 2 runs between the check and the write: segmentation fault
  ◦ Thread 2 runs after the write: no failure
• Previous research focuses on finding such bugs

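To make the interleaving space on this slide concrete, the following is a minimal sketch that simulates the three possible positions of Thread 2's single statement relative to Thread 1's two steps. It is a deterministic single-threaded simulation, not real threading, and the names `interleaving_crashes` and `count_bad_interleavings` are illustrative assumptions, not part of CFix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj { int field; };

/* Simulate one interleaving (hypothetical helper, not from the slides).
 * Thread 1 has two steps: the check, then the write.
 * Thread 2 has one step (ptr = NULL) that runs just before Thread 1's
 * step number t2_pos: 0 = before the check, 1 = between check and
 * write, 2 = after the write.
 * Returns true if this interleaving would dereference NULL. */
static bool interleaving_crashes(int t2_pos)
{
    struct obj storage = { 0 };
    struct obj *ptr = &storage;
    bool passed_check = false;

    assert(t2_pos >= 0 && t2_pos <= 2);
    for (int step = 0; step <= 2; step++) {
        if (step == t2_pos)
            ptr = NULL;                     /* Thread 2: ptr = NULL;      */
        if (step == 0) {
            passed_check = (ptr != NULL);   /* Thread 1: if (ptr != NULL) */
        } else if (step == 1 && passed_check) {
            if (ptr == NULL)
                return true;                /* would be a segfault        */
            ptr->field = 1;                 /* Thread 1: ptr->field = 1;  */
        }
    }
    return false;
}

/* Count how many of the three interleavings are bad. */
static int count_bad_interleavings(void)
{
    int bad = 0;
    for (int pos = 0; pos <= 2; pos++)
        if (interleaving_crashes(pos))
            bad++;
    return bad;
}
```

Only the position between the check and the write fails, which is why disabling that single interleaving is enough to fix this example.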
Bug fixing
• Software quality does not improve until bugs are fixed
• Manual concurrency-bug fixing is
  ◦ time-consuming: 73 days on average
  ◦ error-prone: 39% of patches are buggy in the first release
• CFix: automated concurrency-bug fixing [PLDI’11*, OSDI’12]
  ◦ The program behaves correctly if bad interleavings do not occur
  ◦ Fix concurrency bugs by disabling bad interleavings
* SIGPLAN: “one of the first papers to attack the problem of automated bug fixing”

Automated fixing is difficult
[Diagram: bug description (symptom, triggering condition, …) → ? → patch (correctness, performance, simplicity)]
• What is the correct behavior?
  ◦ Usually requires developers’ knowledge
• How to get the correct behavior?
  ◦ Correct program states under bug-triggering inputs
  ◦ No change to program states under other inputs

Automated concurrency-bug fixing?
[Diagram: bug description (symptom, triggering condition, …) → ? → patch (correctness, performance, simplicity)]
• What is the correct behavior?
  ◦ The program state is already correct as long as the buggy interleaving does not occur
• How to get the correct behavior?
  ◦ Only need to disable failure-inducing interleavings
  ◦ Can leverage well-defined synchronization operations

How to get a general solution that generates good patches?
[Diagram: the bug description (symptom, triggering condition, …) becomes “interleavings that lead to software failure” → ? → patch (correctness, performance, simplicity)]
Bug descriptions come from four kinds of detectors:
• Atomicity-violation detectors (report p, c, r): Park ASPLOS’09, Flanagan POPL’04, Lu ASPLOS’06, Chew EuroSys’10
• Order-violation detectors (report A, B): Zhang ASPLOS’10, Lucia MICRO’09, Yu ISCA’09, Gao ASPLOS’11
• Data-race detectors (report I1, I2): Sen PLDI’08, Savage TOCS’97, Yu SOSP’05, Erickson OSDI’10, Kasikci ASPLOS’10
• Abnormal data-flow detectors (report Wb, R, Wg): Zhang ASPLOS’11, Shi OOPSLA’10

CFix
[Diagram: description (interleavings that lead to software failure) → CFix → patch (correctness, performance, simplicity)]
The CFix pipeline:
• Input: bug reports and source code
• Fix-strategy design: produces mutual exclusion and order relationships
• Synchronization enforcement: produces patched binaries
• Patch testing & selection: produces selected binaries
• Patch merging: produces a merged binary
• Run-time support: produces the final patched binary

Contributions
• Show the feasibility of automated fixing for non-deadlock concurrency bugs
• Techniques that enforce mutual exclusion and order relationships
• A framework that assembles a set of techniques to automate the whole bug-fixing process: CFix

CFix: fix-strategy design
Challenges:
• Huge variety of bugs

Two types of synchronization relationships
• Mutual exclusion
• Order relationship
Why these two?
• They are basic relationships that can be achieved by typical synchronization
• They are based on a study of real-world concurrency bug characteristics

Fix-strategy for atomicity-violation detectors: example 1

    Thread 1                          Thread 2
    if (ptr != NULL) {                ptr = NULL;
        ptr->field = 1;
    }

Fix-strategy for atomicity-violation detectors: example 2

    Thread 1                          Thread 2
    ptr->field = 1;                   ptr = NULL;
    ptr->field = 1;

CFix: fix-strategy design
Challenges:
• Inaccurate root cause
• Huge variety of bugs
Solution:
• A combination of mutual exclusion and order relationship enforcement

Fix-strategies
[Figure: one fix strategy per detector type: AV detector (p, c, r), OV detector (A, B), race detector (I1, I2), DU detector (Wb, R, Wg)]

CFix: synchronization enforcement
Challenges:
• Correctness, performance, and simplicity
Solution:
• Mutual exclusion enforcement: AFix [PLDI’11]
• Order relationship enforcement: OFix [OSDI’12]

Mutual exclusion relationship
• Input: three statements (p, c, r) with contexts

    Thread 1                          Thread 2
    p: if (ptr != NULL) {             r: ptr = NULL;
    c:     ptr->field = 1;
       }

• Goal: make the code region from p to c mutually exclusive with r

Mutual exclusion enforcement: AFix
• Approach: lock
[Figure: the p-to-c region and r, each protected by the same lock]
• Principles:
  ◦ Correctly paired lock acquisition and release operations
  ◦ Small critical section

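As an illustration of what a lock-based patch for the running example might look like, here is a hand-written sketch using POSIX threads. It is not output produced by AFix; the lock name `afix_lock` and the driver `run_patched_once` are assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct obj { int field; };

static struct obj storage;
static struct obj *ptr = &storage;
/* Hypothetical lock introduced by the patch. */
static pthread_mutex_t afix_lock = PTHREAD_MUTEX_INITIALIZER;

static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&afix_lock);     /* lock before p            */
    if (ptr != NULL) {                  /* p                        */
        ptr->field = 1;                 /* c                        */
    }
    pthread_mutex_unlock(&afix_lock);   /* unlock on every way out  */
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&afix_lock);
    ptr = NULL;                         /* r                        */
    pthread_mutex_unlock(&afix_lock);
    return NULL;
}

/* Run both threads once; returns 0 if the run completed without a crash. */
static int run_patched_once(void)
{
    pthread_t t1, t2;
    int rc;

    storage.field = 0;
    ptr = &storage;
    if (pthread_create(&t1, NULL, thread1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    rc = pthread_join(t1, NULL);
    assert(rc == 0);
    rc = pthread_join(t2, NULL);
    assert(rc == 0);
    return ptr == NULL ? 0 : -1;
}
```

The critical section covers only p and c, in line with the "small critical section" principle, and the unlock sits after the `if` so that it runs whether or not the branch is taken.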
Put p and c into a critical section: naïve
• A naïve solution
  ◦ Add lock on edges reaching p
  ◦ Add unlock on edges leaving c
• Potential new bugs
  ◦ Could lock without unlock
  ◦ Could unlock without lock
  ◦ etc.
[Figure: four control-flow graphs containing p and c that illustrate these cases]

Put p and c into a critical section: AFix
• Assume p and c are in the same function f
• Step 1: find the protected nodes in the critical section
• Step 2: add lock operations
  ◦ Lock on each edge from an unprotected node to a protected node
  ◦ Unlock on each edge from a protected node to an unprotected node
• This avoids the potential bugs of the naïve solution

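The two steps above can be sketched on a small control-flow graph. This is a simplified illustration under stated assumptions: a node is treated as protected when it lies on some path from p to c (forward search from p that does not expand past c, intersected with a backward search from c that does not expand past p). The `struct cfg` layout and all function names are hypothetical, and the real AFix analysis may differ in its details.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* A tiny control-flow graph as an adjacency matrix (hypothetical layout). */
struct cfg {
    int n;
    bool edge[MAX_NODES][MAX_NODES];    /* edge[u][v]: u -> v */
};

/* Mark nodes reachable from start, following edges forward or backward,
 * without expanding the successors (or predecessors) of stop. */
static void reach(const struct cfg *g, int start, int stop, bool forward,
                  bool seen[MAX_NODES])
{
    int stack[MAX_NODES], top = 0;

    assert(start >= 0 && start < g->n);
    memset(seen, 0, sizeof(bool) * MAX_NODES);
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];
        if (u == stop)
            continue;
        for (int v = 0; v < g->n; v++) {
            bool e = forward ? g->edge[u][v] : g->edge[v][u];
            if (e && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
}

/* Step 1: a node is protected if it lies on some path from p to c. */
static void find_protected(const struct cfg *g, int p, int c,
                           bool prot[MAX_NODES])
{
    bool from_p[MAX_NODES], to_c[MAX_NODES];

    reach(g, p, c, true, from_p);
    reach(g, c, p, false, to_c);
    for (int v = 0; v < MAX_NODES; v++)
        prot[v] = from_p[v] && to_c[v];
}

enum sync_op { SYNC_NONE, SYNC_LOCK, SYNC_UNLOCK };

/* Step 2: decide which operation, if any, goes on edge u -> v. */
static enum sync_op edge_op(const struct cfg *g, const bool prot[MAX_NODES],
                            int u, int v)
{
    if (!g->edge[u][v])
        return SYNC_NONE;
    if (!prot[u] && prot[v])
        return SYNC_LOCK;
    if (prot[u] && !prot[v])
        return SYNC_UNLOCK;
    return SYNC_NONE;
}

/* CFG of the running example: 0 = entry, 1 = p (the check),
 * 2 = c (the write), 3 = exit. */
static struct cfg demo_cfg(void)
{
    struct cfg g;

    memset(&g, 0, sizeof g);
    g.n = 4;
    g.edge[0][1] = true;    /* entry -> p                 */
    g.edge[1][2] = true;    /* check succeeds -> c        */
    g.edge[1][3] = true;    /* check fails -> exit        */
    g.edge[2][3] = true;    /* c -> exit                  */
    return g;
}
```

On the demo graph with p = 1 and c = 2, the edge 1 → 3 (check fails) also receives an unlock. A scheme that unlocks only on edges leaving c would miss it and lock without unlocking.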
Subtle details
• Adjusting p and c when they are in different functions
  ◦ Observation: people put lock and unlock in one function
  ◦ Find the longest common prefix of p’s and c’s stack traces
  ◦ Adjust p and c accordingly
• Putting r into a critical section
  ◦ Do nothing if r is reachable from the p–c critical section
• Lock type:
  ◦ Lock with timeout: if the critical section has blocking operations
  ◦ Reentrant lock: if recursion is possible within the critical section

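The stack-trace adjustment can be illustrated with a short sketch. It assumes a stack trace is an array of function names ordered outermost first, which is a simplification (a real trace would also carry call sites); `common_prefix_len` and `lock_function` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest common prefix of two stack traces.
 * Frames are ordered outermost first, e.g. {"main", "worker", "check"}. */
static size_t common_prefix_len(const char *const *p_trace, size_t p_len,
                                const char *const *c_trace, size_t c_len)
{
    size_t n = 0;

    assert(p_trace != NULL && c_trace != NULL);
    while (n < p_len && n < c_len && strcmp(p_trace[n], c_trace[n]) == 0)
        n++;
    return n;
}

/* Deepest function that appears on both stacks, or NULL if there is none.
 * Under the assumption above, this is the function in which both the
 * lock and the unlock would be placed, with p and c moved up to the
 * calls made from it. */
static const char *lock_function(const char *const *p_trace, size_t p_len,
                                 const char *const *c_trace, size_t c_len)
{
    size_t n = common_prefix_len(p_trace, p_len, c_trace, c_len);

    return n == 0 ? NULL : p_trace[n - 1];
}
```

For traces `{"main", "worker", "check"}` and `{"main", "worker", "update", "store"}`, the common prefix ends at `worker`, so p and c would be adjusted to the calls to `check` and `update` inside `worker`.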
Order relationship
• Input: two statements (A, B) with contexts
  ◦ There could be multiple instances of A in one thread
  ◦ There could be multiple threads that execute A
  ◦ There could be no instance of A during the whole execution
• Goal: make A execute before B

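Order relationship: two sub-types
• With multiple instances A1 … An, which of them must B wait for?
• firstA-B: B waits for the first instance of A (example: initialization before read)
• allA-B: B waits for all instances of A (example: every use before destroy)
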
OFix allA-B enforcement
• Approach: condition variable and flag
  ◦ Insert signal operations in A-threads
  ◦ Insert a wait operation before B
• Principles
  ◦ An A-thread signals exactly once, when it will execute no more A
  ◦ An A-thread signals as soon as possible
  ◦ B proceeds when each A-thread has signaled

OFix allA-B enforcement: A side
How to identify the last A instance in one thread

    ...;
    for (...)
        ...; // A
    ...;

• Each thread that executes A signals exactly once, as soon as it can execute no more A

OFix allA-B enforcement: A side
How to identify the last thread that executes A
• Keep a counter of the threads that still have to signal
  ◦ counter starts at 1
  ◦ counter++ at each thread_create
  ◦ counter-- in each ofix_signal

    void main() {                     void thr_main() {
        for (...)                         for (...)
            thread_create(thr_main);          ...; // A
        ...;                              ...;
    }                                 }

    void ofix_signal() {
        mutex_lock(L);
        --counter;
        if (counter == 0)
            cond_broadcast(con);
        mutex_unlock(L);
    }

OFix allA-B enforcement: B side
• B is safe to execute only when counter is 0

    void ofix_wait() {
        mutex_lock(L);
        if (counter != 0)
            cond_timedwait(con, L, t);
        mutex_unlock(L);
    }

• ofix_wait() is inserted before B
• Give up if OFix knows that the patch introduces a new deadlock
• Use a timed wait operation to mask potential deadlocks

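Putting the A side and the B side together, the following is a minimal runnable sketch of allA-B enforcement with POSIX threads. It follows the slides' `ofix_signal` and `ofix_wait`, with assumptions made for the example: the constants, the `a_done` tally, and the `run_all_a_before_b` driver are hypothetical, the creating thread also signals once when it has finished creating A-threads, and the wait re-checks the counter in a loop (rather than a single `if`) to tolerate spurious wake-ups.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define NUM_A_THREADS 4     /* hypothetical workload size */
#define A_PER_THREAD  3

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static int counter = 1;     /* 1 accounts for the creating thread    */
static atomic_int a_done;   /* how many A instances have executed    */

static void ofix_signal(void)
{
    pthread_mutex_lock(&L);
    if (--counter == 0)
        pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* Returns 0 if every A-thread has signaled, -1 if the wait gave up. */
static int ofix_wait(int timeout_sec)
{
    struct timespec deadline;
    int rc = 0, done;

    assert(timeout_sec >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (counter != 0 && rc == 0)
        rc = pthread_cond_timedwait(&con, &L, &deadline);
    done = (counter == 0);
    pthread_mutex_unlock(&L);
    return done ? 0 : -1;
}

static void *thr_main(void *arg)
{
    (void)arg;
    for (int i = 0; i < A_PER_THREAD; i++)
        atomic_fetch_add(&a_done, 1);   /* A */
    ofix_signal();                      /* this thread runs no more A */
    return NULL;
}

/* Returns the number of A instances that B observed, or -1 on time-out. */
static int run_all_a_before_b(void)
{
    pthread_t tids[NUM_A_THREADS];
    int created = 0, seen = -1;

    counter = 1;
    atomic_store(&a_done, 0);
    for (int i = 0; i < NUM_A_THREADS; i++) {
        pthread_mutex_lock(&L);
        counter++;                      /* "++" at thread creation    */
        pthread_mutex_unlock(&L);
        if (pthread_create(&tids[i], NULL, thr_main, NULL) != 0) {
            ofix_signal();              /* undo the increment         */
            break;
        }
        created++;
    }
    ofix_signal();                      /* creator spawns no more A-threads */

    if (ofix_wait(5) == 0)
        seen = atomic_load(&a_done);    /* B */

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return seen;
}
```

Starting the counter at 1 keeps it from reaching 0 while threads are still being created; B is released only after the creator and every A-thread have signaled.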
OFix firstA-B
• Basic enforcement
[Figure: A followed by B]
• When A may not execute
  ◦ Add a safety net of signals using the allA-B algorithm

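One way to picture firstA-B enforcement with its safety net is the sketch below. It is an illustration under assumptions, not the OFix implementation: B is released either by the first A (a flag) or, if A never runs, by an allA-B style counter reaching 0. The names `a_happened`, `ofix_signal_first`, `ofix_signal_done`, `ofix_wait_first`, and `run_first_a_before_b` are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static bool a_happened;     /* set by the signal placed after A          */
static int counter;         /* safety net: threads that may still run A  */

/* Placed right after A: the first A instance releases B. */
static void ofix_signal_first(void)
{
    pthread_mutex_lock(&L);
    a_happened = true;
    pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* Safety net: called once by each thread when it can run no more A. */
static void ofix_signal_done(void)
{
    pthread_mutex_lock(&L);
    if (--counter == 0)
        pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* Placed before B. Returns 1 if an A instance ran first, 0 if B was
 * released by the safety net, -1 if the timed wait gave up. */
static int ofix_wait_first(int timeout_sec)
{
    struct timespec deadline;
    int rc = 0, result;

    assert(timeout_sec >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (!a_happened && counter != 0 && rc == 0)
        rc = pthread_cond_timedwait(&con, &L, &deadline);
    result = a_happened ? 1 : (counter == 0 ? 0 : -1);
    pthread_mutex_unlock(&L);
    return result;
}

static void *a_thread(void *arg)
{
    bool do_a = *(bool *)arg;

    if (do_a) {
        /* A would execute here */
        ofix_signal_first();
    }
    ofix_signal_done();
    return NULL;
}

/* Runs one A-thread that executes A only if do_a is true. */
static int run_first_a_before_b(bool do_a)
{
    pthread_t tid;
    int result;

    a_happened = false;
    counter = 2;    /* the creating thread plus one A-thread */
    if (pthread_create(&tid, NULL, a_thread, &do_a) != 0)
        return -1;
    ofix_signal_done();             /* creator spawns no more A-threads  */
    result = ofix_wait_first(5);    /* B would execute after this        */
    pthread_join(tid, NULL);
    return result;
}
```

Without the counter, a run in which A never executes would leave B blocked until the timed wait expires.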
CFix: patch testing & selection
Challenge:
• Multi-threaded software testing
Solution:
• CFix-patch oriented testing

Patch testing principles
• Prune incorrect patches
  ◦ Patches causing failures due to wrong fix strategies, etc.
• Prune slow patches
• Prune complicated patches
• Not exhaustive testing, but patch-oriented testing
  ◦ Leverage existing testing techniques, with extra heuristics

Run once without external perturbation
• Reject the patch if there is a time-out or failure
  ◦ This catches patches that fix the wrong root cause
  ◦ Such patches make the software fail deterministically

    Thread 1                          Thread 2
    ptr->field = 1;                   ptr = NULL;
    ptr->field = 1;

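The pruning principles can be sketched as a selection function over test results. Everything here is a hypothetical illustration: the `struct patch_result` fields, the `max_overhead` threshold, and the tie-breaking rule are assumptions, and the slides do not specify how CFix ranks the surviving patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of testing one candidate patch (hypothetical record). */
struct patch_result {
    const char *name;
    bool failed;        /* the test run failed                          */
    bool timed_out;     /* the test run hit a time-out                  */
    double overhead;    /* slowdown relative to the unpatched program   */
    int sync_ops;       /* number of inserted synchronization operations */
};

/* Returns the index of the selected patch, or -1 if none survives.
 * Prunes incorrect patches (failure or time-out) and slow patches
 * (overhead above max_overhead), then prefers the simplest patch,
 * breaking ties by lower overhead. */
static int select_patch(const struct patch_result *r, size_t n,
                        double max_overhead)
{
    int best = -1;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].failed || r[i].timed_out)
            continue;                       /* prune incorrect patches */
        if (r[i].overhead > max_overhead)
            continue;                       /* prune slow patches      */
        if (best < 0 ||
            r[i].sync_ops < r[best].sync_ops ||
            (r[i].sync_ops == r[best].sync_ops &&
             r[i].overhead < r[best].overhead))
            best = (int)i;                  /* prefer simpler patches  */
    }
    return best;
}
```

A caller would fill one record per candidate patch after a single unperturbed run and pass the array to `select_patch`.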