Do you have to reproduce the bug on the first replay attempt?
PRES: Probabilistic Replay with Execution Sketching on Multiprocessors
Soyeon Park, Yuanyuan Zhou (University of California, San Diego)
Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, Shan Lu (University of Illinois at Urbana-Champaign)

Concurrency bugs are important
- Writing concurrent programs is difficult
  - Programmers are used to sequential thinking
- Concurrent programs are prone to bugs
- Concurrency bugs cause severe real-world problems
  - Therac-25, Northeast blackout
- Multi-core trend worsens the problem

Characteristics of Concurrency Bugs
- A concurrency bug may need a special thread interleaving to manifest

  Thread 1                                 Thread 2
  if (buf_index + len < BUFFSIZE)
                                           buf_index += len;
    memcpy(buf[buf_index], log, len);
  Crash! (Apache)

Two implications:
- Hard to expose a concurrency bug during testing
- Difficult to reproduce a concurrency bug for diagnosis
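The interleaving on this slide can be replayed by hand. The following is a minimal sketch, not the Apache code: it models the three statements as steps over a shared `buf_index` and checks whether the copy would run past the buffer. The names `run`, `BUFFSIZE = 8`, and the starting values are assumptions chosen for illustration.

```python
BUFFSIZE = 8  # assumed buffer size for the illustration


def run(schedule, buf_index=4, length=3):
    """Execute S1, S2, S3 in the given order and report whether the copy overflows.

    S1 (thread 1): bounds check
    S2 (thread 2): advance buf_index
    S3 (thread 1): copy `length` bytes starting at buf_index
    """
    state = {"buf_index": buf_index, "check_ok": False, "overflow": False}

    def s1():
        state["check_ok"] = state["buf_index"] + length < BUFFSIZE

    def s2():
        state["buf_index"] += length

    def s3():
        if state["check_ok"]:
            # The copy writes the range [buf_index, buf_index + length).
            state["overflow"] = state["buf_index"] + length > BUFFSIZE

    steps = {"S1": s1, "S2": s2, "S3": s3}
    for name in schedule:
        steps[name]()
    return state["overflow"]


benign = run(["S1", "S3", "S2"])  # thread 2 runs after the copy
buggy = run(["S1", "S2", "S3"])   # thread 2 runs between the check and the copy
print(benign, buggy)              # prints: False True
```

Only one of the possible orders overflows, which is why the bug is hard to hit in testing and hard to hit again during diagnosis.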

Deterministic Replay on Uniprocessors
- Record the non-deterministic factors and re-execute
  - Inputs (keyboard, network, files, etc.)
  - Thread scheduling
  - Return values of system calls
[Figure: on a uniprocessor, the production run records input, thread scheduling, and syscalls for threads T1 and T2; the replay run uses them to reproduce the bug.]
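A toy model may help show why recording the schedule is enough on one processor. In this sketch (an illustration, not the mechanism of any real replay system), threads are Python generators, each `yield` is a possible preemption point, and the only non-determinism is which thread the scheduler picks next. The names `worker`, `execute`, `production_run`, and `replay_run` are hypothetical.

```python
import random


def worker(shared):
    """A toy thread: read the counter, get preempted, then write it back."""
    for _ in range(3):
        seen = shared["counter"]
        yield  # preemption point
        shared["counter"] = seen + 1


def execute(pick):
    """Run two toy threads on one 'processor'; pick(runnable) chooses who runs next."""
    shared = {"counter": 0}
    threads = {"T1": worker(shared), "T2": worker(shared)}
    while threads:
        name = pick(sorted(threads))
        try:
            next(threads[name])
        except StopIteration:
            del threads[name]
    return shared["counter"]


def production_run(seed):
    """Schedule randomly and record every scheduling decision."""
    rng = random.Random(seed)
    log = []

    def pick(runnable):
        choice = rng.choice(runnable)
        log.append(choice)
        return choice

    return execute(pick), log


def replay_run(log):
    """Re-execute with the recorded scheduling decisions."""
    decisions = iter(log)
    return execute(lambda runnable: next(decisions))


result, schedule_log = production_run(seed=7)
print(result == replay_run(schedule_log))  # prints: True
```

Because only one thread runs at a time, the recorded schedule fixes the order of every shared access, including the lost updates on `counter`.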

Deterministic Replay for Multiprocessors
- Much more difficult
  - Multiple threads execute simultaneously on different processors
  - Extra source of non-determinism: interleaving of shared memory accesses

  S1: if (buf_index + len < BUFFSIZE)
  S2: buf_index += len;
  S3: memcpy(buf[buf_index], log, len);

[Figure: on a multiprocessor, threads T1 to T4 run concurrently; T1 executes S1 and S3 while T2 executes S2 in between, and the program crashes.]

State of the Art on Multiprocessor Replay
- Hardware-assisted approach: none of them exists in reality!
  - Records all thread interactions with new hardware extensions
  - e.g., Flight Data Recorder, BugNet, Strata, RTR, DMP, Rerun, etc.
- Software-only approach: not practical!
  - High production-run overhead (> 10-100X) due to capturing the global order of shared memory accesses
  - e.g., InstantReplay, Strata/s, etc.
  - Recent work: SMP-Revirt
    - Uses a page protection mechanism to optimize memory monitoring
    - > 10X production-run overhead on 2 or 4 processors
    - Has false sharing and page contention issues (scalability)

Contrast between Common Practice & Existing Research Proposals

                   Common practice            Existing research proposals (impractical!)
Production run     0% overhead                10-100X slowdown
Diagnosis phase    > 1000 replay attempts*    error reproduced on the 1st replay attempt

* According to our experimental results

Observations
[Figure: number of replay attempts (y-axis) versus production-run recording overhead (x-axis). Current practice: > 1000 attempts at 0 overhead. Existing s/w-only research proposals: 1 attempt at 10-100X overhead (impractical). Ideal case: 1 attempt at 0 overhead.]
1) Production run performance is more critical than replay time
2) We do NOT need to reproduce a bug on the 1st replay attempt

Our Idea
Probabilistic Replay with Execution Sketching (PRES)
- Record only partial information during the production run
  - Low recording overhead
- Push the complexity to diagnosis time
- Leverage feedback from unsuccessful replays

PRES Overview
- Probabilistic Replay via Execution Sketching (PRES)
[Figure: sketch recording during the production run produces sketches (partial information). In the diagnosis phase, partial-information based replay (PI-Replay) repeats replay attempts, using feedback whenever an off-sketch path is detected, until the error is reproduced. The result is complete information that reproduces the bug with 100% probability.]
- Recording partial information (sketch) during the production run
- Reproducing a bug, not the original execution

Contents
- Introduction
- Our approach
  - Overview of PRES
  - Sketch recording
  - Bug reproduction
    - Partial-Information based replayer
    - Monitor
    - Feedback generator
- Evaluation
- Conclusion

Sketch Recording
- Sketching mechanisms, from lower to higher production-run overhead: BASE, SYNC, SYS, FUNC, BB-N, BB, RW
- Subsuming relationships: each mechanism records a subset (⊂) of what the next one records
- BASE: uni-processor deterministic replay
  - Inputs
  - Thread scheduling
  - System calls
- SYNC: BASE + global order of synchronization operations
- SYS: SYNC + global order of system calls
- FUNC: BASE + global order of function calls
- BB-N: global order of every N-th basic block (e.g., BB-2: every 2nd basic block)
- BB: BASE + global order of basic blocks
- RW: BASE + global order of shared memory read/write accesses
  - Existing s/w-only deterministic replay for multi-processors
  - All non-deterministic events, including the global order of shared memory accesses

Example (both threads run worker(); which statements are sketch points depends on the mechanism):

  worker()
  {
    lock(L);
    myid = gid;
    gid = myid + 1;
    unlock(L);
    …
    if (myid == 0)
      result = data;
  }
  tmp = result;
  printf("%d\n", tmp);    /* wrong output! */
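To make the trade-off between the mechanisms concrete, here is a minimal sketch of a recorder that filters one production-run trace down to the events a given mechanism orders globally. The event kinds, the `RECORDED` table (treated as cumulative, following the subsuming relationships), and the hand-written `trace` are assumptions for illustration; BB-N is left out for brevity.

```python
# Event kinds each mechanism orders globally (assumed cumulative for this sketch).
LEVELS = ["BASE", "SYNC", "SYS", "FUNC", "BB", "RW"]
RECORDED = {
    "BASE": set(),
    "SYNC": {"sync"},
    "SYS": {"sync", "syscall"},
    "FUNC": {"sync", "syscall", "func"},
    "BB": {"sync", "syscall", "func", "bb"},
    "RW": {"sync", "syscall", "func", "bb", "mem"},
}


def record_sketch(trace, level):
    """Keep only the events `level` orders, tagged with a global sequence number."""
    kinds = RECORDED[level]
    kept = [(thread, what) for thread, kind, what in trace if kind in kinds]
    return [(order, thread, what) for order, (thread, what) in enumerate(kept, start=1)]


# Hypothetical production-run trace of the worker() example: (thread, kind, event).
trace = [
    ("T1", "func", "worker"), ("T1", "sync", "lock(L)"),
    ("T1", "mem", "read gid"), ("T1", "mem", "write gid"),
    ("T1", "sync", "unlock(L)"),
    ("T2", "func", "worker"), ("T2", "sync", "lock(L)"),
    ("T2", "mem", "read gid"), ("T2", "mem", "write gid"),
    ("T2", "sync", "unlock(L)"),
    ("T2", "mem", "read result"),   # the buggy order: read before the write
    ("T1", "mem", "write result"),
    ("T2", "syscall", "printf"),
]

sizes = {level: len(record_sketch(trace, level)) for level in LEVELS}
print(sizes)
```

In this trace the SYNC sketch keeps 4 of the 13 events, and none of them fixes the order of the two accesses to `result`; only RW records that order, at the cost of logging every shared access.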

Contents
- Introduction
- Our approach
  - Overview of PRES
  - Sketch recording
  - Bug reproduction
    - Partial-Information based replayer (PI-Replayer)
    - Monitor
    - Feedback generator
- Evaluation
- Conclusion

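Partial Information-based Replay
- Process of the bug reproduction phase
[Figure: the sketch recorder produces sketches during the production run. In the reproduction phase, the PI-replayer replays under the monitor, which can stop/abort and restart an attempt; the feedback generator turns lessons from failed attempts into feedback on how to improve the replay. After a successful reproduction, the replay recorder saves complete information that reproduces the bug with 100% probability.]
The monitor is used for:
- Detecting successful bug reproduction
- Detecting an off-sketch path: the replay deviates from the sketches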

PI-replayer
- Partial-Information based replayer
  - Consults the execution sketch to enforce the observed global orders
  - Right before re-executing a sketch point, makes sure that all prior points from other threads have been executed

  SYNC sketches:
    T1: lock A, global order 1
    T2: lock B, global order 2

[Figure: in the production run, T1 executes lock(A) (global order 1) before T2 executes lock(B) (global order 2). In the replay run, T2 waits for T1 to execute lock A first.]
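The ordering rule for sketch points can be sketched with ordinary threads and a condition variable. This is an illustration under simplifying assumptions, not the PRES implementation: `SketchEnforcer` and `replay` are hypothetical names, and the action at each sketch point runs while the gate is held, which serializes more than a real replayer would need to.

```python
import threading


class SketchEnforcer:
    """Make threads pass their sketch points in the recorded global order."""

    def __init__(self):
        self._next = 1
        self._cond = threading.Condition()

    def sketch_point(self, order, action):
        with self._cond:
            # Wait until every earlier sketch point (from any thread) has executed.
            self._cond.wait_for(lambda: self._next == order)
            action()
            self._next += 1
            self._cond.notify_all()


def replay(sketch):
    """sketch: {thread name: [(global order, label), ...]} from the production run."""
    enforcer = SketchEnforcer()
    executed = []

    def body(points):
        for order, label in points:
            enforcer.sketch_point(order, lambda: executed.append(label))

    threads = [threading.Thread(target=body, args=(points,)) for points in sketch.values()]
    # Start in reverse so that later threads tend to arrive first and have to wait.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return executed


sync_sketch = {"T1": [(1, "T1: lock A")], "T2": [(2, "T2: lock B")]}
print(replay(sync_sketch))  # prints: ['T1: lock A', 'T2: lock B']
```

Everything between sketch points is left unconstrained, which is exactly where a replay attempt can diverge from the production run.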

Monitor
- Detect successful bug reproduction
  - Crash failure: PRES can catch exceptions
  - Deadlock: a periodic timer checks for progress
  - Incorrect results: the programmer needs to provide conditions for checking
  - Can leverage testing oracles and existing bug detection tools
- Detect unsuccessful replay
  - Compare against the execution sketch from the original execution
  - Avoid wasting replay effort on a wrong path
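The two monitor duties can be sketched as a small class: one method compares each sketch point reached during replay with the recorded sketch, and one applies a programmer-supplied failure condition. The class and method names (`Monitor`, `OffSketch`, `on_sketch_point`, `bug_reproduced`) are hypothetical, and crash and deadlock detection are omitted.

```python
class OffSketch(Exception):
    """Raised when the replay deviates from the recorded sketch."""


class Monitor:
    def __init__(self, sketch, failure_check):
        # sketch: {thread: [label, ...]} in the order each thread hit its sketch points.
        self._expected = {thread: list(points) for thread, points in sketch.items()}
        self._position = {thread: 0 for thread in sketch}
        self._failure_check = failure_check

    def on_sketch_point(self, thread, label):
        """Abort the attempt as soon as a thread leaves the recorded path."""
        expected = self._expected.get(thread, [])
        index = self._position.get(thread, 0)
        if index >= len(expected) or expected[index] != label:
            raise OffSketch(f"{thread} reached {label!r}, which the sketch does not expect here")
        self._position[thread] = index + 1

    def bug_reproduced(self, program_state):
        """Apply the programmer-provided condition for an incorrect result."""
        return self._failure_check(program_state)


monitor = Monitor(
    {"T1": ["lock(L)", "unlock(L)"]},
    failure_check=lambda state: state["tmp"] != state["data"],
)
monitor.on_sketch_point("T1", "lock(L)")            # on the recorded path
print(monitor.bug_reproduced({"tmp": 0, "data": 42}))  # prints: True
```

Raising as soon as a mismatch is seen lets the replayer stop and restart early instead of running a wrong path to completion.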

What if a replay attempt fails?
- Replay it again!
  - Restart from the beginning or the previous checkpoint
- Shall we do something different next time?
  - Random approach: just leave it to fate
  - Systematic approach
    - Actively learn from previous mistakes
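The random approach amounts to a retry loop. The sketch below assumes a single unrecorded race (the write and read of `result` from the worker example) whose order is chosen at random on every attempt; `replay_attempt` and `reproduce` are hypothetical names, and the 50/50 choice is an assumption, not a measured probability.

```python
import random


def replay_attempt(rng):
    """One replay attempt; the sketch does not fix the order of the racing pair,
    so the order of T1's write and T2's read is picked at random."""
    data, result, tmp = 42, 0, None
    order = ["write", "read"]
    rng.shuffle(order)
    for step in order:
        if step == "write":
            result = data      # T1: result = data
        else:
            tmp = result       # T2: tmp = result
    return tmp != data         # True when the wrong output is reproduced


def reproduce(max_attempts, seed=0):
    """Replay again and again until the bug shows up or the budget runs out."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        if replay_attempt(rng):
            return attempt
    return None


attempts_needed = reproduce(max_attempts=1000)
print(attempts_needed)
```

With many unrecorded races the chance of hitting the right combination by luck shrinks quickly, which motivates learning from failed attempts.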

Feedback Generator (1/2)
- Why can't previous replays reproduce a bug?
  - Some un-recorded data races execute in different orders

  Production run (FUNC sketches):
    Thread 1: worker() { … if (myid==0) result = data; }
    Thread 2: worker() { … } tmp = result; printf("%d\n", tmp);
    Order: Thread 2 executes tmp = result before Thread 1 executes result = data.
    This original order is not recorded in the sketch.

  1st replay attempt:
    Same code, but Thread 1 executes result = data before Thread 2 executes tmp = result.
    Fails to reproduce the bug!
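One way to act on this observation is to flip the order of the racing pair on the next attempt. The sketch below is an assumption about how such feedback could work for a single race; this slide does not spell out the feedback generator's algorithm, and `run_with_order` and `reproduce_with_feedback` are hypothetical names.

```python
WRITE = "T1: result = data"
READ = "T2: tmp = result"


def run_with_order(order):
    """Replay the racing pair in the given order and return what thread 2 prints."""
    data, result, tmp = 42, 0, None
    for step in order:
        if step == WRITE:
            result = data
        elif step == READ:
            tmp = result
    return tmp


def reproduce_with_feedback(first_order, is_bug, max_attempts=10):
    """Retry, flipping the order of the racing pair after each failed attempt."""
    tried = []
    order = list(first_order)
    for attempt in range(1, max_attempts + 1):
        if tuple(order) in tried:
            break  # every order of this race has been tried
        tried.append(tuple(order))
        if is_bug(run_with_order(order)):
            return attempt, order
        # Feedback: this order of the race did not expose the bug, so flip it.
        order = order[::-1]
    return None, None


attempt, winning_order = reproduce_with_feedback(
    first_order=[WRITE, READ],          # the order of the failed 1st replay attempt
    is_bug=lambda printed: printed != 42,
)
print(attempt, winning_order)  # prints: 2 ['T2: tmp = result', 'T1: result = data']
```

The second attempt reproduces the wrong output because it enforces the read before the write, the order the production run took but the FUNC sketch did not record.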
