
Do you have to reproduce the bug on the first replay attempt? PRES



  1. Do you have to reproduce the bug on the first replay attempt? PRES: Probabilistic Replay with Execution Sketching on Multiprocessors
     Soyeon Park, Yuanyuan Zhou (University of California, San Diego)
     Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, Shan Lu (University of Illinois at Urbana-Champaign)

  2. Concurrency bugs are important
     - Writing concurrent programs is difficult
       - Programmers are used to sequential thinking
       - Concurrent programs are prone to bugs
     - Concurrency bugs cause severe real-world problems
       - Therac-25, Northeast blackout
     - Multi-core trend worsens the problem

  3. Characteristics of Concurrency Bugs
     - A concurrency bug may need a special thread interleaving to manifest
     - Example (Apache), statements of Thread 1 and Thread 2 interleaved:
         if (buf_index + len < BUFFSIZE)
         buf_index += len;
         memcpy(buf[buf_index], log, len);   Crash!
     - Two implications:
       - Hard to expose a concurrency bug during testing
       - Difficult to reproduce a concurrency bug for diagnosis
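The dependence on interleaving can be illustrated with a small deterministic simulation. This is a sketch, not Apache's code: the value of `BUFFSIZE`, the `run` helper, and the two-step split of each thread (bounds check, then copy and index update) are assumptions made for illustration. The thread steps are interleaved by an explicit schedule instead of real threads so that the outcome is repeatable.

```python
BUFFSIZE = 8  # assumed buffer size, chosen only for this illustration


def run(schedule, length=5):
    """Simulate two logical threads that each try to append `length` bytes.

    `schedule` lists thread ids in the order in which their next step runs.
    The first time a thread is scheduled it performs the bounds check; the
    second time it performs the copy and index update if its check passed.
    Returns True if some copy would write past the end of the buffer.
    """
    buf_index = 0
    passed = {}        # thread id -> result of that thread's bounds check
    overflow = False
    for tid in schedule:
        if tid not in passed:
            # if (buf_index + len < BUFFSIZE)
            passed[tid] = buf_index + length < BUFFSIZE
        elif passed[tid]:
            # The check may be stale by now: another thread may have moved
            # buf_index between this thread's check and its copy.
            if buf_index + length > BUFFSIZE:
                overflow = True
            buf_index += length
    return overflow


print(run([1, 1, 2, 2]))  # threads run one after another
print(run([1, 2, 1, 2]))  # both checks happen before either copy
```

Only the second schedule, where both threads pass the check before either one updates the index, leads to the out-of-bounds copy. That is why the bug is hard to hit in testing and hard to reproduce afterwards.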

  4. Deterministic Replay of Uniprocessor
     - Recording non-deterministic factors and re-execution
       - Inputs (keyboards, networks, files, etc.)
       - Thread scheduling
       - Return values of system calls
     - [Figure: production run and replay run on a uniprocessor; for threads T1 and T2 the input, thread scheduling, and syscalls are recorded, and the replay run reproduces the bug]
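The record-then-re-execute idea on this slide can be sketched in a few lines. This is a simplified illustration, not the mechanism of any particular replay system: the `Recorder` and `Replayer` classes and the `nondet` method are hypothetical names, and a random number stands in for an input or a system call return value. Thread scheduling is not modeled here.

```python
import random


class Recorder:
    """Production run: perform each non-deterministic call and log its result."""

    def __init__(self):
        self.log = []

    def nondet(self, source):
        value = source()
        self.log.append(value)
        return value


class Replayer:
    """Replay run: return logged results instead of calling the live source."""

    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, source):
        return next(self._values)  # the live source is deliberately ignored


def program(env):
    # Two stand-ins for non-deterministic factors (e.g. input, syscall result).
    a = env.nondet(lambda: random.randint(0, 9))
    b = env.nondet(lambda: random.randint(0, 9))
    return a * 10 + b


recorder = Recorder()
original = program(recorder)
replayed = program(Replayer(recorder.log))
print(original == replayed)
```

Because every non-deterministic value is served from the log, the replay run follows the same path as the production run on a uniprocessor.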

  5. Deterministic Replay for Multiprocessors
     - Much more difficult
       - Multiple threads execute simultaneously on different processors
       - Extra source of non-determinism: interleaving of shared memory accesses
     - [Figure: threads T1 to T4 on a multiprocessor executing statements S1 to S3, ending in a crash]
         S1: if (buf_index + len < BUFFSIZE)
         S2: buf_index += len;
         S3: memcpy(buf[buf_index], log, len);   Crash!

  6. State of the Art on Multiprocessor Replay
     - Hardware-assisted approach: none of them exists in reality!
       - Recording all thread interactions with new hardware extensions
       - e.g., Flight Data Recorder, BugNet, Strata, RTR, DMP, Rerun, etc.
     - Software-only approach: not practical!
       - High production-run overhead (> 10-100X) due to capturing the global order of shared memory accesses
       - e.g., InstantReplay, Strata/s, etc.
     - Recent work: SMP-Revirt
       - Uses page protection mechanism to optimize memory monitoring
       - > 10X production-run overhead on 2 or 4 processors
       - Has false sharing and page contention issues (scalability)

  7. Contrast between Common Practice & Existing Research Proposals
     - Common practice
       - Production run: 0% overhead
       - Diagnosis phase: error reproduced after > 1000 replay attempts*
     - Existing research proposals: impractical!
       - Production run: 10-100X slowdown
       - Diagnosis phase: error reproduced on the 1st replay attempt
     * According to our experimental results

  8. Observations
     - [Figure: number of replay attempts plotted against production-run recording overhead. Current practice: > 1000 attempts. Existing s/w-only research proposals: 1 attempt at 10-100X overhead (impractical). The ideal case is also marked.]
     1) Production run performance is more critical than replay time
     2) We do NOT need to reproduce a bug on the 1st replay attempt

  9. Our Idea: Probabilistic Replay with Execution Sketching (PRES)
     - Record only partial information during production run
       - Low recording overhead
     - Push the complexity to diagnosis time
     - Leverage feedback from unsuccessful replays

  10. PRES Overview
     - Probabilistic Replay via Execution Sketching (PRES)
     - [Figure: sketch recording during the production run produces sketches (partial information). In the diagnosis phase, Partial-Information based replay (PI-Replay) replays from the sketches; when an off-sketch path is detected, feedback drives another replay. Once complete information is available, the bug is reproduced with 100% probability.]
     - Recording partial information (sketch) during production run
     - Reproducing a bug, not the original execution

  11. Contents
     - Introduction
     - Our approach
       - Overview of PRES
       - Sketch recording
       - Bug reproduction
         - Partial-Information based replayer
         - Monitor
         - Feedback generator
     - Evaluation
     - Conclusion

  12. Sketch Recording
     - Sketching mechanisms, from lower to higher recording overhead: BASE, SYNC, SYS, FUNC, BB-N, BB, RW
     - Subsuming relationships: BASE ⊂ SYNC ⊂ SYS ⊂ FUNC ⊂ BB-N ⊂ BB ⊂ RW
     - BASE: uni-processor deterministic replay
       - Inputs
       - Thread scheduling
       - System calls
     - SYNC: global order of synchronization operations
     - SYS: global order of system calls
     - FUNC: global order of function calls
     - BB-N: global order of every N-th basic block (shown: BB-2, every 2nd basic block)
     - BB: global order of basic blocks
     - RW: global order of shared memory read/write accesses
       - Existing s/w-only deterministic replay for multi-processors
       - All non-deterministic events, including the global order of shared memory accesses
     - Example (Thread 1 and Thread 2 both run worker(); the recorded statements are the sketch points):
         worker() {
           lock(L);
           myid = gid;
           gid = myid + 1;
           unlock(L);
           ...
           if (myid == 0)
             result = data;
         }
         tmp = result;
         print("%d\n", tmp);   wrong output!
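How a sketch level decides what gets recorded can be sketched as a filter in front of a global-order counter. This is a simplified illustration and not the PRES implementation: the `SketchRecorder` class, the event kinds (`"sync"`, `"func"`, `"rw"`), and the reduced set of three levels are assumptions made for the example.

```python
import itertools
import threading

# Assumed, reduced set of levels: each level records the event kinds of the
# cheaper levels plus one more kind (loosely mirroring the subsuming chain).
LEVELS = {
    "SYNC": {"sync"},
    "FUNC": {"sync", "func"},
    "RW": {"sync", "func", "rw"},
}


class SketchRecorder:
    """Record the global order of only those events the chosen level covers."""

    def __init__(self, level):
        self._kinds = LEVELS[level]
        self._counter = itertools.count(1)
        self._mutex = threading.Lock()
        self.sketch = []  # entries: (global_order, thread_id, operation)

    def event(self, thread_id, kind, operation):
        if kind not in self._kinds:
            return  # not a sketch point at this level: nothing is recorded
        with self._mutex:
            self.sketch.append((next(self._counter), thread_id, operation))


def trace(recorder):
    # A hand-written event stream for one thread running worker().
    recorder.event("T1", "func", "worker()")
    recorder.event("T1", "sync", "lock(L)")
    recorder.event("T1", "rw", "myid = gid")
    recorder.event("T1", "rw", "gid = myid + 1")
    recorder.event("T1", "sync", "unlock(L)")
    return recorder.sketch


print(len(trace(SketchRecorder("SYNC"))))
print(len(trace(SketchRecorder("RW"))))
```

The cheaper level keeps only the lock and unlock operations, which is where the lower production-run overhead comes from; the order of the unrecorded memory accesses has to be recovered at diagnosis time.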

  13. Contents
     - Introduction
     - Our approach
       - Overview of PRES
       - Sketch recording
       - Bug reproduction
         - Partial-Information based replayer (PI-Replayer)
         - Monitor
         - Feedback generator
     - Evaluation
     - Conclusion

  14. Partial Information-based Replay
     - Process of bug reproduction phase
     - [Figure: reproduction phase. Components: sketch recorder, PI-replayer, monitor, feedback generator, replay recorder. The sketches feed the PI-replayer; the monitor can stop/abort a replay; lessons on how to improve the replay flow back as feedback and the replay restarts; with complete information the replayer reproduces the bug with 100% probability.]
     - Monitor is used for:
       - Detecting successful bug reproduction
       - Detecting off-sketch path: deviates from sketches

  15. PI-replayer
     - Partial-Information based replayer
       - Consults the execution sketch to enforce observed global orders
       - Right before re-executing a sketch point, makes sure that all prior points from other threads have been executed
     - Example with SYNC sketches:
       - Production run: T1 executes lock(A) (global order 1), T2 executes lock(B) (global order 2)
       - Replay run: T2 waits for T1 to execute lock(A) first
       - SYNC sketches: T1: lock A, global order 1; T2: lock B, global order 2
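The ordering rule for sketch points can be sketched with a condition variable. This is a minimal illustration under stated assumptions, not the PRES replayer: the `OrderEnforcer` class and `replay` function are hypothetical names, global orders are assumed to be the contiguous integers 1..n, and each sketch point's action runs while the enforcer's lock is held, which serializes sketch points more strictly than a real replayer would need to.

```python
import threading


class OrderEnforcer:
    """Let a sketch point run only after all earlier sketch points have run."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_order = 1

    def at_sketch_point(self, global_order, action):
        with self._cond:
            # Block until every point with a smaller global order is done.
            self._cond.wait_for(lambda: self._next_order == global_order)
            action()
            self._next_order += 1
            self._cond.notify_all()


def replay(sketch):
    """sketch maps a thread name to its list of (global_order, operation)."""
    enforcer = OrderEnforcer()
    executed = []

    def worker(name, points):
        for order, operation in points:
            enforcer.at_sketch_point(
                order, lambda: executed.append((name, operation))
            )

    threads = [
        threading.Thread(target=worker, args=(name, points))
        for name, points in sketch.items()
    ]
    # Start the threads in reverse to show that start order does not matter.
    for thread in reversed(threads):
        thread.start()
    for thread in threads:
        thread.join()
    return executed


sync_sketch = {"T1": [(1, "lock A")], "T2": [(2, "lock B")]}
print(replay(sync_sketch))
```

Even though T2 is started first, it waits at its sketch point until T1 has executed `lock A`, matching the replay-run picture on the slide. Everything between sketch points is left unconstrained, which is what makes the replay probabilistic.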

  16. Monitor
     - Detect successful bug reproduction
       - Crash failure: PRES can catch exceptions
       - Deadlock: a periodic timer to check for progress
       - Incorrect results: programmer needs to provide conditions for checking
       - Can leverage testing oracles and existing bug detection tools
     - Detect unsuccessful replay
       - Compare against the execution sketch from the original execution
       - Avoid wasting replay effort on a wrong path
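The off-sketch check can be sketched as a per-thread comparison between what the replay does and what the sketch recorded. This is an illustrative simplification: the `Monitor` class, the `OffSketch` exception, and the representation of a sketch as a per-thread list of operation names are assumptions, and the crash, deadlock, and incorrect-result checks from the slide are not modeled.

```python
class OffSketch(Exception):
    """Raised when a replay deviates from the recorded execution sketch."""


class Monitor:
    def __init__(self, sketch):
        # sketch maps a thread name to the sketch points it recorded, in order.
        self._expected = {thread: list(points) for thread, points in sketch.items()}
        self._position = {thread: 0 for thread in sketch}

    def observe(self, thread, point):
        """Check one replayed sketch point against the original execution."""
        expected = self._expected.get(thread, [])
        index = self._position.get(thread, 0)
        if index >= len(expected) or expected[index] != point:
            # Abort early instead of continuing down a wrong path.
            raise OffSketch(f"{thread} executed {point!r} at position {index}")
        self._position[thread] = index + 1


monitor = Monitor({"T1": ["lock(L)", "unlock(L)"]})
monitor.observe("T1", "lock(L)")
try:
    monitor.observe("T1", "lock(M)")
except OffSketch as deviation:
    print("abort replay:", deviation)
```

Aborting at the first deviation lets the next replay attempt start sooner, which matters when many attempts may be needed.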

  17. What if a replay attempt fails?
     - Replay it again!
       - Restart from the beginning or the previous checkpoint
     - Shall we do something different next time?
       - Random approach: just leave it to fate
       - Systematic approach: actively learn from previous mistakes

  18. Feedback Generator (1/2)
     - Why can't previous replays reproduce a bug?
       - Some un-recorded data races execute in different orders
     - Example with FUNC sketches (Thread 1 and Thread 2 both run worker()):
         Thread 1: if (myid==0) result = data;
         Thread 2: tmp = result; printf("%d\n", tmp);
       - Production run: the original order of the two racing accesses to result is not recorded in the sketch
       - 1st replay attempt: the racing accesses execute in a different order, so the replay fails to reproduce the bug
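The retry-with-feedback idea can be sketched for the two racing statements in this example. This is a heavily simplified illustration and not the PRES feedback generator, whose details are not given in this excerpt: the `execute` and `reproduce` functions are hypothetical, the racing pair is assumed to be known already, the initial value of `result` is assumed to be 0, and the failure is assumed to be the read observing a value other than `data`.

```python
def execute(order, data=42):
    """Run the two racing statements in the given order; return the output."""
    result = 0    # assumed initial value of the shared variable
    tmp = None
    for statement in order:
        if statement == "write":    # Thread 1: result = data
            result = data
        elif statement == "read":   # Thread 2: tmp = result
            tmp = result
    return tmp


def reproduce(is_failure, first_order, max_attempts=10):
    """Replay until the failure shows up, flipping the race after each miss."""
    order = list(first_order)
    for attempt in range(1, max_attempts + 1):
        if is_failure(execute(order)):
            return attempt, order
        # Feedback from the unsuccessful replay: try the other race order.
        order.reverse()
    return None


attempt, order = reproduce(lambda output: output != 42, ["write", "read"])
print(attempt, order)
```

The first attempt picks the order that does not trigger the failure, and the lesson from that attempt steers the second one to the failing order. With more unrecorded races the search space grows, which is why learning from failed attempts is preferable to leaving the order to chance.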
