
Probabilistic Detection and Sampling of Concurrency Bugs

Yan Cai (ycai.mail@gmail.com), State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences. ISHCS 2016 (International Symposium on High Confidence Software), PKU, Beijing, Dec. 18, 2016.


  1. ISHCS 2016 (International Symposium on High Confidence Software), PKU, Beijing, Dec. 18, 2016. Probabilistic Detection and Sampling of Concurrency Bugs. Yan Cai (蔡彦), ycai.mail@gmail.com, State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences.

  2. Radius-aware Probabilistic Deadlock Detection (ASE’16). Yan Cai and Zijiang Yang.

  3. Locks and Deadlocks
     Figure: Thread 1 and Thread 2 read and write shared data (Data 1, Data 2); a deadlock arises between the two threads.
     • Example: Thread t1 executes acq(m) then acq(n); Thread t2 executes acq(n) then acq(m).

  4. Deadlock Testing
     • Random testing
       – OS scheduling + random manipulation
       – Stress testing
       – Heuristic directed random testing
       – Systematic scheduling
     No guarantee of finding a concurrency bug (e.g., a deadlock).

  5. PCT – Probabilistic Concurrency Testing
     • PCT algorithm
       – Mathematical randomness with a probabilistic guarantee: $\frac{1}{n \times k^{d-1}}$, where n: #threads, k: #events, d: bug depth
     • Example (k = 8, n = 2, d = 2): $\frac{1}{2 \times 8^{2-1}} = 1/16$
       – Thread t1: s01 acq(m); s02 acq(n); s03 rel(n); s04 rel(m)
       – Thread t2: s05 acq(n); s06 acq(m); s07 rel(m); s08 rel(n)

  6. PCT – Probabilistic Concurrency Testing
     • PCT: intuition behind the guaranteed probability
       1. Satisfy the 1st order by assigning the thread the largest priority (probability $1/n$).
       2. Select d − 1 priority change points at the remaining d − 1 order positions (probability $1/k \times 1/k \times \dots \times 1/k = \frac{1}{k^{d-1}}$).
       ⇒ $\frac{1}{n \times k^{d-1}}$
     • Example (k = 8, n = 2, d = 2): $\frac{1}{2 \times 8^{2-1}} = 1/16$
       – Thread t1: s01 acq(m); s02 acq(n); s03 rel(n); s04 rel(m)
       – Thread t2: s05 acq(n); s06 acq(m); s07 rel(m); s08 rel(n)
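The intuition above can be checked on the two-thread example with a small simulation. The following is only a minimal sketch of a PCT-style priority scheduler, not the authors' implementation: the function names `run_pct` and `deadlock_probability`, the non-reentrant lock model, and the choice to lower a thread's priority right after the change-point step executes are all assumptions made for illustration. It enumerates every initial priority order and every change point instead of sampling them randomly.

```python
from itertools import permutations, product

# The two-thread example from the slide: each step is (operation, lock).
THREADS = {
    "t1": [("acq", "m"), ("acq", "n"), ("rel", "n"), ("rel", "m")],
    "t2": [("acq", "n"), ("acq", "m"), ("rel", "m"), ("rel", "n")],
}


def run_pct(threads, order, change_points):
    """Run one priority-driven schedule; return True if it deadlocks.

    order: thread names from highest to lowest initial priority.
    change_points: d-1 step numbers (1-based). After the step numbered
    change_points[i-1] executes, the thread that ran it drops to priority i.
    """
    d = len(change_points) + 1
    n = len(order)
    # Initial priorities d .. d+n-1 stay above the change-point priorities 1 .. d-1.
    priority = {name: d + (n - 1 - rank) for rank, name in enumerate(order)}
    pc = {name: 0 for name in threads}  # index of the next step per thread
    held = {}                           # lock -> owning thread (assumed non-reentrant)
    step = 0
    while True:
        unfinished = [t for t in threads if pc[t] < len(threads[t])]
        if not unfinished:
            return False                # every thread ran to completion
        enabled = []
        for t in unfinished:
            op, lock = threads[t][pc[t]]
            if op == "rel" or lock not in held:
                enabled.append(t)
        if not enabled:
            return True                 # all remaining threads are blocked
        t = max(enabled, key=priority.get)
        op, lock = threads[t][pc[t]]
        if op == "acq":
            held[lock] = t
        else:
            del held[lock]
        pc[t] += 1
        step += 1
        for i, point in enumerate(change_points, start=1):
            if point == step:
                priority[t] = i


def deadlock_probability(threads, d):
    """Fraction of (priority order, change points) choices that deadlock."""
    k = sum(len(steps) for steps in threads.values())
    hits = total = 0
    for order in permutations(threads):
        for points in product(range(1, k + 1), repeat=d - 1):
            total += 1
            hits += run_pct(threads, order, points)
    return hits / total


n, k, d = 2, 8, 2
print(deadlock_probability(THREADS, d))  # measured over all 16 choices
print(1 / (n * k ** (d - 1)))            # guaranteed lower bound from the slide
```

Under these assumptions, 2 of the 16 choices deadlock (the change point falls right after the first acquisition, for either priority order), so the measured probability is 0.125, which is above the guaranteed bound of 1/16.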

  7. PCT – Probabilistic Concurrency Testing
     • Provides a guarantee (a probability) for threads t1, t2, …, tn: $\frac{1}{n \times k^{d-1}}$, where n: #threads, k: #events, d: bug depth
     But …
     • It is a theoretical model that does not consider thread interaction: real executions do not follow the designed executions.
     • The guaranteed probability decreases exponentially as the bug depth increases, due to the factor $\frac{1}{k^{d-1}}$.
     Figure: (a) Uniform distribution.

  8. RPro – Radius aware
     • Our approach: RPro – Radius-aware Probabilistic testing
     • Considers thread interaction
     • The guaranteed probability decreases by a factor of $1/r$ (not $1/k$; r ≪ k)
     • PCT vs. RPro: $\frac{1}{n \times k^{d-1}}$ vs. $\frac{1}{n \times k \times r^{d-2}}$
     Figure: (a) Uniform distribution (threads t1, t2, …, tn over an execution).
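The gap between the two guarantees is easiest to see numerically. The sketch below simply evaluates the two formulas from the slides with exact fractions; the function names `pct_bound` and `rpro_bound` and the sample parameter values are assumptions chosen for illustration, not figures from the talk.

```python
from fractions import Fraction


def pct_bound(n, k, d):
    """PCT guarantee from the slides: 1 / (n * k^(d-1))."""
    return Fraction(1, n * k ** (d - 1))


def rpro_bound(n, k, d, r):
    """RPro guarantee from the slides: 1 / (n * k * r^(d-2)), for d >= 2."""
    return Fraction(1, n * k * r ** (d - 2))


# Hypothetical parameters, chosen only to illustrate the formulas.
n, k, d, r = 3, 5000, 3, 10
print(float(pct_bound(n, k, d)))
print(float(rpro_bound(n, k, d, r)))
# The improvement factor is (k / r)^(d-2); it is 1 when d = 2.
print(rpro_bound(n, k, d, r) / pct_bound(n, k, d))
```

Because the ratio of the two bounds is $(k/r)^{d-2}$, the advantage of a small radius grows with the bug depth and vanishes for depth-2 bugs.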

  9. RPro – Radius aware
     • RPro: theoretical guarantee
     Figure: probability (y-axis, marking 0, $\frac{1}{n \times k^{d-1}}$ and $\frac{1}{n \times k \times r^{d-2}}$) against bug radius (x-axis, marking r_bug − 1, r_bug and r = k). Legend: PCT guaranteed probability; RPro guaranteed probability; RPro probability in practice.
     How to find r_bug?

  10. Experiment
      • Results
      Figure: probability curves for PCT and RPro, one panel per benchmark: (a) JDBC-1, (b) JDBC-2, (c) JDBC-3, (d) JDBC-4, (e) Hawknl, (f) SQLite, (g) MySQL-1, (h) MySQL-2, (i) MySQL-3, (j) MySQL-4. Each panel marks the best radius and its probability, as listed in Table 1.

      Table 1. The best radiuses (r_best) of each benchmark.

      | Benchmark | # events | # threads | Bug depth | r_best* | r_best / # events | Probability |
      |-----------|----------|-----------|-----------|---------|-------------------|-------------|
      | Hawknl    | 28       | 3         | 3         | 2       | -                 | 0.4530      |
      | SQLite    | 16       | 3         | 3         | 2       | -                 | 0.6863      |
      | JDBC-2    | 5,050    | 3         | 3         | 3       | 0.059%            | 0.0632      |
      | JDBC-4    | 5,090    | 3         | 3         | 5       | 0.098%            | 0.1123      |
      | JDBC-3    | 5,080    | 3         | 3         | 11      | 0.217%            | 0.0229      |
      | JDBC-1    | 5,088    | 3         | 3         | 17      | 0.334%            | 0.0439      |
      | MySQL-4   | 444,621  | 19        | 3         | 20      | 0.005%            | 0.0062      |
      | MySQL-2   | 15,066   | 17        | 3         | 27      | 0.179%            | 0.0256      |
      | MySQL-1   | 19,300   | 16        | 3         | 47      | 0.244%            | 0.0022      |
      | MySQL-3   | 406,117  | 22        | 6         | 114     | 0.028%            | 0.0039      |

      (* All rows are sorted on the data in this column.)

  11. Deployable Data Race Sampling (FSE’16). Yan Cai, Jian Zhang, Lingwei Cao, and Jian Liu.

  12. Concurrency bugs
      • Difficult to detect
        – Non-determinism (space explosion)
        – Inadequate test inputs
        – …
      • Even after software release, concurrency bugs may still occur

  13. Concurrency bugs
      • It is necessary to detect concurrency bugs in deployed products
      • Challenges: the detector must not disturb normal executions
        – Lightweight: <5% overhead
        – …
      Sample user executions

  14. Existing works
      • Data race: two threads concurrently access the same memory location, and at least one access is a write.
      • Happens-before race (HB race): an access pair not ordered by the happens-before relation (HBR)
      • Example 1: threads t1 and t2 each execute x++; outside their sync(m){} blocks. Value of x: +1 or +2?
      • Example 2: threads t1 and t2 each execute sync(m){x++;}. Value of x: +2.
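The happens-before definition above can be illustrated with vector clocks. This is a minimal sketch of a textbook-style checker over a recorded trace, not the algorithm of Pacer or any tool named in the talk; the trace encoding `(thread, op, target)` and the function name `hb_races` are assumptions made for illustration. Note that each lock acquire joins two vectors of length n, which is the kind of O(n) operation that full happens-before tracking incurs.

```python
def hb_races(trace, n_threads):
    """Return (variable, earlier thread, later thread) for each HB race.

    trace: list of (thread index, op, target) with op in
    {"acq", "rel", "rd", "wr"}; target is a lock or variable name.
    """
    # One vector clock per thread; each thread starts at 1 in its own slot.
    clocks = [[0] * n_threads for _ in range(n_threads)]
    for t in range(n_threads):
        clocks[t][t] = 1
    lock_clock = {}  # lock -> vector clock stored at its last release
    history = {}     # variable -> list of (thread, clock snapshot, is_write)
    races = []
    for t, op, target in trace:
        if op == "acq":
            released = lock_clock.get(target)
            if released is not None:
                # Join: an O(n) operation on every acquire.
                clocks[t] = [max(a, b) for a, b in zip(clocks[t], released)]
        elif op == "rel":
            lock_clock[target] = list(clocks[t])
            clocks[t][t] += 1
        else:
            is_write = op == "wr"
            for u, snap, was_write in history.get(target, []):
                ordered = all(a <= b for a, b in zip(snap, clocks[t]))
                if u != t and (is_write or was_write) and not ordered:
                    races.append((target, u, t))
            history.setdefault(target, []).append((t, list(clocks[t]), is_write))
    return races


# Unprotected updates of x by two threads: reported as a race.
unprotected = [(0, "wr", "x"), (1, "wr", "x")]
# Both updates inside sync(m){...}: ordered by the lock, no race.
protected = [
    (0, "acq", "m"), (0, "wr", "x"), (0, "rel", "m"),
    (1, "acq", "m"), (1, "wr", "x"), (1, "rel", "m"),
]
print(hb_races(unprotected, 2))
print(hb_races(protected, 2))
```

The sketch keeps every past access per variable, which is simple but wasteful; it is meant to show the ordering check, not an efficient detector.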

  15. Existing works
      • Happens-before races
        – Track the full happens-before relation
        – Incurs many O(n) operations
      • 0% sampling rate => ~30% overhead (Pacer, PLDI’10); ~15% in our experiment
      Insight 1: Do not track the full happens-before relation

  16. Existing works
      • Hardware based (e.g., DataCollider, OSDI’10)
        – Code breakpoints and data breakpoints (or watchpoints)
        – Collision races
      • A data race: two accesses
        – Select a memory address => set a data breakpoint => wait for the breakpoint to be fired
        – The waiting time directly increases the sampling overhead
      Insight 2: Do not directly delay executions

  17. Existing works
      • …
      • See our paper for more insights

  18. Our Proposal
      • Clock race
        – For data race sampling purposes
      • CRSampler
        – To detect clock races

  19. Clock Race
      • Clock race
        – Thread-local clock: an integer for each thread, increased on each synchronization operation.
        – Two accesses (at least one of them a write) form a clock race if at least one thread-local clock is not changed between the two accesses.
      Figure: two timelines, each with Thread 1 accessing at time 1 and Thread 2 accessing at time 2. In one, Thread 1's clock is not changed between time 1 and time 2. In the other, sync operations occur as time elapses: no clock races.

  20. Clock Race
      • A quick demonstration: maintain thread-local clocks (t1.clock, t2.clock)
        – Initially: t1.clock = 10, t2.clock = 8
        – Thread 1: acquire(l); onSync(); and Thread 2: acquire(k); onSync(); → clocks 11, 9
        – Thread 1: … x = 0; sample(x); (sampled access) → clocks 11, 9
        – Thread 2: release(k); onSync(); → clocks 11, 10
        – Thread 2: x++; → clocks 11, 10. On this read, t1.clock remains 11, so a clock race on x is reported.
        – Thread 1: release(l); onSync(); → t1.clock = 12
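The demonstration above can be replayed in a few lines. The following is a toy model only, not CRSampler itself: the class `ClockRaceSampler`, its method names, and the decision to treat `x++` as a write access are assumptions for illustration, and the race condition follows the slide's wording (at least one of the two threads' clocks is unchanged between the accesses).

```python
class ClockRaceSampler:
    """Toy model of clock-race checking with thread-local integer clocks."""

    def __init__(self, clocks):
        self.clocks = dict(clocks)  # thread -> thread-local clock
        self.sampled = {}           # variable -> (thread, is_write, clock snapshot)

    def on_sync(self, thread):
        # Called on every synchronization operation (acquire or release).
        self.clocks[thread] += 1

    def sample(self, thread, var, is_write):
        # Remember the first access together with the clocks at that moment.
        self.sampled[var] = (thread, is_write, dict(self.clocks))

    def access(self, thread, var, is_write):
        """Return True if this access forms a clock race with the sampled one."""
        if var not in self.sampled:
            return False
        first, first_write, snapshot = self.sampled[var]
        if first == thread or not (is_write or first_write):
            return False
        unchanged = [t for t in (first, thread) if self.clocks[t] == snapshot[t]]
        return bool(unchanged)


# Replay of the slide's demonstration.
s = ClockRaceSampler({"t1": 10, "t2": 8})
s.on_sync("t1")                     # Thread 1: acquire(l)  -> t1.clock = 11
s.on_sync("t2")                     # Thread 2: acquire(k)  -> t2.clock = 9
s.sample("t1", "x", True)           # Thread 1: x = 0; sample(x)
s.on_sync("t2")                     # Thread 2: release(k)  -> t2.clock = 10
race = s.access("t2", "x", True)    # Thread 2: x++ while t1.clock is still 11
s.on_sync("t1")                     # Thread 1: release(l)  -> t1.clock = 12
print(race, s.clocks)
```

No thread is delayed in this model: the check only compares integers at the moment of the second access, which matches the talk's second insight.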

  21. Clock Race
      • Clock race
        – Race checking does not need to delay any thread.
        – But: after the first access e1 appears, how much time should be allowed for checking the two accesses?
          • If the time is too short, it is not enough to trap the second access.
          • If the time is too long, all threads’ thread-local clocks will have changed.
      Figure: Thread 1 accesses at time 1 and Thread 2 at time 2; time elapsed: one second, or …? Thread 1's clock is not changed between time 1 and time 2.

  22. Setup
      • Implementation
        – Jikes RVM
        – Sampling: Java class load time
        – Memory accesses → Linux kernel
      Figure: architecture diagram (labels: User space, Kernel space, JikesRVM, User-site, Kernel Site, Com. Agent, Netlink, Core of DC/CR, CPU, Execution, Set breakpoints, On firing).
      • Benchmarks
        – DaCapo benchmark suite

  23. Setup
      • Comparisons
        – Sampling rate: 0.1% to 1.0%
        – Pacer (PLDI’10)
        – DataCollider (OSDI’10): DC15, DC30 (15 ms, 30 ms)
        – CRSampler: CR15, CR30
      • ThinkPad workstation
        – i7-4710MQ CPU, four cores, 16 GB memory, 250 GB SSD
