
ISHCS 2016 (International Symposium on High Confidence Software), PKU, Beijing, Dec. 18, 2016
Probabilistic Detection and Sampling of Concurrency Bugs
Yan Cai (蔡彦), ycai.mail@gmail.com
State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences

Radius-aware Probabilistic Deadlock detection ASE’16 Yan Cai and Zijiang Yang

Locks and Deadlocks
[Figure: Thread 1 and Thread 2 reading and writing shared data; deadlock.]
Thread t1: acq(m); acq(n)
Thread t2: acq(n); acq(m)
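The opposite acquisition orders on this slide can be checked mechanically. The following is a minimal sketch, not the method of the talk: it assumes each thread's lock operations are available as a list of hypothetical `("acq" | "rel", lock)` events, builds a lock-order graph (an edge from every held lock to each newly acquired one), and reports a potential deadlock when the graph has a cycle. The function names are illustrative.

```python
from collections import defaultdict


def lock_order_edges(traces):
    """Collect (held, acquired) pairs from every thread's event list."""
    edges = set()
    for events in traces.values():
        held = []
        for op, lock in events:
            if op == "acq":
                edges.update((h, lock) for h in held)
                held.append(lock)
            else:  # "rel"
                held.remove(lock)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the lock-order graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    visiting, done = set(), set()

    def dfs(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: the lock order is cyclic
        visiting.add(node)
        found = any(dfs(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(dfs(node) for node in list(graph))


# The two threads from the slide: t1 takes m then n, t2 takes n then m.
slide_traces = {
    "t1": [("acq", "m"), ("acq", "n"), ("rel", "n"), ("rel", "m")],
    "t2": [("acq", "n"), ("acq", "m"), ("rel", "m"), ("rel", "n")],
}
print(has_cycle(lock_order_edges(slide_traces)))
```

A cycle only signals that a deadlock is possible; whether a given run actually deadlocks depends on the schedule, which is the problem the testing techniques on the following slides address.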

Deadlock Testing
• Random testing
– OS scheduling + random manipulation
– Stress testing
– Heuristic directed random testing
– Systematic scheduling
No guarantee of finding a concurrency bug (e.g., a deadlock)

PCT – Probabilistic Concurrency Testing
• PCT Algorithm
– Mathematical randomness with probabilistic guarantees
– Guaranteed probability: $\frac{1}{n \times k^{d-1}}$, where n: #threads, k: #events, d: bug depth
Example (k = 8, n = 2, d = 2): $\frac{1}{2 \times 8^{2-1}} = \frac{1}{16}$
Thread t1: s01 acq(m); s02 acq(n); s03 rel(n); s04 rel(m)
Thread t2: s05 acq(n); s06 acq(m); s07 rel(m); s08 rel(n)

PCT – Probabilistic Concurrency Testing
• PCT: intuition of the guaranteed probability
1. Satisfy the 1st order by assigning the thread the largest priority ($1/n$).
2. Select d − 1 priority change points at the remaining d − 1 order positions ($1/k \times 1/k \times \dots \times 1/k = 1/k^{d-1}$).
⇒ $\frac{1}{n \times k^{d-1}}$
Example (k = 8, n = 2, d = 2): $\frac{1}{2 \times 8^{2-1}} = \frac{1}{16}$
Thread t1: s01 acq(m); s02 acq(n); s03 rel(n); s04 rel(m)
Thread t2: s05 acq(n); s06 acq(m); s07 rel(m); s08 rel(n)
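The two steps of this intuition can be tried on the example program. The sketch below is a simplified, assumption-laden model of a PCT-style scheduler, not the implementation discussed in the talk: threads get distinct initial priorities, the enabled thread with the highest priority always runs, and after the step chosen as the change point the running thread drops to the lowest priority. Instead of sampling, it enumerates every priority order and every change point for d = 2. All names (`PROGRAM`, `ends_in_deadlock`, `deadlock_probability`) are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# The two threads from the slide, as (operation, lock) events.
PROGRAM = {
    "t1": [("acq", "m"), ("acq", "n"), ("rel", "n"), ("rel", "m")],
    "t2": [("acq", "n"), ("acq", "m"), ("rel", "m"), ("rel", "n")],
}


def is_enabled(event, owner):
    """A release can always run; an acquire needs the lock to be free."""
    op, lock = event
    return op == "rel" or lock not in owner


def ends_in_deadlock(program, order, change_points):
    """Run one priority-driven schedule and report whether it deadlocks.

    order: thread names from highest to lowest initial priority.
    change_points: 1-based step numbers; the thread that executes such a
    step is moved below every initial priority.
    """
    d = len(change_points) + 1
    n = len(order)
    priority = {t: d + n - 1 - i for i, t in enumerate(order)}
    pc = {t: 0 for t in program}   # next event index per thread
    owner = {}                     # lock -> thread currently holding it
    step = 0
    while True:
        pending = [t for t in order if pc[t] < len(program[t])]
        if not pending:
            return False           # every thread finished
        enabled = [t for t in pending if is_enabled(program[t][pc[t]], owner)]
        if not enabled:
            return True            # unfinished threads, none can move
        t = max(enabled, key=priority.get)
        op, lock = program[t][pc[t]]
        if op == "acq":
            owner[lock] = t
        else:
            del owner[lock]
        pc[t] += 1
        step += 1
        for i, cp in enumerate(change_points, start=1):
            if step == cp:
                priority[t] = i    # values 1..d-1 are below all initial ones


def deadlock_probability(program):
    """Exact deadlock probability for d = 2 (a single change point)."""
    k = sum(len(events) for events in program.values())
    runs = [ends_in_deadlock(program, order, (cp,))
            for order in permutations(program)
            for cp in range(1, k + 1)]
    return Fraction(sum(runs), len(runs))


print(deadlock_probability(PROGRAM))
```

In this model the deadlock is hit whenever the change point falls on the first step, whichever thread starts, so the enumeration gives 1/8. That is above the guaranteed 1/16, as expected for a lower bound.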

PCT – Probabilistic Concurrency Testing
• Provides a guarantee (a probability): $\frac{1}{n \times k^{d-1}}$, where n: #threads, k: #events, d: bug depth
[Figure (a): uniform distribution over the execution of threads t1, t2, …, tn]
But …
• It is a theoretical model that does not consider thread interaction: real executions do not follow the designed executions.
• The guaranteed probability decreases exponentially as the bug depth increases, due to the factor $1/k^{d-1}$.

RPro – Radius aware
• Our approach: RPro – Radius-aware Probabilistic testing
• Considers thread interaction
• Guaranteed probability decreases by a factor of $1/r$ (not $1/k$; $r \ll k$)
PCT: $\frac{1}{n \times k^{d-1}}$ vs. RPro: $\frac{1}{n \times k \times r^{d-2}}$
[Figure (a): uniform distribution over the execution of threads t1, t2, …, tn; PCT vs. RPro]
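To see how much the change from $1/k$ to $1/r$ matters, the two guaranteed probabilities can be evaluated side by side. This is a small sketch using exact fractions; the function names are illustrative and the inputs (n = 3, k = 5,050, d = 3, r = 3) are illustrative values on the scale of the JDBC benchmarks reported in Table 1.

```python
from fractions import Fraction


def pct_bound(n, k, d):
    """PCT guarantee: 1 / (n * k^(d-1))."""
    return Fraction(1, n * k ** (d - 1))


def rpro_bound(n, k, d, r):
    """RPro guarantee: 1 / (n * k * r^(d-2)), for bug depth d >= 2."""
    return Fraction(1, n * k * r ** (d - 2))


n, k, d, r = 3, 5050, 3, 3
pct = pct_bound(n, k, d)
rpro = rpro_bound(n, k, d, r)
print(f"PCT : {float(pct):.3e}")
print(f"RPro: {float(rpro):.3e}")
print(f"RPro / PCT = {float(rpro / pct):.1f}")
```

The ratio of the two bounds is $(k/r)^{d-2}$, so the gap widens with every additional unit of bug depth, and the two bounds coincide when r = k or d = 2.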

RPro – Radius aware
• RPro: theoretical guarantee
[Figure: probability against bug radius, with the radius axis marked at r_bug − 1, r_bug, and r = k. Curves: PCT guaranteed probability, $\frac{1}{n \times k^{d-1}}$; RPro guaranteed probability, $\frac{1}{n \times k \times r^{d-2}}$; RPro probability in practice.]
How to find r_bug?

Experiment
• Results
[Figure: ten panels comparing the probabilities of PCT and RPro: (a) JDBC-1, (b) JDBC-2, (c) JDBC-3, (d) JDBC-4, (e) Hawknl, (f) SQLite, (g) MySQL-1, (h) MySQL-2, (i) MySQL-3, (j) MySQL-4. Each panel marks the best radius r and its probability p, as listed in Table 1.]

Table 1. The best radiuses (r_best) of each benchmark.
Benchmark   # events   # threads   Bug depth   r_best*   r_best / # events   Probability
Hawknl      28         3           3           2         -                   0.4530
SQLite      16         3           3           2         -                   0.6863
JDBC-2      5,050      3           3           3         0.059%              0.0632
JDBC-4      5,090      3           3           5         0.098%              0.1123
JDBC-3      5,080      3           3           11        0.217%              0.0229
JDBC-1      5,088      3           3           17        0.334%              0.0439
MySQL-4     444,621    19          3           20        0.005%              0.0062
MySQL-2     15,066     17          3           27        0.179%              0.0256
MySQL-1     19,300     16          3           47        0.244%              0.0022
MySQL-3     406,117    22          6           114       0.028%              0.0039
(* All rows are sorted on the data in this column.)

Deployable Data Race Sampling FSE’16 Yan Cai , Jian Zhang, Lingwei Cao, and Jian Liu

Concurrency bugs
• Difficult to detect
– Non-determinism (space explosion)
– Inadequate test inputs
– …
• Even after software release, concurrency bugs may still occur

Concurrency bugs
• It is necessary to detect concurrency bugs in deployed products
• Challenges: the detector must not disturb normal executions
– Lightweight: <5% overhead
– …
Sample user executions

Existing works
• Data race: two threads concurrently access the same memory location and at least one access is a write.
• Happens-before race (HB race): an access pair not ordered by the happens-before relation (HBR)
Example 1: threads t1 and t2 each execute x++ outside an empty sync(m){} block. Value of x: +1 or +2?
Example 2: threads t1 and t2 each execute sync(m){x++;}. Value of x: +2.

Existing works
• Happens-before races
– Track the full happens-before relation
• Incurs many O(n) operations
• 0% sampling rate => ~30% overhead (Pacer, PLDI’10); ~15% in our experiment
Insight 1: do not track the full happens-before relation
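A common way to track the full happens-before relation is with vector clocks, which makes the O(n) cost visible: every lock acquire joins a clock with one entry per thread. The sketch below is a generic textbook-style checker under simplifying assumptions (a fixed thread set, only lock acquire and release as synchronization, every access kept in a list); it is not Pacer's algorithm, and the trace format and function name are illustrative.

```python
def hb_races(trace, threads):
    """Report pairs of conflicting accesses not ordered by happens-before.

    trace: list of (thread, op, target) with op in
    {"acq", "rel", "rd", "wr"}; target is a lock or a variable name.
    """
    # One vector clock per thread; each thread starts at time 1 for itself.
    clock = {t: {u: 0 for u in threads} for t in threads}
    for t in threads:
        clock[t][t] = 1
    lock_clock = {}   # lock -> vector clock stored at its last release
    accesses = {}     # variable -> list of (thread, is_write, own time)
    races = []
    for t, op, target in trace:
        if op == "acq":
            # Join with the releasing thread's clock: O(n) per acquire.
            for u, c in lock_clock.get(target, {}).items():
                clock[t][u] = max(clock[t][u], c)
        elif op == "rel":
            lock_clock[target] = dict(clock[t])
            clock[t][t] += 1
        else:
            is_write = op == "wr"
            for u, prev_write, stamp in accesses.get(target, []):
                ordered = stamp <= clock[t][u]
                if u != t and (is_write or prev_write) and not ordered:
                    races.append((target, u, t))
            accesses.setdefault(target, []).append((t, is_write, clock[t][t]))
    return races


unprotected = [("t1", "wr", "x"), ("t2", "wr", "x")]
protected = [
    ("t1", "acq", "m"), ("t1", "wr", "x"), ("t1", "rel", "m"),
    ("t2", "acq", "m"), ("t2", "wr", "x"), ("t2", "rel", "m"),
]
print(hb_races(unprotected, ["t1", "t2"]))
print(hb_races(protected, ["t1", "t2"]))
```

Here x++ is modeled as a single write for brevity. The unprotected trace yields one race on x, while the trace that wraps both updates in the same lock yields none.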

Existing works
• Hardware based (e.g., DataCollider, OSDI’10)
– Code breakpoints and data breakpoints (or watchpoints)
– Collision races
• A data race: two accesses
– Select a memory address => set a data breakpoint => wait for the breakpoint to be fired
– The waiting time directly increases the sampling overhead
Insight 2: do not directly delay executions

Existing works
• …
• See our paper for more insights

Our Proposal
• Clock Race
– For data race sampling purposes
• CRSampler
– To detect clock races

Clock Race
• Clock race
– Thread-local clock: an integer for each thread, increased on each synchronization operation.
– Two accesses (at least one of them a write) form a clock race if at least one thread-local clock is not changed between the two accesses.
[Figure: two timelines for Thread 1 and Thread 2, with accesses at time 1 and time 2. Left: the clock of Thread 1 is not changed between time 1 and time 2. Right: sync operations occur between time 1 and time 2; no clock races.]

Clock Race
• A quick demonstration: maintain thread-local clocks
Initially: clock of Thread 1 = 10, clock of Thread 2 = 8
Thread 1: acquire(l); onSync();   Thread 2: acquire(k); onSync();   clocks: 11, 9
Thread 1: x = 0; sample(x); (sampled access)   clocks: 11, 9
Thread 2: release(k); onSync();   clocks: 11, 10
Thread 2: x++;   clocks: 11, 10
Thread 1: release(l); onSync();   clock of Thread 1: 12
On this read, t1.clock remains 11, so a clock race on x is reported.
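The demonstration can be replayed with a few lines of bookkeeping. The sketch below assumes a purely software model: the class `ClockRaceSampler` and its methods are hypothetical names, the sampled access is kept in a dictionary rather than trapped with a hardware breakpoint, and a snapshot of all clocks is stored at sampling time. It illustrates the clock race definition only and is not CRSampler's implementation.

```python
class ClockRaceSampler:
    """Toy model of clock-race checking with thread-local integer clocks."""

    def __init__(self, initial_clocks=None):
        self.clock = dict(initial_clocks or {})   # thread -> integer clock
        self.sampled = {}   # address -> (thread, is_write, clock snapshot)

    def on_sync(self, thread):
        """Called on every synchronization operation of `thread`."""
        self.clock[thread] = self.clock.get(thread, 0) + 1

    def sample(self, thread, addr, is_write):
        """Remember a sampled access together with the current clocks."""
        self.sampled[addr] = (thread, is_write, dict(self.clock))

    def on_access(self, thread, addr, is_write):
        """Return True if this access forms a clock race with the sample."""
        if addr not in self.sampled:
            return False
        s_thread, s_write, snapshot = self.sampled[addr]
        if s_thread == thread or not (s_write or is_write):
            return False
        # Clock race: at least one of the two threads' clocks is unchanged.
        return any(self.clock.get(u, 0) == snapshot.get(u, 0)
                   for u in (s_thread, thread))


# Replay of the slide: clocks start at 10 (Thread 1) and 8 (Thread 2).
sampler = ClockRaceSampler({"t1": 10, "t2": 8})
sampler.on_sync("t1")                        # acquire(l): t1 -> 11
sampler.on_sync("t2")                        # acquire(k): t2 -> 9
sampler.sample("t1", "x", is_write=True)     # x = 0; sample(x)
sampler.on_sync("t2")                        # release(k): t2 -> 10
reported = sampler.on_access("t2", "x", is_write=True)   # x++
sampler.on_sync("t1")                        # release(l): t1 -> 12
print(reported)
```

No thread is paused at any point: the check is a comparison of integers at the moment of the second access, which is the property the next slide relies on.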

Clock Race
• Clock race
– Race checking does not need to delay any thread.
– But: after the first access e1 appears, how much time should be allowed for checking the two accesses?
• If the time is too short, it is not enough to trap the second access.
• If the time is too long, all threads' local clocks are changed.
[Figure: Thread 1 accesses at time 1 and Thread 2 at time 2; time elapsed: one second, or …? The clock of Thread 1 is not changed between time 1 and time 2.]

Setup
• Implementation
– Jikes RVM
– Sampling: Java class load time
– Memory accesses
[Figure: architecture spanning user space (JikesRVM, user-site communication agent) and kernel space (Linux kernel, core of DC/CR, kernel-site agent), connected via Netlink; breakpoints are set on the CPU and reported on firing.]
• Benchmarks
– Dacapo benchmark suite

Setup
• Comparisons
– Sampling rate: 0.1% to 1.0%
– Pacer (PLDI’10)
– DataCollider (OSDI’10): DC15, DC30 (15 ms, 30 ms)
– CRSampler: CR15, CR30
• ThinkPad workstation
– i7-4710MQ CPU, four cores, 16 GB memory, 250 GB SSD
