


SLIDE 1

Probabilistic Detection and Sampling of Concurrency Bugs

Yan Cai (蔡彦)

ycai.mail@gmail.com

State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences

中科院软件所·计算机科学国家重点实验室

ISHCS 2016 (International Symposium on High Confidence Software), PKU, Beijing, Dec. 18, 2016

SLIDE 2

Radius-aware Probabilistic Deadlock detection

ASE’16 Yan Cai and Zijiang Yang

SLIDE 3

Locks and Deadlocks

[Figure: Thread 1 and Thread 2 issue Read and Write accesses to Data 1 and Data 2; the conflicting accesses end in a Deadlock.]

Thread t1: acq(m); acq(n)
Thread t2: acq(n); acq(m)

SLIDE 4

Deadlock Testing

  • Random testing

– OS scheduling + random manipulation
– Stress testing
– Heuristic directed random testing
– Systematic scheduling

No guarantee to find a concurrency bug (e.g., a deadlock)

SLIDE 5
PCT – Probabilistic Concurrency Testing

  • PCT Algorithm

– Mathematical randomness with probabilistic guarantees

Thread t1: s01 acq(m); s02 acq(n); s03 rel(n); s04 rel(m)
Thread t2: s05 acq(n); s06 acq(m); s07 rel(m); s08 rel(n)

Guaranteed probability: $\frac{1}{n \times k^{d-1}}$ (n: #threads, k: #events, d: bug depth)

For k = 8, n = 2, d = 2: $\frac{1}{2 \times 8^{2-1}} = \frac{1}{16}$

SLIDE 6

PCT – Probabilistic Concurrency Testing

  • PCT:

– Intuition of the guaranteed probability:

  • 1. Satisfy the 1st order by assigning the thread the largest priority ($\frac{1}{n}$)
  • 2. Select d – 1 priority change points at the remaining d – 1 order positions ($\frac{1}{k} \times \frac{1}{k} \times \dots \times \frac{1}{k} = \frac{1}{k^{d-1}}$) ⇒ $\frac{1}{n \times k^{d-1}}$

Thread t1: s01 acq(m); s02 acq(n); s03 rel(n); s04 rel(m)
Thread t2: s05 acq(n); s06 acq(m); s07 rel(m); s08 rel(n)

For k = 8, n = 2, d = 2: $\frac{1}{2 \times 8^{2-1}} = \frac{1}{16}$
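The intuition above can be checked on the two-thread example with a small simulation. The following is a minimal sketch, assuming a simplified PCT-style scheduler for d = 2 (random initial priorities, one priority change point, always run the highest-priority enabled thread). The names `PROGRAMS`, `run_pct`, and `deadlock_probability` are hypothetical and not taken from the talk. It enumerates every priority assignment and change point instead of sampling them.

```python
from fractions import Fraction
from itertools import permutations

# The two-thread example from the slide: (operation, lock) per step.
PROGRAMS = {
    "t1": [("acq", "m"), ("acq", "n"), ("rel", "n"), ("rel", "m")],
    "t2": [("acq", "n"), ("acq", "m"), ("rel", "m"), ("rel", "n")],
}
D = 2  # assumed bug depth: one priority change point


def run_pct(priority, change_points):
    """Run one PCT-style schedule; return True if it ends in a deadlock."""
    priority = dict(priority)
    pc = {t: 0 for t in PROGRAMS}   # next step index of each thread
    owner = {}                      # lock -> thread currently holding it
    steps = 0
    while True:
        unfinished = [t for t in PROGRAMS if pc[t] < len(PROGRAMS[t])]
        if not unfinished:
            return False            # every thread finished: no deadlock
        enabled = [
            t for t in unfinished
            if PROGRAMS[t][pc[t]][0] == "rel" or PROGRAMS[t][pc[t]][1] not in owner
        ]
        if not enabled:
            return True             # all unfinished threads are blocked
        t = max(enabled, key=priority.get)
        op, lock = PROGRAMS[t][pc[t]]
        if op == "acq":
            owner[lock] = t
        else:
            del owner[lock]
        pc[t] += 1
        steps += 1
        # At the i-th change point, lower the running thread's priority to D - i.
        for i, cp in enumerate(change_points, start=1):
            if steps == cp:
                priority[t] = D - i


def deadlock_probability():
    """Exact deadlock probability over all priority orders and change points."""
    threads = list(PROGRAMS)
    n = len(threads)
    k = sum(len(steps) for steps in PROGRAMS.values())
    total = hits = 0
    for order in permutations(range(D, D + n)):
        for cp in range(1, k + 1):
            total += 1
            hits += run_pct(dict(zip(threads, order)), [cp])
    return Fraction(hits, total)


print(deadlock_probability())  # 1/8
```

Under these assumptions the deadlock appears in 2 of the 16 combinations (whenever the change point is step 1), which is above the guaranteed 1/16.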

SLIDE 7

PCT – Probabilistic Concurrency Testing

  • Provides a guarantee (a probability): $\frac{1}{n \times k^{d-1}}$ (n: #threads, k: #events, d: bug depth)

But …

  • It is a theoretical model that does not consider thread interaction: real executions do not follow the designed executions.
  • The guaranteed probability decreases exponentially as the bug depth increases, due to the factor $\frac{1}{k^{d-1}}$.

[Figure: (a) Uniform distribution; axes: Execution, Threads t1, t2, …, tn]

SLIDE 8

RPro – Radius aware

  • Our approach: RPro – Radius aware Probabilistic testing
  • Considers thread interaction
  • Guaranteed probability decreases by a factor of $\frac{1}{r}$ (not $\frac{1}{k}$; r ≪ k)

PCT vs. RPro

PCT: $\frac{1}{n \times k^{d-1}}$
RPro: $\frac{1}{n \times k \times r^{d-2}}$

[Figure: PCT vs. RPro; panel (a) Uniform distribution; axes: Execution, Threads t1, t2, …, tn]
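To make the difference between the two guarantees concrete, here is a small sketch that evaluates both bounds with exact fractions. The function names `pct_bound` and `rpro_bound` are hypothetical. The sample values (n = 3, k = 5,050, d = 3, r = 3) are taken from the JDBC-2 row of Table 1, and the RPro formula is assumed to apply for d ≥ 2.

```python
from fractions import Fraction


def pct_bound(n, k, d):
    """PCT guarantee: 1 / (n * k^(d-1))."""
    return Fraction(1, n * k ** (d - 1))


def rpro_bound(n, k, d, r):
    """RPro guarantee: 1 / (n * k * r^(d-2)), assumed here for d >= 2."""
    if d < 2:
        raise ValueError("this sketch only covers bug depth d >= 2")
    return Fraction(1, n * k * r ** (d - 2))


# Example from the PCT slide: k = 8, n = 2, d = 2.
print(pct_bound(2, 8, 2))            # 1/16

# JDBC-2 row of Table 1: 5,050 events, 3 threads, depth 3, radius 3.
n, k, d, r = 3, 5050, 3, 3
gain = rpro_bound(n, k, d, r) / pct_bound(n, k, d)
print(gain)                          # 5050/3
```

For d = 2 the two bounds coincide; each additional level of depth multiplies the gap by k / r.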

SLIDE 9

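RPro – Radius aware

  • RPro: Theoretical guarantee

[Figure: Probability versus Bug Radius; curves: "PCT: Guaranteed probability", "RPro: Guaranteed probability", "RPro: Probability in practice"; marked probabilities $\frac{1}{n \times k^{d-1}}$ and $\frac{1}{n \times k \times r^{d-2}}$; marked radii rbug – 1, rbug, and r = k]

How to find rbug?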

SLIDE 10

Experiment

  • Results

[Figure: probability of PCT and RPro on ten benchmarks, panels (a) to (j); the labels printed in each panel are listed below.]

Panel   Benchmark   PCT p    RPro r   RPro p
(a)     JDBC-1      0.0020   17       0.0439
(b)     JDBC-2      0.0385   3        0.0632
(c)     JDBC-3      0.0005   11       0.0229
(d)     JDBC-4      0.0680   5        0.1123
(e)     Hawknl      0.1755   2        0.453
(f)     SQLite      0.4326   2        0.6863
(g)     MySQL-1     0.0004   47       0.0022
(h)     MySQL-2     0.0088   27       0.0256
(i)     MySQL-3     0.0000   114      0.0039
(j)     MySQL-4     0.0000   20       0.0062

[Figure repeated from Slide 9: Probability versus Bug Radius]

Table 1. The best radius (rbest) of each benchmark.

Benchmark   # events   # threads   Bug depth   rbest*   rbest / # events   Probability
Hawknl      28         3           3           2        -                  0.4530
SQLite      16         3           3           2        -                  0.6863
JDBC-2      5,050      3           3           3        0.059%             0.0632
JDBC-4      5,090      3           3           5        0.098%             0.1123
JDBC-3      5,080      3           3           11       0.217%             0.0229
JDBC-1      5,088      3           3           17       0.334%             0.0439
MySQL-4     444,621    19          3           20       0.005%             0.0062
MySQL-2     15,066     17          3           27       0.179%             0.0256
MySQL-1     19,300     16          3           47       0.244%             0.0022
MySQL-3     406,117    22          6           114      0.028%             0.0039

(* All rows are sorted on the data in this column.)

SLIDE 11

Deployable Data Race Sampling

FSE’16

Yan Cai, Jian Zhang, Lingwei Cao, and Jian Liu

SLIDE 12

Concurrency bugs

  • Difficult to detect

– Non-determinism (space explosion)
– Inadequate test inputs
– …

  • Even after software release, concurrency bugs may still occur

SLIDE 13

Concurrency bugs

  • It is necessary to detect concurrency bugs in deployed products
  • Challenges:

– Not to disturb normal executions
– Lightweight
– …

[Figure labels: Sample user executions; Detector; <5% overhead]

SLIDE 14

Existing works

  • Data Race

– Two threads concurrently access the same memory location and at least one access is a write.

  • Happens-before race (HB Race)

– Access pairs not ordered by the happens-before relation (HBR)

Example (two cases of the same program):

Thread t1: x++; sync(m){}
Thread t2: sync(m){x++;}

Case 1: value of x: +2.
Case 2: value of x: +1 or +2?
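The happens-before definition can be illustrated on the two-thread example with a small vector-clock check. This is a minimal sketch, not the algorithm of any specific tool: it tracks only writes (x++ is treated as one write) and lock acquire/release. The names `hb_races`, `TRACE_ORDERED`, and `TRACE_RACY` are hypothetical, and the two traces are assumed orderings of the example program.

```python
def hb_races(trace):
    """Return (earlier, later) index pairs of writes not ordered by happens-before.

    trace: list of (thread, op, target) with op in {"acq", "rel", "write"}.
    """
    threads = sorted({t for t, _, _ in trace})
    # Each thread's vector clock starts with 1 in its own component.
    clock = {t: {u: int(u == t) for u in threads} for t in threads}
    lock_clock = {}   # lock -> vector clock stored at its last release
    writes = {}       # variable -> list of (index, thread, clock snapshot)
    races = []
    for i, (t, op, target) in enumerate(trace):
        vc = clock[t]
        if op == "acq":
            # Join with the clock of the last release of this lock.
            for u, c in lock_clock.get(target, {}).items():
                vc[u] = max(vc[u], c)
        elif op == "rel":
            lock_clock[target] = dict(vc)
            vc[t] += 1
        elif op == "write":
            for j, u, snap in writes.get(target, []):
                # The earlier write by u is ordered before this one only if
                # this thread has already seen u's clock value at that write.
                if u != t and snap[u] > vc[u]:
                    races.append((j, i))
            writes.setdefault(target, []).append((i, t, dict(vc)))
    return races


# t1's x++ and sync(m){} run before t2's sync(m){x++;}: ordered through lock m.
TRACE_ORDERED = [
    ("t1", "write", "x"), ("t1", "acq", "m"), ("t1", "rel", "m"),
    ("t2", "acq", "m"), ("t2", "write", "x"), ("t2", "rel", "m"),
]
# t2's sync(m){x++;} runs first, then t1's x++: nothing orders the two writes.
TRACE_RACY = [
    ("t2", "acq", "m"), ("t2", "write", "x"), ("t2", "rel", "m"),
    ("t1", "write", "x"), ("t1", "acq", "m"), ("t1", "rel", "m"),
]

print(hb_races(TRACE_ORDERED))  # []
print(hb_races(TRACE_RACY))     # [(1, 3)]
```

The same program is reported as racy or not depending on the observed order, and every acquire performs a join over all threads' clock entries, which is the kind of O(n) operation that full happens-before tracking incurs.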

SLIDE 15

Existing works

  • Happens-before Races

– Track the full happens-before relation

  • Incurring many O(n) operations

– 0% sampling rate => ~30% overhead (Pacer, PLDI’10); ~15% in our experiment

Insight 1: Not to track the full happens-before relation

SLIDE 16

Existing works

  • Hardware based (e.g., DataCollider, OSDI’10)

– Code breakpoints and data breakpoints (or watchpoints)
– Collision races

  • A data race: two accesses

– Select a memory address => set a data breakpoint => wait for the breakpoint to be fired
– The waiting time directly increases the sampling overhead

Insight 2: Not to directly delay executions

SLIDE 17

Existing works

  • See our paper for more insights


SLIDE 18

Our Proposal

  • Clock Race

– For data race sampling purpose

  • CRSampler

– To detect clock races


SLIDE 19

Clock Race

  • Clock Race

– Thread-local clock: an integer for each thread, increased on each synchronization operation.
– Two accesses (at least one of them a write) form a Clock Race if at least one thread-local clock is not changed between the two accesses.

[Figure: timeline ("Time elapse") with an access by Thread 1 at time1 and an access by Thread 2 at time2; the thread-local clock of Thread 1 is not changed between time1 and time2. A second case with "sync" operations between the two accesses is labeled "No clock races".]

SLIDE 20

Clock Race

  • A Quick Demonstration

Thread 1: acquire(l); onSync(); x = 0; sample(x); …; release(l); onSync();
Thread 2: acquire(k); onSync(); …; release(k); onSync(); x++;

– onSync(): maintain thread-local clocks
– sample(x): sampled access

Thread-local clocks (t1.clock, t2.clock) along the execution: (10, 8), (11, 9), (11, 9), (11, 9), (11, 10), (11, 10); t1.clock finally becomes 12.

On this read (x++ in Thread 2), t1.clock remains 11, so a clock race on x is reported.
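The demonstration can be replayed with a few lines of code. This is a minimal sketch of the clock race check only, under two assumptions: accesses are reported through explicit calls (the talk's implementation uses breakpoints), and only the clocks of the two accessing threads are compared. The class `ClockRaceSketch` and its methods are hypothetical names, not the CRSampler API.

```python
class ClockRaceSketch:
    """Thread-local integer clocks plus a check against one sampled access."""

    def __init__(self, clocks):
        self.clock = dict(clocks)   # thread -> thread-local clock
        self.sampled = {}           # variable -> (thread, is_write, clock snapshot)
        self.races = []             # reported (variable, first thread, second thread)

    def on_sync(self, thread):
        # Every synchronization operation advances the thread's own clock.
        self.clock[thread] += 1

    def sample(self, thread, var, is_write):
        # Remember the sampled access and all clocks at that moment.
        self.sampled[var] = (thread, is_write, dict(self.clock))

    def access(self, thread, var, is_write):
        if var not in self.sampled:
            return
        first, first_write, snap = self.sampled[var]
        if thread == first or not (first_write or is_write):
            return
        # Clock race: at least one of the two threads' clocks is unchanged
        # between the sampled access and this access.
        if any(self.clock[t] == snap[t] for t in (first, thread)):
            self.races.append((var, first, thread))


# Replay of the demonstration: t1.clock starts at 10, t2.clock at 8.
demo = ClockRaceSketch({"t1": 10, "t2": 8})
demo.on_sync("t1")                       # Thread 1: acquire(l) -> t1.clock = 11
demo.on_sync("t2")                       # Thread 2: acquire(k) -> t2.clock = 9
demo.sample("t1", "x", is_write=True)    # Thread 1: x = 0; sample(x)
demo.on_sync("t2")                       # Thread 2: release(k) -> t2.clock = 10
demo.access("t2", "x", is_write=True)    # Thread 2: x++ while t1.clock is still 11
demo.on_sync("t1")                       # Thread 1: release(l) -> t1.clock = 12
print(demo.races)                        # [('x', 't1', 't2')]
```

No thread is delayed in this check; the decision only compares integers recorded at the sampled access with the current ones.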

SLIDE 21

Clock Race

  • Clock Race

– Race checking does not need to delay any thread.
– But: after e1 appears, how long should the check wait for the second access?

  • Given a short time, it is not enough to trap the second access.
  • Given a long time, all threads’ local clocks are changed.

One second, or …?

[Figure: timeline ("Time elapse") with an access by Thread 1 at time1 and an access by Thread 2 at time2; the thread-local clock of Thread 1 is not changed between time1 and time2.]

SLIDE 22

Setup

  • Implementation

– Jikes RVM
– Sampling: Java class load time
– Memory accesses → Linux kernel

  • Benchmarks

– DaCapo benchmark suite

[Figure: implementation architecture with the components JikesRVM (execution), user-site agent, kernel site (core of DC/CR), and CPU, split into user space and kernel space and connected by Netlink communication; edge labels: "Set breakpoints", "On firing"]

SLIDE 23

Setup

  • Comparisons

– Sampling rate: 0.1% to 1.0%
– Pacer (PLDI’10)
– DataCollider (OSDI’10): DC15, DC30 (15 ms, 30 ms)
– CRSampler: CR15, CR30 (15 ms, 30 ms)

  • ThinkPad Workstation

– i7-4710MQ CPU, four cores, 16 GB memory, 250 GB SSD

SLIDE 24

Experiments

  • Overall Results

– Effectiveness

  • CR: more data races at low sampling rates

Benchmark   Binary size (KB)   # of threads   # of sync.    Pacer*   DC15   DC30   CR15   CR30
avrora09    2,086              7              3,312,801     3        3      3      5      3
xalan06     1,027              9              35,859,489    5        5      5      87     81
xalan09     4,827              9              12,599,144    -        2      2      84     91
sunflow09   1,017              17             1,590         -        -      2      46     45
pmd09       2,996              9              20,550        4        2      2      110    121
eclipse06   41,822             16             51,131,093    19       2      6      58     63
Sum:                                                        31       14     20     390    404

– Overhead

[Figure: "Avg. Overhead of Pacer and CRSampler + Trendlines"; x-axis: sampling rate (0.1% to 1.0%); y-axis: average overhead (0% to 30%); series: Pacer, CR15, CR30, with a 5% reference mark. Trendlines, in legend order: y = 9.0749x + 0.1474 (R² = 0.9397); y = 2.6312x + 0.0252 (R² = 0.9624); y = 2.0859x + 0.0269 (R² = 0.9899)]

[Figure: "Avg. Overhead of Three + Trendlines"; x-axis: sampling rate (0.1% to 1.0%); y-axis: average overhead (0% to 200%); series: Pacer, DC15, DC30, CR15, CR30. Trendlines shown: y = 9.0749x + 0.1474 (R² = 0.9397); y = 92.529x + 0.0588 (R² = 0.9851); y = 173.46x + 0.0997 (R² = 0.9796)]

SLIDE 25

Experiments

  • Discussions

– DataCollider: overhead from its delays.

  • DC30 has almost twice the overhead of DC15.

– Pacer: basic overhead ~15%
– CRSampler: ~5% overhead at 1.0% sampling rate.

[Figures repeated from Slide 24: "Avg. Overhead of Pacer and CRSampler + Trendlines" and "Avg. Overhead of Three + Trendlines"]
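The three discussion points can be cross-checked against the trendline equations printed on the overhead charts. The sketch below rests on two assumptions that the extracted charts do not state explicitly: x is the sampling rate as a fraction (1.0% = 0.01), and the equations map to the tools in legend order (Pacer, CR15, CR30 on the first chart; Pacer, DC15, DC30 on the second). `TRENDLINES` and `overhead` are hypothetical names.

```python
# (slope, intercept) of each trendline y = slope * x + intercept.
# The tool assignment follows the assumed legend order of the charts.
TRENDLINES = {
    "Pacer": (9.0749, 0.1474),
    "CR15": (2.6312, 0.0252),
    "CR30": (2.0859, 0.0269),
    "DC15": (92.529, 0.0588),
    "DC30": (173.46, 0.0997),
}


def overhead(tool, sampling_rate):
    """Average overhead (as a fraction) predicted by the tool's trendline."""
    slope, intercept = TRENDLINES[tool]
    return slope * sampling_rate + intercept


for tool in TRENDLINES:
    # Predicted overhead at a 1.0% sampling rate, printed as a percentage.
    print(f"{tool}: {overhead(tool, 0.01):.1%}")
```

Under these assumptions the lines give roughly 5% for CR15 and CR30 at a 1.0% sampling rate, an intercept of about 15% for Pacer, and a DC30 to DC15 ratio of about 1.9.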

SLIDE 26


Thanks~