
SLIDE 1

Production-Run Software Failure Diagnosis via Hardware Performance Counters

Joy Arulraj, Po-Chun Chang, Guoliang Jin and Shan Lu

SLIDE 2

Motivation

  • Software inevitably fails on production machines
  • These failures are widespread and expensive
      • Internet Explorer zero-day bug [2013]
      • Toyota Prius software glitch [2010]

These failures need to be diagnosed before they can be fixed!

SLIDE 3

Production-run failure diagnosis

  • Diagnosing failures on client machines
      • Limited info from each client machine
      • One bug can affect many clients
  • Need to figure out the root cause & patch quickly

SLIDE 4

Executive Summary


Use existing hardware support to diagnose widespread production-run failures with low monitoring overhead

SLIDE 5

Diagnosing a real-world bug

  • Sequential bug in print_tokens


int is_token_end(char ch) {
    if (ch == '\n')
        return (TRUE);
    else if (ch == ' ')    /* Bug: should return FALSE */
        return (TRUE);
    else
        return (FALSE);
}

Input:           Abc Def
Expected output: {Abc}, {Def}
Actual output:   {Abc Def}

SLIDE 6

Diagnosing concurrency bugs

  • Concurrency bug in Apache server

THREAD 1                               THREAD 2

decrement_refcnt(...) {
    atomic_dec(&obj->refcnt);          /* 2 --> 1 */
                                       decrement_refcnt(...) {
                                           atomic_dec(&obj->refcnt);  /* 1 --> 0 */
                                           if (!obj->refcnt)
                                               cleanup(obj);
                                       }
    if (!obj->refcnt)                  /* reads 0 */
        cleanup(obj);
}

Because the decrement and the zero check are not one atomic operation, this interleaving lets both threads observe refcnt == 0 and call cleanup(obj) on the same object.

SLIDE 7

Requirements for failure diagnosis

  • Performance
      • Low runtime overhead for monitoring apps
      • Suitable for production-run deployment
  • Diagnostic capability
      • Ability to accurately explain failures
      • Diagnose a wide variety of bugs

SLIDE 8

Existing work

Approach       | Performance                                            | Diagnostic Capability
FAILURE REPLAY | High runtime overhead OR non-existent hardware support | Manually locate root cause
BUG DETECTION  |                                                        | Many false positives

SLIDE 9

Cooperative Bug Isolation

  • Cooperatively diagnose production-run failures
      • Targets widely deployed software
      • Each client machine sends back information
  • Uses sampling
      • Collects only a subset of information
      • Reduces monitoring overhead
      • Fits well with the cooperative debugging approach

SLIDE 10

Cooperative Bug Isolation

Approach  | Performance                        | Diagnostic Capability
CBI / CCI | >100% overhead for many apps (CCI) | Accurate & automatic

[Pipeline diagram: program source → compiler inserts the predicates and the sampling (code size increased >10X) → predicates & failure predictors → statistical debugging. A failure predictor is a predicate that is TRUE in most failure runs and FALSE in most success runs.]
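For intuition, here is a minimal sketch in C of the predicate scoring commonly used in this style of statistical debugging (following the cooperative bug isolation literature; an illustration of the idea, not necessarily the exact formula used in these systems):

/* Per-predicate counts, aggregated across client runs. */
typedef struct {
    unsigned f_true, s_true;  /* failing / successful runs where P was TRUE     */
    unsigned f_obs,  s_obs;   /* failing / successful runs where P was observed */
} pred_counts;

/* Failure(P): how likely a run is to fail given that P was true. */
static double failure_p(pred_counts c)
{
    return (double)c.f_true / (c.f_true + c.s_true);
}

/* Context(P): how likely a run is to fail given that P was merely reached. */
static double context_p(pred_counts c)
{
    return (double)c.f_obs / (c.f_obs + c.s_obs);
}

/* Increase(P): a good failure predictor makes failure much more likely
   when it is true than when it is only observed. */
static double increase_p(pred_counts c)
{
    return failure_p(c) - context_p(c);
}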

SLIDE 11

Performance-counter based Bug Isolation

  • Requires no hardware support beyond what already exists
  • Requires no software instrumentation

[Pipeline diagram: program binary (code size unchanged) → hardware performance counters provide the predicates and the sampling → predicates & failure predictors → statistical debugging.]

SLIDE 12

PBI Contributions

  • Suitable for production-run deployment
  • Can diagnose a wide variety of failures
  • Design addresses privacy concerns

Approach | Performance                            | Diagnostic Capability
PBI      | <2% overhead for most apps evaluated   | Accurate & automatic

SLIDE 13

Outline

  • Motivation
  • Overview
  • PBI
      • Hardware performance counters
      • Predicate design
      • Sampling design
  • Evaluation
  • Conclusion

SLIDE 14

Hardware Performance Counters

  • Registers that monitor hardware performance events
      • 1-8 registers per core
      • Each register can contain an event count
  • Large collection of hardware events
      • Instructions retired, L1 cache misses, etc.

SLIDE 15

Accessing performance counters

INTERRUPT-BASED vs. POLLING-BASED

[Diagram: in the interrupt-based scheme, the user configures the PMU through the kernel and receives an interrupt at the triggering instruction; in the polling-based scheme, the user configures the PMU directly and reads the count with a special instruction.]

How do we monitor which event occurs at which instruction using performance counters ?

SLIDE 16

Predicate evaluation schemes

INTERRUPT-BASED

    Kernel configures the counter; a counter overflow raises an interrupt.
    Interrupt at instruction C => event occurred at C.
    Natural fit for sampling, but imprecise due to out-of-order execution.

POLLING-BASED

    old = readCounter()
    <instruction C>
    new = readCounter()
    if (new > old) event occurred at C

    More precise, but requires instrumentation.
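As a concrete illustration of the polling scheme, here is a minimal sketch using the Linux perf_event_open interface (standard Linux API usage, not PBI's own code; the monitored region stands in for instruction C):

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

/* Thin wrapper: glibc provides no perf_event_open() symbol. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* instructions retired */
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = perf_event_open(&attr, 0, -1, -1, 0); /* this thread, any CPU */
    if (fd < 0) { perror("perf_event_open"); return 1; }

    uint64_t old_count, new_count;
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    read(fd, &old_count, sizeof(old_count));   /* old = readCounter() */

    /* ... instruction/region C being monitored ... */

    read(fd, &new_count, sizeof(new_count));   /* new = readCounter() */
    if (new_count > old_count)
        printf("event occurred in region C (%llu events)\n",
               (unsigned long long)(new_count - old_count));

    /* Setting attr.sample_period instead would switch to the
       interrupt-based scheme: the PMU interrupts every N events. */
    close(fd);
    return 0;
}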

SLIDE 17

Concurrency bug failures

How do we use performance counters to diagnose concurrency bug failures?

  • L1 data cache cache-coherence events

[MESI state diagram: each L1 data cache line is Modified, Exclusive, Shared, or Invalid; local reads, local writes, remote reads, and remote writes drive the transitions.]
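To make the transitions concrete, here is a compact sketch of the textbook MESI transitions in C (a simplification for illustration: on a read miss, the line becomes Exclusive or Shared depending on whether another core caches it):

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

/* Next state of this core's cache line for each access type.
 * other_copies: whether another core also caches the line. */
mesi_state on_local_read(mesi_state s, int other_copies)
{
    if (s == INVALID)                     /* miss: fetch the line */
        return other_copies ? SHARED : EXCLUSIVE;
    return s;                             /* hit: state unchanged */
}

mesi_state on_local_write(mesi_state s)
{
    (void)s;
    return MODIFIED;                      /* the writer owns the line */
}

mesi_state on_remote_read(mesi_state s)
{
    if (s == MODIFIED || s == EXCLUSIVE)  /* another core reads: share it */
        return SHARED;
    return s;
}

mesi_state on_remote_write(mesi_state s)
{
    (void)s;
    return INVALID;                       /* another core's write invalidates */
}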

SLIDE 18

Atomicity Violation Example

CORE 1 - THREAD 1:

decrement_refcnt(...) {
    apr_atomic_dec(&obj->refcnt);   /* local write: cache line -> Modified */
C:  if (!obj->refcnt)
        cleanup_cache(obj);
}

SLIDE 19

Atomicity Violation Example

CORE 1 - THREAD 1                      CORE 2 - THREAD 2

decrement_refcnt(...) {
    apr_atomic_dec(&obj->refcnt);
                                       decrement_refcnt(...) {
                                           apr_atomic_dec(&obj->refcnt);  /* remote write */
                                           if (!obj->refcnt)
                                               cleanup_cache(obj);
                                       }
C:  if (!obj->refcnt)                  /* thread 2's remote write invalidated */
        cleanup_cache(obj);            /* the line: at C its state is Invalid */
}
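For contrast, here is a minimal race-free sketch using C11 atomics (an illustrative fix, not the actual Apache patch): the decrement and the zero test become a single atomic operation.

#include <stdatomic.h>
#include <stdlib.h>

typedef struct obj {
    atomic_int refcnt;
    /* ... payload ... */
} obj_t;

void decrement_refcnt(obj_t *obj)
{
    /* atomic_fetch_sub returns the PREVIOUS value, so exactly one
       thread observes 1 and performs the cleanup. */
    if (atomic_fetch_sub(&obj->refcnt, 1) == 1)
        free(obj);
}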

SLIDE 20

Atomicity Violation Bugs

Thread Interleaving | Failure Predictor
WWR interleaving    | INVALID
RWR interleaving    | INVALID
RWW interleaving    | INVALID
WRW interleaving    | SHARED

SLIDE 21

Order violation

CORE 1 - MASTER THD                  CORE 2 - SLAVE THD

print("End", Gend)
                                     Gend = time()        /* remote write */
C: print("Run", Gend - init)         /* local read: cache line is Shared */

In the correct order, the slave's remote write to Gend arrives before the master's read at C, so at C the master's cache line is in the Shared state.

SLIDE 22

Order violation

CORE 1 - MASTER THD                  CORE 2 - SLAVE THD

print("End", Gend)
C: print("Run", Gend - init)         /* local read: line is still Exclusive */
                                     Gend = time()        /* arrives too late */

In the failure order, C executes before the slave's write: the master has seen only its own accesses, so the line is still Exclusive. Exclusive at C thus predicts failure; Shared predicts success.
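A minimal sketch of this race and one conventional fix (assumptions: simplified types, and a pthread barrier standing in for whatever synchronization the real application uses):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

double Gend, init_time;
pthread_barrier_t slave_done;

void *slave(void *arg)
{
    (void)arg;
    /* ... slave's work ... */
    Gend = (double)time(NULL);          /* the write the master depends on */
    pthread_barrier_wait(&slave_done);  /* fix: order the write before C */
    return NULL;
}

int main(void)
{
    pthread_barrier_init(&slave_done, NULL, 2);
    pthread_t t;
    pthread_create(&t, NULL, slave, NULL);

    pthread_barrier_wait(&slave_done);     /* without this wait, the read races */
    printf("Run %f\n", Gend - init_time);  /* instruction C from the slide */

    pthread_join(t, NULL);
    return 0;
}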

SLIDE 23

PBI Predicate Sampling

  • We use perf, provided by the Linux kernel since 2.6.31

perf record --event=<code> -c <sampling_rate> <program monitored>

Log Id | App    | Core | Performance Event | Instruction | Function
1      | Apache | 2    | 0x140 (Invalid)   | 401c3b      | decrement_refcnt
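For example, a hypothetical invocation for the Apache entry above, assuming the 0x140 code is passed in perf's raw-event syntax and ./httpd is the monitored binary (note that the value given with -c is the sampling period, i.e., one sample every N events):

perf record --event=r140 -c 10000 ./httpd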

SLIDE 24

PBI vs. CBI/CCI (Qualitative)

  • Performance
  • Diagnostic capability
      • Discontinuous monitoring (CBI/CCI)
      • Continuous monitoring (PBI)

[Timeline diagram: CBI instrumentation repeatedly asks "Sample in this region?", and CCI additionally asks "Are other threads sampling?", so their monitoring is discontinuous; PBI's hardware counters monitor continuously.]

SLIDE 25

Outline

  • Motivation
  • Overview
  • PBI
      • Hardware performance counters
      • Predicate design
      • Sampling design
  • Evaluation
  • Conclusion

SLIDE 26

Methodology

  • 23 real-world failures
      • In open-source server, client, and utility programs
      • All CCI benchmarks evaluated for comparison
  • Each app executed for 1000 runs (400-600 failure runs)
      • Success inputs from standard test suites
      • Failure inputs from bug reports
  • Emulate production-run scenarios
      • Same sampling settings for all apps

SLIDE 27

Evaluation

Program     | PBI | CCI-P | CCI-H
Apache1     |  ✓  |   ✓   |   ✓
Apache2     |  ✓  |   ✓   |   ✓
Cherokee    |  ✓  |   X   |   ✓
FFT         |  ✓  |   ✓   |   X
LU          |  ✓  |   ✓   |   X
Mozilla-JS1 |  ✓  |   X   |   ✓
Mozilla-JS2 |  ✓  |   ✓   |   ✓
Mozilla-JS3 |  ✓  |   ✓   |   ✓
MySQL1      |  ✓  |       |
MySQL2      |  ✓  |       |
PBZIP2      |  ✓  |   ✓   |   ✓

(✓ = failure diagnosed; X = not diagnosed; blank = no CCI result reported.)

SLIDE 28

Diagnostic Capability

Program     | PBI (predicate) | CCI-P | CCI-H
Apache1     | ✓ (Invalid)     |   ✓   |   ✓
Apache2     | ✓ (Invalid)     |   ✓   |   ✓
Cherokee    | ✓ (Invalid)     |   X   |   ✓
FFT         | ✓ (Exclusive)   |   ✓   |   X
LU          | ✓ (Exclusive)   |   ✓   |   X
Mozilla-JS1 | ✓ (Invalid)     |   X   |   ✓
Mozilla-JS2 | ✓ (Invalid)     |   ✓   |   ✓
Mozilla-JS3 | ✓ (Invalid)     |   ✓   |   ✓
MySQL1      | ✓ (Invalid)     |       |
MySQL2      | ✓ (Shared)      |       |
PBZIP2      | ✓ (Invalid)     |   ✓   |   ✓


SLIDE 31

Diagnostic Overhead

Program     | PBI   | CCI-P | CCI-H
Apache1     | 0.40% | 1.90% | 1.20%
Apache2     | 0.40% | 0.40% | 0.10%
Cherokee    | 0.50% | 0.00% | 0.00%
FFT         | 1.00% | 121%  | 118%
LU          | 0.80% | 285%  | 119%
Mozilla-JS1 | 1.50% | 800%  | 418%
Mozilla-JS2 | 1.20% | 432%  | 229%
Mozilla-JS3 | 0.60% | 969%  | 837%
MySQL1      | 3.80% |       |
MySQL2      | 1.20% |       |
PBZIP2      | 8.40% | 1.40% | 3.00%


SLIDE 33

Conclusion

  • Low monitoring overhead
  • Good diagnostic capability
  • No changes to applications
  • Novel use of performance counters

PBI will help developers diagnose production-run software failures with low overhead.

Thanks!