A Technique for Enabling and Supporting Debugging of Field Failures



slide-1
SLIDE 1

A Technique for Enabling and Supporting Debugging of Field Failures

James Clause and Alessandro Orso

Georgia Institute of Technology

This work was supported in part by NSF awards CCF-0541080 and CCR-0205422 to Georgia Tech.

slide-8
SLIDE 8

Field failures: Anomalous behavior (or crashes) of deployed software that occur on user machines

slide-11
SLIDE 11

  • Crash logs
  • User-provided information

slide-16
SLIDE 16

Our solution

  • Record
  • Replay
  • Minimize
  • Debug

slide-17
SLIDE 17

Usage Scenario

[Diagram: the steps Develop, Record (on line), Replay / Minimize (off line), and Replay / Debug, divided between "In house" and "In the field" and connected through an execution repository]

slide-18
SLIDE 18

Existing record / replay approaches

Regression testing
(e.g. Elbaum et al. 06, Orso et al. 06, Orso and Kennedy 05, Saff et al. 05, Mercury WinRunner)

  • Replay only a portion of an execution by recording events for specific subsystems

Deterministic debugging
(e.g. Chen et al. 01, King et al. 05, Narayanasamy et al. 05, Netzer and Weaver 94, Srinivasan et al. 04, VMWare)

  • Replay an entire execution by recording every component of an application

Neither type of technique is amenable to minimization, and both may cause unacceptable overhead.

slide-19
SLIDE 19

Outline

✓ Motivation & background

  • Our technique
      • record
      • replay
      • minimization
  • Empirical evaluation
  • Conclusion & future work

slide-21
SLIDE 21

Record & Replay

  • Goal: develop an approach that has low overhead and is amenable to minimization
  • Key insight: avoid focusing on low-level (internal) events
      • expensive (large number of events)
      • not amenable to minimization (high interdependence)

➡ Focus on high-level (external) interactions with the environment

      • efficient (fewer, more “expensive” interactions)
      • amenable to minimization (low interdependence)

slide-26
SLIDE 26

Environment interactions

  • Streams
  • Files

Interaction events:

  • FILE: interaction with a file
  • POLL: checks for availability of data on a stream
  • PULL: reads data from a stream
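
The three event kinds can be modelled as a small record type. The sketch below is only an illustration: the `Event` class and `parse_event` function are hypothetical names, and the textual layout (`FILE <name>`, `POLL <stream> OK|NOK`, `PULL <stream> <count>`) is inferred from the event logs shown on the later slides, not from ADDA's actual log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str                  # "FILE", "POLL", or "PULL"
    target: str                # file name or stream name
    arg: Optional[str] = None  # "OK"/"NOK" for POLL, a count for PULL


def parse_event(line: str) -> Event:
    """Parse one textual event such as 'POLL KEYBOARD NOK' (assumed layout)."""
    parts = line.split()
    if len(parts) == 2 and parts[0] == "FILE":
        return Event("FILE", parts[1])
    if len(parts) == 3 and parts[0] == "POLL" and parts[2] in ("OK", "NOK"):
        return Event("POLL", parts[1], parts[2])
    if len(parts) == 3 and parts[0] == "PULL" and parts[2].isdigit():
        return Event("PULL", parts[1], parts[2])
    raise ValueError(f"unrecognized event: {line!r}")
```

Keeping the events this coarse is what the deck argues for: there are few of them, and each one stands largely on its own.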

slide-38
SLIDE 38

Event log:

  • FILE foo.1
  • POLL KEYBOARD NOK
  • POLL KEYBOARD OK
  • PULL KEYBOARD 5
  • POLL NETWORK OK

Environment data (files):

  • foo.1

Environment data (streams):

  • KEYBOARD: {5680}hello
  • NETWORK: {3405} ❙
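
The recording step on this slide can be sketched as three callbacks that append to an event log and copy the data that was read. This is a minimal illustration under stated assumptions: `Recorder` and its method names are hypothetical, the number after `PULL` is taken to be the number of bytes read (consistent with `PULL KEYBOARD 5` and `hello`), and the `{...}` tags that appear in the stream data are not modelled because the slides do not say what they hold.

```python
from collections import defaultdict


class Recorder:
    """Hypothetical recorder for FILE / POLL / PULL interactions."""

    def __init__(self):
        self.event_log = []                # textual events, in order
        self.files = []                    # names of files the application touched
        self.streams = defaultdict(bytes)  # stream name -> data read so far

    def on_file(self, name):
        self.event_log.append(f"FILE {name}")
        if name not in self.files:
            self.files.append(name)

    def on_poll(self, stream, data_available):
        status = "OK" if data_available else "NOK"
        self.event_log.append(f"POLL {stream} {status}")

    def on_pull(self, stream, data):
        # Assumption: the logged number is the count of bytes read.
        self.event_log.append(f"PULL {stream} {len(data)}")
        self.streams[stream] += data


recorder = Recorder()
recorder.on_file("foo.1")
recorder.on_poll("KEYBOARD", False)
recorder.on_poll("KEYBOARD", True)
recorder.on_pull("KEYBOARD", b"hello")
recorder.on_poll("NETWORK", True)
```

After these calls `recorder.event_log` holds the same five events as the slide, in the same order.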

slide-45
SLIDE 45

Event log:

  • FILE foo.1 ✔
  • POLL KEYBOARD NOK ✔
  • POLL KEYBOARD OK ✔
  • PULL KEYBOARD 1 ✔
  • POLL NETWORK OK ✔
  • PULL NETWORK 1024 ✔
  • FILE bar.1 ✔
  • POLL NETWORK NOK ✔
  • POLL NETWORK OK ✔
  • FILE foo.2 ✔
  • ...
  • PULL NETWORK 1024 ✔
  • FILE foo.2 ✔
  • POLL KEYBOARD NOK ✔
  • ...

Environment data (files):

  • foo.1
  • foo.2
  • bar.1

Environment data (streams):

  • KEYBOARD: {5680}hello❙{4056}c❙{300}...
  • NETWORK: {3405}<html><body>...❙{202}...
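
The check marks suggest that replay walks the event log in order and ticks off each event as the application repeats the same interaction. A minimal sketch of that idea follows; `Replayer` and its methods are hypothetical, the stream data is simplified to raw bytes without the `{...}` tags, and raising an error on a mismatch is an assumption about how divergence could be handled, not something the slides state.

```python
class Replayer:
    """Hypothetical replayer that serves recorded interactions in log order."""

    def __init__(self, event_log, streams):
        self.events = [line.split() for line in event_log]
        self.streams = dict(streams)  # stream name -> recorded bytes
        self.offsets = {}             # stream name -> bytes already served
        self.position = 0             # index of the next event to replay

    def _consume(self, kind, target):
        if self.position >= len(self.events):
            raise RuntimeError("event log exhausted")
        event = self.events[self.position]
        if event[:2] != [kind, target]:
            raise RuntimeError(f"replay diverged at event {self.position}: {event}")
        self.position += 1
        return event

    def open_file(self, name):
        self._consume("FILE", name)
        return name

    def poll(self, stream):
        return self._consume("POLL", stream)[2] == "OK"

    def pull(self, stream):
        count = int(self._consume("PULL", stream)[2])
        start = self.offsets.get(stream, 0)
        self.offsets[stream] = start + count
        return self.streams.get(stream, b"")[start:start + count]


log = ["FILE foo.1", "POLL KEYBOARD NOK", "POLL KEYBOARD OK", "PULL KEYBOARD 1"]
replayer = Replayer(log, {"KEYBOARD": b"hello"})
replayer.open_file("foo.1")
first_poll = replayer.poll("KEYBOARD")   # logged as NOK
second_poll = replayer.poll("KEYBOARD")  # logged as OK
data = replayer.pull("KEYBOARD")         # one byte of recorded keyboard data
```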

slide-51
SLIDE 51

Minimize

Goal: focus debugging effort

[Diagram: Execution recording, Time minimization, Execution recording, Data minimization, Execution recording]

slide-53
SLIDE 53

Minimize: time

Event log:

  • FILE foo.1
  • POLL KEYBOARD NOK
  • POLL KEYBOARD OK
  • PULL KEYBOARD 1
  • POLL NETWORK OK
  • PULL NETWORK 1024
  • FILE bar.1
  • POLL NETWORK NOK
  • POLL NETWORK OK
  • FILE foo.2
  • PULL NETWORK 1024
  • FILE foo.2
  • POLL KEYBOARD NOK

Environment data (files):

Environment data (streams):

  • KEYBOARD: {5680}hello❙{4056}c❙{300}...
  • NETWORK: {3405}<html><body>...❙{202}...

slide-54
SLIDE 54

Minimize: time

Event log:

  • FILE foo.1
  • POLL KEYBOARD OK
  • PULL KEYBOARD 1
  • POLL NETWORK OK
  • PULL NETWORK 1024
  • FILE bar.1
  • POLL NETWORK OK
  • FILE foo.2
  • PULL NETWORK 1024
  • FILE foo.2

Environment data (files):

Environment data (streams):

  • KEYBOARD: {5680}hello❙{4056}c❙{300}...
  • NETWORK: {3405}<html><body>...❙{202}...
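
Comparing this log with the one on the previous slide, the only difference is that every `POLL ... NOK` event is gone. The sketch below assumes that dropping those events is all that time minimization does to the event log. The slides do not state the rule, and `minimize_time` is a hypothetical name.

```python
def minimize_time(event_log):
    """Drop polls that found no data; they only represent idle waiting."""
    return [
        event for event in event_log
        if not (event.startswith("POLL ") and event.endswith(" NOK"))
    ]


before = [
    "FILE foo.1", "POLL KEYBOARD NOK", "POLL KEYBOARD OK", "PULL KEYBOARD 1",
    "POLL NETWORK OK", "PULL NETWORK 1024", "FILE bar.1", "POLL NETWORK NOK",
    "POLL NETWORK OK", "FILE foo.2", "PULL NETWORK 1024", "FILE foo.2",
    "POLL KEYBOARD NOK",
]
after = minimize_time(before)
```

Applied to the 13 events of the unminimized log, this leaves the 10 events listed above.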

slide-56
SLIDE 56

Minimize: data

[Diagram: data minimization of the environment at three granularities: whole entities, chunks, and atoms]
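
One way to read the three granularities is as a coarse-to-fine reduction: first try dropping whole entities, then chunks inside the surviving entities, then individual atoms. The sketch below is a simple greedy reduction written for illustration, not the algorithm used by the authors. Treating an atom as a single byte and a chunk as a fixed-size run of bytes is an assumption, and `still_fails` is a hypothetical oracle that replays the execution on the reduced data and reports whether the failure still occurs.

```python
def reduce_items(items, still_fails):
    """Greedily drop items one at a time while the failure is preserved."""
    kept = list(items)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if still_fails(candidate):
            kept = candidate  # item i was not needed; retry the same index
        else:
            i += 1            # item i is needed; move on
    return kept


def minimize_data(entities, still_fails, chunk_size=4):
    """entities maps a file or stream name to its recorded bytes."""
    # Level 1: whole entities.
    names = reduce_items(
        sorted(entities),
        lambda ns: still_fails({n: entities[n] for n in ns}),
    )
    current = {n: entities[n] for n in names}
    # Level 2: chunks of chunk_size bytes. Level 3: atoms (single bytes).
    for size in (chunk_size, 1):
        for name in list(current):
            data = current[name]
            pieces = [data[i:i + size] for i in range(0, len(data), size)]

            def fails_with(ps, name=name):
                trial = dict(current)
                trial[name] = b"".join(ps)
                return still_fails(trial)

            current[name] = b"".join(reduce_items(pieces, fails_with))
    return current


# Toy oracle: the "failure" occurs whenever a.txt still contains b"BUG".
recorded = {"a.txt": b"xxxxBUGxxxx", "b.txt": b"irrelevant"}
minimized = minimize_data(recorded, lambda env: b"BUG" in env.get("a.txt", b""))
```

Working from coarse to fine keeps the number of replays down, since each replay of a candidate costs a full run of the application.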

slide-70
SLIDE 70

The tool: ADDA

Assisting the Debugging of Deployed Applications

  • Record and Replay:
      • Works on x86 (c-lib based) binaries
      • Based on dynamic instrumentation (Pin)
      • Maps c-library calls to interaction events
  • Minimization:
      • Set of extensible scripts

slide-71
SLIDE 71

Limitations

Two main limitations:

  • Technique: may not replay non-deterministic failures
  • Implementation: does not handle window system events (yet)

slide-73
SLIDE 73

Empirical evaluation

  • Research questions
      • Can ADDA produce minimized executions that can be used to debug the original failure?
      • How much overhead does ADDA impose?
  • Subject:
      • Pine: widely used email / news client
  • Data:
      • Two real field failures from Pine’s history
      • Set of 20 failing executions, 10 per failure

slide-76
SLIDE 76

Minimization results

[Chart: average value after minimization (0% to 100%) of # entities, streams size, and files size, for the header-color fault and the address book fault]

Moreover, these results are conservative: recorded executions only contain the minimal amount of data needed to perform an action.

Overhead

  • Offline: less than 75 minutes for minimization
  • Online: negligible overhead while recording

slide-77
SLIDE 77

Specific Example: Address Book Failure

  • Complete execution
      • 34 entities (files and streams)
      • ≈800 KB
  • Minimized execution
      • 5 partial entities (4 files, 1 stream)
      • ≈72 KB
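
To put these figures in proportion, the fraction that remains after minimization can be computed directly. The numbers are the approximate ones from the slide; `remaining_fraction` is just a helper name chosen for this sketch.

```python
def remaining_fraction(before, after):
    """Fraction of the original quantity left after minimization."""
    return after / before


entities_left = remaining_fraction(34, 5)  # entities: 34 before, 5 after
size_left = remaining_fraction(800, 72)    # approximate size: 800 before, 72 after

print(f"entities remaining: {entities_left:.1%}")  # 14.7%
print(f"size remaining: {size_left:.1%}")          # 9.0%
```

So for this failure roughly one entity in seven, and about 9% of the recorded data, is enough to reproduce it.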
slide-78
SLIDE 78

Future work

  • More studies: additional applications and real users
  • Extend technique / implementation
      • Support windowing system
      • Investigate ad-hoc minimization algorithms
      • Include non-deterministic events (if needed)

slide-79
SLIDE 79

Conclusions

  • Novel approach that supports debugging field failures
  • Prototype implementation for x86 binaries
  • Preliminary empirical evaluation: for the cases considered, our technique can
      1. minimize failing executions,
      2. preserve their failing behavior, and
      3. impose low overhead on users

slide-80
SLIDE 80

Questions?
