SCARPE: A Technique and Tool for Selective Capture and Replay of Program Executions (PowerPoint presentation)


slide-1
SLIDE 1

SCARPE: A Technique and Tool for Selective Capture and Replay of Program Executions

Shrinivas Joshi, Advanced Micro Devices (Shrinivas.Joshi@amd.com)

Alessandro Orso, Georgia Institute of Technology (orso@cc.gatech.edu)

This work was supported in part by NSF awards CCF-0541080 and CCR-0205422 to Georgia Tech.

slide-2
SLIDES 2-7

Collecting Field Data

(Diagram: developers work in house; the software runs in the field; field data flows from deployed instances back to the developers.)

Maintenance tasks enabled by field data:

  • Debugging
  • Regression testing
  • Impact analysis
  • Behavior classification
  • ...

slide-8
SLIDE 8
Presentation Outline

  • Motivation and Overview
  • Record & Replay Technique
  • Implementation and Evaluation
  • Conclusions and Future Work


slide-10
SLIDES 10-15

Record & Replay: Issues

(Diagram: the application interacting with users, a DB, and the network.)

  • Practicality
    • High volume of data
    • Ad-hoc mechanisms
    • Inefficiency in recording
  • Privacy
    • Sensitive information
  • Safety
    • Side effects

Our technique

  • Is specifically designed to be used on deployed software (but can also be used in-house)
  • Mitigates practicality, safety, and privacy issues through
    • novel technical solutions
    • careful engineering
slide-16
SLIDES 16-25

Overview of the Approach

(Diagram: during Record, the subsystem of interest runs inside the application; its inputs and outputs, i.e., its interactions with the rest of the application and the environment, are captured into an event log. During Replay, the replay scaffolding re-executes the subsystem of interest in isolation, driven by the event log.)
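The record/replay cycle sketched above can be illustrated in a few lines; a hypothetical sketch, where the class and method names are assumptions rather than SCARPE's actual API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the capture/replay idea: during record, values
// returned by the real environment are appended to an event log; during
// replay, the same call sites are answered from the log instead.
public class BoundaryLog {
    private final Deque<String> eventLog;
    private final boolean replaying;

    private BoundaryLog(Deque<String> log, boolean replaying) {
        this.eventLog = log;
        this.replaying = replaying;
    }

    public static BoundaryLog recording() {
        return new BoundaryLog(new ArrayDeque<>(), false);
    }

    public static BoundaryLog replaying(Deque<String> recordedLog) {
        return new BoundaryLog(recordedLog, true);
    }

    // Called wherever the subsystem of interest receives a value from outside.
    public String boundaryValue(String valueFromEnvironment) {
        if (replaying) {
            return eventLog.removeFirst();      // answer from the log
        }
        eventLog.addLast(valueFromEnvironment); // capture into the log
        return valueFromEnvironment;
    }

    public Deque<String> log() {
        return eventLog;
    }
}
```

During replay the environment is never consulted, which is what lets the scaffolding re-execute the subsystem in isolation.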

slide-26
SLIDES 26-34

Record: Recorded Events

Method calls

  • INCALL
  • INCALLRET
  • OUTCALL
  • OUTCALLRET

Example of an incoming call (from outside the subsystem of interest into it):

  ... x = getRatio(myTree) ...

The call is recorded as an INCALL; the return value 28.5 is recorded as an INCALLRET.

Example of an outgoing call (from inside the subsystem of interest to the environment):

  ... n = it.next() ...

The call it.next() is recorded as an OUTCALL; the returned <some object> is recorded as an OUTCALLRET.

INCALL / OUTCALL event:

  • Callee's type
  • Callee's object ID
  • Callee's signature
  • Parameter*
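One concrete way to represent such an event; a minimal sketch whose names and serialization are illustrative, not SCARPE's actual log format:

```java
import java.util.List;

// Illustrative representation of a recorded call event; the field set mirrors
// the slide (kind, callee type, callee object ID, signature, parameters).
public class CallEvent {
    public final String kind;         // INCALL, INCALLRET, OUTCALL, or OUTCALLRET
    public final String calleeType;   // callee's type
    public final long calleeObjectId; // callee's object ID
    public final String signature;    // callee's signature
    public final List<String> params; // parameters, as scalar values or object IDs

    public CallEvent(String kind, String calleeType, long calleeObjectId,
                     String signature, List<String> params) {
        this.kind = kind;
        this.calleeType = calleeType;
        this.calleeObjectId = calleeObjectId;
        this.signature = signature;
        this.params = params;
    }

    // One event per line is an assumed, deliberately simple serialization.
    public String toLogLine() {
        return kind + " " + calleeType + "#" + calleeObjectId
                + " " + signature + " " + params;
    }
}
```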
slide-35
SLIDES 35-37

Record: Recorded Events

Method calls

  • INCALL
  • INCALLRET
  • OUTCALL
  • OUTCALLRET

Field accesses

  • INWRITE
  • OUTWRITE
  • OUTREAD

Exceptions

  • EXCIN
  • EXCOUT
slide-38
SLIDES 38-44

Record: Capturing Partial Data

  • Recording complete data is impractical (huge time/space overhead in preliminary studies)

➡ Record only data that affect the computation

  • Scalar values
  • Object IDs and types

Example: for the call x = getRatio(hugeTree), the log records only the return value 28.5 and the reference <1110, some.package.HugeTree>, not the contents of the huge tree:

  double getRatio(HugeTree ht) {
    Iterator it = ht.iterator();
    while (it.hasNext()) {
      Node n = (Node) it.next();
      double res = n.val;
      if (res > 0) return res / norm;
    }
    return 0; // default when no positive value is found
  }
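The "record only what affects the computation" rule above can be mimicked with a small helper that logs scalars by value and any other object as an <id, type> pair; a sketch under assumed names (the starting ID merely echoes the slide's example):

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of partial-data capture: scalars crossing the subsystem boundary are
// recorded by value; arbitrary objects are recorded only as <id, type>.
public class PartialCapture {
    private final Map<Object, Integer> ids = new IdentityHashMap<>();
    private int nextId = 1110; // starting value chosen to echo the slide

    public String capture(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean
                || value instanceof Character || value instanceof String) {
            return String.valueOf(value); // scalar: the value itself matters
        }
        // Object: only its identity crosses the boundary, so an ID suffices.
        Integer id = ids.computeIfAbsent(value, v -> nextId++);
        return "<" + id + ", " + value.getClass().getName() + ">";
    }
}
```

The IdentityHashMap keeps IDs stable per object (reference equality), so repeated crossings of the same object log the same ID.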

slide-45
SLIDES 45-50

Possible Applications

(Diagram: subsystem of interest, with input, output, event log, and environment.)

  • 1. Debugging of field failures
    • For component failures [WODA 06]
    • For complete executions [ICSE 07]
  • 2. Unit test cases from user executions
  • 3. Post-mortem dynamic analysis

slide-51
SLIDES 51-55

Possible Applications

(Diagram: the event log captured for the unit of interest is packaged into a unit test case that replays the recorded execution.)

  • 1. Debugging of field failures
  • 2. Unit test cases from user executions
    • For safe component updates [PASTE 05]
    • For regression testing (in progress)
    • Can also be a system test!
  • 3. Post-mortem dynamic analysis
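A captured log could be turned into a unit test along these lines; a hypothetical sketch in which the replayed getRatio logic, the norm value, and all names are illustrative rather than generated SCARPE output:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical shape of a unit test generated from a captured execution:
// the environment's answers (the values it.next() returned during capture)
// come from the event log, and the recorded return value is the oracle.
public class GeneratedRatioTest {
    static final double NORM = 2.0; // assumed field value at capture time

    // Unit under test, re-executed against recorded environment answers.
    static double getRatio(Iterator<Double> recordedNext) {
        while (recordedNext.hasNext()) {
            double res = recordedNext.next();
            if (res > 0) {
                return res / NORM;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // OUTCALLRET values from the log; 28.5 is the INCALLRET oracle.
        double actual = getRatio(List.of(-3.0, 57.0).iterator());
        assert actual == 28.5 : "replay diverged from the captured execution";
        System.out.println("replay matched: " + actual);
    }
}
```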

slide-56
SLIDES 56-61

Possible Applications

(Diagram: the replay scaffolding re-runs an instrumented version of the subsystem of interest from the event log and collects analysis results.)

  • 1. Debugging of field failures
  • 2. Unit test cases from user executions
  • 3. Post-mortem dynamic analysis
    • For example: memory leak detection

slide-62
SLIDE 62
Presentation Outline

  • Motivation and Overview
  • Record & Replay Technique
  • Implementation and Evaluation
  • Conclusions and Future Work


slide-64
SLIDES 64-65

The Tool: SCARPE

Selective CApture and Replay of Program Executions

(Diagram, SCARPE toolset: users provide the program and an observed set of class names; the instrumentation module produces an instrumented program; at run time, the record module receives raw events from the JVM, turns them into execution events, and writes them to the event log. Replay is performed in a similar way.)
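The observed set in the diagram is, in essence, the list of class names that defines the capture boundary. A sketch of how an instrumenter might consult it; the class and method names here are assumptions for illustration:

```java
import java.util.Set;

// Sketch: the observed set defines the subsystem of interest; a call is worth
// recording exactly when it crosses the boundary that the set induces.
public class ObservedSet {
    private final Set<String> classNames;

    public ObservedSet(Set<String> classNames) {
        this.classNames = classNames;
    }

    public boolean isObserved(String className) {
        return classNames.contains(className);
    }

    // Boundary-crossing: exactly one endpoint is inside the observed set.
    public boolean crossesBoundary(String callerClass, String calleeClass) {
        return isObserved(callerClass) != isObserved(calleeClass);
    }
}
```

Calls with both endpoints inside (or both outside) the set need no events, which is what keeps selective capture cheap relative to full-execution recording.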

slide-66
SLIDE 66

Empirical Study

  • RQ1 (feasibility): Can SCARPE correctly record and replay different subsets of an application?
  • RQ2 (efficiency): Can SCARPE record executions without imposing too much overhead?
  • Subjects:

                # Classes   KLOC   # Test Cases
    NanoXML          19      3.5        216
    JABA            500       60        400

slide-67
SLIDES 67-69

RQ1 – Feasibility (NanoXML)

Experimental protocol

  • 1. For each class C in NanoXML
    • a. Specify C as the subsystem of interest
    • b. Run all test cases and record executions
  • 2. Replay all recorded executions (> 4,000)

Results

  • Record and replay successful for all classes and all test cases

slide-70
SLIDES 70-74

RQ2 – Efficiency (JABA)

Experimental protocol

  • 1. For each test case T in JABA's test suite
    • a. Run T
    • b. Measure time to run T
    • c. Identify nine classes covered by T
  • 2. For each class C and test case T considered
    • a. Specify C as the subsystem of interest
    • b. Run all test cases and record executions
    • c. Measure time to run T
  • 3. For each T, compare times to run T in (1) and (2)
slide-75
SLIDES 75-78

RQ2 – Efficiency (JABA)

Results

  • Space overhead limited:
    • 60 MB for largest log (~120M events)
    • ~50 KB for 1,000 events
    • (uncompressed, unoptimized)
  • Time overhead varies widely
    • Minimum: 3%
    • Average: 97%
    • Maximum: 877%
slide-79
SLIDES 79-81

RQ2 – Detailed Results

(Chart: percentage overhead, with min, max, and avg, for each of the 36 classes considered; y-axis up to 900%.)

  • Cost does not depend on event types
  • Overhead depends on #events/sec
  • For example:
    • Lowest overhead (3%): ~1K ev/sec
    • Highest overhead (877%): ~300K ev/sec

Further considerations

  • Overhead often between 30% and 100% (in the single digits in some cases)
  • May be acceptable for interactive apps
  • We are investigating optimizations (no problem for in-house use)

slide-82
SLIDE 82
Presentation Outline

  • Motivation and Overview
  • Record & Replay Technique
  • Implementation and Evaluation
  • Conclusions and Future Work


slide-84
SLIDE 84

Related Work

  • Techniques for deterministic debugging (e.g., DejaVu [Choi et al. 98])
  • Techniques for automated mock-object creation ([Saff and Ernst 04], [Elbaum et al. 06])
  • Techniques for complete replay ([Steven and Podgurski 00])

slide-85
SLIDES 85-96

Summary and Future Work

(Diagram: the application partitioned into subsystems (Subsystem 1, 2, 3, ...); each deployed instance records per-subsystem event logs (EL1, EL2, EL3, ...) while interacting with its environment.)

Design space:

  • 1 ≤ cardinality ≤ #classes
  • Same vs. different subsystems at different sites
  • Field vs. in-house
  • Always collect vs. anomaly-driven collection
  • Send back vs. replay locally

Future work:

  • 1. Further validation (especially w.r.t. performance)
  • 2. Improve performance (e.g., static/dynamic analysis for selection)
  • 3. Alternative approaches (binary level, JVM level)
  • 4. Investigate applications (we mentioned three, there are more)

slide-97
SLIDE 97

Thank you!

slide-98
SLIDE 98

Questions?