
SLIDE 1

Software Debugging:  Past, Present, and Future

Alessandro (Alex) Orso

School of Computer Science – College of Computing
Georgia Institute of Technology
http://www.cc.gatech.edu/~orso/

Partially supported by: NSF, Google, IBM, and MSR

SLIDE 2

Automated Debugging

SLIDE 3

SLIDE 4

Past Present Future

SLIDE 5

Past

A Short History of Debugging

SLIDE 6

The Birth of Debugging

First reference to software errors Your guess?

??? 2017

SLIDE 7

The Birth of Debugging

Software errors mentioned in Ada Byron's notes on Charles Babbage's analytical engine

Several uses of the term bug to indicate defects in computers and software

First actual bug and actual debugging (Admiral Grace Hopper’s associates working on the Mark II Computer at Harvard University)

1840 2017 1843

SLIDE 8

The Birth of Debugging

Software errors mentioned in Ada Byron's notes on Charles Babbage's analytical engine

Several uses of the term bug to indicate defects in computers and software

First actual bug and actual debugging (Admiral Grace Hopper’s associates working on the Mark II Computer at Harvard University)

1840 2017 ... 1940

SLIDE 9

The Birth of Debugging

Software errors mentioned in Ada Byron's notes on Charles Babbage's analytical engine

Several uses of the term bug to indicate defects in computers and software

First actual bug and actual debugging (Admiral Grace Hopper’s associates working on the Mark II Computer at Harvard University)

1840 2017 1947

SLIDE 11

Symbolic Debugging

UNIVAC 1100’s FLIT

(Fault Location by Interpretive Testing)

Richard Stallman’s GDB
DDD
...

1840 2017 1962

SLIDE 12

Symbolic Debugging

UNIVAC 1100’s FLIT

(Fault Location by Interpretive Testing)

GDB
DDD
...

1840 2017 1986

SLIDE 13

Symbolic Debugging

UNIVAC 1100’s FLIT

(Fault Location by Interpretive Testing)

GDB
DDD
...

1840 2017 1996

SLIDE 15

Program Slicing

Intuition: developers “slice” backwards when debugging

Weiser’s breakthrough paper
Korel and Laski’s dynamic slicing
Agrawal
Ko’s Whyline

1960 2017 1981

SLIDE 17

Static Slicing Example

mid() {
  int x,y,z,m;
  1:  read("Enter 3 numbers:",x,y,z);
  2:  m = z;
  3:  if (y<z)
  4:    if (x<y)
  5:      m = y;
  6:    else if (x<z)
  7:      m = y; // bug
  8:  else
  9:    if (x>y)
  10:     m = y;
  11:   else if (x>z)
  12:     m = x;
  13: print("Middle number is:", m);
}
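A static backward slice can be read as reachability over the program's dependences. The sketch below is a minimal illustration in Python, assuming a hand-written table of data and control dependences for mid(). The names `DEPS` and `backward_slice` are made up for this example, and statement 8 (the bare else) is left out because it computes nothing. It is not Weiser's original formulation, which is based on data-flow equations.

```python
# Hand-encoded dependences for mid() (an assumption of this sketch):
# statement -> statements it depends on, via data dependences on variable
# definitions and control dependences on enclosing predicates.
DEPS = {
    1: set(),
    2: {1},
    3: {1},
    4: {1, 3},
    5: {1, 4},
    6: {1, 4},
    7: {1, 6},
    9: {1, 3},
    10: {1, 9},
    11: {1, 9},
    12: {1, 11},
    13: {2, 5, 7, 10, 12},  # every definition of m that may reach the print
}

def backward_slice(deps, criterion):
    """Return all statements the criterion statement transitively depends on."""
    seen, work = set(), [criterion]
    while work:
        stmt = work.pop()
        if stmt not in seen:
            seen.add(stmt)
            work.extend(deps[stmt])
    return sorted(seen)

static_slice = backward_slice(DEPS, 13)
print(static_slice)
```

For the value of m printed at statement 13, the slice contains every statement in the table, which illustrates why a static slice alone may not narrow the search much.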

SLIDE 19

Program Slicing

Intuition: developers “slice” backwards when debugging

Weiser’s breakthrough paper
Korel and Laski’s dynamic slicing
Agrawal
Ko’s Whyline

1960 2017 1988  1993

SLIDE 20

Dynamic Slicing Example

mid() {
  int x,y,z,m;
  1:  read("Enter 3 numbers:",x,y,z);
  2:  m = z;
  3:  if (y<z)
  4:    if (x<y)
  5:      m = y;
  6:    else if (x<z)
  7:      m = y; // bug
  8:  else
  9:    if (x>y)
  10:     m = y;
  11:   else if (x>z)
  12:     m = x;
  13: print("Middle number is:", m);
}

SLIDE 21

Dynamic Slicing Example

Test Cases: 3,3,5  1,2,3  3,2,1  5,5,5  5,3,4  2,1,3
Pass/Fail:  P      P      P      P      P      F

mid() {
  int x,y,z,m;
  1:  read("Enter 3 numbers:",x,y,z);
  2:  m = z;
  3:  if (y<z)
  4:    if (x<y)
  5:      m = y;
  6:    else if (x<z)
  7:      m = y; // bug
  8:  else
  9:    if (x>y)
  10:     m = y;
  11:   else if (x>z)
  12:     m = x;
  13: print("Middle number is:", m);
}
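A dynamic slice only follows the dependences that actually occurred in one execution. The following Python sketch is a hand-instrumented version of mid(), assuming the failing test 2,1,3 as the execution of interest. The helper names `mid_traced`, `record`, and `dynamic_slice` are hypothetical, and the bare else at statement 8 is not recorded.

```python
def mid_traced(x, y, z):
    """Run the buggy mid() and record, per executed statement, what it depended on."""
    deps = {}       # executed statement -> statements it dynamically depends on
    last_def = {}   # variable name -> statement that most recently assigned it

    def record(stmt, uses="", defines="", ctrl=None):
        d = {last_def[v] for v in uses}   # dynamic data dependences
        if ctrl is not None:
            d.add(ctrl)                   # dynamic control dependence
        deps[stmt] = d
        for v in defines:
            last_def[v] = stmt

    record(1, defines="xyz")
    m = z
    record(2, uses="z", defines="m")
    record(3, uses="yz")
    if y < z:
        record(4, uses="xy", ctrl=3)
        if x < y:
            m = y
            record(5, uses="y", defines="m", ctrl=4)
        else:
            record(6, uses="xz", ctrl=4)
            if x < z:
                m = y  # bug, kept from the slide
                record(7, uses="y", defines="m", ctrl=6)
    else:
        record(9, uses="xy", ctrl=3)
        if x > y:
            m = y
            record(10, uses="y", defines="m", ctrl=9)
        else:
            record(11, uses="xz", ctrl=9)
            if x > z:
                m = x
                record(12, uses="x", defines="m", ctrl=11)
    record(13, uses="m")
    return m, deps

def dynamic_slice(deps, criterion):
    """Transitive closure over the dependences recorded for this one run."""
    seen, work = set(), [criterion]
    while work:
        stmt = work.pop()
        if stmt not in seen:
            seen.add(stmt)
            work.extend(deps[stmt])
    return sorted(seen)

result, run_deps = mid_traced(2, 1, 3)
failing_slice = dynamic_slice(run_deps, 13)
print(result, failing_slice)
```

For this run the slice on the printed value of m leaves out statement 2 and the whole else branch, so it is noticeably smaller than a static slice on the same criterion while still containing the faulty statement 7.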

SLIDE 24

Program Slicing

Intuition: developers “slice” backwards when debugging

Weiser’s breakthrough paper
Korel and Laski’s dynamic slicing
Agrawal
Ko’s Whyline

1960 2017 2008

“Why didn’t this color panel change?” “Why is this stroke black?”

SLIDE 27

Delta Debugging

Intuition: it’s all about differences!
Isolates failure causes automatically
Zeller’s “Yesterday, My Program Worked. Today, It Does Not. Why?”

1960 2017 1999

SLIDE 28

Delta Debugging

Intuition: it’s all about differences!
Isolates failure causes automatically
Zeller’s “Yesterday, My Program Worked. Today, It Does Not. Why?”

Applied in several contexts

1960 2017 1999

SLIDE 29

[Figure, built up over several slides: Yesterday ✔, Today ✘; intermediate configurations between the two are tested (✘ ✘ ✔ ✘ …) until the failure cause is isolated]

Applied to programs, inputs, states, ...
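The narrowing process can be sketched in a few lines. The Python below is a minimal version of the ddmin minimization variant of delta debugging, assuming the differences between yesterday and today are given as a list and that a caller-supplied `fails` predicate reports whether a subset of them still triggers the failure. The names `split`, `ddmin`, `changes`, and `fails` are illustrative, not taken from the slides.

```python
def split(items, n):
    """Partition items into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def ddmin(items, fails):
    """Shrink a failing list until removing any single element makes it pass."""
    assert fails(items), "ddmin needs a failing starting point"
    current, n = list(items), 2
    while len(current) >= 2:
        chunks = split(current, n)
        reduced = False
        for chunk in chunks:                 # try each subset on its own
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        if not reduced:
            for i in range(n):               # try each complement
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(current):            # finest granularity reached
                break
            n = min(len(current), n * 2)     # otherwise refine the partition
    return current

# Hypothetical scenario: eight changes, the failure needs changes 3 and 7 together.
changes = list(range(1, 9))
fails = lambda subset: 3 in subset and 7 in subset
minimal = ddmin(changes, fails)
print(minimal)
```

The same loop applies unchanged when the list holds input characters or state differences instead of code changes; only the `fails` predicate differs.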

SLIDE 35

Statistical Debugging

Intuition: debugging techniques can leverage multiple executions

Tarantula
Liblit’s CBI
Many others!

1960 2017 2001

SLIDE 37

Tarantula

Test Cases: 3,3,5  1,2,3  3,2,1  5,5,5  5,3,4  2,1,3
Pass/Fail:  P      P      P      P      P      F

mid() {
  int x,y,z,m;
  1:  read("Enter 3 numbers:",x,y,z);
  2:  m = z;
  3:  if (y<z)
  4:    if (x<y)
  5:      m = y;
  6:    else if (x<z)
  7:      m = y; // bug
  8:  else
  9:    if (x>y)
  10:     m = y;
  11:   else if (x>z)
  12:     m = x;
  13: print("Middle number is:", m);
}

Suspiciousness: 0.5

SLIDE 38

Tarantula

Test Cases: 3,3,5  1,2,3  3,2,1  5,5,5  5,3,4  2,1,3
Pass/Fail:  P      P      P      P      P      F

mid() {                                    suspiciousness
  int x,y,z,m;
  1:  read("Enter 3 numbers:",x,y,z);      0.5
  2:  m = z;                               0.5
  3:  if (y<z)                             0.5
  4:    if (x<y)                           0.6
  5:      m = y;                           0.0
  6:    else if (x<z)                      0.7
  7:      m = y; // bug                    0.8
  8:  else                                 0.0
  9:    if (x>y)                           0.0
  10:     m = y;                           0.0
  11:   else if (x>z)                      0.0
  12:     m = x;                           0.0
  13: print("Middle number is:", m);       0.5
}
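The suspiciousness values can be recomputed from coverage. The sketch below is a small Python illustration, assuming the usual Tarantula ratio, failed(s) / (failed(s) + passed(s)), where failed(s) and passed(s) are the fractions of failing and passing tests that execute statement s. `run_mid` is a hand-instrumented copy of the slide's mid(), and all names are chosen for this example.

```python
def run_mid(x, y, z):
    """Buggy mid() from the slide; returns (result, executed statement numbers)."""
    cov = {1, 2, 3}
    m = z
    if y < z:
        cov.add(4)
        if x < y:
            cov.add(5)
            m = y
        else:
            cov.add(6)
            if x < z:
                cov.add(7)
                m = y  # bug, kept from the slide
    else:
        cov.update({8, 9})
        if x > y:
            cov.add(10)
            m = y
        else:
            cov.add(11)
            if x > z:
                cov.add(12)
                m = x
    cov.add(13)
    return m, cov

TESTS = [(3, 3, 5), (1, 2, 3), (3, 2, 1), (5, 5, 5), (5, 3, 4), (2, 1, 3)]

def tarantula(tests):
    """Map each statement 1..13 to its Tarantula suspiciousness."""
    passed, failed = [], []  # coverage sets of passing and failing runs
    for t in tests:
        result, cov = run_mid(*t)
        expected = sorted(t)[1]  # the true middle value acts as the oracle
        (passed if result == expected else failed).append(cov)
    scores = {}
    for s in range(1, 14):
        p = sum(s in c for c in passed) / len(passed) if passed else 0.0
        f = sum(s in c for c in failed) / len(failed) if failed else 0.0
        scores[s] = f / (p + f) if p + f else 0.0
    return scores

scores = tarantula(TESTS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:3])
```

With this test suite the faulty statement 7 gets the highest score, followed by statements 6 and 4, which matches the rounded values 0.8, 0.7, and 0.6 on the slide.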

SLIDE 41

Statistical Debugging

Intuition: debugging techniques can leverage multiple executions

Tarantula
CBI
Many others!

1960 2017 2003

SLIDE 42

Statistical Debugging

Intuition: debugging techniques can leverage multiple executions

Tarantula
CBI
Ochiai
Many others!

1960 2017 2006
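For comparison with Tarantula, the Ochiai coefficient uses the same coverage counts with a different formula: the number of failing runs that cover a statement, divided by the square root of (total failing runs × all runs that cover the statement). The Python sketch below assumes the counts are already available; the function name and the sample counts (taken from the mid() test suite, with one failing and five passing tests) are illustrative.

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai score from three counts: failing runs covering the statement,
    passing runs covering it, and the total number of failing runs."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

stmt7 = ochiai(failed_cov=1, passed_cov=1, total_failed=1)  # faulty statement
stmt1 = ochiai(failed_cov=1, passed_cov=5, total_failed=1)  # executed by every test
stmt5 = ochiai(failed_cov=0, passed_cov=1, total_failed=1)  # only a passing test
print(round(stmt7, 2), round(stmt1, 2), round(stmt5, 2))
```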

SLIDE 43

Statistical Debugging

Intuition: debugging techniques can leverage multiple executions

Tarantula
CBI
Ochiai
Causal inference based
Many others!

1960 2017 2010

SLIDE 44

Statistical Debugging

Intuition: debugging techniques can leverage multiple executions

Tarantula
CBI
Ochiai
Causal inference based
IR-based techniques

1960 2017 2008

SLIDE 45

IR-Based Techniques

Bug ID: 90018
Summary: Native tooltips left around on CTabFolder.
Description: Hover over the PartStack CTabFolder inside eclipse until some native tooltip is displayed. For example, the maximize button. When the tooltip appears, change perspectives using the keybinding. the CTabFolder gets hidden, but its tooltip is permanently displayed and never goes away. Even if that CTabFolder is disposed (I'm assuming) when the perspective is closed.

SLIDE 46

IR-Based Techniques

Source code file: CTabFolder.java

public class CTabFolder extends Composite {
    // tooltip
    int [] toolTipEvents = new int[] {SWT.MouseExit, SWT.MouseHover, SWT.MouseMove, SWT.MouseDown, SWT.DragDetect};
    Listener toolTipListener;
    …
    /*
     * Returns <code>true</code> if the CTabFolder only displays the selected tab
     * and <code>false</code> if the CTabFolder displays multiple tabs.
     */
    …
    void onMouseHover(Event event) {
        showToolTip(event.x, event.y);
    }
    void onDispose() {
        inDispose = true;
        hideToolTip();
        …
    }
}

Bug ID: 90018
Summary: Native tooltips left around on CTabFolder.
Description: Hover over the PartStack CTabFolder inside eclipse until some native tooltip is displayed. For example, the maximize button. When the tooltip appears, change perspectives using the keybinding. the CTabFolder gets hidden, but its tooltip is permanently displayed and never goes away. Even if that CTabFolder is disposed (I'm assuming) when the perspective is closed.
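IR-based localization treats the bug report as a query and the source files as documents. The Python sketch below shows one simple way to do this, assuming TF-IDF weighting and cosine similarity; actual IR-based techniques differ in their models and preprocessing. The second file, `Button.java`, and all function names are hypothetical and exist only to give the ranking something to compare against.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercased words, with camelCase identifiers split into their parts."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word))
    return [p.lower() for p in parts]

def rank_files(bug_report, files):
    """Return (similarity, file name) pairs, most similar file first."""
    docs = {name: Counter(tokens(src)) for name, src in files.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1 for t in df}  # smoothed IDF

    def vec(tf):
        return {t: c * idf[t] for t, c in tf.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = vec(Counter(tokens(bug_report)))
    scored = [(cosine(query, vec(tf)), name) for name, tf in docs.items()]
    return sorted(scored, reverse=True)

FILES = {
    # Abridged from the slide.
    "CTabFolder.java": "public class CTabFolder extends Composite { "
                       "Listener toolTipListener; "
                       "void onMouseHover(Event event) { showToolTip(event.x, event.y); } "
                       "void onDispose() { inDispose = true; hideToolTip(); } }",
    # Hypothetical unrelated file, not from the slides.
    "Button.java": "public class Button extends Control { String text; "
                   "void setText(String text) { this.text = text; } }",
}
BUG_REPORT = ("Native tooltips left around on CTabFolder. Hover over the PartStack "
              "CTabFolder inside eclipse until some native tooltip is displayed.")

ranking = rank_files(BUG_REPORT, FILES)
print(ranking[0][1])
```

The match here comes mostly from identifier fragments such as "tab", "folder", and "hover". Note that the report's "tooltip" does not match the code's "toolTip" after camelCase splitting, which hints at why preprocessing choices matter for these techniques.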

SLIDE 47

Statistical Debugging

Intuition: debugging techniques can leverage multiple executions

Tarantula
CBI
Ochiai
Causal inference based
IR-based techniques
Many others!

1960 2017 ...

SLIDE 48

Additional Techniques

Contracts (e.g., Meyer et al.)
Counterexample-based (e.g., Groce et al., Ball et al.)
Tainting-based (e.g., Leek et al.)
Debugging of field failures (e.g., Jin et al.)
Predicate switching (e.g., Zhang et al.)
Fault localization for multiple faults (e.g., Steimann et al.)
Debugging of concurrency failures (e.g., Park et al.)
Automated data structure repair (e.g., Rinard et al.)
Finding patches with genetic programming
Domain specific fixes (tests, web pages, comments, concurrency)

Identifying workarounds/recovery strategies (e.g., Gorla et al.)
Formula based debugging (e.g., Jose et al., Ermis et al.)
...

1960 2017

Not meant to be comprehensive!

SLIDE 50

Present

Can We Debug at the Push of a Button?

SLIDE 51

Automated Debugging

(rank based)

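[Figure: ranked list of locations: 1), 2), 3), 4), …]

SLIDE 52

Automated Debugging

(rank based)

[Figure: ranked list of locations: 1), 2), 3), 4), …]

“Here is a list of places to check out”

SLIDE 53

Automated Debugging

Conceptual Model

[Figure: ranked list of locations: 1), 2), 3), 4), …]

“Ok, I will check out your suggestions one by one.”

SLIDE 54

Automated Debugging

Conceptual Model

[Figure: ranked list of locations: 1), 2), 3), 4), …, with entries checked off: ✔ ✔ ✔]

“Found the bug!”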

SLIDE 55

Performance of Automated Debugging Techniques

Spectra-Based Techniques

[Plot: axes “% of faulty versions” and “% of program to be examined to find fault”, both 0 to 100; series: Siemens, Space]

SLIDE 56

Performance of Automated Debugging Techniques

Spectra-Based Techniques

[Plot: axes “% of faulty versions” and “% of program to be examined to find fault”, both 0 to 100; series: Siemens, Space]

Mission Accomplished?

SLIDE 57

Assumption #1: Locating a bug in 10% of the code is a great result

100 LOC ➡ 10 LOC
10,000 LOC ➡ 1,000 LOC
100,000 LOC ➡ 10,000 LOC

SLIDE 58

Assumption #2: Programmers exhibit perfect bug understanding

Do you see a bug?

SLIDE 59

Assumption #3: Programmers inspect a list linearly and exhaustively

Good for comparison, but is it realistic?

SLIDE 60

Assumption #3: Programmers inspect a list linearly and exhaustively

Good for comparison, but is it realistic?

Does the conceptual model make sense? Have we really evaluated it?

SLIDE 61

Assumption #3: Programmers inspect a list linearly and exhaustively

Good for comparison, but is it realistic?

Does the conceptual model make sense? Have we really evaluated it? Are we headed in the right direction?

SLIDE 62

Are we headed in the right direction?

“Are Automated Debugging Techniques Actually Helping Programmers?”
C. Parnin and A. Orso, ISSTA 2011

“Evaluating the Usefulness of IR-Based Fault Localization Techniques”
Q. Wang, C. Parnin, and A. Orso, ISSTA 2015

SLIDE 63

What do we know about automated debugging?

Studies on tools Human studies

SLIDE 64

What do we know about automated debugging?

Studies on tools    Human studies

Let’s see… Over 50 years of research on automated debugging.

1962. Symbolic Debugging (UNIVAC FLIT)
1981. Weiser. Program Slicing
1999. Delta Debugging
2001. Statistical Debugging

SLIDE 65

What do we know about automated debugging?

Studies on tools    Human studies

Weiser  Kusumoto  Sherwood  Ko  DeLine

SLIDE 67

Are these Techniques and Tools Actually Helping Programmers?

RQ1: Do programmers who use automated debugging tools locate bugs faster than programmers who do not use such tools?
RQ2: Is the effectiveness of debugging with automated tools affected by the faulty statement’s rank?
RQ3: Do developers navigate a list of statements ranked by suspiciousness in the order provided?
RQ4: Does perfect bug understanding exist?

User studies:
Spectra-based fault localization
IR-based fault localization

SLIDE 68

Experimental Protocol: Setup

Participants:

34 developers
MS students
Different levels of expertise (low, medium, high)

Tools:

Rank-based tool (Eclipse plug-in, logging)
Eclipse debugger

SLIDE 69

Experimental Protocol: Setup

Software subjects:

Tetris (~2.5KLOC)
NanoXML (~4.5KLOC)

SLIDE 70

Tetris Bug

(Easier)

SLIDE 71

NanoXML Bug

(Harder)

SLIDE 72

Experimental Protocol: Setup

Tasks:

Fault in Tetris
Fault in NanoXML
30 minutes per task
Questionnaire at the end

SLIDE 73

Experimental Protocol:

Studies and Groups

SLIDE 74

Experimental Protocol:

Studies and Groups

Part 1

Groups A and B

SLIDE 75

Experimental Protocol:

Studies and Groups

Part 2

Groups C and D

Rank changes: 7 ➡ 35, 83 ➡ 16

SLIDE 76

Study Results

Groups    Tetris    NanoXML
A, B
C, D

SLIDE 77

Study Results

Groups    Tetris                         NanoXML
A, B      Not significantly different
C, D

SLIDE 78

Study Results

Groups    Tetris                         NanoXML
A, B      Not significantly different    Not significantly different
C, D      Not significantly different    Not significantly different

SLIDE 79

Study Results

Stratifying participants

Groups    Tetris                                         NanoXML
A, B      Significantly different for high performers    Not significantly different
C, D      Not significantly different                    Not significantly different

SLIDE 80

Study Results

Stratifying participants

Groups    Tetris                                         NanoXML
A, B      Significantly different for high performers    Not significantly different
C, D      Not significantly different                    Not significantly different

Analysis of results and questionnaires...

SLIDE 81

Findings

RQ1: Do programmers who use automated debugging tools locate bugs faster than programmers who do not use such tools?
Experts are faster when using the tool ➡ Yes (with caveats)

RQ2: Is the effectiveness of debugging with automated tools affected by the faulty statement’s rank?
Changes in rank have no significant effects ➡ No

RQ3: Do developers navigate a list of statements ranked by suspiciousness in the order provided?
Programmers do not visit each statement in the list; they search

RQ4: Does perfect bug understanding exist?
Perfect bug understanding is generally not a realistic assumption

SLIDE 82

Future

Where Shall We Go Next?

SLIDE 83

Feedback-based Debugging (Humans in the Loop)

Intuition: we should amplify, rather than replace, human skills

SLIDE 84

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …]

SLIDE 85

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …]

Assumption: Programmers exhibit perfect bug understanding
Assumption: Programmers inspect a list linearly and exhaustively

SLIDE 86

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …]

✗ Assumption: Programmers exhibit perfect bug understanding
✗ Assumption: Programmers inspect a list linearly and exhaustively

SLIDE 87

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …; Conjecture of Fault Cause; Partial Execution; Pass, Fail]

SLIDE 88

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …; Conjecture of Fault Cause; Partial Execution]

SLIDE 89

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …; Conjecture of Fault Cause; Method-level Execution]

SLIDE 90

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …; Conjecture of Fault Cause; Method-level Execution; High-level Query (state, params; method; return, state’); answers ✓ | ✗ | ?]

Swift

SLIDE 91

Example

SLIDE 92

Iteration 1

✓ ✗ ?

SLIDE 93

Iteration 2

✓ ✗ ?

SLIDE 94

Iteration 3

✓ ✗ ?

SLIDE 95

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …; Conjecture of Fault Cause; Method-level Execution; High-level Query (state, params; method; return, state’); answers ✓ | ✗ | ?]

Swift

Virtual Test Cases

SLIDE 96

[Diagram labels: Pass, Fail; SFL; ranked list 1), 2), 3), 4), …; Conjecture of Fault Cause; Method-level Execution; High-level Query (state, params; method; return, state’); answers ✓ | ✗ | ?]

Swift

Virtual Test Cases

Preliminary empirical results:

20 faults from 3 open-source projects
Average ranking: 66.3
Average # of queries: 4.3

SLIDE 97

Formula-based Debugging (AKA Failure Explanation)

Intuition: executions can be expressed as formulas that we can reason about

Cause Clue Clauses
Error invariants

SLIDE 98

[Diagram: a program with an Input and an Assertion]

SLIDE 99

[Diagram: Formula = Input ⋀ A ⋀ B ⋀ C ⋀ Assertion, where A, B, C encode the semantics of the program; the formula is unsatisfiable (✘)]

SLIDE 100

[Diagram: a MAX-SAT solver applied to the unsatisfiable formula Input ⋀ A ⋀ B ⋀ C ⋀ Assertion yields a MAX-SAT set (✔) and its complement, the CoMSS (✘)]
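The idea in the diagram can be tried on a toy example. The Python sketch below assumes a three-statement program (A: y = x + 2, B: z = y * 2, C: w = z - 1), the input x = 1, and the assertion z == 4, all invented for illustration. Input and assertion are hard constraints, statements are soft constraints, and a brute-force search over a small integer domain stands in for a real MAX-SAT solver.

```python
from itertools import combinations, product

DOMAIN = range(10)  # small integer domain, searched exhaustively

# Hard constraints: the failing input and the assertion that should hold.
HARD = [lambda v: v["x"] == 1, lambda v: v["z"] == 4]

# Soft constraints: one clause per program statement (hypothetical program).
SOFT = {
    "A": lambda v: v["y"] == v["x"] + 2,
    "B": lambda v: v["z"] == v["y"] * 2,
    "C": lambda v: v["w"] == v["z"] - 1,
}

def satisfiable(constraints):
    """True if some assignment over DOMAIN satisfies every constraint."""
    for x, y, z, w in product(DOMAIN, repeat=4):
        v = {"x": x, "y": y, "z": z, "w": w}
        if all(c(v) for c in constraints):
            return True
    return False

def smallest_correction_sets():
    """Smallest sets of statements whose removal makes the formula satisfiable.
    Each is the complement (CoMSS) of a maximum satisfiable subset."""
    names = list(SOFT)
    for k in range(len(names) + 1):
        found = [set(drop) for drop in combinations(names, k)
                 if satisfiable(HARD + [SOFT[n] for n in names if n not in drop])]
        if found:
            return found
    return []

candidates = smallest_correction_sets()
print(candidates)
```

The full formula is unsatisfiable because x = 1 forces z = 6, so the search reports statements A and B as alternative places where a change could make the assertion hold, while C never appears since it does not affect z.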

SLIDE 102

Formula-based Debugging (AKA Failure Explanation)

Intuition: executions can be expressed as formulas that we can reason about

BugAssist
Error invariants

SLIDE 104

Formula-based Debugging (AKA Failure Explanation)

Intuition: executions can be expressed as formulas that we can reason about

BugAssist
Error invariants
Angelic Debugging

SLIDE 106

In Summary

We have come a long way since the early days of debugging

But there is still a long way to go...

...

SLIDE 107

Where Shall We Go Next?

Hybrid, semi-automated fault localization techniques
Failure understanding and explanation
Debugging of field failures (with limited information)
(Semi-)automated repair and workarounds
Takeaway messages (true also for other areas):
Again: don’t pursue full automation at all costs!
Be careful (and honest) with your assumptions!
User studies, user studies, user studies!

SLIDE 108

With much appreciated input/contributions from

Andy Ko
Wei Jin
Jim Jones
Wes Masri
Chris Parnin
Abhik Roychoudhury
Wes Weimer
Tao Xie
Andreas Zeller
Xiangyu Zhang