

SLIDE 1

State of the Art and Open Issues in Software Testing


SLIDE 3

Software Testing

Most used and investigated approach

[Chart: number of papers on testing at ICSE, 2000–2018]

SLIDE 6

Testing – A Travelogue

SLIDE 7

A Travelogue – Goal

  • Discuss the most successful research in software testing since 2000
  • Identify the most significant challenges and opportunities

SLIDE 8

A Travelogue – Approach

Two questions

  • 1. What do you think are the most significant contributions to testing since 2000?
  • 2. What do you think are the biggest open challenges and opportunities in this area?

Asked over 50 colleagues; received about 30 responses.

SLIDE 9

In a Nutshell

SLIDE 10

Challenges/Opportunities

  • Testing Real-World Systems
  • Oracles
  • Probabilistic Approaches
  • Testing Non-Functional Properties
  • Domain-Based Testing
  • Leveraging Cloud and Crowd

Research Contributions

  • Automated Test Input Generation
  • Dynamic Symbolic Execution
  • Search-based Testing
  • Random Testing
  • Combined Techniques
  • Testing Strategies
  • Combinatorial Testing
  • Model-Based Testing
  • Mining/Learning from Field Data
  • Regression Testing
  • Empirical Studies & Infrastructure

Practical Contributions

  • Frameworks for Test Execution
  • Continuous Integration

SLIDE 11

So Many Things, So Little Time…

  • Automated Test Input Generation
  • Empirical Studies & Infrastructure
  • Practical Contributions
  • Leveraging Cloud and Crowd
  • Regression Testing

SLIDE 13

Automated Test Input Generation

Not new, but a resurgence:

  • Symbolic execution
  • Search-based testing
  • Random/fuzz testing (see the sketch below)
  • Combined techniques

Goals: achieve a coverage goal, or reach a given point/state.

Enabled by technical improvements: powerful machines, powerful decision procedures, careful engineering.
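As an illustration of the random/fuzz end of the spectrum, here is a minimal random-testing loop in Python for a function shaped like the foo example on the next slides; the harness, input ranges, and trial budget are illustrative choices, not part of the original deck.

  import random

  def foo(x, y):
      # Toy version of the slides' example: fail() is reachable only
      # when y > 0 and x + y < y, i.e., only when x is negative.
      if y > 0:
          z = x + y
          if z < y:
              raise AssertionError("fail() reached")
      return "OK"

  def random_test(trials=100_000, lo=-1_000, hi=1_000):
      # Draw random inputs until one triggers the failure.
      for _ in range(trials):
          x, y = random.randint(lo, hi), random.randint(lo, hi)
          try:
              foo(x, y)
          except AssertionError:
              return (x, y)   # failure-inducing input found
      return None             # no failure observed within the budget

  print(random_test())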

SLIDE 15

Symbolic Execution

foo (x, y) {
  if (y > 0) {
    z = x + y;
    if (z < y)
      fail();
  }
  print("OK");
}

Normal execution: input x=4, y=3; outcome "OK".

Symbolic execution: input x=x0, y=y0. The symbolic state (SS) and path condition (PC) evolve along the path to fail():

  SS: x=x0, y=y0               PC: true
  SS: x=x0, y=y0               PC: y0 > 0
  SS: x=x0, y=y0, z=x0+y0      PC: y0 > 0
  SS: x=x0, y=y0, z=x0+y0      PC: y0 > 0 ∧ x0+y0 < y0

Outcome: failure. Feeding PC: y0 > 0 ∧ x0 + y0 < y0 to a solver yields a failure-inducing input, e.g., x0 = -1, y0 = 4.
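The last step, handing the path condition to a solver, looks like this in practice; a minimal sketch assuming the z3-solver Python bindings (the deck does not prescribe a particular solver).

  from z3 import Int, Solver, sat

  x0, y0 = Int("x0"), Int("y0")

  s = Solver()
  s.add(y0 > 0, x0 + y0 < y0)   # PC of the path that reaches fail()

  if s.check() == sat:          # satisfiable: the failing path is feasible
      m = s.model()
      print(m[x0], m[y0])       # a witness input, e.g., x0 = -1, y0 = 1

The concrete witness the solver returns may differ from the slide's x0 = -1, y0 = 4; any model of the PC is a valid failure-inducing input.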

SLIDE 16

Symbolic Execution

foo (x, y) {
  if (x > 0) {
    z = cxf(x, y);
    if (z < y)
      fail();
  }
  print("OK");
}

Normal execution: input x=4, y=3; outcome "OK".

Symbolic execution: input x=x0, y=y0; outcome: failure with

  SS: x=x0, y=y0, z=cxf(x0, y0)   PC: x0 > 0 ∧ cxf(x0, y0) < y0

The PC now involves the external/complex function cxf, which the solver cannot reason about: ?

SLIDE 17

Dynamic Symbolic Execution

foo (x, y) {
  if (x > 0) {
    z = cxf(x, y);
    if (z < y)
      fail();
  }
  print("OK");
}

Run the program on a concrete input (x=4, y=3) while tracking the symbolic state (SS), the concrete state (CS), and the path condition (PC):

  SS: x=x0, y=y0                  CS: x=4, y=3          PC: true
  SS: x=x0, y=y0                  CS: x=4, y=3          PC: x0 > 0
  SS: x=x0, y=y0, z=cxf(x0, y0)   CS: x=4, y=3, z=69    PC: x0 > 0
  SS: x=x0, y=y0, z=cxf(x0, y0)   CS: x=4, y=3, z=69    PC: x0 > 0 ∧ cxf(x0, y0) > y0

To explore the failing branch, the PC to solve is x0 > 0 ∧ cxf(x0, y0) < y0. Since the solver cannot handle cxf, it falls back on the concrete value cxf(4, 3) = 69, solves x0 > 0 ∧ 69 < y0, and obtains a failure-inducing input, e.g., x0 = 10, y0 = 80.
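The concretization step can be sketched in a few lines; this again assumes z3, and cxf below is just a stand-in for the slide's opaque external function.

  from z3 import Int, Solver, sat

  def cxf(x, y):
      # Opaque external function; during the concrete run on x=4, y=3
      # it was observed to return 69.
      return 69

  x0, y0 = Int("x0"), Int("y0")

  s = Solver()
  # Instead of the unsolvable x0 > 0 ∧ cxf(x0, y0) < y0, substitute the
  # concrete value observed at runtime: x0 > 0 ∧ 69 < y0.
  s.add(x0 > 0, cxf(4, 3) < y0)

  if s.check() == sat:
      print(s.model())          # e.g., x0 = 10, y0 = 80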

Success Stories

  • Academia: countless citations (e.g., over 1300 for DART), publications, and applications
  • Tools: Crest, Klee, Pex, Sage, Symbolic JPF, …
  • Industry: Microsoft, NASA, IBM, Fujitsu, …

Open Challenges

  • Highly structured inputs
  • External libraries
  • Large complex programs
  • Oracle problem
SLIDE 18

So Many Things, So Little Time…

  • Empirical Studies & Infrastructure
  • Practical Contributions
  • Leveraging Cloud and Crowd
  • Regression Testing
  • Automated Test Input Generation

SLIDE 19

Regression Testing

Given a program P, a modified version P', and a test suite T used to test P: how do we test P'?

Common Problem

  • Changes require rapid modification and testing for quick release (time-to-market pressures)
  • This causes released software to have many defects

Approach

  • Focus on the changes
  • Automate (as much as possible) the regression testing process

Research Question: How can we test well, to gain confidence in the changes in an efficient way, before releasing the changed software?

SLIDE 20

Regression Testing: Process and Issues

Test-case manipulation of the original test suite T:

  • Test-suite maintenance: drop obsolete test cases, yielding validated test suite Tval
  • Regression test selection: pick a test suite T' to rerun
  • Test-suite prioritization: produce a prioritized test suite T'
  • Test-suite augmentation: add test cases that exercise the changes, yielding test suite Taug
  • Test-suite minimization: drop redundant test cases, yielding a minimized test suite

SLIDE 22

RTS Algorithm

  • 1. Build a graph G (JIG) for P
  • 2. Collect coverage data, recording which test cases (tc1, tc2, tc3) traverse which edges (e1, e2)
  • 3. Build G' for P' and compare it with G to find dangerous edges (here e2, where doB becomes doC)
  • 4. Select the affected tests: those that cover a dangerous edge

A toy version of steps 2-4 follows.
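The sketch below is in Python, with an illustrative coverage matrix (the slide's X marks do not specify which test covers which edge, so the mapping here is made up):

  coverage = {            # step 2: edge coverage collected on P
      "tc1": {"e1"},
      "tc2": {"e1", "e2"},
      "tc3": {"e2"},
  }
  dangerous = {"e2"}      # step 3: comparing G and G' flags e2 (doB -> doC)

  # Step 4: select every test that traverses a dangerous edge; only those
  # tests can behave differently on P'.
  selected = {tc for tc, edges in coverage.items() if edges & dangerous}
  print(selected)         # {'tc2', 'tc3'} (set order may vary)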

SLIDE 24

BERT (Behavioral Regression Testing)

Given code changes C between program P and program P', plus an existing test suite T:

  • Phase I: generation of test cases for changed code (a change analyzer identifies C; a test-case generator produces tests TC for C)
  • Phase II: behavioral comparison (a test runner & behavioral comparator yield raw behavioral differences)
  • Phase III: differential behavior analysis and reporting (a behavioral-differences analyzer turns raw differences into reported behavioral differences)

SLIDE 25

BERT

  • Focus on a small fraction of the code ➡ thorough ➡ immediate feedback
  • Analyze differential behavior ➡ no oracles

A minimal sketch of the differential comparison follows.

SLIDE 27

Overview of MINTS

  • The testing team provides a test suite, test-related data (coverage data, cost data, fault-detection data), minimization criteria (criterion #1 … criterion #n), and a minimization policy
  • The MINTS tool encodes the resulting minimization problem suitably and hands it to one of several solvers (solver 1 … solver n)
  • The output is a minimized test suite: a solution, or a timeout

A toy version of the minimization step follows.
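The sketch below finds a smallest subsuite that preserves statement coverage (one possible criterion). MINTS hands a suitably encoded problem to external solvers; this toy version just brute-forces small suites, and the data values are illustrative.

  from itertools import combinations

  coverage = {                  # test-related data: coverage per test
      "t1": {"s1", "s2"},
      "t2": {"s2", "s3"},
      "t3": {"s1", "s3"},
      "t4": {"s1", "s2", "s3"},
  }
  required = set().union(*coverage.values())   # criterion: keep full coverage

  def minimize(cov, req):
      tests = sorted(cov)
      for k in range(1, len(tests) + 1):       # try smallest suites first
          for subset in combinations(tests, k):
              if set().union(*(cov[t] for t in subset)) >= req:
                  return subset
      return tuple(tests)

  print(minimize(coverage, required))          # ('t4',)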

SLIDE 29
Regression Testing: Open Issues?

  • Greater industrial uptake
  • Requires better efforts to understand practitioners' problems and needs
  • Industrial case studies may help
  • Augmentation
  • Test-suite repair

SLIDE 30

So Many Things, So Little Time…

  • Empirical Studies & Infrastructure
  • Practical Contributions
  • Leveraging Cloud and Crowd
  • Regression Testing
  • Automated Test Input Generation

SLIDE 31

Empirical Studies & Infrastructure

  • Testing is heuristic ➡ must be empirically evaluated
  • State of the art around 2000, from a study of 224 papers on testing (1994–2003): no empirical evaluation 52%, examples 4%, case studies 27%, experiments 17%
  • Things have changed dramatically since then:
  • Empirical evaluations almost required
  • Artifact evaluations at various conferences

SLIDE 32
Empirical Studies & Infrastructure: What Changed?

  • Increased availability of experiment objects
  • Repositories: SIR (over 600 companies/institutions registered, over 500 papers used it), BugBench, iBugs, Marmoset, SAMATE Reference Dataset, …
  • Open-source systems, often large and available with versions, tests, bug reports, …
  • Increased availability of supporting infrastructure (analysis tools, coverage tools, mutation tools, …)
  • Increased understanding of empirical methodologies

SLIDE 33

So Many Things, So Little Time…

  • Practical Contributions
  • Leveraging Cloud and Crowd
  • Regression Testing
  • Automated Test Input Generation
  • Empirical Studies & Infrastructure

SLIDE 35

Practical Contributions

  • Frameworks for test execution
  • Shortening of the testing process life cycle
  • Dramatically improved the state of the art
  • Indirectly affected research
SLIDE 36

Practical Contributions

  • Frameworks for test execution
  • Shortening of the testing process life cycle
  • From integrating and testing "at the end", to early integration and testing, to continuous integration (CI)
  • Widely used in industry
  • Examples: Hudson, Jenkins, Travis

SLIDE 37

So Many Things, So Little Time…

  • Leveraging Cloud and Crowd
  • Practical Contributions
  • Regression Testing
  • Automated Test Input Generation
  • Empirical Studies & Infrastructure

SLIDE 38

Leveraging Cloud and Crowd

  • From local to remote (data centers, servers)
  • Software increasingly built and run on the net (e.g., cloud IDEs)
  • Natural for testing to follow (e.g., symbolic execution, test farms, heavy-weight analysis)

SLIDE 40

Leveraging Cloud and Crowd

  • Testing is still very much human intensive
  • It makes sense to leverage the crowd in testing
  • This has been happening for some time, both in academia and in industry
  • Interesting new directions (game-based testing and verification, crowd oracles, …)
  • Testing as a game?
  • Must be a game people are willing to play
  • Must be easier than the original problem
SLIDE 41

In Summary

  • Incredible amount of work on testing
  • Yet, things are not that different…

…or are they?

  • Automated testing
  • Empirical evaluation
  • Testing strategies
  • Testing tools
  • Testing process
SLIDE 42

Future Directions

"It's hard to make predictions, especially about the future." (source unclear)

But if you twist my arm…

  • Testing Real-World Systems
  • Oracles
  • Probabilistic Approaches
  • Testing Non-Functional Properties
  • Domain-Based Testing
  • Leveraging Cloud and Crowd

And a personal message:

  • Stop chasing full automation
  • True for other related areas too (e.g., debugging, program repair)
  • Use the different players for what they are best at doing
  • Human: creativity
  • Computer: computation-intensive, repetitive, error-prone, etc. tasks

SLIDE 43

With much appreciated input/contributions from

  • Alex Groce
  • Andrea Arcuri
  • Andreas Zeller
  • Andy Podgurski
  • Antonia Bertolino
  • Atif Memon
  • Corina Pasareanu
  • Darko Marinov
  • David Rosenblum
  • Elaine Weyuker
  • John Regehr
  • Lionel Briand
  • Lori Pollock
  • Mark Grechanik
  • Natalia Juristo
  • Paolo Tonella
  • Patrice Godefroid
  • Per Runeson
  • Peter Santhanam
  • Phil McMinn
  • Phyllis Frankl
  • Robert Hierons
  • Satish Chandra
  • Sebastian Elbaum
  • Sriram Rajamani
  • T. Y. Chen
  • Tom Ostrand
  • Wes Masri
  • Willem Visser
  • Yves Le Traon
SLIDE 44

Leveraging Symbolic Execution for Reproducing and Debugging Field Failures

SLIDE 45

TYPICAL DEBUGGING PROCESS

Failures reported to the bug repository are very hard to (1) reproduce and (2) debug.

OVERARCHING GOAL: help developers (1) investigate field failures, (2) understand their causes, and (3) eliminate such causes.

A recent survey of Apache, Eclipse, and Mozilla developers: information on how to reproduce field failures is the most valuable, and most difficult to obtain, piece of information for investigating such failures. [Zimmermann10]

SLIDE 46

OVERALL VISION

In the field, the instrumented application produces crash reports with execution data. In house, that data supports the software developer in two ways:

  • Field failure reproduction: synthesized executions that recreate the failure
  • Field failure debugging: a report of likely faults, e.g.:

    sed.c:8958 -> sed.c:8958
    sed.c:8993 -> sed.c:9011
    sed.c:8785 -> sed.c:8786
    sed.c:8786 -> sed.c:8786
    sed.c:990  -> sed.c:990

SLIDE 48

MIMICKING FIELD FAILURES

In the field, a user run R ends in failure F. In house, we synthesize a mimicked run R' ending in failure F', such that:

  • F' is analogous to F
  • R' is an actual execution

SLIDE 49

MIMICKING FIELD FAILURES

Relevant events collected from the user run R (breadcrumbs) guide the construction of the mimicked run R'.

SLIDE 50

BUGREDUX

From a crash report (execution data), BugRedux synthesizes executions and, from those, a test input that reproduces the failure in house.

SLIDE 52

BUGREDUX

An input generator, checked by an oracle, turns the crash report (execution data) into a candidate input and, once validated, a test input.

  • Execution data options:
  • Point of failure (POF)
  • Failure call stack
  • Call sequence
  • Complete trace
  • Input generation technique: guided symbolic execution

SLIDE 53

SYMBOLIC EXECUTION

foo (x, y) {
  if (x > y) {
    z = x + y;
    if (z > 10)
      assert false;
  }
  print("OK");
}

Symbolic execution with input x=x0, y=y0 reaches the failure with PC: x0 > y0 ∧ x0 + y0 > 10; the solver turns the PC into a mimicked run, e.g., x0 = 7, y0 = 4.

SLIDE 54

ALGORITHM (SIMPLIFIED)

Input:  ICFG for P; goals (a list of code locations)
Output: If (candidate input), if one is found

States are tuples <cl, pc, ss, goal>: code location, path condition, symbolic state, and current goal.

Main algorithm:
  init; currGoal = first(goals)
  repeat
    currState = SelNextState()
    if (!currState) backtrack or fail
    if (currState.cl == currGoal)
      if (currGoal == last(goals))
        return solve(currState.pc)
      else
        currGoal = next(goals)
        currState.goal = currGoal
    SymbolicallyExecute(currState)

SelNextState:
  minDis = ∞; retState = null
  foreach state in statesSet
    if (state.goal == currGoal)
      if (state.cl can reach currGoal)
        d = |shortest path from state.cl to currGoal|
        if (d < minDis)
          minDis = d; retState = state
  return retState

Optimizations/Heuristics:
  • Dynamic tainting to reduce the symbolic input space
  • Program analysis information to prune the search space
  • Some randomness in the shortest path computation

A runnable sketch of the state-selection heuristic follows.

SLIDE 55

EMPIRICAL EVALUATION – RESEARCH QUESTIONS

  • RQ1: Can BugRedux synthesize executions that are able to reproduce field failures?
  • RQ2: If so, which types of execution data provide the best cost-benefit tradeoffs?
  • In addition, we gathered performance data
SLIDE 56

EMPIRICAL EVALUATION – BUGREDUX TOOL

  • Tool components: an instrumenter and analyzer built on LLVM, an input generator built on KLEE, and an oracle (perl scripts)

Field data options:

  • POF
  • Call stacks
  • Call sequence
  • Complete traces

Oracle:

  • inputs: P, If, crash report C
  • runs P(If), logs any crash C'
  • returns fail if no C' or C' != C
  • returns success otherwise

Publicly available (and easily customizable): http://www.cc.gatech.edu/~orso/software/bugredux.html

A sketch of the oracle step follows.
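A minimal Python sketch of what the oracle does (the real one is a set of perl scripts); the crash-signature format, the timeout, and the negative-return-code convention for crashes are assumptions of this sketch.

  import subprocess

  def crash_signature(program, test_input):
      # Run P on the candidate input If; return None if there is no crash C'.
      try:
          proc = subprocess.run([program, test_input], capture_output=True,
                                text=True, timeout=60)
      except subprocess.TimeoutExpired:
          return None
      if proc.returncode >= 0:      # normal exit; on POSIX a crash shows
          return None               # up as a negative return code (signal)
      return (proc.returncode, proc.stderr.strip())

  def oracle(program, candidate_input, crash_report):
      observed = crash_signature(program, candidate_input)
      if observed is None or observed != crash_report:
          return "fail"             # no C', or C' != C
      return "success"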

SLIDE 57

EMPIRICAL EVALUATION – FAILURES CONSIDERED

Name        Repository   Size (KLOC)   # Faults
sed         SIR          14            2
grep        SIR          10            1
gzip        SIR          5             2
ncompress   BugBench     2             1
polymorph   BugBench     1             1
aeon        exploit-db   3             1
glftpd      exploit-db   6             1
htget       exploit-db   3             1
socat       exploit-db   35            1
tipxd       exploit-db   7             1
aspell      exploit-db   0.5           1
exim        exploit-db   241           1
rsync       exploit-db   67            1
xmail       exploit-db   1             1

Only crashing bugs were considered. None of these faults can be discovered by a vanilla KLEE with a timeout of 72 hours.

SLIDE 58

EMPIRICAL EVALUATION – PROTOCOL

For each program P, fault f, and test case t that reveals f:

  • 1. While recording time and size of execution data:
  • a. Run t against P
  • b. Run t against P instrumented to collect call sequences
  • c. Run t against P instrumented to collect complete traces
  • 2. Run BugRedux with a timeout of 24 hours using POF, call stack, call sequence, and complete trace as execution data:
  • a. Record whether a candidate If is produced
  • b. Record whether If can reproduce the failure

SLIDE 59

EMPIRICAL EVALUATION – RESULTS

For each subject and each type of execution data, one of three outcomes: ✘ fail; ~ synthesize; ✔ (synthesize and) mimic.

SLIDE 60

EMPIRICAL EVALUATION – RESULTS

Name        POF   Call Stack   Call Seq.   Compl. Trace
sed #1      ✘     ✘            ✔           ✘
sed #2      ✘     ✘            ✔           ✘
grep        ✘     ~            ✔           ✘
gzip #1     ✔     ✔            ✔           ✘
gzip #2     ~     ~            ✔           ✘
ncompress   ✔     ✔            ✔           ✘
polymorph   ✔     ✔            ✔           ✘
aeon        ✔     ✔            ✔           ✔
rsync       ✘     ✘            ✔           ✘
glftpd      ✔     ✔            ✔           ✘
htget       ~     ~            ✔           ✘
socat       ✘     ✘            ✔           ✘
tipxd       ✔     ✔            ✔           ✘
aspell      ~     ~            ✔           ✘
xmail       ✘     ✘            ✔           ✘
exim        ✘     ✘            ✔           ✔

POF: synthesize 9/16, mimic 6/16.

SLIDE 61

Call stack: synthesize 10/16, mimic 6/16 (same results table as above).

SLIDE 62

Call sequence: synthesize 16/16, mimic 16/16 (same results table as above).

SLIDE 63

Complete trace: synthesize 2/16, mimic 2/16 (same results table as above).

SLIDE 64

Complete trace: synthesize 2/16, mimic 2/16. Why so poor?

  • Divergence due to library modeling
  • Limitations of the constraint solver

SLIDE 65

EMPIRICAL EVALUATION – DISCUSSION

  • RQ1: Can BugRedux synthesize executions that are able to reproduce field failures? YES
  • RQ2: If so, which types of execution data provide the best cost-benefit tradeoffs? Call sequences
  • Observations:
  • [Manual examination] Faults can be distant from the failure points, so POFs and call stacks are unlikely to help
  • More information may not always be better
  • Call sequences work well, but provide a great deal of information
  • BugRedux can generate multiple mimicked executions (pass & fail)

Performance: average overhead for call-sequence collection is 15% (unoptimized implementation).

SLIDE 67

MINIMIZING CALL SEQUENCES

Relevant events (breadcrumbs) guide the mimicked run. How many of these events are actually needed?
SLIDE 69

MINIMIZING CALL SEQUENCES

Mini study:

  • for each entry e: remove e from the sequence
  • if BugRedux still "generates a failure" ➡ continue
  • else add back e

A runnable sketch of this loop follows.
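In the sketch, reproduces_failure stands in for a full BugRedux run on the reduced sequence and is the only assumed piece.

  def minimize_sequence(sequence, reproduces_failure):
      seq = list(sequence)
      i = 0
      while i < len(seq):
          candidate = seq[:i] + seq[i + 1:]   # remove entry e
          if reproduces_failure(candidate):   # still "generates a failure"
              seq = candidate                 # keep the removal
          else:
              i += 1                          # add back e, move on
      return seq

  # Toy check: pretend only 'open' and 'parse' matter for the failure.
  needed = {"open", "parse"}
  print(minimize_sequence(
      ["init", "open", "log", "parse", "close"],
      lambda s: needed <= set(s)))            # ['open', 'parse']

The result is 1-minimal: removing any single remaining entry breaks reproduction.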
SLIDE 70

MINIMIZING CALL SEQUENCES – RESULTS

Name          Original Length   Minimal Length
sed.fault1    73                12
sed.fault2    146               7
grep          31                2
xmail         1142              363
gzip.fault2   27                2
rsync         23                2
aspell        516               256
socat         62                3
htget         25                2
exim          1029              326

Summary

  • 1. On average, only 16% of the entries in the original call sequence are required to reproduce the failures; in some cases, as few as 2.
  • 2. The number of entries needed increases with the complexity of the input that triggers the fault.

Preliminary Conclusion

It seems possible to recreate observed failures with only limited (and inexpensive to collect) information.

SLIDE 73

CURRENT AND FUTURE WORK

The overall vision again: instrumented applications in the field send crash reports (execution data); in house, synthesized executions support field failure reproduction, and likely faults support field failure debugging.

Goals:

  • Optimization
  • Failure explanation
  • Alternative execution mimicking techniques
  • Domain-specific execution recording
  • Alternative uses of the field data

SLIDE 74

Probabilistic Symbolic Execution

Acknowledgments:
 Willem Visser (Stellenbosch University, RSA),
 Matt Dwyer (UNL, USA), 
 Jaco Geldenhuys (SU, RSA), 
 Corina Pasareanu (NASA, USA), 
 Antonio Filieri (Imperial College London, UK)

SLIDE 75

Symbolic Execution

void test(int x, int y) {
  if (y == x*10) S0; else S1;
  if (x > 3 && y > 10) S2; else S3;
}

Starting from [ true ] test(X,Y), symbolic execution explores:

  [ Y=X*10 ] S0                         [ Y!=X*10 ] S1
  [ X>3 & 10<Y=X*10 ] S2                [ X>3 & 10<Y!=X*10 ] S2
  [ Y=X*10 & !(X>3 & Y>10) ] S3         [ Y!=X*10 & !(X>3 & Y>10) ] S3

Test(1,10) reaches S0,S3; Test(0,1) reaches S1,S3; Test(4,11) reaches S1,S2.

SLIDE 76

In a perfect world

  • Only linear integer constraints
  • Only uniform distributions
SLIDE 77

LattE Model Counter

http://www.math.ucdavis.edu/~latte/

Counts the solutions of a conjunction of linear inequalities.
SLIDE 79

Probabilistic Symbolic Execution

void test(int x: 0..99, int y: 0..99) {
  if (y == x*10) S0; else S1;
  if (x > 3 && y > 10) S2; else S3;
}

Same symbolic-execution tree as before, but with bounded input domains each branching condition (y = 10x; x > 3 & y > 10) splits a finite set of inputs, so every path condition can be model counted.

SLIDE 80

Probabilistic Symbolic Execution

void test(int x: 0..99, int y: 0..99) {
  if (y == x*10) S0; else S1;
  if (x > 3 && y > 10) S2; else S3;
}

With x, y in 0..99 there are 10^4 possible inputs. Model counting each path condition gives:

  [ Y=X*10 ]                       10
  [ Y!=X*10 ]                      9990
  [ X>3 & 10<Y=X*10 ] S2           6
  [ Y=X*10 & !(X>3 & Y>10) ] S3    4
  [ X>3 & 10<Y!=X*10 ] S2          8538
  [ Y!=X*10 & !(X>3 & Y>10) ] S3   1452

The brute-force check below reproduces these counts.
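With a finite domain this small, the counts can be sanity-checked by exhaustive enumeration in Python (a model counter like LattE obtains the same numbers analytically, without enumeration):

  from itertools import product

  paths = {"S0,S2": 0, "S0,S3": 0, "S1,S2": 0, "S1,S3": 0}
  for x, y in product(range(100), repeat=2):
      a = "S0" if y == x * 10 else "S1"         # first branch
      b = "S2" if (x > 3 and y > 10) else "S3"  # second branch
      paths[a + "," + b] += 1

  print(paths)  # {'S0,S2': 6, 'S0,S3': 4, 'S1,S2': 8538, 'S1,S3': 1452}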

SLIDE 81

Program Understanding

Annotating the symbolic-execution tree with these counts profiles the program's behavior: for instance, only 10 of the 10^4 inputs ever reach S0, and just 6 of those continue to S2.

SLIDE 82

Probabilistic Analysis

Dividing counts by the domain size (10^4) yields branch and path probabilities:

  P[Y=X*10] = 0.001          P[Y!=X*10] = 0.999
  Given Y=X*10: P[X>3 & Y>10] = 0.6, else 0.4
  Path probabilities: S0,S2 = 0.0006; S0,S3 = 0.0004; S1,S2 = 0.853; S1,S3 = 0.145

SLIDE 83

Software Reliability

If the failing behavior lies on the path with probability 0.0004 (the Y=X*10 inputs that do not satisfy X>3 & Y>10), the program fails on 4 of the 10^4 inputs: it is 0.9996 reliable.

SLIDE 84
Future Directions

  • Incorporate usage profiles
  • Extend model counting to other types
  • Reduce, reuse, and recycle constraints
  • Use informed sampling for statistical symbolic execution
  • Target non-deterministic programs