
Improving Test Suites via Operational Abstraction

Michael Ernst

MIT Lab for Computer Science, http://pag.lcs.mit.edu/~mernst/
Joint work with Michael Harder, Jeff Mellen, and Benjamin Morse


Creating test suites

Goal: small test suites that detect faults well.
Larger test suites are usually more effective

  • Evaluation must account for size

Fault detection cannot be predicted

  • Use proxies, such as code coverage

Test case selection

Example: creating a regression test suite.
Assumes a source of test cases, such as the following (a minimal random source is sketched after this list):

  • Created by a human
  • Generated at random or from a grammar
  • Generated from a specification
  • Extracted from observed usage
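
For concreteness, here is a minimal sketch of the "generated at random" option for a program with a single integer input (such as the abs example used later in the talk). The function name and value ranges are illustrative, not from the paper.

```python
import random

def random_test_cases(n, low=-10, high=10, seed=0):
    """Hypothetical random source of test cases for a one-integer-input program."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]

candidates = random_test_cases(10)   # e.g. candidate inputs to feed a selection loop
```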

Contributions

Operational difference technique for selecting test cases, based on observed behavior

  • Outperforms (and complements) other techniques (see paper for details)

  • No oracle, static analysis, or specification required

Stacking and area techniques for comparing test suites

  • Corrects for size, permitting fair comparison

Outline

  • Operational difference technique for selecting test cases
  • Generating operational abstractions
  • Stacking and area techniques for comparing test suites
  • Evaluation of operational difference technique
  • Conclusion


Operational difference technique

Idea: Add a test case c to a test suite S if c exercises behavior that S does not.
Code coverage does this in the textual domain; we extend this to the semantic domain.
Need to compare run-time program behaviors:

  • Operational abstraction: program properties
  • x > y
  • a[] is sorted

Test suite generation (or augmentation)

Idea: Compare the operational abstractions induced by different test suites.
Given: a source of test cases; an initial test suite.
Loop (sketched in code below):

  • Add a candidate test case
  • If operational abstraction changes, retain the case
  • Stopping condition: failure of a few candidates
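
The loop can be written down compactly. The following is a minimal sketch, assuming a hypothetical infer_abstraction(suite) function that runs the program on the suite and returns its operational abstraction (Daikon plays this role in the actual work); the names and failure threshold are illustrative.

```python
# Sketch of the selection loop; infer_abstraction is a hypothetical stand-in
# for running the program on a suite and detecting its operational abstraction.
def generate_suite(candidate_source, infer_abstraction, max_failures=3):
    suite = []
    abstraction = infer_abstraction(suite)
    consecutive_failures = 0
    for candidate in candidate_source:
        new_abstraction = infer_abstraction(suite + [candidate])
        if new_abstraction != abstraction:
            suite.append(candidate)          # new behavior observed: retain
            abstraction = new_abstraction
            consecutive_failures = 0
        else:
            consecutive_failures += 1        # abstraction unchanged: discard
            if consecutive_failures >= max_failures:
                break                        # stopping condition
    return suite
```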

The operational difference technique is effective

Compared with branch coverage suites (in our evaluation; see paper for details), operational difference suites:

  • are smaller
  • have better fault detection


Example of test suite generation

Program under test: abs (absolute value)
Test cases: 5, 1, 4, -1, -6, -3, 0, 7, -8, 3, …
Suppose operational abstractions can contain properties of these forms (a toy detector for the unconditional forms is sketched after the list):

  • var = constant
  • var ≥ constant
  • var ≤ constant
  • var = var
  • property ⇒ property
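
A toy detector for the first four property forms, applied to abs, gives a feel for how the abstraction changes as cases are added. It is only a sketch: the real tool (Daikon) also reports implications and filters redundant or unjustified properties, so its output matches the following slides more closely than this toy does.

```python
# Toy abstraction over the pairs (arg, return) produced by abs on a suite.
# Covers var = constant, var >= constant, var <= constant, and var = var;
# the implication form (property => property) is omitted for brevity.
def abstraction_for(cases):
    samples = [(arg, abs(arg)) for arg in cases]
    props = set()
    for name, idx in (("arg", 0), ("return", 1)):
        values = [s[idx] for s in samples]
        if len(set(values)) == 1:
            props.add(f"{name} = {values[0]}")
        props.add(f"{name} >= {min(values)}")
        props.add(f"{name} <= {max(values)}")
    if all(a == r for a, r in samples):
        props.add("arg = return")
    return props

print(abstraction_for([5]))       # includes 'arg = 5' and 'arg = return'
print(abstraction_for([5, 1]))    # 'arg = 5' disappears; 'arg >= 1' remains
```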

Considering test case 5

Initial test suite: { }
Initial operational abstraction for { }: Ø
Candidate test case: 5
New operational abstraction for { 5 }:

  • Precondition: arg = 5
  • Postconditions: arg = return

New operational abstraction is different, so retain the test case


Considering test case 1

Operational abstraction for { 5 }:

  • Pre: arg = 5
  • Post: arg = return

Candidate test case: 1
New operational abstraction for { 5, 1 }:

  • Pre: arg ≥ 1
  • Post: arg = return

Retain the test case


Considering test case 4

Operational abstraction for { 5, 1 }:

  • Pre: arg ≥ 1
  • Post: arg = return

Candidate test case: 4
New operational abstraction for { 5, 1, 4 }:

  • Pre: arg ≥ 1
  • Post: arg = return

Discard the test case


Considering test case -1

Operational abstraction for { 5, 1 }:

  • Pre: arg ≥ 1
  • Post: arg = return

Candidate test case: -1
New operational abstraction for { 5, 1, -1 }:

  • Pre: arg ≥ -1
  • Post: arg ≥ 1 ⇒ (arg = return)
    arg = -1 ⇒ (arg = -return)
    return ≥ 1

Retain the test case


Considering test case -6

Operational abstraction for { 5, 1, -1 }:

  • Pre: arg ≥ -1
  • Post: arg ≥ 1 ⇒ (arg = return)
    arg = -1 ⇒ (arg = -return)
    return ≥ 1

Candidate test case: -6
New operational abstraction for { 5, 1, -1, -6 }:

  • Pre: Ø
  • Post: arg ≥ 1 ⇒ (arg = return)
    arg ≤ -1 ⇒ (arg = -return)
    return ≥ 1

Retain the test case


Considering test case -3

Operational abstraction for { 5, 1, -1, -6 }:

  • Post: arg ≥ 1 ⇒ (arg = return)
    arg ≤ -1 ⇒ (arg = -return)
    return ≥ 1

Candidate test case: -3
New operational abstraction for { 5, 1, -1, -6, -3 }:

  • Post: arg ≥ 1 ⇒ (arg = return)
    arg ≤ -1 ⇒ (arg = -return)
    return ≥ 1

Discard the test case


Considering test case 0

Operational abstraction for { 5, 1, -1, -6 }:

  • Post: arg ≥ 1 ⇒ (arg = return)
    arg ≤ -1 ⇒ (arg = -return)
    return ≥ 1

Candidate test case: 0
New operational abstraction for { 5, 1, -1, -6, 0 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Retain the test case


Considering test case 7

Operational abstraction for { 5, 1, -1, -6, 0 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Candidate test case: 7
New operational abstraction for { 5, 1, -1, -6, 0, 7 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Discard the test case


Considering test case -8

Operational abstraction for { 5, 1, -1, -6, 0 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Candidate test case: -8
New operational abstraction for { 5, 1, -1, -6, 0, -8 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Discard the test case


Considering test case 3

Operational abstraction for { 5, 1, -1, -6, 0 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Candidate test case: 3
New operational abstraction for { 5, 1, -1, -6, 0, 3 }:

  • Post: arg ≥ 0 ⇒ (arg = return)
    arg ≤ 0 ⇒ (arg = -return)
    return ≥ 0

Discard the test case; this is the third consecutive failure, so generation stops


Minimizing test suites

Given: a test suite.
For each test case in the suite:

Remove the test case if doing so does not change the operational abstraction
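
A minimal sketch of this minimization pass, reusing the hypothetical infer_abstraction(suite) stand-in from the generation sketch above.

```python
# Drop each test case whose removal leaves the operational abstraction unchanged.
def minimize_suite(suite, infer_abstraction):
    minimized = list(suite)
    target = infer_abstraction(minimized)
    i = 0
    while i < len(minimized):
        trial = minimized[:i] + minimized[i + 1:]
        if infer_abstraction(trial) == target:
            minimized = trial       # abstraction unchanged: the case is redundant
        else:
            i += 1                  # abstraction would change: keep the case
    return minimized
```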


Outline

  • Operational difference technique for selecting test cases
  • Generating operational abstractions
  • Stacking and area techniques for comparing test suites
  • Evaluation of operational difference technique
  • Conclusion


Dynamic invariant detection

Goal: recover invariants from programs
Technique: run the program, examine values
Artifact: Daikon

http://pag.lcs.mit.edu/daikon

Experiments demonstrate accuracy, usefulness


Goal: recover invariants

Detect invariants (as in asserts or specifications)

  • x > abs(y)
  • x = 16*y + 4*z + 3
  • array a contains no duplicates
  • for each node n, n = n.child.parent
  • graph g is acyclic
  • if ptr ≠ null then *ptr > i

Uses for invariants

  • Write better programs [Gries 81, Liskov 86]
  • Document code
  • Check assumptions: convert to assert
  • Maintain invariants to avoid introducing bugs
  • Locate unusual conditions
  • Validate test suite: value coverage
  • Provide hints for higher-level profile-directed compilation [Calder 98]

  • Bootstrap proofs [Wegbreit 74, Bensalem 96]

Ways to obtain invariants

  • Programmer-supplied
  • Static analysis: examine the program text [Cousot 77, Gannod 96]

  • properties are guaranteed to be true
  • pointers are intractable in practice
  • Dynamic analysis: run the program
  • complementary to static techniques

Dynamic invariant detection

Look for patterns in values the program computes:

  • Instrument the program to write data trace files
  • Run the program on a test suite
  • Invariant engine reads data traces, generates potential invariants, and checks them

[Diagram: Original program → (instrument) → Instrumented program → (run on Test suite) → Data trace database → (detect invariants) → Invariants]
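
A minimal sketch of this pipeline in Python. The real Daikon front ends instrument C or Java source and write trace files; the decorator and names below are only an illustration of the idea.

```python
import functools

TRACE = []   # stands in for the data trace database

def instrument(fn):
    """Record argument and return values for every call."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        TRACE.append({"function": fn.__name__, "args": args, "return": result})
        return result
    return wrapper

@instrument
def abs_value(x):
    return x if x >= 0 else -x

for case in [5, 1, -1, -6, 0]:    # run the instrumented program on a test suite
    abs_value(case)

# TRACE now holds one record per call, ready for the invariant detector.
```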


Checking invariants

For each potential invariant:

  • instantiate (determine constants like a and b in y = ax + b)

  • check for each set of variable values
  • stop checking when falsified

This is inexpensive: many invariants, each cheap
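
A minimal sketch of this check loop for one template, y = ax + b; the sample data is made up for illustration.

```python
# Instantiate a and b from the first two samples, then check the rest,
# stopping as soon as the invariant is falsified.
def check_linear(samples):
    """samples: (x, y) pairs observed at one program point."""
    (x0, y0), (x1, y1) = samples[0], samples[1]
    if x0 == x1:
        return None                      # cannot determine a and b
    a = (y1 - y0) / (x1 - x0)
    b = y0 - a * x0
    for x, y in samples[2:]:
        if y != a * x + b:
            return None                  # falsified: stop checking
    return f"y = {a}*x + {b}"

print(check_linear([(1, 7), (2, 11), (3, 15)]))   # y = 4.0*x + 3.0
print(check_linear([(1, 7), (2, 11), (3, 99)]))   # None (falsified)
```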


Improving invariant detection

  • Add desired invariants: implicit values, unused polymorphism
  • Eliminate undesired invariants: unjustified properties, redundant invariants, incomparable variables
  • Traverse recursive data structures
  • Conditionals: compute invariants over subsets of data (if x > 0 then y ≥ z)


Outline

  • Operational difference technique for selecting test cases
  • Generating operational abstractions
  • Stacking and area techniques for comparing test suites
  • Evaluation of operational difference technique
  • Conclusion


Comparing test suites

Key metric: fault detection

  • percentage of faults detected by a test suite

Correlated metric: test suite size

  • number of test cases
  • run time

Test suite comparisons must control for size


Test suite efficiency

Efficiency = (fault detection) / (test suite size)
Which test suite generation technique is better?

[Figure: fault detection vs. test suite size for suites S1 and S2]


Different size suites are incomparable

A technique induces a curve. How can we tell which is the true curve?


Comparing test suite generation techniques

Each technique induces a curve
Compare the curves, not specific points
Approach: compare area under the curve

  • Compares the techniques at many sizes
  • Cannot predict the size users will want

Approximating the curves (“stacking”)

Given a test budget (in suite execution time), generate a suite that runs for that long.

  • To reduce a suite in size: select a random subset
  • To increase a suite in size: combine independent suites

[Figure: fault detection vs. size; suites S1, S1' and S2, S2' approximate the curves for techniques T1 and T2]
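
A minimal sketch of the stacking idea, using the number of test cases as the size measure for simplicity (the paper budgets by execution time); all names are illustrative.

```python
import random

def stack(independent_suites, budget, seed=0):
    """Build a suite of roughly the requested size from independent suites."""
    rng = random.Random(seed)
    stacked = []
    for suite in independent_suites:      # increase: combine independent suites
        stacked.extend(suite)
        if len(stacked) >= budget:
            break
    if len(stacked) > budget:             # decrease: select a random subset
        stacked = rng.sample(stacked, budget)
    return stacked
```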


Test suite generation comparison

  • 1. Approximate the curves
  • 2. Report ratios of areas under the curves

[Figure: fault detection vs. size curves for techniques T1 and T2, compared by the areas under them]
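
A minimal sketch of the area comparison; the curve points are made-up (size, fault detection) pairs used only for illustration.

```python
# Trapezoid-rule area under a fault-detection-vs-size curve, then the ratio
# of two techniques' areas.
def area(points):
    """points: (size, fault_detection) pairs, sorted by size."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

curve_t1 = [(1, 0.2), (2, 0.5), (4, 0.7), (8, 0.9)]   # illustrative data
curve_t2 = [(1, 0.1), (2, 0.3), (4, 0.5), (8, 0.8)]
print(area(curve_t1) / area(curve_t2))   # ratio > 1 favors technique T1
```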


Outline

  • Operational difference technique for selecting test cases
  • Generating operational abstractions
  • Stacking and area techniques for comparing test suites
  • Evaluation of operational difference technique
  • Conclusion


Evaluation of operational difference technique

  • It ought to work: correlating operational abstractions with fault detection
  • It does work: measurements of fault detection of generated suites


Subject programs

8 C programs

  • seven 300-line programs, one 6000-line program

Each program comes with

  • pool of test cases (1052 – 13585)
  • faulty versions (7 – 34)
  • statement, branch, and def-use coverage suites

Improving operational abstractions improves tests

Let the ideal operational abstraction be the one generated by all available test cases.
Operational coverage = closeness to the ideal (one possible formalization is sketched after this list)

  • Operational coverage is correlated with fault detection
  • Holding constant the number of test cases, calls, statement coverage, and branch coverage

  • Same result for 100% statement/branch coverage
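
One way to make "closeness to the ideal" concrete is the fraction of the ideal abstraction's properties that a suite's abstraction also reports. This formalization is an illustration, not necessarily the paper's exact definition; the property sets below are taken from the abs example.

```python
def operational_coverage(suite_abstraction, ideal_abstraction):
    """Fraction of the ideal properties that the suite's abstraction reports."""
    if not ideal_abstraction:
        return 1.0
    return len(suite_abstraction & ideal_abstraction) / len(ideal_abstraction)

ideal = {"arg >= 1 => (arg = return)", "arg <= -1 => (arg = -return)", "return >= 1"}
suite = {"arg >= 1 => (arg = return)", "return >= 1"}
print(operational_coverage(suite, ideal))   # 2/3
```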

Generated suites

Relative fault detection (adjusted by using the stacking technique):

  • Def-use coverage: 1.73
  • Branch coverage: 1.66
  • Operational difference: 1.64
  • Statement coverage: 1.53
  • Random: 1.00

Similar results for augmentation, minimization


Augmentation

Relative fault detection (via area technique):

  • Random: 1.00
  • Branch coverage: 1.70
  • Operational difference: 1.72
  • Branch + operational diff.: 2.16


Operational difference complements structural

                     Best technique
                  Op. Diff.   equal   Branch   Total
CFG changes             9        11        9      29
Non-CFG changes        56        54       24     134
Total                  65        65       33     163


Outline

  • Operational difference technique for selecting test cases
  • Generating operational abstractions
  • Stacking and area techniques for comparing test suites
  • Evaluation of operational difference technique
  • Conclusion


Future work

How good is the stacking approximation?
How do bugs in the programs affect the operational difference technique?

Contributions

Stacking and area techniques for comparing test suites

  • Control for test suite size

Operational difference technique for automatic test case selection

  • Based on observed program behavior
  • Outperforms statement and branch coverage
  • Complementary to structural techniques
  • Works even at 100% code coverage
  • No oracle, static analysis, or specification required