SLIDE 1

David Saff 1

Test Factoring: Focusing test suites on the task at hand

David Saff, MIT ASE 2005

SLIDE 2

The problem: large, general system tests

Figure: my test suite, which takes one hour, marked with where I changed code and where I broke code.

How can I get:
  • Quicker feedback?
  • Less wasted time?

[Saff, Ernst, ISSRE 2003]

SLIDE 3

The problem: large, general system tests

Figure: my test suite; test selection.

SLIDE 4

The problem: large, general system tests

Figure: my test suite; test selection; test prioritization.

SLIDE 5

The problem: large, general system tests

Figure: my test suite; test selection; test prioritization; test factoring.

SLIDE 6

Test factoring

  • Input: large, general system tests
  • Output: small, focused unit tests
  • Work with Shay Artzi, Jeff Perkins, and Michael D. Ernst

SLIDE 7

A factored test…

  • exercises less code than system test
  • should be faster if a system test is slow
  • can eliminate dependence on expensive resources or human interaction
  • isolates bugs in subsystems
  • provides new opportunities for prioritization and selection

SLIDE 8

Test Factoring

  • What?
    – Breaking up a system test
  • How?
    – Automatically creating mock objects
  • When?
    – Integrating test factoring into development
  • What next?
    – Results, evaluation, and challenges

SLIDE 9

System Test

Figure: a system test, with interactions labelled Provided and Checked.

There’s more than one way to factor a test! Basic strategy:

  • Capture a subset of behavior beforehand.
  • Replay that behavior at test time.

SLIDE 10

Figure: a system test divided into tested code and environment, with interactions labelled Provided and Checked.

  • Tested code: PayrollCalculator
    – Fast
    – Is changing
  • Environment: Database Server
    – Expensive
    – Not changing
  • The interactions between them are marked for capture.

SLIDE 11

Introduce Mock

Figure: tested code and environment, with three interactions labelled Checked and three labelled Provided.

Introduce Mock:

  • simulate part of the functionality of the original environment
  • validate the unit’s interaction with the environment

[Saff, Ernst, PASTE 2004]

SLIDE 12

Test Factoring

  • What?
    – Breaking up a system test
  • How?
    – Automatically creating mock objects
  • When?
    – Integrating test factoring into development
  • What next?
    – Results, evaluation, and challenges

SLIDE 13

How? Automating Introduce Mock

Figure: call diagram across PayrollCalculator, ResultSet, and Database, divided into tested code and environment. Calls shown: calculatePayroll(), addResultsTo(ResultSet), addResult(String) three times, and getResult() three times. The calls that cross between tested code and environment are marked for capture.

SLIDE 14

Interfacing: separate type hierarchy from inheritance hierarchy

Figure: the same calls, now made through the interfaces IPayrollCalculator, IResultSet, and IDatabase: calculatePayroll(), addResultsTo(IResultSet), addResult(String) three times, and getResult() three times. PayrollCalculator, ResultSet, and Database are still divided into tested code and environment.
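The interfacing step can be sketched by hand in Java. The interface and class names come from the slides; the method bodies, the salary-summing logic, the `Database(List<String>)` constructor, and the convention that `getResult()` returns `null` once results are exhausted are assumptions made only so the example runs. The slides describe the tool performing this transformation automatically; this is a manual illustration, not its output.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Interfaces extracted from the concrete classes. Callers now depend only on
// these types, so a decorator or replayer can stand in for any implementation.
interface IResultSet {
    void addResult(String result);
    String getResult(); // assumed contract: next result, or null when exhausted
}

interface IDatabase {
    void addResultsTo(IResultSet results);
}

interface IPayrollCalculator {
    int calculatePayroll();
}

// A simple first-in, first-out holder for result rows.
class ResultSet implements IResultSet {
    private final Deque<String> rows = new ArrayDeque<>();
    public void addResult(String result) { rows.addLast(result); }
    public String getResult() { return rows.pollFirst(); }
}

// Environment: stands in for the expensive database server.
class Database implements IDatabase {
    private final List<String> salaries;
    Database(List<String> salaries) { this.salaries = salaries; }
    public void addResultsTo(IResultSet results) {
        for (String salary : salaries) results.addResult(salary);
    }
}

// Tested code: every collaborator is referenced through an interface type.
class PayrollCalculator implements IPayrollCalculator {
    private final IDatabase db;
    PayrollCalculator(IDatabase db) { this.db = db; }
    public int calculatePayroll() {
        IResultSet rs = new ResultSet();
        db.addResultsTo(rs);
        int total = 0;
        for (String row = rs.getResult(); row != null; row = rs.getResult()) {
            total += Integer.parseInt(row);
        }
        return total;
    }
}
```

Because `PayrollCalculator` holds an `IDatabase` rather than a `Database`, a recording or replaying implementation can be substituted without touching the tested code.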

SLIDE 15

Capturing: insert recording decorators where capturing must happen

Figure: the interfaced call diagram, with two recording decorators, Capturing Database and Callback ResultSet, inserted at the boundary between tested code and environment; each is marked “capture”.
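A recording decorator can be sketched as a wrapper that logs each boundary crossing before delegating. In this minimal Java sketch, `CapturingDatabase` records calls from the tested code into the environment, and `CallbackResultSet` records the environment’s callbacks into the tested code. The one-string-per-event transcript format is a hypothetical simplification; a practical transcript would likely need richer records of arguments, return values, and object identities.

```java
import java.util.ArrayList;
import java.util.List;

interface IResultSet {
    void addResult(String result);
    String getResult();
}

interface IDatabase {
    void addResultsTo(IResultSet results);
}

// Records callbacks that the environment makes into the tested code.
class CallbackResultSet implements IResultSet {
    private final IResultSet target;
    private final List<String> transcript;

    CallbackResultSet(IResultSet target, List<String> transcript) {
        this.target = target;
        this.transcript = transcript;
    }

    public void addResult(String result) {
        transcript.add("callback addResult " + result);
        target.addResult(result);
    }

    // Not recorded in this sketch: only the environment's addResult callbacks are logged.
    public String getResult() { return target.getResult(); }
}

// Records calls from the tested code into the environment, then delegates.
class CapturingDatabase implements IDatabase {
    private final IDatabase real;
    final List<String> transcript = new ArrayList<>();

    CapturingDatabase(IDatabase real) { this.real = real; }

    public void addResultsTo(IResultSet results) {
        transcript.add("call addResultsTo");
        // Wrap the tested code's result set so callbacks are captured too.
        real.addResultsTo(new CallbackResultSet(results, transcript));
        transcript.add("return addResultsTo");
    }
}
```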

SLIDE 16

Replay: simulate environment’s behavior

Figure: the interfaced call diagram, with the Database replaced by a Replaying Database; interactions with it are marked “replayed” and “verified”.
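Replay can be sketched as a mock that walks the transcript: each call the tested code makes is checked against the next recorded event, and recorded callbacks are played back as the environment’s behavior. In this Java sketch the transcript is a list of strings such as `"call addResultsTo"`, `"callback addResult 100"`, and `"return addResultsTo"`; that format, `verifyComplete()`, and the exception messages are assumptions. `ReplayException` is the name used in the slides, but its definition here is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

interface IResultSet {
    void addResult(String result);
    String getResult();
}

interface IDatabase {
    void addResultsTo(IResultSet results);
}

// Thrown when the tested code's behavior diverges from the recorded transcript.
class ReplayException extends RuntimeException {
    ReplayException(String message) { super(message); }
}

// Stands in for the real database: verifies calls, replays recorded callbacks.
class ReplayingDatabase implements IDatabase {
    private static final String CALLBACK = "callback addResult ";
    private final Deque<String> events;

    ReplayingDatabase(List<String> transcript) {
        this.events = new ArrayDeque<>(transcript);
    }

    private void expect(String event) {
        String next = events.pollFirst();
        if (!event.equals(next)) {
            throw new ReplayException("expected " + next + " but saw " + event);
        }
    }

    public void addResultsTo(IResultSet results) {
        expect("call addResultsTo"); // verified: the call the tested code made
        while (events.peekFirst() != null && events.peekFirst().startsWith(CALLBACK)) {
            // replayed: the environment's recorded response
            results.addResult(events.pollFirst().substring(CALLBACK.length()));
        }
        expect("return addResultsTo");
    }

    // Call after the factored test: every recorded call must have been made.
    void verifyComplete() {
        if (!events.isEmpty()) throw new ReplayException("missing call: " + events.peekFirst());
    }
}
```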

SLIDE 17

Test Factoring

  • What?
    – Breaking up a system test
  • How?
    – Automatically creating mock objects
  • When?
    – Integrating test factoring into development
  • What next?
    – Results, evaluation, and challenges

SLIDE 18

When? Test factoring life cycle:

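  • Capture: slow system tests produce a transcript.
  • Replay: the transcript yields fast unit tests.
  • When the developer changes the tested unit, run the factored tests:
    – Success or failure: the factored test gives the answer.
    – Replay exception: run the system tests for those cases.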
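The branch at the end of the life cycle can be sketched as a small driver. `FactoredTestRunner` and `Outcome` are hypothetical names, and treating any `RuntimeException` or `AssertionError` as a test failure is an assumption of this sketch.

```java
// Thrown by a replaying mock when the tested code diverges from the transcript.
class ReplayException extends RuntimeException {
    ReplayException(String message) { super(message); }
}

enum Outcome { PASS, FAIL }

// Hypothetical driver for one factored test in the life cycle.
class FactoredTestRunner {
    static Outcome run(Runnable factoredTest, Runnable systemTest) {
        try {
            factoredTest.run();
            return Outcome.PASS;
        } catch (ReplayException e) {
            // The transcript no longer matches the changed unit, so the
            // factored test says nothing; fall back to the slow system test.
            return runDirectly(systemTest);
        } catch (RuntimeException | AssertionError e) {
            return Outcome.FAIL;
        }
    }

    private static Outcome runDirectly(Runnable test) {
        try {
            test.run();
            return Outcome.PASS;
        } catch (RuntimeException | AssertionError e) {
            return Outcome.FAIL;
        }
    }
}
```

Only replay exceptions pay for the slow system test; ordinary passes and failures are answered by the fast factored test alone.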

SLIDE 19

Time saved:

Figure: slow system tests compared with running the factored tests followed by the system tests for replay exceptions.

SLIDE 20

Time saved:

Figure: slow system tests vs. factored tests, comparing time until first error and time to complete tests.

SLIDE 21

Test Factoring

  • What?
    – Breaking up a system test
  • How?
    – Automatically creating mock objects
  • When?
    – Integrating test factoring into development
  • What next?
    – Results, evaluation, and challenges

SLIDE 22

Implementation for Java

  • Captures and replays
    – Static calls
    – Constructor calls
    – Calls via reflection
    – Explicit class loading
  • Allows for shared libraries
    – e.g., tested code and environment are free to use disjoint ArrayLists without verification.
  • Preserves behavior on Java programs up to 100 KLOC

SLIDE 23

Case study

  • Daikon: 347 KLOC
    – Uses most of Java: reflection, native methods, JDK callbacks, communication through side effects
  • Tests found real developer errors
  • Two developers
    – Fine-grained compilable changes over two months: 2505
    – CVS check-ins over six months (all developers): 104

SLIDE 24

Evaluation method

  • Retrospective reconstruction of test factoring’s results during real development
    – Test on every change, or every check-in.
  • Assume capture happens every night.
  • If the transcript is too large, don’t capture
    – just run the original test.
  • If a factored test throws a ReplayException, run the original test.

SLIDE 25

Measured Quantities

  • Test time: total time to find out test results
  • Time to failure: if tests fail, how long until the first failure?
  • Time to success: if tests pass, how long until all tests run?
  • ReplayExceptions are treated as giving the developer no information
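The three measures can be computed from a per-test log under a simple model: all factored tests run first, then the original system test runs for each test that threw a ReplayException, matching the “run system tests for replay exceptions” step of the life cycle. In this hypothetical Java sketch, `TestRecord`, `Result`, and `Measures` are invented names and durations are in seconds. When no test fails, time to success equals `testTime`.

```java
import java.util.ArrayList;
import java.util.List;

enum Result { PASS, FAIL, REPLAY_EXCEPTION }

// One test: how its factored version behaved, and what the original system test reports.
class TestRecord {
    final double factoredSecs;
    final Result factored;
    final double systemSecs;
    final boolean systemPasses;

    TestRecord(double factoredSecs, Result factored, double systemSecs, boolean systemPasses) {
        this.factoredSecs = factoredSecs;
        this.factored = factored;
        this.systemSecs = systemSecs;
        this.systemPasses = systemPasses;
    }
}

class Measures {
    final double testTime;      // total time until every test has a definite result
    final Double timeToFailure; // elapsed time at the first failure, or null if all pass

    private Measures(double testTime, Double timeToFailure) {
        this.testTime = testTime;
        this.timeToFailure = timeToFailure;
    }

    static Measures of(List<TestRecord> tests) {
        double clock = 0;
        Double firstFailure = null;
        List<TestRecord> fallBack = new ArrayList<>();
        // Phase 1: run every factored test.
        for (TestRecord t : tests) {
            clock += t.factoredSecs;
            if (t.factored == Result.FAIL && firstFailure == null) firstFailure = clock;
            // A ReplayException gives no information; the system test must decide.
            if (t.factored == Result.REPLAY_EXCEPTION) fallBack.add(t);
        }
        // Phase 2: run the original system test for each replay exception.
        for (TestRecord t : fallBack) {
            clock += t.systemSecs;
            if (!t.systemPasses && firstFailure == null) firstFailure = clock;
        }
        return new Measures(clock, firstFailure);
    }
}
```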

SLIDE 26

Results

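Each ratio is time with test factoring divided by time without it (lower is better):

  • Dev. 1, testing on every change
    – Test time: .79 (7.4 / 9.4 min)
    – Time to failure: 1.56 (14 / 9 s)
    – Time to success: .59 (5.5 / 9.4 s)
  • Dev. 2, testing on every change
    – Test time: .99 (14.1 / 14.3 min)
    – Time to failure: 1.28 (64 / 50 s)
    – Time to success: .77 (11.0 / 14.3 s)
  • All developers, testing on every check-in
    – Test time: .09 (0.8 / 8.8 min)
    – Time to failure: n/a
    – Time to success: .09 (0.8 / 8.8 min)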

SLIDE 27

Discussion

  • Test factoring dramatically reduced testing time for checked-in code (by 90%)
  • Testing on every developer change catches too many meaningless versions
  • Are ReplayExceptions really not helpful?
    – When they are surprising, perhaps they are

SLIDE 28

Future work: improving the tool

  • Generating automated tests from UI bugs
    – Factor out the user
  • Smaller factored tests
    – Use static analysis to distill transcripts to bare essentials

SLIDE 29

Future work: Helping users

  • How do I partition my program?
    – Should ResultSet be tested or mocked?
  • How do I use replay exceptions?
    – Is it OK to return null when “” was expected?
  • Can I change my program to make it more factorable?
    – Can the tool suggest refactorings?

SLIDE 30

Conclusion

  • Test factoring uses large, general system tests to create small, focused unit tests
  • Test factoring works now
  • How can it work better, and help users more?
  • saff@mit.edu

SLIDE 31

SLIDE 32

Challenge: Better factored tests

  • Allow more code changes
    – It’s OK to call toString an additional time.
  • Eliminate redundant tests
    – Not all 2,000 calls to calculatePayroll are needed.
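One possible way to eliminate redundant tests is to keep only one captured call per distinct transcript. This Java sketch is a hypothetical heuristic, not a technique from the slides: it assumes each captured call (for example, each call to `calculatePayroll`) is deterministic and independent of earlier calls, so two calls with identical recorded inputs and environment interactions exercise the tested code identically. `CapturedCall` and `RedundancyFilter` are invented names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One captured invocation of the tested entry point, e.g. one calculatePayroll call.
class CapturedCall {
    final String label;            // hypothetical identifier for reporting
    final List<String> transcript; // recorded arguments and environment interactions

    CapturedCall(String label, List<String> transcript) {
        this.label = label;
        this.transcript = transcript;
    }
}

class RedundancyFilter {
    // Keeps the first call for each distinct transcript, in capture order.
    // Assumes calls are independent and deterministic given their transcript.
    static List<CapturedCall> distinct(List<CapturedCall> calls) {
        Map<List<String>, CapturedCall> byTranscript = new LinkedHashMap<>();
        for (CapturedCall call : calls) {
            byTranscript.putIfAbsent(call.transcript, call);
        }
        return new ArrayList<>(byTranscript.values());
    }
}
```

If the tested object carries state from one call to the next, identical transcripts no longer imply identical behavior, and this filter could discard a test that matters.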

SLIDE 33

Evaluation strategy

  1) Observe: minute-by-minute code changes from real development projects.
  2) Simulate: run the real test factoring code on the changing code base.
  3) Measure:
    – Are errors found faster?
    – Do tests finish faster?
    – Do factored tests remain valid?
  4) Distribute: developer case studies

SLIDE 34

Conclusion

  • Rapid feedback from test execution has measurable impact on task completion.
  • Continuous testing is publicly available.
  • Test factoring is working, and will be available by year’s end.
  • To read papers and download:
    – Google “continuous testing”

SLIDE 35

Case Study

  • Four development projects monitored
  • Shown here: Perl implementation of delta tools
  • Developed by me using test-first development methodology; tests were run often
  • Small code base with small test suite

  • Lines of code: 5714
  • Total time worked: 59 hours
  • Total test runs: 266
  • Average time between tests: 5 minutes

SLIDE 36

We want to reduce wasted time

  • Test-wait time: if developers test often, they spend a lot of time waiting for tests to complete.
  • Regret time: if developers test rarely, regression errors are not found quickly. Extra time is spent remembering and fixing old changes.

SLIDE 37

Results predict: continuous testing reduces wasted time

Figure: “Wasted Time Reduction by Continuous Testing”. Wasted time, split into regret and test-wait, without and with continuous testing (ct), for the Observed, Best, Reorder, Random, and Recent Errors strategies; annotations mark the best achievable by changing test frequency and by changing test order.

Continuous testing drastically cuts regret time.

SLIDE 38

A small catalog of test factorings

  • Like refactorings, test factorings can be catalogued, reasoned about, and automated

Separate Sequential Code:

Also “Unroll Loop”, “Inline Method”, etc. to produce sequential code

SLIDE 39

A small catalog of test factorings

Introduce Mock:

Figure: the original test (Unit plus Environment) is factored into the Unit with a Mocked Environment, and the Environment with a Mocked Unit.

SLIDE 40

Figure: a unit test of the Unit alone, with interactions labelled Provided and Checked.

SLIDE 41

Always tested: Continuous Testing and Test Factoring

David Saff, MIT CSAIL; IBM T. J. Watson, April 2005

SLIDE 42

Overview

  • Part I: Continuous testing

Continuous testing runs tests in the background to provide feedback as developers code.

  • Part II: Test factoring

Test factoring creates small, focused unit tests from large, general system tests

SLIDE 43

Part I: Continuous testing

  • Continuous testing runs tests in the background to provide feedback as developers code.
  • Work with Kevin Chevalier, Michael Bridge, Michael D. Ernst

SLIDE 44

Part I: Continuous testing

  • Motivation
  • Students with continuous testing:
    – Were more likely to complete an assignment
    – Took no longer to finish
  • A continuous testing plug-in for Eclipse is publicly available.

  • Demo!

SLIDE 45

“Traditional” testing during software maintenance (v2.0 → v2.1)

  • Developer has v2.0 test suite
    – Changes the code
    – Runs the tests
    – Waits for completion
    – Repeats…

Figure: timeline alternating “developer changes code” and “computer runs tests”; the developer is idle (zzz …) during each test run.

SLIDE 46

Continuous Testing

  • Continuous testing uses excess cycles on a nearby workstation to continuously run regression tests in the background as the developer edits code.
  • The developer no longer thinks about what to test when.

Figure: the developer changes code; the system is notified about the changes, runs the tests, and notifies the developer about errors.
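The background loop can be sketched with a single low-priority worker thread that starts a fresh run after every change and cancels the run for the previous version. This Java sketch uses only `java.util.concurrent`; `ContinuousTester`, the suite hook, and the listener hook are hypothetical stand-ins for a real test runner and IDE notification, and it runs locally rather than on a nearby workstation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Re-runs a regression suite in the background whenever the code changes.
class ContinuousTester {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "continuous-tests");
        t.setDaemon(true);                  // never keeps the JVM alive
        t.setPriority(Thread.MIN_PRIORITY); // prefer spare cycles
        return t;
    });
    private final Runnable suite;            // hypothetical: throws on a failing test
    private final Consumer<String> listener; // hypothetical: e.g., an editor status line
    private Future<?> current;

    ContinuousTester(Runnable suite, Consumer<String> listener) {
        this.suite = suite;
        this.listener = listener;
    }

    // Call on every change (e.g., after each successful compile).
    synchronized Future<?> codeChanged() {
        if (current != null) current.cancel(true); // older results would describe stale code
        current = worker.submit(() -> {
            String message;
            try {
                suite.run();
                message = "all tests pass";
            } catch (RuntimeException | AssertionError e) {
                message = "test failure: " + e.getMessage();
            }
            // Suppress the report if a newer edit cancelled this run meanwhile.
            if (!Thread.currentThread().isInterrupted()) listener.accept(message);
        });
        return current;
    }
}
```

`cancel(true)` only interrupts the running suite; a suite that ignores interrupts runs to completion, which is why the sketch checks the interrupt flag before reporting.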

SLIDE 47

Continuous testing: inspired by continuous compilation

  • Continuous compilation, as in Eclipse, notifies the developer quickly when a syntactic error is introduced.
  • Continuous testing notifies the developer quickly when a semantic error is introduced.

SLIDE 48

Case study

  • Single-developer case study [ISSRE 03]
  • Maintenance of existing software with regression test suites
  • Test suites took minutes: test prioritization needed for best results
  • Focus: quick discovery of regression errors to reduce development time (10-15%)

SLIDE 49

Controlled human experiment

  • 22 undergraduate students developing Java in Emacs
  • Each subject performed two 1-week class programming assignments
    – Test suites provided in advance
  • Initial development: regressions less important
  • Test suites took seconds: prioritization unnecessary
  • Focus: “What happens when the computer thinks about testing for us?”

SLIDE 50

Experimental Questions

  1. Does continuous testing improve productivity? Yes
  2. Does continuous compilation improve productivity? Yes
  3. Can productivity benefits be attributed to other factors? No
  4. Does asynchronous feedback distract users? No

SLIDE 51

Productivity measures

  • time worked: Time spent editing source files.
  • grade: On each individual problem set.
  • correct program: True if the student solution passed all tests.
  • failed tests: Number of tests that the student submission failed.

SLIDE 52

Treatment predicts correctness (Questions 1 and 2)

Correct programs by treatment (p < .03):

  • No tool (N = 11): 27%
  • Continuous compilation (N = 10): 50%
  • Continuous testing (N = 18): 78%

SLIDE 53

Can other factors explain this? (Question 3)

  • Frequent testing: no
    – Frequent manual testing: 33% success
  • Easy testing: no
    – All students could test with a keystroke
  • Demographics: no
    – No significant differences between groups

Figure: correct programs by treatment: continuous testing 78%, continuous compilation 50%, no tool 27%.

SLIDE 54

No significant effect on other productivity measures

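Time worked, failed tests, and grade by treatment:

  • No tool (N = 11): 10.1 hrs worked, 7.6 failed tests, 79% grade
  • Continuous compilation (N = 10): 10.6 hrs worked, 4.1 failed tests, 83% grade
  • Continuous testing (N = 18): 10.7 hrs worked, 2.9 failed tests, 85% grade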

SLIDE 55

Did continuous testing win over users? (Question 4)

Percentage of students answering yes:

  • I would use the tool… for the rest of the class: 94%
  • …for my own programming: 80%
  • I would recommend the tool to others: 90%

SLIDE 56

Eclipse plug-in for continuous testing

  • Upgrades current Eclipse JUnit integration:
    – Remember and display results from several test suites
    – Pluggable test prioritization and selection strategies
    – Remote test execution
    – Associate test suites with projects

SLIDE 57

Eclipse plug-in for continuous testing

  • Adds continuous testing:
    – Tests run with every compile
    – Can run as low-priority process
    – Can take advantage of hotswapping JVMs
    – Works with plug-in tests, too.
  • Demo!

SLIDE 58

Future Work: Continuous testing

  • Incorporate JUnit and continuous testing features from plug-in directly into Eclipse
  • Encourage test prioritization researchers to implement JUnit plug-ins
  • Industrial case studies

SLIDE 59

System Test

Figure: a system test, with interactions labelled Provided and Checked; three of them are marked for capture.

There’s more than one way to factor a test! Basic strategy:

  • Capture a subset of behavior beforehand.
  • Replay that behavior at test time.

SLIDE 60

Separate Sequential

Figure: a system test split into four sequential unit tests.

Separate Sequential:

  • Before each stage, recreate state
  • After each stage, confirm state is correct
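Separate Sequential can be sketched for a test whose stages transform an immutable value. Capture runs the original test once and snapshots the state between stages; each factored unit test then restores one snapshot, runs a single stage, and compares the result with the next snapshot. `SeparateSequential` is a hypothetical name, and the sketch assumes the state has value equality; real programs with mutable heap state would need deep copies or serialization to recreate and compare it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// A system test modelled as a sequence of stages transforming immutable state.
class SeparateSequential<S> {
    private final List<UnaryOperator<S>> stages;

    SeparateSequential(List<UnaryOperator<S>> stages) { this.stages = stages; }

    // Capture: run the original test once, recording the state before and after every stage.
    List<S> capture(S initial) {
        List<S> snapshots = new ArrayList<>();
        S state = initial;
        snapshots.add(state);
        for (UnaryOperator<S> stage : stages) {
            state = stage.apply(state);
            snapshots.add(state);
        }
        return snapshots;
    }

    // Factored unit test i: recreate the state before stage i, run only that
    // stage, and confirm the result matches what the original run produced.
    boolean stagePasses(int i, List<S> snapshots) {
        S after = stages.get(i).apply(snapshots.get(i));
        return after.equals(snapshots.get(i + 1));
    }
}
```

If only one stage’s code changes, only that stage’s factored test can fail, which points directly at the changed stage.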