SLIDE 1

AUTOMATED UNIT TEST GENERATION DURING SOFTWARE DEVELOPMENT
A Controlled Experiment and Think-aloud Observations

José Miguel Rojas (j.rojas@sheffield.ac.uk)
Joint work with Gordon Fraser and Andrea Arcuri

ISSTA 2015

SLIDE 3

“Testing is a widespread validation approach in industry, but it is still largely ad hoc, expensive, and unpredictably effective.”

“Test case generation has a strong impact on the effectiveness and efficiency of testing.”

“…one of the most active research topics in software testing for several decades, resulting in many different approaches and tools.”

  • A. Bertolino, “Software Testing Research: Achievements, Challenges, Dreams,” Future of Software Engineering, IEEE, 2007.
  • S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp, M. Harman, M. J. Harrold, P. McMinn, “An orchestrated survey of methodologies for automated software test case generation,” J. Systems and Software, Elsevier, 2013.

SLIDE 6

BACK IN ISSTA 2013…

“Does automated white-box test generation really help software testers?” G. Fraser, M. Staats, P. McMinn, A. Arcuri and F. Padberg

ARE UNIT TEST GENERATION TOOLS HELPFUL TO DEVELOPERS WHILE THEY ARE CODING?

SLIDE 10

  • CODE COVERAGE
  • TIME SPENT ON TESTING
  • IMPLEMENTATION QUALITY

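SLIDE 21

CONTROLLED EXPERIMENT

[Diagram: starting from a Class Template, participants produce an Implementation and Test Suite in 1 hour (one arm labelled Manual), evaluated against the Golden Implementation and Test Suite. Figures shown: 41, 4, 2.]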

SLIDE 22

RQ 1: DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO TEST SUITES WITH HIGHER CODE COVERAGE?

SLIDE 23

CODE COVERAGE

[Bar chart: branch coverage (0–100%) per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap), Assisted vs. Manual; participants’ test suites run on their own implementations. Bar labels in slide order: 50%, 26%, 57%, 39%, 41%, 83%, 38%, 63%.]
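Branch coverage, the metric plotted on this slide, is the share of branch outcomes that a test suite exercises. The following is a minimal Python sketch of that calculation, assuming a hypothetical record of which branch outcomes were hit; the names `branch_coverage`, `all_branches` and `covered` are illustrative and are not part of EvoSuite or of the study's tooling.

```python
def branch_coverage(all_branches, covered):
    """Return branch coverage as a percentage.

    all_branches: set of branch-outcome identifiers in the class under test.
    covered: set of branch-outcome identifiers hit by the test suite.
    """
    if not all_branches:
        # Convention assumed for this sketch: nothing to cover counts as fully covered.
        return 100.0
    hit = all_branches & covered  # ignore identifiers that do not belong to the class
    return 100.0 * len(hit) / len(all_branches)


# Hypothetical class with two `if` statements, i.e. four branch outcomes.
branches = {"if1:true", "if1:false", "if2:true", "if2:false"}
executed = {"if1:true", "if1:false", "if2:true"}

coverage = branch_coverage(branches, executed)
print(f"{coverage:.0f}%")  # prints "75%"
```

The same test suite can score differently on a participant's own implementation and on the golden implementation, because the set of branches differs between the two.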

SLIDE 24

CODE COVERAGE

[Bar chart: times coverage was checked (axis 0–10) per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap), Assisted vs. Manual. Bar labels in slide order: 6.4, 4, 9.6, 6, 5.3, 5.9, 1.9, 9.]

SLIDE 28

CODE COVERAGE

[Line chart, ListPopulation: branch coverage (0–100%) over time (0–60 min) for Manual, EvoSuite and Assisted; participants’ test suites run on their own implementations.]

SLIDE 29

CODE COVERAGE

[Bar chart: branch coverage (0–100%) per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap), Assisted vs. Manual; participants’ test suites run on golden implementations. Bar labels in slide order: 50%, 28%, 21%, 30%, 42%, 37%, 35%, 41%.]

SLIDE 31

CODE COVERAGE

[Line charts, one panel per class (FilterIterator, ListPopulation, FixedOrderComparator, PredicatedMap): branch coverage (0–100%) over time (0–60 min) for Assisted, Manual and EvoSuite-generated test suites; participants’ test suites run on golden implementations.]

Coverage can be higher when using EvoSuite, depending on how the generated tests are used.

SLIDE 32

RQ 2: DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO DEVELOPERS SPENDING MORE OR LESS TIME ON TESTING?

SLIDE 33

TESTING EFFORT

[Bar chart: number of test runs (axis 0–14) per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap), Assisted vs. Manual. Bar labels in slide order: 5.6, 11, 12.9, 13.7, 4.4, 8.2, 7.2, 7.8.]

SLIDE 35

TESTING EFFORT

[Bar chart: minutes spent on testing (axis 0–26) per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap), Assisted vs. Manual. Bar labels in slide order: 14.3, 25, 15.8, 20, 7.7, 12.6, 9.3, 18.5.]

Using EvoSuite reduces the time spent on testing.
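One way to quantify a group difference such as this one is a non-parametric effect size. The sketch below computes the Vargha-Delaney A12 statistic on hypothetical per-participant minutes. The slides do not say which statistic the study used, so both the choice of A12 and the sample values are assumptions made for illustration only.

```python
def a12(xs, ys):
    """Vargha-Delaney A12 effect size.

    Estimates the probability that a value drawn from xs is larger than a
    value drawn from ys, counting ties as half. 0.5 means no difference.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    equal = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))


# Hypothetical minutes spent on testing per participant (not the study's data).
assisted_minutes = [8.0, 10.5, 7.0, 12.0]
manual_minutes = [15.0, 20.0, 12.0, 25.0]

effect = a12(assisted_minutes, manual_minutes)
print(effect)  # prints 0.03125
```

A value well below 0.5 here would mean that assisted participants usually spent less time on testing than manual ones.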

SLIDE 36

RQ 3: DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO SOFTWARE WITH FEWER BUGS?

SLIDE 38

IMPLEMENTATION QUALITY

[Bar chart: number of failures plus errors (axis 0–16) per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap), Assisted vs. Manual; golden test suites run on participants’ implementations. Bar labels in slide order: 14.3, 5.3, 4.3, 6.3, 15.6, 6.4, 4.2, 6.1.]

Using EvoSuite during development did not lead to better implementations.
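The quality measure on this slide, failures plus errors of a golden test suite run against a participant's implementation, can be illustrated with Python's `unittest`. This is only a sketch: the study's subjects were Java classes, and the `ParticipantStack` class, its seeded bug and the three golden tests are all hypothetical.

```python
import unittest


class ParticipantStack:
    """Hypothetical participant implementation with a seeded bug in pop()."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items[0]  # bug: returns the oldest item and never removes it

    def size(self):
        return len(self._items)


def golden_suite_for(stack_class):
    """Bind the (hypothetical) golden test suite to one implementation."""

    class GoldenStackTest(unittest.TestCase):
        def test_push_increases_size(self):
            stack = stack_class()
            stack.push(1)
            self.assertEqual(stack.size(), 1)  # passes

        def test_pop_returns_last_pushed(self):
            stack = stack_class()
            stack.push(1)
            stack.push(2)
            self.assertEqual(stack.pop(), 2)  # failure: buggy pop() returns 1

        def test_peek_returns_top(self):
            stack = stack_class()
            stack.push(1)
            self.assertEqual(stack.peek(), 1)  # error: peek() was never implemented

    return GoldenStackTest


def failures_plus_errors(test_case_class):
    """Run a TestCase class and return the number of failures plus errors."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    suite.run(result)
    return len(result.failures) + len(result.errors)


score = failures_plus_errors(golden_suite_for(ParticipantStack))
print(score)  # prints 2: one assertion failure plus one AttributeError
```

Lower scores indicate an implementation closer to the reference behaviour, which is how the bars on this slide should be read.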

SLIDE 39

RQ 4: DOES SPENDING MORE TIME WITH EVOSUITE AND ITS TESTS LEAD TO BETTER IMPLEMENTATIONS?

SLIDE 41

PRODUCTIVITY

[Bar chart: correlation of time spent with EvoSuite (series: Number of runs, Time spent on tests) with the number of failures plus errors, per class (FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap); axis from −0.50 to 0.40. Bar labels in slide order: −0.49, 0.02, −0.22, −0.29, 0.35, 0.32, −0.03, 0.35.]

Implementation quality improves the more time developers spend with EvoSuite-generated tests.
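A negative correlation on this chart means that more time with the generated tests goes along with fewer failures plus errors. The sketch below shows how such a coefficient can be computed. It assumes Pearson's r, since the slide does not name the coefficient the study used, and the per-participant values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical participants: minutes spent on generated tests, and the
# failures plus errors of the golden suite on their implementation.
minutes_on_generated_tests = [2, 5, 8, 12, 15]
golden_failures_plus_errors = [9, 7, 6, 4, 2]

r = pearson(minutes_on_generated_tests, golden_failures_plus_errors)
print(round(r, 2))
```

With these made-up values r is strongly negative. The coefficients on the slide are much weaker, between −0.49 and 0.35.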

SLIDE 44

Using automated unit test generation does impact developers’ productivity, but…

…how to make the most out of unit test generation tools?

SLIDE 47

THINK ALOUD OBSERVATIONS

[Diagram: Observer and Subject]

  • J. Hughes and S. Parkes, “Trends in the use of verbal protocol analysis in software engineering research,” Behaviour and Information Technology, vol. 22, no. 2, pp. 127–140, 2003.
  • K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data (revised edition). MIT Press, 1993.

SLIDE 57

THINK ALOUD OBSERVATIONS

[Diagram: starting from a Class Template, participants produce an Implementation and Test Suite in 2 hours, evaluated against the Golden Implementation and Test Suite. Figures shown: 5, 4, 1.]

SLIDE 58

RESULTS

Images: http://jozef89.deviantart.com/

  • FixedOrderComparator: 94% | 92%
  • ListPopulation: 97% | 72%
  • FilterIterator: 94% | 95%
  • PredicatedMap: 100% | 94%
  • PredicatedMap: 100% | 83%

Other figures shown: 3 6 2 2 5 15 20 23 7 10

SLIDE 66

LESSONS LEARNED

  • There are different approaches to testing, and test generation tools should be adaptable to them
  • Developers’ behaviour is often not driven by code coverage
  • Readability of generated unit tests is paramount
  • Integration into development environments must be improved
  • Education/Best practices: Developers do not know how to best use automated test generation tools!

SLIDE 73

“Coverage is easy to assess because it is a number, while readability is a very non-tangible property… What is readable to me may not be readable to you. It is readable to me just because I spent the last hour and a half doing this.”

—Participant 5

SLIDE 75

www.evosuite.org/study-2014/

j.rojas@sheffield.ac.uk