SLIDE 1

Optimizing unit test execution in large software programs using dependency analysis

Taesoo Kim, Ramesh Chandra and Nickolai Zeldovich

MIT CSAIL

SLIDE 2

Running unit tests takes too long

It’s our policy to make sure all tests pass at all times.

  • Large software programs often require running the full unit tests for each commit

  • But, unit tests take about 10 min in Django
  • With our work, it can be done within 2 sec!
SLIDE 3

Current approaches for shortening testing time

  • Modular unit tests (e.g., testsuite)
    – Run a certain set of unit tests that might be affected
  • Test bot (e.g., gtest, autotest)
    – Run unit tests remotely and get the results back

SLIDE 4

Problem: current approaches are very limited

  • Manual effort is involved
    – Maintaining multiple test suites
  • Overall testing still takes too long
    – Waiting for the test bot to complete full unit testing

SLIDE 5

Research: regression test selection (RTS)

  • Goal: run only the necessary tests instead of the full test suite
    – Identify test cases whose results might change due to the current code modification
    – Step 1: analyze test cases (e.g., execution traces)
    – Step 2: syntactically analyze code changes
    – Step 3: output the affected test cases (sketched below)

[Diagram: code changes and test cases go into the RTS box, which outputs the affected test cases]
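Once steps 1 and 2 are done, the selection step reduces to a set intersection. A minimal sketch in Python, with hypothetical names (dep_info, changed) rather than any specific RTS tool's API:

    def select_affected_tests(dep_info, changed):
        """dep_info: map from test case name to the set of code
        entities it was observed to depend on (step 1).
        changed: set of entities modified by the commit (step 2).
        Returns the test cases whose results might change (step 3)."""
        return {test for test, deps in dep_info.items() if deps & changed}

    # Example: only test_b depends on an entity that changed.
    dep_info = {"test_a": {"foo"}, "test_b": {"foo", "bar"}}
    print(select_affected_tests(dep_info, {"bar"}))   # {'test_b'}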

SLIDE 6

Problem: RTS techniques are never adopted in practice

  • The “soundness” of RTS techniques kills adoption
    – Soundness means no false negatives
    – Sound techniques impose non-negligible perf. overheads (analysis/runtime)
    – They select lots of test cases (particularly in dynamic languages)
    – e.g., a change to a global variable → rerun all test cases

SLIDE 7

Goal: make RTS practical

  • Idea 1: trade off soundness for performance
    – Keep track of function-level dependencies / changes
    – Fewer tests selected, but possibly with false negatives
  • Idea 2: integrate test optimization into the dev. cycle
    – Maintain dependency information in the code repository

SLIDE 8

Current development cycle

[Diagram: the programmer's computer checks out code (<HEAD>) from the repository server into a local repo. and source tree; the development cycle then loops: make changes → run unit tests → get test results.]

SLIDE 11

New development cycle

[Diagram: as before, the programmer checks out code and makes changes; now a diff of the changes, together with per-test-case information, feeds a dependency-analysis step that outputs the affected test cases; unit testing runs only those, producing the test results.]

SLIDE 13

Identifying the test cases affected by a code modification

  • Plan: track which tests execute which functions
    – Step 1: generate function-level dependency info. (see the sketch below)
      • Map: test case ↔ invoked functions
      • Construct the map by running all unit tests once
    – Step 2: identify the modified functions, given the code changes
    – Step 3: identify the tests that ran the modified functions
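A minimal sketch of step 1, built on Python's sys.settrace() (the same interface TAO's prototype uses, per the implementation slide); trace_test and the key format are illustrative, not TAO's actual code:

    import sys

    def trace_test(test_func):
        """Run one test case and record every function invoked
        while it executes, keyed as 'filename:function'."""
        invoked = set()
        def tracer(frame, event, arg):
            if event == "call":                 # a new function frame was entered
                code = frame.f_code
                invoked.add("%s:%s" % (code.co_filename, code.co_name))
            return tracer
        sys.settrace(tracer)
        try:
            test_func()
        finally:
            sys.settrace(None)                  # always stop tracing
        return invoked

    # Bootstrap: run the full suite once to build the map
    # (test case <-> invoked functions); all_test_cases is hypothetical.
    # dep_info = {t.id(): trace_test(t) for t in all_test_cases}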


SLIDE 15

Bootstrapping dependency info.

[Diagram: the dependency info. is bootstrapped by running the full unit tests once; it is stored on a dependency server and checked out (<HEAD>) alongside the code from the repository server.]

SLIDE 17

Update dependency information

[Diagram: for each new commit (e.g., <0xac0ffee>), running the selected unit tests also produces incremental dep. info, which is pushed to the dependency server so it tracks the repository's <HEAD>.]

SLIDE 19

Problem: false negatives

  • Function-level tracking can miss some dependencies and cause false negatives
    – i.e., it fails to identify some test cases that are actually affected
  • We identified five types of missing dependencies:
    – Inter-class dependency
    – Non-determinism
    – Class variable
    – Global scope
    – Lexical dependency


SLIDE 21

Example: inter-class dep. in Python

class A:
    def foo(self):
        return 1

class B(A):
    pass

def testcase():
    assertEqual(B().foo(), 1)

Dependency info:
    testcase() → B.__init__(), A.foo()

Code change:

 class B(A):
-    pass
+    def foo(self):
+        return 2

Modified functions: B.foo()
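The miss is easy to reproduce: while the test runs, the executing frame belongs to A.foo's code object (B inherits it), so the dependency map never mentions B.foo. A small illustration of the lookup, separate from TAO's actual bookkeeping:

    class A:
        def foo(self):
            return 1

    class B(A):
        pass

    # B has no foo of its own; attribute lookup falls through to A,
    # so a call tracer records A.foo, never "B.foo".
    assert B().foo.__func__ is A.foo

    # After the commit adds B.foo(), the modified function is "B.foo",
    # which matches nothing in testcase()'s recorded dependencies,
    # so testcase() is (wrongly) not selected: a false negative.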

SLIDE 24

Example: missing dep. because of non-determinism in Python

def foo():
-    return 1
+    return 2

def testcase():
    if rand() % 2:
        assertEqual(foo(), 1)

Dependency info (varies across runs):
    testcase() → rand(), foo()
    testcase() → rand()

Modified functions: foo()

If the bootstrap run happened to skip foo(), the recorded dependency is only rand(), so the change to foo() does not select testcase().
SLIDE 26

Example: class-var. dep. in Python

class C:
-    a = 1
+    a = 2

def foo():
    return C.a

def testcase():
    assertEqual(foo(), 1)

Dependency info:
    testcase() → foo()

Modified functions: N/A
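Here the AST diff reports no modified function at all: "a = 2" is an assignment at class scope, not a function body. One possible mitigation, sketched below and not part of TAO as presented, is to also diff class-level assignments so a changed class variable can be flagged like a modified function:

    import ast

    def class_assignments(source):
        """Map (class name, attribute) to the AST dump of the assigned
        value, for assignments made at class scope."""
        out = {}
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ClassDef):
                for stmt in node.body:
                    if isinstance(stmt, ast.Assign):
                        for target in stmt.targets:
                            if isinstance(target, ast.Name):
                                out[(node.name, target.id)] = ast.dump(stmt.value)
        return out

    old = class_assignments("class C:\n    a = 1\n")
    new = class_assignments("class C:\n    a = 2\n")
    print({k for k in old if old[k] != new.get(k)})   # {('C', 'a')}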

SLIDE 27

Solution: a test server runs all tests asynchronously

[Diagram: a test server also checks out each change and runs the full unit tests asynchronously, off the programmer's critical path, generating incremental dep. info for the dependency server.]

SLIDE 28

Test server also verifies dep. info

[Diagram: in addition to full unit testing, the test server runs unit testing on each change to verify the incremental dep. info before it reaches the dependency server.]
SLIDE 29

TAO: a prototype for PyUnit

[Diagram: the complete TAO architecture: repository server, dependency server, and test server, integrated with the programmer's development cycle of changes → dependency analysis → unit testing → test results.]

SLIDE 30

Implementation

  • TAO: a prototype for PyUnit
    – Extends the standard python-unittest library
    – Patch analysis: using the ast/diff Python modules (see the sketch below)
    – Dependency tracking: using the settrace() interface
    – 800 lines of code in Python
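A minimal sketch of the patch-analysis step, assuming the before/after file contents are available; the helpers are illustrative rather than TAO's actual implementation:

    import ast

    def function_dumps(source):
        """Map a dotted function name ('Class.method' or 'func')
        to the AST dump of its definition."""
        out = {}
        def visit(node, prefix):
            for child in ast.iter_child_nodes(node):
                if isinstance(child, ast.FunctionDef):
                    out[prefix + child.name] = ast.dump(child)
                    visit(child, prefix + child.name + ".")
                elif isinstance(child, ast.ClassDef):
                    visit(child, prefix + child.name + ".")
                else:
                    visit(child, prefix)
        visit(ast.parse(source), "")
        return out

    def modified_functions(old_src, new_src):
        """Step 2: the functions whose definitions differ between
        the old and new versions of a file."""
        old, new = function_dumps(old_src), function_dumps(new_src)
        return {f for f in old.keys() | new.keys() if old.get(f) != new.get(f)}

    print(modified_functions("def f():\n    return 1\n",
                             "def f():\n    return 2\n"))   # {'f'}

Comparing ast.dump() output (which omits line/column attributes) makes the check insensitive to pure movement of code, so whitespace-only or reordering commits select no tests.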

SLIDE 31

Evaluation

  • How many functions are modified in each commit in large software programs?
  • How much testing time can be saved as a result?
  • How many false negatives does TAO incur?
  • What is the overall runtime overhead of TAO?
SLIDE 32

Experiment setup

  • Two popular projects: Django and Twisted
    – Django: a web application framework
    – Twisted: a network protocol engine
    – Use the existing unit tests of both projects
    – Integrate TAO into both projects
    – Analyze the latest 100 commits of each project

SLIDE 33

A small number of functions are modified in each commit

  • Django: 50.8 / 13k functions (0.3%) per commit on average
  • Twisted: 18.2 / 23k functions (0.07%) per commit on average

[Plot: modified functions per commit, Django and Twisted, recent 100 commits]


SLIDE 37

A small number of test cases need to be rerun

  • Django: 50.4 / 5k test cases (1.0%) per commit on average
  • Twisted: 28.7 / 7k test cases (0.4%) per commit on average

[Plot: affected test cases per commit, Django and Twisted, recent 100 commits]


SLIDE 39

Trend 1: #affected test cases is correlated with #modified functions

[Plot: Django, recent 100 commits]

SLIDE 40

Trend 2: many modified functions, few affected test cases

  • Refactoring (maintenance) commits: e.g., unittest2()

[Plot: Django, recent 100 commits]

SLIDE 42

Trend 3: few modified functions, many affected test cases

  • Changes in “hot” funcs: e.g., WSGIRequest()

[Plot: Django, recent 100 commits]

SLIDE 44

TAO can improve the overall execution time for unit testing

Project   #Test cases (All / TAO)   Execution time (All / TAO)
Django    5,166 / 50.8              520.3s / 1.7s
Twisted   7,150 / 28.7              72.1s / 2.2s

  • Django: 520.3s → 1.7s (5k → 50.8 test cases)
  • Twisted: 72.1s → 2.2s (7k → 28.7 test cases)

SLIDE 45

TAO has few false negatives (FN)

Project   FN/I (inter-class)   FN/N (non-det.)   FN/G (global scope)   FN/C (class var.)   FN/L (lexical dep.)
Django    0/0                  0/0               2/8                   1/3                 1/23
Twisted   1/2                  0/0               1/20                  1/17                0/11

  • We manually identified the types of missing dependencies and the false negatives in each commit
  • Django: 3 false negatives (one commit is counted in both G and L)
  • Twisted: 3 false negatives
  • Reading FN/C, for example: among the class-variable deps we identified, how many actually end up causing false negatives?

SLIDE 47

Example: not all missing deps cause false negatives

class DecimalField(IntegerField):
    default_error_messages = {
        ...
-       'max_digits': _(msg),
+       'max_digits': ungettext_lazy(msg),
        ...
    }
    def __init__(...):
        ...
-       raise ValidationError(oldmsg)
+       raise ValidationError(newmsg)

The change to the class variable (default_error_messages) is a missing dep., but the same commit also modifies __init__(), which the function-level dependency does track, so the affected tests are still selected.

SLIDE 48

Dependency tracking imposes performance overheads

Project   Runtime (no TAO / TAO)   Storage (full / incremental)
Django    520.3s / 1,129.1s        9.9MB / 270KB
Twisted   72.1s / 115.6s           1.3MB / 280KB

  • Django: ~10 min (+117%) to generate the dep. info (9.9MB)
  • Twisted: <1 min (+60%) to generate the dep. info (1.3MB)
  • Performance could be improved by implementing function-level tracing natively, instead of using the settrace() interface.

SLIDE 49

Incremental dependency information is small

  • Django: 270KB of incremental dep. info per commit
  • Twisted: 280KB of incremental dep. info per commit
SLIDE 50

Related work

  • Regression test selection:
    – RTS survey [Biswas '11]: overview of available RTS techniques
    → Simple function-level dependency is effective in practice
    → TAO can be integrated into the programmer's workflow
  • Dependency tracking:
    – Poirot [Kim '12]: intrusion recovery
    – TaintDroid [Enck '12]: privacy monitoring
    → Dependency tracking can optimize unit test execution

SLIDE 51

Summary

TAO: a system that optimizes unit test execution using dependency analysis

  – Tracks the function-level dependencies of each unit test
  – Analyzes code changes to find the affected test cases
  – Runs only the affected test cases (with few false negatives)
  – Integrated into the programmer's development cycle