Debugging the Performance of Maven’s Test Isolation: Experience Report

Pengyu Nie¹, Ahmet Celik², Matthew Coley³, Aleksandar Milicevic⁴, Jonathan Bell³, Milos Gligoric¹

¹The University of Texas at Austin, ²Facebook, Inc., ³George Mason University, ⁴Microsoft

ISSTA 2020

Need for Test Isolation

[Figure: the code under test is exercised by tests test0, test1, test2, …, testn, which may run in different orders, e.g., test0, test1, test2, … vs. test0, test2, test1, …]

Tests in industry are riddled with flakiness: flaky tests may pass or fail nondeterministically without any code changes

A common practice to combat flaky tests is to run them in isolation from each other, to eliminate test-order dependencies
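The kind of test-order dependency that isolation removes can be illustrated with a minimal Java sketch. Everything here is hypothetical (the class names, the test names, and plain methods standing in for JUnit tests): one test reads static state that another test mutates, so its outcome depends on the order in which the tests run inside a shared JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared state: a static field lives as long as its JVM does,
// so it leaks from one test into the next when tests share a process.
class SharedCounter {
    static int count = 0;
}

// Two hypothetical tests (plain methods, no JUnit) with a hidden order dependency.
class OrderDependentTests {
    // Passes only if no earlier test in this JVM has touched the counter.
    static boolean testCountStartsAtZero() {
        return SharedCounter.count == 0;
    }

    // Mutates the shared state; passes in any order.
    static boolean testIncrement() {
        SharedCounter.count++;
        return SharedCounter.count >= 1;
    }

    // Runs the tests in the given order inside the current JVM (no isolation)
    // and returns the names of the tests that failed.
    static List<String> runInOneJvm(List<String> order) {
        SharedCounter.count = 0; // start each simulated run from a clean state
        List<String> failures = new ArrayList<>();
        for (String name : order) {
            boolean passed;
            if (name.equals("testCountStartsAtZero")) {
                passed = testCountStartsAtZero();
            } else if (name.equals("testIncrement")) {
                passed = testIncrement();
            } else {
                throw new IllegalArgumentException("unknown test: " + name);
            }
            if (!passed) {
                failures.add(name);
            }
        }
        return failures;
    }
}
```

Running the order [testCountStartsAtZero, testIncrement] reports no failures, while the reversed order reports testCountStartsAtZero as failing, because the static field keeps its value between tests in the same JVM. Running each test in a fresh JVM would reset the field and make both orders pass.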

Test Isolation Introduces Substantial Overhead

Isolation spectrum (more isolation and higher cost at the top; less isolation and lower cost at the bottom):
- Absolute Isolation: run each test in a fresh VM
- Process-Level Isolation: run each test in its own process
- No Isolation: run all tests in one process

For Java: forking a separate JVM process for each test case
Process-level test isolation still introduces substantial overhead
Potential sources: startup cost, inter-process communication
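Process-level isolation for Java can be sketched with the standard java.lang.ProcessBuilder API, giving each test its own java invocation. This is a hedged illustration, not Surefire's implementation; the runner class passed in (for example a hypothetical example.SingleTestRunner whose main method runs the named test) is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of process-level isolation: one fresh JVM per test.
// The runner class and its "one test name per invocation" convention are hypothetical.
class PerTestForker {
    // Path to the java launcher of the JDK/JRE running this code.
    static String javaExecutable() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    // Builds the command line that runs a single test in its own JVM.
    static List<String> commandFor(String classpath, String runnerClass, String testName) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExecutable());
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(runnerClass);
        cmd.add(testName);
        return cmd;
    }

    // Runs the tests sequentially, each in a newly forked JVM, and collects exit codes.
    static List<Integer> runIsolated(String classpath, String runnerClass, List<String> tests)
            throws IOException, InterruptedException {
        List<Integer> exitCodes = new ArrayList<>();
        for (String test : tests) {
            Process child = new ProcessBuilder(commandFor(classpath, runnerClass, test))
                    .inheritIO()   // child shares the parent's stdin/stdout/stderr
                    .start();      // pays the JVM startup cost for every test
            exitCodes.add(child.waitFor());
        }
        return exitCodes;
    }
}
```

Each loop iteration pays the full JVM startup cost, and the parent learns only the child's exit code; richer reporting such as per-test results or streamed output requires additional inter-process communication.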

High Overhead of Test Isolation in Maven

We performed an exploratory study to measure the per-test overhead introduced by each build system

Test executed: Thread.sleep(250)
Overhead = actual time − 250 ms

| Build System | Overhead (ms) |
|---|---:|
| Ant 1.10.6 | 259 |
| Gradle 5.6.1 | 412 |
| Maven (Surefire 3.0.0-M3) | 596 |

Surprising findings:
- Overhead varies widely across build systems
- Maven's overhead is much larger than that of the others
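The overhead metric can be sketched in Java as follows. This is a hedged illustration: the class and helper names are hypothetical, and in the study the timed action would be the build system running the single Thread.sleep(250) test in a forked JVM rather than an in-process call.

```java
// Sketch of the per-test overhead metric: the test body is Thread.sleep(250),
// so any time beyond 250 ms is attributed to the build system.
class OverheadMeter {
    static final long TEST_BODY_MS = 250;

    // Overhead (ms) for one measured end-to-end test run.
    static long overheadMs(long actualMs) {
        return actualMs - TEST_BODY_MS;
    }

    // Wall-clock duration (ms) of an action, measured with a monotonic clock.
    static long timeMs(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // The test body used in the study, usable as a Runnable via OverheadMeter::sleepingTest.
    static void sleepingTest() {
        try {
            Thread.sleep(TEST_BODY_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
```

Read backwards, Maven's 596 ms overhead means each 250 ms test took about 846 ms end to end, versus about 509 ms with Ant.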

Contributions

- ForkScript: a novel technique that minimizes inter-process communication overhead in test isolation, reducing test execution time by 50%
- Guided by the development of ForkScript, we found and fixed a performance bug in Maven's test execution; our patch has been accepted and merged into Maven
- An evaluation of ForkScript and of Maven with our patch on 29 open-source projects totaling 2M LOC
- Implications and lessons learned

Maven’s Test Execution: Users’ View

Maven uses the Surefire plugin to manage test execution
mvn test executes tests with no isolation:

(a) mvn test (default behavior): t1, t2, t3, and t4 all run in a single JVM

Test isolation with -DreuseForks and -DforkCount:

(b) -DreuseForks=false -DforkCount=1: t1, t2, t3, and t4 each run in a fresh JVM, one after another
(c) -DreuseForks=false -DforkCount=2: t1, t2, t3, and t4 each run in a fresh JVM, with up to two JVMs running at a time
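The scheduling behavior of cases (b) and (c) can be simulated in Java with a fixed-size thread pool whose size plays the role of forkCount. This is a hedged sketch that only models scheduling: no real JVMs are forked, each "fresh JVM" is a short sleep, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simulates -DreuseForks=false -DforkCount=N: every test gets its own
// (simulated) fresh JVM, and at most N of them run at the same time.
class ForkCountSimulation {
    // Returns the peak number of simultaneously "running JVMs".
    static int peakConcurrency(int numTests, int forkCount, long testMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(forkCount);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < numTests; i++) {
            futures.add(pool.submit(() -> {
                int now = running.incrementAndGet();   // a fresh JVM "starts"
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(testMillis);          // stand-in for running one test
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();         // the JVM "exits" and is never reused
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get();                                   // wait for every test to finish
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }
}
```

With forkCount=1 the peak is always 1 (case b); with forkCount=2 it never exceeds 2 (case c).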

Maven’s Test Execution: Behind the Scenes

[Figure: inside the ForkStarter JVM, an Executor (1) serializes the config and (2) spawns a ForkBooter JVM that loads Surefire, JUnit, and application classes; the ForkBooter JVM (3) deserializes the config, and ForkStarter (4) waits for the results]

Two key classes: ForkStarter and ForkBooter
- ForkStarter creates an Executor (thread pool)
- For each test:
  - ForkStarter serializes the configuration to a file
  - ForkStarter spawns a child JVM with main class ForkBooter
  - ForkBooter deserializes the configuration from the file
  - ForkBooter executes the test with JUnit
  - ForkStarter waits for ForkBooter to send a “goodbye” signal when the test finishes
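The serialize/deserialize handoff in these steps can be sketched in Java with a temporary file. This is a hedged illustration: java.util.Properties is a stand-in chosen for brevity, Surefire's actual booter configuration format is not shown here, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch of steps (1) and (3): the parent writes the per-test configuration
// to a file and the child JVM reads it back at startup.
class ConfigHandoff {
    // Parent side (the ForkStarter role): serialize the configuration for one test.
    static Path serialize(Properties config) throws IOException {
        Path file = Files.createTempFile("fork-config", ".properties");
        try (Writer out = Files.newBufferedWriter(file)) {
            config.store(out, "per-test configuration");
        }
        return file; // the path would be passed to the child, e.g. as a command-line argument
    }

    // Child side (the ForkBooter role): deserialize the configuration.
    static Properties deserialize(Path file) throws IOException {
        Properties config = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            config.load(in);
        }
        return config;
    }
}
```

Every test repeats this file write, a process spawn, and a file read, on top of whatever result reporting flows back to the parent.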

Inter-process communication (IPC) is costly:

- Using a thread pool and executors to manage processes
- Exchanging configuration with new JVMs via (de)serialization
- Class loading of Surefire's classes
- “Pumping” input/output between the JVMs

ForkScript

ForkScript generates a single, specialized execution script on the fly for running all configured tests and collecting their results
- No IPC between the build system and the test processes
- Relies on the operating system's process management

The build system passes the tests and configuration to ForkScript, which emits the script below; each java command starts its own JVM.

On-the-fly script (simplified):

```bash
#!/bin/bash
java -cp "$classpath" forkscript.JUnitRunner t1 "$config"
java -cp "$classpath" forkscript.JUnitRunner t2 "$config"
java -cp "$classpath" forkscript.JUnitRunner t3 "$config"
java -cp "$classpath" forkscript.JUnitRunner t4 "$config"
```

ForkScript Script Examples

ForkScript supports running all tests in one JVM as well as test isolation, with both sequential and parallel testing

(a) mvn test (default behavior): one JVM runs t1, t2, t3, t4

```bash
#!/bin/bash
java -cp "$classpath" forkscript.JUnitRunner t1 t2 t3 t4 "$config"
```

(b) -DreuseForks=false -DforkCount=1: each test runs in a fresh JVM, one after another

```bash
#!/bin/bash
java -cp "$classpath" forkscript.JUnitRunner t1 "$config"
java -cp "$classpath" forkscript.JUnitRunner t2 "$config"
java -cp "$classpath" forkscript.JUnitRunner t3 "$config"
java -cp "$classpath" forkscript.JUnitRunner t4 "$config"
```

(c) -DreuseForks=false -DforkCount=2: each test runs in a fresh JVM, two JVMs at a time

```bash
#!/bin/bash
java -cp "$classpath" forkscript.JUnitRunner t1 "$config" &
java -cp "$classpath" forkscript.JUnitRunner t2 "$config" &
wait
java -cp "$classpath" forkscript.JUnitRunner t3 "$config" &
java -cp "$classpath" forkscript.JUnitRunner t4 "$config" &
wait
```
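Generating these three script shapes can be sketched in Java. This is a hypothetical generator, not ForkScript's code: it reuses the forkscript.JUnitRunner class name from the slide, assumes $classpath and $config are shell variables defined elsewhere, and mirrors the batch-then-wait structure of script (c).

```java
import java.util.List;

// Hypothetical sketch of emitting ForkScript-style bash scripts.
class ScriptGenerator {
    // One JVM invocation of the runner for the given tests.
    private static String javaLine(List<String> tests, boolean background) {
        return "java -cp \"$classpath\" forkscript.JUnitRunner " + String.join(" ", tests)
                + " \"$config\"" + (background ? " &" : "") + "\n";
    }

    static String generate(List<String> tests, boolean isolate, int forkCount) {
        if (forkCount < 1) {
            throw new IllegalArgumentException("forkCount must be >= 1");
        }
        StringBuilder script = new StringBuilder("#!/bin/bash\n");
        if (!isolate) {
            // (a) no isolation: all tests in a single JVM
            script.append(javaLine(tests, false));
        } else if (forkCount == 1) {
            // (b) sequential isolation: one fresh JVM per test, one after another
            for (String test : tests) {
                script.append(javaLine(List.of(test), false));
            }
        } else {
            // (c) parallel isolation: start forkCount JVMs in the background,
            // then `wait` for the whole batch before starting the next one
            for (int i = 0; i < tests.size(); i++) {
                script.append(javaLine(List.of(tests.get(i)), true));
                boolean batchFull = (i + 1) % forkCount == 0;
                boolean lastTest = i == tests.size() - 1;
                if (batchFull || lastTest) {
                    script.append("wait\n");
                }
            }
        }
        return script.toString();
    }
}
```

One design consequence of the batch-and-wait shape: a batch ends only when its slowest test finishes, whereas a thread-pool scheduler can start the next test as soon as any slot frees up.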

Performance Profiling Maven

ForkScript provides a bare-bones, stripped-down mechanism for test isolation, but it does not support all configuration options
We therefore also carefully profiled Maven to identify the source of the additional overhead

Performance Profiling Maven: Setup

[Figure: timeline of the main process and the forked test runner process with four events: (1) Start Test, (2) Forked Runner Starts (after fork()), (3) Forked Runner Exits (after Run Test), (4) Finished; the main process is waiting while the runner runs]

- T1: from when the build system begins running a test until the child process starts
- T2: from when the child process starts until the child process terminates
- T3: from when the child process terminates until the build system determines that the test has completed
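Splitting a run into these three intervals can be sketched in Java from four timestamps, one per event. This is a hedged sketch with a hypothetical class; it assumes the parent records events (1) and (4), the child records events (2) and (3), and all four are wall-clock milliseconds, because System.nanoTime() values are not comparable across JVMs.

```java
// Hypothetical helper: splits one per-test run into the three profiled intervals.
final class Phases {
    final long t1; // (1) Start Test -> (2) Forked Runner Starts
    final long t2; // (2) Forked Runner Starts -> (3) Forked Runner Exits
    final long t3; // (3) Forked Runner Exits -> (4) Finished

    private Phases(long t1, long t2, long t3) {
        this.t1 = t1;
        this.t2 = t2;
        this.t3 = t3;
    }

    // Timestamps in wall-clock milliseconds (e.g., from System.currentTimeMillis()).
    static Phases fromTimestamps(long startTest, long runnerStarts, long runnerExits, long finished) {
        if (startTest > runnerStarts || runnerStarts > runnerExits || runnerExits > finished) {
            throw new IllegalArgumentException("timestamps must be non-decreasing");
        }
        return new Phases(runnerStarts - startTest, runnerExits - runnerStarts, finished - runnerExits);
    }

    long total() {
        return t1 + t2 + t3;
    }
}
```

For example, events at 0, 244, 497, and 849 ms yield T1 = 244, T2 = 253, and T3 = 352 ms, for a total of 849 ms.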

Performance Profiling Maven: Findings

| Build System | T1 [ms] | T2 [ms] | T3 [ms] |
|---|---:|---:|---:|
| Ant 1.10.6 | 250 | 253 | 9 |
| Gradle 5.6.1 | 395 | 253 | 17 |
| Maven (Surefire 3.0.0-M3) | 244 | 253 | 352 |

Performance bug in Maven: the child process keeps reading from stdin, so it cannot be interrupted (terminated) immediately after the test finishes, which shows up as Maven's large T3

Performance Profiling Maven: Patch

To fix the performance bug, we iterated with Maven developers over several months
First, we prepared a large patch that removed all sources of the overhead, but it was hard for developers to review and integrate
Then, we prepared a small patch that changed only a few lines and was easy to inspect, but did not remove all sources of overhead
The small patch was merged into Maven Surefire 3.0.0-M5

| Build System | T1 [ms] | T2 [ms] | T3 [ms] |
|---|---:|---:|---:|
| Maven (Surefire 3.0.0-M3) | 244 | 253 | 352 |
| Maven (with our patch) | 217 | 252 | 17 |
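As a rough consistency check, assuming the three intervals together span one complete per-test run, summing them and subtracting the 250 ms test body gives an estimated per-test overhead before and after the patch:

```latex
\begin{aligned}
\text{Surefire 3.0.0-M3:}\quad & T_1 + T_2 + T_3 - 250 = 244 + 253 + 352 - 250 = 599\ \text{ms} \\
\text{With our patch:}\quad & T_1 + T_2 + T_3 - 250 = 217 + 252 + 17 - 250 = 236\ \text{ms}
\end{aligned}
```

The first estimate is within a few milliseconds of the 596 ms overhead measured for Maven in the exploratory study, and most of the 363 ms difference between the two rows comes from T3 (352 → 17 ms).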

Evaluation: Research Questions

- RQ1: What are the performance improvements obtained by ForkScript compared to unpatched Maven?
- RQ2: How does the improvement scale as the number of concurrent processes increases?
- RQ3: How does the patched Maven compare to ForkScript?

Evaluation: Subjects

29 projects used in recent testing literature that:
- use the Maven build system
- have a non-trivial number of tests
- have tests whose execution time is non-negligible
- build successfully at their latest revision

| | Total | Average per project |
|---|---:|---:|
| LOC | 2.12M | 73.0K |
| Test classes | 6.14K | 211 |
| Test methods | 209K | 7.22K |

Evaluation: Setup

For each project:
1. Clone the project
2. Execute mvn install to download all necessary dependencies, then switch to offline mode
3. Run the tests using each of ForkScript, unpatched Maven, and patched Maven, and measure the time
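The measurement step can be sketched in Java by timing a whole test command with ProcessBuilder and System.nanoTime(). This is an assumption-laden sketch: the study's actual harness is not described on the slide, the class name is hypothetical, and the Maven command in the usage note is only an example invocation (-o is Maven's offline flag).

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper: runs one command in a project directory and records its wall-clock time.
final class TimedRun {
    final int exitCode;
    final double seconds;

    private TimedRun(int exitCode, double seconds) {
        this.exitCode = exitCode;
        this.seconds = seconds;
    }

    static TimedRun run(List<String> command, File workingDir) throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process process = new ProcessBuilder(command)
                .directory(workingDir)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // drop output so pipes never fill up
                .start();
        int exitCode = process.waitFor();
        return new TimedRun(exitCode, (System.nanoTime() - start) / 1e9);
    }
}
```

A possible invocation, with a hypothetical project path: TimedRun.run(List.of("mvn", "-o", "test", "-DreuseForks=false", "-DforkCount=1"), new File("path/to/project")).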

Evaluation Results: Sequential Runs

RQ1: What are the performance improvements obtained by ForkScript compared to unpatched Maven?
Configuration: mvn test -DreuseForks=false -DforkCount=1
T_mvn: time with Maven; T_FS: time with ForkScript; RT: reduction in testing time

$$RT = \frac{T_{mvn} - T_{FS}}{T_{mvn}} \times 100\%$$

| | T_mvn [s] | T_FS [s] | RT |
|---|---:|---:|---:|
| Avg. | 154.66 | 80.74 | 50% |
| Σ | 4,485.16 | 2,341.60 | |

- ForkScript reduces testing time by 50% on average and by up to 75%
- Projects with smaller tests (lower time per test) benefit more
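The reduction metric translates directly into Java. One hedged observation: plugging the Avg. or Σ times into the RT formula gives about 47.8%, not 50%, so the reported 50% is presumably the mean of per-project RT values, which weights every project equally. The sketch (hypothetical class name) computes both.

```java
// RT = (T_mvn - T_FS) / T_mvn * 100%
class Reduction {
    // Reduction (in percent) for a single project or for aggregated times.
    static double rt(double tMvn, double tFs) {
        if (tMvn <= 0) {
            throw new IllegalArgumentException("T_mvn must be positive");
        }
        return (tMvn - tFs) / tMvn * 100.0;
    }

    // Mean of per-project reductions: every project counts equally,
    // regardless of how long its tests take.
    static double meanRt(double[] tMvn, double[] tFs) {
        if (tMvn.length == 0 || tMvn.length != tFs.length) {
            throw new IllegalArgumentException("need equally sized, non-empty arrays");
        }
        double sum = 0;
        for (int i = 0; i < tMvn.length; i++) {
            sum += rt(tMvn[i], tFs[i]);
        }
        return sum / tMvn.length;
    }
}
```

For two hypothetical projects going from 100 s to 50 s and from 10 s to 9 s, meanRt gives 30%, whereas rt on the totals (110 s to 59 s) gives about 46.4%.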

Evaluation Results: Parallel Runs

RQ2: How does the improvement scale as the number of concurrent processes increases?
Configuration: mvn test -DreuseForks=false -DforkCount=2
T_mvn, T_FS, and RT are defined as for RQ1

| | T_mvn [s] | T_FS [s] | RT |
|---|---:|---:|---:|
| Avg. | 72.02 | 49.88 | 32% |
| Σ | 2,088.81 | 1,446.70 | |

- ForkScript reduces testing time by 32% on average and by up to 63%
- The savings are smaller than in sequential runs because the total execution time approaches its theoretical limit (i.e., the time to execute the longest test)
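Why parallel savings shrink can be illustrated with a simple model: greedy list scheduling of test durations onto forkCount process slots, where each test starts on whichever slot frees up first. The durations are hypothetical and the model is a simplification, not the paper's analysis.

```java
import java.util.PriorityQueue;

// Simplified model of running tests on a fixed number of process slots.
class Makespan {
    // Total time when each test goes to the slot that becomes free first.
    static long greedy(long[] durations, int slots) {
        if (slots < 1) {
            throw new IllegalArgumentException("slots must be >= 1");
        }
        PriorityQueue<Long> finishTimes = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) {
            finishTimes.add(0L);
        }
        for (long d : durations) {
            long earliest = finishTimes.poll(); // slot that becomes free first
            finishTimes.add(earliest + d);      // run the next test there
        }
        long makespan = 0;
        for (long t : finishTimes) {
            makespan = Math.max(makespan, t);
        }
        return makespan;
    }

    // No schedule can beat the longest single test or the total work divided by the slots.
    static long lowerBound(long[] durations, int slots) {
        long longest = 0;
        long total = 0;
        for (long d : durations) {
            longest = Math.max(longest, d);
            total += d;
        }
        return Math.max(longest, (total + slots - 1) / slots);
    }
}
```

With durations {5, 1, 1, 1} and two slots, the makespan is 5, equal to the longest test. Adding a hypothetical 1-unit per-test overhead ({6, 2, 2, 2}) lengthens the sequential run from 8 to 12, so removing the overhead saves 33%, but lengthens the two-slot run only from 5 to 6, so removing it saves about 17%.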

Evaluation Results: Comparison with Patched Maven

RQ3: How does the patched Maven compare to ForkScript?
Configuration: mvn test -DreuseForks=false, with -DforkCount=1 (Fork 1) or -DforkCount=2 (Fork 2)
T_mvn: unpatched Maven; T_FS: ForkScript; T_new: patched Maven

| | Fork 1 T_mvn [s] | Fork 1 T_FS [s] | Fork 1 T_new [s] | Fork 2 T_mvn [s] | Fork 2 T_FS [s] | Fork 2 T_new [s] |
|---|---:|---:|---:|---:|---:|---:|
| Avg. | 154.66 | 80.74 | 95.88 | 72.02 | 49.88 | 55.45 |
| Σ | 4,485.16 | 2,341.60 | 2,780.54 | 2,088.81 | 1,446.70 | 1,608.06 |

- Patched Maven substantially outperforms the unpatched version
- ForkScript slightly outperforms patched Maven
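Applying the RT formula to the Σ rows quantifies "substantially" and "slightly". These are aggregate reductions in which each project is weighted by its testing time, so they need not match per-project averages:

```latex
\begin{aligned}
\text{patched vs. unpatched, Fork 1:}\quad & \frac{4485.16 - 2780.54}{4485.16} \times 100\% \approx 38.0\% \\
\text{patched vs. unpatched, Fork 2:}\quad & \frac{2088.81 - 1608.06}{2088.81} \times 100\% \approx 23.0\% \\
\text{ForkScript vs. patched, Fork 1:}\quad & \frac{2780.54 - 2341.60}{2780.54} \times 100\% \approx 15.8\% \\
\text{ForkScript vs. patched, Fork 2:}\quad & \frac{1608.06 - 1446.70}{1608.06} \times 100\% \approx 10.0\%
\end{aligned}
```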

Implications and Lessons Learned

Detect performance bugs through differential testing

Performance bugs are notoriously difficult to find, but when there are alternative systems that accomplish the same goal, differential testing can help to reveal them

Find simple fixes that can be integrated today

Our patch is already helping developers while the long-term fix (removing the use of stdin entirely) is still in progress

Researchers: engage in the open source community

Bug fixes, pull requests, Slack channel, etc.
Find a balance between research novelty and practical impact

Researchers: continue testing build systems
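Differential performance testing can be sketched as measuring the same workload on alternative systems and flagging outliers. In this hedged Java sketch the class name is hypothetical and the threshold factor is an arbitrary assumption; the sample values in the usage note are the per-test overheads from the exploratory study.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Flags systems whose measured overhead is far above the best alternative.
class DifferentialPerf {
    static Set<String> flag(Map<String, Double> overheadMs, double factor) {
        double best = Double.MAX_VALUE;
        for (double v : overheadMs.values()) {
            best = Math.min(best, v); // the best-performing system is the baseline
        }
        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, Double> e : overheadMs.entrySet()) {
            if (e.getValue() > factor * best) {
                flagged.add(e.getKey()); // suspiciously slow relative to the baseline
            }
        }
        return flagged;
    }
}
```

With overheads of 259 ms (Ant), 412 ms (Gradle), and 596 ms (Maven) and a factor of 2.0, only Maven is flagged, since 596 > 2 × 259 = 518; a flagged system is a lead for profiling, not proof of a bug.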

Conclusions

- Demystified why test isolation is costly
- Found an IPC-related performance bug in the Maven build system
- ForkScript: a research prototype that minimizes IPC to speed up test isolation
- Evaluation on 29 open-source projects totaling 2M LOC
- Our patch was accepted and merged into Maven, and it is already saving significant test execution time for many developers

Pengyu Nie, pynie@utexas.edu
