SLIDE 1

Unit Testing Tool Competition Round Four

Urko Rueda, René Just, Juan P. Galeotti, Tanja E. J. Vos

The 9th International Workshop on Search-Based Software Testing

SLIDE 2

Contents

1. About the Tool Competition
2. The Tools
3. The Methodology
4. The Results
5. Lessons learned


SLIDE 3

About the Tool Competition

Year | Edition | Coverage / Mutation metrics | CUTs / Projects / Tools | Tools (SBST & non-SBST)
2012 | Round One (ICST'13; FITTEST: crest.cs.ucl.ac.uk/fittest) | Cobertura / Javalanche | 77 / 5 / 2 | Manual & Randoop (baselines)
2013 | Round Two (FITTEST'13) | JaCoCo / PITest | 63 / 9 / 4 | 1st edition tools + T3 & Evosuite
2014 | Round Three (SBST'15) | JaCoCo / PITest | 63 / 9 / 8 | 2nd edition tools + Commercial & GRT & jTexPert & Mosa(Evosuite)
2015 | Round Four (SBST'16) | Defects4J (github.com/rjust/defects4j) + real fault finding metric | 68 / 5 / 4 | Randoop (baseline) & T3 & Evosuite & jTexPert

Benchmarked Java unit testing at the class level.


SLIDE 4

About the Tool Competition

§ Why?
  § Towards testing field maturity – this is just Java …
  § Insight into tool improvements and future developments
§ What is new in the 4th edition?
  § Benchmark infrastructure – split into:
    § Test generation
    § Test execution & test assessment (Defects4J)
  § Benchmark subjects (from the Defects4J dataset)
  § Time budgets (1, 2, 4 & 8 minutes)
  § Flaky tests (non-compilable, non-reliable passes)


SLIDE 5

The Tools

Tool | Technique | Static analysis | 2012 | 2013 | 2014 | 2015
Randoop (baseline) | Random | ✗ | ✓ | ✓ | ✓ | ✓
T3 | Random | ✗ | ✗ | ✓ | ✓ | ✓
jTexPert | Random (guided) | ✓ | ✗ | ✗ | ✓ | ✓
Evosuite | Evolutionary algorithm | ✓ | ✗ | ✓ | ✓ | ✓

§ SBST and non-SBST tools
§ Command-line tools
§ Fully automated – no human intervention


SLIDE 6

The Methodology

§ Tool deployment
  § Installation – Linux environment
  § Wrapper implementation – runtool script
    § Std. IN/OUT communication protocol
    § 4th edition has a time budget
  § Tune-up cycle – setup, run, resolve issues
§ Benchmark infrastructure
  § Defects4J integration
  § Decoupling test generation from test execution/assessment
§ Tool – run over non-contest benchmark samples


SLIDE 7

The Methodology

(Sequence diagram: runtool for tool T vs. the benchmark framework)

Preparation:
1. The framework sends "BENCHMARK", the Src Path / Bin Path / ClassPath, and the ClassPath for JUnit compilation.
2. The tool answers "READY".

Loop, per CUT, under the time budget:
3. The framework sends the name of the CUT.
4. The tool generates a test file in ./temp/testcases and answers "READY".
5. The framework compiles, executes, and measures the test case (see the wrapper sketch below).


SLIDE 8

The Methodology

§ Benchmark infrastructure
  § Two HP Z820 workstations – each:
    § 2 CPU sockets, for a total of 20 cores
    § 256 GB RAM
  § 32 virtual machines (16 per workstation)
§ Test generation
  § 1 core – controls tool multi-threading capability
  § 8 GB RAM
§ Test execution/assessment (tool independent)
  § 2 cores
  § 16 GB RAM – resolves out-of-memory issues


SLIDE 9

The Methodology

(Diagram: benchmark setup replicated across 32 VMs)

§ The benchmark tool drives T3, jTexpert, EvoSuite and Randoop through runtool, replicated over the 32 VMs
§ 80 CUTs; RUNs 1, 2, 3 on one HP Z820 (16 VMs), RUNs 4, 5, 6 on the other
§ Per workstation: 20-core CPU, 256 GB RAM
§ Per VM: 1-core CPU / 8 GB RAM for test generation; 2-core CPU / 16 GB RAM for metrics collection
§ Time budgets: 1, 2, 4 & 8 minutes
§ Generated test cases and collected metrics feed an aggregator that calculates the score

SLIDE 10

The Methodology

(Diagram: test generation and assessment flow)

1. The benchmark tool drives each generator (Randoop, T3, EvoSuite, jTexpert) through runtool with a time budget (1, 2, 4, 8 min) against the fixed CUT version.
2. Generated test classes that do not compile are discarded.
3. Compilable test classes are run to detect and remove flaky tests.
4. The remaining, flaky-free test classes are run against the CUT with 1 real fault and the mutated CUT to collect metrics.
5. The score is calculated.


SLIDE 11

The Methodology

§ Flaky tests
  § Pass during generation
  § But might fail during execution/assessment
  § False-positive warnings:
    § Non-reliable fault detection
    § Non-reliable mutation analysis
§ Defects4J flaky-test sanity checks remove:
  § Non-compiling test classes
  § Tests failing over 5 executions on the fixed CUT versions (see the sketch below)


SLIDE 12

The Methodology

§ The Metrics – test effectiveness
  § Code coverage (fixed benchmark versions)
    § Defects4J <- Cobertura
    § Statement coverage
    § Condition coverage
  § Mutation score
    § Defects4J <- Major framework (all mutation operators)
  § Real fault detection (buggy benchmark versions)
    § 1 real fault per benchmark
    § Score of 0 or 1, independent of how many tests reveal it (formalized below)
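One way to write the 0-or-1 real-fault metric, assuming the standard Defects4J criterion that a detecting test passes on the fixed version and fails on the faulty one:

\[
\mathit{faultFound}(T,L,C,r) =
\begin{cases}
1 & \text{if some generated test passes on the fixed } C \text{ and fails on the faulty } C \\
0 & \text{otherwise}
\end{cases}
\]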


SLIDE 13

The Methodology

§ The Scoring formula

covScore(T,L,C,r) := wi · covi + wb · covb + wm · covm + (real fault found ? wf : 0)

T = tool; L = time budget; C = CUT; r = run (1..6)
Coverages: covi = statement coverage; covb = condition coverage; covm = mutant kill ratio
Weights: wi = 1; wb = 2; wm = 4; wf = 4
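The same formula as a minimal Java sketch; the weights and terms are from the slide, the method and parameter names are assumptions:

```java
// Per-run coverage score: weighted coverage plus a bonus for finding the real fault.
static double covScore(double covI, double covB, double covM, boolean faultFound) {
    final double wI = 1, wB = 2, wM = 4, wF = 4;  // contest weights from the slide
    return wI * covI                              // statement coverage
         + wB * covB                              // condition coverage
         + wM * covM                              // mutant kill ratio
         + (faultFound ? wF : 0);                 // real fault bonus
}
```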


SLIDE 14

The Methodology

§ The Scoring formula – time penalty
  § Test generation slot: L .. 2 · L
  § No penalty if genTime <= L
  § Penalty for the extra time taken (genTime – L)
  § Half covScore if the tool must be killed (> 2 · L) – see the sketch below
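Only the endpoints are fixed by the slide (no penalty up to L, half score past 2 · L); a sketch that, purely as an assumption, interpolates linearly in between:

```java
// Time-penalized score; the linear decay over (L, 2L] is an assumed shape,
// only the two endpoints come from the slide.
static double timePenalized(double covScore, double genTime, double L) {
    if (genTime <= L) {
        return covScore;                          // within budget: no penalty
    }
    if (genTime > 2 * L) {
        return covScore / 2.0;                    // tool had to be killed: half score
    }
    double extra = genTime - L;                   // extra time taken, in (0, L]
    return covScore * (1.0 - 0.5 * extra / L);    // assumed linear decay towards half score
}
```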


SLIDE 15

The Methodology

§ The Scoring formula – tests penalty (illustration below)

#Classes = generated test classes; #uClasses = uncompilable test classes
#Tests = generated test cases; #fTests = flaky tests
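The penalty formula itself did not survive extraction; purely as an illustrative assumption (not the contest's actual definition), a ratio-based penalty over these four quantities could look like:

\[
\mathrm{penalty}(T,L,C,r) = \frac{\#\mathrm{uClasses}}{\#\mathrm{Classes}} + \frac{\#\mathrm{fTests}}{\#\mathrm{Tests}}
\]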


SLIDE 16

The Methodology

§ The Scoring formula – tool score

Score(T,L,C,r) := tScore(T,L,C,r) – penalty(T,L,C,r)
Score(T,L,C) := avg(Score(T,L,C,r)) over all r executions
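Written out in LaTeX, with the average expanded over the six runs of slide 13:

\[
\mathrm{Score}(T,L,C,r) = \mathrm{tScore}(T,L,C,r) - \mathrm{penalty}(T,L,C,r),
\qquad
\mathrm{Score}(T,L,C) = \frac{1}{6}\sum_{r=1}^{6}\mathrm{Score}(T,L,C,r)
\]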


SLIDE 17

The Methodology

§ Conclusion validity
  § Reliability of treatment implementation
    § Tool deployment instructions EQUAL for all participants
  § Reliability of measures
    § Efficiency: wall-clock time via Java System.currentTimeMillis()
    § Effectiveness: Defects4J
    § Tools' non-deterministic nature: 6 runs (limited by HW capacity)


SLIDE 18

The Methodology

§ Internal validity
  § CUTs from Defects4J (uniform and arbitrary selection from 5 open-source projects)
  § Tools and benchmark infrastructure
    § Tune-up samples vs. contest benchmarks
  § Wrappers (runtool): implemented by the tools' side
§ Construct validity
  § Scoring formula weights – value of the quality indicators
  § Empirical studies – correlation of proxy metrics for test effectiveness and fault-finding capability


SLIDE 19

The Results

The contest ran for ~1 week of test generation, execution and assessment across the 32 VMs. A single virtual machine would have needed 8 CPU months!
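A rough sanity check on the generation part alone, using the numbers from slide 9 (4 tools, 80 CUTs, 6 runs, budgets of 1, 2, 4 and 8 minutes):

\[
4 \times 80 \times 6 \times (1+2+4+8)\ \text{min} = 28\,800\ \text{min} = 480\ \text{h} = 20\ \text{CPU days}
\]

The remaining months are presumably spent on compilation, the 5-fold flaky re-runs, and the coverage and mutation analyses.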


SLIDE 20

Lessons learned

§ Testing tool improvements
  § Automation, test effectiveness, comparability
§ Benchmarking infrastructure improvements
  § Decoupling test generation from execution/assessment
  § Flaky test identification and sanity checks
  § Fault-finding capability measurement
  § Test effectiveness as a function of test generation time
§ What next?
  § Automated parallelization of the benchmark contest
  § More tools, new languages? (e.g. C#?)


SLIDE 21

Contact us

Universidad Politécnica de Valencia, ES: urueda@pros.upv.es, tvos@dsic.upv.es
Open Universiteit Heerlen, NL: tanja.vos@ou.nl
University of Massachusetts Amherst, MA, USA: rjust@cs.umass.edu
University of Buenos Aires, Argentina: jgaleotti@dc.uba.ar
web: http://sbstcontest.dsic.upv.es/
