Comparing User-Provided Tests to Developer-Provided Tests Ren - PowerPoint PPT Presentation

Comparing User-Provided Tests to Developer-Provided Tests René Just, Chris Parnin, Ian Drosos, Michael D. Ernst ISSTA 2018

User-provided tests Developer-provided tests Found in bug reports Committed to repository One small test More tests, more LOC Weak or no assertions More, stronger assertions High code coverage Focused on the defect Used by programmers Used in experiments Fault localization 5-14% worse Automated program repair 54-100% worse User-provided tests should be used in experiments.

Fault localization: where is the defect? Defective program double avg (double[] nums) { Fault int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique sum += nums[i]; } return sum * n; } Test suite Passing tests Failing tests

Fault localization: where is the defect? Defective program Statement ranking double avg (double[] nums) { double avg (double[] nums) { Fault int n = nums.length; int n = nums.length; localization double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Least suspicious Test suite Most Passing tests suspicious Failing tests

Evaluating fault localization Defective program Statement ranking double avg (double[] nums) { double avg (double[] nums) { Fault int n = nums.length; int n = nums.length; localization double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Compare to Test suite known location of defect Passing tests Failing tests

Evaluating fault localization Defective program Statement ranking double avg (double[] nums) { double avg (double[] nums) { Fault int n = nums.length; int n = nums.length; localization double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Compare to Test suite known location of defect Passing double avg (double[] nums) { Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique 2 Failing sum += nums[i]; tests } return sum * n; }

Evaluating fault localization Early work Defective program Statement ranking ● Artificial defects (“mutants”) double avg (double[] nums) { double avg (double[] nums) { Fault ○ Easy to create lots of them int n = nums.length; int n = nums.length; ○ Known fault locations localization double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 Pearson et al. [ICSE 2017] sum += nums[i]; sum += nums[i]; } } ● 310 real defects (Defects4J) return sum * n; return sum * n; ● 2995 artificial defects } } Compare to Test suite known location of defect Passing double avg (double[] nums) { Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique 2 Failing sum += nums[i]; tests } return sum * n; }

Evaluating fault localization Early work Defective program Statement ranking ● Artificial defects (“mutants”) double avg (double[] nums) { double avg (double[] nums) { Fault ○ Easy to create lots of them int n = nums.length; int n = nums.length; ○ Known fault locations localization double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 Pearson et al. [ICSE 2017] sum += nums[i]; sum += nums[i]; } } ● 310 real defects (Defects4J) return sum * n; return sum * n; ● 2995 artificial defects } } Compare to Early work Test suite known location ● Artificial tests of defect ○ Written by researchers Passing double avg (double[] nums) { Fault ○ Unrealistically strong tests int n = nums.length; localization double sum = 0; Pearson et al. [ICSE 2017] for(int i=0; i<n; ++i) { technique 2 Failing ● Real tests (Defects4J) sum += nums[i]; tests } ○ Written by developers return sum * n; ○ Committed with the fix }

Pearson [ICSE 2017] Comparison of fault localization techniques SBFL vs. SBFL MBFL vs. SBFL

Pearson [ICSE 2017] Comparison of fault localization techniques SBFL vs. SBFL MBFL vs. SBFL Results agree with most prior studies on artificial faults but only 3 effect sizes are not negligible.

Pearson [ICSE 2017] Comparison of fault localization techniques SBFL vs. SBFL MBFL vs. SBFL Results disagree with all prior studies on real faults . Design decisions don’t matter: techniques indistinguishable .

Evaluating fault localization Defective program Statement ranking double avg (double[] nums) { double avg (double[] nums) { Fault int n = nums.length; int n = nums.length; localization double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Compare to Test suite known location of defect Passing double avg (double[] nums) { Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique 2 Failing sum += nums[i]; tests } return sum * n; }

Evaluating fault localization Defective program Statement ranking New standard methodology: double avg (double[] nums) { double avg (double[] nums) { Fault Use real defects int n = nums.length; int n = nums.length; localization from Defects4J (mined from double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 version control repositories) sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Compare to Test suite known location of defect Passing double avg (double[] nums) { Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique 2 Failing sum += nums[i]; tests } return sum * n; }

Evaluating fault localization Defective program Statement ranking New standard methodology: double avg (double[] nums) { double avg (double[] nums) { Fault Use real defects int n = nums.length; int n = nums.length; localization from Defects4J (mined from double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 version control repositories) sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Defects4J: real triggering tests Compare to ● Written by developers Test suite known location ● Committed with the fix of defect Passing double avg (double[] nums) { Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique 2 Failing sum += nums[i]; tests } return sum * n; }

Evaluating fault localization Defective program Statement ranking New standard methodology: double avg (double[] nums) { double avg (double[] nums) { Fault Use real defects int n = nums.length; int n = nums.length; localization from Defects4J (mined from double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 version control repositories) sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Defects4J: real triggering tests Compare to ● Written by developers Test suite known location ● Committed with the fix of defect Passing double avg (double[] nums) { Written before or after the fix? Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { technique 2 Failing sum += nums[i]; tests } return sum * n; }

Evaluating fault localization Defective program Statement ranking New standard methodology: double avg (double[] nums) { double avg (double[] nums) { Fault Use real defects int n = nums.length; int n = nums.length; localization from Defects4J (mined from double sum = 0; double sum = 0; for(int i=0; i<n; ++i) { for(int i=0; i<n; ++i) { technique 1 version control repositories) sum += nums[i]; sum += nums[i]; } } return sum * n; return sum * n; } } Defects4J: real triggering tests Compare to ● Written by developers Test suite known location ● Committed with the fix of defect Passing double avg (double[] nums) { Written before or after the fix? Fault tests int n = nums.length; localization double sum = 0; for(int i=0; i<n; ++i) { In practice, fault localization is technique 2 Failing sum += nums[i]; run before the fix , using triggering tests } return sum * n; tests from bug reports .In }

User-provided test https://issues.apache.org/jira/browse/LANG-857 public void userTest () { assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30")); }

Comparing User-Provided Tests to Developer-Provided Tests Ren - PowerPoint PPT Presentation

Comparing User-Provided Tests to Developer-Provided Tests Ren Just, Chris Parnin, Ian Drosos, Michael D. Ernst ISSTA 2018 User-provided tests Developer-provided tests Found in bug reports Committed to repository One small test More tests,

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Climate: What Is It Anyway Comparing Weather and Climate Climate Regions and Biomes Comparing

RUN groupadd -r user && useradd -r -g user user USER user $ docker run --read-only debian

Comparing the Performance of Randomization Tests and Traditional Tests: A Simulation Study

A. Job Title: Junior Full Stack Developer The Junior Full Stack Developer will be advised by the

FINAL PRESENTATION Meet the Team David Purdum Team Leader, Lead Developer Travis Miller Front

Web Developer Internship Ralph Mehitang 2 Todd Smith Web Communications Manager Andrew

Karl-Johan Dahlstrm Head of Developer Relations developer world sonymobile.com/developer

Comparing adult antenatal adult antenatal- -clinic based clinic based Comparing HIV prevalence

Comparing Selected Water Comparing Selected Water Quality Trading Rules & Quality Trading

Comparing TensorFlow 2.0 with PyTorch and PyTorch JIT Tim Lazarus 29 November, 2019 Comparing

STAT 113 Comparing Multiple Means Colin Reimer Dawson Oberlin College December 5, 2017 1 / 34

Comparing Two Samples We are often interested in comparing measurements made under two different

Business Statistics CONTENTS Comparing two s Comparing more than two s Analysis of

Comparing State Spaces in Automatic Security Protocol Verification Pascal Lafourcade & Cas

Comparing Several Samples We are often interested in comparing measurements made under more than

Enabling better device interaction with accelerometer 3 Feb 2013 etezian.org Andi Shyti Mika

Introduction to Sockets Programming in C using TCP/IP Professor: Panagiota Fatourou TA:

Development and Pixel2018 Performance of Phase-I Taipei, Taiwan Pixel DAQ in 2018 Atanu Modak,

Tools and Techniques for Floating-Point Analysis Ignacio Laguna Jan 7, 2020 @ LLNL Modified

Interrupts in Zynq Systems C r i s t i a n S i s t e r n a U n i v e r s i d a d N a c i o n a l

Deadly Embrace: Sovereign and Financial Balance Sheet Doom Loops Emmanuel Farhi Jean Tirole

Trigger and Data Acquisition (II) Brigitte Vachon (McGill) HCPSS 2010 HCPSS 2010 Brigitte

A Tropical Semiring Multiple Matrix-Product Library on GPUs: (not just) a step towards RNA-RNA

Comparing User-Provided Tests to Developer-Provided Tests Ren - PowerPoint PPT Presentation

Comparing User-Provided Tests to Developer-Provided Tests Ren Just, Chris Parnin, Ian Drosos, Michael D. Ernst ISSTA 2018 User-provided tests Developer-provided tests Found in bug reports Committed to repository One small test More tests,

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Climate: What Is It Anyway Comparing Weather and Climate Climate Regions and Biomes Comparing

RUN groupadd -r user &amp;&amp; useradd -r -g user user USER user $ docker run --read-only debian

Comparing the Performance of Randomization Tests and Traditional Tests: A Simulation Study

A. Job Title: Junior Full Stack Developer The Junior Full Stack Developer will be advised by the

FINAL PRESENTATION Meet the Team David Purdum Team Leader, Lead Developer Travis Miller Front

Web Developer Internship Ralph Mehitang 2 Todd Smith Web Communications Manager Andrew

Karl-Johan Dahlstrm Head of Developer Relations developer world sonymobile.com/developer

Comparing adult antenatal adult antenatal- -clinic based clinic based Comparing HIV prevalence

Comparing Selected Water Comparing Selected Water Quality Trading Rules &amp; Quality Trading

Comparing TensorFlow 2.0 with PyTorch and PyTorch JIT Tim Lazarus 29 November, 2019 Comparing

STAT 113 Comparing Multiple Means Colin Reimer Dawson Oberlin College December 5, 2017 1 / 34

Comparing Two Samples We are often interested in comparing measurements made under two different

Business Statistics CONTENTS Comparing two s Comparing more than two s Analysis of

Comparing State Spaces in Automatic Security Protocol Verification Pascal Lafourcade &amp; Cas

Comparing Several Samples We are often interested in comparing measurements made under more than

Enabling better device interaction with accelerometer 3 Feb 2013 etezian.org Andi Shyti Mika

Introduction to Sockets Programming in C using TCP/IP Professor: Panagiota Fatourou TA:

Development and Pixel2018 Performance of Phase-I Taipei, Taiwan Pixel DAQ in 2018 Atanu Modak,

Tools and Techniques for Floating-Point Analysis Ignacio Laguna Jan 7, 2020 @ LLNL Modified

Interrupts in Zynq Systems C r i s t i a n S i s t e r n a U n i v e r s i d a d N a c i o n a l

Deadly Embrace: Sovereign and Financial Balance Sheet Doom Loops Emmanuel Farhi Jean Tirole

Trigger and Data Acquisition (II) Brigitte Vachon (McGill) HCPSS 2010 HCPSS 2010 Brigitte

A Tropical Semiring Multiple Matrix-Product Library on GPUs: (not just) a step towards RNA-RNA

RUN groupadd -r user && useradd -r -g user user USER user $ docker run --read-only debian

Comparing Selected Water Comparing Selected Water Quality Trading Rules & Quality Trading

Comparing State Spaces in Automatic Security Protocol Verification Pascal Lafourcade & Cas