Identifying Patch Correctness in Test-based Program Repair
Yingfei Xiong, Xinyuan Liu, Muhan Zeng, Lu Zhang, Gang Huang
Peking University

Test-based Program Repair
• Program (buggy): two passing tests and one failing test
• Patch
• Program’ (fixed): all three tests pass

Program repair: The cure
• Bug ↔ Disease
• Test ↔ Symptom
• Patch ↔ Therapy

Workflow: Program repair & hospital
• Hospital: Feel bad → Go to hospital → Feel better → Cured?
• Program repair: Bug discovered → Program repair → Test passed → Correct?

Symptoms are gone == cured?
Therapy:
• Makes you free of pain
• Disease may still be there
Plausible patches:
• Pass all the tests
• Can still be incorrect (overfit)

Tools: Hospitals
• Precision: Correct / (Correct + Incorrect)
[Bar chart: precision of Prophet, Angelix, Nopol, Kali, and GenProg; vertical axis from 0% to 45%]

Approach overview
• Test suite + Buggy program → Test-based program repair → Patch (low precision)
• Patch → Identifying patch correctness → High-quality patch (high precision)

Plausible patches: Wrong cure
An incorrect patch produced by jKali [1]
A test checking for null dataset. Test oracle: function draw returns normally (without exception)
[1] Martinez M, Durieux T, Sommerard R, et al. Automatic repair of real bugs in Java: A large-scale experiment on the Defects4J dataset. Empirical Software Engineering, 2017, 22(4): 1936-1964.

Bad therapy: What’s wrong here?
The original draw:
• Passing test: something is drawn
• Failing test (null dataset): exception thrown
The patched draw:
• Passing test: nothing is done (should fail!)
• Failing test (null dataset): exception not thrown

Wrong cure
• All symptoms are cured but in a bad way
• Problems are solved but not in a satisfying way
• “My leg is wounded”
• “Cut it off so you no longer have a hurt leg”
• Directly return → No exception
• Weak test oracle

Weak test oracle
• No exception ≠ correct patch
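The weak-oracle problem can be illustrated with a minimal sketch. The `Plot` class and both `draw` variants below are hypothetical stand-ins, not the actual code repaired by jKali; only the oracle ("draw returns normally") follows the slide.

```python
class Plot:
    """Hypothetical stand-in for a chart class with a draw method."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.drawn = []

    def draw_original(self):
        # Buggy version: iterates over the dataset without a null check.
        for item in self.dataset:
            self.drawn.append(item)

    def draw_patched(self):
        # Plausible but incorrect patch: return directly and draw nothing.
        return


def weak_oracle(draw):
    """Passes when draw() returns normally; never checks what was drawn."""
    try:
        draw()
        return True
    except Exception:
        return False


null_plot = Plot(None)
print(weak_oracle(null_plot.draw_original))  # False: exception thrown
print(weak_oracle(null_plot.draw_patched))   # True: no exception

full_plot = Plot([1, 2, 3])
print(weak_oracle(full_plot.draw_patched), full_plot.drawn)  # True []
```

The last line shows why the oracle is weak: the test still passes on a non-null dataset even though nothing is drawn.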

Plausible patches: Incomplete cure
An incorrect patch with wrong condition generated by Nopol [1]
Correct developer patch with correct null guard
[1] Xuan J, Martinez M, Demarco F, et al. Nopol: Automatic repair of conditional statement bugs in Java programs. IEEE Transactions on Software Engineering, 2017, 43(1): 34-55.

Bad therapy: What’s wrong here?
The original program:
• Passing test (repeat=true): increase calculated
• Failing test (repeat=false): expecting increase=0 (increase should be 0); got: exception thrown
The patched program:
• Passing test (repeat=false), not in the test suite: the whole loop is skipped
• Passing test (repeat=true): same as the original program
• Failing test (repeat=false): the whole loop is skipped, increase = 0 (increase should be 0)

Incomplete cure
• Incomplete cure: the symptoms of concern are cured, but some other symptoms are not.
• Bugs covered by tests are fixed while others are not
• “We cured your left leg and cut off your right leg” (wrong condition)
• “So what about my right leg?” (missing test inputs)
• “Well, we only care about your left leg” (existing test inputs)
• Weak test input
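A minimal sketch of the weak-input problem, loosely following the `repeat` and `increase` names on the slides. Both functions and their bodies are hypothetical illustrations, not the code patched by Nopol.

```python
def increase_original(search_list, replacement_list, repeat):
    """Hypothetical buggy version: crashes on a None entry."""
    increase = 0
    for search, replacement in zip(search_list, replacement_list):
        # len(None) raises TypeError, which is the bug the failing test hits.
        greater = len(replacement) - len(search)
        if greater > 0:
            increase += greater
    return increase


def increase_patched(search_list, replacement_list, repeat):
    """Hypothetical plausible patch: a wrong condition skips the whole loop."""
    if not repeat:
        return 0
    return increase_original(search_list, replacement_list, repeat)


# Existing test inputs: both pass on the patched version.
print(increase_patched(["a"], ["bcd"], repeat=True))     # 2
print(increase_patched([None], [None], repeat=False))    # 0

# Missing test input: repeat=False with non-null entries.
print(increase_original(["a"], ["bcd"], repeat=False))   # 2
print(increase_patched(["a"], ["bcd"], repeat=False))    # 0
```

The patch turns the failing test green, yet on the missing input it silently changes a result that the original program computed correctly.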

Test suites and heuristics
• Test = Test Input + Test Oracle
• Test suites are weak on both input and oracle.
• Two heuristics to save weak test suites:
• PATCH-SIM: compensate for weak test oracle
• TEST-SIM: compensate for weak test input

PATCH-SIM: heuristic for test oracle
• Passing tests: behavior on the patched program should be similar to behavior on the original program
“Well, you should keep my legs (which were good) as good as before”
• Failing tests: behavior on the patched program should be different from behavior on the original program
“What’s more, the wound (which was bad) should be cured”
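The two expectations above can be turned into a simple check. The sketch below is one possible decision rule, not necessarily the one used by the authors: it assumes each test already has a distance in [0, 1] between its behavior on the original and on the patched program, and the function name and the `passing_threshold` value are hypothetical.

```python
from statistics import mean


def patch_sim_accepts(passing_distances, failing_distances,
                      passing_threshold=0.25):
    """Return True if the patch looks correct under the assumed rule.

    passing_distances: original-vs-patched distances for passing tests.
    failing_distances: original-vs-patched distances for failing tests
                       (must be non-empty).
    passing_threshold: hypothetical cut-off, chosen only for illustration.
    """
    worst_passing = max(passing_distances, default=0.0)
    typical_failing = mean(failing_distances)

    # Passing tests should behave almost as before.
    if worst_passing >= passing_threshold:
        return False
    # Failing tests should change more than passing tests do.
    if worst_passing >= typical_failing:
        return False
    return True


print(patch_sim_accepts([0.0, 0.05], [0.4]))  # True
print(patch_sim_accepts([0.9, 0.0], [0.4]))   # False: a passing test changed a lot
print(patch_sim_accepts([0.2], [0.1]))        # False: failing test barely changed
```

Comparing passing tests against failing tests, rather than only against a fixed threshold, keeps the rule relative to how large the patch is.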

Bad cure identified!
• The original draw, passing test: something is drawn
• The patched draw, passing test: nothing happens
• Different!
“Well, you should keep my legs (which were good) as good as before”

TEST-SIM: heuristic for test input
• PATCH-SIM on newly generated tests: pass or fail?
• Behavior of the new test is similar to behavior of a passing test: the new test probably passes
“My left leg is just like my right leg. My right leg is good, so my left leg is also good”
• Behavior of the new test is similar to behavior of a failing test: the new test probably fails
• Classification result can be used by PATCH-SIM
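The classification idea can be sketched as a nearest-neighbour lookup. This is an illustrative assumption, not the authors' implementation: traces are lists of executed statement numbers collected on the same program version, `classify_new_test` is a hypothetical name, and the set-based `jaccard_distance` is only a simple stand-in for whatever behavior distance is actually used.

```python
def jaccard_distance(trace_a, trace_b):
    """Stand-in distance: 1 - |A ∩ B| / |A ∪ B| over executed statements."""
    a, b = set(trace_a), set(trace_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


def classify_new_test(new_trace, passing_traces, failing_traces,
                      distance=jaccard_distance):
    """Label a generated test by the closest existing test behavior."""
    nearest_passing = min(distance(new_trace, t) for t in passing_traces)
    nearest_failing = min(distance(new_trace, t) for t in failing_traces)
    if nearest_passing < nearest_failing:
        return "passing"
    if nearest_failing < nearest_passing:
        return "failing"
    return "unknown"  # tie: no evidence either way


passing = [[1, 2, 3, 4]]
failing = [[1, 2, 5]]
print(classify_new_test([1, 2, 3, 2, 4], passing, failing))  # passing
print(classify_new_test([1, 2, 5, 5], passing, failing))     # failing
```

Returning "unknown" on a tie lets a caller skip tests whose classification would be a coin flip.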

Bad cure identified!
• Passing test (repeat=false): classified as a passing test
• Passing test (repeat=true)
• The whole loop is skipped: different from the original behavior
“Check my left leg, it’s good and I want it as good as before”

Workflow
• “Check my left leg, it’s good and I want it as good as before”
• Test generation → New test inputs → TEST-SIM → Classification → PATCH-SIM → Correctness
• Test generation produces the new test inputs; classification by TEST-SIM serves as the oracle of PATCH-SIM

Similar? Different?
• Test oracle: output (not so reliable)
• Result is not all: the process is also important
• Runtime information: Behavior similarity

Details for ‘Behavior similarity’
• Complete-path spectrum [1]: the sequence of executed statements, e.g. {1,2,3,2,3,2,3,2,4}
• Distance and similarity:
[1] Harrold M J, Rothermel G, Wu R, et al. An empirical investigation of program spectra. ACM SIGPLAN Notices, 1998, 33(7): 83-90.
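A common way to compare two complete-path spectra is the longest common subsequence (LCS). The sketch below assumes, for illustration only, that distance is 1 - |LCS(a, b)| / max(|a|, |b|) and that similarity is 1 - distance; the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def distance(a, b):
    """Assumed distance: 0.0 for identical paths, 1.0 for disjoint ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1 - lcs_length(a, b) / longest


def similarity(a, b):
    return 1 - distance(a, b)


looping_path = [1, 2, 3, 2, 3, 2, 3, 2, 4]  # the spectrum from the slide
short_path = [1, 2, 3, 2, 4]                # hypothetical: one loop iteration

print(lcs_length(looping_path, short_path))          # 5
print(round(distance(looping_path, short_path), 3))  # 0.444
print(similarity(looping_path, looping_path))        # 1.0
```

Because the spectrum is a sequence rather than a set, the measure is sensitive to how many times a loop runs, not just to which statements were covered.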

‘Similar’ is relative, not absolute
Common cold:
• Easy cure
• Slightly affects your body
Simple bug:
• Small patch
• Slightly affects original program behavior
Cancer:
• Big surgery
• Greatly affects your body
Complex bug:
• Big patch
• Greatly affects original program behavior
• Behaviors on passing tests should be more similar

Effectiveness
• Dataset: 139 patches from jGenProg, Nopol, jKali, ACS and HDRepair
• Defects4J benchmark
• 56.3% of incorrect patches filtered out without losing any of the correct patches.
• Anti-pattern: pre-defined patterns
• Opad: patches shouldn’t introduce crashes or memory safety problems (designed for C)
[Bar chart: percentage of incorrect and correct patches filtered by Ours, Anti-pattern, and Opad; vertical axis from 0% to 60%]

Summary
• Many program repair tools have low precision
• Patch correctness can be identified based on behavior similarity
• 2 heuristics: PATCH-SIM and TEST-SIM
• 56.3% of incorrect patches filtered, 0 loss on correct patches

Discussion: complicated patches
• Patches from APR are simple (for now).
• Will our approach still be effective in the future?
• E.g. on more complicated patches

Developer patches
• 194 correct patches from Defects4J benchmark
• 178 (91.75%) still classified as correct
• Reasons for misclassification:
• Significant behavior change
• Calling a different method with the same functionality
