Leveraging Program Invariants to Promote Population Diversity in Search-Based Automatic Program Repair



slide-1
SLIDE 1

Leveraging Program Invariants to Promote Population Diversity in Search-Based Automatic Program Repair

Zhen Yu Ding†, Yiwei Lyu‡, Christopher S. Timperley‡, Claire Le Goues‡

† University of Pittsburgh, ‡ Carnegie Mellon University

1

slide-2
SLIDE 2

Bugs aren’t great...

In 2017:

  • 3.7 billion people affected
  • Over $1.7 trillion of assets affected

Reduces developer productivity:

  • Loss of time
  • Frustration

Z 2

slide-3
SLIDE 3

Automatic Bug Repair

Z 3

slide-4
SLIDE 4

Automatic Bug Repair

Z 4

slide-5
SLIDE 5

Buggy program

Z 5

Pos. Tests Neg. Tests

slide-6
SLIDE 6

Z 6

Buggy program Pos. Tests Neg. Tests

Candidate patches

slide-7
SLIDE 7

Z 7

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

A patch passes all test cases. Repair found!

slide-8
SLIDE 8

Z 8

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!
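The weighted-sum fitness named on this slide can be sketched as follows. This is a minimal illustration rather than the tool's code: the names `weighted_fitness` and `is_repair` and the weights `w_pos` and `w_neg` are assumptions made for the example, since the slides do not state the actual weights.

```python
def weighted_fitness(pos_passed, neg_passed, w_pos=1.0, w_neg=10.0):
    """Weighted sum of passing positive and passing negative tests.

    The default weights are illustrative assumptions, not values from the slides.
    """
    return w_pos * pos_passed + w_neg * neg_passed


def is_repair(pos_passed, neg_passed, n_pos, n_neg):
    """A candidate counts as a repair only when it passes every test."""
    return pos_passed == n_pos and neg_passed == n_neg


# Two candidates that pass the same tests receive the same score,
# however different their behavior is otherwise.
print(weighted_fitness(3, 0), weighted_fitness(3, 0))
print(is_repair(3, 2, n_pos=3, n_neg=2))
```

Because the score depends only on test counts, any two patches with identical test outcomes are tied under this function.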

slide-9
SLIDE 9

Y 9

slide-10
SLIDE 10

Warning: this is not the normal GCD bug often seen in APR!

Y 10

slide-11
SLIDE 11

Y 11

slide-12
SLIDE 12

Y 12

slide-13
SLIDE 13

Y 13

slide-14
SLIDE 14

Y 14

slide-15
SLIDE 15

  • Works correctly when a != 0
  • Should return b when a = 0
  • This program returns 0 instead

Y 15

slide-16
SLIDE 16

  • Works correctly when a != 0
  • Should return b when a = 0
  • This program returns 0 instead

Test Cases:

Y 16

a     b     Expected result   Actual result   Passed?
5     7     1
0     2     2
12    16    4
3     0     3
0     10    10
...   ...   ...

slide-17
SLIDE 17

Y 17

a     b     Expected result   Actual result   Passed?
5     7     1                 1               Yes
0     2     2                 0               No
12    16    4                 4               Yes
3     0     3                 3               Yes
0     10    10                0               No
...   ...   ...               ...             ...

  • Works correctly when a != 0
  • Should return b when a = 0
  • This program returns 0 instead

Test Cases:
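The behavior described on this slide can be reproduced with a small stand-in. The function `buggy_gcd` below is a hypothetical program written for this illustration, not the code shown on the slides; it only shares the stated defect (returning 0 instead of b when a is 0). The inputs with a = 0 are filled in as an assumption consistent with the failing rows of the test-case table.

```python
def buggy_gcd(a, b):
    """Hypothetical stand-in for the slide's buggy program (not the original code)."""
    if a == 0:
        return 0  # bug: should return b
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a


# (a, b, expected result), following the test-case table
TESTS = [(5, 7, 1), (0, 2, 2), (12, 16, 4), (3, 0, 3), (0, 10, 10)]


def run_tests(gcd, tests):
    """Return one (a, b, expected, actual, passed) row per test case."""
    rows = []
    for a, b, expected in tests:
        actual = gcd(a, b)
        rows.append((a, b, expected, actual, actual == expected))
    return rows


for row in run_tests(buggy_gcd, TESTS):
    print(row)
```

Only the rows with a = 0 fail, which is all the information a test-based fitness function receives about this program.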

slide-18
SLIDE 18

Problem:

  • Should return b when a is 0
  • This program returns 0 instead

Y 18

slide-19
SLIDE 19

Problem:

  • Should return b when a is 0
  • This program returns 0 instead

Simplest fix is 2 steps:

  (1) Delete line 16
  (2) Replace line 4 with line 8

Y 19

result=b;

slide-20
SLIDE 20

Simplest fix is 2 steps:

  (1) Delete line 16
  (2) Replace line 4 with line 8

If we only perform step 1 (partial repair):

Y 20

slide-21
SLIDE 21

Simplest fix is 2 steps:

  (1) Delete line 16
  (2) Replace line 4 with line 8

If we only perform step 1 (partial repair):

  • Still fails when a=0, passes otherwise
  • Cannot be differentiated from the buggy program by test results alone

Y 21

slide-22
SLIDE 22

Patch indistinguishability

Test cases often fail to distinguish between different candidate patches.

This produces a plateau-like fitness landscape.

Z 22

  • S. Forrest, W. Weimer, T. Nguyen, and C. Le Goues, “A genetic programming approach to automated software repair,” in Genetic and Evolutionary Computation Conference (GECCO), 2009, pp. 947–954.
  • E. Fast, C. Le Goues, S. Forrest, and W. Weimer, “Designing better fitness functions for automated program repair,” in Genetic and Evolutionary Computation Conference (GECCO), 2010, pp. 965–972.
  • E. F. de Souza, C. Le Goues, and C. G. Camilo-Junior, “A novel fitness function for automated program repair based on source code checkpoints,” in Genetic and Evolutionary Computation Conference (GECCO), 2018.

slide-23
SLIDE 23

Goal: distinguish patches better

Z 23

slide-24
SLIDE 24

Goal: distinguish patches better

  • Infer invariants to semantically describe candidate patches.
  • Find semantically unique/diverse candidate patches.

Z 24

slide-25
SLIDE 25

Y 25

An intuition on why.

slide-26
SLIDE 26

Invariants when running positive tests (gcd(5,7), gcd(12,16), gcd(3,0), etc.):

  • a>=0
  • b>=0
  • result>=0
  • a%result==0
  • b%result==0
  • ...

Y 26

An intuition on why.

slide-27
SLIDE 27

One simple fix:

  (1) Delete line 16
  (2) Replace line 4 with line 8

Y 27

result=b;

An intuition on why.

slide-28
SLIDE 28

One simple fix:

  (1) Delete line 16
  (2) Replace line 4 with line 8

If we only perform step 1 (partial repair):

  • Still fails when a=0, passes otherwise
  • Cannot be differentiated from the buggy program by test results alone

Y 28

An intuition on why.

slide-29
SLIDE 29

Invariant a%result==0:

  • True when a != 0
  • False when a=0 (result is 0)

Y 29

An intuition on why.

slide-30
SLIDE 30

Invariant a%result==0:

  • True when a != 0
  • False when a=0 (result is 0)
  • True when a=0 in partial repair

Y 30

An intuition on why.

slide-31
SLIDE 31

Invariant a%result==0:

  • True when a != 0
  • False when a=0 (result is 0)
  • True when a=0 in partial repair

Partial repair results in invariant behavior change!

Y 31

An intuition on why.

slide-32
SLIDE 32

Daikon – an invariant detection tool

A mature dynamic invariant detection technique

  • Runs the program and records traces of intermediate variable values
  • Analyzes the traces to learn invariants
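The two steps above can be illustrated with a toy version of trace-based inference. This is a sketch of the general idea only and is not Daikon's implementation or API: the trace format, the candidate list, and the function `infer_invariants` are assumptions made for this example, using the gcd runs named earlier in the deck.

```python
# Each trace records the values observed at the exit of gcd(a, b)
# for the positive tests gcd(5,7), gcd(12,16), and gcd(3,0).
TRACES = [
    {"a": 5, "b": 7, "result": 1},
    {"a": 12, "b": 16, "result": 4},
    {"a": 3, "b": 0, "result": 3},
]

# Hypothetical candidate predicates. A result of 0 is treated as a
# violation of the modulo invariants (an assumption of this sketch).
CANDIDATES = {
    "a>=0": lambda s: s["a"] >= 0,
    "b>=0": lambda s: s["b"] >= 0,
    "result>=0": lambda s: s["result"] >= 0,
    "a%result==0": lambda s: s["result"] != 0 and s["a"] % s["result"] == 0,
    "b%result==0": lambda s: s["result"] != 0 and s["b"] % s["result"] == 0,
    "a==b": lambda s: s["a"] == s["b"],
    "result==a": lambda s: s["result"] == s["a"],
}


def infer_invariants(traces, candidates):
    """Keep the candidates that hold on every recorded trace."""
    return [name for name, pred in candidates.items()
            if all(pred(state) for state in traces)]


print(infer_invariants(TRACES, CANDIDATES))
```

Candidates such as `a==b` are discarded as soon as one trace contradicts them, which is why the inferred set depends on which tests were run.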

Y 32

slide-33
SLIDE 33

Invariants when running positive tests (gcd(5,7), gcd(12,16), gcd(3,0), etc.):

  • a>=0
  • b>=0
  • result>=0
  • a%result==0
  • b%result==0
  • ...

All were detected by Daikon

Y 33

slide-34
SLIDE 34

Z 34

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!

slide-35
SLIDE 35

Z 35

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

Daikon: starting set of invariants.

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!

slide-36
SLIDE 36

Z 36

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

Daikon: starting set of invariants.

Do these invariants still hold in candidate patches?

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!

slide-37
SLIDE 37

Z 37

Starting set of invariants

  • a%result==0
  • b%result==0
  • result>=0

slide-38
SLIDE 38

Z 38

Starting set of invariants

Candidate patch 0

a%result==0 b%result==0 result>=0

slide-39
SLIDE 39

Z 39

Starting set of invariants

Candidate patch 0 Tested against

a%result==0

  • Pos. tests

b%result==0 result>=0

✓ = Invariant never violated during program execution.

slide-40
SLIDE 40

Z 40

Starting set of invariants

Candidate patch 0 Tested against

a%result==0

  • Pos. tests

  • Neg. tests

b%result==0 result>=0

✓ = Invariant never violated during program execution.
✘ = Invariant violated at least once.
slide-41
SLIDE 41

Z 41

Starting set of invariants

Candidate patch 0 Tested against

a%result==0

  • Pos. tests

  • Neg. tests

b%result==0

  • Pos. tests

result>=0

✓ = Invariant never violated during program execution.
✘ = Invariant violated at least once.
slide-42
SLIDE 42

Z 42

Starting set of invariants

Candidate patch 0 Tested against

a%result==0

  • Pos. tests

  • Neg. tests

b%result==0

  • Pos. tests

?

  • Neg. tests

result>=0

✓ = Invariant never violated during program execution.
✘ = Invariant violated at least once.

? = Invariant not testable.

slide-43
SLIDE 43

Z 43

Starting set of invariants

Candidate patch 0 Tested against

a%result==0

  • Pos. tests

  • Neg. tests

b%result==0

  • Pos. tests

?

  • Neg. tests

result>=0

  • Pos. tests

  • Neg. tests

✓ = Invariant never violated during program execution.
✘ = Invariant violated at least once.

? = Invariant not testable.
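Checking each invariant against the positive and the negative tests yields one three-valued outcome per check. The sketch below is an illustration under stated assumptions: the symbols `H`, `V`, and `?`, the functions `check` and `invariant_profile`, and the rule that an invariant is "not testable" when no program state was observed are all choices made for this example, since the slides do not define them precisely.

```python
HELD, VIOLATED, UNTESTABLE = "H", "V", "?"

# Invariants from the starting set. A result of 0 counts as a violation
# of the modulo invariants, matching the earlier slide on a%result==0.
INVARIANTS = [
    lambda s: s["result"] != 0 and s["a"] % s["result"] == 0,  # a%result==0
    lambda s: s["result"] != 0 and s["b"] % s["result"] == 0,  # b%result==0
    lambda s: s["result"] >= 0,                                # result>=0
]


def check(predicate, states):
    """Return HELD, VIOLATED, or UNTESTABLE for one invariant on one test set."""
    if not states:  # assumption: nothing observed means not testable
        return UNTESTABLE
    return HELD if all(predicate(s) for s in states) else VIOLATED


def invariant_profile(invariants, pos_states, neg_states):
    """Concatenate the outcomes (pos. tests, then neg. tests) per invariant."""
    return "".join(check(p, pos_states) + check(p, neg_states) for p in invariants)


POS = [{"a": 5, "b": 7, "result": 1}, {"a": 3, "b": 0, "result": 3}]
NEG = [{"a": 0, "b": 2, "result": 0}]  # hypothetical state from a failing test

print(invariant_profile(INVARIANTS, POS, NEG))
print(invariant_profile(INVARIANTS, POS, []))
```

The resulting string has one character per (invariant, test set) pair, so profiles of different candidate patches can be compared position by position.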

slide-44
SLIDE 44

Z 44

Starting set of invariants

Candidate patch 0 Candidate patch 1

a%result==0

b%result==0

? ✘

result>=0

✘ ✘

✓ = Invariant never violated during program execution.
✘ = Invariant violated at least once.

? = Invariant not testable.

slide-45
SLIDE 45

Z 45

Starting set of invariants

Candidate patch 0 Candidate patch 1

a%result==0

b%result==0

? ✘

result>=0

✘ ✘

Invariant profile:

Describes the semantics of a program based on a set of predicates.

slide-46
SLIDE 46

Z 46

Starting set of invariants

Candidate patch 0 Candidate patch 1

a%result==0

b%result==0

? ✘

result>=0

✘ ✘

Invariant profile:

Describes the semantics of a program based on a set of predicates.

We use string comparisons to compare program semantics.

  • We use Hamming distances.

slide-47
SLIDE 47

Z 47

Starting set of invariants

Candidate patch 0 Candidate patch 1

a%result==0

b%result==0

? ✘

result>=0

✘ ✘

Invariant profile:

Describes the semantics of a program based on a set of predicates.

We use string comparisons to compare program semantics.

  • We use Hamming distances.

Δ(p0, p1) = 2

slide-48
SLIDE 48

Z 48

Starting set of invariants

Candidate patch 0 Candidate patch 1 Candidate patch 2

a%result==0

✘ ✘ ✘

b%result==0

? ✘ ?

result>=0

✘ ✘ ✘

Invariant profile:

Describes the semantics of a program based on a set of predicates.

We can use string comparisons to compare program semantics.

  • We use Hamming distances.

We can calculate semantic diversity.

  • Sum the Hamming distances.

Δ(p0, p1) = 2
Δ(p1, p2) = 3
Δ(p0, p2) = 1

diversity(p0) = Δ(p0, p1) + Δ(p0, p2) = 2 + 1 = 3
diversity(p1) = Δ(p1, p0) + Δ(p1, p2) = 2 + 3 = 5
diversity(p2) = Δ(p2, p0) + Δ(p2, p1) = 1 + 3 = 4
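The diversity calculation on this slide can be sketched in a few lines. The profile strings below are hypothetical: they were chosen only so that their pairwise Hamming distances match the values on the slide (2, 3, and 1), and they are not the profiles from the slide's table. The function names are likewise assumptions for this example.

```python
def hamming(p, q):
    """Number of positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(p, q))


def diversity_scores(profiles):
    """Diversity of a patch = sum of Hamming distances to all other patches."""
    return {
        name: sum(hamming(p, q) for other, q in profiles.items() if other != name)
        for name, p in profiles.items()
    }


# Hypothetical profiles: "T" = held, "F" = violated, "?" = not testable.
PROFILES = {
    "p0": "TT?TTF",
    "p1": "FF?TTF",
    "p2": "TTFTTF",
}

print(hamming(PROFILES["p0"], PROFILES["p1"]))
print(diversity_scores(PROFILES))
```

A patch whose profile differs from many others, such as p1 here, receives the highest diversity score.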

slide-49
SLIDE 49

Z 49

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

Daikon: starting set of invariants.

Do these invariants still hold in candidate patches?

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!

slide-50
SLIDE 50

Z 50

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

Daikon: starting set of invariants.

Do these invariants still hold in candidate patches?

Invariant profiles

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!

slide-51
SLIDE 51

Z 51

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

Daikon: starting set of invariants.

Do these invariants still hold in candidate patches?

Invariant profiles

Diversity scores

How many test cases does each candidate patch pass?

  • Passing positive tests
  • Passing negative tests

Fitness function: weighted sum

Selected patches

A patch passes all test cases. Repair found!

slide-52
SLIDE 52

Z 52

Buggy program

Candidate patches

Pos. Tests, Neg. Tests

Daikon: starting set of invariants.

Do these invariants still hold in candidate patches?

Invariant profiles

How many test cases does each candidate patch pass?

Multiobjective optimization (NSGA-II):

  • Passing positive tests
  • Passing negative tests
  • Diversity scores

Selected patches

A patch passes all test cases. Repair found!
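Treating the three quantities as separate objectives means that candidates are compared by Pareto dominance rather than by a single weighted score. The sketch below shows only that dominance step; it is a simplification and not a full NSGA-II, which also ranks the remaining fronts and uses crowding distance. The candidate names and objective values are hypothetical.

```python
def dominates(x, y):
    """True when x is at least as good as y on every objective and better on one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def first_front(scores):
    """Names of the candidates that no other candidate dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical (passing pos. tests, passing neg. tests, diversity score) tuples.
SCORES = {
    "p0": (5, 0, 3),
    "p1": (5, 0, 5),
    "p2": (4, 1, 4),
    "p3": (4, 0, 2),
}

print(first_front(SCORES))
```

Here p0 and p1 pass the same tests, but p1 is kept because it is more diverse, which is how the diversity objective breaks ties on a fitness plateau.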

slide-53
SLIDE 53

Evaluation

  • IntroClass is a set of small, buggy C programs collected from introductory programming courses.
  • IntroClassJava is a subset of IntroClass automatically transformed from C to Java.
  • We randomly sampled 59 of the 297 bugs in IntroClassJava for our experiment.
  • We ran each selected bug 10 times with different random seeds.

Y 53

Program     Sampled / Total
checksum    2/11
digits      14/75
grade       19/89
median      9/57
smallest    13/52
syllables   2/13
Total       59/297

slide-54
SLIDE 54

Results

No evidence of improvement in repair performance.

Successfully shown that our approach:

  • Promotes semantic diversity
  • Improves fitness granularity (and therefore reduces plateauing)

Y 54

slide-55
SLIDE 55

Z 55

GenProg implicitly selects for semantic diversity.

slide-56
SLIDE 56

Scalability

IntroClassJava programs are small (<30 LoC).
Defects4J contains bugs from large, real-world Java projects.

Y 56

Project               Lines of Code   Number of Unit Tests
Apache Commons Math   ~85K            3602
Apache Commons Lang   ~20K            2245

slide-57
SLIDE 57

Scalability

IntroClassJava programs are small (<30 LoC).
Defects4J contains bugs from large, real-world Java projects.

  • It is infeasible to collect invariants by running all of the thousands of positive tests.
  • Instead, we collect invariants by running only the positive tests co-located in the same test class as the failing test cases.
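The test-selection rule above can be sketched as a simple filter. The test identifier format (`ClassName::methodName`), the function name, and the example test names are all hypothetical and are used here only to illustrate the idea.

```python
def colocated_positive_tests(all_tests, failing_tests):
    """Select passing tests that live in the same test class as a failing test.

    Test ids are assumed to look like "ClassName::methodName".
    """
    failing = set(failing_tests)
    failing_classes = {t.split("::")[0] for t in failing}
    return [
        t for t in all_tests
        if t not in failing and t.split("::")[0] in failing_classes
    ]


# Hypothetical test ids for illustration.
ALL_TESTS = [
    "FractionTest::testAdd",
    "FractionTest::testReduce",
    "MatrixTest::testInverse",
]
FAILING = ["FractionTest::testReduce"]

print(colocated_positive_tests(ALL_TESTS, FAILING))
```

Only tests from classes that contain a failing test are run for invariant collection, which keeps the cost roughly proportional to the size of those classes rather than the whole suite.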

Y 57

Project               Lines of Code   Number of Unit Tests
Apache Commons Math   ~85K            3602
Apache Commons Lang   ~20K            2245

slide-58
SLIDE 58

Scalability

Overheads: invariant learning and checking

Our approach is as scalable as GenProg!

Y 58

Bug      GenProg Runtime (mins)   Our Approach’s Runtime (mins)   Difference
lang11   59.77                    64.37                           1.08×
lang29   29.72                    37.05                           1.25×
lang36   34.80                    41.08                           1.18×
lang8    97.50                    103.98                          1.07×
lang9    55.07                    70.87                           1.29×
math30   89.27                    90.55                           1.01×
math44   98.43                    176.88                          1.80×
math46   67.05                    720.48                          10.75×
math79   100.55                   119.63                          1.19×
math86   62.52                    71.45                           1.14×

Median   64.78                    81.00                           1.19×
Mean     69.47                    149.63                          2.18×
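The summary rows can be recomputed from the per-bug runtimes. The snippet below is a small check using the runtimes in the table; the variable names are assumptions, and the "Difference" column is interpreted as the ratio of our approach's runtime to GenProg's runtime.

```python
from statistics import mean, median

# bug: (GenProg runtime in minutes, our approach's runtime in minutes)
RUNTIMES = {
    "lang11": (59.77, 64.37),
    "lang29": (29.72, 37.05),
    "lang36": (34.80, 41.08),
    "lang8": (97.50, 103.98),
    "lang9": (55.07, 70.87),
    "math30": (89.27, 90.55),
    "math44": (98.43, 176.88),
    "math46": (67.05, 720.48),
    "math79": (100.55, 119.63),
    "math86": (62.52, 71.45),
}

# Slowdown ratio per bug: our runtime divided by GenProg's runtime.
ratios = {bug: ours / genprog for bug, (genprog, ours) in RUNTIMES.items()}
genprog_times = [g for g, _ in RUNTIMES.values()]
our_times = [o for _, o in RUNTIMES.values()]

for bug, ratio in ratios.items():
    print(f"{bug}: {ratio:.2f}x")
print("median runtimes:", median(genprog_times), median(our_times))
print("mean runtimes:", mean(genprog_times), mean(our_times))
print("median ratio:", median(list(ratios.values())))
```

The median is far less affected than the mean by the single large outlier (math46), which is why the two summary rows differ so much.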

slide-59
SLIDE 59

Conclusion

Z 59

slide-60
SLIDE 60

Conclusion

Test cases often can’t distinguish between different patches.

Z 60

slide-61
SLIDE 61

Conclusion

Test cases often can’t distinguish between different patches.

We use inferred invariants to get more semantic information.

Z 61

slide-62
SLIDE 62

Conclusion

Test cases often can’t distinguish between different patches.

We use inferred invariants to get more semantic information.

We encourage exploration of semantically diverse patches.

Z 62

slide-63
SLIDE 63

Conclusion

Test cases often can’t distinguish between different patches.

We use inferred invariants to get more semantic information.

We encourage exploration of semantically diverse patches.

Invariants can effectively promote diversity & semantic exploration.

Z 63

slide-64
SLIDE 64

Conclusion

Test cases often can’t distinguish between different patches.

We use inferred invariants to get more semantic information.

We encourage exploration of semantically diverse patches.

Invariants can effectively promote diversity & semantic exploration.

No conclusive results on improvements to repair success and efficiency.

Z 64