slide-1
SLIDE 1

REPRESENTATIONS AND OPERATORS FOR IMPROVING EVOLUTIONARY SOFTWARE REPAIR

Claire Le Goues, Westley Weimer, Stephanie Forrest

http://genprog.cs.virginia.edu

slide-2
SLIDE 2

Claire Le Goues, GECCO 2012

PROBLEM: BUGGY SOFTWARE

“Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.”

– Mozilla Developer, 2005

Annual cost of software errors in the US: $59.5 billion (0.6% of GDP).
Average time to fix a security-critical error: 28 days.

90%: Maintenance
10%: Everything Else

slide-3
SLIDE 3

APPROACH: EVOLUTIONARY COMPUTATION

slide-4
SLIDE 4

Genetic Programming Search

Input: source code, specification
Output: repaired version of the program

slide-5
SLIDE 5

SEARCH SPACE

slide-6
SLIDE 6

OTHER GP PROBLEMS (GECCO GP-TRACK BEST PAPERS)

Learning: expression trees or lists
Population: 64 – 2500
Iterations: 50 – 10000
Max variant size: 16 operations, 48 operations, 17 levels, 11 levels

PROGRAM REPAIR

Learning: patches or repaired programs
Population: 40
Iterations: 10
Max variant size: unbounded
Largest benchmark program: 2.8 million lines of C code

slide-7
SLIDE 7

SEARCH SPACE

EC-based repair starts with a large genome. The starting individual is mostly correct.

slide-8
SLIDE 8

Genetic Programming Search

Input: source code, specification
Output: repaired version of the program

slide-9
SLIDE 9

OUR GOAL

IN-DEPTH STUDY OF REPRESENTATION AND OPERATORS FOR EVOLUTIONARY PROGRAM REPAIR.

slide-10
SLIDE 10

[Diagram: GP repair loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, and CROSSOVER, MUTATE, SELECT]
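The stages named in the diagram can be sketched as a generic search loop. This is a minimal illustration and not GenProg's implementation: the function name `repair_search`, the callback signatures, the rule that zero-fitness variants are discarded, and the uniform choice of parents are all assumptions made for the example. The population size and iteration count default to the values quoted elsewhere in the talk.

```python
import random
from typing import Callable, List, Optional, TypeVar

V = TypeVar("V")  # a program variant; its concrete type is left open


def repair_search(
    seed: V,
    fitness: Callable[[V], float],
    mutate: Callable[[V, random.Random], V],
    crossover: Callable[[V, V, random.Random], V],
    max_fitness: float,
    pop_size: int = 40,
    iterations: int = 10,
    rng: Optional[random.Random] = None,
) -> Optional[V]:
    """Return the first variant that reaches max_fitness, or None."""
    rng = rng or random.Random(0)
    # INPUT: the initial population is made of mutated copies of the seed.
    population: List[V] = [mutate(seed, rng) for _ in range(pop_size)]
    for _ in range(iterations):
        # EVALUATE FITNESS
        scored = [(fitness(v), v) for v in population]
        for score, variant in scored:
            if score >= max_fitness:
                return variant  # OUTPUT
        # DISCARD zero-fitness variants (assumed rule); ACCEPT the rest.
        accepted = [v for s, v in scored if s > 0] or [seed]
        # SELECT parents uniformly (a simplification), then CROSSOVER, MUTATE.
        population = []
        while len(population) < pop_size:
            p1, p2 = rng.choice(accepted), rng.choice(accepted)
            population.append(mutate(crossover(p1, p2, rng), rng))
    return None
```

In a repair setting, `fitness` would typically run the test suite on a variant and `max_fitness` would correspond to passing every test; both are left to the caller here.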

slide-11
SLIDE 11

OUR GOAL

IN-DEPTH STUDY OF REPRESENTATION AND OPERATORS FOR EVOLUTIONARY PROGRAM REPAIR.

slide-12
SLIDE 12

[Diagram: GP repair loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, and CROSSOVER, MUTATE, SELECT]

slide-13
SLIDE 13

[Diagram: GP repair loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, and CROSSOVER, MUTATE, SELECT]

slide-14
SLIDE 14

[Figure: program nodes numbered 1–12]

slide-15
SLIDE 15

REPRESENTATION

Input: [figure]

Patch: Delete(3) Replace(3,5)

AST/WP: 1 2 4 5 5’ 2 4 1 5 5 4 1 2 3

Delete(3) Insert(2,4) Replace(5,1) Insert(5,4) Insert(3,3) Delete(4) …
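A patch such as `Delete(3) Replace(3,5)` can be read as a list of edits applied to the numbered statements of the original program. The sketch below is one possible reading, not the tool's actual data structure. It assumes that `Delete(i)` removes statement i, `Replace(i, j)` puts a copy of statement j in place of statement i, and `Insert(i, j)` adds a copy of statement j after statement i; the names `Edit` and `apply_patch` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Edit:
    op: str                    # "delete", "insert", or "replace"
    dst: int                   # ID of the statement the edit targets
    src: Optional[int] = None  # ID of the statement copied (insert/replace)


def apply_patch(program: Dict[int, str], patch: List[Edit]) -> List[str]:
    """Apply edits in order and return the resulting statement sequence."""
    # Each slot starts out holding its original statement.
    slots: Dict[int, List[str]] = {sid: [text] for sid, text in program.items()}
    for e in patch:
        if e.op == "delete":
            slots[e.dst] = []
        elif e.op == "replace":
            slots[e.dst] = [program[e.src]]
        elif e.op == "insert":
            slots[e.dst] = slots[e.dst] + [program[e.src]]
        else:
            raise ValueError(f"unknown edit operator: {e.op}")
    # Source statements are always copied from the unmodified original.
    return [stmt for sid in sorted(slots) for stmt in slots[sid]]


# Toy program with five statements (illustrative data only).
program = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
patched = apply_patch(program, [Edit("delete", 3), Edit("replace", 3, 5)])
```

Under these assumptions the genome is only as long as the edit list, whereas an AST-based genome carries the whole program.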

slide-16
SLIDE 16

GENETIC OPERATORS

Mutation operators (swap, insert, delete):

  • Manipulate only existing genetic material.
  • Semantic checking improves probability that mutation is viable.

Crossover:

  • One-point: on the weighted path or edit list.
  • Patch-subset: uniform, on the edit list.
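The two crossover styles listed on this slide can be sketched on edit lists as follows. This is an illustration under stated assumptions and not the tool's code: the one-point variant picks an independent cut point in each parent (so lists of different lengths can be crossed), and the patch-subset variant concatenates the parents' edits and keeps each with probability `keep`. The function names and the `keep` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

E = TypeVar("E")  # an edit; any value works for this sketch


def one_point_crossover(
    a: Sequence[E], b: Sequence[E], rng: random.Random
) -> Tuple[List[E], List[E]]:
    """Swap the tails of two edit lists at independently chosen cut points."""
    cut_a = rng.randint(0, len(a))  # randint is inclusive on both ends
    cut_b = rng.randint(0, len(b))
    child1 = list(a[:cut_a]) + list(b[cut_b:])
    child2 = list(b[:cut_b]) + list(a[cut_a:])
    return child1, child2


def patch_subset_crossover(
    a: Sequence[E], b: Sequence[E], rng: random.Random, keep: float = 0.5
) -> Tuple[List[E], List[E]]:
    """Build each child from a uniform random subset of both parents' edits."""
    def child(first: Sequence[E], second: Sequence[E]) -> List[E]:
        return [e for e in list(first) + list(second) if rng.random() < keep]

    return child(a, b), child(b, a)
```

Both functions preserve the relative order of edits inherited from a given parent, which matters if later edits depend on earlier ones.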

slide-17
SLIDE 17

GENETIC OPERATORS

Mutation operators (swap, insert, delete):

  • Manipulate only existing genetic material.
  • Semantic checking improves probability that mutation is viable.

Crossover:

  • One-point: on the weighted path or edit list.
  • Patch-subset: uniform, on the edit list.

Aside: mutation operator selection matters!
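A mutation step that only reuses existing statements might look like the sketch below. It is not the authors' code. The operator is drawn with equal probability, the `in_scope` predicate is a hypothetical stand-in for the semantic check (its real criteria are not given on the slide), and falling back to a delete when no source statement passes the check is an arbitrary choice made for the example.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

OPERATORS = ("delete", "insert", "swap")
Mutation = Tuple[str, int, Optional[int]]  # (operator, destination, source)


def random_mutation(
    statement_ids: Sequence[int],
    rng: random.Random,
    in_scope: Callable[[int, int], bool] = lambda dst, src: True,
) -> Mutation:
    """Draw one mutation whose source is an existing program statement."""
    op = rng.choice(OPERATORS)        # equal probability per operator
    dst = rng.choice(statement_ids)   # uniform here; weighting is separate
    if op == "delete":
        return (op, dst, None)
    # No new code is invented: sources come from the program itself,
    # filtered by the assumed semantic check.
    candidates = [s for s in statement_ids if in_scope(dst, s)]
    if not candidates:
        return ("delete", dst, None)  # arbitrary fallback for this sketch
    return (op, dst, rng.choice(candidates))
```

Changing the probabilities in the `rng.choice(OPERATORS)` step is where operator selection policy would plug in.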

slide-18
SLIDE 18

Input: [Figure: program nodes numbered 1–12, marked per the legend]

Legend:

  • Likely faulty.
  • Maybe faulty.
  • Not faulty.

slide-19
SLIDE 19

Input: [Figure: program nodes numbered 1–12, marked per the legend]

Legend:

  • High change probability.
  • Low change probability.
  • Not changed.

slide-20
SLIDE 20

Input: [Figure: program nodes numbered 1–12, marked per the legend]

Legend:

  • High change probability.
  • Low change probability.
  • Not changed.

Default: 10 : 1 ratio
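The 10 : 1 default can be illustrated as weighted sampling over statements. The sketch assumes, following the hypothesis stated later in the talk, that statements executed only by failing tests get the high weight, statements also executed by passing tests get the low weight, and statements not executed by any failing test are never changed. The function names and the example coverage sets are hypothetical.

```python
import random
from typing import Dict, Iterable, Set


def statement_weights(
    all_ids: Iterable[int],
    failing_cov: Set[int],
    passing_cov: Set[int],
    high: float = 1.0,
    low: float = 0.1,  # high : low = 10 : 1
) -> Dict[int, float]:
    """Assign a change weight to every statement from test coverage."""
    weights: Dict[int, float] = {}
    for sid in all_ids:
        if sid in failing_cov and sid not in passing_cov:
            weights[sid] = high   # executed only by failing tests
        elif sid in failing_cov:
            weights[sid] = low    # executed by failing and passing tests
        else:
            weights[sid] = 0.0    # not on the failing path: never changed
    return weights


def pick_statement(weights: Dict[int, float], rng: random.Random) -> int:
    """Sample one statement ID in proportion to its weight."""
    ids = list(weights)
    # random.choices raises ValueError if every weight is zero.
    return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


# Illustrative coverage for a 12-statement program (made-up data).
weights = statement_weights(range(1, 13), failing_cov={2, 5, 6}, passing_cov={5, 6, 7})
```

Other weighting schemes only require different `high` and `low` values; for example, equal weighting sets both to the same number.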

slide-21
SLIDE 21

OUR GOAL

IN-DEPTH STUDY OF REPRESENTATION AND OPERATORS FOR EVOLUTIONARY PROGRAM REPAIR.

slide-22
SLIDE 22

EXPERIMENTAL SETUP

Benchmarks: 105 bugs in 8 real-world programs.¹

  • 5 million lines of C code, 10,000 test cases.
  • Bugs correspond to human-written repairs for regression test failures.

Default parameters, for comparison:

  • Patch representation.
  • Mutation operators selected with equal random probability. 1 mutation, 1 crossover/individual/iteration.
  • Population size: 40. 10 iterations or 12 wall-clock hours, whichever comes first. Tournament size: 2.

¹ Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer, “A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 Each.” International Conference on Software Engineering (ICSE), 2012, pp. 3 – 13.
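The default parameters on this slide can be collected in a small configuration object, together with a sketch of tournament selection at size 2. The class and function names are hypothetical, and the selection routine shows the general technique (sample k individuals, keep the fittest) and is not taken from the tool's source.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TypeVar

V = TypeVar("V")


@dataclass(frozen=True)
class RepairDefaults:
    """Default parameters as listed on the slide."""
    representation: str = "patch"
    population_size: int = 40
    max_iterations: int = 10
    max_wall_clock_hours: float = 12.0
    tournament_size: int = 2


def tournament_select(
    population: Sequence[V],
    fitness: Callable[[V], float],
    rng: random.Random,
    k: Optional[int] = None,
) -> V:
    """Sample k distinct individuals and return the fittest of them."""
    if k is None:
        k = RepairDefaults().tournament_size
    contestants = rng.sample(list(population), k)
    return max(contestants, key=fitness)
```

With k = 2, selection pressure is mild: the least fit individual can never win, but any other individual can.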

slide-23
SLIDE 23

EXPERIMENTAL SETUP

55/105 bugs repaired using default parameters. Some bugs are more difficult to repair than others!

  • Easy: 100% success rate on default parameters.
  • Medium: 50 – 100% success rate on default parameters.
  • Hard: 1 – 50% success rate on default parameters.
  • Unfixed: 0% success.

Metrics:

  • # fitness evaluations to a repair.
  • GP success rate.
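The difficulty categories can be written as a small classification function. Two points are assumptions of this sketch and are not stated on the slide: the success rate is taken to be the fraction of independent trials that produce a repair, and a rate of exactly 50% is placed in the medium category.

```python
from typing import Sequence


def success_rate(trial_outcomes: Sequence[bool]) -> float:
    """Fraction of trials that produced a repair (assumed definition)."""
    if not trial_outcomes:
        raise ValueError("need at least one trial")
    return sum(trial_outcomes) / len(trial_outcomes)


def difficulty(rate: float) -> str:
    """Map a success rate under default parameters to a category."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate == 1.0:
        return "easy"
    if rate >= 0.5:       # boundary placement is an assumption
        return "medium"
    if rate > 0.0:
        return "hard"
    return "unfixed"
```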

slide-24
SLIDE 24

RESEARCH QUESTIONS

Representation:

  • Which representation choice gives better results?
  • Which representation features contribute most to success?

Crossover: Which crossover operator is best?

Operators:

  • Which operators contribute the most to success?
  • How should they be selected?

Search space: How should the representation weight program statements to best define the search space?

slide-25
SLIDE 25

RESEARCH QUESTIONS

Representation:

  • Which representation choice gives better results?
  • Which representation features contribute most to success?

Crossover: Which crossover operator is best?

Operators:

  • Which operators contribute the most to success?
  • How should they be selected?

Search space: How should the representation weight program statements to best define the search space?

slide-26
SLIDE 26

RESEARCH QUESTIONS

Representation:

  • Which representation choice gives better results?
  • Which representation features contribute most to success?

Crossover: Which crossover operator is best?

Operators:

  • Which operators contribute the most to success?
  • How should they be selected?

Search space: How should the representation weight program statements to best define the search space?

slide-27
SLIDE 27

REPRESENTATION: RESULTS

Procedure: Compare AST/WP to PATCH on original benchmarks with default parameters. For both representations, test effectiveness of:

  1. Crossover.
  2. Semantic check.

Results:

  1. Patch outperforms AST/WP (14 – 30%).
  2. Semantic check strongly influences success rate of both representations.
  3. Crossover also improves results.

slide-28
SLIDE 28

RESEARCH QUESTIONS

Representation:

  • Which representation choice gives better results?
  • Which representation features contribute most to success?

Crossover: Which crossover operator is best?

Operators:

  • Which operators contribute the most to success?
  • How should they be selected?

Search space: How should the representation weight program statements to best define the search space?

slide-29
SLIDE 29

CROSSOVER: RESULTS

Crossover Operator     Success Rate    Fitness evaluations to repair
None                   54.4%           82.43
Default/“Uniform”      61.1%           163.05
One-Point/AST-WP       63.7%           114.12
One-Point/Patch        65.2%           118.20

slide-30
SLIDE 30

RESEARCH QUESTIONS

Representation:

  • Which representation choice gives better results?
  • Which representation features contribute most to success?

Crossover: Which crossover operator is best?

Operators:

  • Which operators contribute the most to success?
  • How should they be selected?

Search space: How should the representation weight program statements to best define the search space?

slide-31
SLIDE 31

SEARCH SPACE: SETUP

Hypothesis: statements executed only by the failing test case(s) should be mutated more often than those also executed by the passing test cases.
Procedure: examine that ratio in actual repairs.
Result: Expected: 10 : 1 vs. Actual: 1 : 1.85
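The procedure of examining the ratio in actual repairs amounts to counting where repair edits landed. The sketch below shows one way such a count could be made; the function names are hypothetical, the input data is made up for illustration, and it does not reproduce the 1 : 1.85 measurement.

```python
from typing import Iterable, Set, Tuple


def count_edit_locations(
    edited_ids: Iterable[int],
    failing_only: Set[int],
    also_passing: Set[int],
) -> Tuple[int, int]:
    """Count repair edits on failing-only vs. also-passing statements."""
    edited = list(edited_ids)
    n_failing_only = sum(1 for s in edited if s in failing_only)
    n_also_passing = sum(1 for s in edited if s in also_passing)
    return n_failing_only, n_also_passing


def as_ratio(n_failing_only: int, n_also_passing: int) -> str:
    """Format the counts as '1 : x', normalized to the failing-only count."""
    if n_failing_only == 0:
        raise ValueError("no edits on failing-only statements")
    return f"1 : {n_also_passing / n_failing_only:.2f}"


# Made-up example: statement IDs touched by edits across several repairs.
counts = count_edit_locations([2, 2, 5, 6, 5, 9], failing_only={2}, also_passing={5, 6, 9})
ratio = as_ratio(*counts)
```

A ratio below 1 : 1 in the second term would support the 10 : 1 default; a value above it, like the one reported on the slide, suggests that statements also covered by passing tests are edited more often than the default assumes.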

slide-32
SLIDE 32

SEARCH SPACE: REPAIR TIME

# fitness evaluations to repair, by search difficulty:

Weighting    Easy    Medium    Hard     All
Default      34.3    93.6      103.1    66.0
Realistic    49.1    75.7      27.1     59.5
Equal        36.3    67.0      57.7     58.6

slide-33
SLIDE 33

SEARCH SPACE: SUCCESS RATE

GP success rate, by search difficulty:

Weighting    Easy    Medium    Hard    0%      All
Default      1.00    0.78      0.24    0.00    0.62
Realistic    0.93    0.85      0.23    0.23    0.69
Equal        0.92    0.90      0.33    0.19    0.70

slide-34
SLIDE 34

CONCLUSIONS/DISCUSSION

This EC problem is atypical; atypical problems warrant study.

We studied representation and operators for EC-based bug repair.

  • These choices matter, especially for difficult bugs.
  • Incorporating all recommendations, GenProg repairs 5 new bugs; repair time decreases by 17–43% on difficult bugs.

We don’t know why some of these things are true, but:

  • We now have lots of interesting data to dig into!
  • We are currently (as we sit here) doing more and bigger runs with the new parameters on the as-yet unfixed bugs.

slide-35
SLIDE 35

PLEASE ASK QUESTIONS

slide-36
SLIDE 36

REPRESENTATION BENCHMARKS

Program       LOC      Tests    Bug               Description
gcd           22       6        infinite loop     Example
uniq-utx      1146     6        segfault          Text processing
look-utx      1169     6        segfault          Dictionary lookup
look-svr      1363     6        infinite loop     Dictionary lookup
units-svr     1504     6        segfault          Metric conversion
deroff-utx    2236     6        segfault          Document processing
nullhttpd     5575     7        buffer exploit    Webserver
indent        9906     6        infinite loop     Source code processing
flex          18775    6        segfault          Lexical analyzer generator
atris         21553    3        buffer exploit    Graphical tetris game
Total         63249

slide-37
SLIDE 37

SEARCH SPACE BENCHMARKS

Program      LOC          Tests     Bugs    Description
fbc          97,000       773       3       Language (legacy)
gmp          145,000      146       2       Multiple precision math
gzip         491,000      12        5       Data compression
libtiff      77,000       78        24      Image manipulation
lighttpd     62,000       295       9       Web server
php          1,046,000    8,471     44      Language (web)
python       407,000      355       11      Language (general)
wireshark    2,814,000    63        7       Network packet analyzer
Total        5,139,000    10,193    105

55/105 bugs repaired using default parameters.