Fixing bugs in Python programs with Genetic Improvement Program - PowerPoint PPT Presentation

Fixing bugs in Python programs with Genetic Improvement
Program size and search granularity
Saemundur Haraldsson, John Woodward, Sandy Brownlee

Overview of talk
● Developing a GI framework for Python programs
● Search granularity and program size
● Breaking and fixing small Python programs

Motivation
● GI has already been successfully applied to large software, >50K LOC (Langdon et al. & Le Goues et al.)
● Pushing GI to its lower size limit for usefulness
● "The competent programmer hypothesis" for students
● Easier to analyse exactly what the GI is doing

GI for Python

GI for Python: Entities of the population
● Evolving edit lists
  ○ A single edit: <"Edit", "Old code", "New code", "Location">
● Available edits
  ○ Copy, Swap, Delete and Replace
● Movable code
  ○ Whole lines
  ○ Boolean and comparison operators: 'or', 'and', 'not', '<=', '!=', etc.
  ○ Mathematical operators: '+', '*', '-', '%', etc.
  ○ Incremental operators: '+=', '*=', '/=', '-='
  ○ Numerical constants
● Fitness function
  ○ Number of passed test cases
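To make the edit-list representation concrete, here is a minimal sketch of one edit and of applying a Replace edit to source text. The `Edit` class, the `apply_replace` and `apply_edits` helpers, and the reading of "Location" as a 0-based (line, column) pair are assumptions made for illustration; the slides do not show the framework's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One entry of an edit list (hypothetical layout)."""
    op: str    # "REPLACE", "DELETE", "COPY" or "SWAP"
    old: str   # code expected at the location
    new: str   # code to put there instead
    line: int  # assumed: 0-based line index
    col: int   # assumed: 0-based column index


def apply_replace(source: str, edit: Edit) -> str:
    """Apply a REPLACE edit; return the source unchanged if it does not match."""
    lines = source.split("\n")
    if edit.op != "REPLACE" or not 0 <= edit.line < len(lines):
        return source
    text = lines[edit.line]
    end = edit.col + len(edit.old)
    if text[edit.col:end] != edit.old:
        return source
    lines[edit.line] = text[:edit.col] + edit.new + text[end:]
    return "\n".join(lines)


def apply_edits(source: str, edits) -> str:
    """Apply an edit list in order to produce a program variant."""
    for edit in edits:
        source = apply_replace(source, edit)
    return source


program = "def is_small(x):\n    return x < 10\n"
patched = apply_edits(program, [Edit("REPLACE", "<", ">", 1, 13)])
print(patched)
```

Because an individual is only a list of edits, the original program is never modified; each variant is rebuilt from the original source when it is evaluated.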

GI for Python: Features of the evolution
● The usual customizable properties
  ○ Population size
  ○ Number of generations
  ○ Selection
  ○ Survival / Elitism
● Offspring entities made with mutation only
  ○ Grow: Append randomly generated edits
      <REPLACE, '<', '>', 34, 12>
      -> <REPLACE, '<', '>', 34, 12><REPLACE, '2', '1', 65, 20>
  ○ Prune: Shorten the list of edits
      <REPLACE, '<', '>', 34, 12><REPLACE, '2', '1', 65, 20>
      -> <REPLACE, '<', '>', 34, 12>
  ○ Single edit mutation: Randomly select 1 edit and change it slightly
      <REPLACE, '<', '>', 34, 12><REPLACE, '2', '1', 65, 20>
      -> <REPLACE, '<', '==', 34, 12><REPLACE, '2', '1', 65, 20>
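The three mutation operators can be sketched as follows. This is an illustration under assumptions, not the framework's code: edits are plain tuples, `OPERATORS` is a hypothetical pool, `random_edit` picks arbitrary locations instead of scanning a real program, and Prune is assumed to drop a single randomly chosen edit.

```python
import random

# Hypothetical pool of swappable operators; a real run would draw
# candidates from the target program.
OPERATORS = ["<", ">", "<=", ">=", "==", "!="]


def random_edit(rng, n_lines=100, n_cols=40):
    """Build a random REPLACE edit (locations are arbitrary here)."""
    old, new = rng.sample(OPERATORS, 2)
    return ("REPLACE", old, new, rng.randrange(n_lines), rng.randrange(n_cols))


def grow(edits, rng):
    """Grow: append one randomly generated edit."""
    return edits + [random_edit(rng)]


def prune(edits, rng):
    """Prune: shorten the list (assumed: remove one random edit)."""
    if not edits:
        return list(edits)
    i = rng.randrange(len(edits))
    return edits[:i] + edits[i + 1:]


def single_edit_mutation(edits, rng):
    """Pick one edit and change its replacement code slightly."""
    if not edits:
        return list(edits)
    i = rng.randrange(len(edits))
    op, old, new, line, col = edits[i]
    choices = [o for o in OPERATORS if o not in (old, new)]
    mutated = (op, old, rng.choice(choices), line, col)
    return edits[:i] + [mutated] + edits[i + 1:]


rng = random.Random(0)
parent = [("REPLACE", "<", ">", 34, 12)]
grown = grow(parent, rng)
pruned = prune(grown, rng)
tweaked = single_edit_mutation(parent, rng)
```

Each operator returns a new list, so a parent can produce several offspring without being altered.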

Search Granularity and Program Size

Search Granularity
[Figure: two dimensions of search granularity]
● Step size of search algorithm: single point mutations, generation restart
● Size of code chunks being moved: characters, operators such as +-*/, variable names, lines, code blocks

Search Granularity: Experimental setup
Movable code
● Random line edits
● Like for like line edits
● Change operators: math, boolean and incremental.
Step size (mutation choices)
● Grow and prune only (variable size)
● Single edit mutations and Grow (single edit growth)
● Both above
Combinations tested (X), movable code against step size (All available, Grow and Prune, Grow and Single edit)
● Random lines: X under one step size
● Like for like lines: X X X
● Operators and numbers: X X X

Program size
● Lines of Code
  ○ Ranging from 5 - 100
● Implemented from various online sources
  ○ "100+ python challenging programming exercises"
  ○ www.ActiveState.com -- code recipes
  ○ www.Cprogramming.com -- challenge
● Beginner level programs that contain common code elements
  ○ Simple numerical calculations: Factorial
  ○ Mathematical constants approximations: pi, e, sqrt(2)
  ○ Simple text input Calculator
  ○ etc.

Breaking and Fixing

Breaking and fixing: The breaking process
● Start with correct implementation
  ○ Used as an oracle to produce a test suite
● GI applied with reversed objectives.
  ○ Evaluated with unittest
● Evolution is stopped if a valid break is found.
● A program is broken if it:
  ○ Fails on at least 1 test case
  ○ Does not produce run time errors on at least half of the test suite
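The two conditions for a valid break can be checked as in the sketch below. It uses plain function calls rather than unittest to stay short, and the names `run_suite`, `is_valid_break`, the example programs, and the input list are all assumptions for illustration.

```python
def run_suite(candidate, inputs, oracle):
    """Count passed tests and run-time errors, using the oracle's outputs."""
    passed = errors = 0
    for x in inputs:
        try:
            if candidate(x) == oracle(x):
                passed += 1
        except Exception:
            errors += 1
    return passed, errors


def is_valid_break(candidate, inputs, oracle):
    """Broken = fails at least one test, yet runs cleanly on at least half."""
    passed, errors = run_suite(candidate, inputs, oracle)
    fails_some = passed < len(inputs)
    runs_enough = errors <= len(inputs) / 2
    return fails_some and runs_enough


def factorial(n):
    """Correct implementation, used as the oracle."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def off_by_one(n):
    """Variant whose loop stops one step early."""
    result = 1
    for i in range(2, n):
        result *= i
    return result


def crashes(n):
    """Variant that raises on every input."""
    return 1 // 0


inputs = [0, 1, 2, 3, 5]  # assumed test inputs
print(is_valid_break(off_by_one, inputs, factorial))  # True
print(is_valid_break(crashes, inputs, factorial))     # False
print(is_valid_break(factorial, inputs, factorial))   # False
```

The second condition filters out variants that merely crash, so that the fixing stage starts from a program that still runs but gives wrong answers.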

Breaking and fixing: The fixing process
● Objectives are:
  ○ Number of test cases passed
  ○ Size of edit list, i.e. number of changes to the broken program
● Runs for 50 generations (population of 20)
● Returns the overall best solution.
  ○ Fewest number of changes made to the program to pass the greatest number of test cases.
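One way to read "fewest changes to pass the greatest number of test cases" is a lexicographic ordering: most tests passed first, shortest edit list as tie-break. The sketch below assumes that reading; the slides do not say whether the two objectives are combined this way or handled as a true multi-objective search, and `best_solution` is a hypothetical helper.

```python
def best_solution(evaluated):
    """Pick the best (edit_list, tests_passed) pair.

    Assumed ordering: more tests passed wins, then the shorter edit list.
    """
    return min(evaluated, key=lambda item: (-item[1], len(item[0])))


# Hypothetical evaluated population: (edit list, number of tests passed).
population = [
    ([("REPLACE", "2", "1", 2, 15), ("REPLACE", "+", "-", 3, 9)], 5),
    ([("REPLACE", "2", "1", 2, 15)], 5),
    ([], 1),
]

edits, passed = best_solution(population)
print(len(edits), passed)  # 1 5
```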

Experiments, Line for line: Broken and Fixed (100 experiments)

Program             Size (LOC)  Avg. size of breaker  Avg. evals -> fixed  Avg. proportion of error variants  Avg. size of fixer
count_digs_letters  9           1                     15.2                 75%                                2.01
dict_square         5           1                     6.3                  68%                                1.5
divisable_5         7           1                     10.2                 81%                                3.7
even_digits         13          1                     4                    74%                                1.2
factorial           5           N/A                   N/A                  100%                               N/A
formula_this        8           1                     6.2                  72%                                4.1

Experiments, Line for line: Broken and Fixed

Program             Size (LOC)  Avg. size of breaker  Avg. evals -> fixed  Avg. proportion of error variants  Avg. size of fixer
lines_2_list        12          1                     10.9                 67%                                4.01
list_tuple          5           N/A                   N/A                  100%                               N/A
make_multiMatrix    8           1                     14.5                 80%                                3.4
sort_unique         5           1                     13.2                 45%                                2.13
sort_words          5           1                     8.4                  51%                                1.25

Experiments, Summary of line for line
● Breaking
  ○ Fitness is effectively binary: broken or not broken
    ■ pass all or no test cases
  ○ Highly unlikely programming errors.
    ■ e.g. forgetting a complete line?
  ○ Takes only one line out of place to break.
  ○ If a valid break exists it is found in first generation.
● Fixing
  ○ Takes longer to find the fix than the break
  ○ High proportion of variants do not run
    ■ and those that run are mostly semantically identical, i.e. loads of redundancy

Experiments, finer grained
Case example: Dictionary of squares

    def dict_squares(n):
        d=dict()
        for i in range(1,n+1):
            d[i]=i*i
        return d

● Input: single integer n
● Output: dictionary of all the numbers squared from 1 to n
● 5 test cases which include boundary inputs, n = 0 and 1
● Program was broken by replacing the first occurrence of 1 with 2.
  ○ <REPLACE, '1', '2', 2,15>

    def dict_squares(n):
        d=dict()
        for i in range(2,n+1):
            d[i]=i*i
        return d

● Then the GI was run 100 times to fix.
  ○ No elitism

Experiments, Finer grained: Dictionary of squares

Experiments, finer grained: Dictionary of squares

Experiments, finer grained
Case example: A simple text input calculator
● ~100 LOC
● Inserted bugs with 4 edits
  ○ Forced by increasing the required failed test cases
  ○ <REPLACE, '*', '+', 24, 4><REPLACE, '-', '+', 22, 4><REPLACE, '/', '**', 36, 4><REPLACE, '+', '%', 20, 4>
● Fails all test cases (19)
  ○ At least one test case for each function: +, -, *, and /
  ○ and the rest combines them
● Again: GI run 100 times to fix
  ○ Now with elitism

Experiments, finer grained

Experiments, summary of finer grained
● Sometimes finds mutations that pass some test cases
  ○ Fitness is not always binary, rather a step: passes 1 or 2 boundary cases.
  ○ More bugs -> more needles
● Much more realistic programming errors
  ○ typing "=" instead of "+=" or "<" instead of "<="
● Only one edit needed to break
[Plot: Fitness against Gen.]

Experiments, summary of finer grained
● We can nearly always find a valid break
  ○ Syntactically correct programs
  ○ High proportion of variants run
● For such small programs the fix is usually converting it back to the original.
  ○ No clever fixes that weren't foreseen.
● The fix is most often found in the first 5-10 generations.
● Still, finding the fix takes much longer than finding the break.
  ○ In practice a "needle/s in a haystack" fitness function that is largely level.

Summary

Summary
● GI for Python programs is doable and promising
● Tested on multiple small programs
● Considered 2 dimensions of search granularity
  ○ Step size
  ○ Movable code
● Line based GI is not a realistic option for small programs
  ○ Where the boundary of size lies remains to be confirmed
● Smaller programs call for finer grained searches

Thanks for listening
Questions?
