Program Boosting: Program Synthesis via Crowd-Sourcing


  1. Program Boosting: Program Synthesis via Crowd-Sourcing Robert A. Cochran, Loris D’Antoni, Benjamin Livshits, David Molnar and Margus Veanes Presented by: Sam Witty, Shehzaad Dhuliawala and Samer Nashed

  2. Outline Introduction (Research Question, Key Ideas, Contributions) Background (Genetic Programming and Regular Expressions) Motivating Example Evaluation Discussion

  3. Research Question: Can crowd-sourced solutions to programming tasks be combined automatically to boost performance?

  6. Key Idea: Many common programming tasks are
     1. Surprisingly complex, such that even expert programmers may struggle
     2. Easily specified (at least to a good approximation) in English
     3. Nuanced enough that different programmers will fail in different ways
     These attributes make such tasks prime candidates for program synthesis via genetic programming.

  7. Conventional Programming: Blood, sweat, and tears; Mostly working program. (Image sources: 4H, The Economist)

  8. Program Boosting: Genetic Programming; The one true solution; Flawed programs. (Image sources: 4H, The Economist)

  14. Contributions
      1. Proposal of a new technique: Program Boosting
      2. Implementation of genetic programming for regular expressions using custom-designed crossover and mutation operations
      3. Proposal of a new genetic programming paradigm in which the fitness function is evolved along with the candidate programs
      4. Release of the tool, CROWDBOOST
      5. First use of genetic programming on automata over complex alphabets, in this case UTF-16
      6. Evaluation of the proposed method on 465 pairs of regular expressions

  15. Outline Introduction (Research Question, Key Ideas, Contributions) Background (Genetic Programming and Regular Expressions) Motivating Example Evaluation Discussion

  20. Background: Genetic Programming
      Genetic Programming is a technique wherein a computer program is evolved from some seed, often using a genetic algorithm.
      Three main components:
      Crossover - Merge candidate programs
      Mutation - Stochastically alter candidate programs
      Fitness - Evaluate candidate programs
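The three components above can be sketched as a generic loop. This is a minimal illustration, not the CROWDBOOST implementation: the function names (`evolve`, `one_point_crossover`, `flip_one_bit`), the bit-tuple candidates, and the keep-the-best selection rule are all assumptions chosen to keep the example small.

```python
import random

def one_point_crossover(a, b, rng):
    """Crossover: merge two candidates by joining a prefix of a to a suffix of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one_bit(candidate, rng):
    """Mutation: stochastically alter one position of a candidate."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + (1 - candidate[i],) + candidate[i + 1:]

def evolve(seeds, crossover, mutate, fitness, generations=10, keep=4, rng=None):
    """Evolve candidates from the seeds and return the fittest one found."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    population = list(seeds)
    for _ in range(generations):
        children = [
            crossover(a, b, rng)
            for i, a in enumerate(population)
            for j, b in enumerate(population)
            if i != j
        ]
        mutants = [mutate(c, rng) for c in population + children]
        # Parents stay in the pool, so the best fitness never decreases.
        pool = list(dict.fromkeys(population + children + mutants))
        pool.sort(key=fitness, reverse=True)
        population = pool[:keep]
    return population[0]

# Toy task (hypothetical): candidates are bit tuples, fitness counts the ones.
SEEDS = [(0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0), (0, 0, 0, 1, 0, 0)]
BEST = evolve(SEEDS, one_point_crossover, flip_one_bit, fitness=sum)
```

In the setting of these slides, the candidates would be regular expressions (or their automata) rather than bit tuples, and the fitness would come from labeled example strings.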

  21. Animation (Source: Rafael Matsunaga)

  22. Background: SFA and Regex [Figure: symbolic finite automaton (SFA) with states 0 to 11; transitions are labeled [0-9], with an optional {-} branch through state 4] Corresponding regex: [0-9]{3}(-)?[0-9]{7}
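A small simulation can make the correspondence between the automaton and the regex concrete. The sketch below is an assumption-laden reading of the figure: transitions carry predicates over characters rather than single symbols, state 4 is reached only through the dash, and state 11 is the only accepting state. The names `build_phone_sfa` and `accepts` are hypothetical.

```python
import re

def is_digit(ch):
    """Predicate for the [0-9] label (ASCII digits only)."""
    return "0" <= ch <= "9"

def is_dash(ch):
    """Predicate for the {-} label."""
    return ch == "-"

def build_phone_sfa():
    """Return (transitions, start state, accepting states) for [0-9]{3}(-)?[0-9]{7}."""
    transitions = {state: [] for state in range(12)}
    for state in (0, 1, 2):
        transitions[state].append((is_digit, state + 1))
    transitions[3].append((is_dash, 4))   # optional dash detour
    transitions[3].append((is_digit, 5))  # or go straight to the next digit
    transitions[4].append((is_digit, 5))
    for state in range(5, 11):
        transitions[state].append((is_digit, state + 1))
    return transitions, 0, {11}

def accepts(sfa, text):
    """Run the automaton on text, tracking the set of reachable states."""
    transitions, start, finals = sfa
    current = {start}
    for ch in text:
        current = {
            target
            for state in current
            for predicate, target in transitions[state]
            if predicate(ch)
        }
        if not current:
            return False
    return bool(current & finals)

PHONE_SFA = build_phone_sfa()
PHONE_REGEX = re.compile(r"[0-9]{3}(-)?[0-9]{7}")
```

Because labels are predicates, the same structure works for large alphabets such as UTF-16 without listing every character on every edge.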

  23. Outline Introduction (Research Question, Key Ideas, Contributions) Background (Genetic Programming and Regular Expressions) Motivating Example Evaluation Discussion

  24. Motivating Example Determine whether a string is a valid phone number. Ex: 111-111-1111, 1111111111

  25. Method Overview: Crossover, Mutation, Crowdsource the fitness, Evaluate, New Gen

  26-29. [Figure sequence: components A1, A2, B1, B2]

  30. Identifying components [Figure: two SFAs with states 0 to 11 and transitions labeled [0-9]; the first has an optional {-} branch through state 4, the second through state 7]

  31. Strongly Connected Components [Figure: the same two SFAs]

  32-33. Stretches [Figure: the same two SFAs]

  34-35. Single Entry - Single Exit [Figure: the same two SFAs]

  36. [Figure: the same two SFAs]

  37. [Figure: SFA produced by crossover, with optional {-} branches through states 4 and 7] Resulting Regex: [0-9]{3}-?[0-9]{3}-?[0-9]{4}
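The effect of this crossover can be imitated at the level of regex fragments. This is a simplified stand-in, not the SFA-level operator from the slides: each parent is written as a list of aligned fragments (an assumption), the second parent's regex is inferred from its figure, and the child is assumed to accept three digits, an optional dash, three digits, an optional dash, and four digits.

```python
import re

# Each parent as aligned regex fragments (a stand-in for SFA components).
PARENT_A = ["[0-9]{3}", "-?", "[0-9]{3}", "", "[0-9]{4}"]  # dash only after digit 3
PARENT_B = ["[0-9]{3}", "", "[0-9]{3}", "-?", "[0-9]{4}"]  # dash only after digit 6

def crossover(left, right, cut):
    """Take the fragments before `cut` from left and the rest from right."""
    return left[:cut] + right[cut:]

def accepts(fragments, text):
    """True if the concatenated fragments match the whole string."""
    return re.fullmatch("".join(fragments), text) is not None

CHILD = crossover(PARENT_A, PARENT_B, cut=3)
```

Neither parent accepts 111-111-1111 on its own, while the child does, which is the kind of boost the method aims for.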

  38. Mutations 1. Diminishing 2. Augmenting

  39. Example Mutation [Figure: SFA with states 0 to 12 and optional {-} branches through states 4 and 8; the first transition is labeled [0-9]] Negative example: 012-456-7890. Assume numbers cannot begin with 0.

  40. Example Mutation [Figure: the same SFA after mutation; the first transition is now labeled [1-9]] Negative example: 012-456-7890. Assume numbers cannot begin with 0.
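A diminishing mutation of this kind can be sketched as removing one character from the set allowed at a given step. The representation below (a list of allowed-character sets with an optional flag) and the names `to_regex`, `accepts`, and `diminish` are assumptions for illustration; the slides do not say how the tool picks which transition to narrow, so the position is passed in by hand.

```python
import re

DIGITS = frozenset("0123456789")
DASH = frozenset("-")

# One (allowed characters, optional?) pair per step: 3 digits, dash?, 3 digits, dash?, 4 digits.
PATTERN = (
    [(DIGITS, False)] * 3 + [(DASH, True)]
    + [(DIGITS, False)] * 3 + [(DASH, True)]
    + [(DIGITS, False)] * 4
)

def to_regex(pattern):
    """Render the steps as a regex; assumes every character set is non-empty."""
    parts = []
    for chars, optional in pattern:
        char_class = "[" + "".join(re.escape(c) for c in sorted(chars)) + "]"
        parts.append(char_class + ("?" if optional else ""))
    return "".join(parts)

def accepts(pattern, text):
    """True if the pattern matches the whole string."""
    return re.fullmatch(to_regex(pattern), text) is not None

def diminish(pattern, position, char):
    """Diminishing mutation: forbid `char` at step `position`."""
    chars, optional = pattern[position]
    mutated = list(pattern)
    mutated[position] = (chars - {char}, optional)
    return mutated

# Narrow the first step from [0-9] to [1-9] to reject the negative example.
MUTATED = diminish(PATTERN, 0, "0")
```

An augmenting mutation would do the reverse, adding characters to a step so that a rejected positive example becomes accepted.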

  43. Fitness Function
      ● A simple approach to calculate fitness for regular expressions would be to calculate accuracy on the training set, but this doesn’t scale well.
      ● Instead, evaluate the cardinality of sets: Fitness(A) = (|L(A) ∩ P| + |N \ L(A)|) / |P ∪ N|, where L(A) is the language accepted by candidate A, and P and N are the sets of positive and negative examples.
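For finite example sets, the formula reduces to counting accepted positives and rejected negatives. The sketch below assumes P and N are small, disjoint sets of strings and that a candidate is given as a regex; the example strings and the function name `fitness` are hypothetical.

```python
import re

def fitness(pattern, positives, negatives):
    """(accepted positives + rejected negatives) / size of the union of both sets."""
    regex = re.compile(pattern)
    pos, neg = set(positives), set(negatives)
    accepted_pos = sum(1 for s in pos if regex.fullmatch(s))
    rejected_neg = sum(1 for s in neg if not regex.fullmatch(s))
    return (accepted_pos + rejected_neg) / len(pos | neg)

# Hypothetical example sets for the phone number task.
POSITIVES = {"111-111-1111", "1111111111"}
NEGATIVES = {"012-456-7890", "11-11"}

SCORE_PARENT = fitness(r"[0-9]{3}-?[0-9]{7}", POSITIVES, NEGATIVES)
SCORE_CHILD = fitness(r"[0-9]{3}-?[0-9]{3}-?[0-9]{4}", POSITIVES, NEGATIVES)
SCORE_MUTATED = fitness(r"[1-9][0-9]{2}-?[0-9]{3}-?[0-9]{4}", POSITIVES, NEGATIVES)
```

On these four examples, only the mutated candidate gets every example right; the other two each misclassify one string.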

  44. Outline Introduction (Research Question, Key Ideas, Contributions) Background (Genetic Programming and Regular Expressions) Motivating Example Evaluation Discussion

  47. Evaluation: Regular expressions were pulled from Regexlib.com, blogs, Stack Overflow, and a Bountify task set by the authors. In total, 465 program pairs were used for a variety of tasks (phone numbers, dates, email addresses, URLs). Mechanical Turk was used to generate new examples, thus evolving the fitness function. Examples were accepted if 60% of turkers reached consensus.
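The consensus rule can be sketched as a simple vote count. The slide gives only the 60% figure, so the details here are assumptions: each worker casts one boolean vote on whether a string is a valid example, "at least 60%" is taken as the threshold, and examples without consensus are discarded. The function name `consensus_label` is hypothetical.

```python
def consensus_label(votes, threshold=0.6):
    """Return True or False if that label reaches the threshold share of votes, else None."""
    total = len(votes)
    if total == 0:
        return None
    yes = sum(1 for vote in votes if vote)
    no = total - yes
    if yes / total >= threshold:
        return True
    if no / total >= threshold:
        return False
    return None  # no consensus: the example is not used

# Three of five workers say "valid", so the example is accepted as positive.
LABEL = consensus_label([True, True, True, False, False])
```

Accepted examples would then be added to the positive or negative set, which is how the fitness function changes between generations.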

  48. Evaluation

  49. Results - Accuracy
