Learning Programs from Noisy Data
Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause (ETH Zurich)
Why learn programs from examples?

Input/output examples are often easier to provide than a full specification (e.g., in FlashFill). Given the examples, the task is to learn a function p such that

    p(input_1) = output_1, ..., p(input_n) = output_n

But the user may make a mistake in the examples. The actual goal is to produce the program p that the user really wanted and tried to specify. This exposes the key problem of synthesis: it overfits and is not robust to noise.
How the approaches compare

                              Handling errors    Number of    Learned program
                              in the dataset     examples     complexity
    Program synthesis (PL)    no                 tens         interesting programs
    Deep learning (ML)        yes                millions     simple, but unexplainable functions
    This paper                yes                millions     interesting programs

This paper bridges a gap between ML and PL and advances both areas: it expands the capabilities of existing synthesizers and reaches new state-of-the-art precision for programming tasks.
The approach has three ingredients, which generalize existing works:

  ○ Handling noise: errors in the training dataset
  ○ New probabilistic models: learning statistical models on data
  ○ Handling large datasets: synthesis with millions of examples

Architecturally, the input/output examples (some possibly incorrect) flow through a representative dataset sampler into a program generator; the synthesized program p parametrizes a probabilistic model.
Handling noise

A synthesizer takes input/output examples and a domain-specific language (DSL) and finds a program p such that p(input_i) = output_i for all examples. If one example is incorrect (e.g., a typo), the intended program has p(x) ≠ y on that example. A noise-tolerant synthesizer can give a new kind of feedback: it marks each example as satisfied (✔) or suspicious (❌) and can remove the suspicious examples.

Alternatively, a program can satisfy every example, including the wrong one, by hardcoding the dataset:

    if input = input_1 return output_1
    if input = input_2 return output_2
    ...
    return output_n

Issue: such a program makes no errors on the dataset, and it may be the only solution to the plain minimization problem.
Let D be the dataset of input/output examples (some incorrect) and P the space of possible programs in the DSL. The naive formulation

    pbest = arg min_{p∈P} errors(D, p)

admits overly long programs that hardcode the inputs/outputs, so synthesis must penalize such answers.

Our problem formulation:

    pbest = arg min_{p∈P} errors(D, p) + λ·r(p)

where errors(D, p) is the error rate of p on D, r(p) is a regularizer that penalizes long programs (here, the number of instructions), and λ is a regularization constant. The sum is the total solution cost.
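To make the objective concrete, here is a minimal brute-force sketch in Python (illustrative only; the paper's synthesizer uses an SMT solver, not enumeration). The candidate set, the toy dataset with one noisy pair, and the instruction counts are assumptions for the example.

    def errors(dataset, program):
        """Count the examples the program gets wrong."""
        return sum(1 for x, y in dataset if program(x) != y)

    def synthesize(dataset, candidates, lam=0.6):
        """Return the candidate minimizing errors(D, p) + lam * r(p).

        Each candidate is (name, function, number_of_instructions)."""
        return min(candidates,
                   key=lambda c: errors(dataset, c[1]) + lam * c[2])

    # Hypothetical dataset: "x + 1" with one noisy pair (5, 99).
    D = [(1, 2), (2, 3), (3, 4), (5, 99)]
    candidates = [
        ("x + 1", lambda x: x + 1, 1),                          # 1 error, cost 1.6
        ("lookup", lambda x: {1: 2, 2: 3, 3: 4, 5: 99}[x], 4),  # 0 errors, cost 2.4
    ]
    print(synthesize(D, candidates)[0])  # regularization prefers "x + 1"

Without the λ·r(p) term, the hardcoded "lookup" (zero errors) would win; with it, the short program that tolerates the noisy example is preferred.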
Encoding for an SMT solver

For a candidate program p ∈ P_r (programs with r instructions), the formula given to the SMT solver encodes one error indicator per example:

    err_1 = if p(input_1) = output_1 then 0 else 1
    err_2 = if p(input_2) = output_2 then 0 else 1
    err_3 = if p(input_3) = output_3 then 0 else 1
    errors = err_1 + err_2 + err_3

The synthesizer then asks a number of SMT queries in increasing order of total cost errors + λ·r. For example, for λ = 0.6 the costs are:

                  number of errors
     r            0      1      2      3
     1            0.6    1.6    2.6    3.6
     2            1.2    2.2    3.2    4.2
     3            1.8    2.8    3.8    4.8

Querying in increasing cost order gives UNSAT at costs 0.6, 1.2, 1.6, and 1.8, then SAT at cost 2.2. The first SAT query yields the best program: here it has two instructions and makes one error.
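A sketch of this query schedule, under assumptions: smt_check(r, e) is a placeholder for the real Z3 encoding and should return a program with r instructions making at most e errors, or None if the query is UNSAT.

    def synthesize_increasing_cost(smt_check, max_r, max_errors, lam=0.6):
        """Ask queries in increasing order of cost = errors + lam * r.

        Because every cheaper (r, e) combination was already UNSAT,
        the first SAT query is optimal."""
        grid = [(e + lam * r, r, e)
                for r in range(1, max_r + 1)
                for e in range(0, max_errors + 1)]
        for cost, r, e in sorted(grid):
            program = smt_check(r, e)
            if program is not None:
                return program, cost
        return None, float("inf")

    # Stub oracle reproducing the table above: SAT only for r = 2, e >= 1.
    demo = lambda r, e: "p*" if r == 2 and e >= 1 else None
    print(synthesize_increasing_cost(demo, 3, 3))  # ('p*', 2.2)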
Instantiation: take an actual synthesizer and show that we can make it handle noise.
We built a synthesizer for BitStream programs using Z3, similar to Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11]; it synthesizes short loop-free programs. Example program:

    function check_if_power_of_2(int32 x) {
      var o = add(x, 1)
      return bitwise_and(x, o)
    }

Question: how well does our synthesizer discover noise in programs from prior work? [Plot: noise detection quality as a function of λ; λ is picked empirically in the best-performing region.] Handling noise enables us to solve new classes of problems beyond normal synthesis.
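For reference, a direct Python transcription of the example above (my port, not from the paper). Note that x & (x + 1) == 0 holds exactly when x is all-ones in binary, i.e., when x + 1 is a power of two.

    def check_if_power_of_2(x: int) -> int:
        """Port of the synthesized BitStream program.

        Returns 0 exactly when x == 2**n - 1 for some n,
        i.e., when x + 1 is a power of two."""
        o = (x + 1) & 0xFFFFFFFF   # add(x, 1) on int32
        return x & o               # bitwise_and(x, o)

    assert check_if_power_of_2(0b0111) == 0   # 7 + 1 == 8 is a power of two
    assert check_if_power_of_2(0b0110) != 0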
Handling large datasets

With a large number of examples the formulation is

    pbest = arg min_{p∈P} cost(D, p)

where D contains millions of input/output examples. Computing cost(D, p) once is O(|D|), which makes synthesis practically intractable. Key idea: iterative synthesis on a fraction of the examples.
Two components cooperate:

  ○ Program generator: a synthesizer for a small number of examples; given a dataset d, it finds the best program pbest = arg min_{p∈P} cost(d, p).
  ○ Dataset sampler: picks a dataset d ⊆ D. We introduce a representative dataset sampler, generalizing a user who provides input/output examples.

Start with a small random sample d ⊆ D, then iteratively generate programs and samples: the generator produces p_1 on d, the sampler picks a new d, the generator produces p_2, and so on, until the loop converges to pbest. This algorithm generalizes synthesis-by-example techniques (see the sketch below).
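A sketch of the alternating loop, under assumptions: synthesize(d) is the small-dataset synthesizer from earlier, representative_sample(D, programs, k) is the sampler described next, and stopping when the sample reproduces an already-seen program is my simplification of the paper's convergence test.

    import random

    def iterative_synthesis(D, synthesize, representative_sample,
                            k=100, max_iters=10, seed=0):
        """Alternate between synthesizing on a small d and re-sampling d."""
        rng = random.Random(seed)
        d = rng.sample(D, k)                  # start from a random sample
        programs = []
        for _ in range(max_iters):
            p = synthesize(d)                 # best program on the small set
            if p in programs:                 # the sample already agrees with p
                return p
            programs.append(p)
            d = representative_sample(D, programs, k)
        return programs[-1]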
Representative dataset sampler. Idea: pick a small dataset d on which the already generated programs p_1, ..., p_n behave as they do on the full dataset:

    d = arg min_{d ⊆ D} max_{i∈1..n} | cost(d, p_i) - cost(D, p_i) |

[Bar charts: the costs of p_1 and p_2 on the small dataset d match their costs on the full dataset D.]

Theorem: this sampler shrinks the candidate program search space. In the evaluation it yields a significant speedup of synthesis. The objective connects to empirical risk minimization. A greedy sketch of the sampler follows below.
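A sketch of the sampler, under assumptions: instead of solving the arg min over all subsets exactly, it does random search over candidate subsets and keeps the one with the smallest worst-case cost gap; the paper's actual sampler is more sophisticated, and cost here is simply the error rate.

    import random

    def cost(dataset, program):
        """Error rate of the program on the dataset."""
        return sum(1 for x, y in dataset if program(x) != y) / len(dataset)

    def representative_sample(D, programs, k=100, tries=50, seed=0):
        """Approximate d = argmin_{d ⊆ D} max_i |cost(d, p_i) - cost(D, p_i)|."""
        rng = random.Random(seed)
        full = [cost(D, p) for p in programs]   # reference costs on all of D
        best_d, best_gap = None, float("inf")
        for _ in range(tries):
            d = rng.sample(D, k)
            gap = max(abs(cost(d, p) - f) for p, f in zip(programs, full))
            if gap < best_gap:
                best_d, best_gap = d, gap
        return best_d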
New probabilistic models

Recall the architecture: input/output examples (some incorrect) flow through the representative dataset sampler into the program generator, and the synthesized program p parametrizes a probabilistic model.
A new breed of tools learns from large existing codebases (e.g., "Big Code") to make predictions about programs: first learn a model from the codebase, then predict with that model. Prior tools hard-code the model and achieve low precision. Essentially, they remember a mapping from a context in the training data to a prediction (with probabilities):

  ○ Hindle et al. [ICSE'12] learn a mapping from preceding tokens: the model predicts slice when it sees it after "+ name .". This model comes from NLP and has very low precision.
  ○ Raychev et al. [PLDI'14] learn a mapping from preceding API calls per object: the model predicts slice when it sees it after charAt. This relies on static analysis and has low precision for JavaScript.

Core problem: existing machine learning models are limited and not expressive enough.
Our approach: learn a program that parametrizes a probabilistic model that makes predictions, i.e., learn the mapping itself, then predict with this model. Prior models are described by simple hard-coded programs; we learn a better program.

Training example: compute the context of a completion point with a program p, and learn a mapping from that context to the completion (e.g., from toUpperCase to slice). Evaluation example: compute the context with the same program p, and predict the completion (slice).

Synthesis of the probabilistic model can be done with the same optimization problem as before:

    pbest = arg min_{p∈P} errors(D, p) + λ·r(p)

except that D is now evaluation data (input/output examples) and errors(D, p) is the cost cost(D, p) of the model parametrized by p. A sketch of such a model appears below.
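A sketch of how a conditioning program p parametrizes a predictive model, under assumptions: p maps an input (a code context) to whatever features it extracts, training counts (context, completion) pairs, prediction returns the most frequent completion, and errors(D, p) is the fraction of held-out examples predicted wrongly.

    from collections import Counter, defaultdict

    def train_model(examples, p):
        """Map each context computed by program p to completion counts."""
        table = defaultdict(Counter)
        for inp, completion in examples:
            table[p(inp)][completion] += 1
        return table

    def predict(table, p, inp):
        counts = table.get(p(inp))
        return counts.most_common(1)[0][0] if counts else None

    def errors(eval_examples, p, table):
        """Error rate of the model parametrized by p; this is the
        cost(D, p) minimized when synthesizing p itself."""
        wrong = sum(1 for inp, y in eval_examples
                    if predict(table, p, inp) != y)
        return wrong / len(eval_examples)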
Putting it together: DeepSyn

The techniques (handling noise, synthesizing a model, the representative dataset sampler) are generally applicable to program synthesis. We apply them in a "Big Code" tool called DeepSyn, trained on 100,000 JavaScript files from GitHub: the program synthesizer (representative dataset sampler + program generator) finds the best program p, and the model is then trained on the full data D with p. Evaluation: 50,000 files (not used in training or synthesis) on an API completion task.
    Conditioning program p                                        Accuracy
    Last two tokens, Hindle et al. [ICSE'12]                      22.2%
    Last two APIs per object, Raychev et al. [PLDI'14]            30.4%
    Program synthesis with noise (this work)                      46.3%
    Program synthesis with noise + dataset sampler (this work)    50.4%

We can explain the best program: it looks at the APIs preceding the completion position and at the tokens prior to those APIs.
Summary

  ○ Handling noise: extending synthesizers to handle incorrect examples
  ○ Handling large datasets: scalability via the representative dataset sampler
  ○ Second-order learning: synthesis of probabilistic models parametrized by a program p

This bridges the gap between ML and PL and advances both areas.

The best conditioning program found in the evaluation (its instructions navigate the program tree and write the observed actions and values):

    Left PrevActor WriteAction WriteValue PrevActor WriteAction PrevLeaf WriteValue PrevLeaf WriteValue