SLIDE 1

Learning Programs from Noisy Data

Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause (ETH Zurich)

SLIDES 2–6

Why learn programs from examples?

Input/output examples: it is often easier to provide examples than a specification (e.g. in FlashFill), but the user may make a mistake in the examples.

Learn a function p such that p(x₁) = y₁, …, p(xₙ) = yₙ.

Actual goal: produce the p that the user really wanted and tried to specify.

Key problem of synthesis: it overfits and is not robust to noise.

SLIDES 7–13

Learning Programs from Data: Defining Dimensions

                           Handles errors     Number of     Learned program
                           in the dataset     examples      complexity
  Program synthesis (PL)   no                 tens          interesting programs
  Deep learning (ML)       yes                millions      simple, but unexplainable functions
  This paper               yes                millions      interesting programs

This paper bridges a gap between ML and PL and advances both areas: it expands the capabilities of existing synthesizers and reaches new state-of-the-art precision for programming tasks.

SLIDE 14

In this paper

  • A general framework that
    ○ handles errors in the training dataset
    ○ learns statistical models from data
    ○ handles synthesis with millions of examples
  • Instantiated with two synthesizers that generalize existing works

SLIDES 15–16

Contributions

  • Handling noise: synthesize a program p from input/output examples that may contain incorrect examples.
  • New probabilistic models: use a probabilistic model parametrized with the synthesized program p.
  • Handling large datasets: a representative dataset sampler working in a loop with a program generator.

SLIDES 17–24

Synthesis with noise: usage model

Input/output examples and a Domain Specific Language are given to the synthesizer, which returns a program p such that p(x₁) = y₁, …, p(xₙ) = yₙ.

If one example is incorrect (e.g. contains a typo), then p(xᵢ) ≠ yᵢ for that example. This yields a new kind of feedback from the synthesizer (✔ ❌ ✔ ✔ ✔):

  • tell the user to remove the suspicious example, or
  • ask for more examples.

SLIDE 25

Handling noise: problem statement

D: dataset of input/output examples, some of which are incorrect.
P: space of possible programs in the DSL.

The naive formulation

    pbest = arg min_{p∈P} errors(D, p)

is broken: an overly long program that hardcodes the input/output pairs,

    if x = x₁ return y₁
    if x = x₂ return y₂
    if x = x₃ return y₃
    if x = x₄ return y₄
    return y₅

makes no errors and may be the only solution to the minimization problem. Synthesis must penalize such answers.

Our problem formulation:

    pbest = arg min_{p∈P} errors(D, p) + λ·r(p)

where errors(D, p) is the error rate of p on the dataset D, r is a regularizer that penalizes long programs, and λ is the regularization constant.
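As a minimal sketch, this objective can be written down directly in Python (our illustration, not code from the paper; representing a program as a list of instructions and taking an `execute` callback are assumptions):

```python
def errors(dataset, program, execute):
    # dataset: list of (input, expected_output) pairs;
    # execute(program, x) runs the candidate program on one input.
    return sum(1 for x, y in dataset if execute(program, x) != y)

def cost(dataset, program, execute, lam=0.6):
    # Regularized objective from the slide: errors(D, p) + lambda * r(p),
    # with r(p) = the number of instructions in p.
    return errors(dataset, program, execute) + lam * len(program)
```

A lookup-table program that hardcodes all n examples drives the first term to 0 but pays roughly λ·n in the second, so the regularizer rules it out whenever a shorter program with few errors exists.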

SLIDES 26–35

Noisy synthesis using SMT

    pbest = arg min_{p∈P} errors(D, p) + λ·r(p)

Here r(p) is the number of instructions in p, and errors(D, p) + λ·r(p) is the total solution cost.

Encoding Ψ, the formula given to the SMT solver (for three examples):

    err₁ = if p(x₁) = y₁ then 0 else 1
    err₂ = if p(x₂) = y₂ then 0 else 1
    err₃ = if p(x₃) = y₃ then 0 else 1
    errors = err₁ + err₂ + err₃
    p ∈ Pᵣ (the programs with r instructions)

Ask a number of SMT queries in increasing order of solution cost. E.g. for λ = 0.6 the possible costs are:

    r \ number of errors     0     1     2     3
    1                      0.6   1.6   2.6   3.6
    2                      1.2   2.2   3.2   4.2
    3                      1.8   2.8   3.8   4.8

Querying in increasing cost order gives UNSAT, UNSAT, UNSAT, UNSAT, and then SAT at cost 2.2: the best program has two instructions and makes one error.
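A sketch of that search in Python (our illustration; `smt_query(r, k)`, which asks the solver whether some program with r instructions and at most k errors satisfies Ψ, is an assumed stub rather than the paper's API):

```python
def synthesize(num_examples, max_instructions, lam, smt_query):
    # Visit (r, k) = (instructions, errors) cells in increasing cost
    # order r*lam + k. The first SAT answer is optimal: every query
    # asked later can only correspond to a higher total cost.
    cells = sorted((r * lam + k, r, k)
                   for r in range(1, max_instructions + 1)
                   for k in range(num_examples + 1))
    for total_cost, r, k in cells:
        program = smt_query(r, k)   # SAT -> a program, UNSAT -> None
        if program is not None:
            return program, total_cost
    return None, float("inf")
```

With λ = 0.6 and three examples this visits costs 0.6, 1.2, 1.6, 1.8, 2.2, … exactly as in the table above, and stops at the first SAT query (cost 2.2: two instructions, one error).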

SLIDE 36

Noisy synthesizer: example

Take an actual synthesizer and show that we can make it handle noise.

SLIDES 37–40

Implementation: BitSyn

For BitStream programs, using Z3, similar to Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11]. Example of a synthesized short, loop-free program:

    function check_if_power_of_2(int32 x) {
      var o = add(x, -1)        // o = x - 1
      return bitwise_and(x, o)  // 0 iff x is a power of two
    }

Question: how well does our synthesizer discover noise in programs from prior work?

[Chart: noise discovered as a function of λ; the best area is in the middle, and λ is picked empirically there.]
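As a quick sanity check of the bit trick behind this example (our own snippet, unrelated to BitSyn's implementation): x & (x − 1) clears the lowest set bit of x, so the result is 0 exactly when x has a single set bit.

```python
def is_power_of_2(x: int) -> bool:
    # x & (x - 1) clears the lowest set bit of x.
    return x > 0 and x & (x - 1) == 0

assert [x for x in range(1, 70) if is_power_of_2(x)] == [1, 2, 4, 8, 16, 32, 64]
```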

SLIDE 41

So far… handling noise

  • Problem statement and regularization
  • Synthesis procedure using SMT
  • Presented one synthesizer

Handling noise enables us to solve new classes of problems beyond normal synthesis.

SLIDES 42–43

Contributions (recap)

Next: handling large datasets, via a representative dataset sampler combined with a program generator.

SLIDES 44–48

Fundamental problem

With a large number of examples:

    pbest = arg min_{p∈P} cost(D, p)

where D contains millions of input/output examples. Computing cost(D, p) even once is O(|D|), which makes synthesis practically intractable.

Key idea: iterative synthesis on a fraction of the examples.

SLIDES 49–50

Our solution: two components

Program generator: given a small dataset d, finds the best program

    pbest = arg min_{p∈P} cost(d, p)

i.e. a synthesizer for a small number of examples.

Dataset sampler: picks a dataset d ⊆ D. We introduce a representative dataset sampler, generalizing a user who provides input/output examples.

SLIDES 51–58

In a loop

Start with a small random sample d ⊆ D, then iteratively generate programs and samples: the program generator produces p₁, p₂, … from the current sample, and the representative dataset sampler produces a new sample d from the programs generated so far, until the loop returns pbest (a sketch follows below). The algorithm generalizes synthesis-by-examples techniques.
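A minimal Python sketch of the loop (our illustration; `generate_program` and `sample_representative` are assumed interfaces standing in for the two components, not the paper's code):

```python
import random

def synthesize_on_large_dataset(D, generate_program, sample_representative,
                                sample_size=100, rounds=10):
    # Alternate the cheap program generator (runs only on the small
    # sample d) with the dataset sampler (picks a d on which the
    # programs generated so far behave as they do on all of D).
    d = random.sample(D, sample_size)        # small random sample to start
    programs = []
    for _ in range(rounds):
        p = generate_program(d)              # argmin_p cost(d, p)
        programs.append(p)
        d = sample_representative(D, programs, sample_size)
    return programs[-1]                      # pbest
```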

SLIDES 59–64

Representative dataset sampler

Idea: pick a small dataset d on which the already generated programs p₁, …, pₙ behave like they do on the full dataset:

    d = arg min_{d ⊆ D} max_{i∈1..n} | cost(d, pᵢ) − cost(D, pᵢ) |

[Chart: bar plots comparing the costs of p₁ and p₂ on the small dataset d with their costs on the full dataset D.]

Theorem: this sampler shrinks the candidate program search space. In the evaluation it gives a significant speedup of synthesis.
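Solving this arg min exactly over all subsets of D is itself intractable; one simple greedy approximation (our illustration, not the algorithm from the paper) grows d one example at a time, always adding the example that minimizes the worst cost gap. Here `cost(dataset, program)` is assumed to return e.g. the error rate of the program on the dataset:

```python
def sample_representative(D, programs, size, cost):
    # Greedy approximation of
    #   d = argmin_{d subset of D} max_i |cost(d, p_i) - cost(D, p_i)|
    # Assumes at least one program has been generated so far.
    full_costs = [cost(D, p) for p in programs]
    d, candidates = [], list(D)
    for _ in range(size):
        def worst_gap(example):
            trial = d + [example]
            return max(abs(cost(trial, p) - fc)
                       for p, fc in zip(programs, full_costs))
        best = min(candidates, key=worst_gap)
        d.append(best)
        candidates.remove(best)
    return d
```

To plug this into the loop sketched earlier, bind the cost function first, e.g. `sampler = functools.partial(sample_representative, cost=my_cost)`, which leaves the `(D, programs, size)` signature the loop expects.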

SLIDE 65

So far… handling large datasets

  • Iterative combination of synthesis and sampling
  • A new way to perform approximate empirical risk minimization
  • Guarantees (in the paper)

SLIDES 66–67

Contributions (recap)

Next up are the new probabilistic models: synthesize a program p, then use a probabilistic model parametrized with p.

SLIDES 68–71

Statistical programming tools

A new breed of tools: learn from large existing codebases (e.g. "Big Code") to make predictions about programs.

  • 1. Train a machine learning model.
  • 2. Make predictions with the model.

Problem: the model is hard-coded, and precision is low.

SLIDES 72–81

Existing machine learning models

They essentially remember a mapping from a context in the training data to a prediction (with probabilities).

Hindle et al. [ICSE'12], an n-gram model that comes from NLP: it learns the mapping "+ name ." → slice, i.e. the model will predict slice when it sees it after "+ name .".

Raychev et al. [PLDI'14], which relies on static analysis: it learns the mapping charAt → slice, i.e. the model will predict slice when it sees it after charAt.
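A toy Python sketch of this kind of context-to-prediction memory (our illustration of the general idea, not a faithful reimplementation of either system):

```python
from collections import Counter, defaultdict

class ContextModel:
    # Remembers how often each prediction followed each context.
    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, context, prediction):
        self.table[context][prediction] += 1

    def predict(self, context):
        counts = self.table.get(context)
        return counts.most_common(1)[0][0] if counts else None

# With contexts taken as the last two tokens (an n-gram-style model):
model = ContextModel()
model.train(("name", "."), "slice")
assert model.predict(("name", ".")) == "slice"
```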

SLIDES 82–85

Problem of existing systems

Precision: they rarely predict the next statement correctly. The n-gram model of Hindle et al. [ICSE'12] has very low precision; the model of Raychev et al. [PLDI'14] has low precision for JavaScript.

Core problem: existing machine learning models are limited and not expressive enough.

SLIDES 86–90

Key idea: second-order learning

Learn a program that parametrizes a probabilistic model that makes predictions.

  • 1. Synthesize a program describing a model (i.e. learn the mapping).
  • 2. Train the model.
  • 3. Make predictions with this model.

Prior models are described by simple hard-coded programs; our approach learns a better program. A sketch of the idea follows below.
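A compact sketch of the second-order idea in Python (our illustration; representing the conditioning program p as a function from code to a context is an assumed interface, not the paper's DSL):

```python
from collections import Counter

def train(examples, p):
    # Second order: the synthesized program p decides what the context
    # is; the first-order model merely counts context -> label pairs.
    table = {}
    for code, label in examples:                  # e.g. label = "slice"
        table.setdefault(p(code), Counter())[label] += 1
    return table

def predict(table, code, p):
    counts = table.get(p(code))
    return counts.most_common(1)[0][0] if counts else None

# Synthesis then searches over candidate programs p (last two tokens,
# APIs preceding the completion, ...) for the one whose trained model
# makes the fewest errors -- the same argmin errors(D, p) + lambda*r(p)
# objective as before.
```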

SLIDES 91–97

Training and evaluation

Training example: a code snippet (input) whose completion is slice (output). Compute the context with program p, here toUpperCase, and learn the mapping toUpperCase → slice.

Evaluation example: compute the context with program p and predict the completion, slice.

SLIDES 98–99

Observation

Synthesis of the probabilistic model can be done with the same optimization problem as before:

    pbest = arg min_{p∈P} errors(D, p) + λ·r(p)

where D is now the evaluation data (input/output examples), r is the regularizer that penalizes long programs, λ is the regularization constant, and the error term generalizes to a cost, cost(D, p).

SLIDE 100

So far…

Handling noise, synthesizing a model, and the representative dataset sampler are techniques that are generally applicable to program synthesis. Next: an application to "Big Code" called DeepSyn.

SLIDES 101–102

DeepSyn: Training

Trained on 100,000 JavaScript files from GitHub. The program synthesizer (representative dataset sampler + program generator) runs over the dataset D; the model is then trained on the full data D with the best program p.

SLIDES 103–105

DeepSyn: Evaluation

50,000 evaluation files (not used in training or synthesis); API completion task.

  Conditioning program p                                       Accuracy
  Last two tokens, Hindle et al. [ICSE'12]                        22.2%
  Last two APIs per object, Raychev et al. [PLDI'14]              30.4%
  Program synthesis with noise (this work)                        46.3%
  Program synthesis with noise + dataset sampler (this work)      50.4%

We can explain the best program: it looks at the APIs preceding the completion position and at the tokens prior to those APIs.

SLIDES 106–107

Q&A

  • Handling noise: extending synthesizers to handle noise.
  • Second-order learning: synthesis of probabilistic models.
  • Scalability: handling large datasets with a representative dataset sampler and a program generator.

Bridges the gap between ML and PL; advances both areas.

SLIDE 108

What did we synthesize?

The best conditioning program for API completion, as a sequence of instructions that move over the code (Left, PrevActor, PrevLeaf) and write out context (WriteAction, WriteValue):

    Left PrevActor WriteAction WriteValue PrevActor WriteAction PrevLeaf WriteValue PrevLeaf WriteValue