Deep Learning over “Big Code” for Program Analysis and Synthesis
Swarat Chaudhuri, Vijay Murali, Chris Jermaine
Programming is hard. Program synthesis, debugging, verification, repair… Can we automate these processes?
Decades of prior research in formal methods: a synthesizer is given a partial program (sketch) and a property, and produces a program satisfying the property. But how does one specify intent formally? A key limitation of these methods is their reliance on formal specifications.
Observation: machine learning (ML) can be leveraged for learning about broadly shared facets of programs, creating great opportunities for specification learning. The data is there: 19.4 million active open-source projects on GitHub (Oct. 2016), a corpus comparable in scale to the corpora of images, audio, etc. that power modern ML.
Machine learning is good at handling uncertainty; formal methods are good at handling semantics. Bayou combines the two, supporting applications such as code completion, analysis, and language-based synthesis.
“Find a program that fits a specification.”
Many kinds of (formal) specifications: input-output examples, traces, constraints, types… FlashFill (Microsoft Excel) can synthesize macros from a few examples [Gulwani 2011]; Ars Technica called it “one of the shock-and-awe features of Excel 2013.”
BayouSynth:
1. Java: works with a general-purpose PL.
2. APIs: synthesizes code involving APIs, which are needed for most common tasks.
3. Uncertainty: no need for a full formal specification.
What do human programmers have that synthesizers don’t?
In formal methods and synthesis, a specification is a Boolean predicate: a program either satisfies it or does not. Humans, by contrast, can generalize imprecise and incomplete specifications.
Combinatorial syntax-guided synthesis: given a specification χ, the synthesizer searches for a program Prog such that Prog ⊨ χ (e.g., completing “BufferedReader br = new BufferedReader(…);”). The space of programs of size n grows exponentially in n. This is a fundamental bottleneck for program synthesis; current solutions assume a simple syntactic program model. Humans, however, can zoom in on the relevant parts of this search space.
Lesson from other areas of AI: Use data!
Programmers use this data to build mental models of how to design programs. This model lets them interpret programmer intent and “guess” the structure of solutions.
A programmer’s internal monologue: “OK, so I need to … parse it, and…” Probabilistic models let us mimic this process inside a synthesizer.
[Figure: Bayesian synthesis. Evidence about what an idealized program does is fed to the synthesizer, together with a prior distribution over programs and their associated evidence, learned from data; the synthesizer computes a posterior distribution and proposes candidate implementations based on the posterior.]
Neural Sketch Learning for Conditional Program Generation. Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. ICLR 2018. https://arxiv.org/abs/1703.05698

Setup: Prog is a program in a core language capturing the essence of API usage in Java, whose basic operation is the API call, named by API method. Y is evidence about the programming task.
Evidence Y can take several forms:
API calls: the set of names of API methods called, e.g., “readLine”, “close”.
Types: the set of types on which API methods are called, e.g., “FileReader”.
Keywords: a textual description of the programming task, e.g., “read from file”, “print list”.

Goal: learn a distribution P(Prog | Y), i.e., a mapping from evidence to programs.
int read(String name) {
  FileReader fr; BufferedReader r;
  String s; int n = 0;
  try {
    fr = new FileReader(name);
    r = new BufferedReader(fr);
    for (; (s = r.readLine()) != null; n++);
    r.close();
    return n;
  } catch (IOException e) {
    return 0;  // return value assumed (truncated on the slide)
  }
}

void read() throws IOException {
  FileReader in = new FileReader("a.txt");
  BufferedReader br = new BufferedReader(in);
  String line;
  while ((line = br.readLine()) != null) {
    System.out.println(line);
  }
  br.close();
}

Both programs perform the task “reading from a file”.
Two problems with learning directly over programs:
1. Superficial differences: superficial differences between programs, irrelevant for synthesis, make it hard for a probabilistic model to learn patterns.
2. Known knowledge: a probabilistic model learned from data cannot guarantee type-safety constraints and other rules that are already known.

Solution: learn unknown patterns from data; enforce known semantic and syntactic constraints separately. Learn to generate programs at a higher level of abstraction, then use a combinatorial synthesizer to produce the final code.
Z: a sketch, a syntactic abstraction of a program. Sketches elide “known knowledge” (variable names, low-level control details) while retaining the structure of API usage. Example sketch for the file-reading programs above:

[ call FileReader.new(String)
  call BufferedReader.new(FileReader)
  loop ([BufferedReader.readLine()]) { skip }
  call BufferedReader.close() ]
The program-sketch relation is many-to-one: an abstraction function β maps each program Prog to its sketch β(Prog), and β is defined by rules rather than learned from data. Synthesis then factors through sketches: P(Prog | Y) is obtained by combining a learned distribution Q(Z | Y) over sketches with a distribution over concretizations of each sketch.
New goal: “sketch learning.” Learn the distribution Q(Z | Y) over sketches given evidence, rather than P(Prog | Y) directly.
Two-step synthesis (see the sketch below):
1. Sample sketches Z from the learned distribution Q(Z | Y).
2. Concretize each sketch into a program Prog, combinatorially enforcing type-safety constraints and other language rules.
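A minimal Python sketch of this loop (q_model and concretize are hypothetical stand-ins for the learned model and the combinatorial synthesizer, not the actual Bayou API):

def synthesize(evidence, q_model, concretize, k=100):
    # Step 1: sample candidate sketches from the learned Q(Z | Y).
    # Step 2: concretize each sketch with type-safe combinatorial search.
    programs = []
    for _ in range(k):
        z = q_model.sample_sketch(evidence)
        prog = concretize(z)
        if prog is not None:  # some sketches may fail to concretize
            programs.append(prog)
    return programs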
Sketches can be defined in many ways, but one has to be careful: make sketches too concrete and learning would suffer; make them too abstract and concretization becomes hard to compute, so precision would suffer. Our sketch language is designed for API-using Java programs: each API call is mapped, together with the types involved, to an abstract API call.
[Pipeline figure: Training: evidences and sketches are extracted from a corpus; statistical learning (a deep neural network) produces a distribution Q(Z | Y) over sketches given evidence. Inference: a draft program with evidence, e.g. foo(File f) { /// read file }, passes through a feature extractor (“read”, “file”); sketches sampled from Q(Z | Y) are concretized by combinatorial search into a synthesized program, e.g. foo(File f) { f.read(); f.close(); }.]
Underlying thesis: Bugs are anomalous behaviors.
[Engler et al., 2002; Hangal & Lam, 2002]
A specification is a commonplace pattern in program behaviors seen in the real world. Learn specifications from examples of program behavior.
[Ammons et al., 2002; Raychev et al., 2014]
BayouDebug learns a range of specifications from a large, heterogeneous corpus of (“mirrored”) real-world Java/Android code, and computes for each program a measure of its correctness.
Bayesian Specification Learning for Finding API Usage Errors. Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine. FSE 2017. https://arxiv.org/abs/1703.01370
This dialog box cannot be closed
AlertDialog.Builder b = new AlertDialog.Builder(this); b.setTitle(R.string.title_variable_to_insert); if (focus.getId() == R.id.tmpl_item) { b.setItems(R.array.templatebodyvars, this); } else if (focus.getId() == R.id.tmpl_footer) { b.setItems(R.array.templateheaderfootervars, this); } b.show();
Here, Prog is a program under test and Y is a set of features of it; Z is a behavior of Prog, extracted by testing (dynamic) or static analysis (symbolic).
Z: a behavior, represented as a generative probabilistic automaton [Murawski & Ouaknine, 2005]. For example, for:

AlertDialog.Builder b = new AlertDialog.Builder(…);
b.setTitle(…);
if (…) { b.setItems(…); }
else if (…) { b.setItems(…); }
b.show();
[Figure: the behavior automaton for the code above, produced using static analysis. States are program points; transitions are labeled with API calls and probabilities: new A(…) and setTitle(…) with probability 1.0, three branches through setItems(…) with probability 0.33 each, and show() with probability 1.0 on each branch to the accepting state.]
From data, learn a distribution over program behaviors given evidence, i.e., Q(Z | Y). Anomaly detection is then a computation over this distribution.
[Pipeline figure: Training: features and behaviors are extracted from a corpus; statistical learning (a deep neural network) produces a distribution Q(Z | Y) over behaviors given features. Inference: features and behaviors are extracted from a test program, e.g. foo(File f) { f.read(); f.close(); }; comparing the program’s own behavior distribution Q(Z | Prog) against Q(Z | Y) yields an aggregate anomaly score.]
Interlude: deep networks, a powerful tool for dealing with uncertainty.

[Figure: a neural network with an embedding layer, inputs X1…X5, and a softmax output.]

import tensorflow as tf

(The code examples that follow use the TensorFlow 1.x API.)
Running example: translating “I am a student” to “Je suis un étudiant.” Words are first mapped to integer indices in a vocabulary (here 5, 0, 4, 1), then to their one-hot representations: vectors whose entries are all 0 except for a 1 at the word’s index.

Word → one-hot encoding: [ 1, 0, 0, … 0 ], [ 0, 1, 0, … 0 ], …, [ 0, 0, 0, … 1 ]

+ Easy to encode; no unintended relationships between words.
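For concreteness, here is a one-hot encoder in a few lines of Python (the vocabulary and indices are illustrative, not from the slides):

import numpy as np

vocab = {"I": 0, "am": 1, "a": 2, "student": 3}

def one_hot(word, vocab):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0  # all entries 0 except a 1 at the word's index
    return v

print(one_hot("am", vocab))  # [0. 1. 0. 0.]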
A single layer computes z = τ(W · y + b), where τ is an “activation function” such as the sigmoid. [Figure: y → ∗ W → + b → τ → z.]

# input_size: size of input vocabulary
# output_size: size of output as needed
x = tf.placeholder(tf.float32, [1, input_size])
W = tf.get_variable('W', [input_size, output_size])
b = tf.get_variable('b', [output_size])
y = tf.sigmoid(tf.add(tf.matmul(x, W), b))
Deeper networks insert hidden layers between input and output:

# num_units: number of units in the hidden layer
...
W_h = tf.get_variable('W_h', [input_size, num_units])
b_h = tf.get_variable('b_h', [num_units])
h = tf.sigmoid(tf.add(tf.matmul(x, W_h), b_h))
W = tf.get_variable('W', [num_units, output_size])
b = tf.get_variable('b', [output_size])
y = tf.sigmoid(tf.add(tf.matmul(h, W), b))

[Figure: y → ∗ W1, + b1, τ → h1 → ∗ W2, + b2, τ → h2 → … → z.]
Recurrent neural networks (RNNs) handle sequences: the same cell is applied at each time step, mapping input y_t to output z_t. [Figure: an RNN cell unrolled over time, with inputs y_1, y_2, …, y_n and outputs z_1, z_2, …, z_n.] The cell maintains a hidden state (a memory) due to a feedback loop; its weights are shared across time steps, and the state is carried from step to step.
# input: x = [x_1, x_2, ..., x_n]
# expected output: y_ = [y_1, y_2, ..., y_n]
# num_units: number of units in the hidden layer
rnn = tf.nn.rnn_cell.BasicRNNCell(num_units, activation=tf.sigmoid)
state = tf.zeros([1, rnn.state_size])
y = []
for i in range(len(x)):
    output, new_state = rnn(x[i], state)
    state = new_state
    logits = tf.add(tf.matmul(output, W_y), b_y)
    y.append(logits)
RNNs apply naturally to programs: serialize a program into tokens, e.g.,

void read() throws IOException { ... }

becomes void, read, LPAREN, RPAREN, throws, …

A generative RNN then produces a program token-by-token: the token emitted at time t is fed back as input for time t+1. [Figure: unrolled RNN emitting tokens, each fed back as the next input.]
To emit a distribution over output classes (tokens), apply a softmax to the logits:

for i in range(len(x)):
    output, new_state = rnn(x[i], state)
    state = new_state
    logits = tf.add(tf.matmul(output, W_y), b_y)
    y.append(tf.nn.softmax(logits))
Training: define a loss that measures the error between the predicted distributions and the expected outputs in the training data, then reduce it. TensorFlow folds the softmax and cross-entropy over a sequence into a single call:

# expected output: y_ = [y_1, y_2, ..., y_n]
...
for i in range(len(x)):
    output, new_state = rnn(x[i], state)
    state = new_state
    logits = tf.add(tf.matmul(output, W_y), b_y)
    y.append(logits)
loss = tf.contrib.seq2seq.sequence_loss(y, y_, weights=tf.ones(...))

[Figure: outputs z_1…z_n compared against targets ẑ_1…ẑ_n via softmax cross-entropy.]
The recipe:
Neural network: a complex architecture modeling the generation of data.
Loss function: a high-dimensional function measuring error w.r.t. ground truth.
Training data: ground-truth inputs and outputs.
Gradient descent: compute a (local) minimum of the loss by repeatedly stepping in the direction of the negative gradient.
How do we train neural networks efficiently? Naive gradient descent sweeps the whole training set:

given: function f(x), loss
for each parameter p of f:
    p_grad = 0
    for each data point d in training data:
        g = gradient of loss w.r.t. p for d
        p_grad += g
    p += -p_grad * learning_rate

A single “step” of gradient descent touches millions of data points!
Mini-batch stochastic gradient descent computes the gradient over a small batch instead:

given: function f(x), loss
for each parameter p of f:
    p_grad = 0
    for each data point d in batch:
        g = gradient of loss w.r.t. p for d
        p_grad += g
    p += -p_grad * learning_rate

A single “step” is now cheap per data point, but it still loops over millions of parameters. 😑
Backpropagation fixes this. [Figure: a single layer y → ∗ W → + b → z, with loss M comparing the output z to the target ẑ.] Applying the chain rule, ∂M/∂W and ∂M/∂b are obtained from ∂M/∂z in one backward pass, so the gradients of all parameters are computed together rather than one parameter at a time.
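A tiny numerical sanity check of this chain-rule computation for one layer, in NumPy (an illustration, not from the slides):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# One layer z = sigmoid(W @ y + b), squared-error loss M against target z_.
y, z_ = np.array([0.5, -1.0]), np.array([1.0])
W, b = np.array([[0.3, -0.2]]), np.array([0.1])

def loss(W):
    z = sigmoid(W @ y + b)
    return 0.5 * np.sum((z - z_) ** 2)

# Chain rule: dM/dW = (z - z_) * z * (1 - z), outer product with y.
z = sigmoid(W @ y + b)
grad_analytic = np.outer((z - z_) * z * (1 - z), y)

# Finite-difference check of the analytic gradient.
eps, grad_numeric = 1e-6, np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-8))  # True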
For RNNs, backpropagation through time unrolls the network and backprops up to the sequence length; repeatedly multiplying factors < 1 along the way causes vanishing gradients.

given: function f(x), loss
grad = 0
for each data point d in batch:
    g = gradient of loss w.r.t. each param for d
    grad += g
backprop_gradients(grad)

A single “step” of gradient descent now handles millions of parameters at once. 🙃
...
y_ = tf.placeholder(tf.int32, [batch_size, rnn_length], ...)
step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch in range(50):
        batches = get_mini_batches()
        for (batch_x, batch_y) in batches:
            sess.run(step, feed_dict={x: batch_x, y_: batch_y})
Aside: RNNs can also model programs character-by-character.
+ The character vocabulary (ASCII) is simpler than a token vocabulary.
- The model must learn lexical syntax (keywords, etc.) that a token model would not have to.
See http://karpathy.github.io/2015/05/21/rnn-effectiveness, e.g., C code generated character-by-character by an RNN trained on Linux kernel sources:
static void do_command(struct seq_file *m, void *v) { int column = 32 << (cmd[2] & 0x80); if (state) cmd = (int)(int_state ^ (in_8(&ch->ch_flags) & Cmd) ? 2 : 1); else seq = 1; for (i = 0; i < 16; i++) { if (k & (1 << 1)) pipe = (in_use & UMXTHREAD_UNCCA) + ((count & 0x00000000fffffff8) & 0x000000f) << 8; if (count == 0) sub(pid, ppc_md.kexec_handle, 0x20000000); pipe_set_bytes(i, 0); } /* Free our user pages pointer to place camera if all dash */ subsystem_info = &of_changes[PAGE_SIZE]; rek_controls(offset, idx, &soffset); /* Now we want to deliberately put it to device */ control_check_polarity(&context, val, 0); for (i = 0; i < COUNTER; i++) seq_puts(s, "policy ");
How do we condition generation on some input? The encoder-decoder architecture.
[Figure: the encoder maps the input Y through a layer (∗ W_h, + b_h, τ) into a hidden state h; the decoder RNN unrolls from h to produce outputs z_1, z_2, …, z_n.]

# num_units / num_units_enc / num_units_dec: hidden / encoder / decoder dimensionality
...
h_enc = tf.sigmoid(tf.add(tf.matmul(x, W_enc), b_enc))
# transform into hidden state dimensions
W_h = tf.get_variable('W_h', [num_units_enc, num_units])
b_h = tf.get_variable('b_h', [num_units])
h = tf.sigmoid(tf.add(tf.matmul(h_enc, W_h), b_h))
rnn = tf.nn.rnn_cell.BasicRNNCell(num_units_dec, ...)
h_dec = tf.sigmoid(tf.add(tf.matmul(h, W_dec), b_dec))
for i in range(len(y)):
    output, new_h_dec = rnn(y[i], h_dec)
    h_dec = new_h_dec
    ...
Two points to note:
1. The encoder and decoder must be trained together, end to end.
2. The hidden state is low-dimensional: it is a bottleneck between input and output.

[Figure: Y (one-hot) → Encoder → hidden state → Decoder → Z (softmax).]
End-to-end training forces the encoder to keep exactly the information about the inputs that is useful for the decoder, i.e., necessary to produce the right outputs. Is there a principled way to do this?
A Bayesian view: a latent “intent” variable a mediates between evidence Y and sketch Z. The generative story: draw a ~ N(0, I); each piece of evidence, encoded as g(y_i) where g is the encoding function, is a sample from a Normal centered on a: g(y_i) ~ N(a, σ²I).
From Normal-Normal conjugacy, the posterior over the intent given the evidence is again Normal, with a closed form (see below). [Figure: encodings of evidence such as readLine and FileReader scattered around the latent intent a ~ N(0, I).]
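The closed form, reconstructed from the model above via standard Normal-Normal conjugacy:

\[
a \sim \mathcal{N}(0, I), \quad g(y_i) \sim \mathcal{N}(a, \sigma^2 I) \;\; (i = 1, \dots, n)
\;\implies\;
P(a \mid y_1, \dots, y_n) = \mathcal{N}\!\left(\frac{\sum_{i=1}^{n} g(y_i)}{n + \sigma^2},\; \frac{\sigma^2}{n + \sigma^2}\, I\right).
\]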
[Figure: how the encoder maps evidence to the latent space (posterior): evidence such as Animation, AlertDialog, and BufferedReader occupies distinct regions.]
Decoding sketches: a sketch is a tree built from constructs such as sequential composition and loops (a loop condition with a body). The decoder generates this tree along its production paths. For example, the file-reading sketch has 4 paths, including:
1. (try, …), (FR.new(String), …), (BR.new(FR), …), (while, …), (BR.readLine(), …), (skip, …)
2. (try, …), (catch, …), (FNFException, …), (printStackTrace(), …)
The decoder is trained on production paths: given the history of productions fired so far, what is the distribution over the next rule? [Figure: a point in a sketch where production rules of the sketch grammar fire with probabilities 0.3 and 0.7.] Concretely, the neural decoder defines a distribution on the rules that can be fired at a point, given the history so far, with the history encoded as a real vector.
Training maximizes a lower bound on the conditional likelihood: starting from our probabilistic model, the likelihood is rewritten as an expectation over the latent intent (the Monte-Carlo definition of expectation), and Jensen’s inequality then gives a lower bound for the conditional likelihood (reconstructed below).
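The derivation these steps refer to, reconstructed in the deck’s notation (evidence Y, sketch Z, intent a):

\begin{aligned}
\log P(Z \mid Y) &= \log \int P(Z \mid a)\, P(a \mid Y)\, da && \text{(our probabilistic model)} \\
&= \log\, \mathbb{E}_{a \sim P(a \mid Y)}\!\left[ P(Z \mid a) \right] && \text{(Monte-Carlo definition of expectation)} \\
&\geq \mathbb{E}_{a \sim P(a \mid Y)}\!\left[ \log P(Z \mid a) \right] && \text{(Jensen's inequality: a lower bound for the conditional likelihood)}
\end{aligned}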
Problem: gradients cannot pass through a stochastic operation! [Figure: Y (evidence) → Encoder → intent a, sampled → Decoder → Z (sketch); backpropagation stops at the sampling step.]
Solution: reparameterize, so that deterministic versions of the stochastic nodes carry the gradients. Sample ε from a fixed N(0, I) and compute a = μ + σ · ε; gradients then flow through μ and σ while ε absorbs the randomness. [Figure: Y → Encoder → (μ, σ) → a → Decoder → Z, with ε sampled on the side.] The resulting architecture is the Gaussian Encoder-Decoder (GED).
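A minimal TensorFlow 1.x sketch of the reparameterized sampling step (dimensions and names are illustrative, not the actual Bayou code):

import tensorflow as tf

h_enc = tf.placeholder(tf.float32, [1, 64])      # encoder output (illustrative size)
latent_size = 32
mu = tf.layers.dense(h_enc, latent_size)         # posterior mean
log_sigma = tf.layers.dense(h_enc, latent_size)  # log standard deviation
eps = tf.random_normal(tf.shape(mu))             # eps ~ N(0, I): the only stochastic node
a = mu + tf.exp(log_sigma) * eps                 # a ~ N(mu, sigma^2 I); gradients flow through mu, sigma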
The GED is not only for synthesis: other tools can be instantiated from the Bayou framework, e.g., BayouDebug. For synthesis, inference proceeds towards a program by sampling a single sketch at a time from the learned distribution and concretizing it.
[Pipeline figure, revisited: evidences and sketches from a corpus → deep neural network → Q(Z | Y); at inference, a draft program with evidence goes through the feature extractor, sampled sketches are concretized by combinatorial search, and a synthesized program is returned.]
The remaining step: concretization. Given a sketch Z, how do we produce a concrete program Prog (choosing variables, types, imports)?
[ call FileReader.new(String)
  call BufferedReader.new(FileReader)
  loop ([BufferedReader.readLine()]) { skip }
  call BufferedReader.close() ]

becomes

FileReader fr; BufferedReader br; String s;
try {
  fr = new FileReader("a.txt");
  br = new BufferedReader(fr);
  while ((s = br.readLine()) != null) { ... }
  br.close();
} catch (IOException e) { ... }

Concretization must invent variables and handle language rules: this is type-directed synthesis.
The core problem:
1. Given an environment of variables, e.g., x : String, y : Object, …
2. and a set of functions with their types, e.g., toString : () → String, concat : String → String, …
3. find an expression of a target type; e.g., for target type String: x, y.toString(), x.concat(y.toString()), x.substring(y).
An invocation chain is a composition of method calls. We use a two-step enumerative search:
1. Up to a bounded breadth, gather all invocation chains from the variables in scope. [Figure: from a variable x of type String with methods toString(), concat(String), contains(String), the chains x.toString().?, x.concat(…).?, x.contains(…).? are expanded towards the target type.]
2. For each invocation chain, recursively search for expressions for the arguments in the chain, pruning any chain whose type is not a subtype of the target. [Figure: with target CharSequence, x.toString() returns String, a subtype, and its arguments are filled by recursive search; x.contains(…) has return type boolean, not a subtype of CharSequence, so the chain is pruned.]
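A toy version of this search in Python (the type table and subtype relation are simplified stand-ins, not the real implementation, which handles full Java typing):

# Simplified stand-ins for Java's type system and API tables.
SUBTYPE = {("String", "String"), ("String", "CharSequence"), ("boolean", "boolean")}
METHODS = {  # receiver type -> (method, argument types, return type)
    "String": [("toString", [], "String"),
               ("concat", ["String"], "String"),
               ("contains", ["CharSequence"], "boolean")],
}

def is_subtype(t, target):
    return (t, target) in SUBTYPE

def search(env, target, depth=2):
    """Enumerate expressions whose type is a subtype of `target`."""
    if depth == 0:
        return []
    exprs = [v for v, t in env.items() if is_subtype(t, target)]
    for var, t in env.items():
        for name, arg_types, ret in METHODS.get(t, []):
            if not is_subtype(ret, target):
                continue  # prune: return type is not a subtype of the target
            # Recursively search for argument expressions.
            options = [search(env, at, depth - 1) for at in arg_types]
            if all(options):
                args = ", ".join(opts[0] for opts in options)
                exprs.append(f"{var}.{name}({args})")
    return exprs

print(search({"x": "String", "y": "String"}, "CharSequence"))
# ['x', 'y', 'x.toString()', 'x.concat(x)', 'y.toString()', 'y.concat(x)']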
Many expressions may fit; we resort to heuristics to rank them:
1. Performance: the expression should be found quickly.
2. Parsimony: the expression should be simple.
3. Relevance: the expression should use user-provided variables.
4. …
Concretizing whole sketches. Given:
1. An environment Γ of typed variables, initially user-provided.
2. A function search(Γ, t) that finds an expression of type t from environment Γ.

Let concretize be the function that synthesizes a sketch expression into code in environment Γ. Roughly:

Sketch expression → code produced / update to Γ:
call a(…) → x = e1.a(e2, ..., en), where each ek = search(Γ, τk), and x : τ is added to Γ, where τ is the return type.
loop (cond) { body } → while ((b = concretize(cond))) { concretize(body) }
…
Post-processing of the code follows. Caveats:
1. Some sketches may not be synthesizable into programs (our experiments show this is extremely unlikely).
2. Many Java APIs utilize generic types, which complicate the type-directed search.
(Generated code is shown without syntactic sugar.)
How do we measure accuracy?
1. Syntactic checks (decidable).
2. Quantitative metrics: how similar are the sets/sequences of API calls, the structures in the code, etc.?
Example query. API calls: { readLine }. Top results:

Probability: 0.08
[ call InputStreamReader.new(InputStream)
  call BufferedReader.new(Reader)
  call BufferedReader.readLine() ]

Probability: 0.06
[ call BufferedReader.new(Reader)
  call BufferedReader.readLine() ]

Probability: 0.01
[ call FileReader.new(File)
  call BufferedReader.new(Reader)
  call BufferedReader.readLine() ]
Refined query. API calls: { readLine }; Types: { File }. Note: we did not explicitly specify FileReader. Top results:

Probability: 0.04
[ call FileReader.new(File)
  call BufferedReader.new(Reader)
  call BufferedReader.readLine() ]

Probability: 0.02
[ call FileReader.new(File)
  call BufferedReader.new(FileReader)
  loop ([BufferedReader.readLine()]) { skip }
  call BufferedReader.close() ]

<more results using FileReader>
Big takeaway. To synthesize code: learn to generate sketches in a high-level language from uncertain inputs, then use combinatorial methods to generate the final program. This lets Bayou synthesize complex method bodies from a few tokens of evidence while still respecting semantic information about the language and its APIs.
Back to BayouDebug. Two random variables: features Y and behaviors Z. During training, learn a distribution Q(Z | Y). While debugging a program Prog with features Y, start with the distribution Q(Z | Y); the anomaly score for Prog is a statistical distance between the behavior distribution of Prog and Q(Z | Y).
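One simple instantiation of such a distance (illustrative only; the score in the paper may be defined differently):

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over behaviors."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Anomaly score: distance between the program's behavior distribution
# and the learned specification distribution Q(Z | Y).
score = kl_divergence([0.9, 0.1, 0.0], [0.4, 0.3, 0.3])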
[Pipeline figure, revisited: features and behaviors from a corpus → deep neural network → Q(Z | Y); at inference, features and behaviors of a test program, e.g. foo(File f) { f.read(); f.close(); }, are scored against Q(Z | Y) to produce an aggregate anomaly score.]
The architecture parallels BayouSynth. (The tool is called Salento in the original conference paper.) The probabilistic model can be implemented in multiple ways.
This dialog box cannot be closed
AlertDialog.Builder b = new AlertDialog.Builder(this); b.setTitle(R.string.title_variable_to_insert); if (focus.getId() == R.id.tmpl_item) { b.setItems(R.array.templatebodyvars, this); } else if (focus.getId() == R.id.tmpl_footer) { b.setItems(R.array.templateheaderfootervars, this); } b.show();
Recall: Z is a behavior, represented as a generative probabilistic automaton [Murawski & Ouaknine, 2005], e.g., for:

AlertDialog.Builder b = new AlertDialog.Builder(…);
b.setTitle(…);
if (…) { b.setItems(…); }
else if (…) { b.setItems(…); }
b.show();
[Figure repeated: the behavior automaton produced using static analysis, with transitions new A(…), setTitle(…), setItems(…) (probability 0.33 per branch), and show() (probability 1.0).]
A probabilistic model of programs: a graphical model over intent a, features Y, and behaviors Z. Assumption: the features and the behaviors are conditionally independent given the latent variable.
One implementation combines Latent Dirichlet Allocation (LDA) [Blei, Ng, and Jordan, 2003], a generative topic model widely used in NLP, with topic-conditioned recurrent neural networks [Mikolov and Zweig, 2012].
In our setting, the symbols are API calls, and a specification is a topic distribution! Documents group together calls on the same API. Algorithmically: extract API-call “documents” from the corpus and fit a topic model over them.
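A minimal sketch of this step with scikit-learn (not the authors’ implementation; the documents and topic count are illustrative):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each "document" is the bag of API calls extracted from one method.
docs = [
    "AlertDialog.Builder.new setTitle setItems show",
    "FileReader.new BufferedReader.new readLine close",
]
X = CountVectorizer(token_pattern=r"\S+").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)
specs = lda.transform(X)  # each row: a topic distribution, i.e., a "specification"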
Top 5 symbols from a few topics in an Android corpus (A is the dialog-box API; B and C are other APIs):
Topic 1: A.setMessage(int), A.setTitle(int), new A(Context), A.setPositiveButton(int,…), A.show()
Topic 2: A.setPositiveButton(CharSequence,…), A.setNegativeButton(CharSequence,…), A.setMessage(CharSequence), A.setTitle(CharSequence), A.show()
Topic 3: A.setView(View), new A(Context), A.setTitle(CharSequence), A.setPositiveButton(int,…), A.setTitle(int)
Topic 4: C.getInstance(String), C.init(int,Key,…), C.doFinal(byte[]), B.close(), C.init(int,Key)
Topic-conditioned recurrent neural networks [Mikolov and Zweig, 2012] then model the call sequences: the RNN defines a distribution P(Y) over sequences of API calls and, conditioned on the topic vector, implements P(Y | Z). [Figure: unrolled RNN with the topic vector 𝒜 as an extra input at each step.]

To estimate a program’s anomaly score, we must compute a sum in which Y ranges over paths in the behavior automaton: a problem in automata analysis! We estimate this sum by sampling paths (see the sketch below).
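A sketch of the path-sampling estimator (the automaton encoding and the scoring function are illustrative stand-ins):

import random

# automaton[state] = list of (symbol, next_state, probability); "T" accepts.
automaton = {
    "q0": [("new A(...)", "q1", 1.0)],
    "q1": [("setTitle(...)", "q2", 1.0)],
    "q2": [("setItems(...)", "q3", 0.5), ("show()", "T", 0.5)],
    "q3": [("show()", "T", 1.0)],
}

def sample_path(automaton, start="q0", accept="T"):
    state, path = start, []
    while state != accept:
        transitions = automaton[state]
        symbol, state, _ = random.choices(
            transitions, weights=[p for _, _, p in transitions])[0]
        path.append(symbol)
    return path

def estimate(automaton, logprob, n=1000):
    # Monte-Carlo estimate of the expected log-probability over paths.
    return sum(logprob(sample_path(automaton)) for _ in range(n)) / n

# In BayouDebug, logprob would come from the topic-conditioned RNN;
# a stub scorer stands in here.
print(estimate(automaton, logprob=lambda path: -float(len(path))))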
AlertDialog.Builder b = new AlertDialog.Builder(…); b.setTitle(…); if (…) { b.setItems (…); } else if (…) { b.setItems(…); } b.show();
Evaluated* on an Android corpus comprising millions of lines of code.

* Called Salento in the original paper.
Example behaviors of programs with top-10% anomaly scores
Lessons. Classic programming-tools research typically ignores uncertainty in intent and incompleteness of knowledge; learning lets us bring both into the process. Formal methods research typically solves each problem from scratch, relying on hand-written specifications; data can provide this knowledge instead. A caveat: deep networks are not always required, and non-deep models with explicitly represented features might work as well or better.
Predicting Program Properties from “Big Code”. Raychev, Vechev, and Krause. POPL 2015.
…. ICML 2014.
Emerging wisdom: a straightforward application of deep learning is not enough; abstraction and compositionality are critical to handling discrete program structure and enforcing guarantees. The path forward combines symbolic ideas from PL with statistical, data-driven ideas.