Conditional Program Generation for Bimodal Program Synthesis
(Joint work with Chris Jermaine, Vijay Murali, and Letao Qi)
Swarat Chaudhuri
Rice University
www.cs.rice.edu/~swarat
Program synthesis [Simon 1963, Summers 1977, Manna-Waldinger 1977, Pnueli-Rosner 1989]
Specification → Synthesizer → Program + Correctness certificate

Specification: a logical constraint that must be satisfied exactly.
Algorithm: search for a program that satisfies the specification.
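For contrast with what follows, here is a minimal sketch of this classical recipe as enumerative search, assuming a toy expression grammar and a specification given as input-output pairs. Everything here (the grammar, `programs`, `synthesize`) is a hypothetical illustration, not the talk's system.

```python
# Minimal sketch of classical enumerative synthesis (hypothetical example):
# search a tiny expression grammar for a program that exactly satisfies
# a specification, here given as input-output examples.

LEAVES = ["x", "1"]                       # terminals of the toy grammar
OPS = ["+", "*"]                          # binary operators

def programs(depth):
    """Enumerate expression strings of at most the given depth."""
    if depth == 0:
        yield from LEAVES
        return
    yield from programs(depth - 1)
    for op in OPS:
        for left in programs(depth - 1):
            for right in programs(depth - 1):
                yield f"({left} {op} {right})"

def synthesize(spec_examples, max_depth=2):
    """Return the first enumerated program satisfying every example."""
    for prog in programs(max_depth):
        if all(eval(prog, {"x": x}) == y for x, y in spec_examples):
            return prog
    return None

# Specification: f(x) = 2x + 1, given as examples.
print(synthesize([(0, 1), (1, 3), (2, 5)]))  # e.g. "((x + x) + 1)"
```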
An idealized program, described by ambiguous “evidence” + logical requirements
→ Synthesizer, with a prior distribution learned from a real-world code corpus
→ Posterior distribution over candidate implementations

Neural Sketch Learning for Conditional Program Generation. Murali, Qi, Chaudhuri, and Jermaine. arXiv 2017.
http://bit.ly/2zgP5fj
Assume random variables X and Prog, over labels and programs respectively, following a joint distribution Q(X, Prog).
Offline: Given a corpus {(X_i, Prog_i)} drawn from Q, learn from this a function g that maps evidence to programs.
Online: Given X, produce g(X).
Ideally, g should maximize the expectation of the indicator
    I = 1 if g(X) ≡ Prog, and 0 otherwise.
The map g is probabilistic. Learning is maximum conditional likelihood estimation:
    θ* = arg max_θ ∑_i log P(Prog_i | X_i; θ)
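To ground the objective, here is a toy illustration of maximum conditional likelihood estimation: a softmax model P(Prog | X; θ) over a small finite set of programs, trained by gradient ascent on ∑_i log P(Prog_i | X_i; θ). The model class, data, and dimensions are all made up for illustration.

```python
# Illustrative only: maximum conditional likelihood estimation for a toy
# conditional program generator. Programs come from a small finite set,
# evidence is a feature vector, and P(Prog | X; theta) is a softmax.
import numpy as np

rng = np.random.default_rng(0)
NUM_PROGRAMS, NUM_FEATURES = 4, 8

# Hypothetical training corpus: (evidence vector, index of true program).
X = rng.normal(size=(100, NUM_FEATURES))
progs = rng.integers(0, NUM_PROGRAMS, size=100)

theta = np.zeros((NUM_FEATURES, NUM_PROGRAMS))

def log_p(theta, X):
    """log P(Prog | X; theta): row-wise log-softmax of X @ theta."""
    logits = X @ theta
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

for step in range(200):                     # gradient ascent on the
    lp = log_p(theta, X)                    # conditional log-likelihood
    probs = np.exp(lp)
    onehot = np.eye(NUM_PROGRAMS)[progs]
    grad = X.T @ (onehot - probs) / len(X)  # d/dtheta of sum_i log P(...)
    theta += 0.5 * grad

print("log-likelihood:", lp[np.arange(len(X)), progs].sum())
```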
Language capturing the essence of API usage in Java.
Prog  ::= skip | Prog1 ; Prog2 | call Call | let x = Call
        | if Exp then Prog1 else Prog2 | while Exp do Prog1
        | try Prog1 Catch
Exp   ::= Sexp | Call | let x = Call : Exp1
Sexp  ::= c | x
Call  ::= Sexp0.a(Sexp1, ..., Sexpk)
Catch ::= catch(x1) Prog1 ... catch(xk) Progk

Here a is an API method name, and Sexp0.a(Sexp1, ..., Sexpk) is an API call.
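One way to make the grammar concrete is as an AST; the following Python dataclasses are a hypothetical rendering of the productions above (the talk defines only the grammar, not this code).

```python
# Hypothetical AST for the core API-usage language above; the names mirror
# the grammar, not any released implementation.
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Sexp = str                            # a constant c or a variable x

@dataclass
class Call:                           # Sexp0.a(Sexp1, ..., Sexpk)
    receiver: Sexp
    method: str                       # API method name a
    args: List[Sexp] = field(default_factory=list)

Exp = Union[Sexp, Call]               # eliding "let x = Call : Exp1" for brevity

@dataclass
class Skip:                           # skip
    pass

@dataclass
class Seq:                            # Prog1 ; Prog2
    first: "Prog"
    second: "Prog"

@dataclass
class Let:                            # let x = Call
    var: str
    call: Call

@dataclass
class If:                             # if Exp then Prog1 else Prog2
    cond: Exp
    then: "Prog"
    orelse: "Prog"

@dataclass
class While:                          # while Exp do Prog1
    cond: Exp
    body: "Prog"

@dataclass
class Try:                            # try Prog1 catch(x1) P1 ... catch(xk) Pk
    body: "Prog"
    handlers: List[Tuple[str, "Prog"]] = field(default_factory=list)

Prog = Union[Skip, Seq, Call, Let, If, While, Try]
```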
The evidence about the desired program can take several forms:
Set of API calls
Set of API datatypes
Set of keywords that may appear while describing program actions in English (example: “split”)
Directly learning over source code simply doesn't work: real code must satisfy many structural and semantic constraints, such as type safety. Learning to satisfy these constraints is hard.
Language abstractions to the rescue! Learn not over programs, but over typed, syntactic models of programs.
The sketch of a program is obtained by applying an abstraction function α. From sketch Y to program Prog: a fixed concretization distribution P(Prog | Y). The learning goal changes to: given {(X_i, Y_i)}, solve
    θ* = arg max_θ ∑_i log P(Y_i | X_i; θ)
Y     ::= Call | skip | while Cond do Y1 | Y1 ; Y2
        | try Y1 Catch | if Cond then Y1 else Y2
Catch ::= catch(τ1) Y1 ... catch(τk) Yk
Cond  ::= {Call1, ..., Callk}
Call  ::= a(τ1, ..., τk)

Here a(τ1, ..., τk) is an abstract API call: a method name a applied to argument types τi.
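To connect the two grammars, here is a guess at what the abstraction function α might look like over the hypothetical AST classes sketched earlier: it keeps API names and types and throws away variable names and concrete data. `type_of` is an assumed helper mapping a simple expression to its API datatype τ; the encoding of sketches as tuples is likewise an assumption.

```python
# A guess at the abstraction function alpha: core-language Prog -> sketch.
# Keeps API method names and types; drops variable names and concrete data.
# Reuses the hypothetical AST classes above. Requires Python 3.10+.

def alpha(prog, type_of):
    """Abstract a Prog into a sketch term, here encoded as nested tuples."""
    match prog:
        case Skip():
            return ("skip",)
        case Call(method=a, args=args):
            return ("call", a, tuple(type_of(s) for s in args))
        case Let(call=c):
            return alpha(c, type_of)              # the bound name is dropped
        case Seq(first=p1, second=p2):
            return ("seq", alpha(p1, type_of), alpha(p2, type_of))
        case If(cond=c, then=p1, orelse=p2):
            return ("if", abstract_cond(c, type_of),
                    alpha(p1, type_of), alpha(p2, type_of))
        case While(cond=c, body=p):
            return ("while", abstract_cond(c, type_of), alpha(p, type_of))
        case Try(body=p, handlers=hs):
            return ("try", alpha(p, type_of),
                    tuple((type_of(x), alpha(h, type_of)) for x, h in hs))

def abstract_cond(exp, type_of):
    """A sketch Cond is a set of abstract calls appearing in the guard."""
    return frozenset([alpha(exp, type_of)] if isinstance(exp, Call) else [])
```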
Evidence X + Logical requirement φ
→ End-to-end differentiable neural architecture, learned from (X_j, Y_j) pairs: gives P(Y | X); sample sketches from it
→ Combinatorial “concretization” (sketch → executable code) by a type-directed, compositional synthesizer
→ Implementations satisfying φ
Not all sketches may be realizable as executable programs
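Read as pseudocode, the pipeline might look like the sketch below; `sample_sketch`, `concretize`, and `satisfies` are hypothetical stand-ins for the neural sampler, the type-directed synthesizer, and the requirement check, respectively.

```python
# The end-to-end pipeline as pseudocode, with hypothetical helpers.

def bimodal_synthesize(evidence, phi, n_sketches=100):
    """Sample sketches from P(Y | X); concretize; keep programs meeting phi."""
    implementations = []
    for _ in range(n_sketches):
        sketch = sample_sketch(evidence)       # neural:  Y ~ P(Y | X)
        # Not every sketch is realizable as executable code: the
        # type-directed search may yield nothing, and we just move on.
        for program in concretize(sketch):     # combinatorial concretization
            if satisfies(program, phi):        # check the logical requirement
                implementations.append(program)
    return implementations
```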
Learning using a probabilistic encoder-decoder
X: Evidence.  Y: Sketches.  Z: Latent “intent”.

X → Encoder f → Z → Decoder g → Y

The encoder f maps evidence X to a representation f(X) of the latent intent Z; the prior on Z is used for regularization.
P(Z) = Normal(0, I)        P(f(X) | Z) = Normal(Z, σ²I)

During learning, use Jensen's inequality to get a smooth loss function.
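Written out (my reconstruction, assuming the model factorizes so that Y depends on X only through Z), the Jensen step lower-bounds the intractable marginal likelihood:

```latex
% Reconstruction of the Jensen step. Assumes Y is independent of X given Z,
% i.e. the sketch depends on the evidence only through the latent intent.
\log P(Y \mid X)
   = \log \int P(Z \mid X)\, P(Y \mid Z)\, dZ
   \;\ge\; \int P(Z \mid X)\, \log P(Y \mid Z)\, dZ
   = \mathbb{E}_{Z \sim P(Z \mid X)}\!\left[\log P(Y \mid Z)\right].
```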
During inference, get P(Z | X) using normal-normal conjugacy.
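Because a normal prior and a normal likelihood are conjugate, the posterior over Z given the evidence encodings has a closed form. Below is a small numpy sketch of that computation; the dimensions, σ², and encoding values are made up.

```python
# Normal-normal conjugacy for the latent intent (illustrative numbers).
# Prior: Z ~ Normal(0, I). Likelihood: each encoding f(X_j) ~ Normal(Z, s2*I).
import numpy as np

def posterior_over_Z(encodings, s2):
    """Closed-form P(Z | f(X_1), ..., f(X_n)): returns (mean, variance)."""
    n = len(encodings)
    precision = 1.0 + n / s2              # prior precision 1 plus n / sigma^2
    mean = np.sum(encodings, axis=0) / s2 / precision
    return mean, 1.0 / precision          # isotropic posterior covariance

encodings = [np.array([0.5, -1.0]), np.array([0.7, -0.8])]  # two f(X_j), d=2
mu, var = posterior_over_Z(encodings, s2=0.25)
z = np.random.default_rng(0).normal(mu, np.sqrt(var))  # sample Z ~ P(Z | X)
print(mu, var, z)
```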
The decoder defines a distribution on the rules that can be fired at a point, given the history so far (e.g., probabilities 0.3 and 0.7 over two candidate rules). The history is encoded as a real vector.
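A single decoder step might look like the following sketch; the shapes, parameters, and masking are illustrative assumptions, not the talk's architecture.

```python
# One decoder step, sketched: score the grammar rules that may fire at the
# current point from a real-vector encoding of the history.
import numpy as np

RULES = ["Y -> Call", "Y -> skip", "Y -> while Cond do Y1", "Y -> Y1 ; Y2",
         "Y -> try Y1 Catch", "Y -> if Cond then Y1 else Y2"]

def rule_distribution(history_vec, W, b, legal_mask):
    """Softmax over rules, with rules that cannot fire here masked out."""
    logits = W @ history_vec + b
    logits[~legal_mask] = -np.inf         # rules that cannot fire at this point
    p = np.exp(logits - logits.max())
    return p / p.sum()

rng = np.random.default_rng(1)
h = rng.normal(size=32)                    # encoded history, d = 32
W, b = rng.normal(size=(len(RULES), 32)), np.zeros(len(RULES))
mask = np.ones(len(RULES), dtype=bool)
probs = rule_distribution(h, W, b, mask)
next_rule = rng.choice(RULES, p=probs)     # fire a rule, extend the history
```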
Evaluation: a corpus of real-world Java methods, covering ~1500 API types. Ill-typed candidate programs are ruled out by the type system during concretization, and results are compared against a baseline generative model.
Thank you! Questions?
swarat@rice.edu http://www.cs.rice.edu/~swarat
(Research funded by the DARPA MUSE award #FA8750-14-2-0270)