Program Synthesis and Description with Structured Machine Learning Models
Graham Neubig @Stanford CS379C 5/1/2018
Coding = Concept Implementation
sort list x in descending order
x.sort(reverse=True)
The (Famous) Stack
Formulate the Idea
sort my_list in descending order
Search the Web
python sort list in descending order
Browse through the results
sorted(my_list, reverse=True)
Modify the result
x.sort(reverse=True)
sort list x in descending order
Natural Language: human interpretable; ambiguous; structured, but flexible
Code: human and machine interpretable; precise in interpretation; structured w/o flexibility
Note: Good summary in Allamanis et al. (2017)
if x % 5 == 0:
AST Parser
[Figure: AST of the statement above, with nodes If, Compare, BinOp, Name (x, Load), Num (5), Num (0) and operators % and ==]
(used in models of Maddison & Tarlow 2014)
Can we take advantage of this for better NL-code interfaces?
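The AST on this slide can be reproduced with Python's standard `ast` module. The sketch below is only an illustration: the helper name `node_types` is made up for this example, a `pass` body is added so the statement parses on its own, and recent Python versions report the numeric literals as `Constant` rather than `Num`.

```python
import ast

# The slide's statement, with a placeholder body so it parses on its own.
source = "if x % 5 == 0:\n    pass\n"
tree = ast.parse(source)


def node_types(node):
    """Return the class names of all AST nodes in depth-first order."""
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        names.extend(node_types(child))
    return names


types = node_types(tree)
print(types)
```

The printed list contains the node types named on the slide (If, Compare, BinOp, Name), plus the wrapper and operator nodes that the parser adds.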
(ASE 2015)
Joint Work w/ Yusuke Oda, Hiroyuki Fudaba, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura.
def func2(t):
    my_list = range(1,t)
    my_val = 0
    for x in my_list:
        my_val += x * x
    return my_val
def func1(t):
    …
class class1:
Single lines of code [Oda+ 2015]
Single variables [Sridhara+ 2011a, Allamanis+ 2015]
Code blocks [Sridhara+ 2011b, Wong+ 2013]
Functions/Methods [Movshovitz-Attias+ 2013], others
Classes [Moreno+ 2013]
Assisting Code Reading: Pseudo-code can help explain functionality of code
Debugging: Could provide a sanity check for programmers
Rule-based methods, e.g. [Buse+ 08, Sridhara+ 10, Sridhara+ 11, Moreno+ 13]
Sophisticated and robust, but high-maintenance and language-specific
Information retrieval methods, e.g. [Haiduc+ 10, Eddy+ 13, Wong+ 13, Rodeghero+ 14]
Data-driven and easy to construct, but lack generalizability and are error-prone
Machine Translation: もし x を 5 で 割り切れる なら -> if x is divisible by 5
Code Description: if x % 5 == 0 : -> if x is divisible by 5
Intent: call the function _generator, join the result into a string, return the result
Target:
compared to no pseudo-code
models, e.g.
Attention Model, Iyer et al. 2016
Summarization of Source Code, Allamanis et al. 2016.
(ACL 2017) Joint Work w/ Pengcheng Yin
Interface by William Qian
language programming (e.g. see Balzer 1985)
based statistical models (e.g. Wong & Mooney 2007)
models for code generation in Python (Ling et al. 2016)
(Sutskever et al. 2014, Bahdanau et al. 2015)
[Figure: sequence-to-sequence model. Encoder RNNs read "sort list x backwards </s>"; decoder RNNs emit "sort ( x , reverse ...", with each emitted token fed back as the next decoder input]
(Python) as prior knowledge in a neural model
Input Intent: sort my_list in descending order
Generated AST: [Figure: Expr -> Call with fields expr[func], expr*[args], keyword*[keywords]; Name nodes str(sorted) and str(my_list); keyword ....]
Deterministic transformation (using Python astor library)
Surface Code: sorted(my_list, reverse=True)
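The deterministic AST-to-code step can be sketched as follows. The slide names the astor library; to stay within the standard library, this illustration swaps in `ast.unparse` (Python 3.9+), which plays the same role here. The AST is built by hand, standing in for what the model would generate.

```python
import ast

# Hand-built AST for the slide's example: a Call to `sorted` with one
# positional argument and one keyword argument, wrapped in an Expr.
call = ast.Call(
    func=ast.Name(id="sorted", ctx=ast.Load()),
    args=[ast.Name(id="my_list", ctx=ast.Load())],
    keywords=[ast.keyword(arg="reverse", value=ast.Constant(value=True))],
)
tree = ast.Expr(value=call)

# Deterministic transformation from AST to surface code.
# Assumption: ast.unparse stands in for the astor library named on the slide.
surface_code = ast.unparse(tree)
print(surface_code)
```

Because this step is deterministic, the model only has to get the tree right; the surface code follows from it.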
NOTE: very nice contemporaneous work by Rabinovich et al. (2017)
[Figure: model architecture. An LSTM encoder reads the NL intent ("sort my_list in descending order"); an LSTM decoder produces the action sequence, with parent feeding (Dong and Lapata, 2016) along the action flow]
[Figure: token prediction. Generation path: softmax over the vocabulary. Copy path: pointer net softmax over the input words (copy from input)]
Generation prob. Copy prob.
Final probability: marginalize over the two paths
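One way to read "marginalize over the two paths" is as a mixture of a vocabulary softmax and a pointer distribution over input words. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names, the raw scores, and the fixed gate value `p_gen` are hypothetical, and in a real model all of them would be predicted by the network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_probability(token, vocab, vocab_scores, input_words, pointer_scores, p_gen):
    """p(token) = p(gen) * p(token | gen) + p(copy) * p(token | copy)."""
    p_vocab = softmax(vocab_scores)      # generation path: softmax over the vocabulary
    p_pointer = softmax(pointer_scores)  # copy path: pointer distribution over input words
    gen = p_vocab[vocab.index(token)] if token in vocab else 0.0
    # A token may occur at several input positions; sum the pointer mass over all of them.
    copy = sum(p for word, p in zip(input_words, p_pointer) if word == token)
    return p_gen * gen + (1.0 - p_gen) * copy


# Hypothetical numbers for the running example.
vocab = ["sorted", "reverse", "True"]
input_words = ["sort", "my_list", "in", "descending", "order"]
p_my_list = token_probability(
    "my_list", vocab, [2.0, 0.5, 0.1], input_words, [0.1, 3.0, 0.0, 0.2, 0.2], p_gen=0.3
)
print(p_my_list)
```

Here `my_list` is not in the vocabulary, so all of its probability comes from the copy path, which is how variable names from the intent can still reach the output.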
Derivation
implementation
(Semantic Parsing)
APP
Intent (Card Property): <name> Divine Favor </name> <cost> 3 </cost> <desc> Draw cards until you have as many in hand as your ...
Target (Python class, extracted from HearthBreaker)
[Ling et al., 2016]
HearthBreaker
crawled from ifttt.com
productivity, etc.
THAT structure, much simpler grammar
Intent: Autosave your Instagram photos to Dropbox
Target: IF Instagram.AnyNewPhotoByYou THEN Dropbox.AddFileFromURL
https://ifttt.com/applets/1p-autosave-your-instagram-photos-to-dropbox
[Quirk et al., 2015]
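Because the IFTTT target follows a fixed IF ... THEN ... pattern, it can be represented with a very small data structure. The sketch below is a hypothetical illustration: the class names `Call` and `Recipe` and the function `parse_recipe` are made up for this example and are not part of the dataset or of any IFTTT API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    channel: str   # e.g. "Instagram"
    function: str  # e.g. "AnyNewPhotoByYou"


@dataclass(frozen=True)
class Recipe:
    trigger: Call  # the IF part
    action: Call   # the THEN part


def parse_call(text):
    """Split 'Channel.Function' into its two parts."""
    channel, sep, function = text.partition(".")
    if not sep or not channel or not function:
        raise ValueError(f"expected Channel.Function, got {text!r}")
    return Call(channel, function)


def parse_recipe(text):
    """Parse a target of the form 'IF <trigger> THEN <action>'."""
    tokens = text.split()
    if len(tokens) != 4 or tokens[0] != "IF" or tokens[2] != "THEN":
        raise ValueError(f"expected 'IF <trigger> THEN <action>', got {text!r}")
    return Recipe(trigger=parse_call(tokens[1]), action=parse_call(tokens[3]))


recipe = parse_recipe("IF Instagram.AnyNewPhotoByYou THEN Dropbox.AddFileFromURL")
print(recipe)
```

The whole target reduces to four slots (trigger channel, trigger function, action channel, action function), which is what makes this grammar so much simpler than general-purpose Python.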
–Latent Predictor Network [Ling et al., 2016]
–Seq2Tree [Dong and Lapata, 2016]
–Doubly recurrent RNN [Alvarez-Melis and Jaakkola, 2017]
–Modeling syntax helps for code generation and semantic parsing ☺
Intent: join app_config.path and string 'locale' into a file path, substitute it for localedir.
Pred.
Intent: self.plural is an lambda function with an argument n, which returns result of boolean expression n not equal to integer 1
Pred.
Ref.
Intent: <name> Burly Rockjaw Trogg </name> <cost> 5 </cost> <attack> 3 </attack> <defense> 5 </defense> <desc> Whenever your opponent casts a spell, gain 2 Attack. </desc> <rarity> Common </rarity> ...
Ref.
tokens copied from input
(In Progress)
Joint Work w/ Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu
IFTTT, manually curated datasets
promises a large data source for code synthesis
don’t necessarily reflect the answer to the original question
–Position: Is the snippet a full block? The start/end of a block? The only block in an answer?
–Code Features: Contains import? Starts w/ assignment? Is value?
–Answer Quality: Answer is accepted? Answer is rank 1, 2, 3?
–Length: What is the number of lines?
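A few of the features listed above could be computed along these lines. This is a rough sketch under stated assumptions: the function name `snippet_features`, its arguments, and the feature names are hypothetical, only top-level statements are inspected, and the "Is value?" and block-boundary features are left out because the slide does not define them.

```python
import ast


def snippet_features(snippet, is_accepted, answer_rank, blocks_in_answer):
    """Compute a handful of hand-crafted features for one candidate snippet."""
    lines = [line for line in snippet.splitlines() if line.strip()]
    try:
        body = ast.parse(snippet).body  # top-level statements only
    except SyntaxError:
        body = []  # unparsable snippets simply get no code features
    return {
        "only_block_in_answer": blocks_in_answer == 1,
        "contains_import": any(isinstance(n, (ast.Import, ast.ImportFrom)) for n in body),
        "starts_with_assignment": bool(body) and isinstance(body[0], ast.Assign),
        "answer_is_accepted": is_accepted,
        "answer_rank": answer_rank,
        "num_lines": len(lines),
    }


features = snippet_features(
    "import os\npath = os.getcwd()", is_accepted=True, answer_rank=1, blocks_in_answer=1
)
print(features)
```

Such a feature dictionary would then be fed to a classifier that decides whether the snippet answers the question.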
–Train an RNN to predict P(intent | snippet) and P(snippet | intent) given heuristically extracted noisy data
–Use log probabilities, normalized by z-score over the post, etc.
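The z-score normalization mentioned above might look like the following. This is a minimal sketch: the function name `z_normalize` is hypothetical, the log probabilities are made-up numbers standing in for RNN outputs, and the population standard deviation is an assumption since the slide does not say which variant is used.

```python
import statistics


def z_normalize(log_probs):
    """Z-score a non-empty list of log probabilities from one post."""
    mean = statistics.fmean(log_probs)
    std = statistics.pstdev(log_probs)  # assumption: population standard deviation
    if std == 0.0:
        # All candidates scored the same, so none stands out.
        return [0.0 for _ in log_probs]
    return [(lp - mean) / std for lp in log_probs]


# Hypothetical log P(snippet | intent) for three candidate snippets in one post.
scores = z_normalize([-1.0, -2.0, -3.0])
print(scores)
```

Normalizing within a post makes scores comparable across posts whose snippets differ in length and overall likelihood.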
better results than heuristic strategies
correspondence features were necessary
language?
Python Java
workers confirm or deny our model's extracted snippets
helpful