Programming Languages and Machine Learning, Martin Vechev (PowerPoint presentation)



SLIDE 1

Programming Languages and Machine Learning

Martin Vechev, DeepCode.ai and ETH Zurich

SLIDE 2

PL Research: Last 10 years (sample)

  • (Semi-)Automated Program Synthesis
    • Mostly learning functions/algorithms over discrete spaces (from examples, natural language, components, partial specs, etc.)
  • Automated Symbolic Reasoning
    • Abstract Interpretation = theory of sound & precise approximation
    • SMT solvers
  • Approximate/Probabilistic Programming
    • Applications/Analysis/Synthesis
SLIDE 3

Two part talk (22 + 3):
  1. Learning-based Programming Engines (SLANG, Deep3): http://plml.ethz.ch
  2. PSI: Exact Solver for Probabilistic Programs: http://psisolver.org

  • 1. Pick a structure of interest, e.g., trees
  • 2. Define a DSL for expressing functions (can be Turing complete):

    TCond  ::= ε | WriteOp TCond | MoveOp TCond
    MoveOp ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue
    WriteOp ::= WriteValue | WriteType | WritePos

  • 3. Synthesize fbest ∊ DSL from dataset D:  fbest = argmin_{f ∊ DSL} cost(D, f)
  • 4. Use fbest on new structures

  def main() {
    p := Uniform(0,1);
    r := [1,1,0,1,0];
    for i in [0..r.len] {
      observe(Bernoulli(p) == r[i]);
    }
    return p;
  }

SLIDE 4

  • 1. Pick a structure of interest, e.g., trees
  • 2. Define a DSL for expressing functions (can be Turing complete):

    TCond  ::= ε | WriteOp TCond | MoveOp TCond
    MoveOp ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue
    WriteOp ::= WriteValue | WriteType | WritePos

  • 3. Synthesize fbest ∊ DSL from dataset D:  fbest = argmin_{f ∊ DSL} cost(D, f)
  • 4. Use fbest on new structures

Learning-based Programming Engines (SLANG, Deep3): http://plml.ethz.ch

Two part talk (22 + 3)

SLIDE 5

Probabilistic Learning from Code

Task → Statistical Engine (probabilistic model) → Solution

15 million repositories, billions of lines of code: high quality, tested, maintained programs (chart: number of repositories over the last 5 years). PL + ML.

SLIDE 6

Probabilistic Learning from Code

Probabilistically likely solutions to problems hard to solve otherwise

Joint work with: Veselin Raychev, Pavol Bielik, Christine Zeller, Svetoslav Karaivanov, Pascal Roos, Benjamin Bischel, Andreas Krause, Timon Gehr, Petar Tsankov, Mateo Panzacchi

Publications

  • Program Synthesis for Char. Level Language Modeling, ICLR'17 sub
  • Learning a Static Analyzer from Data, https://arxiv.org/abs/1611.01752
  • Statistical Deobfuscation of Android Applications, ACM CCS'16
  • Probabilistic Model for Code with Decision Trees, ACM OOPSLA'16
  • PHOG: Probabilistic Model for Code, ACM ICML'16
  • Learning Programs from Noisy Data, ACM POPL'16
  • Predicting Program Properties from "Big Code", ACM POPL'15
  • Code Completion with Statistical Language Models, ACM PLDI'14
  • Machine Translation for Programming Languages, ACM Onward'14

Statistical Engines: SLANG, DEEP3, apk-deguard.com, jsnice.org, nice2predict.org

more: http://plml.ethz.ch

SLIDE 7

JSNice.org

  • Every country
  • Top ranked tool
  • ~200,000 users

SLIDE 8

A Key Question

Probabilistic Model:
  • High Precision
  • Efficient Learning
  • Widely Applicable
  • Explainable Predictions

(Data → Learning → Model)
SLIDE 9

query:
  f.open("file", "r");
  f.?

Training dataset D:
  f.open("f2", "r"); f.read();
  f.open("f1", "r"); f.read();
  f.open("f2", "w"); f.write("c");

SLIDE 10

query:
  f.open("file", "r");
  f.?

3-gram model on tokens (Hindle et al., ACM ICSE'12); context δ: the two preceding tokens:
  P(open | f.) ≈ 3/6,  P(read | f.) ≈ 2/6,  P(write | f.) ≈ 1/6

Training dataset D:
  f.open("f2", "r"); f.read();
  f.open("f1", "r"); f.read();
  f.open("f2", "w"); f.write("c");
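The counting above is just a 3-gram table. A minimal sketch over the slide's training dataset (whitespace tokenization and unsmoothed MLE estimates are simplifying assumptions; Hindle et al.'s system uses a real lexer and smoothing):

```python
from collections import Counter, defaultdict

# the slide's training dataset D, pre-tokenized
programs = [
    'f . open ( "f2" , "r" ) ; f . read ( ) ;',
    'f . open ( "f1" , "r" ) ; f . read ( ) ;',
    'f . open ( "f2" , "w" ) ; f . write ( "c" ) ;',
]

counts = defaultdict(Counter)            # (t1, t2) -> Counter over next token
for prog in programs:
    toks = prog.split()
    for a, b, c in zip(toks, toks[1:], toks[2:]):
        counts[(a, b)][c] += 1

def p(token, context):
    """MLE estimate of P(token | two-token context)."""
    total = sum(counts[context].values())
    return counts[context][token] / total if total else 0.0

print(p('open', ('f', '.')))             # 3/6 = 0.5, as on the slide
```

The context ('f', '.') occurs six times in D, three of them followed by open, which reproduces the 3/6, 2/6, 1/6 figures.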

SLIDE 11

query:
  f.open("file", "r");
  f.open        ← prediction

3-gram model on tokens (Hindle et al., ACM ICSE'12); context δ: the two preceding tokens:
  P(open | f.) ≈ 3/6,  P(read | f.) ≈ 2/6,  P(write | f.) ≈ 1/6

Training dataset D:
  f.open("f2", "r"); f.read();
  f.open("f1", "r"); f.read();
  f.open("f2", "w"); f.write("c");

SLIDE 12

query:
  f.open("file", "r");
  f.?

3-gram model on tokens (Hindle et al., ACM ICSE'12); context δ: the two preceding tokens:
  P(open | f.) ≈ 3/6,  P(read | f.) ≈ 2/6,  P(write | f.) ≈ 1/6

probabilistic model on APIs (Raychev et al., ACM PLDI'14); context δ: the previous API call:
  P(read | open) ≈ 2/3,  P(write | open) ≈ 1/3

Training dataset D:
  f.open("f2", "r"); f.read();
  f.open("f1", "r"); f.read();
  f.open("f2", "w"); f.write("c");

SLIDE 13

query:
  f.open("file", "r");
  f.read        ← prediction

3-gram model on tokens (Hindle et al., ACM ICSE'12); context δ: the two preceding tokens:
  P(open | f.) ≈ 3/6,  P(read | f.) ≈ 2/6,  P(write | f.) ≈ 1/6

probabilistic model on APIs (Raychev et al., ACM PLDI'14); context δ: the previous API call:
  P(read | open) ≈ 2/3,  P(write | open) ≈ 1/3

Training dataset D:
  f.open("f2", "r"); f.read();
  f.open("f1", "r"); f.read();
  f.open("f2", "w"); f.write("c");

SLIDE 14

query:
  f.open("file", "r");
  f.?

What should the context be?

3-gram model on tokens (Hindle et al., ACM ICSE'12); context δ: the two preceding tokens:
  P(open | f.) ≈ 3/6,  P(read | f.) ≈ 2/6,  P(write | f.) ≈ 1/6

probabilistic model on APIs (Raychev et al., ACM PLDI'14); context δ: the previous API call:
  P(read | open) ≈ 2/3,  P(write | open) ≈ 1/3

Training dataset D:
  f.open("f2", "r"); f.read();
  f.open("f1", "r"); f.read();
  f.open("f2", "w"); f.write("c");

SLIDE 15

key idea: synthesize a function f: program → context δ

"…All problems in computer science can be solved by another level of indirection…"
  - David Wheeler
SLIDE 16

Creating probabilistic models: our method

  • 1. Pick a structure of interest, e.g., ASTs
  • 2. Define a DSL for expressing functions (can be Turing complete)
  • 3. Synthesize fbest ∊ DSL from dataset D:  fbest = argmin_{f ∊ DSL} cost(D, f)
  • 4. Use fbest to compute context δ and predict

["Learning Programs from Noisy Data", ACM POPL'16; "PHOG: Probabilistic Model for Code", ACM ICML'16; "Probabilistic Model for Code with Decision Trees", ACM OOPSLA'16]

SLIDE 17

Step 1: Pick Structure of Interest

Let it be abstract syntax trees (ASTs) of programs

JavaScript program:
  elem.notify({ position: 'top', autoHide: false, delay: 100 });

AST:
  CallExpression
    MemberExpression
      Identifier elem
      Property notify
    ObjectExpression
      Property position (String 'top')
      Property autoHide (Boolean false)
      Property delay (100)
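The slide's tree can be written down directly. A hand-built sketch (the Node shape is an assumption; the labels loosely follow the ESTree-style names on the slide):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    type: str                   # e.g. 'CallExpression'
    value: str = None           # leaf payload, if any
    children: list = field(default_factory=list)

# elem.notify({ position: 'top', autoHide: false, delay: 100 });
ast = Node('CallExpression', children=[
    Node('MemberExpression', children=[
        Node('Identifier', 'elem'),
        Node('Property', 'notify'),
    ]),
    Node('ObjectExpression', children=[
        Node('Property', 'position', [Node('String', "'top'")]),
        Node('Property', 'autoHide', [Node('Boolean', 'false')]),
        Node('Property', 'delay', [Node('Number', '100')]),
    ]),
])

def leaves(n):
    """Left-to-right leaf values of the tree."""
    if not n.children:
        return [n.value]
    return [v for c in n.children for v in leaves(c)]

print(leaves(ast))  # ['elem', 'notify', "'top'", 'false', '100']
```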

SLIDE 18

Step 2: Define a DSL over structure

Syntax:
  TCond  ::= ε | WriteOp TCond | MoveOp TCond
  MoveOp ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue | PrevNodeContext
  WriteOp ::= WriteValue | WriteType | WritePos

Semantics: a MoveOp moves the current node in the tree; a WriteOp appends a fact about the current node to the context:  δ ← δ · …

Example program: Up Left WriteValue
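A toy interpreter makes the semantics concrete. This sketch is assumption-laden: the node layout (parent/children/value) is invented, and only three of the operations (Up, Left, WriteValue) are implemented:

```python
class N:
    """Minimal tree node with parent pointers (an assumed layout)."""
    def __init__(self, value, children=()):
        self.value, self.children, self.parent = value, list(children), None
        for c in self.children:
            c.parent = self

def run_tcond(program, node, ctx=()):
    """Execute a TCond op sequence starting at `node`, accumulating context δ."""
    for op in program:
        if op == 'Up':                      # MoveOp: go to the parent
            node = node.parent
        elif op == 'Left':                  # MoveOp: go to the previous sibling
            sibs = node.parent.children
            node = sibs[sibs.index(node) - 1]
        elif op == 'WriteValue':            # WriteOp: δ ← δ · value(node)
            ctx = ctx + (node.value,)
    return ctx

# e.g. to predict the hole, step Left to the previous property and record it
obj = N('ObjectExpression', [N('hide'), N('?')])
print(run_tcond(['Left', 'WriteValue'], obj.children[1]))  # ('hide',)
```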

SLIDE 19

Step 3: synthesize fbest

fbest = argmin_{f ∊ DSL} cost(D, f)

SLIDE 20

fbest = argmin_{f ∊ DSL} cost(D, f)

Synthesizer inputs:
  • DSL:  TCond ::= ε | WriteOp TCond | MoveOp TCond;  MoveOp ::= Up, Left, Right, ...;  WriteOp ::= WriteValue, WriteType, ...
  • dataset D: millions of examples (≈ 10^8)

Synthesizer loop:
  1. generate a candidate f
  2. use D and f to build the probabilistic model P(element | f(·))
  3. cost(D, f) = entropy(P)

to scale: iterative synthesis on a fraction of the examples
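The loop above, in miniature. This is a sketch, not the paper's synthesizer: the candidate space is simplified from TCond to "condition on the last k tokens", the dataset is invented, and cost is the empirical conditional entropy of the induced model, as on the slide:

```python
import math
from collections import Counter, defaultdict

# toy dataset D: (tokens before the hole, element to predict) -- made-up values
D = [(('f', '.', 'open'), 'read'), (('f', '.', 'open'), 'read'),
     (('f', '.', 'open'), 'write'), (('f', '.', 'read'), 'close'),
     (('f', '.', 'write'), 'close')]

def make_f(k):
    """Candidate context function: keep the last k tokens."""
    return lambda toks: toks[len(toks) - k:]

def cost(data, f):
    """entropy(P) for the model P(element | f(tokens)) estimated from data."""
    model = defaultdict(Counter)
    for toks, label in data:
        model[f(toks)][label] += 1
    h = 0.0
    for counts in model.values():
        total = sum(counts.values())
        for c in counts.values():
            h -= (c / len(data)) * math.log2(c / total)
    return h

# the argmin over candidates: k = 0 ignores context (high entropy);
# k >= 1 separates open/read/write contexts, so entropy drops
best_k = min(range(4), key=lambda k: cost(D, make_f(k)))
print(best_k, cost(D, make_f(best_k)))
```

In the real system the candidates are TCond programs over ASTs and D has ≈ 10^8 examples, hence the iterative, fraction-at-a-time search.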

SLIDE 21

Step 4: use fbest to predict

program:
  elem.notify( ... , ... , { position: 'top', hide: false, ? } );

fbest = Left WriteValue Up WritePos Up DownFirst DownLast WriteValue

  Left        δ = {}
  WriteValue  δ = {hide}
  Up          δ = {hide}
  WritePos    δ = {hide, 3}
  Up          δ = {hide, 3}
  DownFirst   δ = {hide, 3}
  DownLast    δ = {hide, 3}
  WriteValue  δ = {hide, 3, notify}

Context δ = {hide, 3, notify} = {Previous Property, Parameter Position, API name}

SLIDE 22

Deep3: Experimental Results [Probabilistic Model of JavaScript]

Probabilistic Model                             Accuracy (APIs)
  Last two tokens, Hindle et al. [ICSE'12]      22.2%
  Last two APIs, Raychev et al. [PLDI'14]       30.4%
  Deep3                                         66.6%

Dataset D: 150,000 files.  Training time: ~100 hours.  fbest ≈ 50,000 instructions.

Details in: "Probabilistic Model for Code with Decision Trees", ACM OOPSLA'16

SLIDE 23

Deep3: Experimental Results [Probabilistic Model of Python]

Probabilistic Model                             Accuracy (identifiers)
  Last two tokens, Hindle et al. [ICSE'12]      38%
  Deep3                                         51%

Dataset D: 150,000 files.  Training time: ~100 hours.  fbest ≈ 120,000 instructions.

Details in: "Probabilistic Model for Code with Decision Trees", ACM OOPSLA'16

SLIDE 24

Applying the Concept to Natural Language

[Program Synthesis for Character Level Language Modeling, ICLR'17 sub]

Hutter Prize Wikipedia Dataset; uses a char-level DSL with state.

Probabilistic Model                   Bits-per-Character
  7-gram (best)                       1.94
  Stacked LSTM (Graves 2013)          1.67
  Char-based DSL synthesis            1.62
  MRNN (Sutskever 2011)               1.60
  MI-LSTM (Wu et al. 2016)            1.44
  HM-LSTM* (Chung et al. 2016)        1.40

Training time: ~8 hours.  fbest ≈ 9,000 instructions.

Interpretable model, browse here: http://www.srl.inf.ethz.ch/charmodel.html

SLIDE 25

Learning (Abstract) Semantics

[Learning a Static Analyzer from Data, https://arxiv.org/abs/1611.01752]

  • Can be understood by experts
  • Found issues in Facebook's Flow

  function isBig(v) { return v < this.length }
  [12, 5].filter(isBig);

learned analysis ingredients (from the slide): VarPtsTo("global", h), VarPtsTo(this, h), checkIfInsideMethodCall, checkMethodCallName, checkReceiverType, checkNumberOfArguments, ...

SLIDE 26

ETH spin-off, co-founded in 2016 by Martin Vechev and Veselin Raychev
  • From code to predictions
  • Handles any programming language

We are hiring!

http://deepcode.ai

SLIDE 27

More Info

  • Learning from Large Codebases, Veselin Raychev, PhD Thesis, ETH Zurich, 2016
  • http://plml.ethz.ch
  • Dagstuhl Seminar on Big Code Analytics, Nov 2015
  • Data sets, tools, challenges: http://learningfrombigcode.org

SLIDE 28

Synthesis with NTMs

SLIDE 29

Two part talk (22 + 3):
  1. Learning-based Programming Engines (SLANG, Deep3): http://plml.ethz.ch
  2. PSI: Exact Solver for Probabilistic Programs: http://psisolver.org

  • 1. Pick a structure of interest, e.g., trees
  • 2. Define a DSL for expressing functions (can be Turing complete):

    TCond  ::= ε | WriteOp TCond | MoveOp TCond
    MoveOp ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue
    WriteOp ::= WriteValue | WriteType | WritePos

  • 3. Synthesize fbest ∊ DSL from dataset D:  fbest = argmin_{f ∊ DSL} cost(D, f)
  • 4. Use fbest on new structures

  def main() {
    p := Uniform(0,1);
    r := [1,1,0,1,0];
    for i in [0..r.len] {
      observe(Bernoulli(p) == r[i]);
    }
    return p;
  }

SLIDE 30

def main() {
  p := Uniform(0,1);
  r := [1,1,0,1,0];
  for i in [0..r.len] {
    observe(Bernoulli(p) == r[i]);
  }
  return p;
}

PSI: Exact Solver for Probabilistic Programs

http://psisolver.org

Two part talk (22 + 3)
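For this particular program the exact answer can also be worked out by hand: a Uniform(0,1) prior with Bernoulli observations yields a Beta posterior. A sketch of that closed form (not PSI's algorithm, which computes such densities symbolically for general programs):

```python
from math import comb

# the program's observations: observe(Bernoulli(p) == r[i]) for each i
r = [1, 1, 0, 1, 0]
ones, zeros = sum(r), len(r) - sum(r)

# Uniform(0,1) prior + Bernoulli likelihood => Beta(ones+1, zeros+1) posterior
a, b = ones + 1, zeros + 1                  # Beta(4, 3)

def posterior_pdf(p):
    # 1/B(a,b) = (a+b-1) * C(a+b-2, a-1) for integer a, b
    norm = (a + b - 1) * comb(a + b - 2, a - 1)
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)

posterior_mean = a / (a + b)                # 4/7, approx. 0.571
print(posterior_mean, posterior_pdf(0.5))   # 0.571..., 1.875
```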

SLIDE 31

PSI

Exact inference for probabilistic programs

http://psisolver.org/

Timon Gehr, Sasa Misailovic: PSI: Exact Symbolic Inference for Probabilistic Programs, CAV'16

SLIDE 32

PSI

def max(a, b) {
  r := a;
  if b > r { r = b; }
  return r;
}
def main() {
  x := Uniform(0,1);
  y := Gauss(0,1);
  z := Uniform(0,1);
  r := max(x, max(y, z));
  observe(x < 0.75);
  if Bernoulli(1/2) { assert(r < 0.9); }
  return r + UniformInt(1,3);
}

SLIDE 33

def max(a, b) {
  r := a;
  if b > r { r = b; }
  return r;
}
def main() {
  x := Uniform(0,1);
  y := Gauss(0,1);
  z := Uniform(0,1);
  r := max(x, max(y, z));
  observe(x < 0.75);
  if Bernoulli(1/2) { assert(r < 0.9); }
  return r + UniformInt(1,3);
}

Probability Density Function

PSI

SLIDE 34

a little nicer to look at…

def max(a, b) {
  r := a;
  if b > r { r = b; }
  return r;
}
def main() {
  x := Uniform(0,1);
  y := Gauss(0,1);
  z := Uniform(0,1);
  r := max(x, max(y, z));
  observe(x < 0.75);
  if Bernoulli(1/2) { assert(r < 0.9); }
  return r + UniformInt(1,3);
}

can compute various queries on the PDF, e.g., expectations, marginal probabilities:
  E[result] ≈ 2.6929
  Pr[error] ≈ 0.132827

PSI
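These exact figures can be cross-checked by sampling. A Monte Carlo sketch of the program above (assuming, as PSI's numbers suggest, that observe truncates x to (0, 0.75) and that E[result] conditions on the assertion not failing):

```python
import random

def run(rng):
    """One sample of the slide's program; returns (result, assertion_failed)."""
    x = rng.uniform(0.0, 0.75)       # observe(x < 0.75): Uniform(0,1) truncated
    y = rng.gauss(0.0, 1.0)
    z = rng.uniform(0.0, 1.0)
    r = max(x, y, z)
    failed = rng.random() < 0.5 and not (r < 0.9)  # Bernoulli(1/2) branch asserts
    return r + rng.randint(1, 3), failed           # UniformInt(1,3)

rng = random.Random(0)
samples = [run(rng) for _ in range(400_000)]
p_error = sum(failed for _, failed in samples) / len(samples)
ok = [v for v, failed in samples if not failed]
e_result = sum(ok) / len(ok)
print(p_error, e_result)   # close to PSI's exact 0.132827 and 2.6929
```

Unlike PSI, this only gives estimates; the point of the exact symbolic solver is that 0.132827 comes out as a closed form, not a confidence interval.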

SLIDE 35

PSI: Ingredients

  • Symbolic Domain for PDFs
  • Symbolic Simplification

SLIDE 36

  • 1. Pick a structure of interest, e.g., trees
  • 2. Define a DSL for expressing functions (can be Turing complete):

    TCond  ::= ε | WriteOp TCond | MoveOp TCond
    MoveOp ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue
    WriteOp ::= WriteValue | WriteType | WritePos

  • 3. Synthesize fbest ∊ DSL from dataset D:  fbest = argmin_{f ∊ DSL} cost(D, f)
  • 4. Use fbest on new structures

  def main() {
    p := Uniform(0,1);
    r := [1,1,0,1,0];
    for i in [0..r.len] {
      observe(Bernoulli(p) == r[i]);
    }
    return p;
  }

Learning-based Programming Engines (SLANG, Deep3): http://plml.ethz.ch

PSI: Probabilistic Solver: http://psisolver.org