Any-Code Completion

    public static Path[] stat2Paths(FileStatus[] stats) {
        if (stats == null)
            return null;
        Path[] ret = new Path[stats.length];
        for (int i = 0; i < stats.length; ++i) {
            ret[i] = stats[i].getPath();
        }
        return ret;
    }

Generated (Java):
    (25.2%) stats[i].getPath()
    (3.3%)  new Path(stats[i])
    (2.5%)  new Path(stats[i], charset)

Overview: a Structural Language Model
[Figure: AST of stats[i].getPath(): a MethodCall node over an ArrayAccess (Name stats, Name i) and a Name with subtokens get, path]

http://AnyCodeGen.org

Structural Language Models of Code
ICML 2020

Uri Alon (Technion), Roy Sadaka (Technion), Omer Levy (Tel-Aviv University, Facebook AI Research), Eran Yahav (Technion)

Language modeling of code
• Code completion
• Validate existing code, detect unlikely code.

Example (the slide highlights the expression stats.size()):

    public static Path[] stat2Paths(FileStatus[] stats) {
        if (stats == null)
            return null;
        Path[] ret = new Path[stats.size()];
        for (int i = 0; i < stats.length; ++i) {
            ret[i] = stats[i].getPath();
        }
        return ret;
    }

Key Idea #1: predict a missing subtree
Instead of representing the task as “predict a missing sentence in a text”, represent the task as “predict a missing subtree in a tree”. Learn syntactic patterns instead of sequential patterns.

Abstract Syntax Tree
Any valid code snippet can be parsed into an Abstract Syntax Tree (AST). The AST is composed of nodes, with user-defined values in its leaves.
[Figure: AST of stats[i].getPath(): MethodCall over ArrayAccess (Name stats, Name i) and Name (get, path)]
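To make the tree on this slide concrete, here is a minimal sketch that builds the AST of stats[i].getPath() by hand and lists its nodes in pre-order. The Node class, the "Type:value" rendering, and the use of a single getPath leaf (the slide splits it into the subtokens get and path) are illustrative assumptions, not the authors' data structures; a real pipeline would obtain the tree from a parser.

```java
import java.util.ArrayList;
import java.util.List;

public class AstSketch {
    /** Hypothetical minimal AST node: a type label, an optional leaf value, and children. */
    static final class Node {
        final String type;
        final String value; // null for non-leaf nodes
        final List<Node> children = new ArrayList<>();

        Node(String type, String value) {
            this.type = type;
            this.value = value;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Hand-built tree for stats[i].getPath(); a parser would normally produce it. */
    public static Node example() {
        Node arrayAccess = new Node("ArrayAccess", null)
                .add(new Node("Name", "stats"))
                .add(new Node("Name", "i"));
        return new Node("MethodCall", null)
                .add(arrayAccess)
                .add(new Node("Name", "getPath"));
    }

    /** Lists the nodes in pre-order (parent first, then children left to right). */
    public static List<String> preOrder(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    private static void visit(Node n, List<String> out) {
        out.add(n.value == null ? n.type : n.type + ":" + n.value);
        for (Node c : n.children) {
            visit(c, out);
        }
    }

    public static void main(String[] args) {
        // prints [MethodCall, ArrayAccess, Name:stats, Name:i, Name:getPath]
        System.out.println(preOrder(example()));
    }
}
```

Inner nodes carry only a type, while leaves carry the user-defined values, which mirrors the distinction the slide draws.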

Key Idea #2: a structural language model (SLM)
In a natural-language model:
$\Pr(Y) = \Pr(y_1, y_2, \dots, y_n) = \prod_{t=1}^{n} \Pr(y_t \mid y_{<t})$
But how can we compute the probability of a tree?

Key Idea #2: a structural language model (SLM)
Given a tree $\mathcal{A}$ (can be an arbitrary graph), induce an ordering over its nodes: $a_0, a_1, \dots, a_n \in \mathcal{A}$ (in practice: DFS).
A structural language model (SLM) computes the probability of the tree $\mathcal{A}$:
$\Pr(\mathcal{A}) = \prod_{t=0}^{n} \Pr(a_t \mid a_{<t})$
But how can we represent the partial tree $a_{<t}$ when computing $\Pr(a_t \mid a_{<t})$?
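As a small numeric illustration of the factorization above, the sketch below multiplies per-node conditional probabilities, given in DFS order, to obtain the probability of the whole tree. It works in log space for numerical stability, which is a common practice and an assumption here rather than something the slides state; the class and method names are hypothetical, and the conditionals would come from the model.

```java
public class TreeProbability {
    /** log Pr(A) = sum over t of log Pr(a_t | a_<t), with conditionals listed in DFS order. */
    public static double logProbability(double[] conditionals) {
        double logP = 0.0;
        for (double p : conditionals) {
            if (p <= 0.0 || p > 1.0) {
                throw new IllegalArgumentException("each conditional must lie in (0, 1]: " + p);
            }
            logP += Math.log(p);
        }
        return logP;
    }

    /** Pr(A) as the product of the per-node conditionals. */
    public static double probability(double[] conditionals) {
        return Math.exp(logProbability(conditionals));
    }

    public static void main(String[] args) {
        // Three nodes with conditionals 0.5, 0.8 and 0.25: the product is 0.1 (up to rounding).
        System.out.println(probability(new double[] {0.5, 0.8, 0.25}));
    }
}
```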

The fundamental tradeoff in code representation
[Figure: learning effort plotted against analysis effort. Representations range from surface text (token stream), through AST paths, to handcrafted features and data-flow and control-flow analysis.]
• High learning effort: implicitly re-learn syntactic & semantic regularities (model size, data, time…)
• High analysis effort: requires expertise; language-specific, task-specific model
• Sweet spot: AST paths [“A General Path-based Representation …”, PLDI’2018] [“code2vec”, POPL’2019]

Key Idea #3: a partial tree as AST paths
We compute the probability of a node, $\Pr(a_t \mid a_{<t})$, by considering the paths in the Abstract Syntax Tree (AST) from all leaves into $a_t$.
[Figure: partial AST with nodes Method Root and IfExpr; the node to predict is marked ?]

AST Paths
AST paths are simple paths over nodes in the AST. In previous works, we used AST paths to read code [“code2seq”, ICLR’2019]. In this work (SLM), we generate code by predicting the next node in a set of AST paths.
[Figure: partial AST with nodes Method Root and IfExpr; the node to predict is marked ?]
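The idea of describing a partial tree by its paths can be sketched as follows: for a target node that is about to be predicted, collect the simple path from every existing leaf to the target, plus the path from the root. The Node class, the helper names, and the tiny example tree (a simplified version of the running x > 1 example) are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class AstPaths {
    /** Hypothetical tree node with a parent pointer so paths can climb towards the root. */
    static final class Node {
        final String label;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Labels on the simple path between two nodes; both must belong to the same tree. */
    static List<String> path(Node from, Node to) {
        List<Node> up = new ArrayList<>();
        for (Node n = from; n != null; n = n.parent) {
            up.add(n);
        }
        Deque<Node> down = new ArrayDeque<>();
        Node meet = to;
        while (!up.contains(meet)) { // climb from the target to the common ancestor
            down.push(meet);
            meet = meet.parent;
        }
        List<String> labels = new ArrayList<>();
        for (Node n : up) {
            labels.add(n.label);
            if (n == meet) {
                break;
            }
        }
        for (Node n : down) {
            labels.add(n.label);
        }
        return labels;
    }

    /** Paths from every existing leaf to the target, followed by the root path. */
    static List<List<String>> pathsTo(Node root, Node target) {
        List<List<String>> paths = new ArrayList<>();
        collect(root, target, paths);
        paths.add(path(root, target));
        return paths;
    }

    private static void collect(Node n, Node target, List<List<String>> paths) {
        if (n == target) {
            return; // the target itself is not a context leaf
        }
        if (n.children.isEmpty()) {
            paths.add(path(n, target));
        }
        for (Node c : n.children) {
            collect(c, target, paths);
        }
    }

    /** Partial tree of x > 1 after x has been generated; "?" is the node to predict. */
    public static List<List<String>> demo() {
        Node root = new Node("Root", null);
        Node ifExpr = new Node("IfExpr", root);
        Node greater = new Node("Greater", ifExpr);
        Node name = new Node("Name", greater);
        new Node("x", name);
        Node target = new Node("?", greater);
        return pathsTo(root, target);
    }

    public static void main(String[] args) {
        // prints [[x, Name, Greater, ?], [Root, IfExpr, Greater, ?]]
        System.out.println(demo());
    }
}
```

Each path is a variable-length sequence of node labels, which is why a sequential encoder is a natural fit for turning it into a vector.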

AST Paths capture long-range interactions

    public static Path[] stat2Paths(FileStatus[] stats) {
        if (stats == null)
            return null;
        Path[] ret = new Path[stats.length];
        for (int i = 0; i < stats.length; ++i) {
            ret[i] = stats[i].getPath();
        }
        return ret;
    }

Model
• Any sequential encoder to encode each arbitrary-length path into a fixed-length vector separately (e.g., LSTM, transformer encoder)
• Any contextualizer to let all paths interact (e.g., transformer encoder)
• Attend to the contextualized paths using the root path as the query
[Figure: partial AST with nodes Method Root and IfExpr; the node to predict is marked ?]

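Model
Encode paths → Contextualize → Attend → Predict node
[Figure: the contextualized leaf paths form the context and the root path is the query; for the partial tree Method Root, IfExpr, ? the predicted node is Greater]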
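The attend step can be illustrated with plain dot-product attention: score each contextualized path vector against the root-path query, normalize the scores with a softmax, and take the weighted sum. The slides only say that the root path is used as the query, so the scoring function, the vector sizes, and all names below are assumptions for the sake of a runnable sketch.

```java
public class PathAttention {
    /** Numerically stable softmax over raw scores. */
    public static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        double[] weights = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            weights[i] = Math.exp(scores[i] - max);
            sum += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= sum;
        }
        return weights;
    }

    /** Weighted sum of path vectors, weighted by softmax(query . path). */
    public static double[] attend(double[] query, double[][] paths) {
        double[] scores = new double[paths.length];
        for (int i = 0; i < paths.length; i++) {
            for (int j = 0; j < query.length; j++) {
                scores[i] += query[j] * paths[i][j];
            }
        }
        double[] weights = softmax(scores);
        double[] context = new double[query.length];
        for (int i = 0; i < paths.length; i++) {
            for (int j = 0; j < query.length; j++) {
                context[j] += weights[i] * paths[i][j];
            }
        }
        return context;
    }

    public static void main(String[] args) {
        double[] rootPathQuery = {1.0, 0.0};
        double[][] contextualizedPaths = {{0.0, 1.0}, {0.0, 3.0}};
        // Both paths score 0 against the query, so each gets weight 0.5.
        double[] context = attend(rootPathQuery, contextualizedPaths);
        System.out.println(context[0] + ", " + context[1]);
    }
}
```

In a full model the resulting context vector would feed a classifier over node types and values; that part is omitted here.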

Generate the Tree of: x > 1
1. The partial tree contains Method Root and IfExpr; the next node (?) is unknown.
2. Greater is predicted; the next node (?) is unknown.
3. Name is predicted under Greater; the next node (?) is unknown.
4. The leaf value x is predicted under Name.
5. IntExp is predicted under Greater; its value (?) is unknown.
6. The leaf value 1 is predicted under IntExp, which completes the tree of x > 1: Method Root, IfExpr, Greater, Name x, IntExp 1.
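The node-by-node expansion above can be mimicked with a depth-first loop that repeatedly asks a predictor for the next child until it signals that the current node is complete. In this sketch the predictor is a hard-coded lookup table keyed only by the parent label, a deliberate simplification standing in for the neural model (which conditions on all paths of the partial tree); the end-of-children marker and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TreeGeneration {
    /** Hypothetical marker meaning "this node has no further children". */
    static final String EOS = "<EOS>";

    /** Stand-in for the model: picks the next child given the parent and the children so far. */
    interface Predictor {
        String next(String parentLabel, int childrenSoFar);
    }

    /** Toy predictor backed by a fixed table of children per parent label. */
    static Predictor tablePredictor(Map<String, List<String>> table) {
        return (parent, k) -> {
            List<String> kids = table.getOrDefault(parent, List.of());
            return k < kids.size() ? kids.get(k) : EOS;
        };
    }

    /** Expands the tree depth-first, recording each node in the order it is generated. */
    static void expand(String label, Predictor predictor, List<String> order) {
        order.add(label);
        for (int k = 0; ; k++) {
            String child = predictor.next(label, k);
            if (child.equals(EOS)) {
                return;
            }
            expand(child, predictor, order);
        }
    }

    /** Generates the subtree of x > 1 below IfExpr. */
    public static List<String> demo() {
        Map<String, List<String>> table = Map.of(
                "IfExpr", List.of("Greater"),
                "Greater", List.of("Name", "IntExp"),
                "Name", List.of("x"),
                "IntExp", List.of("1"));
        List<String> order = new ArrayList<>();
        expand("IfExpr", tablePredictor(table), order);
        return order;
    }

    public static void main(String[] args) {
        // prints [IfExpr, Greater, Name, x, IntExp, 1]
        System.out.println(demo());
    }
}
```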

Copy Mechanism
Sources for a predicted token: full token copy, subtoken copy, vocabulary.

    myNewFoo = myObj.getFoo();
    myNewFoo.setFooId(id);
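To illustrate what a subtoken copy operates on, the sketch below splits camelCase identifiers from the slide's example into subtokens and gathers the subtokens available in the context. The splitting rule (a break before every uppercase letter that follows a lowercase letter or digit) and the class and method names are assumptions; the slides do not specify the tokenization.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class Subtokens {
    /** Splits a camelCase identifier, e.g. myNewFoo into my, New, Foo. */
    public static List<String> split(String identifier) {
        return Arrays.asList(identifier.split("(?<=[a-z0-9])(?=[A-Z])"));
    }

    /** Subtokens of all context identifiers, in first-seen order. */
    public static Set<String> candidates(List<String> contextIdentifiers) {
        Set<String> out = new LinkedHashSet<>();
        for (String id : contextIdentifiers) {
            out.addAll(split(id));
        }
        return out;
    }

    public static void main(String[] args) {
        // Context identifiers from: myNewFoo = myObj.getFoo();
        Set<String> context = candidates(List.of("myNewFoo", "myObj", "getFoo"));
        System.out.println(context);           // prints [my, New, Foo, Obj, get]
        System.out.println(split("setFooId")); // prints [set, Foo, Id]
    }
}
```

Under this rule, the Foo in setFooId is available as a subtoken copy from the context, while set and Id are not.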

Example - Java

    public static Path[] stat2Paths(FileStatus[] stats) {
        if (stats == null)
            return null;
        Path[] ret = new Path[stats.length];
        for (int i = 0; i < stats.length; ++i) {
            ret[i] = stats[i].getPath();
        }
        return ret;
    }

Generated (Java):
    (25.2%) stats[i].getPath()
    (3.3%)  new Path(stats[i])
    (2.5%)  new Path(stats[i], charset)

Example - C#

    public static string Camelize(this string input)
    {
        var word = input.Pascalize();
        return word.Length > 0
            ? word.Substring(0, 1).ToLower() + word.Substring(1)
            : word;
    }

Generated (C#):
    (14.1%) word.Substring(0, 1)
    (8.2%)  word.trim()
    (5.8%)  word.Substring(1)

Java Results (trained on 1.3M examples)

    Model                     acc@1   acc@5   tree@1   tree@5
    seq2prod                    8.1    11.8     30.8     41.7
    Transformer-small+copy     14.2    21.4     31.8     47.4
    Transformer-base+copy      16.6    24.1     34.7     50.5
    LSTMs+attn+copy            16.9    23.2     34.3     49.7
    seq2tree+copy              16.8    23.0     38.1     52.4
    SLM (this work)            18.0    24.8     39.1     55.3

tree: a.b > 1 and c.d > 2 both count as NAME.NAME > INT.
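The difference between acc@k and tree@k can be sketched by abstracting leaf values before comparing: identifiers become NAME and integer literals become INT, so a.b > 1 and c.d > 2 compare equal. The token-level regular expressions below are only an approximation of a real tree comparison, and the class and method names are assumptions made for illustration.

```java
import java.util.List;

public class TreeMatch {
    /** Replaces identifiers with NAME and integer literals with INT. */
    public static String abstractLeaves(String expr) {
        return expr.replaceAll("\\b[A-Za-z_]\\w*\\b", "NAME")
                   .replaceAll("\\b\\d+\\b", "INT");
    }

    /** acc@k: the target appears verbatim among the top-k predictions. */
    public static boolean exactAtK(List<String> ranked, String target, int k) {
        return ranked.subList(0, Math.min(k, ranked.size())).contains(target);
    }

    /** tree@k: some top-k prediction equals the target once leaf values are abstracted. */
    public static boolean treeAtK(List<String> ranked, String target, int k) {
        String wanted = abstractLeaves(target);
        for (String prediction : ranked.subList(0, Math.min(k, ranked.size()))) {
            if (abstractLeaves(prediction).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(abstractLeaves("a.b > 1")); // prints NAME.NAME > INT
        List<String> ranked = List.of("c.d > 2", "a.b >= 1");
        System.out.println(exactAtK(ranked, "a.b > 1", 1)); // prints false
        System.out.println(treeAtK(ranked, "a.b > 1", 1));  // prints true
    }
}
```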

C# Results
[Bar chart: acc@1 and acc@5 for PHOG, GNN → NAG, seq2seq+copy, seq2tree+copy, and SLM (this work). Bar values in the extracted text: 45.5, 33.5, 27.0, 18.4, 7.6, 9.6, 37.9, 37.6, 35.9, 30.2, 24.6, 22.4, 11.2, 15.3, 27.1, 26.4, 22.3, 18.5, 15.2, 13.0, 12.0, 7.4]

Error Analysis
What kind of mistakes are responsible for the gap between acc@k and tree@k?
SLM (this work): acc@1 18.0, tree@1 39.1, acc@5 24.8, tree@5 55.3

Error Analysis
What kind of mistakes are responsible for the gap between acc@k and tree@k?
• 74%: single-token mismatch
• 30%: single-subtoken mismatch
[Figure: the SLM bars (acc@1 18.0, tree@1 39.1, acc@5 24.8, tree@5 55.3) with the gap split into single subtoken 30% and single token 44%, 74% in total]

Error Analysis

    public float getProgress() {
        this.readLock.lock();
        try {
            if (this.currentAttempt != null) {
                return this.currentAttempt.getProgress();
            }
            return 0;
        } finally {
            this.readLock.unlock();
        }
    }

    Generated                                    Exact-match   Tree-match   Compiles
    (31.3%) this.currentAttempt.getCount()       ✘             ✔            ✘
    (30.6%) -1f                                  ✘             ✘            ✔
    (1.5%)  this.currentAttempt.get()            ✘             ✔            ✘
    (1.2%)  this.currentAttempt.getTime()        ✘             ✔            ✘
    (0.9%)  this.currentAttempt.getProgress()    ✔             ✔            ✔

Structural Language Models of Code
Key points:
1. Predicting a missing subtree in a tree
2. A structural language model over trees: $\Pr(\mathcal{A}) = \prod_{t=0}^{n} \Pr(a_t \mid a_{<t})$
3. A partial AST as a set of paths
[Figure: the generated tree of x > 1: Method Root, IfExpr, Greater, Name x, IntExp 1]

http://AnyCodeGen.org
urialon@cs.technion.ac.il
