Any-Code Completion


SLIDE 1

Any-Code Completion

public static Path[] stat2Paths(FileStatus[] stats) {
    if (stats == null)
        return null;
    Path[] ret = new Path[stats.length];
    for (int i = 0; i < stats.length; ++i) {
        ret[i] = stats[i].getPath();
    }
    return ret;
}

Generated: (Java)
  stats[i].getPath()            (25.2%)
  new Path(stats[i])            (3.3%)
  new Path(stats[i], charset)   (2.5%)

SLIDE 2

Overview: a Structural Language Model

stats[i].getPath()

[Figure: AST of stats[i].getPath() with nodes MethodCall, ArrayAccess, Name, Name, Name and leaves stats, i, get, path]

SLIDE 3

http://AnyCodeGen.org

SLIDE 4

Structural Language Models of Code

ICML’2020

Uri Alon (Technion)
Eran Yahav (Technion)
Omer Levy (Tel-Aviv University, Facebook AI Research)
Roy Sadaka (Technion)

SLIDE 5

Language modeling of code

  • Code completion
  • Validate existing code, detect unlikely code.

public static Path[] stat2Paths(FileStatus[] stats) {
    if (stats == null)
        return null;
    Path[] ret = new Path[stats.size()];
    for (int i = 0; i < stats.length; ++i) {
        ret[i] = stats[i].getPath();
    }
    return ret;
}

(Here stats.size() appears to be the unlikely code to detect: a Java array exposes length, not size().)

SLIDE 6

Key Idea #1: predict a missing subtree

Instead of representing the task as “predict a missing sentence in a text”, represent the task as “predict a missing subtree in a tree”.

Learn syntactic patterns instead of sequential patterns.

SLIDE 7

Abstract Syntax Tree

Any valid code snippet can be parsed into an Abstract Syntax Tree (AST). The AST is composed of nodes, with user-defined values in its leaves.

stats[i].getPath()

[Figure: AST of stats[i].getPath() with nodes MethodCall, ArrayAccess, Name, Name, Name and leaves stats, i, get, path]
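
As a rough illustration of the structure described above, the following Java sketch builds a tree for stats[i].getPath() by hand and reads its leaves back. The Node class, the method names, and the exact parent/child layout are assumptions made for this example; the slide only lists the node labels and leaf values, and a real parser would produce a richer tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AstSketch {
    // Hypothetical minimal AST node: a label plus ordered children.
    // Leaves (no children) carry user-defined values such as identifier subtokens.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Assumed layout of the slide's tree for stats[i].getPath().
    static Node statsGetPath() {
        Node array = new Node("ArrayAccess",
                new Node("Name", new Node("stats")),
                new Node("Name", new Node("i")));
        Node method = new Node("Name", new Node("get"), new Node("path"));
        return new Node("MethodCall", array, method);
    }

    // Collects the leaf values from left to right.
    static List<String> leaves(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.label);
            return;
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(leaves(statsGetPath())); // prints [stats, i, get, path]
    }
}
```

The inner nodes come from the grammar, while the leaves hold the names the programmer chose, which is the split the slide describes.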

SLIDE 8

Key Idea #2: a structural language model (SLM)

In a natural-language model:

$$\Pr(Y) = \Pr(y_1, y_2, \ldots, y_n) = \prod_{t=1}^{n} \Pr(y_t \mid y_{<t})$$

But how can we compute the probability of a tree?

SLIDE 9

Key Idea #2: a structural language model (SLM)

Given a tree $\mathcal{A}$ (can be an arbitrary graph), induce an ordering over its nodes: $a_0, a_1, \ldots, a_n \in \mathcal{A}$ (in practice: DFS).

A structural language model (SLM) computes the probability of the tree $\mathcal{A}$:

$$\Pr(\mathcal{A}) = \prod_{t=0}^{n} \Pr(a_t \mid a_{<t})$$

But how can we represent the partial tree $a_{<t}$ when computing $\Pr(a_t \mid a_{<t})$?
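
The factorization above can be sketched in Java as follows: order the nodes by a pre-order DFS, then sum the log of each conditional probability. The Node class and the NodeModel interface are hypothetical stand-ins for a trained model, and the "partial tree" is simplified here to the flat list of previously visited labels, which is not how the talk represents it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SlmChainRuleSketch {
    // Hypothetical minimal tree node.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Hypothetical stand-in for a trained model: returns Pr(a_t | a_<t).
    interface NodeModel {
        double prob(List<String> previous, String next);
    }

    // Pre-order DFS: a node is emitted before its children, left to right.
    static List<String> dfsOrder(Node root) {
        List<String> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    private static void visit(Node node, List<String> order) {
        order.add(node.label);
        for (Node child : node.children) {
            visit(child, order);
        }
    }

    // log Pr(A) = sum over t of log Pr(a_t | a_<t), with t starting at 0.
    static double logProb(Node root, NodeModel model) {
        List<String> order = dfsOrder(root);
        double total = 0.0;
        for (int t = 0; t < order.size(); t++) {
            total += Math.log(model.prob(order.subList(0, t), order.get(t)));
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed tree for "x > 1"; the node labels are illustrative.
        Node tree = new Node("Greater",
                new Node("Name", new Node("x")),
                new Node("IntExpr", new Node("1")));
        System.out.println(dfsOrder(tree)); // prints [Greater, Name, x, IntExpr, 1]
        System.out.println(logProb(tree, (previous, next) -> 0.5));
    }
}
```

Working in log space avoids underflow when the product runs over many nodes.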

SLIDE 10

The fundamental tradeoff in code representation

[Figure: code representations placed between learning effort and analysis effort, including surface text (token stream), AST paths, and handcrafted features]

  • Requires expertise, language-specific, task-specific model
  • Implicitly re-learn syntactic & semantic regularities
  • Sweet spot: model size, data, time…

[“code2vec”, POPL’2019] [“A General Path-based Representation …”, PLDI’2018]

SLIDE 11

Key Idea #3: a partial tree as AST paths

We compute the probability of a node $a_t$, $\Pr(a_t \mid a_{<t})$, by considering the paths in the Abstract Syntax Tree (AST) from all leaves into $a_t$.

[Figure: partial AST with nodes IfExpr, Method, Root; the node to predict is marked ?]
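
One way to picture "paths from all leaves into the target node" is the Java sketch below. It walks from a leaf up to the lowest common ancestor and back down to the target, using parent pointers. The Node class, the path helper, and the example tree are assumptions for illustration only; the talk's model encodes such paths with a neural encoder rather than listing labels.

```java
import java.util.ArrayList;
import java.util.List;

final class AstPathSketch {
    // Hypothetical tree node with a parent pointer.
    static final class Node {
        final String label;
        final Node parent;

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
        }
    }

    // Nodes from n up to the root, inclusive.
    static List<Node> ancestors(Node n) {
        List<Node> up = new ArrayList<>();
        for (Node cur = n; cur != null; cur = cur.parent) {
            up.add(cur);
        }
        return up;
    }

    // Labels on the simple path from `from`, up to the lowest common
    // ancestor, and down to `to`. Assumes both nodes are in the same tree.
    static List<String> path(Node from, Node to) {
        List<Node> up = ancestors(from);
        List<Node> down = ancestors(to);
        int i = 0;
        while (!down.contains(up.get(i))) {
            i++;
        }
        List<String> labels = new ArrayList<>();
        for (int k = 0; k <= i; k++) {
            labels.add(up.get(k).label);
        }
        for (int k = down.indexOf(up.get(i)) - 1; k >= 0; k--) {
            labels.add(down.get(k).label);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Assumed partial tree while generating "x > 1".
        Node root = new Node("Root", null);
        Node method = new Node("Method", root);
        Node ifExpr = new Node("IfExpr", method);
        Node greater = new Node("Greater", ifExpr);
        Node name = new Node("Name", greater);
        Node x = new Node("x", name);
        Node target = new Node("?", greater);

        System.out.println(path(x, target));    // prints [x, Name, Greater, ?]
        System.out.println(path(root, target)); // prints [Root, Method, IfExpr, Greater, ?]
    }
}
```

The second call shows the path from the root into the target, which the model slide later uses as the attention query.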

SLIDE 12

[Figure: partial AST with nodes IfExpr, Method, Root; the node to predict is marked ?]

SLIDE 13

AST Paths

AST paths are simple paths over nodes in the AST. In previous works, we used AST paths to read code [“code2seq”, ICLR’2019]. In this work (SLM), we generate code by predicting the next node in a set of AST paths.

[Figure: partial AST with nodes IfExpr, Method, Root; the node to predict is marked ?]

SLIDE 14

AST Paths capture long-range interactions

public static Path[] stat2Paths(FileStatus[] stats) {
    if (stats == null)
        return null;
    Path[] ret = new Path[stats.length];
    for (int i = 0; i < stats.length; ++i) {
        ret[i] = stats[i].getPath();
    }
    return ret;
}

SLIDE 15

Model

  • Any sequential encoder to encode each arbitrary-length path into a fixed-length vector separately (e.g., LSTM, transformer encoder)
  • Any contextualizer to let all paths interact (e.g., transformer encoder)
  • Attend to the contextualized paths using the root path as the query

[Figure: partial AST with nodes IfExpr, Method, Root; the node to predict is marked ?]
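
The last step in the list above, attending to the path vectors with the root path as the query, can be sketched with plain dot-product attention in Java. This is a simplification under stated assumptions: the vectors are hand-written instead of produced by an encoder and contextualizer, no learned projections are applied, and the class and method names are made up for the example.

```java
final class PathAttentionSketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Numerically stable softmax over raw scores.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        double total = 0.0;
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            total += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    // Attention weights of the root-path query over the path vectors.
    static double[] weights(double[] rootPath, double[][] paths) {
        double[] scores = new double[paths.length];
        for (int i = 0; i < paths.length; i++) {
            scores[i] = dot(rootPath, paths[i]);
        }
        return softmax(scores);
    }

    // Weighted sum of the path vectors; assumes at least one path.
    static double[] attend(double[] rootPath, double[][] paths) {
        double[] w = weights(rootPath, paths);
        double[] context = new double[paths[0].length];
        for (int i = 0; i < paths.length; i++) {
            for (int j = 0; j < context.length; j++) {
                context[j] += w[i] * paths[i][j];
            }
        }
        return context;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        double[][] paths = {{2.0, 0.0}, {0.0, 2.0}};
        // The first path is more similar to the query, so it gets more weight.
        System.out.println(java.util.Arrays.toString(weights(query, paths)));
    }
}
```

The resulting context vector would then feed whatever layer predicts the next node.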

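SLIDE 16

Model

[Figure: model pipeline over the partial AST (nodes IfExpr, Method, Root; target ?): encode paths, contextualize, attend (query, context), predict node. Predicted node in the example: Greater]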

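SLIDE 17

Generate the Tree of: x > 1

[Figure: partial AST with nodes IfExpr, Method, Root; next node to predict: ?]

SLIDE 18

Generate the Tree of: x > 1

[Figure: partial AST with nodes Greater, IfExpr, Method, Root; next node to predict: ?]

SLIDE 19

Generate the Tree of: x > 1

[Figure: partial AST with nodes Greater, Name, IfExpr, Method, Root; next node to predict: ?]

SLIDE 20

Generate the Tree of: x > 1

[Figure: partial AST with nodes Greater, Name, IfExpr, Method, Root and leaf x; next node to predict: ?]

SLIDE 21

Generate the Tree of: x > 1

[Figure: partial AST with nodes Greater, Name, IntExpr, IfExpr, Method, Root and leaf x; next node to predict: ?]

SLIDE 22

Generate the Tree of: x > 1

[Figure: completed AST with nodes Greater, Name, IntExpr, IfExpr, Method, Root and leaves x, 1, spelling out x > 1]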

SLIDE 23

Copy Mechanism

myNewFoo = myObj.getFoo();
myNewFoo.setFooId(id);

[Figure: sources for each predicted token: vocabulary, full token copy, subtoken copy]
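
To make the subtoken granularity concrete, the Java sketch below splits camelCase identifiers into subtokens and checks which subtokens of a target name already occur in the context, and so could in principle be copied rather than generated from the vocabulary. The splitting rule and the helper names are assumptions for this example; the actual copy mechanism is learned by the model.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SubtokenSketch {
    // Splits a camelCase identifier into lowercase subtokens,
    // breaking before each uppercase letter that follows a lowercase letter or digit.
    static List<String> split(String identifier) {
        List<String> parts = new ArrayList<>();
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            parts.add(part.toLowerCase(Locale.ROOT));
        }
        return parts;
    }

    // Subtokens of `target` that also occur in some context identifier.
    static List<String> copyable(String target, List<String> context) {
        Set<String> seen = new HashSet<>();
        for (String id : context) {
            seen.addAll(split(id));
        }
        List<String> out = new ArrayList<>();
        for (String sub : split(target)) {
            if (seen.contains(sub)) {
                out.add(sub);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> context = java.util.Arrays.asList("myNewFoo", "myObj", "getFoo", "id");
        System.out.println(split("myNewFoo"));             // prints [my, new, foo]
        System.out.println(copyable("setFooId", context)); // prints [foo, id]
    }
}
```

In this example "set" is the only subtoken of setFooId that does not appear in the context, so it would have to come from the vocabulary.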

SLIDE 24

Example - Java

public static Path[] stat2Paths(FileStatus[] stats) {
    if (stats == null)
        return null;
    Path[] ret = new Path[stats.length];
    for (int i = 0; i < stats.length; ++i) {
        ret[i] = stats[i].getPath();
    }
    return ret;
}

Generated: (Java)
  stats[i].getPath()            (25.2%)
  new Path(stats[i])            (3.3%)
  new Path(stats[i], charset)   (2.5%)

SLIDE 25

Example - C#

public static string Camelize(this string input)
{
    var word = input.Pascalize();
    return word.Length > 0
        ? word.Substring(0, 1).ToLower() + word.Substring(1)
        : word;
}

Generated: (C#)
  word.Substring(0, 1)   (14.1%)
  word.trim()            (8.2%)
  word.Substring(1)      (5.8%)

SLIDE 26

Java Results (trained on 1.3M examples)

Model                     acc@1   acc@5   tree@1   tree@5
seq2prod                    8.1    11.8     30.8     41.7
seq2tree                   16.8    23.0     38.1     52.4
LSTMs+attn+copy            16.9    23.2     34.3     49.7
Transformer-small+copy     14.2    21.4     31.8     47.4
Transformer-base+copy      16.6    24.1     34.7     50.5
SLM (this work)            18.0    24.8     39.1     55.3

Tree match: a.b > 1 and c.d > 2 count as the same tree, since both have the form NAME.NAME > INT.

[Chart: one bar group per metric, with annotations that appear to give SLM's margin over each baseline, and model sizes labeled 15M, 45M, 12M, 45M]
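
The difference between exact match (acc@k) and tree match (tree@k) can be approximated with the Java sketch below. It is a token-level stand-in, not the evaluation code behind the reported numbers: it replaces every identifier with NAME and every integer literal with INT using regular expressions (so keywords also become NAME), whereas a faithful version would compare parsed ASTs. All names in it are made up for the example.

```java
import java.util.List;

final class TreeMatchSketch {
    static String stripSpaces(String expr) {
        return expr.replaceAll("\\s+", "");
    }

    // Abstracts leaf values away: identifiers become NAME, integers become INT.
    // Identifiers are replaced first so that the inserted INT is not rewritten.
    static String normalize(String expr) {
        String named = expr.replaceAll("[A-Za-z_][A-Za-z_0-9]*", "NAME");
        String typed = named.replaceAll("\\b\\d+\\b", "INT");
        return stripSpaces(typed);
    }

    static boolean treeMatch(String predicted, String gold) {
        return normalize(predicted).equals(normalize(gold));
    }

    // True if any of the top-k ranked predictions matches the gold expression,
    // either exactly (treeOnly = false) or up to leaf values (treeOnly = true).
    static boolean hitAtK(List<String> ranked, String gold, int k, boolean treeOnly) {
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            String p = ranked.get(i);
            boolean hit = treeOnly
                    ? treeMatch(p, gold)
                    : stripSpaces(p).equals(stripSpaces(gold));
            if (hit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(normalize("a.b > 1"));            // prints NAME.NAME>INT
        System.out.println(treeMatch("a.b > 1", "c.d > 2")); // prints true
    }
}
```

Averaging hitAtK over a test set with treeOnly set to false or true would give numbers in the spirit of acc@k and tree@k respectively.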

SLIDE 27

C# Results

Metrics: acc@1, acc@5

Models compared: SLM (this work), seq2seq +copy, seq2tree +copy, GNN →NAG, PHOG

SLM (this work): acc@1 37.6, acc@5 45.5

[Chart: baseline bars in extraction order, as acc@5 / acc@1 pairs: 35.9 / 22.3, 37.9 / 26.4, 27.1 / 15.2, 18.5 / 13.0, 12.0 / 7.4. Annotated margins of SLM over these baselines: acc@5 +9.6, +7.6, +18.4, +27.0, +33.5; acc@1 +15.3, +11.2, +22.4, +24.6, +30.2]

SLIDE 28

Error Analysis

SLM (this work): acc@1 18.0, acc@5 24.8, tree@1 39.1, tree@5 55.3

What kind of mistakes are responsible for the gap between acc@k and tree@k?

SLIDE 29

Error Analysis

SLM (this work): acc@1 18.0, acc@5 24.8, tree@1 39.1, tree@5 55.3

What kind of mistakes are responsible for the gap between acc@k and tree@k?

  • 74%: Single-token mismatch
  • 30%: Single-subtoken mismatch

[Chart: single token 74%, split into single token 44% and single subtoken 30%]

SLIDE 30

Error Analysis

public float getProgress() {
    this.readLock.lock();
    try {
        if (this.currentAttempt != null) {
            return this.currentAttempt.getProgress();
        }
        return 0;
    } finally {
        this.readLock.unlock();
    }
}

Generated                            Prob.    Exact-match   Tree-match   Compiles
this.currentAttempt.getCount()       31.3%    ✘             ✔            ✘
-1f                                  30.6%    ✘             ✘            ✔
this.currentAttempt.get()             1.5%    ✘             ✔            ✘
this.currentAttempt.getTime()         1.2%    ✘             ✔            ✘
this.currentAttempt.getProgress()     0.9%    ✔             ✔            ✔

SLIDE 31

Error Analysis

public float getProgress() {
    this.readLock.lock();
    try {
        if (this.currentAttempt != null) {
            return this.currentAttempt.getProgress();
        }
        return 0;
    } finally {
        this.readLock.unlock();
    }
}

SLIDE 32

http://AnyCodeGen.org

SLIDE 33

http://AnyCodeGen.org

SLIDE 34

Structural Language Models of Code

Key points:
  • 1. Predicting a missing subtree in a tree
  • 2. A structural language model over trees: $\Pr(\mathcal{A}) = \prod_{t=0}^{n} \Pr(a_t \mid a_{<t})$
  • 3. A partial AST as a set of paths

[Figure: completed AST of x > 1 with nodes Greater, Name, IntExpr, IfExpr, Method, Root and leaves x, 1]

http://AnyCodeGen.org
urialon@cs.technion.ac.il