A General Path-Based Representation for Predicting Program Properties
Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav
University of Washington, Technion

Motivating Example #1
Prediction of Variable Names in Python

With uninformative names:

    def sh3(c):
        p = Popen(c, stdout=PIPE, stderr=PIPE, shell=True)
        o, e = p.communicate()
        r = p.returncode
        if r:
            raise CalledProcessError(r, c)
        else:
            return o.rstrip(), e.rstrip()

With descriptive names:

    def sh3(cmd):
        process = Popen(cmd, stdout=PIPE, stderr=PIPE, shell=True)
        out, err = process.communicate()
        retcode = process.returncode
        if retcode:
            raise CalledProcessError(retcode, cmd)
        else:
            return out.rstrip(), err.rstrip()

Motivating Example #2
Prediction of Method Names in JavaScript

    function _______(object) {
        if (!object)
            return object;
        var clone = {};
        for (var key in object) {
            clone[key] = object[key];
        }
        return clone;
    }

The slide shows the same function a second time with the blank filled in as cloneObject.

Motivating Example #3
Prediction of full types in Java

StackOverflow answer:

    Configuration conf = HBaseConfiguration.create();
    try {
        Connection connection = ConnectionFactory.createConnection(conf);
    }

▪ com.mysql.jdbc.Connection ?
▪ org.apache.http.Connection ?

    import org.apache.hadoop.hbase.client.Connection;

Previously: separate techniques for each problem / language

Languages (columns): Java, JavaScript, Python, C#, …

▪ Variable name prediction: Bichsel et al., CCS’2016 (CRFs); Raychev et al., POPL’2015 (CRFs); Raychev et al., OOPSLA’2016 (Decision Trees); ..; ..
▪ Method name prediction: Allamanis et al., ICML’2016 (NNs); Raychev et al., OOPSLA’2016 (Decision Trees); ..; ..
▪ Full types prediction: ..; ..; ..; ..; ..
▪ …: Raychev et al., PLDI’2014 (n-grams+RNNs); Bielik et al., ICML’2016 (PHOG); Raychev et al., OOPSLA’2016 (Decision Trees); Allamanis et al., ICML’2015 (Generative); ..

Completely automatically!

How to represent a program element?

▪ Should work for many programming languages
▪ Should work for different tasks
▪ Useful in multiple learning algorithms

    while (!count) {
        if (someCondition()) {
            count = true;
        }
    }

    while (!done) {
        if (someCondition()) {
            done = true;
        }
    }

▪ What are the properties that make “done” a “done”?

How to represent a program element?

    while (!done) {
        if (someCondition()) {
            done = true;
        }
    }

Key idea:
▪ The semantic role of a program element is the set of all structured contexts in which it appears
▪ “done” is “done” because it appears in particular structured contexts

AST-paths

A general and simple method to represent code in machine learning models

    while (!done) {
        if (someCondition()) {
            done = true;
        }
    }

For example: (SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef, self)

done is represented as the set of all its paths
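The path on this slide can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the authors' extractor: it uses Python's standard `ast` module on a Python translation of the slide's loop, so the node names (`Name`, `UnaryOp`, `Assign`) differ from the JavaScript ones (`SymbolRef`, `UnaryPrefix!`, `Assign=`). The helpers `name_trails`, `path_between`, and `paths_for` are hypothetical names introduced here.

```python
import ast

# Python translation of the slide's loop (an assumption made for this sketch).
SOURCE = """
while not done:
    if some_condition():
        done = True
"""

def name_trails(tree):
    """Return, for every Name leaf, the list of nodes from the root down to it."""
    trails = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            trails.append(trail)
        for child in ast.iter_child_nodes(node):
            visit(child, trail)

    visit(tree, [])
    return trails

def path_between(trail_a, trail_b):
    """Render the path from leaf a up to the lowest common ancestor, then down to leaf b."""
    shared = 0
    while (shared < min(len(trail_a), len(trail_b))
           and trail_a[shared] is trail_b[shared]):
        shared += 1
    up = [type(n).__name__ for n in reversed(trail_a[shared - 1:])]
    down = [type(n).__name__ for n in trail_b[shared:]]
    return " ↑ ".join(up) + "".join(" ↓ " + name for name in down)

def paths_for(source, name):
    """Set of (path, other_end) pairs for every occurrence of `name`."""
    trails = name_trails(ast.parse(source))
    result = set()
    for a in trails:
        if a[-1].id != name:
            continue
        for b in trails:
            if b is not a:
                other = "self" if b[-1].id == name else b[-1].id
                result.add((path_between(a, b), other))
    return result

for path, other in sorted(paths_for(SOURCE, "done")):
    print(f"({path}, {other})")
```

Among the printed pairs is the Python analogue of the slide's example, a path that climbs from one occurrence of `done` to the `While` node and descends to the other occurrence.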

Example training & testing pipeline

Training:

    while (!done) {
        if (someCondition()) {
            done = true;
        }
    }

Testing:

    while (!x) {
        foo();
        if (bar() < 3) {
            log.info(zoo);
            x = true;
        }
    }

[Figure: each program line is shown next to its extracted paths (…↑ …↑ …); the paths on the lines done = true; and x = true; are highlighted, and the label done appears next to the test program.]

Advantages of AST-Paths representation

✓ Expressive enough to capture any property that is expressed syntactically
✓ Independent of the programming language
✓ Automatically extractable: only requires a parser
✓ Not bound to the learning algorithm
✓ Works for different tasks

Predicting program properties with AST paths

▪ Off-the-shelf algorithms
▪ Plug in our representation

Two learning algorithms:
▪ Conditional Random Fields (CRFs)
▪ word2vec-based

Predicting properties with CRFs

Paths labeling the edges of the example graph:

    SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef
    SymbolRef ↑ Call ↑ If ↓ Assign= ↓ SymbolRef
    SymbolRef ↑ Assign= ↓ True
    SymbolRef ↑ Assign= ↓ True

▪ Nodes: program elements
▪ Factors: learned scoring functions:
  ▪ $(\mathit{Values}, \mathit{Values}, \mathit{Paths}) \to \mathbb{R}$
▪ The same as in JSNice (Raychev et al., POPL’2015), but with our paths as factors
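To make the factor signature concrete, here is a minimal sketch of scoring a joint assignment with (value, value, path) factors. It is not JSNice's inference procedure: it swaps in exhaustive search over a tiny candidate list, and the weights are hand-picked toy numbers rather than learned ones. The names `map_assignment`, `WEIGHTS`, and the node labels `v`, `t`, `f` are hypothetical.

```python
from itertools import product

LOOP = "SymbolRef↑UnaryPrefix!↑While↓If↓Assign=↓SymbolRef"
CALL = "SymbolRef↑Call↑If↓Assign=↓SymbolRef"
TRUE = "SymbolRef↑Assign=↓True"

# Toy weights standing in for a learned scoring function
# (Values, Values, Paths) -> R. The numbers are illustrative only.
WEIGHTS = {
    ("done", "done", LOOP): 2.0,
    ("done", "True", TRUE): 1.5,
    ("someCondition", "done", CALL): 0.5,
    ("count", "True", TRUE): 0.2,
}

def map_assignment(unknown, known, factors, candidates, weights):
    """Brute-force search for the highest-scoring values of the unknown nodes."""
    best, best_score = None, float("-inf")
    for values in product(candidates, repeat=len(unknown)):
        assignment = dict(known)
        assignment.update(zip(unknown, values))
        # Sum the factor scores; unseen triples contribute zero.
        score = sum(
            weights.get((assignment[a], assignment[b], path), 0.0)
            for a, b, path in factors
        )
        if score > best_score:
            best, best_score = dict(zip(unknown, values)), score
    return best, best_score

# One unknown variable "v" connected to itself and to two known elements.
factors = [("v", "v", LOOP), ("v", "t", TRUE), ("f", "v", CALL)]
known = {"t": "True", "f": "someCondition"}
best, score = map_assignment(["v"], known, factors, ["done", "count", "i"], WEIGHTS)
print(best, score)
```

Exhaustive search is exponential in the number of unknown nodes, so it only serves to show what is being maximized.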

Predicting properties with word2vec

▪ Input: pairs of (word, context)
▪ Model: $W_{vocab}$, $C_{vocab}$
  ▪ word vectors: $W_{vocab}$
  ▪ context vectors: $C_{vocab}$
  ▪ each vector has dimension $d$
▪ Prediction:
  ▪ $\mathrm{predict}(c_1, \ldots, c_n) = \operatorname{argmax}_{w_i \in W_{vocab}} \; w_i \cdot \sum_j c_j$
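The prediction rule can be sketched in a few lines of plain Python: sum the context vectors, then pick the word whose vector has the largest dot product with that sum. The two-dimensional vectors and the context labels below are made-up toy values for illustration, not trained embeddings, and `predict` is a hypothetical helper name.

```python
def predict(contexts, word_vectors, context_vectors):
    """Return argmax over words w of  w . sum_j c_j  for the given contexts."""
    dim = len(next(iter(word_vectors.values())))
    summed = [0.0] * dim
    for c in contexts:
        for k, x in enumerate(context_vectors[c]):
            summed[k] += x

    def dot(v):
        return sum(a * b for a, b in zip(v, summed))

    return max(word_vectors, key=lambda w: dot(word_vectors[w]))

# Toy embeddings with d = 2 (illustrative numbers only).
W_vocab = {"done": [1.0, 0.0], "count": [0.0, 1.0]}
C_vocab = {
    "loop_condition_path": [0.9, 0.1],
    "assign_true_path": [0.8, 0.2],
    "increment_path": [0.1, 0.9],
}

print(predict(["loop_condition_path", "assign_true_path"], W_vocab, C_vocab))
print(predict(["increment_path"], W_vocab, C_vocab))
```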

Word2vec and different contexts

▪ Input: pairs of (word, context)
▪ Train word2vec with 3 types of contexts:
  ▪ Neighbor tokens, for example: while ˽ ( ! done ) ˽ {
  ▪ Surrounding AST-nodes
  ▪ AST paths
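As an illustration of the simplest of the three context types, the sketch below generates (word, context) pairs from neighbor tokens. The regular-expression tokenizer, the window size of 2, and the encoding of each context as offset plus token are all assumptions made for this example; the slides do not specify them.

```python
import re

def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]", code)

def neighbor_token_pairs(code, target, window=2):
    """Pair each occurrence of `target` with the tokens within `window` positions."""
    tokens = tokenize(code)
    pairs = []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # Context = signed offset plus neighbor token (assumed encoding).
                pairs.append((target, f"{j - i:+d}:{tokens[j]}"))
    return pairs

for pair in neighbor_token_pairs("while (!done) {", "done"):
    print(pair)
```

Pairs of this shape, or pairs whose context is an AST path instead of a token, are what a word2vec-style model would be trained on.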

Evaluation

▪ 4 programming languages
  ▪ Java, JavaScript, Python, C#
▪ 3 tasks
  ▪ predicting method names, variable names, full types (“...hbase.client.Connection”)
▪ 2 learning algorithms
  ▪ CRFs, word2vec-based

Predicting variable names with CRFs

[Bar chart: accuracy (%) of AST Paths vs. Baseline for Java, JavaScript, Python, and C#. Baseline labels: CRFs + n-grams, CRFs + No-relation, UnuglifyJS. Annotated improvements, in the format Absolute (Relative%): +7.3 (12.2%), +8.1 (16.2%), +21.5 (61%).]

Word2vec with different context types

Task: Variable names, word2vec, JavaScript

[Bar chart: accuracy (%) for three context types: AST paths, Surrounding nodes, Neighbor tokens. Annotated improvements, in the format Absolute (Relative%): +17.2 (74.1%), +19.8 (96.1%).]

Reducing the number of paths

▪ Limiting path-length and path-width
  ▪ Path vocabulary size (JavaScript):
    ▪ length: 7 → 6: 13M → 11M
    ▪ width: 3 → 2: 13M → 12M
▪ Path abstraction
  ▪ SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef becomes …↑ While ↓…
  ▪ Path vocabulary size (Java): ~$10^7$ → ~$10^2$
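Both reductions can be sketched on a handful of paths. The slides do not define length or width, so this example assumes that length counts edges and that width is a precomputed distance between child positions under the top node. `AstPath`, `render`, `abstract_top`, and `limit` are hypothetical names, and the fourth path (with `Binary<`) is invented so that two full paths collapse to the same abstraction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AstPath:
    nodes: tuple  # node types from one end of the path to the other
    top: int      # index in `nodes` of the topmost node
    width: int    # assumed: child-position distance under the top node

    @property
    def length(self):
        # Assumed definition: number of edges along the path.
        return len(self.nodes) - 1

def render(path):
    """Full path string, with arrows up to the top node and down after it."""
    parts = [path.nodes[0]]
    for i in range(1, len(path.nodes)):
        parts.append("↑" if i <= path.top else "↓")
        parts.append(path.nodes[i])
    return " ".join(parts)

def abstract_top(path):
    """Abstraction that keeps only the topmost node, as in '…↑ While ↓…'."""
    return f"…↑ {path.nodes[path.top]} ↓…"

def limit(paths, max_length, max_width):
    """Keep only the paths within the length and width limits."""
    return [p for p in paths if p.length <= max_length and p.width <= max_width]

paths = [
    AstPath(("SymbolRef", "UnaryPrefix!", "While", "If", "Assign=", "SymbolRef"), 2, 1),
    AstPath(("SymbolRef", "Call", "If", "Assign=", "SymbolRef"), 2, 1),
    AstPath(("SymbolRef", "Assign=", "True"), 1, 1),
    AstPath(("SymbolRef", "Binary<", "While", "If", "Assign=", "SymbolRef"), 2, 1),
]

print(len({render(p) for p in paths}))        # vocabulary of full paths
print(len({abstract_top(p) for p in paths}))  # vocabulary after abstraction
print([render(p) for p in limit(paths, max_length=4, max_width=3)])
```

Filtering discards long or wide paths outright, while abstraction maps many distinct paths onto one symbol; both shrink the path vocabulary the learner has to handle.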

Effect of limiting path length and width

Task: Variable names, CRFs, JavaScript

[Line chart: accuracy (%) vs. max path-length (3 to 7) for AST Paths with max_width=1, max_width=2, and max_width=3, compared with UnuglifyJS; the accuracy axis runs from 50 to 68.]

AST Path Abstractions

Task: Variable names, CRFs, Java

▪ SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef
▪ SymbolRef ↑ While ↓ SymbolRef
▪ only values, without considering the relation between them

Example (JavaScript)

    function countSomething(x, t) {
        var c = 0;
        for (var i = 0, l = x.length; i < l; i++) {
            if (x[i] === t) {
                c++;
            }
        }
        return c;
    }

Example (JavaScript)

    function countSomething(array, target) {
        var count = 0;
        for (var i = 0, l = array.length; i < l; i++) {
            if (array[i] === target) {
                count++
            }
        }
        return count;
    }

Example (Java)

    public String sendGetRequest(String l) {
        HttpClient c = HttpClientBuilder.create().build();
        HttpGet r = new HttpGet(l);
        String u = USER_AGENT;
        r.addHeader("User-Agent", u);
        HttpResponse s = c.execute(r);
        HttpEntity t = s.getEntity();
        String g = EntityUtils.toString(t, "UTF-8");
        return g;
    }

Example (Java)

    public String sendGetRequest(String url) {
        HttpClient client = HttpClientBuilder.create().build();
        HttpGet request = new HttpGet(url);
        String user = USER_AGENT;
        request.addHeader("User-Agent", user);
        HttpResponse response = client.execute(request);
        HttpEntity entity = response.getEntity();
        String result = EntityUtils.toString(entity, "UTF-8");
        return result;
    }

Semantic Similarity Between Names

CRFs, candidates:
1. done
2. ended
3. complete
4. found
5. finished
6. stop
7. end
8. success

More Semantic Similarities

▪ req ~ request
▪ count ~ counter ~ total
▪ element ~ elem ~ el
▪ array ~ arr ~ ary ~ list
▪ res ~ result ~ ret
▪ i ~ j ~ index
