Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav
A General Path-Based Representation for Predicting Program Properties
Technion
University of Washington
Motivating Example #1: Prediction of Variable Names in Python
from subprocess import Popen, PIPE, CalledProcessError

def sh3(c):
    p = Popen(c, stdout=PIPE, stderr=PIPE, shell=True)
    o, e = p.communicate()
    r = p.returncode
    if r:
        raise CalledProcessError(r, c)
    else:
        return o.rstrip(), e.rstrip()

def sh3(cmd):
    process = Popen(cmd, stdout=PIPE, stderr=PIPE, shell=True)
    out, err = process.communicate()
    retcode = process.returncode
    if retcode:
        raise CalledProcessError(retcode, cmd)
    else:
        return out.rstrip(), err.rstrip()
function _______(object) {
  if (!object) return object;
  var clone = {};
  for (var key in object) {
    clone[key] = object[key];
  }
  return clone;
}

function cloneObject(object) {
  if (!object) return object;
  var clone = {};
  for (var key in object) {
    clone[key] = object[key];
  }
  return clone;
}
StackOverflow answer:
Configuration conf = HBaseConfiguration.create();
try {
    Connection connection = ConnectionFactory.createConnection(conf);
}
import org.apache.hadoop.hbase.client.Connection; or com.mysql.jdbc.Connection?
Tasks × languages (Java, JavaScript, Python, C#, …):
▪ Variable name prediction: Bichsel et al. CCS’2016 (CRFs); Raychev et al. POPL’2015 (CRFs); Raychev et al. OOPSLA’2016 (Decision Trees); …
▪ Method name prediction: Allamanis et al. ICML’2016 (NNs); Raychev et al. OOPSLA’2016 (Decision Trees); …
▪ Full types prediction: …
▪ …: Raychev et al. PLDI’2014 (n-grams+RNNs); Bielik et al. ICML’2016 (PHOG); Raychev et al. OOPSLA’2016 (Decision Trees); Allamanis et al. ICML’2015 (Generative); …
while (!done) {
  if (someCondition()) {
    done = true;
  }
}

while (!count) {
  if (someCondition()) {
    count = true;
  }
}
while (!done) {
  if (someCondition()) {
    done = true;
  }
}

while (!x) {
  foo();
  if (bar() < 3) {
    log.info(zoo);
    x = true;
  }
}
[Figure: the paths extracted from each snippet, listed side by side (7 and 8 entries); one highlighted path appears in both lists.]
▪ Values, Values, Paths → ℝ
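To make the signature concrete, here is a minimal Python sketch of a function that maps one ⟨value, path, value⟩ triple to a real number. The `PathContext` type, the `WEIGHTS` table and its numbers are hypothetical illustrations, not the learned parameters or code of the original system; triples that were never seen simply score 0.

```python
from typing import Dict, NamedTuple


class PathContext(NamedTuple):
    """A <value, path, value> triple: two AST leaf values and the path between them."""
    source: str
    path: str
    target: str


# Hypothetical weights; a real model would learn these from a corpus.
WEIGHTS: Dict[PathContext, float] = {
    PathContext("done", "SymbolRef↑UnaryPrefix!↑While↓If↓Assign=↓SymbolRef", "done"): 2.5,
    PathContext("done", "SymbolRef↑Assign=↓True", "true"): 1.5,
}


def score(ctx: PathContext) -> float:
    """Map (Values, Values, Paths) to a real number; unseen triples score 0."""
    return WEIGHTS.get(ctx, 0.0)


print(score(PathContext("done", "SymbolRef↑Assign=↓True", "true")))   # 1.5
print(score(PathContext("count", "SymbolRef↑Assign=↓True", "true")))  # 0.0
```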
SymbolRef↑UnaryPrefix!↑While↓If↓Assign=↓SymbolRef
SymbolRef↑Call↑If↓Assign=↓SymbolRef
SymbolRef↑Assign=↓True
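The paths above can be reproduced with a short Python sketch that enumerates every pair of leaves and walks from the first leaf up to their lowest common ancestor, then down to the second leaf. It assumes a hand-built tree for `while (!done) { if (someCondition()) { done = true; } }` whose node kinds mimic the UglifyJS-style names on this slide, with block statements omitted; `Node`, `extract_path_contexts` and the other helpers are hypothetical names, not the authors' extractor.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str                     # node type, e.g. "SymbolRef" or "Assign="
    value: Optional[str] = None   # leaf value (identifier or literal); None for inner nodes
    children: List["Node"] = field(default_factory=list)


def leaves_with_ancestry(node: Node, ancestry: Tuple[Node, ...] = ()) -> List[Tuple[Node, ...]]:
    """For every leaf, left to right, return the chain of nodes from the root down to it."""
    chain = ancestry + (node,)
    if not node.children:
        return [chain]
    result: List[Tuple[Node, ...]] = []
    for child in node.children:
        result.extend(leaves_with_ancestry(child, chain))
    return result


def path_between(a: Tuple[Node, ...], b: Tuple[Node, ...]) -> str:
    """Join node kinds from leaf a up to the lowest common ancestor, then down to leaf b."""
    common = 0
    while common < min(len(a), len(b)) and a[common] is b[common]:
        common += 1
    top = a[common - 1]                           # lowest common ancestor
    up = [n.kind for n in reversed(a[common:])]   # leaf a up to (excluding) the ancestor
    down = [n.kind for n in b[common:]]           # below the ancestor down to leaf b
    return "↑".join(up + [top.kind]) + "".join("↓" + k for k in down)


def extract_path_contexts(root: Node) -> List[Tuple[str, str, str]]:
    """Return a (value, path, value) triple for every pair of leaves."""
    leaves = leaves_with_ancestry(root)
    return [(a[-1].value, path_between(a, b), b[-1].value) for a, b in combinations(leaves, 2)]


# Hand-built AST for: while (!done) { if (someCondition()) { done = true; } }
tree = Node("While", children=[
    Node("UnaryPrefix!", children=[Node("SymbolRef", "done")]),
    Node("If", children=[
        Node("Call", children=[Node("SymbolRef", "someCondition")]),
        Node("Assign=", children=[Node("SymbolRef", "done"), Node("True", "true")]),
    ]),
])

for ctx in extract_path_contexts(tree):
    print(ctx)
```

Besides the three paths listed, the sketch also emits the remaining leaf pairs (six path-contexts in total), for example the path from the first `done` to `someCondition`.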
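Embedding matrices: $W \in \mathbb{R}^{|vocab| \times d}$, whose row $j$ is the vector of the $j$-th vocabulary entry, and $C \in \mathbb{R}^{|vocab| \times d}$ for contexts.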
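One common way to use two such matrices, assumed here, is the skip-gram-with-negative-sampling score of word2vec: word j and a context c are treated as likely to co-occur when σ(w_j · c) is close to 1, where w_j is row j of W and c is the context's row of C. The Python sketch below uses tiny made-up 3-dimensional vectors and writes the context as a path string for brevity; it illustrates the scoring rule only, not training or the vocabulary used in the talk.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional embeddings (d = 3); real vectors are learned from a corpus.
W: Dict[str, List[float]] = {      # word embeddings: one row w_j per word j
    "done": [0.9, 0.1, 0.4],
    "count": [0.1, 0.8, -0.2],
}
C: Dict[str, List[float]] = {      # context embeddings: one row per context
    "SymbolRef↑Assign=↓True": [1.0, -0.5, 0.3],
}


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cooccurrence_prob(word: str, context: str) -> float:
    """Skip-gram-with-negative-sampling style score: sigmoid(w_j . c)."""
    return 1.0 / (1.0 + math.exp(-dot(W[word], C[context])))


ctx = "SymbolRef↑Assign=↓True"
best = max(W, key=lambda word: cooccurrence_prob(word, ctx))
print(best)  # done
```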
while ˽ ( ! done ) ˽ {
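A small Python sketch of this token view shows why surface text is noisy. In the two `while` snippets shown earlier, the two uses of the loop variable are 10 tokens apart in one and 23 in the other, so a short n-gram window cannot relate them, while the AST path between them is the same. The regex tokenizer is a crude assumption, not a real JavaScript lexer, and unlike the slide it drops whitespace (˽) tokens.

```python
import re
from typing import List, Tuple


def tokenize(code: str) -> List[str]:
    """Rough lexer (an assumption): words/numbers or single punctuation chars; whitespace dropped."""
    return re.findall(r"\w+|[^\w\s]", code)


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def occurrence_gap(tokens: List[str], name: str) -> int:
    """Token distance between the first and last occurrence of `name`."""
    positions = [i for i, t in enumerate(tokens) if t == name]
    return positions[-1] - positions[0]


a = tokenize("while (!done) { if (someCondition()) { done = true; } }")
b = tokenize("while (!x) { foo(); if (bar() < 3) { log.info(zoo); x = true; } }")
print(occurrence_gap(a, "done"), occurrence_gap(b, "x"))  # 10 23
print(ngrams(a, 3)[:3])
```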
[Chart: AST paths vs. CRFs + n-grams, UnuglifyJS and CRFs + No-relation; gains shown as Absolute (Relative%): +8.1 (16.2%), +7.3 (12.2%), +21.5 (61%)]
[Chart: further gains of AST paths, shown as Absolute (Relative%): +17.2 (74.1%), +19.8 (96.1%)]
▪ Path vocabulary size (JavaScript): max length 7 → 6: 13M → 11M; max width 3 → 2: 13M → 12M
SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef
…↑ While ↓…
▪ Path vocabulary size (Java): ~10⁷ → ~10²
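The effect of the length and width limits on the path vocabulary can be sketched in Python. The definitions are assumptions for illustration: length counts the arrows in a path, and width is the distance between the child positions at the path's top node. The observations are a toy sample (the last one is a hypothetical path from `f` to `b` in `f(a, b)`), not the data behind the vocabulary sizes reported above.

```python
from typing import Iterable, Set, Tuple


def path_length(path: str) -> int:
    """Assumed definition: number of edges (up or down arrows) in the path."""
    return path.count("↑") + path.count("↓")


def vocabulary(observed: Iterable[Tuple[str, int]], max_length: int, max_width: int) -> Set[str]:
    """Distinct paths that survive the limits; each observation pairs a path with its width."""
    return {p for p, w in observed if path_length(p) <= max_length and w <= max_width}


observed = [
    ("SymbolRef↑UnaryPrefix!↑While↓If↓Assign=↓SymbolRef", 1),  # length 5
    ("SymbolRef↑Call↑If↓Assign=↓SymbolRef", 1),               # length 4
    ("SymbolRef↑Assign=↓True", 1),                            # length 2
    ("SymbolRef↑Call↓SymbolRef", 2),  # hypothetical: f to b in f(a, b), child indices 0 and 2
]

print(len(vocabulary(observed, 7, 3)), len(vocabulary(observed, 4, 3)))  # 4 3
print(len(vocabulary(observed, 7, 1)), len(vocabulary(observed, 4, 1)))  # 3 2
```

Tightening either limit can only remove paths, which is why both reductions above shrink the vocabulary; the accuracy cost of doing so is the trade-off the next chart explores.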
[Chart: accuracy (%) vs. max path-length (3–7) for AST paths with max_width = 1, 2 and 3, compared with UnuglifyJS]
SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef
SymbolRef ↑ While ↓ SymbolRef
the relation between them
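Shortened paths such as `SymbolRef ↑ While ↓ SymbolRef` and `…↑ While ↓…` can be computed mechanically from a full path. In this Python sketch, `first_top_last` keeps the first node, the top node and the last node, and `top_only` keeps just the top node; both function names are hypothetical labels chosen for illustration. Because distinct full paths often collapse to the same abstraction, the path vocabulary shrinks accordingly.

```python
import re
from typing import List, Tuple


def split_path(path: str) -> Tuple[List[str], List[str]]:
    """Split 'A↑B↓C' into node kinds ['A', 'B', 'C'] and arrows ['↑', '↓'] (spaces ignored)."""
    parts = re.split(r"\s*([↑↓])\s*", path.strip())
    return parts[0::2], parts[1::2]


def top_index(arrows: List[str]) -> int:
    """All up-arrows precede the down-arrows, so the top node sits after the last '↑'."""
    return arrows.count("↑")


def first_top_last(path: str) -> str:
    nodes, arrows = split_path(path)
    return f"{nodes[0]}↑{nodes[top_index(arrows)]}↓{nodes[-1]}"


def top_only(path: str) -> str:
    nodes, arrows = split_path(path)
    return f"…↑{nodes[top_index(arrows)]}↓…"


full = "SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef"
print(first_top_last(full))  # SymbolRef↑While↓SymbolRef
print(top_only(full))        # …↑While↓…

paths = [
    "SymbolRef↑UnaryPrefix!↑While↓If↓Assign=↓SymbolRef",
    "SymbolRef↑UnaryPrefix!↑While↓If↓Call↓SymbolRef",
]
print(len(set(paths)), len({top_only(p) for p in paths}))  # 2 1
```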
function countSomething(x, t) {
  var c = 0;
  for (var i = 0, l = x.length; i < l; i++) {
    if (x[i] === t) {
      c++;
    }
  }
  return c;
}
function countSomething(array, target) {
  var count = 0;
  for (var i = 0, l = array.length; i < l; i++) {
    if (array[i] === target) {
      count++;
    }
  }
  return count;
}
public String sendGetRequest(String l) {
    HttpClient c = HttpClientBuilder.create().build();
    HttpGet r = new HttpGet(l);
    String u = USER_AGENT;
    r.addHeader("User-Agent", u);
    HttpResponse s = c.execute(r);
    HttpEntity t = s.getEntity();
    String g = EntityUtils.toString(t, "UTF-8");
    return g;
}
public String sendGetRequest(String url) {
    HttpClient client = HttpClientBuilder.create().build();
    HttpGet request = new HttpGet(url);
    String user = USER_AGENT;
    request.addHeader("User-Agent", user);
    HttpResponse response = client.execute(request);
    HttpEntity entity = response.getEntity();
    String result = EntityUtils.toString(entity, "UTF-8");
    return result;
}
Candidates:
1. done
2. ended
3. complete
4. found
5. finished
6. stop
7. end
8. success
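A ranked list like this one can be produced by scoring each candidate name against the path-contexts of the unknown variable. The Python sketch below sums hypothetical per-triple scores for each candidate independently; that is a deliberate simplification, since a CRF would infer all unknown names jointly, and the scores are invented for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (candidate name, path, value at the other end)

# Path-contexts of the unknown variable "?" in:
#   while (!?) { if (someCondition()) { ? = true; } }
# Each entry: (path starting at the unknown variable, value at the other end).
CONTEXTS: List[Tuple[str, str]] = [
    ("SymbolRef↑Assign=↓True", "true"),
    ("SymbolRef↑UnaryPrefix!↑While↓If↓Call↓SymbolRef", "someCondition"),
]

# Hypothetical learned scores; unseen triples score 0.
SCORES: Dict[Triple, float] = {
    ("done", "SymbolRef↑Assign=↓True", "true"): 2.0,
    ("ended", "SymbolRef↑Assign=↓True", "true"): 1.2,
    ("found", "SymbolRef↑Assign=↓True", "true"): 1.0,
    ("done", "SymbolRef↑UnaryPrefix!↑While↓If↓Call↓SymbolRef", "someCondition"): 0.8,
    ("ended", "SymbolRef↑UnaryPrefix!↑While↓If↓Call↓SymbolRef", "someCondition"): 0.5,
}

CANDIDATES = ["done", "ended", "complete", "found", "finished", "stop", "end", "success"]


def rank(candidates: List[str], contexts: List[Tuple[str, str]],
         scores: Dict[Triple, float]) -> List[str]:
    """Order candidates by their summed score over all path-contexts (ties keep input order)."""
    totals = {c: sum(scores.get((c, path, value), 0.0) for path, value in contexts)
              for c in candidates}
    return sorted(candidates, key=lambda c: totals[c], reverse=True)


print(rank(CANDIDATES, CONTEXTS, SCORES)[:3])  # ['done', 'ended', 'found']
```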
Language-specific, task-specific, require expertise
Implicitly re-learn syntactic & semantic regularities
Sweet-spot
▪ Surface text – too noisy
▪ Complex analyses are great, but specific to language and task
▪ AST paths – sweet spot of simplicity, expressivity and generalizability
▪ “Structural n-grams”
▪ A strong baseline for any machine learning for code task
Structural n-grams