1
Learning to Find Bugs (Work in progress)
Michael Pradel, TU Darmstadt
Joint work with Koushik Sen and Rohan Bavishi
2
Automated Bug Detection
Hundreds of bug detectors vs. thousands of bug patterns
One analysis for each bug pattern, e.g., Google’s Error Prone framework with 150+ different analyses
Existing bug detectors miss most bugs
3
Buggy code + correct code → train machine learning model → classifier
New code → classifier → buggy / okay
4
function setPoint(x, y) { ... }
var x_dim = 23;
var y_dim = 5;
setPoint(y_dim, x_dim);   // Bug: arguments passed in the wrong order
5
Find unusual and likely incorrect arguments by exploiting similarities of identifier names
2011: Finds incorrectly ordered, equally typed arguments; compares call sites of the same method
2013: Improved precision; effective for multiple languages (Java, C, C++)
2016: Applies to arbitrary arguments; heuristic pruning of false positives
2017: Default check in the Error Prone framework; found 2000+ new bugs
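A minimal sketch of a name-based check, to make the idea on this slide concrete. It is not the published analyses. Several details are assumptions chosen for illustration: the edit-distance similarity, the comparison of argument names against parameter names (one possible variant), the threshold of 0.1, and the function names. The check flags a call when swapping its first two arguments would make the argument names match the parameter names noticeably better.

```javascript
// Levenshtein edit distance, computed with the classic dynamic-programming table.
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return d[a.length][b.length];
}

// Similarity in [0, 1]: 1 for identical names, 0 for completely different ones.
function nameSimilarity(a, b) {
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return 1;
  return 1 - editDistance(a.toLowerCase(), b.toLowerCase()) / maxLen;
}

// Hypothetical check: flag the call if swapping the first two arguments raises the
// summed parameter/argument name similarity by more than `threshold`.
function looksSwapped(paramNames, argNames, threshold = 0.1) {
  const [p1, p2] = paramNames;
  const [a1, a2] = argNames;
  const asWritten = nameSimilarity(p1, a1) + nameSimilarity(p2, a2);
  const swapped = nameSimilarity(p1, a2) + nameSimilarity(p2, a1);
  return swapped - asWritten > threshold;
}

console.log(looksSwapped(["x", "y"], ["y_dim", "x_dim"])); // true
console.log(looksSwapped(["x", "y"], ["x_dim", "y_dim"])); // false
```

Hand-tuned choices like the similarity measure and the threshold are exactly what the following slides aim to replace with a learned model.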
6
Detect more bugs: special check for assertEquals calls
Reduce false positives: hard-coded method names that suggest that swapping is intended, e.g., transpose
8
Generating training data: visit every function call with ≥ 2 arguments
Positive example: original order of arguments, e.g., setPoint(x, y);
Negative example: swap the first two arguments, e.g., setPoint(y, x);
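The example generation on this slide can be sketched as below. The call-site representation `{ callee, args }` with plain identifier names and the function name `makeSwapExamples` are hypothetical. Labels use 1 for the original order (positive) and 0 for the swapped order (negative).

```javascript
// Hypothetical call-site representation: the callee name and the argument names.
// For each call with at least two arguments, emit the original order as a positive
// example (label 1) and the first two arguments swapped as a negative example (label 0).
function makeSwapExamples(calls) {
  const examples = [];
  for (const { callee, args } of calls) {
    if (args.length < 2) continue; // calls with fewer than two arguments are skipped
    const [first, second] = args;
    examples.push({ callee, arg1: first, arg2: second, label: 1 });
    examples.push({ callee, arg1: second, arg2: first, label: 0 });
  }
  return examples;
}

const calls = [
  { callee: "setPoint", args: ["x", "y"] },
  { callee: "log", args: ["message"] },
];
console.log(makeSwapExamples(calls));
```

This produces a balanced set of examples from existing code alone. The approach relies on the assumption that most existing calls pass their arguments in the right order.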
9
x similar to x_dim
x similar to width
list similar to seq
10
Word embeddings: a continuous vector representation for each word; similar words have similar vectors
"You shall know a word by the company it keeps" (J. R. Firth, 1957)
Context: surrounding words in sentences
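A common way to measure how similar two embedding vectors are is cosine similarity. The three-dimensional vectors below are made up for illustration. Real embeddings are learned and typically have many more dimensions.

```javascript
// Cosine similarity: dot product divided by the product of the vector lengths.
// Values near 1 mean the vectors point in a similar direction.
function cosineSimilarity(u, v) {
  let dot = 0;
  let normU = 0;
  let normV = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    normU += u[i] * u[i];
    normV += v[i] * v[i];
  }
  return dot / (Math.sqrt(normU) * Math.sqrt(normV));
}

// Made-up vectors for illustration only; they are not learned embeddings.
const toyEmbeddings = {
  list: [0.9, 0.1, 0.0],
  seq: [0.8, 0.2, 0.1],
  width: [0.0, 0.3, 0.9],
};

console.log(cosineSimilarity(toyEmbeddings.list, toyEmbeddings.seq).toFixed(2));   // 0.98
console.log(cosineSimilarity(toyEmbeddings.list, toyEmbeddings.width).toFixed(2)); // 0.03
```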
11
Context of an identifier: surrounding AST nodes, i.e., parent, grandparent, siblings, etc.
Extract node types, node contents, and relative positioning
12
window.setTimeout(callback, 1000);

CallExpr
  MemberExpr
    Identifier window
    Identifier setTimeout
  Arguments
    Identifier callback
    Literal 1000
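The sketch below extracts the context of an identifier from the AST of `window.setTimeout(callback, 1000);`. The tree is built by hand from plain objects of the form `{ type, value, children }`, an assumption that stands in for a real JavaScript parser. `contextOf` is a hypothetical helper. It records the parent and grandparent node types, the siblings, and the position within the parent.

```javascript
// Hypothetical minimal AST node: a type, an optional value, and child nodes.
const node = (type, value = null, children = []) => ({ type, value, children });

// AST of: window.setTimeout(callback, 1000);
const ast = node("CallExpr", null, [
  node("MemberExpr", null, [node("Identifier", "window"), node("Identifier", "setTimeout")]),
  node("Arguments", null, [node("Identifier", "callback"), node("Literal", 1000)]),
]);

// Describe a node by its type and, if present, its content.
const describe = (n) => (n.value === null ? n.type : `${n.type}:${n.value}`);

// Collect the context of every occurrence of the identifier `name`.
function contextOf(root, name) {
  const contexts = [];
  function walk(n, parent, grandparent) {
    if (n.type === "Identifier" && n.value === name) {
      contexts.push({
        parent: parent ? parent.type : null,
        grandparent: grandparent ? grandparent.type : null,
        siblings: parent ? parent.children.filter((c) => c !== n).map(describe) : [],
        position: parent ? parent.children.indexOf(n) : -1, // relative position in parent
      });
    }
    for (const child of n.children) walk(child, n, parent);
  }
  walk(root, null, null);
  return contexts;
}

console.log(contextOf(ast, "callback"));
```

For `callback`, this reports the parent `Arguments`, the grandparent `CallExpr`, the sibling `Literal:1000`, and position 0. Each such feature could then serve as one context item when training the embedding.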
13
Train a neural network to predict the context of an identifier
Input layer: identifier, encoded as a one-hot vector
Hidden layer: embedding vector
Output layer: context, encoded as a one-hot vector
Use the hidden layer as the representation for each identifier
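A compact sketch of this training setup, written without any ML library so that it runs as-is. The toy identifier/context pairs, the embedding size of 4, the learning rate of 0.1, and the epoch count are assumptions. A real implementation would typically use a framework and far more data. Multiplying a one-hot vector by the input weight matrix simply selects one row, so the code looks that row up directly.

```javascript
// Seeded pseudo-random number generator (mulberry32) for reproducible initialization.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((acc, e) => acc + e, 0);
  return exps.map((e) => e / sum);
}

// Train input weights (one embedding row per identifier) and output weights
// (embedding -> context scores) with stochastic gradient descent on cross-entropy.
function trainEmbeddings(pairs, numIds, numContexts, dim, epochs, lr, rand) {
  const init = () => rand() - 0.5;
  const wIn = Array.from({ length: numIds }, () => Array.from({ length: dim }, init));
  const wOut = Array.from({ length: dim }, () => Array.from({ length: numContexts }, init));

  // Forward pass: the one-hot input selects row `id` of wIn, which is the hidden layer.
  const predict = (id) => {
    const h = wIn[id];
    const scores = Array.from({ length: numContexts }, (_, c) =>
      h.reduce((acc, hk, k) => acc + hk * wOut[k][c], 0));
    return { h, p: softmax(scores) };
  };
  const meanLoss = () =>
    pairs.reduce((acc, [id, ctx]) => acc - Math.log(predict(id).p[ctx]), 0) / pairs.length;

  const initialLoss = meanLoss();
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const [id, ctx] of pairs) {
      const { h, p } = predict(id);
      // Gradient of cross-entropy w.r.t. the scores: predicted minus target distribution.
      const dScores = p.map((pc, c) => pc - (c === ctx ? 1 : 0));
      // Backpropagate to the hidden layer using the output weights before this update.
      const dH = h.map((_, k) => wOut[k].reduce((acc, w, c) => acc + w * dScores[c], 0));
      for (let k = 0; k < dim; k++) {
        for (let c = 0; c < numContexts; c++) wOut[k][c] -= lr * h[k] * dScores[c];
      }
      for (let k = 0; k < dim; k++) wIn[id][k] -= lr * dH[k];
    }
  }
  return { embeddings: wIn, initialLoss, finalLoss: meanLoss() };
}

// Toy data: (identifier index, context index) pairs, made up for illustration.
const identifiers = ["x", "x_dim", "list", "seq"];
const contexts = ["BinaryExpr", "Arguments", "ForStatement"];
const pairs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 2]];

const { embeddings, initialLoss, finalLoss } = trainEmbeddings(
  pairs, identifiers.length, contexts.length, 4, 300, 0.1, mulberry32(42));
console.log(`mean loss: ${initialLoss.toFixed(3)} -> ${finalLoss.toFixed(3)}`);
console.log("embedding of x:", embeddings[0]);
```

After training, the output layer is discarded. Only the rows of `wIn`, one per identifier, are kept as embeddings.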
14
Given: embeddings of the callee and of the two arguments
Train a neural network:
Input: callee, argument 1, and argument 2 embeddings, concatenated
Two hidden layers
Output: probability that the call is correct
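A sketch of the classifier's forward pass, under stated assumptions. Several details are choices made for illustration, not taken from the slide: the hidden-layer size of 8, the ReLU activations, the sigmoid output, and the random (untrained) weights. Training would adjust the weights, for example with binary cross-entropy on the positive and negative examples generated by swapping arguments.

```javascript
// Seeded pseudo-random number generator (mulberry32) for reproducible weights.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fully connected layer: returns a function computing W·x + b.
function denseLayer(inputSize, outputSize, rand) {
  const weights = Array.from({ length: outputSize }, () =>
    Array.from({ length: inputSize }, () => (rand() - 0.5) * 0.5));
  const bias = new Array(outputSize).fill(0);
  return (x) => weights.map((row, j) => row.reduce((acc, w, i) => acc + w * x[i], bias[j]));
}

const relu = (v) => v.map((z) => Math.max(0, z));
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Classifier: concatenated embeddings -> two hidden layers -> probability that the call is correct.
function makeClassifier(embeddingDim, hiddenSize, rand) {
  const hidden1 = denseLayer(3 * embeddingDim, hiddenSize, rand);
  const hidden2 = denseLayer(hiddenSize, hiddenSize, rand);
  const output = denseLayer(hiddenSize, 1, rand);
  return (callee, arg1, arg2) => {
    const input = [...callee, ...arg1, ...arg2]; // concatenation of the three embeddings
    const h1 = relu(hidden1(input));
    const h2 = relu(hidden2(h1));
    return sigmoid(output(h2)[0]);
  };
}

// Made-up 4-dimensional embeddings; the weights are random, so the output is not meaningful yet.
const classify = makeClassifier(4, 8, mulberry32(7));
const setPoint = [0.1, -0.3, 0.5, 0.2];
const xDim = [0.7, 0.1, -0.2, 0.0];
const yDim = [-0.1, 0.6, 0.3, 0.4];
console.log("P(correct) for setPoint(x_dim, y_dim):", classify(setPoint, xDim, yDim));
console.log("P(correct) for setPoint(y_dim, x_dim):", classify(setPoint, yDim, xDim));
```

Because the input is a concatenation, the order of the argument embeddings matters. This is what lets a trained model tell the two call orders apart.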
15
Assignments of incorrect values, e.g., var callback = function () { .. } (correct) vs. var callback = "abc" (likely buggy)
Incorrect binary operators, e.g., if (x == undefined) ... (correct) vs. if (x > undefined) ... (likely buggy)
Swapped operands of binary operations, e.g., bytes[i + 1] >> 4 (correct) vs. 4 >> bytes[i + 1] (likely buggy)
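Negative training examples for these bug kinds can be generated in the same spirit as for swapped arguments, by mutating code that is assumed to be correct. The sketch is illustrative only. The object shapes `{ left, op, right }` and `{ name, value }`, the operator list, and the helper names are hypothetical. Drawing replacement values from a pool of values seen elsewhere is also an assumption.

```javascript
// Operators a mutation may choose from (an assumed subset of JavaScript's binary operators).
const BINARY_OPERATORS = ["==", "!=", "===", "!==", "<", "<=", ">", ">=",
  "+", "-", "*", "/", "%", "<<", ">>", ">>>", "&", "|", "^"];

// Pick an element of a non-empty array using the given random source.
const pick = (items, rand) => items[Math.floor(rand() * items.length)];

// Incorrect assigned value: replace the right-hand side with a different value from the pool.
function withWrongValue(assignment, valuePool, rand) {
  const others = valuePool.filter((v) => v !== assignment.value);
  return { ...assignment, value: pick(others, rand) };
}

// Incorrect binary operator: replace the operator with a different one.
function withWrongOperator(expr, rand) {
  const others = BINARY_OPERATORS.filter((op) => op !== expr.op);
  return { ...expr, op: pick(others, rand) };
}

// Swapped operands: exchange the left and right operands.
function withSwappedOperands(expr) {
  return { left: expr.right, op: expr.op, right: expr.left };
}

const show = (e) => `${e.left} ${e.op} ${e.right}`;
const assignment = { name: "callback", value: "function () {}" };
const pool = ["function () {}", '"abc"', "0"];
console.log(withWrongValue(assignment, pool, Math.random));
console.log(show(withWrongOperator({ left: "x", op: "==", right: "undefined" }, Math.random)));
console.log(show(withSwappedOperands({ left: "bytes[i + 1]", op: ">>", right: "4" }))); // 4 >> bytes[i + 1]
```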
16
100,000 JavaScript files from various …
80,000 for training, 20,000 for validation
68 million lines of code
37.3 million occurrences of identifiers
10.1 million occurrences of literals
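One way to obtain an 80,000/20,000 split like this one is a seeded shuffle of the file list followed by a cut. Several details are assumptions for illustration: splitting by whole files, the seed, and the helper names. The slide does not say how files were assigned. Keeping each file entirely on one side means that examples from the same file never appear in both training and validation data.

```javascript
// Seeded pseudo-random number generator (mulberry32) so the split is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle the files (Fisher-Yates) and cut the list at the requested fraction.
function splitFiles(files, trainFraction, seed) {
  const rand = mulberry32(seed);
  const shuffled = [...files];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.round(shuffled.length * trainFraction);
  return { training: shuffled.slice(0, cut), validation: shuffled.slice(cut) };
}

// Hypothetical file names; with 100,000 files this yields 80,000 and 20,000.
const files = Array.from({ length: 100000 }, (_, i) => `file${i}.js`);
const { training, validation } = splitFiles(files, 0.8, 1);
console.log(training.length, validation.length); // 80000 20000
```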
17
// Callback must come before the
// number of milliseconds to wait
setTimeout(50, dojo.lang.hitch(this, function () { ... }));

// First argument must be smaller than
// the second argument
array.slice(3, 0);
18
Figure: Precision vs. recall for swapped arguments (AST embedding vs. random embedding)
Figure: Precision vs. recall for wrong operator in binary operations (AST embedding vs. random embedding)
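Each point on a precision-recall curve corresponds to one threshold on the classifier's output. Below is a minimal sketch of computing precision and recall at a given threshold. The prediction format, the toy predictions, and the convention that precision is 1 when nothing is reported are assumptions. If the classifier outputs the probability that code is correct, the probability of a bug is one minus that value.

```javascript
// Precision and recall when every prediction whose bug probability is at least
// `threshold` is reported as a warning.
function precisionRecall(predictions, threshold) {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const { bugProbability, isBug } of predictions) {
    const reported = bugProbability >= threshold;
    if (reported && isBug) truePositives++;
    else if (reported) falsePositives++;
    else if (isBug) falseNegatives++;
  }
  const reportedCount = truePositives + falsePositives;
  const bugCount = truePositives + falseNegatives;
  return {
    precision: reportedCount === 0 ? 1 : truePositives / reportedCount, // assumed convention
    recall: bugCount === 0 ? 1 : truePositives / bugCount,
  };
}

// Made-up predictions; bugProbability = 1 - P(correct) from the classifier.
const predictions = [
  { bugProbability: 0.9, isBug: true },
  { bugProbability: 0.8, isBug: false },
  { bugProbability: 0.7, isBug: true },
  { bugProbability: 0.2, isBug: true },
  { bugProbability: 0.1, isBug: false },
];

// Sweeping the threshold traces out a precision-recall curve.
for (const threshold of [0.85, 0.5, 0.15]) {
  console.log(threshold, precisionRecall(predictions, threshold));
}
```

Raising the threshold reports fewer warnings. On this toy data, precision rises from 0.75 to 1 while recall drops from 1 to 1/3, which is the trade-off the plots show.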
19
Same name, same meaning?
Learn bug patterns from version histories?
Train a model per bug pattern
20
Buggy code + correct code → train machine learning model → classifier