Neural-Augmented Static Analysis of Android Communication
Jinman Zhao, Aws Albarghouthi, Vaibhav Rastogi, Somesh Jha, Damien Octeau University of Wisconsin-Madison, Google
Key Idea: Use machine learning to refine results from static …
Program & Property → Static Analyzer → Must True / Unsure... / Must False
The Unsure results contain False Positives; triaging them is a Ranking problem.
Must True + Must False → Train → Model → Predict → Likelihood ∈ [0, 1] for each Unsure result
In this work: Program & Property = Inter-Component Communication Links; Unsure = May Links
Must True Links + Must False Links → Train → Model → Predict → Likelihood ∈ [0, 1] for May Links
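The triage step can be pictured with a small Python sketch. It assumes the static analyzer has already labeled each link and that some trained model supplies a likelihood. The `Link` class, the verdict strings, and the hard-coded scores are hypothetical stand-ins, not part of the original system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    intent_id: str
    filter_id: str
    verdict: str  # hypothetical labels: "must_true", "may", "must_false"


def triage(links, predict):
    """Rank only the 'may' links by predicted likelihood, highest first."""
    may_links = [link for link in links if link.verdict == "may"]
    scored = [(predict(link), link) for link in may_links]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


links = [
    Link("i1", "f1", "must_true"),
    Link("i1", "f2", "may"),
    Link("i2", "f1", "may"),
    Link("i2", "f3", "must_false"),
]

# Made-up scores; a trained model would supply these in practice.
fake_scores = {("i1", "f2"): 0.2, ("i2", "f1"): 0.9}
ranked = triage(links, lambda link: fake_scores[(link.intent_id, link.filter_id)])

for likelihood, link in ranked:
    print(f"{link.intent_id} -> {link.filter_id}: {likelihood:.2f}")
```

Must links are left untouched because the static analyzer has already decided them; only the may links need a ranking.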
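[Figure: a restaurant app (Restaurant, 1234 Alice St., Orlando, FL) with a "Send a message" button sends an Intent ("I’d like to make a reservation ... (xxx) xxx-xxxx") to a Component w/ Filter, here a Malicious APP.]
Inter-Component Communication: the resolution logic matches an Intent against an Intent Filter.
[Figure: Code View. ICC link? Yes! Shows (part of) the resolution logic, then a (bigger part of) the resolution logic.]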
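As a rough illustration of the resolution logic, the sketch below implements only the action and category tests for implicit intents, following the Android developer documentation. It is a simplification under stated assumptions: data/MIME-type tests, permissions, and explicit intents are ignored, and the `Intent`, `Filter`, and `may_resolve` names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional


class Intent(NamedTuple):
    action: Optional[str]
    categories: FrozenSet[str]


class Filter(NamedTuple):
    actions: FrozenSet[str]
    categories: FrozenSet[str]


def action_test(intent: Intent, flt: Filter) -> bool:
    """A filter with no actions matches nothing; an intent with no
    action passes as long as the filter lists at least one action."""
    if not flt.actions:
        return False
    return intent.action is None or intent.action in flt.actions


def category_test(intent: Intent, flt: Filter) -> bool:
    """Every category in the intent must also appear in the filter."""
    return intent.categories <= flt.categories


def may_resolve(intent: Intent, flt: Filter) -> bool:
    return action_test(intent, flt) and category_test(intent, flt)


send = Intent("android.intent.action.SEND",
              frozenset({"android.intent.category.DEFAULT"}))
view = Intent("android.intent.action.VIEW", frozenset())
flt = Filter(frozenset({"android.intent.action.SEND"}),
             frozenset({"android.intent.category.DEFAULT",
                        "android.intent.category.BROWSABLE"}))

print(may_resolve(send, flt))
print(may_resolve(view, flt))
```

These tests are exact only when every string value is known. When static analysis recovers a value only partially, the tests cannot be decided, and the link has to be reported as a may link.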
PRIMO (Octeau et al., POPL’16): a probabilistic model that assigns probabilities to ICC links inferred by static analysis.
○ Laborious, error-prone, and requires expert domain knowledge.
○ Has difficulty keeping up with the constantly evolving Android system.
How can we triage may links with minimal expert domain knowledge? Neural networks.
How can we process inputs of complex data types in a systematic way? Type-directed encoder.
How do our models perform? Very good!
Are the models learning the right things? It seems so.
We are not trying to…
module
knowledge
We are trying to…
to construct NN
performance without expert knowledge
more automation
Part 1
LINN: An end-to-end encoder-and-classifier architecture.
Model: Intent → Encoder, Filter → Encoder; both encodings → Classifier → likelihood ∈ [0, 1]
Train on Must True Links and Must False Links; Predict on May Links.
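A toy version of the encoder-and-classifier pipeline is sketched below in plain Python. It is not the architecture from the talk. A fixed multi-hot encoder over a hypothetical three-word vocabulary stands in for the learned encoders, the classifier is logistic regression, and the elementwise-product feature is an added assumption so that a linear classifier can express "intent and filter agree".

```python
import math

VOCAB = ["SEND", "VIEW", "DEFAULT"]  # hypothetical vocabulary for this sketch


def encode(strings):
    """Stand-in encoder: a set of strings -> multi-hot vector over VOCAB."""
    return [1.0 if word in strings else 0.0 for word in VOCAB]


def features(intent, flt):
    """Classifier input: both encodings, their elementwise product, a bias."""
    e_i, e_f = encode(intent), encode(flt)
    return e_i + e_f + [a * b for a, b in zip(e_i, e_f)] + [1.0]


def predict(weights, intent, flt):
    """Likelihood in [0, 1] that the intent can reach the filter."""
    z = sum(w * x for w, x in zip(weights, features(intent, flt)))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Logistic regression fitted by plain SGD on the labeled must links."""
    weights = [0.0] * (3 * len(VOCAB) + 1)
    for _ in range(epochs):
        for intent, flt, label in examples:
            err = predict(weights, intent, flt) - label
            x = features(intent, flt)
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


# Must-true links are labeled 1, must-false links 0 (made-up examples).
must_links = [
    ({"SEND"}, {"SEND"}, 1),
    ({"SEND"}, {"VIEW"}, 0),
    ({"VIEW"}, {"VIEW"}, 1),
    ({"VIEW"}, {"SEND"}, 0),
]
weights = train(must_links)

# A may link is scored with the same model after training.
may_likelihood = predict(weights, {"SEND", "DEFAULT"}, {"SEND", "DEFAULT"})
print(round(may_likelihood, 3))
```

In an end-to-end model the encoders would be trained jointly with the classifier rather than fixed in advance.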
Part 2
Input Type
TDE: mapping type signature to neural network architecture.
TDE: Type signature → (Rules) → Neural network template → (Instantiation) → Neural network
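Type $T := \mathrm{tuple}(A, B)$; instance $t := (a, b)$ with $a : A$ and $b : B$.
$\mathrm{enc}_A : A \to \mathbb{R}^n$, with $\text{a-en} = \mathrm{enc}_A(a)$
$\mathrm{enc}_B : B \to \mathbb{R}^m$, with $\text{b-en} = \mathrm{enc}_B(b)$
$\mathrm{comb} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^l$
$\mathrm{enc}_T : T \to \mathbb{R}^l$, with $\text{t-en} = \mathrm{enc}_T(t) = \mathrm{comb}(\mathrm{enc}_A(a), \mathrm{enc}_B(b))$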
Rules for type-directed encoding
Type signatures
Intent       intent := tuple(act, cats)
Action       act := optional(string)
Categories   cats := set(string)
Filter       filter := tuple(acts, cats)
Actions      acts := set(string)
Categories   cats := set(string)
Type tree for intent: tuple(act, cats), where act is an optional string, cats is a set of strings, and each string is a list of chars.
Type signature → (Rules) → Neural network template, for intent:
tuple (intent)    → comb  → intent-en
optional (act)    → union → act-en
set (cats)        → aggr  → cats-en
list (string)     → flat  → str-en
char              → enum  → char-en
Instantiation: Neural network template → Neural network
template function   typed-tree   str-rnn
comb                TreeLSTM     concat
union               switch       switch
aggr                TreeLSTM     max
flat                CNN          RNN
enum                lookup       lookup
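The path from type signature to encoder can be sketched as a recursive builder. This is a minimal, untrained illustration: the tagged-tuple representation of type signatures and all function names are hypothetical. The instantiation loosely mirrors str-rnn (concat, switch, max, lookup), with a simple mean standing in for the RNN and a deterministic formula standing in for a learned lookup table.

```python
D = 4  # width of every leaf encoding (arbitrary for this sketch)


def lookup(ch):
    """enum: map a character to a fixed pseudo-embedding in R^D."""
    code = ord(ch)
    return [((code * (k + 3)) % 17) / 17.0 for k in range(D)]


def mean(vectors):
    """flat: average the element encodings (stand-in for an RNN or CNN)."""
    if not vectors:
        return [0.0] * D
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def elementwise_max(vectors):
    """aggr: order-independent max over the element encodings."""
    if not vectors:
        return [0.0] * D
    return [max(col) for col in zip(*vectors)]


def switch(encoded):
    """union: use the value's encoding, or a default vector when absent."""
    return encoded if encoded is not None else [0.0] * D


def concat(left, right):
    """comb: join the two component encodings."""
    return left + right


INSTANTIATION = {"comb": concat, "union": switch,
                 "aggr": elementwise_max, "flat": mean, "enum": lookup}


def build_encoder(sig, inst):
    """Apply one rule per type constructor, recursing into sub-types."""
    kind = sig[0]
    if kind == "char":
        return inst["enum"]
    if kind == "list":
        elem = build_encoder(sig[1], inst)
        return lambda xs: inst["flat"]([elem(x) for x in xs])
    if kind == "set":
        elem = build_encoder(sig[1], inst)
        return lambda xs: inst["aggr"]([elem(x) for x in xs])
    if kind == "optional":
        inner = build_encoder(sig[1], inst)
        return lambda x: inst["union"](None if x is None else inner(x))
    if kind == "tuple":
        enc_a = build_encoder(sig[1], inst)
        enc_b = build_encoder(sig[2], inst)
        return lambda t: inst["comb"](enc_a(t[0]), enc_b(t[1]))
    raise ValueError(f"unknown type constructor: {kind}")


STRING = ("list", ("char",))
INTENT = ("tuple", ("optional", STRING), ("set", STRING))

encode_intent = build_encoder(INTENT, INSTANTIATION)
vec = encode_intent(("android.intent.action.VIEW",
                     {"android.intent.category.DEFAULT"}))
print(len(vec))
```

Swapping the entries of `INSTANTIATION` changes the resulting network without touching the type signature, which is the separation the template is meant to provide.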
               # pairs   # positive   # negative
training set   105,108   63,168       41,940
testing set    43,680    29,260       14,420
Our best model (typed-tree) closes the correlation gap by 72% compared to PRIMO, despite the harder setting.
Correlation
ROC (left) and the distribution of predicted likelihood (right) from typed-tree model.
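For readers unfamiliar with ROC curves, the sketch below shows how one can be computed from predicted likelihoods by sweeping the decision threshold, with the area under it obtained by the trapezoidal rule. The labels and scores are made up for illustration and are not results from the talk.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the threshold downward."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up ground truth (1 = real link) and predicted likelihoods.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]

points = roc_points(labels, scores)
print(round(auc(points), 4))  # 0.8889, i.e. 8 of 9 positive/negative pairs ordered correctly
```

An area of 1.0 would mean every true link is scored above every false one; 0.5 corresponds to random ordering.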
Distribution Correlation
Picking distinctive values Ignoring less useful parts
Semantically closer values receive more similar encodings.
[Figure: encodings visualized by t-SNE; labeled values include default, (.*), and None.]