Neural-Augmented Static Analysis of Android Communication



SLIDE 1

Neural-Augmented Static Analysis of Android Communication

Jinman Zhao, Aws Albarghouthi, Vaibhav Rastogi, Somesh Jha, Damien Octeau
University of Wisconsin-Madison, Google

SLIDE 2

Key Idea

Use machine learning to refine results from static analysis.

SLIDE 3

Static Analysis: False Positives

[Diagram: a static analyzer takes a program and a property and answers Must True, Unsure, or Must False. The Unsure answers include false positives, so triaging them is a ranking problem.]

SLIDE 4

Machine Learning to Augment

[Diagram: the Must True and Must False answers from the static analyzer are used to train a model, which then predicts a likelihood ∈ [0, 1] for each Unsure answer.]

SLIDE 5

Link Inference for Android Communication

[Diagram: the same pipeline applied to inter-component communication links. The static analyzer splits candidate links into Must True, May, and Must False links; the Must True and Must False links train a model that predicts a likelihood ∈ [0, 1] for each May link.]
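The workflow on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' model: the two hand-picked features, the toy data, and the names `train_logistic`, `predict`, and `ranking` are assumptions made for the example. The talk itself replaces hand-picked features with a learned encoder.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Likelihood in [0, 1] that a candidate link is a real link."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_logistic(examples, epochs=500, lr=0.5):
    """Fit logistic regression by per-example gradient descent."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for features, label in examples:
            err = predict(weights, bias, features) - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias

# Hypothetical toy features: (action matches, categories are a subset).
# Must links come with labels from the static analyzer (1 = link, 0 = no link).
must_links = [((1.0, 1.0), 1), ((1.0, 1.0), 1),
              ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((0.0, 0.0), 0)]
# May links have no label; the model only ranks them.
may_links = {"may-A": (1.0, 1.0), "may-B": (0.0, 1.0), "may-C": (1.0, 0.0)}

weights, bias = train_logistic(must_links)
ranking = sorted(may_links,
                 key=lambda name: predict(weights, bias, may_links[name]),
                 reverse=True)
```

The output is a ranking rather than a hard yes/no decision, which matches the "ranking problem" framing: the most likely may links are inspected first.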

SLIDE 6

Task

Link Inference in Android Communication

SLIDE 7

Android ICC: A User’s Experience

[Diagram labels: Restaurant, 1234 Alice St., Orlando, FL; Send a message; Intent; Component w/ Filter; message "I’d like to make a reservation ... (xxx) xxx-xxxx"; Malicious app; Inter-Component Communication.]

SLIDE 8

Android ICC: An Example

[Diagram: code view of an intent and an intent filter. Is there an ICC link? Yes, according to (part of) the resolution logic.]

SLIDE 9

(Bigger part of) the resolution logic

(Octeau et al., POPL’16)
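The slide's figure is not preserved here, so as a rough illustration, a small part of Android's intent resolution can be sketched as below. It covers only the action and category tests for implicit intents, following the Android developer guide; the data (URI/MIME type) test, explicit intents, and permissions are left out. The class and function names (`Intent`, `Filter`, `resolves`) are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Intent:
    action: Optional[str]
    categories: FrozenSet[str]

@dataclass(frozen=True)
class Filter:
    actions: FrozenSet[str]
    categories: FrozenSet[str]

def action_test(intent: Intent, flt: Filter) -> bool:
    # A filter that lists no actions matches no implicit intent.
    if not flt.actions:
        return False
    # An intent without an action passes if the filter lists at least one.
    return intent.action is None or intent.action in flt.actions

def category_test(intent: Intent, flt: Filter) -> bool:
    # Every category on the intent must be declared by the filter.
    return intent.categories <= flt.categories

def resolves(intent: Intent, flt: Filter) -> bool:
    """True if the intent passes both simplified tests for this filter."""
    return action_test(intent, flt) and category_test(intent, flt)

send = Intent("android.intent.action.SEND",
              frozenset({"android.intent.category.DEFAULT"}))
share_filter = Filter(frozenset({"android.intent.action.SEND"}),
                      frozenset({"android.intent.category.DEFAULT",
                                 "android.intent.category.BROWSABLE"}))
view_filter = Filter(frozenset({"android.intent.action.VIEW"}),
                     frozenset({"android.intent.category.DEFAULT"}))
```

Here `resolves(send, share_filter)` holds while `resolves(send, view_filter)` does not. When static analysis cannot recover a field value precisely, these tests can no longer be decided, which is where may links come from.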

SLIDE 10

Previous Work: PRIMO

  • PRIMO (Octeau et al., POPL’16) uses a hand-crafted probabilistic model that assigns probabilities to ICC links inferred by static analysis.
      ○ Laborious, error-prone, and dependent on expert domain knowledge.
      ○ Difficult to keep up with the constantly evolving Android system.

SLIDE 11

Questions

SLIDE 12

#1

How can we triage may links with minimal expert domain knowledge? Neural networks.

SLIDE 13

#2

How can we process inputs of complex data types in a systematic way? Type-directed encoder.

SLIDE 14

#3

How do our models perform? Very well!

SLIDE 15

#4

Are the models learning the right things? It seems so.

SLIDE 16

We are not trying to…

  • Propose a new NN module
  • Eliminate use of domain knowledge
  • Rule out manual effort

We are trying to…

  • Propose a systematic way to construct NNs
  • Provide decent performance without expert knowledge
  • Use less labour with more automation

SLIDE 17

Approach

How can we triage may links with minimal expert domain knowledge?

Part 1

SLIDE 18

Link-Inference Neural Network

LINN: An end-to-end encoder-and-classifier architecture.

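[Diagram: the model encodes the intent and the filter with two encoders and feeds both encodings to a classifier that outputs a likelihood in [0, 1]. Must True and Must False links are used for training; the model then predicts on May links.]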

SLIDE 19

Approach

How can we process inputs of complex data types in a systematic way?

Part 2

SLIDE 20

[Diagram: the model again, with an intent encoder and a filter encoder feeding a classifier that outputs a value in [0, 1].]

SLIDE 21

Type-Directed Encoder

TDE: mapping type signature to neural network architecture.

[Diagram: input type ➝ type signature ➝ (TDE rules) ➝ neural network template ➝ (instantiation) ➝ neural network.]

SLIDE 22

An example: Encoding Product Types

Instance t := (a, b), Type T := tuple(A, B)

enc_A : A ➝ R^n maps a : A to a-en : R^n
enc_B : B ➝ R^m maps b : B to b-en : R^m
comb : R^n ⨉ R^m ➝ R^l
enc_T : T ➝ R^l maps t : T to t-en = comb(enc_A(a), enc_B(b)) : R^l
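As a minimal sketch of the product-type rule, the encoder for a tuple can be built from the encoders of its components plus a `comb` function. The concrete choices below are assumptions for illustration only: `comb` is a single tanh layer over the concatenated encodings, the weights are fixed by hand rather than learned, and the component encoders `enc_a` and `enc_b` are toy stand-ins.

```python
import math

def make_comb(weights, bias):
    """Build comb : R^n x R^m -> R^l as tanh(W [a-en; b-en] + bias)."""
    def comb(a_en, b_en):
        joined = list(a_en) + list(b_en)
        return [math.tanh(sum(w * x for w, x in zip(row, joined)) + b)
                for row, b in zip(weights, bias)]
    return comb

def make_tuple_encoder(enc_a, enc_b, comb):
    """Build enc_T for T = tuple(A, B) from enc_A, enc_B, and comb."""
    def enc_t(t):
        a, b = t
        return comb(enc_a(a), enc_b(b))
    return enc_t

# Toy component encoders (hypothetical): A = float with n = 2, B = bool with m = 1.
enc_a = lambda x: [x, -x]
enc_b = lambda flag: [1.0 if flag else 0.0]

# l = 2 output units, each with n + m = 3 input weights.
comb = make_comb(weights=[[0.5, 0.0, 1.0], [0.0, 0.5, -1.0]], bias=[0.0, 0.0])
enc_t = make_tuple_encoder(enc_a, enc_b, comb)
t_en = enc_t((1.0, True))
```

The point of the rule is compositional: `enc_t` needs to know nothing about A or B beyond the sizes of their encodings.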

SLIDE 23

Rules for type-directed encoding

SLIDE 24

Android ICC: Our Abstraction

Type signatures

Intent      intent := tuple(act, cats)
Action      act := optional(string)
Categories  cats := set(string)

Filter      filter := tuple(acts, cats)
Actions     acts := set(string)
Categories  cats := set(string)

[Diagram: type tree for intent. intent is a tuple of act and cats; act is an optional string; cats is a set of strings; each string is a list of chars.]

SLIDE 25

Type-Directed Encoder

[Diagram: the type signature tree for intent (tuple, optional, set, list, char) is mapped by the TDE rules to a neural network template: tuple ➝ comb, optional ➝ union, set ➝ aggr, list ➝ flat, char ➝ enum. The template produces char-en and str-en encodings, then act-en and cats-en, and finally intent-en.]
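The step from type signature to template can be sketched as a small recursive function. The nested-tuple representation of signatures and the names `RULES` and `template` are assumptions for illustration; the rule names (comb, union, aggr, flat, enum) are the ones shown on the slide.

```python
# Type signatures as nested tuples: (constructor, *arguments).
CHAR = ("char",)
STRING = ("list", CHAR)
INTENT = ("tuple", ("optional", STRING), ("set", STRING))

# Rule applied for each type constructor, using the names from the slide.
RULES = {
    "tuple": "comb",
    "optional": "union",
    "set": "aggr",
    "list": "flat",
    "char": "enum",
}

def template(signature):
    """Recursively map a type signature to a neural-network template tree."""
    constructor, *args = signature
    return (RULES[constructor], *[template(arg) for arg in args])

intent_template = template(INTENT)
```

For the intent signature this yields a `comb` node over a `union` branch (the optional action) and an `aggr` branch (the category set), each ending in `flat` over `enum`. The template is still abstract: it says which kind of module goes where, not which concrete network implements it.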

SLIDE 26

Type-Directed Encoder: Instantiation

[Diagram: instantiating the template as the typed-tree network: comb ➝ TreeLSTM, union ➝ switch, aggr ➝ TreeLSTM, flat ➝ CNN, enum ➝ lookup.]

SLIDE 27

Type-Directed Encoder: Instantiation

[Diagram: instantiating the template as the str-rnn network: comb ➝ concat, union ➝ switch, aggr ➝ max, flat ➝ RNN, enum ➝ lookup.]
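To make the str-rnn instantiation concrete, here is a rough, untrained sketch of an intent encoder that follows the same choices: lookup for characters, an RNN over character lists, max over sets, switch for the optional action, and concat for the tuple. Everything numeric is an assumption for illustration: the encoding size `DIM`, the sine-based embedding table, the diagonal recurrent update, and the all-zero vector for a missing action all stand in for learned parameters.

```python
import math

DIM = 4  # hypothetical encoding size

def lookup(ch):
    """enum: map a character to a fixed (untrained) embedding."""
    return [math.sin(ord(ch) * (k + 1)) for k in range(DIM)]

def rnn(chars):
    """flat: fold a character list with a simple diagonal recurrent update."""
    h = [0.0] * DIM
    for ch in chars:
        h = [math.tanh(0.5 * x + 0.5 * prev) for x, prev in zip(lookup(ch), h)]
    return h

def elementwise_max(strings):
    """aggr: order-independent max over the encodings of a set of strings."""
    encodings = [rnn(s) for s in strings]
    if not encodings:
        return [0.0] * DIM
    return [max(column) for column in zip(*encodings)]

NONE_ENCODING = [0.0] * DIM  # stand-in for a learned "absent" vector

def switch(optional_string):
    """union: pick the branch for an absent or present value."""
    return NONE_ENCODING if optional_string is None else rnn(optional_string)

def encode_intent(intent):
    """comb: concatenate the action encoding and the category-set encoding."""
    action, categories = intent
    return switch(action) + elementwise_max(categories)

intent_en = encode_intent(("android.intent.action.SEND",
                           frozenset({"android.intent.category.DEFAULT"})))
```

Swapping the body of any one function (for example replacing `rnn` with a CNN, or `encode_intent`'s concatenation with a tree-structured module) gives a different instantiation of the same template.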

SLIDE 28

A systematic way to build and explore structured NN.

SLIDE 29

Experiments

Are our models correctly predicting links?

SLIDE 30

Setup

                # pairs    # positive    # negative
training set    105,108    63,168        41,940
testing set      43,680    29,260        14,420

  • Dataset of 10,500 Android apps from Google Play.
  • IC3 (Octeau et al., ICSE’15) for static analysis.
  • PRIMO’s abstract matching for may/must partition.
  • Simulated ground truth for may links.
  • 4 instantiations of the TDE architecture.
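As a quick sanity check on the dataset table, the positive and negative counts add up to the pair counts, and the share of positive pairs can be computed directly. The snippet below only redoes that arithmetic; the variable names are illustrative.

```python
# (pairs, positive, negative) per split, copied from the setup table.
splits = {
    "training": (105_108, 63_168, 41_940),
    "testing": (43_680, 29_260, 14_420),
}

def positive_rate(pairs, positive, negative):
    """Share of positive pairs, after checking the counts are consistent."""
    assert positive + negative == pairs
    return positive / pairs

rates = {name: positive_rate(*counts) for name, counts in splits.items()}
```

About 60.1% of training pairs and 67.0% of testing pairs are positive, so a predictor that always answers "link" would already be right that often. This is a useful floor to keep in mind when reading accuracy-style numbers.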
SLIDE 31

All instantiated models perform as well as PRIMO.

SLIDE 32

Our best model (typed-tree) closes 72% of the correlation gap relative to PRIMO, despite the harder setting.

[Figure: correlation.]

SLIDE 33

More Results for Our Best Model

[Figure: ROC curve (left) and distribution of predicted likelihood (right) for the typed-tree model.]

SLIDE 34

Interpretability

How do we know the model is learning the right thing?

SLIDE 35

Sensitivity to Masking

  • Picking distinctive values
  • Ignoring less useful parts
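The slide's figure is not preserved, so the following is only one plausible reading of a masking experiment: hide one part of the input, re-run the model, and record how much the predicted likelihood moves. The stand-in `toy_model`, the two mask functions, and the way masking is done (replacing a field with an empty value) are all assumptions, not the authors' procedure.

```python
def toy_model(intent, flt):
    """Stand-in for a trained model: it only looks at the action."""
    action, _categories = intent
    actions, _filter_categories = flt
    return 0.9 if action in actions else 0.1

def mask_action(intent, flt):
    """Hide the intent's action, keep everything else."""
    return (None, intent[1]), flt

def mask_intent_categories(intent, flt):
    """Hide the intent's categories, keep everything else."""
    return (intent[0], frozenset()), flt

def masking_sensitivity(model, intent, flt, masks):
    """Absolute change in predicted likelihood for each mask."""
    baseline = model(intent, flt)
    return {name: abs(baseline - model(*mask(intent, flt)))
            for name, mask in masks.items()}

intent = ("android.intent.action.SEND",
          frozenset({"android.intent.category.DEFAULT"}))
flt = (frozenset({"android.intent.action.SEND"}),
       frozenset({"android.intent.category.DEFAULT"}))
sensitivity = masking_sensitivity(
    toy_model, intent, flt,
    {"action": mask_action, "intent categories": mask_intent_categories})
```

A large change for a mask means the model relies on that part of the input; a change near zero means the model ignores it. For this toy model the action matters and the categories do not.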

SLIDE 36

Learned Encodings

Semantically closer values receive more similar encodings.

[Figure: learned encodings visualized by t-SNE; legend: default, (.*), None.]

SLIDE 37

Conclusion

  • Neural-augmented static analysis
  • Type-directed encoder
  • Increased accuracy with less domain knowledge
  • Interpretability study
SLIDE 38

Future Work

  • Apply to other analysis tasks
  • Push machine learning into the static analysis procedure

SLIDE 39

Thanks for listening! Q & A