

slide-1
SLIDE 1

Modelgen:

Mining Explicit Information Flow Specifications from Concrete Executions

Lazaro Clapp, Saswat Anand, Alex Aiken

Stanford University

slide-2
SLIDE 2

I. Why mine specifications?

slide-3
SLIDE 3

Application

Whole-program static analysis

slide-4
SLIDE 4

Application

Static Analysis

Whole-program static analysis

Malware? Bugs? Documentation

slide-10
SLIDE 10

Application

Static Analysis

Whole-program static analysis?

???

  • Native code
  • Reflection
  • Complex OOP patterns / indirection
  • Large (e.g. Android >2 MLOC, Java)

Platform

(e.g. Android)

slide-12
SLIDE 12

Options: Best-case

Application
Platform (e.g. Android)
Static Analysis

  • Under-approximation
  • (Very) unsound
  • False negatives

slide-13
SLIDE 13

Options: Worst-case

Application
Platform (e.g. Android)
Static Analysis

  • Over-approximation
  • (Very) imprecise
  • False positives

slide-14
SLIDE 14

Options: Specifications

Application Platform

(e.g. Android)

  • Slight over-approximation
  • Manually written
  • Effort intensive*

* Our system (STAMP): Models for 1,116 methods, written over 2 years

slide-15
SLIDE 15

Mining Specifications

Application Platform

(e.g. Android)

  • Slight over-approximation
  • Manually written
  • Effort intensive
slide-16
SLIDE 16

Mining Specifications

Application Platform

(e.g. Android)

  • Slight over-approximation
  • Mined automatically using dynamic analysis

slide-17
SLIDE 17

Mining Specifications

Application Platform

(e.g. Android) Dynamic Analysis Specifications

slide-18
SLIDE 18

Mining Specifications

Application Platform

(e.g. Android) Static Analysis Malware? Bugs? Documentation Dynamic Analysis Specifications

slide-19
SLIDE 19

II. Information flow specifications

slide-20
SLIDE 20

Static taint analysis

S.T.A.M.P. Static Analysis

    #LOCATION -> !INTERNET
    #CONTACTS -> !INTERNET
    #PHONE_NUM -> !INTERNET

Information Flow Report

Human Auditor

slide-21
SLIDE 21

Information flow specifications

// Set-up
SocketChannel socket = ...;
CharBuffer buffer = ...;
CharsetEncoder encoder = ...;
TelephonyManager tMgr = ...;

// Leak phone number
// ( #PHONE_NUM -> !INTERNET )
String mPhoneNumber = tMgr.getLine1Number();
CharBuffer b1 = buffer.put(mPhoneNumber,0,10);
ByteBuffer bytebuffer = encoder.encode(b1);
socket.write(bytebuffer);

slide-28
SLIDE 28

Information flow specifications

// Set-up
SocketChannel socket = ...;
CharBuffer buffer = ...;
CharsetEncoder encoder = ...;
TelephonyManager tMgr = ...;

// Leak phone number
// ( #PHONE_NUM -> !INTERNET )
String mPhoneNumber = tMgr.getLine1Number();
CharBuffer b1 = buffer.put(mPhoneNumber,0,10);
ByteBuffer bytebuffer = encoder.encode(b1);
socket.write(bytebuffer);

TelephonyManager.getLine1Number()
    #PHONE_NUM -> return
CharBuffer.put(String,int,int)
    arg#1 -> this
    arg#1 -> return
    this -> return
CharsetEncoder.encode(CharBuffer)
    arg#1 -> return
SocketChannel.write(ByteBuffer)
    arg#1 -> !INTERNET

#PHONE_NUM -> mPhoneNumber -> b1 -> bytebuffer -> !INTERNET
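
The way a static analysis might chain these per-method specifications can be sketched as follows. This is a minimal illustration, not STAMP's implementation: the `Call` type, the `{from, to}` string encoding, and every method name in the class are assumptions made for the example. Only the four specifications and the four calls are taken from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: these types are hypothetical, not STAMP's or Modelgen's API.
class SpecPropagationSketch {

    // One call site: spec key, receiver variable, argument variables, result variable (null if unused).
    static final class Call {
        final String method;
        final String receiver;
        final List<String> args;
        final String result;

        Call(String method, String receiver, List<String> args, String result) {
            this.method = method;
            this.receiver = receiver;
            this.args = args;
            this.result = result;
        }
    }

    // The four method specs from the slide, as {from, to} pairs.
    static Map<String, String[][]> slideSpecs() {
        Map<String, String[][]> specs = new HashMap<>();
        specs.put("TelephonyManager.getLine1Number()", new String[][] {{"#PHONE_NUM", "return"}});
        specs.put("CharBuffer.put(String,int,int)",
                new String[][] {{"arg#1", "this"}, {"arg#1", "return"}, {"this", "return"}});
        specs.put("CharsetEncoder.encode(CharBuffer)", new String[][] {{"arg#1", "return"}});
        specs.put("SocketChannel.write(ByteBuffer)", new String[][] {{"arg#1", "!INTERNET"}});
        return specs;
    }

    // Maps "this", "return", or "arg#k" to the variable used at this call site.
    static String variableFor(String name, Call call) {
        if (name.equals("this")) return call.receiver;
        if (name.equals("return")) return call.result;
        return call.args.get(Integer.parseInt(name.substring("arg#".length())) - 1);
    }

    // Returns the source-to-sink flows found, e.g. "#PHONE_NUM -> !INTERNET".
    static List<String> findFlows(Map<String, String[][]> specs, List<Call> calls) {
        Map<String, Set<String>> taint = new HashMap<>(); // variable -> source labels
        Set<String> report = new LinkedHashSet<>();
        for (Call call : calls) {
            String[][] flows = specs.get(call.method);
            if (flows == null) continue; // no spec: this sketch assumes no flow
            // Read every "from" side against the state before the call.
            List<Set<String>> incoming = new ArrayList<>();
            for (String[] flow : flows) {
                Set<String> labels = new LinkedHashSet<>();
                if (flow[0].startsWith("#")) {
                    labels.add(flow[0]); // a source introduces its own label
                } else {
                    Set<String> known = taint.get(variableFor(flow[0], call));
                    if (known != null) labels.addAll(known);
                }
                incoming.add(labels);
            }
            for (int i = 0; i < flows.length; i++) {
                String to = flows[i][1];
                for (String label : incoming.get(i)) {
                    if (to.startsWith("!")) {
                        report.add(label + " -> " + to); // label reached a sink
                    } else {
                        String target = variableFor(to, call);
                        if (target != null) {
                            taint.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(label);
                        }
                    }
                }
            }
        }
        return new ArrayList<>(report);
    }

    // The four calls from the slide's code, in program order.
    static List<Call> slideCalls() {
        return Arrays.asList(
                new Call("TelephonyManager.getLine1Number()", "tMgr",
                        Collections.<String>emptyList(), "mPhoneNumber"),
                new Call("CharBuffer.put(String,int,int)", "buffer",
                        Arrays.asList("mPhoneNumber", "0", "10"), "b1"),
                new Call("CharsetEncoder.encode(CharBuffer)", "encoder",
                        Arrays.asList("b1"), "bytebuffer"),
                new Call("SocketChannel.write(ByteBuffer)", "socket",
                        Arrays.asList("bytebuffer"), null));
    }

    public static void main(String[] args) {
        System.out.println(findFlows(slideSpecs(), slideCalls()));
    }
}
```

With the slide's four calls, the sketch reports the single flow `#PHONE_NUM -> !INTERNET`. A call with no entry in the spec table is treated as having no flow, which is the unsound best-case option; a real analysis would have to pick a policy for unmodeled methods.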

slide-29
SLIDE 29

III. Technique

slide-30
SLIDE 30

Instrument, run, analyze

Instrument Run Analyze

slide-37
SLIDE 37

Method trace

Definition:

  • Sequence of recorded operations between method entry and return.
  • Including calls to other methods.
slide-38
SLIDE 38

Example

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

slide-44
SLIDE 44

Example

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

Spec:
    arg1->this
    arg2->this
    this->return
    arg1->return
    arg2->return

slide-45
SLIDE 45

Example: Initialization

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

slide-46
SLIDE 46

Example: Taint propagation

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

[Diagram label: t]

slide-47
SLIDE 47

Example: Loads

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

[Diagram label: o1]
slide-48
SLIDE 48

Example: Loads

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

[Diagram labels: o1, o2]
slide-49
SLIDE 49

Example: Loads

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

[Diagram label: o3]
slide-50
SLIDE 50

Example: Loads

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

slide-51
SLIDE 51

Example: Store

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

slide-54
SLIDE 54

Example: Store

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

this <- arg2
this <- arg1

slide-55
SLIDE 55

Example: Store

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

this <- arg2
this <- arg1

!: Information flow goes in the opposite direction of reachability
slide-58
SLIDE 58

Example: Store

o.m(arg1, arg2):
    t = arg1 ⊗ arg2
    o1 = o.f
    o2 = o1.g
    o3 = o.g
    o2.f = t
    return o

Initialization

ret = o.m(arg1, arg2)

return <- arg1
return <- arg2
return <- this
this <- arg2
this <- arg1

slide-59
SLIDE 59

Example: Store

Initialization

ret = o.m(arg1, arg2)

Spec:
    arg1->this
    arg2->this
    this->return
    arg1->return
    arg2->return
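
One way to picture how a recorded trace of `o.m(arg1, arg2)` turns into this spec is the sketch below. It is a simplified model written for this example, not Modelgen's actual algorithm: the string-array event encoding and the `origin` and `absorbed` maps are assumptions. Loads record which parameter an object is reachable from, and a store sends information back toward that parameter, opposite to the direction of reachability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: the event encoding and every name here are assumptions.
class TraceSpecSketch {

    // Variable -> parameters it was loaded from or computed from.
    private final Map<String, Set<String>> origin = new HashMap<>();
    // Parameter -> parameters whose data was stored into objects reachable from it.
    private final Map<String, Set<String>> absorbed = new HashMap<>();
    // Mined flows in discovery order, written "from->to".
    private final Set<String> spec = new LinkedHashSet<>();

    // Initialization: each variable bound at method entry starts as its own parameter.
    TraceSpecSketch(Map<String, String> parameterOfVariable) {
        for (Map.Entry<String, String> e : parameterOfVariable.entrySet()) {
            origin.put(e.getKey(), new LinkedHashSet<>(Arrays.asList(e.getValue())));
        }
    }

    private Set<String> originOf(String var) {
        Set<String> known = origin.get(var);
        return known == null ? new LinkedHashSet<String>() : known;
    }

    // Parameters whose information may be visible through var right now.
    private Set<String> sourcesOf(String var) {
        Set<String> out = new LinkedHashSet<>();
        for (String p : originOf(var)) {
            out.add(p);
            Set<String> stored = absorbed.get(p);
            if (stored != null) out.addAll(stored);
        }
        return out;
    }

    void apply(String[] ev) {
        switch (ev[0]) {
            case "binop": { // {"binop", dst, a, b} means dst = a ⊗ b
                Set<String> s = sourcesOf(ev[2]);
                s.addAll(sourcesOf(ev[3]));
                origin.put(ev[1], s);
                break;
            }
            case "load": // {"load", dst, base, field} means dst = base.field
                origin.put(ev[1], new LinkedHashSet<>(originOf(ev[2])));
                break;
            case "store": // {"store", base, field, src} means base.field = src
                for (String owner : originOf(ev[1])) {
                    for (String q : sourcesOf(ev[3])) {
                        if (q.equals(owner)) continue;
                        // Data flows into the parameter that reaches base.
                        spec.add(q + "->" + owner);
                        absorbed.computeIfAbsent(owner, k -> new LinkedHashSet<>()).add(q);
                    }
                }
                break;
            case "return": // {"return", src}
                for (String q : sourcesOf(ev[1])) spec.add(q + "->return");
                break;
            default:
                throw new IllegalArgumentException("unknown event: " + ev[0]);
        }
    }

    static List<String> mine(Map<String, String> params, List<String[]> trace) {
        TraceSpecSketch s = new TraceSpecSketch(params);
        for (String[] ev : trace) s.apply(ev);
        return new ArrayList<>(s.spec);
    }

    // Trace of o.m(arg1, arg2) from the slides.
    static List<String> slideExample() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("o", "this");
        params.put("arg1", "arg1");
        params.put("arg2", "arg2");
        List<String[]> trace = Arrays.asList(
                new String[] {"binop", "t", "arg1", "arg2"},
                new String[] {"load", "o1", "o", "f"},
                new String[] {"load", "o2", "o1", "g"},
                new String[] {"load", "o3", "o", "g"},
                new String[] {"store", "o2", "f", "t"},
                new String[] {"return", "o"});
        return mine(params, trace);
    }

    public static void main(String[] args) {
        System.out.println(slideExample());
    }
}
```

On the slide's trace this yields the five flows listed in the spec above. The model ignores arrays, threading, exceptions, and nested calls.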

slide-60
SLIDE 60

Merging specifications

r = max(arg1, arg2)

slide-61
SLIDE 61

Merging specifications

Trace 1: r = max(5, 3)
    return <- arg1

slide-62
SLIDE 62

Merging specifications

Trace 1: r = max(5, 3)
    return <- arg1

Trace 2: r = max(2, 7)
    return <- arg2

slide-63
SLIDE 63

Merging specifications

Trace 1: r = max(5, 3)
    return <- arg1

Trace 2: r = max(2, 7)
    return <- arg2

Trace 1 ∪ Trace 2: r = max(arg1, arg2)
    return <- arg1
    return <- arg2
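
The merge step amounts to a set union over the flows seen in each trace of the same method. A minimal sketch, assuming flows are kept as plain strings (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: representing a flow as a string is an assumption of this sketch.
class SpecMergeSketch {

    // Union of the flow sets observed in individual traces of one method.
    static Set<String> merge(List<Set<String>> perTraceSpecs) {
        Set<String> merged = new TreeSet<>(); // sorted for stable output
        for (Set<String> flows : perTraceSpecs) {
            merged.addAll(flows);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> trace1 = new HashSet<>(Arrays.asList("return <- arg1")); // max(5, 3)
        Set<String> trace2 = new HashSet<>(Arrays.asList("return <- arg2")); // max(2, 7)
        System.out.println(merge(Arrays.asList(trace1, trace2)));
    }
}
```

Because the result is a union of observed behavior, a flow that no trace exercises does not appear in the merged spec.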

slide-64
SLIDE 64

Notes and gotchas

  • Native code / instrumentation holes
  • Arrays, threading, exceptions
  • Method calls (and recursion)
  • Etc.
slide-66
SLIDE 66

IV. Experiments and results

slide-67
SLIDE 67

Experiment I: Man vs Machine

309 methods, 51 classes

slide-68
SLIDE 68

Experiment I: Man vs Machine

309 methods, 51 classes

  • 440 TP / 2 FP: 99.55% precision
  • 540 TP / 2 FP: 99.63% precision

slide-69
SLIDE 69

Experiment I: Man vs Machine

309 methods, 51 classes

96.36% Recall vs Manual

slide-70
SLIDE 70

Experiment I: Man vs Machine

309 methods, 51 classes

  • 97.12% Recall vs Total (TP)
  • 79.14% Recall vs Total (TP)
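
The precision figures can be checked directly from the TP/FP counts reported in Experiment I. The recall check is more tentative: the deck does not state the combined number of true positives, so the value 556 below is an inference, chosen because it reproduces both recall percentages.

```latex
\begin{align*}
\text{precision} &= \frac{TP}{TP + FP} \\[4pt]
\frac{440}{440 + 2} &= \frac{440}{442} \approx 0.9955 \\[4pt]
\frac{540}{540 + 2} &= \frac{540}{542} \approx 0.9963 \\[8pt]
% Assumption: 556 total true positives (not stated in the slides)
\frac{540}{556} &\approx 0.9712 \qquad \frac{440}{556} \approx 0.7914
\end{align*}
```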

slide-71
SLIDE 71

Experiment II: STAMP

slide-72
SLIDE 72

Experiment II: STAMP

  • 242 apps (Google Play)
  • Base: 3.08 flows per app
  • Modelgen: 4.07 flows per app

slide-77
SLIDE 77

Experiment II: STAMP

[Chart. Axes: Flows, Apps. Series: Flows w/ Manual specs (TP+FP); New TP; New Unknown]

slide-78
SLIDE 78

V. Conclusions and related work

slide-81
SLIDE 81

Key points

  • Platform code specifications
  • Dynamic analysis > manual effort (sometimes)
  • For information flow (IF) specs: >97% precision and recall
slide-82
SLIDE 82

(Some) Related work

Dynamic techniques for generating API specifications

  • V. K. Palepu, G. H. Xu, and J. A. Jones. Improving efficiency of dynamic analysis with dynamic dependence summaries. ASE 2013
  • A. W. Biermann and J. A. Feldman. On the synthesis of finite-state machines from samples of their behavior. IEEE ToC, 1972
  • G. Ammons, R. Bodík, and J. R. Larus. Mining specifications. POPL 2002
  • T. Xie, E. Martin, and H. Yuan. Automatic extraction of abstract-object-state machines from unit-test executions. ICSE 2006
  • D. Lorenzoli, L. Mariani, and M. Pezzè. Automatic generation of software behavioral models. ICSE 2008
  • J. W. Nimmer and M. D. Ernst. Automatic generation of program specifications. ISSTA 2002

Dynamic / Static taint analysis

  • J. A. Clause, W. Li, and A. Orso. Dytan: A generic dynamic taint analysis framework. ISSTA 2007
  • W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. Sheth. TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones. OSDI 2010
  • S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel. FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. PLDI 2014
  • M. Sridharan, S. Artzi, M. Pistoia, S. Guarnieri, O. Tripp, and R. Berg. F4F: Taint analysis of framework-based web applications. OOPSLA 2011
  • O. Bastani, S. Anand, and A. Aiken. Specification inference using context-free language reachability. POPL 2015
slide-84
SLIDE 84

Code and models available

https://bitbucket.org/lazaro_clapp/droidrecord

Questions?