Statistical Deobfuscation for Android Applications



slide-1
SLIDE 1

Statistical Deobfuscation for Android Applications

Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev
Department of Computer Science

slide-2
SLIDE 2

Why De-obfuscate?

Google Play

Android binaries (APKs) (no code available)

[Chart: number of APKs on Google Play, ’10 to ’16; 2.4M APKs]

slide-3
SLIDE 3

Layout Obfuscation in Android

Names provide key semantic information:

    package com.example.dbhelper
    class DBHelper extends SQLiteHelper {
        SQLiteDatabase db;
        public DBHelper(Context ctx) {
            db = getWritableDatabase();
        }
        Cursor execSQL(String str) {
            return db.rawQuery(str);
        }
    }

Obfuscate → non-descriptive names:

    package a.b.c
    class a extends SQLiteHelper {
        SQLiteDatabase b;
        public a(Context ctx) {
            b = getWritableDatabase();
        }
        Cursor c(String str) {
            return b.rawQuery(str);
        }
    }

Some names remain

slide-4
SLIDE 4

Layout Obfuscation in Android


Security Challenges

  • Code Inspection
  • Third-party Library Detection
  • … many others

slide-5
SLIDE 5

Layout Obfuscation in Android


Can we reverse layout obfuscation?

slide-6
SLIDE 6

Layout Obfuscation in Android


Yes, with roughly 80% accuracy!

www.apk-deguard.com

slide-7
SLIDE 7

www.apk-deguard.com

Released last week, so far:

  • > 5K users
  • > 5GB APKs
  • Reddit posts/comments
  • Tweets

slide-8
SLIDE 8

How Does DeGuard Work?

slide-9
SLIDE 9

DeGuard: System Overview

Learning Phase: Open-source, unobfuscated applications → Static analysis → Training → Probabilistic model $P(O \mid K)$

Prediction Phase: Obfuscated Code → Static analysis → Semantic representation → MAP Inference → Transform → De-obfuscated Code

Obfuscated Code

    class a extends SQLiteHelper {
        SQLiteDatabase b;
        public a(Context ctx) {
            b = getWritableDB();
        }
    }

De-obfuscated Code

    class DBHelper extends SQLiteHelper {
        SQLiteDatabase db;
        public DBHelper(Context ctx) {
            db = getWritableDB();
        }
    }

slide-10
SLIDE 10

Probabilistic Graphical Models

slide-11
SLIDE 11

Probabilistic Graphical Models

    class a extends SQLiteHelper {
        SQLiteDatabase b;
        public a(Context ctx) {
            b = getWritableDB();
        }
    }

Dependency graph (known nodes: SQLiteHelper, getWritableDB; unknown nodes: a, b):

    a --extends--> SQLiteHelper
    b --gets--> getWritableDB
    b --field-in--> a

Feature functions and weights:

    feature   name1            name2       weight
    f_1       SQLiteHelper     DBUtils     0.3
    f_2       SQLiteHelper     DBHelper    0.2

    f_3       getWritableDB    db          0.7
    f_4       getWritableDB    instance    0.4

    f_5       DBUtils          instance    0.5
    f_6       DBHelper         db          0.4
    f_7       …                …           …

Graph + features define a probabilistic graphical model:

    $P(O \mid K) = P(a, b \mid \mathrm{SQLiteHelper}, \mathrm{getWritableDB}) = \frac{1}{Z} \exp\big(0.3 \cdot f_1(\mathrm{SQLiteHelper}, a) + 0.2 \cdot f_2(\mathrm{SQLiteHelper}, a) + \cdots\big)$

    O: Unknown variables
    K: Known variables
    f_1, f_2, ...: Feature functions
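
As a worked instance of the formula, take the candidate assignment a = DBHelper, b = db. The sketch below assumes that each feature function is an indicator that equals 1 when its name pair matches an edge of the graph and 0 otherwise; under that assumption the features with weights 0.2 (SQLiteHelper, DBHelper), 0.7 (getWritableDB, db) and 0.4 (DBHelper, db) fire:

```latex
\begin{align*}
P(a = \mathrm{DBHelper},\, b = \mathrm{db} \mid \mathrm{SQLiteHelper}, \mathrm{getWritableDB})
  &= \frac{1}{Z} \exp(0.2 + 0.7 + 0.4) \\
  &= \frac{1}{Z} \exp(1.3)
\end{align*}
```

Here Z is the normalization constant obtained by summing the same exponential over all candidate assignments.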

slide-12
SLIDE 12


Next

How are the weights and features learned?

slide-13
SLIDE 13

Learning

slide-14
SLIDE 14

Learning

Unobfuscated APKs → Static analysis → Dependency graphs

Dependency graphs + Feature templates → Features (with candidate names)

Train Model: compute weights that maximize $P(O = o_i \mid K = k_i)$ for all training samples $(o_i, k_i)$

    feature   name1            name2       weight (learned)
    f_1       SQLiteHelper     DBUtils     0.3
    f_2       SQLiteHelper     DBHelper    0.2
    f_3       getWritableDB    db          0.7
    f_4       getWritableDB    instance    0.4
    f_5       DBUtils          instance    0.5
    f_6       DBHelper         db          0.4
    f_7       …                …           …

28 templates. Actual graphs have >1,000 nodes; the remaining figures, >2,000 and >100,000, appear without labels.
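
One way to write the training goal as an equation is sketched below. It assumes the training samples are treated as independent, so that maximizing every conditional probability jointly becomes maximizing their product; the weight vector w, the product form, and the absence of a regularization term are assumptions of this sketch, not details given on the slide.

```latex
\begin{align*}
w^{*} &= \arg\max_{w} \prod_{i} P(O = o_i \mid K = k_i;\, w) \\
      &= \arg\max_{w} \sum_{i} \log P(O = o_i \mid K = k_i;\, w)
\end{align*}
```

The second line follows because the logarithm is monotonic, and the sum is usually easier to optimize numerically than the product.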

slide-15
SLIDE 15

DeGuard: System Overview


slide-16
SLIDE 16

DeGuard: System Overview

Prediction Phase: Obfuscated Code → Static analysis → MAP Inference (using the probabilistic model $P(O \mid K)$) → Transform → De-obfuscated Code

slide-17
SLIDE 17

Prediction Phase

Obfuscated Code

    class a extends SQLiteHelper {
        SQLiteDatabase b;
        public a(Context ctx) {
            b = getWritableDB();
        }
    }

Static analysis yields the dependency graph:

    a --extends--> SQLiteHelper
    b --gets--> getWritableDB
    b --field-in--> a

Features and weights:

    name1            name2       weight
    SQLiteHelper     DBUtils     0.3
    SQLiteHelper     DBHelper    0.2

    getWritableDB    db          0.7
    getWritableDB    instance    0.4

    DBUtils          instance    0.5
    DBHelper         db          0.4
    DBUtils          db          0.2
    DBHelper         instance    0.2
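
The graph that static analysis extracts from this class can be sketched as a small data structure. The class and method names below (`DependencyGraphSketch`, `Node`, `Edge`, `buildSlideGraph`) are hypothetical and chosen only for illustration; they are not DeGuard's actual representation, and the edge directions are one reading of the diagram.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy dependency graph for the obfuscated class on the slide (illustrative sketch only). */
final class DependencyGraphSketch {

    /** A program element; known is false when its name was obfuscated. */
    static final class Node {
        final String name;
        final boolean known;

        Node(String name, boolean known) {
            this.name = name;
            this.known = known;
        }
    }

    /** A labelled relationship between two program elements. */
    static final class Edge {
        final Node from;
        final String relation;
        final Node to;

        Edge(Node from, String relation, Node to) {
            this.from = from;
            this.relation = relation;
            this.to = to;
        }

        @Override
        public String toString() {
            return from.name + " --" + relation + "--> " + to.name;
        }
    }

    /** Builds the three edges drawn on the slide. */
    static List<Edge> buildSlideGraph() {
        Node sqliteHelper = new Node("SQLiteHelper", true);   // library name, survives obfuscation
        Node getWritableDb = new Node("getWritableDB", true); // library method, survives obfuscation
        Node a = new Node("a", false);                        // obfuscated class name
        Node b = new Node("b", false);                        // obfuscated field name

        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(a, "extends", sqliteHelper));
        edges.add(new Edge(b, "gets", getWritableDb));
        edges.add(new Edge(b, "field-in", a));
        return edges;
    }

    /** Counts the edges that touch at least one node with an unknown name. */
    static long edgesWithUnknowns(List<Edge> edges) {
        return edges.stream().filter(e -> !e.from.known || !e.to.known).count();
    }

    public static void main(String[] args) {
        for (Edge e : buildSlideGraph()) {
            System.out.println(e);
        }
    }
}
```

Each edge pairs an unknown name with either a known name or another unknown name, which is what lets the pairwise features score candidate names.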

slide-18
SLIDE 18


MAP Inference

    $\vec{o} = \operatorname{argmax}_{\vec{o}' \in \Omega} P(O = \vec{o}' \mid K = k)$

    Candidate assignment o            P(o | k)*
    a = DBUtils,  b = instance        1.2
    a = DBHelper, b = db              1.3
    a = DBUtils,  b = db              0.8
    a = DBHelper, b = instance        1.2

*Non-normalized
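
For this toy example the argmax can be found by simply enumerating all candidate assignments. The sketch below assumes the non-normalized score of an assignment is the sum of the weights of the pairwise features that fire on the three graph edges; the class name `MapInferenceSketch` and the `"left|right"` key encoding are hypothetical. Real graphs are far too large for enumeration, so this is not a description of how DeGuard performs inference.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Brute-force MAP inference over the toy model on the slide (illustrative sketch only). */
final class MapInferenceSketch {

    // Pairwise feature weights copied from the slide's three tables.
    // The "left|right" key format is an encoding chosen for this sketch.
    static final Map<String, Double> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("SQLiteHelper|DBUtils", 0.3);
        WEIGHTS.put("SQLiteHelper|DBHelper", 0.2);
        WEIGHTS.put("getWritableDB|db", 0.7);
        WEIGHTS.put("getWritableDB|instance", 0.4);
        WEIGHTS.put("DBUtils|instance", 0.5);
        WEIGHTS.put("DBHelper|db", 0.4);
        WEIGHTS.put("DBUtils|db", 0.2);
        WEIGHTS.put("DBHelper|instance", 0.2);
    }

    /** Weight of the feature for a name pair, or 0 if no such feature exists. */
    static double weight(String left, String right) {
        return WEIGHTS.getOrDefault(left + "|" + right, 0.0);
    }

    /** Non-normalized score: sum of the weights of the features that fire. */
    static double score(String className, String fieldName) {
        return weight("SQLiteHelper", className)   // edge: a extends SQLiteHelper
             + weight("getWritableDB", fieldName)  // edge: b gets getWritableDB
             + weight(className, fieldName);       // edge: b is a field in a
    }

    /** Enumerates all candidate pairs; returns {className, fieldName}, or null if there are none. */
    static String[] mapAssignment(List<String> classNames, List<String> fieldNames) {
        String[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : classNames) {
            for (String f : fieldNames) {
                double s = score(c, f);
                if (s > bestScore) {
                    bestScore = s;
                    best = new String[] {c, f};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] best = mapAssignment(List.of("DBUtils", "DBHelper"), List.of("instance", "db"));
        System.out.println("a = " + best[0] + ", b = " + best[1]);
    }
}
```

With this additive score the pairs (DBUtils, db) and (DBHelper, instance) come out as 1.2 and 0.8, the reverse of how the slide's table lists them, so those two rows may be swapped on the slide. The highest-scoring assignment, a = DBHelper and b = db, is the same either way.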

slide-20
SLIDE 20

Prediction Phase

Obfuscated Code

    class a extends SQLiteHelper {
        SQLiteDatabase b;
        public a(Context ctx) {
            b = getWritableDB();
        }
    }

Dependency graph with the predicted names filled in:

    DBHelper --extends--> SQLiteHelper
    db --gets--> getWritableDB
    db --field-in--> DBHelper

Transform produces the deobfuscated code:

Deobfuscated Code

    class DBHelper extends SQLiteHelper {
        SQLiteDatabase db;
        public DBHelper(Context ctx) {
            db = getWritableDB();
        }
    }

slide-21
SLIDE 21

Preserving Semantics

Freely renaming fields/variables/methods may change the program semantics.

    class A
        int a
        Object b
        void a()

    class B extends A
        void b()
        void c(A a)

Syntactic constraints

e.g. fields within a class must have distinct names

Semantic constraints

e.g. method overloads must be preserved

Annotations on the example:

  • must have distinct names
  • must not override method a()
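
The two kinds of constraint can be sketched as checks on a proposed renaming. The class `RenamingConstraintSketch` and its methods are hypothetical, and the override check compares method names only and ignores parameter lists, which is a simplification rather than DeGuard's actual constraint handling.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Checks two renaming constraints on a toy model of the slide's example (illustrative sketch only). */
final class RenamingConstraintSketch {

    /** Syntactic constraint: after renaming, the fields of one class must still have distinct names. */
    static boolean fieldsStayDistinct(List<String> fields, Map<String, String> renaming) {
        Set<String> seen = new HashSet<>();
        for (String field : fields) {
            String renamed = renaming.getOrDefault(field, field); // unmapped fields keep their name
            if (!seen.add(renamed)) {
                return false; // two fields would end up with the same name
            }
        }
        return true;
    }

    /**
     * Semantic constraint: a subclass method that did not override a superclass method
     * before renaming must not override it afterwards.
     */
    static boolean introducesOverride(String oldSuper, String oldSub, String newSuper, String newSub) {
        boolean overrodeBefore = oldSuper.equals(oldSub);
        boolean overridesAfter = newSuper.equals(newSub);
        return !overrodeBefore && overridesAfter;
    }

    public static void main(String[] args) {
        List<String> fieldsOfA = List.of("a", "b"); // int a; Object b; in class A
        System.out.println(fieldsStayDistinct(fieldsOfA, Map.of("a", "count", "b", "owner")));
        System.out.println(fieldsStayDistinct(fieldsOfA, Map.of("a", "count", "b", "count")));
        // A.a() and B.b() must not both be renamed to the same name.
        System.out.println(introducesOverride("a", "b", "close", "close"));
    }
}
```

A predicted renaming that fails either check would have to be rejected or adjusted before the transform step.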

slide-22
SLIDE 22

DeGuard: System Overview


slide-23
SLIDE 23

DeGuard Implementation

slide-24
SLIDE 24

DeGuard Implementation

www.apk-deguard.com

Static Analysis

  • Static analysis framework for Java and Android

Learning and MAP Inference

  • Scalable open-source framework for structured prediction
  • Open-source: http://nice2predict.org
  • Training data: 2K open-source, unobfuscated Android applications

slide-25
SLIDE 25

Evaluation

  1. Can DeGuard reverse ProGuard?
  2. Can DeGuard detect third-party libraries?
  3. Is DeGuard useful for malware inspection?

slide-26
SLIDE 26

ProGuard Experiment

Source Code → Non-obfuscated APK
Source Code → Obfuscated APK → De-obfuscated APK

De-obfuscated APK = ? Non-obfuscated APK

slide-27
SLIDE 27

After Obfuscation

[Bar chart: % of program elements with known names, for Fields, Methods, Classes, Packages and Total]

Only 13% known names

slide-28
SLIDE 28

Can DeGuard reverse ProGuard?

[Bar chart: % of program elements for Fields, Methods, Classes, Packages and Total, split into known names, correctly predicted names and mis-predicted names]

80% of the names are correctly predicted, i.e., identical to the original names.

Package names are directly used to predict third-party libraries: 1.6% known names, 80.6% correct names.

slide-29
SLIDE 29

Can DeGuard Detect Third-Party Libraries?

Library Code + Source Code → ProGuard → Obfuscated APK → De-obfuscated APK → ?

ProGuard obfuscates library package names.

Precision: 93.1%
Recall: 91%
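
For reference, precision and recall can be computed as in the sketch below. It assumes that detection is scored per package, as set overlap between predicted and actual library packages; the class name and the package names in `main` are made up for illustration and are not the evaluation data.

```java
import java.util.HashSet;
import java.util.Set;

/** Precision and recall over sets of package names (illustrative sketch only). */
final class LibraryDetectionMetricsSketch {

    /** Number of elements the two sets have in common. */
    private static int overlap(Set<String> x, Set<String> y) {
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return common.size();
    }

    /** Fraction of predicted library packages that really are library packages. */
    static double precision(Set<String> predicted, Set<String> actual) {
        return predicted.isEmpty() ? 0.0 : (double) overlap(predicted, actual) / predicted.size();
    }

    /** Fraction of actual library packages that were predicted. */
    static double recall(Set<String> predicted, Set<String> actual) {
        return actual.isEmpty() ? 0.0 : (double) overlap(predicted, actual) / actual.size();
    }

    public static void main(String[] args) {
        // Made-up inputs: three packages flagged as libraries, two of them correctly.
        Set<String> predicted = Set.of("lib.alpha", "lib.beta", "lib.gamma");
        Set<String> actual = Set.of("lib.alpha", "lib.beta");
        System.out.printf("precision=%.3f recall=%.3f%n",
                precision(predicted, actual), recall(predicted, actual));
    }
}
```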

slide-30
SLIDE 30

Is DeGuard Useful for Malware Inspection?

De-obfuscating samples from the Android Malware Genome Project

Malware Sample

    class d {
        String a = System.getProperty(..);
        char[] b;
        byte[] c;
        byte[] a(String) {}
    }

De-obfuscated Malware Sample (Base64 Decoder)

    class Base64 {
        String NL = System.getProperty(..);
        char[] ENC;
        byte[] DEC;
        byte[] decode(String) {}
    }

  • Reveals string decoders
  • Reveals classes that handle sensitive data (e.g. Location)
  • Hard to handle heavily-obfuscated code (e.g. reflection)

slide-31
SLIDE 31

Summary

[Recap panels: the layout obfuscation example (package com.example.dbhelper, class DBHelper versus package a.b.c, class a); the dependency graph with its feature-weight tables; the accuracy bar chart over Fields, Methods, Classes, Packages and Total]

  • Probabilistic Models
  • High Prediction Accuracy

Try online: www.apk-deguard.com

More info: http://www.srl.inf.ethz.ch/spas