Statistical De-obfuscation for Android




slide-1
SLIDE 1

Statistical De-obfuscation for Android

Petar Tsankov, ETH Zurich

DeGuard Team: Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev

www.srl.inf.ethz.ch

slide-2
SLIDE 2

Why De-obfuscate Android Applications?

  • Google Play: Android binaries (APKs), no code available
  • F-Droid: open-source, code available

slide-3
SLIDE 3

Why De-obfuscate Android Applications?

  • Google Play: 2.6M APKs
  • F-Droid: 5K APKs

Which APKs are malicious? Which ones use vulnerable libraries?

slide-4
SLIDE 4

Layout Obfuscation in Android

Original code (descriptive application-specific names, plus names of API classes/methods):

    package com.example.dbhelper;
    class DBHelper extends SQLiteHelper {
      SQLiteDatabase db;
      public DBHelper(Context ctx) {
        db = getWritableDatabase();
      }
      Cursor execSQL(String str) {
        return db.rawQuery(str);
      }
    }

After obfuscation (non-descriptive names; API names remain):

    package a.b.c;
    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDatabase();
      }
      Cursor c(String str) {
        return b.rawQuery(str);
      }
    }

slide-5
SLIDE 5

Layout Obfuscation in Android


Security Challenges

  • Code inspection
  • Third-party library detection
  • … many others

slide-6
SLIDE 6

Layout Obfuscation in Android


Can we reverse layout obfuscation?

slide-7
SLIDE 7

Layout Obfuscation in Android


Yes, with roughly 80% accuracy!

www.apk-deguard.com

slide-8
SLIDE 8

Demo

slide-9
SLIDE 9

www.apk-deguard.com

Released in October 2016. So far:

  • > 100GB of distinct APKs de-obfuscated
  • Reddit posts/comments
  • Tweets

slide-10
SLIDE 10

How Does DeGuard Work?

slide-11
SLIDE 11

DeGuard: System Overview

Learning phase: Open-source, unobfuscated APKs → Static analysis → Training → Probabilistic model P

Prediction phase: Obfuscated code → Static analysis → MAP inference (using the probabilistic model P) → Transform → De-obfuscated code

Static analysis produces the semantic representation of the code.

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

De-obfuscated code:

    class DBHelper extends SQLiteHelper {
      SQLiteDatabase db;
      public DBHelper(Context ctx) {
        db = getWritableDB();
      }
    }

slide-12
SLIDE 12

Probabilistic Graphical Models

slide-13
SLIDE 13

Probabilistic Graphical Models

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

Dependency graph:

  • Unknown variables: a, b
  • Known variables: SQLiteHelper, getWritableDB
  • Edges: a extends SQLiteHelper; b gets getWritableDB; b field-in a

Feature functions f_1, f_2, …, f_7:

feature   name1           name2      weight
f_1       SQLiteHelper    DBUtils    0.3
f_2       SQLiteHelper    DBHelper   0.2
f_3       getWritableDB   db         0.7
f_4       getWritableDB   instance   0.4
f_5       DBUtils         instance   0.5
f_6       DBHelper        db         0.4
f_7       …               …          …

Graph + features define a probabilistic graphical model:

$$P(a, b \mid \mathit{SQLiteHelper}, \mathit{getWritableDB}) = \frac{1}{Z} \exp\big(0.3 \cdot f_1(\mathit{SQLiteHelper}, a) + 0.2 \cdot f_2(\mathit{SQLiteHelper}, a) + \cdots\big)$$

For details see report on www.apk-deguard.com
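The formula on this slide can be made concrete with a small sketch. It assumes that each feature function is an indicator that fires when a name pair from the table appears on an edge of the dependency graph, so the exponent is simply the sum of the matching weights. The class name `NameScoreSketch`, the `"name1|name2"` key encoding, and the restriction of Z to the listed candidate names are illustrative assumptions, not DeGuard's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: log-linear score and probability for one candidate naming of (a, b). */
final class NameScoreSketch {
    // Pairwise feature weights copied from the slide's table.
    // The "name1|name2" key encoding is an illustrative choice.
    static final Map<String, Double> WEIGHTS = new HashMap<>();
    static {
        WEIGHTS.put("SQLiteHelper|DBUtils", 0.3);
        WEIGHTS.put("SQLiteHelper|DBHelper", 0.2);
        WEIGHTS.put("getWritableDB|db", 0.7);
        WEIGHTS.put("getWritableDB|instance", 0.4);
        WEIGHTS.put("DBUtils|instance", 0.5);
        WEIGHTS.put("DBHelper|db", 0.4);
    }

    /** Weight of the feature for this name pair, or 0 if no such feature exists. */
    static double weight(String name1, String name2) {
        return WEIGHTS.getOrDefault(name1 + "|" + name2, 0.0);
    }

    /** Sum of the weights of the features that fire on the three graph edges. */
    static double score(String a, String b) {
        return weight("SQLiteHelper", a)   // edge: a extends SQLiteHelper
             + weight("getWritableDB", b)  // edge: b gets getWritableDB
             + weight(a, b);               // edge: b field-in a
    }

    /** exp(score) / Z, where Z sums over all candidate pairs passed in. */
    static double probability(String a, String b, String[] classNames, String[] fieldNames) {
        double z = 0.0;
        for (String ca : classNames) {
            for (String cb : fieldNames) {
                z += Math.exp(score(ca, cb));
            }
        }
        return Math.exp(score(a, b)) / z;
    }
}
```

Because Z is the same for every candidate, comparing candidates only requires `score`; `probability` is needed only when a normalized value is wanted.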

slide-14
SLIDE 14


Probabilistic Graphical Models


Next: How are the features and their weights learned?

slide-15
SLIDE 15

Learning

slide-16
SLIDE 16

Learning

Unobfuscated APKs → Static analysis → Dependency graphs → Features (with candidate names) → Train model

Feature templates are applied to the dependency graphs to obtain features with candidate names; training assigns a weight to each feature:

feature   name1           name2      weight (after training)
f_1       SQLiteHelper    DBUtils    0.3
f_2       SQLiteHelper    DBHelper   0.2
f_3       getWritableDB   db         0.7
f_4       getWritableDB   instance   0.4
f_5       DBUtils         instance   0.5
f_6       DBHelper        db         0.4
f_7       …               …          …

Train model: compute weights that maximize $P(O = o_i \mid K = k_i)$ for all training samples $(o_i, k_i)$.

Scale figures: 28 templates; actual graphs have > 1,000 nodes; > 2,000; > 100,000
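As a rough illustration of the training objective, the following sketch runs batch gradient ascent on the conditional log-likelihood for the toy example. It assumes indicator features on name pairs, a single shared context (SQLiteHelper, getWritableDB), and a candidate set small enough to enumerate. The class name, the candidate lists, and the choice of plain gradient ascent are assumptions made for illustration; the slides do not say which optimizer DeGuard uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: learn feature weights by maximizing sum_i log P(O = o_i | K = k_i). */
final class WeightLearningSketch {
    // Hypothetical candidate names for the unknowns a (class) and b (field).
    static final String[] CLASS_NAMES = {"DBUtils", "DBHelper"};
    static final String[] FIELD_NAMES = {"db", "instance"};

    /** Keys of the three indicator features that fire for the assignment (a, b). */
    static String[] features(String a, String b) {
        return new String[] {"SQLiteHelper|" + a, "getWritableDB|" + b, a + "|" + b};
    }

    static double score(Map<String, Double> w, String a, String b) {
        double s = 0.0;
        for (String f : features(a, b)) {
            s += w.getOrDefault(f, 0.0);
        }
        return s;
    }

    static double probability(Map<String, Double> w, String a, String b) {
        double z = 0.0;
        for (String ca : CLASS_NAMES) {
            for (String cb : FIELD_NAMES) {
                z += Math.exp(score(w, ca, cb));
            }
        }
        return Math.exp(score(w, a, b)) / z;
    }

    /** Each sample is {className, fieldName} as observed in unobfuscated code. */
    static Map<String, Double> train(String[][] samples, int epochs, double learningRate) {
        Map<String, Double> w = new HashMap<>();
        for (int e = 0; e < epochs; e++) {
            Map<String, Double> grad = new HashMap<>();
            for (String[] s : samples) {
                // Observed feature counts for this sample.
                for (String f : features(s[0], s[1])) {
                    grad.merge(f, 1.0, Double::sum);
                }
                // Minus the expected feature counts under the current model.
                for (String ca : CLASS_NAMES) {
                    for (String cb : FIELD_NAMES) {
                        double p = probability(w, ca, cb);
                        for (String f : features(ca, cb)) {
                            grad.merge(f, -p, Double::sum);
                        }
                    }
                }
            }
            for (Map.Entry<String, Double> g : grad.entrySet()) {
                w.merge(g.getKey(), learningRate * g.getValue(), Double::sum);
            }
        }
        return w;
    }
}
```

Trained on samples in which (DBHelper, db) occurs more often than any other pair, the learned weights give that pair the highest probability. Enumerating all candidates to compute the expectation only works for toy graphs like this one.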

slide-17
SLIDE 17

DeGuard: System Overview

Learning phase: Open-source, unobfuscated APKs → Static analysis → Training → Probabilistic model P

Prediction phase: Obfuscated code → Static analysis → MAP inference (using the probabilistic model P) → Transform → De-obfuscated code

slide-18
SLIDE 18

Prediction Phase

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

Static analysis produces the dependency graph:

  • a extends SQLiteHelper
  • b gets getWritableDB
  • b field-in a

Features with learned weights:

name1           name2      weight
SQLiteHelper    DBUtils    0.3
SQLiteHelper    DBHelper   0.2
getWritableDB   db         0.7
getWritableDB   instance   0.4
DBUtils         instance   0.5
DBHelper        db         0.4
DBUtils         db         0.2
DBHelper        instance   0.2

slide-19
SLIDE 19

Prediction Phase

MAP Inference

Given the dependency graph and the weighted features obtained by static analysis of the obfuscated code, MAP inference picks the most likely assignment of names:

$$\vec{o} = \operatorname*{argmax}_{\vec{o}' \in \Omega} P(O = \vec{o}' \mid K = k)$$

Candidate assignment o          P(o | k)*
a = DBUtils,  b = instance      1.2
a = DBHelper, b = db            1.3
a = DBUtils,  b = db            1.2
a = DBHelper, b = instance      0.8

*Non-normalized: each value is the sum of the weights of the features that match the assignment.
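For a graph this small, the argmax can be found by brute force. The sketch below enumerates every candidate pair and keeps the one with the highest non-normalized score, using the eight weights from the slide's tables. Exhaustive enumeration is an assumption that only suits toy inputs; the Learning slide notes that actual graphs have more than 1,000 nodes, and the slides do not describe the search strategy DeGuard uses. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: brute-force MAP inference over candidate names for a and b. */
final class MapInferenceSketch {
    // Feature weights copied from the slide; key encoding is illustrative.
    static final Map<String, Double> WEIGHTS = new HashMap<>();
    static {
        WEIGHTS.put("SQLiteHelper|DBUtils", 0.3);
        WEIGHTS.put("SQLiteHelper|DBHelper", 0.2);
        WEIGHTS.put("getWritableDB|db", 0.7);
        WEIGHTS.put("getWritableDB|instance", 0.4);
        WEIGHTS.put("DBUtils|instance", 0.5);
        WEIGHTS.put("DBHelper|db", 0.4);
        WEIGHTS.put("DBUtils|db", 0.2);
        WEIGHTS.put("DBHelper|instance", 0.2);
    }

    /** Non-normalized score: sum of the weights of the features on the three edges. */
    static double score(String a, String b) {
        return WEIGHTS.getOrDefault("SQLiteHelper|" + a, 0.0)
             + WEIGHTS.getOrDefault("getWritableDB|" + b, 0.0)
             + WEIGHTS.getOrDefault(a + "|" + b, 0.0);
    }

    /** Returns {bestA, bestB}, the candidate pair with the highest score. */
    static String[] mapAssignment(String[] classNames, String[] fieldNames) {
        String[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String a : classNames) {
            for (String b : fieldNames) {
                double s = score(a, b);
                if (s > bestScore) {
                    bestScore = s;
                    best = new String[] {a, b};
                }
            }
        }
        return best;
    }
}
```

The normalization constant is identical for all candidates, so maximizing the score also maximizes the probability. With the candidates DBUtils/DBHelper and instance/db, the sketch selects a = DBHelper, b = db.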

slide-20
SLIDE 20

Prediction Phase

MAP inference selects the candidate assignment with the highest score: a = DBHelper, b = db (1.3).

slide-21
SLIDE 21

Prediction Phase

Transform: the predicted names replace the unknown nodes in the dependency graph (a becomes DBHelper, b becomes db), and the obfuscated code is rewritten accordingly.

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

Deobfuscated code:

    class DBHelper extends SQLiteHelper {
      SQLiteDatabase db;
      public DBHelper(Context ctx) {
        db = getWritableDB();
      }
    }

Semantically the same?

slide-22
SLIDE 22

Semantics-Preserving De-obfuscation Constraints

Freely renaming fields/variables/methods may change the application’s semantics.

Example classes:

  • class A: int a; Object b; void a()
  • class B extends A: void b(); void c(A a)

Syntactic constraints, e.g. “Fields within a class must have distinct names”

Semantic constraints, e.g. “Method overloads must be preserved”

The diagram marks two inequality (≠) constraints between names in the example classes.
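One way to picture such constraints is as pairs of program elements that must not receive the same name. The sketch below checks a candidate renaming against a list of such pairs. The element identifiers and the class name are hypothetical. The first pair reflects the quoted syntactic rule for the two fields of class A; the second pair (the method in A versus the first method in B) is an assumption about what the second ≠ on the slide refers to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: validate a candidate renaming against inequality constraints. */
final class RenamingConstraintsSketch {
    /** True if every pair of elements that must differ receives two different names. */
    static boolean satisfies(Map<String, String> renaming, List<String[]> mustDiffer) {
        for (String[] pair : mustDiffer) {
            String left = renaming.get(pair[0]);
            String right = renaming.get(pair[1]);
            if (left != null && left.equals(right)) {
                return false;
            }
        }
        return true;
    }

    /** Inequality constraints for the slide's classes A and B (ids are illustrative). */
    static List<String[]> exampleConstraints() {
        return Arrays.asList(
            new String[] {"A.field1", "A.field2"},   // int a and Object b in class A
            new String[] {"A.method1", "B.method1"}  // void a() in A vs. void b() in B (assumed)
        );
    }
}
```

A renaming that maps both fields of A to the same name is rejected, while one that keeps them distinct passes. In a full system such a check would restrict the candidate assignments considered during inference.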

slide-23
SLIDE 23

DeGuard: System Overview

Learning phase: Open-source, unobfuscated APKs → Static analysis → Training → Probabilistic model P

Prediction phase: Obfuscated code → Static analysis → MAP inference (using the probabilistic model P) → Transform → De-obfuscated code

slide-24
SLIDE 24

DeGuard Implementation

slide-25
SLIDE 25

DeGuard Implementation

www.apk-deguard.com

Static Analysis

  • Static analysis framework for Java and Android

Learning and MAP Inference

  • Scalable open-source framework for structured prediction
  • Open-source: http://nice2predict.org
  • Training data: 2K open-source, unobfuscated Android applications

slide-26
SLIDE 26

Evaluation

  1. Can DeGuard reverse ProGuard?
  2. Can DeGuard detect third-party libraries?
  3. Is DeGuard useful for malware inspection?

slide-27
SLIDE 27

ProGuard Experiment

  • Source code → Non-obfuscated APK
  • Source code → Obfuscated APK → De-obfuscated APK

Question: De-obfuscated APK = Non-obfuscated APK?

slide-28
SLIDE 28

After Obfuscation

[Bar chart: % of program elements with known names after obfuscation, for fields, methods, classes, packages, and in total.]

Only 13% known names

slide-29
SLIDE 29

Can DeGuard Reverse ProGuard?

[Bar chart: % of program elements (fields, methods, classes, packages, total) with known names, correctly predicted names, and mis-predicted names.]

  • 80% of the names are identical to the original ones
  • 1.6% known names, 80.6% correct names
  • Only 13% known names
  • Package names can be directly used to predict third-party libraries
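The accuracy figure can be read as the share of program elements whose predicted name is identical to the original one. A minimal sketch of that computation follows, assuming both APKs are summarized as maps from a stable element identifier to a name. The identifiers and the class name are hypothetical, and the slides do not specify how elements are matched between the two APKs.

```java
import java.util.Map;

/** Sketch: fraction of elements whose predicted name equals the original name. */
final class NameAccuracySketch {
    static double accuracy(Map<String, String> original, Map<String, String> predicted) {
        if (original.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Map.Entry<String, String> e : original.entrySet()) {
            // An element missing from the prediction counts as mis-predicted.
            if (e.getValue().equals(predicted.get(e.getKey()))) {
                correct++;
            }
        }
        return (double) correct / original.size();
    }
}
```

Exact string equality is a strict criterion: a prediction such as `database` for an original `db` counts as wrong even though it is descriptive.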

slide-30
SLIDE 30

Can DeGuard Detect Third-Party Libraries?

Source code + Library code → ProGuard → Obfuscated APK → De-obfuscated APK

ProGuard obfuscates library package names. Are they recovered in the de-obfuscated APK?

  • Precision: 93.1%
  • Recall: 91%
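Precision and recall can be computed as below, assuming the result is evaluated as two sets of library package names: those predicted in the de-obfuscated APK and those actually bundled. This set-based reading, the class name, and the package names in the usage note are assumptions; the slide does not state the unit of evaluation.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: precision and recall over sets of library package names. */
final class LibraryDetectionMetricsSketch {
    private static int overlap(Set<String> predicted, Set<String> actual) {
        Set<String> hits = new HashSet<>(predicted);
        hits.retainAll(actual);
        return hits.size();
    }

    /** Share of predicted packages that are really present. */
    static double precision(Set<String> predicted, Set<String> actual) {
        return predicted.isEmpty() ? 0.0 : (double) overlap(predicted, actual) / predicted.size();
    }

    /** Share of the packages really present that were predicted. */
    static double recall(Set<String> predicted, Set<String> actual) {
        return actual.isEmpty() ? 0.0 : (double) overlap(predicted, actual) / actual.size();
    }
}
```

For example, if three packages are predicted and two of them are among four that are actually present, precision is 2/3 and recall is 1/2.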

slide-31
SLIDE 31

Is DeGuard Useful for Malware Inspection?

We de-obfuscated all samples from the Android Malware Genome Project.

Malware sample:

    class d {
      String a = System.getProperty(..)
      char[] b;
      byte[] c;
      byte[] a(String) {..}
    }

De-obfuscated malware sample (Base64 decoder):

    class Base64 {
      String NL = System.getProperty(..)
      char[] ENC;
      byte[] DEC;
      byte[] decode(String) {..}
    }

  • Reveals string decoders
  • Reveals classes that handle sensitive data (e.g. Location)
  • Hard to handle heavily-obfuscated code (e.g. reflection)

slide-32
SLIDE 32

Summary

[Recap panels: the layout obfuscation code example, the dependency graph with feature weights, and the accuracy chart.]

  • Probabilistic Models
  • High Prediction Accuracy

Try online: www.apk-deguard.com

For more info: http://plml.ethz.ch / http://srl.inf.ethz.ch