SLIDE 1

Technische Universität München, Fakultät für Informatik

Dagstuhl Seminar 17281

Characterizing the Strength of Software Obfuscation Against Automated Attacks

Sebastian Banescu, BMW Group

SLIDE 2

Outline

1. Introduction
2. Obfuscation in Theory
3. Obfuscation in Practice
4. Characterizing the Strength of Obfuscation

July 9 – 14, 2017 Sebastian Banescu, BMW Group 2

SLIDE 7

Introduction

Question: What is the goal of “software obfuscation”?

1. Protect data in transit against man-in-the-middle attacks
2. Protect software against remote attackers who exploit vulnerabilities (e.g. buffer overflows)
3. Protect software against malicious users who want to reverse engineer it

Answer: The 3rd option

SLIDE 10

Introduction

Informal Definition of Obfuscation: To obfuscate a program P means to transform it into an equivalent program P′ from which it is harder to extract information than from P.

Benefits of obfuscation:
- Obfuscation aims to raise the bar for reverse engineering
- Obfuscation is the last layer of defense against attackers (e.g. if an attacker bypasses the firewall and OS authentication)

Possible remarks: This sounds fishy! Are there any theoretical foundations or guarantees of obfuscation?

SLIDE 11

Outline

1. Introduction
2. Obfuscation in Theory
3. Obfuscation in Practice
4. Characterizing the Strength of Obfuscation

SLIDE 16

Black Box Obfuscation

Formal Definition of Black-Box Obfuscation: A probabilistic algorithm O is an obfuscator if the following conditions hold:
- For every program P, the obfuscated program O(P) has the same functionality (e.g. input-output behavior)
- The memory size increase and execution slowdown of O(P) w.r.t. P are at most polynomial
- Any probabilistic polynomial-time attacker has only a negligible probability of guessing any bit of information about P, given O(P)

Barak et al. (2001) proved that there exists no general obfuscator applicable to all programs.

There may exist obfuscators for some programs which leak bits of information, but are "good enough" for some practical scenarios.

SLIDE 20

Indistinguishability Obfuscation

Garg et al. (2013) proposed indistinguishability obfuscation:
- The obfuscated versions of two semantically equivalent programs cannot be distinguished
- Goldwasser and Rothblum (2007) proved that this notion is the best-possible obfuscation
- Implementations are still far from practical: Banescu et al. (2015); Barak (2016)

The rest of this talk focuses on practical code obfuscation.

SLIDE 21

Outline

1. Introduction
2. Obfuscation in Theory
3. Obfuscation in Practice
4. Characterizing the Strength of Obfuscation

SLIDE 23

Obfuscation in Practice

What types of obfuscation transformations exist?
- Code vs. data
- Static vs. dynamic
- Source code vs. IR vs. machine code
- etc.

How many obfuscation transformations are there?
- Scramble identifiers
- Instruction substitution
- Garbage code insertion
- Merging and splitting functions
- Opaque predicates
- Control-flow flattening
- Virtualization obfuscation
- White-box cryptography
- etc.

SLIDE 27

Software Diversity via Obfuscation Transformations

1. A software developer distributes software X to all end-users
2. Some end-users are MATE (man-at-the-end) attackers
3. A MATE attacker reverse engineers X and builds a hijacker of X
4. The MATE attacker distributes the hijacker to other end-users of X

SLIDE 32

Software Diversity via Obfuscation Transformations

What can we do to protect victims?
- Give everyone a different version
- Generate thousands of different versions using obfuscation
- Assumption: the same attack will not work on different binaries

Issues:
- Analyzing crash dumps
- Incremental updates
- Digitally signing all versions

SLIDE 36

Research Questions

1. How can we characterize the strength of code obfuscation transformations against automated MATE attacks?
2. How can we determine which code features have the highest impact on different automated MATE attacks?
3. Which obfuscation transformations hinder which automated attacks, and by how much?
4. Can we estimate the effort of the attack?

SLIDE 37

Outline

1. Introduction
2. Obfuscation in Theory
3. Obfuscation in Practice
4. Characterizing the Strength of Obfuscation

SLIDE 38

Step 1: Model MATE Attacks as Attack-Nets

Pick the best attack path, or run all paths in parallel (the first that finishes is best).

Figure: Attack-net representing a MATE attack for bypassing a license check.

Inspired by the work of Wang et al. (2013)

SLIDE 40

Step 2: Model Transitions as Search Problems

Example for symbolic execution:
- Data structure: code
- State: code + instruction pointer + memory state
- Action: interpret one of the next instructions
- Goal: find the shortest path to pass the license check
- Strategy: non-uniform random search
- Heuristic: minimum distance to the target instruction

SLIDE 41

#include <stdio.h>

int main(int ac, char* av[]) {
    int hash = 5381;
    unsigned char *str = (unsigned char *) av[1];
    int c;
    while ((c = *str++))
        hash += (hash << 5) + c;
    if (hash == 0x49a54935)
        printf("You Win");
    return 0;
}

SLIDE 43

SAT Solving as a Search Problem

- Data structure: propositional formula in CNF, e.g. (a+c+d) · (a+c+!d) · (a+!c+d) · (a+!c+!d) · (!b+!c+d) = 1
- State: partial assignment
- Action: assign a subset of literals
- Goal: find a satisfying assignment
- Strategy: DPLL
- Heuristic: assign the literals with the highest frequency

SLIDE 46

Step 3: Identify Features and Generate Dataset

Think of the data structures, states and actions of the search problems. For symbolic execution, some of the interesting features are:

1. Size of the symbolic variable (input value)
2. Maximum depth of nested control-flow structures
3. Total number of branch statements (ifs and loops)
4. Number of branch statements dependent on symbolic variables
5. The types of operators used in the computation

Create a data set of programs that exhibits many (preferably all) combinations of these features.

SLIDE 49

Step 4: Obfuscate Programs

Each original program was obfuscated with:
- Tigress by Collberg et al. (2011)
- OLLVM by Junod et al. (2015)

Programs were also obfuscated using mixes of 2 transformations.

Transformation names:
- Virtualization (Virt)
- Flattening (Flat)
- Opaque Predicates (AddO, UpdO)
- Encode Arithmetic (EncA)
- Encode Literals (EncL)
- Bogus Control Flow (BCF)
- Instruction Substitution (ISub)
- Control-Flow Flattening (CFF)

SLIDE 50

Step 5: Attack Obfuscated Programs

We symbolically executed each obfuscated program using KLEE (Cadar et al. (2008)).
- The attacker's goal is to find an input that bypasses the license check
- This is similar to finding a trigger input for malware
- More details in Banescu et al. (2016)

SLIDE 52

Step 5: Attack Obfuscated Programs

Figure: Per-transformation results (Orig, EncL, AddO4, AddO16, UpdO, Flat, Virt, EncA, ISub, BCF, CFF, and pairwise mixes): mean program size increase (factor), mean KLEE slowdown (factor), % time waiting for the solver, mean number of added queries (factor), and mean query size increase (factor).

- Opaque predicates and virtualization have the highest increase in program size
- Opaque predicates and encode literals have the smallest impact on symbolic execution time
- Flattening and virtualization (also combined with other transformations) increase execution time
- Flattening increases the number of queries sent to the SMT solver
- Encode arithmetic increases the size of the queries sent to the SMT solver

SLIDE 56

Step 6: Feature Extraction

Extracted 64 features in total:
- Static code metrics
- Dynamic code metrics
- SAT metrics

What are SAT metrics? Graph metrics computed on a SAT formula represented as a graph, e.g. (x∨y∨z)∧(¬x∨¬y∨z)∧(x∨¬y∨¬z)

What does the SAT graph of an obfuscated program look like?

SLIDE 57

SAT Graph Before and After Obfuscation

Figure: Before obfuscation (7.5 sec)

#include <stdio.h>
#include <string.h>

unsigned int SDBMHash(char* str, unsigned int len)
{
    unsigned int hash = 0;
    unsigned int i = 0;
    for (i = 0; i < len; str++, i++)
        hash = (*str) + (hash << 6) + (hash << 16) - hash;
    return hash;
}

int main(int argc, char* argv[]) {
    char *str = argv[1];
    unsigned int hash = SDBMHash(str, strlen(str));

    if (hash == 0x89dcd66e)
        printf("The license key is correct!\n");
    else
        printf("The license key is incorrect!\n");
    return 0;
}

SLIDE 58

SAT Graph Before and After Obfuscation

Figure: Before obfuscation (7.5 sec). Figure: After obfuscation (438 sec).

The tool used to obtain these figures is SATGraf by Newsham et al. (2015)

SLIDE 60

Step 7: Predict Average Effort Needed by Attack

Some features are useless for prediction. Performing recursive feature selection yielded 15 features.

Figure: Variable importance (weight) of the selected features, including sdinter, l_coms, meaninter, sdedgeratio, meancom, meanintra, sdcom, sdintra, l_q, edgeratio, Risk, L1.Loops, max_clause, num_max_inter.

SLIDE 61

Step 7: Predict Average Effort Needed by Attack

Different ML algorithms were employed for predicting attacker effort:
- Neural Networks
- Support Vector Machines
- Random Forest
- Genetic Programming

More details in Banescu et al. (2017)

SLIDE 63

Step 7: Predict Average Effort Needed by Attack

Figure: Relative error vs. percentage of programs; maximum and median error for Neural Networks, Support Vector Machines, Random Forest, and Genetic Programming.

- The most important features for prediction are the SAT features
- These features stem from the complexity of the path constraints
- Path constraints do not have to be long to be complex

SLIDE 65

Conclusions and Future Work

Conclusions:
- Obfuscation strength is proportional to the effort of the best known automated MATE attack
- All automated MATE attacks can be formalized as one or more search problems
- Search algorithms have a cost that depends on program features
- Auto-MATEd attacker effort can be predicted using these features
- Several control-flow obfuscation transformations are weak against symbolic execution attacks

Future Work:
- Instantiate for more auto-MATEd attacks, e.g. CFG simplification, disassembly, etc.
- Expand the dataset of programs for experiments
- Use machine learning to directly deobfuscate programs

SLIDE 66

Thank you for your attention

Questions?

SLIDE 67

References I

S. Banescu, M. Ochoa, A. Pretschner, and N. Kunze. Benchmarking indistinguishability obfuscation - a candidate implementation. In Proc. of the 7th International Symposium on ESSoS, number 8978 in LNCS, 2015.

Sebastian Banescu, Christian Collberg, Vijay Ganesh, Zack Newsham, and Alexander Pretschner. Code obfuscation against symbolic execution attacks. In Proc. of the 2016 Annual Computer Security Applications Conference. ACM, 2016.

Sebastian Banescu, Christian Collberg, and Alexander Pretschner. Predicting the resilience of obfuscated code against symbolic execution attacks via machine learning. In Proceedings of the 26th USENIX Security Symposium, 2017.

Boaz Barak. Hopes, fears, and software obfuscation. Communications of the ACM, 59(3):88–96, 2016.

SLIDE 68

References II

Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. In Advances in Cryptology - CRYPTO 2001, pages 1–18. Springer, 2001.

Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, 2008.

C. Collberg, J. Davidson, R. Giacobazzi, Y. X. Gu, A. Herzberg, and F. Wang. Toward digital asset protection. Intelligent Systems, IEEE, 26(6):8–13, 2011.

S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters. Candidate indistinguishability obfuscation and functional encryption for all circuits. In Proc. of the 54th Annual Symp. on Foundations of Computer Science, pages 40–49, 2013. doi: 10.1109/FOCS.2013.13.

S. Goldwasser and G. N. Rothblum. On best-possible obfuscation. In Theory of Cryptography, pages 194–213. Springer, 2007.

SLIDE 69

References III

Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. Obfuscator-LLVM - software protection for the masses. In Brecht Wyseur, editor, Proceedings of the IEEE/ACM 1st International Workshop on Software Protection (SPRO'15), Firenze, Italy, pages 3–9. IEEE, 2015. doi: 10.1109/SPRO.2015.10.

Zack Newsham, William Lindsay, Vijay Ganesh, Jia Hui Liang, Sebastian Fischmeister, and Krzysztof Czarnecki. SATGraf: Visualizing the evolution of SAT formula structure in solvers. In Theory and Applications of Satisfiability Testing - SAT 2015, pages 62–70. Springer, 2015.

Huaijun Wang, Dingyi Fang, Ni Wang, Zhanyong Tang, Feng Chen, and Yuanxiang Gu. Method to evaluate software protection based on attack modeling. In 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC/EUC), pages 837–844. IEEE, 2013.
