QSynth - A Program Synthesis approach for Binary Code Deobfuscation


Binary Analysis Workshop - NDSS
Robin David <rdavid@quarkslab.com>, Luigi Coniglio <luigi.coniglio@studenti.unitn.it>, Mariano Ceccato <mariano.ceccato@univr.it>


slide-1
SLIDE 1

www.quarkslab.com

February 23rd, 2020 - San Diego, California

slide-3
SLIDE 3

Talk Outline

Context:

◮ Need to address highly obfuscated binaries
◮ Few approaches address data obfuscation

Goal: deobfuscating expressions (obfuscated with data transformations)

Takeaway: We provide a synthesis approach that addresses various obfuscations and supersedes the state of the art in both speed and accuracy.

2 / 26

slide-4
SLIDE 4

Table of Contents

◮ Background: Software obfuscation, Deobfuscation techniques
◮ Our Synthesis Approach: Goal & Contributions, Approach steps
◮ Experimental Benchmarks: Experimental Setup, Benchmarks
◮ Conclusion

3 / 26

slide-7
SLIDE 7

Obfuscation types

Control-Flow Obfuscation

Hiding the logic and algorithm of the program

Virtualization, Opaque predicates, CFG-flattening, Split, Merge, Packing, Implicit Flow, MBA, Loop-Unrolling...

Data-Flow Obfuscation

Hiding data, constants, strings, APIs, keys etc.

Data encoding, MBA, Arithmetic Encoding, Whitebox, Array Split, Fold and Merge, Variable Splitting...

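Example: a + b and an equivalent obfuscated expression:

$$\Big(\big(\big(((a \land \lnot b) + b) \ll 1\big) \land \lnot\big((a \lor b) - (a \land b)\big)\big) \ll 1\Big) - \Big(\big(((a \land \lnot b) + b) \ll 1\big) \oplus \big((a \lor b) - (a \land b)\big)\Big)$$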

Problem: Reverting an obfuscating transformation is hard.
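Checking that the obfuscated expression above computes a + b is easy, even though recovering a + b from it is not. The following Python sketch (an illustration, not part of QSynth) evaluates the obfuscated form on random inputs, assuming 64-bit wrap-around arithmetic; the function name `obfuscated_add` is hypothetical.

```python
import random

MASK = (1 << 64) - 1  # assumption: 64-bit registers, arithmetic modulo 2**64


def obfuscated_add(a: int, b: int) -> int:
    """The obfuscated form of a + b from the example above."""
    x = ((a & ~b) + b) & MASK          # (a ∧ ¬b) + b
    y = ((a | b) - (a & b)) & MASK     # (a ∨ b) − (a ∧ b)
    left = (((x << 1) & MASK & ~y) << 1) & MASK
    right = ((x << 1) & MASK) ^ y
    return (left - right) & MASK


rng = random.Random(0)
for _ in range(10_000):
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    assert obfuscated_add(a, b) == (a + b) & MASK
print("obfuscated expression agrees with a + b on all samples")
```

Random testing only gives evidence of equivalence. Here the identity also holds algebraically: (a ∧ ¬b) + b = a ∨ b and (a ∨ b) − (a ∧ b) = a ⊕ b, and the whole expression reduces to 2(a ∨ b) − (a ⊕ b) = a + b.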

4 / 26

slide-8
SLIDE 8

Deobfuscation

Let’s focus on two deobfuscation techniques:
◮ Dynamic Symbolic Execution
◮ Program Synthesis

5 / 26

slide-10
SLIDE 10

Symbolic Execution

Definition

A means of executing a program using symbolic values (logical symbols) rather than concrete values (bitvectors) in order to obtain the input-output relationship of a path

Dynamic Symbolic Execution (a.k.a. concolic)

◮ Properties: works on dynamic paths and uses runtime values
◮ Advantages: the path is guaranteed to be feasible, and various obfuscations are thwarted

6 / 26

slide-15
SLIDE 15

Symbolic Execution: Example

⇒ In this context, symbolic execution is used to extract symbolic expressions (e.g. the expression of b)

Symbolic State

$\varphi_b = b$

$\varphi_b = b + (a \mid -1) - 1$

$\varphi_b = b + (a \mid -1) - 1 - (({\sim} a) \mathbin{\&} -1)$

$\varphi_b = b + (a \mid -1) - 1 - (({\sim} a) \mathbin{\&} -1) - 1 + (((b + (a \mid -1) - 1 - (({\sim} a) \mathbin{\&} -1)) \times (b + \dots$

Question: How can the $\varphi_b$ expression be simplified?

(Knowing that the quality of the result depends on the syntactic complexity of the obfuscated expression)
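To illustrate why φb grows syntactically while its concrete value stays simple, here is a toy concolic evaluator in Python. It is a hypothetical sketch (the `Concolic` class and the `run` function are ours, not Triton's API): each value carries a symbolic expression string plus the concrete 64-bit value observed at runtime, and the two assignments replay the first steps of the symbolic state of b above.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit registers


class Concolic:
    """A value tracked both symbolically (expression text) and concretely."""

    def __init__(self, sym: str, conc: int):
        self.sym = sym
        self.conc = conc & MASK

    def _bin(self, other: "Concolic", op: str, fn) -> "Concolic":
        return Concolic(f"({self.sym} {op} {other.sym})", fn(self.conc, other.conc))

    def __add__(self, o): return self._bin(o, "+", lambda x, y: x + y)
    def __sub__(self, o): return self._bin(o, "-", lambda x, y: x - y)
    def __or__(self, o): return self._bin(o, "|", lambda x, y: x | y)
    def __and__(self, o): return self._bin(o, "&", lambda x, y: x & y)
    def __invert__(self): return Concolic(f"~{self.sym}", ~self.conc)


def const(c: int) -> Concolic:
    return Concolic(str(c), c)


def run(a_val: int, b_val: int) -> Concolic:
    a, b = Concolic("a", a_val), Concolic("b", b_val)
    b = b + (a | const(-1)) - const(1)   # φb = b + (a | −1) − 1
    b = b - (~a & const(-1))             # φb = ... − ((∼a) & −1)
    return b


phi_b = run(3, 5)
print(phi_b.sym)   # (((b + (a | -1)) - 1) - (~a & -1))
print(phi_b.conc)  # 7
```

The printed expression is already much longer than its meaning (it always equals a + b − 1), and every further instruction on the path widens that gap.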

7 / 26

slide-22
SLIDE 22

Program Synthesis

Definition

Program synthesis consists of automatically deriving a program from:

◮ a high-level specification (typically its I/O behaviour)
◮ additional constraints:
  ◮ Compilation: a faster program
  ◮ Deobfuscation: a smaller or more readable program

Example

Obfuscated program (queried as a black box):

Input    Output
1, 2     3
2, 2     4
2, 3     5

Synthesized program: a + b

Problem

Synthesizing programs (expressions) with complex behaviors is hard.
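A naive, online enumerative synthesizer makes the example concrete. This Python sketch is our own illustration (the operator set and the `enumerate_exprs`/`synthesize` helpers are assumptions): it enumerates expressions over a and b by increasing number of operators and returns the first one consistent with every I/O pair. The nested loops also show why complex behaviours are hard to synthesize: each extra operator multiplies the search space.

```python
from itertools import product

OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "&": lambda x, y: x & y,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
}


def enumerate_exprs(max_ops):
    """Yield (text, function) pairs, expressions with fewer operators first."""
    levels = [[("a", lambda a, b: a), ("b", lambda a, b: b)]]
    yield from levels[0]
    for k in range(1, max_ops + 1):
        level = []
        for i in range(k):  # left subtree has i operators, right has k - 1 - i
            for (lt, lf), (op, of), (rt, rf) in product(levels[i], OPS.items(), levels[k - 1 - i]):
                level.append((f"({lt} {op} {rt})",
                              lambda a, b, lf=lf, of=of, rf=rf: of(lf(a, b), rf(a, b))))
        levels.append(level)
        yield from level


def synthesize(examples, max_ops=2):
    """Return the first enumerated expression matching all (input, output) pairs."""
    for text, fn in enumerate_exprs(max_ops):
        if all(fn(*inp) == out for inp, out in examples):
            return text
    return None


examples = [((1, 2), 3), ((2, 2), 4), ((2, 3), 5)]
print(synthesize(examples))  # (a + b)
```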

8 / 26

slide-23
SLIDE 23

Table of Contents

◮ Background: Software obfuscation, Deobfuscation techniques
◮ Our Synthesis Approach: Goal & Contributions, Approach steps
◮ Experimental Benchmarks: Experimental Setup, Benchmarks
◮ Conclusion

9 / 26

slide-26
SLIDE 26

Key Intuition

Symbolic Execution
+ Captures the full semantics
− Influenced by syntactic complexity

Program Synthesis
+ Only influenced by semantic complexity
− Black-box ⇒ large search space

Idea: Use symbolic execution to reduce the synthesis search space

10 / 26

slide-27
SLIDE 27

Contributions

A synthesis approach using an Offline Enumerative Search based on pre-computed lookup tables, combined with an Abstract Syntax Tree (AST) simplification algorithm, which outperforms similar state-of-the-art approaches (e.g. Syntia)

11 / 26

slide-29
SLIDE 29

QSynth: Overview

[Figure: QSynth overview] Obfuscated program → Execution tracing (DBI) → Execution trace → Dynamic Symbolic Execution → Obfuscated expressions → Simplification Strategy (for each sub-expression) → Synthesized expressions. The Simplification Strategy queries the Enumerative Synthesis Oracle (generated once and for all) with inputs and outputs, and gets back an equivalent expression.

Tool: QBDI

12 / 26

slide-30
SLIDE 30

Execution Tracing

Dynamic Binary Instrumentation

Using QBDI: QuarkslaB Dynamic Binary Instrumentation (similar to Pin, DynamoRIO)

+ multi-architecture & multi-platform
− no (direct) thread support

Qtracer (a QBDI tool, like Pin “pintools”)

◮ gathers executed instructions with their concrete state (registers and memory)
◮ data are consolidated in a database (SQLite, PostgreSQL, etc.)

Instrumentation (original vs. instrumented code):

Original:
    mov qword [0x000232c0], 8
    mov r13, rax
    test rax, rax
    je 0x42a7
    xor r8d, r8d
    xor edx, edx
    xor esi, esi

Instrumented:
    mov qword [0x000232c0], 8
    ; Some code ...
    mov r13, rax
    ; Some code ...
    test rax, rax
    ; Some code ...
    je <patched address>
    ; Some code ...
    xor r8d, r8d
    ; Some code ...
    xor edx, edx
    ; Some code ...
    xor esi, esi
    ; Some code ...

https://qbdi.quarkslab.com/
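As a rough illustration of consolidating a trace in a database, here is a Python sketch using the standard `sqlite3` module. The table layout (a `trace` table with `seq`, `disasm`, `regs` and `mem` columns) and the register values are hypothetical; Qtracer's actual schema is not described in these slides.

```python
import json
import sqlite3


def open_trace_db(path: str = ":memory:") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS trace ("
        " seq INTEGER PRIMARY KEY,"  # position of the instruction in the execution
        " disasm TEXT NOT NULL,"     # executed instruction
        " regs TEXT NOT NULL,"       # concrete register state (JSON)
        " mem TEXT NOT NULL)"        # concrete memory accesses (JSON)
    )
    return con


def record(con, seq, disasm, regs, mem=None):
    """Store one executed instruction with its concrete state."""
    con.execute(
        "INSERT INTO trace VALUES (?, ?, ?, ?)",
        (seq, disasm, json.dumps(regs), json.dumps(mem or [])),
    )


con = open_trace_db()
# Hypothetical concrete states for two instructions of the listing above.
record(con, 0, "mov r13, rax", {"rax": 16, "r13": 0})
record(con, 1, "test rax, rax", {"rax": 16, "r13": 16})
con.commit()

for seq, disasm, regs in con.execute("SELECT seq, disasm, regs FROM trace ORDER BY seq"):
    print(seq, disasm, json.loads(regs))
```

One row per executed instruction keeps the trace replayable in execution order, which is what a later symbolic execution pass needs.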

13 / 26

slide-36
SLIDE 36

DSE: Symbolic expression computation

⇒ Triton can compute any symbolic expression along the trace by backtracking on data dependencies

$\varphi = (b + (a - 1)) - 1$

$O_\varphi$, the associated I/O oracle, can be evaluated on different inputs
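A minimal Python sketch of turning an extracted expression into an I/O oracle Oϕ, assuming 64-bit arithmetic. As the expression it reuses the third symbolic state of b from the symbolic execution example, b + (a | −1) − 1 − ((∼a) & −1); `make_oracle` and `phi_b` are hypothetical names. Python integers behave like infinite two's-complement values, so reducing modulo 2^64 at the end yields the 64-bit result.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit expressions


def make_oracle(expr):
    """Wrap an expression as a black box: inputs in, 64-bit output out."""
    def oracle(*inputs):
        return expr(*inputs) & MASK
    return oracle


def phi_b(a, b):
    return b + (a | -1) - 1 - ((~a) & -1)


O_phi = make_oracle(phi_b)
I = [(1, 1), (1, 0), (2, 1)]
print([O_phi(a, b) for a, b in I])  # [1, 0, 2]
```

These outputs coincide with a + b − 1, which this φb equals on every input; synthesizing it requires the constant 1.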

15 / 26

slide-39
SLIDE 39

Synthesis Primitive

Definition

We call Synthesis Primitive any program SP that takes as input a black-box oracle Oϕ and a set I of inputs to the oracle, and returns, in case of success, a program p such that p(i) = Oϕ(i) for every i ∈ I.

$SP(O_\varphi, I) \Rightarrow p \mid \forall i \in I,\ p(i) \equiv O_\varphi(i)$ (success)

$SP(O_\varphi, I) \Rightarrow \emptyset$ (failure)
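The definition can be read as a function signature. This Python sketch uses hypothetical names, with a naive candidate list standing in for a real search, and shows what "success" guarantees: p only has to agree with Oϕ on the inputs in I.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Fn = Callable[[int, int], int]          # a program or oracle over two inputs
Inputs = Sequence[Tuple[int, int]]      # the input set I


def agrees_on(p: Fn, oracle: Fn, inputs: Inputs) -> bool:
    """Success condition of SP: p(i) == O_phi(i) for every i in I."""
    return all(p(a, b) == oracle(a, b) for a, b in inputs)


def sp_from_candidates(oracle: Fn, inputs: Inputs,
                       candidates: List[Tuple[str, Fn]]) -> Optional[str]:
    """Naive SP: first candidate agreeing with the oracle on I, else None (the ∅ case)."""
    for name, p in candidates:
        if agrees_on(p, oracle, inputs):
            return name
    return None


oracle = lambda a, b: a + b  # stands in for a black-box O_phi
candidates = [("a | b", lambda a, b: a | b), ("a + b", lambda a, b: a + b)]

print(sp_from_candidates(oracle, [(1, 0), (2, 1)], candidates))          # a | b
print(sp_from_candidates(oracle, [(1, 1), (1, 0), (2, 1)], candidates))  # a + b
```

With the smaller input set, SP "succeeds" with a | b, which is wrong on (1, 1): the result is only as reliable as I is discriminative.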

17 / 26

slide-48
SLIDE 48

Offline Enumerative Search (synthesis primitive SP)

Generate a set of programs from a given grammar (operators & variables):

a + b, a − b, a + a, b + b, a + a − b, . . .

and a set of (pseudo-random) inputs:

input vector I = {(1, 1), (1, 0), (2, 1)}

Evaluate all programs on I and create the synthesis oracle SP : outputs → p

Example:

Outputs     p
2, 1, 3     a + b
0, 1, 1     a − b
2, 2, 4     a + a
. . .       . . .

18 / 26

Bad

◮ The number of derived expressions grows exponentially (but 10-node AST expressions are still easily reached)
◮ This primitive is unsound (it is only sound with respect to I)

Good

◮ Generated only once, and usable on different obfuscations and across programs
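A minimal Python sketch of the offline search, under assumptions of our own: two variables, the operators +, −, ∧, ∨, ⊕ (written `&`, `|`, `^`), at most two operators per expression, and the input vector I from the slide. Expressions are enumerated by increasing size, so each output vector keeps the smallest expression producing it; `build_table` and `sp` are hypothetical names.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit semantics
I = [(1, 1), (1, 0), (2, 1)]
OPS = {
    "+": lambda x, y: (x + y) & MASK,
    "-": lambda x, y: (x - y) & MASK,
    "&": lambda x, y: x & y,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
}


def evaluate(e, a, b):
    """e is 'a', 'b' or a tuple (op, left, right)."""
    if e == "a":
        return a
    if e == "b":
        return b
    op, left, right = e
    return OPS[op](evaluate(left, a, b), evaluate(right, a, b))


def to_str(e):
    return e if isinstance(e, str) else f"({to_str(e[1])} {e[0]} {to_str(e[2])})"


def build_table(max_ops=2):
    """Offline step: map each output vector on I to the smallest expression."""
    levels = [["a", "b"]]
    for k in range(1, max_ops + 1):
        levels.append([(op, l, r) for i in range(k)
                       for l in levels[i] for op in OPS for r in levels[k - 1 - i]])
    table = {}
    for level in levels:
        for e in level:
            key = tuple(evaluate(e, a, b) for a, b in I)
            table.setdefault(key, to_str(e))
    return table


TABLE = build_table()


def sp(oracle):
    """Online step: one oracle query per input, then a dictionary lookup."""
    return TABLE.get(tuple(oracle(a, b) & MASK for a, b in I))


print(TABLE[(2, 1, 3)], TABLE[(0, 1, 1)], TABLE[(2, 2, 4)])  # (a + b) (a - b) (a + a)
print(sp(lambda a, b: (a | b) + (a & b)))                   # (a + b)
```

The expensive enumeration happens once; each later query costs |I| oracle evaluations plus a hash lookup, which is what makes the table reusable across obfuscations and programs.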

slide-51
SLIDE 51

AST simplification - Example

ϕ = (((A ∨ B) + (A ∧ B)) ∨ A) − (((A ∨ B) + (A ∧ B)) ∧ A)

I = {(1, 1), (1, 0), (2, 1)}

◮ Whole expression ϕ: Oϕ outputs = {3, 0, 1}; SP[outputs]: not found
◮ Sub-expression ((A ∨ B) + (A ∧ B)) ∨ A: outputs = {3, 1, 3}; SP[outputs]: not found
◮ Sub-expression ((A ∨ B) + (A ∧ B)) ∧ A: outputs = {0, 1, 2}; SP[outputs]: not found
◮ Sub-expression (A ∨ B) + (A ∧ B): outputs = {2, 1, 3}; SP[outputs]: found ⇒ A + B
◮ With A + B replaced by a fresh variable V1, ϕ becomes (V1 ∨ A) − (V1 ∧ A): outputs = {0, 1, 3}; SP[outputs]: found ⇒ V1 ⊕ A

Result

Obfuscated: (((A ∨ B) + (A ∧ B)) ∨ A) − (((A ∨ B) + (A ∧ B)) ∧ A)

Deobfuscated: (A + B) ⊕ A

20 / 26
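A compact Python sketch of this strategy, written under our own assumptions rather than taken from QSynth: a toy lookup table over two generic variables x, y with at most one operator; a top-down traversal that queries the table for each sub-tree; and, when a node is not found, a retry in which an already-synthesized sub-term is replaced by a fresh variable V1 (as in the V1 ⊕ A step). It simplifies ϕ = (((A ∨ B) + (A ∧ B)) ∨ A) − (((A ∨ B) + (A ∧ B)) ∧ A), whose outputs on I are {3, 0, 1}.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit semantics
I = [(1, 1), (1, 0), (2, 1)]
OPS = {
    "+": lambda x, y: (x + y) & MASK,
    "-": lambda x, y: (x - y) & MASK,
    "&": lambda x, y: x & y,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
}
# An expression is a variable name (str) or a tuple (op, left, right).


def evaluate(e, env):
    if isinstance(e, str):
        return env[e]
    op, l, r = e
    return OPS[op](evaluate(l, env), evaluate(r, env))


def free_vars(e, acc=None):
    """Variables in order of first appearance."""
    acc = [] if acc is None else acc
    if isinstance(e, str):
        if e not in acc:
            acc.append(e)
    else:
        free_vars(e[1], acc)
        free_vars(e[2], acc)
    return acc


def subst(e, old, new):
    if e == old:
        return new
    if isinstance(e, str):
        return e
    return (e[0], subst(e[1], old, new), subst(e[2], old, new))


def rename(e, m):
    return m[e] if isinstance(e, str) else (e[0], rename(e[1], m), rename(e[2], m))


def contains(e, sub):
    return e == sub or (not isinstance(e, str) and (contains(e[1], sub) or contains(e[2], sub)))


# Toy offline table over generic variables x, y: expressions with at most one operator.
TABLE = {}
for e in ["x", "y"] + [(op, l, r) for l in "xy" for op in OPS for r in "xy"]:
    TABLE.setdefault(tuple(evaluate(e, {"x": a, "y": b}) for a, b in I), e)


def lookup(e):
    """SP query for a two-variable sub-tree; result is renamed to its variables."""
    vs = free_vars(e)
    if len(vs) != 2:
        return None
    found = TABLE.get(tuple(evaluate(e, dict(zip(vs, ab))) for ab in I))
    return None if found is None else rename(found, {"x": vs[0], "y": vs[1]})


def simplify(e, hits):
    """Try the whole sub-tree, else simplify the children and retry with
    each already-synthesized sub-term abstracted as a fresh variable V1."""
    if isinstance(e, str):
        return e
    found = lookup(e)
    if found is not None:
        hits.append(found)
        return found
    new = (e[0], simplify(e[1], hits), simplify(e[2], hits))
    for h in hits:
        if not isinstance(h, str) and h != new and contains(new, h):
            found = lookup(subst(new, h, "V1"))
            if found is not None:
                return subst(found, "V1", h)
    return new


def to_str(e):
    return e if isinstance(e, str) else f"({to_str(e[1])} {e[0]} {to_str(e[2])})"


X = ("+", ("|", "A", "B"), ("&", "A", "B"))  # (A ∨ B) + (A ∧ B)
phi = ("-", ("|", X, "A"), ("&", X, "A"))
print(to_str(simplify(phi, [])))  # ((A + B) ^ A)
```

Abstracting a synthesized sub-term as a variable keeps each query within the table's small variable budget, which is what lets a table of short expressions simplify a deeper AST.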

slide-65
SLIDE 65

Table of Contents

◮ Background: Software obfuscation, Deobfuscation techniques
◮ Our Synthesis Approach: Goal & Contributions, Approach steps
◮ Experimental Benchmarks: Experimental Setup, Benchmarks
◮ Conclusion

21 / 26

slide-68
SLIDE 68

Dataset

⇒ Datasets are built with Tigress 2.2 using the EncodeArithmetic (EA), EncodeData (ED) and Virtualization (VR) transformations.

⇒ Each dataset contains 500 obfuscated functions (except EA-ED, which contains 239)

Mean size of ϕ (in nodes)

Dataset          Original    Obfuscated
#1: Syntia †     3.97        203.19
#2: EA           13.5        131.56
#3: VR-EA        13.5        443.64
#4: EA-ED        13.5        9223.46

† uses EA-ED (with at most 5 derivations; the others use at most 21)

Lookup table (SP): 3,358,709 expressions (14 sets of 3 variables & 5 operators each)
Input vector size I (for SP): 15

22 / 26

slide-70
SLIDE 70

Syntia benchmark

Simplification

          Mean expr. size          Simplification    Mean scale factor
          Orig   ObfB     Synt     Partial   Full    ObfS/Orig   Synt/Orig
Syntia    /      /        /        52        448     /           /
QSynth    3.97   203.19   3.71     500       500     x35.03      x0.94

Orig, ObfS, ObfB and Synt are respectively the original, obfuscated (source level, binary level) and synthesized expressions

Accuracy & Speed

          Semantic   Time
                     Sym.Ex   Synthesis   Total    per fun.
Syntia    /          /        /           34 min   4.08s
QSynth    500        1m20s    15s         1m35s    0.19s

23 / 26

slide-72
SLIDE 72

Tigress benchmark

Simplification

                    Mean expr. size             Simplification            Mean scale factor
                    Orig   ObfB      Synt       Partial   Full            ObfS/Orig   Synt/Orig
Dataset 2 (EA)      13.5   245.81    21.92      500       354 (70.80%)    x18.34      x1.64
Dataset 3 (VR-EA)   13.5   443.64    25.42      500       375 (75.00%)    -           x1.90
Dataset 4 (EA-ED)   13.5   9223.46   3812.84    234       133 (55.65%)    x405.25     x234.44

Orig, ObfS, ObfB and Synt are respectively the original, obfuscated (source level, binary level) and synthesized expressions

Accuracy & Speed

                    Semantic           Time
                                       Sym.Ex   Synthesis   Total    per fun.
Dataset 2 (EA)      OK: 413, KO: 4     1m7s     1m42s       2m49s    0.34s
Dataset 3 (VR-EA)   OK: 401, KO: 43    17m10s   2m46s       19m56s   2.39s
Dataset 4 (EA-ED)   -                  13m18s   2h7m        2h21m    35.47s
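The "per fun." column appears to be the total time divided by the number of functions in the dataset (500, or 239 for EA-ED). A quick Python check of that reading; the `to_seconds` helper is ours, and since the EA-ED total is given only to the minute, its value matches only approximately.

```python
import re


def to_seconds(t: str) -> int:
    """Parse durations such as '2h21m', '1m35s' or '34 min'."""
    units = {"h": 3600, "min": 60, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)\s*(min|h|m|s)", t))


rows = [  # (benchmark, total time, number of functions, reported per-function time)
    ("Syntia benchmark / Syntia", "34 min", 500, 4.08),
    ("Syntia benchmark / QSynth", "1m35s", 500, 0.19),
    ("Dataset 2 (EA)", "2m49s", 500, 0.34),
    ("Dataset 3 (VR-EA)", "19m56s", 500, 2.39),
    ("Dataset 4 (EA-ED)", "2h21m", 239, 35.47),
]
for name, total, n, reported in rows:
    print(f"{name}: {to_seconds(total) / n:.2f}s (reported {reported}s)")
```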

24 / 26

slide-76
SLIDE 76

Conclusion

Challenge

⇒ Deobfuscating some data-flow based (composite) obfuscations

Results

⇒ A scalable synthesis algorithm improving the state of the art in both speed and accuracy

Limitations:

◮ synthesizing expressions that use constants
◮ addressing encoded data at scale

Future work:

◮ experimenting with other synthesis primitives & simplification strategies (D&C, ...)
◮ combining with other approaches (not necessarily synthesis-based)
◮ testing against other obfuscators

25 / 26

slide-77
SLIDE 77

Thank you!

26 / 26

slide-78
SLIDE 78

References

Susmit Jha, Sumit Gulwani, Sanjit A Seshia, and Ashish Tiwari. Oracle-guided component-based program synthesis. Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, pages 215-224. ACM, 2010.

Synthesis time: 31 seconds on average

Fabrizio Biondi, Sébastien Josse, Axel Legay, and Thomas Sirvent. Effectiveness of synthesis in concolic deobfuscation. Computers & Security, 70:500-515, 2017.

Synthesis time: ca. 20 seconds for 96 bits

Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Holz. Syntia: Synthesizing the semantics of obfuscated code. 26th USENIX Security Symposium (USENIX Security 17), pages 643-659, 2017.

Synthesis time: 4 seconds on average

27 / 26

slide-80
SLIDE 80

Presetting pre-computed synthesis lookup tables

Goal: find the smallest discriminative input vector size
How: check equivalence with the synthesized expression using SMT (on EA)
[Figure: x axis: input vector size; y axis: number of functions]

28 / 26

Conclusion: We chose 15 as a good trade-off between semantic accuracy and evaluation speed.
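The slides check equivalence with an SMT solver. As a dependency-free stand-in (our substitution, not the authors' setup), the Python sketch below checks two expressions exhaustively on 8-bit operands and returns a counterexample when they differ. Exhaustive search at a reduced bit-width can miss differences that only appear at 64 bits, which an SMT query over 64-bit bit-vectors would catch.

```python
from itertools import product
from typing import Callable, Optional, Tuple

BITS = 8  # assumption: reduced width so that all 2**16 input pairs can be tried
MASK = (1 << BITS) - 1


def counterexample(p: Callable[[int, int], int],
                   q: Callable[[int, int], int]) -> Optional[Tuple[int, int]]:
    """Return an input (a, b) where p and q differ, or None if they are equivalent."""
    for a, b in product(range(1 << BITS), repeat=2):
        if (p(a, b) & MASK) != (q(a, b) & MASK):
            return (a, b)
    return None


# a ⊕ b versus (a ∨ b) − (a ∧ b): equivalent
print(counterexample(lambda a, b: a ^ b, lambda a, b: (a | b) - (a & b)))  # None
# a + b versus a ∨ b: they agree on (1, 0) and (2, 1) but not everywhere
print(counterexample(lambda a, b: a + b, lambda a, b: a | b))  # (1, 1)
```

The second pair shows why the input vector must be discriminative: with I = {(1, 0), (2, 1)} the two expressions cannot be told apart.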

slide-81
SLIDE 81

Synthesis time distribution (on EA)

29 / 26

slide-82
SLIDE 82

Synthesis simplification (on EA)

30 / 26