DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE (Sébastien Bardin, CEA LIST)

SLIDE 1

| 1 Sébastien Bardin et al. – Dagstuhl2017

DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE

Sébastien Bardin (CEA LIST)
Robin David (CEA LIST, QuarksLab)
Jean-Yves Marion (LORIA)

SLIDE 2

IN A NUTSHELL

Challenge: malware deobfuscation
Standard techniques (dynamic, syntactic) not enough
Semantic methods can help [obfuscation preserves semantics]
Yet, need to be strongly adapted (robustness, precision, efficiency)
A tour of how symbolic methods can help
Explore and discover
Prove infeasibility [S&P 2017] -- with Robin David
Simplify (not covered here) -- with Jonathan Salwan

SLIDE 3

CONTEXT: MALWARE COMPREHENSION

The day after: malware comprehension

understand what has been going on
mitigate, fix and clean
improve defense

Goal: help malware comprehension

Reverse engineering of heavily obfuscated code
Identify and simplify protections

APT: highly sophisticated attacks

Targeted malware
Written by experts
Attack: 0-days
Defense: stealth, obfuscation
Sponsored by states or mafia

USA elections: DNC Hack

SLIDE 4

CHALLENGE: CORRECT DISASSEMBLY

Basic reverse problem

aka model recovery
aka CFG recovery

SLIDE 5

CAN BE TRICKY!

code vs. data
dynamic jumps (jmp eax)

SLIDE 6

REVERSE CAN BECOME A NIGHTMARE (OBFUSCATION)

Obfuscation: make code hard to reverse

self-modification
encryption
virtualization
code overlapping
opaque predicates
callstack tampering
…

Goal: help malware comprehension

Identify and simplify protections
Ideal = revert protections

SLIDE 7

EXAMPLE: OPAQUE PREDICATE

Constant-value predicates (always true, always false)

dead branch points to spurious code
goal = waste the reverser's time and effort
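
A minimal sketch of what "constant-value predicate" means in practice. The predicate x*(x+1) % 2 == 0 is a textbook illustration chosen here, not one taken from the talk, and the exhaustive loop over 8-bit values merely stands in for the solver query a real tool would issue. The names opaque_condition and classify_predicate are hypothetical.

```python
# Assumption: x is an 8-bit register value; exhaustive enumeration replaces an SMT query.

def opaque_condition(x: int) -> bool:
    """Branch condition as an obfuscator might insert it: x*(x+1) is always even."""
    return ((x * (x + 1)) & 0xFF) % 2 == 0

def classify_predicate(cond, bits: int = 8) -> str:
    """Classify a branch condition by evaluating it on every possible input."""
    outcomes = {cond(x) for x in range(1 << bits)}
    if outcomes == {True}:
        return "always true"
    if outcomes == {False}:
        return "always false"
    return "genuine"

verdict_opaque = classify_predicate(opaque_condition)
verdict_genuine = classify_predicate(lambda x: x > 10)
print(verdict_opaque, "/", verdict_genuine)
```

When the verdict is "always true" or "always false", the other branch is dead and whatever it points to can be treated as spurious code.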

SLIDE 8

EXAMPLE: STACK TAMPERING

Alter the standard compilation scheme: ret does not go back to the call site

hide the real target
return site may be spurious code
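
The effect can be sketched on a toy machine, assuming a made-up instruction set (call, tamper, ret, nop, halt) that does not correspond to any real ISA. A shadow stack records where a well-behaved ret would go, so the two targets can be compared on the executed path.

```python
# Hypothetical toy VM: "tamper" overwrites the return address on top of the stack.

def run(program, entry=0, max_steps=100):
    """Execute the program; return a list of (ret_pc, expected_target, actual_target)."""
    stack, shadow, events, pc = [], [], [], entry
    for _ in range(max_steps):
        op, *args = program[pc]
        if op == "call":
            stack.append(pc + 1)    # return address pushed on the real stack
            shadow.append(pc + 1)   # what an untampered ret would use
            pc = args[0]
        elif op == "tamper":
            stack[-1] = args[0]     # overwrite the saved return address
            pc += 1
        elif op == "ret":
            actual, expected = stack.pop(), shadow.pop()
            events.append((pc, expected, actual))
            pc = actual
        elif op == "nop":
            pc += 1
        else:                       # "halt"
            break
    return events

def tampered_rets(events):
    """Addresses of ret instructions that did not return to their call site."""
    return [pc for pc, expected, actual in events if expected != actual]

TAMPERED = [("call", 4), ("nop",), ("halt",), ("halt",), ("tamper", 2), ("ret",)]
GENUINE = [("call", 3), ("halt",), ("halt",), ("ret",)]

tampered_events = run(TAMPERED)
genuine_events = run(GENUINE)
print(tampered_rets(tampered_events), tampered_rets(genuine_events))
```

In TAMPERED, address 1 (the apparent return site) is never executed, which is why a disassembler following call/ret conventions would decode spurious code there. Note that this check only covers the path that was actually run.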

SLIDE 9

STANDARD DISASSEMBLY TECHNIQUES ARE NOT ENOUGH

Static analysis

too fragile vs obfuscation
junk instr, missed instr.

Dynamic analysis

robust vs obfuscation
too incomplete

SLIDE 10

SOLUTION? BINARY-LEVEL SEMANTIC ANALYSIS

Semantics preserved by obfuscation (?)

SLIDE 11

ABOUT FORMAL METHODS

Success in safety-critical systems

SLIDE 12

THE HARD JOURNEY FROM SOURCE TO BINARY

Wanted

robustness
precision
scale

SLIDE 13

STATIC SEMANTIC ANALYSIS IS VERY VERY HARD ON BINARY CODE

Problems

jmp eax
memory
Bit-level reasoning

SLIDE 14

INSTEAD: DYNAMIC SYMBOLIC EXECUTION (DSE, Godefroid 2005)

Perfect for intensive testing

Correct, relatively complete
No false alarm
Robust
Scales in some ways

// incomplete

SLIDE 15

DSE: PATH PREDICATE COMPUTATION (DSE, Godefroid 2005)
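
A sketch of path predicate computation under simplifying assumptions: the program under test is a hypothetical three-path function over one 8-bit input, branch conditions are Python callables rather than formulas, and a brute-force search over 0..255 replaces the SMT solver. Each concrete run records the conditions it satisfied; flipping one of them and solving yields an input that follows a new path.

```python
from typing import Callable, Dict, List, Tuple

Constraint = Tuple[str, Callable[[int], bool]]

def negated(label: str, cond: Callable[[int], bool]) -> Constraint:
    """Build the negation of a branch condition."""
    return ("not(" + label + ")", lambda v: not cond(v))

def trace(x: int) -> Tuple[str, List[Constraint]]:
    """Run the toy program on concrete input x and record its path predicate."""
    path: List[Constraint] = []

    def branch(label, cond) -> bool:
        taken = cond(x)
        path.append((label, cond) if taken else negated(label, cond))
        return taken

    if branch("x > 10", lambda v: v > 10):
        if branch("x * 3 == 42", lambda v: v * 3 == 42):
            return "bug", path
        return "deep", path
    return "shallow", path

def solve(constraints, domain=range(256)):
    """Stand-in for an SMT solver: first value satisfying every constraint, or None."""
    for v in domain:
        if all(cond(v) for _, cond in constraints):
            return v
    return None

def explore(seed: int) -> Dict[str, int]:
    """Map each reached outcome to one input that triggers it."""
    worklist, seen, found = [seed], set(), {}
    while worklist:
        x = worklist.pop()
        outcome, path = trace(x)
        key = tuple(label for label, _ in path)
        if key in seen:
            continue
        seen.add(key)
        found[outcome] = x
        for i, (label, cond) in enumerate(path):
            # Keep the prefix, flip the i-th branch, ask for a matching input.
            new_input = solve(path[:i] + [negated(label, cond)])
            if new_input is not None:
                worklist.append(new_input)
    return found

found = explore(seed=0)
print(found)
```

Starting from the single seed 0, the loop reaches all three outcomes of the toy program, including the one guarded by x * 3 == 42 that random testing would rarely hit.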

SLIDE 16

ABOUT ROBUSTNESS (imo, the major advantage)

« concretization »

Keep going when symbolic reasoning fails
Tune the tradeoff genericity vs. cost
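
Concretization can be illustrated as follows, assuming a hypothetical branch condition that mixes a hash of x (which a solver cannot invert in practice) with a simple addition of y. Replacing the hash term by the value observed at runtime leaves a constraint over y alone that is trivial to solve. The function names are illustrative only.

```python
import hashlib

def opaque_hash(x: int) -> int:
    """Stands in for code the symbolic engine cannot reason about (x in 0..255)."""
    return hashlib.sha256(bytes([x])).digest()[0]

def target_reached(x: int, y: int) -> bool:
    """Branch condition of interest: (hash(x) + y) mod 256 == 0x42."""
    return (opaque_hash(x) + y) & 0xFF == 0x42

def solve_with_concretization(x_observed: int):
    """Fix x to its runtime value, then solve the remaining constraint on y."""
    h = opaque_hash(x_observed)   # concrete value taken from the execution trace
    y = (0x42 - h) & 0xFF         # remaining constraint: (h + y) mod 256 == 0x42
    return x_observed, y

x_sol, y_sol = solve_with_concretization(x_observed=7)
print(target_reached(x_sol, y_sol))
```

The price is genericity: the solution is only valid for x equal to the observed value, so inputs reachable with other values of x are not explored.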

SLIDE 17

DYNAMIC SYMBOLIC EXECUTION CAN HELP (Debray, Kruegel, …)

For deobfuscation

find new real paths
robust
still incomplete

« dynamic analysis on steroids »

SLIDE 18

DSE COMPLEMENTS DYNAMIC ANALYSIS

SLIDE 19

OVERVIEW

                   Correct   Complete   Efficient   Robust
Static syntactic   X         - / X      OK          X
Dynamic            OK        XX         OK          OK
DSE                OK        X          ?           OK
Static semantic    X         OK / X     X           X

SLIDE 20

IN PRACTICE

Can recover useful semantic information

More precise disassembly
Exact semantic of instructions
Input of interest
…

SLIDE 21

YET … WHAT ABOUT INFEASIBILITY QUESTIONS?

Prove that something is always true (resp. false)

Many such issues in reverse

is a branch dead?
does the ret always return to the call?
have I found all targets of a dynamic jump?

And more

does this malicious ret always go there?
does this expression always evaluate to 15?
does this self-modification always write this opcode?
does this self-modification always rewrite this instr.?
…

Not addressed by DSE

Cannot enumerate all paths

SLIDE 22

OUR CHALLENGE

Check infeasibility questions in obfuscated code

scale to realistic malware sizes
robust to obfuscation such as self-modification
precise
generic

Rest of the talk:

opaque predicate
stack tampering

SLIDE 23

OUR PROPOSAL: BACKWARD-BOUNDED SYMBOLIC EXECUTION

Insight 1: symbolic reasoning

precision
But: need finite #paths

Insight 2: backward-bounded

pre_k(c) = ∅ => c is infeasible
finite #paths
efficient, depends on k
But: backward on jump eax?

Insight 3: dynamic partial CFG

solve (partially) dyn. jumps
robustness

False negative (FN)

can miss infeasibility
why: k too small (miss /\-constraints)

False positive (FP)

wrongly assert infeasibility
why: CFG too partial (miss \/-constraints)

Low FP/FN rates in practice

ground-truth experiments
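
The three insights and both error sources can be sketched on a toy example. Assumptions: the state is a single 8-bit value v, each backward path is a list of instructions ending at the branch under test, and explicit sets of concrete states replace the formulas and solver calls of a real implementation. TRACE, MISSED_PATH, pre_k and pre_k_cfg are hypothetical names, not BINSEC APIs.

```python
DOMAIN = range(256)

# Path observed dynamically: forces v to be odd, then adds 2.
TRACE = [
    ("v = v | 1", lambda v: v | 1),
    ("v = (v + 2) & 0xFF", lambda v: (v + 2) & 0xFF),
]
# Path absent from the partial CFG: forces v to be even, then adds 2.
MISSED_PATH = [
    ("v = v & 0xFE", lambda v: v & 0xFE),
    ("v = (v + 2) & 0xFF", lambda v: (v + 2) & 0xFF),
]

def branch_taken(v: int) -> bool:
    """Condition c under test: the branch 'if v is even' is taken."""
    return v % 2 == 0

def pre_k(path, cond, k):
    """States k instructions before the branch from which cond can hold at the branch."""
    suffix = path[len(path) - k:] if k > 0 else []
    result = set()
    for v0 in DOMAIN:
        v = v0
        for _, step in suffix:
            v = step(v)
        if cond(v):
            result.add(v0)
    return result

def pre_k_cfg(paths, cond, k):
    """Union over every known backward path (the disjunction of path constraints)."""
    states = set()
    for path in paths:
        states |= pre_k(path, cond, min(k, len(path)))
    return states

k1 = pre_k_cfg([TRACE], branch_taken, 1)               # bound too small
k2 = pre_k_cfg([TRACE], branch_taken, 2)               # bound large enough
full = pre_k_cfg([TRACE, MISSED_PATH], branch_taken, 2)
print(len(k1), len(k2), len(full))
```

With k = 1 the set is non-empty, so the infeasibility is missed (FN: the constraint from "v | 1" is out of reach). With k = 2 the set is empty and the branch is reported infeasible. If MISSED_PATH really exists in the program, that report is wrong (FP), because the partial CFG omitted one disjunct.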

SLIDE 24

FORWARD & BACKWARD SYMBOLIC EXECUTION

SLIDE 25

EXPERIMENTAL EVALUATION

Controlled experiments (ground truth): precision
Large-scale experiment (packers): scalability, robustness
Case-study (X-tunnel malware): usefulness

SLIDE 26

CONTROLLED EXPERIMENTS

Goal = assess the precision of the technique
ground truth value
Experiment 1: opaque predicates (o-llvm)
100 coreutils, 5x20 obfuscated codes
k=16: 3.46% error, no false negative
robust to k
efficient: 0.02s / query
Experiment 2: stack tampering (tigress)
5 obfuscated codes, 5 coreutils
almost all genuine ret are proved (no false positive)
many malicious ret are proved « single-targets »
Very precise results
Seems efficient

SLIDE 27

CASE-STUDY: PACKERS

Packers: legitimate software protection tools (in basic malware: the sole protection)

SLIDE 28

CASE-STUDY: PACKERS (fun facts)

SLIDE 29

CASE-STUDY: PACKERS (fun facts)

SLIDE 30

CASE-STUDY: THE XTUNNEL MALWARE (part of DNC hack)

Two heavily obfuscated samples

Many opaque predicates

Goal: detect & remove protections

Identify 50% of code as spurious
Fully automatic, < 3h

SLIDE 31

CASE-STUDY: THE XTUNNEL MALWARE (fun facts)

Protection seems to rely only on opaque predicates
Only two families of opaque predicates
Yet, quite sophisticated
Original OPs
interleaving between payload and OP computation
sharing among OP computations
possibly long dependency chains (avg 8.7, up to 230)

SLIDE 32

SECURITY ANALYSIS: COUNTER-MEASURES (and mitigations)

Long dependency chains (evading the bound k)
The whole chain is not always required to conclude!
Can use a more flexible notion of bound (data-dependencies, formula size)
Hard-to-solve predicates (causing timeouts)
A timeout is already valuable information
Opportunity to find infeasible patterns (then matching), or signatures
Tradeoff between performance penalty and protection focus
Note: must be input-dependent, otherwise removed by standard DSE optimizations
Anti-dynamic tricks (fool initial dynamic recovery)
Can use the appropriate mitigations
Note: some tricks can be circumvented by symbolic reasoning

Current state-of-the-art

push the cat-and-mouse game further
raise the bar for malware designers

Also

« Probabilistic obfuscation »
Covert channels

SLIDE 33

SUMMARY

                   Feasibility   Infeasibility   Efficient   Robust
Static syntactic   X             X               OK          X
Dynamic            ?             X               OK          OK
DSE                OK            X               X           OK
Static semantic    X             OK              X           X
BB-DSE             X             OK              OK          OK

SLIDE 34

BINSEC

SLIDE 35

CONCLUSION & TAKE AWAY

A tour of the advantages of symbolic methods for deobfuscation
Semantic analysis complements existing approaches
Explore, prove infeasible, simplify
Open the way to fruitful combinations
Formal methods can be useful for malware, but must be adapted
Need robustness and scalability!
Accept to lose both correctness & completeness – in a controlled way
Next steps
Combine with user and learning!
Anti-anti-DSE
