DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE



SLIDE 1

DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE

Sébastien Bardin (CEA LIST)
Robin David (CEA LIST, QuarksLab)
Jean-Yves Marion (LORIA)

Sébastien Bardin et al. – Dagstuhl 2017

SLIDE 2

IN A NUTSHELL

  • Challenge: malware deobfuscation
  • Standard techniques (dynamic, syntactic) are not enough
  • Semantic methods can help [obfuscation preserves semantics]
  • Yet, they need to be strongly adapted (robustness, precision, efficiency)
  • A tour of how symbolic methods can help
      • Explore and discover
      • Prove infeasibility [S&P 2017] (with Robin David)
      • Simplify (not covered here) (with Jonathan Salwan)
SLIDE 3

CONTEXT: MALWARE COMPREHENSION

The day after: malware comprehension

  • understand what has been going on
  • mitigate, fix and clean
  • improve defense

Goal: help malware comprehension

  • Reverse engineering of heavily obfuscated code
  • Identify and simplify protections

APT: highly sophisticated attacks

  • Targeted malware
  • Written by experts
  • Attack: 0-days
  • Defense: stealth, obfuscation
  • Sponsored by states or mafia

USA elections: DNC Hack

SLIDE 4

CHALLENGE: CORRECT DISASSEMBLY

Basic reverse problem

  • aka model recovery
  • aka CFG recovery
SLIDE 5

CAN BE TRICKY!

  • code vs. data (hard to tell apart)
  • dynamic jumps (jmp eax)
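
To make the difficulty concrete, here is a minimal sketch of a recursive-descent disassembler on a toy instruction set. The encoding (`PROGRAM`, the opcode names `jmp_reg`, `jcc`, `data`) is hypothetical and only meant for illustration; it is not taken from any real disassembler. The point is that a purely static traversal stops at `jmp eax` and never reaches the real code behind it.

```python
# Toy program (hypothetical encoding): address -> (opcode, operand).
PROGRAM = {
    0: ("mov", "eax, 4"),
    1: ("jmp_reg", "eax"),   # dynamic jump: target only known at run time
    2: ("data", 0xCC),       # data bytes mixed into the code section
    3: ("data", 0xCC),
    4: ("nop", None),        # real code, reachable only through jmp eax
    5: ("halt", None),
}

def recursive_descent(program, entry=0):
    """Statically follow control flow from `entry`.

    Returns (reached addresses, addresses of unresolved dynamic jumps).
    """
    reached, unresolved, work = set(), set(), [entry]
    while work:
        pc = work.pop()
        if pc in reached or pc not in program:
            continue
        reached.add(pc)
        op, arg = program[pc]
        if op == "jmp":            # direct jump: single known target
            work.append(arg)
        elif op == "jcc":          # conditional jump: target and fall-through
            work += [arg, pc + 1]
        elif op == "jmp_reg":      # dynamic jump: no static successor
            unresolved.add(pc)
        elif op != "halt":         # ordinary instruction: fall through
            work.append(pc + 1)
    return reached, unresolved

reached, unresolved = recursive_descent(PROGRAM)
```

Here the traversal reaches only addresses 0 and 1 and reports the jump at address 1 as unresolved, so the code at addresses 4 and 5 is missed.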
SLIDE 6

REVERSE CAN BECOME A NIGHTMARE (OBFUSCATION)

Obfuscation: make code hard to reverse

  • self-modification
  • encryption
  • virtualization
  • code overlapping
  • opaque predicates
  • callstack tampering

Goal: help malware comprehension

  • Identify and simplify protections
  • Ideal = revert protections
SLIDE 7

EXAMPLE: OPAQUE PREDICATE

Constant-value predicates (always true, always false)

  • dead branch points to spurious code
  • goal = waste the reverser's time and effort
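
As a small illustration, the sketch below builds one classic arithmetic opaque predicate on 8-bit values and checks it by exhaustive enumeration. The predicate and the helper names (`opaque_predicate`, `protected`, `is_opaque`) are chosen for this example and are not taken from the talk.

```python
MASK = 0xFF  # model an 8-bit register

def opaque_predicate(x: int) -> bool:
    """x * (x + 1) is a product of two consecutive integers, hence always even."""
    return ((x * (x + 1)) & MASK) % 2 == 0

def protected(x: int) -> str:
    """Obfuscated function: the else branch is dead but looks reachable."""
    if opaque_predicate(x):
        return "real code"
    return "spurious code"

def is_opaque(pred, domain) -> bool:
    """A predicate is opaque on `domain` if it takes a single truth value there."""
    return len({pred(x) for x in domain}) == 1

always_same = is_opaque(opaque_predicate, range(256))
```

Exhaustive enumeration only works because the domain is tiny; on real 32-bit or 64-bit code a solver-based check would be needed.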
SLIDE 8

EXAMPLE: STACK TAMPERING

Alter the standard compilation scheme: ret does not return to the call site

  • hide the real target
  • return site may be spurious code
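
The effect can be sketched with a toy stack machine. Everything here is hypothetical (the opcodes `call`, `ret`, `set_ret`, and both programs); `set_ret` stands in for any instruction sequence that overwrites the return address on top of the stack.

```python
def run(program, entry=0, max_steps=100):
    """Interpret a toy program and return the list of executed addresses."""
    stack, pc, trace = [], entry, []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "call":
            stack.append(pc + 1)   # push the return address
            pc = arg
        elif op == "ret":
            pc = stack.pop()       # jump to whatever is on top of the stack
        elif op == "set_ret":
            stack[-1] = arg        # tampering: overwrite the return address
            pc += 1
        elif op == "halt":
            break
        else:                      # nop and other ordinary instructions
            pc += 1
    return trace

# Genuine call: ret goes back to the instruction after the call (address 1).
GENUINE = {0: ("call", 3), 1: ("halt", None), 2: ("nop", None),
           3: ("nop", None), 4: ("ret", None)}

# Tampered call: the return site (address 1) is spurious and never executed.
TAMPERED = {0: ("call", 3), 1: ("nop", None), 2: ("halt", None),
            3: ("set_ret", 2), 4: ("ret", None)}

genuine_trace = run(GENUINE)
tampered_trace = run(TAMPERED)
```

In the tampered program a disassembler that assumes ret returns to the call would decode address 1 as live code, although execution continues at address 2.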
SLIDE 9

STANDARD DISASSEMBLY TECHNIQUES ARE NOT ENOUGH

Static analysis

  • too fragile vs obfuscation
  • junk instructions, missed instructions

Dynamic analysis

  • robust vs obfuscation
  • too incomplete
SLIDE 10

SOLUTION? BINARY-LEVEL SEMANTIC ANALYSIS

Semantics preserved by obfuscation (?)

SLIDE 11

ABOUT FORMAL METHODS

Success in safety-critical systems

SLIDE 12

THE HARD JOURNEY FROM SOURCE TO BINARY

Wanted

  • robustness
  • precision
  • scale
SLIDE 13

STATIC SEMANTIC ANALYSIS IS VERY VERY HARD ON BINARY CODE

Problems

  • Jump eax
  • memory
  • Bit reasoning
SLIDE 14

INSTEAD: DYNAMIC SYMBOLIC EXECUTION (DSE, Godefroid 2005)

Perfect for intensive testing

  • Correct, relatively complete
  • No false alarms
  • Robust
  • Scales in some ways

Limitation: incomplete

SLIDE 15

DSE: PATH PREDICATE COMPUTATION (DSE, Godefroid 2005)
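
The slide's figure is not available in this transcript, so here is a minimal sketch of the idea under strong simplifying assumptions: the program under test is a hypothetical toy (`program`), branch conditions are recorded as Python functions of the single 8-bit input, and a brute-force search (`solve`) stands in for the SMT solver a real DSE engine would use.

```python
def program(x, trace):
    """Toy program under test; `trace` records (condition, outcome) per branch."""
    c1 = lambda v: ((v * 2) & 0xFF) > 100
    trace.append((c1, c1(x)))
    if c1(x):
        c2 = lambda v: v % 7 == 3
        trace.append((c2, c2(x)))
        if c2(x):
            return "target"
        return "B"
    return "A"

def solve(constraints, domain=range(256)):
    """Brute-force stand-in for a solver: find an input satisfying the path predicate."""
    for v in domain:
        if all(cond(v) == want for cond, want in constraints):
            return v
    return None

def explore(seed):
    """Run concretely, then negate each branch of the path predicate to get new inputs."""
    worklist, seen_paths, results = [seed], set(), {}
    while worklist:
        x = worklist.pop()
        trace = []
        results[x] = program(x, trace)
        path = tuple(outcome for _, outcome in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        for i, (cond, outcome) in enumerate(trace):
            # Keep the prefix of the path predicate, flip the i-th branch.
            new_input = solve(trace[:i] + [(cond, not outcome)])
            if new_input is not None:
                worklist.append(new_input)
    return results

results = explore(seed=0)
```

Starting from a single seed input, the exploration reaches all three outcomes of the toy program, including the doubly guarded "target" branch.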

SLIDE 16

ABOUT ROBUSTNESS (imo, the major advantage)

« concretization »

  • Keep going when symbolic reasoning fails
  • Tune the genericity vs. cost tradeoff
SLIDE 17

DYNAMIC SYMBOLIC EXECUTION CAN HELP (Debray, Kruegel, …)

For deobfuscation

  • find new real paths
  • robust
  • still incomplete

« dynamic analysis on steroids »

SLIDE 18

DSE COMPLEMENTS DYNAMIC ANALYSIS

SLIDE 19

OVERVIEW

                  Correct   Complete   Efficient   Robust
Static syntactic  X         • - / X    OK          X
Dynamic           OK        XX         OK          OK
DSE               OK        •          X           OK
Static semantic   X         OK / X     X           X

SLIDE 20

IN PRACTICE

Can recover useful semantic information

  • More precise disassembly
  • Exact semantics of instructions
  • Inputs of interest
SLIDE 21

YET … WHAT ABOUT INFEASIBILITY QUESTIONS?

Prove that something is always true (resp. false)

Many such issues in reverse

  • is a branch dead?
  • does the ret always return to the call?
  • have I found all targets of a dynamic jump?

And more

  • does this malicious ret always go there?
  • does this expression always evaluate to 15?
  • does this self-modification always write this opcode?
  • does this self-modification always rewrite this instr.?

Not addressed by DSE

  • Cannot enumerate all paths
SLIDE 22

OUR CHALLENGE

Check infeasibility questions in obfuscated code

  • scale to realistic malware sizes
  • robust to obfuscation such as self-modification
  • precise
  • generic

Rest of the talk:

  • opaque predicate
  • stack tampering
SLIDE 23

OUR PROPOSAL: BACKWARD-BOUNDED SYMBOLIC EXECUTION

Insight 1: symbolic reasoning

  • precision
  • But: need finite #paths

Insight 2: backward-bounded

  • pre_k(c) = ∅ => c is infeasible
  • finite #paths
  • efficient, depends on k
  • But: backward on jump eax?

Insight 3: dynamic partial CFG

  • solve (partially) dyn. jumps
  • robustness

False negative (FN)

  • can miss infeasibility
  • why: k too small (miss /\-constraints)

False positive (FP)

  • wrongly assert infeasibility
  • why: CFG too partial (miss \/-constraints)

Low FP/FN rates in practice

  • ground-truth experiments
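
The bounded backward check can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the CFG (`CFG`), its instructions, and the helper names are hypothetical, registers are 8-bit, and exhaustive enumeration of the state k steps back replaces the solver query.

```python
from itertools import product

MASK = 0xFF  # 8-bit registers

def sq(s):
    return {**s, "y": (s["x"] * s["x"]) & MASK}   # y := x * x

def add(s):
    return {**s, "y": (s["y"] + s["x"]) & MASK}   # y := y + x

# Toy partial CFG: node -> (instruction, predecessor nodes).
CFG = {
    "n0": (lambda s: s, []),
    "n1": (sq, ["n0"]),
    "n2": (add, ["n1"]),
}

def backward_paths(node, k):
    """Instruction sequences of length at most k that end at `node`."""
    if k == 0:
        return [[]]
    instr, preds = CFG[node]
    if not preds:
        return [[instr]]
    return [p + [instr] for pred in preds for p in backward_paths(pred, k - 1)]

def feasible(node, cond, k):
    """True if some state k steps back makes `cond` hold right after `node`."""
    for path in backward_paths(node, k):
        for x, y in product(range(256), repeat=2):   # registers are unconstrained
            state = {"x": x, "y": y}
            for instr in path:
                state = instr(state)
            if cond(state):
                return True
    return False

def odd(s):
    return s["y"] % 2 == 1

def even(s):
    return s["y"] % 2 == 0

proved_infeasible = not feasible("n2", odd, k=2)   # y = x*x + x is always even
missed_with_small_k = feasible("n2", odd, k=1)     # k too small: y is still free
```

With k=2 the condition "y is odd" has an empty bounded pre-image, so the branch is reported infeasible; with k=1 the constraint introduced by `y := x * x` is not seen and the infeasibility is missed, which matches the false-negative case listed above.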
SLIDE 24

FORWARD & BACKWARD SYMBOLIC EXECUTION

SLIDE 25

EXPERIMENTAL EVALUATION

  • Controlled experiments (ground truth): precision
  • Large-scale experiment (packers): scalability, robustness
  • Case study (X-Tunnel malware): usefulness
SLIDE 26

CONTROLLED EXPERIMENTS

  • Goal = assess the precision of the technique
      • ground-truth values
  • Experiment 1: opaque predicates (O-LLVM)
      • 100 core utils, 5x20 obfuscated codes
      • k=16: 3.46% error, no false negative
      • robust to k
      • efficient: 0.02s / query
  • Experiment 2: stack tampering (Tigress)
      • 5 obfuscated codes, 5 core utils
      • almost all genuine ret are proved (no false positive)
      • many malicious ret are proved « single-target »
  • Very precise results
  • Seems efficient
SLIDE 27

CASE-STUDY: PACKERS

Packers: legitimate software protection tools (basic malware: the sole protection)

SLIDE 28

CASE-STUDY: PACKERS (fun facts)

SLIDE 29

CASE-STUDY: PACKERS (fun facts)

SLIDE 30

CASE-STUDY: THE XTUNNEL MALWARE (part of DNC hack)

Two heavily obfuscated samples

  • Many opaque predicates

Goal: detect & remove protections

  • Identify 50% of code as spurious
  • Fully automatic, < 3h
SLIDE 31

CASE-STUDY: THE XTUNNEL MALWARE (fun facts)

  • Protection seems to rely only on opaque predicates
  • Only two families of opaque predicates
  • Yet, quite sophisticated
      • original OPs
      • interleaving between payload and OP computation
      • sharing among OP computations
      • possibly long dependency chains (avg 8.7, up to 230)
SLIDE 32

SECURITY ANALYSIS: COUNTER-MEASURES (and mitigations)

  • Long dependency chains (evading the bound k)
      • The whole chain is not always required to conclude!
      • Can use a more flexible notion of bound (data dependencies, formula size)
  • Hard-to-solve predicates (causing timeouts)
      • A timeout is already valuable information
      • Opportunity to find infeasible patterns (then matching), or signatures
      • Tradeoff between performance penalty and protection focus
      • Note: must be input-dependent, otherwise removed by standard DSE optimizations
  • Anti-dynamic tricks (fool initial dynamic recovery)
      • Can use the appropriate mitigations
      • Note: some tricks can be circumvented by symbolic reasoning

Current state-of-the-art

  • push the cat-and-mouse game further
  • raise the bar for malware designers

Also

  • « Probabilistic obfuscation »
  • Covert channels
SLIDE 33

SUMMARY

                  Feasibility   Infeasibility   Efficient   Robust
Static syntactic  X             X               OK          X
Dynamic           •             X               OK          OK
DSE               OK            X               X           OK
Static semantic   X             OK              X           X
BB-DSE            X             OK              OK          OK

SLIDE 34

BINSEC

SLIDE 35

CONCLUSION & TAKE AWAY

  • A tour of the advantages of symbolic methods for deobfuscation
      • Semantic analysis complements existing approaches
      • Explore, prove infeasible, simplify
      • Open the way to fruitful combinations
  • Formal methods can be useful for malware, but must be adapted
      • Need robustness and scalability!
      • Accept losing both correctness & completeness – in a controlled way
  • Next steps
      • Combine with the user and learning!
      • Anti-anti-DSE
SLIDE 36

Commissariat à l’énergie atomique et aux énergies alternatives
Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142
91191 Gif-sur-Yvette Cedex - FRANCE
www-list.cea.fr
Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019