slide-1
SLIDE 1

Automated static deobfuscation in the context of Reverse Engineering

Sebastian Porst (sebastian.porst@zynamics.com) Christian Ketterer (cketti@gmail.com)

slide-2
SLIDE 2

Sebastian

  • zynamics GmbH
  • Lead Developer

– BinNavi – REIL/MonoREIL

Christian

  • Student
  • University of Karlsruhe
  • Deobfuscation
slide-3
SLIDE 3

Obfuscated Code → (mysterious things happen here) → Readable Code

20% 40% 40%

This talk

slide-4
SLIDE 4

Motivation

  • Combat common obfuscation techniques
  • Can it be done?
  • Will it produce useful results?
  • Can it be integrated into our technology stack?

slide-5
SLIDE 5

Examples of Obfuscation

  • Jump chains
  • Splitting calculations
  • Garbage code insertion
  • Predictable branches
  • Self-modifying code
  • Control-flow flattening
  • Opaque predicates
  • Code parallelization
  • Virtual Machines
  • ...

(listed from simple to tricky)

slide-6
SLIDE 6

Our Deobfuscation Approach

  • I. Copy ancient algorithms from compiler theory books
  • II. Translate obfuscated assembly code to REIL
  • III. Run algorithms on REIL code
  • IV. Profit (?)
slide-7
SLIDE 7

We're late in the game ...

[Timeline figure, 199X to 2009. Entries: U of Auckland, F. Perriot, M. Mohammed, Mathur, Christodorescu, Bruschi, U of Wisc + TU Munich, U of Ghent, zynamics. See end of this presentation for proper source references.]

slide-8
SLIDE 8

[Timeline figure, 199X to 2009. Categories: Malware Research, Defensive Reverse Engineering, Offensive Reverse Engineering.]

... but

slide-9
SLIDE 9

REIL

  • Reverse Engineering Intermediate Language
  • Specifically designed for Reverse Engineering
  • Design Goal: As simple as possible, but not simpler

  • In use since 2007
slide-10
SLIDE 10

Uses of REIL

  • Register Tracking: Helps reverse engineers follow data flow through code (never officially presented)
  • Index Underflow Detection: Automatically finds negative array accesses (CanSecWest 2009, Vancouver)
  • Automated Deobfuscation: Makes obfuscated code more readable (SOURCE Barcelona 2009, Barcelona)
  • ROP Gadget Generator: Automatically generates return-oriented shellcode (work in progress; scheduled for Q1/2010)

slide-11
SLIDE 11

The REIL Instruction Set

  • Arithmetical: ADD SUB MUL DIV MOD BSH
  • Bitwise: AND OR XOR
  • Data Transfer: STR LDM STM
  • Logical: BISZ JCC
  • Other: NOP UNDEF UNKN

slide-12
SLIDE 12
slide-13
SLIDE 13

Why REIL?

  • Simplifies input code
  • Makes effects obvious
  • Makes algorithms platform-independent
slide-14
SLIDE 14

http://www.flickr.com/photos/wedrrc/3586908193/

MonoREIL

  • Monotone Framework for REIL
  • Based on Abstract Interpretation
  • Used to write static code analysis algorithms
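
As a rough illustration of what a monotone framework does, the sketch below is a generic worklist solver in Python. It is not MonoREIL's API (MonoREIL is part of BinNavi and written in Java); the function name `solve_monotone`, its parameters, and the example graph are all hypothetical. The analysis author supplies only a `transfer` function and a `join`, and the solver iterates until the states stop changing.

```python
from collections import deque

def solve_monotone(nodes, edges, initial, transfer, join):
    """Hypothetical worklist solver: iterate until no node state changes."""
    succs = {n: [] for n in nodes}
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    out_state = {n: initial for n in nodes}
    worklist = deque(nodes)
    while worklist:
        node = worklist.popleft()
        # Combine the states flowing in from all predecessors.
        incoming = [out_state[p] for p in preds[node]]
        in_state = join(incoming) if incoming else initial
        new_state = transfer(node, in_state)
        if new_state != out_state[node]:
            out_state[node] = new_state
            worklist.extend(succs[node])  # successors must be revisited
    return out_state

# Example analysis (an assumption, not from the talk): which registers
# may have been written by the time each node finishes?
writes = {"a": {"eax"}, "b": {"ebx"}, "c": {"ecx"}, "d": set()}
graph_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

may_written = solve_monotone(
    nodes=list(writes),
    edges=graph_edges,
    initial=frozenset(),
    transfer=lambda node, state: state | frozenset(writes[node]),
    join=lambda states: frozenset().union(*states),
)
```

Termination relies on the transfer function being monotone over a lattice of finite height, which holds here because the register sets can only grow.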
slide-15
SLIDE 15

Why MonoREIL?

  • In General: Makes complicated algorithms simple (trade brain effort for runtime)
  • Deobfuscator: Wrong choice really, but we wanted more real-life test cases for MonoREIL

slide-16
SLIDE 16

Building the Deobfuscator

  • Java
  • BinNavi Plugin
  • REIL + MonoREIL

http://www.flickr.com/photos/mattimattila/3602654187/

slide-17
SLIDE 17

Block Merging

  • Long chains of basic blocks ending with unconditional jumps
  • Confusing to follow in text-based disassemblers
  • Advantage of higher abstraction level in BinNavi

– Block merging is purely cosmetic
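
A minimal sketch of block merging, assuming a toy control-flow graph stored as two dictionaries (block name to instruction list, block name to successor list). The representation and the function name `merge_blocks` are illustrative assumptions, not BinNavi's data model. A block is folded into its predecessor when it is that predecessor's only successor and has no other predecessors.

```python
def merge_blocks(blocks, edges):
    """Merge jump chains: A -> B where B is A's only successor
    and A is B's only predecessor."""
    blocks = {name: list(body) for name, body in blocks.items()}
    edges = {name: list(succs) for name, succs in edges.items()}
    changed = True
    while changed:
        changed = False
        # Recompute predecessor lists after every merge.
        preds = {name: [] for name in blocks}
        for src, succs in edges.items():
            for dst in succs:
                preds[dst].append(src)
        for src in list(blocks):
            succs = edges.get(src, [])
            if len(succs) != 1:
                continue
            dst = succs[0]
            if dst == src or preds[dst] != [src]:
                continue
            # The trailing unconditional jump becomes redundant.
            if blocks[src] and blocks[src][-1].startswith("jmp "):
                blocks[src].pop()
            blocks[src].extend(blocks.pop(dst))
            edges[src] = edges.pop(dst, [])
            changed = True
            break
    return blocks, edges

chain_blocks = {
    "b1": ["push 10", "jmp b2"],
    "b2": ["pop eax", "jmp b3"],
    "b3": ["add eax, 1", "ret"],
}
chain_edges = {"b1": ["b2"], "b2": ["b3"], "b3": []}
merged_blocks, merged_edges = merge_blocks(chain_blocks, chain_edges)
```

In this example the three-block chain collapses into a single block `b1` containing the four non-jump instructions.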

slide-18
SLIDE 18

Block Merging

[Figure: before/after comparison]

slide-19
SLIDE 19

Constant Propagation and Folding

  • Two different concepts
  • One algorithm in our implementation
  • Partial evaluation of the input code
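
To show how propagation and folding can live in one pass, here is a small sketch over straight-line code. The tuple format `(op, src1, src2, dst)` only loosely imitates three-operand REIL instructions, and the 32-bit mask, the restriction to a single basic block, and the name `propagate_and_fold` are assumptions made for the example.

```python
import operator

# Operations that can be evaluated when both inputs are constants.
FOLDABLE = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "and": operator.and_, "or": operator.or_, "xor": operator.xor,
}
MASK = 0xFFFFFFFF  # assumption: 32-bit registers

def propagate_and_fold(code):
    known = {}    # register -> constant value currently held
    result = []
    for op, a, b, dst in code:
        # Propagation: replace registers by their known constants.
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "str" and isinstance(a, int):
            known[dst] = a
            result.append(("str", a, None, dst))
        elif op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            # Folding: compute the result now and emit a plain store.
            value = FOLDABLE[op](a, b) & MASK
            known[dst] = value
            result.append(("str", value, None, dst))
        else:
            known.pop(dst, None)  # dst no longer holds a known constant
            result.append((op, a, b, dst))
    return result

folded = propagate_and_fold([
    ("str", 10, None, "eax"),
    ("add", "eax", 10, "eax"),
    ("add", "eax", "ebx", "ecx"),
])
```

The second instruction folds to a store of 20, and the third keeps its unknown operand `ebx` but reads the constant 20 instead of `eax`.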
slide-20
SLIDE 20

Constant Propagation and Folding

[Figure: before/after comparison]

slide-21
SLIDE 21

Dead Branch Elimination

  • Removes branches that are never executed

– Turns conditional jumps into unconditional jumps
– Removes code from unreachable branch

  • Requires constant propagation/folding
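
The two steps of dead branch elimination can be sketched as follows, assuming constant propagation has already resolved some branch conditions. The graph encoding (block name mapped to a condition plus taken and fall-through targets) and the function name `eliminate_dead_branches` are hypothetical; a condition of `None` stands for "not known statically".

```python
def eliminate_dead_branches(entry, branches):
    """branches: block -> (condition, taken_target, fallthrough_target).
    condition is an int if it was resolved to a constant, else None."""
    simplified = {}
    for block, (cond, taken, fallthrough) in branches.items():
        if cond is None:
            succs = [taken, fallthrough]   # both outcomes stay possible
        elif cond != 0:
            succs = [taken]                # branch is always taken
        else:
            succs = [fallthrough]          # branch is never taken
        simplified[block] = [s for s in succs if s is not None]

    # Drop every block that can no longer be reached from the entry.
    reachable, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(simplified[block])
    return {b: s for b, s in simplified.items() if b in reachable}

pruned = eliminate_dead_branches("entry", {
    "entry": (1, "real", "junk"),    # condition folded to a constant
    "real": (None, "exit", None),
    "junk": (None, "exit", None),
    "exit": (None, None, None),
})
```

Here the always-true condition in `entry` leaves `junk` without predecessors, so it disappears from the result.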
slide-22
SLIDE 22

Dead Branch Elimination

[Figure: before/after comparison]

slide-23
SLIDE 23

Dead Code Elimination

  • Removes code that computes unused values
  • Gets rid of inserted garbage code
  • Cleans up after constant propagation/folding
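
A minimal sketch of dead code elimination on straight-line code, using a backward liveness pass. It assumes every instruction only writes its destination register and has no other effect (no memory writes, no flags), which is a simplification; the tuple format and the name `eliminate_dead_code` are illustrative.

```python
def eliminate_dead_code(code, live_out):
    """Keep only instructions whose result is read later.
    code: list of (op, src1, src2, dst); live_out: registers needed
    after the last instruction."""
    live = set(live_out)
    kept = []
    for op, a, b, dst in reversed(code):
        if dst not in live:
            continue  # the value is never read: drop the instruction
        live.discard(dst)  # this write satisfies the later reads
        live.update(x for x in (a, b) if isinstance(x, str))
        kept.append((op, a, b, dst))
    kept.reverse()
    return kept

cleaned = eliminate_dead_code(
    [
        ("str", 10, None, "eax"),    # overwritten below, never read
        ("str", 99, None, "ebx"),    # garbage
        ("add", "ebx", 1, "ebx"),    # garbage
        ("str", 20, None, "eax"),
    ],
    live_out={"eax"},
)
```

Only the final store to `eax` survives: the garbage computation on `ebx` and the store that folding made redundant are both removed.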
slide-24
SLIDE 24

Dead Code Elimination

[Figure: before/after comparison]

slide-25
SLIDE 25

Dead Store Elimination

  • Comparable to dead code elimination
  • Removes useless memory write accesses
  • Limited to stack access in our implementation
  • Only platform-specific part of our optimizer
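
The idea of removing useless stack writes can be sketched with a backward pass over a single block. The instruction encoding (`"store"` and `"load"` tuples with constant stack offsets) and the name `eliminate_dead_stores` are assumptions for illustration; the sketch also assumes no aliasing through other pointers and treats every slot as live at the end of the block.

```python
def eliminate_dead_stores(code):
    """Drop stack stores that are overwritten before being read.
    code: list of ("store", slot, value) / ("load", slot, dst) tuples."""
    overwritten = set()  # slots written later with no load in between
    kept = []
    for instr in reversed(code):
        kind = instr[0]
        if kind == "store":
            slot = instr[1]
            if slot in overwritten:
                continue  # a later store wins and nobody reads this one
            overwritten.add(slot)
        elif kind == "load":
            overwritten.discard(instr[1])  # the slot is read here
        kept.append(instr)
    kept.reverse()
    return kept

stores_cleaned = eliminate_dead_stores([
    ("store", -4, 1),       # dead: overwritten by the next store
    ("store", -4, 2),
    ("load", -4, "eax"),
    ("store", -8, 3),       # kept: may be read after this block
])
```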
slide-26
SLIDE 26

Dead Store Elimination

[Figure: before/after comparison]

slide-27
SLIDE 27

Suddenly it dawned on us: Deobfuscation for RE brings new problems that do not exist in other areas

slide-28
SLIDE 28

Let's get some help

slide-29
SLIDE 29

Problem: Side effects

Before:
push 10
pop eax

After:
mov eax, 10

Removed code was used

  • in a CRC32 integrity check
  • as key of a decryption routine
  • as part of an anti-debug check
  • ...
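
The integrity-check case can be made concrete with the raw instruction bytes. The sketch below uses the standard 32-bit x86 encodings (`6A 0A` for `push 10`, `58` for `pop eax`, `B8 0A 00 00 00` for `mov eax, 10`) and Python's `zlib.crc32`; the idea that a program checksums exactly these bytes is an assumption made for the example.

```python
import zlib

# Machine code of the original and of the "optimized" sequence.
ORIGINAL = bytes.fromhex("6a0a58")        # push 10 ; pop eax
REWRITTEN = bytes.fromhex("b80a000000")   # mov eax, 10

def code_checksum(code: bytes) -> int:
    """CRC32 over code bytes, as a self-checking program might compute."""
    return zlib.crc32(code)

original_crc = code_checksum(ORIGINAL)
rewritten_crc = code_checksum(REWRITTEN)

# Both sequences leave 10 in eax, but a program that checksums its own
# code sees different bytes (and a different length) after the rewrite.
checksum_still_valid = original_crc == rewritten_crc
```

The two sequences are equivalent as far as `eax` is concerned, yet `checksum_still_valid` is false, so a self-check in the target would fail after deobfuscation.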
slide-30
SLIDE 30

Problem: Code Blowup

Before:
mov eax, 10
add eax, 10

After:
mov eax, 20
clc
...

Good luck setting

  • AF
  • CF
  • OF
  • PF
  • ZF
slide-31
SLIDE 31

Problem: Moving addresses

Before:
0000: jmp ecx
0002: push 10
0003: pop eax

After:
0000: jmp ecx
0002: mov eax, 10

ecx is 0003, but static analysis cannot know this

We just missed the pop instruction

slide-32
SLIDE 32

Problem: Inability to debug

Input: executable file

Output: mov eax, 10 (a deobfuscated list of instructions, but no executable file)

slide-33
SLIDE 33

The only way to solve all* problems:

A full-blown native code compiler with an integrated optimizer

* except for the side-effects issue

Too much work, maybe we can approximate ...

slide-34
SLIDE 34

Only generate optimized REIL code

[Figure: before/after comparison]

slide-35
SLIDE 35

Only generate optimized REIL code

  • Produces excellent input for other analysis algorithms
  • Code blow-up solved
  • Keeps address/instruction mapping
  • Code cannot be debugged natively but interpreted
  • Side effects problem remains
  • Pretty much unreadable for human reverse engineers

slide-36
SLIDE 36

Effect comments

[Figure: before/after comparison]

slide-37
SLIDE 37

Effect comments

  • Results can easily be used by human reverse engineers
  • Code blow-up solved
  • Side effects problem remains
  • Address mapping problem
  • Code cannot be debugged
  • Comments have semantic meaning

slide-38
SLIDE 38

Extract formulas from code

[Figure: before/after comparison]

slide-39
SLIDE 39

Extract formulas from code

  • Results can easily be used by human reverse engineers
  • No code generation necessary, only extraction of semantic information
  • Solves all problems because original program remains unchanged
  • Not really deobfuscation (but produces similar result?)
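
Formula extraction can be sketched as a symbolic walk over straight-line code that never rewrites the program and only records what each register ends up holding. The tuple format, the operator table, and the name `extract_formulas` are assumptions for this example; register names that appear inside a formula stand for the values those registers had on entry.

```python
SYMBOLS = {"add": "+", "sub": "-", "mul": "*",
           "and": "&", "or": "|", "xor": "^"}

def extract_formulas(code):
    """Return register -> expression string describing its final value.
    code: list of (op, src1, src2, dst) tuples; "str" is a plain copy."""
    exprs = {}

    def value_of(operand):
        if isinstance(operand, int):
            return str(operand)
        # Unwritten registers stand for their value on entry.
        return exprs.get(operand, operand)

    for op, a, b, dst in code:
        if op == "str":
            exprs[dst] = value_of(a)
        else:
            exprs[dst] = f"({value_of(a)} {SYMBOLS[op]} {value_of(b)})"
    return exprs

formulas = extract_formulas([
    ("str", "ebx", None, "t0"),
    ("add", "t0", 5, "t1"),
    ("xor", "t1", "ecx", "eax"),
])
```

For the three instructions above, `formulas["eax"]` is the string `((ebx + 5) ^ ecx)`, which a reverse engineer can read directly while the original code stays untouched.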

slide-40
SLIDE 40

Implement a small pseudo-compiler

[Figure: before/after comparison]

slide-41
SLIDE 41

Implement a small pseudo-compiler

  • This is what we did
  • Closest thing to the real deal
  • Code blow-up is solved

– Partially

  • Natively debug the output

– not in our case
– pseudo x86 instructions

  • Side effects problem remains
  • Address mapping problem remains
  • Why not go for a complete compiler?

slide-42
SLIDE 42

Economic value in creating a complete optimizing compiler for RE?

Not for us

  • Small company
  • Limited market
  • Wrong approach?
slide-43
SLIDE 43

Alternative Approaches

  • Deobfuscator built into disassembler
  • REIL-based formula extraction
  • Hex-Rays Decompiler
  • Code optimization and generation based on LLVM

  • Emulation / Dynamic deobfuscation
slide-44
SLIDE 44

Conclusion

  • The concept of static deobfuscation is sound

– Except for things like side-effects, SMC, ...

  • A lot of work
  • Expression reconstruction might be much easier and still produce comparable results

slide-45
SLIDE 45

Related work

  • A taxonomy of obfuscating transformations
  • Defeating polymorphism through code optimization
  • Code Normalization for Self-Mutating Malware
  • Software transformations to improve malware detection
  • Zeroing in on Metamorphic Computer Viruses
  • ...
slide-46
SLIDE 46

http://www.flickr.com/photos/marcobellucci/3534516458/