How to really obfuscate your PDF malware - Sebastian Porst - ReCon 2010




SLIDE 1

How to really obfuscate your PDF malware

Sebastian Porst - ReCon 2010
Email: sebastian.porst@zynamics.com
Twitter: @LambdaCube

SLIDE 2

Targeted Attacks 2008

  • Adobe Acrobat Reader: 28.61%
  • Microsoft PowerPoint: 16.87%
  • Microsoft Excel: 19.97%
  • Microsoft Word: 34.55%

Source: http://www.f-secure.com/weblog/archives/00001676.html

SLIDE 3

Targeted Attacks 2009

  • Adobe Acrobat Reader: 48.87%
  • Microsoft PowerPoint: 4.52%
  • Microsoft Excel: 7.39%
  • Microsoft Word: 39.22%

SLIDE 4

Exploited in the wild

  • CVE-2007-5659
  • CVE-2008-2992
  • CVE-2009-0658
  • CVE-2009-0927
  • CVE-2009-1492
  • CVE-2009-3459
  • CVE-2009-4324
  • CVE-2010-0188

SLIDE 5

Four common exploit paths

  • Broken PDF parser
  • Vulnerable JavaScript engine
  • Vulnerable external libraries
  • /Launch

SLIDE 6

PDF Malware Obfuscation

Different tricks for different purposes

  • Make manual analysis more difficult
  • Resist automated analysis
  • Avoid detection by virus scanners

SLIDE 7

PDF Malware Obfuscation

Conflicting goals

  • Avoid detection by being well-formed
  • Make analysis difficult by being malformed

SLIDE 8

How to achieve these goals

Being harmless

  • Avoid JavaScript
  • Do not use unusual encodings
  • Do not try to break parser-based tools

Being evil

  • Ideally use a 0-day
  • Use heavy obfuscation
  • Try to break tools

SLIDE 9

Let's be evil

SLIDE 10

Breaking tools

SLIDE 11

Rule #1: Do the unexpected

SLIDE 12

This is what tools expect

  • ASCII strings
  • Boring encodings like #41 instead of A
  • Well-formed or only moderately malformed PDF file structure
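A parser-based tool has to undo the "boring" encodings before it can match on names. As a minimal sketch, assuming the #41 example refers to the #xx hex escapes that PDF name objects allow (the helper name decodePdfName is made up for illustration):

```javascript
// Decode #xx hex escapes in a PDF name token.
// Anything that is not "#" followed by two hex digits is left untouched.
function decodePdfName(name) {
  return name.replace(/#([0-9A-Fa-f]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(decodePdfName("#41"));
console.log(decodePdfName("/J#61v#61Script"));
```

After normalisation, an escaped spelling such as /J#61v#61Script compares equal to /JavaScript, so a signature on the plain name still matches.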

SLIDE 13

Malformed documents

  • Adobe Reader tries to load malformed PDF files
  • Very, very liberal interpretation of the PDF specification
  • Parser-based analysis tools need to know about Adobe Reader file correction

SLIDE 14

Malformed PDF file – Example I

7 0 obj
<<
    /Type /Action
    /S /JavaScript
    /JS (app.alert('whatever');)
>>
endobj

SLIDE 15

Malformed PDF file – Example II

5 0 obj
<< /Length 45 >>
stream
some data
endstream
endobj
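One way an analysis tool can cope with an object like Example II, assuming the point of the example is that the declared /Length (45) does not match the data actually present: do not trust the declared length, and scan for the endstream keyword instead. This is only a sketch on strings; a real tool works on bytes, and extractStream is a hypothetical helper, not a description of what Adobe Reader does internally.

```javascript
// Extract stream data by locating the "stream" / "endstream" keywords
// and report whether the declared /Length agrees with what was found.
function extractStream(objText) {
  const lengthMatch = /\/Length\s+(\d+)/.exec(objText);
  const declaredLength = lengthMatch ? parseInt(lengthMatch[1], 10) : null;

  const keyword = objText.indexOf("stream");
  if (keyword < 0) return null;

  // Skip the end-of-line marker that follows the "stream" keyword.
  let start = keyword + "stream".length;
  if (objText.startsWith("\r\n", start)) start += 2;
  else if (objText[start] === "\n" || objText[start] === "\r") start += 1;

  const end = objText.indexOf("endstream", start);
  if (end < 0) return null;

  // Drop the end-of-line marker that precedes "endstream".
  const data = objText.slice(start, end).replace(/\r?\n$/, "");
  return { declaredLength, data, lengthMatches: declaredLength === data.length };
}

const sample =
  "5 0 obj\n<< /Length 45 >>\nstream\nsome data\nendstream\nendobj";
console.log(extractStream(sample));
```

A tool that reads exactly 45 bytes after the stream keyword would run past the real data here; the keyword scan recovers "some data" and flags the mismatch.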

SLIDE 16

Further reading

SLIDE 17

Obfuscating JavaScript code

SLIDE 18

Goal of JavaScript obfuscation

Hide the shellcode

SLIDE 19

JavaScript obfuscation in the wild

  • Screwed up formatting
  • Name obfuscation
  • Eval-chains
  • Splitting JavaScript code
  • Simple anti-emulation techniques
  • callee-trick
  • ...

SLIDE 20

Screwed up formatting

  • Basically just remove all newlines
  • Completely useless: jsbeautifier.org restores the formatting

SLIDE 21

Name obfuscation

  • Variables or function names are renamed to hide their meaning
  • Most JavaScript obfuscators screw this up

SLIDE 22

Obfuscation example: Original code

function executePayload(payload, delay) {
    if (delay > 1000) {
        // Whatever
    }
}

function heapSpray(code, repeat) {
    for (i=0;i<repeat;i++) {
        code = code + code;
    }
}

SLIDE 23

Obfuscation without considering scope

function executePayload(hkof3ewhoife, fhpfewhpofe) {
    if (fhpfewhpofe > 1000) {
        // Whatever
    }
}

function heapSpray(hoprwehjoprew, hoifwep43) {
    for (jnpfw93=0;jnpfw93<hoifwep43;jnpfw93++) {
        hoprwehjoprew = hoprwehjoprew + hoprwehjoprew;
    }
}

SLIDE 24

Obfuscation considering scope

function executePayload(grtertttrr, hnpfefwefee) {
    if (hnpfefwefee > 1000) {
        // Whatever
    }
}

function heapSpray(grtertttrr, hnpfefwefee) {
    for (hjnprew=0;hjnprew<hnpfefwefee;hjnprew++) {
        grtertttrr = grtertttrr + grtertttrr;
    }
}

SLIDE 25

Obfuscation: Going the whole way

function ____(____, _____) {
    if (_____ > 1000) {
        // Whatever
    }
}

function _____(____, _____) {
    for (______=0; ______<_____; ______++) {
        ____ = ____ + ____;
    }
}

SLIDE 26

Name obfuscation: Lessons learned

  • Consider name scope
    – Deobfuscator needs to know scoping rules too
  • Use underscores
    – Drives human analysts crazy
  • Also cute: Use meaningful names that have nothing to do with the variable
    – Maybe shuffle real variable names

SLIDE 27

Eval chains

  • JavaScript code can execute JavaScript code in strings through eval
  • Often used to hide later code stages which are decrypted on the fly
  • Common way to extract the argument: replace eval with a printing function
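The extraction approach from the last bullet can be sketched as follows for a standard JavaScript engine. The script is wrapped in a function whose parameter is named eval, so every eval call inside it reaches a recording hook instead of the real eval. captureEvalArguments is a hypothetical helper, and new Function is not a sandbox, so something like this should only ever be run inside an isolated analysis environment.

```javascript
// Run `source` with `eval` shadowed by a hook that records its argument
// instead of executing it.
function captureEvalArguments(source) {
  const captured = [];
  const hook = (code) => {
    captured.push(String(code));
    return undefined; // the next stage is recorded, not executed
  };
  // Bodies created by the Function constructor are sloppy-mode code,
  // so a parameter named "eval" is permitted and shadows the built-in.
  const run = new Function("eval", source);
  run(hook);
  return captured;
}

// Harmless stand-in for a first stage that assembles its second stage.
const stageOne =
  'var a = "app.al"; var b = "ert(\'stage 2\');"; eval(a + b);';
console.log(captureEvalArguments(stageOne));
```

Because the hook records the string and returns without running it, this only reveals one stage at a time, and it stops being sufficient as soon as a later stage depends on state that an earlier eval was supposed to create.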

SLIDE 28

Eval chains: Doing it better

  • Make sure your later stages reference variables or functions from earlier stages
  • Re-use individual eval statements multiple times to make sure eval calls cannot just be replaced

SLIDE 29

JavaScript splitting

  • JavaScript can be split over several PDF objects
  • These scripts can be executed consecutively
  • Context is preserved between scripts
  • In the wild I've seen splitting across 2-4 objects

SLIDE 30

JavaScript splitting: Doing it better

  • One line of JavaScript per object
  • Randomize the order of JavaScript objects
  • Admittedly it takes only one script to sort and extract the scripts from the objects
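The "one script" from the last bullet might look like this on the analysis side. It assumes the objects have already been parsed into a map from object number to an entry holding the script text and the number of the object that runs next (for example taken from the /Next entry of an action dictionary). How a given sample actually chains its scripts is not stated here, so the data layout and the name collectScripts are illustrative only.

```javascript
// Follow the chain of script objects from `firstId` and join the
// fragments in execution order, regardless of their order in the file.
function collectScripts(objects, firstId) {
  const parts = [];
  const seen = new Set(); // guards against reference loops
  let id = firstId;
  while (id !== null && id !== undefined && objects.has(id) && !seen.has(id)) {
    seen.add(id);
    const entry = objects.get(id);
    parts.push(entry.js);
    id = entry.next;
  }
  return parts.join("\n");
}

// Hypothetical parse result: file order is 12, 7, 31; run order is 7, 31, 12.
const parsedObjects = new Map([
  [12, { js: "third();", next: null }],
  [7, { js: "first();", next: 31 }],
  [31, { js: "second();", next: 12 }],
]);
console.log(collectScripts(parsedObjects, 7));
```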

SLIDE 31

Anti-emulation code

  • Simple checks for Adobe Reader extensions
  • Multi-staged JavaScript code

SLIDE 32

Current malware loads code from

  • Pages
  • Annotations
  • Info Dictionary

SLIDE 33

Example: Loading code from annotations

y = app.doc;
y.syncAnnotScan();
var p = y["getAnnots"]({nPage: 0});
var s = p[0].subject;
eval(s);

SLIDE 34

Problems with current approaches

  • Code is in the file
  • Easy to extract

SLIDE 35

Anti-emulation code: Improved

Key ideas behind anti-emulation code

  • Find idiosyncrasies in the Adobe JavaScript engine
  • Find extensions that are difficult to emulate

SLIDE 36

Exhibit A: Idiosyncrasy

cypher = [7, 17, 28, 93, 4, 10, 4, 30, 7, 77, 83, 72];
cypherLength = cypher.length;
hidden = "ThisIsNotTheKeyYouAreLookingFor";
hiddenLength = hidden.toString().length;
for(i=0,j=0;i<cypherLength;i++,j++) {
    cypherChar = cypher[i];
    keyChar = hidden.toString().charCodeAt(j);
    cypher[i] = String.fromCharCode(cypherChar ^ keyChar);
    if (j == hiddenLength - 1) j = -1;
}
eval(cypher.join(""));

SLIDE 37

Exhibit A: Explained

JavaScript Standard:

    hidden = false;
    hidden = "Key";

    hidden has the value "Key"

Adobe Reader JavaScript:

    hidden = false;
    hidden = "Key";

    hidden has the value "true"

SLIDE 38

Exhibit A: Explained

The Adobe Reader JavaScript engine defines global variables that do not change their type on assignment.

(I suspect this happens because they are backed by C++ code)

SLIDE 39

Exhibit B: Difficult to emulate

  • Goal: Find Adobe JavaScript API functions which are nearly impossible to emulate
  • Then use effects of these functions in sneaky ways to change malware behavior
  • The Adobe Reader JavaScript documentation is your friend

SLIDE 40

Exhibit B: Difficult to emulate

Functions to look for

  • Rendering engine
  • Forms extensions
  • Multimedia extensions

SLIDE 41

Exhibit B: Difficult to emulate

crypt = "T^_]^[T IEYYD__ FuRRKBD ";
plain = Array();
key = getPageNthWordQuads(0, 0).toString().split(",")[1];
for (i=0,j=0;i<crypt.length;i++,j++) {
    plain = plain + String.fromCharCode((crypt.charCodeAt(i) ^ key.charCodeAt(j)));
    if (j >= key.length) j = 0;
}
app.alert(plain);

SLIDE 42

Exhibit B: Difficult to emulate

Functions to avoid

  • Anything with security restrictions

SLIDE 43

Exhibit C: Multi-threaded JavaScript

  • Multi-threaded applications are difficult to reverse engineer
  • Problem: There are no threads in JavaScript
  • Solution: setTimeOut
  • Example: Cooperative multi-threading with message-passing between objects

SLIDE 44

Basic idea

  • Multiple server objects
  • String messages are passed between servers
  • Messages contain new timeout value and code to evaluate
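For an emulator, the practical consequence is that setTimeOut has to be modelled with a timer queue, because the order in which the delayed strings run is part of the program. A minimal sketch of such a queue with a virtual clock follows; VirtualTimers and its methods are hypothetical names, and the scheduled strings are only logged here rather than evaluated.

```javascript
// Timer queue with a virtual clock: nothing actually waits, entries are
// simply processed in the order in which they would have fired.
class VirtualTimers {
  constructor() {
    this.now = 0;      // virtual time in milliseconds
    this.pending = []; // scheduled entries, unsorted
    this.nextSeq = 0;  // tie-breaker: insertion order
  }

  // Schedule a code string to run `delay` milliseconds from now.
  setTimeOut(code, delay) {
    const entry = { due: this.now + delay, seq: this.nextSeq++, code };
    this.pending.push(entry);
    return entry;
  }

  // Process entries in due-time order; `execute` may schedule more.
  // `maxSteps` bounds the run so a self-rescheduling message terminates.
  run(execute, maxSteps = 1000) {
    let steps = 0;
    while (this.pending.length > 0 && steps < maxSteps) {
      this.pending.sort((a, b) => a.due - b.due || a.seq - b.seq);
      const entry = this.pending.shift();
      this.now = entry.due;
      execute(entry.code, this);
      steps++;
    }
    return steps;
  }
}

const timers = new VirtualTimers();
const order = [];
timers.setTimeOut("B", 200);
timers.setTimeOut("A", 100);
timers.run((code, t) => {
  order.push(code);
  if (code === "A") t.setTimeOut("C", 50); // due at 150, before "B"
});
console.log(order.join(" "));
```

The message scheduled while "A" runs overtakes "B", which was queued first; an emulator that ran timeouts in registration order would see a different sequence.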

SLIDE 45

function Server(name) {
    ...
}

s1 = new Server("S1");
s2 = new Server("S2");

s1.receive(ENCODED_MESSAGE);

SLIDE 46

function Server(name) {
    this.name = name;
    this.receive = function(message) {
        recipient = parse_recipient(message);
        delayTime = parse_delay(message);
        eval_string = parse_eval_string(message);
        msg_string = parse_message_string(message);
        eval(eval_string);
        command = "recipient.receive('" + msg_string + "')";
        this.x = app.setTimeOut(command, delayTime);
    }
};

SLIDE 47

How to improve this

  • Use a global string object as the message queue and manipulate the object on the fly
  • Use non-commutative operations so that execution order really matters
  • Message broadcasting
  • Add anti-emulation code to eval-ed code

SLIDE 48

callee-trick

  • Not specific to Adobe Reader
  • Frequently used by JavaScript code in other contexts
  • Function accesses its own source and uses it as a key to decrypt code or data
  • Add a single whitespace and decryption fails
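The last bullet is the reason an analyst cannot simply reformat or instrument such a function. The sketch below shows the effect in a standard JavaScript engine, with a plain XOR routine standing in for the decryption. It uses Function.prototype.toString rather than arguments.callee, assumes an engine that returns the exact source text of a function, and all names are made up for illustration.

```javascript
// XOR `text` against `key`, repeating the key as needed.
// Applying it twice with the same key restores the input.
function xorWithKey(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(
      text.charCodeAt(i) ^ key.charCodeAt(i % key.length)
    );
  }
  return out;
}

// Two functions that behave identically; the second has one extra space.
const original = function f(x) { return x + 1; };
const edited = function f(x) {  return x + 1; };

const message = "harmless demo text, long enough to cover the edit";
const cipher = xorWithKey(message, original.toString());

// The untouched source reproduces the message, the edited one does not.
console.log(xorWithKey(cipher, original.toString()) === message);
console.log(xorWithKey(cipher, edited.toString()) === message);
```

Both functions return the same values for every input, yet only the untouched one yields the right key, so beautifying the code or inserting a logging statement silently breaks the decryption.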

SLIDE 49

callee-trick Example

function decrypt(cypher) {
    var key = arguments.callee.toString();
    for (var i = 0; i < cypher.length; i++) {
        plain = key.charCodeAt(i) ^ cypher.charCodeAt(i);
    }
    ...
}

SLIDE 50

More ideas for the future

  • Combine anti-debugging, callee-trick, and message passing
  • Find more JavaScript engine idiosyncrasies: Sputnik JavaScript test suite

SLIDE 51

Thanks

  • Didier Stevens
  • Julia Wolf
  • Peter Silberman
  • Bruce Dang
