Mining Malware Secrets Paul Black Arun Lakhotia Federation - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Mining Malware Secrets

Paul Black

Federation University

Arun Lakhotia

University of Louisiana at Lafayette

slide-2
SLIDE 2

Introductions

Paul Black

  • Malware Analyst – 5 years
  • PhD Candidate, ICSL, Federation Uni
  • Masters thesis (2013):

– Decryption of Zeus Configuration File

Arun Lakhotia

  • Professor of Computer Science
  • CEO, Cythereal, Inc.
  • Malware Analysis Research: 12+ years
  • Participant in DARPA Cyber Genome
  • Other AFRL/ARO projects.

slide-3
SLIDE 3

Secrets Hidden in Malware

  • Decryption Keys
  • C2 Server(s)
  • DGA keys
  • Malware Version
slide-4
SLIDE 4

Example: Citadel - decrypting secret

0x11a23:                        ; Copy configuration buffer
    push esi
    mov edx,56CH
    push edx
    push 403AA8H
    push eax
    call 1443AH                 ; Mem Copy
0x11a35:                        ; Access decryption key
    mov ecx,dptr(4342FCH)
    add ecx,dptr(4346DCH)
    mov esi,edx
    sub ecx,eax
0x11a45:                        ; Decryption loop
    mov dl,bptr(esp+eax*1)
    xor bptr(eax),dl
    inc eax
    dec esi
    jnz 11A45H
0x11a4e:
    pop esi
    retn
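The three stages labelled on this slide (memory copy, key access, XOR loop) can be sketched in Python. This is a minimal sketch, not the malware's actual routine: the function name, the `image` argument, and the use of plain buffer offsets in place of virtual addresses are all hypothetical. It also assumes the loop reads each key byte at the same index as the byte it decrypts.

```python
def decrypt_config(image, config_offset, config_size, key_offset):
    """XOR-decrypt a configuration buffer held inside a memory image.

    All parameters are hypothetical: offsets index into `image` directly,
    whereas real code would first map virtual addresses to dump offsets.
    """
    # "Mem Copy": work on a private copy of the encrypted buffer.
    buf = bytearray(image[config_offset:config_offset + config_size])
    # "Access decryption key": key bytes of the same length as the buffer.
    key = image[key_offset:key_offset + config_size]
    if len(key) < len(buf):
        raise ValueError("key region is shorter than the configuration buffer")
    # "Decryption loop": one XOR per byte, key index tracks buffer index.
    for i in range(len(buf)):
        buf[i] ^= key[i]
    return bytes(buf)


# Round trip on synthetic data (not real malware bytes).
plain = b"example-config"
key = bytes(range(1, len(plain) + 1))
image = bytes(a ^ b for a, b in zip(plain, key)) + key
recovered = decrypt_config(image, 0, len(plain), len(plain))
```

The three values the function needs (buffer start, buffer size, key location) are exactly the secrets the rest of the deck sets out to extract.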

slide-5
SLIDE 5

Example: Citadel - parameters

41C6EA  56                 push esi
41C6EB  BA 6C 05 00 00     mov edx, 56Ch            ; Config Buffer Size
41C6F0  52                 push edx
41C6F1  68 A8 3A 40 00     push 403AA8H             ; Config Buffer Start Pointer
41C6F6  50                 push eax
41C6F7  E8 C5 C9 00 00     call 143AAH
41C6FC  8B 0D FC 42 43 00  mov ecx, dptr(4342FCH)   ; Xor key
41C702  03 0D DC 46 43 00  add ecx, dptr(4346DCH)   ; Xor key
41C708  8B F2              mov esi, edx
41C70A  2B C8              sub ecx, eax

slide-6
SLIDE 6

Example: Citadel – YARA Rule

41C6EA  56
41C6EB  BA 6C 05 00 00     ; Config Buffer Size
41C6F0  52
41C6F1  68 A8 3A 40 00     ; Config Buffer Start Pointer
41C6F6  50
41C6F7  E8 C5 C9 00 00
41C6FC  8B 0D FC 42 43 00  ; Xor Key
41C702  03 0D DC 46 43 00  ; Xor Key
41C708  8B F2
41C70A  2B C8

56 BA [4] 52 68 [4] 50 E8 [4] 8B [5] 03 [5] 8B F2 2B C8
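To make the hex pattern concrete, here is a rough Python counterpart built on the standard `re` module rather than on YARA itself. Each `[n]` jump becomes `.{n}`, and the skipped operand bytes are captured so the parameters can be read out. The group names (`size`, `config`, `key_a`, `key_b`) and the `extract` helper are illustrative assumptions, not part of the original rule.

```python
import re
import struct

# Bytes of the Citadel fragment shown on the slide.
CODE = bytes.fromhex(
    "56"            # push esi
    "BA6C050000"    # mov edx, 56Ch
    "52"            # push edx
    "68A83A4000"    # push 403AA8h
    "50"            # push eax
    "E8C5C90000"    # call (Mem Copy)
    "8B0DFC424300"  # mov ecx, dptr(4342FCh)
    "030DDC464300"  # add ecx, dptr(4346DCh)
    "8BF2"          # mov esi, edx
    "2BC8"          # sub ecx, eax
)

# Counterpart of: 56 BA [4] 52 68 [4] 50 E8 [4] 8B [5] 03 [5] 8B F2 2B C8
# Each [5] is split into one ModRM byte plus a captured 4-byte address.
PATTERN = re.compile(
    rb"\x56\xBA(?P<size>.{4})"
    rb"\x52\x68(?P<config>.{4})"
    rb"\x50\xE8.{4}"
    rb"\x8B.(?P<key_a>.{4})"
    rb"\x03.(?P<key_b>.{4})"
    rb"\x8B\xF2\x2B\xC8",
    re.DOTALL,  # let "." match every byte value, including 0x0A
)


def extract(code):
    """Return the captured little-endian dwords, or None if the rule misses."""
    match = PATTERN.search(code)
    if match is None:
        return None
    return {name: struct.unpack("<I", raw)[0]
            for name, raw in match.groupdict().items()}


params = extract(CODE)
print({name: hex(value) for name, value in params.items()})
```

Note that a miss is reported only as `None`, which is the same silent-failure behaviour the next slide criticizes.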

slide-7
SLIDE 7
Mining Malware for Fun & Profit

  • Tools

– Debugger
– Volatility
– IDA
– YARA

  • Automation Steps

– Run malware
– Dump memory
– Run Volatility plugin
– * Locate code segment
– * Extract secrets

slide-8
SLIDE 8

YARA: Yet Another Recursive Analyzer

  • Developer: Victor Alvarez, VirusTotal
  • Like grep over binaries
  • Regular expressions over binaries
  • Does not use program or file structure
  • File type agnostic
  • Easy to use and effective
slide-9
SLIDE 9
YARA - Disadvantages

  • Rules require exact match of bytes

– Strings, Data, Instructions

  • Easily broken by small changes

– Malware is frequently updated

  • Rules fail silently

– Requires manual verification

  • Silent failures

– Because sample is old
– Might be unknown variant
– Updated version of known malware
– May not be the expected family
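The brittleness can be shown with the Citadel byte pattern from the earlier slide, again using Python's `re` as a stand-in for YARA. The variant bytes below are an assumption: they are hand-assembled from the second listing on the "Two versions of the same function" slide, using the absolute addresses from the parameters table and a placeholder call displacement. They are not taken from a real sample.

```python
import re

# The Citadel pattern: 56 BA [4] 52 68 [4] 50 E8 [4] 8B [5] 03 [5] 8B F2 2B C8
RULE = re.compile(
    rb"\x56\xBA.{4}\x52\x68.{4}\x50\xE8.{4}\x8B.{5}\x03.{5}\x8B\xF2\x2B\xC8",
    re.DOTALL,
)

# Bytes shown on the slides for the first version.
ORIGINAL = bytes.fromhex(
    "56 BA6C050000 52 68A83A4000 50 E8C5C90000 "
    "8B0DFC424300 030DDC464300 8BF2 2BC8"
)

# Hypothetical encoding of the second version (hand-assembled assumption).
VARIANT = bytes.fromhex(
    "56"            # push esi
    "BA30030000"    # mov edx, 330h
    "52"            # push edx
    "68802D4000"    # push 402D80h
    "50"            # push eax
    "E800000000"    # call, placeholder displacement
    "8B3580094200"  # mov esi, dptr(420980h)
    "8B0DD4044200"  # mov ecx, dptr(4204D4h)
    "03CE"          # add ecx, esi
    "2BC8"          # sub ecx, eax
    "8BF2"          # mov esi, edx
)


def rule_matches(code):
    """True when the byte pattern occurs anywhere in `code`."""
    return RULE.search(code) is not None


print(rule_matches(ORIGINAL), rule_matches(VARIANT))
```

Both fragments compute the same key pointer, yet the variant loads one operand through `esi` and reorders the last two instructions, so the rule simply reports no match and gives no hint as to why.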

slide-10
SLIDE 10

Alternative: Use Semantics

slide-11
SLIDE 11

Recap: Requirement for Automation

  • Support the process

– Step 1: Find relevant code in binary – Step 2: Extract parameters about secrets from code

  • Should be resistant to code changes

– Ideally, one set of rules for versions AND variants of the same malware family

slide-12
SLIDE 12

Code vs Semantics

Code

push ebp
mov ebp,esp
sub esp,4
mov eax, DWORD [ebp+4]
mov DWORD [ebp+8],eax
mov eax, DWORD [ebp]
mov DWORD [ebp-4],eax

  • Instruction dependent
  • Order dependent

Semantics

eax = def(ebp)
ebp = -4+def(esp)
esp = -8+def(esp)
memdw(-8+def(esp)) = def(ebp)
memdw(-4+def(esp)) = def(ebp)
memdw(4+def(esp)) = def(memdw(def(esp)))

  • Instruction independent
  • Order independent

slide-13
SLIDE 13

Semantics Neutralizes Polymorphism

mov(ecx,ebp)
sub(ecx,63)
mov(dptr(ecx+59),eax)
pop(ecx)
lea(eax,wptr(ebp-28))
push(edi)
mov(edi,1148415812)

push(esi) mov(esi,-1545600507) r(ecx,esi) pop(esi)
push(edi) mov(edi,ebp) mov(ecx,edi) pop(edi)
push(eax) mov(eax,63) sub(ecx,eax) pop(eax)
mov(dptr(ecx+59),eax)
pop(ecx)
lea(eax,wptr(ebp-28))
push(edi)
mov(edi,880280128) push(esi) mov(esi,268135684) add(edi,esi) pop(esi)

Semantics

slide-14
SLIDE 14

cmp(bptr(esi),al)
    push(edx) mov(dl,al) cmp(bptr(esi),dl) pop(edx)
mov(bptr(edi),al)
    push(ecx) mov(cl,al) mov(bptr(edi),cl) pop(ecx)
cmp(al,0)
    push(ebx) mov(bh,0) cmp(al,bh) pop(ebx)
mov(ebx,1684957510)
    mov(ebx,251658400) xor(ebx,1802398182)
mov(cl,0)
    mov(ecx,1342369920) mov(cl,69) sub(cl,69)

Sensitive to behavior addition
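A small symbolic interpreter illustrates both points: constant folding erases the polymorphic encoding of a constant, while the extra `push`/`pop` pair leaves a stack write behind that a strict comparison would notice. This is a hedged sketch covering only `mov`, `add`, `sub`, `xor`, `push` and `pop` on whole registers. The tuple instruction format and the `run` helper are assumptions made for illustration, not the authors' tooling.

```python
MASK32 = 0xFFFFFFFF
BINARY_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "xor": lambda a, b: a ^ b,
}


def run(block):
    """Return (register state, stack writes) after a straight-line block."""
    regs, stack_writes, sp = {}, {}, 0

    def value(operand):
        # Integers are constants; an unwritten register keeps its entry value.
        if isinstance(operand, int):
            return operand
        return regs.get(operand, f"def({operand})")

    for op, dst, *src in block:
        if op == "mov":
            regs[dst] = value(src[0])
        elif op in BINARY_OPS:
            a, b = value(dst), value(src[0])
            if isinstance(a, int) and isinstance(b, int):
                regs[dst] = BINARY_OPS[op](a, b)   # constant folding
            else:
                regs[dst] = f"{op}({a},{b})"       # stays symbolic
        elif op == "push":
            sp -= 4
            stack_writes[sp] = value(dst)
        elif op == "pop":
            regs[dst] = stack_writes[sp]           # sketch: needs a prior push
            sp += 4
        else:
            raise ValueError(f"unsupported instruction: {op}")
    # A register restored to its entry value carries no information.
    regs = {r: v for r, v in regs.items() if v != f"def({r})"}
    return regs, stack_writes


plain = [("mov", "edi", 1148415812)]
poly = [
    ("mov", "edi", 880280128),
    ("push", "esi"),
    ("mov", "esi", 268135684),
    ("add", "edi", "esi"),
    ("pop", "esi"),
]
print(run(plain))  # ({'edi': 1148415812}, {})
print(run(poly))   # ({'edi': 1148415812}, {-4: 'def(esi)'})
```

The register states agree, so the polymorphism is neutralized there. The leftover write at offset -4 is the kind of added behavior that keeps the two semantics from being identical.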

slide-15
SLIDE 15

Revisit Example: Citadel

41C6EA  56                 push esi
41C6EB  BA 6C 05 00 00     mov edx, 56Ch            ; Config Buffer Size
41C6F0  52                 push edx
41C6F1  68 A8 3A 40 00     push 403AA8H             ; Config Buffer Start Pointer
41C6F6  50                 push eax
41C6F7  E8 C5 C9 00 00     call Mem::copy
41C6FC  8B 0D FC 42 43 00  mov ecx, dptr(4342FCH)   ; Xor key
41C702  03 0D DC 46 43 00  add ecx, dptr(4346DCH)   ; Xor key
41C708  8B F2              mov esi, edx
41C70A  2B C8              sub ecx, eax

slide-16
SLIDE 16

Example: Citadel – Semantics

41C6EA  56                 push esi
41C6EB  BA 6C 05 00 00     mov edx, 56Ch
41C6F0  52                 push edx
41C6F1  68 A8 3A 40 00     push 403AA8H
41C6F6  50                 push eax
41C6F7  E8 C5 C9 00 00     call Mem::copy
41C6FC  8B 0D FC 42 43 00  mov ecx, dptr(4342FCH)
41C702  03 0D DC 46 43 00  add ecx, dptr(4346DCH)
41C708  8B F2              mov esi, edx
41C70A  2B C8              sub ecx, eax

ecx=dptr(0x4342fc,def(ds))+dptr(0x4346dc,def(ds))-def(eax)
esi=def(edx)
edx=0x56c
esp=-16+def(esp)
memdw(-16+def(esp))=def(eax)
memdw(-12+def(esp))=0x403aa8
memdw(-8+def(esp))=0x56c
memdw(-4+def(esp))=def(esi)
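How such a semantic state arises can be sketched for the five instructions before the call. The interpreter below is a deliberately tiny assumption-laden model: it handles only `push` and register/immediate `mov`, and the `semantics` function and tuple instruction format are hypothetical. It reproduces the `edx`, `esp` and `memdw(...)` entries listed on this slide.

```python
def semantics(block):
    """Symbolic state of a straight-line block of push/mov instructions."""
    regs, mem, sp = {}, {}, 0

    def value(operand):
        # Immediates print as hex; untouched registers keep their entry value.
        if isinstance(operand, int):
            return hex(operand)
        return regs.get(operand, f"def({operand})")

    for op, *args in block:
        if op == "push":
            sp -= 4
            mem[sp] = value(args[0])
        elif op == "mov":
            regs[args[0]] = value(args[1])
        else:
            raise ValueError(f"unsupported instruction: {op}")

    state = dict(regs)
    state["esp"] = f"{sp}+def(esp)"
    for offset, val in mem.items():
        state[f"memdw({offset}+def(esp))"] = val
    return state


as_written = [("push", "esi"), ("mov", "edx", 0x56C), ("push", "edx"),
              ("push", 0x403AA8), ("push", "eax")]
# Same effect, different order: the mov is hoisted above the first push.
reordered = [("mov", "edx", 0x56C), ("push", "esi"), ("push", "edx"),
             ("push", 0x403AA8), ("push", "eax")]
# Same effect, different instruction: the size is pushed as an immediate.
substituted = [("push", "esi"), ("mov", "edx", 0x56C), ("push", 0x56C),
               ("push", 0x403AA8), ("push", "eax")]

for line in sorted(f"{k}={v}" for k, v in semantics(as_written).items()):
    print(line)
```

All three blocks yield the same state, which is what "instruction independent" and "order independent" mean in practice.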

slide-17
SLIDE 17

Two versions of the same function

0x11a23:
    push esi
    mov edx,56CH
    push edx
    push 403AA8H
    push eax
    call 1443AH
0x11a35:
    mov ecx,dptr(4342FCH)
    add ecx,dptr(4346DCH)
    mov esi,edx
    sub ecx,eax
0x11a45:
    mov dl,bptr(esp+eax*1)
    xor bptr(eax),dl
    inc eax
    dec esi
    jnz 11A45H
0x11a4e:
    pop esi
    retn

0x54fa:
    push esi
    mov edx,330H
    push edx
    push 2D80H
    push eax
    call CB8CH
0x550c:
    mov esi,dptr(20980H)
    mov ecx,dptr(204d4H)
    add ecx,esi
    sub ecx,eax
    mov esi,edx
0x551e:
    mov dl,bptr(esp+eax*1)
    xor bptr(eax),dl
    inc eax
    dec esi
    jnz 551EH
0x5527:
    pop esi
    retn

slide-18
SLIDE 18

Semantics ‘similar’, not ‘same’

ecx=dptr(0x20980,def(ds))+dptr(0x204d4,def(ds))-def(eax)
esi=def(edx)
edx=0x330
esp=-16+def(esp)
memdw(-16+def(esp))=def(eax)
memdw(-12+def(esp))=0x2d80
memdw(-8+def(esp))=0x330
memdw(-4+def(esp))=def(esi)

ecx=dptr(0x4342fc,def(ds))+dptr(0x4346dc,def(ds))-def(eax)
esi=def(edx)
edx=0x56c
esp=-16+def(esp)
memdw(-16+def(esp))=def(eax)
memdw(-12+def(esp))=0x403aa8
memdw(-8+def(esp))=0x56c
memdw(-4+def(esp))=def(esi)

slide-19
SLIDE 19
Matching code on ‘similar’ semantics

  • BinHunt

– Strong Equivalence
– Theorem Proving
– Prove equivalence under register renaming
– Accurate, glacially slow
– Thrown off by different constants/addresses

  • BinJuice

– Semantic ‘Similarity’
– Generalize semantics
– Abstract registers and constants
– Normalized string form
– Match String
– Fuzzy, but fast

slide-20
SLIDE 20

‘Abstract’ semantics = Juice

ecx=dptr(D,def(ds))+dptr(E,def(ds))-def(eax)
esi=def(edx)
edx=A
esp=-16+def(esp)
memdw(-16+def(esp))=def(eax)
memdw(-12+def(esp))=B
memdw(-8+def(esp))=C
memdw(-4+def(esp))=def(esi)

ecx=dptr(D,def(ds))+dptr(E,def(ds))-def(eax)
esi=def(edx)
edx=A
esp=-16+def(esp)
memdw(-16+def(esp))=def(eax)
memdw(-12+def(esp))=B
memdw(-8+def(esp))=C
memdw(-4+def(esp))=def(esi)
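The abstraction step can be approximated in a few lines of Python. This sketch rewrites every hexadecimal constant to a fresh letter after sorting the lines into a canonical order. It is an assumption about how such a normalizer could look, not BinJuice itself: it leaves register names untouched, and its letters are assigned in sorted order, so they differ from the lettering on the slide.

```python
import re
from string import ascii_uppercase

HEX_CONST = re.compile(r"0x[0-9a-f]+")


def juice(semantics):
    """Abstract hex constants to A, B, C, ... (sketch: at most 26 constants)."""
    symbols = iter(ascii_uppercase)
    # Sort on the constant-free text so ordering ignores the concrete values.
    lines = sorted(semantics, key=lambda line: HEX_CONST.sub("", line))
    return "\n".join(
        HEX_CONST.sub(lambda _match: next(symbols), line) for line in lines
    )


VERSION_1 = [
    "ecx=dptr(0x4342fc,def(ds))+dptr(0x4346dc,def(ds))-def(eax)",
    "esi=def(edx)",
    "edx=0x56c",
    "esp=-16+def(esp)",
    "memdw(-16+def(esp))=def(eax)",
    "memdw(-12+def(esp))=0x403aa8",
    "memdw(-8+def(esp))=0x56c",
    "memdw(-4+def(esp))=def(esi)",
]
VERSION_2 = [
    "ecx=dptr(0x20980,def(ds))+dptr(0x204d4,def(ds))-def(eax)",
    "esi=def(edx)",
    "edx=0x330",
    "esp=-16+def(esp)",
    "memdw(-16+def(esp))=def(eax)",
    "memdw(-12+def(esp))=0x2d80",
    "memdw(-8+def(esp))=0x330",
    "memdw(-4+def(esp))=def(esi)",
]

print(juice(VERSION_1) == juice(VERSION_2))  # True
```

Because the two versions reduce to the same string, similarity becomes an equality test on text, with no theorem prover involved.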

slide-21
SLIDE 21

Advantage of Juice

  • Determine ‘similarity’ using ‘string match’.
  • Change the nature of the problem
  • Data mining, instead of pairwise comparison
slide-22
SLIDE 22

Mining Intelligence Using Semantics

slide-23
SLIDE 23

Mining Malware in the Large

❶ Unpack
❷ Use juice for features
❸ Create indexes
❹ Search
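Steps ❷ to ❹ amount to an inverted index keyed on juice. The sketch below hashes each juice string and maps the hash to the functions that produced it, so a search is a single dictionary lookup rather than a pairwise comparison. The class name, the SHA-1 feature key, and the sample and function labels are all hypothetical, and unpacking (step ❶) is assumed to have happened already.

```python
import hashlib
from collections import defaultdict


def feature(juice):
    """Fixed-size key for a juice string (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(juice.encode("utf-8")).hexdigest()


class JuiceIndex:
    """Inverted index from juice feature to (sample, function) pairs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, sample, function, juice):
        self._postings[feature(juice)].add((sample, function))

    def search(self, juice):
        """All indexed functions whose juice equals the query's juice."""
        return sorted(self._postings.get(feature(juice), ()))


index = JuiceIndex()
# Hypothetical samples and function names, with shortened juice strings.
index.add("sample_a", "sub_11a23", "edx=A\nmemdw(-12+def(esp))=B")
index.add("sample_b", "sub_54fa", "edx=A\nmemdw(-12+def(esp))=B")
index.add("sample_b", "sub_6000", "eax=def(ebx)")

hits = index.search("edx=A\nmemdw(-12+def(esp))=B")
print(hits)  # [('sample_a', 'sub_11a23'), ('sample_b', 'sub_54fa')]
```

Lookup cost does not grow with the number of indexed functions, which is what allows the approach to be treated as data mining.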

slide-24
SLIDE 24

Cythereal MAGIC: Malware Genomic Correlation

[Diagram labels: VM, VM, Hypervisor, Google Cloud; pipeline stages: Unpack, Extract Juice, Index, Cluster, Classify, Search]

slide-25
SLIDE 25

Mining Malware Repository

Step 1:

– “Search” for functions semantically similar to an example

Step 2:

– Extract parameters from abstract state

For every function found:

  • get semantics of its blocks
  • select blocks of interest
  • fetch values of memory/register from abstract state
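Step 2 can be sketched as reading named values out of a block's semantic state. The template below says where each parameter lives (for example, the value assigned to `edx`). The parameter names follow the column headings of the Citadel parameters table, while the `TEMPLATE` dictionary and `extract_parameters` function are illustrative assumptions, not the authors' implementation.

```python
import re

HEX = r"(0x[0-9a-f]+)"

# Where each parameter lives in the abstract state (illustrative template).
TEMPLATE = {
    "config_size": re.compile(r"edx=" + HEX),
    "static_config": re.compile(r"memdw\(-12\+def\(esp\)\)=" + HEX),
    "xor_offset": re.compile(
        r"ecx=dptr\(" + HEX + r",def\(ds\)\)\+dptr\(" + HEX + r",def\(ds\)\)"
    ),
}


def extract_parameters(state_lines):
    """Fetch the templated values from one block's semantic state."""
    found = {}
    for name, pattern in TEMPLATE.items():
        for line in state_lines:
            match = pattern.match(line)
            if match:
                found[name] = "+".join(match.groups())
                break
    return found


# Semantic state of the second function version, as listed on the slides.
STATE = [
    "ecx=dptr(0x20980,def(ds))+dptr(0x204d4,def(ds))-def(eax)",
    "esi=def(edx)",
    "edx=0x330",
    "esp=-16+def(esp)",
    "memdw(-16+def(esp))=def(eax)",
    "memdw(-12+def(esp))=0x2d80",
    "memdw(-8+def(esp))=0x330",
    "memdw(-4+def(esp))=def(esi)",
]

print(extract_parameters(STATE))
```

A missing key in the result signals that a parameter was not found, so a failed extraction is visible rather than silent.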

slide-26
SLIDE 26

Example: Citadel parameters

sha1[4]  static_config  config_size  xor_offset         version    count
0c4d     0x401668       0x328        0x422a3c+0x422ee8  0x1020500  11
56f9     0x402638       0x388        0x4237f4+0x423ca0  0x1020600  20
ac52     0x4018c8       0x3b8        0x423adc+0x423f88  0x1020700  1
836a     0x401578       0x360        0x41a2e4+0x41a790  0x2000700  5
8a2f     0x402d80       0x330        0x4204d4+0x420980  0x2000700  5
8a7f     0x402b98       0x34c        0x422adc+0x422f88  0x2000807  1
70d1     0x401690       0x31c        0x422a64+0x422f10  0x2000809  1
7084     0x4018b8       0x2e8        0x422d7c+0x423228  0x2010001  1

slide-27
SLIDE 27
Pros and Cons: Semantics

  • Very accurate
  • Use for both steps

– Find and Extract

  • No silent failure
  • Resilient
  • Shifts brittleness

– To disassembly and CFG

slide-28
SLIDE 28

Takeaway

  • Malware analysis is not all about signatures

– Embedded secrets in malware can unlock defenses

  • Bytecode-based tools are simple, but brittle
  • Theoretically rigorous tools are realistic

– Scalable; new way of looking at the problem

slide-29
SLIDE 29

Contacts

Paul Black

Federation University p.black@federation.edu.au

Arun Lakhotia

University of Louisiana at Lafayette arun@Louisiana.edu arun@cythereal.com

slide-30
SLIDE 30

Extra Slides

slide-31
SLIDE 31
Other YARA-like methods

  • Bytecode based:

– IDA FLIRT

  • Disassemble, and use instructions

– Improvement
– Still brittle

slide-32
SLIDE 32
Using Program Structure

  • Create Graphs

– Control flow graph, call graph
– BinDiff, Malwise

  • Find relevant code

– Instructions
– Graph structure

  • Extract parameters

– Peek in code (use YARA)
– Byte/Instruction order

  • Better, but still brittle

No known implementation