Modeling metamorphism by abstract interpretation
Roberto Giacobazzi
Mila, Giaco, Saumya, Kevin, Gregg
16.09.2010 - SAS2010
Thursday, September 16, 2010

The problem

Malware analysis: signature checking
✤ Malware refers to malicious software
✤ Signature checking: identify a sequence of instructions that is unique to a piece of malware (the virus signature), then scan programs for signatures
✤ Example: Chernobyl signature: E800 0000 005B 8D4B 4251 5050 0F01 4C24 FE5B 83C3 1CFA 882B
✤ Cumbersome, inaccurate, easy to foil...
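The signature scan described on this slide can be sketched in a few lines of Python. Only the signature bytes come from the slide; the function name `find_signature` and the two program images are hypothetical and serve purely as an illustration.

```python
# Signature bytes copied from the slide; everything else is illustrative.
CHERNOBYL_SIGNATURE = bytes.fromhex(
    "E800 0000 005B 8D4B 4251 5050 0F01 4C24 FE5B 83C3 1CFA 882B"
)

def find_signature(program: bytes, signature: bytes) -> int:
    """Return the offset of the first occurrence of signature, or -1."""
    return program.find(signature)

# Hypothetical program images: filler bytes around (or without) the signature.
infected = b"\x90" * 16 + CHERNOBYL_SIGNATURE + b"\xc3"
clean = b"\x90" * 32

print(find_signature(infected, CHERNOBYL_SIGNATURE))  # 16
print(find_signature(clean, CHERNOBYL_SIGNATURE))     # -1
```

An exact byte match like this is what the last bullet calls easy to foil: changing a single byte inside the matched region makes the scan miss.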

Anti-anti malware
✤ How can we escape signature checking?
✤ ...by dynamically modifying malware structure!
✤ Polymorphic malware contains decryption routines that decrypt the encrypted constant parts of its body.
✤ Metamorphic malware typically does not use encryption, but mutates (obfuscates) its form in subsequent generations.

Metamorphism as obfuscation
From Chernobyl CIH 1.4:
    Loop: pop ecx
          jecxz SFModMark
          mov esi, ecx
          mov eax, 0d601h
          pop edx
          pop ecx
          call edi
          jmp Loop
Obfuscated variant:
    Loop: pop ecx
          nop
          jecxz SFModMark
          xor ebx, ebx
          beqz N1
    N1:   mov esi, ecx
          nop
          mov eax, 0d601h
          pop edx
          pop ecx
          nop
          call edi
          xor ebx, ebx
          beqz N2
    N2:   jmp Loop

Metamorphism as obfuscation
From Chernobyl CIH 1.4:
    Loop: pop ecx
          jecxz SFModMark
          mov esi, ecx
          mov eax, 0d601h
          pop edx
          pop ecx
          call edi
          jmp Loop
Obfuscated variant:
    Loop: pop ecx
          nop
          call edi
          xor ebx, ebx
          beqz N2
    N2:   jmp Loop
          nop
          mov eax, 0d601h
          pop edx
          pop ecx
          nop
          jecxz SFModMark
          xor ebx, ebx
          beqz N1
    N1:   mov esi, ecx

Metamorphism as obfuscation
From Chernobyl CIH 1.4:
    Loop: pop ecx
          jecxz SFModMark
          mov esi, ecx
          mov eax, 0d601h
          pop edx
          pop ecx
          call edi
          jmp Loop
Obfuscated variant:
    Loop: pop ecx
          nop
          jmp L1
    L3:   call edi
          xor ebx, ebx
          beqz N2
    N2:   jmp Loop
          jmp L4
    L2:   nop
          mov eax, 0d601h
          pop edx
          pop ecx
          nop
          jmp L3
    L1:   jecxz SFModMark
          xor ebx, ebx
          beqz N1
    N1:   mov esi, ecx
          jmp L2
    L4:
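To see why these obfuscations defeat syntactic signatures, the sketch below compares the original Chernobyl fragment with two of the variants shown on these slides. The instruction lists are transcribed from the slides with labels omitted; the two matching functions are illustrative assumptions, not detectors proposed in the talk.

```python
ORIGINAL = [
    "pop ecx", "jecxz SFModMark", "mov esi, ecx", "mov eax, 0d601h",
    "pop edx", "pop ecx", "call edi", "jmp Loop",
]
# Variant with junk instructions inserted (nop, xor/beqz pairs).
VARIANT_JUNK = [
    "pop ecx", "nop", "jecxz SFModMark", "xor ebx, ebx", "beqz N1",
    "mov esi, ecx", "nop", "mov eax, 0d601h", "pop edx", "pop ecx",
    "nop", "call edi", "xor ebx, ebx", "beqz N2", "jmp Loop",
]
# Variant with blocks reordered and linked by jumps, in textual order.
VARIANT_REORDERED = [
    "pop ecx", "nop", "jmp L1", "call edi", "xor ebx, ebx", "beqz N2",
    "jmp Loop", "jmp L4", "nop", "mov eax, 0d601h", "pop edx", "pop ecx",
    "nop", "jmp L3", "jecxz SFModMark", "xor ebx, ebx", "beqz N1",
    "mov esi, ecx", "jmp L2",
]

def contains_contiguous(code, signature):
    """Exact signature match: the signature appears as one unbroken run."""
    n = len(signature)
    return any(code[i:i + n] == signature for i in range(len(code) - n + 1))

def is_subsequence(code, signature):
    """Relaxed match: signature instructions appear in order, gaps allowed."""
    remaining = iter(code)
    return all(instr in remaining for instr in signature)

print(contains_contiguous(VARIANT_JUNK, ORIGINAL))   # False
print(is_subsequence(VARIANT_JUNK, ORIGINAL))        # True
print(is_subsequence(VARIANT_REORDERED, ORIGINAL))   # False
```

Junk insertion already breaks the exact match, and reordering breaks even the relaxed in-order match, although all three fragments execute the same essential instructions.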

Metamorphism: an example
[Figure: Malware evolution. Code listings shown side by side; the instructions below appear in extraction order]
push ecx; push ecx; mov ecx, [ebp + 10]; mov ecx, ebp; mov ecx, ebp; push eax; push eax; mov eax, 33; add eax, 2342; push ecx; add ecx, eax; mov eax, 33; mov ecx,ebp; pop eax; add ecx, eax; push ecx; add ecx,33; pop eax; mov ecx,ebp; push esi; push esi; mov eax, esi; mov [ebp - 3], eax; add ecx,33; mov esi,ecx; mov esi, ecx; push eax; mov [ecx-36],eax; sub esi,34; push edx; mov esi, ecx; pop ecx; mov [esi-2],eax; push edx; pop esi; mov edx, 34; xor edx, 778f; pop ecx; sub esi, edx; mov edx, 34; pop edx; sub esi, edx; mov [esi - 2], eax; pop edx; pop esi; mov [esi-2], eax; pop ecx; pop esi; pop ecx
✤ How can we model and compute signatures for metamorphism?

Metamorphism: some (public) history
by Peter Szor; http://vx.netlux.org/
✤ Win32.Evol: swaps instructions with equivalents; inserts junk code between essential instructions
✤ Regswap (Win32): same code, different register names
✤ BadBoy (DOS) and Ghost (Win32): same code, different subroutine order (n! possible mutations: 10 modules, ~3.6M possible signatures)
✤ Zmorph (Win95): decrypts virus body instruction by instruction; pushes instructions on stack; rebuilds body on stack
✤ Zperm (Win95): inserts and removes jumps
✤ ...
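The figure of about 3.6M signatures for 10 modules is simply the number of orderings of 10 subroutines. A quick check, with `subroutine_orderings` as a hypothetical helper name:

```python
import math

def subroutine_orderings(modules: int) -> int:
    """Number of distinct orders in which `modules` subroutines can be laid out."""
    return math.factorial(modules)

print(subroutine_orderings(10))  # 3628800
```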

Attacking metamorphism
✤ Idea: Behavior Monitors
✤ Run the suspect program in an emulator and extract a DB of relevant signatures (huge DB)
✤ Look for changes in file structure: some viruses modify files in a consistent way (inaccurate)
✤ Disassemble and look for virus-like instructions: reverse engineering malware (expensive)

Problems
[Figure: malware V and metamorphic engine ME]
✤ The code may contain its own metamorphic engine ME
✤ The metamorphic engine can be used when engineering malware
✤ Metamorphic signature: a language $L$ of possible signatures generated by a metamorphic malware: $\sigma \in L \Rightarrow \sigma$ is a possible signature
✤ Is there a way to extract metamorphic signatures?

Related works
✤ Specify some abstraction (CFG, instruction equivalence, rewrite rules towards a normal form, i.e., undoing metamorphism) [Dalla Preda et al POPL07, Filiol PWASET07, Zbitsky JCV 09, Bonfante et al JCV 09]
✤ Existing semantics-based approaches to malware detection are promising, but they still rely on a priori knowledge of the metamorphic transformations used by malware writers
✤ Need to model the self-modifying behavior of metamorphic malware without any a priori knowledge of the transformations it uses

Idea
✤ Idea: Extract $L$ as an abstract interpretation of the metamorphic malware! Extracting metamorphic signatures is approximating malware semantics
✤ data objects are code slices
✤ abstraction acts on code structure (code may be as complex as data!!)
✤ invariants on mutational code structure describe the metamorphic engine behavior!!
✤ fix-point abstraction approximates invariants, i.e., generates metamorphic signatures...

Modeling metamorphism

Phase semantics
✤ States: $\langle a, m, \theta, I \rangle$ (entry point, memory, stack, input); no distinction between code and data
✤ Phase semantics: partition the trace of execution states into phases, each collecting the computation of a particular code variant
✤ Maximal trace semantics: $S[\![P]\!] = \mathrm{lfp}\, F_T[\![P]\!]$
[Figure: a trace of states S0 ... S9 in which MOD transitions mark the phase bounds, splitting it into PHASE 1 ... PHASE 4 and yielding the trace of programs P0 ... P3]
$bound(s) = \{ s_0 \} \cup \{ s_i \mid MOD(s_{i-1}) \cap \{ a_j \mid i \le j \le n \} \neq \emptyset \}$
$phases(s) = \{ s_i \dots s_j \mid s_i, s_{j+1} \in bound(s),\ \forall l \in [i+1, j] : s_l \notin bound(s) \}$
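The definitions of bound and phases can be tried out on a small example. The sketch below makes several assumptions that are not on the slide: a state is reduced to the address of the instruction it executes plus the set of addresses that instruction writes (standing in for MOD), bounds are returned as indices, and the final segment of the trace is kept as a phase even though no later bound closes it.

```python
from typing import FrozenSet, List, NamedTuple

class State(NamedTuple):
    addr: int              # a_i: address of the instruction executed in this state
    mod: FrozenSet[int]    # MOD(s_i): addresses written by that instruction

def bound(trace: List[State]) -> List[int]:
    """Indices i such that s_i is a phase bound of the trace."""
    if not trace:
        return []
    bounds = [0]  # s_0 is always a bound
    for i in range(1, len(trace)):
        future_code = {s.addr for s in trace[i:]}  # {a_j | i <= j <= n}
        if trace[i - 1].mod & future_code:
            bounds.append(i)
    return bounds

def phases(trace: List[State]) -> List[List[State]]:
    """Split the trace at its phase bounds (the final segment is kept too)."""
    cuts = bound(trace) + [len(trace)]
    return [trace[cuts[k]:cuts[k + 1]] for k in range(len(cuts) - 1)]

# Hypothetical trace: the instruction at address 2 overwrites address 3,
# which is executed next; the write to address 100 never reaches executed code.
trace = [
    State(1, frozenset()),
    State(2, frozenset({3})),
    State(3, frozenset()),
    State(4, frozenset({100})),
    State(5, frozenset()),
]
print(bound(trace))                      # [0, 2]
print([len(p) for p in phases(trace)])   # [2, 3]
```

Only a write that hits code executed later opens a new phase; a write to plain data, such as address 100 here, leaves the current code variant unchanged.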

Fix-point phase semantics
✤ Program evolution graph: $G[\![P_0]\!] = (V, E)$
✤ Nodes = Phases
✤ Edges = Phase transitions
✤ The phase semantics of a program $P_0$ is given by the set of all possible paths of its program evolution graph
[Figure: program evolution graph over the phases P0 ... P9]
PHASE SEMANTICS: $S_{Ph}[\![P_0]\!] = \{ P_0 \dots P_n \mid \forall i \in [0, n-1] : (P_i, P_{i+1}) \in E \}$, with $S_{Ph}[\![P_0]\!] \subseteq P^*$
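As a rough illustration of the phase semantics as a set of paths, the sketch below enumerates the paths of a small, made-up program evolution graph. Phases are represented by opaque labels, and because a graph with cycles has infinitely many paths, the enumeration is cut off at `max_len` nodes; both choices are assumptions of this sketch.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # phase label -> labels of successor phases

def phase_paths(edges: Graph, start: str, max_len: int) -> List[List[str]]:
    """All paths P0...Pn starting at `start` with at most max_len nodes."""
    paths: List[List[str]] = []
    frontier = [[start]]
    while frontier:
        path = frontier.pop()
        paths.append(path)
        if len(path) < max_len:
            for nxt in sorted(edges.get(path[-1], ())):
                frontier.append(path + [nxt])
    return paths

# Hypothetical evolution graph with a cycle P0 -> P2 -> P0.
G = {"P0": {"P1", "P2"}, "P1": {"P2"}, "P2": {"P0"}}
for p in sorted(phase_paths(G, "P0", 3)):
    print(" ".join(p))
```

Every prefix of a path is itself a path, which matches the definition on the slide: any sequence whose consecutive phases are linked by an edge belongs to the phase semantics.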

Fix-point phase semantics
✤ Phase transition: $T_{Ph}(P_0) = \{ P_i \mid s = s_0 \dots s_i \dots s_n \in S[\![P_0]\!],\ s_i \in bound(s),\ \forall l \in [1, i-1] : s_l \notin bound(s) \}$
✤ Fix-point iteration: $S_{Ph}[\![P]\!] = \mathrm{lfp}\, F_{T_{Ph}}[\![P]\!]$
[Figure: TRACE SEMANTICS: the trace of states S0 ... S9 in $S[\![P_0]\!]$, linked by transitions $T$ with MOD steps; PHASE SEMANTICS: the trace of programs P0 ... P3 in $S_{Ph}[\![P_0]\!]$, linked by transitions $T_{Ph}$]

Correctness of phase semantics
✤ Trace semantics and phase semantics are related by abstraction: $\langle \wp(\Sigma^*), \subseteq \rangle \underset{\alpha_{Ph}}{\overset{\gamma_{Ph}}{\rightleftarrows}} \langle \wp(P^*), \subseteq \rangle$
✤ $\alpha_{Ph}$ keeps only phase bounds
✤ Locally incomplete...
✤ Fix-point complete: $\alpha_{Ph}(\mathrm{lfp}\, F_T[\![P_0]\!]) = \mathrm{lfp}\, F_{T_{Ph}}[\![P_0]\!]$
CONCRETE TEST FOR METAMORPHISM: $P_0 \leadsto_{Ph} Q \Leftrightarrow \exists P_0, P_1, \dots, P_n \in S_{Ph}[\![P_0]\!],\ \exists i \in [0, n] : P_i = Q$
no false positives, no false negatives
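Read operationally, the concrete test asks whether Q occurs on some path of phases starting from P0, which is a reachability question on the program evolution graph. A minimal sketch, assuming the graph is given as a finite table of opaque phase labels (for real malware it generally is not available in this form); `is_variant` is a hypothetical name:

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # phase label -> labels of successor phases

def is_variant(edges: Graph, p0: str, q: str) -> bool:
    """True iff q appears on some path of phases starting at p0."""
    seen = {p0}
    queue = deque([p0])
    while queue:
        p = queue.popleft()
        if p == q:
            return True
        for nxt in edges.get(p, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical graph: P9 can evolve into P0, but not the other way round.
G = {"P0": {"P1"}, "P1": {"P2"}, "P2": {"P0"}, "P9": {"P0"}}
print(is_variant(G, "P0", "P2"))  # True
print(is_variant(G, "P0", "P9"))  # False
```

On an exact graph the answer has neither false positives nor false negatives, as the slide states; the difficulty is that the exact graph is rarely computable.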

Abstracting metamorphism

Abstracting phases
✤ Need abstraction for approximating phases!!!
✤ design GC: $\langle \wp(P^*), \subseteq \rangle \underset{\alpha_A}{\overset{\gamma_A}{\rightleftarrows}} \langle A, \sqsubseteq_A \rangle$
✤ define the abstract transition relation $T_A : A \to \wp(A)$
✤ define $F_{T_A}[\![P_0]\!] : A \to A$ whose fixpoint computation $\mathrm{lfp}^{\sqsubseteq_A} F_{T_A}[\![P_0]\!] = S_A[\![P_0]\!]$ corresponds to the abstract specification of the metamorphic behavior
✤ prove that $S_A[\![P_0]\!]$ is a correct approximation of phase semantics $S_{Ph}[\![P_0]\!]$, i.e., $\alpha_A(\mathrm{lfp}^{\subseteq} F_{T_{Ph}}[\![P_0]\!]) \sqsubseteq_A \mathrm{lfp}^{\sqsubseteq_A} F_{T_A}[\![P_0]\!]$
ABSTRACT TEST FOR METAMORPHISM: $P_0 \leadsto_A Q \Leftrightarrow \alpha_A(Q) \sqsubseteq_A S_A[\![P_0]\!]$
no false negatives
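The recipe on this slide can be illustrated with a deliberately crude abstract domain. In the sketch below, a program is abstracted to the set of opcodes it contains, the order is set inclusion, and a made-up table `MUTATIONS` says which opcodes a mutation of a given opcode may introduce. None of this is the abstraction used in the talk; it only shows the shape of the computation: iterate an abstract transformer to its least fixpoint, then test inclusion.

```python
from typing import Callable, Dict, FrozenSet

Abstract = FrozenSet[str]  # toy abstract domain: sets of opcodes, ordered by inclusion

def lfp(f: Callable[[Abstract], Abstract], bottom: Abstract) -> Abstract:
    """Kleene iteration from bottom until the transformer stabilizes."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical mutation table: opcode -> opcodes its rewriting may introduce.
MUTATIONS: Dict[str, Abstract] = {
    "mov": frozenset({"push", "pop"}),
    "add": frozenset({"sub"}),
    "push": frozenset({"nop"}),
}

def abstract_semantics(p0_opcodes: Abstract) -> Abstract:
    """Least fixpoint of the toy abstract transformer for P0."""
    def step(x: Abstract) -> Abstract:
        introduced = frozenset().union(*(MUTATIONS.get(op, frozenset()) for op in x))
        return p0_opcodes | x | introduced
    return lfp(step, frozenset())

def abstract_test(q_opcodes: Abstract, s_a: Abstract) -> bool:
    """Q passes the test if its abstraction is below the abstract semantics."""
    return q_opcodes <= s_a

s_a = abstract_semantics(frozenset({"mov", "add"}))
print(sorted(s_a))                                         # ['add', 'mov', 'nop', 'pop', 'push', 'sub']
print(abstract_test(frozenset({"push", "mov", "nop"}), s_a))  # True
print(abstract_test(frozenset({"mov", "xor"}), s_a))          # False
```

In this toy setting a failed test rules Q out, while a passed test only says Q might be a variant: any program built from the collected opcodes passes. That is the price of trading "no false positives" for computability.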

Phases as FSA
let $\mathring{\alpha} : P \to F$ map each program to an FSA
$P_0$:
    1:  MEM[f] := 100
    2:  input ⇒ MEM[a]
    3:  if (MEM[a] mod 2) goto 7
    4:  MEM[b] := MEM[a]
    5:  MEM[a] := MEM[a]/2
    6:  goto 8
    7:  MEM[a] := (MEM[a] + 1)/2
    8:  MEM[MEM[f]] := MEM[4]
    9:  MEM[MEM[f] + 1] := MEM[5]
    10: MEM[MEM[f] + 2] := encode(goto 6)
    11: MEM[4] := encode(nop)
    12: MEM[5] := encode(goto MEM[f])
    13: MEM[f] := MEM[f] + 3
    14: goto 2
[Figure: $\mathring{\alpha}(P_0)$, an FSA whose states are the program points 1 ... 14 and whose edges are labeled by the instructions of $P_0$]
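One plausible reading of this abstraction is sketched below: states are program points and each instruction contributes an edge to every possible successor. The encoding of `P0` as a table with the kinds "seq", "goto" and "if", and the use of the full instruction text as edge label, are assumptions of the sketch. It covers only the static structure of the initial phase, not the self-modification that later phases undergo.

```python
from typing import Dict, Optional, Set, Tuple

# line -> (instruction text, kind, jump target); the kinds are an assumed encoding
Program = Dict[int, Tuple[str, str, Optional[int]]]

P0: Program = {
    1: ("MEM[f] := 100", "seq", None),
    2: ("input => MEM[a]", "seq", None),
    3: ("if (MEM[a] mod 2) goto 7", "if", 7),
    4: ("MEM[b] := MEM[a]", "seq", None),
    5: ("MEM[a] := MEM[a]/2", "seq", None),
    6: ("goto 8", "goto", 8),
    7: ("MEM[a] := (MEM[a] + 1)/2", "seq", None),
    8: ("MEM[MEM[f]] := MEM[4]", "seq", None),
    9: ("MEM[MEM[f] + 1] := MEM[5]", "seq", None),
    10: ("MEM[MEM[f] + 2] := encode(goto 6)", "seq", None),
    11: ("MEM[4] := encode(nop)", "seq", None),
    12: ("MEM[5] := encode(goto MEM[f])", "seq", None),
    13: ("MEM[f] := MEM[f] + 3", "seq", None),
    14: ("goto 2", "goto", 2),
}

Edge = Tuple[int, str, int]  # (source state, instruction label, target state)

def to_fsa(program: Program) -> Set[Edge]:
    """Edges of the automaton: one per (instruction, possible successor)."""
    edges: Set[Edge] = set()
    for pc, (text, kind, target) in program.items():
        if kind in ("goto", "if") and target is not None:
            edges.add((pc, text, target))        # jump taken
        if kind in ("seq", "if") and pc + 1 in program:
            edges.add((pc, text, pc + 1))        # fall through
    return edges

fsa = to_fsa(P0)
print(len(fsa))                   # 15
print((6, "goto 8", 8) in fsa)    # True
```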
