Semantic Trace-based Malware Variants Detection (Khalid Alzarooni)

Semantic Trace-based Malware Variants Detection. Khalid Alzarooni, CREST - DCS - UCL, April 6, 2011. Outline: Overview, Trace-based approach, Experiments.


  1. Semantic Trace-based Malware Variants Detection. Khalid Alzarooni, CREST - DCS - UCL, April 6, 2011

  2. Outline: Overview, Trace-based approach, Experiments

  3. Overview

  4. Malware Variants
     • Speed of evolution of malware is partly driven by automatic generation of program variants
     • Semantic equivalence tables are used in malware, e.g. polymorphic and metamorphic malware
     • These alter the “local behaviour” of programs, but larger-scale behaviour is unchanged

  5. Malware Problem
     Anoirel S. Issa, Symantec, UK (EICAR 2009): “Poly or metamorphic engines have some essential components that help them build highly obfuscated code. A single engine is able to produce unique variants that can reach millions.”
     Malware evolution: $M_0 \to M_1 \to M_2 \to M_3 \to \dots$
     Syntactic view: $\mathit{code}_0 \not\approx \mathit{code}_1 \not\approx \mathit{code}_2 \not\approx \mathit{code}_3 \not\approx \dots$

  6. Some Code Obfuscation Schemes [Beaucamps, 2007, Ször, 2005]

     Label  Category            Obfuscation
     gi     Garbage insertion   $\{\} \to \{C\}$
     op     Opaque predicate    $\{\} \to \{P^{T/F}\}$
     ec     Equivalent command  $\{op\} \to \{\overline{op}\}$
     rr     Register renaming   $\{R_x\} \to \{R_y\}$
     cs     Command split       $\{C\} \to \{C_x, C_y\}$
     cm     Command merging     $\{C_x, C_y\} \to \{C_{xy}\}$
     cr     Command reorder     $\{(C_x, C_y)\} \to \{(C_y, C_x)\}$
     ...    ...                 ...

  7. Example: a program $P$ and its semantically equivalent variant $P'$
     $P$: (a) R0:=n; (b) R1:=m; (c) R2:=R1; (d) R3:=R2+R0; (e) R4:=R1+k; (f) R5:=1
     [Figure: $P \to P'$, where the commands of $P'$ are annotated with the obfuscations applied (gi, op, rr, cr, cm)]

  8. Malware Problem
     • Goal: detect variants of a known malware
     • Given two arbitrary programs, is it possible to tell whether they are semantically equivalent? $P' \overset{?}{\approx} P$
     • It is undecidable: no algorithm can be devised that always produces a “yes” or “no” detection answer [Cohen, 1987]

  9. Semantic trace-based
     Program
     ↓
     Program approximation
     ↓
     Trace collection
     ↓
     Semantic analysis
     ↓
     Detection of semantic signatures

  10. Test scenarios
      Results:
      • Tested samples: Bho, Binom, Mobler, Telf, ...
      • Most malware successfully matched, with k ≥ 60%
      • No false positives, similarity ≤ 20% (10 benign executables)
      • 100% malware variants classification
      • sig-w-slice: accuracy 30% and speed 26% in detection phase
      • sig-wo-slice: 5:7 faster in sig. generation phase

  11. Trace-based approach

  12. Semantic trace-based
      • Design a detector that can tell when two programs are approximately equivalent, which might often be good enough
      • Approximate semantic equivalence is decidable
      • Approximate a program’s semantics $[\![P]\!]$
      • CFG abstract traces (program paths) & test inputs
      • Concrete & semantic traces
      Malware evolution: $M_0 \to M_1 \to M_2 \to M_3 \to \dots$
      Syntactic view: $\mathit{code}_0 \not\approx \mathit{code}_1 \not\approx \mathit{code}_2 \not\approx \mathit{code}_3 \not\approx \dots$
      Semantic view: $[\![M_0]\!] \approx [\![M_1]\!] \approx [\![M_2]\!] \approx [\![M_3]\!] \approx \dots$

  13. Semantic trace-based
      • $M_1$ is a variant of $M_0$ if $[\![M_0]\!]$ is a sub-sequence of $[\![M_1]\!]$:
        $\forall t \in [\![M_0]\!],\ \exists t' \in [\![M_1]\!] : t \prec t'$
      [Figure: a malware trace $t$ and a variant trace $t'$]
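The variant condition on this slide can be illustrated with a small sketch. It assumes that traces are finite sequences of hashable entries and that ≺ means ordinary (not necessarily contiguous) subsequence; the function names `is_subsequence` and `is_variant` and the sample traces are hypothetical and do not come from the presentation.

```python
from typing import Hashable, Iterable, Sequence


def is_subsequence(t: Sequence[Hashable], t_prime: Iterable[Hashable]) -> bool:
    """Return True if the entries of t occur in t_prime in the same order."""
    remaining = iter(t_prime)
    # `x in remaining` consumes the iterator up to the first match,
    # so later entries of t can only match later entries of t_prime.
    return all(x in remaining for x in t)


def is_variant(sem_m0: Iterable[Sequence[Hashable]],
               sem_m1: Sequence[Sequence[Hashable]]) -> bool:
    """For every t in [[M0]] there is some t' in [[M1]] with t a subsequence of t'."""
    return all(any(is_subsequence(t, tp) for tp in sem_m1) for t in sem_m0)


# Illustrative traces only: the variant interleaves extra entries (0 and 9).
malware_traces = [[1, 2, 3]]
variant_traces = [[0, 1, 9, 2, 3]]
print(is_variant(malware_traces, variant_traces))
```

Because the check tolerates inserted entries, garbage insertion in a variant does not by itself break the match, while reordering of the matched entries does.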

  14. Semantic trace-based
      Two phases:
      1. Signature generation
      2. Detection

  15. Signature generation phase
      executable $M$
      ↓ (disassembler & translator)
      abstract code (AAPL)
      ↓ (test data generator)
      abstract trace and a test input $x$
      ↓ (semantic simulator)
      a concrete trace
      ↓ (trace slicer)
      trace slices
      ↓ (abstracter)
      semantic traces $\tau_m$
      semantic signature = $(\tau_m, x)$

  16. Detection phase
      executable $P$
      ↓ (disassembler & translator)
      abstract code (AAPL)
      ↓ (semantic simulator, $\mathit{sig}_m = (\tau_m, x)$)
      a concrete trace
      ↓ (abstracter)
      $(\tau_p, \tau_m)$
      ↓ (matcher)
      yes/no

  17. Experiments

  18. Detector prototype
      Malicious program $M$ → Signature generation phase → Semantic signatures
      Suspicious program $P$ (with the semantic signatures) → Detection phase → Yes/No

  19. Test scenarios
      We tested:
      • Robustness against real in-the-wild variants
      • Effectiveness of trace slicing in the signatures
      • Sig. gen. & detection phases: sig-wo-slice vs. sig-w-slice
      • False positives
      • Classification of malware samples

  20. Test scenarios
      Results:
      • Tested samples: Bho, Binom, Mobler, Telf, ...
      • Most malware successfully matched, with k ≥ 60%
      • sig-w-slice: accuracy 30% and speed 26% in detection phase
      • sig-wo-slice: 5:7 faster in sig. generation phase
      • No false positives, similarity ≤ 20% (10 benign executables)
      • 100% malware variants classification

  21. Prototype limitation
      Technical shortcomings:
      • Limited to viruses and worms
      • Does not work for dynamically packed code or code with anti-disassembly techniques
      • Relies on tools to manually unpack (encrypted) and disassemble files

  22. Thank you very much! (Image: Salvatore Vuono / FreeDigitalPhotos.net)

  23. References
      • Alzarouni, K., Clark, D., and Tratt, L. (2010). Semantic malware detection. Technical Report TR-10-03, Department of Computer Science, King’s College London.
      • Beaucamps, P. (2007). Advanced metamorphic techniques in computer viruses. In Proceedings of the International Conference on Computer, Electrical, and Systems Science, and Engineering - CESSE’07.
      • Cohen, F. (1987). Computer viruses: theory and experiments. Comput. Secur., 6(1):22–35.
      • Ször, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley, Reading, Mass.

  24. Detector components

  25. Trace Semantics
      • The trace semantics of a program is the set of all traces $T$ that the program can produce
      • A trace $t \in T$ is a sequence of pairs of execution context $X$ and program syntax $C$
      • Execution context: memory (locations) and environment (variables) values, $X = E \times M$
      • Program syntax: source code (commands)
      $\rho \in E = R \to \mathbb{Z}_\bot$ (environments)
      $m \in M = \mathbb{Z} \to \mathbb{Z}_\bot \cup C$ (memory)
      $\xi \in X = E \times M$ (execution contexts)
      $S = C \times X$ (program states)
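One way to picture these domains is as plain data types. The sketch below is an illustration under stated assumptions, not the prototype's representation: register names are strings, commands are kept as text, `None` stands for the undefined value ⊥, and all type names and the example trace are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Value = Optional[int]                      # Z_bot: an integer, or None for "undefined"
Command = str                              # C: program syntax, kept as text here
Env = Dict[str, Value]                     # rho in E = R -> Z_bot
Memory = Dict[int, Union[Value, Command]]  # m in M = Z -> Z_bot ∪ C


@dataclass
class Context:
    """xi in X = E x M: an environment paired with a memory."""
    env: Env
    mem: Memory


State = Tuple[Command, Context]            # S = C x X
Trace = List[State]                        # a trace is a sequence of states

# A two-step illustrative trace: R0 is undefined, then set to 5.
trace: Trace = [
    ("R0:=5", Context(env={"R0": None}, mem={})),
    ("R1:=R0", Context(env={"R0": 5}, mem={})),
]
print(len(trace), trace[1][1].env["R0"])
```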

  26. Trace Semantics
      • Signatures refer to exact program state
      • Semantic signatures refer to values at particular memory locations and in registers that are observed to be constant across variants from the same malware family
      • Detection: environment-memory traces of $M$ that are contained in (are subtraces of) environment-memory traces of $M'$

  27. Semantic Simulator
      Not “live” testing. Evaluate the abstract trace and collect concrete traces.
      Semantics of Actions: $\hat{A} : A \times X \to X$
      $\hat{A}[\![R := E]\!]\xi = (\rho', m)$ where $\xi = (\rho, m)$ and $\rho' = \rho(R \mapsto \hat{E}[\![E]\!]\xi)$
      $\hat{A}[\![{*R} := E]\!]\xi = (\rho, m')$ where $\xi = (\rho, m)$ and $m' = m(\rho(R) \mapsto \hat{E}[\![E]\!]\xi)$
      $\hat{A}[\![\mathrm{JMP}\ E]\!]\xi = (\rho', m)$ where $\xi = (\rho, m)$ and $\rho' = \rho(PC \mapsto \hat{E}[\![E]\!]\xi)$
      $\hat{A}[\![\mathrm{RTN}]\!]\xi = (\rho', m)$ where $\xi = (\rho, m)$ and $\rho' = \rho(PC \mapsto m(\rho(SP)),\ SP \mapsto SP + 1)$
      $\hat{A}[\![\mathrm{PUSH}\ E]\!]\xi = (\rho', m')$ where $\xi = (\rho, m)$ and $\rho' = \rho(SP \mapsto SP - 1)$ and $m' = m(\rho(SP - 1) \mapsto \hat{E}[\![E]\!]\xi)$
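The action rules on this slide can be read as pure functions from an execution context to a new one. The following sketch makes that reading concrete under several assumptions: environments and memories are dictionaries, an expression is modelled as a Python callable standing in for the expression semantics Ê, and ρ(SP − 1) is read as "the value of SP, minus one". The function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Env = Dict[str, int]
Mem = Dict[int, int]
Ctx = Tuple[Env, Mem]            # xi = (rho, m)
Expr = Callable[[Ctx], int]      # assumption: stands in for the expression semantics


def assign(reg: str, e: Expr, ctx: Ctx) -> Ctx:
    """R := E  updates the environment at register R."""
    env, mem = ctx
    return {**env, reg: e(ctx)}, mem


def store(reg: str, e: Expr, ctx: Ctx) -> Ctx:
    """*R := E  updates memory at the address held in R."""
    env, mem = ctx
    return env, {**mem, env[reg]: e(ctx)}


def jmp(e: Expr, ctx: Ctx) -> Ctx:
    """JMP E  sets the program counter to the value of E."""
    env, mem = ctx
    return {**env, "PC": e(ctx)}, mem


def rtn(ctx: Ctx) -> Ctx:
    """RTN  pops the return address from the stack into PC."""
    env, mem = ctx
    return {**env, "PC": mem[env["SP"]], "SP": env["SP"] + 1}, mem


def push(e: Expr, ctx: Ctx) -> Ctx:
    """PUSH E  decrements SP and stores the value of E at the new top of stack."""
    env, mem = ctx
    sp = env["SP"] - 1
    return {**env, "SP": sp}, {**mem, sp: e(ctx)}


# Illustrative run: set R0, push it, then return to the pushed address.
ctx: Ctx = ({"PC": 0, "SP": 10, "R0": 0}, {})
ctx = assign("R0", lambda c: 7, ctx)
ctx = push(lambda c: c[0]["R0"], ctx)
ctx = rtn(ctx)
print(ctx)
```

Each function returns fresh dictionaries rather than mutating its input, so every intermediate context can be kept as an entry of a concrete trace.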

  28. Semantic Simulator
      Not “live” testing. Evaluate the abstract trace and collect concrete traces.
      Semantics of Commands: $\hat{C} : S \to \Sigma(S)$ (determines the transition relation between states)
      $\hat{C}[\![C_A]\!]\xi = (\xi', C')$ where $\xi = (\rho, m)$, $\xi' = \hat{A}[\![A]\!]\xi$ and
      $C' = \begin{cases} m(\rho(PC)) & \text{if } A := \mathrm{JMP} \cup \mathrm{CALL} \cup \mathrm{RTN} \\ m(\rho(PC + 1)) & \text{otherwise} \end{cases}$
      $\hat{C}[\![C_B]\!]\xi = (\xi', C')$ where $\xi = (\rho, m)$, and
      $(\xi', C') = \begin{cases} \xi' = (\rho', m),\ \rho' = \rho(PC \mapsto \hat{E}[\![E]\!]\xi),\ C' = m(\rho(\hat{E}[\![E]\!]\xi)) & \text{if } \hat{B}[\![B]\!]\xi = \text{true} \\ \xi' = \xi,\ C' = m(\rho(PC + 1)) & \text{otherwise} \end{cases}$

  29. TSAlgo – Trace slicing
      • $P \xrightarrow{\text{slice}} P'$ (semantically invariant subprogram wrt a criterion)
      • $t \xrightarrow{\text{slice}} t'$ (semantically invariant subtrace wrt $tsc$)
      • Trace slicing criterion $tsc$: recent definition points of variables in $t$
      • A conjecture: useful in the detection step for more accurate and efficient results
      • Effect is to shorten the trace and thus the signature
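The slide does not give the TSAlgo algorithm itself, so the sketch below is only a generic backward, data-dependence trace slice in the same spirit. It assumes each trace entry records the one variable it defines and the variables it uses, and that the criterion is a set of variable names whose most recent definitions should be preserved; `Step`, `slice_trace`, and the sample trace are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Step(NamedTuple):
    defines: str            # variable written by this trace entry
    uses: FrozenSet[str]    # variables read by this trace entry
    text: str               # the command, kept for display


def slice_trace(trace: List[Step], criterion: Set[str]) -> List[Step]:
    """Keep only the entries that the criterion variables depend on."""
    wanted = set(criterion)
    kept: List[Step] = []
    for step in reversed(trace):
        if step.defines in wanted:
            kept.append(step)
            # The most recent definition is found; now look for the
            # definitions of whatever this entry reads.
            wanted.discard(step.defines)
            wanted.update(step.uses)
    kept.reverse()
    return kept


# Illustrative trace containing one garbage instruction on R22.
trace = [
    Step("R0", frozenset({"n"}), "R0:=n"),
    Step("R22", frozenset({"R22"}), "R22:=R22+1"),
    Step("R11", frozenset({"m"}), "R11:=m"),
    Step("R3", frozenset({"R11", "R0"}), "R3:=R11+R0"),
]
sliced = slice_trace(trace, {"R3"})
print([s.text for s in sliced])
```

In this example the garbage update of R22 is dropped because nothing in the criterion depends on it, which shows how slicing can shorten a trace and hence a signature.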
