Semantic Trace-based Malware Variants Detection (Khalid Alzarooni)



SLIDE 1

Overview Trace-based approach Experiments

Semantic Trace-based Malware Variants Detection

Khalid Alzarooni

CREST - DCS - UCL

April 6, 2011

SLIDE 2

Outline

  • Overview
  • Trace-based approach
  • Experiments

SLIDE 3

Overview

SLIDE 4

Malware Variants

  • The speed of malware evolution is partly driven by the automatic generation of program variants
  • Semantic equivalence tables are used in malware, e.g. polymorphic and metamorphic malware
  • These alter the “local behaviour” of programs, but the larger-scale behaviour is unchanged

SLIDE 5

Malware Problem

Anoirel S. Issa, Symantec, UK (EICAR 2009):

“Poly or metamorphic engines have some essential components that help them build highly obfuscated code. A single engine is able to produce unique variants that can reach millions.”

Malware evolution: M0 → M1 → M2 → M3 → ...
Syntactic view: code0 ≈ code1 ≈ code2 ≈ code3 ≈ ...

SLIDE 6

Some Code Obfuscation Schemes

[Beaucamps, 2007; Ször, 2005]

Label  Category            Obfuscation
gi     Garbage insertion   {} → {C}
op     Opaque predicate    {} → {P^{T/F}}
ec     Equivalent command  {op} → {\overline{op}}
rr     Register renaming   {Rx} → {Ry}
cs     Command split       {C} → {Cx, Cy}
cm     Command merging     {Cx, Cy} → {Cxy}
cr     Command reorder     {(Cx, Cy)} → {(Cy, Cx)}
...    ...                 ...
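
As a rough illustration of two rows of the table, the sketch below applies register renaming (rr) and garbage insertion (gi) to a toy program. The instruction format (a list of label/text pairs), the function names and the choice of R22 as a junk register are assumptions made for this example only; they are not taken from the slides.

```python
import re

# Toy instruction format (an assumption for illustration): (label, text).
Program = list[tuple[str, str]]


def rename_register(program: Program, old: str, new: str) -> Program:
    """Register renaming (rr): replace every whole-word use of `old` by `new`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [(label, pattern.sub(new, text)) for label, text in program]


def insert_garbage(program: Program, index: int, junk_register: str = "R22") -> Program:
    """Garbage insertion (gi): add a command on a register the program never reads."""
    junk = (f"gi{index}", f"{junk_register}:={junk_register}+1")
    return program[:index] + [junk] + program[index:]


P = [("a", "R0:=n"), ("b", "R1:=m"), ("c", "R2:=R1")]
variant = insert_garbage(rename_register(P, "R1", "R11"), 1)
for label, text in variant:
    print(label, text)
```

The variant differs from P syntactically, yet the values that flow into R0 and R2 are the same, which is the property the semantic view relies on.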

SLIDE 7

Example: a program P and its semantically equivalent variant P′

P:
  a     R0:=n
  b     R1:=m
  c     R2:=R1
  d     R3:=R2+R0
  e     R4:=R1+k
  f     R5:=1

→

P′:
  a′    R0:=n
  cr1   JMP rr1
  gi1   R22:=R22+1
  op1   P^T JMP cm
  rr1   R11:=m
  gi2   R22:=R22+1
  cr2   JMP op1
  cm    R3:=R11+R0
  e′1   R4:=k
  e′2   R4:=R4+R11
  rr2   R15:=1

SLIDE 8

Malware Problem

  • Goal: to detect variants of a known malware
  • Given two arbitrary programs, is it possible to tell whether they are semantically equivalent?
  • It is undecidable: it is not possible to devise an algorithm that produces a “yes” or “no” detection answer [Cohen, 1987]

P′ ≈? P

SLIDE 9

Semantic trace-based

Program
↓
Program approximation
↓
Trace collection
↓
Semantic analysis
↓
Detection of semantic signatures

SLIDE 10

Test scenarios

Results:

  • Tested samples: Bho, Binom, Mobler, Telf, . . .
  • Most malware successfully matched, with k ≥ 60%
  • No false positives, similarity ≤ 20% (10 benign executables)
  • 100% malware variants classification
  • sig-w-slice: accuracy 30% and speed 26% in detection phase
  • sig-wo-slice: 5:7 faster in sig. generation phase

SLIDE 11

Trace-based approach

SLIDE 12

Semantic trace-based

  • Design a detector that can tell when two programs are approximately equivalent, which might often be good enough
  • Approximate semantic equivalence is decidable
  • Approximate a program’s semantics ⟦P⟧
  • CFG abstract traces (program paths) & test inputs
  • concrete & semantic traces

Malware evolution: M0 → M1 → M2 → M3 → ...
Syntactic view: code0 ≈ code1 ≈ code2 ≈ code3 ≈ ...
Semantic view: ⟦M0⟧ ≈ ⟦M1⟧ ≈ ⟦M2⟧ ≈ ⟦M3⟧ ≈ ...

SLIDE 13

Semantic trace-based

  • M1 is a variant of M0 if ⟦M0⟧ is a sub-sequence of ⟦M1⟧.

[Figure: a malware trace t and a variant trace t′, each with points labelled 1 to 4]

∀t ∈ ⟦M0⟧, ∃t′ ∈ ⟦M1⟧ : t ≺ t′
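
The condition above can be sketched in a few lines, reading t ≺ t′ as "t is a (not necessarily contiguous) sub-sequence of t′". Traces are modelled here as plain Python lists of comparable items; that representation and the function names are assumptions for illustration, not the prototype's data structures.

```python
from typing import Iterable, Sequence


def is_subsequence(t: Sequence, t_prime: Iterable) -> bool:
    """True if the items of t occur in t_prime in the same order (gaps allowed)."""
    remaining = iter(t_prime)
    # Each `any` consumes `remaining` up to the first match, so order is preserved.
    return all(any(step == other for other in remaining) for step in t)


def is_variant(sem_m0: Iterable[Sequence], sem_m1: Sequence[Sequence]) -> bool:
    """For every trace t of M0 there is some trace t' of M1 with t a sub-sequence of t'."""
    return all(any(is_subsequence(t, tp) for tp in sem_m1) for t in sem_m0)


malware_trace = [1, 2, 3, 4]
variant_trace = [1, 9, 2, 3, 8, 4]  # same steps with extra, inserted ones
print(is_variant([malware_trace], [variant_trace]))
```

This exact check is stricter than what a practical detector would use; a threshold-based, approximate notion of containment is the more forgiving alternative.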

SLIDE 14

Semantic trace-based

Two phases:

  1. Signature generation
  2. Detection

SLIDE 15

Signature generation phase

executable M
↓ (disassembler & translator)
abstract code (AAPL)
↓ (test data generator)
abstract trace and a test input x
↓ (semantic simulator)
a concrete trace
↓ (trace slicer)
trace slices
↓ (abstracter)
semantic traces τm

semantic signature = (τm, x)

SLIDE 16

Detection phase

executable P
↓ (disassembler & translator)
abstract code (AAPL)
↓ (semantic simulator, sigm = (τm, x))
a concrete trace
↓ (abstracter)
(τp, τm)
↓ (Matcher)
yes/no

SLIDE 17

Experiments

SLIDE 18

Detector prototype

[Figure: detector prototype. Labels: Malicious program M, Signature generation phase, Semantic signatures, Suspicious program P, Detection phase, Yes/No]

SLIDE 19

Test scenarios

We tested:

  • Robustness against real in-the-wild variants
  • Effectiveness of trace slicing in the signatures
  • Sig. gen. & detection phases: sig-wo-slice vs. sig-w-slice
  • False positives
  • Classification of malware samples

SLIDE 20

Test scenarios

Results:

  • Tested samples: Bho, Binom, Mobler, Telf, . . .
  • Most malware successfully matched, with k ≥ 60%
  • sig-w-slice: accuracy 30% and speed 26% in detection phase
  • sig-wo-slice: 5:7 faster in sig. generation phase
  • No false positives, similarity ≤ 20% (10 benign executables)
  • 100% malware variants classification

SLIDE 21

Prototype limitation

Technical shortcomings:

  • Limited to viruses and worms
  • Does not work for dynamically packed code or for code with anti-disassembly techniques
  • Relies on tools to manually unpack (encrypted) and disassemble files

SLIDE 22

Thank you very much!

Image: Salvatore Vuono / FreeDigitalPhotos.net

SLIDE 23

References

Alzarouni, K., Clark, D., and Tratt, L. (2010). Semantic malware detection. Technical Report TR-10-03, Department of Computer Science, King’s College London.

Beaucamps, P. (2007). Advanced metamorphic techniques in computer viruses. In Proceedings of the International Conference on Computer, Electrical, and Systems Science, and Engineering - CESSE’07.

Cohen, F. (1987). Computer viruses: theory and experiments. Comput. Secur., 6(1):22–35.

Ször, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley, Reading, Mass.

SLIDE 24

Detector components

SLIDE 25

Trace Semantics

  • The trace semantics of a program is the set of all traces T that the program can produce
  • A trace t ∈ T is a sequence of pairs of execution context X and program syntax C
  • Execution context: memory (locations) and environment (variables) values, X = E × M
  • Program syntax: source code (commands)

ρ ∈ E = R → Z⊥          (environments)
m ∈ M = Z → Z⊥ ∪ C      (memory)
ξ ∈ X = E × M           (execution contexts)
S = C × X               (program states)
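
One possible way to mirror these domains in code is shown below. The Python names (`Value`, `Context`, `State`, `Trace`), the use of `None` for the undefined value ⊥, and commands stored as plain strings are all assumptions chosen for readability; the slides do not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Optional, Union

Value = Optional[int]                       # Z⊥: an integer, or None for "undefined"
Command = str                               # C: program syntax, kept as plain text here
Environment = dict[str, Value]              # E = R → Z⊥ (register name to value)
Memory = dict[int, Union[Value, Command]]   # M = Z → Z⊥ ∪ C (address to value or command)


@dataclass(frozen=True)
class Context:                              # X = E × M
    env: Environment
    mem: Memory


State = tuple[Command, Context]             # S = C × X
Trace = list[State]                         # a trace is a sequence of states

# A one-step trace: the command at address 0 is about to run, R0 is still undefined.
xi0 = Context(env={"R0": None, "PC": 0}, mem={0: "R0:=5"})
trace: Trace = [("R0:=5", xi0)]
```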

SLIDE 26

Trace Semantics

  • Signatures refer to the exact program state
  • Semantic signatures refer to values at particular memory locations and in registers that are observed to be constant across variants from the same malware family
  • Detection: look for environment-memory traces of M that are contained in (are subtraces of) environment-memory traces of M′

SLIDE 27

Semantic Simulator

Not “live” testing
Evaluate abstract trace and collect concrete traces

Semantics of Actions:

Â : A × X → X

Â⟦R := E⟧ξ  = (ρ′, m)   where ξ = (ρ, m) and ρ′ = ρ(R → Ê⟦E⟧ξ)
Â⟦∗R := E⟧ξ = (ρ, m′)   where ξ = (ρ, m) and m′ = m(ρ(R) → Ê⟦E⟧ξ)
Â⟦JMP E⟧ξ   = (ρ′, m)   where ξ = (ρ, m) and ρ′ = ρ(PC → Ê⟦E⟧ξ)
Â⟦RTN⟧ξ     = (ρ′, m)   where ξ = (ρ, m) and ρ′ = ρ(PC → m(ρ(SP)), SP → SP + 1)
Â⟦PUSH E⟧ξ  = (ρ′, m′)  where ξ = (ρ, m) and ρ′ = ρ(SP → SP − 1) and m′ = m(ρ(SP − 1) → Ê⟦E⟧ξ)
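
The action rules can be read as pure functions from one execution context to the next. The sketch below is one such reading, under several assumptions: contexts are (environment, memory) pairs of dictionaries, an expression E is a Python callable standing in for Ê⟦E⟧, and PUSH stores at the already decremented stack pointer. The function names are illustrative only.

```python
from typing import Callable

Env = dict[str, int]
Mem = dict[int, int]
Ctx = tuple[Env, Mem]
Expr = Callable[[Ctx], int]  # stands in for the expression semantics Ê⟦E⟧


def assign(reg: str, expr: Expr, ctx: Ctx) -> Ctx:
    """R := E, updating the environment only."""
    env, mem = ctx
    return ({**env, reg: expr(ctx)}, mem)


def store(reg: str, expr: Expr, ctx: Ctx) -> Ctx:
    """*R := E, updating memory at the address held in R."""
    env, mem = ctx
    return (env, {**mem, env[reg]: expr(ctx)})


def jmp(expr: Expr, ctx: Ctx) -> Ctx:
    """JMP E, setting the program counter to the value of E."""
    return assign("PC", expr, ctx)


def rtn(ctx: Ctx) -> Ctx:
    """RTN, popping the return address from the stack into PC."""
    env, mem = ctx
    return ({**env, "PC": mem[env["SP"]], "SP": env["SP"] + 1}, mem)


def push(expr: Expr, ctx: Ctx) -> Ctx:
    """PUSH E, decrementing SP and storing the value of E at the new top of stack."""
    env, mem = ctx
    sp = env["SP"] - 1
    return ({**env, "SP": sp}, {**mem, sp: expr(ctx)})


start: Ctx = ({"PC": 0, "SP": 10, "R0": 0}, {})
after_push = push(lambda c: 42, start)
after_rtn = rtn(after_push)
print(after_rtn[0]["PC"], after_rtn[0]["SP"])
```

Every function returns fresh dictionaries, so earlier contexts stay intact and can be kept as entries of a concrete trace.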

SLIDE 28

Semantic Simulator

Not “live” testing
Evaluate abstract trace and collect concrete traces

Semantics of Commands:

Ĉ : S → Σ(S) (determines transition relation between states)

Ĉ⟦CA⟧ξ = (ξ′, C′) where ξ = (ρ, m), ξ′ = Â⟦A⟧ξ and
  C′ = m(ρ(PC))       if A := JMP ∪ CALL ∪ RTN
  C′ = m(ρ(PC + 1))   otherwise

Ĉ⟦CB⟧ξ = (ξ′, C′) where ξ = (ρ, m), and
  ξ′ = (ρ′, m), ρ′ = ρ(PC → Ê⟦E⟧ξ), C′ = m(ρ(Ê⟦E⟧ξ))   if B̂⟦B⟧ξ = true
  ξ′ = ξ, C′ = m(ρ(PC + 1))                              otherwise

SLIDE 29

TSAlgo – Trace slicing

  • P --slice--> P′ (semantically invariant subprogram wrt a criterion)
  • t --slice--> t′ (semantically invariant subtrace wrt tsc)
  • Trace slicing criterion tsc: recent definition points of variables in t
  • A conjecture: useful in the detection step for more accurate and efficient results.
  • Effect is to shorten the trace and thus the signature
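
To make the idea concrete, here is a generic backward slice over a trace, keeping only the most recent definitions that a chosen set of variables depends on. It is a textbook-style sketch and not TSAlgo itself: the trace format (one `(defined, used)` pair per step) and the function name are assumptions for illustration.

```python
def slice_trace(trace, criterion):
    """Backward slice: keep the most recent definitions the criterion depends on.

    trace     -- list of (defined_variable, set_of_used_variables), in execution order
    criterion -- variables whose final values must be preserved
    """
    needed = set(criterion)
    kept = []
    for defined, used in reversed(trace):
        if defined in needed:
            kept.append((defined, used))
            needed.discard(defined)  # this step is the most recent definition
            needed.update(used)      # its operands now need definitions too
    kept.reverse()
    return kept


# Steps loosely modelled on the variant P′: R22 is only touched by garbage code.
trace = [
    ("R0", set()),
    ("R22", {"R22"}),
    ("R11", set()),
    ("R22", {"R22"}),
    ("R3", {"R11", "R0"}),
]
print(slice_trace(trace, {"R3"}))
```

With the criterion {R3}, both garbage steps on R22 are dropped, which shows how slicing shortens the trace and therefore the signature.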

SLIDE 30

Signature matching

Given sig = (τm, x) of M and τp of P:

MD(sig, P) = yes   if τm is contained in τp
             no    otherwise

Our assumption: the two variants share some core semantic values that match with a high degree of similarity, indicating that they are likely to be behaviourally the same.

SLIDE 31

Signature matching

  • We look for corresponding semantic traces of τm in τp
  • A fuzzy matching determines whether τp corresponds semantically to τm

semantic similarity measure = no. of mappings / |τm| (expressed as a percentage)

  • We consider τm to be contained in τp if the similarity measure is at or above a certain similarity threshold k: k ≤ similarity measure ≤ 100

k: a large percentage of (desired) mappings
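
A minimal sketch of the threshold test follows. How mappings are actually counted is not specified on the slide, so the greedy, in-order matching below is an assumption, as are the function names and the default k = 60 (borrowed from the "k ≥ 60%" figure in the results).

```python
def similarity(tau_m, tau_p):
    """Percentage of entries of tau_m that can be mapped, in order, into tau_p."""
    if not tau_m:
        raise ValueError("tau_m must contain at least one entry")
    mappings = 0
    position = 0
    for step in tau_m:
        try:
            # Search only to the right of the previous match to preserve order.
            position = tau_p.index(step, position) + 1
            mappings += 1
        except ValueError:
            continue  # no counterpart for this entry; try the next one
    return 100.0 * mappings / len(tau_m)


def matches(tau_m, tau_p, k=60.0):
    """MD(sig, P): 'yes' when the similarity measure reaches the threshold k."""
    return similarity(tau_m, tau_p) >= k


tau_m = [1, 2, 3, 4]
tau_p = [1, 9, 2, 4]
print(similarity(tau_m, tau_p), matches(tau_m, tau_p))
```

Three of the four signature entries find a counterpart here, so the measure is 75 and the sample is reported as a match at k = 60, even though τm is not an exact sub-sequence of τp.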