A Semantics-based Approach to Malware Detection


slide-1
SLIDE 1

A Semantics-based Approach to Malware Detection

Mila Dalla Preda – University of Verona, Italy
Mihai Christodorescu, Somesh Jha – University of Wisconsin, USA
Saumya Debray – University of Arizona, USA

17-19 Jan, POPL ’07, Nice

A Semantics-based Approach to Malware Detection – p.1

slide-3
SLIDE 3

A Few Basic Definitions

Malware is malicious software. A malware detector is a program D that determines whether another program P is infected with a malware M.

D(P, M) = True if D determines that P is infected with M
D(P, M) = False otherwise

An ideal malware detector detects all and only the programs infected with M, i.e., it is sound and complete.

  • Sound = no false positives (no false alarms)
  • Complete = no false negatives (no missed alarms)
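
The two properties can be checked empirically on a labelled corpus. The following is a minimal Python sketch, not something from the talk: the substring-based detector, the instruction strings, and the corpus are all hypothetical, and "sound" and "complete" here only mean sound and complete on this particular corpus.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a program is a string of instructions,
# and a corpus entry pairs a program with its ground-truth label.
Detector = Callable[[str, str], bool]
Corpus = List[Tuple[str, bool]]

def is_sound(detect: Detector, malware: str, corpus: Corpus) -> bool:
    """No false positives: every flagged program is truly infected."""
    return all(infected for prog, infected in corpus if detect(prog, malware))

def is_complete(detect: Detector, malware: str, corpus: Corpus) -> bool:
    """No false negatives: every truly infected program is flagged."""
    return all(detect(prog, malware) for prog, infected in corpus if infected)

def substring_detector(program: str, malware: str) -> bool:
    """Toy detector D(P, M): flags P when M occurs verbatim inside it."""
    return malware in program

MALWARE = "push ebx; call ReleaseLock"
CORPUS: Corpus = [
    ("mov eax, 1; push ebx; call ReleaseLock", True),
    ("push ebx; nop; call ReleaseLock", True),   # obfuscated variant
    ("mov eax, 1; ret", False),
]

sound = is_sound(substring_detector, MALWARE, CORPUS)
complete = is_complete(substring_detector, MALWARE, CORPUS)
```

On this corpus the toy detector never raises a false alarm but misses the variant with the inserted `nop`, so it is sound without being complete.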

slide-5
SLIDE 5

Malware Trends

There is more malware every year.

[Chart: new malware and new malware families per year, 2002–2005. Surviving data labels: 445 and 10992 for new malware; 141 and 101 for new malware families.]

But the number of malware families has almost no variation.

  • The Beagle family has 197 variants (as of Nov. 30).
  • The Warezov family has 218 variants (as of Nov. 27).

slide-7
SLIDE 7

The Malware Threat

Current detectors are signature-based:

P matches byte-signature sig ⇒ P is infected

Signature-based detectors, when sound, are not complete. Malware writers use obfuscation to evade current detectors.

Virus–antivirus “coevolution”:

  1. Malware writers create new, undetected malware.
  2. Antimalware tools are updated to catch the new malware.
  3. Repeat...
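
Why a sound signature matcher is not complete can be seen in a few lines. This is a minimal Python sketch with placeholder bytes: the signature and the surrounding bytes are hypothetical and are not a real signature or real malware; the only real-world fact used is that 0x90 is the one-byte NOP opcode on x86.

```python
def matches_signature(program: bytes, signature: bytes) -> bool:
    """Byte-signature test: P is flagged iff sig occurs contiguously in P."""
    return signature in program

# Hypothetical placeholder signature (three arbitrary bytes).
SIGNATURE = b"\xAA\xBB\xCC"

# A program that contains the signature verbatim.
original = b"\x01" + SIGNATURE + b"\x02"

# The same bytes with a single NOP (0x90) inserted inside the
# signature region, as a nop-insertion obfuscation would do.
obfuscated = b"\x01\xAA\x90\xBB\xCC\x02"

hit_original = matches_signature(original, SIGNATURE)
hit_obfuscated = matches_signature(obfuscated, SIGNATURE)
```

The original is flagged and the obfuscated variant is not, which is one round of the coevolution loop described on the slide.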

slide-8
SLIDE 8

Common Obfuscations

  • Nop insertion
  • Register renaming
  • Junk insertion
  • Code reordering
  • Encryption
  • Reordering of independent statements
  • Reversing of branch conditions
  • Equivalent instruction substitution
  • Opaque predicate insertion
  • ... and many others...

slide-11
SLIDE 11

Obfuscation Example

(Pseudo-)Code:

    mov eax, [edx+0Ch]
    push ebx
    push [eax]
    call ReleaseLock

Obfuscated code (junk):

    mov eax, [edx+0Ch]
    inc eax
    push ebx
    dec eax
    push [eax]
    call ReleaseLock

slide-12
SLIDE 12

Obfuscation Example

(Pseudo-)Code:

    mov eax, [edx+0Ch]
    push ebx
    push [eax]
    call ReleaseLock

Obfuscated code (junk + reordering):

    mov eax, [edx+0Ch]
    jmp +3
    push ebx
    dec eax
    jmp +4
    inc eax
    jmp -3
    call ReleaseLock
    jmp +2
    push [eax]
    jmp -2
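
The reordered listing can be traced mechanically. The Python sketch below assumes that `jmp +k` skips k instructions forward and `jmp -k` skips k instructions backward; this convention is an inference from the listing (it is the reading under which the code terminates), not something the slide states. The function name `executed` is hypothetical.

```python
def executed(prog, max_steps=1000):
    """Return the non-jump instructions in the order they execute.

    Assumed convention: 'jmp +k' skips k instructions forward,
    'jmp -k' skips k instructions backward (relative to the jmp).
    """
    pc, out = 0, []
    for _ in range(max_steps):          # guard against infinite loops
        if not 0 <= pc < len(prog):
            break                       # fell off the end: program is done
        ins = prog[pc]
        if ins.startswith("jmp "):
            k = int(ins.split()[1])     # int() accepts "+3" and "-3"
            pc = pc + k + 1 if k > 0 else pc + k - 1
        else:
            out.append(ins)
            pc += 1
    return out

OBFUSCATED = [
    "mov eax, [edx+0Ch]", "jmp +3", "push ebx", "dec eax", "jmp +4",
    "inc eax", "jmp -3", "call ReleaseLock", "jmp +2", "push [eax]", "jmp -2",
]

JUNK_ONLY = [
    "mov eax, [edx+0Ch]", "inc eax", "push ebx",
    "dec eax", "push [eax]", "call ReleaseLock",
]

trace = executed(OBFUSCATED)
```

Under that assumption, the executed sequence of the reordered code is exactly the junk-only variant: the jumps change the layout of the code but not the order in which the instructions run.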

slide-14
SLIDE 14

Solutions?

Recent developments based on deep static analysis:

  • Detecting Malicious Code by Model Checking [Kinder et al. 2005]
  • Semantics-Aware Malware Detection [Christodorescu et al. 2005]
  • Behavior-based Spyware Detection [Kirda et al. 2006]

Lack of a formal framework for assessing these techniques.

slide-16
SLIDE 16

Our Contributions

Challenges:

  • Many different obfuscations
  • Obfuscations are usually combined
  • Detection schemes usually rely on static/dynamic analyses

A framework for assessing the resilience to obfuscation of malware detectors:

  • Obfuscation as transformation of trace semantics
  • Malware detection as abstract interpretation of trace semantics
  • Composing obfuscations vs. composing detectors

slide-19
SLIDE 19

Two Worlds of Malware Detectors

  • Malware detector on finite semantic structure: disassembler, CFG construction, other analyses
  • Malware detector on trace semantics

slide-20
SLIDE 20

Abstract Interpretation

Design approximate semantics of programs [Cousot & Cousot ’77, ’79].

[Diagram: α maps an element c of the concrete lattice C to α(c) in the abstract lattice A; γ maps α(c) back to γ(α(c)) in C. Both lattices have a top ⊤ and a bottom ⊥.]

Galois connection: ⟨C, α, γ, A⟩, where A and C are complete lattices.

⟨Abs(C), ⊑⟩ is the set of all possible abstract domains; A1 ⊑ A2 if A1 is more concrete than A2.
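
A toy instance may help fix the notation. The Python sketch below is an assumption-laden illustration, not taken from the talk: the concrete domain is the powerset of a small finite universe, and the abstract domain is a four-element parity lattice with hypothetical element names `"bot"`, `"even"`, `"odd"`, `"top"`.

```python
from itertools import combinations

UNIVERSE = frozenset(range(8))   # assumed small concrete universe

def alpha(c: frozenset) -> str:
    """Abstraction: the most precise parity description of the set c."""
    if not c:
        return "bot"
    parities = {x % 2 for x in c}
    if parities == {0}:
        return "even"
    if parities == {1}:
        return "odd"
    return "top"

def gamma(a: str) -> frozenset:
    """Concretization: all universe elements described by a."""
    if a == "bot":
        return frozenset()
    if a == "even":
        return frozenset(x for x in UNIVERSE if x % 2 == 0)
    if a == "odd":
        return frozenset(x for x in UNIVERSE if x % 2 == 1)
    return UNIVERSE

# Every subset of the universe (256 of them).
SUBSETS = [frozenset(s)
           for n in range(len(UNIVERSE) + 1)
           for s in combinations(sorted(UNIVERSE), n)]

# c is always below gamma(alpha(c)): abstraction only loses precision.
extensive = all(c <= gamma(alpha(c)) for c in SUBSETS)
# alpha(gamma(a)) == a: no abstract element is redundant here.
exact = all(alpha(gamma(a)) == a for a in ("bot", "even", "odd", "top"))
```

For example, `alpha(frozenset({2, 4}))` is `"even"`, and concretizing it gives every even number in the universe, a superset of the original set.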

slide-21
SLIDE 21

Outline

  • Semantic Malware Detector
  • Soundness and Completeness
  • Classifying Obfuscations
  • Composing Obfuscations
  • Proving Soundness and Completeness

slide-25
SLIDE 25

Semantic Malware Detector

A program P is infected by malware M, denoted M ↪ P, if (a part of) P's execution is similar to that of M:

∃ restriction r : S⟦M⟧ ⊆ αr(S⟦P⟧)

[Diagram: αr relates the program trace to the malware trace.]

Vanilla malware, i.e., malware that is not obfuscated.
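
The infection test can be sketched on finite toy data. In the Python below, a trace is a tuple of instruction strings and a restriction r is modelled as a set of instructions to keep; the slides do not spell out what a restriction is, so this model, the function names, and the example traces are all assumptions. The search over restrictions is brute force and only suitable for tiny examples.

```python
from itertools import combinations

def alpha_r(traces, r):
    """Restrict every trace to the instructions in r (assumed model of αr)."""
    return {tuple(i for i in t if i in r) for t in traces}

def infected(malware_traces, program_traces):
    """True iff some restriction r gives S[M] ⊆ αr(S[P])."""
    vocab = sorted({i for t in program_traces for i in t})
    for n in range(len(vocab) + 1):
        for r in combinations(vocab, n):
            if malware_traces <= alpha_r(program_traces, set(r)):
                return True
    return False

# Hypothetical trace semantics (one trace each).
S_M = {("push ebx", "call ReleaseLock")}
S_HOST = {("mov eax, 1", "push ebx", "call ReleaseLock", "ret")}
S_CLEAN = {("mov eax, 1", "ret")}

host_infected = infected(S_M, S_HOST)
clean_infected = infected(S_M, S_CLEAN)
```

Restricting the host trace to the two malware instructions recovers the malware trace, so the host is reported infected; no restriction of the clean trace does.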

slide-27
SLIDE 27

Obfuscated Malware

O : ℙ → ℙ is an obfuscating transformation.
α : Sem → A is an abstraction that discards the details changed by the obfuscation while preserving maliciousness.

∃ restriction r : α(S⟦M⟧) ⊆ α(αr(S⟦P⟧))

[Diagram: the program trace, the obfuscated malware trace, and the malware trace, related by αr and α.]

slide-30
SLIDE 30

Sound vs. Complete

Precision of the Semantic Malware Detector (SMD) depends on α.

An SMD on α is complete w.r.t. a set 𝕆 of transformations if ∀O ∈ 𝕆:

O(M) ↪ P ⇒ (∃ restriction r : α(S⟦M⟧) ⊆ α(αr(S⟦P⟧)))

i.e., it always detects programs that are infected (no false negatives).

If α is preserved by O, then the SMD on α is complete w.r.t. O.
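
A small example of an abstraction that is preserved by an obfuscation, written as a Python sketch under simplifying assumptions: the obfuscation is a toy nop insertion acting directly on traces, α simply forgets nops, and the restriction r is taken to be the identity. None of these names or choices come from the talk.

```python
def nop_insertion(trace):
    """Toy obfuscation O: insert a nop after every instruction."""
    out = []
    for ins in trace:
        out += [ins, "nop"]
    return tuple(out)

def alpha(traces):
    """Toy abstraction α: discard the detail changed by O (the nops)."""
    return {tuple(i for i in t if i != "nop") for t in traces}

def smd(malware_traces, program_traces):
    """Detector test α(S[M]) ⊆ α(S[P]), with the restriction left out."""
    return alpha(malware_traces) <= alpha(program_traces)

S_M = {("push ebx", "call ReleaseLock")}
S_P = {nop_insertion(t) for t in S_M}   # program = obfuscated malware only

preserved = alpha(S_P) == alpha(S_M)    # α cannot see what O changed
detected = smd(S_M, S_P)                # so the SMD on α still fires
naive = S_M <= S_P                      # comparison without abstraction
```

Because α maps the obfuscated and the original traces to the same abstract value, the detector on α still reports the infection, while the comparison on raw traces misses it.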

slide-31
SLIDE 31

Sound vs. Complete

Precision of the Semantic Malware Detector (SMD) depends on α.

An SMD on α is sound w.r.t. a set 𝕆 of transformations if:

(∃ restriction r : α(S⟦M⟧) ⊆ α(αr(S⟦P⟧))) ⇒ ∃O ∈ 𝕆 : O(M) ↪ P

i.e., it never erroneously claims a program is infected (no false positives).

slide-33
SLIDE 33

Classifying Obfuscations

O : ℙ → ℙ is a conservative obfuscation if

∀ trace1 ∈ S⟦P⟧, ∃ trace2 ∈ S⟦O⟦P⟧⟧ : trace1 is a sub-sequence of trace2

[Diagram: a program trace with states 1 2 3 4, and the obfuscated program trace in which states 1 2 3 4 still occur.]
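
On finite sets of traces the definition can be checked directly. A minimal Python sketch, with hypothetical function names and with traces modelled as tuples of state numbers as in the diagram:

```python
def is_subsequence(short, long):
    """True iff `short` can be obtained from `long` by deleting elements."""
    it = iter(long)
    # `x in it` advances the iterator, so matches must occur in order.
    return all(x in it for x in short)

def is_conservative_on(original_traces, obfuscated_traces):
    """∀ trace1 ∈ S[P], ∃ trace2 ∈ S[O[P]] : trace1 is a sub-sequence of trace2."""
    return all(
        any(is_subsequence(t1, t2) for t2 in obfuscated_traces)
        for t1 in original_traces
    )

S_P = {(1, 2, 3, 4)}
S_JUNK = {(1, 9, 2, 3, 8, 4)}      # extra states inserted: conservative
S_SUBST = {(1, 5, 3, 4)}           # state 2 replaced by 5: not conservative

junk_ok = is_conservative_on(S_P, S_JUNK)
subst_ok = is_conservative_on(S_P, S_SUBST)
```

Inserting states keeps the original trace as a sub-sequence, whereas replacing a state does not, which is why substitution-style obfuscations fall outside this class.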

slide-37
SLIDE 37

Conservative Obfuscations

Abstraction αc handles conservative obfuscations:

αc[X](Y) = X ∩ SubSequences(Y)

The SMD on αc is sound and complete w.r.t. conservative obfuscations.

Abstraction αc returns the set of malware traces that are subsequences of some program trace.
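
The formula translates almost literally into code for finite trace sets. The Python sketch below is illustrative only: it enumerates all subsequences, which is exponential in the trace length, and it takes the restriction r to be the identity; the function names and the example traces are assumptions.

```python
from itertools import combinations

def subsequences(trace):
    """All subsequences of one trace (exponential, toy sizes only)."""
    return {
        tuple(c)
        for n in range(len(trace) + 1)
        for c in combinations(trace, n)   # combinations keep the order
    }

def alpha_c(X, Y):
    """αc[X](Y) = X ∩ SubSequences(Y)."""
    subs = set().union(*(subsequences(t) for t in Y))
    return X & subs

def detects(malware_traces, program_traces):
    """Detector test with α = αc[S[M]]: every malware trace must survive."""
    X = malware_traces
    return alpha_c(X, X) <= alpha_c(X, program_traces)

S_M = {(1, 2, 3, 4)}
S_OBF = {(1, 7, 2, 3, 8, 4)}       # malware states interleaved with others
S_OTHER = {(2, 1, 3, 4)}           # same states, different order

hit = detects(S_M, S_OBF)
miss = detects(S_M, S_OTHER)
```

Since αc[X](X) = X, the test reduces to asking whether every malware trace is a subsequence of some program trace.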

slide-38
SLIDE 38

Classifying Common Obfuscations

  • Nop insertion
  • Register renaming
  • Junk insertion
  • Code reordering
  • Encryption
  • Reordering of independent statements
  • Reversing of branch conditions
  • Equivalent instruction substitution
  • Opaque predicate insertion

slide-39
SLIDE 39

Conservative Obfuscation Example

(Pseudo-)Code:

    mov eax, [edx+0Ch]
    push ebx
    push [eax]
    call ReleaseLock

Obfuscated code (junk + reordering):

    mov eax, [edx+0Ch]
    jmp +3
    push ebx
    dec eax
    jmp +4
    inc eax
    jmp -3
    call ReleaseLock
    jmp +2
    push [eax]
    jmp -2

slide-43
SLIDE 43

Non-Conservative

Approach 1: Find a canonical transformation

(Pseudo-)Code:

    mov eax, [edx+0Ch]
    push ebx
    push [eax]
    call ReleaseLock

Obfuscated code (renaming):

    mov edi, [eax+0Ch]
    push ecx
    push [edi]
    call ReleaseLock

slide-44
SLIDE 44

Non-Conservative

Approach 1: Find a canonical transformation

(Pseudo-)Code:

    mov R1, [R2+0Ch]
    push R3
    push [R1]
    call ReleaseLock

Obfuscated code (renaming):

    mov R1, [R2+0Ch]
    push R3
    push [R1]
    call ReleaseLock
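
One possible canonical transformation for register renaming is to rename registers in order of first appearance. The Python sketch below is an assumed implementation for illustration (the talk does not say how the canonical form is computed); the register list and the function name `canonicalize` are hypothetical.

```python
import re

# Assumed set of renameable registers for this toy example.
REGISTERS = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi)\b")

def canonicalize(code):
    """Rename registers to R1, R2, ... in order of first appearance."""
    mapping = {}

    def rename(match):
        reg = match.group(1)
        if reg not in mapping:
            mapping[reg] = f"R{len(mapping) + 1}"
        return mapping[reg]

    return [REGISTERS.sub(rename, line) for line in code]

ORIGINAL = ["mov eax, [edx+0Ch]", "push ebx", "push [eax]", "call ReleaseLock"]
RENAMED = ["mov edi, [eax+0Ch]", "push ecx", "push [edi]", "call ReleaseLock"]

canon_original = canonicalize(ORIGINAL)
canon_renamed = canonicalize(RENAMED)
```

Both listings map to the same canonical form shown on the slide, so a detector that compares canonical forms is unaffected by this renaming.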

slide-47
SLIDE 47

Non-Conservative

Program infection: M ↪ P if ∃ restriction r : S⟦M⟧ ⊆ αr(S⟦P⟧)

Approach 2: Further abstractions

  • Interesting Malware States: I ⊆ States⟦M⟧
    M ↪ P if ∃r : αI(S⟦M⟧) ⊆ αI(αr(S⟦P⟧))
  • Interesting Malware Traces: X ⊆ S⟦M⟧
    M ↪ P if ∃r : X ⊆ αr(S⟦P⟧)
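
The "interesting states" variant can be sketched as a filter on traces. In the Python below, αI is assumed to keep only the states that belong to I, the restriction r is taken to be the identity, and the state names are hypothetical placeholders; the slides do not define αI in detail.

```python
def alpha_I(traces, interesting):
    """Assumed model of αI: keep only the interesting states of each trace."""
    return {tuple(s for s in t if s in interesting) for t in traces}

def infected_on_states(malware_traces, program_traces, interesting):
    """αI(S[M]) ⊆ αI(S[P]), with the restriction r left out."""
    return alpha_I(malware_traces, interesting) <= alpha_I(program_traces, interesting)

# Hypothetical states: only "open" and "send" are considered interesting.
S_M = {("open", "read", "encrypt", "send")}
S_P = {("open", "scan", "pack", "send")}       # middle states replaced
INTERESTING = {"open", "send"}

abstract_hit = infected_on_states(S_M, S_P, INTERESTING)
concrete_hit = S_M <= S_P
```

The comparison on full traces fails because the middle states differ, while the comparison on interesting states succeeds; the choice of I therefore trades resilience to obfuscation against the risk of flagging unrelated programs.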

slide-48
SLIDE 48

Composition

Malware writers combine different obfuscations to avoid detection.

The property of being conservative is preserved by composition ⇒ abstraction αc still applies.

Under certain assumptions we can handle the composition of non-conservative obfuscations.

slide-50
SLIDE 50

Proving Soundness/Completeness of MD

  • Identifying the class of obfuscators to which a malware detector is resilient can be a complex and error-prone task.
  • Obfuscators and detectors can be expressed on execution traces.
  • A detector is resilient to an obfuscator if it can “abstract away” the obfuscator’s effect on the program.

Case study: Semantics-Aware Malware Detection Algorithm proposed by [Christodorescu et al. 2005].

  • Complete for code reordering
  • Complete for junk insertion
  • Complete for variable renaming

slide-51
SLIDE 51

Conclusions

Malware detection as abstraction of program semantics vs. obfuscation as transformation of program semantics.

We can now determine:

  • Whether a detector is resilient to a set of obfuscations
  • How complex a detector has to be to handle a given obfuscation

Open problems:

  • Can we handle some interesting classes of non-conservative obfuscations?
  • How does one design a semantic detector based on trace semantics?
  • Connecting cryptographic and program analysis views of obfuscation

slide-52
SLIDE 52

Thank you!
