Mining malware specifications through static reachability analysis

Introduction Mining specifications Detecting malware Results References

Mining malware specifications through static reachability analysis

Hugo Daniel Macedo1 Tayssir Touili2

1INRIA Rocqencourt 2LIAFA Univ. Paris 7

November 4, 2013


Motivation

Our goal: Malware detection! Why? Social impact!

  • Malware in the news!
  • We are all collateral damage!

Huge technological challenge!

  • 286 million new malware variants in 2010 ([Fossi et al.])

We need automation!


Existing anti-malware technology

Emulation based

  • Time limited
  • Behavior hiding

Signature matching based

  • Easy to avoid detection by syntactic manipulation!

Malware detection

More robust techniques

Solution

One needs to analyse the behavior not the syntax of the program without executing it! Model checking is a good candidate!


Model checking for malware detection

Program |= Malicious behavior. Which model for the program? Which specification formalism to describe the behaviors?


Previous approaches on model checking for malware detection

Use finite state models

  • (E.g. Kinder et al. [2010], Bonfante et al. [2008])
  • But the model fails to capture stack behavior!

Why is the stack important?

Malware writers use the stack to obfuscate their behavior.


Example of obfuscation

E.g. call obfuscation:

Direct call:

    l1: push m
    l2: push 0
    l3: call GetModuleFileName
    lr: ...

Obfuscated call (push the return address lr, then jump through the Import Address Table entry lg of GetModuleFileName):

    l1: push m
    l2: push 0
    l3: push lr
    l4: jmp lg
    lr: ...

Our solution is:

To use pushdown systems (PDS): a finite-state system plus a stack!


We use PDS (FSS + stack!)

Pushdown systems (PDS)

A PDS is a triple P = (P, Γ, ∆) where:

  • P is a finite set of control points,
  • Γ is a finite alphabet of stack symbols, and
  • ∆ ⊆ (P × Γ) × (P × Γ∗) is a finite set of transition rules.

Configurations

  • A configuration ⟨p, ω⟩ of P is an element of P × Γ∗
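To make the definition concrete, the one-step successor relation of a PDS can be sketched in Python. This is an illustration only, not the tool's implementation; the rule encoding and the tiny example system are hypothetical:

```python
from collections import deque

def successors(rules, config):
    """One-step successors of a configuration (p, stack); stack[0] is the top.

    rules maps (state, top symbol) to a set of (state', replacement word)
    pairs, i.e. the transition rules Delta of the PDS.
    """
    p, stack = config
    if not stack:
        return set()
    top, rest = stack[0], stack[1:]
    return {(q, word + rest) for (q, word) in rules.get((p, top), set())}

def reachable(rules, start, max_steps=1000):
    """Bounded breadth-first reachability (the configuration set may be infinite)."""
    seen, queue = {start}, deque([start])
    while queue and max_steps > 0:
        max_steps -= 1
        for nxt in successors(rules, queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical example: from (p, "a") we can either push a "b" or pop the "a".
rules = {
    ("p", "a"): {("p", "ba"), ("q", "")},
    ("p", "b"): {("p", "")},
}
print(reachable(rules, ("p", "a")))
```

A real static reachability analysis would not enumerate configurations like this: since the reachable set is infinite in general, it is computed symbolically (saturation of finite automata representing post∗). The bounded search above is only for intuition.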

PDS for malware detection

Since 2012, PDSs have been used to perform malware detection!

  • FM [Song and Touili, 2012b]
  • TACAS [Song and Touili, 2012a]

POMMADE tool (FSEN [Song and Touili, 2013])

  • Logic to specify malicious behaviors.
  • Few malicious behaviors (discovered manually!)

Our contribution in this work is to

Show how to automatically extract the malicious behaviors from a set of malware!


Model checking for malware detection

Program |= Malicious behavior. Model: PDS. Specification: ??


Example of an email worm behavior

Assembly fragment from Bagle malware

    l1: push m
    l2: push 0
    l3: call GetModuleFileName
    ...
    l4: push m
    l5: call CopyFile

Self-replication!


System call dependency trees (SCDT)

    l1: push m
    l2: push 0
    l3: call GetModuleFileName
    ...
    l4: push m
    l5: call CopyFile

The corresponding SCDT: GetModuleFileName --(2 ֌ 1)--> CopyFile. Self-replication! (The buffer m, the 2nd argument of GetModuleFileName, flows into the 1st argument of CopyFile.)
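As a concrete, purely illustrative representation, an SCDT can be modelled in Python as a tree of system-call nodes whose edges carry the parameter-flow label. The class and the `(2, 1)` encoding of 2 ֌ 1 below are our own, not the paper's:

```python
from dataclasses import dataclass, field

@dataclass
class SCDT:
    """A system call dependency tree node: a call name plus labelled children."""
    call: str
    # Children are (label, subtree) pairs; a label (i, j) means the value in
    # parameter i of this call flows into parameter j of the child call.
    children: list = field(default_factory=list)

# The Bagle behavior from the slide: the buffer m is the 2nd argument of
# GetModuleFileName and flows into the 1st argument of CopyFile.
self_replication = SCDT("GetModuleFileName", [((2, 1), SCDT("CopyFile"))])
print(self_replication.call)  # GetModuleFileName
```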


Model checking for malware detection

To summarize

Program |= Malicious behavior. Model: PDS. Specification: SCDT.


Roadmap

Introduction Mining specifications Detecting malware Results


How to automatically discover malicious SCDTs from programs?

Approach

Learning!

Given a:

  • set of already known malicious programs
  • set of already known benign programs

The goal is

To extract SCDTs and use statistical machinery to distinguish the malicious ones!



How to extract SCDTs from a program?

  • 1. Model binaries as pushdown systems (mimic program behaviors)
  • 2. Static reachability analysis (discover system calls)
  • 3. Extract behaviors (discover data flows encoded as trees)

Learning malicious trees

MalSCDT: malicious behavior trees

A malicious behavior tree is a tree that occurs frequently in malware extracted SCDTs!

To compute frequent “subtrees” we use gSpan!

We specialize the frequent subgraph algorithm presented in [Yan and Han, 2002] to the case of trees.
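A much simplified stand-in for this mining step can be sketched in Python. gSpan itself enumerates candidate subgraphs via canonical DFS codes; here we only count fully expanded rooted subtrees, and the corpus and support threshold are hypothetical:

```python
from collections import Counter

def subtrees(tree):
    """Yield every rooted subtree (a node together with all its descendants).

    Trees are nested (label, (child, ...)) tuples, so subtrees are hashable.
    """
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def frequent_subtrees(corpus, minsup):
    """Subtrees occurring in at least `minsup` of the corpus trees."""
    counts = Counter()
    for tree in corpus:
        counts.update(set(subtrees(tree)))  # count each corpus tree at most once
    return {t for t, c in counts.items() if c >= minsup}

# Hypothetical corpus of three tiny SCDTs (call names only, no edge labels):
t1 = ("GetModuleFileName", (("CopyFile", ()),))
t2 = ("GetModuleFileName", (("CopyFile", ()), ("ExitProcess", ())))
t3 = ("socket", (("connect", ()),))
print(frequent_subtrees([t1, t2, t3], minsup=2))  # {('CopyFile', ())}
```

Real frequent-subtree mining also finds partial subtrees (a node with only some of its children), which is why the specialized gSpan enumeration is needed rather than this naive count.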


Roadmap

Introduction Mining specifications Detecting malware Results


Model checking for malware detection

In summary we want to verify that:

Program |= Malicious behavior. Model: PDS. Specification: MalSCDT.


Recognizing MalSCDT

A problem!

The SCDT extracted from the program under test may contain extra nodes in between the nodes of the MalSCDT:

    MalSCDT:   GetModuleFileName --(2 ֌ 1)--> CopyFile
    Extracted: ... GetModuleFileName ... --(2 ֌ 1)--> ... CopyFile ...

Use automata with regexps!

GetModuleFileName( q∗ q1(0) q∗ q2֌1(CopyFile) q∗ ) → qfin
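The rule above handles this with a hedge automaton. Purely for intuition, the underlying question (does the mined tree embed in the extracted one, skipping over extra nodes?) can be sketched naively in Python, with trees as nested (label, children) tuples and all names illustrative:

```python
def _descendants(tree):
    """Yield a tree node and all nodes below it."""
    yield tree
    for child in tree[1]:
        yield from _descendants(child)

def embeds(pattern, tree):
    """Does `pattern` occur in `tree`, allowing extra nodes above and between?

    Simplification: a pattern child may match at any descendant of the
    matched tree node (edge labels are ignored here).
    """
    plabel, pchildren = pattern
    tlabel, tchildren = tree
    if plabel == tlabel and all(
        any(embeds(pc, d) for tc in tchildren for d in _descendants(tc))
        for pc in pchildren
    ):
        return True
    # Otherwise try to match the whole pattern deeper in the tree.
    return any(embeds(pattern, tc) for tc in tchildren)

mal = ("GetModuleFileName", (("CopyFile", ()),))
extracted = ("GetModuleFileName",
             (("GetTempPath", (("CopyFile", ()),)),))  # extra node in between
print(embeds(mal, extracted))  # True
```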


Teaching computers to detect malware

Build malicious behaviors database

  • 1. Build a hedge automaton A (recognizing MalSCDTs)

Malware detection

  • 1. Model binary as PDS (mimic program behavior)
  • 2. Static reachability analysis (discover system calls)
  • 3. Extract SCDT (discover data flows encoded as a tree)
  • 4. Check whether the SCDT belongs to A

Roadmap

Introduction Mining specifications Detecting malware Results


Results

  • Implemented the approach in a tool called PYRAMID
  • Learned MalSCDT from a set of malware
  • Tested them on another set of malware
  • Compared the results with traditional antivirus tools

Implementation

PYRAMID in learning mode

pyramidLearn pipeline: PE × MSDN → pommade → PDS × API → pyramidExtract → SCDT → gspan → MalSCDT → inferAut → hedge automaton (HELTA)


PYRAMID in detection mode

pyramidCheck pipeline: PE × MSDN → pommade → PDS × API → pyramidExtract → SCDT → pyramidMatch (against the HELTA automata) → yes/no


Experimental results

Learning experimental phase

From 193 malware files we obtained 1026 MalSCDT

Detection experimental phase

  • Detected 983 malware instances from 330 families (a test set 5× larger than the learning set)
  • Detection took 2.15 s on average
  • Correctly classified 250 benign programs as non-malware

Results comparison

Procedure

  • Submitted the “malware” files to 48 antivirus tools
  • Categorized the antivirus tools' performance into 4 classes

Outcome

  • 99% of the malware files were detected by the top 10% of the antivirus tools!
  • Our tool detects real malware!
  • On average, the antivirus tools detected only 80% of the files!
slide-41
SLIDE 41

Introduction Mining specifications Detecting malware Results References

Results comparison

Performance | #Antivirus | Detection range
very good   |  5         | 99.1% to 99.5%
good        | 19         | 95.0% to 99.1%
bad         | 19         | 40.0% to 95.0%
very bad    |  5         |  8.0% to 40.0%

Table: Performance categories

slide-42
SLIDE 42

Introduction Mining specifications Detecting malware Results References

Results comparison

[Bar chart: percentage of files classified as malicious / benign / no answer (y-axis 40 to 100%) by pyramid and by the very good, good, average, bad, and very bad antivirus categories.]

slide-43
SLIDE 43

Introduction Mining specifications Detecting malware Results References

Thank you for your attention!


Bibliography

Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Morphological detection of malware. In International Conference on Malicious and Unwanted Software, 2008. doi: 10.1109/MALWARE.2008.4690851.

M. Fossi, G. Egan, K. Haley, E. Johnson, T. Mack, T. Adams, J. Blackbird, M.K. Low, D. Mazurek, D. McKinney, et al. Symantec internet security threat report: trends for 2010.

Johannes Kinder, Stefan Katzenbeisser, Christian Schallhart, and Helmut Veith. Proactive detection of computer worms using model checking. IEEE Transactions on Dependable and Secure Computing, 2010.

Fu Song and Tayssir Touili. Pushdown model checking for malware detection. In TACAS, 2012a.

Fu Song and Tayssir Touili. Efficient malware detection using model-checking. In FM, 2012b.

Fu Song and Tayssir Touili. PoMMaDe: Pushdown model-checking for malware detection, 2013.

Xifeng Yan and Jiawei Han. gSpan: Graph-based substructure pattern mining. In ICDM, 2002.
