

SLIDE 1

Improving Malware Classification: Bridging the Static/Dynamic Gap

Vinit Singh 18th April 2017

Authors: Blake Anderson, Curtis Storlie, Terran Lane

CISC850 Cyber Analytics

SLIDE 2

INTRODUCTION

  • Why is there a need for machine learning in malware detection?
  • The need for different types of data sources, and how to combine them.
  • A unified framework: a support vector machine using multiple kernel learning.


SLIDE 3

DATA SOURCES

  • STATIC SOURCES: Binary, Disassembled Binary, Control Flow Graph (CFG)
  • DYNAMIC SOURCES: Dynamic Instruction Traces (DIT), Dynamic System Call Traces (DST)
  • MISCELLANEOUS FILE INFORMATION: entropy, packer detection, number of instructions in the file, number of vertices and edges in the CFG
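Entropy is one of the miscellaneous file features. A minimal sketch of Shannon byte entropy, often used to flag packed or encrypted files (the function name and interface are illustrative, not from the paper):

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; values near 8 suggest
    packed or encrypted content, values near 0 suggest uniform data."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A constant buffer gives entropy 0, while a buffer containing every byte value equally often gives the maximum of 8 bits per byte.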


SLIDE 4

METHOD

STEP 1: DATA REPRESENTATION

  • Markov chain representation for the raw binary, disassembled binary, DIT, and DST
  • Standard graph representation for the Control Flow Graph
  • The miscellaneous file information is represented as a simple feature vector of length seven
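The Markov chain representation above can be sketched as follows: each trace is turned into a matrix of transition probabilities between tokens. The function name and the toy trace are illustrative, not from the paper:

```python
from collections import defaultdict

def markov_transition_matrix(trace):
    """Build a Markov chain representation of a trace.

    `trace` is a sequence of tokens (e.g. instruction mnemonics from a
    DIT, or system-call names from a DST). Returns a dict-of-dicts where
    probs[a][b] is the probability of observing token b right after a.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: c / total for dst, c in dsts.items()}
    return probs

trace = ["mov", "add", "mov", "jmp", "mov", "add"]
probs = markov_transition_matrix(trace)
```

Here "mov" is followed by "add" twice and by "jmp" once, so probs["mov"] assigns them 2/3 and 1/3; the flattened transition probabilities then serve as the feature vector fed to a kernel.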


SLIDE 5

STEP 2: KERNELS

  • The kernel trick
  • Gaussian (squared-exponential) kernel: K(x, x') = exp(-λ Σ_i (x_i - x'_i)^2), where the x_i are the file-information features or the transition probabilities of the Markov chain
  • Graphlet kernel: defined on a graph G over subgraphs (graphlets) of k nodes; f_G is the feature vector counting how many times each unique subgraph of size k occurs in G, and D_G is the normalized probability vector obtained by dividing f_G by the total number of graphlets of size k
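A minimal sketch of the Gaussian kernel between two such feature vectors; the function name and default bandwidth `lam` are illustrative, not from the paper:

```python
import math

def gaussian_kernel(x, y, lam=1.0):
    """Squared-exponential kernel K(x, y) = exp(-lam * sum_i (x_i - y_i)^2).

    x and y are feature vectors (e.g. flattened Markov transition
    probabilities); lam is the bandwidth hyperparameter.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-lam * sq_dist)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart.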

SLIDE 6

Heatmaps for Individual Kernels

SLIDE 7

STEP 3: MULTIPLE KERNEL LEARNING

  • Optimization problem for classical kernel learning (the SVM dual):
    max_α Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)
    subject to the constraints 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
    Thus the decision function is: f(x) = sign(Σ_i α_i y_i K(x_i, x) + b)
  • For multiple kernel learning the kernel is a weighted combination K = Σ_k β_k K_k, so the weights β_k must also be estimated
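The combination step can be sketched as a convex mix of precomputed Gram matrices under the usual MKL constraints (β_k ≥ 0, Σ_k β_k = 1); the function name and list-of-lists matrix format are illustrative:

```python
def combine_kernels(kernel_mats, betas):
    """Convex combination K = sum_k beta_k * K_k of precomputed Gram matrices.

    betas are normalized so that beta_k >= 0 and sum(beta_k) = 1, the
    standard multiple-kernel-learning constraint. Matrices are plain
    lists of lists, all of the same size.
    """
    total = sum(betas)
    weights = [b / total for b in betas]
    n = len(kernel_mats[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, K in zip(weights, kernel_mats):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * K[i][j]
    return combined
```

In practice the β_k themselves are learned jointly with the SVM; this sketch only shows how a fixed set of weights produces the single combined kernel the classifier sees.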
SLIDE 8

Heatmap of Combined Kernel

SLIDE 9

RESULTS

  • Criterion 1: Accuracy

Accuracy is calculated using 10-fold cross-validation.
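A 10-fold split can be sketched as follows; the function name and seeding are illustrative, not from the paper:

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each of the n samples appears in exactly one test fold; accuracy is
    averaged over the k train/test rounds.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```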

SLIDE 10
  • Criterion 2: ROC curves / AUC values
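AUC can be computed directly as the probability that a random positive instance outscores a random negative one (the Mann-Whitney statistic); the function name and label convention here are illustrative:

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a randomly chosen positive instance
    is scored above a randomly chosen negative one.

    labels are 1 for malicious, 0 for benign; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Perfect ranking gives AUC 1.0; a classifier no better than chance gives about 0.5.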
SLIDE 11
  • Criterion 3: Speed of classifying new instances
SLIDE 12
  • Criterion 4: Testing on a large malware sample

Accuracy on a validation set of 20,000 samples

SLIDE 13

OBSERVATIONS

  • A total of 19 false positives and false negatives were found out of the 1,556 instances in the original dataset.
  • Using only static analysis does not work well when the training instances have been packed.

SLIDE 14

LIMITATIONS AND DRAWBACKS

  • Selecting an appropriate value of n for n-gram analysis
  • Collecting dynamic system traces is too resource-intensive on a normal system
  • Choosing optimal instruction call categories
  • Intel Pin is not transparent while tracing a program to collect instructions

SLIDE 15

RELATED WORK

  • Use of single data sources
  • Use of static data sources combined with ensemble learning

  • Result Fusion Model
  • Identifying packed and hidden code
SLIDE 16

CONCLUSION

  • Not restricting malware classification to a single data source improves classification accuracy.
  • In a resource-constrained environment, combined static analysis can achieve high accuracy and a low number of false positives.
  • Static analysis is not an optimal solution when instances have been packed or have high entropy.