When Malware is Packin' Heat; Limits of Machine Learning Classifiers - PowerPoint PPT Presentation



slide-1
SLIDE 1

When Malware is Packin’ Heat;

Limits of Machine Learning Classifiers Based on Static Analysis Features

Hojjat Aghakhani, Fabio Gritti, Francesco Mecca, Martina Lindorfer, Stefano Ortolani, Davide Balzarotti, Giovanni Vigna, Christopher Kruegel

slide-2
SLIDE 2

Packing

2

[Diagram: the packing process transforms the original file (PE header, .text, .data, .rsrc, .rdata, …) into a packed file (PE header, packed section(s), decompression stub)]

slide-3
SLIDE 3

Packing

3

[Diagram: at runtime, the unpacking routine of the packed file (PE header, packed section(s), decompression stub) restores the original program (PE header, .text, .data, .rsrc, .rdata, …) in memory]

slide-4
SLIDE 4

Packing Employed By Malware Authors

4


slide-10
SLIDE 10

Packing Evolution

  • Most packers are not this simple anymore...
  • Different methods of obfuscation or encryption are being used
  • Packing happens at multiple layers
  • Unpacking routines are not necessarily executed in a straight line
  • Only a single fragment of the original code at any given time
  • Usually anti-debugging or anti-reverse-engineering techniques are employed

10
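Because packers compress or encrypt the payload (the obfuscation and encryption mentioned above), packed sections tend to look like random bytes. A minimal sketch of the Shannon-entropy heuristic commonly used to flag packed sections — hypothetical helper code, not part of the talk:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; ~8.0 for compressed/encrypted data."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Plain text has low entropy; a uniform byte distribution hits the 8.0 maximum.
text = b"the quick brown fox jumps over the lazy dog " * 100
uniform = bytes(range(256)) * 16

print(shannon_entropy(text))     # low (below 5 bits per byte)
print(shannon_entropy(uniform))  # 8.0
```

Tools often treat sections above roughly 7 bits per byte as likely packed, though packers can deliberately lower entropy to dodge this check.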


slide-13
SLIDE 13

Why Does Packing Matter?

  • It hampers the analysis of the code
  • Makes malware classification more challenging!
  • Especially, when using only static analysis

13

slide-14
SLIDE 14

Malware Classification Using Static Analysis

16

[Diagram: anti-malware companies combine static analysis + machine learning with dynamic analysis]

slide-15
SLIDE 15
Malware Classification Using Static Analysis

  • What happens if the program is packed, i.e., the features are obfuscated?

17

[Diagram: anti-malware companies combine static analysis + machine learning with dynamic analysis]

slide-16
SLIDE 16

Do Benign Software Programs Use Packing?

18

[Diagram: programs split along two axes — packed vs. not packed, malicious vs. benign]

slide-17
SLIDE 17

Packing Is Common in Benign Programs

19

slide-18
SLIDE 18

Packing Is Common in Benign Programs

  • Rahbarinia et al. [84], who studied 3 million web-based software downloads over 7 months in 2014, found that both malicious and benign files use known packers (58% and 54%, respectively)

20

  • B. Rahbarinia, M. Balduzzi, and R. Perdisci, “Exploring the Long Tail of (Malicious) Software Downloads,” in Proc. of the International Conference on Dependable Systems and Networks (DSN), 2017.

slide-19
SLIDE 19

21

“Packing == Malicious”

slide-20
SLIDE 20

“Packing == Malicious” on VirusTotal?

22

[Experiment: take 613 Windows 10 binaries located in C:\Windows\System32, pack them with Themida, and submit them to VirusTotal]

slide-21
SLIDE 21

Dataset Pollution

24

slide-22
SLIDE 22

Does static analysis on packed binaries provide rich enough features to a malware classifier?

25


slide-24
SLIDE 24

Datasets

  • 1. Wild Dataset (50,724 executables):
  • 4,396 unpacked benign
  • 12,647 packed benign
  • 33,681 packed malicious
  • 2. Lab Dataset:

27

[Diagram: samples from the Wild Dataset packed with 9 packers (including Themida, PECompact, UPX, …), yielding 91,987 benign and 198,734 malicious samples]

slide-25
SLIDE 25

Nine Feature Categories

28

Category          # Features
PE headers        28
PE sections       570
DLL imports       4,305
API imports       19,168
Rich Header       66
Byte n-grams      13,000
Opcode n-grams    2,500
Strings           16,900
File generic      2
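To make one of these categories concrete, here is a sketch of how byte n-gram features can be extracted from a binary. This is hypothetical illustration code, not the authors' feature pipeline:

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping n-byte sequences; the most frequent n-grams
    across the corpus typically become the feature columns."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

# A toy buffer starting with the PE 'MZ' magic bytes.
sample = b"\x4d\x5a\x90\x00\x4d\x5a"
grams = byte_ngrams(sample, n=2)
print(grams[b"\x4d\x5a"])  # 2 — the 'MZ' bigram occurs twice
```

In practice the counts are computed over many samples and only the top-k n-grams are kept (13,000 byte n-grams in the table above), since the full 2-gram space already has 65,536 dimensions.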

slide-26
SLIDE 26

Our Research Questions

  • 1. Do packers preserve static analysis features that are useful for malware classification?

29


slide-28
SLIDE 28

Experiment “Different Packed Ratios (lab)”

  • 1. We exclude packed benign samples from the training set.
  • 2. Then, we keep adding more packed benign samples to the training set.

  • Surprisingly, the classifier is doing OK!

34
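The setup above can be sketched as building a sequence of training sets with an increasing fraction of packed benign samples while the set size stays fixed. All names here are hypothetical, not the authors' code:

```python
import random

def make_training_set(unpacked_benign, packed_benign, malicious,
                      packed_ratio, seed=0):
    """Build a training set whose benign half contains `packed_ratio`
    packed samples, the rest unpacked; the malicious half is unchanged."""
    rng = random.Random(seed)
    n_benign = len(unpacked_benign)
    n_packed = int(packed_ratio * n_benign)
    benign = (rng.sample(packed_benign, n_packed) +
              rng.sample(unpacked_benign, n_benign - n_packed))
    return benign + list(malicious)

# Toy corpus: 100 unpacked benign, 200 packed benign, 100 malicious samples.
train = make_training_set(list(range(100)), list(range(100, 300)),
                          list(range(300, 400)), packed_ratio=0.5)
print(len(train))  # 200 — 100 benign (half packed) + 100 malicious
```

Sweeping `packed_ratio` from 0 upward reproduces the experiment's shape: the ratio of packed benign samples changes while everything else is held constant.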

slide-29
SLIDE 29

But, How??

  • We focused on one packer at a time to identify useful features for each packer!

  • 1. Some packers (e.g., Themida) often keep the Rich Header.
  • 2. Packers often keep .CAB file headers in the resource sections of the executables.
  • 3. UPX keeps one API for each DLL.

35

slide-30
SLIDE 30

Our Research Questions

  • 1. Do packers preserve static analysis features that are useful for malware classification?

  • 2. Can a classifier that is carefully trained and not biased towards specific packing routines perform well in real-world scenarios?

36

Packers preserve some information when packing programs that may be “useful” for malware classification; however, such information does not necessarily represent the real nature of the samples.


slide-32
SLIDE 32

Our Research Questions

  • 1. Do packers preserve static analysis features that are useful for malware classification?

  • 2. Can such a classifier perform well in real-world scenarios?

38

  • Generalization to unseen packers
  • Adversarial examples

slide-33
SLIDE 33

Generalization To Unseen Packers

  • Runtime packers are evolving, and malware authors often tend to use their own custom packers

39

slide-34
SLIDE 34

Generalization To Unseen Packers

  • 1. Experiment “withheld packer”

42

[Diagram: “withheld packer” split — eight of the nine packers (Themida, tElock, UPX, kkrunchy, Obsidium, Petite, MPRESS, PELock, PECompact) in the training set, the withheld packer only in the test set]

slide-35
SLIDE 35

Generalization To Unseen Packers

  • 1. Experiment “withheld packer”

43

Withheld Packer    FPR (%)    FNR (%)
PELock             7.30       3.74
PECompact          47.49      2.14
Obsidium           17.42      3.32
Petite             5.16       4.47
tElock             43.65      2.02
Themida            6.21       3.29
MPRESS             5.43       4.53
kkrunchy           83.06      2.50
UPX                11.21      4.34
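The “withheld packer” experiment is a leave-one-group-out split, grouping by packer. A sketch of the split logic in plain Python — hypothetical field names, not the authors' code:

```python
PACKERS = ["Themida", "tElock", "UPX", "kkrunchy", "Obsidium",
           "Petite", "MPRESS", "PELock", "PECompact"]

def withheld_packer_split(samples, withheld):
    """Train on samples packed with every packer except `withheld`;
    test only on samples packed with `withheld`."""
    train = [s for s in samples if s["packer"] != withheld]
    test = [s for s in samples if s["packer"] == withheld]
    return train, test

# Toy corpus: three samples per packer, alternating labels.
samples = [{"packer": p, "label": i % 2} for i, p in enumerate(PACKERS * 3)]
train, test = withheld_packer_split(samples, "UPX")
print(len(train), len(test))  # 24 3
```

Running this once per packer yields one (FPR, FNR) row per withheld packer, matching the shape of the table above.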


slide-37
SLIDE 37

Generalization To Unseen Packers

  • 2. Experiment “lab against wild”
  • We train the classifier on the Lab Dataset
  • And evaluate it on packed executables in the Wild Dataset
  • We observed a false negative rate of 41.84% and a false positive rate of 7.27%

45

slide-38
SLIDE 38

Poor Generalization To Unseen Packers

46


slide-41
SLIDE 41

Adversarial Examples

  • Machine-learning-based malware detectors have been shown to be vulnerable to adversarial samples

  • Packing produces features not directly deriving from the actual (unpacked) program

  • Generating such adversarial samples would be easier for an adversary

49

slide-42
SLIDE 42

Adversarial Examples

50

[Diagram, training: a random forest model is trained on packed malicious, packed benign, and unpacked benign samples]

slide-43
SLIDE 43

Adversarial Examples

51

[Diagram, evasion: the trained model predicts a packed malicious test sample as malicious]

slide-44
SLIDE 44

Adversarial Examples

52

[Diagram, evasion: after benign strings are appended to the packed malicious test sample, the trained model predicts it as benign]


slide-46
SLIDE 46

Machine Learning Static Evasion Competition

54

[Diagram: appending benign strings to 150 malicious samples achieved 50% evasion]

  • Recently, a group of researchers found a very similar way to subvert an AI-based anti-malware engine

  • By simply taking strings from an online gaming program and appending them to known malware, like WannaCry
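This trick works because bytes appended after a PE's declared sections (the “overlay”) are ignored by the Windows loader but still visible to a static feature extractor. A sketch of the idea — hypothetical code, not the competition entries:

```python
def append_overlay(pe_bytes: bytes, strings: list) -> bytes:
    """Append NUL-separated benign-looking strings as overlay data.

    The program's mapped sections and behavior are untouched, but
    string-based static features now include the appended content.
    """
    return pe_bytes + b"\x00" + b"\x00".join(strings) + b"\x00"

# Toy stand-in for a packed malicious binary (not a real PE).
payload = b"MZ...original packed malware bytes..."
evasive = append_overlay(payload, [b"Microsoft Word", b"Copyright (c) 2019"])

assert evasive.startswith(payload)  # original bytes are untouched
print(len(evasive) - len(payload))  # 35 — size of the appended overlay
```

Checksums and signatures change, of course; the point is that string and byte-distribution features shift toward “benign” while the executable code does not change at all.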

slide-47
SLIDE 47

Vulnerable To Trivial Adversarial Examples

56

slide-48
SLIDE 48

Conclusion

57

[Diagram: to obtain an unbiased model, the training set must include packed malicious, packed benign, and unpacked benign samples]

slide-49
SLIDE 49

Conclusion

58

[Diagram: features preserved by packers — Rich Header, .CAB headers, API imports, manifest strings]

slide-50
SLIDE 50

Reproducibility

  • The source code and our datasets of 392,168 executables are available at https://github.com/ucsb-seclab/packware

  • All experiments can be simply executed in the provided Docker image

59

slide-51
SLIDE 51

Any Questions?

60

slide-52
SLIDE 52

Experiment “Good-Bad Packers”

61

[Diagram: “good-bad packers” training set — benign samples packed with one subset of the nine packers (Themida, tElock, UPX, kkrunchy), malicious samples with the rest (Obsidium, Petite, MPRESS, PELock, PECompact)]

slide-53
SLIDE 53

Experiment “Good-Bad Packers”

62

[Diagram: the test set swaps the packer assignment — benign samples packed with the packers used for malicious training samples, and vice versa]

Accuracy varied from 0.01% to 12.57% across all splits