When Malware is Packin’ Heat;
Limits of Machine Learning Classifiers Based on Static Analysis Features
Hojjat Aghakhani, Fabio Gritti, Francesco Mecca, Martina Lindorfer, Stefano Ortolani, Davide Balzarotti, Giovanni Vigna, Christopher Kruegel
[Figure: the packing process. Original file (PE header; .text, .data, .rsrc, .rdata, … sections) → packing process → packed file (PE header, packed section(s), decompression stub). At run time, the unpacking routine restores the original program (PE header and sections) in memory.]
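The packing and unpacking steps in the figure can be illustrated with a toy sketch. It assumes plain zlib compression stands in for the packer, and the names `pack` and `unpack` are hypothetical. A real packer rewrites the PE file and embeds a native decompression stub, which this sketch does not attempt.

```python
import zlib

def pack(original: bytes) -> dict:
    """Toy 'packer': compress the original bytes and record what the stub needs."""
    return {
        "packed_section": zlib.compress(original, 9),
        "original_size": len(original),  # lets the stub sanity-check its output
    }

def unpack(packed: dict) -> bytes:
    """Toy 'unpacking routine': restore the original bytes in memory."""
    restored = zlib.decompress(packed["packed_section"])
    if len(restored) != packed["original_size"]:
        raise ValueError("unpacked size does not match the recorded size")
    return restored

# Stand-in for the original program's sections (made-up bytes, not a real PE file).
original = b"\x55\x8b\xec" * 1000 + b"CreateFileA\x00kernel32.dll\x00"
packed = pack(original)
```

A static analyzer only sees the packed section and the stub, while the original bytes reappear only after `unpack` runs.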
Anti-Malware Companies: Static Analysis + Machine Learning; Dynamic Analysis
[Figure: Packed / Not Packed versus Malicious (Yes / No).]
downloads over 7 months in 2014, found that both malicious and benign files use known packers (58% and 54%, respectively)
in Proc. of the International Conference on Dependable Systems and Networks (DSN), 2017.
613 Windows 10 binaries located in C:\Windows\System32 → pack with Themida → submit to VirusTotal (VT)
Wild dataset → pack with 9 packers (including Themida, PECompact, UPX, …) → 91,987 benign samples and 198,734 malicious samples
Category          # Features
PE headers        28
PE sections       570
DLL imports       4,305
API imports       19,168
Rich Header       66
Byte n-grams      13,000
Opcode n-grams    2,500
Strings           16,900
File generic      2
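As a rough illustration of two of the feature categories in the table (strings and byte n-grams), the sketch below extracts them from raw bytes using only the standard library. The helper names, the minimum string length of 4, and the sample bytes are assumptions made for the example, not the feature extractor used in the study.

```python
import re
from collections import Counter
from typing import List

def printable_strings(data: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in the data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

# Made-up bytes standing in for a file's contents (not a valid executable).
sample = b"MZ\x90\x00\x03\x00kernel32.dll\x00\x01\x02LoadLibraryA\x00ab\x00"
strings = printable_strings(sample)
bigrams = byte_ngrams(sample, 2)
```

In a classifier pipeline, counts like these would typically be mapped to a fixed-length vector, for example by keeping only the most frequent n-grams seen in the training set.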
malware classification?
each packer!
executables.
specific packing routines perform well in real-world scenarios?
Generalization to unseen packers
Adversarial examples
their own custom packers
[Figure: the nine packers (Themida, tElock, UPX, kkrunchy, Obsidium, Petite, MPRESS, PELock, PECompact) divided between the training set and the test set.]
Withheld Packer   FPR (%)   FNR (%)
PELock            7.30      3.74
PECompact         47.49     2.14
Obsidium          17.42     3.32
Petite            5.16      4.47
tElock            43.65     2.02
Themida           6.21      3.29
MPRESS            5.43      4.53
kkrunchy          83.06     2.50
UPX               11.21     4.34
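The withheld-packer evaluation behind a table like this can be sketched as a leave-one-packer-out split plus the two error rates. This is a minimal sketch under stated assumptions: samples are reduced to hypothetical (packer, label) pairs, the classifier itself is omitted, and the data below is made up rather than taken from the study.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[str, int]  # (packer name, label): 1 = malicious, 0 = benign

def leave_one_packer_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (withheld packer, training samples, test samples) for each packer."""
    for withheld in sorted({packer for packer, _ in samples}):
        train = [s for s in samples if s[0] != withheld]
        test = [s for s in samples if s[0] == withheld]
        yield withheld, train, test

def fpr_fnr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) as percentages."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = 100.0 * fp / negatives if negatives else 0.0
    fnr = 100.0 * fn / positives if positives else 0.0
    return fpr, fnr

# Made-up samples: one benign and one malicious file per packer.
samples = [("UPX", 0), ("UPX", 1), ("Themida", 0),
           ("Themida", 1), ("kkrunchy", 0), ("kkrunchy", 1)]
splits = list(leave_one_packer_out(samples))

# Made-up predictions: one of four benign files flagged, one of two malware missed.
rates = fpr_fnr([0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 1, 0])
```

A high FPR for a withheld packer means benign files packed with it are flagged as malicious, which is the pattern the kkrunchy, PECompact, and tElock rows show.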
to adversarial samples
(unpacked) program
[Figure: adversarial evasion experiment. Training: a training set of packed malicious, packed benign, and unpacked benign samples is used to train a random forest (RF) model. Evasion: a test set of packed malicious samples with benign strings; prediction shown as Malicious, then Benign.]
150 malicious samples + benign strings → 50% evasion!
an AI-based anti-malware engine
appending them to known malware, like WannaCry
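The string-appending idea can be sketched as follows. Everything here is an assumption made for illustration: `append_strings` simply adds null-terminated strings to the end of the file, and `toy_classify` is a contrived stand-in that is trivially fooled by design. It is not the model or the engine discussed in the slides, and its output says nothing about real evasion rates.

```python
from typing import Callable, List

def append_strings(sample: bytes, strings: List[bytes]) -> bytes:
    """Append null-terminated strings after the original bytes, which stay untouched."""
    return sample + b"".join(s + b"\x00" for s in strings)

def evasion_rate(
    classify: Callable[[bytes], int],
    malicious_samples: List[bytes],
    benign_strings: List[bytes],
) -> float:
    """Percentage of initially detected samples that are labeled benign after the change."""
    detected = [s for s in malicious_samples if classify(s) == 1]
    if not detected:
        return 0.0
    evaded = sum(
        1 for s in detected if classify(append_strings(s, benign_strings)) == 0
    )
    return 100.0 * evaded / len(detected)

# Hypothetical markers that the toy classifier associates with benign files.
BENIGN_MARKERS = [b"Microsoft Corporation", b"All rights reserved"]

def toy_classify(data: bytes) -> int:
    """Contrived classifier: 0 (benign) if both markers are present, else 1 (malicious)."""
    hits = sum(1 for marker in BENIGN_MARKERS if marker in data)
    return 0 if hits >= 2 else 1

# Made-up byte strings standing in for malicious samples.
malware = [b"MZ\x90payload-one", b"MZ\x90payload-two"]
rate = evasion_rate(toy_classify, malware, BENIGN_MARKERS)
```

Appended bytes do not change what the program does when it runs, which is why a classifier that leans on string features can be misled this way.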
Training set: packed malicious, packed benign, unpacked benign → not biased model
Rich Header, .CAB Headers, API Imports, Manifest, Strings
available at https://github.com/ucsb-seclab/packware
[Figure: the nine packers split into two groups (Obsidium, kkrunchy, tElock, PELock, Themida; and UPX, PECompact, Petite, MPRESS) and assigned to the Benign and Malicious classes; the assignment is swapped between the training set and the test set.]