PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime
RAID, 2009
M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, Muddassar Farooq
Next Generation Intelligent Networks Research Center (nexGIN RC), Pakistan
Outline
Introduction to Domain
Problem Definition
Literature Survey
Proposed Solution
Evaluation
Results and Discussion
Conclusion
Introduction
Computer malware is a widespread problem…
Backdoor, Virus, Worm, Trojan, etc.
A number of commercial anti-virus software products are available
Financial losses…
[Figure: number of new threats and total threats (in thousands) per half-year, Jan-Jun 2002 to Jan-Jun 2007]
[Figure: estimated damage in billions of US dollars: 1999: 12.1, 2000: 17.1, 2001: 13.2, 2002: 25, 2003: 55]
Need for non-signature-based AV?
Problems with signature matching…
Size of the signature database cannot scale
Evaded by simple code obfuscation techniques
[Table: Norton AV, Command AV, and McAfee AV against Chernobyl-1.4, F0sf0r0, Hare, and Z0mbie-6.b; cell values not recovered]
How good are non-signature-based solutions?
Problems with existing non-signature-based solutions…
High false alarm rate
Large scanning overheads
Usually leverage:
  Statistical analysis of machine-level byte content
  Disassembled code
  Run-time API calls
Problem Definition
Non-signature-based detector
Keep run-time complexity low
“Content-independent” features
Low false alarm rate
PE-Miner Framework
Uses novel structural features to efficiently detect malicious PE files
Strict requirements of the system:
Must be a pure non-signature-based framework with the ability to detect zero-day malicious PE files.
Must be realtime deployable, i.e. achieve a true positive (TP) rate above 99% and a false positive (FP) rate below 1%.
The design must be modular to allow a plug-n-play design philosophy.
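The accuracy requirement above can be checked from a confusion matrix. This is a minimal sketch; the function names and default thresholds are illustrative assumptions that simply restate the 99% / 1% targets, and they are not code from the paper.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (TP rate, FP rate) from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)  # fraction of malicious files that are detected
    fp_rate = fp / (fp + tn)  # fraction of benign files that are wrongly flagged
    return tp_rate, fp_rate


def meets_requirement(tp, fn, fp, tn, min_tp_rate=0.99, max_fp_rate=0.01):
    """Check the stated targets: TP rate above 99% and FP rate below 1%."""
    tp_rate, fp_rate = detection_rates(tp, fn, fp, tn)
    return tp_rate > min_tp_rate and fp_rate < max_fp_rate
```

With made-up counts, `meets_requirement(tp=995, fn=5, fp=4, tn=996)` holds (99.5% TP rate, 0.4% FP rate), while lowering the detected count to 980 of 1000 does not.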
PE-Miner Framework
A threefold research methodology in our static analysis:
1. Identify a set of structural features for PE files that is computable in realtime,
2. use an efficient preprocessor to remove redundancy in the feature set, and
3. select an efficient data mining algorithm for final classification.
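The three steps above, together with the plug-n-play requirement, can be pictured as three swappable stages. The sketch below is only an illustration of that modular shape; `make_detector` and the three toy stages are hypothetical stand-ins, not the real feature extractor, preprocessor, or classifier.

```python
from typing import Callable, Dict

FeatureVector = Dict[str, float]


def make_detector(
    extract: Callable[[bytes], FeatureVector],
    preprocess: Callable[[FeatureVector], FeatureVector],
    classify: Callable[[FeatureVector], str],
) -> Callable[[bytes], str]:
    """Chain the three stages; any stage can be replaced independently."""
    def detect(file_bytes: bytes) -> str:
        return classify(preprocess(extract(file_bytes)))
    return detect


# Toy stages (assumptions for demonstration only).
def toy_extract(file_bytes: bytes) -> FeatureVector:
    return {"file_size": float(len(file_bytes)), "always_one": 1.0}


def toy_preprocess(features: FeatureVector) -> FeatureVector:
    # Drop a feature that is known to carry no information.
    return {k: v for k, v in features.items() if k != "always_one"}


def toy_classify(features: FeatureVector) -> str:
    return "malicious" if features["file_size"] > 4 else "benign"
```

Swapping the classifier, for instance, only means passing a different `classify` callable to `make_detector`.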
Proposed Architecture
Which PE format features to select?
Structural features from the Windows PE file format
189 features selected
For example, malicious executables usually have:
  bigger import tables,
  smaller resource tables,
  no exception tables
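The table sizes mentioned above are stored in the data directories of the PE optional header, so they can be read without inspecting file content. The following is a minimal sketch using only the standard library; the function names are illustrative, this is not the authors' extractor, and `build_synthetic_pe32_header` merely fabricates a header so the example runs without a real executable.

```python
import struct

# First four data directories, in the order defined by the PE format.
DIRECTORY_NAMES = ["export", "import", "resource", "exception"]


def data_directory_sizes(pe_bytes):
    """Return the sizes of a few data directories plus the section count."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)  # e_lfanew
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (number_of_sections,) = struct.unpack_from("<H", pe_bytes, coff + 2)
    optional = coff + 20  # the COFF file header is 20 bytes long
    (magic,) = struct.unpack_from("<H", pe_bytes, optional)
    if magic == 0x10B:    # PE32
        directories = optional + 96
    elif magic == 0x20B:  # PE32+
        directories = optional + 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes is the 4-byte field just before the directories.
    (count,) = struct.unpack_from("<I", pe_bytes, directories - 4)
    features = {"number_of_sections": number_of_sections}
    for index, name in enumerate(DIRECTORY_NAMES):
        size = 0
        if index < count:
            _rva, size = struct.unpack_from("<II", pe_bytes, directories + 8 * index)
        features[name + "_table_size"] = size
    return features


def build_synthetic_pe32_header(directory_sizes, number_of_sections=3):
    """Build a fake PE32 header (demonstration only, not a runnable file)."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x014C, number_of_sections, 0, 0, 0, 224, 0x0102)
    optional = bytearray(224)
    struct.pack_into("<H", optional, 0, 0x10B)
    struct.pack_into("<I", optional, 92, 16)
    for index, size in enumerate(directory_sizes):
        struct.pack_into("<II", optional, 96 + 8 * index, 0x1000 * (index + 1), size)
    return bytes(dos) + b"PE\0\0" + coff + bytes(optional)
```

For a real file, the bytes would come from reading the executable in binary mode; only the headers are needed, which is what keeps this kind of feature cheap to compute.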
Which PE format features to select?
Feature means by category (full table in the paper):

Category                Import Table Size   Resource Table Size   Exception Table
Benign                  5.8                 32.6                  12.0
Backdoor + Sniffer      19.2                5.5                   0.0
Constructor + Virtool   6.1                 1.5                   0.0
DoS + Nuker             7.9                 1.4                   0.0
Flooder                 20.8                6.2                   0.0
Exploit + Hacktool      7.1                 1.0                   0.0
Worm                    23.4                2.6                   0.0
Trojan                  10.3                2.2                   0.0
Virus                   6.2                 0.5                   0.0
Malfease                4.7                 5.9                   3.5
Which preprocessor should be used?
Why preprocessing?
Out of 189 features, some might not convey useful information!
Either remove or combine such features
To reduce the dimensionality of the input feature space
Reduces training and testing times of classifiers
Three preprocessing algorithms used
RFR is selected due to its high detection accuracy; it is also realtime deployable
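Removing features that convey no useful information can be sketched in its simplest form as dropping every column that never varies across the training set. This is an assumed, simplified illustration of the idea; it is not claimed to be the exact RFR procedure, and `drop_constant_features` is a hypothetical name.

```python
def drop_constant_features(rows, names):
    """Drop feature columns whose value is identical in every row.

    rows  -- list of equal-length feature vectors (one per file)
    names -- feature names, aligned with the columns of rows
    """
    keep = [
        column
        for column in range(len(names))
        if len({row[column] for row in rows}) > 1  # more than one distinct value
    ]
    reduced_rows = [[row[column] for column in keep] for row in rows]
    reduced_names = [names[column] for column in keep]
    return reduced_rows, reduced_names
```

A column that is constant cannot help any classifier separate benign from malicious files, so removing it shrinks the input space at no cost in accuracy.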
Which classification algorithm?
J48 is selected due to its highest detection accuracy and low computational complexity; the timing analysis shows that it is also realtime deployable
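J48 is Weka's implementation of the C4.5 decision tree, which picks splits by gain ratio. The sketch below computes a simplified gain ratio for one threshold on one numeric feature; it omits the extra heuristics of real C4.5, and the function names and the toy values in the usage note are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs value > threshold."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    information_gain = entropy(labels) - weighted
    # Split information penalizes very unbalanced partitions.
    split_information = entropy(["left"] * len(left) + ["right"] * len(right))
    return information_gain / split_information
```

With toy values `[1, 2, 3, 4]` labeled benign, benign, malicious, malicious, the threshold 2 separates the classes perfectly and scores higher than the threshold 1.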
Evaluation
The proposed framework is evaluated on two well-known malware collections.
Evaluation datasets
VX Heavens virus collection: 10 thousand labeled malware samples
Malfease malware collection: 5 thousand malware samples
Learning to Detect and Classify Malicious Executables in the Wild
Journal of Machine Learning Research, MIT Press, 2006. (ISI Impact Factor: 2.682) @ Stanford University, Georgetown University, USA
Learning to Detect and Classify Malicious Executables in the Wild
[Figure: Overview of KM. Blocks: executable file, n-gram analysis, feature extraction, classification algorithm, benign n-gram, malicious n-gram, result]
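The n-gram analysis step in this overview works on raw byte content. A minimal sketch of extracting the set of distinct byte n-grams from a file follows; the function name and the default n = 4 are illustrative assumptions.

```python
def byte_ngrams(data, n=4):
    """Return the set of distinct n-byte substrings (n-grams) in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}
```

Each distinct n-gram can then serve as a boolean feature (present or absent in a file). Because every byte of the file must be read, this is a content-dependent approach, in contrast to the structural features of PE-Miner.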
+ First “real” application of n-gram analysis for malware detection
+ Forensic insights from trained models
+ High accuracy
+ Classification of malicious executables as a function
McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
Annual Computer Security Applications Conference (ACSAC), USA, 2008. (acceptance rate 24.3%) @ Georgia Tech University, Damballa Inc., USA
McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
[Figure: Overview of McBoost. Labels: executable file, heuristic packer detector, A1, A2, A3, Σ, packed, non-packed, dynamic unpacker, hidden code, malcode classifier, C1, C2, result]
+ First ever technique that leverages packer identification
+ Uses unpacker to extract hidden malicious code
+ Separate n-gram training models for packed and unpacked executable files
realtime deployment
halt, crash, evasion
Results
Discussion
Highly accurate
Low scanning overheads
Structural features are robust to evasion attempts?
Discussion
Scenario 1: Training on random PE files and detection
Scenario 2: Training on non-packed PE files and detection of packed PE files (AUC = 0.964)
Scenario 3: Training on packed PE files and detection
Scenario 4: Training on packed/non-packed PE files and detection of packed benign and non-packed malicious PE files (AUC = 0.995)
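The AUC values quoted in these scenarios measure how well classifier scores rank malicious files above benign ones. A minimal sketch of that computation is given below; the function name and the scores in the usage note are made up for illustration and are unrelated to the reported results.

```python
def auc(malicious_scores, benign_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen malicious file receives
    a higher score than a randomly chosen benign file (ties count as half).
    """
    wins = 0.0
    for malicious in malicious_scores:
        for benign in benign_scores:
            if malicious > benign:
                wins += 1.0
            elif malicious == benign:
                wins += 0.5
    return wins / (len(malicious_scores) * len(benign_scores))
```

For example, `auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])` gives 8/9, because eight of the nine malicious/benign pairs are ranked correctly. An AUC of 1.0 means perfect ranking and 0.5 is chance level.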
Future Work
Completely remove bias w.r.t. executable packing
Use a non-signature-based packer detector
PE-probe: leveraging packer detection and structural information to detect malicious portable executables, Virus Bulletin (VB) Conference, September 2009.
[Figure: the executable file goes to a non-signature-based packer detector (PD), which labels it packed or non-packed; specialized PE-Miner models (P1, P2) then produce the result]
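The routing idea in this figure can be sketched as a detector that first asks a packer detector for a label and then hands the file to the matching specialized model. All names below are hypothetical, and the toy callables are placeholders that have nothing to do with real packer detection or classification.

```python
def make_routed_detector(is_packed, packed_model, non_packed_model):
    """Route each file to the model trained for its packing status."""
    def detect(file_bytes):
        model = packed_model if is_packed(file_bytes) else non_packed_model
        return model(file_bytes)
    return detect


# Toy placeholders (assumptions for demonstration only).
def toy_is_packed(file_bytes):
    return file_bytes.startswith(b"P")


def toy_packed_model(file_bytes):
    return "verdict from packed-file model"


def toy_non_packed_model(file_bytes):
    return "verdict from non-packed-file model"
```

Keeping the two models separate means each one is trained and evaluated only on files of its own packing status.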
References (1/2)
Malware Collection and Analysis Using Statistical Classification of Executables”, Annual Computer Security Applications Conference (ACSAC), USA, 2008. (In Press)
M.G. Schultz, E. Eskin, E. Zadok, S.J. Stolfo, “Data mining methods for detection of new malicious executables”, IEEE Symposium on Security and Privacy (S&P), pp. 38-49, USA, 2001.
J.Z. Kolter, M.A. Maloof, “Learning to detect malicious executables in the wild”, ACM International Conference on Knowledge Discovery and Data Mining (KDD), pp. 470-478, USA, 2004.
Symantec Internet Security Threat Reports I-XI (Jan 2002-Jan 2008).
School of Electrical Engineering and Computer Science (SEECS)
References (2/2)
F-Secure Corporation, “F-Secure Reports Amount of Malware Grew by 100% during 2007”, Press release, 2007.
VX Heavens Virus Collection, VX Heavens website, available at http://vx.netlux.org.
Project Malfease, available at http://malfease.oarci.net/.
Annual Technical Conference, FREENIX Track, pp. 41-46, 2005.
J.R. Quinlan, “C4.5: Programs for machine learning”, Morgan Kaufmann, USA, 1993.