PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime
M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, Muddassar Farooq
RAID, 2009
Next Generation Intelligent Networks Research Center (nexGIN RC), Pakistan

Agenda
• Introduction to Domain
• Problem Definition
• Proposed Solution
• Evaluation
• Literature Survey
• Results and Discussion
• Conclusion

Domain Introduction

Introduction
• Computer malware is a widespread problem: backdoors, viruses, worms, Trojans, etc.
• A number of commercial anti-virus products are available

Financial losses…
[Figure: left panel, number of new threats per half-year from Jan-Jun 2002 to Jan-Jun 2007 (y-axis: total threats, in thousands); right panel, estimated damage in billions of US dollars, 1999 to 2003]

Why do we need non-signature-based AV?
• Problems with signature matching:
  - the size of the signature database cannot scale
  - signatures are evaded by simple code obfuscation techniques
[Table: Norton AV, McAfee AV and Command AV versus obfuscated Chernobyl-1.4, F0sf0r0, Hare and Z0mbie-6.b]

How good are non-signature-based solutions?
• They usually leverage:
  - statistical analysis of machine-level byte content
  - disassembled code
  - run-time API calls
• Problems with existing non-signature-based solutions:
  - high false alarm rate
  - large scanning overheads

Problem Definition

Problem Definition
• Non-signature-based detector
• Keep run-time complexity low
• “Content independent” features
• Low false alarm rate

Proposed Solution

Proposed Solution: “PE-Miner”
Leverage the structural information of an executable:
• Extract structural features from all portions of an executable
• Apply standard pre-processing to remove redundancy
• Use supervised classification algorithms for detection
• Trained models provide comprehensible insights for forensic experts

PE-Miner Framework
• Uses novel structural features to efficiently detect malicious PE files
• Strict requirements of the system:
  - Must be a pure non-signature-based framework able to detect zero-day malicious PE files
  - Must be realtime deployable, i.e. achieve more than 99% tp rate and less than 1% fp rate
  - Design must be modular, following a plug-n-play design philosophy
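The tp/fp target above is stated in terms of rates. The following is a minimal sketch, assuming the usual confusion-matrix definitions (tp rate = TP / (TP + FN), fp rate = FP / (FP + TN)) and strict inequalities for "more than" and "less than". The function names and the example counts are hypothetical, not taken from the paper.

```python
def tp_fp_rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (tp rate, fp rate) from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)  # share of malicious files that are detected
    fp_rate = fp / (fp + tn)  # share of benign files wrongly flagged
    return tp_rate, fp_rate


def meets_requirement(tp: int, fn: int, fp: int, tn: int,
                      min_tp_rate: float = 0.99,
                      max_fp_rate: float = 0.01) -> bool:
    """Check the 'more than 99% tp rate, less than 1% fp rate' target."""
    tp_rate, fp_rate = tp_fp_rates(tp, fn, fp, tn)
    return tp_rate > min_tp_rate and fp_rate < max_fp_rate


# Hypothetical counts: 1000 malicious and 1000 benign test files.
print(meets_requirement(995, 5, 3, 997))   # True
print(meets_requirement(980, 20, 3, 997))  # False: tp rate is only 0.98
```

Because both rates are required at once, a detector cannot meet the target by trading false alarms for detections.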

PE-Miner Framework
A threefold research methodology for our static analysis:
1. identify a set of structural features for PE files that is computable in realtime,
2. use an efficient preprocessor to remove redundancy in the feature set, and
3. select an efficient data mining algorithm for the final classification.
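The three steps map naturally onto a modular, plug-n-play pipeline. This is an illustrative sketch only: `DetectorPipeline`, its stage names and the toy stand-in functions are hypothetical and do not reproduce PE-Miner's actual features, preprocessor or classifier.

```python
from dataclasses import dataclass
from typing import Callable

FeatureVector = list[float]


@dataclass
class DetectorPipeline:
    """Three pluggable stages: feature extraction, pre-processing, classification."""
    extract: Callable[[bytes], FeatureVector]             # stage 1: structural features
    preprocess: Callable[[FeatureVector], FeatureVector]  # stage 2: redundancy removal
    classify: Callable[[FeatureVector], str]              # stage 3: final decision

    def scan(self, pe_bytes: bytes) -> str:
        """Run one file through all three stages and return its label."""
        return self.classify(self.preprocess(self.extract(pe_bytes)))


# Toy stand-ins (hypothetical) showing that each stage can be swapped independently.
toy = DetectorPipeline(
    extract=lambda data: [float(len(data)), 0.0, float(data.count(b"\x00"))],
    preprocess=lambda v: [v[0], v[2]],  # drop the constant middle feature
    classify=lambda v: "malicious" if v[1] > 2 else "benign",
)
print(toy.scan(b"MZ\x00\x00\x00"))  # malicious
print(toy.scan(b"MZ"))              # benign
```

Keeping each stage behind a plain callable means a different preprocessor or classifier can be evaluated without touching the other two stages.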

Proposed Architecture

Which PE format features to select?
• Structural features from the Windows PE file format
• 189 features selected
• For example, malicious executables usually have:
  - bigger import tables
  - smaller resource tables
  - no exception tables
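In the PE format, the location and size of the import, resource and exception tables are recorded in the optional header's data directory. This sketch assumes that the slide's "table size" corresponds to the Size field of data-directory entries 1, 2 and 3, whose indices and header offsets follow the Microsoft PE/COFF specification. It uses only the standard library and reads headers only. The function name is hypothetical, and PE-Miner's full list of 189 features is not reproduced here.

```python
import struct

# Data-directory indices defined by the PE/COFF specification.
DIRECTORY_INDEX = {"export": 0, "import": 1, "resource": 2, "exception": 3}


def data_directory_sizes(data: bytes) -> dict[str, int]:
    """Return the Size field of selected data directories of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (file offset of the PE signature) is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32
        count_off, dirs_off = 92, 96
    elif magic == 0x20B:  # PE32+
        count_off, dirs_off = 108, 112
    else:
        raise ValueError(f"unknown optional-header magic {magic:#x}")
    (num_dirs,) = struct.unpack_from("<I", data, opt + count_off)
    sizes = {}
    for name, idx in DIRECTORY_INDEX.items():
        if idx < num_dirs:
            # Each entry is (VirtualAddress: u32, Size: u32).
            _, size = struct.unpack_from("<II", data, opt + dirs_off + 8 * idx)
        else:
            size = 0  # entry absent from the directory
        sizes[name] = size
    return sizes
```

For a file on disk this could be called as `data_directory_sizes(open(path, "rb").read())`. Because only fixed header offsets are read, the cost does not depend on the size of the executable's code.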

Which PE format features to select?
| Name of feature | Benign | Backdoor+Sniffer | Constructor+Virtool | DoS+Nuker | Flooder | Exploit+Hacktool | Worm | Trojan | Virus | Malfease |
|---|---|---|---|---|---|---|---|---|---|---|
| Import Table Size | 5.8 | 19.2 | 6.1 | 7.9 | 20.8 | 7.1 | 23.4 | 10.3 | 6.2 | 4.7 |
| Resource Table Size | 32.6 | 5.5 | 1.5 | 1.4 | 6.2 | 1.0 | 2.6 | 2.2 | 0.5 | 5.9 |
| Exception Table | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.5 |
Full table in the paper

Which pre-processor should be used?
• Why preprocessing?
  - Out of 189 features, some might not convey useful information!
  - Either remove or combine such features
  - Reduces the dimensionality of the input feature space
  - Reduces the training/testing times of classifiers
• Three pre-processing algorithms were used

Feature Pre-processing Algorithms
• Redundant Feature Removal (RFR): based on repeated values
• Principal Component Analysis (PCA): based on data variance
• Haar Wavelet Transform (HWT): based on function approximation
RFR is selected because it yields high detection accuracy and is realtime deployable.
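Two of the three preprocessors can be sketched in a few lines each. PCA is omitted because it needs an eigen-decomposition. The RFR rule below is an assumption based only on the "repeated values" keyword: a feature is dropped when one value covers at least a `threshold` fraction of the samples. That threshold parameter is hypothetical. The HWT sketch keeps only the orthonormal Haar approximation coefficients. Zero-padding of odd-length vectors is a design choice made here and is not taken from the paper.

```python
import math
from collections import Counter


def redundant_feature_removal(rows, threshold=1.0):
    """Return the indices of feature columns to keep.

    Assumed rule: drop a column when its most frequent value covers at least
    `threshold` of the rows (threshold=1.0 drops only constant columns).
    """
    if not rows:
        return []
    n_rows = len(rows)
    keep = []
    for j in range(len(rows[0])):
        _, top_count = Counter(row[j] for row in rows).most_common(1)[0]
        if top_count / n_rows < threshold:
            keep.append(j)
    return keep


def haar_approximation(values, levels=1):
    """Orthonormal Haar approximation coefficients after `levels` steps.

    Each step maps adjacent pairs (a, b) to (a + b) / sqrt(2), halving the
    vector length; an odd-length vector is padded with a trailing zero.
    """
    coeffs = [float(v) for v in values]
    for _ in range(levels):
        if len(coeffs) % 2:
            coeffs.append(0.0)
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


rows = [[1, 0, 5], [2, 0, 5], [3, 0, 6]]
print(redundant_feature_removal(rows))       # [0, 2]: column 1 is constant
print(redundant_feature_removal(rows, 0.6))  # [0]
```

RFR only counts values per column, so its cost grows linearly with the number of samples and features. This is consistent with the realtime argument on the slide.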

Which classification algorithm?
• IBk: nearest neighbor algorithm
• J48: decision tree
• NB: naive Bayes classifier
• SMO: support vector machine trained with sequential minimal optimization
• RIPPER: inductive rule learning algorithm
J48 is selected because it has the highest detection accuracy and low computational complexity; timing analysis also shows that it is realtime deployable.
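J48 is Weka's implementation of the C4.5 decision-tree learner. C4.5 chooses each split by gain ratio. The following is a simplified sketch of that criterion for one numeric feature with a binary threshold split. Candidate thresholds are taken as midpoints between distinct values, which is a simplification. Pruning and the other refinements J48 applies are omitted. The function names and sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a numeric feature at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends everything one way carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - (p_left * entropy(left) + p_right * entropy(right))
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return info_gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, ratio)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(((t, gain_ratio(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1], default=(None, 0.0))


# Hypothetical values of one structural feature for four labeled files.
sizes = [1.0, 2.0, 10.0, 12.0]
classes = ["benign", "benign", "malicious", "malicious"]
print(best_threshold(sizes, classes))  # (6.0, 1.0)
```

Each learned split is a readable rule of the form "feature <= threshold". This is what makes trained tree models easy for forensic experts to inspect.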

Evaluation

Evaluation
The proposed framework is evaluated on two well-known malware collections.
Evaluation datasets:
• VX Heavens virus collection: 10 thousand labeled malware samples
• Malfease malware collection: 5 thousand malware samples

Literature Survey

Learning to Detect and Classify Malicious Executables in the Wild
J. Zico Kolter, Marcus A. Maloof @ Stanford University, Georgetown University, USA
Journal of Machine Learning Research, MIT Press, 2006 (ISI Impact Factor: 2.682)

Learning to Detect and Classify Malicious Executables in the Wild
[Figure: Overview of KM. Executable file → n-gram analysis → n-gram feature extraction → classification algorithm → result: benign or malicious]
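The n-gram feature extraction step in KM's pipeline turns raw bytes into features. The following is a minimal sketch of binary byte n-gram presence features. The choice of n = 4 and a fixed, pre-selected vocabulary are assumptions for illustration, and the vocabulary-selection step is not shown. Both function names are hypothetical.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """All distinct overlapping n-byte substrings of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def presence_vector(data: bytes, vocabulary: list[bytes], n: int = 4) -> list[int]:
    """Binary features: 1 if a vocabulary n-gram occurs in `data`, else 0."""
    grams = byte_ngrams(data, n)
    return [1 if g in grams else 0 for g in vocabulary]


print(presence_vector(b"ABCDE", [b"ABCD", b"XXXX", b"BCDE"]))  # [1, 0, 1]
```

These features depend directly on byte content. This is why packing, which rewrites the bytes, undermines them, as the critiques of this approach point out.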

Critiques
+ First “real” application of n-gram analysis for malware detection
+ Forensic insights from trained models
+ High accuracy
+ Classifies malicious executables by their payload function (backdoor, worm, virus, etc.)
- Huge computational cost in training (several days)
- Not robust to malware packing
- False alarms for packed benign files

McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
R. Perdisci, A. Lanzi, W. Lee @ Georgia Institute of Technology, Damballa Inc., USA
Annual Computer Security Applications Conference (ACSAC), USA, 2008 (acceptance rate 24.3%)

McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
[Figure: Overview of McBoost. Executable file → heuristic packer detector (A1, A2, A3 combined by Σ); packed files → dynamic unpacker → hidden code → malcode classifier C1; non-packed files → malcode classifier C2; → result]

Critiques
+ First ever technique that leverages packer identification
+ Uses an unpacker to extract hidden malicious code
+ Separate n-gram training models for packed and non-packed executable files
- High run-time computational overhead; not feasible for realtime deployment
- Inherits the problems of dynamic unpackers: halting, crashes, evasion

Results and Discussion

Results

Discussion
• Highly accurate
• Low scanning overheads
• Are structural features robust to evasion attempts?
