Automatically Evading Classifiers
A Case Study on PDF Malware Classifiers
Weilin Xu David Evans Yanjun Qi University of Virginia
Machine Learning is Solving Our Problems
Spam, IDS, Malware, Fake Accounts, …
[Figure: Security Expert]
Goal: Understand classifiers under attack. Results: Vulnerable to automated evasion.
Training (Supervised Learning)
Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
Deployment
Operational Data → Trained Classifier → Malicious / Benign
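The training and deployment stages can be sketched in a few lines. This is a toy illustration only: the keyword-count features and the nearest-centroid "ML algorithm" are assumptions chosen for brevity, not the features or models used by PDFrate or Hidost, and the names `extract_features`, `train`, and `classify` are hypothetical.

```python
from statistics import mean

# Hypothetical feature set: counts of a few PDF keywords in the raw bytes.
KEYWORDS = (b"/JavaScript", b"/OpenAction", b"/Page")


def extract_features(pdf_bytes):
    """Feature extraction: map raw file bytes to a numeric vector."""
    return [float(pdf_bytes.count(k)) for k in KEYWORDS]


def train(labelled_data):
    """Training: compute one centroid per label (a stand-in ML algorithm)."""
    by_label = {}
    for raw, label in labelled_data:
        by_label.setdefault(label, []).append(extract_features(raw))
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}


def classify(model, pdf_bytes):
    """Deployment: label operational data by its nearest centroid."""
    vec = extract_features(pdf_bytes)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Toy labelled training data (made up for illustration).
training_data = [
    (b"/Page /Page /Page", "benign"),
    (b"/Page /Page", "benign"),
    (b"/JavaScript /OpenAction /Page", "malicious"),
    (b"/JavaScript /JavaScript /OpenAction", "malicious"),
]
model = train(training_data)
print(classify(model, b"/JavaScript /OpenAction"))
print(classify(model, b"/Page /Page /Page /Page"))
```

The point of the sketch is the shape of the pipeline: the classifier only ever sees the vectors, so anything an attacker can change in the file without changing its behaviour may move the vector across the decision boundary.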
[Figure: Accuracy, False Negative Rate, and False Negative Rate with Adversary]
* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
Very robust against “strongest conceivable mimicry attack”.
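For reference, the two metrics named on this slide can be computed as below. This is a minimal sketch with made-up labels; it treats "malicious" as the positive class, so a false negative is a malicious file labelled benign. The "with adversary" variant is the same false negative rate, measured on malicious samples that an attacker has modified.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def false_negative_rate(y_true, y_pred, positive="malicious"):
    """FN / (FN + TP): share of truly malicious samples that were missed."""
    predictions_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    missed = sum(p != positive for p in predictions_on_positives)
    return missed / len(predictions_on_positives)


# Made-up example: 4 malicious and 4 benign samples.
y_true = ["malicious"] * 4 + ["benign"] * 4
y_pred = ["malicious", "malicious", "malicious", "benign",
          "benign", "benign", "benign", "malicious"]

print(accuracy(y_true, y_pred))
print(false_negative_rate(y_true, y_pred))
```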
Malicious PDF → Clone → Variants → Mutation (using Benign PDFs) → Variants → Select Variants (✓ ✓ ✗ ✓) → Variants
Mutation
PDF structure: /Root, /Catalog, /Pages, /JavaScript eval(‘…’);
Modified Parser
Abusing PDF Parsers in Malware Detectors, Curtis Carmony, et al.
Variants From Benign
[Figure: mutation example on the PDF structure (/Root, /Catalog, /Pages, /JavaScript), with node labels 128 and 546]
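The mutation step can be illustrated on a toy tree. The sketch below is an assumption-laden illustration, not the presenters' implementation: it models a PDF as a nested dict, and it assumes that a mutation deletes, inserts, or replaces one node, with inserted or replacing material copied from a benign file. The function names `paths`, `subtree_at`, and `mutate` are hypothetical.

```python
import copy
import random


def paths(tree, prefix=()):
    """List the path to every node below the root of a nested-dict tree."""
    found = []
    for key, child in tree.items():
        found.append(prefix + (key,))
        if isinstance(child, dict):
            found.extend(paths(child, prefix + (key,)))
    return found


def subtree_at(tree, path):
    """Follow a path of keys down from the root and return that node."""
    node = tree
    for key in path:
        node = node[key]
    return node


def mutate(variant, benign, rng):
    """Return a mutated copy of `variant`; the inputs are left untouched."""
    child = copy.deepcopy(variant)
    targets = paths(child)
    if not targets:  # nothing left to mutate
        return child
    target = rng.choice(targets)
    parent = subtree_at(child, target[:-1])
    operation = rng.choice(("delete", "insert", "replace"))
    if operation == "delete":
        del parent[target[-1]]
        return child
    # Insert and replace both draw a random subtree from the benign file.
    donor_path = rng.choice(paths(benign))
    donor = copy.deepcopy(subtree_at(benign, donor_path))
    if operation == "insert":
        # Added next to the target under the donor's own key
        # (this overwrites a sibling that already has that key).
        parent[donor_path[-1]] = donor
    else:
        parent[target[-1]] = donor
    return child


seed = {"/Root": {"/Catalog": {"/Pages": {}, "/JavaScript": "eval('...');"}}}
benign = {"/Root": {"/Catalog": {"/Pages": {"/Kids": {}}, "/Metadata": "xmp"}}}
rng = random.Random(0)
variants = [mutate(seed, benign, rng) for _ in range(5)]
```

Because each call works on a deep copy, the seed stays intact and every variant can be mutated again in a later round. Many variants produced this way will no longer be malicious, which is why a selection step is needed.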
Select Variants
Fitness Function: Variants → Oracle (Malicious?) and Target Classifier (Score: Malicious / Benign) → Fitness Score
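One way to combine the oracle and the target classifier into a fitness score is sketched below. The details are assumptions for illustration: the oracle and classifier are passed in as hypothetical callables, the classifier is assumed to return a maliciousness score in [0, 1], and 0.5 is an assumed decision threshold. A variant that the oracle no longer considers malicious is discarded; among the rest, a lower classifier score means higher fitness.

```python
def fitness(variant, oracle, classifier, threshold=0.5):
    """Higher is better; positive means malicious but classified as benign."""
    if not oracle(variant):
        return float("-inf")  # lost its malicious behaviour: useless
    return threshold - classifier(variant)


def select(variants, oracle, classifier, keep, threshold=0.5):
    """Return (survivors for the next round, variants that already evade)."""
    scored = [(fitness(v, oracle, classifier, threshold), v) for v in variants]
    alive = [(f, v) for f, v in scored if f != float("-inf")]
    alive.sort(key=lambda pair: pair[0], reverse=True)
    evasive = [v for f, v in alive if f > 0]
    survivors = [v for _, v in alive[:keep]]
    return survivors, evasive


# Toy variants: each records what a real oracle and classifier would report.
population = [
    {"name": "a", "malicious": True, "score": 0.9},
    {"name": "b", "malicious": True, "score": 0.3},
    {"name": "c", "malicious": False, "score": 0.1},
    {"name": "d", "malicious": True, "score": 0.6},
]
toy_oracle = lambda v: v["malicious"]
toy_classifier = lambda v: v["score"]

survivors, evasive = select(population, toy_oracle, toy_classifier, keep=2)
print([v["name"] for v in survivors])
print([v["name"] for v in evasive])
```

In the toy run, variant "c" has the lowest classifier score but is rejected because the oracle reports it is no longer malicious, which is the case the ✗ in the diagram stands for.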
[Figure: Original Malware Seeds, Evasive Variants, Evasive Variants with lower threshold]
[Figure: Original Malware Seeds, Evasive Variants]
Difficulty varies by seed:
  Simple mutations often work.
  Complex mutations are sometimes needed.
Difficulty varies by target:
  PDFrate: 6 days to evade all.
  Hidost: 2 days to evade all.
PDF Malware Seeds → Automated Evasion (against Hidost) → Evasive PDF Malware (against Hidost)
PDFrate: 387/500 Evasive (77.4%)
Gmail: 3/500 Evasive (0.6%)
Gmail’s classifier is secure? Different.
Source Code: http://EvadeML.org