Automatically Evading Classifiers: A Case Study on PDF Malware

SLIDE 1

Automatically Evading Classifiers

A Case Study on PDF Malware Classifiers

Weilin Xu, David Evans, Yanjun Qi
University of Virginia

SLIDE 2

Machine Learning is Solving Our Problems

Spam, IDS, Malware, Fake Accounts, …


SLIDE 5

Machine Learning is Eating the World

Data Scientist

Security Expert

?

SLIDE 6

Machine Learning is Eating the World

Data Scientist

Security Expert

No! Security is different.

SLIDE 7

Security Tasks are Different: Adversary Adapts

Goal: Understand classifiers under attack.
Results: Vulnerable to automated evasion.

SLIDE 8

Building Machine Learning Classifiers

Training (Supervised Learning):
Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier

SLIDE 9

Assumption: Training Data is Representative

Training (Supervised Learning):
Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier

Deployment:
Operational Data → Trained Classifier → Malicious / Benign
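The training and deployment flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two features, the toy "documents", and the nearest-centroid learner are all hypothetical stand-ins, not what PDFrate or Hidost actually use.

```python
from statistics import mean

def extract_features(doc: str) -> list[float]:
    # Hypothetical toy features: count of /JavaScript markers and of "obj" tokens.
    return [float(doc.count("/JavaScript")), float(doc.count("obj"))]

def train(labelled_docs):
    """Training: labelled data -> feature vectors -> one centroid per label."""
    by_label = {}
    for doc, label in labelled_docs:
        by_label.setdefault(label, []).append(extract_features(doc))
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def classify(model, doc: str) -> str:
    """Deployment: operational data -> feature vector -> nearest centroid."""
    vec = extract_features(doc)
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Toy labelled training data (made up for this sketch).
training_data = [
    ("obj /JavaScript /JavaScript", "malicious"),
    ("obj obj /JavaScript", "malicious"),
    ("obj obj obj", "benign"),
    ("obj obj obj obj", "benign"),
]
model = train(training_data)
print(classify(model, "obj obj /JavaScript"))
print(classify(model, "obj obj obj obj obj"))
```

The classifier only works well if operational data looks like the training data, which is the assumption an adaptive adversary breaks.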


SLIDE 11

Results: Evaded PDF Malware Classifiers

                                     PDFrate* [ACSAC’12]   Hidost [NDSS’13]
Accuracy                             0.9976                0.9996
False Negative Rate                  0.0000                0.0056
False Negative Rate with Adversary   1.0000                1.0000

Very robust against “strongest conceivable mimicry attack”.

* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
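For reference, the two metrics in the results table can be computed as follows. This is a minimal sketch with made-up labels (1 = malicious, 0 = benign); it does not reproduce the reported numbers.

```python
def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def false_negative_rate(y_true, y_pred):
    # FN / (FN + TP): share of malicious samples classified as benign.
    predictions_for_malicious = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_malicious:
        raise ValueError("no malicious samples in y_true")
    return predictions_for_malicious.count(0) / len(predictions_for_malicious)

# Hypothetical test set: one malicious sample is missed.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
print(accuracy(y_true, y_pred), false_negative_rate(y_true, y_pred))

# Under attack, every sample is malicious and every one is labelled benign,
# so the false negative rate is 1.0 regardless of the original accuracy.
evasive_true = [1, 1, 1]
evasive_pred = [0, 0, 0]
print(false_negative_rate(evasive_true, evasive_pred))
```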

SLIDE 12

Automated Evasion Approach

Based on Genetic Programming

[Diagram: Malicious PDF (01011001101) → Clone → Variants → Mutation (with Benign PDFs) → Variants → Select Variants (✓ ✓ ✗ ✓)]
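The clone, mutate, select loop can be sketched generically. This is a toy under loud assumptions: the variant is a plain integer, the success threshold (fitness >= 0), the population size, and the "keep the best half" selection rule are all hypothetical choices for illustration, not the settings used in the talk.

```python
import random

def evolve(seed, mutate, fitness, pop_size=20, generations=200, rng=None):
    """Return the first variant whose fitness is >= 0, or None if none is found."""
    rng = rng or random.Random(0)
    # Clone: start from copies of the seed (real variants would need deep copies).
    population = [seed] * pop_size
    for _ in range(generations):
        # Mutation: every member of the population is mutated once.
        population = [mutate(variant, rng) for variant in population]
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) >= 0:  # assumed success condition
            return scored[0]
        # Select variants: keep the fitter half, refill by cloning survivors.
        survivors = scored[: pop_size // 2]
        refill = [rng.choice(survivors) for _ in range(pop_size - len(survivors))]
        population = survivors + refill
    return None

# Toy problem: walk an integer from 0 to TARGET using +/-1 mutations.
TARGET = 5
result = evolve(
    seed=0,
    mutate=lambda v, rng: v + rng.choice([-1, 1]),
    fitness=lambda v: -abs(v - TARGET),
)
print(result)
```

In the setting of these slides, `mutate` would edit a PDF's structure and `fitness` would combine the oracle's verdict with the target classifier's score.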

SLIDE 13

Automated Evasion Approach

Modified Parser

[PDF structure tree: /Root, /Catalog, /Pages, /JavaScript eval(‘…’);]

Curtis Carmony, et al., “Extract Me If You Can: Abusing PDF Parsers in Malware Detectors”

SLIDE 14

Automated Evasion Approach

Mutation: Insert / Replace / Delete

Variants; From Benign

[PDF structure tree: /Root, /Catalog, /Pages, /JavaScript eval(‘…’);]
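The three mutation operators can be illustrated on a nested-dictionary stand-in for a PDF's object tree. Everything here is an assumption made for the sketch: the nesting of the keys, the helper names, and the donor objects are hypothetical, and a real implementation would operate on parsed PDF objects.

```python
import copy

def get_node(tree, path):
    # Follow a tuple of keys down the tree; the empty path is the root.
    for key in path:
        tree = tree[key]
    return tree

def delete(tree, path):
    """Return a copy of the tree with the node at `path` removed."""
    out = copy.deepcopy(tree)
    del get_node(out, path[:-1])[path[-1]]
    return out

def replace(tree, path, donor_subtree):
    """Return a copy with the node at `path` swapped for a donor subtree."""
    out = copy.deepcopy(tree)
    get_node(out, path[:-1])[path[-1]] = copy.deepcopy(donor_subtree)
    return out

def insert(tree, parent_path, key, donor_subtree):
    """Return a copy with a donor subtree added under `parent_path`."""
    out = copy.deepcopy(tree)
    get_node(out, parent_path)[key] = copy.deepcopy(donor_subtree)
    return out

# Hypothetical trees; the payload string is a placeholder, not real code.
malicious = {"/Root": {"/Catalog": {"/Pages": {}, "/JavaScript": "eval('…');"}}}
benign = {"/Root": {"/Catalog": {"/Pages": {"/Count": 1}}}}

pages = ("/Root", "/Catalog", "/Pages")
replaced = replace(malicious, pages, get_node(benign, pages))
inserted = insert(malicious, ("/Root", "/Catalog"), "/Metadata", {"/Type": "/Metadata"})
deleted = delete(malicious, ("/Root", "/Catalog", "/JavaScript"))
```

Note that the last example removes the payload itself, so a variant like `deleted` is no longer malicious; this is why mutated variants have to be checked for preserved malicious behaviour before they count as evasive.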

SLIDE 15

Automated Evasion Approach

Mutation: Insert / Replace / Delete

Variants; From Benign

[Animation frames continuing through slide 19; labels shown: 128, 546]


SLIDE 22

Automated Evasion Approach

Select Variants

[Diagram: Variants → Oracle (Malicious?) and Target Classifier (Score) → Fitness Function f(x) → Fitness Score; labels: Malicious, Benign]
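One plausible shape for such a fitness function is sketched below. The exact formula is an assumption, not taken from the slides: here a variant that the oracle no longer considers malicious gets the lowest possible fitness, and otherwise fitness grows as the classifier's maliciousness score drops below a hypothetical 0.5 threshold. The oracle and classifier are toy stubs.

```python
LOW_FITNESS = float("-inf")  # assumed penalty for variants that lost their payload

def make_fitness(oracle, classifier_score, threshold=0.5):
    """Combine an oracle (still malicious?) with a classifier score in [0, 1]."""
    def fitness(variant):
        if not oracle(variant):
            return LOW_FITNESS
        # Positive fitness means the classifier scores the variant as benign.
        return threshold - classifier_score(variant)
    return fitness

# Toy stubs: a variant is a dict with a payload flag and a count of benign objects.
def toy_oracle(variant):
    return variant["payload_intact"]

def toy_classifier_score(variant):
    return max(0.0, 0.9 - 0.1 * variant["benign_objects"])

fitness = make_fitness(toy_oracle, toy_classifier_score)

seed = {"payload_intact": True, "benign_objects": 0}
evasive = {"payload_intact": True, "benign_objects": 5}
broken = {"payload_intact": False, "benign_objects": 9}
print(fitness(seed), fitness(evasive), fitness(broken))
```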



SLIDE 25

Results: Evaded PDFrate 100%

[Plot legend: Original Malware Seeds, Evasive Variants]

SLIDE 26

Evaded PDFrate with Adjusted Threshold

[Plot legend: Original Malware Seeds, Evasive Variants, Evasive Variants with lower threshold]


SLIDE 28

Results: Evaded Hidost 100%

[Plot legend: Original Malware Seeds, Evasive Variants]

SLIDE 29

Results: Accumulated Evasion Rate

Difficulty varies by seed:
Simple mutations often work.
Complex mutations are sometimes needed.

Difficulty varies by target:
PDFrate: 6 days to evade all
Hidost: 2 days to evade all

SLIDE 30

Cross-Evasion Effects

[Diagram: PDF Malware Seeds → Automated Evasion (Hidost) → Evasive PDF Malware (against Hidost) → PDFrate]

387/500 Evasive (77.4%)
3/500 Evasive (0.6%)

Gmail’s classifier is secure?

SLIDE 31

Cross-Evasion Effects

Gmail’s classifier is secure? Different.

SLIDE 32

Evading Gmail’s Classifier

Evasion rate on Gmail: 135/380 (35.5%)

SLIDE 33

Evading Gmail’s Classifier

Evasion rate on Gmail: 179/380 (47.1%)
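The evasion rates quoted on the cross-evasion and Gmail slides are plain ratios of evasive samples to samples tested. A short check of the arithmetic, using a hypothetical helper name:

```python
def evasion_rate(evasive: int, total: int) -> str:
    # Percentage of tested samples that evaded detection, to one decimal place.
    return f"{100.0 * evasive / total:.1f}%"

# Counts as quoted on the slides.
for evasive, total in [(387, 500), (3, 500), (135, 380), (179, 380)]:
    print(f"{evasive}/{total} = {evasion_rate(evasive, total)}")
```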

SLIDE 34

Conclusion

Who will win this arms race?

Source Code: http://EvadeML.org