SLIDE 1

Shrinking and Exploring Adversarial Search Spaces

David Evans

University of Virginia

ARO Workshop on Adversarial Learning, Stanford, 14 Sept 2017

Weilin Xu, Yanjun Qi

EvadeML.org

SLIDE 2

Machine Learning is Eating Computer Science


SLIDE 3

Security State-of-the-Art

Field | Random guessing attack success probability | Threat models | Proofs
Cryptography | 2^βˆ’128 | information theoretic, resource bounded | required
System Security | 2^βˆ’32 | capabilities, motivations, rationality | common
Adversarial Machine Learning | 2^βˆ’11*; 2^βˆ’6 | white-box, black-box | rare!

SLIDE 4

Adversarial Examples


[Image: β€œpanda” + 0.007 Γ— (adversarial noise) is classified as β€œgibbon”]

Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014.

SLIDE 5

Adversarial Examples Game


Given a seed sample x, find xβ€² where:

f(xβ€²) β‰  f(x)    class is different (untargeted)
f(xβ€²) = t       class is t (targeted)
Ξ”(x, xβ€²) ≀ Ξ΅    difference below threshold

Ξ”(x, xβ€²) is defined in some (simple!) metric space: the L0 norm (number of components changed), L1 norm, L2 norm (β€œEuclidean”), or L∞ norm.
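For concreteness, a minimal sketch of these distance metrics on numpy arrays (the function names are mine, not the talk's):

```python
import numpy as np

def l0(x, x_adv):
    # Number of components that were changed at all.
    return int(np.sum(x != x_adv))

def l2(x, x_adv):
    # Euclidean distance between the (flattened) inputs.
    return float(np.sqrt(np.sum((x - x_adv) ** 2)))

def linf(x, x_adv):
    # Largest change made to any single component.
    return float(np.max(np.abs(x - x_adv)))
```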

SLIDE 6

Detecting Adversarial Examples

Input β†’ Model β†’ Prediction0
Input β†’ Squeezer1 β†’ Model β†’ Prediction1
Input β†’ Squeezer2 β†’ Model β†’ Prediction2
…
Input β†’ Squeezerk β†’ Model′ β†’ Predictionk

g(pred0, pred1, …, predk) β†’ Yes: Adversarial / No: Legitimate

SLIDE 7

β€œFeature Squeezing”


Squeeze: squeeze(x)i = round(xi Γ— 4) / 4

x  = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
xβ€² = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]

Both squeeze to [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …], so:

squeeze(xβ€²) β‰ˆ squeeze(x) ⟹ f(squeeze(xβ€²)) β‰ˆ f(squeeze(x))
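A sketch of this rounding squeezer, generalized to b-bit color depth (the function name is an assumption, not the paper's API):

```python
import numpy as np

def squeeze_bit_depth(x, bits):
    # Quantize pixel values in [0, 1] to 2**bits discrete levels;
    # bits=1 leaves only {0.0, 1.0} (monochrome).
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

# The slide's example rounds to quarter steps:
x = np.array([0.054, 0.4894, 0.9258, 0.0116, 0.2898])
print(np.round(x * 4) / 4)   # -> [0.   0.5  1.   0.   0.25]
```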

SLIDE 8

Example Squeezers


Reduce Color Depth: e.g., 8-bit greyscale β†’ 1-bit monochrome.

Median Smoothing: 3Γ—3 smoothing replaces each pixel with the median of the pixel and its neighbors.
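A matching sketch of the median-smoothing squeezer using scipy's median_filter:

```python
import numpy as np
from scipy.ndimage import median_filter

def squeeze_median(x, size=3):
    # Replace each pixel with the median of the size x size window
    # around it (3x3 as described above; the detectors below use 2x2).
    return median_filter(x, size=size)

img = np.random.rand(28, 28)           # stand-in greyscale image
smoothed = squeeze_median(img, size=2)
```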

SLIDE 9

Simple Instantiation

Input β†’ Model (7-layer CNN) β†’ Prediction0
Input β†’ Bit Depth-1 β†’ Model β†’ Prediction1
Input β†’ Median 2Γ—2 β†’ Model β†’ Prediction2

max(L1(pred0, pred1), L1(pred0, pred2)) > t β†’ Yes: Adversarial / No: Legitimate
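A sketch of this detection rule, assuming model(x) returns the softmax probability vector; the same rule with three squeezers gives the ImageNet configuration below:

```python
import numpy as np

def detect(model, squeezers, x, threshold):
    # Flag x as adversarial if any squeezer moves the model's
    # prediction by more than `threshold` in L1 distance.
    p0 = model(x)
    score = max(np.sum(np.abs(p0 - model(sq(x)))) for sq in squeezers)
    return score > threshold   # True: adversarial; False: legitimate
```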

SLIDE 10

[Histogram: number of examples vs. maximum L1 distance between original and squeezed input, for legitimate and adversarial examples.]

threshold = 0.0029: detection 98.2%, FP < 4%

SLIDE 11

ImageNet Configuration

Input β†’ Model (MobileNet) β†’ Prediction0
Input β†’ Bit Depth-5 β†’ Model β†’ Prediction1
Input β†’ Median 2Γ—2 β†’ Model β†’ Prediction2
Input β†’ Non-local Mean β†’ Model β†’ Prediction3

max(L1(pred0, {pred1, pred2, pred3})) > t β†’ Yes: Adversarial / No: Legitimate

SLIDE 12

[Histogram: maximum L1 distance between original and squeezed input, legitimate vs. adversarial examples (ImageNet).]

threshold = 1.24: detection 85%, FP < 5%

SLIDE 13

Detection Performance Summary

Dataset | Most Effective Squeezers Found | Threshold | Detection (Successful AEs) | Detection (Failed AEs) | False Positive Rate | ROC-AUC (incl. FAEs) | ROC-AUC (excl. FAEs)
MNIST | Bit Depth (1-bit), Median (2Γ—2) | 0.0029 | 98.2% | 20.0% | 3.98% | 94.5% | 99.6%
CIFAR-10 | Bit Depth (5-bit), Median (2Γ—2), Non-local Mean (13-3-2) | 1.1402 | 85.0% | 9.1% | 4.93% | 95.7% | 95.9%
ImageNet | Bit Depth (5-bit), Median (2Γ—2), Non-local Mean (11-3-4) | 1.2476 | 85.2% | 25.0% | 4.70% | 94.0% | 94.5%

SLIDE 14

Detection Performance

[Bar chart: detection rate (0–100%) per attack: FGSM, BIM, CW∞ (Next/LL), DeepFool, CW2 (Next/LL), CW0 (Next/LL), JSMA (Next/LL), on MNIST, CIFAR-10, and ImageNet.]

SLIDE 15

Composes with model-based defenses.

SLIDE 16

Arms Race?


WOOT (August 2017)

Incorporate the L1 squeezed distance into the attack's loss function. Adversary success rate on MNIST: untargeted 64%, targeted (next) 41%, targeted (least likely) 21%.

SLIDE 17

Raising the Bar or Changing the Game?

Metric Space 1: Target Classifier. Metric Space 2: β€œOracle”.

Before: find a small perturbation that changes the class assigned by the target classifier but is imperceptible to the oracle. Now: the perturbation must change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.

SLIDE 18

β€œFeature Squeezing” Conjecture

For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples.

Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces the original and adversarial example into the same sample.
SLIDE 19

[Diagram: the detection framework from Slide 6, with the squeezer configuration drawn from a random seed.]

Defender's Entropy Advantage: random seed

SLIDE 20

More Complex Squeezers + Entropy

CCS 2017: pick a random autoencoder.

SLIDE 21

Changing the Game

Option 1: Find distance-limited adversarial methods for which it is intractable to find effective feature squeezers.
Option 2: Redefine adversarial examples so distance is not limited in a simple metric space... (the focus of the rest of this talk)

SLIDE 22

Do Humans Matter?

Metric Space 1: Machine. Metric Space 2: Human.
Metric Space 1: Machine 1. Metric Space 2: Machine 2.

SLIDE 23

Malware Classifiers

SLIDE 24

Automated Classifier Evasion Using Genetic Programming

[Diagram: Malicious PDF β†’ Clone β†’ Mutation (drawing on Benign PDFs) β†’ Variants β†’ each variant scored (βœ“/βœ—) by the Benign Oracle β†’ Select Variants β†’ Found Evasive?]

SLIDE 25

Generating Variants

[Diagram: Malicious PDF β†’ Clone β†’ Mutation (drawing on Benign PDFs) β†’ Variants β†’ Select Variants β†’ Found Evasive?]

SLIDE 26

Variants

Generating Variants

Clone Benign PDFs Malicious PDF

Mutation

Variants Variants

Select Variants

βœ“ βœ“ βœ— βœ“

Found Evasive? Found Evasive ?

/JavaScript eval(β€˜β€¦β€™); /Root /Catalog /Pages

Select random node

Randomly transform: delete, insert, replace

SLIDE 27

Generating Variants

[Diagram: the same loop; insertion and replacement graft nodes harvested from benign PDFs into the parsed PDF tree.]

Select a random node. Randomly transform it: delete, insert, or replace, using nodes from benign PDFs.
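A minimal sketch of one mutation step over a nested-dict stand-in for the parsed PDF object tree (the real system mutates actual PDF objects; the helper names here are illustrative):

```python
import random

def all_paths(node, prefix=()):
    # Paths to every node in a nested-dict PDF skeleton.
    paths = [prefix] if prefix else []
    if isinstance(node, dict):
        for key, child in node.items():
            paths.extend(all_paths(child, prefix + (key,)))
    return paths

def mutate(tree, benign_pool):
    # benign_pool: (name, subtree) pairs harvested from benign PDFs.
    path = random.choice(all_paths(tree))        # select a random node
    parent = tree
    for key in path[:-1]:
        parent = parent[key]
    op = random.choice(["delete", "insert", "replace"])
    if op == "delete":
        del parent[path[-1]]
    elif op == "replace":
        parent[path[-1]] = random.choice(benign_pool)[1]
    else:  # insert a benign subtree alongside the selected node
        name, subtree = random.choice(benign_pool)
        parent[name] = subtree
    return tree
```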

SLIDE 28

Selecting Promising Variants

[Diagram: the same loop, zooming into Select Variants.]

SLIDE 29

Selecting Promising Variants

[Diagram: the same loop; each candidate variant is executed by the Oracle and scored by the Target Classifier.]

Fitness Function: Candidate Variant β†’ f(score_oracle, score_class) β†’ Score

SLIDE 30

Oracle

Execute the candidate in a vulnerable Adobe Reader inside a virtual environment (Cuckoo sandbox: https://github.com/cuckoosandbox; simulated network: INetSim).

Behavioral signature: the variant is malicious if its signature matches (HTTP_URL + HOST extracted from API traces).

Advantage: we know the target malware behavior
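A sketch of the oracle's decision, with hypothetical helper names (run_in_sandbox, http_url_host_pairs) standing in for the Cuckoo submission and API-trace parsing described above:

```python
# Hypothetical signature: the (HTTP_URL, HOST) pair the seed malware
# requests when its payload runs; example.com is a placeholder.
SIGNATURE = {("http://example.com/cmd.php", "example.com")}

def oracle(pdf_path):
    # run_in_sandbox and http_url_host_pairs are hypothetical helpers:
    # execute the PDF in vulnerable Adobe Reader under Cuckoo (with
    # INetSim faking the network) and parse the resulting API traces.
    trace = run_in_sandbox(pdf_path)
    observed = http_url_host_pairs(trace)
    return "malicious" if observed & SIGNATURE else "benign"
```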

SLIDE 31

Fitness Function

Assumes lost malicious behavior will not be recovered.

f(v) = 0.5 βˆ’ classifier_score(v)   if oracle(v) = "malicious"
f(v) = βˆ’βˆž                          otherwise

(classifier_score(v) β‰₯ 0.5 means v is labeled malicious)
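The same rule transcribed into Python, with oracle and classifier_score passed in as callables:

```python
def fitness(variant, oracle, classifier_score):
    # classifier_score(v) >= 0.5 means the classifier labels v malicious.
    if oracle(variant) != "malicious":
        # Payload lost; assume the malicious behavior won't be recovered.
        return float("-inf")
    # Higher fitness the more confidently the classifier says "benign".
    return 0.5 - classifier_score(variant)
```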

SLIDE 32

[Plot: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost.]

SLIDE 33

[Same plot: seeds evaded vs. number of mutations.]

Simple transformations often worked.
SLIDE 34

[Same plot.]

One simple transformation, (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/), works on 162/500 seeds.

SLIDE 35

[Same plot.]

Works on 162/500 seeds. Some seeds required complex transformations.

SLIDE 36

Possible Defenses

SLIDE 37

Possible Defense: Adjust Threshold

Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

SLIDE 38

Evading PDFrate

[Plot: classifier scores of the original malicious seeds, all above the malicious-label threshold.]

SLIDE 39

Discovered Evasive Variants

Adjust threshold?

SLIDE 40

Adjust threshold?

Variants found with threshold = 0.25
Variants found with threshold = 0.50

SLIDE 41

Possible Defense: Hide Classifier

SLIDE 42

Hide the Classifier Score?

[Diagram: the variant-generation loop, with the Target Classifier's numeric score hidden from the fitness function.]

SLIDE 43

Binary Classifier Output is Enough

[Diagram: the same loop succeeds even when the target classifier reveals only its binary malicious/benign label.]

ACM CCS 2017

SLIDE 44

Possible Defense: Retrain Classifier

SLIDE 45

Retrain Classifier

[Diagram: training (supervised learning): labelled training data β†’ feature extraction β†’ vectors β†’ ML algorithm β†’ trained classifier. Deployment: operational data β†’ trained classifier β†’ malicious / benign.]

SLIDE 46

[Diagram: the same training pipeline, with evasive variants found by EvadeML cloned back into the labelled training data for retraining.]
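A sketch of this retraining loop under assumed interfaces; fit_classifier, features, and evade_ml are hypothetical stand-ins for the real trainer, feature extractor, and genetic search:

```python
def retrain_rounds(train_X, train_y, seeds, rounds=2):
    # Round 0 trains the original classifier (Hidost16); each later
    # round yields a hardened version (HidostR1, HidostR2, ...).
    clf = fit_classifier(train_X, train_y)
    for _ in range(rounds):
        variants = evade_ml(clf, seeds)            # evasive variants found by GP
        train_X += [features(v) for v in variants]
        train_y += [1] * len(variants)             # label the variants malicious
        clf = fit_classifier(train_X, train_y)
    return clf
```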

SLIDE 47

[Plot: seeds evaded (out of 500) vs. generations, for Hidost16.]

Original classifier: takes 614 generations to evade all seeds.

SLIDE 48

[Same plot, adding the retrained classifier HidostR1.]

SLIDE 49

[Same plot, adding HidostR2.]

SLIDE 50

[Plot: seeds evaded (out of 500) vs. generations for Hidost16, HidostR1, and HidostR2.]

False Positive Rates:

           Genome   Contagio Benign
Hidost16   0.00     0.00
HidostR1   0.78     0.30
HidostR2   0.85     0.53

SLIDE 51

Only 8 of 6,987 Hidost features are robust; a classifier restricted to robust features has high false positives.

The eight robust features:
/Names
/Names/JavaScript
/Names/JavaScript/Names
/Names/JavaScript/JS
/OpenAction
/OpenAction/JS
/OpenAction/S
/Pages

SLIDE 52

EvadeML-Zoo: an AML Toolbox

Attacks: FGSM, BIM, JSMA, DeepFool, CW2, CW∞, CW0
Datasets: MNIST, CIFAR-10, ImageNet
Models: CNN, DenseNet, MobileNets
Defense: Feature Squeezing
Visualization: evademl.org/zoo

Weilin Xu, Andrew Norton, Noah Kim, Yanjun Qi

SLIDE 53

Open Questions

Can we close the gap between experimental techniques (that work on complex models) and formal methods (that work on small models)? Reducing the adversarial search space.
Will classifiers ever be good enough to apply β€œcrypto” standards to adversarial examples?
Is PDF malware the MNIST of malware classification?

EvadeML.org

SLIDE 54

David Evans

University of Virginia

evans@virginia.edu

EvadeML.org

source code, papers