Shrinking and Exploring Adversarial Search Spaces
David Evans
University of Virginia
ARO Workshop on Adversarial Learning Stanford, 14 Sept 2017
Weilin Xu Yanjun Qi
EvadeML.org
Machine Learning is Eating Computer Science
Security
Field | Random guessing attack success probability | Threat models | Proofs
Cryptography | [value garbled in source] | information theoretic, resource bounded | required
System Security | [value garbled in source] | capabilities, motivations, rationality | common
Adversarial Machine Learning | [value garbled in source] | white-box, black-box | rare!
"panda" + 0.007 × [noise] = "gibbon"
Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014.
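The cited paper builds such examples with the fast gradient sign method: each input value moves by a small step (0.007 in the figure) in the direction that increases the model's loss. The sketch below illustrates that single step on a toy logistic-regression model. The model, its weights `w` and `b`, and the input values are hypothetical stand-ins, not the image classifier from the paper.

```python
import math

def predict(x, w, b):
    """Toy logistic-regression model (hypothetical): probability of class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps=0.007):
    """One fast-gradient-sign step: x + eps * sign(dLoss/dx), clipped to [0, 1]."""
    p = predict(x, w, b)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]

# Made-up weights and input, for illustration only.
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.2, 0.6], 1
x_adv = fgsm(x, y, w, b)
```

Every coordinate changes by at most `eps`, yet the model's confidence in the true class drops. On a deep network the same step is applied to every pixel at once.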
Δ(x, x′) is defined in some (simple!) metric space:
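The list of metrics on this slide did not survive extraction. The attack names used later in the talk (CW0, CW2, CW∞) suggest the usual Lp norms, so the sketch below assumes those. It computes the L0, L1, L2 and L∞ distances between an input and a perturbed copy, with inputs represented as flat lists of pixel values in [0, 1].

```python
import math

def l0(x, x2):
    """Number of coordinates that differ."""
    return sum(1 for a, b in zip(x, x2) if a != b)

def l1(x, x2):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, x2))

def l2(x, x2):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x2)))

def linf(x, x2):
    """Largest single-coordinate change."""
    return max(abs(a - b) for a, b in zip(x, x2))

# Illustrative values only.
x = [0.0, 0.5, 1.0]
x_prime = [0.0, 0.25, 0.5]
distances = (l0(x, x_prime), l1(x, x_prime), l2(x, x_prime), linf(x, x_prime))
```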
Input → Model → Prediction0
Input → Squeezer1 → Model → Prediction1
Input → Squeezer2 → Model → Prediction2
…
Input → Squeezerk → Model → Predictionk
d(pred0, pred1, …, predk)
Yes: Adversarial
No: Legitimate
Bit depth reduction: x′ = round(x × 4) / 4
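A minimal sketch of this squeezer, assuming pixel values are scaled to [0, 1]. `squeeze_levels` follows the formula on the slide. `reduce_bit_depth` is an assumed generalisation to a target number of bits, not something the slide states. Rounding is half-up via `floor(v + 0.5)`, because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math

def squeeze_levels(pixels, steps=4):
    """Round each value in [0, 1] to the nearest multiple of 1/steps."""
    return [math.floor(p * steps + 0.5) / steps for p in pixels]

def reduce_bit_depth(pixels, bits):
    """Assumed generalisation: keep 2**bits evenly spaced levels in [0, 1]."""
    return squeeze_levels(pixels, steps=2 ** bits - 1)

# Illustrative values only.
squeezed = squeeze_levels([0.0, 0.1, 0.3, 0.6, 0.9])
monochrome = reduce_bit_depth([0.2, 0.7], bits=1)
```

Small perturbations that stay within one quantisation step disappear after squeezing, which is what the detector relies on.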
8-bit greyscale 1-bit monochrome
3×3 smoothing: replace each pixel with the median of the pixel and its neighbors
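A small pure-Python sketch of median smoothing on a 2D greyscale image stored as a list of rows. Replicating edge pixels at the border is an assumption here, since the slide does not say how borders are handled, and the helper only supports odd window sizes.

```python
from statistics import median

def median_smooth(img, size=3):
    """Replace each pixel by the median of its size x size window (odd size)."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Clamp coordinates so border pixels reuse their nearest neighbors.
            window = [
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            ]
            out[i][j] = median(window)
    return out

# A single outlier pixel is removed by the filter.
noisy = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
smoothed = median_smooth(noisy)
```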
Model: 7-layer CNN
Input → Model → Prediction0
Input → Bit Depth (1-bit) → Model → Prediction1
Input → Median 2×2 → Model → Prediction2
max(L1(Prediction0, Prediction1), L1(Prediction0, Prediction2)) > threshold?
Yes: Adversarial
No: Legitimate
Maximum L1 distance between original and squeezed input
threshold = 0.0029 detection: 98.2%, FP < 4%
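The detection rule can be sketched as follows: run the model on the original input and on each squeezed version, take the largest L1 distance between the prediction vectors, and flag the input when that distance exceeds the threshold. `toy_model`, `one_bit` and the threshold of 0.3 are hypothetical stand-ins chosen to keep the example runnable. They are not the CNN, squeezers or threshold from the slides.

```python
def l1_distance(p, q):
    """L1 distance between two prediction vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def squeeze_score(model, x, squeezers):
    """Largest L1 gap between the original and any squeezed prediction."""
    base = model(x)
    return max(l1_distance(base, model(s(x))) for s in squeezers)

def is_adversarial(model, x, squeezers, threshold):
    return squeeze_score(model, x, squeezers) > threshold

def toy_model(x):
    """Hypothetical two-class model: 'probabilities' from the mean pixel."""
    m = sum(x) / len(x)
    return [m, 1.0 - m]

def one_bit(x):
    """Hypothetical 1-bit squeezer for values in [0, 1]."""
    return [1.0 if v >= 0.5 else 0.0 for v in x]

stable = [0.9, 0.8, 1.0, 0.9]        # prediction barely moves when squeezed
fragile = [0.55, 0.55, 0.55, 0.45]   # prediction shifts a lot when squeezed
flags = [is_adversarial(toy_model, x, [one_bit], 0.3) for x in (stable, fragile)]
```

In practice the threshold would be chosen on held-out legitimate inputs to meet a target false positive rate.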
Model: MobileNet
Input → Model → Prediction0
Input → Bit Depth (5-bit) → Model → Prediction1
Input → Median 2×2 → Model → Prediction2
Input → Non-local Mean → Model → Prediction3
max(L1(Prediction0, {Prediction1, Prediction2, Prediction3})) > threshold?
Yes: Adversarial
No: Legitimate
Maximum L1 distance between original and squeezed input
threshold = 1.24 detection: 85%, FP < 5%
Dataset | Most Effective Squeezers Found | Threshold | Detection Rate (Successful AEs) | Detection Rate (Failed AEs) | False Positive Rate | ROC-AUC (Including FAEs) | ROC-AUC (Excluding FAEs)
MNIST | Bit Depth (1-bit), Median (2x2) | 0.0029 | 98.2% | 20.0% | 3.98% | 94.5% | 99.6%
CIFAR-10 | Bit Depth (5-bit), Median (2x2), Non-local Mean (13-3-2) | 1.1402 | 85.0% | 9.1% | 4.93% | 95.7% | 95.9%
ImageNet | Bit Depth (5-bit), Median (2x2), Non-local Mean (11-3-4) | 1.2476 | 85.2% | 25.0% | 4.70% | 94.0% | 94.5%
[Figure: per-attack results on a 0% to 100% axis for MNIST, CIFAR-10 and ImageNet. Attacks: FGSM, BIM, CW∞ (Next), CW∞ (LL), DeepFool, CW2 (Next), CW2 (LL), CW0 (Next), CW0 (LL), JSMA (Next), JSMA (LL)]
Composes with model-based defenses
WOOT (August 2017)
Incorporate L1 squeezed distance into loss function
Adversary success rate on MNIST: Untargeted 64%; Targeted (Next) 41%; Targeted (Least Likely) 21%
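One way to picture such an adaptive adversary is a loss with three terms: a classification term that pushes toward the target class, a distortion term, and the L1 gap between predictions on the candidate and on its squeezed version. The slide confirms only the last ingredient. The other two terms, the weights `c_dist` and `c_sq`, and the toy model and squeezer below are assumptions made for illustration.

```python
import math

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def adaptive_loss(model, squeeze, x_adv, x_orig, target, c_dist=1.0, c_sq=1.0):
    """Assumed loss form: lower is better for the adversary."""
    probs = model(x_adv)
    class_loss = -math.log(probs[target])                 # reach the target class
    distortion = sum((a - b) ** 2 for a, b in zip(x_adv, x_orig))
    squeezed_gap = l1_distance(probs, model(squeeze(x_adv)))  # stay undetected
    return class_loss + c_dist * distortion + c_sq * squeezed_gap

def toy_model(x):
    """Hypothetical two-class model for illustration."""
    m = sum(x) / len(x)
    return [m, 1.0 - m]

def one_bit(x):
    """Hypothetical 1-bit squeezer."""
    return [1.0 if v >= 0.5 else 0.0 for v in x]

x0 = [0.55, 0.55, 0.55, 0.45]
plain = adaptive_loss(toy_model, one_bit, x0, x0, target=0, c_sq=0.0)
aware = adaptive_loss(toy_model, one_bit, x0, x0, target=0, c_sq=1.0)
```

An optimiser minimising this loss is steered toward perturbations whose predictions survive squeezing, which is why the detector's threshold alone does not stop a fully informed attacker.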
Metric Space 1: Target Classifier
Metric Space 2: "Oracle"
Before: find a small perturbation that changes the class for the classifier but is imperceptible to the oracle.
Now: change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.
CCS 2017
Pick a random autoencoder
focus of rest of the talk
Metric Space 1: Machine; Metric Space 2: Human
Metric Space 1: Machine 1; Metric Space 2: Machine 2
Malicious PDF → Clone → Variants → Mutation (with Benign PDFs) → Variants → Select Variants → Found Evasive?
Benign Oracle
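The clone, mutate, select loop in the diagram can be sketched as a small evolutionary search. Everything concrete below is a toy assumption: variants are tuples of tokens rather than PDFs, the oracle checks that a `payload` token survives, and the target classifier scores the fraction of suspicious tokens. Population size, generation limit and the 0.5 decision threshold are likewise illustrative.

```python
import random

BENIGN_TOKENS = ["font", "page", "image"]   # hypothetical benign content
SUSPICIOUS = {"payload", "js"}              # hypothetical classifier features

def oracle(v):
    """Toy oracle: the variant is still malicious if the payload survives."""
    return "payload" in v

def score(v):
    """Toy target classifier: fraction of suspicious tokens."""
    return sum(t in SUSPICIOUS for t in v) / len(v)

def mutate(v, rng):
    """Randomly delete, insert, or replace one token."""
    v = list(v)
    op = rng.choice(["delete", "insert", "replace"])
    i = rng.randrange(len(v))
    if op == "delete" and len(v) > 1:
        del v[i]
    elif op == "insert":
        v.insert(i, rng.choice(BENIGN_TOKENS))
    else:
        v[i] = rng.choice(BENIGN_TOKENS)
    return tuple(v)

def fitness(v):
    return -score(v) if oracle(v) else float("-inf")

def is_evasive(v):
    return oracle(v) and score(v) < 0.5

def search(seed, pop_size=20, generations=200, rng_seed=0):
    rng = random.Random(rng_seed)
    population = [seed] * pop_size                            # clone
    for gen in range(generations):
        variants = [mutate(v, rng) for v in population]       # mutation
        for v in variants:
            if is_evasive(v):                                 # found evasive?
                return v, gen
        ranked = sorted(variants, key=fitness, reverse=True)  # select variants
        survivors = ranked[: pop_size // 2]
        population = [rng.choice(survivors) for _ in range(pop_size)]
    return None, generations

best, gen = search(("payload", "js", "js", "js"))
```

The search stops at the first variant that the oracle still considers malicious and the classifier no longer flags.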
Mutation: randomly transform (delete, insert, replace), using nodes from benign PDFs.
Example PDF structure: /Root, /Catalog, /Pages, /JavaScript eval("…");
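A minimal sketch of the three mutation operators on a toy structure tree. The dictionary representation, the helper names and the donor nodes (`/Font`, `/Metadata`) are hypothetical. Inserting the donor as a sibling of the chosen node is also an assumption, since the slide does not say where inserted nodes are attached.

```python
import copy
import random

def all_paths(tree, prefix=()):
    """Index paths to every node below the root."""
    paths = []
    for i, child in enumerate(tree["children"]):
        paths.append(prefix + (i,))
        paths.extend(all_paths(child, prefix + (i,)))
    return paths

def get_parent(tree, path):
    """Return the parent of the node addressed by path."""
    node = tree
    for i in path[:-1]:
        node = node["children"][i]
    return node

def mutate(tree, benign_nodes, rng):
    """Apply one random delete, insert, or replace; the input is not modified."""
    tree = copy.deepcopy(tree)
    paths = all_paths(tree)
    if not paths:
        return tree
    path = rng.choice(paths)
    parent, idx = get_parent(tree, path), path[-1]
    op = rng.choice(["delete", "insert", "replace"])
    donor = copy.deepcopy(rng.choice(benign_nodes))   # node taken from a benign file
    if op == "delete":
        del parent["children"][idx]
    elif op == "insert":
        parent["children"].insert(idx, donor)
    else:
        parent["children"][idx] = donor
    return tree

malicious = {"name": "/Root", "children": [
    {"name": "/Catalog", "children": [{"name": "/Pages", "children": []}]},
    {"name": "/JavaScript", "children": []},
]}
benign_nodes = [{"name": "/Font", "children": []},
                {"name": "/Metadata", "children": []}]
variant = mutate(malicious, benign_nodes, random.Random(0))
```

Many mutations break the malicious behaviour, which is why each variant must be checked by an oracle before it counts as evasive.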
Fitness Function
Candidate Variant → Oracle → Malicious
Candidate Variant → Target Classifier → Score
Candidate variant structure: /JavaScript eval("…"); /Root /Catalog /Pages
Oracle: https://github.com/cuckoosandbox
Simulated network: INetSim
HTTP_URL + HOST extracted from API traces
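One plausible shape for such a fitness function, sketched under stated assumptions: a variant that the oracle no longer considers malicious gets a fixed low score, and otherwise the fitness grows as the target classifier's maliciousness score falls below its decision threshold. `LOW_SCORE`, the 0.5 threshold and the lookup-table oracle and classifier are hypothetical. In the real setup the oracle would be a sandbox run and the classifier the detector under attack.

```python
LOW_SCORE = -1.0   # hypothetical penalty for variants that lost their behaviour

def fitness(variant, oracle, classifier, threshold=0.5):
    """Higher is better for the attacker."""
    if not oracle(variant):
        return LOW_SCORE
    return threshold - classifier(variant)

def is_evasive(variant, oracle, classifier, threshold=0.5):
    """Still malicious according to the oracle, yet classified as benign."""
    return oracle(variant) and classifier(variant) < threshold

# Toy stand-ins: lookups in place of a sandbox run and a trained model.
behaviour = {"v1": True, "v2": True, "v3": False}
scores = {"v1": 0.75, "v2": 0.25, "v3": 0.0}
oracle = behaviour.get
classifier = scores.get

results = {v: fitness(v, oracle, classifier) for v in behaviour}
```

Here `v2` is the only evasive variant: it keeps its malicious behaviour and scores below the threshold, while `v3` looks benign only because it no longer works.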
Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
ACM CCS 2017
Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier (supervised learning)
Operational Data → Trained Classifier → Malicious / Benign
Original classifier: Takes 614 generations to evade all seeds
Genome Contagio Benign
/Names /Names /JavaScript /Names /JavaScript /Names /Names /JavaScript /JS /OpenAction /OpenAction /JS /OpenAction /S /Pages
evademl.org/zoo
Attacks: FGSM, BIM, JSMA, DeepFool, CW2, CW∞, CW0
Datasets: MNIST, CIFAR-10, ImageNet
Models: CNN, DenseNet, MobileNets
Defense: Feature Squeezing
Visualization
Weilin Xu, Andrew Norton, Noah Kim, Yanjun Qi
source code, papers