Erkut Erdem, Aykut Erdem, Levent Karacan
Computer Vision Lab, Hacettepe University
Adversarial Training
Attacks on Deep Networks and Generative Adversarial Networks
images from Geri’s Game (Pixar, 1997)
Outline
Part 1: Attacks on Deep Networks
10 Minutes Break
Part 2: Generative Adversarial Networks
John Carpenter’s The Thing (1982)
A single unit: inputs x1 … xD are linearly weighted by w1 … wD (plus a bias b), accumulated, and passed through a non-linear activation to produce P(y = 1 | x, w, b). Neural networks are networks of such simple units.
Slide adapted from Rob Fergus
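As a minimal sketch of this unit (the values are made up, numpy assumed):

```python
import numpy as np

def unit(x, w, b):
    """One unit: linear weighting and accumulation, then a sigmoid activation."""
    s = np.dot(w, x) + b                # sum_i w_i * x_i + b
    return 1.0 / (1.0 + np.exp(-s))     # P(y = 1 | x, w, b)

x = np.array([0.5, -1.2, 3.0])          # inputs x1..xD (made up)
w = np.array([0.8, 0.1, -0.4])          # weights w1..wD (made up)
print(unit(x, w, b=0.2))                # a probability in (0, 1)
```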
Backpropagation provided an effective way to train multi-layered networks, and convolutional networks (inspired by Hubel and Wiesel’s simple/complex cells) specialized them for images:
[Figure: the LeNet convolutional network. INPUT 32x32 → convolutions → C1: feature maps 6@28x28 → subsampling → S2: f. maps 6@14x14 → convolutions → C3: f. maps 16@10x10 → subsampling → S4: f. maps 16@5x5 → full connection → C5: layer 120 → full connection → F6: layer 84 → Gaussian connections → OUTPUT 10]
Slide adapted from Rob Fergus
1. Deep neural network models (supervised training)
Slide credit: Rob Fergus
2. Graphics Processing Units (GPUs): deliver on the order of 10 Tflops, more than the fastest supercomputer in the world in 2000 and far beyond a 1980’s Sun workstation.
Slide adapted from Rob Fergus
[AlexNet by Krizhevsky et al. 2012]
− 8-layer convolutional network model [cf. LeCun et al. 1989]
− 7 hidden layers, 650,000 neurons, ~60,000,000 parameters
− Trained on 1.2 million ImageNet images (with labels)
− GPU implementation (50x speedup over CPU)
− Training time: 1 week on a pair of GPUs
Joshua Drewe
Model [parameters θ]
Joshua Drewe
Training: adjust model parameters θ so that predicted labels match the true labels across the training set.
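A minimal sketch of such a training loop, with a hypothetical linear model and random stand-in data (PyTorch assumed), just to make “adjust θ to reduce the mismatch” concrete:

```python
import torch

model = torch.nn.Linear(10, 2)            # model with parameters theta
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

X = torch.randn(100, 10)                  # training inputs (random stand-ins)
y = torch.randint(0, 2, (100,))           # true labels

for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(X), y)           # mismatch between predictions and labels
    loss.backward()                       # gradient of the loss w.r.t. theta
    opt.step()                            # adjust theta to reduce the mismatch
```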
− Excellent performance in most image understanding tasks
− Learn a sequence of general-purpose representations
− Millions of parameters learned from data
− The “meaning” of the representation is unclear
[AlexNet by Krizhevsky et al. 2012]
Slide credit: Andrea Vedaldi
− Convolution is local: filters look locally, with parameter sharing
− Translation invariant: filters act the same everywhere
− Filters operate on a lattice structure with multiple feature channels
Slide credit: Andrea Vedaldi
[Figure: worked example of convolving an image with small smoothing filters, e.g. (1/8)·[1 1 4 1 1]]
Convolutional layer: convolving a 32x32x3 input with six 5x5 filters gives six 28x28 activation maps; we stack these up to get an output of size 28x28x6.
Slide credit: Alex Karpathy
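In PyTorch notation this is a one-liner to check; the 5x5 filter size is implied by 32 → 28:

```python
import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
x = torch.randn(1, 3, 32, 32)   # one 32x32x3 input image
print(conv(x).shape)            # torch.Size([1, 6, 28, 28]): six 28x28 activation maps
```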
Pooling makes the representations smaller and more manageable, and operates over each activation map independently. Example, on a single depth slice, max pooling with 2x2 filters and stride 2:

1 1 2 4
5 6 7 8          6 8
3 2 1 0    →     3 4
1 2 3 4

Slide adapted from Alex Karpathy
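The same example in numpy (a sketch reproducing the slide’s numbers):

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# Max pool with 2x2 filters and stride 2: take the max of each 2x2 block.
out = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out)   # [[6 8]
             #  [3 4]]
```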
Neural Networks
Slide credit: Alex Karpathy
Slide credit: Yann LeCun
t-SNE visualization (van der Maaten & Hinton): embed high-dimensional points so that locally, pairwise distances are conserved, i.e. similar things end up nearby and dissimilar things end up wherever. Example: MNIST digits (0-9) embedded in 2D.
Slide credit: Alex Karpathy
[Figure: network depth over the ILSVRC years. AlexNet, 8 layers (ILSVRC 2012): 11x11 conv, 96, /4, pool/2 → 5x5 conv, 256, pool/2 → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256, pool/2 → fc 4096 → fc 4096 → fc 1000. VGG, 19 layers (ILSVRC 2014): sixteen 3x3 conv layers with periodic pooling → fc 4096 → fc 4096 → fc 1000. GoogLeNet, 22 layers (ILSVRC 2014).]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
Backbone structure: a classification network is pre-trained on ImageNet data to obtain features, and a detection network is then fine-tuned on detection data. The features and the “plug-in” detectors are developed independently.
Slide credit: Kaiming He
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.
Slide credit: Kaiming He
Sample x → Machine Learning System → f(x) = ytrue
Joshua Drewe
Adversarial example a (indistinguishable from x) → Machine Learning System → f(a) ≠ ytrue
Joshua Drewe
(Pinna and Gregory, 2002) These are concentric circles, not intertwined spirals.
Slide adapted from Ian Goodfellow
Adversarial examples are, in effect, optical illusions for machine learning systems.
Slide adapted from Ian Goodfellow
Ways to attack a machine learning system:
1. Poisoning the training set (injecting crafted samples into the training set)
2. Perturbing test inputs (adversarial examples at the inference phase)
3. Inputs that make sense to the system, but no sense at all to us
Slide credit: Ian Goodfellow
Slide credit: Alex Karpathy
input example x:   2  -1   3  -2   2   2   1  -4   5   1
weights w:        -1  -1   1  -1   1  -1   1   1  -1   1

class 1 score = dot product = -2 + 1 + 3 + 2 + 2 - 2 + 1 - 4 - 5 + 1 = -3
=> probability of class 1 is 1/(1+e^(-(-3))) = 0.0474, i.e. the classifier is ~95% certain that this is a class 0 example.
Slide credit: Alex Karpathy
Now suppose we may nudge the input: find an adversarial x (? ? ? ? ? ? ? ? ? ?) that raises the class 1 score.
Slide credit: Alex Karpathy
input example x:    2    -1    3   -2    2    2    1   -4    5    1
weights w:         -1    -1    1   -1    1   -1    1    1   -1    1
adversarial x:    1.5  -1.5  3.5 -2.5  2.5  1.5  1.5 -3.5  4.5  1.5

class 1 score before: -3
=> probability of class 1 is 1/(1+e^(-(-3))) = 0.0474
class 1 score now: -1.5 + 1.5 + 3.5 + 2.5 + 2.5 - 1.5 + 1.5 - 3.5 - 4.5 + 1.5 = 2
=> probability of class 1 is now 1/(1+e^(-2)) = 0.88, i.e. we improved the class 1 probability from 5% to 88%
Slide credit: Alex Karpathy
This was only with 10 input dimensions; an ImageNet input image has 150,528. (It’s significantly easier with more numbers: you need a smaller nudge to each.)
Slide credit: Alex Karpathy
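The whole worked example fits in a few lines of numpy; nudging each input by 0.5 in the direction of sign(w) adds 0.5 per dimension to the dot product:

```python
import numpy as np

x = np.array([2, -1, 3, -2, 2, 2, 1, -4, 5, 1], dtype=float)
w = np.array([-1, -1, 1, -1, 1, -1, 1, 1, -1, 1], dtype=float)

def p_class1(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))   # logistic probability of class 1

print(p_class1(x))                # 0.0474...: ~95% certain it is class 0

x_adv = x + 0.5 * np.sign(w)      # nudge each input toward its weight's sign
print(x_adv)                      # [ 1.5 -1.5  3.5 -2.5  2.5  1.5  1.5 -3.5  4.5  1.5]
print(p_class1(x_adv))            # 0.8807...: now ~88% certain it is class 1
```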
Recall the CIFAR-10 linear classifiers; the same attack applies to ImageNet classifiers: http://karpathy.github.io/2015/03/30/breaking-convnets/
Slide credit: Alex Karpathy
Mix in a tiny bit of the Goldfish classifier weights: image + ε · (Goldfish weights) = an image classified as 100% Goldfish.
Slide credit: Alex Karpathy
Slide credit: Alex Karpathy
(Szegedy et al., 2013)
[Figure: correct image + distortion → adversarial image, for several examples]
Minimize ‖r‖₂ subject to f(x + r) = l and x + r ∈ [0, 1]^m
f: classifier function; x: input image; r: distortion; l: target label
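Szegedy et al. solve this with box-constrained L-BFGS plus a line search over the penalty weight; the sketch below substitutes plain gradient descent on a penalty objective (model, c, steps, and lr are illustrative assumptions, not the paper’s settings):

```python
import torch
import torch.nn.functional as F

def find_distortion(model, x, target, c=0.1, steps=100, lr=0.01):
    """Sketch of the formulation above: minimize c*||r||^2 + loss(f(x+r), target),
    keeping x + r inside the box [0, 1]^m via clamping."""
    r = torch.zeros_like(x, requires_grad=True)      # the distortion r
    opt = torch.optim.SGD([r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        adv = (x + r).clamp(0, 1)                    # enforce x + r in [0, 1]^m
        loss = c * r.norm() ** 2 + F.cross_entropy(model(adv), target)
        loss.backward()
        opt.step()
    return (x + r).detach().clamp(0, 1)
```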
(Goodfellow et al., 2014)
Slide credit: Ian Goodfellow
J(X, ytrue): score of label ytrue given input image X, e.g. a cross-entropy loss
(Goodfellow et al., 2014)
Slide credit: Ian Goodfellow
The Fast Gradient Sign Method (Goodfellow et al., 2014):
X_adv = X + ε · sign(∇_X J(X, ytrue))
Each pixel changes by at most ε, while increasing the model’s prediction error.
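A sketch of the method in PyTorch (eps is an illustrative value; J is taken to be cross-entropy, as above):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps=0.007):
    """Fast Gradient Sign Method: X_adv = X + eps * sign(grad_X J(X, y_true))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)      # J(X, y_true)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```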
Slide credit: Ian Goodfellow
− Rectified linear unit
− Carefully tuned sigmoid
− Maxout
− LSTM
Slide credit: Ian Goodfellow
The fast method:
X_adv = X + ε · sign(∇_X J(X, ytrue))

Basic Iterative Method:
X_adv_0 = X
X_adv_{N+1} = Clip_{X,ε}{ X_adv_N + α · sign(∇_X J(X_adv_N, ytrue)) }

where Clip_{X,ε}{X′}(x, y, z) = min{ 255, X(x, y, z) + ε, max{ 0, X(x, y, z) − ε, X′(x, y, z) } }

Iterative Least-Likely Class Method:
X_adv_0 = X
X_adv_{N+1} = Clip_{X,ε}{ X_adv_N − α · sign(∇_X J(X_adv_N, y_LL)) }

with y_LL = argmin_y p(y | X), the class the model considers least likely.
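A sketch of the Basic Iterative Method in PyTorch, assuming pixels scaled to [0, 1] rather than [0, 255]; the least-likely class variant simply subtracts the step, using y_LL = argmin_y p(y | X):

```python
import torch
import torch.nn.functional as F

def basic_iterative(model, x, y_true, eps=16/255, alpha=1/255, steps=10):
    """Repeated small FGSM steps, clipped to an eps-ball around the clean X."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # Clip_{X,eps}
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```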
[Figure: clean image vs. adversarial images from the “Fast”, Basic Iterative, and Iterative Least-Likely Class methods; L∞ distance to the clean image = 32]
(Kurakin, Goodfellow, Bengio, 2017)
Slide credit: Ian Goodfellow
Although state-of-the-art deep neural networks can increasingly recognize natural images (left panel), they are also easily fooled into declaring with near-certainty that unrecognizable images belong to familiar object classes (right panel).
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images (Nguyen, Yosinski, Clune, 2014)
[Figure: unrecognizable images classified with >99.6% confidence]
Defenses that have been tried, with little success:
− Weight decay
− Adding noise at test time
− Adding noise at train time
− Dropout
− Ensembles
− Multiple glimpses
− Generative pretraining
− Removing perturbation with an autoencoder
− Error correcting codes
− Confidence-reducing perturbation at test time
− Various non-linear units
− Double backprop
Slide credit: Ian Goodfellow
Defenses that do help:
1. Adversarial training (Szegedy et al., 2013): generate adversarial examples and explicitly train the model not to be fooled by each of them.
2. Defensive distillation: train the model to output smooth class probabilities, rather than hard decisions about which class to output, so the adversary has smaller gradients to exploit.
Adversarial training: start from an image labeled as bird; perturb it to decrease the probability of the bird class; the perturbed image still has the same label (bird), and the model is trained to classify it as such.
Slide credit: Ian Goodfellow
Virtual adversarial training uses the model’s own prediction as the training target (an adversarial regularization). For an unlabeled example, the model guesses it’s probably a bird, maybe a plane; an adversarial perturbation is crafted to change that guess; training enforces that the new guess should match the old guess (probably bird, maybe plane).
Slide credit: Ian Goodfellow
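A simplified sketch of that consistency objective; here a single gradient step on the KL divergence stands in for the power-iteration approximation of Miyato et al., and eps and xi are illustrative values:

```python
import torch
import torch.nn.functional as F

def virtual_adversarial_loss(model, x, eps=1.0, xi=1e-6):
    """Penalize the change in prediction under a perturbation chosen to change it."""
    with torch.no_grad():
        p_old = F.softmax(model(x), dim=1)            # old guess on unlabeled x
    d = torch.randn_like(x)                           # random direction
    d = (xi * d / d.norm()).requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p_old, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    r_adv = eps * grad / grad.norm()                  # perturbation meant to change the guess
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p_old,
                    reduction="batchmean")            # new guess should match old guess
```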
[Plot: test misclassification rate (log scale) vs. training time in epochs, for Train=Clean/Test=Clean, Train=Clean/Test=Adv, Train=Adv/Test=Clean, and Train=Adv/Test=Adv]
Slide credit: Ian Goodfellow
For linear models, adversarial training is less useful; it behaves very much like weight decay. More generally, one can construct adversarial examples of any machine learning model.
2. Defensive distillation (Papernot et al., 2016) increases the robustness of the deep neural network model by re-configuring its “softmax” layer with a temperature, and by using the resulting soft class probabilities as labels to train the distilled model.

q_i = exp(z_i / T) / Σ_j exp(z_j / T)

T: a temperature that is normally set to 1; q_i: class probability; z_i: logit for class i
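A small numpy sketch of the temperature softmax (the logits z are made up):

```python
import numpy as np

def softmax_T(z, T=1.0):
    """q_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    e = np.exp(z / T - np.max(z / T))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([8.0, 2.0, 0.5])
print(softmax_T(z, T=1))    # peaked, nearly one-hot probabilities
print(softmax_T(z, T=20))   # high T: smooth, informative soft labels
```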
[Figure: defensive distillation. Initial network: training data X with training labels Y → DNN F trained at temperature T → probability vector predictions F(X). Distilled network: training data X with training labels F(X) → DNN F^d trained at temperature T → probability vector predictions F^d(X). Example soft label: (0.02, 0.92, 0.04, 0.02)]
(Papernot et al., 2016)
∂C/∂z_i = (1/T)(q_i − p_i) = (1/T) · ( e^{z_i/T} / Σ_j e^{z_j/T} − e^{v_i/T} / Σ_j e^{v_j/T} )

(the gradient of the distillation cross-entropy C with respect to logit z_i, where the soft targets p_i are produced from the first network’s logits v_i at the same temperature T)
1. Train a first instance of the neural network on the training data (X, Y), where the labels Y indicate the correct class of the samples X.
2. Relabel the data as (X, F(X)), where the new class labels are the probability vectors quantifying the likeliness of X being in each class.
3. Train a second, distilled model on the new dataset (X, F(X)).
While training the first and the distilled network, use the same high temperature T.
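A sketch of the three steps with toy stand-in networks and data in PyTorch (shapes, learning rates, and T = 20 are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

T = 20.0                                     # same high temperature for both networks
first_net = torch.nn.Linear(784, 10)         # toy stand-ins for the two DNNs
distilled_net = torch.nn.Linear(784, 10)
X = torch.randn(256, 784)                    # toy training data
Y = torch.randint(0, 10, (256,))             # hard labels

# Step 1: train the first network on (X, Y) with softmax at temperature T.
opt = torch.optim.SGD(first_net.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(first_net(X) / T, Y)
    loss.backward()
    opt.step()

# Step 2: relabel the data with the first network's soft predictions F(X).
soft_Y = F.softmax(first_net(X) / T, dim=1).detach()

# Step 3: train the distilled network on (X, F(X)) at the same temperature T.
opt = torch.optim.SGD(distilled_net.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    logp = F.log_softmax(distilled_net(X) / T, dim=1)
    loss = -(soft_Y * logp).sum(dim=1).mean()   # cross-entropy against soft labels
    loss.backward()
    opt.step()
# At test time the distilled network is run at temperature T = 1.
```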
[Plot: test-set accuracy variation after distillation vs. distillation temperature (1 to 100), for MNIST and CIFAR10; the variations stay within roughly 1.5%]
Distillation thus preserves the accuracy of the network, and reduces its sensitivity to small input variations.
[Plot: adversarial attack success rate (%) vs. softmax temperature during distillation, comparing the success rate on the distilled network against the baseline success rate with no distillation]
On the MNIST model, a 9-layer deep neural network with 99.5% test accuracy. (Papernot and McDaniel, 2016)
Binning 10,000 samples from the CIFAR10 test set according to the mean value of their adversarial gradient:
[Histogram: frequency of adversarial-gradient mean amplitudes, in bins from 0–10^−40 up to 10^−3–10^0, for no distillation and for distillation temperatures T = 1, 2, 5, 10, 20, 30, 40, 50, 100]
To sum up: adversarial examples affect current deep neural network models; even state-of-the-art machine learning algorithms can be easily fooled; and defenses such as adversarial training and defensive distillation work by changing the training procedure and the learned parameters.