MagNet: A Two-Pronged Defense against Adversarial Examples
Dongyu Meng, Hao Chen
ShanghaiTech University, China; University of California, Davis, USA
2
Neural networks in real-life applications: user authentication, autonomous vehicles
3
A classifier maps an input to an output probability distribution over labels.
[Figure: a panda image classified with the distribution Panda 0.62, Tiger 0.03, Gibbon 0.11; p(x is panda) = 0.58.]
4
Adversarial examples: inputs carefully crafted to fool the classifier.
[Figure: adding a small crafted perturbation to the panda image x yields an adversarial example classified with p(x is gibbon) = 0.99.]
[ICLR 15] Goodfellow, Shlens, and Szegedy. Explaining and Harnessing Adversarial Examples
Attacks:
Fast gradient sign method (FGSM) [Goodfellow, 2015]
Carlini's attack [Carlini, 2017]
Iterative gradient sign [Kurakin, 2016]
DeepFool [Moosavi-Dezfooli, 2015]
……

5
Defenses:
Adversarial training [Goodfellow, 2015]
Defensive distillation [Papernot, 2016]
Detecting specific attacks [Metzen, 2017]
……

6
[Table: prior defenses compared on whether they target a specific attack and whether they modify the classifier; each listed defense answers Yes to at least one.]

7
MagNet:
Does not modify the target classifier.
Does not rely on attack-specific properties.
8
Possible inputs fill a dense sample space, but the inputs we care about lie on a low-dimensional manifold.
9
Some adversarial examples are far away from the manifold; classifiers are not trained to work on these inputs.
10
Other adversarial examples are close to the manifold boundary, where the classifier generalizes poorly.
11
12
Detector: decides whether the example is far from the manifold.
13
Reformer: draws the example towards the manifold.
14
[Figure: MagNet workflow. If the detector flags the input, MagNet rejects it; otherwise the reformer and classifier produce y, and MagNet returns y.]
15
Reconstruction error: ||X - X'||2, where X' is the autoencoder's reconstruction of input X.
16
Autoencoders
Train autoencoders on normal examples only; use them as building blocks.
17
Detector (reconstruction error): feed input X through the autoencoder to obtain reconstruction X'.
||X - X'||2 < threshold?
yes: Input is normal. MagNet accepts the input.
no: Input is adversarial. MagNet rejects the input.
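The threshold test above can be sketched in a few lines (a minimal sketch, not the authors' code; `autoencoder` stands for any reconstruction function trained on normal examples, and the threshold is assumed to be tuned on held-out normal data):

```python
# Sketch of the reconstruction-error detector.
import numpy as np

def reconstruction_error(x, autoencoder):
    """L2 distance between an input and its autoencoder reconstruction."""
    x_rec = autoencoder(x)
    return np.linalg.norm(x - x_rec)

def detect(x, autoencoder, threshold):
    """Return True if the input looks adversarial (far from the manifold)."""
    return reconstruction_error(x, autoencoder) >= threshold

# Toy illustration: an "autoencoder" that projects onto the first coordinate
# axis, so the x-axis plays the role of the data manifold.
toy_ae = lambda v: np.array([v[0], 0.0])
normal = np.array([1.0, 0.05])       # near the manifold
adversarial = np.array([1.0, 2.0])   # far from the manifold
print(detect(normal, toy_ae, threshold=0.5))       # prints: False
print(detect(adversarial, toy_ae, threshold=0.5))  # prints: True
```

The detector never consults the classifier here; it only measures distance from the manifold learned by the autoencoder.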
18
Detector (probability divergence): classify both the input and its autoencoder reconstruction, giving distributions P and Q (e.g. Panda 0.62, Tiger 0.03, Gibbon 0.11).
DKL(P||Q) < threshold?
yes: Input is normal. MagNet accepts the input.
no: Input is adversarial. MagNet rejects the input.
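The divergence test can be sketched as follows (a sketch under our own naming; `classifier` is assumed to return a softmax probability distribution, and refinements such as a softmax temperature are omitted):

```python
# Sketch of the probability-divergence detector.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) between two discrete distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def divergence_detect(x, autoencoder, classifier, threshold):
    """Reject inputs whose class distribution shifts after reconstruction."""
    p = classifier(x)               # P: distribution for the raw input
    q = classifier(autoencoder(x))  # Q: distribution for the reconstruction
    return kl_divergence(p, q) >= threshold
```

The intuition: reconstruction barely changes a normal input, so P and Q nearly coincide; for an adversarial input near the decision boundary the reconstruction can flip the predicted class, making the divergence large.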
19
Reformer: pass the input through the autoencoder, then classify the reconstruction to obtain distribution Q.
MagNet returns Q as the final classification result.
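Combining the detector and reformer stages gives the whole pipeline; a minimal sketch (function names are ours, not the authors' code; `detectors` is a list of boolean tests such as the reconstruction-error and divergence checks):

```python
# Minimal MagNet pipeline sketch: detect, then reform and classify.
def magnet(x, detectors, autoencoder, classifier):
    """Return the class distribution Q, or None when MagNet rejects the input."""
    if any(detect(x) for detect in detectors):
        return None                    # detector prong: reject off-manifold inputs
    return classifier(autoencoder(x))  # reformer prong: classify the reformed input
```

Note that the target classifier is called unmodified, which is the first design goal stated earlier.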
20
[Table: threat models, by whether the attacker knows the parameters of the target classifier and of the defense; a blackbox attacker does not know the defense parameters, a whitebox attacker knows both.]
21
[Figure: classification accuracy on adversarial examples.]
22
[Figure: classification accuracy on adversarial examples.]
23
Detector and reformer complement each other.
Large distortion makes attacks more transferable; small distortion makes them less noticeable.
[Figure: accuracy vs. distortion and attack confidence, for no defense, reformer only, detector only, and complete MagNet (detector + reformer).]
24
To defeat a whitebox attacker, the defender has to either:
25
[Table: threat models (blackbox, graybox, whitebox), distinguished by whether the attacker knows the parameters of the classifier and of the defense.]
26
With MagNet, this means training diverse autoencoders.
Train n autoencoders at the same time. Minimize the reconstruction error while penalizing each autoencoder's resemblance to the average reconstructed image (autoencoder diversity).
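One way to write this objective down (a sketch: the slide names the three ingredients but not the exact norms or the weighting, so the squared-error terms and `alpha` are our assumptions):

```python
# Sketch of a diversity objective for n autoencoders.
import numpy as np

def diversity_loss(x, reconstructions, alpha=0.1):
    """reconstructions: list of outputs ae_i(x) from the n autoencoders."""
    avg = np.mean(reconstructions, axis=0)  # average reconstructed image
    recon_err = sum(np.sum((x - r) ** 2) for r in reconstructions)
    # Reconstructions close to the average are penalized, so minimizing the
    # total loss pulls each ae_i(x) towards x while pushing them apart.
    resemblance = sum(np.sum((r - avg) ** 2) for r in reconstructions)
    return recon_err - alpha * resemblance
```

A graybox attacker who fits one deployed autoencoder then faces a different one at test time, which is where the diversity pays off.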
27
Idea: penalize the resemblance of the autoencoders.
[Table: graybox classification accuracy, by which autoencoder the attack is generated on and which autoencoder defends.]
28
The effectiveness of MagNet depends on the assumptions that normal inputs lie on a low-dimensional manifold and that adversarial examples lie off this manifold or near its boundary.
We show empirically that these assumptions are likely correct.
29
We propose the MagNet framework: a detector that rejects inputs far from the manifold, and a reformer that draws inputs towards the manifold.
We demonstrated an effective defense against adversarial examples in the blackbox scenario with MagNet. Instead of the whitebox model, we advocate the graybox model, where security rests on model diversity.
30
Find out more about MagNet: