SLIDE 1

MagNet: A Two-Pronged Defense against Adversarial Examples

Dongyu Meng (ShanghaiTech University, China)
Hao Chen (University of California, Davis, USA)

SLIDE 2


Neural networks in real-life applications

  • user authentication
  • autonomous vehicles

SLIDE 3

Neural networks as classifiers


Input → Classifier → Output (a probability distribution)

e.g. Panda 0.62, Tiger 0.03, Gibbon 0.11

SLIDE 4

Adversarial examples

Examples carefully crafted to

  • look like normal examples
  • cause misclassification

x: p(x is panda) = 0.58   →   x + perturbation: p(x is gibbon) = 0.99

[ICLR 15] Goodfellow, Shlens, and Szegedy. Explaining and Harnessing Adversarial Examples

SLIDE 5

Attacks

  • Fast gradient sign method (FGSM) [Goodfellow, 2015]
  • Carlini's attack [Carlini, 2017]
  • Iterative gradient sign [Kurakin, 2016]
  • DeepFool [Moosavi-Dezfooli, 2015]
  • ……
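To make the attack model concrete, here is a minimal FGSM sketch in Python/TensorFlow; the Keras-style model interface, the step size eps, and the [0, 1] pixel range are assumptions, not from the slides:

```python
import tensorflow as tf

def fgsm(model, x, y_true, eps=0.1):
    """One-step fast gradient sign attack (Goodfellow et al., 2015):
    nudge x in the direction that increases the classification loss."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)           # step along the gradient sign
    return tf.clip_by_value(x_adv, 0.0, 1.0)  # stay in the valid pixel range
```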

SLIDE 6

Defenses

Defense                                      targets specific attack    modifies classifier
Adversarial training [Goodfellow, 2015]               Yes                       Yes
Defensive distillation [Papernot, 2016]                                         Yes
Detecting specific attacks [Metzen, 2017]             Yes
……

SLIDE 7

Desirable properties


Does not modify target classifier.

  • Can be deployed more easily as an add-on.

Does not rely on attack-specific properties.

  • Generalizes to unknown attacks.
SLIDE 8

Manifold hypothesis


Possible inputs densely fill the sample space, but the inputs we care about lie on a low-dimensional manifold.

SLIDE 9

Our hypothesis for adversarial examples


Some adversarial examples are far away from the manifold. Classifiers are not trained to work on these inputs.

SLIDE 10

Our hypothesis for adversarial examples


Other adversarial examples are close to the manifold boundary, where the classifier generalizes poorly.

SLIDE 11

Sanitize your inputs.


SLIDE 12

Our solution


Detector: decides if the example is far from the manifold.

SLIDE 13

Our solution


Reformer: draws the example towards the manifold.

SLIDE 14

Workflow

  • If the detector flags an input as adversarial, MagNet rejects the input.
  • Otherwise the reformer reconstructs the input, the classifier labels the reconstruction, and MagNet returns y.
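A minimal Python sketch of this workflow, assuming Keras-style autoencoder and classifier models and a detector threshold (the function and parameter names here are hypothetical):

```python
import numpy as np

def magnet_predict(x, autoencoder, classifier, threshold):
    """Sketch of the MagNet workflow for a single input x."""
    x_rec = autoencoder.predict(x[np.newaxis])[0]
    # Detector: reject inputs the autoencoder cannot reconstruct closely,
    # i.e. inputs that lie far from the manifold of normal examples.
    if np.linalg.norm(x - x_rec) >= threshold:
        return None  # MagNet rejects the input
    # Reformer: classify the reconstruction, which lies near the manifold.
    probs = classifier.predict(x_rec[np.newaxis])[0]
    return int(np.argmax(probs))  # MagNet returns y
```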

SLIDE 15

Autoencoder


  • Neural nets.
  • Learn to copy input to output.
  • Trained with constraints.

Reconstruction error: E(x) = ||x - x'||2, where x' = AE(x).
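A small convolutional autoencoder and the reconstruction error above, sketched in Keras; the architecture here is illustrative, not the one from the paper:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_autoencoder(shape=(28, 28, 1)):
    """Tiny convolutional autoencoder that learns to copy input to output."""
    return tf.keras.Sequential([
        layers.Input(shape=shape),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.Conv2D(shape[-1], 3, padding="same", activation="sigmoid"),
    ])

def reconstruction_error(ae, x):
    """E(x) = ||x - AE(x)||2 for a single example x."""
    x_rec = ae.predict(x[np.newaxis])[0]
    return np.linalg.norm(x - x_rec)
```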

SLIDE 16

Autoencoder


Autoencoders

  • learn to map inputs towards the manifold.
  • approximate the input-manifold distance with reconstruction error.

Train autoencoders on normal examples only, and use them as building blocks for the defense.
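A training sketch matching this setup, using MNIST as an example and the build_autoencoder sketch above; adding small Gaussian noise to the input is one possible training constraint, and the noise level, epochs, and batch size here are assumptions:

```python
import numpy as np
import tensorflow as tf

# Train on normal examples only; no adversarial examples are needed.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0

ae = build_autoencoder()  # from the sketch above
ae.compile(optimizer="adam", loss="mse")
# Corrupt the input slightly and ask the autoencoder to reproduce
# the clean image, so it learns to pull inputs back to the manifold.
noisy = np.clip(x_train + np.random.normal(0.0, 0.1, x_train.shape), 0.0, 1.0)
ae.fit(noisy.astype("float32"), x_train, epochs=10, batch_size=256)
```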

SLIDE 17

Detector: based on reconstruction error

  • Feed input x to the autoencoder to get its reconstruction x'.
  • Test: ||x - x'||2 < threshold?
  • Yes: input is normal. MagNet accepts the input.
  • No: input is adversarial. MagNet rejects the input.
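The same decision in code (a minimal sketch; in practice the threshold would be chosen on clean validation examples so that few of them are rejected):

```python
import numpy as np

def detect_by_reconstruction(ae, x, threshold):
    """Flag x as adversarial when the autoencoder cannot reproduce it
    closely, i.e. when x lies far from the manifold of normal examples."""
    x_rec = ae.predict(x[np.newaxis])[0]
    return np.linalg.norm(x - x_rec) >= threshold  # True -> MagNet rejects
```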

SLIDE 18

Detector: based on probability divergence

  • Feed input x to the autoencoder to get its reconstruction x'.
  • Run the classifier on both: P = classifier(x), Q = classifier(x') (e.g. Panda 0.62, Tiger 0.03, Gibbon 0.11).
  • Test: DKL(P||Q) < threshold?
  • Yes: input is normal. MagNet accepts the input.
  • No: input is adversarial. MagNet rejects the input.
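A minimal sketch of this second detector, using SciPy's entropy for the KL divergence the slide names (the direction of P and Q follows the bullets above):

```python
import numpy as np
from scipy.stats import entropy

def detect_by_divergence(ae, classifier, x, threshold):
    """Flag x as adversarial when the classifier's output distribution on x
    diverges from its output on the reconstruction AE(x)."""
    x_rec = ae.predict(x[np.newaxis])[0]
    p = classifier.predict(x[np.newaxis])[0]      # P: distribution on x
    q = classifier.predict(x_rec[np.newaxis])[0]  # Q: distribution on AE(x)
    return entropy(p, q) >= threshold             # DKL(P||Q); True -> reject
```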

SLIDE 19

Reformer

  • Feed input x to the autoencoder to get its reconstruction x'.
  • Run the classifier on x' to get the distribution Q.
  • MagNet returns Q as the final classification result.
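And the reformer in code, which is simply classification of the reconstruction (same hypothetical model interfaces as above):

```python
def reform_and_classify(ae, classifier, x):
    """Reformer: move x towards the manifold by reconstructing it,
    then classify the reconstruction."""
    x_rec = ae.predict(x[np.newaxis])[0]
    return classifier.predict(x_rec[np.newaxis])[0]  # Q, the final result
```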

SLIDE 20

Threat model

The attacker knows the parameters of...

                     target classifier    defense
blackbox defense          Yes                No
whitebox defense          Yes                Yes

SLIDE 21

Blackbox defense on MNIST dataset

[Chart: accuracy on adversarial examples]

SLIDE 22

Blackbox defense on CIFAR-10 dataset

[Chart: accuracy on adversarial examples]

SLIDE 23

Detector vs. reformer


Detector and reformer complement each other.

small distortion: less noticeable    large distortion: more transferable

[Chart: accuracy vs. attack confidence (distortion), comparing no defense, detector only, reformer only, and complete MagNet (detector + reformer)]

SLIDE 24

Whitebox defense is not practical


To defeat a whitebox attacker, the defender has to either

  • make it impossible for the attacker to find adversarial examples,
  • or create a perfect classification network.
SLIDE 25

Graybox model


  • Attacker knows possible defenses.
  • Exact defense is only known at run time.

Defense strategy

  • Train diverse defenses (A, B, C, D, ...).
  • Randomly pick one for each session.

The attacker knows the parameters of...

                     target classifier    defense
blackbox defense          Yes                No
graybox defense           Yes                only the candidate set
whitebox defense          Yes                Yes

SLIDE 26

Train diverse defenses


With MagNet, this means training diverse autoencoders.

Our method: train n autoencoders at the same time, minimizing

  Σi ||x - AEi(x)||2  -  α Σi ||AEi(x) - x̄||2,   where x̄ = (1/n) Σj AEj(x)

(first term: reconstruction error; second term: autoencoder diversity, measured as distance from the average reconstructed image x̄)
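A sketch of this joint loss in TensorFlow; the weight alpha and the use of squared error are assumptions:

```python
import tensorflow as tf

def diverse_autoencoder_loss(x, reconstructions, alpha=0.1):
    """Joint loss over n autoencoders' outputs on the same batch x:
    reconstruction error minus a diversity bonus, where diversity is each
    reconstruction's distance from the average reconstructed image."""
    avg = tf.reduce_mean(tf.stack(reconstructions), axis=0)  # average image
    loss = 0.0
    for x_rec in reconstructions:
        loss += tf.reduce_mean(tf.square(x_rec - x))            # fidelity
        loss -= alpha * tf.reduce_mean(tf.square(x_rec - avg))  # diversity
    return loss
```

Each training step would pass the same batch through all n autoencoders and descend this loss with respect to all of their weights, so the autoencoders stay accurate while drifting apart from one another.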

SLIDE 27

Train diverse defenses

Idea: penalize the resemblance of the autoencoders.

[Table: graybox classification accuracy, by the autoencoder the attack was generated on vs. the autoencoder used to defend]

SLIDE 28

Limitations


The effectiveness of MagNet depends on the assumptions that

  • detector and reformer functions exist.
  • we can approximate them with autoencoders.

We show empirically that these assumptions are likely correct.

SLIDE 29

Conclusion


We propose the MagNet framework:

  • The detector detects examples far from the manifold.
  • The reformer moves examples closer to the manifold.

We demonstrated that MagNet defends effectively against adversarial examples in the blackbox scenario. Instead of the whitebox model, we advocate the graybox model, where security rests on model diversity.

SLIDE 30

Thanks & Questions?


Find out more about MagNet:

  • Paper: https://arxiv.org/abs/1705.09064
  • Demo code: https://github.com/Trevillie/MagNet
  • Author homepage: mengdy.me