

SLIDE 1

Adversarial Examples and Adversarial Training

Ian Goodfellow, OpenAI Research Scientist Presentation at San Francisco AI Meetup, 2016-08-18

SLIDE 2

In this presentation

  • “Intriguing Properties of Neural Networks”, Szegedy et al., 2013
  • “Explaining and Harnessing Adversarial Examples”, Goodfellow et al., 2014
  • “Adversarial Perturbations of Deep Neural Networks”, Warde-Farley and Goodfellow, 2016

SLIDE 3

In this presentation

  • “Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples”, Papernot et al., 2016
  • “Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples”, Papernot et al., 2016
  • “Adversarial Perturbations Against Deep Neural Networks for Malware Classification”, Grosse et al., 2016 (not my own work)

SLIDE 4

In this presentation

  • “Distributional Smoothing with Virtual Adversarial Training”, Miyato et al., 2015 (not my own work)
  • “Virtual Adversarial Training for Semi-Supervised Text Classification”, Miyato et al., 2016
  • “Adversarial Examples in the Physical World”, Kurakin et al., 2016

SLIDE 5

Overview

  • What are adversarial examples?
  • Why do they happen?
  • How can they be used to compromise machine learning systems?
  • What are the defenses?
  • How to use adversarial examples to improve machine learning, even when there is no adversary

SLIDE 6

Adversarial Examples

Timeline:
  • “Adversarial Classification”, Dalvi et al., 2004: fool spam filters
  • “Evasion Attacks Against Machine Learning at Test Time”, Biggio et al., 2013: fool neural nets
  • Szegedy et al., 2013: fool ImageNet classifiers imperceptibly
  • Goodfellow et al., 2014: cheap, closed-form attack

SLIDE 7

Turning Objects into “Airplanes”

SLIDE 8

Attacking a Linear Model
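
Written out for a linear model (this is the argument from “Explaining and Harnessing Adversarial Examples”): for a score w·x and a perturbation η bounded in max norm, the worst case has a closed form:

```latex
\tilde{x} = x + \eta, \quad \|\eta\|_\infty \le \epsilon, \qquad
w^\top \tilde{x} = w^\top x + w^\top \eta \le w^\top x + \epsilon \|w\|_1
\quad \text{(attained at } \eta = \epsilon\,\mathrm{sign}(w)\text{)}
```

With n input dimensions of average weight magnitude m, the output shift εmn grows linearly with the dimension: many individually imperceptible changes add up to a large change in the score.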

SLIDE 9

Not just for neural nets

  • Linear models
  • Logistic regression
  • Softmax regression
  • SVMs
  • Decision trees
  • Nearest neighbors

SLIDE 10

Adversarial Examples from Overfitting

[Figure: toy 2-D dataset of x’s and O’s, illustrating adversarial examples caused by overfitting.]

SLIDE 11

Adversarial Examples from Excessive Linearity

[Figure: the same toy x/O dataset fit by a nearly linear boundary, illustrating adversarial examples caused by excessive linearity.]

SLIDE 12

Modern deep nets are very (piecewise) linear

[Figure: response curves of a rectified linear unit, a carefully tuned sigmoid, maxout, and the LSTM, all linear or piecewise linear.]

SLIDE 13

Nearly Linear Responses in Practice
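
A concrete way to see this is to trace the network’s logits along a 1-D cross-section of input space, which is essentially what this slide and the maps on the next slides plot. A minimal sketch, assuming a PyTorch classifier; the function name and the epsilon range are illustrative:

```python
import torch

def logits_along_direction(model, x, direction, eps_range=(-30.0, 30.0), steps=121):
    """Trace each logit along the 1-D cross-section x + eps * direction.
    Plotting these traces shows the nearly linear response of a ReLU
    network, even far outside the range of the training data."""
    epsilons = torch.linspace(*eps_range, steps)
    with torch.no_grad():
        out = torch.stack([model(x + e * direction) for e in epsilons])
    return epsilons, out  # shapes: (steps,), (steps, batch, classes)
```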

SLIDE 14

Maps of Adversarial and Random Cross-Sections

(collaboration with David Warde-Farley and Nicolas Papernot)

SLIDE 15

Maps of Adversarial Cross-Sections

SLIDE 16

Maps of Random Cross-Sections

Adversarial examples are not noise

(collaboration with David Warde-Farley and Nicolas Papernot)

SLIDE 17

Clever Hans

(“Clever Hans, Clever Algorithms,” Bob Sturm)

SLIDE 18

Small inter-class distances

[Figure: three rows of clean example + perturbation = corrupted example. All three perturbations have L2 norm 3.96. This is actually small: we typically use 7! One perturbation changes the true class, a random perturbation does not change the class, and one perturbation changes the input to a “rubbish class”.]
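
For scale (my arithmetic, assuming 28×28 MNIST inputs in [0, 1]): a max-norm perturbation that moves every pixel by ε = 0.25, the value used in “Explaining and Harnessing Adversarial Examples”, has

```latex
\|\eta\|_2 = \sqrt{784\,\epsilon^2} = 28\epsilon = 28 \times 0.25 = 7,
```

which is presumably the “7” quoted above; an L2 norm of 3.96 is therefore a noticeably smaller perturbation than we routinely use.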

SLIDE 19

The Fast Gradient Sign Method
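
The method, from “Explaining and Harnessing Adversarial Examples”, perturbs the input by the sign of the gradient of the cost:

```latex
\tilde{x} = x + \epsilon\,\mathrm{sign}\!\left(\nabla_x J(\theta, x, y)\right)
```

A minimal sketch in PyTorch (the framework choice and the [0, 1] input range are my assumptions, not part of the talk):

```python
import torch

def fgsm(model, loss_fn, x, y, eps):
    """Fast gradient sign method: one gradient evaluation, one step."""
    x = x.clone().detach().requires_grad_(True)
    loss_fn(model(x), y).backward()
    # Move every input component by +/- eps in whichever direction
    # increases the loss: the worst case under an L-infinity budget.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs scaled to [0, 1]
```

Because the attack is a single closed-form step, it is cheap enough to run inside a training loop, which is what adversarial training (later slides) relies on.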

SLIDE 20

Wrong almost everywhere

SLIDE 21

Cross-model, cross-dataset generalization

SLIDE 22

Cross-technique transferability

(Papernot 2016)

SLIDE 23

Transferability Attack

  • Target model with unknown weights, machine learning algorithm, and training set; maybe non-differentiable.
  • Train your own substitute model, mimicking the target with a known, differentiable function.
  • Craft adversarial examples against the substitute.
  • Deploy the adversarial examples against the target; the transferability property results in them succeeding.
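
A high-level sketch of this loop, reusing the fgsm function above. Here target_predict is a hypothetical stand-in for the victim’s label-only query API, and the sketch omits the Jacobian-based dataset augmentation that Papernot et al. use to pick queries:

```python
import torch

def transferability_attack(target_predict, substitute, loss_fn, x, eps, epochs=10):
    # 1. Use the target as a labeling oracle (labels only, no gradients).
    y = target_predict(x)
    # 2. Fit the known, differentiable substitute to mimic the target.
    opt = torch.optim.Adam(substitute.parameters())
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(substitute(x), y).backward()
        opt.step()
    # 3. Craft adversarial examples white-box against the substitute.
    x_adv = fgsm(substitute, loss_fn, x, y, eps)
    # 4. Deploy against the target; transferability makes many succeed.
    return x_adv, target_predict(x_adv)
```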

SLIDE 24

Adversarial Examples in the Human Brain

(Pinna and Gregory, 2002) These are concentric circles, not intertwined spirals.

SLIDE 25

Practical Attacks

  • Fool real classifiers trained by remotely hosted APIs (MetaMind, Amazon, Google)
  • Fool malware detector networks
  • Display adversarial examples in the physical world and fool machine learning systems that perceive them through a camera

SLIDE 26

Adversarial Examples in the Physical World

SLIDE 27

Failed defenses

  • Weight decay
  • Adding noise at test time
  • Adding noise at train time
  • Dropout
  • Ensembles
  • Multiple glimpses
  • Generative pretraining
  • Removing perturbation with an autoencoder
  • Error correcting codes
  • Confidence-reducing perturbation at test time
  • Various non-linear units
  • Double backprop

SLIDE 28

Training on Adversarial Examples

SLIDE 29

Adversarial Training

Labeled as bird → adversarial perturbation to decrease the probability of the bird class → still has the same label (bird)
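
A minimal sketch of the corresponding training step, reusing the fgsm function above; the 0.5 mixing weight follows “Explaining and Harnessing Adversarial Examples”, everything else is an assumption:

```python
def adversarial_training_step(model, loss_fn, opt, x, y, eps, adv_weight=0.5):
    # Craft adversarial copies of the batch; they keep the clean labels,
    # since a perturbation this small should not change the true class.
    x_adv = fgsm(model, loss_fn, x, y, eps)
    opt.zero_grad()
    loss = ((1 - adv_weight) * loss_fn(model(x), y)
            + adv_weight * loss_fn(model(x_adv), y))
    loss.backward()
    opt.step()
    return loss.item()
```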

SLIDE 30

Virtual Adversarial Training

Unlabeled; model guesses it’s probably a bird, maybe a plane → adversarial perturbation intended to change the guess → new guess should match the old guess (probably bird, maybe plane)
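
A minimal sketch of the virtual adversarial loss, assuming a PyTorch classifier over batched inputs; the single power-iteration step and the hyperparameters are simplifications of Miyato et al.’s method:

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Scale each example's perturbation to unit L2 norm.
    return d / d.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, *([1] * (d.dim() - 1)))

def vat_loss(model, x, eps, xi=1e-6):
    with torch.no_grad():
        old_guess = F.softmax(model(x), dim=1)  # "probably bird, maybe plane"
    # One power-iteration step approximates the perturbation that most
    # changes the current prediction.
    d = (xi * _l2_normalize(torch.randn_like(x))).requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d), dim=1), old_guess, reduction="batchmean")
    r_adv = eps * _l2_normalize(torch.autograd.grad(kl, d)[0])
    # Penalize any change: the new guess should match the old guess.
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), old_guess, reduction="batchmean")
```

Because no labels are used, this term can be added to the supervised loss on whatever labeled data exists, which is what makes the method semi-supervised.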

SLIDE 31

Text Classification with VAT

RCV1 misclassification rate (%), zoomed in for legibility:

  Earlier SOTA                  7.70
  SOTA                          7.20
  Our baseline                  7.40
  Adversarial                   7.12
  Virtual Adversarial           7.05
  Both                          6.97
  Both + bidirectional model    6.68

SLIDE 32

Conclusion

  • Attacking is easy
  • Defending is difficult
  • Benchmarking vulnerability is tricky
  • Adversarial training provides regularization and semi-supervised learning