SLIDE 1

Thermometer Encoding: One Hot Way to Resist Adversarial Examples

Stanford, 2017-11-16

Jacob Buckman* Aurko Roy* Colin Raffel Ian Goodfellow *joint first author

SLIDE 2

(Goodfellow 2017)

Adversarial Examples

"Probably panda" + adversarial perturbation = "Definitely gibbon"

Image from “Explaining and Harnessing Adversarial Examples”, Goodfellow et al, 2014
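
The perturbation in the cited figure is produced by the fast gradient sign method from that paper. A minimal NumPy sketch on a toy logistic-regression model follows; the weights, the input, and the helper name `fgsm` are illustrative assumptions, not the ImageNet model behind the panda image.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method for a logistic-regression model.

    x: input vector, y: true label in {0, 1}, (w, b): model parameters,
    eps: max-norm budget of the perturbation.
    """
    p = sigmoid(w @ x + b)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad_x = (p - y) * w
    # Move every coordinate by eps in the direction that increases the loss.
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)   # toy weights (assumption)
b = 0.0
x = 0.05 * np.sign(w)      # toy input the model assigns to class 1
x_adv = fgsm(x, y=1, w=w, b=b, eps=0.1)

p_clean = sigmoid(w @ x + b)      # probability of class 1 before the attack
p_adv = sigmoid(w @ x_adv + b)    # probability of class 1 after the attack
```

No coordinate moves by more than `eps`, yet in this toy setup the predicted class flips.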

SLIDE 3

Unreasonable Linear Extrapolation

Argument to softmax

Plot from “Explaining and Harnessing Adversarial Examples”, Goodfellow et al, 2014
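
One way to read this slide, following the linearity argument in the cited paper: for a linear logit w·x, a max-norm perturbation of size eps aligned with sign(w) shifts the logit by eps times the L1 norm of w, which grows with the input dimension. The sketch below checks this numerically; the dimensions, the random weights, and the helper name `logit_shift` are assumptions for illustration.

```python
import numpy as np

def logit_shift(w, eps):
    """Change in the logit w @ x when x moves by eps * sign(w)."""
    x = np.zeros_like(w)
    return w @ (x + eps * np.sign(w)) - w @ x

rng = np.random.default_rng(0)
eps = 0.01
shifts = {}
for dim in (10, 1000, 100000):
    w = rng.normal(size=dim)   # toy weights (assumption)
    shifts[dim] = logit_shift(w, eps)
    # The shift equals eps * ||w||_1, independent of the starting point x.
    assert np.isclose(shifts[dim], eps * np.abs(w).sum())
```

With the per-pixel budget held fixed, the shift in the argument to the softmax grows roughly in proportion to the number of input dimensions.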

SLIDE 4

Difficult to train extremely nonlinear hidden layers

To train: changing this weight needs to have a large, predictable effect

To defend: changing this input needs to have a small or unpredictable effect

SLIDE 5

Idea: edit only the input layer

DEFENSE

Train only this part
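
A minimal sketch of how a discretizing input layer can give a small input change a small or unpredictable effect: if pixels in [0, 1] are quantized into buckets before the network sees them, a perturbation that stays inside a bucket leaves the network input unchanged, while one that crosses a boundary changes it discontinuously. The bucket count and the helper name `quantize` are assumptions; the slides do not specify them.

```python
import numpy as np

def quantize(x, levels=16):
    """Map values in [0, 1] to integer bucket indices 0 .. levels-1."""
    idx = np.floor(np.asarray(x) * levels).astype(int)
    return np.clip(idx, 0, levels - 1)

x = np.array([0.03, 0.40, 0.72, 0.97])            # toy "pixels"
x_adv = x + np.array([0.01, -0.02, 0.01, 0.01])   # small perturbation

# True when every perturbed pixel stays in its original bucket.
same = np.array_equal(quantize(x), quantize(x_adv))

# A slightly larger step on the second pixel crosses a bucket boundary.
crossed = quantize(0.40) != quantize(0.37)
```

This only illustrates the preprocessing step; it makes no claim about how robust a network trained on such inputs is.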

SLIDE 6

SLIDE 7

Observation: PixelRNN shows one-hot codes work

Plot from “Pixel Recurrent Neural Networks”, van den Oord et al, 2016
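
For reference, a minimal sketch of the two encodings the talk's title refers to. It assumes pixels in [0, 1] quantized into `levels` buckets (a hypothetical parameter) and uses the convention that the thermometer code sets every position at or above the bucket index to 1, i.e. the cumulative sum of the one-hot code. The extracted slides do not state the convention, so treat it as an assumption.

```python
import numpy as np

def one_hot(x, levels=10):
    """One-hot code of the bucket index of each value in [0, 1]."""
    idx = np.clip(np.floor(np.asarray(x) * levels).astype(int), 0, levels - 1)
    return np.eye(levels, dtype=int)[idx]

def thermometer(x, levels=10):
    """Thermometer code: cumulative sum of the one-hot code along the last axis."""
    return np.cumsum(one_hot(x, levels), axis=-1)

pixels = np.array([0.13, 0.66])
oh = one_hot(pixels)        # shape (2, 10)
th = thermometer(pixels)    # shape (2, 10)
# oh[0] -> [0 1 0 0 0 0 0 0 0 0]
# th[0] -> [0 1 1 1 1 1 1 1 1 1]
```

Under this convention the thermometer code keeps ordering information that the one-hot code discards: two thermometer codes differ in as many positions as their bucket indices are apart, whereas any two distinct one-hot codes differ in exactly two positions.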

SLIDE 8

SLIDE 9

Fast Improvement Early in Learning

SLIDE 10

Large improvements on SVHN white box attacks

5 years ago, this would have been SOTA on clean data

SLIDE 11

Large Improvements against CIFAR-10 white box attacks

6 years ago, this would have been SOTA on clean data

SLIDE 12

Other results

  • Improvement on CIFAR-100
      • (Still very broken)
  • Improvement on MNIST
      • Please quit caring about MNIST

SLIDE 13

Caveats

  • Slight drop in accuracy on clean examples
  • Only small improvement on black-box adversarial examples

SLIDE 14

Get involved!

https://github.com/tensorflow/cleverhans

SLIDE 15

g.co/airesidency