Adversarial Examples and Adversarial Training
Ian Goodfellow, OpenAI Research Scientist. Presentation at San Francisco AI Meetup, 2016-08-18.
In this presentation:
“Intriguing Properties of Neural Networks” Szegedy et al, 2013
“Explaining and Harnessing Adversarial Examples” Goodfellow et al, 2014
“Adversarial Perturbations of Deep Neural Networks” Warde-Farley and Goodfellow, 2016
“Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples” Papernot et al, 2016
“Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples” Papernot et al, 2016
“Adversarial Perturbations Against Deep Neural Networks for Malware Classification” Grosse et al, 2016 (not my own work)
“Distributional Smoothing with Virtual Adversarial Training” Miyato et al, 2015 (not my own work)
“Adversarial Training Methods for Semi-Supervised Text Classification” Miyato et al, 2016
“Adversarial Examples in the Physical World” Kurakin et al, 2016
How can adversarial examples be used to compromise machine learning systems?
How can adversarial examples be used to improve machine learning, even when there is no adversary?
Timeline:
“Adversarial Classification” Dalvi et al, 2004: fool spam filter
“Evasion Attacks Against Machine Learning at Test Time” Biggio et al, 2013: fool neural nets
Szegedy et al, 2013: fool ImageNet classifiers imperceptibly
Goodfellow et al, 2014: cheap, closed-form attack (sketched below)
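The cheap, closed-form attack from Goodfellow et al, 2014 is the fast gradient sign method (FGSM). A minimal sketch, assuming a differentiable PyTorch classifier `model` trained with cross-entropy; the function and argument names here are illustrative, not from the slides:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Fast gradient sign method: perturb x in the direction that
    increases the loss, with max-norm (L-infinity) bound epsilon."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One closed-form step: move each input component by +/- epsilon
    # along the sign of the gradient of the loss w.r.t. the input.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.detach()
```

The attack is "cheap" because it needs a single gradient computation, with no inner optimization loop.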
[Figure: activation functions of modern deep nets: rectified linear unit, carefully tuned sigmoid, maxout, LSTM]
(collaboration with David Warde-Farley and Nicolas Papernot)
Adversarial examples are not noise
(“Clever Hans, Clever Algorithms,” Bob Sturm)
Clean example + perturbation = corrupted example. All three perturbations have L2 norm 3.96, which is actually small (we typically use 7!). The adversarial perturbation changes the true class; a random perturbation of the same norm does not change the class; a third perturbation changes the input to a “rubbish class.”
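To see this contrast directly, one can compare an adversarially chosen perturbation with a random perturbation rescaled to exactly the same L2 norm; only the former typically changes the predicted class. A hedged sketch, assuming a PyTorch classifier `model` and a batch of one example (names are illustrative):

```python
import torch
import torch.nn.functional as F

def compare_perturbations(model, x, y, l2_norm=3.96):
    """Compare an adversarially chosen perturbation with a random one,
    both rescaled to the same L2 norm (assumes a batch of one example,
    since the norm is taken over the whole tensor)."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()

    # Gradient-sign direction (adversarial) vs. Gaussian noise (random),
    # each rescaled so its L2 norm equals l2_norm.
    adv_dir = x.grad.sign()
    rand_dir = torch.randn_like(x)
    x_adv = x + l2_norm * adv_dir / adv_dir.norm()
    x_rand = x + l2_norm * rand_dir / rand_dir.norm()

    with torch.no_grad():
        preds = [model(v).argmax(dim=1) for v in (x, x_adv, x_rand)]
    return preds  # predictions on the clean, adversarial, and random inputs
```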
[Figure from Papernot et al, 2016]
Black-box attack with a substitute model (see the sketch below):
- Target model with unknown weights, machine learning algorithm, and training set; maybe non-differentiable.
- Train your own substitute model that mimics the target with a known, differentiable function.
- Craft adversarial examples against the substitute.
- Deploy the adversarial examples against the target; the transferability property results in them succeeding.
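A hedged sketch of that pipeline under these assumptions: `query_target` is a function returning the black-box target's predicted class labels (the only access we have), `substitute` is our own differentiable PyTorch model, and the crafting step reuses FGSM from above. The published attack also grows the substitute's training set with Jacobian-based dataset augmentation, omitted here; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def black_box_attack(query_target, substitute, x_seed, epsilon,
                     epochs=10, lr=1e-3):
    """Train a substitute to mimic the target using only its output
    labels, then craft adversarial examples against the substitute.
    By transferability, they often fool the target as well."""
    # 1. Label our own seed inputs by querying the black-box target.
    y_seed = query_target(x_seed)

    # 2. Train the substitute to mimic the target on those labels.
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(substitute(x_seed), y_seed).backward()
        opt.step()

    # 3. Craft adversarial examples against the (white-box) substitute.
    x = x_seed.clone().detach().requires_grad_(True)
    F.cross_entropy(substitute(x), y_seed).backward()
    x_adv = (x + epsilon * x.grad.sign()).detach()

    # 4. Deploy against the target; transferability does the rest.
    return x_adv, query_target(x_adv)
```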
These are concentric circles, not intertwined spirals (Pinna and Gregory, 2002).
Practical attacks: fool real classifiers trained by remotely hosted APIs (MetaMind, Amazon, Google); fool malware detectors; physically realized adversarial examples fool machine learning systems that perceive them through a camera (Kurakin et al, 2016).
Failed defenses:
- Weight decay
- Adding noise at test time
- Adding noise at train time
- Dropout
- Ensembles
- Multiple glimpses
- Generative pretraining
- Removing perturbation with an autoencoder
- Error correcting codes
- Confidence-reducing perturbation at test time
- Various non-linear units
- Double backprop
Adversarial training: an image labeled as bird receives an adversarial perturbation intended to decrease the probability of the bird class. The perturbed image still has the same label (bird), so the model is trained to keep assigning it that label (one training step is sketched below).
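A minimal sketch of one adversarial-training step, assuming a PyTorch classifier `model`, a labeled batch `(x, y)`, and the FGSM perturbation from earlier; the equal weighting of clean and adversarial losses is an illustrative choice, not a prescription from the slides.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon):
    """Train on the clean batch and on an FGSM-perturbed copy that
    keeps the same label (the bird is still a bird)."""
    # Craft the adversarial copy with the current model parameters.
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + epsilon * x_pert.grad.sign()).detach()

    # Mix the clean and adversarial losses (equal weights here).
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) \
         + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```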
Virtual adversarial training: the example is unlabeled; the model guesses it is probably a bird, maybe a plane. An adversarial perturbation is applied, intended to change that guess. The new guess should match the old guess (probably bird, maybe plane); a sketch of the loss follows.
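Because no label is needed, the "old guess" is simply the model's own output distribution, and the loss penalizes any small perturbation that changes it. A rough sketch under the assumption that a single gradient step approximates the worst-case perturbation direction (the published method of Miyato et al, 2015 uses power iteration) and a batch of one example; names are illustrative.

```python
import torch
import torch.nn.functional as F

def virtual_adversarial_loss(model, x, epsilon, xi=1e-6):
    """Penalize changes in the model's predictive distribution under a
    small adversarial perturbation of an unlabeled input x."""
    with torch.no_grad():
        old_guess = F.softmax(model(x), dim=1)  # e.g. probably bird, maybe plane

    # Find a direction that changes the guess, starting from random noise.
    # (In a full training loop, zero the parameter gradients afterwards.)
    d = torch.randn_like(x).requires_grad_(True)
    adv_div = F.kl_div(F.log_softmax(model(x + xi * d), dim=1), old_guess,
                       reduction='batchmean')
    adv_div.backward()
    r_adv = epsilon * d.grad / (d.grad.norm() + 1e-12)

    # The new guess on the perturbed input should match the old guess.
    new_guess = F.log_softmax(model(x + r_adv), dim=1)
    return F.kl_div(new_guess, old_guess, reduction='batchmean')
```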
RCV1 misclassification rate (%):
Earlier SOTA: 7.70
SOTA: 7.20
Our baseline: 7.40
Adversarial: 7.12
Virtual adversarial: 7.05
Both: 6.97
Both + bidirectional model: 6.68
Adversarial training provides regularization and semi-supervised learning.