Adversarial Examples and Adversarial Training
Ian Goodfellow, OpenAI Research Scientist. Presentation at HORSE 2016, London, 2016-09-19.
In this presentation:
- “Intriguing Properties of Neural Networks”, Szegedy et al 2013
- “Explaining and Harnessing Adversarial Examples”, Goodfellow et al 2014
- “Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples”, Papernot et al 2016
- “Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples”, Papernot et al 2016
- “Adversarial Perturbations Against Deep Neural Networks for Malware Classification”, Grosse et al 2016 (not my own work)
- “Distributional Smoothing with Virtual Adversarial Training”, Miyato et al 2015 (not my own work)
- “Adversarial Training Methods for Semi-Supervised Text Classification”, Miyato et al 2016
Overview: how adversarial examples can be used to compromise machine learning systems, what the defenses are, and the cleverhans library.
Timeline:
- “Adversarial Classification”, Dalvi et al 2004: fool a spam filter
- “Evasion Attacks Against Machine Learning at Test Time”, Biggio et al 2013: fool neural nets
- Szegedy et al 2013: fool ImageNet classifiers imperceptibly
- Goodfellow et al 2014: a cheap, closed-form attack (sketched below)
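The 2014 cheap, closed-form attack is the fast gradient sign method: x_adv = x + eps * sign(grad_x J(theta, x, y)). A minimal sketch in modern TensorFlow (the talk predates TF 2.x; `model` is an assumed Keras classifier that outputs logits, and `y` holds integer labels):

```python
import tensorflow as tf

def fgsm(model, x, y, eps=0.25):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a constant tensor, so it must be watched explicitly
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    # One cheap, closed-form step: move every input dimension by eps in the
    # direction that increases the loss (an L-infinity perturbation).
    x_adv = x + eps * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)  # keep pixels in [0, 1]
```

For MNIST-scale pixels in [0, 1], eps = 0.25 is the value used in Goodfellow et al 2014.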
[Figure: modern deep nets are very piecewise linear. Cross-sections of model output for four architectures: rectified linear unit, carefully tuned sigmoid, maxout, LSTM.]
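Cross-section figures like this are made by sweeping one input along a fixed direction in input space and recording the logits; for mostly piecewise-linear nets, each logit traces out a nearly straight line. A sketch, assuming the same `model` as above (the helper name and the ±30 sweep range are illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

def logit_cross_section(model, x, direction,
                        epsilons=np.linspace(-30.0, 30.0, 121)):
    """Logits of one example x swept along a unit direction in input space.
    Plotting each logit against epsilon reproduces cross-section figures:
    nearly straight traces for piecewise-linear nets."""
    d = direction / (np.linalg.norm(direction) + 1e-12)  # unit direction
    batch = np.stack([x + eps * d for eps in epsilons]).astype(np.float32)
    return model(tf.convert_to_tensor(batch)).numpy()  # (len(epsilons), n_classes)
```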
Adversarial examples are not noise (collaboration with David Warde-Farley and Nicolas Papernot)
[Image: Clever Hans (“Clever Hans, Clever Algorithms,” Bob Sturm)]
[Figure: clean example, perturbation, corrupted example. All three perturbations have L2 norm 3.96, which is actually small (we typically use 7!). The adversarial perturbation changes the true class; a random perturbation does not change the class; another perturbation changes the input to a “rubbish class”.]
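The adversarial-versus-random comparison is easy to reproduce: rescale Gaussian noise so its L2 norm matches the adversarial perturbation, then check whether the prediction changes. A minimal sketch (this helper is hypothetical, not from the talk):

```python
import tensorflow as tf

def matched_random_perturbation(delta_adv):
    """Gaussian noise rescaled so its L2 norm equals that of the
    adversarial perturbation delta_adv, for a fair comparison."""
    noise = tf.random.normal(tf.shape(delta_adv))
    return noise * tf.norm(delta_adv) / (tf.norm(noise) + 1e-12)
```

With the norms matched (such as the 3.96 above), the clean input plus random noise is almost always classified the same as the clean input, while the adversarial perturbation flips the class.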
[Figure: These are concentric circles, not intertwined spirals (Pinna and Gregory, 2002).]
Failed defenses:
- Weight decay
- Adding noise at test time
- Adding noise at train time
- Dropout
- Ensembles
- Multiple glimpses
- Generative pretraining
- Removing the perturbation with an autoencoder
- Error correcting codes
- Confidence-reducing perturbation at test time
- Various non-linear units
- Double backprop
Virtual adversarial training: start from an unlabeled example; the model guesses it is probably a bird, maybe a plane. Apply an adversarial perturbation intended to change that guess, then train so that the new guess matches the old guess (probably bird, maybe plane).
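A sketch of that objective, loosely after Miyato et al 2015 (their method estimates the worst-case direction with power iteration; this simplified version takes a single gradient step, assumes flat input vectors of shape (batch, features), and reuses the assumed logits-producing `model`):

```python
import tensorflow as tf

def vat_loss(model, x, eps=2.0, xi=1e-6):
    """Virtual adversarial training loss: no labels required. Penalize the
    model when a small worst-case perturbation changes its own prediction."""
    kl = tf.keras.losses.KLDivergence()
    p = tf.stop_gradient(tf.nn.softmax(model(x)))  # "old guess", held fixed
    # Estimate the input direction the prediction is most sensitive to.
    d = tf.random.normal(tf.shape(x))
    d = d / (tf.norm(d, axis=-1, keepdims=True) + 1e-12)
    with tf.GradientTape() as tape:
        tape.watch(d)
        divergence = kl(p, tf.nn.softmax(model(x + xi * d)))
    g = tape.gradient(divergence, d)
    r_vadv = eps * g / (tf.norm(g, axis=-1, keepdims=True) + 1e-12)
    # "New guess should match old guess": KL between the two predictions.
    return kl(p, tf.nn.softmax(model(x + tf.stop_gradient(r_vadv))))
```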
cleverhans:
- Open-source library available at: https://github.com/openai/cleverhans
- Built on top of TensorFlow (Theano support anticipated)
- Benchmark your model against different adversarial example attacks (see the usage sketch below)
- Beta version 0.1 released; more attacks and features to be added
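A usage sketch, heavily hedged: the import path and the `fgsm` signature follow the early graph-based cleverhans API and are assumptions here; later versions changed the API substantially.

```python
import tensorflow as tf
from cleverhans.attacks import fgsm  # assumed early-API import path

# TF 1.x-style graph, matching the 2016 library. `model` is assumed to map
# a placeholder to softmax predictions.
x = tf.placeholder(tf.float32, shape=(None, 784))
predictions = model(x)

# Symbolic tensor of adversarial examples; evaluate it in a session and
# measure accuracy on the result to benchmark the model.
adv_x = fgsm(x, predictions, eps=0.3, clip_min=0.0, clip_max=1.0)
```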