SLIDE 1

Adversarial Examples are Not Easily Detected: Bypassing Ten Detection Methods

Nicholas Carlini, David Wagner University of California, Berkeley

SLIDE 2

SLIDE 3

Background

SLIDE 4

Neural Networks

  • I assume knowledge of neural networks ...
  • This talk: neural networks for classification
  • Specifically image-based classification
SLIDE 5

Background: Adversarial Examples

  • Given an input X classified as label T ...
  • ... it is easy to find an X′ close to X
  • ... so that F(X′) != T
SLIDE 6

Constructing Adversarial Examples

  • Formulation: given input x, find x′ where


minimize d(x,x′) + L(x′)
 such that x′ is "valid"


  • Where L(x′) is a loss function minimized when F(x′) != T and maximized when F(x′) = T

  • Solve via gradient descent (sketched below)
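
A minimal sketch of that optimization in PyTorch (my reconstruction, not the talk's code). It assumes a differentiable classifier model, an input image x scaled to [0, 1] with integer label t, and a trade-off constant c; the margin-style loss below is one common choice with the property described above.

    import torch

    def attack(model, x, t, c=1.0, steps=1000, lr=0.01):
        # Find x' near x with F(x') != t by minimizing d(x, x') + c * L(x')
        x_adv = x.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x_adv], lr=lr)
        for _ in range(steps):
            logits = model(x_adv.unsqueeze(0)).squeeze(0)
            # L(x'): positive while F(x') = t, zero once the top class changes
            others = torch.cat([logits[:t], logits[t + 1:]])
            loss = torch.clamp(logits[t] - others.max(), min=0)
            dist = torch.norm(x_adv - x)          # d(x, x'): L2 distance
            opt.zero_grad()
            (dist + c * loss).backward()
            opt.step()
            x_adv.data.clamp_(0, 1)               # keep x' a "valid" image
        return x_adv.detach()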
SLIDE 7

Normal vs. Adversarial: MNIST

[Figure: normal and adversarial MNIST digits; labels shown: 7, 9, 8, 8]

SLIDE 8

Normal vs. Adversarial: CIFAR-10

[Figure: a normal Truck and an adversarial version labeled Airplane]

SLIDE 9

This is decidedly bad

SLIDE 10

But also: ripe opportunity for research!

SLIDE 11

  • Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification. Xiaoyu Cao, Neil Zhenqiang Gong
  • APE-GAN: Adversarial Perturbation Elimination with GAN. Shiwei Shen, Guoqing Jin, Ke Gao, Yongdong Zhang
  • A Learning Approach to Secure Learning. Linh Nguyen, Arunesh Sinha
  • EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples. Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh
  • Ensemble Methods as a Defense to Adversarial Perturbations Against Deep Neural Networks. Thilo Strauss, Markus Hanselmann, Andrej Junginger, Holger Ulmer
  • MagNet: a Two-Pronged Defense against Adversarial Examples. Dongyu Meng, Hao Chen
  • CuRTAIL: ChaRacterizing and Thwarting AdversarIal deep Learning. Bita Darvish Rouhani, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar
  • Efficient Defenses Against Adversarial Attacks. Valentina Zantedeschi, Maria-Irina Nicolae, Ambrish Rawat
  • Learning Adversary-Resistant Deep Neural Networks. Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, Xue Liu, C. Lee Giles
  • SafetyNet: Detecting and Rejecting Adversarial Examples Robustly. Jiajun Lu, Theerasit Issaranon, David Forsyth
  • Enhancing Robustness of Machine Learning Systems via Data Transformations. Arjun Nitin Bhagoji, Daniel Cullina, Chawin Sitawarin, Prateek Mittal
  • Towards Deep Learning Models Resistant to Adversarial Attacks. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
  • Towards Robust Deep Neural Networks with BANG. Andras Rozsa, Manuel Gunther, Terrance E. Boult
  • Deep Variational Information Bottleneck. Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
  • NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles. Jiajun Lu, Hussein Sibai, Evan Fabry, David Forsyth

SLIDE 12

Research Question: Which of these defenses are robust?

SLIDE 13

SLIDE 14

Focus of this talk: detection schemes

SLIDE 15

Normal Classifier

[Diagram: a clean image of a 7 goes through the classifier and is labeled 7]

SLIDE 16

Normal Classifier

[Diagram: an adversarial image goes through the classifier and is mislabeled 8]

SLIDE 17

Detector & Classifier

[Diagram: a clean input passes the detector and the classifier labels it 7]

SLIDE 18

Detector & Classifier

[Diagram: an adversarial input is flagged by the detector and rejected before classification]
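
In code, this pipeline is just a guard in front of the classifier. A minimal sketch, assuming detector returns a score that the input is adversarial and classifier returns a label (the names and threshold are illustrative):

    def guarded_classify(x, detector, classifier, threshold=0.5):
        # The detector screens the input; only unflagged inputs are classified
        if detector(x) > threshold:   # score above threshold => "adversarial"
            return None               # reject the input
        return classifier(x)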

SLIDE 19

This Talk:

  • 1. How to evaluate a defense
  • 2. Comment on explored directions
SLIDE 20

SLIDE 21

Defense #1: PCA-based detection

Dan Hendrycks and Kevin Gimpel. 2017. Early Methods for Detecting Adversarial Images. In International Conference on Learning Representations (Workshop Track).
SLIDE 22

PCA-based detection

  • Hypothesis: Adversarial examples rely on later principal components
  • ... and valid images don't ...
  • ... so let's detect use of the later components (sketched below)
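
A minimal sketch of such a detector (a reconstruction under assumptions, not the authors' code): fit PCA on flattened training images, then flag inputs that put an unusually large fraction of their energy into the trailing components. The names train_x, k, and threshold are illustrative.

    import numpy as np

    def fit_pca(train_x):
        # Principal components via SVD of the centered training data
        mean = train_x.mean(axis=0)
        _, _, vt = np.linalg.svd(train_x - mean, full_matrices=False)
        return mean, vt               # vt: components as rows, leading first

    def tail_energy(x, mean, vt, k=100):
        # Fraction of the input's energy in the last k components
        coeffs = vt @ (x - mean)
        return np.sum(coeffs[-k:] ** 2) / np.sum(coeffs ** 2)

    def looks_adversarial(x, mean, vt, threshold, k=100):
        return tail_energy(x, mean, vt, k) > threshold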
SLIDE 23

[Figure: Normal | Adversarial]

SLIDE 24

It works!

SLIDE 25

SLIDE 26

SLIDE 27

Attack:

Only modify regions of the image that are also used in normal images.
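
One way to implement that constraint (an assumed detail, consistent with the slide): mask the perturbation so it only touches pixels that normal images actually use, then rerun the same gradient-descent attack. The mask is assumed given.

    import torch

    def masked_attack(model, x, t, mask, c=1.0, steps=1000, lr=0.01):
        # mask: 1 where normal images have content, 0 elsewhere
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            x_adv = (x + delta * mask).clamp(0, 1)  # perturb masked pixels only
            logits = model(x_adv.unsqueeze(0)).squeeze(0)
            others = torch.cat([logits[:t], logits[t + 1:]])
            loss = torch.clamp(logits[t] - others.max(), min=0)
            opt.zero_grad()
            (torch.norm(delta * mask) + c * loss).backward()
            opt.step()
        return (x + delta.detach() * mask).clamp(0, 1)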

SLIDE 28

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 29

Lesson 1: Separate the artifacts of one attack from the intrinsic properties of adversarial examples

SLIDE 30

Lesson 2: MNIST is insufficient; CIFAR is better

SLIDE 31

SLIDE 32

Defense #2: Additional Neural Network Detection

Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On Detecting Adversarial Perturbations. In International Conference on Learning Representations.

SLIDE 33

Normal Training

[Diagram: the classifier F is trained on labeled images, e.g. (image, 7), (image, 3)]

SLIDE 34

Adversarial Training

[Diagram: attack F on the labeled training images to generate an adversarial version of each]

SLIDE 35

Adversarial Training

[Diagram: the detector G is trained to separate clean inputs from adversarial ones (y/n labels)]

SLIDE 36

Sounds great.

SLIDE 37

Sounds great.

But we already know it's easy to fool neural networks ...

SLIDE 38

... so just construct adversarial examples to


  • 1. be misclassified
  • 2. not be detected
SLIDE 39

Breaking Adversarial Training

  • minimize d(x,x′) + L(x′)
    such that x′ is "valid"

  • Old: L(x′) measures loss of classifier on x′
SLIDE 40

Breaking Adversarial Training

  • minimize d(x,x′) + L(x′) + M(x′)
    such that x′ is "valid"

  • Old: L(x′) measures loss of classifier on x′
  • New: M(x′) measures loss of detector on x′ (see the sketch below)
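
A sketch of the combined objective (an assumed form, reusing the margin loss from the earlier attack sketch; model, detector, c1, and c2 are illustrative, and the detector is assumed to output a single logit where positive means "adversarial"):

    import torch
    import torch.nn.functional as nnf

    def combined_loss(model, detector, x_adv, t, c1=1.0, c2=1.0):
        logits = model(x_adv.unsqueeze(0)).squeeze(0)
        others = torch.cat([logits[:t], logits[t + 1:]])
        L = torch.clamp(logits[t] - others.max(), min=0)  # fool the classifier
        # M(x'): smooth penalty that vanishes when the detector is fooled
        M = nnf.softplus(detector(x_adv.unsqueeze(0))).squeeze()
        return c1 * L + c2 * M

Descending on d(x, x') + combined_loss(...) then produces examples that are both misclassified and undetected.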

SLIDE 41

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 42

Lesson 3: Minimize over (compute gradients through) the full defense

SLIDE 43

SLIDE 44

Defense #3: Network Randomization

Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. 2017. Detecting Adversarial Samples from Artifacts.

SLIDE 45

Randomized Classifier

[Diagram: a clean 7 is labeled 7 on every randomized run of the classifier]

SLIDE 46

Randomized Classifier

[Diagram: an adversarial input receives inconsistent labels (3, 2, 6, 7) across randomized runs]
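
The detection idea these diagrams suggest can be sketched as follows (a simplification under assumptions: the network is made stochastic, e.g. by leaving dropout on at test time; n_runs and min_agreement are illustrative):

    import torch

    def looks_adversarial(model, x, n_runs=20, min_agreement=0.9):
        model.train()                  # keep the randomness (dropout) active
        with torch.no_grad():
            preds = [model(x.unsqueeze(0)).argmax().item()
                     for _ in range(n_runs)]
        top = max(preds.count(p) for p in set(preds))
        return top / n_runs < min_agreement   # inconsistent labels => flag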

SLIDE 47

Breaking Randomization

  • minimize d(x,x′) + L(x′)
    such that x′ is "valid"

  • Old: L(x′) measures loss of network on x′
SLIDE 48

Breaking Randomization

  • minimize d(x,x′) + E[L(x′)]
    such that x′ is "valid"

  • Old: L(x′) measures loss of network on x′
  • New: E[L(x′)] is the expected loss of the network on x′ (sketched below)
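
A sketch of estimating that expectation (assumed approach: a Monte Carlo average over stochastic forward passes; n_samples is illustrative):

    import torch

    def expected_loss(model, x_adv, t, n_samples=10):
        model.train()                  # keep the network's randomness active
        total = 0.0
        for _ in range(n_samples):
            logits = model(x_adv.unsqueeze(0)).squeeze(0)
            others = torch.cat([logits[:t], logits[t + 1:]])
            total = total + torch.clamp(logits[t] - others.max(), min=0)
        return total / n_samples       # Monte Carlo estimate of E[L(x')]

Gradients flow through every sampled pass, so gradient descent on this average fools the classifier in expectation over its randomness.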

SLIDE 49

SLIDE 50

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 51

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

SLIDE 56
Evaluation Lessons

  • 1. Don't evaluate only on MNIST
  • 2. Minimize over the full defense
  • 3. Use a strong iterative attack
  • 4. Release your source code!

https://nicholas.carlini.com/nn_breaking_detection

SLIDE 57