By Learning the Distributions of Adversarial Examples
Boqing Gong
Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang
Published in ICML 2019
Intriguing properties of deep neural networks (DNNs)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.
Projected gradient descent (PGD) attack
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
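For concreteness, here is a minimal PyTorch sketch of the l_inf PGD attack the slide refers to; `model`, `x` (images in [0, 1]), and `y` (labels) are assumed to be given, and the budget and step size are illustrative defaults rather than the exact values from the talk.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD: random start inside the eps-ball, then repeat
    signed-gradient ascent on the loss followed by projection."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                               # keep a valid image
    return x_adv.detach()
```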
Intriguing results (1)
~100% attack success rates on CIFAR10 & ImageNet
Intriguing results (2)
Adversarial examples generalize between different DNNs
E.g., adversarial examples crafted for AlexNet also fool InceptionV3
Intriguing results (3)
A universal adversarial perturbation
Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
In a nutshell, white-box adversarial attacks can
Fool different DNNs for almost all test examples
Most data points lie near the classification boundaries.
Fool different DNNs by the same adversarial examples
The classification boundaries of various DNNs are close.
Fool different DNNs by a single universal perturbation
We can turn most examples into adversarial ones by moving them along the same direction by the same amount.
However, white-box adversarial attacks can
Not apply to most real-world scenarios
Not work when the network architecture is unknown
Not work when the weights are unknown
Not work when querying the network is prohibitive (e.g., too costly)
Black-box attacks
[Example query output for a panda image: Panda 0.88493, Indri 0.00878, Red Panda 0.00317]
Substitute attack (Papernot et al., 2017)
Decision-based (Brendel et al., 2017)
Boundary-tracing (Cheng et al., 2019)
Zeroth-order (Chen et al., 2017)
Natural evolution strategies (Ilyas et al., 2018)
Common difficulties: bad local optima, non-smooth optimization, curse of dimensionality, defense-specific gradient estimation, etc.
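To illustrate why the query cost grows with dimensionality, here is a hedged sketch of a ZOO-style zeroth-order gradient estimate; `loss_fn` is a hypothetical query-only interface returning a scalar adversarial loss, and only a random subset of coordinates is estimated per call.

```python
import numpy as np

def fd_gradient(loss_fn, x, h=1e-3, n_coords=128):
    """Symmetric finite-difference gradient estimate on a random coordinate subset.
    Each estimated coordinate costs two queries, so full-image estimates on
    ImageNet-sized inputs quickly become prohibitive."""
    grad = np.zeros_like(x)
    coords = np.random.choice(x.size, size=n_coords, replace=False)
    for i in coords:
        e = np.zeros_like(x)
        e.flat[i] = h
        grad.flat[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * h)
    return grad
```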
Existing attacks search for the adversarial perturbation (for an input)
Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.
Learns the distribution of adversarial examples (for any input)
Our work
Learns the distribution of adversarial examples (for an input)
Reduces the “attack dimension”
Fewer queries into the network.
Smoothes the optimization
Higher attack success rates.
Characterizes the risk of the input example
New defense methods.
A sample drawn from the distribution fools the DNN with high probability.
Which family of distributions?
Natural evolution strategies (NES)
Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.
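A hedged sketch of the NES-style update at the heart of this approach: a Gaussian search distribution over perturbations is maintained for one input, and its mean is updated from query results alone. The paper's exact parameterization (e.g., mapping a lower-dimensional variable up to image space and squashing it into the valid pixel range) is simplified away here; `loss_fn` is a hypothetical black-box adversarial loss evaluated by forward queries only.

```python
import numpy as np

def nes_step(loss_fn, mu, sigma=0.1, n_samples=50, lr=0.02):
    """One NES update of the mean of an isotropic Gaussian over perturbations."""
    eps = np.random.randn(n_samples, mu.size)                 # sampled search directions
    losses = np.array([loss_fn(mu + sigma * e) for e in eps]) # black-box queries only
    z = (losses - losses.mean()) / (losses.std() + 1e-8)      # normalize fitness values
    grad = (z[:, None] * eps).mean(axis=0) / sigma            # NES estimate of d E[loss] / d mu
    return mu + lr * grad                                     # ascend the expected adversarial loss
```

Once the expected loss is high, sampling from the learned distribution yields adversarial candidates, which is why a single draw fools the DNN with high probability.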
Black-box: NES estimates the gradient from queries alone, without access to the network's internals.
Experiment setup
Attack 13 defended DNNs & 2 vanilla DNNs
Consider both ℓ∞- and ℓ2-bounded perturbations
Examine all test examples of CIFAR10 & 1,000 of ImageNet
Excluding those misclassified by the targeted DNN
Evaluate by attack success rates
Attack success rates, ImageNet
Attack success rates, CIFAR10
Attack success rate vs. optimization steps
Transferabilities of the adversarial examples
A universally effective defense technique?
Adversarial training / defensive learning:
min_θ E_(x,y) [ max_{‖δ‖ ≤ ε} L(f_θ(x + δ), y) ]
The inner maximization is approximated by the PGD attack; the outer minimization updates the DNN weights θ.
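A minimal sketch of this min-max training loop, assuming a `model`, a data `loader`, and the `pgd_attack` helper sketched earlier; the hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for x, y in loader:
    # Inner maximization: craft adversarial examples with PGD.
    x_adv = pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7)
    # Outer minimization: update the DNN weights on those examples.
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```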
In a nutshell, our approach
Is a powerful black-box attack, matching or exceeding white-box attacks
Is universal: it breaks various defenses with the same algorithm
Characterizes the distributions of adversarial examples
Reduces the “attack dimension”
Speeds up defensive learning (ongoing work)
Physical adversarial attack
Boqing Gong
Joint work with Yang Zhang, Hassan Foroosh, & Philip David
Published in ICLR 2019
Recall the following result
A universal adversarial perturbation
Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
Physical attack: universal perturbation → 2D mask
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning models. CVPR.
Physical attack: 2D mask → 3D camouflage
Goal: gradient descent w.r.t. the camouflage c to minimize the vehicle's detection scores under all feasible locations. Problem: the pipeline is non-differentiable.
Repeat until done:
1. Camouflage a vehicle
2. Drive it around and take many pictures of it
3. Detect it with Faster R-CNN & save the detection scores
→ Dataset: {(camouflage, vehicle, background, detection score)}
Physical attack: 2D mask → 3D camouflage
Fit a DNN to predict any camouflage’s corresponding detection scores
Gradient descent w.r.t. the camouflage c to minimize the vehicle's detection scores under all feasible locations. The pipeline is non-differentiable, but it is now approximated by the fitted DNN, which supplies the gradients.
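A hedged sketch of this two-step idea: first fit a differentiable surrogate that maps a camouflage to the detector's score, then run gradient descent on the camouflage through that surrogate. The camouflage dimensionality, surrogate architecture, and the `dataset` of (camouflage, detection score) pairs collected by the loop above are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

CAMO_DIM = 16  # hypothetical dimensionality of the camouflage parameters

# 1) Fit a surrogate DNN to predict the Faster R-CNN detection score of a camouflage.
surrogate = nn.Sequential(nn.Linear(CAMO_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
opt_s = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for camo, score in dataset:                    # hypothetical tensors from the simulation loop
    loss = (surrogate(camo) - score).pow(2).mean()
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()

# 2) The surrogate is differentiable, so descend on the camouflage itself
#    to drive the predicted detection score down.
camo = torch.rand(CAMO_DIM, requires_grad=True)
opt_c = torch.optim.Adam([camo], lr=1e-2)
for _ in range(200):
    pred = surrogate(camo).mean()              # predicted detection score
    opt_c.zero_grad()
    pred.backward()
    opt_c.step()
```

In practice, data collection and surrogate fitting can be alternated so the surrogate stays accurate around the current camouflage.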
Why do we care?
Observation, re-observation, & future work
Defended DNNs are still vulnerable to transfer attacks (only to some moderate degree, though)
Adversarial examples from black-box attacks are less transferable than those from white-box attacks
All future work on defenses will adopt adversarial training
Adversarial training will become faster (we are working on it)
We should certify DNNs' expected robustness (e.g., via the statistical and randomized-smoothing approaches below)
New works to watch
Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.
Explaining adversarial examples: Ilyas et al. (2019). Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.
Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843.
Certifying DNNs' expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR; Cohen et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. arXiv:1902.02918.