

  1. Friendly Adversarial Training: Attacks Which Do Not Kill Training Make Adversarial Learning Stronger. Jingfeng Zhang 1*, Xilie Xu 2*, Bo Han 3,4, Gang Niu 4, Lizhen Cui 5, Masashi Sugiyama 4,6, and Mohan Kankanhalli 1. 1 Department of Computer Science, National University of Singapore; 2 Taishan College, Shandong University; 3 Department of Computer Science, Hong Kong Baptist University; 4 RIKEN Center for Advanced Intelligence Project; 5 School of Software & C-FAIR, Shandong University; 6 Graduate School of Frontier Sciences, The University of Tokyo. Virtual ICML 2020, July 2020.

  2. https://blog.openai.com/adversarial-example-research/ Purpose of adversarial learning • Adversarial data can easily fool a standard-trained classifier. • Adversarial training is so far the most effective method for obtaining adversarial robustness in the trained classifier. Figure: minimizing the adversarial risk $\mathcal{R}_{\mathrm{adv}}(f)$ pushes the decision boundary away from the training data. Purpose 1: correctly classify the data. Purpose 2: make the decision boundary thick so that no data is encouraged to fall inside the decision boundary.

  3. Conventional formulation of adversarial training • Minimax formulation: $\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(f(\tilde{x}_i), y_i)$ (outer minimization), where $\tilde{x}_i = \arg\max_{\tilde{x} \in B_\epsilon[x_i]} \ell(f(\tilde{x}), y_i)$ (inner maximization). • Projected gradient descent (PGD) based adversarial training approximately realizes this minimax formulation. • PGD formulates finding the most adversarial data as a constrained optimization problem. Namely, given a starting point $x^{(0)} \in \mathcal{X}$ and step size $\alpha$, PGD works as follows: $x^{(t+1)} = \Pi_{B_\epsilon[x^{(0)}]}\big(x^{(t)} + \alpha\, \mathrm{sign}(\nabla_{x^{(t)}} \ell(f(x^{(t)}), y))\big)$, $t \in \mathbb{N}$.
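To make the PGD update concrete, here is a minimal PyTorch sketch (a hypothetical helper, not the authors' released code; the names `pgd_attack`, `eps`, `alpha`, and `num_steps` are assumptions, and the default values are common CIFAR-10 choices rather than values from the slides):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, num_steps=10):
    """Standard PGD: repeatedly step in the signed-gradient direction of the
    loss, projecting back into the L-inf ball B_eps[x] after every step."""
    x_adv = x.clone().detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step, then projection onto B_eps[x] and the valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```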

  4. The minimax formulation is pessimistic. • Many existing studies found that minimax-based adversarial training causes severe degradation of natural generalization. Why? Figure: the adversarial data generated by PGD exhibit the cross-over mixture problem, i.e., adversarial data of different classes mix together across the decision boundary. Is the minimax formulation suitable for adversarial training?

  5. Min-min formulation for adversarial training • The outer minimization stays the same. Instead of generating adversarial data $\tilde{x}_i$ via inner maximization, we generate $\tilde{x}_i$ as follows: $\tilde{x}_i = \arg\min_{\tilde{x} \in B_\epsilon[x_i]} \ell(f(\tilde{x}), y_i)$ s.t. $\ell(f(\tilde{x}), y_i) - \min_{y \in \mathcal{Y}} \ell(f(\tilde{x}), y) \geq \rho$. • The constraint firstly ensures $y_i \neq \arg\min_{y \in \mathcal{Y}} \ell(f(\tilde{x}), y)$, i.e., $\tilde{x}$ is misclassified, and secondly ensures that the wrong prediction of $\tilde{x}$ is better than the desired prediction $y_i$ by at least the margin $\rho$ in terms of the loss value.
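A small sketch of this constraint check, assuming a cross-entropy loss (for which the per-class minimum loss is attained at the predicted class); the helper name `is_friendly` and the batched interface are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def is_friendly(model, x_adv, y, rho=0.0):
    """Check the min-min constraint: x_adv is misclassified AND the true-label
    loss exceeds the minimal per-class loss by at least the margin rho."""
    with torch.no_grad():
        log_probs = F.log_softmax(model(x_adv), dim=1)
        loss_true = -log_probs.gather(1, y.unsqueeze(1)).squeeze(1)  # ell(f(x), y_i)
        loss_min = -log_probs.max(dim=1).values  # min over labels y of ell(f(x), y)
        misclassified = log_probs.argmax(dim=1) != y
        return misclassified & (loss_true - loss_min >= rho)
```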

  6. Adversarial data generated by the min-min and the minimax formulations.

  7. A tight upper bound on the adversarial risk. The adversarial risk is $\mathcal{R}_{\mathrm{adv}}(f) := \mathbb{E}_{(X,Y) \sim \mathcal{D}}\, \mathbb{1}\{\exists\, X' \in B_\epsilon[X] : f(X') \neq Y\}$. Minimizing the adversarial risk captures the two purposes of adversarial training: (a) correctly classify the natural data, and (b) make the decision boundary thick. Zhang, Hongyang, et al. "Theoretically principled trade-off between robustness and accuracy." ICML 2019.
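For reference, the decomposition behind this bound (from Zhang et al., ICML 2019; the notation is lightly adapted here, so treat the exact symbols as an assumption) splits the adversarial risk into a natural-risk term and a boundary term, matching purposes (a) and (b):

```latex
% Adversarial risk = natural risk + boundary risk (Zhang et al., ICML 2019).
\mathcal{R}_{\mathrm{adv}}(f)
  = \underbrace{\Pr\{f(X) \neq Y\}}_{\text{natural risk: purpose (a)}}
  + \underbrace{\Pr\{\exists\, X' \in B_\epsilon[X] : f(X') \neq f(X),\ f(X) = Y\}}_{\text{boundary risk: purpose (b)}}
```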

  8. Realization of our min-min formulation: friendly adversarial training (FAT). Figure: conventional PGD generates the most adversarial data from the natural data (steps #1, #3, #6, #8, #10), whereas early-stopped PGD (ours) generates friendly adversarial data over the same steps. Friendly adversarial training (FAT) employs the friendly adversarial data generated by early-stopped PGD to update the model.
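A minimal sketch of early-stopped PGD, reusing the `pgd_attack` loop above; the extra-step parameter `tau` (how many further steps an example receives after it first becomes misclassified) is an assumption about the paper's early-stopping variant, and the image-shaped broadcasting is assumed for CIFAR-style inputs:

```python
import torch
import torch.nn.functional as F

def early_stopped_pgd(model, x, y, eps=8/255, alpha=2/255, num_steps=10, tau=0):
    """Friendly adversarial data: run PGD, but once an example is misclassified,
    update it for only `tau` more steps instead of maximizing its loss further."""
    x_adv = x.clone().detach()
    # Remaining update budget after the first misclassification, per example.
    budget = torch.full((x.size(0),), tau, device=x.device)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Spend one unit of budget for every example that is already misclassified.
        budget = torch.where(logits.argmax(dim=1) != y, budget - 1, budget)
        active = (budget >= 0).view(-1, 1, 1, 1).float()  # freeze exhausted examples
        x_adv = x_adv.detach() + alpha * grad.sign() * active
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
        if not bool((budget >= 0).any()):
            break  # every example in the batch is already friendly adversarial
    return x_adv.detach()
```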

  9. Benefit (a): FAT alleviates the cross-over mixture problem. • In classification on the CIFAR-10 dataset, the cross-over mixture problem may not appear in the input space, but it does appear in the middle layers. Figure: natural data (not mixed); the most adversarial data generated by conventional PGD (significantly mixed); friendly adversarial data generated by early-stopped PGD (not significantly mixed).

  10. Benefit (b): FAT is computationally efficient. We report the average number of backward propagations (BPs) per epoch over the training process. The dashed line is existing adversarial training based on conventional PGD; the solid lines are friendly adversarial training based on early-stopped PGD.

  11. Benefit (c): FAT enables a larger defense parameter $\epsilon_{\mathrm{train}}$. For the CIFAR-10 dataset, we adversarially train deep neural networks with $\epsilon_{\mathrm{train}} \in [0.03, 0.15]$ and evaluate each robust model with 6 evaluation metrics (1 natural generalization metric + 5 robustness metrics). The purple line represents existing adversarial training; the red, orange, and green lines represent our friendly adversarial training with different configurations.

  12. Benefit (d): Benchmarking on Wide ResNet. FAT improves standard test accuracy while maintaining superior adversarial robustness. [13] Zhang, Hongyang, et al. "Theoretically principled trade-off between robustness and accuracy." ICML 2019. [14] Wang, Yisen, et al. "On the convergence and robustness of adversarial training." ICML 2019.

  13. Conclusion and future work • We propose a novel min-min formulation for adversarial training. • We propose friendly adversarial training (FAT) to realize this min-min formulation. • FAT helps alleviate the cross-over mixture problem. • FAT is computationally efficient. • FAT enables larger perturbation bounds $\epsilon_{\mathrm{train}}$. • FAT achieves competitive performance on large-capacity networks. • Beyond FAT, one promising direction for future work is to find a better realization of our min-min formulation.

  14. Thanks for your interest in our work.
