SLIDE 1

Virtual Adversarial Training

Buse Gul Atli

Aalto University, Department of Science buse.atli@aalto.fi

May 21, 2019

Buse Gul Atli (Aalto University) Virtual Adversarial Training May 21, 2019 1 / 27

SLIDE 2

What We Will Cover

1. Miyato et al., Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning, 2018

2. Miyato et al., Distributional Smoothing with Virtual Adversarial Training, 2015

SLIDE 3

Overfitting vs Underfitting

- Poor design of the model
- Noise in the training set

SLIDE 4

Regularization

- Smoothing the output distribution w.r.t. spatial/temporal inputs
- L1 and L2 regularization
- Applying random perturbations to input and hidden layers
- Dropout in NNs

SLIDE 5

Adversarial Training1

- Adds noise to the image, where the noise is in the adversarial direction
- The model's probability of correct classification is reduced in the adversarial direction

[Figure: a clean image labeled as a dog; the adversarially perturbed image still has the same label (dog)]

1. Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, Explaining and Harnessing Adversarial Examples, 2015

SLIDE 6

Adversarial Training

- Adds noise to the image, where the noise is in the adversarial direction
- Improves the generalization performance
- Robustness against adversarial perturbation

SLIDE 7

Adversarial Training

L_adv(x_l, θ) := D[q(y|x_l), p(y|x_l + r_adv, θ)]   (1)

where r_adv := argmax_{r : ‖r‖₂ ≤ ε} D[q(y|x_l), p(y|x_l + r, θ)]   (2)

r_adv ≈ ε g/‖g‖₂,   g = ∇_{x_l} D[h(y; y_l), p(y|x_l, θ)]   (3)

r_adv ≈ ε sign(g) when the norm is L∞
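As a concrete illustration of Eqs. 2-3, here is a minimal NumPy sketch on a toy logistic-regression model. All names, and the closed-form cross-entropy gradient g = (p − y)·w, are specific to this toy setup, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_perturbation(x, y, w, eps, norm="l2"):
    """Eqs. 2-3: step of size eps along the loss gradient w.r.t. the input.
    For this toy model p(y=1|x) = sigmoid(w.x), the cross-entropy gradient
    has the closed form g = (p - y) * w."""
    p = sigmoid(w @ x)
    g = (p - y) * w
    if norm == "linf":
        return eps * np.sign(g)                       # r_adv = eps * sign(g)
    return eps * g / (np.linalg.norm(g) + 1e-12)      # r_adv = eps * g/||g||_2

# Perturbing in the adversarial direction lowers the correct-class probability
rng = np.random.default_rng(0)
w, x, y = rng.normal(size=5), rng.normal(size=5), 1.0
r = adversarial_perturbation(x, y, w, eps=0.1)
print(sigmoid(w @ x), sigmoid(w @ (x + r)))  # second value is smaller
```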

SLIDE 8

Virtual Adversarial Training

How can we modify the adversarial training loss in Eq. 1 when full label information is not available?

- The adversarial perturbation is intended to change the model's current guess
- The new guess should match the old guess (probably dog, maybe stick)

[Figure: an unlabeled image; the model guesses it is probably a dog, maybe a stick]

SLIDE 9

Virtual Adversarial Training

x∗ denotes both labeled (x_l) and unlabeled (x_ul) samples

L_adv(x∗, θ) := D[q(y|x∗), p(y|x∗ + r_adv, θ)]

where r_adv := argmax_{r : ‖r‖₂ ≤ ε} D[q(y|x∗), p(y|x∗ + r, θ)]

SLIDE 10

Virtual Adversarial Training

Replace q(y|x) with the current estimate p(y|x, θ̂)

Local Distributional Smoothness (LDS):

LDS(x∗, θ) := D[p(y|x∗, θ̂), p(y|x∗ + r_adv, θ)]   (4)

where r_adv := argmax_{r : ‖r‖₂ ≤ ε} D[p(y|x∗, θ̂), p(y|x∗ + r, θ̂)]   (5)

SLIDE 11

Virtual Adversarial Training

LDS(x∗, θ) := D[p(y|x∗, θ̂), p(y|x∗ + r_adv, θ)]

where r_adv := argmax_{r : ‖r‖₂ ≤ ε} D[p(y|x∗, θ̂), p(y|x∗ + r, θ̂)]   (6)

R_vadv(D_l, D_ul, θ) := (1/(N_l + N_ul)) Σ_{x∗ ∈ D_l, D_ul} LDS(x∗, θ)   (7)

SLIDE 12

Virtual Adversarial Training

LDS(x∗, θ) := D[p(y|x∗, θ̂), p(y|x∗ + r_adv, θ)]

where r_adv := argmax_{r : ‖r‖₂ ≤ ε} D[p(y|x∗, θ̂), p(y|x∗ + r, θ̂)]

R_vadv(D_l, D_ul, θ) := (1/(N_l + N_ul)) Σ_{x∗ ∈ D_l, D_ul} LDS(x∗, θ)

Full objective function:

ℓ(D_l, θ) + α R_vadv(D_l, D_ul, θ)   (8)
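Eqs. 7-8 combine into one scalar loss. Below is a minimal NumPy sketch assuming a linear softmax classifier and a precomputed perturbation matrix R_adv (computing r_adv itself is the subject of the following slides); the function and variable names are illustrative, not the paper's:

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=-1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def kl(P, Q):
    """Row-wise KL divergence D[P, Q] (the choice of D used later)."""
    return np.sum(P * (np.log(P + 1e-12) - np.log(Q + 1e-12)), axis=-1)

def vat_objective(W, X_l, Y_l, X_ul, R_adv, alpha=1.0):
    """Eq. 8: cross-entropy on labeled data + alpha * R_vadv, where R_vadv
    (Eq. 7) averages LDS over labeled AND unlabeled samples."""
    P_l = softmax(X_l @ W)
    ce = -np.mean(np.log(P_l[np.arange(len(Y_l)), Y_l] + 1e-12))
    X_all = np.vstack([X_l, X_ul])
    lds = kl(softmax(X_all @ W), softmax((X_all + R_adv) @ W))  # Eq. 4
    return ce + alpha * np.mean(lds)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
X_l, Y_l = rng.normal(size=(5, 3)), rng.integers(0, 4, size=5)
X_ul = rng.normal(size=(7, 3))
loss = vat_objective(W, X_l, Y_l, X_ul, R_adv=np.zeros((12, 3)))
```

With zero perturbations the LDS term vanishes and only the supervised cross-entropy remains; any nonzero perturbation can only add a non-negative KL penalty.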

SLIDE 13

VAT : Fast Approximation Methods for radv

The linear approximation in Eq. 3 cannot be performed for LDS, written as D(r, x∗, θ̂)

Use a second-order Taylor approximation instead, since ∇_r D(r, x∗, θ̂) = 0 at r = 0:

D(r, x, θ̂) ≈ (1/2) rᵀ H(x, θ̂) r   (9)

where H(x, θ̂) is the Hessian of D with respect to r.
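The fact that ∇_r D vanishes at r = 0 (which licenses the second-order expansion in Eq. 9) can be checked numerically: if D is locally quadratic in r, halving r should shrink D by roughly 4x. A small NumPy sketch with an assumed linear-softmax model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_to_perturbed(x, W, r):
    """D(r, x) = KL[p(y|x) || p(y|x+r)] for a linear-softmax model."""
    p = softmax(x @ W)
    q = softmax((x + r) @ W)
    return np.sum(p * (np.log(p) - np.log(q)))

# Because grad_r D = 0 at r = 0, D is locally quadratic in r,
# so halving the perturbation shrinks the divergence by ~4x.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 3)), rng.normal(size=4)
r = 1e-3 * rng.normal(size=4)
ratio = kl_to_perturbed(x, W, r) / kl_to_perturbed(x, W, r / 2)
print(ratio)  # close to 4
```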

SLIDE 14

VAT : Fast Approximation Methods for radv

D(r, x, θ̂) ≈ (1/2) rᵀ H(x, θ̂) r

r_adv ≈ argmax_r {rᵀ H(x, θ̂) r ; ‖r‖₂ ≤ ε} = ε ū(x, θ̂)   (10)

where v̄ = v/‖v‖₂ and u(x, θ̂) is the dominant eigenvector of H(x, θ̂).

SLIDE 15

VAT : Fast Approximation Methods for radv

Computing the eigenvectors of the Hessian is O(n³); use the power iteration method instead, normalizing d after each step:

d ← Hd   (11)

Approximate the Hessian-vector product by a finite difference of gradients with a small ξ (note ∇_r D(r, x, θ̂)|_{r=0} = 0):

Hd ≈ (∇_r D(r, x, θ̂)|_{r=ξd} − ∇_r D(r, x, θ̂)|_{r=0}) / ξ = ∇_r D(r, x, θ̂)|_{r=ξd} / ξ   (12)

so each iteration becomes

d ← ∇_r D(r, x, θ̂)|_{r=ξd}   (13)
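The power-iteration step of Eq. 11 can be sketched directly in NumPy. Here an explicit matrix stands in for H so the result can be checked against an eigendecomposition; VAT itself never forms H, using the finite-difference product of Eq. 12 instead:

```python
import numpy as np

def dominant_direction(H, n_iters=1, seed=0):
    """Power iteration (Eq. 11): d <- normalize(H d).
    Each step needs only a matrix-vector product, which VAT replaces
    by a finite difference of gradients (Eq. 12)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=H.shape[0])
    d /= np.linalg.norm(d)
    for _ in range(n_iters):
        d = H @ d
        d /= np.linalg.norm(d)
    return d

# Sanity check: for diag(5, 1, 0.5) the dominant eigenvector is e1
H = np.diag([5.0, 1.0, 0.5])
d = dominant_direction(H, n_iters=20)
print(np.abs(d))  # approaches [1, 0, 0]
```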

SLIDE 16

VAT : Fast Approximation Methods for radv

r_adv ≈ ε g/‖g‖₂   (14)

g = ∇_r D[p(y|x, θ̂), p(y|x + r, θ̂)]|_{r=ξd}   (15)

KL divergence is used for the choice of D.

SLIDE 17

VAT Example

VAT forces the model to be smooth around points with large LDS values. After 100 updates, the model predicts the same label for the set of points that belong to the same cluster.

SLIDE 18

VAT vs. Other Regularization Methods

SLIDE 19

Random Perturbation Training (RPT) and Conditional Entropy (VAT+EntMin)

VAT can be written as

R^(K)(θ, D_l, D_ul) := (1/(N_l + N_ul)) Σ_{x ∈ D_l, D_ul} E_{r_K}[ D[p(y|x, θ̂), p(y|x + r_K, θ)] ]   (16)

R^(0): RPT (smooths the function isotropically)

Conditional entropy:

R_cent = H(Y|X) = −(1/(N_l + N_ul)) Σ_{x ∈ D_l, D_ul} Σ_y p(y|x, θ) log p(y|x, θ)   (17)
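The conditional-entropy term of Eq. 17 is straightforward to compute from the model's predicted distributions. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def conditional_entropy(P):
    """Eq. 17: mean over samples of -sum_y p(y|x) log p(y|x).
    Minimizing this pushes predictions toward confident,
    low-entropy outputs on unlabeled data."""
    return -np.mean(np.sum(P * np.log(P + 1e-12), axis=1))

uniform = np.full((2, 4), 0.25)          # maximally uncertain predictions
confident = np.array([[0.97, 0.01, 0.01, 0.01],
                      [0.01, 0.97, 0.01, 0.01]])
print(conditional_entropy(uniform), conditional_entropy(confident))
```

The uniform distribution attains the maximum value log 4 ≈ 1.386; confident predictions score much lower, which is what the regularizer rewards.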

SLIDE 20

VAT Performance on Semi-Supervised Learning

VAT and data augmentation can be used together

- Without augmentation (DGM = deep generative models, FM = feature matching)
- With augmentation (translation and horizontal flip)

SLIDE 21

Effects of Perturbation Size ǫ and Regularization Coefficient α

For small ε, the hyper-parameter α plays a similar role as ε

It is therefore enough to search over ε (rather than over both α and ε), because for small perturbations

max_r {D(r, x, θ) ; ‖r‖₂ ≤ ε} ≈ max_r {(1/2) rᵀ H(x, θ) r ; ‖r‖₂ ≤ ε} = (1/2) ε² λ₁(x, θ)   (18)

where λ₁(x, θ) is the dominant eigenvalue of H(x, θ).

SLIDE 22

Effects of Perturbation Size ǫ

SLIDE 23

Effect of the Number of the Power Iterations K

The power iteration method converges slowly if there is an eigenvalue close in magnitude to the dominant eigenvalue. Convergence may therefore depend on the spectrum of the Hessian matrix.

SLIDE 24

Robustness of the VAT-trained Model Against Perturbed Images

The VAT-trained model behaves more naturally under perturbation than the model trained without VAT.

SLIDE 25

VAT: Contributions

- Applicable to semi-supervised learning tasks
- Applicable to any parametric model for which we can compute gradients w.r.t. the input and the model parameters
- Small number of hyper-parameters
- Increased robustness against adversarial examples; the model acts more naturally at different noise levels

SLIDE 26

More info

- Performance on semi-supervised image classification benchmarks
- Adversarial training methods for semi-supervised text classification

SLIDE 27

Implementation

Semi-supervised learning with VAT on SVHN

- 1,000 labeled samples, 72,257 unlabeled samples, no data augmentation
- Batch size for cross-entropy loss: 32; batch size for LDS: 128
- ~48,000 updates in training
- ADAM optimization, base learning rate = 0.001, linearly decayed over the last 16,000 updates
- α = 1, ε = 2.5, ξ = 1e−6
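The learning-rate schedule above can be sketched as follows (a minimal sketch under the stated numbers; the helper name is hypothetical, and the slide does not specify the schedule shape beyond "linearly decayed over the last 16,000 updates"):

```python
def learning_rate(step, base_lr=0.001, total=48000, decay_last=16000):
    """Constant base_lr for the first (total - decay_last) updates,
    then a linear decay to 0 over the last decay_last updates."""
    decay_start = total - decay_last
    if step < decay_start:
        return base_lr
    return base_lr * (total - step) / decay_last

print(learning_rate(0), learning_rate(40000), learning_rate(48000))
# 0.001 0.0005 0.0
```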
