
Virtual Adversarial Training - Buse Gul Atli, Aalto University (PowerPoint presentation)



  1. Virtual Adversarial Training. Buse Gul Atli, Aalto University, Department of Science, buse.atli@aalto.fi. May 21, 2019.

  2. What We Will Cover
     - Miyato et al., "Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning", 2018
     - Miyato et al., "Distributional Smoothing with Virtual Adversarial Training", 2015

  3. Overfitting vs. Underfitting
     - Poor design of the model
     - Noise in the training set

  4. Regularization
     - Smoothing the output distribution w.r.t. spatial/temporal inputs
     - L1 and L2 regularization
     - Applying random perturbations to the input and hidden layers
     - Dropout in NNs

  5. Adversarial Training [1]
     - Adds noise to the image, where the noise is in the adversarial direction
     - The model's probability of correct classification is reduced in the adversarial direction
     - [Figure: an image labeled as a dog keeps the same label (dog) after the adversarial perturbation]
     [1] Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, Explaining and Harnessing Adversarial Examples, 2015

  6. Adversarial Training
     - Adds noise to the image, where the noise is in the adversarial direction
     - Improves the generalization performance
     - Robustness against adversarial perturbation

  7. Adversarial Training
     $L_{adv}(x_l, \theta) := D[q(y|x_l),\, p(y|x_l + r_{adv}, \theta)]$  (1)
     where $r_{adv} := \arg\max_{r,\, \|r\|_2 \le \epsilon} D[q(y|x_l),\, p(y|x_l + r, \theta)]$  (2)
     $r_{adv} \approx \epsilon\, g / \|g\|_2$, where $g = \nabla_{x_l} D[h(y; y_l),\, p(y|x_l, \theta)]$  (3)
     $r_{adv} \approx \epsilon\, \mathrm{sign}(g)$ when the norm is $L_\infty$
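
To make Eq. (3) concrete, here is a minimal PyTorch sketch of the adversarial perturbation. It assumes a classifier `model` that returns logits and integer labels `y`; the function name and arguments are illustrative, not from the paper or its code.

```python
import torch
import torch.nn.functional as F

def adversarial_perturbation(model, x, y, eps, norm="l2"):
    """Eq. (3): r_adv ~ eps * g / ||g||_2, with g the gradient of the supervised loss
    w.r.t. the input; for the L_inf norm, r_adv ~ eps * sign(g)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # D[h(y; y_l), p(y | x_l, theta)] with cross entropy
    g = torch.autograd.grad(loss, x)[0]
    if norm == "linf":
        return eps * g.sign()
    g_flat = g.view(g.size(0), -1)
    return eps * (g_flat / (g_flat.norm(dim=1, keepdim=True) + 1e-12)).view_as(g)
```

Adversarial training then adds the supervised loss evaluated at `x + r_adv` to the training objective, as in Eq. (1).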

  8. Virtual Adversarial Training
     - How can we modify the adversarial training loss in Eq. (1) when full label information is not available?
     - [Figure: an unlabeled image; the model guesses it is probably a dog, maybe a stick. The adversarial perturbation is intended to change that guess, and the new guess should match the old guess (probably dog, maybe stick).]

  9. Virtual Adversarial Training
     $x_*$ denotes both labeled ($x_l$) and unlabeled ($x_{ul}$) samples
     $L_{adv}(x_*, \theta) := D[q(y|x_*),\, p(y|x_* + r_{adv}, \theta)]$
     where $r_{adv} := \arg\max_{r,\, \|r\|_2 \le \epsilon} D[q(y|x_*),\, p(y|x_* + r, \theta)]$

  10. Virtual Adversarial Training
     - Replace $q(y|x)$ with the current estimate $p(y|x, \hat{\theta})$
     - Local Distributional Smoothness (LDS):
     $\mathrm{LDS}(x_*, \theta) := D[p(y|x_*, \hat{\theta}),\, p(y|x_* + r_{adv}, \theta)]$  (4)
     where $r_{adv} := \arg\max_{r,\, \|r\|_2 \le \epsilon} D[p(y|x_*, \hat{\theta}),\, p(y|x_* + r, \theta)]$  (5)

  11. Virtual Adversarial Training
     $\mathrm{LDS}(x_*, \theta) := D[p(y|x_*, \hat{\theta}),\, p(y|x_* + r_{adv}, \theta)]$
     where $r_{adv} := \arg\max_{r,\, \|r\|_2 \le \epsilon} D[p(y|x_*, \hat{\theta}),\, p(y|x_* + r, \theta)]$  (6)
     $R_{vadv}(\mathcal{D}_l, \mathcal{D}_{ul}, \theta) := \frac{1}{N_l + N_{ul}} \sum_{x_* \in \mathcal{D}_l, \mathcal{D}_{ul}} \mathrm{LDS}(x_*, \theta)$  (7)

  12. Virtual Adversarial Training
     $\mathrm{LDS}(x_*, \theta) := D[p(y|x_*, \hat{\theta}),\, p(y|x_* + r_{adv}, \theta)]$
     where $r_{adv} := \arg\max_{r,\, \|r\|_2 \le \epsilon} D[p(y|x_*, \hat{\theta}),\, p(y|x_* + r, \theta)]$
     $R_{vadv}(\mathcal{D}_l, \mathcal{D}_{ul}, \theta) := \frac{1}{N_l + N_{ul}} \sum_{x_* \in \mathcal{D}_l, \mathcal{D}_{ul}} \mathrm{LDS}(x_*, \theta)$
     Full objective function:
     $\ell(\mathcal{D}_l, \theta) + \alpha\, R_{vadv}(\mathcal{D}_l, \mathcal{D}_{ul}, \theta)$  (8)
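
As a rough sketch of how Eq. (8) is assembled in practice (not the authors' reference implementation): the supervised loss is computed on labeled data only, while the regularizer ranges over labeled and unlabeled data. `vat_loss` stands for the LDS estimate; a concrete version of it is sketched after slide 16.

```python
import torch
import torch.nn.functional as F

def vat_objective(model, x_l, y_l, x_ul, vat_loss, alpha=1.0):
    """Eq. (8): l(D_l, theta) + alpha * R_vadv(D_l, D_ul, theta)."""
    supervised = F.cross_entropy(model(x_l), y_l)        # l(D_l, theta) on labeled data
    x_all = torch.cat([x_l, x_ul], dim=0)                # x_* ranges over D_l and D_ul (Eq. 7)
    return supervised + alpha * vat_loss(model, x_all)   # R_vadv is averaged inside vat_loss
```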

  13. VAT: Fast Approximation Methods for $r_{adv}$
     - The linear approximation of Eq. (3) cannot be used for LDS, i.e. for $D(r, x_*, \hat{\theta})$
     - Use a second-order Taylor approximation, since $\nabla_r D(r, x_*, \hat{\theta}) = 0$ at $r = 0$:
     $D(r, x, \hat{\theta}) \approx \frac{1}{2}\, r^{\top} H(x, \hat{\theta})\, r$  (9)

  14. VAT: Fast Approximation Methods for $r_{adv}$
     $D(r, x, \hat{\theta}) \approx \frac{1}{2}\, r^{\top} H(x, \hat{\theta})\, r$
     $r_{adv} \approx \arg\max_r \{ r^{\top} H(x, \hat{\theta})\, r \;;\; \|r\|_2 \le \epsilon \}$  (10)
     $\;= \epsilon\, \overline{u(x, \hat{\theta})}$, where $\bar{v} = v / \|v\|_2$ and $u$ is the first dominant eigenvector of $H$

  15. VAT: Fast Approximation Methods for $r_{adv}$
     - Computing eigenvectors of the Hessian is $O(n^3)$
     - Use the power iteration method:
     $d \leftarrow Hd$  (11)
     $Hd \approx \dfrac{\nabla_r D(r, x, \hat{\theta})\big|_{r=\xi d} - \nabla_r D(r, x, \hat{\theta})\big|_{r=0}}{\xi} = \dfrac{\nabla_r D(r, x, \hat{\theta})\big|_{r=\xi d}}{\xi}$  (12)
     $d \leftarrow \nabla_r D(r, x, \hat{\theta})\big|_{r=\xi d}$  (13)

  16. VAT: Fast Approximation Methods for $r_{adv}$
     $r_{adv} \approx \epsilon\, g / \|g\|_2$  (14)
     $g = \nabla_r D[p(y|x, \hat{\theta}),\, p(y|x + r, \hat{\theta})]\big|_{r=\xi d}$  (15)
     - KL divergence is used for the choice of $D$
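
Putting slides 13 to 16 together, the sketch below (my own PyTorch rendering, not the paper's code) computes the perturbation with one power iteration using the finite-difference trick of Eq. (12), then evaluates LDS with the KL divergence as D; `xi` and `eps` correspond to ξ and ε.

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each sample's perturbation to unit L2 norm.
    flat = d.view(d.size(0), -1)
    return (flat / (flat.norm(dim=1, keepdim=True) + 1e-12)).view_as(d)

def vat_loss(model, x, eps=2.5, xi=1e-6, n_power=1):
    """Estimate of LDS(x, theta) = KL[p(y|x, theta_hat) || p(y|x + r_adv, theta)], Eqs. (4)-(5)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)                # "virtual label" p(y | x, theta_hat)

    d = _l2_normalize(torch.randn_like(x))            # random initial direction
    for _ in range(n_power):                          # power iteration, Eqs. (11)-(13)
        d.requires_grad_(True)
        log_p_pert = F.log_softmax(model(x + xi * d), dim=1)
        dist = F.kl_div(log_p_pert, p, reduction="batchmean")
        grad = torch.autograd.grad(dist, d)[0]        # proportional to H d, Eq. (12)
        d = _l2_normalize(grad.detach())

    r_adv = eps * d                                   # Eq. (14): eps * g / ||g||_2
    log_p_adv = F.log_softmax(model(x + r_adv), dim=1)
    return F.kl_div(log_p_adv, p, reduction="batchmean")
```

Each call costs roughly one extra forward and backward pass per power iteration, plus one forward pass for the final KL term.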

  17. VAT Example
     - VAT forces the model to be smooth around points with large LDS values
     - After 100 updates, the model predicts the same label for the set of points that belong to the same cluster

  18. VAT vs. Other Regularization Methods

  19. Random Perturbation Training (RPT) and Conditional Entropy (VAT+EntMin)
     VAT can be written as:
     $R^{(K)}(\theta, \mathcal{D}_l, \mathcal{D}_{ul}) := \frac{1}{N_l + N_{ul}} \sum_{x \in \mathcal{D}_l, \mathcal{D}_{ul}} \mathbb{E}_{r_K}\big[ D[p(y|x, \hat{\theta}),\, p(y|x + r_K, \theta)] \big]$  (16)
     - $R^{(0)}$: RPT (smooths the function isotropically)
     - Conditional entropy:
     $R_{cent} = \mathcal{H}(Y|X) = -\frac{1}{N_l + N_{ul}} \sum_{x \in \mathcal{D}_l, \mathcal{D}_{ul}} \sum_y p(y|x, \theta) \log p(y|x, \theta)$  (17)
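
For the EntMin variant, here is a corresponding sketch of the conditional-entropy term in Eq. (17), again with illustrative PyTorch names:

```python
import torch
import torch.nn.functional as F

def conditional_entropy(model, x):
    """Eq. (17): average over x of  -sum_y p(y|x, theta) log p(y|x, theta)."""
    logits = model(x)
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)
    return -(p * log_p).sum(dim=1).mean()
```

In VAT+EntMin this term is added alongside the VAT regularizer in the objective of Eq. (8), encouraging confident predictions on unlabeled data.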

  20. VAT Performance on Semi-Supervised Learning
     - VAT and data augmentation can be used together
     - [Tables: results with augmentation (translation and horizontal flip) and without augmentation; DGM = deep generative models, FM = feature matching]

  21. Effects of Perturbation Size $\epsilon$ and Regularization Coefficient $\alpha$
     - For small $\epsilon$, the hyper-parameter $\alpha$ plays a similar role to $\epsilon$
     - A parameter search over $\epsilon$ can therefore replace the search over $\alpha$
     $\max_r \{ D(r, x, \theta) \;;\; \|r\|_2 \le \epsilon \} \approx \max_r \{ \tfrac{1}{2}\, r^{\top} H(x, \theta)\, r \;;\; \|r\|_2 \le \epsilon \} = \tfrac{1}{2}\, \epsilon^2\, \lambda_1(x, \theta)$  (18)
     where $\lambda_1(x, \theta)$ is the dominant eigenvalue of $H(x, \theta)$
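
Under the second-order approximation of Eq. (9), the weighted regularizer therefore depends on α and ε roughly only through their combination, which is why one can fix α = 1 and tune ε alone (as on slide 27); a one-line restatement of Eq. (18):

```latex
\alpha \,\max_{\|r\|_2 \le \epsilon} D(r, x, \theta)
  \;\approx\; \alpha \,\max_{\|r\|_2 \le \epsilon} \tfrac{1}{2}\, r^{\top} H(x,\theta)\, r
  \;=\; \tfrac{1}{2}\, \alpha\, \epsilon^{2}\, \lambda_1(x,\theta).
```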

  22. Effects of Perturbation Size $\epsilon$

  23. Effect of the Number of Power Iterations K
     - The power iteration method converges slowly if there is an eigenvalue close in magnitude to the dominant eigenvalue
     - Convergence therefore might depend on the spectrum of the Hessian matrix
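
As a standard fact about the power method (not stated explicitly on the slide), the error after K iterations shrinks roughly geometrically in the ratio of the two largest eigenvalue magnitudes, which is what "depends on the spectrum of the Hessian" refers to; here $u_1$ is the dominant eigenvector and $d^{(K)}$ the K-th iterate (up to sign):

```latex
\big\| d^{(K)} - u_1 \big\| \;=\; O\!\left( \left| \frac{\lambda_2(x,\theta)}{\lambda_1(x,\theta)} \right|^{K} \right),
```

so when $|\lambda_2| \approx |\lambda_1|$ the iteration converges slowly.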

  24. Robustness of the VAT-trained Model Against Perturbed Images
     - A VAT-trained model behaves more naturally under perturbation than a model trained without VAT

  25. VAT: Contributions
     - Applicable to semi-supervised learning tasks
     - Applicable to any parametric model for which we can compute gradients w.r.t. the input and the model parameters
     - Small number of hyper-parameters
     - Increased robustness against adversarial examples; behaves more naturally at different noise levels

  26. More Info
     - Performance on semi-supervised image classification benchmarks
     - Adversarial training methods for semi-supervised text classification

  27. Implementation: Semi-Supervised Learning with VAT on SVHN
     - 1,000 labeled samples, 72,257 unlabeled samples, no data augmentation
     - Batch size for the cross-entropy loss: 32; batch size for LDS: 128
     - ~48,000 updates in training
     - ADAM optimizer, base learning rate 0.001, linearly decayed over the last 16,000 updates
     - $\alpha = 1$, $\epsilon = 2.5$, $\xi = 10^{-6}$
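
To tie these hyper-parameters together, here is a hedged training-loop skeleton (illustrative names; `model`, the data loaders, and the `vat_loss` sketch from slide 16 are assumed). It uses separate labeled/unlabeled batches of 32 and 128, Adam at 1e-3 with a linear decay over the last 16,000 of roughly 48,000 updates, and α = 1, ε = 2.5, ξ = 1e-6. The LDS term is evaluated on the unlabeled batch here, whereas Eq. (7) formally averages over labeled and unlabeled data.

```python
import torch
import torch.nn.functional as F
from itertools import cycle

def train_vat_svhn(model, labeled_loader, unlabeled_loader, vat_loss,
                   total_updates=48_000, decay_updates=16_000,
                   alpha=1.0, eps=2.5, xi=1e-6, base_lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=base_lr)
    labeled_iter = cycle(labeled_loader)      # batches of 32 (cross-entropy loss)
    unlabeled_iter = cycle(unlabeled_loader)  # batches of 128 (LDS term)
    # Note: cycle() replays cached batches without reshuffling; adequate for a sketch.
    for step in range(total_updates):
        # Linearly decay the learning rate over the last `decay_updates` updates.
        if step >= total_updates - decay_updates:
            frac = (total_updates - step) / decay_updates
            for group in opt.param_groups:
                group["lr"] = base_lr * frac
        x_l, y_l = next(labeled_iter)
        x_ul, _ = next(unlabeled_iter)        # unlabeled loader assumed to yield (x, dummy) pairs
        # Eq. (8): supervised cross-entropy plus alpha * LDS regularizer.
        loss = F.cross_entropy(model(x_l), y_l) + alpha * vat_loss(model, x_ul, eps=eps, xi=xi)
        opt.zero_grad()
        loss.backward()
        opt.step()
```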
