Stronger and Faster Wasserstein Adversarial Attacks

Kaiwen Wu (kaiwen.wu@uwaterloo.ca)
Joint work with Allen Wang and Yaoliang Yu
K. Wu, A. Wang and Y. Yu, Wasserstein Adversarial Attacks, July 29, 2020

Adversarial Examples
Adversarial examples (Goodfellow et al. 2015)
Generating adversarial examples:
$$\max_{x_{\mathrm{adv}}} \ \ell(f(x_{\mathrm{adv}}), y) \quad \text{subject to} \quad x_{\mathrm{adv}} \approx x$$

How “Similar” Is Similar?
How to quantify $x_{\mathrm{adv}} \approx x$?
◮ $\|x - x_{\mathrm{adv}}\|_p \le \epsilon$ (Szegedy et al. 2014)
◮ point-wise function (Laidlaw et al. 2019)
◮ geometric transformation (Engstrom et al. 2019)
◮ Wasserstein distance (Wong et al. 2019)
◮ ...
Our contributions
◮ stronger and faster Wasserstein adversarial attacks
◮ higher robust accuracy using adversarial training

What is Wasserstein Distance?
$$W(x, z) = \min_{\Pi \ge 0} \ \langle \Pi, C \rangle \quad \text{s.t.} \quad \Pi \mathbf{1} = x, \ \Pi^\top \mathbf{1} = z$$
◮ $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^n$: input images
◮ $\Pi \in \mathbb{R}^{n \times n}$: transportation matrix
◮ $C \in \mathbb{R}^{n \times n}$: transportation cost
[Figure: mass transported from $x$ to $z$, with cost $\Pi_{ij} \times C_{ij}$]
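To make the definition concrete, here is a minimal sketch that checks the marginal constraints of a hand-written transport plan and evaluates its cost $\langle \Pi, C \rangle$. The three-pixel signals, the cost $C_{ij} = |i - j|$, and the helper names are assumptions chosen for illustration, not taken from the slides. For one-dimensional signals with this particular cost, the optimal value equals the summed absolute difference of the cumulative sums, which gives an independent check.

```python
import numpy as np


def transport_cost(plan, cost):
    """Total cost <Pi, C> of a transport plan."""
    return float(np.sum(plan * cost))


def is_valid_plan(plan, x, z, tol=1e-9):
    """Check Pi >= 0, Pi 1 = x (row sums) and Pi^T 1 = z (column sums)."""
    return bool(
        np.all(plan >= -tol)
        and np.allclose(plan.sum(axis=1), x, atol=tol)
        and np.allclose(plan.sum(axis=0), z, atol=tol)
    )


def wasserstein_1d(x, z):
    """W(x, z) for 1-D signals of equal mass under the cost |i - j|."""
    return float(np.abs(np.cumsum(x - z)).sum())


# Illustrative data (assumed): move half of the mass one pixel to the right.
x = np.array([0.5, 0.5, 0.0])
z = np.array([0.5, 0.0, 0.5])
idx = np.arange(3)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)

plan = np.array([
    [0.5, 0.0, 0.0],   # pixel 0 keeps its mass
    [0.0, 0.0, 0.5],   # pixel 1 sends its mass to pixel 2
    [0.0, 0.0, 0.0],   # pixel 2 has nothing to send
])

print(is_valid_plan(plan, x, z), transport_cost(plan, C), wasserstein_1d(x, z))
```

Any valid plan gives an upper bound on $W(x, z)$; here the plan's cost matches the one-dimensional closed form, so it is optimal.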

Applications across Different Domains
(Arjovsky et al. 2017; Rabin et al. 2014; Solomon et al. 2015)

Why Wasserstein Distance?
Captures geometry in image space, e.g. translation, rotation
[Figure: perturbed images under three threat models. ℓ∞ with ε = 0.05, 0.10, 0.20, 0.40; ℓ2 with ε = 0.50, 1.00, 2.00, 4.00; Wasserstein with ε = 0.05, 0.10, 0.20, 0.40. Labels shown: "predict: 4", "predict: 9".]

Computing Wasserstein Adversarial Examples
Search for adversarial examples:
$$\max_{x_{\mathrm{adv}}} \ \ell(x_{\mathrm{adv}}) \quad \text{subject to} \quad W(x, x_{\mathrm{adv}}) \le \epsilon$$
Alternatively, search for transportation matrix:
$$\max_{\Pi \ge 0} \ \ell(\Pi^\top \mathbf{1}) \quad \text{subject to} \quad \Pi \mathbf{1} = x, \ \langle \Pi, C \rangle \le \epsilon$$
Then, recover adversarial examples: $x_{\mathrm{adv}} = \Pi^\top \mathbf{1}$
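The reformulation can be illustrated with a small sketch. It assumes a cost matrix with zero diagonal, so that $\Pi = \mathrm{diag}(x)$ is a feasible starting point that leaves the image unchanged at zero cost; the helper names and the three-pixel example are hypothetical.

```python
import numpy as np


def recover_adversarial(plan):
    """x_adv = Pi^T 1: the column sums of the transport plan."""
    return plan.sum(axis=0)


def is_feasible(plan, cost, x, eps, tol=1e-9):
    """Check Pi >= 0, Pi 1 = x and <Pi, C> <= eps."""
    return bool(
        np.all(plan >= -tol)
        and np.allclose(plan.sum(axis=1), x, atol=tol)
        and np.sum(plan * cost) <= eps + tol
    )


# Illustrative data (assumed): a 3-pixel "image" and the cost |i - j|.
x = np.array([0.2, 0.5, 0.3])
idx = np.arange(3)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)

plan = np.diag(x)      # no mass moves, so x_adv equals x and the cost is 0
plan[1, 1] -= 0.1      # take 0.1 units of mass from pixel 1 ...
plan[1, 2] += 0.1      # ... and send them to pixel 2 (cost 0.1 * C[1, 2])

x_adv = recover_adversarial(plan)
print(x_adv, is_feasible(plan, C, x, eps=0.1))
```

Because the constraints are stated on $\Pi$ rather than on $x_{\mathrm{adv}}$, any feasible plan yields a perturbed image whose transport cost from $x$ is at most $\epsilon$, with no Wasserstein distance computation inside the loop.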

Optimization in Transportation Matrix
[Figure: the gradient $\nabla_\Pi \ell(\Pi)$ and the budget $\epsilon$]
(a) projected gradient
$$\min_{\Pi \ge 0} \ \tfrac{1}{2} \|\Pi - G\|_F^2 \quad \text{subject to} \quad \Pi \mathbf{1} = x, \ \langle \Pi, C \rangle \le \epsilon$$
(b) Frank-Wolfe (Jaggi 2011)
$$\min_{\Pi \ge 0} \ \langle \Pi, H \rangle \quad \text{subject to} \quad \Pi \mathbf{1} = x, \ \langle \Pi, C \rangle \le \epsilon$$
For $n$-dimensional images, $\Pi$ has $n^2$ variables...

Solve Projection in PGD
$$\min_{\Pi \ge 0} \ \tfrac{1}{2} \|\Pi - G\|_F^2 \quad \text{subject to} \quad \Pi \mathbf{1} = x, \ \langle \Pi, C \rangle \le \epsilon$$
The Lagrange dual can be simplified as a univariate problem
$$\max_{\lambda \ge 0} \ g(\lambda)$$
No closed-form expression... But $g'(\lambda)$ can be evaluated in $O(n^2 \log n)$ time
Proposition
$$0 \le \lambda^\star \le \frac{2 \|\operatorname{vec}(G)\|_\infty + \|x\|_\infty}{\min_{i \ne j} \{C_{ij}\}}$$
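One way to see where an $O(n^2 \log n)$ evaluation can come from is sketched below. This is a reconstruction under stated assumptions, not the authors' code: dualizing only the budget constraint with multiplier $\lambda$, the inner minimization separates into $n$ row-wise Euclidean projections of $G - \lambda C$ onto scaled simplices, each solvable by sorting, and the derivative is then assumed to be $g'(\lambda) = \langle \Pi(\lambda), C \rangle - \epsilon$. All function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v, mass):
    """Euclidean projection of v onto {w >= 0, sum(w) = mass} by sorting."""
    if mass <= 0:
        return np.zeros_like(v)
    u = np.sort(v)[::-1]                        # entries in decreasing order
    css = np.cumsum(u) - mass
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]   # last index kept positive
    theta = css[rho] / (rho + 1)                # shift that fixes the row sum
    return np.maximum(v - theta, 0.0)


def dual_derivative(lam, G, C, x, eps):
    """Return (g'(lam), Pi(lam)), assuming g'(lam) = <Pi(lam), C> - eps."""
    shifted = G - lam * C
    plan = np.stack([project_to_simplex(row, m) for row, m in zip(shifted, x)])
    return float(np.sum(plan * C) - eps), plan


def lambda_upper_bound(G, C, x):
    """Upper bound on the optimal multiplier from the proposition."""
    off_diagonal = C[~np.eye(C.shape[0], dtype=bool)]
    return (2 * np.abs(G).max() + np.abs(x).max()) / off_diagonal.min()
```

With a cost matrix whose diagonal is zero, evaluating at the upper bound drives every row onto its diagonal entry, so $\Pi = \mathrm{diag}(x)$ and the derivative equals $-\epsilon$, which is consistent with the optimum lying inside the bracket.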

Bisection on the Dual
$$\max_{\lambda \ge 0} \ g(\lambda)$$
Converges to high precision in ≤ 20 iterations in practice.
[Figure: $g(\lambda)$ plotted against $\lambda$, with the maximizer $\lambda^\star$ marked between $0$ and the upper bound $\frac{2 \|\operatorname{vec}(G)\|_\infty + \|x\|_\infty}{\min_{i \ne j} \{C_{ij}\}}$]
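The bisection itself can be sketched generically. The routine below only assumes that $g$ is concave, so that $g'$ is non-increasing, and that the maximizer lies in $[0, \text{upper}]$; the quadratic used to exercise it is a toy stand-in, not the dual from the slides.

```python
def bisect_dual(dual_derivative, upper, iterations=20):
    """Maximize a concave g over [0, upper] using only the sign of g'."""
    if dual_derivative(0.0) <= 0.0:
        return 0.0                  # g already decreases at 0, so lam = 0
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dual_derivative(mid) > 0.0:
            lo = mid                # g still increasing: optimum is to the right
        else:
            hi = mid                # g decreasing: optimum is to the left
    return 0.5 * (lo + hi)


# Toy concave function (assumed): g(lam) = -(lam - 3)^2, so g'(lam) = -2 (lam - 3).
lam_star = bisect_dual(lambda lam: -2.0 * (lam - 3.0), upper=10.0)
print(lam_star)
```

Each iteration halves the bracket, so 20 iterations shrink an interval of length 10 to about $10 / 2^{20} \approx 10^{-5}$, which fits the slide's remark about reaching high precision quickly.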

Solve Linear Minimization in Frank-Wolfe
$$\min_{\Pi \ge 0} \ \langle \Pi, H \rangle \quad \text{subject to} \quad \Pi \mathbf{1} = x, \ \langle \Pi, C \rangle \le \epsilon$$
The Lagrange dual can be simplified as a univariate problem
$$\max_{\lambda \ge 0} \ g(\lambda)$$
Bound on the optimum:
$$0 \le \lambda^\star \le \frac{2 \|\operatorname{vec}(H)\|_\infty}{\min_{i \ne j} \{C_{ij}\}}$$
Does not work...
◮ difficult to recover primal solution
◮ severe numerical instability

Entropic Regularization
$$\min_{\Pi \ge 0} \ \langle \Pi, H \rangle + \gamma \sum_{i=1}^{n} \sum_{j=1}^{n} \Pi_{ij} \log \Pi_{ij} \quad \text{subject to} \quad \Pi \mathbf{1} = x, \ \langle \Pi, C \rangle \le \epsilon$$
◮ Closed-form expression to recover primal solution
◮ Entropic regularization introduces approximation error
◮ But the approximation error is guaranteed to be small
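As a sketch of why the entropic term helps, consider the regularized objective with the budget constraint dualized by a fixed multiplier $\lambda$. Setting the gradient of the row-wise Lagrangian to zero gives $\Pi_{ij} \propto x_i \exp(-(H_{ij} + \lambda C_{ij}) / \gamma)$, normalized so that row $i$ sums to $x_i$. This derivation is an assumption about what the slide's closed-form expression refers to, and the function name is hypothetical.

```python
import numpy as np


def entropic_plan(H, C, x, lam, gamma):
    """Row-wise softmax plan: Pi_ij = x_i * softmax_j(-(H_ij + lam * C_ij) / gamma)."""
    scores = -(H + lam * C) / gamma
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the exponentials
    weights = np.exp(scores)
    return x[:, None] * weights / weights.sum(axis=1, keepdims=True)


# Illustrative data (assumed): random H, cost |i - j|, uniform mass.
rng = np.random.default_rng(1)
n = 5
H = rng.normal(size=(n, n))
idx = np.arange(n)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)
x = np.full(n, 1.0 / n)

plan_small = entropic_plan(H, C, x, lam=0.0, gamma=0.1)
plan_large = entropic_plan(H, C, x, lam=5.0, gamma=0.1)
print(np.sum(plan_small * C), np.sum(plan_large * C))
```

Every entry of the plan is strictly positive and depends smoothly on $\lambda$, and the transport cost $\langle \Pi(\lambda), C \rangle$ does not increase as $\lambda$ grows, so a one-dimensional search over $\lambda$ can target the budget $\epsilon$ while the primal plan is read off directly.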

Exploit Sparsity
Local transportation constraint (Wong et al. 2019) ⇒ structured sparsity in $\Pi$
Per-iteration cost is reduced to $O(n)$ by exploiting sparsity
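The sparsity pattern can be illustrated with a small sketch. It assumes the local constraint means that a pixel may only send mass to pixels inside a square window centred on it; the window size is a free parameter here, and the function name is hypothetical. The dense boolean mask is built only to count entries; an efficient implementation would store just the allowed entries.

```python
import numpy as np


def local_transport_mask(height, width, window):
    """Boolean n x n mask of allowed (source, target) pixel pairs, n = height * width."""
    rows, cols = np.divmod(np.arange(height * width), width)
    radius = window // 2
    near_rows = np.abs(rows[:, None] - rows[None, :]) <= radius
    near_cols = np.abs(cols[:, None] - cols[None, :]) <= radius
    return near_rows & near_cols


# Illustrative sizes (assumed): a 4 x 4 image with a 3 x 3 window.
mask = local_transport_mask(height=4, width=4, window=3)
n = mask.shape[0]
print(int(mask.sum()), n * n)   # allowed entries versus all n^2 entries
```

Each row of the mask has at most `window ** 2` allowed entries, so the number of nonzeros in $\Pi$ grows linearly in $n$ for a fixed window instead of quadratically.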

Comparison
[Figure: adversarial accuracy on CIFAR-10 (standard training) for ε = 0.001, 0.002, 0.003, 0.004, 0.005, comparing Wong et al. (2019), Dual Proj. (ours) and Dual LMO (ours)]
[Figure: time per iteration in ms, and iterations]

Entropic Regularization Reflects Shapes

