

slide-1
SLIDE 1

Stronger and Faster Wasserstein Adversarial Attacks

Kaiwen Wu kaiwen.wu@uwaterloo.ca Joint work with Allen Wang and Yaoliang Yu

K.Wu, A.Wang and Y.Yu Wasserstein Adversarial Attacks July 29, 2020 1 / 18

slide-2
SLIDE 2

Adversarial Examples

Adversarial examples: [figure] (Goodfellow et al. 2015)

Generating adversarial examples:

$$\max_{x_{\mathrm{adv}}} \; \ell(f(x_{\mathrm{adv}}), y) \quad \text{subject to} \quad x_{\mathrm{adv}} \approx x$$

slide-3
SLIDE 3

How “Similar” Is Similar?

How to quantify $x_{\mathrm{adv}} \approx x$?
- $\|x - x_{\mathrm{adv}}\|_p \le \epsilon$ (Szegedy et al. 2014)
- point-wise function (Laidlaw et al. 2019)
- geometric transformation (Engstrom et al. 2019)
- Wasserstein distance (Wong et al. 2019)
- ...

Our contributions:
- stronger and faster Wasserstein adversarial attacks
- higher robust accuracy using adversarial training

slide-4
SLIDE 4

What is Wasserstein Distance?

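$$W(x, z) = \min_{\Pi \ge 0} \; \langle \Pi, C \rangle \quad \text{s.t.} \quad \Pi \mathbf{1} = x, \; \Pi^\top \mathbf{1} = z$$

- $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^n$: input images
- $\Pi \in \mathbb{R}^{n \times n}$: transportation matrix
- $C \in \mathbb{R}^{n \times n}$: transportation cost

[Figure: mass moved between $x$ and $z$, with cost $\Pi_{ij} \times C_{ij}$]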
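To make the definition concrete, here is a minimal NumPy sketch that checks the two marginal constraints and evaluates the transport cost $\langle \Pi, C \rangle$ for a given plan. The three-pixel "images", the cost $C_{ij} = |i - j|$, and the helper names `is_feasible` and `transport_cost` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def is_feasible(Pi, x, z, tol=1e-9):
    """Check Pi >= 0, Pi 1 = x (row sums) and Pi^T 1 = z (column sums)."""
    return bool(
        (Pi >= -tol).all()
        and np.allclose(Pi.sum(axis=1), x, atol=tol)
        and np.allclose(Pi.sum(axis=0), z, atol=tol)
    )

def transport_cost(Pi, C):
    """Inner product <Pi, C>: mass moved, weighted by how far it travels."""
    return float(np.sum(Pi * C))

# Hypothetical 1-D "images" with three pixels and unit total mass.
x = np.array([1.0, 0.0, 0.0])
z = np.array([0.0, 0.0, 1.0])
idx = np.arange(3)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)  # assumed cost |i - j|

# Here the marginals leave only one feasible plan: move everything from pixel 0 to pixel 2.
Pi = np.zeros((3, 3))
Pi[0, 2] = 1.0

feasible = is_feasible(Pi, x, z)
cost = transport_cost(Pi, C)  # equals W(x, z) in this example because Pi is the unique feasible plan
```

In general $W$ is the minimum over all feasible plans, so evaluating it exactly requires solving a linear program rather than scoring a single plan.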

slide-5
SLIDE 5

Applications across Different Domains

(Arjovsky et al. 2017; Rabin et al. 2014; Solomon et al. 2015)


slide-7
SLIDE 7

Why Wasserstein Distance?

Captures geometry in image space, e.g. translation, rotation

[Figure: perturbed images under the $\ell_\infty$, $\ell_2$ and Wasserstein threat models, at $\epsilon$ = 0.05, 0.10, 0.20, 0.40; 0.50, 1.00, 2.00, 4.00; and 0.05, 0.10, 0.20, 0.40; labels "predict: 4" and "predict: 9"]

slide-9
SLIDE 9

Computing Wasserstein Adversarial Examples

Search for adversarial examples:

$$\max_{x_{\mathrm{adv}}} \; \ell(x_{\mathrm{adv}}) \quad \text{subject to} \quad W(x, x_{\mathrm{adv}}) \le \epsilon$$

Alternatively, search for the transportation matrix:

$$\max_{\Pi \ge 0} \; \ell(\Pi^\top \mathbf{1}) \quad \text{subject to} \quad \Pi \mathbf{1} = x, \; \langle \Pi, C \rangle \le \epsilon$$

Then, recover adversarial examples: $x_{\mathrm{adv}} = \Pi^\top \mathbf{1}$
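The reformulation can be illustrated with a toy plan: start from the plan that moves nothing, shift a little mass, and read off the perturbed image as the column sums. This is only a sketch; the three-pixel image, the cost $C_{ij} = |i - j|$ and the budget `eps = 0.2` are assumed values chosen for illustration.

```python
import numpy as np

x = np.array([0.5, 0.3, 0.2])                          # hypothetical normalized image
idx = np.arange(x.size)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)  # assumed cost |i - j|
eps = 0.2                                              # assumed budget

Pi = np.diag(x)      # plan that moves nothing: <Pi, C> = 0 and Pi^T 1 = x
Pi[0, 0] -= 0.1      # take 0.1 of the mass that stayed at pixel 0 ...
Pi[0, 1] += 0.1      # ... and send it to pixel 1 instead

row_ok = bool(np.allclose(Pi.sum(axis=1), x))   # Pi 1 = x still holds
budget_ok = bool(np.sum(Pi * C) <= eps)         # transport cost is 0.1, within the budget
x_adv = Pi.sum(axis=0)                          # x_adv = Pi^T 1
```

Any plan that keeps the row sums equal to $x$ and stays within the budget yields a valid perturbed image, so the attack only needs to search over $\Pi$.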

slide-13
SLIDE 13

Optimization in Transportation Matrix

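[Figure: feasible set with budget $\epsilon$ and gradient direction $\nabla_\Pi \ell(\Pi)$]

(a) projected gradient

$$\min_{\Pi \ge 0} \; \tfrac{1}{2} \|\Pi - G\|_F^2 \quad \text{subject to} \quad \Pi \mathbf{1} = x, \; \langle \Pi, C \rangle \le \epsilon$$

(b) Frank-Wolfe (Jaggi 2011)

$$\min_{\Pi \ge 0} \; \langle \Pi, H \rangle \quad \text{subject to} \quad \Pi \mathbf{1} = x, \; \langle \Pi, C \rangle \le \epsilon$$

For $n$-dimensional images, $\Pi$ has $n^2$ variables...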
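The two update rules differ only in the subproblem they solve at each step. The sketch below contrasts them on a deliberately simpler feasible set, the probability simplex, which stands in for the Wasserstein constraint set. It assumes that $G$ is the point reached after a gradient ascent step and that $H$ is the negated gradient, since the slides do not define either symbol. The toy objective and the step sizes are likewise assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {p >= 0, sum(p) = 1} via sorting."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def lmo_simplex(h):
    """argmin over the simplex of <p, h>: the vertex at the smallest entry of h."""
    s = np.zeros_like(h)
    s[np.argmin(h)] = 1.0
    return s

target = np.array([0.6, 0.3, 0.1])   # toy objective: maximize -0.5 * ||p - target||^2

def grad(p):
    return target - p                # gradient of the toy objective

# (a) Projected gradient: ascent step to G, then Euclidean projection of G.
p_pgd = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    G = p_pgd + 0.5 * grad(p_pgd)
    p_pgd = project_simplex(G)

# (b) Frank-Wolfe: linear minimization of <p, H> with H = -gradient, then averaging.
p_fw = np.array([1.0, 0.0, 0.0])
for t in range(2000):
    s = lmo_simplex(-grad(p_fw))
    step = 2.0 / (t + 2.0)
    p_fw = (1.0 - step) * p_fw + step * s   # convex combination stays feasible
```

On the simplex both subproblems have cheap solutions. On the Wasserstein constraint set neither does, which is why the dimension of $\Pi$ becomes the bottleneck.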

slide-17
SLIDE 17

Solve Projection in PGD

$$\min_{\Pi \ge 0} \; \tfrac{1}{2} \|\Pi - G\|_F^2 \quad \text{subject to} \quad \Pi \mathbf{1} = x, \; \langle \Pi, C \rangle \le \epsilon$$

The Lagrange dual can be simplified to a univariate problem:

$$\max_{\lambda \ge 0} \; g(\lambda)$$

No closed-form expression... But $g'(\lambda)$ can be evaluated in $O(n^2 \log n)$ time.

Proposition

$$0 \le \lambda^\star \le \frac{2 \|\mathrm{vec}(G)\|_\infty + \|x\|_\infty}{\min_{i \ne j} C_{ij}}$$

slide-21
SLIDE 21

Bisection on the Dual

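$$\max_{\lambda \ge 0} \; g(\lambda)$$

Converges to high precision in at most 20 iterations in practice.

[Figure: plot of $g(\lambda)$ against $\lambda$, marking the maximizer $\lambda^\star$ and the upper bound $\frac{2 \|\mathrm{vec}(G)\|_\infty + \|x\|_\infty}{\min_{i \ne j} C_{ij}}$]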
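A rough sketch of how bisection on the dual could look is given below. The slides do not spell out $g$, so this relies on an assumption: for a fixed $\lambda$, the inner minimization over $\Pi$ decouples into row-wise projections of $G - \lambda C$ onto scaled simplices, and $g'(\lambda) = \langle \Pi(\lambda), C \rangle - \epsilon$. The sorting in each row accounts for the $O(n^2 \log n)$ cost. The upper end of the search interval is the bound from the Proposition, which requires a zero diagonal and positive off-diagonal entries in $C$. The function names and test data are hypothetical.

```python
import numpy as np

def project_rows(V, x):
    """Project row i of V onto {p >= 0, sum(p) = x[i]} by sorting each row."""
    n = V.shape[1]
    U = -np.sort(-V, axis=1)                       # each row in descending order
    css = np.cumsum(U, axis=1) - x[:, None]
    cond = U - css / np.arange(1, n + 1) > 0       # holds on a prefix of each row
    rho = np.maximum(cond.sum(axis=1) - 1, 0)      # last index where cond holds
    tau = css[np.arange(V.shape[0]), rho] / (rho + 1)
    P = np.maximum(V - tau[:, None], 0.0)
    P[x <= 0] = 0.0                                # pixels without mass send nothing
    return P

def dual_projection(G, x, C, eps, iters=50):
    """Bisection on lambda until the row-projected plan meets <Pi, C> <= eps."""
    def cost(lam):
        return np.sum(project_rows(G - lam * C, x) * C)

    if cost(0.0) <= eps:                           # budget inactive, lambda* = 0
        return project_rows(G, x)
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]
    lo = 0.0
    hi = (2 * np.abs(G).max() + np.abs(x).max()) / off_diag.min()  # bound from the Proposition
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if cost(lam) > eps:                        # g'(lam) > 0: the maximizer lies to the right
            lo = lam
        else:
            hi = lam
    return project_rows(G - hi * C, x)             # hi always satisfies the budget

# Hypothetical test problem: 6 pixels on a line, cost |i - j|.
rng = np.random.default_rng(0)
n = 6
x = rng.random(n) + 0.1
x /= x.sum()
idx = np.arange(n)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)
G = np.diag(x) + 0.5 * rng.standard_normal((n, n))
Pi = dual_projection(G, x, C, eps=0.05)
```

Returning the plan at the upper end of the final interval keeps the result within the budget even if the bisection is stopped early.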

slide-25
SLIDE 25

Solve Linear Minimization in Frank-Wolfe

$$\min_{\Pi \ge 0} \; \langle \Pi, H \rangle \quad \text{subject to} \quad \Pi \mathbf{1} = x, \; \langle \Pi, C \rangle \le \epsilon$$

The Lagrange dual can be simplified to a univariate problem:

$$\max_{\lambda \ge 0} \; g(\lambda)$$

Bound on the optimum:

$$0 \le \lambda^\star \le \frac{2 \|\mathrm{vec}(H)\|_\infty}{\min_{i \ne j} C_{ij}}$$

Does not work...
- difficult to recover primal solution
- severe numerical instability

slide-29
SLIDE 29

Entropic Regularization

$$\min_{\Pi \ge 0} \; \langle \Pi, H \rangle + \gamma \sum_{i=1}^{n} \sum_{j=1}^{n} \Pi_{ij} \log \Pi_{ij} \quad \text{subject to} \quad \Pi \mathbf{1} = x, \; \langle \Pi, C \rangle \le \epsilon$$

- Closed-form expression to recover primal solution
- Entropic regularization introduces approximation error
- But the approximation error is guaranteed to be small
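One way to see where a closed form could come from: for a fixed multiplier $\lambda$ on the budget constraint, setting the gradient of the Lagrangian to zero under the row constraint $\Pi \mathbf{1} = x$ gives $\Pi_{ij} \propto \exp(-(H_{ij} + \lambda C_{ij}) / \gamma)$, normalized so that row $i$ sums to $x_i$. The sketch below uses that expression and then searches for $\lambda$ by bisection. The bracketing by doubling, the value of `gamma` and the test data are assumptions of this sketch, not the bound or settings from the talk.

```python
import numpy as np

def entropic_plan(H, C, x, lam, gamma):
    """For fixed lam, row i of the minimizer is x[i] * softmax(-(H + lam * C)[i] / gamma)."""
    Z = -(H + lam * C) / gamma
    Z -= Z.max(axis=1, keepdims=True)          # stabilize the exponentials
    E = np.exp(Z)
    return x[:, None] * E / E.sum(axis=1, keepdims=True)

def dual_lmo(H, C, x, eps, gamma=0.01, iters=50):
    """Bisection on lam so that the closed-form plan meets <Pi, C> <= eps."""
    def cost(lam):
        return np.sum(entropic_plan(H, C, x, lam, gamma) * C)

    if cost(0.0) <= eps:                       # budget inactive
        return entropic_plan(H, C, x, 0.0, gamma)
    hi = 1.0
    while cost(hi) > eps:                      # assumed bracketing strategy: keep doubling
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if cost(lam) > eps:
            lo = lam
        else:
            hi = lam
    return entropic_plan(H, C, x, hi, gamma)   # hi always satisfies the budget

# Hypothetical test problem: 5 pixels on a line, cost |i - j|.
rng = np.random.default_rng(1)
n = 5
x = rng.random(n) + 0.1
x /= x.sum()
idx = np.arange(n)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)
H = rng.standard_normal((n, n))
Pi = dual_lmo(H, C, x, eps=0.05)
```

Because every entry of the softmax is strictly positive, the plan is never exactly sparse. Smaller `gamma` brings it closer to the unregularized solution at the price of larger exponents.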

slide-30
SLIDE 30

Exploit Sparsity

- Local transportation constraint (Wong et al. 2019) ⇒ structured sparsity in $\Pi$
- Per-iteration cost is reduced to $O(n)$ by exploiting sparsity
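The storage argument can be sketched as follows: if each pixel may only send mass to pixels inside a $k \times k$ window around it, the plan fits in an array with $n k^2$ entries instead of $n^2$. The window size `k = 5`, the Euclidean offset cost, the 28 × 28 image and the helper names are assumptions made for this illustration.

```python
import numpy as np

def local_cost(k):
    """Cost of each offset in a k x k window: Euclidean distance to the centre (assumed)."""
    r = k // 2
    d = np.arange(k) - r
    return np.sqrt(d[:, None] ** 2 + d[None, :] ** 2)

def recover_image(P):
    """x_adv = Pi^T 1 for a local plan, where P[i, j, a, b] is the mass sent
    from pixel (i, j) to pixel (i + a - r, j + b - r)."""
    h, w, k, _ = P.shape
    r = k // 2
    out = np.zeros((h + 2 * r, w + 2 * r))     # padded target image
    for a in range(k):
        for b in range(k):
            out[a:a + h, b:b + w] += P[:, :, a, b]
    return out[r:r + h, r:r + w]               # a valid plan sends no mass into the padding

h, w, k = 28, 28, 5
r = k // 2
x = np.full((h, w), 1.0 / (h * w))             # hypothetical uniform image

P = np.zeros((h, w, k, k))
P[:, :, r, r] = x                              # plan that moves nothing
P[0, 0, r, r] -= 0.001
P[0, 0, r, r + 1] += 0.001                     # move 0.001 mass from pixel (0, 0) to (0, 1)

x_adv = recover_image(P)
cost = np.sum(P * local_cost(k))               # <Pi, C> restricted to the window
dense_entries = (h * w) ** 2                   # size of a full n x n plan
sparse_entries = P.size                        # n * k^2 entries
```

For a fixed window size every operation touches $n k^2$ entries, which is linear in the number of pixels $n$.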

slide-32
SLIDE 32

Comparison

[Figure: adversarial accuracy on CIFAR-10 (standard training) for $\epsilon$ = 0.001, 0.002, 0.003, 0.004, 0.005, comparing Wong et al. (2019), Dual Proj. (ours) and Dual LMO (ours)]

[Figure: time per iteration in ms against iterations]

slide-35
SLIDE 35

Entropic Regularization Reflects Shapes


slide-36
SLIDE 36

Scalable to High Dimensional Data


slide-37
SLIDE 37

Improved Adversarial Training

Stronger attacks improve adversarial training!

[Figure: adversarial accuracy of models on CIFAR-10 (adversarial training) for $\epsilon$ = 0.001, 0.002, 0.003, 0.004, 0.005, comparing Wong et al. (2019) and FW + dual LMO (ours)]

slide-38
SLIDE 38

Summary

- PGD and Frank-Wolfe complement each other nicely
- PGD with dual projection is the strongest attack
- Frank-Wolfe with dual LMO is the fastest attack
- Improved adversarial training
- Applicable to any Wasserstein constrained optimization