

SLIDE 1

On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Christian Etmann∗,1,3, Sebastian Lunz∗,2, Peter Maass1, Carola-Bibiane Schönlieb2

13th June, 2019

1: ZeTeM, University of Bremen, 2: Cambridge Image Analysis, University of Cambridge, 3: Work done at Cambridge


SLIDE 2

Saliency Maps

[Diagram: input x → Conv → Conv → Conv → Conv → affine layer → logits Ψ(x)]

For a logit Ψ_i(x), we call its gradient ∇Ψ_i(x) the saliency map at x. It should highlight the discriminative portions of the image.

[Figures: original image and the saliency map of a ResNet50]
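In code, the saliency map is one backward pass to the input. A minimal sketch of the definition above, assuming PyTorch and a pretrained torchvision ResNet-50 (the random tensor is just a stand-in for a preprocessed image):

```python
import torch
from torchvision import models

# Saliency map: gradient of the predicted logit Psi_i with respect to the input x.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a real image
logits = model(x)                                   # Psi(x), shape (1, 1000)
i = logits.argmax()                                 # predicted class F(x)
logits[0, i].backward()                             # computes d(Psi_i)/dx

saliency = x.grad[0].abs().max(dim=0).values        # (224, 224) map for display
```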



SLIDE 3

An Unexplained Phenomenon

Models trained to be more robust to adversarial attacks seem to exhibit 'interpretable' saliency maps.¹

[Figures: original image and the saliency map of an adversarially robustified ResNet50]

This phenomenon has a remarkably simple explanation!

¹ Tsipras et al., 2019: 'Robustness may be at odds with accuracy.'



SLIDE 4

Explaining the Interpretability Puzzle

We call

ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) }

the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).

  • Adversarial attacks are tiny perturbations that ’fool’ the classifier
  • A higher robustness to these attacks ⇒ greater distance to the decision boundary
  • A larger distance to the decision boundary results in a smaller angle between x and ∇Ψ_i(x)

  • We perceive this as a higher visual alignment between image and saliency map

. . . but not quite
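As an aside, ρ(x) can be bracketed numerically by bisecting the radius of an attack until the prediction flips. A toy sketch (all names are ours; a linear model stands in for the network so the answer can be checked in closed form):

```python
import numpy as np

# Bracket rho(x) = inf{ ||e|| : F(x+e) != F(x) } by bisecting the length of a
# normalized gradient step until the predicted class flips.
rng = np.random.default_rng(0)
z, b = rng.normal(size=10), 0.3                   # toy affine binary model
psi = lambda x: x @ z + b                         # logit Psi(x)
F = lambda x: 1 if psi(x) > 0 else -1             # classifier F(x)

def attack(x, eps):
    # Step of length eps against the current decision; grad(psi) = z here.
    return x - eps * F(x) * z / np.linalg.norm(z)

def robustness(x, hi=100.0, tol=1e-8):
    lo = 0.0
    while hi - lo > tol:                          # bisect the flipping radius
        mid = 0.5 * (lo + hi)
        if F(attack(x, mid)) == F(x):
            lo = mid                              # still the same class
        else:
            hi = mid                              # flipped: shrink from above
    return hi

x = rng.normal(size=10)
print(robustness(x))                              # matches the exact value
print(abs(psi(x)) / np.linalg.norm(z))            # |psi(x)| / ||z||
```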


SLIDE 5

A Simple Toy Example

[Diagram: the input x and the saliency direction z = ∇Ψ(x)]

First, we consider a linear, binary classifier F(x) = sgn(Ψ(x)), where Ψ(x) := ⟨x, z⟩ for some z. Then

ρ(x) = |⟨x, z⟩| / ‖z‖ = |⟨x, ∇Ψ(x)⟩| / ‖∇Ψ(x)‖.

Note that ρ(x) = ‖x‖ · |cos(δ)|, where δ is the angle between x and z.
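The identity holds because the minimal perturbation is the orthogonal projection of x onto the decision hyperplane ⟨y, z⟩ = 0; spelled out (a worked step, not on the slide):

```latex
\rho(x) = \operatorname{dist}\bigl(x,\{y : \langle y, z\rangle = 0\}\bigr)
        = \frac{|\langle x, z\rangle|}{\lVert z\rVert}
        = \frac{\lVert x\rVert\,\lVert z\rVert\,\lvert\cos\delta\rvert}{\lVert z\rVert}
        = \lVert x\rVert\,\lvert\cos\delta\rvert .
```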




SLIDE 6

Alignment

Definition (Alignment)
Let Ψ = (Ψ_1, . . . , Ψ_n) : X → R^n be differentiable at x. Then, for an n-class classifier defined a.e. by F(x) = arg max_i Ψ_i(x), we call ∇Ψ_{F(x)} the saliency map of F. We further call

α(x) := |⟨x, ∇Ψ_{F(x)}(x)⟩| / ‖∇Ψ_{F(x)}(x)‖

the alignment with respect to Ψ at x.

For binary, linear models, by construction ρ(x) = α(x) . . . but this is already wrong for affine models.
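A sketch of α(x) via automatic differentiation (our illustrative helper, not the paper's code; `model` maps a batch of inputs to logits):

```python
import torch

def alignment(model, x):
    # alpha(x) = |<x, grad Psi_F(x)(x)>| / ||grad Psi_F(x)(x)||, per sample.
    x = x.detach().requires_grad_(True)
    logits = model(x)                             # Psi(x), shape (B, n)
    top = logits.gather(1, logits.argmax(dim=1, keepdim=True))  # Psi_F(x)(x)
    top.sum().backward()                          # per-sample input gradients
    g = x.grad.flatten(1)                         # saliency maps, shape (B, d)
    inner = (x.detach().flatten(1) * g).sum(dim=1).abs()
    return inner / g.norm(dim=1)
```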


SLIDE 7

How about neural nets?

There is no closed expression for the robustness of a neural network. One idea is to linearize.

Definition (Linearized Robustness)
Let Ψ(x) be the differentiable score vector of the classifier F at x. We call

ρ̃(x) := min_{j ≠ i*} (Ψ_{i*}(x) − Ψ_j(x)) / ‖∇Ψ_{i*}(x) − ∇Ψ_j(x)‖

the linearized robustness at x, where i* := F(x) is the predicted class at the point x.
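Computed directly from the definition, this needs one input gradient per logit, so it is only practical for small class counts. A sketch with illustrative names:

```python
import torch

def linearized_robustness(model, x):
    # rho~(x) = min_{j != i*} (Psi_i*(x) - Psi_j(x)) / ||grad Psi_i* - grad Psi_j||
    x = x.detach().requires_grad_(True)
    logits = model(x)[0]                          # Psi(x) for a single sample
    i_star = int(logits.argmax())                 # i* = F(x)
    grads = [torch.autograd.grad(l, x, retain_graph=True)[0] for l in logits]
    return min(
        float(logits[i_star] - logits[j]) / float((grads[i_star] - grads[j]).norm())
        for j in range(len(logits)) if j != i_star
    )
```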


SLIDE 8

Bridging the Gap Between Linearized Robustness and Alignment

Using

  • a homogeneous decomposition theorem
  • the ’binarization’ of our classifier

we get

Theorem (Bound for general models)
Let g := ∇Ψ_{i*}(x). Furthermore, let g† := ∇Ψ†_x(x) and let β† be the non-homogeneous portion of Ψ†_x. Denote by v̄ the ‖·‖-normalized version of v ≠ 0. Then

ρ̃(x) ≤ α(x) + ‖x‖ · ‖ḡ† − ḡ‖ + |β†| / ‖g†‖.
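For a two-class affine model the bound can be checked exactly: under the reading that Ψ†_x is the binarized score Ψ_{i*} − Ψ_j, g† is a weight difference and β† a bias difference. A small numeric sanity check under that assumption (illustrative, not the paper's experiment):

```python
import numpy as np

# Check rho~(x) <= alpha(x) + ||x|| * ||g_bar_dag - g_bar|| + |beta_dag| / ||g_dag||
# on a random two-class affine model, where every quantity has a closed form.
rng = np.random.default_rng(1)
W, b = rng.normal(size=(2, 10)), rng.normal(size=2)   # logits Psi(x) = W x + b
unit = lambda v: v / np.linalg.norm(v)                # the "bar" normalization

for _ in range(1000):
    x = rng.normal(size=10)
    s = W @ x + b
    i, j = (0, 1) if s[0] > s[1] else (1, 0)          # i = predicted class i*
    g = W[i]                                          # grad Psi_i*
    g_dag = W[i] - W[j]                               # grad of binarized score
    beta_dag = b[i] - b[j]                            # its non-homogeneous part
    rho = (s[i] - s[j]) / np.linalg.norm(g_dag)       # linearized robustness
    alpha = abs(x @ g) / np.linalg.norm(g)            # alignment
    bound = (alpha + np.linalg.norm(x) * np.linalg.norm(unit(g_dag) - unit(g))
             + abs(beta_dag) / np.linalg.norm(g_dag))
    assert rho <= bound + 1e-9
print("bound holds on 1000 random samples")
```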


SLIDE 9

Experiments: Robustness vs. Alignment

[Plots (ImageNet, MNIST): median alignment M[α(x)] against median robustness M[ρ(x)], with robustness measured via Gradient Attack, Projected Gradient Descent, Carlini-Wagner, and the linearized robustness]

  • Linearized robustness is a reasonable approximation
  • Alignment increases with robustness
  • Superlinear growth for ImageNet and saturating effect on MNIST


SLIDE 10

Experiments: Explaining the Observations

[Plots (ImageNet, MNIST): the fraction of the homogeneous part of the logit, M[|⟨x, g†⟩|] / M[|Ψ†(x)|], against the median linearized robustness M[ρ̃(x)]]

  • The degree of homogeneity largely determines how strong the connection between α and ρ̃ is

  • ImageNet: higher robustness + more homogeneity = superlinear growth
  • MNIST: higher robustness + less homogeneity = effects start cancelling out


SLIDE 11

On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Thank you and see you at the poster! Pacific Ballroom, #70
