
On the Connection Between Adversarial Robustness and Saliency Map Interpretability



1. On the Connection Between Adversarial Robustness and Saliency Map Interpretability. Christian Etmann* (1,3), Sebastian Lunz* (2), Peter Maass (1), Carola-Bibiane Schönlieb (2). 13th June 2019. 1: ZeTeM, University of Bremen; 2: Cambridge Image Analysis, University of Cambridge; 3: Work done at Cambridge.

2. Saliency Maps. For a logit $\Psi_i(x)$, we call its gradient $\nabla \Psi_i(x)$ the saliency map in $x$. It should show us the discriminative portions of the image (a short code sketch follows this slide). [Figure: a convolutional network (Conv layers followed by an affine layer) computing $\Psi(x)$; an original image next to the saliency map of a ResNet50.]
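To make the definition concrete, here is a minimal sketch (my own illustration, not code from the talk) of computing a saliency map in PyTorch; the helper name `saliency_map` and the use of a torchvision ResNet50 are assumptions.

```python
# Minimal sketch, assuming PyTorch/torchvision are available; `saliency_map` is a
# hypothetical helper, not code from the talk.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()

def saliency_map(model, x):
    """Gradient of the predicted-class logit Psi_{F(x)} with respect to the input x."""
    x = x.detach().clone().requires_grad_(True)   # x: normalized image, shape (1, 3, H, W)
    logits = model(x)                              # Psi(x), shape (1, n_classes)
    i_star = logits.argmax(dim=1).item()           # predicted class F(x)
    logits[0, i_star].backward()                   # d Psi_{i*}(x) / d x
    return x.grad.detach()                         # the saliency map in x
```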

3. An Unexplained Phenomenon. Models trained to be more robust to adversarial attacks seem to exhibit 'interpretable' saliency maps [1]. [Figure: an original image next to the saliency map of a robustified ResNet50.] This phenomenon has a remarkably simple explanation!
[1] Tsipras et al., 2019: 'Robustness May Be at Odds with Accuracy.'

4. Explaining the Interpretability Puzzle. We call
$$\rho(x) = \inf_{e \in X} \{\|e\| : F(x+e) \neq F(x)\}$$
the adversarial robustness of the classifier $F$ (with respect to the Euclidean norm $\|\cdot\|$); a sketch of estimating it in practice follows after this slide.
• Adversarial attacks are tiny perturbations that 'fool' the classifier
• Higher robustness to these attacks ⇒ greater distance to the decision boundary
• A larger distance to the decision boundary results in a smaller angle between $x$ and $\nabla \Psi_i(x)$
• We perceive this as a higher visual alignment between image and saliency map ... but not quite
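In practice $\rho(x)$ cannot be computed exactly for a deep network; attacks only yield upper bounds, namely the norm of the smallest fooling perturbation they find. A rough sketch under that view (the helper name and this plain normalized-gradient attack are my own illustration, not the attacks evaluated in the paper):

```python
# Rough sketch, assuming PyTorch; `robustness_upper_bound` is a hypothetical helper and
# this simple normalized-gradient attack only illustrates the definition of rho(x).
import torch

def robustness_upper_bound(model, x, step=0.01, max_iter=200):
    """Upper-bound rho(x) by ||e|| for the first perturbation e that flips the prediction."""
    with torch.no_grad():
        i_star = model(x).argmax(dim=1)                      # original prediction F(x)
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.detach().clone()
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), i_star)                 # loss of the original class
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + step * grad / grad.norm()).detach() # step away from class i*
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() != i_star.item():
                return (x_adv - x).norm().item()             # ||e||: an upper bound on rho(x)
    return float('inf')                                      # no label flip within the budget
```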

5. A Simple Toy Example. First, we consider a linear, binary classifier $F(x) = \mathrm{sgn}(\Psi(x))$, where $\Psi(x) := \langle x, z \rangle$ for some $z$. Then
$$\rho(x) = \frac{|\langle x, z \rangle|}{\|z\|} = \frac{|\langle x, \nabla\Psi(x) \rangle|}{\|\nabla\Psi(x)\|}.$$
Note that $\rho(x) = \|x\| \cdot |\cos(\delta)|$, where $\delta$ is the angle between $x$ and $z$ (checked numerically below). [Figure: two sketches of the same point $x$ and decision boundary, one annotated with $z$, the other with $\nabla\Psi(x)$.]
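A quick numerical sanity check of the toy formula, with arbitrary two-dimensional numbers chosen only for illustration (NumPy):

```python
# Tiny sanity check of the linear toy example: for Psi(x) = <x, z>,
# rho(x) = |<x, z>| / ||z|| = ||x|| * |cos(delta)|.
import numpy as np

x = np.array([2.0, 1.0])
z = np.array([1.0, 3.0])                               # weight vector; grad Psi(x) = z everywhere

rho = abs(x @ z) / np.linalg.norm(z)                   # distance to the decision boundary
cos_delta = (x @ z) / (np.linalg.norm(x) * np.linalg.norm(z))
print(rho, np.linalg.norm(x) * abs(cos_delta))         # both print ~1.5811
```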

6. Alignment. Definition (Alignment). Let $\Psi = (\Psi_1, \dots, \Psi_n) : X \to \mathbb{R}^n$ be differentiable in $x$. Then for an $n$-class classifier defined a.e. by $F(x) = \arg\max_i \Psi_i(x)$, we call $\nabla \Psi_{F(x)}$ the saliency map of $F$. We further call
$$\alpha(x) := \frac{|\langle x, \nabla\Psi_{F(x)}(x) \rangle|}{\|\nabla\Psi_{F(x)}(x)\|}$$
the alignment with respect to $\Psi$ in $x$ (a code sketch follows this slide). For binary, linear models by construction $\rho(x) = \alpha(x)$ ... but this already fails for affine models.
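With the `saliency_map` sketch from earlier, the alignment is just an inner product divided by a norm; again a hypothetical helper, offered only as an illustration of the definition:

```python
# Sketch of alpha(x), reusing the hypothetical `saliency_map` helper from above.
def alignment(model, x):
    """alpha(x) = |<x, grad Psi_{F(x)}(x)>| / ||grad Psi_{F(x)}(x)||."""
    g = saliency_map(model, x)                    # saliency map: gradient of the winning logit
    num = (x.flatten() @ g.flatten()).abs()       # |<x, grad Psi_{F(x)}(x)>|
    return (num / g.flatten().norm()).item()      # alignment in x
```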

7. How about neural nets? There is no closed-form expression for the robustness. One idea is to linearize. Definition (Linearized Robustness). Let $\Psi(x)$ be the differentiable score vector for the classifier $F$ in $x$. We call
$$\tilde\rho(x) := \min_{j \neq i^*} \frac{\Psi_{i^*}(x) - \Psi_j(x)}{\|\nabla\Psi_{i^*}(x) - \nabla\Psi_j(x)\|}$$
the linearized robustness in $x$, where $i^* := F(x)$ is the predicted class at the point $x$.
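The linearized robustness only needs the logits and their input gradients, so it can be computed directly from the definition. A sketch (PyTorch; hypothetical helper name; looping over every class is slow for ImageNet-sized outputs but mirrors the formula):

```python
# Sketch of the linearized robustness, assuming PyTorch; `linearized_robustness`
# is a hypothetical helper that follows the definition literally.
import torch

def linearized_robustness(model, x):
    """min over j != i* of (Psi_{i*}(x) - Psi_j(x)) / ||grad Psi_{i*}(x) - grad Psi_j(x)||."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)[0]                           # Psi(x), shape (n_classes,)
    i_star = logits.argmax().item()                # predicted class F(x)
    grads = []
    for i in range(logits.shape[0]):               # gradient of every logit w.r.t. x
        g, = torch.autograd.grad(logits[i], x, retain_graph=True)
        grads.append(g.flatten())
    best = float('inf')
    for j in range(logits.shape[0]):
        if j == i_star:
            continue
        margin = (logits[i_star] - logits[j]).item()
        denom = (grads[i_star] - grads[j]).norm().item()
        best = min(best, margin / denom)
    return best                                    # the linearized robustness in x
```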

8. Bridging the Gap Between Linearized Robustness and Alignment. Using
• a homogeneous decomposition theorem
• the 'binarization' of our classifier
we get:
Theorem (Bound for general models). Let $g := \nabla\Psi_{i^*}(x)$. Furthermore, let $g^\dagger := \nabla\Psi^\dagger_x(x)$ and let $\beta^\dagger$ be the non-homogeneous portion of $\Psi^\dagger_x$. Denote by $\bar v$ the $\|\cdot\|$-normalized version of $v \neq 0$. Then
$$\tilde\rho(x) \leq \alpha(x) + \|x\| \cdot \|\bar g^\dagger - \bar g\| + \frac{|\beta^\dagger|}{\|g^\dagger\|}.$$

9. Experiments: Robustness vs. Alignment. [Figure: $M[\alpha(x)]$ plotted against $M[\rho(x)]$ for ImageNet and MNIST; robustness measured via Gradient Attack, Projected Gradient Descent, Carlini-Wagner, and the Linearized Robustness.]
• Linearized robustness is a reasonable approximation
• Alignment increases with robustness
• Superlinear growth for ImageNet and a saturating effect on MNIST

10. Experiments: Explaining the Observations. [Figure: the fraction of the homogeneous part of the logit, $M[|\langle x, g^\dagger \rangle|] / M[|\Psi^\dagger(x)|]$, plotted against $M[\tilde\rho(x)]$ for ImageNet and MNIST.]
• The degree of homogeneity largely determines how strong the connection between $\alpha$ and $\tilde\rho$ is
• ImageNet: higher robustness + more homogeneity = superlinear growth
• MNIST: higher robustness + less homogeneity = the effects start cancelling out

11. On the Connection Between Adversarial Robustness and Saliency Map Interpretability. Thank you and see you at the poster! Pacific Ballroom, #70
