
Adversarial examples are not mysterious, generalization is (PowerPoint presentation by Angus Galloway). Sections: Introduction, Theory, Trade-offs, Practical Attacks, DNNs.



1. Adversarial examples are not mysterious, generalization is
Angus Galloway, University of Guelph, gallowaa@uoguelph.ca

2. THE ADVERSARIAL EXAMPLES PHENOMENON
Machine learning models generalize well to an unseen test set, yet every input of a particular class is extremely close to an input of another class.
"Accepted" informal definition: any input designed to fool a machine learning system.

3. FORMAL DEFINITIONS
A "misclassification" adversarial candidate x̂ for a neural network F is constructed from an input x via some perturbation δ:
x̂ = x + δ,
where δ is usually derived from the gradient of the loss ∇L(θ, y, x) w.r.t. x and is bounded by some small scalar ε, ‖δ‖_p ≤ ε, p ∈ {1, 2, ∞}, such that F(x) ≠ F(x̂).
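The most common instantiation of this recipe is the fast gradient sign method (FGSM) with p = ∞. Below is a minimal, illustrative sketch assuming a TensorFlow 2 classifier `model` that returns logits, integer labels `y`, and inputs scaled to [0, 1]; the function name and the value of ε are placeholders, not taken from the slides.

```python
import tensorflow as tf

def fgsm_candidate(model, x, y, epsilon=0.03):
    """Craft an adversarial candidate x_hat = x + delta with ||delta||_inf <= epsilon."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        logits = model(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y, logits, from_logits=True)
    grad = tape.gradient(loss, x)                 # gradient of the loss w.r.t. the input
    delta = epsilon * tf.sign(grad)               # L-infinity-bounded perturbation
    return tf.clip_by_value(x + delta, 0.0, 1.0)  # keep pixels in the valid range
```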

4. GOODFELLOW ET AL. 2015
For an input x ∈ ℝⁿ, there is an adversarial example x̃ = x + η subject to the constraint ‖η‖_∞ < ε. The dot product between a weight vector w and the adversarial example x̃ is then
wᵀx̃ = wᵀx + wᵀη.
If the elements of w have average magnitude m, the activation grows linearly in εmn.
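For reference, the εmn claim follows in one line once the perturbation is taken to be η = ε·sign(w), as in the original paper; here m denotes the average magnitude of an element of w:

```latex
w^\top \tilde{x} = w^\top x + w^\top \eta,
\qquad
\eta = \epsilon \,\operatorname{sign}(w)
\;\Longrightarrow\;
w^\top \eta = \epsilon \sum_{i=1}^{n} |w_i| = \epsilon m n .
```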

5. TANAY & GRIFFIN 2016
But both wᵀx and wᵀη grow linearly with the dimension n, provided that the distributions of w and x do not change.

6. THE BOUNDARY TILTING PERSPECTIVE
[Figure, panels (a) and (b): image space containing the submanifold of sampled data; annotations read "dense distribution of 'low probability pockets'" and "the boundary is 'outside the box'".]
Recall the manifold learning hypothesis: a training-data sub-manifold exists with finite topological dimension f ≪ n.

7. [Figure, panels (c) and (d): boundary-tilting schematic with classes I and J, samples i and j, and the margins m(i, B), m(i, C), m(I, B), m(I, C).]

8. TAXONOMY
[Figure: taxonomy of boundary tilting. Annotations include "tilting of L", "poorly performing classifiers (0 < v_z ≪ 1)", "optimal classifiers (low reg.) (v_z = 0)", tilting Types 0, 1, and 2, error levels from the minimum up to 0.5, and tilting angles from 0 to π/2.]

9. ATTACKING BINARIZED NEURAL NETWORKS
[Diagram: a full-precision Conv2D block and a scalar binary Conv2D block, each with batch norm, ReLU, and tf.sign() binarization.]
The empirical observation is that BNNs with low-precision weights and activations are at least as robust as their full-precision counterparts.

10. ATTACKING BINARIZED NEURAL NETWORKS (2)
Possible explanations:
1. A regularizing effect due to the decoupling between the continuous parameters that are updated and the quantized parameters used in the forward pass, together with the biased gradient estimator (the straight-through estimator, STE?).
2. A better trade-off on the information bottleneck (IB) curve in the over-parameterized regime, achieved by discarding irrelevant information.
[Figure panels (e) and (f).]
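A minimal sketch of the mechanism in point 1, assuming TensorFlow 2: the forward pass binarizes weights and activations with tf.sign(), gradients pass through a straight-through estimator (STE), and the optimizer updates the retained full-precision kernel. Layer and function names are illustrative, not the exact implementation behind the slides.

```python
import tensorflow as tf

@tf.custom_gradient
def binarize(x):
    """Forward: tf.sign(x). Backward: straight-through estimator that passes
    the gradient where |x| <= 1 and blocks it elsewhere (a biased estimator)."""
    def grad(dy):
        return dy * tf.cast(tf.abs(x) <= 1.0, dy.dtype)
    return tf.sign(x), grad

class BinaryConv2D(tf.keras.layers.Layer):
    """Convolution that binarizes inputs and weights in the forward pass while
    keeping a full-precision kernel for the update (the decoupling in point 1)."""
    def __init__(self, filters, kernel_size, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.filters, self.kernel_size, self.strides = filters, kernel_size, strides

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name="kernel",
            shape=(self.kernel_size, self.kernel_size, int(input_shape[-1]), self.filters),
            initializer="glorot_uniform",
            trainable=True)

    def call(self, x):
        x_b = binarize(x)            # binarized activations
        w_b = binarize(self.kernel)  # binarized weights; the full-precision copy is what gets updated
        return tf.nn.conv2d(x_b, w_b, strides=self.strides, padding="SAME")
```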

11. WHY ONLY CONSIDER SMALL PERTURBATIONS?
Fault-tolerant engineering design: we want performance degradation to be proportional to the perturbation magnitude, regardless of an attacker's strategy.

12. [Plot: accuracy (%) vs. shift magnitude (0.1 to 0.9); series labelled "Train w/PGD" and "Natural".]

13. HUMAN-DRIVEN ATTACKS
[Figure panels (g) and (h).]

14. A PRACTICAL BLACK-BOX ATTACK

15. TRADE-OFFS
[Plots (i) and (j): accuracy (%) vs. number of pixels changed (0 to 50), and accuracy (%) vs. FGSM attack epsilon (0.0 to 0.5); series labels include "Expert-L2", "Natural", and "FGSM".]

16. INTERPRETABILITY OF LOGISTIC REGRESSION
[Figure panels (k), (l), (m), (n).]

17. CANDIDATE EXAMPLES
[Figure panels (o), (p), (q), (r), (s).]

18. CIFAR-10 ARCHITECTURE
Table: simple fully-convolutional architecture adapted from the CleverHans library. The model uses ReLU activations, and does not use batch normalization or pooling.

Layer   h  w  c_in  c_out  s  params
Conv1   8  8  3     32     2  6.1k
Conv2   6  6  32    64     2  73.7k
Conv3   5  5  64    64     1  102.4k
Fc1     1  1  256   10     1  2.6k
Total   -  -  -     -      -  184.8k

The model has 0.4% as many parameters as a WideResNet.
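As a sanity check, the table can be written down directly in Keras. The padding choices below ("same" for Conv1, "valid" for Conv2 and Conv3) are an assumption based on the CleverHans model the slide cites; with them, the layer shapes and parameter counts match the table (about 185k including biases).

```python
import tensorflow as tf

def simple_cifar10_cnn(num_classes=10):
    """Fully-convolutional CIFAR-10 model from the table: three ReLU convolutions,
    no batch normalization or pooling, then a 10-way linear layer."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 8, strides=2, padding="same", activation="relu"),   # Conv1, ~6.1k params
        tf.keras.layers.Conv2D(64, 6, strides=2, padding="valid", activation="relu"),  # Conv2, ~73.7k params
        tf.keras.layers.Conv2D(64, 5, strides=1, padding="valid", activation="relu"),  # Conv3, ~102.4k params
        tf.keras.layers.Flatten(),                                                     # 2 x 2 x 64 = 256 features
        tf.keras.layers.Dense(num_classes),                                            # Fc1, ~2.6k params
    ])

# simple_cifar10_cnn().summary()  # ~185k trainable parameters in total
```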

19. L∞ ADVERSARIAL EXAMPLES
[Plot: accuracy (%) vs. epsilon (25 to 100) for WRN FGSM, WRN PGD, CNN-L2 FGSM, CNN-L2 PGD, and WRN-Nat PGD.]

20. ROBUSTNESS
[Plot: accuracy (%) vs. fraction of pixels swapped (0.2 to 0.8) for WRN, CNN, CNN-L2, and WRN-Nat.]

21. NOISY EXAMPLES

22. WITH L2 WEIGHT DECAY
The "independent components" of natural scenes are edge filters (Bell & Sejnowski, 1997).
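For context, L2 weight decay of the kind referred to here is typically added per layer in Keras via a kernel regularizer; the coefficient below is a placeholder, not the value used in the experiments.

```python
import tensorflow as tf

# Hypothetical coefficient; the slides do not state the value used.
conv1 = tf.keras.layers.Conv2D(
    32, 8, strides=2, padding="same", activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))  # adds 1e-4 * sum(W**2) to the training loss
```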

23. WITHOUT WEIGHT DECAY

24. FOOLING IMAGES
"4 years ago I didn't think small-perturbation adversarial examples were going to be so hard to solve. I thought after another n months of working on those, I'd be basically done with them and would move on to fooling attacks."
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images (CVPR 2015).

25. FOOLING IMAGES (CIFAR-10)

26. FOOLING IMAGES (SVHN)

27. FOOLING IMAGES (SVHN)
The robust training procedure does not learn random labels (lower Rademacher complexity).

28. FOOLING IMAGES
[Plot: attack success rate (ASR) and margin (M) vs. epsilon (0 to 250) for WRN and CNN-L2.]

29. DIVIDE AND CONQUER?
Image from Dube (2018).

30. REMARKS
◮ Test accuracy on popular ML benchmarks is a weak measure of generalization.
◮ The plethora of band-aid fixes to standard DNNs does not yield compelling results (e.g. the provably robust framework).
◮ Incorporate expert knowledge, e.g. by explicitly modeling part-whole relationships and other priors that relate to known causal features, such as edges in natural scenes.
◮ Good generalization implies some level of privacy, and more "fair" models, assuming the original intent is fair.

31. FUTURE WORK
Information bottleneck (IB) theory seems essential for efficiently learning robust models from finite data. But why do models with no bottleneck generalize well on common machine learning datasets? i-RevNet retains all information until the final layer and achieves high accuracy, yet is extremely sensitive to adversarial examples.
