Adversarial examples are not mysterious, generalization is
Angus Galloway, University of Guelph, gallowaa@uoguelph.ca


SLIDE 1

Adversarial examples are not mysterious, generalization is

Angus Galloway University of Guelph gallowaa@uoguelph.ca

SLIDE 2

THE ADVERSARIAL EXAMPLES PHENOMENON

Machine learning models generalize well to an unseen test set, yet nearly every input of a particular class lies extremely close to an input classified as another class. “Accepted” informal definition: any input designed to fool a machine learning system.

SLIDE 3

FORMAL DEFINITIONS

A “misclassification” adversarial candidate $\hat{x}$ for a neural network $F$ is obtained from an input $x$ via some perturbation $\delta$:

$$\hat{x} = x + \delta,$$

where $\delta$ is usually derived from the gradient of the loss $\nabla_x L(\theta, y, x)$ w.r.t. $x$ and, for some small scalar $\epsilon$, satisfies $\|\delta\|_p \leq \epsilon$, $p \in \{1, 2, \infty\}$, such that $F(\hat{x}) \neq F(x)$.
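A concrete instance of this definition is the fast gradient sign method (FGSM), which sets $\delta = \epsilon\,\mathrm{sign}(\nabla_x L)$ so that $\|\delta\|_\infty \leq \epsilon$. A minimal TensorFlow sketch (my illustration, not the author's code; the loss choice and the $[0, 1]$ pixel range are assumptions):

```python
import tensorflow as tf

def fgsm(model, x, y, eps=0.03):
    """One-step L-infinity attack: x_hat = x + eps * sign(grad_x L)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a plain tensor, so it must be watched explicitly
        loss = loss_fn(y, model(x))
    delta = eps * tf.sign(tape.gradient(loss, x))  # ||delta||_inf <= eps
    return tf.clip_by_value(x + delta, 0.0, 1.0)   # keep x_hat a valid image
```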

SLIDE 4

GOODFELLOW ET AL. 2015

For input $x \in \mathbb{R}^n$, there is an adversarial example $\tilde{x} = x + \eta$ subject to the constraint $\|\eta\|_\infty < \epsilon$. The dot product between a weight vector $w$ and the adversarial example $\tilde{x}$ is then

$$w^\top \tilde{x} = w^\top x + w^\top \eta.$$

If the elements of $w$ have mean absolute magnitude $m$, the activation grows linearly with $\epsilon m n$.
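The step that makes this explicit: under the $\ell_\infty$ constraint, the perturbation term is maximized by aligning $\eta$ with the signs of $w$,

$$\eta = \epsilon\,\operatorname{sign}(w) \;\Rightarrow\; w^\top \eta = \epsilon \sum_{i=1}^{n} |w_i| = \epsilon m n,$$

so for fixed $\epsilon$ the activation change grows with the dimensionality $n$, even though no single input component changes by more than $\epsilon$.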

SLIDE 5

TANAY & GRIFFIN 2016

But both $w^\top x$ and $w^\top \eta$ grow linearly with dimension $n$, provided that the distributions of $w$ and $x$ do not change.
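A small numeric check of this point (my illustration, assuming a toy setup in which $x$ carries signal aligned with $w$): both terms scale linearly with $n$, so the relative effect of the perturbation stays roughly constant rather than growing.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 1_000, 10_000):
    w = rng.normal(size=n)
    x = np.sign(w) + rng.normal(size=n)  # signal aligned with w, plus noise
    eta = 0.1 * np.sign(w)               # worst-case L-inf perturbation, eps=0.1
    # w@x and w@eta both grow ~linearly with n; their ratio stays near eps.
    print(f"n={n:6d}  w.x={w @ x:10.1f}  w.eta={w @ eta:8.1f}  "
          f"ratio={(w @ eta) / (w @ x):.3f}")
```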

SLIDE 6

THE BOUNDARY TILTING PERSPECTIVE

[Figure: two views of image space. (a) A dense distribution with “low probability pockets”. (b) The boundary is “outside the box”, relative to the submanifold of sampled data.]
Recall the manifold learning hypothesis: the training data lies on a sub-manifold of image space with finite topological dimension.

SLIDE 7

[Figure: (c), (d) boundary tilting illustrations: points i, j and their mirror images I, J, with nearest boundary points m(i, C), m(I, C) on boundary C and m(i, B), m(I, B) on boundary B.]

SLIDE 8

TAXONOMY

[Figure: taxonomy of classifiers by the tilting of the learned boundary L relative to the true boundary T, ranging from “optimal” classifiers (tilting of L with vz = 0, Type 0; low regularization) through Type 1 and Type 2 classifiers (tilting of L with 0 < vz << 1) to poorly performing classifiers near π/2.]

SLIDE 9

ATTACKING BINARIZED NEURAL NETWORKS

[Diagram: binarized network building blocks: Full-Precision Conv2D, Batch Norm, tf.sign(), Binary Conv2D, Scalar, ReLU, Batch Norm, tf.sign().]

Empirical observation: BNNs with low-precision weights and activations are at least as robust as their full-precision counterparts.

SLIDE 10

ATTACKING BINARIZED NEURAL NETWORKS (2)

Two candidate explanations (see the sketch after this list):

1. A regularizing effect due to the decoupling between the continuous parameters and the quantized parameters used in the forward pass, i.e., a biased gradient estimator (the straight-through estimator, STE?).

2. A better trade-off on the information bottleneck (IB) curve in the over-parameterized regime, achieved by discarding irrelevant information.
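A minimal sketch of the decoupling in point 1 (my illustration, not the author's code): tf.sign() is applied in the forward pass, while the straight-through estimator (STE) routes gradients back to the continuous latent weights as if the sign were the identity.

```python
import tensorflow as tf

@tf.custom_gradient
def binarize(x):
    """Forward: tf.sign(x). Backward: straight-through estimator,
    i.e. identity gradient, clipped to the region |x| <= 1."""
    def grad(dy):
        return dy * tf.cast(tf.abs(x) <= 1.0, x.dtype)
    return tf.sign(x), grad
```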


SLIDE 11

WHY ONLY CONSIDER SMALL PERTURBATIONS?

Fault-tolerant engineering design: we want performance degradation to be proportional to the perturbation magnitude, regardless of the attacker's strategy.

SLIDE 12

[Plot: Accuracy (%) vs. shift magnitude (0.1-0.9) for a naturally trained model vs. one trained with PGD.]

SLIDE 13

HUMAN-DRIVEN ATTACKS


SLIDE 14

A PRACTICAL BLACK-BOX ATTACK

SLIDE 15

TRADE-OFFS

[Plots: (i) Accuracy (%) vs. number of pixels changed (10-50); (j) Accuracy (%) vs. FGSM attack epsilon (0.0-0.5) for Expert-L2, Natural, and FGSM-trained models.]

SLIDE 16

INTERPRETABILITY OF LOGISTIC REGRESSION


SLIDE 17

CANDIDATE EXAMPLES


SLIDE 18

CIFAR-10 ARCHITECTURE

Table: Simple fully-convolutional architecture adapted from the CleverHans library. Model uses ReLU activations, and does not use batch normalization or pooling.

Layer   h  w  c_in  c_out  s  params
Conv1   8  8     3     32  2    6.1k
Conv2   6  6    32     64  2   73.7k
Conv3   5  5    64     64  1  102.4k
Fc1     1  1   256     10  1    2.6k
Total   -  -     -      -  -  184.8k

The model has 0.4% as many parameters as WideResNet.
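A minimal Keras reconstruction of this table (my sketch, not the author's code; the padding choices, “same” for Conv1 and “valid” for Conv2/Conv3, are assumptions chosen so the output sizes and parameter counts match):

```python
import tensorflow as tf

def make_cifar10_cnn(num_classes=10):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        # Conv1: 8x8, stride 2 -> 16x16x32 (~6.1k params)
        tf.keras.layers.Conv2D(32, 8, strides=2, padding="same", activation="relu"),
        # Conv2: 6x6, stride 2 -> 6x6x64 (~73.7k params)
        tf.keras.layers.Conv2D(64, 6, strides=2, padding="valid", activation="relu"),
        # Conv3: 5x5, stride 1 -> 2x2x64 (~102.4k params)
        tf.keras.layers.Conv2D(64, 5, strides=1, padding="valid", activation="relu"),
        tf.keras.layers.Flatten(),           # 2 * 2 * 64 = 256 features
        tf.keras.layers.Dense(num_classes),  # Fc1: 256 -> 10 logits (~2.6k params)
    ])
```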

SLIDE 19

L∞ ADVERSARIAL EXAMPLES

[Plot: accuracy (%) vs. epsilon (25-100) for WRN FGSM, WRN PGD, CNN-L2 FGSM, CNN-L2 PGD, and WRN-Nat PGD.]

SLIDE 20

ROBUSTNESS

[Plot: accuracy (%) vs. fraction of pixels swapped (0.2-0.8) for WRN, CNN, CNN-L2, and WRN-Nat.]

SLIDE 21

NOISY EXAMPLES

SLIDE 22

WITH L2 WEIGHT DECAY

The “independent components” of natural scenes are edge filters (Bell & Sejnowski 1997).

SLIDE 23

WITHOUT WEIGHT DECAY

SLIDE 24

FOOLING IMAGES

“4 years ago I didn’t think small-perturbation adversarial examples were going to be so hard to solve. I thought after another n months of working on those, I’d be basically done with them and would move on to fooling attacks.”

Reference: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images (Nguyen, Yosinski & Clune, CVPR 2015).
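A minimal sketch of the fooling-image idea (my illustration; the CVPR 2015 paper used evolutionary search over encoded images, whereas this uses plain gradient ascent on a target logit, which yields a similar flavor of unrecognizable but confidently classified inputs):

```python
import tensorflow as tf

def fooling_image(model, target_class, steps=200, lr=0.1):
    """Push random noise toward high confidence for target_class."""
    x = tf.Variable(tf.random.uniform((1, 32, 32, 3)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            logit = model(x)[0, target_class]
        # Gradient ascent on the target logit; keep pixels in [0, 1].
        x.assign(tf.clip_by_value(x + lr * tape.gradient(logit, x), 0.0, 1.0))
    return x
```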

SLIDE 25

FOOLING IMAGES (CIFAR-10)

SLIDE 26

FOOLING IMAGES (SVHN)

SLIDE 27

FOOLING IMAGES (SVHN)

Robust training procedure does not learn random labels (lower Rademacher complexity).

SLIDE 28

FOOLING IMAGES

[Plot: attack success rate (ASR) and margin (M) vs. epsilon (50-250) for WRN and CNN-L2.]

SLIDE 29

DIVIDE AND CONQUER?

Image from Dube (2018).

SLIDE 30

REMARKS

◮ Test accuracy on popular ML benchmarks is a weak measure of generalization.

◮ The plethora of band-aid fixes to standard DNNs does not yield compelling results (e.g., the provably robust framework).

◮ Incorporate expert knowledge, e.g., by explicitly modeling part-whole relationships and other priors that relate to known causal features such as edges in natural scenes.

◮ Good generalization implies some level of privacy, and more “fair” models, assuming the original intent is fair.

SLIDE 31

FUTURE WORK

Information bottleneck (IB) theory seems essential for efficiently learning robust models from finite data. But why do models with no bottleneck generalize well on common machine learning datasets? i-RevNet retains all information until the final layer and achieves high accuracy, yet it is extremely sensitive to adversarial examples.