

SLIDE 1

Transferable Adversarial Examples: Insights, Attacks & Defenses

June 12th, 2017
Florian Tramèr
Joint work with Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh & Patrick McDaniel

SLIDE 2

Adversarial Examples Threat Model: White-Box Attacks

[Diagram: an ML model classifies an input among bird / tree / plane; a loss is computed against the ground-truth label.]

"Fast Gradient Sign Method" (FGSM)

Take the gradient of the loss with respect to the input:

    r = ε · sign(∇_x J(x, y, θ))
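A minimal sketch of this step in PyTorch (the framework, function name, and model interface are assumptions; the slide only gives the formula):

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        """Return x + eps * sign(grad_x J(x, y, theta)) for a batch (x, y)."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)   # J(x, y, theta)
        loss.backward()
        return (x + eps * x.grad.sign()).detach()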

SLIDE 3

Adversarial Examples Threat Model: White-Box Attacks

"Fast Gradient Sign Method" (FGSM):

    r = ε · sign(∇_x J(x, y, θ)),   with ‖r‖∞ = ε

[Diagram: the perturbation r (with ‖r‖∞ = ε) is added to the input; the ML model classifies the result among bird / tree / plane.]

Hypothetical Attacks on Autonomous Vehicles
  • Denial of service: a confusing object
  • Harm others: adversarial input recognized as "open space on the road"
  • Harm self / passengers: adversarial input recognized as "navigable road"

SLIDE 4

Adversarial Examples Threat Model: Black-Box Attacks

Adversarial examples transfer

[Diagram: the same adversarial input is fed to several different ML models, and each classifies it as "plane".]
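A minimal sketch of measuring this transfer (names are assumptions; fgsm() is the sketch from the FGSM slide): craft adversarial examples on a local substitute model, then evaluate them against a separate target model.

    def transfer_attack_error(substitute, target, loader, eps):
        wrong, total = 0, 0
        for x, y in loader:
            x_adv = fgsm(substitute, x, y, eps)     # white-box attack on the substitute
            pred = target(x_adv).argmax(dim=1)      # black-box evaluation on the target
            wrong += (pred != y).sum().item()
            total += y.numel()
        return wrong / total                        # target's error on transferred examples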

SLIDE 5

The Space of Transferable Adversarial Examples


SLIDE 6

How large is the “space” of adversarial examples?

  • At least 2D
    – Warde-Farley & Goodfellow 2016
    – Liu et al. 2017

[Figure: "church window" plots, Warde-Farley & Goodfellow 2016]

SLIDE 7

Gradient-Aligned Subspaces

  • Adversarial examples form a contiguous subspace of "high" dimensionality
    – 15-45 dimensions for DNNs and CNNs on MNIST
    – The intersection of adversarial subspaces is also multi-dimensional
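A rough way to probe this claim in code (a sketch only: this block-splitting construction is an illustration, not the paper's gradient-aligned construction; the model interface and tensor shapes are assumptions). Splitting the input gradient into k disjoint coordinate blocks gives k mutually orthogonal directions, each positively aligned with the gradient; we count how many of them are adversarial at norm ε.

    import torch
    import torch.nn.functional as F

    def count_orthogonal_adversarial_dirs(model, x, y, eps, k):
        """x: a single input of shape (C, H, W); y: its label as a 0-dim tensor."""
        x_req = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_req.unsqueeze(0)), y.view(1)).backward()
        g = x_req.grad.flatten()
        # Disjoint coordinate blocks -> mutually orthogonal directions,
        # each with a positive inner product with the gradient g.
        blocks = torch.chunk(torch.randperm(g.numel()), k)
        flipped = 0
        for idx in blocks:
            r = torch.zeros_like(g)
            r[idx] = g[idx]
            if r.norm() == 0:
                continue
            r = eps * r / r.norm()                  # L2 perturbation of size eps
            x_adv = (x.flatten() + r).view_as(x)
            flipped += int(model(x_adv.unsqueeze(0)).argmax(1).item() != y.item())
        return flipped                              # orthogonal directions that are adversarial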

SLIDE 8

Decision Boundary Similarity

[Diagram: the distance from a data point to a model's decision boundary vs. the distance between two models' decision boundaries]

SLIDE 9

Decision Boundary Similarity

  • Experiments with MNIST and DREBIN (malware)
    – Models: DNN, Logistic Regression, SVM
    – 3 directions:
      • Aligned with the gradient (adversarial example)
      • Towards a data point of a different class
      • Random
  • Results: in any direction,
    distance to boundary ≫ distance between boundaries

Models are similar "everywhere"
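A minimal sketch of such a measurement (a simple line search along a direction; this is an illustration, not necessarily the exact procedure used in the experiments):

    import torch

    def distance_to_boundary(model, x, y, d, max_dist=20.0, steps=2000):
        """Smallest step t along unit direction d at which the model stops predicting y."""
        d = d / d.norm()
        for t in torch.linspace(0.0, max_dist, steps):
            if model((x + t * d).unsqueeze(0)).argmax(1).item() != y:
                return float(t)
        return float('inf')

    # For two models f1, f2 and the same direction d, the finding above is that
    # |distance_to_boundary(f1, ...) - distance_to_boundary(f2, ...)| is much
    # smaller than either distance on its own.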

SLIDE 10

Open Questions

  • Why this similarity?
    – Data-dependent results?
    – E.g., for a binary MNIST task (3s vs. 7s) we prove: if F1 (a linear model) and F2 (a quadratic model) both have high accuracy, then there are adversarial examples that transfer between the two models
    – These adversarial examples also transfer to DNNs and CNNs, but we can't prove this is inherent…

SLIDE 11

Transferability and Adversarial Training


SLIDE 12

Adversarial Training

[Diagram: the clean input is classified ("bird") and its loss computed; taking the gradient of that loss (FGSM) produces a perturbed input, which is classified ("plane") and contributes a second loss term.]
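A minimal adversarial-training step corresponding to the diagram (a sketch; fgsm() is the earlier sketch, and the optimizer and batch names are assumptions):

    import torch.nn.functional as F

    def adversarial_training_step(model, optimizer, x, y, eps):
        x_adv = fgsm(model, x, y, eps)                  # craft FGSM examples on the fly
        loss = F.cross_entropy(model(x), y) \
             + F.cross_entropy(model(x_adv), y)         # clean loss + adversarial loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()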

SLIDE 13

Attacks on Adversarial Training

[Bar charts of error rate (%): MNIST: 1.0, 3.6, 18.2; ImageNet (top-1): 22.0, 26.8, 36.5]

The highest error rates come from adversarial examples transferred from another (standard) model.

SLIDE 14

“Gradient Masking”

  • How do you get robustness to FGSM-style attacks?

[Diagram: two possibilities: a genuine large-margin classifier vs. "gradient masking"]

SLIDE 15

Loss of Adversarially Trained Model

[Diagram of the loss around a data point: moving in the direction of the model's own gradient (white-box attack) lands on a non-adversarial example; moving in the direction of another model's gradient (black-box attack) lands on an adversarial example]
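A sketch of how such a picture can be produced (helper names and the model interface are assumptions): evaluate the loss on a grid spanned by the model's own gradient-sign direction and another model's gradient-sign direction.

    import torch
    import torch.nn.functional as F

    def grad_sign(model, x, y):
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return x.grad.sign().detach()

    def loss_surface(model, other_model, x, y, eps, n=21):
        # x: a batched input of shape (1, ...), y: its label of shape (1,)
        d1 = grad_sign(model, x, y)            # white-box direction
        d2 = grad_sign(other_model, x, y)      # black-box (transfer) direction
        grid = torch.linspace(-eps, eps, n)
        surface = torch.zeros(n, n)
        with torch.no_grad():
            for i, a in enumerate(grid):
                for j, b in enumerate(grid):
                    surface[i, j] = F.cross_entropy(model(x + a * d1 + b * d2), y)
        return surface                         # visualize e.g. with matplotlib's imshow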

SLIDE 16

Loss of Adversarially Trained Model

[Plot of the loss surface of the adversarially trained model]

SLIDE 17

Simple One-Shot Attack: RAND+FGSM

  1. Take a small random step
  2. Then step in the direction of the gradient
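A minimal sketch of the two steps (the random-step size alpha and the function names are assumptions not given on the slide):

    import torch
    import torch.nn.functional as F

    def rand_fgsm(model, x, y, eps, alpha):
        # 1. small random step (random sign noise of size alpha < eps)
        x_prime = x + alpha * torch.sign(torch.randn_like(x))
        # 2. gradient step with the remaining budget, taken at the new point
        x_prime = x_prime.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_prime), y).backward()
        return (x_prime + (eps - alpha) * x_prime.grad.sign()).detach()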

[Bar charts, error rate (%): MNIST: FGSM 3.6, RAND+FGSM 34.1; ImageNet (top-1): FGSM 26.8, RAND+FGSM 64.3]

SLIDE 18

FGSM vs RAND+FGSM

  • An improved one-shot attack, even against non-defended models:
    ≈ +4% error on MNIST
    ≈ +11% error on ImageNet

  • Adversarial training with RAND+FGSM?
    – Doesn't work…
    – Are we stuck with adversarial training?

SLIDE 19

What’s Wrong with Adversarial Training?

  • Minimize

        J(x, y) + J(x + ε · sign(∇_x J(x, y)), y)

    The adversarial term is small if:
    1. The model is actually robust, or
    2. The gradient points in a direction that is not adversarial ("degenerate minimum")

SLIDE 20

Ensemble Adversarial Training

  • How do we avoid these degenerate minima?

[Diagram: adversarial examples for the training loss are crafted both on the model being trained and on separate pre-trained ML models]
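A minimal training-step sketch of the idea (names and the choice of source models are assumptions; the exact recipe may differ): adversarial examples come from static pre-trained models as well as from the model being trained, which decouples example generation from the trained model's own (possibly masked) gradients.

    import random
    import torch.nn.functional as F

    def ensemble_adv_training_step(model, pre_trained, optimizer, x, y, eps):
        # pick a source for this batch: the current model or a held-out pre-trained one
        source = random.choice([model] + list(pre_trained))
        x_adv = fgsm(source, x, y, eps)
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()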

SLIDE 21

Results

Error rate (%) on MNIST (standard CNN):

                            Clean Data   White-Box FGSM Attack   Black-Box FGSM Attack
  Adv. Training                 0.7              3.8                     15.5
  Ensemble Adv. Training        0.7              6.0                      3.9

The source model for the black-box attack was not used during training.
Fewer white-box FGSM samples are seen during (ensemble) training.

SLIDE 22

Results

Error rate (%) on ImageNet (Inception v3, Inception ResNet v2):

                                      Clean Data   White-Box FGSM Attack   Black-Box FGSM Attack
  Adv. Training                          22.0             26.8                    36.5
  Ensemble Adv. Training                 23.6             30.0                    30.4
  Ensemble Adv. Training (ResNet)        20.2             25.9                    24.6

SLIDE 23

What about stronger attacks?

  • Little to no improvement on white-box iterative and RAND+FGSM attacks!
  • But, improvements in the black-box setting!

Error rate (%) for black-box attacks on MNIST:

                            FGSM   Carlini-Wagner   I-FGSM   RAND+FGSM
  Adv. Training             15.5        15.2         13.5        9.5
  Ensemble Adv. Training     3.9         7.0          6.2        2.9

SLIDE 24

What about stronger attacks?

Error rate (%) for black-box attacks on ImageNet:

                                      FGSM   RAND+FGSM
  Adv. Training                       36.5      30.8
  Ensemble Adv. Training              30.4      29.9
  Ensemble Adv. Training (ResNet)     24.6      25.0

SLIDE 25

Practical Considerations for Ensemble Adversarial Training

  • Pre-compute gradients for the pre-trained models (see the sketch below)
    – Lower per-batch cost than with adversarial training!
  • Randomize the source model in each batch
    – If num_models % num_batches = 0 and we just rotate through the models, we see the same adversarial examples in each epoch
  • Convergence is slower
    – Standard Inception v3: ~150 epochs
    – Adversarial training: ~190 epochs
    – Ensemble adversarial training: ~280 epochs
    – Maybe because the task is actually hard?…
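One way the pre-computation could look (a sketch under the assumption that the fixed models' gradient signs are cached once per training example; the loader name and memory handling are simplifications):

    import torch
    import torch.nn.functional as F

    def precompute_grad_signs(static_model, loader):
        """Cache sign(grad_x J) of a fixed pre-trained model for every training example."""
        signs = []
        for x, y in loader:
            x = x.clone().requires_grad_(True)
            F.cross_entropy(static_model(x), y).backward()
            signs.append(x.grad.sign().detach())
        return torch.cat(signs)

    # During ensemble adversarial training, a batch's adversarial examples from this
    # static model are then just x + eps * cached_signs[batch_indices].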

SLIDE 26

Takeaways

  • Test defenses on black-box attacks
    – Distillation (Papernot et al. 2016; attack by Carlini et al. 2016)
    – Biologically Inspired Networks (Nayebi & Ganguli 2017; attack by Brendel & Bethge 2017)
    – Adversarial Training, and probably many others…

  • Ensemble Adversarial Training can improve robustness to black-box attacks

« If you don’t know where to go, just move at random. »

— Morgan Freeman — (or Dan Boneh)

SLIDE 27

Open Problems

  • Better black-box attacks?
    – Using an ensemble of source models? (Liu et al. 2017)
    – How much does oracle access to the model help?

  • More efficient ensemble adversarial training?

  • Can we say anything formal (and useful) about adversarial examples?

THANK YOU