

SLIDE 1

Ensemble Adversarial Training

Attacks and Defenses

Facebook December 15th 2017 Florian Tramèr Stanford Joint work with Alexey Kurakin (Google Brain) Nicolas Papernot (PSU) Ian Goodfellow (Google Brain) Dan Boneh (Stanford) Patrick McDaniel (PSU)

SLIDE 2

Adversarial Examples in ML

[Figure: panda image + .007 × adversarial perturbation = image classified as a gibbon]

(Goodfellow et al. 2015)

“Pretty sure this is a panda” → “I’m certain this is a gibbon”

SLIDE 3

Adversarial Examples in ML

  • Images

Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …

  • Physical Objects

Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017, Athalye et al. 2017

  • Malware

Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017

  • Text Understanding

Papernot et al. 2016, Jia & Liang 2017

  • Speech

Carlini et al. 2015, Cisse et al. 2017


SLIDE 4

Creating an adversarial example

[Diagram: input image → ML Model → class scores (bird, tree, plane) → Loss against the true label “bird”]

What happens if I nudge this pixel?

SLIDE 5

Creating an adversarial example

[Diagram: image → ML Model → class scores (bird, tree, plane) → Loss against the label “bird”]

What happens if I nudge this pixel? What about this one?

SLIDE 6

Creating an adversarial example

[Diagram: image → ML Model → class scores (bird, tree, plane) → Loss against the label “bird”]

What about this one?

Maximize loss with gradient ascent
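The gradient-ascent step can be sketched on a toy model. This is a minimal illustration, not the talk's actual setup: the logistic-regression "classifier", its weights, and the step size `eps` are all invented for the example.

```python
import numpy as np

def single_step_attack(x, y, w, b, eps):
    """One step of gradient ascent on the loss, taken w.r.t. the INPUT x.

    For a logistic-regression model p = sigmoid(w.x + b) with cross-entropy
    loss, the gradient w.r.t. x has the closed form (p - y) * w, so every
    "pixel" of x is nudged at once in the direction that increases the loss.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w                  # d(loss)/dx
    return x + eps * np.sign(grad_x)      # bounded nudge per pixel

# A toy "image" the model confidently scores as class 1 (say, "bird").
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([2.0, -1.0, 1.0])
y = 1.0

x_adv = single_step_attack(x, y, w, b, eps=0.1)
score_clean = x @ w + b       # 4.5
score_adv = x_adv @ w + b     # lower: every pixel was nudged against "bird"
```

Probing pixels one at a time, as in the slides, would need one model query per pixel; the gradient answers every "what happens if I nudge this pixel?" question in a single backward pass.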

SLIDE 7

Threat Model: Black-Box Attacks

[Diagram: an adversarial example crafted against one ML model is also misclassified as “plane” by two other models]

Adversarial Examples transfer

SLIDE 8

Defenses?

  • Ensembles
  • Preprocessing (blurring, cropping, etc.)
  • Distillation
  • Generative modeling
  • Adversarial training

SLIDE 9

Adversarial Training

[Diagram: the loss is computed both on the clean input (labeled “bird”) and on an attacked version of it (pushed toward “plane”); the attack is generated against the model being trained]
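The loop in this slide can be sketched in a few lines. A toy NumPy version, assuming a logistic-regression "model" and a single-step attack; the data and hyperparameters are synthetic, for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_step_attack(X, y, w, b, eps):
    # Gradient of the cross-entropy loss w.r.t. the inputs, then one bounded step.
    grad_X = (sigmoid(X @ w + b) - y)[:, None] * w
    return X + eps * np.sign(grad_X)

# Synthetic, linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = (X @ true_w > 0).astype(float)

w, b = np.zeros(5), 0.0
for _ in range(300):
    # The attack is re-run against the CURRENT model at every step...
    X_adv = single_step_attack(X, y, w, b, eps=0.1)
    # ...and the model minimizes its loss on clean AND attacked inputs.
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p = sigmoid(X_all @ w + b)
    w -= 0.1 * X_all.T @ (p - y_all) / len(y_all)
    b -= 0.1 * np.mean(p - y_all)

clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
adv_acc = np.mean((sigmoid(single_step_attack(X, y, w, b, 0.1) @ w + b) > 0.5) == y)
```

Note the coupling: the adversarial examples always come from the model currently being trained, which matters for the failure mode discussed later.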

SLIDE 10

Adversarial Training - Tradeoffs

“weak” attack: single step
“strong” attack: many steps

SLIDE 11

Adversarial Training - Tradeoffs

“weak” attack: fast
“strong” attack: slow

SLIDE 12

Adversarial Training - Tradeoffs

“weak” attack: not infallible, but scalable
“strong” attack: can learn robust models on small datasets

Madry et al. 2017
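The "strong" many-step attack can be sketched as repeated small gradient steps, each projected back into the allowed perturbation ball. A toy NumPy version on a hypothetical logistic-regression model; the per-step size is an illustrative choice:

```python
import numpy as np

def iterative_attack(x, y, w, b, eps, steps):
    """'Strong' many-step attack: repeated small gradient steps, each followed
    by a projection back into the eps-ball around the original input."""
    x_adv = x.copy()
    alpha = eps / steps                           # illustrative step schedule
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad_x = (p - y) * w
        x_adv = x_adv + alpha * np.sign(grad_x)   # one small ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay inside the budget
    return x_adv

w = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, -1.0, 1.0])
x_adv = iterative_attack(x, y=1.0, w=w, b=0.0, eps=0.1, steps=10)
```

With `steps=1` this collapses to the "weak" single-step attack; more steps cost proportionally more forward/backward passes, which is the fast-vs-slow tradeoff above.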

SLIDE 13

Adversarial Training on ImageNet

  • Adversarial training with a single-step attack (Kurakin et al. 2016)

[Chart: Top-1 error rate. Clean Data: 22.0%, White-Box Single-Step: 26.8%, Black-Box Single-Step: 36.5% (adversarial examples transferred from another model)]

SLIDE 14

“Gradient Masking”

  • How to get robustness to single-step attacks?

Large Margin Classifier

What’s happening? Gradient Masking!

SLIDE 15

Loss of Adversarially Trained Model

[Loss-surface figure: from the data point, a step in the direction of another model’s gradient (black-box attack) reaches an adversarial example, while a step in the direction of the model’s own gradient (white-box attack) reaches a non-adversarial example]

SLIDE 16

Loss of Adversarially Trained Model


SLIDE 17

Simple Attack: RAND+Single-Step

  • 1. Small random step
  • 2. Step in direction of gradient

[Chart: Top-1 error rates of 26.8%, 58.3%, and 64.3%]
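The two steps translate directly into code. A toy sketch on a hypothetical logistic-regression model; the split of the perturbation budget between the random and gradient steps is an illustrative assumption:

```python
import numpy as np

def rand_single_step(x, y, w, b, eps, alpha, rng):
    """1. Small random step (to escape the sharp, gradient-masked region
       around the data point), 2. gradient step with the remaining budget."""
    x_rand = x + alpha * np.sign(rng.normal(size=x.shape))
    p = 1.0 / (1.0 + np.exp(-(x_rand @ w + b)))
    grad_x = (p - y) * w
    return x_rand + (eps - alpha) * np.sign(grad_x)

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, -1.0, 1.0])
x_adv = rand_single_step(x, y=1.0, w=w, b=0.0, eps=0.1, alpha=0.05, rng=rng)
# The total perturbation still stays inside the eps-ball around x.
```

The random first step moves off the data point, where gradient masking makes the local gradient useless, before the usual single gradient step spends the rest of the budget.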

SLIDE 18

What’s wrong with “Single-Step” Adversarial Training?

Minimize: self.loss(self.attack( ))

Solution:

  • 1. The model is actually robust
  • 2. Or, the attack is really bad (a degenerate minimum)

Better approach? Decouple attack and defense.

SLIDE 19

Ensemble Adversarial Training

[Diagram: the model being trained computes its loss on adversarial examples generated from separate, pre-trained models]
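The decoupling can be sketched as follows: adversarial examples now come from fixed, pre-trained source models rather than from the model being trained. Everything here (toy logistic-regression models, synthetic data, noisy-copy "pre-trained" weights) is invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_step_attack(X, y, w, b, eps):
    grad_X = (sigmoid(X @ w + b) - y)[:, None] * w
    return X + eps * np.sign(grad_X)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = (X @ true_w > 0).astype(float)

# Fixed "pre-trained" source models: here, noisy copies of a sensible classifier.
sources = [true_w + rng.normal(scale=0.5, size=5) for _ in range(3)]

w, b = np.zeros(5), 0.0
for step in range(300):
    src = sources[step % len(sources)]                    # rotate through sources
    X_adv = single_step_attack(X, y, src, 0.0, eps=0.1)   # attack a PRE-TRAINED model,
    X_all = np.vstack([X, X_adv])                         # not the one being trained
    y_all = np.concatenate([y, y])
    p = sigmoid(X_all @ w + b)
    w -= 0.1 * X_all.T @ (p - y_all) / len(y_all)
    b -= 0.1 * np.mean(p - y_all)

# Black-box evaluation: a single-step attack transferred from a held-out model.
holdout = true_w + rng.normal(scale=0.5, size=5)
bb_acc = np.mean((sigmoid(single_step_attack(X, y, holdout, 0.0, 0.1) @ w + b) > 0.5) == y)
```

Because the source models are static, the trained model cannot reach the degenerate minimum of merely breaking its own attack.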

SLIDE 20

Domain Adaptation

  • Interpret Ensemble Adversarial Training as a multiple-source domain adaptation problem

    – Train on distributions (adversaries) A1, …, Ak
    – Get tested on a new adversary A’

  • Provable generalization if ∃i such that A’ ≈ Ai
  • Can’t say much about other adversaries

SLIDE 21

Results


ImageNet, Top-1 error rate (%), Inception v3 / Inception ResNet v2 models:

                                     Clean Data   White-Box Attack   Black-Box Attack
  Adv. Training                         22.0           26.8               36.5
  Ensemble Adv. Training                23.6           30.0               30.4
  Ensemble Adv. Training (ResNet)       20.2           25.9               24.6

SLIDE 22

What about stronger a1acks?

  • Little gain on strong white-box attacks!
  • But, improvements in the black-box setting!

SLIDE 23

Open Problems

  • How far can we go with adversarial training?

    – White-box robustness is possible! (Madry et al. 2017)

  • Caveat 1: Very expensive
  • Caveat 2: What is the right metric (l∞, l2, rotations)?
  • Can we say anything formal (and useful) about adversarial examples?

    – Why do they exist? Why do they transfer?

THANK YOU