Ensemble Adversarial Training: Attacks and Defenses (Facebook, December 15th 2017) - PowerPoint PPT Presentation
Ensemble Adversarial Training: Attacks and Defenses. Facebook, December 15th 2017. Florian Tramèr (Stanford). Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Ian Goodfellow (Google Brain), Dan Boneh (Stanford)
Adversarial Examples in ML
[Figure: panda image + 0.007 × sign-gradient noise = image confidently classified as a gibbon]
2
(Goodfellow et al. 2015)
Pretty sure this is a panda
I'm certain this is a gibbon
Adversarial Examples in ML
- Images
Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
- Physical Objects
Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017, Athalye et al. 2017
- Malware
Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
- Text Understanding
Papernot et al. 2016, Jia & Liang 2017
- Speech
Carlini et al. 2015, Cisse et al. 2017
3
Creating an adversarial example
4
[Diagram: input image → ML model → class scores (bird, tree, plane) → loss on the true label "bird"]
What happens if I nudge this pixel?
Creating an adversarial example
5
[Diagram: input image → ML model → class scores (bird, tree, plane) → loss on the true label "bird"]
What happens if I nudge this pixel? What about this one?
Creating an adversarial example
6
[Diagram: input image → ML model → class scores (bird, tree, plane) → loss on the true label "bird"]
What about this one?
Maximize loss with gradient ascent
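The gradient-ascent step above can be sketched on a toy differentiable model. This is a minimal illustration (made-up weights, input, and label, not the talk's actual setup), using the fast gradient sign step of Goodfellow et al. 2015:

```python
import numpy as np

# Toy differentiable "model": logistic regression with fixed weights.
rng = np.random.default_rng(0)
w = rng.normal(size=16)   # fixed model parameters
x = rng.normal(size=16)   # input "image", flattened
y = 1.0                   # true label

def loss(x):
    # Binary cross-entropy of sigmoid(w . x) against the true label y.
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_loss_x(x):
    # Gradient of the loss with respect to the *input*: (sigmoid(w.x) - y) * w
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

# One gradient-ascent step on the input under an l_inf budget eps:
eps = 0.1
x_adv = x + eps * np.sign(grad_loss_x(x))

print(loss(x), loss(x_adv))  # the perturbed input has higher loss
```

Because this toy loss is convex in the input, one sign step provably increases it; on a deep network the same step is only a first-order approximation of the loss-maximizing perturbation.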
Threat Model: Black-Box Attacks
7
Adversarial examples transfer
[Diagram: the same adversarial example fools three different ML models, each predicting "plane"]
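Transfer can be seen even in a toy linear setting: two models fit to the same task have correlated input gradients, so an attack crafted on one raises the other's loss. A minimal sketch (all weights and data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
true_w = rng.normal(size=d)
w_a = true_w + 0.3 * rng.normal(size=d)   # attacker's substitute model
w_b = true_w + 0.3 * rng.normal(size=d)   # black-box target model
X = rng.normal(size=(200, d))
Y = (X @ true_w > 0).astype(float)        # synthetic binary labels

def mean_loss(w, X):
    # Average binary cross-entropy of sigmoid(X w) against the labels Y.
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-9, 1 - 1e-9)
    return -(Y * np.log(p) + (1 - Y) * np.log(1 - p)).mean()

# Craft gradient-sign examples against model A only...
p_a = 1.0 / (1.0 + np.exp(-X @ w_a))
g_x = (p_a - Y)[:, None] * w_a[None, :]
X_adv = X + 0.1 * np.sign(g_x)

# ...and the black-box model B's loss rises too: the attack transfers.
print(mean_loss(w_b, X), mean_loss(w_b, X_adv))
```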
Defenses?
- Ensembles
- Preprocessing (blurring, cropping, etc.)
- Distillation
- Generative modeling
- Adversarial training
8
Adversarial Training
9
[Diagram: the model is trained on clean inputs (loss on the true label "bird") and on adversarial inputs generated by attacking the model itself (currently misclassified as "plane"), minimizing both losses]
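A minimal sketch of that training loop, using single-step (gradient-sign) adversarial examples against a toy logistic model; names and data are illustrative, not the talk's code:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
Y = (X @ true_w > 0).astype(float)   # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

w = np.zeros(8)                      # model being trained
eps, lr = 0.05, 0.5
for _ in range(200):
    # 1. Attack the current model with one gradient-sign step on the input.
    g_x = (sigmoid(X @ w) - Y)[:, None] * w[None, :]   # d loss / d input
    X_adv = X + eps * np.sign(g_x)
    # 2. Take a gradient step on a 50/50 mix of clean and adversarial data.
    X_mix = np.concatenate([X, X_adv])
    Y_mix = np.concatenate([Y, Y])
    g_w = ((sigmoid(X_mix @ w) - Y_mix) @ X_mix) / len(X_mix)
    w -= lr * g_w

clean_acc = ((sigmoid(X @ w) > 0.5) == (Y > 0.5)).mean()
print(f"clean accuracy: {clean_acc:.2f}")
```

Note that the attack in step 1 is generated against the very model being trained; the later slides show why this coupling is the weak point.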
Adversarial Training - Tradeoffs
"weak" attack: single step
10
"strong" attack: many steps
Adversarial Training - Tradeoffs
"weak" attack: fast
11
"strong" attack: slow
Adversarial Training - Tradeoffs
"weak" attack: not infallible, but scalable
12
"strong" attack: can learn robust models, but only on small datasets
Madry et al. 2017
Adversarial Training on ImageNet
- Adversarial training with a single-step attack
(Kurakin et al. 2016)
13
Top-1 error rate (%):
- Clean data: 22.0
- White-box single-step attack: 26.8
- Black-box single-step attack (adversarial examples transferred from another model): 36.5
“Gradient Masking”
- How did we get robustness to single-step attacks? A large-margin classifier?
What's happening: gradient masking!
14
Loss of Adversarially Trained Model
15
Starting from the data point, moving in the direction of another model's gradient (black-box attack) finds an adversarial example, while moving in the direction of the model's own gradient (white-box attack) finds a non-adversarial example.
Loss of Adversarially Trained Model
16
Simple Attack: RAND+Single-Step
17
- 1. Small random step
- 2. Step in the direction of the gradient
Top-1 error rate (%): the single-step attack achieves 26.8, while RAND+Single-Step raises it to 58.3 and 64.3.
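The RAND+Single-Step attack is easy to sketch: a small random step first (to escape the sharp loss curvature that gradient masking creates at the data point), then one gradient-sign step with the remaining budget. Toy model and parameters, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)   # toy model parameters
x = rng.normal(size=16)   # input
y = 1.0                   # true label

def grad_loss_x(x):
    # Input-gradient of the logistic loss: (sigmoid(w.x) - y) * w
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

eps, alpha = 0.1, 0.05    # total l_inf budget eps, random-step size alpha
# 1. Small random step.
x_rand = x + alpha * np.sign(rng.normal(size=x.shape))
# 2. Gradient-sign step with the remaining budget, from the new point.
x_adv = x_rand + (eps - alpha) * np.sign(grad_loss_x(x_rand))

# The combined perturbation stays inside the l_inf ball of radius eps.
print(np.abs(x_adv - x).max())
```

Because the gradient is evaluated at the randomly shifted point rather than at the data point itself, a masked gradient at the data point no longer protects the model.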
What's wrong with "Single-Step" Adversarial Training?
18
Minimize:
self.loss(self.attack(x))
Two ways to reach this minimum:
- 1. The model is actually robust
- 2. Or, the attack is really bad (a degenerate minimum)
Better approach? Decouple the attack from the defense.
Ensemble Adversarial Training
19
[Diagram: adversarial examples are crafted against static pre-trained models and added to the target model's training loss]
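A minimal sketch of the idea on the same toy logistic setup: adversarial examples come from static pre-trained source models rather than from the model being trained, decoupling attack and defense. All names and data are invented for illustration (the "pre-trained" models are just noisy copies of the true separator, standing in for independently trained networks):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
Y = (X @ true_w > 0).astype(float)   # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

# Static "pre-trained" source models.
pretrained = [true_w + 0.3 * rng.normal(size=8) for _ in range(3)]

w = np.zeros(8)                      # target model being trained
eps, lr = 0.05, 0.5
for step in range(200):
    # Craft the attack against a static source model, not the current w.
    w_src = pretrained[step % len(pretrained)]
    g_x = (sigmoid(X @ w_src) - Y)[:, None] * w_src[None, :]
    X_adv = X + eps * np.sign(g_x)
    # Train the target model on clean plus transferred adversarial data.
    X_mix = np.concatenate([X, X_adv])
    Y_mix = np.concatenate([Y, Y])
    g_w = ((sigmoid(X_mix @ w) - Y_mix) @ X_mix) / len(X_mix)
    w -= lr * g_w

clean_acc = ((sigmoid(X @ w) > 0.5) == (Y > 0.5)).mean()
print(f"clean accuracy: {clean_acc:.2f}")
```

Since the attack never depends on the model being trained, the training loss cannot be minimized by degrading the attack, which removes the degenerate minimum of the previous slide.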
Domain Adaptation
- Interpret Ensemble Adversarial Training as a multiple-source domain adaptation problem
– Train on distributions (adversaries) A1, …, Ak
– Get tested on a new adversary A'
- Provable generalization if ∃i such that A' ≈ Ai
- Can't say much about other adversaries
20
Results
21
ImageNet, Top-1 error rate (%) (Inception v3, Inception ResNet v2):
- Adv. Training: Clean 22.0, White-Box Attack 26.8, Black-Box Attack 36.5
- Ensemble Adv. Training: Clean 23.6, White-Box Attack 30.0, Black-Box Attack 30.4
- Ensemble Adv. Training (ResNet): Clean 20.2, White-Box Attack 25.9, Black-Box Attack 24.6
What about stronger attacks?
- Little gain against strong white-box attacks!
- But, improvements in the black-box setting!
22
Open Problems
- How far can we go with adversarial training?
– White-box robustness is possible! (Madry et al. 2017)
- Caveat 1: Very expensive
- Caveat 2: What is the right metric (l∞, l2, rotations)?
- Can we say anything formal (and useful)
about adversarial examples?
– Why do they exist? Why do they transfer?
23