Ensemble Adversarial Training: Attacks and Defenses (Facebook, December 15th 2017) - PowerPoint PPT Presentation
Ensemble Adversarial Training: Attacks and Defenses. Facebook, December 15th 2017. Florian Tramèr (Stanford). Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Ian Goodfellow (Google Brain), Dan Boneh (Stanford)
Adversarial Examples in ML
[Figure: panda image + 0.007 × sign-gradient noise = image confidently classified as a gibbon]
2
(Goodfellow et al. 2015)
Pretty sure this is a panda
I'm certain this is a gibbon
Adversarial Examples in ML
- Images
Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
- Physical Objects
Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017, Athalye et al. 2017
- Malware
Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
- Text Understanding
Papernot et al. 2016, Jia & Liang 2017
- Speech
Carlini et al. 2015, Cisse et al. 2017
3
Creating an adversarial example
4
[Diagram: input image → ML model → class scores (bird, tree, plane) → loss on the true label "bird"]
What happens if I nudge this pixel?
Creating an adversarial example
5
[Diagram: input image → ML model → class scores (bird, tree, plane) → loss on the true label "bird"]
What happens if I nudge this pixel? What about this one?
Creating an adversarial example
6
[Diagram: input image → ML model → class scores (bird, tree, plane) → loss on the true label "bird"]
What about this one?
Maximize loss with gradient ascent
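The gradient-ascent step above can be sketched on a toy differentiable model. This is a minimal illustration (made-up weights, input, and label, not the talk's actual setup), using the fast gradient sign step of Goodfellow et al. 2015:

```python
import numpy as np

# Toy differentiable "model": logistic regression with fixed weights.
rng = np.random.default_rng(0)
w = rng.normal(size=16)   # fixed model parameters
x = rng.normal(size=16)   # input "image", flattened
y = 1.0                   # true label

def loss(x):
    # Binary cross-entropy of sigmoid(w . x) against the true label y.
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_loss_x(x):
    # Gradient of the loss with respect to the *input*: (sigmoid(w.x) - y) * w
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

# One gradient-ascent step on the input under an l_inf budget eps:
eps = 0.1
x_adv = x + eps * np.sign(grad_loss_x(x))

print(loss(x), loss(x_adv))  # the perturbed input has higher loss
```

Because this toy loss is convex in the input, one sign step provably increases it; on a deep network the same step is only a first-order approximation of the loss-maximizing perturbation.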
Threat Model: Black-Box Attacks
7
Adversarial examples transfer
[Diagram: the same adversarial example fools three different ML models, each predicting "plane"]
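Transfer can be seen even in a toy linear setting: two models fit to the same task have correlated input gradients, so an attack crafted on one raises the other's loss. A minimal sketch (all weights and data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
true_w = rng.normal(size=d)
w_a = true_w + 0.3 * rng.normal(size=d)   # attacker's substitute model
w_b = true_w + 0.3 * rng.normal(size=d)   # black-box target model
X = rng.normal(size=(200, d))
Y = (X @ true_w > 0).astype(float)        # synthetic binary labels

def mean_loss(w, X):
    # Average binary cross-entropy of sigmoid(X w) against the labels Y.
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-9, 1 - 1e-9)
    return -(Y * np.log(p) + (1 - Y) * np.log(1 - p)).mean()

# Craft gradient-sign examples against model A only...
p_a = 1.0 / (1.0 + np.exp(-X @ w_a))
g_x = (p_a - Y)[:, None] * w_a[None, :]
X_adv = X + 0.1 * np.sign(g_x)

# ...and the black-box model B's loss rises too: the attack transfers.
print(mean_loss(w_b, X), mean_loss(w_b, X_adv))
```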
Defenses?
- Ensembles
- Preprocessing (blurring, cropping, etc.)
- Distillation
- Generative modeling
- Adversarial training
8
Adversarial Training
9
[Diagram: the model is trained on clean inputs (loss on the true label "bird") and on adversarial inputs generated by attacking the model itself (currently misclassified as "plane"), minimizing both losses]
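A minimal sketch of that training loop, using single-step (gradient-sign) adversarial examples against a toy logistic model; names and data are illustrative, not the talk's code:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
Y = (X @ true_w > 0).astype(float)   # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

w = np.zeros(8)                      # model being trained
eps, lr = 0.05, 0.5
for _ in range(200):
    # 1. Attack the current model with one gradient-sign step on the input.
    g_x = (sigmoid(X @ w) - Y)[:, None] * w[None, :]   # d loss / d input
    X_adv = X + eps * np.sign(g_x)
    # 2. Take a gradient step on a 50/50 mix of clean and adversarial data.
    X_mix = np.concatenate([X, X_adv])
    Y_mix = np.concatenate([Y, Y])
    g_w = ((sigmoid(X_mix @ w) - Y_mix) @ X_mix) / len(X_mix)
    w -= lr * g_w

clean_acc = ((sigmoid(X @ w) > 0.5) == (Y > 0.5)).mean()
print(f"clean accuracy: {clean_acc:.2f}")
```

Note that the attack in step 1 is generated against the very model being trained; the later slides show why this coupling is the weak point.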
Adversarial Training - Tradeoffs
"weak" attack: single step
10
"strong" attack: many steps
Adversarial Training - Tradeoffs
"weak" attack: fast
11
"strong" attack: slow
Adversarial Training - Tradeoffs
"weak" attack: not infallible, but scalable
12
"strong" attack: can learn robust models, but only on small datasets
Madry et al. 2017
Adversarial Training on ImageNet
- Adversarial training with a single-step attack
(Kurakin et al. 2016)
13
Top-1 error rate (%):
- Clean data: 22.0
- White-box single-step attack: 26.8
- Black-box single-step attack (adversarial examples transferred from another model): 36.5
“Gradient Masking”
- How did we get robustness to single-step attacks? A large-margin classifier?
What's happening: gradient masking!
14
Loss of Adversarially Trained Model
15
Starting from the data point, moving in the direction of another model's gradient (black-box attack) finds an adversarial example, while moving in the direction of the model's own gradient (white-box attack) finds a non-adversarial example.
Loss of Adversarially Trained Model
16
Simple Attack: RAND+Single-Step
17
- 1. Small random step
- 2. Step in the direction of the gradient
Top-1 error rate (%): the single-step attack achieves 26.8, while RAND+Single-Step raises it to 58.3 and 64.3.
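The RAND+Single-Step attack is easy to sketch: a small random step first (to escape the sharp loss curvature that gradient masking creates at the data point), then one gradient-sign step with the remaining budget. Toy model and parameters, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)   # toy model parameters
x = rng.normal(size=16)   # input
y = 1.0                   # true label

def grad_loss_x(x):
    # Input-gradient of the logistic loss: (sigmoid(w.x) - y) * w
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

eps, alpha = 0.1, 0.05    # total l_inf budget eps, random-step size alpha
# 1. Small random step.
x_rand = x + alpha * np.sign(rng.normal(size=x.shape))
# 2. Gradient-sign step with the remaining budget, from the new point.
x_adv = x_rand + (eps - alpha) * np.sign(grad_loss_x(x_rand))

# The combined perturbation stays inside the l_inf ball of radius eps.
print(np.abs(x_adv - x).max())
```

Because the gradient is evaluated at the randomly shifted point rather than at the data point itself, a masked gradient at the data point no longer protects the model.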
What's wrong with "Single-Step" Adversarial Training?
18
Minimize:
self.loss(self.attack(x))
Two ways to reach this minimum:
- 1. The model is actually robust
- 2. Or, the attack is really bad (a degenerate minimum)
Better approach? Decouple the attack from the defense.
Ensemble Adversarial Training
19
[Diagram: adversarial examples are crafted against static pre-trained models and added to the target model's training loss]
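A minimal sketch of the idea on the same toy logistic setup: adversarial examples come from static pre-trained source models rather than from the model being trained, decoupling attack and defense. All names and data are invented for illustration (the "pre-trained" models are just noisy copies of the true separator, standing in for independently trained networks):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
Y = (X @ true_w > 0).astype(float)   # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

# Static "pre-trained" source models.
pretrained = [true_w + 0.3 * rng.normal(size=8) for _ in range(3)]

w = np.zeros(8)                      # target model being trained
eps, lr = 0.05, 0.5
for step in range(200):
    # Craft the attack against a static source model, not the current w.
    w_src = pretrained[step % len(pretrained)]
    g_x = (sigmoid(X @ w_src) - Y)[:, None] * w_src[None, :]
    X_adv = X + eps * np.sign(g_x)
    # Train the target model on clean plus transferred adversarial data.
    X_mix = np.concatenate([X, X_adv])
    Y_mix = np.concatenate([Y, Y])
    g_w = ((sigmoid(X_mix @ w) - Y_mix) @ X_mix) / len(X_mix)
    w -= lr * g_w

clean_acc = ((sigmoid(X @ w) > 0.5) == (Y > 0.5)).mean()
print(f"clean accuracy: {clean_acc:.2f}")
```

Since the attack never depends on the model being trained, the training loss cannot be minimized by degrading the attack, which removes the degenerate minimum of the previous slide.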
Domain Adaptation
- Interpret Ensemble Adversarial Training as a multiple-source domain adaptation problem
– Train on distributions (adversaries) A1, …, Ak
– Get tested on a new adversary A'
- Provable generalization if ∃i such that A' ≈ Ai
- Can't say much about other adversaries
20
Results
21
ImageNet, Top-1 error rate (%) (Inception v3, Inception ResNet v2):
- Adv. Training: Clean 22.0, White-Box Attack 26.8, Black-Box Attack 36.5
- Ensemble Adv. Training: Clean 23.6, White-Box Attack 30.0, Black-Box Attack 30.4
- Ensemble Adv. Training (ResNet): Clean 20.2, White-Box Attack 25.9, Black-Box Attack 24.6
What about stronger attacks?
- Little gain against strong white-box attacks!
- But, improvements in the black-box setting!
22
Open Problems
- How far can we go with adversarial training?
– White-box robustness is possible! (Madry et al. 2017)
- Caveat 1: Very expensive
- Caveat 2: What is the right metric (l∞, l2, rotations)?
- Can we say anything formal (and useful)
about adversarial examples?
– Why do they exist? Why do they transfer?
23