Ensemble Adversarial Training: Attacks and Defenses
Cybersecurity With The Best, October 15th 2017
Florian Tramèr (Stanford)
Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Dan Boneh (Stanford), Patrick McDaniel (PSU)
Adversarial Examples in ML
[Figure (Goodfellow et al. 2015): panda image + 0.007 × adversarial perturbation = image classified as a gibbon. "Pretty sure this is a panda" vs. "I'm certain this is a gibbon"]
Adversarial Examples in ML
- Images
Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
- Physical Objects
Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017
- Malware
Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
- Text Understanding
Papernot et al. 2016, Jia & Liang 2017
- Speech
Carlini et al. 2015, Cisse et al. 2017
Creating an adversarial example

[Diagram: an input image is fed to an ML model, which outputs class scores ("bird", "tree", "plane"); a loss is computed against the true label "bird".]

What happens if I nudge this pixel? What about this one?

Maximize the loss with gradient ascent.
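One common instantiation of this gradient-ascent step is the fast gradient sign method (FGSM) of Goodfellow et al. 2015. A minimal sketch in PyTorch; the model, loss_fn, and eps are placeholders for illustration:

    import torch

    def fgsm(model, loss_fn, x, y, eps):
        # One step of gradient ascent on the loss, bounded in L-infinity norm.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        # Nudge every pixel by eps in the direction that increases the loss.
        return (x_adv + eps * x_adv.grad.sign()).detach()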
Threat Model: Black-Box Attacks

[Diagram: an adversarial example crafted against one ML model is also misclassified as "plane" by two other models. Adversarial examples transfer.]
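Transferability is what makes the black-box threat model practical: the attacker crafts the example on a substitute model they control and simply submits it to the target. A sketch reusing the fgsm helper above; substitute_model and target_model are hypothetical stand-ins:

    # Craft the example on a local substitute; no gradients from the target needed.
    x_adv = fgsm(substitute_model, loss_fn, x, y, eps=8 / 255)
    # The perturbation often transfers: the target misclassifies x_adv too.
    target_pred = target_model(x_adv).argmax(dim=1)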
Defenses?
- Ensembles
- Preprocessing (blurring, cropping, etc.)
- Distillation
- Generative modeling
- Adversarial training
Adversarial Training
[Diagram: for each training example, an attack generates an adversarial version; the model is trained to minimize the loss on both the clean input ("bird") and the attacked input (misclassified as "plane").]
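A sketch of the training loop this diagram depicts, reusing the fgsm helper from above; the loader, optimizer, and the equal weighting of clean and adversarial loss are assumptions:

    for x, y in loader:
        # Generate adversarial versions of the batch with the current model.
        x_adv = fgsm(model, loss_fn, x, y, eps=8 / 255)
        # Train on clean and adversarial inputs together.
        loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()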
Adversarial Training - Tradeoffs

- "Weak" attack: single step; fast; not infallible, but scales to large datasets.
- "Strong" attack: many steps; slow; learns robust models on small datasets (Madry et al. 2017).
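The "strong" many-step attack used by Madry et al. 2017 is projected gradient descent (PGD). A hedged sketch; the step size alpha and the omission of pixel-range clipping are simplifications:

    def pgd(model, loss_fn, x, y, eps, alpha, steps):
        # Random start inside the eps-ball, then iterated gradient ascent.
        x_adv = x + eps * (2 * torch.rand_like(x) - 1)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss_fn(model(x_adv), y).backward()
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # Project back into the L-infinity ball around x.
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)
        return x_adv.detach()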
Adversarial Training on ImageNet
- Adversarial training with a single-step attack (Kurakin et al. 2016)

[Bar chart, Top-1 error rate: 22.0 on clean data, 26.8 under a white-box single-step attack, 36.5 under a black-box single-step attack (adversarial examples transferred from another model).]
“Gradient Masking”

- How did the model become robust to single-step attacks? By learning a large-margin classifier?
- What’s actually happening: gradient masking! The model learns to make its own gradients useless to the attack, without becoming genuinely robust.
Loss of the Adversarially Trained Model

[Plot of the loss surface around a data point: stepping in the direction of the model's own gradient (white-box attack) leads to a non-adversarial example, while stepping in the direction of another model's gradient (black-box attack) leads to an adversarial example.]
Simple Attack: RAND+Single-Step

- 1. Take a small random step
- 2. Then step in the direction of the gradient

[Bar chart, Top-1 error rate of the adversarially trained model: 26.8 under the plain single-step attack vs. 64.3 under RAND+Single-Step.]
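A sketch of RAND+Single-Step (R+FGSM in the paper), reusing the fgsm helper; splitting the budget eps into a random part alpha and a gradient part eps - alpha follows the paper, though the helper names are mine:

    def rand_fgsm(model, loss_fn, x, y, eps, alpha):
        # 1. Small random step, to escape the degenerate loss region around x.
        x_rand = x + alpha * torch.randn_like(x).sign()
        # 2. Single gradient step with the remaining budget.
        return fgsm(model, loss_fn, x_rand, y, eps - alpha)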
What’s wrong with “Single-Step” Adversarial Training?

Minimize: self.loss(self.attack(x))

The model generates the very adversarial examples it is trained on, so there are two ways to make this loss small:
- 1. The model is actually robust
- 2. Or, the attack is really bad: a degenerate minimum

Better approach? Decouple the attack from the defense.
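Written out, adversarial training is the min-max problem below (notation follows Madry et al. 2017); here $\theta$ are the model parameters and $\epsilon$ the perturbation budget:

    \min_{\theta} \; \mathbb{E}_{(x,y) \sim \mathcal{D}} \Big[ \max_{\|\delta\|_\infty \le \epsilon} L(\theta, x + \delta, y) \Big]

Single-step adversarial training approximates the inner maximization with one gradient step computed on the model itself, which is why the model can "solve" the problem by making that step useless instead of becoming robust.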
Ensemble Adversarial Training
[Diagram: the model is trained on adversarial examples generated not only against itself, but also against an ensemble of static pre-trained models.]
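A sketch of the ensemble adversarial training loop the diagram shows; picking one source model uniformly at random per batch is an assumption about the schedule, and pretrained_models is a placeholder list:

    import random

    for x, y in loader:
        # Craft adversarial examples against either the model being trained
        # or one of the static pre-trained models: attack and defense decouple.
        source = random.choice([model] + pretrained_models)
        x_adv = fgsm(source, loss_fn, x, y, eps=8 / 255)
        loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()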
Results
ImageNet, Top-1 error rate (Inception v3, Inception ResNet v2):

                                   Clean Data   White-Box Attack   Black-Box Attack
Adv. Training                      22.0         26.8               36.5
Ensemble Adv. Training             23.6         30.0               30.4
Ensemble Adv. Training (ResNet)    20.2         25.9               24.6
What about stronger attacks?

- Little gain against strong white-box attacks!
- But, improvements in the black-box setting!
Open Problems
- How far can we go with adversarial training?
– White-box robustness is possible! (Madry et al. 2017)
- Caveat 1: Very expensive
- Caveat 2: What is the right metric (l∞, l2, rotations)?
- Can we say anything formal (and useful) about adversarial examples?
– Why do they exist? Why do they transfer?
THANK YOU
Related Work

Adversarial training + black-box attacks:
- Szegedy et al., https://arxiv.org/abs/1312.6199 (original paper on adversarial examples)
- Nguyen et al., https://arxiv.org/abs/1412.1897 (a genetic algorithm for adversarial examples)
- Goodfellow et al., https://arxiv.org/abs/1412.6572 (adversarial training with single-step attacks)
- Papernot et al., https://arxiv.org/abs/1511.04508 (the distillation defense)
- Papernot et al., https://arxiv.org/abs/1602.02697 (black-box attacks, model reverse-engineering)
- Liu et al., https://arxiv.org/abs/1611.02770 (black-box attacks on ImageNet)
- Kurakin et al., https://arxiv.org/abs/1611.01236 (adversarial training on ImageNet)
- Tramèr et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer (model reverse-engineering)
- Madry et al., https://arxiv.org/abs/1706.06083 (learning robust models with strong attacks)
- Tramèr et al., https://arxiv.org/abs/1705.07204 (our paper)

Physical world:
- Sharif et al., https://dl.acm.org/citation.cfm?id=2978392 (fooling facial recognition with glasses)
- Kurakin et al., https://arxiv.org/abs/1607.02533 (physical-world adversarial examples)
- Lu et al., https://arxiv.org/abs/1707.03501 (self-driving cars will be fine)
- Evtimov et al., https://arxiv.org/abs/1707.08945 (maybe they won't!)
Related Work (cont.)
Malware:
- Šrndić & Laskov, https://dl.acm.org/citation.cfm?id=2650798 (fooling a PDF-malware detector)
- Xu et al., https://www.cs.virginia.edu/yanjun/paperA14/2016-evade_classifier.pdf (same as above)
- Grosse et al., https://arxiv.org/abs/1606.04435 (adversarial examples for Android malware)
- Hu et al., https://arxiv.org/abs/1702.05983 (adversarial examples for Android malware)

Text:
- Papernot et al., https://arxiv.org/abs/1604.08275 (adversarial examples for text understanding)
- Jia & Liang, https://arxiv.org/abs/1707.07328 (adversarial examples for reading comprehension)

Speech:
- Carlini et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini (fooling a voice assistant)
- Cisse et al., https://arxiv.org/abs/1707.05373 (adversarial examples for speech, segmentation, etc.)

Reinforcement Learning:
- Huang et al., https://arxiv.org/abs/1702.02284 (adversarial examples for neural network policies)
- Kos et al., https://arxiv.org/abs/1705.06452 (adversarial examples for neural network policies)