Ensemble Adversarial Training: Attacks and Defenses



SLIDE 1

Ensemble Adversarial Training

Attacks and Defenses

Cybersecurity With The Best, October 15th 2017
Florian Tramèr (Stanford)
Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Dan Boneh (Stanford), Patrick McDaniel (PSU)

SLIDE 2

Adversarial Examples in ML

[Figure: image + .007 × perturbation = perturbed image] (Goodfellow et al. 2015)

15/10/17 Cybersecurity With The Best – Florian Tramèr

Original image: “Pretty sure this is a panda”
Perturbed image: “I’m certain this is a gibbon”

SLIDE 3

Adversarial Examples in ML

  • Images

Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …

  • Physical Objects

Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017

  • Malware

Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017

  • Text Understanding

Papernot et al. 2016, Jia & Liang 2017

  • Speech

Carlini et al. 2015, Cisse et al. 2017


SLIDE 4

Creating an adversarial example

[Figure: an image is fed to an ML model, which scores the classes bird, tree, and plane; a loss compares the prediction with the true label, bird.]

What happens if I nudge this pixel?

SLIDE 5

Creating an adversarial example

[Figure: an image is fed to an ML model, which scores the classes bird, tree, and plane; a loss compares the prediction with the true label, bird.]

What happens if I nudge this pixel? What about this one?

SLIDE 6

Creating an adversarial example

[Figure: an image is fed to an ML model, which scores the classes bird, tree, and plane; a loss compares the prediction with the true label, bird.]

What about this one?

Maximize loss with gradient ascent
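The idea on slides 4 to 6 (ask how the loss reacts to each pixel, then push the input uphill on the loss) can be sketched on a toy model. The logistic-regression classifier, its weights, and the step size `eps` below are illustrative assumptions, not the image model from the talk. The step follows the sign of the input gradient, as in the single-step attack of Goodfellow et al. 2015.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Cross-entropy loss of one example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x (not the weights)."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def single_step_attack(w, b, x, y, eps):
    """One gradient-ascent step: move every coordinate by eps in the direction that raises the loss."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, -0.2, 0.1], 1
x_adv = single_step_attack(w, b, x, y, eps=0.5)
```

With these made-up numbers the clean input is assigned to class 1 and the perturbed input to class 0, although no coordinate moved by more than `eps`.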

SLIDE 7

Threat Model: Black-Box Attacks

[Figure: three ML models all label the same adversarial example as plane.]

Adversarial examples transfer
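A black-box attack in this threat model can be sketched as follows: the attacker computes gradients only on a model they own and then submits the result to a different model. Both toy models, their weights, and `eps` are hypothetical; giving them similar weights stands in for two models trained on the same task, which is an assumption of this sketch.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Cross-entropy loss of one example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def sign(v):
    return (v > 0) - (v < 0)

def single_step_attack(w, b, x, y, eps):
    """Gradient-sign step computed on the model (w, b) only."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]

# Hypothetical models: the attacker owns `source`; `target` is only asked for predictions.
source = ([1.5, -0.8, 0.7], 0.1)
target = ([2.0, -1.0, 0.5], 0.0)

x, y = [0.3, -0.2, 0.1], 1
x_adv = single_step_attack(*source, x, y, eps=0.5)  # never reads the target's weights

clean_p = predict(*target, x)    # target's confidence in the true class, clean input
adv_p = predict(*target, x_adv)  # same, on the transferred adversarial input
```

In this toy setup the example crafted on `source` also flips the decision of `target`, which is the transfer effect the slide describes.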

SLIDE 8

Defenses?

  • Ensembles
  • Preprocessing (blurring, cropping, etc.)
  • Distillation
  • Generative modeling
  • Adversarial training


SLIDE 9

Adversarial Training

[Figure: a clean image (label: bird) and an attacked copy of it (predicted as plane) both pass through the ML model, and each contributes a loss term.]
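A minimal sketch of this training loop, assuming a toy logistic-regression model, synthetic two-class data, and a single-step gradient-sign attack. The data generator, `eps`, the learning rate, and the one-clean-plus-one-attacked update per example are all choices made for illustration, not settings from the talk.

```python
import math
import random

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def attack(w, b, x, y, eps):
    """Single-step gradient-sign attack against the model (w, b)."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def sgd_step(w, b, x, y, lr):
    """One gradient-descent step on the cross-entropy loss of example (x, y)."""
    err = predict(w, b, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)], b - lr * err

def make_data(n, rng):
    """Toy data: class 1 around (+2, +2), class 0 around (-2, -2)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 2.0 if y == 1 else -2.0
        data.append(([center + rng.uniform(-0.5, 0.5) for _ in range(2)], y))
    return data

def adversarial_training(data, eps=0.5, lr=0.1, epochs=5):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = attack(w, b, x, y, eps)      # attack the current model
            w, b = sgd_step(w, b, x, y, lr)      # learn from the clean example
            w, b = sgd_step(w, b, x_adv, y, lr)  # and from its attacked copy
    return w, b

def accuracy(w, b, examples):
    return sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in examples) / len(examples)

rng = random.Random(0)
train = make_data(200, rng)
w, b = adversarial_training(train)
attacked = [(attack(w, b, x, y, 0.5), y) for x, y in train]
```

On this easily separable toy data the trained model classifies both the clean and the attacked points correctly; the point of the sketch is the loop structure, in which the attack is recomputed against the model as it changes.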

SLIDE 10

Adversarial Training - Tradeoffs

  • “Weak” attack: single step
  • “Strong” attack: many steps

SLIDE 11

Adversarial Training - Tradeoffs

  • “Weak” attack: fast
  • “Strong” attack: slow

SLIDE 12

Adversarial Training - Tradeoffs

  • “Weak” attack: not infallible but scalable
  • “Strong” attack: learn robust models on small datasets (Madry et al. 2017)
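The cost side of this tradeoff can be made concrete by counting gradient computations. The sketch below compares a single-step attack with an iterative one on a hypothetical toy model; the step size `eps / steps` and the clipping to the `eps`-ball are assumptions of this sketch, not parameters given in the talk. Because the toy model is linear, both attacks end at the same point here, so the example only illustrates the difference in cost.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

class CountingModel:
    """Toy logistic-regression model that counts how many input gradients were requested."""

    def __init__(self, w, b):
        self.w, self.b = w, b
        self.grad_calls = 0

    def input_gradient(self, x, y):
        self.grad_calls += 1
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        p = 1.0 / (1.0 + math.exp(-z))
        return [(p - y) * wi for wi in self.w]

def single_step(model, x, y, eps):
    """'Weak' attack: one gradient, one step of size eps."""
    g = model.input_gradient(x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def multi_step(model, x, y, eps, steps):
    """'Strong' attack: many small steps, each clipped back into the eps-ball around x."""
    step_size = eps / steps
    x_adv = list(x)
    for _ in range(steps):
        g = model.input_gradient(x_adv, y)
        x_adv = [xa + step_size * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

x, y = [0.3, -0.2, 0.1], 1
weak = CountingModel([2.0, -1.0, 0.5], 0.0)
strong = CountingModel([2.0, -1.0, 0.5], 0.0)
x_weak = single_step(weak, x, y, eps=0.5)
x_strong = multi_step(strong, x, y, eps=0.5, steps=10)
```

Inside adversarial training this cost is paid for every training example at every update, which is why the many-step variant is the slow one.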

SLIDE 13

Adversarial Training on ImageNet

  • Adversarial training with single-step attack (Kurakin et al. 2016)

Top-1 error rate (%):

  • Clean Data: 22.0
  • White-Box Single-Step: 26.8
  • Black-Box Single-Step: 36.5 (adversarial examples transferred from another model)

SLIDE 14

“Gradient Masking”

  • How to get robustness to single-step attacks?

Large Margin Classifier

What’s happening? Gradient Masking!


SLIDE 15

Loss of Adversarially Trained Model

[Figure: loss surface of the adversarially trained model around a data point.]

  • Data Point
  • Move in direction of another model’s gradient (black-box attack): Adversarial Example
  • Move in direction of model’s gradient (white-box attack): Non-Adversarial Example

SLIDE 16

Loss of Adversarially Trained Model

SLIDE 17

Simple Attack: RAND+Single-Step

  • 1. Small random step
  • 2. Step in direction of gradient

Top-1 error rate (%): 26.8 for the single-step attack, 64.3 for RAND+Single-Step
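The two steps can be sketched as below. The toy logistic-regression model, the random direction drawn as the sign of Gaussian noise, and the split of the budget (`alpha` for the random step, `eps - alpha` for the gradient step) are assumptions of this sketch; the slide only states that a small random step precedes the gradient step.

```python
import math
import random

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def rand_single_step(w, b, x, y, eps, alpha, rng):
    """Step 1: random step of size alpha. Step 2: gradient-sign step of size eps - alpha."""
    x_rand = [xi + alpha * sign(rng.gauss(0.0, 1.0)) for xi in x]
    g = input_gradient(w, b, x_rand, y)  # gradient taken at the randomly shifted point
    return [xr + (eps - alpha) * sign(gi) for xr, gi in zip(x_rand, g)]

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, -0.2, 0.1], 1
x_adv = rand_single_step(w, b, x, y, eps=0.5, alpha=0.25, rng=random.Random(0))
```

The total perturbation still fits in the same `eps` budget as the plain single-step attack. The random step matters on models whose gradient at the data point itself is uninformative; the linear toy model here does not have that property, so the sketch shows only the mechanics.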

SLIDE 18

What’s wrong with “Single-Step” Adversarial Training?

Minimize: self.loss(self.attack( ))

Solution:

  • 1. The model is actually robust
  • 2. Or, the attack is really bad (degenerate minimum)

Better approach? Decouple attack and defense.

SLIDE 19

Ensemble Adversarial Training

[Figure: the loss of the ML model being trained is computed on examples attacked using two other, pre-trained ML models.]
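A minimal sketch of how the attack can be decoupled from the model being trained: at each step the adversarial example is crafted against a source picked from the current model and a set of frozen, pre-trained models. The toy logistic-regression models, all weights, the synthetic data, and the uniform random choice of source are assumptions made for illustration, not the setup used for the ImageNet results.

```python
import math
import random

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def attack(w, b, x, y, eps):
    """Single-step gradient-sign attack against the model (w, b)."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def sgd_step(w, b, x, y, lr):
    err = predict(w, b, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)], b - lr * err

def make_data(n, rng):
    """Toy data: class 1 around (+2, +2), class 0 around (-2, -2)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 2.0 if y == 1 else -2.0
        data.append(([center + rng.uniform(-0.5, 0.5) for _ in range(2)], y))
    return data

def ensemble_adversarial_training(data, pretrained, rng, eps=0.5, lr=0.1, epochs=5):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            # The attack source is either the model being trained or a frozen pre-trained one.
            src_w, src_b = rng.choice([(w, b)] + pretrained)
            x_adv = attack(src_w, src_b, x, y, eps)
            w, b = sgd_step(w, b, x, y, lr)
            w, b = sgd_step(w, b, x_adv, y, lr)
    return w, b

def accuracy(w, b, examples):
    return sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in examples) / len(examples)

rng = random.Random(0)
train = make_data(200, rng)
pretrained = [([1.0, 0.2], 0.0), ([0.3, 1.0], 0.1)]  # hypothetical frozen models
holdout = ([0.8, 0.9], -0.1)                         # hypothetical black-box source, unseen in training
w, b = ensemble_adversarial_training(train, pretrained, rng)
transferred = [(attack(*holdout, x, y, 0.5), y) for x, y in train]
```

Because the pre-trained models never change, the model being trained cannot lower its loss by weakening their attacks, which is the decoupling the previous slide asks for. Evaluating on examples transferred from a held-out model mirrors the black-box setting.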

SLIDE 20

Results

ImageNet (Inception v3, Inception ResNet v2), error rate (%):

                                   Clean Data   White-Box Attack   Black-Box Attack
  Adv. Training                       22.0            26.8               36.5
  Ensemble Adv. Training              23.6            30.0               30.4
  Ensemble Adv. Training (ResNet)     20.2            25.9               24.6

SLIDE 21

What about stronger attacks?

  • Little gain on strong white-box attacks!
  • But, improvements in black-box setting!

SLIDE 22

Open Problems

  • How far can we go with adversarial training?

– White-box robustness is possible! (Madry et al. 2017)

  • Caveat 1: Very expensive
  • Caveat 2: What is the right metric (l∞, l2, rotations)?

  • Can we say anything formal (and useful) about adversarial examples?

– Why do they exist? Why do they transfer?

THANK YOU

SLIDE 23

Related Work

Adversarial training + black-box attacks:

  • Szegedy et al., https://arxiv.org/abs/1312.6199 (original paper on adversarial examples)
  • Nguyen et al., https://arxiv.org/abs/1412.1897 (a genetic algorithm for adversarial examples)
  • Goodfellow et al., https://arxiv.org/abs/1412.6572 (adversarial training with single-step attacks)
  • Papernot et al., https://arxiv.org/abs/1511.04508 (the distillation defense)
  • Papernot et al., https://arxiv.org/abs/1602.02697 (black-box attacks, model reverse-engineering)
  • Liu et al., https://arxiv.org/abs/1611.02770 (black-box attacks on ImageNet)
  • Kurakin et al., https://arxiv.org/abs/1611.01236 (adversarial training on ImageNet)
  • Tramèr et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer (model reverse-engineering)
  • Madry et al., https://arxiv.org/abs/1706.06083 (learning robust models with strong attacks)
  • Tramèr et al., https://arxiv.org/abs/1705.07204 (our paper)

Physical world:

  • Sharif et al., https://dl.acm.org/citation.cfm?id=2978392 (fooling facial recognition with glasses)
  • Kurakin et al., https://arxiv.org/abs/1607.02533 (physical-world adversarial examples)
  • Lu et al., https://arxiv.org/abs/1707.03501 (self driving cars will be fine)
  • Evtimov et al., https://arxiv.org/abs/1707.08945 (maybe they won’t!)

SLIDE 24

Related Work (cont.)

Malware:

  • Šrndić et al., https://dl.acm.org/citation.cfm?id=2650798 (fooling a pdf-malware detector)
  • Xu et al., https://www.cs.virginia.edu/yanjun/paperA14/2016-evade_classifier.pdf (same as above)
  • Grosse et al., https://arxiv.org/abs/1606.04435 (adversarial examples for Android malware)
  • Hu et al., https://arxiv.org/abs/1702.05983 (adversarial examples for Android malware)

Text:

  • Papernot et al., https://arxiv.org/abs/1604.08275 (adversarial examples for text understanding)
  • Jia et al., https://arxiv.org/abs/1707.07328 (adversarial examples for reading comprehension)

Speech:

  • Carlini et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini (fooling a voice assistant)
  • Cisse et al., https://arxiv.org/abs/1707.05373 (adversarial examples for speech, segmentation, etc.)

Reinforcement Learning:

  • Huang et al., https://arxiv.org/abs/1702.02284 (adversarial examples for neural network policies)
  • Kos et al., https://arxiv.org/abs/1705.06452 (adversarial examples for neural network policies)