SLIDE 1

Gradient Masking in Machine Learning

Nicolas Papernot

Pennsylvania State University
ARO Workshop on Adversarial Machine Learning, Stanford, September 2017

@NicolasPapernot

SLIDE 2

Thank you to our collaborators

Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Yan Duan (OpenAI), Ian Goodfellow (Google), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google), Praveen Manoharan (CISPA), Patrick McDaniel (Penn State), Arunesh Sinha (U of Michigan), Ananthram Swami (US ARL), Florian Tramèr (Stanford), Michael Wellman (U of Michigan)

SLIDE 3

Gradient Masking

SLIDE 4

Training

The training loss is small when the model's prediction is correct on a legitimate input.
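In symbols (standard notation added for clarity, not taken from the slide), training minimizes a loss J(theta, x, y), for example the cross-entropy, over legitimate examples:

\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ J(\theta, x, y) \big]

J is small precisely when the model's prediction at x matches the label y.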

SLIDE 5

Adversarial training

The loss is small when the prediction is correct on a legitimate input, and also small when the prediction is correct on an adversarial input.
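Written out, a standard formulation is the FGSM-based objective of Goodfellow et al. (again, notation added for clarity rather than copied from the slide): the first term rewards correct predictions on legitimate inputs, the second on adversarially perturbed ones:

\tilde{J}(\theta, x, y) = \alpha \, J(\theta, x, y) + (1 - \alpha) \, J\big(\theta, \; x + \varepsilon \, \mathrm{sign}(\nabla_x J(\theta, x, y)), \; y\big)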

SLIDE 6

Gradient masking in adversarially trained models

[Figure legend: direction of the adversarially trained model's gradient; direction of another model's gradient.]

Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.

SLIDE 7

Gradient masking in adversarially trained models

[Figure legend: direction of the adversarially trained model's gradient; direction of another model's gradient; adversarial example; non-adversarial example.]

Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.

SLIDE 8

Gradient masking in adversarially trained models

[Figure legend: direction of the adversarially trained model's gradient; direction of another model's gradient; adversarial example; non-adversarial examples.]

Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.

SLIDE 9

Evading gradient masking (1)

Threat model: white-box adversary.
Attack (R+FGSM): (1) take a random step of norm alpha; (2) take an FGSM step of norm eps - alpha.

Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.
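A minimal sketch of this two-step attack in NumPy, assuming a hypothetical grad_loss_fn(x, y) that returns the gradient of the model's loss with respect to its input:

import numpy as np

def r_fgsm(x, y, grad_loss_fn, eps=0.3, alpha=0.15):
    # (1) Random step of norm alpha: move off the data point, past the
    # small neighborhood where the defended model's gradient is masked.
    x_prime = x + alpha * np.sign(np.random.randn(*x.shape))
    # (2) FGSM step of norm eps - alpha, taken from the new point, so the
    # total perturbation stays within the eps budget.
    x_adv = x_prime + (eps - alpha) * np.sign(grad_loss_fn(x_prime, y))
    # Keep pixels in the valid input range.
    return np.clip(x_adv, 0.0, 1.0)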

SLIDE 10

Evading gradient masking (2)

Threat model: black-box adversary.
Attack: (1) learn a substitute for the defended model; (2) find an adversarial direction using the substitute.

Papernot et al., Practical Black-Box Attacks against Machine Learning. Papernot et al., Towards the Science of Security and Privacy in Machine Learning.

SLIDE 11

Attacking black-box models

(1) The adversary queries the remote ML system with synthetic inputs to learn a local substitute. [Figure: the black-box ML system answers queries such as "no truck sign" and "STOP sign"; its answers label the training data of the local substitute.]

Papernot et al., Practical Black-Box Attacks against Machine Learning.

SLIDE 12

Attacking black-box models

(2) The adversary uses the local substitute to craft adversarial examples. [Figure: an example crafted on the local substitute is misclassified by the black-box ML system as a "yield sign".]

Papernot et al., Practical Black-Box Attacks against Machine Learning.
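The substitute is trained with Jacobian-based dataset augmentation. A minimal sketch of that loop, using hypothetical oracle and substitute interfaces (fit and input_gradient are illustrative stand-ins, not a specific library API):

import numpy as np

def train_substitute(oracle, substitute, x_init, n_rounds=5, lam=0.1):
    # `oracle(x)` returns the black-box system's labels for a batch of inputs;
    # `substitute.fit(x, y)` trains the local model; `substitute.input_gradient(x, y)`
    # returns the gradient of the substitute's output for label y w.r.t. input x.
    x = x_init
    for _ in range(n_rounds):
        y = oracle(x)           # (1) label the synthetic inputs via the black box
        substitute.fit(x, y)    # (2) fit the local substitute on the oracle's labels
        # (3) Jacobian-based augmentation: add points in the directions where
        # the substitute's output for the oracle's label changes fastest.
        x_new = np.stack([xi + lam * np.sign(substitute.input_gradient(xi, yi))
                          for xi, yi in zip(x, y)])
        x = np.concatenate([x, x_new])
    return substitute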

SLIDE 13

Adversarial example transferability

Papernot et al., Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples.

SLIDE 14

Large adversarial subspaces enable transferability

On average, 44 orthogonal adversarial directions are found around an input, and about 25 of them transfer to another model.

Tramèr et al., The Space of Transferable Adversarial Examples.

SLIDE 15

Adversarial training

The loss is small when the prediction is correct on a legitimate input, and small when the prediction is correct on an adversarial input. But under gradient masking, the perturbation computed from the model's own gradient is not actually adversarial, so the second term trains on weak adversarial examples.

SLIDE 16

Ensemble Adversarial Training

SLIDES 17-24 (animation)

Ensemble adversarial training

Intuition: present adversarial gradients from multiple models during training. The model being trained (Model A) sees adversarial examples computed not only from its own adversarial gradient but also from the gradients of separately pre-trained models B, C, and D; at inference, Model A is deployed alone. A sketch of one training step follows.
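A minimal sketch of one batch-augmentation step under this scheme, with hypothetical *_grad_fn(x, y) helpers that return each model's input-loss gradient (illustrative stand-ins, not the paper's code):

import random
import numpy as np

def ensemble_adv_batch(x, y, target_grad_fn, static_grad_fns, eps=0.3):
    # Draw the attack gradient either from the model being trained or from
    # one of the static, pre-trained models (B, C, D, ...). Decoupling the
    # attack from the target's own gradient avoids the degenerate solution
    # where the target merely masks its gradients.
    grad_fn = random.choice([target_grad_fn] + list(static_grad_fns))
    # One-step FGSM perturbation from the chosen gradient source.
    x_adv = np.clip(x + eps * np.sign(grad_fn(x, y)), 0.0, 1.0)
    # Train on a mix of clean and adversarial inputs, labels unchanged.
    return np.concatenate([x, x_adv]), np.concatenate([y, y])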

SLIDE 25

Experimental results on MNIST

[Results table: attacks transferred from holdout models.]

SLIDE 26

Experimental results on ImageNet

[Results table: attacks transferred from holdout models.]

SLIDE 27

Reproducible Adversarial ML research with CleverHans

SLIDE 28

CleverHans library guiding principles

1. Benchmark reproducibility
2. Can be used with any TensorFlow model
3. Always include state-of-the-art attacks and defenses
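For illustration, a minimal attack benchmark in the 2017-era CleverHans interface; the API has evolved across releases, so treat this as a sketch rather than current usage (`model` is assumed to be a classifier wrapped in CleverHans's TensorFlow Model interface):

import tensorflow as tf
from cleverhans.attacks import FastGradientMethod

# `model` is assumed: any TensorFlow classifier wrapped for CleverHans;
# `x` is the input placeholder the model expects (MNIST-shaped here).
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
fgsm = FastGradientMethod(model, sess=sess)
# Symbolic adversarial examples: built once, reusable across benchmark runs.
adv_x = fgsm.generate(x, eps=0.3, clip_min=0.0, clip_max=1.0)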

SLIDE 29

Growing community: 1.1K+ stars, 290+ forks, 35 contributors

SLIDE 30

Adversarial examples represent worst-case distribution drifts

[DDS04] Dalvi et al., Adversarial Classification (KDD)

SLIDE 31

Adversarial examples are a tangible instance of hypothetical AI safety problems

Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

SLIDE 32

This research was funded by: [sponsor logos]

Thank you for listening! Get involved at:

nicolas@papernot.fr
www.cleverhans.io
github.com/tensorflow/cleverhans
@NicolasPapernot