Gradient Masking in Machine Learning
Nicolas Papernot
Pennsylvania State University
ARO Workshop on Adversarial Machine Learning, Stanford, September 2017
@NicolasPapernot
Thank you to our collaborators:
Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), (Penn State), Yan Duan (OpenAI), Ian Goodfellow (Google), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google), Praveen Manoharan (CISPA), Patrick McDaniel (Penn State), Arunesh Sinha (U of Michigan), Ananthram Swami (US ARL), Florian Tramèr (Stanford), Michael Wellman (U of Michigan)
Standard training objective: the loss is small when the prediction is correct on a legitimate input.
Adversarial training objective: the loss is small when the prediction is correct on a legitimate input and when it is correct on an adversarial input.
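A minimal sketch of this combined objective, assuming a PyTorch classifier and FGSM as the generator of adversarial inputs (the model, epsilon, weighting, and the [0, 1] input range below are illustrative assumptions, not details from the talk):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: one gradient-sign step of size eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Assumes inputs live in [0, 1].
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def adversarial_training_loss(model, x, y, eps=0.3, alpha=0.5):
    """Weighted sum of the loss on legitimate and adversarial inputs."""
    x_adv = fgsm(model, x, y, eps)                # adversarial version of the batch
    loss_clean = F.cross_entropy(model(x), y)     # small when correct on legitimate input
    loss_adv = F.cross_entropy(model(x_adv), y)   # small when correct on adversarial input
    return alpha * loss_clean + (1 - alpha) * loss_adv
```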
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses (illustration adapted from slides by Florian Tramèr)
[Illustration: a legitimate input with two perturbation directions, the direction of the adversarially trained model's gradient and the direction of another model's gradient; adversarial and non-adversarial examples are marked along these directions.]
Threat model: white-box adversary.
Attack (R+FGSM): (1) take a random step of norm alpha; (2) take an FGSM step of norm eps - alpha.
Threat model: black-box adversary.
Attack: (1) learn a substitute for the defended model; (2) find an adversarial direction using the substitute.
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses
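A sketch of the R+FGSM white-box attack described above, in PyTorch (the [0, 1] input range and the parameter values are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def r_fgsm(model, x, y, eps=0.3, alpha=0.15):
    """R+FGSM: a random step of norm alpha, then an FGSM step of norm eps - alpha."""
    # (1) Random step: move to a random nearby point to escape the masked gradient.
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0, 1)
    x_rand = x_rand.detach().requires_grad_(True)
    # (2) FGSM step with the remaining budget, using the gradient at the random point.
    loss = F.cross_entropy(model(x_rand), y)
    grad, = torch.autograd.grad(loss, x_rand)
    return (x_rand + (eps - alpha) * grad.sign()).clamp(0, 1).detach()
```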
Papernot et al. Practical Black-Box Attacks against Machine Learning
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
[Illustration: a black-box ML system and a local substitute, with example predictions "no truck sign" and "STOP sign".]
(1) The adversary queries the remote ML system with synthetic inputs and uses the returned labels to learn a local substitute.
Papernot et al. Practical Black-Box Attacks against Machine Learning
[Illustration: the adversarial example crafted on the local substitute is sent to the black-box ML system, which labels it "yield sign".]
(2) The adversary uses the local substitute to craft adversarial examples.
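A condensed sketch of the two-step black-box attack, assuming a `query_remote` oracle for the black-box ML system (returning class indices) and a local `substitute` network, both placeholders; the full procedure in Papernot et al. additionally grows the synthetic dataset with Jacobian-based augmentation:

```python
import torch
import torch.nn.functional as F

def train_substitute(query_remote, substitute, x_synthetic, epochs=10, lr=1e-3):
    """Step (1): label synthetic inputs with the remote system, fit a local substitute."""
    y_remote = query_remote(x_synthetic)   # labels returned by the black-box ML system
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(x_synthetic), y_remote)
        loss.backward()
        opt.step()
    return substitute

def craft_transfer_examples(substitute, x, y, eps=0.3):
    """Step (2): craft adversarial examples on the substitute; they transfer to the remote model."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(substitute(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()
```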
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Tramèr et al. The Space of Transferable Adversarial Examples
On average, 44 orthogonal adversarial directions can be found for an input, and about 25 of them transfer to another model.
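One way to measure such transfer, assuming two independently trained PyTorch classifiers `model_a` and `model_b` and FGSM perturbations (the names and epsilon are illustrative):

```python
import torch
import torch.nn.functional as F

def transfer_rate(model_a, model_b, x, y, eps=0.3):
    """Fraction of adversarial examples crafted on model_a that also fool model_b."""
    x_req = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model_a(x_req), y)
    grad, = torch.autograd.grad(loss, x_req)
    x_adv = (x_req + eps * grad.sign()).clamp(0, 1).detach()
    fools_a = model_a(x_adv).argmax(dim=1) != y   # adversarial for the source model
    fools_b = model_b(x_adv).argmax(dim=1) != y   # also fools the target model
    transferred = (fools_a & fools_b).float().sum()
    return (transferred / fools_a.float().sum().clamp(min=1)).item()
```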
Adversarial training loss: small when the prediction is correct on legitimate input, and small when the prediction is correct on adversarial input. The catch: the defended model's own gradient is not adversarial; it is masked, so adversarial examples computed from it during training no longer point in truly adversarial directions.
Intuition: present adversarial gradients from multiple models during training.
[Illustration: during training, adversarial examples are generated from the adversarial gradients of model A and of additional models B, C, and D in turn; at inference time, only model A is used.]
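A sketch of one training step in this spirit, assuming model A is the model being trained and `static_models` holds the pre-trained models B, C, and D (the sampling scheme and hyperparameters are illustrative; Tramèr et al. give the full ensemble adversarial training procedure):

```python
import random
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One gradient-sign step on the given model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def ensemble_adversarial_step(model_a, static_models, opt, x, y, eps=0.3):
    """Train model A on adversarial examples whose gradients come from several models."""
    # Pick the source of the adversarial gradient: model A itself or a pre-trained model B/C/D.
    source = random.choice([model_a] + list(static_models))
    x_adv = fgsm(source, x, y, eps)
    opt.zero_grad()
    loss = 0.5 * F.cross_entropy(model_a(x), y) + 0.5 * F.cross_entropy(model_a(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```

Only model A is kept for inference; the other models serve solely as sources of adversarial gradients during training.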
[Results: evaluation using models from a holdout set.]
[DDS04] Dalvi et al. Adversarial Classification (KDD 2004)
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
This research was funded by:
Thank you for listening! Get involved at:
nicolas@papernot.fr
www.cleverhans.io @NicolasPapernot