Security and Privacy in Machine Learning
Nicolas Papernot
Pennsylvania State University & Google Brain
Lecture for Prof. Trent Jaeger’s CSE 543 Computer Security Class
November 2017 - Penn State
Thank you to my collaborators: Patrick McDaniel (Penn State), Ian Goodfellow (Google Brain), Martín Abadi (Google Brain), Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Yan Duan (OpenAI), Úlfar Erlingsson (Google Brain), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google Brain), Praveen Manoharan (CISPA), Ilya Mironov (Google Brain), Ananth Raghunathan (Google Brain), Arunesh Sinha (U of Michigan), Shuang Song (UCSD), Ananthram Swami (US ARL), Kunal Talwar (Google Brain), Florian Tramèr (Stanford), Michael Wellman (U of Michigan), Xi Wu (Google)
2
3
Machine Learning Classifier

A classifier f(x, θ) maps an input x to one class among a predefined set. For a handwritten-digit classifier, the output is one probability per class:

[p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(9|x,θ)] = [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]
4
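To make the notation concrete, here is a minimal sketch of such a classifier in NumPy; the linear weights W and bias b standing in for θ are illustrative assumptions, not the model from the talk.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def f(x, W, b):
    """Classifier f(x, theta): maps input x to one probability per class.
    Here theta = (W, b), a toy linear model standing in for a real network."""
    return softmax(W @ x + b)

rng = np.random.default_rng(0)
x = rng.random(784)                                   # a flattened 28x28 image
W, b = 0.01 * rng.normal(size=(10, 784)), np.zeros(10)
probs = f(x, W, b)                                    # vector like [0.01, 0.84, ...]
print(probs.argmax(), probs.sum())                    # predicted class; sums to 1.0
```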
Machine Learning Classifier
[Figure: ten sample handwritten digits, each paired with its one-hot label, e.g. an image of the digit 1 is labeled [0 1 0 0 0 0 0 0 0 0]]
Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error)
5
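A hedged sketch of that learning step, continuing the toy classifier above: one gradient-descent update of θ = (W, b) that reduces the cross-entropy loss between f(x, θ) and a one-hot label.

```python
def train_step(x, y, W, b, lr=0.1):
    """One gradient-descent step on the cross-entropy loss; for a softmax
    classifier the gradient of the loss w.r.t. the logits is simply p - y."""
    p = f(x, W, b)                        # forward pass (toy classifier above)
    loss = -np.log(p[y.argmax()])         # cross-entropy for the true class
    grad_z = p - y                        # gradient w.r.t. the logits
    W -= lr * np.outer(grad_z, x)         # update theta to reduce the loss
    b -= lr * grad_z
    return loss

y = np.zeros(10); y[1] = 1.0              # one-hot label: class 1
for _ in range(20):
    loss = train_step(x, y, W, b)
print(loss)                               # the loss shrinks across the steps
```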
6
Attacker may see the model: it is bad enough if an attacker needs to know the details of the machine learning model to mount an attack --- aka a white-box attacker.

Attacker may not need the model: it is worse if an attacker who knows very little (e.g., only gets to ask a few questions) can mount an attack --- aka a black-box attacker.
7
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
9
10
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
11
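Before the malware case study, a heavily simplified sketch of the saliency-map idea behind the JSMA attack cited above, reusing the toy classifier f from earlier. The numerical Jacobian and single-feature update are illustrative; the paper uses a DNN's analytic Jacobian and a pairwise saliency criterion.

```python
def jacobian(x, W, b, eps=1e-5):
    """Numerical Jacobian J[c, i] = d f_c / d x_i (illustration only)."""
    J = np.zeros((10, x.size))
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        J[:, i] = (f(x + d, W, b) - f(x - d, W, b)) / (2 * eps)
    return J

def jsma_step(x, target, W, b, theta=0.1):
    """Increase the one feature whose saliency for the target class is
    largest: it raises p(target) while lowering the other classes."""
    J = jacobian(x, W, b)
    others = J.sum(axis=0) - J[target]            # combined effect on non-targets
    saliency = np.where((J[target] > 0) & (others < 0),
                        J[target] * -others, 0.0)
    i = saliency.argmax()                         # most salient input feature
    x_adv = x.copy()
    x_adv[i] = np.clip(x_adv[i] + theta, 0, 1)    # perturb that feature only
    return x_adv
```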
DREBIN dataset of Android applications. Add constraints to the JSMA approach: perturbations may only add features, so that the application's malicious functionality is preserved.

Target: the “most accurate” neural network.
12
Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
Original application X: P[X = Malware] = 0.90, P[X = Benign] = 0.10
Perturbed application X*: P[X* = Malware] = 0.10, P[X* = Benign] = 0.90
13
Supervised learning vs. reinforcement learning:

Model inputs: observation (e.g., traffic sign, music, email) vs. environment & reward function
Model outputs: class (e.g., stop/yield, jazz/classical, spam/legitimate) vs. action
Training “goal” (i.e., cost/loss): minimize class prediction error vs. maximize reward by exploring the environment and taking actions
14
Huang et al. Adversarial Attacks on Neural Network Policies
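Huang et al. perturb the observations fed to the policy with the fast gradient sign method (Goodfellow et al.). A minimal sketch, assuming a user-supplied grad_loss function that returns the gradient of the training loss with respect to the input:

```python
def fgsm(x, grad_loss, eps=0.01):
    """Fast gradient sign method: nudge every input dimension by +/- eps
    in the direction that increases the loss."""
    return np.clip(x + eps * np.sign(grad_loss(x)), 0, 1)

# For the toy softmax classifier above, the loss gradient w.r.t. x is
# W^T (p - y), so a perturbed observation can be crafted as:
x_adv = fgsm(x, lambda x_: W.T @ (f(x_, W, b) - y), eps=0.05)
```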
15
Adversarial capabilities: (limited) oracle access --- the adversary only observes labels for chosen inputs; no access to the training data, model architecture, model parameters, or model scores.
Adversarial goal: force an ML model remotely accessible through an API to misclassify.
16
Example
17
Black-box attack strategy: alleviate the lack of knowledge about the model, and alleviate the lack of training data.
Adversarial examples have a transferability property: an example crafted to mislead one model is often also misclassified by a different model.

This property comes in several variants:
○ Cross-model transferability
○ Cross-training-set transferability
18
Szegedy et al. Intriguing properties of neural networks
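A sketch of how cross-model transferability is typically measured; craft_fn, model_a and model_b are hypothetical callables (an attack routine and two independently trained classifiers returning probability vectors):

```python
def transfer_rate(craft_fn, model_a, model_b, xs, ys):
    """Fraction of adversarial examples crafted on model A that are also
    misclassified by model B, i.e., cross-model transferability."""
    transferred = 0
    for x, y in zip(xs, ys):
        x_adv = craft_fn(x, y, model_a)       # white-box attack on model A
        if model_b(x_adv).argmax() != y:      # does it fool model B too?
            transferred += 1
    return transferred / len(xs)
```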
[Figure: an adversarial example crafted on model A (ML A) also fools the victim's model B (ML B)]
20
21
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
22
23
Strategy: alleviate the lack of knowledge about the model through adversarial example transferability from a substitute model to the target model; alleviating the lack of training data is addressed below.
24
(1) The adversary queries the remote ML system for labels on inputs of its choice, e.g., submitting sign images and receiving labels such as “no truck sign” and “STOP sign”.
25
(2) The adversary uses this labeled data to train a local substitute for the remote system.
26
(3) The adversary selects new synthetic inputs for queries to the remote ML system, based on the local substitute's output surface sensitivity to input variations.
27
(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability, e.g., a stop sign classified as a “yield sign”. (A condensed sketch of this loop follows.)
28
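A condensed sketch of steps (1)-(4); query_remote, train and gradient are hypothetical helpers (the oracle, a training routine, and the substitute's input gradient for a given label). The synthetic inputs of step (3) follow the paper's Jacobian-based dataset augmentation, x' = x + λ·sign(∂F/∂x):

```python
import numpy as np

def substitute_attack(query_remote, x_init, rounds=5, lam=0.1):
    """Train a local substitute on oracle labels, enlarging the training
    set each round with Jacobian-based synthetic inputs."""
    X = list(x_init)                               # small initial input set
    for _ in range(rounds):
        y = [query_remote(x) for x in X]           # (1) labels from the oracle
        substitute = train(X, y)                   # (2) fit the local substitute
        X += [x + lam * np.sign(gradient(substitute, x, label))
              for x, label in zip(X, y)]           # (3) synthetic inputs where the
                                                   #     substitute is most sensitive
    return substitute                              # (4) craft adversarial examples
                                                   #     on it, e.g., with FGSM
```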
Black-box attack strategy:
○ Alleviate lack of knowledge about the model: adversarial example transferability from a substitute model to the target model
○ Alleviate lack of training data: synthetic data generation
29
All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples)
Remote platform   ML technique          Number of queries   Adversarial examples misclassified (after querying)
MetaMind          Deep Learning               6,400           84.24%
Amazon            Logistic Regression           800           96.19%
Google            Unknown                     2,000           97.72%
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
30
31
32
33
[DDS04] Dalvi et al. Adversarial Classification (KDD)
34
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
35
36
In our work, the threat model assumes the adversary faces a black-box ML model.

Zhang et al. (2017) Understanding deep learning requires rethinking generalization
Shokri et al. (2016) Membership Inference Attacks against Machine Learning Models
Fredrikson et al. (2015) Model Inversion Attacks
37
[Figure: two neighbouring datasets fed through the same Randomized Algorithm produce distributions over answers (Answer 1, Answer 2, ..., Answer n) that are nearly indistinguishable]
38
Problem: preserve the privacy of training data when learning classifiers.

Goals:
○ Differential privacy protection guarantees
○ Intuitive privacy protection guarantees
○ Generic* (independent of the learning algorithm)

*This is a key distinction from previous work, such as:
Pathak et al. (2011) Privacy preserving probabilistic inference with hidden Markov models
Jagannathan et al. (2013) A semi-supervised learning approach to differential privacy
Shokri et al. (2015) Privacy-preserving Deep Learning
Abadi et al. (2016) Deep Learning with Differential Privacy
Hamm et al. (2016) Learning privately from multiparty data
39
40
Data flow: the sensitive data is split into disjoint partitions (Partition 1, Partition 2, Partition 3, ..., Partition n), and each partition is used to train a separate model (Teacher 1, Teacher 2, Teacher 3, ..., Teacher n). (A minimal sketch follows.)
41
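A minimal sketch of this first stage, assuming NumPy arrays and a generic train_model helper: disjoint partitions guarantee that each training record influences at most one teacher.

```python
import numpy as np

def train_teachers(X, y, n_teachers, train_model, seed=0):
    """Split the sensitive data (NumPy arrays) into n disjoint partitions
    and train one teacher per partition."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_teachers)
    return [train_model(X[idx], y[idx]) for idx in parts]
```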
Aggregation: count the teachers' votes for each class, then take the maximum.
42
If most teachers agree on the label, it does not depend on specific partitions, so the privacy cost is small. If two classes have close vote counts, the disagreement may reveal private information.
43
Noisy aggregation: count the teachers' votes for each class, add Laplacian noise to each count, then take the maximum (see the sketch below).
44
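A minimal sketch of the noisy-max aggregation (the paper's f(x) = argmax_j {n_j(x) + Lap(1/γ)}); the gamma parameter trades privacy for accuracy, and the function name is illustrative:

```python
import numpy as np

def noisy_aggregate(votes, num_classes, gamma, rng=None):
    """Noisy max: count the teachers' votes per class, add Laplacian noise
    of scale 1/gamma to each count, and release only the argmax.
    votes: integer array with each teacher's predicted class for one input."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(votes, minlength=num_classes)           # count votes
    noisy = counts + rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy))                                 # label released
```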
Data flow: the sensitive data is partitioned; each partition trains a teacher (Teacher 1, ..., Teacher n); the teachers' votes are combined by the noisy aggregation mechanism into an Aggregated Teacher.
45
Data flow: the sensitive data is partitioned; each partition trains a teacher; the teachers' votes are combined into an Aggregated Teacher; a Student model is trained on public data labeled by querying the Aggregated Teacher (sketched below). At inference, only the Student is available to the adversary; the sensitive data, the teachers, and the Aggregated Teacher are not.
46
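A sketch of this last stage, reusing the train_teachers and noisy_aggregate sketches above (train_model is again a hypothetical helper). Capping the number of labeled queries matters because each query spends privacy budget:

```python
def train_student(public_X, teachers, num_classes, gamma, train_model, budget=100):
    """Train the student on public inputs labeled by the noisy aggregate of
    the teachers' votes; only the student is ever exposed to the adversary."""
    X_lab = public_X[:budget]                   # each query costs privacy budget
    labels = []
    for x in X_lab:
        votes = np.array([t(x) for t in teachers])   # one predicted class per teacher
        labels.append(noisy_aggregate(votes, num_classes, gamma))
    return train_model(X_lab, np.array(labels))
```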
1. Privacy budgets create a tension between the accuracy and the number of predictions.
2. Privacy guarantees should hold in the face of white-box adversaries.
47
48
At inference, the adversary can query only the Student model.
Differential privacy:

A randomized algorithm M satisfies (ε, δ)-differential privacy if for all pairs of neighbouring datasets (d, d′) and for all subsets S of outputs:

Pr[M(d) ∈ S] ≤ exp(ε) · Pr[M(d′) ∈ S] + δ

Application of the moments accountant technique (Abadi et al., 2016):
○ Strong quorum ⟹ small privacy cost
○ The bound is data-dependent: it is computed using the empirical quorum
49
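To make the (ε, δ) definition concrete, the textbook Laplace mechanism (not specific to this talk): a counting query has sensitivity 1, so adding Laplacian noise of scale 1/ε yields (ε, 0)-differential privacy.

```python
import numpy as np

def laplace_count(records, predicate, epsilon, rng=None):
    """Answer 'how many records satisfy predicate?' with (epsilon, 0)-DP:
    neighbouring datasets change the true count by at most 1, so noise of
    scale 1/epsilon suffices."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + rng.laplace(scale=1.0 / epsilon)
```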
50
51
Dataset        Teacher model                   Student model
MNIST          Convolutional Neural Network    Generative Adversarial Networks
SVHN           Convolutional Neural Network    Generative Adversarial Networks
UCI Adult      Random Forest                   Random Forest
UCI Diabetes   Random Forest                   Random Forest
Code: …/models/tree/master/differential_privacy/multiple_teachers
52
53
54
UCI Diabetes
Non-private baseline: 93.81%
Student accuracy: 93.94%
Synergy between privacy and generalization
55
www.papernot.fr
56
@NicolasPapernot

Some online resources:
Blog on S&P in ML (joint work w/ Ian Goodfellow): www.cleverhans.io
ML course: https://coursera.org/learn/machine-learning
DL course: https://coursera.org/learn/neural-networks

Assigned reading and more in-depth technical survey paper:
Machine Learning in Adversarial Settings. Patrick McDaniel, Nicolas Papernot, Z. Berkay Celik
Towards the Science of Security and Privacy in Machine Learning. Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman
57
58
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses
59