

SLIDE 1

Security and Privacy in Machine Learning

Nicolas Papernot

Pennsylvania State University & Google Brain

Lecture for Prof. Trent Jaeger’s CSE 543 Computer Security Class

November 2017 - Penn State

SLIDE 2

Thank you to my collaborators

Patrick McDaniel (Penn State), Ian Goodfellow (Google Brain), Martín Abadi (Google Brain), Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Yan Duan (OpenAI), Úlfar Erlingsson (Google Brain), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google Brain), Praveen Manoharan (CISPA), Ilya Mironov (Google Brain), Ananth Raghunathan (Google Brain), Arunesh Sinha (U of Michigan), Shuang Song (UCSD), Ananthram Swami (US ARL), Kunal Talwar (Google Brain), Florian Tramèr (Stanford), Michael Wellman (U of Michigan), Xi Wu (Google)

SLIDE 3

Machine Learning Classifier

f(x, θ) maps an input x to a vector of class probabilities [p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(9|x,θ)], e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]

Classifier: map inputs to one class among a predefined set

SLIDE 4

Machine Learning Classifier

Training labels are one-hot vectors, e.g. [0 1 0 0 0 0 0 0 0 0] for an image of the digit 1.

Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error)
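To make the "minimize a loss" step concrete, here is a minimal sketch (illustrative, not from the slides) of a softmax classifier trained by gradient descent on toy data; all names and values are made up for illustration.

```python
import numpy as np

# Minimal sketch: a softmax classifier trained by gradient descent,
# i.e. "find parameters theta that minimize a cost/loss function".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))        # toy inputs
y = rng.integers(0, 10, size=200)     # toy labels in {0, ..., 9}
W = np.zeros((20, 10))                # theta: one weight vector per class

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for step in range(500):
    probs = softmax(X @ W)                               # p(class | x, theta)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()   # cross-entropy loss
    grad = X.T @ (probs - np.eye(10)[y]) / len(y)        # d loss / d W
    W -= 0.1 * grad                                      # gradient descent step

print("final loss:", loss)
```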

SLIDE 5

Outline of this lecture

1. Security in ML
2. Privacy in ML

SLIDE 6

Part I Security in machine learning


SLIDE 7

Attack Models

Attacker may see the model: bad even if an attacker needs to know the details of the machine learning model to mount an attack --- aka a white-box attacker.

Attacker may not need the model: worse if an attacker who knows very little (e.g., only gets to ask a few questions) can mount an attack --- aka a black-box attacker.

Papernot et al. Towards the Science of Security and Privacy in Machine Learning

SLIDE 8

Attack Models

Papernot et al. Towards the Science of Security and Privacy in Machine Learning

SLIDE 9

Adversarial examples (white-box attacks)


SLIDE 10

Jacobian-based Saliency Map Approach (JSMA)


Papernot et al. The Limitations of Deep Learning in Adversarial Settings
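As a rough illustration of the saliency-map idea, here is a hedged sketch of a greedy, single-feature variant. The actual JSMA perturbs pairs of features using the model's analytic Jacobian; this sketch estimates gradients numerically and only assumes a `predict_proba` function returning class probabilities.

```python
import numpy as np

def saliency_attack(predict_proba, x, target, max_changes=20, theta=1.0, eps=1e-3):
    """Greedy single-feature sketch of a saliency-map attack.

    At each step, the feature whose (numerically estimated) gradient most
    increases the target class while decreasing the others is pushed by
    theta, until the model predicts the target class.
    """
    x_adv = x.astype(float).copy()
    for _ in range(max_changes):
        probs = predict_proba(x_adv)
        if probs.argmax() == target:
            break                                  # source-target misclassification achieved
        best_i, best_score = None, 0.0
        for i in range(x_adv.size):
            x_plus = x_adv.copy()
            x_plus[i] += eps
            dp = (predict_proba(x_plus) - probs) / eps   # finite-difference gradient
            grad_target = dp[target]
            grad_others = dp.sum() - grad_target
            if grad_target > 0 and grad_others < 0:      # saliency condition
                score = grad_target * abs(grad_others)
                if score > best_score:
                    best_i, best_score = i, score
        if best_i is None:
            break                                  # no helpful feature left
        x_adv[best_i] = np.clip(x_adv[best_i] + theta, 0.0, 1.0)
    return x_adv
```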

SLIDE 11


Jacobian-Based Iterative Approach: source-target misclassification

Papernot et al. The Limitations of Deep Learning in Adversarial Settings

SLIDE 12

Evading a Neural Network Malware Classifier

DREBIN dataset of Android applications. Constraints added to the JSMA approach:

  • Only add features: keeps the malware behavior intact
  • Only add features from the manifest: easy to modify

“Most accurate” neural network:

  • 98% accuracy, with 9.7% FP and 1.3% FN
  • Evaded with a 63.08% success rate

Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification

Before: P[X = Malware] = 0.90, P[X = Benign] = 0.10  →  After: P[X* = Malware] = 0.10, P[X* = Benign] = 0.90
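A sketch of how the two constraints above could be layered onto a saliency-style attack over binary malware features; the `allowed_mask` (manifest features) and the benign class index are assumptions for illustration, not the exact procedure of Grosse et al.

```python
import numpy as np

def constrained_evasion(predict_proba, x, allowed_mask, max_changes=20):
    """Sketch of the constraints: binary features, and only features in
    allowed_mask (e.g. manifest features) may be flipped from 0 to 1,
    never removed, so the malicious behavior is preserved."""
    x_adv = x.astype(float).copy()
    benign = 0                          # assumed class index for "benign"
    for _ in range(max_changes):
        probs = predict_proba(x_adv)
        if probs.argmax() == benign:
            break                       # evasion achieved
        candidates = np.flatnonzero(allowed_mask & (x_adv == 0))
        if candidates.size == 0:
            break
        # Greedily add the single feature that raises P(benign) the most.
        gains = []
        for i in candidates:
            x_try = x_adv.copy()
            x_try[i] = 1.0
            gains.append(predict_proba(x_try)[benign] - probs[benign])
        x_adv[candidates[int(np.argmax(gains))]] = 1.0
    return x_adv
```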

SLIDE 13

Supervised vs. reinforcement learning

Model inputs — Supervised learning: observation (e.g., traffic sign, music, email); Reinforcement learning: environment & reward function

Model outputs — Supervised learning: class (e.g., stop/yield, jazz/classical, spam/legitimate); Reinforcement learning: action

Training “goal” (i.e., cost/loss) — Supervised learning: minimize class prediction error over pairs of (inputs, outputs); Reinforcement learning: maximize reward by exploring the environment and taking actions

SLIDE 14

Adversarial attacks on neural network policies


Huang et al. Adversarial Attacks on Neural Network Policies

SLIDE 15

Adversarial examples (black-box attacks)


SLIDE 16

Threat model of a black-box attack

Adversarial capabilities: no access to the training data, model architecture, model parameters, or model scores; only (limited) oracle access to labels.

Adversarial goal: force an ML model remotely accessible through an API to misclassify.
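A tiny illustration of what "(limited) oracle access: labels" means in practice: the adversary sees only the predicted class, never scores or parameters. The wrapped model's scikit-learn-style interface is an assumption for illustration.

```python
class LabelOnlyOracle:
    """The adversary may query the remote model, but observes only the
    predicted label -- not scores, parameters, architecture, or training data."""

    def __init__(self, remote_model):
        self._model = remote_model           # hidden from the adversary

    def query(self, x):
        scores = self._model.predict_proba([x])[0]
        return int(scores.argmax())          # label only; scores are discarded
```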

SLIDE 17

Our approach to black-box attacks

Alleviate lack of knowledge about the model
Alleviate lack of training data

SLIDE 18

Adversarial example transferability

Adversarial examples have a transferability property:

samples crafted to mislead a model A are likely to mislead a model B

This property comes in several variants:

  • Intra-technique transferability:

○ Cross-model transferability
○ Cross-training-set transferability

  • Cross-technique transferability

Szegedy et al. Intriguing properties of neural networks

SLIDE 19

Adversarial example transferability

Figure: an adversarial example crafted to mislead model A also misleads the victim model B.

Szegedy et al. Intriguing properties of neural networks

SLIDE 20

Adversarial example transferability

Adversarial examples have a transferability property: samples crafted to mislead a model A are likely to mislead a model B.

SLIDE 21

Cross-technique transferability


Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

SLIDE 22

Cross-technique transferability


Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

SLIDE 23

Our approach to black-box attacks

Alleviate lack of knowledge about the model: adversarial example transferability from a substitute model to the target model.

Alleviate lack of training data.

SLIDE 24

Attacking remotely hosted black-box models

(1) The adversary queries the remote ML system for labels on inputs of its choice (figure: the remote ML system answers “no truck sign”, “STOP sign”, “STOP sign”).

SLIDE 25

Attacking remotely hosted black-box models

(2) The adversary uses this labeled data to train a local substitute for the remote system.

SLIDE 26

Attacking remotely hosted black-box models

(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the local substitute’s output surface sensitivity to input variations.
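A hedged sketch of this synthetic-input step in the spirit of the Jacobian-based dataset augmentation described by Papernot et al.; gradients of the substitute are estimated numerically here for simplicity, and `substitute_proba` / `oracle_label` are assumed helper functions.

```python
import numpy as np

def jacobian_augmentation(substitute_proba, oracle_label, X, lam=0.1, eps=1e-3):
    """Sketch of Jacobian-based dataset augmentation: push each known point in
    the direction that most changes the substitute's score for its
    oracle-assigned label, producing new synthetic points to query."""
    new_points = []
    for x in X:
        x = np.asarray(x, dtype=float)
        y = oracle_label(x)                            # query the remote oracle
        base = substitute_proba(x)[y]
        grad = np.zeros_like(x)
        for i in range(x.size):                        # finite-difference gradient
            x_plus = x.copy()
            x_plus[i] += eps
            grad[i] = (substitute_proba(x_plus)[y] - base) / eps
        new_points.append(x + lam * np.sign(grad))     # step along the gradient's sign
    return np.vstack(new_points)
```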

SLIDE 27

Attacking remotely hosted black-box models

(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability (figure: the remote system answers “yield sign”).
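One common way to implement this crafting step is a single gradient-sign perturbation on the substitute (in the style of the fast gradient sign method). The sketch below is illustrative only: it estimates the gradient numerically and assumes inputs scaled to [0, 1].

```python
import numpy as np

def craft_on_substitute(substitute_proba, x, true_label, step=0.1, eps=1e-3):
    """Lower the substitute's confidence in the true label with a gradient-sign
    perturbation, then submit the result to the remote system and rely on
    transferability for the misclassification."""
    x = np.asarray(x, dtype=float)
    base = substitute_proba(x)[true_label]
    grad = np.zeros_like(x)
    for i in range(x.size):                            # finite-difference gradient
        x_plus = x.copy()
        x_plus[i] += eps
        grad[i] = (substitute_proba(x_plus)[true_label] - base) / eps
    # Moving against the gradient of the true-label probability increases the loss.
    return np.clip(x - step * np.sign(grad), 0.0, 1.0)
```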

SLIDE 28

Our approach to black-box attacks

Alleviate lack of knowledge about the model: adversarial example transferability from a substitute model to the target model
+
Alleviate lack of training data: synthetic data generation

SLIDE 29

Results on real-world remote systems


All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples)

ML technique | Number of queries | Adversarial examples misclassified (after querying)
Deep Learning | 6,400 | 84.24%
Logistic Regression | 800 | 96.19%
Unknown | 2,000 | 97.72%

[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

SLIDE 30

Benchmarking progress in the adversarial ML community


SLIDE 31

SLIDE 32

Growing community: 1.3K+ stars, 340+ forks, 40+ contributors

SLIDE 33

Adversarial examples represent worst-case distribution drifts

[DDS04] Dalvi et al. Adversarial Classification (KDD)

SLIDE 34

Adversarial examples are a tangible instance of hypothetical AI safety problems

Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

SLIDE 35

Part II Privacy in machine learning


SLIDE 36

Types of adversaries and our threat model


In our work, the threat model assumes:

  • Adversary can make a potentially unbounded number of queries
  • Adversary has access to model internals

Model inspection (white-box adversary)

Zhang et al. (2017) Understanding DL requires rethinking generalization

Model querying (black-box adversary)

Shokri et al. (2016) Membership Inference Attacks against ML Models
Fredrikson et al. (2015) Model Inversion Attacks

SLIDE 37

A definition of privacy

Figure: two neighbouring datasets are run through the same randomized algorithm; the two sequences of answers (Answer 1, Answer 2, …, Answer n) should be nearly indistinguishable.

SLIDE 38

Our design goals

Problem: preserve the privacy of training data when learning classifiers.

Goals:
  • Differential privacy protection guarantees
  • Intuitive privacy protection guarantees
  • Generic* (independent of the learning algorithm)

*This is a key distinction from previous work, such as:
Pathak et al. (2011) Privacy preserving probabilistic inference with hidden Markov models
Jagannathan et al. (2013) A semi-supervised learning approach to differential privacy
Shokri et al. (2015) Privacy-preserving Deep Learning
Abadi et al. (2016) Deep Learning with Differential Privacy
Hamm et al. (2016) Learning privately from multiparty data

SLIDE 39

The PATE approach


SLIDE 40

Teacher ensemble

Figure: the sensitive training data is split into disjoint partitions (Partition 1, 2, 3, …, n); one teacher model (Teacher 1, 2, 3, …, n) is trained on each partition.
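A minimal sketch of this teacher stage, assuming scikit-learn-style models; the key point is that the partitions are disjoint, so any single training record influences exactly one teacher.

```python
import numpy as np

def train_teachers(make_model, X_sensitive, y_sensitive, n_teachers):
    """Split the sensitive data into n disjoint partitions and train one
    independent teacher per partition."""
    order = np.random.permutation(len(X_sensitive))
    teachers = []
    for part in np.array_split(order, n_teachers):     # disjoint partitions
        model = make_model()                           # e.g. a fresh CNN or random forest
        model.fit(X_sensitive[part], y_sensitive[part])
        teachers.append(model)
    return teachers
```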

SLIDE 41

Aggregation

Count votes → take maximum.

SLIDE 42

Intuitive privacy analysis


If most teachers agree on the label, it does not depend on specific partitions, so the privacy cost is small. If two classes have close vote counts, the disagreement may reveal private information.

SLIDE 43

Noisy aggregation

Count votes → add Laplacian noise → take maximum.
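A sketch of this noisy aggregation mechanism, assuming scikit-learn-style teachers; the Laplacian scale 1/γ is the knob that trades accuracy against privacy cost.

```python
import numpy as np

def noisy_aggregate(teachers, x, n_classes, gamma=0.05, rng=None):
    """Count each teacher's vote, add Laplacian noise of scale 1/gamma to
    every count, and return the class with the largest noisy count."""
    rng = rng or np.random.default_rng()
    votes = np.zeros(n_classes)
    for teacher in teachers:
        votes[int(teacher.predict([x])[0])] += 1                 # one vote per teacher
    noisy_votes = votes + rng.laplace(scale=1.0 / gamma, size=n_classes)
    return int(np.argmax(noisy_votes))
```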

SLIDE 44

Teacher ensemble

Figure: the teachers trained on disjoint partitions of the sensitive data feed their predictions into an aggregated teacher.

SLIDE 45

Student training

Figure: a student model is trained on public data labeled by queries to the aggregated teacher; only the student (and its inference interface) is available to the adversary, while the teachers, the aggregated teacher, and the sensitive data are not.
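A sketch of this student stage, reusing the `noisy_aggregate` helper from the aggregation sketch above; in the paper the student is trained semi-supervised (with GANs), while a plain supervised fit stands in here.

```python
def train_student(student, teachers, X_public, n_classes, n_queries=100, gamma=0.05):
    """Label a limited number of public inputs through the noisy aggregation
    (each query consumes privacy budget), train the student on those labels,
    and later expose only the student to the adversary."""
    X_labeled = X_public[:n_queries]
    y_labeled = [noisy_aggregate(teachers, x, n_classes, gamma=gamma) for x in X_labeled]
    student.fit(X_labeled, y_labeled)     # the paper trains the student semi-supervised (GANs)
    return student
```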

SLIDE 46

Why train an additional “student” model?

The aggregated teacher violates our threat model:

1. Each prediction increases the total privacy loss. Privacy budgets create a tension between the accuracy and the number of predictions.

2. Inspection of internals may reveal private data. Privacy guarantees should hold in the face of white-box adversaries.

SLIDE 47

Student training


SLIDE 48

Deployment

Figure: at deployment, the student alone answers inference queries and is the only model available to the adversary.

SLIDE 49

Differential privacy analysis

Differential privacy: a randomized algorithm M satisfies (ε, δ)-differential privacy if, for all pairs of neighbouring datasets (d, d’) and for all subsets S of outputs:

Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d’) ∈ S] + δ

Application of the Moments Accountant technique (Abadi et al., 2016):
  • Strong quorum ⟹ small privacy cost
  • The bound is data-dependent: it is computed using the empirical quorum

SLIDE 50

Experimental results


SLIDE 51

Experimental setup

Dataset | Teacher Model | Student Model
MNIST | Convolutional Neural Network | Generative Adversarial Networks
SVHN | Convolutional Neural Network | Generative Adversarial Networks
UCI Adult | Random Forest | Random Forest
UCI Diabetes | Random Forest | Random Forest

/ /models/tree/master/differential_privacy/multiple_teachers

SLIDE 52

Aggregated teacher accuracy


SLIDE 53

Trade-off between student accuracy and privacy


SLIDE 54

Trade-off between student accuracy and privacy

UCI Diabetes: (ε, δ) = (1.44, 10⁻⁵)

Non-private baseline: 93.81%
Student accuracy: 93.94%

SLIDE 55

Synergy between privacy and generalization


SLIDE 56

www.papernot.fr

@NicolasPapernot

Some online resources:
  • Blog on S&P in ML (joint work w/ Ian Goodfellow): www.cleverhans.io
  • ML course: https://coursera.org/learn/machine-learning
  • DL course: https://coursera.org/learn/neural-networks

Assigned reading and a more in-depth technical survey paper:
  • Machine Learning in Adversarial Settings (Patrick McDaniel, Nicolas Papernot, Z. Berkay Celik)
  • Towards the Science of Security and Privacy in Machine Learning (Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman)

SLIDE 57

SLIDE 58

Gradient masking


Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses

SLIDE 59

Gradient masking


Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses