SLIDE 1

CS573 Data Privacy and Security Differential Privacy – Machine Learning

Li Xiong

SLIDE 2

Big Data + Machine Learning

SLIDE 3

Machine Learning Under Adversarial Settings

  • Data privacy/confidentiality attacks
    • membership attacks, model inversion attacks
  • Model integrity attacks
    • Training time: data poisoning attacks
    • Inference time: adversarial examples

SLIDE 4

Differential Privacy for Machine Learning

  • Data privacy attacks
    • Model inversion attacks
    • Membership inference attacks
  • Differential privacy for deep learning
    • Noisy SGD
    • PATE

SLIDE 5

Neural Networks

SLIDE 6

SLIDE 7

Learning the parameters: Gradient Descent

SLIDE 8

Stochastic Gradient Descent

  • Gradient Descent (batch GD)
    • The cost gradient is based on the complete training set; each update can be costly and it can take longer to converge to the minimum
  • Stochastic Gradient Descent (SGD, iterative or online GD)
    • Update the weights after each training sample
    • The gradient based on a single training sample is a stochastic approximation of the true cost gradient
    • Converges faster, but the path towards the minimum may zig-zag
  • Mini-Batch Gradient Descent (MB-GD)
    • Update the weights based on a small group of training samples
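The three variants differ only in how much data each update sees. A minimal NumPy sketch, not from the slides, on a least-squares model; the data `X`, `y`, learning rate, and batch size are illustrative:

```python
import numpy as np

def grad(w, X, y):
    """Gradient of 0.5 * ||Xw - y||^2 / n with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def batch_gd(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * grad(w, X, y)                 # gradient over the full training set
    return w

def sgd(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            w -= lr * grad(w, X[i:i+1], y[i:i+1])   # one training sample per update
    return w

def minibatch_gd(X, y, lr=0.1, epochs=100, batch=32):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = np.random.permutation(len(y))
        for s in range(0, len(y), batch):
            b = idx[s:s+batch]
            w -= lr * grad(w, X[b], y[b])       # small group of training samples
    return w
```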

SLIDE 9

Training-Data Extraction (Model Inversion) Attacks

Fredrikson et al. (2015): a facial recognition model is trained on a private dataset (input: facial image; output: a label such as Philip, Jack, Monica, or unknown). Given access to the trained model, an adversary can reconstruct recognizable images of people in the training set.
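The core of this model-inversion idea is gradient ascent on the input: start from a blank image and adjust it to maximize the model's confidence in the target person's label. A minimal sketch, assuming white-box access to a softmax (multinomial logistic) classifier with weights `W` (classes x pixels) and bias `b`; the published attack adds denoising and image priors on top of this loop:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def invert_class(W, b, target, steps=1000, lr=0.1):
    """Gradient ascent on the input to maximize the target-class confidence."""
    x = np.zeros(W.shape[1])                  # start from a blank image
    for _ in range(steps):
        p = softmax(W @ x + b)
        g = W[target] - p @ W                 # gradient of log p[target] w.r.t. x
        x += lr * g
        x = np.clip(x, 0.0, 1.0)              # keep pixels in a valid range
    return x
```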

SLIDE 10

Membership Inference Attacks against Machine Learning Models

Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov

SLIDE 11

Membership Inference Attack

[Figure: a model is trained on DATA; given the classification of some input (e.g., an image labeled among airplane, automobile, …, ship, truck), the attacker asks: was this specific data record part of the training set?]

SLIDE 12

Membership Inference Attack

  • Against summary statistics
    • Summary statistics (e.g., average) on each attribute
    • Underlying distribution of the data is known
    • [Homer et al. (2008)], [Dwork et al. (2015)], [Backes et al. (2016)]
  • Against machine learning models (black-box setting)
    • No knowledge of the model's parameters
    • No access to internal computations of the model
    • No knowledge of the underlying distribution of the data

SLIDE 13

[Figure: the target model is trained on private DATA through a training API; the attacker interacts only with the prediction API and exploits the model's predictions.]

Main insight: ML models overfit to their training data

SLIDE 14

[Figure: querying the prediction API with an input from the training set returns a classification.]

Main insight: ML models overfit to their training data

SLIDE 15

[Figure: the prediction API is queried with an input from the training set and with an input NOT from the training set; each returns a classification.]

Main insight: ML models overfit to their training data

SLIDE 16

[Figure: the attacker's goal is to recognize the difference between the model's classifications on inputs from the training set and on inputs not from the training set.]

SLIDE 17

Train an ML model to recognize the difference

[Figure: the attacker trains its own model on the target's classifications for member and non-member inputs.]

ML against ML

SLIDE 18

Train the Attack Model using Shadow Models

[Figure: k shadow models, each trained on its own split (Train 1/Test 1, …, Train k/Test k); their classifications on training inputs are labeled IN and on test inputs are labeled OUT.]

Train the attack model to predict whether an input was a member of the training set (in) or a non-member (out).

SLIDE 19

Obtaining Data for Training Shadow Models

  • Real: similar to the training data of the target model (i.e., drawn from the same distribution)
  • Synthetic: use a sampling algorithm to obtain data that is classified with high confidence by the target model
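A minimal sketch of the shadow-model pipeline from the preceding slides, with scikit-learn stand-ins; the real attack trains one attack model per class and queries the target's prediction API rather than a local model, and the helper `train_shadow` is hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_model(X_pool, y_pool, train_shadow, n_shadows=10):
    """Train shadow models on data we control, so membership is known,
    then fit an attack model on (prediction vector -> in/out) pairs."""
    attack_X, attack_y = [], []
    for _ in range(n_shadows):
        idx = np.random.permutation(len(y_pool))
        half = len(idx) // 2
        tr, te = idx[:half], idx[half:]
        shadow = train_shadow(X_pool[tr], y_pool[tr])   # mimic of the target model
        for rows, member in ((tr, 1), (te, 0)):
            attack_X.append(shadow.predict_proba(X_pool[rows]))   # output vectors
            attack_y.append(np.full(len(rows), member))           # 1 = in, 0 = out
    attack_X, attack_y = np.vstack(attack_X), np.concatenate(attack_y)
    # The attack model learns to tell members from non-members by the shape of
    # the prediction vector (overfit models are more confident on members).
    return RandomForestClassifier(n_estimators=100).fit(attack_X, attack_y)

# Usage sketch: membership probability of one record x under the target model.
# attack = build_attack_model(X_pool, y_pool, train_shadow)
# p_member = attack.predict_proba(target_model.predict_proba([x]))[0, 1]
```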

SLIDE 20

Constructing the Attack Model

[Figure: shadow models are trained on real or synthetic DATA; their predictions, with known membership labels, form the ATTACK training set used to fit the Attack Model.]

SLIDE 21

Using the Attack Model

[Figure: a single data record is sent to the target model's Prediction API; the returned classification vector is fed to the Attack Model, which outputs a membership probability.]

SLIDE 22

Purchase Dataset — Classify Customers (100 classes)

[Figure: cumulative fraction of classes vs. membership inference precision, comparing shadow models trained on real data, marginal-based synthetic data, and model-based synthetic data.]

  • Shadow models trained on real data: overall accuracy 0.93
  • Shadow models trained on synthetic data: overall accuracy 0.89

SLIDE 23

Privacy vs. Learning

[Figure: a training set is drawn from the data universe and used to train a Model.]

SLIDE 24

Privacy vs. Learning

[Figure: data universe → training set → Model.]

Does the model leak information about data in the training set?

SLIDE 25

Privacy vs. Learning

[Figure: data universe → training set → Model.]

Does the model leak information about data in the training set?
Does the model generalize to data outside the training set?

SLIDE 26

Privacy vs. Learning

Overfitting is the common enemy!

Does the model leak information about data in the training set?
Does the model generalize to data outside the training set?

SLIDE 27

Not in a Direct Conflict!

Privacy-preserving machine learning aims to provide both privacy and utility (prediction accuracy).

SLIDE 28

Differential Privacy for Machine Learning

  • Data privacy attacks
    • Model inversion attacks
    • Membership inference attacks
  • Differential privacy for deep learning
    • Noisy SGD
    • PATE

SLIDE 29

DEEP LEARNING WITH DIFFERENTIAL PRIVACY

Martin Abadi, Andy Chu, Ian Goodfellow*, Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

Google

* OpenAI

SLIDE 30

Differential Privacy

(ε, δ)-Differential Privacy: the distribution of the output M(D) on database D is (nearly) the same as on a neighboring database D′:

    ∀S: Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D′) ∈ S] + δ

ε quantifies the information leakage; δ allows for a small probability of failure.

SLIDE 31

Interpreting Differential Privacy

[Figure: two neighboring training datasets D and D′ are run through SGD; the resulting models should be (nearly) indistinguishable.]

SLIDE 32

Differential Privacy: Gaussian Mechanism

If the ℓ2-sensitivity of f: D → ℝⁿ satisfies max over neighboring D, D′ of ||f(D) − f(D′)||₂ < 1, then the Gaussian mechanism f(D) + Nⁿ(0, σ²) offers (ε, δ)-differential privacy, where δ ≈ exp(−(εσ)²/2).

Dwork, Kenthapadi, McSherry, Mironov, Naor, "Our Data, Ourselves", Eurocrypt 2006
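A small sketch of the mechanism, inverting the slide's relation δ ≈ exp(−(εσ)²/2) to get σ ≈ √(2 ln(1/δ))/ε for a sensitivity-1 function (tighter calibrations exist; this follows the slide's approximation):

```python
import numpy as np

def gaussian_mechanism(value, epsilon, delta, l2_sensitivity=1.0):
    """Release value + Gaussian noise calibrated from the slide's relation
    delta ~ exp(-(eps * sigma)^2 / 2), i.e. sigma ~ sqrt(2 ln(1/delta)) / eps,
    scaled by the L2 sensitivity of the query."""
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.0 / delta)) / epsilon
    return np.asarray(value, dtype=float) + np.random.normal(0.0, sigma, np.shape(value))

# Illustrative use: a count query has L2 sensitivity 1.
noisy_count = gaussian_mechanism(1234, epsilon=1.0, delta=1e-5)
```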

SLIDE 33

Basic Composition Theorem

If f is (ε₁, δ₁)-DP and g is (ε₂, δ₂)-DP, then the pair (f(D), g(D)) is (ε₁+ε₂, δ₁+δ₂)-DP.

SLIDE 34

Simple Recipe for Composite Functions

To compute a composite function f with differential privacy:

  • 1. Bound the sensitivity of f's components
  • 2. Apply the Gaussian mechanism to each component
  • 3. Compute the total privacy cost via the composition theorem
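A sketch of this recipe applied to a vector-valued f whose components are released separately; the clipping bound, per-component budget, and use of basic composition are illustrative choices:

```python
import numpy as np

def clip_l2(v, bound):
    """Step 1: bound each component's L2 sensitivity by clipping its norm."""
    norm = np.linalg.norm(v)
    return v if norm <= bound else v * (bound / norm)

def private_composite(components, eps_each, delta_each, clip_bound=1.0):
    """Steps 2-3: Gaussian mechanism per component, then basic composition."""
    sigma = clip_bound * np.sqrt(2.0 * np.log(1.0 / delta_each)) / eps_each
    released = [clip_l2(np.asarray(c, dtype=float), clip_bound)
                + np.random.normal(0.0, sigma, np.shape(c)) for c in components]
    k = len(components)
    return released, (k * eps_each, k * delta_each)   # total (epsilon, delta)
```
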
SLIDE 35

Deep Learning with Differential Privacy

SLIDE 36

Differentially Private Deep Learning

  • 1. Loss function: softmax loss
  • 2. Training / test data: MNIST and CIFAR-10
  • 3. Topology: PCA + neural network
  • 4. Training algorithm: differentially private SGD
  • 5. Hyperparameters: tune experimentally
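The training algorithm is the paper's noisy SGD: clip each per-example gradient, sum, add Gaussian noise, and step. A minimal sketch; `grad_fn` (a per-example gradient function) and the hyperparameter defaults are placeholders:

```python
import numpy as np

def dp_sgd_step(w, batch_X, batch_y, grad_fn, lr=0.1, clip_C=1.0, noise_sigma=4.0):
    """One noisy-SGD step: clip each per-example gradient to L2 norm <= clip_C,
    sum, add Gaussian noise with std clip_C * noise_sigma, average, and update."""
    total = np.zeros_like(w)
    for x, y in zip(batch_X, batch_y):
        g = grad_fn(w, x, y)                                   # per-example gradient
        scale = min(1.0, clip_C / max(np.linalg.norm(g), 1e-12))
        total += g * scale                                     # clipping bounds sensitivity
    total += np.random.normal(0.0, clip_C * noise_sigma, size=w.shape)
    return w - lr * total / len(batch_y)
```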

SLIDE 37

SLIDE 38

SLIDE 39

SLIDE 40

Naïve Privacy Analysis

  • 1. Choose σ = 4
  • 2. Each step is (ε, δ)-DP = (1.2, 10⁻⁵)-DP
  • 3. Number of steps T = 10,000
  • 4. Composition: (Tε, Tδ)-DP = (12,000, 0.1)-DP
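The arithmetic behind the last line, as a quick check:

```python
# sigma = 4 makes each SGD step (1.2, 1e-5)-DP; naive (basic) composition
# over T steps simply adds the parameters.
eps_step, delta_step, T = 1.2, 1e-5, 10_000
print(T * eps_step, T * delta_step)   # -> 12000.0 0.1
```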

SLIDE 41

Advanced Composition Theorems

SLIDE 42

Composition theorem

[Figure: privacy losses add up across queries: +ε for Blue, +.2ε for Blue, +ε for Red.]

SLIDE 43

Strong Composition Theorem

Dwork, Rothblum, Vadhan, "Boosting and Differential Privacy", FOCS 2010
Dwork, Rothblum, "Concentrated Differential Privacy", https://arxiv.org/abs/1603.0188

  • 1. Choose σ = 4
  • 2. Each step is (ε, δ)-DP = (1.2, 10⁻⁵)-DP
  • 3. Number of steps T = 10,000
  • 4. Strong composition: (ε′, Tδ)-DP ≈ (360, 0.1)-DP
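The ≈360 figure is in line with the concentrated-DP style analysis cited on this slide. A sketch using the zCDP formulation (Bun and Steinke's variant of Dwork–Rothblum's CDP), under the assumption of a sensitivity-1 Gaussian with σ = 4 per step:

```python
import math

sigma, T, delta = 4.0, 10_000, 0.1
rho_step = 1.0 / (2.0 * sigma ** 2)      # a sensitivity-1 Gaussian step is rho-zCDP
rho_total = T * rho_step                 # zCDP composes additively
eps = rho_total + 2.0 * math.sqrt(rho_total * math.log(1.0 / delta))
print(round(eps))                        # -> 366, close to the slide's ~360
```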

SLIDE 44

Amplification by Sampling

  • 1. Choose σ = 4
  • 2. Each batch is a q = 1% fraction of the data
  • 3. Each step is (2qε, qδ)-DP = (0.024, 10⁻⁷)-DP
  • 4. Number of steps T = 10,000
  • 5. Strong composition: (ε′, qTδ)-DP ≈ (10, 0.001)-DP

S. Kasiviswanathan, H. Lee, K. Nissim, S. Raskhodnikova, A. Smith, "What Can We Learn Privately?", SIAM J. Comp., 2011

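The per-step numbers follow directly from the amplification statement (for small ε, running an (ε, δ)-DP step on a random q-fraction batch is roughly (2qε, qδ)-DP):

```python
# Per-step guarantee on the full data and the batch sampling fraction.
eps, delta, q = 1.2, 1e-5, 0.01
eps_step, delta_step = 2 * q * eps, q * delta
print(eps_step, delta_step)   # -> about 0.024 and 1e-07; strong composition over
                              # T = 10,000 such steps then gives the slide's (10, 0.001)
```
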
SLIDE 45

Moments Accountant

  • 1. Choose σ = 4
  • 2. Each batch is a q = 1% fraction of the data
  • 3. Keep track of the moments of the privacy loss
  • 4. Number of steps T = 10,000
  • 5. Moments accountant: (ε, δ)-DP = (1.25, 10⁻⁵)-DP
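A simplified sketch of what the accountant computes: the log moments of the privacy loss for the subsampled Gaussian at integer orders (one direction of the divergence only), composed over T steps and then converted to (ε, δ). It is looser than the paper's accountant but already lands near the reported ≈1.25 for σ = 4, q = 1%, T = 10,000, δ = 10⁻⁵:

```python
import math

def log_moment(q, sigma, alpha):
    """log E_Q[(P/Q)^alpha] for P = (1-q)N(0,s^2) + qN(1,s^2), Q = N(0,s^2),
    expanded binomially at integer order alpha."""
    total = 0.0
    for k in range(alpha + 1):
        total += (math.comb(alpha, k) * (1 - q) ** (alpha - k) * q ** k
                  * math.exp(k * (k - 1) / (2 * sigma ** 2)))
    return math.log(total)

def accountant_epsilon(q, sigma, T, delta, max_order=64):
    """Compose the log moments over T steps, convert to (epsilon, delta),
    and keep the smallest epsilon over the orders considered."""
    best = float("inf")
    for alpha in range(2, max_order + 1):
        eps = (T * log_moment(q, sigma, alpha) + math.log(1.0 / delta)) / (alpha - 1)
        best = min(best, eps)
    return best

print(accountant_epsilon(q=0.01, sigma=4.0, T=10_000, delta=1e-5))  # -> about 1.3
```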

SLIDE 46

Results

SLIDE 47

Our Datasets: "Fruit Flies of Machine Learning"

  • MNIST dataset: 70,000 images, 28⨉28 pixels each
  • CIFAR-10 dataset: 60,000 color images, 32⨉32 pixels each

SLIDE 48

Summary of Results

  • Baseline (no privacy): MNIST 98.3%, CIFAR-10 80%

SLIDE 49

Summary of Results

  • Baseline (no privacy): MNIST 98.3%, CIFAR-10 80%
  • [SS15] (reports ε per parameter): MNIST 98%
  • [WKC+16] (ε = 2): MNIST 80%

SLIDE 50

Summary of Results

  • Baseline (no privacy): MNIST 98.3%, CIFAR-10 80%
  • [SS15] (reports ε per parameter): MNIST 98%
  • [WKC+16] (ε = 2): MNIST 80%
  • This work (ε = 8, δ = 10⁻⁵): MNIST 97%, CIFAR-10 73%
  • This work (ε = 2, δ = 10⁻⁵): MNIST 95%, CIFAR-10 67%
  • This work (ε = 0.5, δ = 10⁻⁵): MNIST 90%

SLIDE 51

Contributions

  • Differentially private deep learning applied to publicly available datasets and implemented in TensorFlow
    ○ https://github.com/tensorflow/models
  • Innovations
    ○ Bounding the sensitivity of updates
    ○ Moments accountant to keep track of the privacy loss
  • Lessons
    ○ Recommendations for the selection of hyperparameters
  • Full version: https://arxiv.org/abs/1607.00133

SLIDE 52

Differential Privacy for Machine Learning

  • Data privacy attacks
    • Model inversion attacks
    • Membership inference attacks
  • Differential privacy for deep learning
    • Noisy SGD
    • PATE

SLIDE 53

In the PATE work, the threat model assumes:

  • Adversary can make a potentially unbounded number of queries
  • Adversary has access to model internals
SLIDE 54

Private Aggregation of Teacher Ensembles (PATE)

Intuitive privacy analysis:
  • If most teachers agree on the label, it does not depend on the specific partitions, so the privacy cost is small.
  • If two classes have close vote counts, the disagreement may reveal private information.

Aggregation mechanism:
  • 1. Count votes
  • 2. Take maximum

SLIDE 55

Noisy aggregation
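A minimal sketch of this aggregation step, assuming Laplace noise on the per-class vote counts (the mechanism used in the PATE paper); the noise scale γ is illustrative:

```python
import numpy as np

def noisy_aggregate(teacher_labels, n_classes, gamma=0.05):
    """Count the teachers' votes per class, perturb each count with
    Laplace noise of scale 1/gamma, and return the noisy argmax label."""
    votes = np.bincount(teacher_labels, minlength=n_classes)
    noisy_votes = votes + np.random.laplace(0.0, 1.0 / gamma, size=n_classes)
    return int(np.argmax(noisy_votes))

# Example: 250 teachers voting on one student query over 10 classes.
label = noisy_aggregate(np.random.randint(0, 10, size=250), n_classes=10)
```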

SLIDE 56

Private Aggregation of Teacher Ensembles (PATE)

The aggregated teacher violates the threat model:

  • Each prediction increases the total privacy loss.
    Privacy budgets create a tension between the accuracy and the number of predictions.
  • Inspection of internals may reveal private data.
    Privacy guarantees should hold in the face of white-box adversaries.

SLIDE 57

Private Aggregation of Teacher Ensembles (PATE)

Privacy analysis:

  • The privacy loss is fixed once the student model is done training.
  • Even if a white-box adversary can inspect the student model's parameters, the only information the student can reveal comes from unlabeled public data and the labels provided by the aggregated teacher, which are protected with differential privacy.

SLIDE 58

GANs: two competing models

Generator:
  • Input: noise sampled from a random distribution
  • Output: synthetic input close to the expected training distribution

Discriminator:
  • Input: output from the generator OR an example from the real training distribution
  • Output: in distribution OR fake

[Figure: a Gaussian sample is fed to the generator to produce a fake sample; the discriminator assigns P(real) and P(fake) to samples.]

I. J. Goodfellow et al. (2014), Generative Adversarial Networks
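The two models are trained against each other on the minimax objective from the Goodfellow et al. (2014) paper cited above:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```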

SLIDE 59

Improved Training of GANs

Generator:
  • Input: noise sampled from a random distribution
  • Output: synthetic input close to the expected training distribution

Discriminator:
  • Input: output from the generator OR an example from the real training distribution
  • Output: in distribution (which class) OR fake

[Figure: the discriminator now outputs per-class probabilities P(real_0), …, P(real_N) plus P(fake).]

T. Salimans et al. (2016), Improved Techniques for Training GANs

SLIDE 60

Private Aggregation of Teacher Ensembles using GANs (PATE-G)

[Figure: a generator and discriminator are trained on public data, with label queries to the aggregated teacher; the teachers and private data are not available to the adversary, while the student side (generator, discriminator, public data queries) is.]

SLIDE 61

Aggregated Teacher Accuracy Before the Student Model is Trained

SLIDE 62

Evaluation

Comparison with noisy SGD (M. Abadi et al. (2016), Deep Learning with Differential Privacy) on MNIST: (8, 10⁻⁵)-DP gives 97%, (2, 10⁻⁵)-DP gives 95%, (0.5, 10⁻⁵)-DP gives 90%.

  • Increasing the number of teachers strengthens the privacy guarantee but decreases model accuracy.
  • The number of teachers is constrained by the task's complexity and the available data.

SLIDE 63

Differential Privacy for Machine Learning

  • Data privacy attacks
    • Model inversion attacks
    • Membership inference attacks
  • Differential privacy for deep learning
    • Noisy SGD
    • PATE