

SLIDE 1

Differentially-Private Deep Learning from an Optimization Perspective

Presenter: Liyao Xiang, Shanghai Jiao Tong University, 4/30/2019

SLIDE 2

Privacy Threat

  • Personal information in the big data era
  • Is anonymization sufficient to protect user privacy?
  • Netflix recommendation challenge: remove personal identity information, replace names with random numbers
  • De-anonymize the Netflix database with the public information on IMDb
  • De-anonymization even works on partial, distorted, or wrong data!

SLIDE 3

Side Information

[Figure: a "new comer" combines side information with the released data to re-identify records]

Side information hurts privacy!

SLIDE 4

Differential Privacy

[Figure: a mechanism M maps two adjacent inputs to overlapping output distributions P[M(I) ∈ O] and P[M(I′) ∈ O]]

Constraint:

P[M(I) ∈ O] ≤ e^ε · P[M(I′) ∈ O]

for all adjacent inputs I, I′ and all output sets O. A smaller ε indicates a higher privacy level.
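As a concrete check of this constraint (an illustration added here, not from the slides), the snippet below instantiates M as the classic Laplace mechanism, whose noise scale α/ε is known to satisfy ε-differential privacy, and verifies numerically that the output-density ratio on two adjacent inputs never exceeds e^ε. The query values and all parameters are hypothetical.

```python
import numpy as np

def laplace_density(z, loc, scale):
    """Density of the Laplace distribution at z."""
    return np.exp(-np.abs(z - loc) / scale) / (2 * scale)

eps = 0.5                 # privacy budget (hypothetical)
alpha = 1.0               # sensitivity: |f(I) - f(I')| <= alpha on adjacent inputs
scale = alpha / eps       # Laplace scale known to give eps-DP

f_I, f_I_adj = 7.0, 8.0   # query answers on two adjacent inputs

# The density ratio at every output point must stay below e^eps.
zs = np.linspace(-20, 20, 2001)
ratio = laplace_density(zs, f_I, scale) / laplace_density(zs, f_I_adj, scale)
print(f"max ratio {ratio.max():.4f} <= e^eps = {np.exp(eps):.4f}")
```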

SLIDE 5

Deep Learning with Differential Privacy

Perturbation: publish a perturbed model θ̃ = (θ̃_1, …, θ̃_n) instead of the trained model θ = (θ_1, …, θ_n).

The model tells! Differentially-private stochastic gradient descent (DPSGD): add noise to the gradient g_t in each iteration of the update.
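A minimal sketch of one such noisy update, assuming per-example gradients are already available as NumPy arrays; the clipping norm, learning rate, and the Gaussian stand-in for the noise distribution are illustrative choices, not the paper's mechanism.

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, sigma=1.0):
    """One DPSGD update: clip each gradient, average, add noise, step."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]          # bound each example's influence
    g_avg = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, sigma * clip_norm / len(per_example_grads),
                             size=g_avg.shape)      # calibrated to the clip bound
    return params - lr * (g_avg + noise)

# Hypothetical usage: 4 per-example gradients for a 3-parameter model.
params = np.zeros(3)
grads = [np.random.randn(3) for _ in range(4)]
params = dpsgd_step(params, grads)
```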

SLIDE 6

Deep Learning with Differential Privacy

The recent work [Abadi et al., CCS '16] only achieves ~90% accuracy on MNIST, whereas training without privacy reaches over 99%. The result of [Shokri et al., CCS '15] is even worse.

[Figure: the privacy vs. accuracy tradeoff]

SLIDE 7

In previous works, the link between the inserted noise and the accuracy is broken:

[Figure: θ perturbed with more noise (higher privacy level) vs. less noise; lower accuracy?]

SLIDE 8

Model Sensitivity

[Figure: the same total amount of noise added to θ can yield different accuracy levels, e.g. 90% vs. 85%]

SLIDE 9

Example

[Figure: a small network with inputs X1, X2, hidden units h1, h2, and output Y, parameterized by weights θ1, …, θ6 and biases b1, b2, b3; adding the same noise to b3 and to θ6 incurs a different cost!]
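The point can be reproduced numerically. The toy network below (made-up weights, tanh units, squared error; only θ6 and b3 are varied) perturbs the two parameters by the same amount and observes different changes in cost.

```python
import numpy as np

def cost(theta6, b3):
    """Toy 2-2-1 network with fixed inputs; only theta6 and b3 vary."""
    x1, x2 = 1.0, -0.5
    h1 = np.tanh(0.3 * x1 + 0.7 * x2 + 0.1)    # hidden unit h1 (bias b1)
    h2 = np.tanh(-0.4 * x1 + 0.2 * x2 + 0.05)  # hidden unit h2 (bias b2)
    y = np.tanh(0.9 * h1 + theta6 * h2 + b3)   # output Y (weight theta6, bias b3)
    return (y - 1.0) ** 2                      # squared error against target 1

base = cost(theta6=0.5, b3=0.0)
noise = 0.3  # identical perturbation applied to two different parameters
print("cost change, noise on theta6:", cost(0.5 + noise, 0.0) - base)
print("cost change, noise on b3:    ", cost(0.5, 0.3) - base)
```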

SLIDE 10

Optimized Additive Noise Scheme

  • Model sensitivity w = (w_1, w_2, …, w_d) ∈ ℝ^d: the derivative vector of the cost on all training examples w.r.t. all parameters
  • To keep the cost minimal, noise should be added to the least sensitive directions of the cost function
  • Seek a probability distribution of the noise that minimizes the cost while meeting the differential privacy constraint (see the sketch below)!
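A minimal sketch of computing w with TensorFlow (which the implementation slide mentions), assuming a small Keras model and one batch standing in for all training examples; the model and data here are hypothetical.

```python
import tensorflow as tf

# Hypothetical model and batch standing in for the full training set.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="tanh"),
    tf.keras.layers.Dense(1),
])
x = tf.random.normal((32, 2))
y = tf.random.normal((32, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))  # cost C on the examples
grads = tape.gradient(loss, model.trainable_variables)

# Flatten dC/dtheta_i over every parameter into one sensitivity vector w.
w = tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)
print(w.shape)  # (d,), one entry per model parameter
```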

SLIDE 11

Optimized Additive Noise Scheme

Objective:

minimize_P ∫_{z_d} ⋯ ∫_{z_1} ⟨w, z⟩ P(dz_1 ⋯ dz_d)

where w is the model sensitivity, P is the distribution of the noise, and z is the additive noise.

If w_i = ∂C/∂θ_i > 0, the cost increases as θ_i increases ⇒ push z_i to a direction where z_i < 0.

If ∂C/∂θ_i > ∂C/∂θ_j > 0, the cost is more sensitive to changes of θ_i than of θ_j ⇒ less noise should be added to θ_i.

SLIDE 12

Optimized Additive Noise Scheme

Constraint: the global sensitivity on adjacent inputs is

α = sup_{X, X′ : d(X, X′) = 1} ‖g_t − g′_t‖

where X and X′ are training datasets differing by a single instance, and ‖·‖ is the L2-norm between the gradients.

Pr[M(g_t) ∈ O] ≤ e^ε Pr[M(g′_t) ∈ O]
⇒ Pr[g_t + z ∈ O] ≤ e^ε Pr[g′_t + z ∈ O]
⇒ Pr[z ∈ O − g_t] ≤ e^ε Pr[z ∈ O − g′_t]
⇒ Pr[z ∈ O′] ≤ e^ε Pr[z ∈ O′ + g_t − g′_t]

SLIDE 13

Optimized Additive Noise Scheme

minimize_p ∫_{z ∈ ℝ^d} ‖w ⊙ z‖_1 p(z) dz
s.t. ln( p(z) / p(z + Δ) ) ≤ ε, ∀ ‖Δ‖ ≤ α, Δ ∈ ℝ^d

Equivalently, in measure form:

minimize_P ∫_{z_d} ⋯ ∫_{z_1} ⟨w, z⟩ P(dz_1 ⋯ dz_d)
s.t. Pr[z ∈ O′] ≤ e^ε Pr[z ∈ O′ + Δ], ∀ O′ ⊆ ℝ^d, ‖Δ‖ ≤ α
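The program above is infinite-dimensional. Purely as a toy illustration (not the paper's solver), the sketch below discretizes a one-dimensional version onto a small grid and solves the resulting linear program with SciPy; ε, α, w, and the grid are all made up.

```python
import numpy as np
from scipy.optimize import linprog

eps, alpha, w = 0.5, 1.0, 2.0      # hypothetical privacy budget, sensitivity, w
z = np.linspace(-5.0, 5.0, 11)     # discretized noise support, spacing 1.0
n = len(z)

c = w * z                          # objective: sum_i w * z_i * p_i
A_ub, b_ub = [], []
for i in range(n):                 # DP ratio constraint between nearby points:
    for j in range(n):             # p_i <= e^eps * p_j whenever |z_i - z_j| <= alpha
        if i != j and abs(z[i] - z[j]) <= alpha:
            row = np.zeros(n)
            row[i], row[j] = 1.0, -np.exp(eps)
            A_ub.append(row)
            b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.ones((1, n)), b_eq=[1.0],   # probabilities sum to 1
              bounds=[(0, None)] * n)
print(dict(zip(z.round(1), res.x.round(4))))
```

The solution puts the largest mass on the most negative noise values and decays by a factor e^−ε per grid step, matching the intuition from the objective.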

SLIDE 14

Composition

  • So far, we have only shown how to provide a privacy guarantee in a single iteration of the update
  • In practice, SGD takes many iterations until convergence
  • Iterative computation exposes the training data multiple times, degrading the privacy level!
  • Our solution: the advanced composition theorem for differential privacy + privacy amplification by sampling (sketched below)
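A sketch of how such an accountant could combine the two ingredients; the bounds below are the standard textbook forms of privacy amplification by sampling and advanced composition (Dwork and Roth), not necessarily the exact variants used in the paper, and the numbers are hypothetical.

```python
import math

def amplified_eps(eps_step, q):
    """Sampling amplification: an eps-DP step on a random fraction q of the
    data satisfies ln(1 + q*(e^eps - 1))-DP."""
    return math.log(1.0 + q * (math.exp(eps_step) - 1.0))

def advanced_composition(eps_step, T, delta_prime):
    """Advanced composition: T eps-DP steps are (eps_total, T*delta + delta')-DP
    with the eps_total computed below."""
    return (eps_step * math.sqrt(2.0 * T * math.log(1.0 / delta_prime))
            + T * eps_step * (math.exp(eps_step) - 1.0))

# Hypothetical run: 2000 iterations at sampling rate 1% with per-step eps 0.1.
eps_step = amplified_eps(0.1, q=0.01)
print("per-step eps after sampling:", round(eps_step, 5))
print("total eps after composition:", round(advanced_composition(eps_step, 2000, 1e-5), 3))
```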

SLIDE 15

Optimized Additive Noise Mechanism

  1. Compute per-iteration privacy parameters according to the composition theorem
  2. For each iteration (see the end-to-end sketch below):
     1. Compute the model sensitivity w
     2. Solve the optimization problem to find the noise distribution
     3. Sample a noise vector
     4. For each batch of training data: compute and clip the gradient by the global sensitivity
     5. Compute the average gradient for the batch
     6. Add the noise to the average gradient
     7. Update the model parameters
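Putting the steps together, a skeletal NumPy loop under heavy assumptions: the model, gradients, and data are toy stand-ins, and sample_optimized_noise is a runnable placeholder for steps 2.2-2.3 (solving for and sampling the optimized distribution), not the paper's actual solver.

```python
import numpy as np

def grad(params, x, y):
    """Toy per-example gradient of a squared-error linear model."""
    return 2.0 * (np.dot(params, x) - y) * x

def model_sensitivity(params, data):
    """Stub for step 2.1: derivative of the cost w.r.t. every parameter."""
    return np.mean([grad(params, x, y) for x, y in data], axis=0)

def sample_optimized_noise(w, eps_step, alpha):
    """Placeholder for steps 2.2-2.3: coordinate-wise Laplace, scaled down on
    more sensitive coordinates, purely to keep the sketch runnable."""
    return np.random.laplace(0.0, alpha / eps_step / (1.0 + np.abs(w)))

eps_step, alpha, lr = 0.1, 1.0, 0.05   # hypothetical per-iteration budget etc.
params = np.zeros(3)
data = [(np.random.randn(3), np.random.randn()) for _ in range(64)]

for t in range(100):                                    # step 2: iterate
    w = model_sensitivity(params, data)                 # step 2.1
    z = sample_optimized_noise(w, eps_step, alpha)      # steps 2.2-2.3
    batch = data[(t * 16) % 64:(t * 16) % 64 + 16]
    gs = [grad(params, x, y) for x, y in batch]
    gs = [g * min(1.0, alpha / max(np.linalg.norm(g), 1e-12)) for g in gs]  # 2.4
    g_avg = np.mean(gs, axis=0)                         # step 2.5
    params -= lr * (g_avg + z)                          # steps 2.6-2.7
```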

SLIDE 16

Implementation

We implement the optimized noise generator (ours) and a Gaussian noise generator (the state of the art, [Abadi et al., CCS '16]) on Keras and TensorFlow.

Problem: computational challenges due to high dimensionality.

Solution: solve the optimization problem using GPU operations, with a NumPy noise generator.
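For the noise generator itself, a NumPy sampler over a solved discrete distribution could look like the sketch below; the support and probabilities are placeholders (e.g. the grid and res.x output of the linear-program sketch earlier).

```python
import numpy as np

# Placeholder solved distribution: support points and their probabilities.
support = np.linspace(-5.0, 5.0, 11)
probs = np.exp(-0.5 * np.abs(support + 2.0))
probs /= probs.sum()

def sample_noise(n_samples):
    """Draw additive noise from the solved discrete distribution."""
    return np.random.choice(support, size=n_samples, p=probs)

print(sample_noise(5))
```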

SLIDE 17

Our scheme achieves higher accuracy than [Abadi et al., CCS '16] under the same privacy guarantee.

[Figure, MNIST: accuracy (%) vs. privacy budget ε (0.01 to 1.2), comparing the Gaussian mechanism (G) and ours (O) at δ = 1e-5 and δ = 1e-4]

[Figure, CIFAR-10: accuracy (%) vs. iteration number (400 to 2000) for G and O at noise levels 0.3 and 1 (δ = 1e-5), against unperturbed training]

SLIDE 18

Thank you!
