Differentially-Private Deep Learning from an Optimization Perspective
Presenter: Liyao Xiang, Shanghai Jiao Tong University
4/30/2019
Privacy Threat
- Personal information in the big data era
- Is anonymization sufficient to protect user privacy?
- Netflix recommendation challenge: personal identity information was removed and names were replaced with random numbers
- Yet the Netflix database was de-anonymized using public information from IMDb
- De-anonymization even works on partial, distorted, or wrong data!
Side Information
Side information hurts privacy!
[Figure: a newcomer's record is linked using side information]
Differential Privacy
[Figure: a randomized mechanism M maps adjacent inputs (illustrated with the values 7 and 8) to output distributions P[M(7) ∈ O] and P[M(8) ∈ O] over an output set O]

Constraint:
P[M(I) ∈ O] ≤ e^ε · P[M(I′) ∈ O]

A smaller ε indicates a higher privacy level.
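The constraint above can be spot-checked for the textbook Laplace mechanism (a standard example, not from the slides): with query sensitivity Δf and noise scale b = Δf/ε, the density ratio between outputs of adjacent inputs never exceeds e^ε.

```python
import math

def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution centered at mu with scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

eps, delta_f = 0.5, 1.0          # privacy parameter and query sensitivity
b = delta_f / eps                # scale calibrated to the sensitivity

# Adjacent inputs with query answers 7 and 8 (as in the slide's illustration)
xs = [-3.0, 0.0, 7.5, 20.0]
ratios = [laplace_pdf(x, 7.0, b) / laplace_pdf(x, 8.0, b) for x in xs]
assert all(r <= math.exp(eps) + 1e-12 for r in ratios)
```

The ratio peaks at exactly e^ε for outputs lying on the far side of both centers, which is why the calibration b = Δf/ε is tight.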
Deep Learning with Differential Privacy
Perturbation: publish a perturbed model ϑ = (ϑ₁, …, ϑₙ) instead of the trained model θ = (θ₁, …, θₙ), because the model itself tells on its training data!
Differentially-private stochastic gradient descent (DPSGD): add noise to the gradient gₜ in each iteration of the update.
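One DPSGD update can be sketched in a few lines of NumPy (a minimal illustration, assuming a clipping bound `clip` and noise multiplier `sigma`; all names are ours, not from the slides):

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DPSGD update: clip each example's gradient, average, add Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))  # scale down if over bound
    g_bar = np.mean(clipped, axis=0)
    # Noise with std sigma*clip on the sum, i.e. sigma*clip/B on the average
    noise = rng.normal(0.0, sigma * clip / len(per_example_grads), size=g_bar.shape)
    return params - lr * (g_bar + noise)

params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.1, 0.2, 0.3])]
new_params = dpsgd_step(params, grads)
```

Clipping bounds each example's influence on g_bar, which is what makes the Gaussian noise calibration meaningful.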
Deep Learning with Differential Privacy
The recent work [Abadi et al., CCS'16] only achieves ~90% accuracy, whereas training without privacy reaches over 99% on MNIST. The result of [Shokri et al., CCS'15] is even worse.
Privacy vs. accuracy trade-off
7
In previous works: link between inserted noise and accuracy is broken
θ ϑ ϑ
more noise less noise lower accuracy? higher privacy level
Model Sensitivity
[Figure: two perturbed models ϑ derived from θ with the same amount of total noise reach different accuracy levels, e.g., 90% vs. 85%]
Example
[Figure: a small network with inputs X1, X2, hidden units h1, h2, and output Y, with weights θ1, …, θ6 and biases b1, b2, b3]
Adding the same noise to b3 and to θ6 incurs a different cost!
Optimized Additive Noise Scheme
- Model sensitivity w = (w1, w2, …, wd) ∈ Dd: derivative
vector of the cost on all training examples w.r.t. all parameters
- To keep the cost minimal, noise should be added to the
least sensitive direction of the cost function
- Seek a probability distribution of the noise to minimize the
cost as well as to meet differential privacy constraint!
10
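The sensitivity vector can be estimated by finite differences. A toy sketch with a made-up quadratic cost (the function and names are illustrative, not the paper's):

```python
import numpy as np

# Hypothetical cost C(theta), more sensitive to theta[0] than theta[1]
def cost(theta):
    return 2.0 * theta[0] ** 2 + 0.5 * theta[1] ** 2

def sensitivity(theta, h=1e-6):
    """Finite-difference estimate of w = dC/dtheta."""
    w = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = h
        w[i] = (cost(theta + d) - cost(theta - d)) / (2 * h)
    return w

theta = np.array([1.0, 1.0])
w = sensitivity(theta)                  # dC/dtheta1 = 4*theta1, dC/dtheta2 = theta2
least_sensitive = np.argmin(np.abs(w))  # coordinate that tolerates the most noise
```

Here w ≈ (4, 1), so θ2 is the least sensitive direction and should absorb more of the noise.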
Optimized Additive Noise Scheme

Objective:
minimize over P:  ∫_{z_d} ⋯ ∫_{z_1} ⟨w, z⟩ P(dz₁ ⋯ dz_d)
where w is the model sensitivity, P is the distribution of the additive noise z.

Intuition:
- wᵢ = ∂C/∂θᵢ > 0: the cost increases as θᵢ increases ⇒ pushes zᵢ toward a direction where zᵢ < 0
- ∂C/∂θᵢ > ∂C/∂θⱼ > 0: the cost is more sensitive to changes of θᵢ than of θⱼ ⇒ less noise should be added to θᵢ
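The second intuition can be checked numerically: with the same total noise variance, the expected cost distortion |⟨w, z⟩| drops when variance is shifted toward the less sensitive coordinate. A toy Monte Carlo sketch (the sensitivity vector and variance splits are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
w = np.array([4.0, 1.0])            # toy sensitivity: theta1 is 4x as sensitive

def expected_abs_cost(variances, n=200_000):
    """Monte Carlo estimate of E|<w, z>| for zero-mean Gaussian noise z."""
    z = rng.normal(0.0, np.sqrt(variances), size=(n, 2))
    return np.mean(np.abs(z @ w))

# Same total variance (2.0), allocated differently across coordinates
uniform = expected_abs_cost(np.array([1.0, 1.0]))   # even split
skewed  = expected_abs_cost(np.array([0.2, 1.8]))   # more noise on less sensitive theta2
```

Analytically, ⟨w, z⟩ ~ N(0, Σᵢ wᵢ²vᵢ), so the skewed split reduces the relevant variance from 17 to 5 while adding the same total amount of noise.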
Optimized Additive Noise Scheme

Constraint: global sensitivity on adjacent inputs:
α = sup_{X, X′ s.t. d(X, X′) = 1} ‖gₜ − gₜ′‖
i.e., the L2-norm between the gradients when the training datasets differ by a single instance.

Derivation:
Pr[M(gₜ) ∈ O] ≤ e^ε Pr[M(gₜ′) ∈ O]
⇒ Pr[gₜ + z ∈ O] ≤ e^ε Pr[gₜ′ + z ∈ O]
⇒ Pr[z ∈ O − gₜ] ≤ e^ε Pr[z ∈ O − gₜ′]
⇒ Pr[z ∈ O′] ≤ e^ε Pr[z ∈ O′ + gₜ − gₜ′]
Optimized Additive Noise Scheme

Formulation over a density p:
minimize over p:  ∫_{z ∈ ℝ^d} ‖w ∘ z‖₁ p(z) dz
s.t.  ln [p(z) / p(z + Δ)] ≤ ε,  ∀ ‖Δ‖ ≤ α, Δ ∈ ℝ^d

Formulation over a probability measure P:
minimize over P:  ∫_{z_d} ⋯ ∫_{z_1} ⟨w, z⟩ P(dz₁ ⋯ dz_d)
s.t.  Pr[z ∈ O′] ≤ e^ε Pr[z ∈ O′ + Δ],  ∀ O′ ⊆ ℝ^d, ‖Δ‖ ≤ α
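One density that satisfies this differential-privacy constraint (irrespective of optimality) is the spherical Laplace-type density p(z) ∝ exp(−ε‖z‖/α): by the triangle inequality, ln p(z)/p(z+Δ) = ε(‖z+Δ‖ − ‖z‖)/α ≤ ε‖Δ‖/α ≤ ε. A numeric spot check (this specific density is our illustration, not the optimized solution from the slides):

```python
import numpy as np

eps, alpha = 0.5, 1.0

def log_density_unnorm(z):
    # p(z) proportional to exp(-eps * ||z|| / alpha); normalizer cancels in ratios
    return -eps * np.linalg.norm(z) / alpha

rng = np.random.default_rng(1)
for _ in range(1000):
    z = rng.normal(size=3) * 5
    delta = rng.normal(size=3)
    delta *= min(1.0, alpha / np.linalg.norm(delta))  # enforce ||delta|| <= alpha
    assert log_density_unnorm(z) - log_density_unnorm(z + delta) <= eps + 1e-9
```

The optimized scheme instead shapes the density using w, but any feasible p must pass exactly this check.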
Composition
- So far, we have only shown how to provide a privacy guarantee in a single iteration of the update
- In practice, SGD takes many iterations until convergence
- Iterative computation exposes the training data multiple times, degrading the privacy level!
- Our solution: the advanced composition theorem for differential privacy + privacy amplification by sampling
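The two ingredients can be sketched with the standard bounds (the Dwork-Roth advanced composition theorem and the usual subsampling amplification bound; the exact accounting in the paper may differ, and all parameter values below are made up):

```python
import math

def amplify_by_sampling(eps0, q):
    """Privacy amplification: running an eps0-DP step on a q-fraction sample
    is log(1 + q*(e^eps0 - 1))-DP (standard subsampling bound)."""
    return math.log(1 + q * (math.exp(eps0) - 1))

def advanced_composition(eps0, delta0, T, delta_prime):
    """Total (eps, delta) after T runs of an (eps0, delta0)-DP mechanism
    under advanced composition."""
    eps = eps0 * math.sqrt(2 * T * math.log(1 / delta_prime)) \
          + T * eps0 * (math.exp(eps0) - 1)
    return eps, T * delta0 + delta_prime

eps_iter = amplify_by_sampling(0.5, q=0.01)   # per-iteration budget after sampling
eps_total, delta_total = advanced_composition(eps_iter, 1e-7, T=2000,
                                              delta_prime=1e-5)
```

With a 1% sampling rate the per-iteration ε shrinks from 0.5 to roughly 0.0065, which is what keeps the 2000-iteration total in a usable range.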
Optimized Additive Noise Mechanism
1. Compute per-iteration privacy parameters according to the composition theorem
2. For each iteration:
   1. Compute the model sensitivity w
   2. Solve the optimization problem to find the noise distribution
   3. Sample a noise vector
   4. For each batch of training data: compute and clip the gradient by the global sensitivity
   5. Compute the average gradient for the batch
   6. Add noise to the average gradient
   7. Update the model parameters
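The loop above can be sketched in NumPy. This is a heavily simplified sketch: the noise sampler below is a stand-in heuristic (per-coordinate Laplace with larger scale on less sensitive coordinates), NOT the optimized distribution obtained by solving the paper's optimization problem, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def clip(g, alpha):
    """Step 2.4: clip a per-example gradient by the global sensitivity bound."""
    return g * min(1.0, alpha / max(np.linalg.norm(g), 1e-12))

def sample_noise(w, eps_iter, alpha):
    """Steps 2.2-2.3 (stand-in): per-coordinate Laplace noise with larger
    scale on less sensitive coordinates."""
    inv = 1.0 / (np.abs(w) + 1e-6)
    scales = (alpha / eps_iter) * inv / inv.mean()   # normalize average scale
    return rng.laplace(0.0, scales)

def train(theta, grad_fn, sens_fn, data, T=100, lr=0.1, eps_iter=0.5, alpha=1.0):
    for _ in range(T):                         # step 2: iterate
        w = sens_fn(theta)                     # step 2.1: model sensitivity
        z = sample_noise(w, eps_iter, alpha)   # steps 2.2-2.3
        g = np.mean([clip(grad_fn(theta, x), alpha) for x in data], axis=0)  # 2.4-2.5
        theta = theta - lr * (g + z)           # steps 2.6-2.7
    return theta

# Toy usage: fit a scalar to the mean of two points
data = [np.array([1.0]), np.array([3.0])]
grad_fn = lambda theta, x: 2.0 * (theta - x)
sens_fn = lambda theta: np.mean([grad_fn(theta, x) for x in data], axis=0)
theta_final = train(np.zeros(1), grad_fn, sens_fn, data, eps_iter=100.0)
```

With a generous budget (ε = 100 per iteration, hence tiny noise) the toy run settles near the data, illustrating the loop's mechanics rather than its privacy guarantee.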
Implementation
- Implemented our optimized noise generator and a Gaussian noise generator (the state of the art, [Abadi et al., CCS'16]) on Keras and TensorFlow
- Problem: computational challenges due to high dimensionality
- Solved the optimization problem using GPU operations
- NumPy-based noise generator
MNIST
Our scheme achieves higher accuracy than [Abadi et al., CCS'16] under the same privacy guarantee.
[Figure: accuracy (35-95%) vs. ε (0.01-1.2) for the Gaussian mechanism (G) and our optimized mechanism (O), each at δ = 1e-5 and δ = 1e-4]
CIFAR-10
[Figure: accuracy (40-85%) vs. iteration number (400-2000) for G and O at ε = 0.3 and ε = 1 (δ = 1e-5), compared with unperturbed training]
Thank you!