

SLIDE 1

Differentially-Private Deep Learning from an Optimization Perspective

Presenter: Liyao Xiang, Shanghai Jiao Tong University, 4/30/2019

SLIDE 2

Privacy Threat

  • Personal information in the big data era
  • Is anonymization sufficient to protect user privacy?
  • Netflix recommendation challenge: remove personal identity information, replace names with random numbers
  • De-anonymize the Netflix database with the public information on IMDb
  • De-anonymization even works on partial, distorted, or wrong data!

SLIDE 3

Side Information

[Figure: a "new comer" combines side information with the released data to re-identify records]

Side information hurts privacy!

SLIDE 4

Differential Privacy

[Figure: a mechanism M maps two adjacent inputs to overlapping output distributions P[M(I) ∈ O] and P[M(I′) ∈ O]]

Constraint:

P[M(I) ∈ O] ≤ e^ε · P[M(I′) ∈ O]

for all adjacent inputs I, I′ and all output sets O. A smaller ε indicates a higher privacy level.
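As a concrete check of this constraint (an illustration added here, not from the slides), the snippet below instantiates M as the classic Laplace mechanism, whose noise scale α/ε is known to satisfy ε-differential privacy, and verifies numerically that the output-density ratio on two adjacent inputs never exceeds e^ε. The query values and all parameters are hypothetical.

```python
import numpy as np

def laplace_density(z, loc, scale):
    """Density of the Laplace distribution at z."""
    return np.exp(-np.abs(z - loc) / scale) / (2 * scale)

eps = 0.5                 # privacy budget (hypothetical)
alpha = 1.0               # sensitivity: |f(I) - f(I')| <= alpha on adjacent inputs
scale = alpha / eps       # Laplace scale known to give eps-DP

f_I, f_I_adj = 7.0, 8.0   # query answers on two adjacent inputs

# The density ratio at every output point must stay below e^eps.
zs = np.linspace(-20, 20, 2001)
ratio = laplace_density(zs, f_I, scale) / laplace_density(zs, f_I_adj, scale)
print(f"max ratio {ratio.max():.4f} <= e^eps = {np.exp(eps):.4f}")
```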

SLIDE 5

Deep Learning with Differential Privacy

Perturbation: publish a perturbed model θ̃ = (θ̃_1, …, θ̃_n) instead of the trained model θ = (θ_1, …, θ_n).

The model tells! Differentially-private stochastic gradient descent (DPSGD): add noise to the gradient g_t in each iteration of the update.
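A minimal sketch of one such noisy update, assuming per-example gradients are already available as NumPy arrays; the clipping norm, learning rate, and the Gaussian stand-in for the noise distribution are illustrative choices, not the paper's mechanism.

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, sigma=1.0):
    """One DPSGD update: clip each gradient, average, add noise, step."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]          # bound each example's influence
    g_avg = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, sigma * clip_norm / len(per_example_grads),
                             size=g_avg.shape)      # calibrated to the clip bound
    return params - lr * (g_avg + noise)

# Hypothetical usage: 4 per-example gradients for a 3-parameter model.
params = np.zeros(3)
grads = [np.random.randn(3) for _ in range(4)]
params = dpsgd_step(params, grads)
```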

SLIDE 6

Deep Learning with Differential Privacy

The recent work [Abadi et al., CCS '16] only achieves ~90% accuracy on MNIST, whereas training without privacy reaches over 99%. The result of [Shokri et al., CCS '15] is even worse.

[Figure: the privacy vs. accuracy tradeoff]

SLIDE 7

In previous works, the link between the inserted noise and the accuracy is broken:

[Figure: θ perturbed with more noise (higher privacy level) vs. less noise; lower accuracy?]

SLIDE 8

Model Sensitivity

[Figure: the same total amount of noise added to θ can yield different accuracy levels, e.g. 90% vs. 85%]

SLIDE 9

Example

[Figure: a small network with inputs X1, X2, hidden units h1, h2, and output Y, parameterized by weights θ1, …, θ6 and biases b1, b2, b3; adding the same noise to b3 and to θ6 incurs a different cost!]
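The point can be reproduced numerically. The toy network below (made-up weights, tanh units, squared error; only θ6 and b3 are varied) perturbs the two parameters by the same amount and observes different changes in cost.

```python
import numpy as np

def cost(theta6, b3):
    """Toy 2-2-1 network with fixed inputs; only theta6 and b3 vary."""
    x1, x2 = 1.0, -0.5
    h1 = np.tanh(0.3 * x1 + 0.7 * x2 + 0.1)    # hidden unit h1 (bias b1)
    h2 = np.tanh(-0.4 * x1 + 0.2 * x2 + 0.05)  # hidden unit h2 (bias b2)
    y = np.tanh(0.9 * h1 + theta6 * h2 + b3)   # output Y (weight theta6, bias b3)
    return (y - 1.0) ** 2                      # squared error against target 1

base = cost(theta6=0.5, b3=0.0)
noise = 0.3  # identical perturbation applied to two different parameters
print("cost change, noise on theta6:", cost(0.5 + noise, 0.0) - base)
print("cost change, noise on b3:    ", cost(0.5, 0.3) - base)
```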

SLIDE 10

Optimized Additive Noise Scheme

  • Model sensitivity w = (w_1, w_2, …, w_d) ∈ ℝ^d: the derivative vector of the cost on all training examples w.r.t. all parameters
  • To keep the cost minimal, noise should be added to the least sensitive directions of the cost function
  • Seek a probability distribution of the noise that minimizes the cost while meeting the differential privacy constraint (see the sketch below)!
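A minimal sketch of computing w with TensorFlow (which the implementation slide mentions), assuming a small Keras model and one batch standing in for all training examples; the model and data here are hypothetical.

```python
import tensorflow as tf

# Hypothetical model and batch standing in for the full training set.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="tanh"),
    tf.keras.layers.Dense(1),
])
x = tf.random.normal((32, 2))
y = tf.random.normal((32, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))  # cost C on the examples
grads = tape.gradient(loss, model.trainable_variables)

# Flatten dC/dtheta_i over every parameter into one sensitivity vector w.
w = tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)
print(w.shape)  # (d,), one entry per model parameter
```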

SLIDE 11

Optimized Additive Noise Scheme

Objective:

minimize_P ∫_{z_d} ⋯ ∫_{z_1} ⟨w, z⟩ P(dz_1 ⋯ dz_d)

where w is the model sensitivity, P is the distribution of the noise, and z is the additive noise.

If w_i = ∂C/∂θ_i > 0, the cost increases as θ_i increases ⇒ push z_i to a direction where z_i < 0.

If ∂C/∂θ_i > ∂C/∂θ_j > 0, the cost is more sensitive to changes of θ_i than of θ_j ⇒ less noise should be added to θ_i.

SLIDE 12

Optimized Additive Noise Scheme

Constraint: the global sensitivity on adjacent inputs is

α = sup_{X, X′ : d(X, X′) = 1} ‖g_t − g′_t‖

where X and X′ are training datasets differing by a single instance, and ‖·‖ is the L2-norm between the gradients.

Pr[M(g_t) ∈ O] ≤ e^ε Pr[M(g′_t) ∈ O]
⇒ Pr[g_t + z ∈ O] ≤ e^ε Pr[g′_t + z ∈ O]
⇒ Pr[z ∈ O − g_t] ≤ e^ε Pr[z ∈ O − g′_t]
⇒ Pr[z ∈ O′] ≤ e^ε Pr[z ∈ O′ + g_t − g′_t]

SLIDE 13

Optimized Additive Noise Scheme

minimize_p ∫_{z ∈ ℝ^d} ‖w ⊙ z‖_1 p(z) dz
s.t. ln( p(z) / p(z + Δ) ) ≤ ε, ∀ ‖Δ‖ ≤ α, Δ ∈ ℝ^d

Equivalently, in measure form:

minimize_P ∫_{z_d} ⋯ ∫_{z_1} ⟨w, z⟩ P(dz_1 ⋯ dz_d)
s.t. Pr[z ∈ O′] ≤ e^ε Pr[z ∈ O′ + Δ], ∀ O′ ⊆ ℝ^d, ‖Δ‖ ≤ α
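The program above is infinite-dimensional. Purely as a toy illustration (not the paper's solver), the sketch below discretizes a one-dimensional version onto a small grid and solves the resulting linear program with SciPy; ε, α, w, and the grid are all made up.

```python
import numpy as np
from scipy.optimize import linprog

eps, alpha, w = 0.5, 1.0, 2.0      # hypothetical privacy budget, sensitivity, w
z = np.linspace(-5.0, 5.0, 11)     # discretized noise support, spacing 1.0
n = len(z)

c = w * z                          # objective: sum_i w * z_i * p_i
A_ub, b_ub = [], []
for i in range(n):                 # DP ratio constraint between nearby points:
    for j in range(n):             # p_i <= e^eps * p_j whenever |z_i - z_j| <= alpha
        if i != j and abs(z[i] - z[j]) <= alpha:
            row = np.zeros(n)
            row[i], row[j] = 1.0, -np.exp(eps)
            A_ub.append(row)
            b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.ones((1, n)), b_eq=[1.0],   # probabilities sum to 1
              bounds=[(0, None)] * n)
print(dict(zip(z.round(1), res.x.round(4))))
```

The solution puts the largest mass on the most negative noise values and decays by a factor e^−ε per grid step, matching the intuition from the objective.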

SLIDE 14

Composition

  • So far, we have only shown how to provide a privacy guarantee in a single iteration of the update
  • In practice, SGD takes many iterations until convergence
  • Iterative computation exposes the training data multiple times, degrading the privacy level!
  • Our solution: the advanced composition theorem for differential privacy + privacy amplification by sampling (sketched below)
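A sketch of how such an accountant could combine the two ingredients; the bounds below are the standard textbook forms of privacy amplification by sampling and advanced composition (Dwork and Roth), not necessarily the exact variants used in the paper, and the numbers are hypothetical.

```python
import math

def amplified_eps(eps_step, q):
    """Sampling amplification: an eps-DP step on a random fraction q of the
    data satisfies ln(1 + q*(e^eps - 1))-DP."""
    return math.log(1.0 + q * (math.exp(eps_step) - 1.0))

def advanced_composition(eps_step, T, delta_prime):
    """Advanced composition: T eps-DP steps are (eps_total, T*delta + delta')-DP
    with the eps_total computed below."""
    return (eps_step * math.sqrt(2.0 * T * math.log(1.0 / delta_prime))
            + T * eps_step * (math.exp(eps_step) - 1.0))

# Hypothetical run: 2000 iterations at sampling rate 1% with per-step eps 0.1.
eps_step = amplified_eps(0.1, q=0.01)
print("per-step eps after sampling:", round(eps_step, 5))
print("total eps after composition:", round(advanced_composition(eps_step, 2000, 1e-5), 3))
```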

SLIDE 15

Optimized Additive Noise Mechanism

  1. Compute per-iteration privacy parameters according to the composition theorem
  2. For each iteration (see the end-to-end sketch below):
     1. Compute the model sensitivity w
     2. Solve the optimization problem to find the noise distribution
     3. Sample a noise vector
     4. For each batch of training data: compute and clip the gradient by the global sensitivity
     5. Compute the average gradient for the batch
     6. Add the noise to the average gradient
     7. Update the model parameters
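Putting the steps together, a skeletal NumPy loop under heavy assumptions: the model, gradients, and data are toy stand-ins, and sample_optimized_noise is a runnable placeholder for steps 2.2-2.3 (solving for and sampling the optimized distribution), not the paper's actual solver.

```python
import numpy as np

def grad(params, x, y):
    """Toy per-example gradient of a squared-error linear model."""
    return 2.0 * (np.dot(params, x) - y) * x

def model_sensitivity(params, data):
    """Stub for step 2.1: derivative of the cost w.r.t. every parameter."""
    return np.mean([grad(params, x, y) for x, y in data], axis=0)

def sample_optimized_noise(w, eps_step, alpha):
    """Placeholder for steps 2.2-2.3: coordinate-wise Laplace, scaled down on
    more sensitive coordinates, purely to keep the sketch runnable."""
    return np.random.laplace(0.0, alpha / eps_step / (1.0 + np.abs(w)))

eps_step, alpha, lr = 0.1, 1.0, 0.05   # hypothetical per-iteration budget etc.
params = np.zeros(3)
data = [(np.random.randn(3), np.random.randn()) for _ in range(64)]

for t in range(100):                                    # step 2: iterate
    w = model_sensitivity(params, data)                 # step 2.1
    z = sample_optimized_noise(w, eps_step, alpha)      # steps 2.2-2.3
    batch = data[(t * 16) % 64:(t * 16) % 64 + 16]
    gs = [grad(params, x, y) for x, y in batch]
    gs = [g * min(1.0, alpha / max(np.linalg.norm(g), 1e-12)) for g in gs]  # 2.4
    g_avg = np.mean(gs, axis=0)                         # step 2.5
    params -= lr * (g_avg + z)                          # steps 2.6-2.7
```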

SLIDE 16

Implementation

We implement the optimized noise generator (ours) and a Gaussian noise generator (the state of the art, [Abadi et al., CCS '16]) on Keras and TensorFlow.

Problem: computational challenges due to high dimensionality.

Solution: solve the optimization problem using GPU operations, with a NumPy noise generator.
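For the noise generator itself, a NumPy sampler over a solved discrete distribution could look like the sketch below; the support and probabilities are placeholders (e.g. the grid and res.x output of the linear-program sketch earlier).

```python
import numpy as np

# Placeholder solved distribution: support points and their probabilities.
support = np.linspace(-5.0, 5.0, 11)
probs = np.exp(-0.5 * np.abs(support + 2.0))
probs /= probs.sum()

def sample_noise(n_samples):
    """Draw additive noise from the solved discrete distribution."""
    return np.random.choice(support, size=n_samples, p=probs)

print(sample_noise(5))
```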

SLIDE 17

Our scheme achieves higher accuracy than [Abadi et al., CCS '16] under the same privacy guarantee.

[Figure, MNIST: accuracy (%) vs. privacy budget ε (0.01 to 1.2), comparing the Gaussian mechanism (G) and ours (O) at δ = 1e-5 and δ = 1e-4]

[Figure, CIFAR-10: accuracy (%) vs. iteration number (400 to 2000) for G and O at noise levels 0.3 and 1 (δ = 1e-5), against unperturbed training]

SLIDE 18

Thank you!
