

SLIDE 1

Differentially Private Empirical Risk Minimization with Non-convex Loss Functions

Di Wang, Changyou Chen and Jinhui Xu

State University of New York at Buffalo

International Conference on Machine Learning 2019


SLIDE 2

Outline

1. Introduction
   - Problem Description
   - Result 1
   - Result 2
   - Result 3



SLIDE 4

Empirical Risk Minimization (ERM)

Given: a dataset D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where each (x_i, y_i) ∈ ℝ^d × ℝ is sampled from a distribution P.

Regularization r(·) : ℝ^d → ℝ; we use ℓ_2 regularization with r(w) = (λ/2)‖w‖_2².

For a loss function ℓ, the (regularized) Empirical Risk is

L̂^r(w; D) = (1/n) ∑_{i=1}^n ℓ(w; x_i, y_i) + r(w),

and the (regularized) Population Risk is

L^r_P(w) = E_{(x,y)∼P}[ℓ(w; x, y)] + r(w).

Goal: Find w so as to minimize the empirical or population risk.

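To ground these definitions, here is a minimal numpy sketch of the regularized empirical risk; the squared loss and all parameter values are illustrative stand-ins, not choices made in the talk.

```python
import numpy as np

def empirical_risk(w, X, y, loss, lam=0.1):
    """Regularized empirical risk: (1/n) * sum_i loss(w; x_i, y_i) + (lam/2) * ||w||_2^2."""
    data_term = np.mean([loss(w, X[i], y[i]) for i in range(X.shape[0])])
    reg_term = 0.5 * lam * np.dot(w, w)   # r(w) = (lam/2) * ||w||_2^2
    return data_term + reg_term

# Squared loss (⟨w, x⟩ - y)^2, an arbitrary stand-in for the generic loss ℓ.
sq_loss = lambda w, x, y: (np.dot(w, x) - y) ** 2

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)   # n = 100 points in R^5
print(empirical_risk(np.zeros(5), X, y, sq_loss))
```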

SLIDE 5

(ε, δ)-Differential Privacy (DP)

Differential Privacy (DP) [Dwork et al., 2006]

We say that two datasets D and D′ are neighbors if they differ in only one entry, denoted D ∼ D′.

A randomized algorithm A is (ε, δ)-differentially private if, for all neighboring datasets D, D′ and all events S in the output space of A, we have Pr(A(D) ∈ S) ≤ e^ε Pr(A(D′) ∈ S) + δ.

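As a concrete (and standard, not paper-specific) instance of an (ε, δ)-DP algorithm, here is a minimal sketch of the Gaussian mechanism: add noise calibrated to the ℓ_2-sensitivity of the released statistic. The calibration σ = √(2 ln(1.25/δ)) Δ_2/ε is the classical bound, valid for ε < 1.

```python
import numpy as np

def gaussian_mechanism(data, f, l2_sensitivity, eps, delta, rng):
    """Release f(data) with (eps, delta)-DP by adding Gaussian noise
    scaled to the l2-sensitivity of f (calibration valid for eps < 1)."""
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / eps
    value = f(data)
    return value + rng.normal(scale=sigma, size=np.shape(value))

rng = np.random.default_rng(0)
data = rng.uniform(0, 1, size=1000)          # entries bounded in [0, 1]
mean_fn = lambda d: d.mean()                 # changing one entry moves the mean
sens = 1.0 / len(data)                       # by at most 1/n
print(gaussian_mechanism(data, mean_fn, sens, eps=0.5, delta=1e-5, rng=rng))
```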

SLIDE 6

DP-ERM

Determine a sample complexity n = n(1/ε, 1/δ, d, 1/α) such that there is an (ε, δ)-DP algorithm whose output w_priv achieves an α-error in the expected excess empirical risk:

Err^r_D(w_priv) = E[L̂^r(w_priv; D)] − min_{w∈ℝ^d} L̂^r(w; D) ≤ α,

or in the expected excess population risk:

Err^r_P(w_priv) = E[L^r_P(w_priv)] − min_{w∈ℝ^d} L^r_P(w) ≤ α.

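To make the error measure concrete, a small sketch that estimates Err_D of a given output by comparing its empirical risk against a minimum approximated over a finite candidate grid; the grid (standing in for the minimum over ℝ^d), the data, and w_priv are all illustrative assumptions.

```python
import numpy as np

def excess_empirical_risk(emp_risk, w_priv, candidates):
    """Err_D(w_priv) = L-hat(w_priv; D) - min_w L-hat(w; D), with the
    minimum approximated over a finite candidate grid."""
    return emp_risk(w_priv) - min(emp_risk(w) for w in candidates)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 1)), rng.normal(size=200)
emp_risk = lambda w: np.mean((X @ w - y) ** 2)           # unregularized L-hat
grid = [np.array([t]) for t in np.linspace(-2.0, 2.0, 81)]
w_priv = np.array([0.3])                                 # stand-in for a DP output
print(excess_empirical_risk(emp_risk, w_priv, grid))
```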


SLIDE 9

Motivation

Previous work on DP-ERM mainly focuses on convex loss functions. For non-convex loss functions, [Zhang et al., 2017] and [Wang and Xu, 2019] studied the problem and used, as the error measure, the ℓ_2 norm of the gradient at the private estimator, i.e., ‖∇L̂^r(w_priv; D)‖_2 and E_P‖∇ℓ(w_priv; x, y)‖_2.

Main Question: Can the excess empirical (population) risk be used to measure the error of non-convex loss functions in the differential privacy model?




SLIDE 13

Result 1

Theorem 1

If the loss function is L-Lipschitz, twice differentiable and M-smooth, then by using a private version of Gradient Langevin Dynamics (DP-GLD) we show that the excess empirical (or population) risk is upper bounded by Õ(d log(1/δ) / (ε² log n)).

The proof is based on some recent developments in Bayesian learning and the analysis of GLD. By using a finer analysis of the time-average error of the associated SDE, we show the following.

Theorem 2

For the excess empirical risk, there is an (ε, δ)-DP algorithm which satisfies lim_{T→∞} Err^r_D(w_T) ≤ Õ(C_0(d) log(1/δ) / (n^τ ε^τ)), where C_0(d) is a function of d and 0 < τ < 1 is some constant.

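As a rough illustration of the algorithmic template, a minimal sketch of gradient Langevin dynamics; the step size, inverse temperature, and iteration count are placeholder values, and the calibration that makes the injected noise yield (ε, δ)-DP (the actual content of DP-GLD) is elided.

```python
import numpy as np

def gld(grad_L, w0, eta=0.01, beta=100.0, T=1000, rng=None):
    """Gradient Langevin Dynamics on the empirical risk:
    w <- w - eta * grad_L(w) + sqrt(2 * eta / beta) * N(0, I).
    In DP-GLD this same Gaussian injection, with (eta, beta, T)
    calibrated as in the paper's analysis, provides (eps, delta)-DP."""
    rng = rng or np.random.default_rng(0)
    w, iterates = w0.copy(), []
    for _ in range(T):
        w = w - eta * grad_L(w) + np.sqrt(2 * eta / beta) * rng.normal(size=w.shape)
        iterates.append(w.copy())
    return iterates          # time averages of these appear in the risk bounds

# Toy non-convex objective (hypothetical): L(w) = ||w||^2 / 2 + cos(w_1).
grad = lambda w: w + np.array([-np.sin(w[0])] + [0.0] * (len(w) - 1))
print(gld(grad, np.ones(5))[-1])
```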



SLIDE 17

Result 2

Are these bounds tight? Based on the exponential mechanism, we have:

Empirical Risk

For any β < 1, there is an ε-differentially private algorithm whose output w_priv achieves an excess empirical risk Err^r_D(w_priv) ≤ Õ(d/(nε)) with probability at least 1 − β.

Population Risk

For Generalized Linear Models and Robust Regression (whose loss functions are ℓ(w; x, y) = (σ(⟨w, x⟩) − y)² and ℓ(w; x, y) = Φ(⟨w, x⟩ − y), respectively), under some reasonable assumptions there is an (ε, δ)-DP algorithm whose excess population risk is upper bounded by Err_P(w_priv) ≤ O((d ln(1/δ))^{1/4} / (√n ε)).

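For intuition, a minimal sketch of the exponential mechanism over a discretized candidate set with utility u(w) = −L̂(w; D); the grid and the bounded-loss sensitivity estimate are illustrative assumptions rather than the construction used in the paper.

```python
import numpy as np

def exponential_mechanism(candidates, utility, sensitivity, eps, rng):
    """Sample candidate w with Pr[w] proportional to exp(eps * u(w) / (2 * Du)),
    which is eps-DP when Du bounds the sensitivity of the utility u."""
    u = np.array([utility(w) for w in candidates])
    probs = np.exp(eps * (u - u.max()) / (2 * sensitivity))   # shift for stability
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 1)), rng.normal(size=200)
emp_risk = lambda w: np.mean((X @ w - y) ** 2)
grid = [np.array([t]) for t in np.linspace(-2.0, 2.0, 81)]    # discretized domain
# Utility = -L-hat; if each loss lies in [0, B], its sensitivity is B / n.
w_priv = exponential_mechanism(grid, lambda w: -emp_risk(w),
                               sensitivity=1.0 / 200, eps=1.0, rng=rng)
print(w_priv)
```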



SLIDE 23

Finding Approximate Local Minimum Privately

Finding the global minimum of a non-convex function is challenging! Recent research on deep learning and other non-convex problems shows that local minima, rather than arbitrary critical points, are sufficient. But finding local minima is still NP-hard. Fortunately, many non-convex functions are strict saddle, so it suffices to find a second-order stationary point (i.e., an approximate local minimum).

Definition

w is an α-second-order stationary point (α-SOSP) if ‖∇F(w)‖_2 ≤ α and λ_min(∇²F(w)) ≥ −√(ρα). (1)

Can we find an approximate local minimum that escapes saddle points while keeping the algorithm (ε, δ)-differentially private?

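As a sanity check on definition (1), a small numpy sketch that tests whether a point is an α-SOSP of a twice-differentiable F, assuming closed-form gradient and Hessian (ρ is the Hessian-Lipschitz constant); the toy objective is hypothetical.

```python
import numpy as np

def is_alpha_sosp(grad, hess, w, alpha, rho):
    """Check definition (1): ||grad F(w)||_2 <= alpha and
    lambda_min(hess F(w)) >= -sqrt(rho * alpha)."""
    g_ok = np.linalg.norm(grad(w)) <= alpha
    lam_min = np.linalg.eigvalsh(hess(w)).min()
    return g_ok and lam_min >= -np.sqrt(rho * alpha)

# Toy example: F(w) = (w_1^2 - 1)^2 + w_2^2, with local minima at (±1, 0).
grad = lambda w: np.array([4 * w[0] * (w[0] ** 2 - 1), 2 * w[1]])
hess = lambda w: np.diag([12 * w[0] ** 2 - 4, 2.0])
print(is_alpha_sosp(grad, hess, np.array([1.0, 0.0]), alpha=0.1, rho=12.0))  # True
print(is_alpha_sosp(grad, hess, np.array([0.0, 0.0]), alpha=0.1, rho=12.0))  # saddle: False
```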


SLIDE 26

Result 3

On one hand, (Ge et al., 2015) proposed an algorithm, noisy Stochastic Gradient Descent, to find approximate local minima. On the other hand, in the DP community a popular method for ERM is DP-SGD, which adds Gaussian noise in each iteration. Using DP-GD, we can show:

Theorem 4

If the data size n is large enough that n ≥ Ω̃(√(log(1/δ)) · d log(1/ζ) / (ε α²)), (2) then with probability 1 − ζ, one of the outputs is an α-SOSP of the empirical risk L̂(·; D).

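A minimal sketch of DP-GD in the spirit described above: full-batch gradients, ℓ_2 clipping to bound sensitivity, and per-iteration Gaussian noise. The clipping bound, noise scale, and iteration count are illustrative assumptions; the paper's analysis calibrates them so the returned iterates contain an α-SOSP while preserving (ε, δ)-DP.

```python
import numpy as np

def dp_gd(grad_L, w0, T=200, eta=0.1, clip=1.0, sigma=0.5, rng=None):
    """DP gradient descent: clip the empirical-risk gradient to bound its
    sensitivity, then perturb it with Gaussian noise each iteration.
    Returns all iterates; the theorem guarantees one is an alpha-SOSP."""
    rng = rng or np.random.default_rng(0)
    w, outputs = w0.copy(), []
    for _ in range(T):
        g = grad_L(w)
        g = g / max(1.0, np.linalg.norm(g) / clip)     # l2 clipping
        w = w - eta * (g + sigma * rng.normal(size=w.shape))
        outputs.append(w.copy())
    return outputs

# Same toy strict-saddle objective as before; the noise escapes the saddle at 0.
grad = lambda w: np.array([4 * w[0] * (w[0] ** 2 - 1), 2 * w[1]])
print(dp_gd(grad, np.zeros(2))[-1])   # near a local minimum (±1, 0)
```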

SLIDE 27

Thank you!
