 
Differentially Private Empirical Risk Minimization with Non-convex Loss Functions
Di Wang, Changyou Chen and Jinhui Xu
State University of New York at Buffalo
International Conference on Machine Learning 2019
Di Wang Non-convex DP-ERM ICML 2019
Outline
Introduction
    Problem Description
Result 1
Result 2
Result 3
Empirical Risk Minimization (ERM)
Given: A dataset $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where each $(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}$ is drawn from a distribution $P$.
Regularization $r(\cdot): \mathbb{R}^d \to \mathbb{R}$; we use $\ell_2$ regularization with $r(w) = \frac{\lambda}{2}\|w\|_2^2$.
For a loss function $\ell$, the (regularized) empirical risk is
$$\hat{L}^r(w; D) = \frac{1}{n}\sum_{i=1}^{n} \ell(w; x_i, y_i) + r(w),$$
and the (regularized) population risk is
$$L^r_P(w) = \mathbb{E}_{(x, y) \sim P}[\ell(w; x, y)] + r(w).$$
Goal: Find $w$ so as to minimize the empirical or population risk.
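As a minimal sketch of the regularized empirical risk defined above, the snippet below evaluates $\hat{L}^r(w; D)$ for a small dataset. The loss is the squared loss composed with a link function, $(\sigma(\langle w, x\rangle) - y)^2$; taking $\sigma$ to be the logistic sigmoid is an assumption made for illustration, as are the function names and the toy data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, float]


def sigmoid(z: float) -> float:
    """Logistic sigmoid (assumed link function, chosen for illustration)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_squared_loss(w: Vector, x: Vector, y: float) -> float:
    """Non-convex loss (sigmoid(<w, x>) - y)^2."""
    inner = sum(wi * xi for wi, xi in zip(w, x))
    return (sigmoid(inner) - y) ** 2


def l2_regularizer(w: Vector, lam: float) -> float:
    """r(w) = (lam / 2) * ||w||_2^2."""
    return 0.5 * lam * sum(wi * wi for wi in w)


def empirical_risk(
    w: Vector,
    data: List[Example],
    loss: Callable[[Vector, Vector, float], float],
    lam: float,
) -> float:
    """(1/n) * sum_i loss(w; x_i, y_i) + r(w)."""
    average_loss = sum(loss(w, x, y) for x, y in data) / len(data)
    return average_loss + l2_regularizer(w, lam)


# Toy dataset with two examples in R^2 (hypothetical values).
toy_data: List[Example] = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]
risk_at_zero = empirical_risk((0.0, 0.0), toy_data, sigmoid_squared_loss, lam=0.5)
```

At $w = 0$ the sigmoid outputs $1/2$ for every example and the regularizer vanishes, so the risk reduces to the average of $(1/2 - y_i)^2$.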
$(\epsilon, \delta)$-Differential Privacy (DP)
Differential Privacy (DP) [Dwork et al., 2006]
We say that two datasets $D$ and $D'$ are neighbors if they differ by only one entry, denoted $D \sim D'$. A randomized algorithm $A$ is $(\epsilon, \delta)$-differentially private if for all neighboring datasets $D, D'$ and for all events $S$ in the output space of $A$, we have
$$\Pr(A(D) \in S) \le e^{\epsilon} \Pr(A(D') \in S) + \delta.$$
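To make the inequality in the definition concrete, here is a small check on randomized response, a standard mechanism that is not part of these slides and is used only as an illustration. The "dataset" is a single bit, so the two possible inputs are neighbors. With $\delta = 0$ and a finite output space, it is enough to check the inequality on single outputs, because the probability of any event is a sum over its outcomes. The function names are hypothetical.

```python
import math
from typing import Dict


def randomized_response_probs(bit: int, eps: float) -> Dict[int, float]:
    """Output distribution of randomized response on a single bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def satisfies_pure_dp(mechanism_eps: float, claimed_eps: float, tol: float = 1e-12) -> bool:
    """Check Pr(A(D) = o) <= e^claimed_eps * Pr(A(D') = o) for both outputs o
    and both orderings of the neighboring inputs D = {0}, D' = {1}."""
    p0 = randomized_response_probs(0, mechanism_eps)
    p1 = randomized_response_probs(1, mechanism_eps)
    bound = math.exp(claimed_eps)
    return all(
        p0[o] <= bound * p1[o] + tol and p1[o] <= bound * p0[o] + tol
        for o in (0, 1)
    )


# The mechanism run with eps = 1 meets the eps = 1 guarantee,
# but not the stricter eps = 0.5 guarantee.
meets_claim = satisfies_pure_dp(mechanism_eps=1.0, claimed_eps=1.0)
meets_stricter_claim = satisfies_pure_dp(mechanism_eps=1.0, claimed_eps=0.5)
```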
DP-ERM
Determine a sample complexity $n = n(1/\epsilon, 1/\delta, d, 1/\alpha)$ such that there is an $(\epsilon, \delta)$-DP algorithm whose output $w^{\text{priv}}$ achieves an $\alpha$-error in the expected excess empirical risk:
$$\mathrm{Err}^r_D(w^{\text{priv}}) = \mathbb{E}[\hat{L}^r(w^{\text{priv}}; D)] - \min_{w \in \mathbb{R}^d} \hat{L}^r(w; D) \le \alpha,$$
or in the expected excess population risk:
$$\mathrm{Err}^r_P(w^{\text{priv}}) = \mathbb{E}[L^r_P(w^{\text{priv}})] - \min_{w \in \mathbb{R}^d} L^r_P(w) \le \alpha.$$
Motivation
Previous work on DP-ERM mainly focuses on convex loss functions.
For non-convex loss functions, [Zhang et al., 2017] and [Wang and Xu, 2019] studied the problem and used, as the error measurement, the $\ell_2$ gradient norm of a private estimator, i.e., $\|\nabla \hat{L}^r(w^{\text{priv}}; D)\|_2$ and $\|\mathbb{E}_P[\nabla \ell(w^{\text{priv}}; x, y)]\|_2$.
Main Question: Can the excess empirical (population) risk be used to measure the error of non-convex loss functions in the differential privacy model?
Result 1
Theorem 1
If the loss function is $L$-Lipschitz, twice differentiable and $M$-smooth, then by using the private version of Gradient Langevin Dynamics (DP-GLD) we show that the excess empirical (or population) risk is upper bounded by $\tilde{O}\left(\frac{d \log(1/\delta)}{\log n \cdot \epsilon^2}\right)$.
The proof is based on recent developments in Bayesian learning and the analysis of GLD. By using a finer analysis of the time-average error of some SDE, we show the following.
Theorem 2
For the excess empirical risk, there is an $(\epsilon, \delta)$-DP algorithm which satisfies
$$\lim_{T \to \infty} \mathrm{Err}^r_D(w_T) \le \tilde{O}\left(\frac{C_0(d) \log(1/\delta)}{n^{\tau} \epsilon^{\tau}}\right),$$
where $C_0(d)$ is a function of $d$ and $0 < \tau < 1$ is some constant.
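For readers unfamiliar with Gradient Langevin Dynamics, the following is a minimal sketch of the standard (non-private) GLD update $w_{k+1} = w_k - \eta \nabla F(w_k) + \sqrt{2\eta/\beta}\,\xi_k$ with $\xi_k \sim N(0, I)$. It does not reproduce how DP-GLD calibrates the noise to obtain $(\epsilon, \delta)$-DP; the step size `step`, the inverse temperature `inv_temp`, and the double-well objective are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def gld(
    grad: Callable[[Sequence[float]], List[float]],
    w0: Sequence[float],
    step: float,
    inv_temp: float,
    iters: int,
    rng: random.Random,
) -> List[float]:
    """Run gradient Langevin dynamics and return the last iterate.

    Each step is a gradient step plus Gaussian noise with standard
    deviation sqrt(2 * step / inv_temp) per coordinate.
    """
    w = list(w0)
    noise_std = math.sqrt(2.0 * step / inv_temp)
    for _ in range(iters):
        g = grad(w)
        w = [wi - step * gi + noise_std * rng.gauss(0.0, 1.0) for wi, gi in zip(w, g)]
    return w


def double_well_grad(w: Sequence[float]) -> List[float]:
    """Gradient of the non-convex toy objective F(w) = sum_i (w_i^2 - 1)^2 / 4."""
    return [wi ** 3 - wi for wi in w]


# Start at the stationary point w = 0 (a local maximum of F); the injected
# noise lets the iterate leave it, which plain gradient descent would not.
w_final = gld(double_well_grad, [0.0, 0.0], step=0.01, inv_temp=50.0,
              iters=2000, rng=random.Random(0))
```

With a very large `inv_temp` the noise term becomes negligible and the update reduces to ordinary gradient descent, which is one way to sanity-check the sketch.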
Result 2
Are these bounds tight? Based on the exponential mechanism, we have:
Empirical Risk
For any $\beta < 1$, there is an $\epsilon$-differentially private algorithm whose output $w^{\text{priv}}$ induces an excess empirical risk $\mathrm{Err}^r_D(w^{\text{priv}}) \le \tilde{O}\left(\frac{d}{n\epsilon}\right)$ with probability at least $1 - \beta$.
Population Risk
For generalized linear models and robust regression (whose loss functions are $\ell(w; x, y) = (\sigma(\langle w, x \rangle) - y)^2$ and $\ell(w; x, y) = \Phi(\langle w, x \rangle - y)$, respectively), under some reasonable assumptions, there is an $(\epsilon, \delta)$-DP algorithm whose excess population risk is upper bounded by
$$\mathrm{Err}_P(w^{\text{priv}}) \le O\left(\frac{\sqrt[4]{d \ln(1/\delta)}}{\sqrt{n\epsilon}}\right).$$
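The exponential mechanism mentioned above can be sketched on a finite candidate set as follows. This is an illustration under stated assumptions, not the construction behind the bound: the candidates are a hypothetical one-dimensional grid, the utility is the negative sum of per-example losses, $\sigma$ is taken to be the logistic sigmoid, and labels lie in $[0, 1]$ so that each loss lies in $[0, 1]$ and replacing one example changes the utility by at most 1.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sigmoid_squared_loss(w: float, x: float, y: float) -> float:
    """(sigmoid(w * x) - y)^2 for scalar w and x; lies in [0, 1] when y is in [0, 1]."""
    return (1.0 / (1.0 + math.exp(-w * x)) - y) ** 2


def make_utility(data: Sequence[Tuple[float, float]]) -> Callable[[float], float]:
    """Utility of a candidate w: negative sum of losses over the dataset."""
    return lambda w: -sum(sigmoid_squared_loss(w, x, y) for x, y in data)


def selection_probs(candidates: Sequence[float], utility: Callable[[float], float],
                    sensitivity: float, eps: float) -> List[float]:
    """Exponential mechanism: Pr[w] proportional to exp(eps * u(w) / (2 * sensitivity))."""
    scores = [eps * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [wt / total for wt in weights]


def exponential_mechanism(candidates: Sequence[float], utility: Callable[[float], float],
                          sensitivity: float, eps: float, rng: random.Random) -> float:
    """Sample one candidate according to the exponential-mechanism distribution."""
    probs = selection_probs(candidates, utility, sensitivity, eps)
    return rng.choices(list(candidates), weights=probs, k=1)[0]


# Hypothetical grid of candidate parameters and a toy dataset.
candidates = [-2.0 + 0.5 * i for i in range(9)]
toy_data = [(1.0, 1.0), (1.0, 1.0), (-1.0, 0.0)]
w_priv = exponential_mechanism(candidates, make_utility(toy_data),
                               sensitivity=1.0, eps=0.5, rng=random.Random(0))
```

Because the regularizer $r(w)$ does not depend on the data, adding it to the utility would not change the sensitivity; it is omitted here to keep the sketch short.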
Finding Approximate Local Minimum Privately
Finding the global minimum of a non-convex function is challenging!
Recent research on deep learning and other non-convex problems shows that local minima, rather than arbitrary critical points, are sufficient.
However, finding a local minimum is still NP-hard.
Fortunately, many non-convex functions are strict saddle. Thus, it is sufficient to find a second-order stationary point (or approximate local minimum).
Definition
$w$ is an $\alpha$-second-order stationary point ($\alpha$-SOSP) if
$$\|\nabla F(w)\|_2 \le \alpha \quad \text{and} \quad \lambda_{\min}(\nabla^2 F(w)) \ge -\sqrt{\rho\alpha}. \quad (1)$$
Can we find an approximate local minimum that escapes saddle points while keeping the algorithm $(\epsilon, \delta)$-differentially private?
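The $\alpha$-SOSP condition (1) can be checked directly when the gradient and Hessian at a point are available. The sketch below does this in two dimensions using the closed-form smallest eigenvalue of a symmetric $2 \times 2$ matrix. The parameter $\rho$ is treated as a given constant (it is commonly the Hessian Lipschitz constant, which is an assumption here since the slide does not define it), and the two test functions are illustrative.

```python
import math
from typing import Sequence


def min_eigenvalue_2x2(hessian: Sequence[Sequence[float]]) -> float:
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b = hessian[0]
    _, d = hessian[1]
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean - radius


def is_sosp(grad: Sequence[float], hessian: Sequence[Sequence[float]],
            alpha: float, rho: float) -> bool:
    """Check ||grad||_2 <= alpha and lambda_min(hessian) >= -sqrt(rho * alpha)."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return grad_norm <= alpha and min_eigenvalue_2x2(hessian) >= -math.sqrt(rho * alpha)


# F(w) = w1^2 - w2^2 at the origin: zero gradient, but a strict saddle.
saddle_is_sosp = is_sosp([0.0, 0.0], [[2.0, 0.0], [0.0, -2.0]], alpha=0.01, rho=1.0)

# F(w) = w1^2 + w2^2 at the origin: zero gradient and positive curvature.
bowl_is_sosp = is_sosp([0.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], alpha=0.01, rho=1.0)
```

The saddle has a vanishing gradient, so a first-order criterion alone would accept it; the eigenvalue condition is what rejects it.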
Result 3
On one hand, Ge et al. (2015) proposed an algorithm, noisy Stochastic Gradient Descent, to find approximate local minima.
On the other hand, in the DP community, one popular method for ERM is DP-SGD, which adds Gaussian noise in each iteration.
Using DP-GD, we can show:
Theorem 4
If the data size $n$ is large enough such that
$$n \ge \tilde{\Omega}\left(\frac{\sqrt{d \log(1/\delta)}\,\log(1/\zeta)}{\epsilon \alpha^2}\right), \quad (2)$$
then with probability $1 - \zeta$, one of the outputs is an $\alpha$-SOSP of the empirical risk $\hat{L}(\cdot, D)$.
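The idea of adding Gaussian noise in each gradient step can be sketched as below. This is a generic noisy full-batch gradient descent, not the exact DP-GD algorithm analyzed in Theorem 4: the per-example clipping, the parameters `clip_norm` and `noise_std`, and the choice of the logistic sigmoid for $\sigma$ are assumptions, and the noise level is not calibrated to any particular $(\epsilon, \delta)$. All iterates are returned, mirroring the statement that one of the outputs is an $\alpha$-SOSP.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], float]


def clip(g: Sequence[float], max_norm: float) -> List[float]:
    """Scale g so that its l2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def sigmoid_sq_grad(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of (sigmoid(<w, x>) - y)^2 with respect to w."""
    s = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    coeff = 2.0 * (s - y) * s * (1.0 - s)
    return [coeff * xi for xi in x]


def noisy_gd(
    per_example_grad: Callable[[Sequence[float], Sequence[float], float], List[float]],
    data: Sequence[Example],
    w0: Sequence[float],
    step: float,
    iters: int,
    clip_norm: float,
    noise_std: float,
    rng: random.Random,
) -> List[List[float]]:
    """Full-batch gradient descent with clipped gradients and Gaussian noise."""
    w = list(w0)
    iterates = [list(w)]
    for _ in range(iters):
        total = [0.0] * len(w)
        for x, y in data:
            g = clip(per_example_grad(w, x, y), clip_norm)
            total = [t + gi for t, gi in zip(total, g)]
        # Add noise to the summed gradient, then average over the n examples.
        noisy = [(t + noise_std * rng.gauss(0.0, 1.0)) / len(data) for t in total]
        w = [wi - step * gi for wi, gi in zip(w, noisy)]
        iterates.append(list(w))
    return iterates


toy_data: List[Example] = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0), ((1.0, 1.0), 1.0)]
iterates = noisy_gd(sigmoid_sq_grad, toy_data, [0.0, 0.0], step=0.5, iters=100,
                    clip_norm=1.0, noise_std=0.1, rng=random.Random(0))
```

Clipping bounds the influence of any single example on the summed gradient, which is what makes it possible to choose a noise level for a privacy guarantee; an $L$-Lipschitz loss gives the same bound without clipping.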
Thank you!