Differentially Private Empirical Risk Minimization with Non-convex Loss Functions


  1. Differentially Private Empirical Risk Minimization with Non-convex Loss Functions. Di Wang, Changyou Chen and Jinhui Xu, State University of New York at Buffalo. International Conference on Machine Learning (ICML) 2019.

  2. Outline: Introduction (Problem Description), Result 1, Result 2, Result 3.

  3. Empirical Risk Minimization (ERM). Given: a dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each (x_i, y_i) ∈ ℝ^d × ℝ is drawn from a distribution P. Regularization r(·): ℝ^d → ℝ; we use ℓ2 regularization with r(w) = (λ/2)‖w‖₂². For a loss function ℓ, the (regularized) empirical risk is L̂_r(w; D) = (1/n) Σ_{i=1}^n ℓ(w; x_i, y_i) + r(w), and the (regularized) population risk is L_P^r(w) = E_{(x,y)∼P}[ℓ(w; x, y)] + r(w). Goal: find w that minimizes the empirical or population risk.
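To make the objective concrete, here is a minimal Python sketch of the regularized empirical risk above, using squared loss as a stand-in for ℓ; the function names, the synthetic data, and the value of λ are illustrative choices, not part of the paper.

```python
import numpy as np

def regularized_empirical_risk(w, X, y, loss, lam=0.1):
    """Compute (1/n) * sum_i loss(w; x_i, y_i) + (lam/2) * ||w||_2^2."""
    data_term = np.mean([loss(w, X[i], y[i]) for i in range(X.shape[0])])
    return data_term + 0.5 * lam * np.dot(w, w)

# Toy example with squared loss on synthetic regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=100)
sq_loss = lambda w, x, yi: 0.5 * (np.dot(w, x) - yi) ** 2
print(regularized_empirical_risk(np.zeros(5), X, y, sq_loss))
```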

  4. (ε, δ)-Differential Privacy (DP) [Dwork et al., 2006]. Two datasets D and D′ are neighbors if they differ in only one entry, denoted D ∼ D′. A randomized algorithm A is (ε, δ)-differentially private if for all neighboring datasets D, D′ and for all events S in the output space of A, we have Pr(A(D) ∈ S) ≤ e^ε · Pr(A(D′) ∈ S) + δ.
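As a concrete instance of the definition, the sketch below releases the mean of bounded data via the standard Gaussian mechanism, whose calibration σ = Δ₂·√(2 ln(1.25/δ))/ε is known to give (ε, δ)-DP for ε ≤ 1; the dataset, bounds, and parameter values here are illustrative assumptions.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, eps, delta, rng):
    """Release value + Gaussian noise calibrated to its L2 sensitivity.

    sigma = sensitivity * sqrt(2 * ln(1.25/delta)) / eps gives
    (eps, delta)-DP for eps <= 1 (classical Gaussian-mechanism bound).
    """
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return value + rng.normal(scale=sigma, size=np.shape(value))

rng = np.random.default_rng(0)
data = rng.uniform(0, 1, size=200)     # entries assumed bounded in [0, 1]
true_mean = data.mean()                # changing one entry moves it by <= 1/n
private_mean = gaussian_mechanism(true_mean, l2_sensitivity=1 / len(data),
                                  eps=0.5, delta=1e-5, rng=rng)
print(true_mean, private_mean)
```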

  5. DP-ERM. Determine a sample complexity n = n(1/ε, 1/δ, d, 1/α) such that there is an (ε, δ)-DP algorithm whose output w_priv achieves an α-error in the expected excess empirical risk, Err_D^r(w_priv) = E[L̂_r(w_priv; D)] − min_{w ∈ ℝ^d} L̂_r(w; D) ≤ α, or in the expected excess population risk, Err_P^r(w_priv) = E[L_P^r(w_priv)] − min_{w ∈ ℝ^d} L_P^r(w) ≤ α.

  6. Motivation. Previous work on DP-ERM mainly focuses on convex loss functions. For non-convex loss functions, [Zhang et al., 2017] and [Wang and Xu, 2019] studied the problem and used, as the error measure, the ℓ2 gradient norm at the private estimator, i.e., ‖∇L̂_r(w_priv; D)‖₂ and E_P‖∇ℓ(w_priv; x, y)‖₂. Main question: can the excess empirical (population) risk be used to measure the error for non-convex loss functions in the differential privacy model?

  7. Outline (up next: Result 1): Introduction (Problem Description), Result 1, Result 2, Result 3.

  8. Result 1. Theorem 1: If the loss function is L-Lipschitz, twice differentiable and M-smooth, then by using a private version of Gradient Langevin Dynamics (DP-GLD) the excess empirical (or population) risk is upper bounded by Õ(d log(1/δ) / (ε² log n)). The proof is based on recent developments in Bayesian learning and the analysis of GLD. Using a finer analysis of the time-average error of the underlying SDE, we show the following. Theorem 2: For the excess empirical risk, there is an (ε, δ)-DP algorithm which satisfies lim_{T→∞} Err_D^r(w_T) ≤ Õ( C₀(d) log(1/δ) / (n^τ ε^τ) ), where C₀(d) is a function of d and 0 < τ < 1 is some constant.
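The slide does not spell out DP-GLD, so the following is only a schematic of a gradient Langevin loop with per-step Gaussian privatization; the step size η, inverse temperature β, iteration count T, and the single-step noise calibration (which ignores composition of privacy cost across the T steps) are all placeholder assumptions, not the paper's tuning.

```python
import numpy as np

def dp_gld(grad_fn, w0, n, eps, delta, L, eta=0.01, beta=1000.0, T=500, rng=None):
    """Schematic DP Gradient Langevin Dynamics (placeholder tuning).

    Each step combines a privatized full gradient (Gaussian noise scaled to
    the averaged gradient's sensitivity 2L/n for an L-Lipschitz loss) with
    the usual Langevin exploration noise sqrt(2*eta/beta) * N(0, I).
    """
    rng = rng or np.random.default_rng(0)
    sigma = (2.0 * L / n) * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    w = np.array(w0, dtype=float)
    for _ in range(T):
        g = grad_fn(w) + rng.normal(scale=sigma, size=w.shape)  # privatized gradient
        w = w - eta * g + np.sqrt(2.0 * eta / beta) * rng.normal(size=w.shape)
    return w
```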

  9. Outline (up next: Result 2): Introduction (Problem Description), Result 1, Result 2, Result 3.

  10. Result 2. Are these bounds tight? Based on the exponential mechanism, we have the following. Empirical risk: for any β < 1, there is an ε-differentially private algorithm whose output w_priv achieves excess empirical risk Err_D^r(w_priv) ≤ Õ(d/(nε)) with probability at least 1 − β. Population risk: for generalized linear models and robust regression (whose loss functions are ℓ(w; x, y) = (σ(⟨w, x⟩) − y)² and ℓ(w; x, y) = Φ(⟨w, x⟩ − y), respectively), under some reasonable assumptions there is an (ε, δ)-DP algorithm whose excess population risk is upper bounded by Err_P(w_priv) ≤ O( (d ln(1/δ))^{1/4} / √(nε) ).
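Result 2's upper bound rests on the exponential mechanism; below is a generic sketch of that mechanism over a finite candidate set (e.g. a discretization of the parameter space), not the paper's exact construction. The sensitivity value (L/n for an L-Lipschitz loss averaged over n points), the grid, and the toy risk are assumptions for illustration.

```python
import numpy as np

def exponential_mechanism(candidates, risk_fn, eps, sensitivity, rng):
    """Pick w with probability proportional to exp(-eps * risk(w) / (2 * sensitivity)).

    Lower empirical risk means a higher selection probability; sensitivity is
    the maximum change of risk_fn over neighboring datasets.
    """
    scores = np.array([risk_fn(w) for w in candidates])
    logits = -eps * scores / (2.0 * sensitivity)
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy use: privately select from a 1-D grid to minimize a stand-in risk.
rng = np.random.default_rng(0)
grid = np.linspace(-2, 2, 81)                    # hypothetical discretized domain
risk = lambda w: (w - 0.7) ** 2
w_priv = exponential_mechanism(grid, risk, eps=1.0, sensitivity=1.0 / 100, rng=rng)
```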

  11. Outline (up next: Result 3): Introduction (Problem Description), Result 1, Result 2, Result 3.

  12. Finding an Approximate Local Minimum Privately. Finding the global minimum of a non-convex function is challenging! Recent research on deep learning and other non-convex problems shows that local minima, rather than arbitrary critical points, are sufficient. But finding local minima is still NP-hard. Fortunately, many non-convex functions are strict saddle, so it suffices to find a second-order stationary point (an approximate local minimum). Definition: w is an α-second-order stationary point (α-SOSP) if ‖∇F(w)‖₂ ≤ α and λ_min(∇²F(w)) ≥ −√(ρα), where ρ is the Lipschitz constant of the Hessian. Can we find an approximate local minimum that escapes saddle points while keeping the algorithm (ε, δ)-differentially private?
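The α-SOSP condition is easy to check numerically when the gradient and Hessian are available; the sketch below does exactly that on a toy saddle. The values of α and ρ are arbitrary illustrative choices.

```python
import numpy as np

def is_alpha_sosp(grad, hess, w, alpha, rho):
    """Check ||grad F(w)||_2 <= alpha and lambda_min(hess F(w)) >= -sqrt(rho*alpha)."""
    grad_ok = np.linalg.norm(grad(w)) <= alpha
    lam_min = np.linalg.eigvalsh(hess(w))[0]   # smallest eigenvalue
    return grad_ok and lam_min >= -np.sqrt(rho * alpha)

# Toy saddle: F(w) = w0^2 - w1^2 has a saddle at the origin, which fails the test
# because lambda_min = -2 < -sqrt(rho * alpha).
grad = lambda w: np.array([2.0 * w[0], -2.0 * w[1]])
hess = lambda w: np.diag([2.0, -2.0])
print(is_alpha_sosp(grad, hess, np.zeros(2), alpha=0.1, rho=1.0))  # False
```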

  13. Result 3. On one hand, (Ge et al., 2015) proposed an algorithm, noisy Stochastic Gradient Descent, to find approximate local minima. On the other hand, in the DP community a popular method for ERM is DP-SGD, which adds Gaussian noise in each iteration. Using DP-GD, we can show the following. Theorem 4: If the data size n is large enough that n ≥ Ω̃( d √(log(1/δ)) log(1/ζ) / (ε α²) ), then with probability 1 − ζ, one of the outputs is an α-SOSP of the empirical risk L̂(·; D).
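The slide does not give the algorithm's details either, so the following is only a schematic DP gradient descent loop that returns every iterate (Theorem 4 guarantees that one of them is an α-SOSP with high probability); the √T-style composition folded into the noise scale and all parameter values are simplifying assumptions, not the paper's calibration.

```python
import numpy as np

def dp_gd(grad_fn, w0, n, eps, delta, L, T=100, eta=0.1, rng=None):
    """Schematic DP gradient descent: noisy full gradient at each step.

    Per-step sensitivity of the averaged gradient is 2L/n for an L-Lipschitz
    loss; sigma uses a simple sqrt(T) advanced-composition-style scaling,
    intended only to convey the shape of the algorithm.
    """
    rng = rng or np.random.default_rng(0)
    sigma = (2.0 * L / n) * np.sqrt(8.0 * T * np.log(1.0 / delta)) / eps
    w = np.array(w0, dtype=float)
    iterates = []
    for _ in range(T):
        w = w - eta * (grad_fn(w) + rng.normal(scale=sigma, size=w.shape))
        iterates.append(w.copy())   # one of these is an alpha-SOSP w.h.p.
    return iterates
```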

  14. Thank you!
