On the Local Minima of the Empirical Risk



  1. On the Local Minima of the Empirical Risk. Chi Jin*¹, Lydia T. Liu*¹, Rong Ge², Michael I. Jordan¹. ¹EECS, University of California, Berkeley. ²Duke University.

  4. Overview: Nonconvex Optimization.
     - Gradient descent (GD) converges to stationary points: local maxima, saddle points, or local minima.
     - Perturbed GD [Jin et al. 2017] efficiently escapes local maxima and saddle points.
     - How do we deal with spurious local minima?
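The difference between plain GD and perturbed GD on the overview slide can be illustrated on a toy problem. The sketch below is a simplified illustration in the spirit of perturbed GD, not the exact procedure of Jin et al. [2017]: the objective `F`, the thresholds, the perturbation radius, and all function names are assumptions chosen for this example. Plain GD started on the x-axis slides into the saddle at the origin, while the perturbed variant adds a small random kick whenever the gradient is tiny and stops once a kick no longer yields a meaningful decrease.

```python
import math
import random


def F(p):
    """Toy objective (assumed): saddle at (0, 0), local minima at (0, +/-1)."""
    x, y = p
    return 0.5 * x ** 2 - 0.5 * y ** 2 + 0.25 * y ** 4


def grad_F(p):
    """Exact gradient of the toy objective F."""
    x, y = p
    return (x, -y + y ** 3)


def gradient_descent(p, step=0.1, iters=500):
    """Plain GD: from a point with y == 0 it never leaves the x-axis."""
    for _ in range(iters):
        g = grad_F(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p


def perturbed_gradient_descent(p, step=0.1, max_iters=2000, g_thresh=1e-3,
                               radius=1e-2, wait=100, f_thresh=1e-4, seed=0):
    """GD plus a random kick whenever the gradient is small (illustrative)."""
    rng = random.Random(seed)
    kick_iter, kick_point = None, None
    for t in range(max_iters):
        # `wait` steps after a kick: if F barely decreased, the pre-kick
        # point is treated as an approximate local minimum and returned.
        if kick_iter is not None and t - kick_iter == wait:
            if F(kick_point) - F(p) < f_thresh:
                return kick_point
            kick_iter = None
        g = grad_F(p)
        if kick_iter is None and math.hypot(*g) <= g_thresh:
            # Near a stationary point: perturb uniformly within a small disc.
            kick_iter, kick_point = t, p
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            p = (p[0] + r * math.cos(angle), p[1] + r * math.sin(angle))
            g = grad_F(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p
```

Calling `gradient_descent((1.0, 0.0))` ends at the saddle, whereas `perturbed_gradient_descent((1.0, 0.0))` should end near one of the two local minima. Neither routine does anything about spurious local minima, which is the question the slide raises.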

  6. Local Minima. In general, finding global minima is NP-hard. Avoiding "shallow" local minima. Goal: find approximate local minima of a smooth nonconvex function $F$, given access only to an erroneous version $f$ where $\sup_x |F(x) - f(x)| \le \nu$.

  8. Application: Statistical Learning. Minimize the population risk $R$ while having access only to the empirical risk $\hat{R}_n$:
     $$R(\theta) = \mathbb{E}_{z \sim \mathcal{D}}[L(\theta; z)], \qquad \hat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} L(\theta; z_i).$$
     Uniform convergence guarantees $\sup_\theta |R(\theta) - \hat{R}_n(\theta)| \le O(1/\sqrt{n})$.
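The gap between population and empirical risk can be checked numerically on a toy problem. The sketch below is illustrative only: the squared loss, the Uniform(0, 1) data distribution, the grid over $\theta \in [-1, 1]$, and all function names are assumptions, chosen because the population risk then has the closed form $(\theta - 1/2)^2 + 1/12$. The maximum over the grid stands in for the supremum over $\theta$.

```python
import random


def loss(theta, z):
    """Squared loss L(theta; z) for a scalar parameter (assumed example)."""
    return (theta - z) ** 2


def population_risk(theta):
    """R(theta) = E[(theta - z)^2] for z ~ Uniform(0, 1), in closed form."""
    return (theta - 0.5) ** 2 + 1.0 / 12.0


def empirical_risk(theta, sample):
    """R_hat_n(theta): average loss over the n sampled points."""
    return sum(loss(theta, z) for z in sample) / len(sample)


def sup_deviation(n, seed=0, grid_size=201):
    """Largest |R(theta) - R_hat_n(theta)| over a grid on [-1, 1]."""
    rng = random.Random(seed)
    sample = [rng.random() for _ in range(n)]
    thetas = [-1.0 + 2.0 * k / (grid_size - 1) for k in range(grid_size)]
    return max(abs(population_risk(t) - empirical_risk(t, sample))
               for t in thetas)
```

Evaluating `sup_deviation(n)` for increasing `n` (for example 100, 1,000, 10,000) should show the deviation shrinking roughly like $1/\sqrt{n}$, up to sampling noise. In the notation of the slides, this deviation plays the role of the error level $\nu$.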

  11. Results. Goal: find $\epsilon$-approximate local minima of $F$ in polynomial time. Central questions:
     1. What algorithm can achieve this?
     2. How much error $\nu$ can be tolerated?
     Zhang et al. [2017]: Stochastic Gradient Langevin Dynamics (SGLD) if $\nu \le \epsilon^{2} / d^{8}$.
     This work: Perturbed SGD on a "smoothed" version of $f$ if $\nu \le \epsilon^{1.5} / d$.
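The slide does not spell out how $f$ is smoothed. A common choice, assumed here, is Gaussian smoothing, $\tilde{f}_\sigma(x) = \mathbb{E}_z[f(x + z)]$ with $z \sim \mathcal{N}(0, \sigma^2 I)$, whose gradient satisfies $\nabla \tilde{f}_\sigma(x) = \mathbb{E}_z[z\,(f(x + z) - f(x))] / \sigma^2$, so it can be estimated from function values alone. The sketch below illustrates that estimator on a toy pair $(F, f)$; the quadratic `F`, the high-frequency perturbation, the constants `NU` and `FREQ`, and all function names are hypothetical and not taken from the paper.

```python
import math
import random

NU = 0.01      # assumed error level: |F(x) - f(x)| <= NU everywhere
FREQ = 1000.0  # assumed frequency of the illustrative perturbation


def F(x):
    """Smooth target function (assumed): F(x) = ||x||^2 / 2, gradient x."""
    return 0.5 * sum(xi * xi for xi in x)


def f(x):
    """Erroneous version of F: a tiny but rapidly oscillating error term."""
    return F(x) + NU * math.sin(FREQ * x[0])


def smoothed_grad_estimate(func, x, sigma, num_samples, rng):
    """Monte Carlo estimate of the gradient of the Gaussian smoothing of func.

    Averages z * (func(x + z) - func(x)) / sigma^2 over z ~ N(0, sigma^2 I),
    using only function evaluations.
    """
    d = len(x)
    fx = func(x)
    total = [0.0] * d
    for _ in range(num_samples):
        z = [rng.gauss(0.0, sigma) for _ in range(d)]
        diff = func([xi + zi for xi, zi in zip(x, z)]) - fx
        for i in range(d):
            total[i] += z[i] * diff / sigma ** 2
    return [t / num_samples for t in total]
```

For example, `smoothed_grad_estimate(f, [1.0, 1.0], sigma=0.1, num_samples=20000, rng=random.Random(0))` should land close to the gradient of `F` at that point, `[1.0, 1.0]`, even though the pointwise derivative of `f` along the first coordinate is dominated by the oscillation (amplitude `NU * FREQ = 10`). Feeding such estimates into a perturbed SGD loop is the general shape of the approach the slide describes; the step sizes and the choice of $\sigma$ needed for the stated guarantee are not given on the slide.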

  14. Almost Sharp Guarantees. Are there better polynomial-time algorithms that tolerate larger error? No! The paper gives a complete characterization of the error $\nu$ versus the accuracy $\epsilon$ and dimension $d$. Poster: Wed 5-7 PM, #43. Thanks!
