The landscape of non-convex losses for statistical learning problems

  1. The landscape of non-convex losses for statistical learning problems Song Mei Stanford University October 19, 2017 Song Mei (Stanford University) The landscape of non-convex optimization October 19, 2017 1 / 32

  2. Deep learning

  4. Convolutional Neural Network

  5. Non-convex optimization

  6. Why do non-convex neural networks perform well?

  8. Why does some non-convex optimization perform well?
      ◮ Stochastic gradient descent escapes bad local minima.
      ◮ Good initialization escapes bad local minima.
      ◮ Globally there are fewer bad local minima.
      ◮ ...

  9. Non-convex optimization: analysis of global geometry
      Number and locations of saddle points and local minima.

  10. Let’s do it! The objective function
      $$\min_{W_i} \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(W_k \cdots \sigma(W_2 \cdot \sigma(W_1 x_i))) \}^2$$

  11. Let’s do it! The objective function
      $$\min_{W_i} \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(W_2 \cdot \sigma(W_1 x_i)) \}^2$$

  12. Let’s do it! The objective function
      $$\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(\langle \theta, x_i \rangle) \}^2$$
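
The three objectives above differ only in network depth. A minimal sketch of how they can be evaluated in plain Python follows. The function names `forward` and `objective`, the toy data, and the use of `math.tanh` as a stand-in activation are all assumptions made for illustration; at this point the slides do not fix a particular activation. The last weight matrix is assumed to have a single row so that the network output is a scalar.

```python
import math

def forward(weights, x, act):
    """Compute act(W_k ... act(W_2 act(W_1 x))) for a list of weight matrices."""
    h = x
    for W in weights:
        # Matrix-vector product followed by the elementwise activation.
        h = [act(sum(w * v for w, v in zip(row, h))) for row in W]
    return h

def objective(weights, xs, ys, act):
    """Mean squared residual (1/n) * sum_i (y_i - network(x_i))^2, scalar output."""
    n = len(xs)
    return sum((y - forward(weights, x, act)[0]) ** 2 for x, y in zip(xs, ys)) / n

# Toy data, made up for illustration only.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [0.0, 1.0, 1.0]
theta = [0.5, -0.25]

# k = 1: a single 1 x d weight matrix gives the one-node objective in theta.
one_node = objective([[theta]], xs, ys, math.tanh)

# k = 2: an identity first layer followed by the same output row.
two_layer = objective([[[1.0, 0.0], [0.0, 1.0]], [theta]], xs, ys, math.tanh)
```

With one weight matrix consisting of the single row `theta`, the general objective reduces to the one-node objective of slide 12.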

  13. Binary linear classification
      The model: $z_i = (x_i, y_i)$, $x_i \in \mathbb{R}^d$, $y_i \in \{0, 1\}$.

  14. One node neural network
      The model: $z_i = (x_i, y_i)$, $x_i \in \mathbb{R}^d$, $y_i \in \{0, 1\}$.
      ◮ Convex logit loss ($\ell_c$ is convex in $\theta$):
        $$\ell_c(\theta; z) = y \langle x, \theta \rangle - \log\{ 1 + \exp(\langle x, \theta \rangle) \}.$$
      ◮ Non-convex loss ($\ell$ is not convex in $\theta$):
        $$\ell(\theta; z) = \{ y - \sigma(\langle x, \theta \rangle) \}^2, \quad \text{where } \sigma(t) = 1 / (1 + \exp(t)).$$
      ◮ Empirical risk:
        $$\hat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; z_i) = \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(\langle \theta, x_i \rangle) \}^2.$$
      ◮ Empirical risk minimizer:
        $$\hat{\theta}_n = \arg\min_{\theta \in \mathsf{B}^d(R)} \hat{R}_n(\theta).$$
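
The empirical risk of the one node network and its gradient can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the author's code: the helper names and the toy data are made up, the activation is taken exactly as written on the slide, $\sigma(t) = 1/(1 + \exp(t))$, and the ball constraint on $\theta$ is not enforced. For this $\sigma$ the derivative is $\sigma'(t) = -\sigma(t)(1 - \sigma(t))$, so the chain rule gives $\nabla \hat{R}_n(\theta) = \frac{2}{n} \sum_i (y_i - \sigma_i)\, \sigma_i (1 - \sigma_i)\, x_i$ with $\sigma_i = \sigma(\langle \theta, x_i \rangle)$.

```python
import math

def sigma(t):
    """sigma(t) = 1 / (1 + exp(t)) as written on the slide, evaluated without overflow."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))

def inner(theta, x):
    """Euclidean inner product <theta, x>."""
    return sum(a * b for a, b in zip(theta, x))

def empirical_risk(theta, xs, ys):
    """R_n(theta) = (1/n) * sum_i (y_i - sigma(<theta, x_i>))^2."""
    return sum((y - sigma(inner(theta, x))) ** 2 for x, y in zip(xs, ys)) / len(xs)

def risk_gradient(theta, xs, ys):
    """Gradient of R_n, using sigma'(t) = -sigma(t) * (1 - sigma(t)) for this sigma."""
    n = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        s = sigma(inner(theta, x))
        coeff = 2.0 * (y - s) * s * (1.0 - s) / n
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return grad

# Toy data, made up for illustration only.
xs = [[1.0, 2.0], [-1.0, 0.5], [0.3, -0.7]]
ys = [1, 0, 1]
theta = [0.2, -0.4]

risk = empirical_risk(theta, xs, ys)
grad = risk_gradient(theta, xs, ys)
```

Because $y_i \in \{0, 1\}$ and $\sigma$ takes values in $(0, 1)$, every squared residual lies in $[0, 1]$, so the risk is bounded even though it is not convex in $\theta$.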

  20. A negative theoretical result
      Theorem (Auer et al., 1996). For the one node neural network, $\forall n, d > 0$, there exists a dataset $(x_i, y_i)_{i=1}^{n}$ such that the empirical risk $\hat{R}_n(\theta)$ has $\lfloor n/d \rfloor^d$ distinct local minima.
      The landscape of $\hat{R}_n(\theta)$ is very rough.
      Is this the end of the world of deep learning?
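
To get a sense of scale for the count in the theorem, the quantity $\lfloor n/d \rfloor^d$ can be evaluated directly. The function name and the values of `n` and `d` below are hypothetical and chosen only for illustration. The theorem asserts that some worst-case dataset of that size exists; it says nothing about any particular dataset.

```python
def local_minima_count(n, d):
    """Evaluate floor(n / d) ** d, the number of local minima in the theorem's construction."""
    return (n // d) ** d

# Hypothetical sizes: n = 1000 samples in dimension d = 10.
count = local_minima_count(1000, 10)  # floor(1000 / 10) ** 10 = 100 ** 10
```

For fixed $n/d$ the count grows exponentially in the dimension $d$, which is why the worst-case landscape is described as very rough.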

  21. Real data experiment
      ◮ The "Australian" data set from Statlog: $d = 11$, $n = 683$.
      ◮ Random initialization $\theta(0) \sim \mathcal{N}(0, I_d)$.
      ◮ Run gradient descent and track the path $\theta(k)$.
      ◮ Generate multiple paths with independent initializations.
      ◮ Plot standard deviation over paths $\mathrm{std}(\theta(k))$ versus $k$.
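
The experimental protocol above can be sketched as follows. This is a minimal illustration under several assumptions, not a reproduction of the slide's experiment: the Statlog data are replaced by synthetic Gaussian features with labels drawn from the one node model, the sizes, step size, number of steps, and number of paths are arbitrary, and $\mathrm{std}(\theta(k))$ is taken to mean the root-mean-square distance of $\theta(k)$ from its average over paths. The activation is $\sigma(t) = 1/(1 + \exp(t))$ as written on the earlier slide.

```python
import math
import random

def sigma(t):
    """sigma(t) = 1 / (1 + exp(t)), evaluated without overflow."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))

def inner(a, b):
    return sum(u * v for u, v in zip(a, b))

def gradient(theta, xs, ys):
    """Gradient of (1/n) * sum_i (y_i - sigma(<theta, x_i>))^2."""
    n = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        s = sigma(inner(theta, x))
        coeff = 2.0 * (y - s) * s * (1.0 - s) / n
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return grad

def gd_path(theta0, xs, ys, step, num_steps):
    """Run plain gradient descent and return the whole path theta(0), ..., theta(num_steps)."""
    theta = list(theta0)
    path = [theta]
    for _ in range(num_steps):
        g = gradient(theta, xs, ys)
        theta = [t - step * gj for t, gj in zip(theta, g)]
        path.append(theta)
    return path

def spread(paths, k):
    """Root-mean-square distance of theta(k) from its mean over paths (assumed meaning of std)."""
    thetas = [p[k] for p in paths]
    d = len(thetas[0])
    mean = [sum(t[j] for t in thetas) / len(thetas) for j in range(d)]
    sq = [sum((t[j] - mean[j]) ** 2 for j in range(d)) for t in thetas]
    return math.sqrt(sum(sq) / len(thetas))

rng = random.Random(0)
d, n, num_paths, num_steps = 3, 200, 5, 300   # hypothetical sizes
theta_true = [1.0, -2.0, 0.5]                 # hypothetical ground truth

# Synthetic stand-in for the real data set.
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
ys = [1 if rng.random() < sigma(inner(theta_true, x)) else 0 for x in xs]

# Independent N(0, I_d) initializations, one gradient descent path each.
paths = [
    gd_path([rng.gauss(0.0, 1.0) for _ in range(d)], xs, ys, step=1.0, num_steps=num_steps)
    for _ in range(num_paths)
]
std_curve = [spread(paths, k) for k in range(num_steps + 1)]
```

Plotting `std_curve` against the iteration index gives the kind of curve the last bullet describes: a spread that shrinks toward zero would indicate that independent initializations reach the same point.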
