A Mean Field View of the Landscape of Two-Layers Neural Networks


  1. A Mean Field View of the Landscape of Two-Layers Neural Networks
     Song Mei, Stanford University
     November 14, 2018
     Joint work with Andrea Montanari and Phan-Minh Nguyen
     Footer: Song Mei (Stanford University), Mean Field Dynamics for Neural Network, November 14, 2018, 1 / 37

  2. Empirical surprise [Zhang, Bengio, Hardt, Recht, Vinyals, 2016]
     ◮ Overparameterized regime.
     ◮ Efficiently fit all the data.
     ◮ Generalize well.

  3. Empirical surprise
     ◮ Overparameterized regime.
     ◮ Efficiently fit all the data.
     ◮ Generalize well.
     Questions
     ◮ Why can complex neural networks be optimized efficiently?
     ◮ Why does overparameterization not harm generalization?

  4. Two-layers neural networks
     [Figure: network diagram with an input layer, a hidden layer, and an output layer.]

  5. Two-layers neural networks
     ◮ Parameter: $\theta = (\theta_1, \dots, \theta_N) \in \mathbb{R}^{N \times D}$.
     ◮ Prediction: $\hat{y}(x; \theta) = \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x; \theta_i)$.
     ◮ An example: $\theta_i = (a_i, w_i)$, $\sigma_*(x; \theta_i) = a_i \sigma(\langle x, w_i \rangle)$.
     ◮ Data distribution: $(x, y) \sim \mathbb{P}_{x,y}$.
     ◮ Risk function: $R_N(\theta) = \mathbb{E}_{x,y}\Big[\Big(y - \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x; \theta_i)\Big)^2\Big]$.
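
The definitions on this slide can be sketched numerically. The snippet below is only an illustration under stated assumptions: the activation (`tanh`), the Gaussian toy data, and the names `sigma_star`, `predict`, and `empirical_risk` are hypothetical choices, not taken from the talk. The population expectation $\mathbb{E}_{x,y}$ is replaced by a sample average.

```python
import numpy as np

def sigma_star(x, a, w, activation=np.tanh):
    """Per-neuron output sigma_*(x; theta_i) = a_i * sigma(<x, w_i>).

    x: (n, d) inputs, a: (N,) output weights, w: (N, d) input weights.
    Returns an (n, N) array with one column per neuron.
    The tanh activation is an assumption made for this sketch.
    """
    return a * activation(x @ w.T)

def predict(x, a, w, activation=np.tanh):
    """y_hat(x; theta) = (1/N) * sum_i sigma_*(x; theta_i)."""
    return sigma_star(x, a, w, activation).mean(axis=1)

def empirical_risk(x, y, a, w, activation=np.tanh):
    """Sample average of (y - y_hat)^2, standing in for R_N(theta)."""
    return np.mean((y - predict(x, a, w, activation)) ** 2)

# Toy data (assumed): Gaussian inputs, Gaussian labels, N = 50 neurons.
rng = np.random.default_rng(0)
n, d, N = 200, 5, 50
x = rng.normal(size=(n, d))
y = rng.normal(size=n)
a = rng.normal(size=N)
w = rng.normal(size=(N, d))

risk = empirical_risk(x, y, a, w)
print(f"empirical risk with N={N} neurons: {risk:.4f}")
```

Note the $1/N$ normalization in `predict`: the network output is an average over neurons, which is what makes the large-$N$ limit meaningful.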

  10. Two-layers neural networks
     [Figure: network diagram with input, hidden, and output layers; hidden unit $i$ has input weights $w_i$ and output weight $a_i$, shown for $i = 1, \dots, 4$.]
     Figure: $\theta_i = (a_i, w_i)$.

  11. Related literature (before 2018)
     $R_N(\theta) = \mathbb{E}_{x,y}\Big[\Big(y - \frac{1}{N} \sum_{j=1}^{N} \sigma_*(x; \theta_j)\Big)^2\Big]$.
     ◮ Landscape analysis: [Soudry, Carmon, 2016], [Freeman, Bruna, 2016], [Ge, Lee, Ma, 2017], [Soltanolkotabi, Javanmard, Lee, 2017], [Zhong, Song, Jain, Bartlett, Dhillon, 2017]...
     ◮ Optimization dynamics: [Tian, 2017], [Soltanolkotabi, 2017], [Li, Yuan, 2017]...
     ◮ Generalization: [Bartlett, Foster, Telgarsky, 2017], [Neyshabur, Bhojanapalli, McAllester, Srebro, 2017]...

  14. Overparameterization: what happens for large $N$? [Bengio et al., 2006].
     ◮ Expand the square:
       $R_N(\theta) = \mathbb{E}[y^2] + \frac{2}{N} \sum_{i=1}^{N} V(\theta_i) + \frac{1}{N^2} \sum_{i,j=1}^{N} U(\theta_i, \theta_j)$,
       where $V(\theta_i) = -\mathbb{E}[y \sigma_*(x; \theta_i)]$ and $U(\theta_i, \theta_j) = \mathbb{E}[\sigma_*(x; \theta_i) \sigma_*(x; \theta_j)]$.
     ◮ $R_N$ depends on $(\theta_i)_{i \le N}$ only through the empirical distribution $\rho_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i}$.
     ◮ This motivates us to define $R(\rho)$ for $\rho \in \mathcal{P}(\mathbb{R}^D)$:
       $R(\rho) = \mathbb{E}[y^2] + 2 \int V(\theta) \, \rho(\mathrm{d}\theta) + \int\!\!\int U(\theta_1, \theta_2) \, \rho(\mathrm{d}\theta_1) \, \rho(\mathrm{d}\theta_2)$.
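
The expansion of the square can be checked numerically. The sketch below is an illustration under assumptions, not the authors' code: expectations are replaced by sample averages over toy Gaussian data, the activation is taken to be `tanh`, and all variable names are hypothetical. It also checks that permuting the neurons leaves the risk unchanged, which is what "depends on the parameters only through $\rho_N$" means in practice.

```python
import numpy as np

# Toy setup (assumed): Gaussian data, tanh activation, N = 30 neurons.
rng = np.random.default_rng(1)
n, d, N = 500, 4, 30
x = rng.normal(size=(n, d))
y = rng.normal(size=n)
a = rng.normal(size=N)
w = rng.normal(size=(N, d))

# feats[k, i] = sigma_*(x_k; theta_i) = a_i * tanh(<x_k, w_i>)
feats = a * np.tanh(x @ w.T)

# Direct evaluation: average of (y - (1/N) sum_i sigma_*(x; theta_i))^2.
risk_direct = np.mean((y - feats.mean(axis=1)) ** 2)

# Expanded form, with sample averages in place of expectations.
V = -(y[:, None] * feats).mean(axis=0)  # V[i]    = -E[y sigma_*(x; theta_i)]
U = feats.T @ feats / n                 # U[i, j] =  E[sigma_*(x; theta_i) sigma_*(x; theta_j)]
# (2/N) sum_i V_i equals 2 * V.mean(); (1/N^2) sum_{i,j} U_ij equals U.mean().
risk_expanded = np.mean(y ** 2) + 2.0 * V.mean() + U.mean()

# Relabeling the neurons does not change the empirical measure rho_N,
# so the risk is unchanged.
perm = rng.permutation(N)
feats_perm = a[perm] * np.tanh(x @ w[perm].T)
risk_perm = np.mean((y - feats_perm.mean(axis=1)) ** 2)

print(risk_direct, risk_expanded, risk_perm)
```

The three printed values should agree up to floating-point error. Evaluating $R(\rho)$ at $\rho = \rho_N$ gives exactly the expanded form, so the mean-field functional extends $R_N$ from empirical measures to general distributions on $\mathbb{R}^D$.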
