A Mean Field View of the Landscape of Two-Layers Neural Networks

Song Mei, Stanford University. November 14, 2018. Joint work with Andrea Montanari and Phan-Minh Nguyen.


  1. A Mean Field View of the Landscape of Two-Layers Neural Networks Song Mei Stanford University November 14, 2018 Joint work with Andrea Montanari and Phan-Minh Nguyen Song Mei (Stanford University) Mean Field Dynamics for Neural Network November 14, 2018 1 / 37

  2. Empirical surprise [Zhang, Bengio, Hardt, Recht, Vinyals, 2016]
     ◮ Overparameterized regime.
     ◮ Efficiently fit all the data.
     ◮ Generalize well.

  3. Empirical surprise
     ◮ Overparameterized regime.
     ◮ Efficiently fit all the data.
     ◮ Generalize well.
     Questions
     ◮ Why can complex neural networks be optimized efficiently?
     ◮ Why does overparameterization not harm generalization?

  4. Two-layers neural networks. [Figure: network diagram with an input layer, a hidden layer, and an output layer.]

  5. Two-layers neural networks
     ◮ Parameter: $\theta = (\theta_1, \ldots, \theta_N) \in \mathbb{R}^{N \times D}$.
     ◮ Prediction: $\hat{y}(x; \theta) = \frac{1}{N} \sum_{i=1}^{N} \sigma_\star(x; \theta_i)$.
     ◮ An example: $\theta_i = (a_i, w_i)$, $\sigma_\star(x; \theta_i) = a_i \sigma(\langle x, w_i \rangle)$.
     ◮ Data distribution: $(x, y) \sim P_{x,y}$.
     ◮ Risk function: $R_N(\theta) = \mathbb{E}_{x,y}\Big[\Big(y - \frac{1}{N} \sum_{i=1}^{N} \sigma_\star(x; \theta_i)\Big)^2\Big]$.
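The definitions on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the choice of `tanh` for the activation σ, the Gaussian inputs, the ±1 labels, and the names `sigma_star`, `predict`, and `empirical_risk` are all assumptions made for the example, and the expectation in the risk is replaced by an average over a finite sample.

```python
import math
import random


def sigma_star(x, theta_i, sigma=math.tanh):
    """Unit response sigma_*(x; theta_i) = a_i * sigma(<x, w_i>), with theta_i = (a_i, w_i)."""
    a_i, w_i = theta_i
    return a_i * sigma(sum(xk * wk for xk, wk in zip(x, w_i)))


def predict(x, theta):
    """Prediction y_hat(x; theta) = (1/N) * sum_i sigma_*(x; theta_i)."""
    return sum(sigma_star(x, t) for t in theta) / len(theta)


def empirical_risk(samples, theta):
    """Sample average of (y - y_hat(x; theta))^2, standing in for the expectation in R_N."""
    return sum((y - predict(x, theta)) ** 2 for x, y in samples) / len(samples)


# Hypothetical toy setup: d input dimensions, N hidden units, n samples.
rng = random.Random(0)
d, N, n = 3, 4, 50
theta = [(rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(d)]) for _ in range(N)]
samples = [([rng.gauss(0, 1) for _ in range(d)], rng.choice([-1.0, 1.0])) for _ in range(n)]
risk = empirical_risk(samples, theta)
```

In this sketch each $\theta_i$ is a pair (scalar $a_i$, vector $w_i$), so the per-unit dimension $D$ from the slide corresponds to `d + 1`.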

  10. Two-layers neural networks. [Figure: network diagram with input, hidden, and output layers; hidden unit $i$ carries first-layer weights $w_i$ and output weight $a_i$, so that $\theta_i = (a_i, w_i)$.]

  11. Related literature (before 2018)
      $R_N(\theta) = \mathbb{E}_{x,y}\Big[\Big(y - \frac{1}{N} \sum_{j=1}^{N} \sigma_\star(x; \theta_j)\Big)^2\Big]$.
      ◮ Landscape analysis: [Soudry, Carmon, 2016], [Freeman, Bruna, 2016], [Ge, Lee, Ma, 2017], [Soltanolkotabi, Javanmard, Lee, 2017], [Zhong, Song, Jain, Bartlett, Dhillon, 2017]...
      ◮ Optimization dynamics: [Tian, 2017], [Soltanolkotabi, 2017], [Li, Yuan, 2017]...
      ◮ Generalization: [Bartlett, Foster, Telgarsky, 2017], [Neyshabur, Bhojanapalli, McAllester, Srebro, 2017]...

  14. Overparameterization: what happens for large $N$? [Bengio et al., 2006].
      ◮ Expand the square: $R_N(\theta) = \mathbb{E}[y^2] + \frac{2}{N} \sum_{i=1}^{N} V(\theta_i) + \frac{1}{N^2} \sum_{i,j=1}^{N} U(\theta_i, \theta_j)$, where $V(\theta_i) = -\mathbb{E}[y \sigma_\star(x; \theta_i)]$ and $U(\theta_i, \theta_j) = \mathbb{E}[\sigma_\star(x; \theta_i) \sigma_\star(x; \theta_j)]$.
      ◮ $R_N$ depends on $(\theta_i)_{i \le N}$ only through the empirical distribution $\rho_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i}$.
      ◮ This motivates defining, for $\rho \in \mathcal{P}(\mathbb{R}^D)$, $R(\rho) = \mathbb{E}[y^2] + 2 \int V(\theta) \rho(d\theta) + \iint U(\theta_1, \theta_2) \rho(d\theta_1) \rho(d\theta_2)$.
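The expansion of the square can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the talk: expectations are replaced by averages over a finite sample (for which the identity still holds exactly, up to floating-point rounding), the activation is taken to be `tanh`, and the names `risk_direct` and `risk_expanded` are hypothetical.

```python
import math
import random


def sigma_star(x, theta_i):
    """sigma_*(x; theta_i) = a_i * tanh(<x, w_i>); tanh is an assumed activation."""
    a_i, w_i = theta_i
    return a_i * math.tanh(sum(xk * wk for xk, wk in zip(x, w_i)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def risk_direct(samples, theta):
    """Sample version of R_N(theta): average of (y - (1/N) sum_i sigma_*(x; theta_i))^2."""
    N = len(theta)
    return mean((y - sum(sigma_star(x, t) for t in theta) / N) ** 2 for x, y in samples)


def risk_expanded(samples, theta):
    """Same quantity via E[y^2] + (2/N) sum_i V(theta_i) + (1/N^2) sum_{i,j} U(theta_i, theta_j)."""
    N = len(theta)

    def V(t):
        # V(theta) = -E[y * sigma_*(x; theta)], with E taken over the sample
        return -mean(y * sigma_star(x, t) for x, y in samples)

    def U(s, t):
        # U(theta_1, theta_2) = E[sigma_*(x; theta_1) * sigma_*(x; theta_2)]
        return mean(sigma_star(x, s) * sigma_star(x, t) for x, _ in samples)

    second_moment = mean(y * y for _, y in samples)
    linear = (2.0 / N) * sum(V(t) for t in theta)
    pairwise = (1.0 / N**2) * sum(U(s, t) for s in theta for t in theta)
    return second_moment + linear + pairwise


# Hypothetical toy data: Gaussian inputs, +/-1 labels, random neurons.
rng = random.Random(1)
d, N, n = 3, 5, 40
theta = [(rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(d)]) for _ in range(N)]
samples = [([rng.gauss(0, 1) for _ in range(d)], rng.choice([-1.0, 1.0])) for _ in range(n)]
direct = risk_direct(samples, theta)
expanded = risk_expanded(samples, theta)
```

Because both forms are sums over neurons with no reference to their order, reordering `theta` leaves the risk unchanged, which is the sense in which $R_N$ depends on the parameters only through $\rho_N$.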
