The Effect of Network Width on Stochastic Gradient Descent and Generalization
Daniel S. Park
ICML 2019
Daniel S. Park (Google) ICML 2019 1 / 9
Work with Jascha Sohl-Dickstein, Quoc V. Le and Samuel L. Smith.
init governs how noisy the SGD is.
*Mandt et al. (2017); Chaudhari & Soatto (2017); Jastrzebski et al. (2017); Smith & Le (2017).