
Learning Over-Parameterized Neural Networks on Structured Data. Yingyu Liang@UW-Madison. Joint work with Yuanzhi Li@Princeton → Stanford.

  1. Learning Over-Parameterized Neural Networks on Structured Data. Yingyu Liang@UW-Madison. Joint work with Yuanzhi Li@Princeton → Stanford.

  2. Empirical Success of Deep Learning • Machine translation • Computer vision • Game playing • Robots

  3. Fundamental Questions • Optimization: Why can we find a network with good accuracy on the training data? • Generalization: Why is the network also accurate on new test instances?

  4. Fundamental Questions • Optimization: Why can we find a network with good accuracy on the training data? • Generalization: Why is the network also accurate on new test instances? • Key challenge: the optimization is non-convex. Theoretically hard, but practically not difficult!

  5. Mystery I: Over-Parameterization Helps Optimization • Empirical observation: it is easier to train wider networks. [Figure labels: Synthetic data; Ground truth; Train a larger network] Reference: On the Computational Efficiency of Training Neural Networks. Roi Livni, Shai Shalev-Shwartz, Ohad Shamir. NeurIPS 2014.

  6. Mystery II: Practical DNNs Easily Fit Random Labels • Empirical observation: practical DNNs easily fit random labels. Reference: Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. ICLR 2017.

  8. Our Work • Is there a simple theoretical explanation?

  9. Our Work • Is there a simple theoretical explanation? • Our work: yes, for two-layer NNs on clustered data!

  10. Our Work • Is there a simple theoretical explanation? • Our work: yes, for two-layer NNs on clustered data! • Poster: Tue Poster Session A #143
