Learning Over-Parameterized Neural Networks on Structured Data. Yingyu Liang@UW-Madison. PowerPoint PPT Presentation

SLIDE 1

Learning Over-Parameterized Neural Networks on Structured Data

Yingyu Liang@UW-Madison
Joint work with Yuanzhi Li@Princeton → Stanford

SLIDE 2

Empirical Success of Deep Learning

Computer vision Machine translation Game playing Robots

SLIDE 3

Fundamental Questions

  • Optimization:

Why can we find a network with good accuracy on the training data?

  • Generalization:

Why is the network also accurate on new test instances?

SLIDE 4

Fundamental Questions

  • Key challenge: the optimization is non-convex

Theoretically hard but practically not difficult!
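
To make the non-convexity concrete, here is a minimal sketch that is not taken from the talk: a hypothetical one-neuron network f(x) = v * relu(w * x) fit to a single example (x, y) = (1, 1). Two different parameter settings fit the example exactly, yet their midpoint does not, which a convex loss would not allow.

```python
def relu(z):
    return max(0.0, z)


def loss(w, v, x=1.0, y=1.0):
    """Squared loss of the toy network f(x) = v * relu(w * x) on one example."""
    return 0.5 * (v * relu(w * x) - y) ** 2


# Two parameter settings (w, v) that both fit the example exactly.
theta_a = (2.0, 0.5)  # 0.5 * relu(2.0) = 1
theta_b = (0.5, 2.0)  # 2.0 * relu(0.5) = 1
midpoint = tuple((p + q) / 2 for p, q in zip(theta_a, theta_b))  # (1.25, 1.25)

loss_a = loss(*theta_a)
loss_b = loss(*theta_b)
loss_mid = loss(*midpoint)

# A convex loss would satisfy loss_mid <= (loss_a + loss_b) / 2.
violates_convexity = loss_mid > 0.5 * (loss_a + loss_b)

if __name__ == "__main__":
    print(loss_a, loss_b, loss_mid, violates_convexity)
```

The two exact fits differ only by rescaling between the layers, one of the symmetries that produce several separated minima in neural network training objectives.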

SLIDE 5

Mystery I: Over-Parameterization Helps Optimization

  • Empirical observation: easier to train wider networks

[Figure panels: "Train a larger network", "Ground truth"]

On the Computational Efficiency of Training Neural Networks. Roi Livni, Shai Shalev-Shwartz, Ohad Shamir. NeurIPS 2014.

Synthetic data
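
One way to probe this observation on a small scale is sketched below. It is an illustration under stated assumptions, not the cited paper's experiment: labels come from a small random "ground truth" ReLU network, and student networks of several widths are trained with full-batch gradient descent on the hidden layer only, with output weights fixed at random signs. The function names, sizes, step count, and learning rate are all hypothetical choices.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_data(n, d, teacher_width, seed=0):
    """Unit-norm inputs labeled by a random ground-truth ReLU network (assumed setup)."""
    rng = random.Random(seed)
    teacher = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(teacher_width)]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(d)]
        norm = sum(v * v for v in x) ** 0.5
        x = [v / norm for v in x]
        X.append(x)
        y.append(sum(max(0.0, dot(w, x)) for w in teacher))
    return X, y


def train(X, y, width, steps=200, lr=1.0, seed=1):
    """Gradient descent on the hidden weights of f(x) = sum_r a_r * relu(<w_r, x>).

    Returns (initial_loss, final_loss), where loss is half the mean squared error.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) / width ** 0.5 for _ in range(width)]  # kept fixed

    def evaluate():
        pre = [[dot(w, x) for w in W] for x in X]  # pre-activations, n x width
        res = [sum(a[r] * max(0.0, p[r]) for r in range(width)) - t
               for p, t in zip(pre, y)]
        return pre, res

    def loss(res):
        return sum(e * e for e in res) / (2 * n)

    pre, res = evaluate()
    initial = loss(res)
    for _ in range(steps):
        # Full-batch step: pre and res are from before the step, so every
        # update below uses the same (old) weights.
        for r in range(width):
            for i in range(n):
                if pre[i][r] > 0.0:  # ReLU passes gradient only when active
                    c = lr * res[i] * a[r] / n
                    for j in range(d):
                        W[r][j] -= c * X[i][j]
        pre, res = evaluate()
    return initial, loss(res)


if __name__ == "__main__":
    X, y = make_data(n=30, d=5, teacher_width=3)
    for width in (2, 8, 32):
        initial, final = train(X, y, width)
        print(f"width={width:3d}  loss: {initial:.4f} -> {final:.4f}")
```

Comparing the final losses across widths gives a rough feel for the effect; results depend on the seeds and hyperparameters, so a single run should not be read as evidence either way.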

SLIDE 6

Mystery II: Practical DNNs Easily Fit Random Labels

  • Empirical observation: practical DNNs easily fit random labels

Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. ICLR 2017.
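
A toy version of the random-label experiment can be sketched as follows. This is not the cited paper's setup, which uses practical deep networks on image data; here a hypothetical wide two-layer ReLU network is trained on a handful of random unit vectors with coin-flip labels, and the function name and every hyperparameter are assumptions chosen for illustration.

```python
import random


def fit_random_labels(n=12, d=8, width=48, steps=200, lr=0.3, seed=0):
    """Train a wide two-layer ReLU network on labels that carry no signal.

    Returns (train_accuracy, initial_loss, final_loss), where loss is half
    the sum of squared errors.
    """
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(d)]
        norm = sum(v * v for v in x) ** 0.5
        X.append([v / norm for v in x])
    y = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # random +/-1 labels

    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) / width ** 0.5 for _ in range(width)]  # kept fixed

    def evaluate():
        pre = [[sum(wj * xj for wj, xj in zip(w, x)) for w in W] for x in X]
        out = [sum(a[r] * max(0.0, p[r]) for r in range(width)) for p in pre]
        return pre, out

    def half_sse(out):
        return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

    pre, out = evaluate()
    initial_loss = half_sse(out)
    for _ in range(steps):
        # Full-batch gradient step on the hidden weights only.
        for r in range(width):
            for i in range(n):
                if pre[i][r] > 0.0:
                    c = lr * (out[i] - y[i]) * a[r]
                    for j in range(d):
                        W[r][j] -= c * X[i][j]
        pre, out = evaluate()
    accuracy = sum((o > 0) == (t > 0) for o, t in zip(out, y)) / n
    return accuracy, initial_loss, half_sse(out)


if __name__ == "__main__":
    acc, before, after = fit_random_labels()
    print(f"train accuracy on random labels: {acc:.2f}  loss: {before:.3f} -> {after:.3f}")
```

The network here has far more parameters (width times d) than training points, which is the over-parameterized regime the slides refer to; shrinking `width` is a simple way to see how the fit changes.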

SLIDE 8

Our Work

Is there a simple theoretical explanation?

SLIDE 9

Our Work

Our work: Yes, for two-layer neural networks on clustered data!

SLIDE 10

Our Work

Poster: Tue Poster Session A #143