Machine Learning Basics Lecture 6: Overfitting
Princeton University COS 495 Instructor: Yingyu Liang
Review: machine learning basics
Math formulation: given training data $(x_i, y_i)$, $1 \le i \le n$, i.i.d. from distribution $D$
Find $f \in \mathcal{H}$ that minimizes the empirical loss
$\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} l(f, x_i, y_i)$
s.t. the expected loss is small:
$L(f) = \mathbb{E}_{(x,y) \sim D}[\, l(f, x, y) \,]$
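The two losses above can be checked numerically. A minimal sketch, assuming squared loss and a toy 1-D linear model (the model, data distribution, and all names here are illustrative, not from the slides); the expected loss is approximated by averaging over a large fresh sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: model f(x) = w*x with squared loss l(f, x, y) = (f(x) - y)^2,
# data drawn i.i.d. from y = 2x + Gaussian noise (std 0.1).
def loss(w, x, y):
    return (w * x - y) ** 2

n = 50
x_train = rng.uniform(-1, 1, n)
y_train = 2 * x_train + rng.normal(0, 0.1, n)

w = 2.0  # the true slope, so only the noise contributes to the loss

# Empirical loss: L_hat(f) = (1/n) * sum_i l(f, x_i, y_i)
empirical = loss(w, x_train, y_train).mean()

# Expected loss L(f) = E[l(f, x, y)], approximated with a large i.i.d. sample
x_big = rng.uniform(-1, 1, 100_000)
y_big = 2 * x_big + rng.normal(0, 0.1, 100_000)
expected = loss(w, x_big, y_big).mean()

print(empirical, expected)  # both close to the noise variance 0.01
```

The empirical loss fluctuates around the expected loss; with only $n = 50$ samples the gap between the two is visible, which is the theme of this lecture.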
Feature mapping
Gradient descent; convex optimization
Occam's razor
Maximum Likelihood
Polynomial kernel (degree 2): feature mapping
$\phi(x) = \left( x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2,\; \sqrt{2c}\, x_1,\; \sqrt{2c}\, x_2,\; c \right)$
Classifier: $f(x) = \mathrm{sign}(w^T \phi(x) + b)$
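A quick numerical check (a sketch, assuming the degree-2 case with 2-D inputs as above) that this explicit feature map reproduces the polynomial kernel $(x^T z + c)^2$ as an inner product:

```python
import numpy as np

def phi(x, c=1.0):
    # Explicit degree-2 polynomial feature map for 2-D input x = (x1, x2)
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

def poly_kernel(x, z, c=1.0):
    # Degree-2 polynomial kernel
    return (np.dot(x, z) + c) ** 2

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.3])

# Inner product in feature space equals the kernel value: phi(x).phi(z) = (x.z + c)^2
print(np.dot(phi(x), phi(z)), poly_kernel(x, z))  # both 2.89
```

Expanding $(x^T z + c)^2$ term by term gives exactly the six products that $\phi(x)^T \phi(z)$ computes, which is why the kernel trick lets us avoid forming $\phi$ explicitly.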
Example (Bishop): data generated from $t = \sin(2\pi x) + \epsilon$
Regression using polynomial of degree $M$; fits shown for several values of $M$
[Figures from Pattern Recognition and Machine Learning, Bishop]
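Bishop's example can be reproduced in a few lines. A sketch, assuming 10 training points, noise standard deviation 0.3, and RMSE as the error measure (all of these choices are illustrative): training error keeps falling as the degree $M$ grows, while test error does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Data as in Bishop's example: t = sin(2*pi*x) + Gaussian noise
def make_data(n):
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, t

x_tr, t_tr = make_data(10)      # small training set, as in the figures
x_te, t_te = make_data(1000)    # large held-out test set

results = {}
for M in [0, 1, 3, 9]:
    w = np.polyfit(x_tr, t_tr, M)  # least-squares polynomial fit of degree M
    rmse_tr = np.sqrt(np.mean((np.polyval(w, x_tr) - t_tr) ** 2))
    rmse_te = np.sqrt(np.mean((np.polyval(w, x_te) - t_te) ** 2))
    results[M] = (rmse_tr, rmse_te)
    print(M, round(rmse_tr, 3), round(rmse_te, 3))
```

With 10 points, the degree-9 polynomial interpolates the training data exactly (training RMSE near zero) but oscillates wildly between them, so its test RMSE is far worse than the degree-3 fit: the classic overfitting picture.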
Empirical loss vs. expected loss: the difference between the two is the generalization gap
Use prior knowledge/model to prune hypotheses
Use experience/data to prune hypotheses
$f^*$: the best function
Hypothesis classes $\mathcal{F}_1 \subseteq \mathcal{F}_2$
Ideal case: $\mathcal{F} = \{f^*\}$
Infinite data vs. practice
Figure from Deep Learning, Goodfellow, Bengio and Courville
Figure from Pattern Recognition and Machine Learning, Bishop
hyper-parameters
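A common way to set a hyper-parameter is to pick the value with the lowest error on a held-out validation set. A minimal sketch, assuming ridge-regularized degree-9 polynomial regression where the penalty $\lambda$ is the hyper-parameter (the split sizes, candidate values, and data are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def design(x, M=9):
    # Polynomial design matrix: columns 1, x, x^2, ..., x^M
    return np.vander(x, M + 1, increasing=True)

def ridge_fit(X, t, lam):
    # Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T t
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

def rmse(w, x, t):
    return np.sqrt(np.mean((design(x) @ w - t) ** 2))

# Same data-generating process as Bishop's example
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)
x_tr, t_tr = x[:20], t[:20]   # training split
x_va, t_va = x[20:], t[20:]   # validation split

# Candidate hyper-parameter values; the best one is chosen on the validation set
lambdas = [0.0, 1e-6, 1e-3, 1e-1, 10.0]
val_errs = {lam: rmse(ridge_fit(design(x_tr), t_tr, lam), x_va, t_va)
            for lam in lambdas}
best = min(val_errs, key=val_errs.get)
print(best, val_errs[best])
```

The key point is that the hyper-parameter is never chosen by training error (which always prefers the least regularization) but by error on data the fit has not seen.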