Machine Learning Basics Lecture 1: Linear Regression
Princeton University COS 495 Instructor: Yingyu Liang
Machine learning basics

What is machine learning? "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell)
Example: image classification. Experience/Data: images with labels (e.g., "indoor"). ImageNet figure borrowed from vision.stanford.edu
Features: color histogram over the Red, Green, and Blue channels
Extract features: feature vector x_i, label y_i
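The feature-extraction step can be sketched as follows (a minimal color-histogram example, assuming the image is given as an H×W×3 uint8 array; the bin count of 8 is an arbitrary choice for illustration):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms (R, G, B) into one feature vector x_i."""
    feats = []
    for c in range(3):  # one histogram per color channel
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize each channel histogram to sum to 1
    return np.concatenate(feats)

# Example: a random 32x32 RGB "image"
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
x = color_histogram(img)
print(x.shape)  # (24,) = 3 channels * 8 bins
```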
What kind of functions?
Hypothesis class
Connection between training data and test data?
They have the same distribution, and samples are i.i.d.: independent and identically distributed
What kind of performance measure?
Expected loss: L(f) = E_{(x,y)~D}[ l(f, x, y) ]
Various loss functions
How to use?
Empirical loss: L̂(f) = (1/n) Σ_{i=1}^n l(f, x_i, y_i)
It estimates the expected loss L(f) = E_{(x,y)~D}[ l(f, x, y) ]
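Under the i.i.d. assumption, the empirical loss is a sample average that approximates the expected loss. A minimal sketch with squared loss (the predictor and the data distribution here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):            # a fixed, hypothetical predictor
    return 2.0 * x

def loss(f, x, y):   # squared loss l(f, x, y)
    return (f(x) - y) ** 2

# Draw i.i.d. samples (x, y) ~ D with y = 2x + Gaussian noise of std 0.5
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# Empirical loss (1/n) sum_i l(f, x_i, y_i); here it estimates E[noise^2] = 0.25
empirical_loss = np.mean(loss(f, x, y))
print(empirical_loss)
```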
Example: stock market prediction (disclaimer: synthetic data / in another parallel universe)
[Figure: stock prices of Orange, MacroHard, and Ackermann over 2013–2016]
Sliding window over time serves as the input x; the data are non-i.i.d.
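The sliding-window construction can be sketched as follows (a hypothetical price series; the window length of 3 is an arbitrary choice). Each window of past prices becomes an input x, and the next price is the target y:

```python
import numpy as np

def sliding_windows(series, window=3):
    """Turn a 1-D time series into (x, y) pairs: x = past `window` values, y = next value."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window])
    return np.array(X), np.array(y)

prices = np.array([10.0, 10.5, 10.2, 10.8, 11.1, 10.9])
X, y = sliding_windows(prices)
print(X.shape, y.shape)  # (3, 3) and (3,)
```

Note that consecutive windows overlap and share values, which is one way to see that these examples are not i.i.d.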
Real data: the prostate cancer dataset of Stamey et al. (1989)
Figure borrowed from The Elements of Statistical Learning
y: prostate specific antigen
(x_1, ..., x_8): clinical measures
Linear regression: find f_w(x) = w^T x that minimizes
L̂(w) = (1/n) Σ_{i=1}^n (w^T x_i − y_i)^2
the l2 loss, also called mean square error
Hypothesis class: linear functions f_w(x) = w^T x
Let X be the matrix whose i-th row is x_i^T, and let y be the vector (y_1, ..., y_n)^T. Then
L̂(w) = (1/n) Σ_{i=1}^n (w^T x_i − y_i)^2 = (1/n) ‖Xw − y‖_2^2
Set the gradient to zero:
∇_w L̂(w) = ∇_w (1/n) ‖Xw − y‖_2^2 = 0
∇_w [ (Xw − y)^T (Xw − y) ] = 0
∇_w [ w^T X^T X w − 2 w^T X^T y + y^T y ] = 0
2 X^T X w − 2 X^T y = 0
so X^T X w = X^T y, giving the normal equation: w = (X^T X)^{-1} X^T y
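A minimal numerical sketch of the normal equation on synthetic data (in practice one solves the linear system rather than forming the explicit inverse, and `np.linalg.lstsq` gives the same solution):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)   # small observation noise

# Normal equation: solve (X^T X) w = X^T y without an explicit inverse
w = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with the least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_lstsq))  # True
```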
Bias term: find f_{w,b}(x) = w^T x + b that minimizes the loss.
This reduces to the previous case: f_{w,b}(x) = w^T x + b = (w')^T x', where w' = (w; b) and x' = (x; 1).
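The bias-term reduction can be sketched by appending a constant-1 feature to each input, so the same normal equation solves for w and b jointly (synthetic data for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = x @ np.array([3.0, -1.0]) + 5.0 + 0.01 * rng.normal(size=n)

# x' = (x; 1): append a column of ones so the last weight plays the role of b
X_prime = np.hstack([x, np.ones((n, 1))])
w_prime = np.linalg.solve(X_prime.T @ X_prime, X_prime.T @ y)

w, b = w_prime[:2], w_prime[2]
print(w, b)  # approximately [3, -1] and 5
```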