Linear Regression
Jia-Bin Huang Virginia Tech
Spring 2019
ECE-5424G / CS-5824
Administrative
Office hour: Chen Gao, Shih-Yang Su
Feedback (Thanks!): Notation? More descriptive slides? Video/audio recording? TA hours
              Supervised Learning    Unsupervised Learning
Discrete      Classification         Clustering
Continuous    Regression             Dimensionality reduction
$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$
Do nothing.
$h(x) = y^{(k)}$, where $k = \arg\min_i D(x, x^{(i)})$
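The nearest-neighbor rule (predict the label of the training point closest to the query) can be sketched as follows. This is a minimal illustration, not the course's implementation: the function names, the absolute-difference distance, and the toy data are all assumptions made for the example.

```python
def nearest_neighbor_predict(x, train_x, train_y, distance):
    """Return y^(k), where k indexes the training input closest to x."""
    k = min(range(len(train_x)), key=lambda i: distance(x, train_x[i]))
    return train_y[k]


def absolute_distance(a, b):
    # Hypothetical choice of D for scalar inputs.
    return abs(a - b)


# Toy training data (assumed for illustration only).
train_x = [1.0, 2.0, 4.0]
train_y = ["a", "b", "c"]

prediction = nearest_neighbor_predict(2.2, train_x, train_y, absolute_distance)
print(prediction)
```

There is no training step: all the work happens at prediction time, which scans every stored example.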
Slide credit: Carlos Guestrin
Slide credit: CS231 @ Stanford
Regression: real-valued output
[Diagram: size of house -> hypothesis -> estimated price]
[Figure: price (in thousands of dollars) versus size in feet^2.]
Size in feet^2 (x)    Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
…                     …

$m = 47$
Examples: $x^{(1)} = 2104$, $x^{(2)} = 1416$, $y^{(1)} = 460$
Slide credit: Andrew Ng
Training set -> Learning algorithm -> $h$
$x$ (size of house) -> $h$ (hypothesis) -> $y$ (estimated price)
$y = h_\theta(x) = \theta_0 + \theta_1 x$
Shorthand: $h(x)$
[Figure: price (in thousands of dollars) versus size in feet^2.]
Univariate linear regression
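As a small illustration of the univariate hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$, the sketch below evaluates it on the four house sizes listed in the training table. The parameter values are hypothetical and chosen only to make the arithmetic easy to follow; they are not fitted to the data.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


sizes = [2104, 1416, 1534, 852]   # sizes in feet^2 from the training table
theta0, theta1 = 50.0, 0.2        # hypothetical parameters, not learned

# Estimated prices in thousands of dollars under the assumed parameters.
estimates = [hypothesis(theta0, theta1, size) for size in sizes]
print(estimates)
```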
Slide credit: Andrew Ng
Slide credit: Andrew Ng
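[Figures: three plots of $h_\theta(x)$, each with axes $x$ and $y$ running from 0 to 3, for the parameter settings below.]
$\theta_0 = 1.5$, $\theta_1 = 0$
$\theta_0 = 0$, $\theta_1 = 0.5$
$\theta_0 = 1$, $\theta_1 = 0.5$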
Slide credit: Andrew Ng
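Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$
[Figure: price (in thousands of dollars) versus size in feet^2, with axes $x$ and $y$.]
$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$
Cost function:
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$\min_{\theta_0, \theta_1} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, that is, $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$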
Slide credit: Andrew Ng
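With both parameters:
$h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

With $\theta_0 = 0$:
$h_\theta(x) = \theta_1 x$
Parameter: $\theta_1$
$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$\min_{\theta_1} J(\theta_1)$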
Slide credit: Andrew Ng
[Figures: left, $h_\theta(x)$ plotted on axes $x$ and $y$ (ticks 1 to 3); right, $J(\theta_1)$ plotted against $\theta_1$ (ticks 1 to 3). The pair of plots is repeated across five slides.]
Slide credit: Andrew Ng
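A minimal sketch of evaluating the simplified cost $J(\theta_1)$ (with $\theta_0 = 0$) at a few values of $\theta_1$. The three training points (1, 1), (2, 2), (3, 3) are an assumed toy data set chosen to match the 1-to-3 axis range of the plots; they are not stated in the slide text.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


# Assumed toy data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate J(theta1) with theta0 fixed at 0.
for theta1 in [0.0, 0.5, 1.0, 1.5]:
    print(theta1, cost(0.0, theta1, xs, ys))
```

On this data the cost is zero at $\theta_1 = 1$ and grows symmetrically on either side, which is the bowl shape traced out by plotting $J(\theta_1)$.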
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, with parameters $\theta_0, \theta_1$
Slide credit: Andrew Ng
How do we find good $\theta_0, \theta_1$ that minimize $J(\theta_0, \theta_1)$?
Slide credit: Andrew Ng
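Have some function $J(\theta_0, \theta_1)$
Want $\arg\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Outline: start with some $\theta_0, \theta_1$ and keep changing them to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum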
Slide credit: Andrew Ng
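$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$: derivative (rate of change)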
Slide credit: Andrew Ng
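Incorrect:
temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_1$ := temp1

Correct: simultaneous update
temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1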
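The difference between the two update orders can be seen on any cost whose partial derivatives depend on both parameters. The sketch below uses a hypothetical cost $J(\theta_0, \theta_1) = \theta_0^2 + \theta_0 \theta_1 + \theta_1^2$, chosen only because its gradient is easy to write by hand; it is not the linear regression cost.

```python
def grad0(t0, t1):
    # dJ/dtheta0 for the assumed cost J = t0^2 + t0*t1 + t1^2
    return 2 * t0 + t1


def grad1(t0, t1):
    # dJ/dtheta1 for the same assumed cost
    return t0 + 2 * t1


def step_simultaneous(t0, t1, alpha):
    """Correct: both gradients are evaluated at the old (t0, t1)."""
    temp0 = t0 - alpha * grad0(t0, t1)
    temp1 = t1 - alpha * grad1(t0, t1)
    return temp0, temp1


def step_sequential(t0, t1, alpha):
    """Incorrect: the second gradient sees the already-updated t0."""
    t0 = t0 - alpha * grad0(t0, t1)
    t1 = t1 - alpha * grad1(t0, t1)
    return t0, t1


simultaneous = step_simultaneous(1.0, 1.0, 0.1)
sequential = step_sequential(1.0, 1.0, 0.1)
print(simultaneous, sequential)
```

Starting from (1, 1) with $\alpha = 0.1$, the two versions agree on $\theta_0$ but produce different values of $\theta_1$, because the sequential version mixes old and new parameters within one step.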
Slide credit: Andrew Ng
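[Figure: $J(\theta_1)$ plotted against $\theta_1$.]
$\frac{\partial}{\partial \theta_1} J(\theta_1) > 0$: the update decreases $\theta_1$
$\frac{\partial}{\partial \theta_1} J(\theta_1) < 0$: the update increases $\theta_1$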
Slide credit: Andrew Ng
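Repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$)
}
$h_\theta(x) = \theta_0 + \theta_1 x$
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$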
Slide credit: Andrew Ng
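$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$
$j = 0$: $\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$j = 1$: $\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$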
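The intermediate step between the squared-error cost and the two partial derivatives is an application of the chain rule. A worked version, using the same symbols as the surrounding formulas, might read:

```latex
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)
     \frac{\partial}{\partial \theta_j} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) \\
  &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
     \frac{\partial}{\partial \theta_j} \left( \theta_0 + \theta_1 x^{(i)} \right), \\
\frac{\partial}{\partial \theta_0} \left( \theta_0 + \theta_1 x^{(i)} \right) &= 1,
\qquad
\frac{\partial}{\partial \theta_1} \left( \theta_0 + \theta_1 x^{(i)} \right) = x^{(i)}.
\end{aligned}
```

The factor of 2 from differentiating the square cancels the $\frac{1}{2}$ in the cost, which is why the cost is defined with $\frac{1}{2m}$ rather than $\frac{1}{m}$.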
Slide credit: Andrew Ng
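Repeat until convergence {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
Update $\theta_0$ and $\theta_1$ simultaneously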
Slide credit: Andrew Ng
$m$: Number of training examples
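A minimal sketch of gradient descent for univariate linear regression, in which every step sums over all $m$ training examples. The toy data, the learning rate, and the fixed iteration count (used here in place of a convergence test) are assumptions made for the example.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by gradient descent over all examples."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors h(x^(i)) - y^(i) at the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed before assignment.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Assumed toy data generated from y = 1 + 2x (no noise).
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this noise-free data the parameters approach $\theta_0 = 1$, $\theta_1 = 2$. A practical implementation would usually stop when the decrease in $J$ falls below a tolerance rather than after a fixed number of steps.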
Slide credit: Andrew Ng
Size in feet^2 ($x_1$)   Number of bedrooms ($x_2$)   Number of floors ($x_3$)   Age of home in years ($x_4$)   Price ($) in 1000's ($y$)
2104                     5                            1                          45                             460
1416                     3                            2                          40                             232
1534                     3                            2                          30                             315
852                      2                            1                          36                             178
…                        …                            …                          …                              …

Notation:
$n$ = Number of features
$x^{(i)}$ = Input features of the $i$-th training example
$x_j^{(i)}$ = Value of feature $j$ in the $i$-th training example
$x_3^{(2)} = ?$
$x_3^{(4)} = ?$
Slide credit: Andrew Ng
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$
Slide credit: Andrew Ng
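($x_0^{(i)} = 1$ for all examples)
$x = [x_0, x_1, x_2, \ldots, x_n]^\top \in \mathbb{R}^{n+1}$, $\theta = [\theta_0, \theta_1, \theta_2, \ldots, \theta_n]^\top \in \mathbb{R}^{n+1}$
$h_\theta(x) = \theta^\top x$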
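With the convention $x_0 = 1$, the multi-feature hypothesis is a single inner product $\theta^\top x$. A small sketch follows; the parameter values are hypothetical and only the feature values come from the first row of the housing table.

```python
def predict(theta, features):
    """Compute theta^T x after prepending the constant feature x_0 = 1."""
    x = [1.0] + list(features)
    if len(x) != len(theta):
        raise ValueError("theta must have one more entry than the feature list")
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical parameters theta_0 .. theta_4 (not learned from data).
theta = [80.0, 0.1, 10.0, 5.0, -1.0]

# First training example: size, bedrooms, floors, age of home.
features = [2104, 5, 1, 45]

price = predict(theta, features)
print(price)
```

Treating the intercept as the weight of a constant feature lets the same code handle any number of features without a special case for $\theta_0$.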
Slide credit: Andrew Ng
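With a single feature:
Repeat until convergence {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

With $n$ features:
Repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
}
Simultaneously update $\theta_j$, for $j = 0, 1, \cdots, n$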
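The general update for $j = 0, 1, \cdots, n$ can be sketched in a few lines once each input row carries the constant feature $x_0 = 1$. The data below is a toy set generated from assumed parameters (1, 2, 3), and the learning rate and iteration count are likewise assumptions.

```python
def gradient_descent(X, y, alpha, iterations):
    """Gradient descent for linear regression; each row of X starts with x_0 = 1."""
    m, num_params = len(X), len(X[0])
    theta = [0.0] * num_params
    for _ in range(iterations):
        # Errors theta^T x^(i) - y^(i) for every training example.
        errors = [sum(t * xj for t, xj in zip(theta, row)) - yi
                  for row, yi in zip(X, y)]
        # Partial derivative for each j: (1/m) * sum(error_i * x_j^(i)).
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(num_params)]
        # Building a new list updates every theta_j simultaneously.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Toy data generated from y = 1 + 2*x1 + 3*x2 (assumed for illustration).
X = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]

theta = gradient_descent(X, y, alpha=0.5, iterations=1000)
print(theta)
```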
Slide credit: Andrew Ng
$x_2$ = number of bedrooms (1-5)
[Figures: two plots with axes $\theta_1$ and $\theta_2$ (ticks 1 to 3).]
Slide credit: Andrew Ng
0.001, …, 0.01, …, 0.1, …, 1
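Assuming the values listed above are candidate learning rates $\alpha$, the sketch below runs the same number of gradient descent steps with each one and compares the final cost. The one-parameter model $h_\theta(x) = \theta_1 x$, the toy data, and the step count are assumptions made for the example.

```python
def final_cost(alpha, iterations=50):
    """Run gradient descent on h(x) = theta1 * x and return the final cost."""
    xs = [1.0, 2.0, 3.0]   # assumed toy data on the line y = x
    ys = [1.0, 2.0, 3.0]
    m = len(xs)
    theta1 = 0.0
    for _ in range(iterations):
        grad = sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        theta1 -= alpha * grad
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


initial_cost = final_cost(0.0)   # alpha = 0 leaves theta1 at its start value
costs = {alpha: final_cost(alpha) for alpha in [0.001, 0.01, 0.1, 1.0]}
for alpha, value in costs.items():
    print(alpha, value)
```

On this data the smallest rates make slow progress, 0.1 converges quickly, and 1.0 overshoots so badly that the cost ends far above where it started. Which rate works best depends on the data and its scaling, so a sweep like this is a diagnostic rather than a rule.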
Image credit: CS231n@Stanford
$x = \text{frontage} \times \text{depth}$
Slide credit: Andrew Ng
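$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 = \theta_0 + \theta_1 (\text{size}) + \theta_2 (\text{size})^2 + \theta_3 (\text{size})^3$
[Figure: price (in thousands of dollars) versus size in feet^2.]
$x_1 = (\text{size})$, $x_2 = (\text{size})^2$, $x_3 = (\text{size})^3$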
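Polynomial regression reuses the linear model unchanged: only the feature vector is different. A minimal sketch, with a hypothetical helper name and hypothetical parameter values:

```python
def polynomial_features(size, degree=3):
    """Map a scalar size to [size, size^2, ..., size^degree]."""
    return [size ** power for power in range(1, degree + 1)]


def predict(theta, features):
    """Linear hypothesis theta^T x with x_0 = 1 prepended."""
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


features = polynomial_features(2.0)      # x1, x2, x3 for size = 2
theta = [1.0, 0.5, 0.25, 0.125]          # hypothetical parameters
value = predict(theta, features)
print(features, value)
```

The model is still linear in $\theta$, so the same gradient descent update applies. Note that powers of a raw size such as 2104 differ by orders of magnitude, so the features produced this way have very different ranges.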
Slide credit: Andrew Ng
($x_0$)   Size in feet^2 ($x_1$)   Number of bedrooms ($x_2$)   Number of floors ($x_3$)   Age of home in years ($x_4$)   Price ($) in 1000's ($y$)
1         2104                     5                            1                          45                             460
1         1416                     3                            2                          40                             232
1         1534                     3                            2                          30                             315
1         852                      2                            1                          36                             178
…         …                        …                            …                          …                              …

$y = [460, 232, 315, 178]^\top$
Slide credit: Andrew Ng
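$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \lVert X\theta - y \rVert_2^2$
$\frac{\partial}{\partial \theta} J(\theta) = 0$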
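Setting the gradient of the matrix form of the cost to zero leads to a closed-form solution. The derivation below is a standard sketch and assumes that $X^\top X$ is invertible; here $X$ is the $m \times (n+1)$ matrix whose rows are the training inputs with $x_0 = 1$.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m} \lVert X\theta - y \rVert_2^2 \\
\nabla_\theta J(\theta) &= \frac{1}{m} X^\top (X\theta - y) = 0 \\
X^\top X \, \theta &= X^\top y \\
\theta &= (X^\top X)^{-1} X^\top y
  \quad \text{(assuming } X^\top X \text{ is invertible)}
\end{aligned}
```

The third line is the system usually called the normal equations. When $X^\top X$ is singular, for example with duplicated features, the minimizer is not unique and this formula does not apply as written.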
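$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$
$\mathrm{cost}(\hat{y}, y) = \frac{1}{2} \lVert y - \hat{y} \rVert_2^2$: Least squares loss
$\frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\left( \hat{y}^{(i)}, y^{(i)} \right)$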
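$p_\theta(y^{(i)} \mid x^{(i)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} \left( y^{(i)} - \theta^\top x^{(i)} \right)^2 \right)$
$\arg\max_\theta \prod_{i=1}^{m} p_\theta(y^{(i)} \mid x^{(i)}) = \arg\min_\theta \, -\log \left( \prod_{i=1}^{m} p_\theta(y^{(i)} \mid x^{(i)}) \right) = \arg\min_\theta \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2$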
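The step from the Gaussian likelihood to the squared-error objective can be written out by expanding the negative log-likelihood. This is a sketch under the assumption that the noise variance $\sigma^2$ is fixed and the same for every example:

```latex
\begin{aligned}
-\log \prod_{i=1}^{m} p_\theta\left( y^{(i)} \mid x^{(i)} \right)
  &= -\sum_{i=1}^{m} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left( -\frac{1}{2\sigma^2} \left( y^{(i)} - \theta^\top x^{(i)} \right)^2 \right) \right] \\
  &= \frac{m}{2} \log\left( 2\pi\sigma^2 \right)
     + \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2 .
\end{aligned}
```

The first term does not depend on $\theta$ and the factor $\frac{1}{2\sigma^2}$ is a positive constant, so the maximum-likelihood $\theta$ is the same $\theta$ that minimizes the least squares cost.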
Image credit: CS 446@UIUC
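$X = \begin{bmatrix} 1 & (x^{(1)})^\top \\ 1 & (x^{(2)})^\top \\ \vdots & \vdots \\ 1 & (x^{(m)})^\top \end{bmatrix}$, which can equally be read as $n + 1$ column vectors placed side by side (the all-ones column plus one column per feature)
[Figure labels: $y$; column space of $X$; $X\theta$; $X\theta - y$.]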
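The geometric picture says that at the least squares solution the residual $X\theta - y$ is orthogonal to every column of $X$. The sketch below checks this numerically for a single feature by solving the 2-by-2 system $X^\top X \theta = X^\top y$ directly. The function name and the toy data are assumptions made for the example.

```python
def solve_normal_equations_1d(xs, ys):
    """Solve X^T X theta = X^T y where each row of X is (1, x_i)."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx          # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is singular (all x values are equal)")
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1


# Assumed toy data that does not lie on a line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

theta0, theta1 = solve_normal_equations_1d(xs, ys)
residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]

# Inner products of the residual with the two columns of X.
dot_with_ones = sum(residuals)
dot_with_xs = sum(r * x for r, x in zip(residuals, xs))
print(theta0, theta1, dot_with_ones, dot_with_xs)
```

Both inner products come out at zero up to rounding, which is the orthogonality condition $X^\top (X\theta - y) = 0$ written out column by column.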
Slide credit: Andrew Ng
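$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^\top x$
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
}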
Can combine features; can use different functions to generate features (e.g., polynomial)