Linear Regression
4/14/17
Hypothesis Space

Supervised learning: for every input in the data set, we know the output.
Regression: outputs are continuous (a number, not a category label).
The learned model: a linear function mapping inputs to outputs.
In two dimensions: $f(x) = wx + b$

In $d$ dimensions, stack the features into a vector:

$$\vec{x} \equiv \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_d \end{bmatrix}, \qquad f(\vec{x}) = \begin{bmatrix} b & w_0 & \cdots & w_d \end{bmatrix} \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix}$$

We want to find the linear model that fits our data best. When have we seen a model like this before?
Key idea: model the data as a linear model plus noise, and pick the weights that minimize the noise magnitude:

$$f(\vec{x}) = \begin{bmatrix} b & w_0 & \cdots & w_d \end{bmatrix} \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix} + \epsilon$$
The learned model $\hat{f}$ is the linear part, without the noise term:

$$\hat{f}(\vec{x}) = \begin{bmatrix} b & w_0 & \cdots & w_d \end{bmatrix} \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix}, \qquad f(\vec{x}) = \hat{f}(\vec{x}) + \epsilon$$
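The augmented-vector trick lets the bias ride along in a single dot product. A minimal sketch (the function and variable names are my own, not from the slides):

```python
import numpy as np

def predict(w, x):
    """Evaluate f_hat(x) = [b, w0, ..., wd] . [1, x0, ..., xd].

    w: weights with the bias b in position 0.
    x: raw feature vector (x0, ..., xd).
    """
    x_aug = np.concatenate(([1.0], x))  # prepend the constant-1 feature
    return w @ x_aug
```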
Define the error for a data point to be the squared distance between the correct output and the predicted output. The error for the model is the sum of the point errors:

$$\left( f(\vec{x}) - \hat{f}(\vec{x}) \right)^2 = \epsilon^2, \qquad \sum_{\vec{x} \in \text{data}} \left( y - \hat{f}(\vec{x}) \right)^2 = \sum_{\vec{x} \in \text{data}} \epsilon_{\vec{x}}^2$$
Goal: pick the weights that minimize the squared error.

Approach #1: gradient descent. Your reading showed how to do this for 1D inputs; a sketch follows below.
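A minimal sketch of 1D gradient descent (this is a reconstruction, not the reading's code; the learning rate and epoch count are arbitrary assumptions):

```python
import numpy as np

def gradient_descent_1d(x, y, lr=0.01, epochs=1000):
    """Fit f(x) = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        residual = (w * x + b) - y                   # prediction error per point
        w -= lr * (2.0 / n) * (residual * x).sum()   # partial derivative w.r.t. w
        b -= lr * (2.0 / n) * residual.sum()         # partial derivative w.r.t. b
    return w, b
```

(The mean rather than the sum of squared errors is used so the step size does not scale with the data set; the minimizing weights are the same.)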
Goal: pick the weights that minimize the squared error.

Approach #2 (the right way): an analytical solution:
$$\vec{w} = \left( X^T X \right)^{-1} X^T \vec{y}$$

$$X \equiv \begin{bmatrix} \vec{x}_0 & \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix} \equiv \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{00} & x_{01} & \cdots & x_{0n} \\ x_{10} & x_{11} & \cdots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d0} & x_{d1} & \cdots & x_{dn} \end{bmatrix}$$
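A minimal NumPy sketch of the analytical solution (the names are my own; note that the usual NumPy convention puts one data point per row, the transpose of the column layout above):

```python
import numpy as np

def fit_linear(X_raw, y):
    """Solve the normal equations w = (X^T X)^{-1} X^T y.

    X_raw: (n_points, n_features) array, one data point per row.
    y:     (n_points,) array of outputs.
    Returns the weight vector [b, w0, ..., wd].
    """
    n = X_raw.shape[0]
    X = np.hstack([np.ones((n, 1)), X_raw])  # prepend the constant-1 column
    # solve() is numerically safer than forming the explicit inverse
    return np.linalg.solve(X.T @ X, X.T @ y)
```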
Polynomial regression is just linear regression with a change of basis: map each input into an expanded feature representation, then perform linear regression on the new representation.
Quadratic basis:
$$(x_0, x_1, \ldots, x_d) \longrightarrow (x_0, x_0^2,\; x_1, x_1^2,\; \ldots,\; x_d, x_d^2)$$

Cubic basis:
$$(x_0, x_1, \ldots, x_d) \longrightarrow (x_0, x_0^2, x_0^3,\; x_1, x_1^2, x_1^3,\; \ldots,\; x_d, x_d^2, x_d^3)$$
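A sketch of the basis change (a hypothetical helper, not from the slides), which can feed directly into the fit_linear sketch above:

```python
import numpy as np

def polynomial_basis(X_raw, degree=2):
    """Expand each feature x into (x, x^2, ..., x^degree)."""
    return np.hstack([X_raw ** k for k in range(1, degree + 1)])

# e.g. cubic regression: w = fit_linear(polynomial_basis(X_raw, degree=3), y)
```

(The columns come out grouped by power rather than interleaved per feature as above; linear regression is indifferent to column order.)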
Recall from KNN: locally weighted averaging. We can apply the same idea here: points that are farther away should contribute less to the estimate. To estimate the value for a specific test point $\vec{x}_t$, compute a linear regression with each point's squared error divided by its distance from $\vec{x}_t$:
$$\sum_{\vec{x} \in \text{data}} \frac{\left( y - \hat{f}(\vec{x}) \right)^2}{\text{dist}(\vec{x}_t, \vec{x})} = \sum_{\vec{x} \in \text{data}} \frac{\epsilon_{\vec{x}}^2}{\lVert \vec{x}_t - \vec{x} \rVert^2}$$
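A minimal sketch of this locally weighted fit (the names are my own; the small eps guards against a zero distance when $\vec{x}_t$ coincides with a training point):

```python
import numpy as np

def fit_locally_weighted(X_raw, y, x_t, eps=1e-8):
    """Minimize sum_i (y_i - f_hat(x_i))^2 / ||x_t - x_i||^2.

    Solves the weighted normal equations (X^T W X) w = X^T W y,
    where W is diagonal with entries 1 / ||x_t - x_i||^2.
    """
    n = X_raw.shape[0]
    X = np.hstack([np.ones((n, 1)), X_raw])      # augmented design matrix
    d2 = ((X_raw - x_t) ** 2).sum(axis=1) + eps  # squared distances to x_t
    W = np.diag(1.0 / d2)                        # closer points weigh more
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

The prediction at $\vec{x}_t$ is then the dot product of these locally fit weights with the augmented test point.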
Covers the machine learning portion of the class. Know the differences between the topics covered, and know which algorithms apply to which problems.