Linear regression DS GA 1002 Statistical and Mathematical Models - - PowerPoint PPT Presentation
Linear regression DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15 Carlos Fernandez-Granda
Linear models Least-squares estimation Overfitting Example: Global warming
Regression
The aim is to learn a function $h$ that relates
◮ a response or dependent variable $y$
◮ to several observed variables $x_1, x_2, \ldots, x_p$, known as covariates, features or independent variables
The response is assumed to be of the form
$$y = h(\vec{x}) + z$$
where $\vec{x} \in \mathbb{R}^p$ contains the features and $z$ is noise
Linear regression
The regression function $h$ is assumed to be linear
$$y^{(i)} = \vec{x}^{(i)\,T} \vec{\beta}^* + z^{(i)}, \quad 1 \le i \le n$$
Our aim is to estimate $\vec{\beta}^* \in \mathbb{R}^p$ from the data
Linear regression
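In matrix form
$$\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix} = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_p^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_p^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(n)} & x_2^{(n)} & \cdots & x_p^{(n)} \end{bmatrix} \begin{bmatrix} \beta_1^* \\ \beta_2^* \\ \vdots \\ \beta_p^* \end{bmatrix} + \begin{bmatrix} z^{(1)} \\ z^{(2)} \\ \vdots \\ z^{(n)} \end{bmatrix}$$
Equivalently,
$$\vec{y} = X \vec{\beta}^* + \vec{z}$$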
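As a rough illustration of the matrix form, the sketch below generates synthetic data from a linear model and checks that the matrix expression agrees with the per-example expression. The sizes, the coefficient vector `beta_true`, the noise scale, and the random seed are all hypothetical choices made for this example; they do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n, p = 100, 3                            # hypothetical number of examples and features
X = rng.standard_normal((n, p))          # row i holds the feature vector x^(i)
beta_true = np.array([1.0, -2.0, 0.5])   # hypothetical beta*
z = 0.1 * rng.standard_normal(n)         # noise vector

# Matrix form: all n responses at once, y = X beta* + z
y = X @ beta_true + z

# Per-example form: y^(i) = x^(i)T beta* + z^(i)
y_loop = np.array([X[i] @ beta_true + z[i] for i in range(n)])

print(np.allclose(y, y_loop))
```

Each row of `X` plays the role of one feature vector, so the matrix product computes every inner product in a single step.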
Linear model for GDP
State            Population    Unemployment rate (%)    GDP (USD millions)
California       38 332 521    5.5                      2 448 467
Minnesota         5 420 380    4.0                        334 780
Oregon            3 930 065    5.5                        228 120
Nevada            2 790 136    5.8                        141 204
Idaho             1 612 136    3.8                         65 202
Alaska              735 132    6.9                         54 256
South Carolina    4 774 839    4.9                        ???
Linear model for GDP
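After normalizing the features and the response
$$\vec{y} := \begin{bmatrix} 0.984 \\ 0.135 \\ 0.092 \\ 0.057 \\ 0.026 \\ 0.022 \end{bmatrix}, \quad X := \begin{bmatrix} 0.982 & 0.419 \\ 0.139 & 0.305 \\ 0.101 & 0.419 \\ 0.071 & 0.442 \\ 0.041 & 0.290 \\ 0.019 & 0.526 \end{bmatrix}$$
Aim: find $\vec{\beta} \in \mathbb{R}^2$ such that $\vec{y} \approx X \vec{\beta}$
The estimate for the GDP of South Carolina will be $\vec{x}_{\text{sc}}^T \vec{\beta}$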
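The slides do not say how the data were normalized. One scheme that appears to reproduce the numbers above is to divide each column (and the response) by its l2 norm; the sketch below assumes that scheme and uses the six states with known GDP. The variable names are illustrative.

```python
import numpy as np

# Raw data for the six states with known GDP (California ... Alaska)
population = np.array([38_332_521, 5_420_380, 3_930_065,
                       2_790_136, 1_612_136, 735_132], dtype=float)
unemployment = np.array([5.5, 4.0, 5.5, 5.8, 3.8, 6.9])   # rate in %
gdp = np.array([2_448_467, 334_780, 228_120,
                141_204, 65_202, 54_256], dtype=float)     # USD millions

X_raw = np.column_stack([population, unemployment])

# ASSUMPTION: normalization = division by the l2 norm of each column
feature_norms = np.linalg.norm(X_raw, axis=0)
X = X_raw / feature_norms
y = gdp / np.linalg.norm(gdp)

print(np.round(X, 3))
print(np.round(y, 3))
```

Under this assumption every column has unit l2 norm, which puts population and unemployment on comparable scales before fitting.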
Least squares
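For fixed $\vec{\beta}$ we can evaluate the error using
$$\sum_{i=1}^{n} \left( y^{(i)} - \vec{x}^{(i)\,T} \vec{\beta} \right)^2 = \left\| \vec{y} - X \vec{\beta} \right\|_2^2$$
The least-squares estimate $\vec{\beta}_{LS}$ minimizes this cost function
$$\vec{\beta}_{LS} := \arg\min_{\vec{\beta}} \left\| \vec{y} - X \vec{\beta} \right\|_2$$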
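A minimal numerical sketch of this minimization, on synthetic data, is shown below. It computes the estimate with `np.linalg.lstsq` and, as a cross-check, with the standard closed form from the normal equations, which applies when the columns of X are linearly independent. The normal equations are not stated on the slides; the data, seed, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)           # hypothetical seed
X = rng.standard_normal((50, 3))         # hypothetical features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(50)


def cost(beta):
    """Sum of squared errors ||y - X beta||_2^2."""
    return np.sum((y - X @ beta) ** 2)


# Least-squares estimate from NumPy's solver
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Cross-check: solve the normal equations X^T X beta = X^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Any perturbation of the minimizer should increase the cost
perturbed = beta_ls + 0.01 * rng.standard_normal(3)

print(beta_ls, beta_ne)
print(cost(beta_ls), cost(perturbed))
```

In practice a solver such as `lstsq` is usually preferred over forming X^T X explicitly, since the latter can be less stable when the columns are nearly dependent.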
Least-squares fit
[Figure: data and least-squares fit; axes x and y]
Linear model for GDP
The least-squares estimate is
$$\vec{\beta}_{LS} = \begin{bmatrix} 1.010 \\ -0.019 \end{bmatrix}$$
GDP roughly proportional to the population
Unemployment doesn’t help (linearly)
Linear model for GDP
State            GDP (USD millions)    Estimate
California       2 448 467             2 446 186
Minnesota          334 780               334 584
Oregon             228 120               233 460
Nevada             141 204               159 088
Idaho               65 202                90 345
Alaska              54 256                23 050
South Carolina     199 256               289 903
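The fit and the estimates can be sketched as follows. The sketch assumes that features and response were normalized by dividing by their l2 norms (the slides do not state the scheme) and that estimates are mapped back to USD millions by multiplying by the norm of the GDP column. Under those assumptions the coefficients should land close to the values reported above.

```python
import numpy as np

population = np.array([38_332_521, 5_420_380, 3_930_065,
                       2_790_136, 1_612_136, 735_132], dtype=float)
unemployment = np.array([5.5, 4.0, 5.5, 5.8, 3.8, 6.9])   # rate in %
gdp = np.array([2_448_467, 334_780, 228_120,
                141_204, 65_202, 54_256], dtype=float)     # USD millions

# ASSUMPTION: normalize each column and the response by its l2 norm
X_raw = np.column_stack([population, unemployment])
feature_norms = np.linalg.norm(X_raw, axis=0)
gdp_norm = np.linalg.norm(gdp)
X = X_raw / feature_norms
y = gdp / gdp_norm

# Least-squares fit on the six states with known GDP
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values, mapped back to USD millions
fitted_gdp = (X @ beta_ls) * gdp_norm

# Estimate for South Carolina, using the same normalization constants
x_sc = np.array([4_774_839, 4.9]) / feature_norms
gdp_sc_estimate = (x_sc @ beta_ls) * gdp_norm

print(np.round(beta_ls, 3))
print(np.round(fitted_gdp))
print(round(gdp_sc_estimate))
```

Normalizing the South Carolina features with the constants computed from the other six states keeps the held-out state out of the fitting step.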
Geometric interpretation
◮ Any vector $X \vec{\beta}$ is in the span of the columns of $X$
◮ The least-squares estimate is the closest vector to $\vec{y}$ that can be represented in this way
◮ This is the projection of $\vec{y}$ onto the column space of $X$
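The projection property can be checked numerically: the residual of the least-squares fit should be orthogonal to every column of X, and any other vector in the column space should be farther from the data. The sketch below uses synthetic data; the sizes, seed, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)       # hypothetical seed
X = rng.standard_normal((20, 3))     # hypothetical feature matrix
y = rng.standard_normal(20)          # hypothetical response

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
projection = X @ beta_ls             # closest point to y in the column space of X
residual = y - projection

# Orthogonality: inner product of the residual with each column of X
print(X.T @ residual)

# Another point in the column space, built from different coefficients
other = X @ (beta_ls + rng.standard_normal(3))
print(np.linalg.norm(residual), np.linalg.norm(y - other))
```

The inner products come out at the level of floating-point rounding, which is the numerical counterpart of the residual being orthogonal to the column space.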
Probabilistic interpretation
We model the noise as an iid Gaussian random vector $\vec{Z}$
Entries have zero mean and variance $\sigma^2$
The data are a realization of the random vector
$$\vec{Y} := X \vec{\beta} + \vec{Z}$$
$\vec{Y}$ is Gaussian with mean $X \vec{\beta}$ and covariance matrix $\sigma^2 I$
Likelihood
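The joint pdf of $\vec{Y}$ is
$$f_{\vec{Y}}(\vec{a}) := \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} \left( a_i - (X \vec{\beta})_i \right)^2 \right) = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \left\| \vec{a} - X \vec{\beta} \right\|_2^2 \right)$$
The likelihood is the joint pdf evaluated at the data $\vec{y}$, viewed as a function of $\vec{\beta}$
$$\mathcal{L}_{\vec{y}}(\vec{\beta}) = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \left\| \vec{y} - X \vec{\beta} \right\|_2^2 \right)$$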
Maximum-likelihood estimate
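The maximum-likelihood estimate is
$$\begin{aligned} \vec{\beta}_{ML} &= \arg\max_{\vec{\beta}} \mathcal{L}_{\vec{y}}(\vec{\beta}) \\ &= \arg\max_{\vec{\beta}} \log \mathcal{L}_{\vec{y}}(\vec{\beta}) \\ &= \arg\min_{\vec{\beta}} \left\| \vec{y} - X \vec{\beta} \right\|_2^2 \\ &= \vec{\beta}_{LS} \end{aligned}$$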
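The step from the log-likelihood to the least-squares cost can be spelled out. Assuming iid Gaussian noise with a general variance $\sigma^2 > 0$, taking the logarithm of the Gaussian likelihood gives

```latex
\log \mathcal{L}_{\vec{y}}(\vec{\beta})
  = -\frac{n}{2} \log(2\pi) - n \log \sigma
    - \frac{1}{2\sigma^2} \left\| \vec{y} - X \vec{\beta} \right\|_2^2
```

The first two terms do not depend on $\vec{\beta}$, and the third is the squared error multiplied by the negative constant $-1/(2\sigma^2)$. Maximizing the log-likelihood over $\vec{\beta}$ is therefore the same as minimizing $\| \vec{y} - X \vec{\beta} \|_2^2$, whatever the value of $\sigma$.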
Temperature predictor
A friend tells you: “I found a cool way to predict the temperature in New York: it’s just a linear combination of the temperature in every other state. I fit the model on data from the last month and a half and it’s perfect!”
Overfitting
If a model is very complex, it may overfit the data
To evaluate a model we separate the data into a training and a test set
1. We fit the model using the training set
2. We evaluate the error on the test set
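The friend's claim can be mimicked with pure noise. The sketch below assumes 49 features (one per other state) and 45 training days (about a month and a half), with responses that are unrelated to the features by construction. All data are synthetic; the test-set size and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)       # hypothetical seed
n_train, n_test, p = 45, 1000, 49    # 45 days, 49 "other states"; n_test is arbitrary

# Pure noise: the response has no relation to the features
X_train = rng.standard_normal((n_train, p))
y_train = rng.standard_normal(n_train)
X_test = rng.standard_normal((n_test, p))
y_test = rng.standard_normal(n_test)

# 1. Fit on the training set (with p > n, lstsq returns the minimum-norm exact fit)
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# 2. Evaluate the relative error on both sets
train_err = np.linalg.norm(X_train @ beta - y_train) / np.linalg.norm(y_train)
test_err = np.linalg.norm(X_test @ beta - y_test) / np.linalg.norm(y_test)

print(train_err, test_err)
```

With more coefficients than training examples the model can fit the training data essentially perfectly even though there is nothing to learn, which only the test error reveals.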
Experiment
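$X_{\text{train}}$, $X_{\text{test}}$, $\vec{z}_{\text{train}}$ and $\vec{\beta}^*$ are iid Gaussian with mean 0 and variance 1
$$\vec{y}_{\text{train}} = X_{\text{train}} \vec{\beta}^* + \vec{z}_{\text{train}}$$
$$\vec{y}_{\text{test}} = X_{\text{test}} \vec{\beta}^*$$
We use $\vec{y}_{\text{train}}$ and $X_{\text{train}}$ to compute $\vec{\beta}_{LS}$
$$\text{error}_{\text{train}} = \frac{\left\| X_{\text{train}} \vec{\beta}_{LS} - \vec{y}_{\text{train}} \right\|_2}{\left\| \vec{y}_{\text{train}} \right\|_2}$$
$$\text{error}_{\text{test}} = \frac{\left\| X_{\text{test}} \vec{\beta}_{LS} - \vec{y}_{\text{test}} \right\|_2}{\left\| \vec{y}_{\text{test}} \right\|_2}$$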
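A sketch of this experiment is given below. The slides do not state the number of features or the test-set size, so `p = 50` and `n_test = 1000` are assumptions, as are the seed and the grid of training sizes. The noise level is computed as the norm of the training noise relative to the norm of the training response.

```python
import numpy as np

rng = np.random.default_rng(4)       # hypothetical seed
p = 50                               # ASSUMED number of features
n_test = 1000                        # ASSUMED test-set size
beta_true = rng.standard_normal(p)   # plays the role of beta*


def relative_errors(n):
    """Return (training error, test error, noise level) for n training examples."""
    X_train = rng.standard_normal((n, p))
    z_train = rng.standard_normal(n)
    y_train = X_train @ beta_true + z_train
    X_test = rng.standard_normal((n_test, p))
    y_test = X_test @ beta_true      # test responses are noiseless, as in the text

    beta_ls, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    error_train = np.linalg.norm(X_train @ beta_ls - y_train) / np.linalg.norm(y_train)
    error_test = np.linalg.norm(X_test @ beta_ls - y_test) / np.linalg.norm(y_test)
    noise_level = np.linalg.norm(z_train) / np.linalg.norm(y_train)
    return error_train, error_test, noise_level


results = {n: relative_errors(n) for n in (50, 100, 200, 300, 400, 500)}
for n, (tr, te, nl) in results.items():
    print(f"n={n:4d}  train={tr:.3f}  test={te:.3f}  noise={nl:.3f}")
```

When n equals p the training error is essentially zero while the test error is large; as n grows, the training error rises toward the noise level and the test error shrinks.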
Experiment
[Figure: relative error (l2 norm) as a function of n; curves: Error (training), Error (test), Noise level (training)]
Maximum temperatures in Oxford, UK
[Figure: maximum temperature (Celsius) in Oxford, 1860 to 2000]
Maximum temperatures in Oxford, UK
[Figure: maximum temperature (Celsius) in Oxford, 1900 to 1905]
Linear model
$$y_t \approx \beta_0 + \beta_1 \cos\left( \frac{2\pi t}{12} \right) + \beta_2 \sin\left( \frac{2\pi t}{12} \right) + \beta_3 t$$
where $1 \le t \le n$ is the time in months ($n = 12 \cdot 150$)
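This model is linear in the coefficients, so it can be fitted by least squares once the design matrix is built. The sketch below uses a synthetic series as a stand-in, since the Oxford data are not included here; the coefficients in `beta_synth`, the noise scale, and the seed are hypothetical, with the trend value chosen only to produce a slope of a plausible size.

```python
import numpy as np

rng = np.random.default_rng(5)       # hypothetical seed
n = 12 * 150                         # 150 years of monthly data
t = np.arange(1, n + 1)              # time in months, 1 <= t <= n

# Design matrix: constant, yearly cosine, yearly sine, linear trend
X = np.column_stack([
    np.ones(n),
    np.cos(2 * np.pi * t / 12),
    np.sin(2 * np.pi * t / 12),
    t,
])

# SYNTHETIC stand-in for the temperature series (NOT the Oxford measurements)
beta_synth = np.array([14.0, -6.0, -3.0, 0.75 / 1200])  # hypothetical coefficients
y = X @ beta_synth + rng.standard_normal(n)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta_3 is in degrees per month; convert to degrees per 100 years
trend_per_century = beta_ls[3] * 12 * 100
print(np.round(beta_ls, 4), round(trend_per_century, 2))
```

The cosine and sine columns together capture a yearly cycle of arbitrary phase, while the last column isolates the long-term trend.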
Model fitted by least squares
[Figure: data and fitted model, maximum temperature (Celsius), 1860 to 2000]
Model fitted by least squares
[Figure: data and fitted model, maximum temperature (Celsius), 1900 to 1905]
Model fitted by least squares
[Figure: data and fitted model, maximum temperature (Celsius), 1960 to 1965]
Trend: Increase of 0.75 °C / 100 years (1.35 °F)
[Figure: data and trend, maximum temperature (Celsius), 1860 to 2000]
Model for minimum temperatures
[Figure: data and fitted model, minimum temperature (Celsius), 1860 to 2000]
Model for minimum temperatures
[Figure: data and fitted model, minimum temperature (Celsius), 1900 to 1905]
Model for minimum temperatures
[Figure: data and fitted model, minimum temperature (Celsius), 1960 to 1965]
Trend: Increase of 0.88 °C / 100 years (1.58 °F)
[Figure: data and trend, minimum temperature (Celsius), 1860 to 2000]