
Linear regression DS GA 1002 Statistical and Mathematical Models - PowerPoint PPT Presentation



  1. Linear regression
     DS GA 1002 Statistical and Mathematical Models
     http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15
     Carlos Fernandez-Granda

  2. Outline: Linear models · Least-squares estimation · Overfitting · Example: Global warming

  3. Regression
     The aim is to learn a function $h$ that relates
     ◮ a response or dependent variable $y$
     ◮ to several observed variables $x_1, x_2, \ldots, x_p$, known as covariates, features or independent variables
     The response is assumed to be of the form
     $$ y = h(\vec{x}) + z $$
     where $\vec{x} \in \mathbb{R}^p$ contains the features and $z$ is noise

  4. Linear regression
     The regression function $h$ is assumed to be linear
     $$ y^{(i)} = \vec{x}^{(i)\,T} \vec{\beta}^{*} + z^{(i)}, \qquad 1 \leq i \leq n $$
     Our aim is to estimate $\vec{\beta}^{*} \in \mathbb{R}^p$ from the data

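  5. Linear regression
     In matrix form
     $$
     \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}
     =
     \begin{bmatrix}
     \vec{x}^{(1)}_1 & \vec{x}^{(1)}_2 & \cdots & \vec{x}^{(1)}_p \\
     \vec{x}^{(2)}_1 & \vec{x}^{(2)}_2 & \cdots & \vec{x}^{(2)}_p \\
     \vdots & \vdots & & \vdots \\
     \vec{x}^{(n)}_1 & \vec{x}^{(n)}_2 & \cdots & \vec{x}^{(n)}_p
     \end{bmatrix}
     \begin{bmatrix} \vec{\beta}^{*}_1 \\ \vec{\beta}^{*}_2 \\ \vdots \\ \vec{\beta}^{*}_p \end{bmatrix}
     +
     \begin{bmatrix} z^{(1)} \\ z^{(2)} \\ \vdots \\ z^{(n)} \end{bmatrix}
     $$
     Equivalently,
     $$ \vec{y} = X \vec{\beta}^{*} + \vec{z} $$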

  6. Linear model for GDP

     State          | Population | Unemployment rate (%) | GDP (USD millions)
     California     | 38 332 521 | 5.5                   | 2 448 467
     Minnesota      | 5 420 380  | 4.0                   | 334 780
     Oregon         | 3 930 065  | 5.5                   | 228 120
     Nevada         | 2 790 136  | 5.8                   | 141 204
     Idaho          | 1 612 136  | 3.8                   | 65 202
     Alaska         | 735 132    | 6.9                   | 54 256
     South Carolina | 4 774 839  | 4.9                   | ???

  7. Linear model for GDP
     After normalizing the features and the response
     $$
     \vec{y} := \begin{bmatrix} 0.984 \\ 0.135 \\ 0.092 \\ 0.057 \\ 0.026 \\ 0.022 \end{bmatrix},
     \qquad
     X := \begin{bmatrix}
     0.982 & 0.419 \\
     0.139 & 0.305 \\
     0.101 & 0.419 \\
     0.071 & 0.442 \\
     0.041 & 0.290 \\
     0.019 & 0.526
     \end{bmatrix}
     $$
     Aim: find $\vec{\beta} \in \mathbb{R}^2$ such that $\vec{y} \approx X \vec{\beta}$
     The estimate for the GDP of South Carolina will be $\vec{x}_{sc}^{T} \vec{\beta}$
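
The slides do not say how the features and the response were normalized. The printed values are consistent with dividing each column by its Euclidean (ℓ2) norm, so the sketch below assumes that; the function and variable names are illustrative.

```python
import math

# Raw columns from the GDP table (six training states, in table order).
population = [38332521, 5420380, 3930065, 2790136, 1612136, 735132]
unemployment = [5.5, 4.0, 5.5, 5.8, 3.8, 6.9]          # rate in %
gdp = [2448467, 334780, 228120, 141204, 65202, 54256]  # USD millions

def normalize(column):
    """Divide a column by its l2 norm (assumed normalization, see lead-in)."""
    norm = math.sqrt(sum(v * v for v in column))
    return [v / norm for v in column]

y = normalize(gdp)
X = list(zip(normalize(population), normalize(unemployment)))

# Print the rows of X next to the matching entry of y, rounded to 3 decimals.
for (x_pop, x_unemp), target in zip(X, y):
    print(f"{x_pop:.3f} {x_unemp:.3f} | {target:.3f}")
```

Under this assumption each column of X and the vector y have unit norm, which puts population counts and percentages on a comparable scale before fitting.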

  8. Outline: Linear models · Least-squares estimation · Overfitting · Example: Global warming

  9. Least squares
     For fixed $\vec{\beta}$ we can evaluate the error using
     $$ \left\| \vec{y} - X \vec{\beta} \right\|_2^2 = \sum_{i=1}^{n} \left( y^{(i)} - \vec{x}^{(i)\,T} \vec{\beta} \right)^2 $$
     The least-squares estimate $\vec{\beta}_{\mathrm{LS}}$ minimizes this cost function
     $$ \vec{\beta}_{\mathrm{LS}} := \arg\min_{\vec{\beta}} \left\| \vec{y} - X \vec{\beta} \right\|_2 $$
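
The slide defines the estimate as a minimizer without giving a formula for it. For reference, a standard derivation sets the gradient of the squared cost to zero; the closed form on the last line assumes that $X$ has full column rank, so that $X^T X$ is invertible.

```latex
\begin{align*}
\nabla_{\vec{\beta}} \left\| \vec{y} - X\vec{\beta} \right\|_2^2
  &= -2\, X^T \left( \vec{y} - X\vec{\beta} \right) \\
X^T X \, \vec{\beta}_{\mathrm{LS}} &= X^T \vec{y}
  && \text{(gradient set to zero: the normal equations)} \\
\vec{\beta}_{\mathrm{LS}} &= \left( X^T X \right)^{-1} X^T \vec{y}
  && \text{(if } X \text{ has full column rank)}
\end{align*}
```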

  10. Least-squares fit [Figure: scatter plot of the data with the least-squares fit; axes x and y]

  11. Linear model for GDP
      The least-squares estimate is
      $$ \vec{\beta}_{\mathrm{LS}} = \begin{bmatrix} 1.010 \\ -0.019 \end{bmatrix} $$
      GDP is roughly proportional to the population
      Unemployment doesn’t help (linearly)

  12. Linear model for GDP

      State          | GDP (USD millions) | Estimate (USD millions)
      California     | 2 448 467          | 2 446 186
      Minnesota      | 334 780            | 334 584
      Oregon         | 228 120            | 233 460
      Nevada         | 141 204            | 159 088
      Idaho          | 65 202             | 90 345
      Alaska         | 54 256             | 23 050
      South Carolina | 199 256            | 289 903
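
The fit can be reproduced approximately from the raw table. This is a minimal sketch, assuming each column was normalized by its ℓ2 norm over the six training states (the slides do not state the normalization) and solving the 2x2 normal equations directly; the helper names are hypothetical.

```python
import math

# (state, population, unemployment rate in %, GDP in USD millions), from the slides.
train = [
    ("California", 38332521, 5.5, 2448467),
    ("Minnesota", 5420380, 4.0, 334780),
    ("Oregon", 3930065, 5.5, 228120),
    ("Nevada", 2790136, 5.8, 141204),
    ("Idaho", 1612136, 3.8, 65202),
    ("Alaska", 735132, 6.9, 54256),
]
south_carolina = (4774839, 4.9)

def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))

# Assumed normalization: divide each column by its l2 norm over the training rows.
pop_norm = l2_norm([r[1] for r in train])
unemp_norm = l2_norm([r[2] for r in train])
gdp_norm = l2_norm([r[3] for r in train])

X = [(r[1] / pop_norm, r[2] / unemp_norm) for r in train]
y = [r[3] / gdp_norm for r in train]

# Normal equations (X^T X) beta = X^T y, solved with Cramer's rule (2x2 case).
a11 = sum(u * u for u, _ in X)
a12 = sum(u * v for u, v in X)
a22 = sum(v * v for _, v in X)
b1 = sum(u * t for (u, _), t in zip(X, y))
b2 = sum(v * t for (_, v), t in zip(X, y))
det = a11 * a22 - a12 * a12
beta = ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def predict_gdp(population, unemployment):
    """Map raw features to a GDP estimate in USD millions."""
    x = (population / pop_norm, unemployment / unemp_norm)
    return gdp_norm * (beta[0] * x[0] + beta[1] * x[1])

print("beta_LS:", [round(b, 3) for b in beta])
for name, pop, unemp, gdp in train:
    print(f"{name}: actual {gdp}, estimate {predict_gdp(pop, unemp):.0f}")
print(f"South Carolina: estimate {predict_gdp(*south_carolina):.0f}")
```

South Carolina is held out of the fit, which is why its estimate is off by much more than the estimates for the large training states.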

  13. Geometric interpretation
      ◮ Any vector $X \vec{\beta}$ is in the span of the columns of $X$
      ◮ The least-squares estimate is the closest vector to $\vec{y}$ that can be represented in this way
      ◮ This is the projection of $\vec{y}$ onto the column space of $X$
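
One consequence of the projection view is that the residual $\vec{y} - X \vec{\beta}_{\mathrm{LS}}$ is orthogonal to every column of $X$. The sketch below checks this numerically on toy data that is made up for illustration and does not come from the slides.

```python
# Toy data (illustrative only): 4 observations, 2 features.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0.1, 0.9, 2.2, 2.8]

def fit_two_features(X, y):
    """Least squares for two features via the 2x2 normal equations."""
    a11 = sum(u * u for u, _ in X)
    a12 = sum(u * v for u, v in X)
    a22 = sum(v * v for _, v in X)
    b1 = sum(u * t for (u, _), t in zip(X, y))
    b2 = sum(v * t for (_, v), t in zip(X, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def residual(X, y, beta):
    """Entries of y - X beta."""
    return [t - (beta[0] * u + beta[1] * v) for (u, v), t in zip(X, y)]

def squared_norm(v):
    return sum(t * t for t in v)

beta_ls = fit_two_features(X, y)
r = residual(X, y, beta_ls)

# Inner product of the residual with each column of X (zero up to rounding).
column_dots = [sum(r_i * row[j] for r_i, row in zip(r, X)) for j in range(2)]

# Any other coefficient vector gives a point of the column space farther from y.
other = (beta_ls[0] + 0.1, beta_ls[1] - 0.05)
print(column_dots)
print(squared_norm(r), squared_norm(residual(X, y, other)))
```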

  14. Geometric interpretation

  15. Probabilistic interpretation
      We model the noise as an iid Gaussian random vector $\vec{Z}$
      Entries have zero mean and variance $\sigma^2$
      The data are a realization of the random vector
      $$ \vec{Y} := X \vec{\beta} + \vec{Z} $$
      $\vec{Y}$ is Gaussian with mean $X \vec{\beta}$ and covariance matrix $\sigma^2 I$

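  16. Likelihood
      The joint pdf of $\vec{Y}$ is
      $$
      f_{\vec{Y}}(\vec{a}) := \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} \left( \vec{a}_i - \left( X \vec{\beta} \right)_i \right)^2 \right)
      = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \left\| \vec{a} - X \vec{\beta} \right\|_2^2 \right)
      $$
      The likelihood is
      $$
      \mathcal{L}_{\vec{y}}\left( \vec{\beta} \right) = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \left\| \vec{y} - X \vec{\beta} \right\|_2^2 \right)
      $$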

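  17. Maximum-likelihood estimate
      The maximum-likelihood estimate is
      $$
      \vec{\beta}_{\mathrm{ML}} = \arg\max_{\vec{\beta}} \mathcal{L}_{\vec{y}}\left( \vec{\beta} \right)
      = \arg\max_{\vec{\beta}} \log \mathcal{L}_{\vec{y}}\left( \vec{\beta} \right)
      = \arg\min_{\vec{\beta}} \left\| \vec{y} - X \vec{\beta} \right\|_2^2
      = \vec{\beta}_{\mathrm{LS}}
      $$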
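
The step from the log-likelihood to the least-squares cost can be written out explicitly. A sketch, taking the Gaussian model above with noise standard deviation $\sigma$:

```latex
\begin{align*}
\log \mathcal{L}_{\vec{y}}\left( \vec{\beta} \right)
  &= -\frac{n}{2} \log (2\pi) - n \log \sigma
     - \frac{1}{2\sigma^2} \left\| \vec{y} - X\vec{\beta} \right\|_2^2
\end{align*}
```

Only the last term depends on $\vec{\beta}$, and it enters with a negative sign, so maximizing the log-likelihood is the same as minimizing $\| \vec{y} - X\vec{\beta} \|_2^2$. The value of $\sigma$ does not change the maximizer.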

  18. Outline: Linear models · Least-squares estimation · Overfitting · Example: Global warming

  19. Temperature predictor A friend tells you: I found a cool way to predict the temperature in New York: It’s just a linear combination of the temperature in every other state. I fit the model on data from the last month and a half and it’s perfect!

  20. Overfitting
      If a model is very complex, it may overfit the data
      To evaluate a model we separate the data into a training set and a test set
      1. We fit the model using the training set
      2. We evaluate the error on the test set
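
The split itself can be sketched in a few lines. The function name `train_test_split`, the 25% test fraction, and the placeholder rows are assumptions for illustration; the slides do not prescribe how the data are divided.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out a fraction of the rows as a test set."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(test_fraction * len(rows)))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

# Placeholder (feature, response) pairs, only to exercise the function.
rows = [(x, 2.0 * x) for x in range(20)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is then fitted on `train` only, and the error reported on `test` measures how well it generalizes to data it has not seen.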

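  21. Experiment
      $X_{\mathrm{train}}$, $X_{\mathrm{test}}$, $\vec{z}_{\mathrm{train}}$ and $\vec{\beta}^{*}$ are iid Gaussian with mean 0 and variance 1
      $$ \vec{y}_{\mathrm{train}} = X_{\mathrm{train}} \vec{\beta}^{*} + \vec{z}_{\mathrm{train}} $$
      $$ \vec{y}_{\mathrm{test}} = X_{\mathrm{test}} \vec{\beta}^{*} $$
      We use $\vec{y}_{\mathrm{train}}$ and $X_{\mathrm{train}}$ to compute $\vec{\beta}_{\mathrm{LS}}$
      $$ \mathrm{error}_{\mathrm{train}} = \frac{ \left\| X_{\mathrm{train}} \vec{\beta}_{\mathrm{LS}} - \vec{y}_{\mathrm{train}} \right\|_2 }{ \left\| \vec{y}_{\mathrm{train}} \right\|_2 } $$
      $$ \mathrm{error}_{\mathrm{test}} = \frac{ \left\| X_{\mathrm{test}} \vec{\beta}_{\mathrm{LS}} - \vec{y}_{\mathrm{test}} \right\|_2 }{ \left\| \vec{y}_{\mathrm{test}} \right\|_2 } $$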
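
A small version of this experiment can be run with the Python standard library alone. The number of features `p`, the test-set size, the list of training sizes, and the seed are assumptions (the slides do not state them), and the normal equations are solved with a hand-written Gaussian elimination rather than whatever solver the author used.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def least_squares(X, y):
    """Least-squares estimate via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(gram, rhs)

def matvec(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def relative_error(X, beta, y):
    """||X beta - y||_2 / ||y||_2, as defined on the slide."""
    pred = matvec(X, beta)
    return norm([a - b for a, b in zip(pred, y)]) / norm(y)

def gaussian_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
p = 10        # number of features (assumed)
n_test = 200  # test-set size (assumed)
beta_true = [rng.gauss(0.0, 1.0) for _ in range(p)]
X_test = gaussian_matrix(rng, n_test, p)
y_test = matvec(X_test, beta_true)  # test responses are noiseless, as on the slide

errors = {}  # n -> (training error, test error)
for n in (10, 20, 50, 200):
    X_train = gaussian_matrix(rng, n, p)
    z_train = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y_train = [a + z for a, z in zip(matvec(X_train, beta_true), z_train)]
    beta_ls = least_squares(X_train, y_train)
    errors[n] = (relative_error(X_train, beta_ls, y_train),
                 relative_error(X_test, beta_ls, y_test))
    print(n, errors[n])
```

When `n` equals `p` the model has enough freedom to fit the training noise exactly, so the training error is essentially zero while the test error is large. As `n` grows, the training error rises toward the noise level and the test error falls.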

  22. Experiment [Figure: relative error (l2 norm) versus n, for n from 50 to 500; curves: Error (training), Error (test), Noise level (training)]

  23. Outline: Linear models · Least-squares estimation · Overfitting · Example: Global warming

  24. Maximum temperatures in Oxford, UK [Figure: temperature (Celsius), 1860 to 2000]

  25. Maximum temperatures in Oxford, UK [Figure: temperature (Celsius), 1900 to 1905]

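  26. Linear model
      $$ y_t \approx \vec{\beta}_0 + \vec{\beta}_1 \cos\left( \frac{2 \pi t}{12} \right) + \vec{\beta}_2 \sin\left( \frac{2 \pi t}{12} \right) + \vec{\beta}_3 \, t $$
      $1 \leq t \leq n$ is the time in months ($n = 12 \cdot 150$)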
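
The model is nonlinear in $t$ but linear in the coefficients, so it fits the least-squares framework once each month is mapped to four features. The sketch below builds that design matrix. The coefficient values are hypothetical, chosen only to exercise the functions, and the per-century conversion assumes the reported trend corresponds to the linear term $\vec{\beta}_3 t$.

```python
import math

def features(t):
    """Row of the design matrix for month index t (1 <= t <= n)."""
    angle = 2.0 * math.pi * t / 12.0
    return [1.0, math.cos(angle), math.sin(angle), float(t)]

def predict(beta, t):
    """beta_0 + beta_1 cos(2 pi t / 12) + beta_2 sin(2 pi t / 12) + beta_3 t."""
    return sum(b * f for b, f in zip(beta, features(t)))

def trend_per_century(beta):
    """Convert the slope beta_3 (degrees per month) to degrees per 100 years."""
    return beta[3] * 12 * 100

n = 12 * 150
X = [features(t) for t in range(1, n + 1)]  # n rows, 4 columns

# Hypothetical coefficients (not fitted values from the slides).
beta_example = [10.0, -5.0, -3.0, 0.000625]
print(len(X), len(X[0]))
print(trend_per_century(beta_example))
```

The cosine and sine columns repeat every 12 months, so together they capture a yearly cycle of arbitrary phase, while the last column captures a slow linear drift.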

  27. Model fitted by least squares [Figure: data and model, temperature (Celsius), 1860 to 2000]

  28. Model fitted by least squares [Figure: data and model, temperature (Celsius), 1900 to 1905]

  29. Model fitted by least squares [Figure: data and model, temperature (Celsius), 1960 to 1965]

  30. Trend: Increase of 0.75 °C / 100 years (1.35 °F) [Figure: data and trend, temperature (Celsius), 1860 to 2000]

  31. Model for minimum temperatures [Figure: data and model, temperature (Celsius), 1860 to 2000]

  32. Model for minimum temperatures [Figure: data and model, temperature (Celsius), 1900 to 1905]

  33. Model for minimum temperatures [Figure: data and model, temperature (Celsius), 1960 to 1965]

  34. Trend: Increase of 0.88 °C / 100 years (1.58 °F) [Figure: data and trend, temperature (Celsius), 1860 to 2000]
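
The Fahrenheit figures quoted with the two trends follow from scaling a temperature difference by 9/5. The 32-degree offset applies to absolute temperatures, not to differences. A quick check (the function name is illustrative):

```python
def celsius_diff_to_fahrenheit(delta_c):
    """Convert a temperature difference from Celsius to Fahrenheit degrees."""
    return delta_c * 9 / 5

max_trend_f = celsius_diff_to_fahrenheit(0.75)  # maximum temperatures
min_trend_f = celsius_diff_to_fahrenheit(0.88)  # minimum temperatures
print(round(max_trend_f, 2), round(min_trend_f, 2))
```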
