Linear Models for Statistical Learning, Regression
David Dalpiaz, STAT 430, Fall 2017


  1. Linear Models for Statistical Learning, Regression
     David Dalpiaz, STAT 430, Fall 2017

  2. Announcements
     • Homework 01 due today.
     • Homework 02 released later today. (Hopefully.)

  3. Statistical Learning
     • Supervised Learning
       • Regression
       • Classification
     • Unsupervised Learning

  4. Regression Setup
     Y = f(x_1, x_2, ..., x_p) + ε
     numeric response = signal + noise
     • Want to learn the signal
     • Want to be very careful not to “learn noise”

  5. Using a Linear Model
     Setup: Y = f(x_1, x_2, ..., x_p) + ε
     Assume: f(x_1, x_2, ..., x_p) = β_0 + β_1 x_1 + β_2 x_2 + ... + β_p x_p

  6. The Linear Model
     Y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_p x_p + ε,   ε ~ N(0, σ²)
     Y | X ~ N(β_0 + β_1 x_1 + β_2 x_2 + ... + β_p x_p, σ²)
     There are a total of p + 2 parameters in this model:
     • The p + 1 β parameters, or coefficients, control the signal
     • The σ² controls the noise
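     A minimal R sketch (not from the slides; the values and names are my own choices) of simulating data from this model with p = 2, so p + 2 = 4 parameters:

         # Simulate from Y = beta_0 + beta_1 x1 + beta_2 x2 + epsilon,
         # with epsilon ~ N(0, sigma^2). The betas are the signal, sigma the noise.
         set.seed(42)
         n     <- 100
         beta  <- c(1, 2, -3)   # beta_0, beta_1, beta_2 (the p + 1 = 3 coefficients)
         sigma <- 0.5           # so sigma^2 is the noise variance

         x1 <- runif(n)
         x2 <- runif(n)
         y  <- beta[1] + beta[2] * x1 + beta[3] * x2 + rnorm(n, sd = sigma)

         sim_data <- data.frame(y, x1, x2)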

  7. Fitting a Linear Model
     This is a parametric model, meaning that to fit the model, we need to estimate the parameters. For the sake of making predictions, we only need to estimate the β parameters, since
     ŷ(x_1, x_2, ..., x_p) = f̂(x_1, x_2, ..., x_p) = β̂_0 + β̂_1 x_1 + β̂_2 x_2 + ... + β̂_p x_p
     Using either least squares or maximum likelihood, this becomes the same optimization problem:
     argmin_{β_0, β_1, ..., β_p} Σ_{i=1}^{n} (y_i − (β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip))²
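     As a sketch, lm() solves exactly this least squares problem (which, for the β estimates, coincides with maximum likelihood under the normal error model), reusing sim_data from the simulation above:

         # Fit the linear model and extract the beta-hat estimates.
         fit <- lm(y ~ x1 + x2, data = sim_data)
         coef(fit)    # beta-hat_0, beta-hat_1, beta-hat_2

         # y-hat at a new x, computed from the estimated betas alone.
         predict(fit, newdata = data.frame(x1 = 0.5, x2 = 0.5))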

  8. Estimating σ²
     While it is not needed to make predictions, to fully estimate the model we would also need to estimate σ².
     Least Squares: s_e² = (1 / (n − (p + 1))) Σ_{i=1}^{n} (y_i − ŷ_i)²
     MLE: σ̂² = (1 / n) Σ_{i=1}^{n} (y_i − ŷ_i)²
     Both are estimates of σ². What is the difference?
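     A small sketch computing both estimates from the fit above; note the only difference is the denominator, n − (p + 1) versus n:

         n_obs <- nrow(sim_data)
         p     <- 2                            # number of predictors
         rss   <- sum(residuals(fit) ^ 2)

         s2_e      <- rss / (n_obs - (p + 1))  # least squares estimate
         sigma2_ml <- rss / n_obs              # MLE, always a bit smaller

         c(s2_e, sigma2_ml, summary(fit)$sigma ^ 2)  # the last equals s2_e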

  9. Model “Size”
     Consider two models:
     Y = β_0 + β_1 x_1 + β_2 x_2 + ε
     Y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + ε
     Which is bigger?

  10. Model Complexity
      In general, we are interested in the complexity, or flexibility, of a model. For nested linear models, the model with more parameters is the bigger, and thus more complex, model. Models that are more complex will be more “wiggly.”
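      One way to see nested complexity in R (a sketch using the simulated data from above; the degrees are arbitrary): polynomial models in x1, where each added degree adds one parameter and makes the fit more flexible:

          # Nested fits of increasing complexity: degree d uses d + 1 parameters.
          fits <- lapply(1:8, function(d) lm(y ~ poly(x1, d), data = sim_data))
          sapply(fits, function(m) length(coef(m)))  # 2, 3, ..., 9 parameters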

  11. Pictures of Complexity
      Go to ISL Slides

  12. Test-Train Split
      We’ve already discussed the test-train split and RMSE:
      RMSE_Train = RMSE(f̂, Train Data) = √( (1 / n_Tr) Σ_{i ∈ Train} (y_i − f̂(x_i))² )
      RMSE_Test  = RMSE(f̂, Test Data)  = √( (1 / n_Te) Σ_{i ∈ Test} (y_i − f̂(x_i))² )
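      A sketch of the split and both RMSEs in R; the 70/30 proportion and all names here are my own arbitrary choices:

          rmse <- function(actual, predicted) sqrt(mean((actual - predicted) ^ 2))

          set.seed(430)
          trn_idx  <- sample(nrow(sim_data), size = floor(0.7 * nrow(sim_data)))
          trn_data <- sim_data[trn_idx, ]
          tst_data <- sim_data[-trn_idx, ]

          fit <- lm(y ~ x1 + x2, data = trn_data)

          rmse(trn_data$y, predict(fit, trn_data))  # Train RMSE
          rmse(tst_data$y, predict(fit, tst_data))  # Test RMSE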

  13. Overfitting
      • Overfitting occurs when a model is too complex (too flexible) for the data
      • Underfitting occurs when a model is not complex enough (too inflexible) for the data

  14. Train RMSE
      [Figure: Prediction Error vs Model Complexity. x-axis: Complexity (Parameters), 0–100; y-axis: Error (RMSE), 0.0–3.0.]

  15. (Expected) Test RMSE
      [Figure: Prediction Error vs Model Complexity. x-axis: Complexity (Parameters), 0–100; y-axis: Error (RMSE), 0.0–3.0. Two curves: (Expected) Test and Train.]

  16. The “Best” Model
      • Pick the model with the lowest Test RMSE
      • Compared to this model:
        • More complex models with higher Test RMSE are overfitting
        • Less complex models with higher Test RMSE are underfitting
      • This is only a “guess” at the “best” model, based on the available information
      • In practice, Test RMSE might not be such a nice curve
        • This is due to the randomness of the split
        • You could get lucky, or unlucky
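      A sketch of this selection rule, reusing the split and rmse() from slide 12: fit candidates of increasing complexity on the training data, then keep the lowest Test RMSE (the candidate degrees are arbitrary):

          degrees   <- 1:8
          test_rmse <- sapply(degrees, function(d) {
            m <- lm(y ~ poly(x1, d) + x2, data = trn_data)
            rmse(tst_data$y, predict(m, tst_data))
          })
          degrees[which.min(test_rmse)]  # complexity picked as "best"
          # Degrees above this with higher Test RMSE: overfitting.
          # Degrees below it with higher Test RMSE: underfitting.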

  17. Explanation vs Prediction
      • Sometimes we check model assumptions directly
      • When predicting, we make assumptions and check them indirectly
      • If we assume a correct (or close to correct) form of the model, the Test RMSE will be low

  18. If Time...
      • rmarkdown Tables
      • Using code from the Internet
      • Back to Test-Train Split Lab
        • What would be a good Test RMSE?
      • Overfitting: n vs p
      • Randomness of Split
      • Pseudo RNG
