Evaluating a Model Graphically

Evaluating a Model Graphically: SUPERVISED LEARNING - PowerPoint PPT Presentation



  1. Evaluating a Model Graphically
     SUPERVISED LEARNING IN R: REGRESSION
     Nina Zumel and John Mount, Win-Vector LLC

  2. Plotting Ground Truth vs. Predictions
     A well-fitting model: the x = y line (the "line of perfect prediction") runs through the center of the points.
     A poorly fitting model: points fall mostly on one side of the x = y line, showing systematic errors.

  3. The Residual Plot
     Residual: actual outcome - prediction.
     A well-fitting model: a good fit, with no systematic errors.
     A poorly fitting model: systematic errors.
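A residual plot like the one described above can be sketched in a few lines. This is a minimal illustration with synthetic data standing in for the course's houseprices data (ggplot2 assumed; the column names here follow the course, but the numbers are made up):

```r
library(ggplot2)

# Synthetic stand-in for the course's houseprices data
set.seed(1)
d <- data.frame(prediction = seq(100, 500, length.out = 40))
d$price <- d$prediction + rnorm(40, sd = 50)

# Residual: actual outcome minus prediction, as defined on the slide
d$residual <- d$price - d$prediction

# A well-fitting model scatters residuals evenly around the zero line
ggplot(d, aes(x = prediction, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = 3)
```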

  4. The Gain Curve
     Measures how well the model sorts the outcome.
     x-axis: houses in model-sorted order (decreasing).
     y-axis: fraction of total accumulated home sales.
     Wizard curve: the curve a perfect model would produce.

  5. Reading the Gain Curve
     GainCurvePlot(houseprices, "prediction", "price", "Home price model")

  6. Let's practice!

  7. Root Mean Squared Error (RMSE)

  8. What is Root Mean Squared Error (RMSE)?
     RMSE = sqrt(mean((pred - y)^2))
     where:
       pred - y: the error, or residuals, vector
       mean((pred - y)^2): the mean value of (pred - y)^2
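As a quick worked example of the formula, here is RMSE computed by hand on a small made-up vector (not the course's houseprices data):

```r
# Toy actuals and predictions (made up for illustration)
actual <- c(10, 12, 15, 18)
pred   <- c(11, 11, 16, 17)

err  <- pred - actual        # residuals vector: 1, -1, 1, -1
rmse <- sqrt(mean(err^2))    # sqrt of the mean squared residual
rmse                         # 1
```

Every residual here has magnitude 1, so the RMSE is exactly 1: RMSE is a typical magnitude of the prediction error, in the same units as the outcome.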

  9. RMSE of the Home Sales Price Model
     # Calculate error
     err <- houseprices$prediction - houseprices$price
     price: column of actual sale prices (in thousands)
     prediction: column of predicted sale prices (in thousands)

  10. RMSE of the Home Sales Price Model
      # Calculate error
      err <- houseprices$prediction - houseprices$price
      # Square the error vector
      err2 <- err^2

  11. RMSE of the Home Sales Price Model
      # Calculate error
      err <- houseprices$prediction - houseprices$price
      # Square the error vector
      err2 <- err^2
      # Take the mean, and sqrt it
      (rmse <- sqrt(mean(err2)))
      58.33908
      RMSE ≈ 58.3

  12. Is the RMSE Large or Small?
      # Take the mean, and sqrt it
      (rmse <- sqrt(mean(err2)))
      58.33908
      # The standard deviation of the outcome
      (sdtemp <- sd(houseprices$price))
      135.2694
      RMSE ≈ 58.3; sd(price) ≈ 135

  13. Let's practice!

  14. R-Squared (R²)

  15. What is R²?
      A measure of how well the model fits, or explains, the data.
      A value between 0 and 1:
        near 1: the model fits well
        near 0: no better than guessing the average value

  16. Calculating R²
      R² is the variance explained by the model:
      R² = 1 - RSS/SS_Tot
      where:
        RSS = Σ(y - prediction)²: residual sum of squares (variance not explained by the model)
        SS_Tot = Σ(y - ȳ)²: total sum of squares (variance of the data)
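A small worked example of the formula, with made-up numbers rather than the course's data:

```r
# Toy outcomes and predictions (made up for illustration)
y    <- c(10, 12, 15, 18, 20)
pred <- c(11, 11, 16, 17, 20)

rss   <- sum((y - pred)^2)       # residual sum of squares: 4
sstot <- sum((y - mean(y))^2)    # total sum of squares: 68
r_squared <- 1 - rss / sstot     # about 0.941
```

Most of the outcome's variance (68) is explained by the model, with only a little (4) left over in the residuals, so R² is close to 1.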

  17. Calculate R² of the House Price Model: RSS
      # Calculate error
      err <- houseprices$prediction - houseprices$price
      # Square it and take the sum
      rss <- sum(err^2)
      price: column of actual sale prices (in thousands)
      prediction: column of predicted sale prices (in thousands)
      RSS ≈ 136138

  18. Calculate R² of the House Price Model: SS_Tot
      # Take the difference of prices from the mean price
      toterr <- houseprices$price - mean(houseprices$price)
      # Square it and take the sum
      sstot <- sum(toterr^2)
      RSS ≈ 136138; SS_Tot ≈ 713615

  19. Calculate R² of the House Price Model
      (r_squared <- 1 - (rss/sstot))
      0.8092278
      RSS ≈ 136138; SS_Tot ≈ 713615; R² ≈ 0.809

  20. Reading R² from the lm() model
      # From summary()
      summary(hmodel)
      ...
      Residual standard error: 60.66 on 37 degrees of freedom
      Multiple R-squared:  0.8092, Adjusted R-squared:  0.7989
      F-statistic: 78.47 on 2 and 37 DF,  p-value: 4.893e-14
      summary(hmodel)$r.squared
      0.8092278
      # From glance() (in the broom package)
      glance(hmodel)$r.squared
      0.8092278

  21. Correlation and R²
      rho <- cor(houseprices$prediction, houseprices$price)
      0.8995709
      rho^2
      0.8092278
      ρ = cor(prediction, price) = 0.8995709
      ρ² = 0.8092278 = R²
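The identity ρ² = R² on training data can be checked on any least-squares fit. Here is a sketch using R's built-in mtcars data instead of the course's houseprices:

```r
# Fit a simple linear model on built-in data
model <- lm(mpg ~ wt, data = mtcars)
pred  <- predict(model)

# Correlation between predictions and actual outcomes
rho <- cor(pred, mtcars$mpg)

# For a least-squares fit, the squared correlation equals
# R-squared on the training data
all.equal(rho^2, summary(model)$r.squared)  # TRUE
```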

  22. Correlation and R²
      ρ² = R² holds for models that minimize squared error:
        linear regression
        GAM regression
        tree-based algorithms that minimize squared error
      True on training data; NOT true on future application data.

  23. Let's practice!

  24. Properly Training a Model

  25. Models can perform much better on training data than they do on future data.
      Training R²: 0.9; Test R²: 0.15 -- overfit

  26. Test/Train Split
      Recommended method when data is plentiful.

  27. Example: Model Female Unemployment
      Train on 66 rows, test on 30 rows.
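One common way to make such a split is with a random draw. This sketch uses synthetic data in place of the course's unemployment data (the column names here are invented for illustration):

```r
set.seed(42)

# Synthetic stand-in for the unemployment data: 96 rows
df <- data.frame(male_rate = runif(96, 2, 10))
df$female_rate <- 1 + 0.7 * df$male_rate + rnorm(96, sd = 0.5)

# Randomly assign about 2/3 of rows to training, the rest to test
gp    <- runif(nrow(df))
train <- df[gp <  0.66, ]
test  <- df[gp >= 0.66, ]

# Fit on the training set only
model <- lm(female_rate ~ male_rate, data = train)

# Compare performance on training vs. held-out rows
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))
rmse(predict(model, newdata = train), train$female_rate)  # training RMSE
rmse(predict(model, newdata = test),  test$female_rate)   # test RMSE
```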

  28. Model Performance: Train vs. Test
      Training: RMSE 0.71, R² 0.8
      Test: RMSE 0.93, R² 0.75

  29. Cross-Validation
      Preferred when data is not large enough to split off a test set.

  30. Cross-Validation (diagram)

  31. Cross-Validation (diagram)

  32. Cross-Validation (diagram)

  33. Create a cross-validation plan
      library(vtreat)
      splitPlan <- kWayCrossValidation(nRows, nSplits, NULL, NULL)
      nRows: number of rows in the training data
      nSplits: number of folds (partitions) in the cross-validation; e.g., nSplits = 3 for 3-way cross-validation
      The remaining two arguments are not needed here.

  34. Create a cross-validation plan
      library(vtreat)
      splitPlan <- kWayCrossValidation(10, 3, NULL, NULL)
      # First fold (A and B to train, C to test)
      splitPlan[[1]]
      $train
      1 2 4 5 7 9 10
      $app
      3 6 8
      # Train on A and B, test on C, and so on
      split <- splitPlan[[1]]
      model <- lm(fmla, data = df[split$train, ])
      df$pred.cv[split$app] <- predict(model, newdata = df[split$app, ])
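The single-fold pattern above extends to all folds with a loop, so every row gets exactly one out-of-sample prediction. A sketch with synthetic data standing in for the course's df and fmla (vtreat assumed installed):

```r
library(vtreat)

# Synthetic stand-in for the course's data and formula
set.seed(7)
df <- data.frame(x = runif(30))
df$y <- 2 * df$x + rnorm(30, sd = 0.1)
fmla <- y ~ x

# 3-way cross-validation plan, as on the slide above
splitPlan <- kWayCrossValidation(nrow(df), 3, NULL, NULL)

# Loop over every fold: train on split$train, predict on split$app
df$pred.cv <- 0
for (i in seq_along(splitPlan)) {
  split <- splitPlan[[i]]
  model <- lm(fmla, data = df[split$train, ])
  df$pred.cv[split$app] <- predict(model, newdata = df[split$app, ])
}

# Cross-validation estimate of RMSE
sqrt(mean((df$pred.cv - df$y)^2))
```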

  35. Final Model (diagram)

  36. Example: Unemployment Model

      Measure type        RMSE        R²
      train               0.7082675   0.8029275
      test                0.9349416   0.7451896
      cross-validation    0.8175714   0.7635331

  37. Let's practice!

