Evaluating a model graphically
SUPERVISED LEARNING IN R: REGRESSION
Nina Zumel and John Mount
Win-Vector LLC
Plotting Ground Truth vs. Predictions
A well fitting model:
- the x = y line runs through the center of the points ("the line of perfect prediction")

A poorly fitting model:
- points are all on one side of the x = y line
- systematic errors
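A minimal sketch of such a plot, using ggplot2 and the houseprices frame (with prediction and price columns) that appears later in this section:

library(ggplot2)

# Scatterplot of outcome vs. prediction, with the x = y
# "line of perfect prediction" as a reference
ggplot(houseprices, aes(x = prediction, y = price)) +
    geom_point() +
    geom_abline(color = "darkblue")  # x = y line: slope 1, intercept 0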
The residual plot

A well fitting model:
- Residual: actual outcome - prediction
- Good fit: no systematic errors

A poorly fitting model:
- Systematic errors
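A minimal sketch of the corresponding residual plot, under the same assumptions:

library(ggplot2)

# Residuals: actual outcome minus prediction
houseprices$residuals <- houseprices$price - houseprices$prediction

# Residuals from a well fitting model scatter evenly around zero
ggplot(houseprices, aes(x = prediction, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, color = "darkblue")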
The gain curve

- Measures how well the model sorts the outcome
- x-axis: houses in model-sorted order (decreasing)
- y-axis: fraction of total accumulated home sales
- Wizard curve: the gain curve of a perfect model
library(WVPlots)  # provides GainCurvePlot()
GainCurvePlot(houseprices, "prediction", "price", "Home price model")
Root Mean Squared Error (RMSE)

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount
Win-Vector LLC
RMSE = √(mean((pred − y)²))

where
- pred − y: the error, or residuals vector
- mean((pred − y)²): the mean value of the squared residuals
# Calculate error
err <- houseprices$prediction - houseprices$price

price: column of actual sale prices (in thousands)
prediction: column of predicted sale prices (in thousands)
# Square the error vector
err2 <- err^2

# Take the mean, and sqrt it
(rmse <- sqrt(mean(err2)))
[1] 58.33908

RMSE ≈ 58.3
# Compare RMSE to the standard deviation of the outcome
(sdprice <- sd(houseprices$price))
[1] 135.2694

RMSE ≈ 58.3; sd(price) ≈ 135. The model's typical prediction error is much smaller than the overall spread of home prices, so the model is informative.
R-squared (R²)

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount
Win-Vector LLC
R²: a measure of how well the model fits or explains the data

- A value between 0 and 1
- near 1: model fits well
- near 0: no better than guessing the average value
R² is the variance explained by the model.

R² = 1 − RSS/SS_Tot

where
- RSS = Σ(y − prediction)²: residual sum of squares (variance from the model)
- SS_Tot = Σ(y − ȳ)²: total sum of squares (variance of the data); ȳ is the mean value of y
# Calculate error
err <- houseprices$prediction - houseprices$price

# Square it and take the sum
rss <- sum(err^2)

price: column of actual sale prices (in thousands)
prediction: column of predicted sale prices (in thousands)

RSS ≈ 136138
# Take the difference of prices from the mean price
toterr <- houseprices$price - mean(houseprices$price)

# Square it and take the sum
sstot <- sum(toterr^2)

RSS ≈ 136138; SS_Tot ≈ 713615
# R-squared: one minus the ratio of residual variance to total variance
(r_squared <- 1 - (rss/sstot))
[1] 0.8092278

RSS ≈ 136138; SS_Tot ≈ 713615; R² ≈ 0.809
# From summary()
summary(hmodel)

...
Residual standard error: 60.66 on 37 degrees of freedom
Multiple R-squared: 0.8092, Adjusted R-squared: 0.7989
F-statistic: 78.47 on 2 and 37 DF, p-value: 4.893e-14

summary(hmodel)$r.squared
[1] 0.8092278

# From glance() in the broom package
library(broom)
glance(hmodel)$r.squared
[1] 0.8092278
# Correlation between prediction and outcome
(rho <- cor(houseprices$prediction, houseprices$price))
[1] 0.8995709

rho^2
[1] 0.8092278

ρ = cor(prediction, price) = 0.8995709
ρ² = 0.8092278 = R²
ρ² = R² holds for models that minimize squared error:
- Linear regression
- GAM regression
- Tree-based algorithms that minimize squared error

True for training data; NOT true for future application data.
Properly training a model

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount
Win-Vector LLC
Training R²: 0.9; Test R²: 0.15 -- overfit
Train/test split: recommended method when data is plentiful
Train on 66 rows, test on 30 rows
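A minimal sketch of one common way to make such a split; the frame name dframe and the split fraction are illustrative assumptions:

# Draw a uniform random number for each row
gp <- runif(nrow(dframe))

# About 70% of the rows to train, the rest to test
train <- dframe[gp < 0.70, ]
test <- dframe[gp >= 0.70, ]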
Training: RMSE 0.71, R² 0.8
Test: RMSE 0.93, R² 0.75
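A sketch of how such numbers are computed, assuming a model formula fmla, an outcome column y, and the train and test frames from the split above (all names illustrative):

# Fit only on the training data
model <- lm(fmla, data = train)

# Predict on both sets
train$pred <- predict(model, newdata = train)
test$pred <- predict(model, newdata = test)

# RMSE and R-squared helpers
rmse <- function(y, pred) sqrt(mean((y - pred)^2))
r_squared <- function(y, pred) 1 - sum((y - pred)^2)/sum((y - mean(y))^2)

rmse(train$y, train$pred)       # training RMSE
rmse(test$y, test$pred)         # test RMSE
r_squared(train$y, train$pred)  # training R-squared
r_squared(test$y, test$pred)    # test R-squared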
Cross-validation: preferred when data is not large enough to split off a test set
library(vtreat)
splitPlan <- kWayCrossValidation(nRows, nSplits, NULL, NULL)

- nRows: number of rows in the training data
- nSplits: number of folds (partitions) in the cross-validation, e.g., nSplits = 3 for 3-fold cross-validation
- the remaining two arguments are not needed here
library(vtreat)
splitPlan <- kWayCrossValidation(10, 3, NULL, NULL)

First fold (A and B to train, C to test):

splitPlan[[1]]
$train
[1] 1 2 4 5 7 9 10

$app
[1] 3 6 8
Train on A and B, test on C, etc...
# First fold: fit on the training indices, predict on the
# held-out application (app) indices
split <- splitPlan[[1]]
model <- lm(fmla, data = df[split$train, ])
df$pred.cv[split$app] <- predict(model, newdata = df[split$app, ])
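To get a cross-validation prediction for every row, repeat this for each fold in the plan. A minimal sketch, continuing with the illustrative names df and fmla and an assumed outcome column y:

# Initialize the column of cross-validation predictions
df$pred.cv <- 0

# Each fold's rows are predicted by a model that never saw them in training
for (split in splitPlan) {
    model <- lm(fmla, data = df[split$train, ])
    df$pred.cv[split$app] <- predict(model, newdata = df[split$app, ])
}

# Cross-validation estimate of RMSE
sqrt(mean((df$y - df$pred.cv)^2))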
Measure type        RMSE        R²
train               0.7082675   0.8029275
test                0.9349416   0.7451896
cross-validation    0.8175714   0.7635331