  1. Lecture 2: Diagnostics and Model Evaluation (Colin Rundel, 1/23/2017)

  2. From last time

  3. Linear model and data

```r
library(rjags)
library(dplyr)

set.seed(01172017)

n    = 100
beta = c(0.7, 1.5, -2.2, 0.1)
eps  = rnorm(n, mean=0, sd=1)

X0 = rep(1, n)
X1 = rt(n, df=5)
X2 = rt(n, df=5)
X3 = rt(n, df=5)

X = cbind(X0, X1, X2, X3)
Y = X %*% beta + eps
d = data.frame(Y, X[,-1])
```

  4. Bayesian model

```r
model{
  # Likelihood
  for(i in 1:length(Y)){
    mu[i] <- beta[1] + beta[2]*X1[i] + beta[3]*X2[i] + beta[4]*X3[i]
    Y[i] ~ dnorm(mu[i], tau2)
  }

  # Prior for beta
  for(j in 1:4){
    beta[j] ~ dnorm(0, 1/100)
  }

  # Prior for the inverse variance
  tau2 ~ dgamma(1, 1)
  sigma <- 1/sqrt(tau2)
}
```
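The slides don't show the sampler call itself, but later slides index into `samp[[1]]`. Below is a minimal sketch, not from the slides, of how those draws could be obtained with rjags; the single chain, burn-in length, and the name `model_string` are all assumptions.

```r
# Hypothetical setup producing the `samp` object used on later slides;
# chain count and burn-in are assumptions, not taken from the slides.
model_string =
"model{
  for(i in 1:length(Y)){
    mu[i] <- beta[1] + beta[2]*X1[i] + beta[3]*X2[i] + beta[4]*X3[i]
    Y[i] ~ dnorm(mu[i], tau2)
  }
  for(j in 1:4){ beta[j] ~ dnorm(0, 1/100) }
  tau2 ~ dgamma(1, 1)
  sigma <- 1/sqrt(tau2)
}"

m = jags.model(textConnection(model_string),
               data = list(Y = d$Y, X1 = d$X1, X2 = d$X2, X3 = d$X3))
update(m, n.iter = 1000)  # burn-in
samp = coda.samples(m, variable.names = c("beta", "sigma"),
                    n.iter = 5000)  # 5000 draws, matching n_sim below
```

Note that coda orders the monitored variables alphabetically, so columns 1 through 4 of `samp[[1]]` are `beta[1]` through `beta[4]` and column 5 is `sigma`, matching the indexing used on the Bayesian $R^2$ slide.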

  5. [Figure: posterior density plots for beta[1] through beta[4] and sigma, each concentrated near its true value]

  6. Model Evaluation

  7. Model assessment?

If we think back to our first regression class, one common option is $R^2$, which gives us the variability in $Y$ explained by our model.

Quick review:

$$\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{\text{Total}} = \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{\text{Model}} + \underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{\text{Error}}$$

$$R^2 = \text{Corr}(Y, \hat{Y})^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
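As a quick check, here is a short sketch (added, not from the slides) verifying that the three equivalent forms of $R^2$ agree for the least-squares fit of the simulated data.

```r
# Verify the three equivalent forms of R^2 for the least-squares fit
fit      = lm(Y ~ ., data = d)
Y_hat_ls = fitted(fit)
Y_bar    = mean(d$Y)

cor(d$Y, Y_hat_ls)^2                                # Corr(Y, Yhat)^2
sum((Y_hat_ls - Y_bar)^2) / sum((d$Y - Y_bar)^2)    # Model SS / Total SS
1 - sum((d$Y - Y_hat_ls)^2) / sum((d$Y - Y_bar)^2)  # 1 - Error SS / Total SS
# all three match summary(fit)$r.squared
```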

  8. Bayesian $R^2$

While we can collapse our posterior parameter distributions to a single value (using mean, median, etc.) before calculating $\hat{Y}$ and then $R^2$, we don't need to.

```r
n_sim = 5000; n = 100
Y_hat = matrix(NA, nrow=n_sim, ncol=n,
               dimnames=list(NULL, paste0("Yhat[",1:n,"]")))
for(i in 1:n_sim)
{
  beta_post  = samp[[1]][i,1:4]
  sigma_post = samp[[1]][i,5]
  Y_hat[i,]  = beta_post %*% t(X) + rnorm(n, sd = sigma_post)
}

Y_hat[1:5, 1:5]
##         Yhat[1]   Yhat[2]   Yhat[3]    Yhat[4]    Yhat[5]
## [1,] -1.7193735  2.712832 -2.935192 -0.3472465 -0.5000324
## [2,]  1.2504531  1.996477 -2.424596  0.5437237  0.7191012
## [3,]  3.2258841  2.608588 -2.472159  1.1553329  0.3661503
## [4,]  1.8120911  2.612417 -3.349495  2.3028439  1.0398770
## [5,]  1.7027229 0.4478495  3.035438  -4.419457  2.2174576
```

  9. $\hat{Y}$: lm vs Bayesian lm

[Figure: posterior predictive densities of Yhat[1] through Yhat[6]]

  10. Posterior $R^2$

For each posterior sample $s$ we can calculate $R^2_s = \text{Corr}(Y, \hat{Y}_s)^2$:

```r
R2_post = apply(Y_hat, 1, function(Y_hat_s) cor(Y, Y_hat_s)^2)
summary(c(R2_post)) %>% t()
##        Min. 1st Qu. Median   Mean 3rd Qu.   Max.
## [1,] 0.7531  0.8410 0.8549 0.8536  0.8672 0.9168

summary(lm(Y~., data=d))$r.squared
## [1] 0.9262839
```

[Figure: posterior density of $R^2$, concentrated between roughly 0.75 and 0.92]

  11. What if we collapsed first?

```r
Y_hat_post_mean = apply(Y_hat, 2, mean)
head(Y_hat_post_mean)
##    Yhat[1]    Yhat[2]    Yhat[3]    Yhat[4]    Yhat[5]    Yhat[6]
##  1.3336394  2.7270929 -3.1659989  1.1631532 -0.1003662 -1.5807078

Y_hat_post_med = apply(Y_hat, 2, median)
head(Y_hat_post_med)
##    Yhat[1]    Yhat[2]    Yhat[3]    Yhat[4]    Yhat[5]    Yhat[6]
##  1.3324163  2.7363221 -3.1772690  1.1531411 -0.1029881 -1.5918980

cor(Y_hat_post_mean, Y)^2
##           [,1]
## [1,] 0.9264776

cor(Y_hat_post_med, Y)^2
##           [,1]
## [1,] 0.9264527

summary(lm(Y~., data=d))$r.squared
## [1] 0.9262839
```

  12. What went wrong?

If our criterion is to maximize $R^2$, then nothing. Why this result?

Remember that $\hat{\beta}_{MLE} = \hat{\beta}_{LS}$, the latter of which is achieved by

$$\underset{\beta}{\arg\min} \sum_{i=1}^{n} (Y_i - X_{i\cdot}\,\beta)^2 = \underset{\beta}{\arg\min} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

So if we have $\hat{\beta}$ such that it minimizes the least squares criterion, what does that tell us about

$$R^2 = \text{Corr}(Y, \hat{Y})^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \;?$$

  13. Some problems with $R^2$

• $R^2$ always increases (or stays the same) when a predictor is added (see the sketch below)
• $R^2$ is highly susceptible to overfitting
• $R^2$ is sensitive to outliers
• $R^2$ depends heavily on the current values of $Y$
• $R^2$ can differ drastically for two equivalent models (i.e. models giving nearly identical inferences about key parameters)
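The first bullet is easy to demonstrate. The sketch below (added, not from the slides) adds a pure-noise predictor to the simulated data and shows that the in-sample $R^2$ does not decrease.

```r
# Adding a pure-noise predictor never decreases in-sample R^2
set.seed(1)  # arbitrary seed, for the illustration only
d_noise = transform(d, noise = rnorm(nrow(d)))

summary(lm(Y ~ ., data = d))$r.squared        # original model
summary(lm(Y ~ ., data = d_noise))$r.squared  # at least as large
```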

  14. Other Metrics

  15. Root Mean Square Error

The traditional definition of RMSE is as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}$$

In the Bayesian context, where we have posterior samples for each parameter / prediction of interest, we can express this equation as

$$\text{RMSE} = \sqrt{\frac{1}{n \, n_s} \sum_{s=1}^{n_s} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{i,s}\right)^2}$$

where $n_s$ is the number of posterior samples. Note that, as we just saw with $R^2$, using the first definition with $\hat{Y}_i = \sum_{s=1}^{n_s} \hat{Y}_{i,s} / n_s$ does not necessarily give the same result as the second equation.
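To make the distinction concrete, here is a small sketch (added, not from the slides) computing both versions from the `Y_hat` draws built earlier.

```r
# RMSE computed two ways from the posterior predictive draws Y_hat (n_sim x n)
y = c(Y)  # observed values as a plain vector

# 1. Collapse to the posterior mean prediction first
rmse_collapsed = sqrt(mean((y - apply(Y_hat, 2, mean))^2))

# 2. Average the squared error over all posterior samples
rmse_full = sqrt(mean((t(Y_hat) - y)^2))

c(collapsed = rmse_collapsed, full = rmse_full)  # generally not equal
```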

  16. Continuous Rank Probability Score

RMSE (and related metrics like MAE) are not directly applicable to probabilistic predictions, since they require fixed values of $\hat{Y}_i$. We can generalize to a fully continuous case, where $\hat{Y}$ is given by a predictive distribution, using the Continuous Rank Probability Score:

$$\text{CRPS} = \int_{-\infty}^{\infty} \left( F_{\hat{Y}}(z) - \mathbf{1}_{\{z \ge Y\}} \right)^2 \, dz$$

where $F_{\hat{Y}}$ is the empirical CDF of $\hat{Y}$ (the posterior predictive distribution for $Y$) and $\mathbf{1}_{\{z \ge Y\}}$ is the indicator function, which equals 1 when $z \ge Y$, the true/observed value of $Y$.
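One direct way to approximate this integral from posterior predictive draws is numerical integration over a grid. The sketch below is added, not from the slides, and the function name `crps_one` is hypothetical.

```r
# Approximate CRPS for one observation from posterior predictive draws,
# integrating (F(z) - 1{z >= y})^2 on a grid (a Riemann-sum approximation)
crps_one = function(y, draws, n_grid = 1e4) {
  z  = seq(min(c(draws, y)) - 1, max(c(draws, y)) + 1, length.out = n_grid)
  Fz = ecdf(draws)(z)       # empirical CDF of the predictive distribution
  Iz = as.numeric(z >= y)   # indicator function 1{z >= Y}
  sum((Fz - Iz)^2) * (z[2] - z[1])
}

# e.g. CRPS for the first observation, using its posterior predictive draws
crps_one(Y[1], Y_hat[, 1])
```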

  17. Accuracy vs. Precision

[Figure: densities of four predictive distributions: dist1 (rmse = 2.002), dist2 (rmse = 2.798), dist3 (rmse = 1.002), dist4 (rmse = 2.22)]

  18. CDF vs Indicator

[Figure: empirical CDFs of the four predictive distributions plotted against the indicator function]

The resulting scores for the four distributions:

```
## dist1 dist2 dist3 dist4
## 0.468 1.188 0.235 1.432
```

  19. Empirical Coverage

One final method of assessing model calibration is assessing how well credible intervals, derived from the posterior predictive distributions of the $Y_i$s, capture the true/observed values.

```
## 90% CI empirical coverage
##                      0.92
```

[Figure: observed values of Y by index, with points colored by whether the 90% credible interval captures them]
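A minimal sketch (added, not from the slides) of computing this coverage from the posterior predictive draws:

```r
# Empirical coverage of 90% posterior predictive credible intervals
y  = c(Y)
ci = apply(Y_hat, 2, quantile, probs = c(0.05, 0.95))  # 2 x n matrix of bounds

capture = (y >= ci[1, ]) & (y <= ci[2, ])
mean(capture)  # proportion of observations inside their 90% interval
```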

  20. Cross-validation

  21. Cross-validation styles

• Kaggle style: partition the data once into a training set and one or more held-out test sets (e.g. 70% train / 15% / 15%, as on the next slide)
• $k$-fold: split the data into $k$ folds; each fold in turn is held out as the test set while the model is fit to the remaining $k-1$ folds

  22. Cross-validation in R with modelr

```r
library(modelr)

d_kaggle = resample_partition(d, c(train=0.70, test1=0.15, test2=0.15))
d_kaggle
## $train
## <resample [69 x 4]> 1, 2, 3, 4, 6, 7, 8, 9, 11, 13, ...
##
## $test1
## <resample [15 x 4]> 12, 19, 23, 30, 34, 35, 36, 48, 54, 58, ...
##
## $test2
## <resample [16 x 4]> 5, 10, 18, 21, 31, 41, 45, 46, 57, 68, ...

d_kfold = crossv_kfold(d, k=5)
d_kfold
## # A tibble: 5 × 3
##            train           test   .id
##           <list>         <list> <chr>
## 1 <S3: resample> <S3: resample>     1
## 2 <S3: resample> <S3: resample>     2
## 3 <S3: resample> <S3: resample>     3
## 4 <S3: resample> <S3: resample>     4
## 5 <S3: resample> <S3: resample>     5
```
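These objects plug into the usual fit/score loop. The sketch below (added, not from the slides) fits the linear model on each training fold and computes the held-out RMSE, following modelr's documented purrr-based pattern.

```r
library(purrr)

# Fit on each training fold, then score on the corresponding test fold
fits  = map(d_kfold$train, ~ lm(Y ~ ., data = .))
rmses = map2_dbl(fits, d_kfold$test, modelr::rmse)
rmses  # one held-out RMSE per fold
```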
