SLIDE 1

Lecture 2

Diagnostics and Model Evaluation

Colin Rundel 1/23/2017


SLIDE 2

From last time


SLIDE 3

Linear model and data

library(rjags)
library(dplyr)

set.seed(01172017)

n = 100
beta = c(0.7, 1.5, -2.2, 0.1)

eps = rnorm(n, mean=0, sd=1)
X0 = rep(1, n)
X1 = rt(n, df=5)
X2 = rt(n, df=5)
X3 = rt(n, df=5)

X = cbind(X0, X1, X2, X3)
Y = X %*% beta + eps
d = data.frame(Y, X[,-1])

SLIDE 4

Bayesian model

## model{
##   # Likelihood
##   for(i in 1:length(Y)){
##     Y[i] ~ dnorm(mu[i], tau2)
##     mu[i] <- beta[1] + beta[2]*X1[i] + beta[3]*X2[i] + beta[4]*X3[i]
##   }
##
##   # Prior for beta
##   for(j in 1:4){
##     beta[j] ~ dnorm(0, 1/100)
##   }
##
##   # Prior for the inverse variance
##   tau2 ~ dgamma(1, 1)
##   sigma <- 1/sqrt(tau2)
## }

SLIDE 5

[Figure: posterior density plots for the parameters beta[1], beta[2], beta[3], beta[4], and sigma]

SLIDE 6

Model Evaluation


SLIDE 7

Model assessment?

If we think back to our first regression class, one common option is $R^2$, which gives us the proportion of the variability in $Y$ explained by our model. Quick review:

$$\underbrace{\sum_{i=1}^n (Y_i - \bar{Y})^2}_{\text{Total}} = \underbrace{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}_{\text{Model}} + \underbrace{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}_{\text{Error}}$$

$$R^2 = \text{Corr}(Y, \hat{Y})^2 = \frac{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2}$$

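These identities are easy to verify with lm() on the simulated data from earlier (a quick sketch; the name Y_hat_ls is ours):

l = lm(Y ~ ., data=d)
Y_hat_ls = fitted(l)

cor(d$Y, Y_hat_ls)^2                                      # Corr(Y, Y_hat)^2
sum((Y_hat_ls - mean(d$Y))^2) / sum((d$Y - mean(d$Y))^2)  # Model SS / Total SS
1 - sum((d$Y - Y_hat_ls)^2)   / sum((d$Y - mean(d$Y))^2)  # 1 - Error SS / Total SS

# all three agree with summary(l)$r.squared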

SLIDE 10

Bayesian $R^2$

While we can collapse our posterior parameter distributions to a single value (using the mean, median, etc.) before calculating $\hat{Y}$ and then $R^2$, we don't need to.

n_sim = 5000; n = 100

Y_hat = matrix(NA, nrow=n_sim, ncol=n,
               dimnames=list(NULL, paste0("Yhat[",1:n,"]")))

for(i in 1:n_sim) {
  beta_post  = samp[[1]][i, 1:4]
  sigma_post = samp[[1]][i, 5]
  Y_hat[i,]  = beta_post %*% t(X) + rnorm(n, sd = sigma_post)
}

Y_hat[1:5, 1:5]
##         Yhat[1]  Yhat[2]   Yhat[3]    Yhat[4]    Yhat[5]
## [1,] -1.7193735 2.712832 -2.935192 -0.3472465 -0.5000324
## [2,]  0.4478495 3.035438 -4.419457  2.2174576  0.3661503
## [3,]  3.2258841 2.608588 -2.472159  1.1553329  1.7027229
## [4,]  1.8120911 2.612417 -3.349495  2.3028439  1.0398770
## [5,]  1.2504531 1.996477 -2.424596  0.5437237  0.7191012

SLIDE 11

$\hat{Y}$ - lm vs Bayesian lm

[Figure: posterior predictive density plots of Yhat[1] through Yhat[6]]

SLIDE 12

Posterior $R^2$

For each posterior sample $s$ we can calculate $R^2_s = \text{Corr}(Y, \hat{Y}_s)^2$:

R2_post = apply(Y_hat, 1, function(Y_hat_s) cor(Y, Y_hat_s)^2)

summary(c(R2_post)) %>% t()
##        Min. 1st Qu. Median   Mean 3rd Qu.   Max.
## [1,] 0.7531  0.8410 0.8549 0.8536  0.8672 0.9168

summary(lm(Y~., data=d))$r.squared
## [1] 0.9262839

[Figure: density plot of the posterior $R^2$ samples]

SLIDE 13

What if we collapsed first?

Y_hat_post_mean = apply(Y_hat, 2, mean)
head(Y_hat_post_mean)
##    Yhat[1]    Yhat[2]    Yhat[3]    Yhat[4]    Yhat[5]    Yhat[6]
##  1.3324163  2.7363221 -3.1772690  1.1531411 -0.1029881 -1.5918980

Y_hat_post_med = apply(Y_hat, 2, median)
head(Y_hat_post_med)
##    Yhat[1]    Yhat[2]    Yhat[3]    Yhat[4]    Yhat[5]    Yhat[6]
##  1.3336394  2.7270929 -3.1659989  1.1631532 -0.1003662 -1.5807078

cor(Y_hat_post_mean, Y)^2
##           [,1]
## [1,] 0.9264776

cor(Y_hat_post_med, Y)^2
##           [,1]
## [1,] 0.9264527

summary(lm(Y~., data=d))$r.squared
## [1] 0.9262839

SLIDE 14

What went wrong?

If our criterion is to maximize $R^2$, then nothing. Why this result? Remember that $\hat{\beta}_{MLE} = \hat{\beta}_{LS}$, the latter of which is achieved by

$$\underset{\beta}{\arg\min} \sum_{i=1}^n \left(Y_i - X_{i\cdot}\,\beta\right)^2 = \underset{\beta}{\arg\min} \sum_{i=1}^n \left(Y_i - \hat{Y}_i\right)^2$$

So if we have $\hat{\beta}$ such that it minimizes the least squares criterion, what does that tell us about

$$R^2 = \text{Corr}(Y, \hat{Y})^2 = \frac{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} \; ?$$

Since minimizing the error sum of squares is equivalent to maximizing $R^2$, no other value of $\beta$ (in particular, no single posterior draw) can achieve a higher $R^2$ than the least squares fit, and collapsing the posterior to its mean or median essentially recovers that fit.


SLIDE 17

Some problems with $R^2$

  • $R^2$ always increases (or stays the same) when a predictor is added (see the sketch below)
  • $R^2$ is highly susceptible to overfitting
  • $R^2$ is sensitive to outliers
  • $R^2$ depends heavily on the current values of $Y$
  • $R^2$ can differ drastically for two equivalent models (i.e. models giving nearly identical inferences about key parameters)
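The first point is easy to demonstrate: adding a pure noise predictor to the simulated data from earlier cannot decrease $R^2$ (a minimal sketch; the name noise is arbitrary).

noise = rnorm(nrow(d))                        # a predictor unrelated to Y
summary(lm(Y ~ .,         data=d))$r.squared  # original model
summary(lm(Y ~ . + noise, data=d))$r.squared  # never smaller than the line above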


SLIDE 18

Other Metrics


SLIDE 19

Root Mean Square Error

The traditional definition of RMSE is as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(Y_i - \hat{Y}_i\right)^2}$$

In the Bayesian context, where we have posterior samples from each parameter / prediction of interest, we can express this equation as

$$RMSE = \sqrt{\frac{1}{n \, n_s} \sum_{s=1}^{n_s} \sum_{i=1}^n \left(Y_i - \hat{Y}_{i,s}\right)^2}$$

Note that, as we just saw with $R^2$, using the first definition with $\hat{Y}_i = \sum_{s=1}^{n_s} \hat{Y}_{i,s} / n_s$ does not necessarily give the same result as the second equation.
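Both versions are easy to compute from the Y_hat matrix of posterior predictive draws generated earlier (a quick sketch; the names rmse_collapsed and rmse_full are ours):

# RMSE after collapsing each Y_hat[i] to its posterior mean
rmse_collapsed = sqrt( mean( (c(Y) - colMeans(Y_hat))^2 ) )

# RMSE averaged over all posterior draws and observations;
# sweep() subtracts Y[i] from every draw in column i of Y_hat
rmse_full = sqrt( mean( sweep(Y_hat, 2, c(Y))^2 ) )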


SLIDE 20

Continuous Ranked Probability Score

RMSE (and related metrics like MAE) are not directly applicable to probabilistic predictions, since they require fixed values of $\hat{Y}_i$. We can generalize to a fully continuous case, where $\hat{Y}$ is given by a predictive distribution, using the Continuous Ranked Probability Score:

$$CRPS = \int_{-\infty}^{\infty} \left( F_{\hat{Y}}(z) - \mathbf{1}_{\{z \ge Y\}} \right)^2 \, dz$$

where $F_{\hat{Y}}$ is the empirical CDF of $\hat{Y}$ (the posterior predictive distribution for $Y$) and $\mathbf{1}_{\{z \ge Y\}}$ is the indicator function, which equals 1 when $z \ge Y$, the true/observed value of $Y$.
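A minimal numerical sketch of this definition for a single observation (the function name crps_emp and the grid size are our choices; packages such as scoringRules provide proper implementations):

# Approximate the CRPS integral with a Riemann sum over a finite grid,
# using the empirical CDF of the posterior predictive draws y_hat;
# outside the grid the integrand is exactly zero, so truncation is harmless
crps_emp = function(y_hat, y, n_grid = 1e4) {
  z = seq(min(y_hat, y) - 1, max(y_hat, y) + 1, length.out = n_grid)
  F_hat = ecdf(y_hat)
  sum( (F_hat(z) - (z >= y))^2 ) * (z[2] - z[1])
}

crps_emp(Y_hat[,1], Y[1])  # e.g. CRPS for the first observation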


SLIDE 21

Accuracy vs. Precision

[Figure: density plots of four predictive distributions: dist1 (rmse = 2.002), dist2 (rmse = 2.798), dist3 (rmse = 1.002), dist4 (rmse = 2.22)]

SLIDE 22

CDF vs Indicator

[Figure: empirical CDFs of dist1 through dist4 plotted against the indicator function 1{z >= Y}]

## dist1 dist2 dist3 dist4
## 0.468 1.188 0.235 1.432

SLIDE 23

Empirical Coverage

One final way to assess model calibration is to check how well credible intervals, derived from the posterior predictive distributions of the $Y_i$s, capture the true/observed values.

## 90% CI empirical coverage
## 0.92


[Figure: observed Y values by index with 90% credible intervals, colored by whether each interval captures the observation (capture = TRUE/FALSE)]
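A minimal sketch of this check, using equal-tailed 90% intervals computed directly from the Y_hat draws generated earlier (HPD intervals, as used by the empcov() function later, may differ slightly):

# 5% and 95% posterior predictive quantiles for each observation
ci = apply(Y_hat, 2, quantile, probs = c(0.05, 0.95))

# proportion of observed Y values falling inside their interval
mean(ci[1,] <= c(Y) & c(Y) <= ci[2,])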

SLIDE 25

Cross-validation


SLIDE 26

Cross-validation styles

Kaggle style: the data are split once into a training set and one or more held-out test sets.

k-fold: the data are divided into k subsets; each subset is used in turn as the test set while the model is fit to the remaining k-1 subsets.

[Figure: diagrams of the two cross-validation schemes]


SLIDE 27

Cross-validation in R with modelr

library(modelr)

d_kaggle = resample_partition(d, c(train=0.70, test1=0.15, test2=0.15))
d_kaggle
## $train
## <resample [69 x 4]> 1, 2, 3, 4, 6, 7, 8, 9, 11, 13, ...
##
## $test1
## <resample [15 x 4]> 12, 19, 23, 30, 34, 35, 36, 48, 54, 58, ...
##
## $test2
## <resample [16 x 4]> 5, 10, 18, 21, 31, 41, 45, 46, 57, 68, ...

d_kfold = crossv_kfold(d, k=5)
d_kfold
## # A tibble: 5 × 3
##            train           test   .id
##           <list>         <list> <chr>
## 1 <S3: resample> <S3: resample>     1
## 2 <S3: resample> <S3: resample>     2
## 3 <S3: resample> <S3: resample>     3
## 4 <S3: resample> <S3: resample>     4
## 5 <S3: resample> <S3: resample>     5

SLIDE 28

resample objects

The idea behind resample objects is simple: there is no need to create and hold on to each subset / partition of the original data frame. We only need to track which rows belong to which subset, and then construct the new data frame when it is actually needed.

d_kaggle$test1
## <resample [15 x 4]> 12, 19, 23, 30, 34, 35, 36, 48, 54, 58, ...

str(d_kaggle$test1)
## List of 2
##  $ data:'data.frame': 100 obs. of 4 variables:
##   ..$ Y : num [1:100] 0.402 3.855 -2.384 1.344 -1.171 ...
##   ..$ X1: num [1:100] 0.0513 0.0259 -1.2622 -0.443 -3.1655 ...
##   ..$ X2: num [1:100] -0.284 -0.928 0.948 -0.535 -1.961 ...
##   ..$ X3: num [1:100] -1.279 -0.571 2.938 -0.15 1.664 ...
##  $ idx : int [1:15] 12 19 23 30 34 35 36 48 54 58 ...
##  - attr(*, "class")= chr "resample"

as.data.frame(d_kaggle$test1)
##             Y          X1         X2          X3
## 12 -2.9263300 -0.82679975  0.5995159 -1.64750639
## 19  0.1090239  1.34327879  1.1326438 -0.16019583
## 23  2.8883752  0.40828680 -0.5064750  1.58942370
## 30 -1.6175504  0.36906392  0.8140513  1.09849733
## 34 -4.3808207 -0.92931006  1.2761635 -0.07280222
## 35  1.0838832 -1.14245830 -0.8071982 -0.42909876
## 36  4.9475852  1.23069336 -0.5259891  1.24623765
## 48  3.5253998 -0.03160771 -1.4007319  0.33505032
## 54  2.5509245  0.55451423  0.6706189 -1.23927415
## 58 -5.5803991 -2.85497337  0.3868060 -0.59530294
## 61  1.1533307  0.18753117  0.3792767 -0.51489211
## 75  3.2602915 -0.26286979 -1.0864605 -0.80689652
## 77  4.0306491  0.36510853 -1.2046542 -0.71226369
## 79 -1.1785578 -0.76090999 -0.4101772 -1.53249336
## 88 -4.8509066 -2.90893682  1.1153282 -0.61301759

SLIDE 29

Simple usage

lm_train = lm(Y~., data=d_kaggle$train)

lm_train %>% summary() %$% r.squared
## [1] 0.9410859

rsquare(lm_train, d_kaggle$train)
## [1] 0.9410859

Y_hat_test1 = predict(lm_train, d_kaggle$test1)
(Y_hat_test1 - as.data.frame(d_kaggle$test1)$Y)^2 %>% mean() %>% sqrt()
## [1] 1.071201

rmse(lm_train, d_kaggle$test1)
## [1] 1.071201

rmse(lm_train, d_kaggle$test2)
## [1] 1.031323

SLIDE 30

Aside: purrr

purrr is a package by Hadley Wickham that improves functional programming in R by focusing on pure and type-stable functions. It provides basic functions for looping over objects and returning a value (of a specific type); think of it as a better version of lapply/sapply/vapply.

  • map() - returns a list.
  • map_lgl() - returns a logical vector.
  • map_int() - returns an integer vector.
  • map_dbl() - returns a double vector.
  • map_chr() - returns a character vector.
  • map_df() - returns a data frame.
  • map2_* - variants for iterating over two vectors simultaneously.


SLIDE 31

Aside: Type Consistency

R is a weakly / dynamically typed language, which means there is no way to define a function that enforces its argument or return types. This flexibility can be useful at times, but it often makes code harder to reason about and requires more verbose code to handle edge cases.

library(purrr)
##
## Attaching package: 'purrr'
## The following objects are masked from 'package:dplyr':
##
##     contains, order_by
## The following object is masked from 'package:magrittr':
##
##     set_names

map_dbl(list(rnorm(1e3), rnorm(1e3), rnorm(1e3)), mean)
## [1] -0.02679155  0.02086224 -0.01229868

map_chr(list(rnorm(1e3), rnorm(1e3), rnorm(1e3)), mean)
## [1] "0.021504"  "0.025358"  "-0.035972"

map_int(list(rnorm(1e3), rnorm(1e3), rnorm(1e3)), mean)
## Error: Can't coerce element 1 from a double to a integer

SLIDE 32

Aside: Anonymous Functions shortcut

An anonymous function is one that is never given a name (i.e. never assigned to a variable). Using base R, we would write something like the following:

sapply(1:10, function(x) x^(x+1))
##  [1]            1            8           81         1024        15625
##  [6]       279936      5764801    134217728   3486784401 100000000000

purrr lets us write anonymous functions using the traditional style, but it also lets us use one-sided formulas, where the value being mapped is referenced by a period (.).

map_dbl(1:10, function(x) x^(x+1))
##  [1]            1            8           81         1024        15625
##  [6]       279936      5764801    134217728   3486784401 100000000000

map_dbl(1:10, ~ .^(.+1))
##  [1]            1            8           81         1024        15625
##  [6]       279936      5764801    134217728   3486784401 100000000000

SLIDE 33

Cross-validation in R with modelr + purrr

lm_models = map(d_kfold$train, ~ lm(Y~., data=.))
str(lm_models, max.level = 1)
## List of 5
##  $ 1:List of 12
##   ..- attr(*, "class")= chr "lm"
##  $ 2:List of 12
##   ..- attr(*, "class")= chr "lm"
##  $ 3:List of 12
##   ..- attr(*, "class")= chr "lm"
##  $ 4:List of 12
##   ..- attr(*, "class")= chr "lm"
##  $ 5:List of 12
##   ..- attr(*, "class")= chr "lm"

map2_dbl(lm_models, d_kfold$train, rsquare)
##         1         2         3         4         5
## 0.9201087 0.9137336 0.9285830 0.9379184 0.9301658

map2_dbl(lm_models, d_kfold$test, rmse)
##         1         2         3         4         5
## 0.8957795 0.6809255 0.8808314 1.0825899 0.8538277
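A common summary of overall out-of-sample performance is then just the average of the per-fold rmse values:

mean(map2_dbl(lm_models, d_kfold$test, rmse))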


SLIDE 34

Getting modelr to play nice with rjags

We used the following code to fit our model previously; let's generalize / functionalize it so we can use it with modelr.

m = jags.model(
  textConnection(model),
  data = list(Y=c(Y), X1=X1, X2=X2, X3=X3)
)

update(m, n.iter=1000, progress.bar="none")

samp = coda.samples(
  m, variable.names=c("beta","sigma"),
  n.iter=5000, progress.bar="none"
)
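This assumes the JAGS model from earlier is available as a character string named model, i.e.

model = "model{
  # Likelihood
  for(i in 1:length(Y)){
    Y[i] ~ dnorm(mu[i], tau2)
    mu[i] <- beta[1] + beta[2]*X1[i] + beta[3]*X2[i] + beta[4]*X3[i]
  }

  # Prior for beta
  for(j in 1:4){
    beta[j] ~ dnorm(0, 1/100)
  }

  # Prior for the inverse variance
  tau2 ~ dgamma(1, 1)
  sigma <- 1/sqrt(tau2)
}"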


SLIDE 35

Fitting the model

fit_jags_lm = function(data, n_burnin=1000, n_samps=5000) {
  data = as.data.frame(data, optional=TRUE)

  m = jags.model(textConnection(model), data = data, quiet=TRUE)
  update(m, n.iter=n_burnin, progress.bar="none")

  coda.samples(
    m, variable.names=c("beta","sigma"),
    n.iter=n_samps, progress.bar="none"
  )[[1]]
}

SLIDE 36

Predicting from the model

predict_jags_lm = function(samp, newdata) {
  data = as.data.frame(newdata, optional=TRUE) %>% tbl_df()
  n = nrow(data)
  n_sim = nrow(samp)  # number of posterior draws

  beta0_post = samp[,1]; beta1_post = samp[,2]
  beta2_post = samp[,3]; beta3_post = samp[,4]
  sigma_post = samp[,5]

  data$post_pred = list(NA)
  for(i in 1:n) {
    mu = beta0_post * 1 + beta1_post * data$X1[i] +
         beta2_post * data$X2[i] + beta3_post * data$X3[i]
    error = rnorm(n_sim, sd = sigma_post)
    data$post_pred[[i]] = mu + error
  }

  data
}

SLIDE 37

Empirical Coverage

empcov = function(pred, obs_col, width=0.9) {
  cred_int = map(pred$post_pred, ~ HPDinterval(., width)) %>%
    do.call(rbind, .)

  observed = pred[[obs_col]]

  data = cbind(pred, cred_int) %>%
    tbl_df() %>%
    mutate(capture = lower <= observed & upper >= observed)

  cat(width*100, "% CI empirical coverage = ",
      round(sum(data$capture)/nrow(data), 3), "\n", sep="")

  invisible(data)
}

SLIDE 38

Putting it together

model_fit = fit_jags_lm(d_kaggle$train)

train_pred = predict_jags_lm(model_fit, newdata = d_kaggle$train)
test1_pred = predict_jags_lm(model_fit, newdata = d_kaggle$test1)
test2_pred = predict_jags_lm(model_fit, newdata = d_kaggle$test2)

empcov(train_pred, obs_col="Y", width=0.9)
## 90% CI empirical coverage = 0.913

empcov(test1_pred, obs_col="Y", width=0.9)
## 90% CI empirical coverage = 0.733

empcov(test2_pred, obs_col="Y", width=0.9)
## 90% CI empirical coverage = 0.875