Lecture 2 Diagnostics and Model Evaluation Colin Rundel 10/05/2018 - PowerPoint PPT Presentation

Lecture 2 Diagnostics and Model Evaluation Colin Rundel 10/05/2018 1

Some more linear models

Linear model and data ggplot (d, aes (x=x,y=y)) + 2 geom_smooth (method=”lm”, color=”blue”, se = FALSE) geom_line () + 7.5 5.0 2.5 y 0.0 −2.5 0 25 50 75 100 x

Linear model ## --- -3.32 0.00126 ** ## x 0.068409 0.005335 12.82 < 2e-16 *** ## Signif. codes: ## (Intercept) -1.030315 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.54 on 98 degrees of freedom ## Multiple R-squared: 0.6266, Adjusted R-squared: 0.6227 ## F-statistic: 164.4 on 1 and 98 DF, p-value: < 2.2e-16 0.310326 Estimate Std. Error t value Pr(>|t|) l = lm (y ~ x, data=d) Min summary (l) ## ## Call: ## lm(formula = y ~ x, data = d) ## ## Residuals: ## 1Q ## Median 3Q Max ## -2.6041 -1.2142 -0.1973 1.1969 3.7072 ## ## Coefficients: 3

Bayesian model specification (JAGS) for(j in 1:2){ }” sigma2 = 1/tau tau ~ dgamma(1, 1) # Prior for sigma / tau2 } beta[j] ~ dnorm(0,1/100) # Prior for beta model = } mu[i] = beta[1] + beta[2]*x[i] y[i] ~ dnorm(mu[i], tau) for(i in 1:length(y)){ # Likelihood ”model{ 4

Bayesian model fitting (JAGS) ..- attr(*, ”mcpar”)= num [1:3] 1001 6000 1 ## ..- attr(*, ”mcpar”)= num [1:3] 1001 6000 1 ## $ : 'mcmc' num [1:5000, 1:3] -0.981 -1.087 -0.768 -0.831 -0.907 ... ## ..- attr(*, ”dimnames”)=List of 2 ## ## ## $ : 'mcmc' num [1:5000, 1:3] -0.67 -0.854 -0.959 -1.169 -1.375 ... ## ..- attr(*, ”dimnames”)=List of 2 ## ..- attr(*, ”mcpar”)= num [1:3] 1001 6000 1 ## - attr(*, ”class”)= chr ”mcmc.list” ..- attr(*, ”dimnames”)=List of 2 $ : 'mcmc' num [1:5000, 1:3] -1.57 -1.55 -1.55 -1.69 -1.32 ... n_burn = 1000; n_iter = 5000 ## m = rjags:: jags.model ( textConnection (model), data=d, quiet=TRUE, n.chains = 4 ) update (m, n.iter=n_burn, progress.bar=”none”) samp = rjags:: coda.samples ( n.iter=n_iter, progress.bar=”none” ) str (samp, max.level=1) ## List of 4 ## $ : 'mcmc' num [1:5000, 1:3] -0.747 -0.677 -0.911 -0.906 -0.794 ... ## ..- attr(*, ”dimnames”)=List of 2 ## ..- attr(*, ”mcpar”)= num [1:3] 1001 6000 1 5 m, variable.names= c (”beta”,”sigma2”),

tidybayes i .variable .value parameter 1.69 sigma2 NA sigma2 4995 19995 4 ## 1 <dbl> <chr> <int> <int> <int> <chr> <int> ## .chain .iteration .draw 4 ## parameter, .chain [1] ## # Groups: ## # A tibble: 6 x 7 tail (df_mcmc) 0.0625 beta[2] 2 beta 3 3 ## 2 4996 19996 ## 6 ## 5 2.43 sigma2 NA sigma2 5000 20000 4 ## 6 1.81 sigma2 NA sigma2 4999 19999 4 2.65 sigma2 NA sigma2 NA sigma2 4998 19998 4 ## 4 2.20 sigma2 NA sigma2 4997 19997 4 ## 3 2.43 sigma2 1 beta[1] df_mcmc = tidybayes:: gather_draws (samp, beta[i], sigma2) %>% ## -0.747 1 beta 1 1 1 ## 1 <dbl> <chr> <int> <int> <int> <chr> <int> .value parameter ## 2 i .variable .chain .iteration .draw ## parameter, .chain [2] ## # Groups: ## # A tibble: 6 x 7 head (df_mcmc) group_by (parameter, .chain) mutate (parameter = paste0 (.variable, ifelse ( is.na (i),””, paste0 (”[”,i,”]”)))) %>% beta[1] 1 -0.911 1 1 beta 3 3 1 ## 5 0.0581 beta[2] 2 beta 2 2 ## 4 1 beta[1] -0.677 1 beta 2 2 1 ## 3 0.0646 beta[2] 2 beta 1 6

Posterior plots ggplot (df_mcmc, aes (fill= as.factor (.chain), group=.chain, x=.value)) + 7 facet_wrap (~parameter, scales = ”free”) geom_density (alpha=0.5, color=NA) + beta[1] beta[2] sigma2 1.25 80 1.00 1.0 60 as.factor(.chain) 0.75 1 density 40 2 3 0.50 0.5 4 20 0.25 0.0 0 0.00 −2.0 −1.5 −1.0 −0.5 0.0 0.05 0.06 0.07 0.08 0.09 2 3 4 .value

Trace plots geom_smooth (method=”loess”) + labs (color=”chain”) df_mcmc %>% filter (.iteration %% 10 == 0) %>% 8 facet_grid (parameter~.chain, scale=”free_y”) + geom_line (alpha=0.5) + ggplot ( aes (x=.iteration, y=.value, color= as.factor (.chain))) + 1 2 3 4 0.0 −0.5 beta[1] −1.0 −1.5 −2.0 chain 0.08 1 .value beta[2] 0.07 2 0.06 3 0.05 4 4.0 3.5 sigma2 3.0 2.5 2.0 1.5 0 1000 2000 3000 4000 50000 1000 2000 3000 4000 50000 1000 2000 3000 4000 50000 1000 2000 3000 4000 5000 .iteration

Credible Intervals 0.8 mean hdi 0.8 mean 0.0751 0.0613 0.0685 3 7 beta[2] ## hdi 0.0748 8 beta[2] 0.0612 0.0683 2 6 beta[2] ## hdi 0.8 mean 0.0748 0.0614 ## 4 1 0.8 mean ## # ... with 14 more rows hdi 0.8 mean 2.78 1.92 2.39 2 ## 10 sigma2 hdi 2.78 0.0684 1.91 2.39 1 9 sigma2 ## hdi 0.8 mean 0.0752 0.0613 0.0680 5 beta[2] df_ci = tidybayes:: mean_hdi (df_mcmc, .value, .width= c (0.8, 0.95)) <chr> 1 -1.01 1 beta[1] ## <chr> <dbl> <chr> <dbl> <dbl> <dbl> <int> ## -0.635 .upper .width .point .interval .lower .value parameter .chain ## parameter [3] ## # Groups: ## # A tibble: 24 x 8 df_ci -1.42 0.8 mean ## -0.622 hdi 0.8 mean -0.606 -1.42 4 -1.03 4 beta[1] ## hdi 0.8 mean -1.44 hdi 3 -1.03 3 beta[1] ## hdi 0.8 mean -0.627 -1.42 2 -1.03 2 beta[1] ## 9

Aside - mean_qi vs mean_hdi These differ in the use of the quantile interval vs. the highest-density interval. 10

Aside - mean_qi vs mean_hdi These differ in the use of the quantile interval vs. the highest-density 10 interval. dist_1 dist_2 4 3 hpd 2 1 prob density 0 0.5 0.95 4 3 qi 2 1 0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 2.0 x

Caterpillar Plots ylim (0.5,4.5) df_ci %>% 11 tidybayes:: geom_pointintervalh () + ggplot ( aes (x=.value, y=.chain, color= as.factor (.chain))) + facet_grid (parameter~.) + 4 beta[1] 3 2 1 4 .chain 3 beta[2] 2 1 4 sigma2 3 2 1 −1 0 1 2 3 .value

Prediction

Revised model for(j in 1:2){ }” sigma2 = 1/tau tau ~ dgamma(1, 1) # Prior for sigma / tau2 } beta[j] ~ dnorm(0,1/100) # Prior for beta model_pred = } y_pred[i] ~ dnorm(mu[i], tau) y[i] ~ dnorm(mu[i], tau) mu[i] = beta[1] + beta[2]*x[i] for(i in 1:length(y)){ # Likelihood ”model{ 12

Revised fitting n_burn = 1000; n_iter = 5000 m = rjags:: jags.model ( textConnection (model_pred), data=d, quiet=TRUE, n.chains = 1 ) update (m, n.iter=n_burn, progress.bar=”none”) pred = rjags:: coda.samples ( n.iter=n_iter, progress.bar=”none” ) 13 m, variable.names= c (”beta”,”sigma2”,”mu”,”y_pred”,”y”,”x”),

Predictions 6 7 7 1 7 ## 1 -0.696 -2.54 0.172 -3.24 1 6 1 -3.24 6 ## 1 -0.718 -2.52 0.396 -3.24 1 5 5 1 5 1 -1.24 1 -0.797 -2.44 1 -0.727 -2.51 9 ## # ... with 499,990 more rows 1 -0.573 -2.67 1 -0.503 -3.24 10 10 1 ## 10 1 -0.863 -2.38 1 -0.829 -3.24 9 ## 1 9 ## 1 -0.736 -2.50 1 -0.251 -3.24 8 8 1 8 ## -3.24 df_pred = tidybayes:: spread_draws (pred, y_pred[i], y[i], x[i], mu[i]) %>% mu resid 1 1 1 ## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> ## x 1 -1.67 y i y_pred .chain .iteration .draw ## i [100] ## # Groups: ## # A tibble: 500,000 x 9 df_pred mutate (resid = y - mu) 1 -3.24 1 -3.26 1 4 4 1 4 ## 1 -0.800 -2.44 1 -0.554 -3.24 3 3 3 1 -0.825 -2.42 ## 1 -0.984 -2.26 0.340 -3.24 1 2 2 1 2 ## 14

𝜈 vs 𝑧 𝑞𝑠𝑓𝑒 15 i=1 i=20 i=40 2.0 1.5 1.0 0.5 param 0.0 density mu i=60 i=80 i=100 y_pred 2.0 1.5 1.0 0.5 0.0 −8 −4 0 4 8 12−8 −4 0 4 8 12−8 −4 0 4 8 12 y

Predictions geom_point (data=d, aes (y=y)) df_pred %>% ungroup () %>% filter (.iteration %% 10 == 0) %>% 16 geom_line ( aes (y=mu, group=.iteration), color=”red”, alpha=0.05) + ggplot ( aes (x=x)) + 7.5 5.0 2.5 mu 0.0 −2.5 0 25 50 75 100 x

Posterior distribution ( 𝜈 ) geom_point (data=d, aes (y=y)) df_pred %>% ungroup () %>% 17 ggplot ( aes (x=x)) + tidybayes:: stat_lineribbon ( aes (y=mu), alpha=0.5) + 7.5 5.0 0.95 2.5 mu 0.8 0.5 0.0 −2.5 0 25 50 75 100 x

Posterior predictive distribution ( 𝑧 𝑞𝑠𝑓𝑒 ) geom_point (data=d, aes (y=y)) df_pred %>% ungroup () %>% 18 tidybayes:: stat_lineribbon ( aes (y=y_pred), alpha=0.5) + ggplot ( aes (x=x)) + 5 0.95 y_pred 0.8 0.5 0 0 25 50 75 100 x

Residual plot df_pred %>% ungroup () %>% ggplot ( aes (x=x, y=resid)) + geom_boxplot ( aes (group=x), outlier.alpha = 0.2) 19 4 2 resid 0 −2 0 25 50 75 100 x

Model Evaluation

( ̂ (𝑍 𝑗 − 𝑍 𝑗 − (𝑍 𝑗 − 𝑗=1 ( ̂ Var ( ̂ 𝑍 𝑗 − 𝑆 2 = 𝑇𝑇 𝑛𝑝𝑒𝑓𝑚 2 = Var (𝐙) = Cor (𝐙, 𝑗=1 (𝑍 𝑗 − 𝑇𝑇 𝑢𝑝𝑢𝑏𝑚 = ∑ 𝑜 ̄ ∑ 𝑍 ) 2 2 𝑜 ̄ 𝑍 ) 𝐙) ̂ 𝐙) 2 Error Model assessment? 𝑍 𝑗 ) 𝑜 which gives us the variability in 𝑍 explained by our model. Quick review: 𝑜 ∑ 𝑗=1 ̄ 𝑍 ) 2 Total = ∑ ̂ 𝑗=1 ̄ 𝑍 ) 2 Model + 𝑜 ∑ 𝑗=1 If we think back to our first regression class, one common option is 𝑆 2 20

Lecture 2 Diagnostics and Model Evaluation Colin Rundel 10/05/2018 - PowerPoint PPT Presentation

Lecture 2 Diagnostics and Model Evaluation Colin Rundel 10/05/2018 1 Some more linear models Linear model and data ggplot (d, aes (x=x,y=y)) + 2 geom_smooth (method=lm, color=blue, se = FALSE) geom_line () + 7.5 5.0 2.5 y 0.0

Malaysian Healthy Ageing Society Plenary Lecture Plenary Lecture Plenary Lecture Plenary

CEE 680 Lecture #2 1/22/2020 1 CEE 680 Lecture #2 1/22/2020 2 CEE 680 Lecture #2

Pocket Lecture Pocket Lecture Pocket Lecture Pocket Lecture Listen Audio Notes Progress

Multiphase Modelling in Cancer Helen Byrne Wolfson Centre for Mathematical Biology Mathematical

Previous Lecture Todays Lecture Slides for Lecture 5 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 30 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 28 Completion of divide-by-3 counter

Previous Lecture Todays Lecture Slides for Lecture 12 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 3 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 2 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 35 ENEL 353: Digital Circuits Fall

Lecture Capture Introduction to Lecture Capture Learning Outcomes What will lecture capture

Previous Lecture Todays Lecture Slides for Lecture 32 Completion of a timing analysis

Repetition Automatic Control, Basic Course, Lecture 11 Fredrik Bagge Carlson December 17, 2016

Previous Lecture Todays Lecture Slides for Lecture 26 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 33 ENEL 353: Digital Circuits Fall

Certified Global Optimization using Max-Plus based Templates Joint Work with B. Werner, S.

CS525u Perceptual Quality Multimedia Computing Network Issues The Science (or

Video-helper Abhishek Mangla, Faramarz Munshi, Yuzhou Guo Motivation & Goal Goal : Video

Integrating STEM & Student Success: Leading Intrusive Academic Advising Through an

Structured Prediction Basics Graham Neubig Site https://phontron.com/class/nn4nlp2019/ A

Leakage-Resilient Cryptography from Puncturable Primitives and Obfuscation ASIACRYPT 2018 Dec.

Foundation of Cryptography (0368-4162-01), Lecture 2 Pseudorandom Generators Iftach Haitner, Tel

Supplemental NDA/BLA Submissions in Oncology Considerations for Summary Review of Supplemental

Lecture 2 Diagnostics and Model Evaluation Colin Rundel 10/05/2018 - PowerPoint PPT Presentation

Lecture 2 Diagnostics and Model Evaluation Colin Rundel 10/05/2018 1 Some more linear models Linear model and data ggplot (d, aes (x=x,y=y)) + 2 geom_smooth (method=lm, color=blue, se = FALSE) geom_line () + 7.5 5.0 2.5 y 0.0

Malaysian Healthy Ageing Society Plenary Lecture Plenary Lecture Plenary Lecture Plenary

CEE 680 Lecture #2 1/22/2020 1 CEE 680 Lecture #2 1/22/2020 2 CEE 680 Lecture #2

Pocket Lecture Pocket Lecture Pocket Lecture Pocket Lecture Listen Audio Notes Progress

Multiphase Modelling in Cancer Helen Byrne Wolfson Centre for Mathematical Biology Mathematical

Previous Lecture Todays Lecture Slides for Lecture 5 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 30 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 28 Completion of divide-by-3 counter

Previous Lecture Todays Lecture Slides for Lecture 12 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 3 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 2 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 35 ENEL 353: Digital Circuits Fall

Lecture Capture Introduction to Lecture Capture Learning Outcomes What will lecture capture

Previous Lecture Todays Lecture Slides for Lecture 32 Completion of a timing analysis

Repetition Automatic Control, Basic Course, Lecture 11 Fredrik Bagge Carlson December 17, 2016

Previous Lecture Todays Lecture Slides for Lecture 26 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 33 ENEL 353: Digital Circuits Fall

Certified Global Optimization using Max-Plus based Templates Joint Work with B. Werner, S.

CS525u Perceptual Quality Multimedia Computing Network Issues The Science (or

Video-helper Abhishek Mangla, Faramarz Munshi, Yuzhou Guo Motivation &amp; Goal Goal : Video

Integrating STEM &amp; Student Success: Leading Intrusive Academic Advising Through an

Structured Prediction Basics Graham Neubig Site https://phontron.com/class/nn4nlp2019/ A

Leakage-Resilient Cryptography from Puncturable Primitives and Obfuscation ASIACRYPT 2018 Dec.

Foundation of Cryptography (0368-4162-01), Lecture 2 Pseudorandom Generators Iftach Haitner, Tel

Supplemental NDA/BLA Submissions in Oncology Considerations for Summary Review of Supplemental

Video-helper Abhishek Mangla, Faramarz Munshi, Yuzhou Guo Motivation & Goal Goal : Video

Integrating STEM & Student Success: Leading Intrusive Academic Advising Through an