Exploring and Modeling Dichotomous Outcomes Brandon LeBeau - - PowerPoint PPT Presentation



SLIDE 1

DataCamp Longitudinal Analysis in R

Exploring and Modeling Dichotomous Outcomes

LONGITUDINAL ANALYSIS IN R

Brandon LeBeau

Assistant Professor

SLIDE 2

Dichotomous outcomes

Dichotomous or binary outcomes take two values.

Examples:
- 0 = No, 1 = Yes
- 0 = Not Present, 1 = Present
- 0 = Not Proficient, 1 = Proficient
- 0 = No symptoms, 1 = Symptoms

SLIDE 3

Exploring data with dichotomous outcomes

library(HSAUR2)
head(toenail, n = 10)

   patientID            outcome    treatment       time visit
1          1 moderate or severe  terbinafine  0.0000000     1
2          1 moderate or severe  terbinafine  0.8571429     2
3          1 moderate or severe  terbinafine  3.5357140     3
4          1       none or mild  terbinafine  4.5357140     4
5          1       none or mild  terbinafine  7.5357140     5
6          1       none or mild  terbinafine 10.0357100     6
7          1       none or mild  terbinafine 13.0714300     7
8          2       none or mild itraconazole  0.0000000     1
9          2       none or mild itraconazole  0.9642857     2
10         2 moderate or severe itraconazole  2.0000000     3

SLIDE 4

Generalized linear mixed model (GLMM)

- Models the log-odds of success
- Success refers to the outcome coded as 1
- Models for continuous outcomes are not appropriate because:
  - predictions often fall out of bounds (below 0 or above 1)
  - the mean and variance of a binary outcome are related
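To make the log-odds scale concrete, here is a minimal base-R sketch. The probabilities are made-up illustration values, not taken from the toenail data. It converts probabilities to log-odds and back, and shows that the variance of a 0/1 outcome changes with its mean.

```r
# Illustrative probabilities of "success" (hypothetical values)
p <- c(0.1, 0.5, 0.9)

# Log-odds (logit): the scale on which a binomial GLMM is linear
log_odds <- log(p / (1 - p))        # equivalent to qlogis(p)

# The inverse logit maps any real number back into (0, 1)
p_back <- 1 / (1 + exp(-log_odds))  # equivalent to plogis(log_odds)

# For a 0/1 outcome the variance is p * (1 - p), so it depends on the mean
bern_var <- p * (1 - p)

print(data.frame(p, log_odds, p_back, bern_var))
```

Because the inverse logit always returns a value strictly between 0 and 1, predictions built on the log-odds scale cannot fall out of bounds once they are converted back to probabilities.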

SLIDE 5

Changes in the outcome variable over time

library(dplyr)

# Code the outcome as 0/1 and start the visit counter at 0
toenail <- toenail %>%
  mutate(outcome_dich = ifelse(outcome == "none or mild", 1, 0),
         visit_0 = visit - 1)

# Proportion of "none or mild" outcomes and number of observations per visit
toenail %>%
  group_by(visit_0) %>%
  summarise(prop_outcome = mean(outcome_dich), num = n())

  visit_0 prop_outcome   num
    <dbl>        <dbl> <int>
1       0        0.629   294
2       1        0.663   288
3       2        0.703   283
4       3        0.787   272
5       4        0.916   263
6       5        0.926   244
7       6        0.924   264
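Since the models that follow work on the log-odds scale, it can help to look at the observed proportions on that scale too. The sketch below is an illustration only: it retypes the proportions from the summary above rather than recomputing them from the data.

```r
# Observed proportions of "none or mild" by visit, copied from the summary above
visit_0 <- 0:6
prop_outcome <- c(0.629, 0.663, 0.703, 0.787, 0.916, 0.926, 0.924)

# Empirical log-odds at each visit
emp_logit <- qlogis(prop_outcome)

# Change in log-odds between consecutive visits
step_change <- diff(emp_logit)

print(round(data.frame(visit_0, prop_outcome, emp_logit), 3))
print(round(step_change, 3))
```

If the steps were roughly constant, a single linear slope for visit_0 on the log-odds scale would describe the trend well; uneven steps suggest the linear term is only an approximation.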

SLIDE 6

Fitting GLMM with lme4

Fitting a GLMM with lme4 is similar to fitting the models in previous chapters.

Two additions:
- use glmer() instead of lmer()
- specify the family = binomial argument

library(lme4)

# Random-intercept logistic model: visit and treatment as fixed effects
toe_output <- glmer(outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID),
                    data = toenail, family = binomial)
summary(toe_output)

SLIDE 7

GLMM output

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID)
   Data: toenail

     AIC      BIC   logLik deviance df.resid
  1260.3   1282.6   -626.2   1252.3     1904

Random effects:
 Groups    Name        Variance Std.Dev.
 patientID (Intercept) 21.97    4.687
Number of obs: 1908, groups:  patientID, 294

Fixed effects:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)           1.96335    0.81901   2.397   0.0165 *
visit_0               0.91153    0.07433  12.263   <2e-16 ***
treatmentterbinafine  0.69688    0.68696   1.014   0.3104
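The fixed effects above are on the log-odds scale. As a rough aid to reading them, the sketch below retypes the estimates from the output and converts them to odds ratios and to predicted probabilities for a patient whose random intercept is 0. The variable names are made up for this illustration.

```r
# Fixed-effect estimates copied from the summary output above
b_intercept <- 1.96335
b_visit     <- 0.91153
b_treat     <- 0.69688   # terbinafine vs. itraconazole (the reference level)

# Odds ratios: multiplicative change in the odds of "none or mild"
or_visit <- exp(b_visit)
or_treat <- exp(b_treat)

# Predicted probability at each visit for a patient in the reference
# treatment group whose random intercept equals 0
visit_0 <- 0:6
p_typical <- plogis(b_intercept + b_visit * visit_0)

print(round(c(or_visit = or_visit, or_treat = or_treat), 3))
print(round(p_typical, 3))
```

These are patient-specific (conditional) quantities: they describe a patient with a given random intercept, not the average over all patients.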

SLIDE 8

Time to practice!


SLIDE 9

Generalized Estimating Equations (GEE)


SLIDE 10

Introduction to geepack

Let's fit a first GEE model using the geepack package

geeglm() is the model fitting function

library(dplyr)
library(geepack)

toenail <- toenail %>%
  mutate(outcome_dich = ifelse(outcome == "none or mild", 1, 0),
         visit_0 = visit - 1)

# Fit GEE model
gee_toe <- geeglm(outcome_dich ~ 1 + visit_0, data = toenail,
                  id = patientID, family = binomial, scale.fix = TRUE)

# Extract model summary
summary(gee_toe)

SLIDE 11

geeglm output

Call:
geeglm(formula = outcome_dich ~ 1 + visit_0, family = binomial,
    data = toenail, id = patientID, scale.fix = TRUE)

 Coefficients:
            Estimate Std.err    Wald Pr(>|W|)
(Intercept)  0.35522 0.13122   7.328  0.00679 **
visit_0      0.38319 0.03728 105.673  < 2e-16 ***
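The Wald column in this output is the squared ratio of the estimate to its standard error, compared against a chi-square distribution with 1 degree of freedom. The sketch below retypes the numbers from the output to check that, and converts the coefficients into population-averaged probabilities by visit. Small differences from the printed values are expected because the inputs are rounded.

```r
# Estimates and standard errors copied from the geeglm output above
est <- c("(Intercept)" = 0.35522, visit_0 = 0.38319)
se  <- c("(Intercept)" = 0.13122, visit_0 = 0.03728)

# Wald statistic and its p-value (chi-square with 1 degree of freedom)
wald <- (est / se)^2
p_value <- pchisq(wald, df = 1, lower.tail = FALSE)

# Population-averaged probability of "none or mild" at each visit
visit_0 <- 0:6
p_marginal <- plogis(est[["(Intercept)"]] + est[["visit_0"]] * visit_0)

print(round(wald, 3))
print(signif(p_value, 3))
print(round(p_marginal, 3))
```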

SLIDE 12

Specifying working correlations

An optional argument, corstr, controls the working correlation matrix.
- Accounts for the dependency due to repeated measures
- The default is independence

# Fit GEE model
gee_toe <- geeglm(outcome_dich ~ 1 + visit_0, data = toenail,
                  id = patientID, family = binomial,
                  corstr = 'exchangeable', scale.fix = TRUE)

# Extract model summary
summary(gee_toe)

SLIDE 13

GEE exchangeable output

Here is the exchangeable output

Call:
geeglm(formula = outcome_dich ~ 1 + visit_0, family = binomial,
    data = toenail, id = patientID, corstr = "exchangeable", scale.fix = TRUE)

 Coefficients:
            Estimate Std.err   Wald Pr(>|W|)
(Intercept)   0.3332  0.1345   6.14    0.013 *
visit_0       0.3797  0.0363 109.29   <2e-16 ***

SLIDE 14

Other working correlation structures

corstr = "ar1"

Example: correlation = 0.5

       [,1]  [,2] [,3]  [,4]   [,5]
[1,] 1.0000 0.500 0.25 0.125 0.0625
[2,] 0.5000 1.000 0.50 0.250 0.1250
[3,] 0.2500 0.500 1.00 0.500 0.2500
[4,] 0.1250 0.250 0.50 1.000 0.5000
[5,] 0.0625 0.125 0.25 0.500 1.0000

corstr = "unstructured"

      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] 1.000 0.559 0.492 0.363 0.082
[2,] 0.559 1.000 0.398 0.250 0.139
[3,] 0.492 0.590 1.000 0.071 0.209
[4,] 0.398 0.493 0.629 1.000 0.166
[5,] 0.363 0.313 0.426 0.604 1.000
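The ar1 example above follows a simple rule: the correlation between two visits is 0.5 raised to the number of visits separating them. The base-R sketch below rebuilds that matrix and, for comparison, an exchangeable and an independence matrix of the same size. The value 0.5 and the five visits mirror the slide's example; they are not estimates from the toenail data.

```r
n_visits <- 5
rho <- 0.5   # example correlation, as on the slide

# Number of visits separating each pair of measurement occasions
lag <- abs(outer(seq_len(n_visits), seq_len(n_visits), "-"))

# AR(1): correlation decays as rho^lag
ar1 <- rho^lag

# Exchangeable: every pair of visits shares the same correlation
exch <- matrix(rho, nrow = n_visits, ncol = n_visits)
diag(exch) <- 1

# Independence: no correlation between visits
indep <- diag(n_visits)

print(ar1)
print(exch)
print(indep)
```

Independence has no correlation parameter, exchangeable and ar1 each have one, and an unstructured matrix has one per pair of visits, which is why it is the most flexible and the most demanding to estimate.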

SLIDE 15

Try GEE models!


SLIDE 16

Model Selection


SLIDE 17

QIC

QIC = quasi-likelihood under the independence model criterion
- GEE does not use maximum likelihood estimation like GLMM, so likelihood-based criteria such as AIC are not available
- QIC is used for GEE instead
- The MuMIn package calculates this statistic

library(MuMIn)

toenail <- toenail %>%
  mutate(outcome_dich = ifelse(outcome == "none or mild", 1, 0),
         visit_0 = visit - 1)

# Fit GEE model
gee_toe <- geeglm(outcome_dich ~ 1 + visit_0, data = toenail,
                  id = patientID, family = binomial, scale.fix = TRUE)

QIC(gee_toe)

     QIC
1828.552

SLIDE 18

Evaluating working correlation

QIC can help select the working correlation matrix

# Fit GEE models that differ only in the working correlation
gee_ind <- geeglm(outcome_dich ~ 1 + visit_0, data = toenail,
                  id = patientID, family = binomial, scale.fix = TRUE)
gee_exch <- geeglm(outcome_dich ~ 1 + visit_0, data = toenail,
                   id = patientID, family = binomial, scale.fix = TRUE,
                   corstr = 'exchangeable')
gee_ar1 <- geeglm(outcome_dich ~ 1 + visit_0, data = toenail,
                  id = patientID, family = binomial, scale.fix = TRUE,
                  corstr = 'ar1')

QIC(gee_ind, gee_exch, gee_ar1)

              QIC
gee_ind  1828.552
gee_exch 1828.564
gee_ar1  1827.805
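Smaller QIC values indicate the preferred model. As a small illustration, the sketch below retypes the three QIC values from the output above and ranks them by their distance from the smallest. The vector name qic is made up for this example.

```r
# QIC values copied from the output above
qic <- c(gee_ind = 1828.552, gee_exch = 1828.564, gee_ar1 = 1827.805)

# Difference from the best (smallest) QIC
delta_qic <- qic - min(qic)

# Name of the working correlation with the smallest QIC
best <- names(which.min(qic))

print(round(sort(delta_qic), 3))
print(best)
```

The three values differ by less than one unit here, so the data give only a weak preference among these working correlation structures.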

SLIDE 19

Model selection GLMM

The aictab() function from the AICcmodavg package can be used for GLMMs

library(AICcmodavg)

# Candidate models: with and without the treatment effect
toe_baseline <- glmer(outcome_dich ~ 1 + visit_0 + (1 | patientID),
                      data = toenail, family = binomial)
toe_output <- glmer(outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID),
                    data = toenail, family = binomial)

aictab(list(toe_baseline, toe_output), c("no treatment", "treatment"))

Model selection based on AICc:

             K    AICc Delta_AICc AICcWt Cum.Wt      LL
no treatment 3 1259.40       0.00   0.62   0.62 -626.69
treatment    4 1260.36       0.97   0.38   1.00 -626.17
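The AICcWt column can be reproduced by hand from the AICc values, which shows what the weights mean. The sketch below retypes the two AICc values from the table above and applies the standard Akaike weight formula. The variable names are made up for this illustration.

```r
# AICc values copied from the aictab() output above
aicc <- c("no treatment" = 1259.40, "treatment" = 1260.36)

# Difference from the smallest AICc
delta <- aicc - min(aicc)

# Akaike weights: relative likelihood of each model, scaled to sum to 1
rel_lik <- exp(-0.5 * delta)
aicc_wt <- rel_lik / sum(rel_lik)

print(round(aicc_wt, 2))
```

A weight near 0.6 for the model without treatment is only modest support: neither model is clearly preferred, which is consistent with the non-significant treatment coefficient in the GLMM output.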

SLIDE 20

Time to practice model selection!


SLIDE 21

Interpreting and Visualizing Model Results


SLIDE 22

Visualize GLMM

Generate predicted values with the predict() function. By default, the predictions are on the log-odds scale.

library(ggplot2)

toe_output <- glmer(outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID),
                    data = toenail, family = binomial)

# Add patient-specific predictions (log-odds scale) to the data
toenail <- toenail %>%
  mutate(pred_values = predict(toe_output))

# One dashed line per patient
ggplot(toenail, aes(x = visit_0, y = pred_values)) +
  geom_line(aes(group = patientID), linetype = 2) +
  theme_bw(base_size = 16) +
  xlab("Visit Number") +
  ylab("Predicted Values")

SLIDE 23

[Figure: plot produced by the code on slide 22. Predicted values (log-odds) by visit number, one dashed line per patient.]

SLIDE 24

Visualize GLMM - probabilities

Often the probability metric is more intuitive

predict() function with argument type = "response" will give probabilities

# Replace the predictions with predicted probabilities
toenail <- toenail %>%
  mutate(pred_values = predict(toe_output, type = "response"))

ggplot(toenail, aes(x = visit_0, y = pred_values)) +
  geom_line(aes(group = patientID), linetype = 2) +
  theme_bw(base_size = 16) +
  xlab("Visit Number") +
  ylab("Prob of none or mild separation")

SLIDE 25

[Figure: plot produced by the code on slide 24. Predicted probability of none or mild separation by visit number, one dashed line per patient.]

SLIDE 26

Visualize GEE

predict() can again be used here as with GLMMs

gee_toe <- geeglm(outcome_dich ~ 1 + visit_0 + treatment, data = toenail,
                  id = patientID, family = binomial,
                  corstr = 'exchangeable', scale.fix = TRUE)

# Add GEE predicted probabilities to a copy of the data
toenail_gee <- toenail %>%
  mutate(pred_gee = predict(gee_toe, type = "response"))

ggplot(toenail_gee, aes(x = visit_0, y = pred_gee)) +
  geom_line(aes(color = treatment)) +
  theme_bw(base_size = 16) +
  xlab("Visit Number") +
  ylab("Probability of none or mild separation")

SLIDE 27

[Figure: plot produced by the code on slide 26. GEE predicted probability of none or mild separation by visit number, one line per treatment.]

SLIDE 28

Compare GLMM and GEE

# Average the GLMM predicted probabilities by visit and treatment
toenail_glmm <- toenail %>%
  group_by(visit_0, treatment) %>%
  summarise(prob = mean(pred_values))

# Average the GEE predicted probabilities (pred_gee, not pred_values)
toenail_gee <- toenail_gee %>%
  group_by(visit_0, treatment) %>%
  summarise(prob = mean(pred_gee))

# Stack the two summaries and label the model each came from
toenail_agg <- bind_rows(
  mutate(toenail_glmm, model = "GLMM"),
  mutate(toenail_gee, model = "GEE")
)

ggplot(toenail_agg, aes(x = visit_0, y = prob)) +
  geom_line(aes(color = treatment, linetype = model), size = 1) +
  theme_bw(base_size = 16) +
  xlab("Visit Number") +
  ylab("Prob of none or mild separation")
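One reason GLMM and GEE results need not coincide is that GLMM coefficients are patient-specific while GEE coefficients are population-averaged. The sketch below illustrates that general distinction and is not a reproduction of the slide's plot. It retypes the GLMM intercept and random-intercept standard deviation from the earlier GLMM output and assumes the random intercepts are normally distributed, then averages the patient-specific probability over that distribution by numerical integration.

```r
# Estimates copied from the GLMM summary output (slide 7)
b_intercept <- 1.96335   # log-odds at visit 0, reference treatment
sd_patient  <- 4.687     # standard deviation of the random intercepts

# Probability for a patient whose random intercept is exactly 0
p_conditional <- plogis(b_intercept)

# Probability averaged over patients, assuming normal random intercepts
integrand <- function(b) {
  plogis(b_intercept + b) * dnorm(b, mean = 0, sd = sd_patient)
}
p_marginal <- integrate(integrand, lower = -Inf, upper = Inf)$value

print(round(c(conditional = p_conditional, marginal = p_marginal), 3))
```

With a random-intercept variance this large, the averaged probability is pulled noticeably toward 0.5 relative to the probability for a typical patient, so the two kinds of estimate answer different questions.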

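SLIDE 29

[Figure: plot produced by the code on slide 28. Mean predicted probability by visit number, colored by treatment, with line type distinguishing GLMM and GEE.]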

SLIDE 30

Let's practice!
