
Lecture 12: Effect modification, and confounding in logistic regression
Ani Manichaikul
amanicha@jhsph.edu
4 May 2007

Today
- Categorical predictor
  - Create dummy variables
  - Just like for linear regression
- Comparing nested models that differ by two or more variables for logistic regression
  - $\chi^2$ test of deviance
  - Analogous to the F test in linear regression
- Effect modification and confounding

Example
- Mean SAT scores were compared for the 50 US states. The goal of the study was to compare overall SAT scores using state-wide predictors such as per-pupil expenditures and average teachers' salary. The investigators also considered the proportion of students eligible to take the SAT who actually took the examination.

Variables
- Outcome
  - Total SAT score [sat_low]
  - 1 = low, 0 = high
- Primary predictor
  - Average expenditures per pupil [expen], in thousands of dollars
  - Continuous, range: 3.65-9.77, mean: 5.9

Variables
- Secondary predictors
  - Percent of pupils taking the SAT, in quartiles
    - percent1: lowest quartile
    - percent2: 2nd quartile
    - percent3: 3rd quartile
    - percent4: highest quartile
  - Mean teacher salary in thousands, in quartiles
    - salary1: lowest quartile
    - salary2: 2nd quartile
    - salary3: 3rd quartile
    - salary4: highest quartile

Modifications to variables
- Expenditures: continuous, and the observed range does not include 0; center at $5,000 per pupil
- Percent: four dummy variables for four categories; one category must be excluded to serve as the reference group
- Salary: four dummy variables for four categories; one category must be excluded to serve as the reference group
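A minimal sketch of these two modifications in Python. The helper names `center` and `quartile_dummies` and the toy values are illustrative assumptions, not the lecture's data or code. The lowest quartile is used as the reference group here, a choice the slide leaves open.

```python
def center(values, at=5.0):
    """Center a continuous predictor (in thousands of dollars) at `at`."""
    return [v - at for v in values]


def quartile_dummies(quartiles, reference=1, levels=(1, 2, 3, 4)):
    """Return one 0/1 indicator column for each non-reference quartile."""
    kept = [q for q in levels if q != reference]
    return {q: [1 if obs == q else 0 for obs in quartiles] for q in kept}


# Hypothetical toy data: three states
expen = [3.65, 5.9, 9.77]   # per-pupil expenditure, in thousands
salary_q = [1, 3, 4]        # salary quartile of each state

expen_c = center(expen)                      # 0 now means $5,000 per pupil
salary_dummies = quartile_dummies(salary_q)  # columns for quartiles 2, 3, 4
```

After centering, the intercept refers to a state spending $5,000 per pupil in the reference quartile, which is a value inside the observed range.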

Plan
- Assess primary relationship
- Add each secondary predictor separately
- Determine which secondary predictor is more statistically significant
- Add other secondary predictor to model with "better" secondary predictor

The $\chi^2$ Test of Deviance
- We would like to consider adding salary quartiles to our model
- We want to compare the parent model to an extended model, which differs by the three dummy variables for the four salary quartiles
- The $\chi^2$ test of deviance compares nested models
- We use it for nested models that differ by two or more variables because the Wald test for a single coefficient does not apply in that situation

1. Get the log likelihood from both models
- The log likelihood is shown in the upper right corner of the logit or logistic output
- Null model: LL = -28.94
- Extended model B: LL = -28.25

2. Find the deviance for each model
Deviance = -2 x (log likelihood)
- Deviance is analogous to the residual sum of squares (RSS) in linear regression; it measures the variation the model still leaves unexplained
- A saturated model is one in which every Y is perfectly predicted
- Null model: Deviance = -2(-28.94) = 57.88
- Extended model B: Deviance = -2(-28.25) = 56.50

3. Find the change in deviance between the nested models
- Null model: Deviance = 57.88
- Extended model B: Deviance = 56.50
- Change in deviance = deviance(null) - deviance(extended) = 57.88 - 56.50 = 1.38

4. Evaluate the change in deviance
- The change in deviance from the parent model to the extended model is an observed chi-square statistic
- df = number of variables added
- $H_0$: all new $\beta$'s are 0 in the population
- or $H_0$: the parent model is adequate

4. Evaluate the change in deviance
- $H_0$: After adjusting for per-pupil expenditures, teachers' salary is not an important predictor of SAT score.
- $\chi^2_{\text{obs}} = 1.38$
- df = 3
- With 3 df and $\alpha = 0.05$, $\chi^2_{\text{cr}}$ is 7.81
- Since 1.38 < 7.81, fail to reject $H_0$
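The four steps can be sketched in Python using the two log likelihoods quoted on the slides. The function names are illustrative assumptions. The tail probability uses a closed form that holds only for 3 degrees of freedom, so a different number of added variables would need a general chi-square routine.

```python
import math


def deviance(log_likelihood):
    """Deviance = -2 x (log likelihood)."""
    return -2.0 * log_likelihood


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


ll_null = -28.94       # log likelihood of the null (parent) model
ll_extended = -28.25   # log likelihood of extended model B

# Steps 2 and 3: deviances and their difference
change = deviance(ll_null) - deviance(ll_extended)

# Step 4: compare with the critical value for 3 df at alpha = 0.05
critical_value = 7.81
reject_h0 = change > critical_value
p_value = chi2_sf_3df(change)

print(f"change in deviance = {change:.2f}, reject H0: {reject_h0}")
```

The p-value comes out at roughly 0.7, which agrees with the slide's decision not to reject $H_0$.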

Notes about deviance test
- The deviance test gives us a framework in which to add several predictors to a model simultaneously
- Can only handle nested models
- Analogous to F-test for linear regression
- Also known as a "likelihood ratio test"

Conclusions
- Per-pupil expenditure is associated with SAT score
- After adjusting for per-pupil expenditure
  - Percent of students taking the SAT is statistically significant
  - Teachers' salary is not statistically significant
- Is salary significant after adjusting for both expenditure and percent?

Possible ways to improve this model:
- Add an interaction variable
  - Does the effect of expenditures on odds of low mean SAT score vary between states with low and high percentages of students taking the SAT?
- Add a spline
  - Does the effect of expenditures on odds of low mean SAT score vary over the level of expenditures?

Effect Modification in Logistic Regression
Heart Disease, Smoking and Coffee

Effect modification
- Just like with linear regression, we may want to allow different relationships between the primary predictor and outcome across levels of another covariate
- Can model such relationships by fitting interaction terms
- Modelling effect modification will require dealing with two or more covariates

Logistic models with two covariates
- $\text{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
- Then:
  $$\text{logit}(p \mid X_1 = x_1 + 1, X_2 = x_2) = \beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2$$
  $$\text{logit}(p \mid X_1 = x_1, X_2 = x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
  $$\Delta \text{ in log-odds} = \beta_1$$
- $\beta_1$ is the change in log-odds for a 1 unit change in $X_1$ provided $X_2$ is held constant.

Interpretation in General
- Also:
  $$\log\left(\frac{\text{odds}(Y = 1 \mid X_1 + 1, X_2)}{\text{odds}(Y = 1 \mid X_1, X_2)}\right) = \beta_1$$
- And: $\text{OR} = \exp(\beta_1)$
- $\exp(\beta_1)$ is the multiplicative change in odds for a 1 unit increase in $X_1$ provided $X_2$ is held constant.
- The result is similar for $X_2$
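A short numeric check of this interpretation, as a sketch. The coefficient values are hypothetical and chosen only for illustration. Under a model with no interaction term, the odds ratio for a 1 unit increase in $X_1$ should equal $\exp(\beta_1)$ whatever the starting value of $X_1$ and whatever the value of $X_2$.

```python
import math


def odds(x1, x2, b0, b1, b2):
    """Odds of Y = 1 under logit(p) = b0 + b1*x1 + b2*x2."""
    return math.exp(b0 + b1 * x1 + b2 * x2)


# Hypothetical coefficients, not estimates from the lecture
b0, b1, b2 = -1.0, 0.5, 0.8

# Odds ratio for a 1 unit increase in x1 at several (x1, x2) combinations
ratios = [
    odds(x1 + 1, x2, b0, b1, b2) / odds(x1, x2, b0, b1, b2)
    for x1 in (0, 2)
    for x2 in (0, 1, 5)
]

expected = math.exp(b1)  # the same value should appear every time
```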

Risk of CHD from Smoking and Coffee n = 151

Study Information
- Study facts:
  - Case-control study
  - 40-50 year-old males previously in good health
- Study questions:
  - Are smoking and/or coffee drinking related to increased odds of CHD?
  - Is the association of coffee with CHD higher among smokers? That is, is smoking an effect modifier of the coffee-CHD association?

Fraction with CHD by smoking and coffee

Pooled data, ignoring smoking
Odds ratio = (40 * 50) / (26 * 35) = 2.2
95% CI = (1.14, 4.24)

Among Non-Smokers
Odds ratio = (15 * 42) / (15 * 21) = 2.0
95% CI = (0.82, 4.9)

Among Smokers
Odds ratio = (25 * 8) / (11 * 14) = 1.3
95% CI = (0.42, 4.0)
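The three odds ratios can be recomputed from the cross products shown on the slides. The slides do not say how the confidence intervals were obtained; the sketch below assumes the log-based (Woolf) interval, which gives values close to the ones quoted. The function name `odds_ratio_ci` is an illustrative choice. The two off-diagonal counts `b` and `c` can be passed in either order, since both the odds ratio and the standard error are symmetric in them.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio a*d / (b*c) with a log-based (Woolf) CI."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Counts taken from the cross products on the slides
pooled = odds_ratio_ci(40, 26, 35, 50)
non_smokers = odds_ratio_ci(15, 15, 21, 42)
smokers = odds_ratio_ci(25, 11, 14, 8)

for label, (est, lo, hi) in [("pooled", pooled),
                             ("non-smokers", non_smokers),
                             ("smokers", smokers)]:
    print(f"{label}: OR = {est:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Both stratum-specific intervals include 1, while the pooled interval does not.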

Plot Odds Ratios and 95% CIs

Define Variables
- $Y_i$ = 1 if CHD case, 0 if control
- $COF_i$ = 1 if coffee drinker, 0 if not
- $SMK_i$ = 1 if smoker, 0 if not
- $p_i = \Pr(Y_i = 1)$
- $n_i$ = number observed at pattern $i$ of Xs

Logistic Regression Model
- $Y_i$ are from a Binomial($n_i$, $p_i$) distribution
- $Y_i$ are independent
- log odds($Y_i = 1$) (or, logit($Y_i = 1$)) is a function of
  - Coffee
  - Smoking
  - and coffee x smoking interaction

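Logistic Regression Model
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 COF_i + \beta_2 SMK_i + \beta_3 COF_i \cdot SMK_i$$
- Which implies that $\Pr(Y_i = 1)$ is the logistic function (with $X_{1i} = COF_i$ and $X_{2i} = SMK_i$)
$$p_i = \frac{e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}}}{1 + e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}}}$$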

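Probabilities of CHD as a function of coffee and smoking history

| Coffee | Smoke: No | Smoke: Yes |
|--------|-----------|------------|
| No  | $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$ | $\dfrac{e^{\beta_0 + \beta_2}}{1 + e^{\beta_0 + \beta_2}}$ |
| Yes | $\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$ | $\dfrac{e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}{1 + e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}$ |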

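Among Non-Smokers:
$$\frac{\text{Odds}(\text{Case} \mid \text{Coffee})}{\text{Odds}(\text{Case} \mid \text{No Coffee})} = \frac{\left. \dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \middle/ \dfrac{1}{1 + e^{\beta_0 + \beta_1}} \right.}{\left. \dfrac{e^{\beta_0}}{1 + e^{\beta_0}} \middle/ \dfrac{1}{1 + e^{\beta_0}} \right.} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} = \text{Odds Ratio}$$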
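The same algebra can be checked numerically. This is a sketch with hypothetical coefficients (not estimates from the CHD data): it evaluates the logistic function for each coffee and smoking combination and confirms that the coffee odds ratio is $e^{\beta_1}$ among non-smokers and $e^{\beta_1 + \beta_3}$ among smokers.

```python
import math


def p_chd(cof, smk, b0, b1, b2, b3):
    """Pr(Y = 1) from the logistic model with a coffee x smoking interaction."""
    eta = b0 + b1 * cof + b2 * smk + b3 * cof * smk
    return math.exp(eta) / (1.0 + math.exp(eta))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical coefficients (b0, b1, b2, b3), for illustration only
b = (-1.0, 0.7, 0.9, -0.4)

# Coffee drinkers vs non-drinkers, within each smoking stratum
or_non_smokers = odds(p_chd(1, 0, *b)) / odds(p_chd(0, 0, *b))
or_smokers = odds(p_chd(1, 1, *b)) / odds(p_chd(0, 1, *b))
```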

Interpretations
- $\exp(\beta_1)$: odds ratio of being a CHD case for coffee drinkers vs. non-drinkers among non-smokers
- $\exp(\beta_1 + \beta_3)$: odds ratio of being a CHD case for coffee drinkers vs. non-drinkers among smokers

Interpretations
- $\exp(\beta_2)$: odds ratio of being a CHD case for smokers vs. non-smokers among non-coffee drinkers
- $\exp(\beta_2 + \beta_3)$: odds ratio of being a CHD case for smokers vs. non-smokers among coffee drinkers

Interpretations
- $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$: fraction of cases among non-smoking, non-coffee-drinking individuals in the sample (determined by sampling plan)
- $\exp(\beta_3)$: ratio of odds ratios

Interpretations of $\exp(\beta_3)$
- $\exp(\beta_3)$: factor by which the odds ratio of being a CHD case for coffee drinkers vs. non-drinkers is multiplied for smokers as compared to non-smokers
- or $\exp(\beta_3)$: factor by which the odds ratio of being a CHD case for smokers vs. non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers
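As a rough illustration with the CHD data: the interaction model has one parameter for each coffee and smoking combination, so its fitted coffee odds ratios coincide with the stratum-specific cross-product ratios reported earlier. The sketch below backs out $\beta_1$ and $\beta_3$ from those two ratios. It is not output from the lecture's fitted model, and it gives no standard error or test for $\beta_3$.

```python
import math

# Stratum-specific coffee odds ratios from the 2x2 cross products
or_non_smokers = (15 * 42) / (15 * 21)   # = exp(beta1)
or_smokers = (25 * 8) / (11 * 14)        # = exp(beta1 + beta3)

beta1_hat = math.log(or_non_smokers)
beta3_hat = math.log(or_smokers) - beta1_hat

# exp(beta3): the coffee odds ratio among smokers relative to non-smokers
ratio_of_odds_ratios = math.exp(beta3_hat)

print(f"beta1 = {beta1_hat:.3f}, beta3 = {beta3_hat:.3f}, "
      f"exp(beta3) = {ratio_of_odds_ratios:.2f}")
```

A ratio below 1 means the estimated coffee odds ratio is smaller among smokers than among non-smokers in this sample; given the wide stratum-specific intervals, that difference may well be compatible with chance.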

Some Special Cases
- Given
  $$\log\left(\frac{\Pr(Y = 1)}{\Pr(Y = 0)}\right) = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 COF \cdot SMK$$
- If $\beta_1 = \beta_2 = \beta_3 = 0$
  - Neither smoking nor coffee drinking is associated with increased risk of CHD

Some Special Cases
- Given
  $$\log\left(\frac{\Pr(Y = 1)}{\Pr(Y = 0)}\right) = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 COF \cdot SMK$$
- If $\beta_1 = \beta_3 = 0$
  - Smoking, but not coffee drinking, is associated with increased risk of CHD
