Lecture 11: Interpreting logistic regression models, by Ani Manichaikul

  1. Lecture 11: Interpreting logistic regression models. Ani Manichaikul, amanicha@jhsph.edu, 3 May 2007

  2. Logistic regression
     - The framework and ideas of linear modelling are similar to those of linear regression
     - Any model still has a systematic part and a probabilistic part
     - Coefficients have a new interpretation, based on log(odds) and log(odds ratios)

  3. The logit function
     - In logistic regression, we are always modelling the outcome $\ln\left(\frac{p}{1-p}\right)$
     - We define the function: $\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$
     - We often use the name logit for convenience
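As a minimal sketch of the definition above, the logit can be written as a small Python function. The inverse function `inv_logit` is not on this slide; it is added here only to show that the transformation can be undone, and both function names are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Probability whose log odds equal x (the inverse of logit)."""
    return 1.0 / (1.0 + math.exp(-x))


# A probability of 0.5 means even odds, so its log odds are 0.
print(logit(0.5))
# Probabilities below 0.5 give negative log odds.
print(logit(0.25))
# Applying the inverse recovers the original probability (up to rounding).
print(inv_logit(logit(0.25)))
```

Probabilities above 0.5 map to positive log odds and probabilities below 0.5 map to negative log odds, which is why the fitted coefficients later in the lecture can be negative even though probabilities never are.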

  4. Example: Public health graduate students
     - 323 graduate students in introductory biostatistics took a health survey. Current smoking status was recorded; we will predict it from gender.
     - Associating demographics with smoking is vital to planning public health programs.
     - Information was also collected on age, exercise, and history of smoking, which are potential confounders of the association between gender and current smoking.
     - Today, we will focus only on the association between gender and current smoking status.

  5. Coding
     - Outcome: smoking = 1 for current smokers, 0 for current nonsmokers
     - Primary predictor: gender = 1 for men, 0 for women

  6. Recall
     - In linear regression, if we had only one binary X like gender, we would be predicting two means:
       - $\beta_0$: the mean outcome when X = 0
       - $\beta_0 + \beta_1$: the mean outcome when X = 1
       - $\beta_1$: the difference in mean outcome when X = 1 vs. when X = 0

  7. Output

         Logit estimates                                   Number of obs   =        323
                                                           LR chi2(1)      =       4.46
                                                           Prob > chi2     =     0.0348
         Log likelihood = -75.469757                       Pseudo R2       =     0.0287

         ------------------------------------------------------------------------------
                smoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         -------------+----------------------------------------------------------------
               gender |    .967966   .4547931     2.13   0.033     .0765879    1.859344
                _cons |  -3.058707   .3235656    -9.45   0.000    -3.692884    -2.42453
         ------------------------------------------------------------------------------

     $$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender}) \;\Rightarrow\; \ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(\text{Gender})$$

  8. Predictions by gender
     - For women, gender = 0: $\ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(0) = -3.1$
     - For men, gender = 1: $\ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(1) = -2.1$
     - $\beta_1$ is the difference: $\beta_1$ is the change in log odds

  9. Interpretation 1: log(odds)
     - $\beta_0$: the log odds of smoking for women
     - $\beta_0 + \beta_1$: the log odds of smoking for men
     - $\beta_1$: the difference in the log odds of smoking for men as compared to women

  10. But, we really wanted to predict P(Y = 1), not the log odds…
     - We can start to “untransform” the equation: if $\ln(a) = b$, then $e^{b} = a$
     - For women, X = 0: $\ln(\text{odds}) = \beta_0 + \beta_1(0) = \beta_0$, so the odds of smoking for women $= e^{\beta_0} = e^{-3.1} \approx 0.05$
     - For men, X = 1: $\ln(\text{odds}) = \beta_0 + \beta_1(1)$, so the odds of smoking for men $= e^{\beta_0 + \beta_1} = e^{-3.1 + 1.0} = e^{-2.1} \approx 0.12$
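The two steps so far (predict the log odds, then exponentiate) can be sketched in Python. This sketch uses the coefficients as rounded on the slides, -3.1 and 1.0; the variable and function names are illustrative and not part of the lecture.

```python
import math

# Coefficients as rounded on the slides; the fitted values in the output
# are -3.058707 (_cons) and .967966 (gender).
beta0 = -3.1
beta1 = 1.0


def log_odds(gender: int) -> float:
    """Predicted log odds of current smoking (gender: 1 = men, 0 = women)."""
    return beta0 + beta1 * gender


def odds(gender: int) -> float:
    """Undo the log: if ln(a) = b, then a = e^b."""
    return math.exp(log_odds(gender))


for label, gender in [("women", 0), ("men", 1)]:
    print(f"{label}: log odds = {log_odds(gender):.1f}, odds = {odds(gender):.2f}")
```

To two decimals this gives odds of 0.05 for women and 0.12 for men, matching the slide.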

  11. Interpretation 2: odds
     - $e^{\beta_0}$: the odds of smoking for women (when X = 0)
     - $e^{\beta_0 + \beta_1}$: the odds of smoking for men (when X = 1)
     - In the past, we’ve compared two sets of odds by dividing to find the odds ratio (OR)

  12. Comparing odds
     - If we subtract the log odds, that is mathematically equivalent to dividing inside the log: $\ln(a) - \ln(b) = \ln(a/b)$
     - So, if $e^{\beta_0 + \beta_1} = e^{-3.1 + 1.0} = e^{-2.1} \approx 0.12$ is the odds when X = 1, and $e^{\beta_0} = e^{-3.1} \approx 0.05$ is the odds when X = 0, then we want to divide them in order to compare:

       $$\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = \frac{0.12}{0.05} = 2.4$$
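A short Python check of this step, assuming the unrounded coefficients from the regression output. It shows that dividing the odds agrees with exponentiating the difference in log odds, and that the slide's 2.4 comes from dividing odds already rounded to two decimals. Variable names are illustrative.

```python
import math

# Fitted coefficients copied from the regression output.
beta0 = -3.058707  # _cons
beta1 = 0.967966   # gender

odds_women = math.exp(beta0)
odds_men = math.exp(beta0 + beta1)

# Dividing the two odds ...
or_by_division = odds_men / odds_women
# ... agrees with exponentiating the difference in log odds.
or_from_difference = math.exp((beta0 + beta1) - beta0)

# The slide divides odds that were first rounded to two decimals.
or_from_rounded_odds = 0.12 / 0.05

print(f"OR by division:          {or_by_division:.2f}")
print(f"OR from log-odds diff:   {or_from_difference:.2f}")
print(f"OR from rounded odds:    {or_from_rounded_odds:.1f}")
```

With the unrounded coefficients the odds ratio is about 2.63, while the rounded odds give 2.4; both are consistent with the "about 2 ½" summary used in the interpretation.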

  13. Interpreting the odds ratio
     - The odds of smoking are about 2 ½ times as high for men as for women.
     - Based on this study, smoking cessation programs should be targeted toward men, while perhaps smoking prevention programs should be targeted toward women.

  14. Useful math
     - We can usually simplify an equation like this:

       $$\text{Odds Ratio} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{(\beta_0 + \beta_1) - \beta_0} = e^{\beta_1}$$

       because $\frac{e^{a}}{e^{b}} = e^{a - b}$

  15. Odds and odds ratio
     - $e^{\beta_0}$: the odds when X = 0
     - $e^{\beta_0 + \beta_1}$: the odds when X = 1
     - $\frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$: the odds ratio comparing the odds when X = 1 vs. X = 0

  16. Note on the computer output
     - R does not give $e^{\beta_0}$ in the output
     - This is because logistic regression is so often used for case-control studies
     - The odds aren’t appropriate for a case-control study, because the investigators determine the ratio of cases to controls
     - The odds ratio is appropriate regardless of whether exposure or outcome was gathered first (by invariance of the odds ratio)
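The invariance mentioned above can be illustrated with a small numeric sketch. The counts below are hypothetical, invented purely for illustration and not taken from the lecture's data, and the function names are assumptions too. The sketch samples cases and noncases at different rates, as a case-control design does, and compares the odds in the unexposed group (the analogue of $e^{\beta_0}$) with the odds ratio before and after sampling.

```python
from fractions import Fraction

# Hypothetical population counts, invented for illustration only.
population = {
    ("exposed", "case"): 120, ("exposed", "noncase"): 880,
    ("unexposed", "case"): 50, ("unexposed", "noncase"): 950,
}


def sample(counts, case_fraction, control_fraction):
    """Expected counts when cases and noncases are sampled at different rates."""
    rate = {"case": case_fraction, "noncase": control_fraction}
    return {key: n * rate[key[1]] for key, n in counts.items()}


def odds_of_disease(counts, exposure):
    """Odds of being a case within one exposure group."""
    return Fraction(counts[(exposure, "case")]) / Fraction(counts[(exposure, "noncase")])


def odds_ratio(counts):
    """Odds of disease for the exposed divided by odds for the unexposed."""
    return odds_of_disease(counts, "exposed") / odds_of_disease(counts, "unexposed")


# Case-control design: keep every case but only 1 in 10 noncases.
case_control = sample(population, Fraction(1), Fraction(1, 10))

for name, counts in [("population", population), ("case-control", case_control)]:
    baseline = odds_of_disease(counts, "unexposed")
    print(f"{name}: odds (unexposed) = {float(baseline):.3f}, "
          f"odds ratio = {float(odds_ratio(counts)):.3f}")
```

In this sketch the odds in the unexposed group become ten times larger under the case-control sampling, purely because of the investigators' choice of sampling rates, while the odds ratio is identical in both tables.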

  17. Types of interpretation
     - $\beta_0 + \beta_1$ = ln(odds) (for X = 1)
     - $\beta_1$ = difference in log odds
     - $e^{\beta_0 + \beta_1}$ = odds (for X = 1)
     - $e^{\beta_1}$ = odds ratio
     - But we started with P(Y = 1)
     - Can we find that?

  18. More useful math
     - $\text{odds} = \frac{\text{probability}}{1 - \text{probability}}$
     - $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
     - so $\text{probability (for } X = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$

  19. Finding the probability
     - Find the log odds:
       - For X = 0: $\ln(\text{odds}) = \beta_0$
       - For X = 1: $\ln(\text{odds}) = \beta_0 + \beta_1$
     - Find the odds:
       - For X = 0: $\text{odds} = e^{\beta_0}$
       - For X = 1: $\text{odds} = e^{\beta_0 + \beta_1}$

  20. Finding the probability
     - Transform odds into probability: $p = \frac{\text{odds}}{1 + \text{odds}}$
       - For X = 0: $\text{probability} = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
       - For X = 1: $\text{probability} = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
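The three steps (log odds, then odds, then probability) can be sketched in Python for the smoking example. The coefficients are copied from the regression output; the function and variable names are illustrative.

```python
import math

# Fitted coefficients copied from the regression output.
beta0 = -3.058707  # _cons
beta1 = 0.967966   # gender


def probability(gender: int) -> float:
    """Predicted probability of current smoking (gender: 1 = men, 0 = women)."""
    log_odds = beta0 + beta1 * gender   # step 1: log odds
    odds = math.exp(log_odds)           # step 2: odds
    return odds / (1.0 + odds)          # step 3: probability


p_women = probability(0)
p_men = probability(1)
print(f"P(smoke | female) = {p_women:.3f}")
print(f"P(smoke | male)   = {p_men:.3f}")
```

Because smoking is uncommon in this sample, each probability is close to, but slightly smaller than, the corresponding odds.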

  21. We could even go one step further
     - $\text{Relative Risk (RR)} = \frac{p_1}{p_2}$
     - For X = 1: $P(\text{smoke} \mid \text{male}) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
     - For X = 0: $P(\text{smoke} \mid \text{female}) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
     - Relative Risk for Men vs. Women:

       $$\frac{p_1}{p_2} = \frac{\;\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}\;}{\;\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}\;}$$

     - There is no way to simplify this
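A brief Python sketch of the relative risk, assuming the coefficients from the regression output. It also recomputes the relative risk with a different, purely hypothetical intercept to show why the expression cannot be reduced to a function of $\beta_1$ alone. Function names are illustrative.

```python
import math

# Fitted coefficients copied from the regression output.
beta0 = -3.058707  # _cons
beta1 = 0.967966   # gender


def inv_logit(x: float) -> float:
    """Convert log odds to a probability: e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def relative_risk(b0: float, b1: float) -> float:
    """P(Y = 1 | X = 1) divided by P(Y = 1 | X = 0)."""
    return inv_logit(b0 + b1) / inv_logit(b0)


rr = relative_risk(beta0, beta1)
odds_ratio = math.exp(beta1)
print(f"relative risk (men vs. women) = {rr:.2f}")
print(f"odds ratio    (men vs. women) = {odds_ratio:.2f}")

# Hypothetical intercept of 0 (baseline probability 0.5), same beta1:
# the odds ratio is unchanged, but the relative risk is not.
print(f"relative risk with beta0 = 0  = {relative_risk(0.0, beta1):.2f}")
```

The relative risk changes when only the intercept changes, so unlike the odds ratio it depends on $\beta_0$ as well as $\beta_1$.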

  22. Remember to consider study design
     - We can always calculate the relative risk
     - The relative risk is not appropriate for case-control studies
       - Again, this is because the investigators decide the number of cases and controls to study
     - The odds ratio is appropriate for all study designs

  23. Types of interpretation
     - $\beta_0 + \beta_1$ = ln(odds) (for X = 1)
     - $\beta_1$ = difference in log odds
     - $e^{\beta_0 + \beta_1}$ = odds (for X = 1)
     - $e^{\beta_1}$ = odds ratio
     - $\frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$ = probability (for X = 1)
     - $\dfrac{\;\frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}\;}{\;\frac{e^{\beta_0}}{1 + e^{\beta_0}}\;}$ = Relative Risk

  24. Interpretation Tips
     - If the equation includes $\beta_0$, then it is usually for a particular set of people
       - log odds
       - odds
       - probability
       - exception: the equation for RR will include $\beta_0$, because that equation cannot be simplified
     - If the equation does not include $\beta_0$, then it must compare two groups
       - difference of log odds = log odds ratio
       - odds ratio

  25. In General
     - Logistic regression for a binary outcome
       - Left side of equation is log odds
     - Can transform the equation to find
       - odds
       - probability
     - Can compare two groups
       - difference of log odds = log odds ratio
       - odds ratio
       - relative risk
     - Everything we learned before applies

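  26. Useful math for logistic regression
     - If $\ln(a) = b$, then $e^{b} = a$
       - X = 1: $\ln(\text{odds}) = \beta_0 + \beta_1(1)$, so odds for X = 1 $= e^{\beta_0 + \beta_1}$
     - $\ln(a) - \ln(b) = \ln(a/b)$
       - so $\ln(\text{odds} \mid X = 1) - \ln(\text{odds} \mid X = 0) = \ln(\text{OR for } X = 1 \text{ vs. } X = 0)$
     - Also: $e^{a + b} = e^{a} \times e^{b}$ and $\frac{e^{a}}{e^{b}} = e^{a - b}$
       - so $\frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$
       - so $e^{2\beta_1} = e^{\beta_1} \times e^{\beta_1} = \left(e^{\beta_1}\right)^{2}$
     - $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
       - so $\text{probability (for } X = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$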

  27. Another Example
     - Regular physical examination is an important preventative public health measure
     - We’ll study this outcome using the public health graduate student dataset
     - Outcome: no physical exam in the past two years
     - Primary predictor: age
     - Secondary predictor and potential confounder: regularly taking a multivitamin

  28. Problem
     - The original “phys” variable (time since last physician visit) was meant to be continuous, but it was collected categorically.
     - Since it is now categorical and we wish to use it as the outcome for a regression model, we have to make it binary and use logistic regression.