Lecture 10: Introduction to Logistic Regression
Ani Manichaikul
amanicha@jhsph.edu
2 May 2007
Logistic Regression
- Regression for a response variable that follows a binomial distribution
- Recall the "binomial model" and the binomial distribution
Binomial Model
- n independent trials (e.g., coin tosses)
- p = probability of success on each trial (e.g., p = 1/2 = Pr of heads)
- Y = number of successes out of n trials (e.g., Y = number of heads)
Binomial Distribution
Example:
  P(Y = y) = (n choose y) p^y (1 - p)^(n - y)
Why can’t we use regular regression (SLR or MLR)?
Cannot use Linear Regression
- The response, Y, is NOT normally distributed.
- The variability of Y is NOT constant, since the variance, Var(Y) = pq, depends on the expected response, E(Y) = p.
- The predicted/fitted values must be such that the corresponding probabilities are between 0 and 1.
Example
- Consider a phase I clinical trial in which 35 independent patients are given a new medication for pain relief. Of the 35 patients, 22 report "significant" relief one hour after medication.
- Question: How effective is the drug?
Model
- Y = # of patients who get relief
- n = 35 patients (trials)
- p = probability of relief for any patient: the truth we seek in the population
- How effective is the drug?
What is p?
- Get the best estimate of p given the data
- Determine the margin of error: the range of plausible values for p
Maximum Likelihood Method
- The method of maximum likelihood estimation chooses values for the parameter estimates which make the observed data "maximally likely" under the specified model.
Maximum Likelihood
- For the binomial model, we have observed Y = y and

  P(Y = y) = (n choose y) p^y (1 - p)^(n - y)

- So for this example:

  P(Y = 22) = (35 choose 22) p^22 (1 - p)^13
Maximum Likelihood
- So, estimate p by choosing the value of p which makes the observed data "maximally likely"
- i.e., choose the p that maximizes the value of Pr(Y = 22)
- The ML estimate is p̂ = y/n = 22/35 = 0.63, the estimated proportion of patients who will experience relief
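A quick numerical check of this idea (an illustrative Python sketch, not part of the original lecture): evaluate the binomial likelihood over a grid of p values and confirm that it peaks at y/n.

import numpy as np
from scipy.stats import binom

# Binomial likelihood for the trial: y = 22 successes out of n = 35
y, n = 22, 35
grid = np.linspace(0.01, 0.99, 981)      # candidate values of p, step 0.001
likelihood = binom.pmf(y, n, grid)       # Pr(Y = 22) at each candidate p

print(grid[np.argmax(likelihood)])       # ~0.63
print(y / n)                             # 0.6285..., the closed-form MLE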
Maximum Likelihood
[Figure: the likelihood function Pr(Y = 22 of 35) plotted against p = Prob(Event); the curve peaks at the MLE, p̂ = 0.63.]
Confidence Interval for p
- Variance of p̂: Var(p̂) = p(1 - p)/n = pq/n
- "Standard error" of p̂: √(pq/n)
- Estimate of the "standard error" of p̂: √(p̂q̂/n)
Confidence Interval for p
- 95% confidence interval for the 'true' proportion, p:

  p̂ ± 1.96·√(p̂q̂/n) = 0.63 ± 1.96·√((0.63)(0.37)/35)
                     = 0.63 ± 1.96(0.082)
                     = (0.47, 0.79)
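The same interval in a few lines of Python (an illustrative sketch, using only the numbers above):

import math

y, n = 22, 35
p_hat = y / n                                # 0.63
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # ~0.082, estimated standard error
lower = p_hat - 1.96 * se_hat
upper = p_hat + 1.96 * se_hat
print(round(lower, 2), round(upper, 2))      # 0.47 0.79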
Conclusion
- Based upon our clinical trial, in which 22 of 35 patients experienced relief, we estimate that 63% of persons who receive the new drug will experience relief within 1 hour (95% CI: 47% to 79%).
Conclusion
- Whether 63% (47% to 79%) represents an 'effective' drug will depend on many things, especially on the science of the problem.
- Sore throat pain? Arthritis pain? Accidentally-cut-your-leg-off pain?
Aside: Probabilities and Odds
- The odds of an event are defined as:

  odds(Y = 1) = P(Y = 1)/P(Y = 0) = P(Y = 1)/[1 - P(Y = 1)] = p/(1 - p)
Probabilities and Odds
- We can go back and forth between odds and probabilities:
- odds = p/(1 - p)
- p = odds/(odds + 1)
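As a small illustration (not from the slides), these conversions are one-liners:

def prob_to_odds(p):
    """Convert a probability to the corresponding odds, p/(1 - p)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Convert odds back to a probability, odds/(odds + 1)."""
    return odds / (odds + 1)

print(prob_to_odds(0.63))   # ~1.70: relief is about 1.7x as likely as no relief
print(odds_to_prob(1.70))   # ~0.63: back to the probability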
Odds Ratio
- We saw that an odds ratio (OR) can be helpful for comparisons. Recall the Vitamin A trial:

  OR = odds(Death | Vit. A) / odds(Death | No Vit. A)
Odds Ratio
- The OR here describes the benefit of Vitamin A therapy. We saw for this example that OR = 0.59: an estimated 40% reduction in mortality.
- The OR is a building block for logistic regression.
Logistic Regression
- Suppose we want to ask whether a new drug is better than a placebo, and we have the following observed data:

  Relief?   Drug   Placebo
  No          13        20
  Yes         22        15
  Total       35        35
Confidence Intervals for p
[Figure: separate 95% confidence intervals for p under Drug and Placebo, plotted on the probability scale from 0 to 1.]
Odds Ratio
OR = odds(Relief | Drug) / odds(Relief | Placebo)

   = {P(Relief | Drug) / [1 - P(Relief | Drug)]} / {P(Relief | Placebo) / [1 - P(Relief | Placebo)]}

   = [0.63/(1 - 0.63)] / [0.43/(1 - 0.43)]

   = 2.26
Confidence Interval for OR
- The CI uses Woolf's method for the standard error of log(ÔR):

  se(log(ÔR)) = √(1/22 + 1/13 + 1/15 + 1/20) = 0.489

- Then find (L, U) = log(ÔR) ± 1.96·se(log(ÔR))
- The CI for the OR is (e^L, e^U)
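A sketch of the whole calculation in Python (illustrative; the cell counts come from the 2x2 table above):

import math

a, b = 22, 13    # Drug: relief, no relief
c, d = 15, 20    # Placebo: relief, no relief

or_hat = (a / b) / (c / d)                       # 2.26
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)     # 0.489, Woolf's method
L = math.log(or_hat) - 1.96 * se_log_or
U = math.log(or_hat) + 1.96 * se_log_or
print(round(math.exp(L), 2), round(math.exp(U), 2))   # ~(0.87, 5.88); the slides round to (0.86, 5.9)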
Interpretation
- OR = 2.26; 95% CI: (0.86, 5.9)
- The drug is an estimated 2 1/4 times better than the placebo.
- But could the difference be due to chance alone?
Logistic Regression
- Can we set up a model for this similar to what we've done in ANOVA and regression?
- Idea: model the log odds of the event (in this example, relief) as a function of predictor variables.
Model
log[odds(Relief | Tx)] = log[P(relief | Tx) / P(no relief | Tx)] = β0 + β1·Tx

where: Tx = 1 if Drug, 0 if Placebo
Then…
- log(odds(Relief | Drug)) = β0 + β1
- log(odds(Relief | Placebo)) = β0
- log(odds(R | D)) - log(odds(R | P)) = β1
And…
- Thus:

  log[odds(R | D) / odds(R | P)] = β1

- And:

  OR = exp(β1) = e^β1 !!

- So: exp(β1) = odds ratio of relief for patients taking the Drug vs. patients taking the Placebo.
Logistic Regression
Logit estimates                                   Number of obs   =         70
                                                  LR chi2(1)      =       2.83
                                                  Prob > chi2     =     0.0926
Log likelihood = -46.99169                        Pseudo R2       =     0.0292

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .8137752   .4889211     1.66   0.096    -.1444926    1.772043
       _cons |  -.2876821    .341565    -0.84   0.400    -.9571372    .3817731
------------------------------------------------------------------------------

Estimates:

  log(odds(relief)) = β̂0 + β̂1·Drug = -0.288 + 0.814·Drug

Therefore: OR = exp(0.814) = 2.26 !
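The output above is from Stata. As a cross-check, here is a sketch of the same fit in Python with statsmodels (an assumption of tooling, not what the lecture used), reconstructing the 70 observations from the 2x2 table:

import numpy as np
import statsmodels.api as sm

# Rebuild the individual-level data from the table: counts 22, 13, 15, 20
drug   = np.repeat([1, 1, 0, 0], [22, 13, 15, 20])   # 1 = Drug, 0 = Placebo
relief = np.repeat([1, 0, 1, 0], [22, 13, 15, 20])   # 1 = relief, 0 = none

fit = sm.Logit(relief, sm.add_constant(drug)).fit(disp=0)
print(fit.params)              # ~[-0.288, 0.814], matching _cons and drug above
print(np.exp(fit.params[1]))   # ~2.26, the odds ratio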
It’s the same!
- So, why go to all the trouble of setting up a linear model?
- What if there is a biologic reason to expect that the rate of relief (and perhaps drug efficacy) is age dependent?
Adding other variables
- What if Pr(relief) = function of (Drug or Placebo) AND Age?
- We could easily include age in a model such as (see the sketch below):

  log(odds(relief)) = β0 + β1·Drug + β2·Age
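A hypothetical sketch of such a fit: the trial above did not record age, so the ages and outcomes below are simulated purely to show the mechanics of a two-covariate model.

import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(0)
drug = np.repeat([1, 0], [35, 35])                          # 35 drug, 35 placebo
age = rng.integers(20, 70, size=70)                         # simulated ages
relief = rng.binomial(1, expit(-2 + 0.8*drug + 0.03*age))   # simulated outcomes

X = sm.add_constant(np.column_stack([drug, age]))
fit = sm.Logit(relief, X).fit(disp=0)
print(fit.params)   # intercept, drug, and age coefficients on the log-odds scale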
Logistic Regression
- As in MLR, we can include many additional covariates.
- For a logistic regression model with p predictors:

  log(odds(Y = 1)) = β0 + β1X1 + ... + βpXp

  where: odds(Y = 1) = Pr(Y = 1)/Pr(Y = 0) = Pr(Y = 1)/[1 - Pr(Y = 1)]
Logistic Regression
- Thus:

  log[Pr(Y = 1)/Pr(Y = 0)] = β0 + β1X1 + ... + βpXp

- But why use log(odds)?
Logistic regression
- Linear regression might estimate anything in (-∞, +∞), not just a proportion in the range 0 to 1.
- Logistic regression is a way to estimate a proportion (between 0 and 1), as well as some related items.
- We would like to use something like what we know from linear regression:

  Continuous outcome = β0 + β1X1 + β2X2 + …

- How can we turn a proportion into a continuous outcome?
Linear models for binary outcomes
Transforming a proportion
- The odds are always positive: odds = p/(1 - p) ∈ [0, +∞)
- The log odds is continuous: log odds = ln[p/(1 - p)] ∈ (-∞, +∞)
Logit transformation
logit(p) = log[Pr(Y = 1) / (1 - Pr(Y = 1))]

  Measure                      Name                     Min    Max
  Pr(Y = 1)                    "probability"              0      1
  Pr(Y = 1)/[1 - Pr(Y = 1)]    "odds"                     0      ∞
  log(odds)                    "log-odds" or "logit"     -∞      ∞
Logit Function
- Relates the log-odds (logit) to p = Pr(Y = 1)

[Figure: the logit function, plotting log-odds (from -10 to 10) against the probability of success (0 to 1).]
Key Relationships
- Relating log-odds, probabilities, and parameters in logistic regression:
- Suppose the model: logit(p) = β0 + β1X, i.e.,

  log[p/(1 - p)] = β0 + β1X

- Take "anti-logs":

  p/(1 - p) = exp(β0 + β1X)
Solve for p
- p = (1 - p)·exp(β0 + β1X)
- p = exp(β0 + β1X) - p·exp(β0 + β1X)
- p + p·exp(β0 + β1X) = exp(β0 + β1X)
- p·{1 + exp(β0 + β1X)} = exp(β0 + β1X)
- p = exp(β0 + β1X) / [1 + exp(β0 + β1X)]
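This final expression is the inverse logit (often called "expit"). A quick illustrative check, using the drug-trial estimates from earlier, that the algebra matches scipy's built-in version:

import numpy as np
from scipy.special import expit

b0, b1, x = -0.288, 0.814, 1           # estimates from the drug example; x = 1 (Drug)
eta = b0 + b1 * x                      # linear predictor, beta0 + beta1*X
p_manual = np.exp(eta) / (1 + np.exp(eta))
print(p_manual, expit(eta))            # both ~0.63, Pr(relief | Drug)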
What’s the point?
- We can determine the probability of success for a specific set of covariates, X, after running a logistic regression model.
Dependence of Blindness on Age
- The following data concern the Aegean island of Kalytos, where inhabitants suffer from a congenital eye disease whose effects become more marked with age.
- Samples of 50 people were taken at five different ages, and the numbers of blind people were counted.
Example: Data
  Age    Number blind / 50
   20      6 / 50
   35      7 / 50
   45     26 / 50
   55     37 / 50
   70     44 / 50
Question
- The scientific question of interest is to determine how the probability of blindness is related to age in this population.
- Let pi = Pr(a person in age class i is blind)
Model 1
- logit(pi) = β0*
- β0* = log-odds of blindness for all ages
- exp(β0*) = odds of blindness for all ages
- No age dependence in this model
Model 2
- logit(pi) = β0 + β1(agei - 45)
- β0 = log-odds of blindness among 45-year-olds
- exp(β0) = odds of blindness among 45-year-olds
- β1 = difference in log-odds of blindness comparing a group that is one year older than another
- exp(β1) = odds ratio of blindness comparing a group that is one year older than another
Results
- Model 1: logit(p̂i) = -0.08, or

  p̂i = exp(-0.08) / [1 + exp(-0.08)] = 0.48

Iteration 0:  log likelihood = -173.08674

Logit estimates                                   Number of obs   =        250
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -173.08674                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.0800427   .1265924    -0.63   0.527    -.3281593    .1680739
------------------------------------------------------------------------------
Results
- Model 2: logit(p̂i) = -4.4 + 0.094·agei, or

  p̂i = exp(-4.4 + 0.094·agei) / [1 + exp(-4.4 + 0.094·agei)]

  (Note: the output below uses uncentered age, so the intercept -4.4 is the log-odds extrapolated to age 0; in the centered form of Model 2, the intercept would be -4.4 + 0.094·45 ≈ -0.12.)

Logit estimates                                   Number of obs   =        250
                                                  LR chi2(1)      =      99.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -123.43444                       Pseudo R2       =     0.2869

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0940683   .0119755     7.86   0.000     .0705967    .1175399
       _cons |  -4.356181   .5700966    -7.64   0.000    -5.473549   -3.238812
------------------------------------------------------------------------------
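As an illustrative check (not from the slides), the fitted curve can be evaluated at the five sampled ages and compared with the observed proportions, using the coefficients from the output above:

import numpy as np
from scipy.special import expit

ages     = np.array([20, 35, 45, 55, 70])
observed = np.array([6, 7, 26, 37, 44]) / 50

fitted = expit(-4.356181 + 0.0940683 * ages)   # Model 2 predicted Pr(blind)
for age, obs, pred in zip(ages, observed, fitted):
    print(age, obs, round(pred, 2))   # e.g., age 70: observed 0.88, fitted 0.90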
Test of significance
- Is the addition of the age variable to the model important?
- Maximum likelihood estimates: β̂1 = 0.094, s.e.(β̂1) = 0.012
- z-test of H0: β1 = 0: z = 7.86, p-val = 0.000
- 95% CI: (0.07, 0.12)
What about the Odds Ratio?
- Maximum likelihood estimates: ÔR = exp(β̂1) = 1.10, s.e. = 0.013
- z-test of H0: exp(β1) = 1: z = 7.86, p-val = 0.000
- 95% CI: (1.07, 1.13) (computed on the log scale, then exponentiated; see the sketch below)
- It appears that blindness is age dependent.
- Note: exp(0) = 1. Where is this fact useful?
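A sketch of that log-scale calculation (illustrative, using the slide's rounded values):

import math

beta1, se = 0.094, 0.013               # estimate and s.e. on the log-odds scale
L = beta1 - 1.96 * se
U = beta1 + 1.96 * se
print(round(math.exp(L), 2), round(math.exp(U), 2))   # 1.07 1.13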
Model 1
- Plot of observed proportions vs. predicted proportions using an intercept-only model

[Figure: observed vs. predicted probability of blindness by age (20 to 80); the intercept-only model predicts the same proportion, 0.48, at every age.]
Model 2
- Plot of observed proportions vs. predicted proportions with age in the model

[Figure: observed vs. predicted probability of blindness by age (20 to 80); the fitted curve rises with age and tracks the observed proportions closely.]
Conclusion
- Model 2 clearly fits better than Model 1!
- Including age in our model is better than the intercept alone.
Summary
- Logistic regression gives us a framework in which to model binary outcomes
- It uses the structure of linear models, with outcomes modelled as a function of covariates
- Many concepts carry over from linear regression:
  - Interactions
  - Linear splines
  - Tests of significance for coefficients