Lecture 10: Introduction to Logistic Regression (Ani Manichaikul, PowerPoint presentation)


Slide 1

Lecture 10: Introduction to Logistic Regression
Ani Manichaikul, amanicha@jhsph.edu
2 May 2007

Slide 2: Logistic Regression

- Regression for a response variable that follows a binomial distribution
- Recall the "binomial model" and the binomial distribution

Slide 3: Binomial Model

- n independent trials (e.g., coin tosses)
- p = probability of success on each trial (e.g., p = 1/2 = Pr(Heads))
- Y = number of successes out of n trials (e.g., Y = number of heads)

Slide 4: Binomial Distribution

P(Y = y) = (n choose y) p^y (1 - p)^(n - y)
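
The binomial probability above is easy to compute directly. A minimal sketch in Python (the function name `binom_pmf` is my own, not from the lecture):

```python
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# e.g., the probability of 22 successes in 35 trials when p = 0.5
print(binom_pmf(22, 35, 0.5))
```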

Slide 5: Why can't we use regular regression (SLR or MLR)?

Slide 6: Cannot use Linear Regression

- The response, Y, is NOT normally distributed
- The variability of Y is NOT constant, since the variance, Var(Y) = pq, depends on the expected response, E(Y) = p
- The predicted/fitted values must be such that the corresponding probabilities are between 0 and 1

Slide 7: Example

- Consider a phase I clinical trial in which 35 independent patients are given a new medication for pain relief. Of the 35 patients, 22 report "significant" relief one hour after medication.
- Question: How effective is the drug?

Slide 8: Model

- Y = # patients who get relief
- n = 35 patients (trials)
- p = probability of relief for any patient
- The truth we seek in the population: How effective is the drug? What is p?
- Get the best estimate of p given the data
- Determine the margin of error: a range of plausible values for p

Slide 9: Maximum Likelihood Method

- The method of maximum likelihood estimation chooses values for parameter estimates which make the observed data "maximally likely" under the specified model

Slide 10: Maximum Likelihood

- For the binomial model, we have observed Y = y and
  P(Y = y) = (n choose y) p^y (1 - p)^(n - y)
- So for this example:
  P(Y = 22) = (35 choose 22) p^22 (1 - p)^13

Slide 11: Maximum Likelihood

- So, estimate p by choosing the value of p which makes the observed data "maximally likely"
- i.e., choose the p that maximizes the value of Pr(Y = 22)
- The ML estimate is y/n = 22/35 = 0.63, the estimated proportion of patients who will experience relief
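
The "maximally likely" idea can be checked numerically: evaluate the binomial likelihood on a grid of candidate p values and pick the maximizer. A sketch (the grid resolution is my own arbitrary choice):

```python
from math import comb

def likelihood(p, y=22, n=35):
    # binomial likelihood of the observed data, viewed as a function of p
    return comb(n, y) * p**y * (1 - p)**(n - y)

grid = [i / 1000 for i in range(1, 1000)]  # candidate values of p
p_mle = max(grid, key=likelihood)
print(p_mle)  # close to the analytic MLE, y/n = 22/35 ≈ 0.629
```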

Slide 12: Maximum Likelihood

[Plot: the likelihood function Pr(22 of 35) against p = Prob(Event); the curve peaks at the MLE, p = 0.63]

Slide 13: Confidence Interval for p

- Variance of p̂: Var(p̂) = p(1 - p)/n = pq/n
- "Standard Error" of p̂: sqrt(pq/n)
- Estimate of "Standard Error" of p̂: sqrt(p̂q̂/n)

Slide 14: Confidence Interval for p

- 95% Confidence Interval for the 'true' proportion, p:
  p̂ ± 1.96 sqrt(p̂q̂/n) = 0.63 ± 1.96 sqrt((0.63)(0.37)/35)
  = (0.63 - 1.96(0.082), 0.63 + 1.96(0.082)) = (0.47, 0.79)
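
The interval above is the normal-approximation (Wald) confidence interval; a sketch (the helper name `wald_ci` is mine):

```python
from math import sqrt

def wald_ci(y, n, z=1.96):
    """95% normal-approximation (Wald) CI for a binomial proportion."""
    p_hat = y / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci(22, 35)
print(round(lo, 2), round(hi, 2))  # → 0.47 0.79
```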

Slide 15: Conclusion

- Based upon our clinical trial, in which 22 of 35 patients experience relief, we estimate that 63% of persons who receive the new drug experience relief within 1 hour (95% CI: 47% to 79%)

Slide 16: Conclusion

- Whether 63% (47% to 79%) represents an 'effective' drug will depend on many things, especially on the science of the problem.
- Sore throat pain?
- Arthritis pain?
- Accidentally cut your leg off pain?

Slide 17: Aside: Probabilities and Odds

- The odds of an event are defined as:
  odds(Y = 1) = P(Y = 1)/P(Y = 0) = P(Y = 1)/[1 - P(Y = 1)] = p/(1 - p)

Slide 18: Probabilities and Odds

- We can go back and forth between odds and probabilities:
- odds = p/(1 - p)
- p = odds/(odds + 1)
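
The two conversions are one-liners; a quick sketch (function names are mine):

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (odds + 1)

print(prob_to_odds(0.5))  # → 1.0  (a 50% chance is "even odds")
print(odds_to_prob(1.0))  # → 0.5
# the two maps invert each other
print(round(odds_to_prob(prob_to_odds(0.63)), 2))  # → 0.63
```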

Slide 19: Odds Ratio

- We saw that an odds ratio (OR) can be helpful for comparisons. Recall the Vitamin A trial:
- OR = odds(Death | Vit. A) / odds(Death | No Vit. A)
Slide 20: Odds Ratio

- The OR here describes the benefits of Vitamin A therapy. We saw for this example that:
- OR = 0.59
- An estimated 40% reduction in mortality
- OR is a building block for logistic regression

Slide 21: Logistic Regression

- Suppose we want to ask whether the new drug is better than a placebo, and we have the following observed data:

  Relief?   Drug   Placebo
  No          13        20
  Yes         22        15
  Total       35        35

Slide 22: Confidence Intervals for p

[Plot: separate 95% confidence intervals for p in the Drug and Placebo groups, shown on the probability scale from 0 to 1]

Slide 23: Odds Ratio

OR = odds(Relief | Drug) / odds(Relief | Placebo)
   = [P(Relief | Drug) / (1 - P(Relief | Drug))] / [P(Relief | Placebo) / (1 - P(Relief | Placebo))]
   = [0.63/(1 - 0.63)] / [0.43/(1 - 0.43)]
   = 2.26
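
The same odds ratio falls straight out of the 2x2 table counts on Slide 21; a sketch (the helper `odds_ratio` is mine):

```python
def odds_ratio(relief_drug, relief_plac, no_drug, no_plac):
    """OR from a 2x2 table: odds of relief on Drug over odds on Placebo."""
    return (relief_drug / no_drug) / (relief_plac / no_plac)

# Drug: 22 relief / 13 no relief; Placebo: 15 relief / 20 no relief
print(round(odds_ratio(22, 15, 13, 20), 2))  # → 2.26
```
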
slide-24
SLIDE 24

Confidence Interval for OR

n CI used Woolf’s method for the

standard error of :

n se(

) =

n

find

n Then (eL,eU)

489 . 20 1 15 1 13 1 22 1 = + + +

)) ˆ (log( 96 . 1 ) ˆ log( R O se R O ±

) ˆ log( R O

) ˆ log( R O
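
Woolf's method can be sketched in a few lines (the function name `woolf_ci` is mine):

```python
from math import exp, log, sqrt

def woolf_ci(a, b, c, d, z=1.96):
    """Woolf CI for the OR of a 2x2 table with cells a, b (row 1), c, d (row 2)."""
    or_hat = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = log(or_hat)
    return or_hat, exp(log_or - z * se), exp(log_or + z * se)

or_hat, lo, hi = woolf_ci(22, 15, 13, 20)
print(round(or_hat, 2), round(lo, 2), round(hi, 1))
```

This reproduces OR = 2.26 with an interval of about (0.87, 5.9), matching the (0.86, 5.9) reported on the next slide up to rounding.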

Slide 25: Interpretation

- OR = 2.26
- 95% CI: (0.86, 5.9)
- The Drug is an estimated 2 ¼ times better than the placebo.
- But could the difference be due to chance alone?

Slide 26: Logistic Regression

- Can we set up a model for this similar to what we've done in ANOVA and regression?
- Idea: model the log odds of the event (in this example, relief) as a function of predictor variables

Slide 27: Model

log[ odds(Relief | Tx) ] = log[ P(relief | Tx) / P(no relief | Tx) ] = β0 + β1Tx

where: Tx = 1 if Drug, 0 if Placebo

Slide 28: Then...

- log( odds(Relief | Drug) ) = β0 + β1
- log( odds(Relief | Placebo) ) = β0
- log( odds(R | D) ) - log( odds(R | P) ) = β1

Slide 29: And...

- Thus: log[ odds(R | D) / odds(R | P) ] = β1
- And: OR = exp(β1) = e^β1 !!
- So: exp(β1) = odds ratio of relief for patients taking the Drug vs. patients taking the Placebo.
Slide 30: Logistic Regression

Logit estimates                         Number of obs = 70
                                        LR chi2(1)    = 2.83
                                        Prob > chi2   = 0.0926
Log likelihood = -46.99169              Pseudo R2     = 0.0292

           y |    Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        drug |  .8137752   .4889211     1.66    0.096    -.1444926   1.772043
       _cons | -.2876821   .341565     -0.84    0.400    -.9571372   .3817731

Estimates: log( odds(relief) ) = β̂0 + β̂1·Drug = -0.288 + 0.814(Drug)
Therefore: OR = exp(0.814) = 2.26 !
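
With a single binary predictor, the fitted coefficients have a closed form in terms of the 2x2 table counts, which lets us verify the regression output by hand (a sketch; variable names are mine):

```python
from math import log

# counts from the trial: relief / no relief in each arm
drug_yes, drug_no = 22, 13
plac_yes, plac_no = 15, 20

b0 = log(plac_yes / plac_no)  # intercept: log-odds of relief on Placebo
b1 = log((drug_yes / drug_no) / (plac_yes / plac_no))  # slope: log odds ratio
print(round(b0, 3), round(b1, 3))  # → -0.288 0.814
```

These match the Stata coefficients, and exp(b1) recovers the OR of 2.26.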

Slide 31: It's the same!

- So, why go to all the trouble of setting up a linear model?
- What if there is a biologic reason to expect that the rate of relief (and perhaps drug efficacy) is age dependent?

Slide 32: Adding other variables

- What if Pr(relief) = function of Drug or Placebo AND Age?
- We could easily include age in a model such as:
  log( odds(relief) ) = β0 + β1Drug + β2Age

Slide 33: Logistic Regression

- As in MLR, we can include many additional covariates.
- For a logistic regression model with p predictors:
  log( odds(Y = 1) ) = β0 + β1X1 + ... + βpXp
  where: odds(Y = 1) = Pr(Y = 1)/[1 - Pr(Y = 1)] = Pr(Y = 1)/Pr(Y = 0)

Slide 34: Logistic Regression

- Thus: log[ Pr(Y = 1)/Pr(Y = 0) ] = β0 + β1X1 + ... + βpXp
- But why use log(odds)?

Slide 35: Logistic regression

- Linear regression might estimate anything in (-∞, +∞), not just a proportion in the range of 0 to 1.
- Logistic regression is a way to estimate a proportion (between 0 and 1), as well as some related items.

Slide 36: Linear models for binary outcomes

- We would like to use something like what we know from linear regression:
  Continuous outcome = β0 + β1X1 + β2X2 + ...
- How can we turn a proportion into a continuous outcome?

Slide 37: Transforming a proportion

- The odds are always positive: odds = p/(1 - p), in [0, +∞)
- The log odds is continuous: log odds = ln[ p/(1 - p) ], in (-∞, +∞)

Slide 38: Logit transformation

logit = log[ Pr(Y = 1) / (1 - Pr(Y = 1)) ]

  Measure                               Min    Max    Name
  Pr(Y = 1)                             0      1      "probability"
  Pr(Y = 1)/[1 - Pr(Y = 1)]             0      ∞      "odds"
  log{ Pr(Y = 1)/[1 - Pr(Y = 1)] }      -∞     ∞      "log-odds" or "logit"

Slide 39: Logit Function

- Relates log-odds (logit) to p = Pr(Y = 1)

[Plot: the logit function, with probability of success (0 to 1) plotted against log-odds (-10 to 10); an S-shaped curve]

Slide 40: Key Relationships

- Relating log-odds, probabilities, and parameters in logistic regression:
- Suppose the model: logit(p) = β0 + β1X, i.e. log[ p/(1 - p) ] = β0 + β1X
- Take "anti-logs": p/(1 - p) = exp(β0 + β1X)

Slide 41: Solve for p

- p = (1 - p)·exp(β0 + β1X)
- p = exp(β0 + β1X) - p·exp(β0 + β1X)
- p + p·exp(β0 + β1X) = exp(β0 + β1X)
- p·{1 + exp(β0 + β1X)} = exp(β0 + β1X)
- p = exp(β0 + β1X) / [1 + exp(β0 + β1X)]
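
The final line is the inverse-logit (expit) function; a sketch (the name `inv_logit` is mine):

```python
from math import exp

def inv_logit(eta):
    """Map a linear predictor (log-odds) back to a probability."""
    return exp(eta) / (1 + exp(eta))

print(inv_logit(0))  # → 0.5  (log-odds of 0 means even odds)
# plugging in the drug-trial fit, b0 + b1 = -0.288 + 0.814:
print(round(inv_logit(-0.288 + 0.814), 2))  # ≈ 0.63, the Drug group's observed proportion
```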

Slide 42: What's the point?

- We can determine the probability of success for a specific set of covariates, X, after running a logistic regression model.

Slide 43: Dependence of Blindness on Age

- The following data concern the Aegean island of Kalytos, where inhabitants suffer from a congenital eye disease whose effects become more marked with age.
- Samples of 50 people were taken at five different ages, and the numbers of blind people were counted.

Slide 44: Example: Data

  Age    Number blind / 50
  20      6 / 50
  35      7 / 50
  45     26 / 50
  55     37 / 50
  70     44 / 50

Slide 45: Question

- The scientific question of interest is to determine how the probability of blindness is related to age in this population.
- Let pi = Pr(a person in age class i is blind)

Slide 46: Model 1

logit(pi) = β0*

- β0* = log-odds of blindness for all ages
- exp(β0*) = odds of blindness for all ages
- No age dependence in this model
slide-47
SLIDE 47

Model 2

n

logit(pi) =

β0 + β1(agei – 45)

n

β0 = log-odds of blindness among 45 year olds

n

exp(β0) = odds of blindness among 45 year olds

n

β1 = difference in log-odds of blindness comparing

a group that is one year older than another

n

exp(β1) = odds ratio of blindness comparing a group that is one year older than another

Slide 48: Results

- Model 1: logit(p̂i) = -0.08, or
  p̂i = exp(-0.08) / [1 + exp(-0.08)] = 0.48

Iteration 0: log likelihood = -173.08674
Logit estimates                         Number of obs = 250
                                        LR chi2(0)    = 0.00
                                        Prob > chi2   = .
Log likelihood = -173.08674             Pseudo R2     = 0.0000

           y |    Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       _cons | -.0800427   .1265924    -0.63    0.527    -.3281593   .1680739
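
Model 1's intercept is just the logit of the overall proportion of blind people, 120/250, which we can confirm (a sketch; variable names are mine):

```python
from math import exp, log

blind = [6, 7, 26, 37, 44]              # blind counts per age group, 50 people each
p_hat = sum(blind) / (50 * len(blind))  # 120/250 = 0.48

b0 = log(p_hat / (1 - p_hat))        # intercept-only fit = logit of the overall proportion
p_back = exp(b0) / (1 + exp(b0))     # back-transform to a probability
print(round(b0, 2), round(p_back, 2))  # → -0.08 0.48
```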

Slide 49: Results

- Model 2: logit(p̂i) = -4.36 + 0.094·agei (equivalently, -0.12 + 0.094(agei - 45)), or
  p̂i = exp(-4.36 + 0.094·agei) / [1 + exp(-4.36 + 0.094·agei)]

Logit estimates                         Number of obs = 250
                                        LR chi2(1)    = 99.30
                                        Prob > chi2   = 0.0000
Log likelihood = -123.43444             Pseudo R2     = 0.2869

           y |    Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |  .0940683   .0119755     7.86    0.000     .0705967   .1175399
       _cons | -4.356181   .5700966    -7.64    0.000    -5.473549  -3.238812
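
Plugging the fitted coefficients into the inverse-logit formula gives fitted probabilities to set against the observed proportions (a sketch using the coefficients from the output above; the helper `p_blind` is mine):

```python
from math import exp

def p_blind(age, b0=-4.356181, b1=0.0940683):
    eta = b0 + b1 * age              # linear predictor on raw age
    return exp(eta) / (1 + exp(eta))

for age, blind in zip([20, 35, 45, 55, 70], [6, 7, 26, 37, 44]):
    print(age, round(p_blind(age), 2), blind / 50)  # age, fitted, observed
```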

Slide 50: Test of significance

- Is the addition of the age variable to the model important?
- Maximum likelihood estimates: β̂1 = 0.094, s.e.(β̂1) = 0.012
- z-test of H0: β1 = 0
- z = 7.855; p-val = 0.000
- 95% C.I.: (0.07, 0.12)

Slide 51: What about the Odds Ratio?

- Maximum likelihood estimates: OR = exp(β̂1) = 1.10, s.e. = 0.013
- z-test of H0: exp(β1) = 1
- z = 7.86, p-val = 0.000
- 95% C.I.: (1.07, 1.13) * (done on log scale)
- It appears that blindness is age dependent
- Note: exp(0) = 1; where is this fact useful?
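
The "(done on log scale)" note means: form the confidence interval for β1 first, then exponentiate its endpoints. A sketch using the estimates from the output on Slide 49:

```python
from math import exp

b1, se = 0.0940683, 0.0119755
lo, hi = b1 - 1.96 * se, b1 + 1.96 * se   # CI for beta1 on the log-odds scale
# exponentiate the endpoints to get the CI for the OR
print(round(exp(b1), 2), round(exp(lo), 2), round(exp(hi), 2))
```

This gives 1.10 with an interval of roughly (1.07, 1.13), matching the slide up to rounding.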

Slide 52: Model 1

- Plot of observed proportion vs. predicted proportions using an intercept-only model

[Plot: probability of blindness against age (20 to 80); the observed proportions rise with age while the intercept-only predictions are constant]

Slide 53: Model 2

- Plot of observed proportion vs. predicted proportions with age in the model

[Plot: probability of blindness against age (20 to 80); the age-based predictions track the observed proportions closely]

Slide 54: Conclusion

- Model 2 clearly fits better than Model 1!
- Including age in our model is better than the intercept alone.

Slide 55: Summary

- Logistic regression gives us a framework in which to model binary outcomes
- Uses the structure of linear models, with outcomes modelled as a function of covariates
- Many concepts carry over from linear regression:
  - Interactions
  - Linear splines
  - Tests of significance for coefficients
- All coefficients will have different interpretations in logistic regression