Lecture 16: Survival Analysis I: Kaplan-Meier and Log-rank Test



slide-1
SLIDE 1

Lecture 16: Survival Analysis I – Kaplan Meier and Log-rank test

Ani Manichaikul amanicha@jhsph.edu 11 May 2007

slide-2
SLIDE 2

Survival Analysis

• Statistical methods for the study of time to an event
• Accounts for:
    • Time that events occur
    • Different follow-up times

slide-3
SLIDE 3

Survival Analysis

• Survival analysis methods let us use both how often events occur and how long they take to occur
• Subjects are followed until they have an “event” or the study ends

slide-4
SLIDE 4

Endpoint

• The endpoint doesn’t have to be “death”; it can be any well-defined event
    • Death
    • Disease onset
    • Menopause
    • Pregnancy
    • Relapse

slide-5
SLIDE 5

Time Scale

• When do you start the clock?
    • Time from diagnosis of disease to death
    • Time from HIV infection to AIDS
    • Time from birth (chronological age)
    • Time from randomization in clinical trial

slide-6
SLIDE 6

Why Is Survival Analysis Tricky?

• We need a method that can incorporate information about censored data into an analysis

slide-7
SLIDE 7

The Survival Curve

S(t) is an estimate of the proportion of individuals still alive (have not had the event) at time t

[Figure: survival curve, S(t) versus time]

slide-8
SLIDE 8

The Survival Curve

• The survival curve is an important and complete summary
• Time 0: “start of clock”

$$S(t) = \frac{\#\text{ alive at follow-up time } t}{\#\text{ alive at time } 0}$$

slide-9
SLIDE 9

Survival Curve Facts:

• The curve starts at 1 and never increases
• Estimating these curves and comparing them among groups constitutes a “survival analysis”
• Need to decide on what summary is important
    • Mean survival time
    • Median survival time
    • Height at a specific time: one- and two-year survival rates
    • Difference of curves: S1(12) - S2(12)

slide-10
SLIDE 10

Estimating Median Survival

[Figure: survival curve, S(t) versus time; the median survival time m is read off where S(t) = .50]

slide-11
SLIDE 11

Caveat: Medians Do Not Describe Whole Curve

[Figure: survival curve, S(t) versus time, with the median m marked where S(t) = .50]
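Reading a median off a survival curve amounts to finding the earliest time at which the curve is at or below .50. The sketch below assumes the curve is supplied as two parallel lists (step times and curve heights); the function name and both example curves are hypothetical and only illustrate that two very different curves can share a median.

```python
def median_survival(times, survival):
    """Earliest time at which the step curve is at or below 0.50, else None."""
    for t, s in zip(times, survival):
        if s <= 0.5:
            return t
    return None  # curve never reaches 0.50 within follow-up


# Two HYPOTHETICAL curves (illustrative values, not from the lecture data).
steady_decline = ([2, 4, 6, 8, 10], [0.9, 0.7, 0.5, 0.3, 0.1])
late_plateau = ([1, 3, 6, 8, 10], [0.6, 0.55, 0.5, 0.5, 0.5])

median_steady = median_survival(*steady_decline)
median_plateau = median_survival(*late_plateau)
print(median_steady, median_plateau)
```

Both curves give the same median even though one keeps falling and the other levels off, which is why the median alone is an incomplete summary.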

slide-12
SLIDE 12

Survival Function

• The survival function, denoted S(t), is a better way to represent the probability distribution of the survival time T when some of the observed times are censored
    • For a censored observation we only know that T > t, rather than T = t
• S(t) = Pr(T > t) = Pr(No event by time t)
• S(t) is the probability of surviving beyond t

slide-13
SLIDE 13

• Uncensored data: The event has occurred
• Censored data: The event has yet to occur
    • Event-free at the current follow-up time
    • A competing event that is not an endpoint stops follow-up
        • Death (if not part of the endpoint)
        • Clinical event that requires treatment, etc.

slide-14
SLIDE 14

• Important issue: If no events are reported in the interval from last follow-up to “now”, need to choose between:
    • No news is good news?
    • No news is no news

slide-15
SLIDE 15

• Ignore the incomplete cases; drop them
    • Produces bias in the estimated curve
    • Unbalanced censoring produces biased comparisons
• Impute an event time
    • Depends on a model
• Use the available information on each participant

slide-16
SLIDE 16

$$\text{Event Rate} = \frac{\#\text{ events}}{\text{total observation time}}$$

• Example: 5 events in 600 person-months
    • 5/600 = 1/120 events per person-month = 0.1 events per person-year = 10 events per 100 person-years
• Gives an average event rate over the follow-up period
• For a finer time resolution, do the above for small intervals
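The unit conversions in this example are easy to slip on, so here is a minimal sketch of the same arithmetic. The function and variable names are illustrative only; the numbers are the ones from the slide.

```python
def event_rate(n_events, person_time):
    """Average event rate: events per unit of total observation time."""
    return n_events / person_time


rate_per_person_month = event_rate(5, 600)          # 5 events in 600 person-months
rate_per_person_year = rate_per_person_month * 12   # 12 months in a year
rate_per_100_person_years = rate_per_person_year * 100

print(rate_per_person_month, rate_per_person_year, rate_per_100_person_years)
```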

slide-17
SLIDE 17

Quantities of Interest

• The survivor function S(t)
    S(t) = P(T > t) = P(No event by time t)
• Hazard function λ(t)
    λ(t) “=” P(T = t) / P(T ≥ t) = risk of event occurring at time t, given no event before t

The above form holds for discrete time; continuous time requires more complicated calculus-based notation.
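For reference, the continuous-time version is usually written as a limit. This is a sketch in standard notation, not taken from the slides: the symbol λ(t) for the hazard and f(t) for the probability density of T are assumptions, since neither symbol survives in the slide text.

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t)$$

The last equality shows that the hazard and the survivor function carry the same information: knowing one determines the other.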

slide-18
SLIDE 18

Quantities of Interest

• Often, we are interested in comparing the hazard between groups, for example, the relative hazard of relapse comparing those on chemo to those not on chemo
    • Relative Risk
    • Hazard Ratio
    • Risk Ratio

slide-19
SLIDE 19

Estimation

• Kaplan-Meier survivor function estimator
• Cox proportional hazards model (PHM) for hazard ratio
• We’ll start with Kaplan-Meier (K-M)

slide-20
SLIDE 20

Central Problem

• Estimation of the survival curve
• S(t) = proportion surviving to time t or beyond

slide-21
SLIDE 21

The Survival Curve

S(0) always equals 1: all subjects are alive at the beginning of the study

[Figure: survival curve, S(t) versus time, starting at 1.0]

slide-22
SLIDE 22

The Survival Curve

Curve can only remain at the same value or decrease as time progresses

[Figure: survival curve, S(t) versus time, starting at 1.0]

slide-23
SLIDE 23

The Survival Curve

If not all subjects experience the event by the end of the study window, the curve may never reach zero

[Figure: survival curve, S(t) versus time, starting at 1.0]

slide-24
SLIDE 24

Example

• Consider a clinical trial in patients with acute myelogenous leukemia (AML) comparing two groups of patients: no maintenance treatment with chemotherapy (X = 0) versus maintenance chemotherapy treatment (X = 1)

slide-25
SLIDE 25

Example: Data

slide-26
SLIDE 26

Why Survival Methods?

• We are interested in estimating the relationship between chemotherapy and the time to AML relapse in weeks.
• We need special tools because:
    • Data are censored, so linear regression is not appropriate
    • We are interested in time to relapse, not just relapse (yes/no), so logistic regression is not appropriate

slide-27
SLIDE 27

Kaplan-Meier Estimate

• Curve can be estimated at each event, but not at censoring times
• S(t) = proportion of individuals surviving beyond time t

slide-28
SLIDE 28

Kaplan-Meier Estimate

• Curve can be estimated at each event, but not at censoring times

$$S(t) = \left(\frac{n(t) - y(t)}{n(t)}\right) \times S(\text{Previous\_Event\_Time})$$

• y(t) = # events at time t
• n(t) = # subjects at risk for event at time t

slide-29
SLIDE 29

Kaplan-Meier Estimate

• Curve can be estimated at each event, but not at censoring times

$$S(t) = \left(\frac{n(t) - y(t)}{n(t)}\right) \times S(\text{Previous\_Event\_Time})$$

S(Previous_Event_Time): proportion of original sample making it to time t

slide-30
SLIDE 30

Kaplan-Meier Estimate

• Curve can be estimated at each event, but not at censoring times

$$S(t) = \left(\frac{n(t) - y(t)}{n(t)}\right) \times S(\text{Previous\_Event\_Time})$$

(n(t) - y(t)) / n(t): proportion surviving to time t who survive beyond time t

slide-31
SLIDE 31

Kaplan-Meier Estimate

• Start estimate at first event time
• No Chemotherapy group: Time = 5

$$S(5) = \left(\frac{n(5) - y(5)}{n(5)}\right) = \frac{12 - 2}{12} = \frac{10}{12} = .833$$

slide-32
SLIDE 32

Kaplan-Meier Estimate

• No Chemotherapy group: Time = 8
• 2nd event time

$$S(8) = \left(\frac{n(8) - y(8)}{n(8)}\right) \times S(5) = \left(\frac{10 - 2}{10}\right) \times (.833) = \frac{8}{10} \times .833 = .666$$

slide-33
SLIDE 33

Kaplan-Meier Estimate

• Skip over censoring times: remove censored subjects from the number at risk at the next event time
• Continue through final event time
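The whole recipe (step down at each event time, skip censoring times, shrink the risk set) can be sketched in a few lines of Python. The data table from the lecture does not survive in this transcript, so the follow-up times below are an assumption: they are the no-maintenance arm of the widely distributed AML maintenance dataset, chosen because they agree with the worked numbers on the slides (12 at risk and 2 events at week 5, 2 events at week 8, a censoring at week 16). The function name and the 1/0 event coding are also illustrative choices.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return rows of (event_time, n_at_risk, n_events, survival).

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    survival = 1.0  # S(0) = 1 by convention
    for t in sorted(event_counts):  # event times only; censoring times are skipped
        # Risk set: everyone whose follow-up time is at least t.
        # Subjects censored before t have already dropped out of it.
        n_at_risk = sum(1 for u in times if u >= t)
        y = event_counts[t]
        survival *= (n_at_risk - y) / n_at_risk
        rows.append((t, n_at_risk, y, survival))
    return rows


# ASSUMED data (see note above): weeks to relapse, no-maintenance group.
weeks = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
relapse = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]  # 0 marks the censored time (16)

table = kaplan_meier(weeks, relapse)
for t, n, y, s in table:
    print(f"t={t:>3}  at risk={n:>2}  events={y}  S(t)={s:.3f}")
```

The first two rows reproduce the hand calculations for weeks 5 and 8. Week 16 produces no row, but the subject censored there is missing from the risk set at week 23.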

slide-34
SLIDE 34

Alternative Notation

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(\frac{n_i - y_i}{n_i}\right), \qquad \hat{S}(0) = 1 \ \text{(by convention)}$$

slide-35
SLIDE 35

slide-36
SLIDE 36

Notice

• Time 16 was not included in the table, yet 2 people were subtracted from the risk set at time 23
• The estimated survivor function does not change at censoring times when no event occurs
• Censored individuals are subtracted from the risk set at subsequent times because they are “lost to follow-up”

slide-37
SLIDE 37

slide-38
SLIDE 38

Kaplan-Meier Estimate

• Graph is a step function
• “Jumps” at each observed event time
• Nothing is assumed about the shape of the curve between observed event times

slide-39
SLIDE 39

Kaplan-Meier Estimate

slide-40
SLIDE 40

Kaplan-Meier Estimate

• Product limit estimate
    • Order survival times
    • Computed at observed events
    • Multiplying conditional probabilities
• Next time we’ll discuss confidence intervals for S(t)!

slide-41
SLIDE 41

Big Assumption

• Independence of censoring and survival
• Those censored at time t have the same prognosis as those not censored at t

slide-42
SLIDE 42

Comparing Survival Curves

• Common statistical tests:
    • Generalized Wilcoxon (Breslow, Gehan)
    • Logrank

slide-43
SLIDE 43

Comparing Survival Curves

• Both compare survival curves across multiple time points to answer the question: “Is overall survival different between any of the groups?”
    • H0: No difference in S(t)
    • Ha: Difference in S(t)

slide-44
SLIDE 44

Comparing Survival Curves

• Wilcoxon (Breslow, Gehan) more sensitive to early survival differences

[Figure: Kaplan-Meier curves by group (Group 1, Group 2), survival probability versus analysis time]

slide-45
SLIDE 45

Comparing Survival Curves

• Logrank more sensitive to later survival differences

[Figure: Kaplan-Meier curves by group (Group 1, Group 2), survival probability versus analysis time]

slide-46
SLIDE 46

Comparing Survival Curves

• Neither test performs well if the curves cross over

[Figure: Kaplan-Meier curves by group (Group 1, Group 2), survival probability versus analysis time]

slide-47
SLIDE 47

Logrank Test

• Answers the question: Are two survivor curves the same?
• Use the times of events: t1, t2, ... (do not include censoring times)
• Treat each event time tj and its “set of persons still at risk” (i.e., risk set) as an independent table

slide-48
SLIDE 48

Logrank Test: Recipe

• Make a 2×2 table at each tj

slide-49
SLIDE 49

Logrank test

• At each event time tj, under the assumption of equal survival (SA(t) = SB(t)), the expected number of events in Group A out of the total events (dj = aj + cj) is proportional to Group A’s share of the total at risk at time tj:
    E(aj) = dj × njA / nj

slide-50
SLIDE 50

Logrank Test: Formula
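The formula shown on this slide did not survive in the transcript. One common way to write the log-rank statistic, using the notation of the neighboring slides (aj, dj, njA, nj), is sketched here. The symbol n_{jB} = n_j - n_{jA} and the hypergeometric variance term are assumptions about what the original slide displayed, not a reconstruction of it.

$$\chi^2_{\text{logrank}} = \frac{\left[\sum_j \left(a_j - E(a_j)\right)\right]^2}{\sum_j \operatorname{Var}(a_j)}, \qquad E(a_j) = \frac{d_j \, n_{jA}}{n_j}, \qquad \operatorname{Var}(a_j) = \frac{d_j \, n_{jA} \, n_{jB} \, (n_j - d_j)}{n_j^2 \, (n_j - 1)}$$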

slide-51
SLIDE 51

Logrank Test

• Uses the Cochran-Mantel-Haenszel idea of pooling over events j to get the log-rank statistic
• This chi-square statistic has 1 degree of freedom (use to get p-value)
    • Small p-value: reject H0
    • Conclusion: Survival curves ARE different!
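The pooling idea can be sketched directly: build one 2×2 table per event time, accumulate observed minus expected events for Group A, and divide the squared total by the summed variance. This is a minimal illustration, not the lecture's own code. It assumes the usual hypergeometric variance for each table, the function name is hypothetical, and the two tiny groups at the bottom are made-up data chosen so the result can be checked by hand.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # 2x2 table at this event time: group (A/B) by event (yes/no).
        n_a = sum(1 for u in times_a if u >= t)  # at risk in A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in B
        a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        c = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, a + c
        expected_a = d * n_a / n  # E(a_j) = d_j * n_jA / n_j
        observed_minus_expected += a - expected_a
        if n > 1:
            # Hypergeometric variance of a_j given the table margins.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times; statistic undefined")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# HYPOTHETICAL data: every subject has the event; Group A fails earlier.
chi_square, p_value = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
print(chi_square, p_value)
```

Swapping the two groups flips the sign of observed minus expected but leaves the chi-square unchanged, so it does not matter which group is labeled A.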

slide-52
SLIDE 52

Logrank Test: Our Example

Chi2 = 2.61, p-value = .1061
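The p-value follows from the chi-square statistic and its 1 degree of freedom. The sketch below uses only the standard library and the identity P(X > x) = erfc(sqrt(x/2)) for a 1-df chi-square variable; the function name is illustrative. Starting from the rounded statistic 2.61, the result agrees with the reported p-value to about three decimal places.

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi2_upper_tail_1df(2.61)
print(p_value)
```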

slide-53
SLIDE 53

Conclusion

• Fail to reject the null hypothesis. We cannot conclude that there is a difference in survival (time to relapse) between those on maintenance chemotherapy and those not on maintenance chemotherapy.

slide-54
SLIDE 54

Conclusion

• What if we want to adjust for other factors?
    • Cox Proportional Hazards Model!
    • Next time…