Lecture 16: Survival Analysis I
Kaplan-Meier and Log-rank Test
Ani Manichaikul
amanicha@jhsph.edu
11 May 2007
2
Survival Analysis
• Statistical methods for the study of time to an event
• Accounts for:
  • Time that events occur
  • Different follow-up times
3
Survival Analysis
• Survival analysis methods allow us to incorporate information about both the frequency of event occurrence and the time to event
• Subjects are followed until they have an “event,” or the study ends
4
Endpoint
• The endpoint doesn’t have to be ‘death’; it can be any well-defined event
  • Death
  • Disease onset
  • Menopause
  • Pregnancy
  • Relapse
5
Time Scale
• When do you start the clock?
  • Time from diagnosis of disease to death
  • Time from HIV infection to AIDS
  • Time from birth (chronological age)
  • Time from randomization in clinical trial
6
Why Is Survival Analysis Tricky?
• We need a method that can incorporate information about censored data into an analysis
7
The Survival Curve
[Figure: survival curve, S(t) plotted against time]
S(t) is an estimate of the proportion of individuals still alive (who have not had the event) at time t
8
The Survival Curve
• The survival curve is an important and complete summary
• Time 0: “start of clock”
$$S(t) = \frac{\#\ \text{alive at follow-up time } t}{\#\ \text{alive at time } 0}$$
9
Survival Curve Facts:
• The curve starts at 1 and decreases
• Estimating these curves and comparing them among groups constitutes a “survival analysis”
• Need to decide on what summary is important
  • Mean survival time
  • Median survival time
  • Height at a specific time: one- and two-year survival rates
  • Difference of curves: S1(12) - S2(12)
10
Estimating Median Survival
[Figure: survival curve, S(t) plotted against time; the median survival time m is the time at which S(t) = 0.50]
11
Caveat: Medians Do Not Describe Whole Curve
[Figure: survival curves, S(t) plotted against time, with the median m marked where S(t) = 0.50]
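The median can be read off a step curve as the first time at which S(t) falls to 0.50 or below. A minimal sketch in Python, assuming the curve is supplied as (time, S(t)) pairs in increasing time order; the function name `median_survival` and both sample curves are hypothetical and not taken from the lecture.

```python
def median_survival(curve):
    """Return the first time at which S(t) <= 0.5, or None if the
    curve never drops that far (for example, with heavy censoring).

    curve: list of (time, survival) pairs in increasing time order.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical step curves, for illustration only.
curve_reaching_half = [(2, 0.9), (5, 0.7), (9, 0.5), (14, 0.3)]
curve_staying_high = [(2, 0.9), (5, 0.8)]

m_reached = median_survival(curve_reaching_half)
m_missing = median_survival(curve_staying_high)
print(m_reached, m_missing)
```

Two curves with very different shapes can return the same value here, which is the caveat on the slide: the median alone does not describe the whole curve.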
12
Survival Function
• The survival function, denoted S(t), is a better way to represent the probability distribution of the survival time T when some of the observed times are censored
  • For a censored observation we only know that T > t, rather than T = t
• S(t) = Pr(T > t) = Pr(No event by time t)
• S(t) is the probability of surviving beyond t
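When no observation is censored, S(t) = Pr(T > t) can be estimated directly as the fraction of subjects whose event time exceeds t. A minimal sketch in Python under that no-censoring assumption; the function name `empirical_survival` and the event times are hypothetical.

```python
def empirical_survival(event_times, t):
    """Fraction of subjects with event time strictly greater than t.

    Valid only when every event time is fully observed (no censoring).
    """
    return sum(1 for u in event_times if u > t) / len(event_times)


# Hypothetical, fully observed event times.
event_times = [2, 3, 3, 7, 10]

s_at_0 = empirical_survival(event_times, 0)  # everyone is still event-free
s_at_3 = empirical_survival(event_times, 3)  # only times 7 and 10 exceed 3
print(s_at_0, s_at_3)
```

With censored observations this simple fraction is no longer usable, because for some subjects we only know T > t; that is the gap the Kaplan-Meier estimator fills.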
13
• Uncensored data: The event has occurred
• Censored data: The event has yet to occur
  • Event-free at the current follow-up time
  • A competing event that is not an endpoint stops follow-up
    • Death (if not part of the endpoint)
    • Clinical event that requires treatment, etc.
14
• Important issue: If no events are reported in the interval from last follow-up to “now”, need to choose between:
  • No news is good news?
  • No news is no news
15
• Ignore the incomplete cases; drop them
  • Produces bias in the estimated curve
  • Unbalanced censoring produces biased comparisons
• Impute an event time
  • Depends on a model
• Use the available information on each participant
16
$$\text{Event Rate} = \frac{\#\ \text{events}}{\text{total observation time}}$$
• Example: 5 events in 600 person-months
  • 5/600 = 1/120 events per person-month = 0.1 events per person-year = 10 events per 100 person-years
• Gives an average event rate over the follow-up period
• For a finer time resolution, do the above for small intervals
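The unit conversions in this example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and variable names are hypothetical.

```python
def event_rate(n_events, total_observation_time):
    """Average event rate: number of events per unit of observation time."""
    return n_events / total_observation_time


# 5 events observed over 600 person-months of follow-up.
rate_per_person_month = event_rate(5, 600)
rate_per_person_year = rate_per_person_month * 12      # 12 months per year
rate_per_100_person_years = rate_per_person_year * 100

print(rate_per_person_month, rate_per_person_year, rate_per_100_person_years)
```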
17
Quantities of Interest
• The survivor function S(t)
  S(t) = P(T > t) = P(No event by time t)
• Hazard function λ(t)
  λ(t) “=” P(T = t) / P(T > t)
       = risk of event occurring at time t
The above form holds for discrete time; continuous time requires more complicated calculus-based notation.
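For reference, the continuous-time notation alluded to here is usually written as a limit. The block below is the standard textbook definition, not something shown on the slide; the symbol λ for the hazard and f for the probability density of T are assumptions introduced for this sketch.

```latex
\lambda(t)
  = \lim_{\Delta \to 0^{+}}
    \frac{\Pr\left(t \le T < t + \Delta \mid T \ge t\right)}{\Delta}
  = \frac{f(t)}{S(t)}
```

Read informally, λ(t)·Δ is approximately the probability of the event in a short interval after t, given that no event has occurred before t.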
18
Quantities of Interest
• Often, we are interested in comparing the hazard between groups, for example, the relative hazard of relapse comparing those on chemo to those not on chemo
  • Relative Risk
  • Hazard Ratio
  • Risk Ratio
19
Estimation
• Kaplan-Meier survivor function estimator
• Cox proportional hazards model (PHM) for hazard ratio
• We’ll start with Kaplan-Meier (K-M)
20
Central Problem
• Estimation of the survival curve
• S(t) = proportion surviving to time t or beyond
21
The Survival Curve
[Figure: survival curve, S(t) plotted against time, starting at 1.0]
S(0) always equals 1: all subjects are alive at the beginning of the study
22
The Survival Curve
[Figure: survival curve, S(t) plotted against time, starting at 1.0]
Curve can only remain at the same value or decrease as time progresses
23
The Survival Curve
[Figure: survival curve, S(t) plotted against time, starting at 1.0]
If not all subjects experience the event by the end of the study window, the curve may never reach zero
24
Example
• Consider a clinical trial in patients with acute myelogenous leukemia (AML) comparing two groups of patients: no maintenance treatment with chemotherapy (X = 0) versus maintenance chemotherapy treatment (X = 1)
25
Example: Data
26
Why Survival Methods?
• We are interested in estimating the relationship between chemotherapy and the time to AML relapse in weeks.
• We need some tools because:
  • Data are censored, so linear regression is not appropriate
  • We are interested in time to relapse, not just relapse (yes/no), so logistic regression is not appropriate
27
Kaplan-Meier Estimate
• Curve can be estimated at each event, but not at censoring times
• S(t) = proportion of individuals surviving beyond time t
28
Kaplan-Meier Estimate
• Curve can be estimated at each event, but not at censoring times
$$S(t) = \frac{n(t) - y(t)}{n(t)} \times S(\text{previous event time})$$
• y(t) = # events at time t
• n(t) = # subjects at risk for event at time t
29
Kaplan-Meier Estimate
• Curve can be estimated at each event, but not at censoring times
$$S(t) = \frac{n(t) - y(t)}{n(t)} \times S(\text{previous event time})$$
• S(previous event time): proportion of original sample making it to time t
30
Kaplan-Meier Estimate
• Curve can be estimated at each event, but not at censoring times
$$S(t) = \frac{n(t) - y(t)}{n(t)} \times S(\text{previous event time})$$
• [n(t) - y(t)] / n(t): proportion surviving to time t who survive beyond time t
31
Kaplan-Meier Estimate
• Start estimate at first event time
• No Chemotherapy Group: Time = 5
$$S(5) = \frac{n(5) - y(5)}{n(5)} = \frac{12 - 2}{12} = \frac{10}{12} = 0.833$$
32
Kaplan-Meier Estimate
• No Chemotherapy Group: Time = 8
  • 2nd event time
$$S(8) = \frac{n(8) - y(8)}{n(8)} \times S(5) = \frac{10 - 2}{10} \times (0.833) = \frac{8}{10} \times 0.833 = 0.666$$
33
Kaplan-Meier Estimate
• Skip over censoring times: remove censored subjects from the number at risk for the next event time
• Continue through final event time
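The step-by-step recipe can be sketched in a few lines of Python. The lecture's data table did not survive in this transcript, so the follow-up times below are hypothetical: they were chosen only to agree with the risk-set counts quoted on the slides (12 at risk with 2 events at week 5, 10 at risk with 2 events at week 8, one censoring at week 16), and everything else about them is an assumption. The function name `kaplan_meier` is also hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, n_at_risk, n_events, S(t)) tuples.
    """
    data = list(zip(times, events))
    surv = 1.0  # S(0) = 1 by convention
    rows = []
    # Only event times produce a step; censoring times are skipped.
    for t in sorted({u for u, e in data if e}):
        n_at_risk = sum(1 for u, _ in data if u >= t)
        n_events = sum(1 for u, e in data if u == t and e)
        surv *= (n_at_risk - n_events) / n_at_risk
        rows.append((t, n_at_risk, n_events, surv))
    return rows


# Hypothetical follow-up times in weeks (1 = relapse, 0 = censored).
times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
events = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

km_rows = kaplan_meier(times, events)
for t, n, y, s in km_rows:
    print(f"t={t:>2}  at risk={n:>2}  events={y}  S(t)={s:.3f}")
```

In this sketch the subject censored at week 16 never gets a row of their own; they simply stop counting toward the number at risk from week 23 onward.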
34
Alternative Notation
$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - y_i}{n_i}, \qquad \hat{S}(0) = 1 \ \text{(by convention)}$$
35
36
Notice
• Time 16 was not included in the table, yet 2 people were subtracted from the risk set at time 23
  • The estimated survivor function does not change at censoring times when no event occurs
  • Censored individuals are subtracted from the risk set at subsequent times because they are “lost to follow-up”
37
38
Kaplan-Meier Estimate
• Graph is a step function
• “Jumps” at each observed event time
• Nothing is assumed about the curve’s shape between observed event times
39
Kaplan-Meier Estimate
40
Kaplan-Meier Estimate
• Product limit estimate
  • Order survival times
  • Computed at observed events
  • Multiplying conditional probabilities
• Next time we’ll discuss Confidence Intervals for S(t)!
41
Big Assumption
• Independence of censoring and survival
• Those censored at time t have the same prognosis as those not censored at t
42
Comparing Survival Curves
• Common statistical tests:
  • Generalized Wilcoxon (Breslow, Gehan)
  • Logrank
43
Comparing Survival Curves
• Both compare survival curves across multiple time points to answer the question: “Is overall survival different between any of the groups?”
  • H0: No difference in S(t)
  • Ha: Difference in S(t)
44
Comparing Survival Curves
• Wilcoxon (Breslow, Gehan) more sensitive to early survival differences
[Figure: Kaplan-Meier curves by group (Group 1, Group 2), plotted against analysis time]
45
Comparing Survival Curves
• Logrank more sensitive to later survival differences
[Figure: Kaplan-Meier curves by group (Group 1, Group 2), plotted against analysis time]
46
Comparing Survival Curves
• Neither test is very good if the curves cross over
[Figure: Kaplan-Meier curves by group (Group 1, Group 2), plotted against analysis time]
47
Logrank Test
• Answers the question: Are two survivor curves the same?
• Use the times of events: t1, t2, ... (do not include censoring times)
• Treat each event and its “set of persons still at risk” (i.e., risk set) at each time tj as an independent table
48
Logrank Test: Recipe
• Make a 2×2 table at each tj
49
Logrank test
• At each event time tj, under the assumption of equal survival (SA(t) = SB(t)), the expected number of events in Group A out of the total events (dj = aj + cj) is in proportion to the number at risk in Group A relative to the total at risk at time tj:
$$E(a_j) = d_j \times \frac{n_{jA}}{n_j}$$
50
Logrank Test: Formula
51
Logrank Test
• Uses the Cochran-Mantel-Haenszel idea of pooling over events j to get the log-rank statistic
• This chi-square statistic has 1 degree of freedom (use to get p-value)
  • Small p-value: reject H0
  • Conclusion: survival curves ARE different!
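The pooling over event times can be sketched in Python as follows. The formula slide did not survive in this transcript, so the variance term is an assumption: the code uses the usual hypergeometric (Mantel-Haenszel) variance for each 2×2 table. The function name `logrank_chi2` and the tiny two-group dataset are hypothetical.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Log-rank chi-square statistic (1 df) comparing groups A and B.

    events_* entries are 1 for an observed event, 0 for censoring.
    """
    subjects = [(t, e, "A") for t, e in zip(times_a, events_a)]
    subjects += [(t, e, "B") for t, e in zip(times_b, events_b)]

    obs_minus_exp = 0.0
    variance = 0.0
    # One 2x2 table per distinct event time tj; censoring times are skipped.
    for tj in sorted({t for t, e, _ in subjects if e}):
        n_j = sum(1 for t, _, _ in subjects if t >= tj)
        n_ja = sum(1 for t, _, g in subjects if t >= tj and g == "A")
        d_j = sum(1 for t, e, _ in subjects if t == tj and e)
        a_j = sum(1 for t, e, g in subjects if t == tj and e and g == "A")

        expected_a = d_j * n_ja / n_j  # E(aj) = dj * njA / nj
        obs_minus_exp += a_j - expected_a
        if n_j > 1:  # a table with one subject contributes no variance
            p_a = n_ja / n_j
            variance += d_j * p_a * (1 - p_a) * (n_j - d_j) / (n_j - 1)

    return obs_minus_exp ** 2 / variance


# Hypothetical data: two subjects per group, no censoring.
chi2 = logrank_chi2([1, 3], [1, 1], [2, 4], [1, 1])
print(round(chi2, 4))
```

The statistic is then compared with a chi-square distribution on 1 degree of freedom, as the slide describes.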
52
Logrank Test: Our Example
Chi-square = 2.61, p-value = 0.1061
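The reported p-value can be checked from the chi-square statistic. For 1 degree of freedom the upper tail probability equals erfc(sqrt(x / 2)), so the Python standard library is enough; the function name `chi2_1df_pvalue` is hypothetical.

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi2_1df_pvalue(2.61)
print(f"{p_value:.3f}")
```

The result agrees with the slide's value to about three decimal places; any small difference in the last digit would come from the statistic itself being rounded to 2.61.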
53
Conclusion
• Fail to reject the null hypothesis. Cannot conclude that there is a difference between the survival (time to relapse) of those on maintenance chemotherapy and those not on maintenance chemotherapy.
54
Conclusion
• What if we want to adjust for other factors?
  • Cox Proportional Hazards Model!
  • Next time…