SLIDE 1
Estimating the Survival Function
One-sample nonparametric methods: We will consider three methods for estimating a survivorship function S(t) = Pr(T ≥ t) without resorting to parametric methods:
(1) Kaplan-Meier
(2) Life-table (Actuarial Estimator)
(3) Cumulative hazard estimator
SLIDE 2
The Kaplan-Meier Estimator
The Kaplan-Meier (or KM) estimator is probably the most popular approach. It can be justified from several perspectives:
- product limit estimator
- likelihood justification
- redistribute to the right estimator
We will start with an intuitive motivation based on conditional probabilities, then review some of the other justifications.
SLIDE 3
Motivation: First, consider an example where there is no censoring. The following are times of remission (weeks) for 21 leukemia patients receiving control treatment (Table 1.1 of Cox & Oakes):
1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23
How would we estimate S(10), the probability that an individual survives to time 10 or later? What about S̃(8)? Is it 12/21 or 8/21?
SLIDE 4
Let's construct a table of S̃(t):

Values of t      S̃(t)
t ≤ 1            21/21 = 1.000
1 < t ≤ 2        19/21 = 0.905
2 < t ≤ 3        17/21 = 0.810
3 < t ≤ 4
4 < t ≤ 5
5 < t ≤ 8
8 < t ≤ 11
11 < t ≤ 12
12 < t ≤ 15
15 < t ≤ 17
17 < t ≤ 22
22 < t ≤ 23
SLIDE 5
Empirical Survival Function: When there is no censoring, the general formula is:

S̃(t) = (# individuals with T ≥ t) / (total sample size)
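As a quick sketch (the function and variable names are illustrative, not part of the course software), this empirical survival function can be computed directly for the uncensored control-arm data:

```python
# Control-arm remission times from Table 1.1 of Cox & Oakes (no censoring)
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

def empirical_survival(times, t):
    """S~(t) = (# individuals with T >= t) / (total sample size)."""
    return sum(1 for T in times if T >= t) / len(times)

print(empirical_survival(control, 10))  # 8/21, about 0.381
```

Note that S̃(8) comes out as 12/21 here, since the four ties at t = 8 count as T ≥ 8; the strict-inequality count would give 8/21.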
SLIDE 6
Example for leukemia data (control arm):
SLIDE 7
What if there is censoring? Consider the treated group from Table 1.1 of Cox and Oakes:
6+, 6, 6, 6, 7, 9+, 10+, 10, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+
[Note: times with + are right censored]
We know S(6) = 21/21, because everyone survived at least until time 6 or greater. But we can't say S(7) = 17/21, because we don't know the status of the person who was censored at time 6. In a 1958 paper in the Journal of the American Statistical Association, Kaplan and Meier proposed a way to nonparametrically estimate S(t), even in the presence of censoring. The method is based on the ideas of conditional probability.
SLIDE 8
A quick review of conditional probability:
Conditional Probability: Suppose A and B are two events. Then,
P(A|B) = P(A ∩ B) / P(B)
Multiplication law of probability: can be obtained from the above relationship, by multiplying both sides by P(B):
P(A ∩ B) = P(A|B) P(B)
SLIDE 9
Extension to more than 2 events: Suppose A1, A2, ..., Ak are k different events. Then, the probability of all k events happening together can be written as a product of conditional probabilities:
P(A1 ∩ A2 ∩ ... ∩ Ak) = P(Ak | Ak−1 ∩ ... ∩ A1)
                        × P(Ak−1 | Ak−2 ∩ ... ∩ A1)
                        ...
                        × P(A2 | A1)
                        × P(A1)
SLIDE 10
Now, let's apply these ideas to estimate S(t): Suppose ak < t ≤ ak+1. Then
S(t) = P(T ≥ ak+1)
     = P(T ≥ a1, T ≥ a2, . . . , T ≥ ak+1)
     = P(T ≥ a1) × ∏_{j=1}^{k} P(T ≥ aj+1 | T ≥ aj)
     = ∏_{j=1}^{k} [1 − P(T = aj | T ≥ aj)]
     = ∏_{j=1}^{k} [1 − λj]
SLIDE 11
So,
Ŝ(t) ≅ ∏_{j=1}^{k} (rj − dj)/rj
where
- dj is the number of deaths at aj
- rj is the number at risk at aj
SLIDE 12
Intuition behind the Kaplan-Meier Estimator
Think of dividing the observed timespan of the study into a series of fine intervals so that there is a separate interval for each time of death or censoring:
D C C D D D
Using the law of conditional probability,
Pr(T ≥ t) = ∏_j Pr(survive j-th interval Ij | survived to start of Ij)
where the product is taken over all the intervals including or preceding time t.
SLIDE 13
There are 4 possibilities for each interval:
(1) No events (death or censoring) - conditional probability of surviving the interval is 1
(2) Censoring - assume they survive to the end of the interval, so that the conditional probability of surviving the interval is 1
(3) Death, but no censoring - conditional probability of not surviving the interval is # deaths (d) divided by # 'at risk' (r) at the beginning of the interval. So the conditional probability of surviving the interval is 1 − (d/r).
(4) Tied deaths and censoring - assume censorings last to the end of the interval, so that the conditional probability of surviving the interval is still 1 − (d/r)
SLIDE 14
General Formula for the j-th interval: It turns out we can write a general formula for the conditional probability of surviving the j-th interval that holds for all 4 cases:
1 − dj/rj
We could use the same approach by grouping the event times into intervals (say, one interval for each month), and then counting up the number of deaths (events) in each to estimate the probability of surviving the interval (this is called the lifetable estimate). However, the assumption that those censored last until the end of the interval wouldn't be quite accurate, so we would end up with a cruder approximation.
SLIDE 15
The Kaplan-Meier - product-limit - estimator As the intervals get finer and finer, the approximations made in estimating the probabilities of getting through each interval become smaller and smaller, so that the estimator converges to the true S(t). This intuition clarifies why an alternative name for the KM is the product limit estimator.
SLIDE 16
The Kaplan-Meier estimator of the survivorship function (or survival probability) S(t) = Pr(T ≥ t) is:

Ŝ(t) = ∏_{j: τj < t} (rj − dj)/rj = ∏_{j: τj < t} (1 − dj/rj)

where,
- τ1, ..., τK are the K distinct death times observed in the sample
- dj is the number of deaths at τj
- rj is the number of individuals "at risk" right before the j-th death time (everyone who dies or is censored at or after that time)
- cj is the number of censored observations between the j-th and (j + 1)-st death times. Censorings tied at τj are included in cj
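A minimal sketch of this product-limit formula in Python (the function and variable names are illustrative; 1 marks a death, 0 a censoring), using the treated-arm data from Cox & Oakes:

```python
def kaplan_meier(times, events):
    """Return [(death time tau_j, S_hat just after tau_j)]."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    s, out = 1.0, []
    for tau in death_times:
        r = sum(1 for t in times if t >= tau)  # at risk just before tau_j
        d = sum(1 for t, e in zip(times, events) if t == tau and e == 1)
        s *= (r - d) / r                       # product-limit step
        out.append((tau, s))
    return out

# Treated leukemia arm (Cox & Oakes Table 1.1): 1 = death, 0 = censored
t = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23,
     25, 32, 32, 34, 35]
e = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
     0, 0, 0, 0, 0]
km = dict(kaplan_meier(t, e))
print(round(km[6], 4))  # 0.8571, agreeing with the Stata output on slide 22
```

Note that the `t >= tau` comparison keeps observations censored at a death time in the risk set, matching case (4) on slide 13.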
SLIDE 17
Note: two useful formulas are:
(1) rj = rj−1 − dj−1 − cj−1
(2) rj = Σ_{l ≥ j} (cl + dl)
SLIDE 18
Calculating the KM - Cox and Oakes example
Make a table with a row for every death or censoring time:

τj    dj   cj   rj   1 − (dj/rj)     Ŝ(τj+)
6     3    1    21   18/21 = 0.857   0.857
7     1         17
9          1    16
10
11
13
16
17
19
20
22
23

Note that:
- Ŝ(t+) only changes at death (failure) times
- Ŝ(t+) is 1 up to the first death time
SLIDE 19
Ŝ(t+) only goes to 0 if the last event is a death
SLIDE 20
KM plot for treated leukemia patients
SLIDE 21
Note: most statistical software packages summarize the KM survival function at τj+, i.e., just after the time of the j-th failure. In other words, they provide Ŝ(τj+).
When there is no censoring, the empirical survival estimate would then be:

S̃(t+) = (# individuals with T > t) / (total sample size)
SLIDE 22
Output from STATA KM Estimator:

failure time: weeks
failure/censor: remiss

        Beg.               Survivor  Std.
Time    Total  Fail  Lost  Function  Error   [95% Conf. Int.]
6        21     3     1    0.8571    0.0764   0.6197  0.9516
7        17     1          0.8067    0.0869   0.5631  0.9228
9        16           1    0.8067    0.0869   0.5631  0.9228
10       15     1     1    0.7529    0.0963   0.5032  0.8894
11       13           1    0.7529    0.0963   0.5032  0.8894
13       12     1          0.6902    0.1068   0.4316  0.8491
16       11     1          0.6275    0.1141   0.3675  0.8049
17       10           1    0.6275    0.1141   0.3675  0.8049
19        9           1    0.6275    0.1141   0.3675  0.8049
20        8           1    0.6275    0.1141   0.3675  0.8049
22        7     1          0.5378    0.1282   0.2678  0.7468
23        6     1          0.4482    0.1346   0.1881  0.6801
25        5           1    0.4482    0.1346   0.1881  0.6801
32        4           2    0.4482    0.1346   0.1881  0.6801
34        2           1    0.4482    0.1346   0.1881  0.6801
35        1           1    0.4482    0.1346   0.1881  0.6801
SLIDE 23
Two Other Justifications for KM Estimator

I. Likelihood-based derivation (Cox and Oakes)
For a discrete failure time variable, define:
dj  number of failures at aj
rj  number of individuals at risk at aj (including those censored at aj)
λj  Pr(death) in the j-th interval (conditional on survival to the start of the interval)
The likelihood is that of g independent binomials:

L(λ) = ∏_{j=1}^{g} λj^dj (1 − λj)^(rj − dj)

Therefore, the maximum likelihood estimator of λj is:
λ̂j = dj/rj
SLIDE 24
Now we plug in the MLEs of λj to estimate S(t):

Ŝ(t) = ∏_{j: aj < t} (1 − λ̂j) = ∏_{j: aj < t} (rj − dj)/rj
SLIDE 25
II. Redistribute to the right justification (Efron, 1967)
In the absence of censoring, Ŝ(t) is just the proportion of individuals with T ≥ t. The idea behind Efron's approach is to spread the contributions of censored observations out over all the possible times to their right.
Algorithm:
- Step (1): arrange the n observed times (deaths or censorings) in increasing order. If there are ties, put censored after deaths.
- Step (2): Assign weight (1/n) to each time.
- Step (3): Moving from left to right, each time you encounter a censored observation, distribute its mass equally to all times to its right.
- Step (4): Calculate Ŝj by subtracting the final weight for time j from Ŝj−1
SLIDE 26
Example of “redistribute to the right” algorithm Consider the following event times: 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7 The algorithm goes as follows:
(Step 1)                                   (Step 4)
Times    Step 2      Step 3a   Step 3b     Ŝ(τj)
2        1/9 = 0.11                        0.889
2.5+     1/9 = 0.11                        0.889
3        2/9 = 0.22  0.25                  0.635
4        1/9 = 0.11  0.13                  0.508
4.5+     1/9 = 0.11  0.13                  0.508
5        1/9 = 0.11  0.13      0.17        0.339
6        1/9 = 0.11  0.13      0.17        0.169
7        1/9 = 0.11  0.13      0.17        0.000
This comes out the same as the product limit approach.
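The four steps above can be sketched in Python (a hypothetical helper, not from the slides; `obs` holds (time, event) pairs with event = 1 for a death, 0 for a censoring):

```python
def redistribute_right(obs):
    """Efron's redistribute-to-the-right estimate of S_hat at each death time."""
    obs = sorted(obs, key=lambda te: (te[0], te[1] == 0))  # censored after tied deaths
    n = len(obs)
    w = [1.0 / n] * n                       # Step 2: weight 1/n on each time
    for i, (_, event) in enumerate(obs):    # Step 3: left-to-right sweep
        if event == 0 and i < n - 1:        # censored: push its mass to the right
            extra = w[i] / (n - 1 - i)
            w[i] = 0.0
            for k in range(i + 1, n):
                w[k] += extra
    s, out = 1.0, {}                        # Step 4: subtract final weights
    for (time, event), wi in zip(obs, w):
        s -= wi
        if event == 1:
            out[time] = round(s, 3)
    return out

example = [(2, 1), (2.5, 0), (3, 1), (3, 1), (4, 1),
           (4.5, 0), (5, 1), (6, 1), (7, 1)]
print(redistribute_right(example))
# survival falls to 0.635 at time 3 and 0.508 at time 4, as in the table above
```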
SLIDE 27
Properties of the KM estimator
In the case of no censoring:
Ŝ(t) = S̃(t) = (# individuals with T ≥ t) / n
where n is the number of individuals in the study. This is just like an estimated probability from a binomial distribution, so we have:
Ŝ(t) ≃ N(S(t), S(t)[1 − S(t)]/n)
SLIDE 28
How does censoring affect this?
- Ŝ(t) is still approximately normal
- Ŝ(t) converges to the true S(t)
- The variance is a bit more complicated (since the denominator n includes some censored observations).
Once we get the variance, then we can construct (pointwise) 100(1 − α)% confidence bands about Ŝ(t):
Ŝ(t) ± z1−α/2 se[Ŝ(t)]
SLIDE 29
Greenwood's formula (Collett 2.1.3)
We can think of the KM estimator as
Ŝ(t) = ∏_{j: τj < t} (1 − λ̂j)
where λ̂j = dj/rj. Since the λ̂j's are just binomial proportions, we can apply standard likelihood theory to show that each λ̂j is approximately normal, with mean the true λj, and

var(λ̂j) ≈ λ̂j(1 − λ̂j)/rj

The λ̂j's are independent in large samples. Since Ŝ(t) is a function of the λ̂j's, we can estimate its variance using the delta method:
Delta method: If Y is normal with mean µ and variance σ², then g(Y) is approximately normally distributed with mean g(µ) and variance [g′(µ)]²σ².
SLIDE 30
Two specific examples of the delta method:
(A) Z = log(Y), then Z ∼ N(log(µ), (1/µ)²σ²)
(B) Z = exp(Y), then Z ∼ N(e^µ, (e^µ)²σ²)
The examples above use the following results from calculus:
d/dx log u = (1/u) du/dx
d/dx e^u = e^u du/dx
SLIDE 31
Greenwood's formula (continued)
Instead of dealing with Ŝ(t) directly, we will look at its log:

log[Ŝ(t)] = Σ_{j: τj < t} log(1 − λ̂j)

Thus, by approximate independence of the λ̂j's,

var(log[Ŝ(t)]) = Σ_{j: τj < t} var[log(1 − λ̂j)]
SLIDE 32
By (A),
var(log[Ŝ(t)]) = Σ_{j: τj < t} [1/(1 − λ̂j)]² var(λ̂j)
              = Σ_{j: τj < t} [1/(1 − λ̂j)]² λ̂j(1 − λ̂j)/rj
              = Σ_{j: τj < t} λ̂j / [(1 − λ̂j) rj]
              = Σ_{j: τj < t} dj / [(rj − dj) rj]
Since Ŝ(t) = exp(log[Ŝ(t)]), by (B),
var(Ŝ(t)) = [Ŝ(t)]² var(log[Ŝ(t)])
so Greenwood's formula is:

var(Ŝ(t)) = [Ŝ(t)]² Σ_{j: τj < t} dj / [(rj − dj) rj]
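Greenwood's formula can be sketched as a short Python function (names are illustrative); at the first death time of the treated arm (r = 21, d = 3) it reproduces the standard error 0.0764 shown in the Stata output:

```python
def greenwood_se(rd_pairs):
    """rd_pairs: (r_j, d_j) for the death times tau_j < t.
    Returns (S_hat(t), Greenwood standard error)."""
    s, var_log = 1.0, 0.0
    for r, d in rd_pairs:
        s *= (r - d) / r                 # KM product-limit step
        var_log += d / ((r - d) * r)     # sum of d_j / [(r_j - d_j) r_j]
    return s, s * var_log ** 0.5         # se = S_hat * sqrt(var(log S_hat))

s, se = greenwood_se([(21, 3)])
print(round(s, 4), round(se, 4))  # 0.8571 0.0764
```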
SLIDE 33
Back to confidence intervals
For a 95% confidence interval, we could use
Ŝ(t) ± z1−α/2 se[Ŝ(t)]
where se[Ŝ(t)] is calculated using Greenwood's formula.
Problem: This approach can yield values > 1 or < 0.
Better approach: Get a 95% confidence interval for
L(t) = log(− log(S(t)))
Since this quantity is unrestricted, the confidence interval will be in the right range when we transform back.
SLIDE 34
To see why this works, note the following:
- Ŝ(t) is an estimated probability: 0 ≤ Ŝ(t) ≤ 1
- Taking the log changes the bounds: −∞ ≤ log[Ŝ(t)] ≤ 0
- Negating: 0 ≤ − log[Ŝ(t)] ≤ ∞
- Taking the log again: −∞ ≤ log(− log[Ŝ(t)]) ≤ ∞
To transform back, reverse the steps with S(t) = exp(− exp(L(t)))
SLIDE 35
Log-log Approach for Confidence Intervals:
(1) Define L(t) = log(− log(S(t)))
(2) Form a 95% confidence interval for L(t) based on L̂(t), yielding [L̂(t) − A, L̂(t) + A]
(3) Since S(t) = exp(− exp(L(t))), the confidence bounds for the 95% CI on S(t) are:
[exp(−e^(L̂(t)+A)), exp(−e^(L̂(t)−A))]
(note that the upper and lower bounds switch)
(4) Substituting L̂(t) = log(− log(Ŝ(t))) back into the above bounds, we get confidence bounds of
([Ŝ(t)]^(e^A), [Ŝ(t)]^(e^−A))
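The four steps above can be sketched numerically (a hypothetical helper; the inputs are Ŝ and var(log Ŝ) = Σ dj/[(rj − dj)rj]). At the first death time of the treated arm it reproduces Stata's interval (0.6197, 0.9516), which is reassuring since Stata's default is the log-log interval:

```python
import math

def loglog_ci(s_hat, var_log_s, z=1.96):
    """Log-log CI for S(t): bounds are S_hat**exp(+A) and S_hat**exp(-A)."""
    se_L = math.sqrt(var_log_s) / abs(math.log(s_hat))  # delta method (slide 37)
    a = z * se_L
    return s_hat ** math.exp(a), s_hat ** math.exp(-a)  # (lower, upper)

low, high = loglog_ci(18 / 21, 3 / (18 * 21))
print(round(low, 4), round(high, 4))  # 0.6197 0.9516
```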
SLIDE 36
What is A?
- A = 1.96 se(L̂(t))
- To calculate this, we need to calculate
var(L̂(t)) = var(log(− log(Ŝ(t))))
- From our previous calculations, we know
var(log[Ŝ(t)]) = Σ_{j: τj < t} dj / [(rj − dj) rj]
SLIDE 37
- Applying the delta method as in example (A), we get:
var(L̂(t)) = var(log(− log[Ŝ(t)])) = (1/[log Ŝ(t)]²) Σ_{j: τj < t} dj / [(rj − dj) rj]
- We take the square root of the above to get se(L̂(t)), and then form the confidence intervals as:
[Ŝ(t)]^(e^(±1.96 se(L̂(t))))
- This is the approach that Stata uses. Splus also gives an option to calculate these bounds.
SLIDE 38
Summary of Confidence Intervals on S(t)
- Use Ŝ(t) ± 1.96 se[Ŝ(t)], where se[Ŝ(t)] is calculated using Greenwood's formula, and replace negative lower bounds by 0 and upper bounds greater than 1 by 1 (not very satisfactory)
  – Recommended by Collett
  – This is the default using SAS
- Use a log transformation to stabilize the variance and allow for non-symmetric confidence intervals. This is what is normally done for the confidence interval of an estimated odds ratio.
  – Use var[log(Ŝ(t))] = Σ_{j: τj < t} dj / [(rj − dj) rj], already calculated as part of Greenwood's formula
  – This is the default in Splus
- Use the log-log transformation just described
  – Somewhat complicated, but always yields proper bounds
  – This is the default in Stata!
SLIDE 39
Software for Kaplan-Meier Curves
- Stata - stset and sts commands
- SAS - proc lifetest
- Splus - surv.fit(time,censor)

Defaults for Confidence Interval Calculations
- Stata ⇒ L̂(t) ± 1.96 se[L̂(t)], where L(t) = log[− log(S(t))]
- SAS ⇒ Ŝ(t) ± 1.96 se[Ŝ(t)]
- Splus - "log" ⇒ log Ŝ(t) ± 1.96 se[log(Ŝ(t))], but Splus will also give either of the other two options if you request them.
SLIDE 40
Stata Commands
Create a file called "leukemia.dat" with the raw data, with a column for treatment, weeks to relapse (i.e., duration of remission), and relapse status:

.infile trt remiss status using leukemia.dat
.stset remiss status
    (sets up a failure time dataset, with failtime status in that order; type help stset to get details)
.sts list
    (estimated S(t), se[S(t)], and 95% CI)
.sts graph, saving(kmtrt)
    (creates a Kaplan-Meier plot, and saves the plot in file kmtrt.gph; type ``help gphdot'' to get some printing instructions)
.graph using kmtrt
    (redisplays the graph at any later time)
SLIDE 41
If the dataset has already been created and loaded into Stata, then you can substitute the following commands for initializing the data:

.use leukem            (finds Stata dataset leukem.dta)
.describe              (provides a description of the dataset)
.stset remiss status   (declares data to be failure type)
.stdes                 (gives a description of the survival dataset)
SLIDE 42
STATA Output for Treated Leukemia Patients:

.use leukem
.stset remiss status if trt==1
.sts list

failure time: remiss
failure/censor: status

        Beg.               Survivor  Std.
Time    Total  Fail  Lost  Function  Error   [95% Conf. Int.]
6        21     3     1    0.8571    0.0764   0.6197  0.9516
7        17     1          0.8067    0.0869   0.5631  0.9228
9        16           1    0.8067    0.0869   0.5631  0.9228
10       15     1     1    0.7529    0.0963   0.5032  0.8894
11       13           1    0.7529    0.0963   0.5032  0.8894
13       12     1          0.6902    0.1068   0.4316  0.8491
16       11     1          0.6275    0.1141   0.3675  0.8049
17       10           1    0.6275    0.1141   0.3675  0.8049
19        9           1    0.6275    0.1141   0.3675  0.8049
20        8           1    0.6275    0.1141   0.3675  0.8049
22        7     1          0.5378    0.1282   0.2678  0.7468
23        6     1          0.4482    0.1346   0.1881  0.6801
25        5           1    0.4482    0.1346   0.1881  0.6801
32        4           2    0.4482    0.1346   0.1881  0.6801
34        2           1    0.4482    0.1346   0.1881  0.6801
35        1           1    0.4482    0.1346   0.1881  0.6801
SLIDE 43
KM Survival Estimate and Confidence intervals
[Figure: Kaplan-Meier survival estimate with pointwise 95% CI, plotted against analysis time (weeks)]
SLIDE 44
. stsum

failure _d: status
analysis time _t: remiss

      |               incidence      no. of  |------ Survival time -----|
      | time at risk     rate       subjects    25%      50%      75%
------+------------------------------------------------------------------
total |          359   .0250696        21       13       23        .
SLIDE 45
Means, Medians, Quantiles based on the KM
- Mean: µ̂ = Σ_{j=1}^{K} τj P̂r(T = τj)
- Median - by definition, this is the time, τ, such that S(τ) = 0.5. However, in practice, it is defined as the smallest time such that Ŝ(τ) ≤ 0.5. The median is more appropriate for censored survival data than the mean.
For the treated leukemia patients, we find:
Ŝ(22) = 0.5378
Ŝ(23) = 0.4482
The median is thus 23. This can also be seen visually on the KM plot.
- Lower quartile (25th percentile): the smallest time (LQ) such that Ŝ(LQ) ≤ 0.75
- Upper quartile (75th percentile): the smallest time (UQ) such that Ŝ(UQ) ≤ 0.25
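The quantile rule above can be sketched in a few lines (the function name is illustrative; the two (τ, Ŝ) pairs are the values quoted on this slide):

```python
km_steps = [(22, 0.5378), (23, 0.4482)]  # (tau_j, S_hat(tau_j)) excerpt

def km_quantile(steps, p):
    """Smallest death time t with S_hat(t) <= 1 - p (p=0.5 gives the median)."""
    for t, s in sorted(steps):
        if s <= 1 - p:
            return t
    return None  # survival never drops that low: quantile undefined (as in stsum's 75% = .)

print(km_quantile(km_steps, 0.5))  # 23, agreeing with stsum's 50% = 23
```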
SLIDE 46
Stata command for median and quartiles: stsum
(2) The Lifetable Estimator of Survival:
We said that we would consider the following three methods for estimating a survivorship function S(t) = Pr(T ≥ t) without resorting to parametric methods:
(1) Kaplan-Meier [done]
(2) ⇒ Life-table (Actuarial Estimator)
(3) ⇒ Cumulative hazard estimator
SLIDE 47 (2) The Lifetable or Actuarial Estimator
- one of the oldest techniques around
- used by actuaries, demographers, etc.
- applies when the data are grouped
Our goal is still to estimate the survival function, hazard, and density function, but this is complicated by the fact that we don’t know exactly when during each time interval an event occurs. Lee (section 4.2) provides a good description of lifetable methods, and distinguishes several types according to the data sources:
SLIDE 48 Population Life Tables
- cohort life table - describes the mortality experience from
birth to death for a particular cohort of people born at about the same time. People at risk at the start of the interval are those who survived the previous interval.
- current life table - constructed from (1) census information on the number of individuals alive at each age, for a given year, and (2) vital statistics on the number of deaths or failures in a given year, by age. This type of lifetable is often reported in terms of a hypothetical cohort of 100,000 people.
Generally, censoring is not an issue for Population Life Tables.
SLIDE 49
Clinical Life tables
Applies to grouped survival data from studies in patients with specific diseases. Because patients can enter the study at different times, or be lost to follow-up, censoring must be allowed for.
SLIDE 50
Notation
- [tj−1, tj) - the j-th time interval
- cj - the number of censorings in the j-th interval
- dj - the number of failures in the j-th interval
- rj - the number entering the j-th interval
SLIDE 51
Example: 2418 Males with Angina Pectoris (Lee, p.91)

Year after Diagnosis   j    dj    cj     rj     r′j = rj − cj/2
[0, 1)                 1   456     0   2418    2418.0
[1, 2)                 2   226    39   1962    1942.5  (= 1962 − 39/2)
[2, 3)                 3   152    22   1697    1686.0
[3, 4)                 4   171    23   1523    1511.5
[4, 5)                 5   135    24   1329    1317.0
[5, 6)                 6   125   107   1170    1116.5
[6, 7)                 7    83   133    938     871.5
etc.
SLIDE 52
Estimating the survivorship function
We could apply the K-M formula directly to the numbers in the table on the previous page, estimating S(t) as
Ŝ(t) = ∏_{j: tj < t} (rj − dj)/rj
- However, this approach is unsatisfactory for grouped data ... it treats the problem as though it were in discrete time, with events happening only at 1 yr, 2 yr, etc. In fact, what we are trying to calculate here is the conditional probability of dying within the interval, given survival to the beginning of it.
SLIDE 53
What should we do with the censored people?
We can assume that censorings occur:
- at the beginning of each interval: r′j = rj − cj
- at the end of each interval: r′j = rj
- on average halfway through the interval: r′j = rj − cj/2
The last assumption yields the Actuarial Estimator. It is appropriate if censorings occur uniformly throughout the interval.
SLIDE 54
Constructing the lifetable
First, some additional notation for the j-th interval, [tj−1, tj):
- Midpoint (tmj) - useful for plotting the density and the hazard function
- Width (bj = tj − tj−1) - needed for calculating the hazard in the j-th interval
Quantities estimated:
- Conditional probability of dying: q̂j = dj/r′j
- Conditional probability of surviving: p̂j = 1 − q̂j
SLIDE 55
- Cumulative probability of surviving at tj:
Ŝ(tj) = ∏_{ℓ ≤ j} p̂ℓ = ∏_{ℓ ≤ j} (r′ℓ − dℓ)/r′ℓ
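The actuarial recursion can be sketched on the first rows of the angina table (variable names are illustrative); it reproduces the 0.6524 that appears in the ltable output for the [2, 3) interval:

```python
rows = [(456, 0), (226, 39), (152, 22)]  # (d_j, c_j) for the first 3 intervals
r, s = 2418, 1.0                         # r_1 = 2418 males at diagnosis
for d, c in rows:
    r_adj = r - c / 2                    # r'_j: censorings count half
    s *= 1 - d / r_adj                   # multiply by p_hat_j = 1 - d_j / r'_j
    r = r - d - c                        # number entering the next interval
print(round(s, 4))  # 0.6524, the survival estimate at the end of year 3
```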
SLIDE 56
Some important points to note:
- Because the intervals are defined as [tj−1, tj), the first interval typically starts with t0 = 0.
- Stata estimates the survival function at the right-hand endpoint of each interval, i.e., S(tj).
- However, SAS estimates the survival function at the left-hand endpoint, S(tj−1).
- The implication in SAS is that Ŝ(t0) = 1 and Ŝ(t1) = p̂1.
SLIDE 57
Other quantities estimated at the midpoint of the j-th interval:
- Hazard in the j-th interval:
λ̂(tmj) = dj / [bj (r′j − dj/2)] = q̂j / [bj (1 − q̂j/2)]
(the number of deaths in the interval divided by the average number of survivors at the midpoint)
- Density at the midpoint of the j-th interval:
f̂(tmj) = [Ŝ(tj−1) − Ŝ(tj)] / bj = Ŝ(tj−1) q̂j / bj
Note: Another way to get this is:
f̂(tmj) = λ̂(tmj) Ŝ(tmj) = λ̂(tmj) [Ŝ(tj) + Ŝ(tj−1)]/2
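As a numerical sketch (values taken from the angina table, interval [1, 2): d = 226, r′ = 1942.5, b = 1, Ŝ(1) = 0.8114, Ŝ(2) = 0.7170), the midpoint hazard matches the 0.1235 in the ltable hazard output:

```python
d, r_adj, b = 226, 1942.5, 1.0
s_left, s_right = 0.8114, 0.7170          # S_hat(t_{j-1}), S_hat(t_j)

hazard = d / (b * (r_adj - d / 2))        # = q_hat / [b (1 - q_hat/2)]
density = (s_left - s_right) / b          # = S_hat(t_{j-1}) q_hat / b
print(round(hazard, 4), round(density, 4))  # 0.1235 0.0944
```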
SLIDE 58
Constructing the Lifetable using Stata
Use the ltable command. If the raw data are already grouped, then the freq statement must be used when reading the data.
SLIDE 59
. infile years status count using angina.dat
(32 observations read)
. ltable years status [freq=count]

            Beg.                             Std.
Interval    Total   Deaths   Lost  Survival  Error   [95% Conf. Int.]
 0  1       2418     456           0.8114    0.0080   0.7952  0.8264
 1  2       1962     226     39    0.7170    0.0092   0.6986  0.7346
 2  3       1697     152     22    0.6524    0.0097   0.6329  0.6711
 3  4       1523     171     23    0.5786    0.0101   0.5584  0.5981
 4  5       1329     135     24    0.5193    0.0103   0.4989  0.5392
 5  6       1170     125    107    0.4611    0.0104   0.4407  0.4813
 6  7        938      83    133    0.4172    0.0105   0.3967  0.4376
 7  8        722      74    102    0.3712    0.0106   0.3505  0.3919
 8  9        546      51     68    0.3342    0.0107   0.3133  0.3553
 9 10        427      42     64    0.2987    0.0109   0.2775  0.3201
10 11        321      43     45    0.2557    0.0111   0.2341  0.2777
11 12        233      34     53    0.2136    0.0114   0.1917  0.2363
12 13        146      18     33    0.1839    0.0118   0.1614  0.2075
13 14         95       9     27    0.1636    0.0123   0.1404  0.1884
14 15         59       6     23    0.1429    0.0133   0.1180  0.1701
15 16         30             30    0.1429    0.0133   0.1180  0.1701
SLIDE 60
It is also possible to get estimates of the hazard function, λ̂j, and its standard error using the "hazard" option:

. ltable years status [freq=count], hazard

            Beg.     Cum.     Std.            Std.
Interval    Total   Failure   Error   Hazard  Error   [95% Conf Int]
 0  1       2418    0.1886   0.0080   0.2082  0.0097   0.1892  0.2272
 1  2       1962    0.2830   0.0092   0.1235  0.0082   0.1075  0.1396
 2  3       1697    0.3476   0.0097   0.0944  0.0076   0.0794  0.1094
 3  4       1523    0.4214   0.0101   0.1199  0.0092   0.1020  0.1379
 4  5       1329    0.4807   0.0103   0.1080  0.0093   0.0898  0.1262
 5  6       1170    0.5389   0.0104   0.1186  0.0106   0.0978  0.1393
 6  7        938    0.5828   0.0105   0.1000  0.0110   0.0785  0.1215
 7  8        722    0.6288   0.0106   0.1167  0.0135   0.0902  0.1433
 8  9        546    0.6658   0.0107   0.1048  0.0147   0.0761  0.1336
 9 10        427    0.7013   0.0109   0.1123  0.0173   0.0784  0.1462
10 11        321    0.7443   0.0111   0.1552  0.0236   0.1090  0.2015
11 12        233    0.7864   0.0114   0.1794  0.0306   0.1194  0.2395
12 13        146    0.8161   0.0118   0.1494  0.0351   0.0806  0.2182
13 14         95    0.8364   0.0123   0.1169  0.0389   0.0407  0.1931
14 15         59    0.8571   0.0133   0.1348  0.0549   0.0272  0.2425
15 16         30    0.8571   0.0133   0.0000       .        .       .
SLIDE 61
There is also a “failure” option which gives the number of failures (like the default), and also provides a 95% confidence interval on the cumulative failure probability. Suppose we wish to use the actuarial method, but the data do not come grouped. Consider the treated nursing home patients, with length of stay (los) grouped into 100 day intervals:
SLIDE 62
.use nurshome
.drop if rx==0          (keep only the treated patients)
(881 observations deleted)
.stset los fail
.ltable los fail, intervals(100)

             Beg.                             Std.
Interval     Total  Deaths  Lost   Survival   Error   [95% Conf. Int.]
   0  100     710    328           0.5380    0.0187    0.5006  0.5739
 100  200     382     86           0.4169    0.0185    0.3805  0.4529
 200  300     296     65           0.3254    0.0176    0.2911  0.3600
 300  400     231     38           0.2718    0.0167    0.2396  0.3050
 400  500     193     32      1    0.2266    0.0157    0.1966  0.2581
 500  600     160     13           0.2082    0.0152    0.1792  0.2388
 600  700     147     13           0.1898    0.0147    0.1619  0.2195
 700  800     134     10     30    0.1739    0.0143    0.1468  0.2029
 800  900      94      4     29    0.1651    0.0143    0.1383  0.1941
 900 1000      61      4     30    0.1508    0.0147    0.1233  0.1808
1000 1100      27            27    0.1508    0.0147    0.1233  0.1808
SLIDE 63
Examples for Nursing home data: Estimated Survival:

ltable los fail, intervals(100) graph connect(J)

[Figure: estimated proportion surviving vs. length of stay (days)]
SLIDE 64
Estimated hazard:

version 7
ltable los fail, hazard intervals(100) graph connect(J)

[Figure: estimated hazard vs. length of stay (days)]

Note: This graph command is not supported by version 9.0 in Stata.
SLIDE 65
(3) Estimating the cumulative hazard (Nelson-Aalen estimator)
Suppose we want to estimate Λ(t) = ∫₀ᵗ λ(u) du, the cumulative hazard at time t. Just as we did for the KM, think of dividing the observed timespan of the study into a series of fine intervals so that there is only one event per interval:
D C C D D D
SLIDE 66
Λ(t) can then be approximated by a sum:
Λ̂(t) = Σ_j λ̂j ∆
where the sum is over intervals, λ̂j is the value of the hazard in the j-th interval and ∆ is the width of each interval. Since λ̂j∆ is approximately the probability of dying in the interval, we can further approximate by
Λ̂(t) = Σ_j dj/rj
It follows that Λ̂(t) will change only at death times, and hence we write the Nelson-Aalen estimator as:
Λ̂NA(t) = Σ_{j: τj < t} dj/rj
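This running sum is a one-liner in practice; as a sketch (names are illustrative), the first three terms for the 12-subject subgroup reproduce the Nelson-Aalen values 0.0833, 0.1742, 0.2742 shown in the Stata listing on slide 69:

```python
def nelson_aalen(rd_pairs):
    """Cumulative hazard: running sum of d_j / r_j over successive death times."""
    total, out = 0.0, []
    for r, d in rd_pairs:
        total += d / r
        out.append(total)
    return out

na = nelson_aalen([(12, 1), (11, 1), (10, 1)])  # one death each at r = 12, 11, 10
print([round(x, 4) for x in na])  # [0.0833, 0.1742, 0.2742]
```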
SLIDE 67 The Fleming-Harrington (FH) estimator
          D     C     C     D     D     D
rj      n   n   n   n−1  n−1  n−2  n−2  n−3  n−4
dj      1                1    1
cj          1    1
λ̂(tj)   1/n                        1/(n−3)  1/(n−4)
Λ̂(tj)   1/n  1/n  1/n  1/n  1/n
Once we have Λ̂NA(t), we can also find another estimator of S(t) (Fleming-Harrington):
ŜFH(t) = exp(−Λ̂NA(t))
In general, this estimator of the survival function will be close to the Kaplan-Meier estimator, ŜKM(t).
We can also go the other way ... we can take the Kaplan-Meier estimate of S(t), and use it to calculate an alternative estimate of the cumulative hazard function:
Λ̂KM(t) = − log ŜKM(t)
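The FH/KM comparison on slides 69-71 can be sketched directly (names are illustrative; the subgroup has 12 subjects with one death at each of 12 distinct times):

```python
import math

s_km, s_fh, cum_haz = 1.0, [], 0.0
for j in range(12):
    r = 12 - j                      # number at risk before the (j+1)-st death
    s_km *= (r - 1) / r             # KM product-limit step
    cum_haz += 1 / r                # Nelson-Aalen step
    s_fh.append(math.exp(-cum_haz)) # S_FH = exp(-Lambda_NA)

print(round(s_fh[0], 4))  # 0.9200, vs KM 11/12 = 0.9167 after the first death
```

Since exp(−x) ≥ 1 − x, the FH estimate sits slightly above the KM estimate at every step, which is exactly the pattern in the `list skm sfh` output.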
SLIDE 68
Stata commands for FH Survival Estimate Say we want to obtain the Fleming-Harrington estimate of the survival function for married females, in the healthiest initial subgroup, who are randomized to the untreated group of the nursing home study. First, we use the following commands to calculate the Nelson-Aalen cumulative hazard estimator:
. use nurshome
. keep if rx==0 & gender==0 & health==2 & married==1
(1579 observations deleted)
SLIDE 69
. sts list, na

failure _d: fail
analysis time _t: los

        Beg.               Nelson-Aalen  Std.
Time    Total  Fail  Lost  Cum. Haz.     Error   [95% Conf. Int.]
         12     1          0.0833        0.0833   0.0117  0.5916
24       11     1          0.1742        0.1233   0.0435  0.6976
25       10     1          0.2742        0.1588   0.0882  0.8530
38        9     1          0.3854        0.1938   0.1438  1.0326
64        8     1          0.5104        0.2306   0.2105  1.2374
89        7     1          0.6532        0.2713   0.2894  1.4742
113       6     1          0.8199        0.3184   0.3830  1.7551
123       5     1          1.0199        0.3760   0.4952  2.1006
149       4     1          1.2699        0.4515   0.6326  2.5493
168       3     1          1.6032        0.5612   0.8073  3.1840
185       2     1          2.1032        0.7516   1.0439  4.2373
234       1     1          3.1032        1.2510   1.4082  6.8384
SLIDE 70
After generating the Nelson-Aalen estimator, we manually have to create a variable for the survival estimate:

. sts gen nelson=na
. gen sfh=exp(-nelson)
. list sfh

          sfh
 1.  .9200444
 2.  .8400932
 3.  .7601478
 4.  .6802101
 5.  .6002833
 6.  .5203723
 7.  .4404857
 8.  .3606392
 9.  .2808661
10.  .2012493
11.  .1220639
12.  .0449048

Additional built-in functions can be used to generate 95% confidence intervals on the FH survival estimate (to be covered in lab session).
SLIDE 71
We can compare the Fleming-Harrington survival estimate to the KM estimate by rerunning the sts list command:
. sts list
. sts gen skm=s
. list skm sfh

            skm        sfh
 1.  .91666667   .9200444
 2.  .83333333   .8400932
 3.  .75         .7601478
 4.  .66666667   .6802101
 5.  .58333333   .6002833
 6.  .5          .5203723
 7.  .41666667   .4404857
 8.  .33333333   .3606392
 9.  .25         .2808661
10.  .16666667   .2012493
11.  .08333333   .1220639
12.  0           .0449048
In this example, it looks like the Fleming-Harrington estimator is slightly higher than the KM at every time point, but with larger datasets the two will typically be much closer.