
SLIDE 1

Censored Survival Data: Simulation and Kernel Estimates

Jiří Zelinka
Compstat'2010, Paris, August 22–27

Department of Mathematics and Statistics Faculty of Science, Masaryk University Brno, Czech Republic

Supported by MŠMT LC06024

SLIDE 2

Introduction

Previous research (Horová, Pospíšil & Zelinka (2008) and Horová, Pospíšil & Zelinka (2009)): a combination of kernel smoothing and a dynamic model in survival analysis.

Verification of the developed method: simulations → problem of generating suitable censored data.
The aim of this paper is to solve this problem.


SLIDE 3

Survival and hazard functions

$T \ge 0$: survival time
$F$: cumulative distribution function (c.d.f.) of $T$
$\bar F = 1 - F$: survival function
$\lambda = \lambda(x)$: hazard function

Hazard function (intensity of survival probability):

$$\lambda(x) = -\frac{\bar F'(x)}{\bar F(x)} = -\big(\log \bar F(x)\big)' = \frac{f(x)}{\bar F(x)} \qquad (1)$$

if the density $f$ exists. From (1) we have

$$\bar F(x) = \exp\Big(-\int_0^x \lambda(t)\,dt\Big). \qquad (2)$$
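The step from (1) to (2) can be written out as follows. This is a worked step added for clarity; it assumes $\bar F(0) = 1$, that is, $T > 0$ almost surely, which the slide does not state explicitly.

```latex
-\frac{d}{dt}\log \bar F(t) = \lambda(t)
\quad\Longrightarrow\quad
\log \bar F(x) - \log \bar F(0) = -\int_0^x \lambda(t)\,dt
\quad\Longrightarrow\quad
\bar F(x) = \exp\Big(-\int_0^x \lambda(t)\,dt\Big).
```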


SLIDE 4

Random censorship model

$T_1, T_2, \dots, T_n$: i.i.d. lifetimes with c.d.f. $F$
$C_1, \dots, C_n$: i.i.d. censoring times with c.d.f. $G$
Censoring times are independent of the lifetimes.

In the random censorship model we observe pairs $(X_i, \delta_i)$, $i = 1, \dots, n$, where

$$X_i = \min(T_i, C_i), \qquad \delta_i = \mathbf 1\{X_i = T_i\},$$

and $\delta_i$ indicates whether the observation is censored or not.

$\{X_i\}$ are i.i.d. with survival function $\bar L$:

$$\bar L(x) = \bar F(x)\,\bar G(x).$$
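A minimal sketch of how the observed pairs could be formed from simulated lifetimes and censoring times. The function name `censor` and the example distributions (exponential lifetimes, uniform censoring) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def censor(T, C):
    """Return X_i = min(T_i, C_i) and delta_i = 1{X_i = T_i} (1 = uncensored)."""
    T = np.asarray(T, dtype=float)
    C = np.asarray(C, dtype=float)
    X = np.minimum(T, C)
    delta = (T <= C).astype(int)  # X_i equals T_i exactly when T_i <= C_i
    return X, delta

# Hypothetical example data, only to show the call.
rng = np.random.default_rng(0)
T = rng.exponential(scale=1.0, size=5)  # assumed lifetime distribution
C = rng.uniform(0.0, 2.0, size=5)       # assumed censoring distribution
X, delta = censor(T, C)
print(X, delta)
```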


SLIDE 5

Kernel estimates of the hazard function

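Let $[0, \tau]$, $\tau > 0$, be an interval for which $L(\tau) < 1$ and $\lambda \in C^2[0, \tau]$, and let $K$ be a continuous and symmetric function on $\mathbb R$, called a kernel, satisfying the conditions:

1. $\operatorname{supp} K = [-1, 1]$
2. $K \in \operatorname{Lip}[-1, 1]$
3. $\int_{-1}^{1} x^k K(x)\,dx = \begin{cases} 1, & k = 0 \\ 0, & k = 1 \\ \beta_2 \neq 0, & k = 2. \end{cases}$

The well-known kernels:

$K(x) = \frac{3}{4}(1 - x^2)\,\mathbf 1_{[-1,1]}(x)$ – Epanechnikov kernel
$K(x) = \frac{15}{16}(1 - x^2)^2\,\mathbf 1_{[-1,1]}(x)$ – quartic kernel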
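The moment conditions can be checked numerically for both kernels. The sketch below uses a simple midpoint rule; the helper names `integrate`, `moment` and `roughness` are illustrative and not part of the original slides.

```python
import numpy as np

def epanechnikov(x):
    """K(x) = 3/4 (1 - x^2) on [-1, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def quartic(x):
    """K(x) = 15/16 (1 - x^2)^2 on [-1, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 15.0 / 16.0 * (1.0 - x**2) ** 2, 0.0)

def integrate(f, m=200_000):
    """Midpoint-rule approximation of the integral of f over [-1, 1]."""
    edges = np.linspace(-1.0, 1.0, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return float(np.sum(f(mid)) * (2.0 / m))

def moment(kernel, k):
    """Approximate the k-th moment, the integral of x^k K(x)."""
    return integrate(lambda x: x**k * kernel(x))

def roughness(kernel):
    """Approximate the integral of K(x)^2."""
    return integrate(lambda x: kernel(x) ** 2)

for name, K in [("Epanechnikov", epanechnikov), ("quartic", quartic)]:
    print(name, [moment(K, k) for k in range(3)], roughness(K))
```

For comparison, the exact values are $\beta_2 = 1/5$ and $\int K^2 = 3/5$ for the Epanechnikov kernel, and $\beta_2 = 1/7$ and $\int K^2 = 5/7$ for the quartic kernel.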


SLIDE 6

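The kernel estimate of the hazard function is given as

$$\hat\lambda_{h,K}(x) = \frac{1}{h}\sum_{i=1}^{n} K\Big(\frac{x - X_{(i)}}{h}\Big)\frac{\delta_{(i)}}{n - i + 1}, \qquad (3)$$

where $X_{(1)} \le \dots \le X_{(n)}$ are the ordered observations and $\delta_{(i)}$ are the corresponding censoring indicators. The parameter $h$ is called the bandwidth or smoothing parameter.

Let us denote

$$V(K) = \int_{-1}^{1} K^2(x)\,dx, \qquad \beta_2 = \int_{-1}^{1} x^2 K(x)\,dx, \qquad \Lambda = \int_0^{\tau} \frac{\lambda(x)}{\bar L(x)}\,dx, \qquad D_2 = \int_0^{\tau} \big(\lambda^{(2)}(x)\big)^2\,dx.$$

The global quality of the estimate is measured by the Mean Integrated Square Error:

$$\mathrm{MISE}\big(\hat\lambda_{h,K}\big) = \int_0^{\tau} \mathrm{MSE}\big(\hat\lambda_{h,K}(x)\big)\,dx = \int_0^{\tau} E\big(\hat\lambda_{h,K}(x) - \lambda(x)\big)^2\,dx.$$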
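Formula (3) translates fairly directly into code. The following is a sketch rather than the authors' implementation; the function names and the vectorised layout are assumptions made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 3/4 (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_hazard(x, X, delta, h, kernel=epanechnikov):
    """Evaluate the kernel hazard estimate (3) at the points x.

    X     : observed times X_i = min(T_i, C_i)
    delta : indicators (1 = uncensored, 0 = censored)
    h     : bandwidth
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    order = np.argsort(X, kind="stable")          # sort to get X_(1) <= ... <= X_(n)
    Xs = np.asarray(X, dtype=float)[order]
    ds = np.asarray(delta, dtype=float)[order]    # indicators follow their observations
    n = Xs.size
    weights = ds / (n - np.arange(n))             # delta_(i) / (n - i + 1), i = 1..n
    u = (x[:, None] - Xs[None, :]) / h
    return (kernel(u) * weights[None, :]).sum(axis=1) / h
```

A call such as `kernel_hazard(np.linspace(0, 2, 201), X, delta, h=0.45)` would return the estimate on a grid; the bandwidth value here is only a placeholder.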


SLIDE 7

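The leading term $\overline{\mathrm{MISE}}(\hat\lambda_{h,K})$ of $\mathrm{MISE}(\hat\lambda_{h,K})$ takes the form

$$\overline{\mathrm{MISE}}\big(\hat\lambda_{h,K}\big) = \frac{1}{4} h^4 \beta_2^2 D_2 + \frac{V(K)\Lambda}{nh}.$$

The asymptotically optimal bandwidth minimizing $\overline{\mathrm{MISE}}(\hat\lambda_{h,K})$ with respect to $h$ is given by the formula

$$h_{opt} = n^{-1/5}\left(\frac{\Lambda V(K)}{\beta_2^2 D_2}\right)^{1/5}. \qquad (4)$$

The estimate of $h_{opt}$ will be denoted by $\hat h_{opt}$. See Horová & Zelinka (2006) for a method of evaluating the estimate $\hat h_{opt}$.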
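Formula (4) follows from the first-order condition for the leading term. The short derivation below is a worked step added for clarity and uses only the symbols defined on these slides.

```latex
\frac{d}{dh}\left[\frac{1}{4} h^4 \beta_2^2 D_2 + \frac{V(K)\Lambda}{nh}\right]
  = h^3 \beta_2^2 D_2 - \frac{V(K)\Lambda}{n h^2} = 0
\quad\Longrightarrow\quad
h_{opt}^5 = \frac{V(K)\Lambda}{n\,\beta_2^2 D_2}.
```

The second derivative, $3h^2\beta_2^2 D_2 + 2V(K)\Lambda/(nh^3)$, is positive for $h > 0$, so this stationary point is the minimum.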


SLIDE 8

Kernel estimate of the hazard function – example

[Figure: two example plots over the horizontal range 50–250, with vertical ranges 0.75–1 and 0.5–5 (× 10⁻³).]

SLIDE 9

Simulation of lifetimes

For a given hazard function $\lambda$ we have (see (2))

$$F(x) = 1 - \exp\Big(-\int_0^x \lambda(t)\,dt\Big).$$

The lifetimes $T_1, \dots, T_n$ can be obtained numerically from random variables $U_1, \dots, U_n$ uniformly distributed on the interval $[0, 1]$ by solving $F(T_i) = U_i$.

[Figure: graph of F, with a value U marked on the vertical axis and the corresponding lifetime T on the horizontal axis.]
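One way to carry out this numerical inversion is to tabulate F on a grid and interpolate. The sketch below makes several assumptions that the slides do not specify: the grid size, the trapezoidal rule for the cumulative hazard, and the convention that lifetimes beyond the grid are returned as infinity (such lifetimes are censored anyway whenever every censoring time is at most `t_max`).

```python
import numpy as np

def sample_lifetimes(hazard, t_max, n, rng, grid_size=20_001):
    """Draw n lifetimes with the given hazard by inverting F on [0, t_max].

    hazard : vectorised function returning lambda(x) for an array x
    Returns an array of lifetimes; values with U > F(t_max) are set to np.inf.
    """
    x = np.linspace(0.0, t_max, grid_size)
    lam = hazard(x)
    # Cumulative hazard on the grid (trapezoidal rule), then F = 1 - exp(-cum).
    steps = 0.5 * (lam[1:] + lam[:-1]) * np.diff(x)
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    F = 1.0 - np.exp(-cum)
    U = rng.uniform(0.0, 1.0, size=n)
    T = np.interp(U, F, x)        # approximate solution of F(T) = U
    T[U > F[-1]] = np.inf         # lifetime lies beyond the tabulated range
    return T

rng = np.random.default_rng(1)
T = sample_lifetimes(lambda x: x * (2.0 - x) ** 2, 2.0, 100, rng)
```

The example call uses the hazard of Simulation 1; the seed and sample size are arbitrary.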


SLIDE 10

Simulation of censoring times

Real situation: consider a clinical study of some disease. The study begins at time $t_0$ (we may suppose $t_0 = 0$). Patients enter the study at random times in the interval $[t_0, t_1]$; the beginning of treatment is given by a random variable $B$ with cumulative distribution function $H$. Enrollment stops at time $t_1$, but the study may continue until some time $t_2 \ge t_1$, when it ends.

The censoring time is $C = t_2 - B$.

For the survival function $\bar G$ we have $\bar G(x) = H(t_2 - x)$.

[Figure: cumulative distribution function of patient arrivals ($H$) and survival function of the censoring times ($\bar G = 1 - G$).]
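The construction C = t2 − B can be sketched as follows. The code assumes that B is uniform on [0, t1], which is the choice the slides make for all simulations; the function names are illustrative.

```python
import numpy as np

def sample_censoring_times(t1, t2, n, rng):
    """Censoring times C = t2 - B with entry times B uniform on [0, t1]."""
    B = rng.uniform(0.0, t1, size=n)
    return t2 - B

def censoring_survival(x, t1, t2):
    """G-bar(x) = H(t2 - x), where H is the uniform c.d.f. on [0, t1]."""
    x = np.asarray(x, dtype=float)
    return np.clip((t2 - x) / t1, 0.0, 1.0)

rng = np.random.default_rng(3)
C = sample_censoring_times(1.0, 2.0, 10_000, rng)
# Empirical P(C > 1.5) next to the theoretical value G-bar(1.5).
print(np.mean(C > 1.5), censoring_survival(1.5, 1.0, 2.0))
```

With uniform B, the censoring times fall in [t2 − t1, t2], so no observation is censored before time t2 − t1.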


SLIDE 11

Let us recall

$$h_{opt}^5 = \frac{V(K)\Lambda}{n\,\beta_2^2 D_2}$$

for

$$V(K) = \int_{-1}^{1} K^2(x)\,dx, \qquad \beta_2 = \int_{-1}^{1} x^2 K(x)\,dx, \qquad \Lambda = \int_0^{\tau} \frac{\lambda(x)}{\bar F(x)\,\bar G(x)}\,dx, \qquad D_2 = \int_0^{\tau} \big(\lambda^{(2)}(x)\big)^2\,dx.$$

Choice of $\tau$: naturally $\tau = t_2$ ⇒ problem with computing $\Lambda$, as $\bar G(t_2) = 0$.

Solution: for $\bar G$ let us take such $\lambda$ that

$$\bar F(t_2) > 0, \qquad \frac{\lambda(x)}{\bar G(x)} = O(1) \quad \text{for } x \to t_2.$$

As a result of this property we have $\lambda(t_2) = 0$, and for $\lambda \in C^2[0, \tau]$ also $\lambda'(t_2) = 0$, as $\lambda$ is non-negative.

In all simulations let the beginnings of treatment $B$ be uniformly distributed on $[0, t_1]$. Consequently, the censoring time $C$ is uniformly distributed on $[t_2 - t_1, t_2]$.


SLIDE 12

Simulation 1

$\lambda$: unimodal hazard function on $[0, t_2]$:

$$\lambda(x) = x(2 - x)^2, \qquad F(x) = 1 - \exp\Big(-\frac{x^2}{12}\big(3x^2 - 16x + 24\big)\Big).$$

Let $K$ be the Epanechnikov kernel and $n = 100$.

Case A: $t_1 = 1$, $t_2 = 2$, $h_{opt} = 0.4437$
Case B: $t_1 = 1.5$, $t_2 = 2$, $h_{opt} = 0.4721$
Case C: $t_1 = 2$, $t_2 = 2$, $h_{opt} = 0.4993$
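The optimal bandwidths for the three cases can be recomputed from the formula for h_opt with τ = t2 and censoring times uniform on [t2 − t1, t2]. The sketch below is an illustration under those assumptions: it uses a midpoint rule (which avoids evaluating the 0/0 expression at x = t2), the exact Epanechnikov constants V(K) = 3/5 and β2 = 1/5, and a hypothetical function name `h_opt_sim1`.

```python
import numpy as np

V_K = 3.0 / 5.0     # integral of K^2 for the Epanechnikov kernel
BETA2 = 1.0 / 5.0   # second moment of the Epanechnikov kernel

def h_opt_sim1(t1, n=100, t2=2.0, m=200_000):
    """Asymptotically optimal bandwidth for lambda(x) = x (2 - x)^2 on [0, t2]."""
    edges = np.linspace(0.0, t2, m + 1)
    x = 0.5 * (edges[:-1] + edges[1:])           # midpoints, never exactly t2
    dx = t2 / m
    lam = x * (2.0 - x) ** 2
    lam2 = 6.0 * x - 8.0                          # second derivative of lambda
    F_bar = np.exp(-x**2 * (3.0 * x**2 - 16.0 * x + 24.0) / 12.0)
    G_bar = np.clip((t2 - x) / t1, 0.0, 1.0)      # survival function of C
    Lam = np.sum(lam / (F_bar * G_bar)) * dx      # Lambda
    D2 = np.sum(lam2**2) * dx                     # D_2
    return (V_K * Lam / (n * BETA2**2 * D2)) ** 0.2

for case, t1 in [("A", 1.0), ("B", 1.5), ("C", 2.0)]:
    print(case, round(h_opt_sim1(t1), 4))
```

The printed values should land close to the bandwidths listed for Cases A, B and C.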


SLIDE 13

Estimate of λ for optimal bandwidth

[Figure: three panels, Cases A, B and C, each on the interval [0, 2]; λ – dashed line, estimate – solid line.]


SLIDE 14

Estimate of hopt for 200 repetitions

[Figure: estimates of $h_{opt}$ (axis label "Values") for $t_1 = 1.0$, $1.5$ and $2.0$. Dashed lines: optimal bandwidths.]


SLIDE 15

Simulation 2

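$\lambda$: unimodal hazard function on $[0, t_2]$:

$$\lambda(x) = \frac{1}{100}\Big(1 - \cos\frac{2\pi x}{t_2}\Big), \qquad F(x) = 1 - \exp\Big[\frac{1}{100}\Big(\frac{t_2}{2\pi}\sin\frac{2\pi x}{t_2} - x\Big)\Big].$$

Let $K$ be the Epanechnikov kernel and $n = 100$.

Case A: $t_1 = 100$, $t_2 = 200$, $h_{opt} = 43.703$
Case B: $t_1 = 150$, $t_2 = 200$, $h_{opt} = 47.122$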
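The closed form of F can be checked against (2) by integrating the hazard. This is a worked step added for clarity, not part of the original slides.

```latex
\int_0^x \lambda(t)\,dt
  = \frac{1}{100}\int_0^x \Big(1 - \cos\frac{2\pi t}{t_2}\Big)\,dt
  = \frac{1}{100}\Big(x - \frac{t_2}{2\pi}\sin\frac{2\pi x}{t_2}\Big)
\quad\Longrightarrow\quad
\bar F(x) = \exp\Big[\frac{1}{100}\Big(\frac{t_2}{2\pi}\sin\frac{2\pi x}{t_2} - x\Big)\Big].
```

In particular, for $t_2 = 200$ we get $\bar F(t_2) = e^{-2} > 0$ and $\lambda(t_2) = 0$.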


SLIDE 16

Estimate of λ for optimal bandwidth

[Figure: two panels, Cases A and B, each on the interval [0, 200]; λ – dashed line, estimate – solid line.]


SLIDE 17

Estimate of hopt for 200 repetitions

[Figure: estimates of $h_{opt}$ (axis label "Values") for $t_1 = 100$ and $150$. Dashed lines: optimal bandwidths.]


SLIDE 18

Simulation 3

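$\lambda$: bimodal hazard function on $[0, t_2]$:

$$\lambda(x) = \frac{1}{100}\Big(1 - \cos\frac{4\pi x}{t_2}\Big), \qquad F(x) = 1 - \exp\Big[\frac{1}{100}\Big(\frac{t_2}{4\pi}\sin\frac{4\pi x}{t_2} - x\Big)\Big].$$

Let $K$ be the Epanechnikov kernel and $n = 200$.

Case A: $t_1 = 100$, $t_2 = 200$, $h_{opt} = 23.443$
Case B: $t_1 = 150$, $t_2 = 200$, $h_{opt} = 25.255$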


SLIDE 19

Estimate of λ for optimal bandwidth

[Figure: two panels, Cases A and B, each on the interval [0, 200]; λ – dashed line, estimate – solid line.]


SLIDE 20

Estimate of hopt for 200 repetitions

[Figure: estimates of $h_{opt}$ (axis label "Values") for $t_1 = 100$ and $150$. Dashed lines: optimal bandwidths.]


SLIDE 21

Conclusion

The simulations indicate that the proposed method of generating randomly censored data for a given distribution of the censoring times C and a given hazard function λ is well suited to testing algorithms of survival analysis.

At the same time, the simulations show that the bandwidth selection method proposed in Horová & Zelinka (2006) gives worse results as the proportion of censored data grows, but the estimates of the optimal bandwidth remain usable.


SLIDE 22

References

Collett D.: Modelling Survival Data in Medical Research. Chapman & Hall/CRC: Boca Raton-London-New York-Washington, D.C., 2003.
Horová I., Zelinka J., Budíková M.: Estimates of Hazard Functions for Carcinoma Data Sets. Environmetrics, 17, 239–255, 2006.
Horová I., Zelinka J.: Kernel Estimates of Hazard Functions for Biomedical Data Sets. In Applied Biostatistics: Case Studies and Interdisciplinary Methods, Springer, 2006.
Horová I., Pospíšil Z., Zelinka J.: Semiparametric Estimation of Hazard Function for Cancer Patients. Sankhya, 69, 494–513, 2008.
Horová I., Pospíšil Z., Zelinka J.: Hazard Function for Cancer Patients and Cancer Cell Dynamics. Journal of Theoretical Biology, 258, 437–443, 2009.


SLIDE 23

References

Müller H.G., Wang J.L.: Nonparametric Analysis of Changes in Hazard Rates for Censored Survival Data: An Alternative to Change-Point Models. Biometrika, 77(2), 305–314, 1990.
Ramlau-Hansen H.: Smoothing Counting Process Intensities by Means of Kernel Functions. The Annals of Statistics, 11(2), 453–466, 1983.
Tanner M.A., Wong W.H.: The Estimation of the Hazard Function from Randomly Censored Data by the Kernel Method. The Annals of Statistics, 11(3), 989–993, 1983.
Uzunogullari U., Wang J.L.: A Comparison of Hazard Rate Estimators for Left Truncated and Right Censored Data. Biometrika, 79(2), 297–310, 1992.
Wand M.P., Jones M.C.: Kernel Smoothing. Chapman & Hall, London, 1995.
