SEMIPARAMETRIC MODELS IN SURVIVAL ANALYSIS USING BETA PROCESS: - - PowerPoint PPT Presentation

semiparametric models in survival analysis using beta
SMART_READER_LITE
LIVE PREVIEW

SEMIPARAMETRIC MODELS IN SURVIVAL ANALYSIS USING BETA PROCESS: - - PowerPoint PPT Presentation

14th EYSM Debrecen, 24 August 2005 SEMIPARAMETRIC MODELS IN SURVIVAL ANALYSIS USING BETA PROCESS: PROPORTIONAL HAZARDS MODEL Pierpaolo De Blasi Universit` a Luigi Bocconi di Milano, Italy Joint work with: Nils Lid Hjort (University of Oslo)


slide-1
SLIDE 1

14th EYSM Debrecen, 24 August 2005

SEMIPARAMETRIC MODELS IN SURVIVAL ANALYSIS USING BETA PROCESS: PROPORTIONAL HAZARDS MODEL

Pierpaolo De Blasi Universit` a Luigi Bocconi di Milano, Italy Joint work with: Nils Lid Hjort (University of Oslo)

slide-2
SLIDE 2

SURVIVAL ANALYSIS Let T be a survival time, with cumulative df F(t) on [0, ∞). In survival analysis the basic quantity of interest is the hazard rate: α(t) = lim

h↓0 h−1P{T ≤ t + dt|T > t}

so that F(t) = 1 − exp{− t

0 α(s)ds}.

Nonparametric estimation deals with cumulative hazard: A(t) = t dF(s) F(s, ∞) Correspondence formula: F(t) = 1 −

[0,t]{1 − dA(s)}, which involves the product integral.

With continuous survival times A(t) = t

0 α(s)ds. Otherwise it has jumps in (0,1).

Hazard rate concept allows to introduce covariate information on individual survival distribution by hazards regression models:

  • Proportional hazards model
  • Additive hazards model

see Andersen, Borgan, Gill and Keiding (1993)

slide-3
SLIDE 3

COUNTING PROCESS FORMULATION Take (T1, δ1), . . . , (Tn, δn) be a sample of right-censored survival times: failure indicators δi = 0 when Ti > ti counting process Ni(t) = I{Ti ≤ t, δi = 1} at risk indicator process Yi(t) = I{Ti ≥ t} Multiplicative Intensity Model (Aalen 1978) Ni can be decomposed into the sum of its cumulative intensity process Λi(t) = t

0 Yi(s)dA(s) and a

martingale noise Mi: dNi(t) = Yi(t)dA(s) + dMi(t). Nelson-Aalen estimator of A and Kaplan-Meier estimator of F are defined as: ˆ A∗(t) =

  • ti≤t

δi Y·(ti) = t dN·(ds) Y·(s) , ˆ F ∗(t) = 1 −

  • [0,t]
  • 1 − dN·(ds)

Y·(s)

  • .

Probabilistic interpretation: dN·(t)|Ft− ∼ Binomial(Y·(t), dA(t)).

slide-4
SLIDE 4

BETA PROCESS Nonparametric Bayes analysis for F, or for A: need a nonparametric prior for F, in the space F of all survival distribution, or for A, in the space A of all cumulative hazard functions. Hjort (1990) introduced the so called beta process. Let c(t) be a positive, piecewise continuous function on [0, ∞) and A0 ∈ A. A ∼ Beta

  • c(·), A0(·)
  • has independent increments with

dA(s) ∼ beta

  • c(s)dA0(s), c(s){1 − dA0(s)}
  • .

It admits L´ evy representation: E exp{−θA(t)} = exp

1 (1 − eθs) t s−1(1 − s)c(x)−1c(x)dA0(x) ds

evy measure

  • and jumps in (0, 1). Also

EA(t) = A0(t) and V ar A(t) = t dA0(s) 1 + c(s). Special case: when c(s) = c exp{−A0(s)}, then F is a Dirichlet process.

slide-5
SLIDE 5

Posterior distribution: Conjugacy to right-censored data, A|data ∼ Beta

  • c(·) + Y·(·), ˆ

A

  • , with

ˆ A(t) = E{A(t)|data} = t c(s)dA0(s) + dN·(s) c(s) + Y·(s) . Also ˆ F(t) = E{F(t)|data} = 1 −

[0,t]{1 − d ˆ

A(s)}.

slide-6
SLIDE 6

PROPORTIONAL HAZARDS MODEL Consider lifetimes drawn from a non homogeneous population: data take the form (T1, δ1, X1), . . . , (Tn, δn, Xn), where X is a covariate vector. Assume that the measurements Xi influence the individual’s hazard rate according to αi(t) = r(Xt

iβ)α0(t).

(prop haz)

  • β is a vector of coefficients;
  • the relative risk function r(·) is a nonnegative function on [0, ∞), often assumed to be 1 at zero;
  • α0(·) is a baseline hazard rate, usually left unspecified =

⇒ semiparametric analysis. Cox regression model (Cox 1972) It postulates the exponential function for r(·): αi(t) = eXt

i βα0(t).

(oldcox) Model presented for absolutely continuous failure times. # How to extend to accomodate discrete components?

slide-7
SLIDE 7

BOUNDED RISK FACTOR Idea 1: dAi(s) = dA0(s) exp(Xt

iβ)

Take Beta process for A0 and prior for β. Does not work since we need jumps in (0, 1). Idea 2: dAi(s) = 1 − {1 − dA0(s)}exp(Xt

i β)

Take Beta process for A0 and prior for β. Work well, see Hjort(1990), Laud, Damien and Smith (1998), Kim and Lee (2003,2004). ”New Cox” model It postulates the logistic function for r(·): αi(s) = α0(s) eXt

i γ

1 + eXt

i γ

(newcox) Since the relative risk function r(y) = ey/(1 + ey) is bounded by 1, Idea 1 is fine: dAi(s) = dA0(s) eXt

i γ

1 + eXt

i γ .

Use Beta process for A0 and a prior π(γ) for γ. # How can we justify the boundness condition?

slide-8
SLIDE 8

Frailty models that yelds proportional hazards (Aalen and Hjort 2002, Hjort 2003) Individuals are exposed to an unobservable cumulative damage type process of the form Z(t) =

  • j≤M(t)

Gj where G1, G2, . . . are iid nonnegative random variables, interpreted as adding over time to the hazard level of the individual, while M(t) is a Poisson process with cumulative intensity Λ(t) = t

0 λ(s).

F(t, ∞|Ft−) = exp {−Z(t)} The unconditional survival function takes the form F(t, ∞) = E exp {−Z(t)} = exp

  • −(1 − Ee−G1)Λ(t)
  • The covariate Xi may enters in the specification of
  • the individual specific Poisson rate λi;
  • in the distribution of Gi,j.

With common Poisson rate λ(t), the individual hazard rate is given by αi(t) = (1 − Ee−Gi,1)λ(t). (1 − Ee−Gi,1) ≡ r(xt

iγ) =

⇒ 0 ≤ r(xt

iγ) ≤ 1.

The logistic function is a reasonable choice!

slide-9
SLIDE 9

αi(s) = α0(s) eXt

i γ

1 + eXt

i γ

(newcox) # Why New Cox should be preferred to the Cox model?

  • exponential form for r(·) is mostly tradition & convenience. The logistic function may be appro-

priate as well.

  • It may achieve a better fit on real data: space for goodness-of-fit:

– Nelson-Aalen plots of ˆ Zi = ˆ A∗

0(ti)r(Xt i ˆ

γ) for the two r functions. When the model is correct, the ˆ Zi’s are almost like right-censored life-times from unit exponential. – ”Very General Cox” model: αi(s) = α0(s) eXt

i γ

{1 + eXt

i γ}κ

(very gen) and check whether ˆ κ is closer to 0 or 1.

slide-10
SLIDE 10

REAL DATA EXAMPLE

Thickness: goodness−of−fit of NEW vs OLD

time hatZ_i 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 newcox

  • ldcox

Thickness: maximum of logLike(gamma,kappa)

kappa profile log_like 0.0 0.2 0.4 0.6 0.8 1.0 1.2 −273 −272 −271 −270 −269 −268 −267

Figure 1: Danish melanoma data, n = 205, covariate ”thickness”. (Left) Nelson-Aalen plots of ˆ Zi = ˆ A∗

0(ti)r(xt iˆ

γ). (Right) Very Gen: profile log-likelihood for the κ parameter: ˆ κ = 1.008.

slide-11
SLIDE 11

Thickness: estimated relative risk

x r(x’ hat_gamma) 5 10 15 1 2 3 4 5 6

  • ldcox

newcox very general

Figure 2: Danish melanoma data, n = 205, covariate ”thickness”. Estimated relative risk for oldcox and newcox versus very gen.

slide-12
SLIDE 12

FREQUENTIST ANALYSIS The estimator ˆ γ is the maximizer of the partial likelihood PLn(γ) =

n

  • i
  • r(xt

iγ)

n−1 n

j=1 Yj(ti)r(xt jγ)

  • S

(0)

n (s, γ)

δi , i.e. ∂/∂ˆ γ log PLn(ˆ γ) = 0, where log PLn(γ) =

n

  • i=1

∞ {log r(xt

iγ) − log S

(0)

n (s, γ)}dNi(s)

The Aalen-Breslow estimator for A0 is given by ˆ A∗

0(t) =

t n

i=1 dNi(s)

nS

(0)

n (s, γ) .

Assume that the regression model is correct; by using martingale theory (Prentice & Self 1983) we may prove the asimptotic normality of the MPL estimator of γ: n1/2(ˆ γ − γ)

d

→ Np(0, Σ(γ)−1) where Σ(γ) = ∞ v(s, γ)s

(0)(s, γ)α0(s)

slide-13
SLIDE 13

with v = s(2)/s(0) − eet and e = s(1)/s(0). These are limit functions of S

(1)

n (s, γ)

= ∂ ∂γ S

(0)

n (s, γ)

= n−1

n

  • i=1

Yi(s)r(xt

iγ)[1 − r(xt iγ)]xi

S

(2)

n (s, γ)

= n−1

n

  • i=1

Yi(s)r(xt

iγ)[1 − r(xt iγ)]2xixt i

slide-14
SLIDE 14

BAYES ANALYSIS L(A, γ) =

n

  • i=1
  • t
  • ext

1 + ext

iγ dA(t)

dNi(t) 1 − ext

1 + ext

iγ dA(t)

Yi(t)−dNi(t)

  • Beta process A ∼ Beta(c, A0)
  • Prior π(γ) for γ. For example the Jeffreys prior:

π(γ) ∝

  • 1

n

n

  • i=1

xixt

i

(1 + ext

iγ)2

  • 1/2

which leads to proper posterior. # How to do updating for (A, γ)?

  • 1. posterior of A given γ;
  • 2. posterior of γ: integrate the likelihood w.r. to A|(γ, data).

# How does the posterior distribution behaves?

  • 3. Bernstein-von Mises theorem for γ.
slide-15
SLIDE 15
  • 1. posterior of A given γ

Theoretical tools: L´ evy representations; L´ evy measures; Poisson random measures. Let A be a L´ evy process in the space A of cumulative hazard function (dA(t) ∈ [0, 1]). We may define a random measure on [0, 1] × [0, ∞): µ(ds, dz) = I{dA(z) ∈ [s, s + ds]}, characterized by its mean measure ν(ds, dz) = Eµ(ds, dz) on [0, 1] × [0, ∞). The connection with the L´ evy representation is that ν simply extends the L´ evy measure by incorpo- rating the fixed points of discontinuity. We may recover the mean by EA(t) = t 1 sν(ds, dz). A Beta(c, A0) process for A is characterized by ν(ds, dz) = c(z)(1 − s)c(z)−1

  • a(s, z)

ds dA0(z) when A0 is continuous. More general formulae are in order for noncontinuous A0 (fixed points of discontinuity with beta density).

slide-16
SLIDE 16

Let t1 < . . . < tqn denote the complete distinct observations. Before data A is characterized by ν(ds, dz) = a(s, z)ds dA0(z). A|(γ, data) is still a L´ evy process

  • Outside the data points z = {t1, . . . , tqn}

νn(ds, dz) =

  • n
  • i=1

{1 − r(xt

iγ)s}Yi(z)

  • Rn,z(s, γ)

a(s, z)ds dA0(z)

  • At data points A has random jumps: ∆A(z), at z = ti has density

fi(s, γ) = 1 ki(γ)gi(s, γ) fors ∈ [0, 1] with ki(γ) = 1

0 gi(s, γ)ds and

gi(s, γ) = r(xt

iγ)

  • n
  • j=1

{1 − r(xt

jγ)s}Yj(ti)−△Nj(ti)

  • R−

n,ti(s, γ)

c(ti)(1 − s)c(ti)−1.

slide-17
SLIDE 17
  • 2. posterior of γ

γ|data has a well-defined density: π(γ|data) ∝ π(γ) exp{−ρn(γ)}

qn

  • i=1

ki(γ) where ρn(γ) =

n

  • i=1

ti 1 r(xt

iγ)s n

  • j=i+1

[1 − r(xt

jγ)s] a(s, z)ds dA0(z)

# How to simulate from γ|data? Open question With samples of γ|data, we are interested to simulate from A|data by averaging A|(γ, data) with respect to π(γ|data). It would make possible to simulate from the full posterior and to obtain posterior distribution of quantities of interest, e.g. m(t0, x) = A−1

t0

  • log 2/r(xtγ)
  • ,

the median remaining life-length for a person with covariate vector x and who is alive at age t0.

slide-18
SLIDE 18

# Why not implement a Gibbs sampler using auxiliary variables? Almost done We may generate a sample path of A|(γ, data) by separating the fixed jumps from the continuous part and approximating the continuous part by a compound point process. (a) [∆|γ, data]: for large m, generate jump times t1 < . . . < tm from an appropriate probability measure on [0, τ], then ∆i = △A(ti), each with an infinitely divisible distribution, according to the L´ evy measure νn given above; (b) [U|γ, data]: independent jumps of A at event times t1, . . . , tqn with density proportional to gi(s, γ). It can be done by introducing auxiliary variables; (c) [γ|∆, U, data]: adapting the likelihood to the previous notation [γ|∆, U, data] ∝ π(γ) ×

qn

  • i=1

r(xt

iγ)Ui R− n,ti(Ui, γ) × N

  • i=1

Rn,ti(∆i, γ) then implement a standard Metropolis-Hasting sampler.

slide-19
SLIDE 19
  • 3. Bernstein-von Mises theorem for γ

Write π(γ|data) ∝ exp{Λn(γ)}π(γ) where Λn(γ) = −ρn(γ) + qn

i=1 log ki(γ)

Assume model is correct for some γ0. For a given compact neighborhood G of γ0 sup

γ∈G

|ρn(γ)| = OP(log n) Note that n−1(Λn(γ) − Λn(γ0)) = n−1

qn

  • i=1
  • log ki(γ) − log ki(γ0)
  • −n−1

ρn(γ) − ρn(γ0)

  • is asymptotically equivalent to

n−1(log PLn(γ) − log PLn(γ0)) It enables us to prove a consistency result in terms of asymptotic normality of the posterior: Bernstein-von Mises theorem n1/2(γ − ˆ γ)|data →d Np(0, Σ(γ0)−1) in prob.

slide-20
SLIDE 20

REFERENCE

  • 1. Aalen O. (1978). Nonpmic Inference for a Family of Counting Processes. Ann Statist 6: 701-726.
  • 2. Aalen O., Hjort N. L. (2002).

Frailty models that yeld proportional hazards. Statistics and Probability Letters 58: 335-342.

  • 3. Andersen P. K., Borgan O., Gill R.D., Keiding N. (1993). Statistical Models Based on Counting
  • Processes. Springer, New York.
  • 4. Cox D. R. (1972). Regression Models with Life Tables (with discussion). JRSS B 34 187-220.
  • 5. Hjort N. L. (1990). Nonparametric Bayes Estimators Based on Beta Processes in Models for Life

History Data. Ann. Statist. 18: 1259-1294.

  • 6. Hjort N. L. (2003). Topics in Nonparametric Bayesian Statistics, In: Higly Structured Stochastic
  • Systems. Eds Green P.J., Richardson S. and Hjort N.L. Oxford University Press.
  • 7. Kim, Y., Lee, J. (2003). Bayes Analysis of Proportional Hazard Models. Ann Stat 31: 493-511.
  • 8. Kim, Y., Lee, J. (2004). A Bernstein-von Mises Theorem of Semiparametric Bayesian Models

for Survival Data. Preprint.

  • 9. Laud P.W., Damien P., Smith A.F.M. (1998). Bayesian Nonparametric and Covariate Analysis of

Failure Time Data. Practical Nonpmic and Semipmic Bayesian Satatistics, Springer New York.

  • 10. Prentice, R.L. and Self S.G. (1983). Asymptotic DistributionTheory for Cox-Type Regression

Models with General Relative Risk Form. Ann. Statist. 11: 804-813.