Conditional density estimation in a censored single-index regression model

SLIDE 1

Conditional density estimation in a censored single-index regression model

Olivier Bouaziz¹ and Olivier Lopez²

¹ Laboratoire de Statistique Théorique et Appliquée; ² Crest-Ensai, Irmar, and Weierstrass Institute (Berlin)

International Workshop on Applied Probability, Compiègne, 10-07-08

O. Bouaziz and O. Lopez

Estimation in a SIM with censored data IWAP 10-07-08 1 / 24

SLIDE 2

Estimation procedure | Asymptotic results for $\hat\theta$ | Key ingredients of proof | Simulation study and analysis on real data

Introduction

Stanford Heart Transplant Data:

  • $Y_i$: response variable, the survival time of patient i;
  • $X_i$: covariate vector (age and squared age).

Censored data: for some patients, $Y_i$ is not observed. Possible causes:

  • administrative censoring;
  • the patient died of causes independent of the heart transplant;
  • ...

Regression models for these data: Miller and Halpern (1982), Wei et al. (1990), Stute et al. (2000), ...

SLIDE 3

Semiparametric model

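Goal: estimate the conditional density of Y given X = x, $f(y \mid x)$. A fully nonparametric estimator suffers from the "curse of dimensionality" as d grows, so we use a semiparametric model for dimension reduction.

SIM assumption: there exists $\theta_0 \in \Theta \subset \mathbb{R}^d$ such that
$$f(y \mid x) = f_{\theta_0}(y, x'\theta_0),$$
where $f_\theta(y, u)$ denotes the conditional density of Y given $X'\theta = u$, evaluated at $Y = y$.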
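To make the SIM assumption concrete, here is a toy conditional density in which Y depends on X only through the index $X'\theta_0$. The Gaussian form, the mean and variance functions, and the value of `THETA0` are illustrative assumptions, not part of the model on the slides.

```python
import numpy as np

# Illustrative index parameter (assumption, not taken from the slides)
THETA0 = np.array([1.0, 0.5])

def f_index(y, u):
    """Toy f_theta0(y, u): density of Y given X'theta0 = u.
    Assumed Gaussian with mean sin(u) and variance 1 + u**2."""
    mean, var = np.sin(u), 1.0 + u ** 2
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def f_cond(y, x, theta0=THETA0):
    """f(y | x) under the SIM assumption: x enters only through x'theta0."""
    return f_index(y, np.dot(x, theta0))

# Two different covariate vectors sharing the same index x'theta0 = 1.0
x_a = np.array([1.0, 0.0])
x_b = np.array([0.0, 2.0])
print(f_cond(0.3, x_a) == f_cond(0.3, x_b))  # True
```

Because both vectors have the same index, the model cannot distinguish them: the d-dimensional conditioning collapses to a one-dimensional one.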

SLIDE 4

Censored data

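Let $Y_1, \dots, Y_n$ be the (unobserved) responses and $C_1, \dots, C_n$ the censoring random variables. The observations are
$$Z_i = Y_i \wedge C_i, \qquad \delta_i = \mathbf{1}_{Y_i \le C_i}, \qquad X_i \in \mathcal{X} \subset \mathbb{R}^d, \qquad 1 \le i \le n.$$
Following Koul et al. (1981), Stute (1996), Stute (1999), Stute et al. (2000), Sánchez Sellero et al. (2005), ..., we assume, for $i = 1, \dots, n$:

  • $P(Y_i = C_i) = 0$;
  • $Y_i \perp\!\!\!\perp C_i$;
  • $P(Y_i \le C_i \mid X_i, Y_i) = P(Y_i \le C_i \mid Y_i)$.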

SLIDE 5

Outline

  1. Estimation procedure
  2. Asymptotic results for $\hat\theta$
  3. Key ingredients of proof
  4. Simulation study and analysis on real data

SLIDE 6

Estimation procedure

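Assume for now that $f_\theta$ is known, and define, for any function $J \ge 0$,
$$L(\theta, J) = E\left[\log f_\theta(Y, \theta'X)\, J(X)\right] = \int \log f_\theta(y, \theta'x)\, J(x)\, dF_{X,Y}(x, y),$$
where $F_{X,Y}(x, y) = P(X \le x, Y \le y)$. Then
$$\theta_0 = \arg\max_{\theta \in \Theta} L(\theta, J).$$
Two problems remain:

  • estimation of $F_{X,Y}(x, y)$, since Y is only partially observed;
  • estimation of $f_\theta$, which is unknown.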
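Why $\theta_0$ maximizes $L(\theta, J)$ can be seen from the Kullback-Leibler (Gibbs) inequality. A short derivation, assuming Y has a conditional density and that all integrals below are finite:

```latex
\begin{aligned}
L(\theta_0, J) - L(\theta, J)
  &= \int J(x) \left[ \int \log \frac{f_{\theta_0}(y, \theta_0' x)}{f_\theta(y, \theta' x)}\,
       f_{\theta_0}(y, \theta_0' x)\, dy \right] dF_X(x) \\
  &= \int J(x)\, \mathrm{KL}\!\left( f_{\theta_0}(\cdot, \theta_0' x) \,\middle\|\, f_\theta(\cdot, \theta' x) \right) dF_X(x)
  \;\ge\; 0 .
\end{aligned}
```

The first line uses the SIM assumption, $f(\cdot \mid x) = f_{\theta_0}(\cdot, \theta_0'x)$; the inequality holds because each $f_\theta(\cdot, \theta'x)$ is a density in y, a Kullback-Leibler divergence is nonnegative, and $J \ge 0$.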

SLIDE 8

Estimation of $F_{X,Y}$

Estimator of $F_{X,Y}$ proposed by Stute (1993):
$$\hat F(x, y) = \sum_{i=1}^n \delta_i W_{in} \mathbf{1}_{Z_i \le y,\, X_i \le x}, \qquad W_{in} = \frac{1}{n\,(1 - \hat G(Z_i-))},$$
where $\hat G$ is the Kaplan-Meier estimator of $G(\cdot) = P(C \le \cdot)$.
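A minimal sketch of Stute's weights and of $\hat F$, assuming no ties among the $Z_i$ (continuous data). The function names are hypothetical. Censoring is treated as the "event" when computing the Kaplan-Meier estimator of G, and the left limit $\hat G(Z_i-)$ only uses observations strictly smaller than $Z_i$.

```python
import numpy as np

def km_censoring_left_limit(z, delta):
    """Return 1 - G_hat(Z_i-) for each i, where G_hat is the Kaplan-Meier
    estimator of G(t) = P(C <= t); censored points (delta = 0) are the
    'events' for C. Assumes no ties among the z values."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(z)
    order = np.argsort(z)
    at_risk = n - np.arange(n)                      # n, n-1, ..., 1 along sorted z
    factors = 1.0 - (1 - delta[order]) / at_risk    # product-limit factors for C
    # Left limit at the k-th sorted point: product over strictly smaller z only
    left = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    out = np.empty(n)
    out[order] = left
    return out

def stute_weights(z, delta):
    """W_in = 1 / (n (1 - G_hat(Z_i-)))."""
    return 1.0 / (len(z) * km_censoring_left_limit(z, delta))

def F_hat(x, y, X, Z, delta):
    """Stute's estimator sum_i delta_i W_in 1{Z_i <= y, X_i <= x} (componentwise in x)."""
    Z = np.asarray(Z, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(Z), -1)
    mass = np.asarray(delta) * stute_weights(Z, delta)
    inside = (Z <= y) & np.all(X <= np.asarray(x), axis=1)
    return float(np.sum(mass * inside))

# Toy sample: z = (1, 2, 3) with the middle point censored
print(np.asarray([1, 0, 1]) * stute_weights([1.0, 2.0, 3.0], [1, 0, 1]))
```

On this toy sample the masses $\delta_i W_{in}$ are 1/3, 0 and 2/3: the mass of the censored point is redistributed to the larger uncensored one, and the masses coincide with the jumps of the Kaplan-Meier estimator of the distribution of Y.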

SLIDE 9

Estimation of $f_\theta$

We use a nonparametric kernel smoothing estimator. Let K be a kernel and h a bandwidth satisfying the usual assumptions. The estimator of $f_\theta$ is
$$\hat f^h_\theta(z, \theta'x) = \frac{\int K_h(\theta'x - \theta'u)\, K_h(z - y)\, d\hat F(u, y)}{\int K_h(\theta'x - \theta'u)\, d\hat F_X(u)},$$
where $K_h(\cdot) = h^{-1}K(\cdot/h)$ and $\hat F_X$ is the empirical distribution function of X.
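A direct transcription of this ratio, assuming a Gaussian kernel for K and that the masses $\delta_j W_{jn}$ of $\hat F$ have already been computed. The function names and argument layout are hypothetical. The numerator integrates against $\hat F$, which puts mass $\delta_j W_{jn}$ at $(X_j, Z_j)$; the denominator integrates against the empirical distribution of X, which puts mass 1/n at each $X_j$.

```python
import numpy as np

def gauss_kh(t, h):
    """K_h(t) = K(t / h) / h with a standard Gaussian kernel K (assumption)."""
    return np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def f_hat(z, x, theta, X, Z, w, h):
    """Kernel estimate of f_theta(z, theta'x).
    X: (n, d) covariates, Z: (n,) observed times, w: (n,) masses delta_j * W_jn."""
    u = np.dot(x, theta)                          # index of the evaluation point
    U = X @ theta                                 # indices theta'X_j of the sample
    ku = gauss_kh(u - U, h)
    num = np.sum(w * ku * gauss_kh(z - Z, h))     # integral against F_hat
    den = np.mean(ku)                             # integral against empirical F_X
    return num / den
```

Without censoring ($w_j = 1/n$), integrating the numerator over z returns the denominator, so $\hat f^h_\theta(\cdot, \theta'x)$ integrates to one, as a conditional density should.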

SLIDE 11

First estimator of θ

We use the following pseudo-likelihood:
$$L_n(\theta, \hat f^h_\theta, J) = \int \log \hat f^h_\theta(y, \theta'x)\, J(x)\, d\hat F_{X,Y}(x, y) = \sum_{i=1}^n \delta_i W_{in} \log \hat f^h_\theta(Z_i, \theta'X_i)\, J(X_i).$$
We derive the following estimator of θ:
$$\hat\theta(h) = \arg\max_{\theta \in \Theta} L_n(\theta, \hat f^h_\theta, J).$$

SLIDE 12

With an adaptive bandwidth $\hat h$, the estimator becomes
$$\hat\theta(\hat h) = \arg\max_{\theta \in \Theta} L_n(\theta, \hat f^{\hat h}_\theta, J).$$
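A minimal end-to-end sketch of the pseudo-likelihood and its maximization, under several assumptions that do not come from the slides: a Gaussian kernel, $J \equiv 1$, a fixed bandwidth h instead of $\hat h$, a two-dimensional covariate with the normalization $\theta = (1, t)'$, and a plain grid search over t. Terms with $\delta_i = 0$ vanish from the sum, so only uncensored observations are evaluated. All function names are hypothetical.

```python
import numpy as np

def gauss_kh(t, h):
    # K_h(t) = K(t/h)/h with a standard Gaussian kernel (assumption)
    return np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def stute_weights(z, delta):
    # W_in = 1 / (n (1 - G_hat(Z_i-))), Kaplan-Meier for censoring, no ties assumed
    n = len(z)
    order = np.argsort(z)
    factors = 1.0 - (1 - delta[order]) / (n - np.arange(n))
    left = np.empty(n)
    left[order] = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    return 1.0 / (n * left)

def pseudo_loglik(theta, X, Z, delta, h):
    """L_n(theta) = sum_i delta_i W_in log f_hat(Z_i, theta'X_i), with J = 1."""
    w = delta * stute_weights(Z, delta)          # masses of F_hat
    U = X @ theta
    ku = gauss_kh(U[:, None] - U[None, :], h)    # K_h(theta'X_i - theta'X_j)
    kz = gauss_kh(Z[:, None] - Z[None, :], h)    # K_h(Z_i - Z_j)
    f = (ku * kz) @ w / ku.mean(axis=1)          # f_hat at each (Z_i, theta'X_i)
    keep = delta == 1                            # delta_i = 0 terms vanish
    return np.sum(w[keep] * np.log(f[keep]))

def theta_hat_grid(X, Z, delta, h, grid):
    """Maximize over theta = (1, t)' for t in grid (normalization assumed)."""
    scores = [pseudo_loglik(np.array([1.0, t]), X, Z, delta, h) for t in grid]
    return grid[int(np.argmax(scores))]
```

For example, with uncensored data $Y = X_1 + 0.5 X_2 + 0.1\varepsilon$, calling `theta_hat_grid(X, Y, np.ones(n, dtype=int), 0.2, np.linspace(-1, 2, 61))` should return a value close to 0.5. A grid search is used only because it is transparent; with more covariates a numerical optimizer would be the natural replacement.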

SLIDE 17

Adaptive choice of τ

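The Kaplan-Meier estimator behaves poorly in the tail of the distribution. We therefore introduce a truncation bound τ and only keep observations smaller than τ.

SIM assumption: for any τ,
$$\mathcal{L}(Y \mid X, Y \le \tau) = \mathcal{L}(Y \mid X'\theta_0, Y \le \tau),$$
where $\mathcal{L}$ denotes the conditional law.

How can τ be chosen from the data? Asymptotic criterion:
$$E^2(\tau) := \lim_{n \to \infty} n\, E\left[\left\|\hat\theta^\tau(\hat h_\tau) - \theta_0\right\|^2\right].$$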
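The following sketch illustrates why truncation helps. Since $\hat G$ is nondecreasing, the weights $W_{in} = 1/(n(1-\hat G(Z_i-)))$ are nondecreasing in $Z_i$, so the largest observations carry the largest and most variable weights; truncating at τ simply sets their contribution to zero. The data-generating choices and the rule "τ = 90% empirical quantile" are illustrative assumptions only; the slides choose τ through the asymptotic criterion instead.

```python
import numpy as np

def stute_weights(z, delta):
    # W_in = 1 / (n (1 - G_hat(Z_i-))), Kaplan-Meier for censoring, no ties assumed
    n = len(z)
    order = np.argsort(z)
    factors = 1.0 - (1 - delta[order]) / (n - np.arange(n))
    left = np.empty(n)
    left[order] = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    return 1.0 / (n * left)

def truncated_masses(z, delta, tau):
    """Masses delta_i * 1{Z_i <= tau} * W_in used by a truncated criterion."""
    return delta * (z <= tau) * stute_weights(z, delta)

rng = np.random.default_rng(0)
y = rng.exponential(1.0, 200)
c = rng.exponential(1.0, 200)          # heavy censoring (illustrative assumption)
z, delta = np.minimum(y, c), (y <= c).astype(int)
w = stute_weights(z, delta)
tau = np.quantile(z, 0.9)              # illustrative choice, not the adaptive tau-hat
m = truncated_masses(z, delta, tau)
print(w[np.argsort(z)][-5:])           # weights of the five largest Z_i
```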

SLIDE 18

Adaptive choice of τ

In practice the criterion is estimated from the data, the expectation $E$ being replaced by an estimate $\hat E$:
$$\hat E^2(\tau) := \lim_{n \to \infty} n\, \hat E\left[\left\|\hat\theta^\tau(\hat h_\tau) - \theta_0\right\|^2\right].$$

SLIDE 19

Final estimator

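Our final estimator is obtained after estimating both h and τ:

  • adaptive bandwidth $\hat h$;
  • adaptive truncation bound $\hat\tau$.

Final estimator:
$$\hat\theta^{\hat\tau}(\hat h) = \arg\max_{\theta \in \Theta} L^{\hat\tau}_n\left(\theta, \hat f^{\hat h, \hat\tau}_\theta, J\right) = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \delta_i \mathbf{1}_{Z_i \le \hat\tau} W_{in} \log \hat f^{\hat h, \hat\tau}_\theta(Z_i, \theta'X_i)\, J(X_i),$$
and we set $\hat\theta := \hat\theta^{\hat\tau}(\hat h)$.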

SLIDE 21

Consistency and asymptotic normality

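Consistency:
$$\sup_{\theta, h, \tau} \left| L^\tau_n\left(\theta, \hat f^{h,\tau}_\theta, J\right) - L(\theta, J) \right| = o_P(1),$$
and consequently $\hat\theta \xrightarrow{P} \theta_0$.

Theorem: under appropriate assumptions,
$$L^\tau_n\left(\theta, \hat f^{h,\tau}_\theta, J\right) = L^\tau_n(\theta, f_\theta, J) + \text{negligible terms}.$$
As a consequence,
$$\sqrt{n}\,(\hat\theta - \theta_0) \Rightarrow \mathcal{N}(0, \Sigma_{\tau_{opt}}).$$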

SLIDE 23

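Convergence properties of $\hat f$

Using the results of Einmahl and Mason (2005), we obtain the following rates of convergence, uniform in h, τ and θ:

$$\sup_{x,y,h,\tau,\theta} h \left| \hat f^{h,\tau}_{\theta_0}(y, \theta'x) - f_{\theta_0}(y, \theta'x) \right| \mathbf{1}_{y \le \tau} J(x) = O_P\left( \frac{[\log n]^{1/2}}{n^{1/2}} \right),$$

$$\sup_{x,y,h,\tau,\theta} h^2 \left| \nabla_\theta \hat f^{h,\tau}_{\theta_0}(y, x) - \nabla_\theta f_{\theta_0}(y, x) \right| \mathbf{1}_{y \le \tau} J(x) = O_P\left( \frac{[\log n]^{1/2}}{n^{1/2}} \right),$$

$$\sup_{x,y,h,\tau,\theta} \left| \nabla^2_\theta \hat f^{h,\tau}_\theta(y, x) - \nabla^2_\theta f_\theta(y, x) \right| \mathbf{1}_{y \le \tau} J(x) = o_P(1).$$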
SLIDE 24

Regularity assumptions on the regression model

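Let $Y \in \mathcal{Y}$ and $M > 0$. We define
$$\mathcal{H}_1 = C^{1+\delta}(\theta_0'\mathcal{X} \times \mathcal{Y}, M), \qquad \mathcal{H}_2 = x\, C^{1+\delta}(\theta_0'\mathcal{X} \times \mathcal{Y}, M) + C^{1+\delta}(\theta_0'\mathcal{X} \times \mathcal{Y}, M),$$
where $C^{1+\delta}(\theta_0'\mathcal{X} \times \mathcal{Y}, M)$ is the class of functions bounded by M with δ-Hölder partial derivatives on $\theta_0'\mathcal{X} \times \mathcal{Y}$.

From van der Vaart and Wellner (1996):
$$\log N\left(\varepsilon, \mathcal{H}_1, \|\cdot\|_\infty\right) \le K \left(\frac{1}{\varepsilon}\right)^{2/(1+\delta)}.$$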

SLIDE 25

Regularity assumptions on the regression model

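$\mathcal{H}_1$ and $\mathcal{H}_2$ are Donsker classes.

Assumption: $f_{\theta_0} \in \mathcal{H}_1$ and $\nabla_\theta f_{\theta_0} \in \mathcal{H}_2$.

Proposition: then, for some M, $\hat f^{h,\tau}_{\theta_0} \in \mathcal{H}_1$ and $\nabla_\theta \hat f^{h,\tau}_{\theta_0} \in \mathcal{H}_2$ with probability tending to one.

Empirical process tools can then be used to prove asymptotic normality.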
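One standard way to see how the sup-norm entropy bound yields the Donsker property for $\mathcal{H}_1$ (the slides do not spell this out): a sup-norm ε-ball around f is contained in the bracket $[f - \varepsilon, f + \varepsilon]$, whose $L_2(P)$ size is 2ε, so the bracketing entropy integral is finite as soon as δ > 0.

```latex
\begin{aligned}
\log N_{[\,]}\bigl(2\varepsilon, \mathcal{H}_1, L_2(P)\bigr)
  &\le \log N\bigl(\varepsilon, \mathcal{H}_1, \|\cdot\|_\infty\bigr)
   \le K\,\varepsilon^{-2/(1+\delta)}, \\
J_{[\,]}\bigl(1, \mathcal{H}_1, L_2(P)\bigr)
  &= \int_0^1 \sqrt{\log N_{[\,]}\bigl(\varepsilon, \mathcal{H}_1, L_2(P)\bigr)}\, d\varepsilon
   \le \sqrt{K}\, 2^{1/(1+\delta)} \int_0^1 \varepsilon^{-1/(1+\delta)}\, d\varepsilon
   = \sqrt{K}\, 2^{1/(1+\delta)}\, \frac{1+\delta}{\delta} < \infty .
\end{aligned}
```

By the bracketing central limit theorem of van der Vaart and Wellner (1996), a finite bracketing integral implies that $\mathcal{H}_1$ is P-Donsker. The exponent $1/(1+\delta) < 1$ is exactly what makes the integral converge at 0.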

SLIDE 27

Simulation study

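Regression model:
$$Y_i = \theta_0' X_i + \varepsilon_i, \qquad i = 1, \dots, n,$$
with, for $i = 1, \dots, n$ and $j = 1, \dots, 4$:

  • $Y_i \in \mathbb{R}$;
  • $C_i \sim \mathrm{Exp}(\lambda)$, $\lambda = 0.3$ or $1$;
  • $\theta_0 = (1, 0.5, 1.4, 0.2)'$;
  • $X_i \in \mathbb{R}^4$, $X_{ij} \sim 0.2\,\mathcal{N}(0, 1) + 0.8\,\mathcal{N}(0.25, 2)$;
  • $\varepsilon_i \sim \mathcal{N}(0, |\theta_0' X_i|)$.

Competitor: $\hat\theta_{ADE}$, the average derivative estimator of Lu and Burke (2005).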
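A sketch of how this design could be simulated. The slides do not say whether the second parameter of $\mathcal{N}(\cdot, \cdot)$ is a variance or a standard deviation, nor whether λ is a rate; the code reads both normal parameters as variances and λ as a rate (mean 1/λ). These readings, and the function name, are assumptions.

```python
import numpy as np

THETA0 = np.array([1.0, 0.5, 1.4, 0.2])

def simulate_design(n, lam, rng):
    """Hypothetical generator for the simulation design; returns (X, Z, delta, Y)."""
    # Mixture 0.2 N(0,1) + 0.8 N(0.25, 2), second parameter read as a variance
    first = rng.random((n, 4)) < 0.2
    x = np.where(first,
                 rng.normal(0.0, 1.0, (n, 4)),
                 rng.normal(0.25, np.sqrt(2.0), (n, 4)))
    index = x @ THETA0
    # eps ~ N(0, |theta0'x|), again read as a variance
    y = index + rng.normal(0.0, 1.0, n) * np.sqrt(np.abs(index))
    # C ~ Exp(lambda), lambda read as a rate
    c = rng.exponential(1.0 / lam, n)
    z = np.minimum(y, c)
    delta = (y <= c).astype(int)
    return x, z, delta, y

x, z, delta, y = simulate_design(100, 0.3, np.random.default_rng(0))
print("censoring rate:", 1.0 - delta.mean())
```

Y is returned only for checking purposes; in the estimation procedure only (X, Z, δ) would be used. Since C is positive, every negative $Y_i$ is automatically uncensored.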

SLIDE 28

Simulation study

Estimator | Bias | Variance | MSE
$\hat\theta_{ADE}$ | (−0.112, −0.551, −0.155) | [0.14 0.005 −0.022; 0.005 0.075 0.016; −0.022 0.016 0.116] | 0.6714181
$\hat\theta_\infty$ | (0.057, 0.215, 0.048) | [0.033 0.012 0.001; 0.012 0.073 −0.004; 0.001 −0.004 0.027] | 0.1841227
$\hat\theta_\tau$ | (0.07, 0.221, 0.028) | [0.034 0.002 0.002; 0.002 0.074 0.002; 0.002 0.002 0.02] | 0.1825980

100 simulated samples of size n = 100, with 25% censored observations (variance matrices are given row by row, rows separated by semicolons). With $\hat\tau$, on average the 10 largest observations were removed.
SLIDE 29

Simulation study

Estimator | Bias | Variance | MSE
$\hat\theta_{ADE}$ | (−0.334, −0.743, −0.158) | [0.159 0.009 −0.014; 0.009 0.268 0.048; −0.014 0.048 0.165] | 1.280163
$\hat\theta_\infty$ | (0.127, 0.296, 0.096) | [0.11 −0.034 −0.01; −0.034 0.101 0.021; −0.01 0.021 0.059] | 0.3829797
$\hat\theta_\tau$ | (0.074, 0.176, 0.061) | [0.064 −0.005 −0.004; −0.005 0.051 0.014; −0.004 0.014 0.069] | 0.2239023

100 simulated samples of size n = 100, with 40% censored observations (variance matrices are given row by row, rows separated by semicolons). With $\hat\tau$, on average the 13 largest observations were removed.
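As a sanity check on the two tables, the reported MSE agrees, up to the rounding of the bias and variance entries, with the decomposition $E\|\hat\theta - \theta_0\|^2 = \|\text{bias}\|^2 + \operatorname{tr}(\text{Var})$. The short script below recomputes it from the published numbers; only the diagonal of each variance matrix is needed.

```python
import numpy as np

# (bias vector, diagonal of the variance matrix, reported MSE), copied from the tables
TABLES = {
    "25% censoring": {
        "ADE": ([-0.112, -0.551, -0.155], [0.14, 0.075, 0.116], 0.6714181),
        "inf": ([0.057, 0.215, 0.048], [0.033, 0.073, 0.027], 0.1841227),
        "tau": ([0.07, 0.221, 0.028], [0.034, 0.074, 0.02], 0.1825980),
    },
    "40% censoring": {
        "ADE": ([-0.334, -0.743, -0.158], [0.159, 0.268, 0.165], 1.280163),
        "inf": ([0.127, 0.296, 0.096], [0.11, 0.101, 0.059], 0.3829797),
        "tau": ([0.074, 0.176, 0.061], [0.064, 0.051, 0.069], 0.2239023),
    },
}

def mse_from_bias_var(bias, var_diag):
    """E||theta_hat - theta0||^2 = ||bias||^2 + trace(Var)."""
    b = np.asarray(bias, dtype=float)
    return float(b @ b + np.sum(var_diag))

for setting, rows in TABLES.items():
    for name, (bias, var_diag, reported) in rows.items():
        print(setting, name, round(mse_from_bias_var(bias, var_diag), 4), reported)
```

The decomposition also shows where the gains come from: with 40% censoring, truncation reduces both the squared bias and the trace of the variance of $\hat\theta_\tau$ relative to $\hat\theta_\infty$.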
SLIDE 30

Analysis on real data

Estimators of $\theta_{0,2}/\theta_{0,1}$:

Method | Estimate
Miller and Halpern | −0.01588785
Wei et al. | 63.75
Stute et al. | −0.01367034
$\hat\theta_\infty$ | −0.07351351
$\hat\theta_\tau$ | −0.0421508

n = 157, with 55 censored observations. With $\hat\tau$, the 67 largest observations were removed.

Thank you for your attention!

SLIDE 32

References

  • M. Delecroix, W. Härdle and M. Hristache. Efficient estimation in conditional single-index regression. J. Multivariate Anal., 86(2): 213-226, 2003.
  • U. Einmahl and D. Mason. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist., 33(3): 1380-1403, 2005.
  • X. Lu and M. D. Burke. Censored multiple regression by the method of average derivatives. J. Multivariate Anal., 95(1): 182-205, 2005.
  • R. Miller and J. Halpern. Regression with censored data. Biometrika, 69(3): 521-531, 1982.
  • C. Sánchez Sellero, W. González Manteiga and I. Van Keilegom. Uniform representation of product-limit integrals with applications. Scand. J. Statist., 32(4): 563-581, 2005.

SLIDE 33

References

  • W. Stute. Distributional convergence under random censorship when covariables are present. Scand. J. Statist., 23(4): 461-471, 1996.
  • W. Stute. Nonlinear censored regression. Statist. Sinica, 9(4): 1089-1102, 1999.
  • W. Stute, W. González Manteiga and C. Sánchez Sellero. Nonparametric model checks in censored regression. Comm. Statist. Theory Methods, 29(7): 1611-1629, 2000.
  • A. W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996.
  • L. J. Wei, Z. Ying and D. Y. Lin. Linear regression analysis of censored survival data based on rank tests. Biometrika, 77(4): 845-851, 1990.
