

SLIDE 1

Conditional AIC For Mixed Effects Models

Florin Vaida

Division of Biostatistics and Bioinformatics, UCSD Vienna Workshop on Model Selection July 24, 2008

SLIDE 2

Model Selection for Mixed Effects Models

  • Setting: longitudinal data
  • Subjects i = 1, …, m; observations j = 1, …, n_i
  • Model: Linear, Generalized Linear, or Nonlinear Mixed Effects Models (LME, GLME, NLME)
  • NLME: y_ij = h(η_ij) + ε_ij,  ε_ij ∼ N(0, σ²)
  • GLME: E(y_ij) = h(η_ij),  y_ij ∼ exponential family
  • Linear predictor: η_ij = x_ij⊤β + z_ij⊤b_i
  • b_i ∼ iid N(0, G) random effects

SLIDE 3

Model Selection For Cluster Focus

  • Focus of inference/prediction is on the subjects (clusters) in the dataset, not on new subjects (clusters)
  • Model selection:
    – what covariates to include?
    – what random effects to include?
    – should I fit subject effects as fixed or random?

  • Vaida & Blanchard (2005): conditional Akaike information
  • For LME: cAIC = −2 loglik + 2K, with conditional loglik and effective d.f. K
  • Extend such a formula to GLME and NLME?

SLIDE 4

Example: PK of Cadralazine

[Figure: trellis plot of log(Concentration/Dose) (1/L) vs. time since drug administration (hrs), one panel per subject (subjects 1–10)]

SLIDE 5

y_ij = β_0i + β_1i · t_ij + e_ij

Two models:

  • 1. Linear regression model:
    β_0i, β_1i = fixed, subject-specific parameters
  • 2. Random effects model:
    β_0i = β_0 + b_0i,  β_1i = β_1 + b_1i,  (b_0i, b_1i) ∼ iid N(0, G)

Which model is better?

SLIDE 6

[Figure 1: observed vs. fitted values and residuals vs. fitted values for three fits — random effects model (population-level predictions), random effects model (individual-level predictions), and subject-specific (SS) linear regression]

SLIDE 7

Comparison of the two models

        Linear regression   Random effects
AIC     −47.1               11.0

AIC: small is beautiful.
  |∆AIC| < 2 ⇒ similar fit;  |∆AIC| > 10 ⇒ overwhelming evidence
Why is the AIC for the linear regression model so much smaller? Something wrong with AIC?

SLIDE 8

Effective Degrees of Freedom for LME

  • LME:

    y_i = X_iβ + Z_ib_i + e_i,  b_i ∼ N(0, σ²D₀)

    or, in general, y = Xβ + Zb + e,  b ∼ N(0, G = σ²D),  e ∼ N(0, σ²I)

  • Inference: maximum likelihood for β, σ², D (or REML)
  • b̂_i = arg sup p(b_i | y, β̂, Ĝ) = arg sup p(y_i, b_i | β̂, Ĝ) = BLUP, or Empirical Bayes
  • Hodges and Sargent (2001): effective degrees of freedom ρ = trace(H), where ŷ = Hy
  • Counts each b_i as a fraction of a parameter

SLIDE 9

Effective Degrees of Freedom

  • Henderson's "score" equations for (β, b):

    X⊤y = X⊤Xβ + X⊤Zb
    Z⊤y = Z⊤Xβ + (Z⊤Z + D⁻¹)b

  • Corresponding to the formal linear model

    ( y )   ( X   Z ) ( β )   ( e )
    ( 0 ) = ( 0  −I ) ( b ) + ( b )

  • ŷ = Hy, where

    H = [X Z] ( X⊤X   X⊤Z       )⁻¹ ( X⊤ )
              ( Z⊤X   Z⊤Z + D⁻¹ )    ( Z⊤ )

SLIDE 10

ρ = trace(H)
  = trace{ [X Z] ( X⊤X   X⊤Z       )⁻¹ ( X⊤ ) }
                 ( Z⊤X   Z⊤Z + D⁻¹ )    ( Z⊤ )
  = trace{ ( X⊤X   X⊤Z       )⁻¹ ( X⊤X   X⊤Z ) }
           ( Z⊤X   Z⊤Z + D⁻¹ )   ( Z⊤X   Z⊤Z )

Inspired by Hastie and Tibshirani (1990) for GAM.
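This trace is straightforward to compute numerically. A minimal numpy sketch (the function name and the random-intercept test design are illustrative, not from the slides); as D → 0 the trace tends to the number of fixed effects p, and as D → ∞ to rank([X Z]):

```python
import numpy as np

def effective_df(X, Z, D):
    """Effective degrees of freedom rho = trace(H) for the LME
    y = X beta + Z b + e,  b ~ N(0, sigma^2 D),  e ~ N(0, sigma^2 I),
    computed as trace{C^{-1} C0} with C the Henderson coefficient matrix."""
    XtX, XtZ, ZtZ = X.T @ X, X.T @ Z, Z.T @ Z
    C = np.block([[XtX, XtZ], [XtZ.T, ZtZ + np.linalg.inv(D)]])   # with D^{-1} penalty
    C0 = np.block([[XtX, XtZ], [XtZ.T, ZtZ]])                     # without the penalty
    return float(np.trace(np.linalg.solve(C, C0)))

# Illustrative random-intercept design: m = 5 subjects, 4 observations each
m, n = 5, 4
t = np.tile(np.arange(n, dtype=float), m)
X = np.column_stack([np.ones(m * n), t])     # fixed effects: intercept + slope (p = 2)
Z = np.kron(np.eye(m), np.ones((n, 1)))      # one random intercept per subject
print(effective_df(X, Z, np.eye(m)))         # strictly between p = 2 and p + m = 7
```

Each random intercept is counted as a fraction of a parameter, so ρ interpolates between the fixed-effects-only count and the full per-subject count.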

SLIDE 11

Counting parameters for NLME

  • Our idea: take the NLME

    y_ij = h(η_ij) + ε_ij,  η_ij = x_ij⊤β + z_ij⊤b_i

  • Linearization:

    h(η_ij) ≈ h(η̂_ij) + h′(η̂_ij)(η_ij − η̂_ij) = {h′(η̂_ij)x_ij}⊤β + {h′(η̂_ij)z_ij}⊤b_i + const

  • This gives the linear mixed model

    w_ij = s_ij⊤β + t_ij⊤b_i + ε_ij

  • where w_ij = y_ij − h(η̂_ij) + h′(η̂_ij)η̂_ij,  s_ij = h′(η̂_ij)x_ij,  t_ij = h′(η̂_ij)z_ij

  • Compute ρ for the linearized mixed effects model in wij
  • Call ρ effective d.f. for NLME

SLIDE 12

Effective DF for GLME/NLME

  • Lu, Carlin and Hodges (2007), for GLME:
  • For a GLMM with linear predictor η_ij = x_ij⊤β + z_ij⊤b_i, expand the loglik:

    l(η_ij) = l(η̂_ij) + l′(η̂_ij)(η_ij − η̂_ij) + ½ l″(η̂_ij)(η_ij − η̂_ij)²
            = −(1/2σ²_ij)(u_ij − η_ij)² + const,

    with σ²_ij = −1/E{l″(η̂_ij)} = σ²/{h′(η*_ij)²}

  • u_ij ≈ η_ij + ε_ij,  ε_ij ∼ N(0, σ²_ij)

SLIDE 13

Formal linear model

    ( u )   ( X   Z ) ( β )   ( ε )
    ( 0 ) = ( 0  −I ) ( b ) + ( b )

    Var(ε) = σ²W⁻¹,  W = diag[{h′(η_ij)}²]

ρ = trace(H)
  = trace{ [X Z] ( X⊤WX   X⊤WZ       )⁻¹ ( X⊤W ) }
                 ( Z⊤WX   Z⊤WZ + D⁻¹ )    ( Z⊤W )
  = trace{ ( X⊤WX   X⊤WZ       )⁻¹ ( X⊤WX   X⊤WZ ) }
           ( Z⊤WX   Z⊤WZ + D⁻¹ )   ( Z⊤WX   Z⊤WZ )

ρ = effective degrees of freedom of GLME/NLME

SLIDE 14

Effective DF for NLME

  • Result: the two definitions of ρ for NLME are equivalent. They are based on different linearizations: on the scale of y_ij and on the scale of η_ij, respectively.
  • For NLME/GLME, ρ also depends on η_ij through W = diag[{h′(η_ij)}²].
  • Relevant values: the true ρ*, using the "true" η*_ij, W*; and the estimated ρ̂, using η̂_ij, Ŵ.
  • H, ρ correspond to the score equations for (β, b):

    X⊤Wy = X⊤WXβ + X⊤WZb
    Z⊤Wy = Z⊤WXβ + (Z⊤WZ + D⁻¹)b

    which are the PQL equations of Breslow and Clayton (1993) for GLME.

SLIDE 15

Model selection using Akaike information

AI = −2 E_f(y) E_f(y*) log g(y* | θ̂(y))

  • How good is model g(·|θ) at predicting new data y* from model f(·), based on the sample y from f(·)?
  • Akaike information is not about finding the "true model".
  • Estimator: AI ≈ AIC = −2 log g(y | θ̂(y)) + 2K
  • K = d.f. = # parameters in the model
  • Asymptotically, AIC is ≈ unbiased for AI
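The unbiasedness claim can be checked by simulation in the simplest possible case. This sketch (all names illustrative, not from the slides) takes f = g = N(θ, 1) with MLE θ̂ = ȳ and K = 1, and compares the Monte Carlo average of AIC against the average of −2 log g(y*|θ̂(y)) on independent future data y*:

```python
import numpy as np

def aic_vs_ai(n=50, mu=1.0, reps=5000, seed=1):
    """Monte Carlo comparison of AIC with the Akaike information AI
    for the model y_i ~ N(theta, 1), theta estimated by the MLE ybar (K = 1)."""
    rng = np.random.default_rng(seed)

    def neg2loglik(y, theta):
        # -2 log density of N(theta, 1) summed over the sample
        return np.sum((y - theta) ** 2) + len(y) * np.log(2 * np.pi)

    aic = np.empty(reps)
    ai = np.empty(reps)
    for r in range(reps):
        y = rng.normal(mu, 1.0, n)        # training sample from f
        ystar = rng.normal(mu, 1.0, n)    # independent future data y*
        theta = y.mean()                  # MLE from y only
        aic[r] = neg2loglik(y, theta) + 2 * 1   # AIC = -2 loglik + 2K
        ai[r] = neg2loglik(ystar, theta)        # -2 log g(y* | theta(y))
    return aic.mean(), ai.mean()

mean_aic, mean_ai = aic_vs_ai()
print(round(mean_aic, 1), round(mean_ai, 1))  # the two averages nearly coincide
```

In this Gaussian case both expectations equal n + 1 + n log 2π exactly, so the two Monte Carlo averages should agree up to simulation noise.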

SLIDE 16

Conditional AIC

  • Assume truth f(·|b₀) and model g(·|θ, b): GLMM/NLMM/LMM
  • f(y|b₀) = conditional distribution, b₀ = true value of the random effects
  • Definition: conditional Akaike information (V & B 2005)

    cAI = −2 E_f(y,b₀) E_f(y*|b₀) log g(y* | β̂(y), b̂(y))

    where y* is iid with y, conditional on the same b₀
  • cAI is appropriate for comparing models at the subject-specific level
  • E.g., when choosing between fixed and random subject effects

SLIDE 17

Theorem: Conditional AIC for LME

Assume that y ∼ LME and the model class g contains the operational model f; σ², D are known. Then

cAIC = −2 log g(y|ˆ β(y),ˆ b(y)) + 2ρ

is an unbiased estimator of the conditional Akaike information.

  • g(y | β̂, b̂) is the conditional distribution
  • ρ = effective d.f.
  • Unknown σ²: correction = ρ + 1 asymptotically; a small-sample correction is available

  • No correction needed for unknown D asymptotically
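Putting the theorem into code: a sketch (function name hypothetical; β̂, b̂, σ², D assumed supplied, e.g. taken from a fitted LME) that evaluates the conditional Gaussian log-likelihood at the BLUPs and adds 2ρ:

```python
import numpy as np

def caic_lme(y, X, Z, beta_hat, b_hat, sigma2, D):
    """cAIC = -2 log g(y | beta_hat, b_hat) + 2*rho for an LME with known
    sigma^2 and D (scaled covariance of b): conditional Gaussian loglik
    at the estimates, plus twice the effective d.f. rho = trace(H)."""
    resid = y - X @ beta_hat - Z @ b_hat
    neg2ll = len(y) * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2
    XtX, XtZ, ZtZ = X.T @ X, X.T @ Z, Z.T @ Z
    C = np.block([[XtX, XtZ], [XtZ.T, ZtZ + np.linalg.inv(D)]])
    C0 = np.block([[XtX, XtZ], [XtZ.T, ZtZ]])
    rho = np.trace(np.linalg.solve(C, C0))
    return float(neg2ll + 2 * rho)
```

Note the penalty uses the effective d.f. ρ rather than the raw parameter count; with unknown σ² the slide's ρ + 1 correction would replace ρ.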

SLIDE 18

Back to Cadralazine data:

                          Random effects model        Linear regression model
                          mAIC        cAIC            AIC
Asymptotic                11.0        −44.5           −47.1
Finite-sample corrected   12.6        −42.3           −22.8
REML                      −           −43.7           −40.6

SLIDE 19

Theorem: Conditional AIC for GLMM/NLMM

Assume that y ∼ GLMM or NLMM and σ², D are known. Then, under regularity conditions,

cAIC = −2 log g(y | β̂, b̂) + 2ρ

is an asymptotically unbiased estimator of the conditional Akaike information.

  • Regularity conditions include m/N → 0 as N → ∞ and min |Z_i| ≥ c₀ (Jiang, Jia and Chen, 2001); they ensure consistency of β̂, b̂_i.

  • No results yet for unknown σ2; correction = ρ + 1?

SLIDE 20

Simulation study

Bias of cAIC as an estimator of cAI, where cAIC = −2 cond. loglik + 2(ρ + 1):

            σ = .50   σ = .25   σ = .125
n_i = 24      1.7       0.5       0.0
n_i = 12      3.0       1.0       0.4
n_i =  6      4.9       3.0       1.9
n_i =  3      9.5       9.7       8.7

10 clusters, n_i observations each; one-compartment PK model.
Bias decreases with increasing n_i and decreasing σ/‖D‖.
The bias includes the effects of unknown D and model non-linearity.

SLIDE 21

Cadralazine Data

[Figure: cadralazine concentration (mg/L) vs. time since drug administration (hrs), one panel per subject]

Mean                    Random       Var    ρ       cAIC
exp{β_1i + β_2 t_ij}    β_1i         σ²     10.23   −473.88
exp{β_1i + β_2i t_ij}   β_1i, β_2i   σ²     10.23   −473.88
exp{β_1i + β_2i t_ij}   none         σ²_i   30      −577.18
