conditional aic for mixed effects models
play

Conditional AIC For Mixed Effects Models Florin Vaida Division of - PowerPoint PPT Presentation

Conditional AIC For Mixed Effects Models Florin Vaida Division of Biostatistics and Bioinformatics, UCSD Vienna Workshop on Model Selection July 24, 2008 1 Model Selection for Mixed Effects Models Setting: longitudinal data Subjects i


  1. Conditional AIC For Mixed Effects Models Florin Vaida Division of Biostatistics and Bioinformatics, UCSD Vienna Workshop on Model Selection July 24, 2008 1

  2. Model Selection for Mixed Effects Models • Setting: longitudinal data • Subjects i = 1 . . . m ; Observations j = 1 . . . n i • Model: Linear, Generalized Linear, Nonlinear Mixed Effects Models (LME, GLME, NLME) ǫ ij ∼ N (0 , σ 2 ) • NLME: y ij = h ( η ij ) + ǫ ij , • GLME: E ( y ij ) = h ( η ij ), y ij ∼ exponential family η ij = x ⊤ ij β + z ⊤ • Linear predictor ij b i iid • b i ∼ N (0 , G ) random effects; 2

  3. Model Selection For Cluster Focus • Focus of inference/prediction on subjects (clusters) in the dataset, not on new subjects (clusters) • Model selection: – what covariates to include? – what random effects to include? – should I fit subject effects as fixed or random? • Vaida & Blanchard (2005): conditional Akaike information • For LME cAIC = − 2loglik + 2 K • conditional loglik and effective d.f. K • Extend such a formula to GLME and NLME? 3

  4. Example: PK of Cadralazine 5 10 15 20 25 5 10 15 20 25 5 3 4 8 2 − 3 − 4 − 5 − 6 log(Concentration/log(Dose) (1/L) − 7 − 8 − 9 6 1 9 10 7 − 3 − 4 − 5 − 6 − 7 − 8 − 9 5 10 15 20 25 5 10 15 20 25 5 10 15 20 25 Time since drug administration (hrs) 4

  5. y ij = β 0 i + β 1 i · t ij + e ij Two models: 1. Linear regression model : β 0 i , β 1 i = fixed parameters, subject-specific 2. Random effects model: β 0 i = β 0 + b 0 i , β 1 i = β 1 + b 1 i iid ( b 0 i , b 1 i ) ∼ N (0 , G ) Which model is better? 5

  6. RE Population: observed vs fitted RE Individual: observed vs fitted SS: observed vs fitted − 2 − 2 − 2 − 3 − 3 − 3 − 4 − 4 − 4 Observed Values Observed Values Observed Values − 5 − 5 − 5 − 6 − 6 − 6 − 7 − 7 − 7 − 8 − 8 − 8 − 9 − 9 − 9 − 9 − 8 − 7 − 6 − 5 − 4 − 3 − 2 − 9 − 8 − 7 − 6 − 5 − 4 − 3 − 2 − 9 − 8 − 7 − 6 − 5 − 4 − 3 − 2 Fitted Values Fitted Values Fitted Values RE Population: residauls vs fitted RE Individual: residauls vs fitted SS: residuals vs fitted 3 0.4 0.4 2 0.2 0.2 1 Residuals Residuals Residuals 0.0 0.0 0 − 1 − 0.2 − 0.2 − 2 − 0.4 − 0.4 − 3 − 9 − 8 − 7 − 6 − 5 − 4 − 3 − 2 − 9 − 8 − 7 − 6 − 5 − 4 − 3 − 2 − 9 − 8 − 7 − 6 − 5 − 4 − 3 − 2 Fitted Values Fitted Values Fitted Values Figure 1: 6

  7. Comparison of the two models Linear regression Random effects AIC -47.1 11.0 AIC: small is beautiful | ∆AIC | < 2 = similar fit > 10 = overwhelming evidence Why is AIC for linear regression model so much smaller? Something wrong with AIC? 7

  8. Effective Degrees of Freedom for LME • LME: b i ∼ N (0 , σ 2 D 0 ) y i = X i β + Z i b i + e i , Or in general b ∼ N (0 , G = σ 2 D ) , e ∼ N (0 , σ 2 I ) y = Xβ + Zb + e, • Inference: Maximum likelihood for β, σ 2 , D (or REML) • ˆ b i = arg sup p ( b i | y, ˆ β, ˆ G ) = arg sup p ( y i , b i | ˆ β, ˆ G ) = BLUP, or Empirical Bayes • Hodges and Sargent (2001): Effective degrees of freedom ρ = trace( H ) where ˆ y = Hy Count b i as a fraction of a parameter 8

  9. Effective Degrees of Freedom • Henderson’s “score” equations for ( β, b ): X ⊤ y X ⊤ Xβ + X ⊤ Zb = Z ⊤ y Z ⊤ Xβ + ( Z ⊤ Z + D − 1 ) b = • Corresponding to the formal linear model ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎝ y ⎝ X Z ⎝ β ⎝ e ⎠ = ⎠ + ⎠ ⎠ 0 0 − I b b • ˆ y = Hy , − 1 � ⎛ ⎞ ⎝ X ⊤ X X ⊤ Z � ⊤ � � H = X Z X Z ⎠ Z ⊤ X Z ⊤ Z + D − 1 9

  10. ρ = trace( H ) ⎧ ⎫ − 1 � ⎛ ⎞ ⎝ X ⊤ X X ⊤ Z ⎪ ⎪ � ⊤ ⎨ ⎬ � � = trace X Z X Z ⎠ Z ⊤ X Z ⊤ Z + D − 1 ⎪ ⎪ ⎩ ⎭ ⎧ ⎫ − 1 ⎛ ⎛ ⎞ ⎞ ⎝ X ⊤ X X ⊤ Z ⎝ X ⊤ X X ⊤ Z ⎪ ⎪ ⎨ ⎬ = trace ⎠ ⎠ Z ⊤ X Z ⊤ Z + D − 1 Z ⊤ X Z ⊤ Z ⎪ ⎪ ⎩ ⎭ Inspired by Hastie and Tibshirani (1990) for GAM. 10

  11. Counting parameters for NLME • Our idea: take NLME η ij = x ⊤ ij β + z ⊤ y ij = h ( η ij ) + ǫ ij , ij b i • Linearization: η ij ) + h ′ (ˆ h ( η ij ) ≈ h (ˆ η ij )( η ij − ˆ η ij ) { h ′ (ˆ η ij ) x ij } β + { h ′ (ˆ = η ij ) z ij } b i + const • w ij = s ⊤ ij β + t ⊤ ij b i + ǫ ij η ij ) + h ′ (ˆ • Where w ij = y ij − h (ˆ η ij )ˆ η ij s i = h ′ (ˆ t ij = h ′ (ˆ η ij ) x ij , η ij ) z ij • Compute ρ for the linearized mixed effects model in w ij • Call ρ effective d.f. for NLME 11

  12. Effective DF for GLME/NLME • Lu, Carlin and Hodges (2007), for GLME: • For a GLMM with linear predictor η ij = x ⊤ ij β + z ⊤ ij b i expand loglik η ij ) + 1 η ij ) + l ′ (ˆ 2 l ′′ (ˆ η ij ) 2 l ( η ij ) = l (ˆ η ij )( η ij − ˆ η ij )( η ij − ˆ − 1 ( u ij − η ij ) 2 + const , = 2 σ 2 ij σ 2 − 1 /E { l ′′ (ˆ η ij } = σ 2 / { h ′ ( η ∗ ij ) 2 } = ij ǫ ij ∼ N (0 , σ 2 • u ij ≈ η ij + ǫ ij , ij ) 12

  13. Formal linear model ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎝ u ⎝ X Z ⎝ β ⎝ e ⎠ = ⎠ + ⎠ ⎠ 0 0 − I b b Var( ǫ ) = σ 2 W − 1 W = diag[ { h ′ ( η ij ) } 2 ] ρ = trace( H ) ⎧ ⎫ − 1 � ⎛ ⎞ ⎝ X ⊤ WX X ⊤ WZ ⎪ ⎪ � ⊤ ⎨ ⎬ � � = trace W X Z X Z ⎠ Z ⊤ WX Z ⊤ WZ + D − 1 ⎪ ⎪ ⎩ ⎭ ⎧ ⎫ − 1 ⎛ ⎛ ⎞ ⎞ ⎝ X ⊤ WX X ⊤ WZ ⎝ X ⊤ WX X ⊤ WZ ⎪ ⎪ ⎨ ⎬ = trace ⎠ ⎠ Z ⊤ WX Z ⊤ WZ + D − 1 Z ⊤ WX Z ⊤ WZ ⎪ ⎪ ⎩ ⎭ ρ = effective degrees of freedom of GLME/NLME 13

  14. Effective DF for NLME • Result: The two definitions of ρ for NLME are equivalent. They are based on different linearizations: on the scale of y ij and on the scale of η ij , respectively. • For NLME/GLME ρ depends also on η ij through W = diag[ { h ′ ( η ij ) } 2 ]. ij , W ∗ and estimated ˆ • Relevant values: true ρ ∗ , using “true” η ∗ ρ , η ij , ˆ using ˆ W . • H , ρ correspond to the score equations for ( β, b ): X ⊤ Wy X ⊤ WXβ + X ⊤ WZb = Z ⊤ Wy Z ⊤ WXβ + ( Z ⊤ WZ + D − 1 ) b = which are the PQL equations of Breslow and Clayton (1993) for GLME. 14

  15. Model selection using Akaike information � � − 2 E f ( y ∗ ) log g ( y ∗ | ˆ AI = E f ( y ) θ ( y )) How good is model g ( ·| θ ) at predicting new data y ∗ from model f ( · ), based on the sample y from f ( · )? Akaike information is not about finding the “true model”. Estimator: AI ≈ AIC = − 2 log g ( y | ˆ θ ( y )) + 2 K K = d f = # parameters in the model Asymptotically, AIC ≈ unbiased for AI 15

  16. Conditional AIC • Assume truth f ( | b 0 ) and model g ( | θ, b ) GLMM/NLMM/LMM • f ( y | b 0 ) = conditional distribution, b 0 true value of random eff. • Definition: Conditional Akaike Info (V & B 2005) cAI = − 2 E f ( y,b 0 ) E f ( y ∗ | b 0 ) log g ( y ∗ | ˆ β ( y ) , ˆ b ( y )) where y ∗ iid with y , conditional on same b 0 • cAI is appropriate for comparing models at subject-specific level • E.g., when chosing between fixed and random subject effects 16

  17. Theorem: Conditional AIC for LME Assume that y ∼ LME and model class g contains the operational model f ; σ 2 , D are known. Then cAIC = − 2 log g ( y | ˆ β ( y ) , ˆ b ( y )) + 2 ρ is an unbiased estimator of the conditional Akaike information. • g ( y | ˆ β, ˆ b ) is the conditional distribution • ρ = effective d.f. • Unknown σ 2 , correction = ρ + 1 asymptotically, small sample correction available • No correction needed for unknown D asymptotically 17

  18. Back to Cadralazine data: Random effects model Linear regression model mAIC cAIC AIC Asymptotic 11 . 0 − 44 . 5 − 47 . 1 Finite-sample corrected 12 . 6 − 42 . 3 − 22 . 8 REML − − 43 . 7 − 40 . 6 18

  19. Theorem: Conditional AIC for GLMM/NLMM Assume that y ∼ GLMM or NLMM and σ 2 , D are known. Then, under regularity conditions cAIC = − 2 log g ( y | ˆ β, ˆ b ) + 2 ρ is an asymptotically unbiased estimator of the conditional Akaike information. • Regularity conditions include m/N → 0 as N → ∞ ; min | Z i | ≥ c 0 (Jiang, Jia and Chen, 2001). They ensure consistency of β, b i . • No results yet for unknown σ 2 ; correction = ρ + 1? 19

  20. Simulation study Bias of cAIC as an estimator of cAI = − 2 cond loglik + 2( ρ + 1) σ n i .50 .25 .125 24 1.7 0.5 0.0 12 3.0 1.0 0.4 6 4.9 3.0 1.9 3 9.5 9.7 8.7 10 clusters, n i observations each; One-compartment PK model Bias reduces with increasing n i and decreasing σ/ || D || . Bias includes effects of unknown D and model non-linearity 20

  21. Cadralazine Data 5 10 15 20 25 8 2 2 1.5 1 0.5 0 7 5 3 4 Cadralizine concentration (mg/L) 2 1.5 1 0.5 0 6 1 9 10 2 1.5 1 0.5 0 5 10 15 20 25 5 10 15 20 25 Time since drug administration (hrs) Mean random var cAIC ρ σ 2 exp { β 1 i + β 2 t ij } 10.23 − 473 . 88 β 1 i σ 2 exp { β 1 i + β 2 i t ij } − 473 . 88 β 1 i , β 2 i 10.23 σ 2 exp { β 1 i + β 2 i t ij } − 577 . 18 none 30 i 21

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend