

SLIDE 1

Conditional AIC For Mixed Effects Models

Florin Vaida

Division of Biostatistics and Bioinformatics, UCSD Vienna Workshop on Model Selection July 24, 2008

SLIDE 2

Model Selection for Mixed Effects Models

  • Setting: longitudinal data
  • Subjects i = 1, …, m; observations j = 1, …, n_i
  • Model: Linear, Generalized Linear, or Nonlinear Mixed Effects Models (LME, GLME, NLME)
  • NLME: y_ij = h(η_ij) + ε_ij,  ε_ij ∼ N(0, σ²)
  • GLME: E(y_ij) = h(η_ij),  y_ij ∼ exponential family
  • Linear predictor: η_ij = x_ij⊤β + z_ij⊤b_i
  • b_i ∼ iid N(0, G) random effects

SLIDE 3

Model Selection For Cluster Focus

  • Focus of inference/prediction is on the subjects (clusters) in the dataset, not on new subjects (clusters)
  • Model selection:
    – what covariates to include?
    – what random effects to include?
    – should I fit subject effects as fixed or random?

  • Vaida & Blanchard (2005): conditional Akaike information
  • For LME: cAIC = −2 loglik + 2K, with conditional loglik and effective d.f. K
  • Extend such a formula to GLME and NLME?

SLIDE 4

Example: PK of Cadralazine

[Figure: trellis plot of log(Concentration/Dose) (1/L) vs. time since drug administration (hrs), one panel per subject (subjects 1–10)]

SLIDE 5

y_ij = β_0i + β_1i · t_ij + e_ij

Two models:

  • 1. Linear regression model:
    β_0i, β_1i = fixed, subject-specific parameters
  • 2. Random effects model:
    β_0i = β_0 + b_0i,  β_1i = β_1 + b_1i,  (b_0i, b_1i) ∼ iid N(0, G)

Which model is better?

SLIDE 6

[Figure 1: observed vs. fitted values and residuals vs. fitted values for three fits — random effects model (population-level predictions), random effects model (individual-level predictions), and subject-specific (SS) linear regression]

SLIDE 7

Comparison of the two models

        Linear regression   Random effects
AIC     −47.1               11.0

AIC: small is beautiful.
  |∆AIC| < 2 ⇒ similar fit;  |∆AIC| > 10 ⇒ overwhelming evidence
Why is the AIC for the linear regression model so much smaller? Something wrong with AIC?

SLIDE 8

Effective Degrees of Freedom for LME

  • LME:

    y_i = X_iβ + Z_ib_i + e_i,  b_i ∼ N(0, σ²D₀)

    or, in general, y = Xβ + Zb + e,  b ∼ N(0, G = σ²D),  e ∼ N(0, σ²I)

  • Inference: maximum likelihood for β, σ², D (or REML)
  • b̂_i = arg sup p(b_i | y, β̂, Ĝ) = arg sup p(y_i, b_i | β̂, Ĝ) = BLUP, or Empirical Bayes
  • Hodges and Sargent (2001): effective degrees of freedom ρ = trace(H), where ŷ = Hy
  • Counts each b_i as a fraction of a parameter

SLIDE 9

Effective Degrees of Freedom

  • Henderson's "score" equations for (β, b):

    X⊤y = X⊤Xβ + X⊤Zb
    Z⊤y = Z⊤Xβ + (Z⊤Z + D⁻¹)b

  • Corresponding to the formal linear model

    ( y )   ( X   Z ) ( β )   ( e )
    ( 0 ) = ( 0  −I ) ( b ) + ( b )

  • ŷ = Hy, where

    H = [X Z] ( X⊤X   X⊤Z       )⁻¹ ( X⊤ )
              ( Z⊤X   Z⊤Z + D⁻¹ )    ( Z⊤ )

SLIDE 10

ρ = trace(H)
  = trace{ [X Z] ( X⊤X   X⊤Z       )⁻¹ ( X⊤ ) }
                 ( Z⊤X   Z⊤Z + D⁻¹ )    ( Z⊤ )
  = trace{ ( X⊤X   X⊤Z       )⁻¹ ( X⊤X   X⊤Z ) }
           ( Z⊤X   Z⊤Z + D⁻¹ )   ( Z⊤X   Z⊤Z )

Inspired by Hastie and Tibshirani (1990) for GAM.
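This trace is straightforward to compute numerically. A minimal numpy sketch (the function name and the random-intercept test design are illustrative, not from the slides); as D → 0 the trace tends to the number of fixed effects p, and as D → ∞ to rank([X Z]):

```python
import numpy as np

def effective_df(X, Z, D):
    """Effective degrees of freedom rho = trace(H) for the LME
    y = X beta + Z b + e,  b ~ N(0, sigma^2 D),  e ~ N(0, sigma^2 I),
    computed as trace{C^{-1} C0} with C the Henderson coefficient matrix."""
    XtX, XtZ, ZtZ = X.T @ X, X.T @ Z, Z.T @ Z
    C = np.block([[XtX, XtZ], [XtZ.T, ZtZ + np.linalg.inv(D)]])   # with D^{-1} penalty
    C0 = np.block([[XtX, XtZ], [XtZ.T, ZtZ]])                     # without the penalty
    return float(np.trace(np.linalg.solve(C, C0)))

# Illustrative random-intercept design: m = 5 subjects, 4 observations each
m, n = 5, 4
t = np.tile(np.arange(n, dtype=float), m)
X = np.column_stack([np.ones(m * n), t])     # fixed effects: intercept + slope (p = 2)
Z = np.kron(np.eye(m), np.ones((n, 1)))      # one random intercept per subject
print(effective_df(X, Z, np.eye(m)))         # strictly between p = 2 and p + m = 7
```

Each random intercept is counted as a fraction of a parameter, so ρ interpolates between the fixed-effects-only count and the full per-subject count.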

SLIDE 11

Counting parameters for NLME

  • Our idea: take the NLME

    y_ij = h(η_ij) + ε_ij,  η_ij = x_ij⊤β + z_ij⊤b_i

  • Linearization:

    h(η_ij) ≈ h(η̂_ij) + h′(η̂_ij)(η_ij − η̂_ij) = {h′(η̂_ij)x_ij}⊤β + {h′(η̂_ij)z_ij}⊤b_i + const

  • This gives the linear mixed model

    w_ij = s_ij⊤β + t_ij⊤b_i + ε_ij

  • where w_ij = y_ij − h(η̂_ij) + h′(η̂_ij)η̂_ij,  s_ij = h′(η̂_ij)x_ij,  t_ij = h′(η̂_ij)z_ij

  • Compute ρ for the linearized mixed effects model in wij
  • Call ρ effective d.f. for NLME

SLIDE 12

Effective DF for GLME/NLME

  • Lu, Carlin and Hodges (2007), for GLME:
  • For a GLMM with linear predictor η_ij = x_ij⊤β + z_ij⊤b_i, expand the loglik:

    l(η_ij) = l(η̂_ij) + l′(η̂_ij)(η_ij − η̂_ij) + ½ l″(η̂_ij)(η_ij − η̂_ij)²
            = −(1/2σ²_ij)(u_ij − η_ij)² + const,

    with σ²_ij = −1/E{l″(η̂_ij)} = σ²/{h′(η*_ij)²}

  • u_ij ≈ η_ij + ε_ij,  ε_ij ∼ N(0, σ²_ij)

SLIDE 13

Formal linear model

    ( u )   ( X   Z ) ( β )   ( ε )
    ( 0 ) = ( 0  −I ) ( b ) + ( b )

    Var(ε) = σ²W⁻¹,  W = diag[{h′(η_ij)}²]

ρ = trace(H)
  = trace{ [X Z] ( X⊤WX   X⊤WZ       )⁻¹ ( X⊤W ) }
                 ( Z⊤WX   Z⊤WZ + D⁻¹ )    ( Z⊤W )
  = trace{ ( X⊤WX   X⊤WZ       )⁻¹ ( X⊤WX   X⊤WZ ) }
           ( Z⊤WX   Z⊤WZ + D⁻¹ )   ( Z⊤WX   Z⊤WZ )

ρ = effective degrees of freedom of GLME/NLME

SLIDE 14

Effective DF for NLME

  • Result: the two definitions of ρ for NLME are equivalent. They are based on different linearizations: on the scale of y_ij and on the scale of η_ij, respectively.
  • For NLME/GLME, ρ also depends on η_ij through W = diag[{h′(η_ij)}²].
  • Relevant values: the true ρ*, using the "true" η*_ij, W*; and the estimated ρ̂, using η̂_ij, Ŵ.
  • H, ρ correspond to the score equations for (β, b):

    X⊤Wy = X⊤WXβ + X⊤WZb
    Z⊤Wy = Z⊤WXβ + (Z⊤WZ + D⁻¹)b

    which are the PQL equations of Breslow and Clayton (1993) for GLME.

SLIDE 15

Model selection using Akaike information

AI = −2 E_f(y) E_f(y*) log g(y* | θ̂(y))

  • How good is model g(·|θ) at predicting new data y* from model f(·), based on the sample y from f(·)?
  • Akaike information is not about finding the "true model".
  • Estimator: AI ≈ AIC = −2 log g(y | θ̂(y)) + 2K
  • K = d.f. = # parameters in the model
  • Asymptotically, AIC is ≈ unbiased for AI
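The unbiasedness claim can be checked by simulation in the simplest possible case. This sketch (all names illustrative, not from the slides) takes f = g = N(θ, 1) with MLE θ̂ = ȳ and K = 1, and compares the Monte Carlo average of AIC against the average of −2 log g(y*|θ̂(y)) on independent future data y*:

```python
import numpy as np

def aic_vs_ai(n=50, mu=1.0, reps=5000, seed=1):
    """Monte Carlo comparison of AIC with the Akaike information AI
    for the model y_i ~ N(theta, 1), theta estimated by the MLE ybar (K = 1)."""
    rng = np.random.default_rng(seed)

    def neg2loglik(y, theta):
        # -2 log density of N(theta, 1) summed over the sample
        return np.sum((y - theta) ** 2) + len(y) * np.log(2 * np.pi)

    aic = np.empty(reps)
    ai = np.empty(reps)
    for r in range(reps):
        y = rng.normal(mu, 1.0, n)        # training sample from f
        ystar = rng.normal(mu, 1.0, n)    # independent future data y*
        theta = y.mean()                  # MLE from y only
        aic[r] = neg2loglik(y, theta) + 2 * 1   # AIC = -2 loglik + 2K
        ai[r] = neg2loglik(ystar, theta)        # -2 log g(y* | theta(y))
    return aic.mean(), ai.mean()

mean_aic, mean_ai = aic_vs_ai()
print(round(mean_aic, 1), round(mean_ai, 1))  # the two averages nearly coincide
```

In this Gaussian case both expectations equal n + 1 + n log 2π exactly, so the two Monte Carlo averages should agree up to simulation noise.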

SLIDE 16

Conditional AIC

  • Assume truth f(·|b₀) and model g(·|θ, b): GLMM/NLMM/LMM
  • f(y|b₀) = conditional distribution, b₀ = true value of the random effects
  • Definition: conditional Akaike information (V & B 2005)

    cAI = −2 E_f(y,b₀) E_f(y*|b₀) log g(y* | β̂(y), b̂(y))

    where y* is iid with y, conditional on the same b₀
  • cAI is appropriate for comparing models at the subject-specific level
  • E.g., when choosing between fixed and random subject effects

SLIDE 17

Theorem: Conditional AIC for LME

Assume that y ∼ LME and the model class g contains the operational model f; σ², D are known. Then

cAIC = −2 log g(y|ˆ β(y),ˆ b(y)) + 2ρ

is an unbiased estimator of the conditional Akaike information.

  • g(y | β̂, b̂) is the conditional distribution
  • ρ = effective d.f.
  • Unknown σ²: correction = ρ + 1 asymptotically; a small-sample correction is available

  • No correction needed for unknown D asymptotically
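Putting the theorem into code: a sketch (function name hypothetical; β̂, b̂, σ², D assumed supplied, e.g. taken from a fitted LME) that evaluates the conditional Gaussian log-likelihood at the BLUPs and adds 2ρ:

```python
import numpy as np

def caic_lme(y, X, Z, beta_hat, b_hat, sigma2, D):
    """cAIC = -2 log g(y | beta_hat, b_hat) + 2*rho for an LME with known
    sigma^2 and D (scaled covariance of b): conditional Gaussian loglik
    at the estimates, plus twice the effective d.f. rho = trace(H)."""
    resid = y - X @ beta_hat - Z @ b_hat
    neg2ll = len(y) * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2
    XtX, XtZ, ZtZ = X.T @ X, X.T @ Z, Z.T @ Z
    C = np.block([[XtX, XtZ], [XtZ.T, ZtZ + np.linalg.inv(D)]])
    C0 = np.block([[XtX, XtZ], [XtZ.T, ZtZ]])
    rho = np.trace(np.linalg.solve(C, C0))
    return float(neg2ll + 2 * rho)
```

Note the penalty uses the effective d.f. ρ rather than the raw parameter count; with unknown σ² the slide's ρ + 1 correction would replace ρ.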

SLIDE 18

Back to Cadralazine data:

                          Random effects model        Linear regression model
                          mAIC        cAIC            AIC
Asymptotic                11.0        −44.5           −47.1
Finite-sample corrected   12.6        −42.3           −22.8
REML                      −           −43.7           −40.6

SLIDE 19

Theorem: Conditional AIC for GLMM/NLMM

Assume that y ∼ GLMM or NLMM and σ², D are known. Then, under regularity conditions,

cAIC = −2 log g(y | β̂, b̂) + 2ρ

is an asymptotically unbiased estimator of the conditional Akaike information.

  • Regularity conditions include m/N → 0 as N → ∞ and min |Z_i| ≥ c₀ (Jiang, Jia and Chen, 2001); they ensure consistency of β̂, b̂_i.

  • No results yet for unknown σ2; correction = ρ + 1?

SLIDE 20

Simulation study

Bias of cAIC as an estimator of cAI, where cAIC = −2 cond. loglik + 2(ρ + 1):

            σ = .50   σ = .25   σ = .125
n_i = 24      1.7       0.5       0.0
n_i = 12      3.0       1.0       0.4
n_i =  6      4.9       3.0       1.9
n_i =  3      9.5       9.7       8.7

10 clusters, n_i observations each; one-compartment PK model.
Bias decreases with increasing n_i and decreasing σ/‖D‖.
The bias includes the effects of unknown D and model non-linearity.

SLIDE 21

Cadralazine Data

[Figure: cadralazine concentration (mg/L) vs. time since drug administration (hrs), one panel per subject]

Mean                    Random       Var    ρ       cAIC
exp{β_1i + β_2 t_ij}    β_1i         σ²     10.23   −473.88
exp{β_1i + β_2i t_ij}   β_1i, β_2i   σ²     10.23   −473.88
exp{β_1i + β_2i t_ij}   none         σ²_i   30      −577.18
