Multiple Imputation for Multilevel Data - PowerPoint PPT Presentation



SLIDE 1

Multiple Imputation for Multilevel Data

Craig K. Enders & Brian T. Keller
University of California, Los Angeles, Department of Psychology
Work supported by IES award R305D150056

Overview

Session 1: Bayesian estimation for MLMs; Univariate multiple imputation; Joint model imputation
Session 2: Fully conditional specification; Incomplete categorical variables; Software examples

Why Imputation?

Dedicated multilevel programs restrict maximum likelihood estimation to incomplete outcomes. Multilevel SEM software is more flexible but typically imposes normality on incomplete predictors and may perform poorly in some cases. Imputation is flexible (e.g., mixtures of categorical and continuous variables are no problem).

Model Notation

Two-level model with observation i nested in cluster j (e.g., student i in school j)

SLIDE 2

Bayesian Estimation For Multilevel Models Bayesian Estimation And Imputation

Bayesian estimation (e.g., the Gibbs sampler) is the mathematical machinery for imputation. Each algorithmic cycle is a complete-data Bayes analysis followed by an imputation step. A multilevel model generates the imputations.

Analysis Example

Random intercept model with a level-1 predictor. Assume complete data; the estimation steps do not change with missing values.

Bayesian Paradigm

The Bayesian framework views parameters and level-2 residuals as random variables that follow a probability distribution (a posterior)

Likelihood Prior Posterior

SLIDE 3

Gibbs Sampler

An iterative Gibbs sampler algorithm estimates quantities one at a time, treating all other variables as known. Monte Carlo simulation "samples" parameter values from their conditional distributions. Repeating the sampling steps many times yields a distribution of each estimate.

Gibbs Sampler Steps For One Iteration

1. Estimate regression coefficients
2. Estimate level-2 random effects
3. Estimate within-cluster residual variance
4. Estimate level-2 covariance matrix
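To make the cycle concrete, here is a toy Gibbs sampler in Python for the mean and variance of a single normal variable. It is a deliberate simplification of the multilevel sampler (one outcome, no clustering, flat priors), but it has the same structure: each step draws one quantity from its conditional distribution while treating the others as known.

```python
import math
import random
import statistics

def gibbs_normal(y, iters=5000, seed=1):
    """Toy Gibbs sampler for the mean and variance of normal data.

    Illustrates the cycle structure only (flat prior on the mean,
    Jeffreys-style prior on the variance), not the multilevel model.
    """
    rng = random.Random(seed)
    n, ybar = len(y), statistics.fmean(y)
    sigma2 = statistics.pvariance(y)    # starting value
    draws = []
    for _ in range(iters):
        # Step 1: draw mu | sigma2, y ~ N(ybar, sigma2 / n)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n))
        # Step 2: draw sigma2 | mu, y ~ Inv-Gamma(n / 2, SS / 2)
        ss = sum((yi - mu) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(n / 2, 2.0 / ss)
        draws.append((mu, sigma2))
    return draws

draws = gibbs_normal([1.0, 2.0, 3.0, 4.0, 5.0])
```

Repeating the two draws many times traces out the joint posterior; the multilevel sampler simply has four such steps per iteration instead of two.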

Estimating Regression Coefficients

Regression coefficients are drawn from a multivariate normal distribution that conditions on the random effects, variances, and the data (some conditioning quantities come from the current iteration, others from the previous one).

Conditional Distribution

SLIDE 4

Estimating Level-2 Random Effects

Level-2 random effects are drawn from a multivariate normal distribution that conditions on the coefficients, variances, and the data.

Conditional Distribution

Estimating The Residual Variance

The within-cluster residual variance is drawn from an inverse Wishart distribution (an inverse gamma in the scalar case) that conditions on the previous coefficients, random effects, level-2 covariance matrix, and the data.

Conditional Distribution

SLIDE 5

Estimating Level-2 Covariance Matrix

The level-2 covariance matrix is sampled from an inverse Wishart distribution that conditions on the previous coefficients, random effects, residual variance, and the data. Iteration t is then complete; sampling starts anew at iteration t + 1.

Conditional Distribution

Univariate Multiple Imputation

Multilevel Imputation

Imputation uses a model with an incomplete variable regressed on complete variables. Bayesian estimation steps are applied to the filled-in data from the previous iteration. Model parameters and level-2 residuals define a distribution from which imputations are sampled.

SLIDE 6

Analysis And Imputation Models

Random intercept analysis model with an incomplete predictor. Random intercept imputation model with the incomplete predictor as the outcome.

Gibbs Sampler Steps

1. Estimate coefficients
2. Estimate random effects
3. Estimate residual variance
4. Estimate covariance matrix
5. Update imputations

Steps 1-4 are complete-data Bayes estimation; step 5 is the imputation step.

Distribution Of Missing Values

A normal distribution generates imputations, with center equal to the predicted value for observation i in cluster j and spread equal to the within-cluster residual variance.
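A single imputation then reduces to one normal draw per missing value. A minimal sketch, where the coefficients, random effect, covariate value, and residual variance below are hypothetical values chosen for illustration:

```python
import math
import random

def impute_value(beta0, beta1, u_j, x_ij, sigma2_e, rng):
    """Draw one imputation for observation i in cluster j.

    Center = the predicted value from the imputation model (fixed
    effects plus the cluster-j random intercept); spread = the
    within-cluster residual variance.
    """
    predicted = beta0 + beta1 * x_ij + u_j
    return rng.gauss(predicted, math.sqrt(sigma2_e))

# Hypothetical values: intercept 2.0, slope 0.5, cluster effect -0.3,
# observed covariate 1.2, within-cluster residual variance 0.8
rng = random.Random(90291)
imp = impute_value(2.0, 0.5, -0.3, 1.2, 0.8, rng)
```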

Random Intercept Imputation Model

[Figure: X (incomplete) plotted against Y (complete) for Clusters 1-3]

SLIDE 7


Burn-In Period

Burn-in interval (e.g., 2000 iterations): iterate, estimating parameters and updating imputations at each step; save data set 1 at the end of the interval.

Thinning Interval

Thinning interval (e.g., 2000 iterations): continue iterating, estimating parameters and updating imputations; save data set 2 at the end of the interval.

SLIDE 8

Repeat Until Finished …

Continue iterating through additional thinning intervals (e.g., 2000 iterations each), saving a data set at the end of each, until data set 20 is saved.

Analysis And Pooling

The analysis model is fit to each data set, and the arithmetic average of the M estimates is the multiple imputation point estimate. Pooling assumes a normal sampling distribution.

Pooling Standard Errors

The total variance combines the average sampling variance (within-imputation) with the variance across imputations (between-imputation, inflated by 1 + 1/M); the pooled standard error is the square root of the total.
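These pooling rules (Rubin's rules) are a few lines of code. A sketch with made-up estimates and squared standard errors from M = 3 hypothetical imputed data sets:

```python
import math
from statistics import fmean, variance

def pool(estimates, variances):
    """Pool point estimates and standard errors across imputed data sets.

    estimates: one point estimate per imputed data set
    variances: the squared standard error from each data set
    """
    m = len(estimates)
    point = fmean(estimates)            # MI point estimate
    w = fmean(variances)                # average sampling variance
    b = variance(estimates)             # variance across imputations
    total = w + (1 + 1 / m) * b         # total sampling variance
    return point, math.sqrt(total)

# Hypothetical slope estimates from M = 3 imputed data sets
est, se = pool([0.40, 0.44, 0.42], [0.010, 0.012, 0.011])
```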

Multivariate Missing Data

Joint model imputation uses multivariate regression to impute the set of missing variables. Fully conditional specification imputes variables one at a time in a sequence. Both are multilevel extensions of major single-level imputation frameworks.

SLIDE 9

Multivariate Imputation With The Joint Modeling Framework

Joint Model Imputation

Two forms:
1) Multivariate regression model with incomplete variables regressed on complete variables
2) Empty model treating all variables as outcomes
Available in Mplus, MLwiN, and R packages (e.g., jomo, pan, mlmmm)

Random Intercept Analysis Model

Two-level random intercept analysis with continuous level-1 and level-2 predictors. All variables have missing data.

Imputation Model

SLIDE 10

Covariance Structure

[Equations: level-1 and level-2 covariance matrices]

Imputation Step

Compatibility Of Imputation And Analysis

The imputation model is more flexible than the analysis model because it allows the level-1 and level-2 covariance matrices to freely vary. The analysis model assumes a common slope. Imputations are appropriate for random intercept analyses that partition relations into within- and between-cluster parts.

Compatible Analysis Models

Contextual effects analyses Multilevel SEM

SLIDE 11

R Package jomo

# load packages
library(jomo)

# read raw data
dat <- read.table("~/desktop/examples/ridata.csv", sep = ",")
names(dat) <- c("cluster", "av1", "av2", "y", "x", "w")
dat[dat == 999] <- NA

# jomo imputation
set.seed(90291)
dat$icept <- 1
l1miss <- c("y", "x")
l2miss <- c("w")
l1complete <- c("icept")
l2complete <- c("icept")
impdata <- jomo(dat[l1miss], Y2 = dat[l2miss], X = dat[l1complete],
                X2 = dat[l2complete], clus = dat$cluster, nburn = 2000,
                nbetween = 2000, nimp = 20, meth = "common")

Mplus

data: file = ridata.csv;
variable: names = cluster av1 av2 y x w;
          usevariables = av1 av2 y x w;
          missing = all(999);
analysis: type = basic;
          bseed = 90291;
data imputation: impute = y x w;
                 ndatasets = 20;
                 save = imp*.dat;
                 thin = 1000;
output: tech8;

Simulation Study

Random intercept model with 1000 replications. ICC = .25, medium effect sizes. 30 clusters with 5 or 30 observations per cluster (i.e., N = 150 and 900). 15% MAR missing data on all analysis variables. 20 imputations with R package jomo.

[Figure: percentage bias for complete data vs. joint model imputation (intercept, L1 slope, L2 slope, intercept var., residual var.) for J = 30, nj = 5 and J = 30, nj = 30]

SLIDE 12

Random Slope Analysis Model

Two-level random slope analysis with continuous level-1 and level-2 predictors. All variables have missing data.

Joint Model Limitations

Within-cluster covariances must preserve level-1 relations, including the random coefficients. The classic formulation of the joint model assumes a common covariance matrix at level-1, so imputation ignores random slope variation.

Covariance Structure Revisited

[Equations: level-1 and level-2 covariance matrices]

Simulation Study

Random slope model with 1000 replications. ICC = .25, medium effect sizes. 30 clusters with 5 or 30 observations per cluster (i.e., N = 150 and 900). 15% MAR missing data on all analysis variables. 20 imputations with R package jomo.

SLIDE 13
[Figure: percentage bias for complete data vs. joint model imputation (intercept, L1 slope, L2 slope, intercept var., residual var., covariance, slope var.) for J = 30, nj = 5 and J = 30, nj = 30]

Brief Maximum Likelihood Detour

Mplus allows incomplete random slope predictors. This requires numerical integration and many latent variable products, and it often yields severe bias.

[Figure: percentage bias for maximum likelihood estimation with J = 30, nj = 30 (intercept, L1 slope, L2 slope, intercept var., residual var., covariance, slope var.)]

Joint Modeling With Random Level-1 Covariance Matrices

Yucel (2011) extended the joint model to incorporate random level-1 covariance matrices. Available in the R package jomo. Currently limited to 2-level models.

Covariance Structure

[Equations: level-1 and level-2 covariance matrices]

SLIDE 14

Limitation Of Random Covariance Matrices

The between-cluster covariance matrix preserves random intercept variation, while the within-cluster matrices preserve random slopes. Elements in the analysis model depend on orthogonal sources of variation. Imputation assumes no correlation between the random intercepts and slopes.

Covariance Structure

[Equations: level-1 and level-2 covariance matrices; the level-2 matrix captures intercept variation, the level-1 matrices capture slope variation]

R Package jomo

# load packages
library(jomo)

# read raw data
dat <- read.table("~/desktop/examples/ridata.csv", sep = ",")
names(dat) <- c("cluster", "av1", "av2", "y", "x", "w")
dat[dat == 999] <- NA

# jomo imputation
set.seed(90291)
dat$icept <- 1
l1miss <- c("y", "x")
l2miss <- c("w")
l1complete <- c("icept")
l2complete <- c("icept")
impdata <- jomo(dat[l1miss], Y2 = dat[l2miss], X = dat[l1complete],
                X2 = dat[l2complete], clus = dat$cluster, nburn = 2000,
                nbetween = 2000, nimp = 20, meth = "random")

Simulation Study

Random slope model with 1000 replications. ICC = .25, medium effect sizes. 30 clusters with 5 or 30 observations per cluster (i.e., N = 150 and 900). 15% MAR missing data on all analysis variables. 20 imputations with R package jomo.

SLIDE 15
[Figure: percentage bias for complete data vs. joint model imputation (intercept, L1 slope, L2 slope, intercept var., residual var., covariance, slope var.) for J = 30, nj = 5 and J = 30, nj = 30]

Multivariate Imputation With Fully Conditional Specification

Fully Conditional Specification

Variable-by-variable imputation. Uses a series of univariate regression models with an incomplete variable regressed on complete and previously imputed variables. Available in the R package mice (2-level models with continuous variables) and the Blimp application for MacOS, Windows, and Linux.

Random Intercept Analysis Model

Two-level random intercept analysis with continuous level-1 and level-2 predictors. All variables have missing data.

SLIDE 16

Overview Of Algorithmic Steps

Each incomplete variable has an imputation model tailored to match features of the analysis. A single iteration consists of estimation and imputation sequences for each missing variable. The imputed variable from one sequence serves as a predictor variable in all other sequences.

Algorithmic Steps

Burn-in or thinning interval (e.g., 2000 iterations): each iteration estimates the model for Y1 and updates the Y1 imputations, then does the same for Y2 and Y3; a data set is saved at the end of each interval.
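The variable-by-variable cycle can be sketched in a toy form. The sketch below substitutes single-level linear regressions for the multilevel imputation models and uses invented two-variable data; only the algorithmic skeleton (estimate a model on the filled-in data, redraw that variable's imputations, move to the next variable) mirrors the slides.

```python
import math
import random

def fcs_cycle(data, miss, rng, iters=200):
    """Toy fully conditional specification for two variables.

    data: {"y": [...], "x": [...]} with None marking missing entries
    miss: {"y": [indices], "x": [indices]} of the missing positions
    """
    # start from mean imputation so every sequence sees complete data
    for var in ("y", "x"):
        observed = [v for v in data[var] if v is not None]
        for i in miss[var]:
            data[var][i] = sum(observed) / len(observed)
    for _ in range(iters):
        for var, other in (("y", "x"), ("x", "y")):
            t, p = data[var], data[other]
            n = len(t)
            mt, mp = sum(t) / n, sum(p) / n
            # estimation step: least-squares fit to the filled-in data
            slope = (sum((a - mp) * (b - mt) for a, b in zip(p, t))
                     / sum((a - mp) ** 2 for a in p))
            icept = mt - slope * mp
            sd = math.sqrt(sum((b - icept - slope * a) ** 2
                               for a, b in zip(p, t)) / (n - 2))
            # imputation step: draw from the conditional distribution
            for i in miss[var]:
                data[var][i] = rng.gauss(icept + slope * p[i], sd)
    return data

rng = random.Random(90291)
data = fcs_cycle(
    {"y": [1.0, 2.0, 3.0, 4.0, None, 6.0],
     "x": [1.1, 1.9, 3.2, 3.9, 5.0, None]},
    {"y": [4], "x": [5]},
    rng,
)
```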

Estimation And Imputation For y

Imputation model: Bayesian estimation and imputation sequence:

Imputation Model For y

Level-1 Level-2

SLIDE 17

Imputation Step For y

Estimation And Imputation For x

Imputation model: Bayesian estimation and imputation sequence:

Imputation Model For x

Level-1 Level-2

Imputation Step For x

SLIDE 18

Estimation And Imputation For w

Imputation model: Bayesian estimation and imputation sequence:

Imputation Model For w

Level-1 Level-2

Imputation Step For w

Blimp Syntax

DATA: ~/desktop/examples/ridata.csv;
VARIABLES: cluster av1 av2 y x w;
MISSING: 999;
MODEL: cluster ~ y x w;
NIMPS: 20;
THIN: 2000;
BURN: 2000;
SEED: 90291;
OUTFILE: ~/desktop/examples/imps.csv;
OPTIONS: stacked noclmeans prior1;

SLIDE 19

Simulation Study

Random intercept model with 1000 replications. ICC = .25, medium effect sizes. 30 clusters with 5 or 30 observations per cluster (i.e., N = 150 and 900). 15% MAR missing data on all analysis variables. 20 imputations with the Blimp application.

[Figure: percentage bias for complete data, joint model imputation, and FCS (intercept, L1 slope, L2 slope, intercept var., residual var.) for J = 30, nj = 5 and J = 30, nj = 30]

Limitations

The classic formulation of fully conditional specification assumes equal within- and between-cluster regression slopes (i.e., equality constraints on the level-1 and level-2 model-implied covariance matrices). It is therefore not ideal for models that partition relations.

Revisiting Models That Partition Variability

Contextual effects analyses Multilevel SEM

SLIDE 20

Partitioned Imputation Model For y

Level-1 Level-2

Partitioned Imputation Model For x

Level-1 Level-2

Imputation Model For w

Level-1 Level-2

Blimp Syntax

DATA: ~/desktop/examples/ridata.csv;
VARIABLES: cluster av1 av2 y x w;
MISSING: 999;
MODEL: cluster ~ y x w;
NIMPS: 20;
THIN: 2000;
BURN: 2000;
SEED: 90291;
OUTFILE: ~/desktop/example/imps.csv;
OPTIONS: stacked clmeans prior1;

SLIDE 21

Random Slope Analysis Model

Two-level random slope analysis with continuous level-1 and level-2 predictors. All variables have missing data.

Reversed Random Coefficients

Fully conditional specification uses "reversed random coefficients" to preserve random slope variation. Imputation treats x as a random predictor of y, and y as a random predictor of x.

Reversed Coefficient Model For y

Level-1 Level-2

Reversed Coefficient Model For x

Level-1 Level-2

SLIDE 22

Imputation Model For w

Level-1 Level-2

Blimp Syntax

DATA: ~/desktop/examples/rsdata.csv;
VARIABLES: cluster av1 av2 y x w;
MISSING: 999;
MODEL: cluster ~ y:x w;
NIMPS: 20;
THIN: 2000;
BURN: 2000;
SEED: 90291;
OUTFILE: ~/desktop/examples/imps.csv;
OPTIONS: stacked clmeans prior1;

Simulation Study

Random slope model with 1000 replications. ICC = .25, medium effect sizes. 30 clusters with 5 or 30 observations per cluster (i.e., N = 150 and 900). 15% MAR missing data on all analysis variables. 20 imputations with the Blimp application.

[Figure: percentage bias for complete data, joint model imputation, and FCS (intercept, L1 slope, L2 slope, intercept var., residual var., covariance, slope var.) for J = 30, nj = 5 and J = 30, nj = 30]

SLIDE 23

Incomplete Categorical Variables

Complete Categorical Variables

Complete categorical variables function as predictors in fully conditional specification. Convert nominal (and maybe ordinal) variables to dummy or effect codes, à la regression. Blimp's NOMINAL command automatically creates the necessary code variables.

Latent Variable Imputation Framework

Blimp uses a latent variable (i.e., probit regression) formulation to impute categorical variables. Discrete responses arise from one or more underlying normal latent variables. Cumulative and multinomial probit models impute ordinal and nominal variables, respectively.

Latent Variable Transformations

Ordinal Nominal

SLIDE 24

Latent Variable Scaling

Latent variable distributions are centered at a predicted value and have residual variance fixed at one for identification

Random Intercept Model

[Figure: cluster-specific latent variable distributions (Clusters 1-3) with residual variance fixed at 1]

Threshold Parameters

Ordinal (or binary) variables with K response options require K - 1 threshold parameters. Thresholds are z-scores corresponding to the cumulative percentage of each response. Thresholds slice the continuous latent distribution into discrete response segments.
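Computing thresholds from response proportions is a direct inverse-normal-CDF calculation. A sketch, using a four-category distribution with proportions 12%, 38%, 29%, and 21% as the example:

```python
from statistics import NormalDist

def thresholds(proportions):
    """Convert K category proportions to K - 1 threshold z-scores.

    Each threshold is the z-score of the cumulative proportion of
    responses in that category and all categories below it.
    """
    cum, out = 0.0, []
    for p in proportions[:-1]:    # the last category needs no threshold
        cum += p
        out.append(NormalDist().inv_cdf(cum))
    return out

th = thresholds([0.12, 0.38, 0.29, 0.21])
# cumulative percentages 12%, 50%, 79% give thresholds near -1.17, 0, .81
```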

Marginal Distribution Example

[Figure: four-category distribution with proportions 12%, 38%, 29%, and 21%; cumulative percentages 12%, 50%, and 79% correspond to threshold z-scores of -1.17, 0, and .81]

SLIDE 25

Multilevel Model Example

[Figure: cluster-specific latent distributions (Clusters 1-3) sliced into categories 1-4 by common thresholds]

Complete-Data Bayesian Estimation

The Gibbs sampler first replaces discrete responses with latent variable scores. Threshold parameters (for ordinal variables) are sampled using a Metropolis step. The Bayesian estimation steps for normal data then update parameters and level-2 residual terms for the underlying latent variable model.

Gibbs Sampler Steps

Draw latent scores ("impute" discrete responses)
Bayes estimation for normal variables: estimate coefficients, estimate random effects, estimate covariance matrix
Estimate thresholds (ordinal variables)

Latent Scores For Ordinal Variables

A discrete response restricts the plausible range of the latent scores (e.g., a score of y = 2 must have a latent score located between the appropriate thresholds). The latent variable scores are drawn from a normal distribution truncated at the thresholds.
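The truncated draw can be sketched as a simple rejection sampler. This is illustrative only: the predicted value and thresholds below are hypothetical, and Blimp's actual implementation may differ.

```python
import random

def truncated_normal(mu, sd, lower, upper, rng):
    """Rejection sampling from N(mu, sd) truncated to (lower, upper).

    Draws are discarded (implausible latent score, reject draw) until
    one lands between the thresholds that bracket the observed
    response category.
    """
    while True:
        z = rng.gauss(mu, sd)
        if lower < z < upper:
            return z    # plausible latent score, retain draw

rng = random.Random(90291)
# Latent scores for responses of y = 2 with thresholds -1.175 and 0
draws = [truncated_normal(0.3, 1.0, -1.175, 0.0, rng) for _ in range(100)]
```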

SLIDE 26

Truncated Normal Draw | y = 2

[Figure: draw outside the y = 2 threshold interval; implausible latent score, reject draw]

Truncated Normal Draw | y = 2

[Figure: draw inside the y = 2 threshold interval; plausible latent score, retain draw]

Incomplete Ordinal Variables

The procedure is identical to the complete-data case, with imputations generated at the end of each Bayesian estimation sequence. Latent scores for missing cases are unbounded because the truncation points are unknown. Latent imputes are subsequently discretized using the threshold parameters.

Gibbs Sampler Steps

Replace discrete responses with latent scores; impute missing latent scores
Bayes estimation for normal variables: estimate coefficients, estimate random effects, estimate covariance matrix
Estimate thresholds (ordinal variables)
Convert latent imputes to discrete imputes
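Converting a latent impute back to a discrete response is a threshold lookup. A sketch using the thresholds from the earlier marginal distribution example:

```python
from bisect import bisect

def discretize(latent, thresholds):
    """Convert a latent imputation to a discrete response category.

    The category equals one plus the number of thresholds the latent
    score exceeds.
    """
    return bisect(thresholds, latent) + 1

th = [-1.175, 0.0, 0.806]    # K - 1 thresholds for K = 4 categories
cats = [discretize(z, th) for z in (-2.0, -0.5, 0.3, 1.5)]
# → [1, 2, 3, 4]
```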

SLIDE 27

Truncated Normal Draw | y = ?

Plausible latent imputation

Generating Discrete Imputes


Multinomial Probit Model

The multinomial model defines K latent variables representing the response strength of each category. K categories require K - 1 latent variable difference scores; category K is the reference.


Example: 3-Category Nominal Variable

SLIDE 28

Latent Variable Distributions

Cluster 1 Cluster 2

Latent Scores For Nominal Variables

A discrete response occurs when its latent response strength exceeds those of all other categories. Category membership implies a rank order and magnitude for the latent difference scores. An accept-reject algorithm draws latent scores until it obtains values that satisfy the constraints.
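A minimal accept-reject sketch of the idea. Illustrative only: it draws the K raw response strengths rather than the K - 1 difference scores the slides describe, and the predicted strengths are made-up values.

```python
import random

def draw_strengths(category, mu, rng):
    """Accept-reject draw of latent response strengths.

    Keeps redrawing until the observed category has the largest
    strength, so the retained draw satisfies the rank-order
    constraint implied by the observed response.
    """
    k = len(mu)
    while True:
        z = [rng.gauss(m, 1.0) for m in mu]
        if max(range(k), key=z.__getitem__) == category - 1:
            return z    # constraints satisfied, retain draw

rng = random.Random(90291)
# Observed response: category 2 of 3; hypothetical predicted strengths
z = draw_strengths(2, [0.1, 0.4, -0.2], rng)
```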

Latent Variable Score Constraints

[Figure: rank-order constraints on the latent response strengths for a 3-category variable]


SLIDE 29


Incomplete Nominal Variables

Category membership is unknown, so latent difference scores for incomplete cases can take on any configuration of values. Discrete imputes are generated by applying the order and magnitude conditions.

Latent Difference Score Imputations


Generating Discrete Imputes

[Figure: latent difference scores mapped to a discrete response]

SLIDE 30


Blimp Syntax

DATA: ~/desktop/examples/rsdata.csv;
VARIABLES: cluster av1 av2 y x w;
MISSING: 999;
MODEL: cluster ~ y x w;
ORDINAL: y;
NOMINAL: x w;
NIMPS: 20;
THIN: 2000;
BURN: 2000;
SEED: 90291;
OUTFILE: ~/desktop/examples/imps.csv;
OPTIONS: stacked clmeans prior1;

Two-Level Analysis Example

SLIDE 31

Download Information

The Blimp application for MacOS and Windows is freely available online (Linux by request) at www.appliedmissingdata.com/multilevel-imputation.html. The data and analysis scripts are also available.

Motivating Example

Data from a cluster-randomized study investigating a novel math problem-solving curriculum. 29 schools (level-2 units) were randomly assigned to an intervention or control condition. The average number of students (level-1 units) per school was 33.86, with a range of 13 to 61.

Input Data

Variable      Description                                      Missing   Metric
school        School identifier variable
condition     Treatment code (0 = control, 1 = intervention)             Nominal
esolpercent   Percentage of English as second language         *         Numeric
student       Student identifier
abilitylev    Ability grouping (3-group classification)        *         Nominal
female        Female dummy code                                          Nominal
stanmath      Standardized math test scores                    *         Numeric
frlunch       Lunch assistance dummy code                      *         Nominal
efficacy      Math self-efficacy rating scale                  *         Ordinal
probsolve1    Math problem-solving score at baseline           *         Numeric
probsolve7    Math problem-solving score at final wave         *         Ordinal

Level-1 Level-2

Analysis Model

The substantive analysis model predicts end-of-year problem-solving scores from intervention condition and pretest covariates.

SLIDE 32

Blimp Syntax

DATA: ~/Desktop/Blimp Examples/Ex2Level.csv;
VARIABLES: school condition esolpercent student abilitylev female
           stanmath frlunch efficacy probsolve1 probsolve7;
ORDINAL: efficacy;
NOMINAL: condition abilitylev female frlunch;
MISSING: 999;
MODEL: school ~ condition esolpercent abilitylev female stanmath
       frlunch efficacy probsolve1 probsolve7;
NIMPS: 20;
THIN: 2000;
BURN: 2000;
SEED: 90291;
OUTFILE: ~/Desktop/Blimp Examples/Imps2Level.csv;
OPTIONS: stacked nopsr csv clmean prior1 hov;

Import Data

SLIDE 33

Specify Imputation Model

SLIDE 34

Specify Algorithmic Options Specify Output Options

SLIDE 35

Run Program Pooling with R Package mitml

# Required packages
library(mitml)
library(lme4)

# Read data
imputations <- read.csv("~/desktop/Blimp Examples/Imps2Level.csv", header = F)
names(imputations) <- c("imputation", "school", "condition", "esolpercent",
                        "student", "abilitylev", "female", "stanmath",
                        "frlunch", "efficacy", "probsolve1", "probsolve7")
imputations$abilitylev <- factor(imputations$abilitylev)

# Analyze data and pool estimates
model <- "probsolve7 ~ probsolve1 + efficacy + abilitylev + female + esolpercent + condition + (1|school)"
implist <- as.mitml.list(split(imputations, imputations$imputation))
mlm <- with(implist, lmer(model, REML = F))
estimates <- testEstimates(mlm, var.comp = T, df.com = NULL)

# Display estimates
estimates

SLIDE 36

mitml Output

Final parameter estimates and inferences obtained from 20 imputed data sets.

              Estimate  Std.Error  t.value        df  p.value    RIV
(Intercept)     55.932      4.928   11.349   500.705    0.000  0.242
probsolve1       0.416      0.040   10.330   297.510    0.000  0.338
efficacy         0.721      0.273    2.641   157.466    0.005  0.532
abilitylev2      1.169      1.526    0.766   131.473    0.222  0.613
abilitylev3      2.843      1.680    1.693   185.041    0.046  0.472
female           0.324      0.733    0.442   284.297    0.329  0.349
esolpercent      0.063      0.042    1.525  4350.615    0.064  0.071
condition        4.779      1.931    2.475  2174.122    0.007  0.103

                                 Estimate
Intercept~~Intercept|school        18.582
Residual~~Residual                 89.179
ICC|school                          0.172

Unadjusted hypothesis test as appropriate in larger samples.

Centering Predictors

Centering is performed post-imputation because the means are unknown with missing data. Center variables at imputation-specific constants.

Centering constants (e.g., grand or group mean)

Pooling with R Package mitml

# Required packages
library(mitml)
library(lme4)

# Read data
imputations <- read.csv("~/Desktop/ex/Imps2Level.csv", header = F)
names(imputations) <- c("imputation", "school", "condition", "esolpercent",
                        "student", "abilitylev", "female", "stanmath",
                        "frlunch", "efficacy", "probsolve1", "probsolve7")

# Create dummy codes (factor level 1 is the reference)
imputations$abilitylev <- factor(imputations$abilitylev)
dummyCodes <- model.matrix(~ imputations$abilitylev)
imputations$abilityleveD1 <- dummyCodes[, 2]
imputations$abilityleveD2 <- dummyCodes[, 3]

# Create imputations as a list
imputationList <- split(imputations, imputations$imputation)

Pooling with R Package mitml, Cont.

# Grand mean centering
impListCent <- lapply(imputationList, function(dat) {
  # Variables needing centering
  vars <- c("esolpercent", "student", "female", "stanmath", "frlunch",
            "efficacy", "probsolve1", "abilityleveD1", "abilityleveD2")
  # Get grand means
  mns <- colMeans(dat[, vars])
  # Center
  dat[, vars] <- sweep(dat[, vars], 2, mns)
  # Return data
  return(dat)
})

# Create imputations as mitml list
implistCent <- as.mitml.list(impListCent)

# Analyze data and pool estimates
model <- "probsolve7 ~ probsolve1 + efficacy + abilitylev + female + esolpercent + condition + (1|school)"
mlm <- with(implistCent, lmer(model, REML = F))
estimates <- testEstimates(mlm, var.comp = T, df.com = NULL)

SLIDE 37

Multiple Imputation Significance Tests

Pooling Covariance Matrices

Pooling combines the average covariance matrix (within-imputation), the variance of estimates across imputations (between-imputation), and the average proportional increase in variance.

Wald Test Statistic

Comparing the Wald statistic, computed from pooled quantities, to a chi-square or F reference distribution gives a p-value.

Wald based on pooled quantities Inflation factor

Wald Test With mitml

# Empty model
model1 <- "probsolve7 ~ (1|school)"
mlm1 <- with(implist, lmer(model1, REML = F))
estimates1 <- testEstimates(mlm1, var.comp = T, df.com = NULL)
estimates1

# Covariates only
model2 <- "probsolve7 ~ probsolve1 + efficacy + abilitylev + female + esolpercent + (1|school)"
mlm2 <- with(implist, lmer(model2, REML = F))
estimates2 <- testEstimates(mlm2, var.comp = T, df.com = NULL)
estimates2

# Compare models with Wald test
testModels(mlm2, mlm1, method = "D1")

SLIDE 38

Output

Model comparison calculated from 20 imputed data sets.
Combination method: D1

 F.value  df1       df2  p.value    RIV
  28.657    6  1615.839    0.000  0.347

Unadjusted hypothesis test as appropriate in larger samples.

First And Second Pass Test Statistics

Pass 1: average likelihood ratio statistic. Pass 2: average test statistic with the likelihood evaluated at the pooled estimates.

Meng And Rubin (1992) Test Statistic

The likelihood ratio test statistic can be evaluated against a chi-square or F distribution.

LRT based on pooled quantities Inflation factor Average proportional increase in variance

Likelihood Ratio Test With mitml

# Random intercept model
model1 <- "probsolve7 ~ probsolve1 + efficacy + abilitylev + female + esolpercent + condition + (1|school)"
mlm1 <- with(implist, lmer(model1, REML = F))
estimates1 <- testEstimates(mlm1, var.comp = T, df.com = NULL)
estimates1

# Random slope for self-efficacy
model2 <- "probsolve7 ~ probsolve1 + efficacy + abilitylev + female + esolpercent + condition + (efficacy|school)"
mlm2 <- with(implist, lmer(model2, REML = F))
estimates2 <- testEstimates(mlm2, var.comp = T, df.com = NULL)
estimates2

# Compare models with Meng and Rubin likelihood ratio test
testModels(mlm2, mlm1, method = "D3")

SLIDE 39

Output

Model comparison calculated from 20 imputed data sets.
Combination method: D3

 F.value  df1      df2  p.value    RIV
   0.085    2  786.816    0.918  0.249

Three-Level Analysis Example

Motivating Example

Data from a cluster-randomized study investigating a math problem-solving curriculum. 29 schools (level-3 units) were randomly assigned to an intervention or control condition. The average number of students (level-2 units) per school was 33.86, with a range of 13 to 61. Seven (approximately) monthly assessments with planned missing data and attrition.

Input Data

Variable      Description                                      Missing   Metric
school        School identifier variable
condition     Treatment code (0 = control, 1 = intervention)             Nominal
esolpercent   Percentage of English as second language         *         Numeric
student       Student identifier
abilitylev    Ability grouping (3-group classification)        *         Nominal
female        Female dummy code                                          Nominal
stanmath      Standardized math test scores                    *         Numeric
frlunch       Lunch assistance dummy code                      *         Nominal
wave          Assessment wave
time          Months since baseline                                      Numeric
condbytime    Condition by time interaction                              Numeric
probsolve     Math problem-solving score                       *         Numeric
efficacy      Math self-efficacy 6-point rating scale          *         Ordinal

Level-1 Level-2 Level-3

SLIDE 40

Analysis Model

The substantive analysis model examines the intervention by time interaction, controlling for covariates at each level

Blimp Syntax

DATA: ~/Desktop/Blimp Examples/Ex3Level.csv;
VARIABLES: school condition esolpercent student abilitylev female
           stanmath frlunch wave time condbytime probsolve efficacy;
ORDINAL: efficacy;
NOMINAL: condition abilitylev female frlunch;
MISSING: 999;
MODEL: student school ~ condition esolpercent abilitylev female
       stanmath frlunch condbytime efficacy time:probsolve;
NIMPS: 20;
THIN: 2000;
BURN: 2000;
SEED: 90291;
OUTFILE: ~/Desktop/Blimp Examples/Imps3Level.csv;
OPTIONS: stacked nopsr csv clmean prior1 hov;

Import Data

SLIDE 41

Specify Imputation Model

SLIDE 42

Specify Algorithmic Options

SLIDE 43

Specify Output Options Run Program

SLIDE 44

Pooling with R Package mitml

# Required packages
library(mitml)
library(lme4)

# Read data
imputations <- read.csv("~/desktop/Blimp Examples/Imps3Level.csv", header = F)
names(imputations) <- c("imputation", "school", "condition", "esolpercent",
                        "student", "abilitylev", "female", "stanmath",
                        "frlunch", "wave", "time", "condbytime",
                        "probsolve", "efficacy")
imputations$abilitylev <- factor(imputations$abilitylev)

# Analyze data and pool estimates
model <- "probsolve ~ efficacy + time + condbytime + abilitylev + female + esolpercent + condition + (time|student:school) + (time|school)"
implist <- as.mitml.list(split(imputations, imputations$imputation))
mlm <- with(implist, lmer(model, REML = F))
estimates <- testEstimates(mlm, var.comp = T, df.com = NULL)

# Display estimates
estimates

mitml Output

Final parameter estimates and inferences obtained from 20 imputed data sets.

              Estimate  Std.Error  t.value        df  p.value    RIV
(Intercept)     92.715      1.917   48.373   549.605    0.000  0.228
efficacy         0.765      0.144    5.326    56.231    0.000  1.388
time             0.686      0.172    3.985   934.853    0.000  0.166
condbytime       0.549      0.222    2.470  1995.448    0.007  0.108
abilitylev2      0.747      0.886    0.843   321.312    0.200  0.321
abilitylev3      6.974      0.967    7.210   441.810    0.000  0.262
female          -0.530      0.439   -1.207   968.110    0.114  0.163
esolpercent      0.051      0.023    2.194  1003.065    0.014  0.160
condition        0.083      1.085    0.077  2741.808    0.469  0.091

mitml Output

                                      Estimate
Intercept~~Intercept|student:school     23.532
Intercept~~time|student:school           0.529
time~~time|student:school                0.131
Intercept~~Intercept|school              5.038
Intercept~~time|school                  -0.167
time~~time|school                        0.255
Residual~~Residual                      62.353
ICC|school                               0.274
NA                                       0.075

Unadjusted hypothesis test as appropriate in larger samples.

slide-45
SLIDE 45

Centering Incomplete Product Terms

Interaction terms can be rescaled to equal the product of deviation score variables

Centering constants (e.g., grand or group mean)

Pooling with R Package mitml

# Required packages
library(mitml)
library(lme4)

# Read data
imputations <- read.csv("~/Desktop/ex/Imps3Level.csv", header = F)
names(imputations) <- c("imputation", "school", "condition", "esolpercent",
                        "student", "abilitylev", "female", "stanmath",
                        "frlunch", "wave", "time", "condbytime",
                        "probsolve", "efficacy")

# Create dummy codes (factor level 1 is the reference)
imputations$abilitylev <- factor(imputations$abilitylev)
dummyCodes <- model.matrix(~ imputations$abilitylev)
imputations$abilityleveD1 <- dummyCodes[, 2]
imputations$abilityleveD2 <- dummyCodes[, 3]

# Create imputations as a list
imputationList <- split(imputations, imputations$imputation)

mitml Output

Final parameter estimates and inferences obtained from 20 imputed data sets.

              Estimate  Std.Error  t.value         df  p.value    RIV
(Intercept)    101.891      1.361   74.840   1398.955    0.000  0.132
efficacy         0.765      0.144    5.326     56.231    0.000  1.388
time             0.686      0.172    3.985    934.854    0.000  0.166
condbytime       0.549      0.222    2.470   1995.446    0.007  0.108
abilitylev2      0.747      0.886    0.843    321.312    0.200  0.321
abilitylev3      6.974      0.967    7.210    441.809    0.000  0.262
female          -0.530      0.439   -1.207    968.111    0.114  0.163
esolpercent      0.051      0.023    2.194   1003.064    0.014  0.160
condition        3.380      1.462    2.312  20385.340    0.010  0.031

Pooling with R Package mitml, Cont.

# Centering
impListCent <- lapply(imputationList, function(dat) {
  # Variables needing grand mean centering
  vars <- c("efficacy", "esolpercent", "female", "abilityleveD1",
            "abilityleveD2")
  # Get grand means
  mns <- colMeans(dat[, vars])
  # Grand mean center
  dat[, vars] <- sweep(dat[, vars], 2, mns)
  ## Center interaction
  # Time centering constant
  timeC <- 6
  # Condition constant
  condC <- 0
  # Center time
  dat$time <- dat$time - timeC
  # Center condition
  dat$condition <- dat$condition - condC
  # Center condbytime
  dat$condbytime <- dat$condbytime - (dat$condition * timeC) -
    (dat$time * condC) + (condC * timeC)
  # Return data
  return(dat)
})

# Analyze data and pool estimates
model <- "probsolve ~ efficacy + time + condbytime + abilitylev + female + esolpercent + condition + (time|student:school) + (time|school)"
implist <- as.mitml.list(impListCent)