Imputation of missing covariates: when standard methods may fail


  1. Imputation of missing covariates: when standard methods may fail
     Nicole S. Erler (1, 2), Dimitris Rizopoulos (1), Oscar H. Franco (2), Emmanuel M.E.H. Lesaffre (1, 3)
     (1) Department of Biostatistics, Erasmus MC, Rotterdam, the Netherlands
     (2) Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands
     (3) L-Biostat, KU Leuven, Leuven, Belgium

  2. Motivation (1)
     Vitamin D concentration during fetal life and bone health at age 6
     • bone mineral content (BMC)
     • serum vitamin D concentration (✻)
     • sun exposure (✻), season at measurement (✻)
     • gender, age at measurement
     • ... (✻)
     (✻) incomplete
     Analysis model: $\text{BMD} = (\text{age} + \text{VitD} + \text{VitD}^2) \times \text{gender} + \text{season} + \text{sun exposure} + \ldots$
     Nicole Erler, 38th Conference of the ISCB, Vigo, 2017

  3. Motivation (2)
     Maternal sugar-sweetened beverage consumption and child's body composition
     • child BMI at up to 13 time points
     • maternal sugar-sweetened beverage consumption (SBC)
     • child's physical activity, TV watching (✻)
     • gender, age at measurement
     • ... (✻)
     (✻) incomplete
     Analysis model: $\text{BMI}_{ij} = \text{SBC}_i + \text{age}_{ij} + \ldots + u_{0i} + u_{1i} \times \text{age}_{ij}$

  4. Standard for imputation: Multiple Imputation (MI)
     impute ➡ analyze ➡ pool
     • fully conditional specification (FCS) / chained equations (MICE)
     • joint model imputation
     FCS:
     ➡ In iteration $k = 1, \ldots, K$, for variable $j = 1, \ldots, p$:
        • Draw parameter $\hat\theta_j^k \sim p(\theta_j^k \mid x_j^{obs}, \hat X_{-j}^k)$ (e.g. regression with all other variables in the linear predictor)
        • Draw imputation $\hat x_j^k \sim p(x_j^{mis} \mid x_j^{obs}, \hat X_{-j}^k, \hat\theta_j^k)$
     ➡ keep last iteration ➡ 1 imputed data set
     ➡ repeat m times
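The FCS loop on this slide can be sketched in a few lines. The following is a minimal illustration, not the mice implementation: it assumes exactly two continuous variables, a normal linear imputation model for each, and mean initialisation. The function names `draw_linear_imputations` and `fcs_impute` and the toy data are hypothetical.

```python
import math
import random

def draw_linear_imputations(y, x, missing, rng):
    """One FCS step: impute y[i] for i in `missing` from a normal linear model y ~ x."""
    obs = [i for i in range(len(y)) if i not in missing]
    n = len(obs)
    mx = sum(x[i] for i in obs) / n
    my = sum(y[i] for i in obs) / n
    sxx = sum((x[i] - mx) ** 2 for i in obs)
    b1 = sum((x[i] - mx) * (y[i] - my) for i in obs) / sxx
    b0 = my - b1 * mx
    rss = sum((y[i] - b0 - b1 * x[i]) ** 2 for i in obs)
    # Parameter draw: sigma^2 ~ RSS / chi^2_{n-2}, then coefficients around the OLS fit
    # (centred parametrisation, so the mean level and the slope can be drawn independently).
    sigma2 = rss / rng.gammavariate((n - 2) / 2, 2.0)
    b1_draw = rng.gauss(b1, math.sqrt(sigma2 / sxx))
    b0_draw = rng.gauss(my, math.sqrt(sigma2 / n)) - b1_draw * mx
    # Imputation draw: predicted value plus residual noise.
    for i in missing:
        y[i] = b0_draw + b1_draw * x[i] + rng.gauss(0, math.sqrt(sigma2))

def fcs_impute(data, n_iter=10, seed=0):
    """Run K = n_iter chained-equation iterations and keep the last one (1 imputed data set)."""
    rng = random.Random(seed)
    cols = {name: list(vals) for name, vals in data.items()}
    miss = {name: {i for i, v in enumerate(vals) if v is None}
            for name, vals in cols.items()}
    # Assumption: start from the observed mean of each variable.
    for name, vals in cols.items():
        observed = [v for v in vals if v is not None]
        mean = sum(observed) / len(observed)
        for i in miss[name]:
            vals[i] = mean
    names = list(cols)
    for _ in range(n_iter):
        for name in names:
            if miss[name]:
                other = next(o for o in names if o != name)
                draw_linear_imputations(cols[name], cols[other], miss[name], rng)
    return cols

# Toy data (made up): two correlated variables with values missing in both.
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
data = {"x1": [None if i % 5 == 0 else v for i, v in enumerate(x1)],
        "x2": [None if i % 7 == 3 else v for i, v in enumerate(x2)]}

# "repeat m times": one independent chain per imputed data set (here m = 5).
imputed_sets = [fcs_impute(data, seed=s) for s in range(5)]
```

Each of the m completed data sets would then be analysed separately and the results pooled.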

  8. Requirements for MICE
     • all relevant variables must be included
       – covariates (from all analyses)
       – the outcome
     • compatibility: a joint model exists that has the imputation models as its conditional distributions
     • congeniality: compatibility between analysis model and imputation model
     • imputation models should fit the data
     • M(C)AR (in most implementations)

  9. When MICE might fail
     Imputation model not congenial with analysis:
     • quadratic, logarithmic, ... effects
     • interactions between covariates
     Complex (non univariate) outcomes:
     • survival
     • longitudinal

  13. Uncongeniality
      True model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \ldots$ (quadratic association)
      Imputation model: $x_1 = \theta_{10} + \theta_{11} y + \ldots$ (linear association)
      [Figure: y plotted against x_1; legend: original, imputed]
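A small simulation can make the mismatch on this slide concrete. The sketch below is illustrative only and rests on assumptions that are not in the slides: coefficients (1, 0.5, 1), residual standard deviation 0.5, 50% of x_1 missing completely at random, and a single stochastic regression imputation of x_1 on y without parameter draws. The helper `ols` is a hypothetical least-squares routine written for this example.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan, small problems only)."""
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
# True model: quadratic association between y and x (assumed coefficients).
y = [1 + 0.5 * xi + xi ** 2 + rng.gauss(0, 0.5) for xi in x]
missing = [rng.random() < 0.5 for _ in range(n)]

# Imputation model: x linear in y, fitted on the observed cases.
obs = [i for i in range(n) if not missing[i]]
a, b = ols([[1.0, y[i]] for i in obs], [x[i] for i in obs])
res_sd = math.sqrt(sum((x[i] - a - b * y[i]) ** 2 for i in obs) / (len(obs) - 2))
x_imp = [a + b * y[i] + rng.gauss(0, res_sd) if missing[i] else x[i]
         for i in range(n)]

def quad_fit(xs, ys):
    """Fit y = b0 + b1 * x + b2 * x^2 and return [b0, b1, b2]."""
    return ols([[1.0, v, v * v] for v in xs], ys)

beta_full = quad_fit(x, y)      # analysis on the complete data
beta_imp = quad_fit(x_imp, y)   # analysis after linear imputation of x

print("quadratic coefficient, complete data:", round(beta_full[2], 2))
print("quadratic coefficient, after imputation:", round(beta_imp[2], 2))
```

Under these assumptions the imputed values carry almost no information about the curvature, so the estimated quadratic coefficient is pulled towards zero.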

  14. Simple approaches
      • passive normal imputation: standard MICE ➡ calculate interactions & non-lin. terms afterwards
      • predictive mean matching (pmm) (also passive): use pmm instead of linear regression for imputation
      • just another variable (JAV)
        – calculate interactions & non-lin. terms before imputation
        – add as columns to data set
      (Can be done in SPSS)
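The difference between the passive and the JAV approach lies in when the derived terms are computed. A minimal sketch, assuming a covariate `x` with a quadratic term and an interaction with `z`. The column names, both helper functions and the imputed value 1.5 are made up for illustration.

```python
def add_derived_columns(rows):
    """JAV preparation: compute x^2 and x*z BEFORE imputation.

    A missing input gives a missing derived term, which is then imputed
    like any other variable (so it need not equal the square of imputed x).
    """
    out = []
    for r in rows:
        x, z = r["x"], r["z"]
        out.append({**r,
                    "x_sq": None if x is None else x * x,
                    "x_z": None if x is None or z is None else x * z})
    return out

def recompute_derived_columns(rows):
    """Passive approach: derive the terms AFTER x and z have been imputed."""
    return [{**r, "x_sq": r["x"] ** 2, "x_z": r["x"] * r["z"]} for r in rows]

rows = [{"x": 2.0, "z": 1.0}, {"x": None, "z": 3.0}]

# JAV: x_sq and x_z become extra incomplete columns handed to the imputation routine.
jav_input = add_derived_columns(rows)

# Passive: pretend an imputation routine filled in x = 1.5 (placeholder value),
# then derive the non-linear terms from the completed data.
completed = [{"x": 2.0, "z": 1.0}, {"x": 1.5, "z": 3.0}]
passive_output = recompute_derived_columns(completed)
```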

  20. Some advanced approaches
      • smcfcs: Substantive Model Compatible FCS ➡ MICE type approach
      • jomo: joint modeling MI using multivariate normal distribution ➡ joint model MI
      • JointAI: joint analysis and imputation ➡ not MI, but simultaneous analysis & imputation
      Explicitly take into account the analysis model in the sampling distribution for $\hat x_j$
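One way to read the last statement, written as a sketch: the sampling distribution for an incomplete covariate is built from the analysis model and a model for the covariate given the other covariates. The symbols $\beta$ (analysis-model parameters) and $\psi_j$ (covariate-model parameters) are notation introduced here, not taken from the slides, and the exact factorisation differs between the three packages.

```latex
p(x_j \mid y, x_{-j}, \beta, \psi_j)
  \;\propto\;
  \underbrace{p(y \mid x_j, x_{-j}, \beta)}_{\text{analysis model}}
  \;\times\;
  \underbrace{p(x_j \mid x_{-j}, \psi_j)}_{\text{covariate model}}
```

With a quadratic analysis model such as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \ldots$, the first factor is not linear in $x_1$, so the resulting distribution for $x_1$ is generally not the normal linear regression on $y$ that a standard imputation model would assume.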

  21. Simulation study (I): Data setup
      Models: linear regression with
      • interaction
      • logarithmic or quadratic effect
      • combinations
      Missing values:
      • in one or two covariates
      • MAR, depending on outcome (and other covariate)
      • 20%, 40%, 60%
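The slides do not state how the MAR missingness was generated. The sketch below shows one common construction under stated assumptions: the probability that the covariate is missing follows a logistic function of the standardised outcome, with the intercept calibrated by bisection to hit the target proportion. The data-generating model and the names `calibrate_intercept` and `make_mar` are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def calibrate_intercept(scores, target, lo=-20.0, hi=20.0, steps=60):
    """Bisection for the intercept a such that mean(sigmoid(a + score)) equals target."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        mean_p = sum(sigmoid(mid + s) for s in scores) / len(scores)
        if mean_p < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def make_mar(x, y, target, rng):
    """Set x to None with a probability that depends on the (fully observed) outcome y."""
    my = sum(y) / len(y)
    sd = math.sqrt(sum((v - my) ** 2 for v in y) / (len(y) - 1))
    scores = [(v - my) / sd for v in y]
    a = calibrate_intercept(scores, target)
    return [None if rng.random() < sigmoid(a + s) else xi
            for xi, s in zip(x, scores)]

rng = random.Random(2017)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
# Assumed analysis model with a quadratic effect (coefficients made up).
y = [1 + 0.5 * xi + xi ** 2 + rng.gauss(0, 0.5) for xi in x]

realised = {}
for target in (0.2, 0.4, 0.6):
    x_mis = make_mar(x, y, target, rng)
    realised[target] = sum(v is None for v in x_mis) / n
```

Because the missingness depends only on the observed outcome, the mechanism is MAR rather than MCAR, and `realised` holds the achieved proportions for the three scenarios.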

  23. Simulation study (I): Methods
      Approaches using the mice package:
      • norm
      • pmm
      • JAV (using pmm)
      Other packages:
      • smcfcs: smcfcs()
      • jomo: jomo.lm()
      • JointAI: lm_imp()

  24. [Figure: simulation results for the quadratic model with interaction, $y \sim c_1 + (c_2^{(*)} + c_2^{(*)2}) \times b^{(*)}$; shown: effect of $c_2^2 \times b$]

  25. Summary of Simulation Study (I)
      [Table: methods (norm, pmm, JAV, smcfcs, jomo, JointAI) by scenario (interaction, log, quadratic, interaction & quadratic); cell marks not preserved]

  26. When MICE might fail
      Imputation model not congenial with analysis:
      • quadratic, logarithmic, ... effects
      • interactions between covariates
      Complex (non univariate) outcomes:
      • survival
      • longitudinal
