Dealing with Missing Data
Challenges and Solutions
Nicole Erler
Department of Biostatistics, Erasmus Medical Center
n.erler@erasmusmc.nl
N_Erler
www.nerler.com
NErler
13 January 2020
Handling Missing Values is Easy! Functions
[Figure: multiple imputation workflow: incomplete data -> multiple imputed datasets -> analysis results -> pooled results]
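The last step of the workflow above, turning the per-dataset analysis results into pooled results, can be sketched in base R. This is a minimal illustration that assumes pooling follows Rubin's rules, which the slide itself does not name; `pool_rubin`, `estimates`, and `std_errors` are hypothetical names, and the numbers are invented for the example rather than taken from the talk.

```r
# Pool one coefficient across m analyses of imputed datasets.
# Assumption: Rubin's rules are used for pooling (not stated on the slide).
# `estimates` and `std_errors` hold one value per imputed dataset.
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  stopifnot(m > 1, length(std_errors) == m)

  q_bar <- mean(estimates)          # pooled point estimate
  u_bar <- mean(std_errors^2)       # within-imputation variance
  b     <- var(estimates)           # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b  # total variance

  c(estimate = q_bar, se = sqrt(t_var), within = u_bar, between = b)
}

# Invented inputs for illustration only: five imputed datasets
est <- c(0.40, 0.45, 0.42, 0.47, 0.41)
se  <- c(0.08, 0.08, 0.09, 0.08, 0.08)

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is larger than the average per-dataset standard error because the between-imputation variance carries the extra uncertainty due to the missing values.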
[Figure: data matrix with columns x1, x2, x3, x4, ... in which some entries are missing (NA)]
[Figure: panels with axis labels "incomplete covariate", "covariate" and "count"]
[Figure: legend entries "Gibbs" and "MICE"]
[Figure series: plots with axes x and y. Legend entries across the slides: "missing", "imputed", "fit on complete", "fit on imputed", "missing (z = 0)", "missing (z = 1)", "imputed (z = 0)", "imputed (z = 1)", "true", "imputed"]
[Figure: four panels, each plotting y against time]
[Figure: legend entries "Gibbs" and "MICE"]
##
## Linear model fitted with JointAI
##
## Call:
## lm_imp(formula = SBP ~ age + gender + smoke + occup, data = NHANES,
##     n.iter = 300)
##
## Posterior summary:
##                          Mean     SD   2.5%   97.5% tail-prob. GR-crit
## (Intercept)           106.222 3.3979 99.461 112.961     0.0000    1.00
## age                     0.427 0.0798  0.278   0.583     0.0000    1.00
## genderfemale            [...]                           0.0000    1.00
## smokeformer             [...]                           0.0267    1.03
## smokecurrent            [...]                 3.313     0.3711    1.01
## occuplooking for work   3.817 6.4037  [...]  16.087     0.5044    1.01
## occupnot working        [...]                 4.256     0.7511    1.02
##
## Posterior summary of residual std. deviation:
##           Mean    SD 2.5% 97.5% GR-crit
## sigma_SBP 14.3 0.753 12.8  15.8   0.999
##
##
## MCMC settings
## [...]