Dealing with missing values, part 2

  1. Dealing with missing values – part 2 Applied Multivariate Statistics – Spring 2012

  2. Overview
     - More on Single Imputation: Shortcomings
     - Multiple Imputation: Accounting for uncertainty

  3. Single Imputation, ordered from easy / inaccurate to hard / accurate
     - Unconditional Mean
     - Unconditional Distribution
     - Conditional Mean
     - Conditional Distribution

  4. Example: Blood Pressure - Revisited
     - 30 participants measured in January (X) and February (Y)
     - MCAR: Delete 23 Y values at random
     - MAR: Keep Y only where X > 140 (follow-up)
     - MNAR: Record Y only where Y > 140 (test everybody again, but keep only the values of critical participants)

  5. Example: Blood Pressure (figure: black points are missing, MAR)

  6. Unconditional Mean
     + Mean of Y ok
     - Variance of Y wrong

  7. Unconditional Distribution
     + Mean of Y ok, variance better
     - Correlation between X and Y wrong

  8. Conditional Mean: Y = 84 + 0.3*X
     + Conditional mean of Y ok
     + Correlation ok
     - (Conditional) variance wrong

  9. Conditional Distribution: Y = 84 + 0.3*X + e, e ~ N(0, 23^2)
     + Conditional mean of Y ok
     + Correlation ok
     + Conditional variance of Y ok
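
The four single-imputation strategies on slides 6 to 9 can be compared in a few lines of code. The following is a minimal sketch, not the lecture's own code: the data are synthetic (generated from the slide's fitted line Y = 84 + 0.3*X + e purely for illustration), the helper names `fit_line` and `impute` are hypothetical, and "unconditional distribution" is implemented here as a random draw from the observed Y values, which is one possible reading of the slide.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual std. dev.)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma = (rss / (len(xs) - 2)) ** 0.5
    return a, b, sigma


def impute(xs, ys, method, rng):
    """Return a copy of ys in which every None is filled in by `method`."""
    obs_x = [x for x, y in zip(xs, ys) if y is not None]
    obs_y = [y for y in ys if y is not None]
    a, b, sigma = fit_line(obs_x, obs_y)
    filled = []
    for x, y in zip(xs, ys):
        if y is not None:
            filled.append(y)                       # observed: keep as is
        elif method == "unconditional_mean":
            filled.append(statistics.fmean(obs_y))
        elif method == "unconditional_distribution":
            filled.append(rng.choice(obs_y))       # assumption: draw from observed Y
        elif method == "conditional_mean":
            filled.append(a + b * x)               # point on the regression line
        elif method == "conditional_distribution":
            filled.append(a + b * x + rng.gauss(0.0, sigma))  # line + residual noise
        else:
            raise ValueError(f"unknown method: {method}")
    return filled


# Synthetic illustration data (NOT the lecture's data set).
rng = random.Random(1)
xs = [100 + 2 * i for i in range(30)]
ys_full = [84 + 0.3 * x + rng.gauss(0, 23) for x in xs]
# MAR-style missingness as on slide 4: keep Y only where X > 140.
ys = [y if x > 140 else None for x, y in zip(xs, ys_full)]

for method in ("unconditional_mean", "unconditional_distribution",
               "conditional_mean", "conditional_distribution"):
    filled = impute(xs, ys, method, rng)
    print(method, round(statistics.fmean(filled), 1),
          round(statistics.variance(filled), 1))
```

Printing the mean and variance of the completed Y column for each method makes the "+" and "-" entries of the slides visible, for example the shrunken variance under unconditional mean imputation.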

  10. Conditional Distribution: Y = 84 + 0.3*X + e, e ~ N(0, 23^2)
      Problem: We ignore the uncertainty of the estimated coefficients
      - 95%-CI for the intercept: [-234; 402]
      - 95%-CI for the slope: [-1.7; 2.4]

  11. Problem of Single Imputation
      - Too optimistic: the imputation model (e.g. Y = a + bX) is only estimated; it is not the true model
      - Thus, imputed values carry some uncertainty
      - Single Imputation ignores this uncertainty
      - Coverage probability of confidence intervals is wrong
      - Solution: Multiple Imputation, which incorporates both
        - residual error
        - model uncertainty (excluding model mis-specification)

  12. Multiple Imputation: Idea
      - Impute several times
      - Do standard analysis for each imputed data set; get estimate and std. error
      - Aggregate results

  13. Multiple Imputation: Idea
      - Need special imputation schemes that include both
        - uncertainty of residuals
        - uncertainty of model (e.g. values of intercept a and slope b)
      - Rough idea:
        - Fill in random values
        - Iteratively predict values for each variable until some convergence is reached (as in missForest)
        - Sample values for residuals AND for (a, b)
      - Gibbs sampler is used
      - Excellent for intuition (by one of the big guys in the field): http://sites.stat.psu.edu/~jls/mifaq.html

  14. Multiple Imputation: Intuition
      Predict missing values accounting for
      - Uncertainty of residuals
      - Uncertainty of parameter estimates
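
To make the two sources of uncertainty concrete, here is a minimal sketch that creates several imputed data sets for a single incomplete variable. It deliberately swaps in a simpler technique than the Gibbs sampler used by mice: parameter uncertainty is approximated by refitting the regression on a bootstrap resample of the observed cases, and residual uncertainty by adding Gaussian noise. The data are synthetic and the names `fit_line` and `impute_once` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual std. dev.)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma = (rss / (len(xs) - 2)) ** 0.5
    return a, b, sigma


def impute_once(xs, ys, rng):
    """One imputed copy of ys (None = missing), with freshly drawn (a, b)."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    # Uncertainty of parameter estimates: refit on a bootstrap resample
    # (a stand-in for sampling (a, b) from their posterior).
    while True:
        boot = [rng.choice(obs) for _ in obs]
        if len({x for x, _ in boot}) >= 3:   # need enough distinct x to fit
            break
    a, b, sigma = fit_line([x for x, _ in boot], [y for _, y in boot])
    # Uncertainty of residuals: add noise around the sampled line.
    return [y if y is not None else a + b * x + rng.gauss(0.0, sigma)
            for x, y in zip(xs, ys)]


# Synthetic illustration data (NOT the lecture's data set).
rng = random.Random(2012)
xs = [100 + 2 * i for i in range(30)]
ys_full = [84 + 0.3 * x + rng.gauss(0, 23) for x in xs]
ys = [y if x > 140 else None for x, y in zip(xs, ys_full)]

m = 5
imputations = [impute_once(xs, ys, rng) for _ in range(m)]
for k, filled in enumerate(imputations, start=1):
    print(f"imputation {k}: mean of Y = {statistics.fmean(filled):.1f}")
```

Each imputed data set uses a different line and different residuals, so the spread of the printed means reflects both kinds of uncertainty.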

  20. Multiple Imputation: Gibbs sampler (Not for exam)
      Iteration t; repeat until convergence. For each variable i:
      - $\theta_i^{*(t)} \sim P(\theta_i \mid Y_i^{obs}, Y_{-i}^{(t)})$ (intuition: sample (a, b))
      - $Y_i^{*(t)} \sim P(Y_i \mid Y_i^{obs}, Y_{-i}^{(t)}, \theta_i^{*(t)})$ (intuition: predict missings using y = a + bx + e)
      where $Y_j^{(t)} = (Y_j^{obs}, Y_j^{*(t)})$

  21. R package: MICE (Multiple Imputation with Chained Equations)
      - MICE has good default settings; don’t worry about the data type
      - Defaults for data types of columns:
        - numeric: Predictive Mean Matching (pmm) (like fancy linear regression; faster alternative: linear regression)
        - factor, 2 levels: Logistic Regression (logreg)
        - factor, >2 levels: Multinomial logit model (polyreg)
        - ordered, >2 levels: Ordered logit model (polr)

  22. Aggregation of estimates
      - $\hat{Q}_i$: Estimate of imputation $i$
      - $U_i$: Variance of estimate $\hat{Q}_i$ (= square of std. error)
      - Assume: $(\hat{Q} - Q) / \sqrt{U} \approx N(0, 1)$
      - Average estimate: $\bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}_j$
      - Within-imputation variance: $\bar{U} = \frac{1}{m} \sum_{j=1}^{m} U_j$
      - Between-imputation variance: $B = \frac{1}{m-1} \sum_{j=1}^{m} (\hat{Q}_j - \bar{Q})^2$
      - Total variance: $T = \bar{U} + \left(1 + \frac{1}{m}\right) B$
      - Approximately: $(\bar{Q} - Q) / \sqrt{T} \sim t_\nu$ with $\nu = (m-1) \left(1 + \frac{m \bar{U}}{(1+m) B}\right)^2$
      - 95%-CI: $\bar{Q} \pm t_{\nu; 0.975} \sqrt{T}$
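
The aggregation rules above translate directly into code. This is a minimal sketch with a hypothetical function name (`pool_estimates`, not the `pool` function of mice) and made-up input numbers. Because the Python standard library has no t-distribution quantile, the quantile $t_{\nu; 0.975}$ is left as an argument that the caller has to look up.

```python
import math


def pool_estimates(estimates, variances):
    """Combine m per-imputation estimates Q_j and variances U_j."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                   # average estimate
    u_bar = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)       # between-imputation variance
    total = u_bar + (1 + 1 / m) * b                              # total variance T
    nu = (m - 1) * (1 + m * u_bar / ((1 + m) * b)) ** 2          # degrees of freedom (needs B > 0)
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "t": total, "nu": nu}


def confidence_interval(pooled, t_quantile):
    """95%-CI given the 0.975 quantile of the t distribution with nu d.f."""
    half_width = t_quantile * math.sqrt(pooled["t"])
    return pooled["q_bar"] - half_width, pooled["q_bar"] + half_width


# Hypothetical slope estimates and squared standard errors from m = 5 imputations.
estimates = [0.30, 0.42, 0.25, 0.38, 0.35]
variances = [0.040, 0.050, 0.045, 0.040, 0.050]
pooled = pool_estimates(estimates, variances)
print(pooled)
# For large nu the t quantile is close to the normal quantile 1.96.
print(confidence_interval(pooled, t_quantile=1.96))
```

Note that the total variance is larger than the average within-imputation variance; the difference is exactly the extra uncertainty that single imputation ignores.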

  23. Multiple Imputation with MICE
      - Do manually, if you have a non-standard analysis

  24. How much uncertainty due to missings?
      - Relative increase in variance due to nonresponse: $r = \frac{(1 + \frac{1}{m}) B}{\bar{U}}$
      - Fraction (or rate) of missing information fmi (!! not the same as the fraction of missing OBSERVATIONS): $\mathrm{fmi} = \frac{r + \frac{2}{\nu + 3}}{r + 1}$
      - Proportion of the total variance that is attributed to the missing data: $\lambda = \frac{(1 + \frac{1}{m}) B}{T}$
      - Returned by mice
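
The three diagnostics can be computed from the number of imputations m, the within-imputation variance and the between-imputation variance. The sketch below uses a hypothetical helper name (`missing_information`) and made-up inputs; it follows the formulas on this slide and is not claimed to reproduce the exact numbers printed by mice.

```python
def missing_information(m, u_bar, b):
    """Return (r, fmi, lam) from m, within variance u_bar and between variance b."""
    extra = (1 + 1 / m) * b            # variance added by the missing data
    total = u_bar + extra              # total variance T
    r = extra / u_bar                  # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2    # degrees of freedom, same as (m-1)(1 + m*u_bar/((1+m)*b))^2
    fmi = (r + 2 / (nu + 3)) / (r + 1) # fraction of missing information
    lam = extra / total                # share of total variance due to missingness
    return r, fmi, lam


# Hypothetical inputs: m = 5 imputations, within variance 0.045, between variance 0.00445.
r, fmi, lam = missing_information(m=5, u_bar=0.045, b=0.00445)
print(f"r = {r:.3f}, fmi = {fmi:.3f}, lambda = {lam:.3f}")
```

Since T equals the within variance plus the extra term, lambda equals r / (1 + r), and fmi is lambda plus a small correction for the finite number of imputations.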

  25. Rule of thumb: How many imputations?
      - Surprisingly few!
        - Preliminary analysis: m = 5
        - Paper: m = 20 or even m = 50
      - Efficiency compared to m = ∞ depends on fmi: $\mathrm{eff} = \left(1 + \frac{\mathrm{fmi}}{m}\right)^{-1}$
      - Examples (eff in %):

           m   fmi=0.1  fmi=0.3  fmi=0.5  fmi=0.7  fmi=0.9
           3      97       91       86       81       77
           5      98       94       91       88       85
          10      99       97       95       93       92
          20     100       99       98       97       96

      - Annotations on the table: "Oftentimes OK", "Perfect!"
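
The efficiency table follows directly from the formula eff = (1 + fmi/m)^(-1). As a quick check, this short sketch (the function name `efficiency` is hypothetical) recomputes the table in percent.

```python
def efficiency(m, fmi):
    """Efficiency of m imputations relative to infinitely many imputations."""
    return 1.0 / (1.0 + fmi / m)


fmis = [0.1, 0.3, 0.5, 0.7, 0.9]
table = {m: [round(100 * efficiency(m, f)) for f in fmis] for m in (3, 5, 10, 20)}

print("  m  " + "  ".join(f"fmi={f}" for f in fmis))
for m, row in table.items():
    print(f"{m:3d}  " + "  ".join(f"{v:7d}" for v in row))
```

Even with 90% missing information, m = 20 imputations lose only a few percent of efficiency, which is why small values of m are often sufficient.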

  26. Concepts to know
      - Idea of mice
      - How to aggregate results from imputed data sets?
      - How many imputations?

  27. R functions to know
      - mice, with, pool

  28. Next time
      - Multidimensional Scaling
      - Distance metrics
