
Dealing With Missing Data - PowerPoint PPT Presentation



  1. Dealing With Missing Data

  2. Possible Future Topics ● Novice user topics: ● Using R ● Contrast coding & hypothesis testing ● Random slopes & model comparison ● Logit & probit models ● Advanced topics: ● Growth curve modeling of eye fixations ● Overfitting ● Path analysis ● Principal components analysis

  3. Dealing With Missing Data

  4. Outline ● Ways Data May Be Missing ● Solutions ● Deletion ● Imputation ● In MLM framework ● Means & Centering ● Special Case: Incomplete Designs

  5. Big Issue: WHY data is missing ● The fact that data is missing may itself be data! ● Missingness of data may not be arbitrary ● Affects what conclusions we can draw from the data we do have

  6. Big Issue: WHY data is missing ● Missing Completely at Random: Missingness unrelated to variables in experiment ● Computer crashes, snow day, etc. ● Missing at Random: Missingness may be related to a predictor, but not to the outcome measure once predictors are controlled for ● Production experiments: More unusable responses in some conditions ● Missing Not at Random: Missingness related to outcome measure even after controlling for predictors ● RT w/ a cutoff ● People w/ low memory don't return for test
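The "RT w/ a cutoff" case can be made concrete with a small sketch. The reaction times and the 1000 ms cutoff below are made-up values chosen only for illustration; the point is that when the outcome itself decides which trials go missing, the recorded trials no longer represent the full set.

```python
from statistics import mean

# Hypothetical reaction times in ms (made-up numbers for illustration).
true_rts = [450, 520, 610, 700, 880, 1150, 1400]

CUTOFF_MS = 1000  # assumed response deadline; slower trials are never recorded

# Missingness depends on the outcome itself: only fast responses survive.
observed_rts = [rt for rt in true_rts if rt <= CUTOFF_MS]

true_mean = mean(true_rts)
observed_mean = mean(observed_rts)

print(f"mean of all trials:      {true_mean:.1f}")
print(f"mean of recorded trials: {observed_mean:.1f}")
```

The recorded mean is lower than the mean of all trials, and no predictor in the data set can account for the difference.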

  7. Outline ● Ways Data May Be Missing ● Solutions ● Deletion ● Imputation ● In MLM framework ● Means & Centering ● Special Case: Incomplete Designs

  8. Solutions: Deletion Methods ● Listwise deletion ● Drop any case that has a missing value on any variable ● Default in R … and in most software ● Properties ● Only OK if missingness is completely random – otherwise, you are analyzing a selective subgroup ● Potentially losing a lot of data!

  9. Solutions: Deletion Methods ● Listwise deletion ● Pairwise deletion ● Drop cases separately for computing each effect ● Properties ● Less data loss ● Results not completely consistent / comparable ● Again, missingness needs to be completely random
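A minimal sketch of the difference between the two deletion methods, using a hypothetical data set in which `None` marks a missing value. The variable names (`rt`, `wm`, `vocab`) and the helper functions `listwise` and `pairwise` are illustrative assumptions, not part of any particular package.

```python
# Hypothetical cases; None marks a missing value (illustrative numbers only).
cases = [
    {"rt": 600, "wm": 4, "vocab": 30},
    {"rt": 650, "wm": None, "vocab": 28},
    {"rt": None, "wm": 5, "vocab": 35},
    {"rt": 700, "wm": 3, "vocab": None},
    {"rt": 620, "wm": 6, "vocab": 33},
]

def listwise(rows):
    """Keep only rows with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, x, y):
    """Keep rows that are complete on the two variables being analyzed."""
    return [r for r in rows if r[x] is not None and r[y] is not None]

complete = listwise(cases)
rt_wm = pairwise(cases, "rt", "wm")
rt_vocab = pairwise(cases, "rt", "vocab")

print(len(complete), len(rt_wm), len(rt_vocab))  # 2 3 3
```

Pairwise deletion keeps more cases per analysis, but the RT-by-WM and RT-by-vocab analyses are now based on different subsets of participants, which is why the results are not fully comparable.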

  10. Solutions: Imputation Methods ● Mean Imputation ● Replace missing values with the variable's mean ● Underestimates variance – Thus, increases chance of detecting spurious effects
      Observed data: 5, 8, 3, ?, ? (M = 5.33, σ² = 6.33)
      After mean imputation: 5, 8, 3, 5.33, 5.33 (M = 5.33, σ² = 3.17)
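The shrinkage in variance can be checked directly. This sketch uses the three observed values from the slide and the standard library's sample variance (n − 1 denominator); it is a quick arithmetic check, not a recommended imputation routine.

```python
from statistics import mean, variance

observed = [5, 8, 3]   # the three observed values from the slide
n_missing = 2          # two values are missing

m = mean(observed)                      # about 5.33
imputed = observed + [m] * n_missing    # fill every gap with the mean

print(round(variance(observed), 2))     # 6.33
print(round(variance(imputed), 2))      # 3.17
```

The mean is unchanged, but the imputed values add cases without adding any spread, so the sample variance is halved here.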

  11. Solutions: Imputation Methods ● Mean Imputation ● Conditional Imputation ● If Y missing, impute value predicted by regression based on other cases ● Possibly with some amount of error (to preserve variance) ● OK in a lot of missing-at-random cases ● Example: RT = WM + Vocab; a missing RT (?) is replaced by the RT predicted from that case's WM and Vocab
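A minimal sketch of conditional (regression) imputation. To keep it short it assumes a single predictor (working memory) rather than the two on the slide, uses made-up complete cases, and leaves out the optional error term; `fit_line` is a hypothetical helper written for this example.

```python
from statistics import mean

# Hypothetical complete cases: (working-memory score, RT in ms).
complete = [(2, 900), (3, 850), (4, 800), (5, 750), (6, 700)]
wm_of_missing_case = 4.5   # hypothetical case whose RT is missing

def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(complete)

# Impute the missing RT with the value the regression predicts for this case.
imputed_rt = intercept + slope * wm_of_missing_case
print(imputed_rt)  # 775.0
```

To preserve variance, one could add a random draw from the residual distribution to `imputed_rt`; in this toy data set the fit is perfect, so there is no residual spread to draw from.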

  12. Solutions: Imputation Methods ● Mean Imputation ● Conditional Imputation ● Multiple Imputation ● Impute multiple possible values & fit model to each ● Final result averages over these ● Can see how much that one value affects results ● Solution preferred by Schafer & Graham (2002) ● Software available for this, at least for standard regression
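The impute-fit-average loop can be sketched as follows. This is a deliberately simplified illustration: the "model" is just the sample mean, the imputation model is an assumed normal distribution matched to the observed values, and the number of imputations `M` is an arbitrary choice. A full multiple-imputation analysis would also pool the within-imputation uncertainty, which is omitted here.

```python
import random
from statistics import mean, variance

random.seed(1)  # fixed seed so the sketch is repeatable

observed = [5, 8, 3]   # observed values from the mean-imputation slide
n_missing = 2
M = 20                 # number of imputed data sets (arbitrary choice)

obs_mean = mean(observed)
obs_sd = variance(observed) ** 0.5

estimates = []
for _ in range(M):
    # Draw each missing value from a normal distribution matched to the
    # observed data (an assumed, deliberately simple imputation model).
    draws = [random.gauss(obs_mean, obs_sd) for _ in range(n_missing)]
    completed = observed + draws
    # "Fit the model" to this completed data set: here, just estimate the mean.
    estimates.append(mean(completed))

pooled = mean(estimates)        # final result averages over the imputations
between = variance(estimates)   # how much the imputed values move the result

print(round(pooled, 2), round(between, 2))
```

The between-imputation variance is what lets you see how much the result depends on the values that were filled in.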

  13. What if Missing NOT at Random? ● Cases where the DV determines missingness ● Solutions ● In many cases, “only a minor impact” on results (Schafer & Graham, 2002, p. 152) ● Can also try: – Model the missingness in some way ● e.g. missingness and observed DV are both indicators of a latent variable – Grouping participants by missingness – See Schafer and Graham (2002) for more details

  14. Outline ● Ways Data May Be Missing ● Solutions ● Deletion ● Imputation ● In MLM framework ● Means & Centering ● Special Case: Incomplete Designs

  15. In the MLM Context ● Simulations by Quené & van den Bergh (2004) of casewise deletion ● Robust even with lots of data missing (25%) ● But this would require missingness to be completely at random ● “This robustness is only if data are missing in a random fashion. If observations were predominantly missing for certain participants and/or under certain treatments, then the full and reduced data sets would not have yielded similar estimates” (p. 116).

  16. Outline ● Ways Data May Be Missing ● Solutions ● Deletion ● Imputation ● In MLM framework ● Means & Centering ● Special Case: Incomplete Designs

  17. Means & Unbalanced Data ● When unbalanced, mean of all observations may not be the same as mean of means
      Primed: 600, 600, 700, 700, 700 (mean 660)
      Unprimed: 900, 900, 1000 (mean 933.33)
      Mean of means: 796.67; mean of all observations: 762.5
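The two kinds of mean can be computed from the slide's own numbers. This is a small arithmetic check; the dictionary layout is just one convenient way to hold the data.

```python
from statistics import mean

rts = {
    "Primed": [600, 600, 700, 700, 700],
    "Unprimed": [900, 900, 1000],
}

# Mean within each condition.
cell_means = {cond: mean(vals) for cond, vals in rts.items()}

# Mean of means: every condition counts equally.
mean_of_means = mean(list(cell_means.values()))

# Mean of all observations: every trial counts equally,
# so the larger Primed cell pulls the value down.
grand_mean = mean([v for vals in rts.values() for v in vals])

print(round(mean_of_means, 2))  # 796.67
print(grand_mean)               # 762.5
```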

  18. Centering ● If missingness is completely at random, assumption is that “mean of means” is what you're interested in ● “Controlling for the missingness”

  19. Centering ● Mean centering / reweighting of fixed effects ● Suppose I code Primed as 1 and Unprimed as -1
      Primed: 1, 1, 1, 1, 1; Unprimed: -1, -1, -1 (MEAN: 0.25) ● Overall mean (intercept) more influenced by Primed condition
      ● Reweight: less numerous level gets stronger weight
      Primed: .375, .375, .375, .375, .375; Unprimed: -.625, -.625, -.625 (MEAN: 0.00) ● Overall mean (intercept) equally influenced by each condition
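The code means for 5 Primed and 3 Unprimed observations can be verified with a few lines. One observation, offered as a reading of the numbers rather than something the slide states: the reweighted values .375 and -.625 are exactly what you get by subtracting the mean from a 0/1 dummy code.

```python
from statistics import mean

n_primed, n_unprimed = 5, 3

# Effect coding from the slide: Primed = 1, Unprimed = -1.
effect_codes = [1] * n_primed + [-1] * n_unprimed
print(mean(effect_codes))      # 0.25: not zero, so the intercept leans toward Primed

# Reweighted codes: a 0/1 dummy code minus its own mean (5/8 = 0.625).
dummy = [1] * n_primed + [0] * n_unprimed
dummy_mean = mean(dummy)
centered = [d - dummy_mean for d in dummy]

print(sorted(set(centered)))   # [-0.625, 0.375]
print(mean(centered))          # 0.0
```

Because the centered predictor averages to zero over the observations actually present, the intercept is no longer pulled toward the condition with more data.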

  20. Centering ● If missingness is completely at random, assumption is that “mean of means” is what you're interested in ● “Controlling for the missingness” ● Can get this by centering fixed effects ● lmer does this automatically with random effects (subjects & items)

  21. Centering ● What does centering affect? ● Value & interpretation of intercept ● Main effect estimates & tests if there is an interaction ● What is unaffected? ● Interactions ● Main effects if there is no interaction

  22. Outline ● Ways Data May Be Missing ● Solutions ● Deletion ● Imputation ● In MLM framework ● Means & Centering ● Special Case: Incomplete Designs

  23. Incomplete Designs ● Designs where an entire cell is missing
      [Figure: design grid crossing Age (Young Adults, Older Adults) × Presentation (Fast, Slow) × Stimulus (Words, Faces), with one cell missing]

  24. Incomplete Designs ● Designs where an entire cell is missing ● Not possible to include all interactions in the model ● We don't know the 2-way interaction effect for older adults … so can't look at the 3-way interaction involving age ● Can still look at some lower-order effects (e.g. Age x Speed) if you assume no 3-way interaction – Would be inappropriate if there is an interaction since we're missing part of the picture!
      [Figure: first group of cells: Fast/Faces, Fast/Words, Slow/Words, Slow/Faces; second group of cells: Fast/Faces, Slow/Words, Slow/Faces (Fast/Words cell missing)]

  25. Incomplete Designs ● Designs where an entire cell is missing ● lmer error message: ● Error in mer_finalize(ans) : Downdated X'X is not positive definite. ● Dependencies in data → matrix is not full rank → not invertible ● The good news: If cell missing by design, clearly predicted only by IVs and unrelated to the DV ● Thus, missing at random
      [Figure: remaining cells: Fast/Faces, Slow/Words, Slow/Faces]
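The rank problem can be illustrated without fitting any model. The sketch below builds a fixed-effects design matrix for a 2 × 2 × 2 design with one cell removed and computes its rank by exact Gaussian elimination. The ±1 coding, the choice of which cell is missing, and the helpers `design_row` and `rank` are assumptions made for this example; it shows the linear-algebra issue only and does not reproduce what lmer does internally.

```python
from fractions import Fraction
from itertools import product

# Cells of the 2x2x2 design, effect-coded as +1 / -1:
# age (young=-1, older=+1), speed (fast=-1, slow=+1), stimulus (words=-1, faces=+1).
all_cells = list(product([-1, 1], repeat=3))
missing = (1, -1, -1)   # assumed missing cell: older adults, fast, words
cells = [c for c in all_cells if c != missing]

def design_row(a, s, t):
    """Intercept, 3 main effects, 3 two-way and 1 three-way interaction."""
    return [1, a, s, t, a * s, a * t, s * t, a * s * t]

def rank(matrix):
    """Matrix rank by Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in r] for r in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

X_full = [design_row(*c) for c in cells]
X_reduced = [row[:-1] for row in X_full]   # drop the three-way interaction column

print(rank(X_full), "of", len(X_full[0]), "columns")        # 7 of 8 columns
print(rank(X_reduced), "of", len(X_reduced[0]), "columns")  # 7 of 7 columns
```

With seven cells there are only seven independent pieces of information, so the eight-parameter full factorial model cannot be estimated; dropping the three-way interaction restores full rank, at the cost of assuming that interaction is zero.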

  26. Outline ● Ways Data May Be Missing ● Solutions ● Deletion ● Imputation ● In MLM framework ● Means & Centering ● Special Case: Incomplete Designs

  27. ● Encyclopedia Brown confronted local troublemaker “Bugs” Meany about the missing data. Bugs says he distinctly remembers storing the missing sheet of data between pages 151 and 152 of his lab notebook. Bugs says that the sheet must have just fallen out when Bugs's gang, the Tigers, were cleaning their clubhouse. ● How did Encyclopedia know Bugs was lying? ● Pages 151 and 152 are the front and back of the same sheet.
