Dealing With Missing Data - PowerPoint PPT Presentation

SLIDE 1

Dealing With Missing Data

slide-2
SLIDE 2

Possible Future Topics

  • Novice user topics:
    • Using R
    • Contrast coding & hypothesis testing
    • Random slopes & model comparison
    • Logit & probit models
  • Advanced topics:
    • Growth curve modeling of eye fixations
    • Overfitting
    • Path analysis
    • Principal components analysis

SLIDE 3

Dealing With Missing Data

SLIDE 4

Outline

  • Ways Data May Be Missing
  • Solutions
    • Deletion
    • Imputation
    • In MLM framework
  • Means & Centering
  • Special Case: Incomplete Designs
SLIDE 5

Big Issue: WHY data is missing

  • The fact that data is missing may itself be data!
  • Missingness of data may not be arbitrary
  • Affects what conclusions we can draw from the data we do have

SLIDE 6

Big Issue: WHY data is missing

  • Missing Completely at Random (MCAR): missingness unrelated to any variable in the experiment
    • Computer crashes, snow day, etc.
  • Missing at Random (MAR): missingness may be related to a predictor, but not to the outcome once the predictors are controlled for
    • Production experiments: more unusable responses in some conditions
  • Missing Not at Random (MNAR): missingness related to the outcome measure even after controlling for predictors
    • RT with a cutoff
    • People with low memory don't return for the test
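To make the three mechanisms concrete, here is a small Python simulation. It is a sketch that is not part of the original talk: the RT distributions, the loss rates, and the 750 ms cutoff are all invented for illustration. Each mechanism marks some RTs as missing, and the observed mean is compared with the full-data mean, both overall and within one condition (i.e., controlling for the predictor).

```python
import random
import statistics

# Hypothetical data: RTs (ms) in two conditions; condition B is ~100 ms slower.
random.seed(1)
N = 2000
cond = [random.choice("AB") for _ in range(N)]
rt = [random.gauss(600 if c == "A" else 700, 100) for c in cond]

# One list of flags per mechanism: True = RT observed, False = RT missing.
mechanisms = {
    # MCAR: every RT has the same 20% chance of being lost (e.g., a crash).
    "MCAR": [random.random() > 0.2 for _ in range(N)],
    # MAR: loss depends only on the predictor (more unusable trials in B).
    "MAR": [random.random() > (0.4 if c == "B" else 0.1) for c in cond],
    # MNAR: loss depends on the outcome itself (RTs above a 750 ms cutoff).
    "MNAR": [x <= 750 for x in rt],
}


def mean_where(values, flags):
    """Mean of the values whose flag is True."""
    return statistics.mean(v for v, f in zip(values, flags) if f)


full_mean = statistics.mean(rt)
observed = {k: mean_where(rt, flags) for k, flags in mechanisms.items()}

# Same comparison within condition B only (controlling for the predictor).
is_b = [c == "B" for c in cond]
full_b = mean_where(rt, is_b)
observed_b = {
    k: mean_where(rt, [f and b for f, b in zip(flags, is_b)])
    for k, flags in mechanisms.items()
}

for k in mechanisms:
    print(f"{k}: overall {observed[k]:.0f} vs {full_mean:.0f}, "
          f"condition B {observed_b[k]:.0f} vs {full_b:.0f}")
```

Under MCAR the observed mean stays close to the full mean. Under MAR the overall mean shifts (condition B loses more trials), but the within-condition mean is fine. Under MNAR even the within-condition mean is biased downward, because the slowest RTs are exactly the ones that go missing.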
SLIDE 7

Outline

  • Ways Data May Be Missing
  • Solutions
    • Deletion
    • Imputation
    • In MLM framework
  • Means & Centering
  • Special Case: Incomplete Designs
SLIDE 8

Solutions: Deletion Methods

  • Listwise deletion
    • Drop cases with any missing variables
    • Default in R … and in most software
    • Properties
      • Only OK if missingness is completely random; otherwise, we're looking at a selective group
      • Potentially losing a lot of data!
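A minimal Python sketch of listwise deletion on hypothetical data (the subjects, variables, and values are invented for illustration). A row survives only if every one of its variables is observed:

```python
# Hypothetical records: (subject, WM, Vocab, RT); None marks a missing value.
rows = [
    ("s1", 3.1, 42, 610),
    ("s2", None, 38, 655),
    ("s3", 2.7, None, 700),
    ("s4", 3.5, 45, None),
    ("s5", 2.9, 40, 640),
]

# Listwise deletion: keep a row only if none of its values is missing.
complete = [row for row in rows if all(v is not None for v in row)]

print(f"kept {len(complete)} of {len(rows)} rows")  # kept 2 of 5 rows
print([row[0] for row in complete])                 # ['s1', 's5']
```

Each of s2, s3, and s4 is missing just one value, yet all three rows are dropped, which is how listwise deletion can lose a lot of data.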
SLIDE 9

Solutions: Deletion Methods

  • Listwise deletion
  • Pairwise deletion
    • Drop cases separately for computing each effect
    • Properties
      • Less data loss
      • Results not completely consistent / comparable
      • Again, missingness needs to be completely random
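A sketch of pairwise deletion in Python, using correlations as the "effects" and invented data. Each correlation uses every row where both of its variables are observed, so different estimates rest on different subsets of rows. That is the source of both the smaller data loss and the inconsistency listed above.

```python
import itertools
import math

# Hypothetical data; None marks a missing value.
data = {
    "WM":    [3.1, None, 2.7, 3.5, 2.9, 3.3],
    "Vocab": [42, 38, None, 45, 40, 44],
    "RT":    [610, 655, 700, None, 640, 600],
}
n_rows = 6


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Pairwise deletion: each correlation uses the rows where both variables exist.
results = {}
for a, b in itertools.combinations(data, 2):
    used = [i for i in range(n_rows) if data[a][i] is not None and data[b][i] is not None]
    r = pearson([data[a][i] for i in used], [data[b][i] for i in used])
    results[(a, b)] = (r, used)

# Listwise deletion, for comparison: rows with no missing value anywhere.
listwise = [i for i in range(n_rows) if all(col[i] is not None for col in data.values())]

for (a, b), (r, used) in results.items():
    print(f"{a}-{b}: r = {r:.2f}, rows {used}")
print("listwise rows:", listwise)
```

Every pairwise correlation uses four rows, versus three under listwise deletion, but no two correlations use the same four rows.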
SLIDE 10

Solutions: Imputation Methods

  • Mean Imputation
    • Replace missing values with the variable's mean
    • Underestimates variance
      – Thus, increases the chance of detecting spurious effects

Observed:  5, 8, 3, ?, ?          M = 5.33   σ² = 6.33
Imputed:   5, 8, 3, 5.33, 5.33    M = 5.33   σ² = 3.17
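The arithmetic behind this example can be checked with Python's `statistics` module (sample variance, n - 1 denominator):

```python
import statistics

observed = [5, 8, 3]   # the three values we actually have
n_missing = 2          # two values are missing ("?")

# Mean imputation: replace each missing value with the observed mean.
m = statistics.mean(observed)
imputed = observed + [m] * n_missing

print(round(statistics.mean(observed), 2), round(statistics.variance(observed), 2))  # 5.33 6.33
print(round(statistics.mean(imputed), 2), round(statistics.variance(imputed), 2))    # 5.33 3.17
```

The imputed values sit exactly at the mean, so they add nothing to the sum of squared deviations (38/3 in both cases) while raising the degrees of freedom from 2 to 4. The variance is therefore halved even though no new information was added.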

SLIDE 11

Solutions: Imputation Methods

  • Mean Imputation
  • Conditional Imputation
    • If Y is missing, impute the value predicted by a regression based on the other cases
    • Possibly with some amount of error (to preserve variance)
    • OK in a lot of missing-at-random cases

[Figure: RT predicted from WM and Vocab (RT = WM + Vocab); a missing RT (?) is replaced by its predicted RT]
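A minimal sketch of conditional (regression) imputation in Python. For brevity it assumes a single predictor (WM) rather than WM and Vocab together, and all data values are hypothetical. The line is fit on the complete cases only; the missing RT is then filled either with its predicted value, or with the predicted value plus random residual noise (the "some amount of error" variant).

```python
import math
import random
import statistics

# Hypothetical data: one predictor (WM) and an outcome (RT); None = missing RT.
wm = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 3.2]
rt = [720, 690, 660, 640, 600, 580, None]

# Fit RT = a + b * WM by ordinary least squares on the complete cases only.
pairs = [(x, y) for x, y in zip(wm, rt) if y is not None]
mx = statistics.mean(x for x, _ in pairs)
my = statistics.mean(y for _, y in pairs)
b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
a = my - b * mx

# Residual SD of the fitted line (n - 2 degrees of freedom).
sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in pairs) / (len(pairs) - 2))

# Deterministic conditional imputation: fill in the predicted value.
predicted = [y if y is not None else a + b * x for x, y in zip(wm, rt)]

# Stochastic version: add a random residual so imputed values keep some variance.
rng = random.Random(0)
stochastic = [y if y is not None else a + b * x + rng.gauss(0, sd) for x, y in zip(wm, rt)]

print(round(predicted[-1], 1), round(stochastic[-1], 1))
```

The deterministic version puts every imputed point exactly on the regression line, so it still understates variability, though less than mean imputation. Adding a residual draw restores that scatter.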

SLIDE 12

Solutions: Imputation Methods

  • Mean Imputation
  • Conditional Imputation
  • Multiple Imputation
    • Impute multiple possible values & fit the model to each
    • Final result averages over these
    • Can see how much the uncertainty in the imputed values affects the results
    • Solution preferred by Schafer & Graham (2002)
    • Software available for this, at least for standard regression
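A simplified sketch of multiple imputation in Python, with hypothetical data and a hypothetical "model" (just the mean RT). Each of M completed data sets is built by stochastic regression imputation, the estimate is computed on each, and the results are pooled with Rubin's rules: the pooled estimate is the average, and the total variance is the within-imputation variance plus (1 + 1/M) times the between-imputation variance. This sketch is "improper" in one respect: it ignores uncertainty in the regression coefficients themselves, whereas full multiple imputation also redraws them for each data set. Dedicated software (for example, the R package mice) handles that properly.

```python
import random
import statistics

# Hypothetical data: WM and RT, with two RTs missing (None).
wm = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 3.2, 2.2]
rt = [720, 690, 660, 640, 600, 580, None, None]


def fit_line(xs, ys):
    """OLS intercept, slope, and residual SD for y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sd = (sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 2)) ** 0.5
    return a, b, sd


obs = [(x, y) for x, y in zip(wm, rt) if y is not None]
a, b, sd = fit_line([x for x, _ in obs], [y for _, y in obs])

M = 20  # number of imputed data sets
rng = random.Random(42)
estimates, variances = [], []
for _ in range(M):
    # Fill each missing RT with its predicted value plus random residual noise.
    filled = [y if y is not None else a + b * x + rng.gauss(0, sd) for x, y in zip(wm, rt)]
    # "Fit the model" to this completed data set: here, just the mean RT.
    estimates.append(statistics.mean(filled))
    variances.append(statistics.variance(filled) / len(filled))  # squared SE of the mean

# Rubin's rules: average the estimates; combine within- and between-imputation variance.
pooled = statistics.mean(estimates)
within = statistics.mean(variances)
between = statistics.variance(estimates)
total_var = within + (1 + 1 / M) * between
print(f"pooled mean RT = {pooled:.1f}, SE = {total_var ** 0.5:.1f}")
```

The between-imputation term is what lets us "see how much that one value affects results": if the imputed values barely change the estimate, `between` is small relative to `within`.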

SLIDE 13

What if Missing NOT at Random?

  • Cases where the DV determines missingness
  • Solutions
    • In many cases, “only a minor impact” on results (Schafer & Graham, 2002, p. 152)
    • Can also try:
      – Modeling the missingness in some way
        • e.g., missingness and the observed DV are both indicators of a latent variable
      – Grouping participants by missingness
    • See Schafer and Graham (2002) for more details

SLIDE 14

Outline

  • Ways Data May Be Missing
  • Solutions
    • Deletion
    • Imputation
    • In MLM framework
  • Means & Centering
  • Special Case: Incomplete Designs
SLIDE 15

In the MLM Context

  • Simulations by Quené & van den Bergh (2004) of casewise deletion
    • Robust even with lots of data missing (25%)
    • But this would require missingness to be completely at random
    • “This robustness is only if data are missing in a random fashion. If observations were predominantly missing for certain participants and/or under certain treatments, then the full and reduced data sets would not have yielded similar estimates” (p. 116).

SLIDE 16

Outline

  • Ways Data May Be Missing
  • Solutions
    • Deletion
    • Imputation
    • In MLM framework
  • Means & Centering
  • Special Case: Incomplete Designs
SLIDE 17

Means & Unbalanced Data

  • When data are unbalanced, the mean of all observations may not be the same as the mean of the condition means

                  Primed                       Unprimed
  Observations    600, 600, 700, 700, 700      900, 900, 1000
  Mean            660                          933.33

  Mean of means: 796.67
  Mean of all observations: 762.5
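The two summaries can be computed directly in Python with the observations from the Primed/Unprimed example. The grand mean weights every observation equally, so the more numerous Primed condition pulls it down; the mean of means weights every condition equally.

```python
import statistics

primed = [600, 600, 700, 700, 700]
unprimed = [900, 900, 1000]

mean_primed = statistics.mean(primed)        # 660
mean_unprimed = statistics.mean(unprimed)    # 933.33...

# Every observation weighted equally vs. every condition weighted equally.
grand_mean = statistics.mean(primed + unprimed)
mean_of_means = statistics.mean([mean_primed, mean_unprimed])

print(f"{grand_mean:.2f} vs {mean_of_means:.2f}")  # 762.50 vs 796.67
```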

SLIDE 18

Centering

  • If missingness is completely at random, the assumption is that the “mean of means” is what you're interested in
  • “Controlling for the missingness”
SLIDE 19

Centering

  • Mean centering / reweighting of fixed effects
    • Suppose I code Primed as 1 and Unprimed as -1
    • Reweight: the less numerous level gets a stronger weight

Uncentered codes:  Primed: 1, 1, 1, 1, 1   Unprimed: -1, -1, -1   MEAN: 0.25
  Overall mean (intercept) more influenced by Primed condition

Centered codes:  Primed: .75, .75, .75, .75, .75   Unprimed: -1.25, -1.25, -1.25   MEAN: 0.00
  Overall mean (intercept) equally influenced by each condition

SLIDE 20

Centering

  • If missingness is completely at random, the assumption is that the “mean of means” is what you're interested in
  • “Controlling for the missingness”
    • Can get this by centering fixed effects
    • lmer does this automatically with random effects (subjects & items)

SLIDE 21

Centering

  • What does centering affect?
    • Value & interpretation of the intercept
    • Main effect estimates & tests, if there is an interaction
  • What is unaffected?
    • Interactions
    • Main effects, if there is no interaction
SLIDE 22

Outline

  • Ways Data May Be Missing
  • Solutions
    • Deletion
    • Imputation
    • In MLM framework
  • Means & Centering
  • Special Case: Incomplete Designs
SLIDE 23

Incomplete Designs

  • Designs where an entire cell is missing

[Figure: design grid crossing Words vs. Faces with Fast vs. Slow presentation, shown separately for Young Adults and Older Adults]

SLIDE 24

Incomplete Designs

  • Designs where an entire cell is missing
  • Not possible to include all interactions in the model
    • We don't know the 2-way interaction effect for older adults … so we can't look at the 3-way interaction involving age
  • Can still look at some lower-order effects (e.g., Age x Speed) if you assume no 3-way interaction
    – Would be inappropriate if there is an interaction, since we're missing part of the picture!

[Figure: observed cells. Young Adults: Fast/Words, Fast/Faces, Slow/Words, Slow/Faces. Older Adults: Fast/Faces, Slow/Words, Slow/Faces (no Fast/Words cell)]

SLIDE 25

Incomplete Designs

  • Designs where an entire cell is missing
  • lmer error message:
    • Error in mer_finalize(ans) : Downdated X'X is not positive definite.
    • Dependencies in data → matrix is not full rank → not invertible
  • The good news: if a cell is missing by design, its missingness is predicted only by the IVs and is unrelated to the DV
    • Thus, missing at random

[Figure: Older Adults cells: Fast/Faces, Slow/Words, Slow/Faces]
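Why the error appears can be sketched by building the fixed-effects design matrix for the Age × Speed × Stimulus design and computing its rank. This Python sketch is an illustration under stated assumptions: the ±1 effect codes are hypothetical, and there is one row per observed cell. Repeated trials in a cell only add identical rows, which cannot raise the rank. With the Older/Fast/Words cell missing, the full model has 8 columns but only rank 7, so X'X is singular. Dropping the 3-way interaction, i.e., assuming it is zero, leaves 7 columns of full rank.

```python
from fractions import Fraction
from itertools import product

# Hypothetical +/-1 effect codes for the three factors.
CODE = {"Young": -1, "Older": 1, "Fast": -1, "Slow": 1, "Words": -1, "Faces": 1}

# All 2 x 2 x 2 cells except the one that is empty by design.
cells = [c for c in product(("Young", "Older"), ("Fast", "Slow"), ("Words", "Faces"))
         if c != ("Older", "Fast", "Words")]


def design_row(cell, three_way=True):
    """Intercept, main effects, 2-way and (optionally) 3-way interaction columns."""
    age, speed, stim = (CODE[level] for level in cell)
    row = [1, age, speed, stim, age * speed, age * stim, speed * stim]
    return row + [age * speed * stim] if three_way else row


def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


full = [design_row(c) for c in cells]
reduced = [design_row(c, three_way=False) for c in cells]
print(len(full[0]), rank(full))        # 8 7 -> X'X is singular
print(len(reduced[0]), rank(reduced))  # 7 7 -> all terms estimable
```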

SLIDE 26

Outline

  • Ways Data May Be Missing
  • Solutions
    • Deletion
    • Imputation
    • In MLM framework
  • Means & Centering
  • Special Case: Incomplete Designs
SLIDE 27
  • Encyclopedia Brown confronted local troublemaker “Bugs” Meany about the missing data. Bugs says he distinctly remembers storing the missing sheet of data between pages 151 and 152 of his lab notebook. Bugs says that the sheet must have just fallen out when Bugs's gang, the Tigers, were cleaning their clubhouse.
  • How did Encyclopedia know Bugs was lying?

Pages 151 and 152 are the front and back of the same sheet.