Influence.ME: Tools for detecting influential data in mixed models

slide-1
SLIDE 1

Influence.ME: Tools for detecting influential data in mixed models

Rense Nieuwenhuis // Ben Pelzer // Manfred te Grotenhuis

slide-2
SLIDE 2

A first indication something may go wrong ...

slide-3
SLIDE 3

A first indication something may go wrong ...

[Figure: Math score by Class Structure, by school. x-axis: Level of Class Structure (2.0 to 5.0); y-axis: Average Math Test Score (45 to 60).]

slide-7
SLIDE 7

Mixed models in Social Sciences

slide-10
SLIDE 10

Mixed models in Social Sciences

  • Mixed, Multilevel, or Hierarchical Models
      • Observations nested within “groups”
      • Explanatory variables at all “levels”
  • High-N Surveys
      • General Social Survey (n = 51,020)
      • World Values Survey (n = 267,870)
  • Small number of “groups” (van der Meer et al. 2009)
      • No country-comparative study exceeds 54 countries
      • Re-evaluation of risk for influential data
slide-11
SLIDE 11

Measures of Influential Data

slide-15
SLIDE 15

Measures of Influential Data

  • Compare estimates including a particular case to the estimates without that particular case
  • In multilevel regression: case = group
  • DFBETAS: standardized difference in the magnitude of a single parameter estimate (Belsley et al., 1980)
  • Cook’s Distance: standardized summary measure of influence on one or multiple parameter estimates (Cook, 1977; Belsley et al., 1980)
  • Improvement in influence.ME: cases are not deleted; their influence is neutralized by an altered intercept plus a dummy variable (Langford & Lewis, 1998)
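
The case-deletion comparison described above can be sketched in base R. This is a simplified stand-in and not the influence.ME implementation: it refits an ordinary lm() rather than a mixed model, it deletes each group outright instead of using the altered intercept plus dummy variable, and it runs on simulated data. The data, the variable names, and the use of 2/√n with n taken as the number of groups are all assumptions made for illustration.

```r
# Simplified stand-in: ordinary least squares instead of a mixed model,
# on simulated (hypothetical) data with 10 groups of 20 observations.
set.seed(1)
n_groups <- 10
dat <- data.frame(group = rep(seq_len(n_groups), each = 20))
dat$x <- rnorm(nrow(dat))
dat$y <- 2 + 0.5 * dat$x + rnorm(nrow(dat))

# Make group 10 deviate strongly so that it pulls the slope.
in10 <- dat$group == 10
dat$x[in10] <- dat$x[in10] + 4
dat$y[in10] <- dat$y[in10] - 6

# Estimates based on all groups.
full <- coef(lm(y ~ x, data = dat))

# DFBETAS per group: (full estimate - estimate without group j),
# divided by the standard error from the fit without group j.
dfbetas_by_group <- t(sapply(seq_len(n_groups), function(j) {
  fit_j <- lm(y ~ x, data = dat[dat$group != j, ])
  se_j <- sqrt(diag(vcov(fit_j)))
  (full - coef(fit_j)) / se_j
}))
rownames(dfbetas_by_group) <- seq_len(n_groups)

# Assumed cutoff: 2 / sqrt(n), with n the number of groups.
cutoff <- 2 / sqrt(n_groups)
flagged <- which(abs(dfbetas_by_group[, "x"]) > cutoff)

print(round(dfbetas_by_group, 2))
print(flagged)
```

Each row of dfbetas_by_group shows how far the intercept and slope move, in standard-error units, when that group is left out. Groups listed in flagged exceed the assumed cutoff for the slope.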

slide-16
SLIDE 16

Influence.ME: Analytical Steps

slide-20
SLIDE 20

Influence.ME: Analytical Steps

  • Original model
  • estex(): estimates without the influence of group 'j'
  • ME.cook(), ME.dfbetas(): identification of influential data
  • No influential data? Then the model is the correct(ed) model
  • Otherwise, exclude.influence(): corrected model to re-check, returning to estex()

slide-21
SLIDE 21

Again, a first indication something is wrong ...

[Figure: Math score by Class Structure, by school. x-axis: Level of Class Structure (2.0 to 5.0); y-axis: Average Math Test Score (45 to 60).]

slide-22
SLIDE 22

Example: School 23 (Kreft & De Leeuw, 1998)

Linear mixed model fit by REML
Formula: math ~ structure + (1 | school.ID)
Number of obs: 519, groups: school.ID, 23

Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.002      5.853  10.252
structure     -2.343      1.456  -1.609
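
The slides do not show the call that produced this output. A minimal sketch of how such a model might be fitted follows, assuming the lme4 and influence.ME packages are installed and that the school23 data set shipped with influence.ME is the one used here. The object name model.simple is borrowed from the exclude.influence() call shown on a later slide. The guard simply skips the fit when either package is missing.

```r
# Assumption: lme4 and influence.ME are installed; the school23 data set
# ships with influence.ME. The block does nothing if either is missing.
if (requireNamespace("lme4", quietly = TRUE) &&
    requireNamespace("influence.ME", quietly = TRUE)) {
  data("school23", package = "influence.ME")

  # Random intercept per school, class structure as fixed effect.
  # lmer() fits by REML by default, matching the output header.
  model.simple <- lme4::lmer(math ~ structure + (1 | school.ID),
                             data = school23)
  print(summary(model.simple))
}
```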

slide-26
SLIDE 26

Cook's Distances

[Figure: Cook's Distance (axis from 0.0 to 1.0) by School Identifier. Schools shown: 7194, 7801, 72080, 25456, 24371, 7930, 72292, 72991, 6467, 25642, 68493, 47583, 46417, 26537, 68448, 6327, 6053, 24725, 7474, 7829, 54344, 62821, 7472.]

slide-29
SLIDE 29

Adjusted Model

slide-32
SLIDE 32

Adjusted Model

> model.7472 <- exclude.influence(model.simple,
+                                 "school.ID",
+                                 "7472")
> model.62821 <- exclude.influence(model.7472,
+                                  "school.ID",
+                                  "62821")

Fixed effects:
              Estimate Std. Error t value
intercept.alt   64.285      6.353  10.119
estex.62821     73.069      4.735  15.432
estex.7472      52.571      3.600  14.602
structure       -3.416      1.535  -2.226

slide-36
SLIDE 36

Known Issues & Future Development

slide-40
SLIDE 40

Known Issues & Future Development

  • Modification of intercept
      • More difficult to converge
      • Fails with factor-variables in model
      • Solution: use delete=TRUE in estex()
  • Currently, only fixed effects
      • Measures of influence for random effects available
  • Can be highly computationally intensive
      • Split over multiple sessions / computers
  • Development continues in Rennes ...
      • Partial residual plots
slide-41
SLIDE 41

http://www.rensenieuwenhuis.nl/r-project/influenceme/

slide-42
SLIDE 42

Discussion on Influential Data in Sociology

  • Original Article:
      • Ruiter, Stijn and De Graaf, Nan Dirk. 2006. National context, religiosity, and volunteering: results from 53 countries. American Sociological Review 71: 191-210.
  • Research Note:
      • Meer, T. van der, te Grotenhuis, M., and Pelzer, B. (2010). Influential cases in multilevel modeling. A methodological comment on Ruiter and De Graaf (ASR, 2006). American Sociological Review, accepted for publication.
  • Response to Research Note:
      • Ruiter, Stijn and De Graaf, Nan Dirk. (2010). National Religious Context and Volunteering: More Rigorous Tests Supporting the Association. American Sociological Review, accepted for publication.

slide-43
SLIDE 43

References

  • Bates, D., Maechler, M., and Dai, B. (2008). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-28.
  • Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. Wiley.
  • Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19(1):15–18.
  • Kreft, I. and De Leeuw, J. (1998). Introducing Multilevel Modelling. Sage Publications.
  • Langford, I. H. and Lewis, T. (1998). Outliers in multilevel data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 161:121–160.
  • Nieuwenhuis, R., Pelzer, B., and Te Grotenhuis, M. (2009). influence.ME: Tools for detecting influential data in mixed models. R package version 0.7.
  • Meer, T. van der, te Grotenhuis, M., and Pelzer, B. (2010). Influential cases in multilevel modeling. A methodological comment on Ruiter and De Graaf (ASR, 2006). American Sociological Review, accepted for publication.
  • Snijders, T. A. and Berkhof, J. (2008). Diagnostic checks for multilevel models. In De Leeuw, J. and Meijer, E., editors, Handbook of Multilevel Analysis, chapter 3, pages 141–175. Springer.

slide-44
SLIDE 44

Formulae

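Cook’s distance (Snijders & Berkhof, 2008):

$$C^{0F}_j = \frac{1}{r+1} \, (\hat{\gamma} - \hat{\gamma}_{(-j)})' \, \hat{\Sigma}_F^{-1} \, (\hat{\gamma} - \hat{\gamma}_{(-j)})$$

Cutoff: $4/n$

DFBETAS (Belsley et al., 1980):

$$dfbetas_{ij} = \frac{\hat{\gamma}_i - \hat{\gamma}_{i(-j)}}{se(\hat{\gamma}_{i(-j)})}$$

Cutoff: $2/\sqrt{n}$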
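
As a rough illustration of the Cook's distance formula and the two cutoffs, the sketch below evaluates them in base R. It rests on several assumptions that the slides do not state: n is taken to be the number of groups (23, as in the school example), the covariance matrix of the estimates comes from the full-data fit, and an ordinary lm() on simulated data stands in for the mixed model. All object names are hypothetical. Under the first assumption the cutoffs come to roughly 0.17 for Cook's distance and roughly 0.42 for DFBETAS.

```r
# Hypothetical data: 23 groups of 15 observations, no mixed model involved.
set.seed(2)
n <- 23  # assumed to be the number of groups
dat <- data.frame(group = rep(seq_len(n), each = 15))
dat$x <- rnorm(nrow(dat))
dat$y <- 1 + 0.3 * dat$x + rnorm(nrow(dat))

fit <- lm(y ~ x, data = dat)
gamma_hat <- coef(fit)
sigma_f_inv <- solve(vcov(fit))  # inverse covariance matrix (full-data fit)
n_par <- length(gamma_hat)       # plays the role of r + 1 in the formula

# Cook's distance per group: quadratic form of the change in estimates
# when group j is left out, scaled by the number of parameters.
cooks_by_group <- sapply(seq_len(n), function(j) {
  d <- gamma_hat - coef(lm(y ~ x, data = dat[dat$group != j, ]))
  as.numeric(t(d) %*% sigma_f_inv %*% d) / n_par
})

cutoff_cook <- 4 / n           # Cook's distance cutoff
cutoff_dfbetas <- 2 / sqrt(n)  # DFBETAS cutoff

print(round(cooks_by_group, 3))
print(c(cook = cutoff_cook, dfbetas = cutoff_dfbetas))
```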