SLIDE 1

Department of Psychology - Psychological Methods, Evaluation and Statistics

Score-Based Measurement Invariance Tests for Multistage Testing (A Tale of Two and a Half Tests)

Rudolf Debelak, Dries Debeer

SLIDE 2

Road Map

  • What are score-based DIF tests?
  • Adaptive Testing: MSTs (and CATs)
  • Two and a half solutions
  • A simulation study
  • Summary and future work

SLIDE 3

What are score-based tests for DIF?

Score-based DIF tests detect an instability of item parameters with regard to a person covariate:

  • Age
  • Native language
  • Gender
  • …

SLIDE 4

What are score-based tests for DIF?

  • Bradley-Terry models (Strobl, Wickelmaier & Zeileis, 2011)
  • Factor analytical models (Merkle & Zeileis, 2013; Merkle, Fan & Zeileis, 2014)
  • Rasch models (Strobl, Kopf & Zeileis, 2015; Komboz, Strobl & Zeileis, 2016)
  • Normal-ogive IRT models (Wang, Strobl, Zeileis & Merkle, 2017)
  • Logistic IRT models (Debelak & Strobl, 2018)

SLIDE 6

What are score-based tests for DIF?

Consider a statistic of model bias $B_i$ on the person level for each item parameter. We assume that under the null model:

  • Its expected value $E(B_i)$ is 0 for every person.
  • The statistic is independent and identically distributed across respondents.

We now consider sums $\sum_i B_i$ over sufficiently large groups of test takers. If our null model is correct,

  • $\sum_i B_i$ approximately follows a normal distribution (Central Limit Theorem).
  • The related stochastic process converges to a Brownian bridge (Functional Central Limit Theorem).

These assumptions are met by individual score contributions for ML estimators (Hjort & Koning, 2002; Zeileis & Hornik, 2007).
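
The stochastic process referred to here can be written out as a sketch. The notation is an assumption introduced for illustration and does not appear on the slides: $n$ is the number of test takers, $\psi_{(i)}$ is the score contribution of the $i$-th person after ordering by the covariate, $\hat{\beta}$ is the ML estimate of the $k$ item parameters, and $\hat{I}$ is an estimate of the covariance matrix of the score contributions. This follows the general form used in the cited fluctuation-test literature.

```latex
\[
  W_n(t) \;=\; \hat{I}^{-1/2}\, n^{-1/2} \sum_{i=1}^{\lfloor n t \rfloor} \psi_{(i)}(\hat{\beta}),
  \qquad 0 \le t \le 1,
\]
\[
  W_n(\cdot) \;\xrightarrow{\;d\;}\; W^{0}(\cdot)
  \quad \text{under the null model, where } W^{0} \text{ is a } k\text{-dimensional Brownian bridge.}
\]
```

Because the scores sum to zero at the ML estimate, the process starts and ends at zero, which is why the limit is a Brownian bridge rather than a free Brownian motion.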

SLIDE 7

What are score-based tests for DIF?

SLIDE 8

What are score-based tests for DIF?

Summary:

  • Obtain ML estimates for the item parameters.
  • Calculate the individual score contributions.
  • Order the persons with regard to a person covariate of interest (e.g., gender, age).
  • Calculate the cumulative sums with regard to this order.
  • For an item of interest, compare the resulting stochastic process (the cumulative scores) with the process assumed under the null model, using some test statistic.
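
The five steps can be sketched in a few lines of Python for a single item. This is a minimal illustration under strong simplifying assumptions that are not from the slides: the slope is fixed at 1, the person abilities are treated as known, only the item intercept is estimated, and the maximum absolute excursion is used as the test statistic. All function names are hypothetical.

```python
import math
import random


def p_correct(theta, b):
    """Response probability with the slope fixed at 1 (a simplifying assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta + b)))


def fit_intercept(thetas, responses, iterations=25):
    """Step 1: ML estimate of the item intercept by Newton-Raphson."""
    b = 0.0
    for _ in range(iterations):
        probs = [p_correct(t, b) for t in thetas]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        b += gradient / information
    return b


def cumulative_score_process(thetas, responses, covariate):
    b_hat = fit_intercept(thetas, responses)
    probs = [p_correct(t, b_hat) for t in thetas]
    # Step 2: individual score contributions, d log-likelihood / d b = x - p.
    scores = [x - p for x, p in zip(responses, probs)]
    information = sum(p * (1.0 - p) for p in probs)
    # Step 3: order the persons by the covariate of interest.
    order = sorted(range(len(scores)), key=lambda i: covariate[i])
    # Step 4: cumulative sums in that order, scaled by the root of the information.
    path, total = [], 0.0
    for i in order:
        total += scores[i]
        path.append(total / math.sqrt(information))
    return path


def max_abs(path):
    """Step 5: one possible test statistic, the largest absolute excursion."""
    return max(abs(value) for value in path)


rng = random.Random(2020)
n = 500
age = [rng.uniform(18, 65) for _ in range(n)]
theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical DIF: the item is easier for test takers older than 40.
answers = [1 if rng.random() < p_correct(t, 0.8 if a > 40 else 0.0) else 0
           for t, a in zip(theta, age)]
path = cumulative_score_process(theta, answers, age)
print(round(max_abs(path), 2))
```

For a single parameter, the 95% quantile of the maximum absolute value of a Brownian bridge is about 1.36, so a printed value well above that would point to parameter instability along the age ordering.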

SLIDE 9

«Can you apply this to adaptive tests in R?»

SLIDE 10

Adaptive Testing: MSTs (and CATs)

P(π‘Œπ‘—π‘˜ = 1|πœ„π‘—, π‘π‘˜, 𝑐

π‘˜) = exp(π‘π‘˜πœ„π‘—+π‘π‘˜) 1+exp(π‘π‘˜πœ„π‘—+π‘π‘˜)

  • Consider the 2PL model:
  • Further assume that we have a large set of items with known item parameters.
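
As a small worked example, the 2PL response probability in slope-intercept form can be computed as follows. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response, with slope a and intercept b."""
    z = a * theta + b
    # Two algebraically equal forms, chosen so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(p_correct(0.0, 1.2, 0.0))   # 0.5: the linear predictor is zero
print(p_correct(1.0, 1.2, -0.5) > p_correct(-1.0, 1.2, -0.5))   # True
```

With a positive slope, the probability increases with the ability, and a larger intercept makes the item easier at every ability level.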

SLIDE 11

Adaptive Testing: MSTs (and CATs)

Stage 1: Medium
Stage 2: Easy, Medium, Difficult
Stage 3: Easy, Medium, Difficult
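
To make the routing idea concrete, here is a minimal sketch of how a test taker might move through such a three-stage design. The routing rule (proportion correct so far, with cut points at 1/3 and 2/3) is purely hypothetical; the slides do not specify how modules are selected.

```python
def route(n_correct: int, n_items: int) -> str:
    """Choose the next module from the proportion correct so far (hypothetical rule)."""
    share = n_correct / n_items
    if share < 1 / 3:
        return "easy"
    if share < 2 / 3:
        return "medium"
    return "difficult"


def module_path(stage1_score: int, stage2_score: int, module_length: int) -> list:
    """Return the modules seen in stages 1 to 3; stage 1 is always the medium module."""
    path = ["medium"]
    path.append(route(stage1_score, module_length))
    path.append(route(stage1_score + stage2_score, 2 * module_length))
    return path


print(module_path(2, 1, 9))   # ['medium', 'easy', 'easy']
print(module_path(8, 8, 9))   # ['medium', 'difficult', 'difficult']
```

Whatever the actual rule, the consequence is the same: each person answers only a subset of the item pool, and which subset depends on earlier responses.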

SLIDE 12

«Can you apply this to adaptive tests in R?»

SLIDE 13

Test 1: Asymptotic Score-Based Tests

3 Steps:

  1. Use the observed data from an adaptive test.
  2. Treat the missing data as missing at random and estimate the item parameters.
  3. Apply score-based DIF tests for this IRT model.
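
Step 2 can be illustrated with a sketch of the marginal log-likelihood of one response pattern, where items that were not administered are simply skipped. The standard normal ability distribution, the equally spaced integration grid, and all names are assumptions made for this example, not details taken from the slides.

```python
import math


def p_correct(theta, a, b):
    """2PL response probability in slope-intercept form."""
    return 1.0 / (1.0 + math.exp(-(a * theta + b)))


def marginal_loglik(pattern, items, n_nodes=41):
    """Log marginal likelihood of one response pattern under theta ~ N(0, 1).

    pattern: list of 1, 0, or None (None = item not administered).
    items:   list of (a, b) pairs, one per item.
    """
    nodes = [-4.0 + 8.0 * k / (n_nodes - 1) for k in range(n_nodes)]
    weights = [math.exp(-0.5 * t * t) for t in nodes]
    weight_sum = sum(weights)
    likelihood = 0.0
    for t, w in zip(nodes, weights):
        conditional = 1.0
        for x, (a, b) in zip(pattern, items):
            if x is None:
                continue   # missing at random: the item contributes nothing
            p = p_correct(t, a, b)
            conditional *= p if x == 1 else 1.0 - p
        likelihood += (w / weight_sum) * conditional
    return math.log(likelihood)


items = [(1.0, 0.0), (1.5, -0.5), (0.8, 0.4)]
print(marginal_loglik([1, None, 0], items))
```

Summing this quantity over all test takers and maximizing it with respect to the item parameters gives the estimates that the score-based tests in Step 3 then build on.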

SLIDE 14

Test 2: Bootstrap Score-Based Tests

5 Steps:

  1. Consider the calibrated item parameters and person parameter estimates.
  2. For an item of interest, generate artificial responses based on your IRT model and the estimated person parameters.
  3. Repeat Step 2 many (e.g., 1000) times.
  4. Calculate a score-based statistic of model fit for the original and the artificial data.
  5. Calculate p-values.
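
A minimal sketch of these five steps for one item follows. It assumes a 2PL item in slope-intercept form, uses the score with respect to the intercept (observed response minus model probability), and takes the largest absolute cumulative sum as the statistic. These choices and all names are illustrative assumptions, not the implementation used by the authors.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL response probability in slope-intercept form."""
    return 1.0 / (1.0 + math.exp(-(a * theta + b)))


def max_cusum(responses, thetas, a, b, order):
    """Largest absolute cumulative sum of the scores x - p, taken in covariate order."""
    total, peak = 0.0, 0.0
    for i in order:
        total += responses[i] - p_correct(thetas[i], a, b)
        peak = max(peak, abs(total))
    return peak


def bootstrap_p_value(responses, thetas, a, b, covariate, n_boot=1000, seed=1):
    # Step 1: a, b are the calibrated item parameters, thetas the person estimates.
    rng = random.Random(seed)
    order = sorted(range(len(thetas)), key=lambda i: covariate[i])
    observed = max_cusum(responses, thetas, a, b, order)   # Step 4, original data
    exceed = 0
    for _ in range(n_boot):                                # Step 3
        # Step 2: artificial responses generated from the calibrated model.
        fake = [1 if rng.random() < p_correct(t, a, b) else 0 for t in thetas]
        if max_cusum(fake, thetas, a, b, order) >= observed:   # Step 4, artificial data
            exceed += 1
    return (exceed + 1) / (n_boot + 1)                     # Step 5


# Extreme toy data: group 0 always fails the item, group 1 always solves it.
group = [0] * 200 + [1] * 200
abilities = [0.0] * 400
answers = [0] * 200 + [1] * 200
print(bootstrap_p_value(answers, abilities, 1.0, 0.0, group, n_boot=199, seed=3))
```

Because the item parameters are taken as given rather than re-estimated, the cumulative sums are not tied to zero at the end of the ordering, so the reference distribution comes from the simulated data sets instead of a Brownian bridge.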

SLIDE 15

Bootstrap Score-Based Tests

  ✓ Use calibrated item parameters.
  ✓ Use person parameter estimates.
  ✓ Calculate p-values based on bootstrapping (or permutation).

Asymptotic Score-Based Tests

  ✓ Estimate item parameters using an assumed distribution of person parameters.
  ✓ Calculate p-values based on asymptotic results.

SLIDE 16

An Evaluation with a Simulation Study

Design:

  • 1-3-3 MST design
  • 3 sample sizes: 200, 500, 1000 test takers
  • 3 lengths of modules: 9, 18, 36 items
  • 2PL model
  • Two known groups of equal size
  • Impact absent / present
  • No DIF, DIF of 0.3 in the a parameter, or DIF of 0.6 in the b parameter (4 in 9 items per module)
  • Evaluation with Bootstrap score-based tests and asymptotic score-based tests
  • 500 repetitions per condition
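
As a quick piece of arithmetic on this design, the sketch below enumerates the data-generating conditions. It assumes that the four factors are fully crossed, which the slide does not state explicitly; the variable names are hypothetical.

```python
from itertools import product

sample_sizes = [200, 500, 1000]
module_lengths = [9, 18, 36]
impact = ["absent", "present"]
dif = ["none", "0.3 in a", "0.6 in b"]
repetitions = 500

# Assumption: every level of every factor is combined with every other.
conditions = list(product(sample_sizes, module_lengths, impact, dif))
print(len(conditions))                 # 54
print(len(conditions) * repetitions)   # 27000
```

Under that assumption there are 54 conditions and 27,000 simulated data sets, each of which is analyzed by the tests under evaluation.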

SLIDE 17

Results for the Bootstrap Test

SLIDE 18

Results for the Bootstrap Test

SLIDE 19

Results for the Bootstrap Test

SLIDE 20

Results for the Bootstrap Test

SLIDE 21

Results for the Asymptotic Test (only short modules)

SLIDE 22

Results for the Asymptotic Test

SLIDE 23

Results for the Asymptotic Test

SLIDE 24

Results for the Asymptotic Test

SLIDE 25

Summary

  • We presented two and a half tests for the flexible detection of DIF in adaptive tests.
  • The Bootstrap score-based test uses the calibrated item parameters and has higher power if these are correct. If not, it shows an increased Type I error.
  • The asymptotic score-based test estimates the item parameters from the data, which makes it computationally intensive.
  • A third approach based on permutation leads to results identical to those of the Bootstrap test.
  • These and other tests are available in the mstDIF package (Debelak, Debeer, & Appelbaum, 2020).

SLIDE 26

Thank you for your interest!

SLIDE 27

References

Debelak, R., & Strobl, C. (2018). Investigating measurement invariance by means of parameter instability tests for 2PL and 3PL models. Educational and Psychological Measurement. doi: 10.1177/0013164418777784

Hjort, N. L., & Koning, A. (2002). Tests for constancy of model parameters over time. Journal of Nonparametric Statistics, 14(1-2), 113-132.

Merkle, E. C., Fan, J., & Zeileis, A. (2014). Testing for measurement invariance with respect to an ordinal variable. Psychometrika, 79(4), 569-584.

Merkle, E. C., & Zeileis, A. (2013). Tests of measurement invariance without subgroups: A generalization of classical methods. Psychometrika, 78(1), 59-82.

Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80(2), 289-316.

Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135-153.

Wang, T., Strobl, C., Zeileis, A., & Merkle, E. C. (2017). Score-based tests of differential item functioning via pairwise maximum likelihood estimation. Psychometrika. doi: 10.1007/s11336-017-9591-8

Zeileis, A., & Hornik, K. (2007). Generalized M-fluctuation tests for parameter instability. Statistica Neerlandica, 61(4), 488-508.

SLIDE 28

Appendix

SLIDE 29

Results for the Bootstrap Test

SLIDE 30

Results for the Bootstrap Test

SLIDE 31

Results for the Asymptotic Test

SLIDE 32

Results for the Asymptotic Test
