
6/17/2011 1

Screening Linking Items with Varied approaches

C. Allen Lau, Ph.D., Pearson

Liru Zhang, Ph.D., Delaware Department of Education

Screen Linking Items 3

Equating

Equating is a statistical process used to adjust scores on test forms so that scores on the equated forms can be used interchangeably (Kolen & Brennan, 2004).

Common-item nonequivalent groups design
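The slides do not say which linking procedure was applied after screening. As an assumption, a common choice under the Rasch model is a mean-shift constant computed from the linking (common) items. The sketch below uses a hypothetical function and made-up difficulties to show why a single unstable anchor item matters: its drift is averaged straight into the constant applied to every item on the new form.

```python
from statistics import mean


def rasch_linking_constant(base_b, new_b):
    """Mean-shift constant that moves new-form Rasch difficulties onto the base scale.

    base_b and new_b hold difficulty estimates (logits) for the SAME linking
    items, in the same order, from the base form and the new form.
    (Hypothetical helper; mean-shift linking is an assumed procedure.)
    """
    if len(base_b) != len(new_b) or not base_b:
        raise ValueError("need matching, non-empty lists of linking-item difficulties")
    return mean(base_b) - mean(new_b)


# Hypothetical linking-item difficulties (logits), for illustration only.
base = [-1.2, -0.4, 0.3, 1.1]
new = [-1.0, -0.2, 0.5, 1.3]      # every item appears 0.2 logit harder on the new form
shift = rasch_linking_constant(base, new)

# If one linking item drifts by another 0.6 logit, the constant moves by 0.6 / 4 = 0.15.
drifted = [-1.0, -0.2, 0.5, 1.9]
shift_drifted = rasch_linking_constant(base, drifted)
print(round(shift, 3), round(shift_drifted, 3))
```

With four anchors, one drifting item biases the whole new-form scale by a quarter of its drift, which is the motivation for screening linking items before equating.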


Differential item functioning (DIF)

DIF exists when examinees of the same ability from different groups have different probabilities of giving a certain response to a test item. DIF analysis can indicate unexpected item behavior, and DIF analysis methods can also be employed to check the stability of linking items in equating.
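To make the definition concrete, the sketch below assumes a dichotomous Rasch item whose difficulty differs by a hypothetical 0.5 logit between the reference and focal groups. Examinees with identical ability then have different probabilities of answering correctly, which is exactly the condition described above. The function name and values are illustrative only.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item: difficulty 0.0 for the reference group, 0.5 for the focal group.
B_REF, B_FOCAL = 0.0, 0.5

for theta in (-1.0, 0.0, 1.0):
    p_ref, p_focal = rasch_p(theta, B_REF), rasch_p(theta, B_FOCAL)
    # Same ability, different probability of success: the signature of DIF.
    print(f"theta={theta:+.1f}  P_ref={p_ref:.3f}  P_focal={p_focal:.3f}")
```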


Net and global DIF

Penfield (2010) distinguished two types of differential item functioning: net DIF and global DIF. Net DIF concerns the item response aggregated across score points; global DIF concerns the item response at each individual score point.


Observed-score-based (OSB) method for detecting net and global DIF

Conceptually, under the IRT partial credit model, the Mantel test corresponds to net DIF, while the generalized Mantel-Haenszel (GMH) test corresponds to global DIF. In practice, Mantel is therefore suited to identifying net DIF and GMH to identifying global DIF. An item is flagged if its p-value is less than a preset significance level, e.g., 0.05 (p < 0.05).
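The Mantel statistic for an ordered (polytomous) item can be sketched from its usual definition: within each level of a matching score, compare the focal group's observed item-score sum with its expectation under no DIF, then pool across levels into a 1-df chi-square. This is a minimal sketch, not the authors' implementation; the function name, the input layout, and the data are hypothetical, and GMH (which needs a multivariate statistic) is not shown.

```python
import math
from typing import Sequence, Tuple

Stratum = Tuple[Sequence[int], Sequence[int]]  # (reference counts, focal counts) per category


def mantel_test(strata: Sequence[Stratum], scores: Sequence[float]) -> Tuple[float, float]:
    """Mantel chi-square (1 df) for an ordered polytomous item.

    strata: one (reference_counts, focal_counts) pair per level of the matching
            score; counts[j] is the number of examinees in that group who earned
            score category j on the studied item.
    scores: numeric value of each score category, e.g. [0, 1, 2].
    Returns (chi_square, p_value).
    """
    obs = expected = var = 0.0
    for ref, foc in strata:
        n_r, n_f = sum(ref), sum(foc)
        n = n_r + n_f
        if n_r == 0 or n_f == 0:
            continue  # stratum carries no information about group differences
        totals = [r + f for r, f in zip(ref, foc)]
        s1 = sum(y * t for y, t in zip(scores, totals))      # sum of item scores
        s2 = sum(y * y * t for y, t in zip(scores, totals))  # sum of squared item scores
        obs += sum(y * f for y, f in zip(scores, foc))        # focal-group score sum
        expected += n_f * s1 / n                              # its expectation under no DIF
        var += n_r * n_f * (n * s2 - s1 * s1) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    chi2 = (obs - expected) ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical data: three matching-score strata, item scored 0/1/2,
# focal group scoring lower than the reference group at every level.
strata = [([5, 10, 25], [25, 10, 5])] * 3
chi2, p = mantel_test(strata, scores=[0, 1, 2])
print(f"chi2={chi2:.2f}, p={p:.2g}, flagged={p < 0.05}")
```

The flag uses the slide's rule (p below a preset level); swapping 0.05 for 0.01 gives the stricter OSB criterion studied later.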


IRT-based method for detecting net and global DIF

Item parameter value comparison (IPVC) approach

measures DIF by comparing the parameter values of the same item estimated from different groups

1. Average item step-parameter value comparison (AISVC)

– detecting net DIF

2. Item step-parameter value comparison (ISVC)

– detecting global DIF

An item is flagged if the difference exceeds a preset criterion in absolute value, e.g., 0.5 logit (|D| > 0.5).
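A minimal sketch of the two comparisons, under stated assumptions: the step parameters from the two groups are already on a common scale, AISVC compares the groups' average step values, and ISVC flags an item if any single step difference exceeds the criterion (the slides do not say how ISVC aggregates across steps). Function names and data are hypothetical. The example item has step differences that cancel on average, so it shows global DIF without net DIF.

```python
from statistics import mean


def aisvc(ref_steps, focal_steps, criterion=0.5):
    """Net DIF check: difference between the groups' average step parameters (logits)."""
    d = mean(focal_steps) - mean(ref_steps)
    return d, abs(d) > criterion


def isvc(ref_steps, focal_steps, criterion=0.5):
    """Global DIF check: difference at each step; flag if any step exceeds the criterion."""
    diffs = [f - r for r, f in zip(ref_steps, focal_steps)]
    return diffs, any(abs(d) > criterion for d in diffs)


# Hypothetical partial-credit item whose step differences cancel out on average.
ref = [-1.0, 0.0, 1.0]
focal = [-0.4, 0.0, 0.4]
print(aisvc(ref, focal))  # average difference 0: no net DIF flagged
print(isvc(ref, focal))   # step differences about +0.6, 0.0, -0.6: global DIF flagged
```

Because D keeps its sign, the output also reports DIF direction, a property the summary slides credit to the IPVC approach.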


Study: methods for screening linking items

Investigate different screening methods for identifying unstable anchor items in IRT equating with the Rasch and partial credit models under different DIF conceptions.

Screening method: IRT-based and observed-score-based methods

Other independent variables:

type of DIF (net or global)
equating sample sizes
DIF intensity
flagging criterion


Methods (1)

Monte Carlo simulation: 64 combinations of conditions

Independent variables

Detecting method
Item parameter value comparison (IPVC) method

Average item step-parameter value comparison (AISVC)
Item step-parameter value comparison (ISVC)

Observed-score-based (OSB) method

Mantel
GMH


Methods (2)

DIF conception

Net DIF
Global DIF

Equating sample size

8000
4000
2000
1000


Methods (3)

DIF intensity (DIFI), in logits: 0.5, 1.0

Flagging criterion
For IPVC approach: |D| > 0.5 logit, |D| > 0.3 logit
For OSB approach: p < 0.05, p < 0.01
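One reading of the design that reproduces the stated 64 combinations is sketched below. It assumes the detecting method is determined by approach and DIF conception (AISVC and Mantel for net DIF, ISVC and GMH for global DIF), that each approach has its own two flagging criteria, and that the DIF intensities are 0.5 and 1.0 logit (the levels reported in the results). This crossing is an inference, not a layout stated by the authors.

```python
from itertools import product

# Detecting method, keyed by (approach, DIF conception).
METHODS = {("IPVC", "net"): "AISVC", ("IPVC", "global"): "ISVC",
           ("OSB", "net"): "Mantel", ("OSB", "global"): "GMH"}
# Each approach has its own two flagging criteria.
CRITERIA = {"IPVC": ["|D| > 0.5", "|D| > 0.3"], "OSB": ["p < 0.05", "p < 0.01"]}
SAMPLE_SIZES = [8000, 4000, 2000, 1000]
DIF_INTENSITIES = [0.5, 1.0]  # logits (assumed from the results slides)

conditions = [
    {"method": METHODS[(a, c)], "conception": c, "n": n, "difi": d, "criterion": k}
    for a, c in METHODS                      # 4 approach-by-conception cells
    for n, d in product(SAMPLE_SIZES, DIF_INTENSITIES)  # 8 size-by-intensity cells
    for k in CRITERIA[a]                     # 2 criteria per approach
]
print(len(conditions))  # 4 * 8 * 2 = 64
```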


Methods (4)

Evaluation criterion (dependent variable)

False positive: non-DIF item is classified as DIF item
False negative: DIF item is classified as non-DIF item
Total error: false positive + false negative
Accuracy: 1 – total error
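The evaluation criteria above can be computed per replication as sketched below. The slides do not give the denominators, so this sketch assumes each error rate is a proportion of all linking items, which makes accuracy equal to the proportion of items classified correctly. The function name and data are hypothetical.

```python
def screening_accuracy(true_dif, flagged):
    """Error rates for one replication of a screening method.

    true_dif[i] is True if linking item i was simulated with DIF;
    flagged[i] is True if the method flagged item i.
    Rates are taken over all linking items (an assumption).
    """
    n = len(true_dif)
    fp = sum(1 for t, f in zip(true_dif, flagged) if f and not t) / n  # non-DIF flagged
    fn = sum(1 for t, f in zip(true_dif, flagged) if t and not f) / n  # DIF item missed
    total = fp + fn
    return {"false_positive": fp, "false_negative": fn,
            "total_error": total, "accuracy": 1 - total}


# Hypothetical 10-item anchor set: items 0 and 1 carry DIF.
truth = [True, True] + [False] * 8
flags = [True, False, True] + [False] * 7  # one miss, one false alarm
print(screening_accuracy(truth, flags))
```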


Results: detecting method

Accuracy rate
IPVC (IRT-based): AISVC 0.972, ISVC 0.955
Observed-score-based (OSB): Mantel 0.896, GMH 0.906

On average, the IPVC approach outperformed the OSB approach by 6.3 percentage points in accuracy.
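The 6.3 figure can be reproduced from the method-level accuracies above, assuming it is the difference between the simple averages of the two methods within each approach. It is an absolute difference (percentage points), not a relative one; the same averages, 0.9635 and 0.901, also follow from the sample-size, DIF-intensity, and flagging-criterion results.

```latex
\bar{A}_{\text{IPVC}} = \frac{0.972 + 0.955}{2} = 0.9635, \qquad
\bar{A}_{\text{OSB}} = \frac{0.896 + 0.906}{2} = 0.901,

\bar{A}_{\text{IPVC}} - \bar{A}_{\text{OSB}} = 0.0625 \approx 6.3 \text{ percentage points}
```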


Results: DIF conception

Accuracy rate
Net DIF: 0.937
Global DIF: 0.931

Across conditions, the accuracy rates of the net-DIF and global-DIF detection methods were very close.


Results: equating sample size

Accuracy rate
N = 8000: IPVC 0.972, OSB 0.806
N = 4000: IPVC 0.979, OSB 0.875
N = 2000: IPVC 0.972, OSB 0.965
N = 1000: IPVC 0.931, OSB 0.958

The accuracy of the IPVC approach depended less on sample size.
The accuracy of the OSB approach was negatively related to sample size.


Results: DIF intensity

Accuracy rate
DIFI = 0.5: IPVC 0.948, OSB 0.906
DIFI = 1.0: IPVC 0.979, OSB 0.896

The IPVC approach was sensitive to DIFI: its accuracy rose as DIF intensity increased.
The OSB approach was less sensitive to DIFI.


Results: flagging criterion

Accuracy rate
IPVC: |D| > 0.5: 0.955; |D| > 0.3: 0.972
OSB: p < 0.05: 0.851; p < 0.01: 0.951

Both approaches performed better under the stricter flagging criterion (|D| > 0.3 for IPVC, p < 0.01 for OSB), especially the OSB approach.


Summary & Discussion (1)

Both IRT-based and OSB methods can be applied to screening linking items; the IPVC approach appears especially promising.
No substantial difference was found between the net and global DIF conceptions.

Like other χ2 tests, Mantel and GMH are sensitive to sample size: both committed many more false-positive errors when the equating sample size was large.
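The sample-size sensitivity can be illustrated with a plain 2×2 Pearson chi-square used as a stand-in (it is not the Mantel or GMH statistic). A hypothetical item keeps the same small group difference, 50% vs. 53% correct, while the total sample grows from 1,000 to 8,000. The statistic grows roughly in proportion to N, so a fixed p-value threshold eventually flags an item whose DIF magnitude never changed.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical item with a fixed, small group difference: 50% vs. 53% correct.
results = []
for n_total in (1000, 2000, 4000, 8000):
    m = n_total // 2                                  # examinees per group
    ref_right, focal_right = m // 2, round(0.53 * m)  # correct responses per group
    stat = chi2_2x2(ref_right, m - ref_right, focal_right, m - focal_right)
    results.append((n_total, stat, p_value_1df(stat)))
    print(f"N={n_total}: chi2={stat:.2f}, p={p_value_1df(stat):.4f}")
```

The effect size is identical in every row; only N changes, which mirrors the false-positive pattern reported for the OSB approach at large equating samples.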

The flagging philosophy and mechanism differ between the IRT-based and OSB approaches:

– IPVC aims at detecting the magnitude of DIF
– OSB aims at controlling Type I error


Summary & Discussion (2)

Compared with OSB, setting the flagging criterion in the IPVC approach is more intuitive, because the criterion is expressed directly as a DIF intensity in logits.

In this study, IPVC was found to be the better approach for screening linking items in terms of accuracy, convenience, and information:

– The average accuracy of IPVC was 6.3 percentage points higher than that of OSB
– The OSB approach requires running an extra analysis
– The IPVC result provides both the DIF intensity and its direction (the value can be +, -, or 0)
– The IPVC result is more stable across sample sizes

Thank you
