

slide-1
SLIDE 1

1/78

Flexible Latent Trait Metrics

An Application of the Filtered Monotonic Polynomial Item Response Model Leah Feuerstahler

University of California, Berkeley

slide-2
SLIDE 2

2/78

Overview

Premise: In many applications of item response theory (IRT), reported scores are nonlinear transformations of the IRT θ estimates. Goal: Develop an IRT framework such that θ is the continuous metric on which scores are reported.

slide-3
SLIDE 3

3/78

Overview

[Figure: the same item response functions plotted as probability against θ (left) and against true score T (right).]

slide-4
SLIDE 4

4/78

Overview

Premise: In many applications of item response theory (IRT), reported scores are nonlinear transformations of the IRT θ estimates. Goal: Develop an IRT framework such that θ is the continuous metric on which scores are reported.

1 Why

  • Why is the IRT θ metric often transformed?
  • Why is an IRT for transformed metrics needed?

2 How

  • Filtered monotonic polynomial (FMP) item response model
  • Item parameter linking

3 Applications

  • Functional metric transformations
  • Estimated metric transformations

4 Considerations, Limitations, Future Directions

slide-5
SLIDE 5

5/78

1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions

slide-6
SLIDE 6

6/78

Scaling

“[T]he process of associating numbers or other ordered indicators with the performance of examinees.”1

Scaled scores are often transformations of number-correct scores or IRT θ̂.

What are the criteria for selecting a scale? Examples:

1 Facilitates appropriate interpretation by the public
2 Anchored to external indicators
3 Consistent with intuitions about how variables should behave

1Kolen and Brennan (2014, p. 371)

slide-7
SLIDE 7

7/78

Scaling

1 Facilitates appropriate interpretation by the public

  • Normalized scores for a representative sample
  • z-scores (mean 0, sd 1)
  • T-scores (mean 50, sd 10)
  • Scores range from 0 to test length, or 0 to 100
  • Domain Scores1
  • Optimal Scores2
  • Equated number-correct3
  • Constant measurement error
  • ACT scores (arcsine transformation of number-correct)4
  • Constant IRT information5

1Bock, Thissen, & Zimowski (1997) 2Ramsay & Wiberg (2017) 3Stocking (1996) 4Kolen (1988) 5Samejima (1979)

slide-8
SLIDE 8

8/78

Scaling

2 Anchored to external indicators

  • Expected number-correct scores1
  • Grade-equivalent scores2
  • Equating with a different test form
  • Linear relationship with other variables (intended use)3
  • Dollars are nonlinearly related to quality of life4
  • Typed words per minute is nonlinearly related to practice/effort5

1Stocking (1996) 2Schulz & Nicewander (1997) 3Nunnally (1967, p. 28) 4Jones (1971) 5Angoff (1971, pp. 509-510)

slide-9
SLIDE 9

9/78

Scaling

3 Consistent with beliefs about how variables should behave

  • Normally distributed ability1
  • Uncorrelated difficulty and discrimination parameters2
  • Does variability of achievement increase or decrease with grade level?
  • With Thurstonian scales, variability usually increases with grade level
  • IRT scales often exhibit “scale shrinkage”3
  • “Armchair” theorizing can lead to conflicting answers4
  • Interval level measurement “in some sense”5

1Thurstone (1925) 2Lord (1975) 3Camilli (1988) 4Yen (1986, p. 312) 5Kolen & Brennan (2014, p. 374)

slide-10
SLIDE 10

10/78

Interval vs. Ordinal

Stevens (1946): Nominal, Ordinal, Interval, Ratio

Scale type defined in terms of admissible operations.

Ordinal                               Interval
Any monotonic transformation          Only linear transformations
Invariant ordering of observations    Meaningful intervals
Median, percentiles                   Mean, standard deviation
Hardness of minerals                  Temperature

slide-11
SLIDE 11

11/78

Interval vs. Ordinal

Interval-level measurement is highly desirable for educational and psychological tests. What is actually MEANT by interval-level measurement?

  • Only linear transformations are admissible given the IRT model
  • The (Rasch) model fits
  • Declaring that scores are equal-interval ‘in some sense’1
  • Scores are linearly related to the underlying construct2

1Kolen & Brennan (2014, p. 374) 2Yen (1986)

slide-12
SLIDE 12

12/78

Where Does the IRT θ Come From?

What do item response models assume? Simple case: Mokken’s (1971) monotone homogeneity model (MHM) assumes only

1 Unidimensionality
2 Local independence
3 Monotonicity

If the MHM assumptions hold, individuals can be ordered uniquely.

slide-13
SLIDE 13

13/78

Where Does the IRT θ Come From?

Under the MHM assumptions, any monotonic function of the latent trait implies an equally admissible item response model.1

Suppose an IRT model with item response function (IRF) $P_i(\theta)$. For a continuous monotonic function $h$, where $\theta^\star = h^{-1}(\theta)$, another item response model exists such that

$$P_i(\theta) = P_i^\star[h^{-1}(\theta)] = P_i^\star(\theta^\star).$$

Any reason to prefer θ to θ⋆?

1Lord (1975)

slide-14
SLIDE 14

14/78

Where Does the IRT θ Come From?

Under the MHM assumptions, an infinite number of IRT models can fit data equally well. Identification restrictions are needed in practice. Two main solutions:

1 Parametric IRT (PIRT)

  • Specify the IRF shape
  • (Usually) determines scale up to linear transformations
  • Assumes that the chosen IRF shape(s) fits all scale items

2 Nonparametric IRT (NIRT)

  • Specify the latent trait distribution (e.g., standard normal)
  • Often conditions on (a monotonic transformation of) sum scores
  • Nonparametrically estimates the IRF shape
slide-15
SLIDE 15

15/78

Nonlinear Transformations of the IRT Metric

What does not change?

  • Ordering of examinees
  • Percentile rankings
  • Relative efficiency of item response curves

What does change?

  • Item and test information
  • Standard errors
  • Confidence intervals
  • Reliability
slide-16
SLIDE 16

16/78

Item Information

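Metric transformations can have dramatic effects on information functions. Lord (1974, p. 353): with $\theta = h(\theta^\star)$,

$$I_i^\star(\theta^\star) = I_i(\theta)\left[\frac{\partial h(\theta^\star)}{\partial \theta^\star}\right]^{2}$$

The trait level that maximizes $I_i(\theta)$ need not be the corresponding trait level that maximizes $I_i^\star(\theta^\star)$.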
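
A minimal numerical sketch of how information changes under a metric transformation. It assumes a hypothetical 2PL item (a = 1.2, b = 0.3) and a hypothetical monotone transformation θ = h(θ⋆) = θ⋆ + 0.1θ⋆³; neither comes from the slides. The chain-rule form (information on θ times the squared derivative of h) is compared with a direct finite-difference computation on the θ⋆ metric.

```python
import math

A, B = 1.2, 0.3  # hypothetical 2PL slope and location


def irf(theta):
    """2PL item response function on the theta metric."""
    return 1.0 / (1.0 + math.exp(-A * (theta - B)))


def info(theta):
    """2PL item information on the theta metric."""
    p = irf(theta)
    return A * A * p * (1.0 - p)


def h(theta_star):
    """Hypothetical monotone transformation theta = h(theta_star)."""
    return theta_star + 0.1 * theta_star ** 3


def dh(theta_star):
    """Derivative of h with respect to theta_star (always positive here)."""
    return 1.0 + 0.3 * theta_star ** 2


def info_star(theta_star):
    """Information on the theta_star metric via the chain rule."""
    return info(h(theta_star)) * dh(theta_star) ** 2


def info_star_numeric(theta_star, eps=1e-5):
    """Same quantity from the definition (dP*/dtheta*)^2 / [P*(1 - P*)]."""
    p = irf(h(theta_star))
    dp = (irf(h(theta_star + eps)) - irf(h(theta_star - eps))) / (2.0 * eps)
    return dp * dp / (p * (1.0 - p))


for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, info_star(x), info_star_numeric(x))
```

Because the multiplier depends on θ⋆, the two information curves generally peak at trait levels that do not correspond to each other under h.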

slide-17
SLIDE 17

17/78

Metric Transformations

[Figure: panel A, probability against θ; panel B, probability against θ*.]

slide-18
SLIDE 18

18/78

Metric Transformations

[Figure: panel C, information against θ; panel D, information against θ*.]

slide-19
SLIDE 19

19/78

Relative Efficiency

The relative efficiency of two information functions does not change with metric transformations.1

$$\mathrm{RE} = \frac{I_1^\star(\theta_n^\star)}{I_2^\star(\theta_n^\star)} = \frac{I_1(\theta_n)}{I_2(\theta_n)}$$

The relative information provided by each item is invariant to monotonic transformations of the latent trait. The maximally informative item for a trait level is invariant to metric transformations.

1Lord (1974, 1980, p. 89)

slide-20
SLIDE 20

20/78

Relative Efficiency

[Figure: panel A, probability against θ; panel B, probability against θ*.]

slide-21
SLIDE 21

21/78

Relative Efficiency

[Figure: panel C, information against θ; panel D, information against θ*.]

slide-22
SLIDE 22

22/78

Why Specify IRT on a Transformed Metric?

  • Parsimony (avoid multi-step analyses)
  • Many scale transformations (e.g., quadratic) do not enforce monotonicity
  • Computerized adaptive testing (CAT)
  • Many item selection and termination rules are metric-dependent
  • CAT requires computationally efficient methods
  • No need to repeatedly solve for transformed quantities
  • Statistical properties (e.g., bias) of θ̂ can change with metric transformations1
  • Appropriately account for measurement error when evaluating the relationship between the latent variable and external variables

1Yi et al. (2001)

slide-23
SLIDE 23

23/78

Desiderata for a Flexible-Metric IRT

  • Continuous, invertible metric transformations
  • Flexible, ability to express any continuous monotonic transformation
  • Model parameters that are readily portable to new contexts
  • Closed-form derivatives for computing information, standard errors, trait estimates

  • Reduction to commonly used IRT models (Rasch, 2PL, 3PL, etc.)
slide-24
SLIDE 24

24/78

1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions

slide-25
SLIDE 25

25/78

Filtered Monotonic Polynomial IRT

Proposed as a new NIRT model by Liang & Browne (2015). Based on the work of Elphinstone (1983, 1985).

$$P_i(\theta) = H[m_i(\theta)] = \{1 + \exp[-m_i(\theta)]\}^{-1}$$

where

$$m_i(\theta) = b_{0i} + b_{1i}\theta + b_{2i}\theta^2 + \cdots + b_{2k_i+1,i}\theta^{2k_i+1}$$

  • $\mathbf{b}_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$: item parameters/polynomial coefficients
  • $k_i$: item complexity parameter; higher $k_i$ → greater flexibility
  • If $k_i = 0$, FMP reduces to 2PL (slope-intercept parameterization)
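
A minimal sketch of evaluating an FMP item response function from its polynomial coefficients. The function name `fmp_irf` and the example coefficients are hypothetical and chosen only for illustration; the code does not check that the polynomial is monotonic.

```python
import math


def fmp_irf(theta, b):
    """Evaluate P(theta) = 1 / (1 + exp(-m(theta))).

    b holds the coefficients (b0, b1, ..., b_{2k+1}) in ascending powers,
    so it must contain an even number (2k + 2) of elements.
    """
    if len(b) < 2 or len(b) % 2 != 0:
        raise ValueError("b must contain 2k + 2 coefficients")
    m = 0.0
    for coef in reversed(b):  # Horner's rule for the polynomial m(theta)
        m = m * theta + coef
    return 1.0 / (1.0 + math.exp(-m))


# k = 0: identical to a 2PL in slope-intercept form (intercept -0.5, slope 1.5)
print(fmp_irf(1.0, [-0.5, 1.5]))

# k = 1: a cubic m(theta); its derivative 1 + 0.6 * theta^2 is positive everywhere
print(fmp_irf(1.0, [0.0, 1.0, 0.0, 0.2]))
```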
slide-26
SLIDE 26

26/78

Filtered Monotonic Polynomial IRT

With high enough $k_i$, FMP can closely approximate any IRF that meets the MHM assumptions. Closeness of approximation can be characterized by the root integrated mean squared error (RIMSE)1:

$$\mathrm{RIMSE}_i = \left\{\int \left[\hat{P}_i(\theta) - P_i(\theta)\right]^2 g(\theta)\,d\theta\right\}^{1/2}$$

where $g(\theta)$ is the standard normal density.

1Ramsay (1991)
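
A minimal sketch of computing RIMSE numerically, assuming the square-root form implied by the name and a trapezoidal rule on [−6, 6]. The two curves compared below (a hypothetical 3PL "true" curve and a hand-picked 2PL, not a fitted FMP curve) are illustrative assumptions only.

```python
import math


def std_normal_pdf(x):
    """Standard normal density g(theta)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def rimse(p_hat, p_true, lo=-6.0, hi=6.0, n=2001):
    """Root integrated mean squared error between two IRFs.

    Integrates [p_hat - p_true]^2 * g(theta) with the trapezoidal rule.
    """
    step = (hi - lo) / (n - 1)
    total = 0.0
    for j in range(n):
        theta = lo + j * step
        weight = 0.5 if j in (0, n - 1) else 1.0  # trapezoid end points
        diff = p_hat(theta) - p_true(theta)
        total += weight * diff * diff * std_normal_pdf(theta)
    return math.sqrt(total * step)


def true_irf(theta):
    """Hypothetical 3PL curve (a = 1.3, b = 0.2, c = 0.15)."""
    return 0.15 + 0.85 / (1.0 + math.exp(-1.3 * (theta - 0.2)))


def approx_irf(theta):
    """Hand-picked 2PL curve used as the approximation."""
    return 1.0 / (1.0 + math.exp(-(0.2 + 1.0 * theta)))


print(rimse(approx_irf, true_irf))
```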

slide-27
SLIDE 27

27/78

Example FMP Approximations

RIMSEi = {.034, .034, .004} for ki = {0, 1, 2}

[Figure, panel A: four-parameter model IRF (probability against θ), showing the true curve and FMP approximations with ki = 0, 1, 2.]

slide-28
SLIDE 28

28/78

Example FMP Approximations

RIMSEi = {.031, .006, .004} for ki = {0, 1, 2}

[Figure, panel B: logistic positive exponent model IRF (probability against θ), showing the true curve and FMP approximations with ki = 0, 1, 2.]

slide-29
SLIDE 29

29/78

Example FMP Approximations

RIMSEi = {.091, .025, .007} for ki = {0, 1, 2}

[Figure, panel C: mixture normal model IRF (probability against θ), showing the true curve and FMP approximations with ki = 0, 1, 2.]

slide-30
SLIDE 30

30/78

Example FMP Approximations

RIMSEi = {.085, .018, .010} for ki = {0, 1, 2}

[Figure, panel D: FMP IRF with ki = 8 (probability against θ), showing the true curve and FMP approximations with ki = 0, 1, 2.]

slide-31
SLIDE 31

31/78

Ensuring Monotonicity

Need to ensure that $P_i(\theta)$ is a monotonic function of θ. If $m_i(\theta)$ is monotonic, then $P_i(\theta)$ is also monotonic. $m_i(\theta)$ is monotonic iff its first derivative,

$$\frac{\partial m_i(\theta)}{\partial\theta} = a_{0i} + a_{1i}\theta + \cdots + a_{2k_i,i}\theta^{2k_i},$$

is nonnegative at all θ.

slide-32
SLIDE 32

32/78

A Parameter Transformation

$\mathbf{b}_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$ are item parameters/polynomial coefficients. Let $b_{0i} = \xi_i$ and $b_{si} = a_{s-1,i}/s$ for $s = 1, 2, \ldots, 2k_i + 1$. $\mathbf{a}_i = (a_{0i}, a_{1i}, \ldots, a_{2k_i,i})'$ are polynomial coefficients of $\partial m_i(\theta)/\partial\theta$.

slide-33
SLIDE 33

33/78

A Parameter Transformation

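Recall: we want $\partial m_i(\theta)/\partial\theta = a_{0i} + a_{1i}\theta + \cdots + a_{2k_i,i}\theta^{2k_i}$ to be nonnegative everywhere. If $\partial m_i(\theta)/\partial\theta$ is nonnegative everywhere, then we can write

$$\frac{\partial m_i(\theta)}{\partial\theta} = \begin{cases} \lambda_i \prod_{s=1}^{k_i}\left[1 - 2\alpha_{si}\theta + (\alpha_{si}^2 + \beta_{si})\theta^2\right] & \text{if } k_i \ge 1 \\ \lambda_i & \text{if } k_i = 0, \end{cases}$$

where $\lambda_i \ge 0$ and $\beta_{si} \ge 0$, $s = 1, \ldots, k_i$. Transformed parameters defined on the entire real line: $\omega_i = \ln(\lambda_i)$ and $\tau_{si} = \ln(\beta_{si})$.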

slide-34
SLIDE 34

34/78

A Parameter Transformation

To fit the FMP model, estimate $\boldsymbol{\gamma}_i = (\xi_i, \omega_i, \alpha_{1i}, \tau_{1i}, \alpha_{2i}, \tau_{2i}, \ldots, \alpha_{k_i,i}, \tau_{k_i,i})'$ instead of $\mathbf{b}_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$ to ensure monotonically increasing IRFs. Details of FMP model estimation can be found elsewhere.1

1Falk & Cai (2016a, 2016b); Feuerstahler (2016); Liang & Browne (2015)
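
A minimal sketch of mapping the unconstrained parameters back to polynomial coefficients, assuming one (α, τ) pair per quadratic factor, λ = exp(ω), and β = exp(τ). The function names are hypothetical. The derivative polynomial is built by multiplying out the quadratic factors and then integrated term by term with b_s = a_{s−1}/s.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def gamma_to_b(xi, omega, alphas, taus):
    """Convert (xi, omega, alpha_s, tau_s) to coefficients (b0, ..., b_{2k+1})."""
    if len(alphas) != len(taus):
        raise ValueError("need one tau for each alpha")
    a = [math.exp(omega)]  # derivative polynomial starts as the constant lambda
    for alpha, tau in zip(alphas, taus):
        beta = math.exp(tau)
        # Each factor 1 - 2*alpha*theta + (alpha^2 + beta)*theta^2 is positive
        # everywhere because its discriminant is -4*beta < 0.
        a = poly_mul(a, [1.0, -2.0 * alpha, alpha * alpha + beta])
    # Integrate the derivative: b0 = xi and b_s = a_{s-1} / s.
    return [xi] + [a[s - 1] / s for s in range(1, len(a) + 1)]


print(gamma_to_b(0.5, math.log(2.0), [], []))        # k = 0
print(gamma_to_b(0.0, 0.0, [0.4], [math.log(0.5)]))  # k = 1
```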

slide-35
SLIDE 35

35/78

Linking

IRT models need to be identified.

  • PIRT: Specifying IRF shape (up to a linear transformation)
  • NIRT: Specifying latent trait distribution

In IRT-based linking, item and person parameters are linearly transformed so that multiple test calibrations share the same underlying metric. Linking resolves the metric differences that result from differences in model identification restrictions.1 Linking allows researchers to pass among models that make equivalent predictions.

1van der Linden & Barrett (2015)

slide-36
SLIDE 36

36/78

Linking

Recall: when the chosen form of a PIRT model is appropriate for all items, then θ is determined up to a linear transformation. Nonlinear monotonic transformations of θ do not affect model predictions. Nonlinear transformations of θ result in a different IRF form. Under the FMP model, both linear and nonlinear transformations of the θ metric can be modeled explicitly.

  • ⋆: parameters on the transformed metric
  • If $k_i = k_i^\star$: linear transformation
  • If $k_i \ne k_i^\star$: nonlinear transformation

slide-37
SLIDE 37

37/78

Linear Linking

[Figure: panel A, probability against θ; panel B, probability against θ*.]

slide-38
SLIDE 38

38/78

Nonlinear Linking

[Figure: panel A, probability against θ; panel B, probability against θ*.]

slide-39
SLIDE 39

39/78

Linear Linking

[Figure: panel C, linking function with axes θ and θ*; panel D, densities g(θ) and g(h(θ*)) against θ.]

slide-40
SLIDE 40

40/78

Nonlinear Linking

[Figure: panel C, linking function with axes θ and θ*; panel D, densities g(θ) and g(h(θ*)) against θ.]

slide-41
SLIDE 41

41/78

Linear Linking with FMP

Linking relationships are defined in the population. Goal: find $\mathbf{b}_i^\star$ from $\mathbf{b}_i$.

If the metric transformation is linear, then $\theta = t_0 + t_1\theta^\star$, where $t_0$ and $t_1$ are the linking parameters.

slide-42
SLIDE 42

42/78

Linear Linking with FMP

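Fundamental equation:

$$P(y_{in} = 1 \mid \theta_n, \mathbf{b}_i) = P(y_{in} = 1 \mid \theta_n^\star, \mathbf{b}_i^\star)$$

$$\left\{1 + \exp\left[-\sum_{r=0}^{2k_i+1} b_{ri}\theta^r\right]\right\}^{-1} = \left\{1 + \exp\left[-\sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}\right]\right\}^{-1}$$

$$\sum_{r=0}^{2k_i+1} b_{ri}\theta^r = \sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}$$

$$\vdots$$

$$\sum_{r=0}^{2k_i+1} b_{ri}\sum_{u=0}^{r}\binom{r}{u}(t_1\theta^\star)^{r-u}\,t_0^{u} = \sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}$$

Strategy: Matching polynomial coefficients.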

slide-43
SLIDE 43

43/78

Linear Linking with FMP

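From previous slide:

$$\sum_{r=0}^{2k_i+1} b_{ri}\sum_{u=0}^{r}\binom{r}{u}(t_1\theta^\star)^{r-u}\,t_0^{u} = \sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}$$

Result:

$$b_{si}^\star = \sum_{r=s}^{2k_i+1}\binom{r}{r-s}\,t_1^{s}\,t_0^{r-s}\,b_{ri} \quad \text{for } s = 0, \ldots, 2k_i^\star + 1.$$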
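
A minimal sketch of the linear linking result, using only the binomial formula for the transformed coefficients. The function names and the example values of b, t0, and t1 are hypothetical. The final lines check that the polynomial on the θ metric and the transformed polynomial on the θ⋆ metric agree at a matched pair of trait values.

```python
import math


def link_linear(b, t0, t1):
    """Coefficients b* on the theta* metric when theta = t0 + t1 * theta*."""
    n = len(b)
    b_star = []
    for s in range(n):
        total = 0.0
        for r in range(s, n):
            total += math.comb(r, r - s) * t1 ** s * t0 ** (r - s) * b[r]
        b_star.append(total)
    return b_star


def poly_eval(coefs, x):
    """Evaluate a polynomial with ascending coefficients (Horner's rule)."""
    value = 0.0
    for coef in reversed(coefs):
        value = value * x + coef
    return value


b = [0.2, 1.0, -0.1, 0.3]  # hypothetical k = 1 item
t0, t1 = 0.5, 2.0          # hypothetical linking parameters
b_star = link_linear(b, t0, t1)

theta_star = 0.7
theta = t0 + t1 * theta_star
print(poly_eval(b, theta), poly_eval(b_star, theta_star))
```

Because a linear transformation leaves the polynomial degree unchanged, `b_star` has the same length as `b`.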

slide-44
SLIDE 44

44/78

Nonlinear Linking with FMP

Let $k_i^\star > k_i$ for all $i = 1, \ldots, I$.

This relationship is a mathematical property of the FMP model, and need not hold when FMP curves are estimated. Linking coefficients $t_0, t_1, \ldots, t_{2k_\theta+1}$ define the metric transformation:

$$\theta = h(\theta^\star) = \sum_{l=0}^{2k_\theta+1} t_l\,\theta^{\star l}$$

slide-45
SLIDE 45

45/78

Nonlinear Linking with FMP

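Suppose $m_i(\theta) = \sum_{s=0}^{2k_i+1} b_{si}\theta^s$ and $m_i^\star(\theta^\star) = \sum_{s=0}^{2k_i^\star+1} b_{si}^\star\theta^{\star s}$.

By substitution,

$$m_i(\theta) = \sum_{s=0}^{2k_i+1} b_{si}\left(\sum_{l=0}^{2k_\theta+1} t_l\,\theta^{\star l}\right)^{s}$$

Fundamental equation: $m_i(\theta) = m_i^\star(\theta^\star)$, that is,

$$\sum_{s=0}^{2k_i+1} b_{si}\left(\sum_{l=0}^{2k_\theta+1} t_l\,\theta^{\star l}\right)^{s} = \sum_{s=0}^{2k_i^\star+1} b_{si}^\star\theta^{\star s}$$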

slide-46
SLIDE 46

46/78

Transformed Item Complexities

Three complexities:

  • $k_i$
  • $k_\theta$
  • $k_i^\star$

Matching coefficients from the fundamental equation gives

$$2k_i^\star + 1 = (2k_\theta + 1)(2k_i + 1),$$

which implies that

$$k_i^\star = \frac{(2k_\theta + 1)(2k_i + 1) - 1}{2} = 2k_ik_\theta + k_i + k_\theta.$$

Again, this relationship holds in the population.

slide-47
SLIDE 47

47/78

Linking with FMP: General Solution

The transformation from $\mathbf{b}_i$ to $\mathbf{b}_i^\star$ can be expressed in matrix notation:

$$\mathbf{b}_i^\star = \mathbf{W}\mathbf{b}_i$$

Define $\mathbf{W}$ with dimension $(2k_i^\star + 2) \times (2k_i + 2)$:

$$\mathbf{W} = \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_{2k_i+2} \end{bmatrix},$$

where column $s$ holds $\mathbf{w}_s$ followed by zeros. $\mathbf{w}_s$ is of length $(s - 1)(2k_\theta + 1) + 1$, $s = 1, \ldots, 2k_i + 2$.

slide-48
SLIDE 48

48/78

Linking with FMP: General Solution

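The $\mathbf{w}_s$ vectors are found recursively. Define $\mathbf{V}^{(s)}$ with dimension $(s(2k_\theta + 1) + 1) \times (2k_\theta + 2)$: column $j$ of $\mathbf{V}^{(s)}$ contains $\mathbf{w}_s$ shifted down by $j - 1$ rows, with zeros elsewhere.

Set $\mathbf{w}_1 = 1$ and $\mathbf{w}_2 = \mathbf{t} = (t_0, t_1, \ldots, t_{2k_\theta+1})'$. Then, $\mathbf{w}_s = \mathbf{V}^{(s-1)}\mathbf{t}$.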

slide-49
SLIDE 49

49/78

Linking with FMP: General Solution

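The matrix-based solution has been verified for $k_i \le 10$ and $k_\theta \le 10$. Linear linking is the $k_\theta = 0$ special case. The general solution can also be used to find $\mathbf{b}_i$ from $\mathbf{b}_i^\star$:

$$\mathbf{b}_i = (\mathbf{W}'\mathbf{W})^{-1}\mathbf{W}'\mathbf{b}_i^\star$$

If $\mathbf{W}$ is of full column rank, then $(\mathbf{W}'\mathbf{W})$ is invertible.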
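
A minimal sketch of the general solution, under the assumption that each w_s holds the coefficients of h(θ⋆) raised to the power s − 1, so that the recursion for w_s amounts to multiplying the previous polynomial by t. Function names and example values are hypothetical, and only the forward direction (b to b⋆) is shown.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def poly_eval(coefs, x):
    """Evaluate a polynomial with ascending coefficients (Horner's rule)."""
    value = 0.0
    for coef in reversed(coefs):
        value = value * x + coef
    return value


def linking_matrix(t, k_item):
    """Build W: column s holds the coefficients of h^(s-1), zero-padded."""
    n_cols = 2 * k_item + 2
    n_rows = (len(t) - 1) * (2 * k_item + 1) + 1  # equals 2 * k_star + 2
    cols = [[1.0]]                                # w_1 = 1
    for _ in range(n_cols - 1):
        cols.append(poly_mul(cols[-1], t))        # next power of h
    return [[cols[c][r] if r < len(cols[c]) else 0.0 for c in range(n_cols)]
            for r in range(n_rows)]


def transform(W, b):
    """Compute b_star = W b."""
    return [sum(row[c] * b[c] for c in range(len(b))) for row in W]


t = [0.1, 1.0, 0.0, 0.2]   # hypothetical k_theta = 1 transformation
b = [0.2, 1.0, -0.1, 0.3]  # hypothetical k = 1 item
W = linking_matrix(t, k_item=1)
b_star = transform(W, b)

x = 0.7
print(len(W), len(W[0]))                                # rows, columns of W
print(poly_eval(b_star, x), poly_eval(b, poly_eval(t, x)))
```

With k = 1 and k_θ = 1 the transformed complexity is 2·1·1 + 1 + 1 = 4, so W has 2·4 + 2 = 10 rows.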

slide-50
SLIDE 50

50/78

Two Uses for the FMP Linking Equations

Linking FMP item parameters (equating separate calibrations)

  • $\mathbf{b}_i$ and $\mathbf{b}_i^\star$ known
  • $\mathbf{t}$ unknown

Transforming the item response model

  • $\mathbf{b}_i$ and $\mathbf{t}$ known
  • $\mathbf{b}_i^\star$ unknown

slide-51
SLIDE 51

51/78

1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions

slide-52
SLIDE 52

52/78

Equating Separate Calibrations

Linear and nonlinear linking coefficients t can be estimated using existing methods:

  • IRF-based Haebara (1980) method
  • TRF-based Stocking-Lord (1983) method

Need to ensure that t defines a monotonic metric transformation. Monotonicity can be enforced using the same parameter transformations described earlier.

slide-53
SLIDE 53

53/78

Equating Separate Calibrations: Monotonicity

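$h(\theta^\star)$ is monotonically increasing if and only if its first derivative with respect to $\theta^\star$ is nonnegative everywhere:

$$\frac{\partial h(\theta^\star)}{\partial\theta^\star} = a_{0\theta} + a_{1\theta}\theta^\star + \cdots + a_{2k_\theta,\theta}\theta^{\star 2k_\theta}$$

Instead of estimating $\mathbf{t} = (t_0, t_1, t_2, \ldots, t_{2k_\theta+1})'$, estimating $\boldsymbol{\gamma}_\theta = (\xi_\theta, \omega_\theta, \alpha_{1\theta}, \tau_{1\theta}, \ldots, \alpha_{k_\theta,\theta}, \tau_{k_\theta,\theta})'$ ensures monotonicity.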

slide-54
SLIDE 54

54/78

Transforming the Item Response Model

Recall:

  • $\mathbf{b}_i$ and $\mathbf{t}$ known
  • $\mathbf{b}_i^\star$ unknown

$\mathbf{b}_i$ are 2PL item parameters (slope-intercept form).

What is $\mathbf{t}$?

1 Functional transformations

  • True-score metric (expected number-correct)
  • ACT scores (arcsine transformation of number-correct scores)

2 Transformations estimated from data

  • Grade-equivalent scores
  • Linear relationship with an external variable
  • Transformation to normality/symmetry
slide-55
SLIDE 55

55/78

Transforming the Item Response Model

The FMP model includes the 2PL as a special case. A guessing-added FMP model1 contains the 3PL as a special case:

$$P(y_{in} = 1 \mid \theta, \mathbf{b}_i, c_i) = c_i + (1 - c_i)H[m_i(\theta)]$$

$c_i$ is unaffected by metric transformations. The linking transformations hold for the guessing-added model.

1Falk & Cai (2016a)

slide-56
SLIDE 56

56/78

Transforming the Item Response Model

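A composite FMP model:

$$P(y_{in} = 1 \mid \theta_n^\star, \mathbf{b}_i, \mathbf{t}) = H[m_i(\theta)] = H\{m_i[h(\theta_n^\star)]\},$$

where

$$\theta = h(\theta^\star) = \sum_{l=0}^{2k_\theta+1} t_l\,\theta^{\star l}$$

defines the linking transformation.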

slide-57
SLIDE 57

57/78

Approximating a Known Function

Monotonic polynomial regression1 can be used to estimate strictly monotonic regression curves, for example with the MonoPoly package2 in R.

Example: expected number-correct (true score):

  • $T = \sum_{i=1}^{I} P_i(\theta) = \mathrm{TRF}(\theta)$
  • $\theta = \mathrm{TRF}^{-1}(T)$

$\mathrm{TRF}^{-1}(T)$ generally does NOT have a closed-form expression.

1Hawkins (1994) 2Murray et al. (2016)
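
A minimal sketch of why a numerical method is needed: since the inverse test response function has no closed form, θ can be recovered from T by bisection, which works because the TRF is strictly increasing. The three 3PL items below are hypothetical placeholders, not Lord's (1968) parameters, and bisection is used here only for illustration; it is not the polynomial approximation described on these slides.

```python
import math

# Hypothetical 3PL items as (a, b, c) = (discrimination, difficulty, guessing).
ITEMS = [(1.0, -1.0, 0.20), (1.5, 0.0, 0.20), (0.8, 1.0, 0.25)]


def trf(theta, items=ITEMS):
    """Test response function: expected number-correct score at theta."""
    return sum(c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
               for a, b, c in items)


def trf_inverse(score, items=ITEMS, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve trf(theta) = score for theta by bisection on [lo, hi]."""
    if not trf(lo, items) < score < trf(hi, items):
        raise ValueError("score is outside the range reachable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if trf(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


score = trf(0.4)
print(score, trf_inverse(score))
```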

slide-58
SLIDE 58

58/78

Approximating a Known Function

3PL item parameters for 80 items taken from Lord (1968). Steps:

1 Draw a large number of θ values in a range of interest.
  → 1,000 values evenly spaced from −7 to 7
2 Use the known function to transform the θ values to T.
3 Use MonoPoly to regress θ on T.
4 Choose a kθ value that provides a “good enough” approximation.
5 The estimated coefficients t can be plugged into the composite FMP model.
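
The first four steps can be sketched for the simplest case, kθ = 0, where the regression of θ on T is an ordinary straight line and needs no monotonicity constraint beyond a positive slope. This is a stand-in under stated assumptions: the items are hypothetical (not Lord's 80 items), and plain least squares replaces MonoPoly, which would be needed for kθ ≥ 1.

```python
import math

# Hypothetical 3PL items as (a, b, c); stand-ins for the 80 items on the slide.
ITEMS = [(1.0, -1.0, 0.20), (1.5, 0.0, 0.20), (0.8, 1.0, 0.25), (1.2, 0.5, 0.15)]


def trf(theta):
    """Known function: expected number-correct score at theta."""
    return sum(c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
               for a, b, c in ITEMS)


# Step 1: 1,000 theta values evenly spaced from -7 to 7.
thetas = [-7.0 + 14.0 * j / 999 for j in range(1000)]

# Step 2: transform each theta to a true score T.
scores = [trf(th) for th in thetas]

# Step 3 (k_theta = 0 only): least-squares line theta = t0 + t1 * T.
n = len(thetas)
mean_T = sum(scores) / n
mean_theta = sum(thetas) / n
sxy = sum((T - mean_T) * (th - mean_theta) for T, th in zip(scores, thetas))
sxx = sum((T - mean_T) ** 2 for T in scores)
t1 = sxy / sxx
t0 = mean_theta - t1 * mean_T

# Step 4: summarize the fit by the maximum absolute residual.
residuals = [th - (t0 + t1 * T) for T, th in zip(scores, thetas)]
max_abs_resid = max(abs(r) for r in residuals)
print(t0, t1, max_abs_resid)
```

A large maximum residual at kθ = 0 is the signal to move to a higher kθ with a monotone polynomial fit.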

slide-59
SLIDE 59

59/78

Approximating a Known Function

Maximum absolute residual for kθ ∈ {0}: {2.65}

[Figure: residuals against true score (T) for kθ = 0.]

slide-60
SLIDE 60

60/78

Approximating a Known Function

Maximum absolute residual for kθ ∈ {0, 1}: {2.65, 1.51}

[Figure: residuals against true score (T) for kθ = 0, 1.]

slide-61
SLIDE 61

61/78

Approximating a Known Function

Maximum absolute residual for kθ ∈ {0, 1, 2}: {2.65, 1.51, 1.38}

[Figure: residuals against true score (T) for kθ = 0, 1, 2.]

slide-62
SLIDE 62

62/78

Approximating a Known Function

Maximum absolute residual for kθ ∈ {0, 1, 2, 3}: {2.65, 1.51, 1.38, 0.63}

[Figure: residuals against true score (T) for kθ = 0, 1, 2, 3.]

slide-63
SLIDE 63

63/78

Approximating a Known Function

Maximum absolute residual for kθ ∈ {0, 1, 2, 3, 4}: {2.65, 1.51, 1.38, 0.63, 0.42}

[Figure: residuals against true score (T) for kθ = 0, 1, 2, 3, 4.]

slide-64
SLIDE 64

64/78

Approximating a Known Function

Maximum absolute residual for kθ ∈ {0, 1, 2, 3, 4, 5}: {2.65, 1.51, 1.38, 0.63, 0.42, 0.29}

[Figure: residuals against true score (T) for kθ = 0, 1, 2, 3, 4, 5.]

slide-65
SLIDE 65

65/78

Approximating a Known Function

Test response function for kθ = 5 approximation:

[Figure: expected number correct against true score (T).]

slide-66
SLIDE 66

66/78

Estimated Transformations: Grade Equivalents

Grade-equivalent metrics are often estimated with a quadratic function. Monotonic polynomial regression can guarantee an order-preserving transformation. Example data1:

  • Reading ability test data
  • θ̂ values were obtained for 46,667 students in grades 1–8

  • Original 3PL item parameters not available
  • Illustrate with Lord’s 80-item test

1Courtesy of Ted Christ, Department of Educational Psychology, University of Minnesota

slide-67
SLIDE 67

67/78

Estimated Transformations: Grade Equivalents

[Figure: number of students (axis from 0 to 8,000) by grade level (1 to 8).]

slide-68
SLIDE 68

68/78

Estimated Transformations: Grade Equivalents

MonoPoly was used to regress θ̂ on grade level for several kθ values. AIC and BIC can be used to select a kθ value. In this example,

  • AIC selected kθ = 2
  • BIC selected kθ = 1

kθ = 1 model:

$$\theta = -1.678 + .766 \times \text{grade} - .097 \times \text{grade}^2 + .004 \times \text{grade}^3 + \epsilon$$

slide-69
SLIDE 69

69/78

Estimated Transformations: Grade Equivalents

[Figure: θ̂ against grade level (1 to 8), showing fitted curves for kθ = 0, 1, 2 and the empirical mean.]

slide-70
SLIDE 70

70/78

Estimated Transformations: Grade Equivalents

A: θ metric; B: grade-equivalent metric (kθ = 1)

[Figure: information and expected standard error of measurement; panel A against θ, panel B against grade level (1 to 8).]

slide-71
SLIDE 71

71/78

1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions

slide-72
SLIDE 72

72/78

Future Applications

Many current psychometric issues are heavily metric dependent

  • Measurement of growth
  • Value-added assessment
  • IRT-based DIF detection

Fitting IRT models does not (and cannot) resolve these issues. A flexible metric modeling framework provides

  • a context for understanding metric-dependent problems
  • tools for specifying IRT on user-specified metrics
slide-73
SLIDE 73

73/78

Limitations

Polynomials are notoriously unstable

  • especially at high polynomial degrees
  • especially at extreme θ values

→ numerical precision/stability should be monitored carefully

Need to account for additional errors

  • if the θ transformation is approximated by a polynomial
  • if sampled data is used to estimate the metric transformation

Some transformed metrics are item-dependent or sample-dependent

  • e.g., true-score metric depends on the items used
  • e.g., transformations to normality depend on the sample
slide-74
SLIDE 74

74/78

Future Directions

Extensions to other data types/models

  • Upper and lower asymptote parameters1
  • Polytomous item responses2
  • Multiple dimensions

Extensions to SEM

  • θ may be nonlinearly related to outcome variables
  • The model should be specified on the desired metric in order to appropriately account for measurement error

Alternative methods for metric transformations

  • Non-logistic filter function?
  • Alternatives to polynomials?

1Falk & Cai (2016b) 2Falk & Cai (2016a)

slide-75
SLIDE 75

75/78

Big Picture

Lord’s (1975) indeterminacy: $P_i(\theta) = P_i^\star[h^{-1}(\theta)] = P_i^\star(\theta^\star)$

  • It is important to make assumptions explicit
  • IRT θ̂ values are often used without awareness of the assumptions made

  • Often, a transformation of θ is more practical than θ

Through metric transformations, Lord’s indeterminacy can be an asset to good data analysis, rather than a limitation.

slide-76
SLIDE 76

76/78

Learn More

  • Feuerstahler, L. M. (under review). Metric transformations and the filtered monotonic polynomial IRT model. Psychometrika.
  • flexmet package for R: https://github.com/leahfeuerstahler/flexmet
  • Feuerstahler, L. M. (2016). Exploring alternate latent trait metrics with the filtered monotonic polynomial IRT model. (Doctoral dissertation). University of Minnesota.

THANK YOU

slide-77
SLIDE 77

77/78

References I

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (pp. 508–600). Washington, DC: American Council on Education.

Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34, 197–211.

Camilli, G. (1988). Scale shrinkage and the estimation of latent distribution parameters. Journal of Educational Statistics, 13, 227–241.

Elphinstone, C. D. (1983). A target distribution model for nonparametric density estimation. Communication in Statistics–Theory and Methods, 12, 161–198.

Elphinstone, C. D. (1985). A method of distribution and density estimation (Unpublished dissertation). University of South Africa, Pretoria, South Africa.

Falk, C. F., & Cai, L. (2016a). Maximum marginal likelihood estimation of a monotonic polynomial generalized partial credit model with applications to multiple group analysis. Psychometrika, 81, 434–460.

Falk, C. F., & Cai, L. (2016b). Semiparametric item response functions in the context of guessing. Journal of Educational Measurement, 53, 229–247.

Feuerstahler, L. M. (2016). Exploring alternate latent trait metrics with the filtered monotonic polynomial IRT model. (Doctoral dissertation). University of Minnesota.

Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.

Hawkins, D. M. (1994). Fitting monotonic polynomials to data. Computational Statistics, 9, 233–247.

Jones, L. V. (1971). The nature of measurement. In R. L. Thorndike (Ed.), Educational measurement (pp. 335–355). Washington, DC: American Council on Education.

Kolen, M. J. (1988). Defining score scales in relation to measurement error. Journal of Educational Measurement, 25, 97–110.

Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking (3rd ed.). New York: Springer.

Liang, L., & Browne, M. W. (2015). A quasi-parametric method for fitting flexible item response functions. Journal of Educational and Behavioral Statistics, 40, 5–34.

Lord, F. M. (1968). An analysis of the verbal scholastic aptitude test using Birnbaum’s three-parameter logistic model. Educational and Psychological Measurement, 28, 989–1020.

Lord, F. M. (1974). The relative efficiency of two tests as a function of ability level. Psychometrika, 39, 351–358.

Lord, F. M. (1975). The ‘ability’ scale in item characteristic curve theory. Psychometrika, 40, 205–217.

Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research. The Hague: Mouton.

slide-78
SLIDE 78

78/78

References II

Murray, K., Müller, S., & Turlach, B. A. (2016). Fast and flexible methods for monotone polynomial fitting. Journal of Statistical Computation and Simulation, 86, 2946–2966.

Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.

Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.

Ramsay, J. O., & Wiberg, M. (2017). A strategy for replacing sum scoring. Journal of Educational and Behavioral Statistics, 42, 282–307.

Samejima, F. (1979). Constant information model: A new, promising item characteristic function. (Research Rep. No. 79-1). Knoxville: University of Tennessee, Department of Psychology.

Schulz, E. M., & Nicewander, W. A. (1997). Grade equivalent and IRT representations of growth. Journal of Educational Measurement, 34, 315–331.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.

Stocking, M. L. (1996). An alternative method for scoring adaptive tests. Journal of Educational and Behavioral Statistics, 21, 365–389.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

Thurstone, L. L. (1925). A method of scaling psychological and educational tests. The Journal of Educational Psychology, 18, 505–524.

van der Linden, W., & Barrett, M. D. (2016). Linking item response model parameters. Psychometrika, 81, 650–673.

Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23, 299–325.

Yi, Q., Wang, T., & Ban, J.-C. (2001). Effects of scale transformation and test-termination rule on the precision of ability estimation in computerized adaptive testing. Journal of Educational Measurement, 38, 267–292.