1/78
Flexible Latent Trait Metrics: An Application of the Filtered Monotonic Polynomial Item Response Model
Leah Feuerstahler
University of California, Berkeley
2/78
Overview
Premise: In many applications of item response theory (IRT), reported scores are nonlinear transformations of the IRT θ estimates. Goal: Develop an IRT framework such that θ is the continuous metric on which scores are reported.
3/78
Overview
[Figure: item response functions plotted against θ (left) and against true score T (right); y-axis: probability.]
4/78
Overview
1 Why
- Why is the IRT θ metric often transformed?
- Why is an IRT for transformed metrics needed?
2 How
- Filtered monotonic polynomial (FMP) item response model
- Item parameter linking
3 Applications
- Functional metric transformations
- Estimated metric transformations
4 Considerations, Limitations, Future Directions
5/78
1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions
6/78
Scaling
[T]he process of associating numbers or other ordered indicators with the performance of examinees.1
Scaled scores are often transformations of number-correct scores or IRT θ̂.
What are the criteria for selecting a scale? Examples:
1 Facilitates appropriate interpretation by the public
2 Anchored to external indicators
3 Consistent with intuitions about how variables should behave
1Kolen and Brennan (2014, p. 371)
7/78
Scaling
1 Facilitates appropriate interpretation by the public
- Normalized scores for a representative sample
- z-scores (mean 0, sd 1)
- T-scores (mean 50, sd 10)
- Scores range from 0 to test length, or 0 to 100
- Domain Scores1
- Optimal Scores2
- Equated number-correct3
- Constant measurement error
- ACT scores (arcsine transformation of number-correct)4
- Constant IRT information5
1Bock, Thissen, & Zimowski (1997) 2Ramsay & Wiberg (2017) 3Stocking (1996) 4Kolen (1988) 5Samejima (1979)
8/78
Scaling
2 Anchored to external indicators
- Expected number-correct scores1
- Grade-equivalent scores2
- Equating with a different test form
- Linear relationship with other variables (intended use)3
- Dollars are nonlinearly related to quality of life4
- Typed words per minute is nonlinearly related to practice/effort5
1Stocking (1996) 2Schulz & Nicewander (1997) 3Nunnally (1967, p. 28) 4Jones (1971) 5Angoff (1971, pp. 509-510)
9/78
Scaling
3 Consistent with beliefs about how variables should behave
- Normally distributed ability1
- Uncorrelated difficulty and discrimination parameters2
- Does variability of achievement increase or decrease with grade level?
- With Thurstonian scales, variability usually increases with grade level
- IRT scales often exhibit “scale shrinkage”3
- “Armchair” theorizing can lead to conflicting answers4
- Interval level measurement “in some sense”5
1Thurstone (1925) 2Lord (1975) 3Camilli (1988) 4Yen (1986, p. 312) 5Kolen & Brennan (2014, p. 374)
10/78
Interval vs. Ordinal
Stevens (1946): Nominal, Ordinal, Interval, Ratio
Scale type defined in terms of admissible operations.
Ordinal | Interval
Any monotonic transformation | Only linear transformations
Invariant ordering of observations | Meaningful intervals
Median, Percentiles | Mean, Standard deviation
Hardness of minerals | Temperature
11/78
Interval vs. Ordinal
Interval-level measurement is highly desirable for educational and psychological tests. What is actually MEANT by interval-level measurement?
- Only linear transformations are admissible given the IRT model
- The (Rasch) model fits
- Declaring that scores are equal-interval ‘in some sense’1
- Scores are linearly related to the underlying construct2
1Kolen & Brennan (2014, p. 374) 2Yen (1986)
12/78
Where Does the IRT θ Come From?
What do item response models assume? Simple case: Mokken’s (1971) monotone homogeneity model (MHM) assumes only
1 Unidimensionality 2 Local independence 3 Monotonicity
If the MHM assumptions hold, individuals can be ordered uniquely.
13/78
Where Does the IRT θ Come From?
Under the MHM assumptions, any monotonic function of the latent trait implies an equally admissible item response model.1
Suppose an IRT model with item response function (IRF) $P_i(\theta)$. For a continuous monotonic function $h$, where $\theta^\star = h^{-1}(\theta)$, another item response model exists such that
$$P_i(\theta) = P_i^\star[h^{-1}(\theta)] = P_i^\star(\theta^\star).$$
Any reason to prefer θ to θ⋆?
1Lord (1975)
14/78
Where Does the IRT θ Come From?
Under the MHM assumptions, an infinite number of IRT models can fit data equally well. Identification restrictions are needed in practice. Two main solutions:
1 Parametric IRT (PIRT)
- Specify the IRF shape
- (Usually) determines scale up to linear transformations
- Assumes that the chosen IRF shape(s) fits all scale items
2 Nonparametric IRT (NIRT)
- Specify the latent trait distribution (e.g., standard normal)
- Often conditions on (a monotonic transformation of) sum scores
- Nonparametrically estimates the IRF shape
15/78
Nonlinear Transformations of the IRT Metric
What does not change?
- Ordering of examinees
- Percentile rankings
- Relative efficiency of item response curves
What does change?
- Item and test information
- Standard errors
- Confidence intervals
- Reliability
16/78
Item Information
Metric transformations can have dramatic effects on information functions.
Lord (1974, p. 353):
$$I_i(\theta) = I_i(\theta^\star) \Big/ \left[\frac{\partial h(\theta^\star)}{\partial \theta^\star}\right]^2$$
The trait level that maximizes $I_i(\theta)$ need not be the corresponding trait level that maximizes $I_i(\theta^\star)$.
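The relation between information on the two metrics (information on θ equals information on θ⋆ divided by the squared derivative of h) can be checked numerically. The following is a minimal sketch, not the author's code: the 2PL coefficients `b0`, `b1` and the cubic transformation `h` are hypothetical values chosen only for illustration, and the θ⋆-metric information is computed from its definition with a finite-difference derivative.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 2PL item in slope-intercept form: m(theta) = b0 + b1 * theta
b0, b1 = -0.5, 1.2

def h(theta_star):
    # Hypothetical monotone metric transformation, theta = h(theta_star)
    return 0.2 + 0.8 * theta_star + 0.1 * theta_star**3

def h_prime(theta_star):
    # Analytic derivative of h; positive everywhere, so h is increasing
    return 0.8 + 0.3 * theta_star**2

def info_theta(theta):
    # 2PL item information on the theta metric
    p = logistic(b0 + b1 * theta)
    return b1**2 * p * (1.0 - p)

def info_theta_star(theta_star, eps=1e-5):
    # Information on the theta_star metric from its definition,
    # (dP/dtheta_star)^2 / [P (1 - P)], using a central difference
    p = logistic(b0 + b1 * h(theta_star))
    dp = (logistic(b0 + b1 * h(theta_star + eps))
          - logistic(b0 + b1 * h(theta_star - eps))) / (2.0 * eps)
    return dp**2 / (p * (1.0 - p))

theta_star = np.linspace(-2.0, 2.0, 9)
lhs = info_theta(h(theta_star))
rhs = info_theta_star(theta_star) / h_prime(theta_star)**2
max_gap = np.max(np.abs(lhs - rhs))
print(max_gap)
```

The two sides agree up to finite-difference error, while the location of the information peak generally differs between the two metrics.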
17/78
Metric Transformations
[Figure: item response functions on the θ metric (panel A) and on the θ* metric (panel B); y-axis: probability.]
18/78
Metric Transformations
[Figure: item information functions on the θ metric (panel C) and on the θ* metric (panel D); y-axis: information.]
19/78
Relative Efficiency
The relative efficiency of two information functions does not change with metric transformations.1
$$RE = \frac{I_1^\star(\theta_n^\star)}{I_2^\star(\theta_n^\star)} = \frac{I_1(\theta_n)}{I_2(\theta_n)}$$
The relative information provided by each item is invariant to monotonic transformations of the latent trait.
The maximally informative item for a trait level is invariant to metric transformations.
1Lord (1974, 1980, p. 89)
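The invariance of relative efficiency follows because both information functions are rescaled by the same squared derivative of the transformation, which cancels in the ratio. A small sketch under assumed values (the two 2PL items and the transformation `h` are hypothetical):

```python
import numpy as np

def info_2pl(theta, b0, b1):
    # 2PL item information, slope-intercept form
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * theta)))
    return b1**2 * p * (1.0 - p)

def h(theta_star):
    # Hypothetical monotone transformation, theta = h(theta_star)
    return theta_star + 0.2 * theta_star**3

def h_prime(theta_star):
    return 1.0 + 0.6 * theta_star**2

item1 = (0.0, 1.5)    # hypothetical (b0, b1)
item2 = (-1.0, 0.8)   # hypothetical (b0, b1)

theta_star = np.linspace(-2.0, 2.0, 5)
theta = h(theta_star)

# Relative efficiency on the theta metric
re_theta = info_2pl(theta, *item1) / info_2pl(theta, *item2)

# Information on the theta_star metric: I*(theta*) = I(theta) * h'(theta*)^2
info1_star = info_2pl(theta, *item1) * h_prime(theta_star)**2
info2_star = info_2pl(theta, *item2) * h_prime(theta_star)**2
re_star = info1_star / info2_star
```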
20/78
Relative Efficiency
[Figure: item response functions on the θ metric (panel A) and on the θ* metric (panel B); y-axis: probability.]
21/78
Relative Efficiency
[Figure: item information functions on the θ metric (panel C) and on the θ* metric (panel D); y-axis: information.]
22/78
Why Specify IRT on a Transformed Metric?
- Parsimony (avoid multi-step analyses)
- Many scale transformations (e.g., quadratic) do not enforce monotonicity
- Computerized adaptive testing (CAT)
- Many item selection and termination rules are metric-dependent
- CAT requires computationally efficient methods
- No need to repeatedly solve for transformed quantities
- Statistical properties (e.g., bias) of θ̂ can change with metric transformations1
- Appropriately account for measurement error when evaluating the relationship between the latent variable and external variables
1Yi et al. (2001)
23/78
Desiderata for a Flexible-Metric IRT
- Continuous, invertible metric transformations
- Flexible, ability to express any continuous monotonic transformation
- Model parameters that are readily portable to new contexts
- Closed-form derivatives for computing information, standard errors, trait estimates
- Reduction to commonly used IRT models (Rasch, 2PL, 3PL, etc.)
24/78
1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions
25/78
Filtered Monotonic Polynomial IRT
Proposed as a new NIRT model by Liang & Browne (2015). Based on the work of Elphinstone (1983, 1985).
$$P_i(\theta) = H[m_i(\theta)] = \{1 + \exp[-m_i(\theta)]\}^{-1}$$
where
$$m_i(\theta) = b_{0i} + b_{1i}\theta + b_{2i}\theta^2 + \cdots + b_{2k_i+1,i}\theta^{2k_i+1}$$
- $b_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$: item parameters/polynomial coefficients
- $k_i$: item complexity parameter; higher $k_i$ → greater flexibility
- If $k_i = 0$, FMP reduces to 2PL (slope-intercept parameterization)
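The FMP item response function is a logistic filter applied to an odd-degree polynomial. A minimal sketch of evaluating it follows; the function name `fmp_irf` and the coefficient values are hypothetical and are not taken from the flexmet package.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fmp_irf(theta, b):
    """FMP item response function.

    b holds the polynomial coefficients (b_0, b_1, ..., b_{2k+1})
    in ascending order of power.
    """
    m = P.polyval(theta, b)            # m_i(theta)
    return 1.0 / (1.0 + np.exp(-m))    # logistic filter H[m_i(theta)]

theta = np.linspace(-3.0, 3.0, 7)

# k_i = 0: two coefficients, identical to a 2PL in slope-intercept form
b_2pl = [-0.5, 1.2]
p_2pl = fmp_irf(theta, b_2pl)

# k_i = 1: four coefficients; here m'(theta) = 1.2 + 0.9 * theta^2 > 0,
# so the curve is monotonically increasing
b_k1 = [-0.5, 1.2, 0.0, 0.3]
p_k1 = fmp_irf(theta, b_k1)
```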
26/78
Filtered Monotonic Polynomial IRT
With high enough $k_i$, FMP can closely approximate any IRF that meets the MHM assumptions.
Closeness of approximation can be characterized by the root integrated mean squared error (RIMSE)1:
$$\mathrm{RIMSE}_i = \sqrt{\int [\hat{P}_i(\theta) - P_i(\theta)]^2 g(\theta)\, d\theta}$$
where $g(\theta)$ is the standard normal density.
1Ramsay (1991)
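RIMSE, the square root of the squared IRF discrepancy integrated against the standard normal density, can be approximated by simple numerical quadrature. This is a sketch under stated assumptions: the integration range, grid size, and the example curves are hypothetical choices, and the trapezoid rule stands in for whatever quadrature the original analyses used.

```python
import numpy as np

def rimse(p_hat, p_true, lo=-6.0, hi=6.0, n=2001):
    """Approximate RIMSE between two IRFs given as callables of theta."""
    theta = np.linspace(lo, hi, n)
    g = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)  # standard normal density
    integrand = (p_hat(theta) - p_true(theta))**2 * g
    step = theta[1] - theta[0]
    # Trapezoid rule on an evenly spaced grid
    integral = step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return np.sqrt(integral)

def p_true(theta):
    # Hypothetical "true" curve: FMP item with k_i = 1
    return 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * theta + 0.3 * theta**3)))

def p_approx(theta):
    # Hypothetical approximating curve: 2PL (k_i = 0)
    return 1.0 / (1.0 + np.exp(-(-0.5 + 1.8 * theta)))

value = rimse(p_approx, p_true)
```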
27/78
Example FMP Approximations
RIMSEi = {.034, .034, .004} for ki = {0, 1, 2}
[Figure, panel A: Four-Parameter Model. True IRF and FMP approximations with ki = 0, 1, 2; x-axis: θ, y-axis: probability.]
28/78
Example FMP Approximations
RIMSEi = {.031, .006, .004} for ki = {0, 1, 2}
[Figure, panel B: Logistic Positive Exponent Model. True IRF and FMP approximations with ki = 0, 1, 2; x-axis: θ, y-axis: probability.]
29/78
Example FMP Approximations
RIMSEi = {.091, .025, .007} for ki = {0, 1, 2}
[Figure, panel C: Mixture Normal Model. True IRF and FMP approximations with ki = 0, 1, 2; x-axis: θ, y-axis: probability.]
30/78
Example FMP Approximations
RIMSEi = {.085, .018, .010} for ki = {0, 1, 2}
[Figure, panel D: FMP with ki = 8. True IRF and FMP approximations with ki = 0, 1, 2; x-axis: θ, y-axis: probability.]
31/78
Ensuring Monotonicity
Need to ensure that $P_i(\theta)$ is a monotonic function of θ.
If $m_i(\theta)$ is monotonic, then $P_i(\theta)$ is also monotonic.
$m_i(\theta)$ is monotonic iff its first derivative,
$$\frac{\partial m_i(\theta)}{\partial \theta} = a_{0i} + a_{1i}\theta + \cdots + a_{2k_i,i}\theta^{2k_i},$$
is nonnegative at all θ.
32/78
A Parameter Transformation
$b_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$ are item parameters/polynomial coefficients.
Let $b_{0i} = \xi_i$ and $b_{si} = a_{s-1,i}/s$ for $s = 1, 2, \ldots, 2k_i + 1$.
$a_i = (a_{0i}, a_{1i}, \ldots, a_{2k_i,i})'$ are polynomial coefficients of $\partial m_i(\theta)/\partial \theta$.
33/78
A Parameter Transformation
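Recall: we want $\partial m_i(\theta)/\partial \theta = a_{0i} + a_{1i}\theta + \cdots + a_{2k_i,i}\theta^{2k_i}$ to be nonnegative everywhere.
If $\partial m_i(\theta)/\partial \theta$ is nonnegative everywhere, then we can write
$$\frac{\partial m_i(\theta)}{\partial \theta} = \begin{cases} \lambda_i \prod_{s=1}^{k_i} \left[1 - 2\alpha_{si}\theta + (\alpha_{si}^2 + \beta_{si})\theta^2\right] & \text{if } k_i \geq 1 \\ \lambda_i & \text{if } k_i = 0, \end{cases}$$
where $\lambda_i \geq 0$ and $\beta_{si} \geq 0$, $s = 1, \ldots, k_i$.
Transformed parameters defined on the entire real line: $\omega_i = \ln(\lambda_i)$ and $\tau_{si} = \ln(\beta_{si})$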
34/78
A Parameter Transformation
To fit the FMP model, estimate $\gamma_i = (\xi_i, \omega_i, \alpha_{1i}, \tau_{1i}, \alpha_{2i}, \tau_{2i}, \ldots, \alpha_{k_i,i}, \tau_{k_i,i})'$ instead of $b_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$, to ensure monotonically increasing IRFs. Both vectors have $2k_i + 2$ elements.
Details of FMP model estimation can be found elsewhere.1
1Falk & Cai (2016a, 2016b); Feuerstahler (2016); Liang & Browne (2015)
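The parameter transformation can be sketched as a mapping from the unconstrained parameters (ξ, ω, α, τ) to the polynomial coefficients b: multiply out the quadratic factors to get the derivative coefficients a, then integrate term by term. The function name `gamma_to_b` and the example values are hypothetical; this is an illustration of the algebra, not the estimation routine of any package.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def gamma_to_b(xi, omega, alpha, tau):
    """Map (xi, omega, alpha_1..k, tau_1..k) to FMP coefficients.

    Returns (b, a): b has 2k + 2 elements (ascending powers) and
    a holds the 2k + 1 coefficients of the derivative of m(theta).
    """
    a = np.array([np.exp(omega)])                 # lambda = exp(omega) > 0
    for alpha_s, tau_s in zip(alpha, tau):
        beta_s = np.exp(tau_s)                    # beta = exp(tau) > 0
        quad = np.array([1.0, -2.0 * alpha_s, alpha_s**2 + beta_s])
        a = P.polymul(a, quad)                    # multiply in one quadratic factor
    # Integrate: b_0 = xi and b_s = a_{s-1} / s for s >= 1
    b = np.concatenate(([xi], a / np.arange(1, a.size + 1)))
    return b, a

# k = 0 reduces to a 2PL: intercept xi, slope exp(omega)
b_k0, a_k0 = gamma_to_b(0.3, np.log(1.5), [], [])

# k = 1 with hypothetical parameter values
b_k1, a_k1 = gamma_to_b(-0.5, 0.0, [0.4], [-1.0])
grid = np.linspace(-10.0, 10.0, 2001)
deriv_on_grid = P.polyval(grid, a_k1)
```

Each quadratic factor equals (1 − αθ)² + βθ², which is why any real-valued α and τ yield a nonnegative derivative.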
35/78
Linking
IRT models need to be identified.
- PIRT: Specifying IRF shape (up to a linear transformation)
- NIRT: Specifying latent trait distribution
In IRT-based linking, item and person parameters are linearly transformed so that multiple test calibrations share the same underlying metric. Linking resolves the metric differences that result from differences in model identification restrictions.1 Linking allows researchers to pass among models that make equivalent predictions.
1van der Linden & Barrett (2016)
36/78
Linking
Recall: when the chosen form of a PIRT model is appropriate for all items, then θ is determined up to a linear transformation. Nonlinear monotonic transformations of θ do not affect model predictions. Nonlinear transformations of θ result in a different IRF form. Under the FMP model, both linear and nonlinear transformations of the θ metric can be modeled explicitly.
- ⋆: parameters on the transformed metric
- If $k_i = k_i^\star$: linear transformation
- If $k_i \neq k_i^\star$: nonlinear transformation
37/78
Linear Linking
[Figure: item response functions on the θ metric (panel A) and on the linearly transformed θ* metric (panel B); y-axis: probability.]
38/78
Nonlinear Linking
[Figure: item response functions on the θ metric (panel A) and on the nonlinearly transformed θ* metric (panel B); y-axis: probability.]
39/78
Linear Linking
[Figure: panel C plots θ* against θ for the linear transformation; panel D plots the densities g(θ) and g(h(θ*)).]
40/78
Nonlinear Linking
[Figure: panel C plots θ* against θ for the nonlinear transformation; panel D plots the densities g(θ) and g(h(θ*)).]
41/78
Linear Linking with FMP
Linking relationships are defined in the population.
Goal: find $b_i^\star$ from $b_i$.
If the metric transformation is linear, then $\theta = t_0 + t_1\theta^\star$, where $t_0$ and $t_1$ are the linking parameters.
42/78
Linear Linking with FMP
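Fundamental equation:
$$P(y_{in} = 1 \mid \theta_n, b_i) = P(y_{in} = 1 \mid \theta_n^\star, b_i^\star)$$
$$\left[1 + \exp\left(-\sum_{r=0}^{2k_i+1} b_{ri}\theta^r\right)\right]^{-1} = \left[1 + \exp\left(-\sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}\right)\right]^{-1}$$
$$\sum_{r=0}^{2k_i+1} b_{ri}\theta^r = \sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}$$
$$\vdots$$
$$\sum_{r=0}^{2k_i+1} b_{ri} \sum_{u=0}^{r} \binom{r}{u} (t_1\theta^\star)^{r-u} t_0^u = \sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}$$
Strategy: Matching polynomial coefficients.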
43/78
Linear Linking with FMP
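From previous slide:
$$\sum_{r=0}^{2k_i+1} b_{ri} \sum_{u=0}^{r} \binom{r}{u} (t_1\theta^\star)^{r-u} t_0^u = \sum_{s=0}^{2k_i+1} b_{si}^\star\theta^{\star s}$$
Result:
$$b_{si}^\star = \sum_{r=s}^{2k_i+1} \binom{r}{r-s} t_1^s t_0^{r-s} b_{ri} \quad \text{for } s = 0, \ldots, 2k_i^\star + 1.$$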
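The coefficient-matching result for linear linking can be verified directly: the transformed coefficients should reproduce the same polynomial value when evaluated at the corresponding θ⋆. A minimal sketch, with a hypothetical function name `linear_link` and hypothetical item and linking parameters:

```python
import numpy as np
from math import comb
from numpy.polynomial import polynomial as P

def linear_link(b, t0, t1):
    """Transform FMP coefficients b (theta metric) to b_star
    (theta_star metric) under theta = t0 + t1 * theta_star."""
    n = len(b)
    b_star = np.zeros(n)
    for s in range(n):
        for r in range(s, n):
            # Coefficient of theta_star^s contributed by b_r * theta^r
            b_star[s] += comb(r, r - s) * t1**s * t0**(r - s) * b[r]
    return b_star

b = np.array([-0.5, 1.2, 0.0, 0.3])   # hypothetical item with k_i = 1
t0, t1 = 0.4, 1.5                      # hypothetical linking parameters
b_star = linear_link(b, t0, t1)

theta_star = np.linspace(-2.0, 2.0, 9)
theta = t0 + t1 * theta_star
gap = np.max(np.abs(P.polyval(theta, b) - P.polyval(theta_star, b_star)))
```

Because the polynomials agree, the item response probabilities agree as well, which is the fundamental equation of linking.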
44/78
Nonlinear Linking with FMP
Let $k_i^\star > k_i$ for all $i = 1, \ldots, I$.
This relationship is a mathematical property of the FMP model, and need not hold when FMP curves are estimated.
Linking coefficients $t_0, t_1, \ldots, t_{2k_\theta+1}$ define the metric transformation:
$$\theta = h(\theta^\star) = \sum_{l=0}^{2k_\theta+1} t_l\theta^{\star l}$$
45/78
Nonlinear Linking with FMP
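Suppose $m_i(\theta) = \sum_{s=0}^{2k_i+1} b_{si}\theta^s$ and $m_i^\star(\theta^\star) = \sum_{s=0}^{2k_i^\star+1} b_{si}^\star\theta^{\star s}$.
By substitution,
$$m_i(\theta) = \sum_{s=0}^{2k_i+1} b_{si} \left(\sum_{l=0}^{2k_\theta+1} t_l\theta^{\star l}\right)^s$$
Fundamental equation:
$$m_i(\theta) = m_i^\star(\theta^\star)$$
$$\sum_{s=0}^{2k_i+1} b_{si} \left(\sum_{l=0}^{2k_\theta+1} t_l\theta^{\star l}\right)^s = \sum_{s=0}^{2k_i^\star+1} b_{si}^\star\theta^{\star s}$$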
46/78
Transformed Item Complexities
Three complexities:
- $k_i$
- $k_i^\star$
- $k_\theta$
Matching coefficients from the fundamental equation gives
$$2k_i^\star + 1 = (2k_\theta + 1)(2k_i + 1),$$
which implies that
$$k_i^\star = \frac{(2k_\theta + 1)(2k_i + 1) - 1}{2} = 2k_ik_\theta + k_i + k_\theta.$$
Again, this relationship holds in the population.
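As a quick worked instance of the complexity relationship (values chosen only for illustration): an item with $k_i = 1$ under a transformation with $k_\theta = 1$ gives

$$2k_i^\star + 1 = (2 \cdot 1 + 1)(2 \cdot 1 + 1) = 9 \quad\Rightarrow\quad k_i^\star = 4 = 2(1)(1) + 1 + 1,$$

so a cubic item polynomial composed with a cubic metric transformation yields a degree-9 polynomial on the θ⋆ metric. With $k_\theta = 0$ (a linear transformation) the formula returns $k_i^\star = k_i$, so the polynomial degree is unchanged.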
47/78
Linking with FMP: General Solution
The transformation from $b_i$ to $b_i^\star$ can be expressed in matrix notation.
$$b_i^\star = W b_i$$
Define $W$ with dimension $(2k_i^\star + 2) \times (2k_i + 2)$.
$$W = [\, w_1 \;\; w_2 \;\; \cdots \;\; w_{2k_i+2} \,],$$
with each column padded below by zeros to length $2k_i^\star + 2$.
$w_s$ is of length $(s - 1)(2k_\theta + 1) + 1$, $s = 1, \ldots, 2k_i + 2$.
48/78
Linking with FMP: General Solution
The $w_s$ vectors are found recursively.
Define $V^{(s)}$ with dimension $(s(2k_\theta + 1) + 1) \times (2k_\theta + 2)$ as the banded matrix whose $j$th column contains $w_s$ shifted down by $j - 1$ rows, with zeros elsewhere.
Set $w_1 = 1$ and $w_2 = t = (t_0, t_1, \ldots, t_{2k_\theta+1})'$.
Then, $w_s = V^{(s-1)} t$.
49/78
Linking with FMP: General Solution
The matrix-based solution has been verified for $k_i \leq 10$ and $k_\theta \leq 10$.
Linear linking is the $k_\theta = 0$ special case.
The general solution can also be used to find $b_i$ from $b_i^\star$:
$$b_i = (W'W)^{-1} W' b_i^\star$$
If $W$ is of full column rank, then $(W'W)$ is invertible.
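The general solution can be sketched numerically. Multiplying by the shifted-column matrix V is the same operation as polynomial multiplication by h, so the sketch below builds each w vector with `np.convolve` rather than forming V explicitly; that substitution, the function name `build_w`, and all numeric values are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def build_w(t, k_i):
    """Build W so that b_star = W @ b.

    t = (t_0, ..., t_{2 k_theta + 1}) defines theta = h(theta_star);
    W has shape (2 k_star + 2, 2 k_i + 2).
    """
    t = np.asarray(t, dtype=float)
    n_cols = 2 * k_i + 2
    n_rows = (n_cols - 1) * (t.size - 1) + 1
    W = np.zeros((n_rows, n_cols))
    w = np.array([1.0])              # w_1 = 1 (coefficients of h^0)
    for s in range(n_cols):
        W[:w.size, s] = w            # column holds w, zero-padded below
        w = np.convolve(w, t)        # next w: coefficients of the next power of h
    return W

b = np.array([-0.5, 1.2, 0.0, 0.3])  # hypothetical item, k_i = 1
t = np.array([0.1, 0.9, 0.0, 0.2])   # hypothetical transformation, k_theta = 1
W = build_w(t, k_i=1)
b_star = W @ b

# Check the fundamental equation m(h(theta_star)) = m_star(theta_star)
theta_star = np.linspace(-1.5, 1.5, 7)
theta = P.polyval(theta_star, t)
gap = np.max(np.abs(P.polyval(theta, b) - P.polyval(theta_star, b_star)))

# Recover b from b_star by least squares, equivalent to (W'W)^{-1} W' b_star
b_back = np.linalg.lstsq(W, b_star, rcond=None)[0]
```

Here W has 10 rows and 4 columns, matching $k_i^\star = 4$ for $k_i = k_\theta = 1$.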
50/78
Two Uses for the FMP Linking Equations
Linking FMP item parameters (equating separate calibrations)
- $b_i$ and $b_i^\star$ known
- $t$ unknown
Transforming the item response model
- $b_i$ and $t$ known
- $b_i^\star$ unknown
51/78
1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions
52/78
Equating Separate Calibrations
Linear and nonlinear linking coefficients t can be estimated using existing methods:
- IRF-based Haebara (1980) method
- TRF-based Stocking-Lord (1983) method
Need to ensure that t defines a monotonic metric transformation. Monotonicity can be enforced using the same parameter transformations described earlier.
53/78
Equating Separate Calibrations: Monotonicity
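$h(\theta^\star)$ is monotone if and only if $\partial h(\theta^\star)/\partial \theta^\star$ is nonnegative everywhere:
$$\frac{\partial h(\theta^\star)}{\partial \theta^\star} = a_{0\theta} + a_{1\theta}\theta^\star + \cdots + a_{2k_\theta,\theta}\theta^{\star 2k_\theta}$$
Instead of estimating $t = (t_0, t_1, t_2, \ldots, t_{2k_\theta+1})'$, estimating $\gamma_\theta = (\xi_\theta, \omega_\theta, \alpha_{1\theta}, \tau_{1\theta}, \ldots, \alpha_{k_\theta,\theta}, \tau_{k_\theta,\theta})'$ ensures monotonicity.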
54/78
Transforming the Item Response Model
Recall:
- $b_i$ and $t$ known
- $b_i^\star$ unknown
$b_i$ are 2PL item parameters (slope-intercept form)
What is $t$?
1 Functional transformations
- True-score metric (expected number-correct)
- ACT scores (arcsine transformation of number-correct scores)
2 Transformations estimated from data
- Grade-equivalent scores
- Linear relationship with an external variable
- Transformation to normality/symmetry
55/78
Transforming the Item Response Model
The FMP model includes the 2PL as a special case.
A guessing-added FMP model1 contains the 3PL as a special case:
$$P(y_{in} \mid \theta, b_i, c_i) = c_i + (1 - c_i)H[m_i(\theta)]$$
$c_i$ is unaffected by metric transformations.
The linking transformations hold for the guessing-added model.
1Falk & Cai (2016a)
56/78
Transforming the Item Response Model
A composite FMP model:
$$P(y_{in} \mid \theta_n^\star, b_i, t) = H[m_i(\theta)] = H\{m_i[h(\theta_n^\star)]\},$$
where
$$\theta = h(\theta^\star) = \sum_{l=0}^{2k_\theta+1} t_l\theta^{\star l}$$
defines the linking transformation.
57/78
Approximating a Known Function
Monotonic polynomial regression1 can be used to estimate strictly monotonic regression curves (MonoPoly package2 in R).
Example: expected number-correct (true score):
- $T = \sum_{i=1}^{I} P_i(\theta) = \mathrm{TRF}(\theta)$
- $\theta = \mathrm{TRF}^{-1}(T)$
$\mathrm{TRF}^{-1}(T)$ generally does NOT have a closed-form expression.
1Hawkins (1994) 2Murray et al. (2016)
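Because the test response function is monotone but has no closed-form inverse, a true score can be mapped back to θ only numerically. The sketch below uses bisection on three hypothetical 3PL items in slope-intercept form; the item parameters, search bounds, and tolerance are all assumed values for illustration.

```python
import numpy as np

# Hypothetical 3PL items, one row per item: (b0, b1, c)
items = np.array([
    [-0.5, 1.0, 0.20],
    [ 0.3, 1.4, 0.15],
    [ 1.0, 0.8, 0.25],
])

def trf(theta):
    """Test response function: sum of 3PL item response functions."""
    theta = np.atleast_1d(theta)[:, None]
    b0, b1, c = items.T
    p = c + (1.0 - c) / (1.0 + np.exp(-(b0 + b1 * theta)))
    return p.sum(axis=1)

def trf_inverse(T, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve TRF(theta) = T by bisection (TRF is increasing in theta).

    T must lie between TRF(lo) and TRF(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if trf(mid)[0] < T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta0 = 0.7
T0 = trf(theta0)[0]
theta_recovered = trf_inverse(T0)
```

Repeating such a root search for every examinee is the kind of cost that a polynomial approximation to the inverse is meant to avoid.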
58/78
Approximating a Known Function
3PL item parameters for 80 items taken from Lord (1968). Steps:
1 Draw a large number of θ values in a range of interest.
→ 1,000 values evenly spaced from -7 to 7
2 Use the known function to transform the θ values to T.
3 Use MonoPoly to regress θ on T.
4 Choose a kθ value that provides a “good enough” approximation.
5 The estimated coefficients t can be plugged into the composite FMP model.
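The steps above can be sketched as follows. Two substitutions are made and should be kept in mind: the items are three hypothetical 3PL items rather than Lord's (1968) 80 items, and ordinary least-squares polynomial fitting (`numpy.polynomial.Polynomial.fit`) stands in for MonoPoly, so monotonicity is only checked after fitting rather than enforced.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical 3PL items (b0, b1, c); NOT Lord's (1968) item parameters
items = np.array([
    [-0.5, 1.0, 0.20],
    [ 0.3, 1.4, 0.15],
    [ 1.0, 0.8, 0.25],
])

def trf(theta):
    theta = np.atleast_1d(theta)[:, None]
    b0, b1, c = items.T
    return (c + (1.0 - c) / (1.0 + np.exp(-(b0 + b1 * theta)))).sum(axis=1)

theta = np.linspace(-7.0, 7.0, 1000)   # step 1: grid of theta values
T = trf(theta)                          # step 2: transform theta to T

results = {}
for k_theta in (0, 1, 2):
    # Step 3: regress theta on T with a polynomial of degree 2 * k_theta + 1
    # (unconstrained least squares, unlike MonoPoly)
    fit = Polynomial.fit(T, theta, deg=2 * k_theta + 1)
    resid = theta - fit(T)
    results[k_theta] = {
        "t": fit.convert().coef,                      # coefficients on the T scale
        "sse": float(np.sum(resid**2)),
        "max_abs_resid": float(np.max(np.abs(resid))),
        # Post hoc check only: derivative nonnegative at the observed T values
        "monotone_on_grid": bool(np.all(fit.deriv()(T) >= 0)),
    }

# Step 4: compare fit summaries across k_theta to pick a "good enough" value
for k_theta, res in results.items():
    print(k_theta, res["max_abs_resid"], res["monotone_on_grid"])
```

Step 5 would then pass the chosen coefficient vector `t` to the composite model as the metric transformation, provided the fitted curve is monotone over the range of interest.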
59/78
Approximating a Known Function
Maximum absolute residual for kθ ∈ {0}: {2.65}
[Figure: residuals plotted against true score (T) for kθ = 0.]
60/78
Approximating a Known Function
Maximum absolute residual for kθ ∈ {0, 1}: {2.65, 1.51}
[Figure: residuals plotted against true score (T) for kθ = 0, 1.]
61/78
Approximating a Known Function
Maximum absolute residual for kθ ∈ {0, 1, 2}: {2.65, 1.51, 1.38}
[Figure: residuals plotted against true score (T) for kθ = 0, 1, 2.]
62/78
Approximating a Known Function
Maximum absolute residual for kθ ∈ {0, 1, 2, 3}: {2.65, 1.51, 1.38, 0.63}
[Figure: residuals plotted against true score (T) for kθ = 0, 1, 2, 3.]
63/78
Approximating a Known Function
Maximum absolute residual for kθ ∈ {0, 1, 2, 3, 4}: {2.65, 1.51, 1.38, 0.63, 0.42}
[Figure: residuals plotted against true score (T) for kθ = 0, 1, 2, 3, 4.]
64/78
Approximating a Known Function
Maximum absolute residual for kθ ∈ {0, 1, 2, 3, 4, 5}: {2.65, 1.51, 1.38, 0.63, 0.42, 0.29}
[Figure: residuals plotted against true score (T) for kθ = 0, 1, 2, 3, 4, 5.]
65/78
Approximating a Known Function
Test response function for kθ = 5 approximation:
[Figure: expected number correct plotted against true score (T).]
66/78
Estimated Transformations: Grade Equivalents
Grade-equivalent metrics are often estimated with a quadratic function. Monotonic polynomial regression can guarantee an order-preserving transformation. Example data1:
- Reading ability test data
- θ̂ values were obtained for 46,667 students in grades 1–8
- Original 3PL item parameters not available
- Illustrate with Lord’s 80-item test
1Courtesy of Ted Christ, Department of Educational Psychology, University of Minnesota
67/78
Estimated Transformations: Grade Equivalents
[Figure: number of students at each grade level, grades 1–8.]
68/78
Estimated Transformations: Grade Equivalents
MonoPoly was used to regress θ̂ on grade level for several kθ values.
AIC and BIC can be used to select a kθ value. In this example,
- AIC selected kθ = 2
- BIC selected kθ = 1
kθ = 1 model:
$$\theta = -1.678 + .766 \times \text{grade} - .097 \times \text{grade}^2 + .004 \times \text{grade}^3 + \epsilon$$
69/78
Estimated Transformations: Grade Equivalents
[Figure: θ̂ plotted against grade level, with fitted curves for kθ = 0, 1, 2 and the empirical means.]
70/78
Estimated Transformations: Grade Equivalents
A: θ metric; B: grade-equivalent metric (kθ = 1)
[Figure: test information (left axis) and expected standard error of measurement (right axis) on the θ metric (panel A) and on the grade-equivalent metric (panel B).]
71/78
1 Why 2 How 3 Applications 4 Considerations, Limitations, Future Directions
72/78
Future Applications
Many current psychometric issues are heavily metric dependent
- Measurement of growth
- Value-added assessment
- IRT-based DIF detection
Fitting IRT models does not (and cannot) resolve these issues. A flexible metric modeling framework provides
- a context for understanding metric-dependent problems
- tools for specifying IRT on user-specified metrics
73/78
Limitations
Polynomials are notoriously unstable
- especially at high polynomial degrees
- especially at extreme θ values
→ numerical precision/stability should be monitored carefully
Need to account for additional errors
- if the θ transformation is approximated by a polynomial
- if sampled data is used to estimate the metric transformation
Some transformed metrics are item-dependent or sample-dependent
- e.g., true-score metric depends on the items used
- e.g., transformations to normality depend on the sample
74/78
Future Directions
Extensions to other data types/models
- Upper and lower asymptote parameters1
- Polytomous item responses2
- Multiple dimensions
Extensions to SEM
- θ may be nonlinearly related to outcome variables
- The model should be specified on the desired metric in order to appropriately account for measurement error
Alternative methods for metric transformations
- Non-logistic filter function?
- Alternatives to polynomials?
1Falk & Cai (2016b) 2Falk & Cai (2016a)
75/78
Big Picture
Lord’s (1975) indeterminacy:
$$P_i(\theta) = P_i^\star[h^{-1}(\theta)] = P_i^\star(\theta^\star)$$
- It is important to make assumptions explicit
- IRT θ̂ are often used without realizing the assumptions made
- Often, a transformation of θ is more practical than θ
Through metric transformations, Lord’s indeterminacy can be an asset to good data analysis, rather than a limitation.
76/78
Learn More
- Feuerstahler, L. M. (under review). Metric transformations and the filtered monotonic polynomial IRT model. Psychometrika.
- flexmet package for R
https://github.com/leahfeuerstahler/flexmet
- Feuerstahler, L. M. (2016). Exploring alternate latent trait metrics with the filtered monotonic polynomial IRT model. (Doctoral dissertation). University of Minnesota.
THANK YOU
77/78
References I
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (pp. 508–600). Washington, DC: American Council on Education.
Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34, 197–211.
Camilli, G. (1988). Scale shrinkage and the estimation of latent distribution parameters. Journal of Educational Statistics, 13, 227–241.
Elphinstone, C. D. (1983). A target distribution model for nonparametric density estimation. Communications in Statistics–Theory and Methods, 12, 161–198.
Elphinstone, C. D. (1985). A method of distribution and density estimation (Unpublished dissertation). University of South Africa, Pretoria, South Africa.
Falk, C. F., & Cai, L. (2016a). Maximum marginal likelihood estimation of a monotonic polynomial generalized partial credit model with applications to multiple group analysis. Psychometrika, 81, 434–460.
Falk, C. F., & Cai, L. (2016b). Semiparametric item response functions in the context of guessing. Journal of Educational Measurement, 53, 229–247.
Feuerstahler, L. M. (2016). Exploring alternate latent trait metrics with the filtered monotonic polynomial IRT model. (Doctoral dissertation). University of Minnesota.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hawkins, D. M. (1994). Fitting monotonic polynomials to data. Computational Statistics, 9, 233–247.
Jones, L. V. (1971). The nature of measurement. In R. L. Thorndike (Ed.), Educational measurement (pp. 335–355). Washington, DC: American Council on Education.
Kolen, M. J. (1988). Defining score scales in relation to measurement error. Journal of Educational Measurement, 25, 97–110.
Kolen, M. J., & Brennan, R. L. (2014). Test Equating, Scaling, and Linking (3rd ed.). New York: Springer.
Liang, L., & Browne, M. W. (2015). A quasi-parametric method for fitting flexible item response functions. Journal of Educational and Behavioral Statistics, 40, 5–34.
Lord, F. M. (1968). An analysis of the verbal scholastic aptitude test using Birnbaum’s three-parameter logistic model. Educational and Psychological Measurement, 28, 989–1020.
Lord, F. M. (1974). The relative efficiency of two tests as a function of ability level. Psychometrika, 39, 351–358.
Lord, F. M. (1975). The ‘ability’ scale in item characteristic curve theory. Psychometrika, 40, 205–217.
Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research. The Hague: Mouton.
78/78
References II
Murray, K., Müller, S., & Turlach, B. A. (2016). Fast and flexible methods for monotone polynomial fitting. Journal of Statistical Computation and Simulation, 86, 2946–2966.
Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.
Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.
Ramsay, J. O., & Wiberg, M. (2017). A strategy for replacing sum scoring. Journal of Educational and Behavioral Statistics, 42, 282–307.
Samejima, F. (1979). Constant information model: A new, promising item characteristic function. (Research Rep. No. 79-1). Knoxville: University of Tennessee, Department of Psychology.
Schulz, E. M., & Nicewander, W. A. (1997). Grade equivalent and IRT representations of growth. Journal of Educational Measurement, 34, 315–331.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
Stocking, M. L. (1996). An alternative method for scoring adaptive tests. Journal of Educational and Behavioral Statistics, 21, 365–389.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. The Journal of Educational Psychology, 18, 505–524.
van der Linden, W., & Barrett, M. D. (2016). Linking item response model parameters. Psychometrika, 81, 650–673.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23, 299–325.
Yi, Q., Wang, T., & Ban, J.-C. (2001). Effects of scale transformation and test-termination rule on the precision of ability estimation in computerized adaptive testing. Journal of Educational Measurement, 38, 267–292.