SLIDE 1
Rater agreement - ordinal ratings
Karl Bang Christensen
Dept. of Biostatistics, Univ. of Copenhagen
NORDSTAT, 2012
http://biostat.ku.dk/~kach/
SLIDE 2
Rater agreement - ordinal ratings
Methods for analyzing rater agreement are well-established when ratings are dichotomous or if they can be assumed to be normally distributed.
Notation: raters $r = 1, \ldots, R$; subjects $s = 1, \ldots, S$; ratings $X_{rs} \in \{0, 1, \ldots, K\}$.
SLIDE 3
Two raters (R=2)
[Table: cross-classification of X1 and X2, with the agreement cells (A) on the diagonal]
Pr(a): observed proportion of agreement.
κ coefficient:
$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \qquad (1)$$
where
$$\Pr(e) = \sum_{k=0}^{K} \Pr(X_1 = k)\Pr(X_2 = k)$$
is the expected proportion of agreement under independence.
- Cohen. Educational and Psychological Measurement, 1960, 20:37-46.
SLIDE 4
Two raters (R=2)
The κ coefficient (1) is widely used, but
(i) the value depends on the margins and thus on the sample
(ii) if ratings are ordinal this is not taken into account
(iii) there is no rationale for saying that, e.g., κ > 0.7 is good.
The κ coefficient (1) is a marginal (population average) measure.
- Cohen. Educational and Psychological Measurement, 1960, 20:37-46.
SLIDE 5
(i) value depends on margins
Three tables, each with agreement on 70/100 of the subjects:

          Table 1           Table 2           Table 3
          X2=+   X2=−       X2=+   X2=−       X2=+   X2=−
X1=+       20     20         10     20          5     20
X1=−       10     50         10     60         10     65
          κ=0.35            κ=0.21            κ=0.08
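The dependence on the margins can be checked numerically. The following is a minimal sketch of formula (1), assuming each of the three 2x2 tables has the cells listed in the code (the cell for X1 = −, X2 = + is taken to be 10 so that each table sums to 100). The function name `cohen_kappa` is illustrative and not from the slides.

```python
def cohen_kappa(table):
    """Cohen's kappa, formula (1), for a square table of counts (rows X1, columns X2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed proportion of agreement: mass on the diagonal.
    p_a = sum(table[i][i] for i in range(k)) / n
    # Marginal distributions of the two raters.
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Expected agreement under independence.
    p_e = sum(row[i] * col[i] for i in range(k))
    return (p_a - p_e) / (1 - p_e)


# Assumed cell counts: agreement is 70/100 in every table, only the margins differ.
tables = [
    [[20, 20], [10, 50]],
    [[10, 20], [10, 60]],
    [[5, 20], [10, 65]],
]
kappas = [cohen_kappa(t) for t in tables]
for t, kap in zip(tables, kappas):
    print(t, round(kap, 2))
```

Under these assumed counts the observed agreement is identical across tables, so any difference in κ comes from Pr(e) alone.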
SLIDE 6
(ii) if ratings are ordinal this is not taken into account
Weighted κ coefficient, with weights

   A       0.75A   0.50A   0.25A
   0.75A   A       0.75A   0.50A
   0.50A   0.75A   A       0.75A
   0.25A   0.50A   0.75A   A

Arbitrary weights (two standards implemented in SAS)
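A weighted κ replaces the indicator of exact agreement by a weight for each cell. The sketch below assumes the usual form κ_w = (Σ w_ij p_ij − Σ w_ij p_i+ p_+j) / (1 − Σ w_ij p_i+ p_+j) and uses weights that drop by 0.25 per category of distance, as in the weight table on this slide. The function names are hypothetical, and the counts are the 150-subject, two-rater example table that appears later in these slides.

```python
def weighted_kappa(table, weights):
    """Weighted kappa for a square table of counts and a matching weight matrix."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p = [[table[i][j] / n for j in range(k)] for i in range(k)]
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and expected agreement.
    p_a = sum(weights[i][j] * p[i][j] for i in range(k) for j in range(k))
    p_e = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_a - p_e) / (1 - p_e)


def distance_weights(k, step=0.25):
    """Weights 1, 1 - step, 1 - 2*step, ... by distance from the diagonal."""
    return [[1 - step * abs(i - j) for j in range(k)] for i in range(k)]


def identity_weights(k):
    """Weights that only credit exact agreement (gives the unweighted kappa)."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


# Example table from the slides: rows X2 = 0, 1, 2; columns X1 = 0, 1, 2.
counts = [[9, 10, 1], [22, 59, 14], [3, 25, 7]]
kappa_plain = weighted_kappa(counts, identity_weights(3))
kappa_weighted = weighted_kappa(counts, distance_weights(3))
print(round(kappa_plain, 3), round(kappa_weighted, 3))
```

The choice of `step` is exactly the arbitrariness the slide points out: a different weight matrix gives a different coefficient for the same table.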
SLIDE 7
Marginal homogeneity
Beyond agreement we would want
$$\Pr(X_1 = k) = \Pr(X_2 = k) \quad \text{for all } k = 0, 1, \ldots, K.$$
Bowker's test of symmetry tests this hypothesis. For K = 1 this is McNemar's test, based on the conditional probability
$$\frac{\Pr(X_1 = 1, X_2 = 0)}{\Pr(X_1 = 1, X_2 = 0) + \Pr(X_1 = 0, X_2 = 1)}$$
which equals 1/2 under marginal homogeneity.
- McNemar. Psychometrika, 1947, 12:153-157.
SLIDE 8
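Continuous data: regression model
$$X_{rs} = \delta_r + \gamma_s + \epsilon_{rs}, \qquad \epsilon_{rs} \sim N(0, \omega^2)$$
Limits of agreement / Bland-Altman plot:
$$X_{1s} - X_{2s} = \delta_1 - \delta_2 + (\epsilon_{1s} - \epsilon_{2s}), \qquad \epsilon_{1s} - \epsilon_{2s} \sim N(0, 2\omega^2)$$
95% reference interval for the difference: $\delta_1 - \delta_2 \pm 1.96\sqrt{2}\,\omega$.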
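As an illustration of the limits of agreement, the sketch below simulates ratings from the regression model and estimates the interval as the mean difference ± 1.96 standard deviations of the differences. All numbers (rater effects, ω, sample size, seed) are hypothetical values chosen for the illustration; none come from the slides.

```python
import random
import statistics


def limits_of_agreement(x1, x2, z=1.96):
    """Return (lower, bias, upper) for the differences x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    bias = statistics.mean(d)   # estimates delta_1 - delta_2
    sd = statistics.stdev(d)    # estimates sqrt(2) * omega
    return bias - z * sd, bias, bias + z * sd


# Hypothetical simulation settings.
random.seed(1)
S = 200
delta1, delta2, omega = 0.5, 0.0, 1.0
gamma = [random.gauss(0.0, 2.0) for _ in range(S)]  # subject effects
x1 = [delta1 + g + random.gauss(0.0, omega) for g in gamma]
x2 = [delta2 + g + random.gauss(0.0, omega) for g in gamma]

lower, bias, upper = limits_of_agreement(x1, x2)
print(round(lower, 2), round(bias, 2), round(upper, 2))
```

The subject effects cancel in the difference, so the interval reflects only the rater bias and the residual variation.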
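SLIDE 9
Ordinal data: regression models / IRT
Divide-by-total models:
$$\Pr(X_{rs} = x \mid \theta) = \frac{\exp\left(x\theta_s - \sum_{k=1}^{x}\beta_{rk}\right)}{\sum_{l=0}^{K}\exp\left(l\theta_s - \sum_{k=1}^{l}\beta_{rk}\right)}$$
$$\Pr(X_{rs} = x \mid \theta) = \frac{\exp\left(\alpha_r\left(x\theta_s - \sum_{k=1}^{x}\beta_{rk}\right)\right)}{\sum_{l=0}^{K}\exp\left(\alpha_r\left(l\theta_s - \sum_{k=1}^{l}\beta_{rk}\right)\right)}$$
(K = 1: logistic regression)
Threshold models:
$$\Pr(X_{rs} = x \mid \theta) = \Phi(..) - \Phi(..)$$
$$\Pr(X_{rs} = x \mid \theta) = \operatorname{expit}(..) - \operatorname{expit}(..)$$
- Thissen, Steinberg. Psychometrika, 1986, 51:567-577.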
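The divide-by-total form is straightforward to evaluate. The sketch below computes the category probabilities for one rater, assuming the denominator sums the numerator over all categories 0, ..., K, and checks the remark that K = 1 reduces to logistic regression. The function name `gpcm_probs` and the parameter values are illustrative only.

```python
import math


def gpcm_probs(theta, beta, alpha=1.0):
    """Pr(X = x | theta) for x = 0..K, where K = len(beta); alpha = 1 gives the first model."""
    K = len(beta)
    # Exponent for category x: alpha * (x * theta - sum of the first x thresholds).
    expo = [alpha * (x * theta - sum(beta[:x])) for x in range(K + 1)]
    shift = max(expo)  # subtract the maximum for numerical stability
    num = [math.exp(e - shift) for e in expo]
    total = sum(num)
    return [v / total for v in num]


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, chosen only to illustrate the K = 1 special case.
theta, b = 0.3, -0.82
p_binary = gpcm_probs(theta, [b])
print(p_binary[1], expit(theta - b))

p_three = gpcm_probs(theta, [-0.82, -0.75], alpha=1.5)
print([round(p, 3) for p in p_three])
```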
SLIDE 10
θ = θs: latent location of subject s
[Figure: θ axis]
SLIDE 14
θ = θs: latent location of subject s; $(\beta_{rk})_{k=1,\ldots,K}$: rater parameters
[Figure: θ axis]
SLIDE 15
Marginal homogeneity
Rater parameters $\beta_r = (\beta_{r1}, \ldots, \beta_{rK})$.
Test $H_0: \beta_r = \beta$ for all $r = 1, \ldots, R$ using a likelihood ratio test based on (2) or (3).
Example: 150 subjects, two raters

           X1 = 0   X1 = 1   X1 = 2
X2 = 0        9       10        1
X2 = 1       22       59       14
X2 = 2        3       25        7

Bowker's test: S = 8.6, df = 3, p = 0.0351.
LRT based on (2): −2 log Q = 12.2, df = 2, p = 0.0022.
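Bowker's statistic for this example can be recomputed directly from the table. The sketch below uses the standard form S = Σ_{i<j} (n_ij − n_ji)² / (n_ij + n_ji), with one degree of freedom per off-diagonal pair that has observations. The helper `chi2_sf` is an illustrative stdlib-only implementation of the chi-square upper tail for integer degrees of freedom, not a library call.

```python
import math


def bowker_test(table):
    """Bowker's test of symmetry: returns (statistic, degrees of freedom)."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            n_ij, n_ji = table[i][j], table[j][i]
            if n_ij + n_ji > 0:
                stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
                df += 1
    return stat, df


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df (recurrence on df/2)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Example table: rows X2 = 0, 1, 2; columns X1 = 0, 1, 2.
counts = [[9, 10, 1], [22, 59, 14], [3, 25, 7]]
stat, df = bowker_test(counts)
p_value = chi2_sf(stat, df)
print(round(stat, 1), df, round(p_value, 4))
```

The same helper gives the p-value of the likelihood ratio test from its statistic and degrees of freedom, e.g. `chi2_sf(12.2, 2)`.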
SLIDE 16
Quantify agreement
Randomly chosen person s with location $\theta_s = \theta$. Compute
reference interval for $|X_{1s} - X_{2s}|$
$\Pr(X_{1s} = X_{2s})$
$\Pr(|X_{1s} - X_{2s}| > 1)$
If $\theta \sim N(0, \omega^2)$: computations for 'typical' person θ = 0. Compare to population distribution of θ.
SLIDE 17
Example: X ∈ {0, 1, 2}
Marginal homogeneity $H_0: \beta_{1k} = \beta_{2k}$, $k = 1, 2$, accepted. Common estimate $(\beta_1, \beta_2) = (-0.82, -0.75)$.
Table

Pr(X1 = 0, X2 = 0|θ)   Pr(X1 = 0, X2 = 1|θ)   Pr(X1 = 0, X2 = 2|θ)
Pr(X1 = 1, X2 = 0|θ)   Pr(X1 = 1, X2 = 1|θ)   Pr(X1 = 1, X2 = 2|θ)
Pr(X1 = 2, X2 = 0|θ)   Pr(X1 = 2, X2 = 1|θ)   Pr(X1 = 2, X2 = 2|θ)

(Normal latent distribution: typical patient, θ = 0)
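The 3x3 table of joint probabilities can be filled in for the typical patient. The sketch below is only an illustration under two assumptions that the slide does not state explicitly: the model is the first divide-by-total model with the common estimate (−0.82, −0.75), and the two ratings are independent given θ (local independence). Function and variable names are hypothetical.

```python
import math


def pcm_probs(theta, beta):
    """Pr(X = x | theta), x = 0..K, for the divide-by-total model with thresholds beta."""
    expo = [x * theta - sum(beta[:x]) for x in range(len(beta) + 1)]
    num = [math.exp(e) for e in expo]
    total = sum(num)
    return [v / total for v in num]


beta_common = (-0.82, -0.75)  # common estimate from the slide
theta = 0.0                   # 'typical' patient

p = pcm_probs(theta, beta_common)
# Assumption: ratings are independent given theta, so the joint table is an outer product.
joint = [[p[i] * p[j] for j in range(3)] for i in range(3)]

agree = sum(joint[i][i] for i in range(3))  # Pr(X1 = X2 | theta)
far_apart = joint[0][2] + joint[2][0]       # Pr(|X1 - X2| > 1 | theta)

for row in joint:
    print([round(v, 3) for v in row])
print(round(agree, 3), round(far_apart, 3))
```

Repeating the computation over a grid of θ values gives the agreement probability as a function of the latent location.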
SLIDE 18
Example: X ∈ {0, 1, 2}
Agreement: Pr((X1, X2) ∈ {(0, 0), (1, 1), (2, 2)} | θ)
SLIDE 19
Examples of clinical applications
Tönnis grade 0,1,2,3: rating of x-rays in hip surgery population.
Modified Ashworth Scale 0,1,2,3,4,5: spasticity as a complication in spinal cord lesion patients (hospital sample). Sparse tables.
Assessment of exercise-induced laryngeal obstruction 0,1,2,3: subsampling of best and worst cases.
SLIDE 20
Issues
Conditional inference if person locations cannot be assumed to be normally distributed.
Reduced rank parametrization if tables are sparse.
Interpretation on original scale.
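SLIDE 22
Ordinal data: regression models / IRT (Marginal or Conditional inference).
Divide-by-total models:
$$\Pr(X_{rs} = x \mid \theta) = \frac{\exp\left(x\theta_s - \sum_{k=1}^{x}\beta_{rk}\right)}{\sum_{l=0}^{K}\exp\left(l\theta_s - \sum_{k=1}^{l}\beta_{rk}\right)} \qquad \text{(C, M)}$$
$$\Pr(X_{rs} = x \mid \theta) = \frac{\exp\left(\alpha_r\left(x\theta_s - \sum_{k=1}^{x}\beta_{rk}\right)\right)}{\sum_{l=0}^{K}\exp\left(\alpha_r\left(l\theta_s - \sum_{k=1}^{l}\beta_{rk}\right)\right)} \qquad \text{(M)}$$
(K = 1: logistic regression)
Threshold models:
$$\Pr(X_{rs} = x \mid \theta) = \Phi(..) - \Phi(..) \qquad \text{(M)}$$
$$\Pr(X_{rs} = x \mid \theta) = \operatorname{expit}(..) - \operatorname{expit}(..) \qquad \text{(M)}$$
(C: conditional inference possible; M: marginal inference possible)
- Thissen, Steinberg. Psychometrika, 1986, 51:567-577.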
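SLIDE 23
Let $X_s = (X_{1s}, \ldots, X_{Rs})$ and $x_s = (x_{1s}, \ldots, x_{Rs})$.
Marginal inference:
$$l_M(\beta) = \sum_{s=1}^{S} \log \int \Pr(X_s = x_s \mid \theta)\,\varphi(\theta)\,d\theta \qquad (2)$$
where $\varphi$ is the density of the latent distribution of θ; similar to the model yielding limits of agreement.
Conditional inference:
$$l_C(\beta) = \sum_{s=1}^{S} \log \Pr(X_s = x_s \mid X_{1s} + \ldots + X_{Rs} = x_{1s} + \ldots + x_{Rs}) \qquad (3)$$
similar to the McNemar test.
- Bock, Aitkin. Psychometrika, 1981, 46:443-459.
- Andersen. Journal of the Royal Statistical Society B, 1972, 34:42-54.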
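Conditional inference works because, for the first divide-by-total model, conditioning on the total score removes θ from the probability. The sketch below illustrates this numerically, assuming ratings are independent given θ. The rater parameters are hypothetical values chosen for the illustration and the function names are not from the slides.

```python
import math
from itertools import product


def pcm_probs(theta, beta):
    """Pr(X = x | theta), x = 0..K, for the divide-by-total model with thresholds beta."""
    expo = [x * theta - sum(beta[:x]) for x in range(len(beta) + 1)]
    num = [math.exp(e) for e in expo]
    total = sum(num)
    return [v / total for v in num]


def joint_prob(x, betas, theta):
    """Pr(X_1 = x_1, ..., X_R = x_R | theta), assuming independence given theta."""
    prob = 1.0
    for x_r, beta_r in zip(x, betas):
        prob *= pcm_probs(theta, beta_r)[x_r]
    return prob


def conditional_prob(x, betas, theta):
    """Pr(X = x | X_1 + ... + X_R = x_1 + ... + x_R, theta)."""
    K = len(betas[0])
    total = sum(x)
    denom = sum(
        joint_prob(y, betas, theta)
        for y in product(range(K + 1), repeat=len(betas))
        if sum(y) == total
    )
    return joint_prob(x, betas, theta) / denom


# Hypothetical parameters for two raters with K = 2.
betas = [(-1.0, 0.5), (-0.4, 0.9)]
c_low = conditional_prob((1, 2), betas, theta=-2.0)
c_high = conditional_prob((1, 2), betas, theta=1.5)
print(c_low, c_high)
```

Both calls return the same value, so the terms of (3) depend on the rater parameters only and no distributional assumption on θ is needed.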
SLIDE 24
Reduced rank parametrization
Interpreting and testing differences in rater parameters $\beta_r = (\beta_{rx})_{x=1,\ldots,K}$ and $\beta_{r'} = (\beta_{r'x})_{x=1,\ldots,K}$ can be difficult for K = 4, 5, ...
Reparametrization using a 'location' parameter $\mu_r$ and a 'spread' parameter $\sigma_r$:
$$\beta_{rx} = \mu_r + (2x - m - 1)\sigma_r$$
- Andrich. Psychometrika, 1982, 47:105-113.
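The reparametrization replaces K thresholds per rater by two numbers. The sketch below assumes that m equals the number of thresholds K (an assumption, although it is consistent with the two-threshold case, where location is the mean of the thresholds and spread is half their difference). Function names are illustrative.

```python
def thresholds(mu, sigma, m):
    """Thresholds beta_x = mu + (2x - m - 1) * sigma for x = 1..m (assumes m = K)."""
    return [mu + (2 * x - m - 1) * sigma for x in range(1, m + 1)]


def location_spread(beta1, beta2):
    """Location and spread for two thresholds (K = 2)."""
    return (beta1 + beta2) / 2, (beta2 - beta1) / 2


# Common estimate from the three-category example in these slides.
mu, sigma = location_spread(-0.82, -0.75)
print(round(mu, 3), round(sigma, 3))
print(thresholds(mu, sigma, 2))

# With more categories the thresholds are equally spaced around mu.
print(thresholds(0.0, 1.0, 4))
```

Equal spacing is the price of the reduction: two raters can then be compared on location and spread regardless of K.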
SLIDE 25
Reduced rank parametrization
Reparametrize $\left(\frac{\beta_1 + \beta_2}{2}, \frac{\beta_2 - \beta_1}{2}\right)$.
Hypotheses:
Raters differ only wrt. location
Raters differ only wrt. spread
Raters do not differ
SLIDE 26
Interpretation on original scale
Probability of agreement across values of θ can be compared to
modeled distribution: $\varphi(\theta)$.
empirical distribution: $\hat\theta_1, \ldots, \hat\theta_S$ found by maximizing $L(\theta) = \Pr_{\hat\beta}(X_s = x_s \mid \theta)$.
values $E(X_{rs} \mid \theta_s = \theta)$, same for all r under marginal homogeneity.
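One way to read the agreement probability on the original scale is to map θ to the expected rating. The sketch below assumes the first divide-by-total model with the common estimate (−0.82, −0.75) from the three-category example, and independence of the two ratings given θ. It finds the θ values where the expected score equals 0.5, 1.0 and 1.5 by bisection, which works because the expected score increases with θ in this model. Function names are hypothetical.

```python
import math


def pcm_probs(theta, beta):
    """Pr(X = x | theta), x = 0..K, for the divide-by-total model with thresholds beta."""
    expo = [x * theta - sum(beta[:x]) for x in range(len(beta) + 1)]
    shift = max(expo)  # numerical stability for large |theta|
    num = [math.exp(e - shift) for e in expo]
    total = sum(num)
    return [v / total for v in num]


def expected_score(theta, beta):
    return sum(x * p for x, p in enumerate(pcm_probs(theta, beta)))


def agreement(theta, beta):
    """Pr(X1 = X2 | theta) for two raters sharing beta, assuming independence given theta."""
    return sum(p * p for p in pcm_probs(theta, beta))


def theta_for_expected_score(target, beta, lo=-20.0, hi=20.0, iters=200):
    """Bisection: the expected score is increasing in theta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_score(mid, beta) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


beta_common = (-0.82, -0.75)
targets = [0.5, 1.0, 1.5]
thetas = [theta_for_expected_score(t, beta_common) for t in targets]
for t, th in zip(targets, thetas):
    print(t, round(th, 3), round(agreement(th, beta_common), 3))
```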
SLIDE 27
Example: X ∈ {0, 1, 2}
Agreement: Pr((X1, X2) ∈ {(0, 0), (1, 1), (2, 2)} | θ)
[Figure: probability of agreement plotted against θ]
SLIDE 29
Example: X ∈ {0, 1, 2}
Agreement: Pr((X1, X2) ∈ {(0, 0), (1, 1), (2, 2)} | θ)
[Figure: probability of agreement plotted against the estimates θ̂]
SLIDE 30
Example: X ∈ {0, 1, 2}
Agreement: Pr((X1, X2) ∈ {(0, 0), (1, 1), (2, 2)} | θ)
[Figure: probability of agreement plotted against the expected score E(X|θ̂), axis ticks at 0.5, 1.0, 1.5]