Rater agreement - ordinal ratings. Karl Bang Christensen

SLIDE 1

Rater agreement - ordinal ratings

Karl Bang Christensen
Dept. of Biostatistics, Univ. of Copenhagen

NORDSTAT, 2012
http://biostat.ku.dk/~kach/

SLIDE 2

Rater agreement - ordinal ratings

Methods for analyzing rater agreement are well-established when ratings are dichotomous or can be assumed to be normally distributed.

Notation:

  • raters $r = 1, \dots, R$
  • subjects $s = 1, \dots, S$
  • ratings $X_{rs} \in \{0, 1, \dots, K\}$

SLIDE 3

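Two raters (R = 2)

[Table: cross-classification of the two raters' ratings; the diagonal (agreement) cells are marked A]

  A   ·   ·   ·
  ·   A   ·   ·
  ·   ·   A   ·
  ·   ·   ·   A

Pr(a): observed proportion of agreement.

κ coefficient:

$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \tag{1}$$

where

$$\Pr(e) = \sum_k \Pr(X_1 = k)\,\Pr(X_2 = k)$$

is the expected proportion of agreement under independence.

  • Cohen. Educational and Psychological Measurement, 1960, 20:37-46.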

SLIDE 4

Two raters (R = 2)

The κ coefficient (1) is widely used, but:

  • (i) its value depends on the margins and thus on the sample;
  • (ii) if ratings are ordinal, this is not taken into account;
  • (iii) there is no rationale for saying that, e.g., κ > 0.7 is good.

The κ coefficient (1) is a marginal (population average) measure.

  • Cohen. Educational and Psychological Measurement, 1960, 20:37-46.

SLIDE 5

(i) Value depends on margins

Three tables with agreement on 70/100 of the subjects:

Table 1 (κ = 0.35)

            X2 = +   X2 = −
  X1 = +      20       20
  X1 = −      10       50

Table 2 (κ = 0.21)

            X2 = +   X2 = −
  X1 = +      10       20
  X1 = −      10       60

Table 3 (κ = 0.08)

            X2 = +   X2 = −
  X1 = +       5       20
  X1 = −      10       65
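The three κ values quoted for these tables can be checked directly from definition (1). The sketch below is a minimal illustration, assuming rows hold rater 1 (X1: +, −) and columns hold rater 2 (X2: +, −); the function name `cohen_kappa` is illustrative and not from the talk.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed proportion of agreement: mass on the diagonal.
    p_a = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Expected agreement if the two raters were independent.
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_a - p_e) / (1 - p_e)


# The three 2x2 tables, each with 70 of 100 subjects on the diagonal.
tables = [
    [[20, 20], [10, 50]],
    [[10, 20], [10, 60]],
    [[5, 20], [10, 65]],
]
kappas = [cohen_kappa(t) for t in tables]
print([round(k, 2) for k in kappas])
```

The observed agreement is identical in all three tables; only the margins, and hence Pr(e), change.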

SLIDE 6

(ii) If ratings are ordinal, this is not taken into account

Weighted κ coefficient

  A       0.75A   0.50A   0.25A
  0.75A   A       0.75A   0.50A
  0.50A   0.75A   A       0.75A
  0.25A   0.50A   0.75A   A

Arbitrary weights (two standards implemented in SAS)
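A weighted κ replaces the 0/1 agreement indicator in (1) with a weight for each cell. The sketch below is one possible implementation, not the talk's own code. The two weight schemes shown (linear and quadratic in the category distance) are assumed here to correspond to the two standard choices referred to on the slide, and all function names are illustrative. The 3×3 counts are the 150-subject example table that appears later in the talk.

```python
def weighted_kappa(table, weights):
    """Weighted kappa for a square table of counts and a matrix of agreement weights."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed and expected agreement.
    p_a = sum(weights[i][j] * table[i][j] / n for i, j in cells)
    p_e = sum(weights[i][j] * row_marg[i] * col_marg[j] for i, j in cells)
    return (p_a - p_e) / (1 - p_e)


def linear_weights(k):
    """Weights decreasing linearly with distance: 1 - |i - j| / (k - 1)."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def quadratic_weights(k):
    """Weights decreasing quadratically with distance: 1 - (i - j)^2 / (k - 1)^2."""
    return [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]


# Example table from the talk (150 subjects, ratings 0, 1, 2).
example = [[9, 10, 1], [22, 59, 14], [3, 25, 7]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print("unweighted:", round(weighted_kappa(example, identity), 3))
print("linear:    ", round(weighted_kappa(example, linear_weights(3)), 3))
print("quadratic: ", round(weighted_kappa(example, quadratic_weights(3)), 3))
```

Any other weight matrix, such as the 1, 0.75, 0.50, 0.25 pattern shown on the slide, can be passed as `weights`; the result changes with the choice, which is the arbitrariness the slide points to.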

SLIDE 7

Marginal homogeneity

Beyond agreement we would want

$$\Pr(X_1 = k) = \Pr(X_2 = k) \quad \text{for all } k = 0, 1, \dots, K$$

Bowker's test of symmetry tests this hypothesis. For K = 1 this is McNemar's test, based on

$$\frac{\Pr(X_1 = 1, X_2 = 0)}{\Pr(X_1 = 1, X_2 = 0) + \Pr(X_1 = 0, X_2 = 1)}$$

which equals 1/2 under marginal homogeneity.

  • McNemar. Psychometrika 1947, 12:153-157.
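For reference, the usual form of Bowker's statistic can be written with cell counts $n_{ij}$ (the number of subjects with $X_1 = i$ and $X_2 = j$). This notation is an assumption introduced here and does not appear on the slide.

```latex
S = \sum_{i < j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}},
\qquad
S \overset{\text{approx.}}{\sim} \chi^2_{K(K+1)/2} \quad \text{under symmetry.}
```

With K = 1 there is a single off-diagonal pair, and S reduces to McNemar's statistic $(n_{01} - n_{10})^2 / (n_{01} + n_{10})$ on one degree of freedom.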

SLIDE 8

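Continuous data: regression model

$$X_{rs} = \delta_r + \gamma_s + \epsilon_{rs}, \qquad \epsilon_{rs} \sim N(0, \omega^2)$$

Limits of agreement / Bland-Altman plot

$$X_{1s} - X_{2s} = \delta_1 - \delta_2 + (\epsilon_{1s} - \epsilon_{2s})$$

95% reference interval: $\delta_1 - \delta_2 \pm 1.96\sqrt{2}\,\omega$, since

$$\epsilon_{1s} - \epsilon_{2s} \sim N(0, 2\omega^2), \quad \text{i.e. } V(\epsilon_{1s} - \epsilon_{2s}) = 2\omega^2$$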
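In practice the limits of agreement are estimated from the paired differences: their mean estimates δ1 − δ2 and their standard deviation estimates √2·ω. The following is a minimal sketch with made-up measurements; the data and the function name `limits_of_agreement` are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(x1, x2, z=1.96):
    """Return (lower, bias, upper) for the paired differences x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    bias = mean(d)   # estimates delta_1 - delta_2
    sd = stdev(d)    # estimates sqrt(2) * omega
    return bias - z * sd, bias, bias + z * sd


# Hypothetical continuous ratings of six subjects by two raters.
rater1 = [10.1, 12.3, 9.8, 11.5, 10.9, 12.0]
rater2 = [9.7, 12.8, 9.9, 11.0, 11.2, 11.6]

lower, bias, upper = limits_of_agreement(rater1, rater2)
print(f"bias = {bias:.3f}, 95% limits of agreement = ({lower:.3f}, {upper:.3f})")
```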

SLIDE 9

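Ordinal data: regression models / IRT

divide-by-total models

$$
\Pr(X_{rs} = x \mid \theta) =
\begin{cases}
\dfrac{\exp\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)}{\sum_{l} \exp\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)} \\[2ex]
\dfrac{\exp\left(\alpha_r\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)\right)}{\sum_{l} \exp\left(\alpha_r\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)\right)}
\end{cases}
$$

(K = 1: logistic regression)

threshold models

$$
\Pr(X_{rs} = x \mid \theta) =
\begin{cases}
\Phi(..) - \Phi(..) \\
\operatorname{expit}(..) - \operatorname{expit}(..)
\end{cases}
$$

  • Thissen, Steinberg. Psychometrika, 1986, 51:567-577.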
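The divide-by-total probabilities are simple to evaluate numerically. The sketch below covers both forms: `alpha=1.0` gives the first, and any other value gives the second with rater discrimination α_r. The function name and the parameter values are hypothetical, chosen only to illustrate the formula.

```python
import math


def divide_by_total_probs(theta, betas, alpha=1.0):
    """Pr(X = x | theta) for x = 0..K, given one rater's thresholds beta_1..beta_K."""
    # Exponent for category x: alpha * (x * theta - sum_{k<=x} beta_k); the sum is empty for x = 0.
    exponents = [0.0]
    cum = 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        exponents.append(alpha * (x * theta - cum))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(exponents)
    weights = [math.exp(e - m) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rater with K = 2 thresholds, evaluated at theta = 0.3.
probs = divide_by_total_probs(0.3, [-0.5, 0.4])
print([round(p, 3) for p in probs])

# With K = 1 the model reduces to logistic regression in theta - beta.
p1 = divide_by_total_probs(0.3, [-0.5])[1]
logistic = 1 / (1 + math.exp(-(0.3 - (-0.5))))
print(round(p1, 6), round(logistic, 6))
```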

SLIDE 10

$\theta = \theta_s$: latent location of subject s

[Figure; horizontal axis: θ]

SLIDE 14

$\theta = \theta_s$: latent location of subject s; $(\beta_{rk})_{k=1,\dots,K}$: rater parameters

[Figure; horizontal axis: θ]

SLIDE 15

Marginal homogeneity

Rater parameters $\beta_r = (\beta_{r1}, \dots, \beta_{rK})$. Test $H_0: \beta_r = \beta$ for all $r = 1, \dots, R$ using a likelihood ratio test based on (2) or (3).

Example: 150 subjects, two raters

            X1 = 0   X1 = 1   X1 = 2
  X2 = 0       9       10        1
  X2 = 1      22       59       14
  X2 = 2       3       25        7

Bowker's test: S = 8.6, df = 3, p = 0.0351.
LRT based on (2): −2 log Q = 12.2, df = 2, p = 0.0022.
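Bowker's statistic for the example table can be recomputed from the off-diagonal pairs. The sketch below uses only the standard library. The chi-square tail probability is a closed-form series for integer degrees of freedom, written here for illustration; the function names are not from the talk. It should come out close to the quoted S = 8.6 and p = 0.0351. It does not fit the model behind the likelihood ratio test, but only converts the quoted −2 log Q into a p-value.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def bowker_test(table):
    """Bowker's test of symmetry for a square table of counts."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            n_ij, n_ji = table[i][j], table[j][i]
            if n_ij + n_ji > 0:  # empty pairs contribute nothing
                stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
                df += 1
    return stat, df, chi2_sf(stat, df)


# Example table: rows X2 = 0, 1, 2; columns X1 = 0, 1, 2.
example = [[9, 10, 1], [22, 59, 14], [3, 25, 7]]
stat, df, p = bowker_test(example)
print(f"Bowker: S = {stat:.1f}, df = {df}, p = {p:.4f}")

# p-value for the quoted likelihood ratio statistic (12.2 on 2 df).
p_lrt = chi2_sf(12.2, 2)
print(f"LRT: p = {p_lrt:.4f}")
```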

SLIDE 16

Quantify agreement

Randomly chosen person s with location $\theta_s = \theta$. Compute a reference interval for $|X_{1s} - X_{2s}|$, $\Pr(X_{1s} = X_{2s})$, or $\Pr(|X_{1s} - X_{2s}| > 1)$.

If $\theta \sim N(0, \omega^2)$: computations for a 'typical' person, θ = 0.

Compare to the population distribution of θ.

SLIDE 17

Example: X ∈ {0, 1, 2}

Marginal homogeneity, $H_0: \beta_r = \beta$ for r = 1, 2, accepted. Common estimate of the thresholds: $(\beta_1, \beta_2) = (-0.82, -0.75)$.

Table:

  Pr(X1 = 0, X2 = 0 | θ)   Pr(X1 = 0, X2 = 1 | θ)   Pr(X1 = 0, X2 = 2 | θ)
  Pr(X1 = 1, X2 = 0 | θ)   Pr(X1 = 1, X2 = 1 | θ)   Pr(X1 = 1, X2 = 2 | θ)
  Pr(X1 = 2, X2 = 0 | θ)   Pr(X1 = 2, X2 = 1 | θ)   Pr(X1 = 2, X2 = 2 | θ)

(normal latent distribution: typical patient ∼ θ = 0)
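The table can be filled in numerically for the typical patient. The sketch below makes two assumptions that are not stated on this slide: the first divide-by-total form is used, and the two ratings are independent given θ. Function names are illustrative.

```python
import math


def category_probs(theta, betas):
    """Pr(X = x | theta), x = 0..K, for the first divide-by-total form."""
    exponents = [0.0]
    cum = 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        exponents.append(x * theta - cum)
    total = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / total for e in exponents]


def joint_table(theta, betas1, betas2):
    """Pr(X1 = a, X2 = b | theta), assuming independence of the raters given theta."""
    p1 = category_probs(theta, betas1)
    p2 = category_probs(theta, betas2)
    return [[a * b for b in p2] for a in p1]


common = (-0.82, -0.75)                  # common threshold estimate from the slide
table = joint_table(0.0, common, common)  # typical patient: theta = 0

agree = sum(table[x][x] for x in range(3))
far_apart = sum(table[a][b] for a in range(3) for b in range(3) if abs(a - b) > 1)

for row in table:
    print("  ".join(f"{p:.3f}" for p in row))
print(f"Pr(X1 = X2 | theta = 0)       = {agree:.3f}")
print(f"Pr(|X1 - X2| > 1 | theta = 0) = {far_apart:.3f}")
```

Calling `joint_table` over a grid of θ values gives the same quantities away from the typical patient.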

SLIDE 18

Example: X ∈ {0, 1, 2}

Agreement: $\Pr((X_1, X_2) \in \{(0, 0), (1, 1), (2, 2)\} \mid \theta)$

SLIDE 19

Examples of clinical applications

  • Tönnis grade 0,1,2,3: rating of x-rays in hip surgery population.
  • Modified Ashworth Scale 0,1,2,3,4,5: spasticity as a complication in spinal cord lesion patients (hospital sample). Sparse tables.
  • Assessment of exercise-induced laryngeal obstruction 0,1,2,3: subsampling of best and worst cases.

SLIDE 20

Issues

  • Conditional inference if persons' locations cannot be assumed to be normally distributed.
  • Reduced rank parametrization if tables are sparse.
  • Interpretation on original scale.

SLIDE 22

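Ordinal data: regression models / IRT (Marginal or Conditional inference).

divide-by-total models

$$
\Pr(X_{rs} = x \mid \theta) =
\begin{cases}
\dfrac{\exp\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)}{\sum_{l} \exp\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)} & \text{(C, M)} \\[2ex]
\dfrac{\exp\left(\alpha_r\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)\right)}{\sum_{l} \exp\left(\alpha_r\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)\right)} & \text{(M)}
\end{cases}
$$

(K = 1: logistic regression)

threshold models

$$
\Pr(X_{rs} = x \mid \theta) =
\begin{cases}
\Phi(..) - \Phi(..) & \text{(M)} \\
\operatorname{expit}(..) - \operatorname{expit}(..) & \text{(M)}
\end{cases}
$$

  • Thissen, Steinberg. Psychometrika, 1986, 51:567-577.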

SLIDE 23

Let $X_s = (X_{1s}, \dots, X_{Rs})$ and $x_s = (x_{1s}, \dots, x_{Rs})$.

Marginal inference

$$l_M(\beta) = \sum_s \log \int \Pr(X_s = x_s \mid \theta_s)\,\varphi(\theta_s)\,d\theta_s \tag{2}$$

similar to the model yielding limits of agreement.

Conditional inference

$$l_C(\beta) = \sum_s \log \Pr(X_s = x_s \mid X_{1s} + \dots + X_{Rs} = x_{1s} + \dots + x_{Rs}) \tag{3}$$

similar to the McNemar test.

  • Bock, Aitkin. Psychometrika, 1981, 46:443-459.
  • Andersen. Journal of the Royal Statistical Society B, 1972, 34:42-54.
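The point of conditioning on the total score is that θ drops out of the probability, so no distribution for θ is needed. The sketch below illustrates this for the first divide-by-total form, assuming the ratings are independent given θ. The threshold values and function names are hypothetical.

```python
import math
from itertools import product


def cum_thresholds(betas):
    """[0, beta_1, beta_1 + beta_2, ...]: cumulative thresholds indexed by category x."""
    out = [0.0]
    for b in betas:
        out.append(out[-1] + b)
    return out


def conditional_prob(xs, betas_by_rater):
    """Pr(X_s = xs | total score); theta does not appear in this expression."""
    cums = [cum_thresholds(b) for b in betas_by_rater]

    def weight(pattern):
        return math.exp(-sum(c[x] for c, x in zip(cums, pattern)))

    patterns = product(*[range(len(c)) for c in cums])
    denom = sum(weight(p) for p in patterns if sum(p) == sum(xs))
    return weight(xs) / denom


def joint_prob(xs, theta, betas_by_rater):
    """Pr(X_s = xs | theta) as a product over raters (independence given theta)."""
    prob = 1.0
    for x, betas in zip(xs, betas_by_rater):
        weights = [math.exp(l * theta - c) for l, c in enumerate(cum_thresholds(betas))]
        prob *= weights[x] / sum(weights)
    return prob


# Hypothetical thresholds for two raters, ratings in {0, 1, 2}.
betas = [(-0.5, 0.3), (0.1, 0.6)]
xs = (1, 2)

print("conditional:", round(conditional_prob(xs, betas), 6))
for theta in (-1.0, 2.0):
    same_total = [p for p in product(range(3), repeat=2) if sum(p) == sum(xs)]
    ratio = joint_prob(xs, theta, betas) / sum(joint_prob(p, theta, betas) for p in same_total)
    print(f"theta = {theta:+.1f}:", round(ratio, 6))  # same value for every theta
```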

SLIDE 24

Reduced rank parametrization

Interpreting and testing differences in rater parameters $\beta_r = (\beta_{rx})_{x=1,\dots,K}$ and $\beta_{r'} = (\beta_{r'x})_{x=1,\dots,K}$ can be difficult for K = 4, 5, ...

Reparametrization using a 'location' parameter $\mu_r$ and a 'spread' parameter $\sigma_r$:

$$\beta_{rx} = \mu_r + (2x - m - 1)\sigma_r$$

  • Andrich. Psychometrika, 1982, 47:105-113.
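The reparametrization replaces K free thresholds per rater with two numbers. The sketch below assumes m = K, the number of thresholds, which the slide does not state explicitly. Under that assumption the thresholds are centred on μ_r and equally spaced, 2σ_r apart. Function names are illustrative.

```python
def thresholds_from_location_spread(mu, sigma, K):
    """beta_x = mu + (2x - m - 1) * sigma for x = 1..K, assuming m = K."""
    return [mu + (2 * x - K - 1) * sigma for x in range(1, K + 1)]


def location_spread_from_two_thresholds(beta1, beta2):
    """Inverse map for K = 2: location is the mean, spread is half the difference."""
    return (beta1 + beta2) / 2, (beta2 - beta1) / 2


# K = 2, using the common threshold estimate quoted in the talk's example.
mu, sigma = location_spread_from_two_thresholds(-0.82, -0.75)
print(round(mu, 3), round(sigma, 3))
print([round(b, 2) for b in thresholds_from_location_spread(mu, sigma, 2)])

# K = 5 with hypothetical values: five thresholds described by two parameters.
print([round(b, 2) for b in thresholds_from_location_spread(0.2, 0.4, 5)])
```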

SLIDE 25

Reduced rank parametrization

Reparametrize $\left(\dfrac{\beta_1 + \beta_2}{2}, \dfrac{\beta_2 - \beta_1}{2}\right)$. Hypotheses:

  • Raters differ only wrt. location
  • Raters differ only wrt. spread
  • Raters do not differ

SLIDE 26

Interpretation on original scale

Probability of agreement across values of θ can be compared to:

  • modeled distribution: $\varphi(\theta)$.
  • empirical distribution: $\hat\theta_1, \dots, \hat\theta_S$ found by maximizing $L(\theta) = \Pr_{\hat\beta}(X_s = x_s \mid \theta)$.
  • values $E(X_{rs} \mid \theta_s = \theta)$, same for all r under marginal homogeneity.

SLIDE 27

Example: X ∈ {0, 1, 2}

Agreement: $\Pr((X_1, X_2) \in \{(0, 0), (1, 1), (2, 2)\} \mid \theta)$

[Figure; horizontal axis: θ]

SLIDE 29

Example: X ∈ {0, 1, 2}

Agreement: $\Pr((X_1, X_2) \in \{(0, 0), (1, 1), (2, 2)\} \mid \theta)$

[Figure; horizontal axis: $\hat\theta$]

SLIDE 30

Example: X ∈ {0, 1, 2}

Agreement: $\Pr((X_1, X_2) \in \{(0, 0), (1, 1), (2, 2)\} \mid \theta)$

[Figure; horizontal axis: $E(X \mid \hat\theta) = 0.5, 1.0, 1.5$]
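One reading of the axis values 0.5, 1.0 and 1.5 is an assumption made here, not stated on the slide. With two raters sharing thresholds in the first divide-by-total form, the maximum likelihood estimate of θ solves 2·E(X | θ) = x1 + x2, so total scores 1, 2 and 3 correspond to expected ratings 0.5, 1.0 and 1.5. The sketch below finds those θ values by bisection and evaluates the agreement probability at each, assuming independence of the raters given θ and the common threshold estimate from the example. Function names are illustrative.

```python
import math


def category_probs(theta, betas):
    """Pr(X = x | theta), x = 0..K, for the first divide-by-total form."""
    exponents = [0.0]
    cum = 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        exponents.append(x * theta - cum)
    total = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / total for e in exponents]


def expected_score(theta, betas):
    """E(X | theta): expected rating on the original scale."""
    return sum(x * p for x, p in enumerate(category_probs(theta, betas)))


def theta_for_expected_score(target, betas, lo=-20.0, hi=20.0, tol=1e-10):
    """Solve E(X | theta) = target by bisection; E(X | theta) increases with theta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_score(mid, betas) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def agreement(theta, betas):
    """Pr(X1 = X2 | theta) for two raters with common thresholds."""
    return sum(p * p for p in category_probs(theta, betas))


betas = (-0.82, -0.75)  # common threshold estimate from the example
theta_hats = [theta_for_expected_score(t, betas) for t in (0.5, 1.0, 1.5)]
for target, th in zip((0.5, 1.0, 1.5), theta_hats):
    print(f"E(X|theta) = {target}: theta = {th:.3f}, agreement = {agreement(th, betas):.3f}")
```

Reporting agreement against E(X | θ) rather than θ puts the curve on the scale of the original ratings.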
