University of Essex
Example of α
Item  C-1  C-2  C-3  C-4  C-5   Mean  Variance
(a)    7    7    7    7    7    7.0    0.0
(b)    5    4    5    6    5    5.0    0.5
(c)    5    5    5    6    4    5.0    0.5
(d)    7    8    6    7    7    7.0    0.5
(e)    4    2    3    3    2    2.8    0.7
(f)    6    7    6    6    6    6.2    0.2
(g)    6    6    6    5    6    5.8    0.2
(h)    7    6    9    6    9    7.4    2.3
(i)    5    5    5    4    5    4.8    0.2
(j)    4    5    2    4    6    4.2    2.2
(k)    3    5    2    4    4    3.6    1.3
(l)    5    5    6    6    5    5.4    0.3
(m)    3    4    2    3    3    3.0    0.5
(n)    2    3    4    3    4    3.2    0.7
(o)    7    7    6    7    7    6.8    0.2
(p)    7    8    7    8    7    7.4    0.3
(q)    3    3    3    1    3    2.6    0.8
(r)    4    2    4    2    4    3.2    1.2
(s)    3    2    3    3    3    2.8    0.2
(t)    4    4    2    4    4    3.6    0.8
(u)    5    6    4    5    6    5.2    0.7
(v)    4    3    4    3    1    3.0    1.5
(w)    6    6    7    5    7    6.2    0.7
(x)    4    5    2    4    3    3.6    1.3
(y)    4    5    5    6    5    5.0    0.5
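The item means and variances in the table can be cross-checked from the raw scores. A minimal sketch, assuming the Variance column is the sample variance (divisor n − 1), which reproduces every value in the table; it uses Python's standard `statistics` module, and the names `RATINGS` and `item_stats` are hypothetical.

```python
from statistics import mean, variance

# Scores from the table: item label -> ratings by coders C-1..C-5
RATINGS = {
    "a": [7, 7, 7, 7, 7], "b": [5, 4, 5, 6, 5], "c": [5, 5, 5, 6, 4],
    "d": [7, 8, 6, 7, 7], "e": [4, 2, 3, 3, 2], "f": [6, 7, 6, 6, 6],
    "g": [6, 6, 6, 5, 6], "h": [7, 6, 9, 6, 9], "i": [5, 5, 5, 4, 5],
    "j": [4, 5, 2, 4, 6], "k": [3, 5, 2, 4, 4], "l": [5, 5, 6, 6, 5],
    "m": [3, 4, 2, 3, 3], "n": [2, 3, 4, 3, 4], "o": [7, 7, 6, 7, 7],
    "p": [7, 8, 7, 8, 7], "q": [3, 3, 3, 1, 3], "r": [4, 2, 4, 2, 4],
    "s": [3, 2, 3, 3, 3], "t": [4, 4, 2, 4, 4], "u": [5, 6, 4, 5, 6],
    "v": [4, 3, 4, 3, 1], "w": [6, 6, 7, 5, 7], "x": [4, 5, 2, 4, 3],
    "y": [4, 5, 5, 6, 5],
}


def item_stats(ratings):
    """Return {item: (mean, sample variance)}; statistics.variance divides by n - 1."""
    return {item: (mean(scores), variance(scores)) for item, scores in ratings.items()}


stats = item_stats(RATINGS)
# Average of the 25 within-item variances (the observed disagreement)
mean_item_variance = mean(v for _, v in stats.values())

for item, (m, v) in stats.items():
    print(f"({item}) mean {m:.1f} variance {v:.1f}")
print(f"Mean variance per item: {mean_item_variance:.3f}")
```

Dividing by n instead of n − 1 would give smaller values (0.4 rather than 0.5 for item (b)), so the divisor matters when reproducing the table.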
Mean variance per item: 0.732
Overall: 25 items, 125 judgments.
Value   1   2   3   4   5   6   7   8   9
Count   2  11  19  24  23  22  19   3   2
Mean: 4.792, Variance: 3.085
$\alpha = 1 - \frac{0.732}{3.085} = 0.763$
$F(24, 100) = \frac{12.891}{0.732} = 17.611, \quad p < 10^{-15}$
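The summary figures can be reproduced from the value distribution and the mean item variance. A minimal sketch, assuming α is computed as on this slide (1 minus the mean within-item variance divided by the variance of all 125 judgments pooled, both with divisor n − 1) and that the F statistic comes from a one-way ANOVA with items as groups, so its denominator is the within-item mean square. The names `COUNTS` and `overall_mean_variance` are hypothetical; the p-value needs the tail of the F distribution from a statistics library and is not computed here.

```python
# Distribution of values over all judgments: rating value -> count
COUNTS = {1: 2, 2: 11, 3: 19, 4: 24, 5: 23, 6: 22, 7: 19, 8: 3, 9: 2}
N_ITEMS, N_CODERS = 25, 5
MEAN_ITEM_VARIANCE = 0.732  # mean of the per-item sample variances


def overall_mean_variance(counts):
    """Return (n, mean, sample variance, total sum of squares) of the pooled judgments."""
    n = sum(counts.values())
    m = sum(value * c for value, c in counts.items()) / n
    ss = sum(c * (value - m) ** 2 for value, c in counts.items())
    return n, m, ss / (n - 1), ss


n, overall_mean, overall_var, ss_total = overall_mean_variance(COUNTS)

# alpha = 1 - observed disagreement / expected disagreement
alpha = 1 - MEAN_ITEM_VARIANCE / overall_var

# One-way ANOVA with items as groups: split the total sum of squares
ss_within = MEAN_ITEM_VARIANCE * (N_CODERS - 1) * N_ITEMS  # sum of per-item sums of squares
ss_between = ss_total - ss_within
df_between, df_within = N_ITEMS - 1, n - N_ITEMS
ms_between = ss_between / df_between
ms_within = ss_within / df_within  # equals MEAN_ITEM_VARIANCE
f_stat = ms_between / ms_within

print(f"Mean: {overall_mean:.3f}, Variance: {overall_var:.3f}")
print(f"alpha = {alpha:.3f}")
print(f"F({df_between}, {df_within}) = {ms_between:.3f} / {ms_within:.3f} = {f_stat:.3f}")
```

Because every item has the same number of coders, the within-item mean square is exactly the mean item variance, which is why 0.732 appears in both the α and F calculations.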
Ron Artstein Quality control of corpus annotation through reliability measures