Assessing inter-rater agreement in Stata
Daniel Klein klein.daniel.81@gmail.com klein@incher.uni-kassel.de
University of Kassel INCHER-Kassel
15th German Stata Users Group meeting Berlin June 23, 2017
◮ r = 2 raters
◮ n subjects
◮ q = 2 categories
◮ subject properties
◮ chance
◮ Define linear weights
◮ Define quadratic weights
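The weight definitions themselves did not survive on these slides. As a minimal sketch, assuming the customary definitions for q ordered categories (linear: w_kl = 1 - |k - l| / (q - 1); quadratic: w_kl = 1 - (k - l)^2 / (q - 1)^2), the weight matrices can be built as follows. The function name `agreement_weights` is hypothetical and chosen only for illustration; it is written in Python, not Stata.

```python
def agreement_weights(q, kind="linear"):
    """Return a q x q matrix of agreement weights for q ordered categories.

    Assumes the customary definitions:
      linear:    w_kl = 1 - |k - l| / (q - 1)
      quadratic: w_kl = 1 - (k - l)^2 / (q - 1)^2
    """
    if q < 2:
        raise ValueError("need at least two categories")
    power = {"linear": 1, "quadratic": 2}[kind]
    return [
        [1 - (abs(k - l) / (q - 1)) ** power for l in range(q)]
        for k in range(q)
    ]


linear = agreement_weights(3, "linear")
quadratic = agreement_weights(3, "quadratic")

for row in linear:
    print(row)
for row in quadratic:
    print(row)
```

Both matrices give full credit (1) on the diagonal and no credit (0) for the most distant pair of categories. Quadratic weights penalize near misses less than linear weights do.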
◮ 0 pairs
◮ 1 pair
◮ all 3 pairs
◮ here 0, 0.33 or 1
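The values 0, 0.33 and 1 can be reproduced by counting agreeing rater pairs for a single subject rated by three raters. The sketch below assumes that per-subject agreement is the share of rater pairs that chose the same category. The helper name `pairwise_agreement` is hypothetical.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Share of rater pairs that assigned the same category to one subject."""
    pairs = list(combinations(ratings, 2))
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


examples = {
    "all differ": pairwise_agreement([1, 2, 3]),  # 0 of 3 pairs agree
    "two agree": pairwise_agreement([1, 1, 2]),   # 1 of 3 pairs agrees
    "all agree": pairwise_agreement([1, 1, 1]),   # 3 of 3 pairs agree
}

for label, value in examples.items():
    print(label, round(value, 2))
```

Exactly two agreeing pairs cannot occur with three raters: if two pairs agree, the third pair agrees as well.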
◮ It instead reduces to Scott’s π
◮ Conger (1980) generalizes Cohen’s Kappa
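For two raters, the difference between Cohen's Kappa and Scott's π lies only in the chance-agreement term. Cohen multiplies each rater's own marginal proportions, while Scott squares the average of the two raters' marginals. The following sketch uses made-up ratings, and the function name `cohen_and_scott` is hypothetical.

```python
from collections import Counter


def cohen_and_scott(rater_a, rater_b):
    """Return (Cohen's kappa, Scott's pi) for two raters, unweighted."""
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    # Observed agreement: share of subjects rated identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Cohen: product of each rater's own marginal proportions
    pe_cohen = sum((freq_a[k] / n) * (freq_b[k] / n) for k in categories)
    # Scott: squared average of the two raters' marginal proportions
    pe_scott = sum(((freq_a[k] + freq_b[k]) / (2 * n)) ** 2 for k in categories)
    kappa = (p_o - pe_cohen) / (1 - pe_cohen)
    pi = (p_o - pe_scott) / (1 - pe_scott)
    return kappa, pi


# Illustrative (made-up) ratings: the raters use category 1 at different rates
rater_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kappa, pi = cohen_and_scott(rater_a, rater_b)
print(round(kappa, 3), round(pi, 3))
```

When both raters have identical marginal distributions, the two chance terms coincide and the coefficients are equal. Otherwise, Cohen's Kappa is the larger of the two.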
◮ ordinal
◮ ratio
◮ circular
◮ bipolar
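◮ Brennan and Prediger (1981) coefficient (κn)
  p_e = \frac{1}{q^2} \sum_{k=1}^{q} \sum_{l=1}^{q} w_{kl}
◮ Gwet’s (2008, 2014) AC (κG)
  p_e = \frac{\sum_{k=1}^{q} \sum_{l=1}^{q} w_{kl}}{q(q-1)} \sum_{k=1}^{q} \pi_k (1 - \pi_k)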
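For two raters and identity weights, the chance-agreement terms of these two coefficients are commonly written as p_e = 1/q (Brennan and Prediger) and p_e = 1/(q - 1) Σ_k π_k (1 - π_k) (Gwet's AC1), where π_k is the average of the two raters' marginal proportions for category k. The sketch below works under these assumptions with made-up ratings. The function name `bp_and_ac1` is hypothetical.

```python
from collections import Counter


def bp_and_ac1(rater_a, rater_b, categories):
    """Return (Brennan-Prediger, Gwet's AC1) for two raters, unweighted."""
    n = len(rater_a)
    q = len(categories)
    # Observed agreement: share of subjects rated identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # pi_k: average of both raters' marginal proportions for category k
    pi_k = [(freq_a[k] + freq_b[k]) / (2 * n) for k in categories]
    pe_bp = 1 / q
    pe_ac1 = sum(p * (1 - p) for p in pi_k) / (q - 1)
    bp = (p_o - pe_bp) / (1 - pe_bp)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return bp, ac1


# Illustrative (made-up) ratings with a skewed category distribution
rater_a = [1] * 8 + [0] * 2
rater_b = [1] * 7 + [0] * 3
bp, ac1 = bp_and_ac1(rater_a, rater_b, categories=[0, 1])
print(round(bp, 2), round(ac1, 2))
```

The categories are passed explicitly because q must count every category on the rating scale, including any that happen not to be used in the sample.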
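[Equations not recoverable from the slide extraction; surviving symbols: n′, sums over k = 1, …, q and l = 1, …, q, and the product π′_k π′_l]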
◮ based on theoretical distribution under H0
◮ not necessarily valid for confidence interval construction

◮ valid confidence intervals with few assumptions
◮ computationally intensive

◮ First introduced by Gwet (2014)
◮ sample of subjects drawn from subject universe
◮ sample of raters drawn from rater population
◮ the number of subjects
◮ the number of raters
◮ the number of categories
◮ Cohen’s Kappa, Fleiss’ Kappa for three or more raters
◮ Casewise deletion of missing values
◮ Linear, quadratic and user-defined weights (two raters only)
◮ No confidence intervals

◮ Analytic confidence intervals for two raters and two ratings
◮ Bootstrap confidence intervals
◮ Confidence intervals for binomial ratings (uses ci for
◮ Conger’s (weighted) Kappa for three or more raters
◮ Uses available cases
◮ Jackknife confidence intervals
◮ Majority agreement
◮ Ordinal, quadratic and ratio weights
◮ No confidence intervals

◮ Ordinal, quadratic, ratio, circular and bipolar weights
◮ (Pseudo-) bootstrap confidence intervals (not

◮ Two raters with nominal ratings only
◮ No weights (for disagreement)
◮ Confidence intervals (delta method)
◮ Supports basic features of complex survey designs
◮ Observed agreement, Cohen and Conger’s Kappa, Fleiss’
◮ Uses available cases, optional casewise deletion
◮ Ordinal, linear, quadratic, radical, ratio, circular, bipolar,
◮ Confidence intervals for all coefficients (design-based)
◮ Standard errors conditional on sample of subjects, sample
◮ Benchmarking estimated coefficients (probabilistic and
◮ . . .