SLIDE 1

Semantic Krippendorff’s α for measuring inter-rater agreement in SNOMED CT coding studies

Daniel Karlsson (a), Kirstine Rosenbeck Gøeg (b), Håkan Örman (a) and Anne Randorff Højen (b)

(a) Department of Biomedical Engineering, Linköping University, Sweden
(b) Department of Health Science and Technology, Aalborg University, Denmark

SLIDE 2

Coding Variation and Inter-rater Agreement

  • Judgement variables
  • Differences in use of terms/codes of a terminology/coding system
  • Consistency important for reuse
  • Inter-rater agreement (or reliability) measures quantify these differences

SLIDE 3

Inter-rater Agreement

  • Percentage agreement (“simple agreement”, proportion of cases in agreement)
  • Chance agreement
  • Statistical significance
  • Cohen’s κ & co.
  • Two coders, nominal scale
  • Weighted κ
  • Paradoxes
  • ...

399210005 | neurological investigation (procedure) |
268970009 | central nervous system examination (procedure) |
271888005 | on examination - neurological (finding) |
271924005 | neurological test finding (observable entity) |
18373002 | nervous system function (observable entity) |
...
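The two baseline measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy code assignments for the two coders are assumptions made up for the example (the SNOMED CT identifiers are taken from the list above, but who assigned what is invented).

```python
from collections import Counter


def percentage_agreement(a, b):
    """Proportion of units on which two coders chose exactly the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty code lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two coders on a nominal scale."""
    n = len(a)
    p_o = percentage_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: both coders pick the same code independently,
    # each according to their own marginal code frequencies.
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    # Undefined (division by zero) when p_e == 1, i.e. one code used throughout.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: four units coded by two coders.
coder_a = ["399210005", "268970009", "399210005", "271888005"]
coder_b = ["399210005", "268970009", "271888005", "271888005"]

print(percentage_agreement(coder_a, coder_b))  # 3 of 4 units match
print(cohens_kappa(coder_a, coder_b))
```

Both measures treat every mismatch as equally severe: a procedure coded as a closely related procedure counts the same as a procedure coded as a finding.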

SLIDE 4

Agreement and Semantic Distance

  • Some kinds of coding variation are worse than others
  • E.g., different granularity vs different entity types
  • Coding variation impact is use-case dependent

75367002 | blood pressure (observable entity) |
251076008 | non-invasive arterial pressure (observable entity) |
392570002 | blood pressure finding (finding) |
6973005 | blood pressure taking (procedure) |
371911009 | measurement of blood pressure using cuff method (procedure) |
...

SLIDE 5

SNOMED CT and Inter-rater Agreement Studies

  • Fung 2005 and Vikström 2007: Cohen’s κ
  • Hwang 2006 and Chiang 2006: percentage agreement
  • Andrews 2007: Krippendorff’s α

SLIDE 6

Semantic Krippendorff’s α

  • Difference function based on SNOMED CT hierarchy
  • IsA-levels up to the least common subsumer
  • Ordinal scale Krippendorff’s α

$$\mathrm{lcsPath}_{ck} \overset{\mathrm{def}}{=} \Bigl(\max\bigl(\min \mathrm{dist}(c, \mathrm{LCS}(c,k)),\ \min \mathrm{dist}(k, \mathrm{LCS}(c,k))\bigr)\Bigr)^{2}$$

75367002 | blood pressure (observable entity) | vs 251076008 | non-invasive arterial pressure (observable entity) |
lcsPath = 2² = 4

75367002 | blood pressure (observable entity) | vs 6973005 | blood pressure taking (procedure) |
lcsPath = 5² = 25

SLIDE 7

Distance calculation

75367002 | blood pressure (observable entity) |
6973005 | blood pressure taking (procedure) |
LCS: 138875005 | SNOMED CT Concept |

Shortest IsA path lengths from the two concepts to the LCS: 3 and 5
lcsPath = max(3, 5)² = 5² = 25
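The distance calculation can be sketched on a small IsA graph. Everything in the block below is an assumption made for illustration: the hierarchy is a hypothetical toy whose depths were chosen only to reproduce the arithmetic on the slides (it is not real SNOMED CT content), the function names are invented, and where several common subsumers exist the sketch picks the one that minimises the larger of the two path lengths, which may differ from the authors' R and MATLAB code.

```python
from collections import deque

# Hypothetical toy IsA hierarchy (concept -> list of parents).
PARENTS = {
    "root": [],
    "observable": ["root"],
    "pressure": ["observable"],
    "bp": ["pressure"],                # 3 IsA steps below root
    "bp_mid": ["bp"],
    "bp_noninvasive": ["bp_mid"],      # 2 IsA steps below bp
    "procedure": ["root"],
    "evaluation": ["procedure"],
    "measurement": ["evaluation"],
    "cardio_measurement": ["measurement"],
    "bp_taking": ["cardio_measurement"],  # 5 IsA steps below root
}


def subsumer_distances(concept, parents):
    """Shortest number of IsA steps from a concept to each subsumer (itself = 0)."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:  # breadth-first search upwards, so first visit = shortest path
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def lcs_path(c, k, parents):
    """Squared larger-of-two path length to the closest common subsumer."""
    dist_c = subsumer_distances(c, parents)
    dist_k = subsumer_distances(k, parents)
    common = dist_c.keys() & dist_k.keys()
    # Assumption: among all common subsumers, take the one with the
    # smallest max(distance from c, distance from k).
    steps = min(max(dist_c[s], dist_k[s]) for s in common)
    return steps ** 2


print(lcs_path("bp", "bp_noninvasive", PARENTS))  # same branch, different granularity
print(lcs_path("bp", "bp_taking", PARENTS))       # only the root in common
```

Identical codes get a difference of 0, so exact matches contribute nothing to the observed disagreement.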

SLIDE 8

Material

  • Two human coders A and B
  • 490 procedure codes cross-mapped from NCSP to SNOMED CT
  • Percentage agreement 72 %
  • Datasets
    • Dataset AB: human coders
    • Dataset AC: coder A + novice coding errors introduced
    • Dataset AD: coder A + random codes

[Figure: coder pairs A-B (human, human), A-C (error), A-D (random)]

SLIDE 9

Material

Examples of errors

399210005 | neurological investigation (procedure) |
271888005 | on examination - neurological (finding) |

252641007 | gastrointestinal transit study (procedure) |
83909001 | gastrointestinal transit, function (observable entity) |

275155009 | needle biopsy of kidney (procedure) |
309269002 | kidney biopsy sample (specimen) |

SLIDE 11

Method

  • Implementations for R and MATLAB*
  • Difference function matrix precomputed
  • Bootstrapping, 10 000 iterations

* https://github.com/LiU-IMT/semantic_kripp_alpha
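The steps listed above (a precomputed difference matrix, Krippendorff's α, bootstrapped confidence intervals) can be sketched as follows. This is not a port of the linked R and MATLAB code. It assumes two coders, no missing values and a symmetric difference function with zero self-difference, and computes α = 1 − D_o/D_e from observed and expected disagreement. The bootstrap is a plain percentile bootstrap over units, which is an assumption and may differ from the procedure the authors used; all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def precompute_differences(codes, difference):
    """Tabulate difference(c, k) once for every ordered pair of codes."""
    return {c: {k: difference(c, k) for k in codes} for c in codes}


def krippendorff_alpha(pairs, diff):
    """Alpha for two coders; pairs holds one (code_a, code_b) tuple per unit."""
    n_units = len(pairs)
    n_values = 2 * n_units
    # Observed disagreement: mean difference within units.
    d_o = sum(diff[a][b] for a, b in pairs) / n_units
    # Expected disagreement: mean difference over all pairs of pooled values.
    counts = Counter(code for pair in pairs for code in pair)
    d_e = sum(counts[c] * counts[k] * diff[c][k]
              for c in counts for k in counts) / (n_values * (n_values - 1))
    if d_e == 0:  # only one distinct code in the data: alpha is undefined
        return math.nan
    return 1.0 - d_o / d_e


def bootstrap_ci(pairs, diff, iterations=10_000, level=0.95, seed=0):
    """Percentile interval from resampling units with replacement."""
    rng = random.Random(seed)
    alphas = []
    for _ in range(iterations):
        sample = [rng.choice(pairs) for _ in pairs]
        value = krippendorff_alpha(sample, diff)
        if not math.isnan(value):
            alphas.append(value)
    alphas.sort()
    low = alphas[int((1 - level) / 2 * len(alphas))]
    high = alphas[int((1 + level) / 2 * len(alphas)) - 1]
    return low, high


def nominal(c, k):
    """Nominal difference: every mismatch counts the same."""
    return 0.0 if c == k else 1.0


# Hypothetical toy data: four units, codes "a" and "b".
pairs = [("a", "a"), ("a", "a"), ("b", "b"), ("a", "b")]
codes = sorted({code for pair in pairs for code in pair})
nominal_matrix = precompute_differences(codes, nominal)

alpha = krippendorff_alpha(pairs, nominal_matrix)
low, high = bootstrap_ci(pairs, nominal_matrix, iterations=1000)
print(alpha, (low, high))
```

Swapping `nominal` for a hierarchy-based difference such as lcsPath changes only the precomputed matrix; the α and bootstrap code stay the same, which is presumably why precomputing the matrix pays off over 10 000 iterations.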

SLIDE 12

Results

                              Dataset AB         Dataset AC         Dataset AD
                              (human-human)      (human-novice)     (human-random)
Nominal α, mean (95 % CI)     0.72 (0.68-0.76)   0.72 (0.68-0.76)   0.72 (0.68-0.76)
Semantic α, mean (95 % CI)    0.89 (0.86-0.92)   0.84 (0.80-0.88)   0.47 (0.41-0.53)

SLIDE 13

Discussion

  • Paradoxes: prevalence and bias
  • Only a few codes used more than once
  • Real-life datasets
  • Vs. tailored datasets
  • Constant number of exact matches
  • Value of the Semantic α

SLIDE 14

Discussion

  • Difference function
  • Difference function should match use case
  • Human interpretation of difference vs. SNOMED CT aggregation

250411006 | bone marrow finding (finding) | vs 106048009 | respiratory finding (finding) |
lcsPath: 1, Lin: 0.32

8840004 | decreased breath sounds (finding) | vs 65503000 | absent breath sounds (finding) |
lcsPath: 1, Lin: 0.89
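For reference, the Lin measure is commonly defined from the information content (IC) of the two concepts and of their least common subsumer. The form below is the standard textbook definition, given here as background; the slides do not state how the concept probabilities p(x) were estimated or how the Lin values were turned into a difference function for α, so neither is assumed here.

```latex
\mathrm{sim}_{\mathrm{Lin}}(c,k) = \frac{2\,\mathrm{IC}\bigl(\mathrm{LCS}(c,k)\bigr)}{\mathrm{IC}(c) + \mathrm{IC}(k)},
\qquad \mathrm{IC}(x) = -\log p(x)
```

Because IC grows with specificity, two concepts that share a specific subsumer score higher than two that share only a general one, even when the number of IsA steps to the subsumer is the same. That is the contrast the two concept pairs above illustrate: both have lcsPath 1, but their Lin values differ.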

SLIDE 15

Conclusion

  • Semantic Krippendorff’s α captures the intuition that distance matters
  • Apples and oranges vs Fruit
  • The distance function needs consideration

SLIDE 16

https://github.com/LiU-IMT/semantic_kripp_alpha