Rater agreement - ordinal ratings

Karl Bang Christensen, Dept. of Biostatistics, Univ. of Copenhagen. NORDSTAT, 2012. http://biostat.ku.dk/~kach/

Rater agreement - ordinal ratings
Methods for analyzing rater agreement are well established when ratings are dichotomous or can be assumed to be normally distributed.
Notation: raters $r = 1, \dots, R$; subjects $s = 1, \dots, S$; ratings $X_{rs} \in \{0, 1, \dots, K\}$.

Two raters (R=2)
(Figure: cross-table of the two ratings with the agreement cells marked A.)
$\Pr(a)$: observed proportion of agreement.
κ coefficient:
$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \qquad (1)$$
where
$$\Pr(e) = \sum_k \Pr(X_1 = k) \Pr(X_2 = k)$$
is the expected proportion of agreement under independence.
Cohen. Educational and Psychological Measurement, 1960, 20:37-46.
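
As a minimal sketch of equation (1), the following computes κ from two sequences of ratings. The function name `cohen_kappa` and the ten ratings are hypothetical and serve only as an illustration; they are not data from the talk.

```python
from collections import Counter

def cohen_kappa(x1, x2):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(x1) != len(x2) or not x1:
        raise ValueError("need two non-empty rating sequences of equal length")
    n = len(x1)
    # Observed proportion of agreement, Pr(a).
    p_a = sum(a == b for a, b in zip(x1, x2)) / n
    # Marginal counts for each rater (a Counter returns 0 for unseen categories).
    m1, m2 = Counter(x1), Counter(x2)
    # Expected agreement under independence, Pr(e) = sum_k Pr(X1=k) Pr(X2=k).
    p_e = sum(m1[k] * m2[k] for k in m1) / n**2
    # Note: undefined (division by zero) if both raters use one single category.
    return (p_a - p_e) / (1 - p_e)

# Hypothetical ratings of 10 subjects on a 0..2 scale (illustration only).
rater1 = [0, 1, 1, 2, 2, 1, 0, 1, 2, 1]
rater2 = [0, 1, 2, 2, 1, 1, 0, 1, 2, 0]
kappa = cohen_kappa(rater1, rater2)
print(round(kappa, 3))
```

Here the raters agree on 7 of 10 subjects and the expected agreement is 0.35, so κ = 0.35/0.65.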

Two raters (R=2)
The κ coefficient (1) is widely used, but
(i) its value depends on the margins and thus on the sample,
(ii) if ratings are ordinal, this is not taken into account,
(iii) there is no rationale for saying that, e.g., κ > 0.7 is good.
The κ coefficient (1) is a marginal (population average) measure.
Cohen. Educational and Psychological Measurement, 1960, 20:37-46.

(i) value depends on margins
Three tables, each with agreement on 70 of the 100 subjects:

         Table 1        Table 2        Table 3
         X2=+  X2=-     X2=+  X2=-     X2=+  X2=-
X1=+      20    20       10    20        5    20
X1=-      10    50       10    60       10    65
κ         0.35           0.21           0.08
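
The three κ values can be checked directly from the counts. This is a small sketch; the helper name `kappa_from_table` is an assumption, and the tables are the ones shown on the slide (rows X1, columns X2).

```python
def kappa_from_table(table):
    """Cohen's kappa from a square table of counts (rows: X1, columns: X2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of subjects on the diagonal.
    p_a = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Expected agreement under independence, from the two margins.
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    return (p_a - p_e) / (1 - p_e)

# The three 2 x 2 tables from the slide; each has 70 of 100 on the diagonal.
tables = [
    [[20, 20], [10, 50]],
    [[10, 20], [10, 60]],
    [[5, 20], [10, 65]],
]
kappas = [kappa_from_table(t) for t in tables]
print([round(v, 2) for v in kappas])
```

The observed agreement is 0.70 in all three tables; only the margins, and hence Pr(e), change.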

(ii) if ratings are ordinal this is not taken into account
Weighted κ coefficient: cells off the diagonal count as partial agreement, e.g.

A      0.75A  0.50A  0.25A
0.75A  A      0.75A  0.50A
0.50A  0.75A  A      0.75A
0.25A  0.50A  0.75A  A

The weights are arbitrary (two standards are implemented in SAS).
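
A hedged sketch of a weighted κ with agreement weights follows. The function names and the 4 x 4 table of counts are hypothetical. The linear and quadratic weight functions are written in the form usually attributed to Cicchetti-Allison and Fleiss-Cohen with integer scores, which I take to be the two standards referred to above; `slide_weights` is the matrix shown on the slide.

```python
def weighted_kappa(table, weights):
    """Weighted kappa from a square count table and agreement weights w_ij."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed and expected agreement.
    p_o = sum(weights[i][j] * table[i][j] for i, j in cells) / n
    p_e = sum(weights[i][j] * row_tot[i] * col_tot[j] for i, j in cells) / n**2
    return (p_o - p_e) / (1 - p_e)

def linear_weights(k):
    """w_ij = 1 - |i - j| / (k - 1)."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

def quadratic_weights(k):
    """w_ij = 1 - (i - j)^2 / (k - 1)^2."""
    return [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]

def identity_weights(k):
    """Weights 1 on the diagonal and 0 elsewhere: the unweighted kappa."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

# Weight matrix from the slide: 1, 0.75, 0.50, 0.25 by distance from the diagonal.
slide_weights = [[1 - 0.25 * abs(i - j) for j in range(4)] for i in range(4)]

# Hypothetical 4 x 4 table of counts (illustration only).
table = [[10, 3, 1, 0], [4, 12, 3, 1], [1, 4, 11, 2], [0, 1, 3, 9]]
for name, w in [("unweighted", identity_weights(4)),
                ("linear", linear_weights(4)),
                ("quadratic", quadratic_weights(4)),
                ("slide", slide_weights)]:
    print(name, round(weighted_kappa(table, w), 3))
```

Running the same table through several weight matrices makes the point of the slide concrete: the value reported depends on a choice of weights that the data do not dictate.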

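Marginal homogeneity
Beyond agreement we would want
$$\Pr(X_1 = k) = \Pr(X_2 = k) \quad \text{for all } k = 0, 1, \dots, K.$$
Bowker's test of symmetry is used to test this hypothesis (strictly, it tests symmetry, $\Pr(X_1 = j, X_2 = k) = \Pr(X_1 = k, X_2 = j)$, which implies marginal homogeneity).
For $K = 1$ this is McNemar's test, based on
$$\frac{\Pr(X_1 = 1, X_2 = 0)}{\Pr(X_1 = 1, X_2 = 0) + \Pr(X_1 = 0, X_2 = 1)},$$
which equals 1/2 under the hypothesis.
McNemar. Psychometrika, 1947, 12:153-157.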
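
For the dichotomous case, McNemar's test uses only the two discordant cells. The sketch below uses the chi-square form without continuity correction; the function name `mcnemar` and the counts are hypothetical.

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square test from the two discordant counts.

    b: number of subjects with (X1, X2) = (1, 0)
    c: number of subjects with (X1, X2) = (0, 1)
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    stat = (b - c) ** 2 / (b + c)
    # Chi-square with 1 df: the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    # Estimate of the conditional probability above; 1/2 under the hypothesis.
    share = b / (b + c)
    return stat, p_value, share

# Hypothetical discordant counts (illustration only).
stat, p_value, share = mcnemar(20, 10)
print(round(stat, 2), round(p_value, 3), round(share, 2))
```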

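Continuous data: regression model
$$X_{rs} = \delta_r + \gamma_s + \epsilon_{rs}, \qquad \epsilon_{rs} \sim N(0, \omega^2)$$
Limits of agreement / Bland-Altman plot:
$$X_{1s} - X_{2s} = \delta_1 - \delta_2 + (\epsilon_{1s} - \epsilon_{2s})$$
Since $\epsilon_{1s} - \epsilon_{2s} \sim N(0, 2\omega^2)$, i.e. $V(\epsilon_{1s} - \epsilon_{2s}) = 2\omega^2$, the 95% reference interval for the difference is $\delta_1 - \delta_2 \pm 1.96\sqrt{2}\,\omega$.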
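
A minimal sketch of the limits of agreement, estimated from paired measurements: the mean difference estimates δ1 − δ2 and the standard deviation of the differences estimates √2·ω. The function name and the measurements are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(x1, x2, z=1.96):
    """Return (lower, bias, upper) for the differences x1 - x2."""
    diffs = [a - b for a, b in zip(x1, x2)]
    bias = mean(diffs)    # estimates delta_1 - delta_2
    sd = stdev(diffs)     # estimates sqrt(2) * omega
    return bias - z * sd, bias, bias + z * sd

# Hypothetical paired measurements by two raters (illustration only).
x1 = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0]
x2 = [10.0, 11.9, 9.5, 12.4, 10.4, 11.3]
lower, bias, upper = limits_of_agreement(x1, x2)
print(round(lower, 2), round(bias, 2), round(upper, 2))
```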

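Ordinal data: regression models / IRT
Divide-by-total models:
$$\Pr(X_{rs} = x \mid \theta) = \begin{cases} \dfrac{\exp\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)}{\sum_{l} \exp\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)} \\[2ex] \dfrac{\exp\left(\alpha_r\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)\right)}{\sum_{l} \exp\left(\alpha_r\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)\right)} \end{cases}$$
($K = 1$: logistic regression)
Threshold models:
$$\Pr(X_{rs} = x \mid \theta) = \begin{cases} \Phi(..) - \Phi(..) \\ \operatorname{expit}(..) - \operatorname{expit}(..) \end{cases}$$
Thissen, Steinberg. Psychometrika, 1986, 51:567-577.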
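
The divide-by-total probabilities can be sketched as follows. The function name `category_probs`, the argument layout and the numerical values are assumptions for illustration; the sum in the denominator is taken over all categories l = 0, ..., K, with an empty sum of thresholds for l = 0.

```python
import math

def category_probs(theta, betas, alpha=1.0):
    """Pr(X = x | theta) for x = 0..K under a divide-by-total model.

    betas: thresholds (beta_1, ..., beta_K) of one rater.
    alpha: rater discrimination; alpha = 1 gives the first form above.
    """
    # Linear predictors alpha * (x*theta - sum_{k<=x} beta_k); 0 for x = 0.
    eta, cum = [0.0], 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        eta.append(alpha * (x * theta - cum))
    m = max(eta)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical person location and thresholds (illustration only).
probs = category_probs(0.3, [-0.5, 0.4, 1.2])
print([round(p, 3) for p in probs])

# K = 1 reduces to logistic regression: Pr(X = 1) = expit(theta - beta).
p_one = category_probs(0.3, [-0.5])[1]
print(round(p_one, 3))
```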

(Figure, horizontal axis $\theta$: $\theta = \theta_s$ is the latent location of subject $s$; $(\beta_{rk})_{k=1,\dots,K}$ are the rater parameters.)

Marginal homogeneity
Rater parameters $\beta_r = (\beta_{r1}, \dots, \beta_{rK})$. Test $H_0: \beta_r = \beta$ for all $r = 1, \dots, R$ using a likelihood ratio test based on (2) or (3).
Example: 150 subjects, two raters

         X1=0  X1=1  X1=2
X2=0       9    10     1
X2=1      22    59    14
X2=2       3    25     7

Bowker's test: S = 8.6, df = 3, p = 0.0351.
LRT based on (2): −2 log Q = 12.2, df = 2, p = 0.0022.
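
The Bowker statistic for the example table can be recomputed with a few lines of standard-library Python. This is a sketch: `chi2_sf` is a hand-written helper for the chi-square upper tail with integer degrees of freedom, and the function names are assumptions. The likelihood ratio statistic 12.2 is taken from the slide, not refitted.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    z = x / 2.0
    if df % 2:                       # odd df: start from df = 1
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                            # even df: start from df = 2
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q

def bowker_test(table):
    """Bowker's test of symmetry for a square table of counts."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:            # pairs with no observations are skipped
                stat += (table[i][j] - table[j][i]) ** 2 / total
                df += 1
    return stat, df, chi2_sf(stat, df)

# Example table from the slide (rows X2 = 0, 1, 2; columns X1 = 0, 1, 2).
table = [[9, 10, 1], [22, 59, 14], [3, 25, 7]]
stat, df, p = bowker_test(table)
print(round(stat, 1), df, round(p, 4))

# p-value for the reported likelihood ratio statistic, df = 2.
p_lrt = chi2_sf(12.2, 2)
print(round(p_lrt, 4))
```

The output agrees with the values reported on the slide: S = 8.6 on 3 degrees of freedom with p close to 0.035, and p close to 0.0022 for the likelihood ratio test.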

Quantify agreement
For a randomly chosen person $s$ with location $\theta_s = \theta$, compute
a reference interval for $|X_{1s} - X_{2s}|$,
$\Pr(X_{1s} = X_{2s})$, or
$\Pr(|X_{1s} - X_{2s}| > 1)$.
If $\theta \sim N(0, \omega^2)$: do the computations for the 'typical' person, $\theta = 0$.
Compare to the population distribution of $\theta$.

Example, $X \in \{0, 1, 2\}$
Marginal homogeneity $H_0: \beta_{r1} = \beta_{r2}$ accepted. Common estimate $(\beta_1, \beta_2) = (-0.82, -0.75)$.
Table:
$$\begin{pmatrix} \Pr(X_1 = 0, X_2 = 0 \mid \theta) & \Pr(X_1 = 0, X_2 = 1 \mid \theta) & \Pr(X_1 = 0, X_2 = 2 \mid \theta) \\ \Pr(X_1 = 1, X_2 = 0 \mid \theta) & \Pr(X_1 = 1, X_2 = 1 \mid \theta) & \Pr(X_1 = 1, X_2 = 2 \mid \theta) \\ \Pr(X_1 = 2, X_2 = 0 \mid \theta) & \Pr(X_1 = 2, X_2 = 1 \mid \theta) & \Pr(X_1 = 2, X_2 = 2 \mid \theta) \end{pmatrix}$$
(Normal latent distribution: typical patient, $\theta = 0$.)
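
A sketch of how such a table could be filled in at θ = 0. It rests on two assumptions that the slides do not spell out: the first divide-by-total form is used for both raters, and the two ratings are independent given θ (local independence). The common estimates (−0.82, −0.75) are taken from the slide; the resulting numbers are not claimed to match the author's results.

```python
import math

def category_probs(theta, betas):
    """Pr(X = x | theta), x = 0..K, first divide-by-total form (assumed)."""
    eta, cum = [0.0], 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        eta.append(x * theta - cum)
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

betas = (-0.82, -0.75)   # common estimate from the slide
theta = 0.0              # 'typical' person
p1 = category_probs(theta, betas)   # rater 1
p2 = category_probs(theta, betas)   # rater 2, same parameters under H0

# Joint table Pr(X1 = i, X2 = j | theta), assuming independence given theta.
joint = [[a * b for b in p2] for a in p1]
k = len(p1)
agreement = sum(joint[x][x] for x in range(k))
far_apart = sum(joint[i][j] for i in range(k) for j in range(k) if abs(i - j) > 1)

for row in joint:
    print([round(v, 3) for v in row])
print(round(agreement, 3), round(far_apart, 3))
```

Evaluating the same quantities over a grid of θ values gives the agreement probability as a function of the latent location.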

Example, $X \in \{0, 1, 2\}$
Agreement: $\Pr((X_1, X_2) \in \{(0,0), (1,1), (2,2)\} \mid \theta)$

Examples of clinical applications
Tönnis grade 0,1,2,3: rating of x-rays in a hip surgery population.
Modified Ashworth Scale 0,1,2,3,4,5: spasticity as a complication in spinal cord lesion patients (hospital sample). Sparse tables.
Assessment of exercise-induced laryngeal obstruction 0,1,2,3: subsampling of best and worst cases.

Issues
Conditional inference if person locations cannot be assumed to be normally distributed.
Reduced rank parametrization if tables are sparse.
Interpretation on original scale.


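Ordinal data: regression models / IRT (Marginal or Conditional inference)
Divide-by-total models (C: conditional inference, M: marginal inference):
$$\Pr(X_{rs} = x \mid \theta) = \begin{cases} \dfrac{\exp\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)}{\sum_{l} \exp\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)} & (C, M) \\[2ex] \dfrac{\exp\left(\alpha_r\left(x\theta_s - \sum_{k=1}^{x} \beta_{rk}\right)\right)}{\sum_{l} \exp\left(\alpha_r\left(l\theta_s - \sum_{k=1}^{l} \beta_{rk}\right)\right)} & (M) \end{cases}$$
($K = 1$: logistic regression)
Threshold models:
$$\Pr(X_{rs} = x \mid \theta) = \begin{cases} \Phi(..) - \Phi(..) & (M) \\ \operatorname{expit}(..) - \operatorname{expit}(..) & (M) \end{cases}$$
Thissen, Steinberg. Psychometrika, 1986, 51:567-577.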

Let $X_s = (X_{1s}, \dots, X_{Rs})$ and $x_s = (x_{1s}, \dots, x_{Rs})$.
Marginal inference:
$$l_M(\beta) = \sum_s \log \int \Pr(X_s = x_s \mid \theta_s)\, \varphi(\theta_s)\, d\theta_s \qquad (2)$$
similar to the model yielding limits of agreement.
Conditional inference:
$$l_C(\beta) = \Pr(X_s = x_s \mid X_{1s} + \dots + X_{Rs} = x_{1s} + \dots + x_{Rs}) \qquad (3)$$
similar to the McNemar test.
Bock, Aitkin. Psychometrika, 1981, 46:443-459.
Andersen. Journal of the Royal Statistical Society B, 1972, 34:42-54.
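
The reason conditional inference works for the first divide-by-total form is that, given the sum of the ratings, the probability of a rating pattern no longer depends on θ. The sketch below checks this numerically; it assumes independence of the raters given θ, and the function names and rater thresholds are hypothetical.

```python
import math
from itertools import product

def category_probs(theta, betas):
    """Pr(X = x | theta), x = 0..K, first divide-by-total form."""
    eta, cum = [0.0], 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        eta.append(x * theta - cum)
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

def conditional_prob(x, theta, rater_betas):
    """Pr(X_s = x | sum of ratings = sum(x), theta), assuming local independence."""
    probs = [category_probs(theta, b) for b in rater_betas]

    def joint(pattern):
        return math.prod(p[v] for p, v in zip(probs, pattern))

    target = sum(x)
    ranges = [range(len(p)) for p in probs]
    # Sum over all rating patterns with the same total score.
    denom = sum(joint(pat) for pat in product(*ranges) if sum(pat) == target)
    return joint(x) / denom

# Hypothetical thresholds for R = 2 raters, K = 2 (illustration only).
rater_betas = [(-0.5, 0.6), (0.1, 0.9)]
c_low = conditional_prob((1, 2), -1.0, rater_betas)
c_high = conditional_prob((1, 2), 2.0, rater_betas)
print(round(c_low, 6), round(c_high, 6))
```

The two printed values coincide even though θ differs, so the conditional probability carries information about the rater parameters only.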

Reduced rank parametrization
Interpreting and testing differences in rater parameters $\beta_r = (\beta_{rx})_{x=1,\dots,K}$ and $\beta_{r'} = (\beta_{r'x})_{x=1,\dots,K}$ can be difficult for $K = 4, 5, \dots$
Reparametrization using a 'location' parameter $\mu_r$ and a 'spread' parameter $\sigma_r$:
$$\beta_{rx} = \mu_r + (2x - m - 1)\,\sigma_r.$$
Andrich. Psychometrika, 1982, 47:105-113.

Reduced rank parametrization
Reparametrize to (location, spread):
$$\left(\frac{\beta_1 + \beta_2}{2}, \frac{\beta_2 - \beta_1}{2}\right).$$
Hypotheses:
Raters differ only wrt. location
Raters differ only wrt. spread
Raters do not differ
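
A small sketch of the location and spread parametrization. It assumes that m in the formula is the number of thresholds (so m = K), which is consistent with the two-threshold case above; the function names are hypothetical. The thresholds (−0.82, −0.75) are the common estimates from the earlier example.

```python
def thresholds(mu, sigma, m):
    """Thresholds beta_x = mu + (2x - m - 1) * sigma for x = 1..m."""
    return [mu + (2 * x - m - 1) * sigma for x in range(1, m + 1)]

def location_spread_two(beta1, beta2):
    """Location and spread for two thresholds (the K = 2 case)."""
    return (beta1 + beta2) / 2, (beta2 - beta1) / 2

mu, sigma = location_spread_two(-0.82, -0.75)
print(round(mu, 3), round(sigma, 3))

# Going back from (mu, sigma) recovers the two thresholds.
print([round(b, 2) for b in thresholds(mu, sigma, 2)])

# With five thresholds the rater is still described by two numbers:
# equally spaced thresholds, 2 * sigma apart, centred at mu.
print([round(b, 2) for b in thresholds(0.3, 0.5, 5)])
```

Under this parametrization, "raters differ only wrt. location" means equal σ across raters, and "raters differ only wrt. spread" means equal µ.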

Interpretation on original scale
The probability of agreement across values of $\theta$ can be compared to
the modeled distribution: $\varphi(\theta)$,
the empirical distribution: $\hat\theta_1, \dots, \hat\theta_S$, found by maximizing $L(\theta) = \Pr_{\hat\beta}(X_s = x_s \mid \theta)$,
the values $E(X_{rs} \mid \theta_s = \theta)$, which are the same for all $r$ under marginal homogeneity.
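
To put θ on the original rating scale, one can compute the expected rating E(X | θ) and invert it. The sketch below assumes the first divide-by-total form with the common estimates (−0.82, −0.75) from the earlier example; the function names and the bisection interval are assumptions.

```python
import math

def category_probs(theta, betas):
    """Pr(X = x | theta), x = 0..K, first divide-by-total form (assumed)."""
    eta, cum = [0.0], 0.0
    for x, b in enumerate(betas, start=1):
        cum += b
        eta.append(x * theta - cum)
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

def expected_score(theta, betas):
    """E(X | theta); increasing in theta for this model."""
    return sum(x * p for x, p in enumerate(category_probs(theta, betas)))

def theta_for_score(target, betas, lo=-20.0, hi=20.0, iterations=100):
    """Solve E(X | theta) = target by bisection (0 < target < K assumed)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if expected_score(mid, betas) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

betas = (-0.82, -0.75)
targets = [0.5, 1.0, 1.5]
thetas = [theta_for_score(t, betas) for t in targets]
for t, th in zip(targets, thetas):
    print(t, round(th, 3))
```

With two thresholds, an expected rating of 1.0 corresponds to θ equal to the mean of the thresholds, here −0.785.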

Example, $X \in \{0, 1, 2\}$
Agreement: $\Pr((X_1, X_2) \in \{(0,0), (1,1), (2,2)\} \mid \theta)$
(Figures: the agreement probability plotted with the horizontal axis labelled $\theta$, then $\hat\theta$, then $E(X \mid \hat\theta) = 0.5, 1.0, 1.5$.)
