Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings
Jingyan Wang, Nihar B. Shah Carnegie Mellon University
Miscalibration
People have different scales when giving numerical scores:
reviewing papers, grading essays, rating products
Wang & Shah Arbitrary Miscalibrations in Ratings 1
Raters may be strict or lenient, extreme or moderate.
“We experimented with reviewer normalization and generally found it significantly harmful.” — John Langford (ICML 2012 program co-chair)
rankings
no assumptions on calibration
[Paul 1981, Flach et al. 2010, Roos et al. 2011, Baba and Kashima 2013, Ge et al. 2013, MacKay et al. 2017]
[Rokeach 1968, Freund et al. 2003, Harzing et al. 2009, Mitliagkas et al. 2011, Ammar et al. 2012, Negahban et al. 2012]
Papers A and B: x_A ∈ [0, 1], x_B ∈ [0, 1]
Reviewer 1: calibration function f_1: [0, 1] → [0, 1]; gives score f_1(x_i) for i ∈ {A, B}
Reviewer 2: calibration function f_2: [0, 1] → [0, 1]; gives score f_2(x_i) for i ∈ {A, B}
f_1, f_2 are strictly monotonic
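A minimal sketch of this setup in Python, assuming the garbled symbols on this slide stand for paper values x_A, x_B and calibration functions f_1, f_2. The particular functions, the paper values, and the helper names (`is_strictly_increasing`, `score`) are hypothetical and chosen only for illustration; strictly increasing functions are used as one instance of "strictly monotonic".

```python
from typing import Callable

Calibration = Callable[[float], float]  # maps [0, 1] -> [0, 1]

def is_strictly_increasing(f: Calibration, grid_size: int = 1001) -> bool:
    """Numerically check that f is strictly increasing on an even grid over [0, 1]."""
    values = [f(k / (grid_size - 1)) for k in range(grid_size)]
    return all(a < b for a, b in zip(values, values[1:]))

def score(f: Calibration, x: float) -> float:
    """Score reported by a reviewer with calibration f for a paper of value x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("paper value must lie in [0, 1]")
    return f(x)

# Hypothetical calibration functions, not taken from the slides.
f_1: Calibration = lambda x: x ** 2     # never scores above x (a strict reviewer)
f_2: Calibration = lambda x: x ** 0.5   # never scores below x (a lenient reviewer)

x = {"A": 0.36, "B": 0.81}  # hypothetical paper values
scores = {(r, i): score(f, x[i]) for r, f in ((1, f_1), (2, f_2)) for i in x}
```

Both reviewers rank the two papers the same way, yet the numbers they report for the same paper differ widely, which is the miscalibration the slides describe.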
f_1(x) = x, f_2(x) = x, x_A = 0.5, x_B = 0.8 ⇒ x_A < x_B
f_1(x) = x/2, f_2(x) = x, x_A = 1, x_B = 0.8 ⇒ x_A > x_B
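The two scenarios on this slide can be checked numerically. The sketch below assumes reviewer 1 scores paper A and reviewer 2 scores paper B (an assumption; the slide text does not state the assignment) and reads the garbled symbols as values x_A, x_B and calibration functions f_1, f_2. The names `scenarios` and `observed` are illustrative.

```python
# Each scenario: (f_1, f_2, x_A, x_B), taken from the two cases on this slide.
scenarios = [
    (lambda x: x,     lambda x: x, 0.5, 0.8),  # x_A < x_B
    (lambda x: x / 2, lambda x: x, 1.0, 0.8),  # x_A > x_B
]

observed = []
for f_1, f_2, x_A, x_B in scenarios:
    # Assumed assignment: reviewer 1 scores paper A, reviewer 2 scores paper B.
    scores = (f_1(x_A), f_2(x_B))
    observed.append((scores, "A" if x_A > x_B else "B"))

for scores, better in observed:
    print(f"scores (A, B) = {scores}, truly better paper: {better}")
```

Under that assignment both scenarios yield the same pair of scores while the truly better paper differs, so any rule that maps a score pair to a fixed answer is wrong in one of the two scenarios.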
Theorem: No deterministic algorithm can always be strictly better than random guessing.
[Robbins 1956] [Cover 1987] [Stein 1956]
Algorithm: Declare the paper with the higher score to be the better one with probability (1 + |difference between the two scores|) / 2.
Theorem: This algorithm uniformly and strictly outperforms random guessing.
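A minimal sketch of this randomized rule in Python, assuming both scores lie in [0, 1] so that the expression is a valid probability, and reading the garbled formula as (1 + |difference between the two scores|) / 2, which agrees with the arithmetic in the worked example later in the slides. The function name `pick_better` and the tie handling are assumptions, not taken from the slides.

```python
import random

def pick_better(score_a: float, score_b: float, rng: random.Random) -> str:
    """Return "A" or "B": the higher-scored paper with probability (1 + |gap|) / 2."""
    if not (0.0 <= score_a <= 1.0 and 0.0 <= score_b <= 1.0):
        raise ValueError("scores are assumed to lie in [0, 1]")
    gap = abs(score_a - score_b)
    p_higher = (1.0 + gap) / 2.0
    # On a tie the gap is 0, so p_higher is 1/2 and the labelling is immaterial.
    higher, lower = ("A", "B") if score_a >= score_b else ("B", "A")
    return higher if rng.random() < p_higher else lower

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
# Empirical frequency of choosing B when A scores 0.1 and B scores 0.9.
freq_b = sum(pick_better(0.1, 0.9, rng) == "B" for _ in range(trials)) / trials
```

With scores 0.1 and 0.9 the rule should choose the higher-scored paper in roughly 90% of runs; the larger the gap between the scores, the more the rule trusts it.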
Example: x_A = 0, x_B = 1

           f_1    f_2
x_A = 0    0.1    0.5
x_B = 1    0.3    0.9

Reviewer 1 scores A, reviewer 2 scores B: (1 + |0.1 − 0.9|) / 2 = 0.9
Reviewer 1 scores B, reviewer 2 scores A: (1 + |0.3 − 0.5|) / 2 = 0.6
Averaging over the two assignments, the probability of a correct answer is (0.9 + (1 − 0.6)) / 2 = 0.65 > 0.5
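The arithmetic in this example can be reproduced exactly. The sketch below assumes each of the two reviewer-to-paper assignments is equally likely (consistent with the final division by 2) and uses the four scores from the example; `success_probability` is a hypothetical helper name.

```python
from fractions import Fraction as F

x = {"A": F(0), "B": F(1)}                  # paper values from the example
f = {
    1: {"A": F(1, 10), "B": F(3, 10)},      # reviewer 1's scores for A and B
    2: {"A": F(5, 10), "B": F(9, 10)},      # reviewer 2's scores for A and B
}

def success_probability(score_a: F, score_b: F) -> F:
    """Probability that the rule names the truly better paper, given the two scores."""
    p_higher = (1 + abs(score_a - score_b)) / 2
    higher = "A" if score_a > score_b else "B"
    better = "A" if x["A"] > x["B"] else "B"
    return p_higher if higher == better else 1 - p_higher

# Assignment 1: reviewer 1 scores A, reviewer 2 scores B.
p1 = success_probability(f[1]["A"], f[2]["B"])
# Assignment 2: reviewer 2 scores A, reviewer 1 scores B.
p2 = success_probability(f[2]["A"], f[1]["B"])
overall = (p1 + p2) / 2
print(p1, p2, overall)  # 9/10 2/5 13/20
```

In the second assignment the higher score belongs to the worse paper, so the rule is right only with probability 1 − 0.6 = 0.4; because that gap is small, the loss there is outweighed by the gain in the first assignment.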
[Saxena et al. 2018]