Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings - PowerPoint PPT Presentation



SLIDE 1

Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings

Jingyan Wang, Nihar B. Shah Carnegie Mellon University

SLIDE 2

Miscalibration

People have different scales when giving numerical scores.

reviewing papers, grading essays, rating products

Wang & Shah Arbitrary Miscalibrations in Ratings 1

SLIDE 3

People are miscalibrated

(figure: reviewers range from strict to lenient, and from extreme to moderate)


SLIDE 4
Miscalibration

  • Ammar et al. 2012: “The rating scale as well as the individual ratings are often arbitrary and may not be consistent from one user to another.”
  • Mitliagkas et al. 2011: “A raw rating of 7 out of 10 in the absence of any other information is potentially useless.”

What should we do with these scores?


SLIDE 5
Two approaches in the literature

  • 1. Assume simplified models for calibration [Paul 1981, Flach et al. 2010, Roos et al. 2011, Baba and Kashima 2013, Ge et al. 2013, MacKay et al. 2017]
      • People are complex [e.g. Griffin and Brenner 2008]
      • Did not work well in practice: “We experimented with reviewer normalization and generally found it significantly harmful.” — John Langford (ICML 2012 program co-chair)
  • 2. Use rankings [Rokeach 1968, Freund et al. 2003, Harzing et al. 2009, Mitliagkas et al. 2011, Ammar et al. 2012, Negahban et al. 2012]
      • Use rankings induced from the scores, or directly collect rankings
      • Commonly believed to be the only useful information if no assumptions are made on calibration


SLIDE 6

Folklore belief

Freund et al. 2003: “[Using rankings instead of ratings] becomes very important when we combine the rankings of many viewers who often use completely different ranges of scores to express identical preferences.”

Is it possible to do better than rankings with essentially no assumptions on the calibration?


SLIDE 7
Simplified setting

  • Two papers, B and C, with true qualities y_B ∈ [0, 1] and y_C ∈ [0, 1]
  • Reviewer 1 has a calibration function g_1 : [0, 1] → [0, 1] and reports the score g_1(y_j) for the paper j ∈ {B, C} assigned to them
  • Reviewer 2 has a calibration function g_2 : [0, 1] → [0, 1] and reports the score g_2(y_j) for the paper j ∈ {B, C} assigned to them
  • g_1 and g_2 are strictly monotonic
  • An adversary chooses y_B, y_C and the strictly monotonic g_1, g_2
  • Papers are assigned to reviewers at random


SLIDE 8
Simplified setting

  • As before: y_B, y_C ∈ [0, 1], and each reviewer applies a strictly monotonic calibration function g_1, g_2 : [0, 1] → [0, 1] to the quality of the paper assigned to them
  • z_j denotes the score given by reviewer j ∈ {1, 2}
  • Goal: infer whether y_B > y_C or y_B < y_C
  • Eliciting a ranking is vacuous here: each reviewer sees only one paper, so rankings reduce to the random-guessing baseline

Given z_1, z_2 and the assignment, is it possible to infer whether y_B > y_C or y_B < y_C better than random guessing?


SLIDE 9

Impossibility?

Intuition: the reported scores can be due either to the true qualities y or to the calibration functions g.

Case I: g_1(y) = y and g_2(y) = y, with y_B = 0.5 and y_C = 0.8. The observed scores are z_1 = 0.5 and z_2 = 0.8, and y_B < y_C.

Case II: g_1(y) = y/2 and g_2(y) = y, with y_B = 1 and y_C = 0.8. The observed scores are again z_1 = 0.5 and z_2 = 0.8, but now y_B > y_C.

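The two cases above can be checked in a few lines. A minimal sketch (function and variable names are my own) showing that the two worlds produce identical observed scores while disagreeing on which paper is better:

```python
# Two "worlds" from the slide: different true qualities and calibration
# functions, but identical observed scores (z_1, z_2).

def world_case_1():
    # g_1(y) = y, g_2(y) = y; true qualities y_B = 0.5 < y_C = 0.8
    g1 = lambda y: y
    g2 = lambda y: y
    y_B, y_C = 0.5, 0.8
    return g1(y_B), g2(y_C)  # reviewer 1 scores paper B, reviewer 2 scores C

def world_case_2():
    # g_1(y) = y / 2, g_2(y) = y; true qualities y_B = 1.0 > y_C = 0.8
    g1 = lambda y: y / 2
    g2 = lambda y: y
    y_B, y_C = 1.0, 0.8
    return g1(y_B), g2(y_C)

# Both worlds yield scores (0.5, 0.8), yet the correct answer differs,
# so any deterministic rule mapping scores to an answer fails in one of them.
assert world_case_1() == world_case_2() == (0.5, 0.8)
```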

SLIDE 10

Impossibility… for deterministic algorithms

Theorem: No deterministic algorithm can always be strictly better than random guessing.

Related phenomena:
  • Stein’s paradox [Stein 1956]
  • Empirical Bayes [Robbins 1956]
  • Two-envelope problem [Cover 1987]


SLIDE 11

Proposed algorithm

Algorithm: Declare the paper with the higher score to be the better one, with probability (1 + |z_1 − z_2|) / 2.

Theorem: This algorithm uniformly and strictly outperforms random guessing.

Scores > rankings!

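A minimal sketch of the randomized rule stated above; the function name and the use of Python's `random` module are my own choices:

```python
import random

def judge_better(z1, z2, rng=random):
    """Return 1 or 2: which reviewer's paper is judged better.

    Declares the higher-scored paper better with probability
    (1 + |z1 - z2|) / 2, and the other paper otherwise.
    Assumes scores z1, z2 lie in [0, 1].
    """
    higher, lower = (1, 2) if z1 >= z2 else (2, 1)
    if rng.random() < (1 + abs(z1 - z2)) / 2:
        return higher
    return lower

# Example: with scores 0.1 and 0.9, reviewer 2's paper is declared
# better with probability (1 + 0.8) / 2 = 0.9.
```

Note the randomization: when the two scores are equal the rule reduces to a fair coin flip, and the further apart the scores are, the more confidently it follows them.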

SLIDE 12

Intuition

Example: two papers with true qualities y_B = 1 and y_C = 2.

Algorithm: Declare the paper with the higher score to be the better one, with probability (1 + |z_1 − z_2|) / 2.


SLIDE 13

Intuition

One reviewer has calibration g_2: g_2(y_B = 1) = 0.1 and g_2(y_C = 2) = 0.3.

Algorithm: Declare the paper with the higher score to be the better one, with probability (1 + |z_1 − z_2|) / 2.


SLIDE 14

Intuition

Two reviewers with calibrations g_2 and g_3: g_2(y_B = 1) = 0.1, g_2(y_C = 2) = 0.3; g_3(y_B = 1) = 0.5, g_3(y_C = 2) = 0.9.

Algorithm: Declare the paper with the higher score to be the better one, with probability (1 + |z_1 − z_2|) / 2.


SLIDE 15
Intuition

Calibrations as before: g_2(y_B = 1) = 0.1, g_2(y_C = 2) = 0.3; g_3(y_B = 1) = 0.5, g_3(y_C = 2) = 0.9.

  • Under the blue assignment (g_2 scores paper B, g_3 scores paper C), the algorithm outputs paper C, the truly better paper, with probability (1 + (0.9 − 0.1)) / 2 = 0.9
  • Under the red assignment (g_2 scores paper C, g_3 scores paper B), the algorithm outputs paper B, the worse paper, with probability (1 + (0.5 − 0.3)) / 2 = 0.6
  • On average over the random assignment, the algorithm is correct with probability (0.9 + (1 − 0.6)) / 2 = 0.65 > 0.5

Algorithm: Declare the paper with the higher score to be the better one, with probability (1 + |z_1 − z_2|) / 2.

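The averaging above can be reproduced directly. A small check of the arithmetic, using the slide's table of calibrations (variable names are my own):

```python
# Table from the slide: g2(y_B)=0.1, g2(y_C)=0.3, g3(y_B)=0.5, g3(y_C)=0.9,
# with paper C truly better than paper B.

# Blue assignment: g2 scores B (0.1), g3 scores C (0.9).
# The higher score belongs to C, the truly better paper, so the
# algorithm is correct with probability (1 + (0.9 - 0.1)) / 2.
p_correct_blue = (1 + (0.9 - 0.1)) / 2       # = 0.9

# Red assignment: g2 scores C (0.3), g3 scores B (0.5).
# The higher score belongs to B, the worse paper, so the algorithm is
# wrong with probability (1 + (0.5 - 0.3)) / 2 = 0.6, correct with 0.4.
p_correct_red = 1 - (1 + (0.5 - 0.3)) / 2    # = 0.4

# Averaged over the uniformly random assignment:
p_correct = (p_correct_blue + p_correct_red) / 2
assert abs(p_correct - 0.65) < 1e-9          # strictly better than 0.5
```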

SLIDE 16
Extensions

  • A/B testing and ranking
  • Noisy setting


SLIDE 17
Take-aways

  • Scores > rankings, even in the presence of arbitrary miscalibration
  • Randomized decisions are good for both inference and fairness [Saxena et al. 2018]

