SLIDE 1

how to design an honest rating system

Sergey I. Nikolenko1,2 AI Rush 2017 Dnipro, February 18, 2017

1Laboratory for Internet Studies, NRU Higher School of Economics, St. Petersburg 2Steklov Institute of Mathematics at St. Petersburg

Random facts: February 18, 1268: forces of the Livonian Order were defeated by Dovmont of Pskov in the Battle of Rakvere. February 18, 1930: Elm Farm Ollie became the first cow to fly in, and be milked aboard, an aircraft. February 18, 1943: Joseph Goebbels delivered his Sportpalast speech. February 18, 1954: the first Church of Scientology was established in Los Angeles.

SLIDE 2

bayesian rating systems

SLIDE 3

my personal motivation

  • «What? Where? When?»: a team game of answering questions.

Sometimes it looks like this...

SLIDE 4

my personal motivation

  • ...but usually it looks like this:

SLIDE 5

my personal motivation

  • Teams of ≤ 6 players answer questions.
  • Whoever gets the most correct answers wins.
  • My motivation was to create a rating system that would predict tournament results from team rosters.
  • Characteristic features that make the problem hard:
  • it’s a hobby: players have no contracts, teams do not have permanent rosters, and playing for many teams is common;
  • hence, we cannot simply make a rating list of teams; we need to go deeper, to individual players;
  • but we do not know how individual players perform, only team results;
  • relatively few questions per tournament (36, 45, 60), hence large multiway ties;
  • undersized teams are common.

SLIDE 6

introduction

  • In probabilistic rating models, Bayesian inference aims to find a linear ordering on a certain set given noisy comparisons of relatively small subsets of this set.
  • Useful whenever there is no way to compare a large number of entities directly, and only partial (noisy) comparisons are available.
  • We will stick to the metaphor of matches and players.
  • The Elo rating system was the first probabilistic rating model.

SLIDE 7

introduction

  • Bradley–Terry models: assume that each player has a “true” rating 𝛿𝑗, and the win probability is proportional to this rating: player 1 wins over player 2 with probability 𝛿1/(𝛿1 + 𝛿2).
  • Inference: fit this model to the data from matches played.
  • There are several extensions, but large matches are hard for Bradley–Terry models.
  • The model that looked right to us for «What? Where? When?» was TrueSkill.
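As a concrete illustration (my own sketch, not code from the talk), a Bradley–Terry model can be fitted with the classical minorization–maximization updates; the player names and match data below are hypothetical:

```python
from collections import defaultdict

def fit_bradley_terry(matches, n_iters=200):
    """matches: list of (winner, loser) pairs.
    Returns 'true' ratings delta_j, normalized to sum to 1."""
    players = {p for m in matches for p in m}
    wins = defaultdict(int)    # W_i: total wins of player i
    games = defaultdict(int)   # n_ij: games played between i and j
    for w, l in matches:
        wins[w] += 1
        games[frozenset((w, l))] += 1
    delta = {p: 1.0 for p in players}
    for _ in range(n_iters):
        new = {}
        for i in players:
            # MM update: delta_i <- W_i / sum_j n_ij / (delta_i + delta_j)
            denom = sum(games[frozenset((i, j))] / (delta[i] + delta[j])
                        for j in players if j != i and games[frozenset((i, j))])
            new[i] = wins[i] / denom if denom > 0 else delta[i]
        total = sum(new.values())
        delta = {p: d / total for p, d in new.items()}
    return delta

def win_prob(delta, i, j):
    # P(i beats j) = delta_i / (delta_i + delta_j)
    return delta[i] / (delta[i] + delta[j])
```

For example, if A beats B twice and loses once, the fitted ratings give `win_prob(delta, "A", "B")` above one half, matching the intuition behind the model.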

SLIDE 8

trueskill factor graph

SLIDE 9

trueskill

  • TrueSkill was initially developed at MS Research for Xbox Live gaming servers [Graepel, Minka, Herbrich, 2007].
  • Given results of team competitions, learn the ratings of individual players of these teams.
  • Direct application – matchmaking: find interesting opponents for a player or team.
  • [Graepel et al., 2010]: AdPredictor, which predicts CTRs of advertisements based on a set of features: the features form a team, and the team wins whenever a user clicks the ad.
  • Basic idea: construct a probabilistic graphical model for a tournament.

SLIDE 10

trueskill

  • There is no evidence per se; it is incorporated in the structure of the graph, so we just have to marginalize by message passing.
  • The marginalization problem is complicated by the step functions at the bottom; it is solved with Expectation Propagation [Minka, 2001]:
  • approximate messages from 𝕀(𝑒𝑗 > 𝜗) and 𝕀(|𝑒𝑗| ≤ 𝜗) to 𝑒𝑗 with normal distributions;
  • repeat message passing on the bottom layer of the graph until convergence.
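The moment-matching at the heart of this EP step can be sketched as follows (an illustration of the standard truncated-Gaussian formulas, not TrueSkill's actual implementation): approximating the step factor 𝕀(x > 𝜗) with a Gaussian amounts to computing the mean and variance of a normal distribution truncated from below at 𝜗.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def truncate_above(mu, sigma, thr):
    """Mean and variance of N(mu, sigma^2) truncated to x > thr.

    This is the moment matching EP uses to replace the step
    factor I(x > thr) with an (unnormalized) Gaussian message."""
    alpha = (thr - mu) / sigma
    lam = phi(alpha) / (1 - Phi(alpha))   # inverse Mills ratio
    mean = mu + sigma * lam
    var = sigma ** 2 * (1 - lam * (lam - alpha))
    return mean, var
```

For instance, truncating a standard normal at its mean shifts the mean up to roughly 0.80 and shrinks the variance to roughly 0.36, which is exactly the "sharpening" effect the win constraint has on a performance estimate.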

SLIDE 11

example: a match of four players

SLIDE 12

trueskill problems and solutions

  • TrueSkill looked perfect for «What? Where? When?».
  • But it didn’t really work due to the following properties of the «What? Where? When?» dataset.
  • 1. Teams vary in size (max 6 players, but teams are often incomplete):
  • undersized teams stand a very good chance against a full one,
  • so adding up player performances to get the team performance does not work.
  • 2. Large multiway ties are common; 30–40 different places (35–50 questions) in a tournament with a thousand teams:
  • this is deadly for TrueSkill: consider four teams with performances 𝑢1, … , 𝑢4, where team 1 has won and teams 2–4 drew behind it;
  • then the factor graph tells us only that 𝑢2 < 𝑢1 − 𝜗, |𝑢2 − 𝑢3| ≤ 𝜗, |𝑢3 − 𝑢4| ≤ 𝜗.
  • So 𝑢3 may actually nearly equal 𝑢1, and 𝑢4 may even exceed 𝑢1!
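A quick numeric check of this failure mode (illustrative numbers of my own choosing): the performances below satisfy every draw constraint TrueSkill imposes, yet team 4 outperforms the winner.

```python
# Draw margin and team performances chosen by hand (illustrative only).
theta = 1.0
u1, u2, u3, u4 = 10.0, 8.9, 9.8, 10.7

# TrueSkill's constraints for "team 1 won, teams 2-4 drew behind it":
assert u2 < u1 - theta         # team 2 strictly behind the winner
assert abs(u2 - u3) <= theta   # teams 2 and 3 drew
assert abs(u3 - u4) <= theta   # teams 3 and 4 drew

# ...and yet team 4's performance exceeds the winner's:
assert u4 > u1
```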

SLIDE 13

changes in the factor graph

  • For the multiway tie problem, we add another layer in the factor graph, namely the layer of place performances 𝑚𝑗.
  • Each team performs in the 𝜗-neighborhood of its place performance, and place performances relate to each other with strict inequalities like 𝑚2 < 𝑚1 − 2𝜗.
  • Then it’s inference as usual, no slowdown in convergence.
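A quick numeric check of why the extra layer helps (illustrative numbers of my own choosing): with place performances separated by strict inequalities, even the worst-case team in place 1 outperforms the best-case team in place 2.

```python
# Draw margin and place performances chosen by hand (illustrative only).
theta = 1.0
m1, m2 = 10.0, 7.5
assert m2 < m1 - 2 * theta   # strict inequality between place performances

# Each team performs within theta of its place performance, so the
# worst team in place 1 is still above the best team in place 2:
worst_place1 = m1 - theta
best_place2 = m2 + theta
assert worst_place1 > best_place2
```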

SLIDE 14

experimental results

[Figure: AUC over tournaments for models TSa, TSb, TS2a, TS2b, and TS2c.]

Average AUC over a sliding window of 50 tournaments.

SLIDE 15

more detailed data leads to a simpler model

SLIDE 16

changes

  • Several years ago, the «What? Where? When?» tournament database started collecting question-wise data.
  • That is, we now know which specific questions a team has answered; previously we only had the standings in a tournament.
  • So when I got back to the problem of «What? Where? When?» ratings, I found the problem greatly simplified.

SLIDE 17

changes

  • Sample relevant application:
  • consider a test suite with many questions that tests something (e.g., IQ or a specific skill);
  • participants answer a random subset of questions;
  • we need to rate participants, but the questions are different, so the complexity level cannot be perfectly balanced.
  • «What? Where? When?» is just like that, but participants are working on the test in teams.

SLIDE 18

baseline: logistic regression

  • Baseline model – logistic regression; we model:
  • each player 𝑗 with his or her skill 𝑡𝑗,
  • each question 𝑟 with its complexity score 𝑑𝑟,
  • add the global average 𝜈,
  • and train the logistic model 𝑞(𝑦𝑢𝑟 ∣ 𝑡𝑗, 𝑑𝑟) ∼ 𝜏(𝜈 + 𝑡𝑗 + 𝑑𝑟) for each player 𝑗 ∈ 𝑢 of a participating team 𝑢 ∈ 𝒰(𝑒) and each question 𝑟 ∈ 𝑅(𝑒), where 𝜏(𝑦) = 1/(1 + e^{−𝑦}) is the logistic sigmoid, and 𝑦𝑢𝑟 denotes whether team 𝑢 answered question 𝑟 correctly.
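A minimal sketch of how this baseline scores a single player–question pair (the parameter values are hypothetical, and the sign convention that larger 𝑑𝑟 means an easier question is my assumption):

```python
import math

def sigmoid(y):
    """Logistic sigmoid tau(y) = 1 / (1 + e^{-y})."""
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical fitted parameters: nu is the global average,
# skills t_j per player, complexity scores d_r per question
# (here larger d_r means easier -- an assumed convention).
nu = -0.5
skills = {"alice": 1.2, "bob": 0.3}
complexity = {"q1": -0.8, "q2": 0.9}

def answer_prob(player, question):
    """Baseline model: P(correct) = tau(nu + t_j + d_r)."""
    return sigmoid(nu + skills[player] + complexity[question])
```

With these numbers the stronger player gets a higher probability on every question, and every player gets a higher probability on the easier question, which is all the baseline expresses.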

SLIDE 19

model with latent variables

  • The logistic model basically assumes that each player successfully answered every question that the team has answered.
  • But in fact we do not know which player or players actually answered.
  • We can only assume that if the team has failed, then no one on the team has answered.
  • This situation is similar in spirit to presence-only data models found in, e.g., ecology [Ward et al., 2009; Royle et al., 2012].

SLIDE 20

model with latent variables

  • Hence, a model with latent variables.
  • For each player–question pair, we add a latent variable 𝑨𝑗𝑟 which means «player 𝑗 has answered question 𝑟».
  • For these variables, we have the following constraints:
  • if 𝑦𝑢𝑟 = 0 then 𝑨𝑗𝑟 = 0 for every player 𝑗 ∈ 𝑢;
  • if 𝑦𝑢𝑟 = 1 then 𝑨𝑗𝑟 = 1 for at least one player 𝑗 ∈ 𝑢.
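Under these constraints, and assuming players attempt each question independently, the team-level success probability combines player probabilities noisy-OR style; a sketch with helper names of my own:

```python
import math

def tau(y):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-y))

def team_answer_prob(nu, team_skills, d_r):
    """P(y_ur = 1): at least one player answers, assuming players
    attempt the question independently (noisy-OR combination)."""
    p_nobody = 1.0
    for t_j in team_skills:
        p_nobody *= 1.0 - tau(nu + t_j + d_r)
    return 1.0 - p_nobody
```

A single-player "team" reduces to the plain logistic model, and adding any player strictly increases the team's chance, which matches the «at least one» constraint.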

SLIDE 21

model with latent variables

  • Model parameters are still the skills of players and the complexities of questions: 𝑞(𝑨𝑗𝑟 ∣ 𝑡𝑗, 𝑑𝑟) ∼ 𝜏(𝜈 + 𝑡𝑗 + 𝑑𝑟).
  • Training with EM:
  • E-step: fix all 𝑡𝑗 and 𝑑𝑟, and compute the expected values of the latent variables 𝑨𝑗𝑟 as

𝔼[𝑨𝑗𝑟] = 0, if 𝑦𝑢𝑟 = 0;
𝔼[𝑨𝑗𝑟] = 𝑞(𝑨𝑗𝑟 = 1 ∣ ∃𝑘 ∈ 𝑢 : 𝑨𝑘𝑟 = 1) = 𝜏(𝜈 + 𝑡𝑗 + 𝑑𝑟) / (1 − ∏𝑘∈𝑢(1 − 𝜏(𝜈 + 𝑡𝑘 + 𝑑𝑟))), if 𝑦𝑢𝑟 = 1;

  • M-step: fix 𝔼[𝑨𝑗𝑟] and train the logistic model 𝔼[𝑨𝑗𝑟] ∼ 𝜏(𝜈 + 𝑡𝑗 + 𝑑𝑟).
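The E-step can be sketched directly (helper names are mine; the product over teammates assumes players answer independently, as in the model):

```python
import math

def tau(y):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-y))

def e_step(nu, team_skills, d_r, y_ur):
    """Expected latent answers E[A_jr] for one team and one question.

    If the team failed (y_ur = 0), nobody answered.  If it succeeded,
    E[A_jr] is P(A_jr = 1) conditioned on at least one teammate
    having answered (hence the noisy-OR denominator)."""
    probs = [tau(nu + t_j + d_r) for t_j in team_skills]
    if y_ur == 0:
        return [0.0] * len(probs)
    p_somebody = 1.0 - math.prod(1.0 - p for p in probs)
    return [p / p_somebody for p in probs]
```

Conditioning on success pushes every expectation up (the denominator is below 1), and stronger players get more of the credit, which is exactly what the M-step then trains on.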

SLIDE 22

results

  • And, sure enough, it works fine.

[Figure: MAP and AUC of the models over time.]

SLIDE 23

implementation

SLIDE 24

implementation

SLIDE 25

thank you!

Thank you for your attention!

Final takeaway points:

  • Try to collect new data!

The new model is much simpler than TrueSkill but still works better because we have more detailed data available.

  • Don’t be afraid to work on your passions!

If you are excited about the problem, you will make better progress, and «real» applications will find you.
