Fast and Accurate Inference of Plackett–Luce Models — Lucas Maystre (PowerPoint presentation)

SLIDE 1

Fast and Accurate Inference of Plackett–Luce Models

Lucas Maystre, Matthias Grossglauser — LCA 4, EPFL

Swiss Machine Learning Day — November 10th, 2015

SLIDE 2

Outline

  • 1. Introduction to Plackett–Luce models
  • 2. Model inference: state of the art
  • 3. Unifying ML and spectral algorithms
  • 4. Experimental results


SLIDE 3

Plackett–Luce family of models

SLIDE 4

Modeling preferences

Universe of n items. Goal: describe, explain, and predict choices between alternatives, via a probabilistic approach.

SLIDE 5

Luce's choice axiom

Assumption (Luce, 1959). The odds of choosing item i over item j are independent of the rest of the alternatives:

p(i | A) / p(j | A) = p(i | B) / p(j | B),

where A and B are any sets of alternatives that contain both i and j. This property is a.k.a. "independence of irrelevant alternatives".

SLIDE 6

Consequence of axiom

To each item i = 1, ..., n we can assign a number πi ∈ R>0 such that

p(i | {1, . . . , k}) = πi / (π1 + · · · + πk)

πi = strength (or utility, or score) of item i
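As a sanity check, the choice rule and the axiom behind it can be exercised in a few lines of plain Python (the `choice_prob` helper is hypothetical, introduced here for illustration):

```python
def choice_prob(i, alternatives, pi):
    """P(choose i from `alternatives`) under Luce's axiom (hypothetical helper)."""
    return pi[i] / sum(pi[j] for j in alternatives)

# Strengths for three items.
pi = {1: 2.0, 2: 1.0, 3: 1.0}

# The odds of 1 over 2 are the same in {1, 2} and in {1, 2, 3}:
odds_pair = choice_prob(1, {1, 2}, pi) / choice_prob(2, {1, 2}, pi)
odds_triple = choice_prob(1, {1, 2, 3}, pi) / choice_prob(2, {1, 2, 3}, pi)
# both equal pi[1] / pi[2] = 2.0
```

Note that the odds ratio depends only on the two strengths, exactly as the axiom requires.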

SLIDE 7

Bradley–Terry model

[Zermelo, 1928; Bradley & Terry, 1952; Ford, 1957] Variant of the model for pairwise comparisons

p(i ≻ j) = πi / (πi + πj)

SLIDE 8

Plackett–Luce model

[Luce, 1959; Plackett 1975] Variant of the model for (partial or full) rankings

p(i ≻ j ≻ k) = p(i | {i, j, k}) · p(j | {j, k}) = [πi / (πi + πj + πk)] · [πj / (πj + πk)]
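The ranking probability is just a product of sequential Luce choices over the remaining alternatives; a minimal sketch (hypothetical `ranking_prob` helper):

```python
import itertools

def ranking_prob(ranking, pi):
    """P(ranking) under Plackett-Luce: repeatedly choose the best
    remaining item via Luce's choice rule."""
    prob = 1.0
    remaining = list(ranking)
    for item in ranking[:-1]:          # the last choice is forced
        prob *= pi[item] / sum(pi[j] for j in remaining)
        remaining.remove(item)
    return prob

pi = [3.0, 2.0, 1.0]
p = ranking_prob((0, 1, 2), pi)   # (3/6) * (2/3) = 1/3
# Probabilities over all 3! rankings sum to one.
total = sum(ranking_prob(r, pi) for r in itertools.permutations(range(3)))
```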

SLIDE 9

Rao–Kupper model

[Rao & Kupper, 1967] Variant of the model for pairwise comparisons with ties

p(i ≻ j) = πi / (πi + απj)        p(i ≡ j) = (α² − 1) πi πj / [(πi + απj)(πj + απi)]
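The two win probabilities and the tie probability sum to one for any α ≥ 1, which a quick sketch can confirm (hypothetical helper name):

```python
def rao_kupper(pi_i, pi_j, alpha):
    """Win/win/tie probabilities in the Rao-Kupper model (alpha >= 1;
    alpha = 1 recovers Bradley-Terry with no ties)."""
    p_i = pi_i / (pi_i + alpha * pi_j)
    p_j = pi_j / (pi_j + alpha * pi_i)
    p_tie = ((alpha ** 2 - 1) * pi_i * pi_j
             / ((pi_i + alpha * pi_j) * (pi_j + alpha * pi_i)))
    return p_i, p_j, p_tie

p_i, p_j, p_tie = rao_kupper(2.0, 1.0, 1.5)
```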

SLIDE 10

RUM perspective

New parameterization: θi = log(πi)

Xi ∼ Gumbel(θi, 1)        Xi − Xj ∼ Logistic(θi − θj, 1)

p(i ≻ j) = P(Xi − Xj > 0) = 1 / (1 + e^{−(θi − θj)})

[Plot: two Gumbel densities centered at θi and θj along the θ axis.]
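The random-utility view can be checked by simulation: sampling Gumbel utilities and counting wins recovers the logistic formula (a sketch; function names, sample size, and seed are arbitrary choices):

```python
import math
import random

def bt_prob(theta_i, theta_j):
    """Closed-form P(i beats j): logistic in the score difference."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def simulate_rum(theta_i, theta_j, n_samples, seed=0):
    """Monte-Carlo estimate of P(X_i > X_j) with X ~ Gumbel(theta, 1).
    Gumbel samples via the inverse CDF: theta - log(-log(U))."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        x_i = theta_i - math.log(-math.log(rng.random()))
        x_j = theta_j - math.log(-math.log(rng.random()))
        wins += x_i > x_j
    return wins / n_samples

p_exact = bt_prob(1.0, 0.0)
p_est = simulate_rum(1.0, 0.0, 100_000, seed=1)  # close to p_exact
```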

SLIDE 11

Identifying parameters

p(i | {1, . . . , k}) = πi / (π1 + · · · + πk)

The parameters are defined only up to a multiplicative constant. We use the following convention:

Σi πi = 1, or equivalently Σi θi = 0.

SLIDE 12

Beyond preferences

  • GIFGIF experiment (comparative judgment)
  • NASCAR rankings
  • Chess games

SLIDE 13

Model inference

SLIDE 14

Maximum-likelihood

For conciseness, we consider pairwise comparisons. Data come in the form of counts: aji = number of times i beat j.

L(π) = ∏i ∏_{j≠i} ( πi / (πi + πj) )^{aji}

log L(π) = Σi Σ_{j≠i} aji (log πi − log(πi + πj))

This can lead to problems if some count aji is zero: the ML estimate may fail to exist.

  • Assumption. In every partition of the n items into two subsets A and B, some i ∈ A beats some j ∈ B.
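The log-likelihood above is straightforward to evaluate from the count matrix; a minimal sketch, using the slide's convention that a[j][i] counts how many times i beat j:

```python
import math

def log_likelihood(a, pi):
    """log L(pi) = sum_i sum_{j != i} a[j][i] * (log pi_i - log(pi_i + pi_j))."""
    n = len(pi)
    ll = 0.0
    for i in range(n):
        for j in range(n):
            if j != i and a[j][i] > 0:
                ll += a[j][i] * (math.log(pi[i]) - math.log(pi[i] + pi[j]))
    return ll

# Two items: item 0 beat item 1 three times and lost once.
a = [[0, 1],
     [3, 0]]
ll = log_likelihood(a, [0.75, 0.25])
```

With the normalization Σi πi = 1, the scores [0.75, 0.25] give a higher likelihood than the uniform scores, as expected for a 3-to-1 win record.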

SLIDE 15

Rank Centrality

[Negahban et al. 2012] — a completely different take on parameter inference:

  • 1. Items are states of a Markov chain
  • 2. Going from i to j is more likely if j often won against i
  • 3. The stationary distribution defines the scores

Pij = ε aij if i ≠ j,        Pii = 1 − ε Σ_{k≠i} aik

[Figure: example Markov chain over 9 items.]
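A compact sketch of Rank Centrality as described above, with the stationary distribution found by plain power iteration (ε and the iteration count are arbitrary choices, not prescribed by the slide):

```python
def rank_centrality(a, epsilon=0.1, n_iter=5000):
    """a[i][j] = number of times j beat i, so transitions flow toward winners.
    epsilon must be small enough to keep the diagonal non-negative."""
    n = len(a)
    P = [[epsilon * a[i][j] if i != j
          else 1.0 - epsilon * sum(a[i][k] for k in range(n) if k != i)
          for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(n_iter):                      # power iteration: pi <- pi P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Item 0 beats 1 and 2 twice each; 1 beats 2 twice; each upset happens once.
a = [[0, 1, 1],
     [2, 0, 1],
     [2, 2, 0]]
scores = rank_centrality(a)   # item 0 ranks first
```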

SLIDE 16

GMM estimators

[Azari Soufiani et al. 2013, 2014] Generalization of Rank Centrality to rankings:

  • 1. Break each ranking of m items into its (m choose 2) implied pairwise comparisons
  • 2. Construct a Markov chain and find its stationary distribution

The resulting estimator is asymptotically consistent.

Example: the ranking a ≻ b ≻ c ≻ d yields the comparisons a ≻ b, a ≻ c, a ≻ d, b ≻ c, b ≻ d, c ≻ d.
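Rank breaking (step 1 above) fits in a couple of lines: for a ranking listed best-first, every ordered pair of positions yields one pairwise comparison.

```python
from itertools import combinations

def break_ranking(ranking):
    """Break a ranking (best item first) into its (m choose 2) implied
    pairwise comparisons, each as a (winner, loser) tuple."""
    return list(combinations(ranking, 2))

pairs = break_ranking(["a", "b", "c", "d"])
# [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d'), ('c','d')]
```

This works because `itertools.combinations` preserves the input order within each emitted pair.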

SLIDE 17

Unifying ML inference and spectral algorithms

SLIDE 18

MLE as stationary distribution

log L(π) = Σi Σ_{j≠i} aji (log πi − log(πi + πj))

∂ log L(π) / ∂πi = Σ_{j≠i} [ aji/πi − (aji + aij)/(πi + πj) ]
                 = (1/πi) Σ_{j≠i} [ aji πj/(πi + πj) − aij πi/(πi + πj) ]

Setting the gradient to zero gives, for every i,

Σ_{j≠i} [aji/(πi + πj)] πj  =  Σ_{j≠i} [aij/(πi + πj)] πi
       (incoming flow)              (outgoing flow)

with transition rates aij/(πi + πj). These are the global balance equations of a Markov chain whose states are the items.

SLIDE 19

Corresponding MC

Pij = ε aij/(πi + πj) if i ≠ j,        Pii = 1 − ε Σ_{k≠i} aik/(πi + πk)

  • The stationary distribution is the ML estimate iff π = π̂
  • If πi = 1/n for all i, we recover Rank Centrality

We can iteratively adjust π:

  • The (k+1)-th iterate is the stationary distribution of P(k)
  • The unique fixed point of the iteration is the ML estimate

SLIDE 20

Generalization

Pij = ε Σ_{A ∈ Dij} 1/(Σ_{k∈A} πk) if i ≠ j,        Pii = 1 − Σ_{k≠i} Pik

(Dij = the observations in which j is chosen from a set of alternatives A containing i.)

The same Markov chain formulation applies to other models in the same family: a spectral formulation for choices among many alternatives, ranking data, comparisons with ties, etc.

SLIDE 21

Algorithms

Algorithm 1: Luce Spectral Ranking (LSR)
Require: observations D
 1: Λ ← 0_{n×n}
 2: for (i, A) ∈ D do
 3:     for j ∈ A \ {i} do
 4:         Λji ← Λji + n/|A|
 5:     end for
 6: end for
 7: π̄ ← stationary distribution of the Markov chain Λ
 8: return π̄

Algorithm 2: Iterative Luce Spectral Ranking (I-LSR)
Require: observations D
 1: π ← [1/n, . . . , 1/n]ᵀ
 2: repeat
 3:     Λ ← 0_{n×n}
 4:     for (i, A) ∈ D do
 5:         for j ∈ A \ {i} do
 6:             Λji ← Λji + 1 / Σ_{t∈A} πt
 7:         end for
 8:     end for
 9:     π ← stationary distribution of the Markov chain Λ
10: until convergence

  • What is the statistical efficiency of the spectral estimate?
  • What is the computational efficiency of the ML algorithm?
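The two algorithms above can be sketched compactly in Python. This is a toy implementation, not the authors' released code; the stationary distribution is found here by discretizing the rate matrix and power-iterating, and the iteration counts are arbitrary:

```python
def spectral_step(n, data, pi):
    """One LSR step: accumulate rates lam[j][i] += 1 / sum_{t in A} pi_t
    for each observation (winner i, alternatives A), then return the
    stationary distribution of the induced Markov chain."""
    lam = [[0.0] * n for _ in range(n)]
    for winner, alts in data:
        denom = sum(pi[t] for t in alts)
        for j in alts:
            if j != winner:
                lam[j][winner] += 1.0 / denom
    # Discretize the rate matrix into a stochastic matrix and power-iterate.
    eps = 1.0 / (1.0 + max(sum(row) for row in lam))
    P = [[eps * lam[i][j] if i != j
          else 1.0 - eps * sum(lam[i][k] for k in range(n) if k != i)
          for j in range(n)]
         for i in range(n)]
    dist = [1.0 / n] * n
    for _ in range(2000):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

def lsr(n, data):
    """Algorithm 1: with uniform pi, 1 / sum_{t in A} pi_t = n / |A|."""
    return spectral_step(n, data, [1.0 / n] * n)

def ilsr(n, data, n_iter=30):
    """Algorithm 2: re-run the spectral step with the current estimate;
    the fixed point of the iteration is the ML estimate."""
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = spectral_step(n, data, pi)
    return pi

# Toy data: item 0 beats item 1 three times and loses once.
data = [(0, (0, 1))] * 3 + [(1, (0, 1))]
pi_hat = ilsr(2, data)   # ML estimate: [0.75, 0.25]
```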
SLIDE 22

Experimental results

SLIDE 23

Statistical efficiency

Which inference method works best?

[Plot: RMSE vs. k = 2¹, . . . , 2¹⁰ for the lower bound, ML-F, GMM-F, ML, and LSR. Example data: partial rankings of items a–h with k = 8, 4, 2.]

Take-away: a careful derivation of the Markov chain leads to a better estimator.

SLIDE 24

Computational efficiency

Table 2: Performance of iterative ML inference algorithms.

                       I-LSR           MM                Newton
Dataset     γD        I    T [s]     I       T [s]     I    T [s]
NASCAR     0.832      3     0.08     4        0.10     —        —
Sushi      0.890      2     0.42     4        1.09     3    10.45
YouTube    0.002     12   414.44  8 680  22 443.88     —        —
GIFGIF     0.408     10    22.31   119      109.62     5    72.38
Chess      0.007     15    43.69   181       55.61     3    49.37

(I = number of iterations, T = running time in seconds.)

  • I-LSR is competitive with, or faster than, the state of the art
  • MM seems to converge very slowly in certain cases
SLIDE 25

I-LSR and MM mixing

[Plot: RMSE vs. iteration (1–10) for MM and I-LSR, with k = 2 (well-mixing) and k = 10 (poorly mixing) chains.]

Take-away: I-LSR seems to be robust to slow-mixing chains.

SLIDE 26

Conclusions

  • Variety of models derived from Luce's choice axiom
  • Can interpret the maximum-likelihood estimate as the stationary distribution of a Markov chain
  • Gives rise to a fast and efficient spectral inference algorithm
  • Gives rise to a new iterative algorithm for maximum-likelihood inference

Paper & code available at: lucas.maystre.ch/nips15