  1. Fast and Accurate Inference of Plackett–Luce Models
Lucas Maystre, Matthias Grossglauser
LCA 4, EPFL
Swiss Machine Learning Day, November 10th, 2015

  2. Outline
1. Introduction to Plackett–Luce models
2. Model inference: state of the art
3. Unifying ML and spectral algorithms
4. Experimental results

  3. Plackett–Luce family of models

  4. Modeling preferences
Universe of n items. Goal: describe, explain & predict choices between alternatives. Probabilistic approach.

  5. Luce's choice axiom
Assumption (Luce, 1959). The odds of choosing item i over item j are independent of the rest of the alternatives: for any sets of alternatives A and B that both contain i and j,

$$\frac{p(i \mid A)}{p(j \mid A)} = \frac{p(i \mid B)}{p(j \mid B)}.$$

a.k.a. "independence of irrelevant alternatives"

  6. Consequence of axiom
To each item i = 1, ..., n we can assign a number $\pi_i \in \mathbb{R}_{>0}$ such that

$$p(i \mid \{1, \dots, k\}) = \frac{\pi_i}{\pi_1 + \cdots + \pi_k}.$$

$\pi_i$ = strength (or utility, or score) of item i
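A minimal sketch of this choice rule in Python (the item set and strength values are illustrative, not from the talk):

```python
# Luce's choice rule: P(i | A) = pi_i / sum of pi_j over j in A.
def choice_prob(i, A, pi):
    assert i in A
    return pi[i] / sum(pi[j] for j in A)

pi = {1: 2.0, 2: 1.0, 3: 1.0}            # illustrative strengths pi_i > 0
print(choice_prob(1, {1, 2, 3}, pi))     # 2 / (2 + 1 + 1) = 0.5
```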

  7. Bradley–Terry model [Zermelo, 1928; Bradley & Terry, 1952; Ford, 1957]
Variant of the model for pairwise comparisons:

$$p(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}.$$

  8. Plackett–Luce model [Luce, 1959; Plackett, 1975]
Variant of the model for (partial or full) rankings:

$$p(i \succ j \succ k) = p(i \mid \{i, j, k\}) \cdot p(j \mid \{j, k\}) = \frac{\pi_i}{\pi_i + \pi_j + \pi_k} \cdot \frac{\pi_j}{\pi_j + \pi_k}.$$
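The ranking probability is just a chain of Luce choices over successively smaller sets of alternatives; a small sketch, again with illustrative strengths:

```python
# Plackett-Luce ranking probability as a product of choice probabilities.
def ranking_prob(ranking, pi):
    prob = 1.0
    remaining = list(ranking)
    for i in ranking[:-1]:               # the last item is chosen by default
        prob *= pi[i] / sum(pi[j] for j in remaining)
        remaining.remove(i)
    return prob

pi = {'i': 2.0, 'j': 1.0, 'k': 1.0}
print(ranking_prob(['i', 'j', 'k'], pi))  # 2/4 * 1/2 = 0.25
```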

  9. Rao–Kupper model [Rao & Kupper, 1967]
Variant of the model for pairwise comparisons with ties:

$$p(i \succ j) = \frac{\pi_i}{\pi_i + \alpha \pi_j}, \qquad p(i \equiv j) = \frac{(\alpha^2 - 1)\,\pi_i \pi_j}{(\pi_i + \alpha \pi_j)(\pi_j + \alpha \pi_i)}.$$

  10. RUM perspective
New parameterization: $\theta_i = \log(\pi_i)$.

$$X_i \sim \text{Gumbel}(\theta_i, 1), \qquad X_i - X_j \sim \text{Logistic}(\theta_i - \theta_j, 1),$$
$$p(i \succ j) = P(X_i - X_j > 0) = \frac{1}{1 + e^{-(\theta_i - \theta_j)}}.$$

[Figure: densities of $X_i$ and $X_j$, centered at $\theta_i$ and $\theta_j$.]
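A quick Monte Carlo sanity check of this random-utility view (parameter values are illustrative): sampling Gumbel utilities should reproduce the logistic comparison probability.

```python
# Monte Carlo check: with X_i ~ Gumbel(theta_i, 1), the empirical
# frequency of X_i > X_j should match the logistic formula.
import numpy as np

rng = np.random.default_rng(0)
theta_i, theta_j = 1.0, 0.0              # illustrative parameters
n = 100_000

x_i = rng.gumbel(loc=theta_i, scale=1.0, size=n)
x_j = rng.gumbel(loc=theta_j, scale=1.0, size=n)

print("empirical:", np.mean(x_i > x_j))
print("logistic: ", 1.0 / (1.0 + np.exp(-(theta_i - theta_j))))  # ~0.731
```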

  11. Identifying parameters
In

$$p(i \mid \{1, \dots, k\}) = \frac{\pi_i}{\pi_1 + \cdots + \pi_k},$$

the parameters are defined only up to a multiplicative term. We use the following convention:

$$\sum_i \theta_i = 0, \qquad \sum_i \pi_i = 1.$$

  12. Beyond preferences
NASCAR rankings · GIFGIF experiment (comparative judgment) · Chess games

  13. Model inference

  14. Maximum-likelihood
For conciseness, we consider pairwise comparisons. Data in the form of counts: $a_{ji}$ = # times i beat j.

$$\mathcal{L}(\pi) = \prod_i \prod_{j \neq i} \left( \frac{\pi_i}{\pi_i + \pi_j} \right)^{a_{ji}}, \qquad \log \mathcal{L}(\pi) = \sum_i \sum_{j \neq i} a_{ji} \left( \log \pi_i - \log(\pi_i + \pi_j) \right).$$

This can lead to problems if some counts are 0: the likelihood of an item that never wins increases as its strength goes to 0, so the MLE may not exist.

Assumption. In every partition of the n items into two subsets A and B, some i ∈ A beats some j ∈ B.
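A direct transcription of this log-likelihood into Python (the count matrix is made up for illustration):

```python
# Pairwise-comparison log-likelihood, with a[j, i] = # times i beat j
# as defined on the slide.
import numpy as np

def log_likelihood(a, pi):
    n = len(pi)
    return sum(a[j, i] * (np.log(pi[i]) - np.log(pi[i] + pi[j]))
               for i in range(n) for j in range(n) if j != i)

a = np.array([[0, 3, 1],
              [2, 0, 4],
              [5, 1, 0]], dtype=float)
pi = np.full(3, 1.0 / 3.0)
print(log_likelihood(a, pi))
```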

  15. Rank Centrality [Negahban et al., 2012]
A completely different take on parameter inference:
1. Items are states of a Markov chain.
2. Going from i to j is more likely if j often won against i.
3. The stationary distribution defines the scores.

$$P_{ij} = \begin{cases} \epsilon\, a_{ij} & \text{if } i \neq j, \\ 1 - \epsilon \displaystyle\sum_{k \neq i} a_{ik} & \text{if } i = j. \end{cases}$$

[Figure: example comparison graph on ten items.]
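A sketch of this construction under the slide's conventions (the counts and the choice of $\epsilon$ are illustrative; the power iteration is one standard way to get the stationary distribution):

```python
# Rank Centrality: transitions flow toward frequent winners, and the
# stationary distribution gives the scores. Here a[i, j] = # times j
# beat i, matching the slide's P_ij = eps * a_ij.
import numpy as np

def rank_centrality(a, n_iter=1000):
    n = a.shape[0]
    eps = 1.0 / a.sum(axis=1).max()      # keeps P row-stochastic
    P = eps * a
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):              # power iteration: pi <- pi P
        pi = pi @ P
    return pi / pi.sum()

a = np.array([[0, 3, 1],
              [2, 0, 4],
              [5, 1, 0]], dtype=float)   # illustrative counts
print(rank_centrality(a))
```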

  16. GMM estimators [Azari Soufiani et al., 2013, 2014]
Generalization of Rank Centrality to rankings:
1. Breaks each ranking into $\binom{m}{2}$ pairwise comparisons, e.g., $a \succ b \succ c \succ d$ yields $a \succ b$, $a \succ c$, $a \succ d$, $b \succ c$, $b \succ d$, $c \succ d$.
2. Constructs a Markov chain and finds its stationary distribution.
The resulting estimator is asymptotically consistent.
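The ranking-breaking step is easy to make concrete; a one-liner sketch (the ranking is illustrative):

```python
# Breaking one ranking over m items into the m-choose-2 pairwise
# comparisons it implies; a pair (x, y) means x beat y.
from itertools import combinations

def break_ranking(ranking):
    return list(combinations(ranking, 2))

print(break_ranking(['a', 'b', 'c', 'd']))   # 6 pairs, as on the slide
```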

  17. Unifying ML inference and spectral algorithms

  18. MLE as stationary distribution

$$\log \mathcal{L}(\pi) = \sum_i \sum_{j \neq i} a_{ji} \left( \log \pi_i - \log(\pi_i + \pi_j) \right)$$

$$\frac{\partial}{\partial \pi_i} \log \mathcal{L}(\pi) = \sum_{j \neq i} \left( a_{ji} \frac{1}{\pi_i} - (a_{ji} + a_{ij}) \frac{1}{\pi_i + \pi_j} \right) = \frac{1}{\pi_i} \sum_{j \neq i} \left( a_{ji} \frac{\pi_j}{\pi_i + \pi_j} - a_{ij} \frac{\pi_i}{\pi_i + \pi_j} \right)$$

Setting the gradient to zero:

$$\forall i \quad \underbrace{\sum_{j \neq i} \frac{a_{ji}}{\pi_i + \pi_j}\, \pi_j}_{\text{incoming flow}} = \underbrace{\sum_{j \neq i} \frac{a_{ij}}{\pi_i + \pi_j}\, \pi_i}_{\text{outgoing flow}}$$

These are the global balance equations of a Markov chain on the states, with transition rates $a_{ij} / (\pi_i + \pi_j)$.
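A numerical sanity check of this claim (a sketch, not the paper's method): fit the MLE with a generic optimizer over $\theta = \log(\pi)$, then verify that incoming and outgoing flows balance at each state. The counts are made up.

```python
# Check: at the MLE, incoming flow equals outgoing flow for every item.
import numpy as np
from scipy.optimize import minimize

a = np.array([[0, 3, 1],
              [2, 0, 4],
              [5, 1, 0]], dtype=float)   # a[j, i] = # times i beat j
n = a.shape[0]

def neg_log_lik(theta):
    # log(pi_i + pi_j) = logaddexp(theta_i, theta_j), numerically stable
    return -sum(a[j, i] * (theta[i] - np.logaddexp(theta[i], theta[j]))
                for i in range(n) for j in range(n) if j != i)

pi = np.exp(minimize(neg_log_lik, np.zeros(n)).x)

for i in range(n):
    incoming = sum(a[j, i] * pi[j] / (pi[i] + pi[j]) for j in range(n) if j != i)
    outgoing = sum(a[i, j] * pi[i] / (pi[i] + pi[j]) for j in range(n) if j != i)
    print(f"state {i}: in = {incoming:.4f}, out = {outgoing:.4f}")  # should match
```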

  19. Corresponding MC

$$P_{ij} = \begin{cases} \epsilon \dfrac{a_{ij}}{\pi_i + \pi_j} & \text{if } i \neq j, \\[1ex] 1 - \epsilon \displaystyle\sum_{k \neq i} \frac{a_{ik}}{\pi_i + \pi_k} & \text{if } i = j. \end{cases}$$

We can iteratively adjust π!
• The stationary distribution is the ML estimate iff $\pi = \hat{\pi}$.
• The (k+1)-th iterate is the stationary distribution of $P_k$.
• If $\pi_i = 1/n$ for all i, we recover Rank Centrality.
• The unique fixed point of the iteration is the ML estimate.

  20. Generalization
The same Markov chain formulation applies to other models in the same family! For choices among many alternatives:

$$P_{ij} = \begin{cases} \epsilon \displaystyle\sum_{A \in \mathcal{D}_{j \succ i}} \frac{1}{\sum_{k \in A} \pi_k} & \text{if } i \neq j, \\[1ex] 1 - \displaystyle\sum_{k \neq i} P_{ik} & \text{if } i = j, \end{cases}$$

where $\mathcal{D}_{j \succ i}$ denotes the observations in which j is chosen from a set of alternatives containing i. Spectral formulation for ranking data, comparisons with ties, etc.

  21. Algorithms

Algorithm 1: Luce Spectral Ranking (LSR)
Require: observations D
1: λ ← 0_{n×n}
2: for (i, A) ∈ D do
3:   for j ∈ A \ {i} do
4:     λ_{ji} ← λ_{ji} + n / |A|
5:   end for
6: end for
7: π̄ ← stationary distribution of Markov chain λ
8: return π̄

Algorithm 2: Iterative Luce Spectral Ranking (I-LSR)
Require: observations D
1: π ← [1/n, ..., 1/n]ᵀ
2: repeat
3:   λ ← 0_{n×n}
4:   for (i, A) ∈ D do
5:     for j ∈ A \ {i} do
6:       λ_{ji} ← λ_{ji} + 1 / Σ_{t∈A} π_t
7:     end for
8:   end for
9:   π ← stationary distribution of Markov chain λ
10: until convergence

What is the statistical efficiency of the spectral estimate? What is the computational efficiency of the ML algorithm?
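A compact Python sketch of both algorithms, following the reconstructed pseudocode above. The data format, function names, and the least-squares stationary-distribution solve are my assumptions, not the authors' reference implementation.

```python
# LSR and I-LSR for choice observations data = [(i, A), ...], meaning
# item i was chosen from the set of alternatives A.
import numpy as np

def stationary(chain):
    """Stationary distribution of the chain with rate matrix `chain`."""
    n = chain.shape[0]
    Q = chain - np.diag(chain.sum(axis=1))   # generator: rows sum to 0
    M = np.vstack([Q.T, np.ones(n)])         # solve pi Q = 0, sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(M, b, rcond=None)[0]

def lsr(n, data, pi=None):
    """One spectral pass; pi=None gives Algorithm 1 (weights n/|A|)."""
    if pi is None:
        pi = np.full(n, 1.0 / n)
    chain = np.zeros((n, n))
    for i, A in data:
        w = 1.0 / sum(pi[t] for t in A)      # = n/|A| when pi is uniform
        for j in A:
            if j != i:
                chain[j, i] += w             # losers flow toward the winner
    return stationary(chain)

def ilsr(n, data, n_iter=20):
    """Algorithm 2: rebuild the chain with the current estimate."""
    pi = None
    for _ in range(n_iter):
        pi = lsr(n, data, pi)
    return pi

data = [(0, {0, 1, 2}), (1, {0, 1}), (1, {1, 2}), (2, {0, 2}), (0, {0, 1})]
print(ilsr(3, data))                         # illustrative toy observations
```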

  22. Experimental results

  23. Statistical efficiency
Which inference method works best?

[Figure: RMSE of LSR, ML, GMM-F, ML-F and a lower bound over a range of sample sizes ($2^1$ to $2^{10}$), for rankings of size k = 2, 4 and 8; example rankings of each size are shown alongside.]

Take-away: careful derivation of the MC leads to a better estimator.

  24. Computational efficiency

Table 2: Performance of iterative ML inference algorithms.

          |       | I-LSR          | MM              | Newton
Dataset   | γ_D   | I    T [s]     | I     T [s]     | I    T [s]
----------|-------|----------------|-----------------|-------------
NASCAR    | 0.832 | 3    0.08      | 4     0.10      | —    —
Sushi     | 0.890 | 2    0.42      | 4     1.09      | 3    10.45
YouTube   | 0.002 | 22   443.88    | 8680  12414.44  | —    —
GIFGIF    | 0.408 | 10   22.31     | 119   109.62    | 5    72.38
Chess     | 0.007 | 15   43.69     | 181   55.61     | 3    49.37

• I-LSR is competitive with / faster than the state of the art.
• MM seems to converge very slowly in certain cases.

  25. I-LSR and MM mixing

[Figure: RMSE (log scale, $10^0$ down to $10^{-12}$) vs. iteration (1 to 10) for MM and I-LSR with k = 2 and k = 10, on well-mixing and poorly mixing chains.]

Take-away: I-LSR seems to be robust to slow-mixing chains.

  26. Conclusions
• Variety of models derived from Luce's choice axiom.
• Can interpret the maximum-likelihood estimate as the stationary distribution of a Markov chain.
• Gives rise to a fast and efficient spectral inference algorithm.
• Gives rise to a new iterative algorithm for maximum-likelihood inference.

Paper & code available at: lucas.maystre.ch/nips15
