Fast and Accurate Inference of Plackett–Luce Models
Lucas Maystre, Matthias Grossglauser
LCA 4, EPFL
Swiss Machine Learning Day — November 10th, 2015
Outline
1. Introduction to Plackett–Luce models
2. Model inference: state of the art
Universe of n items.
Goal: describe, explain & predict choices between alternatives, via a probabilistic approach.
Assumption (Luce, 1959). The odds of choosing item i over item j are independent of the rest of the alternatives.
p(i | A) / p(j | A) = p(i | B) / p(j | B)

for any sets of alternatives A and B that contain both i and j, a.k.a. "independence of irrelevant alternatives".
To each item i = 1, ..., n we can assign a number πi ∈ R>0 such that

    p(i | A) = πi / Σ_{j ∈ A} πj

πi = strength (or utility, or score) of item i
[Zermelo, 1928; Bradley & Terry, 1952; Ford, 1957] Variant of the model for pairwise comparisons
[Luce, 1959; Plackett 1975] Variant of the model for (partial or full) rankings
[Rao & Kupper, 1967] Variant of the model for pairwise comparisons with ties
New parameterization: θi = log(πi)
Xi ∼ Gumbel(θi, 1), Xj ∼ Gumbel(θj, 1)
⇒ Xi − Xj ∼ Logistic(θi − θj, 1)

p(i ≻ j) = P(Xi − Xj > 0) = 1 / (1 + e^−(θi − θj))
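The Gumbel/logistic relation above can be checked numerically. A minimal simulation sketch (the particular values of θi and θj are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_i, theta_j = 1.0, 0.0  # arbitrary log-strengths

# Independent Gumbel(theta, 1) utilities for the two items.
n = 200_000
x_i = rng.gumbel(loc=theta_i, scale=1.0, size=n)
x_j = rng.gumbel(loc=theta_j, scale=1.0, size=n)

# Empirical probability that item i is preferred to item j.
p_emp = np.mean(x_i > x_j)

# Closed form: logistic function of the difference in log-strengths.
p_theory = 1.0 / (1.0 + np.exp(-(theta_i - theta_j)))
```

With these values, both probabilities come out near 0.73, illustrating the random-utility interpretation of the model.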
p(i | {1, ..., k}) = πi / (π1 + ... + πk)

The scores are defined only up to a multiplicative constant; we fix them with a normalization convention.
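To make the choice rule concrete, a small sketch (the item names and scores are hypothetical, not from the talk) that also illustrates the scale invariance:

```python
def choice_prob(i, alternatives, pi):
    """Plackett-Luce probability of choosing item i from a set of alternatives."""
    return pi[i] / sum(pi[j] for j in alternatives)

# Hypothetical scores: item "a" is twice as strong as "b" and "c".
pi = {"a": 2.0, "b": 1.0, "c": 1.0}
p = choice_prob("a", ["a", "b", "c"], pi)  # 2 / (2 + 1 + 1) = 0.5

# Scale invariance: multiplying all scores by a constant changes nothing.
pi_scaled = {k: 10.0 * v for k, v in pi.items()}
p_scaled = choice_prob("a", ["a", "b", "c"], pi_scaled)
```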
Datasets:
- GIFGIF experiment (comparative judgment)
- NASCAR rankings
- Chess games
For conciseness, we consider pairwise comparisons. Data comes in the form of counts: aji = # times i beat j.
L(π) = Π_i Π_{j≠i} (πi / (πi + πj))^aji

log L(π) = Σ_i Σ_{j≠i} aji (log πi − log(πi + πj))
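A direct transcription of this log-likelihood (a sketch; the count convention aji = # times i beat j is stored as `a[j, i]` in a NumPy matrix):

```python
import numpy as np

def log_likelihood(pi, a):
    """Pairwise Plackett-Luce log-likelihood.

    pi: array of positive scores; a[j, i] = number of times i beat j.
    """
    n = len(pi)
    ll = 0.0
    for i in range(n):
        for j in range(n):
            if j != i and a[j, i] > 0:
                ll += a[j, i] * (np.log(pi[i]) - np.log(pi[i] + pi[j]))
    return ll

# Toy counts: item 0 beat item 1 three times, item 1 beat item 0 once.
a = np.array([[0.0, 1.0],
              [3.0, 0.0]])
```

For these counts the maximizer is pi = (3/4, 1/4), which scores higher than the uniform assignment.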
Problems can arise if some counts aji = 0: the ML estimate exists and is finite iff, for every partition of the items into two non-empty sets A and B, some i ∈ A beats some j ∈ B [Ford, 1957].
[Negahban et al., 2012] Rank Centrality: a completely different take on parameter inference. Build a Markov chain whose stationary distribution estimates the scores:

Pij = ε aij                if i ≠ j,
      1 − ε Σ_{k≠i} aik    if i = j.
[Azari Soufiani et al. 2013, 2014] Generalization of Rank Centrality to rankings
The resulting estimator is asymptotically consistent.
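As an illustration of the spectral idea, a minimal sketch built on the unnormalized transition matrix above; the toy count matrix and the power-iteration helper are my own additions, not from the talk:

```python
import numpy as np

def stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix, by power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            pi = new
            break
        pi = new
    return pi / pi.sum()

def rank_centrality(a):
    """Spectral scores from pairwise counts; a[j, i] = number of times i beat j.

    The off-diagonal transition i -> j is proportional to how often j beat i,
    so the chain drifts towards (and accumulates mass on) strong items.
    """
    n = a.shape[0]
    eps = 1.0 / a.sum(axis=1).max()  # keeps every row sub-stochastic
    P = eps * a.astype(float)
    P[np.diag_indices(n)] = 1.0 - (P.sum(axis=1) - P.diagonal())
    return stationary(P)

# Toy counts (a[j, i] = # times i beat j): item 0 wins most of its games.
a = np.array([[0, 1, 2],
              [9, 0, 4],
              [8, 6, 0]])
scores = rank_centrality(a)
```

The stationary distribution then serves directly as the score estimate, with the strongest item receiving the most mass.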
Σ_{j≠i} aji πj / (πi + πj) = Σ_{j≠i} aij πi / (πi + πj)   for all i
Interpretation: incoming flow equals outgoing flow, with the counts acting as transition rates.
log L(π) = Σ_i Σ_{j≠i} aji (log πi − log(πi + πj))

∂/∂πi log L(π) = Σ_{j≠i} (aji · 1/πi − (aji + aij) · 1/(πi + πj))
              = (1/πi) Σ_{j≠i} (aji πj / (πi + πj) − aij πi / (πi + πj))
These are the global balance equations of a Markov chain whose states are the items.
Pij = ε aij / (πi + πj)               if i ≠ j,
      1 − ε Σ_{k≠i} aik / (πi + πk)   if i = j.

π̂ is the ML estimate iff it is the stationary distribution of this chain evaluated at π = π̂ (a weighted variant of Rank Centrality).
We can iteratively adjust π: at each step, set π to the stationary distribution of the chain built with the current scores. The fixed point of this iteration is the ML estimate.
Pij = ε Σ_{A ∈ Dij} 1 / (Σ_{k∈A} πk)   if i ≠ j,
      1 − Σ_{k≠i} Pik                  if i = j.
The same Markov chain formulation applies to other models in the same family: choices among many alternatives, and a spectral formulation for ranking data, comparisons with ties, etc.
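For instance, under the Plackett–Luce ranking variant a full ranking factorizes into a sequence of independent choices: choose the top item among all alternatives, remove it, repeat. A small sketch of that decomposition (function names are mine):

```python
def ranking_to_choices(ranking):
    """Decompose a ranking (best first) into successive (winner, alternatives) choices."""
    return [(ranking[i], tuple(ranking[i:])) for i in range(len(ranking) - 1)]

def ranking_prob(ranking, pi):
    """Probability of a full ranking: product of the successive choice probabilities."""
    p = 1.0
    for winner, alts in ranking_to_choices(ranking):
        p *= pi[winner] / sum(pi[k] for k in alts)
    return p

choices = ranking_to_choices(("a", "b", "c"))
# [("a", ("a", "b", "c")), ("b", ("b", "c"))]
```

With hypothetical scores {a: 2, b: 1, c: 1}, the ranking a ≻ b ≻ c has probability (2/4) · (1/2) = 0.25.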
Algorithm 1 Luce Spectral Ranking
Require: observations D
1: Λ ← 0_{n×n}
2: for (i, A) ∈ D do
3:   for j ∈ A \ {i} do
4:     λji ← λji + n/|A|
5:   end for
6: end for
7: π̄ ← stationary distribution of the Markov chain with rates Λ
8: return π̄

Algorithm 2 Iterative Luce Spectral Ranking
Require: observations D
1: π ← [1/n, ..., 1/n]ᵀ
2: repeat
3:   Λ ← 0_{n×n}
4:   for (i, A) ∈ D do
5:     for j ∈ A \ {i} do
6:       λji ← λji + 1 / (Σ_{t∈A} πt)
7:     end for
8:   end for
9:   π ← stationary distribution of the Markov chain with rates Λ
10: until convergence
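A compact Python sketch of both algorithms, under stated assumptions: observations are (winner, alternatives) pairs, and the stationary distribution is found by power iteration on a discretized chain (the helper and parameter names are mine, not from the talk):

```python
import numpy as np

def stationary_dist(rates, iters=20_000, tol=1e-12):
    """Stationary distribution of a Markov chain with transition rates rates[j, i] (j -> i)."""
    n = rates.shape[0]
    eps = 1.0 / max(rates.sum(axis=1).max(), 1e-12)  # step size keeping rows sub-stochastic
    P = eps * rates
    P[np.diag_indices(n)] = 1.0 - (P.sum(axis=1) - P.diagonal())
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            pi = new
            break
        pi = new
    return pi / pi.sum()

def lsr(n, observations):
    """Algorithm 1: one spectral pass. observations: iterable of (winner, alternatives)."""
    lam = np.zeros((n, n))
    for winner, alts in observations:
        for j in alts:
            if j != winner:
                lam[j, winner] += n / len(alts)
    return stationary_dist(lam)

def i_lsr(n, observations, max_iter=100, tol=1e-9):
    """Algorithm 2: iterate LSR, reweighting each observation by the current scores."""
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        lam = np.zeros((n, n))
        for winner, alts in observations:
            denom = sum(pi[k] for k in alts)
            for j in alts:
                if j != winner:
                    lam[j, winner] += 1.0 / denom
        new_pi = stationary_dist(lam)
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi

# Toy data: item 0 beats item 1 in 3 of 4 comparisons; the ML estimate is (0.75, 0.25).
obs = [(0, (0, 1))] * 3 + [(1, (0, 1))]
pi_hat = i_lsr(2, obs)
```

On this toy pairwise dataset the iteration reaches its fixed point immediately; on larger datasets the reweighting step is what drives convergence to the ML estimate.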
What is the statistical efficiency of the spectral estimate? What is the computational efficiency?
[Figure: RMSE vs. k (2^1 to 2^10) for LSR, ML, ML-F and GMM-F, compared against a lower bound.]
Take-away: a careful derivation of the Markov chain leads to a better estimator.
[Figure: example rankings of items a–h, for k = 8, 4, 2.]
Which inference method works best?
Table 2: Performance of iterative ML inference algorithms (I = number of iterations, T = running time in seconds).

Dataset    γD      I-LSR           MM                Newton
                   I    T [s]      I     T [s]       I    T [s]
NASCAR     0.832   3    0.08       4     0.10        —    —
Sushi      0.890   2    0.42       4     1.09        3    10.45
YouTube    0.002   12   414.44     8680  22443.88    —    —
GIFGIF     0.408   10   22.31      119   109.62      5    72.38
Chess      0.007   15   43.69      181   55.61       3    49.37
[Figure: RMSE vs. iteration (1 to 10), log scale from 10^0 down to 10^−12, for MM and I-LSR at k = 2 and k = 10.]
Take-away: I-LSR seems to be robust to slow-mixing Markov chains (both well-mixing and poorly-mixing cases).
Paper & code available at: lucas.maystre.ch/nips15