Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

Yuxin Chen, Electrical Engineering, Princeton University
Joint work with Jianqing Fan, Cong Ma, and Kaizheng Wang
Ranking
A fundamental problem in a wide range of contexts
- web search, recommendation systems, admissions, sports competitions, voting, ... (e.g., PageRank)
figure credit: Dzenan Hamzic
Top-K ranking 2/21
Rank aggregation from pairwise comparisons
pairwise comparisons for ranking top tennis players
figure credit: Bozóki, Csató, Temesi
Parametric models

Assign a latent preference score w*_i to each of the n items: w* = [w*_1, ..., w*_n]; ranking the items amounts to sorting the scores.

- This work: Bradley-Terry-Luce (logistic) model

  P{item j beats item i} = w*_j / (w*_i + w*_j)

- Other models: Thurstone model, low-rank model, ...
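As a quick illustration, the comparison model is easy to simulate; the scores and sample count below are hypothetical values chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent scores for n = 5 items (any positive values work).
w = np.array([2.0, 1.5, 1.2, 1.0, 0.5])

def btl_compare(i, j, w, rng):
    """Return 1 if item j beats item i, which happens with
    probability w[j] / (w[i] + w[j]) under the BTL model."""
    return int(rng.random() < w[j] / (w[i] + w[j]))

# The empirical win rate of item 1 over item 0 should approach
# w[1] / (w[0] + w[1]) = 1.5 / 3.5.
wins = sum(btl_compare(0, 1, w, rng) for _ in range(20000))
```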
Typical ranking procedures

Estimate latent scores → rank items based on score estimates

Goal (top-K ranking): identify the set of top-K items under minimal sample size
Model: random sampling

- Comparison graph: Erdős–Rényi graph G ∼ G(n, p)
- For each (i, j) ∈ G, obtain L paired comparisons:

  y_{i,j}^{(l)} = 1 with prob. w*_j / (w*_i + w*_j), and 0 otherwise, independently for 1 ≤ l ≤ L

- Sufficient statistic: y_{i,j} = (1/L) Σ_{l=1}^{L} y_{i,j}^{(l)}
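A minimal sketch of this sampling model; the values of n, p, and L are illustrative, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, L = 12, 0.5, 50            # illustrative sizes
w = rng.uniform(0.5, 1.5, n)     # hypothetical true scores w*

# Comparison graph: include each pair (i, j), i < j, independently with prob. p.
A = np.triu(rng.random((n, n)) < p, k=1)

# For each edge, the sufficient statistic y_ij averages L BTL outcomes:
# y_ij = (1/L) * sum_l y_ij^(l), where y_ij^(l) = 1 w.p. w_j / (w_i + w_j).
Y = np.zeros((n, n))
for i, j in zip(*np.nonzero(A)):
    Y[i, j] = rng.binomial(L, w[j] / (w[i] + w[j])) / L
    Y[j, i] = 1.0 - Y[i, j]      # complementary frequency for the reverse outcome
```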
Prior art

                                          MSE for estimating scores   top-K ranking accuracy
Spectral method (Negahban et al. '12)                ✔
MLE (Hajek et al. '14)                               ✔
Spectral MLE (Chen & Suh '15)                        ✔                          ✔

Mean squared error is only a "meta metric": small MSE does not by itself certify accurate top-K ranking.
Small ℓ2 loss ≠ high ranking accuracy

Two estimates can have the same ℓ2 loss and yet output different rankings. Need to control the entrywise error!
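The point can be made concrete with a toy example (hypothetical scores): two estimates with identical ℓ2 error, only one of which preserves the top-2 set.

```python
import numpy as np

w_true = np.array([1.0, 0.9, 0.5, 0.4])           # hypothetical scores; top-2 = {0, 1}

# Both error vectors have the same l2 norm, 0.21 * sqrt(2) ...
w_a = w_true + np.array([0.21, 0.21, 0.0, 0.0])   # perturbs only the top items
w_b = w_true + np.array([0.0, -0.21, 0.21, 0.0])  # swaps items 1 and 2

top2 = lambda v: set(np.argsort(v)[-2:])

# ... yet w_a keeps top-2 = {0, 1} while w_b outputs {0, 2}.
```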
Optimality?

Is the spectral method or the MLE alone optimal for top-K ranking?

- Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense
- This work: affirmative answer for both methods, over the entire regime (incl. sparse graphs)
Spectral method (Rank Centrality)
Negahban, Oh, Shah ’12
- Construct a probability transition matrix P whose off-diagonal entries obey

  P_{i,j} ∝ y_{i,j} if (i, j) ∈ G, and P_{i,j} = 0 if (i, j) ∉ G

- Return the score estimate as the leading left eigenvector of P
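A minimal sketch of Rank Centrality on simulated data; the normalization by d (an upper bound on the degrees, so each row sums to one) and the problem sizes are illustrative choices, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, L = 50, 0.5, 100           # illustrative sizes
w = rng.uniform(0.5, 1.5, n)     # hypothetical true scores

# Simulated averaged comparisons on an Erdős–Rényi graph.
A = np.triu(rng.random((n, n)) < p, k=1)
Y = np.zeros((n, n))
for i, j in zip(*np.nonzero(A)):
    Y[i, j] = rng.binomial(L, w[j] / (w[i] + w[j])) / L
    Y[j, i] = 1.0 - Y[i, j]
A = A | A.T

# Transition matrix: P[i, j] proportional to y_ij on edges; dividing by
# d > max degree keeps every row sum below one, remainder on the diagonal.
d = A.sum(axis=1).max() + 1
P = np.where(A, Y, 0.0) / d
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

# Score estimate: leading left eigenvector of P (stationary distribution).
vals, vecs = np.linalg.eig(P.T)
pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
pi /= pi.sum()
```

With this many comparisons the estimate pi aligns with the true scores up to global scaling.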
Rationale behind spectral method
In the large-sample limit, P → P*, whose off-diagonal entries obey

  P*_{i,j} ∝ w*_j / (w*_i + w*_j) if (i, j) ∈ G, and P*_{i,j} = 0 if (i, j) ∉ G

- P* is reversible (check detailed balance), so its stationary distribution is

  π* ∝ [w*_1, w*_2, ..., w*_n]   (the true scores)
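The detailed-balance check is easy to carry out numerically on a small complete graph (hypothetical scores; normalizing by n is an illustrative choice of degree bound):

```python
import numpy as np

w = np.array([3.0, 2.0, 1.0, 0.5])   # hypothetical true scores
n = len(w)

# Idealized transition matrix: off-diagonal P*[i, j] = w_j / (w_i + w_j) / n,
# diagonal chosen so each row sums to one.
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = w[j] / (w[i] + w[j]) / n
    P[i, i] = 1.0 - P[i].sum()

pi = w / w.sum()   # candidate stationary distribution, proportional to the scores

# Detailed balance pi_i * P[i, j] = pi_j * P[j, i] holds because both sides
# equal w_i * w_j / ((w_i + w_j) * n * sum(w)); hence pi P = pi.
balance = pi[:, None] * P
```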
Regularized MLE

Negative log-likelihood:

  L(w) := − Σ_{(i,j)∈G} { y_{j,i} log [w_i / (w_i + w_j)] + (1 − y_{j,i}) log [w_j / (w_i + w_j)] }

- L(w) becomes convex after the reparametrization w → θ = [θ_1, ..., θ_n], θ_i = log w_i
- (Regularized MLE) minimize_θ L_λ(θ) := L(θ) + (λ/2) ||θ||²_2, choosing λ ≍ sqrt((np log n) / L)
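A minimal sketch of the regularized MLE via plain gradient descent in the θ parametrization; the step size, iteration count, and λ here are illustrative choices, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, L = 30, 0.5, 50
theta_true = rng.uniform(-1.0, 1.0, n)       # hypothetical true log-scores

# Averaged outcomes y[(i, j)]: fraction of the L comparisons won by item i.
A = np.triu(rng.random((n, n)) < p, k=1)
y = {}
for i, j in zip(*np.nonzero(A)):
    p_win = 1.0 / (1.0 + np.exp(theta_true[j] - theta_true[i]))
    y[(i, j)] = rng.binomial(L, p_win) / L

lam = 0.1   # illustrative; the talk takes lambda of order sqrt(np log n / L)

def grad(theta):
    """Gradient of L_lambda(theta) = L(theta) + (lam/2) * ||theta||^2."""
    g = lam * theta
    for (i, j), y_ij in y.items():
        s = 1.0 / (1.0 + np.exp(theta[j] - theta[i]))  # model P{i beats j}
        g[i] += s - y_ij
        g[j] += y_ij - s
    return g

theta = np.zeros(n)
for _ in range(500):
    theta -= 0.05 * grad(theta)
```

Since the likelihood is invariant to a global shift of θ, it makes sense to compare estimates after centering; with these sizes the centered estimate tracks theta_true closely.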
Main result

Comparison graph G ∼ G(n, p); sample size ≍ pn²L

Theorem 1 (Chen, Fan, Ma, Wang '17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

[Phase diagram: sample size vs. score separation ∆_K; the regime achievable by both methods extends all the way down to the infeasible region]

- ∆_K := (w*_(K) − w*_(K+1)) / ||w*||_∞ : score separation, where w*_(K) denotes the K-th largest score
Comparison with Jang et al. '16

- Jang et al. '16: the spectral method controls the entrywise error if p ≳ sqrt((log n)/n) (relatively dense graphs)
- Our work: optimal sample size over the entire regime p ≳ (log n)/n

[Figure: optimal sample size achieved by our work vs. Jang et al. '16, as a function of p]
Empirical top-K ranking accuracy

[Plot: top-K ranking accuracy vs. score separation ∆_K for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20]
Optimal control of entrywise error

[Figure: true scores w*_1, ..., w*_K, w*_{K+1}, ... with separation ∆_K, and score estimates w_1, ..., w_K, w_{K+1}; an entrywise error below ∆_K/2 preserves the top-K set]

Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling)

  ||w − w*||_∞ / ||w*||_∞ < (1/2) ∆_K
Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m), computed without any data involving the m-th item.

- Exploit statistical independence
- Establish leave-one-out stability
Exploit statistical independence

[Figure: the data y = [y_{i,j}]_{1≤i,j≤n} with all data related to the m-th item removed]

The leave-one-out estimate w^(m) is independent of all data related to the m-th item.
Leave-one-out stability

leave-one-out estimate w^(m) ≈ true estimate w

- Spectral method: eigenvector perturbation bound

  ||π − π*|| ≲ ||π*ᵀ(P − P*)|| / spectral gap

  a new Davis-Kahan-type bound for (asymmetric) probability transition matrices

- MLE: local strong convexity

  ||θ̂ − θ||_2 ≲ ||∇L_λ(θ; y)||_2 / strong convexity parameter
A small sample of related works
- Parametric models
- Ford ’57
- Hunter ’04
- Negahban, Oh, Shah ’12
- Rajkumar, Agarwal ’14
- Hajek, Oh, Xu ’14
- Chen, Suh ’15
- Rajkumar, Agarwal ’16
- Jang, Kim, Suh, Oh ’16
- Suh, Tan, Zhao ’17
- Non-parametric models
- Shah, Wainwright ’15
- Shah, Balakrishnan, Guntuboyina, Wainwright ’16
- Chen, Gopi, Mao, Schneider ’17
- Leave-one-out analysis
- El Karoui, Bean, Bickel, Lim, Yu ’13
- Zhong, Boumal ’17
- Abbe, Fan, Wang, Zhong ’17
- Ma, Wang, Chi, Chen ’17
- Chen, Chi, Fan, Ma ’18
- Chen, Chi, Fan, Ma, Yan ’19
Summary
                   Optimal sample complexity   Linear-time computational complexity
Spectral method               ✔                              ✔
Regularized MLE               ✔                              ✔

- Novel entrywise perturbation analysis for spectral method and convex optimization

Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019