

SLIDE 1

Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

Yuxin Chen Electrical Engineering, Princeton University

Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang

SLIDE 2

Ranking

A fundamental problem in a wide range of contexts:

  • web search, recommendation systems, admissions, sports competitions, voting, ...

[figure: PageRank; credit: Dzenan Hamzic]

Top-K ranking 2/21

SLIDE 3

Rank aggregation from pairwise comparisons

[figure: pairwise comparisons for ranking top tennis players; credit: Bozóki, Csató, Temesi]

SLIDE 4

Parametric models

Assign a latent score to each of n items: w∗ = [w∗_1, · · · , w∗_n]

  • w∗_i : preference score of item i

[figure: items sorted by preference score; the item in position k has rank k]

SLIDE 5

Parametric models (cont.)

  • This work: Bradley-Terry-Luce (logistic) model

        P{item j beats item i} = w∗_j / (w∗_i + w∗_j)

  • Other models: Thurstone model, low-rank model, ...
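As a quick illustration of the BTL model above, here is a minimal NumPy sketch (not from the talk; `btl_win_prob` and `simulate_comparison` are names of my own choosing):

```python
import numpy as np

def btl_win_prob(w, i, j):
    """P{item j beats item i} under the Bradley-Terry-Luce model."""
    return w[j] / (w[i] + w[j])

def simulate_comparison(w, i, j, rng):
    """Draw a single BTL comparison: returns 1 if item j beats item i."""
    return int(rng.random() < btl_win_prob(w, i, j))
```

For instance, with scores w = [1, 3], item 1 beats item 0 with probability 3/4.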

SLIDE 6

Typical ranking procedures

Estimate latent scores → rank items based on score estimates

SLIDE 7

Top-K ranking

Estimate latent scores → rank items based on score estimates

Goal: identify the set of top-K items under minimal sample size

SLIDE 8

Model: random sampling

  • Comparison graph: Erdős–Rényi graph G ∼ G(n, p)

  • For each (i, j) ∈ G, obtain L paired comparisons

        y_{i,j}^{(l)} = 1 with prob. w∗_j / (w∗_i + w∗_j), and 0 otherwise, independently for 1 ≤ l ≤ L

SLIDE 9

Model: random sampling (cont.)

  • For each (i, j) ∈ G, reduce the L paired comparisons to the sufficient statistic

        ȳ_{i,j} = (1/L) Σ_{l=1}^{L} y_{i,j}^{(l)}
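The sampling model above (Erdős–Rényi comparison graph plus L comparisons per edge, reduced to the sufficient statistic ȳ) can be sketched as follows. This is my own minimal simulation, not code from the talk; NaN marks non-edges:

```python
import numpy as np

def sample_comparisons(w, p, L, rng):
    """Erdos-Renyi comparison graph G(n, p); for each edge (i, j), average
    L i.i.d. BTL comparisons into the sufficient statistic y_bar[i, j]."""
    n = len(w)
    edge = np.triu(rng.random((n, n)) < p, k=1)   # upper-triangular edge set
    y_bar = np.full((n, n), np.nan)               # NaN marks non-edges
    for i, j in zip(*np.nonzero(edge)):
        p_ij = w[j] / (w[i] + w[j])               # P{j beats i}
        y_bar[i, j] = rng.binomial(L, p_ij) / L   # mean of L Bernoulli draws
        y_bar[j, i] = 1.0 - y_bar[i, j]           # complementary direction
    return y_bar
```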

SLIDE 10

Prior art

                                                 ℓ2 score error   top-K ranking accuracy
  Spectral method (Negahban et al. ’12)               ✔
  MLE (Negahban et al. ’12; Hajek et al. ’14)         ✔
  Spectral MLE (Chen & Suh ’15)                       ✔                   ✔

SLIDE 11

Prior art (cont.)

The mean square error of the score estimates has so far served only as a “meta metric”: it does not by itself certify top-K ranking accuracy.

SLIDE 12–15

Small ℓ2 loss ≠ high ranking accuracy

[figure: two score estimates with the same ℓ2 loss]

These two estimates have the same ℓ2 loss, but output different rankings.

Need to control the entrywise error!

SLIDE 16–18

Optimality?

Is the spectral method or the MLE alone optimal for top-K ranking?

Partial answer (Jang et al ’16): the spectral method works if the comparison graph is sufficiently dense.

This work: an affirmative answer for both methods, over the entire regime (including sparse graphs).

SLIDE 19

Spectral method (Rank Centrality)          (Negahban, Oh, Shah ’12)

  • Construct a probability transition matrix P whose off-diagonal entries obey

        P_{i,j} ∝ ȳ_{i,j} if (i, j) ∈ G,  and P_{i,j} = 0 if (i, j) ∉ G

  • Return the score estimate as the leading left eigenvector of P
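A minimal Rank Centrality sketch under the conventions above (my own code, not from the talk; it assumes the ȳ matrix from the sampling slides, and a normalization constant d at least as large as the maximum degree so that rows stay sub-stochastic):

```python
import numpy as np

def rank_centrality(y_bar, d):
    """Score estimate = stationary distribution (leading left eigenvector)
    of the transition matrix with P[i, j] = y_bar[i, j] / d on edges."""
    n = y_bar.shape[0]
    P = np.where(np.isnan(y_bar), 0.0, y_bar) / d
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # make rows sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(10000):                        # left power iteration
        pi_next = pi @ P
        if np.abs(pi_next - pi).max() < 1e-12:
            break
        pi = pi_next
    return pi_next / pi_next.sum()
```

With noiseless ȳ on a complete graph, the returned vector is proportional to the true scores.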

SLIDE 20

Rationale behind spectral method

In the large-sample limit, P → P∗, whose off-diagonal entries obey

        P∗_{i,j} ∝ w∗_j / (w∗_i + w∗_j) if (i, j) ∈ G,  and P∗_{i,j} = 0 otherwise

  • P∗ is reversible (check detailed balance), so its stationary distribution is the true score vector:

        π∗ ∝ [w∗_1, w∗_2, . . . , w∗_n]
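The reversibility claim is easy to check numerically. Here is a small helper of my own (not from the talk) that measures the worst violation of detailed balance w_i P_{i,j} = w_j P_{j,i}:

```python
import numpy as np

def detailed_balance_residual(P, w):
    """Max violation of detailed balance w_i * P[i, j] == w_j * P[j, i];
    zero iff the chain is reversible with stationary distribution prop. to w."""
    flow = w[:, None] * P          # flow[i, j] = w_i * P[i, j]
    return np.abs(flow - flow.T).max()
```

For the population matrix P∗ built from the true scores, this residual vanishes, which confirms π∗ ∝ w∗.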

SLIDE 21

Regularized MLE

Negative log-likelihood:

        L(w) := − Σ_{(i,j)∈G} { ȳ_{j,i} log [w_i / (w_i + w_j)] + (1 − ȳ_{j,i}) log [w_j / (w_i + w_j)] }

  • L(w) becomes convex after the reparametrization w → θ = [θ_1, · · · , θ_n], θ_i = log w_i

SLIDE 22

Regularized MLE (cont.)

        (Regularized MLE)   minimize_θ  L_λ(θ) := L(θ) + (1/2) λ ‖θ‖²_2

  • choose λ ≍ √(np log n / L)
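In the θ parametrization, the probability that item i beats item j is the logistic function σ(θ_i − θ_j), so the regularized MLE is a logistic-regression-style convex program. A plain gradient-descent sketch of my own (the slides do not prescribe a solver; step size and iteration count are illustrative):

```python
import numpy as np

def regularized_mle(y_bar, lam, step=0.1, iters=5000):
    """Gradient descent on L(theta) + (lam/2)*||theta||^2 with theta_i = log w_i.
    y_bar[j, i] is the empirical frequency with which item i beats item j."""
    n = y_bar.shape[0]
    edge = ~np.isnan(y_bar)
    np.fill_diagonal(edge, False)
    y_t = np.nan_to_num(y_bar.T)                  # y_t[i, j] = y_bar[j, i]
    theta = np.zeros(n)
    for _ in range(iters):
        diff = theta[:, None] - theta[None, :]    # theta_i - theta_j
        sig = 1.0 / (1.0 + np.exp(-diff))         # model P{i beats j}
        grad = np.where(edge, sig - y_t, 0.0).sum(axis=1) + lam * theta
        theta -= step * grad
    return np.exp(theta)                          # back to scores w
```

With noiseless data and a tiny λ, the recovered scores match the truth up to global scaling.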

SLIDE 23

Main result

Comparison graph G(n, p); sample size ≍ p n² L

Theorem 1 (Chen, Fan, Ma, Wang ’17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

SLIDE 24

Main result (cont.)

[figure: required sample size vs. score separation ∆K; below the optimal threshold the problem is infeasible, and above it both methods succeed]

  • ∆K := (w∗_(K) − w∗_(K+1)) / ‖w∗‖_∞ : score separation

SLIDE 25

Comparison with Jang et al ’16

Jang et al ’16: the spectral method controls the entrywise error if p ≳ √((log n)/n) (relatively dense graphs)

SLIDE 26

Comparison with Jang et al ’16 (cont.)

[figure: required sample size vs. score separation ∆K; our work attains the optimal sample size over the full range, while Jang et al ’16 covers only the denser regime]

SLIDE 27

Empirical top-K ranking accuracy

[figure: top-K ranking accuracy (0.2–1) vs. score separation ∆K (0.1–0.5) for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20]
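The accuracy plotted above can be computed as the overlap between the estimated and true top-K sets; a small helper of my own (not from the talk):

```python
import numpy as np

def topk_accuracy(w_hat, w_true, K):
    """Fraction of the true top-K items recovered in the estimated top-K set."""
    top_hat = set(np.argsort(-np.asarray(w_hat, dtype=float))[:K])
    top_true = set(np.argsort(-np.asarray(w_true, dtype=float))[:K])
    return len(top_hat & top_true) / K
```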

SLIDE 28

Optimal control of entrywise error

[figure: true scores w∗_1, . . . , w∗_K, w∗_{K+1}, . . . with separation ∆K between w∗_(K) and w∗_(K+1); score estimates within (1/2)∆K of the truth preserve the top-K set]

Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling)

        ‖w − w∗‖_∞ / ‖w∗‖_∞ < (1/2) ∆K
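Theorem 2's sufficient condition is elementary to verify for any particular estimate: if the relative ℓ∞ error is below ∆K/2, no item in the top K can be swapped past an item outside it. A small check of my own:

```python
import numpy as np

def entrywise_ok(w_hat, w_true, K):
    """True if ||w_hat - w_true||_inf / ||w_true||_inf < Delta_K / 2,
    the slide's sufficient condition for exact top-K recovery."""
    w_hat = np.asarray(w_hat, dtype=float)
    w_true = np.asarray(w_true, dtype=float)
    srt = np.sort(w_true)[::-1]                   # scores in decreasing order
    delta_K = (srt[K - 1] - srt[K]) / np.abs(w_true).max()
    err = np.abs(w_hat - w_true).max() / np.abs(w_true).max()
    return bool(err < delta_K / 2)
```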

SLIDE 29

Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m)

[figure: the data matrix y = [ȳ_{i,j}]_{1≤i,j≤n}, with the row and column of the m-th item singled out]

SLIDE 30

Key ingredient: leave-one-out analysis (cont.)

The leave-one-out estimate w^(m) is designed to

  • exploit statistical independence

  • enjoy leave-one-out stability

SLIDE 31

Exploit statistical independence

The leave-one-out estimate w^(m) is statistically independent of all data related to the m-th item:

        w^(m) ⊥ {ȳ_{i,j} : i = m or j = m}
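In the analysis, the leave-one-out data replaces every comparison involving item m by its population mean, which is what makes w^(m) independent of item m's data. A sketch of that construction (my own, for illustration only, since it uses the true scores):

```python
import numpy as np

def leave_one_out_data(y_bar, w_star, m):
    """Replace all comparisons involving item m by their expected values,
    so an estimate computed from the result is independent of item m's data."""
    y_loo = y_bar.copy()
    for j in range(y_bar.shape[0]):
        if j != m:
            y_loo[m, j] = w_star[j] / (w_star[m] + w_star[j])
            y_loo[j, m] = 1.0 - y_loo[m, j]
    return y_loo
```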

SLIDE 32

Leave-one-out stability

leave-one-out estimate w^(m) ≈ true estimate w

SLIDE 33

Leave-one-out stability (cont.)

leave-one-out estimate w^(m) ≈ true estimate w

  • Spectral method: eigenvector perturbation bound

        ‖π − π̃‖_{π∗} ≲ ‖π(P − P̃)‖_{π∗} / spectral gap

      - a new Davis-Kahan-type bound for (asymmetric) probability transition matrices

SLIDE 34

Leave-one-out stability (cont.)

  • MLE: local strong convexity

        ‖θ − θ̃‖_2 ≲ ‖∇L_λ(θ; y)‖_2 / strong convexity parameter

SLIDE 35

A small sample of related works

  • Parametric models
      - Ford ’57
      - Hunter ’04
      - Negahban, Oh, Shah ’12
      - Rajkumar, Agarwal ’14
      - Hajek, Oh, Xu ’14
      - Chen, Suh ’15
      - Rajkumar, Agarwal ’16
      - Jang, Kim, Suh, Oh ’16
      - Suh, Tan, Zhao ’17
  • Non-parametric models
      - Shah, Wainwright ’15
      - Shah, Balakrishnan, Guntuboyina, Wainwright ’16
      - Chen, Gopi, Mao, Schneider ’17
  • Leave-one-out analysis
      - El Karoui, Bean, Bickel, Lim, Yu ’13
      - Zhong, Boumal ’17
      - Abbe, Fan, Wang, Zhong ’17
      - Ma, Wang, Chi, Chen ’17
      - Chen, Chi, Fan, Ma ’18
      - Chen, Chi, Fan, Ma, Yan ’19

SLIDE 36

Summary

                                          Spectral method   Regularized MLE
  Optimal sample complexity                      ✔                 ✔
  Linear-time computational complexity           ✔                 ✔

Novel entrywise perturbation analysis for the spectral method and convex optimization

Paper: “Spectral method and regularized MLE are both optimal for top-K ranking”, Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019