Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking (PowerPoint PPT Presentation)



SLIDE 1

Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

Cong Ma, ORFE, Princeton University

Joint work with Yuxin Chen, Jianqing Fan and Kaizheng Wang

SLIDE 2

Ranking

A fundamental problem in a wide range of contexts:

  • web search, recommendation systems, admissions, sports competitions, voting, ...

Example: PageRank (figure credit: Dzenan Hamzic)

Top-K ranking 2/20

SLIDE 3

Rank aggregation from pairwise comparisons

Example: pairwise comparisons for ranking top tennis players

figure credit: Bozóki, Csató, Temesi


SLIDE 5

Parametric models

Assign a latent preference score to each of n items: w∗ = [w∗_1, · · · , w∗_n], where w∗_i is the preference score of item i; ranking the items amounts to sorting the scores.

  • This work: Bradley-Terry-Luce (BTL) model: for w∗ ∈ R^n_+,

    P {item j beats item i} = w∗_j / (w∗_i + w∗_j)
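The BTL win probability above can be sketched in a few lines; the scores in this example are hypothetical, not from the talk:

```python
import numpy as np

def btl_win_prob(w, i, j):
    """P{item j beats item i} under the Bradley-Terry-Luce model.

    w holds the positive latent scores w*; i, j are 0-based item indices."""
    return w[j] / (w[i] + w[j])

# Toy example with hypothetical scores.
w = np.array([3.0, 2.0, 1.0])
p = btl_win_prob(w, 0, 1)  # item 1 beats item 0 with prob 2 / (3 + 2)
```

Note that the two orientations are complementary: `btl_win_prob(w, i, j) + btl_win_prob(w, j, i) == 1`.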


SLIDE 7

Other parametric models

  • Thurstone model: for w∗ ∈ R^n,

    P {item j beats item i} = Φ(w∗_j − w∗_i),   where Φ is the Gaussian cdf

  • General parametric models: for a nondecreasing f : R → [0, 1] obeying

    f(t) = 1 − f(−t), ∀t ∈ R,

    set P {item j beats item i} = f(w∗_j − w∗_i)
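Both models fit this general template; a minimal sketch, with `f_logistic` and `f_thurstone` as assumed names (the BTL model reduces to the logistic f after the reparametrization θ_i = log w∗_i, since w∗_j/(w∗_i + w∗_j) = 1/(1 + e^{θ_i − θ_j})):

```python
import math

def f_logistic(t):
    """Logistic link: BTL in the log-score parametrization theta_i = log w_i."""
    return 1.0 / (1.0 + math.exp(-t))

def f_thurstone(t):
    """Thurstone link: Gaussian cdf Phi(t), written via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Both links satisfy the required symmetry f(t) = 1 - f(-t).
for f in (f_logistic, f_thurstone):
    for t in (-1.3, 0.0, 0.7):
        assert abs(f(t) + f(-t) - 1.0) < 1e-12
```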

SLIDE 8

Typical ranking procedures

Estimate latent scores → rank items based on score estimates

SLIDE 9

Top-K ranking

Estimate latent scores → rank items based on score estimates

Goal: identify the set of top-K items from pairwise comparisons

SLIDE 10

Model: random sampling

  • Comparison graph: Erdős–Rényi graph G ∼ G(n, p)

    (figure: a comparison graph on nodes 1–12)

  • For each (i, j) ∈ G, obtain L paired comparisons: independently for 1 ≤ l ≤ L,

    y^(l)_{i,j} = 1 with prob. w∗_j / (w∗_i + w∗_j), and 0 otherwise

SLIDE 11

Model: random sampling

  • For each (i, j) ∈ G, aggregate the L paired comparisons into

    y_{i,j} = (1/L) Σ^L_{l=1} y^(l)_{i,j}    (sufficient statistic)
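The sampling model on these slides can be simulated directly; this sketch (function name and parameter values are illustrative) returns the comparison graph and the averaged sufficient statistics:

```python
import numpy as np

def sample_comparisons(w, p, L, rng):
    """Simulate the random sampling model: an Erdős–Rényi comparison graph
    G(n, p), and for each observed pair (i, j) the averaged outcome
    y[i, j] = fraction of L independent BTL comparisons in which j beats i."""
    n = len(w)
    A = np.zeros((n, n), dtype=bool)   # comparison graph (symmetric mask)
    y = np.zeros((n, n))               # sufficient statistics
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:       # pair (i, j) enters the graph
                A[i, j] = A[j, i] = True
                wins_j = rng.binomial(L, w[j] / (w[i] + w[j]))
                y[i, j] = wins_j / L       # fraction of times j beats i
                y[j, i] = 1.0 - y[i, j]    # the remaining comparisons
    return A, y

rng = np.random.default_rng(0)
w = np.linspace(2.0, 1.0, 8)               # hypothetical scores
A, y = sample_comparisons(w, p=0.5, L=50, rng=rng)
```

By construction y[i, j] + y[j, i] = 1 on every observed edge, matching the fact that the pair of averaged outcomes carries one binomial sample per edge.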

SLIDE 12

Spectral method (Rank Centrality)

Negahban, Oh, Shah ’12

  • Construct a probability transition matrix P = [P_{i,j}]_{1≤i,j≤n}:

    P_{i,j} = (1/d) y_{i,j},                      if (i, j) ∈ E,
              1 − (1/d) Σ_{k:(i,k)∈E} y_{i,k},    if i = j,
              0,                                  otherwise.

  • Return the score estimate as the leading left eigenvector of P
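A compact sketch of the Rank Centrality construction above. The normalization d is not pinned down on this slide, so the default d = 2 × max degree here is an assumption that merely keeps P a valid transition matrix:

```python
import numpy as np

def rank_centrality(A, y, d=None):
    """Spectral score estimate (Rank Centrality): build the transition
    matrix P from the averaged outcomes y on the comparison graph A, then
    return its stationary distribution (leading left eigenvector)."""
    n = A.shape[0]
    if d is None:
        d = 2 * A.sum(axis=1).max()       # assumed normalization, >= max degree
    P = np.where(A, y, 0.0) / d           # off-diagonal entries y_{i,j} / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # diagonal: 1 - (1/d) sum_k y_{i,k}
    # Leading left eigenvector of P = right eigenvector of P^T.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return pi / pi.sum()

# Sanity check in the L -> infinity limit: feeding the exact BTL win
# probabilities should recover scores proportional to w* (hypothetical scores).
w = np.array([3.0, 2.0, 1.0])
A = ~np.eye(3, dtype=bool)                # complete comparison graph
y = w[None, :] / (w[:, None] + w[None, :])
y[~A] = 0.0
pi = rank_centrality(A, y)
```

Ranking the items then amounts to sorting `pi` in decreasing order.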



SLIDE 15

Rationale behind spectral method

In the large-sample limit (L → ∞), P converges to P∗ = [P∗_{i,j}]_{1≤i,j≤n}:

    P∗_{i,j} = (1/d) · w∗_j / (w∗_i + w∗_j),                    if (i, j) ∈ E,
               1 − (1/d) Σ_{k:(i,k)∈E} w∗_k / (w∗_i + w∗_k),    if i = j,
               0,                                               otherwise.

  • Stationary distribution of P∗:

    π∗ := (1 / Σ^n_{i=1} w∗_i) · [w∗_1, w∗_2, . . . , w∗_n]⊤

  • Check detailed balance!
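The detailed-balance check is a one-line computation from the definitions of π∗ and P∗ above: for each (i, j) ∈ E,

```latex
\pi_i^* P_{i,j}^*
  = \frac{w_i^*}{\sum_k w_k^*} \cdot \frac{1}{d} \cdot \frac{w_j^*}{w_i^* + w_j^*}
  = \frac{w_j^*}{\sum_k w_k^*} \cdot \frac{1}{d} \cdot \frac{w_i^*}{w_j^* + w_i^*}
  = \pi_j^* P_{j,i}^*,
```

since the middle expression is symmetric in i and j. Hence the chain is reversible and π∗ is indeed the stationary distribution of P∗.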



SLIDE 18

Regularized MLE

Negative log-likelihood:

    L(w) := − Σ_{(i,j)∈G} [ y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) ]

Reparametrizing via θ_i = log w_i:

    L(θ) = Σ_{(i,j)∈G} [ −y_{j,i} (θ_i − θ_j) + log( 1 + e^{θ_i − θ_j} ) ]

(Regularized MLE)

    minimize_θ  L_λ(θ) := L(θ) + (λ/2) ‖θ‖²_2,    choosing λ ≍ √( (np log n) / L )
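A plain gradient-descent sketch of the regularized MLE above; only the objective comes from the slide, while the solver, step size, and iteration count are illustrative assumptions (the talk does not prescribe an optimizer):

```python
import numpy as np

def regularized_mle(A, y, lam, steps=2000, lr=0.1):
    """Minimize L_lam(theta) = sum_{(i,j) in E} [ -y_{j,i} (theta_i - theta_j)
    + log(1 + exp(theta_i - theta_j)) ] + (lam/2) ||theta||_2^2
    by gradient descent. y[i, j] is the fraction of wins of j over i,
    so y[j, i] plays the role of y_{j,i} (fraction of wins of i over j)."""
    n = A.shape[0]
    theta = np.zeros(n)
    iu, ju = np.where(np.triu(A, k=1))    # one orientation per edge (i < j)
    for _ in range(steps):
        diff = theta[iu] - theta[ju]
        sig = 1.0 / (1.0 + np.exp(-diff))      # sigma(theta_i - theta_j)
        # d/dtheta_i of the (i, j) term is -y_{j,i} + sigma(theta_i - theta_j);
        # the derivative w.r.t. theta_j is its negative.
        g_edge = sig - y[ju, iu]
        grad = np.zeros(n)
        np.add.at(grad, iu, g_edge)
        np.add.at(grad, ju, -g_edge)
        grad += lam * theta                    # ridge term
        theta -= lr * grad
    return theta

# Sanity check: with exact win fractions and tiny regularization, the
# estimate should recover theta ~ log w* up to a global shift
# (hypothetical scores [3, 2, 1], so theta_0 - theta_2 ~ log 3).
w = np.array([3.0, 2.0, 1.0])
A = ~np.eye(3, dtype=bool)
y = w[None, :] / (w[:, None] + w[None, :])
theta = regularized_mle(A, y, lam=1e-4)
```

In the talk's setting λ would instead be set on the order of √(np log n / L); the tiny λ here just makes the toy recovery check transparent.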


SLIDE 20

Prior art

mean square error for estimating scores (a “meta metric”):
  • Spectral method: ✔ Negahban et al. ’12
  • MLE: ✔ Negahban et al. ’12, Hajek et al. ’14

top-K ranking accuracy:
  • Spectral MLE: ✔ Chen & Suh ’15



SLIDE 24

Small ℓ2 loss ≠ high ranking accuracy

(figure: two score estimates with the same ℓ2 loss but different induced rankings)

These two estimates have the same ℓ2 loss, but output different rankings. Need to control the entrywise error!


SLIDE 27

Optimality?

Is the spectral method or the MLE alone optimal for top-K ranking?

Partial answer (Jang et al. ’16): the spectral method works if the comparison graph is sufficiently dense.

This work: affirmative answer for both methods, over the entire regime (including sparse graphs).

SLIDE 28

Main result

Comparison graph G(n, p); sample size ≍ pn²L

(figure: a comparison graph on nodes 1–12)

Theorem 1 (Chen, Fan, Ma, Wang ’17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

SLIDE 29

Main result

(figure: sample size vs. score separation ∆_K; below the threshold the problem is infeasible, above it top-K ranking is achievable by both methods)

  • ∆_K := ( w∗_(K) − w∗_(K+1) ) / ‖w∗‖_∞ : score separation

SLIDE 30

Empirical top-K ranking accuracy

(figure: top-K ranking accuracy vs. score separation ∆_K ∈ [0.1, 0.5] for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20)
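The experiment behind this plot can be sketched at a smaller scale (the talk uses n = 200; the helper names, the n = 50, and the trial count here are illustrative, and only the spectral method is run):

```python
import numpy as np

def spectral_scores(A, y, d):
    """Rank Centrality: stationary distribution of the transition matrix
    built from averaged outcomes y on the comparison graph A."""
    P = np.where(A, y, 0.0) / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return pi / pi.sum()

def topk_accuracy(n=50, p=0.25, L=20, K=5, delta=0.3, trials=10, seed=0):
    """Fraction of trials in which the spectral method exactly recovers the
    top-K set, for scores with separation delta (and ||w*||_inf = 1)."""
    rng = np.random.default_rng(seed)
    w = np.ones(n)
    w[K:] = 1.0 - delta                   # gap delta below the top-K block
    hits = 0
    for _ in range(trials):
        A = np.zeros((n, n), dtype=bool)
        y = np.zeros((n, n))
        for i in range(n):                # Erdős–Rényi graph + L comparisons
            for j in range(i + 1, n):
                if rng.random() < p:
                    A[i, j] = A[j, i] = True
                    y[i, j] = rng.binomial(L, w[j] / (w[i] + w[j])) / L
                    y[j, i] = 1.0 - y[i, j]
        pi = spectral_scores(A, y, d=2 * A.sum(axis=1).max())
        if set(np.argsort(-pi)[:K]) == set(range(K)):
            hits += 1
    return hits / trials

# Qualitatively, accuracy should rise with the separation delta.
acc_low = topk_accuracy(delta=0.1)
acc_high = topk_accuracy(delta=0.5)
```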

SLIDE 31

Optimal control of entrywise error

(figure: true scores w∗_1, . . . , w∗_K, w∗_{K+1} with gap ∆_K, and score estimates w_1, . . . , w_{K+1}; if every estimate deviates from its true score by less than ∆_K/2, the top-K set is exactly recovered)

Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high prob., the estimates w returned by both methods obey (up to global scaling)

    ‖w − w∗‖_∞ / ‖w∗‖_∞ < (1/2) ∆_K

SLIDE 32

Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m), computed from the data y = [y_{i,j}]_{1≤i,j≤n} with the comparisons involving item m left out.

(figure: comparison data for items 1, . . . , n with and without item m)


SLIDE 35

Leave-one-out stability

leave-one-out estimate w^(m) ≈ true estimate w

  • Spectral method: eigenvector perturbation bound, schematically

    ‖π − π̂‖_{π∗} ≲ ‖π̂ (P − P̂)‖_{π∗} / spectral-gap

    (a new Davis-Kahan bound for asymmetric probability transition matrices)

  • MLE: local strong convexity, schematically

    ‖θ − θ̂‖_2 ≲ ‖∇L_λ(θ̂; y)‖_2 / strong-convexity-parameter

SLIDE 36

Summary

                                        Spectral method    Regularized MLE
Optimal sample complexity               ✔                  ✔
Linear-time computational complexity    ✔                  ✔

Novel entrywise perturbation analysis for the spectral method and convex optimization

Paper: “Spectral method and regularized MLE are both optimal for top-K ranking”, Y. Chen, J. Fan, C. Ma, K. Wang, arXiv:1707.09971, 2017