

SLIDE 1

Spectral Methods Meet Asymmetry: Two Recent Stories

Yuxin Chen, Electrical Engineering, Princeton University

SLIDE 2

Spectral methods based on eigen-decomposition

M = E[M] + (M − E[M])

  • E[M]: approximately low-rank

Methods based on eigen-decomposition of a certain data matrix M ...

SLIDE 3

Spectral methods based on eigen-decomposition

M = E[M] + (M − E[M])

  • E[M]: approximately low-rank

Methods based on eigen-decomposition of a certain data matrix M ...

This talk: what happens if the data matrix M is non-symmetric? Two recent stories.

SLIDE 4

Asymmetry helps: eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices

Chen Cheng (Stanford Statistics), Jianqing Fan (Princeton ORFE)

SLIDE 5

Eigenvalue / eigenvector estimation

M⋆: truth

  • A rank-1 matrix: M⋆ = λ⋆ u⋆u⋆⊤ ∈ ℝ^{n×n}

SLIDE 6

Eigenvalue / eigenvector estimation

M⋆: truth  +  H: noise

  • A rank-1 matrix: M⋆ = λ⋆ u⋆u⋆⊤ ∈ ℝ^{n×n}
  • Observed noisy data: M = M⋆ + H

SLIDE 7

Eigenvalue / eigenvector estimation

M⋆: truth  +  H: noise

  • A rank-1 matrix: M⋆ = λ⋆ u⋆u⋆⊤ ∈ ℝ^{n×n}
  • Observed noisy data: M = M⋆ + H
  • Goal: estimate eigenvalue λ⋆ and eigenvector u⋆

SLIDE 8

Non-symmetric noise matrix

M = M⋆ + H,  where M⋆ = λ⋆ u⋆u⋆⊤ and H is an asymmetric noise matrix

This may arise when, e.g., we have 2 samples for each entry of M⋆ and arrange them in an asymmetric manner

SLIDE 9

A natural estimation strategy: SVD

M = M⋆ + H,  where M⋆ = λ⋆ u⋆u⋆⊤ and H is an asymmetric noise matrix

  • Use leading singular value λsvd of M to estimate λ⋆
  • Use leading left singular vector of M to estimate u⋆

SLIDE 10

A less popular strategy: eigen-decomposition

M = M⋆ + H,  where M⋆ = λ⋆ u⋆u⋆⊤ and H is an asymmetric noise matrix

  • Use leading eigenvalue λeigs of M to estimate λ⋆
  • Use leading eigenvector of M to estimate u⋆

SLIDE 11

SVD vs. eigen-decomposition

For asymmetric matrices:

  • Numerical stability

SVD > eigen-decomposition

SLIDE 12

SVD vs. eigen-decomposition

For asymmetric matrices:

  • Numerical stability

SVD > eigen-decomposition

  • (Folklore?) Statistical accuracy

SVD ≍ eigen-decomposition

SLIDE 13

SVD vs. eigen-decomposition

For asymmetric matrices:

  • Numerical stability

SVD > eigen-decomposition

  • (Folklore?) Statistical accuracy

SVD ≍ eigen-decomposition

Shall we always prefer SVD over eigen-decomposition?

SLIDE 14

A curious numerical experiment: Gaussian noise

M = u⋆u⋆⊤ + H,  where M⋆ = u⋆u⋆⊤ and {Hi,j} are i.i.d. N(0, σ²) with σ = 1/√(n log n)

[Figure: eigenvalue estimation error |λsvd − λ⋆| vs. n (log scale) for SVD]
SLIDE 15

A curious numerical experiment: Gaussian noise

M = u⋆u⋆⊤ + H,  where M⋆ = u⋆u⋆⊤ and {Hi,j} are i.i.d. N(0, σ²) with σ = 1/√(n log n)

[Figure: eigenvalue estimation error vs. n (log scale) for SVD (|λsvd − λ⋆|) and eigen-decomposition (|λeigs − λ⋆|)]
SLIDE 16

A curious numerical experiment: Gaussian noise

M = u⋆u⋆⊤ + H,  where M⋆ = u⋆u⋆⊤ and {Hi,j} are i.i.d. N(0, σ²) with σ = 1/√(n log n)

[Figure: eigenvalue estimation error vs. n for SVD, eigen-decomposition, and the rescaled SVD error (2.5/√n) · |λsvd − λ⋆|]
SLIDE 17

A curious numerical experiment: Gaussian noise

M = u⋆u⋆⊤ + H,  where M⋆ = u⋆u⋆⊤ and {Hi,j} are i.i.d. N(0, σ²) with σ = 1/√(n log n)

[Figure: eigenvalue estimation error vs. n for SVD, eigen-decomposition, and the rescaled SVD error (2.5/√n) · |λsvd − λ⋆|]

  • empirically, |λeigs − λ⋆| ≈ (2.5/√n) · |λsvd − λ⋆|
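The experiment is simple to replicate. Below is a minimal sketch (not the talk's code; the rank-1 model and noise level follow the slide, while the constants, seed, and size grid are our assumptions) comparing the two estimators:

```python
# Sketch: rank-1 truth with lambda* = 1, i.i.d. Gaussian noise with
# sigma = 1/sqrt(n log n); compare |lambda_svd - 1| against |lambda_eigs - 1|.
import numpy as np

rng = np.random.default_rng(0)
for n in [200, 500, 1000, 2000]:
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                            # unit-norm eigenvector u*
    sigma = 1.0 / np.sqrt(n * np.log(n))
    M = np.outer(u, u) + sigma * rng.standard_normal((n, n))
    lam_svd = np.linalg.svd(M, compute_uv=False)[0]   # leading singular value
    eigs = np.linalg.eigvals(M)
    lam_eig = eigs[np.argmax(eigs.real)].real         # leading eigenvalue
    print(n, abs(lam_svd - 1.0), abs(lam_eig - 1.0))
```

The eigenvalue error should shrink roughly like 1/√n relative to the SVD error, matching the curves on the slide.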
SLIDE 18

Another numerical experiment: matrix completion

M⋆ = u⋆u⋆⊤;   Mi,j = (1/p) M⋆i,j with prob. p, and 0 otherwise;   p = (3 log n)/n

[Figure: partially observed matrix, with missing entries shown as ?]

SLIDE 19

Another numerical experiment: matrix completion

M⋆ = u⋆u⋆⊤;   Mi,j = (1/p) M⋆i,j with prob. p, and 0 otherwise;   p = (3 log n)/n

[Figure: eigenvalue estimation error vs. n for SVD, eigen-decomposition, and the rescaled SVD error (2.5/√n) · |λsvd − λ⋆|]

  • empirically, |λeigs − λ⋆| ≈ (2.5/√n) · |λsvd − λ⋆|
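A sketch of the completion variant under the same assumptions (the sampling rate and 1/p rescaling follow the slide; the rest is our choice). Note the sampling pattern itself is asymmetric, since entry (i, j) is kept independently of (j, i):

```python
# Sketch: observe each entry of M* = u* u*^T independently with prob.
# p = 3 log(n)/n, rescale by 1/p so that E[M] = M*, then compare estimators.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
p = 3 * np.log(n) / n
mask = rng.random((n, n)) < p                   # keep entry (i, j) with prob. p
M = np.where(mask, np.outer(u, u) / p, 0.0)     # unbiased estimate of M*
lam_svd = np.linalg.svd(M, compute_uv=False)[0]
eigs = np.linalg.eigvals(M)
lam_eig = eigs[np.argmax(eigs.real)].real
print(abs(lam_svd - 1.0), abs(lam_eig - 1.0))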
SLIDE 20

Why does eigen-decomposition work so much better than SVD?

SLIDE 21

Problem setup

M = u⋆u⋆⊤ + H ∈ ℝ^{n×n},  with M⋆ = u⋆u⋆⊤

  • H: noise matrix
  • independent entries: {Hi,j} are independent
  • zero mean: E[Hi,j] = 0
  • variance: Var(Hi,j) ≤ σ²
  • magnitudes: P{|Hi,j| ≥ B} ≤ n^{−12}

SLIDE 22

Problem setup

M = u⋆u⋆⊤ + H ∈ ℝ^{n×n},  with M⋆ = u⋆u⋆⊤

  • H: noise matrix
  • independent entries: {Hi,j} are independent
  • zero mean: E[Hi,j] = 0
  • variance: Var(Hi,j) ≤ σ²
  • magnitudes: P{|Hi,j| ≥ B} ≤ n^{−12}
  • M⋆ obeys the incoherence condition: max_{1≤i≤n} |e_i⊤ u⋆| ≤ √(µ/n)

SLIDE 23

Classical linear algebra results

  • |λsvd − λ⋆| ≤ ‖H‖   (Weyl)
  • |λeigs − λ⋆| ≤ ‖H‖   (Bauer–Fike)

SLIDE 24

Classical linear algebra results

  • |λsvd − λ⋆| ≤ ‖H‖   (Weyl)
  • |λeigs − λ⋆| ≤ ‖H‖   (Bauer–Fike)

With the matrix Bernstein inequality:

  • |λsvd − λ⋆| ≲ σ√(n log n) + B log n
  • |λeigs − λ⋆| ≲ σ√(n log n) + B log n

SLIDE 25

Classical linear algebra results

  • |λsvd − λ⋆| ≤ ‖H‖   (Weyl)
  • |λeigs − λ⋆| ≤ ‖H‖   (Bauer–Fike)

With the matrix Bernstein inequality:

  • |λsvd − λ⋆| ≲ σ√(n log n) + B log n   (reasonably tight if ‖H‖ is large)
  • |λeigs − λ⋆| ≲ σ√(n log n) + B log n

SLIDE 26

Classical linear algebra results

  • |λsvd − λ⋆| ≤ ‖H‖   (Weyl)
  • |λeigs − λ⋆| ≤ ‖H‖   (Bauer–Fike)

With the matrix Bernstein inequality:

  • |λsvd − λ⋆| ≲ σ√(n log n) + B log n   (reasonably tight if ‖H‖ is large)
  • |λeigs − λ⋆| ≲ σ√(n log n) + B log n   (can be significantly improved)

SLIDE 27

Main results: eigenvalue perturbation

Theorem 1 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvalue λeigs of M obeys

    |λeigs − λ⋆| ≲ √(µ/n) · (σ√(n log n) + B log n)

SLIDE 28

Main results: eigenvalue perturbation

Theorem 1 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvalue λeigs of M obeys

    |λeigs − λ⋆| ≲ √(µ/n) · (σ√(n log n) + B log n)

[Figure: eigenvalue estimation error vs. n for SVD, eigen-decomposition, and the rescaled SVD error (2.5/√n) · |λsvd − λ⋆|]

Eigen-decomposition is √(n/µ) times better than SVD!  (recall: |λsvd − λ⋆| ≲ σ√(n log n) + B log n)

SLIDE 29

Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvector u of M obeys

    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≲ √(µ/n) · (σ√(n log n) + B log n)

SLIDE 30

Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvector u of M obeys

    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≲ √(µ/n) · (σ√(n log n) + B log n)

  • if ‖H‖ ≪ |λ⋆|, then

    min{‖u − u⋆‖₂, ‖u + u⋆‖₂} ≪ ‖u⋆‖₂   (classical bound)

SLIDE 31

Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvector u of M obeys

    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≲ √(µ/n) · (σ√(n log n) + B log n)

  • if ‖H‖ ≪ |λ⋆|, then

    min{‖u − u⋆‖₂, ‖u + u⋆‖₂} ≪ ‖u⋆‖₂   (classical bound)
    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≪ ‖u⋆‖∞   (our bound)

SLIDE 32

Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvector u of M obeys

    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≲ √(µ/n) · (σ√(n log n) + B log n)

  • if ‖H‖ ≪ |λ⋆|, then

    min{‖u − u⋆‖₂, ‖u + u⋆‖₂} ≪ ‖u⋆‖₂   (classical bound)
    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≪ ‖u⋆‖∞   (our bound)

  • entrywise eigenvector perturbation is well-controlled

SLIDE 33

Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvector u of M obeys

    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≲ √(µ/n) · (σ√(n log n) + B log n)

[Figure: min{‖u − u⋆‖∞, ‖u + u⋆‖∞} vs. n for eigen-decomposition and SVD; {Hi,j} i.i.d. N(0, σ²), σ² = 1/(n log n)]
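The same simulation extends to eigenvectors. A companion sketch (assumed setup, same Gaussian model) measuring the entrywise errors shown in the figure:

```python
# Sketch: compare min{||u - u*||_inf, ||u + u*||_inf} for the leading
# eigenvector vs. the leading left singular vector of M = u* u*^T + H.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
sigma = 1.0 / np.sqrt(n * np.log(n))
M = np.outer(u_star, u_star) + sigma * rng.standard_normal((n, n))

u_svd = np.linalg.svd(M)[0][:, 0]                # leading left singular vector
vals, vecs = np.linalg.eig(M)
u_eig = vecs[:, np.argmax(vals.real)].real
u_eig /= np.linalg.norm(u_eig)

linf = lambda x: np.abs(x).max()
err = lambda u: min(linf(u - u_star), linf(u + u_star))
print(err(u_svd), err(u_eig))                    # sign ambiguity handled by min
```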

SLIDE 34

Main results: perturbation of linear forms of eigenvectors

Theorem 3 (Chen, Cheng, Fan ’18). Fix any unit vector a. With high prob., the leading eigenvector u of M obeys

    min |a⊤(u ± u⋆)| ≲ max{|a⊤u⋆|, √(µ/n)} · (σ√(n log n) + B log n)
SLIDE 35

Main results: perturbation of linear forms of eigenvectors

Theorem 3 (Chen, Cheng, Fan ’18). Fix any unit vector a. With high prob., the leading eigenvector u of M obeys

    min |a⊤(u ± u⋆)| ≲ max{|a⊤u⋆|, √(µ/n)} · (σ√(n log n) + B log n)

  • if ‖H‖ ≪ |λ⋆|, then min |a⊤(u ± u⋆)| ≪ max{|a⊤u⋆|, ‖u⋆‖∞}
  • perturbation of an arbitrary linear form of the leading eigenvector is well-controlled

SLIDE 36

Intuition: asymmetry reduces bias

From the Neumann series (some sort of Taylor expansion), one can verify

    |λ − λ⋆| ≍ |u⋆⊤Hu⋆/λ + u⋆⊤H²u⋆/λ² + u⋆⊤H³u⋆/λ³ + ···|
SLIDE 37

Intuition: asymmetry reduces bias

From the Neumann series (some sort of Taylor expansion), one can verify

    |λ − λ⋆| ≍ |u⋆⊤Hu⋆/λ + u⋆⊤H²u⋆/λ² + u⋆⊤H³u⋆/λ³ + ···|

  • To develop some intuition, let’s look at the 2nd-order term

SLIDE 38

Intuition: asymmetry reduces bias

From the Neumann series (some sort of Taylor expansion), one can verify

    |λ − λ⋆| ≍ |u⋆⊤Hu⋆/λ + u⋆⊤H²u⋆/λ² + u⋆⊤H³u⋆/λ³ + ···|

  • To develop some intuition, let’s look at the 2nd-order term
  • if H is symmetric,  E[u⋆⊤H²u⋆] = E[‖Hu⋆‖₂²] = nσ²

SLIDE 39

Intuition: asymmetry reduces bias

From the Neumann series (some sort of Taylor expansion), one can verify

    |λ − λ⋆| ≍ |u⋆⊤Hu⋆/λ + u⋆⊤H²u⋆/λ² + u⋆⊤H³u⋆/λ³ + ···|

  • To develop some intuition, let’s look at the 2nd-order term
  • if H is symmetric,  E[u⋆⊤H²u⋆] = E[‖Hu⋆‖₂²] = nσ²
  • if H is asymmetric,  E[u⋆⊤H²u⋆] = E[⟨H⊤u⋆, Hu⋆⟩] = σ²,  much smaller than in the symmetric case
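This second-order calculation is easy to sanity-check numerically. A quick Monte Carlo sketch (our own; n, σ, and the trial count are assumptions):

```python
# Sketch: E[u*^T H^2 u*] is about n*sigma^2 when H is symmetric, but only
# about sigma^2 when all entries of H are independent (asymmetric case).
import numpy as np

rng = np.random.default_rng(3)
n, sigma, trials = 400, 0.1, 2000
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

sym, asym = [], []
for _ in range(trials):
    G = sigma * rng.standard_normal((n, n))      # fully independent entries
    Hs = np.triu(G) + np.triu(G, 1).T            # symmetrized version of G
    asym.append(u @ (G @ (G @ u)))               # u*^T H^2 u*, asymmetric H
    sym.append(u @ (Hs @ (Hs @ u)))              # u*^T H^2 u*, symmetric H
print(np.mean(sym), n * sigma**2)                # ~ n sigma^2 = 4.0
print(np.mean(asym), sigma**2)                   # ~ sigma^2  = 0.01
```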

SLIDE 40

What happens if M⋆ is also not symmetric?

  • A rank-1 matrix: M⋆ = λ⋆ u⋆v⋆⊤ ∈ ℝ^{n1×n2}
  • Suppose we observe 2 independent noisy copies:

    M1 = M⋆ + H1,   M2 = M⋆ + H2

  • Goal: estimate λ⋆, u⋆ and v⋆

SLIDE 41

Asymmetrization + dilation

Compute the leading eigenvalue / eigenvector of the dilation matrix

    [  0     M1 ]   [  0            M⋆ + H1 ]
    [  M2⊤   0  ] = [  M⋆⊤ + H2⊤    0       ]

  • Our findings (eigenvalue / eigenvector perturbation) continue to hold for this case!
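A small numerical sketch of this construction (the block dilation below is our reading of the slide; the sizes and noise level are assumed). The nonzero eigenvalues of the dilation come in ± pairs whose squares are the eigenvalues of M1 M2⊤ ≈ λ⋆² u⋆u⋆⊤, so the leading eigenvalue recovers λ⋆:

```python
# Sketch: asymmetrization + dilation for a non-symmetric rank-1 M* = u* v*^T.
import numpy as np

rng = np.random.default_rng(4)
n1, n2, sigma = 300, 400, 0.01
u = rng.standard_normal(n1); u /= np.linalg.norm(u)
v = rng.standard_normal(n2); v /= np.linalg.norm(v)
M_star = np.outer(u, v)                          # lambda* = 1
M1 = M_star + sigma * rng.standard_normal((n1, n2))
M2 = M_star + sigma * rng.standard_normal((n1, n2))

D = np.block([[np.zeros((n1, n1)), M1],
              [M2.T, np.zeros((n2, n2))]])       # dilation, still asymmetric
eigs = np.linalg.eigvals(D)
lam = eigs[np.argmax(eigs.real)].real            # leading eigenvalue of D
print(abs(lam - 1.0))                            # small: recovers lambda*
```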

SLIDE 42

Rank-r case

M⋆: truth

  • A rank-r and well-conditioned matrix: M⋆ = Σ_{i=1}^r λ⋆_i u⋆_i u⋆_i⊤
  • Observed noisy data: M = M⋆ + H, where {Hi,j} are independent
  • Goal: estimate the eigenvalues {λ⋆_i}

SLIDE 43

Rank-r case

M⋆: truth  +  H: noise

  • A rank-r and well-conditioned matrix: M⋆ = Σ_{i=1}^r λ⋆_i u⋆_i u⋆_i⊤
  • Observed noisy data: M = M⋆ + H, where {Hi,j} are independent
  • Goal: estimate the eigenvalues {λ⋆_i}

SLIDE 44

Eigenvalue perturbation: rank-r case

Theorem 4 (Chen, Cheng, Fan ’18). With high prob., the ith largest eigenvalue λi (1 ≤ i ≤ r) of M obeys

    |λi − λ⋆_j| ≲ √(µr²/n) · (σ√(n log n) + B log n)   for some 1 ≤ j ≤ r

SLIDE 45

Eigenvalue perturbation: rank-r case

Theorem 4 (Chen, Cheng, Fan ’18). With high prob., the ith largest eigenvalue λi (1 ≤ i ≤ r) of M obeys

    |λi − λ⋆_j| ≲ √(µr²/n) · (σ√(n log n) + B log n)   for some 1 ≤ j ≤ r

Eigen-decomposition is √(n/(µr²)) times better than SVD!

SLIDE 46

Eigenvalue perturbation: rank-r case

Theorem 4 (Chen, Cheng, Fan ’18). With high prob., the ith largest eigenvalue λi (1 ≤ i ≤ r) of M obeys

    |λi − λ⋆_j| ≲ √(µr²/n) · (σ√(n log n) + B log n)   for some 1 ≤ j ≤ r

Eigen-decomposition is √(n/(µr²)) times better than SVD!

  • Might be improvable to √(µr/n) · (σ√(n log n) + B log n)?

SLIDE 47

Summary for this part

Eigen-decomposition could be much more powerful than SVD when dealing with non-symmetric data matrices

SLIDE 48

Summary for this part

Eigen-decomposition could be much more powerful than SVD when dealing with non-symmetric data matrices

Future directions:

  • Eigenvector perturbation for the rank-r case
  • Beyond i.i.d. noise

Y. Chen, C. Cheng, J. Fan, “Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices”, arXiv:1811.12804, 2018

SLIDE 49

Spectral Methods are Optimal for Top-K Ranking

Cong Ma (Princeton ORFE), Kaizheng Wang (Princeton ORFE), Jianqing Fan (Princeton ORFE)

SLIDE 50

Ranking

A fundamental problem in a wide range of contexts:

  • web search, recommendation systems, admissions, sports competitions, voting, ...

[Figure: PageRank illustration; figure credit: Dzenan Hamzic]

SLIDE 51

Rank aggregation from pairwise comparisons

pairwise comparisons for ranking top tennis players

figure credit: Bozóki, Csató, Temesi

SLIDE 52

Parametric models

Assign a latent score to each of n items: w∗ = [w∗_1, · · · , w∗_n]

  • w∗_i : preference score of item i

SLIDE 53

Parametric models

Assign a latent score to each of n items: w∗ = [w∗_1, · · · , w∗_n]

  • w∗_i : preference score of item i
  • This work: Bradley-Terry-Luce (logistic) model

    P{item j beats item i} = w∗_j / (w∗_i + w∗_j)

  • Other models: Thurstone model, low-rank model, ...

SLIDE 54

Typical ranking procedures

Estimate latent scores → rank items based on score estimates

SLIDE 55

Top-K ranking

Estimate latent scores → rank items based on score estimates

Goal: identify the set of top-K items under minimal sample size

SLIDE 56

Model: random sampling

  • Comparison graph: Erdős–Rényi graph G ∼ G(n, p)
  • For each (i, j) ∈ G, obtain L paired comparisons:

    y(l)_{i,j} = 1 with prob. w∗_j / (w∗_i + w∗_j), and 0 otherwise (independently),   1 ≤ l ≤ L

SLIDE 57

Model: random sampling

  • Comparison graph: Erdős–Rényi graph G ∼ G(n, p)
  • For each (i, j) ∈ G, obtain L paired comparisons, summarized by

    y_{i,j} = (1/L) Σ_{l=1}^L y(l)_{i,j}   (sufficient statistic)
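A minimal sketch of this sampling model (the constants n, p, L match the deck's later experiment; the score distribution and seed are our assumptions):

```python
# Sketch: Erdos-Renyi comparison graph G(n, p); for each edge (i, j), draw L
# Bernoulli comparisons won by j with prob. w*_j/(w*_i + w*_j) and average them.
import numpy as np

rng = np.random.default_rng(5)
n, p, L = 200, 0.25, 20
w = rng.uniform(0.5, 1.0, size=n)                # latent scores w*
edges = np.triu(rng.random((n, n)) < p, 1)       # comparison graph G(n, p)
y = np.zeros((n, n))                             # y[i, j]: fraction of wins by j over i
for i, j in zip(*np.nonzero(edges)):
    y[i, j] = rng.binomial(L, w[j] / (w[i] + w[j])) / L
    y[j, i] = 1.0 - y[i, j]                      # sufficient statistics
print(int(edges.sum()), "comparison pairs")
```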

SLIDE 58

Prior art

                                          Spectral method   MLE   Spectral MLE
mean square error for estimating scores         ✔            ✔         ✔
top-K ranking accuracy                                                 ✔

(Spectral method: Negahban et al. ’12; MLE: Negahban et al. ’12, Hajek et al. ’14; Spectral MLE: Chen & Suh ’15)

SLIDE 59

Prior art

                                          Spectral method   MLE   Spectral MLE
mean square error for estimating scores         ✔            ✔         ✔
top-K ranking accuracy                                                 ✔

  • mean square error for estimating scores: a “meta metric”

(Spectral method: Negahban et al. ’12; MLE: Negahban et al. ’12, Hajek et al. ’14; Spectral MLE: Chen & Suh ’15)

SLIDE 60

Small ℓ2 loss ≠ high ranking accuracy

SLIDE 61

Small ℓ2 loss ≠ high ranking accuracy

SLIDE 62

Small ℓ2 loss ≠ high ranking accuracy

These two estimates have the same ℓ2 loss, but output different rankings

SLIDE 63

Small ℓ2 loss ≠ high ranking accuracy

These two estimates have the same ℓ2 loss, but output different rankings

Need to control the entrywise error!

SLIDE 64

Optimality?

Is the spectral method alone optimal for top-K ranking?

SLIDE 65

Optimality?

Is the spectral method alone optimal for top-K ranking?

Partial answer (Jang et al. ’16): the spectral method works if the comparison graph is sufficiently dense

SLIDE 66

Optimality?

Is the spectral method alone optimal for top-K ranking?

Partial answer (Jang et al. ’16): the spectral method works if the comparison graph is sufficiently dense

This work: an affirmative answer over the entire regime (incl. sparse graphs)

SLIDE 67

Spectral method (Rank Centrality)

Negahban, Oh, Shah ’12

  • Construct a (highly asymmetric) probability transition matrix P, whose off-diagonal entries obey

    P_{i,j} ∝ y_{i,j} if (i, j) ∈ G,  and 0 if (i, j) ∉ G

  • Return the score estimate as the leading left eigenvector of P
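A sketch of this construction on synthetic data (the 1/d_max normalization and the diagonal fill below are assumptions of this sketch; any normalization keeping P row-stochastic plays the same role). It repeats the sampling step so it runs standalone:

```python
# Rank Centrality sketch: build the empirical transition matrix P from the
# pairwise win fractions, then read off its leading left eigenvector.
import numpy as np

rng = np.random.default_rng(5)
n, p, L = 200, 0.25, 20
w = rng.uniform(0.5, 1.0, size=n)
edges = np.triu(rng.random((n, n)) < p, 1)
y = np.zeros((n, n))
for i, j in zip(*np.nonzero(edges)):
    y[i, j] = rng.binomial(L, w[j] / (w[i] + w[j])) / L
    y[j, i] = 1.0 - y[i, j]

adj = edges | edges.T
d_max = adj.sum(axis=1).max()
P = np.where(adj, y, 0.0) / d_max                # off-diagonal: P_ij proportional to y_ij
np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # make P row-stochastic
vals, vecs = np.linalg.eig(P.T)                  # left eigenvectors of P
pi = np.abs(vecs[:, np.argmax(vals.real)].real)  # leading left eigenvector
print(np.corrcoef(pi, w)[0, 1])                  # close to 1: pi tracks w*
```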

SLIDE 68

Rationale behind spectral method

In the large-sample limit, P → P∗, whose off-diagonal entries obey

    P∗_{i,j} ∝ w∗_j / (w∗_i + w∗_j) if (i, j) ∈ G,  and 0 if (i, j) ∉ G

  • P∗ is reversible (check detailed balance), and its stationary distribution is

    π∗ ∝ [w∗_1, w∗_2, . . . , w∗_n]   (the true scores)
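The detailed-balance claim takes a few lines to verify numerically (a sketch on a complete comparison graph; the 1/n normalization is an assumption). The point is that π∗_i P∗_{i,j} ∝ w∗_i w∗_j / (w∗_i + w∗_j) is symmetric in i and j:

```python
# Sketch: check that pi* ∝ w* is stationary for the ideal transition matrix P*.
import numpy as np

rng = np.random.default_rng(6)
n = 50                                           # complete comparison graph here
w = rng.uniform(0.5, 1.0, size=n)
P = (w[None, :] / (w[:, None] + w[None, :])) / n # P*_ij ∝ w*_j / (w*_i + w*_j)
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # row-stochastic
pi = w / w.sum()                                 # candidate stationary distribution
print(np.abs(pi @ P - pi).max())                 # ~ machine precision: stationary
```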

SLIDE 69

Main result

comparison graph G(n, p);  sample size ≍ p n² L

Theorem 5 (Chen, Fan, Ma, Wang ’17). When p ≳ (log n)/n, spectral methods achieve optimal sample complexity for top-K ranking!

SLIDE 70

Main result

[Figure: sample size vs. score separation ∆K, showing the infeasible regime and the regime achievable by both methods]

  • ∆K := (w∗_(K) − w∗_(K+1)) / ‖w∗‖∞ : score separation

SLIDE 71

Comparison with Jang et al. ’16

Jang et al. ’16: the spectral method controls the entrywise error if p ≳ ((log n)/n)^{1/4}   (relatively dense)

SLIDE 72

Comparison with Jang et al. ’16

Jang et al. ’16: the spectral method controls the entrywise error if p ≳ ((log n)/n)^{1/4}   (relatively dense)

[Figure: sample size vs. score separation ∆K, comparing our work (optimal sample size over the whole regime p ≳ (log n)/n) with the denser regime p ≳ ((log n)/n)^{1/4} covered by Jang et al. ’16]

SLIDE 73

Empirical top-K ranking accuracy

[Figure: empirical top-K ranking accuracy vs. score separation ∆K for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20]

SLIDE 74

Optimal control of entrywise error

[Figure: true scores w∗_1 ≥ · · · ≥ w∗_(K) ≥ w∗_(K+1) ≥ · · · and their estimates; each estimate deviates from its true score by less than ∆K/2]

Theorem 6. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆K². Then with high prob., the estimates w returned by both methods obey (up to global scaling)

    ‖w − w∗‖∞ / ‖w∗‖∞ < ∆K/2
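This entrywise bound is exactly what top-K ranking needs: if every estimate is within (∆K/2) · ‖w∗‖∞ of its true score, no item outside the top K can overtake one inside. A toy check of that implication (all constants assumed):

```python
# Sketch: any perturbation with ||w - w*||_inf < (Delta_K/2)||w*||_inf
# preserves the identity of the top-K set.
import numpy as np

rng = np.random.default_rng(7)
n, K = 100, 10
w = np.sort(rng.uniform(0.5, 1.0, n))[::-1]      # true scores, descending
delta = (w[K - 1] - w[K]) / w.max()              # score separation Delta_K
bound = 0.99 * delta * w.max() / 2               # strictly below Delta_K/2
est = w + rng.uniform(-1, 1, n) * bound          # perturbed score estimates
top_true = set(range(K))                         # items 0..K-1: true top-K
top_est = set(np.argsort(est)[::-1][:K])
print(top_true == top_est)                       # always True under this bound
```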

SLIDE 75

Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce leave-one-out estimate w(m)

[Figure: full data y = [y_{i,j}]_{1≤i,j≤n} ⇒ leave-one-out data for the mth item, yielding w(m)]

SLIDE 76

Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce leave-one-out estimate w(m)

  • exploit statistical independence
  • leave-one-out stability

SLIDE 77

Exploit statistical independence

[Figure: full data y = [y_{i,j}]_{1≤i,j≤n} ⇒ leave-one-out data for the mth item]

leave-one-out estimate w(m)  ⫫  all data related to the mth item

SLIDE 78

Leave-one-out stability

leave-one-out estimate w(m) ≈ true estimate w

SLIDE 79

Leave-one-out stability

leave-one-out estimate w(m) ≈ true estimate w

  • Spectral method: eigenvector perturbation bound of the form

    ‖π − π∗‖_{π∗} ≲ ‖π∗(P − P∗)‖_{π∗} / spectral-gap

  • a new Davis–Kahan bound for probability transition matrices (asymmetric!)

SLIDE 80

A small sample of related works

  • Parametric models
  • Ford ’57
  • Hunter ’04
  • Negahban, Oh, Shah ’12
  • Rajkumar, Agarwal ’14
  • Hajek, Oh, Xu ’14
  • Chen, Suh ’15
  • Rajkumar, Agarwal ’16
  • Jang, Kim, Suh, Oh ’16
  • Suh, Tan, Zhao ’17
  • Non-parametric models
  • Shah, Wainwright ’15
  • Shah, Balakrishnan, Guntuboyina, Wainwright ’16
  • Chen, Gopi, Mao, Schneider ’17
  • Leave-one-out analysis
  • El Karoui, Bean, Bickel, Lim, Yu ’13
  • Zhong, Boumal ’17
  • Abbe, Fan, Wang, Zhong ’17
  • Ma, Wang, Chi, Chen ’17
  • Chen, Chi, Fan, Ma ’18
  • Chen, Chi, Fan, Ma, Yan ’19
  • Chen, Fan, Ma, Yan ’19

SLIDE 81

Summary for this part

                                          Spectral method   Regularized MLE
Optimal sample complexity                        ✔                 ✔
Linear-time computational complexity             ✔                 ✔

Novel entrywise perturbation analysis for spectral method and convex optimization

Paper: “Spectral method and regularized MLE are both optimal for top-K ranking”, Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019