

  1. Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking. Yuxin Chen, Electrical Engineering, Princeton University. Joint work with Jianqing Fan, Cong Ma, and Kaizheng Wang.

  2. Ranking: a fundamental problem in a wide range of contexts • web search, recommendation systems, admissions, sports competitions, voting, ... (PageRank figure credit: Dzenan Hamzic)

  3. Rank aggregation from pairwise comparisons. Figure: pairwise comparisons for ranking top tennis players (figure credit: Bozóki, Csató, Temesi)

  4. Parametric models. Assign a latent score to each of n items: w* = [w*_1, ..., w*_n], where w*_i is the preference score of item i.

  5. Parametric models. Assign a latent score to each of n items: w* = [w*_1, ..., w*_n], where w*_i is the preference score of item i. • This work: Bradley-Terry-Luce (logistic) model, P{item j beats item i} = w*_j / (w*_i + w*_j) • Other models: Thurstone model, low-rank model, ...
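As a quick illustration of the BTL model above, here is a minimal simulation sketch in Python. The score vector and sample count are illustrative values, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent scores for n = 5 items (illustrative only).
w_star = np.array([3.0, 2.0, 1.5, 1.0, 0.5])

def btl_compare(i, j, w=w_star, rng=rng):
    """Return 1 if item j beats item i, which under the BTL model
    happens with probability w[j] / (w[i] + w[j])."""
    return int(rng.random() < w[j] / (w[i] + w[j]))

# Item 0 (score 3.0) beats item 4 (score 0.5) w.p. 3.0 / 3.5 ~ 0.857.
freq = np.mean([1 - btl_compare(0, 4) for _ in range(20_000)])
print(round(freq, 2))  # close to 0.86
```

Note that only the ratio of scores matters: scaling w* by a constant leaves all comparison probabilities unchanged, which is why estimates are only defined up to a global scaling.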

  6. Typical ranking procedures: estimate latent scores → rank items based on score estimates

  7. Top-K ranking: estimate latent scores → rank items based on score estimates. Goal: identify the set of top-K items under minimal sample size.

  8. Model: random sampling. • Comparison graph: Erdős–Rényi graph G ~ G(n, p) • For each (i, j) ∈ G, obtain L paired comparisons: y^(l)_{i,j} = 1 with probability w*_j / (w*_i + w*_j), and 0 otherwise, independently for 1 ≤ l ≤ L.

  9. Model: random sampling. • Comparison graph: Erdős–Rényi graph G ~ G(n, p) • For each (i, j) ∈ G, obtain L paired comparisons and form the sufficient statistic y_{i,j} = (1/L) Σ_{l=1}^{L} y^(l)_{i,j}.
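The sampling model above can be sketched directly. The sizes n, p, L and the score vector `w_star` below are illustrative, small-scale choices, not the talk's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, L = 8, 0.5, 20                    # illustrative, small-scale values
w_star = rng.uniform(0.5, 1.5, size=n)  # hypothetical latent scores

# Erdős–Rényi comparison graph: each pair (i, j), i < j, observed w.p. p.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if rng.random() < p]

# For each observed pair, average L Bernoulli comparisons to obtain
# the sufficient statistic y_{i,j} = (1/L) * sum_l y^(l)_{i,j}.
y = {}
for i, j in edges:
    p_ij = w_star[j] / (w_star[i] + w_star[j])  # P{ j beats i }
    y[(i, j)] = rng.binomial(L, p_ij) / L

print(all(0.0 <= v <= 1.0 for v in y.values()))  # True
```

The total sample size is (number of edges) × L ≈ p n² L / 2, matching the sample-size scaling pn²L quoted later in the talk up to a constant.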

  10. Prior art (✔ = established guarantee): • Spectral method (Negahban et al. '12): ✔ mean square error for estimating scores • MLE (Hajek et al. '14): ✔ mean square error for estimating scores • Spectral MLE (Chen & Suh '15): ✔ mean square error, ✔ top-K ranking accuracy

  11. Prior art, where mean square error serves as a "meta metric": • Spectral method (Negahban et al. '12): ✔ mean square error for estimating scores • MLE (Hajek et al. '14): ✔ mean square error for estimating scores • Spectral MLE (Chen & Suh '15): ✔ mean square error, ✔ top-K ranking accuracy

  12. Small ℓ2 loss ≠ high ranking accuracy

  13. Small ℓ2 loss ≠ high ranking accuracy

  14. Small ℓ2 loss ≠ high ranking accuracy. These two estimates have the same ℓ2 loss, but output different rankings.

  15. Small ℓ2 loss ≠ high ranking accuracy. These two estimates have the same ℓ2 loss, but output different rankings. Need to control entrywise error!

  16. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking?

  17. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking? Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.

  18. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking? Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense. This work: an affirmative answer for both methods, over the entire regime (including sparse graphs).

  19. Spectral method (Rank Centrality), Negahban, Oh, Shah '12. • Construct a probability transition matrix P whose off-diagonal entries obey P_{i,j} ∝ y_{i,j} if (i, j) ∈ G, and P_{i,j} = 0 if (i, j) ∉ G • Return the score estimate as the leading left eigenvector of P
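A minimal sketch of Rank Centrality, assuming a dictionary `y` of empirical win frequencies as in the sampling model. The normalization `d` and the power-iteration solver are implementation choices here, not prescribed by the slide:

```python
import numpy as np

def rank_centrality(y, n, d):
    """Spectral (Rank Centrality) score estimate.

    y : dict mapping observed pairs (i, j), i != j, to the empirical
        frequency with which j beats i.
    d : normalization, at least the maximum degree, so that each row of
        the transition matrix sums to at most one.
    Returns the stationary distribution of the induced Markov chain
    (assumes the comparison graph is connected).
    """
    P = np.zeros((n, n))
    for (i, j), yij in y.items():
        P[i, j] = yij / d          # move i -> j  prop. to j's wins over i
        P[j, i] = (1.0 - yij) / d  # move j -> i  prop. to i's wins over j
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # make rows sum to one

    # Leading left eigenvector via power iteration.
    pi = np.full(n, 1.0 / n)
    for _ in range(10_000):
        pi = pi @ P
    return pi / pi.sum()
```

As a sanity check, on a complete graph with exact frequencies y_{i,j} = w*_j / (w*_i + w*_j), the returned stationary distribution is proportional to w*, in line with the rationale on the next slide.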

  20. Rationale behind the spectral method. In the large-sample limit, P → P*, whose off-diagonal entries obey P*_{i,j} ∝ w*_j / (w*_i + w*_j) if (i, j) ∈ G, and 0 otherwise. • P* is reversible (check detailed balance), with stationary distribution π* ∝ [w*_1, w*_2, ..., w*_n], the true scores.
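The detailed-balance claim can be verified numerically on a toy complete-graph instance. The scores below are illustrative, and `d` is just a normalization keeping rows substochastic:

```python
import numpy as np

# With P*_{ij} proportional to w*_j / (w*_i + w*_j), the chain is
# reversible with stationary distribution pi* proportional to w*:
# pi_i P_ij = (w_i / W) * w_j / (w_i + w_j) / d  is symmetric in (i, j).
w = np.array([3.0, 2.0, 1.5, 1.0, 0.5])  # illustrative scores
n, d = len(w), 5.0

P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = w[j] / (w[i] + w[j]) / d
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

pi = w / w.sum()
# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j.
print(np.allclose(pi[:, None] * P, (pi[:, None] * P).T))  # True
# Detailed balance implies stationarity: pi P == pi.
print(np.allclose(pi @ P, pi))  # True
```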

  21. Regularized MLE. Negative log-likelihood: L(w) := − Σ_{(i,j)∈G} [ y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) ] • L(w) becomes convex after the reparametrization θ = [θ_1, ..., θ_n], θ_i = log w_i.

  22. Regularized MLE. Negative log-likelihood: L(w) := − Σ_{(i,j)∈G} [ y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) ] • L(w) becomes convex after the reparametrization θ = [θ_1, ..., θ_n], θ_i = log w_i. (Regularized MLE) minimize_θ L_λ(θ) := L(θ) + (λ/2) ‖θ‖_2^2, with λ ≍ √( np log n / L ).
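A plain gradient-descent sketch of the regularized MLE in the convex parametrization θ_i = log w_i, under which w_j / (w_i + w_j) = sigmoid(θ_j − θ_i). The solver, step size, and iteration count are illustrative choices; the talk only specifies the objective and λ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regularized_mle(y, n, lam, step=0.1, iters=5000):
    """Minimize  sum_{(i,j) in G} [ -y_ij log s_ij - (1 - y_ij) log(1 - s_ij) ]
                 + (lam / 2) * ||theta||_2^2,
    where s_ij = sigmoid(theta_j - theta_i) models P{ j beats i } and
    y maps edges (i, j) to the fraction of comparisons j won.
    Plain gradient descent, a simple stand-in for any convex solver.
    """
    theta = np.zeros(n)
    for _ in range(iters):
        grad = lam * theta
        for (i, j), yij in y.items():
            s = sigmoid(theta[j] - theta[i])
            grad[j] += s - yij   # d/d theta_j of the edge loss
            grad[i] -= s - yij   # d/d theta_i of the edge loss
        theta -= step * grad
    return theta                 # score estimate: w = exp(theta)
```

On exact frequencies the minimizer is close to the centered log-scores (the small λ shrinks θ slightly toward zero, which preserves the ordering), so ranking by θ recovers the ranking by w*.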

  23. Main result. Comparison graph G(n, p); sample size ≍ p n² L. Theorem 1 (Chen, Fan, Ma, Wang '17): when p ≳ (log n)/n, both the spectral method and the regularized MLE achieve the optimal sample complexity for top-K ranking!

  24. Main result. Define the score separation ∆_K := ( w*_(K) − w*_(K+1) ) / ‖w*‖_∞. Figure: sample size versus ∆_K, showing the regime achievable by both methods and the infeasible regime.

  25. Comparison with Jang et al. '16. Jang et al. '16: the spectral method controls the entrywise error if p ≳ √( (log n) / n ) (relatively dense regime).

  26. Comparison with Jang et al. '16. Jang et al. '16: the spectral method controls the entrywise error if p ≳ √( (log n) / n ) (relatively dense regime). Figure: sample size versus score separation ∆_K, comparing our work (optimal sample size) with Jang et al. '16 (annotation: ( (log n) / n )^{1/4}).

  27. Empirical top-K ranking accuracy. Figure: top-K ranking accuracy (0 to 1) versus score separation ∆_K (0.1 to 0.5) for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20.

  28. Optimal control of entrywise error. Figure: true scores w*_1, w*_2, w*_3, ..., w*_K, w*_{K+1}, ... with gap ∆_K between w*_K and w*_{K+1}; score estimates w_1, ..., w_K, w_{K+1}, ... each within (1/2)∆_K of the truth. Theorem 2: suppose p ≳ (log n)/n and the sample size is ≳ (n log n) / ∆_K². Then, with high probability, the estimates w returned by both methods obey (up to global scaling) ‖w − w*‖_∞ < (1/2) ∆_K ‖w*‖_∞.
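To see why the entrywise bound in Theorem 2 suffices for exact top-K recovery, here is a quick numeric check (score vector and noise are illustrative): any estimate within (∆_K / 2) · ‖w*‖_∞ of w* in every entry must rank all top-K items above all others.

```python
import numpy as np

rng = np.random.default_rng(2)

w_star = np.array([1.0, 0.9, 0.8, 0.5, 0.4, 0.3])  # illustrative scores
K = 3
sorted_w = np.sort(w_star)[::-1]
delta_K = (sorted_w[K - 1] - sorted_w[K]) / np.abs(w_star).max()

# Perturb each entry by strictly less than (delta_K / 2) * ||w*||_inf.
noise = rng.uniform(-1, 1, size=w_star.size)
w_hat = w_star + 0.49 * delta_K * np.abs(w_star).max() * noise

top_true = set(np.argsort(-w_star)[:K])
top_hat = set(np.argsort(-w_hat)[:K])
print(top_true == top_hat)  # True: every top-K estimate still exceeds
                            # every non-top-K estimate
```

The argument is deterministic: the smallest top-K estimate stays above w*_(K) − ∆_K/2 · ‖w*‖_∞ while the largest non-top-K estimate stays below w*_(K+1) + ∆_K/2 · ‖w*‖_∞, and these two thresholds cannot cross.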

  29. Key ingredient: leave-one-out analysis. For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m). Figure: the data y = [y_{i,j}]_{1≤i,j≤n} with the row and column of the m-th item removed.

  30. Key ingredient: leave-one-out analysis. For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m) • exploit statistical independence • leave-one-out stability

  31. Exploit statistical independence. The leave-one-out estimate w^(m) is independent of all data related to the m-th item. Figure: the data y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column highlighted.

  32. Leave-one-out stability: leave-one-out estimate w^(m) ≈ true estimate w

  33. Leave-one-out stability: leave-one-out estimate w^(m) ≈ true estimate w • Spectral method: eigenvector perturbation bound ‖π − π̃‖_{π*} ≲ ‖π̃(P − P̃)‖_{π*} / spectral-gap ◦ new Davis-Kahan bound for (asymmetric) probability transition matrices

  34. Leave-one-out stability: leave-one-out estimate w^(m) ≈ true estimate w • Spectral method: eigenvector perturbation bound ‖π − π̃‖_{π*} ≲ ‖π̃(P − P̃)‖_{π*} / spectral-gap ◦ new Davis-Kahan bound for (asymmetric) probability transition matrices • MLE: local strong convexity, ‖θ − θ̃‖_2 ≲ ‖∇L_λ(θ̃; y)‖_2 / strong-convexity parameter

  35. A small sample of related works • Parametric models ◦ Ford '57 ◦ Hunter '04 ◦ Negahban, Oh, Shah '12 ◦ Rajkumar, Agarwal '14 ◦ Hajek, Oh, Xu '14 ◦ Chen, Suh '15 ◦ Rajkumar, Agarwal '16 ◦ Jang, Kim, Suh, Oh '16 ◦ Suh, Tan, Zhao '17 • Non-parametric models ◦ Shah, Wainwright '15 ◦ Shah, Balakrishnan, Guntuboyina, Wainwright '16 ◦ Chen, Gopi, Mao, Schneider '17 • Leave-one-out analysis ◦ El Karoui, Bean, Bickel, Lim, Yu '13 ◦ Zhong, Boumal '17 ◦ Abbe, Fan, Wang, Zhong '17 ◦ Ma, Wang, Chi, Chen '17 ◦ Chen, Chi, Fan, Ma '18 ◦ Chen, Chi, Fan, Ma, Yan '19

  36. Summary. Both the spectral method and the regularized MLE achieve ✔ linear-time computational complexity and ✔ optimal sample complexity, via a novel entrywise perturbation analysis for the spectral method and convex optimization. Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019.
