Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking



  1. Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking. Cong Ma, ORFE, Princeton University. Joint work with Yuxin Chen, Jianqing Fan, and Kaizheng Wang.

  2. Ranking. A fundamental problem in a wide range of contexts: web search, recommendation systems, admissions, sports competitions, voting, ... [figure: PageRank; credit: Dzenan Hamzic]

  3. Rank aggregation from pairwise comparisons. [figure: pairwise comparisons for ranking top tennis players; credit: Bozóki, Csató, Temesi]

  4. Parametric models. Assign a latent preference score to each of $n$ items: $w^* = [w_1^*, \cdots, w_n^*]$, where $w_i^*$ is the preference score of item $i$ and $k_i$ is its rank.

  5. Parametric models. Assign a latent preference score to each of $n$ items: $w^* = [w_1^*, \cdots, w_n^*]$, where $w_i^*$ is the preference score of item $i$ and $k_i$ is its rank. This work: the Bradley-Terry-Luce (BTL) model: for $w^* \in \mathbb{R}^n_+$,
$$\mathbb{P}\{\text{item } j \text{ beats item } i\} = \frac{w_j^*}{w_i^* + w_j^*}.$$
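
As a quick illustration (not part of the slides), a single BTL comparison can be simulated in a few lines; the names below are my own for this sketch.

```python
import numpy as np

def btl_win_prob(w_i, w_j):
    """P{item j beats item i} under the BTL model."""
    return w_j / (w_i + w_j)

rng = np.random.default_rng(0)
w_star = np.array([1.0, 2.0])                      # item 1 has twice the score of item 0
outcomes = rng.random(1000) < btl_win_prob(w_star[0], w_star[1])
print(outcomes.mean())                             # roughly 2/3
```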

  6. Other parametric models. Thurstone model: for $w^* \in \mathbb{R}^n$,
$$\mathbb{P}\{\text{item } j \text{ beats item } i\} = \Phi(w_j^* - w_i^*),$$
where $\Phi$ is the Gaussian cdf.

  7. Other parametric models. Thurstone model: for $w^* \in \mathbb{R}^n$,
$$\mathbb{P}\{\text{item } j \text{ beats item } i\} = \Phi(w_j^* - w_i^*),$$
where $\Phi$ is the Gaussian cdf. More generally, parametric models: for any nondecreasing $f: \mathbb{R} \to [0, 1]$ obeying $f(t) = 1 - f(-t)$ for all $t \in \mathbb{R}$, set
$$\mathbb{P}\{\text{item } j \text{ beats item } i\} = f(w_j^* - w_i^*).$$
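
To make the unified view concrete, here is a small sketch (names are illustrative) of the general-$f$ formulation with the Thurstone (probit) link; the BTL model corresponds to the logistic link applied to log-scores $\theta_i = \log w_i$.

```python
import numpy as np
from scipy.stats import norm

def pairwise_prob(w_i, w_j, f):
    """General parametric model: P{item j beats item i} = f(w_j - w_i),
    for a nondecreasing f with f(t) = 1 - f(-t)."""
    return f(w_j - w_i)

logistic = lambda t: 1.0 / (1.0 + np.exp(-t))

print(pairwise_prob(0.0, 0.5, norm.cdf))                   # Thurstone (Gaussian cdf link)
print(pairwise_prob(np.log(1.0), np.log(2.0), logistic))   # BTL on log-scores: equals 2 / (1 + 2)
```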

  8. Typical ranking procedures: estimate latent scores → rank items based on score estimates.

  9. Top-K ranking: estimate latent scores → rank items based on score estimates. Goal: identify the set of top-K items from pairwise comparisons.

  10. Model: random sampling. • Comparison graph: Erdős–Rényi graph $\mathcal{G} \sim \mathcal{G}(n, p)$. [figure: comparison graph on 12 items] • For each $(i, j) \in \mathcal{G}$, obtain $L$ paired comparisons:
$$y_{i,j}^{(l)} \overset{\text{ind.}}{=} \begin{cases} 1, & \text{with prob. } \frac{w_j^*}{w_i^* + w_j^*}, \\ 0, & \text{else,} \end{cases} \qquad 1 \le l \le L.$$

  11. Model: random sampling. • Comparison graph: Erdős–Rényi graph $\mathcal{G} \sim \mathcal{G}(n, p)$. [figure: comparison graph on 12 items] • For each $(i, j) \in \mathcal{G}$, obtain $L$ paired comparisons; sufficient statistic:
$$y_{i,j} = \frac{1}{L} \sum_{l=1}^{L} y_{i,j}^{(l)}.$$
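
A minimal simulation sketch of this sampling model (function and variable names are my own, not from the paper): sample an Erdős–Rényi comparison graph, draw $L$ BTL comparisons per edge, and keep the averaged outcomes $y_{i,j}$.

```python
import numpy as np

def simulate_comparisons(w_star, p, L, seed=0):
    """Erdos-Renyi graph G(n, p) plus L BTL comparisons per edge; returns the
    edge mask and the sufficient statistic y[i, j] = (1/L) sum_l y^(l)_{i,j}."""
    rng = np.random.default_rng(seed)
    n = len(w_star)
    upper = np.triu(rng.random((n, n)) < p, k=1)       # sample each pair once
    edges = upper | upper.T
    # P{item j beats item i} = w*_j / (w*_i + w*_j)
    prob = w_star[None, :] / (w_star[:, None] + w_star[None, :])
    y = np.where(edges, rng.binomial(L, prob) / L, 0.0)
    iu = np.triu_indices(n, k=1)                       # enforce y[j, i] = 1 - y[i, j] on edges
    y[(iu[1], iu[0])] = np.where(edges[iu], 1.0 - y[iu], 0.0)
    return edges, y

edges, y = simulate_comparisons(np.linspace(1.0, 2.0, 20), p=0.5, L=10)
```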

  12. Spectral method (Rank Centrality), Negahban, Oh, Shah '12. • Construct a probability transition matrix $P = [P_{i,j}]_{1 \le i,j \le n}$:
$$P_{i,j} = \begin{cases} \frac{1}{d}\, y_{i,j}, & \text{if } (i,j) \in \mathcal{E}, \\ 1 - \frac{1}{d} \sum_{k:(i,k) \in \mathcal{E}} y_{i,k}, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$$
• Return the score estimate as the leading left eigenvector of $P$.
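
A hedged sketch of this construction (the slide just writes $1/d$; any normalization $d$ at least as large as the maximum degree keeps $P$ stochastic, and the factor of 2 below is one common choice of mine, not necessarily the talk's):

```python
import numpy as np

def rank_centrality(y, edges, d=None, iters=10000, tol=1e-10):
    """Build the transition matrix P from the averaged comparisons y and return
    its stationary distribution (the leading left eigenvector of P), computed
    here by power iteration."""
    n = y.shape[0]
    if d is None:
        d = 2 * int(edges.sum(axis=1).max())           # any d >= max degree works
    P = np.where(edges, y, 0.0) / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))           # P[i, i] = 1 - (1/d) sum_k y[i, k]
    pi = np.ones(n) / n
    for _ in range(iters):                             # power iteration: pi <- pi P
        nxt = pi @ P
        nxt /= nxt.sum()
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return nxt                                         # proportional to the score estimates
```

Sorting the entries of the returned vector and keeping the $K$ largest gives the spectral top-$K$ estimate.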

  13. Rationale behind the spectral method. In the large-sample limit ($L \to \infty$), $P \to P^* = [P^*_{i,j}]_{1 \le i,j \le n}$:
$$P^*_{i,j} = \begin{cases} \frac{1}{d}\, \frac{w_j^*}{w_i^* + w_j^*}, & \text{if } (i,j) \in \mathcal{E}, \\ 1 - \frac{1}{d} \sum_{k:(i,k) \in \mathcal{E}} \frac{w_k^*}{w_i^* + w_k^*}, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$$

  14. Rationale behind the spectral method. In the large-sample limit ($L \to \infty$), $P \to P^* = [P^*_{i,j}]_{1 \le i,j \le n}$:
$$P^*_{i,j} = \begin{cases} \frac{1}{d}\, \frac{w_j^*}{w_i^* + w_j^*}, & \text{if } (i,j) \in \mathcal{E}, \\ 1 - \frac{1}{d} \sum_{k:(i,k) \in \mathcal{E}} \frac{w_k^*}{w_i^* + w_k^*}, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$$
• Stationary distribution of $P^*$:
$$\pi^* := \frac{1}{\sum_{i=1}^n w_i^*}\, [w_1^*, w_2^*, \ldots, w_n^*]^\top.$$

  15. Rationale behind the spectral method. In the large-sample limit ($L \to \infty$), $P \to P^* = [P^*_{i,j}]_{1 \le i,j \le n}$:
$$P^*_{i,j} = \begin{cases} \frac{1}{d}\, \frac{w_j^*}{w_i^* + w_j^*}, & \text{if } (i,j) \in \mathcal{E}, \\ 1 - \frac{1}{d} \sum_{k:(i,k) \in \mathcal{E}} \frac{w_k^*}{w_i^* + w_k^*}, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$$
• Stationary distribution of $P^*$:
$$\pi^* := \frac{1}{\sum_{i=1}^n w_i^*}\, [w_1^*, w_2^*, \ldots, w_n^*]^\top.$$
• Check detailed balance!
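
Spelling out the detailed-balance check the slide points to: for any edge $(i, j) \in \mathcal{E}$,

```latex
\pi_i^* P_{i,j}^*
  = \frac{w_i^*}{\sum_{k} w_k^*} \cdot \frac{1}{d}\,\frac{w_j^*}{w_i^* + w_j^*}
  = \frac{w_j^*}{\sum_{k} w_k^*} \cdot \frac{1}{d}\,\frac{w_i^*}{w_i^* + w_j^*}
  = \pi_j^* P_{j,i}^*,
```

so $\pi^*$ is reversible for, and hence the stationary distribution of, $P^*$; this is why the leading left eigenvector of the empirical $P$ recovers the scores up to normalization.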

  16. Regularized MLE. Negative log-likelihood:
$$\mathcal{L}(w) := -\sum_{(i,j) \in \mathcal{G}} \left\{ y_{j,i} \log \frac{w_i}{w_i + w_j} + (1 - y_{j,i}) \log \frac{w_j}{w_i + w_j} \right\}.$$

  17. Regularized MLE. Negative log-likelihood:
$$\mathcal{L}(w) := -\sum_{(i,j) \in \mathcal{G}} \left\{ y_{j,i} \log \frac{w_i}{w_i + w_j} + (1 - y_{j,i}) \log \frac{w_j}{w_i + w_j} \right\}.$$
Change of variables $\theta_i = \log w_i$:
$$\mathcal{L}(\theta) := \sum_{(i,j) \in \mathcal{G}} \left\{ -y_{j,i}(\theta_i - \theta_j) + \log\left(1 + e^{\theta_i - \theta_j}\right) \right\}.$$

  18. Regularized MLE. Negative log-likelihood:
$$\mathcal{L}(w) := -\sum_{(i,j) \in \mathcal{G}} \left\{ y_{j,i} \log \frac{w_i}{w_i + w_j} + (1 - y_{j,i}) \log \frac{w_j}{w_i + w_j} \right\}.$$
Change of variables $\theta_i = \log w_i$:
$$\mathcal{L}(\theta) := \sum_{(i,j) \in \mathcal{G}} \left\{ -y_{j,i}(\theta_i - \theta_j) + \log\left(1 + e^{\theta_i - \theta_j}\right) \right\}.$$
Regularized MLE:
$$\text{minimize}_{\theta} \;\; \mathcal{L}_\lambda(\theta) := \mathcal{L}(\theta) + \frac{1}{2}\lambda \|\theta\|_2^2, \qquad \text{choose } \lambda \asymp \sqrt{\frac{np \log n}{L}}.$$
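
A minimal sketch of fitting the regularized MLE by plain gradient descent (step size, iteration count, and the constant in $\lambda$ are illustrative choices of mine, not the paper's):

```python
import numpy as np

def regularized_mle(y, edges, L, iters=2000, lam=None):
    """Gradient descent on L_lambda(theta) = sum_{(i,j) in E} [ -y_{j,i}(theta_i - theta_j)
    + log(1 + exp(theta_i - theta_j)) ] + (lambda / 2) * ||theta||_2^2."""
    n = y.shape[0]
    if lam is None:                                   # lambda ~ sqrt(n p log n / L), as on the slide
        p_hat = edges.sum() / (n * (n - 1))
        lam = np.sqrt(n * p_hat * np.log(n) / L)
    rows, cols = np.nonzero(np.triu(edges, k=1))      # each edge (i, j) once, with i < j
    step = 1.0 / max(1, int(edges.sum(axis=1).max())) # conservative step size
    theta = np.zeros(n)
    for _ in range(iters):
        diff = theta[rows] - theta[cols]
        g_edge = 1.0 / (1.0 + np.exp(-diff)) - y[cols, rows]   # y[j, i] = freq. that i beats j
        grad = lam * theta
        np.add.at(grad, rows, g_edge)
        np.add.at(grad, cols, -g_edge)
        theta -= step * grad
    return theta                                      # theta_i = log w_i, up to a global shift
```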

  19. Prior art (guarantees on mean square error for estimating scores vs. top-K ranking accuracy):
• Spectral method (Negahban et al. '12): ✔ mean square error
• MLE (Hajek et al. '14): ✔ mean square error
• Spectral MLE (Chen & Suh '15): ✔ mean square error, ✔ top-K ranking accuracy

  20. Prior art, where mean square error for estimating scores serves only as a "meta metric":
• Spectral method (Negahban et al. '12): ✔ mean square error
• MLE (Hajek et al. '14): ✔ mean square error
• Spectral MLE (Chen & Suh '15): ✔ mean square error, ✔ top-K ranking accuracy

  21. Small $\ell_2$ loss ≠ high ranking accuracy.

  22. Small $\ell_2$ loss ≠ high ranking accuracy.

  23. Small $\ell_2$ loss ≠ high ranking accuracy. These two estimates have the same $\ell_2$ loss, but output different rankings.

  24. Small $\ell_2$ loss ≠ high ranking accuracy. These two estimates have the same $\ell_2$ loss, but output different rankings. Need to control the entrywise error!
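
A tiny made-up numerical example of this point: two estimates at exactly the same $\ell_2$ distance from the truth, only one of which preserves the top-2 set.

```python
import numpy as np

w_star = np.array([1.00, 0.90, 0.80, 0.70])   # true scores; true top-2 = {0, 1}
w_a = np.array([1.10, 1.00, 0.80, 0.70])      # errors sit on already-high items
w_b = np.array([1.00, 0.80, 0.90, 0.70])      # same l2 error, but swaps items 1 and 2

for name, w in [("w_a", w_a), ("w_b", w_b)]:
    l2 = np.linalg.norm(w - w_star)
    top2 = set(np.argsort(-w)[:2])
    print(name, "l2 error =", round(l2, 3), "estimated top-2 =", top2)
# Both l2 errors equal ~0.141, yet only w_a recovers the true top-2 set.
```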

  25. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking?

  26. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking? Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.

  27. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking? Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense. This work: an affirmative answer for both methods, over the entire regime (including sparse graphs).

  28. Main result. Comparison graph $\mathcal{G}(n, p)$; sample size $\asymp p n^2 L$. [figure: comparison graph on 12 items]
Theorem 1 (Chen, Fan, Ma, Wang '17). When $p \gtrsim \frac{\log n}{n}$, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

  29. Main result. [figure: phase diagram of sample size vs. score separation $\Delta_K$, showing the regime achievable by both methods and the infeasible regime]
• $\Delta_K := \frac{w^*_{(K)} - w^*_{(K+1)}}{\|w^*\|_\infty}$ : score separation.

  30. Empirical top-K ranking accuracy. [figure: top-K ranking accuracy (0 to 1) of the spectral method and the regularized MLE versus the score separation $\Delta_K$ (0.1 to 0.5); $n = 200$, $p = 0.25$, $L = 20$]
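
For completeness, a small self-contained helper for the accuracy on the y-axis (the slide text does not spell out the exact metric; the overlap below, or the indicator of exact recovery averaged over Monte Carlo trials, are natural choices). Combined with the simulation and estimation sketches above, it can reproduce a similar curve.

```python
import numpy as np

def topk_overlap(w_hat, w_star, K):
    """Fraction of the true top-K items recovered by the estimate
    (equals 1.0 exactly when the top-K set is identified correctly)."""
    est = set(np.argsort(-np.asarray(w_hat))[:K])
    tru = set(np.argsort(-np.asarray(w_star))[:K])
    return len(est & tru) / K

print(topk_overlap([0.4, 0.9, 0.8, 0.1], [1.0, 0.9, 0.8, 0.7], K=2))  # 0.5
```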

  31. Optimal control of entrywise error. [figure: true scores $w^*_1 \ge w^*_2 \ge \cdots \ge w^*_K \ge w^*_{K+1} \ge \cdots$ with separation $\Delta_K$ at rank $K$, and score estimates $w_1, \ldots, w_n$ each within $\frac{1}{2}\Delta_K$ of the truth]
Theorem 2. Suppose $p \gtrsim \frac{\log n}{n}$ and the sample size $\gtrsim \frac{n \log n}{\Delta_K^2}$. Then with high probability, the estimates $w$ returned by both methods obey (up to global scaling)
$$\|w - w^*\|_\infty < \frac{1}{2}\Delta_K \|w^*\|_\infty.$$
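
Why this entrywise bound implies exact top-$K$ recovery (the step the slide's picture suggests): for any true top-$K$ item $i$ and any item $j$ outside the top $K$,

```latex
w_i - w_j
  \;\ge\; \bigl(w_i^* - \|w - w^*\|_\infty\bigr) - \bigl(w_j^* + \|w - w^*\|_\infty\bigr)
  \;\ge\; \bigl(w_{(K)}^* - w_{(K+1)}^*\bigr) - 2\,\|w - w^*\|_\infty
  \;>\; \Delta_K \|w^*\|_\infty - \Delta_K \|w^*\|_\infty \;=\; 0,
```

so every top-$K$ item receives a strictly higher estimated score than every other item, and ranking by the estimates returns exactly the top-$K$ set.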

  32. Key ingredient: leave-one-out analysis. For each $1 \le m \le n$, introduce a leave-one-out estimate $w^{(m)}$. [figure: the data $y = [y_{i,j}]_{1 \le i,j \le n}$ and the modified data used to compute $w^{(m)}$, with the entries involving item $m$ replaced]
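
The slide conveys the construction with a picture only; roughly, the leave-one-out estimate reruns the estimator on data in which everything involving item $m$ is replaced by its population counterpart, so that $w^{(m)}$ is statistically independent of the comparisons attached to item $m$. A simplified sketch under that reading (the paper's actual definition also accounts for the randomness of the edges incident to $m$):

```python
import numpy as np

def leave_one_out_data(y, edges, w_star, m):
    """Analysis device (never run in practice, since it needs the true scores):
    replace the comparison outcomes involving item m by their expectations."""
    y_loo = y.copy()
    for j in np.nonzero(edges[m])[0]:
        y_loo[m, j] = w_star[j] / (w_star[m] + w_star[j])   # E[y_{m, j}]
        y_loo[j, m] = w_star[m] / (w_star[m] + w_star[j])   # E[y_{j, m}]
    return y_loo
```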

  33. Leave-one-out stability: leave-one-out estimate $w^{(m)} \approx$ true estimate $w$.

  34. Leave-one-out stability: leave-one-out estimate $w^{(m)} \approx$ true estimate $w$.
• Spectral method: eigenvector perturbation bound
$$\|\pi - \hat{\pi}\|_{\pi^*} \lesssim \frac{\|\pi^\top (P - \hat{P})\|_{\pi^*}}{\text{spectral gap}}$$
◦ a new Davis-Kahan bound for (asymmetric) probability transition matrices.

  35. Leave-one-out stability: leave-one-out estimate $w^{(m)} \approx$ true estimate $w$.
• Spectral method: eigenvector perturbation bound
$$\|\pi - \hat{\pi}\|_{\pi^*} \lesssim \frac{\|\pi^\top (P - \hat{P})\|_{\pi^*}}{\text{spectral gap}}$$
◦ a new Davis-Kahan bound for (asymmetric) probability transition matrices.
• MLE: local strong convexity
$$\|\theta - \hat{\theta}\|_2 \lesssim \frac{\|\nabla \mathcal{L}_\lambda(\theta; y)\|_2}{\text{strong convexity parameter}}.$$

  36. Summary.
• Spectral method: ✔ linear-time computational complexity, ✔ optimal sample complexity
• Regularized MLE: ✔ linear-time computational complexity, ✔ optimal sample complexity
Novel entrywise perturbation analysis for the spectral method and for convex optimization.
Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, arXiv:1707.09971, 2017.
