Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking (PowerPoint PPT Presentation)



SLIDE 1

Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

Cong Ma, ORFE, Princeton University

Joint work with Yuxin Chen, Jianqing Fan and Kaizheng Wang

SLIDE 2

Ranking

A fundamental problem in a wide range of contexts:

  • web search, recommendation systems, admissions, sports competitions, voting, ...

Example: PageRank (figure credit: Dzenan Hamzic)

Top-K ranking 2/20

SLIDE 3

Rank aggregation from pairwise comparisons

Example: pairwise comparisons for ranking top tennis players

figure credit: Bozóki, Csató, Temesi


SLIDE 5

Parametric models

Assign a latent preference score to each of n items: w∗ = [w∗_1, · · · , w∗_n], where w∗_i is the preference score of item i; ranking the items amounts to sorting the scores.

  • This work: Bradley-Terry-Luce (BTL) model: for w∗ ∈ R^n_+,

    P {item j beats item i} = w∗_j / (w∗_i + w∗_j)
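The BTL win probability above can be sketched in a few lines; the scores in this example are hypothetical, not from the talk:

```python
import numpy as np

def btl_win_prob(w, i, j):
    """P{item j beats item i} under the Bradley-Terry-Luce model.

    w holds the positive latent scores w*; i, j are 0-based item indices."""
    return w[j] / (w[i] + w[j])

# Toy example with hypothetical scores.
w = np.array([3.0, 2.0, 1.0])
p = btl_win_prob(w, 0, 1)  # item 1 beats item 0 with prob 2 / (3 + 2)
```

Note that the two orientations are complementary: `btl_win_prob(w, i, j) + btl_win_prob(w, j, i) == 1`.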


SLIDE 7

Other parametric models

  • Thurstone model: for w∗ ∈ R^n,

    P {item j beats item i} = Φ(w∗_j − w∗_i),   where Φ is the Gaussian cdf

  • General parametric models: for a nondecreasing f : R → [0, 1] obeying

    f(t) = 1 − f(−t), ∀t ∈ R,

    set P {item j beats item i} = f(w∗_j − w∗_i)
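Both models fit this general template; a minimal sketch, with `f_logistic` and `f_thurstone` as assumed names (the BTL model reduces to the logistic f after the reparametrization θ_i = log w∗_i, since w∗_j/(w∗_i + w∗_j) = 1/(1 + e^{θ_i − θ_j})):

```python
import math

def f_logistic(t):
    """Logistic link: BTL in the log-score parametrization theta_i = log w_i."""
    return 1.0 / (1.0 + math.exp(-t))

def f_thurstone(t):
    """Thurstone link: Gaussian cdf Phi(t), written via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Both links satisfy the required symmetry f(t) = 1 - f(-t).
for f in (f_logistic, f_thurstone):
    for t in (-1.3, 0.0, 0.7):
        assert abs(f(t) + f(-t) - 1.0) < 1e-12
```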

SLIDE 8

Typical ranking procedures

Estimate latent scores → rank items based on score estimates

SLIDE 9

Top-K ranking

Estimate latent scores → rank items based on score estimates

Goal: identify the set of top-K items from pairwise comparisons

SLIDE 10

Model: random sampling

  • Comparison graph: Erdős–Rényi graph G ∼ G(n, p)

    (figure: a comparison graph on nodes 1–12)

  • For each (i, j) ∈ G, obtain L paired comparisons: independently for 1 ≤ l ≤ L,

    y^(l)_{i,j} = 1 with prob. w∗_j / (w∗_i + w∗_j), and 0 otherwise

SLIDE 11

Model: random sampling

  • For each (i, j) ∈ G, aggregate the L paired comparisons into

    y_{i,j} = (1/L) Σ^L_{l=1} y^(l)_{i,j}    (sufficient statistic)
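The sampling model on these slides can be simulated directly; this sketch (function name and parameter values are illustrative) returns the comparison graph and the averaged sufficient statistics:

```python
import numpy as np

def sample_comparisons(w, p, L, rng):
    """Simulate the random sampling model: an Erdős–Rényi comparison graph
    G(n, p), and for each observed pair (i, j) the averaged outcome
    y[i, j] = fraction of L independent BTL comparisons in which j beats i."""
    n = len(w)
    A = np.zeros((n, n), dtype=bool)   # comparison graph (symmetric mask)
    y = np.zeros((n, n))               # sufficient statistics
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:       # pair (i, j) enters the graph
                A[i, j] = A[j, i] = True
                wins_j = rng.binomial(L, w[j] / (w[i] + w[j]))
                y[i, j] = wins_j / L       # fraction of times j beats i
                y[j, i] = 1.0 - y[i, j]    # the remaining comparisons
    return A, y

rng = np.random.default_rng(0)
w = np.linspace(2.0, 1.0, 8)               # hypothetical scores
A, y = sample_comparisons(w, p=0.5, L=50, rng=rng)
```

By construction y[i, j] + y[j, i] = 1 on every observed edge, matching the fact that the pair of averaged outcomes carries one binomial sample per edge.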

SLIDE 12

Spectral method (Rank Centrality)

Negahban, Oh, Shah ’12

  • Construct a probability transition matrix P = [P_{i,j}]_{1≤i,j≤n}:

    P_{i,j} = (1/d) y_{i,j},                      if (i, j) ∈ E,
              1 − (1/d) Σ_{k:(i,k)∈E} y_{i,k},    if i = j,
              0,                                  otherwise.

  • Return the score estimate as the leading left eigenvector of P
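A compact sketch of the Rank Centrality construction above. The normalization d is not pinned down on this slide, so the default d = 2 × max degree here is an assumption that merely keeps P a valid transition matrix:

```python
import numpy as np

def rank_centrality(A, y, d=None):
    """Spectral score estimate (Rank Centrality): build the transition
    matrix P from the averaged outcomes y on the comparison graph A, then
    return its stationary distribution (leading left eigenvector)."""
    n = A.shape[0]
    if d is None:
        d = 2 * A.sum(axis=1).max()       # assumed normalization, >= max degree
    P = np.where(A, y, 0.0) / d           # off-diagonal entries y_{i,j} / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # diagonal: 1 - (1/d) sum_k y_{i,k}
    # Leading left eigenvector of P = right eigenvector of P^T.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return pi / pi.sum()

# Sanity check in the L -> infinity limit: feeding the exact BTL win
# probabilities should recover scores proportional to w* (hypothetical scores).
w = np.array([3.0, 2.0, 1.0])
A = ~np.eye(3, dtype=bool)                # complete comparison graph
y = w[None, :] / (w[:, None] + w[None, :])
y[~A] = 0.0
pi = rank_centrality(A, y)
```

Ranking the items then amounts to sorting `pi` in decreasing order.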



SLIDE 15

Rationale behind spectral method

In the large-sample limit (L → ∞), P converges to P∗ = [P∗_{i,j}]_{1≤i,j≤n}:

    P∗_{i,j} = (1/d) · w∗_j / (w∗_i + w∗_j),                    if (i, j) ∈ E,
               1 − (1/d) Σ_{k:(i,k)∈E} w∗_k / (w∗_i + w∗_k),    if i = j,
               0,                                               otherwise.

  • Stationary distribution of P∗:

    π∗ := (1 / Σ^n_{i=1} w∗_i) · [w∗_1, w∗_2, . . . , w∗_n]⊤

  • Check detailed balance!
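The detailed-balance check is a one-line computation from the definitions of π∗ and P∗ above: for each (i, j) ∈ E,

```latex
\pi_i^* P_{i,j}^*
  = \frac{w_i^*}{\sum_k w_k^*} \cdot \frac{1}{d} \cdot \frac{w_j^*}{w_i^* + w_j^*}
  = \frac{w_j^*}{\sum_k w_k^*} \cdot \frac{1}{d} \cdot \frac{w_i^*}{w_j^* + w_i^*}
  = \pi_j^* P_{j,i}^*,
```

since the middle expression is symmetric in i and j. Hence the chain is reversible and π∗ is indeed the stationary distribution of P∗.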



SLIDE 18

Regularized MLE

Negative log-likelihood:

    L(w) := − Σ_{(i,j)∈G} [ y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) ]

Reparametrizing via θ_i = log w_i:

    L(θ) = Σ_{(i,j)∈G} [ −y_{j,i} (θ_i − θ_j) + log( 1 + e^{θ_i − θ_j} ) ]

(Regularized MLE)

    minimize_θ  L_λ(θ) := L(θ) + (λ/2) ‖θ‖²_2,    choosing λ ≍ √( (np log n) / L )
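A plain gradient-descent sketch of the regularized MLE above; only the objective comes from the slide, while the solver, step size, and iteration count are illustrative assumptions (the talk does not prescribe an optimizer):

```python
import numpy as np

def regularized_mle(A, y, lam, steps=2000, lr=0.1):
    """Minimize L_lam(theta) = sum_{(i,j) in E} [ -y_{j,i} (theta_i - theta_j)
    + log(1 + exp(theta_i - theta_j)) ] + (lam/2) ||theta||_2^2
    by gradient descent. y[i, j] is the fraction of wins of j over i,
    so y[j, i] plays the role of y_{j,i} (fraction of wins of i over j)."""
    n = A.shape[0]
    theta = np.zeros(n)
    iu, ju = np.where(np.triu(A, k=1))    # one orientation per edge (i < j)
    for _ in range(steps):
        diff = theta[iu] - theta[ju]
        sig = 1.0 / (1.0 + np.exp(-diff))      # sigma(theta_i - theta_j)
        # d/dtheta_i of the (i, j) term is -y_{j,i} + sigma(theta_i - theta_j);
        # the derivative w.r.t. theta_j is its negative.
        g_edge = sig - y[ju, iu]
        grad = np.zeros(n)
        np.add.at(grad, iu, g_edge)
        np.add.at(grad, ju, -g_edge)
        grad += lam * theta                    # ridge term
        theta -= lr * grad
    return theta

# Sanity check: with exact win fractions and tiny regularization, the
# estimate should recover theta ~ log w* up to a global shift
# (hypothetical scores [3, 2, 1], so theta_0 - theta_2 ~ log 3).
w = np.array([3.0, 2.0, 1.0])
A = ~np.eye(3, dtype=bool)
y = w[None, :] / (w[:, None] + w[None, :])
theta = regularized_mle(A, y, lam=1e-4)
```

In the talk's setting λ would instead be set on the order of √(np log n / L); the tiny λ here just makes the toy recovery check transparent.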


SLIDE 20

Prior art

mean square error for estimating scores (a “meta metric”):
  • Spectral method: ✔ Negahban et al. ’12
  • MLE: ✔ Negahban et al. ’12, Hajek et al. ’14

top-K ranking accuracy:
  • Spectral MLE: ✔ Chen & Suh ’15



SLIDE 24

Small ℓ2 loss ≠ high ranking accuracy

(figure: two score estimates with the same ℓ2 loss but different induced rankings)

These two estimates have the same ℓ2 loss, but output different rankings. Need to control the entrywise error!


SLIDE 27

Optimality?

Is the spectral method or the MLE alone optimal for top-K ranking?

Partial answer (Jang et al. ’16): the spectral method works if the comparison graph is sufficiently dense.

This work: affirmative answer for both methods, over the entire regime (including sparse graphs).

SLIDE 28

Main result

Comparison graph G(n, p); sample size ≍ pn²L

(figure: a comparison graph on nodes 1–12)

Theorem 1 (Chen, Fan, Ma, Wang ’17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

SLIDE 29

Main result

(figure: sample size vs. score separation ∆_K; below the threshold the problem is infeasible, above it top-K ranking is achievable by both methods)

  • ∆_K := ( w∗_(K) − w∗_(K+1) ) / ‖w∗‖_∞ : score separation

SLIDE 30

Empirical top-K ranking accuracy

(figure: top-K ranking accuracy vs. score separation ∆_K ∈ [0.1, 0.5] for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20)
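The experiment behind this plot can be sketched at a smaller scale (the talk uses n = 200; the helper names, the n = 50, and the trial count here are illustrative, and only the spectral method is run):

```python
import numpy as np

def spectral_scores(A, y, d):
    """Rank Centrality: stationary distribution of the transition matrix
    built from averaged outcomes y on the comparison graph A."""
    P = np.where(A, y, 0.0) / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return pi / pi.sum()

def topk_accuracy(n=50, p=0.25, L=20, K=5, delta=0.3, trials=10, seed=0):
    """Fraction of trials in which the spectral method exactly recovers the
    top-K set, for scores with separation delta (and ||w*||_inf = 1)."""
    rng = np.random.default_rng(seed)
    w = np.ones(n)
    w[K:] = 1.0 - delta                   # gap delta below the top-K block
    hits = 0
    for _ in range(trials):
        A = np.zeros((n, n), dtype=bool)
        y = np.zeros((n, n))
        for i in range(n):                # Erdős–Rényi graph + L comparisons
            for j in range(i + 1, n):
                if rng.random() < p:
                    A[i, j] = A[j, i] = True
                    y[i, j] = rng.binomial(L, w[j] / (w[i] + w[j])) / L
                    y[j, i] = 1.0 - y[i, j]
        pi = spectral_scores(A, y, d=2 * A.sum(axis=1).max())
        if set(np.argsort(-pi)[:K]) == set(range(K)):
            hits += 1
    return hits / trials

# Qualitatively, accuracy should rise with the separation delta.
acc_low = topk_accuracy(delta=0.1)
acc_high = topk_accuracy(delta=0.5)
```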

SLIDE 31

Optimal control of entrywise error

(figure: true scores w∗_1, . . . , w∗_K, w∗_{K+1} with gap ∆_K, and score estimates w_1, . . . , w_{K+1}; if every estimate deviates from its true score by less than ∆_K/2, the top-K set is exactly recovered)

Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high prob., the estimates w returned by both methods obey (up to global scaling)

    ‖w − w∗‖_∞ / ‖w∗‖_∞ < (1/2) ∆_K

SLIDE 32

Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m), computed from the data y = [y_{i,j}]_{1≤i,j≤n} with the comparisons involving item m left out.

(figure: comparison data for items 1, . . . , n with and without item m)


SLIDE 35

Leave-one-out stability

leave-one-out estimate w^(m) ≈ true estimate w

  • Spectral method: eigenvector perturbation bound, schematically

    ‖π − π̂‖_{π∗} ≲ ‖π̂ (P − P̂)‖_{π∗} / spectral-gap

    (a new Davis-Kahan bound for asymmetric probability transition matrices)

  • MLE: local strong convexity, schematically

    ‖θ − θ̂‖_2 ≲ ‖∇L_λ(θ̂; y)‖_2 / strong-convexity-parameter

SLIDE 36

Summary

                                        Spectral method    Regularized MLE
Optimal sample complexity               ✔                  ✔
Linear-time computational complexity    ✔                  ✔

Novel entrywise perturbation analysis for the spectral method and convex optimization

Paper: “Spectral method and regularized MLE are both optimal for top-K ranking”, Y. Chen, J. Fan, C. Ma, K. Wang, arXiv:1707.09971, 2017