

SLIDE 1

Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

Yuxin Chen Electrical Engineering, Princeton University

Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang

SLIDE 2

Ranking

A fundamental problem in a wide range of contexts:

  • web search, recommendation systems, admissions, sports competitions, voting, ...

[figure: PageRank; credit: Dzenan Hamzic]

Top-K ranking 2/21

SLIDE 3

Rank aggregation from pairwise comparisons

[figure: pairwise comparisons for ranking top tennis players; credit: Bozóki, Csató, Temesi]

SLIDE 4

Parametric models

Assign a latent score to each of n items: w∗ = [w∗_1, · · · , w∗_n]

  • w∗_i : preference score of item i

[figure: items sorted by preference score; the item in position k has rank k]

SLIDE 5

Parametric models (cont.)

  • This work: Bradley-Terry-Luce (logistic) model

        P{item j beats item i} = w∗_j / (w∗_i + w∗_j)

  • Other models: Thurstone model, low-rank model, ...
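As a quick illustration of the BTL model above, here is a minimal NumPy sketch (not from the talk; `btl_win_prob` and `simulate_comparison` are names of my own choosing):

```python
import numpy as np

def btl_win_prob(w, i, j):
    """P{item j beats item i} under the Bradley-Terry-Luce model."""
    return w[j] / (w[i] + w[j])

def simulate_comparison(w, i, j, rng):
    """Draw a single BTL comparison: returns 1 if item j beats item i."""
    return int(rng.random() < btl_win_prob(w, i, j))
```

For instance, with scores w = [1, 3], item 1 beats item 0 with probability 3/4.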

SLIDE 6

Typical ranking procedures

Estimate latent scores → rank items based on score estimates

SLIDE 7

Top-K ranking

Estimate latent scores → rank items based on score estimates

Goal: identify the set of top-K items under minimal sample size

SLIDE 8

Model: random sampling

  • Comparison graph: Erdős–Rényi graph G ∼ G(n, p)

  • For each (i, j) ∈ G, obtain L paired comparisons

        y_{i,j}^{(l)} = 1 with prob. w∗_j / (w∗_i + w∗_j), and 0 otherwise, independently for 1 ≤ l ≤ L

SLIDE 9

Model: random sampling (cont.)

  • For each (i, j) ∈ G, reduce the L paired comparisons to the sufficient statistic

        ȳ_{i,j} = (1/L) Σ_{l=1}^{L} y_{i,j}^{(l)}
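The sampling model above (Erdős–Rényi comparison graph plus L comparisons per edge, reduced to the sufficient statistic ȳ) can be sketched as follows. This is my own minimal simulation, not code from the talk; NaN marks non-edges:

```python
import numpy as np

def sample_comparisons(w, p, L, rng):
    """Erdos-Renyi comparison graph G(n, p); for each edge (i, j), average
    L i.i.d. BTL comparisons into the sufficient statistic y_bar[i, j]."""
    n = len(w)
    edge = np.triu(rng.random((n, n)) < p, k=1)   # upper-triangular edge set
    y_bar = np.full((n, n), np.nan)               # NaN marks non-edges
    for i, j in zip(*np.nonzero(edge)):
        p_ij = w[j] / (w[i] + w[j])               # P{j beats i}
        y_bar[i, j] = rng.binomial(L, p_ij) / L   # mean of L Bernoulli draws
        y_bar[j, i] = 1.0 - y_bar[i, j]           # complementary direction
    return y_bar
```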

SLIDE 10

Prior art

                                                 ℓ2 score error   top-K ranking accuracy
  Spectral method (Negahban et al. ’12)               ✔
  MLE (Negahban et al. ’12; Hajek et al. ’14)         ✔
  Spectral MLE (Chen & Suh ’15)                       ✔                   ✔

SLIDE 11

Prior art (cont.)

The mean square error of the score estimates has so far served only as a “meta metric”: it does not by itself certify top-K ranking accuracy.

SLIDE 12–15

Small ℓ2 loss ≠ high ranking accuracy

[figure: two score estimates with the same ℓ2 loss]

These two estimates have the same ℓ2 loss, but output different rankings.

Need to control the entrywise error!

SLIDE 16–18

Optimality?

Is the spectral method or the MLE alone optimal for top-K ranking?

Partial answer (Jang et al ’16): the spectral method works if the comparison graph is sufficiently dense.

This work: an affirmative answer for both methods, over the entire regime (including sparse graphs).

SLIDE 19

Spectral method (Rank Centrality)          (Negahban, Oh, Shah ’12)

  • Construct a probability transition matrix P whose off-diagonal entries obey

        P_{i,j} ∝ ȳ_{i,j} if (i, j) ∈ G,  and P_{i,j} = 0 if (i, j) ∉ G

  • Return the score estimate as the leading left eigenvector of P
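A minimal Rank Centrality sketch under the conventions above (my own code, not from the talk; it assumes the ȳ matrix from the sampling slides, and a normalization constant d at least as large as the maximum degree so that rows stay sub-stochastic):

```python
import numpy as np

def rank_centrality(y_bar, d):
    """Score estimate = stationary distribution (leading left eigenvector)
    of the transition matrix with P[i, j] = y_bar[i, j] / d on edges."""
    n = y_bar.shape[0]
    P = np.where(np.isnan(y_bar), 0.0, y_bar) / d
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # make rows sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(10000):                        # left power iteration
        pi_next = pi @ P
        if np.abs(pi_next - pi).max() < 1e-12:
            break
        pi = pi_next
    return pi_next / pi_next.sum()
```

With noiseless ȳ on a complete graph, the returned vector is proportional to the true scores.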

SLIDE 20

Rationale behind spectral method

In the large-sample limit, P → P∗, whose off-diagonal entries obey

        P∗_{i,j} ∝ w∗_j / (w∗_i + w∗_j) if (i, j) ∈ G,  and P∗_{i,j} = 0 otherwise

  • P∗ is reversible (check detailed balance), so its stationary distribution is the true score vector:

        π∗ ∝ [w∗_1, w∗_2, . . . , w∗_n]
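The reversibility claim is easy to check numerically. Here is a small helper of my own (not from the talk) that measures the worst violation of detailed balance w_i P_{i,j} = w_j P_{j,i}:

```python
import numpy as np

def detailed_balance_residual(P, w):
    """Max violation of detailed balance w_i * P[i, j] == w_j * P[j, i];
    zero iff the chain is reversible with stationary distribution prop. to w."""
    flow = w[:, None] * P          # flow[i, j] = w_i * P[i, j]
    return np.abs(flow - flow.T).max()
```

For the population matrix P∗ built from the true scores, this residual vanishes, which confirms π∗ ∝ w∗.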

SLIDE 21

Regularized MLE

Negative log-likelihood:

        L(w) := − Σ_{(i,j)∈G} { ȳ_{j,i} log [w_i / (w_i + w_j)] + (1 − ȳ_{j,i}) log [w_j / (w_i + w_j)] }

  • L(w) becomes convex after the reparametrization w → θ = [θ_1, · · · , θ_n], θ_i = log w_i

SLIDE 22

Regularized MLE (cont.)

        (Regularized MLE)   minimize_θ  L_λ(θ) := L(θ) + (1/2) λ ‖θ‖²_2

  • choose λ ≍ √(np log n / L)
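In the θ parametrization, the probability that item i beats item j is the logistic function σ(θ_i − θ_j), so the regularized MLE is a logistic-regression-style convex program. A plain gradient-descent sketch of my own (the slides do not prescribe a solver; step size and iteration count are illustrative):

```python
import numpy as np

def regularized_mle(y_bar, lam, step=0.1, iters=5000):
    """Gradient descent on L(theta) + (lam/2)*||theta||^2 with theta_i = log w_i.
    y_bar[j, i] is the empirical frequency with which item i beats item j."""
    n = y_bar.shape[0]
    edge = ~np.isnan(y_bar)
    np.fill_diagonal(edge, False)
    y_t = np.nan_to_num(y_bar.T)                  # y_t[i, j] = y_bar[j, i]
    theta = np.zeros(n)
    for _ in range(iters):
        diff = theta[:, None] - theta[None, :]    # theta_i - theta_j
        sig = 1.0 / (1.0 + np.exp(-diff))         # model P{i beats j}
        grad = np.where(edge, sig - y_t, 0.0).sum(axis=1) + lam * theta
        theta -= step * grad
    return np.exp(theta)                          # back to scores w
```

With noiseless data and a tiny λ, the recovered scores match the truth up to global scaling.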

SLIDE 23

Main result

Comparison graph G(n, p); sample size ≍ p n² L

Theorem 1 (Chen, Fan, Ma, Wang ’17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

SLIDE 24

Main result (cont.)

[figure: required sample size vs. score separation ∆K; below the optimal threshold the problem is infeasible, and above it both methods succeed]

  • ∆K := (w∗_(K) − w∗_(K+1)) / ‖w∗‖_∞ : score separation

SLIDE 25

Comparison with Jang et al ’16

Jang et al ’16: the spectral method controls the entrywise error if p ≳ √((log n)/n) (relatively dense graphs)

SLIDE 26

Comparison with Jang et al ’16 (cont.)

[figure: required sample size vs. score separation ∆K; our work attains the optimal sample size over the full range, while Jang et al ’16 covers only the denser regime]

SLIDE 27

Empirical top-K ranking accuracy

[figure: top-K ranking accuracy (0.2–1) vs. score separation ∆K (0.1–0.5) for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20]
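The accuracy plotted above can be computed as the overlap between the estimated and true top-K sets; a small helper of my own (not from the talk):

```python
import numpy as np

def topk_accuracy(w_hat, w_true, K):
    """Fraction of the true top-K items recovered in the estimated top-K set."""
    top_hat = set(np.argsort(-np.asarray(w_hat, dtype=float))[:K])
    top_true = set(np.argsort(-np.asarray(w_true, dtype=float))[:K])
    return len(top_hat & top_true) / K
```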

SLIDE 28

Optimal control of entrywise error

[figure: true scores w∗_1, . . . , w∗_K, w∗_{K+1}, . . . with separation ∆K between w∗_(K) and w∗_(K+1); score estimates within (1/2)∆K of the truth preserve the top-K set]

Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling)

        ‖w − w∗‖_∞ / ‖w∗‖_∞ < (1/2) ∆K
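Theorem 2's sufficient condition is elementary to verify for any particular estimate: if the relative ℓ∞ error is below ∆K/2, no item in the top K can be swapped past an item outside it. A small check of my own:

```python
import numpy as np

def entrywise_ok(w_hat, w_true, K):
    """True if ||w_hat - w_true||_inf / ||w_true||_inf < Delta_K / 2,
    the slide's sufficient condition for exact top-K recovery."""
    w_hat = np.asarray(w_hat, dtype=float)
    w_true = np.asarray(w_true, dtype=float)
    srt = np.sort(w_true)[::-1]                   # scores in decreasing order
    delta_K = (srt[K - 1] - srt[K]) / np.abs(w_true).max()
    err = np.abs(w_hat - w_true).max() / np.abs(w_true).max()
    return bool(err < delta_K / 2)
```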

SLIDE 29

Key ingredient: leave-one-out analysis

For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m)

[figure: the data matrix y = [ȳ_{i,j}]_{1≤i,j≤n}, with the row and column of the m-th item singled out]

SLIDE 30

Key ingredient: leave-one-out analysis (cont.)

The leave-one-out estimate w^(m) is designed to

  • exploit statistical independence

  • enjoy leave-one-out stability

SLIDE 31

Exploit statistical independence

The leave-one-out estimate w^(m) is statistically independent of all data related to the m-th item:

        w^(m) ⊥ {ȳ_{i,j} : i = m or j = m}
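In the analysis, the leave-one-out data replaces every comparison involving item m by its population mean, which is what makes w^(m) independent of item m's data. A sketch of that construction (my own, for illustration only, since it uses the true scores):

```python
import numpy as np

def leave_one_out_data(y_bar, w_star, m):
    """Replace all comparisons involving item m by their expected values,
    so an estimate computed from the result is independent of item m's data."""
    y_loo = y_bar.copy()
    for j in range(y_bar.shape[0]):
        if j != m:
            y_loo[m, j] = w_star[j] / (w_star[m] + w_star[j])
            y_loo[j, m] = 1.0 - y_loo[m, j]
    return y_loo
```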

SLIDE 32

Leave-one-out stability

leave-one-out estimate w^(m) ≈ true estimate w

SLIDE 33

Leave-one-out stability (cont.)

leave-one-out estimate w^(m) ≈ true estimate w

  • Spectral method: eigenvector perturbation bound

        ‖π − π̃‖_{π∗} ≲ ‖π(P − P̃)‖_{π∗} / spectral gap

      - a new Davis-Kahan-type bound for (asymmetric) probability transition matrices

SLIDE 34

Leave-one-out stability (cont.)

  • MLE: local strong convexity

        ‖θ − θ̃‖_2 ≲ ‖∇L_λ(θ; y)‖_2 / strong convexity parameter

SLIDE 35

A small sample of related works

  • Parametric models
      - Ford ’57
      - Hunter ’04
      - Negahban, Oh, Shah ’12
      - Rajkumar, Agarwal ’14
      - Hajek, Oh, Xu ’14
      - Chen, Suh ’15
      - Rajkumar, Agarwal ’16
      - Jang, Kim, Suh, Oh ’16
      - Suh, Tan, Zhao ’17
  • Non-parametric models
      - Shah, Wainwright ’15
      - Shah, Balakrishnan, Guntuboyina, Wainwright ’16
      - Chen, Gopi, Mao, Schneider ’17
  • Leave-one-out analysis
      - El Karoui, Bean, Bickel, Lim, Yu ’13
      - Zhong, Boumal ’17
      - Abbe, Fan, Wang, Zhong ’17
      - Ma, Wang, Chi, Chen ’17
      - Chen, Chi, Fan, Ma ’18
      - Chen, Chi, Fan, Ma, Yan ’19

SLIDE 36

Summary

                                          Spectral method   Regularized MLE
  Optimal sample complexity                      ✔                 ✔
  Linear-time computational complexity           ✔                 ✔

Novel entrywise perturbation analysis for the spectral method and convex optimization

Paper: “Spectral method and regularized MLE are both optimal for top-K ranking”, Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019