SLIDE 1

Statistical Inference for Incomplete Ranking Data: The Case of Rank-Dependent Coarsening

Mohsen Ahmadi Fahandar¹, Eyke Hüllermeier¹, Inés Couso²

¹Intelligent Systems Group, Paderborn University, Germany; ²Department of Statistics, University of Oviedo, Spain

ICML 2017 Tuesday, August 8th

SLIDE 2

Contributions

Considering statistical inference for incomplete ranking data, we:
- Propose a specific type of data-generating process, in which incompleteness is due to "coarsening" of (latent) complete rankings.
- Introduce the concept of "rank-dependent" coarsening.
Under the proposed setting, we study the problem of rank aggregation and the performance of various rank aggregation methods, both theoretically and practically.

SLIDE 3

Rank Aggregation

Given rankings over a set of K items (e.g., K = 5) as observations:

a4 ≻ a5 ≻ a3 ≻ a2 ≻ a1
a5 ≻ a2 ≻ a1 ≻ a3 ≻ a4
a3 ≻ a1 ≻ a5 ≻ a4 ≻ a2
. . .
a1 ≻ a2 ≻ a4 ≻ a3 ≻ a5

Goal: combine the rankings into a (single) consensus ranking a? ≻ a? ≻ a? ≻ a? ≻ a?.

SLIDE 4

Ranking Distributions

Plackett-Luce (PL) model: the probability assigned to ranking π, given parameter vector θ = (θ1, θ2, . . . , θK) ∈ R^K_+, is

P_θ(π) = ∏_{i=1}^{K} θ_{π(i)} / (θ_{π(i)} + θ_{π(i+1)} + . . . + θ_{π(K)})

The mode of the PL distribution (i.e., π*) is the natural consensus in this case.

For example, P_θ(a2 ≻ a1 ≻ a3) = (θ_{a2} / (θ_{a1} + θ_{a2} + θ_{a3})) · (θ_{a1} / (θ_{a1} + θ_{a3})) · (θ_{a3} / θ_{a3}).

Bradley-Terry-Luce (BTL) model: P_θ(a1 ≻ a2) = θ_{a1} / (θ_{a1} + θ_{a2}).
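As a quick illustration (our sketch, not from the slides; the item names and θ values are made up), the PL probability can be computed position by position:

```python
def pl_prob(ranking, theta):
    """Plackett-Luce probability of a full ranking (best item first):
    at each position, the next item is chosen with probability
    proportional to its parameter among the items not yet ranked."""
    p = 1.0
    for i in range(len(ranking)):
        p *= theta[ranking[i]] / sum(theta[j] for j in ranking[i:])
    return p
```

With theta = {'a1': 2.0, 'a2': 3.0, 'a3': 1.0}, pl_prob(['a2', 'a1', 'a3'], theta) evaluates to (3/6)(2/3)(1/1) = 1/3, and the probabilities of all 3! rankings sum to 1.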

SLIDE 5

Incomplete Rankings

In most applications, the observed rankings are incomplete (e.g., K = 5); observations:

a4 ≻ a5 ≻ a3 ≻ a2 ≻ a1
a2 ≻ a1 ≻ a3 ≻ a4
a3 ≻ a1
. . .
a1 ≻ a4 ≻ a5

Goal: still infer a consensus ranking a? ≻ a? ≻ a? ≻ a? ≻ a?. Rank aggregation for incomplete rankings is more challenging!

SLIDE 6

From Complete to Incomplete Ranking

ranking model → (generation P_θ(π)) → full ranking → (coarsening P_λ(τ | π)) → incomplete ranking

Where does the word "coarsening" come from?

SLIDE 7

A Stochastic Model for Incomplete Rankings

The joint distribution over S_K (the set of all rankings of the K items):

P_{θ,λ}(τ, π) = P_θ(π) · P_λ(τ | π)

Generation of full rankings: P_θ : S_K → [0, 1].
Coarsening process: P_λ(· | π), for π ∈ S_K and λ ∈ Λ.
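The generation stage can be sampled sequentially; a minimal sketch assuming a PL model for P_θ (as instantiated later in the talk), with made-up parameters:

```python
import random

def sample_pl(theta, rng):
    """Draw a full ranking from Plackett-Luce: repeatedly pick the
    next item with probability proportional to its parameter among
    the items not yet ranked."""
    remaining = list(theta)
    ranking = []
    while remaining:
        pick = rng.choices(remaining, weights=[theta[i] for i in remaining])[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking
```

A coarsening P_λ(τ | π) would then be applied to each sampled π to produce the observed τ.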

SLIDE 8

Modeling of the Coarsening

Full model (P_θ + P_λ):
- Estimate P_λ:
  - non-parametric (i.e., model and estimate P_λ with no assumptions)
  - parametric (i.e., take P_λ from a parametric family)
- Do not estimate P_λ:
  - ignore the coarsening but make assumptions about it (e.g., rank-dependent)
  - ignore the coarsening and make no assumptions

SLIDE 9

The Underlying Assumption

Standard marginalization: a random subset of items is observed.
full ranking: a4 ≻ a1 ≻ a3 ≻ a2
random subset of items: {a4, a3}
observed ranking: a4 ≻ a3

What we propose: a coarsening that acts only on "ranks" (positions), not items: P : 2^[K] → [0, 1].
full ranking: a4 ≻ a1 ≻ a3 ≻ a2
set of ranks: {1, 2, 3, 4}; random subset of ranks: {2, 4}
observed ranking: a1 ≻ a2
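A minimal sketch of this rank-based coarsening (the function name is ours): reveal exactly the items whose positions fall in the drawn rank subset, keeping their relative order:

```python
def coarsen_by_ranks(full_ranking, rank_subset):
    """Keep only the items at the given (1-based) positions;
    their relative order in the full ranking is preserved."""
    return [item for pos, item in enumerate(full_ranking, start=1)
            if pos in rank_subset]
```

For the full ranking a4 ≻ a1 ≻ a3 ≻ a2, the rank subset {2, 4} yields a1 ≻ a2, as on the slide.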

SLIDE 10

Specific Instantiation

Specific instantiation of the scheme: the ranking model is Plackett-Luce, and the incomplete rankings are pairwise observations:

Plackett-Luce model → (generation P_θ(π)) → full ranking → (coarsening P_λ(τ | π)) → pairwise observations

SLIDE 11

Data Generating Process

Rank-dependence in the case of pairwise comparisons: the entire distribution P_λ is specified by the set of K(K − 1)/2 probabilities

{λ_{u,v} | 1 ≤ u < v ≤ K}, with λ_{u,v} ≥ 0 and Σ_{1≤u<v≤K} λ_{u,v} = 1.

The probability to observe a_i better than a_j:

q′_{i,j} = Σ_{π ∈ E(a_i ≻ a_j)} P_θ(π) · λ_{π(i),π(j)},

where E(a_i ≻ a_j) is the set of all rankings consistent with a_i ≻ a_j, and π(i) here denotes the position of a_i in π.
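For small K, q′ can be evaluated by brute-force enumeration of S_K (our sketch; the θ and λ values in the usage below are illustrative, and π(i) is taken as the rank of item i):

```python
import itertools

def pl_prob(order, theta):
    """PL probability of a full ranking, given as a tuple of item
    indices listed best first."""
    p = 1.0
    for i in range(len(order)):
        p *= theta[order[i]] / sum(theta[j] for j in order[i:])
    return p

def q_prime(i, j, theta, lam):
    """Probability of observing a_i beat a_j: sum over all rankings
    placing a_i above a_j, weighted by the probability lam[(u, v)]
    that the coarsening reveals exactly the pair of ranks (u, v)."""
    total = 0.0
    for order in itertools.permutations(range(len(theta))):
        rank = {item: pos for pos, item in enumerate(order, start=1)}
        if rank[i] < rank[j]:
            total += pl_prob(order, theta) * lam.get((rank[i], rank[j]), 0.0)
    return total
```

Because exactly one pair of ranks is revealed per ranking, the q′ values of all ordered pairs sum to 1.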

SLIDE 12

Data Generating Process

Generated rankings based on PL:

a4 ≻ a5 ≻ a3 ≻ a2 ≻ a1
a5 ≻ a2 ≻ a1 ≻ a3 ≻ a4
a1 ≻ a3 ≻ a2 ≻ a4 ≻ a5
. . .
a1 ≻ a2 ≻ a4 ≻ a3 ≻ a5

Coarsening with λ1,3 = 1 (a degenerate probability distribution: the items at ranks 1 and 3 are always the pair revealed) yields observations D:

a4 ≻ a3
a5 ≻ a1
a1 ≻ a2
. . .
a1 ≻ a4

SLIDE 13

Introduced Bias

Let θ = (14, 5, 1) and let the coarsening be degenerate with λ1,2 = 1 (i.e., top-2 is always observed).

Marginal matrix (p_{i,j} = θ_i / (θ_i + θ_j)):

      −      0.737  0.933
    0.263     −     0.833
    0.067   0.167    −

Observed matrix (q_{i,j} = q′_{i,j} / (q′_{i,j} + q′_{j,i})):

      −      0.714  0.760
    0.286     −     0.559
    0.240   0.441    −

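Both matrices can be reproduced numerically by enumerating the 3! full rankings (our re-computation, not code from the talk):

```python
import itertools

theta = (14.0, 5.0, 1.0)

def pl_prob(order):
    """PL probability of a full ranking (tuple of item indices, best first)."""
    p = 1.0
    for i in range(len(order)):
        p *= theta[order[i]] / sum(theta[j] for j in order[i:])
    return p

def q_prime(i, j):
    """P(observe a_i beat a_j) under the degenerate coarsening
    lambda_{1,2} = 1: only the pair at ranks (1, 2) is revealed."""
    return sum(pl_prob(order)
               for order in itertools.permutations(range(3))
               if (order[0], order[1]) == (i, j))

def p(i, j):   # marginal matrix entry
    return theta[i] / (theta[i] + theta[j])

def q(i, j):   # observed (biased) matrix entry
    return q_prime(i, j) / (q_prime(i, j) + q_prime(j, i))
```

Rounding p(0, 1) and q(0, 1) to three decimals gives 0.737 and 0.714, the first entries of the two matrices.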
SLIDE 14

Definitions

Comparison matrix C (c_{i,j}: number of wins of a_i over a_j):

       a1  a2  a3  a4
  a1    −   6   4   1
  a2    7   −   5   8
  a3    3   4   −   9
  a4    2   1  12   −

Probability matrix P̂ (relative wins), where p̂_{i,j} = c_{i,j} / (c_{i,j} + c_{j,i}):

       a1    a2    a3    a4
  a1    −   0.46  0.57  0.33
  a2  0.54   −    0.56  0.89
  a3  0.43  0.44   −    0.43
  a4  0.67  0.11  0.57   −
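Deriving P̂ from C is mechanical (a sketch; None marks the diagonal):

```python
def prob_matrix(C):
    """Relative-win matrix: P[i][j] = c_ij / (c_ij + c_ji)."""
    K = len(C)
    return [[C[i][j] / (C[i][j] + C[j][i]) if i != j else None
             for j in range(K)] for i in range(K)]

# comparison matrix from the slide (0 on the diagonal)
C = [[0, 6, 4, 1],
     [7, 0, 5, 8],
     [3, 4, 0, 9],
     [2, 1, 12, 0]]
```

round(prob_matrix(C)[0][1], 2) gives 0.46, matching the first row of P̂.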

SLIDE 15

Rank Estimation Framework

Observations D (K = 4):

a4 ≻ a3
a2 ≻ a1
a1 ≻ a2
. . .
a1 ≻ a4

estimate ⇒ matrix C, then aggregate ⇒ estimated ranking π̂ : a2 ≻ a4 ≻ a1 ≻ a3
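The estimation step is a simple tally over the observed pairs (a sketch; items are encoded as 0-based indices):

```python
def count_matrix(observations, K):
    """Pairwise win counts: C[i][j] is how often a_i was observed
    to beat a_j."""
    C = [[0] * K for _ in range(K)]
    for winner, loser in observations:
        C[winner][loser] += 1
    return C
```

count_matrix([(3, 2), (1, 0), (0, 1), (0, 3)], 4) tallies the four observations shown above; an aggregation method is then applied to the resulting C.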

SLIDE 16

Rank Aggregation Methods

Statistical estimation: BTL, BTL(R) (Bradley & Terry, 1952); Least Squares/HodgeRank (LS) (Jiang et al., 2011)
Voting methods: Borda (Borda, 1781); Copeland (CP) (Copeland, 1951)
Spectral methods: Rank Centrality (RC) (Negahban et al., 2012); MC2, MC3 (Dwork et al., 2001)
Graph-based methods: FAS, FAS(R), FAS(B) (Saab, 2001; Fomin et al., 2010)
Pairwise coupling: HT (Hastie & Tibshirani, 1998); Price (Price et al., 1994); WU1, WU2 (Wu et al., 2004)

SLIDE 17

Research Questions

Practical performance: how close is the prediction π̂ to the ground-truth ranking π*?

Consistency: let π̂_N denote the ranking produced as a prediction by a ranking method on the basis of N observed (pairwise) preferences. The method is consistent if

P(π̂_N = π*) → 1 for N → ∞.

SLIDE 18

BTL (Bradley-Terry-Luce)

Given comparison matrix C:

       a1  a2  a3  a4
  a1    −   6   4   1
  a2    7   −   5   8
  a3    3   4   −   9
  a4    2   1  12   −

BTL estimates the parameters by likelihood maximization:

θ̂ ∈ arg max_{θ ∈ R^K_+} ∏_{1≤i≠j≤K} (θ_i / (θ_i + θ_j))^{c_{i,j}}

θ̂ ≈ (0.253, 0.382, 0.178, 0.187) ⇒ π̂ : a2 ≻ a1 ≻ a4 ≻ a3
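One standard way to compute this MLE is the minorization-maximization (MM) iteration for Bradley-Terry models; the sketch below (our implementation, not the talk's) reproduces the slide's estimate:

```python
def fit_btl(C, iters=2000):
    """Bradley-Terry MLE via MM updates:
    theta_i <- W_i / sum_{j != i} N_ij / (theta_i + theta_j),
    with W_i the total wins of item i and N_ij = c_ij + c_ji;
    theta is renormalized to sum to 1 after each sweep."""
    K = len(C)
    theta = [1.0 / K] * K
    wins = [sum(row) for row in C]
    for _ in range(iters):
        new = [wins[i] / sum((C[i][j] + C[j][i]) / (theta[i] + theta[j])
                             for j in range(K) if j != i)
               for i in range(K)]
        s = sum(new)
        theta = [t / s for t in new]
    return theta

C = [[0, 6, 4, 1],
     [7, 0, 5, 8],
     [3, 4, 0, 9],
     [2, 1, 12, 0]]
```

Sorting the items by the fitted parameters yields π̂ : a2 ≻ a1 ≻ a4 ≻ a3.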

SLIDE 19

Borda and Copeland (CP)

Given probability matrix P̂:

       a1    a2    a3    a4
  a1    −   0.46  0.57  0.33
  a2  0.54   −    0.56  0.89
  a3  0.43  0.44   −    0.43
  a4  0.67  0.11  0.57   −

Borda assigns a score to each item:

s_i = Σ_{j≠i} p̂_{i,j}

s ≈ (1.366, 1.983, 1.302, 1.349) ⇒ π̂ : a2 ≻ a1 ≻ a4 ≻ a3

Copeland counts the number of pairwise victories:

s_i = Σ_{j≠i} I(p̂_{i,j} > 1/2)

s = (1, 3, 0, 2) ⇒ π̂ : a2 ≻ a4 ≻ a1 ≻ a3
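Both scores are one-liners over P̂; in this sketch P̂ is recomputed exactly from the comparison matrix C of the earlier slides rather than from the rounded entries shown above:

```python
def borda_copeland(P):
    """Borda: row sums of the relative-win matrix.
    Copeland: number of majority wins per row."""
    K = len(P)
    borda = [sum(P[i][j] for j in range(K) if j != i) for i in range(K)]
    copeland = [sum(P[i][j] > 0.5 for j in range(K) if j != i)
                for i in range(K)]
    return borda, copeland

C = [[0, 6, 4, 1], [7, 0, 5, 8], [3, 4, 0, 9], [2, 1, 12, 0]]
P = [[C[i][j] / (C[i][j] + C[j][i]) if i != j else 0.0
      for j in range(4)] for i in range(4)]
```

The Borda scores round to (1.366, 1.983, 1.302, 1.349) and the Copeland scores are (1, 3, 0, 2), matching the slide.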

SLIDE 20

FAS (Feedback Arc Set)

Given comparison matrix C:

       a1  a2  a3  a4
  a1    −   6   4   1
  a2    7   −   5   8
  a3    3   4   −   9
  a4    2   1  12   −

FAS seeks the ranking that incurs the lowest sum of penalties:

π̂ = arg min_{π ∈ S_K} Σ_{(i,j): π(i) < π(j)} c_{j,i}

π̂ : a2 ≻ a4 ≻ a1 ≻ a3
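For K = 4 the FAS objective can be minimized by exhaustive search over S_K (a sketch; the cited methods use heuristics for larger K):

```python
import itertools

def fas_brute_force(C):
    """Return the order (best item first) minimizing the total count
    of 'backward' wins: each pair contributes the number of times the
    lower-placed item beat the higher-placed one."""
    K = len(C)
    def cost(order):
        return sum(C[order[b]][order[a]]
                   for a in range(K) for b in range(a + 1, K))
    return min(itertools.permutations(range(K)), key=cost)

C = [[0, 6, 4, 1], [7, 0, 5, 8], [3, 4, 0, 9], [2, 1, 12, 0]]
```

fas_brute_force(C) returns (1, 3, 0, 2), i.e., π̂ : a2 ≻ a4 ≻ a1 ≻ a3, matching the slide.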

SLIDE 21

Synthetic Data

PL (with K = 3); the x-axis shows the sample size, the y-axis the normalized Kendall distance.

[Figure: four panels, one per coarsening scenario (λ1,2 = 1; λ1,3 = 1; λ2,3 = 1; full breaking), sample sizes 200 to 2000 on the x-axis, normalized Kendall distance 0.00 to 0.15 on the y-axis; one curve per method: RC, LS, Borda, CP, MC2, MC3, BTL, BTL(R), FAS, FAS(R), FAS(B), Price, WU2, HT, WU1.]

SLIDE 22

Theoretical Findings

Being agnostic of the coarsening, all ranking methods essentially expect (estimates of) the marginals p_{i,j} as input. In our case, however, the inputs are the q_{i,j}. As a consequence, estimates of θ will be biased.

Conjecture: In spite of this, and somewhat surprisingly, all methods are consistent rankers, i.e., produce π* in the limit. We proved this conjecture for some of the methods (including Copeland and FAS), while some proofs are still open (e.g., for Borda and BTL).

SLIDE 23

Summary and Conclusion

We studied statistical inference for incomplete ranking data:
- We proposed a model in which incompleteness is due to coarsening of the complete information (full rankings).
- We proposed the notion of rank-dependent coarsening, which generalizes selection mechanisms such as top-k (whether or not an item is observed depends solely on its rank).
- We studied (probabilistic) rank aggregation as a specific instance.
- Interestingly, rank-dependent coarsening is rather "good-natured" in this setting, in the sense that agnostic learning does not compromise consistency of the rankers.

Thanks!

SLIDE 24

Backup slides


SLIDE 25

Sushi Data

[Figure: two panels of results on the Sushi data, with values between 0.0 and 0.5 on the y-axis; one entry per method: RC, LS, Borda, CP, MC2, MC3, BTL, BTL(R), FAS, FAS(R), FAS(B), Price, WU2, HT, WU1.]

SLIDE 26

Coarsening is Rich

Complete model when K = 3: each full ranking π ∈ {abc, acb, bac, bca, cab, cba} comes with its probability P_θ(π), and the coarsening P_λ(τ | π) puts mass only on observations τ that are order-consistent with π:

full rankings: P_λ(abc | abc), P_λ(acb | acb), P_λ(bac | bac), P_λ(bca | bca), P_λ(cab | cab), P_λ(cba | cba)
τ = ab: P_λ(ab | abc), P_λ(ab | acb), P_λ(ab | cab)
τ = ba: P_λ(ba | bac), P_λ(ba | bca), P_λ(ba | cba)
τ = ac: P_λ(ac | abc), P_λ(ac | acb), P_λ(ac | bac)
τ = ca: P_λ(ca | bca), P_λ(ca | cab), P_λ(ca | cba)
τ = bc: P_λ(bc | abc), P_λ(bc | bac), P_λ(bc | bca)
τ = cb: P_λ(cb | acb), P_λ(cb | cab), P_λ(cb | cba)
τ = a, b, c, or []: defined for every one of the six full rankings

- The actual order of the items remains intact under coarsening.
- The number of probabilities to be specified: 2^K · K!.
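The count 2^K · K! can be checked by enumerating all (rank subset, full ranking) pairs, each of which indexes one coarsening probability (a small sketch):

```python
import itertools

def coarsening_table_size(K):
    """Number of probabilities P_lambda(tau | pi) to specify: one per
    (subset of ranks, full ranking) pair, i.e. 2^K * K!."""
    count = 0
    for pi in itertools.permutations(range(K)):
        for r in range(K + 1):
            for subset in itertools.combinations(range(1, K + 1), r):
                count += 1
    return count
```

coarsening_table_size(3) returns 48 = 2^3 · 3!.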

SLIDE 27

Parametric Modeling of Pλ

More restrictive (stronger) assumptions than rank-dependence:

Top-k: for A ⊆ {1, . . . , K},

P(A) = 1 if A = {1, . . . , k}, and P(A) = 0 otherwise.

Top-k with k a random variable: the positions are discarded with increasing probability, so for A ⊆ {1, . . . , K},

P(A) = ∏_{i∈A} λ_i · ∏_{j∉A} (1 − λ_j),

in which case the coarsening is defined by the K parameters λ1 > λ2 > . . . > λK.
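A quick sanity check (with illustrative λ values) that these rank-retention probabilities define a distribution over subsets of ranks:

```python
import itertools

def subset_prob(A, lam):
    """P(A) for independent rank retention: rank i is kept with
    probability lam[i-1] and dropped with probability 1 - lam[i-1]."""
    p = 1.0
    for i in range(1, len(lam) + 1):
        p *= lam[i - 1] if i in A else 1.0 - lam[i - 1]
    return p

lam = (0.9, 0.5, 0.2)   # lambda_1 > lambda_2 > lambda_3, as required
subsets = [set(c) for r in range(4)
           for c in itertools.combinations((1, 2, 3), r)]
```

The probabilities of all 2^3 rank subsets sum to 1.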
