(Bayesian) Statistics with Rankings
Marina Meilă, University of Washington, www.stat.washington.edu/mmp
with Alnur Ali, Harr Chen, Bhushan Mandhani, Le Bao, Kapil Phadnis, Artur Patterson, Brendan Murphy, Jeff Bilmes
Permutations (rankings)
The data represent preferences.
Burger preferences n = 6, N = 600
med-rare med rare ... done med-done med ... med-rare rare med ...
Elections Ireland,n = 5, N = 1100
Roch Scal McAl Bano Nall Scal McAl Nall Bano Roch Roch McAl
College programs n = 533, N = 53737, t = 10
DC116 DC114 DC111 DC148 DB512 DN021 LM054 WD048 LM020 LM050 WD028 DN008 TR071 DN012 DN052 FT491 FT353 FT471 FT541 FT402 FT404 TR004 FT351 FT110 FT352
Ranking data: discrete, many-valued, with combinatorial structure.
The Consensus Ranking problem
Given a set of rankings {π1, π2, . . . πN} ⊂ Sn find the consensus ranking (or central ranking) π0 that best agrees with the data
Elections Ireland,n = 5, N = 1100
Roch Scal McAl Bano Nall Scal McAl Nall Bano Roch Roch McAl
Consensus = [ Roch Scal McAl Bano Nall ] ?
The Consensus Ranking problem
Problem (also called Preference Aggregation or Kemeny Ranking): given a set of rankings {π1, π2, . . . , πN} ⊂ Sn, find the consensus ranking (or central ranking) π0 such that
π0 = argmin_{π0 ∈ Sn} Σ_{i=1}^N d(πi, π0)
for d = inversion distance / Kendall τ-distance / "bubble sort" distance.
Relevance: voting in elections (APA, Ireland, Cambridge) and panels of experts (admissions, hiring, grant funding); aggregating user preferences (economics, marketing); a subproblem of other problems (building a good search engine: learning to rank [Cohen, Schapire, Singer 99]). Equivalent to finding the "mean" or "median" of a set of points.
Fact: consensus ranking for the inversion distance is NP-hard.
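As a concrete illustration (not from the slides), here is a minimal Python sketch of the problem: it computes the inversion (Kendall τ) distance between two rankings and finds the Kemeny consensus by exhaustive search. The item names and data are hypothetical, and brute force only works for very small n, consistent with the NP-hardness noted above.

```python
from itertools import permutations

def kendall_tau(pi, sigma):
    """Number of item pairs ordered differently by the two rankings (inversion distance)."""
    pos_sigma = {item: r for r, item in enumerate(sigma)}
    d = 0
    n = len(pi)
    for i in range(n):
        for j in range(i + 1, n):
            # pi places pi[i] before pi[j]; count an inversion if sigma disagrees
            if pos_sigma[pi[i]] > pos_sigma[pi[j]]:
                d += 1
    return d

def brute_force_consensus(rankings):
    """Exhaustive Kemeny consensus over complete rankings of the same items --
    only feasible for tiny n, since the problem is NP-hard."""
    items = rankings[0]
    best, best_cost = None, float("inf")
    for candidate in permutations(items):
        cost = sum(kendall_tau(pi, candidate) for pi in rankings)
        if cost < best_cost:
            best, best_cost = list(candidate), cost
    return best, best_cost

# toy data (hypothetical preferences over 4 items)
data = [["c", "a", "b", "d"], ["a", "b", "c", "d"], ["a", "c", "b", "d"]]
print(brute_force_consensus(data))
```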
Consensus ranking problem: π0 = argmin_{π0 ∈ Sn} Σ_{i=1}^N d(πi, π0)
This talk
Will generalize the problem: from finding π0 to estimating a statistical model.
Will generalize the data: from complete, finite permutations to top-t rankings and countably many items (n → ∞). . .
Outline
1. Statistical models for permutations and the dependence of ranks
2. Codes, inversion distance and the precedence matrix
3. Mallows models over permutations
4. Maximum Likelihood estimation: the likelihood; a Branch and Bound algorithm; related work, experimental comparisons; Mallows, GM and other statistical models
5. Top-t rankings and infinite permutations
6. Statistical results: Bayesian estimation, conjugate prior, Dirichlet process mixtures
7. Conclusions
Some notation
Base set { a, b, c, d } contains n items (or alternatives), e.g. { rare, med-rare, med, med-done, . . . }
Sn = the symmetric group = the set of all permutations over n items
π = [ c a b d ] ∈ Sn, a permutation/ranking
π = [ c a ], a top-t ranking (a partial order); t = |π| ≤ n is the length of π
We observe data π1, π2, . . . , πN sampled independently from a distribution P over Sn (P unknown)
Representations for permutations
reference permutation id = [ a b c d ]
π = [ c a b d ]   ranked list
(2 3 1)   cycle representation
[ 2 3 1 4 ]   π as a function on {a, b, c, d}
Π   permutation matrix (0/1 matrix with a single 1 in each row and column)
Q   precedence matrix, Qij = 1 if i ≺π j:
      a b c d
  a [ − 1 0 1 ]
  b [ 0 − 0 1 ]
  c [ 1 1 − 1 ]
  d [ 0 0 0 − ]
(V1, V2, V3) = (1, 1, 0)   code; equivalently (S1, S2, S3) = (2, 0, 0)
Thurstone: Ranking by utility
The Thurstone Model
item j has expected utility µj
sample uj = µj + ǫj, j = 1 : n (independently or not); uj is the actual utility of item j
sort (uj)j=1:n to obtain a π
rich model class; typically ǫj ∼ Normal(0, σj²)
parameters interpretable
some simple probability calculations are intractable:
  P[a ≺ b] tractable, P[i in first place] tractable, P[i in 85th place] intractable
each rank of π depends on all the ǫj
Plackett-Luce: Ranking as drawing without replacement
The Plackett-Luce model
item j has weight wj > 0
P([a, b, . . .]) ∝ ( wa / Σi′ wi′ ) · ( wb / (Σi′ wi′ − wa) ) · . . .
items are drawn "without replacement" from the distribution (w1, w2, . . . , wn) (a Markov chain)
normalization constant Z generally not known
distribution of first ranks approximately independent
item at rank j depends on all previous ranks
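A minimal sketch (mine, not from the slides) of the stagewise "drawing without replacement" view of Plackett-Luce; the item names and weights below are made up for illustration.

```python
import random

def sample_plackett_luce(weights, rng=random):
    """Draw a full ranking: at each stage pick an item with probability proportional to
    its weight among the items not yet ranked ('sampling without replacement')."""
    remaining = dict(weights)          # item -> weight, e.g. {"a": 3.0, "b": 1.0, ...}
    ranking = []
    while remaining:
        items, w = zip(*remaining.items())
        pick = rng.choices(items, weights=w, k=1)[0]
        ranking.append(pick)
        del remaining[pick]
    return ranking

print(sample_plackett_luce({"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}))
```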
Bradley-Terry: penalizing inversions
The Bradley-Terry model
P(π) ∝ exp( −Σ_{i<j} αij Qij(π) )   exponential family model
one parameter αij for every pair (i, j); αij is the penalty for inverting i with j
only qualitative interpretation
normalization constant Z generally not known
transitivity: i ≺ j, j ≺ k ⟹ i ≺ k, therefore the sufficient statistics Qij are dependent
Mallows models
are a subclass of Bradley-Terry models; do not suffer from this dependence; coming next. . .
The precedence matrix Q
π = [ c a b d ]
Q(π) =
      a b c d
  a [ − 1 0 1 ]
  b [ 0 − 0 1 ]
  c [ 1 1 − 1 ]
  d [ 0 0 0 − ]
Qij(π) = 1 iff i before j in π; Qij = 1 − Qji
reference permutation id = [ a b c d ]: determines the order of rows and columns in Q
The number of inversions and Q
π = [ c a b d ], Q(π) as above
define L(Q) = Σ_{i>j} Qij = sum( lower triangle(Q) )
then #inversions(π) = L(Q) = d(π, id)
The inversion distance and Q
π = [ c a b d ], reference permutation id = [ a b c d ]: d(π, id) = L(Q(π)) = 2
Reference permutation π0 = [ b a d c ]: sort the rows and columns of Q(π) by π0, i.e. form Π0ᵀ Q(π) Π0 =
      b a d c
  b [ − 0 1 0 ]
  a [ 1 − 1 0 ]
  d [ 0 0 − 0 ]
  c [ 1 1 1 − ]
d(π, π0) = sum of lower triangle = 4
The inversion distance and Q
To obtain d(π, π0):
1. Construct Q(π)
2. Sort rows and columns by π0
3. Sum the elements in the lower triangle
Note also that, to obtain d(π1, π0) + d(π2, π0) + . . . :
1. Construct Q(π1), Q(π2), . . .
2. Sum Q = Q(π1) + Q(π2) + . . .
3. Sort the rows and columns of Q by π0
4. Sum the elements in the lower triangle of Q
Example: π = [ c a b d ], π0 = [ b a d c ]
      b a d c
  b [ − 0 1 0 ]
  a [ 1 − 1 0 ]
  d [ 0 0 − 0 ]
  c [ 1 1 1 − ]
d(π, π0) = 4
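The procedure above translates almost directly into code. The following Python sketch (my own illustration, with the running example's item names) builds Q(π), permutes its rows and columns by π0, and sums the lower triangle; the printed values reproduce d(π, id) = 2 and d(π, π0) = 4 from the example.

```python
import numpy as np

def precedence_matrix(pi, items):
    """Q with Q[i, j] = 1 iff item i precedes item j in the ranking pi
    (rows/columns ordered by `items`)."""
    idx = {item: k for k, item in enumerate(items)}
    rank = {item: r for r, item in enumerate(pi)}
    n = len(items)
    Q = np.zeros((n, n))
    for a in items:
        for b in items:
            if a != b and rank[a] < rank[b]:
                Q[idx[a], idx[b]] = 1.0
    return Q

def inversion_distance(Q, pi0, items):
    """d(pi, pi0) = sum of the lower triangle of Q after sorting rows/columns by pi0."""
    idx = {item: k for k, item in enumerate(items)}
    order = [idx[item] for item in pi0]
    Qp = Q[np.ix_(order, order)]
    return np.tril(Qp, k=-1).sum()

items = ["a", "b", "c", "d"]
Q = precedence_matrix(["c", "a", "b", "d"], items)
print(inversion_distance(Q, ["a", "b", "c", "d"], items))  # 2
print(inversion_distance(Q, ["b", "a", "d", "c"], items))  # 4
# for a data set, d(pi1,pi0)+d(pi2,pi0)+... = the same computation on Q(pi1)+Q(pi2)+...
```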
A decomposition for the inversion distance
d(π, π0) = # inversions between π and π0
d([ c a b d ], [ b a d c ]) = #(inversions w.r.t. b) [= V1] + #(inversions w.r.t. a) [= V2] + #(inversions w.r.t. d) [= V3] + . . .
Vj = # inversions where π0(j) is disfavored
The code of a permutation
Example: π = [ c a b d ]
W.r.t. the reference id = [ a b c d ] (rows/columns of Q(π) in reference order; the row of the item at rank j in π is labeled Sj, column j is labeled Vj):
      a b c d
  a [ − 1 0 1 ]  S2
  b [ 0 − 0 1 ]  S3
  c [ 1 1 − 1 ]  S1
  d [ 0 0 0 − ]  S4
     V1 V2 V3 V4
code (V1, V2, V3) = ( 1, 1, 0 )  or  (S1, S2, S3) = ( 2, 0, 0 );  d(π, id) = 2
Codes are defined w.r.t. any π0, denoted Vj(π|π0), Sj(π|π0). For π0 = [ b a d c ]:
      b a d c
  b [ − 0 1 0 ]  S3
  a [ 1 − 1 0 ]  S2
  d [ 0 0 − 0 ]  S4
  c [ 1 1 1 − ]  S1
     V1 V2 V3 V4
(V1, V2, V3) = ( 2, 1, 1 )  or  (S1, S2, S3) = ( 3, 1, 0 );  d(π, π0) = 4
Codes and inversion distance summary
Inversion distance facts
d(π, π0) = Σj Vj(π|π0) = Σj Sj(π|π0)
d(π, π0) = L(Π0ᵀ Q(π) Π0) =: Lπ0(Q(π))
Codes facts
(V1:n−1) or (S1:n−1) are defined w.r.t. any reference permutation; we denote them Vj(π|π0) or Sj(π|π0)
(V1:n−1) or (S1:n−1) uniquely represent π, with n − 1 independent parameters
Example (π = [ c a b d ], π0 = [ b a d c ]): (V1, V2, V3) = ( 2, 1, 1 ), (S1, S2, S3) = ( 3, 1, 0 )
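A small Python sketch (my own, not from the slides) that computes both codes for the running example, directly from their definitions; each code sums to the inversion distance.

```python
def codes(pi, pi0):
    """V and S codes of ranking pi relative to reference pi0.

    V[j] = # items ranked below pi0[j] by pi0 but above it by pi
    S[j] = # not-yet-placed items that pi0 prefers to the item pi puts at rank j
    Both sum to the inversion distance d(pi, pi0)."""
    pos_pi = {item: r for r, item in enumerate(pi)}
    pos_pi0 = {item: r for r, item in enumerate(pi0)}
    n = len(pi0)
    V = [sum(1 for other in pi0[j + 1:] if pos_pi[other] < pos_pi[pi0[j]])
         for j in range(n - 1)]
    S, remaining = [], list(pi0)
    for item in pi[:-1]:
        S.append(sum(1 for other in remaining if pos_pi0[other] < pos_pi0[item]))
        remaining.remove(item)
    return V, S

print(codes(["c", "a", "b", "d"], ["a", "b", "c", "d"]))  # ([1, 1, 0], [2, 0, 0])
print(codes(["c", "a", "b", "d"], ["b", "a", "d", "c"]))  # ([2, 1, 1], [3, 1, 0])
```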
The Mallows Model
The Mallows model is a distribution over Sn defined by
Pπ0,θ(π) = (1/Z(θ)) e^{−θ d(π, π0)}
π0 is the central permutation: the mode of Pπ0,θ, unique if θ > 0
θ ≥ 0 is a dispersion parameter; for θ = 0, Pπ0,0 is uniform over Sn
d(π, π0) = Σj Vj(π|π0), therefore Pπ0,θ is a product of Pθ(Vj(π|π0)):
Pπ0,θ(π) = (1/Z(θ)) ∏_{j=1}^{n−1} e^{−θ Vj(π|π0)}   and   Z(θ) = ∏_{j=1}^{n−1} Zj(θ),   Zj(θ) = (1 − e^{−θ(n−j+1)}) / (1 − e^{−θ})
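Because the stagewise codes are independent under the model (d(π, π0) = Σj Sj as well as Σj Vj), one can sample a Mallows ranking stage by stage. The sketch below is my own illustration, assuming θ > 0: each stage's code is drawn from a truncated geometric distribution, and log Z(θ) is evaluated from the product formula above.

```python
import math, random

def sample_mallows(pi0, theta, rng=random):
    """Stagewise sampler: at stage j draw S_j in {0,...,n-j} with P(S_j = s) proportional
    to exp(-theta*s), then place the (S_j+1)-th best remaining item according to pi0.
    Since d(pi, pi0) = sum_j S_j, the resulting pi has probability
    proportional to exp(-theta * d(pi, pi0))."""
    remaining = list(pi0)
    pi = []
    while len(remaining) > 1:
        m = len(remaining)                      # S_j ranges over 0..m-1
        w = [math.exp(-theta * s) for s in range(m)]
        s = rng.choices(range(m), weights=w, k=1)[0]
        pi.append(remaining.pop(s))
    pi.append(remaining.pop())
    return pi

def log_Z(n, theta):
    """log normalization constant: Z(theta) = prod_j (1 - e^{-theta(n-j+1)}) / (1 - e^{-theta})."""
    return sum(math.log((1 - math.exp(-theta * (n - j + 1))) / (1 - math.exp(-theta)))
               for j in range(1, n))

print(sample_mallows(["a", "b", "c", "d"], theta=1.0))
print(log_Z(4, 1.0))
```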
The Generalized Mallows (GM) Model [Fligner, Verducci 86]
Mallows model: Pπ0,θ(π) = (1/Z(θ)) exp( −θ Σ_{j=1}^{n−1} Vj(π|π0) )
Idea: θ → θ⃗ = (θ1, θ2, . . . , θn−1)
Generalized Mallows (GM) model: Pπ0,θ⃗(π) = (1/Z(θ⃗)) ∏_{j=1}^{n−1} e^{−θj Vj(π|π0)}   with   Z(θ⃗) = ∏_{j=1}^{n−1} Zj(θj)
Similar definitions with Sj instead of Vj: models denoted GMV, GMS
Cost interpretation of the GM models
GMV: Cost = Σj θj Vj (pay price θj for every inversion w.r.t. item j)
GMS: Cost = Σj θj Sj (pay price θj for every inversion in picking rank j)
Assume a stepwise construction of π: θj represents the importance of step j
The (Max Likelihood) estimation problem
Burger preferences n = 6, N = 600
med-rare med rare ... done med-done med ... med-rare rare med ...
Data: {πi}i=1:N, an i.i.d. sample from Sn. Model: Mallows Pπ0,θ or GM Pπ0,θ⃗.
Parameter estimation: π0 known, estimate θ or θ⃗. This problem is easy (convex, univariate).
Central permutation estimation: θ known, estimate π0. Known as consensus ranking if θ = 1 (≈ MinFAS). This problem is NP-hard (many heuristic/approximate algorithms exist).
General estimation: estimate both π0 and θ or θ⃗. At least as hard as consensus ranking; will show it's no harder.
The likelihood
Likelihood of (π0, θ) = P[ data | π0, θ ]; Max Likelihood estimation: (π0∗, θ∗) = argmax P[ data | π0, θ ]
Mallows:
logl(θ, π0) = (1/N) ln P(π1:N; θ, π0) = −θ Σ_{j=1}^{n−1} V̄j − Σ_{j=1}^{n−1} ln Zj(θ),   where V̄j = (1/N) Σ_{i=1}^{N} Vj(πi|π0)
Generalized Mallows:
logl(θ⃗, π0) = (1/N) ln P(π1:N; θ⃗, π0) = −Σ_{j=1}^{n−1} [ θj V̄j + ln Zj(θj) ]
The likelihood is separable and concave in each θj ⟹ estimation of θj is straightforward, by convex minimization of θj V̄j + ln Zj(θj) (numerical).
The dependence on π0 is complicated.
ML Estimation of π0: costs and main results
Criterion minimized:
  Mallows, π1:N complete rankings (GMS, GMV): Σ_{j=1}^{n−1} Σi Vj(πi|π0) / N
  Mallows, π1:N top-t rankings, n ≤ ∞ (only GMS): Σ_{j=1}^{t} Σi Sj(πi|π0) / N
  GM, complete rankings: Σ_{j=1}^{n−1} [ θj Σi Vj(πi|π0)/N + ln Zj(θj) ]
  GM, top-t rankings: Σ_{j=1}^{t} [ θj Σi Sj(πi|π0)/N + ln Zj(θj) ]
Main results:
  Mallows, complete rankings [M&al07]: π0ML can be found exactly by B&B search on the matrix Q(π1:N).
  Mallows, top-t rankings [MBao08]: π0ML can be found exactly by B&B search on the matrix R(π1:N).
  GM, complete rankings [M&al07]: π0ML, θ⃗ML can be found exactly by B&B search on the matrix Q(π1:N).
  GM, top-t rankings [MBao08]: a local maximum for (π0, θ⃗) can be found by alternate maximization: π0 | θ⃗ by B&B, θ⃗ | π0 by convex unidimensional optimization.
Q(π1:N) = Σ_{i=1:N} Q(πi);  R(π1:N) = Σ_{i=1:N} R(πi) (defined next)
B&B = branch-and-bound; the search may not be tractable
Sufficient statistics (complete permutations) [M&al07]
[Figure: Q(π) and Q̄ for large samples from Mallows models with θ = 1, θ = 0.3, θ = 0.03]
Define Q̄ ≡ Q(π1:N) = (1/N) Σ_{i=1}^{N} Q(πi)
The sufficient statistics are the sum (average) of the preference matrices of the data
Search Algorithm Idea
Wanted: argmin_{π0} L(Π0ᵀ Q̄ Π0) = argmin_{π0} Lπ0(Q̄) = the minimum of the lower-triangle sum of Q̄ over all simultaneous row and column permutations
[Figures: rows and columns of Q̄ progressively reordered, one rank at a time]
The Branch-and-Bound Algorithm
Key observation: the cost of each decision can be computed locally at the node.
[Figure: search tree over partial orderings; the total cost of a permutation, e.g. (2 3 1 4), is the total cost along its path]
Branch and Bound algorithm
Node ρ stores: rj, parent , j = |ρ|, Vj(ρ), θj, C(ρ), L(ρ); S = priority queue with nodes to be expanded. Initialize: S = {ρ∅}, ρ∅ =the empty sequence, j = 0, C(ρ∅) = V (ρ∅) = L(ρ∅) = 0 Repeat remove ρ ∈ argmin
ρ∈S
L(ρ) from S if |ρ| = n (Return) Output ρ, L(ρ) = C(ρ) and Stop. else (Expand ρ) for rj+1 ∈ [n] \ ρ create node ρ′ = ρ|rj+1, Vj+1(ρ′) = Vj(r1:j−1, rj+1) − Qrj rj+1 compute V min = min
rj+1∈[n]\ρ Vj+1(ρ|rj+1)
calculate A(ρ) admissible heuristic [MandhaniM09] for rj+1 ∈ [n] \ ρ
calculate θj+1 from n − j − 1, Vj+1(ρ′)) C(ρ′) = C(ρ) + θj+1Vj+1(ρ′), L(ρ′) = C(ρ′) + A(ρ), store node (ρ′, j + 1, Vj+1, θj+1, C(ρ′), L(ρ′)) in S
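The following Python sketch is my simplification, not the authors' implementation: it illustrates the search idea for the single-θ consensus problem. Prefixes of π0 are expanded best-first; placing item r next adds the newly determined lower-triangle entries Σ_{u unplaced} Q̄[u, r]; the heuristic is taken as zero, so this is plain uniform-cost search (the admissible heuristics of [MandhaniM09] would only prune faster). The matrix Q below is hypothetical.

```python
import heapq
import numpy as np

def consensus_search(Q, items):
    """Best-first search for argmin_{pi0} L_{pi0}(Q): expand prefixes of pi0, where fixing
    item r at the next rank adds sum_{u still unplaced} Q[u, r] to the cost (the newly
    determined lower-triangle entries).  Zero heuristic, so the first complete prefix
    popped is an exact minimizer."""
    idx = {item: k for k, item in enumerate(items)}
    heap = [(0.0, (), frozenset(items))]        # (cost so far, prefix, unplaced items)
    best_seen = {}                              # best known cost per set of unplaced items
    while heap:
        cost, prefix, remaining = heapq.heappop(heap)
        if not remaining:
            return list(prefix), cost
        for r in remaining:
            rest = remaining - {r}
            new_cost = cost + sum(Q[idx[u], idx[r]] for u in rest)
            if rest not in best_seen or new_cost < best_seen[rest]:
                best_seen[rest] = new_cost
                heapq.heappush(heap, (new_cost, prefix + (r,), rest))
    return None

items = ["a", "b", "c", "d"]
# Q summed over a toy data set of 3 complete rankings (hypothetical values)
Q = np.array([[0, 3, 1, 3],
              [0, 0, 1, 2],
              [2, 2, 0, 3],
              [0, 1, 0, 0]], dtype=float)
print(consensus_search(Q, items))
```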
Algorithm summary
Sufficient statistics = Q(π1:N) Cost(π0, θ) = θLπ0(Q(π1:N)) (lower triangle of Q after permuting rows and columns by π0 B&B Algorithm constructs π0 one rank at a time Exact but not always tractable B&B Algorithms exist also for
GMS for multiple parameters θ
Performance issues
Admissible heuristics help Beam search and other approximations possible
What makes the search hard (or tractable)?
Running time = time( compute Q̄ ) + time( B&B ): the first is O(n²N), the second is independent of N
Number of nodes explored by B&B: independent of the sample size N, independent of π0, depends on the dispersion θML
  θ = 0 ⇒ uniform distribution, all branches have equal cost
  θML_{1:n−1} large ⇒ likelihood decays fast around π0ML ⇒ pruning efficient
Theoretical results: e.g. if θj > Tj, j = 1 : n − 1, then the B&B search defaults to greedy
Practically: diagnoses are possible during the B&B run
Admissible heuristics
To guarantee optimality we need lower bounds for the cost-to-go (admissible heuristics)
admissible heuristic for the Mallows model [MPPB07]; improved heuristic for the Mallows model [Mandhani, M 09]; first admissible heuristic for the GMM model
If data ∼ Pθ,π0 with θ large and an admissible heuristic A is known ⇒ the number of expanded nodes is bounded above
Related work I
ML Estimation [FV86]: θ estimation; heuristic for π0
FV algorithm / Borda rule
1. Compute q̄j, j = 1 : n, the column sums of Q̄
2. Sort (q̄j)_{j=1}^{n} in increasing order; π0 is the sorting permutation
The q̄j are Borda counts. FV is consistent for infinite N.
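A sketch of the FV/Borda heuristic as described above (my code, reusing the hypothetical Q matrix): column sums of Q̄ are the Borda counts, and sorting them in increasing order gives the estimate of π0.

```python
import numpy as np

def fv_borda(Q, items):
    """FV / Borda heuristic: column sums of the precedence matrix Q_bar are the Borda
    counts; sorting them in increasing order gives an estimate of pi_0."""
    col_sums = Q.sum(axis=0)       # q_bar_j: how often item j is preceded by other items
    order = np.argsort(col_sums)   # fewer predecessors = higher rank
    return [items[k] for k in order]

items = ["a", "b", "c", "d"]
Q = np.array([[0, 3, 1, 3],
              [0, 0, 1, 2],
              [2, 2, 0, 3],
              [0, 1, 0, 0]], dtype=float)   # hypothetical values
print(fv_borda(Q, items))
```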
Related work II
Consensus Ranking (θ = 1)
[CSS99] CSS algorithm = greedy search on Q̄, improved by extracting strongly connected components
[Ailon, Newman, Charikar 05] randomized algorithm with a guaranteed 11/7-factor approximation (ANC)
[Mohri, Ailon 08] linear program
[Mathieu, Schudy 07] (1 + ǫ)-approximation, time O(n⁶/ǫ + 2^{2^{O(1/ǫ)}})
[Davenport, Kalagnanam 03] heuristics based on edge-disjoint cycles; used by our B&B implementation
[Conitzer, D, K 05] exact algorithm based on integer programming, better bounds from edge-disjoint cycles (DK)
[Betzler, Brandt 10] exact problem reductions
Most of this work is based on the MinFAS view: Qij > .5 ⇔ edge i → j with weight Qij − .5; prune the graph to a DAG by removing minimum weight
Related work III
Extensions and applications to social choice
Inferring rankings under partial and aggregated information [Shah, Jagabathula 08], [Jagabathula, Farias, Shah 10]
Vote elicitation under probabilistic models of choice [Lu, Boutilier 11]
Voting rules viewed as Maximum Likelihood [Conitzer, Sandholm 08]
. . .
When is the B&B search tractable? I
[Figure: excess cost w.r.t. B&B; data from a Mallows model, n = 100, N = 100; regimes labeled hard (uninteresting?), interesting, easy]
[Figure: running time vs. number of items n; data generated from Mallows(θ); curves for 15, 25, 50 items]
Extensive comparisons
Experimental setup from [Coppersmith&al07]; experiments by Alnur Ali [AliM11]
Data: artificial (Mallows and Plackett-Luce), Ski, Web-search; 45 data sets in total, n = 50 . . . 350, N = 4 . . . 100 typically
Algorithms: ILP, LP, B&B (with limited queue), Local Search (LS), FV/Borda, QuickSort (QS), . . . and combinations (104 algorithms in total)
[Figure: Websearch data; B&B is competitive (Local Search, B&B, other)]
Other statistical models on rankings
Several "natural" parametric distributions on Sn exist.
P(π) ∝ exp( −Σ_{j=1}^{n−1} θj Vj(π) )   Generalized Mallows
P(π) ∝ exp( −Σ_{i<j} αij Qij(π) )   Bradley-Terry
Mallows ⊂ GM ⊂ Bradley-Terry
Plackett-Luce: item j has weight wj > 0, P([a, b, . . .]) ∝ ( wa / Σi′ wi′ ) · ( wb / (Σi′ wi′ − wa) ) · . . .
Thurstone: item j has utility µj; sample uj = µj + ǫj, j = 1 : n independently; sort (uj)j=1:n ⇒ π

                           GM     B-T    P-L    T
Discrete parameter         yes    no     no     no
Tractable Z                yes    no     no     no
"Easy"* param estimation   yes    no     no     Gauss
Tractable marginals        yes    no     no     Gauss**
Params "interpretable"     yes    no     no     Gauss

* refers to continuous parameters   ** for top ranks
The GM model is computationally very appealing; the advantage comes from the codes (Vj), (Sj). The discrete parameter makes for challenging statistics.
Top-t rankings and very many items
Elections Ireland,n = 5, N = 1100
Roch Scal McAl Bano Nall Scal McAl Nall Bano Roch Roch McAl
College programs n = 533, N = 53737, t = 10
DC116 DC114 DC111 DC148 DB512 DN021 LM054 WD048 LM020 LM050 WD028 DN008 TR071 DN012 DN052 FT491 FT353 FT471 FT541 FT402 FT404 TR004 FT351 FT110 FT352
Bing search: UW Statistics n → ∞
www.stat.washington.edu/ www.stat.wisc.edu/ www.stat.washington.edu/courses collegeprowler.com/university-of-washington/statistics ...
Models for Infinite permutations
The domain of items to be ranked is countable, i.e. n → ∞; we observe the top t ranks of an infinite permutation
Examples
  Bing search "UW Statistics": www.stat.washington.edu/ www.stat.wisc.edu/ www.stat.washington.edu/courses collegeprowler.com/university-of-washington/statistics ...
  searches in databases of biological sequences (by e.g. Blast, Sequest, etc.)
  open-choice polling, "grassroots elections", college program applications
Mathematically more natural
  for large n, models should not depend on n
  models can be simpler and more elegant than for finite n
Top-t rankings: GMS, GMV are not equivalent
π0 = [ a b c d ], π = [ c a ]
S code: π(1) = c, S1 = 2; π(2) = a, S2 = 0; π(3) = ?, S3 not needed
  Pπ0,θ⃗(π) ∝ ∏_{j=1}^{t} e^{−θj Sj}   (sufficient statistics exist)
V code: π0(1) = a, V1 = 1; π0(2) = b, V2 ≥ 1; π0(3) = c, V3 = 0
  Pπ0,θ(π) = ∏_{j=1}^{n−1} [ e^{−θ Vj} if π0(j) ∈ π; Pθ(Vj ≥ vj) if π0(j) ∉ π ]   (no sufficient statistics)
Example: π = [ c a ], Q(π) =
      a b c d
  a [ − 1 0 1 ]  S2
  b [ 0 − 0 ? ]
  c [ 1 1 − 1 ]  S1
  d [ 0 ? 0 − ]
     V1 V2 V3 V4
The Infinite Generalized Mallows Model (IGM) [MBao08]
Pπ0,θ⃗(π) = ( 1 / ∏_{j=1}^{t} Z(θj) ) exp( −Σ_{j=1}^{t} θj Sj(π | π0) )
a distribution over top-t rankings
π0 is a permutation of {1, 2, 3, . . .}, a discrete, infinite "location" parameter
θ1:t > 0 are dispersion parameters
a product of t independent univariate distributions
normalization constant Z(θj) = 1/(1 − e^{−θj})
Pπ0,θ⃗(π) is a well defined marginal over the coset defined by π
IGM versus GM
Pπ0,θ⃗(π) = ( 1 / ∏_{j=1}^{t} Z(θj) ) exp( −Σ_{j=1}^{t} θj Sj(π | π0) )
all Sj have the same range {0, 1, 2, . . .}
Z has a simpler formula
only top-t rankings are observed
Sufficient statistics for top-t permutations [MBao09]
The sufficient statistics are t n × n precedence matrices R1, . . . , Rt
Lemma: Sj(π|π0) = Lπ0(Rj(π))
(Rj)kl = 1 iff item k is at rank j and item l is after k (observed or not)
(R1, . . . , Rt) are sufficient statistics for multiple θ⃗ (GMS); R = Σ_{j=1}^{t} Rj is a sufficient statistic for a single θ (Mallows)
[Figure: sufficient statistics matrices for N = 2, n = 12 and for N = 100, n = 12, t = 5]
Infinite Mallows Model: ML estimation
Theorem [M,Bao 08] Sufficient statistics:
  n = # distinct items observed in the data
  T = # total items observed in the data
  Q̄ = [Q̄kl]k,l=1:n, the frequency of k ≺ l in the data
  q = [qk]k=1:n, the frequency of k in the data
  R = q 1ᵀ − Q̄, the sufficient statistics matrix
log-likelihood(π0, θ) = θ Lπ0(R) = θ Sum( lower triangle( R permuted by π0 ) )
The optimal π0ML can be found exactly by a B&B algorithm searching on the matrix R.
The optimal θML is given by θ = log( 1 + T / Lπ0(R) )
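A possible reading of the theorem in code (my sketch, with made-up top-t data): R is accumulated directly from the per-rank definition (Rj)kl = 1 iff item k is at rank j and l comes after it or is unobserved, and then θML = log(1 + T / Lπ0(R)) for a given π0 over the observed items.

```python
import math
import numpy as np

def infinite_mallows_theta_ML(top_t_rankings, pi0):
    """ML dispersion for the single-theta Infinite Mallows model, following the sufficient
    statistics of [M,Bao 08]: R[k, l] = #{rankings in which item k is observed and item l
    comes after k (or is not observed at all)}; then
    theta_ML = log(1 + T / L_{pi0}(R)), where T is the total number of observed entries and
    L_{pi0}(R) sums the lower triangle of R after ordering rows/columns by pi0."""
    items = list(pi0)                              # pi0 must cover all observed items
    idx = {item: k for k, item in enumerate(items)}
    n = len(items)
    R = np.zeros((n, n))
    T = 0
    for pi in top_t_rankings:
        T += len(pi)
        for j, k in enumerate(pi):
            after = set(items) - set(pi[: j + 1])  # observed later, or never observed
            for l in after:
                R[idx[k], idx[l]] += 1
    order = [idx[item] for item in pi0]
    L = np.tril(R[np.ix_(order, order)], k=-1).sum()
    return math.log(1 + T / L)

# toy top-t data over items a..d (hypothetical), with pi0 = [a, b, c, d]
data = [["a", "b"], ["a", "c"], ["b", "a", "c"]]
print(infinite_mallows_theta_ML(data, ["a", "b", "c", "d"]))
```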
Infinite GMM: ML estimation
Theorem [M,Bao 08] Sufficient statistics:
  n = # distinct items observed in the data
  Nj = # total permutations with length ≥ j
  Q(j) = [Q(j)kl]k,l=1:n, j=1:t, the frequency of 1[π(k) = j, π(l) < j] in the data
  q(j) = [q(j)k]k=1:n, the frequency of k in rank j in the data
  R(j) = q(j) 1ᵀ − Q(j), the sufficient statistics matrices
For θ1:t given, the optimal π0ML can be found exactly by a B&B algorithm searching on the matrix R(θ⃗) = Σj θj R(j); the cost is Lπ0(R(θ⃗)) = Sum( lower triangle( R(θ⃗) permuted by π0 ) )
The optimal θjML is given by θj = log( 1 + Nj / Lπ0(R(j)) )
Hence, alternate maximization will converge to a local optimum
ML Estimation: Remarks
The sufficient statistics Q̄, q, R are finite for finite sample size N, but they don't compress the data
The data determine only a finite set of parameters: π0 is restricted to the observed items, θ⃗ to the observed ranks
A similar result holds for finite domains
GM are exponential family models I
GMV for complete rankings and GMS for top-t rankings, n finite or ∞:
  have finite sufficient statistics
  are exponential family models in (π0, θ⃗)
  have conjugate priors
Hyperparameters: N0 > 0, an equivalent sample size; Q0 (or R0j) ∈ R^{n×n}, equivalent sufficient statistics
The conjugate prior I
Hyperparameters: N0 > 0, Q0 (or R0j) ∈ R^{n×n}
The conjugate prior (for GMS, top-t, n finite or ∞), informative for both π0 and θ⃗:
P0(π0, θ⃗) ∝ exp( −N0 Σ_{j=1}^{t} ( θj Lπ0(R0j) + ln Zj(θj) ) )
          ∝ exp( −N0 Σ_{j=1}^{t} ( sum of lower triangle( Π0 R0j Π0ᵀ Θ ) + ln Zj(θj) ) )
          ∝ exp( −N0 D( Pπ00,θ⃗0 || Pπ0,θ⃗ ) )
with (π00, θ⃗0) the ML estimates from the sufficient statistics R0_{1:t}, Π0 the permutation matrix of π0, and Θ the diagonal matrix of θ⃗
Non-informative for π0: P0(π0, θ⃗ | r1:t, N0) ∝ exp( −N0 Σ_{j=1}^{t} ( θj rj + ln Zj(θj) ) )
Bayesian Inference: What operations are tractable?
Posterior: P0(π0, θ⃗ | π1:N) ∝ exp( −Σj [ θj ( N0 rj + N Lπ0(Rj) ) + (N0 + N) ln Z(θj) ] )
computing the unnormalized prior and posterior
computing the normalization constant of the prior and posterior: ?
MAP estimation: produces π0Bayes, θ⃗Bayes (by B&B)
model averaging P(π | N0, r, π1:N) = Σ_{π0} ∫ GMS(π | π0, θ⃗) P(π0, θ⃗ | N0, r, π1:N) dθ⃗: ?
sampling from P(π0, θ⃗ | N0, r, π1:N): sometimes
Bayesian non-parametric clustering (aka Dirichlet Process Mixture Models, DPMM): is it efficient?
Clustering with Dirichlet mixtures via MCMC
General DPMM estimation algorithm [Neal03]: MCMC estimation for a Dirichlet mixture
Input: α, g0, β, {f}, D
State: cluster assignments c(i), i = 1 : n; parameters θk for all distinct clusters k
Iterate:
1. (reassign data to clusters) for i = 1 : n:
   1. if nc(i) = 1, delete this cluster and its θc(i)
   2. resample c(i):
        existing cluster k   w.p. ∝ nk,−i / (n − 1 + α) · f(xi, θk)
        new cluster          w.p. ∝ α / (n − 1 + α) · ∫ f(xi, θ) g0(θ) dθ     (1)
      (nk,−i = size of cluster k not counting point i)
   3. if c(i) is a new label, sample a new θc(i) from g0
2. (resample cluster parameters) for k ∈ {c(1 : n)}:
   1. sample θk from the posterior gk(θ) ∝ g0(θ, β) ∏_{i∈Ck} f(xi, θ)
gk can be computed in closed form if g0 is a conjugate prior
Output: a state with high posterior
Gibbs Sampling Algorithm for DPM of GMs [M,Chen 10]
Input: parameters N0, r, t, data π1:n; initialization
Denote c(i) = the cluster label of πi; π0c, θ⃗c, Nc the parameters and sample size for cluster c; N = Σc Nc
Repeat:
1. (Reassign points to clusters) for all points πi, resample c(i):
     existing cluster c   w.p. ∝ nc,−i / (n − 1 + N0) · P(πi | π0c, . . .)
     new cluster          w.p. ∝ N0 / (n − 1 + N0) · Z1/n!
2. (Resample cluster parameters) for all clusters c:
     sample π0c ∼ P(π0; N0, r, πi∈c) directly for Nc = 1; Gibbs θ⃗ | π0, π0 | θ⃗ for Nc > 1
We use Lemmas 1–5 (coming next) to approximate the integrals and to sample
Main idea: replace GMS with the simpler Infinite GM
Integrating the posterior: some results I
Model: GMS, n = ∞
Prior: uninformative, P0(π0, θ⃗) ∝ exp( −N0 Σj ( θj rj + ln Z(θj) ) ) (improper for π0!), with Z(θ) = 1/(1 − e^{−θ})
Data: π1, . . . , πN top-t rankings, sufficient statistics R1:t, total observed items t ≤ nobs ≤ Nt
Posterior: P0(π0, θ⃗ | π1:N) ∝ exp( −Σj [ θj ( N0 rj + N Lπ0(Rj) ) + (N0 + N) ln Z(θj) ] )
Denote S̄j = Lπ0(Rj)
Lemma 1 [MBao08] (posterior of π0 and of θj | π0)
P(θj | π0, N0, r, π1:N) = Beta( e^{−θj}; N0 rj + S̄j, N0 + N + 1 )
P(π0 | N0, r, π1:N) ∝ ∏_{j=1}^{t} Beta( N0 rj + S̄j, N0 + N + 1 )
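Lemma 1 makes posterior sampling of θj | π0 trivial: as I read it, e^{−θj} has a Beta(N0 rj + S̄j, N0 + N + 1) posterior, so one samples a Beta variate and takes the negative log. A small numpy sketch with hypothetical values:

```python
import numpy as np

def sample_theta_posterior(S_j, N, N0, r_j, size=1, rng=None):
    """Posterior draw of theta_j given pi_0 (following Lemma 1 of [MBao08]):
    x = exp(-theta_j) is Beta(N0*r_j + S_j, N0 + N + 1), so sample x and set theta_j = -log x."""
    rng = rng or np.random.default_rng()
    x = rng.beta(N0 * r_j + S_j, N0 + N + 1, size=size)
    return -np.log(x)

# hypothetical values: S_j = L_{pi0}(R_j) = 40 inversions at rank j, N = 100 rankings,
# prior equivalent sample size N0 = 1 with prior rate r_j = 0.5
print(sample_theta_posterior(S_j=40, N=100, N0=1.0, r_j=0.5, size=3))
```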
Integrating the posterior: some results II
Lemma 2 [MChen10] (normalized posterior for N = 1): Z1 = (n − t)! / n!
Lemma 3 (Bayesian averaging over θ⃗):
P(π | π0, N0, r, π1:N) = ∏_{j=0}^{t} Beta( Sj(π|π0) + N0 rj + S̄j, N0 + N + 2 ) / Beta( N0 rj + S̄j, N0 + N + 1 )
Lemma 4 Exact sampling of π0 | θ⃗ from the posterior is possible by stagewise sampling:
P(π0 | θ⃗, N0, r, π1:N) ∝ exp( −Σj θj Lπ0(Rj) )
Integrating the posterior: some results III
The posterior of π0 is informative only for the items observed in π1:N, and uniform over all other items.
Wanted: to sum out the permutation of the unobserved items.
Example: π = [ c a b d ], the data π1:N contain obs = {a, c, d, e, . . .} but not b
Lemma 5
P(π | π0|obs) = ∏_{j: π(j)∈obs} Beta( Sj(π|π0) + N0 rj + S̄j, N0 + N + 2 ) · ∏_{j: π(j)∉obs} Beta( tj + N0 rj + S̄j, N0 + N ) / ∏_{j=0}^{t} Beta( N0 rj + S̄j, N0 + N + 1 )
Useful? Good approximations for n finite
DPMM estimation, artificial data
[Figure: K = 15 clusters, n = 10, t = 6, N = 30 × K, θj = 1]
Ireland 2000 Presidential Election
n = 5 candidates; votes = ranked lists of 5 or fewer candidates
individuals grouped by preferences; multimodal distribution ⇒ clustering problem
  parametric, model based: EM algorithm [Busse07]
  nonparametric: EBMS, Exponential Blurring Mean Shift [MBao08]
  nonparametric, model based: DPMM, Dirichlet Process Mixtures [MChen10]
Ireland Presidential Election
n = 5, t = 1 : 5, N = 1083; found 12 clusters, sizes 236, . . . , 1
Candidates: Mary McAleese (Fianna Fáil and Progressive Democrats), Rosemary Scallon (Independent), Derek Nally (Independent), Mary Banotti (Fine Gael), Adi Roche (Labour)
Work in progress: this clustering is different from [Murphy&Gormley]
College program admissions, Ireland
n = 533 programs, N = 53737 candidates, t = 10 options
DC116 DC114 DC111 DC148 DB512 DN021 LM054 WD048 LM020 LM050 WD028 DN008 TR071 DN012 DN052 FT491 FT353 FT471 FT541 FT402 FT404 TR004 FT351 FT110 FT352
Data = all candidates' rankings of college programs in 2000, from [GormleyMurphy03] (they used EM for a mixture of Plackett-Luce models); we [MChen10, Ali Murphy M Chen 10] used DPMM (parameters adjusted to . . . )
College program rankings: are there clusters?
- θc
33 clusters cover 99% of the data
- θc parameters large –
cluster are concentrated number of significant ranks in σc, θc vary by cluster
College program rankings: are the clusters meaningful?
Cluster   Size    Description                  Male (%)   Points avg (std)
1         4536    CS & Engineering             77.2       369 (41)
2         4340    Applied Business             48.5       366 (40)
3         4077    Arts & Social Science        13.1       384 (42)
4         3898    Engineering (Ex-Dublin)      85.2       374 (39)
5         3814    Business (Ex-Dublin)         41.8       394 (32)
6         3106    Cork Based                   48.9       397 (33)
. . .     . . .   . . .                        . . .      . . .
33        9       Teaching (Home Economics)    0.0        417 (4)
Clusters differentiate by subject area, also by geography, and show gender differences in preferences
College program rankings: the "prestige" question
Question: are choices motivated by "prestige" (i.e. high point requirements (PR))? If yes, then PR should be decreasing along the rankings.
[Figures: PR overall (quantiles); PR for each cluster and rank]
Unclustered data: PR decreases monotonically with rank. Clustered data: PR is not always monotonic.
Simpson's paradox!
Summary: Contributions to the GM model
For the consensus ranking problem: new B&B formulation
  a theoretical analysis tool: intuition on problem hardness; admissible heuristics provide bounds on run time
  a competitive algorithm in practice
For top-t rankings (single θ)
  gave the correct sufficient statistics, so all old algorithms can be used on them
  B&B algorithm (theoretical and practical tool)
For an infinite number of items (single or multiple θ)
  introduced the Infinite GM model
  gave sufficient statistics and an estimation algorithm
  introduced the conjugate prior and studied its properties
Bayesian estimation / DPMM clustering (for finite top-t rankings)
  efficient (approximate) Gibbs sampler for DPMM
(Not mentioned here)
  confidence intervals, convergence rates
  model selection (BIC for GMM)
  EBMS non-parametric clustering
  marginal calculation is polynomial
Conclusions
Why the GM model?
  Recognized as good/useful in applications
  Complementarity: utility-based ranking models (Thurstone) vs. stagewise ranking models (GM), which are combinatorial
  Nice computational properties; analyzable statistically
  The code grants the GM model its tractability: a representation with independent parameters
The bigger picture: statistical analysis of ranking data combines combinatorics and algebra, algorithms, and statistical theory
Thank you
Extensive comparisons I
[Figure: new experiment, Websearch data, all relevant algorithms (Local Search, B&B, other)]
Extensive comparisons II
[Figure: Websearch data, all relevant algorithms (detail)]
Extensive comparisons III
[Figure: Websearch data, all relevant algorithms (more detail)]
Extensive comparisons IV
[Figure: ranks of the B&B algorithms among all other algorithms (cost)]
Sufficient statistics spaces I
space of sufficient statistics: Q = { Q̄ = (1/N) Σ_{i=1}^{N} Q(πi) } = convex(Sn)
Q = convex_{1+n(n−1)/2}(Sn) by Caratheodory's theorem
space of means (marginal polytope) of the GM model: M = { Eπ0,θ[Q] }, characterized algorithmically [M&al07]; [Mallows 57] for Mallows
the GM model is a curved exponential family; the full exponential family = Bradley-Terry model
  not tractable / loses the nice computational and interpretational properties
GM ⊂ full model [Fligner, Verducci 88] ⊂ Bradley-Terry
open problem: tractable (exact) ML estimation of the full model, the Bradley-Terry model ∝ exp( −Σ_{i<j} αij Qij(π) )
the heuristic of [Fligner, Verducci 88] works reasonably well for the full model
Consistency and unbiasedness of ML estimates I
Qij/N → P[ item i ≺π0 item j ] as N → ∞ [FV86]. Therefore:
  for any π0 fixed, θML is consistent [FV86]
  the discrete parameter π0ML is consistent when the θj are non-increasing [FV86; M, in preparation] (joint work with Hoyt Koepke)
Is θML "unbiased"?
Theorem 1 [M, in preparation] For any finite N, E[θML] > θ. Bias! The order of magnitude of θML − θ is 1/√N w.h.p.
The Bias of θML
[Figure: artificial data from the Infinite GM; θj estimates for j = 1 : 8 and sample sizes N = 200, 2000]
Convergence rates [M, in preparation] I
Theorem 2 For the Mallows (single θ) model and sample size N sufficiently large,
( 2 ch(θ) )^{−N} ≤ P[ π0ML ≠ π0 ] ≤ ( n(n − 1)/2 ) ( 2 ch(θ) )^{−N}
Theorem 3 For the GM model with θ⃗ > 0 strongly unimodal, θ⃗ and π0 unknown,
P[ π0ML ≠ π0 ] = O( e^{−c(θ⃗) N} )