Unsupervised Rank Aggregation with Distance-Based Models Kevin - PowerPoint PPT Presentation

Unsupervised Rank Aggregation with Distance-Based Models Kevin Small Tufts University Collaborators: Alex Klementiev (Johns Hopkins University) Ivan Titov (Saarland University) Dan Roth (University of Illinois)

Motivation Query � � Consider a panel of judges Each independently generates (partial) � � rankings over objects to the best of their ability � � The need to meaningfully aggregate their output is a fundamental problem Applications are plentiful in Information � � Retrieval and Natural Language Processing 2

Multilingual Named Entity Discovery � � Named Entity Discovery [ Klementiev & Roth, ACL 06 ] : given a bilingual corpus one side of which is annotated with Named Entities, find their counterparts in the other Candidate Candidate Candidate Candidate r1 r1 r1 r1 r2 r2 r2 r2 r3 r3 r3 r3 r4 r4 r4 r4 �� 1 1 1 1 1 3 3 �� 3 3 3 27 27 4 14 �� 2 2 2 3 3 1 1 guimaraes �� 59 2 59 2 59 31 12 �� 7 7 7 17 51 32 17 �� NEs are often transliterated: rank according to a transliteration model score � � NEs tend to co-occur across languages: rank according to temporal alignment � � NEs tend to co-occur in similar contexts: rank according to contextual similarity � � NEs tend to co-occur in similar topics: rank according to topic similarity � � etc. 3

Overview of Our Approach � � We propose a formal framework for unsupervised structured label aggregation � � Judges independently generate a (partial) labeling attempting to reproduce the true underlying label based on their expertise in a given domain � � We derive an EM-based algorithm treating the votes of individual judges and the true label as the observed and unobserved data, respectively � � Intuition : experts in a given domain are better at generating votes close to true ranking and will tend to agree with each other, while the non-experts will not � � We instantiate the framework for the cases of combining permutations, combining top-k lists, and combining dependency parses . 4

Notation � � Permutation � over n objects x 1 … x n 2 1 � � e = (1,2,...,n) is the identity permutation 3 � � Set S n of all n! permutations 4 � � Distance d : S n � S n � R + between permutations � � E.g. Kendall’s tau distance : minimum number of adjacent transpositions 2 2 1 4 d K ( , ) = 3 3 3 4 1 � � d is assumed to be invariant to arbitrary re-labeling of the n objects � � d( � , � ) = d(e, � ) = D( � ). If � is a r.v., so is D=D( � ) 2 2 1 1 1 1 4 2 4 4 d K ( , ) = d K ( , ) = D K ( ) = 3 3 3 3 3 3 4 1 4 2 2 5

Background: Mallows Models Uniform when � = 0 “Peaky” when σ | � | is large where Expensive to compute � � is the dispersion parameter θ ∈ R , θ ≤ 0 � � is the location parameter σ ∈ S n � � d( . , . ) right-invariant, so does not depend on � Z ( θ, σ ) D ( π ) = � m � � If D can be decomposed where are indep. V i i =1 V i ( π ) r.v.’s, then may be efficient to compute [Fligner and Verducci ‘86] E θ ( D ) 6

Generative Story for Aggregation p( � ) Generate the true � according to prior p( � ) � Draw � 1 … � K independently from K Mallows models p( � i | p( � 1 | � 1 , � ) p( � 2 | � 2 , � ) p( � K | � K , � ) … � i , � ) , with the same location parameter � � 1 � 2 � K K � p ( π, σ | θ ) = p ( π ) p ( σ i | θ i , π ) i =1 7

Background: Extended Mallows Models σ ∈ S K The associated conditional model (when votes of K judges are n available) proposed in [Lebanon and Lafferty ’02] : where Free parameters represent the degree of expertise of θ ∈ R K , θ ≤ 0 individual judges. It is straightforward to generalize both models to partial rankings by constructing appropriate distance functions 8

Outline � � Motivation Introduction � � Problem Statement and background � � Overview of our approach � � Background � � Mallows models / Extended Mallows models � � Unsupervised Learning and Inference Our contribution � � Incorporating domain-specific expertise � � Instantiations of the framework � � Combining permutations / top-k lists � � Experiments � � Dependency Parsing � � Conclusions 9

Our Approach [ICML 2008] � � We propose a formal framework for unsupervised rank aggregation based on the extended Mallows model formalism � � We derive an EM-based algorithm to estimate model parameters � � � Observed data: votes of individual judges � � Unobserved data: true ranking Judge 1 Judge 2 Judge K � (1) (1) (1) (1) … � 1 � 2 � K � (2) (2) (2) (2) … � 1 � 2 � K Q … � (Q) (Q) (Q) … (Q) � 1 � 2 � K 10

Learning ′ Denoting to be the value of parameters from the previous iteration, the θ M step for the i th ranker is: LHS RHS In general, > n! Average distance between votes > (n!) Q computations Marginal of the unobserved of the i th ranker and π (1 ..Q ) data π (1 ..Q ) computations 11

Learning and Inference LHS RHS Learning (estimating � ) � � For K constituent rankers, repeat: � � Estimate the RHS given current parameter values � � � Sample with Metropolis-Hastings Depends on structure type, � � Or use heuristics more about this later � � Solve the LHS to update � � � Efficient estimation can be done for particular types of distance functions Inference (computing the most likely ranking) � � Sample with Metropolis-Hastings or use heuristics 12

Domain-specific expertise? [IJCAI 2009] � � Relative expertise may not stay the �� Query same May depend on the type of objects � � May depend on the type of query � � � � Typically, ranked supervised data to estimate judges’ expertise is very expensive to obtain Especially for multiple types � � 13

Mallows Models with Domain-Specific Expertise σ ∈ S K The associated conditional model (when votes of K judges are n available) can be derived: �� K � exp i =1 θ t,i d ( π, σ i ) p ( π, t | σ , θ , α ) = α t Z ( θ , σ ) Free parameters represent the degree of expertise of θ ∈ R T × K , θ ≤ 0 individual judges. are the mixture weights. α ∈ R T Note: it is straightforward to generalize these models to other structured labels (e.g. partial rankings) by constructing appropriate distance functions 14

Learning Q 1 � � p ( π ( j ) , t | σ ( j ) , θ ′ , α ′ ) α t = Q 1 j =1 π ( j ) ∈S n Q 1 d ( π ( j ) , σ ( j ) � � i ) p ( π ( j ) , t | σ ( j ) , θ ′ , α ′ ) E θ t,i ( D ) = α t Q 3 2 j =1 π ( j ) ∈S n � � For each of i th ranker and t th type: ′ α � � � Estimate (1) and (2) given current parameter values and E θ t,i ( D ) α t θ � � Solve 3 to update θ t,i � � Repeat 15

Instantiating the Framework � � We have not committed to a particular type of structure � � In order to instantiate the framework: � � Design a distance function appropriate for the setting � � If a function if right invariant and decomposable [LHS] estimation can be done quickly � � Design a sampling procedure for learning [RHS] and inference 16

Case 1: Combining Permutations [LHS] � � Kendall tau distance D K is the minimum number of adjacent transpositions needed to transform one permutation into another � � Can be decomposed into a sum of independent random variables: n − 1 � � I ( π − 1 ( i ) − π − 1 ( j )) where V i ( π ) = D K ( π ) = V i ( π ) i =1 j>i V i 3 � � And the expected value can be shown to be: 4 1 2 1 3 n 1 5 ne θ je θj � 0 E θ ( D K ) = 1 − e θ − 1 1 − e θj 1 7 j =1 6 0 Monotonically decreasing, can find � with line search quickly 17

Case 1: Combining Permutations [RHS] Sampling from the base chain of random transpositions � � Start with a random permutation π � � � If chain is at , randomly transpose two objects forming π a = p ( π � | θ , σ ) /p ( π | θ , σ ) ≥ 1 π � � � If chain moves to π � � � Else, chain moves to with probability a � � Note that we can compute distance incrementally, i.e. add the change due to a single transposition � � Convergence � � n log(n) if d is Cayley’s distance [Diaconis ’98] , likely similar for some others � � No convergence results for general case, but it works well in practice 18

Case 1: Combining Permutations [RHS] An alternative heuristic: weighted Borda count, i.e. � � Linearly combine ranks of each object and argsort � � Model parameters � represent relative expertise, so it makes e ( − θ i ) sense to weigh rankers as w i = w 1 + w 2 … + w K argsort 19

Unsupervised Rank Aggregation with Distance-Based Models Kevin - PowerPoint PPT Presentation

Unsupervised Rank Aggregation with Distance-Based Models Kevin Small Tufts University Collaborators: Alex Klementiev (Johns Hopkins University) Ivan Titov (Saarland University) Dan Roth (University of Illinois) Motivation Query

2 3 4 5 8 9 MINNEAPOLIS MILWAUKEE MSA RANK #16 MSA RANK #39 CHICAGO MSA RANK #3

A new family of maximum rank distance codes or: Maximum rank distance codes and finite semifields

Part 16: Group Recommender Systems Rank Aggregation and Balancing Techniques Francesco Ricci

Symmetric rank distance codes Kai-Uwe Schmidt Otto-von-Guericke University Magdeburg, Germany 1

Distance Education Distance education used to be about the distance. 1700s 1800s 1900s 2000s

UNSUPERVISED LEARNING, CLUSTERING UNSUPERVISED LEARNING UNSUPERVISED LEARNING Supervised

On the minimum rank of a graph Jisu Jeong June 21, 2013 Jisu Jeong On the minimum rank of a

Supervised Rank Aggregation Approach for Link Prediction in Complex Networks Manisha Pujari &

Unsupervised Learning and Clustering l In unsupervised learning you are given a data set with no

4CSLL5 Parameter Estimation (Supervised and Unsupervised) Unsupervised Maximum Likelihood

Rank Aggregation from Pairwise Comparisons in the Presence of Adversarial Corruptions Arpit

Course : Data mining Topic : Rank aggregation Aristides Gionis Aalto University Department of

Supervised Rank Aggregation Approach for Link Prediction in Complex Networks Manisha Pujari &

Maximum rank distance codes and finite semifields John Sheekey Universiteit Gent, Belgium

Mark-recapture distance sampling (MRDS) in Distance 7.1 Setting up Distance for MRDS

1 SVD applications: rank, column, row, and null spaces Rank : the rank of a matrix is equal to:

Overview of the MultiLing Pilot in TAC 2011 George Giannakopoulos 1 1 NCSR Demokritos, Greece

Ons Oueslati Water Research Institute, Italian National Research Council Introduction Within

PRESENCE OR ABSENCE OF CHRONIC KIDNEY DISEASE. A BET ON MACE TRIAL REPORT. Kamyar Kalantar-Zadeh,

1 The Stakes: Half a million Coloradans have health insurance because of the Affordable Care Act

Burdekin Grower Research Update Marian Davis Burdekin harvester trials Ryan Turner Water

private healthcare organization: the balanced scorecard as a tool to support management Lara

Optimal Electricity Generation Portfolios in the Presence of Fuel Price and Availability Risks

The 2007 trends: EMEA analysis EMEA analysis The 2007 trends: Francesco Pignatti, MD Safety and

Unsupervised Rank Aggregation with Distance-Based Models Kevin - PowerPoint PPT Presentation

Unsupervised Rank Aggregation with Distance-Based Models Kevin Small Tufts University Collaborators: Alex Klementiev (Johns Hopkins University) Ivan Titov (Saarland University) Dan Roth (University of Illinois) Motivation Query

2 3 4 5 8 9 MINNEAPOLIS MILWAUKEE MSA RANK #16 MSA RANK #39 CHICAGO MSA RANK #3

A new family of maximum rank distance codes or: Maximum rank distance codes and finite semifields

Part 16: Group Recommender Systems Rank Aggregation and Balancing Techniques Francesco Ricci

Symmetric rank distance codes Kai-Uwe Schmidt Otto-von-Guericke University Magdeburg, Germany 1

Distance Education Distance education used to be about the distance. 1700s 1800s 1900s 2000s

UNSUPERVISED LEARNING, CLUSTERING UNSUPERVISED LEARNING UNSUPERVISED LEARNING Supervised

On the minimum rank of a graph Jisu Jeong June 21, 2013 Jisu Jeong On the minimum rank of a

Supervised Rank Aggregation Approach for Link Prediction in Complex Networks Manisha Pujari &amp;

Unsupervised Learning and Clustering l In unsupervised learning you are given a data set with no

4CSLL5 Parameter Estimation (Supervised and Unsupervised) Unsupervised Maximum Likelihood

Rank Aggregation from Pairwise Comparisons in the Presence of Adversarial Corruptions Arpit

Course : Data mining Topic : Rank aggregation Aristides Gionis Aalto University Department of

Supervised Rank Aggregation Approach for Link Prediction in Complex Networks Manisha Pujari &amp;

Maximum rank distance codes and finite semifields John Sheekey Universiteit Gent, Belgium

Mark-recapture distance sampling (MRDS) in Distance 7.1 Setting up Distance for MRDS

1 SVD applications: rank, column, row, and null spaces Rank : the rank of a matrix is equal to:

Overview of the MultiLing Pilot in TAC 2011 George Giannakopoulos 1 1 NCSR Demokritos, Greece

Ons Oueslati Water Research Institute, Italian National Research Council Introduction Within

PRESENCE OR ABSENCE OF CHRONIC KIDNEY DISEASE. A BET ON MACE TRIAL REPORT. Kamyar Kalantar-Zadeh,

1 The Stakes: Half a million Coloradans have health insurance because of the Affordable Care Act

Burdekin Grower Research Update Marian Davis Burdekin harvester trials Ryan Turner Water

private healthcare organization: the balanced scorecard as a tool to support management Lara

Optimal Electricity Generation Portfolios in the Presence of Fuel Price and Availability Risks

The 2007 trends: EMEA analysis EMEA analysis The 2007 trends: Francesco Pignatti, MD Safety and

Supervised Rank Aggregation Approach for Link Prediction in Complex Networks Manisha Pujari &

Supervised Rank Aggregation Approach for Link Prediction in Complex Networks Manisha Pujari &