Unsupervised Rank Aggregation with Distance-Based Models
Kevin Small
Tufts University
Collaborators: Alex Klementiev (Johns Hopkins University), Ivan Titov (Saarland University), Dan Roth (University of Illinois)

Motivation
Consider a panel of judges, each independently ranking the same set of candidates.
The need to meaningfully aggregate their rankings arises in many settings.
NEs are often transliterated: rank according to a transliteration model score.
NEs tend to co-occur across languages: rank according to temporal alignment.
NEs tend to co-occur in similar contexts: rank according to contextual similarity.
NEs tend to co-occur in similar topics: rank according to topic similarity.
Etc.
[Table: each candidate's position under the four constituent rankers r1–r4; the rankers disagree substantially, motivating aggregation.]
Judges independently generate a (partial) labeling, attempting to reproduce the true underlying ranking.
We derive an EM-based algorithm treating the votes of individual judges as observed data and the true ranking as unobserved.
e = (1,2,...,n) is the identity permutation
E.g. Kendall's tau distance: the minimum number of adjacent transpositions needed to transform one permutation into the other.
By right-invariance, d(π, σ) = d(e, σπ⁻¹) = D(σπ⁻¹); if π is a random variable, then so is D = D(π).
E.g., 2 1 3 4 → 1 2 3 4 takes a single adjacent transposition, so its Kendall tau distance from e is 1.
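As a minimal sketch, Kendall's tau can be computed by counting pairwise inversions (equivalent to counting adjacent transpositions); `kendall_tau` is an illustrative name, not code from the talk:

```python
from itertools import combinations

def kendall_tau(pi, sigma):
    """Kendall's tau distance: the number of object pairs that the two
    permutations order differently (equal to the minimum number of
    adjacent transpositions turning one into the other)."""
    pos = {x: i for i, x in enumerate(sigma)}   # object -> position in sigma
    s = [pos[x] for x in pi]                    # sigma-positions, in pi's order
    return sum(1 for i, j in combinations(range(len(s)), 2) if s[i] > s[j])
```

For instance, `kendall_tau([2, 1, 3, 4], [1, 2, 3, 4])` is 1, matching the single adjacent transposition in the example.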
Mallows model: p(π | θ, σ) ∝ exp(θ · d(π, σ))
θ is the dispersion parameter; σ is the location parameter.
d(·,·) is right-invariant, so the normalizing constant Z(θ, σ) = Z(θ) does not depend on σ.
If D can be decomposed as D(π) = Σ_{i=1}^{n-1} V_i(π), where the V_i are independent, Z(θ) factors.
The distribution is uniform when θ = 0 and "peaky" when |θ| is large; Z is expensive to compute in general.
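Under the Kendall decomposition, the normalizer factors into n−1 truncated geometric sums, making the density cheap for moderate n. A hedged sketch (function names are illustrative):

```python
import math
from itertools import combinations, permutations

def kendall_tau(pi, sigma):
    pos = {x: i for i, x in enumerate(sigma)}
    s = [pos[x] for x in pi]
    return sum(1 for i, j in combinations(range(len(s)), 2) if s[i] > s[j])

def log_Z(theta, n):
    """Normalizer of the Mallows/Kendall model, factored as a product of
    n-1 truncated geometric sums; by right-invariance it depends on theta
    only, not on the location sigma."""
    return sum(math.log(sum(math.exp(theta * r) for r in range(j + 1)))
               for j in range(1, n))

def mallows_logp(pi, sigma, theta):
    """log p(pi | theta, sigma) = theta * d(pi, sigma) - log Z(theta)."""
    return theta * kendall_tau(pi, sigma) - log_Z(theta, len(pi))
```

With θ = 0 the density is uniform (1/n!), and the factored Z sums to 1 over all permutations, which is easy to verify by enumeration for small n.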
Incorporating domain-specific expertise
Combining permutations / top-k lists
We propose a formal framework for unsupervised rank aggregation based on distance-based (Mallows) models.
We derive an EM-based algorithm to estimate the model parameters.
[Diagram: graphical model — the unobserved true ranking generates the observed votes σ(1), σ(2), …, σ(Q).]
Observed data: the votes of the individual judges.
Unobserved data: the true ranking.
For each judge i, the M-step equates the model's expected distance (LHS) with the posterior expected distance between that judge's votes and the true rankings (RHS):

E_{θ'_i}[D] = (1/Q) Σ_{j=1}^{Q} E_{p(π(j) | σ(j), θ')}[ d(σ_i(j), π(j)) ]

LHS: the expectation of D under the model. RHS: the average distance between judge i's votes and the unobserved true rankings; computing it marginalizes over the unobserved data π(1..Q), naively requiring > (n!)^Q computations (in general, > n! per instance).
For K constituent rankers, repeat:
Estimate the RHS given current parameter values: sample with Metropolis-Hastings, or use heuristics.
Solve the LHS to update θ.
Efficient estimation can be done for particular types of distance functions.
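This loop can be sketched end-to-end as a toy: for tiny n the posterior over true rankings can be enumerated exactly, the RHS computed as a posterior expectation, and the LHS solved by bisection. All names and the exact-enumeration shortcut are illustrative assumptions, not the talk's implementation:

```python
import math
from itertools import combinations, permutations

def d_K(pi, sigma):
    pos = {x: i for i, x in enumerate(sigma)}
    s = [pos[x] for x in pi]
    return sum(1 for i, j in combinations(range(len(s)), 2) if s[i] > s[j])

def expected_D(theta, n):
    """Model expectation of the Kendall distance (the LHS); theta < 0."""
    t = math.exp(theta)
    return n * t / (1 - t) - sum(j * t**j / (1 - t**j) for j in range(1, n + 1))

def em_round(votes, thetas, items):
    """One EM round for K judges on a single instance, enumerating the
    posterior over true rankings exactly (toy sizes only)."""
    perms = [list(p) for p in permutations(items)]
    # E-step: posterior over the true ranking (per-judge normalizers cancel).
    logw = [sum(th * d_K(v, p) for v, th in zip(votes, thetas)) for p in perms]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    post = [x / sum(w) for x in w]
    new_thetas = []
    for v in votes:
        # RHS: posterior expected distance between this judge's vote and pi.
        rhs = sum(pw * d_K(v, p) for pw, p in zip(post, perms))
        # M-step: bisection for theta with expected_D(theta) = rhs;
        # expected_D is monotone in theta, so a line search suffices.
        lo, hi = -20.0, -1e-6
        for _ in range(60):
            mid = (lo + hi) / 2
            if expected_D(mid, len(v)) > rhs:
                hi = mid
            else:
                lo = mid
        new_thetas.append((lo + hi) / 2)
    return new_thetas
```

A judge who contradicts the consensus ends up with a dispersion parameter near zero (uninformative), while agreeing judges keep strongly negative parameters.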
Relative expertise may not stay the same across all instances.
Typically, ranked supervised data would be needed to account for this; instead, we extend the unsupervised model to condition on instance type.
Extended model: each instance j is assigned a latent type t, with per-ranker, per-type parameters; the E-step computes p(π(j), t | σ(j), θ′, α′) for j = 1, …, Q.
For each of the ith ranker and tth type:
Estimate (1) and (2) given current parameter values, and solve (3) to update θ′.
Repeat.
If a distance function is right-invariant and decomposable, [LHS] estimation can be done efficiently.
Kendall tau distance D_K is the minimum number of adjacent transpositions needed to transform one permutation into the other.
It can be decomposed into a sum of independent random variables, D_K(π) = Σ_{i=1}^{n-1} V_i(π), where V_i(π) counts the elements j > i that π places before i. The expected value can be shown to be:

E_θ[D_K] = n·e^θ / (1 − e^θ) − Σ_{j=1}^{n} j·e^{jθ} / (1 − e^{jθ})

(E.g., for π = 2 3 1 6 5 4 7, the V_i sum to D_K(π).)
E_θ[D_K] is monotone in θ, so the update can be found quickly with a line search.
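The decomposition and the expected-value formula can be checked numerically for small n (a sketch; `V` and `expected_D` are illustrative names):

```python
import math
from itertools import permutations

def V(pi):
    """Kendall decomposition: V[i-1] counts the elements j > i that the
    permutation places before i.  Under the Mallows model the V_i are
    independent, V_i ranges over 0..n-i, and the V_i sum to D_K(pi)."""
    posn = {x: idx for idx, x in enumerate(pi)}   # element -> position
    n = len(pi)
    return [sum(1 for j in range(i + 1, n + 1) if posn[j] < posn[i])
            for i in range(1, n)]

def expected_D(theta, n):
    """E_theta[D_K] = n e^t/(1-e^t) - sum_{j=1}^n j e^{jt}/(1-e^{jt})."""
    t = math.exp(theta)
    return n * t / (1 - t) - sum(j * t**j / (1 - t**j) for j in range(1, n + 1))
```

Since E_θ[D_K] is monotone in θ, the θ solving E_θ[D_K] = r for a given target r can be found by bisection.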
Metropolis-Hastings sampling:
Start with a random permutation. If the chain is at σ, randomly transpose two objects, forming σ′.
If p(σ′) ≥ p(σ), the chain moves to σ′; else, it moves to σ′ with probability p(σ′)/p(σ).
Note that we can compute the distance incrementally, i.e. add only the change due to the transposed pair.
Convergence: O(n log n) steps if d is Cayley's distance [Diaconis '98], likely similar for some others. No convergence results for the general case, but it works well in practice.
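A sketch of this sampler over the Mallows posterior for K judges (illustrative names; the incremental-distance trick is noted in a comment but not implemented):

```python
import math
import random
from itertools import combinations

def d_K(pi, sigma):
    pos = {x: i for i, x in enumerate(sigma)}
    s = [pos[x] for x in pi]
    return sum(1 for i, j in combinations(range(len(s)), 2) if s[i] > s[j])

def mh_sample(votes, thetas, items, n_steps=5000, seed=0):
    """Metropolis-Hastings over permutations: propose by transposing two
    random objects (a symmetric proposal); accept if the posterior weight
    increases, else with probability p(pi')/p(pi)."""
    rng = random.Random(seed)
    pi = list(items)
    rng.shuffle(pi)                      # start from a random permutation
    def logp(p):                         # unnormalized log posterior of p
        return sum(th * d_K(v, p) for v, th in zip(votes, thetas))
    cur, samples = logp(pi), []
    for _ in range(n_steps):
        i, j = rng.sample(range(len(pi)), 2)
        pi[i], pi[j] = pi[j], pi[i]      # propose a transposition
        new = logp(pi)   # (a fast version would update the distance incrementally)
        if new >= cur or rng.random() < math.exp(new - cur):
            cur = new                    # accept the move
        else:
            pi[i], pi[j] = pi[j], pi[i]  # reject: undo the transposition
        samples.append(tuple(pi))
    return samples
```

With a few judges largely agreeing on an ordering, the chain should visit that ordering most often.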
We extend Kendall tau to top-k lists.
[Figure: the extended distance splits the top k elements by whether π̃⁻¹(i) ∈ Z, i.e. whether the element also appears in the other list. Construction: bring the r "grey" boxes (elements absent from the other list) to the bottom, exchange them with objects at positions (k+1) and beyond, then apply Kendall's tau to the k elements; with z "white" boxes, r + z = k.]
The resulting random variables are independent, so we can use the same trick to factor the normalizer and derive the expected distance.
It is also monotone, so we can again use a line search. Both the distance and its expectation reduce to the Kendall tau results when the same elements appear in both lists.
Sampling / heuristics for the [RHS] and inference are similar to the full-permutation case.
Incorporating domain-specific expertise
Combining permutations / top-k lists
[Plot: average D_K to the true permutation vs. EM iteration; curves: Sampling, Weighted, True.]
Judges: K = 4 search engines (S1, S2, S3, S4). Documents: top k = 100. Queries: Q = 50.
Our model gets 0.92
Judges: K = 38 TREC-3 ad-hoc retrieval shared task participants. Documents: top k = 100 documents. Queries: Q = 50.
[Plot: precision vs. number of random rankers K_r; curves: Aggregation (Top-10), CombMNZrank (Top-10), Aggregation (Top-30), CombMNZrank (Top-30).]
Incorporating domain-specific expertise
Combining permutations / top-k lists
[Figure: dependency parses of "Buyers stepped in to the futures pit ." from different parsers. Each word i receives a head index v(i) and a dependency label y(i) (SBJ, ROOT, ADV, NMOD, PMOD, …); the parsers disagree on some links (e.g. AMOD vs. PMOD, P vs. ROOT).]
Let v(i) denote the pair (head index, dependency label) that a parse assigns to word i.
Hamming distance between two parses v and y of an n-word sentence:

d_H(v, y) = Σ_{i=1}^{n} 1[ v(i) ≠ y(i) ]

Evaluation metric: labeled attachment score.
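A minimal sketch of this distance, treating a parse as one (head, label) pair per word (the pairs below are illustrative, not the slide's data):

```python
def hamming(v, y):
    """d_H(v, y): the number of word positions where the two parses'
    (head, label) pairs disagree."""
    assert len(v) == len(y)
    return sum(1 for a, b in zip(v, y) if a != b)

def labeled_attachment(v, gold):
    """Labeled attachment score: the fraction of words whose head and
    label both match the gold parse."""
    return 1.0 - hamming(v, gold) / len(gold)
```

Two parses of an eight-word sentence that disagree on two links are at Hamming distance 2, i.e. a labeled attachment score of 0.75 against each other.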
Parameter estimation for the type-agnostic model can be done directly. Assume there are exactly |S| possibilities for each link, and that the jth sentence has n(j) links, with N = Σ_{j=1}^{Q} n(j) links in total.
On each round of training, the learning procedure for the type-agnostic model re-estimates each judge's per-link error rate R_i and sets

θ_i = log R_i − log(1 − R_i) − log(|S| − 1)

Inference is then a θ-weighted vote on each link:

p(y | v_1, …, v_K) ∝ exp( Σ_{i=1}^{K} θ_i · 1[ v_i ≠ y ] )

Parameter estimation can thus be done quickly.
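A hedged sketch of a θ-weighted per-link vote, under the convention p(v | y) ∝ exp(θ_i · d_H) with θ_i ≤ 0 and reading R_i as judge i's per-link error rate over |S| label possibilities (that reading of R_i is an assumption; names are illustrative):

```python
import math
from collections import defaultdict

def judge_weight(error_rate, num_labels):
    """theta_i = log R_i - log(1 - R_i) - log(|S| - 1); more reliable
    judges (smaller error rates) get more negative weights."""
    return (math.log(error_rate) - math.log(1 - error_rate)
            - math.log(num_labels - 1))

def aggregate_link(votes, thetas):
    """Maximize sum_i theta_i * [v_i != y]: equivalently, give each
    candidate label a score of -theta_i from every judge voting for it
    and pick the highest-scoring label."""
    score = defaultdict(float)
    for v, th in zip(votes, thetas):
        score[v] += -th
    return max(score, key=score.get)
```

With error rates (0.05, 0.45, 0.45) over |S| = 10 labels, the single reliable judge outweighs the two unreliable ones, so the weighted vote can overrule a simple majority.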
Judges: dependency parsers submitted to the CoNLL-2007 shared task.
10 languages: Arabic, Basque, Catalan, Chinese, Czech, English, Greek, Hungarian, Italian, and Turkish.
131 to 690 sentences and 4513 to 5390 words, depending on the language; between 20 and 23 systems per language.
Varied the number of participants to represent different levels of expertise in the judge pool.
Baseline: majority vote on each link (ties broken randomly)
[Diagram: participants sorted by accuracy and partitioned into Group 1, Group 2, …, Group K.]
Participant                              True rank   True performance
jni@msi.vxu.se                               1           84.40
sagae@is.s.u-tokyo.ac.jp                     2           83.91
nakagawa378@oki.com                          3           83.61
johan.hall@vxu.se                            5           82.48
carreras@csail.mit.edu                       4           83.46
chenwl@nict.go.jp                            7           82.04
attardi@di.unipi.it                          8           81.34
xyduan@nlpr.ia.ac.cn                         9           80.75
ivan.titov@cui.unige.ch                      6           82.26
dasmith@jhu.edu                             10           80.69
michael.schiehlen@ims.uni-stuttgart.de      11           80.46
bcbb@db.csie.ncu.edu.tw                     12           78.79
prashanth@research.iiit.ac.in               13           78.67
richard@cs.lth.se                           14           77.55
nguyenml@jaist.ac.jp                        16           75.06
joyce840205@gmail.com                       17           74.65
s.v.m.canisius@uvt.nl                       15           75.57
francis.maes@lip6.fr                        18           73.63
zeman@ufal.mff.cuni.cz                      19           62.13
svetoslav.marinov@his.se                    20           59.75
Performance measured as average accuracy over 10 languages
[Plot: average labeled attachment score vs. number of participants, comparing the voted baseline with the aggregation model.]