SLIDE 1

Learning to rank search results

Voting algorithms, rank combination methods

Web Search


André Mourão, João Magalhães

SLIDE 2

SLIDE 3

How can we merge these results?

  • Which model should we select for our production system?
  • Not trivial. Would require even more relevance judgments.
  • Can we merge these ranks into a single, better rank?
  • Yes, we can!

SLIDE 4

Standing on the shoulders of giants

  • Vogt and Cottrell identified the following effects:
  • Skimming Effect: different retrieval models may retrieve different relevant documents for a single query;
  • Chorus Effect: potential for relevance is correlated with the number of retrieval models that suggest a document;
  • Dark Horse Effect: some retrieval models may produce more (or less) accurate estimates of relevance, relative to other models, for some documents.

C. Vogt and G. Cottrell, Fusion via a Linear Combination of Scores. Inf. Retr., 1999

SLIDE 5

Example

  • Consider the following three ranks of five documents (tweets), for a given query:
  • On a given rank $j$, a document $d$ has a score $s_j(d)$ and is placed in the $r_j(d)$-th position.
  • Ranks are sorted by score.

Position | Tweet Desc. BM25* (id, score) | Tweet Desc. LM* (id, score) | Tweet count (user) (id, score)
1        | D5, 2.34                      | D5, 1.23                    | D4, 19685
2        | D4, 2.12                      | D4, 1.02                    | D1, 18756
3        | D3, 1.93                      | D3, 1.00                    | D2, 2342
4        | D2, 1.43                      | D1, 0.85                    | D5, 2341
5        | D1, 1.34                      | D2, 0.71                    | D3, 123


*similarity between the query text and the tweet description, as returned by the retrieval model (e.g. BM25, LM)

SLIDE 6

Search-result fusion methods

  • Unsupervised reranking methods
  • Score-based methods
  • Comb*
  • Rank-based fusion
  • Borda fuse
  • Condorcet
  • Reciprocal Rank Fusion (RRF)
  • Learning to Rank

SLIDE 7

Comb*

  • Use the score of the document on the different lists as the main ranking factor:
  • This can be the Retrieval Status Value (RSV) of the retrieval model.


$\mathrm{CombMAX}(d) = \max\{s_1(d), \ldots, s_n(d)\}$
$\mathrm{CombMIN}(d) = \min\{s_1(d), \ldots, s_n(d)\}$
$\mathrm{CombSUM}(d) = \sum_j s_j(d)$

Joon Ho Lee. Analyses of Multiple Evidence Combination. ACM SIGIR 1997.
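
As a concrete illustration, here is a minimal Java sketch of the Comb* family, assuming each run is represented as a map from document id to retrieval score (this representation and the class name are ours, not from the lecture):

import java.util.*;

class CombFusion {
    // CombSUM: sum each document's scores across all runs.
    static Map<String, Double> combSum(List<Map<String, Double>> runs) {
        Map<String, Double> fused = new HashMap<>();
        for (Map<String, Double> run : runs)
            run.forEach((doc, score) -> fused.merge(doc, score, Double::sum));
        return fused;
    }

    // CombMAX: keep the highest score seen for each document.
    // (CombMIN is identical with Math::min.)
    static Map<String, Double> combMax(List<Map<String, Double>> runs) {
        Map<String, Double> fused = new HashMap<>();
        for (Map<String, Double> run : runs)
            run.forEach((doc, score) -> fused.merge(doc, score, Math::max));
        return fused;
    }
}

Sorting the fused map by descending value then yields the final rank.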

SLIDE 8
  • CombSUM is used by Lucene to combine results from multi-field queries:
  • Ranges of the features may greatly influence the ranking
  • The problem is less pronounced for scores from retrieval models

CombSUM example:

Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D4  | 2.12             | 1.02           | 19685            | 19688.14
D1  | 1.34             | 0.85           | 18756            | 18758.19
D5  | 2.34             | 1.23           | 2341             | 2344.57
D2  | 1.43             | 0.71           | 2342             | 2344.14
D3  | 1.93             | 1.00           | 123              | 125.93

SLIDE 9
  • CombSUM is used by Lucene to combine results from multi-field queries:
  • Lucene already normalizes scores returned by retrieval models
  • But scores may not follow a normal distribution, or may be biased on small samples (e.g. 1000 documents retrieved by Lucene)

CombSUM example (normalized scores):

Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D4  | 1.80             | 1.59           | 2.02             | 5.40
D5  | 2.30             | 2.66           | 0.23             | 5.19
D3  | 1.36             | 1.48           | 0.00             | 2.84
D1  | 0.00             | 0.72           | 1.92             | 2.64
D2  | 0.21             | 0.00           | 0.23             | 0.44


Normalized assuming a normal distribution: $(\mathrm{score} - \mu)/\sigma$
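
A sketch of this z-score normalization as a hypothetical helper (the table above appears to additionally use the sample standard deviation and shift scores so the minimum becomes 0, which does not change the ordering):

import java.util.Arrays;

class ScoreNorm {
    // Z-score normalization: (score - mean) / stddev, applied per run.
    static double[] zNormalize(double[] scores) {
        double mean = Arrays.stream(scores).average().orElse(0.0);
        double var = Arrays.stream(scores)
                           .map(s -> (s - mean) * (s - mean))
                           .average().orElse(0.0);
        double std = Math.sqrt(var);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++)
            out[i] = std > 0 ? (scores[i] - mean) / std : 0.0;  // guard constant runs
        return out;
    }
}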

SLIDE 10

wComb*

  • Lucene can also give a higher/lower weight to scores from different fields:

Query query = queryParserHelper.parse(queryString, "abstract");
query.setBoost(0.3f);

  • These weights are then multiplied by the scores (see the sketch below):
  • How to find these weights?
  • Manually
  • Machine learning (more on this later)


$\mathrm{wCombSUM}(d) = \sum_j w_j \, s_j(d)$
$\mathrm{wCombMNZ}(d) = |\{j : d \in \mathrm{Rank}_j\}| \cdot \mathrm{wCombSUM}(d)$
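
Following the formulas above, wCombSUM is just CombSUM with per-run multipliers. A minimal Java sketch, reusing the map-of-scores representation from the CombSUM example (the weights here are hand-picked, not learned):

import java.util.*;

class WeightedFusion {
    // wCombSUM: multiply each run's score by that run's weight, then sum.
    static Map<String, Double> wCombSum(List<Map<String, Double>> runs, double[] weights) {
        Map<String, Double> fused = new HashMap<>();
        for (int j = 0; j < runs.size(); j++) {
            final double w = weights[j];
            runs.get(j).forEach((doc, score) -> fused.merge(doc, w * score, Double::sum));
        }
        return fused;
    }
}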

SLIDE 11

CombMNZ

  • CombMNZ multiplies the number of ranks where the document occurs by the sum of the scores obtained across all lists.
  • Despite the normalization issues common in score-based methods, CombMNZ is competitive with rank-based approaches.


$\mathrm{CombMNZ}(d) = |\{j : d \in \mathrm{Rank}_j\}| \cdot \sum_j s_j(d)$
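
A sketch of CombMNZ under the same map-of-scores assumption as the earlier snippets (scores ideally normalized first, as discussed above):

import java.util.*;

class CombMnz {
    // CombMNZ: CombSUM multiplied by the number of runs that returned the document.
    static Map<String, Double> fuse(List<Map<String, Double>> runs) {
        Map<String, Double> sum = new HashMap<>();
        Map<String, Integer> hits = new HashMap<>();
        for (Map<String, Double> run : runs)
            run.forEach((doc, score) -> {
                sum.merge(doc, score, Double::sum);   // running CombSUM
                hits.merge(doc, 1, Integer::sum);     // number of runs containing doc
            });
        Map<String, Double> fused = new HashMap<>();
        sum.forEach((doc, s) -> fused.put(doc, hits.get(doc) * s));
        return fused;
    }
}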

SLIDE 12

Borda fuse

  • A voting algorithm based on the positions of the candidates.
  • Invented by Jean-Charles de Borda in the 18th century
  • For each rank, the document gets a score corresponding to its (inverse) position on the rank.
  • The fused rank is based on the sum of all per-rank scores.

Javed A. Aslam, Mark Montague, Models for Metasearch, ACM SIGIR 2001

Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D4  |                  |                |                  |
D5  |                  |                |                  |
D1  |                  |                |                  |
D3  |                  |                |                  |
D2  |                  |                |                  |

SLIDE 13

Borda fuse

  • A voting algorithm based on the positions of the candidates.
  • Invented by Jean-Charles de Borda in the 18th century
  • For each rank, the document gets a score corresponding to its (inverse) position on the rank.
  • The fused rank is based on the sum of all per-rank scores.

Javed A. Aslam, Mark Montague, Models for Metasearch, ACM SIGIR 2001

Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D4  | (5-2)=3          | (5-2)=3        | (5-1)=4          | 10
D5  |                  |                |                  |
D1  |                  |                |                  |
D3  |                  |                |                  |
D2  |                  |                |                  |

SLIDE 14

Borda fuse

  • A voting algorithm based on the positions of the candidates.
  • Invented by Jean-Charles de Borda in 18th-century France
  • For each rank, the document gets a score corresponding to its (inverse) position on the rank.
  • The fused rank is based on the sum of all per-rank scores.

Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D4  | 3                | 3              | 4                | 10
D5  | 4                | 4              | 1                | 9
D1  | 0                | 1              | 3                | 4
D3  | 2                | 2              | 0                | 4
D2  | 1                | 0              | 2                | 3

Javed A. Aslam, Mark Montague, Models for Metasearch, ACM SIGIR 2001
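
A minimal sketch of Borda fusion, assuming each run is an ordered list of document ids (best first) and that every run ranks the same N documents:

import java.util.*;

class BordaFuse {
    // Each run awards a document (N - position) points; points are summed.
    static Map<String, Integer> fuse(List<List<String>> runs) {
        Map<String, Integer> points = new HashMap<>();
        for (List<String> run : runs) {
            int n = run.size();
            for (int pos = 1; pos <= n; pos++)
                points.merge(run.get(pos - 1), n - pos, Integer::sum);
        }
        return points;
    }
}

On the example above this gives D4 → 10 and D5 → 9, matching the table.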

SLIDE 15

Condorcet

  • A voting algorithm that started as a way to select the best candidate in an election
  • Marquis de Condorcet, also in 18th-century France
  • Based on a majoritarian method
  • Uses pairwise comparisons, r(d1) > r(d2).
  • For each pair (d1, d2), we count the number of times d1 beats d2.
  • The best candidate is found through the pairwise comparisons.
  • Generalizing Condorcet to produce a full rank can have high computational complexity.
  • There are solutions to compute the rank with low complexity.

Mark Montague and Javed A. Aslam. Condorcet Fusion for Improved Retrieval. ACM CIKM 2002.
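
A brute-force sketch of the pairwise tallying shown on the next slides, assuming every document appears in every run; the cited paper gives lower-complexity algorithms:

import java.util.*;

class CondorcetFuse {
    // Score each document by (pairwise wins - pairwise losses) over all runs.
    static Map<String, Integer> score(List<List<String>> runs, List<String> docs) {
        Map<String, Integer> score = new HashMap<>();
        for (String d1 : docs)
            for (String d2 : docs) {
                if (d1.equals(d2)) continue;
                for (List<String> run : runs)
                    // +1 when d1 outranks d2 in this run, -1 otherwise.
                    score.merge(d1, run.indexOf(d1) < run.indexOf(d2) ? 1 : -1, Integer::sum);
            }
        return score;
    }
}

The repeated indexOf calls make this quadratic in documents and linear in list length per pair, which illustrates the complexity point above.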

SLIDE 16

Condorcet example


Pairwise comparison (empty 5×5 win/draw/lose matrix over D1–D5):

Tweet Desc. BM25: D2 > D1
Tweet Desc. LM:   D1 > D2
Tweet count:      D1 > D2

SLIDE 17

Condorcet example


Pairwise comparison (Win, Draw, Lose):

   | D1    | D2    | D3 | D4 | D5
D1 | -     | 2,0,1 |    |    |
D2 | 1,0,2 | -     |    |    |
D3 |       |       | -  |    |
D4 |       |       |    | -  |
D5 |       |       |    |    | -

D1 vs D2 = 2, 0, 1 and D2 vs D1 = 1, 0, 2, from:
Tweet Desc. BM25: D2 > D1
Tweet Desc. LM:   D1 > D2
Tweet count:      D1 > D2

SLIDE 18

Condorcet example


Pairwise comparison (Win, Draw, Lose):

   | D1    | D2    | D3    | D4    | D5
D1 | -     | 2,0,1 | 1,0,2 | 0,0,3 | 1,0,2
D2 | 1,0,2 | -     | 1,0,2 | 0,0,3 | 1,0,2
D3 | 2,0,1 | 2,0,1 | -     | 0,0,3 | 0,0,3
D4 | 3,0,0 | 3,0,0 | 3,0,0 | -     | 1,0,2
D5 | 2,0,1 | 2,0,1 | 3,0,0 | 2,0,1 | -

SLIDE 19

Condorcet example


Pairwise winners:

   | Win | Tie | Lose | Score
D4 | 10  | 0   | 2    | 8
D5 | 9   | 0   | 3    | 6
D3 | 4   | 0   | 8    | -4
D1 | 4   | 0   | 8    | -4
D2 | 3   | 0   | 9    | -6

Pairwise comparison (Win, Draw, Lose):

   | D1    | D2    | D3    | D4    | D5
D1 | -     | 2,0,1 | 1,0,2 | 0,0,3 | 1,0,2
D2 | 1,0,2 | -     | 1,0,2 | 0,0,3 | 1,0,2
D3 | 2,0,1 | 2,0,1 | -     | 0,0,3 | 0,0,3
D4 | 3,0,0 | 3,0,0 | 3,0,0 | -     | 1,0,2
D5 | 2,0,1 | 2,0,1 | 3,0,0 | 2,0,1 | -

SLIDE 20

Reciprocal Rank Fusion (RRF)

  • Reciprocal rank fusion weights each document with the inverse of its position on the rank.
  • Favours documents at the “top” of the rank.
  • Penalizes documents below the “top” of the rank.


$\mathrm{RRFscore}(d) = \sum_j \frac{1}{k + r_j(d)}$, where $k = 60$

Gordon Cormack, Charles LA Clarke, and Stefan Büttcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. ACM SIGIR 2009.
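
RRF only needs rank positions, which makes it easy to implement. A sketch assuming ordered lists of document ids, best first:

import java.util.*;

class Rrf {
    // RRFscore(d) = sum over runs of 1 / (k + rank_j(d)); the paper uses k = 60.
    static Map<String, Double> fuse(List<List<String>> runs, int k) {
        Map<String, Double> fused = new HashMap<>();
        for (List<String> run : runs)
            for (int pos = 1; pos <= run.size(); pos++)
                fused.merge(run.get(pos - 1), 1.0 / (k + pos), Double::sum);
        return fused;
    }
}

With k = 0 this reproduces the worked example on the next slides (e.g. D5: 1/1 + 1/1 + 1/4 = 2.25).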

SLIDE 21

RRF example


Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D5  |                  |                |                  |
D4  |                  |                |                  |
D1  |                  |                |                  |
D3  |                  |                |                  |
D2  |                  |                |                  |

$\mathrm{RRFscore}(d) = \sum_j \frac{1}{k + r_j(d)}$, with $k = 0$ (for this example)

SLIDE 22

RRF example


Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D5  | 1/1              | 1/1            | 1/4              | 2.250
D4  |                  |                |                  |
D1  |                  |                |                  |
D3  |                  |                |                  |
D2  |                  |                |                  |

$\mathrm{RRFscore}(d) = \sum_j \frac{1}{k + r_j(d)}$, with $k = 0$ (for this example)

SLIDE 23

RRF example


Doc | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
D5  | 1/1              | 1/1            | 1/4              | 2.250
D4  | 1/2              | 1/2            | 1/1              | 2.000
D1  | 1/5              | 1/4            | 1/2              | 0.950
D3  | 1/3              | 1/3            | 1/5              | 0.866
D2  | 1/4              | 1/5            | 1/3              | 0.783

$\mathrm{RRFscore}(d) = \sum_j \frac{1}{k + r_j(d)}$, with $k = 0$ (for this example)

SLIDE 24

Experimental comparison

            TREC45                      Gov2
            1998          1999          2005          2006
Method      P@10   MAP    P@10   MAP    P@10   MAP    P@10   MAP
VSM         0.266  0.106  0.240  0.120  0.298  0.092  0.282  0.097
BIN         0.256  0.141  0.224  0.148  0.069  0.050  0.106  0.083
2-Poisson   0.402  0.177  0.406  0.207  0.418  0.171  0.538  0.207
BM25        0.424  0.178  0.440  0.205  0.471  0.243  0.534  0.277
LMJM        0.390  0.179  0.432  0.209  0.416  0.211  0.494  0.257
LMD         0.450  0.193  0.428  0.226  0.484  0.244  0.580  0.293
BM25F       -      -      -      -      0.482  0.242  0.544  0.277
BM25+PRF    0.452  0.239  0.454  0.249  0.567  0.277  0.588  0.314
RRF         0.462  0.215  0.464  0.252  0.543  0.297  0.570  0.352
Condorcet   0.446  0.207  0.462  0.234  0.525  0.281  0.574  0.325
CombMNZ     0.448  0.201  0.448  0.245  0.561  0.270  0.570  0.318
LR          -      -      -      -      0.446  0.266  0.588  0.309
RankSVM     -      -      -      -      0.420  0.234  0.556  0.268

SLIDE 25

Google rank correlation analysis

  • https://moz.com/search-ranking-factors/correlations
  • Analysis of the correlation between query/document features and the results returned by Google
  • In 2008, Google reported using over 200 features (Amit Singhal, NYT, 2008-06-03)
  • In 2016, it's over 300 features (Jeff Dean, WSDM 2016)
  • How can we take advantage of all types of features for ranking?

SLIDE 26

What is Learning to Rank (LETOR)?

  • Use machine learning techniques to automatically learn a function that ranks results effectively
  • Pointwise approaches
  • regress the relevance score, or classify docs into Relevant and Non-Relevant
  • Pairwise approaches
  • given two documents, predict a partial ranking: d1 > d2 or d2 > d1
  • Listwise approaches
  • given two ranked lists of the same items, predict which is better


https://sourceforge.net/p/lemur/wiki/RankLib/

SLIDE 27

LETOR Experimental setup


Initial retrieval:

  • $n$ queries $q$, with $n \gg 10^3$
  • $m \cdot n$ documents $x$, with $m \gg 10^3$ per query
  • $y$: relevance judgements
  • $h(x)$: predicted relevance

SLIDE 28

Learning to rank features


SLIDE 29

Pointwise approach


[Figure: tweets plotted by LM score against number of tweets; each point labelled R (relevant document) or N (non-relevant document)]

  • Collect a training corpus of (q, d, r) triples
  • Train a machine learning model to predict the class r of a document-query pair


SLIDE 30

Pairwise approach

  • Find a global order by predicting the partial ranking of the documents:


Predicted order: D4 > D5 > D3 > D2 > D1 (misordered pairs: 2)

SLIDE 31

Pairwise approach

  • Find a global order by predicting the partial ranking of the documents:


Predicted order: D4 > D5 > D3 > D2 > D1 (misordered pairs: 2)
Predicted order: D5 > D4 > D1 > D3 > D2 (misordered pairs: 1)
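
Counting misordered pairs can be sketched directly; this hypothetical helper assumes both lists contain exactly the same documents:

import java.util.List;

class PairwiseLoss {
    // Count pairs (d1, d2) on which the predicted order and the ideal order disagree.
    static int misorderedPairs(List<String> predicted, List<String> ideal) {
        int bad = 0;
        for (int i = 0; i < ideal.size(); i++)
            for (int j = i + 1; j < ideal.size(); j++)
                // ideal puts ideal.get(i) before ideal.get(j); check the prediction.
                if (predicted.indexOf(ideal.get(i)) > predicted.indexOf(ideal.get(j)))
                    bad++;
        return bad;
    }
}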

SLIDE 32

Listwise approach

  • Consider a number of ranking features.
  • The ranking model is a weighted linear model.
  • The linear model optimizes the order of the final rank.


$\mathrm{ReRanker}(d) = w_1 s_1(d) + w_2 s_2(d) + \ldots + w_n s_n(d)$

SLIDE 33

Listwise: Coordinate Ascent


Metric to optimize (NDCG, MAP, …)

  • Find the weights for the features that maximize the metric to optimize
  • e.g.: LM score x 0.93 + user tweet count x 0.07
SLIDE 34

Listwise: Coordinate ascent


[Figure: the metric to optimize (NDCG, MAP, …) as a function of the weights; the search can get stuck in a local maximum]

  • Find the weights for the features that maximize the metric to optimize
  • e.g.: LM score x 0.93 + user tweet count x 0.07
SLIDE 35

Coordinate Ascent

  • The coordinate ascent algorithm performs successive line searches along the axes.


SLIDE 36

Algorithm

for iter_descent = 1:100
  for rank = 1:rank_total
    for iter = 1:100
      for i,j where r(i,j) != 0
        update rank weight
      end
    end
  end
end
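
A compact Java sketch of the same idea, assuming a callback that evaluates a weight vector on the training queries (a real implementation such as RankLib's adds random restarts and weight normalization):

import java.util.Arrays;
import java.util.function.ToDoubleFunction;

class CoordinateAscent {
    // Greedily line-search one weight at a time, keeping the value that
    // maximizes the evaluation metric (e.g. MAP or NDCG) on the training set.
    static double[] optimize(ToDoubleFunction<double[]> metric, int nFeatures, int sweeps) {
        double[] w = new double[nFeatures];
        Arrays.fill(w, 1.0 / nFeatures);              // start from uniform weights
        for (int s = 0; s < sweeps; s++)
            for (int f = 0; f < nFeatures; f++) {     // one coordinate at a time
                double bestVal = w[f];
                double bestScore = metric.applyAsDouble(w);
                for (double c = 0.0; c <= 1.0; c += 0.05) {  // crude line search
                    w[f] = c;
                    double score = metric.applyAsDouble(w);
                    if (score > bestScore) { bestScore = score; bestVal = c; }
                }
                w[f] = bestVal;                       // keep the best value found
            }
        return w;
    }
}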


SLIDE 37

Example

  • Now that we’ve seen how to compute the weights, let’s apply them for fusion:


$\mathrm{ReRanker}(d) = \sum_j w_j \, s_j(d)$

Doc     | Tweet Desc. BM25 | Tweet Desc. LM | User tweet count | Fusion score
Weights | 0.5              | 0.4            | 0.1              |
D5      | 2.30*0.5         | 2.66*0.4       | 0.23*0.1         | 2.237
D4      | 1.80*0.5         | 1.59*0.4       | 2.02*0.1         | 1.738
D3      | 1.36*0.5         | 1.48*0.4       | 0.00*0.1         | 1.272
D1      | 0.00*0.5         | 0.72*0.4       | 1.92*0.1         | 0.480
D2      | 0.21*0.5         | 0.00*0.4       | 0.23*0.1         | 0.128

SLIDE 38

Fitting LETOR in a live system

  • Fetch >1000 candidates with each unsupervised retrieval model (fast over millions of documents)
  • Filter with binary features (e.g. is retweet)
  • Filter with range features (e.g. timeframe or location)
  • Rerank the >1000 candidates with the learning to rank model
  • Generate new features: e.g. the time delta between the query and the document publication time
  • Binary and categorical features may not be ideal as a “direct input” for fusion

SLIDE 39

Summary

  • Combining ranks from multiple features can lead to better performance than the best individual rank;
  • All approaches are still dependent on the quality of the features:
  • Be careful with binary, categorical or irrelevant features!
  • Unsupervised approaches (e.g. RRF) can offer higher retrieval effectiveness than supervised approaches;
  • Learning to rank works well for specific use-cases and with thousands or millions of examples (queries + relevant documents)

SLIDE 40

Summary

  • Unsupervised methods
  • Comb*
  • Borda fuse
  • Condorcet
  • Reciprocal Rank Fusion (RRF)
  • Learning to Rank


Section 15.4; Section 11.1. Some slides are derived from slides by Christopher D. Manning, Honglin Wang and Jiepu Jiang.