Using an Inverted Index Synopsis for Query Latency and Performance Prediction
Nicola Tonellotto University of Pisa nicola.tonellotto@unipi.it
The scale of Web search challenge: How many documents? In how long? Reports suggest that Google
pages in the indexes of its search engine
all of those pages: the index data structures help it to efficiently find pages that effectively match the query and will help the user
[Figure: distributed search architecture. Labels: Broker, Query Scheduler, queries, Query Server, Retrieval Strategy, Shard Replica (N, M), Results Merging.]
Source: https://www.pexels.com/photo/datacenter-server-449401/
[Figure: cascading ranking pipeline. Query → First Stage: Base Ranker over the Inverted Index (BM25 + DAAT, 1,000 – 10,000 docs) → Second Stage: Top Ranker with Features and Learning to Rank Algorithms (Learning To Rank, 10 – 100 docs) → ⋮ → Result Page(s). Labels: N documents, K documents.]
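A two-stage cascade of this kind (a cheap base ranker that selects candidates, then a more expensive ranker that reorders them) can be sketched as follows. This is only an illustrative sketch: `cascade_rank`, `cheap_score` and `expensive_score` are hypothetical names, the scorers stand in for BM25 and a learned model, and the cut-offs `n` and `k` are assumptions loosely based on the ranges on the slide.

```python
import heapq

def cascade_rank(query, doc_ids, cheap_score, expensive_score, n=1000, k=10):
    """Two-stage ranking: keep the n best docs under a cheap scorer,
    then return the k best of those under an expensive scorer.

    cheap_score and expensive_score are hypothetical callables
    (query, doc_id) -> float supplied by the caller.
    """
    # First stage: candidate selection with the cheap scorer.
    candidates = heapq.nlargest(n, doc_ids, key=lambda d: cheap_score(query, d))
    # Second stage: re-rank only the surviving candidates.
    return heapq.nlargest(k, candidates, key=lambda d: expensive_score(query, d))
```

The second stage never sees documents dropped by the first, which is why the first-stage cut-off trades effectiveness against efficiency.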
executed.
effectiveness for efficiency
queries
energy savings
[Figure: dynamic pruning shown in docid space and score space. First panel: terms t1 to t5 with score upper bounds σ1 to σ5, threshold θ, and regions labelled OR and AND separated by critical docids. Second panel: terms t1, t2, t3 with upper bounds σ1, σ2, σ3 and combined bounds σ1 + σ2, σ1 + σ3, σ2 + σ3, σ1 + σ2 + σ3, again with OR and AND regions separated by critical docids and threshold θ.]
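The interplay between the term upper bounds σ and the threshold θ can be illustrated with the standard MaxScore-style split of query terms into non-essential and essential lists. This is a minimal sketch, not the speaker's code: `split_essential` is a hypothetical helper, and it assumes a document must score strictly above the threshold to enter the top results.

```python
def split_essential(upper_bounds, threshold):
    """Split query terms into (non_essential, essential) lists.

    upper_bounds maps each term to its maximum score contribution (sigma).
    Terms are taken in increasing order of upper bound; a term stays
    non-essential while the sum of bounds collected so far cannot exceed
    the threshold, i.e. a document matching only those terms cannot make
    the top results (assumption: strict '>' against the threshold).
    """
    order = sorted(upper_bounds, key=upper_bounds.get)
    prefix = 0.0
    non_essential = []
    for i, term in enumerate(order):
        if prefix + upper_bounds[term] > threshold:
            return non_essential, order[i:]
        prefix += upper_bounds[term]
        non_essential.append(term)
    return non_essential, []
```

With a threshold of zero every term is essential, so processing behaves like a plain union; as the threshold grows, more terms become non-essential and fewer postings need to be scored.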
[Figure panels: 2-term queries; 4-term queries.]
Query processing strategy (MaxScore, WAND, BMW)
Number of terms
Length of posting lists
Co-occurrence of query terms (posting list union/intersection)
engine infrastructure (Jeon et al., SIGIR 2014)
pairs of query terms
used to calculate them
index does not reflect well the term distributions in the rest of the index
model
[Figure: posting lists of the full index reduced by 𝛿 sampling to the posting lists of the synopsis; the sampled lists contain only docids 1, 4, 10, 12 and 15.]
The synopsis can be used to estimate the expected number of documents processed for any query, whether it is processed in OR mode (union of posting lists) or in AND mode (intersection of posting lists).
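One way to picture this estimate is sketched below. It is a minimal sketch under stated assumptions, not the method from the talk: documents are sampled uniformly at random with rate `delta`, the count observed on the sample is scaled by `1 / delta`, and `build_synopsis` and `estimate_processed` are hypothetical names.

```python
import random

def build_synopsis(index, num_docs, delta, seed=0):
    """Keep each docid in 1..num_docs with probability delta (assumed
    uniform sampling) and filter every posting list accordingly."""
    rng = random.Random(seed)
    kept = {d for d in range(1, num_docs + 1) if rng.random() < delta}
    return {term: [d for d in postings if d in kept]
            for term, postings in index.items()}

def estimate_processed(synopsis, terms, delta, mode="OR"):
    """Estimate how many documents the full index would process for the
    query terms, by counting on the synopsis and scaling by 1 / delta."""
    if mode not in ("OR", "AND"):
        raise ValueError("mode must be 'OR' or 'AND'")
    lists = [set(synopsis.get(t, ())) for t in terms]
    if not lists:
        return 0.0
    # OR mode: union of posting lists; AND mode: intersection.
    docs = set.union(*lists) if mode == "OR" else set.intersection(*lists)
    return len(docs) / delta
```

With `delta = 1.0` the synopsis is the full index and the estimates are exact; smaller rates trade accuracy for a smaller structure.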
[Figure labels: original docids; remapped docids.]
[Table or figure labels: Intersection, Union; Analytical model, Index synopsis; MaxScore, WAND, BMW.]
synopsis indices
BMW strategies on a full inverted index
pruning strategies
first-pass retrieval to achieve efficient neural retrieval?