Efficient query processing
Efficient scoring, distributed query processing
Web Search
Ranking functions
In general, document scoring functions are of the form
The BM25 function is one of the best performing:
The term frequency is
Section 5.1
term is scored
computationally too complex.
Sort in decreasing order of score
Gets the docs with the lowest ID
Process one doc
Gets all docs with the query terms
Replace the worst doc
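The callouts above describe a document-at-a-time traversal that keeps the best results in a heap. The following is a minimal sketch of that idea, not the listing from the slides. It assumes that each posting already carries a precomputed per-term score, and the names `daat_top_k` and `postings` are hypothetical.

```python
import heapq


def daat_top_k(postings, k):
    """Document-at-a-time top-k retrieval.

    postings: dict mapping term -> list of (doc_id, term_score) pairs,
              sorted by doc_id (assumed layout, for illustration only).
    Returns up to k (doc_id, score) pairs in decreasing order of score.
    """
    # One cursor per query term; the heap orders cursors by current doc ID.
    cursors = []
    for term, plist in postings.items():
        if plist:
            heapq.heappush(cursors, (plist[0][0], term, 0))

    top = []  # min-heap of (score, doc_id); top[0] is the worst doc kept
    while cursors:
        doc_id = cursors[0][0]  # the doc with the lowest ID
        score = 0.0
        # Process one doc: collect every query term that occurs in it.
        while cursors and cursors[0][0] == doc_id:
            _, term, pos = heapq.heappop(cursors)
            score += postings[term][pos][1]
            if pos + 1 < len(postings[term]):
                next_doc = postings[term][pos + 1][0]
                heapq.heappush(cursors, (next_doc, term, pos + 1))
        if len(top) < k:
            heapq.heappush(top, (score, doc_id))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc_id))  # replace the worst doc

    # Sort in decreasing order of score.
    return sorted(((d, s) for s, d in top), key=lambda pair: -pair[1])


example = {
    "a": [(1, 1.0), (3, 2.0), (5, 0.5)],
    "b": [(3, 1.5), (4, 3.0)],
}
best = daat_top_k(example, 2)
```

Only k results are held in memory at any time, which is why the result heap is ordered with the worst document on top.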
Sort in increasing order of score
Both methods are exact!
time and an accumulator stores the score of each term in the query.
accumulator contains the scores of the documents.
the size of the collection or the largest posting list?
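The accumulator idea for term-at-a-time processing can be sketched as follows. This is an illustrative sketch only: it assumes precomputed per-posting scores, and the names `taat_top_k` and `postings` are hypothetical. In this particular sketch the accumulator table holds one entry per document that appears in at least one of the query terms' posting lists.

```python
import heapq
from collections import defaultdict


def taat_top_k(postings, k):
    """Term-at-a-time top-k retrieval.

    postings: dict mapping term -> list of (doc_id, term_score) pairs
              (assumed layout, for illustration only).
    """
    accumulators = defaultdict(float)  # doc_id -> partial score so far
    # Process one posting list at a time, adding into the accumulators.
    for term, plist in postings.items():
        for doc_id, term_score in plist:
            accumulators[doc_id] += term_score
    # Select the k highest-scoring documents once all terms are processed.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


example = {
    "a": [(1, 1.0), (3, 2.0), (5, 0.5)],
    "b": [(3, 1.5), (4, 3.0)],
}
best = taat_top_k(example, 2)
```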
MaxScore 188 ms, 93 ms, 2.8×10^5 docs
MaxScore 242 ms, 152 ms, 6.2×10^5 docs
query term’s postings
advance that they will be almost noise?
postings in the index.
general (IDF) or relevant for the document (TF).
index
the most representative of that document.
terms distribution in the document and in the collection.
to be added to the index
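One possible way to pick the terms that are most representative of a document is to compare each term's probability in the document with its probability in the collection. The sketch below ranks terms by their contribution to the KL divergence between the two distributions. That choice of measure is an assumption made here for illustration, and the names `representative_terms` and `collection_prob` are hypothetical.

```python
import math
from collections import Counter


def representative_terms(doc_tokens, collection_prob, k):
    """Return the k terms of a document to be added to a pruned index.

    doc_tokens:      list of tokens in the document.
    collection_prob: dict term -> probability of the term in the collection
                     (assumed to be precomputed).
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    contribution = {}
    for term, count in counts.items():
        p_doc = count / total
        p_coll = collection_prob.get(term)
        if not p_coll:
            continue  # no collection statistics for this term: skip it
        # Large when the term is frequent in the document but rare overall.
        contribution[term] = p_doc * math.log(p_doc / p_coll)
    return sorted(contribution, key=contribution.get, reverse=True)[:k]


tokens = ["the", "the", "the", "bm25", "bm25", "heap"]
stats = {"the": 0.5, "bm25": 0.001, "heap": 0.01}
kept = representative_terms(tokens, stats, 2)
```

In this toy example the frequent but uninformative term "the" contributes nothing, so only the two rarer terms are kept.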
Document-centric Term-centric
Chapter 7
Quantitative
followers.
Query Leader Follower
decreasing importance
Section 14.1
Document partitioning Term partitioning
random sub-set of the documents
the final list with m results
synchronized across index servers
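Under document partitioning, each index server ranks only its own subset of documents and a fusion step builds the final list with m results. The sketch below illustrates this under simplifying assumptions: shards are in-memory dictionaries with precomputed scores, and `search_shard` and `fuse` are hypothetical names. Scores from different shards are only comparable if the statistics used to compute them are consistent across servers.

```python
import heapq


def search_shard(shard_index, query_terms, m):
    """Rank the documents of one shard and return its top m hits.

    shard_index: dict term -> dict doc_id -> score (assumed layout).
    """
    scores = {}
    for term in query_terms:
        for doc_id, term_score in shard_index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + term_score
    return heapq.nlargest(m, scores.items(), key=lambda item: item[1])


def fuse(shard_results, m):
    """Merge the per-shard top lists into one final list with m results."""
    merged = [hit for hits in shard_results for hit in hits]
    return heapq.nlargest(m, merged, key=lambda item: item[1])


shards = [
    {"web": {1: 0.5, 2: 2.0}},
    {"web": {7: 1.0}, "search": {7: 1.5, 8: 0.25}},
]
query = ["web", "search"]
final = fuse([search_shard(s, query, 2) for s in shards], 2)
```

Each shard only needs to send m hits to the fusion server, because no document outside a shard's top m can appear in the global top m.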
search results from the index-servers to the server doing the rank fusion;
corresponding nodes.
from the entire planet at every second…
queries is to use DNS to distribute queries across data-centers.
IP according to the data-center load and to the user’s geographic location.
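The selection of an IP by load and location can be illustrated with a toy resolver. This is a sketch under stated assumptions, not a description of any real DNS system: the data-center records, the `max_load` threshold, the function names, and the documentation-range IP addresses are all hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(user_location, data_centers, max_load=0.8):
    """Return the IP of the data center that should serve this user."""
    # Prefer data centers that are not overloaded (hypothetical threshold).
    available = [dc for dc in data_centers if dc["load"] < max_load]
    if not available:
        # Everything is busy: fall back to the least loaded data center.
        return min(data_centers, key=lambda dc: dc["load"])["ip"]
    # Otherwise pick the geographically closest available data center.
    nearest = min(available,
                  key=lambda dc: haversine_km(user_location, dc["location"]))
    return nearest["ip"]


data_centers = [
    {"ip": "192.0.2.10", "location": (38.7, -9.1), "load": 0.9},
    {"ip": "198.51.100.10", "location": (50.1, 8.7), "load": 0.3},
    {"ip": "203.0.113.10", "location": (37.4, -122.1), "load": 0.2},
]
chosen = resolve((38.7, -9.1), data_centers)
```

Here the closest data center is over the load threshold, so the user is sent to the next closest one instead.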
Barroso, Luiz André, Jeffrey Dean, and Urs Hölzle. "Web search for a planet: The Google cluster architecture." IEEE Micro (2003)
Chapters 7 and 9; Section 5.1; Section 14.1