Tiered Indexes
Indexing, session 12
CS6200: Information Retrieval
Slides by: Jesse Anderton

Champion Lists
Champion lists are inverted lists for terms which contain only the highest-scoring documents for that term.
At indexing time, we compute a document's matching score for a term. If it's one of the top r documents, we add it to the champion list.
At query time, we first match documents in the champion list for any query term, and only proceed to other documents if that didn't find enough results.
We can pick larger r for terms with higher df. Why would this help?

Example (scores are tf values, r = 2):

         d1  d2  d3
  cheap   2   6   0
  used    1   0   6
  cars    8   3   5

  cheap  champions: d1 d2
  used   champions: d1 d3
  cars   champions: d1 d3   others: d2
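The champion-list procedure above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it assumes the per-term matching score is simply tf, that a document's final score is the sum of its tf values over the query terms, and the names `build_champion_lists` and `search` are hypothetical.

```python
# Toy tf table from the slide: TF[term][doc]; zero-tf postings are omitted.
TF = {
    "cheap": {"d1": 2, "d2": 6},
    "used": {"d1": 1, "d3": 6},
    "cars": {"d1": 8, "d2": 3, "d3": 5},
}


def build_champion_lists(tf, r):
    """Split each term's postings into champions (top r by score) and others.

    Assumes the matching score is tf; ties are broken by docid. Both parts
    are stored in docid order so they can still be merged cheaply.
    """
    champions, others = {}, {}
    for term, postings in tf.items():
        ranked = sorted(postings, key=lambda d: (-postings[d], d))
        champions[term] = sorted(ranked[:r])
        others[term] = sorted(ranked[r:])
    return champions, others


def search(query, tf, champions, others, k):
    """Match champion documents first; fall back to the rest of each list
    only if the champions yield fewer than k candidates."""
    candidates = {d for t in query for d in champions.get(t, [])}
    if len(candidates) < k:
        # Not enough results among champions: widen to the other postings.
        candidates |= {d for t in query for d in others.get(t, [])}
    # Hypothetical final score: sum of tf over the query terms.
    scored = [(sum(tf.get(t, {}).get(d, 0) for t in query), d) for d in candidates]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [d for _, d in scored[:k]]


champions, others = build_champion_lists(TF, r=2)
print(champions)
print(search(["cars"], TF, champions, others, k=3))
```

With k = 2 the query "cars" is answered from its champions alone; with k = 3 it has to fall back to d2 in the "others" part of the list.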

Sorting by Quality
As a generalization of champion lists, we can sort the postings for a term by some document quality score q_d. Suppose the quality score is part of our matching function:

$\mathrm{score}(D, Q) = \lambda q_D + (1 - \lambda) \sum_{w \in Q} f(w) \cdot g(w)$

Recall that we want to sort the postings by a common value so we can easily merge them. We previously sorted by docid. Sorting by global document quality still allows efficient merging, because the quality score depends only on the document, so every list orders documents the same way. Sorting by a term-based matching score would not: each list would have its own order.

Example (postings sorted by decreasing quality):

         d1    d2    d3
  q_d   0.5   0.25  0.75

  cheap  d1 d2
  used   d3 d1
  cars   d3 d1 d2
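To see why a global quality order still permits merging, here is a minimal Python sketch using the slide's quality scores. It assumes ties in quality are broken by docid and that a plain union of the lists is wanted; `global_key` and `merge_postings` are hypothetical names. The merge itself is the standard library's `heapq.merge`, which works only because every list is sorted by the same key.

```python
import heapq

# Quality scores q_d from the slide example.
QUALITY = {"d1": 0.5, "d2": 0.25, "d3": 0.75}


def global_key(doc):
    # One order shared by every list: higher quality first, docid breaks ties.
    return (-QUALITY[doc], doc)


POSTINGS = {
    "cheap": sorted(["d1", "d2"], key=global_key),
    "used": sorted(["d1", "d3"], key=global_key),
    "cars": sorted(["d1", "d2", "d3"], key=global_key),
}


def merge_postings(terms):
    """k-way merge of lists that share the global quality order.

    Each document is visited once, best quality first; copies of the same
    document from different lists have equal keys, so they come out adjacent.
    """
    merged = heapq.merge(*(POSTINGS[t] for t in terms), key=global_key)
    result = []
    for doc in merged:
        if not result or result[-1] != doc:
            result.append(doc)  # drop duplicates coming from other lists
    return result


print(POSTINGS["cars"])
print(merge_postings(["cheap", "used"]))
```

If each list were instead sorted by a term-specific score, `heapq.merge` would have no single key to compare on, which is the point the slide makes.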

Impact Ordering
If we use term-at-a-time processing, we can sort the lists in different orders. Impact ordering sorts lists by some notion of term relevance. As a simple example, $\mathrm{tf}_{w,d}$ can be used.
Here, we often stop processing documents early in each list. We may process query terms in order of decreasing idf, and stop processing each list when document scores stop changing much. We may also skip low-idf terms.

Example (postings sorted by decreasing tf):

         d1  d2  d3
  cheap   2   6   0
  used    1   0   6
  cars    8   3   5

  cheap  d2 d1
  used   d3 d1
  cars   d1 d3 d2
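A rough Python sketch of term-at-a-time scoring over impact-ordered lists follows. The tf·idf score, the `min_idf` and `min_tf` cut-offs, and the function names are all assumptions for illustration; the `min_tf` cut-off is a simple stand-in for "stop when document scores stop changing much".

```python
import math
from collections import defaultdict

# Toy tf table from the slide (zero-tf postings omitted).
TF = {
    "cheap": {"d1": 2, "d2": 6},
    "used": {"d1": 1, "d3": 6},
    "cars": {"d1": 8, "d2": 3, "d3": 5},
}
N_DOCS = 3

# Impact-ordered lists: postings sorted by decreasing tf, docid breaks ties.
IMPACT = {t: sorted(p.items(), key=lambda x: (-x[1], x[0])) for t, p in TF.items()}


def idf(term):
    return math.log(N_DOCS / len(TF[term]))


def taat_impact(query, min_idf=0.0, min_tf=1):
    """Term-at-a-time scoring with hypothetical early-stopping rules.

    Terms are processed in decreasing idf order; terms with idf <= min_idf
    are skipped; each list is abandoned once tf drops below min_tf, since
    every later posting contributes even less.
    """
    scores = defaultdict(float)
    terms = sorted((t for t in query if t in IMPACT), key=idf, reverse=True)
    for term in terms:
        w = idf(term)
        if w <= min_idf:
            continue  # low-idf term: skip the whole list
        for doc, tf in IMPACT[term]:
            if tf < min_tf:
                break  # list is sorted by tf, so the rest is smaller still
            scores[doc] += tf * w
    return sorted(scores.items(), key=lambda x: (-x[1], x[0]))


print(IMPACT["cars"])
print(taat_impact(["cheap", "used", "cars"], min_tf=5))
```

Here "cars" occurs in every document, so its idf is 0 and it is skipped; with `min_tf=5` only the first posting of each remaining list is read.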

Tiered Indexes
Tiered indexes take these ideas further. We use multiple indexes. Documents likely to have the highest scores are in the first index, and subsequent indexes have progressively worse documents.
We process queries in one index at a time, stopping when we find enough documents. Only a few queries will need all indexes.
Early tiers are often optimized for speed. For instance, the top tier might be held in RAM, while lower tiers are on disk.

Example:

         d1  d2  d3
  cheap  27   3   0
  used   17   0   6
  cars    8  13  16

  Tier 1 (tf ≥ 10):  cheap d1;  used d1;  cars d2 d3
  Tier 2 (tf < 10):  cheap d2;  used d3;  cars d1
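The two-tier example can be sketched in Python like this. It is a simplified illustration under stated assumptions: tiers are built from tf thresholds, candidates are collected tier by tier until at least k are found, and those candidates are then ranked by a hypothetical sum-of-tf score. `build_tiers` and `tiered_search` are illustrative names.

```python
# Toy tf table from the tiered-index slide (zero-tf postings omitted).
TF = {
    "cheap": {"d1": 27, "d2": 3},
    "used": {"d1": 17, "d3": 6},
    "cars": {"d1": 8, "d2": 13, "d3": 16},
}


def build_tiers(tf, thresholds):
    """Place each posting in the first tier whose tf threshold it meets.

    thresholds must be decreasing, e.g. [10, 0] gives
    tier 1: tf >= 10 and tier 2: tf < 10.
    """
    tiers = [{} for _ in thresholds]
    for term, postings in tf.items():
        for doc, f in postings.items():
            for i, t in enumerate(thresholds):
                if f >= t:
                    tiers[i].setdefault(term, []).append(doc)
                    break
    for tier in tiers:
        for docs in tier.values():
            docs.sort()  # keep each tier's lists in docid order
    return tiers


def tiered_search(query, tiers, tf, k):
    """Consult one tier at a time; stop once k candidates have been found.

    Returns the top-k documents and the number of tiers actually used.
    """
    candidates, level = set(), 0
    for level, tier in enumerate(tiers, start=1):
        for term in query:
            candidates.update(tier.get(term, []))
        if len(candidates) >= k:
            break  # enough results: lower tiers are never touched
    ranked = sorted(candidates,
                    key=lambda d: (-sum(tf.get(t, {}).get(d, 0) for t in query), d))
    return ranked[:k], level


tiers = build_tiers(TF, [10, 0])
print(tiers)
print(tiered_search(["cars"], tiers, TF, k=2))
print(tiered_search(["cheap"], tiers, TF, k=2))
```

The query "cars" is satisfied by tier 1 alone, while "cheap" has only one tier-1 posting and must drop to tier 2, mirroring the slide's point that only some queries need the lower tiers.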

Query Caching
Caching also plays an essential role in improving query performance for large search engines. Many forms of caching are used.
• Results for common queries are cached. A substantial fraction of queries are run by many users (e.g., “facebook”).
• Merged inverted lists for common sets of query terms are cached. This is particularly useful for common phrases (e.g., “new york city”).
• Caching is particularly important in peer-to-peer search, where a query may download cached results from other peers.
Caching is often implemented in a multi-level way: e.g., the query cache is checked first, then a cache of merged lists, and finally a cache of individual inverted lists.
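The multi-level lookup order described above can be sketched in Python. This is a toy under loudly stated assumptions: queries are treated as conjunctive (the "merged list" is the intersection of the terms' postings), results are just the first k docids of that list rather than a ranked top k, each level is a small LRU cache, and `LRUCache`, `MultiLevelCache` and `disk_reads` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


class MultiLevelCache:
    """Check the results cache, then merged lists, then individual lists."""

    def __init__(self, index, capacity=100, k=10):
        self.index = index              # term -> docid list (stands in for disk)
        self.k = k
        self.results = LRUCache(capacity)  # exact query -> results
        self.merged = LRUCache(capacity)   # set of terms -> merged list
        self.lists = LRUCache(capacity)    # term -> inverted list
        self.disk_reads = 0

    def _list(self, term):
        postings = self.lists.get(term)
        if postings is None:
            self.disk_reads += 1  # only here do we touch the "disk" index
            postings = self.index.get(term, [])
            self.lists.put(term, postings)
        return postings

    def search(self, query):
        hit = self.results.get(tuple(query))
        if hit is not None:
            return hit
        terms = frozenset(query)  # word order does not matter for merging
        merged = self.merged.get(terms)
        if merged is None:
            merged = sorted(set.intersection(*(set(self._list(t)) for t in terms)))
            self.merged.put(terms, merged)
        results = merged[: self.k]
        self.results.put(tuple(query), results)
        return results


idx = {"new": ["d1", "d2", "d3"], "york": ["d1", "d3"], "city": ["d3", "d4"]}
cache = MultiLevelCache(idx)
print(cache.search(["new", "york", "city"]), cache.disk_reads)
print(cache.search(["city", "new", "york"]), cache.disk_reads)
```

The second query misses the results cache (different word order) but hits the merged-list cache, so no further list reads are needed.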

Wrapping Up
The organization of indexes in a large-scale search engine is important for rapid query processing. Inverted lists can be sorted in various ways to improve inexact top-k retrieval performance, and tiered indexes are often used to handle “easy” queries quickly while still offering good performance for rarer, more difficult queries. Good multi-level caching strategies are also essential for achieving good performance, particularly for web and peer-to-peer search.
