Inverted Indexes (IR, session 5), CS6200: Information Retrieval

Jan 15, 2024

Inverted Indexes IR, session 5 CS6200: Information Retrieval Slides by: Jesse Anderton
Scaling up
• A term incidence matrix with V terms and D documents has O(V x D) entries.
• Shakespeare used around 31,000 distinct words across 37 plays, for about 1.1M entries.
• As of 2014, a collection of Wikipedia pages comprises about 4.5M pages and roughly 1.7M distinct words. Assuming just one bit per matrix entry, this would consume about 890GB of memory.

Corpus                Terms          Docs           Entries
Shakespeare’s Plays   ~31,000        37             ~1.1 million
English Wikipedia     ~1.7 million   ~4.5 million   ~7.65 trillion
English Web           >2 million     >1.7 billion   >3.4x10^15
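The sizes quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative and not from the slides, and it assumes one bit per entry as the slide does. The 890GB figure matches when the result is expressed in binary gigabytes (2^30 bytes); in decimal gigabytes the same matrix is about 956GB.

```python
def matrix_entries(num_terms, num_docs):
    """Number of cells in a dense term incidence matrix (V x D)."""
    return num_terms * num_docs


def bits_to_gib(num_bits):
    """Convert a bit count to binary gigabytes (1 GiB = 2**30 bytes)."""
    return num_bits / 8 / 2**30


# Approximate corpus statistics taken from the slide: (terms, documents).
corpora = {
    "Shakespeare's Plays": (31_000, 37),
    "English Wikipedia": (1_700_000, 4_500_000),
}

for name, (terms, docs) in corpora.items():
    entries = matrix_entries(terms, docs)
    print(f"{name}: {entries:,} entries, {bits_to_gib(entries):.4f} GiB at 1 bit each")
```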
Inverted Indexes
• Two insights allow us to reduce this to a manageable size:
  1. The matrix is sparse: any document uses a tiny fraction of the vocabulary.
  2. A query only uses a handful of words, so we don’t need the rest.
• We use an inverted index instead of using a term incidence matrix directly.
• An inverted index is a map from a term to a posting list of the documents that use that term.
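A minimal sketch of such a map, assuming whitespace tokenization, lowercasing, and integer document IDs (none of which the slides specify; `build_inverted_index` is a hypothetical name). Only terms that actually occur get an entry, which is how the sparsity of the matrix is exploited.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of the IDs of documents containing it.

    docs: dict of doc_id -> text. Tokenization here is a plain lowercase
    whitespace split, which is an assumption made for illustration.
    """
    index = defaultdict(list)
    # Visiting documents in ID order keeps every posting list sorted by ID.
    for doc_id in sorted(docs):
        # set() ensures a document appears once per term, however often it uses it.
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


docs = {
    1: "now is the time",
    2: "the time is now",
    3: "time flies",
}
index = build_inverted_index(docs)
print(index["time"])   # posting list for "time"
print(index["flies"])  # posting list for "flies"
```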
Search Algorithm
• Consider queries of the form: t_1 AND t_2 AND … AND t_n
• In this simplified case, we need only take the intersection of the terms’ posting lists.
• This algorithm, inspired by the merge step of merge sort, relies on each posting list being sorted by document ID, so that two lists can be intersected in a single linear pass.
• We save time by processing the terms in order from least common to most common, that is, from the shortest posting list to the longest. (Why does this help?)
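The conjunctive query procedure can be sketched as follows. This is an illustration under stated assumptions, not the slides' own pseudocode: each posting list is assumed to be sorted by document ID, the index is a plain dict, and the names `intersect` and `conjunctive_query` are hypothetical.

```python
def intersect(p1, p2):
    """Merge-style intersection of two posting lists sorted by document ID."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # p1[i] cannot appear later in p2, so skip it
        else:
            j += 1
    return result


def conjunctive_query(index, terms):
    """Evaluate t_1 AND t_2 AND ... AND t_n against an inverted index."""
    if not terms:
        return []
    # A term missing from the index has an empty posting list.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = list(postings[0])  # start from the rarest term
    for plist in postings[1:]:
        if not result:
            break  # nothing can match once the running result is empty
        result = intersect(result, plist)
    return result


index = {"a": [1, 2, 3, 5, 8], "b": [2, 5], "c": [2, 3, 5, 9]}
print(conjunctive_query(index, ["a", "b", "c"]))
```

Starting from the shortest list keeps the running result small: it can never be longer than the shortest list seen so far, so each later intersection has less to scan and the loop can stop early when the result becomes empty.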
Example
Wrapping Up
• All modern search engines rely on inverted indexes in some form. Many other data structures were considered, but none has matched its efficiency.
• The entries in a production inverted index typically contain many more fields providing extra information about the documents.
• The efficient construction and use of inverted indexes is a topic of its own, and will be covered in a later module.
• Next, we’ll see a more nuanced way to find relevant documents.
