Retrieval Models Probability Ranking Principle Web Search Slides - PowerPoint PPT Presentation

Retrieval Models: Probability Ranking Principle
Web Search
Slides based on the books:

Retrieval models
• Geometric/linear spaces
  • Vector space model
• Probability ranking principle
  • Language models approach to IR (an important emphasis in recent work)
  • Probabilistic retrieval model
    • Binary independence model
    • Okapi's BM25

Recall a few probability basics
• For events A and B, Bayes' rule is:
$$p(A,B) = p(A|B)\,p(B) = p(B|A)\,p(A)$$
$$p(A|B) = \frac{p(A,B)}{p(B)} = \frac{p(A)\,p(B|A)}{p(B)}$$
• Interpretation:
$$\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}} \iff p(A|B) = \frac{p(A)\,p(B|A)}{p(B)}$$
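As a quick numeric check of the formula, here is a minimal Python sketch. It assumes A is a binary event, so that the evidence p(B) can be expanded with the law of total probability, p(B) = p(B|A)p(A) + p(B|¬A)p(¬A); that expansion is not on the slide. The function name `posterior` and the numbers in the example are illustrative only.

```python
def posterior(prior_a: float, lik_b_given_a: float, lik_b_given_not_a: float) -> float:
    """Return p(A|B) = p(A) p(B|A) / p(B) for a binary event A."""
    # Evidence p(B) via the law of total probability (assumes A is binary).
    evidence = lik_b_given_a * prior_a + lik_b_given_not_a * (1.0 - prior_a)
    if evidence == 0.0:
        raise ValueError("p(B) is zero, so p(A|B) is undefined")
    # posterior = prior * likelihood / evidence
    return prior_a * lik_b_given_a / evidence


# Hypothetical numbers: p(A) = 0.3, p(B|A) = 0.8, p(B|not A) = 0.2.
print(round(posterior(0.3, 0.8, 0.2), 4))  # 0.6316
```

Observing B raises the belief in A from the prior 0.3 to about 0.63, because B is four times more likely when A holds.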

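Recall a few probability basics
$$p(A|\text{data}) = \frac{p(A)\,p(\text{data}|A)}{p(\text{data})}$$
$$p(\text{SLB} = \text{champion}|\text{data}) = \frac{p(\text{SLB} = \text{champion})\,p(\text{data}|\text{SLB} = \text{champion})}{p(\text{data})}$$
$$\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}}$$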

Why probabilities in IR?
• In traditional IR systems, each document is matched against the query in a semantically imprecise space of index terms.
[Figure: the user's information need is turned into a query representation, and our understanding of the user need is uncertain. Documents are turned into document representations, which give only an uncertain guess of whether a document has relevant content. How do we match the two?]
Probabilities provide a principled foundation for uncertain reasoning. Can we use probabilities to quantify our uncertainties?

The document ranking problem
• We have a collection of documents.
• A user issues a query.
• A list of documents needs to be returned.
• The ranking method is the core of an IR system:
  • In what order do we present documents to the user?
  • We want the "best" document first, the second best second, and so on.
Idea: rank documents by their probability of relevance with respect to the information need.
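A minimal sketch of the ranking idea, assuming the probabilities P(R=1|d,q) have already been estimated by some model (estimating them is the hard part, and the values below are made up). Ties are broken by document id so the output is deterministic; that tie-breaking rule is an arbitrary choice, and `rank_by_relevance` is a hypothetical helper name.

```python
from typing import Dict, List

def rank_by_relevance(prob_relevant: Dict[str, float]) -> List[str]:
    """Order document ids by decreasing P(R=1 | d, q); ties broken by id."""
    return sorted(prob_relevant, key=lambda doc_id: (-prob_relevant[doc_id], doc_id))


scores = {"d1": 0.20, "d2": 0.85, "d3": 0.50}  # invented P(R=1|d,q) values
print(rank_by_relevance(scores))  # ['d2', 'd3', 'd1']
```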

Modeling relevance: P(R=1 | document, query)
• Let d represent a document in the collection.
• Let R represent the relevance of a document with respect to a query q.
• Let R=1 represent relevant and R=0 not relevant.
• Our goal is to estimate:
$$p(r=1|q,d) = \frac{p(d,q|r=1)\,p(r=1)}{p(d,q)}$$
$$p(r=0|q,d) = \frac{p(d,q|r=0)\,p(r=0)}{p(d,q)}$$
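The evidence p(d,q) is usually not known directly. Assuming relevance is binary, it equals the sum of the two numerators, so both posteriors can be computed from p(d,q|r) and p(r) alone, and they always add up to one. The function below is a hypothetical helper and the inputs in the example are invented.

```python
from typing import Tuple

def relevance_posteriors(lik_rel: float, lik_nonrel: float, prior_rel: float) -> Tuple[float, float]:
    """Return (p(r=1|q,d), p(r=0|q,d)).

    lik_rel    = p(d,q|r=1)
    lik_nonrel = p(d,q|r=0)
    prior_rel  = p(r=1), so p(r=0) = 1 - prior_rel
    """
    joint_rel = lik_rel * prior_rel                # p(d,q|r=1) p(r=1)
    joint_nonrel = lik_nonrel * (1.0 - prior_rel)  # p(d,q|r=0) p(r=0)
    evidence = joint_rel + joint_nonrel            # p(d,q), by total probability
    return joint_rel / evidence, joint_nonrel / evidence


p_rel, p_nonrel = relevance_posteriors(lik_rel=0.02, lik_nonrel=0.005, prior_rel=0.1)
print(round(p_rel, 3), round(p_nonrel, 3))  # 0.308 0.692
```

Even with a low prior p(r=1) = 0.1, a document that is four times more likely under relevance ends up with a posterior of about 0.31.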

Probability Ranking Principle (PRP)
• PRP in action: rank all documents by $p(r=1|q,d)$.
• Theorem: using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss.
  • Provable if all probabilities are correct, etc. [e.g., Ripley 1996]
$$p(r|q,d) = \frac{p(d,q|r)\,p(r)}{p(d,q)}$$
• Using odds, we reach a more convenient formulation of ranking:
$$O(R|q,d) = \frac{p(r=1|q,d)}{p(r=0|q,d)}$$
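Because p(r=0|q,d) = 1 - p(r=1|q,d) for binary relevance, the odds are a strictly increasing function of p(r=1|q,d), so switching to odds cannot change the PRP ordering. The small Python check below illustrates this on made-up probabilities; `odds` and `rank` are hypothetical helper names.

```python
from typing import Dict, List

def odds(p_rel: float) -> float:
    """O(R|q,d) = p(r=1|q,d) / p(r=0|q,d), with p(r=0|q,d) = 1 - p(r=1|q,d)."""
    if not 0.0 <= p_rel < 1.0:
        raise ValueError("p_rel must lie in [0, 1) for the odds to be finite")
    return p_rel / (1.0 - p_rel)

def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by decreasing score."""
    return sorted(scores, key=lambda d: scores[d], reverse=True)


probs = {"d1": 0.20, "d2": 0.85, "d3": 0.50}  # invented p(r=1|q,d) values
by_prob = rank(probs)
by_odds = rank({d: odds(p) for d, p in probs.items()})
print(by_prob, by_odds)  # ['d2', 'd3', 'd1'] ['d2', 'd3', 'd1']
```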

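Probabilistic retrieval models interpretation
• PRP in action: rank all documents by $p(r=1|q,d)$.
• Theorem: using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss.
  • Provable if all probabilities are correct, etc. [e.g., Ripley 1996]
$$p(r|q,d) = \frac{p(d,q|r)\,p(r)}{p(d,q)}$$
• Using odds, we reach a more convenient ranking formulation:
$$O(R|q,d) = \frac{p(r=1|q,d)}{p(r=0|q,d)} \propto \frac{p(d|q,r=1)}{p(d|q,r=0)}$$
The proportionality holds because, writing $p(r|q,d) = p(d|q,r)\,p(r|q)/p(d|q)$, the factor $p(d|q)$ cancels in the ratio and $p(r=1|q)/p(r=0|q)$ is the same for every document, so it does not change the ranking.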
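To illustrate why the proportionality is enough for ranking, the sketch below scores documents by the likelihood ratio p(d|q,r=1)/p(d|q,r=0) and, separately, by the full odds, i.e. the same ratio multiplied by the per-query prior odds p(r=1|q)/p(r=0|q). Assuming that prior is shared by all documents for a given query, the two orderings coincide. All numbers and function names are invented for illustration.

```python
from typing import Dict, List, Tuple

# likelihoods[doc_id] = (p(d|q,r=1), p(d|q,r=0)); values below are hypothetical.
Likelihoods = Dict[str, Tuple[float, float]]

def likelihood_ratio(lik: Tuple[float, float]) -> float:
    p_given_rel, p_given_nonrel = lik
    return p_given_rel / p_given_nonrel

def rank_by_ratio(likelihoods: Likelihoods) -> List[str]:
    return sorted(likelihoods, key=lambda d: likelihood_ratio(likelihoods[d]), reverse=True)

def rank_by_odds(likelihoods: Likelihoods, prior_rel: float) -> List[str]:
    # Full odds: prior odds p(r=1|q)/p(r=0|q) times the likelihood ratio.
    prior_odds = prior_rel / (1.0 - prior_rel)
    return sorted(likelihoods, key=lambda d: prior_odds * likelihood_ratio(likelihoods[d]), reverse=True)


liks = {"d1": (0.010, 0.020), "d2": (0.030, 0.010), "d3": (0.005, 0.004)}
print(rank_by_ratio(liks))                 # ['d2', 'd3', 'd1']
print(rank_by_odds(liks, prior_rel=0.05))  # ['d2', 'd3', 'd1']
```

The prior only rescales every score by the same positive constant, which is why probabilistic models can drop it and estimate just the likelihood ratio.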

The two families of retrieval models
Probability Ranking Principle:
$$O(R|q,d) = \frac{p(r=1|q,d)}{p(r=0|q,d)}$$
• Probabilistic retrieval models:
$$O(R|q,d) \propto \frac{p(d|q,r=1)}{p(d|q,r=0)}$$
  • Vector Space Model
  • Binary Independence Model
  • BM25
• Language models:
$$O(R|q,d) \propto \log \frac{p(q|d,r)\,p(r|d)}{p(q|d,\bar{r})\,p(\bar{r}|d)}$$
  • LM Dirichlet
  • LM Jelinek-Mercer
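The slide only names the two language-model variants. Under the usual simplifying assumptions, the language-model branch ranks documents by the query log-likelihood log p(q|d), and LM Dirichlet and LM Jelinek-Mercer differ in how they smooth p(w|d) with the collection model p(w|C): Jelinek-Mercer uses p(w|d) = (1 − λ)·tf(w,d)/|d| + λ·p(w|C), and Dirichlet uses p(w|d) = (tf(w,d) + μ·p(w|C)) / (|d| + μ). The sketch below implements those textbook forms; the parameter values (λ = 0.1, μ = 2000), the whitespace tokenization, the toy collection and all function names are assumptions for illustration, not values taken from these slides.

```python
import math
from collections import Counter
from typing import List

def _collection_prob(term: str, collection: Counter, collection_len: int) -> float:
    """Maximum-likelihood p(w|C); assumes every query term occurs in the collection."""
    p = collection[term] / collection_len
    if p == 0.0:
        raise ValueError(f"query term {term!r} does not occur in the collection")
    return p

def score_jelinek_mercer(query: List[str], doc: List[str], collection: Counter, lam: float = 0.1) -> float:
    """log p(q|d) with p(w|d) = (1 - lam) * tf(w,d)/|d| + lam * p(w|C); doc must be non-empty."""
    tf, n = Counter(doc), sum(collection.values())
    return sum(
        math.log((1 - lam) * tf[w] / len(doc) + lam * _collection_prob(w, collection, n))
        for w in query
    )

def score_dirichlet(query: List[str], doc: List[str], collection: Counter, mu: float = 2000.0) -> float:
    """log p(q|d) with p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)."""
    tf, n = Counter(doc), sum(collection.values())
    return sum(
        math.log((tf[w] + mu * _collection_prob(w, collection, n)) / (len(doc) + mu))
        for w in query
    )


docs = {
    "d1": "probability ranking principle ranking".split(),
    "d2": "vector space model".split(),
}
collection = Counter(w for doc in docs.values() for w in doc)
query = "ranking principle".split()
for name, doc in docs.items():
    print(name, score_jelinek_mercer(query, doc, collection), score_dirichlet(query, doc, collection))
```

In this toy example d1 gets the higher score under both estimators. Smoothing is what keeps d2's score finite even though it contains neither query term; with the large default μ, the Dirichlet scores of the two documents end up much closer than the Jelinek-Mercer ones.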
