Introduction to Natural Language Processing

Introduction to Natural Language Processing
a course taught as B4M36NLP at Open Informatics
by members of the Institute of Formal and Applied Linguistics

Today: Week 8, lecture
Today’s topic: Probabilistic Models for Information Retrieval
Today’s teacher: Pavel Pecina
E-mail: pecina@ufal.mff.cuni.cz
WWW: http://ufal.mff.cuni.cz/~pecina/

Contents
▶ Introduction
▶ Probability Ranking Principle
▶ Binary Independence Model
▶ Okapi BM25
▶ Language models
▶ Summary

Probabilistic IR models at a glance
▶ Classical probabilistic retrieval model
  1. Probability Ranking Principle
  2. Binary Independence Model
  3. BestMatch25 (Okapi)
▶ Language model approach to IR

Introduction

Probabilistic model vs. other models
▶ Boolean model: probabilistic models support ranking and thus are better than the simple Boolean model.
▶ Vector space model: the vector space model is also a formally defined model that supports ranking.
▶ Why would we want to look for an alternative to the vector space model?

Probabilistic vs. vector space model
▶ Vector space model: rank documents according to similarity to query.
▶ The notion of similarity does not translate directly into an assessment of “is the document a good document to give to the user or not?”
▶ The most similar document can be highly relevant or completely nonrelevant.
▶ Probability theory is arguably a cleaner formalization of what we really want an IR system to do: give relevant documents to the user.

Basic Probability Theory
▶ For events A and B:
  ▶ Joint probability $P(A \cap B)$: both events occurring
  ▶ Conditional probability $P(A \mid B)$: A occurring given B has occurred
▶ Chain rule gives relationship between joint/conditional probabilities:
  $P(AB) = P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
▶ Similarly for the complement of an event $P(\bar{A})$:
  $P(\bar{A}B) = P(B \mid \bar{A})\,P(\bar{A})$
▶ Partition rule: if B can be divided into an exhaustive set of disjoint subcases, then $P(B)$ is the sum of the probabilities of the subcases. A special case of this rule gives:
  $P(B) = P(AB) + P(\bar{A}B)$
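The chain rule and the partition rule can be checked numerically on a small joint distribution. The sketch below is only an illustration: the four joint probabilities are invented, and the helper names are assumptions, not part of the lecture.

```python
# Toy joint distribution over two binary events A and B (invented values).
# Keys are (A holds?, B holds?).
joint = {
    (True, True): 0.12,    # P(A, B)
    (True, False): 0.28,   # P(A, not B)
    (False, True): 0.18,   # P(not A, B)
    (False, False): 0.42,  # P(not A, not B)
}


def marginal_a(a):
    """P(A = a), obtained by summing the joint over B."""
    return sum(p for (x, _), p in joint.items() if x == a)


def marginal_b(b):
    """P(B = b), obtained by summing the joint over A."""
    return sum(p for (_, y), p in joint.items() if y == b)


def cond_a_given_b(a, b):
    """P(A = a | B = b)."""
    return joint[(a, b)] / marginal_b(b)


def cond_b_given_a(b, a):
    """P(B = b | A = a)."""
    return joint[(a, b)] / marginal_a(a)


# Chain rule: P(AB) = P(A|B) P(B) = P(B|A) P(A)
p_ab = joint[(True, True)]
chain_via_b = cond_a_given_b(True, True) * marginal_b(True)
chain_via_a = cond_b_given_a(True, True) * marginal_a(True)

# Partition rule (special case): P(B) = P(AB) + P(not-A B)
partition = joint[(True, True)] + joint[(False, True)]

print(p_ab, chain_via_b, chain_via_a)
print(partition, marginal_b(True))
```

Both chain-rule products reproduce the joint probability, and the two subcases of B add up to its marginal, up to floating-point rounding.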

Basic probability theory cont’d
▶ Bayes’ Rule for inverting conditional probabilities:
  $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \left[ \frac{P(B \mid A)}{\sum_{X \in \{A, \bar{A}\}} P(B \mid X)\,P(X)} \right] P(A)$
▶ Can be thought of as a way of updating probabilities:
  ▶ Start off with prior probability $P(A)$ (initial estimate of how likely event A is in the absence of any other information).
  ▶ Derive a posterior probability $P(A \mid B)$ after having seen the evidence B, based on the likelihood of B occurring in the two cases that A does or does not hold.
▶ Odds of an event is a kind of multiplier for how probabilities change:
  Odds: $O(A) = \frac{P(A)}{P(\bar{A})} = \frac{P(A)}{1 - P(A)}$
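A small numeric sketch of Bayes' rule and odds may help here. The prior and the two likelihoods are invented values, and the function names are illustrative assumptions. The example also shows why odds act as a multiplier: the posterior odds equal the prior odds times the likelihood ratio $P(B \mid A) / P(B \mid \bar{A})$.

```python
def posterior(prior_a, lik_b_given_a, lik_b_given_not_a):
    """P(A|B) by Bayes' rule, with P(B) expanded via the partition rule."""
    evidence = lik_b_given_a * prior_a + lik_b_given_not_a * (1.0 - prior_a)
    return lik_b_given_a * prior_a / evidence


def odds(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1.0 - p)


prior = 0.2  # P(A), invented
post = posterior(prior, lik_b_given_a=0.9, lik_b_given_not_a=0.3)

likelihood_ratio = 0.9 / 0.3
print(post)                      # posterior P(A|B)
print(odds(prior), odds(post))   # prior odds and posterior odds
print(odds(post) / odds(prior))  # matches the likelihood ratio up to rounding
```

With these numbers the evidence B raises the probability of A from 0.2 to 3/7, and the odds grow by exactly the likelihood ratio of 3.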

Probability Ranking Principle

The document ranking problem
▶ Ranked retrieval setup: given a collection of documents, the user issues a query, and an ordered list of documents is returned.
▶ Assume binary relevance: $R_{d,q}$ is a random dichotomous variable:
  $R_{d,q} = 1$ if document d is relevant w.r.t. query q
  $R_{d,q} = 0$ otherwise.
▶ Often we write just $R$ for $R_{d,q}$.
▶ Probabilistic ranking orders documents decreasingly by their estimated probability of relevance w.r.t. query: $P(R = 1 \mid d, q)$.
▶ Assume that the relevance of each document is independent of the relevance of other documents.

Probability Ranking Principle (PRP)
PRP in brief:
▶ If the retrieved documents (w.r.t. a query) are ranked decreasingly on their probability of relevance, then the effectiveness of the system will be the best that is obtainable.
PRP in full:
▶ If [the IR] system’s response to each [query] is a ranking of the documents […] in order of decreasing probability of relevance to the [query], where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.
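The ranking step itself is just a sort on the estimated probabilities. The sketch below uses invented document ids and invented estimates of P(R = 1 | d, q). It measures effectiveness in one simple way, chosen here as an assumption: the expected number of relevant documents among the top k, which by linearity of expectation is the sum of their relevance probabilities. A brute-force check over all pairs confirms that the top 2 of the PRP ranking maximize it.

```python
from itertools import combinations


def rank_by_relevance(estimates):
    """Return document ids sorted by decreasing estimated P(R=1 | d, q)."""
    return sorted(estimates, key=lambda d: estimates[d], reverse=True)


def expected_relevant(doc_ids, estimates):
    """Expected number of relevant documents in the given set."""
    return sum(estimates[d] for d in doc_ids)


# Invented estimates of P(R=1 | d, q) for four hypothetical documents.
estimates = {"d1": 0.35, "d2": 0.80, "d3": 0.10, "d4": 0.55}

ranking = rank_by_relevance(estimates)

# Brute force: which pair of documents has the highest expected number
# of relevant documents? It should be the top 2 of the PRP ranking.
best_pair = max(
    combinations(estimates, 2),
    key=lambda pair: expected_relevant(pair, estimates),
)

print(ranking)
print(sorted(best_pair), sorted(ranking[:2]))
```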

Binary Independence Model

Binary Independence Model (BIM)
Traditionally used with the PRP, with the following assumptions:
1. ‘Binary’ (equivalent to Boolean): documents and queries represented as binary term incidence vectors.
  ▶ E.g., document d represented by vector $\vec{x} = (x_1, \ldots, x_M)$, where $x_t = 1$ if term t occurs in d and $x_t = 0$ otherwise.
  ▶ Different documents may have the same vector representation.
  ▶ Similarly, we represent q by the incidence vector $\vec{q}$.
2. ‘Independence’: no association between terms (not true, but it works in practice; this is the ‘naive’ assumption of Naive Bayes models).

Binary incidence matrix

            Antony and   Julius   The       Hamlet   Othello   Macbeth   …
            Cleopatra    Caesar   Tempest
Antony      1            1        0         0        0         1
Brutus      1            1        0         1        0         0
Caesar      1            1        0         1        1         1
Calpurnia   0            1        0         0        0         0
Cleopatra   1            0        0         0        0         0
mercy       1            0        1         1        1         1
worser      1            0        1         1        1         0
…

Each document is represented as a binary vector $\in \{0, 1\}^{|V|}$.
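A minimal sketch of how such incidence vectors can be built. The four short "documents" and the whitespace tokenization are assumptions made purely for illustration. Note that d1 and d2 differ as texts but get the same vector, and that the repeated term in d3 still yields a single 1.

```python
# Hypothetical toy collection (not from the lecture).
docs = {
    "d1": "brutus killed caesar",
    "d2": "caesar killed brutus",
    "d3": "caesar praised brutus and caesar",
    "d4": "cleopatra showed mercy",
}

# Vocabulary V: all distinct terms, in a fixed (sorted) order.
vocabulary = sorted({term for text in docs.values() for term in text.split()})


def incidence_vector(text, vocabulary):
    """x_t = 1 if term t occurs in the text, 0 otherwise (frequency ignored)."""
    present = set(text.split())
    return [1 if term in present else 0 for term in vocabulary]


vectors = {doc_id: incidence_vector(text, vocabulary) for doc_id, text in docs.items()}
query_vector = incidence_vector("brutus caesar", vocabulary)

print(vocabulary)
for doc_id, vector in vectors.items():
    print(doc_id, vector)
print("q ", query_vector)
```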

Binary Independence Model (1)
To make a probabilistic retrieval strategy precise, we need to estimate how terms in documents contribute to relevance:
▶ Find measurable statistics (term frequency, document frequency, document length) that affect judgments about document relevance.
▶ Combine these statistics to estimate the probability $P(R \mid d, q)$ of document relevance.
▶ Next: how exactly we can do this.

Binary Independence Model (2)
$P(R \mid d, q)$ is modeled using term incidence vectors as $P(R \mid \vec{x}, \vec{q})$:
  $P(R = 1 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 1, \vec{q})\,P(R = 1 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$
  $P(R = 0 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 0, \vec{q})\,P(R = 0 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$
▶ $P(\vec{x} \mid R = 1, \vec{q})$ and $P(\vec{x} \mid R = 0, \vec{q})$: probability that if a relevant or nonrelevant document is retrieved, then that document’s representation is $\vec{x}$.
▶ Use statistics about the document collection to estimate these probabilities.

Binary Independence Model (3)
$P(R \mid d, q)$ is modeled using term incidence vectors as $P(R \mid \vec{x}, \vec{q})$:
  $P(R = 1 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 1, \vec{q})\,P(R = 1 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$
  $P(R = 0 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 0, \vec{q})\,P(R = 0 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$
▶ $P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$: prior probability of retrieving a relevant or nonrelevant document for a query $\vec{q}$.
▶ Estimate $P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$ from percentage of relevant documents in the collection.
▶ Since a document is either relevant or nonrelevant to a query, we must have that:
  $P(R = 1 \mid \vec{x}, \vec{q}) + P(R = 0 \mid \vec{x}, \vec{q}) = 1$
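The two posteriors can be computed side by side in a few lines. This is a sketch under stated assumptions, not the lecture's own estimation procedure: the likelihood is factorized over terms using the 'Independence' assumption of the BIM, the per-term probabilities `p_rel` and `p_nonrel` and the prior are invented numbers, and $P(\vec{x} \mid \vec{q})$ is obtained from the partition rule over R = 1 and R = 0.

```python
from math import prod


def likelihood(x, term_probs):
    """P(x | R, q) under term independence.

    term_probs[t] is the (assumed) probability that term t occurs in a
    document of the given relevance class; each term contributes that
    probability if x_t = 1 and its complement if x_t = 0.
    """
    return prod(p if x_t == 1 else 1.0 - p for x_t, p in zip(x, term_probs))


def posteriors(x, p_rel, p_nonrel, prior_rel):
    """Return (P(R=1 | x, q), P(R=0 | x, q)) via Bayes' rule."""
    joint_rel = likelihood(x, p_rel) * prior_rel
    joint_nonrel = likelihood(x, p_nonrel) * (1.0 - prior_rel)
    evidence = joint_rel + joint_nonrel  # P(x | q), by the partition rule
    return joint_rel / evidence, joint_nonrel / evidence


# Invented example: three terms, a document containing terms 1 and 3.
x = [1, 0, 1]
p_rel = [0.8, 0.3, 0.6]     # P(x_t = 1 | R = 1, q), invented
p_nonrel = [0.2, 0.4, 0.1]  # P(x_t = 1 | R = 0, q), invented
prior_rel = 0.1             # P(R = 1 | q), invented

post_rel, post_nonrel = posteriors(x, p_rel, p_nonrel, prior_rel)
print(post_rel, post_nonrel, post_rel + post_nonrel)
```

Because both posteriors share the same denominator, they sum to 1 by construction, which is the constraint stated in the last bullet.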
