# Introduction to Natural Language Processing: Probabilistic Models for Information Retrieval


1. Introduction to Natural Language Processing, a course taught as B4M36NLP at Open Informatics by members of the Institute of Formal and Applied Linguistics
   - Today: Week 8, lecture
   - Today's topic: Probabilistic Models for Information Retrieval
   - Today's teacher: Pavel Pecina
   - E-mail: pecina@ufal.mff.cuni.cz
   - WWW: http://ufal.mff.cuni.cz/~pecina/

2. Contents
   - Introduction
   - Probability Ranking Principle
   - Binary Independence Model
   - Okapi BM25
   - Language models
   - Summary

3. Probabilistic IR models at a glance
   - Classical probabilistic retrieval model
     1. Probability Ranking Principle
     2. Binary Independence Model
     3. BestMatch25 (Okapi)
   - Language model approach to IR

4. Introduction

5. Probabilistic model vs. other models
   - Boolean model: probabilistic models support ranking and thus are better than the simple Boolean model.
   - Vector space model: the vector space model is also a formally defined model that supports ranking.
   - Why would we want to look for an alternative to the vector space model?

6. Probabilistic vs. vector space model
   - Vector space model: rank documents according to similarity to query.
   - The notion of similarity does not translate directly into an assessment of “is the document a good document to give to the user or not?”
   - The most similar document can be highly relevant or completely nonrelevant.
   - Probability theory is arguably a cleaner formalization of what we really want an IR system to do: give relevant documents to the user.

7. Basic Probability Theory
   - For events $A$ and $B$:
     - Joint probability $P(A \cap B)$: both events occurring
     - Conditional probability $P(A \mid B)$: $A$ occurring given $B$ has occurred
   - Chain rule gives the relationship between joint and conditional probabilities:

     $$P(AB) = P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

   - Similarly for the complement of an event, $P(\overline{A})$:

     $$P(\overline{A}B) = P(B \mid \overline{A})\,P(\overline{A})$$

   - Partition rule: if $B$ can be divided into an exhaustive set of disjoint subcases, then $P(B)$ is the sum of the probabilities of the subcases. A special case of this rule gives:

     $$P(B) = P(AB) + P(\overline{A}B)$$
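
As a quick numerical check of the chain rule and the partition rule, here is a minimal sketch. The joint distribution and the helper names (`joint`, `p_a`, `p_b`, `p_a_given_b`, `p_b_given_a`) are made up for illustration and do not come from the slides.

```python
import math

# Hypothetical joint distribution P(A, B) over two binary events.
# The numbers are invented for illustration and sum to 1.
joint = {
    (True, True): 0.12,    # P(A, B)
    (True, False): 0.18,   # P(A, not B)
    (False, True): 0.28,   # P(not A, B)
    (False, False): 0.42,  # P(not A, not B)
}


def p_a(a):
    """Marginal probability that event A has truth value `a`."""
    return sum(p for (x, _), p in joint.items() if x == a)


def p_b(b):
    """Marginal probability that event B has truth value `b`."""
    return sum(p for (_, y), p in joint.items() if y == b)


def p_a_given_b(a, b):
    """Conditional probability P(A = a | B = b)."""
    return joint[(a, b)] / p_b(b)


def p_b_given_a(b, a):
    """Conditional probability P(B = b | A = a)."""
    return joint[(a, b)] / p_a(a)


# Chain rule: P(AB) = P(A|B) P(B) = P(B|A) P(A)
chain_via_b = p_a_given_b(True, True) * p_b(True)
chain_via_a = p_b_given_a(True, True) * p_a(True)

# Partition rule: P(B) = P(AB) + P(not-A B)
partition_b = joint[(True, True)] + joint[(False, True)]

print(chain_via_b, chain_via_a, partition_b)
```

Both chain-rule products recover the joint entry for $(A, B)$, and the partition sum equals the marginal $P(B)$, up to floating-point rounding.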

8. Basic probability theory cont'd
   - Bayes' Rule for inverting conditional probabilities:

     $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \left[\frac{P(B \mid A)}{\sum_{X \in \{A, \overline{A}\}} P(B \mid X)\,P(X)}\right] \cdot P(A)$$

   - Can be thought of as a way of updating probabilities:
     - Start off with prior probability $P(A)$ (initial estimate of how likely event $A$ is in the absence of any other information).
     - Derive a posterior probability $P(A \mid B)$ after having seen the evidence $B$, based on the likelihood of $B$ occurring in the two cases that $A$ does or does not hold.
   - Odds of an event is a kind of multiplier for how probabilities change:

     $$\text{Odds:}\quad O(A) = \frac{P(A)}{P(\overline{A})} = \frac{P(A)}{1 - P(A)}$$
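
The update from prior to posterior, and the role of odds as a multiplier, can be sketched as follows. The function names and the numbers (a prior of 0.1, likelihoods 0.8 and 0.2) are assumptions chosen for illustration only.

```python
import math


def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from the prior P(A) and the two likelihoods of B."""
    # Denominator P(B) via the partition rule over {A, not A}.
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


def odds(p):
    """Odds O(A) = P(A) / (1 - P(A)); undefined for p == 1."""
    return p / (1.0 - p)


# Hypothetical numbers: evidence B is four times as likely when A holds.
prior = 0.1
posterior = bayes_posterior(prior, p_b_given_a=0.8, p_b_given_not_a=0.2)
likelihood_ratio = 0.8 / 0.2

print(f"prior={prior:.3f} posterior={posterior:.3f}")
print(f"prior odds={odds(prior):.3f} posterior odds={odds(posterior):.3f}")
```

In this sketch the posterior odds equal the prior odds multiplied by the likelihood ratio $P(B \mid A) / P(B \mid \overline{A})$, which is the sense in which odds act as a multiplier.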

9. Probability Ranking Principle

10. The document ranking problem
    - Ranked retrieval setup: given a collection of documents, the user issues a query, and an ordered list of documents is returned.
    - Assume binary relevance: $R_{d,q}$ is a random dichotomous variable:
      - $R_{d,q} = 1$ if document $d$ is relevant w.r.t. query $q$
      - $R_{d,q} = 0$ otherwise.
    - Often we write just $R$ for $R_{d,q}$.
    - Probabilistic ranking orders documents decreasingly by their estimated probability of relevance w.r.t. the query: $P(R = 1 \mid d, q)$.
    - Assume that the relevance of each document is independent of the relevance of other documents.
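
A minimal sketch of the ranking step, assuming the estimates of $P(R = 1 \mid d, q)$ are already available. The document identifiers and probabilities below are invented, and `rank_by_relevance` is a hypothetical helper name.

```python
def rank_by_relevance(prob_relevant):
    """Order document ids by decreasing estimated P(R = 1 | d, q)."""
    return sorted(prob_relevant, key=prob_relevant.get, reverse=True)


# Hypothetical estimates for three documents and one fixed query.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
ranking = rank_by_relevance(estimates)
print(ranking)
```

Because each document is scored on its own, this relies on the independence assumption stated in the last bullet: the score of one document never depends on which other documents are returned.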

11. Probability Ranking Principle (PRP)
    - PRP in brief: If the retrieved documents (w.r.t. a query) are ranked decreasingly on their probability of relevance, then the effectiveness of the system will be the best that is obtainable.
    - PRP in full: If [the IR] system's response to each [query] is a ranking of the documents […] in order of decreasing probability of relevance to the [query], where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.

12. Binary Independence Model

13. Binary Independence Model (BIM)

    Traditionally used with the PRP, with the following assumptions:
    1. ‘Binary’ (equivalent to Boolean): documents and queries are represented as binary term incidence vectors.
       - E.g., document $d$ is represented by vector $\vec{x} = (x_1, \ldots, x_M)$, where $x_t = 1$ if term $t$ occurs in $d$ and $x_t = 0$ otherwise.
       - Different documents may have the same vector representation.
       - Similarly, we represent $q$ by the incidence vector $\vec{q}$.
    2. ‘Independence’: no association between terms (not true, but it works in practice: the ‘naive’ assumption of Naive Bayes models).
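
The binary representation can be sketched as below. The whitespace tokenizer, the three-term vocabulary, and the example texts are simplifying assumptions for illustration, and `incidence_vector` is a hypothetical helper name.

```python
def incidence_vector(text, vocabulary):
    """Return the binary term incidence vector of `text` over `vocabulary`.

    Tokenization is a plain lowercase whitespace split, which is an
    assumption of this sketch rather than part of the model.
    """
    present = set(text.lower().split())
    return [1 if term in present else 0 for term in vocabulary]


vocabulary = ["brutus", "caesar", "calpurnia"]

# Two different (made-up) documents and a query.
x1 = incidence_vector("Caesar praised Brutus", vocabulary)
x2 = incidence_vector("Brutus stabbed Caesar and Caesar fell", vocabulary)
q = incidence_vector("brutus caesar", vocabulary)

print(x1, x2, q)
```

The two documents differ in wording, length, and term counts, yet map to the same vector, because only the presence or absence of each vocabulary term is recorded.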

14. Binary incidence matrix

    |           | Anthony and Cleopatra | Julius Caesar | The Tempest | Hamlet | Othello | Macbeth | … |
    |-----------|-----------------------|---------------|-------------|--------|---------|---------|---|
    | Anthony   | 1                     | 1             | 0           | 0      | 0       | 1       |   |
    | Brutus    | 1                     | 1             | 0           | 1      | 0       | 0       |   |
    | Caesar    | 1                     | 1             | 0           | 1      | 1       | 1       |   |
    | Calpurnia | 0                     | 1             | 0           | 0      | 0       | 0       |   |
    | Cleopatra | 1                     | 0             | 0           | 0      | 0       | 0       |   |
    | mercy     | 1                     | 0             | 1           | 1      | 1       | 1       |   |
    | worser    | 1                     | 0             | 1           | 1      | 1       | 0       |   |
    | …         |                       |               |             |        |         |         |   |

    Each document is represented as a binary vector $\in \{0, 1\}^{|V|}$.

15. Binary Independence Model (1)

    To make a probabilistic retrieval strategy precise, we need to estimate how terms in documents contribute to relevance:
    - Find measurable statistics (term frequency, document frequency, document length) that affect judgments about document relevance.
    - Combine these statistics to estimate the probability $P(R \mid d, q)$ of document relevance.
    - Next: how exactly we can do this.

16. Binary Independence Model (2)

    $P(R \mid d, q)$ is modeled using term incidence vectors as $P(R \mid \vec{x}, \vec{q})$:

    $$P(R = 1 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 1, \vec{q})\,P(R = 1 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$$

    $$P(R = 0 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 0, \vec{q})\,P(R = 0 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$$

    - $P(\vec{x} \mid R = 1, \vec{q})$ and $P(\vec{x} \mid R = 0, \vec{q})$: probability that if a relevant or nonrelevant document is retrieved, then that document's representation is $\vec{x}$.
    - Use statistics about the document collection to estimate these probabilities.

17. Binary Independence Model (3)

    $P(R \mid d, q)$ is modeled using term incidence vectors as $P(R \mid \vec{x}, \vec{q})$:

    $$P(R = 1 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 1, \vec{q})\,P(R = 1 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$$

    $$P(R = 0 \mid \vec{x}, \vec{q}) = \frac{P(\vec{x} \mid R = 0, \vec{q})\,P(R = 0 \mid \vec{q})}{P(\vec{x} \mid \vec{q})}$$

    - $P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$: prior probability of retrieving a relevant or nonrelevant document for a query $\vec{q}$.
    - Estimate $P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$ from the percentage of relevant documents in the collection.
    - Since a document is either relevant or nonrelevant to a query, we must have that:

      $$P(R = 1 \mid \vec{x}, \vec{q}) + P(R = 0 \mid \vec{x}, \vec{q}) = 1$$
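
The two posteriors can be computed together, as in the following sketch. It assumes the likelihoods of the vector $\vec{x}$ are already known. The collection size, the relevant-document count, the likelihood values, and the name `relevance_posteriors` are all hypothetical.

```python
import math


def relevance_posteriors(p_x_given_rel, p_x_given_nonrel, prior_rel):
    """Return (P(R=1 | x, q), P(R=0 | x, q)) via Bayes' rule."""
    prior_nonrel = 1.0 - prior_rel
    # P(x | q) by the partition rule over R = 1 and R = 0.
    p_x = p_x_given_rel * prior_rel + p_x_given_nonrel * prior_nonrel
    post_rel = p_x_given_rel * prior_rel / p_x
    post_nonrel = p_x_given_nonrel * prior_nonrel / p_x
    return post_rel, post_nonrel


# Hypothetical collection: 20 of 1000 documents are relevant to the query,
# so the prior P(R=1 | q) is estimated as their fraction.
prior_rel = 20 / 1000

# Hypothetical likelihoods of observing this particular vector x.
post_rel, post_nonrel = relevance_posteriors(0.05, 0.001, prior_rel)
print(post_rel, post_nonrel, post_rel + post_nonrel)
```

The shared denominator $P(\vec{x} \mid \vec{q})$ is what makes the two posteriors sum to 1.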

18. Deriving a ranking function for query terms (1)
    - Given a query $q$, ranking documents by $P(R = 1 \mid d, q)$ is modeled under BIM as ranking them by $P(R = 1 \mid \vec{x}, \vec{q})$.
    - Easier: rank documents by their odds of relevance (same ranking):

      $$O(R \mid \vec{x}, \vec{q}) = \frac{P(R = 1 \mid \vec{x}, \vec{q})}{P(R = 0 \mid \vec{x}, \vec{q})} = \frac{\dfrac{P(R = 1 \mid \vec{q})\,P(\vec{x} \mid R = 1, \vec{q})}{P(\vec{x} \mid \vec{q})}}{\dfrac{P(R = 0 \mid \vec{q})\,P(\vec{x} \mid R = 0, \vec{q})}{P(\vec{x} \mid \vec{q})}} = \frac{P(R = 1 \mid \vec{q})}{P(R = 0 \mid \vec{q})} \cdot \frac{P(\vec{x} \mid R = 1, \vec{q})}{P(\vec{x} \mid R = 0, \vec{q})}$$

    - $\dfrac{P(R = 1 \mid \vec{q})}{P(R = 0 \mid \vec{q})} = O(R \mid \vec{q})$ is a constant for a given query → can be ignored.
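
To illustrate why the prior odds can be dropped, the sketch below ranks three made-up documents twice: once by the full posterior $P(R = 1 \mid \vec{x}, \vec{q})$ and once by the likelihood ratio alone. All likelihood values, the prior, and the helper names are assumptions for illustration.

```python
def posterior_relevant(p_x_given_rel, p_x_given_nonrel, prior_rel):
    """P(R=1 | x, q) computed with Bayes' rule."""
    numerator = p_x_given_rel * prior_rel
    return numerator / (numerator + p_x_given_nonrel * (1.0 - prior_rel))


def likelihood_ratio(p_x_given_rel, p_x_given_nonrel):
    """P(x | R=1, q) / P(x | R=0, q): the odds without the prior-odds factor."""
    return p_x_given_rel / p_x_given_nonrel


# Hypothetical (P(x | R=1, q), P(x | R=0, q)) pairs for three documents.
docs = {
    "d1": (0.02, 0.01),
    "d2": (0.005, 0.02),
    "d3": (0.03, 0.03),
}
prior_rel = 0.1  # hypothetical P(R=1 | q), identical for every document

by_probability = sorted(
    docs, key=lambda d: posterior_relevant(*docs[d], prior_rel), reverse=True
)
by_ratio = sorted(docs, key=lambda d: likelihood_ratio(*docs[d]), reverse=True)

print(by_probability, by_ratio)
```

Both orderings agree because the odds are a monotonically increasing function of the probability and the prior odds $O(R \mid \vec{q})$ scale every document's score by the same positive constant.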
