Probabilistic Information Retrieval


  1. Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2017 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

  2. Why probabilities in IR?
  [Figure: the user's information need is uncertainly understood and represented as a query; whether a doc has relevant content is an uncertain guess from its representation. How to match the two?]
  - In traditional IR systems, matching between each doc and query is attempted in a semantically imprecise space of index terms.
  - Probabilities provide a principled foundation for uncertain reasoning.
  - Can we use probabilities to quantify our uncertainties?

  3. Probabilistic IR
  - Probabilistic methods are one of the oldest but also one of the currently hottest topics in IR.
  - Traditionally: neat ideas, but didn't win on performance.
  - It may be different now.

  4. Probabilistic IR topics
  - Classical probabilistic retrieval model
    - Probability Ranking Principle
    - Binary independence model (≈ we will see that it is a Naïve Bayes text categorization model)
    - (Okapi) BM25
  - Language model approach to IR
    - An important emphasis on this approach in recent work

  5. The document ranking problem
  - Problem specification:
    - We have a collection of docs
    - User issues a query
    - A list of docs needs to be returned
  - Ranking method is the core of an IR system: in what order do we present documents to the user?
  - Idea: rank by the probability of relevance of the doc w.r.t. the information need: $P(R = 1 \mid \text{doc}_i, \text{query})$

  6. Probability Ranking Principle (PRP)
  "If a reference retrieval system's response to each request is a ranking of the docs in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data." [1960s/1970s] S. Robertson, W.S. Cooper, M.E. Maron; van Rijsbergen (1979:113); Manning & Schütze (1999:538)

  7. Recall a few probability basics
  - Product rule: $p(a, b) = p(a \mid b)\, p(b)$
  - Sum rule: $p(a) = \sum_b p(a, b)$
  - Bayes' rule (prior $p(a)$, posterior $p(a \mid b)$): $p(a \mid b) = \dfrac{p(b \mid a)\, p(a)}{p(b)} = \dfrac{p(b \mid a)\, p(a)}{p(b \mid a)\, p(a) + p(b \mid \bar{a})\, p(\bar{a})}$
  - Odds: $O(a) = \dfrac{p(a)}{p(\bar{a})} = \dfrac{p(a)}{1 - p(a)}$
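These identities are easy to check numerically. A minimal sketch with hypothetical probabilities (none of these numbers come from the slides):

```python
# Hypothetical numbers: p(a) is the prior, p(b|a) and p(b|~a) the likelihoods.
p_a = 0.3
p_b_given_a = 0.8
p_b_given_not_a = 0.2

# Sum rule (marginalization): p(b) = p(b|a) p(a) + p(b|~a) p(~a)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: posterior p(a|b) = p(b|a) p(a) / p(b)
p_a_given_b = p_b_given_a * p_a / p_b

# Odds: O(a) = p(a) / (1 - p(a))
odds_a = p_a / (1 - p_a)

print(p_a_given_b)  # 0.631..., the prior 0.3 updated by the evidence b
print(odds_a)       # 0.428...
```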

  8. Probability Ranking Principle (PRP)
  - $d$: doc; $q$: query
  - $R$: relevance of a doc w.r.t. a given (fixed) query; $R = 1$: relevant, $R = 0$: not relevant
  - We need to find the probability that a doc $d$ is relevant to a query $q$: $p(R = 1 \mid d, q)$, where $p(R = 0 \mid d, q) = 1 - p(R = 1 \mid d, q)$

  9. Probability Ranking Principle (PRP)
  - By Bayes' rule: $p(R = 1 \mid d, q) = \dfrac{p(d \mid R = 1, q)\, p(R = 1 \mid q)}{p(d \mid q)}$ and $p(R = 0 \mid d, q) = \dfrac{p(d \mid R = 0, q)\, p(R = 0 \mid q)}{p(d \mid q)}$
  - $p(d \mid R = 1, q)$: probability of $d$ in the class of docs relevant to the query $q$
  - $p(d \mid R = 0, q)$: probability of $d$ in the class of docs non-relevant to the query $q$

  10. Probability Ranking Principle (PRP)
  - How do we compute all those probabilities?
  - We do not know the exact probabilities and have to use estimates.
  - The Binary Independence Model (BIM), which we discuss next, is the simplest such model.

  11. Probabilistic Retrieval Strategy
  - Estimate how terms contribute to relevance: how do things like tf, df, and doc length influence your judgments about doc relevance? A more nuanced answer is the Okapi formula (Spärck Jones / Robertson).
  - Combine the estimated values to find the doc relevance probability.
  - Order docs by decreasing probability.

  12. Probabilistic Ranking
  Basic concept: "For a given query, if we know some docs that are relevant, terms that occur in those docs should be given greater weighting in searching for other relevant docs. By making assumptions about the distribution of terms and applying Bayes' Theorem, it is possible to derive weights theoretically." (van Rijsbergen)

  13. Binary Independence Model
  - Traditionally used in conjunction with the PRP
  - "Binary" = Boolean: docs are represented as binary incidence vectors of terms (see the sketch below)
    - $\vec{x} = [x_1\ x_2\ \ldots\ x_n]$
    - $x_i = 1$ iff term $i$ is present in document $x$
  - "Independence": terms occur in docs independently
  - Equivalent to the multivariate Bernoulli Naive Bayes model, sometimes used for text categorization [we will see this in the next lectures]
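A minimal sketch of the "binary" part of the model, with a toy vocabulary and document (both hypothetical): term frequency is discarded and only presence or absence is kept.

```python
vocab = ["information", "retrieval", "probability", "ranking"]

def incidence_vector(doc_tokens, vocab):
    """Binary incidence vector: x_i = 1 iff term i occurs in the doc."""
    terms = set(doc_tokens)          # term frequency is deliberately ignored
    return [1 if t in terms else 0 for t in vocab]

doc = "ranking docs by probability of relevance".split()
print(incidence_vector(doc, vocab))  # [0, 0, 1, 1]
```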

  14. Binary Independence Model
  We will use odds and Bayes' rule:
  $O(R \mid q, \vec{x}) = \dfrac{P(R = 1 \mid q, \vec{x})}{P(R = 0 \mid q, \vec{x})} = \dfrac{P(R = 1 \mid q)}{P(R = 0 \mid q)} \cdot \dfrac{P(\vec{x} \mid R = 1, q)}{P(\vec{x} \mid R = 0, q)}$

  15. Binary Independence Model
  - In $O(R \mid q, \vec{x}) = \dfrac{P(R = 1 \mid q)}{P(R = 0 \mid q)} \cdot \dfrac{P(\vec{x} \mid R = 1, q)}{P(\vec{x} \mid R = 0, q)}$, the first factor is constant for a given query; only the second needs estimation.
  - Using the independence assumption: $\dfrac{P(\vec{x} \mid R = 1, q)}{P(\vec{x} \mid R = 0, q)} = \prod_{i=1}^{n} \dfrac{P(x_i \mid R = 1, q)}{P(x_i \mid R = 0, q)}$
  - So: $O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{i=1}^{n} \dfrac{P(x_i \mid R = 1, q)}{P(x_i \mid R = 0, q)}$

  16. Binary Independence Model
  - Since each $x_i$ is either 0 or 1:
    $O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i = 1} \dfrac{P(x_i = 1 \mid R = 1, q)}{P(x_i = 1 \mid R = 0, q)} \cdot \prod_{x_i = 0} \dfrac{P(x_i = 0 \mid R = 1, q)}{P(x_i = 0 \mid R = 0, q)}$
  - Let $p_i = P(x_i = 1 \mid R = 1, q)$ and $u_i = P(x_i = 1 \mid R = 0, q)$
  - Assume, for all terms not occurring in the query ($q_i = 0$), that $p_i = u_i$; this can be changed (e.g., in relevance feedback)

  17. Probabilities

                            relevant (R = 1)    not relevant (R = 0)
  term present (x_i = 1)          p_i                   u_i
  term absent  (x_i = 0)        1 - p_i               1 - u_i

  Then...

  18. Binary Independence Model
  - Split the product into matching query terms and non-matching query terms:
    $O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i = q_i = 1} \dfrac{p_i}{u_i} \cdot \prod_{x_i = 0,\, q_i = 1} \dfrac{1 - p_i}{1 - u_i}$
  - Regrouping over all matching terms and all query terms:
    $O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i = q_i = 1} \dfrac{p_i (1 - u_i)}{u_i (1 - p_i)} \cdot \prod_{q_i = 1} \dfrac{1 - p_i}{1 - u_i}$

  19. Binary Independence Model
  - In $O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i = q_i = 1} \dfrac{p_i (1 - u_i)}{u_i (1 - p_i)} \cdot \prod_{q_i = 1} \dfrac{1 - p_i}{1 - u_i}$, the last product is constant for each query; the middle product is the only quantity that must be estimated for ranking.
  - Retrieval Status Value:
    $RSV = \log \prod_{x_i = q_i = 1} \dfrac{p_i (1 - u_i)}{u_i (1 - p_i)} = \sum_{x_i = q_i = 1} \log \dfrac{p_i (1 - u_i)}{u_i (1 - p_i)}$

  20. Binary Independence Model
  - Everything boils down to computing the RSV:
    $RSV = \sum_{x_i = q_i = 1} \log \dfrac{p_i (1 - u_i)}{u_i (1 - p_i)} = \sum_{x_i = q_i = 1} c_i$, where $c_i = \log \dfrac{p_i (1 - u_i)}{u_i (1 - p_i)}$
  - The $c_i$'s function as the term weights in this model.
  - So, how do we compute the $c_i$'s from our data?
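A minimal sketch of RSV scoring with these $c_i$ weights. The $p_i$ and $u_i$ values below are hypothetical placeholders; estimating them from data is exactly the question the next slides address.

```python
import math

def c(p_i, u_i):
    """BIM term weight: log of p_i (1 - u_i) / (u_i (1 - p_i))."""
    return math.log(p_i * (1 - u_i) / (u_i * (1 - p_i)))

def rsv(query_terms, doc_terms, p, u):
    """Sum c_i over terms present in both query and doc (x_i = q_i = 1)."""
    return sum(c(p[t], u[t]) for t in query_terms & doc_terms)

# Hypothetical estimates for two query terms:
p = {"retrieval": 0.7, "model": 0.6}
u = {"retrieval": 0.3, "model": 0.5}
print(rsv({"retrieval", "model"}, {"retrieval", "model", "text"}, p, u))
```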

  21. BIM: example
  - $q = \{x_1, x_2\}$
  - Relevance judgments from 20 docs, together with the distribution of $x_1$ and $x_2$ within these docs [figure: doc counts per incidence pattern (1,1), (1,0), (0,1), (0,0)], give:
    - $p_1 = 8/12$, $u_1 = 3/8$
    - $p_2 = 7/12$, $u_2 = 4/8$
  - $c_1 = \log(10/3)$
  - $c_2 = \log(7/5)$
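The $c_i$ values above can be checked mechanically; a small verification of the example using exact fractions:

```python
from fractions import Fraction

def odds_ratio(p_i, u_i):
    """The argument of the log in c_i: p_i (1 - u_i) / (u_i (1 - p_i))."""
    return p_i * (1 - u_i) / (u_i * (1 - p_i))

p1, u1 = Fraction(8, 12), Fraction(3, 8)
p2, u2 = Fraction(7, 12), Fraction(4, 8)
print(odds_ratio(p1, u1))  # 10/3 -> c_1 = log(10/3)
print(odds_ratio(p2, u2))  # 7/5  -> c_2 = log(7/5)
```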

  22. Binary Independence Model
  Estimating RSV coefficients in theory. For each term $i$, look at this table of document counts (for now, assume no zero terms):

              Relevant    Non-relevant        Total
  x_i = 1        s           df - s             df
  x_i = 0      S - s      N - df - S + s      N - df
  Total          S            N - S              N

  - Estimates: $p_i \approx \dfrac{s}{S}$, $u_i \approx \dfrac{df - s}{N - S}$
  - Weight of the $i$-th term: $c_i \approx \log \dfrac{s / (S - s)}{(df - s) / (N - df - S + s)}$
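A minimal sketch of turning the table's counts into a term weight, still assuming no zero counts (real systems smooth the table, e.g. by adding 0.5 to each cell). The counts below reuse term 1 of the previous example: it occurs in s = 8 of the S = 12 relevant docs and in 3 of the 8 non-relevant docs, so df = 11 and N = 20.

```python
import math

def c_from_counts(s, S, df, N):
    """c_i from the contingency table: s, S, df, N as defined above."""
    p_i = s / S                   # estimate of p_i
    u_i = (df - s) / (N - S)      # estimate of u_i
    # Equivalently: log( (s / (S - s)) / ((df - s) / (N - df - S + s)) )
    return math.log(p_i * (1 - u_i) / (u_i * (1 - p_i)))

print(c_from_counts(s=8, S=12, df=11, N=20))  # log(10/3) = 1.203...
```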

  23. Estimation: key challenge
  - If non-relevant docs are approximated by the whole collection:
    - $u_i = df_i / N$: prob. of occurrence in non-relevant docs for the query
    - $\log \dfrac{1 - u_i}{u_i} = \log \dfrac{N - df_i}{df_i} \approx \log \dfrac{N}{df_i}$: IDF!
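A minimal sketch of that approximation (N and the df values are hypothetical): for df much smaller than N, the exact term log((N - df)/df) and the classic IDF log(N/df) are nearly indistinguishable.

```python
import math

N = 1_000_000  # hypothetical collection size
for df in (10, 1_000, 100_000):
    u = df / N
    exact = math.log((1 - u) / u)   # log((N - df) / df)
    idf = math.log(N / df)          # classic IDF
    print(df, round(exact, 3), round(idf, 3))
# df=10      -> 11.513 vs 11.513
# df=1000    ->  6.907 vs  6.908
# df=100000  ->  2.197 vs  2.303
```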

  24. Estimation: key challenge
  - $p_i$ (the probability of occurrence in relevant docs) cannot be approximated as easily as $u_i$.
  - $p_i$ can be estimated in various ways (see the sketch below):
    - as a constant (the Croft and Harper combination match); then we just get idf weighting of terms
    - as proportional to the prob. of occurrence in the collection; Greiff (SIGIR 1998) argues for $1/3 + (2/3)\, df_i / N$
    - from relevant docs, if we know some; relevance weighting can be used in a feedback loop
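A minimal sketch of the three $p_i$ estimates listed above (df, N, and the feedback counts are hypothetical):

```python
N = 1_000_000  # hypothetical collection size

def p_constant():
    """Croft and Harper: a constant p_i; with p_i = 0.5 the p-part of c_i
    cancels and c_i reduces to the idf-like log((1 - u_i) / u_i)."""
    return 0.5

def p_greiff(df_i):
    """Greiff (SIGIR 1998): proportional to occurrence in the collection."""
    return 1/3 + 2/3 * df_i / N

def p_feedback(s, S):
    """From known relevant docs: s of S judged-relevant docs contain term i."""
    return s / S

print(p_constant(), p_greiff(df_i=5_000), p_feedback(s=8, S=12))
```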
