Language Models
LM Jelinek-Mercer Smoothing and LM Dirichlet Smoothing
Web Search
Slides based on the books:
[Figure: search-engine architecture — crawler, documents (including multimedia), indexing, indexes, query analysis and query processing, ranking, results, user application]
automaton stops.
String = “frog said that toad likes frog STOP”
P(string) = 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.2 = 4.8 · 10^-12
String = “frog said that toad likes frog STOP”
P(string|Md1) = 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.2 = 4.8 · 10^-12
P(string|Md2) = 0.01 · 0.03 · 0.05 · 0.02 · 0.02 · 0.01 · 0.2 = 12 · 10^-12
P(string|Md1) < P(string|Md2): d2 is more likely than d1 to have generated “frog said that toad likes frog STOP”.
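The comparison above can be sketched in a few lines. The per-term probabilities are the slide's hypothetical values; the model and function names are mine:

```python
# Score the same string under two hypothetical unigram models, M_d1 and M_d2.
m_d1 = {"frog": 0.01, "said": 0.03, "that": 0.04,
        "toad": 0.01, "likes": 0.02, "STOP": 0.2}
m_d2 = {"frog": 0.01, "said": 0.03, "that": 0.05,
        "toad": 0.02, "likes": 0.02, "STOP": 0.2}

def string_prob(model, tokens):
    """Unigram language model: multiply the per-term probabilities."""
    p = 1.0
    for t in tokens:
        p *= model[t]
    return p

s = "frog said that toad likes frog STOP".split()
p1 = string_prob(m_d1, s)   # ~ 4.8e-12
p2 = string_prob(m_d2, s)   # ~ 1.2e-11
print(p1 < p2)              # d2 is the more likely generator
```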
Unigram model: $p_{uni}(t_1 t_2 t_3 t_4) = p(t_1)\, p(t_2)\, p(t_3)\, p(t_4)$

Bigram model: $p_{bi}(t_1 t_2 t_3 t_4) = p(t_1)\, p(t_2 \mid t_1)\, p(t_3 \mid t_2)\, p(t_4 \mid t_3)$

Multinomial model: $p(d) = \frac{L_d!}{tf_{t_1,d}!\, tf_{t_2,d}! \cdots tf_{t_M,d}!}\; p(t_1)^{tf_{t_1,d}}\, p(t_2)^{tf_{t_2,d}} \cdots p(t_M)^{tf_{t_M,d}}$
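A toy illustration of the unigram vs. bigram difference, using maximum-likelihood estimates from a single token stream (the sample text and all names are hypothetical):

```python
from collections import Counter

def train_models(tokens):
    """Maximum-likelihood unigram and bigram estimates from one token stream."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    p_uni = {t: c / n for t, c in uni.items()}
    p_bi = {(a, b): c / uni[a] for (a, b), c in bi.items()}
    return p_uni, p_bi

def prob_uni(p_uni, seq):
    """p_uni(t1..tn) = p(t1) * ... * p(tn): ignores word order."""
    p = 1.0
    for t in seq:
        p *= p_uni.get(t, 0.0)
    return p

def prob_bi(p_uni, p_bi, seq):
    """p_bi(t1..tn) = p(t1) * p(t2|t1) * ...: conditions on the previous term."""
    p = p_uni.get(seq[0], 0.0)
    for pair in zip(seq, seq[1:]):
        p *= p_bi.get(pair, 0.0)
    return p

doc = "the frog said that the toad likes the frog".split()
p_u, p_b = train_models(doc)
q = "the frog".split()
print(prob_uni(p_u, q))       # (3/9) * (2/9)
print(prob_bi(p_u, p_b, q))   # (3/9) * (2/3)
```

The bigram score is higher here because "frog" almost always follows "the" in the sample, a dependency the unigram model cannot see.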
Ranking documents by decreasing probability of relevance is optimal under 1/0 loss.
$p(r \mid q, d) = \frac{p(d, q \mid r)\, p(r)}{p(d, q)}$

$O(r \mid q, d) = \frac{p(r=1 \mid q, d)}{p(r=0 \mid q, d)} = \frac{p(d, q \mid r=1)\, p(r=1)}{p(d, q \mid r=0)\, p(r=0)} = \frac{p(q \mid d, r)\, p(d \mid r)\, p(r)}{p(q \mid d, \bar{r})\, p(d \mid \bar{r})\, p(\bar{r})} \propto \log \frac{p(q \mid d, r)\, p(r \mid d)}{p(q \mid d, \bar{r})\, p(\bar{r} \mid d)} = \log p(q \mid d, r) - \log p(q \mid d, \bar{r}) + \log \frac{p(r \mid d)}{p(\bar{r} \mid d)}$
$\log p(q \mid d, r) - \log p(q \mid d, \bar{r}) + \log \frac{p(r \mid d)}{p(\bar{r} \mid d)} \approx \log p(q \mid d, r) + \operatorname{logit} p(r \mid d)$
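The last term is exactly a logit, since $p(\bar{r} \mid d) = 1 - p(r \mid d)$; the approximation consists of dropping the $\log p(q \mid d, \bar{r})$ term, i.e. assuming the query is (roughly) equally likely under any non-relevant document:

```latex
\log\frac{p(r \mid d)}{p(\bar{r} \mid d)}
  = \log\frac{p(r \mid d)}{1 - p(r \mid d)}
  = \operatorname{logit}\, p(r \mid d)
```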
Naive Bayes model (we dropped the r variable):
$tf_{t,q}$ is the term frequency (# occurrences) of t in q.
$p(q \mid M_d) = \prod_{t \in q \cap d} p(t \mid M_d)^{tf_{t,q}}$

or, equivalently, as a product over query positions:

$p(q \mid M_d) = \prod_{i=1}^{|q|} p(t_i \mid M_d)$
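A minimal, unsmoothed sketch of this query likelihood (function name hypothetical). Note how a single query term missing from the document zeroes the whole score, which is what motivates smoothing:

```python
from collections import Counter
from math import prod  # Python 3.8+

def query_likelihood(doc_tokens, query_tokens):
    """Unsmoothed p(q|M_d) = prod over query terms of p(t|M_d)^tf_{t,q}."""
    tf_d = Counter(doc_tokens)
    n = len(doc_tokens)
    tf_q = Counter(query_tokens)
    return prod((tf_d[t] / n) ** k for t, k in tf_q.items())

d = "frog said that toad likes frog".split()
print(query_likelihood(d, ["frog", "frog"]))   # (2/6)^2
print(query_likelihood(d, ["frog", "cat"]))    # 0.0 -- one missing term kills the score
```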
The term probabilities are computed with the maximum likelihood estimate on each document, rather than on the whole collection:
$\hat{p}(t \mid M_d)_{ml} = \frac{tf_{t,d}}{|d|}$
. . . but no more likely than would be expected by chance in the collection.
The idea is to use a second estimate $\hat{p}(t \mid M_C)_{ml}$, based on the term frequencies in the collection as a whole, to “smooth” $p(t \mid d)$ away from zero.
$\hat{p}(t \mid M_C)_{ml} = \frac{cf_t}{|C|}$, where $cf_t$ is the number of occurrences of t in the whole collection and $|C|$ is the collection length in tokens.
Jelinek-Mercer smoothing linearly interpolates the two distributions:
- the document model is mixed with the general collection frequency of the word.
- a high weight on the document model makes retrieval conjunctive-like, favoring documents containing all query words.
$p(q \mid d, C) = \lambda \cdot p(q \mid M_d) + (1 - \lambda) \cdot p(q \mid M_C)$
The intuition: the user likes some document in the collection, has this “document in mind”, and generates the query from it. The score estimates the probability that the document the user had in mind was in fact this one.
$p(q \mid d, C) \approx \prod_{t \in \{q \cap d\}} \bigl( \lambda \cdot p(t \mid M_d) + (1 - \lambda) \cdot p(t \mid M_C) \bigr)$
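A minimal scoring sketch of this mixture, in log space for numerical stability (names and the toy collection are hypothetical; it assumes every query term occurs at least once in the collection):

```python
from collections import Counter
from math import log

def lm_jm_score(query, doc, collection, lam=0.5):
    """Log query likelihood with Jelinek-Mercer smoothing:
    sum over t in q of log(lam * p(t|M_d) + (1 - lam) * p(t|M_C))."""
    tf_d, n_d = Counter(doc), len(doc)
    tf_c, n_c = Counter(collection), len(collection)
    return sum(log(lam * tf_d[t] / n_d + (1 - lam) * tf_c[t] / n_c)
               for t in query)

d1 = "frog said that toad likes frog".split()
d2 = "toad said that toad likes frog".split()
coll = d1 + d2                      # toy "collection": just the two documents
q = ["toad", "likes"]
print(lm_jm_score(q, d1, coll), lm_jm_score(q, d2, coll))
```

Because of the collection term, a document missing a query word gets a small but nonzero contribution instead of a zero score.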
Dirichlet smoothing takes a Bayesian view, using the collection distribution as a prior when computing the term average on a document:
- each term receives pseudo-counts proportional to its collection frequency.
- the pseudo-counts from the collection distribution are added to the actual counts in the document.
The resulting estimate is a pseudo-count weighted average:
$\hat{p}(t \mid M_d)_{MAP} = \frac{tf_{t,d} + \mu \cdot p(t \mid M_C)}{|d| + \mu}$

$p(q \mid d) = \prod_{t \in q} \left( \frac{tf_{t,d} + \mu \cdot p(t \mid M_C)}{|d| + \mu} \right)^{tf_{t,q}}$
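The same kind of sketch for Dirichlet smoothing (names hypothetical; μ is the pseudo-count mass, commonly set in the hundreds to thousands):

```python
from collections import Counter
from math import log

def lm_dirichlet_score(query, doc, collection, mu=2000):
    """Log query likelihood with Dirichlet smoothing:
    p(t|d) = (tf_{t,d} + mu * p(t|M_C)) / (|d| + mu).
    Assumes every query term occurs somewhere in the collection."""
    tf_d, n_d = Counter(doc), len(doc)
    tf_c, n_c = Counter(collection), len(collection)
    return sum(log((tf_d[t] + mu * tf_c[t] / n_c) / (n_d + mu))
               for t in query)

d1 = "frog said that toad likes frog".split()
coll = d1 + "toad said that toad likes frog".split()
print(lm_dirichlet_score(["toad", "likes"], d1, coll, mu=10))
```

Unlike Jelinek-Mercer's fixed λ, the effective interpolation weight |d| / (|d| + μ) grows with document length, so long documents trust their own counts more.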
            TREC45                  Gov2
            1998        1999        2005        2006
Method      P@10  MAP   P@10  MAP   P@10  MAP   P@10  MAP
Binary      0.256 0.141 0.224 0.148 0.069 0.050 0.106 0.083
2-Poisson   0.402 0.177 0.406 0.207 0.418 0.171 0.538 0.207
BM25        0.424 0.178 0.440 0.205 0.471 0.243 0.534 0.277
LMJM        0.390 0.179 0.432 0.209 0.416 0.211 0.494 0.257
LMD         0.450 0.193 0.428 0.226 0.484 0.244 0.580 0.293
BM25F       -     -     -     -     0.482 0.242 0.544 0.277
BM25+PRF    0.452 0.239 0.454 0.249 0.567 0.277 0.588 0.314
RRF         0.462 0.215 0.464 0.252 0.543 0.297 0.570 0.352
LR          -     -     -     -     0.446 0.266 0.588 0.309
RankSVM     -     -     -     -     0.420 0.234 0.556 0.268
For short (title) queries, the best results are obtained with Dirichlet smoothing.
For long (verbose) queries, the best results are obtained with Jelinek-Mercer smoothing.
Chengxiang Zhai and John Lafferty. 2004. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst. 22, 2 (April 2004), 179-214.
Method  Query  AP     Prec@10  Prec@20
LMJM    Title  0.227  0.323    0.265
LMD     Title  0.256  0.352    0.289
LMJM    Long   0.280  0.388    0.315
LMD     Long   0.279  0.373    0.303
Chapter 12; Sections 9.1, 9.2 and 9.3