INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze's, linked from http://informationretrieval.org/
IR 6: Ranking
Paul Ginsparg
Cornell University, Ithaca, NY
13 Sep 2011
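Term frequency
n (natural): $\mathrm{tf}_{t,d}$
l (logarithm): $1 + \log(\mathrm{tf}_{t,d})$
a (augmented): $0.5 + \frac{0.5 \times \mathrm{tf}_{t,d}}{\max_t(\mathrm{tf}_{t,d})}$
b (boolean): 1 if $\mathrm{tf}_{t,d} > 0$
L (log ave): $\frac{1 + \log(\mathrm{tf}_{t,d})}{1 + \log(\mathrm{ave}_{t \in d}(\mathrm{tf}_{t,d}))}$

Document frequency
n (no): 1
t (idf): $\log \frac{N}{\mathrm{df}_t}$
p (prob idf): $\max\{0, \log \frac{N - \mathrm{df}_t}{\mathrm{df}_t}\}$

Normalization
n (none): 1
c (cosine): $\frac{1}{\sqrt{w_1^2 + w_2^2 + \cdots + w_M^2}}$
u (pivoted unique): $1/u$
b (byte size): $1/\mathrm{CharLength}^{\alpha}$, $\alpha < 1$

Best known combination of weighting options
Default: no weighting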
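The table above is the SMART notation: a weighting scheme is named by three letters, one each for the term-frequency, document-frequency, and normalization components, and the weight of a term is the product of the first two, followed by the chosen normalization over the whole vector. The sketch below computes such weights for one document under stated assumptions: logarithms are base 10, a term that does not occur gets weight 0, and only the n and c normalizations are implemented, since u (pivoted unique) and b (byte size) need extra parameters (a pivot/slope, a character length) that the table does not fix. The function names `tf_weight`, `df_weight`, `normalize`, and `smart_weights` are hypothetical, chosen for illustration only.

```python
import math

def tf_weight(code, tf, max_tf, ave_tf):
    """Term-frequency component; assumes base-10 logs and weight 0 when tf == 0."""
    if tf == 0:
        return 0.0
    if code == "n":  # natural
        return float(tf)
    if code == "l":  # logarithm
        return 1.0 + math.log10(tf)
    if code == "a":  # augmented: scaled by the largest tf in the document
        return 0.5 + 0.5 * tf / max_tf
    if code == "b":  # boolean
        return 1.0
    if code == "L":  # log average: scaled by the average tf in the document
        return (1.0 + math.log10(tf)) / (1.0 + math.log10(ave_tf))
    raise ValueError(f"unknown tf code {code!r}")

def df_weight(code, df, n_docs):
    """Document-frequency component for a term occurring in df of n_docs documents."""
    if code == "n":  # no idf
        return 1.0
    if code == "t":  # idf
        return math.log10(n_docs / df)
    if code == "p":  # probabilistic idf, clipped at 0 (also guards log of 0)
        if df >= n_docs:
            return 0.0
        return max(0.0, math.log10((n_docs - df) / df))
    raise ValueError(f"unknown df code {code!r}")

def normalize(code, weights):
    """Normalization over the whole vector; only 'n' and 'c' are sketched here."""
    if code == "n":  # none
        return dict(weights)
    if code == "c":  # cosine: divide by the Euclidean length
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length == 0:
            return dict(weights)
        return {t: w / length for t, w in weights.items()}
    raise ValueError(f"unsupported normalization code {code!r}")

def smart_weights(scheme, counts, df, n_docs):
    """Weight a document given a 3-letter scheme such as 'ltc'.

    counts: term -> tf in this document (terms that occur, tf > 0)
    df:     term -> document frequency in the collection
    n_docs: collection size N
    """
    tf_code, df_code, norm_code = scheme
    max_tf = max(counts.values())
    ave_tf = sum(counts.values()) / len(counts)
    raw = {
        t: tf_weight(tf_code, c, max_tf, ave_tf) * df_weight(df_code, df[t], n_docs)
        for t, c in counts.items()
    }
    return normalize(norm_code, raw)

counts = {"car": 1, "insurance": 2}
df = {"car": 10, "insurance": 100}
print(smart_weights("ltc", counts, df, n_docs=1000))
```

With these made-up counts, "ltc" gives car the raw weight $(1 + \log 1)\cdot\log(1000/10) = 2$ and insurance $(1 + \log 2)\cdot\log(1000/100) \approx 1.301$, which cosine normalization rescales to unit length (roughly 0.838 and 0.545). Queries and documents are usually weighted with different schemes, written ddd.qqq (for example lnc.ltc), so in practice the function would be called once with the document scheme and once with the query scheme.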