Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2011/12
Chapter III: Ranking Principles
Chapter III: Ranking Principles*
III.1 Document Processing & Boolean Retrieval
3 November 2011 IR&DM, WS'11/12 III.1-
Tokenization, Stemming, Lemmatization, Boolean Retrieval Models
TF*IDF & Vector Space Model, Precision/Recall, F-Measure, MAP, etc.
Binary/Multivariate Models, 2-Poisson Model, BM25, Relevance Feedback
Basic LMs, Smoothing, Extended LMs, Cross-Lingual IR
Query Expansion, Proximity Ranking, Fuzzy Retrieval, XML-IR
*Mostly following Manning/Raghavan/Schütze, with additions from other sources
Based on Manning/Raghavan/Schütze, Chapters 1.1, 1.4, 2.1, 2.2, 3.3, and 6.1
Example Boolean query (which plays contain Caesar and Brutus, but not Calpurnia?): Caesar AND Brutus AND NOT Calpurnia
           Antony and  Julius  The
           Cleopatra   Caesar  Tempest  Hamlet  Othello  Macbeth  ...
Antony     1           1       0        0       0        1
Brutus     1           1       0        1       0        0
Caesar     1           1       0        1       1        1
Calpurnia  0           1       0        0       0        0
Cleopatra  1           0       0        0       0        0
mercy      1           0       1        1       1        1
worser     1           0       1        1       1        0
...
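Evaluating the query Caesar AND Brutus AND NOT Calpurnia on the incidence matrix reduces to bitwise operations on term rows: AND the rows of Caesar and Brutus, then AND the result with the complement of Calpurnia's row. The following is a minimal Python sketch, assuming the 0/1 entries of the incidence matrix as given in Manning/Raghavan/Schütze, Chapter 1.1, and ignoring the plays elided by "..."; the names `INCIDENCE`, `vec` and `plays_of` are hypothetical.

```python
# Term-document incidence vectors, one bit per play
# (leftmost bit = "Antony and Cleopatra"); plays elided by "..." are ignored.
PLAYS = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]

INCIDENCE = {
    "Antony":    "110001",
    "Brutus":    "110100",
    "Caesar":    "110111",
    "Calpurnia": "010000",
    "Cleopatra": "100000",
    "mercy":     "101111",
    "worser":    "101110",
}

N = len(PLAYS)
ALL_ONES = (1 << N) - 1  # mask that keeps a complement within N bits


def vec(term: str) -> int:
    """Return the incidence vector of a term as an integer bitmask."""
    return int(INCIDENCE[term], 2)


def plays_of(mask: int) -> list[str]:
    """Translate a result bitmask back into play titles."""
    return [PLAYS[i] for i in range(N) if (mask >> (N - 1 - i)) & 1]


# Caesar AND Brutus AND NOT Calpurnia
result = vec("Caesar") & vec("Brutus") & (~vec("Calpurnia") & ALL_ONES)
print(format(result, "06b"), plays_of(result))
# 100100 ['Antony and Cleopatra', 'Hamlet']
```

Python's `~` applied to a non-negative int yields a negative number, so the sketch masks the complement with `ALL_ONES` to restrict it to the six plays.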
– http://www.mpi-inf.mpg.de and pauli.miettinen@mpi-inf.mpg.de
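A tokenizer must decide whether strings like the URL and e-mail address above form one token or several. The following is a minimal Python sketch, assuming these strings illustrate such special tokens; the pattern `TOKEN_RE` and the functions `tokenize` and `naive_tokenize` are hypothetical and handle only simple cases (URLs are assumed to start with http:// or https:// and to end at whitespace).

```python
import re

# Hypothetical token pattern: alternatives are tried left to right, so URLs
# and e-mail addresses are matched as whole tokens before plain words.
TOKEN_RE = re.compile(
    r"https?://\S+"                    # URL: scheme, then everything up to whitespace
    r"|[\w.+-]+@[\w-]+(?:\.[\w-]+)+"   # e-mail address: local@domain with at least one dot
    r"|\w+"                            # fallback: a run of word characters
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens, keeping URLs and e-mail addresses intact."""
    return TOKEN_RE.findall(text)


def naive_tokenize(text: str) -> list[str]:
    """Split on every non-word character, for comparison."""
    return re.findall(r"\w+", text)


sample = "http://www.mpi-inf.mpg.de and pauli.miettinen@mpi-inf.mpg.de"
print(tokenize(sample))
# ['http://www.mpi-inf.mpg.de', 'and', 'pauli.miettinen@mpi-inf.mpg.de']
print(naive_tokenize(sample)[:6])
# ['http', 'www', 'mpi', 'inf', 'mpg', 'de']
```

The naive splitter breaks both strings into 13 fragments such as "mpi" and "de", which would later match unrelated queries; keeping special tokens whole avoids this at the cost of a more complex pattern.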
s∈S(x,y) |s|
def levenshtein_distance(s, t):
    m, n = len(s), len(t)
    # d[i][j] = edit distance between the first i chars of s and the first j chars of t
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # the distance of any first string to an empty second string
    for j in range(n + 1):
        d[0][j] = j  # the distance of any second string to an empty first string
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            # Python strings are 0-based, so s[i - 1] is the i-th character of s
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]  # no operation required
            else:
                d[i][j] = min(d[i - 1][j] + 1,      # a deletion
                              d[i][j - 1] + 1,      # an insertion
                              d[i - 1][j - 1] + 1)  # a substitution
    return d[m][n]
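Manning/Raghavan/Schütze, Chapter 3.3 (cited above) use edit distance for spelling correction: a misspelled query term is replaced by the vocabulary term at the smallest edit distance. The following is a minimal Python sketch, assuming a tiny hypothetical vocabulary `VOCAB`; `levenshtein` keeps only two rows of the dynamic-programming table (same recurrence as the full table, O(n) memory), and `correct` is a hypothetical brute-force helper that compares against every vocabulary term.

```python
def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) using two DP rows."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)   # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr[j] = min(prev[j] + 1,               # deletion
                          curr[j - 1] + 1,           # insertion
                          prev[j - 1] + (cs != ct))  # substitution or match
        prev = curr
    return prev[len(t)]


def correct(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary term closest to word (ties: first in list)."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))


VOCAB = ["calpurnia", "caesar", "brutus", "cleopatra"]  # hypothetical vocabulary
print(levenshtein("kitten", "sitting"))  # 3
print(correct("ceasar", VOCAB))          # caesar
```

Comparing against the whole vocabulary is only feasible for toy examples; a real system would first prune the candidate terms and compute edit distance only for the survivors.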