Nicola Ferro
Information Management Systems (IMS) Research Group Department of Information Engineering (DEI) University of Padua, Italy
Forum for Information Retrieval Evaluation (FIRE 2013) 4-6 December 2013, New Delhi, India
Document Misplacement for IR Evaluation
Nicola Ferro Document Misplacement for IR Evaluation FIRE 2013, 5 December 2013, New Delhi, India
Figure: spectrum of study types between System Focus and Human Focus: TREC-style Studies; Information-Seeking Behavior in Context; Log Analysis; TREC Interactive Studies; Experimental Information Behavior; Information-Seeking Behavior with IR Systems; “Users” make relevance assessments; Filtering and SDI; with the Archetypical IIR Study labelled.
ease the interpretation of evaluation results?
around which measures are designed?
Marco Angelini, Sapienza University of Rome, Italy Giuseppe Santucci, Sapienza University of Rome, Italy Gianmaria Silvello, University of Padua, Italy
Kalervo Jarvelin, University of Tampere, Finland Heikki Keskustalo, University of Tampere, Finland Ari Pirkola, University of Tampere, Finland Gianmaria Silvello, University of Padua, Italy
DCG allows for graded relevance judgments and embeds a model of the user's behavior while scrolling down the result list, which also gives an account of the user's overall satisfaction
G(i) represents the gain for the document at rank i given its relevance level, e.g. 0 for not relevant, 1 for partially relevant, 3 for highly relevant
the log base b indicates the “patience/determination” of the user while scrolling the list, e.g. b = 2 indicates an impatient user while b = 10 indicates a more motivated user

$$
DG(i) =
\begin{cases}
G(i) & \text{if } i < b \\
\dfrac{G(i)}{\log_b(i)} & \text{if } i \geq b
\end{cases}
\qquad
DCG(i) = \sum_{k=1}^{i} DG(k)
$$
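As a minimal sketch of the discounted gain described here: gains are left undiscounted before rank b, divided by log_b(i) from rank b onward, and then summed up to each rank. The function name `dcg_curve` and the toy gain vector are illustrative assumptions, not part of the original slides.

```python
import math

def dcg_curve(gains, b=2):
    """Return [DCG(1), ..., DCG(n)] for a list of per-rank gains G(i).

    Ranks are 1-based. The gain at rank i is kept as-is while i < b and
    divided by log_b(i) once i >= b. The base b is assumed to be > 1.
    """
    curve = []
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        discounted = gain if i < b else gain / math.log(i, b)
        total += discounted
        curve.append(total)
    return curve

# Hypothetical toy ranking using the example gains: 3 (highly relevant),
# 1 (partially relevant), 0 (not relevant).
toy_gains = [3, 0, 1, 3, 0]
impatient = dcg_curve(toy_gains, b=2)   # discount starts at rank 2
patient = dcg_curve(toy_gains, b=10)    # no discount within the first 9 ranks
```

With only five ranks, the b = 10 curve is simply the cumulated gain, while the b = 2 curve already penalizes the relevant documents found at ranks 3 and 4.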
Figure: DCG curve comparison for TREC7, topic: 365 (x-axis: Rank, 100 to 1000; y-axis: DCG, 20 to 140)
Figure: three rankings of documents by relevance grade (HR, FR, PR, NR), shown for ranks 1 to 12 and rank 20:
HR HR FR NR PR FR NR NR NR PR HR NR NR
HR HR HR FR FR FR PR PR PR PR NR NR NR
HR HR FR PR FR NR NR NR PR HR NR NR NR
Ideal: often used in measures for normalization, see e.g. nDCG
Optimal: the best ranking possible with the documents actually retrieved by the system
How are these rankings correlated?
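The distinction between the two reference rankings can be sketched as follows, under the usual assumption that the ideal ranking sorts every judged document in the recall base by decreasing relevance, while the optimal ranking only reorders what the system actually retrieved. The grade order HR > FR > PR > NR, the helper names, and the toy data are assumptions made for illustration.

```python
# Relevance grades from best to worst (assumed order: highly, fairly,
# partially, not relevant).
GRADE_ORDER = {"HR": 0, "FR": 1, "PR": 2, "NR": 3}

def optimal_ranking(run):
    """Best possible ordering of the documents the system retrieved."""
    return sorted(run, key=GRADE_ORDER.__getitem__)

def ideal_ranking(recall_base, length):
    """Best possible ordering given every judged relevant document.

    The list is cut, or padded with "NR", to the requested length.
    """
    best = sorted(recall_base, key=GRADE_ORDER.__getitem__)[:length]
    return best + ["NR"] * (length - len(best))

# Hypothetical toy data: the recall base holds four relevant documents,
# but the run retrieved only two of them among its five results.
recall_base = ["HR", "HR", "FR", "PR"]
run = ["NR", "HR", "NR", "PR", "NR"]

optimal = optimal_ranking(run)                 # ['HR', 'PR', 'NR', 'NR', 'NR']
ideal = ideal_ranking(recall_base, len(run))   # ['HR', 'HR', 'FR', 'PR', 'NR']
```

In this toy case the optimal ranking can never match the ideal one, because the run missed two relevant documents.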
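Figure: misplacement of each document in the system ranking with respect to the ideal ranking
Rank: 1 2 3 4 5 6 7 8 9 10 11 12 ... 20
System ranking: HR HR FR NR PR FR NR NR NR PR HR NR ... NR
Ideal ranking: HR HR HR FR FR FR PR PR PR PR NR NR ... NR
Misplacement: correct, correct, too early, too early, too early, correct, too early, too early, too early, correct, too late (+8), correct, ..., correct
Ideal rank intervals: min(HR) = 1, max(HR) = 3; min(FR) = 4, max(FR) = 6; min(PR) = 7, max(PR) = 10; min(NR) = 11, max(NR) = 20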
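The misplacement of each document in this example can be sketched as follows. The rule used here is an assumption inferred from the example itself (the +8 of the highly relevant document at rank 11, whose ideal interval ends at rank 3): a document inside the ideal interval of its relevance grade has misplacement 0, one ranked before the interval gets rank minus min (negative, too early), and one ranked after it gets rank minus max (positive, too late). The names `relative_position` and `label` are illustrative, and only the ranks visible on the slide are listed.

```python
# Ideal rank intervals (min, max) per relevance grade, as listed on the slide.
IDEAL_INTERVAL = {"HR": (1, 3), "FR": (4, 6), "PR": (7, 10), "NR": (11, 20)}

# Visible part of the example system ranking (rank -> grade);
# ranks 13 to 19 are not shown on the slide and are left out here.
RUN = {1: "HR", 2: "HR", 3: "FR", 4: "NR", 5: "PR", 6: "FR", 7: "NR",
       8: "NR", 9: "NR", 10: "PR", 11: "HR", 12: "NR", 20: "NR"}

def relative_position(rank, grade, intervals=IDEAL_INTERVAL):
    """Signed distance of a document from the ideal interval of its grade."""
    lo, hi = intervals[grade]
    if rank < lo:
        return rank - lo   # negative: ranked too early
    if rank > hi:
        return rank - hi   # positive: ranked too late
    return 0               # inside the interval: correctly placed

def label(rp):
    """Turn a signed misplacement into the wording used on the slide."""
    if rp < 0:
        return "too early"
    if rp > 0:
        return "too late"
    return "correct"

misplacements = {rank: relative_position(rank, grade) for rank, grade in RUN.items()}
labels = {rank: label(rp) for rank, rp in misplacements.items()}
```

Under this assumed rule, `misplacements[11]` is 8, matching the +8 shown for the highly relevant document at rank 11.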
(j) =
j ≤ t
IR systems are more and more perceived as commodities, like water and electricity
“if you do not find something with a search engine, it does not exist”
Traditional IR measures are centered around the idea of utility for the user in scanning a ranked list
Has enough relevant information been provided to the user? Has this relevant information been provided in a good enough order?
Considering search as a commodity leads to the assumption that utility is somehow granted, so other factors may affect the performance of an IR system
CRP (Cumulated Relative Position) cumulates, at each rank position, the positive and negative document misplacements (RP) and measures the total “space” the user had to run back and forth in the result list
CRP represents the avoidable effort: with the ideal ranking there would be zero misplacements, and this avoidable effort causes user weariness
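A minimal sketch of this cumulation, assuming the misplacement at rank j is 0 inside the ideal interval of the document's relevance grade, j minus min when the document comes too early, and j minus max when it comes too late. For simplicity the toy run below is assumed to retrieve exactly the judged documents, so the ideal ranking is the run sorted by grade; the function names and data are hypothetical.

```python
from itertools import accumulate

# Assumed grade order: highly, fairly, partially, not relevant.
GRADE_ORDER = {"HR": 0, "FR": 1, "PR": 2, "NR": 3}

def ideal_intervals(ideal):
    """Map each grade to the (min, max) 1-based ranks it spans in the ideal ranking."""
    intervals = {}
    for rank, grade in enumerate(ideal, start=1):
        lo, _ = intervals.get(grade, (rank, rank))
        intervals[grade] = (lo, rank)
    return intervals

def relative_positions(run, ideal):
    """Signed misplacement (RP) of every document in the run."""
    intervals = ideal_intervals(ideal)
    rps = []
    for rank, grade in enumerate(run, start=1):
        lo, hi = intervals[grade]
        if rank < lo:
            rps.append(rank - lo)   # too early
        elif rank > hi:
            rps.append(rank - hi)   # too late
        else:
            rps.append(0)           # correctly placed
    return rps

def crp_curve(run, ideal):
    """Running sum of the misplacements, one value per rank."""
    return list(accumulate(relative_positions(run, ideal)))

# Hypothetical toy run of six documents.
run = ["NR", "HR", "PR", "HR", "NR", "FR"]
ideal = sorted(run, key=GRADE_ORDER.__getitem__)

rp = relative_positions(run, ideal)   # [-4, 0, -1, 2, 0, 3]
crp = crp_curve(run, ideal)           # [-4, -4, -5, -3, -3, 0]
```

Calling `crp_curve(ideal, ideal)` returns all zeros, which mirrors the statement that the ideal ranking has no misplacements.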
Figure: example ranking over ranks 1 to 26. Legend: Highly Relevant Documents, Fairly Relevant Documents, Partially Relevant Documents, Not Relevant Documents. Annotated negative misplacement: 2 - 16 = -14 positions
Figure: CRP - typical run, RBt = 32, N = 200 (x-axis: Rank, 20 to 200; y-axis: CRP, −800 to 800; RBt marked on the plot)
Figure: six panels comparing the runs input.APL985LC and input.acsys7mi, each plotted against Rank (100 to 1000):
CRP curve comparison for TREC7, topic: 351
CG curve comparison for TREC7, topic: 351
DCG curve comparison for TREC7, topic: 35
CRP curve comparison for TREC7, topic: 365
CG curve comparison for TREC7, topic: 365
DCG curve comparison for TREC7, topic: 365
recovery ratio ρ: the earliest rank position, if any, at which the user passes through the ideal point related to the recall base
space ratio σ: the ratio of the total avoidable space the user had to walk through
twist τ: the mean between ρ and σ to grasp the overall angle and
Figure: CRP plotted against Rank (x-axis: 20 to 200; y-axis: −800 to 800), with RBt marked
Figure: Kendall’s Tau (scale 0.0 to 1.0) for ρ, σ, τ, nDCG@1000, Q-measure, nCG@1000, and AP on TREC 7, TREC 2001, and NTCIR 3 (chz, jpn)
Figure: two panels, AP and nDCG, each with both axes running from 0.1 to 1
AP panel labels: U+ E++++ OK; U+++ E++++ KO; U++ E++++ ~~; U++++ E++++ KO; U++ E+++ ~; U+++ E+++ KO; U++++ E+++ KO; U++++ E++ ~~; U++++ E+ OK
nDCG panel labels: U+ E++++ OK; U++ E++++ ~~; U+++ E++++ KO; U++++ E++++ KO; U++ E+++ ~; U+++ E+++ KO; U++++ E+++ KO; U++++ E++ ~~; U++++ E+ OK
both rank-by-rank and single-number measure
it grasps a different angle with respect to existing measures
parameter free
is user-oriented enough?
parameter free
is intuitive enough?
reads differently from traditional measures
visual interactive tool: exploring what-if analysis
CRP measure: more extensive experimental evaluation, normalization, study of the properties