SLIDE 1

Document Misplacement for IR Evaluation

Nicola Ferro
Information Management Systems (IMS) Research Group
Department of Information Engineering (DEI), University of Padua, Italy

Forum for Information Retrieval Evaluation (FIRE 2013), 4-6 December 2013, New Delhi, India

SLIDE 2

Outline

[Diagram: spectrum of IR evaluation approaches, ranging from System Focus to Human Focus, covering TREC-style Studies, Filtering and SDI, Log Analysis, TREC Interactive Studies, "Users" make relevance assessments, Experimental Information Behavior, Information-Seeking Behavior with IR Systems, Information-Seeking Behavior in Context, and the Archetypical IIR Study]

SLIDE 3

Outline

The same spectrum of evaluation approaches, overlaid with the two questions driving this talk:

  1. How to provide visual interactive tools that ease the interpretation of evaluation results?
  2. Should utility (gain) be the main concept around which measures are designed?

SLIDE 4

Joint Work With

Visual Analytics

Marco Angelini, Sapienza University of Rome, Italy
Giuseppe Santucci, Sapienza University of Rome, Italy
Gianmaria Silvello, University of Padua, Italy

Alternative Evaluation Measures

Kalervo Jarvelin, University of Tampere, Finland
Heikki Keskustalo, University of Tampere, Finland
Ari Pirkola, University of Tampere, Finland
Gianmaria Silvello, University of Padua, Italy


SLIDE 5

Visual Tools based on Document Misplacement

SLIDE 6

Discounted Cumulative Gain

DCG allows for graded relevance judgments and embeds a model of the user's behavior while scrolling down the result list, which also gives an account of their overall satisfaction.

G(i) represents the gain for a document with the given relevance level at rank i, e.g. 0 for not relevant, 1 for partially relevant, 3 for highly relevant.

The log base b indicates the "patience/determination" of the user while scrolling the list, e.g. b = 2 indicates an impatient user, while b = 10 indicates a more motivated user.

DG(i) = G(i)              if i < b
DG(i) = G(i) / log_b(i)   if i ≥ b

DCG(i) = Σ_{k=1..i} DG(k)
[Figure: DCG curve comparison for TREC7, topic 365 (x-axis: Rank, y-axis: DCG)]
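A minimal sketch of this computation in Python; the function name, the gain values, and the toy ranked list are illustrative, not from the slides:

```python
import math

def dcg(gains, b=2):
    """Discounted cumulated gain at each rank.

    gains: graded gains G(i) in ranked order, e.g. 0 (not relevant),
           1 (partially relevant), 3 (highly relevant).
    b:     log base modelling the user's patience (b=2 impatient, b=10 patient).
    """
    total, curve = 0.0, []
    for i, g in enumerate(gains, start=1):
        # no discount before rank b, logarithmic discount from rank b onwards
        dg = g if i < b else g / math.log(i, b)
        total += dg
        curve.append(total)
    return curve

# toy ranked list: HR, PR, NR, HR -> gains 3, 1, 0, 3
print(dcg([3, 1, 0, 3], b=2))
```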

SLIDE 7

Inspecting ranked lists

Rank:    1   2   3   4   5   6   7   8   9   10  11  12  ... 20
Run:     HR  HR  FR  NR  PR  FR  NR  NR  NR  PR  HR  NR  ... NR
Ideal:   HR  HR  HR  FR  FR  FR  PR  PR  PR  PR  NR  NR  ... NR
Optimal: HR  HR  HR  FR  FR  PR  PR  NR  NR  NR  NR  NR  ... NR

Ideal is often used in measures for normalization, see e.g. nDCG. Optimal is the best ranking possible with the documents actually retrieved by the system. How are these rankings correlated?
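A minimal Python sketch of how the optimal and ideal rankings can be derived; the grade encoding, the function names, and the recall base used in the example are illustrative assumptions:

```python
GRADE_ORDER = ["HR", "FR", "PR", "NR"]  # from most to least relevant

def optimal_ranking(run_grades):
    # best ordering achievable with the documents the system actually retrieved
    return sorted(run_grades, key=GRADE_ORDER.index)

def ideal_ranking(recall_base, length):
    # best ordering achievable over the whole collection: all HR first, then FR, ...
    ideal = []
    for grade in GRADE_ORDER[:-1]:
        ideal += [grade] * recall_base.get(grade, 0)
    return (ideal + ["NR"] * length)[:length]

run = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
print(optimal_ranking(run))
print(ideal_ranking({"HR": 3, "FR": 3, "PR": 4}, length=12))
```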

SLIDE 8

Performance analysis: To re-rank or to re-query?


SLIDE 9

How to spot failures?


SLIDE 10

Document Misplacement

Rank:  1   2   3   4   5   6   7   8   9   10  11  12  ... 20
Run:   HR  HR  FR  NR  PR  FR  NR  NR  NR  PR  HR  NR  ... NR
Ideal: HR  HR  HR  FR  FR  FR  PR  PR  PR  PR  NR  NR  ... NR
RP:    0   0   -1  -7  -2  0   -4  -3  -2  0   +8  0   ... 0

Negative values mark documents placed too early, positive values documents placed too late, and zeros correctly placed documents.

Ideal intervals per relevance grade: min(HR) = 1, max(HR) = 3; min(FR) = 4, max(FR) = 6; min(PR) = 7, max(PR) = 10; min(NR) = 11, max(NR) = 20.
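A minimal Python sketch of the relative position computation; function names and data structures are illustrative, not from the slides:

```python
def relative_position(rank, grade, grade_bounds):
    # grade_bounds maps each grade to the (min, max) ranks it occupies
    # in the ideal ranking, as in the intervals listed above
    lo, hi = grade_bounds[grade]
    if rank < lo:        # document appears too early -> negative misplacement
        return rank - lo
    if rank > hi:        # document appears too late  -> positive misplacement
        return rank - hi
    return 0             # document lies inside its ideal interval

bounds = {"HR": (1, 3), "FR": (4, 6), "PR": (7, 10), "NR": (11, 20)}
run = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
print([relative_position(i, g, bounds) for i, g in enumerate(run, start=1)])
# -> [0, 0, -1, -7, -2, 0, -4, -3, -2, 0, 8, 0]
```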

SLIDE 11

Failure analysis: identify critical rank areas

RP(j) = 0               if min(E(j)) ≤ j ≤ max(E(j))
RP(j) = j − min(E(j))   if j < min(E(j))
RP(j) = j − max(E(j))   if j > max(E(j))

where E(j) is the relevance grade of the document retrieved at rank j, and min(E(j)) and max(E(j)) are the first and last ranks at which that grade appears in the ideal ranking.

∆[j] = DG_E[j] − DG_I[j]

i.e. the rank-by-rank difference between the discounted gain of the experiment (run) E and of the ideal ranking I, whose dips point at critical rank areas.
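A minimal sketch of the ∆[j] idea in Python. The gain mapping (in particular the value 2 for fairly relevant documents) and the reading of DG as the cumulated discounted gain curve are assumptions made for illustration:

```python
import math

GAIN = {"HR": 3, "FR": 2, "PR": 1, "NR": 0}  # FR gain of 2 is an assumed value

def dcg_curve(grades, b=2):
    total, curve = 0.0, []
    for i, g in enumerate(grades, start=1):
        gain = GAIN[g]
        total += gain if i < b else gain / math.log(i, b)
        curve.append(total)
    return curve

def delta_curve(run, ideal, b=2):
    # delta[j] = DG_E[j] - DG_I[j]: the more negative, the more critical the rank area
    return [r - i for r, i in zip(dcg_curve(run, b), dcg_curve(ideal, b))]

run   = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
ideal = ["HR", "HR", "HR", "FR", "FR", "FR", "PR", "PR", "PR", "PR", "NR", "NR"]
print(delta_curve(run, ideal))
```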
SLIDE 12

What is the impact of a possible fix?


SLIDE 13

What-if analysis: estimating the impact of a fix


SLIDE 14

Measures based on Document Misplacement

SLIDE 15

Search as a Commodity

IR systems are more and more perceived as commodities, like water and electricity

“if you do not find something with a search engine, it does not exist”

Traditional IR measures are centered around the idea of utility for the user in scanning a ranked list

Has enough relevant information been provided to the user? Has this relevant information been provided in a good enough order?

BUT

Considering search as a commodity leads to assuming that utility is somehow granted, so other factors may affect the performance of an IR system.


SLIDE 17

Cumulated Relative Position (CRP)

CRP cumulates, at each rank position, the positive and negative document misplacements (RP) and measures the total "space" the user had to run back and forth in the result list. CRP represents the avoidable effort: with the ideal ranking there would be zero misplacements, and this avoidable effort causes user weariness.

[Figure: example result list of 26 rank positions, colour-coded by relevance grade (highly, fairly, partially, not relevant); a negative misplacement of 2 − 16 = −14 positions (a document at rank 2 whose ideal interval starts at rank 16)]
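A minimal sketch of CRP in Python, building on the relative position computation above; data structures and the example bounds are illustrative:

```python
def crp_curve(run, grade_bounds):
    # CRP at rank i is the running sum of the RP values up to rank i
    curve, total = [], 0
    for rank, grade in enumerate(run, start=1):
        lo, hi = grade_bounds[grade]
        rp = (rank - lo) if rank < lo else (rank - hi) if rank > hi else 0
        total += rp
        curve.append(total)
    return curve

bounds = {"HR": (1, 3), "FR": (4, 6), "PR": (7, 10), "NR": (11, 20)}
run = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
print(crp_curve(run, bounds))
# -> [0, 0, -1, -8, -10, -10, -14, -17, -19, -19, -11, -11]
```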

SLIDE 18

What Does It Look Like?

[Figure: CRP curve of a typical run, RB_t = 32, N = 200 (x-axis: Rank, y-axis: CRP), with the recall base RB_t marked on the rank axis]


SLIDE 20

What Does It Look Like with Respect to Other Measures?

[Figure: CRP, CG, and DCG curve comparisons for the runs input.APL985LC and input.acsys7mi on TREC7 topics 351 and 365 (x-axis: Rank)]

SLIDE 21

What Task? What User Model?

Task: informational. At each rank position, CRP gives the total amount of avoidable effort up to that point.

User model: a user with a uniform probability of stopping at each rank position, similar to the user model underlying CG/DCG and, somehow, also RBP.


SLIDE 22

Summary Indicators

recovery ratio ρ: the earliest rank position, if any, at which the user passes through the ideal point related to the recall base
space ratio σ: the ratio of the total avoidable space the user had to walk through
twist τ: the mean of ρ and σ, to grasp the overall angle and outlook of CRP for a run

[Figure: CRP curve of a typical run (RB_t = 32, N = 200) annotated with the recovery ratio ρ and the negative and positive space components σ− and σ+]
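The slides do not give exact formulas for ρ, σ, and τ, so the Python sketch below implements one assumed reading purely for illustration (ρ as the first rank at or after the recall base where the CRP curve returns to zero, normalized by the run length; σ as the misplacement space actually walked over an assumed worst case; τ as their mean); it should not be read as the authors' definitions:

```python
def summary_indicators(crp, recall_base, worst_case_space):
    """Illustrative, assumption-laden reading of rho, sigma, tau."""
    n = len(crp)
    # rho: first rank >= recall base where CRP is back at zero, normalized by n (assumption)
    zero_rank = next((r for r, v in enumerate(crp, start=1)
                      if r >= recall_base and v == 0), None)
    rho = None if zero_rank is None else zero_rank / n
    # sigma: total |RP| space walked, as a fraction of an assumed worst case
    steps = [crp[0]] + [crp[i] - crp[i - 1] for i in range(1, n)]
    sigma = sum(abs(s) for s in steps) / worst_case_space if worst_case_space else 0.0
    # tau: the mean of rho and sigma ("twist")
    tau = None if rho is None else (rho + sigma) / 2
    return rho, sigma, tau
```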

SLIDE 23

Does CRP Tell a Different Story?

[Figure: Kendall's tau correlations between the system rankings induced by ρ, σ, τ and those induced by AP, nDCG@1000, nCG@1000, and Q-measure, on TREC 7, TREC 2001, and NTCIR 3 (Chinese, Japanese)]
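A minimal sketch of the kind of comparison behind this figure, computing Kendall's tau between the system rankings induced by two measures; the score values are made-up toy data:

```python
from scipy.stats import kendalltau

crp_sigma = {"sysA": 0.31, "sysB": 0.45, "sysC": 0.28, "sysD": 0.52}
ap_score  = {"sysA": 0.22, "sysB": 0.30, "sysC": 0.19, "sysD": 0.35}

systems = sorted(crp_sigma)
tau, p_value = kendalltau([crp_sigma[s] for s in systems],
                          [ap_score[s] for s in systems])
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
```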

SLIDE 24

What Story Does CRP Tell?

[Figure: per-topic Kendall's tau scatter plots of CRP against AP and against nDCG, with points labelled by combinations of U+ to U++++ and E+ to E++++ and an OK / ~ / KO outcome]

SLIDE 25

CRP Considerations

Positive aspects
- both a rank-by-rank and a single-number measure
- it grasps a different angle with respect to existing measures
- parameter free

Controversial aspects
- is it user-oriented enough?
- parameter free
- is it intuitive enough?
- reads differently from traditional measures

SLIDE 26

Final Remarks and Future Work

We have discussed how document misplacement can play a role in IR evaluation

- visual interactive tool
- CRP measure

Future work

- visual interactive tool: exploring what-if analysis
- CRP: more extensive experimental evaluation, normalization, study of the properties

SLIDE 27

Thank you