SLIDE 1

Document Misplacement for IR Evaluation

Nicola Ferro
Information Management Systems (IMS) Research Group
Department of Information Engineering (DEI), University of Padua, Italy

Forum for Information Retrieval Evaluation (FIRE 2013), 4-6 December 2013, New Delhi, India

SLIDE 2

Outline

[Diagram: spectrum of IR evaluation approaches, ranging from System Focus to Human Focus, covering TREC-style Studies, Filtering and SDI, Log Analysis, TREC Interactive Studies, "Users" make relevance assessments, Experimental Information Behavior, Information-Seeking Behavior with IR Systems, Information-Seeking Behavior in Context, and the Archetypical IIR Study]

SLIDE 3

Outline

The same spectrum of evaluation approaches, overlaid with the two questions driving this talk:

  1. How to provide visual interactive tools that ease the interpretation of evaluation results?
  2. Should utility (gain) be the main concept around which measures are designed?

SLIDE 4

Joint Work With

Visual Analytics

Marco Angelini, Sapienza University of Rome, Italy
Giuseppe Santucci, Sapienza University of Rome, Italy
Gianmaria Silvello, University of Padua, Italy

Alternative Evaluation Measures

Kalervo Jarvelin, University of Tampere, Finland
Heikki Keskustalo, University of Tampere, Finland
Ari Pirkola, University of Tampere, Finland
Gianmaria Silvello, University of Padua, Italy


SLIDE 5

Visual Tools based on Document Misplacement

SLIDE 6

Discounted Cumulative Gain

DCG allows for graded relevance judgments and embeds a model of the user's behavior while scrolling down the result list, which also gives an account of their overall satisfaction.

G(i) represents the gain for a document with the given relevance level at rank i, e.g. 0 for not relevant, 1 for partially relevant, 3 for highly relevant.

The log base b indicates the "patience/determination" of the user while scrolling the list, e.g. b = 2 indicates an impatient user, while b = 10 indicates a more motivated user.

DG(i) = G(i)              if i < b
DG(i) = G(i) / log_b(i)   if i ≥ b

DCG(i) = Σ_{k=1..i} DG(k)
[Figure: DCG curve comparison for TREC7, topic 365 (x-axis: Rank, y-axis: DCG)]
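A minimal sketch of this computation in Python; the function name, the gain values, and the toy ranked list are illustrative, not from the slides:

```python
import math

def dcg(gains, b=2):
    """Discounted cumulated gain at each rank.

    gains: graded gains G(i) in ranked order, e.g. 0 (not relevant),
           1 (partially relevant), 3 (highly relevant).
    b:     log base modelling the user's patience (b=2 impatient, b=10 patient).
    """
    total, curve = 0.0, []
    for i, g in enumerate(gains, start=1):
        # no discount before rank b, logarithmic discount from rank b onwards
        dg = g if i < b else g / math.log(i, b)
        total += dg
        curve.append(total)
    return curve

# toy ranked list: HR, PR, NR, HR -> gains 3, 1, 0, 3
print(dcg([3, 1, 0, 3], b=2))
```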

SLIDE 7

Inspecting ranked lists

Rank:    1   2   3   4   5   6   7   8   9   10  11  12  ... 20
Run:     HR  HR  FR  NR  PR  FR  NR  NR  NR  PR  HR  NR  ... NR
Ideal:   HR  HR  HR  FR  FR  FR  PR  PR  PR  PR  NR  NR  ... NR
Optimal: HR  HR  HR  FR  FR  PR  PR  NR  NR  NR  NR  NR  ... NR

Ideal is often used in measures for normalization, see e.g. nDCG. Optimal is the best ranking possible with the documents actually retrieved by the system. How are these rankings correlated?
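A minimal Python sketch of how the optimal and ideal rankings can be derived; the grade encoding, the function names, and the recall base used in the example are illustrative assumptions:

```python
GRADE_ORDER = ["HR", "FR", "PR", "NR"]  # from most to least relevant

def optimal_ranking(run_grades):
    # best ordering achievable with the documents the system actually retrieved
    return sorted(run_grades, key=GRADE_ORDER.index)

def ideal_ranking(recall_base, length):
    # best ordering achievable over the whole collection: all HR first, then FR, ...
    ideal = []
    for grade in GRADE_ORDER[:-1]:
        ideal += [grade] * recall_base.get(grade, 0)
    return (ideal + ["NR"] * length)[:length]

run = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
print(optimal_ranking(run))
print(ideal_ranking({"HR": 3, "FR": 3, "PR": 4}, length=12))
```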

SLIDE 8

Performance analysis: To re-rank or to re-query?


SLIDE 9

How to spot failures?


SLIDE 10

Document Misplacement

Rank:  1   2   3   4   5   6   7   8   9   10  11  12  ... 20
Run:   HR  HR  FR  NR  PR  FR  NR  NR  NR  PR  HR  NR  ... NR
Ideal: HR  HR  HR  FR  FR  FR  PR  PR  PR  PR  NR  NR  ... NR
RP:    0   0   -1  -7  -2  0   -4  -3  -2  0   +8  0   ... 0

Negative values mark documents placed too early, positive values documents placed too late, and zeros correctly placed documents.

Ideal intervals per relevance grade: min(HR) = 1, max(HR) = 3; min(FR) = 4, max(FR) = 6; min(PR) = 7, max(PR) = 10; min(NR) = 11, max(NR) = 20.
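A minimal Python sketch of the relative position computation; function names and data structures are illustrative, not from the slides:

```python
def relative_position(rank, grade, grade_bounds):
    # grade_bounds maps each grade to the (min, max) ranks it occupies
    # in the ideal ranking, as in the intervals listed above
    lo, hi = grade_bounds[grade]
    if rank < lo:        # document appears too early -> negative misplacement
        return rank - lo
    if rank > hi:        # document appears too late  -> positive misplacement
        return rank - hi
    return 0             # document lies inside its ideal interval

bounds = {"HR": (1, 3), "FR": (4, 6), "PR": (7, 10), "NR": (11, 20)}
run = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
print([relative_position(i, g, bounds) for i, g in enumerate(run, start=1)])
# -> [0, 0, -1, -7, -2, 0, -4, -3, -2, 0, 8, 0]
```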

SLIDE 11

Failure analysis: identify critical rank areas

RP(j) = 0               if min(E(j)) ≤ j ≤ max(E(j))
RP(j) = j − min(E(j))   if j < min(E(j))
RP(j) = j − max(E(j))   if j > max(E(j))

where E(j) is the relevance grade of the document retrieved at rank j, and min(E(j)) and max(E(j)) are the first and last ranks at which that grade appears in the ideal ranking.

∆[j] = DG_E[j] − DG_I[j]

i.e. the rank-by-rank difference between the discounted gain of the experiment (run) E and of the ideal ranking I, whose dips point at critical rank areas.
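A minimal sketch of the ∆[j] idea in Python. The gain mapping (in particular the value 2 for fairly relevant documents) and the reading of DG as the cumulated discounted gain curve are assumptions made for illustration:

```python
import math

GAIN = {"HR": 3, "FR": 2, "PR": 1, "NR": 0}  # FR gain of 2 is an assumed value

def dcg_curve(grades, b=2):
    total, curve = 0.0, []
    for i, g in enumerate(grades, start=1):
        gain = GAIN[g]
        total += gain if i < b else gain / math.log(i, b)
        curve.append(total)
    return curve

def delta_curve(run, ideal, b=2):
    # delta[j] = DG_E[j] - DG_I[j]: the more negative, the more critical the rank area
    return [r - i for r, i in zip(dcg_curve(run, b), dcg_curve(ideal, b))]

run   = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
ideal = ["HR", "HR", "HR", "FR", "FR", "FR", "PR", "PR", "PR", "PR", "NR", "NR"]
print(delta_curve(run, ideal))
```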
SLIDE 12

What is the impact of a possible fix?


SLIDE 13

What-if analysis: estimating the impact of a fix


SLIDE 14

Measures based on Document Misplacement

SLIDE 15

Search as a Commodity

IR systems are more and more perceived as commodities, like water and electricity

“if you do not find something with a search engine, it does not exist”

Traditional IR measures are centered around the idea of utility for the user in scanning a ranked list

Has enough relevant information been provided to the user? Has this relevant information been provided in a good enough order?

BUT

Considering search as a commodity leads to assuming that utility is somehow granted, so other factors may affect the performance of an IR system.


SLIDE 17

Cumulated Relative Position (CRP)

CRP cumulates, at each rank position, the positive and negative document misplacements (RP) and measures the total "space" the user had to run back and forth in the result list. CRP represents the avoidable effort: with the ideal ranking there would be zero misplacements, and this avoidable effort causes user weariness.

[Figure: example result list of 26 rank positions, colour-coded by relevance grade (highly, fairly, partially, not relevant); a negative misplacement of 2 − 16 = −14 positions (a document at rank 2 whose ideal interval starts at rank 16)]
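A minimal sketch of CRP in Python, building on the relative position computation above; data structures and the example bounds are illustrative:

```python
def crp_curve(run, grade_bounds):
    # CRP at rank i is the running sum of the RP values up to rank i
    curve, total = [], 0
    for rank, grade in enumerate(run, start=1):
        lo, hi = grade_bounds[grade]
        rp = (rank - lo) if rank < lo else (rank - hi) if rank > hi else 0
        total += rp
        curve.append(total)
    return curve

bounds = {"HR": (1, 3), "FR": (4, 6), "PR": (7, 10), "NR": (11, 20)}
run = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
print(crp_curve(run, bounds))
# -> [0, 0, -1, -8, -10, -10, -14, -17, -19, -19, -11, -11]
```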

SLIDE 18

What Does It Look Like?

[Figure: CRP curve of a typical run, RB_t = 32, N = 200 (x-axis: Rank, y-axis: CRP), with the recall base RB_t marked on the rank axis]


SLIDE 20

What Does It Look Like with Respect to Other Measures?

[Figure: CRP, CG, and DCG curve comparisons for the runs input.APL985LC and input.acsys7mi on TREC7 topics 351 and 365 (x-axis: Rank)]

SLIDE 21

What Task? What User Model?

Task: informational. At each rank position, CRP gives the total amount of avoidable effort up to that point.

User model: a user with a uniform probability of stopping at each rank position, similar to the user model underlying CG/DCG and, somehow, also RBP.


SLIDE 22

Summary Indicators

recovery ratio ρ: the earliest rank position, if any, at which the user passes through the ideal point related to the recall base
space ratio σ: the ratio of the total avoidable space the user had to walk through
twist τ: the mean of ρ and σ, to grasp the overall angle and outlook of CRP for a run

[Figure: CRP curve of a typical run (RB_t = 32, N = 200) annotated with the recovery ratio ρ and the negative and positive space components σ− and σ+]
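The slides do not give exact formulas for ρ, σ, and τ, so the Python sketch below implements one assumed reading purely for illustration (ρ as the first rank at or after the recall base where the CRP curve returns to zero, normalized by the run length; σ as the misplacement space actually walked over an assumed worst case; τ as their mean); it should not be read as the authors' definitions:

```python
def summary_indicators(crp, recall_base, worst_case_space):
    """Illustrative, assumption-laden reading of rho, sigma, tau."""
    n = len(crp)
    # rho: first rank >= recall base where CRP is back at zero, normalized by n (assumption)
    zero_rank = next((r for r, v in enumerate(crp, start=1)
                      if r >= recall_base and v == 0), None)
    rho = None if zero_rank is None else zero_rank / n
    # sigma: total |RP| space walked, as a fraction of an assumed worst case
    steps = [crp[0]] + [crp[i] - crp[i - 1] for i in range(1, n)]
    sigma = sum(abs(s) for s in steps) / worst_case_space if worst_case_space else 0.0
    # tau: the mean of rho and sigma ("twist")
    tau = None if rho is None else (rho + sigma) / 2
    return rho, sigma, tau
```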

SLIDE 23

Does CRP Tell a Different Story?

[Figure: Kendall's tau correlations between the system rankings induced by ρ, σ, τ and those induced by AP, nDCG@1000, nCG@1000, and Q-measure, on TREC 7, TREC 2001, and NTCIR 3 (Chinese, Japanese)]
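A minimal sketch of the kind of comparison behind this figure, computing Kendall's tau between the system rankings induced by two measures; the score values are made-up toy data:

```python
from scipy.stats import kendalltau

crp_sigma = {"sysA": 0.31, "sysB": 0.45, "sysC": 0.28, "sysD": 0.52}
ap_score  = {"sysA": 0.22, "sysB": 0.30, "sysC": 0.19, "sysD": 0.35}

systems = sorted(crp_sigma)
tau, p_value = kendalltau([crp_sigma[s] for s in systems],
                          [ap_score[s] for s in systems])
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
```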

SLIDE 24

What Story Does CRP Tell?

[Figure: per-topic Kendall's tau scatter plots of CRP against AP and against nDCG, with points labelled by combinations of U+ to U++++ and E+ to E++++ and an OK / ~ / KO outcome]

SLIDE 25

CRP Considerations

Positive aspects
- both a rank-by-rank and a single-number measure
- it grasps a different angle with respect to existing measures
- parameter free

Controversial aspects
- is it user-oriented enough?
- parameter free
- is it intuitive enough?
- reads differently from traditional measures

SLIDE 26

Final Remarks and Future Work

We have discussed how document misplacement can play a role in IR evaluation

- visual interactive tool
- CRP measure

Future work

- visual interactive tool: exploring what-if analysis
- CRP: more extensive experimental evaluation, normalization, study of the properties

SLIDE 27

Thank you