To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma?

SLIDE 1

To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma?

  • E. Di Buccio¹, M. Dussin¹, N. Ferro¹, I. Masiero¹, G. Santucci², G. Tino²

¹ University of Padua, Padova, Italy

² Sapienza University of Rome, Rome, Italy

Second International Conference of the Cross Language Evaluation Forum, CLEF 2011, September 21, 2011, Amsterdam, The Netherlands

SLIDE 2

IR System Failure Analysis

  • Objective

Understanding the factors affecting the performance of an IR system

  • Problem

Complexity of the analysis task. Example: the RIA Workshop [HarmanEt2009] (28 people, 6 weeks, 11-40 hours per topic)

  • How to address this complexity?

[HarmanEt2009] Harman, D., Buckley, C.: Overview of the Reliable Information Access Workshop. Information Retrieval 12, 615-641 (2009)

SLIDE 3

Supporting Failure Analysis

  • Provide analysts with

‐ Methodologies
‐ Tools

  • Previous approaches

‐ Beadplots [BanksEt1999]
‐ Query Performance Analyzer [SormunenEt2002]
‐ VisualVectora [JärvelinEt2008]
‐ Potential for Personalization Curve [TeevanEt2010]

[BanksEt1999] Banks, D., Over, P., Zhang, N.-F.: Blind Men and Elephants: Six Approaches to TREC Data. Information Retrieval 1, 7-34 (1999)
[SormunenEt2002] Sormunen, E., Hokkanen, S., Kangaslampi, P., Pyy, P., Sepponen, B.: Query performance analyzer: a web-based tool for IR research and instruction. In Proceedings of SIGIR 2002, p. 450. ACM, New York (2002)
[JärvelinEt2008] Järvelin, K., Vähämöttönen, I., Keskustalo, H., Kekäläinen, J.: VisualVectora: An Interactive Visualization Tool for Cumulated Gain-based Retrieval Experiments. In Proceedings of ECIR '08, Glasgow, UK (2008)
[TeevanEt2010] Teevan, J., Dumais, S.T., Horvitz, E.: Potential for Personalization. ACM TOCHI 17, 1-31 (2010)

SLIDE 4

Proposed Solution

  • Visual Analytics-based approach

  • Quantify gain/loss with respect to the optimal and the ideal ranking

SLIDE 5

Analytical Model

  • Ranked result list representation

‐ Vector representation [JärvelinEt2002]
‐ GT: ground truth function (values in {0,1,…,k})
‐ DF: discounting function

  • Two analytical measures introduced:

‐ R_Pos: the relative position of each document in V with respect to its position in the optimal ranking O
‐ Δ_Gain(i): the difference between the DF values at rank i of the experiment vector V and of the optimal vector O

[JärvelinEt2002] Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM TOIS, 20, 422-446 (2002)

V      id1  id2  id3  id4
GT(V)  3    1    2    3
DF     3    1    2    3
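The slide describes R_Pos only in words, so the following is a minimal sketch of one possible formalisation, not the authors' definition. It assumes that the optimal vector O is the experiment's own grades sorted in decreasing order, that a document is "in place" whenever its rank falls inside the block of ranks that O assigns to its grade, and that a positive value means the document sits above that block. The function names and the sign convention are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def optimal_intervals(gt_v: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each grade to the (first, last) 1-based ranks it occupies in O.

    O is assumed to be the grades of V sorted in decreasing order.
    """
    optimal = sorted(gt_v, reverse=True)
    intervals: Dict[int, Tuple[int, int]] = {}
    for rank, grade in enumerate(optimal, start=1):
        first, _ = intervals.get(grade, (rank, rank))
        intervals[grade] = (first, rank)
    return intervals


def r_pos(gt_v: List[int]) -> List[int]:
    """Relative position of each document with respect to its block in O.

    Assumed convention: 0 = inside the block, > 0 = ranked above the block
    (too early), < 0 = ranked below the block (too late).
    """
    intervals = optimal_intervals(gt_v)
    result = []
    for rank, grade in enumerate(gt_v, start=1):
        first, last = intervals[grade]
        if rank < first:
            result.append(first - rank)
        elif rank > last:
            result.append(last - rank)
        else:
            result.append(0)
    return result


# Four-document example from the slide: GT(V) = 3 1 2 3, so O = 3 3 2 1.
print(r_pos([3, 1, 2, 3]))  # [0, 2, 0, -2]
```

Under these assumptions the grade-1 document at rank 2 is two positions too high, and the grade-3 document at rank 4 is two positions too low.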

SLIDE 6

Analytical Model Visualisation

Rank  GT(V)  DF(V)  DCG(V)  GT(O)  DF(O)  DCG(O)  Δ_Gain
 1    3      3.00    3.00   3      3.00    3.00     0.00
 2    1      1.00    4.00   3      3.00    6.00    -2.00
 3    2      1.26    5.26   3      1.89    7.89    -0.63
 4    3      1.50    6.76   3      1.50    9.39     0.00
 5    2      0.86    7.62   2      0.86   10.25     0.00
 6    2      0.77    8.40   2      0.77   11.03     0.00
 7    3      1.07    9.47   2      0.71   11.74     0.36
 8    2      0.67   10.13   2      0.67   12.41     0.00
 9    -      0.00   10.13   1      0.32   12.72    -0.32
10    1      0.30   10.43   1      0.30   13.02     0.00
11    -      0.00   10.43   -      0.00   13.02     0.00
12    3      0.84   11.27   -      0.00   13.02     0.84

("-" marks a GT cell left empty on the slide.)

[Figure legend labels: above / below; loss / local gain]
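The DF, DCG and Δ_Gain figures on this slide can be reproduced with a few lines of Python. This is a sketch under two assumptions inferred from the numbers rather than stated on the slide: the discount is the one described in [JärvelinEt2002] with log base 2 (no discount at rank 1, gain divided by log2(rank) from rank 2 onwards), and the empty GT cells stand for grade 0. The function names are illustrative.

```python
import math
from itertools import accumulate
from typing import List


def discounted_gains(grades: List[int]) -> List[float]:
    """DF values: gain at rank 1 is kept as is, later gains are divided by log2(rank).

    Assumption: base-2 discount, inferred from the slide's numbers.
    """
    return [
        float(g) if rank == 1 else g / math.log2(rank)
        for rank, g in enumerate(grades, start=1)
    ]


def dcg(grades: List[int]) -> List[float]:
    """Running sum of the discounted gains."""
    return list(accumulate(discounted_gains(grades)))


def delta_gain(gt_v: List[int]) -> List[float]:
    """DF of the experiment minus DF of the optimal vector, rank by rank.

    The optimal vector is taken to be the same grades sorted in decreasing order.
    """
    gt_o = sorted(gt_v, reverse=True)
    return [v - o for v, o in zip(discounted_gains(gt_v), discounted_gains(gt_o))]


# Grades read off the slide; the empty cells are assumed to be 0.
GT_V = [3, 1, 2, 3, 2, 2, 3, 2, 0, 1, 0, 3]

print(round(dcg(GT_V)[-1], 2))                        # 11.27
print(round(dcg(sorted(GT_V, reverse=True))[-1], 2))  # 13.02
print([round(d, 2) for d in delta_gain(GT_V)])
```

Negative Δ_Gain entries correspond to ranks where the experiment collects less discounted gain than the optimal ordering (a loss), positive entries to ranks where it collects more (a local gain).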

SLIDE 7

Failure Analysis Approach

  • R_Pos and Δ_Gain: analysis on a per-document basis using the R_Pos and Δ_Gain vectors (e.g. examining a document by clicking on the corresponding entry)
  • Ranking curves: more in-depth investigation on a per-topic basis by examining the gap among the ranking curves
  • Rank correlation among gain vectors (τ: Kendall's tau): analysis through (τ_ideal-opt, τ_opt-exp) pairs

‐ High τ_ideal-opt and low τ_opt-exp: possible re-ranking
‐ Low or negative τ_ideal-opt: possible re-query
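The rank-correlation step can be sketched in plain Python. The slide does not say which variant of Kendall's tau is used or how the gain vectors are paired, so this example assumes the tau-b variant (which corrects for ties, common in graded gain vectors) and compares two equal-length vectors rank by rank. The function name is illustrative.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equal-length vectors.

    Returns NaN when either vector is constant, since the coefficient
    is undefined in that case.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1  # pair tied in x (possibly also in y)
        if dy == 0:
            ties_y += 1  # pair tied in y (possibly also in x)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Example: correlation between an optimal and an experiment gain vector.
optimal = [3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 0, 0]
experiment = [3, 1, 2, 3, 2, 2, 3, 2, 0, 1, 0, 3]
print(kendall_tau_b(optimal, experiment))
```

The quadratic loop over all pairs is adequate for result lists of a few hundred documents; a library routine would be preferable for larger inputs.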

SLIDE 8

Experimentation

  • Experimentation carried out on TREC data

‐ Document corpora of the TREC7 Adhoc Test Collection
‐ Subset of the TREC7 Adhoc topics re-assessed in [JärvelinEt2002]
‐ Graded relevance judgments gathered in [JärvelinEt2002]

  • DCG

‐ trec_eval implementation with logx(i+1)

SLIDE 9

Case Study (re-ranking)

(τ_ideal-opt, τ_opt-exp) = (0.88, 0.07)

SLIDE 10

Case Study (re-ranking)

(τ_ideal-opt, τ_opt-exp) = (0.88, 0.07)
(τ_ideal-opt, τ_opt-exp) = (0.99, 0.24)

SLIDE 11

Case Study (re-query)

(τ_ideal-opt, τ_opt-exp) = (0.59, 0.45)
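The way the τ pairs from the case studies map to a suggested action can be written as a small rule. The slides give no numeric cut-offs, so the thresholds below are hypothetical. They were picked only so that the three pairs reported in the case studies fall under the label of their own slide, and should be treated as tunable parameters rather than as values from the presentation.

```python
def suggest_action(
    tau_ideal_opt: float,
    tau_opt_exp: float,
    high_ideal: float = 0.8,  # hypothetical: "high" tau between ideal and optimal
    low_ideal: float = 0.6,   # hypothetical: below this, tau ideal-optimal counts as "low"
    low_exp: float = 0.3,     # hypothetical: "low" tau between optimal and experiment
) -> str:
    """Suggest "re-query", "re-rank" or "inconclusive" from a tau pair.

    Low or negative tau ideal-optimal suggests re-querying; high tau
    ideal-optimal combined with low tau optimal-experiment suggests
    re-ranking; anything else is left undecided.
    """
    if tau_ideal_opt < low_ideal:
        return "re-query"
    if tau_ideal_opt >= high_ideal and tau_opt_exp <= low_exp:
        return "re-rank"
    return "inconclusive"


# The three pairs reported in the case studies.
for pair in [(0.88, 0.07), (0.99, 0.24), (0.59, 0.45)]:
    print(pair, suggest_action(*pair))
```

The "inconclusive" branch is kept deliberately: the slides present the pairs as an aid to the analyst, who would then inspect the ranking curves and the R_Pos and Δ_Gain vectors before deciding.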

SLIDE 12

Concluding Remarks

  • Visual Analytics integrated in IR Evaluation

‐ helps explore the quality of ranked result lists
‐ helps point out the location and the magnitude of ranking errors

  • Future Work

‐ Extending the approach to the comparison of multiple experiments
‐ Allowing for more complex forms of interaction with the ranking curves and the R_Pos and Δ_Gain vectors
‐ Automatic extraction of features from misplaced documents and visualization of the relationships among misplaced documents

SLIDE 13

Questions?
