
To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma?



  1. To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma?
     E. Di Buccio (1), M. Dussin (1), N. Ferro (1), I. Masiero (1), G. Santucci (2), G. Tino (2)
     (1) University of Padua, Padova, Italy
     (2) Sapienza University of Rome, Rome, Italy
     Second International Conference of the Cross Language Evaluation Forum, CLEF 2011
     September 21, 2011, Amsterdam, The Netherlands

  2. IR System Failure Analysis
     • Objective: understanding the factors affecting the performance of an IR system
     • Problem: complexity of the analysis task
       ‐ Example: RIA Workshop [HarmanEt2009] (28 people, 6 weeks, 11-40 hours per topic)
     • How to address this complexity?
     [HarmanEt2009] Harman, D., Buckley, C.: Overview of the Reliable Information Access Workshop. Information Retrieval 12, 615-641 (2009)

  3. Supporting Failure Analysis
     • Provide analysts with
       ‐ Methodologies
       ‐ Tools
     • Previous approaches
       ‐ Beadplots [BanksEt1999]
       ‐ Query Performance Analyzer [SormunenEt2002]
       ‐ VisualVectora [JärvelinEt2008]
       ‐ Potential for Personalization Curve [TeevanEt2010]
     [BanksEt1999] Banks, D., Over, P., Zhang, N.-F.: Blind Men and Elephants: Six Approaches to TREC Data. Information Retrieval 1, 7-34 (1999)
     [SormunenEt2002] Sormunen, E., Hokkanen, S., Kangaslampi, P., Pyy, P., Sepponen, B.: Query Performance Analyzer: A Web-based Tool for IR Research and Instruction. In Proceedings of SIGIR 2002, p. 450, ACM, New York (2002)
     [JärvelinEt2008] Järvelin, K., Vähämöttönen, I., Keskustalo, H., Kekäläinen, J.: VisualVectora: An Interactive Visualization Tool for Cumulated Gain-based Retrieval Experiments. In Proceedings of ECIR '08, Glasgow, UK (2008)
     [TeevanEt2010] Teevan, J., Dumais, S.T., Horvitz, E.: Potential for Personalization. ACM TOCHI 17, 1-31 (2010)

  4. Proposed Solution
     • Visual Analytics-based approach
     • Quantify gain/loss with respect to the optimal and the ideal ranking

  5. Analytical Model
     • Ranked result list representation
       ‐ Vector representation [JärvelinEt2002]
       ‐ GT: ground truth function (values in {0,1,…,k})
       ‐ DF: discounting function

       V     GT(V)   DF
       id1   3       3
       id2   1       1
       id3   2       2
       id4   3       3
       …     …       …

     • Two analytical measures introduced:
       ‐ R_Pos: the relative position of the documents in V with respect to their optimal position in the optimal ranking O
       ‐ Δ_Gain(i): the difference between DF at rank i of the experiment vector and of the optimal vector
     [JärvelinEt2002] Järvelin, K., Kekäläinen, J.: Cumulated Gain-based Evaluation of IR Techniques. ACM TOIS 20, 422-446 (2002)
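The slide describes R_Pos only in words, so the following is a minimal sketch of one plausible reading, not the authors' definition. It assumes that the optimal ranking O is the experiment's own gain vector sorted in descending order, that each relevance grade occupies a contiguous interval of ranks in O, and that R_Pos is 0 inside that interval and otherwise the signed distance to its nearest boundary. The function name `r_pos` and the sign convention (positive when a document is ranked earlier than its interval, negative when later) are assumptions made for illustration.

```python
def r_pos(gains):
    """Signed distance of each document from its grade's rank interval in O.

    Assumption: O is `gains` sorted in descending order, so every grade
    occupies a contiguous block of 1-based ranks [first, last] in O.
    """
    optimal = sorted(gains, reverse=True)

    # First and last 1-based rank at which each grade appears in O.
    first, last = {}, {}
    for rank, grade in enumerate(optimal, start=1):
        first.setdefault(grade, rank)
        last[grade] = rank

    result = []
    for rank, grade in enumerate(gains, start=1):
        if rank < first[grade]:
            # Ranked earlier than the grade's block in O (assumed positive).
            result.append(first[grade] - rank)
        elif rank > last[grade]:
            # Ranked later than the grade's block in O (assumed negative).
            result.append(last[grade] - rank)
        else:
            # Inside the block: the document is where it should be.
            result.append(0)
    return result


# Gain vector GT(V) taken from the example table on slide 6.
experiment = [3, 1, 2, 3, 2, 2, 3, 2, 0, 1, 0, 3]
positions = r_pos(experiment)
```

Under these assumptions a zero entry corresponds to a correctly placed document, and the magnitude of a non-zero entry says how far the document would have to move to reach its block in O.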

  6. Analytical Model Visualisation

     Rank  GT(V)  DF(V)  DCG(V)  Δ_Gain  GT(O)  DF(O)  DCG(O)
     1     3      3.00   3.00     0.00   3      3.00   3.00
     2     1      1.00   4.00    -2.00   3      3.00   6.00
     3     2      1.26   5.26    -0.63   3      1.89   7.89
     4     3      1.50   6.76     0.00   3      1.50   9.39
     5     2      0.86   7.62     0.00   2      0.86   10.25
     6     2      0.77   8.40     0.00   2      0.77   11.03
     7     3      1.07   9.47     0.36   2      0.71   11.74
     8     2      0.67   10.13    0.00   2      0.67   12.41
     9     0      0.00   10.13   -0.32   1      0.32   12.72
     10    1      0.30   10.43    0.00   1      0.30   13.02
     11    0      0.00   10.43    0.00   0      0.00   13.02
     12    3      0.84   11.27    0.84   0      0.00   13.02

     Legend labels shown on the slide: ok, above, below; ok, loss, local gain
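The numbers on slide 6 can be checked with a short sketch. It assumes a base-2 logarithmic discount applied to the rank itself, with ranks 1 and 2 left undiscounted, because that choice reproduces the DF values shown on this slide; slide 8 mentions a log_x(i+1) discount for the experiments, which would give different values. It also assumes that the optimal vector is the experiment's gain vector sorted in descending order. The function names are illustrative, not taken from the presentation.

```python
import math
from itertools import accumulate


def discounted_gains(gains):
    """DF: divide the gain at each 1-based rank by max(1, log2(rank))."""
    return [g / max(1.0, math.log2(rank)) for rank, g in enumerate(gains, start=1)]


def dcg_curve(gains):
    """DCG: running sum of the discounted gains."""
    return list(accumulate(discounted_gains(gains)))


def delta_gain(experiment):
    """Δ_Gain: DF of the experiment minus DF of the optimal vector, rank by rank."""
    optimal = sorted(experiment, reverse=True)  # assumed definition of O
    df_exp = discounted_gains(experiment)
    df_opt = discounted_gains(optimal)
    return [e - o for e, o in zip(df_exp, df_opt)]


# GT(V) column from the slide.
experiment = [3, 1, 2, 3, 2, 2, 3, 2, 0, 1, 0, 3]

df_v = [round(x, 2) for x in discounted_gains(experiment)]
dcg_v = [round(x, 2) for x in dcg_curve(experiment)]
dcg_o = [round(x, 2) for x in dcg_curve(sorted(experiment, reverse=True))]
d_gain = [round(x, 2) for x in delta_gain(experiment)]
```

Rounded to two decimals, `df_v`, `dcg_v`, `dcg_o` and `d_gain` agree with the DF, DCG and Δ_Gain columns of the slide. A negative Δ_Gain marks a rank where the experiment collects less discounted gain than the optimal vector, a positive one a rank where it collects more.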

  7. Failure Analysis Approach
     • τ: Kendall tau rank correlation among gain vectors
     • Analysis through (τ_ideal-opt, τ_opt-exp) pairs
       ‐ High τ_ideal-opt and low τ_opt-exp: possible re-ranking
       ‐ Low or negative τ_ideal-opt: possible re-query
     • More in-depth investigation on a per-topic basis, by examining the gap among the ranking curves
     • Analysis on a per-document basis using the R_Pos and Δ_Gain vectors (e.g. examining a document by clicking on the corresponding R_Pos and Δ_Gain entry)
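The decision rule on slide 7 can be sketched as follows. The slide does not say which Kendall tau variant is used, how the gain vectors are aligned, or where "high" and "low" begin, so everything specific here is an assumption: the sketch uses tau-b (which tolerates tied gains), and the thresholds `high=0.8` and `low=0.3` are hypothetical values picked only because they are consistent with the three case studies on slides 9 to 11.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length numeric vectors (handles ties)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    # A constant vector makes tau-b undefined; returning 0.0 is a choice made here.
    return (concordant - discordant) / denom if denom else 0.0


def suggest_action(tau_ideal_opt, tau_opt_exp, high=0.8, low=0.3):
    """Map a (tau_ideal_opt, tau_opt_exp) pair to the slide's two suggestions.

    `high` and `low` are hypothetical thresholds, not values from the slides.
    """
    if tau_ideal_opt < high:
        return "re-query"       # low or negative tau_ideal_opt
    if tau_opt_exp <= low:
        return "re-rank"        # high tau_ideal_opt, low tau_opt_exp
    return "no clear signal"    # case not covered by the slide


case_studies = [(0.88, 0.07), (0.99, 0.24), (0.59, 0.45)]
suggestions = [suggest_action(a, b) for a, b in case_studies]
```

With these hypothetical thresholds the three case-study pairs map to re-rank, re-rank and re-query, which matches how the slides label them; other threshold choices would be equally defensible.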

  8. Experimentation
     • Experimentation carried out on TREC data
       ‐ Document corpora of the TREC7 Adhoc Test Collection
       ‐ Subset of the TREC7 Adhoc topics re-assessed in [JärvelinEt2002]
       ‐ Graded relevance judgments gathered in [JärvelinEt2002]
     • DCG
       ‐ trec_eval implementation with log_x(i+1)

  9. Case Study (re-ranking)
     (τ_ideal-opt, τ_opt-exp) = (0.88, 0.07)

  10. Case Study (re-ranking)
      (τ_ideal-opt, τ_opt-exp) = (0.88, 0.07)
      (τ_ideal-opt, τ_opt-exp) = (0.99, 0.24)

  11. Case Study (re-query)
      (τ_ideal-opt, τ_opt-exp) = (0.59, 0.45)

  12. Concluding Remarks
      • Visual Analytics integrated in IR Evaluation
        ‐ helps explore the quality of ranked result lists
        ‐ helps point out the location and the magnitude of ranking errors
      • Future Work
        ‐ Extending the approach to the comparison of multiple experiments
        ‐ Allowing for more complex forms of interaction with the curves and the R_Pos and Δ_Gain vectors
        ‐ Automatic extraction of features from misplaced documents and visualization of the relationships among misplaced documents

  13. Questions?
