
Robustness? - PDF document

Robustness? Thomas Mandl


  1. Robust Task - Result Overview and Lessons Learned from Robustness Evaluation
     Thomas Mandl
     Information Science, Universität Hildesheim
     mandl@uni-hildesheim.de
     Thomas Mandl: Robust CLEF 2007 - Overview

     Robustness?
     • Robust … means … capable of functioning correctly (or at the very minimum, not failing catastrophically) under a great many conditions. (http://www.reference.com/)
     • Robust IR means the capability of an IR system to work well (and reach at least a minimal performance) under a variety of conditions (topics, difficulty, collections, users, languages …)

     Variety of conditions …
     [Figure: chart, y-axis 0 to 1; x-axis: Mono FR, Mono EN, Mono PT, Bi ->FR]

     System Variance
     [Figure: chart, y-axis 0 to 1; x-axis: Mono FR, Mono EN, Mono PT, Bi ->FR; label: Variance between topics]

     Robust Task 2007
     • Again …
       – Use topics and relevance assessment from previous CLEF campaigns
       – Take a different perspective and use a robust evaluation measure (GMAP)
       – Emphasize the difficult (= low performing) topics

     History of Robust IR Evaluation
     • TREC
       – Mono-lingual Retrieval
       – 2003 - 2005
     • CLEF
       – Mono-, bi- and Multilingual Retrieval
       – 2006 six languages
       – 2007 three languages
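The point that GMAP emphasizes the difficult (low performing) topics can be illustrated with a small sketch. The per-topic scores below are hypothetical values chosen for illustration, not results from the task, and the `eps` floor is an assumption added so that a zero score does not collapse the geometric mean to 0. The same absolute gain of 0.05 is applied once to the hardest topic and once to the easiest one.

```python
import math

def mean_ap(ap_scores):
    """Arithmetic mean of per-topic average precision (MAP)."""
    return sum(ap_scores) / len(ap_scores)

def gmap(ap_scores, eps=1e-5):
    """Geometric mean of per-topic average precision (GMAP).

    Computed as exp(mean(log(ap))). eps is an assumed floor that
    replaces scores below it, so log(0) never occurs.
    """
    logs = [math.log(max(ap, eps)) for ap in ap_scores]
    return math.exp(sum(logs) / len(logs))

# Hypothetical per-topic scores, for illustration only.
baseline = [0.02, 0.40, 0.60]
fix_hard = [0.07, 0.40, 0.60]  # +0.05 on the hardest topic
fix_easy = [0.02, 0.40, 0.65]  # +0.05 on the easiest topic

for name, scores in (("baseline", baseline),
                     ("fix_hard", fix_hard),
                     ("fix_easy", fix_easy)):
    print(f"{name}: MAP = {mean_ap(scores):.3f}, GMAP = {gmap(scores):.3f}")
```

MAP rises by the same amount in both cases, while GMAP rises far more when the gain lands on the weakest topic.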

  2. Training and Test
     • CLEF 2001, 2002 and 2003 for training
     • CLEF 2004, 2005 and 2006 for testing

     Which system is better?
     $\mathrm{geoAve} = \sqrt[n]{\prod_{i=1}^{n} x_i}$, where $n$ is the number of topics and $x_i$ is the result for topic $i$.
     [Figure: chart of Result A and Result B for topics I, II and III; y-axis 0 to 1]

     Topic  | System | Result | Topic  | System | Result
     1      | A      | 0.1    | 1      | B      | 0.2
     2      | A      | 0.1    | 2      | B      | 0.2
     3      | A      | 0.9    | 3      | B      | 0.6
     GeoAve | A      | 0.21   | GeoAve | B      | 0.29
     MAP    | A      | 0.37   | MAP    | B      | 0.33

     Collections
     Language   | Target Collection                         | Training Topics | Test Topics
     English    | Los Angeles Times 1994                    | 41-200          | 251-350
     French     | Le Monde 1994, Swiss News Agency 94       | 41-140          | 251-350
     Portuguese | Público 1995                              | -               | 201-350

     Robust Task 2007
     [bullet text not recoverable from the source]

     Participation
     • 63 runs submitted by 7 groups
     • 2006: 133 runs by 8 groups

     Results

     Mono English
     Rank | Participant | Experiment                                                     | MAP    | GMAP
     1st  | reina       | 10.2415/AH-ROBUST-MONO-EN-TEST-CLEF2007.REINA.REINAENTDNT      | 38.97% | 18.50%
     2nd  | daedalus    | 10.2415/AH-ROBUST-MONO-EN-TEST-CLEF2007.DAEDALUS.ENFSEN22S     | 37.78% | 17.72%
     3rd  | hildesheim  | 10.2415/AH-ROBUST-MONO-EN-TEST-CLEF2007.HILDESHEIM.HIMOENBRFNE | 5.88%  | 0.32%

     Mono Portuguese
     Rank | Participant | Experiment                                                 | MAP    | GMAP
     1st  | reina       | 10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.REINA.REINAPTTDNT  | 41.40% | 12.87%
     2nd  | jaen        | 10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.JAEN.UJARTPT1      | 24.74% | 0.58%
     3rd  | daedalus    | 10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.DAEDALUS.PTFSPT2S  | 23.75% | 0.50%
     4th  | xldb        | 10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.XLDB.XLDBROB16     | 1.21%  | 0.071%
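The "Which system is better?" comparison can be checked with a few lines of Python. This is a minimal sketch that uses the three per-topic results for systems A and B from the slide; the function names `geo_ave` and `arith_mean` are illustrative, not taken from any evaluation tool.

```python
import math

def geo_ave(results):
    """n-th root of the product of the per-topic results."""
    return math.prod(results) ** (1.0 / len(results))

def arith_mean(results):
    """Arithmetic mean of the per-topic results (the MAP row on the slide)."""
    return sum(results) / len(results)

# Per-topic results for topics 1 to 3, as given on the slide.
system_a = [0.1, 0.1, 0.9]
system_b = [0.2, 0.2, 0.6]

for name, results in (("A", system_a), ("B", system_b)):
    print(f"System {name}: MAP = {arith_mean(results):.2f}, "
          f"GeoAve = {geo_ave(results):.2f}")
# System A: MAP = 0.37, GeoAve = 0.21
# System B: MAP = 0.33, GeoAve = 0.29
```

System A wins on the arithmetic mean because of its single strong topic, while system B wins on the geometric mean because it does better on the two weak topics.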

  3. Results Mono English
     [Figure: Ad-Hoc Robust Monolingual English Test Task, Top 5 Participants, Standard Recall Levels vs Mean Interpolated Precision; x-axis: Recall (0% to 100%), y-axis: Precision (0% to 100%)]
     • reina [Experiment REINAENTDNT; MAP 38.97%; Not Pooled]
     • daedalus [Experiment ENFSEN22S; MAP 37.78%; Not Pooled]
     • hildesheim [Experiment HIMOENBRFNE; MAP 5.88%; Not Pooled]

     Results Mono Portuguese
     [Figure: Ad-Hoc Robust Monolingual Portuguese Test Task, Top 5 Participants, Standard Recall Levels vs Mean Interpolated Precision; x-axis: Recall (0% to 100%), y-axis: Precision (0% to 100%)]
     • reina [Experiment REINAPTTDNT; MAP 41.40%; Not Pooled]
     • jaen [Experiment UJARTPT1; MAP 24.74%; Not Pooled]
     • daedalus [Experiment PTFSPT2S; MAP 23.75%; Not Pooled]
     • xldb [Experiment XLDBROB16_10; MAP 1.21%; Not Pooled]

     Results

     Mono French
     Rank | Participant | Experiment                                                    | MAP    | GMAP
     1st  | unine       | 10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.UNINE.UNINEFR1        | 42.13% | 14.24%
     2nd  | reina       | 10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.REINA.REINAFRTDET     | 38.04% | 12.17%
     3rd  | jaen        | 10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.JAEN.UJARTFR1         | 34.76% | 10.69%
     4th  | daedalus    | 10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.DAEDALUS.FRFSFR22S    | 29.91% | 7.43%
     5th  | hildesheim  | 10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.HILDESHEIM.HIMOFRBRF2 | 27.31% | 5.47%

     Bi -> French
     Rank | Participant | Experiment                                                              | MAP    | GMAP
     1st  | reina       | 10.2415/AH-ROBUST-BILI-X2FR-TEST-CLEF2007.REINA.REINAE2FTDNT            | 35.83% | 12.28%
     2nd  | unine       | 10.2415/AH-ROBUST-BILI-X2FR-TEST-CLEF2007.UNINE.UNINEBILFR1             | 33.50% | 5.01%
     3rd  | colesun     | 10.2415/AH-ROBUST-BILI-X2FR-TEST-CLEF2007.COLESUN.EN2FRTST4GRINTLOGLU001 | 22.87% | 3.57%

     Results Mono French
     [Figure: Ad-Hoc Robust Monolingual French Test Task, Top 5 Participants, Standard Recall Levels vs Mean Interpolated Precision; x-axis: Recall (0% to 100%), y-axis: Precision (0% to 100%)]
     • unine [Experiment UNINEFR1; MAP 42.13%; Not Pooled]
     • reina [Experiment REINAFRTDET; MAP 38.04%; Not Pooled]
     • jaen [Experiment UJARTFR1; MAP 34.76%; Not Pooled]
     • daedalus [Experiment FRFSFR22S; MAP 29.91%; Not Pooled]
     • hildesheim [Experiment HIMOFRBRF2; MAP 27.31%; Not Pooled]

     Results Bi-lingual X -> French
     [Figure: Ad-Hoc Robust Bilingual Test Task, French target collection(s), Top 5 Participants, Standard Recall Levels vs Mean Interpolated Precision; x-axis: Recall (0% to 100%), y-axis: Precision (0% to 100%)]
     • reina [Experiment REINAE2FTDNT; MAP 35.83%; Not Pooled]
     • unine [Experiment UNINEBILFR1; MAP 33.50%; Not Pooled]
     • colesun [Experiment EN2FRTST4GRINTLOGLU001; MAP 22.87%; Not Pooled]

     Approaches
     • Adoption of traditional and "advanced" CLIR methods
       – BM 25 (Miracle)
       – N-gram translation (CoLesIR)
       – Weighting, stemming (Uni NE)
     • Adoption of "robust" heuristics
       – Expansion with an external resource (SINAI)
