Grid@CLEF Track Overview


SLIDE 1

Grid@CLEF Track Overview

Donna Harman
NIST, USA
donna.harman@nist.gov

Nicola Ferro
University of Padua, Italy
ferro@dei.unipd.it

CLEF 2009 Workshop, September 30th - October 2nd 2009, Κέρκυρα, Greece

SLIDE 2

Issues

The CLEF research community has been outstanding and very active in designing, developing, and testing multilingual information access (MLIA) methods and techniques, constantly improving the performance of such components

BUT

Do we really know how MLIA components behave with respect to languages?
Do we have a deep understanding of how these components interact when the language changes?

SLIDE 3

Objectives

  • Look at differences across a wide set of languages;
  • Identify best practices for each language;
  • Help other countries to develop their expertise in the IR field and create IR groups;
  • Provide a repository in which all the information and knowledge derived from the experiments undertaken can be managed and made available.

SLIDE 4

Where Are We?

SLIDE 6

How Can We Get There?

SLIDE 7

Approach

It’s not competition
It’s not ranking
It’s participation and cooperation

SLIDE 8

The CIRCO Framework

The framework allows for a distributed, loosely-coupled, and asynchronous experimental evaluation of Information Retrieval (IR) systems, where:

  • distributed implies that different stakeholders can take part in the experimentation, each one providing one or more components of the whole IR system to be evaluated;
  • loosely-coupled points out that minimal integration among the different components is required to carry out the experimentation;
  • asynchronous underlines that no synchronization among the different components is required to carry out the experimentation.

Components shown in the slide diagram: Tokenizer, Stop Word Remover, Stemmer, Indexer
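The loosely-coupled, asynchronous idea can be illustrated with a minimal Python sketch. It assumes that each component reads and writes a serialized intermediate state (JSON here, purely for illustration; this is not the actual CIRCO exchange format), so that components from different providers could be run at different times and combined. All function names, the tiny stop list, and the suffix-stripping "stemmer" are hypothetical.

```python
import json

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in"}


def tokenize(text: str) -> str:
    """Component 1: split raw text into lowercase tokens and serialize them."""
    tokens = text.lower().split()
    return json.dumps({"stage": "tokenizer", "tokens": tokens})


def remove_stop_words(state: str) -> str:
    """Component 2: read a serialized token stream and drop stop words."""
    tokens = json.loads(state)["tokens"]
    kept = [t for t in tokens if t not in STOP_WORDS]
    return json.dumps({"stage": "stop_word_remover", "tokens": kept})


def stem(state: str) -> str:
    """Component 3: toy stemmer that strips a trailing 's' (not a real algorithm)."""
    tokens = json.loads(state)["tokens"]
    stemmed = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return json.dumps({"stage": "stemmer", "tokens": stemmed})


def index(state: str) -> dict[str, int]:
    """Component 4: build a term -> frequency map from the serialized tokens."""
    counts: dict[str, int] = {}
    for token in json.loads(state)["tokens"]:
        counts[token] = counts.get(token, 0) + 1
    return counts


if __name__ == "__main__":
    # Each intermediate string could be written to a file and picked up
    # later by a component provided by a different group.
    state = tokenize("The evaluation of retrieval systems and the systems")
    state = remove_stop_words(state)
    state = stem(state)
    print(index(state))
```

Because every stage only depends on the serialized state, any single component could be swapped for another implementation without touching the others, which is the property the slide calls loose coupling.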

SLIDE 9

Participation

Groups: 9 subscribed, 2 succeeded, 18 runs

Participant  Institution                        Country
chemnitz     Chemnitz University of Technology  Germany
cheshire     U.C.Berkeley                       United States

Task                 # Participants  # Runs
Monolingual Dutch
Monolingual English  2               6
Monolingual French   2               6
Monolingual German   2               6
Monolingual Italian
Total                                18

SLIDE 10

Grid@CLEF Collections

Language  Collection                  Documents  Size (approx.)
Dutch     NRC Handelsblad 1994/95        84,121  291 Mbyte
          Algemeen Dagblad 1994/95      106,484  235 Mbyte
          Total                         190,605  526 Mbyte
English   Los Angeles Times 1994        113,005  420 Mbyte
French    Le Monde 1994                  44,013  154 Mbyte
          French SDA 1994                43,178   82 Mbyte
          Total                          87,191  236 Mbyte
German    Frankfurter Rundschau 1994    139,715  319 Mbyte
          Der Spiegel 1994/95            13,979   61 Mbyte
          German SDA 1994                71,677  140 Mbyte
          Total                         225,371  520 Mbyte
Italian   La Stampa 1994                 58,051  189 Mbyte
          Italian SDA 1994               50,527   81 Mbyte
          Total                         108,578  270 Mbyte

SLIDE 11

Grid@CLEF Topics

84 topics in Dutch, English, French, German, and Italian from CLEF 2001 and 2002
All the topics have relevant documents in all the collections

SLIDE 12

Grid@CLEF Results

[Figures: results for English, French, and German; the images are not preserved in this transcript]

SLIDE 15

Grid@CLEF: Approaches

chemnitz

  • Models: vector space + divergence from randomness, based on Lucene + Terrier
  • Stop words
  • Snowball, N-grams for German, Krovetz and Savoy’s stemmers
  • Blind query expansion (top terms from top docs)
  • Data fusion with Z-score
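Data fusion with Z-score normalization can be sketched as follows. This is a generic illustration, not the participant's actual implementation: each run's scores are standardized to zero mean and unit variance, and the standardized scores are summed per document. The function names, and the choice to let a document that is missing from a run contribute nothing, are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize(run: dict[str, float]) -> dict[str, float]:
    """Standardize one run's scores: (score - mean) / standard deviation."""
    if not run:
        return {}
    scores = list(run.values())
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are equal, so the run carries no ranking information.
        return {doc: 0.0 for doc in run}
    return {doc: (score - mu) / sigma for doc, score in run.items()}


def z_score_fusion(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Sum the z-normalized scores of each document across all runs."""
    fused: dict[str, float] = defaultdict(float)
    for run in runs:
        for doc, z in z_normalize(run).items():
            fused[doc] += z
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Two hypothetical runs whose raw scores live on very different scales.
    run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
    run_b = {"d1": 0.2, "d2": 0.9, "d3": 0.1}
    for doc, score in z_score_fusion([run_a, run_b]):
        print(doc, round(score, 3))
```

Standardizing before summing keeps a run with large raw scores from dominating the merged ranking, which is the usual motivation for this kind of normalization when combining engines with different scoring functions.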

Track    Rank  Participant  MAP     Experiment DOI
English  1st   chemnitz     54.45%  10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHEMNITZ.CUT GRID MONO EN MERGED LUCENE TERRIER
         2nd   cheshire     53.13%  10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHESHIRE.CHESHIRE GRID ENG T2FB
         Difference         2.48%
French   1st   cheshire     51.88%  10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHESHIRE.CHESHIRE GRID FRE T2FB
         2nd   chemnitz     49.42%  10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHEMNITZ.CUT GRID MONO FR MERGED LUCENE TERRIER
         Difference         4.97%
German   1st   chemnitz     48.64%  10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHEMNITZ.CUT GRID MONO DE MERGED LUCENE TERRIER
         2nd   cheshire     40.02%  10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHESHIRE.CHESHIRE GRID GER T2FB
         Difference         21.53%
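The "Difference" values are consistent with the relative improvement of the first-ranked run over the second-ranked one, that is (MAP of 1st - MAP of 2nd) / MAP of 2nd. This reading is an assumption, since the slides do not state the formula. The short sketch below recomputes it from the rounded MAP values shown above; the results can differ from the slide in the last digit, possibly because the slide was computed from unrounded MAP values.

```python
def relative_difference(map_first: float, map_second: float) -> float:
    """Relative improvement of the first run over the second, in percent."""
    return (map_first - map_second) / map_second * 100.0


# MAP values (in percent) of the 1st and 2nd ranked runs, taken from the table.
results = {
    "English": (54.45, 53.13),
    "French": (51.88, 49.42),
    "German": (48.64, 40.02),
}

if __name__ == "__main__":
    for track, (first, second) in results.items():
        print(f"{track}: {relative_difference(first, second):.2f}%")
```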

SLIDE 16

Grid@CLEF: Approaches

chemnitz: "We look for strong rules which let us predict the retrieval quality . . . [and] enable us to automatically configure a retrieval engine in accordance to the corpus"


SLIDE 17

Grid@CLEF: Approaches

cheshire

  • Models: logistic regression based on Cheshire II
  • Stop words
  • Stemmer (Snowball)
  • Blind query expansion (probabilistic relevance feedback, top 10 terms from top 10 docs)
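Blind query expansion of the "top terms from top documents" kind can be sketched as below. The sketch ranks candidate terms by raw frequency in the top-ranked documents, which is a simplification: the probabilistic relevance feedback mentioned on the slide weights terms differently, and those weights are not given here. Function and parameter names are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_docs=10, top_terms=10):
    """Append the most frequent new terms from the top-ranked documents.

    query_terms: list of the original query terms.
    ranked_docs: list of token lists, best-ranked document first.
    """
    counts = Counter()
    for doc_tokens in ranked_docs[:top_docs]:
        # Only count terms that are not already part of the query.
        counts.update(t for t in doc_tokens if t not in query_terms)
    # most_common sorts by count, descending; ties keep first-seen order.
    expansion = [term for term, _ in counts.most_common(top_terms)]
    return list(query_terms) + expansion


if __name__ == "__main__":
    # Hypothetical result list from a first retrieval pass.
    docs = [
        ["grid", "evaluation", "retrieval"],
        ["retrieval", "stemmer"],
        ["noise"],
    ]
    print(expand_query(["grid"], docs, top_docs=2, top_terms=2))
```

The expanded query would then be submitted for a second retrieval pass; the defaults of 10 documents and 10 terms mirror the numbers on the slide.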


SLIDE 18

Grid@CLEF: Approaches

cheshire: "We aim at understanding what happens when you try to separate the processing elements of IR systems, taking this as an opportunity to re-analyse and improve our system by finding a way to incorporate components of other IR systems"


SLIDE 19

Grid@CLEF 2009 Conclusions

  • The task raised interest, and participants addressed their own questions beyond the ones specific to the task
  • We (organizers & participants) have proven that it is feasible, even if challenging
  • Much work is waiting for us:
      • tuning of the protocol and framework
      • visualization and analysis issues

SLIDE 20

Visualization Issues

SLIDE 21

Grid@CLEF Advisory Committee

Martin Braschler, Chris Buckley, Fredric Gey, Kalervo Järvelin, Noriko Kando, Craig Macdonald, Prasenjit Majumder, Paul McNamee, Teruko Mitamura, Mandar Mitra, Stephen Robertson, Jacques Savoy