
Ad Hoc Track Overview: The TEL and Persian Tasks

CLEF 2009 Workshop, September 30th - October 2nd 2009, Κέρκυρα, Greece

Carol Peters, ISTI CNR, Italy (carol.peters@isti.cnr.it)
Nicola Ferro, University of Padua, Italy (ferro@dei.unipd.it)


SLIDE 1

Nicola Ferro

University of Padua, Italy

ferro@dei.unipd.it

CLEF 2009 Workshop September 30th - October 2nd 2009, Κέρκυρα, Greece

Carol Peters

ISTI CNR, Italy

carol.peters@isti.cnr.it

Ad Hoc Track Overview: The TEL and Persian Tasks

SLIDE 2

CLEF 2009 Workshop September 30th - October 2nd 2009, Κέρκυρα, Greece Nicola Ferro and Carol Peters

13 TEL + 4 Persian participants, from 11 countries

Participation

2

Ad hoc TEL participants
  Participant   Institution                           Country
  aeb           Athens Univ. Economics & Business     Greece
  celi          CELI Research srl                     Italy
  chemnitz      Chemnitz University of Technology     Germany
  cheshire      U.C. Berkeley                         United States
  cuza          Alexandru Ioan Cuza University        Romania
  hit           HIT2Lab, Heilongjiang Inst. Tech.     China
  inesc         Tech. Univ. Lisbon                    Portugal
  karlsruhe     Univ. Karlsruhe                       Germany
  opentext      OpenText Corp.                        Canada
  qazviniau     Islamic Azad Univ. Qazvin             Iran
  trinity       Trinity Coll. Dublin                  Ireland
  trinity-dcu   Trinity Coll. & DCU                   Ireland
  weimar        Bauhaus Univ. Weimar                  Germany

Ad hoc Persian participants
  Participant   Institution                           Country
  jhu-apl       Johns Hopkins Univ.                   United States
  opentext      OpenText Corp.                        Canada
  qazviniau     Islamic Azad Univ. Qazvin             Iran
  unine         U. Neuchatel - Informatics            Switzerland

SLIDE 3

Participation by Country

3

Germany 25.0%, Ireland 12.5%, United States 12.5%, Canada 6.3%, China 6.3%, Greece 6.3%, Iran 6.3%, Italy 6.3%, Portugal 6.3%, Romania 6.3%, Switzerland 6.3%

Europe: 69%; Asia: 12%; America: 19%

SLIDE 4

Submissions by Task and Language

4

Task               Chinese  English  Farsi  French  German  Greek  Italian  Total
TEL Mono English      –        46      –      –       –       –      –        46
TEL Mono French       –        –       –      35      –       –      –        35
TEL Mono German       –        –       –      –       35      –      –        35
TEL Bili English      3        –       –      15      19      5      1        43
TEL Bili French       –        12      –      –       12      –      2        26
TEL Bili German       1        12      –      12      –       –      1        26
Mono Persian          –        –       17     –       –       –      –        17
Bili Persian          –        3       –      –       –       –      –        3
Total                 4        73      17     62      66      5      4        231

Share of submissions by task: Mono EN 20%, Mono FR 15%, Mono DE 15%, Mono FA 7%, Bili EN 19%, Bili FR 11%, Bili DE 11%, Bili FA 1%

Share of submissions by topic language: English 32%, German 29%, French 27%, Farsi 7%, Chinese 2%, Greek 2%, Italian 2%
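The table's totals can be cross-checked mechanically. A small Python sketch (data transcribed from the table above; the dictionary layout is just an illustration):

```python
# Sanity check of the submissions table: per-task sums and the grand total
# must match the slide (46 + 35 + 35 + 43 + 26 + 26 + 17 + 3 = 231).
submissions = {
    "TEL Mono English": {"English": 46},
    "TEL Mono French":  {"French": 35},
    "TEL Mono German":  {"German": 35},
    "TEL Bili English": {"Chinese": 3, "French": 15, "German": 19, "Greek": 5, "Italian": 1},
    "TEL Bili French":  {"English": 12, "German": 12, "Italian": 2},
    "TEL Bili German":  {"Chinese": 1, "English": 12, "French": 12, "Italian": 1},
    "Mono Persian":     {"Farsi": 17},
    "Bili Persian":     {"English": 3},
}
row_totals = {task: sum(langs.values()) for task, langs in submissions.items()}
grand_total = sum(row_totals.values())
english_column = sum(langs.get("English", 0) for langs in submissions.values())
print(grand_total, english_column)  # 231 73
```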

SLIDE 5

“Is this article relevant to my information need?”

“Is the publication described by the bibliographic record relevant to my information need?”

TEL Task

The task is to search and retrieve relevant items from collections of library catalog cards, which are surrogates for the documents held by the libraries. Both monolingual and bilingual tasks have been offered. Not only are the data sparser and less rich than newspaper text, but the task also differs from a traditional ad hoc task.

5

SLIDE 6

TEL Collections

The collections have been provided by The European Library (http://www.theeuropeanlibrary.org/) and are catalog records harvested from Europe's national libraries.

English
  source: British Library (BL); size: 1,208,383,351 bytes; items: 1,000,100 records

French
  source: Bibliothèque Nationale de France (BnF); size: 1,362,122,091 bytes; items: 1,000,100 records

German
  source: Austrian National Library (ONB); size: 1,306,492,248 bytes; items: 869,353 records

6

SLIDE 8

TEL Collections: Distribution of the Languages

7

[Bar chart: distribution of record languages (English, French, German, Spanish, Russian, Italian, Latin, Esperanto, Other) in the TEL English (BL), TEL French (BnF), and TEL German (ONB) collections]

TEL Collections are multilingual

SLIDE 9

TEL Collections: Distribution of the Content

8

[Bar chart: coverage of the Title, Subject, Description, and Abstract fields in the TEL English (BL), TEL French (BnF), and TEL German (ONB) collections]

TEL Collections are sparse

SLIDE 10

TEL Topics

50 topics have been developed in English, German, and French. Additional translations into Chinese, Greek, and Italian have been provided upon request. Topics consist of title and description only; the narrative contained information relevant only to the assessors.

9

SLIDE 11

Persian Task

For the first time, a non-European language target collection is part of the CLEF corpus. Persian, also known as Farsi, is an Indo-European language spoken in Iran, Afghanistan, and Tajikistan; the Academy of Persian Language and Literature has declared that the name “Persian” is more appropriate than “Farsi”. Persian uses a challenging script, a modified version of the Arabic alphabet that elides short vowels and is written from right to left. Persian morphology is complex and makes extensive use of suffixes and compounding.

The task has been organized together with the Data Base Research Group (DBRG) of the University of Tehran, which provided the Hamshahri corpus. Both monolingual and bilingual tasks have been offered.

10
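One concrete consequence of the script, as an illustration not taken from the slides: Persian text often mixes visually identical Arabic and Persian codepoints, so indexers typically normalize them before matching query and document terms. A minimal sketch (only two letters mapped; a real normalizer covers more characters, digits, and diacritics):

```python
# Map Arabic codepoint variants onto their Persian equivalents so that the
# same word always yields the same index term. Only two letters are shown.
ARABIC_TO_PERSIAN = {
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

def normalize_persian(text: str) -> str:
    return "".join(ARABIC_TO_PERSIAN.get(ch, ch) for ch in text)

# The word "کتابی" typed with the Arabic kaf and yeh normalizes to the
# Persian spelling, so both variants become one index term.
word_arabic_forms = "\u0643\u062a\u0627\u0628\u064a"
print(normalize_persian(word_arabic_forms) == "\u06a9\u062a\u0627\u0628\u06cc")  # True
```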

SLIDE 12

Persian Collection

The Hamshahri corpus is a newspaper corpus with news articles from 1996 to 2002, made available by the DBRG of the University of Tehran (http://ece.ut.ac.ir/dbrg/hamshahri/). News articles are categorized both in Persian and English. It consists of:

size: 628,471,252 bytes; items: 166,774 documents

11

SLIDE 13

Persian Topics

50 topics have been developed in Persian and translated into English. Topics consist of title, description, and narrative. When translating topics, the attempt was to render them as naturally as possible; this was particularly difficult when going from Persian to English, as cultural differences had to be catered for.

12

SLIDE 14

Pool Statistics

13

Relevant documents per topic (average): en 50, fr 37, de 31, fa 89

TEL English Pool (DOI 10.2454/AH-TEL-ENGLISH-CLEF2009)
  Pool size: 26,190 pooled documents (23,663 not relevant, 2,527 relevant); 50 topics
  Pooled experiments: 31 out of 89 submitted (monolingual: 22 out of 43; bilingual: 9 out of 46)
  Assessors: 4

TEL French Pool (DOI 10.2454/AH-TEL-FRENCH-CLEF2009)
  Pool size: 21,971 pooled documents (20,118 not relevant, 1,853 relevant); 50 topics
  Pooled experiments: 21 out of 61 submitted (monolingual: 16 out of 35; bilingual: 5 out of 26)
  Assessors: 1

TEL German Pool (DOI 10.2454/AH-TEL-GERMAN-CLEF2009)
  Pool size: 25,541 pooled documents (23,882 not relevant, 1,559 relevant); 50 topics
  Pooled experiments: 21 out of 61 submitted (monolingual: 16 out of 35; bilingual: 5 out of 26)
  Assessors: 2

Persian Pool (DOI 10.2454/AH-PERSIAN-CLEF2009)
  Pool size: 23,536 pooled documents (19,072 not relevant, 4,464 relevant); 50 topics
  Pooled experiments: 20 out of 20 submitted (monolingual: 17 out of 17; bilingual: 3 out of 3)
  Assessors: 23
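Pools like those above are formed by the standard top-k pooling procedure: for each topic, the union of the top-ranked documents from every selected run is judged, and everything outside the pool is assumed not relevant. A minimal sketch, assuming a hypothetical run format of ranked document lists per topic:

```python
# Sketch of pool construction: for each topic, take the union of the top-k
# documents from every pooled run; only pooled documents go to the assessors.
# Hypothetical run format: {run_name: {topic_id: [doc_id ranked best-first]}}.
def build_pool(runs, depth=60):
    pool = {}
    for ranking_by_topic in runs.values():
        for topic, ranked_docs in ranking_by_topic.items():
            pool.setdefault(topic, set()).update(ranked_docs[:depth])
    return pool

runs = {
    "runA": {"10": ["d1", "d2", "d3"]},
    "runB": {"10": ["d2", "d4", "d5"]},
}
pool = build_pool(runs, depth=2)
print(sorted(pool["10"]))  # ['d1', 'd2', 'd4']
```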

SLIDE 15

TEL English

14

Bilingual is 99% of monolingual (was 91% in 2008)
SLIDE 16

TEL French

15

Bilingual is 94% of monolingual (was 57% in 2008)
SLIDE 17

TEL German

16

Bilingual is 90% of monolingual (was 53% in 2008)
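These percentages are the ratio of the best bilingual MAP to the best monolingual MAP per target language. Using the top scores from the result tables later in these slides, they can be reproduced:

```python
# Best MAP per target language, taken from the result tables in this deck.
best = {
    "English": {"mono": 0.4084, "bili": 0.4046},
    "French":  {"mono": 0.2720, "bili": 0.2557},
    "German":  {"mono": 0.2868, "bili": 0.2583},
}
ratios = {lang: s["bili"] / s["mono"] for lang, s in best.items()}
for lang, r in ratios.items():
    print(f"{lang}: bilingual is {r:.0%} of monolingual")
# English 99%, French 94%, German 90%
```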
SLIDE 18

TEL: Approaches

17

Monolingual Bilingual

Multilinguality: not considered. Structure: fields with different weights. Models: vector space + multinomial language models, based on Lucene. Stop words; N-grams; blind query expansion (Rocchio); data fusion with linear combination.
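The Rocchio-style blind (pseudo-relevance) expansion mentioned above can be sketched as follows; this is a toy illustration with raw term counts, not any participant's actual implementation (a real system would use tf-idf vectors):

```python
# Toy sketch of blind query expansion in the Rocchio style: the top-ranked
# documents are assumed relevant, and the query vector is moved towards
# their centroid before re-running the search.
from collections import Counter

def rocchio_expand(query_terms, top_docs, alpha=1.0, beta=0.75, n_terms=5):
    query = Counter(query_terms)
    centroid = Counter()
    for doc_terms in top_docs:          # pseudo-relevant documents
        centroid.update(doc_terms)
    weights = Counter()
    for term in set(query) | set(centroid):
        weights[term] = alpha * query[term] + beta * centroid[term] / len(top_docs)
    return [term for term, _ in weights.most_common(n_terms)]

top_docs = [["library", "catalog", "record"], ["library", "archive"]]
expanded = rocchio_expand(["library", "search"], top_docs)
print(expanded[0])  # library
```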

Monolingual
  Track    Rank  Participant   Experiment DOI                                                            MAP
  English  1st   inesc         10.2415/AH-TEL-MONO-EN-CLEF2009.INESC.RUN11                               40.84%
           2nd   chemnitz      10.2415/AH-TEL-MONO-EN-CLEF2009.CHEMNITZ.CUT 11 MONO MERGED EN 9 10       40.71%
           3rd   trinity       10.2415/AH-TEL-MONO-EN-CLEF2009.TRINITY.TCDENRUN2                         40.35%
           4th   hit           10.2415/AH-TEL-MONO-EN-CLEF2009.HIT.MTDD10T40                             39.36%
           5th   trinity-dcu   10.2415/AH-TEL-MONO-EN-CLEF2009.TRINITY-DCU.TCDDCUEN3                     36.96%
           Difference (1st vs 5th): 10.50%
  French   1st   karlsruhe     10.2415/AH-TEL-MONO-FR-CLEF2009.KARLSRUHE.INDEXBL                         27.20%
           2nd   chemnitz      10.2415/AH-TEL-MONO-FR-CLEF2009.CHEMNITZ.CUT 19 MONO MERGED FR 17 18      25.83%
           3rd   inesc         10.2415/AH-TEL-MONO-FR-CLEF2009.INESC.RUN12                               25.11%
           4th   opentext      10.2415/AH-TEL-MONO-FR-CLEF2009.OPENTEXT.OTFR09TDE                        24.12%
           5th   celi          10.2415/AH-TEL-MONO-FR-CLEF2009.CELI.CACAO FRBNF ML                       23.61%
           Difference (1st vs 5th): 15.20%
  German   1st   opentext      10.2415/AH-TEL-MONO-DE-CLEF2009.OPENTEXT.OTDE09TDE                        28.68%
           2nd   chemnitz      10.2415/AH-TEL-MONO-DE-CLEF2009.CHEMNITZ.CUT 3 MONO MERGED DE 1 2         27.89%
           3rd   inesc         10.2415/AH-TEL-MONO-DE-CLEF2009.INESC.RUN12                               27.85%
           4th   trinity-dcu   10.2415/AH-TEL-MONO-DE-CLEF2009.TRINITY-DCU.TCDDCUDE3                     26.86%
           5th   trinity       10.2415/AH-TEL-MONO-DE-CLEF2009.TRINITY.TCDDERUN1                         25.77%
           Difference (1st vs 5th): 11.30%

Bilingual
  Track    Rank  Participant   Experiment DOI                                                            MAP
  English  1st   chemnitz      10.2415/AH-TEL-BILI-X2EN-CLEF2009.CHEMNITZ.CUT 13 BILI MERGED DE2EN 9 10  40.46%
           2nd   hit           10.2415/AH-TEL-BILI-X2EN-CLEF2009.HIT.XTDD10T40                           35.27%
           3rd   trinity       10.2415/AH-TEL-BILI-X2EN-CLEF2009.TRINITY.TCDDEENRUN3                     35.05%
           4th   trinity-dcu   10.2415/AH-TEL-BILI-X2EN-CLEF2009.TRINITY-DCU.TCDDCUDEEN1                 33.33%
           5th   karlsruhe     10.2415/AH-TEL-BILI-X2EN-CLEF2009.KARLSRUHE.DE INDEXBL                    32.70%
           Difference (1st vs 5th): 23.73%
  French   1st   chemnitz      10.2415/AH-TEL-BILI-X2FR-CLEF2009.CHEMNITZ.CUT 24 BILI EN2FR MERGED LANG SPEC REF CUT 17  25.57%
           2nd   karlsruhe     10.2415/AH-TEL-BILI-X2FR-CLEF2009.KARLSRUHE.EN INDEXBL                    24.62%
           3rd   cheshire      10.2415/AH-TEL-BILI-X2FR-CLEF2009.CHESHIRE.BIENFRT2FB                     16.77%
           4th   trinity       10.2415/AH-TEL-BILI-X2FR-CLEF2009.TRINITY.TCDDEFRRUN2                     16.33%
           5th   weimar        10.2415/AH-TEL-BILI-X2FR-CLEF2009.WEIMAR.CLESA169283ENINFR                14.51%
           Difference (1st vs 5th): 69.67%
  German   1st   chemnitz      10.2415/AH-TEL-BILI-X2DE-CLEF2009.CHEMNITZ.CUT 5 BILI MERGED EN2DE 1 2    25.83%
           2nd   trinity       10.2415/AH-TEL-BILI-X2DE-CLEF2009.TRINITY.TCDENDERUN3                     19.35%
           3rd   karlsruhe     10.2415/AH-TEL-BILI-X2DE-CLEF2009.KARLSRUHE.EN INDEXBL                    16.46%
           4th   weimar        10.2415/AH-TEL-BILI-X2DE-CLEF2009.WEIMAR.COMBINEDFRINDE                   15.75%
           5th   cheshire      10.2415/AH-TEL-BILI-X2DE-CLEF2009.CHESHIRE.BIENDET2FBX                    11.50%
           Difference (1st vs 5th): 124.60%
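The experiments in the table above are ranked by MAP. A minimal, self-contained sketch of how mean average precision is computed from a run and a set of relevance judgments (toy data, not from the pools described earlier):

```python
# Minimal sketch of Mean Average Precision (MAP), the measure used to rank
# the experiments in these tables.
def average_precision(ranked_docs, relevant):
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank          # precision at each hit
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    # run: {topic: ranked doc list}; qrels: {topic: set of relevant docs}
    return sum(average_precision(run[t], qrels[t]) for t in qrels) / len(qrels)

run = {"10": ["d1", "d2", "d3"], "11": ["d4", "d5"]}
qrels = {"10": {"d1", "d3"}, "11": {"d5"}}
print(round(mean_average_precision(run, qrels), 4))  # 0.6667
```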


SLIDE 24

TEL: Approaches

Monolingual and bilingual approaches, one entry per group:

• Multilinguality: not considered; structure: fields with different weights; models: vector space + multinomial language models based on Lucene; stop words; n-grams; blind query expansion (Rocchio); data fusion with linear combination

• Multilinguality: separate indexes for each language and multiple translations of the query; structure: not considered; models: vector space + divergence from randomness based on Lucene + Terrier; Google Translate; stop words; Snowball for English and French, n-grams for German; blind query expansion (top terms from top docs); data fusion with Z-score (more at the talk)

• Multilinguality: not considered; structure: different field sets used for different collections; models: language models based on Lemur; stop words; stemmers (Porter and Lucene default ones); Latent Dirichlet Allocation for re-ranking; Google Translate

• Multilinguality: not considered; structure: subset of the fields used; models: language models; stop words; blind query expansion; Google Translate

• Multilinguality: not considered; structure: different field sets experimented with; models: language models and BM25 based on Lemur; stop words; stemmers (Snowball); blind query expansion; document expansion with subject headings (DDC) using EVM; Google Translate and MaTrEx (statistical MT)

• Multilinguality: separate indexes for each language at field level; structure: not considered; models: vector space + divergence from randomness based on Terrier; stop words; stemmers (Snowball); Cross-Language Explicit Semantic Analysis (CL-ESA) exploiting Wikipedia; data fusion with linear combination and with Support Vector Machines (SVM) (more at the talk)

• Multilinguality: not considered; structure: not considered; models: probabilistic model (Okapi-like); stop words (few terms); lexicon-based inflectional stemmer and decompounding for German, Snowball stemmers, n-gram stemmers
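One of the bilingual approaches above combines runs via data fusion with Z-score normalization. A minimal sketch of that idea follows; it is illustrative only (the group's actual weighting and implementation details may differ):

```python
from statistics import mean, pstdev

def zscore_fuse(runs):
    """Fuse retrieval runs: z-score-normalize each run's document
    scores, then sum the normalized scores per document."""
    fused = {}
    for run in runs:                          # run: {doc_id: score}
        mu = mean(run.values())
        sigma = pstdev(run.values()) or 1.0   # guard: constant scores
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + (score - mu) / sigma
    return sorted(fused, key=fused.get, reverse=True)  # best first

# toy example: two runs scoring the same three documents
run_a = {"d1": 10.0, "d2": 8.0, "d3": 2.0}
run_b = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
print(zscore_fuse([run_a, run_b]))  # → ['d2', 'd1', 'd3']
```

Normalizing per run before summing keeps one system's large raw scores from dominating the fused ranking.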

TEL Monolingual results (Rank, Participant, MAP, Experiment DOI):

English:
  1st  inesc        40.84%  10.2415/AH-TEL-MONO-EN-CLEF2009.INESC.RUN11
  2nd  chemnitz     40.71%  10.2415/AH-TEL-MONO-EN-CLEF2009.CHEMNITZ.CUT 11 MONO MERGED EN 9 10
  3rd  trinity      40.35%  10.2415/AH-TEL-MONO-EN-CLEF2009.TRINITY.TCDENRUN2
  4th  hit          39.36%  10.2415/AH-TEL-MONO-EN-CLEF2009.HIT.MTDD10T40
  5th  trinity-dcu  36.96%  10.2415/AH-TEL-MONO-EN-CLEF2009.TRINITY-DCU.TCDDCUEN3
  Difference (1st vs 5th): 10.50%

French:
  1st  karlsruhe    27.20%  10.2415/AH-TEL-MONO-FR-CLEF2009.KARLSRUHE.INDEXBL
  2nd  chemnitz     25.83%  10.2415/AH-TEL-MONO-FR-CLEF2009.CHEMNITZ.CUT 19 MONO MERGED FR 17 18
  3rd  inesc        25.11%  10.2415/AH-TEL-MONO-FR-CLEF2009.INESC.RUN12
  4th  opentext     24.12%  10.2415/AH-TEL-MONO-FR-CLEF2009.OPENTEXT.OTFR09TDE
  5th  celi         23.61%  10.2415/AH-TEL-MONO-FR-CLEF2009.CELI.CACAO FRBNF ML
  Difference (1st vs 5th): 15.20%

German:
  1st  opentext     28.68%  10.2415/AH-TEL-MONO-DE-CLEF2009.OPENTEXT.OTDE09TDE
  2nd  chemnitz     27.89%  10.2415/AH-TEL-MONO-DE-CLEF2009.CHEMNITZ.CUT 3 MONO MERGED DE 1 2
  3rd  inesc        27.85%  10.2415/AH-TEL-MONO-DE-CLEF2009.INESC.RUN12
  4th  trinity-dcu  26.86%  10.2415/AH-TEL-MONO-DE-CLEF2009.TRINITY-DCU.TCDDCUDE3
  5th  trinity      25.77%  10.2415/AH-TEL-MONO-DE-CLEF2009.TRINITY.TCDDERUN1
  Difference (1st vs 5th): 11.30%
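All figures in these tables are Mean Average Precision (MAP). As a reminder of what the metric measures, here is a minimal sketch of the standard definition (illustrative only; the official campaign figures come from the standard evaluation tooling):

```python
def average_precision(ranked, relevant):
    """AP for one topic: mean of precision@k over the ranks k at which
    a relevant document is retrieved, normalized by |relevant|."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(topics):
    """MAP: mean of AP over all topics; topics = [(ranking, relevant_set)]."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)

# relevant docs found at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(round(average_precision(["d1", "d2", "d3"], {"d1", "d3"}), 4))  # → 0.8333
```

Because AP rewards relevant documents ranked early, the percentage differences quoted between 1st and 5th systems reflect both how many relevant documents were found and how highly they were ranked.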

slide-25
SLIDE 25

TEL: Approaches (continued)

• Multilinguality: multiple translations of the query; structure: subset of the fields used; models: vector space based on Lucene; lemmatization and named entity recognition; translation disambiguation using a corpus-based word space model via random indexing; internal translation resources and online dictionaries (Ergane)

slide-26
SLIDE 26

TEL: Approaches (continued)

• Multilinguality: not considered; structure: subset of the fields used; models: logistic regression based on Cheshire II; LEC Power Translator; stop words; stemmer (Snowball); blind query expansion (probabilistic relevance feedback, top 10 terms from top 10 docs)
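Several of the groups above use blind query expansion, e.g. taking the top 10 terms from the top 10 retrieved documents. A minimal sketch of the idea, with illustrative helper names (`blind_expand` is not any group's actual code, and real systems weight expansion terms rather than counting raw frequencies):

```python
from collections import Counter

def blind_expand(query_terms, ranked_docs, top_docs=10, top_terms=10):
    """Blind (pseudo) relevance feedback: assume the top-ranked
    documents are relevant and add their most frequent terms."""
    pool = Counter()
    for doc in ranked_docs[:top_docs]:       # each doc is a token list
        pool.update(doc)
    for term in query_terms:                 # never re-add query terms
        pool.pop(term, None)
    expansion = [t for t, _ in pool.most_common(top_terms)]
    return list(query_terms) + expansion

docs = [["library", "catalogue", "record"],
        ["library", "metadata", "record"]]
print(blind_expand(["library"], docs, top_docs=2, top_terms=2))
```

The expanded query is then re-run against the index; the hope is that terms frequent in the initially top-ranked documents capture the topic's vocabulary.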

slide-27
SLIDE 27

TEL: Approaches (continued)

• Multilinguality: multiple translations of the query; structure: not considered; models: vector space; Google Translate; stemmers (Snowball); Cross-Language Explicit Semantic Analysis (CL-ESA) relying on Wikipedia (more at the talk)

slide-28
SLIDE 28

Persian

Bilingual is ~5% of monolingual (was 92% in 2008)
slide-29
SLIDE 29

Persian: Approaches

• Models: language models with smoothing between term frequencies and corpus frequencies; stop words; n-grams and skip-grams; blind query expansion (top terms from the 20 top docs)
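The character n-gram and skip-gram tokenization mentioned above can be sketched as follows. The exact skip-gram definition used here (delete one interior character from each (n+1)-gram) is an assumption for illustration, not necessarily the group's formulation:

```python
def char_ngrams(word, n=4):
    """Character n-grams over a word padded with boundary markers."""
    padded = f"_{word}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def skipgrams(word, n=4):
    """Skip-grams (assumed form): take (n+1)-grams and delete one
    interior character, so related surface forms share index terms."""
    grams = set()
    for gram in char_ngrams(word, n + 1):
        for skip in range(1, n):             # drop one interior position
            grams.add(gram[:skip] + gram[skip + 1:])
    return sorted(grams)

print(char_ngrams("book"))       # → ['_boo', 'book', 'ook_']
print(skipgrams("book"))
```

Such sub-word indexing is attractive for Persian, where high-quality stemmers and decompounders are scarcer than for Western European languages.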

Persian results (Rank, Participant, MAP, Experiment DOI):

Monolingual:
  1st  jhu-apl    49.38%  10.2415/AH-PERSIAN-MONO-FA-CLEF2009.JHU-APL.JHUFASK41R400TD
  2nd  unine      49.37%  10.2415/AH-PERSIAN-MONO-FA-CLEF2009.UNINE.UNINEPE4
  3rd  opentext   39.53%  10.2415/AH-PERSIAN-MONO-FA-CLEF2009.OPENTEXT.OTFA09TDE
  4th  qazviniau  37.62%  10.2415/AH-PERSIAN-MONO-FA-CLEF2009.QAZVINIAU.IAUPERFA3
  Difference (1st vs 4th): 31.25%

Bilingual:
  1st  qazviniau   2.72%  10.2415/AH-PERSIAN-BILI-X2FA-CLEF2009.QAZVINIAU.IAUPEREN3
  (no other bilingual runs)

slide-30
SLIDE 30

Persian: Approaches (continued)

• Models: vector space + BM25 + divergence from randomness; stemmers: plurals, light, regular expression; blind query expansion

slide-31
SLIDE 31

Persian: Approaches (continued)

• Models: probabilistic model (Okapi-like); stop words (few terms); Savoy's Persian stemmer

slide-32
SLIDE 32

Persian: Approaches (continued)

• Models: language models based on Indri; automatic structured query construction via query Wikification: weighted list of Wiki concepts + anchors; Perstemmer

slide-33
SLIDE 33

CLEF 2008 Conclusions: "We need to do more"

Encouraging participation in both tasks and interesting results were achieved. The experience gained this year will be very useful to further tune the tasks.

TEL Task:
- traditional IR approaches seem to work well and achieve good results
- only two groups have exploited the inherent multilinguality of the data
- almost no group has exploited the semi-structured nature of the data or used the subject headings

slide-34
SLIDE 34

CLEF 2009 Conclusions

Last year's objectives were achieved:
- the multilinguality of the collections has been investigated
- the structure of the collections has been exploited

Coverage: almost all the main IR models, as well as their implementations, have been experimented with.

Impressive bilingual-to-monolingual performance: it seems to be "stable" over the tested collections. What about Google Translate?