
Ad Hoc Track Overview: The TEL and Persian Tasks (Carol Peters)



  1. Ad Hoc Track Overview: The TEL and Persian Tasks
     CLEF 2009 Workshop, September 30th - October 2nd 2009, Κέρκυρα (Corfu), Greece
     - Carol Peters, ISTI CNR, Italy, carol.peters@isti.cnr.it
     - Nicola Ferro, University of Padua, Italy, ferro@dei.unipd.it

  2. Participation
     13 + 4 participants (TEL + Persian) from 11 countries.

     Ad hoc TEL participants

     | Participant | Institution | Country |
     |---|---|---|
     | aeb | Athens Univ. Economics & Business | Greece |
     | celi | CELI Research srl | Italy |
     | chemnitz | Chemnitz University of Technology | Germany |
     | cheshire | U.C. Berkeley | United States |
     | cuza | Alexandru Ioan Cuza University | Romania |
     | hit | HIT2Lab, Heilongjiang Inst. Tech. | China |
     | inesc | Tech. Univ. Lisbon | Portugal |
     | karlsruhe | Univ. Karlsruhe | Germany |
     | opentext | OpenText Corp. | Canada |
     | qazviniau | Islamic Azad Univ. Qazvin | Iran |
     | trinity | Trinity Coll. Dublin | Ireland |
     | trinity-dcu | Trinity Coll. & DCU | Ireland |
     | weimar | Bauhaus Univ. Weimar | Germany |

     Ad hoc Persian participants

     | Participant | Institution | Country |
     |---|---|---|
     | jhu-apl | Johns Hopkins Univ. | United States |
     | opentext | OpenText Corp. | Canada |
     | qazviniau | Islamic Azad Univ. Qazvin | Iran |
     | unine | U. Neuchatel-Informatics | Switzerland |

  3. Participation by Country
     [Pie chart: share of participants per country, grouped by region]
     - Europe (69%): Germany 25.0%, Ireland 12.5%, Greece 6.3%, Italy 6.3%, Portugal 6.3%, Romania 6.3%, Switzerland 6.3%
     - America (19%): United States 12.5%, Canada 6.3%
     - Asia (12%): China 6.3%, Iran 6.3%

  4. Submissions by Task and Language
     Rows are tasks; columns are topic (source) languages.

     | Task | Chinese | English | Farsi | French | German | Greek | Italian | Total |
     |---|---|---|---|---|---|---|---|---|
     | TEL Mono English | – | 46 | – | – | – | – | – | 46 |
     | TEL Mono French | – | – | – | 35 | – | – | – | 35 |
     | TEL Mono German | – | – | – | – | 35 | – | – | 35 |
     | TEL Bili English | 3 | 0 | 0 | 15 | 19 | 5 | 1 | 43 |
     | TEL Bili French | 0 | 12 | 0 | 0 | 12 | 0 | 2 | 26 |
     | TEL Bili German | 1 | 12 | 0 | 12 | 0 | 0 | 1 | 26 |
     | Mono Persian | – | – | 17 | – | – | – | – | 17 |
     | Bili Persian | – | 3 | – | – | – | – | – | 3 |
     | Total | 4 | 73 | 17 | 62 | 66 | 5 | 4 | 231 |

     [Pie chart, by language: English 32%, German 29%, French 27%, Farsi 7%, Chinese 2%, Greek 2%, Italian 2%]
     [Pie chart, by task: Mono EN 20%, Bili EN 19%, Mono FR 15%, Mono DE 15%, Bili FR 11%, Bili DE 11%, Mono FA 7%, Bili FA 1%]
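
The percentage shares shown in the two pie charts can be recomputed from the submission counts. The following is a minimal sketch that assumes the shares are simply each row or column total divided by the grand total of 231 and rounded to whole percent; the names `SUBMISSIONS`, `totals_by_task`, `totals_by_language` and `shares` are illustrative and do not come from the slides.

```python
# Submission counts transcribed from the "Submissions by Task and Language" table.
# None marks cells shown as "–" on the slide.
LANGUAGES = ["Chinese", "English", "Farsi", "French", "German", "Greek", "Italian"]

SUBMISSIONS = {
    "TEL Mono English": [None, 46, None, None, None, None, None],
    "TEL Mono French":  [None, None, None, 35, None, None, None],
    "TEL Mono German":  [None, None, None, None, 35, None, None],
    "TEL Bili English": [3, 0, 0, 15, 19, 5, 1],
    "TEL Bili French":  [0, 12, 0, 0, 12, 0, 2],
    "TEL Bili German":  [1, 12, 0, 12, 0, 0, 1],
    "Mono Persian":     [None, None, 17, None, None, None, None],
    "Bili Persian":     [None, 3, None, None, None, None, None],
}


def totals_by_task(table):
    """Sum each row, treating missing cells as zero."""
    return {task: sum(n or 0 for n in row) for task, row in table.items()}


def totals_by_language(table, languages):
    """Sum each column, treating missing cells as zero."""
    return {
        lang: sum(row[i] or 0 for row in table.values())
        for i, lang in enumerate(languages)
    }


def shares(totals):
    """Convert totals to whole-percent shares of the grand total."""
    grand_total = sum(totals.values())
    return {key: round(100 * value / grand_total) for key, value in totals.items()}


if __name__ == "__main__":
    print(shares(totals_by_task(SUBMISSIONS)))
    print(shares(totals_by_language(SUBMISSIONS, LANGUAGES)))
```

Under this assumption the recomputed shares agree with the chart labels, for example 27% for French topics and 20% for the TEL monolingual English task.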

  5. TEL Task
     - The task is to search and retrieve relevant items from collections of library catalog cards, which are surrogates for documents held by libraries.
     - Both monolingual and bilingual tasks have been offered.
     - Not only are the data very sparse and less rich than newspapers, but the task is also different from a traditional ad hoc task:
       - Traditional ad hoc: "Is this article relevant to my information need?"
       - TEL: "Is the publication described by the bibliographic record relevant to my information need?"

  6. TEL Collections
     The collections have been provided by The European Library (http://www.theeuropeanlibrary.org/) and are catalog records harvested from Europe's national libraries.

     | Collection | Source | Size (bytes) | Records |
     |---|---|---|---|
     | English | British Library (BL) | 1,208,383,351 | 1,000,100 |
     | French | Bibliothèque Nationale de France (BnF) | 1,362,122,091 | 1,000,100 |
     | German | Austrian National Library (ONB) | 1,306,492,248 | 869,353 |


  8. TEL Collections: Distribution of the Languages
     TEL collections are multilingual.
     [Bar chart: share of records per language (English, French, German, Spanish, Russian, Italian, Latin, Esperanto, Other) for TEL English (BL), TEL French (BnF) and TEL German (ONB); vertical axis up to 70%]

  9. TEL Collections: Distribution of the Content
     TEL collections are sparse.
     [Bar chart: presence of the Title, Subject, Description and Abstract fields for TEL English (BL), TEL French (BnF) and TEL German (ONB); vertical axis from 0% to 400%]

  10. TEL Topics
     - 50 topics have been developed in English, German, and French.
     - Additional translations to Chinese, Greek, and Italian have been provided upon request.
     - Topics consist of title and description only; the narrative contained information relevant only to assessors.

  11. Persian Task
     - For the first time, a non-European language target collection is part of the CLEF corpus.
     - Persian is an Indo-European language spoken in Iran, Afghanistan and Tajikistan. It is also known as Farsi, but the Academy of Persian Language and Literature has declared that the name "Persian" is more appropriate than "Farsi".
     - Persian uses a challenging script: a modified version of the Arabic alphabet, with elision of short vowels, written from right to left.
     - Persian morphology is complex and makes extensive use of suffixes and compounding.
     - The task has been organized together with the Data Base Research Group (DBRG) of the University of Tehran, which provided the Hamshahri corpus.
     - Both monolingual and bilingual tasks have been offered.

  12. Persian Collection
     - The Hamshahri corpus is a newspaper corpus with news articles from 1996 to 2002, made available by the DBRG of the University of Tehran (http://ece.ut.ac.ir/dbrg/hamshahri/).
     - News articles are categorized both in Persian and English.
     - Size: 628,471,252 bytes
     - Items: 166,774 documents

  13. Persian Topics
     - 50 topics have been developed in Persian and translated to English.
     - Topics consist of title, description, and narrative.
     - Topics were translated so as to read as naturally as possible. This was particularly difficult when going from Persian to English, as cultural differences had to be catered for.

  14. Pool Statistics

     | | TEL English | TEL French | TEL German | Persian |
     |---|---|---|---|---|
     | DOI | 10.2454/AH-TEL-ENGLISH-CLEF2009 | 10.2454/AH-TEL-FRENCH-CLEF2009 | 10.2454/AH-TEL-GERMAN-CLEF2009 | 10.2454/AH-PERSIAN-CLEF2009 |
     | Pooled documents | 26,190 | 21,971 | 25,541 | 23,536 |
     | Not relevant documents | 23,663 | 20,118 | 23,882 | 19,072 |
     | Relevant documents | 2,527 | 1,853 | 1,559 | 4,464 |
     | Docs/topic | 50 | 37 | 31 | 89 |
     | Topics | 50 | 50 | 50 | 50 |
     | Pooled experiments (of submitted) | 31 of 89 | 21 of 61 | 21 of 61 | 20 of 20 |
     | Pooled monolingual experiments | 22 of 43 | 16 of 35 | 16 of 35 | 17 of 17 |
     | Pooled bilingual experiments | 9 of 46 | 5 of 26 | 5 of 26 | 3 of 3 |
     | Assessors | 4 | 1 | 2 | 23 |
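
The pool figures lend themselves to a quick arithmetic check. The sketch below is an illustration only: it assumes that the "docs/topic" values are the relevant documents divided by the 50 topics and rounded down (the slide does not say so explicitly), and the names `POOLS` and `pool_summary` are made up for this example.

```python
# Pool figures transcribed from the "Pool Statistics" slide.
POOLS = {
    "TEL English": {"pooled": 26190, "not_relevant": 23663, "relevant": 2527, "topics": 50},
    "TEL French":  {"pooled": 21971, "not_relevant": 20118, "relevant": 1853, "topics": 50},
    "TEL German":  {"pooled": 25541, "not_relevant": 23882, "relevant": 1559, "topics": 50},
    "Persian":     {"pooled": 23536, "not_relevant": 19072, "relevant": 4464, "topics": 50},
}


def pool_summary(pool):
    """Derive simple statistics from one pool's counts."""
    return {
        # Assumption: "docs/topic" = relevant documents per topic, rounded down.
        "relevant_per_topic": pool["relevant"] // pool["topics"],
        # Share of the pooled documents that were judged relevant, in percent.
        "relevant_share_pct": 100 * pool["relevant"] / pool["pooled"],
        # Consistency check: do the two judged counts add up to the pool size?
        "counts_add_up": pool["relevant"] + pool["not_relevant"] == pool["pooled"],
    }


if __name__ == "__main__":
    for name, pool in POOLS.items():
        summary = pool_summary(pool)
        print(
            f"{name}: {summary['relevant_per_topic']} relevant docs/topic, "
            f"{summary['relevant_share_pct']:.1f}% of pool relevant, "
            f"counts add up: {summary['counts_add_up']}"
        )
```

With the numbers as transcribed here, the derived per-topic values match the slide (50, 37, 31, 89). The consistency check fails for TEL German, where the relevant and not relevant counts sum to 25,441 rather than 25,541, so one of those three figures may have been garbled somewhere between the original deck and this transcript.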

  15. TEL English
     Bilingual is 99% of monolingual (was 91% in 2008).

  16. TEL French
     Bilingual is 94% of monolingual (was 57% in 2008).

  17. TEL German
     Bilingual is 90% of monolingual (was 53% in 2008).
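
The "bilingual is N% of monolingual" figures on the last three slides read as a ratio of two effectiveness scores. The sketch below shows that calculation under the assumption that a single score (for instance mean average precision) is compared between a bilingual and a monolingual run; the slides do not state which measure or which runs were used, and the function name and the input values are hypothetical.

```python
def bilingual_vs_monolingual(bilingual_score: float, monolingual_score: float) -> float:
    """Return the bilingual score as a percentage of the monolingual score."""
    if monolingual_score <= 0:
        raise ValueError("monolingual_score must be positive")
    return 100.0 * bilingual_score / monolingual_score


# Hypothetical scores for illustration only; they are NOT taken from the slides.
example = bilingual_vs_monolingual(0.36, 0.40)
print(f"{example:.0f}%")  # prints "90%"
```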
