SLIDE 1

The Importance of Evaluation for Multilingual Information Retrieval

Carol Peters ISTI-CNR, Pisa, Italy

FIRE 2011 IIT Bombay, 2-4 December 2011

SLIDE 2

From FIRE 2008 to FIRE 2010

FIRE 2008

  • CLEF: Objectives and First Results

FIRE 2010

  • 10 Years of CLEF: An Assessment
  • What we’ve done
  • What we’ve learned
  • What the next steps should be

FIRE 2011

  • Exploiting the Results for MLIR System Building

SLIDE 3

MLIR/CLIR System Evaluation

In IR the role of an evaluation campaign is to:

  • Identify priority areas for research
    • evaluation permits hypotheses to be validated and progress to be assessed
  • Support system development and testing
    • evaluation saves developers time and money

  • 1997 – First MLIR/CLIR system evaluation campaigns in the US and Japan: TREC and NTCIR
  • 2000 – MLIR/CLIR evaluation in Europe: CLEF (an extension of the CLIR track at TREC)
  • 2008 – FIRE: MLIR/CLIR evaluation for Indian languages

SLIDE 4

Results

These evaluation initiatives:

  • Promote research
  • Encourage creation of multi-disciplinary communities
  • Produce vast amounts of valuable scientific data
  • Favour understanding of issues involved in successful system development

SLIDE 6

Outline

  • The Need for MLIR/CLIR?
  • What are the Challenges?
  • What is the Contribution of Evaluation?
  • The Example of CLEF

SLIDE 7

MLIR in the Information Society

  • The Web is an important platform for knowledge dissemination and acquisition
  • User information needs are increasingly varied
    • from primarily academic use to widespread commercial, leisure, educational, entertainment and other uses
  • Content is available in many languages and non-English content is growing rapidly
  • Information providers and seekers should have equal opportunities
  • Preservation of national languages

SLIDE 8

The Need for Multilingual Search

http://www.internetworldstats.com/stats.htm

SLIDE 9

Countries with most Internet Users

Country        Population      Internet Users 2000   Internet Users 2011   Penetration (% of Pop.)
China          1,336,718,015   22,500,000            485,000,000           36.3%
United States  313,232,044     95,354,000            245,000,000           78.2%
India          1,189,172,906   5,000,000             100,000,000           8.4%
Japan          126,475,664     47,080,000            99,182,000            78.4%
Brazil         203,429,773     5,000,000             75,982,000            37.4%
Germany        81,471,834      24,000,000            65,125,000            79.9%
Russia         138,739,892     3,100,000             59,700,000            43.0%
UK             62,698,362      15,400,000            51,442,100            82.0%
France         65,102,719      8,500,000             45,262,000            69.5%
Nigeria        155,215,573     20,000                43,982,200            28.3%

Source: http://www.internetworldstats.com/top20.htm
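
The Penetration column is the 2011 user count divided by the population. This short Python check uses only the population and 2011 figures copied from the table. It shows that, under that assumed definition, each published percentage follows once rounded to one decimal. The names ROWS and penetration are illustrative.

```python
# (country, population, internet_users_2011, published_penetration_pct),
# copied from the "Countries with most Internet Users" table.
ROWS = [
    ("China", 1_336_718_015, 485_000_000, 36.3),
    ("United States", 313_232_044, 245_000_000, 78.2),
    ("India", 1_189_172_906, 100_000_000, 8.4),
    ("Japan", 126_475_664, 99_182_000, 78.4),
    ("Brazil", 203_429_773, 75_982_000, 37.4),
    ("Germany", 81_471_834, 65_125_000, 79.9),
    ("Russia", 138_739_892, 59_700_000, 43.0),
    ("UK", 62_698_362, 51_442_100, 82.0),
    ("France", 65_102_719, 45_262_000, 69.5),
    ("Nigeria", 155_215_573, 43_982_200, 28.3),
]


def penetration(users: int, population: int) -> float:
    """Internet users as a percentage of the population, rounded to one decimal."""
    return round(100 * users / population, 1)


for country, population, users_2011, published in ROWS:
    computed = penetration(users_2011, population)
    print(f"{country:<14} computed {computed:5.1f}%  published {published:5.1f}%")
```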

SLIDE 11

MLIR-Related Research

  • Concerns the storage, access, retrieval and presentation of digital information in any of the world's languages
  • Main areas of interest:
    • enabling technology (character encoding, scripts, internationalisation, localisation)
    • multiple language access, browsing, retrieval, display
    • crossing the language boundary (filtering, merging, ranking, selecting, presenting results)

SLIDE 12

The Terminology

  • Multilingual Information Access (MLIA)
    • Accessing, querying and retrieving information from collections in any language (covering basic enabling techniques and including MLIR and CLIR)
  • Multilingual Information Retrieval (MLIR)
    • Information retrieval in multiple languages (includes CLIR)
  • Cross-Language Information Retrieval (CLIR)
    • Querying multilingual collections in one language in order to retrieve documents in other languages

SLIDE 13

The Grand Challenge

Fully multilingual, multimodal IR systems:

  • capable of processing a query in any medium and any language
  • finding relevant information from a multilingual multimedia collection containing documents in any language and form
  • and presenting it in the style most likely to be useful to the user

Oard & Hull, AAAI Spring Symposium, Stanford 1997

SLIDE 14

MLIR/CLIR System Development is Complex

  • There are 6,800 known languages spoken in 200 countries
  • ca 2,250 have writing systems (the others are only spoken)
  • Just 300 have some kind of language processing tools

MLIR/CLIR System development involves integrating IR techniques with Language Processing tools and Language Transfer mechanisms

SLIDE 15

MLIR/CLIR System Development is Complex

  • Multilingual Portals (Localization)
    • how many languages / how many levels should be multilingual / how to handle updates / linguistic and culture-dependent issues
  • Monolingual Search for Multiple Languages
    • encoding and representation issues / language identification / indexing issues (stop words, stemmers, morphological analysers, named entity recognition, ...)
  • Cross-Language Search
    • translation resources (lexicons, corpora, MT systems)
  • Presentation of Results
    • in a form interpretable and exploitable by the user

SLIDE 16

Main Challenges

  • Understanding Search in the Multilingual Context (language & culture)
  • Globalisation (internationalisation & localisation)
  • MLIR/CLIR System Development
  • Language processing tools
  • Best retrieval mechanisms (indexing, matching, merging)
  • Best translation resources
  • From text to multimodal retrieval
  • Providing effective user support
  • Going from Research to Practice

SLIDE 19

Building a CLIR System

  • Pre-process & index both documents and queries, generally using language-dependent techniques (tokenisation, stopwords, stemming, morphological analysis, decompounding, etc.)
  • Translate queries or documents (or both)
  • Translation resources:
    • Machine Translation (MT)
    • Parallel/comparable corpora
    • Bilingual dictionaries
    • Multilingual thesauri
    • Conceptual interlingua
  • Find relevant documents in target collection(s) & present results
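
These steps can be illustrated with a deliberately small Python sketch of dictionary-based query translation, which uses one of the translation resources listed. The toy EN→DE dictionary, the stopword list and the function names are hypothetical. Keeping every candidate translation is one simple strategy for ambiguous terms. Passing unknown terms through unchanged is a common fallback for out-of-vocabulary proper names.

```python
# Hypothetical toy bilingual dictionary (EN -> DE); real systems use large lexicons.
TOY_EN_DE: dict[str, list[str]] = {
    "economy": ["wirtschaft", "ökonomie"],
    "crisis": ["krise"],
    "bank": ["bank", "ufer"],  # ambiguous: financial institution vs river bank
}

# Hypothetical minimal English stopword list.
STOPWORDS_EN = {"the", "of", "in", "a", "and"}


def preprocess(text: str) -> list[str]:
    """Lower-case, split on whitespace and drop English stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS_EN]


def translate_query(query: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Replace each source term by all of its dictionary translations.

    Terms without an entry (out-of-vocabulary, often proper names) are
    passed through unchanged so they can still match in the target language."""
    target_terms: list[str] = []
    for term in preprocess(query):
        target_terms.extend(dictionary.get(term, [term]))
    return target_terms


print(translate_query("The crisis of the economy in Europe", TOY_EN_DE))
# ['krise', 'wirtschaft', 'ökonomie', 'europe']
```

The translated term list would then be matched against the target-language index, which corresponds to the last step on the slide.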

SLIDE 20

Main CLIR Difficulties (I)

  • Language identification
  • Morphology: inflection, derivation, compounding, …
  • Out-of-vocabulary (OOV) terms, e.g. proper names, terminology
  • Multi-word concepts, e.g. phrases and idioms
  • Ambiguity, e.g. polysemy
  • Handling many languages: L1 -> Ln
  • Merging results from different sources / media
  • Presenting the results in a useful fashion
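
Language identification, the first difficulty listed, is often handled by comparing character n-gram statistics. The Python sketch below scores a text against tiny per-language trigram profiles and picks the best overlap. The sample sentences, the choice of trigrams and the overlap score are assumptions for illustration, not a recommended production method.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of the lower-cased text, padded with spaces
    so that word beginnings and endings form n-grams of their own."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Hypothetical, deliberately tiny training snippets; a usable identifier
# needs far more text per language.
SAMPLES = {
    "en": "the economic situation in the country and the world",
    "de": "die wirtschaftliche lage in dem land und der welt",
    "it": "la situazione economica nel paese e nel mondo",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify_language(text: str) -> str:
    """Return the language whose profile shares the most n-gram occurrences with `text`."""
    query = char_ngrams(text)

    def overlap(profile: Counter) -> int:
        # A Counter returns 0 for missing n-grams, so unseen n-grams add nothing.
        return sum(min(count, profile[gram]) for gram, count in query.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


print(identify_language("der welt und die lage"))  # de
```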

SLIDE 21

Main CLIR Difficulties (II)

  • CLIR systems need clever pre-processing of target collections (e.g. semantic analysis, classification, information extraction)
  • CLIR systems need intelligent post-processing of results: merging / summarization / translation
  • CLIR systems need well-developed resources
    • Language processing tools
    • Language resources
    • Resources are expensive to acquire, maintain and update

SLIDE 22

CLIR for Multimedia

  • Retrieval from a mixed-media collection is a non-trivial problem
  • Different media are processed in different ways and suffer from different kinds of indexing errors:
    • spoken documents indexed using speech recognition
    • handwritten documents indexed using OCR
    • images indexed using significant features
  • Need for complex integration of multiple technologies
  • Need for merging of results from different sources
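
Before results from differently indexed sources can be merged, their scores have to be made comparable. The Python sketch below shows one standard option: min-max score normalisation followed by CombSUM fusion. It is not presented as the method of any particular CLEF system, and the run contents are hypothetical.

```python
def min_max_normalize(run: dict[str, float]) -> dict[str, float]:
    """Rescale one source's scores to [0, 1] so that sources become comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:  # all scores equal: treat every document as equally good
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_sum(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    """CombSUM: add the normalised scores per document, then rank by the sum."""
    fused: dict[str, float] = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from two differently indexed sources
# (e.g. speech-recognition transcripts vs OCR text), on unrelated scales.
speech_run = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
ocr_run = {"d2": 4.0, "d4": 3.0, "d1": 2.0}
print(comb_sum([speech_run, ocr_run]))
# [('d2', 1.5), ('d1', 1.0), ('d4', 0.5), ('d3', 0.0)]
```

Documents found by several sources (here d1 and d2) are rewarded, which is one reason CombSUM is a popular default.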

SLIDE 23

Supporting the User

Clough October 2011

SLIDE 24

MLIR/CLIR System Evaluation is Complex

  • Need to evaluate single components
  • Need to evaluate overall system performance
  • Need to distinguish cross-language (CL) aspects from IR issues

SLIDE 25

Cross Language Evaluation Forum

Objectives of CLEF

Promote research and stimulate development of multilingual IR systems, through:

  • Creation of evaluation infrastructure and organisation of regular evaluation campaigns for system and component testing
  • Building of an MLIA/CLIR research community
  • Construction of publicly available test suites

The Vision

Step by step, promote the development of truly multilingual, multimodal systems

SLIDE 26

Evolution of CLEF

CLEF 2000 TEXT

  • mono-, bi- & multilingual text doc retrieval (Ad Hoc)
  • mono- and cross-language information on structured scientific data (Domain-Specific)

CLEF 2001 USER NEEDS

  • interactive cross-language retrieval (iCLEF)

CLEF 2002 SPEECH

  • cross-language spoken document retrieval (CL-SR)

CLEF 2003 IMAGE & QA

  • multiple language question answering (QA@CLEF)
  • cross-language retrieval in image collections (ImageCLEF)

CLEF 2005 WEB & GIR

  • multilingual retrieval of Web documents (WebCLEF)
  • cross-language geographical retrieval (GeoCLEF)

CLEF 2008 VIDEO

  • cross-language video retrieval (VideoCLEF)
  • multilingual information filtering (INFILE@CLEF)

CLEF 2009 USERS & APPLICATIONS

  • intellectual property (CLEF-IP)
  • log file analysis (LogCLEF)
  • large-scale grid experiments (Grid@CLEF)

SLIDE 27

CLEF Tracks: 2000 - 2009

SLIDE 28

Advancing the State of the Art through Evaluation: AdHoc

AdHoc Track: promotes development of mono- and cross-language text retrieval systems

  • AdHoc 2000-2007: European news documents; increasingly complex & diverse tasks
    • Monolingual – Bilingual – Multilingual
  • AdHoc 2008-2009: non-European news docs; library catalog archives
  • Advanced Tasks – using previously built test collections
    • Multilingual 2 yrs on / merging
    • Robust – measuring stable performance

SLIDE 29

Ad Hoc: Importance of Monolingual IR

  • Need to understand the processing requirements of all languages to be queried, e.g. morphology, syntax, segmentation, special features
  • Need to adopt the best approach per language
  • The CLEF test collection includes a wide variety of European language types:
    • Germanic: Dutch, English, German, Swedish
    • Romance: French, Italian, Portuguese, Spanish
    • Slavic: Russian, Bulgarian, Czech
    • Non-Indo-European: Finnish, Hungarian

SLIDE 30

AdHoc: Growth in Target Languages

[Chart: number of AdHoc target languages per year, 2000-2008; languages shown: Czech, Hungarian, Bulgarian, Portuguese, Russian, Swedish, Finnish, Dutch, Spanish, Italian, German, French]

Donna Harman: CLEF2009 Workshop

slide-31
SLIDE 31

Ad Hoc Track 2000-2007: Bilingual & Multilingual Tasks

  • Tasks made increasingly difficult over the years
  • CLEF 2003: 2 multilingual tasks
    • Small-multilingual: 4 “core” languages (EN, ES, FR, DE)
    • Large-multilingual: 8 languages (+FI, IT, NL, SV)
  • Bilingual: “unusual” language combinations
    • IT -> ES, FR -> NL, DE -> IT, FI -> DE
    • x -> RU
    • Newcomers only: x -> EN
  • CLEF 2007: Non-European topic languages
    • AM/ID/OR/ZH → EN
    • BN/HI/MR/TA/TE → EN

SLIDE 32

AdHoc: Growth in Complexity

CLEF 2000

  • Monolingual: DE; FR; IT
  • Bilingual: X→EN
  • Multilingual: X→DE;EN;FR;IT

CLEF 2001

  • Monolingual: DE; ES; FR; IT; NL
  • Bilingual: X→EN; X→NL
  • Multilingual: X→DE;EN;ES;FR;IT

CLEF 2002

  • Monolingual: DE; ES; FI; FR; IT; NL; SV
  • Bilingual: X→DE;ES;FI;FR;IT;NL;SV; X→EN (newcomers)
  • Multilingual: X→DE;EN;ES;FR;IT

CLEF 2003

  • Monolingual: DE; ES; FI; FR; IT; NL; RU; SV
  • Bilingual: IT→ES; DE→IT; FR→NL; FI→DE; X→RU; X→EN
  • Multilingual: X→DE;EN;ES;FR and X→DE;EN;ES;FI;FR;IT;NL;SV

CLEF 2004

  • Monolingual: FI; FR; RU; PT
  • Bilingual: ES/FR/IT/RU→FI; DE/FI/NL/SV→FR; X→RU; X→EN
  • Multilingual: X→FI;FR;RU;PT

CLEF 2005

  • Monolingual: BG; FR; HU; PT
  • Bilingual: X→BG;FR;HU;PT; X→EN
  • Multilingual: Multi8 2 yrs on; Multi8 merge

CLEF 2006

  • Monolingual: BG; FR; HU; PT
  • Bilingual: X→BG;FR;HU;PT; X→EN
  • Robust: X→DE;EN;ES;FR;NL

CLEF 2007

  • Monolingual: BG; CZ; HU
  • Bilingual: X→BG;CZ;HU; AM/ID/OR/ZH→EN; BN/HI/MR/TA/TE→EN
  • Robust: EN; FR; PT and X→EN;FR;PT

CLEF 2008 & CLEF 2009

  • Monolingual: FA; TEL: DE; EN; FR; ROBUST WSD: EN
  • Bilingual: EN→FA; TEL: X→DE;EN;FR; ROBUST WSD: ES→EN

SLIDE 33

Advancing the State of the Art for Document Retrieval

Quantifiable improvements in the performance of CLIR systems:

  • TREC-6 1997
    • EN→FR: 49%; EN→DE: 64% of best monolingual FR & DE retrieval
  • CLEF, recent years
    • 2008: EN→IR: 92% of best monolingual retrieval
    • 2009 TEL: X→EN 99%; X→DE 90%; X→FR 94% of best monolingual retrieval
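
These percentages compare a cross-language run with the best monolingual run. Assume effectiveness is measured by mean average precision (MAP), the usual measure in TREC and CLEF ad hoc tracks. Under that assumption, the ratio can be computed as in the Python sketch below. The two-topic relevance judgements and runs are invented for illustration.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated average precision: mean of precision@k taken at the
    rank k of each relevant document; relevant documents never retrieved count 0."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(run: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """MAP over all judged topics; a topic missing from the run scores 0."""
    return sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


def relative_effectiveness(cross_language_map: float, monolingual_map: float) -> float:
    """Cross-language MAP as a percentage of the monolingual baseline."""
    return 100.0 * cross_language_map / monolingual_map


# Hypothetical judgements and rankings for two topics.
qrels = {"t1": {"d1", "d3"}, "t2": {"d5"}}
monolingual = {"t1": ["d1", "d3", "d2"], "t2": ["d5", "d4"]}
bilingual = {"t1": ["d1", "d2", "d3"], "t2": ["d4", "d5"]}

ratio = relative_effectiveness(mean_average_precision(bilingual, qrels),
                               mean_average_precision(monolingual, qrels))
print(f"{ratio:.1f}% of monolingual MAP")  # 66.7% of monolingual MAP
```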

SLIDE 34

Advancing the State of the Art: Summary

  • Investigation of core issues in MLIR/CLIR
    • development of multiple language processing tools
    • creation of linguistic resources
    • analysis of user behaviour
    • implementation of appropriate cross-language retrieval models and algorithms for different tasks and languages
  • Development of MLIR/CLIR Systems
    • From bilingual to multilingual
    • From document retrieval to information extraction
    • From free text to structured text
    • From mono-media to multimedia

SLIDE 35

CLEF Achievements at the 10-year Milestone: Test Collections

  • CLEF multilingual comparable corpus of more than 3M news docs in 15 languages: BG, CZ, DE, EN, ES, EU, FI, FR, HU, IT, NL, RU, SV, PT and Persian
  • The European Library data in DE, EN, FR (>3M docs)
  • GIRT-4 social science database in EN and DE; Russian ISISS collection; Cambridge Sociological Abstracts
  • Online Flickr database
  • IAPR TC-12 photo database (20,000 images, captions in EN, DE)
  • ARRS Goldminer database (200,000 medical images)
  • IRMA: 10,000 images for automatic medical image annotation
  • INEX Wikipedia image collection (150,000 images)
  • Very large multilingual collection of Web docs (EuroGov)
  • Malach spontaneous speech collection – EN & CZ (Shoah archives)
  • Dutch / English documentary TV videos
  • Agence France-Presse (AFP) newswire in Arabic, French & English
  • Patent documents from the European Patent Office

SLIDE 36

New Vision & Tools for Comparative System Evaluation

SLIDE 37

CLEF Achievements at the 10-year Milestone

Development of methodology, tools, resources:

  • Language resources: stopword lists, dictionaries, lexicons, parallel & aligned corpora
  • Linguistic components: stemmers, lemmatizers, PoS taggers, decompounders
  • Translation approaches: MT, dictionary-based, corpora-based, conceptual networks
  • IR model testing: boolean, vector space, probabilistic, language models
  • Advanced IR approaches: data fusion, query expansion, relevance feedback
  • Interface issues: user assistance in query formulation & results presentation in a multilingual context

SLIDE 38

Best Practices for MLIR/CLIR

(Exploiting CLEF experience, results and know-how to provide guidelines for users and developers)

SLIDE 39

TrebleCLEF Best Practices and Guidelines

Best Practices white papers were produced, exploiting CLEF experience, results and know-how to provide guidelines for users and developers:

  • Best Practices for Language Resources for MLIA, Nicolas Moreau, ELDA
  • Best Practices for System- and User-oriented MLIA, Martin Braschler, Zurich Univ. Applied Technology & Julio Gonzalo, UNED Madrid
  • Best Practices for Test Collection Creation and Evaluation Methodologies, Mark Sanderson, Univ. Sheffield & Martin Braschler, Zurich Univ. Applied Technology

Designed for practitioners rather than for academics

SLIDE 40

Best Practices in MLIR/CLIR Language Resources

Objectives

  • Enable system developers to identify and find the tools and resources they need when implementing MLIR/CLIR system functionality
  • Foster collaborative resource creation
  • Foster creation of common pools of resources
    • e.g. for specific less-resourced languages
  • Foster dissemination of resources after evaluation campaigns and projects are over

SLIDE 41

Best Practices in Test Collection Creation

  • Practical guidelines for evaluating in an efficient and cost-effective way
  • Aims at bridging the gap between the academic research community and practitioners
  • Provides information on:
    • necessary resources, procedures and methods
  • Lists questions to consider when evaluating:
    • What is the purpose of the evaluation?
    • What resources are available to conduct the evaluation?
    • What do you know about the IR system being tested?
  • Measuring IR system effectiveness
  • Comparing results & using significance tests
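
The paired randomization (permutation) test is one significance test often used to compare two IR runs topic by topic. The Python sketch below is a minimal exact version that enumerates every sign assignment, so it only suits a handful of topics. The white paper's own recommended procedure may differ, and the function name and example scores are hypothetical.

```python
from itertools import product


def paired_randomization_test(scores_a: list[float], scores_b: list[float]) -> float:
    """Exact two-sided paired randomization (sign-flip) test on per-topic scores.

    Under the null hypothesis the two systems are interchangeable on every topic,
    so each per-topic difference is equally likely to carry either sign. The
    p-value is the fraction of the 2**n sign assignments whose summed difference
    is at least as extreme as the observed one. Enumeration is exponential in n,
    so larger topic sets would need random sampling of sign assignments instead."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("expected two non-empty per-topic score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    as_extreme = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total = sum(sign * d for sign, d in zip(signs, diffs))
        if abs(total) >= observed - 1e-12:  # small tolerance for float round-off
            as_extreme += 1
    return as_extreme / 2 ** len(diffs)


# Hypothetical per-topic average precision of two runs on four topics.
run_a = [0.50, 0.50, 0.50, 0.50]
run_b = [0.25, 0.25, 0.25, 0.25]
print(paired_randomization_test(run_a, run_b))  # 0.125
```

Even though run A wins on every topic here, the p-value cannot drop below 2/16 with only four topics. This illustrates why the number of topics matters when results are compared.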

SLIDE 42

Best Practices in MLIR/CLIR System-Oriented Aspects

[Diagram: indexing, translation and matching stages of an MLIR/CLIR system, linking documents, document representation, query representation, index and result; example term: "Wirtschaft"]

Recommendations in three parts:

  • Indexing
  • Translation
  • Matching

SLIDE 43

Best Practices: Indexing for MLIR/CLIR

  • Use weighted & ranked retrieval: copes with translation errors
  • Use Unicode/XML: covers different scripts
  • Use minimal stopword elimination: keeps maximum information
  • Remove diacritics and special characters: tolerant towards inconsistent spelling
  • Use stemming: covers different word forms
  • Use decompounding for languages with productive compound formation: tolerant towards different phrasings
  • Use character n-grams when stemming resources are not available: helps with languages with scarce language resources
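
Two of these recommendations, diacritic removal and character n-gram indexing, take only a few lines with Python's standard `unicodedata` module. In this sketch the choice n = 4 and the function names are assumptions. NFD decomposition only strips marks that Unicode decomposes, so letters such as 'ø' or 'ß' are left unchanged.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngram_terms(text: str, n: int = 4) -> list[str]:
    """Language-independent index terms: overlapping character n-grams per word.

    Text is lower-cased and stripped of diacritics first; words of length <= n
    are kept whole."""
    terms: list[str] = []
    for word in remove_diacritics(text.lower()).split():
        if len(word) <= n:
            terms.append(word)
        else:
            terms.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return terms


print(remove_diacritics("Málaga Zürich"))  # Malaga Zurich
print(char_ngram_terms("Économie"))        # ['econ', 'cono', 'onom', 'nomi', 'omie']
```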

SLIDE 44

Major Results: Translation & Matching

  • Maximize coverage of translation resources: reduces retrieval failure due to missing translations
  • Use document translation to solve merging problems: if combined results in multiple languages are needed
  • Combine different types of translation resources: minimizes mistranslations inherent to the individual resources
  • Use an interlingua when direct translation resources are unavailable: covers language pairs with no direct translation resources
  • Use high-performing weighting schemes, e.g. Okapi/BM25, LM, DFR or lnu.ltn: weighting schemes with robust performance over different types of text
  • Use pseudo-relevance feedback: boosts recall (coverage of results)
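
Okapi BM25 is the first weighting scheme named above. The Python sketch below scores a translated query against a toy German collection. The parameters k1 = 1.2 and b = 0.75 are common defaults rather than values from the slides. The IDF is a common non-negative variant, log(1 + (N - df + 0.5) / (df + 0.5)), and the collection is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score every tokenised document against the query terms with Okapi BM25.

    k1 controls term-frequency saturation and b the document-length normalisation;
    both are assumed defaults and would normally be tuned per collection."""
    n_docs = len(docs)
    avgdl = sum(len(terms) for terms in docs.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores: dict[str, float] = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


# Hypothetical German collection; the query is a translated English query.
docs = {
    "d1": ["wirtschaft", "krise", "europa"],
    "d2": ["fussball", "europa", "meisterschaft"],
    "d3": ["wirtschaft", "wachstum", "wirtschaft", "prognose"],
}
scores = bm25_scores(["wirtschaft", "krise"], docs)
print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd3', 'd2']
```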

SLIDE 45

Main Recommendations

Blueprint for an Effective System

  • Effective, well-tuned monolingual retrieval for as many languages as possible
  • Combination of different sources of translation information
  • Merging of multiple, well-tuned bilingual results
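
One simple way to combine translation sources works as follows; it is an assumption here, not taken from the recommendations. Each resource votes for its candidate translations with a trust weight, and the normalised votes serve as query-term weights. The resources, trust weights and function name below are hypothetical.

```python
from collections import defaultdict


def combine_translations(resources: list[tuple[float, dict[str, list[str]]]],
                         term: str) -> dict[str, float]:
    """Pool the candidate translations of `term` proposed by several resources.

    Each resource comes with a trust weight; a candidate scores the summed weights
    of the resources that propose it, and scores are normalised to sum to 1.
    Returns an empty dict when no resource knows the term."""
    scores: dict[str, float] = defaultdict(float)
    for trust, resource in resources:
        for candidate in resource.get(term, []):
            scores[candidate] += trust
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()} if total else {}


# Hypothetical EN->DE resources and trust weights (not tuned values).
mt_output = {"bank": ["bank"]}
dictionary = {"bank": ["bank", "ufer"]}
corpus_based = {"bank": ["bank", "kreditinstitut"]}
weights = combine_translations(
    [(2.0, mt_output), (1.0, dictionary), (1.0, corpus_based)], "bank")
print(weights)
```

Candidates confirmed by several resources ("bank") dominate, while rarer senses keep a small weight instead of being discarded.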

SLIDE 46

Best Practices in MLIA User-oriented Aspects

Recommendations in three parts:

  • Cross-Language Document Selection
  • Query Translation & Refinement
  • Personalization

SLIDE 47

Personalization

Allow users to specify their language skills and translation preferences in a user profile
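
A profile of this kind can drive how each result is presented. The Python sketch below is hypothetical throughout: the field names, the skill levels and the presentation modes are illustrative choices, not a schema from the best-practice guidelines.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile: languages read fluently, languages understood only
    passively, and whether other-language results should be machine-translated."""
    fluent: set[str] = field(default_factory=set)
    passive: set[str] = field(default_factory=set)
    translate_others: bool = True


def presentation_mode(doc_lang: str, profile: UserProfile) -> str:
    """Choose how to show one result written in `doc_lang` (e.g. an ISO 639-1 code)."""
    if doc_lang in profile.fluent:
        return "original"
    if doc_lang in profile.passive:
        return "original+translated-snippet"  # readable with some help
    return "translated" if profile.translate_others else "hidden"


profile = UserProfile(fluent={"hi", "en"}, passive={"mr"})
for lang in ("en", "mr", "de"):
    print(lang, presentation_mode(lang, profile))
```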

SLIDE 48

Help the user choose the right translation!

SLIDE 49

Query Translation & Refinement

Recommendations

  • Include user-assisted query translation facilities
    • but do not show them by default
  • Indirect user-assisted query translation without target-language inspection is preferable
  • Link structured sources that help map the meaning of the query, e.g. named entities, Wikipedia, synonyms, KWIC
  • Query translation, document translation & assisted query translation facilities must fit together

SLIDE 51

Search in the Multilingual Context

(includes culture and language)

Peters, Braschler, Clough, 2011

SLIDE 52

What is Culture?

  • Language is the most direct expression of culture; it is what makes us human and what gives each of us a sense of identity (EC 2005)
  • Members of the same culture are likely to have the same knowledge of certain things and to think and act similarly in certain situations
  • Cultural aspects to consider include:
    • religion, customs, colours, metaphors, icons and flags, and language
  • Crossing language boundaries implies crossing cultural boundaries
  • Localisation of the search user interface… but so much more

Clough October 2011

SLIDE 54

Europeana

SLIDE 55

From the Lab to the Market Place

If research has been successful and the problem is (nearly) solved, why are there so few commercial systems?

Peters, Braschler, Clough, 2011

SLIDE 56

Challenges of Integration into the Enterprise

  • The search system must run on a single ‘off-the-shelf’ server
  • The system must be easily integrated into the client’s platform
  • Response times, even for complex queries, must be fast (< 2 s)
  • Scalability problems must be resolved (CLIR queries are typically several times larger than in monolingual search)
  • Parameters must be easy to tune to achieve precision
  • High-quality translation of results and presentation according to customers’ requirements
  • The expected costs for customer support, integration and maintenance must be low
  • The necessary lexical and translation resources must be easy to acquire and easy to optimise to meet the client’s needs

Peters, Braschler, Clough, 2011

SLIDE 57

Summing up

  • Evaluation provides the opportunity to test, tune and compare approaches in order to improve system performance
  • Evaluation campaigns promote research
  • Evaluation campaigns produce huge amounts of valuable experimental data
  • Evaluation campaigns create communities interested in examining the same issues and comparing ideas and experiences
  • BUT

SLIDE 58

Future Directions

  • Search systems are used interactively, so evaluation must also consider multilingual interface design
  • Multilingual search is NOT just an engineering problem; the impact of users’ cultural background and language skills must be studied
  • Multilingual searching is part of wider information-seeking activities
  • The multilingual functionality must be integrated into a larger application
  • Research has shown we can do cross-language search in multiple languages well
    • the challenge is going from research to practice

SLIDE 59

Summing Up

Never forget the Vision
