Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR



SLIDE 1

Introduction Related work Proposed query translation method Evaluation Conclusions References

Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR

Xabier Saralegi Maddalen López de Lacalle

R&D Elhuyar Foundation

7th international conference on Language Resources and Evaluation LREC 2010, Valletta, Malta 2010/05/20

Xabier Saralegi, Maddalen López de Lacalle Dictionary and Monolingual Corpus-based Query Translation for Basque-English

SLIDE 2

Outline

1. Introduction
2. Related work
   • Different Strategies
   • CLIR Frameworks based on query translation
3. Proposed query translation method
   • Experimental setup
   • Treatment of OOV words
   • MWE
   • Translation Selection
4. Evaluation
5. Conclusions

SLIDE 3

Introduction: Motivation

• CLIR = IR + language barrier
• Most CLIR technology is based on Machine Translation Systems (MTS) or Parallel Corpora (PC)
• MTS and PC resources are expensive or scarce for most language pairs, especially for small languages
• Bilingual dictionaries are easier to obtain

SLIDE 4

Introduction: Bilingual Dictionaries

Problems: translation ambiguity, Out-of-Vocabulary (OOV) words, Multi-Word Expressions (MWE)

Example Query 80:
EU: “G7 gailurrean Napolin Errusiak jokatutako papera”
EN: “role played by Russia in the G7 summit in Naples in 1994”
papera: paper, role, ...

SLIDE 5

Introduction: Bilingual Dictionaries

Problems: translation ambiguity, Out-of-Vocabulary (OOV) words, Multi-Word Expressions (MWE)

Example Query 46:
EU: “Irakeko bahitura”
EN: “Embargo on Iraq”

SLIDE 6

Introduction: Bilingual Dictionaries

Problems: translation ambiguity, Out-of-Vocabulary (OOV) words, Multi-Word Expressions (MWE)

Example Query 47:
EU: “Errusiarren esku hartzea Txetxenian”
EN: “Russian intervention in Chechnya”

SLIDE 7

Introduction: Objectives

Objectives of this work:

• To analyse how each problem affects the retrieval performance of a dictionary-based Basque-English CLIR system
• To evaluate methods not based on parallel corpora for treating those problems

SLIDE 9

Different Strategies

Translate the collection or the queries?

• Collection: richer context for translation selection (Oard, 1998)
• Query: most studied because it is more scalable (Hull and Grefenstette, 1996)
• Best results: translating both and merging the corresponding rankings (McCarley, 1999; Wang and Oard, 2003)

SLIDE 11

CLIR Frameworks based on query translation

(a) Post-translation Relevance Model (PTRM)

• The query is translated independently, and then a relevance model is applied
• Query terms are translated with PC or a dictionary
  • PC: solves translation selection
  • Dictionary: co-occurrence-based method for solving selection (Monz and Dorr, 2005; Gao et al., 2002)

SLIDE 12

CLIR Frameworks based on query translation

(b) Cross-lingual probabilistic relevance models (CLPRM)

• Translation process included in the relevance model
• Query terms translated with PC or a dictionary
• All candidates are treated as a single token (Pirkola, 1998), or weighted with weights mined from PC (Darwish and Oard, 2003) or comparable corpora (Saralegi and Lopez de Lacalle, 2010)

\[ TF_j(s_i) = \sum_{\{k \mid D_k \in T(s_i)\}} TF_j(D_k) \]

\[ DF(Q_i) = \left| \bigcup_{\{k \mid D_k \in T(Q_i)\}} \{ d \mid D_k \in d \} \right| \]
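The single-token treatment of translation candidates can be illustrated with a minimal Python sketch. It assumes documents are plain token lists; the function names and the toy collection are hypothetical and are not taken from the talk.

```python
def structured_tf(doc_tokens, translations):
    """TF of a source term in one document: summed TF of its translation candidates."""
    candidates = set(translations)
    return sum(1 for tok in doc_tokens if tok in candidates)


def structured_df(collection, translations):
    """DF of a source term: number of documents containing at least one candidate."""
    candidates = set(translations)
    return sum(1 for doc in collection if candidates.intersection(doc))


# Toy collection (hypothetical data, for illustration only).
collection = [
    ["role", "of", "russia", "role"],
    ["paper", "mill"],
    ["summit", "in", "naples"],
]
tf_doc0 = structured_tf(collection[0], ["paper", "role"])
df = structured_df(collection, ["paper", "role"])
```

Because both candidates of "papera" count towards the same token, a document is credited whichever translation it uses, at the cost of also crediting documents that use the wrong sense.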

SLIDE 13

CLIR Frameworks based on query translation

(c) Cross-lingual language models (CLLM)

• Translation process included in the relevance model
• Query terms translated with PC or a dictionary
• Translation probabilities are included in a probabilistic model (Xu, Weischedel, and Nguyen, 2001)

\[ P(Q_s \mid D_t) = \prod_{w \in Q_s} \Big( (1-\lambda)\, P(w \mid G_s) + \lambda \sum_{t \in D_t} P(t \mid D_t)\, P(w \mid t) \Big) \]
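A minimal Python sketch of this query likelihood follows. It assumes P(t|D_t) is a maximum-likelihood estimate from the document tokens; the dictionaries `background` (for P(w|G_s)) and `trans_prob` (for P(w|t), keyed by `(w, t)`) and the default value of λ are hypothetical choices made for illustration.

```python
def cllm_query_likelihood(query, doc_tokens, background, trans_prob, lam=0.7):
    """P(Q_s | D_t) as a product over source query words of a two-part mixture."""
    doc_len = len(doc_tokens)
    counts = {}
    for t in doc_tokens:
        counts[t] = counts.get(t, 0) + 1
    likelihood = 1.0
    for w in query:
        # Sum over target terms t in the document: P(t | D_t) * P(w | t).
        translated = sum(
            (c / doc_len) * trans_prob.get((w, t), 0.0) for t, c in counts.items()
        )
        likelihood *= (1 - lam) * background.get(w, 0.0) + lam * translated
    return likelihood


score = cllm_query_likelihood(
    ["papera"],
    ["role", "russia"],
    background={"papera": 0.1},
    trans_prob={("papera", "role"): 0.5},
    lam=0.5,
)
```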

SLIDE 14

CLIR Frameworks based on query translation

• CLLM (c) performs better than CLPRM (b) when PC are provided (Xu, Weischedel, and Nguyen, 2001)
• CLPRM (b) performs better than dictionary-based PTRM (a) with long queries (Saralegi and Lopez de Lacalle, 2009)
• PTRM (a) is independent of the retrieval model

SLIDE 15

Proposed query translation method

Dictionary-based, parallel-corpora-free PTRM:

• OOV: cognate detection on the target collection
• MWE: matching and translation by means of MWE lists
• Translation selection: co-occurrence-based method over the target collection (Monz and Dorr, 2005)

SLIDE 17

Experimental setup

Topics and collections:

• Development: CLEF topics 41-90, LA Times 94 collection, and corresponding HRJ (Human Relevance Judgements)
• Test: CLEF topics 250-350, LA Times 94 and Glasgow Herald 95 collections, and corresponding HRJ

Retrieval model: Indri

Dictionaries:

• Morris Basque/English dictionary: 77,864 entries and 28,874 unique Basque terms
• Euskalterm terminology bank: 72,184 entries and 56,745 unique Basque terms

SLIDE 19

Treatment of OOV words

Transliteration rules + LCSR:

OOV word | Trans. rule | Transliteration | Max. LCSR
Txetxenia | tx/ch | chechenia | (chechenia, chechnya) = 0.89
korrupzio | zio/-tion, k/c | corruption | (corruption, corruption) = 1

Table: Example of an OOV word resolved using cognate detection

• A total of 64 OOV terms were identified; they account for 15.46% of all query terms
• Most of the OOV words are named entities (NEs)

Named Entities | Nouns | Adj. | Numbers
82.81% | 12.5% | 3.13% | 1.56%

Table: Distribution of OOV words depending on their POS
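The cognate detection step can be sketched in Python as below. This is an illustration under stated assumptions: the rule list contains only the three rules shown in the example table, LCSR is taken in its usual form (length of the longest common subsequence divided by the length of the longer string), and the threshold and function names are hypothetical. The slide's score of 0.89 for (chechenia, chechnya) may rely on additional normalisation that is not shown here; plain LCSR gives 7/9 for that pair.

```python
import re

# Hypothetical rule set (pattern, replacement), limited to the rules in the table.
RULES = [(r"tx", "ch"), (r"zio$", "tion"), (r"k", "c")]


def transliterate(word):
    """Apply the transliteration rules in order to a lowercased Basque word."""
    result = word.lower()
    for pattern, replacement in RULES:
        result = re.sub(pattern, replacement, result)
    return result


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcsr(a, b):
    """Longest Common Subsequence Ratio: LCS length over the longer string's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def best_cognate(oov_word, target_vocabulary, threshold=0.7):
    """Return the target-collection word with the highest LCSR, or None below threshold."""
    candidate = transliterate(oov_word)
    best = max(target_vocabulary, key=lambda w: lcsr(candidate, w), default=None)
    if best is None or lcsr(candidate, best) < threshold:
        return None
    return best
```

For example, `best_cognate("Txetxenia", ["chechnya", "china", "chicken"])` selects "chechnya", because the transliterated form "chechenia" shares the longest subsequence with it.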

SLIDE 20

Treatment of OOV words

• The cognate-based method resolves 80% of OOV words
• However, only 7 cases need transliteration and LCSR
• Despite this, MAP improves by 8.96% (short queries) and 3.52% (long queries) over the baseline (no transliteration and LCSR)
• OOV words tend to be relevant
• We estimated the MAP topline by providing the translations of all OOV terms by hand: 12.38% (short queries), 4.101% (long queries)

SLIDE 21

Treatment of OOV words

Translation Method | MAP (Short) | MAP (Long) | Improvement Over First % (Short) | Improvement Over First % (Long)
First Translation | 0.2703 | 0.3835 | |
Topline: First Translation + OOV (by hand) | 0.3085 | 0.3999 | 12.38 | 4.101
First Translation + Cognates | 0.2969 | 0.3975 | 8.96 | 3.52

Table: Retrieval performance for OOV words for development topics (41-90 topics)
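The improvement percentages in this table appear to be computed relative to the MAP of the improved run rather than relative to the first-translation baseline. This is inferred from the numbers and is not stated on the slide. A short Python check, with a hypothetical function name:

```python
def improvement_over_first(run_map, first_map):
    """Improvement in percent, expressed relative to the improved run's MAP."""
    return (run_map - first_map) / run_map * 100


# MAP values copied from the development-topic table (short and long queries).
topline_short = improvement_over_first(0.3085, 0.2703)
topline_long = improvement_over_first(0.3999, 0.3835)
cognates_short = improvement_over_first(0.2969, 0.2703)
cognates_long = improvement_over_first(0.3975, 0.3835)
```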

SLIDE 23

MWE

Treatment: detection on the source query and translation using a terminology bank

We identified the MWEs in the queries by hand:

• 60 MWEs
• 51 of them compositional (can be translated word by word)

Basque MWE: Bigarren Mundu Gerra

Word | Trans. from dic. | Correct candidate
Bigarren | second, secondary | second
Mundu | people, world | world
Gerra | war | war

Table: Example of word-by-word MWE translation
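The matching step can be sketched as a greedy longest-match lookup against a MWE list. The Python below is only an illustration: it assumes the query is already lemmatised and lowercased, and the `mwe_bank` dictionary, the `max_len` limit, and the extra token "amaiera" are hypothetical.

```python
def translate_mwes(tokens, mwe_bank, max_len=4):
    """Replace the longest matching MWEs with their translations; keep other tokens."""
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest span first, down to two-word expressions.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in mwe_bank:
                output.append(mwe_bank[key])
                i += n
                break
        else:
            # No MWE starts here: pass the single token through untranslated.
            output.append(tokens[i])
            i += 1
    return output


mwe_bank = {("bigarren", "mundu", "gerra"): "second world war"}
result = translate_mwes(["bigarren", "mundu", "gerra", "amaiera"], mwe_bank)
```

Tokens left untranslated by this step would still go through the ordinary dictionary lookup and translation selection.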

SLIDE 24

MWE

• The matching method identifies and translates only 11 MWEs (2 non-compositional)
• Poor coverage, but some improvement in MAP: 5.49% (short queries), 2.76% (long queries)
• Most MWEs are compositional → translation selection can solve them
• Topline MAP (translation by hand of all MWEs): 19.81% (short queries), 9.17% (long queries)

SLIDE 25

MWE

Translation Method | MAP (Short) | MAP (Long) | Improvement Over First % (Short) | Improvement Over First % (Long)
First Translation | 0.2703 | 0.3835 | |
Topline: First Translation + MWE (by hand) | 0.3371 | 0.4222 | 19.81 | 9.17
First Translation + MWE | 0.2860 | 0.3944 | 5.49 | 2.76

Table: Retrieval performance for MWEs for development topics (41-90)

SLIDE 27

Translation Selection

Target co-occurrence based selection algorithm:

• Idea: among all candidates of the source query terms given by the dictionary, select those that maximize the global association degree between them
• NP-hard maximization problem → greedy approach (Monz and Dorr, 2005)
• Initially, all translation candidates are equally likely:

\[ w_T^{0}(t \mid s_i) = \frac{1}{|tr(s_i)|} \]

• In the iteration step, the weight of each translation candidate is updated:

\[ w_T^{n}(t \mid s_i) = w_T^{n-1}(t \mid s_i) + \sum_{t' \in inlink(t)} w_L(t,t') \cdot w_T(t' \mid s_i) \]
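The iteration can be sketched in Python as follows. Several points are assumptions rather than details given on the slide: `inlink(t)` is taken to be the candidates of all other source terms, the weights of each source term are renormalised after every step (as in Monz and Dorr, 2005), and the number of iterations, the function names, and the toy association scores are hypothetical.

```python
def select_translations(candidates, assoc, iterations=10):
    """Greedy iterative disambiguation: return one translation per source term."""
    # Initialisation: all candidates of a source term are equally likely.
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(iterations):
        updated_all = {}
        for s, ts in candidates.items():
            updated = {}
            for t in ts:
                # Support from the candidates of the other source terms.
                support = sum(
                    assoc(t, t2) * weights[s2][t2]
                    for s2, ts2 in candidates.items() if s2 != s
                    for t2 in ts2
                )
                updated[t] = weights[s][t] + support
            total = sum(updated.values())
            updated_all[s] = {t: w / total for t, w in updated.items()}
        weights = updated_all
    return {s: max(w, key=w.get) for s, w in weights.items()}


# Toy association scores (hypothetical values).
ASSOC = {("role", "play"): 5.0, ("paper", "play"): 1.0}


def association(t, t2):
    """Symmetric lookup of the association degree between two target words."""
    return ASSOC.get((t, t2), ASSOC.get((t2, t), 0.0))


chosen = select_translations(
    {"papera": ["paper", "role"], "jokatu": ["play"]}, association
)
```

In the toy example, "role" wins over "paper" because it is more strongly associated with "play" in the target collection.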

SLIDE 28

Translation Selection

Measuring association degree (w_L(t,t′)):

• Log-likelihood Ratio (LLR) and co-occurrences between lemmas
• LLR + nearness factor: including the distance between source words
• Log-likelihood Ratio (LLR) and co-occurrences between expanded lemmas
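For reference, the log-likelihood ratio of a word pair can be computed from a 2x2 contingency table of co-occurrence counts. The Python sketch below uses the standard G² formulation; how the counts are collected (window size, lemmatisation) is not specified on the slide, and the argument names are hypothetical.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts.

    k11: contexts with both words, k12: first word only,
    k21: second word only, k22: neither word.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2 * g2
```

Independent words give a score of zero, while words that always occur together give a large positive score.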

SLIDE 29

Translation Selection

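Association measure (AM) including distance (formula):

\[ w'_L(t,t') = w_L(t,t') \cdot w_F(t,t') \]

\[ w_F(t,t') = \frac{\max_{s_i,s_j \in Q} dis(s_i,s_j)}{dis(so(t),so(t'))} \cdot 2^{smw(so(t),so(t'))} \]

Strong evidence, more weight (formula):

\[ smw(s,s') = \begin{cases} 1 & \text{if } \{s,s'\} \subseteq Z \text{ where } Z \in MWE \\ 0 & \text{otherwise} \end{cases} \]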
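A minimal Python sketch of the nearness factor follows. It assumes that `dis` is the absolute difference between token positions in the source query and that the two source words are distinct; the names `positions` and `mwes` and the example query are hypothetical.

```python
def smw(s, s2, mwes):
    """1 if both source words belong to the same MWE, otherwise 0."""
    return 1 if any(s in z and s2 in z for z in mwes) else 0


def nearness_factor(s, s2, positions, mwes):
    """Nearness weight for the source words s and s2 (must be distinct)."""
    max_dis = max(abs(p - q) for p in positions.values() for q in positions.values())
    dis = abs(positions[s] - positions[s2])
    return max_dis / dis * 2 ** smw(s, s2, mwes)


positions = {"bigarren": 0, "mundu": 1, "gerra": 2, "amaiera": 3}
mwes = [{"bigarren", "mundu", "gerra"}]
adjacent_in_mwe = nearness_factor("bigarren", "mundu", positions, mwes)
far_apart = nearness_factor("bigarren", "amaiera", positions, mwes)
```

Adjacent words inside the same MWE receive the largest factor, while the two most distant words that share no MWE receive a factor of 1.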

SLIDE 30

Translation Selection

Association between expanded tokens

• S1: source query word 1
• S2: source query word 2
• C1 and C2: senses for source query word 1
• C3: sense for source query word 2
• t1 and t2: translation candidates for sense C1
• t3: translation candidate for sense C2

Frequency of the senses:

\[ f(C_x) = \sum_{t \in C_x} f(t) \]

Frequency between senses:

\[ f(C_1 \cap C_3) = f\Big( \big( \bigcup_{t \in C_1} t \big) \cap \big( \bigcup_{t \in C_3} t \big) \Big) \]
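The two sense-level counts can be sketched in Python with document-level postings. This is an assumption made for illustration: the slide does not say whether frequencies are counted per document or per context window, and the `postings` structure and function names are hypothetical.

```python
def sense_frequency(sense, postings):
    """f(C_x): sum of the frequencies of the translation candidates of a sense."""
    return sum(len(postings.get(t, set())) for t in sense)


def sense_cooccurrence(sense_a, sense_b, postings):
    """f(C_a ∩ C_b): documents with any candidate of sense a and any of sense b."""
    docs_a = set().union(*(postings.get(t, set()) for t in sense_a))
    docs_b = set().union(*(postings.get(t, set()) for t in sense_b))
    return len(docs_a & docs_b)


# Toy postings: target word -> ids of the documents that contain it.
postings = {"second": {1, 2}, "secondary": {3}, "world": {1, 3}}
freq_c1 = sense_frequency(["second", "secondary"], postings)
cooc = sense_cooccurrence(["second", "secondary"], ["world"], postings)
```

Pooling the candidates of a sense gives each sense more co-occurrence evidence than any single candidate has on its own.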

SLIDE 31

Translation Selection

Toplines (by hand):

• Select the correct translation from the candidates of the MRD (machine-readable dictionary): 21.19% (short queries), 10.10% (long queries)
• If there is no correct candidate, take the translation from the English monolingual query: 32.49% (short queries), 16.50% (long queries)

SLIDE 32

Translation Selection

Translation Method | MAP (Short) | MAP (Long) | Improvement Over First % (Short) | Improvement Over First % (Long)
First Translation | 0.2703 | 0.3835 | |
Topline 1: translation selection by hand | 0.3430 | 0.4266 | 21.19 | 10.10
Target co-occurrence based | 0.3405 | 0.4123 | 20.62 | 6.99
Topline 2: translation selection by hand + new translations | 0.4004 | 0.4593 | 32.49 | 16.50
Target co-occurrence based + nearness | 0.3399 | 0.4117 | 20.48 | 6.85
Target co-occurrence (expanded tokens) | 0.3323 | 0.4163 | 18.05 | 7.88

Table: Retrieval performance for translation selection for development topics (41-90 topics)

SLIDE 34

Evaluation

Runs:

• English monolingual (topline)
• First translation from the dictionary (baseline)
• OOV: First trans. and cognate detection
• MWE: MWE translation and First trans.
• TS: Co-occurrence-based translation selection
• TS+Nearness: including the nearness factor
• TS (expanded tokens): Sense co-occurrence
• TS (expanded tokens)+OOV
• TS (expanded tokens)+OOV+MWE

SLIDE 38

Evaluation Results: Independent Methods

Run | MAP (Short) | MAP (Long) | % of Monolingual (Short) | % of Monolingual (Long) | Improvement Over First % (Short) | Improvement Over First % (Long)
English monolingual | 0.3176 | 0.3773 | | | |
Baseline | 0.2195 | 0.2599 | 67 | 69 | |
OOV | 0.2279 | 0.2670 | 72 | 71 | 7.24 | 2.66
MWE | 0.2237 | 0.2601 | 70 | 69 | 5.5 | 0.08
TS | 0.2315 | 0.2642 | 73 | 70 | 8.68 | 1.63
TS+Nearness | 0.2318 | 0.2627 | 73 | 70 | 8.8 | 1.07
TS (expanded tokens) | 0.2362 | 0.2747 | 74 | 73 | 10.5 | 5.39

Table: MAP values for test topics (250-350)

SLIDE 39

Evaluation: Method Combinations

Topics and collections:

• Test: CLEF topics 250-350, LA Times 94 and Glasgow Herald 95 collections, and corresponding HRJ


SLIDE 40

Evaluation Results: Method Combinations

Run | MAP (Short) | MAP (Long) | % of Monolingual (Short) | % of Monolingual (Long) | Improvement Over First % (Short) | Improvement Over First % (Long)
English monolingual | 0.3176 | 0.3773 | | | |
Baseline | 0.2195 | 0.2599 | 67 | 69 | |
OOV | 0.2279 | 0.2670 | 72 | 71 | 7.24 | 2.66
MWE | 0.2237 | 0.2601 | 70 | 69 | 5.5 | 0.08
TS | 0.2315 | 0.2642 | 73 | 70 | 8.68 | 1.63
TS+Nearness | 0.2318 | 0.2627 | 73 | 70 | 8.8 | 1.07
TS (expanded tokens) | 0.2362 | 0.2747 | 74 | 73 | 10.5 | 5.39
TS (expanded tokens)+OOV | 0.2424 | 0.2805 | 76 | 74 | 12.79 | 7.34

Table: MAP values for test topics (250-350)

SLIDE 41

Evaluation Results

• The co-occurrence-based method and the cognate-detection-based method improve on the baseline significantly
• Expanded-token co-occurrences perform better than token co-occurrences
• MWE treatment performs poorly due to lack of recall

SLIDE 43

Conclusions

• Translation selection (including non-compositional MWEs) is the factor that decreases MAP the most in a dictionary-based approach
  • Wrong selection (21% short queries, 10% long queries)
  • Wrong selection + no correct translation in the MRD (32% short queries, 17% long queries)
• OOV terms are the least influential factor (12% short queries, 4% long queries)
• The proposed dictionary-based, parallel-corpora-free methods offer significant improvement
  • Co-occurrence-based translation selection algorithm
  • Cognate detection method

SLIDE 44

References I

Darwish, K. and D.W. Oard. 2003. Probabilistic structured query methods. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 338–344. ACM.

Gao, Jianfeng, Ming Zhou, Jian-Yun Nie, Hongzhao He, and Weijun Chen. 2002. Resolving query ambiguity using a decaying co-occurrence model and syntactic dependence relations. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 183–190. ACM.

SLIDE 45

References II

Hull, D.A. and G. Grefenstette. 1996. Querying across languages: a dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 49–57. ACM.

McCarley, J. Scott. 1999. Should we translate the documents or the queries in cross-language information retrieval? In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 208–214. Association for Computational Linguistics.

SLIDE 46

References III

Monz, C. and B.J. Dorr. 2005. Iterative translation disambiguation for cross-language information retrieval. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 520–527. ACM.

Oard, Douglas W. 1998. A comparative study of query and document translation for cross-language information retrieval. In AMTA ’98: Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup, pages 472–483, London, UK. Springer-Verlag.

Pirkola, A. 1998. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 55–63.

SLIDE 47

References IV

Saralegi, Xabier and Maddalen Lopez de Lacalle. 2010. Estimating translation probabilities from the web for structured queries on CLIR. In Proceedings of the 32nd European Conference on Information Retrieval, pages 586–589. Springer.

Saralegi, Xabier and Maddalen Lopez de Lacalle. 2009. Comparing different approaches to treat translation ambiguity in CLIR: Structured queries vs. target co-occurrences based selection. In Proceedings of the 6th international workshop on Text-based Information Retrieval, pages 398–404.

Wang, Jianqiang and Douglas W. Oard. 2003. Combining query translation and document translation in cross-language retrieval. In Proceedings of the 4th Workshop of the Cross-Language Evaluation Forum, pages 108–121. Springer Berlin / Heidelberg.

SLIDE 48

References V

Xu, Jinxi, Ralph Weischedel, and Chanh Nguyen. 2001. Evaluating a probabilistic model for cross-lingual information retrieval. In SIGIR ’01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 105–110, New York, NY, USA. ACM.
