 
Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR
Xabier Saralegi, Maddalen López de Lacalle
R&D, Elhuyar Foundation
7th International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, 2010/05/20

Outline
1 Introduction
2 Related work
  Different Strategies
  CLIR Frameworks based on query translation
3 Proposed query translation method
  Experimental setup
  Treatment of OOV words
  MWE
  Translation Selection
4 Evaluation
5 Conclusions

Introduction: Motivation
CLIR = IR + language barrier
Most CLIR technology is based on Machine Translation Systems (MTS) or Parallel Corpora (PC)
MTS and PC resources are expensive or scarce for most language pairs, especially for small languages
Bilingual dictionaries are easier to obtain

Introduction: Bilingual Dictionaries
Problems: translation ambiguity, Out-of-Vocabulary (OOV) words, Multi-Word Expressions (MWE)
Example of translation ambiguity, Query 80:
  EU: “G7 gailurrean Napolin Errusiak jokatutako papera”
  EN: “role played by Russia in the G7 summit in Naples in 1994”
  papera: paper, role, ...

Introduction: Bilingual Dictionaries
Problems: translation ambiguity, Out-of-Vocabulary words, Multi-Word Expressions
Example of an Out-of-Vocabulary word, Query 46:
  EU: “Irakeko bahitura”
  EN: “Embargo on Iraq”

Introduction: Bilingual Dictionaries
Problems: translation ambiguity, Out-of-Vocabulary words, Multi-Word Expressions
Example of a Multi-Word Expression, Query 47:
  EU: “Errusiarren esku hartzea Txetxenian”
  EN: “Russian intervention in Chechnya”

Introduction: Objectives
Objectives of this work:
  To analyse how each problem affects the retrieval performance of a dictionary-based Basque-English CLIR system
  To evaluate methods not based on parallel corpora for treating those problems

Different Strategies
Translate the collection or the queries?
  Collection: richer context for translation selection (Oard, 1998)
  Query: most studied because it is more scalable (Hull and Grefenstette, 1996)
  Best results: translating both and merging the corresponding rankings (McCarley, 1999; Wang and Oard, 2003)

CLIR Frameworks based on query translation
(a) Post-translation Relevance Model (PTRM)
  The query is translated independently, and then a relevance model is applied
  Query terms are translated with PC or a dictionary
  PC: solves translation selection
  Dictionary: translation selection is solved with a co-occurrence-based method (Monz and Dorr, 2005; Gao et al., 2002)
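
To make the idea of co-occurrence-based translation selection concrete, here is a minimal Python sketch. It is a simplified greedy stand-in, not the algorithm of Monz and Dorr (2005) or Gao et al. (2002): each source term keeps the dictionary candidate that co-occurs most often, at document level, with the candidates of the other query terms. The toy collection, the candidate lists, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count, for every unordered word pair, the documents containing both words."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def select_translations(candidates, pairs):
    """Pick one candidate per source term (greedy simplification, an assumption).

    candidates: one list of target-language candidates per source term.
    pairs: co-occurrence counts keyed by alphabetically sorted word pairs.
    """
    selected = []
    for i, options in enumerate(candidates):
        # Candidates of all the other source terms act as context.
        others = [t for j, opts in enumerate(candidates) if j != i for t in opts]

        def support(option):
            return sum(pairs[tuple(sorted((option, o)))] for o in others if o != option)

        selected.append(max(options, key=support))
    return selected

# Hypothetical target-language collection (already tokenized).
docs = [
    ["russia", "role", "summit"],
    ["role", "summit", "naples"],
    ["paper", "mill"],
]
# Hypothetical dictionary candidates for three query terms.
candidates = [["paper", "role"], ["summit"], ["russia"]]
pairs = cooccurrence_counts(docs)
chosen = select_translations(candidates, pairs)
```

In this toy setting "role" is preferred over "paper" because it appears in the same documents as "summit" and "russia", which mirrors the "papera" example of Query 80.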

CLIR Frameworks based on query translation
(b) Cross-lingual probabilistic relevance models (CLPRM)
  The translation process is included in the relevance model
  Query terms are translated with PC or a dictionary
  All candidates are treated as a single token (Pirkola, 1998), or weighted with weights mined from PC (Darwish and Oard, 2003) or from comparable corpora (Saralegi and Lopez de Lacalle, 2010)

\[ TF_j(s_i) = \sum_{\{k \mid D_k \in T(s_i)\}} TF_j(D_k) \]

\[ DF(Q_i) = \Bigl| \bigcup_{\{k \mid D_k \in T(Q_i)\}} \{ d \mid D_k \in d \} \Bigr| \]
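
The single-token treatment of translation candidates can be sketched in a few lines of Python. The sketch reads T(s_i) as the set of dictionary translations of a source term, which is our interpretation of the formulas on this slide. The term frequency of the source term in a document is the sum of the frequencies of its translations, and its document frequency is the number of documents containing at least one translation. The toy documents and function names are hypothetical.

```python
from collections import Counter

def structured_tf(doc_tokens, translations):
    """TF of a source term in one document: sum of the TFs of its translations."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in set(translations))

def structured_df(docs, translations):
    """DF of a source term: documents containing at least one of its translations."""
    wanted = set(translations)
    return sum(1 for doc in docs if wanted & set(doc))

# Hypothetical target-language collection (already tokenized).
docs = [
    ["russia", "role", "summit"],
    ["paper", "mill", "paper"],
    ["naples", "summit"],
]
# Hypothetical dictionary candidates for the Basque term "papera".
translations = ["paper", "role"]

tf_first_doc = structured_tf(docs[0], translations)
tf_second_doc = structured_tf(docs[1], translations)
df = structured_df(docs, translations)
```

Because all candidates are pooled, a wrong candidate such as "paper" still contributes to the score, which is the cost of skipping explicit translation selection.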

CLIR Frameworks based on query translation
(c) Cross-lingual language models (CLLM)
  The translation process is included in the relevance model
  Query terms are translated with PC or a dictionary
  Translation probabilities are included in a probabilistic model (Xu, Weischedel, and Nguyen, 2001)

\[ P(Q_s \mid D_t) = \prod_{w \in Q_s} \Bigl[ (1 - \lambda)\, P(w \mid G_s) + \lambda \sum_{t \in D_t} P(t \mid D_t)\, P(w \mid t) \Bigr] \]
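
A minimal Python sketch of this scoring formula follows. It makes several assumptions that are not on the slide: P(t | D_t) is estimated by maximum likelihood from the document, P(w | G_s) and P(w | t) are supplied as plain lookup tables, and λ = 0.5 is an arbitrary illustrative value. All probabilities in the example are made up.

```python
from collections import Counter

def cllm_score(query, doc_tokens, p_general, p_trans, lam=0.5):
    """Score P(Q_s | D_t) for a source-language query and a target-language document.

    p_general: P(w | G_s), background probability of each source word (assumed given).
    p_trans:   P(w | t), keyed by (source word, target word) (assumed given).
    lam:       interpolation weight lambda (illustrative value).
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    score = 1.0
    for w in query:
        # Sum over distinct target words t in the document of P(t | D_t) * P(w | t),
        # with P(t | D_t) estimated as count / length (maximum likelihood, an assumption).
        translated = sum((c / n) * p_trans.get((w, t), 0.0) for t, c in counts.items())
        score *= (1 - lam) * p_general.get(w, 0.0) + lam * translated
    return score

# Hypothetical toy data.
query = ["papera"]
doc = ["role", "russia", "summit", "role"]
p_general = {"papera": 0.01}
p_trans = {("papera", "role"): 0.5, ("papera", "paper"): 0.5}
score = cllm_score(query, doc, p_general, p_trans)
```

With these numbers the document side contributes (2/4) × 0.5 = 0.25, so the score is 0.5 × 0.01 + 0.5 × 0.25 = 0.13.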

CLIR Frameworks based on query translation
CLLM (c) performs better than CLPRM (b) when PC are provided (Xu, Weischedel, and Nguyen, 2001)
CLPRM (b) performs better than dictionary-based PTRM (a) with long queries (Saralegi and Lopez de Lacalle, 2009)
PTRM (a) is independent of the retrieval model

Proposed query translation method
Dictionary-based and parallel-corpora-free PTRM:
  OOV: cognate detection on the target collection
  MWE: matching and translating by means of MWE lists
  Translation selection: co-occurrence-based method over the target collection (Monz and Dorr, 2005)
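
The slide does not say how cognates are detected, so the following Python sketch only illustrates one common option: ranking the vocabulary of the target collection by longest common subsequence ratio (LCSR) against the OOV word and accepting the best match above a threshold. The LCSR measure, the 0.7 threshold, and the toy vocabulary are assumptions for illustration, not the authors' settings.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcsr(a, b):
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))

def best_cognate(oov_word, target_vocabulary, threshold=0.7):
    """Return the most similar target word, or None if nothing reaches the threshold."""
    best = max(target_vocabulary, key=lambda t: lcsr(oov_word, t), default=None)
    if best is None or lcsr(oov_word, best) < threshold:
        return None
    return best

# Hypothetical vocabulary extracted from the target collection.
vocabulary = ["russia", "summit", "naples"]
match = best_cognate("errusia", vocabulary)
no_match = best_cognate("xyz", vocabulary)
```

Here "errusia" and "russia" share the subsequence "rusia" (5 of 7 characters), which clears the illustrative threshold, while a word with no similar entry in the vocabulary is left untranslated.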

Experimental setup
Topics and collections:
  Development: CLEF topics 41-90, LA Times 94 collection, and the corresponding HRJ (Human Relevance Judgements)
  Test: CLEF topics 250-350, LA Times 94 and Glasgow Herald 95 collections, and the corresponding HRJ
Retrieval model: Indri
Dictionaries:
  Morris Basque/English dictionary: 77,864 entries and 28,874
  Euskalterm terminology bank: 72,184 entries and 56,745 unique Basque terms