End-to-End Neural CLIR by Sharing Representation LILY Spring 2018 - PowerPoint PPT Presentation

Aug 26, 2023

End-to-End Neural CLIR by Sharing Representation LILY Spring 2018 Workshop Rui Zhang
Cross-lingual Information Retrieval (CLIR)
Information Retrieval
● Retrieve relevant documents from a corpus for a given user query.
● e.g., Google Search
● Usually monolingual, i.e., documents and queries are in the same language.
● TF-IDF, BM25
Cross-lingual Information Retrieval (CLIR)
● The documents are in a language different from that of the user's query.
● e.g., an investor wishes to monitor consumer sentiment from tweets around the world.
Methods for CLIR
Translation-based approach
● A pipeline of two components: translation + monolingual IR
● Can be further divided into document translation and query translation
e.g., the query is in English and the documents are in Swahili
● Query translation from English to Swahili using a bilingual dictionary.
● Document translation from Swahili to English using a machine translation system.
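The query-translation pipeline above can be sketched roughly as follows. This is a minimal illustration, not the system from the talk: the dictionary `TOY_DICT`, the helper names, and the simple term-count scoring (standing in for a real monolingual ranker such as BM25) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy English-to-Swahili dictionary (assumption for illustration);
# a real system would need a far more comprehensive resource.
TOY_DICT = {
    "water": ["maji"],
    "food": ["chakula"],
    "school": ["shule"],
}

def translate_query(query, dictionary):
    """Step 1: replace each query term with its dictionary translations.
    Terms missing from the dictionary are silently dropped."""
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, []))
    return translated

def overlap_score(query_terms, document):
    """Step 2: monolingual matching, here a plain count of query-term occurrences."""
    counts = Counter(document.lower().split())
    return sum(counts[t] for t in query_terms)

def rank(query, documents, dictionary):
    """Return document indices ordered by descending score (ties keep input order)."""
    terms = translate_query(query, dictionary)
    scored = [(overlap_score(terms, d), i) for i, d in enumerate(documents)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

docs = ["shule ina maji", "chakula na maji na maji", "habari za leo"]
ranking = rank("water price", docs, TOY_DICT)
```

Note that "price" is not in the toy dictionary and is dropped during translation, which shows how an error in the first stage carries over into retrieval.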
Methods for CLIR
The translation-based approach is difficult
● Query Translation
○ Relies on a comprehensive bilingual dictionary
○ Hard to translate short text queries and phrases
● Document Translation
○ Needs a reliable machine translation system
● Especially for low-resource languages
Neural (Monolingual) Information Retrieval
Many successful neural IR systems have emerged:
● DUET (Mitra et al., 2017)
● PACRR (Hui et al., 2017)
● DSSM (Huang et al., 2013)
● DESM (Mitra et al., 2016)
● MatchPyramid (Pang et al., 2016)
● DRMM (Guo et al., 2016)
...
But they are evaluated in monolingual IR settings.
Research Goal and Challenges
Goal: Build an end-to-end neural CLIR system that
● models local information
○ unigram term match
○ position-dependent information such as proximity and term positions
● models global information
○ semantic matching in a distributed representation space
● learns directly from (query, document, relevance) supervision
● performs better than the pipeline translation-based approach because it avoids cascading errors
Research Goal and Challenges
Challenges
● How can we capture local information and global information when the query language and the document language are different?
● How can we use and learn a shared representation for multiple languages?
Proposed Method
1) Use multilingual word embeddings to build a similarity matrix.
● This models local information.
MatchPyramid (Pang et al., 2016)
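As a rough sketch of the similarity-matrix idea, assuming query and document words already live in one aligned multilingual embedding space: each cell holds the cosine similarity between one query term and one document term. The 3-dimensional vectors in `EMB` are made-up values for illustration, not real embeddings, and the function names are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for aligned multilingual embeddings
# (English and Swahili words in one shared space).
EMB = {
    "water": [0.9, 0.1, 0.0],
    "school": [0.0, 0.2, 0.9],
    "maji": [0.8, 0.2, 0.1],
    "shule": [0.1, 0.1, 0.95],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(query_terms, doc_terms, emb):
    """Rows follow query term order, columns follow document term order.
    Assumes every term has an embedding; out-of-vocabulary handling is omitted."""
    return [[cosine(emb[q], emb[d]) for d in doc_terms] for q in query_terms]

sim = similarity_matrix(["water", "school"], ["shule", "maji"], EMB)
```

Because rows and columns keep term order, a model reading this matrix can still see where in the document each match occurs, which is the position-dependent local signal.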
Multilingual Word Embedding https://github.com/facebookresearch/MUSE
Proposed Method
2) Use monolingual or multilingual embeddings to learn a shared distributed representation.
● This models global information.
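A very simplified stand-in for global matching is sketched below: the query and the document are each reduced to a single vector and compared in the shared space. Averaging fixed word vectors is a substitution made for illustration; the proposed method learns the representation, and the toy vectors in `EMB` and all function names are assumptions.

```python
import math

# Hypothetical toy vectors in a shared cross-lingual space (illustration only).
EMB = {
    "water": [0.9, 0.1, 0.0],
    "maji": [0.8, 0.2, 0.1],
    "shule": [0.1, 0.1, 0.95],
}

def mean_vector(terms, emb):
    """Average the embeddings of known terms; None if no term is known."""
    vecs = [emb[t] for t in terms if t in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def global_score(query_terms, doc_terms, emb):
    """Compare one pooled query vector with one pooled document vector."""
    q = mean_vector(query_terms, emb)
    d = mean_vector(doc_terms, emb)
    if q is None or d is None:
        return 0.0
    return cosine(q, d)

related = global_score(["water"], ["maji", "maji"], EMB)
unrelated = global_score(["water"], ["shule"], EMB)
```

Unlike the similarity matrix, this score discards term positions, so it captures only overall semantic closeness.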
DUET for CLIR - Local Model
DUET for CLIR - Global Model
Data Sets
WikiCLIR (Sasaki et al., 2018)
● Automatically created from parallel wiki pages
● Large-scale, 25 languages
Standard CLIR tasks
● CLEF
● NTCIR
● TREC
