

SLIDE 1

Vector Space Models for Phrase-based Machine Translation

Tamer Alkhouli, Andreas Guta, and Hermann Ney

<surname>@cs.rwth-aachen.de
Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
Doha, Qatar, October 25, 2014
Human Language Technology and Pattern Recognition
Chair of Computer Science 6, Computer Science Department
RWTH Aachen University, Germany

Alkhouli et al.: Vector Space Models for Phrase-based MT 1 / 15 SSST-8: October 25, 2014

SLIDE 2

Outline

◮ Introduction and Motivation
◮ From Words to Phrases
◮ Semantic Phrase Features
◮ Paraphrasing and Out-of-vocabulary Reduction
◮ Experiments
◮ Conclusion

SLIDE 3

Introduction and Motivation

◮ Goal: improve phrase-based translation (PBT) using vector space models
◮ Categorical word representations: no information about word identities
◮ Embedding words in a vector space allows such encoding
⊲ geometric arrangements in the vector space
⊲ enables information retrieval approaches using a similarity measure
◮ Distributional hypothesis (Harris, 1954): words occurring in similar contexts have similar meanings
◮ Word representations based on:
⊲ co-occurrence counts (Lund and Burgess, 1996; Landauer and Dumais, 1997) → dimensionality reduction (e.g. SVD)
⊲ neural networks (NN) → input/output weights
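The count-based route mentioned above can be illustrated with a minimal sketch (toy data, not from the talk): build a word–word co-occurrence matrix and reduce its dimensionality with SVD, in the spirit of LSA (Landauer and Dumais, 1997).

```python
# Count-based word vectors: co-occurrence matrix + SVD (toy sketch).
import numpy as np
from collections import Counter

def cooccurrence_svd(sentences, window=2, dim=2):
    """Return (vocabulary, low-dimensional word vectors)."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[idx[w], idx[s[j]]] += 1
    M = np.zeros((len(vocab), len(vocab)))
    for (a, b), c in counts.items():
        M[a, b] = c
    U, S, _ = np.linalg.svd(M)
    return vocab, U[:, :dim] * S[:dim]  # rows are word vectors
```

Words that share contexts (e.g. "cat" and "dog" between "the" and "sat") end up with nearby rows, which is exactly the geometric arrangement the similarity measures below rely on.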

SLIDE 4

From Words to Phrases

◮ How to learn phrase vectors?
◮ Phrase representations
⊲ decompositional approach: resort to word constituents (Gao et al., 2013; Chen et al., 2010)
⊲ atomic treatment of phrases (Mikolov et al., 2013b; Hu et al., 2014)
  • advantage: reuse word-level methods
  • challenge: data sparsity

◮ This work: NN-based atomic phrase representations

SLIDE 5

Phrase Corpus

◮ Phrase corpus used to learn phrase vectors
◮ Corpus built using a multi-pass greedy algorithm
⊲ initialization: phrases have length 1
⊲ join phrases forwards, backwards, or do not join
⊲ use bilingual phrase table scores to make the decision:

score(f̃) = max_ẽ Σ_{l=1}^{L} w_l g_l(f̃, ẽ)

  • (f̃, ẽ): bilingual phrase pair
  • g_l(f̃, ẽ): l-th feature of the bilingual phrase pair
  • w_l: l-th feature weight

◮ 2 phrasal and 2 lexical features with manually tuned weights
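The multi-pass greedy construction can be sketched as follows. The exact joining criterion is an assumption (here: join two neighbours only if the joined phrase outscores both parts), since the slide only states that the phrase-table score drives the decision; `score` stands in for max_ẽ Σ_l w_l g_l(f̃, ẽ).

```python
# Hypothetical sketch of the multi-pass greedy phrase-corpus construction.
def greedy_pass(phrases, score):
    """One pass over a sentence segmented into phrases (token tuples):
    join adjacent phrases when the joined phrase scores higher."""
    out, i = [], 0
    while i < len(phrases):
        if i + 1 < len(phrases):
            joined = phrases[i] + phrases[i + 1]
            # assumed criterion: joined phrase must beat both parts
            if score(joined) > max(score(phrases[i]), score(phrases[i + 1])):
                out.append(joined)
                i += 2
                continue
        out.append(phrases[i])
        i += 1
    return out

def build_phrase_corpus(sentence, score, passes=5):
    phrases = [(w,) for w in sentence]  # initialization: length-1 phrases
    for _ in range(passes):
        phrases = greedy_pass(phrases, score)
    return phrases
```

With a toy score in which only ("new", "york") is a strong phrase-table entry, `build_phrase_corpus(["in", "new", "york", "city"], score)` keeps "in" and "city" as single words and fuses "new york" into one phrase.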

SLIDE 6

Semantic Phrase Feature

◮ Add a vector-based feature to the log-linear framework of PBT: h( ˜ f, ˜ e) = sim(Wx ˜

f,z˜ e)

⊲ x ˜

f: S-dimensional source phrase vector

⊲ z˜

e: T-dimensional target phrase vector

⊲ W: T ×S linear projection matrix (Mikolov et al. 2013a) ⊲ sim: similarity function (e.g. cosine similarity) ◮ Learn W using stochastic gradient descent min

W N

n=1

||Wxn −zn||2 where (xn,zn) = (x ˜

f,z˜ e) such that:

˜ e = argmax

˜ e′

  • L

l=1

wlgl( ˜ f, ˜ e′)

  • Alkhouli et al.: Vector Space Models for Phrase-based MT

6 / 15 SSST-8: October 25, 2014
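The projection learning on this slide can be sketched in a few lines of numpy; the toy dimensions, learning rate, and epoch count are assumptions, not the paper's settings.

```python
# Sketch: learn W by SGD on sum_n ||W x_n - z_n||^2, then use it in the
# cosine-similarity phrase feature. Hyperparameters here are illustrative.
import numpy as np

def learn_projection(X, Z, epochs=200, lr=0.01, seed=0):
    """X: N x S source phrase vectors, Z: N x T target phrase vectors.
    Returns the T x S projection matrix W."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(Z.shape[1], X.shape[1]))
    for _ in range(epochs):
        for n in rng.permutation(len(X)):
            err = W @ X[n] - Z[n]          # residual of this training pair
            W -= lr * np.outer(err, X[n])  # gradient step (up to a factor of 2)
    return W

def semantic_feature(W, x, z):
    """h(f, e): cosine similarity between the projected source vector and z."""
    p = W @ x
    return float(p @ z / (np.linalg.norm(p) * np.linalg.norm(z)))
```

On synthetic data where the target vectors really are a linear map of the source vectors, the learned W recovers that map and the cosine feature for matching pairs approaches 1.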

SLIDE 7

Out-of-vocabulary Reduction

◮ Introduce new phrase pairs to the phrase table
◮ Paraphrase f̃ with |f̃| = 1
⊲ reduce out-of-vocabulary (OOV) words
⊲ use word vectors
◮ k-nearest-neighbor search using a similarity measure
◮ Additional phrase table feature
⊲ similarity measured between a phrase and its paraphrase
⊲ original features copied from the original phrase pair
◮ Avoid interfering with existing phrase entries → limit paraphrasing to source words unseen in the parallel data
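The k-nearest-neighbor retrieval step can be sketched as below; the toy words and vectors are invented for illustration, and cosine similarity is used as the similarity measure (the returned score is the kind of value that would be added as the extra phrase-table feature).

```python
# Sketch of OOV paraphrasing: retrieve the k in-vocabulary words closest
# to the OOV word's vector under cosine similarity. Toy data only.
import numpy as np

def knn_paraphrases(oov_vec, vocab_vecs, vocab_words, k=2):
    """Return (word, cosine similarity) for the k nearest neighbors."""
    V = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
    q = oov_vec / np.linalg.norm(oov_vec)
    sims = V @ q
    top = np.argsort(-sims)[:k]
    return [(vocab_words[i], float(sims[i])) for i in top]
```

Each retrieved neighbor's phrase-table entries can then be copied for the OOV word, with the similarity attached as the new feature.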

SLIDE 8

Experiments

◮ IWSLT 2013 Arabic→English task
◮ Domain: TED lectures

                        TED                    UN
                  Arabic   English      Arabic   English
Sentences             147K                   8M
Running Words      3M        3M         228M     226M

IWSLT 2013 Arabic and English corpora statistics

SLIDE 9

Experiments

◮ Phrase vectors trained using word2vec¹
⊲ simple neural network model without hidden layers
⊲ use frequent phrases only
◮ Vector dimension: Arabic: 800, English: 200
◮ 5 passes for phrase corpus construction

¹ http://code.google.com/p/word2vec/
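One common way to feed atomic phrases to word2vec (an assumption here, mirroring word2vec's own phrase handling rather than a detail stated on the slide) is to rewrite each frequent phrase as a single underscore-joined token, backing off to the constituent words for rare phrases:

```python
# Sketch: turn a phrase-segmented corpus into word2vec training sentences.
from collections import Counter

def to_word2vec_corpus(phrase_sentences, min_count=2):
    """phrase_sentences: list of sentences, each a list of phrases
    (token tuples). Frequent multi-word phrases become atomic tokens
    like 'new_york'; rare ones fall back to their words."""
    freq = Counter(p for s in phrase_sentences for p in s)
    out = []
    for s in phrase_sentences:
        toks = []
        for p in s:
            if len(p) > 1 and freq[p] >= min_count:
                toks.append("_".join(p))  # atomic phrase token
            else:
                toks.extend(p)            # back off to single words
        out.append(toks)
    return out
```

The resulting sentences can then be passed to any word2vec implementation, which treats each joined token as a single vocabulary item, matching the "use frequent phrases only" restriction above.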

SLIDE 10

Experiments

TED+UN                            Arabic   English
# tokens               words       231M     229M
                       phrases     126M     115M
vocabulary             words       0.5M     0.4M
                       phrases     5.8M     5.3M
# vectors (word2vec    words       134K     123K
  vocabulary)          phrases     934K     913K

Corpus and vector statistics for IWSLT 2013 Arabic→English

SLIDE 11

Experiments

◮ Standard PBT baseline features:
⊲ 2 phrasal features
⊲ 2 lexical features
⊲ 3 binary count features
⊲ 6 hierarchical reordering features
⊲ 4-gram mixture LM
⊲ jump distortion
⊲ phrase and word penalties
◮ In-domain baseline data: TED
◮ Full baseline data: TED+UN, domain-adapted phrase table

SLIDE 12

Experiments

◮ Word vectors used for paraphrasing
◮ Reduction of OOV rate: 5.4% → 3.9%

Arabic                       dev    eval13
# OOV   TED                  185       254
        TED+paraphrasing     150       183
Vocabulary                 3,714     4,734

OOV reduction for IWSLT 2013 Arabic→English

SLIDE 13

Experiments

◮ Improvements over the TED baseline
⊲ semantic feature: 0.4% BLEU and 0.7% TER
⊲ paraphrasing: 0.6% BLEU and 0.7% TER

                         dev2010              eval2013
system               BLEU [%]  TER [%]    BLEU [%]  TER [%]
TED                    29.1     50.5        28.9     52.5
 + semantic feature    29.1    †50.1       †29.3    †51.8
 + paraphrasing        29.2    †50.2       †29.5    †51.8
 + both                29.2     50.2       †29.4    †51.8
TED+UN                 29.7     49.3        30.5     50.5
 + semantic feature    29.8     49.2        30.2     50.7

Semantic feature and paraphrasing results for IWSLT 2013 Arabic → English.

◮ †: statistical significance with p < 0.01

SLIDE 14

Conclusion

◮ Improved end-to-end translation using vector space models
⊲ semantic phrase features using phrase vectors
⊲ paraphrasing using word vectors
◮ Exploit monolingual data for OOV reduction
◮ Proposed methods helpful for resource-limited tasks
◮ BLEU and TER may underestimate the contribution of semantic models

SLIDE 15

Thank you for your attention

Tamer Alkhouli Andreas Guta

<surname>@cs.rwth-aachen.de http://www-i6.informatik.rwth-aachen.de/
