Neural Machine Translation II Refinements Philipp Koehn 17 October - PowerPoint PPT Presentation

Neural Machine Translation II Refinements Philipp Koehn 17 October 2017 Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Neural Machine Translation 1 <s> the house is big . </s> Input Word Embeddings Left-to-Right Recurrent NN Right-to-Left Recurrent NN Attention Input Context Hidden State Output Word Predictions Error Given Output Words Output Word Embedding <s> das Haus ist groß , </s> Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Neural Machine Translation 2 • Last lecture: architecture of attentional sequence-to-sequence neural model • Today: practical considerations and refinements – ensembling – handling large vocabularies – using monolingual data – deep models – alignment and coverage – use of linguistic annotation – multiple language pairs Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

3 ensembling Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Ensembling 4 • Train multiple models • Say, by different random initializations • Or, by using model dumps from earlier iterations (most recent, or interim models with highest validation score) Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Decoding with Single Model 5 the y i Ey i Context c i-1 c i cat this State s i-1 s i of Word t i-1 t i fish Prediction there Selected y i-1 y i Word dog Embedding these Ey i-1 Ey i Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Combine Predictions 6 Model Model Model Model Model 1 2 3 4 Average the .54 .52 .12 .29 .37 cat .01 .02 .33 .03 .10 this .01 .11 .06 .14 .08 of .00 .00 .01 .08 .02 fish .00 .12 .15 .00 .07 there .03 .03 .00 .07 .03 dog .00 .00 .05 .20 .06 these .05 .09 .09 .00 .00 Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Ensembling 7 • Surprisingly reliable method in machine learning • Long history, many variants: bagging, ensemble, model averaging, system combination, ... • Works because errors are random, but correct decisions unique Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Right-to-Left Inference 8 • Neural machine translation generates words right to left (L2R) the → cat → is → in → the → bag → . • But it could also generate them right to left (R2L) the ← cat ← is ← in ← the ← bag ← . Obligatory notice: Some languages (Arabic, Hebrew, ...) have writing systems that are right-to-left, so the use of ”right-to-left” is not precise here. Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Right-to-Left Reranking 9 • Train both L2R and R2L model • Score sentences with both ⇒ use both left and right context during translation • Only possible once full sentence produced → re-ranking 1. generate n-best list with L2R model 2. score candidates in n-best list with R2L model 3. chose translation with best average score Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

10 large vocabularies Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Zipf’s Law: Many Rare Words 11 frequency rank frequency × rank = constant Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Many Problems 12 • Sparse data – words that occur once or twice have unreliable statistics • Computation cost – input word embedding matrix: | V | × 1000 – outout word prediction matrix: 1000 × | V | Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Some Causes for Large Vocabularies 13 • Morphology tweet, tweets, tweeted, tweeting, retweet, ... → morphological analysis? • Compounding homework, website, ... → compound splitting? • Names Netanyahu, Jones, Macron, Hoboken, ... → transliteration? ⇒ Breaking up words into subwords may be a good idea Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Byte Pair Encoding 14 • Start by breaking up words into characters t h e � f a t � c a t � i s � i n � t h e � t h i n � b a g • Merge frequent pairs t h → th th e � f a t � c a t � i s � i n � th e � th i n � b a g a t → at th e � f at � c at � i s � i n � th e � th i n � b a g i n → in th e � f at � c at � i s � in � th e � th in � b a g th e → the the � f at � c at � i s � in � the � th in � b a g • Each merge operation increases the vocabulary size – starting with the size of the character set (maybe 100 for Latin script) – stopping at, say, 50,000 Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Example: 49,500 BPE Operations 15 Obama receives Net@@ any@@ ahu the relationship between Obama and Net@@ any@@ ahu is not exactly friendly . the two wanted to talk about the implementation of the international agreement and about Teheran ’s destabil@@ ising activities in the Middle East . the meeting was also planned to cover the conflict with the Palestinians and the disputed two state solution . relations between Obama and Net@@ any@@ ahu have been stra@@ ined for years . Washington critic@@ ises the continuous building of settlements in Israel and acc@@ uses Net@@ any@@ ahu of a lack of initiative in the peace process . the relationship between the two has further deteriorated because of the deal that Obama negotiated on Iran ’s atomic programme . in March , at the invitation of the Republic@@ ans , Net@@ any@@ ahu made a controversial speech to the US Congress , which was partly seen as an aff@@ ront to Obama . the speech had not been agreed with Obama , who had rejected a meeting with reference to the election that was at that time im@@ pending in Israel . Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

16 using monolingual data Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Traditional View 17 • Two core objectives for translation Adequacy Fluency meaning of source and target match target is well-formed translation model language model parallel data monolingual data • Language model is key to good performance in statistical models • But: current neural translation models only trained on parallel data Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Integrating a Language Model 18 • Integrating a language model into neural architecture – word prediction informed by translation model and language model – gated unit that decides balance • Use of language model in decoding – train language model in isolation – add language model score during inference (similar to ensembling) • Proper balance between models (amount of training data, weights) unclear Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Backtranslation 19 • No changes to model architecture reverse system • Create synthetic parallel data – train a system in reverse direction – translate target-side monolingual data into source language – add as additional parallel data • Simple, yet effective final system Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

20 deeper models Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Deeper Models 21 • Encoder and decoder are recurrent neural networks • We can add additional layers for each step • Recall shallow and deep language models Input Hidden Input Layer 1 Hidden Shallow Deep Hidden Layer Layer 2 Hidden Output Layer 3 Output • Adding residual connections (short-cuts through deep layers) help Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Deep Decoder 22 • Two ways of adding layers – deep transitions: several layers on path to output – deeply stacking recurrent neural networks • Why not both? Context Decoder State: Stack 1, Transition 1 Decoder State: Stack 1, Transition 2 Decoder State: Stack 2, Transition 1 Decoder State: Stack 2, Transition 2 Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Deep Encoder 23 • Previously proposed encoder already has 2 layers – left-to-right recurrent network, to encode left context – right-to-left recurrent network, to encode right context ⇒ Third way of adding layers Input Word Embedding Encoder Layer 1: L2R Encoder Layer 2: R2L Encoder Layer 3: L2R Encoder Layer 4: R2L Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Reality Check: Edinburgh WMT 2017 24 Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

25 alignment and coverage Philipp Koehn Machine Translation: Neural Machine Translation II – Refinements 17 October 2017

Neural Machine Translation II Refinements Philipp Koehn 17 October - PowerPoint PPT Presentation

Neural Machine Translation II Refinements Philipp Koehn 17 October 2017 Philipp Koehn Machine Translation: Neural Machine Translation II Refinements 17 October 2017 Neural Machine Translation 1 <s> the house is big .

Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING 1. Refinements on CP Refinements: Modeling

Neural Machine Translation Gongbo Tang 8 October 2018 Outline Neural Machine Translation 1

Introduction to Neural Machine Translation Gongbo Tang 16 September 2019 Outline Why Neural

Neural Machine Translation Philipp Koehn 6 October 2020 Philipp Koehn Machine Translation:

Machine Translation 12: (Non-neural) Statistical Machine Translation Rico Sennrich University of

Statistical Machine Translation Nadir Durrani 21-November-2014 Machine Translation

Convolutional over Recurrent Encoder for Neural Machine Translation Praveen Dakwale and Christof

Adaptive Multi-pass Decoder for Neural Machine Translation EMNLP 2018

Introd u ction to machine translation MAC H IN E TR AN SL ATION IN P YTH ON Th u shan

Machine Translation Machine Translation February 13, 2008 Andreas Eisele UdS Computerlinguistik

Neural Machine Translation Decoding Philipp Koehn 8 October 2020 Philipp Koehn Machine

11-731 Machine Translation Speech 2 Speech Translation Speech Translation Three part systems

Machine Translation Philipp Koehn 28 April 2020 Philipp Koehn Artificial Intelligence: Machine

Semi-supervised Learning for Neural Machine Translation Yong Cheng joint work with Wei Xu,

Statistical Machine Translation Statistical Machine Translation p Lecture 2 Theory and Praxis of

Computer Aided Translation Philipp Koehn 30 April 2015 Philipp Koehn Machine Translation:

4CSLL5 Advanced Computational Linguistics Introduction Phrase Based Machine Trans Martin

Syntax-Directed Translation for Top-Down Parsing 1 Midterm next week during class online

Leveling for Non-Volatile Main Memories Haikun Liu , Yuanyuan Ye, Xiaofei Liao, Hai Jin, Yu Zhang,

23 Advanced Topics 5: Multi-lingual Models Up until now, we have assumed that in the case of

Evaluating MT Quality Evaluation of Why do we want to do it? Translation Quality - Want to

ANSI TAG 37 Committee F43 Language Services and Products Interagency Language Roundtable

61A Lecture 35 Quiz 4 (SQL) released on Tuesday 4/28 is due Thursday 4/30 @ 11:59pm Friday,

TPC warm readout with the RCE system Matt Graham, SLAC protoDUNE DAQ Review November 3, 2016