SLIDE 1

Machine Translation

Philipp Koehn 28 April 2020

Philipp Koehn Artificial Intelligence: Machine Translation 28 April 2020

SLIDE 2

Machine Translation: French (2012)

SLIDE 3

Machine Translation: French (2020)

SLIDE 4

No Single Right Answer

Israeli officials are responsible for airport security.
Israel is in charge of the security at this airport.
The security work for this airport is the responsibility of the Israel government.
Israeli side was in charge of the security of this airport.
Israel is responsible for the airport’s security.
Israel is responsible for safety work at this airport.
Israel presides over the security of the airport.
Israel took charge of the airport security.
The safety of this airport is taken charge of by Israel.
This airport’s security is the responsibility of the Israeli security officials.

SLIDE 5

A Clear Plan

Source Target

Lexical Transfer

Interlingua

SLIDE 6

A Clear Plan

Source Target

Lexical Transfer Syntactic Transfer

Interlingua

Analysis Generation

SLIDE 7

A Clear Plan

Source Target

Lexical Transfer Syntactic Transfer Semantic Transfer

Interlingua

Analysis Generation

SLIDE 8

A Clear Plan

Source Target

Lexical Transfer Syntactic Transfer Semantic Transfer

Interlingua

Analysis Generation

SLIDE 9

Learning from Data

[Diagram: training data (parallel corpora, monolingual corpora, dictionaries) and linguistic tools are used to train a statistical machine translation system, which then translates source text.]

SLIDE 10

why is that a good plan?

SLIDE 11

Word Translation Problems

  • Words are ambiguous

He deposited money in a bank account with a high interest rate. Sitting on the bank of the Mississippi, a passing ship piqued his interest.

  • How do we find the right meaning, and thus translation?
  • Context should be helpful

SLIDE 12

Syntactic Translation Problems

  • Languages have different sentence structure

das behaupten sie wenigstens

this / claim / they / at least   (word-by-word gloss; das can also mean the, sie can also mean she)

  • Convert from object-verb-subject (OVS) to subject-verb-object (SVO)
  • Ambiguities can be resolved through syntactic analysis

– the meaning the of das is not possible (not a noun phrase)
– the meaning she of sie is not possible (subject-verb agreement)

SLIDE 13

Semantic Translation Problems

  • Pronominal anaphora

I saw the movie and it is good.

  • How to translate it into German (or French)?

– it refers to movie
– movie translates to Film
– Film has masculine gender
– ergo: it must be translated into the masculine pronoun er

  • We are not handling this very well [Le Nagard and Koehn, 2010]

SLIDE 14

Semantic Translation Problems

  • Coreference

Whenever I visit my uncle and his daughters, I can’t decide who is my favorite cousin.

  • How to translate cousin into German? Male or female?
  • Complex inference required

SLIDE 15

Semantic Translation Problems

  • Discourse

Since you brought it up, I do not agree with you. Since you brought it up, we have been working on it.

  • How to translate since? Temporal or conditional?
  • Analysis of discourse structure — a hard problem

SLIDE 16

Learning from Data

  • What is the best translation?

Sicherheit → security   14,516
Sicherheit → safety     10,015
Sicherheit → certainty     334

SLIDE 17

Learning from Data

  • What is the best translation?

Sicherheit → security   14,516
Sicherheit → safety     10,015
Sicherheit → certainty     334

  • Counts in European Parliament corpus

SLIDE 18

Learning from Data

  • What is the best translation?

Sicherheit → security   14,516
Sicherheit → safety     10,015
Sicherheit → certainty     334

  • Phrasal rules

Sicherheitspolitik → security policy       1580
Sicherheitspolitik → safety policy           13
Sicherheitspolitik → certainty policy         0
Lebensmittelsicherheit → food security       51
Lebensmittelsicherheit → food safety       1084
Lebensmittelsicherheit → food certainty       0
Rechtssicherheit → legal security           156
Rechtssicherheit → legal safety               5
Rechtssicherheit → legal certainty          723

SLIDE 19

Learning from Data

  • What is most fluent?

a problem for translation   13,000
a problem of translation    61,600
a problem in translation    81,700

SLIDE 20

Learning from Data

  • What is most fluent?

a problem for translation   13,000
a problem of translation    61,600
a problem in translation    81,700

  • Hits on Google

SLIDE 21

Learning from Data

  • What is most fluent?

a problem for translation   13,000
a problem of translation    61,600
a problem in translation    81,700
a translation problem      235,000

SLIDE 22

Learning from Data

  • What is most fluent?

police disrupted the demonstration     2,140
police broke up the demonstration     66,600
police dispersed the demonstration    25,800
police ended the demonstration           762
police dissolved the demonstration     2,030
police stopped the demonstration     722,000
police suppressed the demonstration    1,400
police shut down the demonstration     2,040

SLIDE 23

Learning from Data

  • What is most fluent?

police disrupted the demonstration     2,140
police broke up the demonstration     66,600
police dispersed the demonstration    25,800
police ended the demonstration           762
police dissolved the demonstration     2,030
police stopped the demonstration     722,000
police suppressed the demonstration    1,400
police shut down the demonstration     2,040

SLIDE 24

word alignment

SLIDE 25

Lexical Translation

  • How to translate a word → look up in dictionary

Haus — house, building, home, household, shell.

  • Multiple translations

– some translations are more frequent than others
– for instance: house and building are most common
– special cases: the Haus of a snail is its shell

  • Note: In all lectures, we translate from a foreign language into English

SLIDE 26

Collect Statistics

Look at a parallel corpus (German text along with English translation).

Translation of Haus   Count
house                 8,000
building              1,600
home                    200
household               150
shell                    50

SLIDE 27

Estimate Translation Probabilities

Maximum likelihood estimation:

pf(e) = 0.8    if e = house,
        0.16   if e = building,
        0.02   if e = home,
        0.015  if e = household,
        0.005  if e = shell.
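These probabilities are just the relative frequencies of the counts on the previous slide. A minimal sketch of the estimation:

```python
# Maximum likelihood estimation of translation probabilities t(e|f)
# from the counts on the previous slide.

counts = {"house": 8000, "building": 1600, "home": 200,
          "household": 150, "shell": 50}

total = sum(counts.values())          # 10,000 occurrences of "Haus"
t = {e: c / total for e, c in counts.items()}

print(t["house"], t["building"])
```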

SLIDE 28

Alignment

  • In a parallel text (or when we translate), we align words in one language with the words in the other

das Haus ist klein
the house is small

  • Word positions are numbered 1–4

SLIDE 29

Alignment Function

  • Formalizing alignment with an alignment function
  • Mapping an English target word at position i to a German source word at position j with a function a ∶ i → j

  • Example

a ∶ {1 → 1,2 → 2,3 → 3,4 → 4}

SLIDE 30

Reordering

Words may be reordered during translation

das Haus ist klein
the house is small

a ∶ {1 → 3,2 → 4,3 → 2,4 → 1}

SLIDE 31

One-to-Many Translation

A source word may translate into multiple target words

das Haus ist klitzeklein
the house is very small

a ∶ {1 → 1,2 → 2,3 → 3,4 → 4,5 → 4}

SLIDE 32

Dropping Words

Words may be dropped when translated (German article das is dropped)

das Haus ist klein
house is small

a ∶ {1 → 2,2 → 3,3 → 4}

SLIDE 33

Inserting Words

  • Words may be added during translation

– The English just does not have an equivalent in German – We still need to map it to something: special NULL token

NULL das Haus ist klein   (NULL token at source position 0)
the house is just small

a ∶ {1 → 1,2 → 2,3 → 3,4 → 0,5 → 4}
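The alignment function, including the NULL token, can be sketched as a plain mapping (a toy illustration of the slide's example):

```python
# Alignment function a: English position j -> German position i,
# with position 0 standing in for the special NULL token.

german  = ["NULL", "das", "Haus", "ist", "klein"]   # position 0 = NULL
english = ["the", "house", "is", "just", "small"]

a = {1: 1, 2: 2, 3: 3, 4: 0, 5: 4}                  # "just" aligns to NULL

for j, e in enumerate(english, start=1):
    print(e, "<-", german[a[j]])
```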

SLIDE 34

IBM Model 1

  • Generative model: break up translation process into smaller steps

– IBM Model 1 only uses lexical translation

  • Translation probability

– for a foreign sentence f = (f1, ..., flf) of length lf
– to an English sentence e = (e1, ..., ele) of length le
– with an alignment of each English word ej to a foreign word fi according to the alignment function a ∶ j → i

p(e, a | f) = ε / (lf + 1)^le × ∏ j=1..le t(ej | fa(j))

– parameter ε is a normalization constant

SLIDE 35

Example

das Haus ist klein

e      t(e|f)    e          t(e|f)    e       t(e|f)    e       t(e|f)
the    0.7       house      0.8       is      0.8       small   0.4
that   0.15      building   0.16      ’s      0.16      little  0.4
which  0.075     home       0.02      exists  0.02      short   0.1
who    0.05      household  0.015     has     0.015     minor   0.06
this   0.025     shell      0.005     are     0.005     petty   0.04

p(e, a | f) = ε/4³ × t(the|das) × t(house|Haus) × t(is|ist) × t(small|klein)
            = ε/4³ × 0.7 × 0.8 × 0.8 × 0.4
            = 0.0028 ε
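The slide's computation can be checked mechanically (a sketch; the t(e|f) values and the 1/4³ factor are taken directly from the slide):

```python
# Recompute the IBM Model 1 example above, up to the constant ε.

t = {("the", "das"): 0.7, ("house", "Haus"): 0.8,
     ("is", "ist"): 0.8, ("small", "klein"): 0.4}

f = ["das", "Haus", "ist", "klein"]
e = ["the", "house", "is", "small"]
a = {1: 1, 2: 2, 3: 3, 4: 4}             # monotone alignment

prod = 1.0
for j, ej in enumerate(e, start=1):
    prod *= t[(ej, f[a[j] - 1])]          # t(e_j | f_a(j))

p_over_eps = prod / 4 ** 3                # the slide's 1/4^3 factor
print(round(p_over_eps, 4))
```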

SLIDE 36

em algorithm

SLIDE 37

Learning Lexical Translation Models

  • We would like to estimate the lexical translation probabilities t(e∣f) from a parallel corpus

  • ... but we do not have the alignments
  • Chicken and egg problem

– if we had the alignments → we could estimate the parameters of our generative model
– if we had the parameters → we could estimate the alignments

SLIDE 38

EM Algorithm

  • Incomplete data

– if we had complete data, we could estimate the model
– if we had the model, we could fill in the gaps in the data

  • Expectation Maximization (EM) in a nutshell
  1. initialize model parameters (e.g. uniform)
  2. assign probabilities to the missing data
  3. estimate model parameters from completed data
  4. iterate steps 2–3 until convergence

SLIDE 39

EM Algorithm

... la maison ... la maison bleu ... la fleur ... ... the house ... the blue house ... the flower ...

  • Initial step: all alignments equally likely
  • Model learns that, e.g., la is often aligned with the

SLIDE 40

EM Algorithm

... la maison ... la maison bleu ... la fleur ... ... the house ... the blue house ... the flower ...

  • After one iteration
  • Alignments, e.g., between la and the are more likely

SLIDE 41

EM Algorithm

... la maison ... la maison bleu ... la fleur ... ... the house ... the blue house ... the flower ...

  • After another iteration
  • It becomes apparent that alignments, e.g., between fleur and flower are more likely (pigeonhole principle)

SLIDE 42

EM Algorithm

... la maison ... la maison bleu ... la fleur ... ... the house ... the blue house ... the flower ...

  • Convergence
  • Inherent hidden structure revealed by EM

SLIDE 43

EM Algorithm

... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...

p(la|the) = 0.453
p(le|the) = 0.334
p(maison|house) = 0.876
p(bleu|blue) = 0.563
...

  • Parameter estimation from the aligned corpus

SLIDE 44

IBM Model 1 and EM

  • EM Algorithm consists of two steps
  • Expectation-Step: Apply model to the data

– parts of the model are hidden (here: alignments)
– using the model, assign probabilities to possible values

  • Maximization-Step: Estimate model from data

– take assigned values as fact
– collect counts (weighted by probabilities)
– estimate model from counts

  • Iterate these steps until convergence
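The E-step/M-step loop for IBM Model 1 fits in a few lines on the toy corpus from the earlier slides (a sketch that ignores the NULL token for brevity):

```python
from collections import defaultdict

# Toy EM for IBM Model 1 on the three sentence pairs from the slides.
corpus = [(["la", "maison"], ["the", "house"]),
          (["la", "maison", "bleu"], ["the", "blue", "house"]),
          (["la", "fleur"], ["the", "flower"])]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}
t = {(e, f): 1 / len(e_vocab) for f in f_vocab for e in e_vocab}  # uniform init

for _ in range(20):
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for e in es:                          # E-step: expected alignment counts
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    for (e, f), c in count.items():           # M-step: re-estimate t(e|f)
        t[(e, f)] = c / total[f]

print(t[("house", "maison")], t[("the", "maison")])
```

After a few iterations the hidden structure emerges: maison aligns to house rather than to the ubiquitous the.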

SLIDE 45

phrase-based models

SLIDE 46

Phrase-Based Model

  • Foreign input is segmented in phrases
  • Each phrase is translated into English
  • Phrases are reordered

SLIDE 47

Phrase Translation Table

  • Main knowledge source: table with phrase translations and their probabilities
  • Example: phrase translations for natuerlich

Translation      Probability φ(ē|f̄)
of course        0.5
naturally        0.3
of course ,      0.15
, of course ,    0.05

SLIDE 48

Real Example

  • Phrase translations for den Vorschlag learned from the Europarl corpus:

English           φ(ē|f̄)     English           φ(ē|f̄)
the proposal      0.6227      the suggestions   0.0114
’s proposal       0.1068      the proposed      0.0114
a proposal        0.0341      the motion        0.0091
the idea          0.0250      the idea of       0.0091
this proposal     0.0227      the proposal ,    0.0068
proposal          0.0205      its proposal      0.0068
of the proposal   0.0159      it                0.0068
the proposals     0.0159      ...               ...

– lexical variation (proposal vs suggestions)
– morphological variation (proposal vs proposals)
– included function words (the, a, ...)
– noise (it)

SLIDE 49

decoding

SLIDE 50

Decoding

  • We have a mathematical model for translation

p(e∣f)

  • Task of decoding: find the translation ebest with highest probability

ebest = argmaxe p(e∣f)

  • Two types of error

– the most probable translation is bad → fix the model
– search does not find the most probable translation → fix the search

  • Decoding is evaluated by search error, not quality of translations (although these are often correlated)

SLIDE 51

Translation Process

  • Task: translate this sentence from German into English

er geht ja nicht nach hause

SLIDE 52

Translation Process

  • Task: translate this sentence from German into English

er geht ja nicht nach hause
er → he

  • Pick phrase in input, translate

SLIDE 53

Translation Process

  • Task: translate this sentence from German into English

er geht ja nicht nach hause
er → he, ja nicht → does not

  • Pick phrase in input, translate

– it is allowed to pick words out of sequence (reordering)
– phrases may have multiple words: many-to-many translation

SLIDE 54

Translation Process

  • Task: translate this sentence from German into English

er geht ja nicht nach hause
er geht ja nicht → he does not go

  • Pick phrase in input, translate

SLIDE 55

Translation Process

  • Task: translate this sentence from German into English

er geht ja nicht nach hause
er geht ja nicht nach hause → he does not go home

  • Pick phrase in input, translate

SLIDE 56

decoding process

SLIDE 57

Translation Options

er geht ja nicht nach hause

[Table of translation options for every span of the input, e.g. er → he / it / , it; geht → is / are / goes / go; ja → yes / is / , of course; nicht → not / do not / does not / is not; nach → after / to / according to / in; hause → house / home / chamber / at home; plus many multi-word options such as ja nicht → is not / does not / do not and nach hause → home / under house / return home]

  • Many translation options to choose from

– in Europarl phrase table: 2727 matching phrase pairs for this sentence
– by pruning to the top 20 per phrase, 202 translation options remain

SLIDE 58

Translation Options

er geht ja nicht nach hause

[Same table of translation options for every span of the input as on the previous slide]

  • The machine translation decoder does not know the right answer

– picking the right translation options
– arranging them in the right order
→ search problem solved by heuristic beam search

SLIDE 59

Decoding: Precompute Translation Options

er geht ja nicht nach hause

consult phrase translation table for all input phrases

SLIDE 60

Decoding: Start with Initial Hypothesis

er geht ja nicht nach hause

initial hypothesis: no input words covered, no output produced

SLIDE 61

Decoding: Hypothesis Expansion

er geht ja nicht nach hause

[Figure: one new hypothesis, translating er as are]

pick any translation option, create new hypothesis

SLIDE 62

Decoding: Hypothesis Expansion

er geht ja nicht nach hause

[Figure: hypotheses are, it, he expanded from the initial hypothesis]

create hypotheses for all other translation options

SLIDE 63

Decoding: Hypothesis Expansion

er geht ja nicht nach hause

[Figure: search graph of partial hypotheses: are / it / he, then goes, does not, yes, go, to home, home]

also create hypotheses from created partial hypothesis

SLIDE 64

Decoding: Find Best Path

er geht ja nicht nach hause

[Figure: the same search graph, with the path he → does not → go → home completed]

backtrack from highest scoring complete hypothesis

SLIDE 65

Recombination

  • Two hypothesis paths lead to two matching hypotheses

– same number of foreign words translated
– same English words in the output
– different scores

[Figure: two different paths both ending in the hypothesis it is]

  • Worse hypothesis is dropped

[Figure: only one it is hypothesis is kept]

SLIDE 66

Stacks

[Figure: hypotheses are, it, he, goes, does not, yes, ... organized in stacks]

no word translated — one word translated — two words translated — three words translated

  • Hypothesis expansion in a stack decoder

– translation option is applied to hypothesis
– new hypothesis is dropped into a stack further down
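A heavily simplified sketch of this search: monotone segmentation only, no reordering, no language model, no pruning, and with made-up phrase probabilities:

```python
# Toy stack decoder: stacks[k] holds hypotheses covering the first k
# source words; hypotheses are (probability, output string).
phrase_table = {
    ("er",): [("he", 0.8), ("it", 0.2)],
    ("geht", "ja", "nicht"): [("does not go", 0.6)],
    ("ja", "nicht"): [("does not", 0.5)],
    ("geht",): [("goes", 0.7), ("go", 0.2)],
    ("nach", "hause"): [("home", 0.9)],
}

source = ["er", "geht", "ja", "nicht", "nach", "hause"]
n = len(source)

stacks = [[] for _ in range(n + 1)]
stacks[0].append((1.0, ""))                 # empty initial hypothesis

for k in range(n):
    for score, output in stacks[k]:
        for length in range(1, n - k + 1):  # expand with a next phrase
            phrase = tuple(source[k:k + length])
            for translation, p in phrase_table.get(phrase, []):
                stacks[k + length].append(
                    (score * p, (output + " " + translation).strip()))

best = max(stacks[n])                       # highest-scoring full hypothesis
print(best)
```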

SLIDE 67

syntax-based models

SLIDE 68

Phrase Structure Grammar

[Phrase structure tree for the sentence I/PRP shall/MD be/VB passing/VBG on/RP to/TO you/PRP some/DT comments/NNS, with constituents NP-A, PP, VP-A, S]

Phrase structure grammar tree for an English sentence (as produced by Collins’ parser)

SLIDE 69

Synchronous Phrase Structure Grammar

  • English rule

NP → DET JJ NN

  • French rule

NP → DET NN JJ

  • Synchronous rule (indices indicate alignment):

NP → DET1 NN2 JJ3 ∣ DET1 JJ3 NN2

SLIDE 70

Synchronous Grammar Rules

  • Nonterminal rules

NP → DET1 NN2 JJ3 ∣ DET1 JJ3 NN2

  • Terminal rules

N → maison ∣ house NP → la maison bleue ∣ the blue house

  • Mixed rules

NP → la maison JJ1 ∣ the JJ1 house
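Applying such rules is mechanical: expand each side, substituting co-indexed nonterminals consistently. A minimal sketch using the mixed rule above plus a terminal rule for JJ (the rule encoding and helper are made up for illustration):

```python
# Synchronous rules: nonterminal -> (source side, target side); a tuple
# (symbol, index) marks a co-indexed nonterminal shared by both sides.
rules = {
    "NP": [(["la", "maison", ("JJ", 1)], ["the", ("JJ", 1), "house"])],
    "JJ": [(["bleue"], ["blue"])],
}

def rewrite(symbol):
    """Expand one nonterminal; returns (source words, target words)."""
    src_side, tgt_side = rules[symbol][0]
    src, tgt = [], []
    sub = {}                                  # index -> child's (src, tgt)
    for item in src_side:
        if isinstance(item, tuple):
            sub[item[1]] = rewrite(item[0])
            src.extend(sub[item[1]][0])
        else:
            src.append(item)
    for item in tgt_side:
        if isinstance(item, tuple):
            tgt.extend(sub[item[1]][1])       # same child, target side
        else:
            tgt.append(item)
    return src, tgt

src, tgt = rewrite("NP")
print(" ".join(src), "->", " ".join(tgt))
```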

SLIDE 71

Syntax Decoding

[Figure: German input tree Sie/PPER will/VAFIN eine/ART Tasse/NN Kaffee/NN trinken/VVINF, with constituents NP, VP, S; chart entry ➏ drink/VB]

German input sentence with tree

SLIDE 72

Syntax Decoding

[Figure: as before, plus chart entry ➊ she/PRO covering Sie]

Purely lexical rule: filling a span with a translation (a constituent in the chart)

SLIDE 73

Syntax Decoding

[Figure: as before, plus chart entry ➋ coffee/NN covering Kaffee]

Purely lexical rule: filling a span with a translation (a constituent in the chart)

SLIDE 74

Syntax Decoding

[Figure: as before, plus chart entry ➌, another purely lexical constituent]

Purely lexical rule: filling a span with a translation (a constituent in the chart)

SLIDE 75

Syntax Decoding

[Figure: chart entry ➍ builds NP → a/DET cup/NN of/IN coffee/NN with a PP over previously built constituents]

Complex rule: matching underlying constituent spans, and covering words

SLIDE 76

Syntax Decoding

[Figure: chart entry ➎ builds VP → wants/VBZ to/TO drink/VB NP, reordering trinken with the NP a cup of coffee]

Complex rule with reordering

SLIDE 77

Syntax Decoding

[Figure: the final rule combines PRO and VP into S, completing the translation of the sentence]

SLIDE 78

neural language models

SLIDE 79

N-Gram Backoff Language Model

  • Previously, we approximated

p(W) = p(w1,w2,...,wn)

  • ... by applying the chain rule

p(W) = ∏i p(wi ∣ w1, ..., wi−1)

  • ... and limiting the history (Markov order)

p(wi∣w1,...,wi−1) ≃ p(wi∣wi−4,wi−3,wi−2,wi−1)

  • Each p(wi∣wi−4,wi−3,wi−2,wi−1) may not have enough statistics to estimate
→ we back off to p(wi∣wi−3,wi−2,wi−1), p(wi∣wi−2,wi−1), etc., all the way to p(wi)
– exact details of backing off get complicated — “interpolated Kneser-Ney”
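A bare-bones illustration of backing off: bigram relative frequencies, falling back to the unigram distribution with a fixed penalty ("stupid backoff", far simpler than interpolated Kneser-Ney; the tiny corpus and the factor 0.4 are made up):

```python
from collections import Counter

tokens = "the house is small the house is big the garden is small".split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p(word, prev, alpha=0.4):
    # Use the bigram estimate when the bigram was seen ...
    if (prev, word) in bigrams:
        return bigrams[(prev, word)] / unigrams[prev]
    # ... otherwise back off to the penalized unigram estimate.
    return alpha * unigrams[word] / len(tokens)

print(p("house", "the"))   # seen bigram
print(p("garden", "is"))   # unseen bigram: backed-off unigram estimate
```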

SLIDE 80

First Sketch

[Figure: Words 1–4 feed into a hidden layer, which predicts Word 5]

SLIDE 81

Representing Words

  • Words are represented with a one-hot vector, e.g.,

– dog = (0,0,0,0,1,0,0,0,0,...)
– cat = (0,0,0,0,0,0,0,1,0,...)
– eat = (0,1,0,0,0,0,0,0,0,...)

  • That’s a large vector!

SLIDE 82

Second Sketch

[Figure: Words 1–4, now as one-hot vectors, feed into a hidden layer, which predicts Word 5]

SLIDE 83

Add a Hidden Layer

[Figure: Words 1–4 are each mapped by the shared matrix C into an embedding layer, then a hidden layer predicts Word 5]

  • Map each word first into a lower-dimensional real-valued space
  • Shared weight matrix C
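Multiplying a one-hot vector by C amounts to selecting one row of C. A sketch with a made-up vocabulary and embedding dimension:

```python
import numpy as np

vocab = ["dog", "cat", "eat", "house", "the"]
dim = 3
rng = np.random.default_rng(0)
C = rng.normal(size=(len(vocab), dim))    # shared across all word positions

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

embedding = one_hot("cat") @ C            # picks out the row of C for "cat"
print(embedding.shape)
```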

SLIDE 84

Details (Bengio et al., 2003)

  • Add direct connections from embedding layer to output layer
  • Activation functions

– input→embedding: none
– embedding→hidden: tanh
– hidden→output: softmax

  • Training

– loop through the entire corpus
– update weights based on the difference between predicted probabilities and the 1-hot vector for the output word

SLIDE 85

Word Embeddings

[Figure: the matrix C maps a word to its word embedding]

  • By-product: embedding of word into continuous space
  • Similar contexts → similar embedding
  • Recall: distributional semantics

SLIDE 86

Word Embeddings

SLIDE 87

Word Embeddings

SLIDE 88

Are Word Embeddings Magic?

  • Morphosyntactic regularities (Mikolov et al., 2013)

– adjectives: base form vs. comparative, e.g., good, better
– nouns: singular vs. plural, e.g., year, years
– verbs: present tense vs. past tense, e.g., see, saw

  • Semantic regularities

– clothing is to shirt as dish is to bowl
– evaluated on human judgment data of semantic similarities

SLIDE 89

recurrent neural networks

SLIDE 90

Recurrent Neural Networks

[Figure: Word 1 → embedding E → hidden layer H → predicts Word 2; the recurrent input to H is fixed at 1]

  • Start: predict second word from first
  • Mystery layer with nodes all with value 1

SLIDE 91

Recurrent Neural Networks

[Figure: hidden layer values are copied to the next time step: Word 2 → E → H → predicts Word 3]

SLIDE 92

Recurrent Neural Networks

[Figure: unrolled further: Word 3 → E → H → predicts Word 4, again copying hidden values forward]

SLIDE 93

Training

[Figure: first training example: Word 1 → E → H → predict Word 2]

  • Process first training example
  • Update weights with back-propagation

SLIDE 94

neural translation model

SLIDE 95

Feed Forward Neural Language Model

[Figure: feed-forward neural language model: Words 1–4 mapped by the shared matrix C, a hidden layer, predicting Word 5]

SLIDE 96

Recurrent Neural Language Model

[Figure: given word <s>, the network predicts the first word: the]

Given word → Embedding → Hidden state → Predicted word

Predict the first word of a sentence. Same as before, just drawn top-down.

SLIDE 97

Recurrent Neural Language Model

[Figure: given <s> the, the network predicts: the house]

Given word → Embedding → Hidden state → Predicted word

Predict the second word of a sentence. Re-use hidden state from first word prediction.

SLIDE 98

Recurrent Neural Language Model

[Figure: given <s> the house, the network predicts: the house is]

Given word → Embedding → Hidden state → Predicted word

Predict the third word of a sentence ... and so on.

SLIDE 99

Recurrent Neural Language Model

[Figure: the fully unrolled recurrent language model: given <s> the house is big . it predicts the house is big . </s>]

Given word → Embedding → Hidden state → Predicted word

SLIDE 100

Recurrent Neural Translation Model

  • We predicted the words of a sentence
  • Why not also predict their translations?

SLIDE 101

Encoder-Decoder Model

[Figure: one recurrent network first reads the English sentence the house is big . </s>, then continues by predicting its German translation das Haus ist groß . </s>]

Given word → Embedding → Hidden state → Predicted word

  • Obviously madness
  • Proposed by Google (Sutskever et al. 2014)

SLIDE 102

What is Missing?

  • Alignment of input words to output words

⇒ Solution: attention mechanism

SLIDE 103

neural translation model with attention

SLIDE 104

Input Encoding

[Figure: recurrent network over the input words: given word → embedding → hidden state → predicted word]

  • Inspiration: recurrent neural network language model on the input side

SLIDE 105

Hidden Language Model States

  • This gives us the hidden states

H1 H2 H3 H4 H5 H6

  • These encode left context for each word
  • Same process in reverse: right context for each word

Ĥ1 Ĥ2 Ĥ3 Ĥ4 Ĥ5 Ĥ6

SLIDE 106

Input Encoder

Input Word Embeddings Left-to-Right Recurrent NN Right-to-Left Recurrent NN

  • Input encoder: concatenate bidirectional RNN states
  • Each word representation includes full left and right sentence context

SLIDE 107

Decoder

  • We want to have a recurrent neural network predicting output words

Hidden State Output Words

SLIDE 108

Decoder

  • We want to have a recurrent neural network predicting output words

Hidden State Output Words

  • We feed decisions on output words back into the decoder state

SLIDE 109

Decoder

  • We want to have a recurrent neural network predicting output words

Input Context Hidden State Output Words

  • We feed decisions on output words back into the decoder state
  • Decoder state is also informed by the input context

SLIDE 110

Attention

Encoder States Attention Hidden State Output Words

  • Given what we have generated so far (decoder hidden state)
  • ... which words in the input should we pay attention to (encoder states)?

SLIDE 111

Attention

Encoder States Attention Input Context Hidden State Output Words

  • Normalize attention (softmax)

αij = exp(a(si−1, hj)) / ∑k exp(a(si−1, hk))

  • Relevant input context: weigh input words according to attention: ci = ∑j αijhj
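Both formulas can be put together in a few lines, with a dot product standing in for the scoring function a(·,·) and random states for illustration (a sketch; real models learn a parameterized scoring function):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))        # encoder states h_1 .. h_6
s_prev = rng.normal(size=4)        # previous decoder state s_{i-1}

scores = h @ s_prev                            # a(s_{i-1}, h_j) as dot product
alpha = np.exp(scores) / np.exp(scores).sum()  # normalized attention weights
c = alpha @ h                                  # context c_i = sum_j alpha_ij h_j

print(alpha.sum(), c.shape)
```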

SLIDE 112

Attention

Encoder States Attention Input Context Hidden State Output Words

  • Use context to predict next hidden state and output word

SLIDE 113

Encoder-Decoder with Attention

Input Word Embeddings Left-to-Right Recurrent NN Right-to-Left Recurrent NN Attention Input Context Hidden State Output Words

SLIDE 114

questions?
