
Phrase-Based Models

Philipp Koehn 15 September 2020

Philipp Koehn Machine Translation: Phrase-Based Models 15 September 2020


Motivation

  • Word-Based Models translate words as atomic units
  • Phrase-Based Models translate phrases as atomic units
  • Advantages:
    – many-to-many translation can handle non-compositional phrases
    – use of local context in translation
    – the more data, the longer the phrases that can be learned
  • "Standard Model", used by Google Translate and others


Phrase-Based Model

  • Foreign input is segmented in phrases
  • Each phrase is translated into English
  • Phrases are reordered


Phrase Translation Table

  • Main knowledge source: table with phrase translations and their probabilities
  • Example: phrase translations for natuerlich

    Translation      φ(ē|f̄)
    of course        0.5
    naturally        0.3
    of course ,      0.15
    , of course ,    0.05
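The table above can be sketched as a lookup structure; a minimal sketch in Python (the nested-dict layout and the function name are illustrative choices, not part of the model):

```python
# A phrase translation table as a nested dict mapping a source phrase
# to its candidate translations with probabilities φ(e|f).
# Values are the example entries for "natuerlich" from the slide.
phrase_table = {
    "natuerlich": {
        "of course": 0.5,
        "naturally": 0.3,
        "of course ,": 0.15,
        ", of course ,": 0.05,
    }
}

def best_translation(f_phrase, table):
    """Return the most probable translation of a source phrase, or None."""
    candidates = table.get(f_phrase)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

With this table, best_translation("natuerlich", phrase_table) returns "of course", and the probabilities for each source phrase sum to 1.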


Real Example

  • Phrase translations for den Vorschlag learned from the Europarl corpus:

    English            φ(ē|f̄)     English            φ(ē|f̄)
    the proposal       0.6227      the suggestions    0.0114
    's proposal        0.1068      the proposed       0.0114
    a proposal         0.0341      the motion         0.0091
    the idea           0.0250      the idea of        0.0091
    this proposal      0.0227      the proposal ,     0.0068
    proposal           0.0205      its proposal       0.0068
    of the proposal    0.0159      it                 0.0068
    the proposals      0.0159      ...                ...

  • The table shows:
    – lexical variation (proposal vs suggestions)
    – morphological variation (proposal vs proposals)
    – included function words (the, a, ...)
    – noise (it)


Linguistic Phrases?

  • Model is not limited to linguistic phrases

(noun phrases, verb phrases, prepositional phrases, ...)

  • Example non-linguistic phrase pair

spass am → fun with the

  • Prior noun often helps with translation of preposition
  • Experiments show that limitation to linguistic phrases hurts quality


modeling


Noisy Channel Model

  • We would like to integrate a language model
  • Bayes rule

    argmax_e p(e|f) = argmax_e [ p(f|e) p(e) / p(f) ] = argmax_e p(f|e) p(e)


Noisy Channel Model

  • Applying Bayes rule also called noisy channel model

  – we observe a distorted message R (here: a foreign string f)
  – we have a model of how the message is distorted (here: translation model)
  – we have a model of what messages are probable (here: language model)
  – we want to recover the original message S (here: an English string e)


More Detail

  • Bayes rule

    e_best = argmax_e p(e|f) = argmax_e p(f|e) p_LM(e)

    – translation model p(f|e)
    – language model p_LM(e)

  • Decomposition of the translation model

    p(f̄_1^I | ē_1^I) = ∏_{i=1}^{I} φ(f̄_i|ē_i) d(start_i − end_{i−1} − 1)

    – phrase translation probability φ
    – reordering probability d


Distance-Based Reordering

  [figure: alignment of foreign positions 1–7 to English phrases, with
   reordering distances d = 0, +2, −3, +1]

    phrase   translates   movement               distance
    1        1–3          start at beginning      0
    2        6            skip over 4–5          +2
    3        4–5          move back over 4–6     −3
    4        7            skip over 6            +1

  • Scoring function: d(x) = α^|x| — exponential with distance
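The distance computation and scoring function can be sketched directly; a minimal sketch in Python, where alpha = 0.75 is an illustrative value (in practice it is set during tuning):

```python
def reordering_distance(start_i, end_prev):
    """Reordering distance x = start_i - end_{i-1} - 1
    (0 when the next phrase continues right after the previous one)."""
    return start_i - end_prev - 1

def reordering_score(start_i, end_prev, alpha=0.75):
    """Distance-based reordering score d(x) = alpha^|x|."""
    return alpha ** abs(reordering_distance(start_i, end_prev))

# The four movements from the example above, as (start_i, end_{i-1}) pairs:
moves = [(1, 0), (6, 3), (4, 6), (7, 5)]
distances = [reordering_distance(s, e) for s, e in moves]
```

Here distances evaluates to [0, 2, -3, 1], matching the table.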


training


Learning a Phrase Translation Table

  • Task: learn the model from a parallel corpus
  • Three stages:

    – word alignment: using IBM models or another method
    – extraction of phrase pairs
    – scoring of phrase pairs


Word Alignment

  [alignment matrix: German "michael geht davon aus , dass er im haus bleibt"
   aligned with English "michael assumes that he will stay in the house"]


Extracting Phrase Pairs

  [alignment matrix as on the previous slide, with a highlighted block
   covering "assumes that" / "geht davon aus , dass"]

  extract phrase pair consistent with the word alignment:
  assumes that / geht davon aus , dass


Consistent

  [figures: three example phrase pairs — ok; violated (an alignment point
   reaches outside the phrase pair); ok (an unaligned word inside is fine)]

  All words of the phrase pair have to align to each other.


Consistent

  Phrase pair (ē, f̄) is consistent with an alignment A if all words f_1, ..., f_n
  in f̄ that have alignment points in A have these with words e_1, ..., e_n in ē,
  and vice versa:

    (ē, f̄) consistent with A  ⇔
        ∀ e_i ∈ ē : (e_i, f_j) ∈ A ⇒ f_j ∈ f̄
    and ∀ f_j ∈ f̄ : (e_i, f_j) ∈ A ⇒ e_i ∈ ē
    and ∃ e_i ∈ ē, f_j ∈ f̄ : (e_i, f_j) ∈ A
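This definition can be checked mechanically; a minimal sketch in Python (0-indexed inclusive spans and a set of (i, j) alignment points are my encoding choices):

```python
def consistent(e_span, f_span, alignment):
    """True iff phrase pair (e_span, f_span) is consistent with the word
    alignment: no alignment point links a word inside the pair to a word
    outside it, and at least one alignment point lies inside.
    Spans are inclusive (start, end) index pairs; alignment is a set of
    (i, j) pairs with i an English index and j a foreign index."""
    e_lo, e_hi = e_span
    f_lo, f_hi = f_span
    has_point = False
    for i, j in alignment:
        e_in = e_lo <= i <= e_hi
        f_in = f_lo <= j <= f_hi
        if e_in != f_in:        # point crosses the phrase-pair boundary
            return False
        if e_in:
            has_point = True
    return has_point
```

For a monotone alignment A = {(0, 0), (1, 1)}, the pair ((0, 1), (0, 1)) is consistent, while ((0, 1), (0, 0)) is not, since word 1 aligns outside the foreign span.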


Phrase Pair Extraction

  [alignment matrix: "michael geht davon aus , dass er im haus bleibt" /
   "michael assumes that he will stay in the house"]

Smallest phrase pairs:

    michael — michael
    assumes — geht davon aus / geht davon aus ,
    that — dass / , dass
    he — er
    will stay — bleibt
    in the — im
    house — haus

unaligned words (here: German comma) lead to multiple translations
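Phrase pair extraction over one sentence pair can be sketched as a brute-force enumeration of all span pairs that pass the consistency test (real extractors iterate more cleverly over the alignment matrix; the max_len limit is illustrative):

```python
def extract_phrase_pairs(alignment, e_len, f_len, max_len=7):
    """Enumerate all phrase pairs consistent with a word alignment
    (a set of 0-indexed (i, j) pairs). Returns a list of
    ((e_lo, e_hi), (f_lo, f_hi)) inclusive span pairs."""
    pairs = []
    for e_lo in range(e_len):
        for e_hi in range(e_lo, min(e_lo + max_len, e_len)):
            for f_lo in range(f_len):
                for f_hi in range(f_lo, min(f_lo + max_len, f_len)):
                    inside = [(i, j) for (i, j) in alignment
                              if e_lo <= i <= e_hi and f_lo <= j <= f_hi]
                    # an alignment point crossing the boundary violates consistency
                    crossing = any((e_lo <= i <= e_hi) != (f_lo <= j <= f_hi)
                                   for (i, j) in alignment)
                    if inside and not crossing:
                        pairs.append(((e_lo, e_hi), (f_lo, f_hi)))
    return pairs
```

For a two-word monotone alignment {(0, 0), (1, 1)}, this yields exactly the three consistent pairs: both single words and the full two-word span.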


Larger Phrase Pairs

  [alignment matrix: "michael geht davon aus , dass er im haus bleibt" /
   "michael assumes that he will stay in the house"]

    michael assumes — michael geht davon aus / michael geht davon aus ,
    assumes that — geht davon aus , dass
    assumes that he — geht davon aus , dass er
    that he — dass er / , dass er
    in the house — im haus
    michael assumes that — michael geht davon aus , dass
    michael assumes that he — michael geht davon aus , dass er
    michael assumes that he will stay in the house — michael geht davon aus , dass er im haus bleibt
    assumes that he will stay in the house — geht davon aus , dass er im haus bleibt
    that he will stay in the house — dass er im haus bleibt / dass er im haus bleibt ,
    he will stay in the house — er im haus bleibt
    will stay in the house — im haus bleibt


Scoring Phrase Translations

  • Phrase pair extraction: collect all phrase pairs from the data
  • Phrase pair scoring: assign probabilities to phrase translations
  • Score by relative frequency:

    φ(f̄|ē) = count(ē, f̄) / Σ_{f̄_i} count(ē, f̄_i)
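The relative-frequency estimate can be sketched directly from a list of extracted pairs; a minimal sketch in Python (function and variable names are illustrative):

```python
from collections import Counter, defaultdict

def score_phrase_table(extracted_pairs):
    """phi(f|e) = count(e, f) / sum over f_i of count(e, f_i).
    extracted_pairs: list of (e_phrase, f_phrase) string tuples collected
    from all sentence pairs of the corpus."""
    counts = Counter(extracted_pairs)
    e_totals = defaultdict(int)
    for (e, f), c in counts.items():
        e_totals[e] += c
    return {(e, f): c / e_totals[e] for (e, f), c in counts.items()}
```

For example, three extractions of ("assumes", "geht davon aus") and one of ("assumes", "geht davon aus ,") give probabilities 0.75 and 0.25.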


EM Training of the Phrase Model

  • We presented a heuristic set-up to build phrase translation table

(word alignment, phrase extraction, phrase scoring)

  • Alternative: align phrase pairs directly with EM algorithm

  – initialization: uniform model, all φ(ē, f̄) are the same
  – expectation step:
    ∗ estimate likelihood of all possible phrase alignments for all sentence pairs
  – maximization step:
    ∗ collect counts for phrase pairs (ē, f̄), weighted by alignment probability
    ∗ update phrase translation probabilities p(ē, f̄)

  • However: method easily overfits

(learns very large phrase pairs, spanning entire sentences)


Size of the Phrase Table

  • Phrase translation table typically bigger than corpus

... even with limits on phrase lengths (e.g., max 7 words) → Too big to store in memory?

  • Solution for training

– extract to disk, sort, construct for one source phrase at a time

  • Solutions for decoding

  – on-disk data structures with an index for quick look-ups
  – suffix arrays to create phrase pairs on demand


advanced modeling


Weighted Model

  • The standard model described so far consists of three sub-models

    – phrase translation model φ(f̄|ē)
    – reordering model d
    – language model p_LM(e)

    e_best = argmax_e ∏_{i=1}^{I} φ(f̄_i|ē_i) d(start_i − end_{i−1} − 1) ∏_{i=1}^{|e|} p_LM(e_i|e_1...e_{i−1})

  • Some sub-models may be more important than others
  • Add weights λ_φ, λ_d, λ_LM

    e_best = argmax_e ∏_{i=1}^{I} φ(f̄_i|ē_i)^{λ_φ} d(start_i − end_{i−1} − 1)^{λ_d} ∏_{i=1}^{|e|} p_LM(e_i|e_1...e_{i−1})^{λ_LM}


Log-Linear Model

  • Such a weighted model is a log-linear model:

    p(x) = exp Σ_{i=1}^{n} λ_i h_i(x)

  • Our feature functions

    – number of feature functions n = 3
    – random variable x = (e, f, start, end)
    – feature function h_1 = log φ
    – feature function h_2 = log d
    – feature function h_3 = log p_LM


Weighted Model as Log-Linear Model

    p(e, a|f) = exp( λ_φ Σ_{i=1}^{I} log φ(f̄_i|ē_i)
                   + λ_d Σ_{i=1}^{I} log d(a_i − b_{i−1} − 1)
                   + λ_LM Σ_{i=1}^{|e|} log p_LM(e_i|e_1...e_{i−1}) )
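In practice the model score is computed as a weighted sum of log feature values; a minimal sketch with made-up feature values and weights (real weights come from tuning, and the feature names are illustrative):

```python
import math

def loglinear_score(features, weights):
    """Log-linear model score: sum of lambda_i * h_i(x),
    where each h_i is the log of a sub-model probability."""
    return sum(weights[name] * h for name, h in features.items())

# Hypothetical values for one hypothesis: h1 = log phi, h2 = log d, h3 = log p_LM
features = {"phrase": math.log(0.5), "reorder": math.log(0.75), "lm": math.log(0.2)}
weights = {"phrase": 1.0, "reorder": 0.5, "lm": 1.5}
score = loglinear_score(features, weights)
```

Exponentiating this score recovers the weighted product form of the model above.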


More Feature Functions

  • Bidirectional alignment probabilities: φ(ē|f̄) and φ(f̄|ē)
  • Rare phrase pairs have unreliable phrase translation probability estimates

    → lexical weighting with word translation probabilities

  [figure: word alignment of German "geht nicht davon aus" (plus NULL) to
   English "does not assume"]

    lex(ē|f̄, a) = ∏_{i=1}^{length(ē)} 1/|{j | (i, j) ∈ a}| Σ_{(i,j) ∈ a} w(e_i|f_j)
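The lexical weighting formula averages word translation probabilities over each English word's alignment links; a minimal sketch in Python (encoding NULL as None is my choice):

```python
def lexical_weight(e_words, f_words, alignment, w):
    """lex(e|f, a): for each English word, average w(e_i|f_j) over its
    alignment links; unaligned English words are scored against NULL
    (encoded here as None). Multiply over all English words.
    w maps (e_word, f_word) pairs to word translation probabilities."""
    total = 1.0
    for i, e in enumerate(e_words):
        links = [j for (i2, j) in alignment if i2 == i]
        if links:
            total *= sum(w.get((e, f_words[j]), 0.0) for j in links) / len(links)
        else:
            total *= w.get((e, None), 0.0)  # align to NULL
    return total
```

For instance, with "not" aligned to "nicht" and w(not|nicht) = 0.8, the weight of the one-word pair is 0.8; an unaligned word contributes its NULL probability.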


More Feature Functions

  • Language model has a bias towards short translations

    → word count feature: wc(e) = log ω^{|e|}

  • We may prefer finer or coarser segmentation

    → phrase count feature: pc(e) = log ρ^{I}

  • Multiple language models
  • Multiple translation models
  • Other knowledge sources


reordering


Lexicalized Reordering

  • Distance-based reordering model is weak

→ learn reordering preference for each phrase pair

  • Three orientation types: (m) monotone, (s) swap, (d) discontinuous
  • For each orientation ∈ {m, s, d}, estimate

    p_o(orientation | f̄, ē)


Learning Lexicalized Reordering


  • Collect orientation information during phrase pair extraction

  – if a word alignment point to the top left exists → monotone
  – if a word alignment point to the top right exists → swap
  – if neither a word alignment point to the top left nor to the top right exists
    → neither monotone nor swap → discontinuous


Learning Lexicalized Reordering

  • Estimation by relative frequency

    p_o(orientation) = Σ_{f̄} Σ_{ē} count(orientation, ē, f̄) / Σ_o Σ_{f̄} Σ_{ē} count(o, ē, f̄)

  • Smoothing with the unlexicalized orientation model p(orientation) to avoid
    zero probabilities for unseen orientations

    p_o(orientation | f̄, ē) = ( σ p(orientation) + count(orientation, ē, f̄) ) / ( σ + Σ_o count(o, ē, f̄) )
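The smoothed estimate can be sketched from a table of orientation counts; a minimal sketch in Python (sigma = 0.5 is an illustrative smoothing weight):

```python
from collections import Counter

def orientation_model(counts, sigma=0.5):
    """Build p_o(orientation | f, e) with smoothing toward the
    unlexicalized distribution p(orientation).
    counts: Counter over (orientation, e_phrase, f_phrase) triples."""
    total = sum(counts.values())
    unlex = Counter()
    for (o, e, f), c in counts.items():
        unlex[o] += c
    p_unlex = {o: c / total for o, c in unlex.items()}

    def p_o(orientation, e, f):
        pair_total = sum(c for (o2, e2, f2), c in counts.items()
                         if (e2, f2) == (e, f))
        c = counts.get((orientation, e, f), 0)
        return (sigma * p_unlex.get(orientation, 0.0) + c) / (sigma + pair_total)

    return p_o
```

With counts of 3 monotone and 1 swap for a hypothetical phrase pair ("fun", "spass"), the unlexicalized distribution is (0.75, 0.25), and the smoothed lexicalized estimates coincide with it here since the pair carries all the counts.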


operation sequence model


A Critique: Phrase Segmentation is Arbitrary

  • If multiple segmentations are possible, why choose one over the other?

    [figure: two different segmentations of "spass am spiel"]

  • When to choose larger phrase pairs, and when multiple shorter phrase pairs?

    [figure: three different segmentations of "spass am spiel"]

  • None of this has been properly addressed


A Critique: Strong Independence Assumptions

  • Lexical context considered only within phrase pairs

spass am → fun with

  • No context considered between phrase pairs

? spass am ? → ? fun with ?

  • Some phrasal context considered in lexicalized reordering model

... but not based on the identity of neighboring phrases


Segmentation? Minimal Phrase Pairs

  [figure: "natürlich hat John Spaß am Spiel" aligned with
   "of course John has fun with the game", shown once as a phrase
   segmentation and once as minimal phrase pairs
   (natürlich: of course; hat: has; John: John; Spaß: fun; am: with the; Spiel: game)]


Independence? Consider Sequence of Operations

   1. Generate(natürlich, of course)   natürlich ↓                          of course
   2. Insert Gap                       natürlich __ ↓                       of course
   3. Generate(John, John)             natürlich __ John ↓                  of course John
   4. Jump Back(1)                     natürlich ↓ John                     of course John
   5. Generate(hat, has)               natürlich hat ↓ John                 of course John has
   6. Jump Forward                     natürlich hat John ↓                 of course John has
   7. Generate(Spaß, fun)              natürlich hat John Spaß ↓            of course John has fun
   8. Generate(am, with)               natürlich hat John Spaß am ↓         of course John has fun with
   9. Generate Target Only(the)        natürlich hat John Spaß am ↓         of course John has fun with the
  10. Generate(Spiel, game)            natürlich hat John Spaß am Spiel ↓   of course John has fun with the game


Operation Sequence Model

  • Operations

  – generate (phrase translation)
  – generate target only
  – generate source only
  – insert gap
  – jump back
  – jump forward

  • N-gram sequence model over operations, e.g., 5-gram model:

p(o1) p(o2|o1) p(o3|o1, o2) ... p(o10|o6, o7, o8, o9)


In Practice

  • Operation Sequence Model used as additional feature function
  • Significant improvements over phrase-based baseline

→ State-of-the-art systems include such a model


Summary

  • Phrase Model
  • Training the model

  – word alignment
  – phrase pair extraction
  – phrase pair scoring
  – EM training of the phrase model

  • Log-linear model

  – sub-models as feature functions
  – lexical weighting
  – word and phrase count features

  • Lexicalized reordering model
  • Operation sequence model
