Decoding Philipp Koehn 17 September 2020 Philipp Koehn Machine - PowerPoint PPT Presentation

Decoding Philipp Koehn 17 September 2020 Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding 1 • We have a mathematical model for translation p ( e | f ) • Task of decoding: find the translation e best with highest probability e best = argmax e p ( e | f ) • Two types of error – the most probable translation is bad → fix the model – search does not find the most probably translation → fix the search • Decoding is evaluated by search error, not quality of translations (although these are often correlated) Philipp Koehn Machine Translation: Decoding 17 September 2020

2 translation process Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Process 3 • Task: translate this sentence from German into English er geht ja nicht nach hause Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Process 4 • Task: translate this sentence from German into English er geht ja nicht nach hause er he • Pick phrase in input, translate Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Process 5 • Task: translate this sentence from German into English er geht ja nicht nach hause er ja nicht he does not • Pick phrase in input, translate – it is allowed to pick words out of sequence reordering – phrases may have multiple words: many-to-many translation Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Process 6 • Task: translate this sentence from German into English er geht ja nicht nach hause er geht ja nicht he does not go • Pick phrase in input, translate Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Process 7 • Task: translate this sentence from German into English er geht ja nicht nach hause er geht ja nicht nach hause he does not go home • Pick phrase in input, translate Philipp Koehn Machine Translation: Decoding 17 September 2020

Computing Translation Probability 8 • Probabilistic model for phrase-based translation: I φ ( ¯ � e best = argmax e f i | ¯ e i ) d ( start i − end i − 1 − 1) p LM ( e ) i =1 • Score is computed incrementally for each partial hypothesis • Components Phrase translation Picking phrase ¯ f i to be translated as a phrase ¯ e i → look up score φ ( ¯ f i | ¯ e i ) from phrase translation table Reordering Previous phrase ended in end i − 1 , current phrase starts at start i → compute d ( start i − end i − 1 − 1) Language model For n -gram model, need to keep track of last n − 1 words → compute score p LM ( w i | w i − ( n − 1) , ..., w i − 1 ) for added words w i Philipp Koehn Machine Translation: Decoding 17 September 2020

9 decoding process Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Options 10 er geht ja nicht nach hause he is yes not after house it are is do not to home , it goes , of course does not according to chamber , he go , is not in at home it is not home he will be is not under house it goes does not return home he goes do not do not is to are following is after all not after does not to not is not are not is not a • Many translation options to choose from – in Europarl phrase table: 2727 matching phrase pairs for this sentence – by pruning to the top 20 per phrase, 202 translation options remain Philipp Koehn Machine Translation: Decoding 17 September 2020

Translation Options 11 er geht ja nicht nach hause he is yes not after house it are is do not to home , it goes , of course does not according to chamber , he go is not in at home it is not home he will be is not under house it goes does not return home he goes do not do not is to are following is after all not after does not to not is not are not is not a • The machine translation decoder does not know the right answer – picking the right translation options – arranging them in the right order → Search problem solved by heuristic beam search Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding: Precompute Translation Options 12 er geht ja nicht nach hause consult phrase translation table for all input phrases Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding: Start with Initial Hypothesis 13 er geht ja nicht nach hause initial hypothesis: no input words covered, no output produced Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding: Hypothesis Expansion 14 er geht ja nicht nach hause are pick any translation option, create new hypothesis Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding: Hypothesis Expansion 15 er geht ja nicht nach hause he are it create hypotheses for all other translation options Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding: Hypothesis Expansion 16 er geht ja nicht nach hause yes he home goes are does not go home it to also create hypotheses from created partial hypothesis Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding: Find Best Path 17 er geht ja nicht nach hause yes he home goes are does not go home it to backtrack from highest scoring complete hypothesis Philipp Koehn Machine Translation: Decoding 17 September 2020

18 dynamic programming Philipp Koehn Machine Translation: Decoding 17 September 2020

Computational Complexity 19 • The suggested process creates exponential number of hypothesis • Machine translation decoding is NP-complete • Reduction of search space: – recombination (risk-free) – pruning (risky) Philipp Koehn Machine Translation: Decoding 17 September 2020

Recombination 20 • Two hypothesis paths lead to two matching hypotheses – same foreign words translated – same English words in the output it is it is • Worse hypothesis is dropped it is Philipp Koehn Machine Translation: Decoding 17 September 2020

Recombination 21 • Two hypothesis paths lead to hypotheses indistinguishable in subsequent search – same foreign words translated – same last two English words in output (assuming trigram language model) – same last foreign word translated he does not it does not • Worse hypothesis is dropped he does not it Philipp Koehn Machine Translation: Decoding 17 September 2020

Restrictions on Recombination 22 • Translation model: Phrase translation independent from each other → no restriction to hypothesis recombination • Language model: Last n − 1 words used as history in n -gram language model → recombined hypotheses must match in their last n − 1 words • Reordering model: Distance-based reordering model based on distance to end position of previous input phrase → recombined hypotheses must have that same end position • Other feature function may introduce additional restrictions Philipp Koehn Machine Translation: Decoding 17 September 2020

23 pruning Philipp Koehn Machine Translation: Decoding 17 September 2020

Pruning 24 • Recombination reduces search space, but not enough (we still have a NP complete problem on our hands) • Pruning: remove bad hypotheses early – put comparable hypothesis into stacks (hypotheses that have translated same number of input words) – limit number of hypotheses in each stack Philipp Koehn Machine Translation: Decoding 17 September 2020

Stacks 25 goes does not he are it yes no word one word two words three words translated translated translated translated • Hypothesis expansion in a stack decoder – translation option is applied to hypothesis – new hypothesis is dropped into a stack further down Philipp Koehn Machine Translation: Decoding 17 September 2020

Stack Decoding Algorithm 26 1: place empty hypothesis into stack 0 2: for all stacks 0... n − 1 do for all hypotheses in stack do 3: for all translation options do 4: if applicable then 5: create new hypothesis 6: place in stack 7: recombine with existing hypothesis if possible 8: prune stack if too big 9: end if 10: end for 11: end for 12: 13: end for Philipp Koehn Machine Translation: Decoding 17 September 2020

Pruning 27 • Pruning strategies – histogram pruning: keep at most k hypotheses in each stack – stack pruning: keep hypothesis with score α × best score ( α < 1 ) • Computational time complexity of decoding with histogram pruning O ( max stack size × translation options × sentence length ) • Number of translation options is linear with sentence length, hence: O ( max stack size × sentence length 2 ) • Quadratic complexity Philipp Koehn Machine Translation: Decoding 17 September 2020

Reordering Limits 28 • Limiting reordering to maximum reordering distance • Typical reordering distance 5–8 words – depending on language pair – larger reordering limit hurts translation quality • Reduces complexity to linear O ( max stack size × sentence length ) • Speed / quality trade-off by setting maximum stack size Philipp Koehn Machine Translation: Decoding 17 September 2020

29 future cost estimation Philipp Koehn Machine Translation: Decoding 17 September 2020

Decoding Philipp Koehn 17 September 2020 Philipp Koehn Machine - PowerPoint PPT Presentation

Decoding Philipp Koehn 17 September 2020 Philipp Koehn Machine Translation: Decoding 17 September 2020 Decoding 1 We have a mathematical model for translation p ( e | f ) Task of decoding: find the translation e best with highest

By et al Siegfried Engelmann Decoding Strategies: Decoding B1- Teacher's Presentation Book

Efficient Video Decoding on GPUs Efficient Video Decoding on GPUs by Point Based Rendering by

Chapter 6 Decoding Statistical Machine Translation Decoding We have a mathematical model for

List Decoding of Algebraic Codes Peter Beelen, Kristian Brander and Johan S.R. Nielsen DTU

Observation Decoding with Sensor Models: Recognition Tasks via Classical Planning Diego Aineto,

Beyond Sequential decoding toward parallel decoding In the context of neural sequence modelling

Decoding One Out of Many Nicolas Sendrier INRIA Paris-Rocquencourt, equipe-projet SECRET

Why decoding? Understanding the neural code. Neural Decoding Given spikes, what was the

Decoding Reed-Muller codes over product sets John Kim, Swastik Kopparty Rutgers University May

6.02 Fall 2012 Lecture #7 Viterbi decoding of convolutional codes Path and branch metrics

Decoding in SMT Nitin Madnani February 8, 2006 The Decoding Problem Search Inputs:

Syntax-Based Decoding Philipp Koehn 9 November 2017 Philipp Koehn Machine Translation:

Smaller decoding exponents: ball-collision decoding D. J. Bernstein University of Illinois at

Decoding challenge Assessing the practical hardness of syndrome decoding for code-based

Missing-data masks in all-combinations multi-band decoding It is shown that for MAP decoding

Syntax-Based Decoding 2 Philipp Koehn 14 November 2017 Philipp Koehn Machine Translation:

Boolean and Vector Space Retrieval Models CS 293S, 2017 Some of slides from R. Mooney

Seeded Discovery of Base Relations in Large Corpora Nicholas Andrews 1 Naren Ramakrishnan 2 1 BBN

AgileItera+ons: Planning&Repor+ng MovingFromUserStories

Real-Time Open-Domain QA with Dense-Sparse Phrase Index Minjoon Seo, Jinhyuk Lee, Tom

CS314 Software Engineering Sprint 4 - Worldwide Trips! Dave Matthews Sprint 4 Summary Use

Dynamically shaping the reordering search space of phrase-based SMT Arianna Bisazza &

Statistical Machine Translation George Foster George Foster Statistical Machine Translation A

Social Engineering Fundamentals Exploiting the Human Bugs Anthony C. Zboralski

Sambuz

Useful Links

Newsletter

Mail Us

Decoding Philipp Koehn 17 September 2020 Philipp Koehn Machine - PowerPoint PPT Presentation

Decoding Philipp Koehn 17 September 2020 Philipp Koehn Machine Translation: Decoding 17 September 2020 Decoding 1 We have a mathematical model for translation p ( e | f ) Task of decoding: find the translation e best with highest

By et al Siegfried Engelmann Decoding Strategies: Decoding B1- Teacher's Presentation Book

Efficient Video Decoding on GPUs Efficient Video Decoding on GPUs by Point Based Rendering by

Chapter 6 Decoding Statistical Machine Translation Decoding We have a mathematical model for

List Decoding of Algebraic Codes Peter Beelen, Kristian Brander and Johan S.R. Nielsen DTU

Observation Decoding with Sensor Models: Recognition Tasks via Classical Planning Diego Aineto,

Beyond Sequential decoding toward parallel decoding In the context of neural sequence modelling

Decoding One Out of Many Nicolas Sendrier INRIA Paris-Rocquencourt, equipe-projet SECRET

Why decoding? Understanding the neural code. Neural Decoding Given spikes, what was the

Decoding Reed-Muller codes over product sets John Kim, Swastik Kopparty Rutgers University May

6.02 Fall 2012 Lecture #7 Viterbi decoding of convolutional codes Path and branch metrics

Decoding in SMT Nitin Madnani February 8, 2006 The Decoding Problem Search Inputs:

Syntax-Based Decoding Philipp Koehn 9 November 2017 Philipp Koehn Machine Translation:

Smaller decoding exponents: ball-collision decoding D. J. Bernstein University of Illinois at

Decoding challenge Assessing the practical hardness of syndrome decoding for code-based

Missing-data masks in all-combinations multi-band decoding It is shown that for MAP decoding

Syntax-Based Decoding 2 Philipp Koehn 14 November 2017 Philipp Koehn Machine Translation:

Boolean and Vector Space Retrieval Models CS 293S, 2017 Some of slides from R. Mooney

Seeded Discovery of Base Relations in Large Corpora Nicholas Andrews 1 Naren Ramakrishnan 2 1 BBN

AgileItera+ons: Planning&amp;Repor+ng MovingFromUserStories

Real-Time Open-Domain QA with Dense-Sparse Phrase Index Minjoon Seo*, Jinhyuk Lee*, Tom

CS314 Software Engineering Sprint 4 - Worldwide Trips! Dave Matthews Sprint 4 Summary Use

Dynamically shaping the reordering search space of phrase-based SMT Arianna Bisazza &amp;

Statistical Machine Translation George Foster George Foster Statistical Machine Translation A

Social Engineering Fundamentals Exploiting the Human Bugs Anthony C. Zboralski

Sambuz

Useful Links

Newsletter

Mail Us

AgileItera+ons: Planning&Repor+ng MovingFromUserStories

Real-Time Open-Domain QA with Dense-Sparse Phrase Index Minjoon Seo, Jinhyuk Lee, Tom

Dynamically shaping the reordering search space of phrase-based SMT Arianna Bisazza &