Chapter 6 Decoding Statistical Machine Translation

Decoding
• We have a mathematical model for translation $p(e \mid f)$
• Task of decoding: find the translation $e_{\text{best}}$ with the highest probability
  $e_{\text{best}} = \operatorname{argmax}_e \, p(e \mid f)$
• Two types of error
  – the most probable translation is bad → fix the model
  – search does not find the most probable translation → fix the search
• Decoding is evaluated by search error, not quality of translations (although these are often correlated)

Translation Process
• Task: translate this sentence from German into English
  er geht ja nicht nach hause

Translation Process
• Task: translate this sentence from German into English
  er geht ja nicht nach hause
  er → he
• Pick phrase in input, translate

Translation Process
• Task: translate this sentence from German into English
  er geht ja nicht nach hause
  er → he, ja nicht → does not
• Pick phrase in input, translate
  – it is allowed to pick words out of sequence (reordering)
  – phrases may have multiple words: many-to-many translation

Translation Process
• Task: translate this sentence from German into English
  er geht ja nicht nach hause
  er → he, ja nicht → does not, geht → go
• Pick phrase in input, translate

Translation Process
• Task: translate this sentence from German into English
  er geht ja nicht nach hause
  er → he, ja nicht → does not, geht → go, nach hause → home
  Output: he does not go home
• Pick phrase in input, translate

Computing Translation Probability
• Probabilistic model for phrase-based translation:
  $$e_{\text{best}} = \operatorname{argmax}_e \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i) \; d(\text{start}_i - \text{end}_{i-1} - 1) \; p_{\text{lm}}(e)$$
• Score is computed incrementally for each partial hypothesis
• Components
  – Phrase translation: picking phrase $\bar{f}_i$ to be translated as a phrase $\bar{e}_i$ → look up score $\phi(\bar{f}_i \mid \bar{e}_i)$ from the phrase translation table
  – Reordering: previous phrase ended at $\text{end}_{i-1}$, current phrase starts at $\text{start}_i$ → compute $d(\text{start}_i - \text{end}_{i-1} - 1)$
  – Language model: for an $n$-gram model, need to keep track of the last $n-1$ words → compute score $p_{\text{lm}}(w_i \mid w_{i-(n-1)}, \ldots, w_{i-1})$ for added words $w_i$
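The incremental computation can be sketched in the log domain, where the product becomes a sum that grows by three terms per added phrase. This is a minimal sketch, not the deck's implementation: the function names are hypothetical, and the exponential form d(x) = alpha^|x| with alpha = 0.5 is an assumption, since the slide does not define d.

```python
import math

def distortion_logprob(start, prev_end, alpha=0.5):
    """log d(x) for x = start - prev_end - 1, assuming d(x) = alpha ** abs(x)."""
    return abs(start - prev_end - 1) * math.log(alpha)

def extend_score(score, phrase_logprob, start, prev_end, lm_logprobs):
    """Add one phrase pair's contribution to a partial hypothesis score."""
    return (score
            + phrase_logprob                       # log phi(f_i | e_i)
            + distortion_logprob(start, prev_end)  # log d(start_i - end_{i-1} - 1)
            + sum(lm_logprobs))                    # log p_lm for each added word

# Monotone step: the previous phrase ended at position 1, the next starts at 2,
# so the reordering distance is 0 and only the other two components contribute.
s = extend_score(0.0, math.log(0.5), start=2, prev_end=1,
                 lm_logprobs=[math.log(0.1)])
```

Working with log scores avoids numerical underflow when many small probabilities are multiplied.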

Translation Options
  er      geht    ja             nicht       nach           hause
  he      is      yes            not         after          house
  it      are     is             do not      to             home
  , it    goes    , of course    does not    according to   chamber
  , he    go      ,              is not      in             at home
  Options spanning several input words (column alignment not recoverable): it is, not, home, he will be, is not, under house, it goes, does not, return home, he goes, do not, do not, is, to, are, following, is after all, not after, does not, to, not, is not, are not, is not a
• Many translation options to choose from
  – in Europarl phrase table: 2727 matching phrase pairs for this sentence
  – by pruning to the top 20 per phrase, 202 translation options remain

Translation Options
  er geht ja nicht nach hause
  [Figure: table of translation options for each input phrase of this sentence]
• The machine translation decoder does not know the right answer
  – picking the right translation options
  – arranging them in the right order
  → Search problem solved by heuristic beam search

Decoding: Precompute Translation Options
  er geht ja nicht nach hause
  consult phrase translation table for all input phrases
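A minimal sketch of this precomputation, assuming the phrase table is a plain dictionary from a source phrase to a list of (English phrase, log-probability) pairs. The function name, the table layout, and the toy scores are all hypothetical; the `top_k` cut mirrors the "top 20 per phrase" pruning mentioned on the Translation Options slide.

```python
def collect_translation_options(words, phrase_table, top_k=20):
    """Return {(start, end): [(english, logprob), ...]} for every input span
    (end exclusive) that has at least one entry in the phrase table."""
    options = {}
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            source = " ".join(words[start:end])
            candidates = phrase_table.get(source)
            if candidates:
                # Keep only the top_k highest-scoring translations per span.
                ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
                options[(start, end)] = ranked[:top_k]
    return options

# Toy table with made-up scores, for illustration only.
toy_table = {
    "er": [("he", -0.5), ("it", -1.2)],
    "ja nicht": [("does not", -0.7)],
    "nach hause": [("home", -0.4)],
}
opts = collect_translation_options(
    "er geht ja nicht nach hause".split(), toy_table, top_k=1)
```

Doing the lookup once per sentence means hypothesis expansion only has to iterate over `opts` instead of querying the table repeatedly.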

Decoding: Start with Initial Hypothesis
  er geht ja nicht nach hause
  initial hypothesis: no input words covered, no output produced

Decoding: Hypothesis Expansion
  er geht ja nicht nach hause
  [Figure: search graph with one new hypothesis, "are"]
  pick any translation option, create new hypothesis

Decoding: Hypothesis Expansion
  er geht ja nicht nach hause
  [Figure: search graph with hypotheses "he", "are", "it"]
  create hypotheses for all other translation options

Decoding: Hypothesis Expansion
  er geht ja nicht nach hause
  [Figure: search graph with hypotheses "yes", "he", "home", "goes", "are", "does not", "go", "home", "it", "to"]
  also create hypotheses from created partial hypothesis

Decoding: Find Best Path
  er geht ja nicht nach hause
  [Figure: search graph with hypotheses "yes", "he", "home", "goes", "are", "does not", "go", "home", "it", "to"]
  backtrack from highest scoring complete hypothesis
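Backtracking presupposes that each hypothesis remembers the hypothesis it was expanded from. The sketch below shows one possible layout; the `Hypothesis` class, its fields, and the scores are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Hypothesis:
    coverage: Tuple[bool, ...]    # which input words are already translated
    output: Tuple[str, ...]       # English words added by the last expansion
    score: float                  # accumulated log score (made-up values below)
    back: Optional["Hypothesis"]  # hypothesis this one was expanded from

def backtrack(hyp):
    """Follow back-pointers to the initial hypothesis; return the words in order."""
    pieces = []
    while hyp is not None:
        pieces.append(hyp.output)
        hyp = hyp.back
    return [word for piece in reversed(pieces) for word in piece]

# er geht ja nicht nach hause: er, then ja nicht, then geht, then nach hause.
F, T = False, True
h0 = Hypothesis((F, F, F, F, F, F), (), 0.0, None)
h1 = Hypothesis((T, F, F, F, F, F), ("he",), -0.5, h0)
h2 = Hypothesis((T, F, T, T, F, F), ("does", "not"), -1.4, h1)
h3 = Hypothesis((T, T, T, T, F, F), ("go",), -2.3, h2)
h4 = Hypothesis((T, T, T, T, T, T), ("home",), -2.9, h3)
translation = backtrack(h4)
```

Storing only the newly added words plus a back-pointer keeps each hypothesis small, since shared prefixes are stored once.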

Computational Complexity
• The suggested process creates an exponential number of hypotheses
• Machine translation decoding is NP-complete
• Reduction of search space:
  – recombination (risk-free)
  – pruning (risky)

Recombination
• Two hypothesis paths lead to two matching hypotheses
  – same number of foreign words translated
  – same English words in the output
  – different scores
  [Figure: two different paths that both produce "it is"]
• Worse hypothesis is dropped
  [Figure: only one "it is" hypothesis remains]

Recombination
• Two hypothesis paths lead to hypotheses indistinguishable in subsequent search
  – same number of foreign words translated
  – same last two English words in output (assuming trigram language model)
  – same last foreign word translated
  – different scores
  [Figure: paths "he does not" and "it does not"]
• Worse hypothesis is dropped
  [Figure: the paths through "he" and "it" lead to a single "does not" hypothesis]

Restrictions on Recombination
• Translation model: phrase translations are independent of each other
  → no restriction on hypothesis recombination
• Language model: last $n-1$ words are used as history in an $n$-gram language model
  → recombined hypotheses must match in their last $n-1$ words
• Reordering model: distance-based reordering model is based on the distance to the end position of the previous input phrase
  → recombined hypotheses must have that same end position
• Other feature functions may introduce additional restrictions
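These restrictions can be read as a key: two hypotheses may be recombined only if their keys are equal. The sketch below is one way to express that, with hypothetical names and a dictionary-based hypothesis layout. It compares the full coverage vector rather than only the number of translated words, which is a stricter condition chosen here as an assumption.

```python
def recombination_key(coverage, output_words, last_end, n=3):
    """State that later search depends on: coverage, the last n-1 English
    words (language model history), and the end of the last input phrase."""
    history = tuple(output_words[-(n - 1):]) if n > 1 else ()
    return (tuple(coverage), history, last_end)

def recombine(hypotheses, n=3):
    """Keep only the best-scoring hypothesis for each recombination key."""
    best = {}
    for h in hypotheses:
        key = recombination_key(h["coverage"], h["output"], h["last_end"], n)
        if key not in best or h["score"] > best[key]["score"]:
            best[key] = h
    return list(best.values())

# "he does not" and "it does not" end in the same two words (trigram model).
cov = (True, False, True, True, False, False)
hyps = [
    {"coverage": cov, "output": ["he", "does", "not"], "last_end": 3, "score": -3.0},
    {"coverage": cov, "output": ["it", "does", "not"], "last_end": 3, "score": -3.5},
]
survivors = recombine(hyps, n=3)
```

With a 4-gram model (`n=4`) the history would include "he" or "it", so the two hypotheses would no longer share a key and both would be kept.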

Pruning
• Recombination reduces the search space, but not enough (we still have an NP-complete problem on our hands)
• Pruning: remove bad hypotheses early
  – put comparable hypotheses into stacks (hypotheses that have translated the same number of input words)
  – limit the number of hypotheses in each stack

Stacks
  [Figure: four stacks labeled "no word translated", "one word translated", "two words translated", "three words translated", holding hypotheses such as "are", "it", "he", "yes", "goes", "does not"]
• Hypothesis expansion in a stack decoder
  – translation option is applied to hypothesis
  – new hypothesis is dropped into a stack further down

Stack Decoding Algorithm
  place empty hypothesis into stack 0
  for all stacks 0 ... n−1 do
    for all hypotheses in stack do
      for all translation options do
        if applicable then
          create new hypothesis
          place in stack
          recombine with existing hypothesis if possible
          prune stack if too big
        end if
      end for
    end for
  end for
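The pseudocode can be turned into a small runnable sketch under simplifying assumptions that are not in the slides: there is no language model, the reordering cost is a linear penalty on the jump distance, each hypothesis stores its full output instead of a back-pointer, and the function name and data layout are hypothetical. Without a language model the sketch prefers monotone order, so the toy run below does not reproduce the reordered translation from the earlier slides.

```python
def stack_decode(words, options, max_stack=10, distortion_penalty=1.0):
    """options: {(start, end): [(english, logprob), ...]}, end exclusive.
    A hypothesis is the tuple (score, coverage, output, last_end)."""
    n = len(words)
    stacks = [dict() for _ in range(n + 1)]   # stack i: i input words covered
    stacks[0][((False,) * n, (), 0)] = (0.0, (False,) * n, (), 0)
    for i in range(n):                        # stacks 0 .. n-1
        for score, cov, out, last_end in list(stacks[i].values()):
            for (start, end), candidates in options.items():
                if any(cov[start:end]):       # not applicable: span overlaps
                    continue
                new_cov = cov[:start] + (True,) * (end - start) + cov[end:]
                jump = distortion_penalty * abs(start - last_end)
                for english, logprob in candidates:
                    new_out = out + tuple(english.split())
                    new_score = score + logprob - jump
                    stack = stacks[sum(new_cov)]
                    # Recombine: same coverage, last two words, end position.
                    key = (new_cov, new_out[-2:], end)
                    if key not in stack or new_score > stack[key][0]:
                        stack[key] = (new_score, new_cov, new_out, end)
                    if len(stack) > max_stack:  # histogram pruning
                        del stack[min(stack, key=lambda k: stack[k][0])]
    complete = list(stacks[n].values())
    return max(complete, key=lambda h: h[0]) if complete else None

# Toy options with made-up scores.
toy_options = {
    (0, 1): [("he", -0.5)],
    (1, 2): [("goes", -0.6)],
    (2, 4): [("does not", -0.7)],
    (4, 6): [("home", -0.4)],
}
best = stack_decode("er geht ja nicht nach hause".split(), toy_options)
```

Because every expansion covers at least one more word, new hypotheses always land in a later stack, so each stack is final by the time the outer loop reaches it.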

Pruning
• Pruning strategies
  – histogram pruning: keep at most $k$ hypotheses in each stack
  – stack pruning: keep only hypotheses with a score of at least $\alpha \times$ best score ($\alpha < 1$)
• Computational time complexity of decoding with histogram pruning
  $O(\text{max stack size} \times \text{translation options} \times \text{sentence length})$
• Number of translation options is linear in sentence length, hence:
  $O(\text{max stack size} \times \text{sentence length}^2)$
• Quadratic complexity
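Both strategies fit in a few lines. This is a sketch with hypothetical function names; scores are treated as probabilities rather than log-probabilities so that the α × best score rule applies directly, and each hypothesis is a (score, label) tuple for illustration.

```python
def histogram_prune(stack, k):
    """Keep at most k hypotheses, those with the highest scores."""
    return sorted(stack, key=lambda h: h[0], reverse=True)[:k]

def threshold_prune(stack, alpha):
    """Keep hypotheses whose score is at least alpha times the best score."""
    if not stack:
        return []
    best = max(h[0] for h in stack)
    return [h for h in stack if h[0] >= alpha * best]

# Made-up probabilities for three hypotheses in one stack.
stack = [(0.020, "a"), (0.001, "c"), (0.012, "b")]
kept_histogram = histogram_prune(stack, k=2)
kept_threshold = threshold_prune(stack, alpha=0.5)
```

Histogram pruning bounds the work per stack regardless of the scores, while the relative threshold adapts to how close the competing hypotheses are.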

Reordering Limits
• Limiting reordering to maximum reordering distance
• Typical reordering distance 5–8 words
  – depending on language pair
  – larger reordering limit hurts translation quality
• Reduces complexity to linear
  $O(\text{max stack size} \times \text{sentence length})$
• Speed / quality trade-off by setting maximum stack size
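A reordering limit can be applied as an extra applicability test during hypothesis expansion. The check below is a sketch under assumptions: the function name is hypothetical, positions are 1-based and inclusive, and the distance is measured as in the reordering model, start_i − end_{i−1} − 1. Decoders may define the limit differently.

```python
def within_reordering_limit(start, prev_end, max_distance=6):
    """True if the jump from the previous phrase's end to the next phrase's
    start stays within the maximum reordering distance."""
    return abs(start - prev_end - 1) <= max_distance

# Previous phrase ended at word 2: starting at word 4 skips one word,
# starting at word 12 would skip nine words.
near = within_reordering_limit(4, 2)
far = within_reordering_limit(12, 2)
```

With the limit in place, each hypothesis can only be extended by options that start within a fixed window, so the number of applicable options per hypothesis no longer grows with sentence length.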

Translating the Easy Part First?
  Input: the tourism initiative addresses this for the first time
  Hypothesis path 1 (the tourism initiative → die touristische initiative):
    the → die:                    tm:-0.19, lm:-0.4,  d:0, all:-0.65
    tourism → touristische:       tm:-1.16, lm:-2.93, d:0, all:-4.09
    initiative → initiative:      tm:-1.21, lm:-4.67, d:0, all:-5.88
  Hypothesis path 2:
    the first time → das erste mal: tm:-0.56, lm:-2.81, d:-0.74, all:-4.11
• both hypotheses translate 3 words
• worse hypothesis has better score

Estimating Future Cost
• Future cost estimate: how expensive is translation of rest of sentence?
• Optimistic: choose cheapest translation options
• Cost for each translation option
  – translation model: cost known
  – language model: output words known, but not context → estimate without context
  – reordering model: unknown, ignored for future cost estimation
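The per-option estimate can be sketched as the known translation model score plus a context-free language model estimate. Using unigram log-probabilities for the context-free part is an assumption made here for simplicity; the function names and all numbers are hypothetical. Costs are log-probabilities, so the cheapest option is the one with the highest value.

```python
def option_cost_estimate(tm_logprob, english, unigram_logprob):
    """Known translation model score plus a language model estimate that
    ignores context (here: a sum of unigram log-probabilities)."""
    lm_estimate = sum(unigram_logprob[w] for w in english.split())
    return tm_logprob + lm_estimate  # reordering cost is ignored

def cheapest_option_cost(candidates, unigram_logprob):
    """Optimistic estimate for one input span: its best-scoring option."""
    return max(option_cost_estimate(tm, english, unigram_logprob)
               for english, tm in candidates)

# Made-up scores for two options translating the same span.
unigrams = {"the": -0.8, "this": -1.5}
span_cost = cheapest_option_cost([("the", -0.2), ("this", -0.9)], unigrams)
```

Because the estimate depends only on the input span, it can be computed once per sentence before the search starts.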

Cost Estimates from Translation Options
  Cost of the cheapest translation option for each single input word (log-probabilities):
  the    tourism    initiative    addresses    this    for     the     first    time
  -1.0   -2.0       -1.5          -2.4         -1.4    -1.0    -1.0    -1.9     -1.6
  Costs for spans of several words (span alignment not recoverable): -4.0, -2.5, -2.2, -1.3, -2.4, -2.7, -2.3, -2.3, -2.3
  cost of cheapest translation options for each input span (log-probabilities)
