Decoding
Philipp Koehn 17 September 2020
Philipp Koehn Machine Translation: Decoding 17 September 2020
We have a mathematical model for translation: p(e|f)
Task of decoding: find the translation e_best with the highest probability
$e_{\text{best}} = \operatorname{argmax}_e \; p(e|f)$
– the most probable translation is bad → fix the model
– search does not find the most probable translation → fix the search
(although these are often correlated)
er geht ja nicht nach hause
er geht ja nicht nach hause
er → he
er geht ja nicht nach hause
er → he, ja nicht → does not
– it is allowed to pick words out of sequence (reordering)
– phrases may have multiple words: many-to-many translation
er geht ja nicht nach hause
er → he, ja nicht → does not, geht → go
er geht ja nicht nach hause
er → he, ja nicht → does not, geht → go, nach hause → home
he does not go home
$e_{\text{best}} = \operatorname{argmax}_e \prod_{i=1}^{I} \phi(\bar{f}_i|\bar{e}_i) \; d(\text{start}_i - \text{end}_{i-1} - 1) \; p_{\text{LM}}(e)$
– Phrase translation: picking phrase $\bar{f}_i$ to be translated as a phrase $\bar{e}_i$ → look up score $\phi(\bar{f}_i|\bar{e}_i)$ from phrase translation table
– Reordering: previous phrase ended in $\text{end}_{i-1}$, current phrase starts at $\text{start}_i$ → compute $d(\text{start}_i - \text{end}_{i-1} - 1)$
– Language model: for n-gram model, need to keep track of last n − 1 words → compute score $p_{\text{LM}}(w_i|w_{i-(n-1)}, ..., w_{i-1})$ for added words $w_i$
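The three lookups above can be sketched in code. This is a minimal illustration in log space, not the scoring of any particular decoder: the toy tables `PHRASE_TABLE` and `BIGRAM_LM`, the decay constant `ALPHA`, the choice d(x) = ALPHA^|x|, the bigram (rather than general n-gram) history, and all numbers are assumptions made for the example.

```python
import math

# Hypothetical toy models; none of these numbers come from a real system.
PHRASE_TABLE = {("er", "he"): 0.6, ("ja nicht", "does not"): 0.3}
BIGRAM_LM = {("<s>", "he"): 0.2, ("he", "does"): 0.1, ("does", "not"): 0.5}
ALPHA = 0.5  # assumed decay for the distance-based reordering model


def distortion(start, prev_end):
    """d(start_i - end_{i-1} - 1), here assumed to be ALPHA ** |distance|."""
    return ALPHA ** abs(start - prev_end - 1)


def lm_logprob(history, words):
    """Bigram log-probability of the added words given the last output word."""
    total = 0.0
    for word in words:
        total += math.log(BIGRAM_LM.get((history, word), 1e-4))
        history = word
    return total


def expansion_logscore(src, tgt, start, prev_end, last_word):
    """Log score added by translating source phrase `src` as `tgt`."""
    return (math.log(PHRASE_TABLE[(src, tgt)])
            + math.log(distortion(start, prev_end))
            + lm_logprob(last_word, tgt.split()))


# "er" is input word 1; there is no previous phrase, so prev_end = 0.
step1 = expansion_logscore("er", "he", start=1, prev_end=0, last_word="<s>")
# "ja nicht" covers input words 3-4 and skips "geht" at position 2.
step2 = expansion_logscore("ja nicht", "does not", start=3, prev_end=1,
                           last_word="he")
```

Working with log scores turns the product over phrases into a sum, so the score of a partial translation is simply the sum of its expansion scores.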
er geht ja nicht nach hause
[Figure: translation options for the input sentence, including he, it, goes, go, is, are, yes, of course, not, do not, does not, is not, after, to, according to, in, house, home, chamber, at home, return home, it is, he goes, it goes, he will be, is after all, ...]
– in Europarl phrase table: 2727 matching phrase pairs for this sentence
– by pruning to the top 20 per phrase, 202 translation options remain
– picking the right translation options
– arranging them in the right order
→ Search problem solved by heuristic beam search
er geht ja nicht nach hause
consult phrase translation table for all input phrases
er geht ja nicht nach hause
initial hypothesis: no input words covered, no output produced
er geht ja nicht nach hause
[Figure: first hypothesis "are"]
pick any translation option, create new hypothesis
er geht ja nicht nach hause
[Figure: hypotheses "are", "it", "he"]
create hypotheses for all other translation options
er geht ja nicht nach hause
[Figure: search graph of hypotheses: are, it, he, goes, does not, yes, go, to, home, home]
also create hypotheses from created partial hypothesis
er geht ja nicht nach hause
[Figure: search graph of hypotheses: are, it, he, goes, does not, yes, go, to, home, home]
backtrack from highest scoring complete hypothesis
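Hypothesis expansion and backtracking can be sketched with back-pointers. The class and field names below (`Hypothesis`, `covered`, `back`) and all scores are illustrative assumptions, not taken from the slides; positions are 0-indexed and spans are half-open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One node in the search graph (field names are illustrative)."""
    covered: Tuple[bool, ...]            # which input words are translated
    output: Tuple[str, ...]              # words added by the last option
    score: float                         # log score of the path so far
    back: Optional["Hypothesis"] = None  # pointer to the previous hypothesis


def expand(hyp, start, end, words, logscore):
    """Apply a translation option covering input positions [start, end)."""
    if any(hyp.covered[start:end]):
        return None  # overlaps words that are already translated
    covered = tuple(c or start <= i < end for i, c in enumerate(hyp.covered))
    return Hypothesis(covered, tuple(words), hyp.score + logscore, hyp)


def backtrack(hyp):
    """Follow back-pointers from a complete hypothesis to read the output."""
    phrases = []
    while hyp is not None:
        phrases.append(hyp.output)
        hyp = hyp.back
    return [w for phrase in reversed(phrases) for w in phrase]


# Initial hypothesis: no input words covered, no output produced.
empty = Hypothesis(covered=(False,) * 6, output=(), score=0.0)
h = expand(empty, 0, 1, ["he"], -0.5)        # er
h = expand(h, 2, 4, ["does", "not"], -1.2)   # ja nicht
h = expand(h, 1, 2, ["go"], -0.9)            # geht
h = expand(h, 4, 6, ["home"], -0.7)          # nach hause
translation = " ".join(backtrack(h))
```

Because each hypothesis stores only the words it added plus a pointer to its predecessor, many hypotheses can share the same history without copying it.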
– recombination (risk-free)
– pruning (risky)
– same foreign words translated
– same English words in the output
[Figure: two hypothesis paths that both produce "it is"; after recombination a single hypothesis "it is" remains]
– same foreign words translated
– same last two English words in output (assuming trigram language model)
– same last foreign word translated
[Figure: paths through "it" and "he" that both end in "does not"; after recombination a single "does not" hypothesis remains]
– translation model → no restriction to hypothesis recombination
– language model → recombined hypotheses must match in their last n − 1 words
– reordering model: based on position of previous input phrase → recombined hypotheses must have that same end position
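One way to implement these restrictions is to derive a key from exactly the state that later scoring can still see, and to keep one hypothesis per key. The sketch below is an assumption about how this could be coded; `recombination_key` and `add_with_recombination` are hypothetical helper names, and the scores are made up.

```python
def recombination_key(covered, output_words, last_end, n=3):
    """State that must match for two hypotheses to be recombined.

    covered: booleans over input words (same foreign words translated)
    output_words: English words produced so far
    last_end: end position of the last translated input phrase
    n: order of the n-gram language model (trigram assumed by default)
    """
    history = tuple(output_words[-(n - 1):]) if n > 1 else ()
    return (tuple(covered), history, last_end)


def add_with_recombination(stack, key, score, hypothesis):
    """Keep only the better-scoring hypothesis for each key."""
    if key not in stack or score > stack[key][0]:
        stack[key] = (score, hypothesis)


stack = {}
cov = (True, False, True, True, False, False)
# Two paths that differ only in the first output word.
k1 = recombination_key(cov, ["it", "does", "not"], last_end=4)
k2 = recombination_key(cov, ["he", "does", "not"], last_end=4)
add_with_recombination(stack, k1, -4.2, "it does not")
add_with_recombination(stack, k2, -3.8, "he does not")
```

Both paths map to the same key, so only the better one survives. Dropping the worse one cannot change the best final translation, since every future score depends only on the shared key.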
(we still have an NP-complete problem on our hands)
– put comparable hypotheses into stacks (hypotheses that have translated the same number of input words)
– limit number of hypotheses in each stack
[Figure: hypothesis stacks labelled "no word translated", "one word translated", "two words translated", "three words translated", holding hypotheses such as are, it, he, goes, does not, yes]
– translation option is applied to hypothesis
– new hypothesis is dropped into a stack further down
place empty hypothesis into stack 0
for all stacks 0...n − 1 do
    for all hypotheses in stack do
        for all translation options do
            if applicable then
                create new hypothesis
                place in stack
                recombine with existing hypothesis if possible
                prune stack if too big
            end if
        end for
    end for
end for
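The pseudocode can be turned into a small runnable sketch. To keep it short, this version makes several assumptions that are not in the slides: there is no language model (so the recombination key is just coverage plus end position), the reordering cost is a linear penalty `distortion_cost` per skipped position, and the option probabilities are invented. The names `Option`, `Hyp` and `stack_decode` are illustrative.

```python
import math
from collections import namedtuple

Option = namedtuple("Option", "start end words logprob")  # covers [start, end)
Hyp = namedtuple("Hyp", "covered last_end score words")


def stack_decode(n_words, options, max_stack=10, distortion_cost=0.5):
    """Stack decoding; stack i holds hypotheses covering i input words."""
    stacks = [dict() for _ in range(n_words + 1)]
    empty = Hyp(frozenset(), 0, 0.0, ())
    stacks[0][(empty.covered, empty.last_end)] = empty  # empty hypothesis

    for i in range(n_words):                        # for all stacks 0...n-1
        for hyp in list(stacks[i].values()):        # for all hypotheses
            for opt in options:                     # for all translation options
                span = frozenset(range(opt.start, opt.end))
                if span & hyp.covered:              # not applicable: overlap
                    continue
                # create new hypothesis
                score = (hyp.score + opt.logprob
                         - distortion_cost * abs(opt.start - hyp.last_end))
                new = Hyp(hyp.covered | span, opt.end, score,
                          hyp.words + opt.words)
                stack = stacks[len(new.covered)]    # place in stack further down
                # recombine with existing hypothesis if possible
                key = (new.covered, new.last_end)
                if key not in stack or stack[key].score < new.score:
                    stack[key] = new
                # prune stack if too big (histogram pruning)
                if len(stack) > max_stack:
                    worst = min(stack, key=lambda k: stack[k].score)
                    del stack[worst]
    final = stacks[n_words].values()
    return max(final, key=lambda h: h.score) if final else None


# Hypothetical options for "er geht ja nicht nach hause" (0-indexed spans).
options = [
    Option(0, 1, ("he",), math.log(0.6)),
    Option(0, 1, ("it",), math.log(0.3)),
    Option(1, 2, ("goes",), math.log(0.5)),
    Option(1, 2, ("go",), math.log(0.3)),
    Option(2, 4, ("does", "not"), math.log(0.4)),
    Option(4, 6, ("home",), math.log(0.5)),
]
best = stack_decode(6, options)
```

Without a language model nothing rewards reordering, so this toy setup prefers the monotone path; a fuller decoder would add the language model score and extend the recombination key with the last n − 1 output words.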
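– histogram pruning: keep at most k hypotheses in each stack
– stack pruning: keep hypotheses with score ≥ α × best score (α < 1)
Time complexity of decoding with histogram pruning:
O(max stack size × translation options × sentence length)
Number of translation options is linear in sentence length, hence:
O(max stack size × sentence length²)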
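Both pruning rules are easy to state in code. The sketch below assumes scores are stored as log-probabilities, so the α rule becomes an additive cutoff; the function names and the example scores are illustrative.

```python
import math


def histogram_prune(scores, k):
    """Keep at most k hypotheses (highest log scores first)."""
    return sorted(scores, reverse=True)[:k]


def alpha_prune(scores, alpha):
    """Keep hypotheses whose probability is at least alpha times the best.

    With log scores, p >= alpha * p_best becomes
    log p >= log(alpha) + log p_best.
    """
    if not scores:
        return []
    cutoff = max(scores) + math.log(alpha)
    return [s for s in scores if s >= cutoff]


stack = [-3.2, -3.5, -4.1, -4.3, -4.7, -5.1]   # illustrative log scores
kept_histogram = histogram_prune(stack, k=3)   # the three best
kept_alpha = alpha_prune(stack, alpha=0.5)     # within log(0.5) of the best
```

Histogram pruning bounds the stack size directly, which is what the complexity estimates rely on; the α rule adapts to how peaked the scores are and may keep fewer or more than k hypotheses.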
– depending on language pair
– larger reordering limit hurts translation quality
With a reordering limit, complexity becomes linear:
O(max stack size × sentence length)
the tourism initiative addresses this for the first time
the → die (tm:-0.19, lm:-0.4, d:0, all:-0.65)
tourism → touristische (tm:-1.16, lm:-2.93, d:0, all:-4.09)
initiative → initiative (tm:-1.21, lm:-4.67, d:0, all:-5.88)
the first time → das erste mal (tm:-0.56, lm:-2.81, d:-0.74, all:-4.11)
– both hypotheses translate 3 words
– worse hypothesis has better score
– translation model: cost known
– language model: output words known, but not context → estimate without context
– reordering model: unknown, ignored for future cost estimation
the tourism initiative addresses this for the first time
cost of cheapest translation options for each input span (log-probabilities)
[Table: future cost estimate for n words (from first word), for n = 1 to 9, with one row per first word: the, tourism, initiative, addresses, this, for, the, first, time]
than unusual ones (tourism initiative addresses: -5.9)
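A table of this kind can be filled by dynamic programming over spans: a span costs either its cheapest single translation option or the best combination of two adjacent sub-spans. The sketch below is one common way to do this and is an assumption here; the function name `future_costs` and the option costs are hypothetical and are not the values from the slide.

```python
def future_costs(n, cheapest):
    """Future cost estimate for every input span [i, j).

    cheapest maps (i, j) to the log-probability of the cheapest translation
    option for that span; spans without an option are simply absent.
    """
    cost = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = cheapest.get((i, j), float("-inf"))
            # Alternatively, cover the span with two adjacent smaller spans.
            for k in range(i + 1, j):
                best = max(best, cost[(i, k)] + cost[(k, j)])
            cost[(i, j)] = best
    return cost


# Hypothetical option costs for a 3-word input.
cheapest = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.5, (0, 2): -2.5}
table = future_costs(3, cheapest)
```

Here the two-word option for span (0, 2) at -2.5 beats translating the words separately at -3.0, which mirrors the observation that common phrases can be cheaper than their parts.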
the tourism initiative → die touristische initiative (tm:-1.21, lm:-4.67, d:0, all:-5.88)
the first time → das erste mal (tm:-0.56, lm:-2.81, d:-0.74, all:-4.11)
this for ... time → für diese zeit (tm:-0.82, lm:-2.98, d:-1.06, all:-4.86)
– left hypothesis starts with hard part: the tourism initiative
  score: -5.88, future cost: -6.1 → total cost -11.98
– middle hypothesis starts with easiest part: the first time
  score: -4.11, future cost: -9.3 → total cost -13.41
– right hypothesis picks easy parts: this for ... time
  score: -4.86, future cost: -9.1 → total cost -13.96
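The effect of adding the future cost can be checked with a few lines of arithmetic. The numbers are the ones from the three hypotheses above; only the variable names are introduced here.

```python
# (label, score so far, future cost estimate) for the three hypotheses
hypotheses = [
    ("the tourism initiative", -5.88, -6.1),
    ("the first time", -4.11, -9.3),
    ("this for ... time", -4.86, -9.1),
]

# Ranking by score alone versus by score plus future cost.
by_score = sorted(hypotheses, key=lambda h: h[1], reverse=True)
by_total = sorted(hypotheses, key=lambda h: h[1] + h[2], reverse=True)

for label, score, future in by_total:
    print(f"{label}: {score + future:.2f}")
```

Ranked by score alone, "the first time" comes first; once the future cost is added, the hypothesis that has already dealt with the hard part moves to the top.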
→ too much computation
place empty hypothesis into stack 0
for all stacks 0...n − 1 do
    for all hypotheses in stack do
        for all translation options do
            if applicable then
                create new hypothesis
                place in stack
                recombine with existing hypothesis if possible
                prune stack if too big
            end if
        end for
    end for
end for
⇒ Loop over groups of hypotheses and groups of translation options, check for applicability once for each pair of groups (not much gained so far)
[Figure: grid with hypotheses (he does not, he just does, it does not, he just does not, he is not, it is not) along one axis and translation options (go, walk, goes, are, is) along the other]
he does not -3.2
he just does -3.5
it does not -4.1
he just does not -4.3
he is not -4.7
it is not -5.1
– a lot of hypotheses share suffixes
– a lot of translation options share prefixes
– combining
  ∗ the last word of a hypothesis
  ∗ the first word of a translation option
  may already indicate if we should pursue further
– organize hypotheses by suffix tree
– organize translation options by prefix tree
– process priority queue based on pairs of nodes in these trees
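The two trees can be sketched as word tries in which every node remembers the best score found below it. This is a simplified illustration under stated assumptions: the class `TrieNode`, the helper names, the example strings and their scores are all hypothetical.

```python
class TrieNode:
    """Node of a word trie that remembers the best score below it."""

    def __init__(self):
        self.children = {}
        self.best = float("-inf")


def insert(root, words, score):
    """Insert a word sequence, updating the best score along the path."""
    node = root
    node.best = max(node.best, score)
    for word in words:
        node = node.children.setdefault(word, TrieNode())
        node.best = max(node.best, score)


def build_suffix_tree(hypotheses):
    """Index hypotheses by their words read from last to first."""
    root = TrieNode()
    for text, score in hypotheses:
        insert(root, reversed(text.split()), score)
    return root


def build_prefix_tree(options):
    """Index translation options by their words read from first to last."""
    root = TrieNode()
    for text, score in options:
        insert(root, text.split(), score)
    return root


# Illustrative strings and scores, not taken from a real model.
hyps = build_suffix_tree([("a big country", -2.1), ("a large country", -2.4),
                          ("the big countries", -2.2)])
opts = build_prefix_tree([("does not waver", -1.1), ("do not waver", -1.5)])
start = hyps.best + opts.best  # score of the root pair of the two trees
```

A pair of nodes, one from each tree, stands for all combinations of hypotheses and options below those nodes, and the sum of the two `best` values gives an optimistic score for that whole group before any language model adjustment.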
Hypotheses with 2 words translated
Translation options for a source span
[Figure: suffix tree over hypotheses with 2 words translated (nodes include ε, country, countries, nation, big, large, a, the) and prefix tree over translation options for a source span (nodes include ε, do, does, not, rarely, waver, hesitate, wavers)]
– (ε,ε), score: -3.2 (-2.1 + -1.1)
– (ε,ε), score: -3.2 (-2.1 + -1.1)
– (country,ε), score: -3.2 (-2.1 + -1.1)
– (ε[1+],ε), score: -3.3 (-2.2 + -1.1)
– (country,ε), score: -3.2 (-2.1 + -1.1)
– (ε[1+],ε), score: -3.3 (-2.2 + -1.1)
$\log \frac{p(\text{does} \mid \text{country})}{p(\text{does})} = +0.2$
– (country,does), score: -3.0 (-2.1 + -1.1 + +0.2)
– (country,ε[1+]), score: -3.6 (-2.1 + -1.5)
– (country,does), score: -3.0 (-2.1 + -1.1 + +0.2)
– (ε[1+],ε), score: -3.3 (-2.2 + -1.1)
– (country,ε[1+]), score: -3.6 (-2.1 + -1.5)
$\log \frac{p(\text{does} \mid \text{big country})}{p(\text{does} \mid \text{country})} = +0.1$
– (big country,does), score: -2.9 (-2.1 + -1.1 + +0.2 + +0.1)
– (country[1+],does), score: -3.7 (-2.1 + -1.1 + +0.2 + -0.7)
– (big country,does), score: -2.9 (-2.1 + -1.1 + +0.2 + +0.1)
– (ε[1+],ε), score: -3.3 (-2.2 + -1.1)
– (country,ε[1+]), score: -3.6 (-2.1 + -1.5)
– (country[1+],does), score: -3.7 (-2.1 + -1.1 + +0.2 + -0.7)
– once a full combination is completed (a big country, does not waver), add it to the stack
– badly matching updates will push items down the priority queue, e.g., $\log \frac{p(\text{does} \mid \text{countries})}{p(\text{does})} = -2.1$
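The first steps of this walk-through can be replayed with a priority queue. The scores are the ones listed above; the helper names `push` and `pop` are illustrative, and the node splitting itself is written out by hand rather than derived from the trees.

```python
import heapq

# Python's heapq is a min-heap, so scores are negated to pop the best first.
queue = []


def push(pair, parts):
    """Add a (hypothesis node, option node) pair scored as the sum of parts."""
    heapq.heappush(queue, (-round(sum(parts), 2), pair))


def pop():
    """Remove and return the best pair together with its score."""
    neg_score, pair = heapq.heappop(queue)
    return pair, -neg_score


push(("ε", "ε"), [-2.1, -1.1])
root = pop()             # (("ε", "ε"), -3.2)

# Splitting the hypothesis side of the root pair.
push(("country", "ε"), [-2.1, -1.1])
push(("ε[1+]", "ε"), [-2.2, -1.1])
first = pop()            # (("country", "ε"), -3.2)

# Splitting the option side and applying the language model update +0.2.
push(("country", "does"), [-2.1, -1.1, +0.2])
push(("country", "ε[1+]"), [-2.1, -1.5])
second = pop()           # (("country", "does"), -3.0)
```

After these steps the queue still holds (ε[1+],ε) at -3.3 and (country,ε[1+]) at -3.6, in that order. A positive language model update moves a pair ahead of its siblings, and a negative one such as -2.1 would push it far down the queue.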
[Figure: search graph with axes "probability + heuristic estimate" and "number of words covered", with a threshold at the cheapest score]
① depth-first expansion to completed path
② recombination
③ alternative path leading to hypothesis beyond threshold
– change the translation of a word or phrase
– combine the translation of two words into a phrase
– split up the translation of a phrase into two smaller phrase translations
– move parts of the output into a different position
– swap parts of the output with the output at a different part of the sentence
– recombination
– pruning (requires future cost estimate)