Machine Translation: Neural Machine Translation II – Refinements
Philipp Koehn
17 October 2017
[Architecture diagram: input word embeddings feed left-to-right and right-to-left recurrent NNs; attention produces the input context; the hidden state yields output word predictions, which are compared to the given output words to compute the error, via output word embeddings]
<s> the house is big . </s> ⇒ <s> das Haus ist groß . </s>
– ensembling
– handling large vocabularies
– using monolingual data
– deep models
– alignment and coverage
– use of linguistic annotation
– multiple language pairs
(most recent, or interim models with highest validation score)
[Decoder diagram: context and state feed the word prediction over candidate words (the, cat, this, fish, there, dog, these); the selected word's embedding becomes the input to the next step]
Word probability distributions from four models and their average (one vocabulary item's label was lost in extraction, marked "?"):

           the   cat   this   ?    fish  there  dog   these
Model 1    .54   .01   .11   .00   .00   .03    .00   .05
Model 2    .52   .02   .12   .00   .01   .03    .00   .09
Model 3    .12   .33   .06   .01   .15   .00    .05   .09
Model 4    .29   .03   .14   .08   .00   .07    .20   .00
Average    .37   .10   .08   .02   .07   .03    .00   .06
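The averaging step can be sketched in a few lines; this is a minimal illustration using a subset of the example values above (plain Python, no assumed NMT toolkit):

```python
# Ensemble by model averaging: each model yields a probability
# distribution over the output vocabulary; the ensemble averages
# the distributions and picks the highest-scoring word.
vocab = ["the", "cat", "this", "fish", "there", "dog", "these"]
model_probs = [  # one distribution per model (illustrative values)
    [0.54, 0.01, 0.11, 0.00, 0.03, 0.00, 0.05],
    [0.52, 0.02, 0.12, 0.01, 0.03, 0.00, 0.09],
    [0.12, 0.33, 0.06, 0.15, 0.00, 0.05, 0.09],
    [0.29, 0.03, 0.14, 0.00, 0.07, 0.20, 0.00],
]

def average_distribution(distributions):
    """Element-wise mean of several probability distributions."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]

avg = average_distribution(model_probs)
best_word = vocab[avg.index(max(avg))]  # "the"
```

In practice the models are combined at every decoding step, inside beam search, rather than on full distributions stored in memory.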
bagging, ensemble, model averaging, system combination, ...
the → cat → is → in → the → bag → .
the ← cat ← is ← in ← the ← bag ← .
Obligatory notice: some languages (Arabic, Hebrew, ...) have writing systems that are right-to-left, so the use of "right-to-left" is not precise here.
⇒ use both left and right context during translation
[Plot: word frequency against frequency rank, illustrating Zipf's law: frequency × rank = constant]
– words that occur once or twice have unreliable statistics
– input word embedding matrix: |V| × 1000
– output word prediction matrix: 1000 × |V|
tweet, tweets, tweeted, tweeting, retweet, ... → morphological analysis?
homework, website, ... → compound splitting?
Netanyahu, Jones, Macron, Hoboken, ... → transliteration?
⇒ Breaking up words into subwords may be a good idea
t h e f a t c a t i s i n t h e t h i n b a g

Merge t h → th:   th e f a t c a t i s i n th e th i n b a g
Merge a t → at:   th e f at c at i s i n th e th i n b a g
Merge i n → in:   th e f at c at i s in th e th in b a g
Merge th e → the: the f at c at i s in the th in b a g
– starting with the size of the character set (maybe 100 for Latin script)
– stopping at, say, 50,000
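The merge procedure can be sketched as follows; a simplified illustration that, unlike practical byte pair encoding implementations, ignores word boundaries and end-of-word markers. Ties between equally frequent pairs are broken by first occurrence, so the merge order may differ from the example above, but the final segmentation is the same:

```python
from collections import Counter

def learn_bpe(tokens, num_merges):
    """Greedily merge the most frequent adjacent symbol pair."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):  # replace every occurrence of the pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens

merges, tokens = learn_bpe(list("thefatcatisinthethinbag"), 4)
# " ".join(tokens) == "the f at c at i s in the th in b a g"
```

At translation time the learned merge list is applied, in order, to segment new text into the subword vocabulary.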
Obama receives Net@@ any@@ ahu the relationship between Obama and Net@@ any@@ ahu is not exactly friendly . the two wanted to talk about the implementation of the international agreement and about Teheran ’s destabil@@ ising activities in the Middle East . the meeting was also planned to cover the conflict with the Palestinians and the disputed two state solution . relations between Obama and Net@@ any@@ ahu have been stra@@ ined for years . Washington critic@@ ises the continuous building of settlements in Israel and acc@@ uses Net@@ any@@ ahu of a lack of initiative in the peace process . the relationship between the two has further deteriorated because of the deal that Obama negotiated on Iran ’s atomic programme . in March , at the invitation of the Republic@@ ans , Net@@ any@@ ahu made a controversial speech to the US Congress , which was partly seen as an aff@@ ront to Obama . the speech had not been agreed with Obama , who had rejected a meeting with reference to the election that was at that time im@@ pending in Israel .
Adequacy                                  Fluency
meaning of source and target match        target is well-formed
translation model                         language model
parallel data                             monolingual data
– word prediction informed by translation model and language model
– gated unit that decides balance
– train language model in isolation
– add language model score during inference (similar to ensembling)
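The second option can be sketched as a log-linear combination of the two models' scores at each decoding step (sometimes called shallow fusion; the weight and the toy distributions below are purely illustrative):

```python
import math

def fused_scores(tm_logprob, lm_logprob, lm_weight=0.3):
    """Add a weighted language-model score to the translation-model
    score for every candidate next word."""
    return {w: tm_logprob[w] + lm_weight * lm_logprob[w] for w in tm_logprob}

# Toy log-probabilities for the next output word.
tm = {"house": math.log(0.5), "home": math.log(0.4), "that": math.log(0.1)}
lm = {"house": math.log(0.2), "home": math.log(0.7), "that": math.log(0.1)}

scores = fused_scores(tm, lm)
best = max(scores, key=scores.get)  # "home": preferred by the language model
```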
– train a system in reverse direction
– translate target-side monolingual data into source language
– add as additional parallel data
[Diagram: a reverse system produces the synthetic parallel data on which the final system is trained]
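As a sketch, back-translation is a simple data-augmentation loop; `reverse_translate` below stands in for a trained target-to-source system and is purely hypothetical:

```python
def back_translate(parallel_data, monolingual_target, reverse_translate):
    """Pair each target-side monolingual sentence with a synthetic
    source sentence and add the pairs to the parallel data."""
    synthetic = [(reverse_translate(t), t) for t in monolingual_target]
    return parallel_data + synthetic

# Toy example for an English-to-German final system: the reverse system
# translates German monolingual text back into English (stubbed here).
parallel = [("the house", "das Haus")]
mono_de = ["die Katze ist klein"]
augmented = back_translate(parallel, mono_de, lambda s: "the cat is small")
```

The target side of the synthetic pairs is genuine text, which is what makes the augmented data useful for fluency.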
[Diagram: shallow network (input, one hidden layer, output) vs. deep network (input, hidden layers 1 to 3, output)]
– deep transitions: several layers on path to output
– deeply stacking recurrent neural networks
[Diagram: the context feeds decoder states stacked in two dimensions: stacks 1 and 2, each with transitions 1 and 2]
– left-to-right recurrent network, to encode left context
– right-to-left recurrent network, to encode right context
⇒ Third way of adding layers
[Diagram: encoder with alternating layers: input word embedding; layer 1 left-to-right; layer 2 right-to-left; layer 3 left-to-right; layer 4 right-to-left]
– based on co-occurrence, word position, etc.
– expectation maximization (EM) algorithm
– popular: IBM models, fast-align
[Attention weights between "relations between Obama and Netanyahu have been strained for years ." and "die Beziehungen zwischen Obama und Netanjahu sind seit Jahren angespannt ."]
– traditional objective function: match output words
– now: also match given word alignments
– given alignment matrix A, with Σ_j A_ij = 1 (from IBM models)
– computed attention α_ij (also Σ_j α_ij = 1, due to softmax)
– added training objective (cross-entropy):

  cost_CE = − (1/I) Σ_i Σ_j A_ij log α_ij
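The added objective can be computed as in this sketch (a toy 2×2 case; the alignment and attention values are illustrative):

```python
import math

def guided_alignment_cost(A, alpha):
    """cost_CE = -(1/I) * sum_i sum_j A[i][j] * log(alpha[i][j])"""
    I = len(A)
    total = 0.0
    for i in range(I):
        for j in range(len(A[i])):
            if A[i][j] > 0.0:  # zero alignment entries contribute nothing
                total += A[i][j] * math.log(alpha[i][j])
    return -total / I

A = [[1.0, 0.0],      # alignment points, e.g. from the IBM models
     [0.0, 1.0]]
alpha = [[0.9, 0.1],  # attention weights computed by the network
         [0.2, 0.8]]
cost = guided_alignment_cost(A, alpha)
```

The cost shrinks toward zero as the attention weights concentrate on the aligned input words.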
[Attention weights for: "to solve the problem , the ' Social Housing ' alliance suggests a fresh start ." ⇒ "um das Problem zu lösen , schlägt das Unternehmen der Gesellschaft für soziale Bildung vor ."]
coverage(j) = Σ_i α_i,j

over-generation if coverage(j) − 1 > 0, under-generation if coverage(j) < 1
a(s_{i−1}, h_j) = W^a s_{i−1} + U^a h_j + V^a coverage(j) + b^a
Σ_i log P(y_i | x) + λ Σ_j (1 − coverage(j))^2
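The coverage term can be sketched as follows (toy attention weights; the penalty is zero exactly when every input position receives total attention 1):

```python
def coverage_penalty(attention, lam=1.0):
    """lam * sum_j (1 - coverage(j))**2, where coverage(j) is the
    attention mass sum_i alpha[i][j] placed on input word j."""
    num_inputs = len(attention[0])
    coverage = [sum(row[j] for row in attention) for j in range(num_inputs)]
    return lam * sum((1.0 - c) ** 2 for c in coverage)

# Two output words attending over three input words; the middle input
# word is strongly under-covered (total attention 0.4).
attention = [[0.7, 0.2, 0.1],
             [0.1, 0.2, 0.7]]
penalty = coverage_penalty(attention)  # (0.2)**2 + (0.6)**2 + (0.2)**2
```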
– some words are typically dropped
– some words produce multiple output words
Words:            the    girl     watched  attentively  the        beautiful  fireflies
Part of speech:   DET    NN       VFIN     ADV          DET        JJ         NNS
Lemma:            the    girl     watch    attentive    the        beautiful  firefly
Morphology:       -      -        PAST     -            -          -          -
Noun phrase:      BEGIN  CONT     OTHER    OTHER        BEGIN      CONT       CONT
Verb phrase:      OTHER  OTHER    BEGIN    CONT         CONT       CONT       CONT
Dependency head:  girl   watched  -        -            fireflies  fireflies  watched
Dep. relation:    DET    SUBJ     -        -            DET        ADJ        OBJ
Semantic role:    -      -        -        -            -          -          PATIENT
Semantic type:    -      -        VIEW     -            -          -          -
– part-of-speech tag
– morphological features
– etc.
– each annotation maps to embedding
– embeddings are added
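Summing per-annotation embeddings into one input vector can be sketched as follows (the embedding tables and 4-dimensional values are hypothetical):

```python
# One embedding table per annotation type; a token's input vector is
# the element-wise sum of the embeddings of all its annotations.
word_emb = {"watched": [0.2, 0.1, 0.0, 0.3]}
pos_emb = {"VFIN": [0.0, 0.5, 0.1, 0.0]}
morph_emb = {"PAST": [0.1, 0.0, 0.2, 0.1]}

def input_vector(word, pos, morph):
    """Sum the word, POS, and morphology embeddings dimension-wise."""
    vectors = [word_emb[word], pos_emb[pos], morph_emb[morph]]
    return [sum(dims) for dims in zip(*vectors)]

vec = input_vector("watched", "VFIN", "PAST")  # approximately [0.3, 0.6, 0.3, 0.4]
```

Concatenating the embeddings instead of adding them is a common alternative design.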
– ultimately, we do not care if the right part-of-speech tag is predicted
– only the right output words matter
Sentence: the girl watched attentively the beautiful fireflies

Syntax tree:
  (S (NP (DET the) (NN girl))
     (VP (VFIN watched)
         (ADVP (ADV attentively))
         (NP (DET the) (JJ beautiful) (NNS fireflies))))

Linearized: (S (NP (DET the ) (NN girl ) ) (VP (VFIN watched ) (ADVP (ADV attentively ) ) (NP (DET the ) (JJ beautiful ) (NNS fireflies ) ) ) )
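Producing a linearized token sequence from a nested tree can be sketched as:

```python
def linearize(tree):
    """Flatten a (label, children...) tree into tokens such as
    '(S', '(NP', '(DET', 'the', ')', matching the linearized format."""
    if isinstance(tree, str):  # leaf: a word
        return [tree]
    label, children = tree[0], tree[1:]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

tree = ("S",
        ("NP", ("DET", "the"), ("NN", "girl")),
        ("VP", ("VFIN", "watched"),
               ("ADVP", ("ADV", "attentively")),
               ("NP", ("DET", "the"), ("JJ", "beautiful"),
                      ("NNS", "fireflies"))))
linearized = " ".join(linearize(tree))
```

The resulting string can be fed to a sequence model like any other token sequence.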
– French–English corpus
– German–English corpus
– French–English corpus
– French–Spanish corpus
[ENGLISH] N'y a-t-il pas ici deux poids, deux mesures? ⇒ Is this not a case of double standards?
[SPANISH] N'y a-t-il pas ici deux poids, deux mesures? ⇒ No puede verse con toda claridad que estamos utilizando un doble rasero?
[Diagram: a single MT system shared between English, French, Spanish, and German]
[SPANISH] Messen wir hier nicht mit zweierlei Maß? ⇒ No puede verse con toda claridad que estamos utilizando un doble rasero?