Neural Machine Translation
Philipp Koehn 6 October 2020
Philipp Koehn Machine Translation: Neural Machine Translation 6 October 2020
Language Models

Modeling variants:
– feed-forward neural network
– recurrent neural network
– long short-term memory (LSTM) neural network
[Diagram: feed-forward neural language model. Context word embeddings (Embed) feed into a feed-forward layer (FF), followed by a softmax over the output vocabulary.]
[Diagram: recurrent neural language model, step 1. The start symbol <s> is embedded (E x_j), fed into a recurrent state h_j (RNN), and a softmax produces the output word prediction t_i; the predicted output word y_i is "the".]
Predict the first word of a sentence.
[Diagram: recurrent neural language model, step 2. The predicted word "the" is embedded and fed into the next RNN state; the softmax predicts "house".]
Predict the second word of a sentence. Re-use the hidden state from the first word prediction.
[Diagram: recurrent neural language model, step 3. The inputs <s>, "the", "house" have been consumed; the softmax predicts "is".]
Predict the third word of a sentence ... and so on.
[Diagram: recurrent neural language model unrolled over the full sentence. The input words <s>, the, house, is, big, . are embedded and passed through a chain of RNN states; the softmax at each position predicts the next word: the, house, is, big, ., </s>.]
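The word-by-word prediction loop above can be sketched in a few lines of numpy. Everything here (vocabulary, dimensions, weights) is a made-up toy example; a real model learns these parameters from data.

```python
import numpy as np

# Toy RNN language model: one embedding, recurrence, and softmax step per word.
rng = np.random.default_rng(0)
vocab = ["<s>", "the", "house", "is", "big", ".", "</s>"]
V, d = len(vocab), 8

E  = rng.normal(0, 0.1, (V, d))   # input word embeddings
Wh = rng.normal(0, 0.1, (d, d))   # recurrent weights
Wx = rng.normal(0, 0.1, (d, d))   # input weights
Wo = rng.normal(0, 0.1, (V, d))   # output projection (before softmax)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnn_step(h_prev, word_id):
    """One time step: update hidden state, predict distribution over next word."""
    h = np.tanh(Wh @ h_prev + Wx @ E[word_id])
    p = softmax(Wo @ h)
    return h, p

# Predict word by word, re-using the hidden state (as on the slides).
h = np.zeros(d)
probs = []
for w in ["<s>", "the", "house"]:
    h, p = rnn_step(h, vocab.index(w))
    probs.append(p)
```

Each `p` is a probability distribution over the whole vocabulary; greedy prediction simply takes its argmax.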
[Diagram: neural machine translation as a sequence-to-sequence language model. A chain of RNN states consumes the English input "the house is big . </s>" and then continues to produce the German output "das Haus ist groß . </s>", word by word, with each output word fed back as the next input.]
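A minimal sketch of this encoder-decoder idea, with random toy weights standing in for a trained model: the encoder folds the source sentence into one state vector, and the decoder continues from it.

```python
import numpy as np

# Toy sequence-to-sequence model without attention.
rng = np.random.default_rng(1)
d = 6
W = rng.normal(0, 0.3, (d, d))     # shared toy recurrence weights

def rnn(h, x):
    return np.tanh(W @ h + x)

src = [rng.normal(0, 1, d) for _ in range(5)]   # embeddings of "the house is big ."

# Encoder: fold the whole source sentence into one state vector.
h = np.zeros(d)
for x in src:
    h = rnn(h, x)

# Decoder: initialized with the final encoder state. The entire input
# sentence must fit into this single fixed-size vector.
s = h.copy()
outputs = []
for _ in range(3):
    s = rnn(s, np.zeros(d))        # zeros stand in for embedded output words
    outputs.append(s)
```

The fixed-size state `s` is the whole interface between input and output, which is exactly the limitation that motivates attention below.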
The entire input sentence has to be squeezed into a single fixed-size state vector, which hurts translation of long sentences.
⇒ Solution: attention mechanism
[Diagram: the recurrent language model over "the house is big .", repeated here as the starting point for input encoding.]
[Diagram: two chains of RNN states over the input, one per direction.]
[Diagram: bidirectional encoder. Input words <s>, the, house, is, big, ., </s> are embedded (E x_j); a left-to-right encoder RNN produces states →h_j and a right-to-left encoder RNN produces states ←h_j.]
[Diagram: bidirectional encoder, as above, with the state update equations.]

←h_j = f(←h_{j+1}, Ē x_j)    (right-to-left encoder)
→h_j = f(→h_{j-1}, Ē x_j)    (left-to-right encoder)
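The two recursions can be written out directly in numpy. Here f is a plain tanh RNN standing in for whichever recurrent unit is actually used, and all weights are random toy values.

```python
import numpy as np

# Toy bidirectional encoder over a 5-word sentence.
rng = np.random.default_rng(2)
d, n = 4, 5
Ex = rng.normal(0, 1, (n, d))           # embedded input words E x_j

Wf = rng.normal(0, 0.3, (d, d))         # left-to-right weights
Wb = rng.normal(0, 0.3, (d, d))         # right-to-left weights

def f(h, x, W):
    return np.tanh(W @ h + x)

h_fwd = np.zeros((n, d))                # ->h_j
h_bwd = np.zeros((n, d))                # <-h_j

for j in range(n):                      # left-to-right: depends on h_{j-1}
    prev = h_fwd[j-1] if j > 0 else np.zeros(d)
    h_fwd[j] = f(prev, Ex[j], Wf)

for j in reversed(range(n)):            # right-to-left: depends on h_{j+1}
    nxt = h_bwd[j+1] if j < n-1 else np.zeros(d)
    h_bwd[j] = f(nxt, Ex[j], Wb)

# Each input word is then represented by the pair (<-h_j, ->h_j).
h = np.concatenate([h_bwd, h_fwd], axis=1)
```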
[Diagram: decoder. A chain of RNN decoder states s_i, each followed by a softmax producing the output word prediction t_i.]
[Diagram: decoder with output word embeddings. Each predicted output word y_i is embedded (E y_i) and fed into the next decoder state s_i.]
[Diagram: decoder with input context. Each decoder state s_i additionally receives an input context vector c_i.]
[Diagram: one decoder step, with <s> and "das" as the output words so far.]

Decoder state: s_i = f(s_{i-1}, E y_{i-1}, c_i)
where f may be a feed-forward layer, GRU, LSTM, ...

Output word prediction: a vector t_i (same size as the vocabulary)
t_i = W(U s_{i-1} + V E y_{i-1} + C c_i)
A softmax over t_i gives a probability distribution over words; the predicted word is found by taking the highest value in t_i.
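The two decoder equations above can be checked in numpy; dimensions and weights are illustrative, and a plain tanh layer stands in for f.

```python
import numpy as np

# Toy decoder step: state update and output word prediction.
rng = np.random.default_rng(3)
d, vocab_size = 4, 10

U  = rng.normal(0, 0.3, (d, d))
Vm = rng.normal(0, 0.3, (d, d))
C  = rng.normal(0, 0.3, (d, d))
W  = rng.normal(0, 0.3, (vocab_size, d))

s_prev  = rng.normal(0, 1, d)   # previous decoder state s_{i-1}
Ey_prev = rng.normal(0, 1, d)   # embedding of previous output word E y_{i-1}
c       = rng.normal(0, 1, d)   # input context c_i

# State update (f here: a feed-forward tanh layer; could be a GRU or LSTM).
s = np.tanh(U @ s_prev + Vm @ Ey_prev + C @ c)

# Raw prediction vector t_i = W(U s_{i-1} + V E y_{i-1} + C c_i),
# one value per vocabulary word.
t = W @ (U @ s_prev + Vm @ Ey_prev + C @ c)

# Softmax turns t_i into a probability distribution; argmax picks the word.
p = np.exp(t - t.max())
p /= p.sum()
predicted_word = int(np.argmax(p))
```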
[Diagram: attention. Attention weights α_ij are computed from the previous decoder state s_{i-1} and the bidirectional encoder states ←h_j, →h_j.]
[Diagram: attention, inputs.]

Attention is computed from:
– the previous state of the decoder s_{i-1}
– the representation of input words h_j = (←h_j, →h_j)
The raw association a(s_{i-1}, h_j) is modeled with a feed-forward neural network layer.
[Diagram: attention, normalization step.]

The raw attention values are normalized into weights with a softmax:
α_ij = exp(a(s_{i-1}, h_j)) / Σ_k exp(a(s_{i-1}, h_k))
[Diagram: weighted sum.]

The input context is the attention-weighted sum of the encoder states:
c_i = Σ_j α_ij h_j
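Putting the attention computation together (raw scores from a feed-forward layer, softmax normalization, weighted sum), with made-up parameters; the exact form of a(·,·) used here, v·tanh(W s + U h), is one common feed-forward choice.

```python
import numpy as np

# Toy attention step for one decoder position over 5 input positions.
rng = np.random.default_rng(4)
d, n = 4, 5                               # state size, input length

Wa = rng.normal(0, 0.3, (d, d))           # attention parameters (illustrative)
Ua = rng.normal(0, 0.3, (d, d))
va = rng.normal(0, 0.3, d)

s_prev = rng.normal(0, 1, d)              # previous decoder state s_{i-1}
h = rng.normal(0, 1, (n, d))              # encoder states h_j

# a(s_{i-1}, h_j): one feed-forward layer evaluation per input position.
scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h[j]) for j in range(n)])

# Softmax normalization -> attention weights alpha_ij.
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

# Input context: weighted sum of encoder states, c_i = sum_j alpha_ij h_j.
c = (alpha[:, None] * h).sum(axis=0)
```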
[Diagram: the full attention model. The input context c_i, computed by attention over the encoder states, feeds into the next decoder state s_i.]
[Diagram: training. At each output position, the softmax prediction t_i is compared with the correct output word y_i (das, Haus, ist, ...), producing an error (cost) per word.]
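The per-word error in the diagram is typically a cross-entropy cost: the negative log probability the model assigns to the correct word. A toy version with made-up distributions:

```python
import numpy as np

# Toy cross-entropy training cost, summed over output positions.
correct = [2, 5, 1]                      # ids of the correct words (made up)
rng = np.random.default_rng(5)

total_cost = 0.0
for y in correct:
    logits = rng.normal(0, 1, 8)         # model's raw scores t_i (made up)
    p = np.exp(logits - logits.max())    # softmax -> probability distribution
    p /= p.sum()
    total_cost += -np.log(p[y])          # cost for this output word
```

Training minimizes this total cost over all output words with gradient descent.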
Worked example: forward computation in a small feed-forward network with sigmoid activations.

Input: x = (1.0, 0.0)
Hidden layer: product W1 · x with W1 = ((3, 4), (2, 3)) gives (3, 2); sum with bias b1 = (-2, -4) gives (1, -2); sigmoid gives (.731, .119).
Output layer: product W2 · h with W2 = (5, -5) gives 3.06; sum with bias b2 = -2 gives 1.06; sigmoid gives .743.
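The worked computation can be verified in numpy (the bias values are inferred from the intermediate results shown above):

```python
import numpy as np

# Two-layer feed-forward network reproducing the worked example:
# intermediate values .731, .119, 3.06, 1.06 and output .743.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 0.0])
W1 = np.array([[3.0, 4.0],
               [2.0, 3.0]])
b1 = np.array([-2.0, -4.0])     # inferred from sigma(1)=.731, sigma(-2)=.119
W2 = np.array([5.0, -5.0])
b2 = -2.0                       # inferred from 3.06 - 2 = 1.06

hidden  = sigmoid(W1 @ x + b1)  # hidden layer activations
out_pre = W2 @ hidden + b2      # pre-activation of the output node
out     = sigmoid(out_pre)      # final output
```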
[Diagram: the unrolled computation graph for training the full attention model. The encoder reads "the house is big . </s>", attention and weighted sums produce the input contexts, the decoder predicts "das Haus ist groß . </s>", and a cost node compares each prediction with the correct output word.]
– most computations on vectors, matrices
– efficient implementations for CPU and GPU
– processing several sentence pairs at once:
  scalar operation → vector operation
  vector operation → matrix operation
  matrix operation → 3d tensor operation
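What the scalar → vector → matrix → tensor progression looks like in numpy, with illustrative shapes: the same linear map applied to one word, a whole sentence, and a batch of sentences.

```python
import numpy as np

# Batching the same affine map across words and sentences.
rng = np.random.default_rng(6)
d = 8
W = rng.normal(0, 0.3, (d, d))

one_word = rng.normal(0, 1, d)           # one word -> vector operation
sentence = rng.normal(0, 1, (5, d))      # 5 words -> matrix operation
batch    = rng.normal(0, 1, (3, 5, d))   # 3 sentences -> 3d tensor operation

y1 = W @ one_word                        # shape (d,)
y2 = sentence @ W.T                      # shape (5, d): all words at once
y3 = batch @ W.T                         # shape (3, 5, d): whole batch at once

# The batched result agrees with processing each word separately.
```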
Sentences in a batch have different lengths, so shorter sentences must be padded and their padding positions still processed.
⇒ A lot of wasted computations
[Diagram: three decoder depths. Shallow: a single RNN hidden layer h_t between the input word embedding E x_t and the output y_t. Deep stacked: three RNN hidden layers h_{t,1}, h_{t,2}, h_{t,3} per time step. Deep transitional: several RNN steps on the path from input to output within each time step.]
– deep transitions: several layers on the path to the output
– deeply stacking recurrent neural networks
[Diagram: deep decoder combining stacking and deep transitions. Each time step has two stacks. Within stack k, transition 1 is an RNN step reading the input context c_t (stack 1) or the state of the stack below (stack 2), followed by feed-forward transitions 2 and 3, giving intermediate states v_{t,k,1}, v_{t,k,2}, v_{t,k,3} and decoder states s_{t,1} = v_{t,1,3} and s_{t,2} = v_{t,2,3}.]
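A sketch of this stacked-plus-deep-transition decoder step, with random toy weights and a tanh layer standing in for every transition:

```python
import numpy as np

# Toy deep decoder step: 2 stacks, 3 transitions each (RNN, then 2 FF steps).
rng = np.random.default_rng(7)
d = 6

def layer(h, x, W):
    return np.tanh(W @ h + x)

# One weight matrix per (stack, transition); all toy values.
Ws = [[rng.normal(0, 0.3, (d, d)) for _ in range(3)] for _ in range(2)]

c_t = rng.normal(0, 1, d)              # input context at time t
s_prev = [np.zeros(d), np.zeros(d)]    # previous states of both stacks

states = []
x = c_t                                # stack 1 reads the input context
for k in range(2):                     # stacks
    v = layer(s_prev[k], x, Ws[k][0])  # transition 1: recurrent step
    v = layer(v, np.zeros(d), Ws[k][1])  # transition 2: feed-forward step
    v = layer(v, np.zeros(d), Ws[k][2])  # transition 3: feed-forward step
    states.append(v)                   # s_{t,k} = v_{t,k,3}
    x = v                              # stack k+1 reads s_{t,k}
```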
– left-to-right recurrent network, to encode left context
– right-to-left recurrent network, to encode right context
⇒ Third way of adding layers
[Diagram: deep encoder with alternating directions. Layer 1 runs left-to-right over the input word embeddings (states h_{j,1}), then layer 1 right-to-left produces h_{j,2}, layer 2 left-to-right produces h_{j,3}, and layer 2 right-to-left produces h_{j,4}, each layer reading the states of the layer below.]
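A sketch of the alternating-direction deep encoder, again with random toy weights; two layers are shown, alternating left-to-right and right-to-left, each reading the states of the layer below.

```python
import numpy as np

# Toy deep encoder: stacked RNN layers with alternating directions.
rng = np.random.default_rng(8)
d, n = 4, 5
Ex = rng.normal(0, 1, (n, d))          # embedded input words

def run(inputs, W, left_to_right):
    """One RNN layer over all positions, in the given direction."""
    order = range(n) if left_to_right else reversed(range(n))
    h, out = np.zeros(d), np.zeros((n, d))
    for j in order:
        h = np.tanh(W @ h + inputs[j])
        out[j] = h
    return out

h1 = run(Ex, rng.normal(0, 0.3, (d, d)), left_to_right=True)   # layer 1, L2R
h2 = run(h1, rng.normal(0, 0.3, (d, d)), left_to_right=False)  # layer 2, R2L
```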