1. Neural Machine Translation. Philipp Koehn, 6 October 2020

2. Language Models
• Modeling variants
  – feed-forward neural network
  – recurrent neural network
  – long short-term memory (LSTM) neural network
• May include input context

3. Feed-Forward Neural Language Model
[Figure: the history words w_{i-4}, w_{i-3}, w_{i-2}, w_{i-1} are embedded, passed through a feed-forward hidden layer h, and a softmax predicts the output word w_i.]
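A minimal NumPy sketch of the feed-forward n-gram language model in the figure. The vocabulary size, embedding and hidden dimensions, random weights, and word ids are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the slides)
V, d_emb, d_hid = 10000, 100, 200    # vocabulary, embedding, hidden sizes
n = 4                                # history length: w_{i-4} ... w_{i-1}

rng = np.random.default_rng(0)
E  = rng.normal(size=(V, d_emb))          # word embedding matrix
W1 = rng.normal(size=(n * d_emb, d_hid))  # concatenated embeddings -> hidden layer
W2 = rng.normal(size=(d_hid, V))          # hidden layer -> output scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ff_lm_predict(history):
    """history: list of n word ids; returns P(w_i | history)."""
    x = np.concatenate([E[w] for w in history])   # embed and concatenate the history words
    h = np.tanh(x @ W1)                           # feed-forward hidden layer
    return softmax(h @ W2)                        # distribution over the next word

p = ff_lm_predict([3, 17, 42, 7])   # hypothetical word ids
print(p.argmax(), p.max())
```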

4. Recurrent Neural Language Model
[Figure: the input word <s> is embedded (E x_j), passed through an RNN state h_j, and a softmax prediction t_i yields the output word "the".]
• Predict the first word of a sentence

5. Recurrent Neural Language Model
[Figure: the input words <s> and "the" are embedded and passed through the RNN; softmax predictions yield "the" and "house".]
• Predict the second word of a sentence
• Re-use the hidden state from the first word prediction

6. Recurrent Neural Language Model
[Figure: the input words <s>, "the", "house" are embedded and passed through the RNN; softmax predictions yield "the", "house", "is".]
• Predict the third word of a sentence ... and so on

7. Recurrent Neural Language Model
[Figure: the full input "<s> the house is big ." is processed word by word; at each position the embedding, RNN state, and softmax produce the next word, yielding "the house is big . </s>".]
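A minimal NumPy sketch of the recurrent language model stepped through on slides 4-7. Each step re-uses the hidden state from the previous prediction. The dimensions, weights, and word ids are illustrative assumptions.

```python
import numpy as np

V, d_emb, d_hid = 10000, 100, 200   # illustrative sizes (assumptions)
rng = np.random.default_rng(0)
E   = rng.normal(size=(V, d_emb))        # input word embeddings
W_x = rng.normal(size=(d_emb, d_hid))    # embedding -> state
W_h = rng.normal(size=(d_hid, d_hid))    # previous state -> state
W_o = rng.normal(size=(d_hid, V))        # state -> output scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_lm(word_ids):
    """Predict each next word given the sentence so far (e.g. <s> the house is big .)."""
    h = np.zeros(d_hid)                          # initial RNN state
    predictions = []
    for w in word_ids:                           # w = current input word id
        h = np.tanh(E[w] @ W_x + h @ W_h)        # update the recurrent state
        predictions.append(softmax(h @ W_o))     # distribution over the next word
    return predictions

probs = rnn_lm([0, 3, 17, 8, 25, 4])     # hypothetical ids for "<s> the house is big ."
print(len(probs), probs[0].argmax())
```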

8. Recurrent Neural Translation Model
• We predicted the words of a sentence
• Why not also predict their translations?

9. Encoder-Decoder Model
[Figure: a single recurrent network reads the input "<s> the house is big . </s>" and then keeps predicting, producing the translation "das Haus ist groß . </s>" word by word.]
• Obviously madness
• Proposed by Google (Sutskever et al. 2014)
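A minimal sketch of the idea on this slide: the same recurrent step used for the language model above first reads the source sentence and then keeps predicting, emitting the target sentence. The sizes, weights, word ids, and the end-of-sentence id are illustrative assumptions.

```python
import numpy as np

V, d_emb, d_hid = 10000, 100, 200        # illustrative sizes (assumptions)
rng = np.random.default_rng(0)
E,   W_x = rng.normal(size=(V, d_emb)),   rng.normal(size=(d_emb, d_hid))
W_h, W_o = rng.normal(size=(d_hid, d_hid)), rng.normal(size=(d_hid, V))
EOS = 1                                   # hypothetical id for </s>

def step(h, w):
    """One RNN step: consume word id w, return new state and next-word scores."""
    h = np.tanh(E[w] @ W_x + h @ W_h)
    return h, h @ W_o

def translate(source_ids, max_len=20):
    h, scores = np.zeros(d_hid), None
    for w in source_ids:                  # read "<s> the house is big . </s>"
        h, scores = step(h, w)            # the step after </s> predicts the first target word
    output = []
    for _ in range(max_len):              # keep predicting: "das Haus ist groß . </s>"
        w = int(scores.argmax())          # greedy choice of the next target word
        output.append(w)
        if w == EOS:
            break
        h, scores = step(h, w)            # feed the chosen word back into the network
    return output

print(translate([0, 3, 17, 8, 25, 4, 1]))   # hypothetical ids for "<s> the house is big . </s>"
```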

10. What is Missing?
• Alignment of input words to output words
⇒ Solution: attention mechanism

11. Neural Translation Model with Attention

12. Input Encoding
[Figure: the recurrent neural language model run over the input sentence "<s> the house is big .".]
• Inspiration: recurrent neural network language model on the input side

13. Hidden Language Model States
• This gives us the hidden states [figure: a chain of left-to-right RNN states, one per word]
• These encode left context for each word
• Same process in reverse: right context for each word [figure: a chain of right-to-left RNN states]

14. Input Encoder
[Figure: the input words "<s> the house is big . </s>" are embedded and processed by a left-to-right and a right-to-left RNN encoder.]
• Input encoder: concatenate bidirectional RNN states
• Each word representation includes full left and right sentence context

15. Encoder: Math
[Figure: the bidirectional encoder from the previous slide.]
• Input is a sequence of words x_j, mapped into embedding space Ē x_j
• Bidirectional recurrent neural networks
  ←h_j = f(←h_{j+1}, Ē x_j)
  →h_j = f(→h_{j-1}, Ē x_j)
• Various choices for the function f(): feed-forward layer, GRU, LSTM, ...
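A minimal NumPy sketch of the bidirectional encoder equations above, with a plain tanh layer standing in for f(); a GRU or LSTM would be the usual choice in practice. All sizes, weights, and word ids are illustrative assumptions.

```python
import numpy as np

V, d_emb, d_hid = 10000, 100, 200          # illustrative sizes (assumptions)
rng = np.random.default_rng(0)
E_bar = rng.normal(size=(V, d_emb))        # input embedding matrix (Ē)
W_x   = rng.normal(size=(d_emb, d_hid))
W_h   = rng.normal(size=(d_hid, d_hid))

def f(h_prev, x_emb):
    """Recurrent function f(): a simple feed-forward layer here; could be a GRU or LSTM."""
    return np.tanh(x_emb @ W_x + h_prev @ W_h)

def encode(x_ids):
    """One state per input word: concatenation of right-to-left and left-to-right RNN states."""
    emb = [E_bar[x] for x in x_ids]
    fwd, h = [], np.zeros(d_hid)
    for e in emb:                          # →h_j = f(→h_{j-1}, Ē x_j)
        h = f(h, e)
        fwd.append(h)
    bwd, h = [None] * len(emb), np.zeros(d_hid)
    for j in range(len(emb) - 1, -1, -1):  # ←h_j = f(←h_{j+1}, Ē x_j)
        h = f(h, emb[j])
        bwd[j] = h
    return [np.concatenate([b, fw]) for b, fw in zip(bwd, fwd)]   # h_j = (←h_j, →h_j)

H = encode([0, 3, 17, 8, 25, 4, 1])        # hypothetical ids for "<s> the house is big . </s>"
print(len(H), H[0].shape)                  # one vector of size 2 * d_hid per input word
```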

16. Decoder
• We want to have a recurrent neural network predicting output words
[Figure: a chain of decoder states s_i (RNN) with softmax predictions t_i of the output words.]

17. Decoder
• We want to have a recurrent neural network predicting output words
[Figure: as before, now with output word embeddings E y_i fed back into the decoder states.]
• We feed decisions on output words back into the decoder state

18. Decoder
• We want to have a recurrent neural network predicting output words
[Figure: as before, now with an input context c_i also feeding into the decoder states.]
• We feed decisions on output words back into the decoder state
• Decoder state is also informed by the input context

19. More Detail
[Figure: one decoding step with input "<s> das": output word embeddings E y_i, decoder states s_i (RNN), input context c_i, and the softmax prediction t_i of the output word y_i.]
• Decoder is also a recurrent neural network over a sequence of hidden states s_i
  s_i = f(s_{i-1}, E y_{i-1}, c_i)
• Again, various choices for the function f(): feed-forward layer, GRU, LSTM, ...
• Output word y_i is selected by computing a vector t_i (same size as the vocabulary)
  t_i = W(U s_{i-1} + V E y_{i-1} + C c_i)
  and then finding the highest value in the vector t_i
• If we normalize t_i, we can view it as a probability distribution over words
• E y_i is the embedding of the output word y_i
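A minimal NumPy sketch of one decoder step as defined above: the state update s_i = f(s_{i-1}, E y_{i-1}, c_i) and the prediction vector t_i = W(U s_{i-1} + V E y_{i-1} + C c_i). A tanh layer stands in for f(), and all matrices, sizes, and word ids are illustrative assumptions (the prediction matrix V is named Vm to avoid clashing with the vocabulary size).

```python
import numpy as np

V, d_emb, d_hid, d_ctx = 10000, 100, 200, 400   # illustrative sizes (assumptions)
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d_emb))                 # output word embeddings
# state update parameters: f() is a simple feed-forward layer here, could be GRU/LSTM
Ws, Wy, Wc = (rng.normal(size=(d_hid, d_hid)),
              rng.normal(size=(d_emb, d_hid)),
              rng.normal(size=(d_ctx, d_hid)))
# prediction parameters for t_i = W(U s_{i-1} + V E y_{i-1} + C c_i)
U, Vm, C = (rng.normal(size=(d_hid, d_hid)),
            rng.normal(size=(d_emb, d_hid)),
            rng.normal(size=(d_ctx, d_hid)))
W = rng.normal(size=(d_hid, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decoder_step(s_prev, y_prev, c_i):
    """One decoding step: returns the new state s_i and P(y_i | ...)."""
    s_i = np.tanh(s_prev @ Ws + E[y_prev] @ Wy + c_i @ Wc)    # s_i = f(s_{i-1}, E y_{i-1}, c_i)
    t_i = (s_prev @ U + E[y_prev] @ Vm + c_i @ C) @ W         # t_i = W(U s_{i-1} + V E y_{i-1} + C c_i)
    return s_i, softmax(t_i)                                  # normalized t_i: distribution over words

s, p = decoder_step(np.zeros(d_hid), 0, np.zeros(d_ctx))      # y_prev = 0: hypothetical id for <s>
print(int(p.argmax()))
```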

20. Attention
[Figure: attention weights α_ij connect the bidirectional encoder states h_j to the decoder states s_i via the input context.]
• Given what we have generated so far (decoder hidden state) ...
• ... which words in the input should we pay attention to (encoder states)?

21. Attention
[Figure: as on the previous slide.]
• Given:
  – the previous hidden state of the decoder s_{i-1}
  – the representation of input words h_j = (←h_j, →h_j)
• Predict an alignment score a(s_{i-1}, h_j) for each input word j (modeled with a feed-forward neural network layer)

22. Attention
• Normalize the attention weights (softmax)
  α_ij = exp(a(s_{i-1}, h_j)) / Σ_k exp(a(s_{i-1}, h_k))

23. Attention
[Figure: the input context c_i is the attention-weighted sum of the encoder states.]
• Relevant input context: weigh the input words according to attention
  c_i = Σ_j α_ij h_j
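A minimal NumPy sketch of the attention computation on slides 20-23: a small feed-forward layer scores each encoder state against the previous decoder state, the scores are normalized with a softmax, and the input context is the weighted sum of the encoder states. The form of the scoring layer and all sizes and weights are illustrative assumptions.

```python
import numpy as np

d_hid, d_ctx, d_att = 200, 400, 100        # illustrative sizes (assumptions)
rng = np.random.default_rng(0)
Wa = rng.normal(size=(d_hid, d_att))       # decoder-state part of the scoring layer
Ua = rng.normal(size=(d_ctx, d_att))       # encoder-state part of the scoring layer
va = rng.normal(size=(d_att,))

def a(s_prev, h_j):
    """Alignment score a(s_{i-1}, h_j): a small feed-forward layer."""
    return va @ np.tanh(s_prev @ Wa + h_j @ Ua)

def attention(s_prev, H):
    """H: list of encoder states h_j. Returns the weights α_ij and the input context c_i."""
    scores = np.array([a(s_prev, h_j) for h_j in H])
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                          # α_ij = exp(a(s_{i-1}, h_j)) / Σ_k exp(a(s_{i-1}, h_k))
    c_i = sum(alpha[j] * H[j] for j in range(len(H)))    # c_i = Σ_j α_ij h_j
    return alpha, c_i

s_prev = np.zeros(d_hid)
H = [rng.normal(size=d_ctx) for _ in range(6)]   # stand-in encoder states for a 6-word input
alpha, c = attention(s_prev, H)
print(alpha.round(3), c.shape)
```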

24. Attention
[Figure: the input context c_i feeds into the next decoder state.]
• Use the context to predict the next hidden state and output word

25. Training
