
Effective Approaches to Attention-based Neural Machine Translation



  1. Effective Approaches to Attention-based Neural Machine Translation
  Thang Luong, Hieu Pham, and Chris Manning (EMNLP 2015)
  Presented by: Yunan Zhang

  2. Neural Machine Translation & Attention Mechanism
  [Figure: encoder-decoder translating "I am a student" into "Je suis étudiant" (Sutskever et al., 2014; Bahdanau et al., 2015)]
  Attention is a recent innovation in deep learning:
  • Control problems (Mnih et al., '14)
  • Speech recognition (Chorowski et al., '14)
  • Image captioning (Xu et al., '15)
  A new approach with recent SOTA results:
  • English-French (Luong et al., '15)
  • English-German (Jean et al., '15)
  Our work:
  • Propose a new and better attention mechanism.
  • Examine other variants of attention models.
  • Achieve new SOTA results on WMT English-German.

  3. Neural Machine Translation (NMT)
  [Figure: encoder-decoder RNN translating "I am a student" into "Je suis étudiant"]
  • Big RNNs trained end-to-end.

  4. Neural Machine Translation (NMT)
  [Figure: encoder-decoder RNN translating "I am a student" into "Je suis étudiant"]
  • Big RNNs trained end-to-end: encoder-decoder.
  – Generalize well to long sequences.
  – Small memory footprint.
  – Simple decoder.
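A minimal sketch of such an encoder-decoder, assuming PyTorch; the 4-layer, 1000-dim sizes echo the setup described later in the talk (slide 17), and all names are illustrative rather than the authors' code:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder sketch (no attention yet): the encoder
    reads the source sentence and its final state initializes the decoder."""
    def __init__(self, src_vocab, tgt_vocab, dim=1000, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)   # next-word logits

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))         # encode source
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)
```

Training feeds gold target prefixes (teacher forcing) and applies cross-entropy to the logits; the whole network is trained end-to-end.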

  5. Attention Mechanism
  [Figure: attention layer producing a context vector with weights 0.6 / 0.2 / 0.1 / 0.1 over "I am a student"]
  • Maintain a memory of source hidden states.
  • The context vector is a weighted average of those hidden states.
  • The weights are determined by comparing the current target hidden state with all the source hidden states.
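As a concrete toy example, a NumPy sketch of this weighted average, assuming the simple dot-product comparison (one of several scoring choices; sizes and random states are purely illustrative):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup: 4 source hidden states ("I am a student"), dimension 5.
rng = np.random.default_rng(0)
h_bar = rng.normal(size=(4, 5))   # source hidden states, one per word
h_t = rng.normal(size=5)          # current target hidden state

# Compare the target state with every source state (dot-product score),
# normalize to weights, then average: these weights play the role of the
# 0.6 / 0.2 / 0.1 / 0.1 numbers on the slide.
weights = softmax(h_bar @ h_t)    # shape (4,)
context = weights @ h_bar         # context vector, shape (5,)
```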

  6. Attention Mechanism
  [Figure: context vector with weights 0.6 / 0.2 / 0.1 / 0.1 over "I am a student"]
  • Maintain a memory of source hidden states.
  – Able to translate long sentences.

  7. Motivation
  • Global attention: use all source states.
  – Other variants of (Bahdanau et al., '15).
  • A new attention mechanism: local attention.
  – Use a subset of source states each time.
  – Better results with focused attention!

  8. Global Attention
  • Alignment weight vector: [equation on slide]

  9. Global Attention
  • Alignment weight vector (Bahdanau et al., '15): [equation on slide]
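The equation on the slide is an image lost in extraction; from the paper, the global alignment weights are

\[
a_t(s) = \operatorname{align}(h_t, \bar h_s)
       = \frac{\exp\big(\operatorname{score}(h_t, \bar h_s)\big)}
              {\sum_{s'} \exp\big(\operatorname{score}(h_t, \bar h_{s'})\big)}
\]

with three scoring choices:

\[
\operatorname{score}(h_t, \bar h_s) =
\begin{cases}
h_t^\top \bar h_s & \text{(dot)} \\
h_t^\top W_a \bar h_s & \text{(general)} \\
v_a^\top \tanh\big(W_a [c_t; h_t]\big)\big|_{[h_t; \bar h_s]} \; h_t^\top \to v_a^\top \tanh\big(W_a [h_t; \bar h_s]\big) & \text{(concat)}
\end{cases}
\]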

  10. Global Attention
  • Context vector: weighted average of source states.
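In symbols (from the paper), the context vector is the average of the source states under the alignment weights:

\[
c_t = \sum_{s} a_t(s)\, \bar h_s
\]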

  11. Global Attention
  • Attentional vector: [equation on slide]
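From the paper, the attentional vector combines the context vector with the current hidden state, and feeds the softmax that predicts the next word:

\[
\tilde h_t = \tanh\big(W_c [c_t; h_t]\big), \qquad
p(y_t \mid y_{<t}, x) = \operatorname{softmax}(W_s \tilde h_t)
\]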

  12. Local Attention
  • How do we choose the aligned positions?
  • The aligned position defines a focused window.
  • A blend between soft & hard attention (Xu et al., '15).

  13. Local Attention (2)
  • Predict aligned positions: a real value in [0, S], where S is the source sentence length.
  • How do we learn the position parameters?
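The paper's answer: the predicted position is a differentiable function of the target hidden state, so the parameters W_p and v_p are learned by ordinary backpropagation:

\[
p_t = S \cdot \operatorname{sigmoid}\big(v_p^\top \tanh(W_p h_t)\big) \in [0, S]
\]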

  14. Local Attention (3)
  [Figure: alignment weights over source positions s]
  • Like the global model: for each integer s in the window [p_t - D, p_t + D], compute align(h_t, h̄_s).

  15. Local Attention (3)
  [Figure: truncated Gaussian centered in the window]
  • Favor points close to the center.
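Concretely, the paper places a truncated Gaussian centered at the predicted position p_t and multiplies it into the alignment weights (D is the window half-width):

\[
a_t(s) = \operatorname{align}(h_t, \bar h_s)\,
         \exp\!\left(-\frac{(s - p_t)^2}{2\sigma^2}\right),
\qquad \sigma = \frac{D}{2}
\]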

  16. Local Attention (3)
  [Figure: alignment weights reshaped by the Gaussian, producing a new peak]
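Putting the pieces together, a toy NumPy sketch of local predictive attention under the formulas above; the window size D, parameter values, and all names are hypothetical:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def local_p_attention(h_t, h_bar, W_p, v_p, D=2):
    """Local predictive attention: predict a center p_t, score only a
    small window around it, and damp the weights with a Gaussian."""
    S = len(h_bar)
    # p_t = S * sigmoid(v_p^T tanh(W_p h_t)), a real value in [0, S].
    p_t = S / (1.0 + np.exp(-(v_p @ np.tanh(W_p @ h_t))))
    # Integer positions in [p_t - D, p_t + D], clipped to the sentence.
    lo = max(0, int(np.floor(p_t - D)))
    hi = min(S - 1, int(np.ceil(p_t + D)))
    window = np.arange(lo, hi + 1)
    # Dot-product scores inside the window, as in the global model.
    weights = softmax(h_bar[window] @ h_t)
    # Truncated Gaussian with sigma = D/2 favors positions near p_t,
    # which can move the peak of the weights (the "new peak" above).
    sigma = D / 2.0
    weights = weights * np.exp(-((window - p_t) ** 2) / (2 * sigma ** 2))
    context = weights @ h_bar[window]    # local context vector
    return p_t, window, weights, context

rng = np.random.default_rng(0)
S, d = 8, 5
h_bar = rng.normal(size=(S, d))          # source hidden states
h_t = rng.normal(size=d)                 # current target hidden state
W_p = 0.1 * rng.normal(size=(d, d))      # hypothetical learned parameters
v_p = 0.1 * rng.normal(size=d)
p_t, window, weights, context = local_p_attention(h_t, h_bar, W_p, v_p)
```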

  17. Experiments
  • WMT English ⇄ German (4.5M sentence pairs).
  • Setup: (Sutskever et al., '14; Luong et al., '15)
  – 4-layer stacking LSTMs: 1000-dim cells/embeddings.
  – 50K most frequent English & German words.

  18. English-German WMT'14 Results
  Systems                                                  Ppl    BLEU
  Winning system – phrase-based + large LM (Buck et al.)    –     20.7
  Our NMT systems:
  Base                                                     10.6   11.3
  • Large progressive gains:
  – Attention: +2.8 BLEU
  – Feed input: +1.3 BLEU
  • BLEU & perplexity correlate (Luong et al., '15).

  19. English-German WMT'14 Results
  Systems                                                  Ppl    BLEU
  Winning system – phrase-based + large LM (Buck et al.)    –     20.7
  Our NMT systems:
  Base                                                     10.6   11.3
  Base + reverse                                            9.9   12.6 (+1.3)
  • Large progressive gains:
  – Attention: +2.8 BLEU
  – Feed input: +1.3 BLEU
  • BLEU & perplexity correlate (Luong et al., '15).

  20. English-German WMT'14 Results
  Systems                                                  Ppl    BLEU
  Winning system – phrase-based + large LM (Buck et al.)    –     20.7
  Our NMT systems:
  Base                                                     10.6   11.3
  Base + reverse                                            9.9   12.6 (+1.3)
  Base + reverse + dropout                                  8.1   14.0 (+1.4)
  • Large progressive gains:
  – Attention: +2.8 BLEU
  – Feed input: +1.3 BLEU
  • BLEU & perplexity correlate (Luong et al., '15).

  21. English-German WMT'14 Results
  Systems                                                  Ppl    BLEU
  Winning system – phrase-based + large LM (Buck et al.)    –     20.7
  Our NMT systems:
  Base                                                     10.6   11.3
  Base + reverse                                            9.9   12.6 (+1.3)
  Base + reverse + dropout                                  8.1   14.0 (+1.4)
  Base + reverse + dropout + global attn                    7.3   16.8 (+2.8)
  • Large progressive gains:
  – Attention: +2.8 BLEU
  – Feed input: +1.3 BLEU
  • BLEU & perplexity correlate (Luong et al., '15).

  22. English-German WMT'14 Results
  Systems                                                  Ppl    BLEU
  Winning system – phrase-based + large LM (Buck et al.)    –     20.7
  Our NMT systems:
  Base                                                     10.6   11.3
  Base + reverse                                            9.9   12.6 (+1.3)
  Base + reverse + dropout                                  8.1   14.0 (+1.4)
  Base + reverse + dropout + global attn                    7.3   16.8 (+2.8)
  Base + reverse + dropout + global attn + feed input       6.4   18.1 (+1.3)
  • Large progressive gains:
  – Attention: +2.8 BLEU
  – Feed input: +1.3 BLEU
  • BLEU & perplexity correlate (Luong et al., '15).

  23. English-German WMT'14 Results
  Systems                                                        Ppl    BLEU
  Winning sys – phrase-based + large LM (Buck et al., 2014)       –     20.7
  Existing NMT systems (Jean et al., 2015):
  RNNsearch                                                       –     16.5
  RNNsearch + unk repl. + large vocab + ensemble 8 models         –     21.6
  Our NMT systems:
  Global attention                                               7.3    16.8 (+2.8)
  Global attention + feed input                                  6.4    18.1 (+1.3)

  24. English-German WMT'14 Results
  Systems                                                        Ppl    BLEU
  Winning sys – phrase-based + large LM (Buck et al., 2014)       –     20.7
  Existing NMT systems (Jean et al., 2015):
  RNNsearch                                                       –     16.5
  RNNsearch + unk repl. + large vocab + ensemble 8 models         –     21.6
  Our NMT systems:
  Global attention                                               7.3    16.8 (+2.8)
  Global attention + feed input                                  6.4    18.1 (+1.3)
  Local attention + feed input                                   5.9    19.0 (+0.9)
  • Local-predictive attention: +0.9 BLEU gain.

  25. English-German WMT'14 Results
  Systems                                                        Ppl    BLEU
  Winning sys – phrase-based + large LM (Buck et al., 2014)       –     20.7
  Existing NMT systems (Jean et al., 2015):
  RNNsearch                                                       –     16.5
  RNNsearch + unk repl. + large vocab + ensemble 8 models         –     21.6
  Our NMT systems:
  Global attention                                               7.3    16.8 (+2.8)
  Global attention + feed input                                  6.4    18.1 (+1.3)
  Local attention + feed input                                   5.9    19.0 (+0.9)
  Local attention + feed input + unk replace                     5.9    20.9 (+1.9)
  • Unknown replacement: +1.9 BLEU (Luong et al., '15; Jean et al., '15).

  26. English-German WMT'14 Results
  Systems                                                        Ppl    BLEU
  Winning sys – phrase-based + large LM (Buck et al., 2014)       –     20.7
  Existing NMT systems (Jean et al., 2015):
  RNNsearch                                                       –     16.5
  RNNsearch + unk repl. + large vocab + ensemble 8 models         –     21.6
  Our NMT systems:
  Global attention                                               7.3    16.8 (+2.8)
  Global attention + feed input                                  6.4    18.1 (+1.3)
  Local attention + feed input                                   5.9    19.0 (+0.9)
  Local attention + feed input + unk replace                     5.9    20.9 (+1.9)
  Ensemble 8 models + unk replace                                 –     23.0 (+2.1)
  New SOTA!

  27. WMT'15 English-German Results
  Systems                                               BLEU
  Winning system – NMT + 5-gram LM reranker (Montreal)  24.9
  Our ensemble 8 models + unk replace                   25.9
  New SOTA!
  • WMT'15 German-English: similar gains.
  – Attention: +2.7 BLEU
  – Feed input: +1.0 BLEU

  28. Analysis
  • Learning curves
  • Long sentences
  • Alignment quality
  • Sample translations

  29. Learning Curves
  [Figure: learning curves for models with and without attention]

  30. Translate Long Sentences
  [Figure: translation quality on long sentences, attention vs. no attention]

  31. Alignment Quality
  Models             AER (lower is better)
  Berkeley aligner   0.32
  Our NMT systems:
  Global attention   0.39
  Local attention    0.36
  Ensemble           0.34
  • RWTH gold alignment data: 508 English-German Europarl sentences.
  • Force-decode our models on the gold data. Competitive AERs!

  32. Sample English-German translations
  src: "We're pleased the FAA recognizes that an enjoyable passenger experience is not incompatible with safety and security," said Roger Dow, CEO of the U.S. Travel Association.
  ref: "Wir freuen uns, dass die FAA erkennt, dass ein angenehmes Passagiererlebnis nicht im Widerspruch zur Sicherheit steht", sagte Roger Dow, CEO der U.S. Travel Association.
  best: "Wir freuen uns, dass die FAA anerkennt, dass ein angenehmes ist nicht mit Sicherheit und Sicherheit unvereinbar ist", sagte Roger Dow, CEO der US - die.
  base: "Wir freuen uns über die <unk>, dass ein <unk> <unk> mit Sicherheit nicht vereinbar ist mit Sicherheit und Sicherheit", sagte Roger Cameron, CEO der US - <unk>.
  • Translates a doubly-negated phrase correctly.
  • Fails to translate "passenger experience".
