
Generating Alignments using Target Foresight in Attention-Based Neural Machine Translation. Jan-Thorsten Peter, Arne Nix, Hermann Ney (peter@cs.rwth-aachen.de). May 29, 2017, EAMT 2017, Prague. Human Language Technology and Pattern Recognition


  1. Generating Alignments using Target Foresight in Attention-Based Neural Machine Translation. Jan-Thorsten Peter, Arne Nix, Hermann Ney (peter@cs.rwth-aachen.de). May 29, 2017, EAMT 2017, Prague. Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University

  2. Outline: Motivation, Neural Machine Translation, Target Foresight, Guided Alignment Training, Target Foresight with Guided Alignment Training, Conclusion

  3. Motivation ◮ Alignments used to be important for SMT ◮ Neural Machine Translation (NMT) uses attention ◮ There are still applications for alignments: ⊲ Guided alignment training [Chen & Matusov+ 16] ⊲ Transread (https://transread.limsi.fr) ⊲ Linguee (http://www.linguee.com) ◮ Using the attention as alignment produces bad results ◮ Can we use NMT to create alignments?

  4. Related Work: D. Bahdanau, K. Cho, Y. Bengio [Bahdanau & Cho+ 15]: Neural machine translation by jointly learning to align and translate. ICLR, May 2015. ◮ Introduces an attention mechanism for neural machine translation. W. Chen, E. Matusov, S. Khadivi, J.-T. Peter [Chen & Matusov+ 16]: Guided alignment training for topic-aware neural machine translation. AMTA, October 2016. ◮ Introduces guided alignment training. Z. Tu, Z. Lu, Y. Liu, X. Liu, H. Li [Tu & Lu+ 16]: Modeling coverage for neural machine translation. ACL, August 2016. ◮ Analyzes the attention of neural machine translation using SAER.

  5. Outline: Motivation, Neural Machine Translation, Target Foresight, Guided Alignment Training, Target Foresight with Guided Alignment Training, Conclusion

  6. Attention-Based NMT ◮ A bidirectional RNN encodes the source sentence $f_1^J$ into $\overrightarrow{h}_1^J$ and $\overleftarrow{h}_1^J$ ◮ $h_j := [\overrightarrow{h}_j^T; \overleftarrow{h}_j^T]^T$

  7. Attention-Based NMT ◮ Energies computed through an MLP: $\tilde{\alpha}_{ij} = v_a^T \tanh(W_a s_{i-1} + U_a h_j)$, with $W_a \in \mathbb{R}^{n \times n}$, $U_a \in \mathbb{R}^{n \times 2n}$, $v_a \in \mathbb{R}^n$: weight parameters

  8. Attention-Based NMT ◮ Attention weights normalized with a softmax: $\alpha_{ij} = \frac{\exp(\tilde{\alpha}_{ij})}{\sum_{k=1}^{J} \exp(\tilde{\alpha}_{ik})}$

  9. Attention-Based NMT ◮ Context vector as weighted sum: $c_i = \sum_{j=1}^{J} \alpha_{ij} h_j$

  10. Attention-Based NMT ◮ Neural network output: $p(e_i \mid e_1^{i-1}, f_1^J) = g_{\text{out}}(e_{i-1}, s_{i-1}, c_i)$, $g_{\text{out}}$: output function

  11. Attention-Based NMT ◮ Hidden decoder state: $s_i = g_{\text{dec}}(e_i, c_i; s_{i-1})$, $g_{\text{dec}}$: gated recurrent unit
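To make the attention computation on slides 7 to 9 concrete, here is a minimal NumPy sketch of a single decoder step; the function name, the explicit shapes, and the standalone formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attention_step(h, s_prev, W_a, U_a, v_a):
    # h:      (J, 2n) bidirectional encoder states h_j = [fwd; bwd]
    # s_prev: (n,)    previous decoder state s_{i-1}
    # Energies (slide 7): alpha~_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    energies = np.tanh(s_prev @ W_a.T + h @ U_a.T) @ v_a      # (J,)
    # Softmax normalization over source positions (slide 8)
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()                                  # alpha_i, (J,)
    # Context vector as weighted sum of encoder states (slide 9)
    context = weights @ h                                     # c_i, (2n,)
    return weights, context

# Toy usage with random parameters (n = 4, J = 5 source positions)
rng = np.random.default_rng(0)
n, J = 4, 5
weights, context = attention_step(
    h=rng.normal(size=(J, 2 * n)), s_prev=rng.normal(size=n),
    W_a=rng.normal(size=(n, n)), U_a=rng.normal(size=(n, 2 * n)),
    v_a=rng.normal(size=n))
assert abs(weights.sum() - 1.0) < 1e-9
```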

  12. GIZA++ vs. NMT Alignment [figure: alignment matrices, GIZA++ vs. NMT attention] ◮ GIZA++ creates a clean alignment ◮ The NMT attention alignment is noisy

  13. Alignment Error Rate ◮ Alignment evaluation:
      $\mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$   [Och & Ney 03]
      $\mathrm{SAER}(M_S, M_P; M_A) = 1 - \frac{|M_A \odot M_S| + |M_A \odot M_P|}{|M_A| + |M_S|}$   [Tu & Lu+ 16]
      Europarl De-En Alignment Test:
      Model             AER%   SAER%
      GIZA++            21.0   26.8
      Attention-Based   38.1   63.6
      ◮ Attention is converted into hard alignments in both directions ◮ Merged using Och's refined method [Och & Ney 03]
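As a reference for the two metrics on slide 13, here is a small Python sketch; the alignments are assumed to be sets of (source, target) index pairs for AER and soft matrices for SAER, and all names are illustrative, not taken from the authors' tooling.

```python
import numpy as np

def aer(sure, possible, hyp):
    # AER(S, P; A) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)   [Och & Ney 03]
    A, S, P = set(hyp), set(sure), set(possible)   # sets of (src, tgt) links, S ⊆ P
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

def saer(m_sure, m_possible, m_att):
    # SAER(M_S, M_P; M_A) = 1 - (|M_A ⊙ M_S| + |M_A ⊙ M_P|) / (|M_A| + |M_S|)
    # [Tu & Lu+ 16]; ⊙ is the elementwise product, |M| sums all entries.
    num = (m_att * m_sure).sum() + (m_att * m_possible).sum()
    return 1.0 - num / (m_att.sum() + m_sure.sum())

# A hypothesis that reproduces exactly the sure links has AER = 0
sure = {(0, 0), (1, 1)}
possible = sure | {(2, 1)}
print(aer(sure, possible, hyp={(0, 0), (1, 1)}))   # 0.0
```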

  15. Outline: Motivation, Neural Machine Translation, Target Foresight, Guided Alignment Training, Target Foresight with Guided Alignment Training, Conclusion

  16. Target Foresight ◮ Idea: Use knowledge of the target sentence $e_1^I$ to improve the attention: $\tilde{\alpha}_{ij} = v_a^T \tanh(W_a s_{i-1} + U_a h_j + V_a \tilde{e}_i)$, with $V_a \in \mathbb{R}^{n \times p}$
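The change relative to the baseline energies on slide 7 is a single additive term. A sketch under the same illustrative assumptions as before, where e_i_emb stands for the embedding of the target word about to be produced:

```python
import numpy as np

def foresight_energies(h, s_prev, e_i_emb, W_a, U_a, V_a, v_a):
    # Target foresight (slide 16):
    # alpha~_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j + V_a e~_i)
    # V_a: (n, p) extra weight matrix, e_i_emb: (p,) target word embedding
    return np.tanh(s_prev @ W_a.T + h @ U_a.T + V_a @ e_i_emb) @ v_a   # (J,)
```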

  17. Raw Target Foresight ◮ The target word gets encoded in the source embeddings and attention weights

  18. Target Foresight with Noise [figure: attention matrices for 'die Kommission hat diesen Appell vernommen .' aligned to 'the Commission heeded this call .', target foresight with noise vs. the NMT baseline] ◮ Adding noise to the attention does not help

  19. Freeze Encoder and Decoder [figure: target foresight attention on top of a frozen NMT encoder/decoder] ◮ Train the baseline system ◮ Freeze the encoder and decoder weights ◮ Continue training with target foresight
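A hedged PyTorch-style sketch of the freezing procedure on slide 19; the submodule names encoder and decoder are hypothetical and not taken from the authors' code.

```python
import torch

def freeze_for_target_foresight(model):
    # Freeze the pretrained encoder and decoder of the baseline system
    # (hypothetical submodule names), so that continued training with
    # target foresight only updates the remaining attention parameters.
    for module in (model.encoder, model.decoder):
        for p in module.parameters():
            p.requires_grad = False
    # Only parameters that still require gradients go to the optimizer.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable)
```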

  20. Freeze Encoder and Decoder
      Alignment Test
      Model                                        AER%   SAER%
      GIZA++                                       21.0   26.8
      Attention-Based                              38.1   63.6
      + Target foresight with frozen en-/decoder   33.9   55.6

  21. Outline: Motivation, Neural Machine Translation, Target Foresight, Guided Alignment Training, Target Foresight with Guided Alignment Training, Conclusion

  22. Guided Alignment Training ◮ Idea: Introducing target alignment $A$ as a second objective [Chen & Matusov+ 16] ◮ Cross-entropy cost $L_{\text{align}}$ between the attention weights $\alpha$ and the target alignment $A$: $L_{\text{align}}(A, \alpha) := -\frac{1}{N} \sum_{n} \sum_{i=1}^{I_n} \sum_{j=1}^{J_n} A_{n,ij} \log \alpha_{n,ij}$ ◮ Optimize w.r.t. $L(A, \alpha, e_1^I, f_1^J) := \lambda_{\text{CE}} \cdot L_{\text{CE}} + \lambda_{\text{align}} \cdot L_{\text{align}}$ ⊲ $L_{\text{CE}}$: standard decoder cost function (cross-entropy) ⊲ $\lambda_{\text{align}}, \lambda_{\text{CE}}$: weights determined through experiments
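A NumPy sketch of the combined objective on slide 22, reduced to a single sentence pair (N = 1) for readability; the epsilon and the default loss weights are illustrative assumptions.

```python
import numpy as np

def guided_alignment_loss(A, alpha, eps=1e-9):
    # L_align(A, alpha) = -(1/N) * sum_n sum_i sum_j A_{n,ij} * log(alpha_{n,ij})
    # Here A and alpha are single (I, J) matrices, i.e. N = 1.
    return -(A * np.log(alpha + eps)).sum()

def total_loss(l_ce, l_align, lambda_ce=1.0, lambda_align=1.0):
    # L = lambda_CE * L_CE + lambda_align * L_align (weights set experimentally)
    return lambda_ce * l_ce + lambda_align * l_align
```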

  23. Guided Alignment Training
      Model             IWSLT De-En Test   Alignment Test
                        BLEU%              AER%   SAER%
      Attention-Based   29.3               41.8   66.3
      + GA              30.3               35.4   44.2
      ◮ Improves translation by 1.0 BLEU on the IWSLT 2013 test set ◮ Large improvements in AER and SAER on the Alignment Test ◮ Trained on all IWSLT 2013 data

  24. Outline: Motivation, Neural Machine Translation, Target Foresight, Guided Alignment Training, Target Foresight with Guided Alignment Training, Conclusion

  25. GIZA++ vs. Target Foresight with Guided Alignment [figure: alignment matrices, GIZA++ vs. TF + GA] ◮ Target foresight creates a correct alignment

  26. Results
      Alignment Test
      Model                                        AER%   SAER%
      fast_align                                   27.9   33.0
      GIZA++                                       21.0   26.8
      BerkeleyAligner                              20.5   26.4
      Attention-Based                              38.1   63.6
      + Guided alignment                           29.8   38.0
      + Target foresight with frozen en-/decoder   33.9   55.6
      + Target foresight with guided alignment     19.0   34.9
        + converted to hard alignment              19.0   24.6
      ◮ Trained on Europarl data ◮ Target foresight improves AER by 2.0% absolute compared to GIZA++ ◮ SAER is biased towards hard alignments
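The "+ converted to hard alignment" row goes back to the conversion described on slide 13. Below is a simplified Python sketch that takes the argmax per generated word in each translation direction and intersects the two link sets; the paper merges directions with Och's refined method [Och & Ney 03], which additionally grows the intersection with neighboring links, so the plain intersection here is only a stand-in.

```python
import numpy as np

def hard_alignment(att_s2t, att_t2s):
    # att_s2t: (I, J) attention of the source-to-target system
    #          (row i: weights over source positions when producing target word i)
    # att_t2s: (J, I) attention of the target-to-source system
    # Returns a set of (source_pos, target_pos) links.
    links_s2t = {(int(att_s2t[i].argmax()), i) for i in range(att_s2t.shape[0])}
    links_t2s = {(j, int(att_t2s[j].argmax())) for j in range(att_t2s.shape[0])}
    return links_s2t & links_t2s   # simplified symmetrization by intersection
```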

  28. Retrain Guided Alignment ◮ Use the improved alignments for guided alignment training ◮ Test data: IWSLT 2013 ◮ Train data: Europarl corpus
      Model                                    Test    Alignment Test
                                               BLEU    AER%   SAER%
      Attention-Based                          16.0    38.1   63.6
      + GA using GIZA++                        18.4    29.8   38.0
      + GA using target-foresight alignments   18.8    28.5   36.7

  29. Outline: Motivation, Neural Machine Translation, Target Foresight, Guided Alignment Training, Target Foresight with Guided Alignment Training, Conclusion

  30. Conclusion ◮ Improvement of AER by 2.0% compared to GIZA++ ◮ Can easily be used to align unseen data ◮ The aligned data can again be used for guided alignment training ◮ Neural networks will cheat if they can ◮ Guided alignment training keeps them from cheating
