

slide-1
SLIDE 1

Generating Alignments using Target Foresight in Attention-Based Neural Machine Translation

Jan-Thorsten Peter, Arne Nix, Hermann Ney

peter@cs.rwth-aachen.de · May 29, 2017 · EAMT 2017, Prague
Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University

J.-T. Peter, A. Nix, H. Ney: Target Foresight, 29.05.2017

slide-2
SLIDE 2

Outline

Motivation
Neural Machine Translation
Target Foresight
Guided Alignment Training
Target Foresight with Guided Alignment Training
Conclusion

slide-3
SLIDE 3

Motivation

◮ Alignments used to be important for SMT
◮ Neural Machine Translation (NMT) uses attention instead
◮ There are still applications for alignments:
  ⊲ Guided alignment training [Chen & Matusov+ 16]
  ⊲ Transread 1
  ⊲ Linguee 2
◮ Using the attention weights as alignments produces bad results
◮ Can we use NMT to create alignments?

1: https://transread.limsi.fr   2: http://www.linguee.com

slide-4
SLIDE 4

Related Work

◮ D. Bahdanau, K. Cho, Y. Bengio [Bahdanau & Cho+ 15]: Neural machine translation by jointly learning to align and translate. ICLR, May 2015.
  ⊲ Introduces an attention mechanism for neural machine translation
◮ W. Chen, E. Matusov, S. Khadivi, J.-T. Peter [Chen & Matusov+ 16]: Guided alignment training for topic-aware neural machine translation. AMTA, October 2016.
  ⊲ Introduces guided alignment training
◮ Z. Tu, Z. Lu, Y. Liu, X. Liu, H. Li [Tu & Lu+ 16]: Modeling coverage for neural machine translation. ACL, August 2016.
  ⊲ Analyses the attention of neural machine translation using SAER

slide-5
SLIDE 5

Outline

Motivation
Neural Machine Translation
Target Foresight
Guided Alignment Training
Target Foresight with Guided Alignment Training
Conclusion

slide-6
SLIDE 6

Attention Based NMT

◮ A bidirectional RNN encodes the source sentence f_1^J into forward states →h_1^J and backward states ←h_1^J
◮ h_j := [→h_j^T ; ←h_j^T]^T
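The bidirectional encoding above can be sketched in NumPy; the sizes are toy values of my choosing, and random vectors stand in for the actual GRU state sequences:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 5, 4  # source length and per-direction hidden size (toy values)

# Stand-ins for the forward and backward RNN state sequences over f_1^J.
h_fwd = rng.standard_normal((J, n))   # →h_1 ... →h_J
h_bwd = rng.standard_normal((J, n))   # ←h_1 ... ←h_J

# h_j := [→h_j^T ; ←h_j^T]^T — concatenate the two states at each position.
h = np.concatenate([h_fwd, h_bwd], axis=1)
print(h.shape)  # (5, 8): each annotation h_j has dimension 2n
```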

slide-7
SLIDE 7

Attention Based NMT

◮ Energies computed through an MLP: α̃_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j)
  W_a ∈ R^{n×n}, U_a ∈ R^{n×2n}, v_a ∈ R^n: weight parameters

slide-8
SLIDE 8

Attention Based NMT

◮ Attention weights normalized with a softmax: α_ij = exp(α̃_ij) / Σ_{k=1}^{J} exp(α̃_ik)

slide-9
SLIDE 9

Attention Based NMT

◮ Context vector as a weighted sum: c_i = Σ_{j=1}^{J} α_ij h_j
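The three attention steps on these slides (energies, softmax, context vector) fit in a few lines of NumPy. This is a sketch with toy dimensions and random parameters, not the actual trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 5, 4                           # toy sizes: source length, hidden size
h = rng.standard_normal((J, 2 * n))   # annotations h_1..h_J
s_prev = rng.standard_normal(n)       # previous decoder state s_{i-1}

# Weight parameters as defined on the slides.
W_a = rng.standard_normal((n, n))       # W_a ∈ R^{n×n}
U_a = rng.standard_normal((n, 2 * n))   # U_a ∈ R^{n×2n}
v_a = rng.standard_normal(n)            # v_a ∈ R^n

# Energies: α̃_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), for all j at once.
energies = np.tanh(h @ U_a.T + W_a @ s_prev) @ v_a   # shape (J,)

# Softmax: α_ij = exp(α̃_ij) / Σ_k exp(α̃_ik)  (shifted by max for stability)
weights = np.exp(energies - energies.max())
weights /= weights.sum()

# Context vector: c_i = Σ_j α_ij h_j
c = weights @ h   # shape (2n,)
```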

slide-10
SLIDE 10

Attention Based NMT

◮ Neural network output: p(e_i | e_1^{i−1}, f_1^J) = g_out(e_{i−1}, s_{i−1}, c_i)
  g_out: output function

slide-11
SLIDE 11

Attention Based NMT

◮ Hidden decoder state: s_i = g_dec(e_i, c_i; s_{i−1})
  g_dec: gated recurrent unit

slide-12
SLIDE 12

GIZA++ vs. NMT Alignment

[Alignment matrices: GIZA++ (left) vs. NMT attention (right)]

◮ GIZA++ creates a clean alignment
◮ The NMT attention alignment is noisy

slide-13
SLIDE 13

Alignment Error Rate

◮ Alignment evaluation:
  AER(S, P; A) = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|)   [Och & Ney 03]
  SAER(M_S, M_P; M_A) = 1 − (|M_A ⊙ M_S| + |M_A ⊙ M_P|) / (|M_A| + |M_S|)   [Tu & Lu+ 16]

Europarl De-En, Alignment Test:
  Model            AER%   SAER%
  GIZA++           21.0   26.8
  Attention-Based  38.1   63.6

◮ The attention is converted into a hard alignment in both directions
◮ Merged using Och's refined method [Och & Ney 03]
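The AER definition above can be checked on a tiny hand-made example; the alignment sets here are invented for illustration, with links as (source, target) position pairs and S ⊆ P:

```python
def aer(A, S, P):
    """Alignment Error Rate [Och & Ney 03]:
    AER(S, P; A) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)."""
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

S = {(0, 0), (1, 1)}             # sure links
P = S | {(2, 1)}                 # possible links always include the sure links
A = {(0, 0), (1, 1), (2, 2)}     # hypothesis with one wrong link

print(round(aer(A, S, P), 3))    # 1 - (2 + 2)/(3 + 2) = 0.2
```

A perfect hypothesis that covers all sure links and stays within P gets AER 0, while extra links outside P raise the rate.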


slide-15
SLIDE 15

Outline

Motivation
Neural Machine Translation
Target Foresight
Guided Alignment Training
Target Foresight with Guided Alignment Training
Conclusion

slide-16
SLIDE 16

Target Foresight

◮ Idea: Use knowledge of the target sentence e_1^I to improve the attention
  α̃_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j + V_a ẽ_i),   V_a ∈ R^{n×p}
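Compared to the baseline energy computation, target foresight only adds the term V_a ẽ_i inside the tanh. A sketch with toy dimensions and random stand-ins for the trained parameters and the target-word embedding:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n, p = 5, 4, 3   # toy sizes: source length, hidden dim, embedding dim
h = rng.standard_normal((J, 2 * n))   # annotations h_1..h_J
s_prev = rng.standard_normal(n)       # previous decoder state s_{i-1}
e_tilde = rng.standard_normal(p)      # embedding ẽ_i of the target word to be produced

W_a = rng.standard_normal((n, n))
U_a = rng.standard_normal((n, 2 * n))
V_a = rng.standard_normal((n, p))     # new foresight parameters, V_a ∈ R^{n×p}
v_a = rng.standard_normal(n)

# α̃_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j + V_a ẽ_i)
energies = np.tanh(h @ U_a.T + W_a @ s_prev + V_a @ e_tilde) @ v_a
```

Softmax normalization and the context vector then proceed exactly as in the baseline model.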

slide-17
SLIDE 17

Raw Target Foresight

◮ Target word encoded in source embedding and attention weights


slide-18
SLIDE 18

Target Foresight with Noise

[Attention heatmaps: plain NMT vs. Target Foresight with noise, aligning "die Kommission hat diesen Appell vernommen ." with "the Commission heeded this call ."]

◮ Adding noise to the attention does not help

slide-19
SLIDE 19

Freeze Encoder and Decoder

[Diagram: NMT baseline vs. Target Foresight with frozen components]

◮ Train the baseline system
◮ Freeze the encoder and decoder weights
◮ Continue training with target foresight

slide-20
SLIDE 20

Freeze Encoder and Decoder

Alignment Test:
  Model                                        AER%   SAER%
  GIZA++                                       21.0   26.8
  Attention-Based                              38.1   63.6
  + Target foresight with frozen en-/decoder   33.9   55.6

slide-21
SLIDE 21

Outline

Motivation
Neural Machine Translation
Target Foresight
Guided Alignment Training
Target Foresight with Guided Alignment Training
Conclusion

slide-22
SLIDE 22

Guided Alignment Training

◮ Idea: Introduce a target alignment A as a second objective [Chen & Matusov+ 16]
◮ Cross-entropy cost L_align between the attention weights α and the target alignment A:
  L_align(A, α) := −(1/N) Σ_{n=1}^{N} Σ_{i=1}^{I_n} Σ_{j=1}^{J_n} A_{n,ij} log α_{n,ij}
◮ Optimize w.r.t. L(A, α, e_1^I, f_1^J) := λ_CE · L_CE + λ_align · L_align
  ⊲ L_CE: standard decoder cost function (cross-entropy)
  ⊲ λ_align, λ_CE: weights determined through experiments
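The alignment cost L_align can be sketched directly from its definition; the tiny batch below is invented for illustration (one sentence, 2×2 attention matrices), and a small epsilon guards the logarithm:

```python
import numpy as np

def guided_alignment_loss(A_batch, alpha_batch, eps=1e-12):
    """L_align(A, α) = -(1/N) Σ_n Σ_i Σ_j A_{n,ij} log α_{n,ij},
    averaged over the N sentences in the batch."""
    return -np.mean([np.sum(A * np.log(alpha + eps))
                     for A, alpha in zip(A_batch, alpha_batch)])

# Toy batch of N=1 sentence with I=2 target and J=2 source positions.
A = [np.array([[1.0, 0.0], [0.0, 1.0]])]            # diagonal target alignment
alpha_good = [np.array([[0.9, 0.1], [0.1, 0.9]])]   # attention close to A
alpha_bad = [np.array([[0.5, 0.5], [0.5, 0.5]])]    # uniform attention

# Attention matching the target alignment is penalized less.
print(guided_alignment_loss(A, alpha_good) < guided_alignment_loss(A, alpha_bad))
```

The total training objective then mixes this with the usual decoder cross-entropy, λ_CE · L_CE + λ_align · L_align.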

slide-23
SLIDE 23

Guided Alignment Training

IWSLT De-En:
  Model            Test BLEU%   AER%   SAER%
  Attention-Based  29.3         41.8   66.3
  + GA             30.3         35.4   44.2

◮ Improves translation by 1.0 BLEU on the IWSLT 2013 test set
◮ Large improvements in AER and SAER on the alignment test set
◮ Trained on all IWSLT 2013 data

slide-24
SLIDE 24

Outline

Motivation
Neural Machine Translation
Target Foresight
Guided Alignment Training
Target Foresight with Guided Alignment Training
Conclusion

slide-25
SLIDE 25

GIZA++ vs. Target Foresight with Guided Alignment

[Alignment matrices: GIZA++ vs. Target Foresight + Guided Alignment]

◮ Target foresight creates a correct alignment

slide-26
SLIDE 26

Results

Alignment Test:
  Model                                        AER%   SAER%
  fast_align                                   27.9   33.0
  GIZA++                                       21.0   26.8
  BerkeleyAligner                              20.5   26.4
  Attention-Based                              38.1   63.6
  + Guided alignment                           29.8   38.0
  + Target foresight with frozen en-/decoder   33.9   55.6
  + Target foresight with guided alignment     19.0   34.9
    + converted to hard alignment              19.0   24.6

◮ Trained on Europarl data
◮ Target foresight improves AER by 2.0% absolute compared to GIZA++
◮ SAER is biased towards hard alignments


slide-28
SLIDE 28

Retrain Guided Alignment

◮ Use the improved alignments for guided alignment training
◮ Test data: IWSLT 2013
◮ Training data: Europarl corpus

  Model                                    Test BLEU   AER%   SAER%
  Attention-Based                          16.0        38.1   63.6
  + GA using GIZA++                        18.4        29.8   38.0
  + GA using target-foresight alignments   18.8        28.5   36.7

slide-29
SLIDE 29

Outline

Motivation
Neural Machine Translation
Target Foresight
Guided Alignment Training
Target Foresight with Guided Alignment Training
Conclusion

slide-30
SLIDE 30

Conclusion

◮ Improves AER by 2.0% absolute compared to GIZA++
◮ Can easily be used to align unseen data
◮ The aligned data can in turn be used for guided alignment training
◮ Neural networks will cheat if they can
◮ Guided alignment keeps them from cheating

slide-31
SLIDE 31

Thank you for your attention! Jan-Thorsten Peter, Arne Nix, Hermann Ney

peter@cs.rwth-aachen.de


slide-32
SLIDE 32

Experiment Setup

System configuration:
◮ 30,000 most frequent words as source and target vocabulary
◮ Out-of-vocabulary words are mapped to unknown tokens
◮ Bidirectional encoder with 1000 GRU nodes per direction
◮ GRU-based decoder with 1000 nodes
◮ Alignment computation also has an internal dimension of 1000

Training:
◮ 250,000 iterations
◮ Evaluation after every 10,000 iterations on the corresponding dev set

slide-33
SLIDE 33

Europarl De-En

                                   German       English
train (full data)
  Sentences                        1.2M
  Running Words                    32M          34M
  Vocabulary                       305K         100K
align-test
  Sentences                        504
  Running Words                    9.9K         10.3K
  Vocabulary                       2.8K         2.4K
  OOVs with full vocabulary        6 (0.1%)     1 (0.0%)
  OOVs with 30K shortlist (rate)   276 (2.8%)   50 (0.5%)

slide-34
SLIDE 34

Analyzing Attention-based Alignments

◮ How good is the alignment quality of attention-based NMT? ◮ How can we evaluate attention-based alignments? ◮ How important are attention-based alignments for translation?


slide-35
SLIDE 35

Analysing Attention-based Alignments (Europarl)

◮ Compare BLEU to AER and SAER for models with increasing noise on the alignment parameters

[Plot: accuracy in % vs. noise standard deviation (0.00–0.14) on Europarl; curves: baseline BLEU, 100 − AER (Pearson 0.995), 100 − SAER (Pearson 0.998)]

slide-36
SLIDE 36

Analysing Attention-based Alignments (Europarl)

◮ Attention parameters are robust to noise up to a certain degree
◮ Alignment quality correlates with translation quality for all evaluation methods

[Plot: accuracy in % vs. noise standard deviation (0.00–0.14) on Europarl; curves: baseline BLEU, 100 − AER (Pearson 0.995), 100 − SAER (Pearson 0.998)]

slide-37
SLIDE 37

Analysing Attention-based Alignments (IWSLT2013)

◮ Attention parameters are robust to noise up to a certain degree
◮ Alignment quality correlates with translation quality for all evaluation methods

[Plot: accuracy in % vs. noise standard deviation (0.00–0.14) on IWSLT; curves: baseline BLEU, 100 − AER (Pearson 0.992), 100 − SAER (Pearson 0.996)]

slide-38
SLIDE 38
  • D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, May 2015.
  • W. Chen, E. Matusov, S. Khadivi, J.-T. Peter. Guided alignment training for topic-aware neural machine translation. In AMTA, Austin, Texas, October 2016. Association for Machine Translation in the Americas.
  • F. J. Och, H. Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, Vol. 29, No. 1, pp. 19–51, 2003.
  • Z. Tu, Z. Lu, Y. Liu, X. Liu, H. Li. Modeling coverage for neural machine translation. In ACL, August 2016.