

slide-1
SLIDE 1

Advances and Challenges in Neural Machine Translation

Gongbo Tang

26 September 2019

slide-2
SLIDE 2

Outline

1. Model Architectures
2. Noisy Data
3. Monolingual Data
4. Domain Adaptation
5. Coverage
6. Understanding NMT


slide-3
SLIDE 3

The Best of Both Worlds

Encoder-decoders
With residual feed-forward layers
Cascaded encoder
Multi-column encoder

(a) Cascaded Encoder (b) Multi-Column Encoder

Source: The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

slide-4
SLIDE 4

Star-Transformer


Figure 1: Left: Connections of one layer in Transformer, circle nodes indicate the hidden states of input tokens. Right: Connections of one layer in Star-Transformer, the square node is the virtual relay node. Red edges and blue edges are ring and radical connections, respectively.

Source: Star-Transformer

slide-5
SLIDE 5

Modeling Recurrence for Transformer

Figure 2: The architecture of Transformer augmented with an additional recurrence encoder, the output of which is directly fed to the top decoder layer.

Figure 3: Two implementations of recurrence modeling: (a) standard RNN, and (b) the proposed ARN.

Source: Modeling Recurrence for Transformer

slide-6
SLIDE 6

Convolutional Self-Attention Networks

(Figure panels over the example sentence "Bush held a talk with Sharon": (a) vanilla SANs, (b) 1D-convolutional SANs, (c) 2D-convolutional SANs.)

Figure 1: Illustration of (a) vanilla SANs; (b) 1-dimensional convolution with the window size being 3; and (c) 2-dimensional convolution with the area being 3 × 3. Different colors represent different subspaces modeled by multi-head attention, and transparent colors denote masked tokens that are invisible to SANs.

Source: Convolutional Self-Attention Networks

slide-7
SLIDE 7

Lattice-Based Transformer Encoder

(Figure: three segmentations of "mao-yi-fa-zhan-ju-fu-zong-cai", e.g. "mao-yi-fa-zhan | ju | fu-zong-cai", "mao-yi | fa-zhan-ju | fu-zong-cai", and "mao-yi | fa-zhan | ju | fu | zong-cai", combined into one lattice.)

Figure 1: Incorporating three different segmentations for a lattice graph. The original sentence is “mao-yi-fa-zhan-ju-fu-zong-cai”. In Chinese it is “贸易发展局 副总裁”. In English it means “The vice president of Trade Development Council”.

Figure 2: The architecture of the lattice-based Transformer encoder. Lattice positional encoding is added to the embeddings of lattice sequence inputs. Different colors in lattice-aware self-attention indicate different relation embeddings.

Source: Lattice-Based Transformer Encoder for Neural Machine Translation

slide-8
SLIDE 8

Incorporating Sentential Context


Figure 1: Illustration of the proposed approaches on a 3-layer encoder: (a) vanilla model without sentential context; (b) shallow sentential context representation (i.e., blue square) by exploiting the top encoder layer only; and (c) deep sentential context representation (i.e., brown square) by exploiting all encoder layers. The circles denote hidden states of individual tokens in the input sentence, and the squares denote the sentential context representations. The red up arrows denote that the representations are fed to the subsequent decoder. This figure is best viewed in color.

Source: Exploiting Sentential Context for Neural Machine Translation

slide-9
SLIDE 9

Tree Transformer

(Figure: constituent blocks over the example sentence "the cute dog is wagging its tail", across layers 0-2.)

Figure 1: (A) A 3-layer Tree Transformer, where the blocks are constituents induced from the input sentence. The two neighboring constituents may merge together in the next layer, so the sizes of constituents gradually grow from layer to layer. The red arrows indicate the self-attention. (B) The building blocks of Tree Transformer. (C) Constituent prior C for the layer 1.

Source: Tree Transformer: Integrating Tree Structures into Self-Attention

slide-10
SLIDE 10

Noise in Training Data

  • Crawled parallel data from the web (very noisy)

              SMT            NMT
WMT17         24.0           27.2
+ Paracrawl   25.2 (+1.2)    17.3 (-9.9)

(German-English, 90m words each of WMT17 and Crawl data)

  • Corpus cleaning methods [Xu and Koehn, EMNLP 2017] give improvements

Source: Philipp Koehn's slides

slide-11
SLIDE 11

Noisy Data

Types of noise
– Misaligned sentences
– Disfluent language (from MT, bad translations)
– Wrong language data (e.g., French in a German–English corpus)
– Untranslated sentences
– Short segments (e.g., dictionaries)
– Mismatched domain


slide-12
SLIDE 12

Mismatched Sentences

  • Artificially created by randomly shuffling sentence order
  • Added to existing parallel corpus in different amounts

Misaligned pairs added    5%           10%          20%          50%          100%
NMT                                                              26.1 (-1.1)  25.3 (-1.9)
SMT                       24.0 (-0.0)  24.0 (-0.0)  23.9 (-0.1)  23.9 (-0.1)  23.4 (-0.6)
  • Bigger impact on NMT than on SMT

Source: Philipp Koehn's slides

slide-13
SLIDE 13

Misordered Words

  • Artificially created by randomly shuffling words in each sentence

Shuffled words         5%           10%          20%          50%          100%
Source  NMT                                                   26.6 (-0.6)  25.5 (-1.7)
        SMT            24.0 (-0.0)  23.6 (-0.4)  23.9 (-0.1)  23.6 (-0.4)  23.7 (-0.3)
Target  NMT                                                   26.7 (-0.5)  26.1 (-1.1)
        SMT            24.0 (-0.0)  24.0 (-0.0)  23.4 (-0.6)  23.2 (-0.8)  22.9 (-1.1)
  • Similar impact on NMT and SMT; source-side reshuffling is worse

Source: Philipp Koehn's slides

slide-14
SLIDE 14

Untranslated Sentences

Untranslated sentences   5%           10%           20%          50%          100%
Source  NMT              17.6 (-9.8)  11.2 (-16.0)   5.6 (-21.6)  3.2 (-24.0)  3.2 (-24.0)
        SMT              23.8 (-0.2)  23.9 (-0.1)   23.8 (-0.2)  23.4 (-0.6)  21.1 (-2.9)
Target  NMT              27.2 (-0.0)  27.0 (-0.2)   26.7 (-0.5)  26.8 (-0.4)  26.9 (-0.3)

Source: Philipp Koehn's slides

slide-15
SLIDE 15

Short Sentences

Short sentences added   5%           10%          20%          50%
1-2 words  NMT          27.1 (-0.1)  26.5 (-0.7)  26.7 (-0.5)
           SMT          24.1 (+0.1)  23.9 (-0.1)  23.8 (-0.2)
1-5 words  NMT          27.8 (+0.6)  27.6 (+0.4)  28.0 (+0.8)  26.6 (-0.6)
           SMT          24.2 (+0.2)  24.5 (+0.5)  24.5 (+0.5)  24.2 (+0.2)
  • No harm done

Source: Philipp Koehn's slides

slide-16
SLIDE 16

Amount of Training Data

(Chart: BLEU scores with varying amounts of training data; x-axis: corpus size in English words, 10^6 to 10^8. Phrase-based with big LM: 21.8 up to 30.4; phrase-based: 16.4 up to 28.6; neural: 1.6 up to 31.1. NMT is far behind with little data but overtakes both phrase-based systems at the largest data sizes.)


slide-17
SLIDE 17

Using Monolingual Data in NMT

Dummy source
– No source sentence
– Randomly sample from monolingual data each epoch
– Freeze encoder/attention layers for monolingual training instances
Synthetic source
– Produce a synthetic source-side sentence via back-translation
– Back-translation: use a model trained in the opposite direction to generate the source-side sentence


slide-18
SLIDE 18

Back Translation

Steps
– Train a system in the reverse language direction
– Use that system to translate target-side monolingual data
– Combine the real parallel data with the synthetic parallel data
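A minimal sketch of this pipeline, assuming a hypothetical reverse_model.translate function that maps a target-language sentence back into the source language:

# Minimal back-translation data pipeline (illustrative; reverse_model.translate
# is a hypothetical stand-in for any target-to-source translation function).

def build_backtranslated_corpus(reverse_model, mono_target, real_pairs):
    """Create synthetic (source, target) pairs from target-side monolingual data."""
    synthetic_pairs = []
    for tgt in mono_target:
        src = reverse_model.translate(tgt)   # translate target -> source
        synthetic_pairs.append((src, tgt))   # synthetic source, real target
    # the final source -> target system is then trained on real + synthetic data
    return real_pairs + synthetic_pairs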

(Figure: a reverse system produces synthetic data for training the final system.)

Figure from: Philipp Koehn's slides

slide-19
SLIDE 19

Iterative Back Translation

(Figure: iterative back-translation alternating between back system 1, back system 2, and the final system.)

Figure from: Philipp Koehn's slides

slide-20
SLIDE 20

Dual Learning

  • We could iterate through steps of

– train system
– create synthetic corpus

  • Dual learning: train models in both directions together

– translation models F → E and E → F
– take sentence f
– translate it into sentence e'
– translate that back into sentence f'
– training objective: f should match f'

  • Setup could be fooled by just copying (e’ = f)

⇒ score e' with a language model for language E; add the language model score as a cost to the training objective
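A rough sketch of one such round trip; mt_fe, mt_ef, and lm_e are hypothetical interfaces standing in for the two translation models and the language model:

# One dual-learning round trip (all model interfaces are hypothetical stand-ins).

def dual_learning_reward(mt_fe, mt_ef, lm_e, f, alpha=0.5):
    """Reward for sentence f combining fluency of the intermediate translation
    and the round-trip reconstruction likelihood."""
    e_prime = mt_fe.sample_translation(f)              # F -> E
    lm_reward = lm_e.log_prob(e_prime)                 # is e' fluent in language E?
    recon_reward = mt_ef.log_prob(f, given=e_prime)    # can we recover f from e'?
    # the LM term discourages the trivial copy e' = f
    return alpha * lm_reward + (1 - alpha) * recon_reward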

Source: Philipp Koehn's slides

slide-21
SLIDE 21

Dual Learning

(Figure: translation models MT F→E and MT E→F coupled with language models LM E and LM F over sentences e and f.)

Figure from: Philipp Koehn's slides

slide-22
SLIDE 22

Domain Adaptation

  • Better quality when system is adapted to a task
  • Domain adaptation to a specific domain, e.g., information technology
  • Some training data is more relevant than the rest
  • May also adapt to specific user (personalization)
  • May optimize for a specific document or sentence

Figure from: Philipp Koehn's slides

slide-23
SLIDE 23

Domains

Medical: Abilify is a medicine containing the active substance aripiprazole. It is available as 5 mg, 10 mg, 15 mg and 30 mg tablets, as 10 mg, 15 mg and 30 mg orodispersible tablets (tablets that dissolve in the mouth), as an oral solution (1 mg/ml) and as a solution for injection (7.5 mg/ml).
Software Localization: Default GNOME Theme OK People
Literature: There was a slight noise behind her and she turned just in time to seize a small boy by the slack of his roundabout and arrest his flight.
Law: Corrigendum to the Interim Agreement with a view to an Economic Partnership Agreement between the European Community and its Member States, of the one part, and the Central Africa Party, of the other part.
Religion: This is The Book free of doubt and involution, a guidance for those who preserve themselves from evil and follow the straight path.
News: The Facebook page of a leading Iranian cartoonist, Mana Nayestani, was hacked on Tuesday, 11 September 2012, by pro-regime hackers who call themselves "Soldiers of Islam".
Movie subtitles: We're taking you to Washington, D.C. Do you know where the prisoner was transported to? Uh, Washington. Okay.
Twitter: Thank u @Starbucks & @Spotify for celebrating artists who #GiveGood with a donation to @BTWFoundation, and to great organizations by @Metallica and @ChanceTheRapper! Limited edition cards available now at Starbucks!
Figure from: Philipp Koehn's slides

slide-24
SLIDE 24

Domain Differences

Topic: The subject matter of the text, such as politics or sports.
Modality: How was this text originally created? Is this written text or transcribed speech, and if speech, is it a formal presentation or an informal dialogue full of incomplete and ungrammatical sentences?
Register: Level of politeness. In some languages this is very explicit, such as the use of the informal Du or the formal Sie for the personal pronoun you in German.
Intent: Is the text a statement of fact, an attempt to persuade, or communication between multiple parties?
Style: Is it a terse informal text, or is it full of emotional and flowery language?

Figure from: Philipp Koehn's slides

slide-25
SLIDE 25

Domain Adaptation

Source: Schaue um dich herum.          Reference: Look around you.
All data   NMT: Look around you.                                        SMT: Look around you.
Law        NMT: Sughum gravecorn.                                       SMT: In order to implement dich Schaue .
Medical    NMT: EMEA / MB / 049 / 01-EN-Final Work progamme for 2002    SMT: Schaue by dich around .
IT         NMT: Switches to paused.                                     SMT: To Schaue by itself .
Koran      NMT: Take heed of your own souls.                            SMT: And you see.
Subtitles  NMT: Look around you.                                        SMT: Look around you .

Figure from: Philipp Koehn's slides

slide-26
SLIDE 26

Domain Adaptation

System ↓      Law        Medical    IT         Koran      Subtitles
All Data      30.5/32.8  45.1/42.2  35.3/44.7  17.9/17.9  26.4/20.8
Law           31.1/34.4  12.1/18.2   3.5/ 6.9   1.3/ 2.2   2.8/ 6.0
Medical        3.9/10.2  39.4/43.5   2.0/ 8.5   0.6/ 2.0   1.4/ 5.8
IT             1.9/ 3.7   6.5/ 5.3  42.1/39.8   1.8/ 1.6   3.9/ 4.7
Koran          0.4/ 1.8   0.0/ 2.1   0.0/ 2.3  15.9/18.8   1.0/ 5.5
Subtitles      7.0/ 9.9   9.3/17.8   9.2/13.6   9.0/ 8.4  25.9/22.1
(each cell: NMT BLEU / SMT BLEU; rows are training corpora, columns are test domains)

Figure from: Philipp Koehn's slides

slide-27
SLIDE 27

Data Combination

Combined Domain Model

Figure from: Philipp Koehn's slides

slide-28
SLIDE 28

Data Combination

(Figure: out-of-domain data and in-domain data combined into a single model.)

Oversample in-domain data

Figure from: Philipp Koehn's slides

slide-29
SLIDE 29

Model Combination

(Figure: an in-domain model and an out-of-domain model combined.)

Figure from: Philipp Koehn's slides

slide-30
SLIDE 30

Topic Models

  • Cluster corpus by topic — Latent Dirichlet Allocation (LDA)
  • Train separate sub-models for each topic
  • For input sentence, detect topic (or topic distribution)

Figure from: Philipp Koehn's slides

slide-31
SLIDE 31

Data Sampling

Combined Domain Model

  • Select out-of-domain sentence pairs that are similar to in-domain data

Figure from: Philipp Koehn's slides

slide-32
SLIDE 32

Data Sampling


Moore Lewis

(Figure: each sentence is scored by an in-domain language model and an out-of-domain language model.)

  • Build language models

– out of domain
– in domain

  • Score each sentence
  • Sub-select sentence pairs with

pIN(f) − pOUT(f) > τ
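A small sketch of this selection rule, assuming hypothetical lm_in.log_prob and lm_out.log_prob scorers (e.g., length-normalized sentence log-probabilities from the two language models):

# Moore-Lewis style selection: keep out-of-domain sentences whose in-domain LM
# score exceeds their out-of-domain LM score by a margin tau.

def select_sentences(sentences, lm_in, lm_out, tau=0.0):
    selected = []
    for f in sentences:
        score = lm_in.log_prob(f) - lm_out.log_prob(f)   # p_IN(f) - p_OUT(f)
        if score > tau:
            selected.append((score, f))
    # most in-domain-like sentences first
    return [f for _, f in sorted(selected, key=lambda x: x[0], reverse=True)]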

Figure from: Philipp Koehn's slides

slide-33
SLIDE 33

Data Sampling

Modified Moore Lewis

(Figure: sentences are scored by in-domain and out-of-domain language models on both the source side and the target side.)

  • 2 sets of language models

– source language
– target language

  • Add scores

Figure from: Philipp Koehn's slides

slide-34
SLIDE 34

Fine Tuning

(Figure: an out-of-domain model further trained into an in-domain model.)

  • First train system on out-of-domain data (or: all available data)
  • Stop at convergence
  • Then, continue training on in-domain data
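A minimal fine-tuning loop as a sketch; train_epoch and evaluate are hypothetical helpers, and the small learning rate plus early stopping are common but optional choices:

# Fine-tuning sketch: continue training a converged out-of-domain model on
# in-domain data, typically with a small learning rate and early stopping.

def fine_tune(model, in_domain_data, dev_data, lr=1e-5, max_epochs=10, patience=2):
    best_bleu, bad_epochs = evaluate(model, dev_data), 0
    for _ in range(max_epochs):
        train_epoch(model, in_domain_data, learning_rate=lr)
        bleu = evaluate(model, dev_data)
        if bleu > best_bleu:
            best_bleu, bad_epochs = bleu, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break   # stop before the model drifts too far from the general domain
    return model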

Figure from: Philipp Koehn's slides

slide-35
SLIDE 35

Curriculum Training

  • Recall: relevance score for each sentence pair
  • Training epochs

– start with all data (100%)
– train only on somewhat relevant data (50%)
– train only on relevant data (25%)
– train only on very relevant data (10%)
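One possible schedule as a sketch, assuming each sentence pair already carries a relevance score (e.g., from Moore-Lewis) and a hypothetical train_epoch helper:

# Curriculum sketch: later stages train only on the most relevant sentence pairs.
# scored_pairs is a list of (relevance_score, sentence_pair).

def curriculum_training(model, scored_pairs, schedule=(1.00, 0.50, 0.25, 0.10)):
    ranked = sorted(scored_pairs, key=lambda x: x[0], reverse=True)  # most relevant first
    for fraction in schedule:
        subset = [pair for _, pair in ranked[: int(len(ranked) * fraction)]]
        train_epoch(model, subset)   # one (or more) epochs per curriculum stage
    return model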

Figure from: Philipp Koehn's slides

slide-36
SLIDE 36

Adequacy and Fluency

(from: Sennrich and Haddow, 2017)

From Philipp Koehn's slides

slide-37
SLIDE 37

Over-generation and Under-translation

Source: in order to solve the problem , the " Social Housing " alliance suggests a fresh start .
Output: um das Problem zu lösen , schlägt das Unternehmen der Gesellschaft für soziale Bildung vor .

From Philipp Koehn's slides

slide-38
SLIDE 38

Modeling Coverage

  • Track coverage during decoding

coverage(j) = Σ_i α_ij

over-generation = Σ_j max(0, coverage(j) − 1)

under-generation = Σ_j min(1, coverage(j))

  • Add additional penalty functions to score hypotheses
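An illustrative way to compute these quantities from a hypothesis' attention matrix (rows: output words, columns: input words), using numpy; how the two terms are weighted when rescoring is a tuning choice:

import numpy as np

# Coverage-based quantities computed from a finished hypothesis' attention
# matrix alpha of shape (output_length, input_length).

def coverage_penalties(alpha):
    coverage = alpha.sum(axis=0)                          # coverage(j) = sum_i alpha_ij
    overgeneration = np.maximum(0.0, coverage - 1.0).sum()
    undergeneration = np.minimum(1.0, coverage).sum()
    return overgeneration, undergeneration

# One possible rescoring term (weights w1, w2 are tuning choices):
#   score = log_prob - w1 * overgeneration + w2 * undergeneration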

From Philipp Koehn's slides

slide-39
SLIDE 39

Modeling Coverage

  • Extend translation model
  • Use vector that accumulates coverage of input words to inform attention

– raw attention score a(s_i−1, h_j)
– informed by previous decoder state s_i−1 and input word h_j
– add conditioning on coverage(j):
  a(s_i−1, h_j) = W_a s_i−1 + U_a h_j + V_a coverage(j) + b_a

  • Coverage tracking may also be integrated into the training objective.

Σ_i log P(y_i | x) + λ Σ_j (1 − coverage(j))²
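A literal sketch of the coverage-conditioned attention energy above, with parameter shapes chosen so that each energy is a scalar (W_a, U_a are vectors, V_a and b_a are scalars); real implementations typically add a non-linearity and extra projections:

import numpy as np

def attention_weights(s_prev, H, coverage, W_a, U_a, V_a, b_a):
    """s_prev: decoder state (d,); H: encoder states (n, d); coverage: (n,)."""
    energies = np.array([
        W_a @ s_prev + U_a @ h_j + V_a * c_j + b_a    # a(s_{i-1}, h_j)
        for h_j, c_j in zip(H, coverage)
    ])
    weights = np.exp(energies - energies.max())       # softmax over input positions
    return weights / weights.sum()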

From Philipp Koehn's slides

slide-40
SLIDE 40

Modeling Coverage

(a) Context Gate (source) (b) Context Gate (target) (c) Context Gate (both) Figure 4: Architectures of NMT with various context gates, which either scale only one side of translation contexts (i.e., source context in (a) and target context in (b)) or control the effects of both sides (i.e., (c)).

Figure from: Context Gates for Neural Machine Translation

slide-41
SLIDE 41

Coverage-aware Method

c(x, y) = Σ_{i=1..|x|} log max(Σ_{j=1..|y|} a_ij, β)

Example: Σ_j a_1j = 0.7, max(0.7, β) = 0.8; Σ_j a_2j = 1.2, max(1.2, β) = 1.2;
c(x, y) = log 0.8 + log 1.2 + ... = 1.5

Figure 1: The coverage score for a running example (Chinese pinyin-English and β = 0.8).

Figure from: A Simple and Effective Approach to Coverage-Aware Neural Machine Translation

slide-42
SLIDE 42

What is in a representation?

What is contained in an intermediate representation?
– word embedding
– encoder state
– decoder state
More specific questions
– Does the model discover morphological properties?
– Does the model disambiguate words?



slide-44
SLIDE 44

Probing Classifier

  • Pose a hypothesis, e.g.,

Encoder states discover part-of-speech.

  • Formalize this as a classification problem

– given: encoder state for the word dog
– label: singular noun (NN)

  • Train on representations generated by running inference

– translate sentences not seen during training
– record their encoder states
– look up their part-of-speech tags (by running a POS tagger, or from labeled data)
→ training example (encoder state ; label)

  • Test on new sentences
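A minimal probing setup as a sketch, assuming the encoder states and POS labels have already been collected as arrays; the linear classifier here is one common choice:

from sklearn.linear_model import LogisticRegression

# Probing sketch: a linear classifier from frozen encoder states to POS tags.
# train_states/test_states: (num_tokens, hidden_dim) arrays of encoder states;
# train_tags/test_tags: the corresponding POS labels.

def probe_pos(train_states, train_tags, test_states, test_tags):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_states, train_tags)            # train on (state, label) pairs
    return clf.score(test_states, test_tags)     # accuracy on unseen sentences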

From Philipp Koehn's slides

slide-45
SLIDE 45

Probing Classifier

Figure 1: Illustration of our approach, after (Belinkov et al., 2017).

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks

slide-46
SLIDE 46

Source Syntax

  • LSTM sequence-to-sequence model without attention
  • Different tasks

– translate English into Russian, German
– copy English
– copy permuted English
– parse English into a linearized parse structure

  • Predict

– constituent phrase (NP, VP, etc.)
– passive voice and tense

  • Findings

– much better quality when translating than majority class
– same quality for copying as majority class

Does String-Based Neural MT Learn Source Syntax?

slide-47
SLIDE 47

POS Tagging and Semantic Tagging

  • Attentional neural machine translation model
  • Predict

– part-of-speech tag
– semantic tag
  ∗ type of named entity
  ∗ semantic relationships
  ∗ discourse relationships

  • Findings

– compare prediction quality of different encoder layers
– mostly better performance at deeper layers
– little impact from target language

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks

slide-48
SLIDE 48

Morphology

  • Attentional neural machine translation model with character-based word embeddings

  • Predict for morphologically rich input languages

– part-of-speech tag
– morphological properties

  • Findings

– character-based representations much better for learning morphology
– word-based models are sufficient for learning structure of common words
– lower layers better at word structure, deeper layers better at word meaning
– target language matters for what information is learned
– neural decoder learns very little about word structure

What do Neural Machine Translation Models Learn about Morphology?

slide-49
SLIDE 49

Word Sense Disambiguation

(Figure: word embeddings or hidden representations of tokens (ambiguous noun, translation, general word) are concatenated and fed to a feed-forward classifier that predicts correct or incorrect.)

            DE→EN              DE→FR
            RNN.    Trans.     RNN.    Trans.
BLEU        29.1    32.6       17.0    19.3
Embedding   63.1    63.2       68.7    68.9
ENC         94.2    97.2       91.7    95.6
DEC         97.9    91.2       95.1    91.6

Table 2: BLEU scores of NMT models, and WSD accuracy on the test set using word embeddings or hidden states to represent ambiguous nouns. The hidden states are from the highest layer. RNN. and Trans. denote RNNS2S and Transformer models, respectively.

Encoders Help You Disambiguate Word Senses in Neural Machine Translation

slide-50
SLIDE 50

Transfer Learning

Use the hidden representations in NMT as pre-trained embeddings

Figure 1: Illustration of our approach, after (Belinkov et al., 2017).

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks

slide-51
SLIDE 51

Explainable AI

Important question for users
– Why did the network reach this decision?
Solutions
– Tracking decisions back to the inputs


slide-52
SLIDE 52

What Determined Output Decision ?

What part of the network had the biggest impact on the final decision?

Prediction of a specific output word:
– which of the input words mattered most?
– which of the previous output words mattered most?


slide-53
SLIDE 53

Layer-Wise Relevance Propagation

  • Start with output prediction

i.e., high value for word in softmax

  • Compute backwards what contributed to this high value
  • First step

– consider values of the previous layer
– consider weights from the previous layer
– assign a relevance value to each node in the previous layer
– normalize so they add up to one

  • Recurse until input layer is reached
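A sketch of one such backward step through a linear layer z = W x (epsilon rule); layer types and rules vary across LRP variants, so this is only illustrative:

import numpy as np

# One relevance-propagation step through a linear layer z = W x (epsilon rule):
# relevance R_out on the outputs is redistributed to the inputs in proportion
# to each input's contribution to each output.

def lrp_linear(x, W, R_out, eps=1e-6):
    """x: inputs (m,); W: weights (n, m); R_out: relevance of the n outputs."""
    contributions = W * x                    # contribution of input j to output k
    z = contributions.sum(axis=1) + eps      # stabilized pre-activations per output
    R_in = (contributions / z[:, None]).T @ R_out
    return R_in                              # relevance per input (approximately conserved)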

Source: Philipp Koehn's slides

slide-54
SLIDE 54

Layer-Wise Relevance Propagation

Example: Chinese–English

Source: Philipp Koehn's slides. Reference: Visualizing and Understanding Neural Machine Translation

slide-55
SLIDE 55

Identifying Neurons

Questions
– How are specific properties encoded? (Easiest case: in a single neuron)
– How do we find it?
Example: length of sequence
– given: encoder-decoder model without attention
– does the encoder record the length of the consumed sequence?
– does the decoder record the length of the generated sequence?


slide-56
SLIDE 56

Correlation

Steps
– select a neuron
– compute the correlation between the value of the neuron when processing the x-th word and the position x
– success if a highly correlated neuron is found
Example: length of sequence
– given: encoder-decoder model without attention
– does the encoder record the length of the consumed sequence?
– does the decoder record the length of the generated sequence?
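A small sketch of this correlation search, assuming activations have been collected as one (sentence length x hidden size) array per sentence:

import numpy as np

# Find the hidden unit whose activation correlates most strongly with the word
# position. activations: a list of (sentence_length, hidden_dim) arrays.

def most_length_correlated_unit(activations):
    values = np.concatenate(activations, axis=0)                  # all time steps
    positions = np.concatenate([np.arange(len(a)) for a in activations])
    corrs = np.array([np.corrcoef(values[:, k], positions)[0, 1]
                      for k in range(values.shape[1])])
    best = int(np.nanargmax(np.abs(corrs)))
    return best, corrs[best]          # unit index and its correlation with position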


slide-57
SLIDE 57

Neurons Correlated with Length


Figure 4: Action of translation unit109 and unit334 during the encoding and decoding of a sample sentence. Also shown is the softmax log-prob of output <EOS>.

Figure from: Why Neural Translations are the Right Length

slide-58
SLIDE 58

More Advances and Challenges

More advances and challenges
– multi-task learning
– document-level translation
– low-resource languages
– unsupervised NMT
– automatic post-editing
– quality estimation
– test suites for MT evaluation
– robustness
– parallel decoding
– speech translation
– ...
