

SLIDE 1

BREAKTHROUGHS IN NEURAL MACHINE TRANSLATION

Olof Mogren

Chalmers University of Technology

2016-09-29

SLIDE 2

COMING SEMINARS

  • Today: Olof Mogren

Neural Machine Translation

  • October 6: John Wiedenhoeft

Fast Bayesian inference in Hidden Markov Models using Dynamic Wavelet Compression

  • October 10: Haris Charalambos Themistocleous

Linguistic, signal processing, and machine learning approaches in eliciting information from speech

http://www.cse.chalmers.se/research/lab/seminars/

SLIDE 3

SLIDE 4

Progress in Machine Translation

[Edinburgh En-De WMT newstest2013 Cased BLEU; NMT 2015 from U. Montréal]

[Plot: Cased BLEU (5-25) by year (2013-2016) for phrase-based SMT, syntax-based SMT, and neural MT]

From [Sennrich 2016, http://www.meta-net.eu/events/meta-forum-2016/slides/09_sennrich.pdf]

SLIDE 5

Phrase-based Statistical Machine Translation

A marvelous use of big data but … it’s mined out?!?

Source (Chinese): 1519年600名西班牙人在墨西哥登陆,去征服几百万人口的阿兹特克帝国,初次交锋他们损兵三分之二。

Reference translation: In 1519, six hundred Spaniards landed in Mexico to conquer the Aztec Empire with a population of a few million. They lost two thirds of their soldiers in the first clash.

translate.google.com (2009): 1519 600 Spaniards landed in Mexico, millions of people to conquer the Aztec empire, the first two-thirds of soldiers against their loss.

translate.google.com (2013): 1519 600 Spaniards landed in Mexico to conquer the Aztec empire, hundreds of millions of people, the initial confrontation loss of soldiers two-thirds.

translate.google.com (2014): 1519 600 Spaniards landed in Mexico, millions of people to conquer the Aztec empire, the first two-thirds of the loss of soldiers they clash.

translate.google.com (2015): 1519 600 Spaniards landed in Mexico, millions of people to conquer the Aztec empire, the first two-thirds of the loss of soldiers they clash.

translate.google.com (2016): 1519 600 Spaniards landed in Mexico, millions of people to conquer the Aztec empire, the first two-thirds of the loss of soldiers they clash.

SLIDE 6

WHAT IS NEURAL MT (NMT)?

The approach of modelling the entire MT process via one big artificial neural network.

SLIDE 7

MODELLING LANGUAGE USING RNNS

[Diagram: RNN unrolled over inputs x1, x2, x3, producing outputs y1, y2, y3]

  • Language models: P(word_i | word_1, ..., word_{i−1}) (see the sketch below)
  • Recurrent Neural Networks
  • Gated additive sequence modelling: LSTM (and variants)
  • Fixed vector representation for sequences
  • Use with beam-search for language generation
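
The factorization above can be made concrete in a few lines. Below is a minimal sketch, assuming PyTorch and toy sizes of our own choosing; it illustrates the idea rather than any specific model from the talk.

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Models P(word_i | word_1, ..., word_{i-1}) with an LSTM."""
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer word ids
        hidden, _ = self.lstm(self.embed(tokens))
        return self.proj(hidden)  # next-word logits at every position

model = RNNLanguageModel()
tokens = torch.randint(0, 1000, (2, 5))    # two toy 5-word histories
logits = model(tokens)                     # shape (2, 5, 1000)
next_word = logits[:, -1].softmax(dim=-1)  # P(word_6 | word_1..word_5)
```

Greedy or beam-search generation repeatedly feeds the best (or k best) predicted words back in as the next input.
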
SLIDE 8

ENCODER-DECODER FRAMEWORK

[Diagram: the encoder consumes the source tokens x3, x2, x1; the decoder generates y3, y2, y1]

  • Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014
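
As a rough illustration of the framework (a sketch in PyTorch with toy sizes, not the Sutskever et al. setup, which used deep LSTMs and much larger vocabularies): the encoder compresses the whole source sentence into its final hidden state, and the decoder is conditioned only on that fixed-size vector.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # encode the source into the final (h, c) state: a fixed-size vector
        _, state = self.encoder(self.src_emb(src))
        # decode conditioned only on that state (no attention yet)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)  # next-token logits at each target position

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))     # source token ids (batch of 2)
tgt_in = torch.randint(0, 1000, (2, 5))  # shifted target ids (teacher forcing)
print(model(src, tgt_in).shape)          # torch.Size([2, 5, 1000])
```
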

SLIDE 9

ENCODER-DECODER FRAMEWORK

[Diagram: the same encoder-decoder as on slide 8]

  • Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014

  • Reversed input sentence!
SLIDE 10

ENCODER-DECODER WITH ATTENTION

[Diagram: encoder over x3, x2, x1 and decoder producing y3, y2, y1]

  • Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio - ICLR 2015

SLIDE 11

ENCODER-DECODER WITH ATTENTION

[Diagram: encoder, decoder, and an attention mechanism connecting the encoder states to each decoder step]

  • Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio - ICLR 2015
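
A minimal sketch of the additive scoring used in Bahdanau-style attention (PyTorch; the class and parameter names are ours): at each decoder step, every encoder state is scored against the current decoder state, the scores are softmax-normalized into alignment weights, and the weighted sum of encoder states becomes the context vector.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                    # (batch, src_len)
        weights = scores.softmax(dim=-1)  # soft alignment over source positions
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights           # context feeds the next decoder step

attn = AdditiveAttention(enc_dim=128, dec_dim=128, attn_dim=64)
context, weights = attn(torch.randn(2, 128), torch.randn(2, 7, 128))
print(context.shape, weights.shape)  # torch.Size([2, 128]) torch.Size([2, 7])
```

The weights are exactly what the alignment visualizations on the next slide plot.
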

SLIDE 14

ALIGNMENT (MORE)

[Attention alignments from Bahdanau et al.: "The agreement on the European Economic Area was signed in August 1992 . <end>" aligned with "L' accord sur la zone économique européenne a été signé en août 1992 . <end>", and "It should be noted that the marine environment is the least known of environments . <end>" aligned with "Il convient de noter que l' environnement marin est le moins connu de l' environnement . <end>"]

SLIDE 15

NEURAL MACHINE TRANSLATION, NMT

  • End-to-end training
  • Distributed representations
  • Better exploitation of context

What’s not on that list?

SLIDE 16

WHAT’S BEEN HOLDING NMT BACK?

  • Limited vocabulary
    • Copying
    • Dictionary lookup
  • Data requirements
  • Computation
    • Training time
    • Inference time
    • Memory usage
SLIDE 17

RARE WORDS 1: SUBWORD UNITS

  • Neural machine translation of rare words with subword units

Rico Sennrich, Barry Haddow, and Alexandra Birch, ACL 2016

  • A character-level decoder without explicit segmentation for neural machine translation

Junyoung Chung, Kyunghyun Cho, and Yoshua Bengio, ACL 2016

Byte-pair encoding (BPE), repeatedly replacing the most frequent symbol pair:

aaabdaaabac
ZabdZabac    Z=aa
ZYdZYac      Y=ab, Z=aa
XdXac        X=ZY, Y=ab, Z=aa
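
The merge loop can be sketched in a few lines over the single toy string from the slide (our own simplification; Sennrich et al. learn merges over word frequencies in a corpus):

```python
from collections import Counter

def bpe_compress(text, num_merges=3):
    """Repeatedly replace the most frequent adjacent symbol pair with a new symbol."""
    symbols = list(text)
    fresh = iter("ZYXWV")                    # names for the merged symbols
    merges = {}
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # most frequent pair
        new = next(fresh)
        merges[new] = a + b
        out, i = [], 0
        while i < len(symbols):              # left-to-right replacement
            if symbols[i:i + 2] == [a, b]:
                out.append(new)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return "".join(symbols), merges

print(bpe_compress("aaabdaaabac")[0])  # XdXac (ties in pair counts may be
                                       # broken differently than on the slide)
```
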

SLIDE 18

RARE WORDS 2: HYBRID CHAR/WORD NMT

  • Achieving open vocabulary neural machine translation with hybrid word-character models

Thang Luong and Chris Manning, ACL 2016

  • Hybrid architecture:
    • Word-based for most words
    • Character-based for rare words
  • 2 BLEU points improvement over a copy mechanism
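
A toy sketch of the hybrid idea (our own simplification in PyTorch, not Luong and Manning's architecture): frequent words get an ordinary embedding-table lookup, while rare or unknown words are composed from their characters by a character-level LSTM.

```python
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    def __init__(self, word_vocab, char_vocab_size=128, dim=64):
        super().__init__()
        self.word_vocab = word_vocab  # word -> id, frequent words only
        self.word_emb = nn.Embedding(len(word_vocab), dim)
        self.char_emb = nn.Embedding(char_vocab_size, dim)
        self.char_lstm = nn.LSTM(dim, dim, batch_first=True)

    def embed_word(self, word):
        if word in self.word_vocab:   # frequent word: table lookup
            idx = torch.tensor([self.word_vocab[word]])
            return self.word_emb(idx)
        chars = torch.tensor([[ord(c) % 128 for c in word]])
        _, (h, _) = self.char_lstm(self.char_emb(chars))
        return h[-1]                  # rare word: composed from characters

vocab = {"the": 0, "agreement": 1}
emb = HybridEmbedding(vocab)
print(emb.embed_word("the").shape)             # lookup path: torch.Size([1, 64])
print(emb.embed_word("Themistocleous").shape)  # char path:   torch.Size([1, 64])
```
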
SLIDE 19

[Diagram: the hybrid model; a word-level path (4 layers) with character-level components for rare words. End-to-end training; 8 stacked LSTM layers.]

SLIDE 20

Effects of Vocabulary Sizes

[Plot: BLEU (2-20) vs. vocabulary size (1K, 10K, 20K, 50K) for word-based, word + copy mechanism, and hybrid models, with annotated hybrid gains of +11.4, +4.5, +3.5, and +2.1 BLEU]

More than +2.0 BLEU over the copy mechanism!

SLIDE 21

Rare Word Embeddings

SLIDE 22

TRAINING WITH MONOLINGUAL DATA

  • Improving neural machine translation models with monolingual data

Rico Sennrich, Barry Haddow, Alexandra Birch, ACL 2016.

  • Backtranslate monolingual data (with an NMT model)
  • Use the backtranslated data as parallel training data (see the sketch below)
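
In outline (a sketch; translate_de_to_en stands in for a trained reverse-direction model, and the En-De language pair is just an example):

```python
def backtranslate(target_monolingual_de, translate_de_to_en):
    """Turn target-side monolingual text into synthetic parallel data."""
    synthetic_pairs = []
    for de_sentence in target_monolingual_de:
        en_synthetic = translate_de_to_en(de_sentence)  # reverse-direction NMT
        # pair = (synthetic source, real target); the target side stays human-written
        synthetic_pairs.append((en_synthetic, de_sentence))
    return synthetic_pairs

# Training data = real parallel corpus + synthetic pairs, then retrain En->De:
# train_data = real_pairs + backtranslate(mono_de, reverse_model.translate)
```

Because the human-written text ends up on the target side, the decoder learns from fluent output even though the synthetic source is noisy.
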