How is Deep Learning making MT and other areas converge?
MARTA R. COSTA-JUSSÀ UNIVERSITAT POLITÈCNICA DE CATALUNYA, BARCELONA
About me
[Career timeline, 2004-2015: LIMSI-CNRS (Paris), UPC (Barcelona), USP (São Paulo), I2R (Singapore), BM (Barcelona), IPN (Mexico), UPC (Barcelona); topics include ASR, SMT, HMT, CLIR, SMT+NN]
Outline:
- Machine Translation and Deep Learning
- Neural Machine Translation
- Neural MT architecture applied to other areas
- Neural MT inspired by other areas
- Discussion
Machine translation paradigms (SOURCE LANGUAGE → MODEL → TARGET LANGUAGE):
- Rule-based (rules and dictionaries), from the 1950s till now: Eurotra, Apertium… (Forcada, 2005)
- Statistical (co-occurrence frequency counts), from the 1990s till now: TC-Star, Moses… (Koehn, 2010)
- Neural (neural networks), starting in 2014: NEMATUS… (Cho, 2014)
Neural networks, a branch of machine learning, are a biologically-inspired programming paradigm which enables a computer to learn from observational data (http://neuralnetworksanddeeplearning.com/)
Deep learning:
"A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations" (Wikipedia).
"A set of machine learning algorithms which attempt to learn multiple-layered models of inputs, commonly neural networks" (Du et al., 2013).
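To make "multiple non-linear transformations" concrete, here is a minimal NumPy sketch of a two-layer forward pass; the layer sizes and the tanh non-linearity are illustrative assumptions, not taken from the talk.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                    # an input vector (illustrative size)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

h = np.tanh(W1 @ x + b1)                  # first non-linear transformation
y = np.tanh(W2 @ h + b2)                  # second one; stacking several is what makes a model "deep"
print(y)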
PHRASE-BASED
Pipeline: Source Language Text → Preprocessing → Decoding → Postprocessing → Target Language Text.
Translation model: finding the right target words given the source words; trained from a parallel corpus via word alignment and phrase extraction.
Language model: ensuring that translated words come in the right order; trained from a monolingual corpus.
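The two models above combine in the classic noisy-channel formulation (cf. Koehn, 2010), stated here for reference: decoding searches for

\hat{e} = \arg\max_{e} p(e \mid f) = \arg\max_{e} \underbrace{p(f \mid e)}_{\text{translation model}} \, \underbrace{p(e)}_{\text{language model}}

where f is the source sentence and e a candidate translation.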
NEURAL
Language modeling: find a function that takes n-1 words as input and returns a conditional probability of the next one. Recurrent neural networks remove this fixed context window: through recurrence, they can model dependencies beyond it.
[Unrolled RNN language model: reading EOS I’m fine . , the network outputs p(I’m), p(fine|I’m), p(.|fine) step by step]
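A minimal PyTorch sketch of such a recurrent language model (vocabulary size, dimensions, and the toy token ids are illustrative assumptions):

import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))   # recurrence carries context beyond any fixed window
        return self.out(h)                    # logits for the next token at every step

# toy usage: ids standing in for "EOS I'm fine ." (hypothetical vocabulary)
tokens = torch.tensor([[0, 1, 2, 3]])
logits = RNNLM(vocab_size=10)(tokens)
probs = logits.softmax(-1)                    # probs[0, t] is p(next word | words up to t)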
[Encoder-decoder translation: the encoder reads "how are you ?"; the decoder starts from eos and generates "Cómo estás EOS", feeding each output word back as the next input]
[Adding attention: at each decoding step, a weighted combination (+) of the encoder states is fed to the decoder]
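A compact PyTorch sketch of one decoding step with attention (sizes, token ids, and the dot-product scoring are illustrative choices; the original attention of Bahdanau et al. uses an additive score):

import torch
import torch.nn as nn

emb, hid, src_vocab, tgt_vocab = 32, 64, 100, 100    # illustrative sizes

enc_embed = nn.Embedding(src_vocab, emb)
encoder = nn.GRU(emb, hid, batch_first=True)
dec_embed = nn.Embedding(tgt_vocab, emb)
decoder = nn.GRU(emb + hid, hid, batch_first=True)
out = nn.Linear(hid, tgt_vocab)

src = torch.tensor([[5, 6, 7, 8]])                   # "how are you ?" as ids (hypothetical)
enc_states, h = encoder(enc_embed(src))              # encoder reads the whole source

prev = torch.tensor([[1]])                           # start/eos token id (hypothetical)
scores = (enc_states @ h[-1].unsqueeze(-1)).squeeze(-1)   # score every source position
weights = scores.softmax(-1)                         # soft alignment over the source
context = (weights.unsqueeze(-1) * enc_states).sum(1, keepdim=True)   # the "+" combination
dec_in = torch.cat([dec_embed(prev), context], dim=-1)
dec_out, h = decoder(dec_in, h)
next_word = out(dec_out).argmax(-1)                  # greedy choice of the first target word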
Neural MT architecture applied to other areas: NATURAL LANGUAGE PROCESSING, SPEECH, IMAGE
Natural Language Processing
Pre-neural chatbot approaches:
RULE-BASED: ELIZA, based on decomposition rules and keyword matching (Weizenbaum, 1966); AIML (Artificial Intelligence Markup Language) (Wallace, 1995).
RETRIEVAL-BASED: IRIS, a chat-oriented dialogue system based on the vector space model (Banchs and Li, 2012).
Example AIML category:
<category>
  <pattern>WHAT IS YOUR NAME</pattern>
  <template>My name is Michael N.S Evanious.</template>
</category>
(https://hackernoon.com/chatbot-architecture-496f5bf820ed)
Reusing the encoder-decoder architecture as a conversational model:
Human: what is your job ?
Enc-dec: i’m a lawyer .
Human: what do you do ?
Enc-dec: i’m a doctor .
This model can be trained end-to-end, but it cannot solve the problem of modeling the dialogue (note the inconsistent answers above).
✓ The model can generate simple and basic conversations.
✗ The model cannot currently deliver realistic conversations.
Human evaluation (question counts):
Enc-dec preferred over Cleverbot: 97
Cleverbot preferred over Enc-dec: 60
Tie: 20
Disagreement among judges (no consensus of 3 out of 4 judges): 23
Human: what is the purpose of being intelligent ?
CleverBot: The purpose of life is to love God and love people.
Enc-dec: to find out what it is *

Human: What do you think about england during the reign of elizabeth ?
CleverBot: I like him. Do you?
Enc-dec: it was a great place . *

Human: What is the color of a yellow car ?
CleverBot: Brown.
Enc-dec: yellow .

Human: what is your job ?
CleverBot: I work in mental health
Enc-dec: i ’m a lawyer .

Human: what do you do ?
CleverBot: I have no job .
Enc-dec: i ’m a doctor .
Text summarization: the process of shortening a text document with software to create a summary with the major points of the original document.
Question answering: automatically producing an answer to a question given a corresponding document.
Semantic parsing: mapping natural language into a logical form that can be executed on a knowledge base and return an answer.
Syntactic parsing: the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar.
Speech Recognition
[Classic ASR pipeline: microphone → FEATURES → RECOGNIZER → DECISION → recognized sentence. The recognizer draws on a lexicon, acoustic models, and language models (task info); it receives feature vectors and produces N-best hypotheses.]
Notation: x = x1 … x|x| denotes the sequence of acoustic feature vectors, w = w1 … w|w| the word sequence.
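The diagram implements the standard ASR decision rule, stated here for reference:

\hat{w} = \arg\max_{w} p(w \mid x) = \arg\max_{w} \underbrace{p(x \mid w)}_{\text{acoustic model}} \, \underbrace{p(w)}_{\text{language model}}

i.e., find the word sequence w that is most probable given the acoustic features x.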
Hybrid ASR components:
Language model: n-gram (+ RNN)
Acoustic model: HMM, with RNN/CNN acoustic modeling
Plus a phonetic inventory and a pronunciation lexicon
[End-to-end neural ASR: a single attention-based (+) encoder-decoder subsumes the separate acoustic and language models]
Challenge: speech signals can be hundreds to thousands of frames long. Solution: a pyramidal BLSTM encoder that halves the time resolution at each layer (sketched below).
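A minimal PyTorch sketch of that pyramidal reduction (as in Listen, Attend and Spell): each layer concatenates adjacent frames before the BLSTM, halving the number of time steps. All dimensions here are illustrative.

import torch
import torch.nn as nn

class PyramidBLSTM(nn.Module):
    """One pyramid layer: halve the time resolution, then run a BLSTM."""
    def __init__(self, in_dim, hid):
        super().__init__()
        self.blstm = nn.LSTM(2 * in_dim, hid, batch_first=True, bidirectional=True)

    def forward(self, x):                      # x: (batch, T, in_dim)
        b, t, d = x.shape
        x = x[:, : t - t % 2, :]               # drop a trailing odd frame if needed
        x = x.reshape(b, t // 2, 2 * d)        # concatenate adjacent frames
        out, _ = self.blstm(x)                 # out: (batch, T/2, 2*hid)
        return out

frames = torch.randn(1, 1000, 40)              # ~1000 frames of 40-dim filterbanks (dummy)
layer1 = PyramidBLSTM(40, 128)
layer2 = PyramidBLSTM(256, 128)
listened = layer2(layer1(frames))              # 1000 -> 500 -> 250 time steps
print(listened.shape)                          # torch.Size([1, 250, 256])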
Model | WER (%)
CLDNN-HMM* | 8.0
LAS + LM rescoring | 10.3
*Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network
End-to-end Speech-to-text Translation
Multi-task learning aims at improving the generalization performance of one task by using other, related tasks; it comes in one-to-many and many-to-one configurations. What is new here compared to previous work is the multi-task training.
One encoder, multiple decoders (one-to-many): Spanish speech is encoded once, and separate decoders perform speech recognition (Spanish text) and speech translation (English text).
Multiple encoders, one decoder (many-to-one): speech translation and text translation share a single English-text decoder.
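A schematic PyTorch sketch of the one-to-many setting: one shared speech encoder feeds two decoders, and both losses update it. The toy decoder ignores attention, and all sizes, vocabularies, and tensors are illustrative assumptions.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Toy attention-free decoder head; real systems use attention decoders."""
    def __init__(self, hid, vocab):
        super().__init__()
        self.rnn = nn.GRU(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)
    def forward(self, enc_states, tgt_len):
        # feed the mean encoder state at every step (a simplification)
        ctx = enc_states.mean(1, keepdim=True).expand(-1, tgt_len, -1)
        h, _ = self.rnn(ctx)
        return self.out(h)

encoder = nn.GRU(40, 128, batch_first=True)           # shared Spanish-speech encoder
asr_head = Decoder(128, 500)                          # decodes Spanish transcripts
st_head = Decoder(128, 500)                           # decodes English translations

speech = torch.randn(8, 200, 40)                      # dummy filterbank batch
es_text = torch.randint(0, 500, (8, 20))              # dummy Spanish token ids
en_text = torch.randint(0, 500, (8, 22))              # dummy English token ids

enc_states, _ = encoder(speech)
loss = nn.functional.cross_entropy(asr_head(enc_states, 20).transpose(1, 2), es_text) \
     + nn.functional.cross_entropy(st_head(enc_states, 22).transpose(1, 2), en_text)
loss.backward()                                       # both tasks update the shared encoder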
Model | Test 1 | Test 2
End-to-End ST | 47.3 | 16.6
Multi-task | 48.7 | 17.4
ASR / NMT concatenation | 45.4 | 16.6
Image
[Image captioning with the same encoder-decoder (+) architecture: a CNN encoder extracts image features, and the decoder generates a caption such as "A cat on the mat"]
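A sketch of this captioning pipeline in PyTorch. The ResNet backbone, the uniform averaging used in place of learned attention, and the vocabulary size are illustrative assumptions.

import torch
import torch.nn as nn
import torchvision.models as models

cnn = models.resnet18(weights=None)                   # CNN encoder, untrained here (torchvision >= 0.13 API)
cnn = nn.Sequential(*list(cnn.children())[:-2])       # drop pooling/classifier, keep the spatial feature map
decoder = nn.GRU(512, 512, batch_first=True)
to_vocab = nn.Linear(512, 1000)                       # hypothetical caption vocabulary

image = torch.randn(1, 3, 224, 224)                   # dummy image
feats = cnn(image)                                    # (1, 512, 7, 7) spatial grid
regions = feats.flatten(2).transpose(1, 2)            # 49 region vectors a decoder could attend to
ctx = regions.mean(1, keepdim=True)                   # simplest stand-in for attention: uniform average
h, _ = decoder(ctx)                                   # first decoding step
first_word = to_vocab(h).argmax(-1)                   # e.g. the "A" of "A cat on the mat"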
Method | BLEU
Log-Bilinear (Kiros et al., 2014a) | 24.3
Enc-Dec (Vinyals et al., 2014a) | 24.6
+Attention (Xu et al., 2015) | 25.0
Visual question answering: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Video caption generation: generating a complete and natural sentence that goes beyond the single label of video classification and captures the most informative dynamics in the video.
Method | DE->EN | EN->DE
Phrase | 20.99 | 17.04
NMT | 20.64 | 17.15
+Char | 22.10 | 20.22
Method | DE->EN
Baseline (Shen et al., 2016) | 25.84
+Adversarial | 27.94
Source: wir mussen verhindern , dass die menschen kenntnis erlangen von dingen , vor allem dann , wenn sie wahr sind .
Baseline: we need to prevent people who are able to know that people have to do, especially if they are true .
+Adversarial: we need to prevent people who are able to know about things, especially if they are true .
REF: we have to prevent people from finding about things , especially when they are true .
[Recurrent (LSTM) vs. convolutional (CNN) architectures]
Varieties of attention (a small sketch follows):
- Soft vs. hard: soft attention weights all pixels; hard attention crops the image and forces attention only on the kept part.
- Global vs. local: a global approach always attends to all source words; a local one only looks at a subset of source words at a time.
- Intra vs. external: intra attention operates within the encoder's input sentence; external attention operates across sentences.
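A small NumPy sketch contrasting global soft attention with a local window; the scores and window size are made up for illustration.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([0.1, 2.0, 0.3, 1.2, 0.0, 0.5])   # alignment scores for 6 source words

global_w = softmax(scores)                 # global soft attention: every position gets weight

center, width = 1, 1                       # local attention: a window around an aligned position
mask = np.full_like(scores, -np.inf)
mask[max(0, center - width): center + width + 1] = 0.0
local_w = softmax(scores + mask)           # positions outside the window get zero weight

# hard attention would instead commit to a single position, e.g. scores.argmax()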
how to build a speech-to-text recognition system...
*And other references on this research direction…
MARTA.RUIZ@UPC.EDU WWW.COSTA-JUSSA.COM
Acknowledgements:
for their valuable feedback on the slides.
Organizers for inviting me to this exciting event.