SLIDE 1

How Deep Learning is making MT and other areas converge?

MARTA R. COSTA-JUSSÀ UNIVERSITAT POLITÈCNICA DE CATALUNYA, BARCELONA

SLIDE 2

About me

  • LIMSI-CNRS, Paris: ASR, SMT+NN
  • UPC, Barcelona: SMT, S2S Translation
  • USP, São Paulo: SMT, CLIR
  • I2R, Singapore: HMT
  • BM, Barcelona: CLIR, OM
  • IPN, Mexico: HMT
  • UPC, Barcelona: NMT, NLI, SLT

Timeline: 2004, 2008, 2012, 2014, 2015

SLIDE 3

Outline

Machine Translation and Deep Learning
Neural Machine Translation
Neural MT architecture applied to other areas

  • NLP (Chatbot)
  • Speech (End-to-End speech recognition, End-to-End speech translation)
  • Image (Image captioning)

Neural MT inspired by other areas

  • Image/NLP (Character-aware modelling)
  • Machine Learning (Adversarial networks)

Discussion


SLIDE 4

Machine Translation

A MODEL maps SOURCE LANGUAGE text to TARGET LANGUAGE text. Three families of models:

  • Rules and dictionaries: from the 1950s till now, e.g. Eurotra, Apertium… (Forcada, 2005)
  • Co-occurrences and frequency counts: from the 1990s till now, e.g. TC-Star, Moses… (Koehn, 2010)
  • Neural networks: starting in 2014, e.g. NEMATUS… (Cho, 2014)

SLIDE 5

Neural nets are…

Neural networks, a branch of machine learning, are a biologically-inspired programming paradigm which enables a computer to learn from observational data (http://neuralnetworksanddeeplearning.com/)


SLIDE 6

Deep learning is…

  • A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations (Wikipedia)
  • A set of machine learning algorithms which attempt to learn multiple-layered models of inputs, commonly neural networks (Du et al., 2013)


SLIDE 7

Neural Machine Translation


SLIDE 8

Motivation: End-to-end system

PHRASE-BASED

Pipeline: source-language text → preprocessing → decoding → postprocessing → target-language text.

The translation model finds the right target words given the source words; the language model ensures that the translated words come in the right order.

Training: word alignment and phrase extraction over a parallel corpus produce the translation model, and a monolingual corpus produces the language model; at test time only the pipeline above runs.
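This split into a translation model and a language model is the classic noisy-channel decomposition (standard SMT background, only implicit on the slide): for a source sentence f, the best target sentence maximizes

    \hat{e} = \arg\max_e p(e \mid f)
            = \arg\max_e \underbrace{p(f \mid e)}_{\text{translation model}} \; \underbrace{p(e)}_{\text{language model}}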

NEURAL

After the same preprocessing, a single encoder-decoder network maps the source text directly to the target text and is trained end-to-end.

SLIDE 9

Related work: language modeling

Find a function that takes as input the n-1 previous words and returns a conditional probability for the next one. Recurrent neural networks have made it possible to capture dependencies beyond a fixed context window (via recursion).

Figure: an RNN unrolled over the sequence "EOS I’m fine .", outputting p(I’m), p(fine|I’m), p(.|fine) step by step.
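A minimal sketch of such a recurrent language model in PyTorch (all names and dimensions are illustrative placeholders, not from the talk):

    import torch
    import torch.nn as nn

    class RNNLanguageModel(nn.Module):
        """Predicts p(next word | history), one token at a time."""
        def __init__(self, vocab_size=10000, emb_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens, hidden=None):
            # tokens: (batch, seq_len) word ids, e.g. [EOS, I'm, fine]
            emb = self.embed(tokens)
            states, hidden = self.rnn(emb, hidden)  # the recursion carries the whole history
            logits = self.out(states)               # (batch, seq_len, vocab_size)
            return logits, hidden

    # softmax(logits[:, 0]) gives p(I'm | EOS), softmax(logits[:, 1]) gives p(fine | EOS I'm), ...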


SLIDE 10

Architecture: encoder-decoder

Figure: the encoder reads "how are you ? EOS"; the decoder then generates "Cómo estás ?", each step conditioned on the previously generated token, starting from an EOS symbol.
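A minimal sketch of the encode-then-decode loop with greedy search (illustrative code in the same PyTorch style as above; a real system would also stop at EOS and use beam search):

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
            super().__init__()
            self.src_embed = nn.Embedding(src_vocab, emb_dim)
            self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
            self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, tgt_vocab)

        def translate(self, src_ids, eos_id, max_len=50):
            # Encode "how are you ?" into a summary state for the decoder.
            _, state = self.encoder(self.src_embed(src_ids))
            token = torch.full((src_ids.size(0), 1), eos_id, dtype=torch.long)
            outputs = []
            for _ in range(max_len):
                # Each step is conditioned on the previously generated token.
                dec_out, state = self.decoder(self.tgt_embed(token), state)
                token = self.out(dec_out).argmax(dim=-1)  # greedy pick, e.g. "Cómo"
                outputs.append(token)
            return torch.cat(outputs, dim=1)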

SLIDE 11

Attention-based mechanism

Figure: at each decoding step the decoder combines (the "+" node) a weighted sum of all encoder states with its own state to predict the next target word.
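Concretely, in the additive formulation of Bahdanau et al. (2015), each decoder state s_{t-1} scores every encoder state h_j, and the context vector fed to the decoder is their weighted average:

    e_{tj} = v^\top \tanh(W s_{t-1} + U h_j), \qquad
    \alpha_{tj} = \frac{\exp(e_{tj})}{\sum_k \exp(e_{tk})}, \qquad
    c_t = \sum_j \alpha_{tj} h_j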

SLIDE 12

Neural MT architecture applied to other areas

NATURAL LANGUAGE PROCESSING SPEECH IMAGE

SLIDE 13

Natural Language Processing


SLIDE 14

Chatbot: a computer program that conducts a conversation


SLIDE 15

Chatbots models

RULE-BASED

  • ELIZA: decomposition rules and keyword matching (Weizenbaum, 1966)
  • AIML (Artificial Intelligence Markup Language) (Wallace, 1995), e.g.:

    <category>
      <pattern>WHAT IS YOUR NAME</pattern>
      <template>My name is Michael N.S Evanious.</template>
    </category>

RETRIEVAL-BASED

  • IRIS: a chat-oriented dialogue system based on the vector space model framework (Banchs and Li, 2012)

https://hackernoon.com/chatbot-architecture-496f5bf820ed

SLIDE 16

Generative-based chatbots

Figure: an encoder-decoder where the encoder reads the human’s utterance and the decoder generates the reply.

Human: what is your job ?
Enc-dec: i’m a lawyer .
Human: what do you do ?
Enc-dec: i’m a doctor .

This model can be trained end-to-end, but it cannot solve the problem of modeling the dialogue: note the inconsistent answers above.

SLIDE 17

Chatbot evaluation

✓ The model can generate simple and basic conversations.
✗ The model cannot currently deliver realistic conversations.

Human evaluation against CleverBot (question counts):

  • Enc-dec preferred over CleverBot: 97
  • CleverBot preferred over Enc-dec: 60
  • Tie: 20
  • Disagreement among judges (no consensus of 3 out of 4 judges): 23

SLIDE 18

Chatbot examples

Human: what is the purpose of being intelligent ?
CleverBot: The purpose of life is to love God and love people.
Enc-dec: to find out what it is

Human: what do you think about england during the reign of elizabeth ?
CleverBot: I like him. Do you?
Enc-dec: it was a great place .

Human: what is the color of a yellow car ?
CleverBot: Brown.
Enc-dec: yellow .

Human: what is your job ?
CleverBot: I work in mental health .
Enc-dec: i ’m a lawyer .

Human: what do you do ?
CleverBot: I have no job .
Enc-dec: i ’m a doctor .

SLIDE 19

Other NLP Tasks with Seq2seq

  • Text summarization: shortening a text document with software to create a summary with the major points of the original document.
  • Question Answering: automatically producing an answer to a question given a corresponding document.
  • Semantic Parsing: mapping natural language into a logical form that can be executed on a knowledge base and returns an answer.
  • Syntactic Parsing: analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar.


SLIDE 20

Speech Recognition


SLIDE 21

Speech Recognition system

Figure: microphone input is turned into feature vectors x = x1 … x|x| by the FEATURES block; the RECOGNIZER (using a lexicon, acoustic models, language models and task information) produces N-best hypotheses; a DECISION step then outputs the recognized sentence w = w1 … w|w|.
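These blocks implement the standard Bayes decision rule of speech recognition (implicit in the figure): among all word sequences w, pick the one that best explains the features x,

    \hat{w} = \arg\max_w p(w \mid x)
            = \arg\max_w \underbrace{p(x \mid w)}_{\text{acoustic model}} \; \underbrace{p(w)}_{\text{language model}}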


SLIDE 22

RNN/CNN-HMM+RNNLM

Components of the hybrid pipeline:

  • Language model: (n-gram +) RNN
  • Acoustic model: HMM with RNN/CNN
  • Phonetic inventory and pronunciation lexicon


SLIDE 23

Speech recognition with encoder-decoder with attention

A single attention-based encoder-decoder replaces the separate acoustic model and language model.

SLIDE 24

Listener

Challenge: speech signals can be hundreds to thousands of frames long. Solution: a pyramidal BLSTM encoder that halves the time resolution at each layer.
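A minimal sketch of one pyramid step in the style of Listen, Attend and Spell: pairs of consecutive frames are concatenated before each BLSTM layer, halving the sequence length (dimensions are placeholders):

    import torch
    import torch.nn as nn

    class PyramidBLSTMLayer(nn.Module):
        """One pyramid step: halve the time axis, then run a BLSTM."""
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            # Each input step is a pair of concatenated frames, hence 2 * input_dim.
            self.blstm = nn.LSTM(2 * input_dim, hidden_dim,
                                 batch_first=True, bidirectional=True)

        def forward(self, x):
            batch, time, dim = x.shape
            if time % 2:                  # drop a trailing odd frame
                x = x[:, :-1, :]
                time -= 1
            x = x.reshape(batch, time // 2, 2 * dim)  # (batch, time/2, 2*dim)
            output, _ = self.blstm(x)
            return output                 # (batch, time/2, 2*hidden_dim)

    # Three stacked layers reduce e.g. 1000 frames to 125 attendable time steps.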


SLIDE 25

Attend & Spell


SLIDE 26

End-to-end Speech-to-text

  Model                WER
  CLDNN-HMM*           8.0
  LAS + LM Rescoring   10.3

  *Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network

SLIDE 27

End-to-end Speech-to-text Translation

Multi-task learning aims at improving the generalization performance of a task by using other related tasks. Two configurations: one-to-many (one encoder, multiple decoders; sketched below) and many-to-one (multiple encoders, one decoder).

What is new here compared to previous work? Multi-task training.

Figure: a Spanish-speech encoder is shared between a speech recognition decoder and a speech translation decoder (one encoder, multiple decoders); alternatively, speech translation and text translation encoders share one English-text decoder (multiple encoders, one decoder).
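A sketch of how the one-to-many setup can be trained, alternating tasks so the shared encoder learns from both objectives (the module interfaces here are hypothetical, not the authors' code):

    # encoder maps speech features to hidden states; each decoder
    # returns a cross-entropy loss for its own target sequence.
    def multitask_step(encoder, asr_decoder, st_decoder,
                       asr_batch, st_batch, optimizer):
        for decoder, (speech, target) in ((asr_decoder, asr_batch),
                                          (st_decoder, st_batch)):
            optimizer.zero_grad()
            states = encoder(speech)        # shared Spanish-speech encoder
            loss = decoder(states, target)  # task-specific decoder
            loss.backward()
            optimizer.step()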


SLIDE 28

Spanish->English FISHER/CALLHOME BLEU results

  Model                     Test 1   Test 2
  End-to-End ST             47.3     16.6
  Multi-task                48.7     17.4
  ASR / NMT concatenation   45.4     16.6

SLIDE 29

Example of attention probabilities


SLIDE 30

Image


SLIDE 31

Image Captioning

A cat on the mat


SLIDE 32

Encoder-decoder with attention

Figure: the same attention-based encoder-decoder, now with the image as encoder input; at each step the decoder attends over encoder features of the image to generate the caption word by word.

SLIDE 33

Captioning: Show, Attend & Tell


SLIDE 34

Results on the MS COCO database

  Method                               BLEU
  Log-Bilinear (Kiros et al., 2014a)   24.3
  Enc-Dec (Vinyals et al., 2014a)      24.6
  +Attention (Xu et al., 2015)         25.0

SLIDE 35

Other Computer Vision Tasks with Attention

  • Visual Question Answering: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
  • Video Caption Generation: generating a complete and natural sentence, enriching the single label used in video classification, to capture the most informative dynamics in videos.


SLIDE 36

Neural MT architecture inspired by other areas

SLIDE 37

Convolutional Neural Networks for character-aware Neural MT

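The character-aware idea borrows the convolutional feature extractor from vision and text classification: each word embedding is built from its characters with convolutions of several widths, max-pooled over time (in the spirit of Costa-jussà and Fonollosa, 2016; the dimensions below are placeholders):

    import torch
    import torch.nn as nn

    class CharCNNWordEmbedding(nn.Module):
        """Builds a word vector from its characters: CNN + max-over-time pooling."""
        def __init__(self, n_chars=100, char_dim=15, widths=(3, 4, 5), filters=100):
            super().__init__()
            self.char_embed = nn.Embedding(n_chars, char_dim)
            self.convs = nn.ModuleList(
                nn.Conv1d(char_dim, filters, kernel_size=w) for w in widths)

        def forward(self, char_ids):
            # char_ids: (n_words, max_word_len) character indices of each word
            x = self.char_embed(char_ids).transpose(1, 2)  # (n_words, char_dim, len)
            pooled = [conv(x).max(dim=2).values for conv in self.convs]
            return torch.cat(pooled, dim=1)  # (n_words, filters * len(widths))

    # These vectors replace the source word-embedding lookup in the NMT encoder,
    # so rare and unseen word forms share subword information.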

SLIDE 38

German-English BLEU Results

  Method   DE->EN   EN->DE
  Phrase   20.99    17.04
  NMT      20.64    17.15
  +Char    22.10    20.22

SLIDE 39

Examples


SLIDE 40

Generative Adversarial Networks

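The idea is transplanted from image generation (Goodfellow et al., 2014): a discriminator D learns to tell real data from generated data while the generator G learns to fool it, the two playing the min-max game

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
                   + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

For MT, the discriminator is typically trained to distinguish human translations from machine ones, with the translator in the role of G.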

SLIDE 41

German-to-English BLEU Results

  Method                       DE->EN
  Baseline (Shen et al., 2016) 25.84
  +Adversarial                 27.94

SLIDE 42

German-to-English Example

Source: wir mussen verhindern , dass die menschen kenntnis erlangen von dingen , vor allem dann , wenn sie wahr sind .
Baseline: we need to prevent people who are able to know that people have to do , especially if they are true .
+Adversarial: we need to prevent people who are able to know about things , especially if they are true .
REF: we have to prevent people from finding about things , especially when they are true .

SLIDE 43

Discussion


SLIDE 44

Implementations of Encoder-Decoder

  • LSTM
  • CNN

SLIDE 45

Attention-based mechanisms

  • Soft vs Hard: soft attention weights all pixels; hard attention crops the image and forces attention only on the kept part.
  • Global vs Local: a global approach always attends to all source words; a local one only looks at a subset of source words at a time (see the sketch after this list).
  • Intra vs External: intra attention is within the encoder’s input sentence; external attention is across sentences.
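A sketch contrasting global and local soft attention (illustrative code; in practice the window center p_t is predicted at each decoding step):

    import torch

    def attention_weights(scores, center=None, window=None):
        """scores: (tgt_len, src_len) unnormalized alignment scores."""
        if center is not None and window is not None:
            # Local attention: mask out source positions outside [center +/- window].
            src_pos = torch.arange(scores.size(1))
            scores = scores.masked_fill((src_pos - center).abs() > window,
                                        float('-inf'))
        return torch.softmax(scores, dim=-1)  # global attention if no mask is given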


SLIDE 46

One large encoder-decoder

  • Text, speech, image… is it all converging to a signal paradigm?
  • If you know how to build a neural MT system, you may easily learn how to build a speech-to-text recognition system…
  • Or you may train them together to achieve zero-shot AI.

SLIDE 47

Thanks

MARTA.RUIZ@UPC.EDU WWW.COSTA-JUSSA.COM

Acknowledgements:

  • Noé Casas and Carlos Escolano for their valuable feedback on the slides.
  • MT-Marathon organizers for inviting me to this exciting event.