How is Deep Learning making MT and other areas converge?
MARTA R. COSTA-JUSSÀ UNIVERSITAT POLITÈCNICA DE CATALUNYA, BARCELONA
About me
[Career timeline, 2004-2015: LIMSI-CNRS (Paris), UPC (Barcelona), USP (São Paulo), I2R (Singapore), BM (Barcelona), IPN (Mexico), UPC (Barcelona); topics include ASR, SMT, HMT, CLIR, SMT+NN]
Outline:
- Machine Translation and Deep Learning
- Neural Machine Translation
- Neural MT architecture applied to other areas
- Neural MT inspired by other areas
- Discussion
Machine translation paradigms (SOURCE LANGUAGE → MODEL → TARGET LANGUAGE):
- Rule-based (rules and dictionaries), from the 1950s till now: Eurotra, Apertium… (Forcada, 2005)
- Statistical (co-occurrence frequency counts), from the 1990s till now: TC-Star, Moses… (Koehn, 2010)
- Neural (neural networks), starting in 2014: NEMATUS… (Cho, 2014)
Neural networks, a branch of machine learning, are a biologically-inspired programming paradigm which enables a computer to learn from observational data (http://neuralnetworksanddeeplearning.com/)
Deep learning:
"A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations" (Wikipedia).
"A set of machine learning algorithms which attempt to learn multiple-layered models of inputs, commonly neural networks" (Du et al., 2013).
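To make "multiple non-linear transformations" concrete, here is a minimal NumPy sketch of a two-layer forward pass; the layer sizes and the tanh non-linearity are illustrative assumptions, not taken from the talk.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                    # an input vector (illustrative size)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

h = np.tanh(W1 @ x + b1)                  # first non-linear transformation
y = np.tanh(W2 @ h + b2)                  # second one; stacking several is what makes a model "deep"
print(y)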
PHRASE-BASED
Pipeline: Source Language Text → Preprocessing → Decoding → Postprocessing → Target Language Text.
Translation model: finding the right target words given the source words; trained from a parallel corpus via word alignment and phrase extraction.
Language model: ensuring that translated words come in the right order; trained from a monolingual corpus.
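The two models above combine in the classic noisy-channel formulation (cf. Koehn, 2010), stated here for reference: decoding searches for

\hat{e} = \arg\max_{e} p(e \mid f) = \arg\max_{e} \underbrace{p(f \mid e)}_{\text{translation model}} \, \underbrace{p(e)}_{\text{language model}}

where f is the source sentence and e a candidate translation.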
NEURAL
Language modeling: find a function that takes n-1 words as input and returns a conditional probability of the next one. Recurrent neural networks remove this fixed context window: through recurrence, they can model dependencies beyond it.
[Unrolled RNN language model: reading EOS I’m fine . , the network outputs p(I’m), p(fine|I’m), p(.|fine) step by step]
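A minimal PyTorch sketch of such a recurrent language model (vocabulary size, dimensions, and the toy token ids are illustrative assumptions):

import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))   # recurrence carries context beyond any fixed window
        return self.out(h)                    # logits for the next token at every step

# toy usage: ids standing in for "EOS I'm fine ." (hypothetical vocabulary)
tokens = torch.tensor([[0, 1, 2, 3]])
logits = RNNLM(vocab_size=10)(tokens)
probs = logits.softmax(-1)                    # probs[0, t] is p(next word | words up to t)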
[Encoder-decoder translation: the encoder reads "how are you ?"; the decoder starts from eos and generates "Cómo estás EOS", feeding each output word back as the next input]
[Adding attention: at each decoding step, a weighted combination (+) of the encoder states is fed to the decoder]
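A compact PyTorch sketch of one decoding step with attention (sizes, token ids, and the dot-product scoring are illustrative choices; the original attention of Bahdanau et al. uses an additive score):

import torch
import torch.nn as nn

emb, hid, src_vocab, tgt_vocab = 32, 64, 100, 100    # illustrative sizes

enc_embed = nn.Embedding(src_vocab, emb)
encoder = nn.GRU(emb, hid, batch_first=True)
dec_embed = nn.Embedding(tgt_vocab, emb)
decoder = nn.GRU(emb + hid, hid, batch_first=True)
out = nn.Linear(hid, tgt_vocab)

src = torch.tensor([[5, 6, 7, 8]])                   # "how are you ?" as ids (hypothetical)
enc_states, h = encoder(enc_embed(src))              # encoder reads the whole source

prev = torch.tensor([[1]])                           # start/eos token id (hypothetical)
scores = (enc_states @ h[-1].unsqueeze(-1)).squeeze(-1)   # score every source position
weights = scores.softmax(-1)                         # soft alignment over the source
context = (weights.unsqueeze(-1) * enc_states).sum(1, keepdim=True)   # the "+" combination
dec_in = torch.cat([dec_embed(prev), context], dim=-1)
dec_out, h = decoder(dec_in, h)
next_word = out(dec_out).argmax(-1)                  # greedy choice of the first target word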
Neural MT architecture applied to other areas: NATURAL LANGUAGE PROCESSING, SPEECH, IMAGE
Natural Language Processing
Pre-neural chatbot approaches:
RULE-BASED: ELIZA, based on decomposition rules and keyword matching (Weizenbaum, 1966); AIML (Artificial Intelligence Markup Language) (Wallace, 1995).
RETRIEVAL-BASED: IRIS, a chat-oriented dialogue system based on the vector space model (Banchs and Li, 2012).
Example AIML category:
<category>
  <pattern>WHAT IS YOUR NAME</pattern>
  <template>My name is Michael N.S Evanious.</template>
</category>
(https://hackernoon.com/chatbot-architecture-496f5bf820ed)
Reusing the encoder-decoder architecture as a conversational model:
Human: what is your job ?
Enc-dec: i’m a lawyer .
Human: what do you do ?
Enc-dec: i’m a doctor .
This model can be trained end-to-end, but it cannot solve the problem of modeling the dialogue (note the inconsistent answers above).
✓ The model can generate simple and basic conversations.
✗ The model cannot currently deliver realistic conversations.
Human evaluation (question counts):
Enc-dec preferred over Cleverbot: 97
Cleverbot preferred over Enc-dec: 60
Tie: 20
Disagreement among judges (no consensus of 3 out of 4 judges): 23
Human: what is the purpose of being intelligent ?
CleverBot: The purpose of life is to love God and love people.
Enc-dec: to find out what it is *

Human: What do you think about england during the reign of elizabeth ?
CleverBot: I like him. Do you?
Enc-dec: it was a great place . *

Human: What is the color of a yellow car ?
CleverBot: Brown.
Enc-dec: yellow .

Human: what is your job ?
CleverBot: I work in mental health
Enc-dec: i ’m a lawyer .

Human: what do you do ?
CleverBot: I have no job .
Enc-dec: i ’m a doctor .
Text summarization: the process of shortening a text document with software to create a summary with the major points of the original document.
Question answering: automatically producing an answer to a question given a corresponding document.
Semantic parsing: mapping natural language into a logical form that can be executed on a knowledge base and return an answer.
Syntactic parsing: the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar.
Speech Recognition
[Classic ASR pipeline: microphone → FEATURES → RECOGNIZER → DECISION → recognized sentence. The recognizer draws on a lexicon, acoustic models, and language models (task info); it receives feature vectors and produces N-best hypotheses.]
Notation: x = x1 … x|x| denotes the sequence of acoustic feature vectors, w = w1 … w|w| the word sequence.
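The diagram implements the standard ASR decision rule, stated here for reference:

\hat{w} = \arg\max_{w} p(w \mid x) = \arg\max_{w} \underbrace{p(x \mid w)}_{\text{acoustic model}} \, \underbrace{p(w)}_{\text{language model}}

i.e., find the word sequence w that is most probable given the acoustic features x.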
Hybrid ASR components:
Language model: n-gram (+ RNN)
Acoustic model: HMM, with RNN/CNN acoustic modeling
Plus a phonetic inventory and a pronunciation lexicon
[End-to-end neural ASR: a single attention-based (+) encoder-decoder subsumes the separate acoustic and language models]
Challenge: speech signals can be hundreds to thousands of frames long. Solution: a pyramidal BLSTM encoder that halves the time resolution at each layer (sketched below).
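A minimal PyTorch sketch of that pyramidal reduction (as in Listen, Attend and Spell): each layer concatenates adjacent frames before the BLSTM, halving the number of time steps. All dimensions here are illustrative.

import torch
import torch.nn as nn

class PyramidBLSTM(nn.Module):
    """One pyramid layer: halve the time resolution, then run a BLSTM."""
    def __init__(self, in_dim, hid):
        super().__init__()
        self.blstm = nn.LSTM(2 * in_dim, hid, batch_first=True, bidirectional=True)

    def forward(self, x):                      # x: (batch, T, in_dim)
        b, t, d = x.shape
        x = x[:, : t - t % 2, :]               # drop a trailing odd frame if needed
        x = x.reshape(b, t // 2, 2 * d)        # concatenate adjacent frames
        out, _ = self.blstm(x)                 # out: (batch, T/2, 2*hid)
        return out

frames = torch.randn(1, 1000, 40)              # ~1000 frames of 40-dim filterbanks (dummy)
layer1 = PyramidBLSTM(40, 128)
layer2 = PyramidBLSTM(256, 128)
listened = layer2(layer1(frames))              # 1000 -> 500 -> 250 time steps
print(listened.shape)                          # torch.Size([1, 250, 256])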
Model | WER (%)
CLDNN-HMM* | 8.0
LAS + LM rescoring | 10.3
*Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network
End-to-end Speech-to-text Translation
Multi-task learning aims at improving the generalization performance of one task by using other, related tasks; it comes in one-to-many and many-to-one configurations. What is new here compared to previous work is the multi-task training.
One encoder, multiple decoders (one-to-many): Spanish speech is encoded once, and separate decoders perform speech recognition (Spanish text) and speech translation (English text).
Multiple encoders, one decoder (many-to-one): speech translation and text translation share a single English-text decoder.
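A schematic PyTorch sketch of the one-to-many setting: one shared speech encoder feeds two decoders, and both losses update it. The toy decoder ignores attention, and all sizes, vocabularies, and tensors are illustrative assumptions.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Toy attention-free decoder head; real systems use attention decoders."""
    def __init__(self, hid, vocab):
        super().__init__()
        self.rnn = nn.GRU(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)
    def forward(self, enc_states, tgt_len):
        # feed the mean encoder state at every step (a simplification)
        ctx = enc_states.mean(1, keepdim=True).expand(-1, tgt_len, -1)
        h, _ = self.rnn(ctx)
        return self.out(h)

encoder = nn.GRU(40, 128, batch_first=True)           # shared Spanish-speech encoder
asr_head = Decoder(128, 500)                          # decodes Spanish transcripts
st_head = Decoder(128, 500)                           # decodes English translations

speech = torch.randn(8, 200, 40)                      # dummy filterbank batch
es_text = torch.randint(0, 500, (8, 20))              # dummy Spanish token ids
en_text = torch.randint(0, 500, (8, 22))              # dummy English token ids

enc_states, _ = encoder(speech)
loss = nn.functional.cross_entropy(asr_head(enc_states, 20).transpose(1, 2), es_text) \
     + nn.functional.cross_entropy(st_head(enc_states, 22).transpose(1, 2), en_text)
loss.backward()                                       # both tasks update the shared encoder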
Model | Test 1 | Test 2
End-to-End ST | 47.3 | 16.6
Multi-task | 48.7 | 17.4
ASR / NMT concatenation | 45.4 | 16.6
Image
[Image captioning with the same encoder-decoder (+) architecture: a CNN encoder extracts image features, and the decoder generates a caption such as "A cat on the mat"]
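A sketch of this captioning pipeline in PyTorch. The ResNet backbone, the uniform averaging used in place of learned attention, and the vocabulary size are illustrative assumptions.

import torch
import torch.nn as nn
import torchvision.models as models

cnn = models.resnet18(weights=None)                   # CNN encoder, untrained here (torchvision >= 0.13 API)
cnn = nn.Sequential(*list(cnn.children())[:-2])       # drop pooling/classifier, keep the spatial feature map
decoder = nn.GRU(512, 512, batch_first=True)
to_vocab = nn.Linear(512, 1000)                       # hypothetical caption vocabulary

image = torch.randn(1, 3, 224, 224)                   # dummy image
feats = cnn(image)                                    # (1, 512, 7, 7) spatial grid
regions = feats.flatten(2).transpose(1, 2)            # 49 region vectors a decoder could attend to
ctx = regions.mean(1, keepdim=True)                   # simplest stand-in for attention: uniform average
h, _ = decoder(ctx)                                   # first decoding step
first_word = to_vocab(h).argmax(-1)                   # e.g. the "A" of "A cat on the mat"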
Method | BLEU
Log-Bilinear (Kiros et al., 2014a) | 24.3
Enc-Dec (Vinyals et al., 2014a) | 24.6
+Attention (Xu et al., 2015) | 25.0
Visual question answering: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Video caption generation: generating a complete and natural sentence that goes beyond the single label of video classification and captures the most informative dynamics in the video.
Method | DE->EN | EN->DE
Phrase | 20.99 | 17.04
NMT | 20.64 | 17.15
+Char | 22.10 | 20.22
Method | DE->EN
Baseline (Shen et al., 2016) | 25.84
+Adversarial | 27.94
Source: wir mussen verhindern , dass die menschen kenntnis erlangen von dingen , vor allem dann , wenn sie wahr sind .
Baseline: we need to prevent people who are able to know that people have to do, especially if they are true .
+Adversarial: we need to prevent people who are able to know about things, especially if they are true .
REF: we have to prevent people from finding about things , especially when they are true .
[Recurrent (LSTM) vs. convolutional (CNN) architectures]
Varieties of attention (a small sketch follows):
- Soft vs. hard: soft attention weights all pixels; hard attention crops the image and forces attention only on the kept part.
- Global vs. local: a global approach always attends to all source words; a local one only looks at a subset of source words at a time.
- Intra vs. external: intra attention operates within the encoder's input sentence; external attention operates across sentences.
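A small NumPy sketch contrasting global soft attention with a local window; the scores and window size are made up for illustration.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([0.1, 2.0, 0.3, 1.2, 0.0, 0.5])   # alignment scores for 6 source words

global_w = softmax(scores)                 # global soft attention: every position gets weight

center, width = 1, 1                       # local attention: a window around an aligned position
mask = np.full_like(scores, -np.inf)
mask[max(0, center - width): center + width + 1] = 0.0
local_w = softmax(scores + mask)           # positions outside the window get zero weight

# hard attention would instead commit to a single position, e.g. scores.argmax()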
how to build a speech-to-text recognition system...
*And other references on this research direction…
MARTA.RUIZ@UPC.EDU WWW.COSTA-JUSSA.COM
Acknowledgements:
for their valuable feedback on the slides.
Organizers for inviting me to this exciting event.