Sequence-to-sequence models used for machine translation and ...
Murat Apishev, Katya Artemova
Computational Pragmatics Lab, HSE
December 2, 2019
Machine translation
[Figure. Image source: jeddy92]
The decoder generates the target sequence step by step: the decoder state h^D_j, for j ∈ [n+1, m], is computed from the previous state h^D_{j-1} and a context vector c_j.

[Figure. Image source: jalammar]
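A minimal PyTorch sketch of one such decoder step, assuming (this is not from the slides) that the recurrent cell takes the previous target word embedding concatenated with the context vector; all names and sizes are illustrative:

import torch
import torch.nn as nn

# Illustrative sizes, not from the slides.
emb_dim, hidden_dim = 32, 64
decoder_cell = nn.GRUCell(input_size=emb_dim + hidden_dim, hidden_size=hidden_dim)

prev_embedding = torch.randn(1, emb_dim)   # embedding of the previously generated target word
context = torch.randn(1, hidden_dim)       # context vector c_j from attention over encoder states
h_prev = torch.randn(1, hidden_dim)        # previous decoder state h^D_{j-1}

# One decoder step: the new state h^D_j is a function of h^D_{j-1} and c_j (plus the previous word).
h_j = decoder_cell(torch.cat([prev_embedding, context], dim=1), h_prev)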
While generating the target word j, attention assigns a weight α_ij to each source word i:

α_ij = exp(score(h^E_i, h^D_j)) / Σ_k exp(score(h^E_k, h^D_j))

and the context vector is the weighted sum of encoder states, c_j = Σ_i α_ij h^E_i. Common score functions:

◮ dot product attention: score(h^E_i, h^D_j) = (h^E_i)^T h^D_j
◮ additive attention: score(h^E_i, h^D_j) = v^T tanh(W_1 h^E_i + W_2 h^D_j)
◮ multiplicative attention: score(h^E_i, h^D_j) = (h^E_i)^T W h^D_j
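A minimal NumPy sketch of the three score functions and the resulting weights and context vector; the dimensions, random values and variable names are illustrative assumptions, not taken from the slides:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy dimensions: n encoder states, hidden size d.
n, d = 5, 8
rng = np.random.default_rng(0)
H_enc = rng.normal(size=(n, d))   # encoder states h^E_1 .. h^E_n
h_dec = rng.normal(size=(d,))     # current decoder state h^D_j

# Dot-product attention: score_i = h^E_i . h^D_j
scores_dot = H_enc @ h_dec

# Multiplicative (bilinear) attention: score_i = (h^E_i)^T W h^D_j
W = rng.normal(size=(d, d))
scores_mul = H_enc @ W @ h_dec

# Additive attention: score_i = v^T tanh(W1 h^E_i + W2 h^D_j)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=(d,))
scores_add = np.tanh(H_enc @ W1.T + h_dec @ W2.T) @ v

# Attention weights and context vector c_j = sum_i alpha_ij h^E_i
alpha = softmax(scores_dot)
c_j = alpha @ H_enc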
Task-oriented chat-bots
1 The encoder is a recurrent network that reads the input utterance.
2 The decoder is a unidirectional recurrent network that generates the response.
3 At each step the decoder state attends over the encoder states with weights α_ij = exp(e_ij) / Σ_k exp(e_ik) (a compact sketch of the whole model follows).
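A compact PyTorch sketch of such a model; the bidirectional encoder, teacher forcing and all sizes are assumptions made for illustration, not details from the slides:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqWithAttention(nn.Module):
    """Toy encoder-decoder: recurrent encoder, unidirectional recurrent decoder with attention."""
    def __init__(self, vocab_size=1000, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.enc_proj = nn.Linear(2 * hidden, hidden)            # project bidirectional states
        self.decoder = nn.GRU(emb_dim + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        enc_states, _ = self.encoder(self.emb(src_ids))           # (B, S, 2H)
        enc_states = self.enc_proj(enc_states)                    # (B, S, H)
        h = torch.zeros(1, src_ids.size(0), enc_states.size(-1))
        logits = []
        for t in range(tgt_ids.size(1)):                          # one decoder step per target token
            e = torch.bmm(enc_states, h[-1].unsqueeze(2)).squeeze(2)   # scores e_ij
            alpha = F.softmax(e, dim=1)                                 # alpha_ij = softmax over source positions
            context = torch.bmm(alpha.unsqueeze(1), enc_states)        # (B, 1, H)
            step_in = torch.cat([self.emb(tgt_ids[:, t:t+1]), context], dim=2)
            out, h = self.decoder(step_in, h)
            logits.append(self.out(out))
        return torch.cat(logits, dim=1)                           # (B, T, vocab)

model = Seq2SeqWithAttention()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))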
Constituency parsing
Spelling correction
Summarization
1 Abstractive summarization: paraphrase the corpus using novel words and phrases.
2 Extractive summarization: concatenate extracts taken from a corpus (a naive extractive baseline is sketched below).

[Figure. Image source: nlpprogress.com]
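The extractive baseline referenced above, as a toy sketch: score each sentence by the average corpus frequency of its words and keep the top k. This is a naive illustration, not a method from the slides:

from collections import Counter
import re

def extractive_summary(text, k=2):
    """Naive frequency-based extractive baseline: pick the k sentences
    whose words are most frequent in the whole document, kept in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)
    top = sorted(sentences, key=score, reverse=True)[:k]
    return " ".join(s for s in sentences if s in top)

print(extractive_summary("Cats sleep a lot. Cats also purr. Dogs bark loudly."))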
Datasets:
1 Gigaword summarization dataset [11]
2 RIA news dataset [12]
Question answering
1 Factoid questions:
◮ What is the dress code for the Vatican?
◮ Who is the President of the United States?
◮ What are the dots in Hebrew called?
2 Commonsense questions:
◮ What do all humans want to experience in their own home? (a) feel ...
3 Opinion questions:
◮ Can anyone recommend a good coffee shop near HSE campus?
4 Cloze-style questions
Types of answers:
◮ binary (yes / no)
◮ a span of text
◮ multiple choice
Approaches:
1 Information retrieval (IR)-based QA: find a span of text in retrieved documents that answers the question.
2 Open-domain Question Answering (ODQA): answer questions about an open range of topics using a large document collection.
3 Knowledge-base (KB)-based QA: build a semantic representation of the question and query a structured knowledge base with it.
IR-based QA
Question processing: determine the
◮ answer type (PER, LOC, TIME)
◮ focus
◮ question type
Query formulation:
◮ question reformulation: remove wh-words, change word order (a toy example is sketched below)
◮ query expansion
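A toy sketch of question reformulation: drop wh-words and auxiliaries so the question reads more like a keyword query. The stop-word list is an illustrative assumption:

import re

# Hypothetical list of wh-words and auxiliaries to strip from the question.
WH_AND_AUX = {"what", "who", "whom", "where", "when", "why", "how",
              "is", "are", "was", "were", "do", "does", "did", "the"}

def reformulate(question):
    tokens = re.findall(r"\w+", question.lower())
    return " ".join(t for t in tokens if t not in WH_AND_AUX)

print(reformulate("What are the dots in Hebrew called?"))  # -> "dots in hebrew called"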
IR-based QA Datasets
1 Stanford Question Answering Dataset (SQuAD)
2 NewsQA
3 WikiQA
4 WebQuestions
5 WikiMovies
6 Russian: SberQUAD
7 MedQuAD [16]
How SQuAD was built:
1 Project Nayuki's Wikipedia internal PageRanks were used to obtain the top 10,000 English Wikipedia articles, from which a sample was drawn.
2 The articles were split into individual paragraphs.
3 Crowdsourcing: workers were asked to ask and answer up to 5 questions on the content of each paragraph.
4 Crowdworkers were encouraged to ask questions in their own words, without copying phrases from the paragraph.
5 Analysis: (i) the diversity of answer types, (ii) the difficulty of the questions.
Some datasets extend the basic span-extraction setup:
1 First predict whether a question can be answered at all; if so, generate the answer.
2 The generated answer should be well-formed.
3 The passage re-ranking task: rank the retrieved passages by how likely they are to contain the answer.
IR-based QA Models
IR-based QA Models
1 Question and passage encoder: BiRNN to convert the words to
2 Gated attention-based recurrent networks: to incorporate
3 Self-matching attention: passage context is necessary to infer the
4 Output: use pointer networks to predict the start and end position of
5 Training: minimize the sum of the negative log probabilities of the
Apishev, Artemova (HSE) Sequence-to-sequence models December 2, 2019 55 / 67
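A minimal PyTorch sketch of span prediction and the training loss in step 5; the bilinear scoring used here is a stand-in for the pointer-network output layer, and every name and size is illustrative:

import torch
import torch.nn.functional as F

# Toy sizes: batch of 2, passage of 50 tokens, hidden size 64.
batch, passage_len, hidden = 2, 50, 64
passage_states = torch.randn(batch, passage_len, hidden)   # encoder outputs for the passage
question_summary = torch.randn(batch, hidden)               # pooled question representation

# Start/end scoring: bilinear interaction between the question summary and each passage state.
W_start = torch.nn.Linear(hidden, hidden, bias=False)
W_end = torch.nn.Linear(hidden, hidden, bias=False)
start_logits = torch.einsum("bph,bh->bp", passage_states, W_start(question_summary))
end_logits = torch.einsum("bph,bh->bp", passage_states, W_end(question_summary))

# Training loss: sum of negative log-probabilities of the true start and end positions.
true_start = torch.tensor([3, 10])
true_end = torch.tensor([5, 12])
loss = F.cross_entropy(start_logits, true_start) + F.cross_entropy(end_logits, true_end)

# Inference: pick the most probable start and end (ignoring the start <= end constraint here).
pred_start = start_logits.argmax(dim=1)
pred_end = end_logits.argmax(dim=1)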
1 Character Embedding Layer maps each word to a vector space using character-level CNNs.
2 Word Embedding Layer maps each word to a vector space using a pre-trained word embedding model.
3 Contextual Embedding Layer utilizes contextual cues from surrounding words to refine the word embeddings.
4 Attention Flow Layer couples the query and context vectors and produces query-aware feature vectors for each word in the context.
5 Modeling Layer employs a Recurrent Neural Network to scan the context.
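A toy sketch of the character-level CNN usually assumed behind a character embedding layer (step 1); filter sizes, pooling and dimensions are all illustrative assumptions:

import torch
import torch.nn as nn

# Illustrative sizes: character vocabulary, character embedding dim, number of filters, kernel width.
n_chars, char_dim, n_filters, kernel = 100, 16, 32, 5

char_emb = nn.Embedding(n_chars, char_dim)
conv = nn.Conv1d(in_channels=char_dim, out_channels=n_filters, kernel_size=kernel, padding=2)

# A batch of words, each padded to max_word_len character ids.
batch_words, max_word_len = 8, 12
char_ids = torch.randint(0, n_chars, (batch_words, max_word_len))

x = char_emb(char_ids)               # (words, chars, char_dim)
x = x.transpose(1, 2)                # Conv1d expects (batch, channels, length)
x = torch.relu(conv(x))              # (words, n_filters, chars)
word_vectors = x.max(dim=2).values   # max-pool over character positions -> (words, n_filters)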
1 S-NET [23]: extraction-then-synthesis framework.
2 QANet [24]: benefits from data augmentation techniques, such as back-translation.
3 V-NET [25]: end-to-end neural model that enables answer candidates from different passages to verify each other.
4 Deep Cascade QA [26]: deep cascade model, which consists of document and paragraph ranking stages followed by answer extraction.
Conclusions
1 seq2seq architectures are exploited in a variety of NLP tasks.
2 The attention mechanism helps to find soft alignments between input and output.
3 Evaluation metrics are rarely differentiable, hence reinforcement learning can be used to optimize them directly (a minimal sketch follows).
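The sketch referenced in point 3: a REINFORCE-style loss that scales the log-probabilities of sampled tokens by a (possibly non-differentiable) reward such as ROUGE, minus a baseline. All numbers are dummies and the setup is illustrative:

import torch

def reinforce_loss(log_probs, reward, baseline=0.0):
    """log_probs: (seq_len,) log-probabilities of the sampled tokens; reward: scalar metric value."""
    return -(reward - baseline) * log_probs.sum()

# Dummy example: 5 sampled-token probabilities, a metric score of 0.4 and a baseline of 0.3.
probs = torch.tensor([0.2, 0.5, 0.1, 0.3, 0.4], requires_grad=True)
log_probs = torch.log(probs)
loss = reinforce_loss(log_probs, reward=0.4, baseline=0.3)
loss.backward()  # gradients flow into whatever produced the log-probabilities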