Attention for Machine Comprehension Made by : Rishab Goel Based on - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Attention for Machine Comprehension

Made by : Rishab Goel

Based on slides by: Alex Graves, Hien Quoc, Renjie Liao

slide-2
SLIDE 2

Highway Networks

slide-3
SLIDE 3
slide-4
SLIDE 4

Benefits ...

slide-5
SLIDE 5

Benefits ...

slide-6
SLIDE 6

Importance ...

  • For training very deep architectures
  • By allowing better information flow
  • Better optimization
  • Intuition : a linear transformation, or the input itself, may suffice for learning; language at a higher level of abstraction???

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
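A minimal sketch of a single highway layer, assuming the standard formulation y = T(x) * H(x) + (1 - T(x)) * x with a tanh transform and a sigmoid gate. The helper names and toy weights are made up for illustration. It shows how a closed gate lets the input flow through unchanged, which is the information-flow benefit listed above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, with H = tanh and T = sigmoid."""
    h = [math.tanh(z) for z in affine(W_h, b_h, x)]   # candidate transform H(x)
    t = [sigmoid(z) for z in affine(W_t, b_t, x)]     # transform gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

x = [0.5, -1.0]
W = [[1.0, 0.0], [0.0, 1.0]]   # toy identity weights (illustrative only)

# A strongly negative gate bias closes the gate: the input is carried through.
y_carry = highway_layer(x, W, [0.0, 0.0], W, [-20.0, -20.0])

# A strongly positive gate bias opens the gate: the layer acts like a plain tanh layer.
y_transform = highway_layer(x, W, [0.0, 0.0], W, [20.0, 20.0])
```

With the gate closed the layer is close to the identity, so stacking many such layers does not block the signal.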

slide-7
SLIDE 7

Hien Quoc Dang

slide-8
SLIDE 8

Idea of Maxout

Hien Quoc Dang

slide-9
SLIDE 9

Intuitions

  • Inspired by dropout
  • Similar to bagging, but integrated as part of a single network

Hien Quoc Dang

slide-10
SLIDE 10

Idea of Maxout ...

Hien Quoc Dang

slide-11
SLIDE 11

Idea of Maxout ...

Hien Quoc Dang

slide-12
SLIDE 12

Comparison to Rectifiers

Hien Quoc Dang
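The relation between maxout and rectifiers can be sketched in a few lines. A maxout unit takes the maximum over k affine pieces; the sketch below assumes the usual definition max_j (w_j . x + b_j), and the weights are invented for illustration. With one piece fixed to zero the unit behaves like a rectifier, and with pieces w and -w it gives the absolute value.

```python
def maxout_unit(x, pieces):
    """Maxout unit: the max over k affine pieces (w, b) of w . x + b."""
    return max(sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in pieces)

def relu(z):
    return max(0.0, z)

# Two pieces, one of them identically zero: maxout reduces to a rectifier.
w, b = [2.0, -1.0], 0.5
relu_like = [(w, b), ([0.0, 0.0], 0.0)]

# Two pieces w . x and -w . x: maxout gives the absolute value instead.
abs_like = [([1.0], 0.0), ([-1.0], 0.0)]

inputs = [[1.0, 1.0], [-1.0, 2.0], [0.0, 0.0]]
relu_outputs = [maxout_unit(x, relu_like) for x in inputs]
```

So a rectifier is one particular shape that a maxout unit can take, while the unit remains free to learn other piecewise-linear shapes.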

slide-13
SLIDE 13

Why Does Maxout Work?

Hien Quoc Dang

slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16

Slides : Santi Pascual

slide-17
SLIDE 17

LSTMs ...

Chris Olah’s blog

slide-18
SLIDE 18

Need for Attention

  • The embeddings are not sufficient to encode information over long distances
  • Attention helps the model attend to the important patches of the data
  • It adds interpretability to the model
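The points above can be illustrated with a minimal attention sketch. It assumes plain dot-product scoring, which is a simplification: the Attentive Reader uses a learned scoring function. The vectors are toy values. The weights form a distribution over tokens, and that distribution is what can be inspected for interpretability.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, context): softmax over query . key scores, then a weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Three toy token encodings; the query is most similar to the second one.
tokens = [[1.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
weights, context = attend([0.0, 2.0], tokens, tokens)
```

Every token contributes to the context vector directly, regardless of how far it sits from the current position.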

slide-19
SLIDE 19

Attentive Reader

slide-20
SLIDE 20
slide-21
SLIDE 21

DYNAMIC COATTENTION NETWORKS FOR QUESTION ANSWERING

Authors : Caiming Xiong, Victor Zhong, Richard Socher

slide-22
SLIDE 22

Introduction

  • Machine comprehension
  • No knowledge base required
  • Until SQuAD, no large-scale, natural dataset
  • Cloze-style datasets like CNN/Daily Mail
  • Synthetic or small in size

slide-23
SLIDE 23

About SQuAD

  • Consists of questions on a set of Wikipedia articles
  • Wh-type questions
  • The answer is a segment of text, or span

Source : Rajpurkar et al.
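Because the answer is a span, a prediction can be represented by two token indices. The sketch below uses an invented passage and question (not taken from SQuAD) and hypothetical helper names. It also counts the candidate spans, which grow quadratically with passage length.

```python
def extract_span(tokens, start, end):
    """Return the answer text for an inclusive (start, end) token span."""
    if not 0 <= start <= end < len(tokens):
        raise ValueError("span must satisfy 0 <= start <= end < len(tokens)")
    return " ".join(tokens[start:end + 1])

def count_spans(n):
    """Number of contiguous spans in a passage of n tokens."""
    return n * (n + 1) // 2

# Made-up passage; imagined question: "When was the tower completed?"
passage = "The Eiffel Tower was completed in 1889 in Paris".split()
answer = extract_span(passage, 6, 6)
n_candidates = count_spans(len(passage))
```

Predicting a start pointer and an end pointer avoids scoring every candidate span separately.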

slide-24
SLIDE 24

Model in a nutshell ...

Socher et al

slide-25
SLIDE 25

Doc and Query Encoder

Socher et al

slide-26
SLIDE 26

Liked

  • Gagan

Socher et al

slide-27
SLIDE 27

Dynamic Decoder

Liked : all

Socher et al
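A rough sketch of the iterative idea behind the dynamic decoder: alternate between re-estimating the start and the end of the span until the estimate stops changing or an iteration cap is reached. The scoring functions here are hypothetical toy stand-ins; in the DCN those roles are played by two Highway Maxout Networks conditioned on an LSTM state. The initial estimate and the cap of 4 iterations are assumptions of this sketch.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def iterative_decode(score_start, score_end, n_tokens, max_iters=4):
    """Alternate start/end re-estimation until the span stops changing.

    score_start(t, s, e) and score_end(t, s, e) are hypothetical scorers that
    rate token t given the current span estimate (s, e).
    """
    s, e = 0, n_tokens - 1                       # arbitrary initial estimate (assumption)
    history = [(s, e)]
    for _ in range(max_iters):
        new_s = argmax([score_start(t, s, e) for t in range(n_tokens)])
        new_e = argmax([score_end(t, new_s, e) for t in range(n_tokens)])
        history.append((new_s, new_e))
        if (new_s, new_e) == (s, e):             # converged: estimate unchanged
            break
        s, e = new_s, new_e
    return (s, e), history

# Toy scorers: the start prefers token 3, the end prefers two tokens after the start.
def toy_start(t, s, e):
    return -abs(t - 3)

def toy_end(t, s, e):
    return -abs(t - (s + 2)) if t >= s else float("-inf")

span, history = iterative_decode(toy_start, toy_end, n_tokens=8)
```

The history records each intermediate estimate, so the decoder can be seen moving away from its initial guess before settling.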

slide-28
SLIDE 28

Highway Maxout Network ...

Socher et al
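A hedged sketch of the Highway Maxout Network structure as described in the DCN paper: a tanh projection of the decoder state and the current start/end encodings, two maxout layers, and a final maxout layer that also sees the first layer's output through a highway connection. All dimensions, the random weights, the zero biases, and the helper names are invented for illustration and are not the authors' settings.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def maxout(x, pieces):
    """Elementwise max over p affine pieces; each piece is a (W, b) pair."""
    outs = [[z + bi for z, bi in zip(matvec(W, x), b)] for W, b in pieces]
    return [max(col) for col in zip(*outs)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rand_pieces(rng, p, rows, cols):
    # Biases start at zero here purely to keep the sketch short.
    return [(rand_matrix(rng, rows, cols), [0.0] * rows) for _ in range(p)]

def make_hmn(rng, u_dim, h_dim, hidden, p):
    return {
        "W_d": rand_matrix(rng, hidden, h_dim + 2 * u_dim),
        "layer1": rand_pieces(rng, p, hidden, u_dim + hidden),
        "layer2": rand_pieces(rng, p, hidden, hidden),
        "out": rand_pieces(rng, p, 1, 2 * hidden),
    }

def hmn_score(params, u_t, h, u_s, u_e):
    """Score document position u_t given decoder state h and the current span encodings."""
    r = [math.tanh(z) for z in matvec(params["W_d"], h + u_s + u_e)]
    m1 = maxout(u_t + r, params["layer1"])
    m2 = maxout(m1, params["layer2"])
    # Highway connection: the output layer sees m1 directly as well as m2.
    return maxout(m1 + m2, params["out"])[0]

rng = random.Random(0)
params = make_hmn(rng, u_dim=4, h_dim=3, hidden=5, p=2)
doc = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]   # toy document encodings
h = [0.0] * 3                                                       # toy decoder state
scores = [hmn_score(params, u_t, h, doc[0], doc[5]) for u_t in doc]
best = max(range(len(doc)), key=lambda i: scores[i])
```

Taking the argmax of these scores over all document positions gives the next start (or end) estimate.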

slide-29
SLIDE 29

Socher et al

slide-30
SLIDE 30

Socher et al

slide-31
SLIDE 31

Implementation

  • 1. CoreNLP for preprocessing
  • 2. GloVe word vectors pretrained on the 840B Common Crawl corpus
  • 3. Embeddings of out-of-vocabulary (OOV) words set to 0
  • 4. Sentinel vectors randomly initialized, optimized during training

Disliked

  • Gagan (pt. 3)
  • Akshay (pt. 4) claim not proven
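Point 3 of the implementation list can be sketched as an embedding lookup that falls back to a zero vector. The tiny table below is a made-up stand-in for the pretrained GloVe vectors, and `embed` is a hypothetical helper name.

```python
def embed(tokens, vectors, dim):
    """Look up each token; out-of-vocabulary tokens map to the all-zero vector."""
    zero = [0.0] * dim
    return [list(vectors.get(tok, zero)) for tok in tokens]

# Toy 3-dimensional table standing in for pretrained GloVe vectors.
toy_vectors = {"the": [0.1, 0.2, 0.3], "tower": [0.4, 0.5, 0.6]}
embedded = embed(["the", "zyzzyva", "tower"], toy_vectors, dim=3)
```

All unknown words receive the same vector under this scheme, so the model cannot tell them apart from their embeddings alone.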
slide-32
SLIDE 32

Iterative process visualisation ...

Socher et al

slide-33
SLIDE 33

Socher et al

slide-34
SLIDE 34

Results

Disliked

  • Haroun (ensemble gain too much)

Socher et al

slide-35
SLIDE 35

Liked

  • Barun
  • Nupur

Socher et al

slide-36
SLIDE 36

Performance across different types of questions

Liked

  • Shantanu

Socher et al

slide-37
SLIDE 37

Ablation studies ...

Liked

  • Prachi

Socher et al

slide-38
SLIDE 38

Predictions

Socher et al

slide-39
SLIDE 39

Logistic Regression Prediction : Theatre Museum

Socher et al

slide-40
SLIDE 40

Comments : Trouble decoding multiple intuitive answers

Socher et al

slide-41
SLIDE 41

Cons

  • Lacks error analysis, needs more ablation studies [Barun, Surag]
  • System gives extractive answers, not abstractive ones [Nupur]
  • Does not compare HMN and MN [all]
  • Unintuitive decoder [Dinesh]

slide-42
SLIDE 42

Doubts ...

  • Why did the HMN work?
  • Role of sentinel vectors??
  • Error propagation in the argmax function
  • Maxout for LSTMs as well (not clear)
  • Use multiple initialisations of the start and end pointers (how??)

slide-43
SLIDE 43

Extensions ...

  • Use the approach for other datasets like CNN/Daily Mail and MS COCO QA [Barun]
  • Use a different attention, Match-LSTM [Barun]
  • Bi-directional attention [Gagan]
  • Apply the iterative idea to visual QA, classification, NER, SRL etc. [Akshay, Surag]
  • Find synonyms [Haroun]

slide-44
SLIDE 44

Extensions ...

Combine char2vec and word2vec embeddings to represent the document and query
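One simple way to combine the two representations is concatenation. The sketch below assumes that choice, and it uses an average of character vectors as a stand-in for a learned char-level encoder, since the slide does not say how the combination would be done. All tables and names are invented for illustration.

```python
def char_vector(word, char_table, dim):
    """Average the vectors of known characters (a simple stand-in for a char-level encoder)."""
    vecs = [char_table[c] for c in word if c in char_table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def combined_embedding(word, word_table, word_dim, char_table, char_dim):
    """Concatenate the word-level vector (zeros if OOV) with the character-level vector."""
    word_vec = list(word_table.get(word, [0.0] * word_dim))
    return word_vec + char_vector(word, char_table, char_dim)

word_table = {"cat": [1.0, 2.0]}
char_table = {"c": [1.0, 0.0], "a": [0.0, 1.0], "t": [1.0, 1.0], "s": [0.0, 0.0]}

known = combined_embedding("cat", word_table, 2, char_table, 2)
unknown = combined_embedding("cats", word_table, 2, char_table, 2)
```

The out-of-vocabulary word "cats" still gets a non-zero character-level part, which is the motivation for adding character information on top of word vectors.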

slide-45
SLIDE 45

Thanks!

slide-46
SLIDE 46