Slide Credits: Agrawal - PowerPoint PPT Presentation


SLIDE 1
SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9
SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15

Slide Credits: Agrawal

SLIDE 16

Slide Credits: Agrawal

SLIDE 17

Slide Credits: Agrawal

SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22
SLIDE 23
SLIDE 24
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29

Kolmogorov-Smirnov test (captions vs. Q+A): p < 0.001
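The slide reports only the p-value of a two-sample Kolmogorov-Smirnov test. As a reminder of what that test computes, here is a minimal pure-Python sketch: the statistic is the largest gap between the two empirical CDFs, and the p-value uses the asymptotic Kolmogorov series, so it is only approximate for small samples. The function names and toy data are hypothetical; in practice `scipy.stats.ks_2samp` is the usual tool.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The ECDF difference only changes at observed points, so checking them suffices.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of sample_a that is <= x
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue(d, n, m, max_terms=100):
    """Asymptotic p-value Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    total, fac, prev = 0.0, 2.0, 0.0
    for k in range(1, max_terms + 1):
        term = fac * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        # Stop once the new term is negligible next to the previous term or the sum.
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * total:
            return min(1.0, total)
        fac, prev = -fac, abs(term)
    return 1.0  # series did not converge: lam is tiny, so p is essentially 1

# Toy samples (not real VQA data): every value in toy_b is below every value in toy_a.
toy_a = [9, 10, 11, 10, 12, 9, 11, 10]
toy_b = [5, 6, 7, 6, 5, 8, 6, 7]
d = ks_statistic(toy_a, toy_b)
p = ks_pvalue(d, len(toy_a), len(toy_b))
```

A small p-value, as on the slide, means the two samples are very unlikely to come from the same distribution.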

SLIDE 30
SLIDE 31
SLIDE 32
SLIDE 33

LSTM: one hidden layer
  • Output size: 1024
MLP: 2-hidden-layer fully connected network
  • 1000 units per layer, dropout (0.5), tanh
Word embedding size: 300
End-to-end learning with cross-entropy loss
Deeper LSTM: two hidden layers
  • Output: 2048 → fc + tanh → 1024
Input vocabulary: all question words
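The "2048 → fc + tanh → 1024" line compresses several steps. Below is a minimal pure-Python sketch of only the output projection. It assumes, following the original VQA paper rather than this slide, that each of the two LSTM layers contributes its final cell state and final hidden state of 512 values each, so their concatenation has 2 × 2 × 512 = 2048 values. The LSTM recurrence itself is omitted: the final states are random stand-ins, and all function names are hypothetical.

```python
import math
import random

HIDDEN = 512      # assumed size of each LSTM layer's cell/hidden state
NUM_LAYERS = 2    # "Deeper LSTM: two hidden layers"
OUT_DIM = 1024    # question embedding size after fc + tanh

def init_fc(in_dim, out_dim, seed=0):
    """Small random weights for a fully connected layer (illustration only)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def fc_tanh(x, weights, bias):
    """y = tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def deeper_lstm_question_embedding(final_states, weights, bias):
    """final_states: one (cell, hidden) pair per LSTM layer, taken after the last word."""
    # Concatenate [cell_1, hidden_1, cell_2, hidden_2] -> 2 * 2 * 512 = 2048 values.
    concat = [v for cell, hidden in final_states for v in cell + hidden]
    if len(concat) != len(weights[0]):
        raise ValueError("state size does not match fc input size")
    return fc_tanh(concat, weights, bias)

# Random stand-ins for the states a real LSTM would produce from the question.
rng = random.Random(1)
states = [([rng.uniform(-1, 1) for _ in range(HIDDEN)],
           [rng.uniform(-1, 1) for _ in range(HIDDEN)]) for _ in range(NUM_LAYERS)]
W, b = init_fc(2 * NUM_LAYERS * HIDDEN, OUT_DIM)
q = deeper_lstm_question_embedding(states, W, b)
print(len(q))  # 1024
```

The fc + tanh layer brings the deeper model's question embedding back to the same 1024 dimensions as the single-layer LSTM.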

SLIDE 34

2-Channel VQA Model

Image channel: Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected → Embedding (4096-dim)

Question channel: “How many horses are in this image?” → Embedding (1024-dim)

Both embeddings → Neural Network (MLP) → Softmax over top K answers

Slide Credits: Agrawal
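The diagram shows a 4096-dim image embedding and a 1024-dim question embedding feeding a neural network with a softmax over the top K answers, but not how the two channels are combined. The pure-Python sketch below assumes the fusion described in the original VQA paper: project the image feature to the question size with fc + tanh, multiply element-wise, then apply a two-hidden-layer tanh MLP. Sizes are shrunk (16, 8, 6, 5 in place of 4096, 1024, 1000, K) so it runs instantly, dropout is omitted because this is inference only, and all names are hypothetical.

```python
import math
import random

def fc(x, weights, bias, activation=None):
    """Fully connected layer: activation(W x + b)."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(in_dim, out_dim, rng):
    s = 1.0 / math.sqrt(in_dim)
    return ([[rng.uniform(-s, s) for _ in range(in_dim)] for _ in range(out_dim)],
            [0.0] * out_dim)

# Toy sizes standing in for 4096 (image), 1024 (question), 1000 (hidden), K (answers).
IMG_DIM, Q_DIM, HIDDEN, K = 16, 8, 6, 5
rng = random.Random(0)
params = {
    "img_proj": random_layer(IMG_DIM, Q_DIM, rng),
    "hidden1": random_layer(Q_DIM, HIDDEN, rng),
    "hidden2": random_layer(HIDDEN, HIDDEN, rng),
    "out": random_layer(HIDDEN, K, rng),
}

def two_channel_vqa(image_feat, question_emb, params):
    # Image channel: fc + tanh maps the image feature to the question-embedding size.
    img = fc(image_feat, *params["img_proj"], activation=math.tanh)
    # Fusion (assumed): element-wise product of the two embeddings.
    fused = [a * q for a, q in zip(img, question_emb)]
    # Two tanh hidden layers, then a linear layer scoring each of the K answers.
    h = fc(fused, *params["hidden1"], activation=math.tanh)
    h = fc(h, *params["hidden2"], activation=math.tanh)
    return softmax(fc(h, *params["out"]))

image_feat = [rng.uniform(0, 1) for _ in range(IMG_DIM)]
question_emb = [rng.uniform(-1, 1) for _ in range(Q_DIM)]
probs = two_channel_vqa(image_feat, question_emb, params)
best = max(range(K), key=probs.__getitem__)  # index of the predicted answer
```

The element-wise product forces both channels to the same size, which is why the image side needs its own projection before fusion.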

SLIDE 35

Ablation #1: Language-alone

Image channel (unused in this ablation): Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected → Embedding

Question channel: “How many horses are in this image?” → Question Embedding (1024-dim)

Question Embedding → Neural Network (MLP) → Softmax over top K answers (1k output units)

Slide Credits: Agrawal
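Every variant ends in the same output layer: a softmax over the top K answers, with 1k output units here. A minimal sketch of building that answer vocabulary, assuming (as in the original VQA paper) that the K classes are simply the K most frequent answers in the training set; the toy answers and function names are hypothetical.

```python
from collections import Counter

def build_answer_vocab(train_answers, k=1000):
    """Keep the k most frequent training answers as the softmax output classes."""
    counts = Counter(train_answers)
    vocab = [ans for ans, _ in counts.most_common(k)]
    index = {ans: i for i, ans in enumerate(vocab)}  # answer string -> output unit
    return vocab, index

def coverage(answers, index):
    """Fraction of answers that the fixed output vocabulary can represent."""
    return sum(a in index for a in answers) / len(answers)

# Toy training answers; with k=3 the rare answers "3" and "red" fall outside the vocabulary.
answers = ["2", "yes", "no", "yes", "2", "3", "red", "yes", "no", "2"]
vocab, index = build_answer_vocab(answers, k=3)
cov = coverage(answers, index)
```

Questions whose answer is outside the top K can never be answered correctly, so K trades output-layer size against coverage.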

SLIDE 36

Ablation #2: Vision-alone

Image channel: Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected → Embedding (4096-dim)

Question channel (unused in this ablation): “How many horses are in this image?” → Question Embedding

Image Embedding → Neural Network (MLP) → Softmax over top K answers

Slide Credits: Agrawal

SLIDE 37

Slide Credits: Agrawal

SLIDE 38
SLIDE 39

Slide Credits: Agrawal

SLIDE 40

Current Leaderboard

SLIDE 41

Questions, Discussion & Demo