Slide Credits: Agrawal - PowerPoint PPT Presentation

Sep 05, 2022

SLIDE 1

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

SLIDE 6

SLIDE 7

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

Slide Credits: Agrawal

SLIDE 16

SLIDE 17

SLIDE 18

SLIDE 19

SLIDE 20

SLIDE 21

SLIDE 22

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

Kolmogorov-Smirnov test: p(Captions vs. (Q+A)) < 0.001
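
For readers unfamiliar with the test named on this slide, a two-sample Kolmogorov-Smirnov test compares two samples through the largest gap between their empirical CDFs. The sketch below is a minimal pure-Python illustration, not the tooling behind the slide: the function names `ks_statistic` and `ks_pvalue` and the toy numeric samples are hypothetical, and the p-value uses the standard asymptotic Kolmogorov series.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the gap can only change at observed values
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value for statistic d with sample sizes n and m."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the p-value is essentially 1 for very small gaps
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Toy stand-ins for the two groups being compared (assumed data).
captions = list(range(100))
questions_and_answers = list(range(100, 200))

d = ks_statistic(captions, questions_and_answers)
p = ks_pvalue(d, len(captions), len(questions_and_answers))
print(d, p < 0.001)
```

A small p-value, as on the slide, means the two samples are unlikely to come from the same distribution.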

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

LSTM: one hidden layer, output size 1024, each word size 300

Deeper LSTM: two hidden layers, output: 2048 > fc+tanh > 1024

Input Vocabulary: all question words

MLP: 2-hidden-layer fc network, 1000 units, tanh, dropout(0.5)

End-to-end learning, cross-entropy
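
The MLP part of this slide (two hidden fc layers, tanh, dropout 0.5, cross-entropy) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (`dense`, `dropout`, `mlp_forward`, and so on), the random initialisation, and the toy layer sizes are hypothetical, and the slide's 1000 hidden units are shrunk so the example runs instantly.

```python
import math
import random


def dense(x, w, b):
    """Fully connected layer: w holds one row of weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def dropout(x, p, rng, train):
    """Inverted dropout: zero each unit with probability p while training."""
    if not train:
        return list(x)
    return [0.0 if rng.random() < p else xi / (1.0 - p) for xi in x]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(probs[target])


def init(n_out, n_in, rng):
    """Small random weights and zero biases (assumed initialisation)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def mlp_forward(x, layers, rng, train=False, p=0.5):
    """Hidden fc + tanh layers with dropout, then a softmax output layer."""
    h = x
    for w, b in layers[:-1]:
        h = dropout([math.tanh(v) for v in dense(h, w, b)], p, rng, train)
    w, b = layers[-1]
    return softmax(dense(h, w, b))


rng = random.Random(0)
D_IN, HIDDEN, K = 8, 10, 5  # toy sizes; the slide uses 1000 hidden units
layers = [init(HIDDEN, D_IN, rng), init(HIDDEN, HIDDEN, rng), init(K, HIDDEN, rng)]
x = [rng.uniform(-1.0, 1.0) for _ in range(D_IN)]
probs = mlp_forward(x, layers, rng)
loss = cross_entropy(probs, target=2)
print(len(probs), loss)
```

End-to-end learning would backpropagate this loss through both the MLP and the question encoder; the sketch covers only the forward pass.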

SLIDE 34

2-Channel VQA Model

Image channel: Convolution Layer + Non-Linearity, Pooling Layer, Convolution Layer + Non-Linearity, Pooling Layer, Fully-Connected MLP, giving a 4096-dim image embedding

Question channel: “How many horses are in this image?”, giving a 1024-dim question embedding

Both embeddings feed a Neural Network with a Softmax over top K answers
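
The data flow on this slide can be sketched as follows. The slide does not say how the two embeddings are combined, so plain concatenation is an assumption made here for illustration; the function names `build_answer_vocab`, `fuse`, and `predict` and the toy answers are hypothetical as well.

```python
from collections import Counter

IMAGE_DIM, QUESTION_DIM = 4096, 1024  # embedding sizes named on the slide


def build_answer_vocab(train_answers, k):
    """Keep the k most frequent training answers as the softmax classes."""
    return [answer for answer, _ in Counter(train_answers).most_common(k)]


def fuse(image_emb, question_emb):
    """Assumed fusion: concatenate the image and question embeddings."""
    if len(image_emb) != IMAGE_DIM or len(question_emb) != QUESTION_DIM:
        raise ValueError("unexpected embedding size")
    return list(image_emb) + list(question_emb)


def predict(scores, answer_vocab):
    """Return the answer whose class score is largest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return answer_vocab[best]


# Toy training answers (assumed data) with distinct frequencies.
train_answers = ["yes"] * 5 + ["no"] * 4 + ["2"] * 3 + ["horse"]
vocab = build_answer_vocab(train_answers, k=3)

fused = fuse([0.0] * IMAGE_DIM, [0.0] * QUESTION_DIM)
print(len(fused), vocab, predict([0.1, 0.7, 0.2], vocab))
```

Restricting the softmax to the top K answers turns open-ended answering into K-way classification; answers outside the vocabulary, such as "horse" in the toy data, cannot be predicted.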

SLIDE 35

Ablation #1: Language-alone

Image channel (not used): Convolution Layer + Non-Linearity, Pooling Layer, Convolution Layer + Non-Linearity, Pooling Layer, Fully-Connected MLP, 1k output units

Question channel: “How many horses are in this image?”, giving a 1024-dim question embedding

Only the question embedding feeds the Neural Network with a Softmax over top K answers

SLIDE 36

Ablation #2: Vision-alone

Image channel: Convolution Layer + Non-Linearity, Pooling Layer, Convolution Layer + Non-Linearity, Pooling Layer, Fully-Connected MLP, giving a 4096-dim image embedding

Question channel (not used): “How many horses are in this image?”

Only the image embedding feeds the Neural Network with a Softmax over top K answers
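
The language-alone and vision-alone ablations amount to choosing which embedding reaches the classifier. A minimal sketch, assuming the full model concatenates the two embeddings (the slides do not state the fusion step); `classifier_input`, its `mode` values, and `INPUT_SIZE` are hypothetical names.

```python
IMAGE_DIM, QUESTION_DIM = 4096, 1024  # embedding sizes named on the slides

# Classifier input size per setting, under the concatenation assumption.
INPUT_SIZE = {
    "both": IMAGE_DIM + QUESTION_DIM,
    "language": QUESTION_DIM,
    "vision": IMAGE_DIM,
}


def classifier_input(image_emb, question_emb, mode="both"):
    """Select the features passed to the answer classifier."""
    if mode == "both":      # full 2-channel model
        return list(image_emb) + list(question_emb)
    if mode == "language":  # ablation #1: ignore the image
        return list(question_emb)
    if mode == "vision":    # ablation #2: ignore the question
        return list(image_emb)
    raise ValueError(f"unknown mode: {mode!r}")


image_emb = [0.0] * IMAGE_DIM
question_emb = [1.0] * QUESTION_DIM
for mode in ("both", "language", "vision"):
    print(mode, len(classifier_input(image_emb, question_emb, mode)))
```

Comparing accuracy across the three modes shows how much each channel contributes on its own.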

SLIDE 37

SLIDE 38

SLIDE 39

SLIDE 40

Current Leaderboard

SLIDE 41

Questions & Discussion & Demo