Learning to Compose Neural Networks for Question Answering
Andreas, Rohrbach, Darrell, and Klein Garrett Bingham
Outline: Approach, Module Inventory, Network Layout, Experiments, Conclusions, Critiques

Approach
Dynamically assembled neural networks answer queries about images and structured knowledge bases.
Module Inventory

Modules pass around two kinds of data:
- Attention: a distribution over pixels or entities
- Labels: a distribution over answers
Lookup (1 of 6)

Lookup produces an attention focused entirely at index f(i): a one-hot vector over positions, where the mapping f from words to positions is known in advance.
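As a sketch, lookup can be read as a one-hot attention vector (numpy is used for illustration; the helper name is not from the paper):

```python
import numpy as np

def lookup(position, num_positions):
    """One-hot attention: all mass at the known index f(i)."""
    attention = np.zeros(num_positions)
    attention[position] = 1.0
    return attention

# e.g. "Texas" is known to sit at position 3 of a 5-entity knowledge base
att = lookup(3, 5)
```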
Find (2 of 6)

Find computes a distribution over positions (an attention) by scoring every position of the input against a learned embedding of the module's word argument, then normalizing the scores with a softmax.
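A minimal sketch of the idea, using a bilinear score in place of the paper's MLP (all names and shapes here are illustrative):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def find(word_embedding, features, W):
    """Score each position's features against the word embedding,
    then normalize the scores into an attention distribution."""
    scores = features @ W @ word_embedding
    return softmax(scores)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))    # 5 positions, 4-dim features
embedding = rng.normal(size=3)        # embedding of e.g. "city"
W = rng.normal(size=(4, 3))           # find's shared weights
att = find(embedding, features, W)
```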
Relate (3 of 6)

Relate directs focus from one region of the input to another. It is similar to the find module, but it conditions its behavior on an input attention (via an attention-weighted average of the input features).
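A sketch of the conditioning, again with a simplified bilinear score standing in for the paper's MLP (names and shapes are illustrative):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def relate(word_embedding, features, input_attention, W, D):
    """Like find, but the score is also conditioned on an
    attention-weighted average of the input features."""
    context = input_attention @ features      # weighted average of features
    scores = features @ (W @ word_embedding + D @ context)
    return softmax(scores)

rng = np.random.default_rng(1)
features = rng.normal(size=(5, 4))
embedding = rng.normal(size=3)                # embedding of e.g. "north-of"
W = rng.normal(size=(4, 3))
D = rng.normal(size=(4, 4))
in_att = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # currently attending position 1
out_att = relate(embedding, features, in_att, W, D)
```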
And (4 of 6)

And is analogous to set intersection for attentions: it computes the elementwise product of its input attentions.
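The intersection behavior is easy to see in a sketch:

```python
import numpy as np

def and_module(*attentions):
    """Elementwise product of attentions (set-intersection-like):
    a position keeps mass only if every input attends to it."""
    out = np.ones_like(attentions[0])
    for att in attentions:
        out = out * att
    return out

red = np.array([0.8, 0.2, 0.0])   # attention for "red"
cat = np.array([0.5, 0.5, 0.0])   # attention for "cat"
both = and_module(red, cat)       # mass only where both attend
```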
Describe (5 of 6)

Describe uses the input attention to compute a weighted average of the input features, then predicts an output label from that feature vector.
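As a sketch (a single linear output layer stands in for the paper's exact parameterization; names and shapes are illustrative):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def describe(attention, features, W_out):
    """Average the input features under the attention, then map
    the summary vector to a distribution over answer labels."""
    summary = attention @ features        # attention-weighted average
    return softmax(W_out @ summary)

rng = np.random.default_rng(2)
features = rng.normal(size=(5, 4))   # 5 positions, 4-dim features
W_out = rng.normal(size=(10, 4))     # 10 candidate answer labels
att = np.array([0.1, 0.6, 0.1, 0.1, 0.1])
label_dist = describe(att, features, W_out)
```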
Exists (6 of 6)

Exists produces a label directly from the input attention. Unlike describe, it does not use an intermediate feature vector.
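A sketch of the difference: exists looks only at the attention itself (here its maximum value), never at the underlying features (the weights below are illustrative, not learned):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def exists(attention, a, b):
    """Map the attention's maximum value straight to yes/no logits;
    no intermediate feature vector is computed."""
    return softmax(np.max(attention) * a + b)

# Illustrative weights: strong attention anywhere pushes toward "yes".
a = np.array([5.0, -5.0])   # [yes, no]
b = np.array([0.0, 0.0])
p = exists(np.array([0.05, 0.9, 0.05]), a, b)
```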
Network Layout (1 of 4)
Network Layout (2 of 4)

Mapping parse fragments to modules:
- Nouns, verbs → find
- Proper nouns → lookup
- Prepositional phrases → relate[preposition] applied to find[noun] or lookup[proper noun]
Network Layout (3 of 4)

For each subset of fragments:
- Join all fragments with and
- Insert exists or describe at the top
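The enumeration of candidate layouts can be sketched with s-expressions as nested tuples (layout generation only; which layout to use is decided separately):

```python
from itertools import combinations

def candidate_layouts(fragments):
    """For every nonempty subset of fragments, join the subset with
    `and` (when it has more than one member) and cap the result with
    either `exists` or `describe`."""
    layouts = []
    for r in range(1, len(fragments) + 1):
        for subset in combinations(fragments, r):
            body = subset[0] if len(subset) == 1 else ("and",) + subset
            for head in ("exists", "describe"):
                layouts.append((head, body))
    return layouts

frags = [("find", "city"), ("lookup", "Texas")]
layouts = candidate_layouts(frags)
```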
Network Layout (4 of 4)

The layout is selected with a policy gradient method. Once a layout is chosen, the assembled network is trained with standard backpropagation. Each module's weights are global, shared across all instances of that module but not with other modules.
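A toy sketch of the selection step, assuming a plain score-function (REINFORCE) update over a fixed set of candidate layouts; this is not the paper's exact estimator, and the reward function here is invented for illustration:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reinforce_step(theta, reward_fn, lr=0.1, rng=None):
    """Sample a layout index from the policy softmax(theta), observe
    its reward, and move theta along reward * grad log p(choice)."""
    rng = rng or np.random.default_rng()
    probs = softmax(theta)
    choice = rng.choice(len(theta), p=probs)
    grad_log_p = -probs
    grad_log_p[choice] += 1.0
    return theta + lr * reward_fn(choice) * grad_log_p

# Toy reward: layout 2 always answers correctly, the others never do.
theta = np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(200):
    theta = reinforce_step(theta, lambda k: 1.0 if k == 2 else 0.0, rng=rng)
```

After training, the policy concentrates on the rewarded layout, which is the behavior the layout selector needs.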
Experiments (1 of 2): VQA

200,000+ images with human-annotated questions and answers. Only the describe, and, and find modules are used.
SOTA results on VQA, outperforming previous approaches.
Experiments (2 of 2): GeoQA

- 263 examples
- Entities (states, cities, parks); relations (north-of, capital-of)
- GeoQA+Q distinguishes "What cities are in Texas?" → Austin from "Are there any cities in Texas?" → Yes, while GeoQA does not.
The dynamic model outperforms the static neural module network (NMN), demonstrating that D-NMN can perform well on diverse datasets.
Conclusions

Given (question, world, answer) triples, the model learns to assemble neural networks on the fly. It answers queries about both structured and unstructured information, achieving SOTA results on the VQA and GeoQA+Q datasets.
Critiques

Only describe, and, and find were used for VQA: (describe (and (find[all nouns in sentence]))). Why introduce modules and then not use them? We have to conclude that lookup, relate, and exists hurt performance. Are static networks better than dynamic ones? Is the RL network-constructor agent simply not effective?

How do we know the RL agent is effective at constructing networks? What about a random network? Why not construct a network by and-ing all of the layout candidates together?

What is the motivation for these modules specifically? What is a measure module? And, but no or?