

SLIDE 1

Learning to Compose Neural Networks for Question Answering

Andreas, Rohrbach, Darrell, and Klein

Garrett Bingham

SLIDE 2

Outline

  • Approach
  • Module Inventory
  • Network Layout
  • Experiments
  • Conclusions
  • Critiques

2

SLIDE 3

Approach

Dynamically assembled neural networks answer queries about images and structured knowledge bases.

3

SLIDE 4

Module Inventory

4

SLIDE 5

Module Base Types

Attention - A distribution over pixels or entities
Labels - A distribution over answers
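For concreteness, both base types can be written as plain probability distributions; the list representation and the `is_distribution` helper below are illustrative assumptions, not the paper's implementation.

```python
# The two base types, sketched as plain Python: both are probability
# distributions, represented here as lists of non-negative floats.
def is_distribution(p, tol=1e-6):
    """True if p is non-negative and sums to 1 (within tolerance)."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) < tol

attention = [0.25, 0.25, 0.5]   # over pixels or entities
labels = [0.9, 0.1]             # over answers, e.g. {"yes": 0.9, "no": 0.1}
assert is_distribution(attention) and is_distribution(labels)
```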

5

SLIDE 6

Lookup (→ Attention)

Lookup produces an attention focused entirely at index f(i).
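A minimal sketch of lookup: it emits a one-hot attention at the position the name maps to. The `position_index` dictionary stands in for the paper's mapping f and is an assumption of this sketch.

```python
# Hypothetical sketch of the lookup module: given an entity name, return
# a one-hot attention over n_positions, peaked at the index f(i).
def lookup(name, position_index, n_positions):
    """position_index (our stand-in for f) maps names to indices."""
    attention = [0.0] * n_positions
    attention[position_index[name]] = 1.0
    return attention

print(lookup("Texas", {"Texas": 2, "Austin": 0}, 4))  # [0.0, 0.0, 1.0, 0.0]
```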

6

(1 of 6)

SLIDE 7

Find (→ Attention)

Find computes a distribution over indices with an MLP.
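An illustrative version of find, not the paper's exact parametrization: each position's features are scored together with the word embedding by a tiny linear scorer (a one-layer stand-in for the MLP, with assumed weights), and a softmax turns the scores into an attention.

```python
import math

# Sketch of the find module: score each position, then softmax the
# scores into an attention distribution. The single weight vector `w`
# is an assumption; the paper uses a learned multilayer perceptron.
def find(word_vec, features, w):
    scores = [sum(wi * xi for wi, xi in zip(w, word_vec + feat))
              for feat in features]
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```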

7

(2 of 6)

SLIDE 8

Relate (Attention → Attention)

Relate directs focus from one region of the input to another. It is similar to the find module, but it conditions its behavior on the current attention.
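One way to make the conditioning concrete, as a sketch: pool the features under the current attention into a context vector and let the scorer see it alongside each position's own features. The pooling-and-concatenation scheme and the weights are assumptions of this illustration.

```python
import math

# Illustrative relate module: like find, but the scorer also sees a
# context vector pooled from the *current* attention.
def relate(cur_attention, features, w):
    dim = len(features[0])
    context = [sum(a * f[d] for a, f in zip(cur_attention, features))
               for d in range(dim)]
    scores = [sum(wi * xi for wi, xi in zip(w, feat + context))
              for feat in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```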

8

(3 of 6)

SLIDE 9

And (Attention → Attention)

And is analogous to set intersection for attentions. The and module computes elementwise multiplication.
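The elementwise multiplication is easy to show directly; renormalizing the product so the result remains a distribution is our assumption for this sketch.

```python
import math

# Sketch of the and module: elementwise product of attention maps,
# renormalized here so the output stays a distribution (an assumption).
def and_module(*attentions):
    product = [math.prod(vals) for vals in zip(*attentions)]
    total = sum(product) or 1.0   # avoid division by zero
    return [p / total for p in product]

print(and_module([1.0, 0.0], [0.5, 0.5]))  # [1.0, 0.0]
```

Mass survives only where every input attends, which is why and behaves like set intersection.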

9

(4 of 6)

SLIDE 10

Describe (Attention → Labels)

Describe uses input attention to predict an output label.
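A sketch of that step: average the world features under the attention, then map the pooled vector to a distribution over labels. The label weight matrix here is an illustrative assumption, not the paper's trained values.

```python
import math

# Sketch of describe: attention-weighted average of features, then a
# linear layer + softmax over answer labels.
def describe(attention, features, label_weights):
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(attention, features))
              for d in range(dim)]
    scores = [sum(w * p for w, p in zip(row, pooled))
              for row in label_weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```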

10

(5 of 6)

SLIDE 11

Exists (Attention → Labels)

Exists produces a label directly from the input attention. Unlike describe, it does not compute an intermediate feature vector.
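Reading the attention directly can be sketched like this; the fixed threshold stands in for the learned linear layer and is purely an assumption of the illustration.

```python
# Sketch of exists: answer "yes" if the attention's maximum mass is
# high enough. A threshold replaces the learned layer in this sketch.
def exists(attention, threshold=0.5):
    return "yes" if max(attention) > threshold else "no"

print(exists([0.9, 0.05, 0.05]))  # yes
```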

11

(6 of 6)

SLIDE 12

Network Layout

12

SLIDE 13

Question → Dependency Parse

13

(1 of 4)

SLIDE 14

Parse → Layout Fragments

  • Nouns, verbs → find
  • Proper nouns → lookup
  • Prepositional phrases → relate preposition, {find noun, lookup proper noun}
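A toy encoding of these rules; real fragments come from a dependency parse, simplified here to a POS-tag lookup table, which is an assumption of this sketch.

```python
# Toy version of the parse-to-fragment rules: nouns and verbs become
# find fragments, proper nouns become lookup fragments.
RULES = {"NOUN": "find", "VERB": "find", "PROPN": "lookup"}

def fragment(token, pos):
    return (RULES[pos], token)

def pp_fragment(preposition, inner):
    # prepositional phrase: relate wrapped around the inner fragment
    return ("relate", preposition, inner)

print(pp_fragment("in", fragment("Texas", "PROPN")))
# ('relate', 'in', ('lookup', 'Texas'))
```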

14

(2 of 4)

SLIDE 15

Fragments → Layout Candidates

For each subset of fragments:
  • Join all fragments with and
  • Insert exists or describe at the top
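The enumeration above can be sketched directly; the tuple encoding of layouts is an assumption chosen for readability.

```python
from itertools import combinations

# Sketch of candidate generation: for every non-empty subset of
# fragments, join the subset with `and`, then cap the layout with
# either `exists` or `describe`.
def layout_candidates(fragments):
    candidates = []
    for r in range(1, len(fragments) + 1):
        for subset in combinations(fragments, r):
            body = subset[0] if len(subset) == 1 else ("and",) + subset
            for head in ("exists", "describe"):
                candidates.append((head, body))
    return candidates

frags = [("find", "city"), ("lookup", "Texas")]
print(len(layout_candidates(frags)))  # 6
```

With two fragments there are three non-empty subsets, and each gets two possible heads, hence six candidates.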

15

(3 of 4)

SLIDE 16

Candidates → Final Network

The network is selected with a policy gradient method. Once a network is chosen, it is trained with standard backpropagation. Each module's weights are shared globally across all instances of that module, but not with other modules.
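A minimal REINFORCE-style sketch of the selection step. The flat per-layout score vector, learning rate, and 0/1 reward are all assumptions; the paper scores layouts with learned features and uses the answer likelihood as the training signal.

```python
import math, random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# One policy-gradient update: sample a layout from the softmax over
# scores, observe a reward, and push scores toward rewarded choices.
def reinforce_step(scores, reward_fn, lr=0.1):
    probs = softmax(scores)
    i = random.choices(range(len(scores)), weights=probs)[0]
    r = reward_fn(i)                      # e.g. 1 if answer was correct
    grad = [-probs[j] for j in range(len(scores))]
    grad[i] += 1.0                        # d log p(i) / d score
    return [s + lr * r * g for s, g in zip(scores, grad)], i
```

Repeated steps concentrate probability on layouts that yield correct answers, after which the chosen network's module weights are trained by ordinary backpropagation.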

16

(4 of 4)

SLIDE 17

Experiments

17

SLIDE 18

Visual Question Answering

200,000+ images and human-annotated questions and answers. Only describe, and, and find modules are used.

18

(1 of 2)

SLIDE 19

VQA - Results

SOTA results, outperforming:

  • Visual bag-of-words model
  • Dynamic parameter prediction model (fixed architecture)
  • Conventional attention model
  • Previous neural module networks without structure prediction

19

(1 of 2)

SLIDE 20

GeoQA

263 examples. Entities (states, cities, parks); relations (north-of, capital-of).

GeoQA+Q distinguishes between:

  • What cities are in Texas? → Austin
  • Are there any cities in Texas? → Yes

while GeoQA does not.

20

(2 of 2)

SLIDE 21

GeoQA - Results

Dynamic model outperforms:

  • Logical baseline (LSP-F)
  • Perceptual baseline (LSP-W)
  • Fixed-structure neural module network (NMN)

Demonstrates that D-NMN can outperform logical baselines and perform well on diverse datasets.

21

(2 of 2)

SLIDE 22

Conclusions and Critiques

22

SLIDE 23

Conclusions

Given (question, world, answer) triples, the model learns to assemble neural networks on the fly. It answers queries about both structured and unstructured information, achieving SOTA on the VQA and GeoQA+Q datasets.

23

SLIDE 24

Critique 1 - Discarding modules

Only describe, and, and find were used for VQA:

(describe (and (find [all nouns in sentence])))

Why introduce modules and then not use them? We have to conclude that lookup, relate, and exists hurt performance. Are static networks better than dynamic ones? Is the RL network constructor agent not effective?

24

SLIDE 25

Critique 2 - No RL Agent Baselines

How do we know the RL agent is effective at constructing networks? What about a random network? Why not construct a network by and-ing all of the layout candidates together?

25

SLIDE 26

Critique 3 - Nitpicks

What is the motivation for these modules specifically? What is a measure module? Why an and, but no or?

26

SLIDE 27

Questions?