Natural Language for Visual Reasoning Alane Suhr, Mike Lewis, James - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Natural Language for Visual Reasoning

Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi

lic.nlp.cornell.edu/nlvr/

slide-2
SLIDE 2

Language and Vision

A small herd of cows in a large grassy field. (Chen et al 2015)

What is the dog carrying? (Agrawal et al 2015)

Our goal: natural language with a diverse set of semantic and syntactic phenomena

slide-3
SLIDE 3

Natural Language for Visual Reasoning

Task: determine whether the statement is true or false for the image.

There is a box with 3 items of all 3 different colors.

TRUE
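To make the task concrete, here is a minimal sketch that checks the example statement against a hand-written description of an image. The box/item encoding and the function name are assumptions made for illustration, not the dataset's own format.

```python
# Hypothetical structured description: an image is a list of boxes,
# and each box is a list of items with a shape and a color.
image = [
    [{"shape": "circle", "color": "yellow"},
     {"shape": "square", "color": "black"},
     {"shape": "triangle", "color": "blue"}],
    [{"shape": "square", "color": "blue"},
     {"shape": "square", "color": "blue"}],
    [{"shape": "circle", "color": "black"}],
]


def box_with_three_items_of_three_colors(boxes):
    """True if some box has exactly 3 items whose colors are all different."""
    return any(
        len(box) == 3 and len({item["color"] for item in box}) == 3
        for box in boxes
    )


label = box_with_three_items_of_three_colors(image)
print(label)  # True for the image defined above
```

A learned model has to produce the same TRUE/FALSE label from the sentence and the image, without a hand-written rule like this one.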

slide-4
SLIDE 4

Outline

  • Task and environments
  • Data collection
  • Analysis
  • Baselines
slide-5
SLIDE 5

Task and Environments

  • Scatter: There is a box with 3 items of all 3 different colors.
  • Tower: There are only two towers which has the same base color.

TRUE FALSE

slide-6
SLIDE 6

Data collection

  • Goal: collect natural language descriptions of images and true/false judgments

  • Generate images
  • Collect natural language sentences
  • Validate image/sentence pairs
slide-7
SLIDE 7

Image Generation

slide-8
SLIDE 8

Image Generation

  • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap)

slide-9
SLIDE 9

Image Generation

  • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap)
  • Construct second image with the same type

slide-10
SLIDE 10

Image Generation

  • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap)
  • Construct second image with the same type

slide-11
SLIDE 11

Image Generation

  • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap)
  • Construct second image with the same type
  • Construct third image by shuffling items in the first image

slide-12
SLIDE 12

Image Generation

  • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap)
  • Construct second image with the same type
  • Construct third image by shuffling items in the first image

slide-13
SLIDE 13

Image Generation

  • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap)
  • Construct second image with the same type
  • Construct third image by shuffling items in the first image
  • Construct fourth image by shuffling items in the second image

Generate two unique images and permute their items to create two other images
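The four-image procedure can be sketched as follows. This is only an illustration under stated assumptions: the shape, color, and size vocabularies, the box size, rejection sampling for the no-overlap constraint, and the reading of "shuffling" as permuting (shape, color) pairs within a box are all hypothetical choices, not details given on the slides.

```python
import random

SHAPES = ["circle", "square", "triangle"]   # hypothetical vocabulary
COLORS = ["yellow", "black", "blue"]        # hypothetical vocabulary
SIZES = [10, 20, 30]                        # hypothetical side lengths
BOX = 100                                   # hypothetical box side length


def overlaps(a, b):
    """Axis-aligned bounding-box overlap test for two items."""
    return (a["x"] < b["x"] + b["size"] and b["x"] < a["x"] + a["size"]
            and a["y"] < b["y"] + b["size"] and b["y"] < a["y"] + a["size"])


def random_box(rng, max_items=5):
    """Sample items one at a time, rejecting candidates that overlap."""
    items = []
    target = rng.randint(1, max_items)
    while len(items) < target:
        size = rng.choice(SIZES)
        item = {"shape": rng.choice(SHAPES), "color": rng.choice(COLORS),
                "size": size,
                "x": rng.randint(0, BOX - size),
                "y": rng.randint(0, BOX - size)}
        if not any(overlaps(item, other) for other in items):
            items.append(item)
    return items


def shuffle_items(rng, items):
    """Permute (shape, color) among items, keeping positions and sizes."""
    looks = [(it["shape"], it["color"]) for it in items]
    rng.shuffle(looks)
    return [dict(it, shape=s, color=c) for it, (s, c) in zip(items, looks)]


def make_image(rng, n_boxes=3):
    return [random_box(rng) for _ in range(n_boxes)]


rng = random.Random(0)
first, second = make_image(rng), make_image(rng)       # two unique images
third = [shuffle_items(rng, box) for box in first]     # permutation of first
fourth = [shuffle_items(rng, box) for box in second]   # permutation of second
```

Because sizes stay attached to positions in this sketch, the shuffled images remain free of overlaps and contain the same items as their source image.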

slide-14
SLIDE 14

Sentence Writing

Write a sentence that is true about the top two images and false about the bottom two.

There is a box with 3 items of all 3 different colors.

  • Don’t refer to the order of the images.
  • Don’t refer to the order of the boxes.

Setup encourages set reasoning, counting, and comparisons

slide-15
SLIDE 15

Sentence Writing

There is a box with 3 items of all 3 different colors.

TRUE TRUE FALSE FALSE

slide-16
SLIDE 16

Validation

There is a box with 3 items of all 3 different colors.

  • Higher-quality data
  • Measure agreement
  • Make sure sentences follow the guidelines


Fleiss’ κ: 0.709 ➡ 0.808
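For reference, Fleiss' κ can be computed from a table of per-item rating counts. The sketch below uses the standard formula on toy data; the ratings and the number of raters are invented for illustration and are not the corpus annotations.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j.

    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Proportion of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]

    # Observed pairwise agreement per item, averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


# Toy data: 5 sentence/image pairs, 3 raters each, columns = (TRUE, FALSE).
toy_counts = [[3, 0], [0, 3], [3, 0], [2, 1], [0, 3]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))  # 0.732
```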

slide-17
SLIDE 17

Validation

There is a box with 3 items of all 3 different colors.

☑︎ ☐

TRUE FALSE

slide-18
SLIDE 18

Permutation

There is a box with 3 items of all 3 different colors.

☑︎ ☐

TRUE FALSE

slide-19
SLIDE 19

Corpus Statistics

  • 92,244 examples
  • 3,962 unique sentences
  • Krippendorff’s α: 0.831
  • Fleiss’ κ: 0.808
  • (Landis and Koch, 1977)
  • 262 words in the vocabulary
  • Average sentence length of 11.2

lic.nlp.cornell.edu/nlvr

  • Four data splits
  • 80.7% training
  • 6.4% development
  • 6.4% public test
  • 6.4% unreleased test
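
As a rough check on scale, the split percentages can be turned into approximate example counts. These are estimates derived only from the rounded figures above, not official split sizes.

```python
TOTAL_EXAMPLES = 92_244
SPLIT_PERCENT = {
    "training": 80.7,
    "development": 6.4,
    "public test": 6.4,
    "unreleased test": 6.4,
}

# Approximate counts; the percentages are rounded, so they sum to 99.9
# and the counts do not add up exactly to the total.
approx_counts = {
    name: round(TOTAL_EXAMPLES * pct / 100)
    for name, pct in SPLIT_PERCENT.items()
}

for name, count in approx_counts.items():
    print(f"{name}: ~{count}")
```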
slide-20
SLIDE 20

Related Corpora

Corpus (citation): task. Example.

  • MSCOCO (Chen et al 2015): caption generation. "A small herd of cows in a large grassy field."
  • CLEVR (Johnson et al 2016): question answering. "How many objects are either small cylinders or red things?"
  • VQA (real) (Agrawal et al 2015): question answering. "What is the dog carrying?"
  • VQA (abstract) (Agrawal et al 2015): question answering. "Is this a forest?"
  • NLVR (Suhr et al 2017): binary classification. "there are exactly three blue objects not touching any edge"

slide-21
SLIDE 21

Related Corpora

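Corpus (citation): task; real images?; natural language?

  • MSCOCO (Chen et al 2015): caption generation; real images ✔; natural language ✔
  • CLEVR (Johnson et al 2016): question answering; real images ✗; natural language ✗
  • VQA (real) (Agrawal et al 2015): question answering; real images ✔; natural language ✔
  • VQA (abstract) (Agrawal et al 2015): question answering; real images ✗; natural language ✔
  • NLVR (Suhr et al 2017): binary classification; real images ✗; natural language ✔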

slide-22
SLIDE 22

Lengths

[Figure: sentence lengths for VQA real images, VQA abstract images, MSCOCO, CLEVR, and NLVR (ours)]

Longer than VQA, similar to MS COCO
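
A length comparison like this can be reproduced with a few lines, assuming length is counted in whitespace-separated tokens (the slides do not state the tokenization). The sketch below uses three example sentences from this deck as stand-in data.

```python
from collections import Counter

sentences = [
    "There is a box with 3 items of all 3 different colors.",
    "there are exactly three blue objects not touching any edge",
    "There is a box with a yellow item and three black items.",
]


def tokenize(sentence):
    """Lowercase, drop a trailing period, and split on whitespace."""
    return sentence.lower().rstrip(".").split()


lengths = [len(tokenize(s)) for s in sentences]
length_histogram = Counter(lengths)           # length -> number of sentences
average_length = sum(lengths) / len(lengths)
vocabulary = {tok for s in sentences for tok in tokenize(s)}

print(lengths)            # [12, 10, 12]
print(length_histogram)
```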

slide-23
SLIDE 23

Linguistic Analysis

Analyzed 200 random development sentences.

Corpora compared: VQA (abstract), VQA (real), NLVR

  • Hard cardinality
  • Soft cardinality
  • Existential quantifiers
  • Universal quantifiers
  • Coordination
  • Negation
  • Coreference
  • Presupposition
  • Spatial relations
  • Comparisons
  • Coordination ambiguity
  • Prepositional ambiguity

slide-24
SLIDE 24

Numerical Expressions

  • Hard cardinality: NLVR 66%, VQA (real) 12%, VQA (abstract) 12%
  • Soft cardinality: NLVR 16%, VQA (real) 1%, VQA (abstract) 0%

There is a tower with exactly three blocks, and it has a yellow block and two blue blocks. (TRUE)

there are at least two yellow squares not touching any edge (TRUE)

slide-25
SLIDE 25

Negation and Coordination

  • Negation: NLVR 10%, VQA (real) 1%, VQA (abstract) 0%
  • Coordination: NLVR 17%, VQA (real) 5%, VQA (abstract) 3%

There is a box with a black item between 2 items of the same color and no item on top of that. (TRUE)

There is a box with a yellow item and three black items. (TRUE)

slide-26
SLIDE 26

Baselines

Accuracy on unreleased test set

  • Majority class: 55.4
  • Text only (RNN): 56.2
  • Image only (CNN): 55.3
  • CNN+RNN: 56.3
  • NMN (Andreas et al 2015): 62.0
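
The majority-class baseline simply predicts the most frequent training label for every test example. A minimal sketch, with toy labels invented for illustration:

```python
from collections import Counter


def majority_class_accuracy(train_labels, test_labels):
    """Predict the most common training label for every test example."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)


# Toy labels, not the corpus annotations.
train = [True, True, True, False, False]
test = [True, False, True, True, False, False, True, True, False]

majority, accuracy = majority_class_accuracy(train, test)
print(majority, round(accuracy, 3))  # True 0.556
```

Any model has to beat this number to show it is using the sentence or the image at all.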

slide-27
SLIDE 27

Feature-based Analysis

  • Features computed from the text and the structured representation
  • Use maximum entropy model

Accuracy

  • Dev: 68.04
  • Unreleased test: 67.82
  • No count features: 57.7
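
For binary labels, a maximum entropy model is equivalent to logistic regression. The sketch below trains one with plain gradient ascent on a single hypothetical count feature; the feature definition, the toy examples, and the hyperparameters are assumptions made for illustration and are not the features used in the talk.

```python
import math

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "1": 1, "2": 2, "3": 3}


def features(sentence, box_sizes):
    """Hypothetical feature map over a sentence and per-box item counts."""
    tokens = sentence.lower().split()
    mentioned = {NUMBER_WORDS[t] for t in tokens if t in NUMBER_WORDS}
    return {
        "bias": 1.0,
        # Count feature: a number in the sentence matches some box's size.
        "count_match": 1.0 if mentioned & set(box_sizes) else 0.0,
    }


def predict_proba(weights, feats):
    """Probability that the statement is TRUE under the current weights."""
    z = sum(weights.get(k, 0.0) * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Binary maximum entropy (logistic regression) via gradient ascent."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = (1.0 if label else 0.0) - predict_proba(weights, feats)
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) + lr * error * v
    return weights


# Toy training data: (sentence, items per box, label).
data = [
    ("there is a box with 3 items", [3, 1, 2], True),
    ("there is a box with 3 items", [1, 2, 2], False),
    ("there is a box with two items", [2, 4, 5], True),
    ("there is a box with one item", [2, 4, 5], False),
]
weights = train([(features(s, sizes), y) for s, sizes, y in data])

p_true = predict_proba(weights, features("there is a box with 2 items", [2, 2, 2]))
p_false = predict_proba(weights, features("there is a box with three items", [1, 1, 1]))
```

Dropping `count_match` from this feature map leaves only the bias, which mirrors in miniature why removing count features hurts accuracy.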

slide-28
SLIDE 28

http://lic.nlp.cornell.edu/nlvr/

Thank you!