Self-Critical Reasoning for Robust Visual Question Answering Jialin - - PowerPoint PPT Presentation



SLIDE 1

Self-Critical Reasoning for Robust Visual Question Answering

Jialin Wu and Raymond J. Mooney

SLIDE 2

Visual Question Answering (VQA)

  • Common VQA system

What utensil is pictured?

Original image
Visual feature set $\mathcal{V}$
Answer Prediction: Knife (0.72), Fork (0.66)

SLIDE 3

Capture superficial statistical correlations between QA pairs

Question: What utensil is pictured?

VQA system: “I won’t bother to look at the image. I can answer your question just by looking at the question.”

Predicted answer: Knife

[Figure: original image]

[Chart: Training Answer Distribution for “knife” and “fork”; axis ticks from 20 to 100]
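The shortcut described on this slide can be illustrated with a minimal sketch of an image-blind baseline that answers from question statistics alone. The training pairs below are hypothetical toy data, not the counts behind the chart, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def fit_question_prior(qa_pairs):
    """Count how often each answer follows each question in the training set."""
    prior = defaultdict(Counter)
    for question, answer in qa_pairs:
        prior[question][answer] += 1
    return prior

def answer_without_image(prior, question):
    """Return the most frequent training answer; the image is never consulted."""
    return prior[question].most_common(1)[0][0]

# Hypothetical toy training set in which "knife" is the more common answer.
question = "What utensil is pictured?"
train = [(question, "knife")] * 3 + [(question, "fork")]
prior = fit_question_prior(train)
print(answer_without_image(prior, question))
```

A model that behaves like this can look accurate whenever the test answers follow the training distribution, which is why the image content can end up ignored.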

SLIDE 4

Force VQA to focus on what humans focus on

  • Extract a proposal set of objects that humans focus on, taken from either a human visual explanation OR a human textual explanation (e.g., “There is a fork near the cake.”).

SLIDE 5

Force VQA to focus on what humans focus on

  • Enforce that the gradient for the correct answer takes its largest value on at least one of the extracted objects.

Influence Strengthen Loss: computed from $\nabla_{\mathcal{V}}\, p(\text{fork} \mid Q, \mathcal{V})$ and the proposal object set.
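The constraint on this slide can be sketched as a ranking loss over per-object sensitivities. This is one plausible form under stated assumptions, not necessarily the authors' exact formulation: `sens[i]` is assumed to be a precomputed scalar summary of the gradient of the correct answer's probability with respect to object `i`, and the hinge and `min` are illustrative choices that encode "at least one proposal object should rank highest".

```python
def influence_strengthen_loss(sens, proposal):
    """Penalty that is zero when some proposal object has the largest sensitivity.

    sens:     list of floats, sens[i] = sensitivity of the correct answer to object i
              (assumed precomputed from gradients)
    proposal: indices of the objects that humans focus on
    """
    proposal = set(proposal)
    if not proposal:
        raise ValueError("proposal set must contain at least one object")
    others = [s for i, s in enumerate(sens) if i not in proposal]
    # For each proposal object, add up the margins by which outside objects beat it.
    violations = [
        sum(max(0.0, s_other - sens[i]) for s_other in others)
        for i in proposal
    ]
    # Only one proposal object has to win, so keep the smallest violation.
    return min(violations)

# Hypothetical sensitivities for three detected objects; objects 0 and 2 are proposals.
print(influence_strengthen_loss([0.1, 0.4, 0.3], proposal=[0, 2]))
```

Taking the minimum rather than the sum over proposal objects is what lets the loss reach zero as soon as a single human-attended object dominates every non-proposal object.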

SLIDE 6

Results

  • Compared to baseline model on VQA-CP dataset
  • The VQA-CP dataset is deliberately split so that the train and test sets have very different answer distributions.

[Bar chart: VQA scores (All) for Baseline vs. Ours (infl); axis ticks from 38 to 53]

SLIDE 7

Oversensitivity to the most common objects

Question: What utensil is pictured?

VQA system: “I can focus on the fork, but I still think it is a knife.”

Predicted answer: Knife

[Figure panels: focused objects for answer “fork”; focused objects for answer “knife”]

SLIDE 8

Criticizing the false influential object

  • Find the most influential object for the correct answer using gradients

Question: What utensil is pictured?

Answer Prediction: Knife (0.72), Fork (0.66)

Proposal object set: from a human visual explanation OR a human textual explanation (“There is a fork near the cake.”)

Explaining prediction “fork”: $\nabla_{\mathcal{V}}\, p(\text{fork} \mid Q, \mathcal{V})$ over the visual feature set $\mathcal{V}$ of the original image, which identifies the most influential object
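The gradient-based selection on this slide can be sketched end to end on a toy model. Everything below is an assumption for illustration: the tiny scoring function stands in for a real VQA network, central finite differences stand in for backpropagation, and the sensitivity is taken to be the sum of the gradient components for each object.

```python
import math

def answer_probs(objects, weights):
    """Toy VQA head (hypothetical): logit_a = sum_i (w_a . v_i)^2, followed by softmax."""
    logits = [
        sum(sum(w * x for w, x in zip(w_a, v)) ** 2 for v in objects)
        for w_a in weights
    ]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sensitivity(objects, weights, answer, eps=1e-5):
    """Per-object sum of d p(answer) / d v_i, estimated by central differences."""
    scores = []
    for i, v in enumerate(objects):
        acc = 0.0
        for d in range(len(v)):
            plus = [list(o) for o in objects]
            minus = [list(o) for o in objects]
            plus[i][d] += eps
            minus[i][d] -= eps
            acc += (answer_probs(plus, weights)[answer]
                    - answer_probs(minus, weights)[answer]) / (2 * eps)
        scores.append(acc)
    return scores

def most_influential(objects, weights, answer, proposal):
    """Index of the proposal object with the largest sensitivity for the answer."""
    scores = sensitivity(objects, weights, answer)
    return max(proposal, key=lambda i: scores[i])

# Hypothetical 2-D features for three objects and weights for two answers.
FORK, KNIFE = 0, 1
objects = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # row 0 scores "fork", row 1 scores "knife"
print(most_influential(objects, weights, FORK, proposal=[0, 2]))
```

In a real system the same quantity would come from automatic differentiation of the network output with respect to the object features, which avoids the repeated forward passes used here.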

SLIDE 9

Criticizing the false influential object

  • Force the most influential object to contribute more to the correct answer.

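Question: What utensil is pictured?

Answer Prediction: Knife (0.72), Fork (0.66)

Proposal object set: from a human visual explanation OR a human textual explanation (“There is a fork near the cake.”)

Explaining prediction “fork”: $\nabla_{\mathcal{V}}\, p(\text{fork} \mid Q, \mathcal{V})$ over the visual feature set $\mathcal{V}$ of the original image, which identifies the most influential object

Explaining prediction “knife”: $\nabla_{\mathcal{V}}\, p(\text{knife} \mid Q, \mathcal{V})$

Self-Critical Loss: applied at the most influential object, using both gradients.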
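A minimal sketch of how such a penalty might look, assuming per-object sensitivities for the correct and the wrong answer are already available. The hinge form and the function name are assumptions made for illustration and may differ from the loss actually used in the work.

```python
def self_critical_loss(sens_correct, sens_wrong, proposal):
    """Penalise the wrong answer for leaning on the key object more than the correct one.

    sens_correct[i]: sensitivity of the correct answer (e.g. "fork") to object i
    sens_wrong[i]:   sensitivity of the wrong answer (e.g. "knife") to object i
    proposal:        indices of the objects that humans focus on
    """
    # Most influential proposal object for the correct answer.
    star = max(proposal, key=lambda i: sens_correct[i])
    # Zero once that object supports the correct answer at least as strongly.
    return max(0.0, sens_wrong[star] - sens_correct[star])

# Hypothetical sensitivities: object 0 is the fork region but favours "knife".
print(self_critical_loss([0.2, 0.05, 0.1], [0.5, 0.3, 0.0], proposal=[0, 2]))
```

Minimising this term pushes the selected object's influence toward the correct answer, which matches the goal stated in the bullet on this slide.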

SLIDE 10

Our self-critical approach

VQA system

Fork

Oh, yes, the utensil should be a fork.

What utensil is pictured?

SLIDE 11

Results

  • Compared to baseline model on VQA-CP dataset

[Bar chart: VQA scores (All) for Baseline, Ours (infl), and Ours (infl + crit); axis ticks from 38 to 52]