Yash Goyal and Aishwarya Agrawal (Georgia Tech) - PowerPoint PPT Presentation

slide-1
SLIDE 1

Aishwarya Agrawal (Georgia Tech)
Yash Goyal (Georgia Tech)

slide-2
SLIDE 2

Outline

Overview of Task and Dataset Overview of Challenge Winner Announcements Analysis of Results

2

slide-7
SLIDE 7

VQA Task

7

slide-8
SLIDE 8

VQA Task

What is the mustache made of?

8

slide-9
SLIDE 9

VQA Task

What is the mustache made of?

AI System

9

slide-10
SLIDE 10

VQA Task

What is the mustache made of? bananas

AI System

10

slide-11
SLIDE 11

VQA v1.0 Dataset

11

slide-12
SLIDE 12

VQA v1.0 Dataset

12

Objects
slide-13
SLIDE 13

VQA v1.0 Dataset

13

Fine-grained recognition

slide-14
SLIDE 14

VQA v1.0 Dataset

14

Counting

slide-15
SLIDE 15

VQA v1.0 Dataset

15

Common sense

slide-16
SLIDE 16

VQA v2.0 Dataset

slide-17
SLIDE 17

[Figure: two similar images, same question “Who is wearing glasses?”, different answers: “woman” (image in VQA v1.0) and “man” (complementary image, new in VQA v2.0)]

slide-18
SLIDE 18

VQA v2.0 Dataset Stats

  • >200K images
  • >1.1M questions
  • >11M answers

18

1.8 x VQA v1.0

slide-19
SLIDE 19

Accuracy Metric

19
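The accuracy metric referred to above is the standard VQA consensus measure: a predicted answer is scored against the 10 human answers collected per question. A minimal Python sketch, ignoring the official answer normalization and the averaging over annotator subsets:

```python
def vqa_accuracy(pred, human_answers):
    """Standard VQA metric: an answer is 100% correct if at least
    3 of the 10 human annotators gave it; partial credit otherwise."""
    matches = sum(ans == pred for ans in human_answers)
    return min(matches / 3.0, 1.0)
```

For example, a prediction matching 2 of the 10 human answers scores 2/3; matching 3 or more scores 1.0.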

slide-20
SLIDE 20

Outline

Overview of Task and Dataset Overview of Challenge Winner Announcements Analysis of Results

20

slide-21
SLIDE 21

VQA Challenge on

https://evalai.cloudcv.org/

21

slide-22
SLIDE 22

Dataset splits

Split        Images   Questions   Answers
Training     80K      443K        4.4M

Dataset size is approximate

22

slide-23
SLIDE 23

Dataset splits

Split        Images   Questions   Answers
Training     80K      443K        4.4M
Validation   40K      214K        2.1M

Dataset size is approximate

23

slide-24
SLIDE 24

Dataset splits

Split        Images   Questions   Answers
Training     80K      443K        4.4M
Validation   40K      214K        2.1M
Test         80K      447K        (withheld)

Dataset size is approximate

24

slide-25
SLIDE 25

Test Dataset

  • 4 splits of approximately equal size
  • Test-dev (development)

– Debugging and Validation.

  • Test-standard (publications)

– Used to score entries for the Public Leaderboard.

  • Test-challenge (competitions)

– Used to rank challenge participants.

  • Test-reserve (check overfitting)

– Used to estimate overfitting. Scores on this set are never released.

Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015

25

slide-26
SLIDE 26

Outline

Overview of Task and Dataset Overview of Challenge Winner Announcements Analysis of Results

slide-27
SLIDE 27

Challenge Stats

  • 40 teams
  • >=40 institutions*
  • >=8 countries*

*Statistics based on teams that have replied

slide-28
SLIDE 28

Challenge Runner-Ups

Joint Runner-Up Team 1: SNU-BI
Challenge Accuracy: 71.69

Jin-Hwa Kim (Seoul National University)
Jaehyun Jun (Seoul National University)
Byoung-Tak Zhang (Seoul National University & Surromind Robotics)

28

slide-29
SLIDE 29

Challenge Runner-Ups

Joint Runner-Up Team 2: HDU-UCAS-USYD
Challenge Accuracy: 71.91

Zhou Yu (Hangzhou Dianzi University, China)
Jun Yu (Hangzhou Dianzi University, China)
Chenchao Xiang (Hangzhou Dianzi University, China)
Jianping Fan (Hangzhou Dianzi University, China)
Dalu Guo (The University of Sydney, Australia)
Dacheng Tao (The University of Sydney, Australia)
Liang Wang (Hangzhou Dianzi University, China)
Qingming Huang (University of Chinese Academy of Sciences)

slide-30
SLIDE 30

Challenge Winner

FAIR-A*
Challenge Accuracy: 72.41

Yu Jiang† (Facebook AI Research)
Vivek Natarajan† (Facebook AI Research)
Xinlei Chen† (Facebook AI Research)
Dhruv Batra (Facebook AI Research & Georgia Tech)
Marcus Rohrbach (Facebook AI Research)
Devi Parikh (Facebook AI Research & Georgia Tech)

† equal contribution

30

slide-31
SLIDE 31

Outline

Overview of Task and Dataset Overview of Challenge Winner Announcements Analysis of Results

slide-32
SLIDE 32

Challenge Results

[Bar chart: overall accuracy per team, 60–74% range]

slide-34
SLIDE 34

Challenge Results

[Bar chart: accuracy of top teams, 67–73% range]

slide-35
SLIDE 35

Challenge Results

[Bar chart: accuracy of top teams, 67–73% range]

+3.4% absolute

slide-36
SLIDE 36

Statistical Significance

  • Bootstrap resampling, 5000 samples
  • 95% confidence intervals
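The significance test above can be sketched as a percentile bootstrap over per-question scores; the exact resampling procedure used by the organizers is an assumption here:

```python
import random

def bootstrap_ci(per_question_scores, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a team's mean
    accuracy: resample the questions with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(per_question_scores)
    means = sorted(
        sum(rng.choice(per_question_scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

Two teams can then be called significantly different when their 95% intervals do not overlap (one common convention; the organizers' exact criterion may differ).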
slide-37
SLIDE 37

Statistical Significance

[Bar chart: Overall Accuracy per team with 95% confidence intervals, 67–73% range]

slide-38
SLIDE 38

Easy vs. Difficult Questions

slide-39
SLIDE 39

Easy vs. Difficult Questions

[Histogram: percentage of questions (10–70%) vs. number of top-10 teams answering correctly (0/10 … 10/10)]
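The difficulty histogram can be reproduced from a boolean correctness matrix over questions and the top-10 teams; a sketch, where the matrix layout is an assumption:

```python
from collections import Counter

def difficulty_histogram(correct):
    """correct: one list per question of 10 booleans, one per top-10 team.
    Returns {k: % of questions answered correctly by exactly k teams}."""
    counts = Counter(sum(row) for row in correct)
    total = len(correct)
    return {k: 100.0 * counts.get(k, 0) / total for k in range(11)}
```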

slide-40
SLIDE 40

Easy vs. Difficult Questions

[Histogram: percentage of questions vs. number of top-10 teams answering correctly (0/10 … 10/10)]

82.5% of questions can be answered by at least 1 method!

Difficult Questions

slide-41
SLIDE 41

Easy vs. Difficult Questions

[Histogram: percentage of questions vs. number of top-10 teams answering correctly (0/10 … 10/10)]

Labels: Difficult Questions, Easy Questions

slide-42
SLIDE 42

Easy vs. Difficult Questions

[Histogram: percentage of questions vs. number of top-10 teams answering correctly (0/10 … 10/10), compared for 2016, 2017, and 2018]

slide-43
SLIDE 43

Difficult Questions with Rare Answers

slide-44
SLIDE 44

Difficult Questions with Rare Answers

What is the name of …
What is the number on …
What is written on the …
What does the sign …
What time is it?
What kind of …
What type of …
Why is the …

slide-45
SLIDE 45

Easy vs. Difficult Questions

slide-46
SLIDE 46

Easy vs. Difficult Questions

Labels: Difficult Questions with Frequent Answers, Easy Questions

slide-47
SLIDE 47

Answer Type Analyses

  • SNU-BI performs the best for “number” questions
slide-48
SLIDE 48

Results on “number” questions

[Bar chart: “number” accuracy per team, 30–60% range]

Teams shown: FAIR-A*, HDU-UCAS-USYD, SNU-BI, casia_iva, Tohoku CV Lab, MIL-UT, ut-swk, graph-attention-msm, DCD_ZJU, vqabyte, fs, UTS_YZZD, Adelaide-Teney, VQA-ReasonTensor, UPMC-LIP6, wyvernbai, caption_vqa, cvqa, nagizero, CFM-UESTC, VQA_NTU, yudf2010, nmlab612, TsinghuaCVLab, CIST-VQA, VLC Southampton, RelVQA, University of Guelph MLRG, NTU_ROSE_USTC, zhi-smile, VQA-Machine+, xie, Vardaan, HACKERS, AE-VQA, dandelin, ghost, VQA-Learning, vqa-suchow, HAIBIN, windLBL, VQA_San, vqateam_mcb_benchmark, akshay_isical

slide-49
SLIDE 49

Answer Type Analyses

  • SNU-BI performs the best for “number” questions
  • No team is statistically significantly better than the winning team for “yes/no” and “other” questions

slide-50
SLIDE 50

Are models sensitive to subtle changes in images?

[Figure: two similar images, same question “Who is wearing glasses?”, different answers: “woman” and “man”]

slide-51
SLIDE 51

Are models sensitive to subtle changes in images?

  • Are predictions different for complementary images?
  • Are predictions accurate for complementary images?
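Both questions can be answered with simple counts over the complementary pairs. A sketch, where a pair counts as accurate only when both images are answered correctly (my reading of the metric, not an official definition):

```python
def fraction_different(pred_pairs):
    """Share of complementary image pairs that receive two different answers."""
    return sum(p1 != p2 for p1, p2 in pred_pairs) / len(pred_pairs)

def pair_accuracy(pred_pairs, gt_pairs):
    """Share of pairs where the model answers both images correctly."""
    hits = sum(pred == gt for pred, gt in zip(pred_pairs, gt_pairs))
    return hits / len(pred_pairs)
```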
slide-52
SLIDE 52

Are predictions different for complementary images?

[Bar chart per team, 40–70% range]

slide-53
SLIDE 53

Are predictions accurate for complementary images?

[Bar chart per team, 40–60% range]

slide-54
SLIDE 54

Are predictions accurate for complementary images?

[Bar chart per team, 40–60% range]

52.7% (2017 winner); +4.8% absolute

slide-55
SLIDE 55

Are models driven by priors?

Only consider those questions whose answers are not popular (given the question type) in training

  • 1-Prior: Test answers are not the top-1 most common answer (for that question type) in training
  • 2-Prior: Test answers are not among the top-2 most common answers (for that question type) in training

Agrawal et al., CVPR 2018
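The k-Prior filter can be sketched by counting training answers per question type; question types and the exact popularity rule are simplified here relative to the protocol in Agrawal et al., CVPR 2018:

```python
from collections import Counter, defaultdict

def non_k_prior_mask(test_qs, train_qs, k):
    """test_qs / train_qs: lists of (question_type, answer) pairs.
    Marks test questions whose answer is NOT among the top-k most
    common training answers for that question type."""
    prior = defaultdict(Counter)
    for qtype, ans in train_qs:
        prior[qtype][ans] += 1
    return [
        ans not in {a for a, _ in prior[qtype].most_common(k)}
        for qtype, ans in test_qs
    ]
```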

slide-56
SLIDE 56

Are models driven by priors?

5-6% drop

[Bar chart per team, 50–75% range: All Questions vs. Non-1-Prior Questions]

slide-57
SLIDE 57

Are models driven by priors?

15-16% drop

[Bar chart per team, 40–75% range: All Questions vs. Non-2-Prior Questions]

slide-58
SLIDE 58

Are models driven by priors?

[Bar chart, 52–58% range]

slide-59
SLIDE 59

Improvement from 2017 challenge

  • 1-Prior: Best performance improved by 3.8%
  • 2-Prior: Best performance improved by 3.3%
slide-60
SLIDE 60

Are models compositional?

Only consider those questions which are compositionally novel:

  • QA pair is not seen in training
  • Constituting concepts seen in training

Agrawal et al., arXiv 2018
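The selection criterion above can be sketched as follows, treating question words plus the answer as the “constituting concepts” (a simplification of the protocol in Agrawal et al.):

```python
def is_compositionally_novel(question, answer, train_qa_pairs, train_concepts):
    """Novel iff the exact (question, answer) pair is unseen in training
    but every constituting concept (here: question words + answer) is seen."""
    if (question, answer) in train_qa_pairs:
        return False
    concepts = set(question.lower().split()) | {answer}
    return concepts <= train_concepts
```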

slide-61
SLIDE 61

Are models compositional?

slide-62
SLIDE 62

Are models compositional?

12-13% drop

[Bar chart per team, 40–75% range: All Questions vs. Compositionally Novel Questions]

slide-63
SLIDE 63

Are models compositional?

[Bar chart, 53–61% range]

56.5% (2017 winner); +3.4% absolute

slide-64
SLIDE 64

Are models compositional?

[Bar chart, 53–61% range]

slide-65
SLIDE 65

Average answer recall

  • New accuracy metric proposed in Kafle and Kanan, ICCV 2017
    – Also known as “normalized accuracy”
  • Method:
    – Compute accuracy for each unique answer
    – Take the mean over all unique answers
  • Rewards models which perform well for rare answers
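A sketch of the metric as described above, simplified to one ground-truth answer per question (the real evaluation scores each prediction against 10 human answers):

```python
from collections import defaultdict

def average_answer_recall(preds, gts):
    """Accuracy computed per unique ground-truth answer, then averaged,
    so rare answers weigh as much as frequent ones."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, gt in zip(preds, gts):
        total[gt] += 1
        correct[gt] += int(pred == gt)
    return sum(correct[a] / total[a] for a in total) / len(total)
```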
slide-66
SLIDE 66

Average answer recall

[Bar chart: average answer recall per team, 18–30% range]

slide-68
SLIDE 68

Progress in VQA

68

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018, 50–75% range; marked point: ICCV 15]

slide-69
SLIDE 69

Progress in VQA

69

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018; marked points: ICCV 15, 2016 Challenge winner]

slide-70
SLIDE 70

Progress in VQA

70

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018; marked points: ICCV 15, 2016 Challenge winner; +7.0% absolute improvement]

slide-71
SLIDE 71

Progress in VQA

71

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018; marked points: ICCV 15, 2016 Challenge winner, 2017 Challenge winner, Challenge 2017 deadline]

slide-72
SLIDE 72

Progress in VQA

72

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018; marked points: ICCV 15, 2016 Challenge winner, 2017 Challenge winner, Challenge 2017 deadline; +6.7% absolute improvement]

slide-73
SLIDE 73

Progress in VQA

73

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018; marked points: ICCV 15, 2016 Challenge winner, 2017 Challenge winner, Challenge 2018 deadline, 2018 Challenge winner]

slide-74
SLIDE 74

Progress in VQA

74

[Line plot: Accuracy on v2 over time, Dec 2015 – May 2018; marked points: ICCV 15, 2016 Challenge winner, 2017 Challenge winner, 2018 Challenge winner; +3.4% absolute improvement]

slide-75
SLIDE 75

Visual Dialog Challenge 2018

75

  • Deadline: mid-August, 2018
  • Results: September 8th, 2018 at ECCV 2018

visualdialog.org/challenge/2018

VisDial v1.0:
  • ~130k images (COCO)
  • 10-round dialog per image
  • ~1.3 million QA pairs
  • Evaluation: automatic metrics and human annotations

slide-76
SLIDE 76

Thanks! Questions?