Ph.D. Student Machine Learning and Perception Lab 2 sky stop - PowerPoint PPT Presentation

Aishwarya Agrawal, Ph.D. Student, Machine Learning and Perception Lab

Identify objects in scene: sky, stop light, building, bus, car, person, sidewalk 3

Identify attributes of objects: blue sky, green stop light, tall building, many cars, red bus, one bicycle 4

Identify activities in scene: man walking on sidewalk; person wearing a helmet riding bicycle 5

street scene Identify the scene 6

A person on bike going through green light with bus nearby Describe the scene 8

A giraffe standing in the grass next to a tree. 11

• Answer questions about the scene
– Q: How many buses are there?
– Q: What is the name of the street?
– Q: Is the man on bicycle wearing a helmet?
13

Visual Question Answering (VQA) Task: Given an image and a natural language open-ended question, generate a natural language answer. 15

VQA Task 16

VQA CloudCV Demo cloudcv.org/vqa/?useVoice=1&listenAnswer=1 17

Applications of VQA • An aid to the visually impaired: “Is it safe to cross the street now?” 18

Applications of VQA • Surveillance: “What kind of car did the man in red shirt leave in?” 19

Applications of VQA • Interacting with a robot: “Is my laptop in my bedroom upstairs?” 20

VQA Dataset 21

Real images (from MSCOCO) Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in COntext.” ECCV 2014. http://mscoco.org/ 22

Questions Stump a smart robot! Ask a question that a human can answer, but a smart robot probably can’t! 23

Two modalities of answering • Open Ended • Multiple Choice 24

Open Ended Task What is the girl holding in her hand? How many mirrors? Why is the girl holding an umbrella? 25

Multiple Choice Task
What is the bus number?
a) 3   b) 1   c) green   d) 4   e) window trim   f) blue
g) m5   h) corn, carrots, onions, rice   i) red   j) 125   k) san antonio   l) sign pen
m) 478   n) no   o) 25   p) 2   q) yes   r) white
26
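The difference between the two answering modalities can be sketched in a few lines. This is only an illustration: the `scores` dictionary is hypothetical, and the rule "pick the best-scoring listed option" is an assumption about how a model might handle the multiple-choice task, not something stated on the slide.

```python
def answer_open_ended(scores):
    """Pick the highest-scoring answer from the whole answer vocabulary."""
    return max(scores, key=scores.get)


def answer_multiple_choice(scores, options):
    """Pick the highest-scoring answer among the listed options only.

    Options the model has no score for are treated as impossible.
    """
    return max(options, key=lambda option: scores.get(option, float("-inf")))


# Hypothetical model scores for the question "What is the bus number?"
scores = {"yes": 0.30, "2": 0.25, "white": 0.20, "125": 0.15, "478": 0.10}
# A subset of the options shown on the slide, chosen for illustration
options = ["3", "1", "green", "4", "125", "478", "no", "25"]

open_ended = answer_open_ended(scores)
multiple_choice = answer_multiple_choice(scores, options)
print(open_ended, multiple_choice)
```

With these made-up scores the open-ended answer is "yes", while restricting the choice to the listed options yields "125". A restriction of this kind is one plausible reason multiple-choice accuracy can exceed open-ended accuracy.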

Dataset Stats
• >250K images (MSCOCO + 50K Abstract Scenes)
• >750K questions (3 per image)
• ~10M answers (per question: 10 given with the image shown + 3 given without the image)
27
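These counts are consistent with one another. A quick arithmetic check, assuming the rounded lower bound of 250K images and that every question received all 13 answers:

```python
# Rough consistency check of the dataset statistics (rounded slide figures).
images = 250_000                  # lower bound quoted on the slide (">250K")
questions_per_image = 3
answers_per_question = 10 + 3     # 10 with the image shown + 3 without it

questions = images * questions_per_image
answers = questions * answers_per_question

print(f"questions: {questions:,}")   # 750,000
print(f"answers:   {answers:,}")     # 9,750,000, i.e. roughly 10M
```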

Please visit www.visualqa.org for more details. 28

Browse the Dataset http://visualqa.org/browser/ 29

Questions 30

Dataset Visualization http://visualqa.org/visualize/ 32

Answers
• 38.4% of questions are binary yes/no
• 98.97% of questions have answers of 3 words or fewer
– 23k unique one-word answers
33
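Statistics of this kind are straightforward to compute from a list of answers. The sketch below uses a handful of made-up answers (one per question) and a hypothetical helper `answer_stats`; it does not reproduce the dataset figures.

```python
def answer_stats(answers):
    """Return (yes/no share, share with <= 3 words, number of unique one-word answers)."""
    normalized = [a.strip().lower() for a in answers]
    total = len(normalized)
    binary_share = sum(a in {"yes", "no"} for a in normalized) / total
    short_share = sum(len(a.split()) <= 3 for a in normalized) / total
    unique_one_word = {a for a in normalized if len(a.split()) == 1}
    return binary_share, short_share, len(unique_one_word)


# Toy answers, invented for illustration only
toy_answers = [
    "yes", "no", "2", "red", "Yes",
    "on the table", "playing frisbee in the park",
]
binary_share, short_share, unique_one_word = answer_stats(toy_answers)
print(f"{binary_share:.1%} yes/no, {short_share:.1%} with <= 3 words, "
      f"{unique_one_word} unique one-word answers")
```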

Answers 34

2-Channel VQA Model
[Diagram] Image channel: Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim image embedding. Question channel: “How many horses are in this image?” → 1024-dim question embedding. A neural network takes the embeddings and ends in a softmax over the top K answers.
36
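A forward pass through a model of this shape can be sketched in plain Python. Only the embedding sizes (4096 and 1024) and the 1k output units come from the slides. Everything else is an assumption made for illustration: the two embeddings are fused by concatenation, there is a single tanh hidden layer of hypothetical size 16, and the weights and inputs are random rather than trained or extracted from real data.

```python
import math
import random

IMAGE_DIM = 4096      # image embedding size from the slide
QUESTION_DIM = 1024   # question embedding size from the slide
HIDDEN_DIM = 16       # hypothetical, kept small so pure Python stays fast
TOP_K = 1000          # "1k output units" on the Ablation #1 slide

rng = random.Random(0)


def random_matrix(rows, cols, scale=0.01):
    """Random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


W_hidden = random_matrix(HIDDEN_DIM, IMAGE_DIM + QUESTION_DIM)
W_out = random_matrix(TOP_K, HIDDEN_DIM)


def predict(image_embedding, question_embedding):
    """Return a probability for each of the top K answers."""
    fused = image_embedding + question_embedding   # concatenation (assumed fusion)
    hidden = [math.tanh(h) for h in linear(W_hidden, fused)]
    return softmax(linear(W_out, hidden))


# Random stand-ins for real image and question embeddings
image_embedding = [rng.random() for _ in range(IMAGE_DIM)]
question_embedding = [rng.random() for _ in range(QUESTION_DIM)]

probs = predict(image_embedding, question_embedding)
best = max(range(TOP_K), key=probs.__getitem__)
print(f"most likely answer index: {best}")
```

Treating answering as classification over the K most frequent answers keeps the output layer fixed in size, at the cost of never producing an answer outside that list.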

Ablation #1: Language-alone
[Diagram] Same architecture as the 2-channel model, with only the question channel labeled with a dimension: “How many horses are in this image?” → 1024-dim question embedding → neural network → softmax over top K answers (1k output units).
37

Ablation #2: Vision-alone
[Diagram] Same architecture as the 2-channel model, with only the image channel labeled with a dimension: Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim image embedding → neural network → softmax over top K answers.
38

Accuracy Metric 39
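The formula shown on this slide is not preserved in the text. The VQA paper (Antol et al., ICCV 2015) scores a predicted answer as min(number of humans who gave that answer / 3, 1), so an answer counts as fully correct once at least three of the ten annotators agree with it. A minimal sketch of that rule, assuming answers are compared after simple trimming and lowercasing (the official evaluation script applies more normalization than this):

```python
def normalize(answer):
    """Simplified normalization; an assumption made for this sketch."""
    return answer.strip().lower()


def vqa_accuracy(predicted, human_answers):
    """Score one prediction against the human answers for a question."""
    target = normalize(predicted)
    matches = sum(normalize(a) == target for a in human_answers)
    return min(matches / 3, 1.0)


# Ten made-up human answers to "How many buses are there?"
human_answers = ["2", "2", "two", "2", "3", "2", "2", "2", "3", "2"]

for guess in ["2", "3", "two", "5"]:
    print(guess, round(vqa_accuracy(guess, human_answers), 2))
```

With these made-up answers, "2" scores 1.0 (seven matches), "3" scores about 0.67 (two matches), "two" about 0.33 (one match), and "5" scores 0. The partial credit lets the metric tolerate disagreement among annotators.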

Open-Ended Task Accuracies: Human vs. Machine performance
[Chart] Human and machine accuracy on the open-ended task; the gap between them is labeled “room for improvement”: 25.14.
40

Results (code available!)
• Multiple-Choice > Open-Ended
• Question alone does quite well
• Image helps
41

Commonsense • Does this person have 20/20 vision? 42

Does this question need commonsense? Q: How many calories are in this pizza? 43

How old does a person need to be? Q: How many calories are in this pizza? 44

Most “commonsense” questions 45

Least “commonsense” questions 46

Spectrum
• 3-4 (15.3%)
– Is that a bird in the sky?
– What color is the shoe?
– How many zebras are there?
– Is there food on the table?
– Is this man wearing shoes?
• 5-8 (39.7%)
– How many pizzas are shown?
– What are the sheep eating?
– What color is his hair?
– What sport is being played?
– Name one ingredient in the skillet.
• 9-12 (28.4%)
– Where was this picture taken?
– What ceremony does the cake commemorate?
– Are these boats too tall to fit under the bridge?
– What is the name of the white shape under the batter?
– Is this at the stadium?
• 13-17 (11.2%)
– Is he likely to get mugged if he walked down a dark alleyway like this?
– Is this a vegetarian meal?
– What type of beverage is in the glass?
– Can you name the performer in the purple costume?
– Besides these humans, what other animals eat here?
• 18+ (5.5%)
– What type of architecture is this?
– Is this a Flemish bricklaying pattern?
– How many calories are in this pizza?
– What government document is needed to partake in this activity?
– What is the make and model of this vehicle?
47

Question       Average Age
what brand     12.5
why            11.18
what type      11.04
what kind      10.55
is this        10.13
what does      10.06
what time      9.81
who            9.58
where          9.54
which          9.32
does           9.29
do             9.23
what is        9.11
what are       9.04
are            8.65
is the         8.52
is there       8.24
what sport     8.06
how many       7.67
what animal    6.74
what color     6.6
48

VQA Age
• Average “age of questions” = 8.98 years.
• Our model* = 4.74 years old!
* age as estimated by untrained crowd-sourced workers
49

VQA Common sense
• Average common sense required = 31%.
• Our best algorithm has* 17% common sense!
* as estimated by untrained crowd-sourced workers
50

VQA Challenges on www.codalab.org 51

VQA Challenge @ CVPR16 52

VQA Challenge @ CVPR16 code available! 53

VQA Workshop @ CVPR16 54

Papers using VQA … and many more 55

• Dataset: >1k downloads
• Code: >1.5k views
• Academia, industry, start-ups
56

Conclusions
• VQA: Visual Question Answering
– The next “grand challenge” in vision, language, AI
• Spectrum: Easy to Difficult
– “What room is this?” → Scene Recognition
– “How many …” → Object Recognition
– …
– “Does this person have 20/20 vision” → Common sense
• Exciting times ahead!
57

VQA Team
• Jiasen Lu (Virginia Tech)
• Akrit Mohapatra (Virginia Tech)
• Aishwarya Agrawal (Virginia Tech)
• Stanislaw Antol (Virginia Tech)
• Meg Mitchell (Microsoft Research)
• Larry Zitnick (Facebook AI Research)
• Dhruv Batra (Virginia Tech)
• Devi Parikh (Virginia Tech)
Webmaster
58

Closing Remarks
• CloudCV VQA Exhibition: Booth 101
• Contact email: aish@vt.edu
• Please complete the Presenter Evaluation sent to you by email or through the GTC Mobile App. Your feedback is important!
59

Thanks! Questions? 60

Visual Question Answering (VQA) 61
