Embodied Question Answering | NVIDIA GTC | March 26, 2018 | Abhishek Das



SLIDE 1

Embodied Question Answering

Abhishek Das

PhD student, Georgia Tech

NVIDIA GTC

March 26, 2018

SLIDE 2

Samyak Datta

Georgia Tech

Devi Parikh

FAIR/Georgia Tech

Stefan Lee

Georgia Tech

Georgia Gkioxari

FAIR

Dhruv Batra

FAIR/Georgia Tech

To appear in CVPR 2018 (Oral).

Embodied Question Answering

embodiedqa.org/paper.pdf

SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9

Forward

SLIDE 10

Forward

SLIDE 11

Turn Left

SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15

Q. What is to the left of the shower?  A. Cabinet

Slide credit: Devi Parikh

SLIDE 16

EmbodiedQA: AI Challenges

  • Language understanding
  • Visual understanding
  • Active perception
  • Common sense reasoning
  • Grounding into actions
  • Selective memory
  • Credit assignment

Slide credit: Devi Parikh

SLIDE 17

EmbodiedQA: Context

[Chart: tasks arranged on a Vision axis (Single Frame → Video) and a Language axis (Single-Shot QA → Dialog)]

Slide credit: Devi Parikh

SLIDE 18

EmbodiedQA: Context

[Chart: an Action axis (Passive → Active) is added to the Vision (Single Frame → Video) and Language (Single-Shot QA → Dialog) axes]

Slide credit: Devi Parikh

SLIDE 19

EmbodiedQA: Context

[Chart: VQA placed at Single Frame, Single-Shot QA, Passive]

VQA

  • Q. What is the mustache made of?

[Antol and Agrawal et al., ICCV 2015] [Malinowski et al., ICCV 2015] …

Slide credit: Devi Parikh

SLIDE 20

EmbodiedQA: Context

[Chart: VideoQA added at Video, Single-Shot QA, Passive]

VideoQA

[Ye et al., SIGIR 2017] [Jang et al., CVPR 2017]

  • Q. How many times does the cat touch the dog?
  • A. 4 times

Attribute: “dog”, “egg”, “bowl”, “woman”, “plate”

  • Q. What is a woman boiling in a pot of water?
  • A. Eggs

[Tapaswi et al., CVPR 2016] …

Slide credit: Devi Parikh

SLIDE 21

EmbodiedQA: Context

[Chart: Visual Dialog added at Single Frame, Dialog, Passive]

Visual Dialog

[Das et al., CVPR 2017] [Das and Kottur et al., ICCV 2017] …

Slide credit: Devi Parikh

SLIDE 22

EmbodiedQA: Context

[Chart: Embodied QA occupies the Active regime, beyond VQA, VideoQA, and Visual Dialog]

Embodied QA

  • Goal specified via reward
  • e.g., [Gupta et al., CVPR17, Zhu et al., ICCV17]
  • Goal specified via visual target
  • e.g., [Zhu et al., ICRA17]
  • Fully observable environment
  • e.g., [Wang et al., ACL16]
  • Recent
  • [Hermann et al., 2017, Chaplot et al., 2017]
  • More complex environments
  • Higher level tasks
  • [Anderson et al., CVPR18]
  • Interactive downstream tasks

Slide credit: Devi Parikh

SLIDE 23

EQA Dataset

  • Questions in environments

Slide credit: Devi Parikh

SLIDE 24

EQA Dataset

  • Questions in environments

Slide credit: Devi Parikh

SLIDE 25

EQA Dataset: Environments

Yi Wu (UC Berkeley), Yuxin Wu, Georgia Gkioxari, Yuandong Tian (Facebook AI Research)

House3D: A Rich and Realistic 3D environment

https://github.com/facebookresearch/House3D

Slide credit: Georgia Gkioxari

SLIDE 26

SUNCG dataset

[Song et al., CVPR 2017] Manually designed using an online interior design interface (Planner5D)

Slide credit: Georgia Gkioxari

SLIDE 27

45,622 indoor scenes · 404,058 rooms · 5,697,217 object instances · 2,644 unique objects · 80 object categories

SUNCG dataset

[Song et al., CVPR 2017] Manually designed using an online interior design interface (Planner5D)

Slide credit: Georgia Gkioxari

SLIDE 28
  • Collision and free space prediction
  • OpenGL
  • Linux/MacOS compatible

House3D

  • On Tesla M40 GPU (120x90 resolution)
  • 600fps single process
  • 1800fps multi process

Slide credit: Georgia Gkioxari

SLIDE 29

  • RGB image
  • Depth maps
  • Semantic segmentation masks
  • Top-down 2D views

House3D

Slide credit: Georgia Gkioxari

SLIDE 30

EQA Dataset: Environments

  • Subset of House3D: typical home environments
  • Realistic layout according to all three SUNCG annotators
  • Not too large or too small (300–800 m², covering 1/3rd of ground area)
  • Have at least one kitchen, living room, dining room, bedroom
  • Ignore obscure rooms (e.g., loggia) and tiny objects (e.g., light switches)

Slide credit: Devi Parikh

SLIDE 31

Rooms (12): gym, dining room, patio, living room, office, bathroom, lobby, bedroom, garage, elevator, kitchen, balcony

Homes (767): train: 643 homes, val: 67 homes, test: 57 homes

Objects (50): rug, piano, dryer, computer, fireplace, whiteboard, bookshelf, wardrobe, cabinet, pan, toilet, plates, ottoman, fish tank, dishwasher, microwave, water dispenser, bed, table, mirror, tv stand, stereo set, chessboard, playstation, vacuum cleaner, cup, xbox, heater, bathtub, shoe rack, range oven, refrigerator, coffee machine, sink, sofa, kettle, dresser, knife rack, towel rack, loudspeaker, utensil holder, desk, vase, shower, washer, fruit bowl, television, dressing tab., cutting board, ironing board, food processor

EQA Dataset: Environments

Test for generalization to novel environments!

Slide credit: Devi Parikh

SLIDE 32

Slide credit: Devi Parikh

EQA Dataset: Environments

[Examples: fish tank, piano, pedestal fan, candle, air conditioner; bedroom, kitchen, living room]

SLIDE 33

EQA Dataset

  • Questions in environments

Slide credit: Devi Parikh

SLIDE 34

EQA Dataset

  • Questions in environments

Slide credit: Devi Parikh

SLIDE 35

Slide credit: Devi Parikh

EQA Dataset: Questions

  • Programmatically generate questions and answers

location: What room is the <OBJ> located in?
color: What color is the <OBJ>?
color_room: What color is the <OBJ> in the <ROOM>?
preposition: What is <on/above/below/next-to> the <OBJ> in the <ROOM>?
existence: Is there a(n) <OBJ> in the <ROOM>?
logical: Is there a(n) <OBJ1> and a(n) <OBJ2> in the <ROOM>?
count: How many <OBJs> in the <ROOM>?
room_count: How many <ROOMs> in the house?
distance: Is the <OBJ1> closer to the <OBJ2> than to the <OBJ3> in the <ROOM>?

  • Skill combinations
  • Varying navigation and memory
  • …
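To make the generation step concrete, here is a minimal sketch of template-based question generation. The annotation format (a dict mapping room names to object lists) and the small template subset are illustrative assumptions, not the actual EQA pipeline.

```python
# Illustrative sketch only: the house annotation format and the
# template subset below are assumptions, not the real EQA code.
TEMPLATES = {
    "location":  "what room is the {obj} located in?",
    "existence": "is there a(n) {obj} in the {room}?",
    "count":     "how many {obj}s in the {room}?",
}

def generate_questions(house):
    """house: dict mapping room name -> list of object names.
    Returns (question_type, question, answer) triples."""
    questions = []
    for room, objects in house.items():
        for obj in set(objects):
            questions.append(
                ("location", TEMPLATES["location"].format(obj=obj), room))
            questions.append(
                ("existence", TEMPLATES["existence"].format(obj=obj, room=room), "yes"))
            questions.append(
                ("count", TEMPLATES["count"].format(obj=obj, room=room),
                 str(objects.count(obj))))
    return questions

qs = generate_questions({"kitchen": ["refrigerator", "sink"], "bedroom": ["bed"]})
```

Because questions and answers come from templates over known scene annotations, ground truth is free and the same generator scales to every environment.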

SLIDE 36

EQA Dataset: Questions

  • Programmatically generate questions and answers

Slide credit: Devi Parikh

SLIDE 37

EQA Dataset: Questions

  • Programmatically generate questions and answers

location: What room is the <OBJ> located in?
color: What color is the <OBJ>?
color_room: What color is the <OBJ> in the <ROOM>?
preposition: What is <on/above/below/next-to> the <OBJ> in the <ROOM>?
existence: Is there a(n) <OBJ> in the <ROOM>?
logical: Is there a(n) <OBJ1> and a(n) <OBJ2> in the <ROOM>?
count: How many <OBJs> in the <ROOM>?
room_count: How many <ROOMs> in the house?
distance: Is the <OBJ1> closer to the <OBJ2> than to the <OBJ3> in the <ROOM>?

EQA v1

Slide credit: Devi Parikh

SLIDE 38

EQA Dataset: Questions

  • Programmatically generate questions and answers

Questions (5281): train: 4246, val: 506, test: 529
Remove questions with peaky answer distributions

location: What room is the <OBJ> located in?
color: What color is the <OBJ>?
color_room: What color is the <OBJ> in the <ROOM>?
preposition: What is <on/above/below/next-to> the <OBJ> in the <ROOM>?
existence: Is there a(n) <OBJ> in the <ROOM>?
logical: Is there a(n) <OBJ1> and a(n) <OBJ2> in the <ROOM>?
count: How many <OBJs> in the <ROOM>?
room_count: How many <ROOMs> in the house?
distance: Is the <OBJ1> closer to the <OBJ2> than to the <OBJ3> in the <ROOM>?

EQA v1

Slide credit: Devi Parikh
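The "peaky answer distribution" filter can be sketched with normalized entropy: a template whose answers are nearly constant across environments (e.g., existence questions that are almost always "yes") is trivially answerable without navigating. The entropy criterion and the threshold value here are assumptions; the actual filter used for EQA v1 may differ.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Normalized entropy of an answer distribution: 0 = peaky, 1 = uniform."""
    counts = Counter(answers)
    n = len(answers)
    ent = -sum((c / n) * math.log(c / n) for c in counts.values())
    max_ent = math.log(len(counts)) if len(counts) > 1 else 1.0
    return ent / max_ent

def keep_question_type(answers, threshold=0.5):
    # Drop a template whose answers are too predictable. The threshold
    # is an assumed value for illustration.
    return answer_entropy(answers) >= threshold
```

For example, a color question whose answers spread evenly over many colors passes, while an existence question answered "yes" 99% of the time is dropped.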

SLIDE 39

EQA Dataset: Expert Demonstrations

  • Connected House3D to Amazon Mechanical Turk

Slide credit: Devi Parikh

SLIDE 40

EQA Dataset: Expert Demonstrations

Slide credit: Devi Parikh

SLIDE 41

Slide credit: Devi Parikh

SLIDE 42

Slide credit: Devi Parikh

SLIDE 43

EQA Dataset: Expert Demonstrations

  • Connected House3D to Amazon Mechanical Turk
  • Currently: demonstrations for 1162 questions across 70 environments
  • Can be used for training
  • Learn how to explore
  • Capture human common sense
  • Can serve as a performance reference

Slide credit: Devi Parikh

SLIDE 44

EQA Dataset: Expert Demonstrations

  • Connected House3D to Amazon Mechanical Turk
  • Currently: demonstrations for 1162 questions across 70 environments
  • Can be used for training
  • Learn how to explore
  • Capture human common sense
  • Can serve as a performance reference (see paper)

Slide credit: Devi Parikh

SLIDE 45

Model:

Vision, Language, Navigation, Answering

Slide credit: Devi Parikh

SLIDE 46

Encoder

Model:

Vision, Language, Navigation, Answering

[Encoder CNN diagram: 224×224 RGB input → Conv_1 (110×110×8) → Conv_2 (53×53×16) → Conv_3 (24×24×32) → Conv_4 (10×10×32); pretrained with autoencoder, semantic segmentation, and depth decoders]

Slide credit: Devi Parikh
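The shrinking spatial sizes in the encoder come from strided convolutions, and the standard output-size formula reproduces the pyramid approximately. The 5×5 kernel and stride 2 below are assumed values chosen to illustrate the arithmetic, not the paper's exact hyperparameters (which evidently differ slightly, since this choice yields 25 and 11 where the slide shows 24 and 10).

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed 5x5 kernels, stride 2, no padding: 224 -> 110 -> 53 -> 25 -> 11
sizes = [224]
for _ in range(4):
    sizes.append(conv_out(sizes[-1], kernel=5, stride=2))
```

Four stride-2 layers shrink the map by roughly 2× each time, which is why a 224×224 frame ends up as a small grid of feature vectors.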

SLIDE 47

Model:

Vision, Language, Navigation, Answering

Slide credit: Devi Parikh

SLIDE 48

Model:

Vision, Language, Navigation, Answering

  • Planner: direction or intention
  • Controller: velocity or primitive actions

[Controller outputs at each step: Repeat the action, or Stop and return control to the planner]

Slide credit: Devi Parikh
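The planner/controller split can be sketched as two nested loops: the planner emits an intention, and the controller repeats the corresponding primitive action until it decides to return control. The toy environment and scripted policies below are stand-ins for the learned modules, not the actual model.

```python
ACTIONS = ["forward", "turn-left", "turn-right", "stop"]

def run_episode(planner, controller, env, max_steps=100):
    """Outer loop: planner picks an action given the current frame;
    inner loop: controller repeats it until it signals RETURN (False)."""
    trajectory = []
    frame = env.observe()
    while len(trajectory) < max_steps:
        action = planner(frame)            # high-level intention
        if action == "stop":
            break
        while len(trajectory) < max_steps:
            trajectory.append(action)
            frame = env.step(action)       # execute one primitive
            if not controller(frame, action):
                break                      # RETURN control to the planner

    return trajectory

# Toy stand-ins: a scripted planner and a controller that repeats on odd frames.
class ToyEnv:
    def __init__(self):
        self.t = 0
    def observe(self):
        return self.t
    def step(self, action):
        self.t += 1
        return self.t

plans = iter(["forward", "turn-left", "stop"])
traj = run_episode(lambda f: next(plans), lambda f, a: f % 2 == 1, ToyEnv())
```

The hierarchy means the planner makes far fewer decisions than there are primitive steps, which shortens the credit-assignment horizon.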

SLIDE 49

Model:

Vision, Language, Navigation, Answering

[Unrolled planner–controller diagram: the planner (PLNR), conditioned on the question Q and CNN features of the current frame I_t, emits actions (FORWARD, TURN LEFT, TURN RIGHT, STOP); the controller (CTRL) repeats each action over subsequent frames until it RETURNs control to the planner]

Slide credit: Devi Parikh

SLIDE 50

Model:

Vision, Language, Navigation, Answering

Slide credit: Devi Parikh


SLIDE 55

Model:

Vision, Language, Navigation, Answering

Softmax over 172 answers

Slide credit: Devi Parikh

SLIDE 56

Training

  • Pre-train CNN
  • Supervised learning
  • “Expert demonstrations”: shortest path
  • Curriculum
  • Pre-train (and freeze) answering module
  • Reinforcement learning
  • REINFORCE
  • Terminal reward: answering accuracy
  • Intermediate reward shaping: progress towards target

Slide credit: Devi Parikh
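The reward structure for the RL stage can be sketched as follows. Here `dists` stands for the agent's distance to the target at each step; the terminal-reward magnitude and discount factor are assumed values for illustration, not the paper's exact settings.

```python
def shaped_rewards(dists, answered_correctly, terminal_reward=5.0):
    """Per-step rewards: the intermediate shaping reward is the progress
    made toward the target; the terminal reward is for answering correctly."""
    rewards = [dists[t] - dists[t + 1] for t in range(len(dists) - 1)]
    if answered_correctly:
        rewards[-1] += terminal_reward
    return rewards

def returns(rewards, gamma=0.99):
    """Discounted returns G_t that weight the REINFORCE log-prob gradients."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]
```

In REINFORCE, each action's log-probability gradient is scaled by its return, so steps that move toward the target (and episodes answered correctly) are reinforced.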

SLIDE 57

Baselines

Predict action from:
  • Reactive CNN: 5 images (+ question)
  • LSTM navigator: images (+ question) + previous action

Does memory help? Does the hierarchy help?

Slide credit: Devi Parikh

SLIDE 58

Metrics

  • Navigation:
  • Final distance to target
  • Improvement in distance to target
  • Min. distance to target during navigation
  • % ended in the right room
  • % entered the right room
  • % stopped
  • Answering:
  • Mean rank of ground truth answer
  • For varying initial positions
  • Q. What color is the fish tank/bowl in the living room?
  • A. Light blue

Slide credit: Devi Parikh
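The distance-based navigation metrics reduce to simple functions of the agent's distance-to-target sequence. A minimal sketch (the room-based and %-stopped metrics need extra annotations and are omitted; the metric names are shorthand for the bullets on this slide):

```python
def navigation_metrics(dists):
    """dists: distance (meters) from agent to target at each step,
    including the start and final positions."""
    return {
        "d_T":     dists[-1],             # final distance to target
        "d_delta": dists[0] - dists[-1],  # improvement in distance to target
        "d_min":   min(dists),            # minimum distance during navigation
    }

m = navigation_metrics([5.0, 3.0, 4.0])
```

Reporting all three matters: an agent can get close (`d_min`) but overshoot (`d_T`), and `d_delta` controls for how far away the episode started.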

SLIDE 59

Results*: Distance to target

[Bar chart: distance to target in meters (lower is better) for Reactive CNN, LSTM, Reactive CNN + Question, LSTM + Question, and our model]

* Preliminary, somewhat cherry-picked, see full results in paper

Slide credit: Devi Parikh

SLIDE 60

Results*: % stopped

[Bar chart: % stopped (higher is better) for Reactive CNN, LSTM, Reactive CNN + Question, LSTM + Question, and our model]

* Preliminary, somewhat cherry-picked, see full results in paper

Slide credit: Devi Parikh

SLIDE 61

Results*: Mean rank of true answer

[Bar chart: mean rank of the true answer (lower is better) for our model with and without RL finetuning]

* Preliminary, somewhat cherry-picked, see full results in paper

Slide credit: Devi Parikh

SLIDE 62
SLIDE 63
SLIDE 64

Summary

  • Embodied Question Answering (EmbodiedQA) – new AI task.
  • Navigate, gather information through active perception, answer the question

  • EQA v1 dataset on House3D environments

Slide credit: Devi Parikh

SLIDE 65

Summary

  • Embodied Question Answering (EmbodiedQA) – new AI task.
  • Navigate, gather information through active perception, answer the question

  • EQA v1 dataset on House3D environments
  • Human demonstrations
  • Hierarchical EmbodiedQA model

Slide credit: Devi Parikh

SLIDE 66

Summary

  • Embodied Question Answering (EmbodiedQA) – new AI task.
  • Navigate, gather information through active perception, answer the question

  • EQA v1 dataset on House3D environments
  • Human demonstrations
  • Hierarchical EmbodiedQA model
  • Imitation learning + RL finetuning
  • Only scratching the surface…

Slide credit: Devi Parikh

SLIDE 67

Thank you.

embodiedqa.org


Slide credit: Devi Parikh