Scene Navigation by Knowledge Graph and Interaction Mohammad - PowerPoint PPT Presentation

Scene Navigation by Knowledge Graph and Interaction Mohammad Rastegari ICCV, Oct, 2019

Task: Navigate to Television
Example episode (figure): Move Forward → Move Forward → Rotate Right → Done

• 120 Scenes
• Room types
  • Kitchen
  • Living room
  • Bedroom
  • Bathroom
• Each room class has 30 scenes
  • Training: 20 rooms/class
  • Testing: 5 rooms/class

Challenges
• Normally we relocate a seen object in a seen scene
• The main challenges are:
  • Generalizing to unseen scenes
  • Generalizing to unseen objects

Using Prior Knowledge (figure labels: Apple, Coffee machine, Cup, Mango)

Knowledge Graph

Scene Prior (figure: a graph of objects linked by relations such as "next to" and "on"; node labels include Plate, Table, Sandwich, Sink, Painting, Remote, Coffee Machine, Cabinet, TV, Mug, Bowl, Counter, Microwave, Laptop, Box, Toaster)

Scene Prior Graph (figure: Remote, next to, Television)
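The scene prior on this slide can be read as a graph whose nodes are object categories and whose edges are spatial relations. Below is a minimal Python sketch of such a structure. It assumes relations are symmetric, and the helper names `build_adjacency` and `neighbors` are hypothetical; only the Remote / next to / Television edge comes from the slide.

```python
from collections import defaultdict

# (subject, relation, object) triples; only this edge appears on the slide.
EDGES = [("Remote", "next to", "Television")]

def build_adjacency(edges):
    """Return {node: {neighbor: relation}}, treating every relation as symmetric (an assumption)."""
    adj = defaultdict(dict)
    for a, rel, b in edges:
        adj[a][b] = rel
        adj[b][a] = rel
    return dict(adj)

def neighbors(adj, node):
    """Objects linked to `node`, sorted for determinism; empty list for unknown nodes."""
    return sorted(adj.get(node, {}))

adj = build_adjacency(EDGES)
print(neighbors(adj, "Television"))
```

Looking up the neighbors of the target is how a prior like this could hint that finding a Remote makes a Television likely to be nearby.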

Architecture Flow (figure components): Environment; History frames; ResNet-50 with FC (512); Word Embedding of the target ("Television") with FC (512); Graph Convolutional Network over the scene prior graph (Remote, next to, Television) with FC (512); Joint Embedding; Actor-Critic Model (MLP) with Policy and Value outputs; Action Sampler

Architecture Flow with Scene Prior Graph (figure components): Environment; History frames; ResNet-50 with FC (512); Word Embedding of the target ("Television") with FC (512); Graph Convolutional Network over the scene prior graph (Remote, next to, Television) with FC (512); Joint Embedding; Actor-Critic Model (MLP) with Policy and Value outputs; Action Sampler

Graph Convolutional Network (GCN)
$H^{(l+1)} = f(\hat{A} H^{(l)} W^{(l)})$
• $\hat{A}$: Normalized adjacency matrix
• $H^{(l)}$: Node features at the $l$-th layer
• $W^{(l)}$: Learnable parameters at the $l$-th layer
• $f$: Activation function (e.g. ReLU)
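The layer rule on this slide (multiply node features by the normalized adjacency matrix, then by the learnable weights, then apply the activation) can be sketched in plain Python. The slide only says the adjacency matrix is normalized; the symmetric form $D^{-1/2}(A + I)D^{-1/2}$ used here is an assumption, and the two-node graph, features, and weights are toy values.

```python
import math

def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def normalize_adjacency(A):
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

def relu(X):
    return [[max(0.0, v) for v in row] for row in X]

def gcn_layer(A_hat, H, W):
    """One layer: H_next = f(A_hat H W) with f = ReLU."""
    return relu(matmul(matmul(A_hat, H), W))

# Toy graph with two connected nodes (e.g. Remote and Television).
A = [[0.0, 1.0], [1.0, 0.0]]
A_hat = normalize_adjacency(A)
H0 = [[1.0, 0.0], [0.0, 1.0]]      # one-hot node features (toy values)
W0 = [[1.0, -1.0], [2.0, 0.5]]     # toy weights
H1 = gcn_layer(A_hat, H0, W0)
print(H1)
```

Because the two nodes are connected, each output row mixes both nodes' input features, which is how information about related objects propagates through the graph.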

GCN for Scene Navigation (figure components): graph nodes labeled with object words such as "Fridge" and "Toaster"; 1000-class score from ResNet-50; FC (512); two 512-d features joined by concat; 3 Layers
The knowledge graph is updated over time according to recent observations.

Action Space
• Move Ahead
• Move Back
• Rotate Right
• Rotate Left
• Stop
We include the Stop action and expect the agent to issue it when it reaches the target. This makes learning more challenging.
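To illustrate why requiring an explicit Stop makes the task harder, here is a small Python sketch of the two success criteria. The slide does not define how "reaching the target" is measured, so the per-step `at_target` flags are a hypothetical stand-in, and the function names are illustrative.

```python
from enum import Enum

class Action(Enum):
    MOVE_AHEAD = "Move Ahead"
    MOVE_BACK = "Move Back"
    ROTATE_RIGHT = "Rotate Right"
    ROTATE_LEFT = "Rotate Left"
    STOP = "Stop"

def success_with_stop(actions, at_target):
    """Success only if the first Stop is issued while the agent is at the target.

    at_target[t] is True when the agent is at the target as it chooses
    actions[t] (a hypothetical signal; the slide does not specify it).
    """
    for action, reached in zip(actions, at_target):
        if action is Action.STOP:
            return reached
    return False  # the agent never stopped

def success_without_stop(at_target):
    """Without a Stop action, touching the target at any step counts."""
    return any(at_target)

walk_past = [Action.MOVE_AHEAD, Action.MOVE_AHEAD, Action.ROTATE_RIGHT, Action.MOVE_AHEAD]
stops_there = [Action.MOVE_AHEAD, Action.MOVE_AHEAD, Action.ROTATE_RIGHT, Action.STOP]
flags = [False, False, False, True]

print(success_without_stop(flags), success_with_stop(walk_past, flags))
print(success_with_stop(stops_there, flags))
```

An agent that merely passes the target succeeds under the first criterion but fails under the second, so with Stop the agent must also recognize that it has arrived.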

Seen Scenes, Novel Objects

Bedroom | Mirror

Living room | Painting

Kitchen | Toaster

Kitchen | Microwave

Unseen Scenes, Known Objects

Bathroom | Soap

Bedroom | Lamp

Bedroom | Light Switch

Kitchen | Cabinet

Unseen Scenes, Novel Objects

Bathroom | Towel

Kitchen | Microwave

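Evaluation Metrics
• Success Rate (SR)
  • The ratio of successful navigations toward the object over N episodes
• Success weighted by Path Length (SPL)
  • The ratio of successful navigations toward the object, weighted by path length, over N episodes; it considers both Success Rate and path length
  • $\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \frac{L_i}{\max(P_i, L_i)}$, where $N$ is the number of episodes, $S_i$ indicates success in episode $i$, $P_i$ is the length of the path taken, and $L_i$ is the shortest path length to the target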
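Both metrics are simple to compute from per-episode records. The sketch below follows the commonly used SPL definition, $\frac{1}{N}\sum_i S_i \, L_i / \max(P_i, L_i)$, and assumes every shortest path length is positive; the function names and the three toy episodes are illustrative only.

```python
def success_rate(successes):
    """Fraction of episodes that succeeded (successes holds 0/1 flags)."""
    return sum(successes) / len(successes)

def spl(successes, path_lengths, shortest_lengths):
    """Success weighted by Path Length.

    path_lengths[i] is the length P_i of the path the agent took and
    shortest_lengths[i] is the optimal length L_i (assumed > 0).
    """
    total = 0.0
    for s, p, l in zip(successes, path_lengths, shortest_lengths):
        if s:
            total += l / max(p, l)
    return total / len(successes)

# Three toy episodes: a detour, an optimal path, and a failure.
S = [1, 1, 0]
P = [10.0, 5.0, 8.0]
L = [5.0, 5.0, 4.0]
print(success_rate(S), spl(S, P, L))
```

The first episode succeeds but takes twice the optimal distance, so it contributes only 0.5 to SPL, which is why SPL is never higher than SR.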

(SPL / SR) without STOP action (250 episodes)

Setting                        Method  Kitchen      Living room  Bedroom      Bathroom     Avg.
Seen scenes, Known objects     Random  17.9 / 33.1  12.1 / 30.5  16.8 / 51.2  24.5 / 34.6  17.8 / 37.3
                               A3C     79.9 / 86.7  38.8 / 57.6  87.8 / 89.5  93.7 / 96.6  75.0 / 82.5
                               Ours    83.5 / 88.2  46.4 / 64.4  90.6 / 92.7  93.6 / 96.5  78.5 / 85.5
Seen scenes, Novel objects     Random  10.0 / 23.1   8.0 / 18.5  17.3 / 35.2  11.2 / 32.2  11.6 / 27.2
                               A3C     20.2 / 38.8  24.2 / 46.5  23.5 / 35.8  50.2 / 74.6  29.5 / 48.9
                               Ours    22.9 / 53.6  39.5 / 66.5  26.1 / 38.9  50.5 / 78.6  34.7 / 59.4
Unseen scenes, Known objects   Random  27.3 / 45.2   5.6 / 16.6  13.1 / 34.5  36.0 / 49.1  20.5 / 36.3
                               A3C     39.5 / 56.2  12.0 / 31.8  22.5 / 49.2  47.4 / 60.2  30.3 / 49.3
                               Ours    46.2 / 62.5  13.8 / 40.6  26.5 / 58.6  51.5 / 65.8  34.5 / 56.9
Unseen scenes, Novel objects   Random  21.3 / 44.3   3.3 / 22.9  25.8 / 47.8  25.5 / 48.9  19.0 / 41.0
                               A3C     26.1 / 56.3   9.4 / 25.1  28.2 / 54.0  33.8 / 90.7  24.4 / 56.5
                               Ours    38.5 / 62.5  13.7 / 40.3  30.1 / 63.1  39.2 / 93.6  30.4 / 64.9

Table 2: Results without termination (stop) action. SPL / Success rate (%) is shown. We compare …

(SPL / SR) with STOP action

Setting                        Method  Kitchen      Living room  Bedroom      Bathroom     Avg.
Seen scenes, Known objects     Random   2.4 /  3.5   1.1 /  1.7   1.8 /  2.7   3.2 /  4.8   2.1 /  3.1
                               A3C     38.5 / 51.0   9.7 / 15.1   6.8 / 11.5  69.1 / 81.0  31.1 / 39.6
                               Ours    58.6 / 72.7  12.4 / 18.6  41.6 / 52.4  71.3 / 83.0  46.0 / 56.7
Seen scenes, Novel objects     Random   0.9 /  1.3   0.8 /  1.2   2.3 /  3.4   1.4 /  2.1   1.4 /  2.0
                               A3C      2.1 /  4.9   3.2 /  4.8   0.5 /  1.7  17.1 / 28.5   5.7 /  9.9
                               Ours     3.2 /  6.1   9.8 / 16.2   6.2 /  8.6  24.7 / 37.3  11.0 / 17.1
Unseen scenes, Known objects   Random   4.1 /  5.9   0.9 /  1.3   1.6 /  2.4   4.2 /  6.2   2.7 /  3.9
                               A3C     11.5 / 18.8   0.5 /  2.5   2.2 /  3.8   8.6 / 18.7   5.7 / 10.4
                               Ours    12.7 / 20.5   1.0 /  4.0   4.5 / 11.0   8.7 / 21.1   6.7 / 13.4
Unseen scenes, Novel objects   Random   2.0 /  2.8   0.6 /  1.0   2.0 /  2.8   2.7 /  3.9   1.8 /  2.6
                               A3C      2.2 /  7.5   2.5 /  4.4   1.3 /  4.4   3.4 /  9.3   2.4 /  5.9
                               Ours     3.3 / 12.7   2.8 /  5.3   2.0 /  6.3   4.1 / 12.2   3.1 /  8.5

Table 1: Results using termination (stop) action. SPL / Success rate (%) is shown. We compare …

Traditional Training | Learning to Adapt
Traditional Inference | Adaptation During Inference

Initial Model Parameters → Initialize Model → Take k steps → Compute Self-Supervised Interaction Loss → Compute Adapted Parameters → Complete Navigation Episode → Compute Supervised Navigation Loss → Backprop to Update Initialization
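The flow on this slide (adapt the parameters with a self-supervised interaction loss, then backpropagate the supervised navigation loss to update the initialization) resembles gradient-based meta-learning. The toy sketch below is not the presented model: it assumes a single scalar parameter, abstracts the k interaction steps into one quadratic interaction loss, and uses made-up targets `a` and `b` and step sizes `alpha` and `beta` purely for illustration.

```python
def interaction_grad(theta, a):
    """Gradient of the toy interaction loss (theta - a)^2."""
    return 2.0 * (theta - a)

def adapt(theta, a, alpha):
    """Compute adapted parameters with one inner gradient step on the interaction loss."""
    return theta - alpha * interaction_grad(theta, a)

def navigation_loss(theta_adapted, b):
    """Toy supervised navigation loss, evaluated at the adapted parameters."""
    return (theta_adapted - b) ** 2

def meta_grad(theta, a, b, alpha):
    """Gradient of the navigation loss w.r.t. the initialization, through the inner step."""
    theta_adapted = adapt(theta, a, alpha)
    d_adapted_d_theta = 1.0 - 2.0 * alpha  # derivative of adapt() w.r.t. theta
    return 2.0 * (theta_adapted - b) * d_adapted_d_theta

def train_initialization(theta, a, b, alpha=0.1, beta=0.1, iterations=200):
    """Backprop to update the initialization so that adaptation lands near b."""
    for _ in range(iterations):
        theta -= beta * meta_grad(theta, a, b, alpha)
    return theta

before = navigation_loss(adapt(0.0, 1.0, 0.1), 3.0)
theta0 = train_initialization(theta=0.0, a=1.0, b=3.0)
after = navigation_loss(adapt(theta0, 1.0, 0.1), 3.0)
print(before, after)
```

The learned initialization is not the one that minimizes the navigation loss directly; it is the one from which a self-supervised adaptation step lands on a good navigation solution, which is what allows adaptation to continue at inference time without supervision.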

Learning to Learn how to Learn (figure labels: Inference; Navigation Gradient (supervised); Learned Interaction Gradient (self-supervised))

Initial Model Parameters → Initialize Model → Take k steps → Compute Self-Supervised Interaction Loss via Neural Network (Loss Parameters) → Compute Adapted Parameters → Complete Navigation Episode → Compute Supervised Navigation Loss

Model (figure components): Current observation Image; ResNet18 (Frozen); Pointwise Conv; Target Object Class (e.g. Laptop); GloVe Embedding; FC; Tile; Concatenated Feature; LSTM; policy over actions (e.g. Turn Left, Look Down, Move Forward); Concatenated policy and hidden states; 1D Temporal Conv
Legend: Forward Pass; Navigation-Gradient (Training only); Interaction-Gradient (Training and Inference)

Results (figure: SPL and Success for Baseline, Handcrafted Loss, and Learned Loss)
• Training Scenes: 80
• Validation Scenes: 20
• Test Scenes: 20
• Equal split of Kitchen, Living Room, Bedroom, Bathroom

Goal: Navigate to Book

Thank you !!!!!
