Scene Navigation by Knowledge Graph and Interaction Mohammad - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Scene Navigation by Knowledge Graph and Interaction

Mohammad Rastegari ICCV, Oct, 2019

slide-2
SLIDE 2

Task

Navigate to Television

[Figure: sequence of agent views, each labeled "Television"; action labels: Move Forward, Move Forward, Rotate Right, …, Done]

slide-3
SLIDE 3
  • 120 Scenes
  • Room types
      • Kitchen
      • Living room
      • Bedroom
      • Bathroom
  • Each room class has 30 scenes
      • Training: 20 rooms/class
      • Testing: 5 rooms/class
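
The split arithmetic on this slide can be checked with a short sketch. The constant and function names below are illustrative assumptions, not part of the original deck. Note that 20 + 5 leaves 5 scenes per class unaccounted for; the slide does not say how they are used, so the sketch only reports them as unassigned.

```python
# Scene counts as stated on the slide; all names here are hypothetical.
ROOM_TYPES = ["Kitchen", "Living room", "Bedroom", "Bathroom"]
SCENES_PER_CLASS = 30
TRAIN_PER_CLASS = 20
TEST_PER_CLASS = 5


def split_counts():
    """Return total, train, test, and unassigned scene counts."""
    n_classes = len(ROOM_TYPES)
    total = n_classes * SCENES_PER_CLASS
    train = n_classes * TRAIN_PER_CLASS
    test = n_classes * TEST_PER_CLASS
    # Scenes the slide does not assign to either split.
    unassigned = total - train - test
    return {"total": total, "train": train, "test": test, "unassigned": unassigned}


if __name__ == "__main__":
    print(split_counts())
```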
slide-4
SLIDE 4

Challenges

  • Normally we relocate a seen object in a seen scene
  • The main challenges are:
  • Generalizing to unseen scene
  • Generalizing to unseen object
slide-5
SLIDE 5

Using Prior Knowledge

[Figure: example object pairs: Coffee machine and Cup; Apple and Mango]

slide-6
SLIDE 6

Knowledge Graph

slide-7
SLIDE 7

Scene Prior

[Figure: knowledge graph over object categories. Nodes: Mug, Plate, Sink, Cabinet, Bowl, Laptop, Toaster, Microwave, Table, Coffee Machine, Sandwich, TV, Remote, Counter, Box, Painting. Edge labels: next to, on, next to/on.]

slide-8
SLIDE 8

Scene Prior Graph

[Figure: example graph edge: Remote, next to, Television]

slide-9
SLIDE 9

Architecture Flow

[Figure: architecture diagram. Labels: History frames, ResNet-50, FC (512); “Television”, Word Embedding, FC (512); scene prior graph (Remote, next to, Television), Graph Convolutional Network, FC (512); Joint Embedding; Actor-Critic Model (MLP) with Policy and Value outputs; Action Sampler; Environment.]
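
As a rough illustration of the "Policy" to "Action Sampler" step in the diagram, the sketch below turns policy scores into a probability distribution with a softmax and samples one action. This is an assumption about how the sampler works, not something the slide specifies; the function names are hypothetical, and the action names are borrowed from the deck's Action Space slide.

```python
import math
import random

# Action names taken from the Action Space slide of this deck.
ACTIONS = ["Move Ahead", "Move Back", "Rotate Right", "Rotate Left", "Stop"]


def softmax(logits):
    """Convert raw policy scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample one action name according to the softmax of the policy scores."""
    probs = softmax(logits)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_action([0.1, -0.3, 0.5, 0.2, -1.0], rng))
```

Sampling, rather than always taking the highest-scoring action, is the usual way an actor-critic agent explores during training.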

slide-10
SLIDE 10

Architecture Flow with Scene Prior Graph

[Figure: architecture diagram. Labels: History frames, ResNet-50, FC (512); “Television”, Word Embedding, FC (512); scene prior graph (Remote, next to, Television), Graph Convolutional Network, FC (512); Joint Embedding; Actor-Critic Model (MLP) with Policy and Value outputs; Action Sampler; Environment.]

slide-12
SLIDE 12

Graph Convolutional Network (GCN)

$$H^{(l+1)} = f\left(\hat{A} H^{(l)} W^{(l)}\right)$$

  • $\hat{A}$: Normalized Adjacency Matrix
  • $H^{(l)}$: Node features at the $l$th layer
  • $W^{(l)}$: Learnable parameters at the $l$th layer
  • $f$: Activation Function (e.g. ReLU)
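
A minimal sketch of one GCN layer, H(l+1) = f(Â H(l) W(l)), in plain Python. The slide only says the adjacency matrix is normalized; the symmetric normalization with self-loops used here is an assumption (it is one common choice), and the two-node graph, features, and weights are made-up toy values.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Assumed normalization: D^(-1/2) (A + I) D^(-1/2), with self-loops added."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat, h, w):
    """One layer: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(v, 0.0) for v in row] for row in z]


# Toy graph with a single edge between two nodes (e.g. Remote and Television).
adj = [[0.0, 1.0], [1.0, 0.0]]
a_hat = normalize_adjacency(adj)
h0 = [[1.0, 0.0], [0.0, 1.0]]    # one-hot node features (toy values)
w0 = [[1.0, -1.0], [1.0, -1.0]]  # toy weights; the second column goes negative
h1 = gcn_layer(a_hat, h0, w0)    # ReLU zeroes the negative column
```

Each node's new feature is a weighted average over itself and its neighbors, followed by a linear map and the activation. Stacking several such layers lets information travel along longer paths in the graph.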

slide-13
SLIDE 13

GCN for Scene Navigation

[Figure: GCN for scene navigation. Labels: node categories (e.g. “Fridge”, “Toaster”); 1000 class score from ResNet-50; FC (512); 512, 512; concat; GCN with 3 layers.]

The knowledge graph is updated over time according to the recent observations.

slide-14
SLIDE 14

Action Space

  • Move Ahead
  • Move Back
  • Rotate Right
  • Rotate Left
  • Stop

The action space includes an explicit Stop action, which the agent is expected to issue when it reaches the target. Requiring the agent to decide when to stop makes learning more challenging.
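
The sketch below encodes the five actions and one possible success rule for episodes that use the Stop action: an episode counts as successful only if the agent issues Stop while it is at the target. The per-step `at_target` flags and the function name are hypothetical; the slide does not define how "reaching the target" is measured.

```python
from enum import Enum


class Action(Enum):
    MOVE_AHEAD = "Move Ahead"
    MOVE_BACK = "Move Back"
    ROTATE_RIGHT = "Rotate Right"
    ROTATE_LEFT = "Rotate Left"
    STOP = "Stop"


def episode_succeeded(actions, at_target):
    """Return True only if the first Stop is issued while the agent is at the target.

    actions:   list of Action taken at each step
    at_target: list of bool, whether the agent was at the target at that step
               (an assumed signal supplied by the environment)
    """
    for action, reached in zip(actions, at_target):
        if action is Action.STOP:
            # The episode ends at the first Stop, successful or not.
            return reached
    # The agent never issued Stop, so the episode does not count as a success.
    return False
```

Under this rule an agent that walks past the target, or stands next to it without ever issuing Stop, still fails, which is why the setting is harder than one where the environment ends the episode automatically.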

slide-15
SLIDE 15

Seen Scenes, Novel Objects

slide-16
SLIDE 16

Bedroom | Mirror
slide-17
SLIDE 17

Living room | Painting

slide-18
SLIDE 18

Kitchen | Toaster

slide-19
SLIDE 19

Kitchen | Microwave

slide-20
SLIDE 20

Unseen Scenes, Known Objects

slide-21
SLIDE 21

Bathroom | Soap

slide-22
SLIDE 22

Bedroom | Lamp

slide-23
SLIDE 23

Bedroom | Light Switch

slide-24
SLIDE 24

Kitchen | Cabinet

slide-25
SLIDE 25

Unseen Scenes, Novel Objects

slide-26
SLIDE 26

Bathroom | Towel

slide-27
SLIDE 27

Kitchen | Microwave

slide-28
SLIDE 28

Evaluation Metrics

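  • Success Rate (SR)
      • The ratio of successful navigations toward the object over N episodes
  • Success weighted by Path Length (SPL)
      • The ratio of successful navigations toward the object, weighted by the path length, over N episodes
      • Considers both success rate and path length, and is defined as

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \frac{L_i}{\max(P_i, L_i)}$$

where $S_i$ is a binary success indicator for episode $i$, $P_i$ represents the length of the path the agent took, and $L_i$ is the length of the shortest path to the target.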
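
A minimal sketch of both metrics, assuming the standard SPL definition: the average over N episodes of S_i · L_i / max(P_i, L_i), with S_i the 0/1 success flag, L_i the shortest path length, and P_i the length of the path actually taken. The function names and the three example episodes are made up for illustration.

```python
def success_rate(successes):
    """Fraction of episodes that ended in success (successes are 0 or 1)."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length over N episodes.

    Each successful episode contributes L_i / max(P_i, L_i), which is 1.0 for an
    optimal path and shrinks as the agent's path gets longer. Failed episodes
    contribute 0. Shortest path lengths are assumed to be positive.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)


# Three toy episodes: a success with a path twice the optimum,
# a failure, and a success along the optimal path.
example_successes = [1, 0, 1]
example_shortest = [10.0, 8.0, 5.0]
example_taken = [20.0, 30.0, 5.0]
example_sr = success_rate(example_successes)
example_spl = spl(example_successes, example_shortest, example_taken)
```

In this example the success rate is 2/3 while SPL is 0.5, because the first success is penalized for taking twice the shortest path.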

slide-29
SLIDE 29

(SPL / SR) without STOP action (250 episodes)

Setting | Method | Kitchen | Living room | Bedroom | Bathroom | Avg.
Seen scenes, Known objects | Random | 17.9 / 33.1 | 12.1 / 30.5 | 16.8 / 51.2 | 24.5 / 34.6 | 17.8 / 37.3
Seen scenes, Known objects | A3C | 79.9 / 86.7 | 38.8 / 57.6 | 87.8 / 89.5 | 93.7 / 96.6 | 75.0 / 82.5
Seen scenes, Known objects | Ours | 83.5 / 88.2 | 46.4 / 64.4 | 90.6 / 92.7 | 93.6 / 96.5 | 78.5 / 85.5
Seen scenes, Novel objects | Random | 10.0 / 23.1 | 8.0 / 18.5 | 17.3 / 35.2 | 11.2 / 32.2 | 11.6 / 27.2
Seen scenes, Novel objects | A3C | 20.2 / 38.8 | 24.2 / 46.5 | 23.5 / 35.8 | 50.2 / 74.6 | 29.5 / 48.9
Seen scenes, Novel objects | Ours | 22.9 / 53.6 | 39.5 / 66.5 | 26.1 / 38.9 | 50.5 / 78.6 | 34.7 / 59.4
Unseen scenes, Known objects | Random | 27.3 / 45.2 | 5.6 / 16.6 | 13.1 / 34.5 | 36.0 / 49.1 | 20.5 / 36.3
Unseen scenes, Known objects | A3C | 39.5 / 56.2 | 12.0 / 31.8 | 22.5 / 49.2 | 47.4 / 60.2 | 30.3 / 49.3
Unseen scenes, Known objects | Ours | 46.2 / 62.5 | 13.8 / 40.6 | 26.5 / 58.6 | 51.5 / 65.8 | 34.5 / 56.9
Unseen scenes, Novel objects | Random | 21.3 / 44.3 | 3.3 / 22.9 | 25.8 / 47.8 | 25.5 / 48.9 | 19.0 / 41.0
Unseen scenes, Novel objects | A3C | 26.1 / 56.3 | 9.4 / 25.1 | 28.2 / 54.0 | 33.8 / 90.7 | 24.4 / 56.5
Unseen scenes, Novel objects | Ours | 38.5 / 62.5 | 13.7 / 40.3 | 30.1 / 63.1 | 39.2 / 93.6 | 30.4 / 64.9

Table 2: Results without termination (stop) action. SPL / Success rate is shown. We compare …

slide-30
SLIDE 30

(SPL / SR) with STOP action

Setting | Method | Kitchen | Living room | Bedroom | Bathroom | Avg.
Seen scenes, Known objects | Random | 2.4 / 3.5 | 1.1 / 1.7 | 1.8 / 2.7 | 3.2 / 4.8 | 2.1 / 3.1
Seen scenes, Known objects | A3C | 38.5 / 51.0 | 9.7 / 15.1 | 6.8 / 11.5 | 69.1 / 81.0 | 31.1 / 39.6
Seen scenes, Known objects | Ours | 58.6 / 72.7 | 12.4 / 18.6 | 41.6 / 52.4 | 71.3 / 83.0 | 46.0 / 56.7
Seen scenes, Novel objects | Random | 0.9 / 1.3 | 0.8 / 1.2 | 2.3 / 3.4 | 1.4 / 2.1 | 1.4 / 2.0
Seen scenes, Novel objects | A3C | 2.1 / 4.9 | 3.2 / 4.8 | 0.5 / 1.7 | 17.1 / 28.5 | 5.7 / 9.9
Seen scenes, Novel objects | Ours | 3.2 / 6.1 | 9.8 / 16.2 | 6.2 / 8.6 | 24.7 / 37.3 | 11.0 / 17.1
Unseen scenes, Known objects | Random | 4.1 / 5.9 | 0.9 / 1.3 | 1.6 / 2.4 | 4.2 / 6.2 | 2.7 / 3.9
Unseen scenes, Known objects | A3C | 11.5 / 18.8 | 0.5 / 2.5 | 2.2 / 3.8 | 8.6 / 18.7 | 5.7 / 10.4
Unseen scenes, Known objects | Ours | 12.7 / 20.5 | 1.0 / 4.0 | 4.5 / 11.0 | 8.7 / 21.1 | 6.7 / 13.4
Unseen scenes, Novel objects | Random | 2.0 / 2.8 | 0.6 / 1.0 | 2.0 / 2.8 | 2.7 / 3.9 | 1.8 / 2.6
Unseen scenes, Novel objects | A3C | 2.2 / 7.5 | 2.5 / 4.4 | 1.3 / 4.4 | 3.4 / 9.3 | 2.4 / 5.9
Unseen scenes, Novel objects | Ours | 3.3 / 12.7 | 2.8 / 5.3 | 2.0 / 6.3 | 4.1 / 12.2 | 3.1 / 8.5

Table 1: Results using termination (stop) action. SPL / Success rate is shown. We compare …

slide-31
SLIDE 31
slide-32
SLIDE 32

  • Training: Traditional Training vs. Learning to Adapt
  • Inference: Traditional Inference vs. Adaptation During Inference

slide-33
SLIDE 33

  • Initial Model Parameters
  • Initialize Model
  • Take k steps
  • Compute Self-Supervised Interaction Loss
  • Compute Adapted Parameters
  • Complete Navigation Episode
  • Compute Supervised Navigation Loss
  • Backprop to Update Initialization
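
The training loop on this slide can be illustrated with a toy, one-parameter sketch of gradient-based meta-learning of an initialization. Everything concrete here is an assumption made for illustration: the model is a single number `theta`, the self-supervised interaction loss is 0.5·(theta − a)², the supervised navigation loss is 0.5·(theta' − b)², and the environment steps are omitted. It is not the author's implementation.

```python
def interaction_grad(theta, a):
    """Gradient of the toy interaction loss 0.5 * (theta - a)**2."""
    return theta - a


def adapt(theta, a, alpha):
    """Compute adapted parameters with one gradient step on the interaction loss."""
    return theta - alpha * interaction_grad(theta, a)


def meta_step(theta, a, b, alpha, beta):
    """Update the initialization by backpropagating the navigation loss
    through the adaptation step (exact for this quadratic toy problem)."""
    theta_adapted = adapt(theta, a, alpha)
    # d(nav loss)/d(theta_adapted) for nav loss 0.5 * (theta_adapted - b)**2
    nav_grad = theta_adapted - b
    # d(theta_adapted)/d(theta) = 1 - alpha for the quadratic interaction loss
    meta_grad = nav_grad * (1.0 - alpha)
    return theta - beta * meta_grad


def train_initialization(theta, a, b, alpha=0.5, beta=0.5, iterations=200):
    """Repeat the meta-update so that the adapted parameters do well on navigation."""
    for _ in range(iterations):
        theta = meta_step(theta, a, b, alpha, beta)
    return theta


learned_init = train_initialization(theta=0.0, a=0.0, b=1.0)
adapted = adapt(learned_init, a=0.0, alpha=0.5)
```

The learned initialization does not itself minimize the navigation loss; it is the parameters after the self-supervised adaptation step that do. That is the point of updating the initialization rather than the adapted parameters.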

slide-34
SLIDE 34

Navigation Gradient (supervised)

Learning to Learn How to Learn

Inference

Learned Interaction Gradient (self-supervised)

slide-35
SLIDE 35

  • Initial Model Parameters
  • Initialize Model
  • Take k steps
  • Compute Self-Supervised Interaction Loss
  • Compute Adapted Parameters
  • Complete Navigation Episode
  • Compute Supervised Navigation Loss
  • Loss Parameters
  • Compute Self-Supervised Interaction Loss via Neural Network

slide-36
SLIDE 36

[Figure: model architecture. Labels: Current observation, ResNet18 (Frozen), Image Feature; Target Object Class (Laptop), GloVe Embedding, FC, Tile; Pointwise Conv (×2); LSTM over time steps, with example actions Turn Left, Look Down, Move Forward; Concatenated policy and hidden states, 1D Temporal Conv. Legend: Forward Pass; Navigation-Gradient (Training only); Interaction-Gradient (Training and Inference).]

slide-37
SLIDE 37

Results

[Figure: bar charts of SPL and Success for Baseline, Handcrafted Loss, and Learned Loss]

  • Training Scenes: 80
  • Validation Scenes: 20
  • Test Scenes: 20
  • Equal split of Kitchen, Living Room, Bedroom, Bathroom

slide-38
SLIDE 38

Goal: Navigate to Book

slide-39
SLIDE 39

Thank you!