MapNet: An allocentric spatial memory for mapping environments
João F. Henriques, Andrea Vedaldi (PowerPoint presentation)
SLIDE 1

MapNet: An allocentric spatial memory for mapping environments

João F. Henriques, Andrea Vedaldi Visual Geometry Group

SLIDE 2

2 Henriques and Vedaldi, MapNet, CVPR 2018

Motivation

What we usually have:

  • Object detections
  • Segmentations
  • 3D information (relative to camera)
  • ...

Image-centric tasks

SLIDE 3

Motivation

What we would like:

  • Reason beyond image, into world
  • Object permanence
  • Eventually, long-term goals and planning

World-centric tasks

SLIDE 4

Simultaneous Localization And Mapping (SLAM)

[Figure: timeline. At each frame (Frame #1, Frame #2, Frame #3, ...), the agent estimates its location and updates the map.]

  • Hard to adapt to new environments (hand-tuning)
  • No semantic information
  • No use of priors to compensate for missing data

Classic SLAM (No learning)

SLIDE 5

Related work – deep learning for SLAM

  • No map
  • Cannot correct for inevitable drift

Egomotion predictors

Costante’15, Clark’17, Zhu’17, Wang’17, ...

[Figure: timeline. At each frame (Frame #1, Frame #2, Frame #3, ...), only the agent’s location is predicted; no map is kept.]

SLIDE 6

Related work – deep learning for SLAM

  • Map is stored in the deep network’s parameters
  • New environments require re-training

Offline-learned localization

Kendall’15, Mirowski’18, Brahmbhatt’18, ...

[Figure: timeline. An offline-learned map (the network’s weights) localizes the agent at each frame (Frame #1, Frame #2, Frame #3, ...).]

SLIDE 7

Related work – deep learning for SLAM

  • Map is created on-the-fly as activations
  • Localization comes from perfect (ground-truth) egomotion input, not from the map
  • Tested on synthetic environments (so far)

Online mapping, no localization

Kanitscheider’16, Gupta’17, Zhang’17, Parisotto’17, ...

[Figure: timeline. Ground-truth egomotion provides the location while the network builds the map at each frame (Frame #1, Frame #2, Frame #3, ...).]

SLIDE 8

Proposed method

[Figure: timeline. The network estimates both the map and the agent’s location at every frame (Frame #1, Frame #2, Frame #3, ...).]

  • Performs both Mapping and Localization with a deep net
  • No egomotion information
  • Fully online (mapping as we go)

Our method (MapNet)

SLIDE 9

Allocentric map memory

[Figure: image → map tensor; localization reads a position/orientation heatmap from the map; mapping writes an embedding back into it.]

Map model:

  • Represent the ground plane as a 2D grid.
  • Store one embedding per location.
  • Allows associating semantics with world coordinates.
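The map model above can be sketched as a plain array: a 2D ground-plane grid holding one embedding per cell. All sizes and the `world_to_cell` helper are illustrative, not the paper's actual values:

```python
import numpy as np

# Minimal sketch of the allocentric map memory: a ground-plane grid
# with one embedding vector per cell (sizes are made up).
H, W, C = 16, 16, 8
world_map = np.zeros((H, W, C), dtype=np.float32)

def world_to_cell(x, z, cell_size=1.0, origin=(8, 8)):
    """Convert continuous world coordinates (x, z) on the ground plane
    to a discrete grid cell (row, col), with the origin at the grid center."""
    row = origin[0] + int(round(z / cell_size))
    col = origin[1] + int(round(x / cell_size))
    return row, col

# Write an embedding at world position (x=2.0, z=-3.0).
r, c = world_to_cell(2.0, -3.0)
world_map[r, c] = 1.0
```

Because each cell holds a learned embedding rather than raw occupancy, semantics can later be read back from the same world coordinates.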

SLIDE 10

Localization and mapping as dual operators

[Figure: the image’s embedding is cross-correlated (⋆) with the map memory at time 𝑢 to localize, then deconvolved (∗) into the map memory at time 𝑢 + 1.]

Core insight:

  Localization ⇔ convolution
  Mapping ⇔ deconvolution
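A minimal single-channel NumPy sketch of this duality (the paper does this with multi-channel deep-network layers; `cross_correlate` and `transposed_correlate` are hypothetical helpers):

```python
import numpy as np

def cross_correlate(map2d, template):
    """Localization: slide the local view (template) over the map and
    score every offset (valid-mode 2D cross-correlation)."""
    mh, mw = map2d.shape
    th, tw = template.shape
    out = np.zeros((mh - th + 1, mw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(map2d[i:i + th, j:j + tw] * template)
    return out

def transposed_correlate(heatmap, template, map_shape):
    """Mapping: the adjoint (transposed) operation scatters the local
    view back onto the map, weighted by the position heatmap."""
    out = np.zeros(map_shape)
    th, tw = template.shape
    for i in range(heatmap.shape[0]):
        for j in range(heatmap.shape[1]):
            out[i:i + th, j:j + tw] += heatmap[i, j] * template
    return out

# Plant the template at offset (2, 3) and recover it by correlation.
template = np.array([[1.0, 2.0], [3.0, 4.0]])
world = np.zeros((6, 6))
world[2:4, 3:5] = template
scores = cross_correlate(world, template)
best = np.unravel_index(np.argmax(scores), scores.shape)

# A one-hot heatmap at that offset writes the template back in place.
heatmap = np.zeros_like(scores)
heatmap[best] = 1.0
registered = transposed_correlate(heatmap, template, world.shape)
```

The second function is exactly the adjoint of the first, which is why localization and mapping fall out of the same pair of operators.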

SLIDE 11

Ground projected CNN features

[Figure: image → CNN features + depth → ground projection → local view (CNN embeddings in the ground-plane).]

  • Given depth and camera intrinsics, project CNN features to the ground-plane.
  • Since the camera pose is unknown, the output 2D grid is local (camera-space).
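A toy sketch of the projection, assuming a simplified pinhole camera with made-up intrinsics (`fx`, `cx`) and average-pooling of features that land in the same cell:

```python
import numpy as np

def ground_project(features, depth, fx=50.0, cx=16.0, cell=0.5, grid=16):
    """Project per-pixel CNN features onto a local (camera-space)
    ground-plane grid using depth and pinhole intrinsics.
    features: (H, W, C) feature map; depth: (H, W) forward distance in meters."""
    H, W, C = features.shape
    acc = np.zeros((grid, grid, C))
    cnt = np.zeros((grid, grid, 1))
    for v in range(H):
        for u in range(W):
            z = depth[v, u]                        # forward distance from camera
            x = (u - cx) * z / fx                  # lateral offset via intrinsics
            gi = int(z / cell)                     # grid row: distance ahead
            gj = grid // 2 + int(round(x / cell))  # grid col: left/right of camera
            if 0 <= gi < grid and 0 <= gj < grid:
                acc[gi, gj] += features[v, u]
                cnt[gi, gj] += 1
    return acc / np.maximum(cnt, 1)                # average-pool per cell

feats = np.ones((2, 2, 3))
depth = np.full((2, 2), 1.0)
local = ground_project(feats, depth)
```

The output grid is centered on the camera, which is why a separate localization step is still needed to place it in the world map.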

SLIDE 12

Localization

[Figure: local view ⋆ map → cross-correlation scores → softmax → position heatmap.]

Localize by dense matching of the local view’s embeddings to the map.

  • Requires only one cross-correlation (convolution).
  • Can be interpreted as addressing a spatial associative memory.
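The softmax step that turns raw matching scores into a position heatmap can be sketched as:

```python
import numpy as np

def softmax2d(scores, temperature=1.0):
    """Convert raw cross-correlation scores into a probability heatmap
    over positions (softmax with numerical stabilization)."""
    z = scores / temperature
    z = z - z.max()          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = np.array([[0.0, 2.0], [1.0, 0.0]])
heat = softmax2d(scores)     # peaks where the match score is highest
```

Keeping the full distribution, rather than an argmax, lets position uncertainty flow through to the mapping step.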

SLIDE 13

Localization

[Figure: local view → resampler (rotation) → rotated local views, used as a filter bank; cross-correlation with the map + softmax → position and orientation heatmap.]

Also consider camera orientation:

  • Simply resample the local view at several rotations.
  • Use them as a filter bank for cross-correlation.
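A sketch of the rotation resampler, restricted to exact 90-degree rotations for simplicity (finer angles, as the method allows, would need an interpolating resampler):

```python
import numpy as np

def rotation_bank(local_view, n_rot=4):
    """Resample the local view at several orientations to form a filter
    bank; each rotated copy is matched against the map, so the argmax
    over the bank gives orientation as well as position."""
    return [np.rot90(local_view, k) for k in range(n_rot)]

view = np.array([[1, 2], [3, 4]])
bank = rotation_bank(view)   # 0, 90, 180, 270 degrees
```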

SLIDE 14

Localization

[Figure: localization shown in the camera reference-frame vs. the world reference-frame.]

SLIDE 15

Mapping

[Figure: rotated local views + position and orientation heatmap → deconvolution → registered local view.]

The mapping step updates the map with the local view.

  • The local view must be registered to world-space.
  • Requires one deconvolution of the position/orientation heatmap, using the local views as a filter bank.
  • After registration, the local view can easily be integrated into the map (e.g. by linear interpolation, or a convolutional LSTM).
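A minimal stand-in for the integration step. The slide mentions a convolutional LSTM; here an exponential moving average over observed cells illustrates the idea (`integrate_view` and `alpha` are assumptions, not the paper's update rule):

```python
import numpy as np

def integrate_view(map_old, registered_view, alpha=0.5):
    """Fold a registered local view into the map by blending only the
    cells the view actually observed (nonzero embeddings); unobserved
    cells keep their previous contents (object permanence)."""
    observed = np.any(registered_view != 0, axis=-1, keepdims=True)
    return np.where(observed,
                    (1 - alpha) * map_old + alpha * registered_view,
                    map_old)

old = np.zeros((4, 4, 2))
view = np.zeros((4, 4, 2))
view[1, 1] = 1.0             # one observed cell
new = integrate_view(old, view)
```

A learned recurrent update (the convolutional LSTM) can additionally decide how much to trust the new observation per cell, which a fixed `alpha` cannot.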

SLIDE 16

Full pipeline

[Figure: image → CNN → ground projection → local view → resampler (rotation) → cross-correlation (⋆) with the map → position and orientation heatmap → deconvolution (∗) → registered local view → LSTM → updated map.]

SLIDE 17

Full pipeline

[Figure: the same pipeline, annotated with the two dual operations.]

Localization ⇔ convolution

Mapping ⇔ deconvolution

SLIDE 18

Experiments – 2D data

Toy problem setup

  • 100,000 mazes
  • Agent moves at random
  • Limited, local visibility

Training

  • Input sequences of 5 frames
  • Position/orientation supervision
  • Minimize the logistic loss of the predicted position (heatmap)

Local view
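The heatmap supervision can be sketched as a cross-entropy (logistic) loss over grid cells; `heatmap_loss` is an illustrative helper, not the paper's exact implementation:

```python
import numpy as np

def heatmap_loss(scores, gt_cell):
    """Cross-entropy between the softmax of predicted position scores
    and the ground-truth cell: the logistic loss on the heatmap."""
    z = scores.ravel() - scores.max()          # stabilized logits
    log_p = z - np.log(np.exp(z).sum())        # log-softmax over all cells
    gt_index = np.ravel_multi_index(gt_cell, scores.shape)
    return -log_p[gt_index]

scores = np.zeros((3, 3))
scores[1, 2] = 5.0
good = heatmap_loss(scores, (1, 2))   # confident and correct: small loss
bad = heatmap_loss(scores, (0, 0))    # confident but wrong: large loss
```

Since localization is differentiable end-to-end, this single supervision signal also trains the embeddings stored in the map.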

SLIDE 19

Experiments – 2D data

[Figure: local view (always facing right), global view, and predicted heatmap (blue: ground truth).]

SLIDE 20

Experiments – 2D data

[Figure: global view, local view (always facing right), and predicted heatmap (blue: ground truth).]

SLIDE 21

Experiments – 2D data

[Figure: map tensor (one channel per column) for Sample #1 through Sample #4.]

⇒ Several local views are integrated into a larger map.

SLIDE 22

Experiments – 2D data

Is this map semantic? Yes!

  • Assigned class labels to maze cells (corridors, turns, dead-ends...).
  • Class labels are correctly predicted from a cell’s embedding most of the time.

[Figure: map embedding and class labels (color-coded); prediction accuracy on a balanced dataset (chance: 50%).]

SLIDE 23

Experiments – 3D game data

ResearchDoom Dataset

  • 4 recorded speed-runs through the whole game
  • 6 hours of gameplay
  • Challenging, large hand-crafted levels

https://www.youtube.com/watch?v=mInSO7YW1EU

SLIDE 24

Experiments – 3D real data

Active Vision Dataset

  • Robot platform in 19 indoor scenes
  • Images collected at all positions/orientations
  • Can be composed into unlimited sequences

https://www.youtube.com/watch?v=-MUXfcrxGEM

SLIDE 25

Experiments – 3D data, quantitative results

[Figure: quantitative results on the ResearchDoom Dataset and the Active Vision Dataset.]

SLIDE 26

Conclusions

  • We perform SLAM entirely online, using an end-to-end learned architecture.
  • Localization and Mapping are a dual pair of convolution/deconvolution.
  • Semantic embeddings of the world arise from the self-localization objective.
  • Next step: navigation and long-term goals.

Project page with code: www.robots.ox.ac.uk/~joao/mapnet