MapNet: An allocentric spatial memory for mapping environments. João F. Henriques, Andrea Vedaldi


  1. MapNet: An allocentric spatial memory for mapping environments. João F. Henriques, Andrea Vedaldi, Visual Geometry Group.

  2. Motivation. What we usually have: • Object detections • Segmentations • 3D information (relative to the camera) • ... ⇒ Image-centric tasks. Henriques and Vedaldi, MapNet, CVPR 2018.

  3. Motivation. What we would like: • Reason beyond the image, into the world • Object permanence • Eventually, long-term goals and planning ⇒ World-centric tasks.

  4. Simultaneous Localization And Mapping (SLAM). [Diagram: frames #1, #2, #3 over time; at each frame the agent estimates its location and updates a map.] Classic SLAM (no learning): • Hard to adapt to new environments (hand-tuning) • No semantic information • No use of priors to compensate for missing data

  5. Related work – deep learning for SLAM. [Diagram: frames over time; the agent's location is estimated at each frame, with no map.] Egomotion predictors: • No map • Cannot correct for inevitable drift. Costante '15, Clark '17, Zhu '17, Wang '17, ...

  6. Related work – deep learning for SLAM. [Diagram: frames over time; the agent's location is estimated against a map learned offline.] Offline-learned localization: • The map is stored in the deep network's parameters • New environments require re-training. Kendall '15, Mirowski '18, Brahmbhatt '18, ...

  7. Related work – deep learning for SLAM. [Diagram: frames over time; location comes from egomotion, and a map is built at each frame.] Online mapping, no localization: • The map is created on the fly as activations • Perfect egomotion input is used for localization, not the map • Tested on synthetic environments (so far). Kanitscheider '16, Gupta '17, Zhang '17, Parisotto '17, ...

  8. Proposed method. [Diagram: frames over time; the agent both localizes itself and updates a map at each frame.] Our method (MapNet): • Performs both mapping and localization with a deep net • No egomotion information • Fully online (mapping as we go)

  9. Allocentric map memory. Map model: • Represent the ground plane as a 2D grid • Store one embedding per location • Allows associating semantics with world coordinates. [Diagram: an image is embedded; localization produces a position/orientation heatmap, and mapping writes into the map tensor.]
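
A minimal plain-Python sketch of such a map memory: a grid of C-dimensional embeddings addressed by world coordinates. The class name `AllocentricMap`, the `cell_size` parameter and placing the world origin at the grid centre are illustrative assumptions, not details taken from the slides.

```python
from typing import List, Tuple


class AllocentricMap:
    """Hypothetical ground-plane map: one C-dimensional embedding per grid cell."""

    def __init__(self, height: int, width: int, channels: int, cell_size: float = 1.0):
        self.height = height
        self.width = width
        self.cell_size = cell_size  # world units per cell (assumed)
        # Map tensor stored as grid[row][col] -> list of `channels` floats, initially zero.
        self.grid = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

    def world_to_cell(self, x: float, y: float) -> Tuple[int, int]:
        """Convert world coordinates to (row, col); the world origin sits at the grid centre."""
        col = int(x // self.cell_size) + self.width // 2
        row = int(y // self.cell_size) + self.height // 2
        if not (0 <= row < self.height and 0 <= col < self.width):
            raise IndexError(f"world point ({x}, {y}) is outside the map")
        return row, col

    def read(self, x: float, y: float) -> List[float]:
        """Return the embedding stored at a world location."""
        row, col = self.world_to_cell(x, y)
        return self.grid[row][col]

    def write(self, x: float, y: float, embedding: List[float]) -> None:
        """Overwrite the embedding stored at a world location."""
        row, col = self.world_to_cell(x, y)
        self.grid[row][col] = list(embedding)


m = AllocentricMap(height=8, width=8, channels=3)
m.write(1.5, -2.2, [0.1, 0.2, 0.3])
print(m.world_to_cell(1.5, -2.2))  # (1, 5)
```

Because cells are tied to world coordinates rather than to the current camera pose, anything stored in a cell (including semantics) stays attached to that place as the agent moves.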

  10. Localization and mapping as dual operators. [Diagram: image embedding, map memory at time t, location, and map memory at time t + 1, linked by cross-correlation (⋆) and deconvolution (∗).] Core insight: Localization ⇔ convolution; Mapping ⇔ deconvolution.
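
The duality can be checked numerically in one dimension. In the plain-Python sketch below (function names are hypothetical), `correlate` plays the role of localization and `deconvolve`, its transpose, plays the role of mapping. Being adjoint means ⟨correlate(m, f), h⟩ = ⟨m, deconvolve(h, f)⟩ for any map m, filter f and heatmap h; deconvolving a one-hot heatmap simply stamps the filter at that location.

```python
def correlate(signal, filt):
    """'Valid' cross-correlation: out[i] = sum_k signal[i + k] * filt[k]."""
    k = len(filt)
    return [sum(signal[i + j] * filt[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def deconvolve(heatmap, filt):
    """Transposed cross-correlation: add filt, scaled by heatmap[i], starting at position i."""
    out = [0.0] * (len(heatmap) + len(filt) - 1)
    for i, weight in enumerate(heatmap):
        for j, value in enumerate(filt):
            out[i + j] += weight * value
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


m = [1.0, 2.0, 3.0, 4.0, 5.0]   # "map"
f = [1.0, -1.0]                 # "local view" used as a filter
h = [0.5, 1.0, 0.0, 2.0]        # "heatmap" over the 4 valid positions
print(dot(correlate(m, f), h), dot(m, deconvolve(h, f)))  # -3.5 -3.5
print(deconvolve([0.0, 0.0, 1.0, 0.0], [7.0, 8.0]))      # [0.0, 0.0, 7.0, 8.0, 0.0]
```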

  11. Ground-projected CNN features. [Diagram: image and depth → CNN → ground projection → local view (CNN embeddings in the ground plane).] • Given depth and camera intrinsics, project CNN features to the ground plane. • Since the camera pose is unknown, the output 2D grid is local (camera-space).
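
A minimal sketch of ground projection, assuming a pinhole camera (focal length `fx`, principal point `cx`), depth given as forward distance, and average pooling of the features that land in the same cell; the vertical pixel coordinate is discarded, so every pixel is dropped straight onto the floor plane. The function name, grid layout and pooling choice are assumptions for illustration.

```python
def ground_project(features, depth, fx, cx, grid_size, cell_size):
    """Project per-pixel CNN features onto a camera-centred ground-plane grid.

    features[v][u] is a list of C floats for pixel (u, v); depth[v][u] is that
    pixel's forward distance (assumed z-depth). Output rows index forward
    distance from the camera; columns index lateral offset, with the camera on
    the centre column. Features landing in the same cell are averaged (assumed).
    """
    channels = len(features[0][0])
    sums = [[[0.0] * channels for _ in range(grid_size)] for _ in range(grid_size)]
    counts = [[0] * grid_size for _ in range(grid_size)]
    for v, feature_row in enumerate(features):
        for u, feat in enumerate(feature_row):
            z = depth[v][u]
            if z <= 0:
                continue  # skip pixels with missing depth
            x = (u - cx) * z / fx  # pinhole back-projection: lateral offset
            row = int(z // cell_size)
            col = int(x // cell_size) + grid_size // 2
            if 0 <= row < grid_size and 0 <= col < grid_size:
                counts[row][col] += 1
                for c in range(channels):
                    sums[row][col][c] += feat[c]
    # Average each occupied cell; empty cells get zero embeddings.
    return [
        [[s / counts[r][k] for s in sums[r][k]] if counts[r][k] else [0.0] * channels
         for k in range(grid_size)]
        for r in range(grid_size)
    ]


features = [[[1.0], [2.0], [3.0]], [[3.0], [6.0], [9.0]]]  # 2x3 image, 1 channel
depth = [[1.5, 1.5, 1.5], [1.5, 1.5, 1.5]]
local_view = ground_project(features, depth, fx=1.0, cx=1.0, grid_size=4, cell_size=1.0)
print(local_view[1])  # [[2.0], [0.0], [4.0], [6.0]]
```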

  12. Localization. Localize by dense matching of the local view's embeddings to the map. [Diagram: local view ⋆ map (cross-correlation) → softmax → position heatmap.] • Requires only one cross-correlation (convolution). • Can be interpreted as addressing a spatial associative memory.
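
A plain-Python sketch of this localization step: cross-correlate the local view with the map over every valid position, then normalise with a softmax. The `temperature` parameter and the "valid" indexing (heatmap cell (i, j) marks where the view's top-left cell lands) are simplifying assumptions; a real implementation would more likely pad the map so the heatmap matches its size.

```python
import math


def cross_correlate(map_, view):
    """Dense matching: score[i][j] = sum over view cells (a, b) and channels of map_[i+a][j+b] . view[a][b]."""
    h, w = len(view), len(view[0])
    return [
        [
            sum(
                m * v
                for a in range(h)
                for b in range(w)
                for m, v in zip(map_[i + a][j + b], view[a][b])
            )
            for j in range(len(map_[0]) - w + 1)
        ]
        for i in range(len(map_) - h + 1)
    ]


def softmax_2d(scores, temperature=1.0):
    """Normalise a 2D score grid into a position heatmap that sums to 1."""
    peak = max(max(row) for row in scores)
    exps = [[math.exp((s - peak) / temperature) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def argmax_2d(grid):
    cells = ((i, j) for i in range(len(grid)) for j in range(len(grid[0])))
    return max(cells, key=lambda ij: grid[ij[0]][ij[1]])


rows = [[1, -1, 1, 1], [-1, 1, 1, -1], [1, 1, -1, -1], [-1, -1, 1, 1]]
world_map = [[[float(v)] for v in row] for row in rows]   # 4x4 map, 1 channel
local_view = [[[1.0], [-1.0]], [[-1.0], [-1.0]]]          # copy of the map patch at (1, 2)
heatmap = softmax_2d(cross_correlate(world_map, local_view), temperature=0.5)
print(argmax_2d(heatmap))  # (1, 2)
```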

  13. Localization. Also consider the camera orientation: • Simply resample the local view at several rotations. • Use the rotated views as a filter bank for cross-correlation. [Diagram: local view → resampler (rotation) → rotated local views (one per orientation) ⋆ map → softmax → position and orientation heatmap.]
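
A sketch of the orientation filter bank, assuming square local views and only four quarter-turn orientations so that rotation is exact on the grid; the slides' resampler presumably handles finer angles with interpolation. The softmax runs jointly over orientations and positions. All function names and the `temperature` parameter are hypothetical.

```python
import math


def rotate90(view):
    """Rotate a grid of embeddings a quarter-turn counter-clockwise (exact, no interpolation)."""
    return [list(col) for col in zip(*view)][::-1]


def filter_bank(view, n_orientations=4):
    """Stack the local view at quarter-turn rotations: bank[o] is the view rotated o times."""
    bank = [view]
    for _ in range(n_orientations - 1):
        bank.append(rotate90(bank[-1]))
    return bank


def correlate(map_, view):
    """'Valid' 2D cross-correlation summed over channels."""
    h, w = len(view), len(view[0])
    return [[sum(m * v for a in range(h) for b in range(w) for m, v in zip(map_[i + a][j + b], view[a][b]))
             for j in range(len(map_[0]) - w + 1)]
            for i in range(len(map_) - h + 1)]


def localize_with_orientation(map_, view, temperature=1.0):
    """Return heat[o][i][j]: a softmax over all orientations and positions jointly."""
    scores = [correlate(map_, rotated) for rotated in filter_bank(view)]
    peak = max(s for plane in scores for row in plane for s in row)
    exps = [[[math.exp((s - peak) / temperature) for s in row] for row in plane] for plane in scores]
    total = sum(e for plane in exps for row in plane for e in row)
    return [[[e / total for e in row] for row in plane] for plane in exps]


rows = [[1, -1, 1, 1], [-1, 1, 1, -1], [1, 1, -1, -1], [-1, -1, 1, 1]]
world_map = [[[float(v)] for v in row] for row in rows]
view = [[[-1.0], [1.0]], [[-1.0], [-1.0]]]  # map patch at (1, 2), rotated a quarter-turn clockwise
heat = localize_with_orientation(world_map, view, temperature=0.5)
best = max(((o, i, j) for o, plane in enumerate(heat) for i, row in enumerate(plane) for j, _ in enumerate(row)),
           key=lambda k: heat[k[0]][k[1]][k[2]])
print(best)  # (1, 1, 2)
```

The winning orientation index says how many counter-clockwise quarter-turns align the local view with the map.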

  14. Localization. [Figure: camera reference frame vs. world reference frame.]

  15. Mapping. The mapping step updates the map with the local view. • The local view must be registered to world space. • This requires one deconvolution (∗) of the position/orientation heatmap, using the rotated local views as the filter bank. • After registration, the local view can easily be integrated into the map (e.g. by linear interpolation, or a convolutional LSTM). [Diagram: position and orientation heatmap ∗ rotated local views → registered local view.]
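
A sketch of registration by deconvolution followed by a linear-interpolation update. The coverage term (the same deconvolution applied to an all-ones view) is an assumption added here so that cells the view never touched keep their old embeddings; the blending rate `alpha` and all names are hypothetical, and the convolutional LSTM alternative is not shown.

```python
def register(heatmap, bank, map_height, map_width):
    """Deconvolution: sum over (o, i, j) of heatmap[o][i][j] * bank[o] placed with its top-left cell at (i, j).

    Also returns `coverage`: how much heatmap mass placed the view over each map cell.
    """
    channels = len(bank[0][0][0])
    registered = [[[0.0] * channels for _ in range(map_width)] for _ in range(map_height)]
    coverage = [[0.0] * map_width for _ in range(map_height)]
    for o, plane in enumerate(heatmap):
        for i, row in enumerate(plane):
            for j, weight in enumerate(row):
                if weight == 0.0:
                    continue  # nothing to scatter from this position
                for a, view_row in enumerate(bank[o]):
                    for b, embedding in enumerate(view_row):
                        coverage[i + a][j + b] += weight
                        for c in range(channels):
                            registered[i + a][j + b][c] += weight * embedding[c]
    return registered, coverage


def update_map(map_, registered, coverage, alpha=0.5):
    """Linear interpolation towards the registered view, only where the view was observed."""
    return [
        [
            [(1.0 - alpha * coverage[y][x]) * m + alpha * r for m, r in zip(map_[y][x], registered[y][x])]
            for x in range(len(map_[0]))
        ]
        for y in range(len(map_))
    ]


bank = [[[[1.0], [2.0]], [[3.0], [4.0]]]]                    # one orientation, 2x2 view, 1 channel
heat = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]  # all mass at position (1, 1)
old = [[[2.0] for _ in range(4)] for _ in range(4)]           # 4x4 map, every cell holds 2.0
reg, cov = register(heat, bank, 4, 4)
new = update_map(old, reg, cov, alpha=0.5)
print(new[1][1], new[2][2], new[0][0])  # [1.5] [3.0] [2.0]
```

With a soft heatmap the registered view is a weighted blend of placements, so uncertain localization writes a blurred view rather than committing to a single pose.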

  16. Full pipeline. [Diagram: image → CNN → ground projection → local view → resampler (rotation); rotated local views ⋆ map → position and orientation heatmap; heatmap ∗ rotated local views → registered local view; LSTM → updated map.]
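
The localize-then-map loop can be sketched end to end on a toy 1D map, skipping the CNN and ground projection and using linear interpolation instead of an LSTM. The 1D setting, the hypothetical `mapnet_step` function, and the convention that the first frame is placed at a fixed position (an empty map cannot be matched against) are all assumptions for illustration.

```python
import math


def correlate(memory, view):
    """Localization: slide the local view over the 1D map and take dot products."""
    k = len(view)
    return [sum(memory[i + j] * view[j] for j in range(k)) for i in range(len(memory) - k + 1)]


def softmax(scores, temperature):
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deconvolve(heat, view):
    """Mapping: scatter the view into map coordinates, weighted by the heatmap; also return coverage."""
    registered = [0.0] * (len(heat) + len(view) - 1)
    coverage = [0.0] * len(registered)
    for i, p in enumerate(heat):
        for j, v in enumerate(view):
            registered[i + j] += p * v
            coverage[i + j] += p
    return registered, coverage


def mapnet_step(memory, view, temperature=0.1, alpha=0.5, fixed_pos=None):
    """One hypothetical localize-then-map step on a 1D toy map."""
    if fixed_pos is None:
        heat = softmax(correlate(memory, view), temperature)  # localization
    else:
        heat = [0.0] * (len(memory) - len(view) + 1)
        heat[fixed_pos] = 1.0  # first frame: fix the origin instead of matching an empty map
    registered, coverage = deconvolve(heat, view)  # registration
    updated = [(1.0 - alpha * c) * m + alpha * r for m, r, c in zip(memory, registered, coverage)]
    return updated, heat


world = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]  # hidden 1D "environment"
memory = [0.0] * len(world)
memory, _ = mapnet_step(memory, world[2:6], alpha=1.0, fixed_pos=2)
memory, heat = mapnet_step(memory, world[3:7], alpha=1.0)
print(heat.index(max(heat)))  # 3
print(round(memory[6], 6))    # 1.0
```

The second frame overlaps the first by three cells, which is enough for the correlation to find its offset; its one new cell (index 6) is then written into the map.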

  17. Full pipeline. The same diagram, annotated: Localization ⇔ convolution (⋆); Mapping ⇔ deconvolution (∗).

  18. Experiments – 2D data. Toy problem setup: • 100,000 mazes • Agent moves at random • Limited, local visibility (local view). Training: • Input sequences of 5 frames • Position/orientation supervision • Minimize a logistic loss on the predicted position (heatmap).
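
The slide only states that a logistic loss is minimized on the predicted position heatmap. One plausible reading, sketched here as an assumption, treats every heatmap cell (flattened over positions and orientations) as a binary classification: label +1 for the ground-truth cell and -1 for all others, with the loss averaged over cells. The function name is hypothetical.

```python
import math


def logistic_heatmap_loss(scores, target):
    """Mean binary logistic loss over flattened heatmap logits; `target` is the ground-truth cell index."""
    total = 0.0
    for idx, s in enumerate(scores):
        y = 1.0 if idx == target else -1.0
        z = -y * s
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # numerically stable log(1 + exp(z))
    return total / len(scores)


print(round(logistic_heatmap_loss([0.0, 0.0, 0.0], target=1), 4))  # 0.6931
print(logistic_heatmap_loss([-4.0, 4.0, -4.0], 1) < logistic_heatmap_loss([4.0, -4.0, 4.0], 1))  # True
```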

  19. Experiments – 2D data. [Figure panels: global view; local view (always facing right); predicted heatmap (blue: ground truth).]

  20. Experiments – 2D data. [Figure: same panels as slide 19.]

  21. Experiments – 2D data. [Figure: map tensor (one channel per column) for samples #1 to #4.] ⇒ Several local views are integrated into a larger map.

  22. Experiments – 2D data. Is this map semantic? Yes! • Class labels were assigned to maze cells (corridors, turns, dead-ends...). • A cell's class label is correctly predicted from its embedding most of the time. [Figure: map embedding vs. class labels (color-coded); prediction accuracy on a balanced dataset (chance: 50%).]
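
Why chance is 50%: if each class is evaluated one-vs-rest on a set with equally many positive and negative cells, any constant predictor is right exactly half the time. The sketch below shows such a balanced split and the accuracy metric; the one-vs-rest setup and the function names are assumptions, and the classifier that would actually predict labels from cell embeddings is not shown.

```python
import random


def balanced_one_vs_rest(labels, target, seed=0):
    """Indices of an equal number of `target` cells and other cells, shuffled."""
    positives = [i for i, y in enumerate(labels) if y == target]
    negatives = [i for i, y in enumerate(labels) if y != target]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)
    chosen = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(chosen)
    return chosen


def accuracy(predictions, truths):
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


labels = ["corridor"] * 6 + ["turn"] * 2 + ["dead-end"] * 2  # toy per-cell labels
idx = balanced_one_vs_rest(labels, "turn")
truth = [labels[i] == "turn" for i in idx]
print(len(idx), accuracy([True] * len(idx), truth))  # 4 0.5
```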

  23. Experiments – 3D game data. ResearchDoom Dataset: • 4 recorded speed-runs through the whole game • 6 hours of gameplay • Challenging, large hand-crafted levels. Video: https://www.youtube.com/watch?v=mInSO7YW1EU

  24. Experiments – 3D real data. Active Vision Dataset: • Robot platform in 19 indoor scenes • Images collected at all positions/orientations • Can be composed into unlimited sequences. Video: https://www.youtube.com/watch?v=-MUXfcrxGEM
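
Because images exist at every position and orientation, sequences can be generated as random walks over a graph whose nodes are images and whose edges are agent moves. The sketch below assumes a hypothetical dictionary format (image id → {move name: next image id, or None if blocked}); it does not reproduce the dataset's actual annotation schema.

```python
import random


def sample_sequence(graph, length, seed=None):
    """Random walk over a hypothetical image graph; returns a list of image ids."""
    rng = random.Random(seed)
    current = rng.choice(sorted(graph))
    sequence = [current]
    while len(sequence) < length:
        moves = [nxt for nxt in graph[current].values() if nxt is not None]
        if not moves:
            break  # dead end: return a shorter sequence
        current = rng.choice(moves)
        sequence.append(current)
    return sequence


graph = {  # toy graph with hypothetical move names
    "img_a": {"forward": "img_b", "rotate_left": "img_c"},
    "img_b": {"forward": None, "rotate_left": "img_a"},
    "img_c": {"forward": "img_a", "rotate_left": "img_b"},
}
print(sample_sequence(graph, 5, seed=0))
```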

  25. Experiments – 3D data, quantitative results. [Figures: results on the ResearchDoom Dataset and the Active Vision Dataset.]

  26. Conclusions. • We perform SLAM entirely online, using an end-to-end learned architecture. • Localization and mapping are a dual pair: convolution/deconvolution. • Semantic embeddings of the world arise from the self-localization objective. • Next step: navigation and long-term goals. Project page with code: www.robots.ox.ac.uk/~joao/mapnet
