Towards Real-Time Metric-Semantic SLAM - PowerPoint Presentation



SLIDE 1

Antoni Rosinol Real-Time Metric-Semantic SLAM

5/13/19 1

Towards Real-Time Metric-Semantic SLAM

Antoni Rosinol*¹, Yun Chang¹, Marcus Abate¹, Daniel Wrafter¹, Siyi Hu¹, Ben Smith², Dan Griffith², Luca Carlone¹

*arosinol@mit.edu

SLIDE 2

Motivation

Real-Time Metric-Semantic SLAM: what is it?

  • Metric: understanding the scene at the geometric level (landmarks, lines, planes, normals, surfaces, …)
  • Semantic: understanding the entities in the scene at a human level (objects such as tables, chairs, coffee mugs, …)
  • Real-Time: we do not want to wait hours, or even minutes.

SLIDE 3

Motivation

Fully autonomous systems should be able to operate given only high-level tasks, and figure out the necessary low-level tasks themselves.

SLIDE 4

Bottleneck: 3D Scene Understanding

What does a robot need to accomplish high-level tasks?

[Figure labels: 3D Geometry of the Scene; 3D Semantic Segmentation; 3D Localization; 3D Scene Understanding; Metric-Semantic SLAM]

Source: SLAMcore

SLIDE 5

Motivation

Plethora of applications:

  • Search-and-Rescue: find stranded climbers on the mountain
  • Human-level navigation: go to the kitchen and bring me coffee
  • Exploration: find an exit to this building
  • Inventory: count and retrieve all chairs in this venue
  • Workplace co-bots: give me the wrench, hold this object
  • Agricultural robots: detect and remove weeds, pick and count apples
  • Autonomous cars: take me to work while avoiding pedestrians, cars, …

SLIDE 6

State-of-the-Art Human-Readable Map

Palais des Congrès de Montréal

SLIDE 7

State-of-the-Art Robot-Readable Map

Point Clouds…

SLIDE 8

Bridging the Gap between Human and Robot Maps

Requirements for the ideal Metric-Semantic 3D map:

  • Dense 3D geometry with topological information (surfaces, normals, planes)
  • 3D semantic information (walls, floor, objects)
  • Lightweight
  • Low resolution when possible (planes: walls, floor, …)
  • Easy to compute, store, and process

SLIDE 9

Point Clouds

  • Main benefit: allows accurate and fast localization.
  • Main disadvantages: sparse; lacks topology (normals, surfaces, …).
  • The most classical representation for SLAM, yet unsuitable for tasks such as obstacle-free navigation and path planning.
  • Semantics can be encoded on 3D points [1], but meaningful segmentation relies on the point cloud being dense.

[1] PointNet https://arxiv.org/abs/1612.00593

Map representation | 3D Topology? | Lightweight? | Filters Noise/Outliers? | Semantics?
Point Clouds | ✗ | ✓/✗ (no, if dense) | ✗ | ✓/✗ (no, if sparse)

SLIDE 10

Point Clouds

How can we recover the topology of the scene from sparse samples?

SLIDE 11

3D Mesh

Encoding connectivity of the 3D landmarks in a 3D mesh?

SLIDE 12

3D Mesh

  • Main benefits: adds topological properties while being efficient and multi-resolution.
  • Main disadvantages: sensitive to noise and outliers; conceptually difficult to build incrementally.

Map representation | 3D Topology? | Lightweight? | Filters Noise/Outliers? | Semantics?
Point Clouds | ✗ | ✓/✗ (no, if dense) | ✗ | ✓/✗ (no, if sparse)
3D Mesh | ✓ | ✓ | ✗ | ✓
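The mesh's advantage under "3D Topology?" comes from connectivity. A mesh stores faces that index into the vertex list, so neighbor relations and surface normals can be read off directly, which a bare point cloud cannot provide. Below is a minimal plain-Python sketch. It assumes triangles are stored as vertex-index triples with counter-clockwise winding. `face_normals` and `mesh_edges` are illustrative names, not part of the authors' code.

```python
from typing import List, Set, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]  # indices into the vertex list


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_normals(vertices: List[Vec3], faces: List[Face]) -> List[Vec3]:
    """Unit normal of each triangle; its direction follows the vertex winding order."""
    normals = []
    for i, j, k in faces:
        n = cross(sub(vertices[j], vertices[i]), sub(vertices[k], vertices[i]))
        length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
        if length == 0.0:
            raise ValueError(f"degenerate triangle {(i, j, k)}")
        normals.append((n[0] / length, n[1] / length, n[2] / length))
    return normals


def mesh_edges(faces: List[Face]) -> Set[Tuple[int, int]]:
    """Undirected edges (the connectivity) implied by the triangle list."""
    edges = set()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))
    return edges


# A unit square in the z = 0 plane, split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
print(face_normals(vertices, faces))  # [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
print(sorted(mesh_edges(faces)))      # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```

Flipping the winding of a triangle, e.g. `(0, 2, 1)`, flips its normal so that it points along -z.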

SLIDE 13

3D Mesh

  • Ideally, one could achieve computer-graphics levels of detail where needed, while keeping the mesh coarse elsewhere:

If only it weren't for the noisy and outlier 3D points…

SLIDE 14

Volumetric Methods: Voxels/Octrees

  • Main benefits: robust to noise/outliers, dense.
  • Main disadvantages: costly to compute and store, fixed resolution, lacks geometric invariance (shifting the cost volume produces different results).

Map representation | 3D Topology? | Lightweight? | Filters Noise/Outliers? | Semantics?
Point Clouds | ✗ | ✓/✗ (no, if dense) | ✗ | ✓/✗ (no, if sparse)
3D Mesh | ✓ | ✓ | ✗ | ✓
Voxels | ✗ | ✓/✗ (no, if small voxels) | ✓ | ✓/✗ (no, if large voxels)
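A toy fixed-resolution grid illustrates the lack of geometric invariance. If the same points are binned against a grid whose origin is shifted by half a voxel, they land in different cells. Shrinking the voxel size multiplies the number of cells to store. Below is a plain-Python sketch that assumes a simple point-count grid. `voxelize` is a hypothetical helper. Volumetric SLAM systems usually store richer per-voxel quantities (e.g. signed distance or occupancy) rather than raw counts.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float,
             origin: Point = (0.0, 0.0, 0.0)) -> Dict[VoxelKey, int]:
    """Bin points into a fixed-resolution grid and count the points per occupied voxel."""
    grid: Dict[VoxelKey, int] = {}
    for p in points:
        key = (math.floor((p[0] - origin[0]) / voxel_size),
               math.floor((p[1] - origin[1]) / voxel_size),
               math.floor((p[2] - origin[2]) / voxel_size))
        grid[key] = grid.get(key, 0) + 1
    return grid


# Four points along the x axis, all inside one 1 m voxel.
points = [(0.1, 0.0, 0.0), (0.4, 0.0, 0.0), (0.6, 0.0, 0.0), (0.9, 0.0, 0.0)]
print(voxelize(points, voxel_size=1.0))
# {(0, 0, 0): 4}

# Same points, grid origin shifted by half a voxel along x: the cells change.
print(voxelize(points, voxel_size=1.0, origin=(0.5, 0.0, 0.0)))
# {(-1, 0, 0): 2, (0, 0, 0): 2}

# Finer resolution: more occupied cells for the same geometry.
print(len(voxelize(points, voxel_size=0.25)))
# 4
```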

SLIDE 15

3D Meshes need regularization

3D (local) mesh generation from noisy measurements requires regularization:

  • Variational approaches [1]
  • Surfel Meshing [2]
  • Structural Regularities [3]

[1] W. N. Greene and N. Roy, "FLaME: Fast Lightweight Mesh Estimation Using Variational Smoothing on Delaunay Graphs," in Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, 2017.
[2] T. Schöps, T. Sattler, and M. Pollefeys, "SurfelMeshing: Online Surfel-Based Mesh Reconstruction."
[3] A. Rosinol, T. Sattler, M. Pollefeys, and L. Carlone, "Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities," in IEEE Int. Conf. Robot. Autom. (ICRA), 2019.
[4] E. Piazza, A. Romanoni, and M. Matteucci, "Real-time CPU-based large-scale 3D mesh reconstruction," in RA-L, 2018.
[5] M. Kazhdan, M. Bolitho, and H. Hoppe, "Poisson surface reconstruction," in SGP, 2006.

Source: [3]

Global methods such as Delaunay triangulation [4] or Poisson reconstruction [5] are too computationally expensive to run in real time (for SLAM) on the dense point…
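The slide lists the cited approaches without detail, and none of them is reproduced here. As a much simpler stand-in that only illustrates what "regularizing" a noisy mesh means, the sketch below applies classic Laplacian (umbrella) smoothing, which pulls each vertex toward the mean of its mesh neighbors. The step size `lam`, the iteration count, and the `fixed` set of anchored vertices are hypothetical parameters. Unlike the structural-regularity approach of [3], this filter knows nothing about planes or camera poses. It also shrinks genuine detail along with the noise.

```python
from typing import FrozenSet, List, Set, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def vertex_neighbors(num_vertices: int, faces: List[Face]) -> List[Set[int]]:
    """Adjacency sets derived from the triangle list."""
    adj: List[Set[int]] = [set() for _ in range(num_vertices)]
    for i, j, k in faces:
        adj[i].update((j, k))
        adj[j].update((i, k))
        adj[k].update((i, j))
    return adj


def laplacian_smooth(vertices: List[Vec3], faces: List[Face], lam: float = 0.5,
                     iterations: int = 1,
                     fixed: FrozenSet[int] = frozenset()) -> List[Vec3]:
    """Move each non-fixed vertex a fraction `lam` toward the mean of its neighbors."""
    adj = vertex_neighbors(len(vertices), faces)
    verts = list(vertices)
    for _ in range(iterations):
        updated = []
        for idx, v in enumerate(verts):
            nbrs = adj[idx]
            if idx in fixed or not nbrs:
                updated.append(v)  # anchored or isolated vertex: leave unchanged
                continue
            mean = tuple(sum(verts[n][a] for n in nbrs) / len(nbrs) for a in range(3))
            updated.append(tuple(v[a] + lam * (mean[a] - v[a]) for a in range(3)))
        verts = updated  # all vertices are updated from the previous iterate
    return verts


# Four corners of a flat square plus a noisy center vertex lifted to z = 1.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
            (0.5, 0.5, 1.0)]
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
smoothed = laplacian_smooth(vertices, faces, lam=0.5, iterations=3,
                            fixed=frozenset({0, 1, 2, 3}))
print(smoothed[4])  # (0.5, 0.5, 0.125)
```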

SLIDE 16

Our Approach:

Real-Time Multi-Frame Incremental 3D Mesh Generation + Pose Estimation, tightly coupled using Structural Regularities

[2] A. Rosinol, T. Sattler, M. Pollefeys, and L. Carlone, "Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities," in IEEE Int. Conf. Robot. Autom. (ICRA), 2019.

https://www.mit.edu/~arosinol/research/struct3dmesh.html

SLIDE 17

Our Approach:

Real-Time Multi-Frame Incremental 3D Mesh Generation + Pose Estimation, tightly coupled using Structural Regularities

[2] A. Rosinol Vidal, T. Sattler, M. Pollefeys, and L. Carlone, "Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities," in IEEE Int. Conf. Robot. Autom. (ICRA), 2019.

SLIDE 18

FUSES: Fast Unconstrained SEmidefinite Solver

  • Fastest MRF solver: 2-3x faster than the state of the art
  • Near-optimal solution (typically within 0.1% of the optimum)
  • The same approach can be applied to 3D mesh segmentation.
  • Evaluation on the Cityscapes dataset

Open-source C++ code: https://github.com/MIT-SPARK/FUSES

[1] S. Hu and L. Carlone, "Accelerated inference in Markov random fields via smooth Riemannian optimization"

  • Markov Random Field (MRF): assign a discrete label to each node given …

[Figure legend: dog, background]
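A common pairwise model can make the MRF bullet concrete. It is assumed here, since the slide does not specify one: the Potts energy E(x) = Σ_i U_i(x_i) + w · Σ_(i,j) [x_i ≠ x_j]. Here U_i is the cost of giving node i the label x_i, and w penalizes every pair of neighboring nodes that disagree. The solver below is iterated conditional modes (ICM), a simple greedy baseline swapped in purely for illustration. It is not FUSES, which, as its name and [1] indicate, relies on a semidefinite relaxation solved via smooth Riemannian optimization. The labels 0 = background and 1 = dog echo the slide's figure legend and are otherwise hypothetical.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def potts_energy(labels: Sequence[int], unary: Sequence[Sequence[float]],
                 edges: Sequence[Edge], weight: float) -> float:
    """Unary (data) cost plus a constant penalty for every edge whose labels differ."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smoothness = sum(weight for i, j in edges if labels[i] != labels[j])
    return data + smoothness


def icm(unary: Sequence[Sequence[float]], edges: Sequence[Edge], weight: float,
        max_sweeps: int = 20) -> List[int]:
    """Iterated conditional modes: greedily relabel one node at a time."""
    num_nodes, num_labels = len(unary), len(unary[0])
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from the label with the lowest unary cost at each node.
    labels = [min(range(num_labels), key=lambda lab: unary[i][lab])
              for i in range(num_nodes)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(num_nodes):
            def local_cost(lab: int) -> float:
                disagreements = sum(1 for j in neighbors[i] if labels[j] != lab)
                return unary[i][lab] + weight * disagreements
            best = min(range(num_labels), key=local_cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # no single relabeling lowers the energy: local minimum
            break
    return labels


# A 5-node chain (0 = background, 1 = dog); node 2 has a noisy unary cost.
unary = [[2.0, 0.0], [2.0, 0.0], [0.4, 0.6], [2.0, 0.0], [2.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = icm(unary, edges, weight=1.0)
print(labels)                                   # [1, 1, 1, 1, 1]
print(potts_energy(labels, unary, edges, 1.0))  # 0.6
```

The pairwise term overrides node 2's weak preference for background. With `weight=0.0`, the same call keeps node 2 labeled as background. ICM only reaches a local minimum, whereas the slide reports FUSES solutions typically within 0.1% of the optimum.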

SLIDE 19

Current Datasets: missing at least one sensor modality

  • Most datasets lack at least one of the following requirements:
      • Stereo images
      • IMU data (synchronized with images)
      • 2D semantic annotations
  • Only KITTI satisfies all requirements, but it has just 200 labeled images and poor IMU data synchronization.
  • What about synthetic data from simulators? Unfortunately, few simulators support modelling IMU data + semantics:
      • Gazebo: does not provide photorealistic images…
      • FlightGoggles: IMU and photorealistic images, but no ground-truth semantic annotations.
      • AirSim: lacks comprehensive ROS support.

Introducing our own Photorealistic + Physics Simulator…

SLIDE 20

Photorealistic Physics Simulator:

Joint work with MIT Lincoln Labs: Benjamin Smith, Dan Griffith

SLIDE 21

Photorealistic Physics Simulator:

Joint work with MIT Lincoln Labs: Benjamin Smith, Dan Griffith

SLIDE 22

Photorealistic Physics Simulator:

Joint work with MIT Lincoln Labs: Benjamin Smith, Dan Griffith

[Figure panels: 2D Semantic Segmentation | 3D Dense Stereo Reconstruction | Global Semantic 3D Mesh]
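The "3D Dense Stereo Reconstruction" panel rests on a standard geometric step. For a rectified stereo pair, a pixel (u, v) with disparity d (in pixels) has depth Z = fx · B / d, where fx is the focal length in pixels and B is the baseline. The pixel is then back-projected through the pinhole model. The plain-Python sketch below covers only that step and assumes known intrinsics and a precomputed disparity map. The slides do not say which stereo matcher or parameters the pipeline uses, and the intrinsic values in the example are made up.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def disparity_to_point(u: float, v: float, disparity: float, fx: float, fy: float,
                       cx: float, cy: float, baseline: float) -> Optional[Point3]:
    """Triangulate one pixel of a rectified stereo pair (left-camera frame, metres)."""
    if disparity <= 0.0:
        return None  # no valid match (or a point at infinity)
    z = fx * baseline / disparity  # depth from disparity
    x = (u - cx) * z / fx          # pinhole back-projection
    y = (v - cy) * z / fy
    return (x, y, z)


def disparity_map_to_points(disparity_map: Sequence[Sequence[float]], fx: float,
                            fy: float, cx: float, cy: float,
                            baseline: float) -> List[Point3]:
    """Dense reconstruction: back-project every pixel with a valid disparity."""
    points = []
    for v, row in enumerate(disparity_map):
        for u, d in enumerate(row):
            p = disparity_to_point(u, v, d, fx, fy, cx, cy, baseline)
            if p is not None:
                points.append(p)
    return points


# Hypothetical intrinsics: fx = fy = 500 px, principal point (320, 240), 10 cm baseline.
print(disparity_to_point(420, 240, 10.0, 500.0, 500.0, 320.0, 240.0, 0.1))
# (1.0, 0.0, 5.0)
```

Because Z is inversely proportional to d, a one-pixel disparity error costs far more depth accuracy on distant (small-disparity) points than on nearby ones.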

SLIDE 23

KITTI Results

Results on KITTI, using ground-truth poses but with 2D semantic labels estimated by the real-time ESPNetv2 network [1]:

[1] S. Mehta, M. Rastegari, L. G. Shapiro, and H. Hajishirzi, "ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network"
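The slides do not describe how per-frame 2D labels, such as those estimated by ESPNetv2, reach the global 3D map. One simple scheme, assumed here for illustration, projects each 3D element into every frame where it is visible, reads the label at that pixel, and keeps a majority vote, so that occasional per-frame segmentation errors are outvoted. The sketch takes the projection and visibility test as given and shows only the voting. `fuse_labels` and its (element id, label) input format are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, Tuple


def fuse_labels(observations: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, str]:
    """Majority vote over the per-frame 2D labels observed for each 3D map element.

    `observations` holds (element_id, label) pairs: one pair per frame in which the
    element (e.g. a mesh vertex or voxel) projected onto a labeled pixel.
    """
    votes: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for element_id, label in observations:
        votes[element_id][label] += 1
    return {element_id: counts.most_common(1)[0][0]
            for element_id, counts in votes.items()}


# Vertex 7 is mislabeled "road" in one of four frames; vertex 8 is consistently "road".
observations = [(7, "car"), (8, "road"), (7, "car"), (7, "road"), (8, "road"), (7, "car")]
print(fuse_labels(observations))  # {7: 'car', 8: 'road'}
```

A common refinement accumulates the network's per-class probabilities instead of hard votes, so that confident frames count for more.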

SLIDE 24

Future Work: solving 2D Semantic Segmentation failures

State-of-the-art 2D semantic segmentation techniques fail in a number of scenarios.

SLIDE 25

Future Work: solving 2D Semantic Segmentation failures

SLIDE 26

Future Work: solving dense 3D reconstruction failures

  • Traditional SLAM fails in a number of cases as well:
      • Low texture
      • Specularities, reflections
      • Low parallax

[1]

SLIDE 27

Conclusion

Real-Time Metric-Semantic 3D Mesh SLAM: the ultimate perception pipeline?

  • Might hold the key to perfecting:
      • 3D semantic segmentation
      • 3D geometry estimation
      • 3D localization
  • On top of that, in real time? Truly disruptive…