Lifelong Visual Mapping. Linguang Zhang, Princeton Vision Group.

SLIDE 1

Lifelong Visual Mapping

Linguang Zhang Princeton Vision Group

SLIDE 2

Papers

  • Toward Lifelong Object Segmentation from Change Detection in Dense RGB-D Maps.
  • Towards Semantic KinectFusion.
  • SLAM++: Simultaneous Localisation and Mapping at the Level of Objects.
  • 3D Mapping, Localisation and Object Retrieval Using Low Cost Robotic Platforms: A Robotic Search Engine for the Real-World.

SLIDE 3

Current SLAM Systems

  • NO truly ‘pick up and play’ SLAM systems yet:
  • on low-cost devices.
  • usable by non-expert users.
  • Sparse feature-based SLAM:
  • the world is modeled as an unconnected point cloud.
  • can be improved by keyframe SLAM systems (bundle adjustment).
  • Dense SLAM:
  • requires GPGPU processing hardware.
  • reconstructs and tracks full surface models.
  • KinectFusion (extended with a sliding volume: Kintinuous).
  • A truly scalable, multi-resolution, loop-closure-capable dense non-parametric surface representation has not been developed yet.
  • Wasteful in environments with symmetry.
SLIDE 4

Tools

  • KinectFusion
  • Kintinuous: https://www.youtube.com/watch?v=D3yYjaLmiqU
  • Iterative Closest Point: https://www.youtube.com/watch?v=PCPfuZ7njmQ
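ICP underlies several of the systems in later slides. Below is a dependency-free toy sketch of a single ICP iteration, restricted to translation only so it stays short (a real implementation also solves for rotation, e.g. via an SVD of the cross-covariance); the function name and example values are illustrative, not from any of the papers.

```python
def icp_translation_step(source, target):
    """One ICP iteration (translation only): match each source point to its
    nearest target point, then shift the source by the mean residual."""
    def nearest(p):
        return min(target, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    residuals = [[b - a for a, b in zip(p, nearest(p))] for p in source]
    dim = len(source[0])
    shift = [sum(r[i] for r in residuals) / len(source) for i in range(dim)]
    return [[a + s for a, s in zip(p, shift)] for p in source]

# Toy 2D example: the source is a copy of the target shifted by 0.4 in x,
# so a single translation-only step aligns it exactly.
src = [(0.0, 0.0), (1.0, 0.0)]
tgt = [(0.4, 0.0), (1.4, 0.0)]
moved = icp_translation_step(src, tgt)
```

In practice the match-then-update loop repeats until the residual stops shrinking; the videos above show the full rigid-body version.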

SLIDE 5

Lifelong Object Segmentation

  • Input: two RGB-D maps with regions of overlap where the map has changed (obtained from the Kintinuous system).
  • Goals:
  • discover objects by differencing.
  • train segmentation algorithms.
  • if the same object is discovered again, refine the features and the segmentation method.

SLIDE 6

Object Discovery

  • RGB-D SLAM - Kintinuous.
  • Map alignment - manually initialized and refined by ICP.
  • Differencing (symmetric).
  • Filtering:
  • small scattered points - remove clusters smaller than a 3 x 3 x 3 cm cube (estimated from the Kintinuous volumetric resolution).
  • free-space filtering.
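The symmetric differencing step can be sketched as follows: a point counts as changed if the other map has no point within a small radius. The brute-force nearest-neighbor search, function names, and the `radius` default are illustrative assumptions (a real system would use a KD-tree and the volumetric resolution).

```python
import math

def _has_neighbor(p, cloud, radius):
    """True if any point of `cloud` lies within `radius` of point `p`."""
    r2 = radius * radius
    return any((p[0]-q[0])**2 + (p[1]-q[1])**2 + (p[2]-q[2])**2 <= r2
               for q in cloud)

def symmetric_diff(map_a, map_b, radius=0.03):
    """Return (removed, added): points of A missing from B, and vice versa."""
    removed = [p for p in map_a if not _has_neighbor(p, map_b, radius)]
    added = [p for p in map_b if not _has_neighbor(p, map_a, radius)]
    return removed, added

# Toy example: one point moved between the two captures.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
removed, added = symmetric_diff(a, b)
```

The filtering stages then prune `removed`/`added` clusters that are too small or do not protrude into observed free space.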
SLIDE 7

Free-space Filtering

(Figure: the map before and after free-space filtering.)

SLIDE 8

Free-space Filtering

  • Each region of the volume is classified as unseen, occupied, or free space.
  • Only clusters that protrude into the free space are assumed to be objects.
SLIDE 9

Segmentation Methods

  • Graph structure:
  • treat every point in the map as a node.
  • neighboring nodes are defined by the points within a radius r’ (twice the volumetric resolution).
  • connect neighboring nodes by undirected edges with weights (initialized by T).
  • Kill edges with a weight below a dynamic threshold:
  • the threshold (controlled by k) grows after each joining.
  • k is correlated to the segment size.
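One common reading of this dynamic-threshold rule is Felzenszwalb-Huttenlocher-style graph segmentation: process edges in ascending weight order and join two components only while the edge weight stays under a size-dependent threshold tau(C) = k / |C|, which effectively grows after each join. A minimal sketch, with illustrative names, assuming that reading:

```python
def segment(num_nodes, edges, k=1.0):
    """edges: list of (weight, u, v). Returns a component label per node."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    thresh = [k] * num_nodes          # tau(C) = k / |C| starts at k / 1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv and w <= min(thresh[ru], thresh[rv]):
            parent[rv] = ru                  # join the two components
            size[ru] += size[rv]
            thresh[ru] = w + k / size[ru]    # threshold grows after joining
    return [find(i) for i in range(num_nodes)]

# Toy example: two tight pairs joined by one expensive edge stay separate.
labels = segment(4, [(0.1, 0, 1), (0.1, 2, 3), (5.0, 1, 2)], k=1.0)
```

Larger k biases the result toward larger segments, matching the slide's note that k is correlated to segment size.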
SLIDE 10

Edge Weights

  • Color edge weights:
  • Normal edge weights:
  • convex parts more likely correspond to objects.
  • concave parts correspond to object boundaries.
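A hedged sketch of a normal-based edge weight: two oriented points are convexly connected when their normals open away from each other along the displacement between them, i.e. (n1 - n2) . (p2 - p1) <= 0. The exact weight formula below is an assumption for illustration, not the paper's; the point is only that concave connections (likely object boundaries) get penalized.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normal_edge_weight(p1, n1, p2, n2):
    """Low weight on convex connections, high weight on concave ones."""
    d = [b - a for a, b in zip(p1, p2)]
    convexity = dot(n1, d) - dot(n2, d)   # <= 0 when the surface is convex
    return max(0.0, convexity)            # concave edges are penalized

# Two points on the outside of a sphere (convex): zero weight.
w_convex = normal_edge_weight((1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 0))
# The same points on the inside of a bowl (concave): positive weight.
w_concave = normal_edge_weight((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0))
```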

SLIDE 11

Segmentation Fitting

  • Scoring.
  • Optimization.
SLIDE 12

Different Segmentation Methods

(Figure: segmentation results using color, convexity, and surface normal edge weights.)

SLIDE 13

Segmentation Fitting

  • Object representation:
  • based on Principal Component Analysis (PCA).
  • each feature is modeled as a normal distribution, with the mean being the measured value and the variance being …
  • Object matching.
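Under this representation, matching reduces to scoring a newly measured feature value under the stored normal distribution. A minimal sketch; the function name and the example values are illustrative, not from the paper:

```python
import math

def feature_likelihood(x, mean, var):
    """Likelihood of a measured feature value under the stored N(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# A measurement near the stored mean scores much higher than a distant one.
lik_close = feature_likelihood(1.0, 1.0, 0.1)
lik_far = feature_likelihood(3.0, 1.0, 0.1)
```

An object-level match score would then combine the per-feature likelihoods, e.g. as a product over independent features.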
SLIDE 14

Lifelong Learning

  • Manually group discovered objects together to guarantee the correct convergence of the variance of the features.
  • Update scores and feature distributions:
  • the new score is the weighted average, using the number of times each object was observed as the weights.
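The weighted-average score update can be sketched as follows, assuming each past observation contributes with weight one (so the old score carries its observation count); names are illustrative:

```python
def update_score(old_score, old_count, new_score):
    """Observation-count-weighted average of the old score and a new one."""
    count = old_count + 1
    return (old_score * old_count + new_score) / count, count

# An object seen 4 times with score 0.8 gets a 5th observation scoring 0.3.
score, n = update_score(0.8, 4, 0.3)
# score == (0.8 * 4 + 0.3) / 5 == 0.7
```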

SLIDE 15

Results

(Figure: precision-recall curves for the trash bin and stuffed bunny objects.)

SLIDE 16

Results

SLIDE 17

Semantic KinectFusion

  • Contributions:
  • using a TSDF volume to build a keyframe representation of the environment.
  • Semantic Bundle Adjustment.
SLIDE 18

Keyframe Representation

  • Surface voxels: voxels having a non-truncated function value in the TSDF.
  • Add a new field to each TSDF voxel to keep track of the list of keyframes.
  • Append the keyframe index to the keyframe list when merging a keyframe into the volume.
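A sketch of the augmented voxel record described above: a standard TSDF voxel (signed distance plus integration weight) extended with the list of keyframes that touched it. Field names and the simple running-average integration rule are illustrative assumptions, not the paper's exact scheme.

```python
from dataclasses import dataclass, field

@dataclass
class TSDFVoxel:
    sdf: float = 1.0        # truncated signed distance, 1.0 = uninitialized
    weight: float = 0.0     # integration weight (number of observations here)
    keyframes: list = field(default_factory=list)  # contributing keyframe indices

    def integrate(self, sdf_obs, kf_index, trunc=1.0):
        """Fuse one depth observation and record the keyframe it came from."""
        self.sdf = (self.sdf * self.weight + sdf_obs) / (self.weight + 1)
        self.weight += 1
        if abs(self.sdf) < trunc and kf_index not in self.keyframes:
            self.keyframes.append(kf_index)  # surface voxel: remember keyframe

v = TSDFVoxel()
v.integrate(0.2, kf_index=0)
v.integrate(0.1, kf_index=3)
```

The per-voxel keyframe list is what later lets the matching scheme project a surface voxel back into every keyframe that observed it.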

SLIDE 19

Optimization

  • Optimize the pose graph (g2o optimizer).
  • cost function:
  • Novel matching scheme - for each keyframe:
  • find its corresponding surface voxels by back-projection.
  • get the keyframe list and project the voxel back to all corresponding keyframes.
  • 2D local search to find the nearest 3D point.
  • After all this, recreate the TSDF.
SLIDE 20

Semantic KinectFusion

  • Extract 3D features (in experiments: SIFT3D, Color-SHOT).
  • “A Representation for 3-D Surface Matching”
  • “A combined texture-shape descriptor for enhanced 3D feature matching”
  • Generate a set of candidate hypotheses on objects’ presence:
  • RANSAC-based 6DOF pose estimation.
  • Validation graph:
  • edges:
  • virtual edges:
  • Clear wrong hypotheses and include all the constraints in the global graph.
SLIDE 21

System Overview

SLIDE 22

Results

SLIDE 23

Results

SLIDE 24

Results

SLIDE 25

Object-oriented SLAM - SLAM++

  • Stronger assumptions:
  • World has intrinsic symmetry - repetitive objects.
  • Pre-define the objects.
  • Video: https://www.youtube.com/watch?v=tmrAh1CqCRo
SLIDE 26

Characteristics of Object SLAM

  • The map only contains a few discrete entities.
  • Enables jointly optimizing over all object positions to make a globally consistent map.
  • Tracking one object in 6DOF is enough to localize a camera.
  • Lost-camera recovery or loop closure detection can be performed based on a small number of object measurements.

SLIDE 27

SLAM Map Representation

  • Graph representation:
  • object - world pose
  • object - camera pose
  • camera - world pose
  • additional factors:
  • camera - camera pose (ICP).
  • structural priors: objects must be grounded on the same plane.

SLIDE 28

Real-Time Object Recognition

  • Point-Pair Features (PPFs):
  • 4D descriptors of the relative position and normals of pairs of oriented points.
  • randomly sample points and pair them up in all possible combinations to generate PPFs.
  • match against models, producing a vote for each match.
  • Active Object Search:
  • generate a mask in image space from view prediction (to avoid occlusion).
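The classic 4D point-pair feature for two oriented points is F = (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1. A minimal pure-Python sketch of that descriptor (real systems quantize these four values into hash-table bins for the voting step):

```python
import math

def _angle(a, b):
    """Angle between two 3D vectors, clamped for numerical safety."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    c = sum(x * y for x, y in zip(a, b)) / (na * nb)
    return math.acos(max(-1.0, min(1.0, c)))

def ppf(p1, n1, p2, n2):
    """4D point-pair feature: (|d|, angle(n1,d), angle(n2,d), angle(n1,n2))."""
    d = [b - a for a, b in zip(p1, p2)]
    return (math.sqrt(sum(x * x for x in d)),
            _angle(n1, d), _angle(n2, d), _angle(n1, n2))

# Two points 1 unit apart with parallel normals perpendicular to the line
# between them: distance 1.0, two right angles, and a zero normal angle.
f = ppf((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1))
```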
SLIDE 29

Active Object Search

SLIDE 30

Camera Tracking and Object Pose Estimation

  • Camera model tracking:
  • in KinectFusion, track against incomplete models at the early stage.
  • for SLAM++, track against high-quality models.
  • Tracking for model initialization (reject incorrect objects):
  • given a candidate object and a detected pose, run camera-model ICP estimation on the detected object pose.
  • Camera-object pose constraints:
  • run a dense ICP estimate between the live frame and each visible model object.

SLIDE 31

Relocalization

SLIDE 32

Loop Closure

  • Small loops:
  • standard ICP tracking mechanism.
  • Large loops:
  • matching fragments within the main long-term graph (same as relocalization).

SLIDE 33

Loop Closure

SLIDE 34

Real-world Example

Whelan, Thomas, et al. "3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world." https://www.youtube.com/watch?v=XqDUniEY954

SLIDE 35

System Overview

SLIDE 36

Steps

  • Avoidance-based exploration.
  • Dense SLAM.
  • Planar simplification.
  • Object detection.
  • Path planning.
  • Onboard localization and control.
SLIDE 37

Problem: compounding of failure rates

  • Frame-drops in the wireless streaming of the raw RGB-D image sequence.
  • Failure of the planar segmentation algorithm.
  • Failure of object segmentation -> recognition due to noise.
  • Conclusion: the techniques involved are quite robust on their own, but their combined reliability is not enough when they are chained into a complete framework.