Geometric and semantic SLAM using high level features
Shichao Yang, Michael Kaess, Sebastian Scherer
q Widely used in searching, monitoring, mapping etc.
q Focus on the monocular camera.
Autonomous Robots
2
Bridge inspection Riverine mapping
q Simultaneous Localization and Mapping
SLAM
3
ORB SLAM (R. Mur-Artal et al., 2015); LSD, DSO (Jakob Engel et al., 2016)
q Traditional SLAM might fail in challenging low-texture cases.
SLAM not enough?
4
DSO ORB SLAM
q Need objects and planes, in addition to points.
q Reason about the positions of 3D objects and layouts.
SLAM not enough?
5
Autonomous Driving
Mousavian, 2016
Virtual Reality
Placing furniture
Scene understanding
Hedau, Hoiem, 2010
Robotic Manipulation
Seung-Joon Yi, 2015
Methods
6
Scene understanding, SLAM
Jointly solve SLAM and scene understanding, demonstrating that they can benefit each other.
High level features: lines, planes, objects…
q Line VO/SLAM q Plane SLAM q Object SLAM
Outline
7
q Line VO/SLAM q Plane SLAM q Object SLAM
Outline
8
q Lines are an important feature
§ Exist in low-texture environments
§ Provide long range constraints
q Challenges:
§ Line parameterization
§ Sensitive to occlusion, less reliable than points
Line VO/SLAM
9
Feature points Lines
Shichao Yang, Sebastian Scherer. "Direct Monocular Odometry Using Points and Lines." ICRA, 2017
q Only geometric error
Related Work: Points + Line
10
Albert et al., ICRA, 2017; Juan et al., ICCV, 2015; Manohar et al., ICRA, 2016
Point-point geometric error; line-line geometric error; point-line geometric error
q Points + lines. Two error types.
q Contributions:
§ Combine points and lines with two types of error, especially suitable for low-texture environments
§ Provide an uncertainty analysis and probabilistic fusion in tracking and mapping
§ Real time VO, outperforming or comparable to existing VO
Proposed Line VO
11
Direct method: point-point photometric error $I_1 - I_2$
Feature-based method: point-line geometric error $d$
q Pipeline, as an extension of point based SDVO[1]
Proposed Line VO
12
[1] Semi-dense Visual Odometry for a Monocular Camera, Jakob Engel, et al. ICCV. 2013
Images → point extraction (high gradient pixels) and line detection (line pixels)
Tracking: estimate camera pose by minimizing two kinds of errors.
Mapping: estimate keyframe's depth through line regularization.
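The two kinds of tracking error can be illustrated with a small sketch. It is not the authors' implementation: the function names, the homogeneous-line formulation, and the plain weighted sum of squared residuals are assumptions made for illustration. The slides only state that tracking minimizes a photometric error for points and a point-line geometric error for lines.

```python
import numpy as np

def point_line_distance(point, line_start, line_end):
    """Perpendicular distance (pixels) from a 2D point to the infinite
    line through two distinct endpoints."""
    p = np.append(np.asarray(point, float), 1.0)
    a = np.append(np.asarray(line_start, float), 1.0)
    b = np.append(np.asarray(line_end, float), 1.0)
    line = np.cross(a, b)  # homogeneous line coefficients (la, lb, lc)
    return float(abs(line @ p) / np.hypot(line[0], line[1]))

def photometric_residual(intensity_ref, intensity_cur):
    """Intensity difference between a reference pixel and its reprojection."""
    return float(intensity_ref) - float(intensity_cur)

def combined_cost(photo_residuals, line_distances,
                  photo_sigma=1.0, line_sigma=1.0):
    """Sum of squared residuals, each scaled by an assumed standard
    deviation (hypothetical weighting, not taken from the slides)."""
    photo = np.asarray(photo_residuals, float) / photo_sigma
    geo = np.asarray(line_distances, float) / line_sigma
    return float(np.sum(photo ** 2) + np.sum(geo ** 2))
```

A pose optimizer would evaluate `combined_cost` for each candidate camera pose and keep the pose with the lowest value; the optimizer itself is outside this sketch.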
Experiments - Line VO
13
Relative Position Error (cm/s) on TUM Dataset
q Datasets with various textures
Experiments - Line VO
14
https://youtu.be/wu4jL2jQEac
q Line VO/SLAM q Plane SLAM q Object SLAM
Outline
15
q Manhattan corridors
§ Similar layout structures
§ Low-texture: few visual features
Difficult for traditional v-SLAM
Introduction - Plane SLAM
16
SLAM with Layout Planes IROS, 2016
Texture-less but structured corridor. Sparse and inaccurate map of ORB-SLAM
Related Work – Layout Understanding
17
Hoiem, 2007: decision tree segmentation + pop up
Hedau, 2009: cuboidal room using vanishing points
Lee, 2009: fixed corridor models
§ Usually works for Manhattan box environments or fixed corridor configurations and viewpoints
§ Not real time.
q Sequential approach, solving problems separately.
Related Work - SLAM + Layout
18
Sid Yingze Bao, 2014: SLAM point cloud, then detect plane and object
Concha, Alejo, 2015: 3D layout from scene understanding, then post dense mapping
Limitations
§ If one module fails, the other also fails.
Proposed methods
19
q Contributions
§ Jointly optimize scene layouts with camera poses in a SLAM framework and in large environments for the first time.
§ Real time system applicable for robot navigation.
Scene understanding, SLAM
q Layout plane extraction from single image
Plane model from Single Image
20
Shichao Yang, Daniel Maturana, Sebastian Scherer. "Real-time 3D scene layout from a single image using convolutional neural networks." ICRA, 2016
Ground segmentation → boundary line fitting → pop up 3D model
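The pop-up step can be sketched as a ray-plane intersection for a single ground pixel. Everything specific here is an assumption that the slide does not state: a pinhole camera with intrinsic matrix `K`, an optical axis parallel to the ground, a known camera height, and a camera frame with x right, y down, z forward. The function name is hypothetical.

```python
import numpy as np

def pop_up_ground_pixel(pixel, K, camera_height):
    """Back-project a ground pixel to a 3D point in the camera frame.

    Assumes the ground is the plane y = camera_height in a camera frame
    whose y axis points down (an assumption, not taken from the slides).
    """
    u, v = pixel
    ray = np.linalg.solve(np.asarray(K, float), np.array([u, v, 1.0]))
    if ray[1] <= 0:
        raise ValueError("pixel is at or above the horizon; no ground hit")
    return ray * (camera_height / ray[1])

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
ground_point = pop_up_ground_pixel((320, 490), K, camera_height=1.0)
```

Wall planes could then be raised from the 3D ground boundary points, but that step is not shown here.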
q Generalizes to various environment structures
§ Wall Ground
Plane model from Single Image
Columns: input; ours; Hoiem, 2007; Hedau, 2009; Lee, 2009
21
§ Fast: runs in real time at 60 Hz
§ More accurate and robust to various environments.
Plane model from Single Image
22
4x speed
https://youtu.be/2CvFHy5jk1c
q Previous method
q Improvement
§ Plane matches true layout, invariant across frames, suitable for SLAM.
§ More accurate 3D model
Plane model from Single Image
23
Previous method: ground segmentation → line fitting → pop-up
Improvement: detect edge → select edge → pop-up
q Optimal edge set selection
§ Submodular problem: $\max_{S \subseteq V} F(S), \ \text{s.t.} \ S \in \mathcal{I}$
§ Greedy solution, selecting edges one by one: $S \leftarrow S \cup \arg\max_{e \notin S:\, S \cup \{e\} \in \mathcal{I}} \Delta(e \mid S)$
$\Delta(e \mid S)$ is the marginal gain of adding edge $e$.
Plane model from Single Image
24
All edges; ground segmentation; selected ground edges
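The greedy rule on this slide can be sketched generically: repeatedly add the feasible edge with the largest marginal gain. The score function and the constraint below are toy stand-ins (column coverage and a cap on the number of edges), chosen only to make the example runnable; the slides do not give the actual score or constraint set. Stopping when the best gain is not positive is also an assumption.

```python
def greedy_select(candidates, gain, is_feasible):
    """Greedy selection: add the feasible candidate with the largest
    marginal gain until nothing feasible (or nothing useful) remains."""
    selected = []
    remaining = list(candidates)
    while remaining:
        feasible = [e for e in remaining if is_feasible(selected + [e])]
        if not feasible:
            break
        best = max(feasible, key=lambda e: gain(e, selected))
        if gain(best, selected) <= 0:  # assumed stopping rule
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): each edge covers a half-open range of image columns.
edges = [(0, 5), (3, 8), (8, 10), (4, 6)]

def covered(selection):
    return {c for start, end in selection for c in range(start, end)}

def coverage_gain(edge, selected):
    # Marginal gain: number of newly covered columns.
    return len(covered(selected + [edge])) - len(covered(selected))

def at_most_two(selection):
    # Toy feasibility constraint: keep at most two edges.
    return len(selection) <= 2

chosen = greedy_select(edges, coverage_gain, at_most_two)
```

Coverage is a standard example of a submodular score, which is why it is used as the stand-in here.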
q Factor Graph
q Edges
§ Plane measurement $c_i$ from the single image pop-up process
§ Re-pop to update the measurement after camera poses change
q Nodes
§ Plane: $\pi_i \in \{\pi \in \mathbb{R}^4, \|\pi\| = 1\}$. 3 DoF quaternion as minimal representation for manifold optimization [1].
§ Camera pose: $x_i$, 6 DoF, SE(3)
Pop-up Plane SLAM
25
Shichao Yang, Yu Song, Michael Kaess, Sebastian Scherer. "Pop-up SLAM: a Semantic Monocular Plane SLAM for Low-texture Environments." IROS, 2016
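The plane node can be illustrated as a unit-norm homogeneous 4-vector that is moved between frames with a camera pose. This is a sketch under stated assumptions, not the system's code: the plane convention (normal and offset with normal · x + offset = 0), the pose convention (`T_world_camera` maps camera-frame points to the world frame), and both function names are illustrative choices. The minimal 3 DoF update used during optimization is not shown.

```python
import numpy as np

def normalize_plane(plane):
    """Scale a homogeneous plane (a, b, c, d) to unit norm."""
    plane = np.asarray(plane, float)
    return plane / np.linalg.norm(plane)

def plane_to_camera_frame(plane_world, T_world_camera):
    """Express a world-frame plane in the camera frame.

    Points satisfy x_world = T_world_camera @ x_camera, so a plane with
    plane_world . x_world = 0 becomes T_world_camera.T @ plane_world.
    """
    return normalize_plane(np.asarray(T_world_camera, float).T
                           @ np.asarray(plane_world, float))

# Example: the world plane z = 2, seen from a camera translated to z = 1.
plane_world = normalize_plane([0.0, 0.0, 1.0, -2.0])
T_world_camera = np.eye(4)
T_world_camera[2, 3] = 1.0
plane_camera = plane_to_camera_frame(plane_world, T_world_camera)
```

Note that a plane and its negation describe the same surface, so comparisons between planes should account for the sign ambiguity.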
q Data association
- Geometry, not visual features, due to low texture
- Angle difference between plane normals $n_1$, $n_2$
- Overlapping ratio from projecting $\pi_2$ onto $\pi_1$
q Loop closing
§ Bag of words place recognition
§ Planes have different appearance and size across frames. Landmarks are merged after having been created for some frames, so their factors need to be shifted.
Pop-up Plane SLAM
26
Shift factors
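The geometric data association test on this slide can be sketched as two checks: the angle between plane normals and the overlap after projecting one plane onto the other. The thresholds, the function names, and the use of axis-aligned bounding boxes in the plane's 2D coordinates are simplifying assumptions for illustration; the slides do not specify how the overlap ratio is computed.

```python
import numpy as np

def normal_angle(n1, n2):
    """Angle (radians) between two plane normals, ignoring their sign."""
    n1 = np.asarray(n1, float) / np.linalg.norm(n1)
    n2 = np.asarray(n2, float) / np.linalg.norm(n2)
    return float(np.arccos(np.clip(abs(n1 @ n2), 0.0, 1.0)))

def plane_basis(normal):
    """Two orthonormal vectors spanning the plane with the given normal."""
    n = np.asarray(normal, float) / np.linalg.norm(normal)
    helper = np.eye(3)[np.argmin(np.abs(n))]  # axis least aligned with n
    u = np.cross(n, helper)
    u /= np.linalg.norm(u)
    return u, np.cross(n, u)

def overlap_ratio(points1, points2, normal1):
    """Project both boundary point sets onto plane 1 and compare their
    2D bounding boxes: intersection area over the smaller box area."""
    basis = np.stack(plane_basis(normal1), axis=1)  # 3x2 projection basis
    box1 = np.asarray(points1, float) @ basis
    box2 = np.asarray(points2, float) @ basis
    lo = np.maximum(box1.min(axis=0), box2.min(axis=0))
    hi = np.minimum(box1.max(axis=0), box2.max(axis=0))
    intersection = np.prod(np.clip(hi - lo, 0.0, None))
    smaller = min(np.prod(np.ptp(box1, axis=0)), np.prod(np.ptp(box2, axis=0)))
    return float(intersection / smaller) if smaller > 0 else 0.0

def same_plane(n1, pts1, n2, pts2,
               max_angle=np.deg2rad(10.0), min_overlap=0.3):
    """Hypothetical thresholds; tune for the actual system."""
    return (normal_angle(n1, n2) < max_angle
            and overlap_ratio(pts1, pts2, n1) > min_overlap)
```

A fuller test would also compare the distance between the planes, which this sketch leaves out.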
q Plane SLAM alone is sometimes not enough
q Point SLAM is not accurate in forward corridor motion with low parallax
Point-plane Fusion
27
RGB image, pop-up depth map
Much better than stereo triangulation
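q Depth fusion:
§ Integrate LSD depth $d_l$ and pop-up depth $d_p$ in a filtering approach:
$$\mathcal{N}\left(\frac{\sigma_l^2 d_p + \sigma_p^2 d_l}{\sigma_l^2 + \sigma_p^2},\ \frac{\sigma_l^2 \sigma_p^2}{\sigma_l^2 + \sigma_p^2}\right)$$
$\sigma_l^2$ and $\sigma_p^2$ are the variances of the depth measurements.
The pop-up variance $\sigma_p^2$ is computed through the error propagation rule.
Point-plane Fusion
28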
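The filtering update on this slide is the standard product of two Gaussian estimates: each depth is weighted by the other measurement's variance. A minimal sketch follows, with hypothetical argument names; it assumes both variances are positive and says nothing about how the pop-up variance is obtained.

```python
def fuse_depths(depth_lsd, var_lsd, depth_popup, var_popup):
    """Fuse two independent Gaussian depth estimates.

    Returns the mean and variance of the fused estimate. The less
    certain measurement (larger variance) gets the smaller weight.
    """
    total = var_lsd + var_popup
    mean = (var_lsd * depth_popup + var_popup * depth_lsd) / total
    variance = var_lsd * var_popup / total
    return mean, variance

# Example: an uncertain LSD depth of 2 m and a more certain pop-up depth of 4 m.
fused_mean, fused_var = fuse_depths(2.0, 4.0, 4.0, 1.0)
```

The fused variance is always smaller than either input variance, so each fusion step tightens the depth estimate.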
q On public and collected datasets.
Experiments of Plane SLAM
29
https://youtu.be/TOSOWdxmtkw
q Comparison with LSD SLAM and ORB SLAM
§ On TUM dataset.
Experiments of Plane SLAM
30
        Plane normal error   Depth error   Depth error < 0.1 m
Value   2.83                 6.2 cm        86.8%
Existing point SLAM fails
q On our data I
Experiments of Plane SLAM
31
Panels: input image; LSD SLAM; ORB SLAM; our algorithms (Depth Enhanced LSD SLAM, LSD Pop-up SLAM)
q On our data II
Experiments of Plane SLAM
32
Panels: input image; LSD SLAM; ORB SLAM; our algorithms
Loop error 0.67%.
q Line VO/SLAM q Plane SLAM q Object SLAM
Outline
33
q SLAM with objects and planes.
Introduction – Object SLAM
34
Completed work: plane SLAM
Proposed work: plane and object SLAM
Related Work – 3D Object Understanding
35
Schwing, 2013: object aligned with room
Choi, 2013: prior CAD model
Murthy, 2017: keypoint model
Limitations
§ Need a prior object CAD model or shape priors.
Related Work – Object SLAM
36
Bao, Sid Yingze, et al. 2012
SLAM++ (RGBD), Salas-Moreno, et al. 2013
(Only two images)
Dorian Gálvez-López, et al. 2016
Limitations
§ Work only for small workspaces
§ Require a known object model
q Without 3D CAD or keypoint model.
Single image 3D object detection
37
q On TUM sequence (preliminary result)
Object SLAM
38
3D object detection in a single image, without a prior object model
Multi-view object SLAM
Existing point SLAM methods all fail
Each object has a 6 DoF pose plus length, width, and height
q SLAM with high level features, from scene understanding.
q Improve both state estimation and mapping.
q Without prior CAD model or room model.
Conclusion
39
Line Plane Object
Plane Object Points
Image modified from Salas-Moreno, 2014
q More complicated environments? Support relations?
q Jointly optimize points, planes, and objects?
Future work
40
Segmentation Intersection Occlusion
41