Real-Time Structure and Object-Model Aware Sparse SLAM


SLIDE 1

www.roboticvision.org

ARC Centre of Excellence for Robotic Vision

Real-Time Structure and Object- Model Aware Sparse SLAM

Mehdi Hosseinzadeh

Australian Centre for Robotic Vision The University of Adelaide Invited Talk, Second LPM Workshop, 23 May 2019 ICRA 2019 Montreal, Canada

SLIDE 2

We want to build a robot that we can ask “Go get me a spoon!”

SLIDE 3

“The goal of semantic SLAM is to create maps that include meanings, both to robots and human. Maps that include semantic information make it easier for robots and human to communicate and reason about goals.” Meanings = Semantics, affordance, relations, …

SLIDE 4

Q: How to add semantics/objects to the map?

SLIDE 5

Different approaches

  • Semantic Mapping
    – Offline approach
    – Online/Incremental approach
  • Semantic SLAM
    – Indirect methods
    – Direct methods

SLIDE 6

Semantic Mapping

  • Incorporate semantics in the mapping without informing localization
  • SemanticFusion, …

[Factor-graph figure: camera poses and point landmarks; semantic labels are attached to the map only.]

SLIDE 7

Semantic Mapping

  • Offline approach
    – Map reconstruction followed by 3D semantic segmentation
  • Online/Incremental approach
    – Incremental map reconstruction and semantic segmentation

SLIDE 8

Semantic SLAM

  • Semantics/objects also inform localization
    – Indirectly, by improving the data association
      • Changing the topology of the factor graph
    – Directly, by being involved in the “object space” optimization
SLIDE 9

Indirect Semantic SLAM

  • High-level landmarks in the graph are typically coordinate frames attached to the objects
  • SLAM++, …

[Factor-graph figure: camera poses connected to object landmarks (the pose/centre of each object) via ICP-based observations.]

SLIDE 10

Direct Semantic SLAM

  • High-level landmarks in the graph: latent-space or coarse representations of objects
  • Object or structure landmarks are optimized as independent landmarks directly in BA

[Figure: input, reconstruction, coarse representation, fine representation; factor graph with object landmarks optimized directly in bundle adjustment.]

SLIDE 11

Our Goal

  • Incorporating generic objects as quadrics
    – Detected by a real-time deep-learned object detector
  • Incorporating finer reconstructions of objects
    – Reconstructed by a point-set CNN
    – To refine the shape of the quadric
  • Incorporating the dominant planar structure of the scene
  • … all in a sparse key-frame-based point SLAM
    – More accurate localization
    – Semantically rich maps

SLIDE 12

Plane and Quadric Representations in our Sparse SLAM

  • Proposed dual quadric representation
    – sparse-SLAM compatible
    – allows online updates
    – allows semantic constraints
    – estimates rough extent and orientation
    – bounding boxes as observations in images
  • Structure of the scene
    – Planes
      • normalized homogeneous representations
  • Priors
    – Manhattan constraints
    – Affordance constraints
    – Shape priors
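As a side note on the normalized homogeneous plane representation, here is a minimal NumPy sketch (function names and values are mine, for illustration) of the normalization and the point-plane residual it induces:

```python
import numpy as np

def normalize_plane(pi):
    """Scale a homogeneous plane (a, b, c, d) so its normal has unit length."""
    return pi / np.linalg.norm(pi[:3])

def point_plane_residual(pi, p):
    """Signed point-to-plane distance for a normalized plane: the error a
    point-plane factor would penalize."""
    return pi[:3] @ p + pi[3]

pi = normalize_plane(np.array([0.0, 0.0, 2.0, -4.0]))       # plane z = 2
print(point_plane_residual(pi, np.array([1.0, 1.0, 3.0])))  # 1.0
```

With the normal normalized, the residual is a metric distance, which is what makes this a convenient minimal representation for sparse SLAM.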

SLIDE 13

Quadric Geometry

  • Point quadric $R$
    – a quadric surface in 3D space (ellipsoids, …) can be represented by a homogeneous quadratic form defined on the 3D projective space $\mathbb{P}^3$, satisfying $y^\top R\, y = 0$
    – the relationship between a point quadric and its projection into an image plane (a conic) is not straightforward
  • Dual quadric $R^\ast$
    – represented as the envelope of a set of tangent planes, viz. $\rho^\top R^\ast \rho = 0$
    – where $\rho$ is a plane tangent to the point quadric $R$
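These two conditions can be checked numerically; the unit sphere below is an illustrative choice, not from the slides:

```python
import numpy as np

# Point quadric R of the unit sphere x^2 + y^2 + z^2 - 1 = 0
R = np.diag([1.0, 1.0, 1.0, -1.0])

# A homogeneous point on the sphere satisfies y^T R y = 0
y = np.array([1.0, 0.0, 0.0, 1.0])
assert abs(y @ R @ y) < 1e-12

# Dual quadric R*: for a non-degenerate quadric it is the inverse
# (adjugate, up to scale) of the point quadric
R_dual = np.linalg.inv(R)

# The plane x = 1, i.e. rho = (1, 0, 0, -1), is tangent to the sphere,
# so it satisfies rho^T R* rho = 0
rho = np.array([1.0, 0.0, 0.0, -1.0])
assert abs(rho @ R_dual @ rho) < 1e-12
```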
SLIDE 14

Quadric Geometry

  • Dual quadric (ellipsoid) decomposition:

$$R^\ast = U \hat{R}^\ast U^\top = \begin{bmatrix} S & u \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} M M^\top & 0 \\ 0^\top & -1 \end{bmatrix} \begin{bmatrix} S^\top & 0 \\ u^\top & 1 \end{bmatrix}$$

  • Decoupled update in the underlying manifolds:

$$R^\ast \oplus \Delta R^\ast = (U, M) \oplus (\Delta U, \Delta M) = (U \cdot \Delta U,\; M + \Delta M)$$
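A sketch of this decomposition and decoupled update in NumPy, assuming $M$ is held as a 3×3 diagonal of semi-axes (the pose and shape values below are illustrative):

```python
import numpy as np

def build_dual_quadric(S, u, M):
    """Compose R* = U diag(M M^T, -1) U^T from rotation S, centre u, shape M."""
    U = np.eye(4)
    U[:3, :3] = S
    U[:3, 3] = u
    core = np.block([[M @ M.T, np.zeros((3, 1))],
                     [np.zeros((1, 3)), -np.ones((1, 1))]])
    return U @ core @ U.T

def update_dual_quadric(S, u, M, dU, dM):
    """Decoupled update: pose composed on its manifold, shape updated additively."""
    U = np.eye(4); U[:3, :3] = S; U[:3, 3] = u
    U_new = U @ dU                                # (U . dU)
    return U_new[:3, :3], U_new[:3, 3], M + dM    # (M + dM)

# Ellipsoid with semi-axes (1, 2, 3) centred at (0.5, 0, 0)
S, u, M = np.eye(3), np.array([0.5, 0.0, 0.0]), np.diag([1.0, 2.0, 3.0])
R_dual = build_dual_quadric(S, u, M)

# The plane x = 1.5 touches the ellipsoid at its +x extreme point
rho = np.array([1.0, 0.0, 0.0, -1.5])
assert abs(rho @ R_dual @ rho) < 1e-9
```

Keeping pose and shape separate is what lets a sparse optimizer apply small, well-conditioned increments to each factor independently.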

SLIDE 15

Landmarks and Constraints

[Figure legend: Object, Point, Plane, and Camera landmarks; factors: conic-observation reprojection error, point-plane constraint, 3D plane observation]

3D Points:
  • ORB features

3D Planes:
  • Minimal rep (normalized homogeneous plane)
  • Matched by the 3D geometry of the planes and inlier matched points

Objects:
  • Represented by a dual quadric (9D)
  • Decomposed into pose and shape components
  • Tracked based on:
    ○ inlier matched points and semantics

Supporting/Tangency Affordance:
  • Imposed based on:
    ○ geometric tangency in the map
    ○ vicinity of the semantic objects in the frame

Manhattan Assumption:
  • Orthogonal planes
  • Parallel planes
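One plausible way to turn the Manhattan assumption into a factor is a residual on pairs of plane normals; the sketch below (threshold and function name are my own) penalizes the cross product for near-parallel pairs and the dot product for near-orthogonal ones:

```python
import numpy as np

def manhattan_residual(n1, n2):
    """Residual for two unit plane normals under the Manhattan assumption:
    near-parallel pairs use the cross product (zero iff parallel),
    near-orthogonal pairs use the dot product (zero iff orthogonal)."""
    if abs(n1 @ n2) > np.cos(np.deg2rad(45.0)):   # closer to parallel
        return np.cross(n1, n2)
    return np.array([n1 @ n2])

print(manhattan_residual(np.array([1.0, 0.0, 0.0]),
                         np.array([1.0, 0.0, 0.0])))   # [0. 0. 0.]
print(manhattan_residual(np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 1.0, 0.0])))   # [0.]
```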
SLIDE 16

Quadrics Projective Geometry

[Figure: a dual quadric $\mathbf{D}^\ast$ projects through the camera $P$ to a dual conic $\mathbf{C}^\ast$ in the image, which is compared against the detected bounding box.]

Observation model: $\mathbf{C}^\ast = P\, \mathbf{D}^\ast P^\top$
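The observation model can be exercised numerically; the intrinsics and the scene below are illustrative choices, not values from the talk:

```python
import numpy as np

# Illustrative pinhole intrinsics; camera at the origin looking down +z
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # 3x4 projection matrix

# Dual quadric of a unit sphere centred at (0, 0, 5)
U = np.eye(4); U[2, 3] = 5.0
D_dual = U @ np.diag([1.0, 1.0, 1.0, -1.0]) @ U.T

# A dual quadric projects to a dual conic: C* = P D* P^T
C_dual = P @ D_dual @ P.T

# Centre and half-width of the projected ellipse: these are what get
# compared against the detector's bounding box in the observation factor
cx = C_dual[0, 2] / C_dual[2, 2]
half_w = np.sqrt(cx**2 - C_dual[0, 0] / C_dual[2, 2])
print(cx, half_w)   # 320.0, ~102.06 px
```

The half-width matches the analytic value $f r / \sqrt{z^2 - r^2} = 500/\sqrt{24}$ for a sphere of radius 1 at depth 5, which is a useful sanity check on the projection.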

SLIDE 17

Quadric Initialization

Step 1: build the enclosing sphere of the object's 3D points.

Step 2: optimize the observation factor $g_R$ independently.

The 3D quadric is then ready for initialization in the factor graph.
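Step 1 might look like the sketch below; the centroid-plus-maximum-radius sphere is a simple stand-in (the slide does not spell out the exact construction), which the observation factor then refines:

```python
import numpy as np

def quadric_from_enclosing_sphere(pts):
    """Initial dual quadric: a sphere enclosing the object's 3D map points."""
    c = pts.mean(axis=0)                          # sphere centre
    r = np.linalg.norm(pts - c, axis=1).max()     # enclosing radius
    U = np.eye(4)
    U[:3, 3] = c
    return U @ np.diag([r**2, r**2, r**2, -1.0]) @ U.T

pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
R_dual = quadric_from_enclosing_sphere(pts)       # centre (1,0,0), radius 1

# The plane x = 2 is tangent to that sphere
rho = np.array([1.0, 0.0, 0.0, -2.0])
assert abs(rho @ R_dual @ rho) < 1e-9
```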

SLIDE 18

Point-Cloud Reconstruction and Shape Priors

[Pipeline figure: a CNN reconstructs a normalized point cloud; its minimum enclosing ellipsoid yields a quadric representation; registration (R, t, s) against our SLAM map produces the point cloud in quadric representation and a quadric prior factor.]
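The minimum-enclosing-ellipsoid step is not detailed here; as a rough stand-in, a PCA of the reconstructed point cloud already gives an orientation and a per-axis extent usable as a shape prior (all names and values below are illustrative):

```python
import numpy as np

def ellipsoid_prior_from_pointcloud(pts):
    """Rough ellipsoid prior from a reconstructed point cloud: PCA gives the
    orientation S and the per-axis spread used as semi-axes M. A simple
    substitute for a true minimum enclosing ellipsoid."""
    c = pts.mean(axis=0)
    w, S = np.linalg.eigh(np.cov(pts.T))          # eigenvalues ascending
    M = np.diag(np.sqrt(np.maximum(w, 1e-12)))    # semi-axes from spread
    return S, c, M

pts = np.array([[ 2.0, 0.0, 0.0], [-2.0, 0.0, 0.0],
                [ 0.0, 1.0, 0.0], [ 0.0,-1.0, 0.0],
                [ 0.0, 0.0, 0.5], [ 0.0, 0.0,-0.5]])
S, c, M = ellipsoid_prior_from_pointcloud(pts)
# The dominant axis (last column of S) aligns with x, the widest direction
```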

SLIDE 19

Point-Cloud Reconstructions

SLIDE 20

Monocular Plane Detection

[Pipeline figure: an RGB input passes through a joint CNN; clustering in normal space and clustering in depth each regress to planar regions, giving a first and a second plane hypothesis; agreement of the hypotheses yields the plane detection $\pi = (a, b, c, d)^\top$.]
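The agreement step can be sketched as a consistency check between the two plane hypotheses; the thresholds and function name below are illustrative, not from the talk:

```python
import numpy as np

def hypotheses_agree(pi1, pi2, ang_deg=10.0, off_tol=0.1):
    """Accept a plane only if the hypotheses from normal-space clustering and
    depth clustering agree in orientation and offset."""
    n1 = pi1[:3] / np.linalg.norm(pi1[:3]); d1 = pi1[3] / np.linalg.norm(pi1[:3])
    n2 = pi2[:3] / np.linalg.norm(pi2[:3]); d2 = pi2[3] / np.linalg.norm(pi2[:3])
    cos_ang = np.clip(abs(n1 @ n2), -1.0, 1.0)
    return np.degrees(np.arccos(cos_ang)) < ang_deg and abs(abs(d1) - abs(d2)) < off_tol

print(hypotheses_agree(np.array([0.0, 0.0, 1.0, -2.0]),
                       np.array([0.0, 0.0, 2.0, -4.05])))  # True
print(hypotheses_agree(np.array([0.0, 0.0, 1.0, -2.0]),
                       np.array([1.0, 0.0, 0.0, -2.0])))   # False
```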

SLIDE 21

Pipeline of our system

[System diagram: each RGB frame feeds (i) ORB-SLAM2 point-feature extraction and matching, (ii) a joint CNN¹ producing surface normals, depth, and semantic segmentation used for plane detection and plane matching, and (iii) the YOLOv3 object detector for object tracking, backed by a point-cloud reconstructor CNN² with registration. Local-map tracking optimizes the camera pose and generates hypotheses for constraints; adding a keyframe triggers local bundle adjustment and creates a local map with points, planes, quadrics, and object point clouds; bag-of-words loop detection updates the local map of points, planes, and quadrics and runs global bundle adjustment, yielding the estimated map.]

  • ¹ V. Nekrasov, T. Dharmasiri, A. Spek, T. Drummond, C. Shen, and I. Reid, “Real-time joint semantic segmentation and depth estimation using asymmetric annotations,” ICRA 2019.
  • ² H. Fan, H. Su, and L. J. Guibas, “A point set generation network for 3D object reconstruction from a single image,” CVPR 2017.
SLIDE 22

Results

ORB features, detected objects, segmented planes, and reconstructed maps (side and top views) for the sequences NYU office_1, NYU office_1b, and fr2/desk.

SLIDE 23

Different plane detectors

Reconstructed maps for fr1/xyz with the baseline plane detector and with the proposed plane detector.

SLIDE 24

Outdoor large scale

Reconstructed map for the KITTI-7 sequence with our SLAM system, shown from different viewpoints with and without rendered quadrics. The proposed object-observation and point-cloud-induced prior factors are effective in this reconstruction.

SLIDE 25

Quantitative Results

Ablation study against point-based monocular ORB-SLAM2

Table: Comparison against monocular ORB-SLAM2. PP, PP+M, PO, and PPO+MS denote points-planes only, points-planes with the Manhattan constraint, points-objects only, and all landmarks with Manhattan and supporting constraints, respectively. RMSE of ATE is reported in cm for 7 sequences of the TUM datasets. Numbers in bold in each row mark the best performance for that sequence; numbers in [ ] give the percentage improvement over monocular ORB-SLAM2.

SLIDE 26

Challenges

  • Engineering of a multi-component system
  • Ad-hoc semantic/geometric factors
  • Initialization matters …
    – Quadrics
  • Partial detections and occlusions
  • Missing real orientation of objects
SLIDE 27

Challenges

SLIDE 28

Future Directions

  • Additional learned pose/orientation factors

[Factor-graph figure: Object, Point, Plane, and Camera nodes, with an additional learned 6D-pose factor attached to each object landmark.]

*Left image from PoseCNN
SLIDE 29

Future Directions

  • Topology of the factor graph (observation/constraint factors) based on scene graphs

*Left image from VGfM
SLIDE 30

Demo 1

Real-Time Monocular Object-Model Aware Sparse SLAM (ICRA 2019)

SLIDE 31

Demo 2

Real-Time Monocular Object-Model Aware Sparse SLAM (ICRA 2019)

SLIDE 32