Visual SLAM for Mobile Instructor - Simon Lucey 16-623 - Designing - PowerPoint PPT Presentation

Visual SLAM for Mobile Instructor - Simon Lucey 16-623 - Designing Computer Vision Apps

Example of SLAM for AR Taken from: H. Liu et al. “Robust Keyframe-based Monocular SLAM for Augmented Reality”, ISMAR 2016.

What is SLAM?
• Simultaneous Localization and Mapping.
• On mobile, we are interested primarily in Visual SLAM (VSLAM).
• Sometimes called Mono SLAM if there is only one camera.
• Can be viewed as an online SfM problem.

Today • SfM - Bundle Adjustment • VSLAM - Keyframe vs. Filtering • Visual Odometry • Loop Closure

Reminder - Bundle Adjustment
The cathedral dataset:
• 480 camera matrices $[\Omega_i, \tau_i]$
• Total dof = 480 × (3 + 3) = 2880
• 91178 3D points
• Total dof = 91178 × 3 = 273534
Adapted from: Optimization Methods in Computer Vision, Anders Eriksson.

Reminder - Two view reconstruction Start with a pair of images taken from slightly different viewpoints

Reminder - Two view reconstruction Find features using a corner detection algorithm

Reminder - Two view reconstruction Match features using a greedy algorithm
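The greedy matching step can be sketched as follows. This is a minimal illustration, assuming features are described by plain float vectors compared with Euclidean distance; the function name `greedy_match` and the `max_distance` cut-off are hypothetical and not taken from the slides.

```python
import math

def greedy_match(desc_a, desc_b, max_distance=float("inf")):
    """Greedily pair descriptors by ascending Euclidean distance.

    desc_a, desc_b: lists of equal-length float sequences (hypothetical descriptors).
    Returns a list of (index_a, index_b) pairs; each index is used at most once.
    """
    # Score every candidate pair once.
    candidates = []
    for i, da in enumerate(desc_a):
        for j, db in enumerate(desc_b):
            candidates.append((math.dist(da, db), i, j))
    candidates.sort()

    used_a, used_b, matches = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_distance:
            break  # all remaining pairs are at least this far apart
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return matches

desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
desc_b = [[9.5, 10.0], [0.2, 0.0], [100.0, 100.0]]
matches_all = greedy_match(desc_a, desc_b)
matches_close = greedy_match(desc_a, desc_b, max_distance=1.0)
```

Greedy matching is cheap but can accept wrong pairs, which is why the following slides rely on a robust fit to reject them.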

Reminder - Two view reconstruction Fit fundamental matrix using a robust algorithm such as RANSAC

Reminder - Two view reconstruction Find matching points that agree with the fundamental matrix

Reminder - Two view reconstruction
• Extract essential matrix from fundamental matrix.
• Extract rotation $\Omega$ and translation $\tau$ from essential matrix.
• Reconstruct the 3D positions $w$ of points: $\lambda \tilde{x} = \Omega w + \tau$
• We refer to these matrices as belonging to the Special Euclidean Group - SE(3):
$$T = \begin{bmatrix} \Omega & \tau \\ 0^T & 1 \end{bmatrix} \in SE(3)$$
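A small sketch of how the SE(3) matrix acts on a 3D point and yields the depth $\lambda$ and the homogeneous image point $\tilde{x}$. It assumes normalized image coordinates (no intrinsics); the helper names and the example camera are hypothetical.

```python
import math

def se3_matrix(omega, tau):
    """Stack a 3x3 rotation omega and a 3-vector tau into a 4x4 matrix T."""
    return [list(omega[r]) + [tau[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_se3(T, w):
    """Map a 3D point w through T using homogeneous coordinates."""
    wh = list(w) + [1.0]
    return [sum(T[r][c] * wh[c] for c in range(4)) for r in range(3)]

def project(T, w):
    """Return (lam, x_tilde) with lam * x_tilde = Omega w + tau and x_tilde[2] == 1."""
    p = apply_se3(T, w)
    lam = p[2]  # depth of the point in the camera frame
    return lam, [p[0] / lam, p[1] / lam, 1.0]

# Hypothetical camera: 90 degree rotation about the z axis, shifted 2 units along z.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
omega = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
T = se3_matrix(omega, [0.0, 0.0, 2.0])
lam, x_tilde = project(T, [1.0, 0.0, 2.0])
```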

Reminder: Lie Algebra
• Exponential maps on the SO(3), SL(3) and SE(3) groups are related to the much broader topic of Lie Algebra.
• More details on this topic can be found in Murray et al. 1994.
$$T = \begin{bmatrix} \Omega & \tau \\ 0^T & 1 \end{bmatrix} \in SE(3)$$
$$T(\theta) = \exp\left( \sum_{i=1}^{6} \theta_i A_i \right) \in SE(3)$$
[Portrait: “Sophus Lie”]
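The exponential map can be sketched numerically with a truncated power series. This is an illustration only: the ordering of the six generators (translation first, rotation last) is an assumption, since the slides do not state it, and the helper names `hat`, `matmul` and `se3_exp` are hypothetical.

```python
import math

def hat(theta):
    """Build sum_i theta_i A_i as a 4x4 matrix.

    Assumed ordering: theta[0:3] is the translational part,
    theta[3:6] is the rotational part.
    """
    u1, u2, u3, w1, w2, w3 = theta
    return [
        [0.0, -w3, w2, u1],
        [w3, 0.0, -w1, u2],
        [-w2, w1, 0.0, u3],
        [0.0, 0.0, 0.0, 0.0],
    ]

def matmul(A, B):
    """Plain square matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def se3_exp(theta, terms=30):
    """Approximate exp(sum_i theta_i A_i) with a truncated power series."""
    X = hat(theta)
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]  # holds X^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    return result

T_rot = se3_exp([0.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2])  # quarter turn about z
T_trans = se3_exp([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])        # pure translation
```

A closed-form expression (Rodrigues' formula) is normally preferred in practice; the series is used here only because it mirrors the definition directly.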

SfM - Bundle Adjustment
$$\arg\min_{w, \theta} \sum_{f=1}^{F} \sum_{n=1}^{N} \left\| x_n^f - \pi(w_n; \theta_f) \right\|_2^2$$
x ← 2D projection
w ← 3D point
θ ← extrinsics
π ← projection function
N ← no. of points
F ← no. of frames

SfM - Linearization
$$\pi(w_n + \Delta w_n; \theta_f \circ \Delta\theta_f) \approx \pi(w_n; \theta_f) + J_n^f \begin{bmatrix} \Delta\theta_f \\ \Delta w_n \end{bmatrix}$$
The pose update $\theta_f \circ \Delta\theta_f$ is a composition. Why not additive?
$$\arg\min_{\Delta\theta, \Delta w} \sum_{f=1}^{F} \sum_{n=1}^{N} \left\| x_n^f - \pi(w_n; \theta_f) - J_n^f \begin{bmatrix} \Delta\theta_f \\ \Delta w_n \end{bmatrix} \right\|_2^2$$
x ← 2D projection
w ← 3D point
θ ← extrinsics
π ← projection function
N ← no. of points
F ← no. of frames
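One way to read the "why not additive" question: rotations do not form a vector space, so adding two of them generally leaves the group, while multiplying them does not. The sketch below writes the pose update as a product of SE(3) matrices. The right-multiplication order is an assumption made here for illustration (left-multiplication is equally common), and the slides do not state which is intended.

```latex
% Sums of rotations are generally not rotations; products always are.
\Omega_1 + \Omega_2 \notin SO(3) \ \text{(in general)}, \qquad \Omega_1 \Omega_2 \in SO(3)

% Assumed convention for the pose update in the linearization:
T(\theta_f \circ \Delta\theta_f) = T(\theta_f)\, T(\Delta\theta_f),
\qquad
T(\Delta\theta_f) = \exp\!\left( \sum_{i=1}^{6} \Delta\theta_{f,i} A_i \right)
```

The 3D points $w_n$ live in $\mathbb{R}^3$, so their update $w_n + \Delta w_n$ can stay additive.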

Visibility of Points
$$\Upsilon = \begin{bmatrix} \rho_1^1 & \cdots & \rho_1^F \\ \vdots & \ddots & \vdots \\ \rho_N^1 & \cdots & \rho_N^F \end{bmatrix} \quad \text{“visibility matrix”}$$

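SfM - Bundle Adjustment
$$\pi(w_n + \Delta w_n; \theta_f \circ \Delta\theta_f) \approx \pi(w_n; \theta_f) + J_n^f \begin{bmatrix} \Delta\theta_f \\ \Delta w_n \end{bmatrix}$$
$$\arg\min_{\Delta\theta, \Delta w} \sum_{f=1}^{F} \sum_{n=1}^{N} \rho_n^f \left\| x_n^f - \pi(w_n; \theta_f) - J_n^f \begin{bmatrix} \Delta\theta_f \\ \Delta w_n \end{bmatrix} \right\|_2^2$$
x ← 2D projection
w ← 3D point
θ ← extrinsics
π ← projection function
ρ ← visibility ∈ [0, 1]
N ← no. of points
F ← no. of frames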
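To make the role of the visibility weights concrete, here is a minimal sketch of the (non-linearized) visibility-weighted reprojection cost. It assumes a pinhole projection in normalized image coordinates and represents each pose as a hypothetical `(omega, tau)` pair; none of these names come from the slides.

```python
def project(w, pose):
    """Pinhole projection pi(w; theta) in normalized image coordinates.

    pose is a hypothetical (omega, tau) pair: a 3x3 rotation and a 3-vector.
    """
    omega, tau = pose
    p = [sum(omega[r][c] * w[c] for c in range(3)) + tau[r] for r in range(3)]
    return [p[0] / p[2], p[1] / p[2]]

def ba_cost(points, poses, observations, visibility):
    """Sum of rho[n][f] * ||x[n][f] - pi(w_n; theta_f)||^2 over all n and f."""
    total = 0.0
    for n, w in enumerate(points):
        for f, pose in enumerate(poses):
            rho = visibility[n][f]
            if rho == 0:
                continue  # point n is not seen in frame f; x[n][f] may be None
            u, v = project(w, pose)
            x = observations[n][f]
            total += rho * ((x[0] - u) ** 2 + (x[1] - v) ** 2)
    return total

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [(identity, [0.0, 0.0, 0.0]), (identity, [1.0, 0.0, 0.0])]
points = [[0.0, 0.0, 2.0], [1.0, 1.0, 4.0]]
observations = [
    [[0.1, 0.0], [0.5, 0.0]],   # point 0: first observation is off by 0.1
    [[0.25, 0.25], None],       # point 1: not observed in frame 1
]
visibility = [[1, 1], [1, 0]]
cost = ba_cost(points, poses, observations, visibility)
```

Only the single perturbed observation contributes to the cost; the unobserved entry is skipped entirely.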

SfM - Bundle Adjustment
$$\arg\min_{\Delta\theta, \Delta w} \left\| b - A \begin{bmatrix} \Delta\theta \\ \Delta w \end{bmatrix} \right\|_2^2$$
• $A$ has $2FN$ rows and $6F + 3N$ columns: $6F$ for the poses and $3N$ for the landmarks.
• Can be solved efficiently using sparse linear solvers such as:
  • Google Ceres Solver - http://ceres-solver.org
  • g2o - https://openslam.org/g2o.html
• Then iteratively apply GN or LM algorithm.
[Figure: structure of A and b, with pose and landmark column blocks.]
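A quick count shows why sparse solvers matter here. The sketch below assumes every point is visible in every frame (the upper bound on rows) and uses the cathedral dataset sizes from the earlier slide; the function names are hypothetical.

```python
def linear_system_shape(num_frames, num_points):
    """Rows and columns of A when every point is visible in every frame."""
    rows = 2 * num_frames * num_points       # two image coordinates per observation
    cols = 6 * num_frames + 3 * num_points   # 6 dof per pose, 3 dof per landmark
    return rows, cols

def nonzeros_per_row():
    """Each residual row depends on one pose block (6) and one landmark block (3)."""
    return 6 + 3

rows, cols = linear_system_shape(480, 91178)
fill = nonzeros_per_row() / cols  # fraction of each row that can be non-zero
```

Because each observation involves exactly one camera and one point, at most 9 of the 276414 entries in a row are non-zero, so storing or factorizing A densely would be wasteful.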

Reminder: Gauss-Newton Algorithm
• The Gauss-Newton (GN) algorithm is a common strategy for optimizing non-linear least-squares problems:
$$\arg\min_{y} \left\| x - F(y) \right\|_2^2 \quad \text{s.t.} \quad F : \mathbb{R}^N \to \mathbb{R}^M$$
Step 1: solve for the update $\Delta y$,
$$\arg\min_{\Delta y} \left\| x - F(y) - \frac{\partial F(y)}{\partial y^T} \Delta y \right\|_2^2$$
Step 2: $y \leftarrow y + \Delta y$ (“Is the update additive?”)
Keep applying the steps until convergence.
[Portraits: “Carl Friedrich Gauss”, “Isaac Newton”]
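The two steps can be sketched on a toy problem. The example below fits a hypothetical two-parameter model $F(y) = a\,e^{bt}$ to noise-free synthetic data; the model, the data and the function name are illustrative assumptions, not part of the lecture. The parameters here live in a vector space, so the additive update in Step 2 is appropriate.

```python
import math

def gauss_newton_exp_fit(ts, xs, a, b, iterations=20):
    """Fit x ~ a * exp(b * t) by Gauss-Newton, starting from the guess (a, b)."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) delta = J^T r.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for t, x in zip(ts, xs):
            e = math.exp(b * t)
            r = x - a * e      # residual x - F(y)
            j_a = e            # dF/da
            j_b = a * t * e    # dF/db
            s_aa += j_a * j_a
            s_ab += j_a * j_b
            s_bb += j_b * j_b
            g_a += j_a * r
            g_b += j_b * r
        det = s_aa * s_bb - s_ab * s_ab
        if abs(det) < 1e-15:
            break  # degenerate linear system; stop rather than divide by ~0
        # Step 1: solve the linearized least-squares problem for the update.
        d_a = (s_bb * g_a - s_ab * g_b) / det
        d_b = (s_aa * g_b - s_ab * g_a) / det
        # Step 2: additive update of the parameters.
        a, b = a + d_a, b + d_b
    return a, b

ts = [0.0, 0.5, 1.0, 1.5, 2.0]
xs = [2.0 * math.exp(-1.0 * t) for t in ts]  # noise-free synthetic data
a_hat, b_hat = gauss_newton_exp_fit(ts, xs, a=1.0, b=0.0)
```

With poses instead of scalars, Step 2 would replace the sum by a composition on SE(3), which is the point of the question on the slide.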

Today • SfM - Bundle Adjustment • VSLAM - Keyframe vs. Filtering • Visual Odometry • Loop Closure

Mono SLAM = Online SFM
• Monocular SLAM is just another name for “online” SFM.
• If computation were not an issue, one would just apply Bundle Adjustment after every new frame:
$$\arg\min_{w, \theta} \sum_{f=1}^{F} \sum_{n=1}^{N} \left\| x_n^f - \pi(w_n; \theta_f) \right\|_2^2$$
x ← 2D projection
w ← 3D point
θ ← extrinsics
π ← projection function
N ← no. of points
F ← no. of frames

Mono SLAM - MRF
• One can view the problem of SfM - Bundle Adjustment as doing inference on a Markov Random Field (MRF).
• Problem - becomes exponentially harder as time goes on.
[Figure: graphical model over poses $T_0, \dots, T_3$ (with $\theta_1, \dots, \theta_4$), observations $x_1, \dots, x_6$ and 3D points $w_1, \dots, w_6$; “edges based on visibility” $\rho$.]
H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?”, Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

Mono SLAM - Filtering
• The classic way of resolving this was to pose the BA problem as a filter, such as an Extended Kalman Filter (EKF).
• Problem - wastes processing time on frames that add very little information.
[Figure: graphical model over poses $T_0, \dots, T_3$ (with $\theta_1, \dots, \theta_4$), observations $x_1, \dots, x_6$ and 3D points $w_1, \dots, w_6$; “marginalizing out previous poses also results in unwanted direct connections between 3D points”.]
H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?”, Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

Mono SLAM - Filtering
• Filtering approaches are often problematic (e.g. think of when the device stops moving).
• When frames are taken at nearby positions compared to the scene distance, 3D points will exhibit large uncertainty.
Taken from D. Scaramuzza “Tutorial on Visual Odometry”.

Mono SLAM - Keyframe
• A better strategy is to employ keyframe BA.
• Made popular by Klein & Murray’s Parallel Tracking and Mapping (PTAM) algorithm.
[Figure: graphical model over poses $T_0, \dots, T_3$ (with $\theta_1, \dots, \theta_4$), observations $x_1, \dots, x_6$ and 3D points $w_1, \dots, w_6$; “remove all but a small subset of keyframes”.]
G. Klein and D. Murray, “Parallel tracking and mapping for small AR workspaces”, ISMAR 2007.
H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?”, Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

Keyframe Selection
• One way to avoid this consists of skipping frames until the average uncertainty of the 3D points decreases below a certain threshold. The selected frames are called keyframes.
• Rule of thumb: add a keyframe when
$$\frac{\text{keyframe distance}}{\text{average depth}} > \text{threshold} \ (\sim 10\text{-}20\,\%)$$
Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
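The rule of thumb can be sketched as a small predicate. This assumes "keyframe distance" means the Euclidean distance between the last keyframe's camera center and the current one, and that depths of the tracked 3D points are available; the function name, its arguments and the default threshold of 0.15 (an arbitrary pick inside the ~10-20 % range) are hypothetical.

```python
import math

def should_add_keyframe(last_keyframe_center, current_center, point_depths, threshold=0.15):
    """Rule of thumb: add a keyframe when baseline / average depth > threshold."""
    if not point_depths:
        return False  # no 3D points to judge the baseline against
    baseline = math.dist(last_keyframe_center, current_center)
    average_depth = sum(point_depths) / len(point_depths)
    if average_depth <= 0.0:
        return False  # degenerate depths; do not trust the ratio
    return baseline / average_depth > threshold

depths = [4.0, 5.0, 6.0]
small_motion = should_add_keyframe([0.0, 0.0, 0.0], [0.1, 0.0, 0.0], depths)
large_motion = should_add_keyframe([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], depths)
```

Normalizing by depth makes the test scale-free: the same camera motion counts as a large baseline for a nearby scene and a negligible one for a distant scene.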

Keyframe-based SLAM
[Figure: Keyframe 1, Keyframe 2, the current frame and a new keyframe, with the initial point cloud and the new triangulated points.]
Taken from D. Scaramuzza “Tutorial on Visual Odometry”. [Nister’04, PTAM’07, LIBVISO’08, LSD SLAM’14, SVO’14, ORB SLAM’15]
