Direct Visual SLAM. Instructor: Simon Lucey. 16-623 Designing Computer Vision Apps.
Reminder: SLAM • Simultaneous Localization and Mapping. • On mobile we are interested primarily in Visual SLAM (VSLAM). • Sometimes called MonoSLAM if there is only one camera. • Can be viewed as an online SfM problem.
Reminder: VO vs VSLAM vs SFM. (Figure: Venn diagram relating SFM, VSLAM and VO.) Taken from D. Scaramuzza, “Tutorial on Visual Odometry”.
Reminder: Keyframe-based SLAM. (Figure: Keyframe 1, Keyframe 2, current frame, new keyframe; initial pointcloud; new triangulated points.) Taken from D. Scaramuzza, “Tutorial on Visual Odometry”. [Nister’04, PTAM’07, LIBVISO’08, LSD-SLAM’14, SVO’14, ORB-SLAM’15]
A Tale of Two Threads. Thread 1 (tracking) estimates the current pose θ_f; thread 2 (mapping) refines all poses {θ_f}, f = 1 to F. Adapted from S. Lovegrove & A. J. Davison, “Real-Time Spherical Mosaicing using Whole Image Alignment”, ECCV 2010.
Example - ORB SLAM. “Thread 1 - Visual Odometry”, “Thread 2 - Local BA”. R. Mur-Artal, J. M. M. Montiel and J. D. Tardos, “ORB-SLAM: a Versatile and Accurate Monocular SLAM System”, IEEE Trans. Robotics 2015.
Today • Direct vs. feature-based methods • Dense SLAM • Semi-Dense SLAM
ECCV 1999
Feature-Based Methods
Feature-Based Methods • The image is reduced to a sparse set of keypoints. • Usually matched with feature descriptors.
Feature-Based Advantages • Easier transition from images to geometry (vanishing point) • Wide-baseline matching • Illumination invariance, using invariant descriptors. (Mikolajczyk, 2007)
Feature-Based Challenges • Creates only a sparse map of the world. • Does not sample across all available image data, e.g. edges and weak intensities. • Needs a high-resolution camera mode (bad for efficiency and battery life). (Figure: Direct Method (ours) vs. Feature-Based Method (ORB+RANSAC).)
Today • Direct vs. feature-based methods • Dense SLAM • Semi-Dense SLAM
Reminder: Warp Functions. Our goal is to find the warp parameter vector p. x = coordinate in template, [x, y]^T; x′ = corresponding coordinate in source, [x′, y′]^T; W(x; p) = warping function such that x′ = W(x; p); p = parameter vector describing the warp.
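To make the notation concrete, here is a minimal NumPy sketch of one common warp, a 2D affine parameterization (an illustration only, not code from the lecture; the 6-parameter layout is an assumption):

```python
import numpy as np

def warp_affine(x, p):
    """Apply a 2D affine warp W(x; p) to template coordinates.

    x : (N, 2) array of template coordinates [x, y].
    p : 6-vector parameterizing the affine warp (a common choice;
        other parameterizations exist).
    Returns the (N, 2) corresponding source coordinates x'.
    """
    # Affine matrix built so that p = 0 gives the identity warp.
    A = np.array([[1.0 + p[0], p[2], p[4]],
                  [p[1], 1.0 + p[3], p[5]]])
    x_h = np.hstack([x, np.ones((x.shape[0], 1))])  # homogeneous coords
    return x_h @ A.T

# p = 0 is the identity: coordinates map to themselves.
x = np.array([[10.0, 20.0]])
print(warp_affine(x, np.zeros(6)))
```

The identity-at-zero convention matters later: linearizing around p = 0 is exactly what the Lucas-Kanade update will do.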
Review: Pinhole Camera. In a real camera the image is inverted; instead we model an impossible but more convenient virtual image in front of the pinhole. Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince.
Relating Points between Views. First camera: w = λ x̃ (back-project pixel x with depth λ). Second camera: w′ = Ω w + τ. Substituting: x′ = π(λ Ω x̃ + τ). Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince.
Pinhole Warp Function • One can represent the relationship of points between views of pinhole cameras as a warp function, W(x; θ, λ) = π(λ Ω x̃ + τ) “warp function”, where π([u, v, w]^T) = [u/w, v/w]^T “pinhole projection”, and T = [Ω τ; 0^T 1] ∈ SE(3) “pose parameters”.
Pinhole Warp Function • One can represent the relationship of points between views of pinhole cameras as a warp function, W(x; θ, λ) = π(λ Ω x̃ + τ) “warp function”, where π([u, v, w]^T) = [u/w, v/w]^T “pinhole projection”, and T(θ) = exp(Σ_{i=1}^{6} θ_i A_i) ∈ SE(3) “pose parameters”.
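The pinhole warp can be sketched directly in NumPy (a hedged illustration following the slide's notation; `Omega`, `tau`, `lam` are the rotation, translation, and depth):

```python
import numpy as np

def pinhole_project(X):
    """pi([u, v, w]^T) = [u/w, v/w]^T."""
    return X[:2] / X[2]

def warp_pinhole(x, Omega, tau, lam):
    """W(x; theta, lambda): back-project pixel x with depth lam,
    apply the rigid transform (Omega, tau), then re-project."""
    x_tilde = np.array([x[0], x[1], 1.0])  # homogeneous pixel coordinate
    return pinhole_project(lam * Omega @ x_tilde + tau)

# Sanity check: the identity pose with any depth maps a pixel to itself.
x = np.array([0.2, -0.1])
print(warp_pinhole(x, np.eye(3), np.zeros(3), 1.0))
```

Note the warp depends on both the pose (θ, via Ω and τ) and the per-pixel depth λ, which is why tracking and mapping can be split across threads: each thread holds one set of unknowns fixed.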
Photometric Relationship • We can employ this warp function to now express the problem as, T(x_n) = I_f(W{x_n; θ_f, λ_n}), where T is the “keyframe template”, I_f is the “f-th image”, θ_f the pose, and λ_n the depth of point x̃_n. (Figure adapted from “An Invitation to 3D Vision”, Ma et al.)
Linearizing the Image for Pose. T(x_n) = I_f(W{x_n; θ_f ∘ Δθ_f, λ_n}) ≈ I_f(W{x_n; θ_f, λ_n}) + A_fn Δθ_f. Baker, Simon, and Iain Matthews. “Equivalence and efficiency of image alignment algorithms.” CVPR 2001.
Direct Camera Tracking • Assuming known depths {λ_n}, n = 1 to N, solve arg min_{Δθ_f} Σ_{n=1}^{N} ||T(x_n) − I_f(W{x_n; θ_f, λ_n}) − A_fn Δθ_f||²₂, where T is the “keyframe template”, I_f the “f-th image”, and λ_n the depth of point x̃_n. (Figure adapted from “An Invitation to 3D Vision”, Ma et al.)
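A single linearized update amounts to a least-squares solve (an illustrative sketch, not the lecture's implementation; a real tracker iterates this inside a Lucas-Kanade-style loop, re-warping the image and recomputing A each step):

```python
import numpy as np

def pose_update(T_vals, I_vals, A):
    """One Gauss-Newton step of direct camera tracking: solve
    arg min_dtheta  sum_n || T(x_n) - I_f(W{x_n}) - A_n dtheta ||^2.

    T_vals : (N,) template intensities T(x_n).
    I_vals : (N,) warped image intensities I_f(W{x_n; theta_f, lambda_n}).
    A      : (N, 6) stacked steepest-descent rows A_fn.
    Returns the 6-vector pose increment dtheta.
    """
    r = T_vals - I_vals  # photometric residuals
    # Linear least squares: normal equations solved via lstsq for stability.
    dtheta, *_ = np.linalg.lstsq(A, r, rcond=None)
    return dtheta
```

With far more pixels (N) than pose parameters (6), the system is heavily over-determined, which is one reason pose tracking gains little from going fully dense.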
Direct Camera Tracking • Most methods employ a variant of the Lucas-Kanade algorithm for estimating camera pose. • Engel et al. demonstrated using a “dense” number of points does not improve the performance of camera tracking (i.e pose estimation). • Advantage of density stems mainly from the map estimation. J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016. J. Engel, T. Schops, and D. Cremers. LSD-SLAM: Large-scale direct monocular slam. In European Conference on Computer Vision, pages 834–849. Springer, 2014.
How do we update the depths?
Direct Map Estimation • Assuming known pose parameters {θ_f}, f = 1 to F. • Naively we could solve for the depths independently, λ_n = arg min_λ C(x_n, λ), where C(x, λ) = (1/F) Σ_{f=1}^{F} ||T(x) − I_f(W{x; θ_f, λ})||₁. R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
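The naive per-pixel search can be sketched as follows (an illustration under the assumption that intensities have already been sampled from each frame at every candidate depth; the variable names are hypothetical):

```python
import numpy as np

def cost_volume(T_vals, warped_stack):
    """C(x, lambda) = (1/F) sum_f |T(x) - I_f(W{x; theta_f, lambda})|.

    T_vals       : (N,) template intensities.
    warped_stack : (F, N, D) intensities sampled from each of F frames
                   at D candidate depths per pixel.
    Returns the (N, D) averaged L1 photometric cost.
    """
    return np.abs(T_vals[None, :, None] - warped_stack).mean(axis=0)

def depth_search(costs, depths):
    """Independently pick lambda_n = arg min_lambda C(x_n, lambda)."""
    return depths[np.argmin(costs, axis=1)]
```

Averaging over many frames F suppresses noise in the photometric cost, but each pixel is still decided alone, which is exactly what the geometric prior on the next slides fixes.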
DTAM • Newcombe et al. proposed Dense Tracking and Mapping. • Attempted to substitute the feature-based tracking and mapping modules of traditional VSLAM (e.g. PTAM) with dense methods. (Figure: “Sample across inverse depths”, from λ_max⁻¹ to λ_min⁻¹, over frames f = 1:F against the template T.) R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
DTAM - Example. (Figure: photometric cost functions C(a, λ), C(b, λ), C(c, λ) over inverse depth λ⁻¹, shown for three example pixels.) R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
DTAM - Geometric Prior • Newcombe et al. proposed the employment of a geometric prior on depths, arg min_λ Σ_{n=1}^{N} C(x_n, λ_n) + g(x_n) ||∇λ_n⁻¹||_ε, with g(x) = exp(−α ||∇T(x)||₂^β).
What do you think the prior is doing?
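The edge-aware weight g(x) from the slide above can be sketched as (an illustration; α and β are tuning parameters, as in the formula):

```python
import numpy as np

def edge_weight(T, alpha=1.0, beta=1.0):
    """g(x) = exp(-alpha * ||grad T(x)||_2^beta).

    T : (H, W) grayscale template image.
    Returns per-pixel weights in (0, 1]: close to 1 in flat regions,
    small at strong image gradients.
    """
    gy, gx = np.gradient(T.astype(float))      # image gradients
    grad_mag = np.sqrt(gx**2 + gy**2)          # ||grad T(x)||_2
    return np.exp(-alpha * grad_mag**beta)
```

Because g(x) shrinks at strong gradients, the smoothness term on the inverse depth is relaxed exactly where depth discontinuities are likely (object boundaries), while flat, textureless regions are strongly regularized.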
DTAM - Video