Direct Visual SLAM
Instructor - Simon Lucey
16-623 - Designing Computer Vision Apps
Reminder: SLAM stands for Simultaneous Localization and Mapping. On mobile we are interested primarily in Visual SLAM (VSLAM). Sometimes called Mono SLAM if there ...
SFM VSLAM VO
Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
[Nister’04, PTAM’07, LIBVISO’08, LSD SLAM’14, SVO’14, ORB SLAM’15]
Keyframe 1 Keyframe 2 Initial pointcloud New triangulated points Current frame New keyframe
Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
$\{\theta_f\}_{f=1}^{F}$: pose parameters $\theta_f$ for frames $f = 1$ to $F$
Adapted from S. Lovegrove & A. J. Davison “Real-Time Spherical Mosaicing using Whole Image Alignment”, ECCV 2010.
R. Mur-Artal, J. M. M. Montiel, J. D. Tardós, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System”, IEEE Trans. Robotics 2015.
“Thread 1 - Visual Odometry” “Thread 2 - Local BA”
ECCV 1999
Image is reduced to a sparse set of keypoints
Usually matched with feature descriptors
Vanishing point
Easier transition from images to geometry
Wide-baseline matching
Mikolajczyk, 2007
Illumination invariance
Mikolajczyk, 2007
Direct Method (ours) Feature-Based Method (ORB+RANSAC)
the world.
available image data - edges & weak intensities.
mode (bad for efficiency and battery life).
“Source” “Template”
Our goal is to find the warp parameter vector $p$.
$W(x; p)$ = warping function such that $x' = W(x; p)$
$p$ = parameter vector describing warp
$x$ = coordinate in template, $[x, y]^T$
$x'$ = corresponding coordinate in source, $[x', y']^T$
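As a minimal sketch of the warp notation, assume for illustration that W(x; p) is a 2D affine warp with six parameters. The slide does not fix a particular warp family; the function name `affine_warp` and the parameter ordering are hypothetical choices made here.

```python
import numpy as np

def affine_warp(x, p):
    """Map template coordinates x (N, 2) to source coordinates x' = W(x; p).

    Assumed parameterization (illustrative only): p = [p1, ..., p6] with
        x' = (1 + p1) * x + p3 * y + p5
        y' = p2 * x + (1 + p4) * y + p6
    so p = 0 is the identity warp.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    A = np.array([[1.0 + p[0], p[2]],
                  [p[1], 1.0 + p[3]]])
    t = p[4:6]
    return x @ A.T + t

template_xy = np.array([[0.0, 0.0], [10.0, 5.0]])
identity = affine_warp(template_xy, np.zeros(6))             # p = 0 leaves x unchanged
shifted = affine_warp(template_xy, [0, 0, 0, 0, 2.0, -1.0])  # pure translation
```

With p = 0 the template coordinates map to themselves, and the last two parameters act as a translation of every coordinate.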
Real camera image is inverted.
Instead, model the impossible but more convenient virtual image.
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
First camera: Second camera: Substituting:
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
pinhole cameras as a warp function,
“pinhole projection” “warp function”
“pose parameters”
∈ SE(3)
i=1
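One common way to write two pinhole cameras as a warp function is sketched below. The convention is an assumption, not taken from the slide: the pose parameters are a rotation `R` and translation `t`, `K` is a hypothetical intrinsics matrix, and each template pixel carries an inverse depth.

```python
import numpy as np

def pinhole_warp(x, inv_depth, R, t, K):
    """Warp pixel x = [u, v] from the template camera into a second camera.

    Assumed convention: a point on the ray of x at inverse depth `inv_depth`
    is X = K^{-1} [u, v, 1]^T / inv_depth, and the second camera sees
    R @ X + t. Scaling by inv_depth leaves the projection unchanged, which
    gives x' ~ K (R K^{-1} x~ + inv_depth * t).
    """
    x_h = np.array([x[0], x[1], 1.0])      # homogeneous pixel
    ray = np.linalg.solve(K, x_h)          # K^{-1} x~
    X = R @ ray + inv_depth * t            # point (up to scale) in second camera
    x_proj = K @ X                         # pinhole projection
    return x_proj[:2] / x_proj[2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])              # sideways translation of 0.1 units

same_pixel = pinhole_warp([320.0, 240.0], 0.0, R, t, K)    # point at infinity
moved_pixel = pinhole_warp([320.0, 240.0], 0.5, R, t, K)   # point at depth 2
```

Working in inverse depth keeps the warp well defined for points at infinity (inverse depth 0), where a pure translation produces no image motion.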
$I_{k-1}$, $I_k$
“An Invitation to 3D Vision”, Ma,
$T(x_n) = I_f(W\{x_n; \theta_f, \lambda_n\})$
$T$: “keyframe template”, $I_f$: “f-th image”, $\lambda_n \tilde{x}_n$
Baker, Simon, and Iain Matthews. "Equivalence and efficiency of image alignment algorithms." CVPR 2001.
$$T(x_n) = I_f(W\{x_n; \theta_f \circ \Delta\theta_f, \lambda_n\}) \approx I_f(W\{x_n; \theta_f, \lambda_n\}) + A_n^f \Delta\theta_f$$
$\{\lambda_n\}_{n=1}^{N}$
$$\arg\min_{\Delta\theta_f} \sum_{n=1}^{N} \| T(x_n) - I_f(W\{x_n; \theta_f, \lambda_n\}) - A_n^f \Delta\theta_f \|_2^2$$
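The linearized objective is an ordinary linear least-squares problem in the pose update, so a single step can be sketched with NumPy. The function name `gauss_newton_step` and the synthetic Jacobian below are illustrative assumptions. In a real tracker the rows $A_n^f$ would come from image gradients and the warp Jacobian.

```python
import numpy as np

def gauss_newton_step(residuals, jacobian):
    """Solve arg min_d || residuals - jacobian @ d ||_2^2.

    residuals: (N,) array holding T(x_n) - I_f(W{x_n; theta_f, lambda_n})
    jacobian:  (N, P) array whose n-th row is A_n^f
    """
    delta, *_ = np.linalg.lstsq(jacobian, residuals, rcond=None)
    return delta

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 6))              # synthetic stand-in for the rows A_n^f
true_delta = np.array([0.01, -0.02, 0.03, 0.0, 0.005, -0.01])
r = A @ true_delta                         # residuals of an exactly linear problem
delta_theta = gauss_newton_step(r, A)
```

On real images the problem is only approximately linear, so the step is repeated: update the pose, re-warp, recompute residuals and Jacobian, and solve again.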
estimating camera pose.
improve the performance of camera tracking (i.e., pose estimation).
pages 834–849. Springer, 2014.
How do we update the depths?
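Given the poses $\{\theta_f\}_{f=1}^{F}$, accumulate over frames $f = 1, \dots, F$ a cost $C(x_n, \lambda)$ for each candidate inverse depth $\lambda$.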
modules of traditional VSLAM (e.g. PTAM) for dense methods.
[Figure: template $T$ with pixel $x$, frames $I_f$ for $f = 1 : F$, inverse depth range $\lambda_{\min}$ to $\lambda_{\max}$]
“Sample across inverse depths”
C(a, λ)
C(b, λ)
C(c, λ)
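Sampling across inverse depths can be sketched as building a cost volume and taking the per-pixel minimum. This is a toy under strong assumptions, not the DTAM implementation: the camera only moves sideways, so each frame shifts the template horizontally by `gain * lambda` pixels. The names `gains` and `cost_volume` are hypothetical.

```python
import numpy as np

def cost_volume(template, frames, gains, inv_depths):
    """Photometric cost C(x, lambda) for every pixel and sampled inverse depth.

    Toy assumption: frame f sees template pixel (row, col) at column
    col + gains[f] * lambda (a sideways-moving camera), with wrap-around at
    the image border to keep the example short.
    """
    height, width = template.shape
    cols = np.arange(width)
    cost = np.zeros((height, width, len(inv_depths)))
    for j, lam in enumerate(inv_depths):
        for frame, gain in zip(frames, gains):
            shift = int(round(gain * lam))
            warped = frame[:, (cols + shift) % width]   # I_f(W{x; theta_f, lambda})
            cost[:, :, j] += np.abs(template - warped)
    return cost / len(frames)

rng = np.random.default_rng(1)
template = rng.random((8, 32))
inv_depths = np.linspace(0.0, 1.0, 11)     # samples between lambda_min and lambda_max
true_index = 3                             # every pixel shares this inverse depth
gains = [10.0, 20.0]                       # hypothetical focal length * baseline per frame
frames = [np.roll(template, int(round(g * inv_depths[true_index])), axis=1)
          for g in gains]

C = cost_volume(template, frames, gains, inv_depths)
best_index = C.argmin(axis=2)              # per-pixel minimizer over lambda
```

With noise-free synthetic frames the minimum falls exactly on the true sample. On real images the per-pixel minimum is noisy, which is why a regularizer is normally added on top.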
depths,
$$\arg\min_{\lambda} \sum_{n=1}^{N} C(x_n, \lambda_n) + g(x_n) \| \nabla * \lambda_n \|_{\epsilon}, \qquad g(x) = \exp(-\alpha \| \nabla T(x) \|_2^{\beta})$$
What do you think the prior is doing?
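One way to explore the question is to evaluate the weight $g(x)$ on a toy template. The sketch below assumes the DTAM form $g(x) = \exp(-\alpha \|\nabla T(x)\|_2^{\beta})$ and a Huber norm for $\|\cdot\|_{\epsilon}$. The values of `alpha`, `beta` and `eps` are arbitrary illustrative choices.

```python
import numpy as np

def edge_weight(template, alpha=3.0, beta=1.0):
    """g(x) = exp(-alpha * ||grad T(x)||_2 ** beta), one value per pixel."""
    grad_rows, grad_cols = np.gradient(template.astype(float))
    grad_norm = np.sqrt(grad_rows ** 2 + grad_cols ** 2)
    return np.exp(-alpha * grad_norm ** beta)

def huber(value, eps=0.1):
    """Huber norm: quadratic for |value| <= eps, linear beyond it."""
    a = np.abs(value)
    return np.where(a <= eps, a ** 2 / (2.0 * eps), a - eps / 2.0)

# Toy template: dark left half, bright right half (a single vertical edge).
template = np.zeros((6, 8))
template[:, 4:] = 1.0
g = edge_weight(template)
small_penalty = float(huber(0.05))   # inside the quadratic region
large_penalty = float(huber(1.0))    # inside the linear region
```

In flat regions the weight is 1 and the smoothness term is fully active. Next to the intensity edge the weight drops, so a depth discontinuity there costs less. The Huber norm grows only linearly for large depth gradients, which also tolerates sharp depth steps.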
method and therefore requires a state-of-the-art GPU to run in real time.
that circumvents this limitation.
[Figure: LSD-SLAM system overview with three components: Tracking, Depth Map Estimation, Map Optimization]
Tracking: New Image (640 x 480 at 30Hz) → Track on Current KF → estimate SE(3) transformation
Depth Map Estimation: Take KF?
  no (refine KF): Refine Current KF → small-baseline stereo → probabilistically merge into KF → regularize depth map
  yes (replace KF): Create New KF → propagate depth map to new frame → regularize depth map
Map Optimization: Add KF to Map → find closest keyframes → estimate Sim(3) edges
Other labels: Current KF (tracking reference), Current Map (add to map)
(See Sec. 3.2, 3.5 and 3.6) (See Sec. 3.3) (See Sec. 3.4)
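$$\min_{\xi \in \mathfrak{se}(3)} \sum_{p} \frac{r_p^2(p, \xi)}{\sigma^2_{r_p(p, \xi)}}$$
$$\min_{\xi \in \mathfrak{sim}(3)} \sum_{p} \frac{r_p^2(p, \xi)}{\sigma^2_{r_p(p, \xi)}} + \frac{r_d^2(p, \xi)}{\sigma^2_{r_d(p, \xi)}}$$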
prior to DTAM.
keyframe distance / average depth > threshold (~10-20 %)
Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
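The keyframe criterion, read here as the ratio of keyframe distance to average scene depth, can be sketched as a small predicate. The function name `is_new_keyframe` and the default threshold of 0.15 are assumptions chosen inside the quoted ~10-20 % range.

```python
import numpy as np

def is_new_keyframe(keyframe_position, current_position, depths, threshold=0.15):
    """Return True when keyframe distance / average depth exceeds the threshold.

    `threshold=0.15` is an assumed value inside the ~10-20 % range.
    """
    distance = np.linalg.norm(np.asarray(current_position, dtype=float)
                              - np.asarray(keyframe_position, dtype=float))
    average_depth = float(np.mean(depths))
    return bool(distance / average_depth > threshold)

depths = [4.0, 5.0, 6.0]                                  # average depth 5.0
near = is_new_keyframe([0, 0, 0], [0.2, 0, 0], depths)    # ratio 0.04
far = is_new_keyframe([0, 0, 0], [1.0, 0, 0], depths)     # ratio 0.20
```

Normalizing by the average depth makes the test scale-free: the same camera motion counts as a large baseline in a small room and a small one in an open landscape.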
depth map with large variance.
seconds of operation the algorithm “locks” to a good configuration.
graph optimization.
Why is BA not employed?
between non-adjacent frames.
$C_0, C_1, C_2, C_3, \dots, C_{n-1}, C_n$ with $T_{1,0}, T_{2,1}, T_{3,2}, T_{n,n-1}, T_{2,0}, T_{3,0}, T_{n-1,2}$
$$\sum_{i} \sum_{j} \| C_i - T_{ij} C_j \|^2$$
Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
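Assuming the cost on this slide is the pose-graph objective $\sum_i \sum_j \| C_i - T_{ij} C_j \|^2$, with camera poses $C_i$ and measured relative transforms $T_{ij}$ as 4x4 matrices, it can be evaluated as sketched below. The helper names and the toy poses are hypothetical.

```python
import numpy as np

def pose_graph_cost(poses, edges):
    """Sum of squared Frobenius norms || C_i - T_ij @ C_j ||^2 over all edges.

    poses: dict mapping index -> 4x4 camera pose C_i
    edges: dict mapping (i, j) -> 4x4 measured relative transform T_ij
    """
    return sum(np.linalg.norm(poses[i] - T_ij @ poses[j]) ** 2
               for (i, j), T_ij in edges.items())

def translation(tx):
    """4x4 rigid transform that translates by tx along the x axis."""
    T = np.eye(4)
    T[0, 3] = tx
    return T

poses = {0: translation(0.0), 1: translation(1.0), 2: translation(2.0)}
edges = {(1, 0): translation(1.0),      # adjacent frames
         (2, 1): translation(1.0),
         (2, 0): translation(2.1)}      # non-adjacent edge, slightly inconsistent

cost = pose_graph_cost(poses, edges)
```

Only the non-adjacent edge contributes here, because it disagrees with the chained adjacent transforms by 0.1. A pose-graph optimizer would adjust the poses to spread that disagreement across the graph.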
https://github.com/tum-vision/lsd_slam
portability to other platforms.
effects when solving BA problem.
pose (diag) pose-geo geo (diag) geo (off-diag)
accurate and visually appealing.
reduce long-term, large-scale accuracy.
direct SLAM.
within a photometric bundle adjustment framework.
information.
$$\approx I_f(W\{x_n; \theta_f, \lambda_n\}) + [A_n^f, B_n^f] \begin{bmatrix} \Delta\theta_f \\ \Delta\lambda_n \end{bmatrix}$$
$F$ - frames, $I_r$ - “reference frame”
$$\arg\min_{\lambda, \theta} \sum_{r=1}^{F} \sum_{x \in I_r} \sum_{f \in \mathrm{obs}(x)} \| I_r(x) - I_f(W\{x; \theta_f, \lambda_r(x)\}) \|_2^2$$
doing inference on a Markov Random Field (MRF).
Computing, vol. 30, no. 2, pp. 65–77, 2012.
θ1 θ2 θ3 θ4
w3 w4 w5 w6
“edges based
Mapping (PTAM) algorithm.
[Figure: graph over points x1 ... x6 and keyframes T0 ... T3, annotated with θ1 θ2 θ3 θ4 and w3 w4 w5 w6]
2007.
“remove all but a small subset of keyframes”
[Figure: results for ORB-SLAM and DSO, panels labeled s_01, s_10, s_20, s_30, s_40, s_50, Fwd, Bwd]
Full evaluation result.
All error values for the TUM monoVO dataset.