Direct Visual SLAM Instructor - Simon Lucey 16-623 - Designing - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Direct Visual SLAM

Instructor - Simon Lucey

16-623 - Designing Computer Vision Apps

slide-2
SLIDE 2

Reminder: SLAM

  • Simultaneous Localization and Mapping.
  • On mobile, we are interested primarily in Visual SLAM (VSLAM).
  • Sometimes called Mono SLAM if there is only one camera.
  • Can be viewed as an online SfM problem.
slide-3
SLIDE 3

Reminder: VO vs VSLAM vs SFM

[Figure: diagram relating SFM, VSLAM, and VO.]

Taken from D. Scaramuzza “Tutorial on Visual Odometry”.

slide-4
SLIDE 4

Reminder: Keyframe-based SLAM

[Nister’04, PTAM’07, LIBVISO’08, LSD SLAM’14, SVO’14, ORB SLAM’15]

[Figure labels: Keyframe 1, Keyframe 2, Initial pointcloud, New triangulated points, Current frame, New keyframe.]

Taken from D. Scaramuzza “Tutorial on Visual Odometry”.

slide-5
SLIDE 5

A Tale of Two Threads

[Figure: two-thread diagram. Labels: frame $f$, pose $\theta_f$, pose set $\{\theta_f\}_{f=1}^{F}$.]

Adapted from S. Lovegrove & A. J. Davison “Real-Time Spherical Mosaicing using Whole Image Alignment”, ECCV 2010.

slide-6
SLIDE 6

Example - ORB SLAM

  • R. Mur-Artal, J. M. M. Montiel, J. D. Tardos, “ORB-SLAM: a Versatile and Accurate Monocular SLAM System”, IEEE Trans. Robotics 2015.

“Thread 1 - Visual Odometry” “Thread 2 - Local BA”

slide-7
SLIDE 7

Today

  • Direct vs. Feature based methods
  • Dense SLAM
  • Semi-Dense SLAM
slide-8
SLIDE 8

ECCV 1999

slide-10
SLIDE 10

Feature-Based Methods

  • Image is reduced to a sparse set of keypoints.
  • Usually matched with feature descriptors.

slide-11
SLIDE 11

Feature-Based Advantages

  • Easier transition from images to geometry [figure: vanishing point]
  • Wide-baseline matching [figure: Mikolajczyk, 2007]
  • Illumination invariance [figure: Mikolajczyk, 2007]

slide-12
SLIDE 12

Feature-Based Advantages

Using invariant descriptors

slide-13
SLIDE 13

Feature-Based Challenges

[Figure: Direct Method (ours) vs. Feature-Based Method (ORB+RANSAC).]

  • Creates only a sparse map of the world.
  • Does not sample across all available image data - edges & weak intensities.
  • Needs high-resolution camera mode (bad for efficiency and battery life).

slide-16
SLIDE 16

Today

  • Direct vs. Feature based methods
  • Dense SLAM
  • Semi-Dense SLAM
slide-18
SLIDE 18

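Reminder: Warp Functions

Our goal is to find the warp parameter vector $\mathbf{p}$, where

  • $\mathcal{W}(\mathbf{x}; \mathbf{p})$ = warping function such that $\mathbf{x}' = \mathcal{W}(\mathbf{x}; \mathbf{p})$
  • $\mathbf{p}$ = parameter vector describing the warp
  • $\mathbf{x}$ = coordinate in the template, $[x, y]^T$
  • $\mathbf{x}'$ = corresponding coordinate in the source, $[x', y']^T$

[Figure: “Source” and “Template” images; $\mathbf{x}$ in the template maps to $\mathbf{x}'$ in the source.]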

slide-19
SLIDE 19

Review: Pinhole Camera

The real camera image is inverted. Instead, we model the impossible but more convenient virtual image.

Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince

slide-20
SLIDE 20

Relating Points between Views

[Equations: first camera, second camera, and the result of substituting one into the other.]

Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
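The equations on this slide were images and did not survive in the text. A reconstruction that agrees with the pinhole warp function on the following slides is sketched here. It assumes normalized image coordinates (intrinsics already removed) and a first camera placed at the world origin; $\tilde{\mathbf{w}}$ is a symbol introduced here for the homogeneous 3D point.

```latex
\begin{aligned}
\text{First camera:}\quad  & \lambda_1 \tilde{\mathbf{x}}_1 = [\mathbf{I},\ \mathbf{0}]\,\tilde{\mathbf{w}} \\
\text{Second camera:}\quad & \lambda_2 \tilde{\mathbf{x}}_2 = [\Omega,\ \tau]\,\tilde{\mathbf{w}} \\
\text{Substituting:}\quad  & \lambda_2 \tilde{\mathbf{x}}_2 = \lambda_1 \Omega \tilde{\mathbf{x}}_1 + \tau
\end{aligned}
```

Dividing the last line by its third component removes $\lambda_2$ and leaves the image coordinate in the second view as a function of the first-view coordinate, its depth $\lambda_1$, and the relative pose $(\Omega, \tau)$.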

slide-21
SLIDE 21

Pinhole Warp Function

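  • One can represent the relationship of points between views of pinhole cameras as a warp function,

$$\mathcal{W}(\mathbf{x}; \theta, \lambda) = \pi(\lambda \Omega \tilde{\mathbf{x}} + \tau) \qquad \text{“warp function”}$$

$$\pi\!\left(\begin{bmatrix} u \\ v \\ w \end{bmatrix}\right) = \begin{bmatrix} u/w \\ v/w \end{bmatrix} \qquad \text{“pinhole projection”}$$

$$\mathbf{T} = \begin{bmatrix} \Omega & \tau \\ \mathbf{0}^T & 1 \end{bmatrix} \in SE(3) \qquad \text{“pose parameters”}$$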

slide-22
SLIDE 22

Pinhole Warp Function

  • The pose is parameterized by the six-vector $\theta$ (“pose parameters”),

$$\mathbf{T}(\theta) = \exp\left(\sum_{i=1}^{6} \theta_i A_i\right) \in SE(3)$$
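A minimal sketch of the pinhole warp with the exponential-map pose, in Python with NumPy. The generator ordering (three translations, then three rotations), the truncated-series matrix exponential, and all function names are assumptions made for illustration; coordinates are taken to be normalized (intrinsics removed).

```python
import numpy as np

def se3_generators():
    """Six 4x4 generators A_i: three translations, then three rotations (assumed order)."""
    A = np.zeros((6, 4, 4))
    A[0, 0, 3] = A[1, 1, 3] = A[2, 2, 3] = 1.0   # translation along x, y, z
    A[3, 2, 1], A[3, 1, 2] = 1.0, -1.0           # rotation about x
    A[4, 0, 2], A[4, 2, 0] = 1.0, -1.0           # rotation about y
    A[5, 1, 0], A[5, 0, 1] = 1.0, -1.0           # rotation about z
    return A

def expm(M, terms=20):
    """Matrix exponential: scaling and squaring around a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2.0 ** s
    E, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def pose(theta):
    """T(theta) = exp(sum_i theta_i A_i), a 4x4 matrix in SE(3)."""
    return expm(np.tensordot(theta, se3_generators(), axes=1))

def project(P):
    """Pinhole projection pi: (u, v, w) -> (u/w, v/w), applied to each row."""
    return P[:, :2] / P[:, 2:3]

def warp(x, theta, lam):
    """W(x; theta, lam) = pi(lam * Omega * x~ + tau) for N points x (N, 2), depths lam (N,)."""
    T = pose(theta)
    Omega, tau = T[:3, :3], T[:3, 3]
    x_h = np.column_stack([x, np.ones(len(x))])
    return project(lam[:, None] * (x_h @ Omega.T) + tau)

x = np.array([[0.1, 0.2], [-0.3, 0.4]])
lam = np.array([2.0, 2.0])
shifted = warp(x, np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0]), lam)
```

With a pure x-translation of 0.1 and depth 2, every point moves by 0.1 / 2 = 0.05 in the image, which is the parallax behaviour that depth estimation later relies on.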

slide-23
SLIDE 23

Photometric Relationship

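  • We can employ this warp function to now express the problem as,

$$\mathcal{T}(\mathbf{x}_n) = \mathcal{I}_f(\mathcal{W}\{\mathbf{x}_n; \theta_f, \lambda_n\})$$

where $\mathcal{T}$ is the “keyframe template” and $\mathcal{I}_f$ is the “f-th image”.

[Figure: keyframe template $\mathcal{T}$ and image $\mathcal{I}_f$, related by the pose $\theta_f$ and the point $\lambda_n \tilde{\mathbf{x}}_n$. Source: “An Invitation to 3D Vision”, Ma,]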

slide-24
SLIDE 24

Linearizing the Image for Pose

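$$\mathcal{T}(\mathbf{x}_n) = \mathcal{I}_f(\mathcal{W}\{\mathbf{x}_n; \theta_f \circ \Delta\theta_f, \lambda_n\}) \approx \mathcal{I}_f(\mathcal{W}\{\mathbf{x}_n; \theta_f, \lambda_n\}) + A_n^f \Delta\theta_f$$

Baker, Simon, and Iain Matthews. "Equivalence and efficiency of image alignment algorithms." CVPR 2001.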

slide-26
SLIDE 26

Direct Camera Tracking

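  • Assuming known depths $\{\lambda_n\}_{n=1}^{N}$,

$$\arg\min_{\Delta\theta_f} \sum_{n=1}^{N} \left\| \mathcal{T}(\mathbf{x}_n) - \mathcal{I}_f(\mathcal{W}\{\mathbf{x}_n; \theta_f, \lambda_n\}) - A_n^f \Delta\theta_f \right\|_2^2$$

[Figure: “keyframe template” $\mathcal{T}$ and “f-th image” $\mathcal{I}_f$, related by the pose $\theta_f$ and the point $\lambda_n \tilde{\mathbf{x}}_n$. Source: “An Invitation to 3D Vision”, Ma,]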
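The tracking step can be illustrated with a small Gauss-Newton loop on synthetic data. This is a sketch under loud assumptions, not the method of any cited system: the f-th image is a hypothetical analytic function (so no interpolation is needed), the pose is parameterized as a translation plus a rotation vector, the rows $A_n^f$ are obtained by central finite differences, and the update is additive rather than compositional.

```python
import numpy as np

def rotation(w):
    """Rodrigues' formula for the rotation vector w."""
    K = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    a = np.linalg.norm(w)
    if a < 1e-12:
        return np.eye(3) + K
    return np.eye(3) + np.sin(a) / a * K + (1 - np.cos(a)) / a**2 * (K @ K)

def warp(x, theta, lam):
    """W{x; theta, lam} = pi(lam * Omega * x~ + tau), with theta = (tau, rotation vector)."""
    x_h = np.column_stack([x, np.ones(len(x))])
    P = lam[:, None] * (x_h @ rotation(theta[3:]).T) + theta[:3]
    return P[:, :2] / P[:, 2:3]

def image(p):
    """Hypothetical analytic stand-in for the f-th image I_f, sampled at coordinates p."""
    u, v = p[:, 0], p[:, 1]
    return np.sin(3 * u) + np.cos(2 * v) + 0.5 * np.sin(2 * u + v)

def track(template, x, lam, theta, iters=10, eps=1e-6):
    """Estimate the pose by repeatedly solving the linearized least-squares problem."""
    for _ in range(iters):
        r = template - image(warp(x, theta, lam))      # photometric residual per point
        A = np.empty((len(x), 6))                      # row n holds A_n^f
        for i in range(6):
            d = np.zeros(6)
            d[i] = eps
            A[:, i] = (image(warp(x, theta + d, lam))
                       - image(warp(x, theta - d, lam))) / (2 * eps)
        theta = theta + np.linalg.lstsq(A, r, rcond=None)[0]   # additive update (assumption)
    return theta

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=(300, 2))              # template coordinates x_n
lam = rng.uniform(1.0, 3.0, size=300)                  # known depths lambda_n
theta_true = np.array([0.02, -0.01, 0.015, 0.01, -0.02, 0.015])
template = image(warp(x, theta_true, lam))             # T(x_n) generated from the true pose
theta_hat = track(template, x, lam, np.zeros(6))
```

In a real system the Jacobian comes from image gradients and the analytic derivative of the warp, and a coarse-to-fine pyramid widens the basin of convergence.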

slide-27
SLIDE 27

Direct Camera Tracking

  • Most methods employ a variant of the Lucas-Kanade algorithm for estimating camera pose.
  • Engel et al. demonstrated that using a “dense” number of points does not improve the performance of camera tracking (i.e. pose estimation).
  • The advantage of density stems mainly from the map estimation.
  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • J. Engel, T. Schops, and D. Cremers. LSD-SLAM: Large-scale direct monocular slam. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

slide-28
SLIDE 28

Direct Camera Tracking

How do we update the depths?

slide-29
SLIDE 29

Direct Map Estimation

  • Assuming known pose parameters $\{\theta_f\}_{f=1}^{F}$,
  • Naively we could solve for the depths independently,

$$C(\mathbf{x}, \lambda) = \frac{1}{F} \sum_{f=1}^{F} \left\| \mathcal{T}(\mathbf{x}) - \mathcal{I}_f(\mathcal{W}\{\mathbf{x}; \theta_f, \lambda\}) \right\|_1$$

$$\lambda_n = \arg\min_{\lambda} C(\mathbf{x}_n, \lambda)$$

  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
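The naive per-pixel search can be sketched as a sweep over sampled inverse depths. Everything synthetic here is an assumption for illustration: the template is an analytic function, the scene is a fronto-parallel plane at depth 2, and the poses are pure in-plane translations, so each frame is just a shifted copy of the template.

```python
import numpy as np

def template(p):
    """Hypothetical analytic keyframe template T."""
    return np.sin(3 * p[0]) + np.cos(2 * p[1]) + 0.5 * np.sin(2 * p[0] + p[1])

def warp(x, tau, lam):
    """W{x; theta_f, lam} for a pure translation tau (rotation fixed to identity)."""
    P = lam * np.array([x[0], x[1], 1.0]) + tau
    return P[:2] / P[2]

def make_frame(tau, true_depth):
    """Image I_f of a fronto-parallel plane at true_depth, seen after the shift tau."""
    return lambda p: template(p - tau[:2] / true_depth)

def cost(x, lam, frames):
    """C(x, lam) = (1/F) * sum_f |T(x) - I_f(W{x; theta_f, lam})|."""
    return np.mean([abs(template(x) - I_f(warp(x, tau, lam))) for I_f, tau in frames])

true_depth = 2.0
taus = [np.array([0.2, 0.0, 0.0]), np.array([0.0, 0.2, 0.0]), np.array([0.15, 0.15, 0.0])]
frames = [(make_frame(tau, true_depth), tau) for tau in taus]

x = np.array([0.1, 0.2])                         # one template pixel x_n
inv_depths = np.linspace(0.2, 1.0, 33)           # candidate inverse depths
costs = np.array([cost(x, 1.0 / rho, frames) for rho in inv_depths])
lam_hat = 1.0 / inv_depths[np.argmin(costs)]     # lambda_n = argmin_lambda C(x_n, lambda)
```

Because each pixel is solved on its own, textureless or repetitive regions give flat or multi-modal cost curves; a spatial prior on the depth map is the usual remedy.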

slide-30
SLIDE 30

DTAM

  • Newcombe et al. proposed DTAM - Dense Tracking and Mapping.
  • Attempted to substitute the feature-based tracking and mapping modules of traditional VSLAM (e.g. PTAM) with dense methods.
  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.

[Figure labels: template $\mathcal{T}$, frames $f = 1 : F$ up to $\mathcal{I}_F$, pixel $\mathbf{x}$, and the range between $\lambda_{\min}^{-1}$ and $\lambda_{\max}^{-1}$.]
slide-31
SLIDE 31

DTAM

“Sample across inverse depths”

slide-32
SLIDE 32

DTAM - Example

[Figure: photometric cost functions $C(a, \lambda)$, $C(b, \lambda)$, and $C(c, \lambda)$ for three example pixels $a$, $b$, $c$, each plotted against inverse depth $\lambda^{-1}$.]

  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
slide-35
SLIDE 35

DTAM - Geometric Prior

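  • Newcombe et al. proposed the employment of a geometric prior on depths,

$$\arg\min_{\lambda} \sum_{n=1}^{N} C(\mathbf{x}_n, \lambda_n) + g(\mathbf{x}_n)\,\left\| \nabla * \lambda_n^{-1} \right\|_{\epsilon}, \qquad g(\mathbf{x}) = \exp\!\left(-\alpha \left\| \nabla \mathcal{T}(\mathbf{x}) \right\|_2^{\beta}\right)$$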
slide-36
SLIDE 36

DTAM - Geometric Prior

What do you think the prior is doing?
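One way to explore what the prior does is to evaluate it on a toy image. The sketch below assumes the DTAM form of the weight, $g(\mathbf{x}) = \exp(-\alpha\|\nabla\mathcal{T}(\mathbf{x})\|_2^{\beta})$ with a negative exponent, and a Huber norm with threshold $\epsilon$; the values of `alpha`, `beta`, and `eps`, and the use of `np.gradient` as the gradient filter, are illustrative choices only.

```python
import numpy as np

def huber(r, eps):
    """Huber norm: quadratic below eps, linear above."""
    a = np.abs(r)
    return np.where(a <= eps, r**2 / (2 * eps), a - eps / 2)

def edge_weight(T, alpha=10.0, beta=1.0):
    """g(x) = exp(-alpha * ||grad T(x)||^beta): close to 1 in flat areas, small on edges."""
    gy, gx = np.gradient(T)
    return np.exp(-alpha * np.hypot(gx, gy) ** beta)

def prior(inv_depth, T, eps=1e-2):
    """Sum over pixels of g(x) * ||grad(inverse depth)||_eps."""
    dy, dx = np.gradient(inv_depth)
    return np.sum(edge_weight(T) * huber(np.hypot(dx, dy), eps))

# Template with a vertical intensity edge between columns 3 and 4.
T = np.zeros((8, 8))
T[:, 4:] = 1.0

# Depth discontinuity aligned with the intensity edge ...
aligned = np.ones((8, 8))
aligned[:, 4:] = 0.5
# ... versus the same discontinuity placed in a flat image region.
misaligned = np.ones((8, 8))
misaligned[:, 2:] = 0.5

cost_aligned = prior(aligned, T)
cost_misaligned = prior(misaligned, T)
```

The depth jump is cheap where the template itself has an edge and expensive elsewhere, so the prior smooths the depth map inside uniform regions while still permitting discontinuities at image edges.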
slide-37
SLIDE 37

DTAM - Video

slide-39
SLIDE 39

LSD SLAM

  • A drawback to DTAM is that the depth estimation is a volumetric method and therefore requires a state-of-the-art GPU to run in real time.
  • Engel et al. recently proposed Large-Scale Direct Monocular SLAM (LSD-SLAM), which circumvents this limitation.

[Figure: LSD-SLAM system overview with three components: Tracking, Depth Map Estimation, and Map Optimization. Section pointers in the figure: (See Sec. 3.2, 3.5 and 3.6), (See Sec. 3.3), (See Sec. 3.4).]

  • Tracking: New Image (640 x 480 at 30Hz) → Track on Current KF → estimate SE(3) transformation.
  • Depth Map Estimation: Take KF? If no, Refine Current KF → small-baseline stereo → probabilistically merge into KF → regularize depth map. If yes, Create New KF → propagate depth map to new frame → regularize depth map.
  • Map Optimization: Add KF to Map → find closest keyframes → estimate Sim(3) edges. The Current Map provides the tracking reference.

SE(3) tracking objective:

$$\min_{\xi \in \mathfrak{se}(3)} \sum_{p} \left\| \frac{r_p^2(p, \xi)}{\sigma^2_{r_p(p, \xi)}} \right\|_{\delta}$$

Sim(3) edge objective:

$$\min_{\xi \in \mathfrak{sim}(3)} \sum_{p} \left\| \frac{r_p^2(p, \xi)}{\sigma^2_{r_p(p, \xi)}} + \frac{r_d^2(p, \xi)}{\sigma^2_{r_d(p, \xi)}} \right\|_{\delta}$$

  • J. Engel, T. Schops, and D. Cremers. LSD-SLAM: Large-scale direct monocular slam. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

slide-40
SLIDE 40

LSD SLAM

  • Depth map can instead be represented as a Gaussian distribution.
  • Much more efficient than DTAM’s volumetric approach.
  • Engel et al. also used a geometric prior similar to (but more efficient than) DTAM’s.
  • J. Engel, T. Schops, and D. Cremers. LSD-SLAM: Large-scale direct monocular slam. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

$$C(\mathbf{x}, \lambda) = \sigma(\mathbf{x})^{-2} \left\| \lambda^{-1} - \mu(\mathbf{x}) \right\|_2^2$$
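A per-pixel Gaussian on inverse depth makes both the cost and the merging of new observations cheap. The sketch below evaluates the cost above for one pixel and fuses two estimates with the standard product-of-Gaussians update; treating the estimates as independent is an assumption, and this is not claimed to be LSD-SLAM's exact update rule.

```python
def depth_cost(lam, mu, sigma):
    """C(x, lam) = sigma(x)^-2 * (lam^-1 - mu(x))^2 for a single pixel."""
    return (1.0 / lam - mu) ** 2 / sigma ** 2

def fuse(mu_a, var_a, mu_b, var_b):
    """Fuse two independent Gaussian inverse-depth estimates (means and variances)."""
    var = var_a * var_b / (var_a + var_b)
    mu = (var_b * mu_a + var_a * mu_b) / (var_a + var_b)
    return mu, var

# Prior estimate: inverse depth 0.5 with variance 0.04.
# New stereo observation: inverse depth 0.6 with variance 0.01.
mu, var = fuse(0.5, 0.04, 0.6, 0.01)
```

The fused mean lands closer to the more certain observation and the variance shrinks, so each pixel needs only two numbers instead of a full column of a cost volume.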

slide-41
SLIDE 41

Reminder: Keyframe Selection

  • Rule of thumb: add a keyframe when,

$$\frac{\text{keyframe distance}}{\text{average depth}} > \text{threshold} \quad (\sim 10\text{-}20\,\%)$$

Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
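The rule of thumb reduces to a one-line test. In this sketch the function name, its arguments, and the default threshold of 15 % (a value inside the quoted 10-20 % range) are illustrative assumptions.

```python
import numpy as np

def needs_new_keyframe(cam_position, keyframe_position, depths, threshold=0.15):
    """True when (distance to the last keyframe) / (average scene depth) > threshold."""
    distance = np.linalg.norm(np.asarray(cam_position) - np.asarray(keyframe_position))
    return distance / np.mean(depths) > threshold

far = needs_new_keyframe([0.5, 0.0, 0.0], [0.0, 0.0, 0.0], depths=[2.0, 2.0])   # ratio 0.25
near = needs_new_keyframe([0.1, 0.0, 0.0], [0.0, 0.0, 0.0], depths=[2.0, 2.0])  # ratio 0.05
```

Normalizing by the average depth makes the test scale-free: moving 0.5 units matters in a scene 2 units away but hardly at all in a scene 100 units away.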

slide-42
SLIDE 42

Depths across Keyframes

[Equations: first camera, second camera, and the result of substituting one into the other.]

  • Depth from keyframe 1 can be propagated to keyframe 2.
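Propagation follows from the two-view relation $\lambda_2 \tilde{\mathbf{x}}_2 = \lambda_1 \Omega \tilde{\mathbf{x}}_1 + \tau$, assumed here in normalized coordinates with $(\Omega, \tau)$ the relative pose from keyframe 1 to keyframe 2. The function name is hypothetical.

```python
import numpy as np

def propagate_depth(x1, lam1, Omega, tau):
    """Map a pixel x1 with depth lam1 in keyframe 1 to its pixel and depth in keyframe 2."""
    P = lam1 * (Omega @ np.array([x1[0], x1[1], 1.0])) + tau   # 3D point in frame 2
    lam2 = P[2]                                                # new depth: third component
    return P[:2] / lam2, lam2

# Camera moves 0.5 units toward the scene: relative translation (0, 0, -0.5), no rotation.
x2, lam2 = propagate_depth(np.array([0.1, 0.2]), 2.0, np.eye(3), np.array([0.0, 0.0, -0.5]))
```

The point that was 2 units away is now 1.5 units away and has moved outward in the image, which gives a new keyframe an initial depth map before any stereo refinement.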
slide-43
SLIDE 43

LSD SLAM - Details

  • To bootstrap LSD-SLAM it is sufficient to initialize a random depth map with large variance.
  • Given sufficient translational camera motion in the first seconds of operation, the algorithm “locks” to a good configuration.
  • The map is continuously optimized in the background using pose graph optimization.
  • J. Engel, T. Schops, and D. Cremers. LSD-SLAM: Large-scale direct monocular slam. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

slide-44
SLIDE 44

LSD SLAM - Details

Why is BA not employed?

slide-45
SLIDE 45

Pose Graph Optimization

  • Similar to BA, but does not optimize over 3D points.
  • Employs knowledge that transformations can be computed between non-adjacent frames.

$$\sum_{i} \sum_{j} \left\| C_i - T_{ij} C_j \right\|^2$$

[Figure: pose graph over camera poses $C_0, C_1, C_2, C_3, \ldots, C_{n-1}, C_n$, with edges $T_{1,0}, T_{2,1}, T_{3,2}, T_{n,n-1}$ between adjacent frames and $T_{2,0}, T_{3,0}, T_{n-1,2}$ between non-adjacent frames.]

Taken from D. Scaramuzza “Tutorial on Visual Odometry”.
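A deliberately simplified pose graph shows how a non-adjacent edge corrects drift. The sketch assumes translation-only poses, so the constraint $C_i = T_{ij} C_j$ becomes the linear relation $C_i = C_j + t_{ij}$ and the problem can be solved with ordinary least squares; real pose graphs over SE(3) or Sim(3) need an iterative solver. Pose 0 is fixed at the origin to remove the gauge freedom.

```python
import numpy as np

def optimize_pose_graph(n_poses, edges):
    """Least-squares poses from edges (i, j, t_ij) meaning C_i ≈ C_j + t_ij; C_0 is fixed at 0."""
    rows, rhs = [], []
    for i, j, t in edges:
        row = np.zeros(n_poses)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        rhs.append(np.atleast_1d(t))
    A = np.array(rows)[:, 1:]                 # drop the column of the fixed pose C_0
    b = np.array(rhs, dtype=float)            # shape (num_edges, dim)
    C = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.vstack([np.zeros((1, b.shape[1])), C])

# True 1-D positions are 0, 1, 2, 3. Odometry over-estimates each step by 10 %,
# while one edge between the non-adjacent frames 3 and 0 is accurate.
edges = [(1, 0, 1.1), (2, 1, 1.1), (3, 2, 1.1), (3, 0, 3.0)]
C = optimize_pose_graph(4, edges)
dead_reckoning_end = 3.3                      # end pose from chaining odometry alone
```

The optimized end pose is 3.075 rather than the dead-reckoned 3.3: the long-range edge spreads the accumulated error evenly across the three odometry steps.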

slide-46
SLIDE 46

LSD SLAM - Details

  • Source code to LSD-SLAM can be found at https://github.com/tum-vision/lsd_slam
  • ROS is only used for input and output, facilitating easy portability to other platforms.
  • J. Engel, T. Schops, and D. Cremers. LSD-SLAM: Large-scale direct monocular slam. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

slide-47
SLIDE 47

Today

  • Direct vs. Feature based methods
  • Dense SLAM
  • Semi-Dense SLAM
  • Photometric Bundle Adjustment
slide-48
SLIDE 48

Drawbacks to Geometric Prior

  • The geometric prior used in DTAM and LSD-SLAM can have unwanted effects when solving the BA problem.

[Figure block labels: pose (diag), pose-geo, geo (diag), geo (off-diag).]

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
slide-49
SLIDE 49

Drawbacks to Geometric Prior

  • A geometric prior makes the 3D reconstruction denser, locally more accurate, and more visually appealing.
  • However, it can introduce bias and thereby reduce long-term, large-scale accuracy.

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
slide-50
SLIDE 50

Semi-Dense SLAM

  • Recently, the community has been exploring the idea of semi-dense direct SLAM.
  • In this new approach ALL parameters are solved simultaneously within a photometric bundle adjustment framework.
  • Can naturally sample all parts of the image that contain image gradient information.

$$\mathcal{T}(\mathbf{x}_n) = \mathcal{I}_f(\mathcal{W}\{\mathbf{x}_n; \theta_f \circ \Delta\theta_f, \lambda_n + \Delta\lambda_n\}) \approx \mathcal{I}_f(\mathcal{W}\{\mathbf{x}_n; \theta_f, \lambda_n\}) + [A_n^f,\ B_n^f] \begin{bmatrix} \Delta\theta_f \\ \Delta\lambda_n \end{bmatrix}$$

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
slide-52
SLIDE 52

Photometric Bundle Adjustment

$$\arg\min_{\lambda, \theta} \sum_{r=1}^{F} \sum_{\mathbf{x} \in \mathcal{I}_r} \sum_{f \in \text{obs}(\mathbf{x})} \left\| \mathcal{I}_r(\mathbf{x}) - \mathcal{I}_f(\mathcal{W}\{\mathbf{x}; \theta_f, \lambda_r(\mathbf{x})\}) \right\|_2^2$$

  • $F$ - frames
  • $\mathcal{I}_r$ - “reference frame”
  • $\mathbf{x} \in \mathcal{I}_r$ - points selected in the reference frame
  • $f \in \text{obs}(\mathbf{x})$ - frames in which $\mathbf{x}$ is observed

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
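To make the structure of the objective concrete, the sketch below only evaluates it; it does not minimize it. The setup is a toy chosen for brevity, with every detail an assumption: a single reference frame given by an analytic function, poses restricted to translations, a fronto-parallel scene at depth 2, and a boolean visibility table `obs`.

```python
import numpy as np

def reference(p):
    """Hypothetical analytic reference frame I_r."""
    return np.sin(3 * p[:, 0]) + np.cos(2 * p[:, 1])

def warp(x, tau, lam):
    """W{x; theta_f, lam} with theta_f restricted to a translation tau."""
    P = lam[:, None] * np.column_stack([x, np.ones(len(x))]) + tau
    return P[:, :2] / P[:, 2:3]

def make_frame(tau, depth):
    """Frame I_f of a fronto-parallel plane at the given depth, seen after the shift tau."""
    return lambda p: reference(p - tau[:2] / depth)

def pba_cost(x, lam, taus, frames, obs):
    """Sum over points x and frames f in obs(x) of the squared photometric error."""
    total = 0.0
    for f, (I_f, tau) in enumerate(zip(frames, taus)):
        seen = obs[:, f]                               # points observed in frame f
        r = reference(x[seen]) - I_f(warp(x[seen], tau, lam[seen]))
        total += np.sum(r ** 2)
    return total

rng = np.random.default_rng(1)
x = rng.uniform(-0.5, 0.5, size=(50, 2))               # points x in the reference frame
lam_true = np.full(50, 2.0)                            # depths lambda_r(x)
taus_true = [np.array([0.2, 0.0, 0.0]), np.array([0.0, 0.2, 0.0])]
frames = [make_frame(tau, 2.0) for tau in taus_true]
obs = np.ones((50, 2), dtype=bool)                     # every point seen in every frame

cost_truth = pba_cost(x, lam_true, taus_true, frames, obs)
cost_bad_depth = pba_cost(x, 1.2 * lam_true, taus_true, frames, obs)
cost_bad_pose = pba_cost(x, lam_true, [t + np.array([0.05, 0.0, 0.0]) for t in taus_true], frames, obs)
```

The cost vanishes at the true depths and poses and grows when either is perturbed, which is what lets a joint solver refine structure and motion together from image intensities alone.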
slide-57
SLIDE 57

Reminder: SLAM = BA

  • One can view the problem of SfM - Bundle Adjustment as doing inference on a Markov Random Field (MRF).
  • Problem - becomes exponentially harder as time goes on.
  • H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?” Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

[Figure: graphical model with nodes T0-T3, θ1-θ4, x1-x6, and w1-w6; “edges based on visibility”.]
slide-58
SLIDE 58

Reminder: Keyframe

  • A better strategy is to employ keyframe BA.
  • Made popular by Klein & Murray’s Parallel Tracking and Mapping (PTAM) algorithm.

[Figure: graphical model with nodes T0-T3, θ1-θ4, x1-x6, and w1-w6; “remove all but a small subset of keyframes”.]

  • G. Klein and D. Murray, “Parallel tracking and mapping for small AR workspaces”, ISMAR 2007.
  • H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?” Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

slide-59
SLIDE 59

DSO SLAM

[Figure: Full evaluation result. All error values for the TUM-monoVO dataset. Panels: ORB-SLAM and DSO; axis labels: s_01, s_10, s_20, s_30, s_40, s_50, Fwd, Bwd.]

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
