Dense Object Reconstruction on Mobile Instructor - Simon Lucey - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Dense Object Reconstruction on Mobile

Instructor - Simon Lucey

16-423 - Designing Computer Vision Apps

slide-2
SLIDE 2

Reminder - Project Presentation

  • Each team will be given approximately 2.5 minutes per member to present (for example, a 2-member team will have 5 minutes allotted).
  • Each team will fill out the following form, providing a short YouTube clip (it must be shorter than your allotted time) showing your app in action.
  • Teams can submit their YouTube clips through the form http://goo.gl/forms/YoeQt0c1Hf.
  • 16-423 staff will select the best presentations, with the winner receiving the best project prize.

slide-3
SLIDE 3

Today

  • 3D Object Reconstruction through Motion
  • 3D Object Reconstruction through Learning
slide-4
SLIDE 4

123D Catch

slide-6
SLIDE 6

Feature-Based Methods

slide-7
SLIDE 7

Feature-Based Methods

  • Image is reduced to a sparse set of keypoints
  • Usually matched with feature descriptors

slide-8
SLIDE 8

Feature-Based Advantages

  • Easier transition from images to geometry (e.g. vanishing points)
  • Wide-baseline matching (Mikolajczyk, 2007)
  • Illumination invariance (Mikolajczyk, 2007)

slide-9
SLIDE 9

Feature-Based Advantages

Using invariant descriptors

slide-12
SLIDE 12

Feature-Based Challenges?

Small baseline

  • 1. High depth uncertainty.
  • 2. Degenerate case of two-view methods.
  • 3. Reconstructions tend to be sparse and lack detail.
  • H. Ha, et al. “High-quality Depth from Uncalibrated Small Motion Clip”, CVPR 2016.
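
The first point can be illustrated with a back-of-the-envelope sketch. It assumes a rectified two-view setup (an assumption, the slide does not state one) where depth is Z = f·b/d for focal length f in pixels, baseline b and disparity d. Differentiating gives |δZ| ≈ Z²/(f·b)·|δd|, so for a fixed matching error the depth error grows as the baseline shrinks. The function name and all numbers below are illustrative.

```python
def depth_uncertainty(depth_m, focal_px, baseline_m, disparity_err_px=1.0):
    """First-order depth error for rectified stereo, Z = f * b / d.

    Differentiating gives |dZ| ~= Z**2 / (f * b) * |dd|, so the error
    grows as the baseline b shrinks.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Illustrative numbers (assumptions): 500 px focal length, object 1 m away,
# one pixel of disparity error.
wide = depth_uncertainty(1.0, 500.0, 0.10)     # 10 cm baseline
small = depth_uncertainty(1.0, 500.0, 0.005)   # 5 mm baseline (hand jitter)
print(f"wide baseline:  +/- {wide:.3f} m")
print(f"small baseline: +/- {small:.3f} m")
```

With these numbers the same one-pixel error costs about 2 cm of depth at the wide baseline and about 40 cm at the small one.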
slide-13
SLIDE 13

ECCV 1999

slide-14
SLIDE 14

Feature-Based Challenges

Direct Method (ours) Feature-Based Method (ORB+RANSAC)

  • Creates only a sparse map of the world.
  • Does not sample across all available image data, such as edges and weak intensities.
  • Needs high-resolution camera mode (bad for efficiency and battery life).

slide-17
SLIDE 17

Direct Methods

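  • Although not always perfect, a common measure of photometric image similarity is the sum of squared differences (SSD):

$$\mathrm{SSD}(\mathbf{p}) = \| \mathcal{I}(\mathbf{p}) - \mathcal{T}(\mathbf{0}) \|_2^2$$

  • “Vector Form”: the “Source Image” and the “Template” are stacked over the $N$ pixel coordinates,

$$\mathcal{I}(\mathbf{p}) = \begin{bmatrix} I(W(\mathbf{x}_1; \mathbf{p})) \\ \vdots \\ I(W(\mathbf{x}_N; \mathbf{p})) \end{bmatrix}, \qquad \mathcal{T}(\mathbf{0}) = \begin{bmatrix} T(W(\mathbf{x}_1; \mathbf{0})) \\ \vdots \\ T(W(\mathbf{x}_N; \mathbf{0})) \end{bmatrix}$$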
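
A minimal sketch of the SSD measure, assuming the simplest possible warp: an integer translation W(x; p) = x + p. The function name `ssd`, the random test image and the search range are illustrative choices, not part of the lecture.

```python
import numpy as np


def ssd(source, template, p):
    """SSD between the template and the source sampled under a translation.

    W(x; p) = x + p with integer p = (dx, dy); W(x; 0) is the identity, so
    the template is compared against the source window shifted by p.
    """
    dx, dy = p
    h, w = template.shape
    patch = source[dy:dy + h, dx:dx + w].astype(np.float64)
    diff = patch - template.astype(np.float64)
    return float(np.sum(diff ** 2))


rng = np.random.default_rng(0)
source = rng.random((40, 40))
template = source[12:28, 7:23].copy()   # 16x16 window with corner at x=7, y=12

# Exhaustive search over integer shifts; the SSD is zero at the true shift.
scores = {(dx, dy): ssd(source, template, (dx, dy))
          for dx in range(0, 25) for dy in range(0, 25)}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Exhaustive search is only workable for tiny parameter spaces like this one, which is the motivation for the gradient-based approach reviewed next.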

slide-18
SLIDE 18

Review - LK Algorithm

  • Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.
  • Technique applies Taylor series approximation to any spatially coherent area governed by the warp $W(\mathbf{x}; \mathbf{p})$.

$$\mathcal{I}(\mathbf{p} + \Delta\mathbf{p}) \approx \mathcal{I}(\mathbf{p}) + \frac{\partial \mathcal{I}(\mathbf{p})}{\partial \mathbf{p}^T} \Delta\mathbf{p}$$

slide-19
SLIDE 19

Review - LK Algorithm

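“We consider this image to always be static, referred to as the template”: $\mathcal{T}(\mathbf{0})$.

$$\mathcal{I}(\mathbf{p} + \Delta\mathbf{p}) \approx \mathcal{I}(\mathbf{p}) + \frac{\partial \mathcal{I}(\mathbf{p})}{\partial \mathbf{p}^T} \Delta\mathbf{p}$$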

slide-21
SLIDE 21

Review - LK Algorithm

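$$\mathcal{I}(\mathbf{p} + \Delta\mathbf{p}) \approx \mathcal{I}(\mathbf{p}) + \frac{\partial \mathcal{I}(\mathbf{p})}{\partial \mathbf{p}^T} \Delta\mathbf{p}$$

$$\frac{\partial \mathcal{I}(\mathbf{p})}{\partial \mathbf{p}^T} = \begin{bmatrix} \frac{\partial I(\mathbf{x}'_1)}{\partial \mathbf{x}'^T_1} & \cdots & \mathbf{0}^T \\ \vdots & \ddots & \vdots \\ \mathbf{0}^T & \cdots & \frac{\partial I(\mathbf{x}'_N)}{\partial \mathbf{x}'^T_N} \end{bmatrix} \begin{bmatrix} \frac{\partial W(\mathbf{x}_1; \mathbf{p})}{\partial \mathbf{p}^T} \\ \vdots \\ \frac{\partial W(\mathbf{x}_N; \mathbf{p})}{\partial \mathbf{p}^T} \end{bmatrix}, \qquad \mathbf{x}' = W(\mathbf{x}; \mathbf{p})$$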
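
The linearization can be turned into a small Gauss-Newton loop. The sketch below is a minimal illustration under stated assumptions, not the lecture's implementation: the warp is a pure translation W(x; p) = x + p (so ∂W/∂p is the identity), images are sampled with a hand-written bilinear interpolator, and the test image is a synthetic pair of Gaussian blobs. The names `bilinear` and `lk_translation` are hypothetical.

```python
import numpy as np


def bilinear(img, xs, ys):
    """Sample img at real-valued (xs, ys); assumes all points lie inside it."""
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])


def lk_translation(source, template_vals, xs, ys, n_iters=30):
    """Gauss-Newton estimate of p for the translation warp W(x; p) = x + p."""
    gy, gx = np.gradient(source)          # dI/dy (rows), dI/dx (columns)
    p = np.zeros(2)
    for _ in range(n_iters):
        wx, wy = xs + p[0], ys + p[1]     # x' = W(x; p)
        residual = template_vals - bilinear(source, wx, wy)
        # dW/dp is the 2x2 identity for a translation, so each Jacobian row
        # is just the source gradient sampled at x'.
        J = np.stack([bilinear(gx, wx, wy), bilinear(gy, wx, wy)], axis=1)
        dp, *_ = np.linalg.lstsq(J, residual, rcond=None)
        p += dp
        if np.linalg.norm(dp) < 1e-8:
            break
    return p


# Synthetic source image: two smooth Gaussian blobs.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
source = (np.exp(-((xx - 30) ** 2 + (yy - 34) ** 2) / (2 * 8.0 ** 2))
          + 0.5 * np.exp(-((xx - 40) ** 2 + (yy - 24) ** 2) / (2 * 6.0 ** 2)))

# Template = source seen through a known sub-pixel shift.
ys, xs = [a.ravel() for a in np.mgrid[16:48, 16:48].astype(float)]
p_true = np.array([1.7, -0.8])
template_vals = bilinear(source, xs + p_true[0], ys + p_true[1])

p_est = lk_translation(source, template_vals, xs, ys)
print(p_est)
```

Starting from p = 0 works here because the shift is small relative to the blob width; larger motions usually need a coarse-to-fine pyramid.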

slide-22
SLIDE 22

Results

Direct Method (ours) Feature-Based Method (ORB+RANSAC)

  • H. Alismail, B. Browning, S. Lucey. “Enhancing Direct Camera Tracking with Feature Descriptors”, ACCV 2016.
  • H. Alismail, B. Browning, M. Kaess, S. Lucey. “Direct Visual Odometry in Low Light using Binary Descriptors”, IEEE International Conference on Robotics and Automation (ICRA) 2017.

slide-25
SLIDE 25

Today

  • Direct vs. Feature based methods
  • Dense SLAM
  • Semi-Dense SLAM
slide-26
SLIDE 26

Reminder: Warp Functions

“Source” “Template”

slide-27
SLIDE 27

Reminder: Warp Functions

Our goal is to find the warp parameter vector $\mathbf{p}$!

  • $W(\mathbf{x}; \mathbf{p})$ = warping function such that $\mathbf{x}' = W(\mathbf{x}; \mathbf{p})$
  • $\mathbf{p}$ = parameter vector describing warp
  • $\mathbf{x}$ = coordinate in template, $[x, y]^T$
  • $\mathbf{x}'$ = corresponding coordinate in source, $[x', y']^T$

“Source” “Template”
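
Two concrete warp functions, as a hedged illustration of the notation. The translation and six-parameter affine forms below are common textbook choices; the parameter ordering for the affine case is an assumption, not something the slides specify.

```python
import numpy as np


def warp_translation(x, p):
    """W(x; p) = x + p, with p = [tx, ty]."""
    return np.asarray(x, dtype=float) + np.asarray(p, dtype=float)


def warp_affine(x, p):
    """W(x; p) = [[1 + p1, p3, p5], [p2, 1 + p4, p6]] @ [x, y, 1]^T.

    This six-parameter layout is one common convention; p = 0 is the identity.
    """
    p1, p2, p3, p4, p5, p6 = p
    A = np.array([[1.0 + p1, p3, p5],
                  [p2, 1.0 + p4, p6]])
    x = np.asarray(x, dtype=float)
    return A @ np.append(x, 1.0)


x = np.array([10.0, 20.0])                     # template coordinate [x, y]^T
x_shift = warp_translation(x, [2.0, -3.0])     # source coordinate x'
x_same = warp_affine(x, np.zeros(6))           # zero parameters: x' = x
x_scaled = warp_affine(x, [0.1, 0.0, 0.0, 0.1, 0.0, 0.0])   # 10% zoom
print(x_shift, x_same, x_scaled)
```

Parameterizing the warp so that p = 0 is the identity matches the T(0) notation used for the template.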

slide-28
SLIDE 28

Review: Pinhole Camera

Real camera image is inverted. Instead, model the impossible but more convenient virtual image.

Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince

slide-29
SLIDE 29

First camera: Second camera: Substituting:

Relating Points between Views

Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince

slide-30
SLIDE 30

Pinhole Warp Function

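  • One can represent the relationship of points between views of pinhole cameras as a warp function,

$$W(\mathbf{x}; \theta, \lambda) = \pi(\lambda \Omega \tilde{\mathbf{x}} + \tau) \qquad \text{(“warp function”)}$$

$$\pi\left(\begin{bmatrix} u \\ v \\ w \end{bmatrix}\right) = \begin{bmatrix} u/w \\ v/w \end{bmatrix} \qquad \text{(“pinhole projection”)}$$

$$\mathbf{T} = \begin{bmatrix} \Omega & \tau \\ \mathbf{0}^T & 1 \end{bmatrix} \in SE(3) \qquad \text{(“pose parameters”)}$$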

slide-31
SLIDE 31

Pinhole Warp Function

$$\mathbf{T}(\theta) = \exp\left(\sum_{i=1}^{6} \theta_i \mathbf{A}_i\right) \in SE(3) \qquad \text{(“pose parameters”)}$$
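
A small sketch of the pinhole warp with the exponential-map pose. Several things here are assumptions made for illustration: image coordinates are normalized (identity intrinsics), the six generators are ordered as three translations followed by three rotations, and the matrix exponential is a truncated Taylor series, which is adequate only for small twists. All function names are hypothetical.

```python
import numpy as np


def se3_generators():
    """Generators A_1..A_6 of se(3): translations x, y, z then rotations x, y, z."""
    A = np.zeros((6, 4, 4))
    A[0, 0, 3] = A[1, 1, 3] = A[2, 2, 3] = 1.0      # translations
    A[3, 1, 2], A[3, 2, 1] = -1.0, 1.0              # rotation about x
    A[4, 0, 2], A[4, 2, 0] = 1.0, -1.0              # rotation about y
    A[5, 0, 1], A[5, 1, 0] = -1.0, 1.0              # rotation about z
    return A


def expm_series(M, n_terms=30):
    """Matrix exponential by truncated Taylor series (fine for small twists)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, n_terms):
        term = term @ M / k
        out = out + term
    return out


def pose(theta):
    """T(theta) = exp(sum_i theta_i A_i), a 4x4 matrix in SE(3)."""
    return expm_series(np.tensordot(theta, se3_generators(), axes=1))


def pi(v):
    """Pinhole projection [u, v, w]^T -> [u / w, v / w]^T."""
    return v[:2] / v[2]


def warp(x, theta, lam):
    """W(x; theta, lambda) = pi(lambda * Omega @ x_tilde + tau)."""
    T = pose(np.asarray(theta, dtype=float))
    Omega, tau = T[:3, :3], T[:3, 3]
    x_tilde = np.array([x[0], x[1], 1.0])     # homogeneous image coordinate
    return pi(lam * Omega @ x_tilde + tau)


# Pure sideways translation: a point at depth 2 on the optical axis.
x_a = warp([0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], lam=2.0)
# Pure 90-degree rotation about the optical axis.
x_b = warp([1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2], lam=1.0)
print(x_a, x_b)
```

Note that the depth λ only matters when the translation τ is non-zero, which is why pure rotation gives no depth information.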

slide-32
SLIDE 32

Photometric Relationship

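  • We can employ this warp function to now express the problem as,

$$\mathcal{T}(\mathbf{x}_n) = I_f(W\{\mathbf{x}_n; \theta_f, \lambda_n\})$$

Figure labels: “keyframe template” $\mathcal{T}$; “f-th image” $I_f$ with pose $\theta_f$; back-projected point $\lambda_n \tilde{\mathbf{x}}_n$.

“An Invitation to 3D Vision”, Ma,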

slide-33
SLIDE 33

Linearizing the Image for Pose

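$$\mathcal{T}(\mathbf{x}_n) = I_f(W\{\mathbf{x}_n; \theta_f \circ \Delta\theta_f, \lambda_n\}) \approx I_f(W\{\mathbf{x}_n; \theta_f, \lambda_n\}) + \mathbf{A}_n^f \Delta\theta_f$$

  • S. Baker and I. Matthews. “Equivalence and efficiency of image alignment algorithms”, CVPR 2001.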

slide-35
SLIDE 35

Direct Camera Tracking

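  • Assuming known depths $\{\lambda_n\}_{n=1}^{N}$,

$$\arg\min_{\Delta\theta_f} \sum_{n=1}^{N} \left\| \mathcal{T}(\mathbf{x}_n) - I_f(W\{\mathbf{x}_n; \theta_f, \lambda_n\}) - \mathbf{A}_n^f \Delta\theta_f \right\|_2^2$$

Figure labels: “keyframe template” $\mathcal{T}$; “f-th image” $I_f$ with pose $\theta_f$; back-projected point $\lambda_n \tilde{\mathbf{x}}_n$.

“An Invitation to 3D Vision”, Ma,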
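
Once the rows $\mathbf{A}_n^f$ are stacked into an N x 6 matrix, the objective above is an ordinary linear least-squares problem in $\Delta\theta_f$. The sketch below shows one such update step on synthetic numbers; the function name `pose_update` and the random data are illustrative, and in a real tracker the rows of `A` would come from image gradients multiplied by the warp Jacobian.

```python
import numpy as np


def pose_update(template_vals, warped_vals, A):
    """Solve arg min_dtheta sum_n || T(x_n) - I_f(W{x_n}) - A_n dtheta ||^2.

    template_vals, warped_vals: shape (N,) intensities.
    A: shape (N, 6), one row A_n per pixel.
    """
    residual = template_vals - warped_vals
    dtheta, *_ = np.linalg.lstsq(A, residual, rcond=None)
    return dtheta


# Synthetic check with made-up numbers: build residuals from a known update.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 6))
true_dtheta = np.array([0.01, -0.02, 0.005, 0.001, 0.0, -0.003])
warped = rng.random(200)
template = warped + A @ true_dtheta
dtheta = pose_update(template, warped, A)
print(np.round(dtheta, 4))
```

In practice this step is repeated, re-warping the image with the updated pose each time, until the update becomes negligible.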

slide-36
SLIDE 36

Direct Camera Tracking

  • Most methods employ a variant of the Lucas-Kanade algorithm for estimating camera pose.
  • Engel et al. demonstrated that using a “dense” number of points does not improve the performance of camera tracking (i.e. pose estimation).
  • Advantage of density stems mainly from the map estimation.
  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

slide-37
SLIDE 37

Direct Camera Tracking

How do we update the depths?

slide-38
SLIDE 38

Direct Map Estimation

  • Assuming known pose parameters $\{\theta_f\}_{f=1}^{F}$,
  • Naively we could solve for the depths independently,

$$C(\mathbf{x}, \lambda) = \frac{1}{F} \sum_{f=1}^{F} \left\| \mathcal{T}(\mathbf{x}) - I_f(W\{\mathbf{x}; \theta_f, \lambda\}) \right\|_1$$

$$\lambda_n = \arg\min_{\lambda} C(\mathbf{x}_n, \lambda)$$

  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
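
The naive per-pixel search can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the scene is a fronto-parallel plane with a made-up 1D texture, each camera f is translated sideways by a baseline b_f with no rotation (so the warp reduces to x + b_f/λ in normalized coordinates), and candidate depths are sampled uniformly in inverse depth.

```python
import numpy as np


def texture(u):
    """Synthetic 1D scene texture (made up for this example)."""
    return np.sin(3.0 * u) + 0.5 * np.sin(7.0 * u + 1.0)


def cost(x, lam, baselines, lam_true):
    """C(x, lambda) = 1/F * sum_f |T(x) - I_f(W{x; theta_f, lambda})|.

    Toy setting: camera f is translated sideways by baselines[f], so
    W{x; theta_f, lambda} = x + b_f / lambda, and the scene is a plane at
    depth lam_true, which makes image f equal to the template shifted by
    b_f / lam_true.
    """
    template_val = texture(x)
    total = 0.0
    for b in baselines:
        warped = x + b / lam                         # where x lands in image f
        image_val = texture(warped - b / lam_true)   # intensity of image f there
        total += abs(template_val - image_val)
    return total / len(baselines)


inv_depths = np.linspace(0.1, 2.0, 20)     # sample uniformly in inverse depth
lam_true = 1.0 / inv_depths[7]             # ground truth placed on the grid
baselines = [0.05, 0.1, 0.2, 0.3]
x = 0.25

costs = [cost(x, 1.0 / d, baselines, lam_true) for d in inv_depths]
best = int(np.argmin(costs))
lam_hat = 1.0 / inv_depths[best]
print(best, lam_hat)
```

Averaging over several frames is what makes the minimum distinctive; with a single frame and repetitive texture the cost can have several equally good minima.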

slide-39
SLIDE 39

DTAM

  • Newcombe et al. proposed DTAM: Dense Tracking and Mapping.
  • Attempted to replace the feature-based tracking and mapping modules of traditional VSLAM (e.g. PTAM) with dense methods.
  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.

Figure labels: template $\mathcal{T}$ with pixel $\mathbf{x}$; images $I_f$, $f = 1 : F$; sampled range $\lambda_{\max}^{-1}$ to $\lambda_{\min}^{-1}$.
slide-40
SLIDE 40

DTAM

“Sample across inverse depths”

slide-41
SLIDE 41

DTAM - Example

Plot: photometric cost $C(\mathbf{a}, \lambda)$ against inverse depth $\lambda^{-1}$ for example pixel $\mathbf{a}$.

  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.

slide-42
SLIDE 42

DTAM - Example

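Plot: photometric cost $C(\mathbf{b}, \lambda)$ against inverse depth $\lambda^{-1}$ for example pixel $\mathbf{b}$.

  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.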
slide-43
SLIDE 43

DTAM - Example

Plot: photometric cost $C(\mathbf{c}, \lambda)$ against inverse depth $\lambda^{-1}$ for example pixel $\mathbf{c}$.

  • R. A. Newcombe, S. J. Lovegrove and A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, ICCV 2011.
slide-44
SLIDE 44

DTAM - Geometric Prior

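  • Newcombe et al. proposed the employment of a geometric prior on depths,

$$\arg\min_{\lambda} \sum_{n=1}^{N} C(\mathbf{x}_n, \lambda_n) + g(\mathbf{x}_n) \left\| \nabla * \lambda_n^{-1} \right\|_\epsilon$$

$$g(\mathbf{x}) = \exp\left(\alpha \left\| \nabla \mathcal{T}(\mathbf{x}) \right\|_2^{\beta}\right)$$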
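
A sketch of the per-pixel weight $g(\mathbf{x})$ on a synthetic step edge. One assumption needs flagging loudly: the code writes the exponent as `-alpha` with `alpha > 0`, i.e. it assumes the α on the slide is effectively negative, which is the sign convention of the DTAM paper. Under that assumption the weight falls towards 0 at strong template edges, relaxing the smoothness penalty where depth discontinuities are likely. The parameter values are illustrative.

```python
import numpy as np


def edge_weight(template, alpha=3.0, beta=1.0):
    """Per-pixel weight g(x) = exp(-alpha * ||grad T(x)||_2 ** beta).

    Sign convention assumed here: alpha > 0 with an explicit minus sign, so
    the weight drops at strong edges and stays at 1 in flat regions.
    """
    gy, gx = np.gradient(template.astype(np.float64))
    grad_norm = np.sqrt(gx ** 2 + gy ** 2)
    return np.exp(-alpha * grad_norm ** beta)


# Step-edge template: flat on both sides, a jump between columns 3 and 4.
template = np.zeros((6, 8))
template[:, 4:] = 1.0
g = edge_weight(template)
print(np.round(g[0], 3))
```

The two columns straddling the edge receive a weight of exp(-1.5), about 0.223, while flat columns keep a weight of 1.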
slide-45
SLIDE 45

DTAM - Geometric Prior

What do you think the prior is doing?

slide-46
SLIDE 46

DTAM - Video

slide-48
SLIDE 48

LSD SLAM

  • A drawback to DTAM is that the depth estimation is a volumetric method and therefore requires a state-of-the-art GPU to run in real time.
  • Engel et al. recently proposed Large-Scale Direct Monocular SLAM (LSD-SLAM), which circumvents this limitation.

Tracking
  • New Image (640 x 480 at 30Hz)
  • Track on Current KF → estimate SE(3) transformation (tracking reference: Current KF)

Depth Map Estimation
  • Take KF?
  • no (refine KF): Refine Current KF → small-baseline stereo → probabilistically merge into KF → regularize depth map
  • yes (replace KF): Create New KF → propagate depth map to new frame → regularize depth map

Map Optimization
  • Add KF to Map (add to map) → find closest keyframes → estimate Sim(3) edges
  • Current Map

(See Sec. 3.2, 3.5 and 3.6) (See Sec. 3.3) (See Sec. 3.4)

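$$\min_{\xi \in \mathfrak{se}(3)} \sum_{\mathbf{p}} \left\| \frac{r_p^2(\mathbf{p}, \xi)}{\sigma^2_{r_p(\mathbf{p}, \xi)}} \right\|_\delta$$

$$\min_{\xi \in \mathfrak{sim}(3)} \sum_{\mathbf{p}} \left\| \frac{r_p^2(\mathbf{p}, \xi)}{\sigma^2_{r_p(\mathbf{p}, \xi)}} + \frac{r_d^2(\mathbf{p}, \xi)}{\sigma^2_{r_d(\mathbf{p}, \xi)}} \right\|_\delta$$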
  • J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834–849. Springer, 2014.
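
The $\|\cdot\|_\delta$ in the two objectives is a Huber norm applied to variance-normalized residuals. The sketch below evaluates the tracking cost for a fixed pose, assuming the usual Huber definition (quadratic for $|r| \le \delta$, linear beyond); the residuals, variances and δ are made-up numbers, and the function names are hypothetical.

```python
import numpy as np


def huber(r_sq, delta):
    """Huber norm ||r^2||_delta: quadratic for |r| <= delta, linear beyond."""
    r = np.sqrt(r_sq)
    return np.where(r <= delta, r_sq / (2.0 * delta), r - delta / 2.0)


def tracking_cost(residuals, variances, delta=1.0):
    """sum_p || r_p^2 / sigma_{r_p}^2 ||_delta for a fixed pose xi."""
    return float(np.sum(huber(residuals ** 2 / variances, delta)))


residuals = np.array([0.1, -0.2, 5.0])     # last entry plays the outlier
variances = np.array([1.0, 1.0, 1.0])
robust = tracking_cost(residuals, variances)
plain = float(np.sum(residuals ** 2 / variances))
print(robust, plain)
```

The outlier contributes 4.5 to the robust cost instead of 25 to the plain squared cost, so a single bad pixel has far less pull on the pose estimate.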

slide-49
SLIDE 49

LSD SLAM

  • Depth map can instead be represented as a Gaussian distribution,

$$C(\mathbf{x}, \lambda) = \sigma(\mathbf{x})^{-2} \left\| \lambda^{-1} - \mu(\mathbf{x}) \right\|_2^2$$

  • Much more efficient than DTAM’s volumetric approach.
  • Engel et al. also used a geometric prior similar to DTAM’s, but more efficient.
  • J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834–849. Springer, 2014.
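
One reason the Gaussian representation is cheap is that each pixel stores only a mean and a variance of inverse depth, and a new measurement can be merged in closed form. The sketch below shows the cost above together with a standard product-of-Gaussians merge; treating the "probabilistically merge" step this way is an assumption for illustration, and the numbers are made up.

```python
def gaussian_cost(lam, mu, sigma):
    """C(x, lambda) = sigma(x)^-2 * (1 / lambda - mu(x))^2."""
    return (1.0 / lam - mu) ** 2 / sigma ** 2


def fuse(mu_a, var_a, mu_b, var_b):
    """Merge two Gaussian inverse-depth estimates (product of Gaussians)."""
    var = var_a * var_b / (var_a + var_b)
    mu = (var_b * mu_a + var_a * mu_b) / (var_a + var_b)
    return mu, var


# Prior estimate (mean 0.50, variance 0.04) and a sharper new measurement.
mu, var = fuse(0.50, 0.04, 0.60, 0.01)
print(mu, var)
print(gaussian_cost(2.0, 0.5, 0.1))   # depth 2 matches mean inverse depth 0.5
```

The fused mean lands closer to the more certain measurement and the fused variance is smaller than either input, at a cost of a few arithmetic operations per pixel rather than a full cost volume.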

slide-50
SLIDE 50

Examples in Mobile

slide-51
SLIDE 51

ItSeez - App

slide-53
SLIDE 53

Today

  • 3D Object Reconstruction through Motion
  • 3D Object Reconstruction through Learning
slide-54
SLIDE 54

What are we now doing?

slide-56
SLIDE 56

Estimating Extrinsics

Figure labels: camera axes u, v, w; optical center; optical axis; “Image Plane”; focal length.

slide-57
SLIDE 57

Estimating Extrinsics

Figure labels: camera axes u, v, w; optical center; optical axis; “Image Plane”; focal length; “Object in the world” $\mathbf{w} = [u, v, w]^T$.

slide-58
SLIDE 58

Estimating Extrinsics

Figure labels: camera axes u, v, w; optical center; optical axis; “Image Plane”; focal length; “Object in the world” $\mathbf{w} = [u, v, w]^T$; image point $\mathbf{x} = [x, y]^T$.

slide-59
SLIDE 59

Estimating Extrinsics

Figure labels: camera axes u, v, w; optical center; optical axis; “Image Plane”; focal length; “Object in the world”.

slide-60
SLIDE 60

Estimating Extrinsics

Figure labels: camera axes u, v, w; optical center; optical axis; “Image Plane”; focal length; “Object in the world” at $\mathbf{R}\mathbf{w} + \tau$.

slide-61
SLIDE 61

Estimating Extrinsics

Figure labels: camera axes u, v, w; optical center; optical axis; “Image Plane”; focal length; “Object in the world” at $\mathbf{R}\mathbf{w} + \tau$; image point $\mathbf{x} = [x, y]^T$.
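
The geometry in these figures can be written as a short projection routine: move the world point into the camera frame with the extrinsics, then project through the pinhole. This is a minimal sketch that assumes a single focal length and no principal-point offset; the function name `project` and the example numbers are illustrative.

```python
import numpy as np


def project(w, R, tau, focal=1.0):
    """Image point x = [x, y]^T of world point w = [u, v, w]^T.

    The point is first moved into the camera frame, R @ w + tau, then
    projected through the pinhole with the given focal length.
    """
    u, v, depth = R @ w + tau
    return focal * np.array([u / depth, v / depth])


# Extrinsics: 90-degree rotation about the optical (z) axis, with the world
# origin placed 4 units in front of the camera.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
tau = np.array([0.0, 0.0, 4.0])
w = np.array([1.0, 0.0, 0.0])
x = project(w, R, tau, focal=2.0)
print(x)
```

Estimating extrinsics is the inverse problem: given image points and a known object model, recover the R and τ that explain them.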

slide-63
SLIDE 63
  • A. Chang et al., “ShapeNet: An Information-Rich 3D Model Repository”, arXiv 1512.03012, 2015
slide-64
SLIDE 64

Annotating 3D Images

“Taken from the Beyond PASCAL benchmark website”

slide-66
SLIDE 66

Current State of the Art

Figure labels: image patch 3@(224x224); CAD Selection (VGG-NET); Extrinsics Selection (VGG-NET).

  • H. Su, et al. “Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views”, ICCV 2015.
  • A. Bansal, et al. “Marr Revisited: 2D-3D Alignment via Surface Normal Prediction”, CVPR 2016.

slide-68
SLIDE 68

Compressible CAD Models

slide-70
SLIDE 70

Our Approach

  • 1. Style Embedding Autoencoder

Encode style for aligned models

slide-71
SLIDE 71

Our Approach

  • 2. Image to Pose and Style Regressors

Regress from a rendered image to its style embedding and ground-truth pose.

slide-72
SLIDE 72

Network Architecture

  • 3. Adaptation to Natural Images

Fine-tune the style and pose regressors by minimizing reprojection loss on natural images.

slide-73
SLIDE 73

Results