Dense Object Reconstruction on Mobile
Instructor - Simon Lucey
16-423 - Designing Computer Vision Apps
Reminder - Project Presentation
Each team will be given approximately 2.5 minutes per member to present (for example, a 2-member team will have 5 minutes allotted).
(must be shorter than your allotted time) YouTube clip describing your App in action.
http://goo.gl/forms/YoeQt0c1Hf.
receiving the best project prize.
Image is reduced to a sparse set of keypoints
Usually matched with feature descriptors
Vanishing point
Easier transition from images to geometry
Wide-baseline matching
Mikolajczyk, 2007
Illumination invariance
Mikolajczyk, 2007
Small baseline
ECCV 1999
[Figure panels: Direct Method (ours); Feature-Based Method (ORB+RANSAC)]
the world.
available image data - edges & weak intensities.
mode (bad for efficiency and battery life).
image similarity is:
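$$\mathrm{SSD}(p) = \| I(p) - T(0) \|_2^2$$
“Vector Form”:
$$I(p) = \begin{bmatrix} I(W(x_1; p)) \\ \vdots \\ I(W(x_N; p)) \end{bmatrix} \quad \text{(“Source Image”)}, \qquad T(0) = \begin{bmatrix} T(W(x_1; 0)) \\ \vdots \\ T(W(x_N; 0)) \end{bmatrix} \quad \text{(“Template”)}$$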
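The similarity measure above can be tried on toy data. The following is a minimal sketch, assuming a pure translation warp and nearest-pixel sampling; the helper names `warp_translation`, `sample_nearest` and `ssd`, and the tiny images, are hypothetical and not from the slides.

```python
def warp_translation(x, p):
    """W(x; p) for a pure translation (an assumed warp): x' = x + p."""
    return (x[0] + p[0], x[1] + p[1])


def sample_nearest(image, x):
    """Read image[row][col] at the pixel nearest to x = (col, row)."""
    col, row = int(round(x[0])), int(round(x[1]))
    return image[row][col]


def ssd(source, template, coords, p):
    """SSD(p) = sum_n (I(W(x_n; p)) - T(W(x_n; 0)))^2."""
    total = 0.0
    for x in coords:
        i_val = sample_nearest(source, warp_translation(x, p))
        t_val = sample_nearest(template, warp_translation(x, (0, 0)))
        total += (i_val - t_val) ** 2
    return total


# Toy data: the source is the template shifted one pixel to the right.
template = [[0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11]]
source = [[9, 0, 1, 2],
          [9, 4, 5, 6],
          [9, 8, 9, 10]]
# Only the first three columns, so that x + p stays inside the image.
coords = [(c, r) for r in range(3) for c in range(3)]

print(ssd(source, template, coords, (0, 0)))  # 113.0
print(ssd(source, template, coords, (1, 0)))  # 0.0
```

The SSD drops to zero at p = (1, 0), the shift that was used to build the toy source image.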
estimating warp displacement using the principles of gradients and spatial coherence.
coherent area governed by the warp W(x; p).
T(0): “We consider this image to always be static - referred to as the template.”
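$$\frac{\partial I(p)}{\partial p^T} = \begin{bmatrix} \frac{\partial I(x'_1)}{\partial {x'_1}^T} & \cdots & 0^T \\ \vdots & \ddots & \vdots \\ 0^T & \cdots & \frac{\partial I(x'_N)}{\partial {x'_N}^T} \end{bmatrix} \begin{bmatrix} \frac{\partial W(x_1; p)}{\partial p^T} \\ \vdots \\ \frac{\partial W(x_N; p)}{\partial p^T} \end{bmatrix}, \qquad x' = W(x; p)$$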
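Because the first factor is block diagonal, row n of the Jacobian is just the image gradient at x'_n multiplied by the warp Jacobian at x_n. A minimal sketch of that product, assuming a 6-parameter affine warp and an analytic stand-in for the image so that the gradient is exact; all function names and the parameter ordering are hypothetical.

```python
import math


def image(x, y):
    """Smooth analytic stand-in for the source image I (hypothetical)."""
    return math.sin(0.3 * x) + 0.05 * y * y


def image_gradient(x, y):
    """dI/dx'^T at (x, y), as the 1x2 row [dI/dx, dI/dy]."""
    return [0.3 * math.cos(0.3 * x), 0.1 * y]


def warp_affine(x, y, p):
    """Assumed affine warp W(x; p) with p = [p1, ..., p6]."""
    return ((1 + p[0]) * x + p[2] * y + p[4],
            p[1] * x + (1 + p[3]) * y + p[5])


def warp_jacobian(x, y):
    """dW(x; p)/dp^T, a 2x6 matrix (independent of p for an affine warp)."""
    return [[x, 0.0, y, 0.0, 1.0, 0.0],
            [0.0, x, 0.0, y, 0.0, 1.0]]


def jacobian_row(x, y, p):
    """Row n of dI(p)/dp^T: image gradient at x' = W(x; p) times dW/dp^T."""
    xw, yw = warp_affine(x, y, p)
    g = image_gradient(xw, yw)
    jw = warp_jacobian(x, y)
    return [g[0] * jw[0][k] + g[1] * jw[1][k] for k in range(6)]


p = [0.1, -0.05, 0.02, 0.03, 1.0, -2.0]
coords = [(0.0, 0.0), (2.0, 3.0), (5.0, 1.0)]
# Stack one 1x6 row per template coordinate to get the N x 6 Jacobian.
jacobian = [jacobian_row(x, y, p) for (x, y) in coords]
print(len(jacobian), len(jacobian[0]))  # 3 6
```

On real images the analytic gradient would be replaced by finite-difference or filtered image gradients sampled at the warped coordinates.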
[Figure panels: Direct Method (ours); Feature-Based Method (ORB+RANSAC)]
International Conference on Robotics and Automation (ICRA) 2017.
Our goal is to find the warp parameter vector p.
W(x; p) = warping function such that x′ = W(x; p)
p = parameter vector describing warp
x = coordinate in template, [x, y]^T
x′ = corresponding coordinate in source, [x′, y′]^T
[Figure labels: “Source” (x′), “Template” (x)]
Real camera image is inverted.
Instead, model the impossible but more convenient virtual image.
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
First camera: Second camera: Substituting:
pinhole cameras as a warp function,
[Equation labels: “pinhole projection”, “warp function”, “pose parameters” ∈ SE(3)]
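The idea of treating two pinhole cameras as a warp can be sketched as follows. This is a minimal illustration, not the slide's exact formula: it assumes calibrated (normalized) image coordinates, pose parameters given as a rotation matrix and a translation vector, and an inverse depth per pixel. The names `mat_vec` and `warp_pinhole` are hypothetical.

```python
def mat_vec(r, v):
    """3x3 matrix times 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def warp_pinhole(x, rotation, translation, inv_depth):
    """Map template pixel x to the second view.

    The 3D point is x_h / inv_depth with x_h = [x, y, 1]. After the rigid
    motion it is (R x_h + inv_depth * t) / inv_depth, and the common scale
    cancels in the pinhole projection, so inv_depth = 0 (a point at
    infinity) is handled without a division by zero.
    """
    x_h = [x[0], x[1], 1.0]
    rotated = mat_vec(rotation, x_h)
    moved = [rotated[i] + inv_depth * translation[i] for i in range(3)]
    return (moved[0] / moved[2], moved[1] / moved[2])


identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Sideways camera motion: nearer points (larger inverse depth) shift more.
near = warp_pinhole((0.2, 0.3), identity, [0.1, 0.0, 0.0], 0.5)
far = warp_pinhole((0.2, 0.3), identity, [0.1, 0.0, 0.0], 0.0)
print(near, far)
```

With a pure translation, a point at infinity does not move at all, which is why small-baseline views carry little depth information for distant structure.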
𝐽𝑙−1 𝐽𝑙
“An Invitation to 3D Vision”, Ma,
$$T(x_n) = I_f(W\{x_n; \theta_f, \lambda_n\})$$
[Figure labels: $T$ (“keyframe template”), $I_f$ (“f-th image”), $\lambda_n \tilde{x}_n$]
Baker, Simon, and Iain Matthews. "Equivalence and efficiency of image alignment algorithms." CVPR 2001.
$$T(x_n) = I_f(W\{x_n; \theta_f \circ \Delta\theta_f, \lambda_n\}) \approx I_f(W\{x_n; \theta_f, \lambda_n\}) + A_n^f \Delta\theta_f$$
$$\{\lambda_n\}_{n=1}^{N}$$
$$\arg\min_{\Delta\theta_f} \sum_{n=1}^{N} \left\| T(x_n) - I_f(W\{x_n; \theta_f, \lambda_n\}) - A_n^f \Delta\theta_f \right\|_2^2$$
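The objective above is linear least squares in the pose increment, so one step amounts to solving the normal equations. A minimal sketch under simplifying assumptions: grayscale pixels (each A_n is a single row), residuals r_n = T(x_n) - I_f(W{x_n; θ_f, λ_n}) already computed, and a small hand-written solver. The names `solve_linear` and `gauss_newton_step` are hypothetical.

```python
def solve_linear(h, b):
    """Solve h x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(h)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gauss_newton_step(rows, residuals):
    """Minimise sum_n (r_n - A_n delta)^2 over delta.

    rows[n] is A_n (length P); residuals[n] is r_n.
    Solves (sum A_n^T A_n) delta = sum A_n^T r_n.
    """
    p = len(rows[0])
    h = [[sum(a[i] * a[j] for a in rows) for j in range(p)] for i in range(p)]
    g = [sum(a[i] * r for a, r in zip(rows, residuals)) for i in range(p)]
    return solve_linear(h, g)


# Synthetic check: residuals generated from a known increment.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
true_delta = [0.5, -0.25]
residuals = [sum(a * d for a, d in zip(row, true_delta)) for row in rows]
delta = gauss_newton_step(rows, residuals)
print(delta)
```

In a tracker this step would be repeated: apply the increment to θ_f using whatever update rule the warp parameterization calls for, re-warp, recompute the residuals and A_n^f, and solve again until the increment is small.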
estimating camera pose.
improve the performance of camera tracking (i.e., pose estimation).
pages 834–849. Springer, 2014.
How do we update the depths?
$\{\theta_f\}_{f=1}^{F}$
$\min_{\lambda} C(x_n, \lambda)$, with frames $f = 1, \dots, F$
modules of traditional VSLAM (e.g. PTAM) for dense methods.
[Figure: template T and images up to I_F (f = 1 : F); pixel x; inverse-depth range λ_min to λ_max]
“Sample across inverse depths”
[Plots: C(a, λ), C(b, λ), C(c, λ)]
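Sampling the cost across inverse depths can be sketched in one dimension. This is a toy setup under loud assumptions, not the method from the slides: images are analytic 1D signals, each camera is displaced sideways by a known baseline, and a pixel with inverse depth λ shifts by baseline × λ. The cost is the mean absolute photometric error over the frames. All names are hypothetical.

```python
import math


def template(u):
    """Analytic 1D stand-in for the keyframe template T (hypothetical)."""
    return math.sin(0.7 * u) + 0.1 * u


def make_frame(baseline, true_inv_depth):
    """Frame seen from a camera shifted by `baseline`, for a scene at one
    constant inverse depth: the signal shifts by baseline * inverse depth."""
    return lambda u: template(u - baseline * true_inv_depth)


def photometric_cost(x, inv_depth, frames, baselines):
    """C(x, lambda): mean over frames of |T(x) - I_f(x + b_f * lambda)|."""
    total = 0.0
    for frame, b in zip(frames, baselines):
        total += abs(template(x) - frame(x + b * inv_depth))
    return total / len(frames)


def sample_inverse_depths(lam_min, lam_max, count):
    """Evenly spaced samples between lam_min and lam_max."""
    step = (lam_max - lam_min) / (count - 1)
    return [lam_min + i * step for i in range(count)]


baselines = [1.0, 2.0, 3.0]
frames = [make_frame(b, 0.5) for b in baselines]  # true inverse depth 0.5
samples = sample_inverse_depths(0.0, 1.0, 11)
costs = [photometric_cost(4.0, lam, frames, baselines) for lam in samples]
best = samples[costs.index(min(costs))]
print(best)
```

The minimum of the sampled cost lands on the inverse depth used to generate the frames. On real images the curve is noisier and can have several minima, which is where a smoothness prior over neighbouring pixels helps.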
depths,
$$\arg\min_{\lambda} \sum_{n=1}^{N} C(x_n, \lambda_n) + g(x_n)\,\| \nabla * \lambda_n \|_{\epsilon}$$
$$g(x) = \exp\left( \alpha \| \nabla T(x) \|_2^{\beta} \right)$$
What do you think the prior is doing?
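One way to explore the question is to evaluate the weighted smoothness term on a toy row of pixels. The sketch below makes several assumptions: it follows the DTAM convention that the weight decays with gradient magnitude, so it takes a positive `alpha` and writes the minus sign explicitly; it uses the Huber norm z²/(2ε) for |z| ≤ ε and |z| − ε/2 otherwise; and it replaces the gradient filter with a forward difference. The default values of `alpha`, `beta` and `eps` are hypothetical.

```python
import math


def edge_weight(grad_x, grad_y, alpha=10.0, beta=1.0):
    """g(x) = exp(-alpha * ||grad T(x)||_2^beta), assuming alpha > 0."""
    magnitude = math.hypot(grad_x, grad_y)
    return math.exp(-alpha * magnitude ** beta)


def huber(z, eps=0.1):
    """Huber norm: quadratic near zero, linear for large |z|."""
    a = abs(z)
    return a * a / (2.0 * eps) if a <= eps else a - eps / 2.0


def smoothness_term(inv_depths, template_grads, alpha=10.0, beta=1.0, eps=0.1):
    """Sum of g(x_n) * Huber(lambda_{n+1} - lambda_n) along a 1D row."""
    total = 0.0
    for n in range(len(inv_depths) - 1):
        jump = inv_depths[n + 1] - inv_depths[n]  # forward difference
        gx, gy = template_grads[n]
        total += edge_weight(gx, gy, alpha, beta) * huber(jump, eps)
    return total


inv_depths = [0.2, 0.2, 0.8]                       # one depth discontinuity
flat = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]        # no image edge anywhere
edge = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]        # image edge at the jump

print(smoothness_term(inv_depths, flat))
print(smoothness_term(inv_depths, edge))
```

Under these assumptions the same depth jump is penalized heavily in a textureless region and almost not at all where the template has a strong edge, so the prior smooths depth inside objects while allowing discontinuities at image edges.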
method and therefore requires a state-of-the-art GPU to run in real-time.
that circumvents this limitation.
[Figure: pipeline overview with three components: Tracking, Depth Map Estimation, Map Optimization]
(See Sec. 3.2, 3.5 and 3.6) (See Sec. 3.3) (See Sec. 3.4)
Tracking: New Image (640 x 480 at 30Hz) → Track on Current KF: estimate SE(3) transformation
Depth Map Estimation: Take KF?
no (refine KF): Refine Current KF → small-baseline stereo → probabilistically merge into KF → regularize depth map
yes (replace KF): Create New KF → propagate depth map to new frame → regularize depth map
Map Optimization: Add KF to Map (add to map) → find closest keyframes → estimate Sim(3) edges
Current KF (tracking reference); Current Map
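$$\min_{\xi \in \mathfrak{se}(3)} \sum_{p} \frac{r_p^2(p, \xi)}{\sigma^2_{r_p(p, \xi)}}$$
$$\min_{\xi \in \mathfrak{sim}(3)} \sum_{p} \frac{r_p^2(p, \xi)}{\sigma^2_{r_p(p, \xi)}} + \frac{r_d^2(p, \xi)}{\sigma^2_{r_d(p, \xi)}}$$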
prior to DTAM.
834–849. Springer, 2014.
[Figure: pinhole camera model. Labels: optical center, optical axis, focal length, “Image Plane”, “Object in the world”; axes u, v, w]
w = [u, v, w]^T (“Object in the world”)
x = [x, y]^T (“Image Plane”)
Rw + τ
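The labels above describe the usual pinhole projection: move the world point into the camera frame with Rw + τ, then divide by depth and scale by the focal length. A minimal sketch, assuming the principal point is at the image origin and there is no skew or lens distortion; the function name `project_pinhole` is hypothetical.

```python
def project_pinhole(w, rotation, tau, focal_length):
    """Project world point w = [u, v, w] to image point x = [x, y]."""
    # Rigid transform into the camera frame: R w + tau.
    cam = [sum(rotation[i][k] * w[k] for k in range(3)) + tau[i]
           for i in range(3)]
    # Perspective division by the depth along the optical axis.
    return (focal_length * cam[0] / cam[2],
            focal_length * cam[1] / cam[2])


identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

print(project_pinhole([1.0, 2.0, 4.0], identity, [0.0, 0.0, 0.0], 2.0))  # (0.5, 1.0)
print(project_pinhole([1.0, 2.0, 4.0], identity, [0.0, 0.0, 4.0], 2.0))  # (0.25, 0.5)
```

Pushing the point twice as far along the optical axis halves its image coordinates, which is the depth-dependent scaling that makes reconstruction from multiple views possible.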
“Taken from the Beyond PASCAL benchmark website”
CAD Selection (VGG-NET) Extrinsics Selection (VGG-NET)
image patch 3@ (224x224)
Images Using CNNs Trained with Rendered 3D Model Views”. In ICCV 2015.
via Surface Normal Prediction”. In CVPR 2016.
Encode style for aligned models
Regress from a rendered image to its style embedding and ground-truth pose
Fine-tune style and pose regressors by minimizing reprojection loss on natural images.