From 2D to 3D: Monocular Vision, With Application to Robotics/AR - PowerPoint PPT Presentation



SLIDE 1

From 2D to 3D: Monocular Vision

With application to robotics/AR

SLIDE 2

Motivation

How many sensors do we really need?

SLIDE 3

Motivation

  • What is the limit of what can be inferred from a single embodied (moving) camera frame?

SLIDE 4
SLIDE 5

Aim

  • AR with a hand-held camera
  • Visual tracking provides registration
  • Track without a prior model of the world
  • Challenges:
    • Speed
    • Accuracy
    • Robustness
    • Interaction with the real world
SLIDE 6

Existing attempts: SLAM

  • Simultaneous Localization and Mapping
  • Well-established in robotics (using a rich array of sensors)
  • Demonstrated with a single hand-held camera by Davison 2003

SLIDE 7

Model-based tracking vs SLAM

SLIDE 8

Model-based tracking vs SLAM

  • Model-based tracking is:
    • More robust
    • More accurate
  • Why? Is SLAM fundamentally harder?
SLIDE 9

Pinhole camera model

 (X, Y, Z) ↦ (fX/Z, fY/Z)

In homogeneous coordinates:

 (X, Y, Z, 1)ᵀ ↦ (fX, fY, Z)ᵀ = [f 0 0 0; 0 f 0 0; 0 0 1 0] (X, Y, Z, 1)ᵀ

i.e. x = PX

SLIDE 10

Pinhole camera model

                                  =           + + 1 1 1 1 1 Z Y X p f p f Z Zp Y f Zp X f

y x x x

          = 1

y x

p f p f K

calibration matrix

[ ]

| I K P =

principal point:

) , (

y x p

p

SLIDE 11

Camera rotation and translation

In non-homogeneous coordinates:

 X̃_cam = R (X̃ − C̃)

In homogeneous coordinates:

 X_cam = [R −RC̃; 0 1] X

 x = K [I | 0] X_cam = K [R | −RC̃] X

 P = K [R | t],  t = −RC̃

Note: C is the null space of the camera projection matrix (PC = 0)
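The formulas above can be sketched in a few lines of numpy; the focal length, principal point, rotation angle and camera centre below are all illustrative values, not from the slides:

```python
import numpy as np

# Toy intrinsics: focal length f, principal point (px, py).
f, px, py = 500.0, 320.0, 240.0
K = np.array([[f, 0, px],
              [0, f, py],
              [0, 0, 1.0]])

# Extrinsics: rotation of 30 degrees about the y-axis, camera centre C in world coords.
th = np.deg2rad(30.0)
R = np.array([[np.cos(th), 0, np.sin(th)],
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
C = np.array([0.2, -0.1, -2.0])
t = -R @ C                               # t = -R C~
P = K @ np.hstack([R, t[:, None]])       # P = K [R | t]

def project(P, Xw):
    """Project a 3D world point to pixel coordinates."""
    Xh = np.append(Xw, 1.0)              # homogeneous coordinates
    x = P @ Xh
    return x[:2] / x[2]

# The camera centre is the null space of P: P (C~, 1) = 0.
assert np.allclose(P @ np.append(C, 1.0), 0.0)
```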

SLIDE 12

Triangulation

  • Given projections of a 3D point in two or more images (with known camera matrices), find the coordinates of the point
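One standard way to do this is linear (DLT) triangulation: each image point contributes two rows to a homogeneous system solved by SVD. A minimal numpy sketch, assuming noiseless correspondences and illustrative camera matrices:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    x1, x2 are pixel coordinates; returns the 3D point."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                  # null vector of A (smallest singular value)
    return X[:3] / X[3]

# Synthetic check: two axis-aligned cameras with a small baseline.
K = np.diag([400.0, 400.0, 1.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 5.0])

def proj(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_est = triangulate_dlt(P1, P2, proj(P1, X_true), proj(P2, X_true))
assert np.allclose(X_est, X_true, atol=1e-6)
```

With noisy measurements the linear solution is only an initialization; one would refine it by minimizing re-projection error.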

SLIDE 13
Structure from Motion (SfM)

  • Given: m images of n fixed 3D points

 x_ij = P_i X_j,  i = 1, …, m,  j = 1, …, n

  • Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn correspondences x_ij

SLIDE 14

SfM ambiguity

  • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor 1/k, the projections of the scene points in the image remain exactly the same:

 x = PX = (P/k)(kX)

It is impossible to recover the absolute scale of the scene!
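The ambiguity is easy to demonstrate numerically: scaling all scene points by k while scaling the camera translation by the same k leaves every pixel unchanged, since x ~ K(R(kX) + kt) = k·K(RX + t) ~ K(RX + t). A sketch with illustrative values:

```python
import numpy as np

K = np.array([[500.0, 0, 320.0],
              [0, 500.0, 240.0],
              [0, 0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 2.0])
X = np.array([0.4, -0.3, 3.0])

def pixel(K, R, t, X):
    """Project a non-homogeneous 3D point to pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

k = 7.0
# Scene and translation scaled together: identical projection.
assert np.allclose(pixel(K, R, t, X), pixel(K, R, k * t, k * X))
```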

SLIDE 15
Structure from Motion (SfM)

  • Given: m images of n fixed 3D points

 x_ij = P_i X_j,  i = 1, …, m,  j = 1, …, n

  • Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn correspondences x_ij
  • With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: X → QX, P → PQ⁻¹
  • We can solve for structure and motion when 2mn ≥ 11m + 3n − 15
  • For two cameras, at least 7 points are needed
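The counting argument above checks out numerically: each image point gives 2 equations, each camera has 11 degrees of freedom, each point 3, and 15 are lost to the projective ambiguity Q. A two-line sanity check:

```python
# 2mn >= 11m + 3n - 15: enough correspondences to pin down structure and motion.
def enough_correspondences(m, n):
    return 2 * m * n >= 11 * m + 3 * n - 15

# Two cameras (m = 2): seven points suffice, six do not.
assert enough_correspondences(2, 7)
assert not enough_correspondences(2, 6)
```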

SLIDE 16
Bundle Adjustment

  • Non-linear method for refining structure and motion (Levenberg-Marquardt)
  • Minimizes re-projection error:

 E(P, X) = ∑_{i=1}^{m} ∑_{j=1}^{n} D(x_ij, P_i X_j)²

where D(x_ij, P_i X_j) is the image distance between the observed point x_ij and the re-projected point P_i X_j
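The cost being minimized is easy to write down; a numpy sketch with illustrative synthetic cameras and points (a real bundle adjuster would then minimize this with Levenberg-Marquardt over all poses and points):

```python
import numpy as np

def reprojection_error(Ps, Xs, obs):
    """E(P, X) = sum over observed (i, j) of D(x_ij, P_i X_j)^2.
    obs maps (i, j) -> observed pixel x_ij."""
    E = 0.0
    for (i, j), x_ij in obs.items():
        x = Ps[i] @ np.append(Xs[j], 1.0)
        E += np.sum((x_ij - x[:2] / x[2]) ** 2)
    return E

# Synthetic setup: two cameras, two points, exact observations.
K = np.diag([400.0, 400.0, 1.0])
Ps = [K @ np.hstack([np.eye(3), np.array([[dx], [0.0], [0.0]])])
      for dx in (0.0, -1.0)]
Xs = [np.array([0.2, 0.1, 4.0]), np.array([-0.3, 0.2, 5.0])]
obs = {}
for i, P in enumerate(Ps):
    for j, X in enumerate(Xs):
        x = P @ np.append(X, 1.0)
        obs[(i, j)] = x[:2] / x[2]

# Zero error at the true structure/motion; perturbing a point raises it.
assert np.isclose(reprojection_error(Ps, Xs, obs), 0.0)
Xs_bad = [Xs[0] + 0.05, Xs[1]]
assert reprojection_error(Ps, Xs_bad, obs) > 0.0
```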

SLIDE 17
Self-calibration

  • Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images
  • For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images
  • Compute an initial projective reconstruction and find a 3D projective transformation matrix Q such that all camera matrices are in the form P_i = K [R_i | t_i]
  • Can use constraints on the form of the calibration matrix: e.g. zero skew

SLIDE 18

Why is this cool?

http://www.youtube.com/watch?v=sQegEro5Bfo

SLIDE 19

Why is this still cool?

http://www.youtube.com/watch?v=p16frKJLVi0

SLIDE 20
The SLAM Problem

  • Simultaneous Localization And Mapping
  • A robot is exploring an unknown, static environment
  • Given:
    • The robot's controls
    • Observations of nearby features
  • Estimate:
    • Map of features
    • Path of the robot

SLIDE 21

Structure of the landmark-based SLAM Problem

SLIDE 22

SLAM: a hard problem?

SLAM: the robot path and the map are both unknown. Robot path error correlates errors in the map.

SLIDE 23

SLAM: a hard problem?

Robot pose uncertainty

  • In the real world, the mapping between observations and landmarks is unknown
  • Picking wrong data associations can have catastrophic consequences
  • Pose error correlates data associations
SLIDE 24

SLAM

  • Full SLAM estimates the entire path and map:

 p(x_{1:t}, m | z_{1:t}, u_{1:t})

  • Online SLAM estimates the most recent pose and map:

 p(x_t, m | z_{1:t}, u_{1:t}) = ∫∫ … ∫ p(x_{1:t}, m | z_{1:t}, u_{1:t}) dx_1 dx_2 … dx_{t−1}

  • Integrations typically done one at a time
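The relationship between the two formulations is just marginalization: the online posterior is the full posterior with the past poses integrated out. A toy discrete sketch (the three-value poses and two-value map are purely illustrative):

```python
import numpy as np

# Toy discrete world: poses x1, x2 each take 3 values, map m takes 2 values.
rng = np.random.default_rng(0)
full = rng.random((3, 3, 2))        # joint over (x1, x2, m), unnormalized
full /= full.sum()                  # now a proper distribution p(x1, x2, m | z, u)

# Online posterior p(x2, m | z, u): sum (integrate) out the past pose x1.
online = full.sum(axis=0)
assert online.shape == (3, 2)
assert np.isclose(online.sum(), 1.0)   # still a distribution
```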

SLIDE 25

Graphical Model of Full SLAM

 p(x_{1:t}, m | z_{1:t}, u_{1:t})

SLIDE 26

Graphical Model of Online SLAM

 p(x_t, m | z_{1:t}, u_{1:t}) = ∫∫ … ∫ p(x_{1:t}, m | z_{1:t}, u_{1:t}) dx_1 dx_2 … dx_{t−1}

SLIDE 27

Scan Matching

 x̂_t = argmax_{x_t} { p(z_t | x_t, m̂_{t−1}) · p(x_t | u_{t−1}, x̂_{t−1}) }

where p(z_t | x_t, m̂_{t−1}) scores the current measurement against the map constructed so far, and p(x_t | u_{t−1}, x̂_{t−1}) is the robot motion model.

  • Maximize the likelihood of the t-th pose and map relative to the (t−1)-th pose and map
  • Calculate the map according to "mapping with known poses", based on the poses and observations
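A toy 1-D stand-in for the argmax above: score a grid of candidate poses by the product of a measurement term and a motion term (here both are Gaussians with made-up means and widths; a real scan matcher scores laser scans against an occupancy map):

```python
import numpy as np

def gauss(x, mu, sigma):
    """Unnormalized Gaussian score."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

candidates = np.linspace(0.0, 10.0, 1001)   # candidate poses x_t
z_pose = 6.3      # pose best explaining the current scan against the map
odom_pose = 6.0   # pose predicted by the motion model

# x_hat_t = argmax p(z_t | x_t, m) * p(x_t | u, x_{t-1})
score = gauss(candidates, z_pose, 0.5) * gauss(candidates, odom_pose, 1.0)
x_hat = candidates[np.argmax(score)]

# The estimate lands between the two, pulled toward the sharper (measurement) term.
assert 6.0 < x_hat < 6.3
```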
SLIDE 28

SLAM approach

SLIDE 29

PTAM approach

SLIDE 30

Tracking & Mapping threads

SLIDE 31

Mapping thread

SLIDE 32

Stereo Initialization

  • Five-point pose algorithm (Stewénius et al. '06)
  • Requires a pair of frames and feature correspondences
  • Provides an initial (sparse) 3D point cloud
SLIDE 33

Wait for new keyframe

  • Keyframes are only added if:
    • There is a baseline to the other keyframes
    • Tracking quality is good
  • When a keyframe is added:
    • The mapping thread stops whatever it is doing
    • All points in the map are measured in the keyframe
    • New map points are found and added to the map
SLIDE 34

Add new map points

  • Want as many map points as possible
  • Check all maximal FAST corners in the keyframe:
    • Check Shi-Tomasi score
    • Check if already in map
  • Epipolar search in a neighboring keyframe
  • Triangulate matches and add to map
  • Repeat at four image pyramid levels
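The Shi-Tomasi score mentioned above is the smaller eigenvalue of the gradient structure tensor over a patch; a numpy sketch (the 8x8 test patches are illustrative):

```python
import numpy as np

def shi_tomasi_score(patch):
    """Minimum eigenvalue of the gradient structure tensor over a patch.
    A high score means well-localized 2-D texture, good for tracking."""
    gy, gx = np.gradient(patch.astype(float))
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    return np.linalg.eigvalsh(A)[0]      # eigenvalues in ascending order

flat = np.full((8, 8), 10.0)                          # textureless patch
checker = np.indices((8, 8)).sum(axis=0) % 2 * 10.0   # strong 2-D texture
assert np.isclose(shi_tomasi_score(flat), 0.0)
assert shi_tomasi_score(checker) > shi_tomasi_score(flat)
```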
SLIDE 35

Optimize map

  • Use batch SfM method: Bundle Adjustment
  • Adjusts map point positions and keyframe poses
  • Minimizes re-projection error of all points in all keyframes (or use only the last N keyframes)
  • Cubic complexity in keyframes, linear in map points
  • Compatible with M-estimators (we use Tukey)
SLIDE 36

Map maintenance

  • When the camera is not exploring, the mapping thread has idle time – use this to improve the map
  • Data association in bundle adjustment is reversible
  • Re-attempt outlier measurements
  • Try to measure new map features in all old keyframes

SLIDE 37

Tracking thread

SLIDE 38

Pre-process frame

  • Make mono and RGB versions of the image
  • Make 4 pyramid levels
  • Detect FAST corners
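A pyramid of the kind listed above is just repeated half-resolution downsampling; a minimal numpy sketch using 2x2 block averaging (the exact filtering used by PTAM may differ):

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Half-resolution image pyramid via 2x2 block averaging."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2   # crop to even size
        a = a[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyr.append(a)
    return pyr

pyr = build_pyramid(np.zeros((480, 640)))
assert [p.shape for p in pyr] == [(480, 640), (240, 320), (120, 160), (60, 80)]
```

Corners detected at the coarser levels correspond to larger image structures, which is what makes the coarse-to-fine tracking stages possible.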
SLIDE 39

Project Points

  • Use motion model to update camera pose
  • Project all map points into the image to see which are visible, and at what pyramid level
  • Choose a subset to measure:
    • ~50 biggest features for the coarse stage
    • 1000 randomly selected for the fine stage
SLIDE 40

Measure Points

  • Generate an 8x8 matching template (warped from the source keyframe)
  • Search a fixed radius around the projected position
  • Use zero-mean SSD
  • Only search at FAST corner points
  • Up to 10 inverse-composition iterations for sub-pixel position (for some patches)
  • Typically find 60-70% of patches
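Zero-mean SSD subtracts each patch's own mean before comparing, making the score insensitive to uniform brightness changes between keyframe and live frame. A numpy sketch (patch contents are random illustrative data):

```python
import numpy as np

def zmssd(template, patch):
    """Zero-mean sum of squared differences between two equal-sized patches."""
    t = template - template.mean()
    p = patch - patch.mean()
    return np.sum((t - p) ** 2)

rng = np.random.default_rng(1)
tpl = rng.random((8, 8))

# A uniformly brightness-shifted copy of the template scores zero; noise does not.
assert np.isclose(zmssd(tpl, tpl + 25.0), 0.0)
assert zmssd(tpl, rng.random((8, 8))) > 0.0
```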
SLIDE 41

Update camera pose

  • 6-DOF problem
  • 10 iterations
  • Tukey M-estimator to minimize a robust objective function of the re-projection error, where e_j is the re-projection error vector
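The Tukey biweight down-weights large residuals and rejects gross outliers entirely, which is what makes the pose update robust to mismatched patches. A sketch of the standard weight function (the threshold c = 4.685 is the usual tuning constant; the residual values are illustrative):

```python
import numpy as np

def tukey_weight(e, c=4.685):
    """Tukey biweight: w(e) = (1 - (e/c)^2)^2 for |e| < c, else 0.
    Residuals beyond c contribute nothing to the pose update."""
    w = (1.0 - (e / c) ** 2) ** 2
    return np.where(np.abs(e) < c, w, 0.0)

e = np.array([0.0, 1.0, 4.0, 10.0])   # re-projection error magnitudes (pixels)
w = tukey_weight(e)
assert w[0] == 1.0                    # inliers keep full weight
assert w[3] == 0.0                    # gross outliers are rejected outright
assert np.all(np.diff(w) <= 0)        # weight decreases with error
```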

SLIDE 42

Bundle-adjustment

  • Global bundle-adjustment
  • Local bundle-adjustment:
    • X - the newest 5 keyframes in the keyframe chain
    • Z - all of the map points visible in any of these keyframes
    • Y - keyframes in which a measurement of any point in Z has been made
  • That is, local bundle adjustment optimizes the pose of the most recent keyframe and its closest neighbors, and all of the map points seen by these, using all of the measurements ever made of these points.

SLIDE 43

Video

http://www.youtube.com/watch?v=Y9HMn6bd-v8 http://www.youtube.com/watch?v=pBI5HwitBX4

SLIDE 44

Capabilities

SLIDE 45

Capabilities

Multi-scale compactly supported basis functions; bundle-adjusted point cloud with PTAM

SLIDE 46

Video

http://www.youtube.com/watch?v=CZiSK7OMANw

SLIDE 47

RGB-D Sensor

  • Principle: structured light
  • IR projector + IR camera
  • RGB camera
  • Dense depth images
SLIDE 48

Kinect-based mapping

SLIDE 49

System Overview

  • Frame-to-frame alignment
  • Global optimization (SBA for loop closure)
SLIDE 50

Feature matching

SLIDE 51

RANSAC

  • Feature correspondences are established; outliers are robustly removed
  • The homography (transformation) between the two keyframes can now be estimated
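The RANSAC loop itself is model-agnostic: sample a minimal set of correspondences, fit a model, count inliers, keep the best. A sketch for the simplest possible alignment model, a 2-D translation (the correspondences and outlier fraction are synthetic illustrations; estimating a homography works the same way with four-point samples):

```python
import numpy as np

def ransac_translation(src, dst, iters=200, thresh=0.1, seed=0):
    """RANSAC for a 2-D translation: one correspondence per hypothesis."""
    rng = np.random.default_rng(seed)
    best_t, best_inliers = None, -1
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                          # model from minimal sample
        resid = np.linalg.norm(src + t - dst, axis=1)
        inliers = int(np.sum(resid < thresh))
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

# 90 correct correspondences offset by (2, -1), plus 10 gross outliers.
rng = np.random.default_rng(42)
src = rng.random((100, 2))
dst = src + np.array([2.0, -1.0])
dst[:10] += rng.random((10, 2)) * 5 + 1.0            # corrupt the first ten
t, n = ransac_translation(src, dst)
assert np.allclose(t, [2.0, -1.0])
assert n == 90
```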

SLIDE 52

Global Optimization (RGBD-ICP)

SLIDE 53

Benefits

  • Visual and depth information used jointly for a real-time mapping application
  • Reconstruct a dense map of the environment
  • Avoid dense stereo for every pair of keyframes
  • Optimize over a sparse set of feature points
  • Results in dramatic speed improvements
  • Allows for computing other valuable algorithms simultaneously (e.g. navigation, obstacle avoidance, scene understanding)

SLIDE 54

Video

http://www.cs.washington.edu/ai/Mobile_Robotics/projects/rgbd-3d-mapping/

SLIDE 55

Kinect + Real-time reconstruction

SLIDE 56

Video

http://research.microsoft.com/apps/video/dl.aspx?id=152815

SLIDE 57

Conclusion

  • So much information available from a single camera
  • Yet to truly understand what we can infer from a single camera
  • Several exciting technologies in the recent past
  • Software problem, not a hardware limitation
  • Monocular vision can be sufficient for a lot of use cases

SLIDE 58

Thanks!