SLIDE 1

Stereo Vision : 1 ViiHM Mini-Workshop 2015

A Whirlwind Tour

of where we are in

Computational Binocular Stereo Vision

Toby Breckon School of Engineering and Computing Sciences Durham University

Slides: www.durham.ac.uk/toby.breckon/teaching/tutorials/vihm_wks_2015_breckon.pdf

Slide material acknowledgements (some portions): R. Szeliski (Microsoft/Washington), B. Fisher (Edinburgh), O. Hamilton (Cranfield/Durham), J. Xiao, N. Snavely, J. Hays, S. Prince

a beginner's tutorial for the uninitiated

SLIDE 2

Setting the Scene ...

SLIDE 3

the core problem: stereo vision

SLIDE 4

the core problem: stereo vision

  • Binocular Stereo Vision

(i.e. only 2 cameras)

3D scene information implicitly encoded in image differences

Representation: RGB intensity images (noisy)

SLIDE 5

Left

SLIDE 6

Right

SLIDE 7

Stereo Vision – the key principle

image features (e.g. point / line / pixel) will project differently in the left and right images depending on their distance from the camera (or eyes in human vision).

This difference in image position is known as disparity, d = |PL - PR|.

SLIDE 8

Stereo Vision – principle

  • Matching every feature between the left and right images results in a 2D ‘disparity map’ or ‘depth map’ (computed as disparity, d, at every feature position)
  • Real-world 3D information (distances to scene objects) can be recovered from this depth map

SLIDE 9

Concept : depth recovery

Depth of scene object indicated by greyscale value

http://vision.middlebury.edu/stereo/

SLIDE 10

But why is this computationally challenging ?

SLIDE 11

Left

SLIDE 12

Right

SLIDE 13

In reality, images are noisy due to {encoding, sampling, illumination, camera alignment, camera variations, temperature} – thus features appear differently in each image, and simple image matching (most) often fails

SLIDE 14

this is what makes stereo vision challenging

SLIDE 15

Today, almost all computational stereo research addresses the matching problem

[to some degree, at some level]

SLIDE 16

Disparity Vs. Depth

  • Computer Vision people often refer to disparity estimation
    – disparity is a 2D measure of feature displacement between the images (measured in pixels)
  • Biological Vision people often refer to depth perception
    – depth is an axis of positional measurement of distance within the scene (measured in metres / mm / cm)

[figure: relative scene depth, Z, and scene depth ordering by disparity, d]

SLIDE 17

… essentially the same thing

Depth of a scene object, Z, observed to have disparity difference, d, between two stereo images separated by a baseline distance, B, with camera lenses of focal length, f: Z = fB/d.

.... if you have one you can calculate the other
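Plugging numbers in (an illustrative sketch – the focal length, baseline and disparity values below are made up, not from any particular rig):

```python
# Depth from disparity for a rectified stereo rig: Z = f * B / d
# (and conversely d = f * B / Z). All values below are hypothetical.
f = 700.0    # focal length, in pixels
B = 0.12     # baseline, in metres
d = 42.0     # measured disparity, in pixels

Z = f * B / d
print(Z)         # 2.0 (metres)

# ... and back again: given the depth, recover the disparity
d_back = f * B / Z
print(d_back)    # 42.0
```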

SLIDE 18

Stereo : Standard Formulation

  • left / right views at known (calibrated) distance apart (baseline, B)

[figure: camera 1 (left eye) and camera 2 (right eye), separated by baseline B]

SLIDE 19

[figure: point P at depth Z projected through lenses of focal length f onto the left and right image planes (as PL and PR), cameras separated by baseline B]

Point P (in the world) is projected into the left image plane (as PL) and the right image plane (as PR)

P = (X,Y,Z) (in the world) PL=(xL,yL) (in left image) PR=(xR,yR) (in right image)

Stereo Vision – disparity to depth

SLIDE 20

[figure: re-projection of PL into the right image plane, giving disparity d for point P at depth Z]

The re-projection of PL from the left image plane into the right image plane allows us to recover disparity as a pixel distance within the image.

disparity, d =|PL-PR|

P = (X,Y,Z) (in the world) PL=(xL,yL) (in left image) PR=(xR,yR) (in right image)

Stereo Vision – disparity to depth

SLIDE 21

Stereo Vision – disparity to depth

[figure: perspective projection of scene point (X, Y, Z) at depth Z to image position (x, y) via focal length f]

  • Images captured under Perspective Transform

(X,Y,Z) in scene (depth Z)

imaged at position (x,y) on the image plane

determined by the focal length of the camera f (lens to image plane distance)

image inverted during capture (fixed inside camera)

  • Thus in stereo to recover 3D position of P = (X, Y, Z):

depth of a feature, Z, with disparity, d, over a stereo baseline, B:
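Written out, the standard pinhole recovery for these quantities (using the symbols defined above) is:

```latex
Z = \frac{fB}{d}, \qquad X = \frac{x\,Z}{f}, \qquad Y = \frac{y\,Z}{f}
```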

SLIDE 22

Computational Stereo – An Outline

[How do we solve the matching problem ?]

SLIDE 23

Stereo Vision - Overview

Image Capture → Feature Extraction → Feature Matching → Triangulation

  • Image capture – stereo camera setup: two cameras, relative positions known (calibration)
  • Feature extraction – what can we see in each image?
  • Feature matching – can we match features between images?
  • Triangulation – depth recovery from matched features

[figure: 2 stereo cameras viewing calibration target [Lukins '05]]

SLIDE 24

Sparse Image Features

  • State of the Art : feature points

– high dimensional local feature descriptions

(e.g. 128D+)

– considerable research effort

Initial work – [Harris, 1988]; then intensive – [period: 2004 → 2010+]

– robust matching performance

beyond the stereo case

  • considerably beyond (!)
  • strongly invariant (via RANSAC)

– Feature points in a nutshell:

  • pixels described by local gradient histograms
  • normalized for maximal invariance
  • discard pixel regions that are not locally unique

[ SIFT – Lowe, 2004 / SURF – Bay et al., 2006]
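As a toy illustration of the “locally unique” idea, here is a minimal Harris-style corner response in NumPy (a sketch of the classic detector only, not Lowe's or Bay's descriptor pipelines):

```python
import numpy as np

def box_sum(a, r):
    """Sum of array a over a (2r+1)x(2r+1) window at each pixel (zero-padded)."""
    p = np.pad(a, r)
    out = np.zeros_like(a, dtype=float)
    h, w = a.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += p[r + dy : r + dy + h, r + dx : r + dx + w]
    return out

def harris_response(img, k=0.04, r=2):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor of local image gradients."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = box_sum(Ix * Ix, r)
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

# Synthetic image: a bright square gives corners, edges and flat regions.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
# R is large only where the patch is locally unique (the square's corners);
# edges score negatively, flat regions near zero.
```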

SLIDE 25

Harris Feature Points – example - [Fisher / Breckon et al., 2014]

Sparse Image Features

SLIDE 26

Sparse Image Features

  • Under-pins ….
    – 3D reconstruction from tourist photos: http://www.cs.cornell.edu/projects/p2f/
    – object instance detection – [SURF, SIFT et al.]
    – deformable object matching – http://www.cvc.uab.es/~jcrubio/
    – … + object recognition and a whole lot more.

Real-time image mosaicking [Breckon et al., 2010]

SLIDE 27

Readily gives us feature-based stereo (i.e. sparse depth)

e.g. match local unique “corner” feature points (obtain disparity/depth at these points), then interpolate a complete 3D depth solution / object positions etc.

SLIDE 28

Example: sparse stereo for HCI

[Features = red/green blobs]

[source: anon]

SLIDE 29

Example: sparse stereo for stereo odometry

https://www.youtube.com/watch?v=lTQGTbrNssQ [Features = feature points]

SLIDE 30

Reality … nobody really uses sparse stereo any more

[apart from bespoke applications like those just illustrated]

SLIDE 31

.. the world went dense.

SLIDE 32

Dense Stereo Vision

  • Concept: compute depth for each and every scene pixel
SLIDE 33

Key challenge: any pixel in the left could now potentially match to any pixel in the right – this is a lot of matches to evaluate! → a large search space of matches is computationally expensive (and prone to mis-matching errors)

SLIDE 34

Stereo Correspondence Problem

Q: For a given feature in the left, what is the correct correspondence?

  • Different pairings result in different 3D results

inconsistent correspondence = inconsistent 3D (!)

Key problem in all stereo vision approaches

SLIDE 35

In computational stereo vision this is addressed via three aspects:

1. Camera calibration – leading to epipolar geometry
2. Match aggregation – matching regions, not pixels
3. Match optimization – compute many possible matches, then select the best subset that are maximally inter-consistent

SLIDE 36

Epipolar Geometry – reduces matching space

  • Feature pl in the left image lies on a ray r in space

– r projects to an epipolar line e in the right image – along which the matching feature pr must lie

  • If the images are “rectified”, then the epipolar line is the image row – i.e. camera images both perfectly axis aligned

SLIDE 37

Epipolar Geometry – reduces matching space

  • Constrains L → R Correspondence

– reduces 2D search to 1D
– images linked by the fundamental matrix, F
– for matched points, plᵀ F pr = 0
– F generally derived from a prior calibration routine (with a pre-known target)
– points are homogeneous; F is 3x3

  • Match for point pl on ray r (left) must lie on epipolar line e (right).

[figure: left and right image planes with epipolar line e]
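For a rectified rig translated purely along the x-axis, F reduces to the skew-symmetric form [t]×, and the constraint simply says that matched points share an image row. A minimal NumPy sanity check (the F below is this idealized rectified-case matrix, not one recovered from a real calibration):

```python
import numpy as np

# Idealised fundamental matrix for a pure horizontal-baseline (rectified)
# stereo rig: F = [t]_x with translation direction t = (1, 0, 0).
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

pl = np.array([120.0, 80.0, 1.0])        # homogeneous point in the left image
pr_good = np.array([100.0, 80.0, 1.0])   # same row: a valid match candidate
pr_bad = np.array([100.0, 95.0, 1.0])    # different row: violates the constraint

line = F @ pl          # epipolar line (a, b, c) in the right image
print(pr_good @ line)  # 0.0   -> lies on the epipolar line
print(pr_bad @ line)   # -15.0 -> off the line, rejected as a match
```

Note the epipolar line comes out as (0, -1, 80), which is exactly the image row y = 80, as the rectified-case geometry promises.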

SLIDE 38

Example: rectified images

“rectified” images = the epipolar line is the image row

  • rectification is performed via calibration – thus stereo is reduced to a 1D “scan-line matching” problem

[figure: original vs. rectified image pair]

SLIDE 39

… which aids Dense Stereo Matching

  • We rectify (transform) the images so that these lines correspond to the image rows (or horizontal scan-lines).

[figure: before and after rectification]

SLIDE 40

Example: early dense stereo

…. suffered from vertical disparity streaking due to purely horizontal scan-line (mis-) matching

[ S. Birchfield and C. Tomasi, Depth Discontinuities by Pixel-to-Pixel Stereo, International Journal of Computer Vision, 35(3): 269-293, December 1999 ]

SLIDE 41

Epipolar Geometry is enabled by Stereo Calibration

  • Estimate fundamental matrix F
    – use an object of known (fixed) geometry
    – extract correspondences (pl, pr) between object features over N stereo pairs
    – use an optimization routine to recover F such that the squared error e = Σ (plᵀ F pr)² is minimized
  • Example: use a chess-board
    – easy to detect, known dimensions
    – easy to assign correspondences
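One classical instance of the “optimization routine” step is the linear eight-point algorithm: stack one linear constraint per correspondence and take the least-squares null vector. A sketch on synthetic rectified correspondences (the point data is made up; a real calibration would use detected chess-board corners):

```python
import numpy as np

def estimate_F(pl, pr):
    """Linear eight-point estimate of the fundamental matrix from N >= 8
    correspondences (rows of pl, pr are (x, y) pixel positions).
    Minimises the squared error sum (pr^T F pl)^2 subject to ||F|| = 1."""
    A = np.array([[xr * xl, xr * yl, xr, yr * xl, yr * yl, yr, xl, yl, 1.0]
                  for (xl, yl), (xr, yr) in zip(pl, pr)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)   # least-squares null vector of the constraints
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                 # enforce rank 2: a valid F is singular
    return U @ np.diag(s) @ Vt

# Synthetic rectified correspondences: matches differ only by a random
# horizontal disparity, so each pair shares its row (y) coordinate.
rng = np.random.default_rng(2)
pl = rng.uniform(0.0, 100.0, (12, 2))
pr = pl.copy()
pr[:, 0] -= rng.uniform(1.0, 10.0, 12)   # horizontal disparity only

F = estimate_F(pl, pr)
residuals = [np.r_[q, 1.0] @ F @ np.r_[p, 1.0] for p, q in zip(pl, pr)]
# all residuals ~0: every input match satisfies the epipolar constraint
```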

SLIDE 42

Today, stereo camera calibration is considered a solved problem

[almost everyone uses Z. Zhang, “A flexible new technique for camera calibration”, IEEE Trans. Pattern Anal. and Mach. Intell., 22(11):1330-1334, 2000]
SLIDE 43

From Sparse to Dense 3D ...

  • Knowledge of the fundamental matrix, F, allows us to recover depth at each and every pixel between a given image pair.

Disparity map D(x,y)

SLIDE 44

Dense Stereo Matching

  • We can then scan-line match each and every pixel ….

SLIDE 45

Using, for example, simple pixel matching

  • Find matches based on a local pixel neighbourhood “match score”
    – e.g. ZNCC (Zero-mean Normalized Cross-Correlation) [one example of many approaches]

$$\mathrm{ZNCC}(x_1, x_2) = \frac{\sum_i \left( I(x_1 + i) - \bar{I}(x_1) \right) \left( I(x_2 + i) - \bar{I}(x_2) \right)}{\sqrt{\sum_i \left( I(x_1 + i) - \bar{I}(x_1) \right)^2} \, \sqrt{\sum_i \left( I(x_2 + i) - \bar{I}(x_2) \right)^2}}$$

where i ranges over the neighbourhood window and Ī(x) is the window mean around x.
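The same ZNCC score in NumPy (a direct sketch of the formula above; the window radius r is a free parameter):

```python
import numpy as np

def zncc(left, right, x1, x2, y, r=3):
    """Zero-mean Normalized Cross-Correlation between the (2r+1)^2 patch
    centred at (x1, y) in the left image and (x2, y) in the right."""
    a = left[y - r : y + r + 1, x1 - r : x1 + r + 1].astype(float)
    b = right[y - r : y + r + 1, x2 - r : x2 + r + 1].astype(float)
    a = a - a.mean()   # subtract the window means ...
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

# Zero-mean normalization makes the score invariant to gain and offset,
# so a brightness/contrast-shifted copy of a patch still scores 1.0.
rng = np.random.default_rng(0)
left = rng.random((32, 32))
right = 0.5 * left + 0.2       # simulated exposure difference between cameras
print(zncc(left, right, 16, 16, 16))   # ~1.0  (correct correspondence)
```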

SLIDE 46

Correspondence search via minimal matching cost

  • Slide a neighbourhood window along the right scanline and compare its contents with the reference window in the left image
  • Matching cost: SSD, ZNCC, normalized correlation ...

[figure: matching cost vs. disparity along the left/right scanlines]
SLIDE 47

To overcome “streaking” – aggregate matching costs over 2D “support region”

  • Use window, vertical/horizontal cross, or adaptive (similarity/distance) support regions
  • Matching cost (or energy) at (x, y) for disparity d is aggregated over the support region – e.g. match pixel blocks: “semi-global” or “global” block matching (Hirschmuller, 2005)

[figure: 2D window or block; 2D cross]

SLIDE 48

Match aggregation: effects of window size

  • Smaller aggregation window
    + more detail
    − more noise
  • Larger aggregation window
    + smoother disparity maps (hence smoother 3D depth, Z)
    − less detail

[figure: disparity maps with increasing match aggregation region]

SLIDE 49

Correspondence search via minimal matching cost

For every pixel in the left image, find the corresponding pixel in the right image

– matching based on neighbourhood aggregation regions (blocks)
– an optimization problem – find the best set of global matches

SLIDE 50

Match Optimization

  • Disparity space: a “cost space” formulation relating each left pixel to each possible right pixel for each scanline
  • Disparity space image: a 3D volume of the matching cost for each disparity offset at each pixel, m(x, y, d) = C(x, y, d)

[figure: disparity volume slice – matching cost between left and right scan-lines (d = ±10)]
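A minimal disparity-space construction with winner-takes-all selection, on a synthetic constant-disparity pair (per-pixel absolute difference rather than an aggregated block cost, to keep the sketch short):

```python
import numpy as np

def wta_disparity(left, right, max_d):
    """Disparity-space image + winner-takes-all: build the cost volume
    C(y, x, d) = |left(y, x) - right(y, x - d)| and minimise over d."""
    h, w = left.shape
    C = np.full((h, w, max_d + 1), np.inf)
    for d in range(max_d + 1):
        C[:, d:, d] = np.abs(left[:, d:] - right[:, : w - d])
    return C.argmin(axis=2)   # minimal matching cost wins at every pixel

# Synthetic rectified pair: the right view is the left shifted by 3 pixels,
# i.e. a fronto-parallel scene at constant true disparity d = 3.
rng = np.random.default_rng(1)
left = rng.random((16, 32))
right = np.roll(left, -3, axis=1)
disp = wta_disparity(left, right, max_d=8)
# away from the left border (where d = 3 is out of range) the recovered
# disparity is exactly 3 everywhere
```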

SLIDE 51

Match Optimization …..

… over the disparity space image to arrive at a globally consistent solution i.e. minimal matching cost (= minimal cost path within disparity space)

[figure: disparity-space path diagram – left/right scanlines s and t, correspondence cost C_corr and left/right occlusion costs C_occl along the minimal-cost path from p to q]

SLIDE 52

Match Optimization - example

[figure: winner-takes-all (best match window block) result vs. ground truth (true disparity) vs. scene]

SLIDE 53

Match Optimization - example

[figure: dynamic programming result vs. ground truth (true disparity) vs. scene]

SLIDE 54

Match Optimization - example

[figure: graph cuts result vs. ground truth (true disparity) vs. scene]

  • Y. Boykov, O. Veksler, and R. Zabih, “Fast Approximate Energy Minimization via Graph Cuts”, PAMI 2001
  • Graph cuts still considered optimal; features in many current approaches

SLIDE 55

Match Optimization today

Dynamic Programming, Graph Cuts, Belief Propagation, ….

SLIDE 56

Today's recipe for dense stereo

  • 1. construct your disparity space image (see earlier)
  • 2. pick your favourite computational optimizer

(any textbook on computational optimization)

  • 3. Compute dense stereo

[and hope to beat graph cuts!]

SLIDE 57

Common Stereo Pipeline

Compute Aggregated Matching Cost → Construct Disparity Space Image → Perform Match Optimization

Numerous variations possible at each stage ...

SLIDE 58

And so an “obsession” with accuracy grew ...

SLIDE 59

Middlebury Stereo Comparison

  • First de facto standard test set
    – statistical comparison
    – indoor lab conditions
    – on-line “league table” (accuracy only, not speed)
  • Result: “industrial-scale” production of variations around this common pipeline → improved performance “in the lab” [2002 - present]

http://vision.middlebury.edu/stereo/

SLIDE 60

KITTI Stereo Comparison

  • Contemporary application-driven test set
    – speed and accuracy
    – outdoor road conditions
    – pushes the real-time agenda of stereo for autonomous systems

[2012 - present]

http://www.cvlibs.net/datasets/kitti/

SLIDE 61

… just variations (on parts of) a common pipeline

Compute Aggregated Matching Cost → Construct Disparity Space Image → Perform Match Optimization

… numerous variations at each stage.

SLIDE 62

Today, … in research terms dense stereo is deemed: “a road well travelled”

SLIDE 63

Speed Vs. Accuracy

  • Semi-Global Block Matching
    – approximate global disparity optimization via a series of 1D cost paths
    – arguably the first real insight into usable real-time dense stereo [Hirschmuller, 2008]
  • Basis for many contemporary applications; still frequently outperforms rivals in terms of some/all of {speed, depth accuracy, range accuracy}
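The 1D cost-path idea can be sketched in a few lines: aggregate costs along a scanline with a small penalty P1 for ±1 disparity changes and a larger penalty P2 for bigger jumps, sum paths from both directions, then take winner-takes-all. A toy single-scanline sketch, not Hirschmuller's full 8/16-path implementation; the P1, P2 values here are arbitrary:

```python
import numpy as np

def aggregate_dir(C, P1=0.1, P2=1.0):
    """One 1D cost path (left-to-right) along a scanline.
    C has shape (width, ndisp): raw matching cost per pixel and disparity."""
    w, nd = C.shape
    L = np.zeros((w, nd))
    L[0] = C[0]
    for x in range(1, w):
        prev = L[x - 1]
        up = np.concatenate(([np.inf], prev[:-1]))  # came from disparity d-1
        dn = np.concatenate((prev[1:], [np.inf]))   # came from disparity d+1
        best = prev.min()
        L[x] = C[x] + np.minimum.reduce(
            [prev, up + P1, dn + P1, np.full(nd, best + P2)]) - best
    return L

def sgm_scanline(C, P1=0.1, P2=1.0):
    """Sum forward and backward paths (subtracting C once so the data term
    is not double-counted), then winner-takes-all over disparities."""
    S = aggregate_dir(C, P1, P2) + aggregate_dir(C[::-1], P1, P2)[::-1] - C
    return S.argmin(axis=1)

# Scanline whose true disparity is 2 everywhere; the matching cost at
# pixel 4 is corrupted and votes for disparity 0 instead.
C = np.ones((8, 5))
C[:, 2] = 0.0
C[4] = [0.0, 1.0, 1.0, 1.0, 1.0]
print(C.argmin(axis=1))   # raw WTA:       [2 2 2 2 0 2 2 2] -- a spike
print(sgm_scanline(C))    # path-smoothed: [2 2 2 2 2 2 2 2]
```

The smoothness penalties make it cheaper to stay at disparity 2 through the noisy pixel than to jump to 0 and back, which is exactly how the 1D paths suppress scan-line streaking.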

SLIDE 64

Today, where we are now ….

SLIDE 65

Readily available open-source: OpenCV 2.3+

In (every) computer vision lab …

SLIDE 66

In the real world ….

For example, last week in Durham we did this ….

[Hamilton / Breckon, 2015 – Durham]

SLIDE 67

Off-line Dense Stereo Capture

Concept: multi-scale based stereo matching

SLIDE 68

Off-line Dense Stereo Capture

SLIDE 69

Today, a key research driver is autonomy …. {of future road vehicles, robots, boats, air-vehicles … }

SLIDE 70

Research / Example: automotive stereo for future “driverless cars”

[Hamilton / Breckon, 2013]

SLIDE 71

Research / Example: automotive stereo for future “driverless cars”

[Hamilton / Breckon, 2013]

SLIDE 72

Speed Vs. Accuracy

Comparing different match costs, aggregation and optimizations - [Mroz / Breckon, 2012] http://breckon.eu/toby/demos/autostereo/

SLIDE 73

Today, future challenges and research directions ...

SLIDE 74

Multi-modal / Cross-spectral Stereo

From: [Pingerra / Breckon, 2012]

SLIDE 75

Range Accuracy

From: [Hamilton / Breckon, 2013]

SLIDE 76

Stereo Odometry and Mapping

https://www.youtube.com/watch?v=EPTJz7w_AqU

From: [Hamilton / Breckon, 2013] [Geiger et al. 2010]

SLIDE 77

All condition, all weather

From: [Webster / Breckon, 2015]

SLIDE 78

Future stereo vision will most likely advance along these axes of performance

SLIDE 79

Computer Vision – general overview

  • Computer Vision: Algorithms and Applications
    – Richard Szeliski, 2010 (Springer)
  • PDF download: http://szeliski.org/Book/
  • Supporting stereo vision: see Chapter 11
SLIDE 80

Key Publications – a selection

  • D. Scharstein and R. Szeliski, “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms”, Int'l J. Computer Vision, vol. 47, nos. 1/2/3, pp. 7-42, Apr.-June 2002.
  • V. Kolmogorov and R. Zabih, “Computing visual correspondence with occlusions using graph cuts”, Proc. Eighth IEEE Int. Conf. on Computer Vision (ICCV), vol. 2, 2001.
  • H. Hirschmuller, “Stereo processing by semiglobal matching and mutual information”, IEEE Trans. Pattern Analysis and Machine Intelligence, 30(2), pp. 328-341, 2008.
  • A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite”, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2012.
  • A. Geiger, M. Roser, and R. Urtasun, “Efficient large-scale stereo matching”, Computer Vision – ACCV 2010, pp. 25-38, Springer. (ELAS)

SLIDE 81

Final shameless plug ….

  • Dictionary of Computer Vision and Image Processing
    – R.B. Fisher, T.P. Breckon, K. Dawson-Howe, A. Fitzgibbon, C. Robertson, E. Trucco, C.K.I. Williams; Wiley, 2014.

… maybe it will be useful

SLIDE 82

That's all folks ...

Toby Breckon, toby.breckon@durham.ac.uk Slides: www.durham.ac.uk/toby.breckon/teaching/tutorials/vihm_wks_2015_breckon.pdf