A Whirlwind Tour of Where We Are in Computational Binocular Stereo Vision


  1. A Whirlwind Tour of Where We Are in Computational Binocular Stereo Vision: a beginner's tutorial for the uninitiated. Toby Breckon, School of Engineering and Computing Sciences, Durham University. Slides: www.durham.ac.uk/toby.breckon/teaching/tutorials/vihm_wks_2015_breckon.pdf Slide material acknowledgements (some portions): R. Szeliski (Microsoft/Washington), B. Fisher (Edinburgh), O. Hamilton (Cranfield/Durham), J. Xiao, N. Snavely, J. Hays, S. Prince. ViiHM Mini-Workshop 2015

  2. Setting the Scene ...

  3. the core problem: stereo vision

  4. the core problem: stereo vision ● Binocular Stereo Vision (i.e. only 2 cameras) – 3D scene information implicitly encoded in image differences ⇒ Representation: RGB intensity images (noisy)

  5. Left [figure: left image of an example stereo pair]

  6. Right [figure: right image of an example stereo pair]

  7. Stereo Vision – the key principle: image features (e.g. a point / line / pixel) will project differently in the left and right images depending on their distance from the camera (or eyes, in human vision). This difference in image position is known as disparity, d = |P_L - P_R|

  8. Stereo Vision - principle - Matching every feature between the left and right images results in a 2D 'disparity map' or 'depth map' (computed as the disparity, d, at every feature position) - Real-world 3D information (distances to scene objects) can be recovered from this depth map

  9. Concept: depth recovery – the depth of each scene object is indicated by its greyscale value. http://vision.middlebury.edu/stereo/

  10. But why is this computationally challenging?

  11. Left [figure: left image of a real stereo pair]

  12. Right [figure: right image of a real stereo pair]

  13. In reality, images are noisy due to {encoding, sampling, illumination, camera alignment, camera variations, temperature}; thus features appear differently in each image, and thus simple image matching (most) often fails

  14. this is what makes stereo vision challenging

  15. Today, almost all computational stereo research addresses the matching problem [to some degree, at some level]

  16. Disparity vs. Depth ● Computer vision people often refer to disparity estimation – disparity is a 2D measure of feature displacement between the images (measured in pixels) ● Biological vision people often refer to depth perception – depth is a positional measurement of distance within the scene (measured in metres / mm / cm), i.e. relative scene depth, Z, and scene depth ordering

  17. … essentially the same thing. Depth of a scene object, Z, observed to have disparity difference, d, between two stereo images separated by a baseline distance, B, with camera lenses of focal length, f: Z = fB / d. If you have one, you can calculate the other.
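As a minimal sketch of this relation (the focal length and baseline values below are illustrative, not from the slides), disparity and depth convert both ways via Z = fB/d:

```python
# Z = f * B / d, and symmetrically d = f * B / Z.
# f is the focal length in pixels, B the baseline in metres (assumed values).

def disparity_to_depth(d_pixels, f_pixels=700.0, baseline_m=0.12):
    """Depth Z (metres) of a feature with disparity d (pixels)."""
    if d_pixels <= 0:
        raise ValueError("disparity must be positive")
    return f_pixels * baseline_m / d_pixels

def depth_to_disparity(z_metres, f_pixels=700.0, baseline_m=0.12):
    """Disparity d (pixels) of a feature at depth Z (metres)."""
    return f_pixels * baseline_m / z_metres

# e.g. a 21-pixel disparity over a 12 cm baseline at f = 700 px: 4 m away
print(disparity_to_depth(21.0))  # 4.0
```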

  18. Stereo: Standard Formulation ● Camera 1 (left eye) and Camera 2 (right eye): left / right views at a known (calibrated) distance apart (the baseline, B)

  19. Stereo Vision – disparity to depth. Point P (in the world) is projected into the left image plane (as P_L) and the right image plane (as P_R), each plane at focal length f from its camera, with the cameras separated by baseline B. P = (X,Y,Z) (in the world); P_L = (x_L, y_L) (in the left image); P_R = (x_R, y_R) (in the right image)

  20. Stereo Vision – disparity to depth. The re-projection of P_L from the left image plane into the right image plane allows us to recover disparity as a pixel distance within the image: disparity, d = |P_L - P_R|. P = (X,Y,Z) (in the world); P_L = (x_L, y_L) (in the left image); P_R = (x_R, y_R) (in the right image)

  21. Stereo Vision – disparity to depth. Images are captured under a perspective transform: a scene point (X,Y,Z) at depth Z is imaged at position (x,y) on the image plane, determined by the focal length of the camera, f (the lens to image plane distance), with the image inverted during capture (fixed inside the camera): x = fX / Z, y = fY / Z. Thus, in stereo, to recover the 3D position of P = (X, Y, Z), the depth of a feature, Z, with disparity, d, over a stereo baseline, B, is Z = fB / d
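A hedged sketch of the full back-projection, assuming a rectified pair and illustrative camera parameters (f, B, and the principal point (cx, cy) are made-up values, not from the slides):

```python
# Recover (X, Y, Z) for a matched feature using x = f*X/Z, y = f*Y/Z
# and Z = f*B/d (rectified pair, pin-hole model, image-centred coordinates).

def triangulate(xL, yL, xR, f=700.0, B=0.12, cx=320.0, cy=240.0):
    d = xL - xR              # disparity in pixels (yL == yR when rectified)
    Z = f * B / d            # depth from Z = f*B/d
    X = (xL - cx) * Z / f    # invert x = f*X/Z
    Y = (yL - cy) * Z / f    # invert y = f*Y/Z
    return X, Y, Z

print(triangulate(350.0, 250.0, 329.0))  # (~0.17, ~0.06, 4.0)
```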

  22. Computational Stereo – An Outline [How do we solve the matching problem?]

  23. Stereo Vision - Overview ● Stereo camera setup / calibration – two cameras viewing a calibration target [Lukins '05]; relative positions known (calibration) ● Image Capture ● Feature Extraction – what can we see in each image? ● Feature Matching – can we match features between images? ● Triangulation – depth recovery from matched features (a minimal code sketch of this pipeline follows)
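A minimal OpenCV sketch of these pipeline stages, under stated assumptions: calibration has been done offline and the pair is already rectified; the file names, the ORB detector choice, and the f/B values are all illustrative:

```python
import cv2

# 1. image capture (here: load a pre-captured, rectified pair)
left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# 2. feature extraction - what can we see in each image?
orb = cv2.ORB_create()
kpL, desL = orb.detectAndCompute(left,  None)
kpR, desR = orb.detectAndCompute(right, None)

# 3. feature matching - can we match features between images?
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(desL, desR)

# 4. triangulation - depth from the disparity of each matched feature
f, B = 700.0, 0.12  # assumed focal length (pixels) and baseline (metres)
for m in matches:
    d = kpL[m.queryIdx].pt[0] - kpR[m.trainIdx].pt[0]
    if d > 0:
        print("feature depth: %.2f m" % (f * B / d))
```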

  24. Sparse Image Features ● State of the art: feature points – high-dimensional local feature descriptions (e.g. 128D+) – considerable research effort: initial work [Harris, 1988], then intensive [period: 2004 → 2010+] – robust matching performance beyond the stereo case ● considerably beyond (!) ● strongly invariant (via RANSAC) – Feature points in a nutshell: ● pixels described by local gradient histograms ● normalized for maximal invariance ● discard pixel regions that are not locally unique [SIFT – Lowe, 2004 / SURF – Bay et al., 2006]
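A short example of such feature points in practice, using the SIFT implementation in OpenCV (4.4+); the image file names are placeholders, and the 0.75 ratio threshold is the usual value from Lowe's paper:

```python
import cv2

img1 = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                 # 128-D gradient-histogram descriptors
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test: discard matches that are not locally unique
bf = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in bf.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "ratio-test matches")
```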

  25. Sparse Image Features – Harris feature points example [Fisher / Breckon et al., 2014]

  26. Sparse Image Features ● Under-pins …. ● 3D reconstruction from tourist photos: http://www.cs.cornell.edu/projects/p2f/ ● Real-time image mosaicking [Breckon et al., 2010] ● Deformable object matching: http://www.cvc.uab.es/~jcrubio/ ● Object instance detection [SURF, SIFT et al.] … + object recognition and a whole lot more.

  27. Readily gives us feature-based stereo (i.e. sparse depth) – e.g. match locally unique "corner" feature points (obtaining disparity/depth at these points), then interpolate a complete 3D depth solution / object positions etc. (see the interpolation sketch below)
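One way to picture the interpolation step (a sketch only: the sparse samples below are made up, and SciPy's griddata is just one plausible interpolator):

```python
import numpy as np
from scipy.interpolate import griddata

# sparse (x, y) feature positions and the disparity matched at each
pts  = np.array([[40, 30], [200, 35], [120, 180], [260, 200]], dtype=float)
disp = np.array([22.0, 19.5, 25.0, 18.0])

# interpolate a dense 320x240 disparity map from the sparse samples
xs, ys = np.meshgrid(np.arange(320), np.arange(240))
dense = griddata(pts, disp, (xs, ys), method="linear")  # NaN outside the hull
```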

  28. Example: sparse stereo for HCI [Features = red/green blobs] [source: anon]

  29. Example: sparse stereo for stereo odometry [Features = feature points] https://www.youtube.com/watch?v=lTQGTbrNssQ

  30. Reality … nobody really uses sparse stereo any more [apart from bespoke applications like those just illustrated]

  31. .. the world went dense.

  32. Dense Stereo Vision ● Concept: compute depth for each and every scene pixel
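For a feel of what "dense" means in code, a minimal sketch with OpenCV's semi-global block matcher (parameters and file names are illustrative; the pair is assumed already rectified):

```python
import cv2

left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# one disparity estimate per pixel; SGBM returns fixed-point values * 16
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disp = sgbm.compute(left, right).astype(float) / 16.0

# greyscale depth-map visualisation, as in the Middlebury examples earlier
vis = cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```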

  33. Key challenge: any pixel in the left image could now potentially match any pixel in the right – this is a lot of matches to evaluate! → a large search space of matches is computationally expensive (and prone to mis-matching errors)

  34. Stereo Correspondence Problem – Q: for a given feature in the left image, what is the correct correspondence in the right? ● Different pairings result in different 3D results – inconsistent correspondence = inconsistent 3D (!) – the key problem in all stereo vision approaches

  35. In computational stereo vision this is addressed via three aspects: ● camera calibration, leading to epipolar geometry ● match aggregation – matching regions, not pixels (see the window-matching sketch below) ● match optimization – compute many possible matches, then select the best subset that is maximally inter-consistent
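The window-matching sketch below illustrates the aggregation idea (pure NumPy; winner-takes-all selection rather than a global optimizer, and the window size and disparity range are arbitrary choices):

```python
import numpy as np

def sad_disparity(left, right, y, x, max_d=64, w=5):
    """Best disparity at pixel (x, y) of a rectified pair, by minimising the
    sum of absolute differences (SAD) over a w x w window. Assumes the
    window fits entirely inside both images."""
    h = w // 2
    patch = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)
    costs = []
    for d in range(min(max_d, x - h) + 1):  # search along the scan-line
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.int32)
        costs.append(np.abs(patch - cand).sum())
    return int(np.argmin(costs))            # winner-takes-all selection
```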

  36. Epipolar Geometry – reduces matching space ● Feature p_l in the left image lies on a ray r in space – r projects to an epipolar line e in the right image – along which the matching feature p_r must lie ● If the images are "rectified", then the epipolar line is the image row – i.e. both camera images are perfectly axis-aligned

  37. Epipolar Geometry – reduces matching space ● Constrains L → R correspondence – reduces the 2D search to 1D – images are linked by the fundamental matrix, F – for matched points, p_r^T F p_l = 0 – F is generally derived from a prior calibration routine (with a pre-known target) – points are homogeneous; F is 3x3 ● The match for point p_l on ray r (left) must lie on the epipolar line e (right).
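A sketch of estimating F and its epipolar lines with OpenCV (the point arrays are fabricated stand-ins for real feature matches; horizontal-only shifts of varying size mimic a rectified pair viewing points at varying depths):

```python
import cv2
import numpy as np

# fabricated matches: horizontal shifts only (varying disparity = varying depth)
ptsL = np.float32([[100, 50], [200, 80], [150, 120], [60, 200],
                   [250, 40], [300, 160], [90, 90], [180, 220]])
shift = np.float32([[20, 0], [15, 0], [25, 0], [10, 0],
                    [30, 0], [12, 0], [18, 0], [22, 0]])
ptsR = ptsL - shift

F, mask = cv2.findFundamentalMat(ptsL, ptsR, cv2.FM_8POINT)

# the epipolar line in the right image for each left point: e = F p_l
lines = cv2.computeCorrespondEpilines(ptsL.reshape(-1, 1, 2), 1, F)
print(lines[0])  # (a, b, c) coefficients of the line a*x + b*y + c = 0
```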

  38. Example: rectified images (original vs. rectified) • "rectified" images ⇒ the epipolar line is the image row • rectification is performed via calibration • thus stereo is reduced to a 1D "scan-line matching" problem (a rectification sketch follows)
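A hedged end-to-end rectification sketch with OpenCV (the calibration numbers below are invented stand-ins for a real cv2.stereoCalibrate result; the image files are placeholders):

```python
import cv2
import numpy as np

size = (640, 480)
# invented calibration: identical ideal cameras, 12 cm baseline along x
K1 = K2 = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
D1 = D2 = np.zeros(5)
R = np.eye(3)
T = np.array([[-0.12], [0.], [0.]])

R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
mapLx, mapLy = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
mapRx, mapRy = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)

# warped so matching rows correspond: stereo is now 1D scan-line matching,
# ready for a dense matcher such as the SGBM sketch shown earlier
rectL = cv2.remap(cv2.imread("left.png", 0),  mapLx, mapLy, cv2.INTER_LINEAR)
rectR = cv2.remap(cv2.imread("right.png", 0), mapRx, mapRy, cv2.INTER_LINEAR)
```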
