C18 Computer Vision
Victor Adrian Prisacariu
http://www.robots.ox.ac.uk/~victor
Lecture 5
Imaging geometry, camera calibration
InfiniDense DEMO
Slides at http://www.robots.ox.ac.uk/~victor -> Teaching.
Lots borrowed from David Murray + AV C18.
Reading:
– Hartley and Zisserman, Multiple View Geometry in Computer Vision.
– Forsyth and Ponce, Computer Vision: A Modern Approach.
– Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint.
Course content:
5. Imaging geometry, camera calibration.
6. Salient feature detection and description.
7. Recovering 3D from two images I: epipolar geometry.
8. Recovering 3D from two images II: stereo correspondences, triangulation, neural nets.

This lecture:
1. Introduction.
2. The perspective camera as a geometric device.
3. Perspective using homogeneous coordinates.
4. Calibration: the elements of the perspective model.
The aim in geometric computer vision is to take a number of 2D images and recover a description of the 3D scene, possibly as it evolves over time. What do we have here …? … seems very easy …
Although human and (3D) computer vision might be bags of tricks, it is useful to place the tricks within larger processing paradigms. For example:
a) Data-driven, bottom-up processing.
b) Model-driven, top-down, generative processing.
c) Dynamic vision (mixes bottom-up with top-down feedback).
d) Active vision (task oriented).
e) Data-driven discriminative approach (machine learning).
These are neither all-embracing nor exclusive.
a) Data-driven, bottom-up processing:
– Low-level processing produces a map of salient 2D features.
– A range of shape-from-X processes whose output was the 2.5D sketch.
– These are combined to get a fully 3D object-centered description.
b) Model-driven, top-down, generative processing:
– A model of the scene is assumed known.
– Supply a pose for the object relative to the camera, and use projection to predict where salient features should be found in the image space.
– Search for the features, and refine the pose by minimizing the observed deviation.
c) Dynamic vision: mixes bottom-up/top-down by introducing feedback.
d) Active vision: built around perception-action loops:
– Visual data needs only be “good enough” to drive the particular action.
– Only a task-relevant representation of the surroundings is needed.
e) Data-driven discriminative approaches: learn a description of the transformation between input and output using exemplars. Methods that also learn the representation are favored.
5.2 The perspective camera as a geometric device
[Figure: image of a cat, 520 pixels wide; the nose appears at pixel coordinates x = 295, y = 308.]
The point $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$ in world space projects to the point $\mathbf{y} = (y_1, y_2)^T$ in image space.
[Figure: film/sensor exposed directly to the cat; the output would be blurry.]
[Figure: adding a barrier reduces the blur; looks good ☺]
With a pinhole in the barrier, all rays pass through the center of projection (a single point), and the image forms on the image plane.
$f$ is the focal length; $p$ is the principal point, where the optical axis meets the image plane.
The 3D point $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$ is imaged into $\mathbf{y} = (y_1, y_2)^T$ as:
$$ y_1 = f\,\frac{Y_1}{Y_3}, \qquad y_2 = f\,\frac{Y_2}{Y_3} $$
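The division by depth above is easy to sketch in numpy; the point and focal length below are made-up numbers for illustration:

```python
import numpy as np

def project(Y, f):
    """Pinhole projection: y_i = f * Y_i / Y_3 (Y in camera coordinates)."""
    Y = np.asarray(Y, dtype=float)
    return f * Y[:2] / Y[2]

# A point 4 units along the optical axis, focal length 2:
y = project([2.0, 1.0, 4.0], f=2.0)
print(y)  # [1.  0.5]
```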
Homogeneous coordinates:
– involve representing the image and scene in a higher-dimensional space.
– make the perspective projection linear, so it behaves better.
– allow transformations to be concatenated more easily.
3D Euclidean transforms: inhomogeneous coordinates
A rigid motion in 3D space, e.g. of the cat's nose, can be described using a Euclidean transform:
$$ \mathbf{Y}'_{3\times1} = S_{3\times3}\,\mathbf{Y}_{3\times1} + \mathbf{u}_{3\times1} $$
where $S$ is a rotation and $\mathbf{u}$ a translation.
Concatenating several such transforms mixes matrix multiplications with vector additions:
$$ \mathbf{Y}'' = S_2(S_1\mathbf{Y} + \mathbf{u}_1) + \mathbf{u}_2 = S_2 S_1\mathbf{Y} + S_2\mathbf{u}_1 + \mathbf{u}_2 $$
which quickly becomes a mess!
3D Euclidean transforms: homogeneous coordinates
Replace the 3-vector $(Y_1, Y_2, Y_3)^T$ with the 4-vector $(Y_1, Y_2, Y_3, 1)^T$. The Euclidean transform becomes a single matrix multiplication:
$$ \begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = F\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} = \begin{pmatrix} S & \mathbf{u}\\ \mathbf{0}^T & 1 \end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
Concatenation is now just matrix multiplication:
$$ \begin{pmatrix}\mathbf{Y}_1\\1\end{pmatrix} = F_{10}\begin{pmatrix}\mathbf{Y}_0\\1\end{pmatrix},\quad \begin{pmatrix}\mathbf{Y}_2\\1\end{pmatrix} = F_{21}\begin{pmatrix}\mathbf{Y}_1\\1\end{pmatrix} \;\Rightarrow\; \begin{pmatrix}\mathbf{Y}_2\\1\end{pmatrix} = F_{21}F_{10}\begin{pmatrix}\mathbf{Y}_0\\1\end{pmatrix} $$
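A minimal sketch of this chaining (the rotation and translations are arbitrary example values):

```python
import numpy as np

def euclidean(S, u):
    """Pack rotation S and translation u into F = [[S, u], [0^T, 1]]."""
    F = np.eye(4)
    F[:3, :3] = S
    F[:3, 3] = u
    return F

Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])          # rotation about z by 90 degrees
F10 = euclidean(Rz90, [1.0, 0.0, 0.0])       # frame 0 -> frame 1
F21 = euclidean(np.eye(3), [0.0, 2.0, 0.0])  # frame 1 -> frame 2

Y0 = np.array([1.0, 0.0, 0.0, 1.0])          # homogeneous point in frame 0
Y2 = F21 @ F10 @ Y0                          # concatenation is one product
print(Y2)  # [1. 3. 0. 1.]
```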
Homogeneous coordinates – definition in ℝ³
A point in 3D space is represented in homogeneous coordinates by any 4-vector $(Y_1, Y_2, Y_3, Y_4)^T$ with $Y_4 \neq 0$, whose inhomogeneous coordinates are $(Y_1/Y_4,\, Y_2/Y_4,\, Y_3/Y_4)^T$. For any $\mu \neq 0$, $(Y_1, Y_2, Y_3, Y_4)^T$ and $\mu\,(Y_1, Y_2, Y_3, Y_4)^T$ represent the same inhomogeneous point: e.g. $(2,3,5,1)^T$ and $(4,6,10,2)^T$ both represent $(2,3,5)^T$.
Homogeneous coordinates – definition in ℝ²
Similarly, a 2D point is represented in homogeneous coordinates by any 3-vector $(y_1, y_2, y_3)^T$ with $y_3 \neq 0$, with inhomogeneous coordinates $(y_1/y_3,\, y_2/y_3)^T$: e.g. $(1,2,3)^T$ and $(2,4,6)^T$ both represent the same inhomogeneous point $(0.33, 0.66)^T$.
To convert an inhomogeneous point to a homogeneous vector, append a 1:
$$ (Y_1, Y_2, Y_3)^T \rightarrow (Y_1, Y_2, Y_3, 1)^T $$
To convert back, divide by the last coordinate:
$$ (Y_1, Y_2, Y_3, Y_4)^T \rightarrow (Y_1/Y_4,\; Y_2/Y_4,\; Y_3/Y_4)^T $$
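The two conversions can be sketched directly (the sample points are illustrative):

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1: (Y1, Y2, Y3) -> (Y1, Y2, Y3, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(ph):
    """Divide by the last coordinate: (Y1, Y2, Y3, Y4) -> (Y1/Y4, Y2/Y4, Y3/Y4)."""
    ph = np.asarray(ph, dtype=float)
    return ph[:-1] / ph[-1]

# Homogeneous vectors that differ by a non-zero scale mu represent
# the same inhomogeneous point:
p1 = from_homogeneous([2.0, 3.0, 5.0, 1.0])
p2 = from_homogeneous([4.0, 6.0, 10.0, 2.0])
print(p1, p2)  # [2. 3. 5.] [2. 3. 5.]
```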
Projective transformations
A projective transformation acts on homogeneous 4-vectors and is represented by a non-singular 4×4 matrix:
$$ \begin{pmatrix} Y'_1\\Y'_2\\Y'_3\\Y'_4 \end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ q_{31}&q_{32}&q_{33}&q_{34}\\ q_{41}&q_{42}&q_{43}&q_{44} \end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\Y_3\\Y_4 \end{pmatrix} $$
Original and transformed points are linked through a projection center.
Hierarchy of transformations, in 3D:
– Projective (15 dof): $(Y'_1, Y'_2, Y'_3, Y'_4)^T = Q_{4\times4}\,(Y_1, Y_2, Y_3, Y_4)^T$
– Affine (12 dof): $\begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = \begin{pmatrix}B_{3\times3} & \mathbf{u}_3\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix}$
– Similarity (7 dof): $\begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = \begin{pmatrix}tS_{3\times3} & \mathbf{u}_3\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix}$, with $t$ an isotropic scale
– Euclidean (6 dof): $\begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = \begin{pmatrix}S_{3\times3} & \mathbf{u}_3\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix}$
and in 2D:
– Projective, aka homography (8 dof): $(y'_1, y'_2, y'_3)^T = H_{3\times3}\,(y_1, y_2, y_3)^T$
– Affine (6 dof): $\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = \begin{pmatrix}B_{2\times2} & \mathbf{u}_2\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{y}\\1\end{pmatrix}$
– Similarity (4 dof): $\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = \begin{pmatrix}tS_{2\times2} & \mathbf{u}_2\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{y}\\1\end{pmatrix}$
– Euclidean (3 dof): $\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = \begin{pmatrix}S_{2\times2} & \mathbf{u}_2\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{y}\\1\end{pmatrix}$
The corresponding 2D matrices, from Euclidean (3 dof) through similarity (4 dof) and affine (6 dof) to projective (8 dof):
$$ \begin{pmatrix}\cos\theta & -\sin\theta & u_1\\ \sin\theta & \cos\theta & u_2\\ 0&0&1\end{pmatrix},\quad \begin{pmatrix}t\cos\theta & -t\sin\theta & u_1\\ t\sin\theta & t\cos\theta & u_2\\ 0&0&1\end{pmatrix},\quad \begin{pmatrix}b_{11} & b_{12} & u_1\\ b_{21} & b_{22} & u_2\\ 0&0&1\end{pmatrix},\quad \begin{pmatrix}h_{11} & h_{12} & h_{13}\\ h_{21} & h_{22} & h_{23}\\ h_{31} & h_{32} & h_{33}\end{pmatrix} $$
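One defining property of each level can be checked numerically. A sketch with arbitrary example parameters: a 2D Euclidean transform preserves distances, while a similarity scales them by its scale factor $t$.

```python
import numpy as np

theta, t = 0.3, 1.5                  # example rotation angle and scale
u1, u2 = 1.0, 2.0                    # example translation
c, s = np.cos(theta), np.sin(theta)

euclid = np.array([[c, -s, u1], [s, c, u2], [0.0, 0.0, 1.0]])
simil  = np.array([[t * c, -t * s, u1], [t * s, t * c, u2], [0.0, 0.0, 1.0]])

p = np.array([0.0, 0.0, 1.0])        # homogeneous 2D points, unit distance apart
q = np.array([1.0, 0.0, 1.0])

d_e = np.linalg.norm((euclid @ p - euclid @ q)[:2])
d_s = np.linalg.norm((simil @ p - simil @ q)[:2])
print(d_e, d_s)  # 1.0 and 1.5, up to rounding
```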
Perspective drops out of the projective transformation by constraining the transformed point to lie on a plane at depth $f$:
$$ \mathbf{Y}_{\text{image}} = (y_1,\, y_2,\, f,\, 1)^T $$
$$ \mu\begin{pmatrix}y_1\\y_2\\f\\1\end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ fq_{31}&fq_{32}&fq_{33}&fq_{34}\\ q_{31}&q_{32}&q_{33}&q_{34} \end{pmatrix}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} $$
The third row is $f$ times the fourth, so it is redundant and can be dropped:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ q_{31}&q_{32}&q_{33}&q_{34} \end{pmatrix}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} = Q_{3\times4}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} $$
$Q_{3\times4}$ is the projection matrix and this is a perspective transform.
5.3 Perspective using homogeneous coordinates
The point $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$ projects to $\mathbf{y} = (y_1, y_2)^T$ with
$$ y_1 = f\,\frac{Y_1}{Y_3}, \qquad y_2 = f\,\frac{Y_2}{Y_3} $$
In homogeneous coordinates this is linear:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \begin{pmatrix} f&0&0&0\\ 0&f&0&0\\ 0&0&1&0 \end{pmatrix}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} \;\Rightarrow\; \mu y_1 = fY_1,\; \mu y_2 = fY_2,\; \mu = Y_3 \;\Rightarrow\; y_1 = f\,\frac{Y_1}{Y_3},\; y_2 = f\,\frac{Y_2}{Y_3} $$
Image point = projection matrix × world point.
Perspective using homogeneous coordinates
The projection matrix can be decomposed into three parts:
1. a part that depends on the internals of the camera
2. a vanilla projection matrix
3. a Euclidean transformation between the world and camera frames.
Start by assuming the world frame coincides with the camera frame, so that the extrinsic camera matrix is the identity, and get:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \underbrace{\begin{pmatrix} f&0&0\\ 0&f&0\\ 0&0&1 \end{pmatrix}}_{\text{intrinsic calibration}}\; \underbrace{\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}}_{\text{projection (vanilla)}}\; \underbrace{I_{4\times4}}_{\text{extrinsic calibration}}\; \begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
Perspective using homogeneous coordinates
To generalize:
– Insert a rotation $S$ and translation $\mathbf{u}$ between world and camera coordinates.
– Insert some extra terms in the intrinsic calibration matrix.
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \underbrace{\begin{pmatrix} f&sf&v_0\\ 0&\delta f&w_0\\ 0&0&1 \end{pmatrix}}_{\text{intrinsic calibration}}\; \underbrace{\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}}_{\text{projection (vanilla)}}\; \underbrace{\begin{pmatrix} s_{11}&s_{12}&s_{13}&u_1\\ s_{21}&s_{22}&s_{23}&u_2\\ s_{31}&s_{32}&s_{33}&u_3\\ 0&0&0&1 \end{pmatrix}}_{\text{extrinsic calibration}}\; \begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
The camera’s extrinsic calibration is just the rotation $S$ and translation $\mathbf{u}$ that take points from the world frame to the camera frame:
$$ \begin{pmatrix}\mathbf{Y}_c\\1\end{pmatrix} = \begin{pmatrix} S & \mathbf{u}\\ \mathbf{0}^T & 1 \end{pmatrix}\begin{pmatrix}\mathbf{Y}_w\\1\end{pmatrix} $$
The rotation admits several representations (Euler angles, quaternions, etc.).
Euler angles capture the rotation using 3 parameters, one angle per axis:
$$ \mathbf{Y}' = S_z\,\mathbf{Y}_w = \begin{pmatrix} \cos\theta_z&\sin\theta_z&0\\ -\sin\theta_z&\cos\theta_z&0\\ 0&0&1 \end{pmatrix}\mathbf{Y}_w $$
$$ \mathbf{Y}'' = S_y\,\mathbf{Y}' = \begin{pmatrix} \cos\theta_y&0&-\sin\theta_y\\ 0&1&0\\ \sin\theta_y&0&\cos\theta_y \end{pmatrix}\mathbf{Y}' $$
$$ \mathbf{Y}_c = S_x\,\mathbf{Y}'' = \begin{pmatrix} 1&0&0\\ 0&\cos\theta_x&\sin\theta_x\\ 0&-\sin\theta_x&\cos\theta_x \end{pmatrix}\mathbf{Y}'' $$
$$ S_{cw} = S_x S_y S_z $$
Order matters!
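That order dependence is easy to verify numerically. A sketch using the standard axis rotations (the signs below follow the common active convention, so they are the transposes of the coordinate-frame rotations above):

```python
import numpy as np

def Sx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Sy(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Sz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

tx, ty, tz = 0.1, 0.2, 0.3     # arbitrary example angles
A = Sx(tx) @ Sy(ty) @ Sz(tz)   # rotate about z, then y, then x
B = Sz(tz) @ Sy(ty) @ Sx(tx)   # the same angles in the opposite order
print(np.allclose(A, B))       # False: order matters
```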
The inverse transform, from camera back to world coordinates, is
$$ \begin{pmatrix} S_{cw} & \mathbf{u}_{cw}\\ \mathbf{0}^T & 1 \end{pmatrix}^{-1} = \begin{pmatrix} S_{wc} & \mathbf{u}_{wc}\\ \mathbf{0}^T & 1 \end{pmatrix} $$
For rotation: $S_{wc} = S_{cw}^{-1} = S_{cw}^T$
For translation: $\mathbf{u}_{wc} = -S_{wc}\,\mathbf{u}_{cw}$
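The closed-form inverse can be checked numerically (the rotation and translation values are arbitrary):

```python
import numpy as np

S = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])       # example rotation: 90 degrees about z
u = np.array([1.0, 2.0, 3.0])          # example translation

F = np.eye(4); F[:3, :3] = S; F[:3, 3] = u

F_inv = np.eye(4)                      # closed-form inverse: S^T and -S^T u
F_inv[:3, :3] = S.T
F_inv[:3, 3] = -S.T @ u

print(np.allclose(F @ F_inv, np.eye(4)))  # True
```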
The intrinsics describe hardware properties of real cameras:
– The image plane might be skewed.
– The central axis of the lens might not line up with the optical axis.
– The light gathering elements might not be square.
– Lens distortion.
The intrinsic matrix composes a scaling, an origin offset, and a skew:
$$ L = \begin{pmatrix} f&0&0\\ 0&\delta f&0\\ 0&0&1 \end{pmatrix} \begin{pmatrix} 1&0&v_0/f\\ 0&1&w_0/(\delta f)\\ 0&0&1 \end{pmatrix} \begin{pmatrix} 1&s&0\\ 0&1&0\\ 0&0&1 \end{pmatrix} = \begin{pmatrix} f&sf&v_0\\ 0&\delta f&w_0\\ 0&0&1 \end{pmatrix} $$
$f$ sets the scale; $\delta$ is the aspect ratio (different scaling along the two axes); the origin offset $(v_0, w_0)$ is the principal point; $s$ accounts for skew.
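The factorization can be verified with made-up intrinsic values:

```python
import numpy as np

f, delta, s = 500.0, 1.1, 0.05         # focal length, aspect ratio, skew
v0, w0 = 320.0, 240.0                  # principal point

scale  = np.diag([f, delta * f, 1.0])
offset = np.array([[1.0, 0.0, v0 / f],
                   [0.0, 1.0, w0 / (delta * f)],
                   [0.0, 0.0, 1.0]])
skew   = np.array([[1.0, s, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

L = scale @ offset @ skew
print(L)   # equals [[f, s*f, v0], [0, delta*f, w0], [0, 0, 1]]
```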
1. Move the scene point $(\mathbf{Y}_w, 1)^T$ into camera coordinates by the $4\times4$ extrinsic Euclidean transformation: $\begin{pmatrix}\mathbf{Y}_c\\1\end{pmatrix} = \begin{pmatrix}S&\mathbf{u}\\\mathbf{0}^T&1\end{pmatrix}\begin{pmatrix}\mathbf{Y}_w\\1\end{pmatrix}$
2. Project into an ideal camera via a vanilla perspective transformation: $\mu\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = [\,I_{3\times3}\,|\,\mathbf{0}\,]\begin{pmatrix}\mathbf{Y}_c\\1\end{pmatrix}$
3. Map the ideal image into the real image using the intrinsic matrix: $\begin{pmatrix}\mathbf{y}\\1\end{pmatrix} = L\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix}$
$L$ is called the intrinsic calibration matrix, and accounts for the internal physical characteristics of the camera.
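The three steps can be sketched end to end; all calibration numbers below are made up for illustration:

```python
import numpy as np

f, delta, s = 500.0, 1.1, 0.0          # made-up intrinsics: focal, aspect, skew
v0, w0 = 320.0, 240.0                  # made-up principal point
L = np.array([[f, s * f, v0],
              [0.0, delta * f, w0],
              [0.0, 0.0, 1.0]])        # intrinsic calibration matrix

S = np.eye(3)                          # extrinsic rotation (identity here)
u = np.array([0.0, 0.0, 2.0])          # extrinsic translation

E = np.eye(4); E[:3, :3] = S; E[:3, 3] = u    # step 1: world -> camera
P = np.hstack([np.eye(3), np.zeros((3, 1))])  # step 2: vanilla [I | 0]

Y_w = np.array([0.2, 0.1, 2.0, 1.0])   # homogeneous world point
mu_y = L @ P @ E @ Y_w                 # step 3 folds in L; gives mu*(y1, y2, 1)
y = mu_y[:2] / mu_y[2]                 # divide out the scale mu
print(y)
```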
5.4 Calibration – the elements of the perspective model
Calibration can be achieved by auto-calibration or pre-calibration. Here we consider pre-calibration, using a specially made “known” visual scene.
Putting everything together:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \underbrace{\begin{pmatrix} f&sf&v_0\\ 0&\delta f&w_0\\ 0&0&1 \end{pmatrix}}_{\text{intrinsic calibration}}\; \underbrace{\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}}_{\text{projection (vanilla)}}\; \underbrace{\begin{pmatrix} s_{11}&s_{12}&s_{13}&u_1\\ s_{21}&s_{22}&s_{23}&u_2\\ s_{31}&s_{32}&s_{33}&u_3\\ 0&0&0&1 \end{pmatrix}}_{\text{extrinsic calibration}}\; \begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
i.e., with $\mathbf{y}$ and $\mathbf{Y}$ in homogeneous coordinates,
$$ \mu\,\mathbf{y} = L\;[\,I_{3\times3}\,|\,\mathbf{0}\,]\;\begin{pmatrix} S & \mathbf{u}\\ \mathbf{0}^T & 1 \end{pmatrix}\;\mathbf{Y} $$
Camera calibration: recover $L$, $S$ and $\mathbf{u}$ from images of known scene points. The plan:
1. Estimate the projection matrix $Q = L\,[\,I\,|\,\mathbf{0}\,]\begin{pmatrix}S&\mathbf{u}\\\mathbf{0}^T&1\end{pmatrix} = L\,[\,S\,|\,\mathbf{u}\,]$ from point correspondences.
2. Factor the left $3\times3$ part, $Q_{LEFT} = LS$, using QR decomposition into $S$ and $L$.
3. Recover the translation from the last column: $\mathbf{u} = L^{-1}(q_{14}, q_{24}, q_{34})^T$.
First, estimate $Q$ itself:
For each of $j = 1, \dots, n$ known scene points $\mathbf{Y}_j = (Y_{1j}, Y_{2j}, Y_{3j})^T$ with measured images $(y_{1j}, y_{2j})^T$:
$$ \mu_j\begin{pmatrix}y_{1j}\\y_{2j}\\1\end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ q_{31}&q_{32}&q_{33}&q_{34} \end{pmatrix}\begin{pmatrix}Y_{1j}\\Y_{2j}\\Y_{3j}\\1\end{pmatrix} $$
so
$$ \mu_j = q_{31}Y_{1j} + q_{32}Y_{2j} + q_{33}Y_{3j} + q_{34} $$
$$ y_{1j} = \frac{q_{11}Y_{1j} + q_{12}Y_{2j} + q_{13}Y_{3j} + q_{14}}{q_{31}Y_{1j} + q_{32}Y_{2j} + q_{33}Y_{3j} + q_{34}}, \qquad y_{2j} = \frac{q_{21}Y_{1j} + q_{22}Y_{2j} + q_{23}Y_{3j} + q_{24}}{q_{31}Y_{1j} + q_{32}Y_{2j} + q_{33}Y_{3j} + q_{34}} $$
Multiplying out gives two linear equations per point:
$$ \begin{pmatrix} Y_{1j}&Y_{2j}&Y_{3j}&1&0&0&0&0&-Y_{1j}y_{1j}&-Y_{2j}y_{1j}&-Y_{3j}y_{1j}&-y_{1j}\\ 0&0&0&0&Y_{1j}&Y_{2j}&Y_{3j}&1&-Y_{1j}y_{2j}&-Y_{2j}y_{2j}&-Y_{3j}y_{2j}&-y_{2j} \end{pmatrix}\mathbf{q} = \mathbf{0} $$
where $\mathbf{q} = (q_{11}, q_{12}, \dots, q_{34})^T$ contains the unknowns.
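This homogeneous linear system is commonly solved via SVD, taking the right singular vector with the smallest singular value as the null vector. A sketch, with a made-up ground-truth Q used to synthesize exact correspondences:

```python
import numpy as np

def estimate_Q(world_pts, image_pts):
    """Stack two DLT rows per correspondence and solve A q = 0 by SVD."""
    rows = []
    for (Y1, Y2, Y3), (y1, y2) in zip(world_pts, image_pts):
        rows.append([Y1, Y2, Y3, 1, 0, 0, 0, 0, -Y1*y1, -Y2*y1, -Y3*y1, -y1])
        rows.append([0, 0, 0, 0, Y1, Y2, Y3, 1, -Y1*y2, -Y2*y2, -Y3*y2, -y2])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)        # null vector, reshaped row-major

# Synthetic check: project known points with a known Q, then re-estimate it.
Q_true = np.array([[500.0, 0.0, 320.0, 100.0],
                   [0.0, 500.0, 240.0, 50.0],
                   [0.0, 0.0, 1.0, 2.0]])
world = np.array([[0, 0, 1], [1, 0, 2], [0, 1, 3], [1, 1, 1],
                  [2, 1, 2], [1, 2, 3], [2, 2, 4.0]])   # non-coplanar points
imgs = []
for Y in world:
    h = Q_true @ np.append(Y, 1.0)
    imgs.append(h[:2] / h[2])

Q_est = estimate_Q(world, imgs)
Q_est *= Q_true[2, 3] / Q_est[2, 3]    # fix the overall (sign and) scale
print(np.allclose(Q_est, Q_true, atol=1e-6))
```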
Stacking the equations for all $n$ points gives a $2n\times12$ system $A\mathbf{q} = \mathbf{0}$. Since $\mathbf{q}$ is defined only up to scale it has 11 degrees of freedom, so $n \geq 6$ points suffice, and $\mathbf{q}$ is found as the null vector of $A$ (e.g. via SVD).
With $Q$ known, write
$$ Q = [\,Q_{LEFT}\,|\,\mathbf{q}_4\,] = L\,[\,S\,|\,\mathbf{u}\,] $$
– Rotation and intrinsics: $Q_{LEFT} = LS$, so $Q_{LEFT}^{-1} = S^T L^{-1}$; QR decomposition of $Q_{LEFT}^{-1}$ yields an orthogonal factor $S^T$ and an upper-triangular factor $L^{-1}$, hence $S$ and $L$.
– Translation: $\mathbf{u} = L^{-1}\mathbf{q}_4 = L^{-1}(q_{14}, q_{24}, q_{34})^T$.
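The decomposition step can be sketched with numpy's QR, fixing signs so that the recovered $L$ has a positive diagonal (the ground-truth values below are made up):

```python
import numpy as np

def decompose_Q(Q):
    """Recover intrinsics L, rotation S and translation u from Q = L [S | u]."""
    Q_left = Q[:, :3]                                # Q_left = L S
    St, L_inv = np.linalg.qr(np.linalg.inv(Q_left))  # inv(Q_left) = S^T L^{-1}
    D = np.diag(np.sign(np.diag(L_inv)))             # sign fix: L diagonal > 0
    St, L_inv = St @ D, D @ L_inv                    # D^2 = I, product unchanged
    L = np.linalg.inv(L_inv)
    u = L_inv @ Q[:, 3]                              # u = L^{-1} (q14, q24, q34)^T
    return L, St.T, u

L_true = np.array([[500.0, 10.0, 320.0],
                   [0.0, 550.0, 240.0],
                   [0.0, 0.0, 1.0]])
S_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
u_true = np.array([1.0, 2.0, 3.0])

Q = L_true @ np.hstack([S_true, u_true[:, None]])
L, S, u = decompose_Q(Q)
print(np.allclose(L, L_true), np.allclose(S, S_true), np.allclose(u, u_true))
```

When $Q$ has been estimated only up to scale, one would first normalize it (e.g. so that the recovered $L$ has $L_{33} = 1$) before reading off the parameters.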
Camera Calibration – Example Algorithm
Can be done without point matches …
Calibration lets us turn our camera into a notional camera with the world and the camera coordinates aligned and an “ideal” image plane.
Real lenses also introduce distortions and aberrations. Radial distortion is the most common – see the Q sheet.
http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/example.html
In this lecture we have:
– introduced geometric computer vision, and some processing paradigms.
– described the perspective camera as a geometric device, and introduced homogeneous coordinates.
– derived the perspective projection, and saw that it could be made linear using homogeneous coordinates.
– shown how to calibrate a camera from the image of six or more known scene points.