C18 Computer Vision
Victor Adrian Prisacariu
http://www.robots.ox.ac.uk/~victor
Lecture 5
Imaging geometry, camera calibration
InfiniDense DEMO
Slides at http://www.robots.ox.ac.uk/~victor -> Teaching.
Lots borrowed from David Murray + AV C18.
Reading:
– Hartley and Zisserman, Multiple View Geometry in Computer Vision.
– Forsyth and Ponce, Computer Vision: A Modern Approach.
– Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint.
Course content:
5. Imaging geometry, camera calibration.
6. Salient feature detection and description.
7. Recovering 3D from two images I: epipolar geometry.
8. Recovering 3D from two images II: stereo correspondences, triangulation, neural nets.

This lecture:
1. Introduction.
2. The perspective camera as a geometric device.
3. Perspective using homogeneous coordinates.
4. Calibration: the elements of the perspective model.
The aim in geometric computer vision is to take a number of 2D images and recover a description of the 3D scene, possibly as it evolves over time. What do we have here …? … seems very easy …
Although human and (3D) computer vision might be bags of tricks, it is useful to place the tricks within larger processing paradigms. For example:
a) Data-driven, bottom-up processing.
b) Model-driven, top-down, generative processing.
c) Dynamic vision (mixes bottom-up with top-down feedback).
d) Active vision (task oriented).
e) Data-driven discriminative approach (machine learning).
These are neither all-embracing nor exclusive.
a) Data-driven, bottom-up processing:
– Low-level processing produces a map of salient 2D features.
– A range of shape-from-X processes whose output was the 2.5D sketch.
– These are combined to get a fully 3D object-centered description.
b) Model-driven, top-down, generative processing:
– A model of the scene is assumed known.
– Supply a pose for the object relative to the camera, and use projection to predict where salient features should be found in the image space.
– Search for the features, and refine the pose by minimizing the observed deviation.
c) Dynamic vision: mixes bottom-up/top-down by introducing feedback.
d) Active vision: built around perception-action loops:
– Visual data needs only be “good enough” to drive the particular action.
– Only a task-relevant representation of the surroundings is needed.
e) Data-driven discriminative approaches: learn a description of the transformation between input and output using exemplars. Methods that also learn the representation are favored.
5.2 The perspective camera as a geometric device
[Figure: image of a cat, 520 pixels wide; the nose appears at pixel coordinates x = 295, y = 308.]
The point $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$ in world space projects to the point $\mathbf{y} = (y_1, y_2)^T$ in image space.
[Figure: film/sensor exposed directly to the cat; the output would be blurry.]
[Figure: adding a barrier reduces the blur; looks good ☺]
With a pinhole in the barrier, all rays pass through the center of projection (a single point), and the image forms on the image plane.
$f$ is the focal length; $p$ is the principal point, where the optical axis meets the image plane.
The 3D point $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$ is imaged into $\mathbf{y} = (y_1, y_2)^T$ as:
$$ y_1 = f\,\frac{Y_1}{Y_3}, \qquad y_2 = f\,\frac{Y_2}{Y_3} $$
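The division by depth above is easy to sketch in numpy; the point and focal length below are made-up numbers for illustration:

```python
import numpy as np

def project(Y, f):
    """Pinhole projection: y_i = f * Y_i / Y_3 (Y in camera coordinates)."""
    Y = np.asarray(Y, dtype=float)
    return f * Y[:2] / Y[2]

# A point 4 units along the optical axis, focal length 2:
y = project([2.0, 1.0, 4.0], f=2.0)
print(y)  # [1.  0.5]
```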
Homogeneous coordinates:
– involve representing the image and scene in a higher-dimensional space.
– make the perspective projection linear, so it behaves better.
– allow transformations to be concatenated more easily.
3D Euclidean transforms: inhomogeneous coordinates
A rigid motion in 3D space, e.g. of the cat's nose, can be described using a Euclidean transform:
$$ \mathbf{Y}'_{3\times1} = S_{3\times3}\,\mathbf{Y}_{3\times1} + \mathbf{u}_{3\times1} $$
where $S$ is a rotation and $\mathbf{u}$ a translation.
Concatenating several such transforms mixes matrix multiplications with vector additions:
$$ \mathbf{Y}'' = S_2(S_1\mathbf{Y} + \mathbf{u}_1) + \mathbf{u}_2 = S_2 S_1\mathbf{Y} + S_2\mathbf{u}_1 + \mathbf{u}_2 $$
which quickly becomes a mess!
3D Euclidean transforms: homogeneous coordinates
Replace the 3-vector $(Y_1, Y_2, Y_3)^T$ with the 4-vector $(Y_1, Y_2, Y_3, 1)^T$. The Euclidean transform becomes a single matrix multiplication:
$$ \begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = F\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} = \begin{pmatrix} S & \mathbf{u}\\ \mathbf{0}^T & 1 \end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
Concatenation is now just matrix multiplication:
$$ \begin{pmatrix}\mathbf{Y}_1\\1\end{pmatrix} = F_{10}\begin{pmatrix}\mathbf{Y}_0\\1\end{pmatrix},\quad \begin{pmatrix}\mathbf{Y}_2\\1\end{pmatrix} = F_{21}\begin{pmatrix}\mathbf{Y}_1\\1\end{pmatrix} \;\Rightarrow\; \begin{pmatrix}\mathbf{Y}_2\\1\end{pmatrix} = F_{21}F_{10}\begin{pmatrix}\mathbf{Y}_0\\1\end{pmatrix} $$
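A minimal sketch of this chaining (the rotation and translations are arbitrary example values):

```python
import numpy as np

def euclidean(S, u):
    """Pack rotation S and translation u into F = [[S, u], [0^T, 1]]."""
    F = np.eye(4)
    F[:3, :3] = S
    F[:3, 3] = u
    return F

Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])          # rotation about z by 90 degrees
F10 = euclidean(Rz90, [1.0, 0.0, 0.0])       # frame 0 -> frame 1
F21 = euclidean(np.eye(3), [0.0, 2.0, 0.0])  # frame 1 -> frame 2

Y0 = np.array([1.0, 0.0, 0.0, 1.0])          # homogeneous point in frame 0
Y2 = F21 @ F10 @ Y0                          # concatenation is one product
print(Y2)  # [1. 3. 0. 1.]
```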
Homogeneous coordinates – definition in ℝ³
A point in 3D space is represented in homogeneous coordinates by any 4-vector $(Y_1, Y_2, Y_3, Y_4)^T$ with $Y_4 \neq 0$, whose inhomogeneous coordinates are $(Y_1/Y_4,\, Y_2/Y_4,\, Y_3/Y_4)^T$. For any $\mu \neq 0$, $(Y_1, Y_2, Y_3, Y_4)^T$ and $\mu\,(Y_1, Y_2, Y_3, Y_4)^T$ represent the same inhomogeneous point: e.g. $(2,3,5,1)^T$ and $(4,6,10,2)^T$ both represent $(2,3,5)^T$.
Homogeneous coordinates – definition in ℝ²
Similarly, a 2D point is represented in homogeneous coordinates by any 3-vector $(y_1, y_2, y_3)^T$ with $y_3 \neq 0$, with inhomogeneous coordinates $(y_1/y_3,\, y_2/y_3)^T$: e.g. $(1,2,3)^T$ and $(2,4,6)^T$ both represent the same inhomogeneous point $(0.33, 0.66)^T$.
To convert an inhomogeneous point to a homogeneous vector, append a 1:
$$ (Y_1, Y_2, Y_3)^T \rightarrow (Y_1, Y_2, Y_3, 1)^T $$
To convert back, divide by the last coordinate:
$$ (Y_1, Y_2, Y_3, Y_4)^T \rightarrow (Y_1/Y_4,\; Y_2/Y_4,\; Y_3/Y_4)^T $$
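The two conversions can be sketched directly (the sample points are illustrative):

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1: (Y1, Y2, Y3) -> (Y1, Y2, Y3, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(ph):
    """Divide by the last coordinate: (Y1, Y2, Y3, Y4) -> (Y1/Y4, Y2/Y4, Y3/Y4)."""
    ph = np.asarray(ph, dtype=float)
    return ph[:-1] / ph[-1]

# Homogeneous vectors that differ by a non-zero scale mu represent
# the same inhomogeneous point:
p1 = from_homogeneous([2.0, 3.0, 5.0, 1.0])
p2 = from_homogeneous([4.0, 6.0, 10.0, 2.0])
print(p1, p2)  # [2. 3. 5.] [2. 3. 5.]
```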
Projective transformations
A projective transformation acts on homogeneous 4-vectors and is represented by a non-singular 4×4 matrix:
$$ \begin{pmatrix} Y'_1\\Y'_2\\Y'_3\\Y'_4 \end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ q_{31}&q_{32}&q_{33}&q_{34}\\ q_{41}&q_{42}&q_{43}&q_{44} \end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\Y_3\\Y_4 \end{pmatrix} $$
Original and transformed points are linked through a projection center.
Hierarchy of transformations, in 3D:
– Projective (15 dof): $(Y'_1, Y'_2, Y'_3, Y'_4)^T = Q_{4\times4}\,(Y_1, Y_2, Y_3, Y_4)^T$
– Affine (12 dof): $\begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = \begin{pmatrix}B_{3\times3} & \mathbf{u}_3\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix}$
– Similarity (7 dof): $\begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = \begin{pmatrix}tS_{3\times3} & \mathbf{u}_3\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix}$, with $t$ an isotropic scale
– Euclidean (6 dof): $\begin{pmatrix}\mathbf{Y}'\\1\end{pmatrix} = \begin{pmatrix}S_{3\times3} & \mathbf{u}_3\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{Y}\\1\end{pmatrix}$
and in 2D:
– Projective, aka homography (8 dof): $(y'_1, y'_2, y'_3)^T = H_{3\times3}\,(y_1, y_2, y_3)^T$
– Affine (6 dof): $\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = \begin{pmatrix}B_{2\times2} & \mathbf{u}_2\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{y}\\1\end{pmatrix}$
– Similarity (4 dof): $\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = \begin{pmatrix}tS_{2\times2} & \mathbf{u}_2\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{y}\\1\end{pmatrix}$
– Euclidean (3 dof): $\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = \begin{pmatrix}S_{2\times2} & \mathbf{u}_2\\\mathbf{0}^T & 1\end{pmatrix}\begin{pmatrix}\mathbf{y}\\1\end{pmatrix}$
The corresponding 2D matrices, from Euclidean (3 dof) through similarity (4 dof) and affine (6 dof) to projective (8 dof):
$$ \begin{pmatrix}\cos\theta & -\sin\theta & u_1\\ \sin\theta & \cos\theta & u_2\\ 0&0&1\end{pmatrix},\quad \begin{pmatrix}t\cos\theta & -t\sin\theta & u_1\\ t\sin\theta & t\cos\theta & u_2\\ 0&0&1\end{pmatrix},\quad \begin{pmatrix}b_{11} & b_{12} & u_1\\ b_{21} & b_{22} & u_2\\ 0&0&1\end{pmatrix},\quad \begin{pmatrix}h_{11} & h_{12} & h_{13}\\ h_{21} & h_{22} & h_{23}\\ h_{31} & h_{32} & h_{33}\end{pmatrix} $$
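One defining property of each level can be checked numerically. A sketch with arbitrary example parameters: a 2D Euclidean transform preserves distances, while a similarity scales them by its scale factor $t$.

```python
import numpy as np

theta, t = 0.3, 1.5                  # example rotation angle and scale
u1, u2 = 1.0, 2.0                    # example translation
c, s = np.cos(theta), np.sin(theta)

euclid = np.array([[c, -s, u1], [s, c, u2], [0.0, 0.0, 1.0]])
simil  = np.array([[t * c, -t * s, u1], [t * s, t * c, u2], [0.0, 0.0, 1.0]])

p = np.array([0.0, 0.0, 1.0])        # homogeneous 2D points, unit distance apart
q = np.array([1.0, 0.0, 1.0])

d_e = np.linalg.norm((euclid @ p - euclid @ q)[:2])
d_s = np.linalg.norm((simil @ p - simil @ q)[:2])
print(d_e, d_s)  # 1.0 and 1.5, up to rounding
```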
Perspective drops out of the projective transformation by constraining the transformed point to lie on a plane at depth $f$:
$$ \mathbf{Y}_{\text{image}} = (y_1,\, y_2,\, f,\, 1)^T $$
$$ \mu\begin{pmatrix}y_1\\y_2\\f\\1\end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ fq_{31}&fq_{32}&fq_{33}&fq_{34}\\ q_{31}&q_{32}&q_{33}&q_{34} \end{pmatrix}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} $$
The third row is $f$ times the fourth, so it is redundant and can be dropped:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ q_{31}&q_{32}&q_{33}&q_{34} \end{pmatrix}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} = Q_{3\times4}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} $$
$Q_{3\times4}$ is the projection matrix and this is a perspective transform.
5.3 Perspective using homogeneous coordinates
The point $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$ projects to $\mathbf{y} = (y_1, y_2)^T$ with
$$ y_1 = f\,\frac{Y_1}{Y_3}, \qquad y_2 = f\,\frac{Y_2}{Y_3} $$
In homogeneous coordinates this is linear:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \begin{pmatrix} f&0&0&0\\ 0&f&0&0\\ 0&0&1&0 \end{pmatrix}\begin{pmatrix}Y_1\\Y_2\\Y_3\\1\end{pmatrix} \;\Rightarrow\; \mu y_1 = fY_1,\; \mu y_2 = fY_2,\; \mu = Y_3 \;\Rightarrow\; y_1 = f\,\frac{Y_1}{Y_3},\; y_2 = f\,\frac{Y_2}{Y_3} $$
Image point = projection matrix × world point.
Perspective using homogeneous coordinates
The projection matrix can be decomposed into three parts:
1. a part that depends on the internals of the camera
2. a vanilla projection matrix
3. a Euclidean transformation between the world and camera frames.
Start by assuming the world frame coincides with the camera frame, so that the extrinsic camera matrix is the identity, and get:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \underbrace{\begin{pmatrix} f&0&0\\ 0&f&0\\ 0&0&1 \end{pmatrix}}_{\text{intrinsic calibration}}\; \underbrace{\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}}_{\text{projection (vanilla)}}\; \underbrace{I_{4\times4}}_{\text{extrinsic calibration}}\; \begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
Perspective using homogeneous coordinates
To generalize:
– Insert a rotation $S$ and translation $\mathbf{u}$ between world and camera coordinates.
– Insert some extra terms in the intrinsic calibration matrix.
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \underbrace{\begin{pmatrix} f&sf&v_0\\ 0&\delta f&w_0\\ 0&0&1 \end{pmatrix}}_{\text{intrinsic calibration}}\; \underbrace{\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}}_{\text{projection (vanilla)}}\; \underbrace{\begin{pmatrix} s_{11}&s_{12}&s_{13}&u_1\\ s_{21}&s_{22}&s_{23}&u_2\\ s_{31}&s_{32}&s_{33}&u_3\\ 0&0&0&1 \end{pmatrix}}_{\text{extrinsic calibration}}\; \begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
The camera’s extrinsic calibration is just the rotation $S$ and translation $\mathbf{u}$ that take points from the world frame to the camera frame:
$$ \begin{pmatrix}\mathbf{Y}_c\\1\end{pmatrix} = \begin{pmatrix} S & \mathbf{u}\\ \mathbf{0}^T & 1 \end{pmatrix}\begin{pmatrix}\mathbf{Y}_w\\1\end{pmatrix} $$
The rotation admits several representations (Euler angles, quaternions, etc.).
Euler angles capture the rotation using 3 parameters, one angle per axis:
$$ \mathbf{Y}' = S_z\,\mathbf{Y}_w = \begin{pmatrix} \cos\theta_z&\sin\theta_z&0\\ -\sin\theta_z&\cos\theta_z&0\\ 0&0&1 \end{pmatrix}\mathbf{Y}_w $$
$$ \mathbf{Y}'' = S_y\,\mathbf{Y}' = \begin{pmatrix} \cos\theta_y&0&-\sin\theta_y\\ 0&1&0\\ \sin\theta_y&0&\cos\theta_y \end{pmatrix}\mathbf{Y}' $$
$$ \mathbf{Y}_c = S_x\,\mathbf{Y}'' = \begin{pmatrix} 1&0&0\\ 0&\cos\theta_x&\sin\theta_x\\ 0&-\sin\theta_x&\cos\theta_x \end{pmatrix}\mathbf{Y}'' $$
$$ S_{cw} = S_x S_y S_z $$
Order matters!
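That order dependence is easy to verify numerically. A sketch using the standard axis rotations (the signs below follow the common active convention, so they are the transposes of the coordinate-frame rotations above):

```python
import numpy as np

def Sx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Sy(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Sz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

tx, ty, tz = 0.1, 0.2, 0.3     # arbitrary example angles
A = Sx(tx) @ Sy(ty) @ Sz(tz)   # rotate about z, then y, then x
B = Sz(tz) @ Sy(ty) @ Sx(tx)   # the same angles in the opposite order
print(np.allclose(A, B))       # False: order matters
```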
The inverse transform, from camera back to world coordinates, is
$$ \begin{pmatrix} S_{cw} & \mathbf{u}_{cw}\\ \mathbf{0}^T & 1 \end{pmatrix}^{-1} = \begin{pmatrix} S_{wc} & \mathbf{u}_{wc}\\ \mathbf{0}^T & 1 \end{pmatrix} $$
For rotation: $S_{wc} = S_{cw}^{-1} = S_{cw}^T$
For translation: $\mathbf{u}_{wc} = -S_{wc}\,\mathbf{u}_{cw}$
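The closed-form inverse can be checked numerically (the rotation and translation values are arbitrary):

```python
import numpy as np

S = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])       # example rotation: 90 degrees about z
u = np.array([1.0, 2.0, 3.0])          # example translation

F = np.eye(4); F[:3, :3] = S; F[:3, 3] = u

F_inv = np.eye(4)                      # closed-form inverse: S^T and -S^T u
F_inv[:3, :3] = S.T
F_inv[:3, 3] = -S.T @ u

print(np.allclose(F @ F_inv, np.eye(4)))  # True
```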
The intrinsics describe hardware properties of real cameras:
– The image plane might be skewed.
– The central axis of the lens might not line up with the optical axis.
– The light gathering elements might not be square.
– Lens distortion.
The intrinsic matrix composes a scaling, an origin offset, and a skew:
$$ L = \begin{pmatrix} f&0&0\\ 0&\delta f&0\\ 0&0&1 \end{pmatrix} \begin{pmatrix} 1&0&v_0/f\\ 0&1&w_0/(\delta f)\\ 0&0&1 \end{pmatrix} \begin{pmatrix} 1&s&0\\ 0&1&0\\ 0&0&1 \end{pmatrix} = \begin{pmatrix} f&sf&v_0\\ 0&\delta f&w_0\\ 0&0&1 \end{pmatrix} $$
$f$ sets the scale; $\delta$ is the aspect ratio (different scaling along the two axes); the origin offset $(v_0, w_0)$ is the principal point; $s$ accounts for skew.
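The factorization can be verified with made-up intrinsic values:

```python
import numpy as np

f, delta, s = 500.0, 1.1, 0.05         # focal length, aspect ratio, skew
v0, w0 = 320.0, 240.0                  # principal point

scale  = np.diag([f, delta * f, 1.0])
offset = np.array([[1.0, 0.0, v0 / f],
                   [0.0, 1.0, w0 / (delta * f)],
                   [0.0, 0.0, 1.0]])
skew   = np.array([[1.0, s, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

L = scale @ offset @ skew
print(L)   # equals [[f, s*f, v0], [0, delta*f, w0], [0, 0, 1]]
```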
1. Move the scene point $(\mathbf{Y}_w, 1)^T$ into camera coordinates by the $4\times4$ extrinsic Euclidean transformation: $\begin{pmatrix}\mathbf{Y}_c\\1\end{pmatrix} = \begin{pmatrix}S&\mathbf{u}\\\mathbf{0}^T&1\end{pmatrix}\begin{pmatrix}\mathbf{Y}_w\\1\end{pmatrix}$
2. Project into an ideal camera via a vanilla perspective transformation: $\mu\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix} = [\,I_{3\times3}\,|\,\mathbf{0}\,]\begin{pmatrix}\mathbf{Y}_c\\1\end{pmatrix}$
3. Map the ideal image into the real image using the intrinsic matrix: $\begin{pmatrix}\mathbf{y}\\1\end{pmatrix} = L\begin{pmatrix}\mathbf{y}'\\1\end{pmatrix}$
$L$ is called the intrinsic calibration matrix, and accounts for the internal physical characteristics of the camera.
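The three steps can be sketched end to end; all calibration numbers below are made up for illustration:

```python
import numpy as np

f, delta, s = 500.0, 1.1, 0.0          # made-up intrinsics: focal, aspect, skew
v0, w0 = 320.0, 240.0                  # made-up principal point
L = np.array([[f, s * f, v0],
              [0.0, delta * f, w0],
              [0.0, 0.0, 1.0]])        # intrinsic calibration matrix

S = np.eye(3)                          # extrinsic rotation (identity here)
u = np.array([0.0, 0.0, 2.0])          # extrinsic translation

E = np.eye(4); E[:3, :3] = S; E[:3, 3] = u    # step 1: world -> camera
P = np.hstack([np.eye(3), np.zeros((3, 1))])  # step 2: vanilla [I | 0]

Y_w = np.array([0.2, 0.1, 2.0, 1.0])   # homogeneous world point
mu_y = L @ P @ E @ Y_w                 # step 3 folds in L; gives mu*(y1, y2, 1)
y = mu_y[:2] / mu_y[2]                 # divide out the scale mu
print(y)
```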
5.4 Calibration – the elements of the perspective model
Calibration can be achieved by auto-calibration or pre-calibration. Here we consider pre-calibration, using a specially made “known” visual scene.
Putting everything together:
$$ \mu\begin{pmatrix}y_1\\y_2\\1\end{pmatrix} = \underbrace{\begin{pmatrix} f&sf&v_0\\ 0&\delta f&w_0\\ 0&0&1 \end{pmatrix}}_{\text{intrinsic calibration}}\; \underbrace{\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}}_{\text{projection (vanilla)}}\; \underbrace{\begin{pmatrix} s_{11}&s_{12}&s_{13}&u_1\\ s_{21}&s_{22}&s_{23}&u_2\\ s_{31}&s_{32}&s_{33}&u_3\\ 0&0&0&1 \end{pmatrix}}_{\text{extrinsic calibration}}\; \begin{pmatrix}\mathbf{Y}\\1\end{pmatrix} $$
i.e., with $\mathbf{y}$ and $\mathbf{Y}$ in homogeneous coordinates,
$$ \mu\,\mathbf{y} = L\;[\,I_{3\times3}\,|\,\mathbf{0}\,]\;\begin{pmatrix} S & \mathbf{u}\\ \mathbf{0}^T & 1 \end{pmatrix}\;\mathbf{Y} $$
Camera calibration: recover $L$, $S$ and $\mathbf{u}$ from images of known scene points. The plan:
1. Estimate the projection matrix $Q = L\,[\,I\,|\,\mathbf{0}\,]\begin{pmatrix}S&\mathbf{u}\\\mathbf{0}^T&1\end{pmatrix} = L\,[\,S\,|\,\mathbf{u}\,]$ from point correspondences.
2. Factor the left $3\times3$ part, $Q_{LEFT} = LS$, using QR decomposition into $S$ and $L$.
3. Recover the translation from the last column: $\mathbf{u} = L^{-1}(q_{14}, q_{24}, q_{34})^T$.
First, estimate $Q$ itself:
For each of $j = 1, \dots, n$ known scene points $\mathbf{Y}_j = (Y_{1j}, Y_{2j}, Y_{3j})^T$ with measured images $(y_{1j}, y_{2j})^T$:
$$ \mu_j\begin{pmatrix}y_{1j}\\y_{2j}\\1\end{pmatrix} = \begin{pmatrix} q_{11}&q_{12}&q_{13}&q_{14}\\ q_{21}&q_{22}&q_{23}&q_{24}\\ q_{31}&q_{32}&q_{33}&q_{34} \end{pmatrix}\begin{pmatrix}Y_{1j}\\Y_{2j}\\Y_{3j}\\1\end{pmatrix} $$
so
$$ \mu_j = q_{31}Y_{1j} + q_{32}Y_{2j} + q_{33}Y_{3j} + q_{34} $$
$$ y_{1j} = \frac{q_{11}Y_{1j} + q_{12}Y_{2j} + q_{13}Y_{3j} + q_{14}}{q_{31}Y_{1j} + q_{32}Y_{2j} + q_{33}Y_{3j} + q_{34}}, \qquad y_{2j} = \frac{q_{21}Y_{1j} + q_{22}Y_{2j} + q_{23}Y_{3j} + q_{24}}{q_{31}Y_{1j} + q_{32}Y_{2j} + q_{33}Y_{3j} + q_{34}} $$
Multiplying out gives two linear equations per point:
$$ \begin{pmatrix} Y_{1j}&Y_{2j}&Y_{3j}&1&0&0&0&0&-Y_{1j}y_{1j}&-Y_{2j}y_{1j}&-Y_{3j}y_{1j}&-y_{1j}\\ 0&0&0&0&Y_{1j}&Y_{2j}&Y_{3j}&1&-Y_{1j}y_{2j}&-Y_{2j}y_{2j}&-Y_{3j}y_{2j}&-y_{2j} \end{pmatrix}\mathbf{q} = \mathbf{0} $$
where $\mathbf{q} = (q_{11}, q_{12}, \dots, q_{34})^T$ contains the unknowns.
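This homogeneous linear system is commonly solved via SVD, taking the right singular vector with the smallest singular value as the null vector. A sketch, with a made-up ground-truth Q used to synthesize exact correspondences:

```python
import numpy as np

def estimate_Q(world_pts, image_pts):
    """Stack two DLT rows per correspondence and solve A q = 0 by SVD."""
    rows = []
    for (Y1, Y2, Y3), (y1, y2) in zip(world_pts, image_pts):
        rows.append([Y1, Y2, Y3, 1, 0, 0, 0, 0, -Y1*y1, -Y2*y1, -Y3*y1, -y1])
        rows.append([0, 0, 0, 0, Y1, Y2, Y3, 1, -Y1*y2, -Y2*y2, -Y3*y2, -y2])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)        # null vector, reshaped row-major

# Synthetic check: project known points with a known Q, then re-estimate it.
Q_true = np.array([[500.0, 0.0, 320.0, 100.0],
                   [0.0, 500.0, 240.0, 50.0],
                   [0.0, 0.0, 1.0, 2.0]])
world = np.array([[0, 0, 1], [1, 0, 2], [0, 1, 3], [1, 1, 1],
                  [2, 1, 2], [1, 2, 3], [2, 2, 4.0]])   # non-coplanar points
imgs = []
for Y in world:
    h = Q_true @ np.append(Y, 1.0)
    imgs.append(h[:2] / h[2])

Q_est = estimate_Q(world, imgs)
Q_est *= Q_true[2, 3] / Q_est[2, 3]    # fix the overall (sign and) scale
print(np.allclose(Q_est, Q_true, atol=1e-6))
```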
Stacking the equations for all $n$ points gives a $2n\times12$ system $A\mathbf{q} = \mathbf{0}$. Since $\mathbf{q}$ is defined only up to scale it has 11 degrees of freedom, so $n \geq 6$ points suffice, and $\mathbf{q}$ is found as the null vector of $A$ (e.g. via SVD).
With $Q$ known, write
$$ Q = [\,Q_{LEFT}\,|\,\mathbf{q}_4\,] = L\,[\,S\,|\,\mathbf{u}\,] $$
– Rotation and intrinsics: $Q_{LEFT} = LS$, so $Q_{LEFT}^{-1} = S^T L^{-1}$; QR decomposition of $Q_{LEFT}^{-1}$ yields an orthogonal factor $S^T$ and an upper-triangular factor $L^{-1}$, hence $S$ and $L$.
– Translation: $\mathbf{u} = L^{-1}\mathbf{q}_4 = L^{-1}(q_{14}, q_{24}, q_{34})^T$.
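The decomposition step can be sketched with numpy's QR, fixing signs so that the recovered $L$ has a positive diagonal (the ground-truth values below are made up):

```python
import numpy as np

def decompose_Q(Q):
    """Recover intrinsics L, rotation S and translation u from Q = L [S | u]."""
    Q_left = Q[:, :3]                                # Q_left = L S
    St, L_inv = np.linalg.qr(np.linalg.inv(Q_left))  # inv(Q_left) = S^T L^{-1}
    D = np.diag(np.sign(np.diag(L_inv)))             # sign fix: L diagonal > 0
    St, L_inv = St @ D, D @ L_inv                    # D^2 = I, product unchanged
    L = np.linalg.inv(L_inv)
    u = L_inv @ Q[:, 3]                              # u = L^{-1} (q14, q24, q34)^T
    return L, St.T, u

L_true = np.array([[500.0, 10.0, 320.0],
                   [0.0, 550.0, 240.0],
                   [0.0, 0.0, 1.0]])
S_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
u_true = np.array([1.0, 2.0, 3.0])

Q = L_true @ np.hstack([S_true, u_true[:, None]])
L, S, u = decompose_Q(Q)
print(np.allclose(L, L_true), np.allclose(S, S_true), np.allclose(u, u_true))
```

When $Q$ has been estimated only up to scale, one would first normalize it (e.g. so that the recovered $L$ has $L_{33} = 1$) before reading off the parameters.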
Camera Calibration – Example Algorithm
Can be done without point matches …
Calibration lets us turn our camera into a notional camera with the world and the camera coordinates aligned and an “ideal” image plane.
Real lenses also introduce distortions and aberrations. Radial distortion is the most common – see the Q sheet.
http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/example.html
In this lecture we have:
– introduced geometric computer vision, and some processing paradigms.
– described the perspective camera as a geometric device, and introduced homogeneous coordinates.
– derived the perspective projection, and saw that it could be made linear using homogeneous coordinates.
– shown how to calibrate a camera from the image of six or more known scene points.