SLIDE 1

SLIDE 2

Jean Ponce (ponce@di.ens.fr) http://www.di.ens.fr/~ponce Equipe-projet WILLOW ENS/INRIA/CNRS UMR 8548 Laboratoire d’Informatique Ecole Normale Supérieure, Paris

SLIDE 3

Cordelia Schmid

http://lear.inrialpes.fr/~schmid/

Josef Sivic

http://www.di.ens.fr/~josef/

Jean Ponce

http://www.di.ens.fr/~ponce/

Ivan Laptev

http://www.irisa.fr/vista/Equipe/People/Ivan.Laptev.html

SLIDE 4

Outline

  • What computer vision is about
  • What this class is about
  • A brief history of visual recognition
  • A brief recap on geometry
SLIDE 5

Images are brightness/color patterns drawn in a plane. They are formed by the projection of three-dimensional objects.

SLIDE 6
SLIDE 7

Camera Obscura in Edinburgh

Pinhole camera: trade-off between sharpness and light transmission

SLIDE 8

Advantages of lens systems

$E = \frac{\pi}{4}\left(\frac{d}{z'}\right)^{2}\cos^{4}\alpha\; L$

  • Can focus sharply on both close and distant objects
  • Transmits more light than a pinhole camera
SLIDE 9

Fundamental problem I: the 3D world is "flattened" into 2D images, with a loss of information.

3D scene → Lens → Image

SLIDE 10

Question: how do we see "in 3D"? (First-order) answer: with our two eyes.

SLIDE 11

Epipolar Geometry

SLIDE 12

Simulated 3D perception: disparity

SLIDE 13

Depth cues: Linear perspective

But there are other cues.

SLIDE 14

Shape from texture

SLIDE 15

Depth cues: Aerial perspective

SLIDE 16

Depth from haze [K. He, J. Sun and X. Tang, CVPR 2009]

Panels: input haze image, reconstructed images, recovered depth map.

SLIDE 17

Shape and lighting cues: Shading

Source: J. Koenderink

SLIDE 18

Source: J. Koenderink

SLIDE 19

What is happening with the shadows?

SLIDE 20

Image source: F. Durand

SLIDE 21

Challenges or opportunities?

  • Images are confusing, but they also reveal the structure of the world through numerous cues.
  • Our job is to interpret the cues!

Image source: J. Koenderink

SLIDE 22

The goal of computer vision

To perceive the “world behind the picture”, e.g.,

  • as a metric measurement device
  • as a device for measuring “semantic” information
SLIDE 23

The goal of computer vision

To perceive the “world behind the picture”, e.g.,

  • as a metric measurement device
  • as a device for “measuring” semantic information
SLIDE 24

Vision as a metric measurement device: Furukawa & Ponce (CVPR'07) (cf. also Keriven's class "Vision et reconstruction 3D")

SLIDE 25

But we want much more than 3D, e.g., visual scene analysis.

Figure: video frames annotated with objects (glass, candle, car, person, house, building), scenes (indoors, outdoors, street, road, field, countryside), and actions (drinking, kidnapping, car crash, entering/exiting through a door).

SLIDE 26

How to make sense of “pixel-chaos”?

3D Scene reconstruction Object class recognition Face recognition Action recognition

Drinking

SLIDE 27

Fundamental problem II: images do not measure meaning.

  • We need lots of prior knowledge to make meaningful interpretations of an image.

SLIDE 28

Outline

  • What computer vision is about
  • What this class is about
  • A brief history of visual recognition
  • A brief recap on geometry
SLIDE 29

Specific object detection

(Lowe, 2004)

SLIDE 30

Image classification

Caltech 101 : http://www.vision.caltech.edu/Image_Datasets/Caltech101/

SLIDE 31

Object category detection

Challenges: view variation, within-class variation, light variation, partial visibility.

SLIDE 32

Model ≡ locally rigid assembly of parts Part ≡ locally rigid assembly of features

Qualitative experiments on Pascal VOC’07 (Kushal, Schmid, Ponce, 2008)

SLIDE 33

Scene understanding

Photo courtesy A. Efros.

SLIDE 34

Local ambiguity and global scene interpretation

slide credit: Fei-Fei, Fergus & Torralba

SLIDE 35

This class:

  • 1. Introduction plus recap on geometry (J. Ponce)
  • 2. Instance-level recognition I: local invariant features (C. Schmid)
  • 3. Instance-level recognition II: correspondence, efficient visual search (J. Sivic)
  • 4. Very large scale image indexing; bag-of-features models for category-level recognition (C. Schmid)
  • 5. Sparse coding and dictionary learning for image analysis (J. Ponce)
  • 6. Part-based models and pictorial structures for object recognition (J. Sivic)
  • 7. Motion and human actions I (I. Laptev)
  • 8. Motion and human actions II (I. Laptev)
  • 9. Neural networks; optimization methods (J. Ponce)
  • 10. Category-level localization; face detection and recognition (C. Schmid)
  • 11. Multiple object categories; context; recognizing large numbers of object classes; segmentation (I. Laptev, J. Sivic)
  • 12. Final project presentations (J. Sivic, I. Laptev)

SLIDE 36

Computer vision books

  • D.A. Forsyth and J. Ponce, "Computer Vision: A Modern Approach", Prentice-Hall, 2003.
  • J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, "Toward Category-Level Object Recognition", Springer LNCS, 2007.
  • R. Szeliski, "Computer Vision: Algorithms and Applications", Springer, 2010.
  • O. Faugeras, Q.T. Luong, and T. Papadopoulo, "The Geometry of Multiple Images", MIT Press, 2001.
  • R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004.
  • J. Koenderink, "Solid Shape", MIT Press, 1990.
SLIDE 37

Class web-page

http://www.di.ens.fr/willow/teaching/recvis10 Slides available after classes:

http://www.di.ens.fr/willow/teaching/recvis10/lecture1.pptx http://www.di.ens.fr/willow/teaching/recvis10/lecture1.pdf

Note: much of the material used in this lecture is courtesy of Svetlana Lazebnik, http://www.cs.unc.edu/~lazebnik/

SLIDE 38

Outline

  • What computer vision is about
  • What this class is about
  • A brief history of visual recognition
  • A brief recap on geometry
SLIDE 39

Variability: camera position, illumination, internal parameters, within-class variations.

SLIDE 40

Variability: camera position, illumination, internal parameters.


Roberts (1963); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)

SLIDE 41

Origins of computer vision

  • L.G. Roberts, "Machine Perception of Three-Dimensional Solids", Ph.D. thesis, MIT Department of Electrical Engineering, 1963.

SLIDE 42

Huttenlocher & Ullman (1987)

SLIDE 43

Invariance to variability: camera position, illumination, internal parameters.

Duda & Hart (1972); Weiss (1987); Mundy et al. (1992-94); Rothwell et al. (1992); Burns et al. (1993)

SLIDE 44

BUT: true 3D objects do not admit monocular viewpoint invariants (Burns et al., 1993)! Projective invariants (Rothwell et al., 1992). Example: affine invariants of coplanar points.

SLIDE 45

Empirical models of image variability:

Appearance-based techniques

Turk & Pentland (1991); Murase & Nayar (1995); etc.

SLIDE 46

Eigenfaces (Turk & Pentland, 1991)

SLIDE 47

Appearance manifolds

(Murase & Nayar, 1995)

SLIDE 48

Correlation-based template matching (60s)

Ballard & Brown (1980, Fig. 3.3). Courtesy Bob Fisher and Ballard & Brown on-line.

  • Automated target recognition
  • Industrial inspection
  • Optical character recognition
  • Stereo matching
  • Pattern recognition
SLIDE 49

In the late 1990s, a new approach emerges: combining local appearance, spatial constraints, invariants, and classification techniques from machine learning.

Query and retrieved images (10° off): Schmid & Mohr'97; see also Lowe'02, Mahamud & Hebert'03.

SLIDE 50

ACRONYM (Brooks and Binford, 1981)

Representing and recognizing object categories is harder

Binford (1971), Nevatia & Binford (1972), Marr & Nishihara (1978)

SLIDE 51

The Blum transform, 1967 Generalized cylinders (Binford, 1971)

Parts and invariants

SLIDE 52

Generalized cylinders

(Binford, 1971; Marr & Nishihara, 1978) (Nevatia & Binford, 1972)

SLIDE 53

Zhu and Yuille (1996) Ponce et al. (1989) Ioffe and Forsyth (2000)

Parts and invariants II

SLIDE 54

Fergus, Perona & Zisserman (2003)

In the early 2000s, a new approach?

SLIDE 55

Ballard & Brown (1980, Fig. 11.5). Courtesy Bob Fisher and Ballard & Brown on-line.

The “templates and springs” model (Fischler & Elschlager, 1973)

SLIDE 56

slide credit: Fei-Fei, Fergus & Torralba

SLIDE 57

Color histograms (S&B'91); local jets (Florack'93); spin images (J&H'99); SIFT (Lowe'99); shape contexts (B&M'95); texton histograms (L&M'97); GIST (O&T'05); spatial pyramids (LSP'06); HOG (D&T'06); PHOG (B&Z'07); convolutional nets (LC'90)

SLIDE 58

Locally orderless structure of images (K&vD’99)

SLIDE 59

Felzenszwalb, McAllester, Ramanan (2007)

[Wins on 6 of the Pascal'07 classes; see Chum & Zisserman (2007) for the other big winner.]

SLIDE 60

Number of research papers with key-words “object recognition”, source: Springer.com

SLIDE 61

Number of papers with key-words "epipolar geometry" (visual geometry) vs. "object recognition", source: Springer.com

SLIDE 62

Visual geometry. Problems: camera calibration, 3D reconstruction, structure and motion estimation, … Tools: bundle adjustment, wide-baseline matching, …

Scale/affine-invariant regions: SIFT, Harris-Laplace, etc.

SLIDE 63

Outline

  • What computer vision is about
  • What this class is about
  • A brief history of visual recognition
  • A brief recap on geometry
SLIDE 64

Feature-based alignment outline

SLIDE 65

Feature-based alignment outline

Extract features

SLIDE 66

Feature-based alignment outline

Extract features
Compute putative matches

SLIDE 67

Feature-based alignment outline

Extract features
Compute putative matches
Loop:

  • Hypothesize transformation T (small group of putative matches that are related by T)

SLIDE 68

Feature-based alignment outline

Extract features
Compute putative matches
Loop:

  • Hypothesize transformation T (small group of putative matches that are related by T)
  • Verify transformation (search for other matches consistent with T)
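The hypothesize-and-verify loop above can be sketched in a few lines. The slides leave the transformation T generic; for simplicity this sketch assumes a pure-translation model, so a single match is a minimal sample (the function name and tolerances are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_translation(matches, n_iters=200, tol=1.0):
    """Hypothesize-and-verify loop for a pure-translation model T.

    matches: sequence of ((x, y), (x', y')) putative correspondences.
    A translation needs only one match as a minimal sample.
    """
    pts = np.array([[*p, *q] for p, q in matches], dtype=float)
    src, dst = pts[:, :2], pts[:, 2:]
    best_t, best_inliers = np.zeros(2), np.array([], dtype=int)
    for _ in range(n_iters):
        i = rng.integers(len(matches))       # minimal sample of putative matches
        t = dst[i] - src[i]                  # hypothesized transformation T
        err = np.linalg.norm(src + t - dst, axis=1)
        inliers = np.flatnonzero(err < tol)  # verify: matches consistent with T
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    # Refit T on all inliers of the best hypothesis.
    best_t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return best_t, best_inliers

# Synthetic check: 30 matches, 8 of them corrupted into gross mismatches.
src = rng.uniform(0, 100, size=(30, 2))
dst = src + np.array([5.0, 2.0])             # true translation
dst[:8] = rng.uniform(0, 100, size=(8, 2))   # outliers
t, inliers = ransac_translation(list(zip(src, dst)))
assert np.allclose(t, [5.0, 2.0], atol=0.5)
```

The same skeleton works for richer models (affine, homography); only the minimal sample size and the fitting step change.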


SLIDE 70

2D transformation models

  • Similarity (translation, scale, rotation)
  • Affine
  • Projective (homography)

Why these transformations?

SLIDE 71

Pinhole perspective equation

NOTE: z is always negative.
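The equation itself is an image in the original slides; in the Forsyth–Ponce notation, where the scene point P = (x, y, z) projects to p = (x', y') on an image plane at distance f' from the pinhole, it reads:

```latex
x' = f'\,\frac{x}{z}, \qquad y' = f'\,\frac{y}{z}
```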

SLIDE 72

Affine models: Weak perspective projection

m is the magnification.

When the scene relief is small compared to its distance from the camera, m can be taken to be constant: weak perspective projection.

SLIDE 73

Affine models: Orthographic projection

When the camera is at a (roughly constant) distance from the scene, take m = 1.
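The affine projection equations are images in the transcript; under the assumptions stated above they reduce to the following (with a reference depth z₀; the sign of m depends on the slides' convention that z is negative):

```latex
\text{weak perspective:}\quad x' = m\,x,\quad y' = m\,y,\quad m = f'/z_0
\qquad
\text{orthographic:}\quad x' = x,\quad y' = y \;\;(m = 1)
```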

SLIDE 74

Analytical camera geometry

SLIDE 75

Coordinate Changes: Pure Translations

$\overrightarrow{O_B P} = \overrightarrow{O_B O_A} + \overrightarrow{O_A P}$, i.e., ${}^B P = {}^A P + {}^B O_A$

SLIDE 76

Coordinate Changes: Pure Rotations

SLIDE 77

Coordinate Changes: Rotations about the z Axis
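The matrix on this slide is an image in the transcript; a rotation by angle θ about the z axis is:

```latex
\mathcal{R}_z(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta  & 0 \\
0          & 0           & 1
\end{pmatrix}
```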

SLIDE 78

A rotation matrix is characterized by the following properties:

  • Its inverse is equal to its transpose, and
  • its determinant is equal to 1.

Or equivalently:

  • Its rows (or columns) form a right-handed orthonormal coordinate system.
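These two characterizations are easy to check numerically; a minimal sketch with NumPy (the helper `rot_z` is illustrative, not from the slides):

```python
import numpy as np

def rot_z(theta):
    """Rotation by angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

R = rot_z(np.pi / 3)
# Property 1: the inverse equals the transpose (R is orthogonal).
assert np.allclose(np.linalg.inv(R), R.T)
# Property 2: the determinant is +1 (a proper rotation, not a reflection).
assert np.isclose(np.linalg.det(R), 1.0)
# Equivalently: the rows (and columns) form an orthonormal basis.
assert np.allclose(R @ R.T, np.eye(3))
```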
SLIDE 79

Coordinate changes: pure rotations

SLIDE 80

Coordinate Changes: Rigid Transformations
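The slide's formula is an image in the transcript; combining the pure-translation and pure-rotation cases of the previous slides, a rigid transformation composes the two:

```latex
{}^B P = {}^B_A \mathcal{R}\; {}^A P + {}^B O_A
```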

SLIDE 81

Pinhole perspective equation

NOTE: z is always negative.

SLIDE 82

The intrinsic parameters of a camera: normalized vs. physical image coordinates. Units: k, l: pixel/m; f: m; α, β: pixel.

SLIDE 83

The intrinsic parameters of a camera: the calibration matrix and the perspective projection equation.
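The calibration matrix itself is elided in the transcript; assuming the common zero-skew form, with scale factors α, β (in pixels) and principal point (u₀, v₀), it is:

```latex
\mathcal{K} =
\begin{pmatrix}
\alpha & 0 & u_0 \\
0 & \beta & v_0 \\
0 & 0 & 1
\end{pmatrix}
```

The perspective projection equation then maps normalized image coordinates to physical pixel coordinates through K; the extrinsic parameters of the next slide supply the world-to-camera rigid transformation.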

SLIDE 84

The extrinsic parameters of a camera

SLIDE 85

Perspective projections induce projective transformations between planes

SLIDE 86

Weak-perspective projection Paraperspective projection

Affine cameras

SLIDE 87

Orthographic projection Parallel projection

More affine cameras

SLIDE 88

Weak-perspective projection model, in three equivalent forms:

  • p and P both in homogeneous coordinates
  • p = A P + b (neither p nor P in homogeneous coordinates)
  • p = M P (P in homogeneous coordinates)

SLIDE 89

Affine projections induce affine transformations from planes onto their images.
SLIDE 90

Affine transformations

An affine transformation maps a parallelogram onto another parallelogram

SLIDE 91

Fitting an affine transformation

Assume we know the correspondences, how do we get the transformation?

SLIDE 92

Fitting an affine transformation

Linear system with six unknowns. Each match gives us two linearly independent equations, so we need at least three matches to solve for the transformation parameters.
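The linear system can be written out and solved directly; a minimal sketch with NumPy (the function name is illustrative, not from the slides):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src[i] -> dst[i].

    Each correspondence (x, y) -> (x', y') gives two equations in the
    six unknowns a, b, c, d, tx, ty:
        x' = a x + b y + tx
        y' = c x + d y + ty
    so at least three non-collinear matches are needed.
    """
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        A[2 * i] = [x, y, 0, 0, 1, 0]
        A[2 * i + 1] = [0, 0, x, y, 0, 1]
        b[2 * i], b[2 * i + 1] = xp, yp
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = np.array([[p[0], p[1]], [p[2], p[3]]])
    t = np.array([p[4], p[5]])
    return M, t

# Exact recovery from three non-collinear correspondences.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
M_true = np.array([[2.0, 0.5], [-0.5, 1.0]])
t_true = np.array([3.0, -1.0])
dst = src @ M_true.T + t_true
M, t = fit_affine(src, dst)
assert np.allclose(M, M_true) and np.allclose(t, t_true)
```

With more than three matches, the same call returns the least-squares fit.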

SLIDE 93

Beyond affine transformations

What is the transformation between two views of a planar surface? What is the transformation between images from two cameras that share the same center?

SLIDE 94

Perspective projections induce projective transformations between planes

SLIDE 95

Beyond affine transformations

Homography: plane projective transformation (transformation taking a quad to another arbitrary quad)

SLIDE 96

Fitting a homography

Recall: homogeneous coordinates

Converting to homogeneous image coordinates; converting from homogeneous image coordinates.
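The conversion formulas themselves are images in the transcript; they are the standard ones:

```latex
(x, y) \;\mapsto\; (x, y, 1),
\qquad
(x, y, w) \;\mapsto\; \left(\frac{x}{w},\; \frac{y}{w}\right) \quad (w \neq 0)
```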

SLIDE 97

Fitting a homography

Recall: homogeneous coordinates. Equation for homography:

Converting to homogeneous image coordinates; converting from homogeneous image coordinates.

SLIDE 98

Fitting a homography

Equation for homography:

3 equations, only 2 linearly independent. 9 entries, 8 degrees of freedom (scale is arbitrary).

SLIDE 99

Direct linear transform

H has 8 degrees of freedom (9 parameters, but scale is arbitrary). One match gives us two linearly independent equations. Four matches are needed for a minimal solution (null space of the 8×9 matrix). More than four: homogeneous least squares.
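The direct linear transform is short enough to sketch in full; a minimal implementation with NumPy (function names are illustrative, not from the slides), where the null space / homogeneous least-squares solution is the last right singular vector of the design matrix:

```python
import numpy as np

def fit_homography(src, dst):
    """DLT estimate of H such that dst ~ H @ src (in homogeneous coordinates).

    Each match contributes two linearly independent equations in the
    9 entries of H; four matches give a minimal solution as the null
    space of an 8x9 matrix. With more matches, the same SVD gives the
    homogeneous least-squares solution.
    """
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        rows.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    A = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale

def apply_h(H, pts):
    """Apply homography H to inhomogeneous 2D points."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:]

# Exact recovery from a minimal set of four matches.
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
H_true = np.array([[1.0, 0.2, 0.1],
                   [0.1, 1.0, 0.2],
                   [0.3, 0.1, 1.0]])
dst = apply_h(H_true, src)
H = fit_homography(src, dst)
assert np.allclose(H, H_true, atol=1e-8)
```

In practice the coordinates are normalized before the SVD for numerical stability; that step is omitted here for brevity.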

SLIDE 100

Application: Panorama stitching

Images courtesy of A. Zisserman.
