Object Detection Sanja Fidler CSC420: Intro to Image Understanding - - PowerPoint PPT Presentation

SLIDE 1

Object Detection

Sanja Fidler CSC420: Intro to Image Understanding 1 / 48

SLIDE 2

Object Detection

The goal of object detection is to localize objects in an image and tell their class.
Localization: place a tight bounding box around the object.
Most approaches find only objects of one or a few specific classes, e.g. car or cow.

SLIDE 3

Type of Approaches

Different approaches tackle detection differently. They can roughly be categorized into three main types:
Find interest points, followed by Hough voting

SLIDE 4

Interest Point Based Approaches

Compute interest points (e.g., the Harris corner detector is a popular choice).
Vote for where the object could be, given the content around the interest points.

SLIDE 9

Type of Approaches

Different approaches tackle detection differently. They can roughly be categorized into three main types:
Find interest points, followed by Hough voting
Sliding windows: “slide” a box around the image and classify each image crop inside a box (contains object or not?)

SLIDE 10

Sliding Window Approaches

Slide window and ask a classifier: “Is sheep in window or not?”

Classifier confidence at successive window positions: 0.1, -0.2, -0.1, 0.1, ..., 1.5, ..., 0.5, 0.4, 0.3

[Slide: R. Urtasun]
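
The sliding-window loop can be sketched as follows. This is only a minimal illustration: `score_fn` is a hypothetical stand-in for the classifier, and the window size, stride, and toy image are assumptions chosen for the example, not values from the slides.

```python
import numpy as np

def sliding_window_detect(image, score_fn, win=(4, 4), stride=2):
    """Score every window and return (best_score, (row, col)) of the best one."""
    h, w = image.shape[:2]
    best_score, best_pos = -np.inf, None
    for r in range(0, h - win[0] + 1, stride):
        for c in range(0, w - win[1] + 1, stride):
            crop = image[r:r + win[0], c:c + win[1]]
            s = score_fn(crop)  # classifier confidence for this crop
            if s > best_score:
                best_score, best_pos = s, (r, c)
    return best_score, best_pos

# Toy image with one bright 4x4 "object"; hypothetical classifier = mean brightness.
image = np.zeros((8, 8))
image[4:8, 2:6] = 1.0
best_score, best_pos = sliding_window_detect(image, lambda crop: float(crop.mean()))
```

In practice the scan is usually repeated over several window sizes or image scales, and a real classifier replaces the brightness score.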

SLIDE 19

Type of Approaches

Different approaches tackle detection differently. They can roughly be categorized into three main types:
Find interest points, followed by Hough voting
Sliding windows: “slide” a box around the image and classify each image crop inside a box (contains object or not?)
Generate region (object) proposals, and classify each region

SLIDE 20

Region Proposal Based Approaches

Group pixels into object-like regions

SLIDE 23

Region Proposal Based Approaches

Generate many different regions

SLIDE 26

Region Proposal Based Approaches

The hope is that at least a few will cover real objects

SLIDE 28

Region Proposal Based Approaches

Select a region

SLIDE 29

Region Proposal Based Approaches

Crop out an image patch around it and pass it to a classifier (e.g., a neural net)

SLIDE 30

Region Proposal Based Approaches

Do this for every region

SLIDE 33

Type of Approaches

Different approaches tackle detection differently. They can roughly be categorized into three main types:
Find interest points, followed by Hough voting ← Let’s first look at one example method for this
Sliding windows: “slide” a box around the image and classify each image crop inside a box (contains object or not?)
Generate region (object) proposals, and classify each region

SLIDE 34

Object Detection via Hough Voting:

Implicit Shape Model

B. Leibe, A. Leonardis, B. Schiele. Robust Object Detection with Interleaved Categorization and Segmentation. IJCV, 2008.
Paper: http://www.vision.rwth-aachen.de/publications/pdf/leibe-interleaved-ijcv07final.pdf

SLIDE 35

Start with Simple: Line Detection

How can I find lines in this image? [Source: K. Grauman]

SLIDE 36

Hough Transform

Idea: voting (Hough Transform).
Voting is a general technique where we let the features vote for all models that are compatible with them.
Cycle through features, cast votes for model parameters.
Look for model parameters that receive a lot of votes.
[Source: K. Grauman]

SLIDE 37

Hough Transform: Line Detection

Hough space: parameter space.
Connection between image (x, y) and Hough (m, b) spaces:
A line in the image corresponds to a point in Hough space.
What does a point (x0, y0) in the image space map to in Hough space?

[Source: S. Seitz]

SLIDE 38

Hough Transform: Line Detection

Hough space: parameter space.
Connection between image (x, y) and Hough (m, b) spaces:
A line in the image corresponds to a point in Hough space.
A point in image space votes for all the lines that go through this point. These votes form a line in the Hough space.
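
To see why the votes form a line, fix the image point $(x_0, y_0)$ and treat $m$ and $b$ as the unknowns of the line equation:

```latex
y_0 = m\,x_0 + b \quad\Longleftrightarrow\quad b = -x_0\,m + y_0
```

In $(m, b)$ space this is a line with slope $-x_0$ and intercept $y_0$.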

[Source: S. Seitz]

SLIDE 39

Hough Transform: Line Detection

Hough space: parameter space.
Two points: each point corresponds to a line in the Hough space.
The point where these two lines meet defines a line in the image!

[Source: S. Seitz]
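
As a small worked example of the two-point case, the sketch below intersects the two Hough-space lines b = -x·m + y to recover (m, b). The function name `line_through` and the sample points are assumptions made for illustration.

```python
def line_through(p1, p2):
    """Intersect the Hough-space lines b = -x1*m + y1 and b = -x2*m + y2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # The two Hough-space lines are parallel: no finite (m, b) exists.
        raise ValueError("vertical line: slope m is undefined in (m, b) space")
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b

# Two image points lying on y = 2x + 1
m, b = line_through((1.0, 3.0), (3.0, 7.0))
```

The `ValueError` branch is the vertical-line case, which is the weakness of the (m, b) parameterization.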

SLIDE 40

Hough Transform: Line Detection

Hough space: parameter space.
Vote with each image point.
Find peaks in Hough space. Each peak is a line in the image.

[Source: S. Seitz]

SLIDE 41

Hough Transform: Line Detection

Issue with the usual (m, b) parameter space: it is undefined for vertical lines.
A better representation is a polar representation of lines.

[Source: S. Seitz]

SLIDE 42

Example Hough Transform

With the parameterization x cos θ + y sin θ = d:
Points in picture represent sinusoids in parameter space.
Points in parameter space represent lines in picture.
Example: 0.6x + 0.8y = 2.4; sinusoids intersect at d = 2.4, θ = 0.9273.
[Source: M. Kazhdan, slide credit: R. Urtasun]
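
The polar example can be checked numerically. The sketch below assumes the unit normal (cos θ, sin θ) = (0.6, 0.8), which is what θ = 0.9273 and d = 2.4 imply, and uses three sample points chosen here for illustration.

```python
import math

# Normal direction of the line 0.6x + 0.8y = 2.4 (0.6^2 + 0.8^2 = 1)
theta = math.atan2(0.8, 0.6)

# Three image points on that line (chosen for this example)
points = [(4.0, 0.0), (0.0, 3.0), (2.0, 1.5)]

# Each point traces the sinusoid d(theta) = x cos(theta) + y sin(theta);
# evaluating all three at the same theta should give the same d.
ds = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]
```

All three sinusoids pass through the same (θ, d), which is the intersection point named in the example.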

SLIDE 43

Hough Transform: Line Detection

Hough Voting algorithm

[Source: S. Seitz]
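
The algorithm on this slide did not survive extraction. A common accumulator formulation of Hough voting for lines, using the polar form x cos θ + y sin θ = d, is sketched below. It is not necessarily the slide's exact version; the bin counts, `d_max`, and the toy points are assumptions.

```python
import numpy as np

def hough_lines(points, d_max, n_theta=180, n_d=200):
    """Accumulate votes over (theta, d) for x cos(theta) + y sin(theta) = d."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    d_values = np.linspace(-d_max, d_max, n_d)
    acc = np.zeros((n_theta, n_d), dtype=int)
    for x, y in points:
        d = x * np.cos(thetas) + y * np.sin(thetas)  # one d per theta
        bins = np.round((d + d_max) / (2 * d_max) * (n_d - 1)).astype(int)
        ok = (bins >= 0) & (bins < n_d)
        acc[np.nonzero(ok)[0], bins[ok]] += 1  # one vote per (theta, d) cell
    return acc, thetas, d_values

# Toy edge points on the horizontal line y = 5
pts = [(float(x), 5.0) for x in range(10)]
acc, thetas, d_values = hough_lines(pts, d_max=20.0)
t_idx, d_idx = np.unravel_index(acc.argmax(), acc.shape)
theta_hat, d_hat = thetas[t_idx], d_values[d_idx]
```

The peak sits at θ ≈ π/2 with d within one bin width of 5; the bin size sets the accuracy of the recovered line.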

SLIDE 44

Hough Transform: Circle Detection

What about circles? How can I fit circles around these coins?

SLIDE 45

Hough Transform: Circle Detection

Assume we are looking for a circle of known radius r.
Circle: $(x - a)^2 + (y - b)^2 = r^2$
Hough space (a, b): a point (x0, y0) maps to $(a - x_0)^2 + (b - y_0)^2 = r^2$ → a circle around (x0, y0) with radius r.
Each image point votes for a circle in Hough space.

[Source: H. Rhody]
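
A minimal sketch of the known-radius case: every image point votes for all centers (a, b) at distance r from it, and the accumulator peak is the circle center. The grid size, angular sampling, and toy points are assumptions made for this example.

```python
import numpy as np

def hough_circle_centers(points, r, shape, n_angles=360):
    """Vote in (a, b) space for centers of circles with known radius r."""
    acc = np.zeros(shape, dtype=int)
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    for x0, y0 in points:
        a = np.round(x0 + r * np.cos(angles)).astype(int)
        b = np.round(y0 + r * np.sin(angles)).astype(int)
        ok = (a >= 0) & (a < shape[0]) & (b >= 0) & (b < shape[1])
        hit = np.zeros(shape, dtype=bool)
        hit[a[ok], b[ok]] = True  # each cell counted at most once per point
        acc += hit
    return acc

# Toy input: 12 points on a circle of radius 5 centered at (10, 10)
t = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
pts = np.stack([10 + 5 * np.cos(t), 10 + 5 * np.sin(t)], axis=1)
acc = hough_circle_centers(pts, r=5, shape=(21, 21))
center = np.unravel_index(acc.argmax(), acc.shape)
```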

SLIDE 46

Hough Transform: Circle Detection

What if we don’t know r? Hough space: ?

[Source: K. Grauman]

SLIDE 47

Hough Transform: Circle Detection

What if we don’t know r? Hough space: cones (with unknown r the Hough space is 3D, (a, b, r), and each image point votes for a cone)

[Source: K. Grauman]

SLIDE 48

Hough Transform: Circle Detection

Find the coins

[Source: K. Grauman]

SLIDE 49

Hough Transform: Circle Detection

Iris detection

[Source: K. Grauman]

SLIDE 50

Generalized Hough Voting

Hough Voting for general shapes

Dana H. Ballard, Generalizing the Hough Transform to Detect Arbitrary Shapes, 1980

SLIDE 51

Implicit Shape Model

Implicit Shape Model adopts the idea of voting.
Basic idea:
Find interest points in an image
Match patch around each interest point to a training patch
Vote for object center given that training instance

SLIDE 52

Implicit Shape Model: Basic Idea

Vote for object center

SLIDE 57

Implicit Shape Model: Basic Idea

Find the patches that produced the peak

SLIDE 58

Implicit Shape Model: Basic Idea

Place a box around these patches → objects!

SLIDE 59

Implicit Shape Model: Basic Idea

Really easy. Only one problem... Would be slow... How do we make it fast?

SLIDE 60

Implicit Shape Model: Basic Idea

Visual vocabulary (we saw this for retrieval).
Compare each patch to a small set of visual words (clusters).

SLIDE 61

Implicit Shape Model: Basic Idea

Training: Getting the vocabulary

SLIDE 62

Implicit Shape Model: Basic Idea

Find interest points in each training image

SLIDE 63

Implicit Shape Model: Basic Idea

Collect patches around each interest point

SLIDE 64

Implicit Shape Model: Basic Idea

Collect patches across all training examples

SLIDE 65

Implicit Shape Model: Basic Idea

Cluster the patches to get a small set of “representative” patches
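
One way to obtain such representative patches is to cluster patch descriptors. The sketch below uses plain k-means as a stand-in; the slides do not say which clustering method is used, and the paper may use a different one. The 1-D toy descriptors, `k`, and the seed are assumptions for illustration only.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: returns (centers, labels); centers act as visual words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k samples
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)       # nearest center per patch
        for j in range(k):
            if np.any(labels == j):             # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Toy 1-D "patch descriptors" forming two well-separated groups
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
codebook, labels = kmeans(X, k=2)
```

Real patch descriptors are high-dimensional, and the vocabulary size is a tuning choice.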

SLIDE 66

Implicit Shape Model: Training

Represent each training patch with the closest visual word. Record the displacement vectors for each word across all training examples.

[Figure: training image; visual codeword with displacement vectors]

[Leibe et al. IJCV 2008]

SLIDE 67

Implicit Shape Model: Test

At test time, detect interest points.
Assign each patch around an interest point to the closest visual word.
Vote with all displacement vectors for that word.
[Source: B. Leibe]
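
The training and test steps can be put together in a small sketch. This is a simplified illustration, not the authors' implementation: descriptors are toy 2-D vectors, votes are unweighted, and the helper names (`nearest_word`, `train`, `vote`) are hypothetical.

```python
import numpy as np
from collections import defaultdict

def nearest_word(descriptor, codebook):
    """Index of the closest visual word (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(codebook - descriptor, axis=1)))

def train(descriptors, positions, object_center, codebook):
    """Record, per visual word, displacements from patch position to center."""
    displacements = defaultdict(list)
    for desc, pos in zip(descriptors, positions):
        w = nearest_word(desc, codebook)
        displacements[w].append(np.asarray(object_center) - np.asarray(pos))
    return displacements

def vote(descriptors, positions, codebook, displacements, shape):
    """Each test patch casts one vote per displacement stored for its word."""
    acc = np.zeros(shape, dtype=int)
    for desc, pos in zip(descriptors, positions):
        for disp in displacements.get(nearest_word(desc, codebook), []):
            r, c = np.asarray(pos) + disp
            if 0 <= r < shape[0] and 0 <= c < shape[1]:
                acc[int(r), int(c)] += 1
    return acc

# Toy data: two visual words, one training object centered at (5, 5)
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
disp = train([[0.1, 0.0], [0.9, 1.0]], [(3, 5), (5, 8)], (5, 5), codebook)
# Test image: the same parts shifted by (+2, +1), so the center moves to (7, 6)
acc = vote([[0.0, 0.1], [1.0, 0.9]], [(5, 6), (7, 9)], codebook, disp, (12, 12))
peak = np.unravel_index(acc.argmax(), acc.shape)
```

Both test patches vote for the same cell, so the accumulator peak lands on the shifted object center.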

SLIDE 68

Recognition Pipeline

[Source: B. Leibe]

SLIDE 69

Recognition Summary

Apply interest points and extract features around selected locations.
Match those to the codebook.
Collect consistent configurations using Generalized Hough Transform.
Each entry votes for a set of possible positions and scales in continuous space.
Extract maxima in the continuous space using Mean Shift.
Refinement can be done by sampling more local features.

[Source: R. Urtasun]

SLIDE 70

Example

Original image [Source: B. Leibe, credit: R. Urtasun]

SLIDE 71

Example

Interest points [Source: B. Leibe, credit: R. Urtasun]

SLIDE 72

Example

Matched patches [Source: B. Leibe, credit: R. Urtasun]

SLIDE 73

Example

Voting space [Source: B. Leibe, credit: R. Urtasun]

SLIDE 74

Example

1st hypothesis [Source: B. Leibe, credit: R. Urtasun]

SLIDE 75

Example

2nd hypothesis [Source: B. Leibe, credit: R. Urtasun]

SLIDE 76

Example

3rd hypothesis [Source: B. Leibe, credit: R. Urtasun]

SLIDE 77

Scale Invariant Voting

Scale-invariant feature selection:
  Scale-invariant interest points
  Rescale extracted patches
  Match to constant-size codebook
Generate scale votes:
  Scale as 3rd dimension in voting space
  $x_{vote} = x_{img} - x_{occ}\,(s_{img}/s_{occ})$
  $y_{vote} = y_{img} - y_{occ}\,(s_{img}/s_{occ})$
  $s_{vote} = s_{img}/s_{occ}$
  Search for maxima in 3D voting space
[Source: B. Leibe, credit: R. Urtasun]
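
The three vote formulas translate directly into code. In the sketch below, the reading of (x_occ, y_occ, s_occ) as the stored training occurrence (offset and scale) is an assumption, as are the function name `scale_vote` and the numbers in the example.

```python
def scale_vote(x_img, y_img, s_img, x_occ, y_occ, s_occ):
    """Return the (x, y, s) vote cast by one matched feature."""
    ratio = s_img / s_occ            # s_vote
    x_vote = x_img - x_occ * ratio   # stored offset rescaled to the test scale
    y_vote = y_img - y_occ * ratio
    return x_vote, y_vote, ratio

# A feature found at (100, 80) at twice the scale of its training occurrence
v = scale_vote(x_img=100.0, y_img=80.0, s_img=4.0,
               x_occ=10.0, y_occ=-5.0, s_occ=2.0)
```

Because the offset is multiplied by the scale ratio, an object seen at twice the training size casts votes twice as far from the feature.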

SLIDE 78

Scale Invariant Voting

[Figure: search window in the (x, y, s) voting space]

[Slide credit: R. Urtasun]

SLIDE 79

Scale Voting: Efficient Computation

Continuous Generalized Hough Transform:
Binned accumulator array similar to standard Gen. Hough Transf.
Quickly identify candidate maxima locations.
Refine locations by Mean-Shift search only around those points.
Avoid quantization effects by keeping exact vote locations.

[Figure: four panels over (x, y, s) axes: scale votes; binned accum. array; candidate maxima; refinement (Mean-Shift)]

[Source: B. Leibe, credit: R. Urtasun]
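
The refinement step can be illustrated with a basic mean-shift iteration over the exact vote locations. This is a minimal sketch under simplifying assumptions: a flat kernel, a single bandwidth shared by x, y, and s, and toy votes. The actual method may use a different kernel and a scale-adapted bandwidth.

```python
import numpy as np

def mean_shift(votes, start, bandwidth=1.0, n_iter=50, tol=1e-6):
    """Move `start` toward a local density maximum of `votes` (flat kernel)."""
    votes = np.asarray(votes, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        near = votes[np.linalg.norm(votes - x, axis=1) <= bandwidth]
        if len(near) == 0:
            break                      # no votes in the window: stop
        new_x = near.mean(axis=0)      # shift to the mean of nearby votes
        if np.linalg.norm(new_x - x) < tol:
            break                      # converged
        x = new_x
    return x

# Toy (x, y, s) votes: a tight cluster plus one outlier
votes = [(10.0, 10.0, 1.0), (10.2, 9.8, 1.0), (9.8, 10.2, 1.0), (30.0, 5.0, 2.0)]
# `start` plays the role of a candidate maximum from the binned accumulator
mode = mean_shift(votes, start=(10.5, 10.5, 1.0), bandwidth=2.0)
```

Starting only from the binned candidates keeps the search cheap, while the exact vote positions avoid quantization error in the final estimate.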

SLIDE 80

Extension: Rotation-Invariant Detection

Polar instead of Cartesian voting scheme.
Recognize objects under image-plane rotations.
Possibility to share parts between articulations.
But also increases false positive detections.
[Source: B. Leibe, credit: R. Urtasun]

SLIDE 81

Sometimes it’s Necessary

Figure from [Mikolajczyk et al., CVPR’06]

[Source: B. Leibe, credit: R. Urtasun]

SLIDE 82

Recognition and Segmentation

Augment each visual word with metadata: for example, a segmentation mask

SLIDE 83

Recognition and Segmentation

[Figure labels: Local Features; Matched Codebook Entries; Probabilistic Voting; 3D Voting Space (continuous; axes x, y, s); Backprojection of Maxima; Backprojected Hypotheses; Pixel Contributions; Backproject Meta-information; Segmentation]

SLIDE 84

Results

[Source: B. Leibe]

SLIDE 85

Results

[Source: B. Leibe]

SLIDE 86

Results

Office chairs Dining room chairs

[Source: B. Leibe]

SLIDE 87

Results

[Source: B. Leibe]

SLIDE 88

Inferring Other Information: Part Labels

[Figure panels: Training; Test; Output]

[Source: B. Leibe]

SLIDE 89

Inferring Other Information: Part Labels

[Source: B. Leibe]

SLIDE 90

Inferring Other Information: Depth

“Depth from a single image”

[Source: B. Leibe]

SLIDE 91

Conclusion

Exploits a lot of parts (as many as interest points).
Very simple voting scheme: Generalized Hough Transform.
Works well, but not as well as Deformable Part-based Models with latent SVM training (next time).
Extensions: train the weights discriminatively.
Code, datasets & several pre-trained detectors available at http://www.vision.ee.ethz.ch/bleibe/code

[Source: B. Leibe, credit: R. Urtasun]
