Designing Applications that See, Lecture 7: Object Recognition



slide-1
SLIDE 1

stanford hci group / cs377s

Designing Applications that See Lecture 7: Object Recognition

Designing Applications that See http://cs377s.stanford.edu

Dan Maynes-Aminzade 29 January 2008

slide-2
SLIDE 2

Reminders

Pick up graded Assignment #1
Assignment #2 released today
Fill out the interim course evaluation
CS247L workshop on Flash tomorrow night, 6-8PM in Wallenberg 332
Next class: OpenCV tutorial

Bring your webcams
We will be using Windows with Visual Studio (you can try to follow along on your own machine, but you will need to figure out things like library and include paths on your system)

29 January 2008 2 Lecture 7: Object Recognition

slide-3
SLIDE 3

Today’s Goals

Understand the challenges of object recognition
Learn about some algorithms that attempt to recognize particular objects (and object classes) from their basic constituent features

slide-4
SLIDE 4

Outline

Object recognition: general problem description
Boosted tree classifiers
SIFT algorithm

slide-5
SLIDE 5

Fast, Accurate, General Recognition

This guy is wearing a haircut called a “mullet”

(courtesy of G. Bradski)

slide-6
SLIDE 6

Quick, Spot the Mullets!


(courtesy of G. Bradski)

slide-7
SLIDE 7

What Makes a Car a Car?

CARS NOT CARS


slide-8
SLIDE 8

Object Detection

Goal: find an object of a pre-defined class in a static image or video frame.
Approach:

Extract certain image features, such as edges, color regions, textures, contours, etc.
Use some heuristics to find configurations and/or combinations of those features specific to the object of interest

slide-9
SLIDE 9

Many Approaches to Recognition

Approaches arranged by relations (geometric vs. non-geometric) and by features (local vs. global):

Geometric: Patches/Ullman, Constellation/Perona, Eigen Objects/Turk, Shape models
Non-geometric: Histograms/Schiele, HMAX/Poggio, MRF/Freeman, Murphy

(courtesy of G. Bradski)

slide-10
SLIDE 10

A Common 2-Step Strategy


slide-11
SLIDE 11

Possible Things to Look For

Symmetry
Color
Shadow
Corners
Edges
Texture
Taillights

slide-12
SLIDE 12

Symmetry

Observed from rear view, a car generally has vertical symmetry
Problems:

Symmetry estimation is sensitive to noise
Prone to false detections, such as symmetrical background objects
Doesn’t work for partly occluded vehicles

slide-13
SLIDE 13

Color

Road is a fairly constant color
Non-road regions within a road area are potential vehicles
Problems:

Color of an object depends on illumination, reflectance properties of the object, viewing geometry, and camera properties
Color of an object can be very different during different times of the day, under different weather conditions, and under different poses

slide-14
SLIDE 14

Shadow

Area underneath a vehicle is distinctly darker than any other areas on an asphalt paved road.
Problems:

Doesn’t work in rain, under bad illumination (under a bridge for example)
Intensity of the shadow depends on illumination of the image: how to choose appropriate threshold values?

slide-15
SLIDE 15

Corners

Vehicles in general have a rectangular shape
Can use four templates, one for each corner, to detect all corners in the image, and then use a search method to find valid configurations (matching corners)

slide-16
SLIDE 16

Edges

Rear views of cars contain many horizontal and vertical structures (rear-window, bumper, etc.)
One possibility: horizontal edge detector on the image (such as Sobel operator), then sum response in each column to locate horizontal position (should be at the peaks)
Problem:

Lots of parameters: threshold values for the edge detectors, the threshold values for picking the most important vertical and horizontal edges, and the threshold values for choosing the best maxima in profile image
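The column-profile idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses NumPy and a synthetic image, and the function name `horizontal_edge_profile` and the half-of-peak threshold are hypothetical choices, not something specified in the lecture.

```python
import numpy as np

def horizontal_edge_profile(image):
    """Sum the absolute Sobel horizontal-edge response in each column."""
    img = np.asarray(image, dtype=float)
    gy = np.zeros_like(img)
    # 3x3 Sobel kernel for the vertical derivative (responds to horizontal edges):
    # row below weighted [1, 2, 1] minus row above weighted [1, 2, 1].
    # Border pixels are left at zero.
    below = img[2:, :-2] + 2.0 * img[2:, 1:-1] + img[2:, 2:]
    above = img[:-2, :-2] + 2.0 * img[:-2, 1:-1] + img[:-2, 2:]
    gy[1:-1, 1:-1] = below - above
    return np.abs(gy).sum(axis=0)

# Synthetic 8x10 image with a bright horizontal bar (a stand-in for a bumper)
# spanning columns 3 to 6.
image = np.zeros((8, 10))
image[4, 3:7] = 1.0

profile = horizontal_edge_profile(image)
threshold = 0.5 * profile.max()  # hypothetical threshold; needs tuning in practice
columns = np.flatnonzero(profile > threshold)
print(columns)
```

Even this toy version needs one hand-picked threshold, which is the parameter-tuning problem the slide points out.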


slide-17
SLIDE 17

Texture

Presence of a car causes local intensity changes
General similarities among all vehicles means that the intensity changes may follow a certain pattern
Problem: in most environments the background contains lots of texture as well

slide-18
SLIDE 18

Taillights

Fairly salient feature of all vehicles
Problems:

A little different on every car
Not that bright during the daytime; probably would work only at night.

slide-19
SLIDE 19

Other Factors to Consider

Perspective: This is not a likely position / size
Shape: Trace along the outer contour

slide-20
SLIDE 20

General Problem

For complex objects, such as vehicles, it is hard to find features and heuristics that will handle the huge variety of instances of the object class:

May be rotated in any direction
Lots of different kinds of cars in different colors
May be a truck
May have a missing headlight, bumper stickers, etc.
May be half in light, half in shadow

slide-21
SLIDE 21

Statistical Model Training

Training Set

Positive Samples Negative Samples

Different features are extracted from the training samples and distinctive features that can be used to classify the object are selected.
This information is “compressed” into the statistical model parameters.
Each time the trained classifier does not detect an object (misses the object) or mistakenly detects the absent object (gives a false alarm), the model is adjusted.

slide-22
SLIDE 22

Training in OpenCV

Uses simple features and a cascade of boosted tree classifiers as a statistical model.

Paul Viola and Michael J. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE CVPR, 2001.
Rainer Lienhart and Jochen Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP 2002, Vol. 1, pp. 900-903, Sep. 2002.

slide-23
SLIDE 23

Approach Summary

Classifier is trained on images of fixed size (Viola uses 24x24)
Detection is done by sliding a search window of that size through the image and checking whether an image region at a certain location “looks like a car” or not.
Image (or classifier) can be scaled to detect objects of different sizes.
A very large set of very simple “weak” classifiers, each using a single feature, classifies the image region as car or non-car.
Each feature is described by its template (shape of the feature), its coordinates relative to the search window origin, and its size (scale factor).
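The sliding-window search can be sketched as a generator of window placements. This is a minimal sketch, not OpenCV's implementation: the 24-pixel base window comes from the slide, while the function name `sliding_windows` and the `step` and `scale_factor` parameters are hypothetical values chosen for illustration. Here the window is scaled rather than the image.

```python
def sliding_windows(width, height, window=24, step=4, scale_factor=1.25):
    """Yield (x, y, size) for every search-window placement at every scale."""
    size = window
    while size <= min(width, height):
        # Move larger windows in proportionally larger steps.
        stride = max(1, round(step * size / window))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield x, y, size
        # Grow the window; the max() guards against a factor that would not grow it.
        size = max(size + 1, round(size * scale_factor))

# A 48x48 image scanned with a coarse step and a doubling scale factor.
windows = list(sliding_windows(48, 48, step=8, scale_factor=2.0))
print(len(windows), windows[0], windows[-1])
```

Each yielded region would then be passed to the classifier, which is why the per-window test has to be very cheap.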


slide-24
SLIDE 24

Types of Features

Feature’s value is a weighted sum of two components:

Pixel sum over the black rectangle
Sum over the whole feature area
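A two-component feature of this kind can be sketched with an integral image, the summed-area table used in the Viola and Jones paper to get rectangle sums in constant time. The weights below are an assumption made for illustration: they are chosen so that the feature is zero on a uniform region. The function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum over the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_feature(ii, whole, black):
    """Weighted sum of the whole-area sum and the black-rectangle sum.

    whole and black are (x, y, w, h); black lies inside whole.
    Assumed weights: -1 for the whole area, area ratio for the black part.
    """
    w_whole = -1.0
    w_black = (whole[2] * whole[3]) / (black[2] * black[3])
    return w_whole * rect_sum(ii, *whole) + w_black * rect_sum(ii, *black)

# 4x4 image with a vertical edge: dark left half, bright right half.
edge = [[0, 0, 10, 10] for _ in range(4)]
ii = integral_image(edge)
value_right = haar_feature(ii, whole=(0, 0, 4, 4), black=(2, 0, 2, 4))
value_left = haar_feature(ii, whole=(0, 0, 4, 4), black=(0, 0, 2, 4))

flat = [[5] * 4 for _ in range(4)]
value_flat = haar_feature(integral_image(flat), (0, 0, 4, 4), (0, 0, 2, 4))
print(value_right, value_left, value_flat)
```

The sign and magnitude of the value indicate how strongly the region resembles the template, and a flat region gives no response.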


slide-25
SLIDE 25

Weak Classifier

Computed feature value is used as input to a very simple decision tree classifier with 2 terminal nodes

1 means “car” and -1 means “non-car”

Bar detector works well for “nose,” a face detecting stump. It doesn’t work well for cars.
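A decision tree with 2 terminal nodes is a decision stump: one threshold on one feature value. The sketch below is an assumption-laden illustration, not the OpenCV training code; the names `stump_predict` and `train_stump` and the exhaustive threshold search are hypothetical.

```python
def stump_predict(value, threshold, polarity):
    """Return +1 ("car") or -1 ("non-car") from a single feature value."""
    return polarity if value >= threshold else -polarity

def train_stump(values, labels, weights=None):
    """Pick the threshold and polarity with the lowest weighted error."""
    n = len(values)
    if weights is None:
        weights = [1.0 / n] * n
    best = (float("inf"), 0.0, 1)  # (error, threshold, polarity)
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            error = sum(w for v, y, w in zip(values, labels, weights)
                        if stump_predict(v, threshold, polarity) != y)
            if error < best[0]:
                best = (error, threshold, polarity)
    return best

# Toy feature values: negatives respond weakly, positives strongly.
values = [-3.0, -1.0, 2.0, 4.0]
labels = [-1, -1, 1, 1]
error, threshold, polarity = train_stump(values, labels)
print(error, threshold, polarity)
```

On real data a single stump is only slightly better than chance, which is why many of them are combined.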

slide-26
SLIDE 26

Boosted Classifier

A complex and robust classifier is built out of multiple weak classifiers using a procedure called boosting.
The boosted classifier is built iteratively as a weighted sum of weak classifiers:

$F = \operatorname{sign}(c_1 f_1 + c_2 f_2 + \dots + c_n f_n)$

On each iteration, a new weak classifier $f_i$ is trained and added to the sum.
The smaller the error $f_i$ gives on the training set, the larger the coefficient $c_i$ that is assigned to it.
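The iterative construction can be sketched with discrete AdaBoost over decision stumps. Treat this as an assumption: the slide only says that a smaller error earns a larger coefficient, and the specific formula c = 0.5 * ln((1 - err) / err) and the re-weighting rule below are the textbook AdaBoost choices, not necessarily what OpenCV uses. All function names are hypothetical.

```python
import math

def predict_stump(x, feature, threshold, polarity):
    return polarity if x[feature] >= threshold else -polarity

def best_stump(samples, labels, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(samples, labels, weights)
                          if predict_stump(x, feature, threshold, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def train_boosted(samples, labels, rounds):
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []  # entries are (c_i, feature, threshold, polarity)
    for _ in range(rounds):
        err, feature, threshold, polarity = best_stump(samples, labels, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0)
        c = 0.5 * math.log((1.0 - err) / err)     # smaller error, larger c
        ensemble.append((c, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-c * y * predict_stump(x, feature, threshold, polarity))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict_boosted(ensemble, x):
    score = sum(c * predict_stump(x, f, t, p) for c, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# One-dimensional toy data that no single stump can separate.
samples = [[0], [1], [2], [3], [4], [5]]
labels = [1, 1, -1, -1, 1, 1]
ensemble = train_boosted(samples, labels, rounds=3)
predictions = [predict_boosted(ensemble, x) for x in samples]
print(predictions)
```

A single stump misclassifies at least two of these six samples, while the weighted sum of three stumps classifies all of them.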


slide-27
SLIDE 27

Cascade of Boosted Classifiers

Sequence of boosted classifiers with constantly increasing complexity
Chained into a cascade with the simpler classifiers going first.
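The benefit of putting simple stages first is early rejection: as described in the Viola and Jones paper, a window is discarded as soon as any stage rejects it, so most background windows cost only one cheap test. The sketch below uses two toy scoring functions as stand-ins for real boosted stages; their names and thresholds are hypothetical.

```python
def cascade_detect(stages, window):
    """Run stages in order; return (accepted, number_of_stages_evaluated)."""
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, index  # rejected early, later stages are skipped
    return True, len(stages)

# Toy stand-ins for boosted stages, operating on a flat list of pixel values.
def mean_brightness(window):
    return sum(window) / len(window)

def contrast(window):
    return max(window) - min(window)

stages = [(mean_brightness, 50), (contrast, 100)]  # cheap stage first

dark = [10, 20, 15, 5]
flat_bright = [200, 210, 205, 190]
candidate = [30, 220, 40, 210]

for window in (dark, flat_bright, candidate):
    print(cascade_detect(stages, window))
```

Only the windows that survive every stage are reported as detections.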


slide-28
SLIDE 28

Classifier Cascade Demo


slide-29
SLIDE 29

Identifying a Particular Object

Correlation-based template matching?

Computationally infeasible when object rotation, scale, illumination and 3D pose vary
Even more infeasible with partial occlusion

slide-30
SLIDE 30

Local Image Features

We need to learn a set of local object features that are unaffected by

Nearby clutter
Partial occlusion

… and invariant to

Illumination
3D projective transforms
Common object variations

...but, at the same time, sufficiently distinctive to identify specific objects among many alternatives!

slide-31
SLIDE 31

What Features to Use?

Line segments, edges and region grouping?

Detection not good enough for reliable recognition

Peak detection in local image variations?

Example: Harris corner detector
Problem: image examined at only a single scale
Different key locations as the image scale changes

slide-32
SLIDE 32

Invariant Local Features

Goal: Transform image content into local feature coordinates that are invariant to translation, rotation, and scaling


slide-33
SLIDE 33

SIFT Method

Scale Invariant Feature Transform (SIFT)
Staged filtering approach

Identifies stable points (called image “keys”)

Computation time less than 2 seconds
Local features:

Invariant to image translation, scaling, rotation
Partially invariant to illumination changes and 3D projection (up to 20° of rotation)
Minimally affected by noise
Similar properties to neurons in inferior temporal cortex, which are used for object recognition in primate vision

slide-34
SLIDE 34

SIFT: Algorithm Outline

Detect extrema across a “scale-space”

Uses a difference-of-Gaussians function

Localize keypoints

Find sub-pixel location and scale to fit a model

Assign orientations to keypoints

1 or more for each keypoint

Build keypoint descriptor

Created from local image gradients


slide-35
SLIDE 35

Scale Space

Build a pyramid of images

Images are difference-of-Gaussian functions
Resampling between each level

slide-36
SLIDE 36

Finding Keypoints

Basic idea: find strong corners in the image, but in a scale invariant fashion

Run a linear filter (difference of Gaussians) at different resolutions on image pyramid

[Figure: difference of Gaussians]
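A difference-of-Gaussians filter can be sketched by blurring the image at two nearby scales and subtracting. This is a minimal NumPy sketch under stated assumptions: the function names, the kernel radius of about three standard deviations, and the scale ratio `k` are illustrative choices, borders are zero-padded, the image must be larger than the kernel, and the resampling between pyramid levels is left out.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at about 3 standard deviations."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable blur: filter every row, then every column."""
    kernel = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def difference_of_gaussians(image, sigma, k=2 ** 0.5):
    """Blur at scales k*sigma and sigma, then subtract."""
    image = np.asarray(image, dtype=float)
    return gaussian_blur(image, k * sigma) - gaussian_blur(image, sigma)

# A single bright pixel in the middle of a 21x21 image.
blob = np.zeros((21, 21))
blob[10, 10] = 1.0
dog_blob = difference_of_gaussians(blob, sigma=1.0)

# A constant image has no structure, so the response away from the border is ~0.
dog_flat = difference_of_gaussians(np.full((21, 21), 5.0), sigma=1.0)
print(dog_blob[10, 10], dog_flat[10, 10])
```

The filter responds strongly to blob-like structure of roughly the filter's size and not at all to flat regions, which is what makes its extrema useful as keypoints.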

slide-37
SLIDE 37

Finding Keypoints

Keypoints are the extreme values in the difference of Gaussians function across the scale space
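Extremum detection across scale space can be sketched as a comparison of each sample against its neighbors in space and scale. The 26-neighbor rule used here (8 in the same level, 9 in each adjacent level) follows Lowe's later description of SIFT and is an assumption relative to this slide; the function name and the `min_abs` contrast cutoff are hypothetical.

```python
import numpy as np

def scale_space_extrema(dog_stack, min_abs=0.0):
    """Return (scale, y, x) of samples that beat all 26 neighbors.

    dog_stack has shape (scales, height, width), all levels the same size.
    """
    d = np.asarray(dog_stack, dtype=float)
    keypoints = []
    for s in range(1, d.shape[0] - 1):
        for y in range(1, d.shape[1] - 1):
            for x in range(1, d.shape[2] - 1):
                value = d[s, y, x]
                if abs(value) <= min_abs:
                    continue  # skip flat or low-contrast samples
                cube = d[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbors = np.delete(cube, 13)  # index 13 is the center sample
                if value > neighbors.max() or value < neighbors.min():
                    keypoints.append((s, y, x))
    return keypoints

# Three 5x5 levels with one maximum and one minimum in the middle level.
stack = np.zeros((3, 5, 5))
stack[1, 1, 1] = 1.0
stack[1, 3, 3] = -1.0
keypoints = scale_space_extrema(stack)
print(keypoints)
```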


slide-38
SLIDE 38

Key Localization

Find maxima and minima, and extract image gradient and orientation at each level to build an orientation histogram

[Figure: 1st level and 2nd level of the pyramid]

slide-39
SLIDE 39

Select Canonical Orientation

Create histogram of local gradient directions computed at selected scale
Assign canonical orientation at peak of smoothed histogram

[Figure: histogram of gradient directions over 0 to 2π]
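The canonical orientation step can be sketched as a magnitude-weighted histogram of gradient directions followed by a peak search. The details are assumptions for illustration: 36 bins, a circular [1, 2, 1]/4 smoothing kernel, and the function name `canonical_orientation` are not specified on the slide, and angles are measured in array coordinates (y increasing downward).

```python
import numpy as np

def canonical_orientation(patch, bins=36):
    """Return the dominant gradient direction of a patch, in radians."""
    p = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(p)                       # derivatives along rows, columns
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2.0 * np.pi)    # map to [0, 2*pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 2.0 * np.pi),
                           weights=magnitude)
    # Circular smoothing so the peak is less sensitive to bin boundaries.
    smoothed = (np.roll(hist, 1) + 2.0 * hist + np.roll(hist, -1)) / 4.0
    peak = int(np.argmax(smoothed))
    return (peak + 0.5) * (2.0 * np.pi / bins)    # center of the peak bin

# Intensity ramp rising equally along x and y: gradient points at 45 degrees.
ys, xs = np.mgrid[0:9, 0:9]
ramp = xs + ys
theta = canonical_orientation(ramp)
print(np.degrees(theta))
```

Descriptors are then expressed relative to this angle, so rotating the image rotates the reference direction with it.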

slide-40
SLIDE 40

Orientation Histogram

Stored relative to the canonical orientation for the keypoint, to achieve rotation invariance


slide-41
SLIDE 41

SIFT Keypoints

Feature vector describing the local image region sampled relative to its scale-space coordinate frame
Represents blurred image gradient locations in multiple orientation planes and at multiple scales

slide-42
SLIDE 42

SIFT: Experimental Results


slide-43
SLIDE 43

SIFT: Finding Matches

Goal: identify candidate object matches

The best candidate match is the nearest neighbor (i.e., minimum Euclidean distance between descriptor vectors)
The exact solution for high dimensional vectors has high complexity, so SIFT uses the approximate Best-Bin-First (BBF) search method (Beis and Lowe)
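Nearest-neighbor matching can be sketched with an exact brute-force search. Note the substitution: this is not Best-Bin-First, only the exact search that BBF approximates, and it costs one distance computation per query and database pair. The function name and the two-dimensional toy descriptors are hypothetical; real SIFT descriptors are much longer vectors.

```python
import math

def nearest_neighbor_matches(query, database):
    """For each query descriptor, return (query_index, best_index, distance)."""
    matches = []
    for qi, q in enumerate(query):
        distances = [math.dist(q, d) for d in database]  # Euclidean distance
        best = min(range(len(database)), key=distances.__getitem__)
        matches.append((qi, best, distances[best]))
    return matches

# Toy 2-D descriptors.
database = [[4.0, 4.0], [0.0, 1.0], [9.0, 9.0]]
query = [[0.0, 0.0], [5.0, 5.0]]
matches = nearest_neighbor_matches(query, database)
print(matches)
```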

Goal: Final verification

Low-residual least-squares fit
Solution of a linear system: $x = [A^T A]^{-1} A^T b$
When at least 3 keys agree with low residual, there is strong evidence for the presence of the object
Since there are dozens of keys in the image, this still works with partial occlusion
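The verification step can be sketched as a least-squares fit of an affine transform from model keypoints to image keypoints, solved through the normal equations. The parameterization is an assumption for illustration: x = (m1, m2, m3, m4, tx, ty) with u = m1*x + m2*y + tx and v = m3*x + m4*y + ty, so 3 matches give the 6 equations needed for the 6 unknowns. The function name `fit_affine` is hypothetical.

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Solve x = (A^T A)^-1 A^T b for the affine parameters and return the residual."""
    model_pts = np.asarray(model_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_pts   # rows for u: [x, y, 0, 0, 1, 0]
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts   # rows for v: [0, 0, x, y, 0, 1]
    A[1::2, 5] = 1.0
    b = image_pts.reshape(-1)  # interleaved u0, v0, u1, v1, ...
    # Normal equations; np.linalg.lstsq is the numerically safer alternative.
    x = np.linalg.solve(A.T @ A, A.T @ b)
    residual = float(np.linalg.norm(A @ x - b))
    return x, residual

# Model points mapped by a scale of 2 and a translation of (3, -1).
model = [(0, 0), (1, 0), (0, 1), (1, 1)]
image = [(3, -1), (5, -1), (3, 1), (5, 1)]
params, residual = fit_affine(model, image)
print(params, residual)
```

A match set whose residual stays low under one shared transform is geometrically consistent, which is the evidence the slide refers to.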


slide-44
SLIDE 44

SIFT Example


(courtesy of David Lowe)

slide-45
SLIDE 45

Partial Occlusion Example


(courtesy of David Lowe)

slide-46
SLIDE 46

SIFT Demo


slide-47
SLIDE 47

Context Focus?


(courtesy of Kevin Murphy)

slide-48
SLIDE 48

Importance of Context


(courtesy of Kevin Murphy)

slide-49
SLIDE 49

Importance of Context

We know there is a keyboard present in this scene even if we cannot see it clearly.

We know there is no keyboard present in this scene… even if there is one indeed.

(courtesy of Kevin Murphy)

slide-50
SLIDE 50

Summary

SIFT

Very robust recognition of a specific object, even with messy backgrounds and varying illumination conditions
Very quick and easy to train
A little too slow to run at interactive frame rates (if the background is complex)

Classifier Cascades

Can recognize a general class of objects (rather than a specific object)
Less robust (more false positives / missed detections)
Slow to train, requires many examples
Runs well at interactive frame rates