SLIDE 1

Object Recognition and Computer Vision 2010

Objects and scenes: Recognizing Multiple Object Classes

Josef Sivic and Ivan Laptev
http://www.di.ens.fr/~josef
INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548, Laboratoire d’Informatique, Ecole Normale Supérieure, Paris

With slides from: A. Torralba, D. Hoiem, D. Ramanan and others.

SLIDE 2

Multiclass object detection

SLIDE 3

SLIDE 4

SLIDE 5

Context: objects appear in configurations

SLIDE 6

Generalization: objects share parts

SLIDE 7

How many categories?

SLIDE 8

How many categories?

Slide by Aude Oliva

SLIDE 9

How many object categories are there?

Biederman 1987

SLIDE 10

How many categories?

Probably this question is not even specific enough to have an answer.

SLIDE 11

Which level of categorization is the right one?

A car is an object composed of: a few doors, four wheels (not all visible at all times), a roof, front lights, a windshield.

?

If you are thinking of buying a car, you might want to be a bit more specific about your categorization level.

SLIDE 12

Entry-level categories

(Jolicoeur, Gluck, Kosslyn 1984)

Typical members of a basic-level category are categorized at the expected level.

Atypical members tend to be classified at a subordinate level.

Example: a typical bird is named “a bird”, whereas an ostrich is named “an ostrich”.

SLIDE 13

We do not need to recognize the exact category.

A new class can borrow information from similar categories.

SLIDE 14

So, where is computer vision?

Well…

SLIDE 15

Multiclass object detection

The not-so-early days

SLIDE 16

Multiclass object detection

The not-so-early days

Using a set of independent binary classifiers was a common strategy:

Viola-Jones extension for dealing with rotations
two cascades for each view
Schneiderman-Kanade multiclass object detection

(a) One detector for each class

There is nothing wrong with this approach if you have access to lots of training data and you do not care about efficiency.
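A minimal sketch of strategy (a), one independently trained detector per class, is below. It only illustrates the idea and is not the Viola-Jones or Schneiderman-Kanade detector: a plain perceptron stands in for each binary classifier, the toy 2-D features are made up, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def score(w: List[float], x: Vector) -> float:
    """Linear detector score; the last weight is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_binary_perceptron(X: List[Vector], y: List[int], epochs: int = 1000) -> List[float]:
    """Train one binary classifier: +1 for the object class, -1 for everything else."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            if t * score(w, x) <= 0:  # wrong side of (or on) the decision boundary
                for i, xi in enumerate(x):
                    w[i] += t * xi
                w[-1] += t
                mistakes += 1
        if mistakes == 0:  # every training example is separated; stop early
            break
    return w


def train_one_detector_per_class(X: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Strategy (a): each class gets its own detector, trained with no knowledge of the others."""
    return {
        c: train_binary_perceptron(X, [1 if lab == c else -1 for lab in labels])
        for c in sorted(set(labels))
    }


def classify(models: Dict[str, List[float]], x: Vector) -> str:
    """Run every detector on x and return the class with the highest score."""
    return max(models, key=lambda c: score(models[c], x))


# Toy usage: three well-separated classes in a 2-D feature space.
X = [(2, 0), (3, 0.5), (0, 2), (0.5, 3), (-2, -2), (-3, -2.5)]
labels = ["car", "car", "face", "face", "screen", "screen"]
models = train_one_detector_per_class(X, labels)
print(classify(models, X[0]))  # car
```

Both training and test cost grow linearly with the number of classes, and no class benefits from another class's training data. That is the motivation for sharing features.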

SLIDE 17

Generalizing Across Categories

Can we transfer knowledge from one object category to another?

Slide by Erik Sudderth

SLIDE 18

Shared features

Is learning object class 1000 easier than learning the first?

…

Can we transfer knowledge from one object to another?

Are the shared properties interesting by themselves?

SLIDE 19

Additive models and boosting

Independent binary classifiers:

Screen detector, Car detector, Face detector

Binary classifiers that share features:

Screen detector, Car detector, Face detector

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
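To make "binary classifiers that share features" concrete, here is a simplified sketch of one boosting round's weak-learner fit. It is based on our reading of the joint boosting idea in Torralba, Murphy, Freeman. For a fixed feature and threshold θ, the classes in a subset S share one regression stump, taking value a above θ and b below it. Every class outside S gets only a constant k_c.

Several parts are our own simplifications and should not be read as the paper's method:

- θ is fixed rather than searched.
- All class subsets are tried exhaustively.
- The function names are ours.

As we understand it, the original work also searches over features and thresholds, and searches over subsets greedily.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def fit_shared_stump(v: List[float], z: Dict[str, List[int]], w: Dict[str, List[float]],
                     theta: float, S: Set[str]) -> Tuple[float, float, Dict[str, float], float]:
    """Fit h(v, c) = a if v > theta else b for c in S, and h(v, c) = k_c for c not in S.

    v: one feature value per training sample
    z: per-class labels (+1 if the sample belongs to the class, -1 otherwise)
    w: per-class boosting weights
    Returns (a, b, k, weighted squared error summed over all classes).
    """
    num_a = den_a = num_b = den_b = 0.0
    for c in S:  # pool the weighted labels of all classes that share the stump
        for vi, zi, wi in zip(v, z[c], w[c]):
            if vi > theta:
                num_a += wi * zi
                den_a += wi
            else:
                num_b += wi * zi
                den_b += wi
    a = num_a / den_a if den_a else 0.0
    b = num_b / den_b if den_b else 0.0
    # Classes that do not share the feature only get a weighted-mean constant.
    k = {c: sum(wi * zi for wi, zi in zip(w[c], z[c])) / sum(w[c]) for c in z if c not in S}
    err = 0.0
    for c in z:
        for vi, zi, wi in zip(v, z[c], w[c]):
            h = (a if vi > theta else b) if c in S else k[c]
            err += wi * (zi - h) ** 2
    return a, b, k, err


def best_sharing_subset(v: List[float], z: Dict[str, List[int]], w: Dict[str, List[float]],
                        theta: float) -> Optional[Tuple[Set[str], float, float, float]]:
    """Try every non-empty subset of classes (only feasible for a handful of classes)."""
    classes = sorted(z)
    best = None
    for r in range(1, len(classes) + 1):
        for S in combinations(classes, r):
            a, b, _, err = fit_shared_stump(v, z, w, theta, set(S))
            if best is None or err < best[3]:
                best = (set(S), a, b, err)
    return best


# Toy data: samples 0-1 are cars, 2-3 screens, 4-5 faces, 6-7 background.
v = [0.9, 0.8, 0.9, 0.7, 0.1, 0.2, 0.1, 0.3]
z = {
    "car":    [1, 1, -1, -1, -1, -1, -1, -1],
    "screen": [-1, -1, 1, 1, -1, -1, -1, -1],
    "face":   [-1, -1, -1, -1, 1, 1, -1, -1],
}
w = {c: [1.0] * len(v) for c in z}
print(best_sharing_subset(v, z, w, theta=0.5))
```

In the toy data the feature is high for both cars and screens. Sharing it between those two classes therefore gives the lowest total error: 14.0, versus 16.0 when either class uses it alone.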

SLIDE 20

Specific feature

[Figure: responses of one feature across the classes pedestrian, chair, traffic light, sign, face, and the background class]

Non-shared feature: this feature is too specific to faces.

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

SLIDE 21

Shared feature

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

SLIDE 22

Setup: 29 object classes, 50 training samples per class, 2000 entries in the dictionary; results averaged over 20 runs.

[Plot: class-specific features vs. shared features]

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

SLIDE 23

Generalization as a function of object similarities

[Plots: 12 viewpoints vs. 12 unrelated object classes; area under ROC vs. number of training samples per class]

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

SLIDE 24

Generic vs. specific features

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

SLIDE 25

Object clustering according to shared features

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

SLIDE 26

Another multi-class problem: Face recognition

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

We do not want to learn recognition of each person from scratch!

SLIDE 27

Are these images of the same person?

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 28

Prior approaches

[Diagram: images → low-level features (RGB, HOG, LBP, SIFT, …) → verification: “Different”]

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 29

Approach: attributes

[Diagram: images → low-level features (RGB, HOG, LBP, SIFT, …) → attributes (Male, Round Jaw, Asian, Dark hair, …) → verification: “Different”]

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 30

Attributes can define categories

Female Eyeglasses Middle-aged Dark hair Caucasian

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 31

Some attributes may be irrelevant

Teeth showing Outside Tilted head

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 32

Using attributes to perform verification

Verification classifier

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
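A minimal sketch of the verification step, assuming each face has already been mapped to a vector of attribute classifier outputs (one real value per attribute, such as Male, Asian, Dark hair). Following our reading of Kumar et al., the two faces are compared attribute by attribute using absolute differences and products. The linear verifier and its weights below are hypothetical stand-ins for a trained verification classifier.

```python
from typing import List, Sequence


def pair_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Compare two faces attribute by attribute.

    a, b: attribute classifier outputs for face 1 and face 2 (same attribute order).
    |a_i - b_i| measures disagreement; a_i * b_i is positive when both agree in sign.
    """
    if len(a) != len(b):
        raise ValueError("both faces must be described by the same attributes")
    diffs = [abs(ai - bi) for ai, bi in zip(a, b)]
    prods = [ai * bi for ai, bi in zip(a, b)]
    return diffs + prods


def same_person(a: Sequence[float], b: Sequence[float],
                weights: Sequence[float], bias: float) -> bool:
    """Hypothetical linear verification classifier: a positive score means 'same person'."""
    f = pair_features(a, b)
    return sum(wi * fi for wi, fi in zip(weights, f)) + bias > 0


# Two attributes; penalize differences, reward agreement (illustrative weights only).
w = [-1.0, -1.0, 1.0, 1.0]
print(same_person([1.0, -1.0], [1.0, -1.0], w, 0.0))   # True
print(same_person([1.0, -1.0], [-1.0, 1.0], w, 0.0))   # False
```

The simile classifiers on the following slides can feed the same kind of verifier, with simile outputs in place of attribute outputs.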

SLIDE 33

Attributes are intuitive

Female Black hair Young Frontal pose Attractive Mouth closed White Eyes open

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 34

Describe faces using similes

Penelope Cruz Angelina Jolie

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 35

Training simile classifiers

Positive examples: images of Penelope Cruz’s eyes. Negative examples: images of other people’s eyes.

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 36

Using simile classifiers for verification

Verification classifier

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 37

Experimental evaluation

LFW Image-Restricted Benchmark:

6,000 face pairs (3,000 same, 3,000 different)
10-fold cross-validation
N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

http://vis-www.cs.umass.edu/lfw
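The 10-fold protocol can be sketched as follows. The sketch makes three assumptions:

- The 6,000 pairs are already grouped into ten equal splits, modelled here simply as consecutive blocks of 600 pair indices.
- The `train` and `evaluate` callbacks are hypothetical placeholders for a real verification pipeline.
- Results are summarized as mean accuracy plus standard error over folds. This is a common way of reporting LFW results, not something stated on the slide.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def make_folds(n_pairs: int = 6000, k: int = 10) -> List[range]:
    """Split pair indices into k contiguous, equal-sized folds."""
    if n_pairs % k:
        raise ValueError("n_pairs must be divisible by k")
    size = n_pairs // k
    return [range(i * size, (i + 1) * size) for i in range(k)]


def summarize(fold_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean accuracy and standard error of the mean over folds."""
    k = len(fold_accuracies)
    if k < 2:
        raise ValueError("need at least two folds")
    mean = sum(fold_accuracies) / k
    var = sum((acc - mean) ** 2 for acc in fold_accuracies) / (k - 1)  # sample variance
    return mean, math.sqrt(var) / math.sqrt(k)


def cross_validate(train: Callable[[List[int]], Any],
                   evaluate: Callable[[Any, range], float],
                   folds: List[range]) -> Tuple[float, float]:
    """Train on all other folds, test on the held-out fold, repeat for every fold."""
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = train(train_idx)
        accuracies.append(evaluate(model, test_idx))
    return summarize(accuracies)


folds = make_folds()
print(len(folds), len(folds[0]))  # 10 600
```

Keeping the split fixed in advance means every method is evaluated on exactly the same pairs, which is what makes accuracies from different papers comparable.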

SLIDE 38

Previous state-of-the-art on LFW

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

as of May 2009

SLIDE 39

Kumar et al. 2009 on LFW

85.29% accuracy (a 31.68% drop in error rate)

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

as of May 2009
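The two figures on this slide are linked by simple arithmetic:

- An accuracy of 85.29% is an error rate of 14.71%.
- A 31.68% relative drop in error therefore implies a previous error of about 14.71 / (1 − 0.3168) ≈ 21.53%, i.e. roughly 78.47% accuracy.

The helper below reproduces that calculation. The function names are ours, and the implied previous accuracy is derived from the slide's numbers, not quoted from it.

```python
def relative_error_reduction(old_acc: float, new_acc: float) -> float:
    """Percentage drop in error rate when accuracy goes from old_acc to new_acc (both in %)."""
    old_err, new_err = 100.0 - old_acc, 100.0 - new_acc
    return 100.0 * (old_err - new_err) / old_err


def implied_previous_accuracy(new_acc: float, reduction_pct: float) -> float:
    """Invert the formula above: which earlier accuracy does a given error drop imply?"""
    new_err = 100.0 - new_acc
    old_err = new_err / (1.0 - reduction_pct / 100.0)
    return 100.0 - old_err


prev = implied_previous_accuracy(85.29, 31.68)
print(f"implied previous accuracy: {prev:.2f}%")  # implied previous accuracy: 78.47%
```

Reporting the relative drop in error, rather than the absolute gain in accuracy (about 6.8 points here), emphasizes how much of the remaining error was removed.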

SLIDE 40

Human face verification performance

Original: 99.20%; Cropped: 97.53%; Inverse cropped: 94.27%

N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

SLIDE 41

What about multiple objects in the same image?

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 42

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 43

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 44

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 45

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 46

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 47

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 48

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 49

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 50

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 51

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 52

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 53

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 54

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 55

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 56

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 57

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 58

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 59

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 60

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 61

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 62

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 63

C. Desai, D. Ramanan, C. Fowlkes ICCV 2009

SLIDE 64

What to do about the Object That Cannot Be Named?

Slides by Derek Hoiem, Computer Science Department, University of Illinois at Urbana-Champaign

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 65

A failure/success story

Photo by Ivan Makarov

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 66

Dealing with inevitable failure

Failure in categorization should not mean failure in recognition.

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 67

Key question

What to do about the Object That Cannot Be Named?

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 68

Example

Assisted Driving

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 69

Example

Security

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 70

Key steps

1. We need richer, more interconnected object representations.

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 71

What makes a good object representation?

Prediction

– Where will it go, what will it do, how could I use it?

Description

– What is it, what is it doing, what does it look like?

Generalization

– Applicable beyond the immediate task

Composition

– New, related objects and tasks are easier to learn

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 72

Current View of Recognition

[Diagram columns: Appearance Representation (LAB histogram, textons, HOG, bag of SIFT); Appearance Model; Object Representation; Training Examples]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 73

Current View of Recognition

[Diagram columns: Appearance Representation (LAB histogram, textons, HOG, bag of SIFT); Appearance Model; Object Representation; Training Examples]

Lots of effort – fancy stuff

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 74

Current View of Recognition

[Diagram columns: Appearance Representation (LAB histogram, textons, HOG, bag of SIFT); Appearance Model; Object Representation; Training Examples]

Not much changed

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 75

Value of basic categories

DOG: has head, is animal, is furry, is small, can be pet, eats meat

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 76

Limitations of basic categories

They provide limited prediction and description

[Two different images, both labeled DOG]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 77

Limitations of basic categories

They do not apply to objects from novel categories.

Familiar objects: Cat, Horse, Dog
New object: ???

A. Farhadi, I. Endres, and D. Hoiem 2010
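Why category names give limited description and no generalization can be shown with a toy category-to-attribute lookup. The DOG attributes are the ones listed on the "Value of basic categories" slide. The cat and horse entries, and all names, are hypothetical.

```python
from typing import Dict, FrozenSet

# Attributes implied by a basic-level category name. Only the "dog" entry is taken
# from the slide; the "cat" and "horse" entries are hypothetical.
CATEGORY_ATTRIBUTES: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"has head", "is animal", "is furry", "is small", "can be pet", "eats meat"}),
    "cat": frozenset({"has head", "is animal", "is furry", "is small", "can be pet", "eats meat"}),
    "horse": frozenset({"has head", "is animal", "has four legs", "eats plants"}),
}


def describe_via_category(category: str) -> FrozenSet[str]:
    """Category-based description: the object must be named before anything can be said about it."""
    return CATEGORY_ATTRIBUTES.get(category, frozenset())


print(sorted(describe_via_category("dog")))
print(describe_via_category("zebra"))  # frozenset(): a novel category gets no description at all
```

Two very different dogs (or a dog and a cat) receive identical descriptions, and a novel animal receives none, even though it visibly has a head and legs.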

SLIDE 78

Limitations of basic categories

They do not make it easier to learn new categories.

[Diagram: appearance features → Dog classifier; appearance features → Zebra classifier]

SLIDE 79

Category-based representation

Limited description and prediction
No generalization to objects outside of learned categories
Provides little guidance for learning

So what would make a better representation?

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 80

Attribute-based Representation

Learn intermediate structure with object categories

Multiple categories: animal, land animal, …, cat
Viewpoint/pose: lying down, left side, facing camera
Function: fast runner, climb trees, eat small animals, jump high, household pet, scratch

[Figure: cat image with labeled fur, ears, eyes, mouth, tail, feet]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 81

What we mean by attributes

Properties that we want to describe or predict
Shared across basic categories
Made explicit through supervision

Multiple categories: animal, land animal, …, cat
Viewpoint/pose: lying down, left side, facing camera
Function: fast runner, climb trees, eat small animals, jump high, household pet, scratch

[Figure: cat image with labeled fur, ears, eyes, mouth, tail, feet]

A. Farhadi, I. Endres, and D. Hoiem 2010
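Unlike a category-name lookup, an attribute-based representation predicts properties directly from appearance. An object from an unseen category can therefore still be described.

A minimal sketch follows. It assumes hypothetical linear attribute classifiers over a toy 3-D appearance feature. The attribute names and weights are illustrative and do not come from the authors' system.

```python
from typing import Dict, List, Sequence

# Hypothetical linear attribute classifiers: weights for a 3-D feature, bias as last entry.
# A real system would learn these from supervised attribute labels across many categories.
ATTRIBUTE_MODELS: Dict[str, List[float]] = {
    "has legs":   [1.0, 0.0, 0.0, -0.5],
    "is furry":   [0.0, 1.0, 0.0, -0.5],
    "has wheels": [0.0, 0.0, 1.0, -0.5],
}


def describe(feature: Sequence[float],
             models: Dict[str, List[float]] = ATTRIBUTE_MODELS) -> List[str]:
    """Return every attribute whose classifier fires; no category name is involved."""
    out = []
    for name, w in models.items():
        s = sum(wi * xi for wi, xi in zip(w, feature)) + w[-1]
        if s > 0:
            out.append(name)
    return sorted(out)


print(describe([0.9, 0.8, 0.1]))  # ['has legs', 'is furry']
print(describe([0.0, 0.0, 0.9]))  # ['has wheels']
```

Because each attribute classifier is trained on examples pooled from many categories, it can fire on an object whose category it has never seen.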

SLIDE 82

What do these attributes get us?

[Figure: attributes at the image level, object level, and detailed attributes level. Labels include: Contains donkey; categories Animal, Land animal, Mammal, Four-legged animal, Elk, Horse; pose Lying down = 1, Back = 1, …; functional Can see, Can walk, Herbivorous, …; material; pixel segmentations]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 83

Advantages of supervised attributes

Enables verbal description of objects and images

Example: “Large angry dog with pointy teeth”

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 84

Advantages of supervised attributes

Provides correspondence for objects from different categories

[Figure: objects from different categories with corresponding labels such as HEAD, LEG, STANDING, SITTING]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 85

Domain-based Recognition

Basic-level categories: Cat detector, Dog detector, …
Superordinate categories: 4-Legged Animal detector
Parts: Head detector

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 86

Domain-based Recognition

[Figure: outputs of the Cat detector, Dog detector, 4-Legged Animal detector, and Head detector on an example image, with the label Walking Left]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 87

Domain-based recognition: overview

[Diagram: Trained Detectors → Voting using Basic Level Categories / Voting using Shared Spatial Models → Object Localization; Attribute Predictors → Object Description]

Basic-level categories: Elephant, Dog, Eagle, Camel, Lizard, Bat, Penguin, Monkey, …
Broad categories: Vehicle, Animal, Four-legged Animal, Mammal, Water Animal
Parts: Leg, Horn, Wing, Head, Eye, Ear, Foot, Mouth, Nose, Tail

Example output: Animal, Four-legged Mammal; parts: Head, Leg; attributes: Can run, Can jump, Is herbivorous, Facing right

A. Farhadi, I. Endres, and D. Hoiem 2010
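The "voting using shared spatial models" step can be illustrated with a minimal Hough-style voting sketch. Each part detection casts a score-weighted vote for where the object center should be, using a per-part offset that is shared across categories.

This is only an illustration under stated assumptions, not the exact model of Farhadi, Endres, and Hoiem:

- The grid quantization, the offset values, and the function name are all hypothetical.
- Offsets are treated as fixed displacements, whereas a learned spatial model would be richer.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Detection = Tuple[str, float, float, float]  # (part name, x, y, detector score)


def vote_for_object_centers(detections: List[Detection],
                            offsets: Dict[str, Tuple[float, float]],
                            cell: float = 5.0) -> Optional[Tuple[Tuple[float, float], float]]:
    """Accumulate score-weighted votes for the object center on a coarse grid.

    offsets: expected displacement from each part to the object center. In a shared
    spatial model, one offset per part (e.g. "head") serves many categories.
    Returns the winning cell's center and its total vote, or None if nothing voted.
    """
    acc: Dict[Tuple[int, int], float] = defaultdict(float)
    for part, x, y, score in detections:
        if part not in offsets:
            continue  # no spatial model for this part type
        dx, dy = offsets[part]
        acc[(round((x + dx) / cell), round((y + dy) / cell))] += score
    if not acc:
        return None
    best = max(acc, key=acc.get)
    return (best[0] * cell, best[1] * cell), acc[best]


offsets = {"head": (5.0, 5.0), "leg": (3.0, -5.0), "tail": (-10.0, 0.0)}
dets = [("head", 10.0, 5.0, 0.9), ("leg", 12.0, 15.0, 0.8),
        ("tail", 40.0, 40.0, 0.5), ("wing", 0.0, 0.0, 1.0)]
print(vote_for_object_centers(dets, offsets))  # head and leg agree on the center (15.0, 10.0)
```

Since the votes come from parts rather than from a category-specific template, an object of a novel category can still be localized, provided its parts are recognized.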

SLIDE 88

CORE Dataset

Cross-category Object REcognition


2780 Images – from ImageNet
3192 Objects – 28 Categories
26695 Parts – 71 types
30046 Attributes – 34 types
1052 Material Images – 10 types

Download or browse online: http://vision.cs.uiuc.edu/CORE

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 89

CORE Dataset

Annotation Example

[Figure: annotated motorcycle with labels for category (Vehicle, Motorcycle), parts (Mirrors, Headlight, Seat, Gas tank, Lic. plate, Exhaust, Tail light, Wheel, Engine), materials (Metal, Rubber), and further attributes (Two-wheeled, Facing right, On the street, Has a rider)]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 90

Dataset examples: animals

Categories Seen During Training and Testing Categories Seen Only During Testing

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 91

Dataset examples: vehicles

Categories Seen During Training and Testing; Categories Seen Only During Testing

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 92

Result: Part detectors can generalize across categories

Part Detections for Novel Object

Hump Leg Head

Detectors trained using the method of (Felzenszwalb, Girshick, McAllester, Ramanan 2009)

SLIDE 93

Result: Broad category detectors can generalize across basic categories

Category Detections for Novel Object

Detections: Four-legged Animal, Mammal, Animal

Detectors trained using the method of (Felzenszwalb, Girshick, McAllester, Ramanan 2009)

SLIDE 94

Describe objects from familiar categories

[Figure: part detections labeled Trunk, Leg, Foot]

SLIDE 95

Describe objects from familiar categories

ROC for Localization of Familiar Objects

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 96

Describe objects from familiar categories

AUC for Attribute Prediction for Familiar Objects

Baseline: Infer from Basic Categories vs. Our Method: Infer from All

[Bar charts for Animals and Vehicles: AUC (axis 0.5 to 1.0) per attribute type (Has Part, Basic Cat, Broad Cat, Function, Pose)]

A. Farhadi, I. Endres, and D. Hoiem 2010
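Each bar on these charts is an area under the ROC curve (AUC) for one attribute type. As a reminder of what that number measures, here is a minimal pairwise AUC computation. It assumes binary ground truth for one attribute (1 = present, 0 = absent) and real-valued predictor scores. The function name is ours.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability that a
    randomly chosen positive scores higher than a randomly chosen negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance, which is why the chart axes start at 0.5. The quadratic pairwise loop is fine for illustration; sorting-based implementations are preferable for large test sets.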

SLIDE 97

Result using only basic categories

Elk Semi Truck Eagle Snowmobile Camel Dog

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 98

Result 3: We can find and describe objects from novel categories

[Figure: two objects from novel categories. First object: Animal, Four-legged Mammal; parts Head, Leg; attributes Can run, Can jump, Is herbivorous, Facing right. Second object: Vehicle; part Wheel; attributes Moves on road, Facing right]

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 99

Result 3: We can find and describe objects from novel categories

ROC for Localization of Unfamiliar Objects

A. Farhadi, I. Endres, and D. Hoiem 2010

SLIDE 100

Result 3: We can find and describe objects from novel categories

AUC for Attribute Prediction for Unfamiliar Objects

Baseline: Infer from Basic Categories vs. Our Method: Infer from All

[Bar charts for Animals and Vehicles: AUC (axis 0.5 to 0.8) per attribute type (Has Part, Broad Cat, Function, Pose)]

SLIDE 101

Summary of Findings

Current detectors are good enough to recognize general parts and broad categories.

Learning to recognize parts and broad categories improves both detection and description.

By going beyond categories, we can partially recognize novel objects.

A. Farhadi, I. Endres, and D. Hoiem 2010