Objects and scenes: Recognizing Multiple Object Classes



slide-1
SLIDE 1

Reconnaissance d’objets et vision artificielle 2010 (Object Recognition and Computer Vision 2010)

Objects and scenes: Recognizing Multiple Object Classes

Josef Sivic and Ivan Laptev
http://www.di.ens.fr/~josef
INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548
Laboratoire d’Informatique, Ecole Normale Supérieure, Paris

With slides from: A. Torralba, D. Hoiem, D. Ramanan and others.

slide-2
SLIDE 2

Multiclass object detection

slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5

Context: objects appear in configurations

slide-6
SLIDE 6

Generalization: objects share parts

slide-7
SLIDE 7

How many categories?

slide-8
SLIDE 8

How many categories?

Slide by Aude Oliva

slide-9
SLIDE 9

How many object categories are there?

Biederman 1987

slide-10
SLIDE 10

How many categories?

  • Probably this question is not even specific enough to have an answer

slide-11
SLIDE 11

Which level of categorization is the right one?

A car is an object composed of: a few doors, four wheels (not all visible at all times), a roof, front lights, a windshield

?

If you are thinking of buying a car, you might want to be a bit more specific about your categorization level.

slide-12
SLIDE 12

Entry-level categories

(Jolicoeur, Gluck, Kosslyn 1984)

  • Typical members of a basic-level category are categorized at the expected level
  • Atypical members tend to be classified at a subordinate level.

A bird
An ostrich

slide-13
SLIDE 13

We do not need to recognize the exact category

A new class can borrow information from similar categories

slide-14
SLIDE 14

So, where is computer vision?

Well…

slide-15
SLIDE 15

Multiclass object detection

the not so early days

slide-16
SLIDE 16

Multiclass object detection

the not so early days

Using a set of independent binary classifiers was a common strategy:

  • Viola-Jones extension for dealing with rotations
  • two cascades for each view
  • Schneiderman-Kanade multiclass object detection

(a) One detector for each class

There is nothing wrong with this approach if you have access to lots of training data and you do not care about efficiency.
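
To make the cost of this strategy concrete, here is a minimal sketch of "one detector for each class". The linear scorers, their weights, and the helper names (`make_linear_detector`, `detect_all`) are illustrative assumptions, not the detectors named on the slide. The point is only that every class has its own scorer and each one is run independently on the same input.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Detector = Callable[[Features], float]  # returns a confidence score


def make_linear_detector(weights: Sequence[float], bias: float) -> Detector:
    """Hypothetical stand-in for one trained binary detector."""
    def score(x: Features) -> float:
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


def detect_all(x: Features, detectors: Dict[str, Detector],
               threshold: float = 0.0) -> List[str]:
    """Run every class detector on x; work grows linearly with the number of classes."""
    return [name for name, det in detectors.items() if det(x) > threshold]


# Toy detectors with made-up weights (assumption, for illustration only).
detectors = {
    "face": make_linear_detector([1.0, 0.0], -0.5),
    "car": make_linear_detector([0.0, 1.0], -0.5),
    "screen": make_linear_detector([-1.0, -1.0], 0.5),
}
```

Nothing is shared between the entries of `detectors`, so adding a class means training and evaluating one more full detector, which is the efficiency and training-data concern raised above.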

slide-17
SLIDE 17

Generalizing Across Categories

Can we transfer knowledge from one object category to another?

Slide by Erik Sudderth

slide-18
SLIDE 18

Shared features

  • Is learning the 1000th object class easier than learning the first?
  • Can we transfer knowledge from one object to another?
  • Are the shared properties interesting by themselves?

slide-19
SLIDE 19

Additive models and boosting

  • Independent binary classifiers: Screen detector, Car detector, Face detector
  • Binary classifiers that share features: Screen detector, Car detector, Face detector

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
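
The contrast between the two bullets can be sketched as an additive model whose weak learners are each tied to a subset of classes. This is only loosely in the spirit of the shared-feature boosting cited on the slide: the stump form, the class `SharedStump`, and all numbers are assumptions for illustration, and the training procedure that chooses which classes share a feature is not shown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class SharedStump:
    feature: int             # index of the feature this weak learner reads
    threshold: float
    a: float                 # added when the feature exceeds the threshold
    b: float                 # constant offset
    classes: FrozenSet[str]  # classes that share this weak learner

    def response(self, x: Sequence[float]) -> float:
        return self.a * (x[self.feature] > self.threshold) + self.b


def class_scores(x: Sequence[float], stumps: Iterable[SharedStump],
                 all_classes: Iterable[str]) -> Dict[str, float]:
    """Additive model: each class score is the sum of the stumps it shares."""
    scores = {c: 0.0 for c in all_classes}
    for s in stumps:
        r = s.response(x)        # the feature is evaluated once...
        for c in s.classes:      # ...and reused by every class sharing it
            scores[c] += r
    return scores


# Toy model (made-up parameters): one stump shared by two classes, one specific to faces.
stumps = [
    SharedStump(0, 0.5, 1.0, -0.5, frozenset({"screen", "car"})),
    SharedStump(1, 0.5, 2.0, -1.0, frozenset({"face"})),
]
```

With independent classifiers every stump would have a single-element `classes` set; sharing lets one feature evaluation contribute to several detectors at once.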

slide-20
SLIDE 20

Specific feature

Classes shown: pedestrian, chair, traffic light, sign, face, background class

Non-shared feature: this feature is too specific to faces.

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

slide-21
SLIDE 21

Shared feature

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

slide-22
SLIDE 22

  • 50 training samples/class
  • 29 object classes
  • 2000 entries in the dictionary
  • Results averaged on 20 runs

Class-specific features vs. shared features

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

slide-23
SLIDE 23

Generalization as a function of object similarities

Panels: 12 viewpoints; 12 unrelated object classes

Axes: area under ROC vs. number of training samples per class

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

slide-24
SLIDE 24

Generic vs. specific features

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

slide-25
SLIDE 25

Object clustering according to shared features

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

slide-26
SLIDE 26

Another multi-class problem: Face recognition

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

We do not want to learn recognition of each person from scratch!

slide-27
SLIDE 27

Are these images of the same person?

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-28
SLIDE 28

Prior approaches

Images → Low-level features (RGB, HOG, LBP, SIFT, …) → Verification → Different

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-29
SLIDE 29

Approach: attributes

Images → Low-level features (RGB, HOG, LBP, SIFT, …) + Attributes (Male, Round jaw, Asian, Dark hair) → Verification → Different

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

slide-30
SLIDE 30

Attributes can define categories

Female, Eyeglasses, Middle-aged, Dark hair, Caucasian

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-31
SLIDE 31

Some attributes may be irrelevant

Teeth showing, Outside, Tilted head

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-32
SLIDE 32

Using attributes to perform verification

Verification classifier

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
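
A minimal sketch of how attribute outputs could feed a verification classifier: each face is summarized by a vector of attribute scores, and the pair is turned into a single feature vector. The particular pair features (absolute difference and product per attribute), the linear decision rule, and the names `pair_features` and `verify` are assumptions for illustration; the slide does not specify them.

```python
from typing import List, Sequence

AttributeVector = Sequence[float]  # one score per attribute classifier


def pair_features(a: AttributeVector, b: AttributeVector) -> List[float]:
    """Combine two attribute vectors into one vector describing the pair."""
    feats: List[float] = []
    for ai, bi in zip(a, b):
        feats.append(abs(ai - bi))  # small when the two faces agree on the attribute
        feats.append(ai * bi)       # positive when both scores have the same sign
    return feats


def verify(a: AttributeVector, b: AttributeVector,
           weights: Sequence[float], bias: float = 0.0) -> bool:
    """Hypothetical linear verification classifier: True means 'same person'."""
    f = pair_features(a, b)
    return sum(w * v for w, v in zip(weights, f)) + bias > 0
```

Because the verifier only sees attribute scores, it does not need to be retrained for each new person, which matches the motivation stated earlier in this part of the deck.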
slide-33
SLIDE 33

Attributes are intuitive

Female, Black hair, Young, Frontal pose, Attractive, Mouth closed, White, Eyes open

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-34
SLIDE 34

Describe faces using similes

Penelope Cruz Angelina Jolie

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-35
SLIDE 35

Training simile classifiers

Images of Penelope Cruz’s eyes vs. images of other people’s eyes

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-36
SLIDE 36

Using simile classifiers for verification

Verification classifier

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-37
SLIDE 37

Experimental evaluation

LFW Image-Restricted Benchmark:

  • 6,000 face pairs (3,000 same, 3,000 different)
  • 10-fold cross-validation
  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

http://vis-www.cs.umass.edu/lfw
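
The evaluation protocol above can be sketched as follows. With 6,000 pairs and 10 folds of equal size, each fold holds 600 pairs. The function `cross_validate`, the `Trainer` interface, and the toy data generator are hypothetical; in the real benchmark the folds are predefined by the dataset rather than generated.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str, bool]             # (image_a, image_b, same_person)
Predictor = Callable[[str, str], bool]
Trainer = Callable[[List[Pair]], Predictor]


def cross_validate(folds: List[List[Pair]], train: Trainer) -> float:
    """Train on all folds but one, test on the held-out fold, average the accuracies."""
    accuracies = []
    for i, test_fold in enumerate(folds):
        train_pairs = [p for j, fold in enumerate(folds) if j != i for p in fold]
        predict = train(train_pairs)
        correct = sum(predict(a, b) == same for a, b, same in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / len(accuracies)


def toy_folds(n_folds: int = 10, per_class: int = 3) -> List[List[Pair]]:
    """Made-up stand-in data: half 'same' pairs and half 'different' pairs per fold."""
    folds = []
    for k in range(n_folds):
        same = [(f"a{k}_{i}", f"a{k}_{i}", True) for i in range(per_class)]
        diff = [(f"a{k}_{i}", f"b{k}_{i}", False) for i in range(per_class)]
        folds.append(same + diff)
    return folds
```

A predictor that always answers "same" scores 50% on balanced folds, which is the chance level against which the accuracies on the following slides should be read.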

slide-38
SLIDE 38

Previous state-of-the-art on LFW

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

as of May 2009

slide-39
SLIDE 39

Kumar et al. 2009 on LFW

85.29% Accuracy (31.68% Drop in error rates)

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009

as of May 2009
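
The two numbers on this slide are linked by simple arithmetic, assuming "drop in error rates" means a relative reduction of the error (100% minus accuracy). Under that assumption, the sketch below recovers the previous accuracy implied by the slide, roughly 78.5%. This is a derived figure, not one stated in the deck, and the function names are illustrative.

```python
def relative_error_drop(prev_accuracy: float, new_accuracy: float) -> float:
    """Relative reduction in error rate, in percent."""
    prev_err = 100.0 - prev_accuracy
    new_err = 100.0 - new_accuracy
    return 100.0 * (prev_err - new_err) / prev_err


def implied_previous_accuracy(new_accuracy: float, drop_percent: float) -> float:
    """Invert the formula above: which previous accuracy gives this relative drop?"""
    new_err = 100.0 - new_accuracy
    prev_err = new_err / (1.0 - drop_percent / 100.0)
    return 100.0 - prev_err


prev = implied_previous_accuracy(85.29, 31.68)  # roughly 78.5
```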

slide-40
SLIDE 40

Human face verification performance

  • Original: 99.20%
  • Cropped: 97.53%
  • Inverse Cropped: 94.27%

  • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar ICCV 2009
slide-41
SLIDE 41

What about multiple objects in the same image?

  • C. Desai, D. Ramanan, C. Fowlkes ICCV 2009
slide-42 to slide-63
SLIDES 42 to 63
  • C. Desai, D. Ramanan, C. Fowlkes ICCV 2009
slide-64
SLIDE 64

What to do about The Object That Cannot Be Named?

Slides by Derek Hoiem
Computer Science Department, University of Illinois at Urbana-Champaign

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-65
SLIDE 65

A failure/success story

Photo by Ivan Makarov

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-66
SLIDE 66

Dealing with inevitable failure

Failure in categorization should not mean failure in recognition

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-67
SLIDE 67

Key question

What to do about the Object That Cannot Be Named?

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-68
SLIDE 68

Example

Assisted Driving

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-69
SLIDE 69

Example

Security

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-70
SLIDE 70

Key steps

  • 1. We need richer, more interconnected object representations

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-71
SLIDE 71

What makes a good object representation?

  • Prediction

– Where will it go, what will it do, how could I use it?

  • Description

– What is it, what is it doing, what does it look like?

  • Generalization

– Applicable beyond the immediate task

  • Composition

– New, related objects and tasks are easier to learn

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-72
SLIDE 72

Current View of Recognition

Stages shown: Appearance Representation, Appearance Model, Object Representation, Training Examples

Appearance features: LAB Histogram, Textons, HOG, Bag of SIFT

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-73
SLIDE 73

Current View of Recognition

Stages shown: Appearance Representation, Appearance Model, Object Representation, Training Examples

Appearance features: LAB Histogram, Textons, HOG, Bag of SIFT

Lots of effort – fancy stuff

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-74
SLIDE 74

Current View of Recognition

Stages shown: Appearance Representation, Appearance Model, Object Representation, Training Examples

Appearance features: LAB Histogram, Textons, HOG, Bag of SIFT

Not much changed

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-75
SLIDE 75

Value of basic categories

DOG: Has head, Is animal, Is furry, Is small, Can be pet, Eats meat

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-76
SLIDE 76

Limitations of basic categories

They provide limited prediction and description

DOG

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-77
SLIDE 77

Limitations of basic categories

They do not apply to objects from novel categories

Familiar Objects: Cat, Horse, Dog

New Object: ???

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-78
SLIDE 78

Limitations of basic categories

They do not make it easier to learn new categories

Appearance Features → Dog Classifier
Appearance Features → Zebra Classifier

slide-79
SLIDE 79

Category-based representation

  • Limited description and prediction
  • No generalization to objects outside of learned categories
  • Provides little guidance for learning

So what would make a better representation?

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-80
SLIDE 80

Attribute-based Representation

Learn intermediate structure with object categories

Multiple Categories: animal, land animal, …, cat

Viewpoint/pose: lying down, left side, facing camera

Function: fast runner, climb trees, eat small animals, jump high, household pet, scratch

Labeled in the figure: fur, ears, eyes, tail, mouth, feet

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-81
SLIDE 81

What we mean by attributes

  • Properties that we want to describe or predict
  • Shared across basic categories
  • Made explicit through supervision

Multiple Categories: animal, land animal, …, cat

Viewpoint/pose: lying down, left side, facing camera

Function: fast runner, climb trees, eat small animals, jump high, household pet, scratch

Labeled in the figure: fur, ears, eyes, tail, mouth, feet

  • A. Farhadi, I. Endres, and D. Hoiem 2010
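
One way to picture an attribute-based representation is as a small record per object rather than a single category label. The sketch below is an assumption about how such a record might be laid out: the class `ObjectDescription`, its field names, and the "unknown animal" are hypothetical. The cat values are taken from the slide, with the elided intermediate categories left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ObjectDescription:
    categories: List[str]                       # from broad to basic; may be incomplete
    parts: Set[str] = field(default_factory=set)
    pose: Set[str] = field(default_factory=set)
    functions: Set[str] = field(default_factory=set)


def shared_attributes(a: ObjectDescription, b: ObjectDescription) -> Set[str]:
    """Attributes two objects have in common, regardless of their basic category."""
    return (a.parts & b.parts) | (a.pose & b.pose) | (a.functions & b.functions)


cat = ObjectDescription(
    categories=["animal", "land animal", "cat"],
    parts={"fur", "ears", "eyes", "tail", "mouth", "feet"},
    pose={"lying down", "left side", "facing camera"},
    functions={"fast runner", "climb trees", "eat small animals",
               "jump high", "household pet", "scratch"},
)

# Hypothetical object with no known basic category: it can still be described.
unknown = ObjectDescription(
    categories=["animal"],
    parts={"fur", "tail", "feet"},
    functions={"fast runner"},
)
```

Here `shared_attributes(cat, unknown)` is non-empty even though `unknown` has no basic-level label, which is the sense in which attributes are shared across basic categories.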
slide-82
SLIDE 82

What do these attributes get us?

Image Level: Contains donkey

Object Level, Categories: Animal, Land animal, Mammal, Four-legged animal, Elk, Horse

Object Level, Detailed Attributes:

– Pose: Lying down = 1, Back = 1, …

– Functional: Can see, Can walk, Herbivorous, …

– Material: Pixel segmentations

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-83
SLIDE 83

Advantages of supervised attributes

  • Enables verbal description of objects and images

Large angry dog with pointy teeth

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-84
SLIDE 84

Advantages of supervised attributes

  • Provides correspondence for objects from different categories

Figure labels: HEAD, LEG, STANDING, SITTING

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-85
SLIDE 85

Domain-based Recognition

Basic-Level Categories: Cat Detector, Dog Detector

Superordinate Categories: 4-Legged Animal Detector

Parts: Head Detector

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-86
SLIDE 86

Domain-based Recognition

Detectors: Cat Detector, Dog Detector, 4-Legged Animal Detector, Head Detector

Labels in the figure: 4-Legged Animal, Head, Walking Left

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-87
SLIDE 87

Domain-based recognition: overview

Trained Detectors:

  • Basic Level Categories: Elephant, Dog, Eagle, Camel, Lizard, Bat, Dog, Penguin, Monkey, …
  • Broad Categories: Vehicle, Animal, Four-legged Animal, Mammal, Water Animal, …
  • Parts: Leg, Horn, Wing, Head, Eye, Ear, Foot, Mouth, Nose, Tail

Voting using Shared Spatial Models → Object Localization

Attribute Predictors → Object Description

Example output: Animal, Four-legged Mammal; Head, Leg; Can run, Can jump, Is herbivorous, Facing right

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-88
SLIDE 88

CORE Dataset

Cross-category Object REcognition

  • 2780 Images – from ImageNet
  • 3192 Objects – 28 Categories
  • 26695 Parts – 71 types
  • 30046 Attributes – 34 types
  • 1052 Material Images – 10 types

Download or browse online: http://vision.cs.uiuc.edu/CORE

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-89
SLIDE 89

CORE Dataset

Annotation Example

Categories: Vehicle, Two-wheeled, Motorcycle

Parts: Mirrors, Headlight, Seat, Gas tank, Lic. Plate, Exhaust, Tail light, Wheel, Wheel, Engine

Attributes: Facing right, On the street, Has a rider

Materials: Metal, Rubber

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-90
SLIDE 90

Dataset examples: animals

Categories Seen During Training and Testing
Categories Seen Only During Testing

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-91
SLIDE 91

Dataset examples: vehicles

Categories Seen During Training and Testing
Categories Seen Only During Testing

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-92
SLIDE 92

Result: Part detectors can generalize across categories

Part Detections for Novel Object

Hump, Leg, Head

Detectors trained using the method of Felzenszwalb, Girshick, McAllester, and Ramanan (2009)

slide-93
SLIDE 93

Result: Broad category detectors can generalize across basic categories

Category Detections for Novel Object

Four-legged Animal, Mammal, Animal

Detectors trained using the method of Felzenszwalb, Girshick, McAllester, and Ramanan (2009)

slide-94
SLIDE 94

describe objects from familiar categories

Part detections shown: Trunk, Trunk, Leg, Leg, Foot, Foot, Foot

slide-95
SLIDE 95

describe objects from familiar categories

ROC for Localization of Familiar Objects

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-96
SLIDE 96

describe objects from familiar categories

AUC for Attribute Prediction for Familiar Objects

Baseline: Infer from Basic Categories
Our Method: Infer from All

Panels: Animals, Vehicles

Axes: AUC (0.5 to 1) for each attribute group: Has Part, Basic Cat, Broad Cat, Function, Pose

  • A. Farhadi, I. Endres, and D. Hoiem 2010
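
For readers who want the metric spelled out: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way; it is a generic illustration, not the evaluation code behind the plots, and the function name `auc` is an assumption.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance, which is why the axes in these plots start at 0.5.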
slide-97
SLIDE 97

Result using only basic categories

Elk Semi Truck Eagle Snowmobile Camel Dog

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-98
SLIDE 98

Result 3: We can find and describe objects from novel categories

Animal example: Animal, Four-legged Mammal; Head, Leg; Can run, Can jump, Is herbivorous, Facing right

Vehicle example: Vehicle; Wheel; Moves on road, Facing right

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-99
SLIDE 99

Result 3: We can find and describe objects from novel categories

ROC for Localization of Unfamiliar Objects

  • A. Farhadi, I. Endres, and D. Hoiem 2010
slide-100
SLIDE 100

Result 3: We can find and describe objects from novel categories

AUC for Attribute Prediction for Unfamiliar Objects

Baseline: Infer from Basic Categories
Our Method: Infer from All

Panels: Animals, Vehicles

Axes: AUC (0.5 to 0.8) for each attribute group: Has Part, Broad Cat, Function, Pose

slide-101
SLIDE 101

Summary of Findings

  • Current detectors are good enough to recognize general parts and broad categories
  • Learning to recognize parts and broad categories improves both detection and description
  • By going beyond categories, we can partially recognize novel objects

  • A. Farhadi, I. Endres, and D. Hoiem 2010