SLIDE 1

Object categorization: the constellation models

Li Fei-Fei

with many thanks to Rob Fergus

SLIDE 2

The People and slides credit

Pietro Perona Mike Burl Thomas Leung Markus Weber Rob Fergus Max Welling Li Fei-Fei Andrew Zisserman

SLIDE 3

Goal

  • Recognition of visual object classes
  • Unassisted learning
SLIDE 4

Issues:

  • Representation
  • Learning
  • Recognition
SLIDE 5

Model: Parts and Structure

SLIDE 6

Parts and Structure Literature

  • Fischler & Elschlager 1973
  • Yuille ‘91
  • Brunelli & Poggio ‘93
  • Lades, v.d. Malsburg et al. ‘93
  • Cootes, Lanitis, Taylor et al. ‘95
  • Amit & Geman ‘95, ‘99
  • Perona et al. ‘95, ‘96, ’98, ’00, ‘03
  • Huttenlocher et al. ’00
  • Agarwal & Roth ’02

etc…

SLIDE 7

The Constellation Model

  • T. Leung, M. Burl: representation and detection. Shape statistics (F&G ’95), detection (CVPR ’96), affine invariant shape (CVPR ’98, ECCV ’98)
  • M. Weber, M. Welling: unsupervised learning (ECCV ’00), multiple views (F&G ’00), discovering categories (CVPR ’00)
  • R. Fergus, L. Fei-Fei: joint shape & appearance learning with generic feature detectors (CVPR ’03), One-Shot learning (ICCV ’03), incremental learning (CVPR ’04), polluted datasets (ECCV ’04)

SLIDE 8

Deformations

[Figure: four deformed configurations (A-D) of the same set of parts]

SLIDE 9

Presence / Absence of Features

  • Occlusion

SLIDE 10

Background clutter

SLIDE 11

Generative probabilistic model

Foreground model:
  • Gaussian shape pdf
  • Prob. of detection per part (e.g. 0.9, 0.75, 0.8)

Clutter model:
  • Uniform shape pdf
  • # detections per part type: pPoisson(N1|λ1), pPoisson(N2|λ2), pPoisson(N3|λ3)

Assumptions: (a) Clutter independent of foreground detections (b) Clutter detections independent of each other

Example:
  • 1. Object part positions
  • 2. Part absence
  • 3a. Number of false detections (N1, N2, N3)
  • 3b. Positions of false detections
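Taken together, these pdfs define the object and clutter likelihoods. Below is a minimal sketch of the three densities under the slide's assumptions; the SciPy-based functions and their names are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal, poisson

# Minimal sketch of the densities named on this slide (illustrative):
# foreground part locations follow a joint Gaussian, clutter detections are
# uniform over the image, and false-detection counts are Poisson.

def foreground_shape_pdf(locations, mu, cov):
    """Joint Gaussian density of the stacked (x, y) part locations."""
    return multivariate_normal.pdf(locations.ravel(), mean=mu, cov=cov)

def clutter_shape_pdf(n_clutter, image_area):
    """Uniform density: each clutter detection lands anywhere in the image."""
    return (1.0 / image_area) ** n_clutter

def clutter_count_prob(counts, lambdas):
    """Independent Poisson probability of the per-detector clutter counts."""
    return float(np.prod([poisson.pmf(n, lam) for n, lam in zip(counts, lambdas)]))
```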

SLIDE 12

Learning Models ‘Manually’

  • Obtain set of training images
  • Choose parts
  • Label parts by hand, train detectors
  • Learn model from labeled parts

SLIDE 13

Recognition

1. Run part detectors exhaustively over image

For example, with 4 part types the detector counts and a hypothesis are

$N = (N_1, N_2, N_3, N_4)^\top, \qquad h = (h_1, h_2, h_3, h_4)^\top, \qquad h_p \in \{0, 1, \dots, N_p\},$

where $h_p$ indexes the detection assigned to part $p$ (0 meaning the part is unobserved).

  • 2. Try different combinations of detections in model
  • Allow detections to be missing (occlusion)
  • 3. Pick hypothesis which maximizes:
  • 4. If the ratio is above a threshold, an instance is detected

$\frac{p(\text{Data} \mid \text{Object}, \text{Hyp})}{p(\text{Data} \mid \text{Clutter}, \text{Hyp})}$
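A brute-force sketch of steps 2-4 might look as follows; `likelihood_ratio` is a hypothetical callable standing in for the ratio above, and the exhaustive enumeration ignores the efficient search techniques mentioned later.

```python
import numpy as np
from itertools import product

# Sketch of recognition steps 2-4 (illustrative): enumerate assignments of
# model parts to detections, letting index 0 mean "part occluded", and keep
# the hypothesis maximizing p(Data|Object,h) / p(Data|Clutter,h).
# detections[p] lists candidate detections for part p.

def best_hypothesis(detections, likelihood_ratio, threshold):
    best_h, best_r = None, -np.inf
    for h in product(*[range(len(d) + 1) for d in detections]):
        r = likelihood_ratio(h)
        if r > best_r:
            best_h, best_r = h, r
    # Step 4: declare an instance only if the best ratio clears the threshold
    return (best_h, best_r) if best_r > threshold else (None, best_r)
```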

SLIDE 14

So far…

  • Representation

– Joint model of part locations
– Ability to deal with background clutter and occlusions

  • Learning

– Manual construction of part detectors
– Estimate parameters of shape density

  • Recognition

– Run part detectors over image
– Try combinations of features in model
– Use efficient search techniques to make this fast

SLIDE 15

Unsupervised Learning

Weber & Welling et al.

SLIDE 16

(Semi) Unsupervised learning

  • Know if image contains object or not
  • But no segmentation of object or manual selection of features
SLIDE 17

Unsupervised detector training - 1

  • Highly textured neighborhoods are selected automatically
  • produces 100-1000 patterns per image


SLIDE 18

Unsupervised detector training - 2

“Pattern Space” (100+ dimensions)

SLIDE 19

Unsupervised detector training - 3

100-1000 images ~100 detectors
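One plausible reading of this pipeline, sketched below, is to cluster the pooled patch vectors in pattern space and keep each cluster centre as a detector template; the k-means choice and the function names are assumptions, not the exact procedure from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of the detector-training idea on slides 17-19 (illustrative,
# assuming the selected patterns are flattened to fixed-length vectors):
# pool the 100-1000 textured patches from every training image, cluster
# them in "pattern space", and treat each cluster centre as one detector.

def train_detectors(patches, n_detectors=100):
    """patches: (n_patches, d) array of vectorized, normalized patches."""
    km = KMeans(n_clusters=n_detectors, n_init=10).fit(np.asarray(patches))
    return km.cluster_centers_   # used downstream as correlation templates
```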

SLIDE 20
Learning

  • Task: estimation of model parameters
  • Take training images. Pick set of detectors. Apply detectors.
  • Chicken and egg type problem, since we initially know neither:
    – Model parameters
    – Assignment of regions to foreground / background
  • Let the assignments be a hidden variable and use the EM algorithm to learn them and the model parameters
SLIDE 21

ML using EM

  • 1. Current estimate of the model parameters
  • 2. Assign probabilities to constellations in each image (large P vs. small P)
  • 3. Use probabilities as weights to re-estimate parameters. Example, for the shape mean μ:

$\mu_{\text{new}} = \frac{\sum_h P(h)\, x_h}{\sum_h P(h)}$

so each candidate constellation contributes in proportion to its probability.
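As a sketch of this weighted re-estimation (illustrative names, not the original code):

```python
import numpy as np

# M-step update pictured above: every candidate constellation x_h
# contributes to the new mean in proportion to its E-step probability P(h).

def update_mu(constellations, posteriors):
    """constellations: (n_hyp, d) stacked part-location vectors.
    posteriors: (n_hyp,) hypothesis probabilities from the E-step."""
    w = np.asarray(posteriors, dtype=float)
    x = np.asarray(constellations, dtype=float)
    return (w[:, None] * x).sum(axis=0) / w.sum()
```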

SLIDE 22

Detector Selection

  • Start from the pool of detectors (≈100)
  • Try out different combinations of detectors (greedy search)
  • For each choice (Choice 1, Choice 2, …), parameter estimation gives a model (Model 1, Model 2, …)
  • Predict / measure model performance (validation set or directly from model)
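A sketch of such a greedy search; `fit_and_score` is a hypothetical callable that performs parameter estimation for a detector subset and returns its predicted or measured performance.

```python
# Greedy search over detector combinations (illustrative): grow the
# detector set one at a time, each round keeping the detector whose
# addition scores best on the "predict / measure" step above.

def greedy_select(n_detectors, fit_and_score, n_parts=4):
    chosen = []
    for _ in range(n_parts):
        remaining = [d for d in range(n_detectors) if d not in chosen]
        best = max(remaining, key=lambda d: fit_and_score(chosen + [d]))
        chosen.append(best)
    return chosen
```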

SLIDE 23

Frontal Views of Faces

  • 200 Images (100 training, 100 testing)
  • 30 people, different for training and testing
SLIDE 24

Learned face model

[Figure: pre-selected parts, parts in model, model foreground pdf, sample detection]

Test error: 6% (4 parts)

SLIDE 25

Face images

SLIDE 26

Background images

SLIDE 27

Car from Rear

[Figure: pre-selected parts, parts in model, model foreground pdf, sample detection]

Test error: 13% (5 parts)

SLIDE 28

Detections of Cars

SLIDE 29

Background Images

SLIDE 30

3D Object recognition – Multiple mixture components

SLIDE 31

3D Orientation Tuning

[Plot: orientation tuning, % correct vs. angle in degrees, from frontal to profile views]

SLIDE 32

So far (2)…

  • Representation

– Multiple mixture components for different viewpoints

  • Learning

– Now semi-unsupervised
– Automatic construction and selection of part detectors
– Estimation of parameters using EM

  • Recognition

– As before

  • Issues:
    – Learning is slow (many combinations of detectors)
    – Appearance learnt first, then shape
SLIDE 33

Issues

  • Speed of learning

– Slow (many combinations of detectors)

  • Appearance learnt first, then shape

– Difficult to learn a part that has stable location but variable appearance
– Each detector is used as a cross-correlation filter, giving a hard definition of the part’s appearance

  • Would like a fully probabilistic representation of the object

SLIDE 34

Object categorization

Fergus et al. CVPR ‘03

SLIDE 35

Detection & Representation of regions

  • Find regions within image
  • Use salient region operator (Kadir & Brady 01)

Each region is summarized by its location, scale, and appearance:
  • Location: (x,y) coords. of region centre
  • Scale: radius of region (pixels)
  • Appearance: 11x11 patch → normalize → projection onto PCA basis (c1, c2, …, c15)

Gives a representation of appearance in a low-dimensional vector space
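A sketch of this appearance pipeline, assuming zero-mean normalization of each patch (one plausible choice) and scikit-learn's PCA; the names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative appearance pipeline: flatten each 11x11 patch, normalize it,
# and project onto a 15-component PCA basis fitted on training patches,
# giving the coefficients c1..c15.

pca = PCA(n_components=15)

def fit_appearance_basis(train_patches):
    """train_patches: (n, 11, 11) array of grey-level patches."""
    flat = np.asarray(train_patches, dtype=float).reshape(len(train_patches), -1)
    flat -= flat.mean(axis=1, keepdims=True)   # zero-mean normalization
    pca.fit(flat)

def appearance_vector(patch):
    """One 11x11 patch -> 15-dim appearance coefficients."""
    v = np.asarray(patch, dtype=float).reshape(1, -1)
    v -= v.mean()
    return pca.transform(v)[0]
```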

SLIDE 36

Motorbikes example

  • Kadir & Brady saliency region detector
SLIDE 37

Generative probabilistic model (2), based on Burl, Weber et al. [ECCV ’98, ’00]

Foreground model:
  • Gaussian shape pdf
  • Gaussian part appearance pdf
  • Gaussian relative scale pdf (on log(scale))
  • Prob. of detection per part (e.g. 0.9, 0.75, 0.8)

Clutter model:
  • Uniform shape pdf
  • Gaussian background appearance pdf
  • Uniform relative scale pdf (on log(scale))
  • Poisson pdf on # detections
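A sketch of the two per-part terms this model adds over the earlier shape-only one, under the density choices listed above; the function names and SciPy usage are illustrative, not the paper's code.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Extra foreground-vs-clutter terms in model (2) (illustrative): appearance
# is compared under a Gaussian part density vs. a Gaussian background
# density; relative scale under a Gaussian in log(scale) vs. a uniform
# density over a log-scale range of width log_range.

def appearance_ratio(app, fg_mean, fg_cov, bg_mean, bg_cov):
    return (multivariate_normal.pdf(app, mean=fg_mean, cov=fg_cov)
            / multivariate_normal.pdf(app, mean=bg_mean, cov=bg_cov))

def scale_ratio(rel_scale, mu_log, sigma_log, log_range):
    # uniform pdf over log(scale) has density 1 / log_range
    return norm.pdf(np.log(rel_scale), loc=mu_log, scale=sigma_log) * log_range
```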

SLIDE 38

Motorbikes

Samples from appearance model

SLIDE 39

Recognized Motorbikes

SLIDE 40

Background images evaluated with motorbike model

SLIDE 41

Frontal faces

SLIDE 42

Airplanes

SLIDE 43

Spotted cats

SLIDE 44

Summary of results

% equal error rate (within each series, the same settings were used for all datasets):

Dataset        Fixed scale   Scale invariant
Motorbikes     7.5           6.7
Faces          4.6           4.6
Airplanes      9.8           7.0
Cars (Rear)    15.2          9.7
Spotted cats   10.0          10.0

SLIDE 45

Comparison to other methods

% equal error rate:

Dataset       Ours   Others   Method
Motorbikes    7.5    16.0     Weber et al. [ECCV ‘00]
Faces         4.6    6.0      Weber
Airplanes     9.8    32.0     Weber
Cars (Side)   11.5   21.0     Agarwal & Roth [ECCV ’02]

SLIDE 46

Why this design?

  • Generic features seem to work well in finding consistent parts of the object
  • Some categories perform badly – different feature types needed

  • Why PCA representation?

– Tried ICA, FLD, oriented filter responses etc.
– But PCA worked best

  • Fully probabilistic representation lets us use tools from the machine learning community

SLIDE 47
  • S. Savarese, 2003
SLIDE 48
  • P. Bruegel, 1562
SLIDE 49

One-Shot learning

Fei-Fei et al. ICCV ‘03

SLIDE 50

Algorithm                                   Training examples   Categories
Burl et al., Weber et al., Fergus et al.    200 ~ 400           Faces, Motorbikes, Spotted cats, Airplanes, Cars
Rowley et al.                               ~500                Faces
Schneiderman et al.                         ~2,000              Faces, Cars
Viola et al.                                ~10,000             Faces

SLIDE 51

[Plot: generalisation performance of the previous 6-part motorbike model: classification error (%) for train and test curves vs. number of training images (log2 scale)]

SLIDE 52

How do we do better than what statisticians have told us?

  • Intuition 1: use prior information
  • Intuition 2: make best use of training information
SLIDE 53

Prior knowledge: means

[Figure: examples of likely vs. unlikely shape and appearance configurations under the prior]

SLIDE 54

Bayesian framework

P(object | test, train) vs. P(clutter | test, train)

Bayes Rule:

$p(\text{object} \mid \text{test}, \text{train}) \propto p(\text{test} \mid \text{object}, \text{train})\; p(\text{object})$

Expansion by parametrization:

$p(\text{test} \mid \text{object}, \text{train}) = \int p(\text{test} \mid \theta, \text{object})\; p(\theta \mid \text{object}, \text{train})\; d\theta$

SLIDE 55

Bayesian framework

Previous Work: P(object | test, train) vs. P(clutter | test, train), with a maximum-likelihood point estimate of the model:

$p(\theta \mid \text{object}, \text{train}) \approx \delta(\theta_{\text{ML}})$

Bayes Rule:

$p(\text{object} \mid \text{test}, \text{train}) \propto p(\text{test} \mid \text{object}, \text{train})\; p(\text{object})$

Expansion by parametrization:

$p(\text{test} \mid \text{object}, \text{train}) = \int p(\text{test} \mid \theta, \text{object})\; p(\theta \mid \text{object}, \text{train})\; d\theta$

SLIDE 56

Bayesian framework

One-Shot learning: P(object | test, train) vs. P(clutter | test, train), keeping a full posterior over models:

$p(\theta \mid \text{train}, \text{object}) \propto p(\text{train} \mid \theta, \text{object})\; p(\theta)$

Bayes Rule:

$p(\text{object} \mid \text{test}, \text{train}) \propto p(\text{test} \mid \text{object}, \text{train})\; p(\text{object})$

Expansion by parametrization:

$p(\text{test} \mid \text{object}, \text{train}) = \int p(\text{test} \mid \theta, \text{object})\; p(\theta \mid \text{object}, \text{train})\; d\theta$
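As a toy illustration of why the integral matters, one can approximate it by Monte Carlo instead of collapsing to a single ML estimate (hypothetical names, 1-D toy setting):

```python
import numpy as np

# Monte Carlo sketch of the expansion above: average the test likelihood
# over samples of theta drawn from p(theta | object, train), rather than
# plugging in one ML point estimate.

def predictive_likelihood(test_x, theta_samples, likelihood):
    """likelihood(test_x, theta) evaluates p(test | theta, object)."""
    return float(np.mean([likelihood(test_x, th) for th in theta_samples]))
```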

SLIDE 57

Model Structure

Each object model θ is a point θ1, θ2, …, θn in model (θ) space, and consists of:
  • Gaussian shape pdf
  • Gaussian part appearance pdf

SLIDE 58

Model Structure

Each object model θ (Gaussian shape pdf, Gaussian part appearance pdf) is a point θ1, θ2, …, θn in model (θ) space.

Model distribution p(θ):
  • conjugate distribution of p(train|θ,object)

SLIDE 59

Learning Model Distribution

  • Use prior information
  • Bayesian learning: marginalize over θ

$p(\theta \mid \text{train}, \text{object}) \propto p(\text{train} \mid \theta, \text{object})\; p(\theta)$

Solved with variational EM (Attias, Hinton, Minka, etc.)
SLIDE 60

Variational EM loop:
  • Random initialization
  • E-Step, using prior knowledge of p(θ)
  • M-Step, producing new θ’s
  • Iterate to obtain a new estimate of p(θ|train)
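As a toy illustration of how a conjugate prior makes learning from very few images feasible (1-D Gaussian mean with known noise variance; a simplification, not the model's actual multivariate machinery):

```python
import numpy as np

# Conjugate Gaussian update: the posterior over the mean blends the prior
# (learnt from other categories) with the few training examples, so a
# sensible estimate exists even with 1-5 data points.

def posterior_mean_var(x, prior_mu, prior_var, noise_var):
    """Posterior over the mean given data x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + x.sum() / noise_var)
    return post_mu, post_var
```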

SLIDE 61

Experiments

Training: 1-6 randomly drawn images
Testing: 50 fg / 50 bg images, object present/absent

Datasets: faces, motorbikes, spotted cats, airplanes [www.vision.caltech.edu]

SLIDE 62

[Figure: example images of faces, airplanes, motorbikes, and spotted cats]

SLIDE 63

Experiments: obtaining priors

[Figure: priors in model (θ) space obtained from the other categories (spotted cats, airplanes, motorbikes, faces); cf. Miller et al. ‘00]

SLIDE 64

Experiments: obtaining priors

[Figure: priors in model (θ) space obtained from the other categories (spotted cats, faces, airplanes, motorbikes)]

SLIDE 65

[Plot: performance vs. number of training examples]

SLIDE 66

[Plot: performance vs. number of training examples]

SLIDE 67

[Plot: performance vs. number of training examples]

SLIDE 68

[Plot: performance vs. number of training examples]

SLIDE 69

Algorithm                                   Training examples   Categories                                          Results (error)
Bayesian One-Shot                           1 ~ 5               Faces, Motorbikes, Spotted cats, Airplanes          8 – 15%
Burl et al., Weber et al., Fergus et al.    200 ~ 400           Faces, Motorbikes, Spotted cats, Airplanes, Cars    5.6 – 10%
Rowley et al.                               ~500                Faces                                               7.5 – 24.1%
Schneiderman et al.                         ~2,000              Faces, Cars                                         5.6 – 17%
Viola et al.                                ~10,000             Faces                                               7 – 21%

SLIDE 70
Future work

  • Viewpoint variation not accounted for, so learnt intrinsically (legs of camel, curve of wheels for motorbikes)
  • Move to explicit representation (i.e. mixture models)
  • Use prior information: (a) learning models; (b) commonly selected images
  • Use partially-labelled learning methods for the 10-image case
  • Improve unsupervised learning methods