SLIDE 1

Multiclass object recognition Sharing parts and transfer learning

Sharat Chikkerur

SLIDE 2

Outline

 Historical perspective and motivation
 Discriminative approach
  A. Torralba, K. Murphy, W. Freeman, "Sharing visual features for multiclass and multiview object detection," IEEE PAMI 2007
 Bayesian approach
  (Prelude) R. Fergus, P. Perona, A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," CVPR 2003
  L. Fei-Fei, R. Fergus and P. Perona, "One-shot learning of object categories," PAMI 2006

SLIDE 3

Perspective: Template vs parts

Template-based                          Part-based
Dense representation                    Sparse representation
Useful for rigid objects                Rigid and articulated objects
Less robust                             More robust
Appearance only                         Appearance and shape
Objects share features                  Objects share parts
[Viola & Jones 01] [Dalal & Triggs 05]  [Fischler 73] [Fergus 03]

Summary: Part-based representations make more sense!

SLIDE 4

Motivation: Sharing parts

SLIDE 5

 Benefits
  Learning is faster
   Features are reused
   Time complexity ~ O(log N) instead of O(N)
  Better generalization
   Individual parts share training data across classes
   Robust to inter-class variation
 Challenges
  Identity of the shared parts/classes is unknown
  Sharing may not follow a tree structure
  Exhaustive search over subsets ~ O(2^P)

SLIDE 6

How do you share parts?

 Create a universal dictionary of parts
  Serre et al. 07 (HMAX), Ke and Sukthankar (PCA-SIFT)
 Learn a shared dictionary of parts
  Discriminative: embed sharing into the optimization
   Discriminative dictionary (Mairal et al. 08)
   Joint boosting (Torralba et al. 07)
  Generative: use unlabeled data to learn a prior
   Constellation model (Fei-Fei et al. 06)

SLIDE 7

Discriminative approach

 A. Torralba, K. Murphy, W. Freeman, "Sharing visual features for multiclass and multiview object detection," IEEE PAMI 2007

SLIDE 8

Recap: Part representation

Feature (Appearance + Position)

SLIDE 9

Recap: Boosting

An additive model for combining weak classifiers:

H(v) = Σ_m h_m(v)

Weak classifier: a regression stump, h_m(v) = a·δ(v_f > θ) + b

Algorithm: at each round, fit the stump that best predicts the weighted residuals and add it to H

SLIDE 10

Choosing a weak classifier

 For each feature
  Evaluate the weighted error
 Pick the feature with minimum error
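A minimal sketch of this per-feature search, assuming a regression stump of the form h(v) = a·[v_f > θ] + b and a simple grid of candidate thresholds (the function name and the threshold grid are illustrative, not from the paper):

```python
def best_stump(X, y, w, n_thresholds=10):
    """Pick the regression stump h(v) = a*[v_f > theta] + b with the
    minimum weighted squared error over all features and thresholds."""
    best = None  # (error, feature, theta, a, b)
    for f in range(len(X[0])):
        vals = [x[f] for x in X]
        lo, hi = min(vals), max(vals)
        for i in range(n_thresholds):
            theta = lo + (hi - lo) * i / (n_thresholds - 1)

            def wmean(side):
                # Weighted mean of y on one side of the threshold
                num = sum(wi * yi for xi, yi, wi in zip(X, y, w)
                          if (xi[f] > theta) == side)
                den = sum(wi for xi, wi in zip(X, w)
                          if (xi[f] > theta) == side)
                return num / den if den > 0 else 0.0

            below, above = wmean(False), wmean(True)
            err = sum(wi * (yi - (above if xi[f] > theta else below)) ** 2
                      for xi, yi, wi in zip(X, y, w))
            if best is None or err < best[0]:
                best = (err, f, theta, above - below, below)
    return best
```

The nested loop makes the cost per boosting round linear in the number of candidate features, which is why the feature pool (2000 candidates later in the talk) matters for training time.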

SLIDE 11

Joint Boosting

An additive model that jointly optimizes for all classes:

H(v, c) = Σ_m h_m(v, c)

Weak classifier: vector-valued, one response per class, e.g. for 5 classes

[H(v,1), …, H(v,5)] = [h1(v,1), …, h1(v,5)] + [h2(v,1), …, h2(v,5)] + …
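A sketch of how one such vector-valued weak classifier can be evaluated, assuming the form used by Torralba et al.: a stump shared by a subset of classes and a class-specific constant k_c for the rest (all names here are illustrative):

```python
def shared_stump(v, f, theta, a, b, k, share_set, num_classes):
    """Evaluate a joint-boosting weak classifier h_m(v, c) for every class.

    Classes in share_set use the shared regression stump a*[v_f > theta] + b;
    the remaining classes get a class-specific constant k[c]."""
    stump = (a if v[f] > theta else 0.0) + b
    return [stump if c in share_set else k[c] for c in range(num_classes)]

# A feature response that fires the stump for the two sharing classes:
print(shared_stump(v=[0.0, 2.0], f=1, theta=1.0, a=1.0, b=0.0,
                   k=[0.0, 0.0, -0.5], share_set={0, 1}, num_classes=3))
# [1.0, 1.0, -0.5]
```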

SLIDE 12

Vector-valued weak classifier

SLIDE 13

Example

(Figure: features selected by joint boosting vs. independent boosting, illustrating which features are shared.)

SLIDE 14

Greedy approach

 Exhaustive search of all classes ~ O(2C)  Greedy approach

 Select the class with best reduction in error  Insert next class with lowest error  Continue till all classes are selected  Select the best member from the set  Complexity ~ O(C2)
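The greedy search above can be sketched as follows, with a caller-supplied `error_of(subset)` standing in for fitting the best weak classifier shared by that subset (both the helper and the toy error surface are hypothetical):

```python
def greedy_share_set(classes, error_of):
    """O(C^2) greedy approximation to the best sharing subset.

    error_of(subset) returns the training error of the best weak
    classifier shared by `subset` (a caller-supplied callback)."""
    remaining, subset = set(classes), set()
    best_subset, best_err = None, float('inf')
    while remaining:
        # Insert the class that yields the lowest error when added
        c = min(remaining, key=lambda c: error_of(subset | {c}))
        subset |= {c}
        remaining.discard(c)
        err = error_of(subset)
        if err < best_err:
            best_subset, best_err = set(subset), err
    # Return the best prefix of the greedy sequence
    return best_subset, best_err

# Toy error surface whose optimum is the subset {0, 1}:
err = lambda s: 2 - len(s & {0, 1}) + 0.01 * len(s - {0, 1})
print(greedy_share_set([0, 1, 2], err))  # ({0, 1}, 0.0)
```

Each of the C insertion steps evaluates at most C candidate subsets, hence the O(C^2) total instead of O(2^C).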

SLIDE 15

Typical behavior

 Independent features/ pairs ~ O(N)  Shared features ~ O(log N)

SLIDE 16

Application: Object categorization

(Figure: candidate features (appearance + position) and training examples.)

 Data: 21 object categories
 2000 candidate features (extracted by random sampling)
 50 training examples per category
SLIDE 17

Object categorization: Performance

SLIDE 18

Object categorization: Shared features

SLIDE 19

Summary

 Joint boosting allows learning of shared parts (even non-tree structures)
 Learning time reduces from O(N) to O(log N)
 Allows scaling to a large number of categories
 Reduces training sample size (per class)
 Useful for multi-class as well as multi-view recognition
 Wish-list
  Automatic scale selection for features
  Handling occlusion

SLIDE 20

Bayesian approach

SLIDE 21

Bayes 101: Coin tossing

 MLE
  Let p = probability of heads
  Data: we observed H heads and T tails
  Inference: what is the chance of the next head?
  P(Head) = p = H / (H + T)
 Bayesian
  Let p = probability of heads (unknown!), p ~ f(p)
  P(Head) = ∫ P(Head | p) f(p) dp
  Data: we observed H heads and T tails
  p ~ f(p | D), still not fixed!
  P(Head | D) = ∫ P(Head | p) f(p | D) dp
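The two estimates can be compared in a few lines, assuming a uniform Beta(1, 1) prior for the Bayesian case (the choice of prior is an assumption, not fixed by the slide):

```python
def mle_pred(H, T):
    # Maximum likelihood: point estimate p = H / (H + T)
    return H / (H + T)

def bayes_pred(H, T, a=1.0, b=1.0):
    # Beta(a, b) prior; the posterior is Beta(a + H, b + T), and the
    # predictive P(Head | D) integrates p over it, giving its mean
    return (a + H) / (a + b + H + T)

# After 3 heads in 3 tosses, MLE is certain; the Bayesian estimate is not:
print(mle_pred(3, 0))    # 1.0
print(bayes_pred(3, 0))  # 0.8
```

This is exactly the small-sample behavior the one-shot learning work exploits: the prior keeps the predictive sensible when H + T is tiny.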

SLIDE 22

Learning parameters: Conjugate priors

With no data, we assume that the coin is likely to be fair; the uncertainty is captured by hyper-parameters: p(h) ~ Beta(a, b)

After we observe data D = (H heads, T tails), the uncertainty in h is updated:

p(h | D) ~ Beta(a + H, b + T)

Conjugate prior: the functional forms of the prior and the posterior distribution are identical

SLIDE 23

Transfer learning

 Discriminative
  Given data: learn shared parameters
  New data: use all old parameters (+ new)
 Bayesian
  Given data: learn priors ("assumptions")
  New data: update priors

SLIDE 24

A prelude: Object class recognition by unsupervised scale-invariant learning

SLIDE 25

Constellation model

(Figure: constellation models, parts with appearance A_i and location X_i. Torralba et al. use ~100 parts; Fergus et al. use < 10 parts.)

SLIDE 26

Generative model/Bayesian detection

Generative model for shape, appearance and scale.

H encodes the mapping from parts (P) to interest points (N).

Example: N = 10, P = 4, h = [0 3 4 5] or h = [3 1 2 10]; |h| ~ O(N^P)

Bayesian detection: the likelihood ratio of object vs. no object, computed with an MLE approximation of the parameters and with the hypothesis h as a latent variable
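For these numbers the size of the hypothesis space can be checked directly (ignoring occluded parts, so each part maps to a distinct interest point):

```python
from math import perm

# With N interest points and P parts, each non-occluded hypothesis assigns
# a distinct point to every part, so there are N!/(N-P)! orderings: O(N^P).
N, P = 10, 4
print(perm(N, P))  # 5040
```

This explosion in |h| is why the constellation model is limited to a handful of parts, in contrast to the ~100 parts of the boosting approach.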

SLIDE 27

Factorization

The joint likelihood factorizes into appearance, shape, scale and occlusion terms:

p(A, X, S, h | θ) = p(A | h, θ) · p(X | h, θ) · p(S | h, θ) · p(h | θ)

SLIDE 28

Representation and learning

 Position/Shape
  Candidate part locations are obtained using the Kadir-Brady interest point detector
 Appearance
  Modeled using an 11x11 pixel patch around the interest point (PCA used to reduce the dimension)
 Learning (EM) over shape X, appearance A, scale S and the hypothesis h

SLIDE 29

Example: faces

SLIDE 30

Example: motorbikes

SLIDE 31

Comparison (Caltech 4)

Models are class-specific

Models are robust to scale variation

SLIDE 32

Bayesian approach

 L. Fei-Fei, R. Fergus and P. Perona, "One-shot learning of object categories," PAMI 2006
SLIDE 33

Bayesian approach

 Fergus et al. (I = X, S, A): MLE approximation, evaluate the model at the single best parameter θ*
 Fei-Fei et al.: parameter integration, p(I | D) = ∫ p(I | θ) p(θ | D) dθ

SLIDE 34

Generative model for shape and appearance

Foreground object (integrate over all hypotheses h, a latent variable)

In the paper, the number of mixture components is 1 (> 1 can handle pose variation)

Background (has a single null hypothesis)

SLIDE 35

Factorization

Appearance

Fergus et al. vs. Fei-Fei et al.

Shape

Fergus et al. vs. Fei-Fei et al.

Scale and occlusion are not modeled

SLIDE 36

Comparison

MLE: point estimate θ* = argmax p(D | θ)

MAP: point estimate θ* = argmax p(θ | D), the mode of the posterior P(θ)

Bayesian: keep the entire distribution P(θ)

SLIDE 37

Conjugate priors

The model parameters are given conjugate priors with matching hyper-parameters: Dirichlet (mixing weights), Normal and Wishart (means and covariances), so the posterior update has a closed-form solution

SLIDE 38

Learning

 p(x | θ) = Σ_h p(x | h, θ) p(h | θ); h is unknown, but 'convenient'
 Regular EM
  E-step: estimate p(h | x, θ_n); Q(θ) = E_h{ log p(x, h | θ) } (usually available in closed form)
  M-step: θ_{n+1} = argmax Q(θ)
 Variational EM
  Obtaining p(h | x, θ_n) is hard: no closed form
  Approximate the posterior: p(h | x, θ_n) ≈ q(h)
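A toy illustration of regular EM, assuming a two-component 1-D Gaussian mixture with known, equal variances and weights so that both steps have closed forms (a stand-in for the constellation model's EM over latent assignments, not the paper's code):

```python
import math
import random

def em_two_gaussians(xs, mu, n_iter=50, sigma=1.0):
    """EM for a two-component 1-D Gaussian mixture; only the means
    are learned, the variances and mixing weights are fixed."""
    mu = list(mu)
    for _ in range(n_iter):
        sums, wsum = [0.0, 0.0], [0.0, 0.0]
        for x in xs:
            # E-step: responsibilities p(h | x, mu_n)
            lik = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in mu]
            z = lik[0] + lik[1]
            for j in (0, 1):
                r = lik[j] / z
                sums[j] += r * x
                wsum[j] += r
        # M-step: mu_{n+1} = argmax Q(mu) -> responsibility-weighted means
        mu = [sums[j] / wsum[j] for j in (0, 1)]
    return sorted(mu)

random.seed(0)
xs = ([random.gauss(-3, 1) for _ in range(200)]
      + [random.gauss(3, 1) for _ in range(200)])
print(em_two_gaussians(xs, mu=[-1.0, 1.0]))  # approx [-3, 3]
```

When the E-step posterior has no closed form, as in the richer models above, the variational variant replaces it with an approximating distribution q(h).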

SLIDE 39

Performance: Caltech 4

SLIDE 40

Caltech 4 (cont.)

SLIDE 41

Performance : Caltech 101

Performance: 10.4%, 13.9%, 17.7% with 3, 6, 15 training examples

State-of-the-art: > 60%

SLIDE 42

Comparison

SLIDE 43

Summary

 Transfer learning in a Bayesian setting
  Recipe = learn priors on the given data + update the priors on new data
 Good results with just 1-5 training examples (compared to MLE approaches)
 Learning is hard (computationally)
 Wish-list
  Handling multiple objects within the image

SLIDE 44

Thank You!