Tabula Rasa: Model Transfer for Object Category Detection. Yusuf Aytar & Andrew Zisserman, Department of Engineering Science, University of Oxford. (Presented by Elad Liebman)
SLIDE 1
SLIDE 2
General Intuition I
- We have: a discriminatively
trained classification model for category A.
- We need: a classifier for a new
category B.
- Can we use it to make learning
a model for category B easier?
– Fewer examples?
– Better accuracy?
SLIDE 3
General Intuition II
Tabula Rasa: Model Transfer for Object Category Detection, Aytar & Zisserman Motorbike images courtesy of the Caltech Vision Group, collated by Svetlana Lazebnik
SLIDE 4
Background I
- Good:
– There has been considerable recent progress in object category detection.
– Successful tools are readily available.
- Bad:
– Current methods require training the detector from scratch.
– Training from scratch is very costly in terms of the number of samples required.
– Not scalable in multi-category settings.
SLIDE 5
Background II
- Possible solution:
– Represent categories by their attributes, and re-use attributes.
– Attributes are learned from multiple classes, so training data is abundant.
– Learned attributes can be used even for categories that didn’t “participate” in the learning, as long as they share the attribute.
SLIDE 6
Background III
Wheel Detector
Use for detection of objects with “wheel” attributes
SLIDE 7
(This idea should sound familiar…)
“Sharing visual features for multiclass and multiview object detection”, Torralba et al., 2007
– Training multiple category classifiers at the same time with lower sample and runtime complexity using shared features.
– Uses a variation on boosting and shared regression stumps.
SLIDE 8
Torralba et al. – cont. I
[Figures: number of required features vs. effect on learning, for 12 different categories and for 12 views of the same category]
SLIDE 9
Torralba et al. – cont. II
- There is a difference in motivations here.
- Torralba et al. are mostly concerned with
scalability.
– Reduce the cost of training multiple detectors.
– Use shared features when learning full sets of distinctive features per category is infeasible.
- Knowledge transfer is more concerned with
sample complexity.
– Use preexisting related classifiers when new examples are hard to come by.
SLIDE 10
- Unfortunately, this approach proves inferior in practice to discriminative training (true for both detection and classification, at least as of the paper’s publication…)
(Back to our paper…)
Wheel Detector
SLIDE 11
Background IV
- An alternative approach:
– Benefit from previously learned category detectors.
– The previously learned categories should be similar.
- We need a way to transfer information from one classifier to the next.
SLIDE 12
Aytar & Zisserman I
- Consider the SVM discriminative training
framework for the HOG template models of Dalal & Triggs and Felzenszwalb et al.
- Observation: the learned template records the
spatial layout of positive and negative orientations.
- Classes that are geometrically similar will give
rise to similar templates.
SLIDE 13
Aytar & Zisserman II
- Apply transfer learning from one detector to
another.
- To do this, the previously learned template is
used as a regularizer in the cost function of the new classifier.
- This enables learning with a reduced number of examples.
SLIDE 14
Some (a few) Words on Regularization
- From a Bayesian standpoint, it’s similar to
introducing a prior.
- Often used to prevent overfitting or solve ill posed
problems.
- A good example of regularization: ridge regression
argmin_β { ‖y − Xβ‖² + ‖Γβ‖² }
Images taken from Andrew Rosenberg’s slides, ML course, CUNY
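As a concrete sketch of how the regularizer pulls the solution, here is ridge regression in NumPy. The data and the choice Γ = √α · I are illustrative assumptions, not from the slides:

```python
import numpy as np

# Ridge regression: argmin_beta ||y - X beta||^2 + ||Gamma beta||^2.
# With Gamma = sqrt(alpha) * I the closed-form solution is
#   beta = (X^T X + Gamma^T Gamma)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_beta + 0.1 * rng.normal(size=50)

alpha = 1.0
Gamma = np.sqrt(alpha) * np.eye(5)
beta = np.linalg.solve(X.T @ X + Gamma.T @ Gamma, X.T @ y)

# The penalty shrinks the estimate toward zero relative to least squares.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
print(np.linalg.norm(beta) < np.linalg.norm(beta_ls))  # True
```

The same mechanism carries over to the transfer SVMs below: the regularizer acts like a prior, pulling the solution toward a preferred point instead of toward zero.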
SLIDE 15
Model Transfer Support Vector Machines
- We wish to detect a target object category.
- We already have a well trained detector for a
different source category.
- Three strategies to transfer knowledge from
the source detector to the target detector:
– Adaptive SVMs
– Projective Model Transfer SVMs
– Deformable Adaptive SVMs
SLIDE 16
Adaptive SVMs I
- Learn from the source model w_s by
regularizing the distance between the learned model w and w_s.
- x_i are the training examples, y_i ∈ {−1, 1} are
the labels, and the loss function is the hinge loss: ℓ(x_i, y_i; w, b) = max(0, 1 − y_i(wᵀx_i + b))
SLIDE 17
Adaptive SVMs II
- But now, our goal is to optimize:
L_A = min_{w,b} { ‖w − Γw_s‖² + (C/N) Σ_i ℓ(x_i, y_i; w, b) }
- Γ controls the amount of transfer
regularization, C controls the weight of the loss function, and N is the number of samples.
- Reminder: in regular SVMs we want to optimize:
L = min_{w,b} { ‖w‖² + (C/N) Σ_i ℓ(x_i, y_i; w, b) }
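The two objectives can be put side by side in a minimal NumPy sketch. The random data is for illustration only, and w_s stands in for a trained source template:

```python
import numpy as np

def hinge(X, y, w, b):
    # per-sample hinge loss: max(0, 1 - y_i * (w^T x_i + b))
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def svm_objective(w, b, X, y, C):
    # standard SVM: ||w||^2 + (C/N) * sum_i hinge_i
    return w @ w + (C / len(y)) * hinge(X, y, w, b).sum()

def asvm_objective(w, b, X, y, C, w_s, gamma):
    # adaptive SVM: ||w - gamma * w_s||^2 + (C/N) * sum_i hinge_i
    d = w - gamma * w_s
    return d @ d + (C / len(y)) * hinge(X, y, w, b).sum()

# Sanity check: with gamma = 0 the adaptive objective reduces to the
# standard one (no transfer).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = np.sign(rng.normal(size=20))
w, b = rng.normal(size=4), 0.3
w_s = rng.normal(size=4)
print(np.isclose(asvm_objective(w, b, X, y, 10.0, w_s, 0.0),
                 svm_objective(w, b, X, y, 10.0)))  # True
```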
SLIDE 18
An Illustration
minimize…
SLIDE 19
Adaptive SVMs III
- We note that if w_s is normalized to 1, then ‖w − Γw_s‖² expands to ‖w‖² − 2Γ‖w‖cos θ + Γ²:
- ‖w‖² is the “normal” SVM margin term.
- −2Γ‖w‖cos θ is the transfer term (Γ² is a constant).
- We wish to minimize θ, the angle between w_s and w.
- However, −2Γ‖w‖cos θ also encourages ‖w‖ to
be larger, so Γ controls a tradeoff between margin maximization and knowledge transfer.
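This expansion of the adaptive regularizer is easy to verify numerically; a quick sketch with arbitrary vectors, normalizing the source template to unit length:

```python
import numpy as np

# Check: with ||w_s|| = 1,
#   ||w - gamma*w_s||^2 = ||w||^2 - 2*gamma*||w||*cos(theta) + gamma^2,
# where theta is the angle between w and w_s.
rng = np.random.default_rng(2)
w = rng.normal(size=6)
w_s = rng.normal(size=6)
w_s /= np.linalg.norm(w_s)          # normalize source template to 1
gamma = 0.7

lhs = np.sum((w - gamma * w_s) ** 2)
cos_theta = (w @ w_s) / np.linalg.norm(w)
rhs = w @ w - 2 * gamma * np.linalg.norm(w) * cos_theta + gamma ** 2
print(np.isclose(lhs, rhs))  # True
```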
SLIDE 20
Projective Model Transfer SVMs I
- Rather than transfer by maximizing ‖w‖cos θ,
we can instead minimize the projection of w onto the separating hyperplane orthogonal to w_s.
- This directly translates to optimizing:
L_P = min_{w,b} { ‖w‖² + Γ‖Pw‖² + (C/N) Σ_i ℓ(x_i, y_i; w, b) }, subject to wᵀw_s ≥ 0
- where P is the projection matrix:
P = I − w_s w_sᵀ / ‖w_s‖²
SLIDE 21
Yet another illustration
SLIDE 22
Projective Model Transfer SVMs II
- We note that ‖Pw‖² is the squared norm of
the projection of w onto the source hyperplane.
- wᵀw_s ≥ 0 constrains w to the positive
halfspace defined by w_s.
- Here too, Γ controls the transfer. As Γ → 0, the
PMT-SVM reduces to a classic SVM optimization problem.
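A minimal sketch of the projection penalty, assuming P = I − w_s w_sᵀ / ‖w_s‖² (the matrix projecting onto the hyperplane orthogonal to the source template):

```python
import numpy as np

# P projects any vector onto the hyperplane orthogonal to w_s, so
# ||P w||^2 shrinks as w aligns with w_s: penalizing it transfers the
# *orientation* of the source detector without inflating ||w||.
rng = np.random.default_rng(3)
w_s = rng.normal(size=5)
P = np.eye(5) - np.outer(w_s, w_s) / (w_s @ w_s)

w = rng.normal(size=5)
proj = P @ w
# the projection is orthogonal to w_s, and w_s itself projects to zero
print(np.isclose(proj @ w_s, 0.0))   # True
print(np.allclose(P @ w_s, 0.0))     # True
```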
SLIDE 23
Deformable Adaptive SVMs I
- Regularization shouldn’t be “equally forced”.
- Imagine we have a deformable source
template – small local deformations are allowed to better fit the source to the target.
- For instance, when transferring from a
motorbike wheel to a bicycle wheel:
- We need more flexible regularization…
SLIDE 24
Deformable Adaptive SVMs II
- Local deformations are described as a flow of
weight vectors from one cell to another, governed by the following flow definition:
V(w_s)_j = Σ_k f_jk · w_s,k
- V represents the flow transformation, w_s,k is
the k-th cell in the source template, and f_jk denotes the amount of transfer from the k-th cell in the source to the j-th cell in the target.
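A toy sketch of the flow transformation, treating each cell weight as a scalar for simplicity (in a HOG template each cell is a vector, but the flow combines cells the same way; the flow matrices below are illustrative):

```python
import numpy as np

# Deformation as a flow of weight between template cells:
#   V(w_s)_j = sum_k f_jk * w_s_k
# where f_jk >= 0 is the amount of weight flowing from source cell k
# to target cell j. With f = I the template is unchanged.
def deform(w_s, f):
    return f @ w_s

n = 4
w_s = np.arange(1.0, n + 1)                  # toy 1-D "template" of cell weights
identity_flow = np.eye(n)                    # no deformation
shift_flow = np.roll(np.eye(n), 1, axis=0)   # cell j takes cell j-1 (cyclically)

print(np.allclose(deform(w_s, identity_flow), w_s))   # True
print(deform(w_s, shift_flow))                        # [4. 1. 2. 3.]
```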
SLIDE 25
Deformable Adaptive SVMs III
[Illustration: flow f_jk carrying weight from source cell k to target cell j]
SLIDE 26
Deformable Adaptive SVMs IV
- Now, the Deformable Adaptive SVM is simply
a generalization of the adaptive SVM we’ve seen before, with w_s replaced by its deformable version V(w_s):
(λ is the weight of the deformation, d_jk is the distance between cells j and k, and d is the penalty for overflow)
SLIDE 27
Deformable Adaptive SVMs V
- λ in effect controls the extent of deformability.
- High λ values make the model more rigid (you
pay more for the deformations you make), pushing the solution closer to that of the simple adaptive SVM.
- Low λ values allow for a more flexible source
template with less regularization.
- (Amazingly enough, the term ‖w − ΓV(w_s)‖²
is still convex.)
SLIDE 28
Experiments I.I
- In general, transfer learning can offer three
major benefits:
– Higher starting point.
– Higher slope (we learn faster).
– Higher asymptote (learning converges to a better classifier).
SLIDE 29
Experiments I.II
- Two types of transfer experiments:
– Specialization (we know how to recognize quadrupeds, now we want to recognize horses).
– Interclass transfer (we know how to recognize horses, now we want to recognize donkeys).
SLIDE 30
Experiments II – Interclass
- Baseline detectors are the SVM classifiers
trained directly without any transfer learning.
- Two scenarios studied:
– Transferring from motorbikes to bicycles.
– Transferring from cows to horses.
- Two variants discussed:
– One-shot learning: we can only choose one (!) example from the target class, and study our starting point.
– Multiple-shot learning.
SLIDE 31
Experiments III – One Shot Learning
[Figures: detections using the top 15 and lowest 15 ranked one-shot examples]
(Looks good, but a bit unfair, especially when using lower-grade examples from the target category…)
SLIDE 32
Experiments IV – Multiple Shot
(We note that by ~10 examples, basic SVM has caught up with us…)
SLIDE 33
Experiments V – Multiple Shot
SLIDE 34
Experiments VI - Specialization
- “Quadruped” detector trained with instances of
cows, sheep and horses.
- Then specialization for cows and horses was
attempted via transfer.
(Once again we note that by ~15-20 examples, basic SVM has caught up with us…)
SLIDE 35
Discussion
- Pros:
– An interesting and fairly straightforward expansion of the basic category detection scheme.
– Provides a far better starting point for classifying new categories.
– A different perspective on multi-category settings.
- Cons:
– “Closeness” between classes is very poorly defined.
– One-shot experiments are not particularly convincing.
– The advantage degrades the more samples you have.
– PMT-SVM doesn’t scale very well…
SLIDE 36
Something Related (But Different)
“Hedging Your Bets: Optimizing Accuracy-Specificity Trade-offs in Large Scale Visual Recognition”, Deng et al., 2012
– Object categories form a semantic hierarchy.
– Make more reliable predictions at a less specific level of the hierarchy when faced with uncertainty.
(“If you liked Aytar & Zisserman, you might also enjoy this paper”)
SLIDE 37
Deng et al. – cont. I
- Given a hierarchy graph, a label is correct
either if it’s the right leaf, or any of its ancestors.
- In this setting, maximizing accuracy alone
cannot work (always predicting the root of the hierarchy is trivially correct).
- Instead: maximize information gain while
maintaining accuracy at or above a required threshold.
- Done via a generalization of the Lagrange
multipliers method, with regular one-vs-all SVM classifiers providing posterior probabilities on the leaves.
SLIDE 38
Deng et al. – cont. II
SLIDE 39
(Main References)
- Tabula Rasa: Model Transfer for Object Category Detection. Aytar & Zisserman, IEEE International Conference on Computer Vision, 2011.
- Histograms of Oriented Gradients for Human Detection. Dalal & Triggs, International Conference on Computer Vision & Pattern Recognition, June 2005.
- Regularized Adaptation: Theory, Algorithms