SLIDE 1

Deformable Parts Model

Rafi Witten, David Knight 4 April 2011

SLIDE 2

Overview

  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 3
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 4

PASCAL Challenge

  • ~10,000 images with ~25,000 target objects

– Objects from 20 categories (person, car, bicycle, cow, table)
– Objects annotated with labeled bounding boxes
– GOAL: produce a bounding box that overlaps 50%+ with the ground-truth bounding box

SLIDE 5
SLIDE 6
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 7

Image Pyramid

  • Collection of the same image at different sizes
  • Lower levels capture higher spatial frequencies
  • Higher levels capture lower spatial frequencies

SLIDE 8

Implementation

image[0] = originalImage
for i = 1 to (numPyramidLevels-1)
    image[i] = gaussFilter(image[i-1], sigma)
    image[i] = downSample(image[i], factor)
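A minimal runnable sketch of this pseudocode in Python (our illustration, not the authors' code), assuming a 2-D grayscale image, SciPy's gaussian_filter, and an integer downsample factor:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def build_pyramid(image, num_levels, sigma, factor):
        # levels[0] is the original image; each later level is blurred, then downsampled
        levels = [np.asarray(image, dtype=float)]
        for _ in range(1, num_levels):
            blurred = gaussian_filter(levels[-1], sigma=sigma)  # gaussFilter
            levels.append(blurred[::factor, ::factor])          # downSample by striding
        return levels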

SLIDE 9

Parameters

  • Number of levels
  • Gaussian filter σ
  • Per-level downsample factor

SLIDE 10
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 11

Histogram of Gradients

  • Intuition: normalized counts of gradient directions in a local region are...
– Invariant to uniform lighting changes
– Invariant to small shape deformations
  • Grayscale images are straightforward
  • Color images
– At each pixel, take the gradient of each RGB channel and keep the gradient with the greatest magnitude

SLIDE 12

Histogram of Gradients

  • HoG features are calculated for each level of the image pyramid

SLIDE 13

Histogram of Gradients

  • Cells
– 8x8 pixels
– Non-overlapping
  • Blocks
– 2x2 cells
– Overlapping

[Diagram: cell and block layout]

SLIDE 14

Histogram of Gradients

  • Gradient histograms generated per cell
– 9 bins

SLIDE 15

Implementation

# take gradient
for each level in pyramid:
    levelGrad = gradient(level)
    # binning
    for each cell in level:
        cell.hist = new Histogram(9 buckets)
        for each pixel in cell:
            cell.hist.vote(levelGrad.angle[pixel], levelGrad.magnitude[pixel])
    # normalize (L1-norm) and concatenate
    featureVec = new Vector()
    for each block in level:
        energySum = epsilon
        for each blockCell in block:
            energySum = energySum + sum(blockCell.hist.values)
        for each blockCell in block:
            featureVec.append(blockCell.hist.values / energySum)
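A rough NumPy sketch of the pipeline above (our illustration, not the authors' code), assuming a grayscale image, hard orientation binning into 9 unsigned bins, and L1 normalization over overlapping 2x2-cell blocks:

    import numpy as np

    def hog_features(img, cell=8, bins=9, eps=1e-6):
        gy, gx = np.gradient(img.astype(float))            # take gradient
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned angle in [0, pi)
        bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
        ch, cw = img.shape[0] // cell, img.shape[1] // cell
        hist = np.zeros((ch, cw, bins))                    # binning, per 8x8 cell
        for i in range(ch):
            for j in range(cw):
                ys = slice(i * cell, (i + 1) * cell)
                xs = slice(j * cell, (j + 1) * cell)
                for b in range(bins):
                    hist[i, j, b] = mag[ys, xs][bin_idx[ys, xs] == b].sum()
        feats = []                                         # normalize (L1) + concatenate
        for i in range(ch - 1):
            for j in range(cw - 1):
                block = hist[i:i + 2, j:j + 2].ravel()     # overlapping 2x2-cell block
                feats.append(block / (block.sum() + eps))
        return np.concatenate(feats)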

SLIDE 16

Parameters

  • Cell size
  • Block size
  • Histogram normalization method

[Diagram: cell and block layout]

SLIDE 17
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 18

Parts Model

  • Root window region (cyan)
– Object of interest
– Should coincide with the PASCAL bounding box

SLIDE 19

Parts Model

  • Parts window regions (yellow)
– Positioned relative to the root window
– Located at a lower pyramid level (higher detail)

SLIDE 20

Filters

  • Fixed-size templates
  • A series of weights applied to a local region of a HoG pyramid level
  • Applied with a dot product between...
– the vector of filter weights
– the vector of HoG features for a region

SLIDE 21

Filter Scoring

[Equation: each filter is scored by its dot product with the HoG features, minus a quadratic spatial penalty on the part's distance from the root center; see the reconstruction below]
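The scoring formula itself did not survive extraction. Following the notation of the Felzenszwalb et al. 2008 paper cited in the references, a placement of the root and parts is scored as

\[ \mathrm{score}(p_0, \ldots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) + \sum_{i=1}^{n} \left( a_i \cdot (\tilde{x}_i, \tilde{y}_i) + b_i \cdot (\tilde{x}_i^2, \tilde{y}_i^2) \right) \]

where Fi are the filter weights, φ(H, pi) the HoG features under placement pi, (x̃i, ỹi) the displacement of part i from its anchor, and the ai, bi terms the quadratic spatial penalty.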

SLIDE 22

Parameters

  • Number of parts
  • Part filter sizes
– Width
– Height
  • Spatial penalty si
SLIDE 23

Framing as a Learning Problem

  • PASCAL data is “weakly labeled”
  • Training images only have object ground truth
  • Part filters and locations must be blindly learned

SLIDE 24

Root Window Initialization

  • Dimensions
– Aspect ratio: the most common amongst the training data
– Size: the largest size smaller than 80% of the image
  • Filter
– A classical SVM
– Trained on positive examples resized to the chosen aspect ratio

SLIDE 25

Part Initialization

  • Where n is the number of parts, select an area a such that n·a = 80% of the root filter area (a sketch follows below)
– Pick the rectangular region of area a with the largest positive energy, then zero the region out
– Repeat the above step (n-1) times
  • Copy (and resize) the root filter content for each region; this becomes the initial part filter
  • ai = (0, 0), bi = (-1, -1)
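A sketch of this greedy placement (our illustration; it uses square part regions for simplicity, whereas the real system searches over rectangle shapes):

    import numpy as np

    def init_parts(root_filter, n, frac=0.8):
        # root_filter: (H, W, D) weights; pick max positive-energy region, zero it, repeat
        energy = np.maximum(root_filter, 0.0).sum(axis=-1)
        side = max(1, int(round(np.sqrt(frac * energy.size / n))))  # n * area = 80% of root
        parts = []
        for _ in range(n):
            best, best_pos = -1.0, (0, 0)
            for i in range(energy.shape[0] - side + 1):
                for j in range(energy.shape[1] - side + 1):
                    s = energy[i:i + side, j:j + side].sum()
                    if s > best:
                        best, best_pos = s, (i, j)
            i, j = best_pos
            parts.append((i, j, side, side))
            energy[i:i + side, j:j + side] = 0.0  # zero the region out
        return parts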
SLIDE 26

Framing as a Learning Problem

  • Filters and penalty terms become the classifier weights ( β )
  • Root and part window locations become latent variables ( z )
  • The entire HoG pyramid becomes the classifier feature vectors ( x ) for the latent variable search
  • HoG features in the window regions become the classifier feature vectors ( Φ ) for the final SVM training

SLIDE 27
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 28
  • We simply put bounding boxes in random images.
  • Many don’t look very much like humans.
  • Intuitively this makes training slower: we want fewer, ‘harder’ negatives (a sampling sketch follows below).
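A minimal sketch of this step (our illustration; assumes a pool of images known to contain no target objects and a fixed window size):

    import random

    def sample_random_negatives(images, box_w, box_h, n):
        negatives = []
        for _ in range(n):
            img = random.choice(images)          # image with no target objects
            h, w = img.shape[:2]
            x = random.randint(0, w - box_w)     # randint is inclusive on both ends
            y = random.randint(0, h - box_h)
            negatives.append(img[y:y + box_h, x:x + box_w])
        return negatives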

SLIDE 29
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 30
  • Using the feature extraction methods explained earlier, fit an SVM on the random negatives and the original labeled dataset.
  • Keep the hard negatives.
SLIDE 31

Why Find Hard Examples?

  • This isn’t surprising given intuition about SVMs: fitting an SVM gives exactly the same answer if you exclude the easy examples.
  • More formally, there is typically a relatively small subset of the examples that, when trained on, gives the same separating hyperplane as the full dataset.
  • Since the next step will be the bottleneck, it helps to spend a lot of time finding that subset.

SLIDE 32

Finding Hard Examples

  • This step is very important, so the system they use is quite costly.
  • The algorithm follows. Here xi is the features extracted from an image and bounding box (a code sketch appears below).
(1) Start with the positive examples (x1, x2, …, xk).
(2) Fill the cache with random negative examples up to size n: {(x1, 1), …, (xk, 1), (xk+1, -1), …, (xn, -1)}.
(3) Fit a binary SVM on the data from (2), using only the root filter.
(4) Keep only the hard examples ( yi fβ(xi) < 1 ).
(5) Go to (2).
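In code, the loop might look like this (a sketch under our assumptions: neg_pool, svm_fit, and score are hypothetical placeholders, and the loop is capped at 10 rounds as slide 33 notes):

    def mine_hard_negatives(positives, neg_pool, cache_size, svm_fit, score, rounds=10):
        # positives: feature vectors x1..xk; neg_pool: iterator of random negative features
        cache = [(x, +1) for x in positives]
        beta = None
        for _ in range(rounds):
            while len(cache) < cache_size:           # (2) top the cache up with negatives
                cache.append((next(neg_pool), -1))
            beta = svm_fit(cache)                    # (3) binary SVM, root filter only
            cache = [(x, y) for (x, y) in cache      # (4) keep hard examples only
                     if y * score(beta, x) < 1]
        return beta, cache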

SLIDE 33

Theoretical Guarantees

  • Assume that there is indeed a subset of all of the examples of size n that gives the same hyperplane as the full dataset.
  • If at each step (2) we have space to add more examples, this can be shown to converge to one such subset.
  • In practice, they only run the loop for 10 iterations and use the resulting dataset as the dataset for the latent SVM that we get to in part (7).

SLIDE 34
  • We’re kind of cheating in the last step: we’re assuming we know the bounding boxes.
  • The location of the bounding box is a latent variable.
  • Without the bounding box we can’t do the feature extraction. Additionally, we aren’t using the part model!
  • (Obviously, the PASCAL challenge doesn’t give us the bounding box.)
  • Solution: the Latent SVM.
SLIDE 35
  • 1. Labeled dataset (PASCAL)
  • 2. Making image pyramids
  • 3. Feature extraction (HoG)
  • 4. Parts model
  • 5. Generating random negative examples
  • 6. Generating hard negative examples
  • 7. Train Latent SVM iteratively
SLIDE 36

Latent SVM
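The equations on this slide did not survive extraction. Following the notation of the surrounding slides and the cited paper, the latent SVM scores an example x by maximizing over latent placements z, and training minimizes the usual regularized hinge loss:

\[ f_\omega(x) = \max_{z \in Z(x)} \omega \cdot \Phi(x, z), \qquad \min_\omega \; \tfrac{1}{2}\|\omega\|^2 + C \sum_i \max\big(0,\, 1 - y_i f_\omega(x_i)\big) \]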

SLIDE 37

How To Optimize A Latent SVM

For z fixed, fω is a linear function (of ω); let’s call it gω(x, z).

  • Hence, with all the zi fixed, what we have is \( \min_\omega \tfrac{1}{2}\|\omega\|^2 + C \sum_i \max\big(0,\, 1 - y_i\, \omega \cdot \Phi(x_i, z_i)\big) \)
  • Which is just an SVM!
SLIDE 38

Latent SVM Algorithm

  • Given labeled data {(x1, 1), …, (xk, 1), (xk+1, -1), …, (xn, -1)} and some initial ω (a code sketch follows below):
(1) Find the zi that attain fω(xi).
(2) Recalculate ω by solving the SVM with the zi fixed.
(3) Go to (1).
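A sketch of this alternation (our illustration; latent_argmax, phi, and svm_fit are hypothetical helpers):

    def train_latent_svm(data, omega, phi, latent_argmax, svm_fit, rounds=10):
        # data: list of (x, y) with y in {+1, -1}
        for _ in range(rounds):
            # (1) find the z_i attaining f_omega(x_i) = max_z omega . Phi(x_i, z)
            zs = [latent_argmax(omega, x) for (x, _) in data]
            # (2) with z fixed the objective is a standard linear SVM on Phi(x, z)
            omega = svm_fit([(phi(x, z), y) for (x, y), z in zip(data, zs)])
            # (3) go to (1)
        return omega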

SLIDE 39

Some Notes

  • Computing step (1) requires looking at all bounding boxes.
  • The algorithm is a descent algorithm.
  • It won’t necessarily converge to a global optimum, but fixed points are strong local optima.
  • According to other sources, the algorithm does converge in the mathematical sense.
  • If you want better latent SVM performance, try self-paced learning!

SLIDE 40

What They Actually Do

(1) Choose hard examples, with ω fixed. Denote these {(x1, 1), …, (xk, 1), (xk+1, -1), …, (xn, -1)}.
(2) Choose zk to attain the max for each xk.
(3) Solve the SVM to calculate ω.
(4) Go to (1).

  • Note that “hard examples” means pictures with negatives that fire, or positives with nothing that fires, for the current ω.

SLIDE 41

Direction For Improvement?

  • The latent SVM algorithm doesn’t depend on this, but fω is a convex function of ω (since it’s the pointwise max of linear functions).
  • Hence, look at the optimization problem defined by the latent SVM: \( \min_\omega \tfrac{1}{2}\|\omega\|^2 + C \sum_i \max\big(0,\, 1 - y_i f_\omega(x_i)\big) \)
  • The terms in the sum are convex for yi < 0, but not for yi > 0.
SLIDE 42

Directions For Improvement?

  • This suggests an alternative algorithm, taking advantage of so-called “semi-convexity”.
  • Given labeled data {(x1, 1), …, (xk, 1), (xk+1, -1), …, (xn, -1)} and some initial ω:
(1) With ω fixed, find the optimal zi for the positive examples, {z1, …, zk}.
(2) Solve the resulting convex problem with the positive zi fixed.
(3) Go to (1).

SLIDE 43

Problems

  • The reason the authors of the paper didn’t use that algorithm is that it relies on solving a non-SVM convex problem.
  • This is typically much slower than using a dedicated SVM solver.
  • One might be able to think of a fast way of solving it.
  • It may or may not improve performance. (Theoretically, it could hurt performance.)

SLIDE 44

Results

  • Works well on both rigid and non-rigid objects.
  • Doesn’t require that much training data.
  • On the 2007 challenge they would have done best on 10 out of 20 tasks and second best on 6 out of 20.
  • Training takes them approximately 3-4 hours (which is considered good).
  • Ablative analysis justifies the system they use.

SLIDE 45
SLIDE 46
SLIDE 47

References

  • P.F. Felzenszwalb, et al.
– “A Discriminatively Trained, Multiscale, Deformable Part Model,” 2008
– http://people.cs.uchicago.edu/~pff/talks/deformable.pdf
  • SVM image: http://www.compgeom.com/~piyush/img/svm.png
  • M.P. Kumar, et al.
– “Self-Paced Learning for Latent Variable Models,” 2010