SLIDE 1

Object Detection with Discriminatively Trained Part Based Models

Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan

Presented by Amy Bearman and Amani Peddada

SLIDE 2

Roadmap

  • 1. Introduction
  • 2. Related Work
  • 3. Model Overview
  • 4. Latent SVM
  • 5. Features & Post Processing
  • 6. Experiments
SLIDE 3

Introduction

  • Problem: Detecting and localizing generic objects from various categories, such as cars, people, etc.
  • Challenges: Illumination, viewpoint, deformations, intraclass variability

SLIDE 4

How they solve it

Mixtures of multi-scale deformable part models

  • Trained with a discriminative procedure
  • Data is partially labeled (bounding boxes, not parts)
SLIDE 5

Deformable parts model

  • Represents an object as a collection of parts arranged in a deformable configuration
  • Each part represents local appearance
  • Spring-like connections between certain pairs of parts

SLIDE 6

One motivation of this paper

To address the performance gap between simpler models (rigid templates) and more sophisticated models such as deformable part models

SLIDE 7

Why do simpler models perform better?

  • Simple models are easily trained using discriminative methods such as SVMs
  • Richer models rely on latent information (the locations of parts), which makes discriminative training harder
SLIDE 8

Roadmap

  • 1. Introduction
  • 2. Related Work
  • 3. Model Overview
  • 4. Latent SVM
  • 5. Features & Post Processing
  • 6. Experiments
SLIDE 9

Related Work: Detection

  • Bag-of-Features
  • Rigid Templates
  • Dalal-Triggs
  • Deformable Models
  • Deformable Templates (e.g. Active Appearance Models)
  • Part-Based Models (Constellation, Pictorial Structures)
SLIDE 10

Dalal-Triggs Method

  • Histograms of Oriented Gradients for Human Detection - Dalal and Triggs, 2005
  • Sliding window, HOG feature extraction + linear SVM
  • One of the most influential papers in computer vision!

SLIDE 11

Active Appearance Model

  • Active Appearance Models - Cootes, Edwards, and Taylor, 1998
  • Attempts to match a statistical model to a new image using an iterative scheme

SLIDE 12

Deformable Models: Constellation

  • Object Class Recognition by Unsupervised Scale-Invariant Learning - Fergus et al., 2003
  • Uses Expectation Maximization to determine the parameters of a scale-invariant model
  • Entropy-based feature detector
  • Appearance learned simultaneously with shape

SLIDE 13

Constellation Models

  • Towards Automatic Discovery of Object Categories - Weber et al., 2000
  • Derives mixture models and a probabilistic framework for modeling classes with large variability
  • Constrained to testing on faces, leaves, and cars
  • Automatically selects distinctive features of the object class

SLIDE 14

Pictorial Structure Models

  • The Representation and Matching of Pictorial Structures - Fischler & Elschlager, 1973
  • Formalizes a dynamic programming approach (“Linear Embedding Algorithm”) to find the optimal configuration of a part-based model
SLIDE 15

Pictorial Structure Models

  • Pictorial Structures for Object Recognition - Felzenszwalb & Huttenlocher, 2005
  • Finds multiple optimal hypotheses; presents the framework as an energy minimization problem over a graph
  • Proposes novel, efficient minimization techniques to achieve reasonable results on face and body image data

SLIDE 16

Roadmap

  • 1. Introduction
  • 2. Related Work
  • 3. Model Overview
  • 4. Latent SVM
  • 5. Features & Post Processing
  • 6. Experiments
SLIDE 17

Starting point: sliding window classifiers

  • Detect objects by testing each sub-window
  • Reduces object detection to binary classification
  • Dalal & Triggs: HOG features + linear SVM classifier
  • Previous state of the art for detecting people
SLIDE 18

Innovations on Dalal-Triggs

  • Star model = root filter + set of part filters and associated deformation models
  • Root filter is analogous to the Dalal-Triggs template; part filters model finer detail

SLIDE 19

HOG Filters

  • Models use linear filters applied to dense feature maps
  • Feature map = array of feature vectors, where each feature vector describes a local image patch
  • Filter = rectangular template = array of weight vectors
  • Score = dot product of the filter and a sub-window of the feature map (see the sketch below)
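A minimal Python/NumPy sketch of this scoring step (not the authors' code), assuming the feature map is an (H, W, d) array and the filter fits entirely inside it:

    import numpy as np

    def filter_score(feature_map, filt, x, y):
        # Score of a linear filter placed at (x, y): dot product of the filter
        # weights with the sub-window of the feature map it covers.
        # feature_map: (H, W, d) array of d-dimensional feature vectors
        # filt:        (h, w, d) rectangular template of weights
        h, w, _ = filt.shape
        window = feature_map[y:y + h, x:x + w, :]
        return float(np.sum(window * filt))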

SLIDE 20

Feature Pyramid

SLIDE 21

Model Overview

  • Mixture of deformable part models
  • Each component has a global (root) template plus deformable parts
  • Fully trained from bounding boxes alone
SLIDE 22

Deformable Part Models

  • Star model: coarse root filter + higher-resolution part filters
  • Higher-resolution features for the part filters are essential for high recognition performance
SLIDE 23

Deformable Part Models

  • A model for an object with n parts is an (n + 2)-tuple: (F_0, P_1, ..., P_n, b), where F_0 is the root filter, P_i is the model for the i-th part, and b is a bias term
  • Each part model is defined as P_i = (F_i, v_i, d_i), where F_i is the filter for the i-th part, v_i is an “anchor” position for part i relative to the root position, and d_i defines a deformation cost for each possible placement of the part relative to the anchor position
SLIDE 24

Object Hypothesis

An object hypothesis z = (p_0, ..., p_n) specifies a placement for each filter, where p_i = (x_i, y_i, l_i) gives the position and level of the i-th filter in the feature pyramid.

SLIDE 25

Score of Object Hypothesis

score(p_0, ..., p_n) = Σ_{i=0}^{n} F'_i · φ(H, p_i) − Σ_{i=1}^{n} d_i · φ_d(dx_i, dy_i) + b

where (dx_i, dy_i) = (x_i, y_i) − (2(x_0, y_0) + v_i) is the displacement of the i-th part relative to its anchor, and φ_d(dx, dy) = (dx, dy, dx², dy²) are the deformation features.
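A hypothetical sketch of evaluating this score for one hypothesis, assuming the per-level filter response maps responses[i][l] have already been computed (as in the matching steps below) and that anchors[i] and deform[i] hold v_i and d_i:

    import numpy as np

    def hypothesis_score(responses, placements, anchors, deform, bias):
        # responses[i][l]: 2-D array of filter-i responses at pyramid level l
        # placements[i]:   (x_i, y_i, l_i) placement of filter i (i = 0 is the root)
        # anchors[i], deform[i]: v_i and d_i for parts i = 1..n (index 0 unused)
        x0, y0, l0 = placements[0]
        score = responses[0][l0][y0, x0] + bias
        for i in range(1, len(placements)):
            xi, yi, li = placements[i]
            # parts live one octave below the root, hence the factor of 2
            dx, dy = xi - (2 * x0 + anchors[i][0]), yi - (2 * y0 + anchors[i][1])
            phi_d = np.array([dx, dy, dx * dx, dy * dy])
            score += responses[i][li][yi, xi] - float(deform[i] @ phi_d)
        return score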

SLIDE 26

Matching

  • Define an overall score for each root location according to the best placement of the parts:

    score(p_0) = max_{p_1, ..., p_n} score(p_0, ..., p_n)

  • High-scoring root locations define detections (“sliding window” approach)

SLIDE 27

SLIDE 28

Matching Step 1: Compute filter responses

  • Compute arrays R_{i,l} storing the response of the i-th model filter in the l-th level of the feature pyramid (cross-correlation):

    R_{i,l}(x, y) = F'_i · φ(H, (x, y, l))
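A brute-force sketch of this step in Python/NumPy (a real implementation would use vectorized or FFT-based cross-correlation); feature_pyramid is assumed to be a list of (H_l, W_l, d) arrays, each at least as large as the filter:

    import numpy as np

    def filter_responses(feature_pyramid, filt):
        # Cross-correlate one filter with every level of the feature pyramid.
        # Returns a list of 2-D response maps, one per level.
        h, w, _ = filt.shape
        responses = []
        for level in feature_pyramid:
            H, W, _ = level.shape
            R = np.zeros((H - h + 1, W - w + 1))
            for y in range(H - h + 1):
                for x in range(W - w + 1):
                    R[y, x] = np.sum(level[y:y + h, x:x + w, :] * filt)
            responses.append(R)
        return responses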

SLIDE 29

Matching Step 2: Spatial Uncertainty

  • Transform the responses of the part filters to allow for spatial uncertainty:

    D_{i,l}(x, y) = max_{dx,dy} ( R_{i,l}(x + dx, y + dy) − d_i · φ_d(dx, dy) )
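A brute-force sketch of this transform; the paper computes it in linear time with a generalized distance transform, and the search window used here is an illustrative assumption. The argmax map P is kept for Step 4:

    import numpy as np

    def transform_responses(R, d, max_disp=4):
        # For every (x, y), take the best displaced response minus its quadratic
        # deformation cost. Returns the transformed map D and the chosen
        # displacements P (used later to recover part locations).
        H, W = R.shape
        D = np.full((H, W), -np.inf)
        P = np.zeros((H, W, 2), dtype=int)
        for y in range(H):
            for x in range(W):
                for dy in range(-max_disp, max_disp + 1):
                    for dx in range(-max_disp, max_disp + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < H and 0 <= xx < W:
                            phi_d = np.array([dx, dy, dx * dx, dy * dy])
                            val = R[yy, xx] - float(d @ phi_d)
                            if val > D[y, x]:
                                D[y, x] = val
                                P[y, x] = (dx, dy)
        return D, P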

SLIDE 30

Matching Step 3: Compute overall root scores

  • Compute the overall root score at each level by summing the root filter response at that level plus the contributions from each part:

    score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + Σ_{i=1}^{n} D_{i, l_0−λ}(2(x_0, y_0) + v_i) + b
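A sketch of this combination for a single root level, reusing the response and transformed maps from the previous steps (inputs are assumed to be NumPy arrays; bounds handling is simplified):

    def root_scores(R0_l0, D_parts, anchors, bias):
        # R0_l0   : root-filter responses at level l0
        # D_parts : transformed part maps D_{i, l0 - lambda}, one per part
        # anchors : anchor offsets v_i, in part-resolution cells
        H, W = R0_l0.shape
        score = R0_l0 + bias
        for D, (vx, vy) in zip(D_parts, anchors):
            for y0 in range(H):
                for x0 in range(W):
                    # parts live one octave below the root, hence the factor of 2
                    px, py = 2 * x0 + vx, 2 * y0 + vy
                    if 0 <= py < D.shape[0] and 0 <= px < D.shape[1]:
                        score[y0, x0] += D[py, px]
        return score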

SLIDE 31

Matching Step 4: Compute optimal part displacements

  • After finding a high-scoring root location (x_0, y_0, l_0), we can find the corresponding part locations by looking up the optimal displacements:

    P_{i,l}(x, y) = argmax_{dx,dy} ( R_{i,l}(x + dx, y + dy) − d_i · φ_d(dx, dy) )

  • Part i is placed at P_{i, l_0−λ}(2(x_0, y_0) + v_i)
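Continuing the sketch above: once the argmax maps from Step 2 are available, recovering the part placements for one detection is just an array lookup (bounds checks omitted; names are assumptions):

    def part_placements(root, P_maps, anchors):
        # root    : (x0, y0) root location at level l0
        # P_maps  : argmax maps P_i at level l0 - lambda (from transform_responses)
        # anchors : anchor offsets v_i
        x0, y0 = root
        placements = []
        for P, (vx, vy) in zip(P_maps, anchors):
            ax, ay = 2 * x0 + vx, 2 * y0 + vy   # anchor position for this part
            dx, dy = P[ay, ax]                  # optimal displacement at the anchor
            placements.append((ax + dx, ay + dy))
        return placements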

SLIDE 32

Mixture Models

A mixture model with m components is M = (M_1, ..., M_m), where M_c is the model for the c-th component.

An object hypothesis for a mixture model consists of:

  • A mixture component, 1 ≤ c ≤ m
  • A location for each filter of M_c: z = (c, p_0, ..., p_{n_c})

Score of a hypothesis: β · Φ(H, z) = β_c · Φ(H, z′), where β concatenates the parameters of all components and z′ = (p_0, ..., p_{n_c}).

To detect objects using a mixture model we use the matching algorithm to find root locations that yield high-scoring hypotheses independently for each component.
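A high-level sketch of that detection loop: run the single-component matcher once per component and pool the results. Here component_score is assumed to return a dict mapping pyramid level to a NumPy root-score map (as in the matching sketches above):

    def detect_with_mixture(components, feature_pyramid, component_score, threshold):
        # Score every component independently and keep root locations above the
        # threshold, remembering which component produced each detection.
        detections = []
        for c, model in enumerate(components):
            for level, scores in component_score(model, feature_pyramid).items():
                ys, xs = (scores > threshold).nonzero()
                for y0, x0 in zip(ys, xs):
                    detections.append((float(scores[y0, x0]), c, x0, y0, level))
        return sorted(detections, reverse=True)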

SLIDE 33

Roadmap

  • 1. Introduction
  • 2. Related Work
  • 3. Model Overview
  • 4. Latent SVM
  • 5. Features & Post Processing
  • 6. Experiments
SLIDE 34

Training

  • Training data consists of images with labeled bounding boxes
  • Weakly labeled setting, since the bounding boxes don’t specify component labels or part locations
  • Need to learn the model structure, filters, and deformation costs
SLIDE 35

SVM Review

  • Assume examples are separable by a hyperplane in a high-dimensional feature space
  • Choose the hyperplane with the maximum margin
SLIDE 36

Latent SVM

  • Classifiers that score an example x using fβ(x) = max_{z ∈ Z(x)} β · Φ(x, z)
  • β are the model parameters, z are latent values, and Φ(x, z) is a vector of HOG features and part offsets
  • Training data: D = (⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩), where y_i ∈ {−1, 1}
  • Learning: find β such that y_i fβ(x_i) > 0
  • Minimize (regularization + hinge loss):

    L_D(β) = (1/2) ‖β‖² + C Σ_{i=1}^{n} max(0, 1 − y_i fβ(x_i))
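A small sketch of the scoring function and objective under these definitions, assuming Φ(x, z) has been precomputed for every candidate latent value z of an example:

    import numpy as np

    def f_beta(beta, phi_candidates):
        # f_beta(x) = max over latent values z of beta . Phi(x, z)
        # phi_candidates: (num_z, dim) array, one feature vector per latent value
        return float(np.max(phi_candidates @ beta))

    def lsvm_objective(beta, examples, C):
        # L_D(beta) = 1/2 ||beta||^2 + C * sum_i max(0, 1 - y_i * f_beta(x_i))
        # examples: list of (phi_candidates, y) pairs with y in {-1, +1}
        hinge = sum(max(0.0, 1.0 - y * f_beta(beta, phis)) for phis, y in examples)
        return 0.5 * float(beta @ beta) + C * hinge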

SLIDE 37

Semi-convexity

  • fβ(x) = max_{z ∈ Z(x)} β · Φ(x, z) is a maximum of functions that are each linear in β
  • The maximum of convex functions is convex, so fβ(x) is convex in β
  • Therefore max(0, 1 − y_i fβ(x_i)) is convex in β for negative examples (y_i = −1)
  • L_D(β) becomes convex if the latent values for the positive examples are fixed
  • Important because it makes optimizing β a convex optimization problem, even though the latent values for the negative examples are not fixed

SLIDE 38

Latent SVM Training

  • L_D(β) is convex if we fix z for the positive examples
  • Optimization: initialize β and iterate:
  • Pick the best z for each positive example
  • Optimize β via gradient descent, with data mining for hard negative examples

    L_D(β) = (1/2) ‖β‖² + C Σ_{i=1}^{n} max(0, 1 − y_i fβ(x_i))
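A schematic Python/NumPy sketch of this alternation. It is a simplification, not the authors' implementation: the negatives are assumed to be already mined into a feature matrix, whereas the full system re-collects hard negatives each round:

    import numpy as np

    def train_latent_svm(beta, positives, negatives, C, rounds=10, steps=200, lr=1e-3):
        # positives: list of (num_z, dim) arrays of Phi(x, z) candidates per example
        # negatives: (num_neg, dim) array of negative feature vectors
        for _ in range(rounds):
            # Step 1: fix z for each positive at its current highest-scoring value
            pos_feats = np.array([phis[np.argmax(phis @ beta)] for phis in positives])
            # Step 2: (sub)gradient descent on the now-convex objective
            for _ in range(steps):
                grad = beta.copy()                       # gradient of the regularizer
                for phi, y in [(f, 1.0) for f in pos_feats] + [(f, -1.0) for f in negatives]:
                    if y * (phi @ beta) < 1:             # hinge term is active
                        grad -= C * y * phi
                beta = beta - lr * grad
        return beta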

SLIDE 39

Training Models

  • Reduce to the Latent SVM training problem
  • A positive example specifies that some z ∈ Z(x) should have a high score
  • The bounding box defines the range of root locations
  • Parts can be anywhere
  • This defines Z(x), the set of latent placements (root location and part offsets) for the example

SLIDE 40

Training Algorithm

SLIDE 41

Training Algorithm

Finds the highest-scoring object hypothesis with a root filter that significantly overlaps the bounding box B in image I. Implemented with the matching procedure.
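A minimal sketch of this relabeling step: among candidate root placements whose box sufficiently overlaps the labeled bounding box B, keep the highest-scoring one. The overlap measure and 50% threshold used here are illustrative assumptions:

    def overlap(a, b):
        # Intersection-over-union of two boxes given as (x1, y1, x2, y2)
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def best_positive_hypothesis(candidates, gt_box, min_overlap=0.5):
        # candidates: list of (score, root_box) pairs produced by the matching procedure
        valid = [c for c in candidates if overlap(c[1], gt_box) >= min_overlap]
        return max(valid, key=lambda c: c[0]) if valid else None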

SLIDE 42

Training Algorithm

Computes the best object hypothesis for each root location and selects the ones that score above a threshold. Implemented with the matching procedure.

SLIDE 43

Training Algorithm

Trains β using cached feature vectors

SLIDE 44

Roadmap

  • 1. Introduction
  • 2. Related Work
  • 3. Model Overview
  • 4. Latent SVM
  • 5. Features & Post Processing
  • 6. Experiments
SLIDE 45

Histogram of Gradient features

  • Image is partitioned into 8x8 pixel blocks
  • In each block we compute a histogram of gradient orientations
  • Invariant to changes in lighting and small deformations
  • Compute features at different resolutions (feature pyramid)
  • λ = the number of levels we need to go down in the pyramid to get to a feature map computed at twice the resolution of another one
  • They use λ = 5 in training and λ = 10 in testing
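A much-simplified sketch of the per-block histogram computation (unsigned gradients, no normalization or interpolation, unlike the full HOG features used in the paper):

    import numpy as np

    def coarse_hog(gray, cell=8, bins=9):
        # Histogram of gradient orientations in each cell x cell block.
        # gray: 2-D float image. Returns an (H/cell, W/cell, bins) feature map.
        gy, gx = np.gradient(gray)
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation in [0, pi)
        bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
        hc, wc = gray.shape[0] // cell, gray.shape[1] // cell
        feat = np.zeros((hc, wc, bins))
        for y in range(hc * cell):
            for x in range(wc * cell):
                feat[y // cell, x // cell, bin_idx[y, x]] += mag[y, x]
        return feat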

SLIDE 46

Background

  • A negative example specifies that no z should have a high score
  • One negative example per root location in a background image
  • Huge number of negative examples
  • Consistent with requiring a low false-positive rate

SLIDE 47

Post Processing: Bounding Box Prediction

  • Predict the box corners (x_1, y_1) and (x_2, y_2) from the part locations
  • Learn four linear functions, one each for predicting x_1, x_2, y_1, and y_2
  • Done via linear least-squares regression, independently for each component of the mixture model
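A small sketch of this regression with NumPy: fit one least-squares predictor per output coordinate from a feature vector describing the detection geometry (the exact input features are an assumption here):

    import numpy as np

    def fit_box_predictors(geometry, boxes):
        # geometry: (N, d) matrix, one row of detection geometry (e.g. part locations) per example
        # boxes:    (N, 4) ground-truth corners (x1, y1, x2, y2)
        # Returns a (d + 1, 4) matrix of linear predictors, one column per corner.
        X = np.hstack([geometry, np.ones((geometry.shape[0], 1))])   # append a bias term
        W, *_ = np.linalg.lstsq(X, boxes, rcond=None)
        return W

    def predict_box(W, features):
        # Apply the learned linear functions to one detection's geometry features.
        return np.append(features, 1.0) @ W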

SLIDE 48

Roadmap

  • 1. Introduction
  • 2. Related Work
  • 3. Model Overview
  • 4. Latent SVM
  • 5. Features & Post Processing
  • 6. Experiments
SLIDE 49

Car Model

SLIDE 50

Person Model

SLIDE 51

Bottle Model

SLIDE 52

Car Detections

SLIDE 53

Person Detections

SLIDE 54

Horse Detections

SLIDE 55

Quantitative Results

  • PASCAL Challenge: ~10,000 images, with ~25,000 target objects
  • Objects from 20 categories (person, car, bicycle, cow, table...)
  • Out of 20 classes we got:
  • First place in 7 classes
  • Second place in 8 classes
  • Some statistics:
  • Takes ~2 seconds to evaluate a model on one image
  • Takes ~4 hours to train a model
  • MUCH faster than most systems
SLIDE 56

Comparison of Car Models on 2006 Data

Results shown for 1- and 2-component models, with and without parts, and for a 2-component model with parts and bounding box prediction (measured by average precision).

SLIDE 57

Comparison of Person Models on 2006 Data

Results shown for 1- and 2-component models, with and without parts, and for a 2-component model with parts and bounding box prediction (measured by average precision).

SLIDE 58

Summary

  • Object detection based on mixtures of multiscale deformable models
  • Discriminative training of classifiers that use latent information
  • Fast matching algorithms
  • Learning from weakly-labeled data (no component labels or part locations)
  • Leads to state-of-the-art results in the PASCAL challenge
SLIDE 59

Questions?