SLIDE 1

Structured Regression for Efficient Object Detection

Christoph Lampert www.christoph-lampert.org

Max Planck Institute for Biological Cybernetics, Tübingen

December 3rd, 2009

  • [C.L., Matthew B. Blaschko, Thomas Hofmann. CVPR 2008]
  • [Matthew B. Blaschko, C.L. ECCV 2008]
  • [C.L., Matthew B. Blaschko, Thomas Hofmann. PAMI 2009]
SLIDE 2

Category-Level Object Localization

SLIDE 3

Category-Level Object Localization What objects are present? person, car

SLIDE 4

Category-Level Object Localization Where are the objects?

SLIDE 5

Object Localization ⇒ Scene Interpretation A man inside of a car A man outside of a car ⇒ He’s driving. ⇒ He’s passing by.

SLIDE 6

Algorithmic Approach: Sliding Window

Example scores: f(y₁) = 0.2, f(y₂) = 0.8, f(y₃) = 1.5

Use a (pre-trained) classifier function f:

  • Place a candidate window on the image.
  • Iterate:
    ◮ Evaluate f and store the result.
    ◮ Shift the candidate window by k pixels.
  • Return the position where f was largest.
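The loop above can be sketched in a few lines (a hypothetical toy: `toy_score` is a made-up stand-in for the pre-trained classifier f):

```python
# Minimal sliding-window sketch: evaluate a classifier at every grid
# position and return where its score was largest.

def sliding_window(score_fn, img_w, img_h, win_w, win_h, step):
    """Return (best_score, best_box) over all grid-aligned windows."""
    best_score, best_box = float("-inf"), None
    for top in range(0, img_h - win_h + 1, step):
        for left in range(0, img_w - win_w + 1, step):
            box = (left, top, left + win_w, top + win_h)
            s = score_fn(box)                  # evaluate f and store the result
            if s > best_score:
                best_score, best_box = s, box  # remember where f was largest
    return best_score, best_box

# Toy classifier: prefers windows whose centre is near (60, 40).
def toy_score(box):
    l, t, r, b = box
    cx, cy = (l + r) / 2, (t + b) / 2
    return -((cx - 60) ** 2 + (cy - 40) ** 2)

score, box = sliding_window(toy_score, img_w=120, img_h=80,
                            win_w=40, win_h=30, step=10)
```

The step size k (here 10 pixels) is exactly the speed–accuracy tradeoff the next slide criticizes: a coarser grid is faster but can miss the true maximum.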
SLIDE 7

Algorithmic Approach: Sliding Window

Example scores: f(y₁) = 0.2, f(y₂) = 0.8, f(y₃) = 1.5

Drawbacks:

  • single scale, single aspect ratio
    → repeat with different window sizes/shapes
  • search on a grid
    → speed–accuracy tradeoff
  • computationally expensive
SLIDE 8

New view: Generalized Sliding Window

Assumptions:

  • Objects are rectangular image regions of arbitrary size.
  • The score of f is largest at the correct object position.

Mathematical formulation:

yopt = argmax_{y ∈ Y} f(y),   with Y = {all rectangular regions in the image}

SLIDE 9

New view: Generalized Sliding Window

Mathematical formulation:

yopt = argmax_{y ∈ Y} f(y),   with Y = {all rectangular regions in the image}

  • How to choose/construct/learn the function f?
  • How to do the optimization efficiently and robustly?
    (Exhaustive search is too slow: O(w²h²) candidate regions.)
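The size of Y is easy to make concrete in code (a toy illustration with made-up `weights` and grid size; a rectangle is fixed by choosing two of w+1 vertical and two of h+1 horizontal grid lines):

```python
from itertools import combinations

# Exhaustive argmax over Y = {all axis-aligned rectangles on a w x h grid}.
# The candidate count is C(w+1, 2) * C(h+1, 2), i.e. O(w^2 h^2), which is
# why exhaustive search is impractical on real images.

def all_boxes(w, h):
    for l, r in combinations(range(w + 1), 2):      # choose left < right
        for t, b in combinations(range(h + 1), 2):  # choose top < bottom
            yield (l, t, r, b)

def argmax_box(score_fn, w, h):
    return max(all_boxes(w, h), key=score_fn)

# Toy score: sum of per-pixel weights inside the box.
weights = [[-1, 2, -1],
           [-1, 2, -1]]          # 3 wide, 2 high; middle column positive

def box_sum(box):
    l, t, r, b = box
    return sum(weights[y][x] for y in range(t, b) for x in range(l, r))

n_candidates = sum(1 for _ in all_boxes(3, 2))   # C(4,2) * C(3,2)
best = argmax_box(box_sum, 3, 2)
```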



SLIDE 12

New view: Generalized Sliding Window

Use the problem’s geometric structure:

  • Calculate scores for sets of boxes jointly.
  • If no element can contain the maximum, discard the box set.
  • Otherwise, split the box set and iterate.

→ Branch-and-bound optimization: finds the global maximum yopt.
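The loop above can be sketched as follows (a simplified, hypothetical ESS-style sketch, not the talk's implementation; `weights` and the bound function are toy stand-ins for the linear-SVM bound discussed later):

```python
import heapq

# Branch-and-bound over box sets: a box set is a tuple of four intervals
# (L, T, R, B).  We always expand the set with the highest upper bound,
# split its widest interval, and stop once the best candidate is one box.

weights = [[-1, -1, -1, -1],
           [-1,  3,  3, -1],
           [-1,  3,  3, -1],
           [-1, -1, -1, -1]]   # toy 4x4 "image": positive 2x2 centre

def upper_bound(s):
    """Positive weights over the largest box, negative over the smallest."""
    L, T, R, B = s
    pos = sum(max(0, weights[y][x])
              for y in range(T[0], B[1]) for x in range(L[0], R[1]))
    il, it, ir, ib = L[1], T[1], R[0], B[0]   # box contained in all members
    neg = 0
    if il < ir and it < ib:
        neg = sum(min(0, weights[y][x])
                  for y in range(it, ib) for x in range(il, ir))
    return pos + neg

def split(s):
    idx = max(range(4), key=lambda i: s[i][1] - s[i][0])  # widest interval
    lo, hi = s[idx]
    mid = (lo + hi) // 2
    for half in ((lo, mid), (mid + 1, hi)):
        yield s[:idx] + (half,) + s[idx + 1:]

def branch_and_bound(w, h):
    start = ((0, w), (0, h), (0, w), (0, h))
    heap = [(-upper_bound(start), start)]
    while heap:
        bound, s = heapq.heappop(heap)
        if all(lo == hi for lo, hi in s):      # a single concrete box
            l, t, r, b = (iv[0] for iv in s)
            if l < r and t < b:                # skip degenerate boxes
                return (l, t, r, b), -bound
            continue
        for child in split(s):
            heapq.heappush(heap, (-upper_bound(child), child))

best_box, best_score = branch_and_bound(4, 4)
```

For a singleton set the bound equals the exact score, so the first valid single box popped from the priority queue is the global maximum.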
SLIDE 14

Representing Sets of Boxes

  • Boxes: [l, t, r, b] ∈ ℝ⁴.
  • Box sets: [L, T, R, B] ∈ (ℝ²)⁴, one interval per coordinate.

Splitting:

  • Identify the largest interval, e.g. R. Split it at its center: R → R₁ ∪ R₂.
  • New box sets: [L, T, R₁, B] and [L, T, R₂, B].
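The interval encoding and the split step might look like this (a hypothetical integer-grid encoding; the slide's continuous intervals are discretized here):

```python
# A box set [L, T, R, B] stores one (lo, hi) interval per coordinate.
# Splitting halves the widest interval, producing two disjoint box sets
# whose union is the original set.

def split_boxset(box_set):
    """box_set: tuple of four (lo, hi) intervals for (L, T, R, B)."""
    idx = max(range(4), key=lambda i: box_set[i][1] - box_set[i][0])
    lo, hi = box_set[idx]
    mid = (lo + hi) // 2
    first = box_set[:idx] + ((lo, mid),) + box_set[idx + 1:]
    second = box_set[:idx] + ((mid + 1, hi),) + box_set[idx + 1:]
    return first, second

s = ((10, 20), (0, 4), (30, 80), (5, 9))   # R = (30, 80) is the widest
a, b = split_boxset(s)
```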

SLIDE 20

Calculating Scores for Box Sets

Example: linear support vector machine, f(y) := Σ_{pᵢ ∈ y} wᵢ.

Upper bound for a box set Y:

f_upper(Y) = Σ_{pᵢ ∈ y∩} min(0, wᵢ) + Σ_{pᵢ ∈ y∪} max(0, wᵢ),

where y∩ is the largest box contained in every box of Y and y∪ is the smallest box containing all of them. Can be computed in O(1) using integral images.
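The O(1) evaluation can be sketched with plain prefix sums (a minimal sketch with made-up `weights`; a real detector would use the image's per-feature SVM weights):

```python
# Integral-image evaluation of the box-set bound: precompute prefix sums of
# the positive and negative parts of the per-pixel weights, then each bound
# costs four lookups per term.

def integral(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def box_sum(ii, l, t, r, b):
    # O(1): four lookups give the sum over pixels x in [l, r), y in [t, b)
    return ii[b][r] - ii[t][r] - ii[b][l] + ii[t][l]

weights = [[-1,  2, -1],
           [ 3, -2,  1]]
pos_ii = integral([[max(0, v) for v in row] for row in weights])
neg_ii = integral([[min(0, v) for v in row] for row in weights])

def f_upper(union_box, inter_box):
    """Bound: positives over the union box, negatives over the intersection."""
    return box_sum(pos_ii, *union_box) + box_sum(neg_ii, *inter_box)

bound = f_upper((0, 0, 3, 2), (1, 0, 2, 1))
```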

SLIDE 21

Calculating Scores for Box Sets

Histogram intersection similarity: f(y) := Σ_{j=1..J} min(h′_j, h^y_j).

Upper bound for a box set Y:

f_upper(Y) = Σ_{j=1..J} min(h′_j, h^{y∪}_j)

As fast as for a single box: O(J) with integral histograms.
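A toy illustration of why the union-box histogram gives an upper bound (all histograms here are made up): since the union box contains every box in the set, its histogram dominates each box histogram entrywise, and min(·,·) is monotone.

```python
# Histogram-intersection score and its box-set bound.

def hist_intersection(h_query, h_box):
    return sum(min(a, b) for a, b in zip(h_query, h_box))

h_query = [4, 1, 3]     # model histogram h'
h_box   = [2, 5, 1]     # histogram of one box y in the set
h_union = [6, 5, 2]     # histogram of the union box y∪ (entrywise >= h_box)

score = hist_intersection(h_query, h_box)
bound = hist_intersection(h_query, h_union)
```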

SLIDE 22

Evaluation: Speed (on PASCAL VOC 2006)

Sliding window runtime:

  • always: O(w²h²)

Branch-and-bound (ESS) runtime:

  • worst case: O(w²h²)
  • empirical: not more than O(wh)
SLIDE 23

Extensions: action classification: (y, t)opt = argmax_{(y,t) ∈ Y×T} f_x(y, t)

  • J. Yuan: Discriminative 3D Subvolume Search for Efficient Action Detection, CVPR 2009.
SLIDE 24

Extensions: localized image retrieval: (x, y)opt = argmax_{y ∈ Y, x ∈ D} f_x(y)

  • C.L.: Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, ICCV 2009
SLIDE 25

Extensions: Hybrid – Branch-and-Bound with Implicit Shape Model

  • A. Lehmann, B. Leibe, L. van Gool: Feature-Centric Efficient Subwindow Search, ICCV 2009
SLIDE 27

Generalized Sliding Window

yopt = argmax_{y ∈ Y} f(y),   with Y = {all rectangular regions in the image}

  • How to choose/construct/learn f?
  • How to do the optimization efficiently and robustly?
SLIDE 28

Traditional Approach: Binary Classifier

Training images:

  • x₁⁺, …, xₙ⁺ show the object
  • x₁⁻, …, xₘ⁻ show something else

Train a classifier, e.g.

  • support vector machine,
  • boosted cascade,
  • artificial neural network, …

Decision function f : {images} → ℝ

  • f > 0 means “image shows the object.”
  • f < 0 means “image does not show the object.”

SLIDE 29

Traditional Approach: Binary Classifier

Drawbacks:

  • Training distribution ≠ test distribution.
  • No control over partial detections.
  • No guarantee to even find the training examples again.

SLIDE 30

Object Localization as Structured Output Regression

Ideal setup:

  • a function g : {all images} → {all boxes} that predicts object boxes from images
  • train and test in the same way, end-to-end

Regression problem:

  • training examples (x₁, y₁), …, (xₙ, yₙ) ∈ X × Y
    ◮ xᵢ are images, yᵢ are bounding boxes
  • learn a mapping g : X → Y that generalizes from the given examples:
    ◮ g(xᵢ) ≈ yᵢ for i = 1, …, n
SLIDE 32

Structured Support Vector Machine

SVM-like framework by Tsochantaridis et al.:

  • positive definite kernel k : (X × Y) × (X × Y) → ℝ;
    ϕ : X × Y → H is the (implicit) feature map induced by k
  • loss function ∆ : Y × Y → ℝ
  • Solve the convex optimization problem

    min_{w,ξ} ½‖w‖² + C Σᵢ₌₁ⁿ ξᵢ

    subject to the margin constraints, for i = 1, …, n:
    ∀y ∈ Y \ {yᵢ}: ∆(y, yᵢ) + ⟨w, ϕ(xᵢ, y)⟩ − ⟨w, ϕ(xᵢ, yᵢ)⟩ ≤ ξᵢ

  • unique solution: w∗ ∈ H
  • I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 2005.

SLIDE 33

Structured Support Vector Machine

  • w∗ defines the compatibility function F(x, y) = ⟨w∗, ϕ(x, y)⟩
  • the best prediction for x is the most compatible y:

    g(x) := argmax_{y ∈ Y} F(x, y)

  • evaluating g : X → Y is like the generalized sliding window:
    ◮ for fixed x, evaluate the quality function for every box y ∈ Y
    ◮ for example, use the previous branch-and-bound procedure!

SLIDE 34

Joint Image/Box Kernel: Example

Joint kernel: how to compare one (image, box) pair (x, y) with another (image, box) pair (x′, y′)?

[Figure: example image pairs — kjoint((x, y), (x′, y′)) = k(box content, box content) is large for matching object crops and small for mismatched ones; a kjoint built from kimage on the full images could also be large.]
SLIDE 35

Loss Function: Example

Loss function: how to compare two boxes y and y′?

∆(y, y′) := 1 − (area overlap between y and y′) = 1 − area(y ∩ y′) / area(y ∪ y′)
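The overlap loss is a few lines of code (axis-aligned (l, t, r, b) boxes assumed):

```python
# Overlap loss Δ(y, y') = 1 - area(y ∩ y') / area(y ∪ y') for boxes (l, t, r, b).

def area(box):
    l, t, r, b = box
    return max(0, r - l) * max(0, b - t)

def delta(y1, y2):
    l = max(y1[0], y2[0]); t = max(y1[1], y2[1])
    r = min(y1[2], y2[2]); b = min(y1[3], y2[3])
    inter = area((l, t, r, b))
    union = area(y1) + area(y2) - inter
    return 1.0 - inter / union

loss_same = delta((0, 0, 10, 10), (0, 0, 10, 10))  # identical boxes
loss_half = delta((0, 0, 10, 10), (0, 0, 10, 5))   # one box is half the other
```

Identical boxes give loss 0, disjoint boxes give loss 1, so the loss matches the intuition that partial detections should be penalized in proportion to how far they miss.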

SLIDE 36

Structured Support Vector Machine

  • S-SVM optimization:

    min_{w,ξ} ½‖w‖² + C Σᵢ₌₁ⁿ ξᵢ

    subject to, for i = 1, …, n:
    ∀y ∈ Y \ {yᵢ}: ∆(y, yᵢ) + ⟨w, ϕ(xᵢ, y)⟩ − ⟨w, ϕ(xᵢ, yᵢ)⟩ ≤ ξᵢ

  • Solve via constraint generation. Iterate:
    ◮ Solve the minimization with a working set of constraints.
    ◮ Identify argmax_{y ∈ Y} ∆(y, yᵢ) + ⟨w, ϕ(xᵢ, y)⟩.
    ◮ Add violated constraints to the working set and iterate.
  • Polynomial-time convergence to any precision ε.
  • Similar to bootstrap training, but with a margin.
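The "identify argmax" step (loss-augmented inference) can be illustrated on a toy finite Y (hypothetical features and 0/1 loss; the talk instead runs branch-and-bound over all boxes):

```python
# One constraint-generation step: find the most violated constraint
#   argmax_y  Δ(y, y_i) + <w, φ(x_i, y)>
# and add it to the working set if it violates the margin.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def most_violated(w, phi, delta, y_true, candidates):
    return max((y for y in candidates if y != y_true),
               key=lambda y: delta(y, y_true) + dot(w, phi(y)))

# Toy setup: three candidate boxes with made-up 2-d features φ(x_i, y)
# (x_i is fixed), and a 0/1 disagreement loss.
candidates = ["boxA", "boxB", "boxC"]
features = {"boxA": (1.0, 0.0), "boxB": (0.0, 1.0), "boxC": (0.5, 0.5)}
phi = features.__getitem__
delta = lambda y, y_t: 0.0 if y == y_t else 1.0

w = (0.2, 0.8)
y_star = most_violated(w, phi, delta, y_true="boxA", candidates=candidates)

# Margin violation of the chosen constraint under the current w:
violation = delta(y_star, "boxA") + dot(w, phi(y_star)) - dot(w, phi("boxA"))
working_set = [y_star] if violation > 0 else []
```

This mirrors bootstrap training: each round mines the currently worst "hard negative" box, except that the loss term enforces a margin.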
SLIDE 38

Evaluation: PASCAL VOC 2006

Example detections for VOC 2006 bicycle, bus and cat. Precision–recall curves for VOC 2006 bicycle, bus and cat.

  • Structured regression improves detection accuracy.
  • New best scores (at that time) in 6 of 10 classes.
SLIDE 39

Why does it work?

Learned weights from binary (center) and structured training (right).

  • Both methods assign positive weights to the object region.
  • Structured training also assigns negative weights to features surrounding the bounding-box position.
  • The posterior distribution over box coordinates becomes more peaked.

SLIDE 40

More Recent Results (PASCAL VOC 2009)

[Figures: per-class example detections for all twenty VOC classes — aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor.]

SLIDE 60

Extensions: image segmentation with a connectedness constraint.

[Figure: CRF segmentation vs. connected CRF segmentation]

  • S. Nowozin, C.L.: Global Connectivity Potentials for Random Field Models, CVPR 2009.
SLIDE 61

Summary

Object localization is a step towards image interpretation. Conceptual approach instead of algorithmic:

  • Branch-and-bound evaluation:
    ◮ don’t slide a window, but solve an argmax problem
    ⇒ higher efficiency
  • Structured regression training:
    ◮ solve the prediction problem, not a classification proxy
    ⇒ higher localization accuracy
  • Modular and kernelized:
    ◮ easily adapted to other problems/representations, e.g. image segmentation
