Discriminative Models for Multi-Class Object Layout - Chaitanya Desai, Deva Ramanan, Charles Fowlkes - PowerPoint PPT Presentation




SLIDE 1

Discriminative Models for Multi-Class Object Layout

Chaitanya Desai, Deva Ramanan, Charles Fowlkes

Presented by: Vignesh Ramanathan, Vivardhan Kanoria, Kevin Truong

SLIDE 2

Introduction: Why another Object Detector?

Issues with other Detectors:

  • A binary 0-1 classification model is trained for each image window and object class, independent of the rest of the image and the other objects present in it
  • Heuristic post-processing is needed to improve detector performance on datasets, e.g. non-maximal suppression
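To make the second point concrete, the heuristic post-processing being criticized is typically a greedy non-maximal suppression pass over scored windows. A minimal sketch of classic NMS (the box format and the 0.5 overlap threshold are illustrative assumptions, not taken from the slides):

```python
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2); intersection-over-union
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedily keep the highest-scoring box, suppress boxes that overlap it
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

Note that this procedure looks only at box geometry and per-window scores; it has no notion of the inter-object relationships the rest of the slides model explicitly.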

SLIDE 3

Interactions between Objects

  • 1. Activation

Intra-class: textures of objects [17]
Between-class: spatial cueing

[17] Y. Liu, W. Lin, and J. Hays. Near-regular texture analysis and manipulation. ACM Transactions on Graphics, 23(3):368–376, 2004.

SLIDE 4

Interactions between Objects

  • 2. Inhibition

Intra-class: non-maximal suppression
Between-class: mutual exclusion

SLIDE 5

Interactions between Objects

  • 3. Global Properties

Between-class: co-occurrence (e.g. at most 1 biker per bike)
Intra-class: total counts (e.g. at most 1 Sydney Opera House)

SLIDE 6

Summary of Spatial Interactions Modeled

|            | Within Class            | Between Class    |
|------------|-------------------------|------------------|
| Activation | Textures of objects     | Spatial cueing   |
| Inhibition | Non-maximal suppression | Mutual exclusion |
| Global     | Expected counts         | Co-occurrence    |

SLIDE 7

Contributions of Multi-Class Object Layout

  • The object layout framework formulates detection as a structured prediction task over an entire image, rather than a binary classification task on individual sub-windows
  • The model learns all of the spatial interactions listed above, in addition to local appearance statistics

SLIDE 8

Problem Formulation

The objective is to train a model that detects multiple classes of objects in test images, given training images with annotated bounding boxes for each object class.

[Diagram: training images → learning → model parameters; test image + model parameters → inference → detections]

SLIDE 9

Model Formulation

Suppose we wish to model K different object classes. The vector of object labels is:

Y = { y_i : i = 1 … M },  y_i ∈ {0 … K},  where 0 = background

Construct the image pyramid and let M be the total number of sub-windows. An image X is represented by a set of features x_i: X = { x_i : i = 1 … M }

x_i = HOG features;  y_i = 3 (human)

Task: the model should predict all labels Y, given an image X

SLIDE 10

Spatial Interaction Model

The spatial configuration of a window j with respect to a window i is encoded as follows:

d_ij = [ Near?  Far?  Above?  On-top?  Below?  Next-to?  50% Overlap? ]

d_ij is a 7-dimensional sparse binary vector; the first 6 components depend only on the relative location of the center of window j with respect to window i.
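The slides do not give the exact bin definitions, so the thresholds below are assumptions; this sketch only illustrates how the geometry of two windows can be turned into a sparse binary vector d_ij:

```python
def spatial_vector(box_i, box_j, near_frac=1.5):
    # d_ij: 7-dim sparse binary vector [near, far, above, ontop, below, next-to, overlap]
    # Boxes are (cx, cy, w, h); all thresholds are illustrative assumptions.
    cxi, cyi, wi, hgt_i = box_i
    cxj, cyj, wj, hgt_j = box_j
    dx, dy = cxj - cxi, cyj - cyi
    dist = (dx * dx + dy * dy) ** 0.5
    scale = max(wi, hgt_i)                      # normalize by the reference window size
    d = [0] * 7
    if dist <= near_frac * scale: d[0] = 1      # near
    else:                         d[1] = 1      # far
    if abs(dy) > abs(dx):                       # predominantly vertical offset
        if dy < 0: d[2] = 1                     # above
        else:      d[4] = 1                     # below
    elif abs(dx) > 0:
        d[5] = 1                                # next-to
    if abs(dx) < wi / 2 and abs(dy) < hgt_i / 2:
        d[3] = 1                                # on top (centers nearly coincide)
    # 50% overlap test on axis-aligned extents
    ox = max(0.0, min(cxi + wi/2, cxj + wj/2) - max(cxi - wi/2, cxj - wj/2))
    oy = max(0.0, min(cyi + hgt_i/2, cyj + hgt_j/2) - max(cyi - hgt_i/2, cyj - hgt_j/2))
    if ox * oy >= 0.5 * min(wi * hgt_i, wj * hgt_j):
        d[6] = 1
    return d
```

Because the vector is binary and sparse, the pairwise term of the model reduces to selecting a handful of learned weights per window pair.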

SLIDE 11

Model Parameters

The score of labeling an image X with labels Y is:

S(X, Y) = Σ_{i,j} w_{y_i, y_j}^T d_ij  +  Σ_i w_{y_i}^T x_i

where w_{a,b} and w_c are model parameters; the first sum runs over all pairs of windows, the second over all windows.

  • w_{a,b} captures spatial interactions between object classes a and b
  • w_c captures local appearance characteristics of object class c
  • w_{a,b} is 7 × 1;  (a, b) ∈ {0 … K} × {0 … K}
  • w_c has the size of the feature x_i (HOG, etc.);  c ∈ {0 … K}
  • Append a 1 to each x_i to learn biases between classes
  • Assign the background parameters w_0, w_{0,c}, and w_{c,0} to be 0
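A direct transcription of the score into code, using toy dimensions (the array shapes, and the convention that the pairwise sum runs over ordered pairs i ≠ j, are assumptions):

```python
import numpy as np

def score(Y, X, d, w_pair, w_local):
    # S(X, Y) = sum_{i,j} w_{y_i, y_j}^T d_ij + sum_i w_{y_i}^T x_i
    # Y: length-M label list (0 = background); X: (M, F) feature matrix
    # d: (M, M, 7) spatial vectors; w_pair: (K+1, K+1, 7); w_local: (K+1, F)
    M = len(Y)
    s = sum(w_local[Y[i]] @ X[i] for i in range(M))        # local appearance terms
    s += sum(w_pair[Y[i], Y[j]] @ d[i, j]                  # spatial interaction terms
             for i in range(M) for j in range(M) if i != j)
    return float(s)
```

With the background weights fixed to zero as above, the all-background labeling scores exactly 0, which gives the greedy inference below a natural starting point.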
SLIDE 12

Inference: NP-Hard

To get the desired detection, we need to compute:

arg max_Y S(X, Y) = arg max_Y [ Σ_{i,j} w_{y_i, y_j}^T d_ij + Σ_i w_{y_i}^T x_i ]

i.e. find the labeling Y that maximizes the score S for image X, given the learnt model parameters w. There are (K + 1)^M possible values for Y, so exhaustive search is intractable: the problem is NP-hard.
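The exponential blow-up is easy to see by enumerating labelings directly. A brute-force maximizer (usable only for tiny M, with a caller-supplied scoring function; a sketch, not part of the paper's method):

```python
from itertools import product

def brute_force_labels(score_fn, M, K):
    # Enumerate all (K + 1)**M labelings of M windows over K classes
    # (plus background) and return the highest-scoring one.
    return max(product(range(K + 1), repeat=M), key=score_fn)

# Even a modest problem is infeasible this way: M = 20 windows and K = 4
# classes already give 5**20, on the order of 10**14 labelings.
```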

SLIDE 13

Inference: Greedy Forward Search Algorithm

  • 1. Initialize all labels to 0 (i.e. background)
  • 2. Repeatedly change the label of window i to class c, where:

(i, c) is the window-class pair that maximizes the increase in score S(X, Y)

  • 3. Stop when all windows have been instanced, or when step 2 would decrease the score

  • Effectiveness was tested on small-scale problems where the brute-force solution was easily computed
  • The score found by the greedy forward search was quite close to that of the exact solution
  • The two solutions typically differed in the labels of only 1-3 windows

SLIDE 14

Greedy Forward Search: Details

Initialize: I = { } (set of instanced windows);  S = 0;  Δ(i, c) = w_c^T x_i (change in score)

Repeat:
  • 1. (i*, c*) = arg max_{(i,c) ∉ I} Δ(i, c)
  • 2. I = I ∪ { (i*, c*) }
  • 3. S = S + Δ(i*, c*)
  • 4. Δ(i, c) = Δ(i, c) + w_{c*,c}^T d_{i*,i} + w_{c,c*}^T d_{i,i*}

Stop when: Δ(i*, c*) < 0, or all windows have been instanced

SLIDE 15

CRF Formulation - Scoring

  • Model P(Y|X) as a CRF with pairwise potentials, exponential in the score S(X, Y):

P(Y|X) = (1 / Z(X)) e^{S(X,Y)}

  • A natural choice for scoring each detection is the log-odds ratio between the probability of detecting class c versus detecting any other class (y_r, y_s denote labelings of the remaining windows):

m(y_i = c) = log [ P(y_i = c | X) / P(y_i ≠ c | X) ]
           = log [ Σ_{y_r} P(y_i = c, y_r | X)  /  Σ_{y_s, c' ≠ c} P(y_i = c', y_s | X) ]

  • Assume that both marginals are dominated by their largest terms, given by:

r* = arg max_r S(X, y_i = c, y_r)
s* = arg max_{s, c' ≠ c} S(X, y_i = c', y_s)

  • Then the log-odds ratio is approximately:

m(y_i = c) ≈ log [ P(y_i = c, y_{r*} | X) / P(y_i = c*, y_{s*} | X) ]
           = S(X, y_i = c, y_{r*}) - S(X, y_i = c*, y_{s*})
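The final equality holds because the shared partition function Z(X) cancels inside the ratio, so the log-odds collapses to a plain difference of scores. A quick numeric check (the score values are invented for illustration):

```python
import math

# Two optimized scores under the dominant-term assumption
S_with = 12.3       # S(X, y_i = c,  y_{r*})
S_without = 9.1     # S(X, y_i = c*, y_{s*}) with c* != c

# log [ e^{S_with} / e^{S_without} ] reduces to a score difference
log_odds = math.log(math.exp(S_with) / math.exp(S_without))
assert abs(log_odds - (S_with - S_without)) < 1e-9
```

In practice one would compute the difference directly rather than exponentiate, since e^{S} overflows for large scores.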