 
Discriminative Models for Multi-Class Object Layout
Chaitanya Desai, Deva Ramanan, Charles Fowlkes
Presented by: Vignesh Ramanathan, Vivardhan Kanoria, Kevin Truong
Introduction: Why another Object Detector?
Issues with other detectors:
• Binary 0-1 classification model for each image window and object class, independent of the rest of the image and of the other objects present in it
• Heuristic post-processing to improve detector performance on datasets, e.g. non-maximal suppression
Interactions between Objects: 1. Activation
• Intra-class: textures of objects
• Between-class: spatial cueing
[17] Y. Liu, W. Lin, and J. Hays. Near-regular texture analysis and manipulation. ACM Transactions on Graphics, 23(3):368–376, 2004.
Interactions between Objects: 2. Inhibition
• Intra-class: non-maximal suppression
• Between-class: mutual exclusion
Interactions between Objects: 3. Global Properties
• Intra-class: total counts
• Between-class: co-occurrence
Examples: at most 1 Sydney Opera House; at most 1 biker per bike
Summary of Spatial Interactions Modeled
              Within Class               Between Class
Activation    Textures of objects        Spatial cueing
Inhibition    Non-maximal suppression    Mutual exclusion
Global        Expected counts            Co-occurrence
Contributions of Multi-Class Object Layout
• The object layout framework formulates detection as a structured prediction task for an entire image rather than a binary classification task on sub-windows
• The model learns all of the listed spatial interactions, in addition to learning local appearance statistics
Problem Formulation
The objective is to train a model that detects multiple classes of objects in test images, given training images with annotated bounding boxes for each class.
[Figure: pipeline diagram; surviving labels: Test Image, Learning, Model Parameters, Inference]
Model Formulation
• Construct the image pyramid. Let $M$ be the total number of sub-windows.
• An image $X$ is represented by a set of features $x_i$: $X = \{x_i : i = 1 \dots M\}$, e.g. $x_i$ = HOG features
• Suppose we wish to model $K$ different object classes. The vector of object labels is $Y = \{y_i : i = 1 \dots M\}$, $y_i \in \{0, \dots, K\}$; 0 = background, e.g. $y_i = 3$ (human)
• Task: the model should predict all labels $Y$, given an image $X$
Spatial Interaction Model
The spatial configuration of a window $j$ with respect to a window $i$ is encoded as a 7-dimensional sparse binary vector $d_{ij}$, with one component per question:
1. Near?
2. Far?
3. Above?
4. Ontop?
5. Below?
6. Next-to?
7. Overlap? (50%)
The first 6 components depend only on the relative location of the center of window $j$ with respect to window $i$.
[Figure: example binary vectors $d_{i1}$ and $d_{i2}$ for two windows $j = 1$ and $j = 2$]
Model Parameters
The score of labeling an image $X$ with labels $Y$ is:
$$S(X, Y) = \sum_{i,j} \omega_{y_i,y_j}^T d_{ij} + \sum_i \omega_{y_i}^T x_i$$
The first sum runs over all pairs of windows and the second over all windows; $\omega_{a,b}$ and $\omega_c$ are model parameters.
• $\omega_{a,b}$ captures spatial interactions between object classes $a$ and $b$
• $\omega_c$ captures local appearance characteristics of object class $c$
• $\omega_{a,b}$ has size $7 \times 1$; $(a, b) \in \{0, \dots, K\} \times \{0, \dots, K\}$
• $\omega_c$ has the size of the feature vector $x_i$ (HOG, etc.); $c \in \{0, \dots, K\}$
• Append a 1 to each $x_i$ to learn biases between classes
• Assign $\omega_0$, $\omega_{0,1}$ and $\omega_{1,0}$ to be 0
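As a rough illustration of the score of labeling an image X with labels Y, the sketch below evaluates the pairwise and local sums on toy data. The function names, the toy weights, the choice to sum over ordered pairs with i != j, and the placement of the overlap bit in the last position are assumptions made for this example; they are not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def score(x, y, d, w_local, w_pair):
    """Score S(X, Y): pairwise spatial terms plus local appearance terms.

    x[i]          feature vector of window i (with a trailing 1 for the bias)
    y[i]          label of window i, 0 = background
    d[i][j]       7-dim binary spatial descriptor of window j relative to i
    w_local[c]    appearance weights for class c
    w_pair[a][b]  7-dim spatial weights for the class pair (a, b)
    """
    m = len(x)
    local = sum(dot(w_local[y[i]], x[i]) for i in range(m))
    # Assumption: "all pairs" is read as ordered pairs with i != j.
    pair = sum(dot(w_pair[y[i]][y[j]], d[i][j])
               for i in range(m) for j in range(m) if i != j)
    return pair + local


# Hypothetical toy data: two overlapping windows and one object class.
OVERLAP = [0, 0, 0, 0, 0, 0, 1]          # assumed: last bit = overlap
ZERO7 = [0.0] * 7
x = [[2.0, 1.0], [1.5, 1.0]]             # one feature plus the appended 1
d = [[None, OVERLAP], [OVERLAP, None]]   # d[i][i] is never read
w_local = [[0.0, 0.0], [1.0, -1.0]]      # background weights fixed to 0
w_pair = [[ZERO7, ZERO7],
          [ZERO7, [0, 0, 0, 0, 0, 0, -2.0]]]  # same-class overlap is penalised

print(score(x, [1, 0], d, w_local, w_pair))  # one detection
print(score(x, [1, 1], d, w_local, w_pair))  # two overlapping detections
```

With these toy weights the negative overlap weight plays the role of a learned non-maximal suppression: labeling both overlapping windows as the same class scores lower than keeping only the stronger one.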
Inference: NP-Hard
To get the desired detection, we need to compute:
$$\arg\max_Y S(X, Y) = \arg\max_Y \left[ \sum_{i,j} \omega_{y_i,y_j}^T d_{ij} + \sum_i \omega_{y_i}^T x_i \right]$$
i.e. find the labeling $Y$ that maximizes the score $S$ for image $X$, given learnt model parameters $\omega$.
• There are $(K + 1)^M$ possible values for $Y$
• This is NP-hard
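To make the size of the search space concrete, here is a minimal brute-force sketch that enumerates every labeling and keeps the best one. It is only feasible for tiny problems. `brute_force_inference`, `count_labelings` and `toy_score` are hypothetical names, and the toy score (two windows, one class, a penalty when both are labeled) is an assumption standing in for the full score.

```python
from itertools import product


def count_labelings(num_windows, num_classes):
    """Number of candidate labelings: (K + 1) ** M."""
    return (num_classes + 1) ** num_windows


def brute_force_inference(num_windows, num_classes, score_fn):
    """Return the labeling with the highest score by exhaustive enumeration."""
    candidates = product(range(num_classes + 1), repeat=num_windows)
    best = max(candidates, key=score_fn)
    return list(best), score_fn(best)


# Hypothetical toy score: local gains of 1.0 and 0.5 for class 1, and a
# penalty of 4.0 when both (overlapping) windows are labeled class 1.
LOCAL_GAIN = [1.0, 0.5]


def toy_score(y):
    total = sum(LOCAL_GAIN[i] for i, c in enumerate(y) if c == 1)
    if y[0] == 1 and y[1] == 1:
        total -= 4.0
    return total


labels, best_score = brute_force_inference(2, 1, toy_score)
print(labels, best_score)
print(count_labelings(2, 1))    # tiny problem
print(count_labelings(10, 20))  # already far too many to enumerate
```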
Inference: Greedy Forward Search Algorithm
1. Initialize all labels to 0 (i.e. background)
2. Repeatedly change the label of window $i$ to class $c$, where $(i, c)$ is the window-class pair that maximizes the increase in score $S(X, Y)$
3. Stop when all windows have been instanced or step 2 causes a decrease in score
• Effectiveness was tested on small-scale problems where the brute-force solution was easily computed
• The score for the greedy forward search algorithm was found to be quite close to the actual solution
• The two solutions typically differed in the labels of 1-3 windows
Greedy Forward Search: Details
Initialize:
• $I = \{\}$ (set of instanced windows)
• $S = 0$
• $\Delta(i, c) = \omega_c^T x_i$ (change in score)
Repeat:
1. $(i^*, c^*) = \arg\max_{(i,c) \notin I} \Delta(i, c)$
2. $I = I \cup \{(i^*, c^*)\}$
3. $S = S + \Delta(i^*, c^*)$
4. $\Delta(i, c) = \Delta(i, c) + \omega_{c^*,c}^T d_{i^*,i} + \omega_{c,c^*}^T d_{i,i^*}$
Stop when: $\Delta(i^*, c^*) < 0$ or all windows have been instanced
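The greedy forward search can be sketched in a few lines of Python. This is an illustrative reading of the steps on the slide, not the authors' implementation: the function name, the toy data, and the choice to drop every candidate class of a window once that window is instanced are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def greedy_forward_search(x, d, w_local, w_pair):
    """Greedily instance window-class pairs while the score does not drop."""
    m, k = len(x), len(w_local) - 1
    labels = [0] * m                      # everything starts as background
    total = 0.0
    # delta[(i, c)]: gain in score from labeling window i as class c now.
    delta = {(i, c): dot(w_local[c], x[i])
             for i in range(m) for c in range(1, k + 1)}
    while delta:                          # empty once all windows are instanced
        (i_star, c_star), gain = max(delta.items(), key=lambda kv: kv[1])
        if gain < 0:
            break                         # best move would decrease the score
        labels[i_star] = c_star
        total += gain
        # Assumption: an instanced window is never relabeled.
        for c in range(1, k + 1):
            del delta[(i_star, c)]
        # Fold the new detection's pairwise terms into the remaining gains.
        for (i, c) in delta:
            delta[(i, c)] += (dot(w_pair[c_star][c], d[i_star][i])
                              + dot(w_pair[c][c_star], d[i][i_star]))
    return labels, total


# Hypothetical toy data: two overlapping windows and one object class.
OVERLAP = [0, 0, 0, 0, 0, 0, 1]          # assumed: last bit = overlap
ZERO7 = [0.0] * 7
x = [[2.0, 1.0], [1.5, 1.0]]
d = [[None, OVERLAP], [OVERLAP, None]]
w_local = [[0.0, 0.0], [1.0, -1.0]]
w_pair = [[ZERO7, ZERO7], [ZERO7, [0, 0, 0, 0, 0, 0, -2.0]]]

labels, total = greedy_forward_search(x, d, w_local, w_pair)
print(labels, total)
```

On this toy input the search instances the stronger window first; the update in step 4 then makes the overlapping window's gain negative, so the search stops with a single detection.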
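CRF Formulation: Scoring
• Model $P(Y \mid X)$ as a CRF with pairwise potentials, $P(Y \mid X)$ being exponential in $S(X, Y)$, i.e.
$$P(Y \mid X) = \frac{1}{Z(X)} e^{S(X, Y)}$$
• A natural choice for scoring each detection is the log-odds ratio between the probability of detecting class $c$ and the probability of detecting any other class:
$$m(y_i = c) = \log \frac{P(y_i = c \mid X)}{P(y_i \neq c \mid X)} = \log \frac{\sum_{y_r} P(y_i = c, y_r \mid X)}{\sum_{y_s,\, c' \neq c} P(y_i = c', y_s \mid X)}$$
where $y_r$ and $y_s$ range over the labelings of the remaining windows.
• Assume that both marginals are dominated by their largest terms. These are given by:
$$r^* = \arg\max_r S(X, y_i = c, y_r)$$
$$s^* = \arg\max_{s,\, c' \neq c} S(X, y_i = c', y_s)$$
with $c^*$ denoting the class $c'$ that attains the second maximum.
• Then the log-odds ratio is given by:
$$m(y_i = c) \approx \log \frac{P(y_i = c, y_{r^*} \mid X)}{P(y_i = c^*, y_{s^*} \mid X)} = S(X, y_i = c, y_{r^*}) - S(X, y_i = c^*, y_{s^*})$$
The partition function $Z(X)$ cancels in the ratio, so only the two scores remain.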
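The following sketch compares the exact log-odds ratio with the "largest term" approximation on a toy problem small enough to enumerate. The function names and the toy score are hypothetical, and enumeration is used here purely for illustration; it is an assumption of this example rather than how the maxima would be found in practice.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_odds(i, c, num_windows, num_classes, score_fn):
    """Return (exact, approximate) log-odds that window i has class c."""
    with_c, without_c = [], []
    for y in product(range(num_classes + 1), repeat=num_windows):
        (with_c if y[i] == c else without_c).append(score_fn(y))
    # Exact: Z(X) cancels, leaving a difference of log-sum-exp terms.
    exact = log_sum_exp(with_c) - log_sum_exp(without_c)
    # Approximation: keep only the largest term of each marginal.
    approx = max(with_c) - max(without_c)
    return exact, approx


# Hypothetical toy score: local gains of 1.0 and 0.5 for class 1, and a
# penalty of 4.0 when both (overlapping) windows are labeled class 1.
LOCAL_GAIN = [1.0, 0.5]


def toy_score(y):
    total = sum(LOCAL_GAIN[i] for i, c in enumerate(y) if c == 1)
    if y[0] == 1 and y[1] == 1:
        total -= 4.0
    return total


for window in range(2):
    exact, approx = log_odds(window, 1, 2, 1, toy_score)
    print(window, exact, approx)
```

The approximation replaces each log-sum-exp by its maximum, so it is tighter when one labeling clearly dominates each marginal; with the closely spaced toy scores used here the two values differ noticeably.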