 
Recognizing and Learning Object Categories
Based on work and slides by R. Fergus, P. Perona, A. Zisserman, A. Efros, J. Ponce, S. Lazebnik, C. Schmid, F. DiMaio, and others

Traditional Problem: Single Object Recognition
Most Objects Exhibit Considerable Intra-Class Variability

Task: Recognition of object categories
Some object categories
Learn from just examples
Difficulties:
§ Size variation
§ Background clutter
§ Occlusion
§ Intra-class variation
§ Viewpoint variation
§ Illumination variation
Chairs
Related by function, not form

Approach 1: Discriminative Methods
Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not.
[Figure: "Where are the screens?" A bag of image patches in some feature space, with a decision boundary separating computer screen from background.]
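The window-by-window decision described above can be sketched as follows. This is a minimal illustration, not the method from the slides: the window size, the stride, and the toy `bright_patch` classifier are all assumptions chosen to keep the example small.

```python
def sliding_windows(width, height, win, stride):
    """Yield (x, y, w, h) boxes over a width x height image.

    A stride smaller than win makes neighbouring windows overlap.
    """
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)


def detect(image, classify, win=4, stride=2):
    """Return the boxes that the supplied classifier labels +1 (object)."""
    height, width = len(image), len(image[0])
    hits = []
    for (x, y, w, h) in sliding_windows(width, height, win, stride):
        patch = [row[x:x + w] for row in image[y:y + h]]
        if classify(patch) == +1:
            hits.append((x, y, w, h))
    return hits


def bright_patch(patch):
    """Hypothetical stand-in classifier: +1 if mean intensity exceeds 0.5."""
    values = [v for row in patch for v in row]
    return +1 if sum(values) / len(values) > 0.5 else -1


# Toy 8x8 image with a bright 4x4 square (the "screen") at rows and columns 2..5.
image = [[0.0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        image[y][x] = 1.0

boxes = detect(image, bright_patch)
print(boxes)
```

In practice the classifier would be learned from labeled patches and the scan repeated over several window sizes; both are left out here.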
HRCT Lung Image
Dilated bronchus
Training Examples
Formulation
§ Formulation: binary classification
  Features: x = x_1, x_2, x_3, …, x_N (training data), x_{N+1}, x_{N+2}, …, x_{N+M} (test data)
  Labels: y = -1, +1, -1, …, -1 (training data), ?, ?, …, ? (test data)
  Training data: each image patch is labeled as containing the object or not
• Classification function $\hat{y} = F(x)$, where $F$ belongs to some family of functions
• Minimize misclassification error (Not that simple: we need some guarantees that there will be generalization)

Discriminative Methods
§ Nearest Neighbor ($10^6$ examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
§ Neural Networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
§ Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio, 2001; …
§ Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
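As a rough sketch of this formulation, the snippet below uses the simplest method listed, a nearest-neighbor rule, as the classification function and measures its misclassification error on labeled training data and on held-out test data. The 2-D feature vectors and all function names are made up for illustration.

```python
import math


def nearest_neighbor(train_x, train_y, x):
    """Return the label of the training feature vector closest to x."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def misclassification_error(predict, xs, ys):
    """Fraction of examples whose predicted label differs from the true one."""
    wrong = sum(1 for x, y in zip(xs, ys) if predict(x) != y)
    return wrong / len(xs)


# Hypothetical features: -1 = background patch, +1 = object patch.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 0.8)]
train_y = [-1, -1, +1, +1]
test_x = [(0.2, 0.1), (0.8, 0.9)]
test_y = [-1, +1]


def classify(x):
    """The classification function: here, a 1-nearest-neighbor rule."""
    return nearest_neighbor(train_x, train_y, x)


train_error = misclassification_error(classify, train_x, train_y)
test_error = misclassification_error(classify, test_x, test_y)
print(train_error, test_error)
```

A 1-nearest-neighbor rule always reaches zero error on training points that are all distinct, which is one way to see why low training error alone gives no guarantee of generalization.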
Object categorization: the statistical viewpoint
$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$
§ Bayes’s rule:
$$\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$$
§ Discriminative methods model the posterior
§ Generative methods model the likelihood and prior
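A small numeric illustration of the ratio form of Bayes's rule may help. The likelihoods and the prior below are invented numbers, not values from the slides.

```python
def posterior_ratio(lik_obj, lik_bg, prior_obj):
    """Posterior ratio = likelihood ratio * prior ratio.

    lik_obj:   p(image | zebra)
    lik_bg:    p(image | no zebra)
    prior_obj: p(zebra), so p(no zebra) = 1 - prior_obj
    """
    likelihood_ratio = lik_obj / lik_bg
    prior_ratio = prior_obj / (1.0 - prior_obj)
    return likelihood_ratio * prior_ratio


# Assumed values: the image is 20x more likely under "zebra",
# but zebras are rare a priori (10%).
ratio = posterior_ratio(lik_obj=0.02, lik_bg=0.001, prior_obj=0.1)

# Convert the ratio back to a posterior probability p(zebra | image).
p_zebra = ratio / (1.0 + ratio)
print(ratio, p_zebra)
```

A ratio above 1 favors "zebra"; here the strong likelihood ratio outweighs the unfavorable prior.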
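Discriminative
§ Direct modeling of $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$
[Figure: decision boundary separating zebra from non-zebra]

Generative
§ Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$
[Figure: example images with $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$ rated Low, Middle, or High]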
Three main issues
§ Representation
  § How to represent an object category
§ Learning
  § How to form the classifier, given training data
§ Recognition
  § How the classifier is to be used on novel data

Constructing models of image content
Basic components: local features and spatial relations
Textures / Objects / Scenes
Constructing models of image content
Basic components: local features and spatial relations
Textures / Objects / Scenes
Local model / Semi-local model / Global model (usually appearance)

Approach 2: Generative Methods using Bag of Words Models
§ An image is represented by a collection of “visual words” and their corresponding counts given a universal dictionary
§ Object categories are modeled by the distributions of these visual words
§ Although “bag of words” models can use both generative and discriminative approaches, here we will focus on generative models
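One simple way to model a category by the distribution of its visual words is a naive-Bayes-style model over word counts. The sketch below is an assumption made for illustration, not necessarily the generative model these slides go on to present; the three-word dictionary, the counts, and the smoothing constant `alpha` are made up.

```python
import math


def fit_word_distribution(histograms, vocab_size, alpha=1.0):
    """Estimate p(word | category) from count histograms (add-alpha smoothing)."""
    totals = [alpha] * vocab_size
    for hist in histograms:
        for word, count in enumerate(hist):
            totals[word] += count
    norm = sum(totals)
    return [t / norm for t in totals]


def log_likelihood(hist, word_probs):
    """log p(image | category), treating words as independent draws.

    The multinomial coefficient is dropped: it is the same for every category.
    """
    return sum(c * math.log(p) for c, p in zip(hist, word_probs))


def classify(hist, models, priors):
    """Pick the category with the highest log-likelihood plus log-prior."""
    scores = {
        cat: log_likelihood(hist, probs) + math.log(priors[cat])
        for cat, probs in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training histograms over a 3-word visual dictionary.
training = {
    "zebra": [[5, 1, 0], [4, 2, 0]],
    "background": [[0, 1, 5], [1, 1, 4]],
}
models = {cat: fit_word_distribution(h, 3) for cat, h in training.items()}
priors = {"zebra": 0.5, "background": 0.5}

label = classify([3, 1, 0], models, priors)
print(label)
```

The smoothing keeps unseen words from producing a zero probability, which would otherwise make the logarithm undefined.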
Object
Bag of ‘words’

Analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
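The document analogy can be made concrete in a few lines: a text is reduced to counts of dictionary words, discarding word order. The five-word dictionary and the shortened sentence (adapted from the trade passage) are chosen here only for illustration.

```python
import re
from collections import Counter


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs in text, ignoring order."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in dictionary)
    return {w: counts[w] for w in dictionary}


# Hypothetical dictionary mixing words from both example passages.
dictionary = ["china", "trade", "yuan", "brain", "retinal"]
doc = ("China increased the value of the yuan and permitted it to trade; "
       "the US wants the yuan to trade freely.")

bow = bag_of_words(doc, dictionary)
print(bow)
```

The zero counts for the vision-related words are what separate this document from the one about the visual cortex.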
[Figure: overview of learning and recognition. Stages: feature detection & representation, codewords dictionary, image representation. Learning produces category models (and/or) classifiers; recognition produces a category decision.]
1. Feature Detection and Representation

Feature Detection
§ Sliding window
  § Leung et al., 1999
  § Viola et al., 1999
  § Renninger et al., 2002
Feature Detection
§ Sliding window
  § Leung et al., 1999
  § Viola et al., 1999
  § Renninger et al., 2002
§ Regular grid
  § Vogel et al., 2003
  § Fei-Fei et al., 2005
§ Interest point detector
  § Csurka et al., 2004
  § Fei-Fei et al., 2005
  § Sivic et al., 2005
§ Other methods
  § Random sampling (Ullman et al., 2002)
  § Segmentation based patches (Barnard et al., 2003)

Feature Representation
Visual words, aka textons, aka keypoints: K-means clustered pieces of the image
§ Various representations:
  § Filter bank responses
  § Image patches
  § SIFT descriptors
All encode more-or-less the same thing …
Interest Point Features
§ Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
§ Normalize patch
§ Compute SIFT descriptor [Lowe ’99]
Slide credit: Josef Sivic
Patch Features
…

Dictionary Formation
…
Clustering (usually k-means)
…
Vector quantization
Slide credit: Josef Sivic

Clustered Image Patches
Fei-Fei et al. 2005
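Dictionary formation by k-means followed by vector quantization can be sketched roughly as below. The 2-D "descriptors" are toy values (real descriptors such as SIFT are 128-dimensional), and the initial centers are fixed by hand so the run is deterministic; a real system would typically use random or k-means++ initialization.

```python
import math


def nearest(point, centers):
    """Vector quantization: index of the center (codeword) closest to point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def kmeans(points, init_centers, iterations=10):
    """Plain Lloyd's k-means; returns the final cluster centers (the codebook)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iterations):
        # Assignment step: group points by their nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(sum(m[d] for m in members) / len(members)
                             for d in range(dim))
                new_centers.append(mean)
            else:
                new_centers.append(centers[k])  # leave an empty cluster's center as is
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


# Toy descriptors forming two well-separated groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = kmeans(descriptors, init_centers=[descriptors[0], descriptors[3]])

# Map every descriptor to the index of its nearest codeword.
codeword_ids = [nearest(d, codebook) for d in descriptors]
print(codebook, codeword_ids)
```

The returned centers play the role of the codewords; quantizing new descriptors against them is what turns an image into a set of word indices.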
Image Patch Examples of Codewords
Sivic et al. 2005

Image Representation
[Figure: histogram of codeword frequency; axes: codewords (horizontal), frequency (vertical)]
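The image representation amounts to a histogram of codeword frequencies. A minimal sketch, assuming each patch of the image has already been assigned a codeword index (the indices and the vocabulary size below are made up):

```python
from collections import Counter


def codeword_histogram(codeword_ids, vocab_size, normalize=True):
    """Histogram of codeword occurrences for one image.

    With normalize=True the bins are relative frequencies summing to 1,
    so images with different numbers of patches remain comparable.
    """
    counts = Counter(codeword_ids)
    hist = [counts.get(k, 0) for k in range(vocab_size)]
    if normalize and codeword_ids:
        total = len(codeword_ids)
        hist = [c / total for c in hist]
    return hist


# Hypothetical codeword indices for the 8 patches of one image.
ids = [0, 2, 2, 1, 2, 0, 2, 2]
hist = codeword_histogram(ids, vocab_size=4)
raw = codeword_histogram(ids, vocab_size=4, normalize=False)
print(hist, raw)
```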