Recognizing and Learning Object Categories
Based on work and slides by R. Fergus, P. Perona, A. Zisserman, A. Efros, J. Ponce, S. Lazebnik, C. Schmid, F. DiMaio, and others
Traditional Problem: Single Object Recognition
Learn from just examples.
Difficulties:
§ Size variation
§ Background clutter
§ Occlusion
§ Intra-class variation
§ Viewpoint variation
§ Illumination variation
Related by function, not form
Object detection and recognition is formulated as a classification problem.
Where are the screens?
The image is partitioned into a set of overlapping windows (a bag of image patches), and a decision is taken at each window about whether or not it contains a target object.
[Figure: in some feature space, a decision boundary separates "computer screen" from "background"]
Dilated bronchus
§ Formulation: binary classification
Features: x = x_1, x_2, x_3, …, x_N (training) and x_{N+1}, x_{N+2}, …, x_{N+M} (test)
Labels: y_1, …, y_N ∈ {−1, +1} for the training patches; unknown (?) for the test patches
Training data: each image patch is labeled as containing the object (+1) or not (−1).
Test data: the labels must be predicted.
Classification: learn y = f(x), where f belongs to some family of functions.
(Not that simple: we need some guarantees that there will be generalization)
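A minimal sketch of the sliding-window classification loop described above, assuming a trained binary classifier `clf` and a feature extractor `extract_features` (both hypothetical stand-ins, not from the original slides):

```python
import numpy as np

def detect(image, clf, extract_features, window=64, stride=16):
    """Slide a fixed-size window over the image and classify each crop."""
    detections = []
    H, W = image.shape[:2]
    for y in range(0, H - window + 1, stride):
        for x in range(0, W - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            feat = extract_features(patch)            # x_i in the slide notation
            label = clf.predict(feat[np.newaxis])[0]  # y_i in {+1, -1}
            if label == +1:
                detections.append((x, y, window, window))
    return detections
```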
10^6 examples
§ Nearest Neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
§ Neural Networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
§ Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
§ Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
Object categorization: the statistical viewpoint
(zebra vs. no-zebra images)
§ Bayes' rule:
p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] · [ p(zebra) / p(no zebra) ]
posterior ratio = likelihood ratio × prior ratio
§ Discriminative methods model the posterior.
§ Generative methods model the likelihood and prior.
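A small worked example of the ratio form of Bayes' rule above; the numbers are invented purely for illustration:

```python
# Likelihoods under the object and background models (made-up values).
p_image_given_zebra = 0.05
p_image_given_no_zebra = 0.001
p_zebra, p_no_zebra = 0.01, 0.99     # priors

likelihood_ratio = p_image_given_zebra / p_image_given_no_zebra  # 50.0
prior_ratio = p_zebra / p_no_zebra                               # ~0.0101
posterior_ratio = likelihood_ratio * prior_ratio                 # ~0.505
# Posterior ratio < 1, so despite the large likelihood ratio the rare prior
# tips the decision toward "no zebra".
print("zebra" if posterior_ratio > 1 else "no zebra")
```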
§ Discriminative: direct modeling of p(zebra | image) and p(no zebra | image)
[Figure: zebra vs. non-zebra samples separated by a decision boundary]
§ Generative: model p(image | zebra) and p(image | no zebra)
[Figure: example images with high, middle, and low values of p(image | zebra) and p(image | no zebra)]
§ Representation
§ How to represent an object category
§ Learning
§ How to form the classifier, given training data
§ Recognition
§ How the classifier is to be used on novel data
Basic components: local features and spatial relations
§ Textures: local model
§ Objects: semi-local model
§ Scenes: global model (usually appearance)
§ An image is represented by a collection of "visual words" and their corresponding counts given a universal dictionary
§ Object categories are modeled by the distributions of these visual words
§ Although "bag of words" models can use both generative and discriminative approaches, here we will focus on generative models
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern
sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one
Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Pipeline: feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers → category decision (learning and recognition)
Feature Detection
§ Sliding window
§ Leung et al., 1999; Viola et al., 1999; Renninger et al., 2002
§ Regular grid
§ Vogel et al., 2003; Fei-Fei et al., 2005
§ Interest point detector
§ Csurka et al., 2004; Fei-Fei et al., 2005; Sivic et al., 2005
§ Other methods
§ Random sampling (Ullman et al., 2002)
§ Segmentation-based patches (Barnard et al., 2003)
Feature Representation
Visual words (aka textons, aka keypoints): K-means-clustered pieces of the image.
§ Various representations:
§ Filter bank responses
§ Image patches
§ SIFT descriptors
All encode more-or-less the same thing …
Interest Point Features
§ Detect patches [Mikolajczyk and Schmid '02; Matas et al. '02; Sivic et al. '03]
§ Normalize patch
§ Compute SIFT descriptor [Lowe '99]
Slide credit: Josef Sivic
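A minimal sketch of this detect/normalize/describe pipeline using OpenCV's SIFT (one possible implementation; the filename is hypothetical):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image file
sift = cv2.SIFT_create()
# Detection finds scale/rotation-normalized patches; description yields one
# 128-D SIFT vector per patch.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), "patches,", descriptors.shape, "descriptor matrix")
```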
Patch Features
Dictionary Formation
Clustering (usually k-means)
Vector quantization
Slide credit: Josef Sivic
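A minimal sketch of dictionary formation and vector quantization with k-means, assuming `descriptors_per_image` is a list of per-image descriptor arrays (scikit-learn used as one possible tool):

```python
import numpy as np
from sklearn.cluster import KMeans

K = 500  # dictionary size (the slides mention values up to ~2000)
all_desc = np.vstack(descriptors_per_image)            # stack descriptors from all training images
kmeans = KMeans(n_clusters=K, n_init=4, random_state=0).fit(all_desc)

def bag_of_words(descriptors, kmeans, K):
    """Quantize descriptors to their nearest codeword and count frequencies."""
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(K + 1))
    return hist.astype(float) / hist.sum()             # normalized codeword histogram
```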
Clustered Image Patches
Fei-Fei et al. 2005
Image Patch Examples of Codewords
Sivic et al. 2005
Image Representation
[Histogram: frequency of each codeword in the image]
Bags of features: classification pipeline
(Lazebnik, Schmid & Ponce, CVPR 2003 and PAMI 2005)
§ Training: for each training image (class 1 … class n), feature extraction → quantization → "bag of features" signature → kernel computation and Support Vector Machine classifier learning
§ Testing: the test image goes through the same feature extraction and quantization to produce a signature, and the classifier outputs a decision (class label)
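A minimal sketch of the classification stage, assuming `X_train`/`X_test` hold rows of normalized codeword histograms (signatures) and `y_train` holds class labels; a chi-square kernel SVM is used here as one reasonable stand-in for the histogram kernels in the cited work:

```python
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# The kernel is computed directly on the histogram signatures.
clf = SVC(kernel=chi2_kernel)
clf.fit(X_train, y_train)
predicted_labels = clf.predict(X_test)   # decision (class label) per test image
```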
bag of features
§ Serious limitations:
§ No spatial relations
§ No distinction between foreground and background
§ No localization capability
§ And yet they work!
Caltech6 dataset results
[Table: object vs. background classification, ROC equal error rate, comparing bag-of-features variants with the constellation model]
§ More comparisons: Xerox7, Graz, Caltech101, …
§ The simplicity and effectiveness of the bag-of-features method make it a good baseline for evaluating novel approaches and datasets
PASCAL 2005 challenge
http://www.pascal-network.org/challenges/VOC
Training: 684 images; Test set 1: 689 images; Test set 2: 956 images
Object vs. background classification, ROC equal error rate
– Textons (rotation-variant)
– K=2000
– Then clever merging
– Then fitting histogram with Gaussian
– Labeled class data
§ All have equal probability for bag-of-words methods § Location information is important
§ An object in an image is represented by a collection of parts, characterized by both their visual appearances and locations
§ Object categories are modeled by the appearance and spatial distributions of these characteristic parts
§ Issues for such models include efficient methods for finding correspondences between the object and the scene
§ Fischler & Elschlager, 1973
§ Yuille, 1991
§ Brunelli & Poggio, 1993
§ Lades, v.d. Malsburg et al., 1993
§ Cootes, Lanitis, Taylor et al., 1995
§ Amit & Geman, 1995, 1999
§ Perona et al., 1995, 1996, 1998, 2000
§ Felzenszwalb & Huttenlocher, 2000
§ Object as set of parts
§ Generative representation
§ Model:
§ Relative locations between parts
§ Appearance of part
§ Issues:
§ How to model location
§ How to represent appearance
§ Sparse or dense (pixels or regions)
§ How to handle occlusion/clutter
Figure from [Fischler73]
§ Model shape using a Gaussian distribution on the image location between parts and the scale of each part
§ Model appearance as patches of pixel intensities
§ Represent an object class as a graph of P image patches with parameters θ
§ + Computationally tractable (10^5 pixels, 10^1 - 10^2 parts)
§ + Generative representation of class
§ + Avoids modeling global variability
§ + Success in specific object recognition
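A minimal sketch of scoring one hypothesized configuration of P parts under such a model; this is a star-shaped simplification (part locations Gaussian relative to a reference part), and `means`, `covs`, `appearance_logprob`, and `patches` are hypothetical stand-ins for the learned model and detected patches, not the original authors' code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def score_configuration(locations, ref_idx, means, covs, appearance_logprob, patches):
    """locations: (P, 2) part positions; means/covs: per-part Gaussians on
    location relative to the reference part; returns a log-probability score."""
    ref = locations[ref_idx]
    score = 0.0
    for p, loc in enumerate(locations):
        if p != ref_idx:
            # Shape term: Gaussian on the location of part p relative to the reference
            score += multivariate_normal.logpdf(loc - ref, mean=means[p], cov=covs[p])
        # Appearance term for the patch assigned to part p
        score += appearance_logprob(p, patches[p])
    return score
```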
29
§ # Regions << # Pixels § Regions increase tractability but lose information § Generally use regions:
§ Local maxima of interest operators § Can give scale/orientation invariance
Figures from [Kadir04]
Kadir and Brady's interest operator finds maxima in entropy over scale and location.
[Figure: an 11x11 patch is normalized, then projected onto a PCA basis, giving coefficients c1, c2, …, c15]
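A minimal sketch of this patch representation (11x11 patches, normalized, then projected onto the leading 15 PCA components), assuming `patches` is an (N, 11, 11) array of training patches and using scikit-learn's PCA as one possible tool:

```python
import numpy as np
from sklearn.decomposition import PCA

X = patches.reshape(len(patches), -1).astype(float)            # flatten 11x11 patches
X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)  # normalize each patch
pca = PCA(n_components=15).fit(X)
coeffs = pca.transform(X)   # 15 coefficients (c1..c15) per patch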
§ Pixels → pixel groupings → parts → object
Images from [Amit98,Bouchard05]
§ A multi-scale approach increases the number of low-level features
§ [Amit98]
§ [Bouchard05]
Connectivity structures (6-part example):
§ Fully connected: O(N^6)
§ Star structure: O(N^2)
§ Tree structure: O(N^2)
(N = number of candidate locations per part)
§ Articulated motion
§ People § Animals
§ Special parameterizations
§ Limb angles
Images from [Kumar05, Felzenszwalb05]
§ A Dynamic Programming implementation runs in quadratic time
§ Requires tree configuration of parts
§ Felzenszwalb & Huttenlocher (2000) developed linear-time matching algorithm
§ Additional constraint on part-to-part cost function dij § Basic “Trick”: Parallelize minimization computation over entire image using a Generalized Distance Transform
§ Distance transforms
§ O(N^2 P) → O(NP) for tree-structured models
§ How it works
§ Assume the location model is Gaussian (i.e. proportional to e^(-d^2)), so the log-probability cost is f(d) = -d^2
§ Consider a two-part model with µ = 0, σ = 1 on a 1-D image
[Figure: log probability vs. image pixel; the appearance log-probability at x_i for part 2 is A_2(x_i)]
§ For each position of landmark part, find best position for part 2
§ Finding the most probable x_i is equivalent to finding the maximum over a set of offset parabolas
§ The upper envelope is computed in O(N) rather than the obvious O(N^2) via a distance transform [Felzenszwalb and Huttenlocher '05]
§ Add AL(x) to upper envelope (offset by µ) to get overall probability map
[Figure: offset parabolas rooted at candidate pixels x_g, x_h, …, x_l with heights A_2(·), and their upper envelope]
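A minimal sketch of the 1-D O(N) lower-envelope distance transform in the style of Felzenszwalb & Huttenlocher; it computes D(p) = min over q of (p - q)^2 + g(q), and negating g and D gives the upper envelope of the offset parabolas described above (this is an illustrative re-implementation, not the authors' code):

```python
import numpy as np

def distance_transform_1d(g):
    """Compute D(p) = min_q ((p - q)**2 + g(q)) for all p in O(N)."""
    n = len(g)
    D = np.zeros(n)
    v = np.zeros(n, dtype=int)        # indices of parabolas in the lower envelope
    z = np.full(n + 1, 0.0)           # boundaries between envelope parabolas
    z[0], z[1] = -np.inf, np.inf
    k = 0
    for q in range(1, n):
        while True:
            # Intersection of the parabola rooted at q with the rightmost envelope parabola
            s = ((g[q] + q * q) - (g[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            if s <= z[k]:
                k -= 1                # the old parabola is hidden; discard it
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = np.inf
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        D[p] = (p - v[k]) ** 2 + g[v[k]]
    return D
```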
Figure from “Efficient Matching of Pictorial Structures,” P. Felzenszwalb and D. Huttenlocher, Proc. Computer Vision and Pattern Recognition Conf., 2000
Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps
Frank DiMaio Jude Shavlik George N. Phillips, Jr.
Given: a region in a protein
Find: the individual atoms in the density map
Pictorial Structures for Map Interpretation
Basic Idea: Build pictorial structure that is able to model all configurations of a molecule
§ Each part in “collection of parts” corresponds to an atom § Model has low-cost conformation for low-energy states of the molecule
[Figure: PREDICTED vs. ACTUAL results for LYSINE, VALINE, and TYROSINE]
§ Invariance needs to match that of shape model § Insensitive to small shifts in translation/scale
§ Compensate for jitter of features § e.g. SIFT
§ Illumination invariance
§ Normalize out § Condition on illumination of landmark part
§ Explicit
§ Additional match of each part to missing state
§ Implicit
§ Truncated minimum probability of appearance
[Figure: appearance log-probability over appearance space, centered at µ_part, with a truncated floor]
§ Explicit model
§ Generative model for clutter as well as foreground object
§ Use a sub-window
§ At correct position, no clutter is present
Object Categorization: The Statistical Viewpoint
(zebra vs. no-zebra images)
§ Bayes' rule:
p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] · [ p(zebra) / p(no zebra) ]
posterior ratio = likelihood ratio × prior ratio
Generative Probabilistic Model
[Figure: object model vs. background clutter model]
§ Object model: Gaussian shape pdf, Gaussian part appearance pdf, Gaussian relative scale pdf (over log(scale))
§ Background clutter model: uniform shape pdf, Gaussian appearance pdf, uniform relative scale pdf (over log(scale))
§ Poisson pdf on # detections
§ Likelihood ratio: p(X, S, A | θ) / p(X, S, A | θ_bg)
§ A hypothesis h assigns features to model parts, so |H| = O(N^P) in general
p(X, S, A | θ) = Σ_{h ∈ H} p(X, S, A, h | θ)
§ Each term is then decomposed using the chain rule
§ Varying levels of supervision
§ Unsupervised
§ Image labels
§ Object centroid/bounding box
§ Segmented object
§ Manual correspondence (typically sub-optimal)
§ Generative models naturally incorporate labelling information (or lack of it)
§ Discriminative schemes require labels for all data points
Contains a motorbike
Estimation of model parameters
The part assignments are unknown, so we learn them and the model parameters jointly (e.g. with EM)
E-step: Compute assignments for which regions belong to which part (red, green and blue dots)
M-step: Update model parameters
§ For each of the P parts, run its template over all locations in the image
§ Detect local maxima, giving possible locations of each part
§ Given the learned model, find the maximum likelihood ratio p(X, S, A | θ) / p(X, S, A | θ_bg) over all possible correspondences: O(N^P), where N = number of candidate locations of each part in the image
§ If it is greater than a threshold, signal that the object is detected
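A minimal sketch of the first two steps (template responses and local maxima), assuming a hypothetical list `templates` of learned part templates and a grayscale `image` array; the correspondence search and likelihood-ratio test are not shown:

```python
import numpy as np
from scipy.signal import correlate2d
from scipy.ndimage import maximum_filter

def candidate_locations(image, templates, n_per_part=20):
    """Return the N best local-maximum positions of each part's response map."""
    candidates = []
    for t in templates:
        response = correlate2d(image, t, mode="same")             # part template response map
        peaks = (response == maximum_filter(response, size=9))    # local maxima
        ys, xs = np.nonzero(peaks)
        order = np.argsort(response[ys, xs])[::-1][:n_per_part]   # keep the strongest N
        candidates.append(list(zip(xs[order], ys[order])))
    return candidates   # N candidate positions per part, fed to the correspondence search
```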
Two series of experiments:
1. Scale variant (using pre-scaled images)
2. Scale invariant
Datasets:
§ Motorbikes, Faces, Spotted cats, Airplanes, Cars from behind and side
§ 200 - 800 images
Training:
§ 50% of images
§ No identification of the object within the image
Testing:
§ 50% of images
§ Simple object present/absent test
§ ROC equal error rate computed, using a background set of images
Model: P = 6-7 parts, N = 20-30 detections, 20-30 parameters/part, 10-15 PCA features
Equal error rate: 7.5%
Shape Model
Equal error rate: 4.6%
Equal error rate: 9.8%
Equal error rate: 10.0%
Equal error rate: 9.7%
Pre-scaled data (identical settings):
Scale-invariant learning and recognition:
§ Locally approximated by an affine transformation A
[Figure: detected scale-invariant region mapped by A to the projected region]
Lindeberg & Garding (1997); Mikolajczyk & Schmid (2002); Tell & Carlsson (2000); Tuytelaars & Van Gool (2002)
Idea: 3D objects are never planar in the large, but they are always planar in the small.
Representation: local invariants and their spatial layout.
Evaluate the function f(t) along each ray:
f(t) = |I(t) - I_0| / max( (1/t) ∫_0^t |I(τ) - I_0| dτ , d )
Tuytelaars et al., 2000
§ Localization & scale influence the affine neighborhood
§ => affine-invariant Harris points (Mikolajczyk & Schmid '02)
§ Iterative estimation of these parameters:
§ localization: local maximum of the Harris measure
§ scale: automatic scale selection with the Laplacian
§ affine neighborhood: normalization with the second moment matrix
§ Repeat estimation until convergence
§ Initialization with multi-scale interest points
§ Iterative estimation of localization, scale, neighborhood
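A minimal sketch of only the initialization step, multi-scale Harris interest points via OpenCV; the full affine adaptation loop of Mikolajczyk & Schmid is not reproduced here, and the filename, scales, and threshold are illustrative:

```python
import cv2
import numpy as np

img = cv2.imread("graffiti.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image file
points = []
for sigma in (1.0, 2.0, 4.0):                            # coarse multi-scale initialization
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    harris = cv2.cornerHarris(np.float32(blurred), blockSize=3, ksize=3, k=0.04)
    ys, xs = np.where(harris > 0.01 * harris.max())       # keep strong Harris responses
    points.extend((x, y, sigma) for x, y in zip(xs, ys))  # (location, scale) initial points
```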
Initial points
§ Iterative estimation of localization, scale, neighborhood
[Figures: estimated regions after iteration #1 and iteration #2]
§ Initialization with multi-scale interest points
§ Iterative modification of location, scale and neighborhood
> 5000 images; change in viewing angle
22 correct matches
> 5000 images; change in viewing angle + scale change
33 correct matches
§ http://phototour.cs.washington.edu/ § Detect and match local patch features across images of a scene taken by many different people and found via shared image databases such as Flickr
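A minimal sketch of the pairwise matching step such a pipeline builds on: SIFT features matched between two photos of the same scene with Lowe's ratio test (OpenCV used as one possible tool; the filenames are hypothetical):

```python
import cv2

img1 = cv2.imread("notre_dame_1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("notre_dame_2.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
# Keep a match only if its best distance is clearly better than the second best.
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "putative matches")
```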
§ Correspondence problem
§ Efficient methods for large # parts and # positions in the image
§ Challenge to get a representation with the desired invariance
§ Minimal supervision
§ Future directions:
§ Multiple views
§ Approaches to learning
§ Multiple category training
§ Example: Given an image and object category, segment the object
Segmentation should (ideally) be
[Figure: cow image + object category model → segmentation → segmented cow]