Bag-of-features models for category classification
Cordelia Schmid
Category recognition
• Image classification: assigning a class label to the image
  Car: present, Cow: present, Bike: not present, Horse: not present, …
Category recognition – Tasks
• Image classification: assigning a class label to the image
  Car: present, Cow: present, Bike: not present, Horse: not present, …
• Object localization: determine the location and the category (e.g., bounding boxes labeled Car, Cow)
Difficulties: within-object variations
• Variability: camera position, illumination, internal parameters
Difficulties: within-class variations
Image classification
• Given: positive training images containing an object class, and negative training images that don't
• Classify: a test image as to whether it contains the object class or not
Bag-of-features – Origin: texture recognition
• Texture is characterized by the repetition of basic elements, or textons
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Bag-of-features – Origin: texture recognition
• Each texture is represented as a histogram over a universal texton dictionary
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Bag-of-features – Origin: bag-of-words (text)
• Orderless document representation: frequencies of words from a dictionary
• Classification to determine document categories
Example word counts across four documents:
  Common:    2  0  0  1
  People:    3  0  0  2
  Sculpture: 0  1  3  0
  …
Bag-of-features for image classification
Extract regions → Compute descriptors → Find clusters and frequencies → Compute distance matrix → Classification (SVM)
[Csurka et al., ECCV Workshop'04], [Nowak, Jurie & Triggs, ECCV'06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV'07]
Bag-of-features for image classification
Step 1: extract regions, compute descriptors → Step 2: find clusters and frequencies → Step 3: compute distance matrix, classification (SVM)
Step 1: feature extraction
• Scale-invariant image regions + SIFT (see previous lecture)
  – Affine-invariant regions give "too much" invariance
  – Rotation invariance is "too much" invariance for many realistic collections
• Dense descriptors
  – Improve results in the context of categories (for most categories)
  – Interest points do not necessarily capture "all" features
• Color-based descriptors
• Shape-based descriptors
Dense features
– Multi-scale dense grid: extraction of small overlapping patches at multiple scales
– Computation of the SIFT descriptor for each grid cell
– Example settings: horizontal/vertical step size of 3 pixels, scaling factor of 1.2 per level
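The multi-scale dense grid above can be sketched as follows; this is a minimal illustration of patch placement only (the function name, patch size, and number of levels are assumptions, not values from the slides, apart from the step of 3 pixels and the 1.2 scaling factor):

```python
import numpy as np

def dense_grid(width, height, patch_size=16, step=3, scale_factor=1.2, n_levels=4):
    """Enumerate (x, y, size) of overlapping patches on a multi-scale dense grid."""
    patches = []
    size = float(patch_size)
    for _ in range(n_levels):
        s = int(round(size))
        # Slide the patch over the image with the given step
        for y in range(0, height - s + 1, step):
            for x in range(0, width - s + 1, step):
                patches.append((x, y, s))
        size *= scale_factor  # enlarge patches by 1.2 per level
    return patches

grid = dense_grid(64, 48)
```

Each returned triple would then be cropped and described with SIFT; the descriptor computation itself is omitted here.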
Bag-of-features for image classification
Step 1: extract regions, compute descriptors → Step 2: find clusters and frequencies → Step 3: compute distance matrix, classification (SVM)
Step 2: Quantization
• Clustering of descriptors into a visual vocabulary
Examples of visual words: airplanes, motorbikes, faces, wild cats, leaves, people, bikes
Step 2: Quantization
• Cluster descriptors
  – K-means
  – Gaussian mixture model
• Assign each descriptor to a cluster (visual word)
  – Hard or soft assignment
• Build frequency histogram
K-means clustering
• Minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers:
  D = Σ_i ‖x_i − c(x_i)‖², where c(x_i) is the cluster center nearest to x_i
• Algorithm:
  – Randomly initialize K cluster centers
  – Iterate until convergence:
    • Assign each data point to the nearest center
    • Recompute each cluster center as the mean of all points assigned to it
• Converges to a local minimum; the solution depends on the initialization
• Initialization is important: run several times and select the best result
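The algorithm above, including the restart strategy from the last bullet, can be sketched in a few lines (a plain NumPy sketch, not an optimized implementation; the function name and default parameters are assumptions):

```python
import numpy as np

def kmeans(X, k, n_iter=50, n_restarts=5, seed=0):
    """Plain K-means with random restarts; keeps the lowest-cost run."""
    rng = np.random.default_rng(seed)
    best_cost, best_centers = np.inf, None
    for _ in range(n_restarts):
        # Randomly initialize K centers from the data points
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            # Hard assignment: each point goes to its nearest center
            d = np.linalg.norm(X[:, None] - centers[None], axis=2)
            labels = d.argmin(axis=1)
            # Recompute each center as the mean of its assigned points
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        # Sum of squared distances to the nearest center (the objective D)
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        cost = (d.min(axis=1) ** 2).sum()
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers, best_cost

# Toy data: two well-separated clusters
X = np.vstack([np.zeros((20, 2)), 10 * np.ones((20, 2))])
centers, cost = kmeans(X, k=2)
```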
Gaussian mixture model (GMM)
• Mixture of Gaussians: weighted sum of Gaussians
  p(x) = Σ_{k=1..K} π_k N(x | μ_k, Σ_k),
  where N(x | μ, Σ) is a Gaussian density and the mixture weights π_k sum to 1
Hard or soft assignment
• K-means → hard assignment
  – Assign each descriptor to the closest cluster center
  – Count the number of descriptors assigned to each center
• Gaussian mixture model → soft assignment
  – Estimate the distance to all centers
  – Sum the soft assignments over all descriptors
• Represent the image by a frequency histogram
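The two assignment schemes can be contrasted in code; this sketch assumes a shared isotropic sigma and uniform mixture weights for the soft case, a simplification of a full GMM (the function names are illustrative):

```python
import numpy as np

def hard_histogram(desc, centers):
    """K-means style: count descriptors assigned to their nearest center."""
    d = np.linalg.norm(desc[:, None] - centers[None], axis=2)
    h = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return h / h.sum()  # L1 normalization

def soft_histogram(desc, centers, sigma=1.0):
    """GMM style: accumulate Gaussian responsibilities over all centers."""
    d2 = ((desc[:, None] - centers[None]) ** 2).sum(axis=2)
    r = np.exp(-d2 / (2 * sigma ** 2))
    r /= r.sum(axis=1, keepdims=True)   # per-descriptor soft assignment
    return r.sum(axis=0) / len(desc)

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.0], [9.9, 10.0], [10.0, 9.8]])
h_hard = hard_histogram(desc, centers)
h_soft = soft_histogram(desc, centers)
```

Both variants produce the L1-normalized frequency histogram used as the image representation in the next slide.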
Image representation
• Each image is represented by a histogram of codeword frequencies, typically a vector of 1000–4000 dimensions, normalized with the L1 or L2 norm
• Fine-grained vocabulary – represents model instances
• Coarse-grained vocabulary – represents object categories
Bag-of-features for image classification
Step 1: extract regions, compute descriptors → Step 2: find clusters and frequencies → Step 3: compute distance matrix, classification (SVM)
Step 3: Classification
• Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes, i.e., a decision boundary (e.g., zebra vs. non-zebra)
Training data
• Vectors are histograms, one from each training image (positive and negative examples)
• Train a classifier, e.g., an SVM
Linear classifiers
• Find a linear function (hyperplane) to separate positive and negative examples:
  x_i positive: w · x_i + b ≥ 0
  x_i negative: w · x_i + b < 0
• Which hyperplane is best?
Linear classifiers – margin
[Figure: two 2-D examples with features x1 (roundness) and x2 (color)]
• A separating hyperplane that passes close to the training points does not generalize well
• Generalization is better if a margin is introduced around the decision boundary
Nonlinear SVMs
• Datasets that are linearly separable work out great
• But what if the dataset is just too hard, e.g., points on a line that no single threshold separates?
• We can map it to a higher-dimensional space, e.g., x → (x, x²)
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)
Nonlinear SVMs
• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that
  K(x_i, x_j) = φ(x_i) · φ(x_j)
• This gives a nonlinear decision boundary in the original feature space:
  f(x) = Σ_i α_i y_i K(x_i, x) + b
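The kernelized decision function above can be written directly; the support vectors, dual coefficients α, labels y, and bias b below are hypothetical placeholders that SVM training would normally provide, shown here with a linear kernel only to make the formula concrete:

```python
import numpy as np

def svm_decision(x, support, alpha, y, b, K):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, support)) + b

linear = lambda u, v: float(np.dot(u, v))     # K(u, v) = u . v
support = [np.array([1.0]), np.array([-1.0])]  # hypothetical support vectors
alpha, y, b = [1.0, 1.0], [1, -1], 0.0         # hypothetical dual solution
```

The sign of `svm_decision` gives the predicted class; swapping `linear` for one of the histogram kernels on the next slide yields the nonlinear classifiers used for bags of features.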
Kernels for bags of features
• Histogram intersection kernel:
  I(h1, h2) = Σ_{i=1..N} min(h1(i), h2(i))
• Generalized Gaussian kernel:
  K(h1, h2) = exp(−(1/A) D(h1, h2))
• D can be the Euclidean distance → RBF kernel:
  D(h1, h2) = Σ_{i=1..N} (h1(i) − h2(i))²
• D can be the χ² distance:
  D(h1, h2) = Σ_{i=1..N} (h1(i) − h2(i))² / (h1(i) + h2(i))
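The histogram kernels above are straightforward to implement; a small epsilon is added in the χ² denominator to guard against empty bins (an implementation detail, not part of the slide's formula):

```python
import numpy as np

def intersection_kernel(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return float(np.minimum(h1, h2).sum())

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms."""
    return float((((h1 - h2) ** 2) / (h1 + h2 + eps)).sum())

def generalized_gaussian_kernel(h1, h2, D, A=1.0):
    """K(h1, h2) = exp(-D(h1, h2) / A); A is a normalization constant."""
    return float(np.exp(-D(h1, h2) / A))

# Two L1-normalized toy histograms
h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.4, 0.4, 0.2])
```

For identical L1-normalized histograms the intersection kernel reaches its maximum of 1 and the χ² distance is 0, so the generalized Gaussian kernel evaluates to 1.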