Bag-of-features models for category classification

  1. Bag-of-features models for category classification. Cordelia Schmid

  2. Category recognition • Image classification: assigning a class label to the image. Car: present, Cow: present, Bike: not present, Horse: not present, …

  3. Category recognition: tasks • Image classification: assigning a class label to the image (Car: present, Cow: present, Bike: not present, Horse: not present, …) • Object localization: define the location and the category of each object (e.g., the location and category labels of the car and the cow in the image)

  4. Difficulties: within-object variations • Variability: camera position, illumination, internal camera parameters

  5. Difficulties: within-class variations

  6. Image classification • Given: positive training images containing an object class, and negative training images that don't • Classify: decide whether a test image contains the object class or not

  7. Bag-of-features – origin: texture recognition • Texture is characterized by the repetition of basic elements, or textons. Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  8. Bag-of-features – origin: texture recognition • Each texture is represented by a histogram over a universal texton dictionary. Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  9. Bag-of-features – origin: bag-of-words (text) • Orderless document representation: frequencies of words from a dictionary • Classification to determine document categories (illustrated by a table of per-document counts for words such as "common", "people", "sculpture", …)

  10. Bag-of-features for image classification • Pipeline: extract regions → compute descriptors → find clusters and frequencies → compute distance matrix → classification (SVM). [Csurka et al., ECCV Workshop'04], [Nowak, Jurie & Triggs, ECCV'06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV'07]

  11. Bag-of-features for image classification • The same pipeline, grouped into three steps: Step 1 – extract regions and compute descriptors; Step 2 – find clusters and frequencies; Step 3 – compute distance matrix and classify (SVM)

  12. Step 1: feature extraction • Scale-invariant image regions + SIFT (see previous lecture) – Affine-invariant regions give "too" much invariance – Rotation invariance is "too" much invariance for many realistic collections • Dense descriptors – Improve results in the context of categories (for most categories) – Interest points do not necessarily capture "all" features • Color-based descriptors • Shape-based descriptors

  13. Dense features – Multi-scale dense grid: extraction of small overlapping patches at multiple scales – Computation of the SIFT descriptor for each grid cell – Example settings: horizontal/vertical step size of 3 pixels, scaling factor of 1.2 per level
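
A minimal sketch of the dense multi-scale grid described on this slide, assuming a grayscale image stored as a NumPy array; the patch size, the nearest-neighbour downsampling, and the helper name dense_grid_patches are illustrative choices, and in the actual pipeline each patch would be turned into a SIFT descriptor rather than kept as raw pixels.

```python
import numpy as np

def dense_grid_patches(image, patch_size=16, step=3, scale_factor=1.2, n_scales=4):
    """Extract small overlapping patches on a dense grid at multiple scales.

    Returns (x, y, scale, patch) tuples with x, y in original-image coordinates;
    in the real pipeline each patch would be described with SIFT.
    """
    patches = []
    current = image.astype(np.float32)
    scale = 1.0
    for _ in range(n_scales):
        h, w = current.shape
        for y in range(0, h - patch_size + 1, step):
            for x in range(0, w - patch_size + 1, step):
                patches.append((x * scale, y * scale, scale,
                                current[y:y + patch_size, x:x + patch_size]))
        # Shrink the image by the scaling factor (nearest neighbour, for brevity).
        scale *= scale_factor
        new_h, new_w = int(h / scale_factor), int(w / scale_factor)
        if min(new_h, new_w) < patch_size:
            break
        rows = (np.arange(new_h) * scale_factor).astype(int)
        cols = (np.arange(new_w) * scale_factor).astype(int)
        current = current[rows][:, cols]
    return patches

# Example: a random 128x128 "image" yields a few thousand patches.
print(len(dense_grid_patches(np.random.rand(128, 128))))
```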

  14. Bag-of-features for image classification • Pipeline recap: Step 1 – extract regions and compute descriptors; Step 2 – find clusters and frequencies; Step 3 – compute distance matrix and classify (SVM)

  15. Step 2: quantization • Clustering the local descriptors produces the visual vocabulary

  16. Examples of visual words: airplanes, motorbikes, faces, wild cats, leaves, people, bikes

  17. Step 2: quantization • Cluster the descriptors – K-means – Gaussian mixture model • Assign each descriptor to a cluster (visual word) – Hard or soft assignment • Build a frequency histogram
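
A minimal sketch of this quantization step, assuming the local descriptors are already stacked in NumPy arrays; it uses scikit-learn's KMeans with hard assignment, and the vocabulary size and toy data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, k=1000, seed=0):
    """Cluster training descriptors into k visual words (cluster centers)."""
    return KMeans(n_clusters=k, n_init=5, random_state=seed).fit(all_descriptors)

def bof_histogram(image_descriptors, vocabulary):
    """Hard-assign each descriptor to its nearest word and count per word."""
    words = vocabulary.predict(image_descriptors)
    return np.bincount(words, minlength=vocabulary.n_clusters).astype(float)

# Toy usage with random 128-D "SIFT" descriptors.
vocab = build_vocabulary(np.random.rand(2000, 128), k=50)
hist = bof_histogram(np.random.rand(300, 128), vocab)
print(hist.shape, hist.sum())   # (50,) 300.0
```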

  18. K-means clustering • Minimizes the sum of squared Euclidean distances between the points x_i and their nearest cluster centers c_k, i.e., Σ_i min_k ||x_i − c_k||² • Algorithm: – Randomly initialize K cluster centers – Iterate until convergence: assign each data point to the nearest center, then recompute each cluster center as the mean of all points assigned to it • Finds only a local minimum; the solution depends on the initialization • Initialization is important: run several times and select the best solution
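
A compact from-scratch version of the algorithm just described (random initialization, assign, recompute, iterate, keep the best of several runs); the restart count, iteration cap, and toy data are illustrative.

```python
import numpy as np

def kmeans(points, k, n_restarts=5, n_iters=100, seed=0):
    """K-means: minimize the sum of squared distances to the nearest center."""
    rng = np.random.default_rng(seed)
    best_centers, best_cost = None, np.inf
    for _ in range(n_restarts):                     # run several times, keep the best
        centers = points[rng.choice(len(points), k, replace=False)]
        for _ in range(n_iters):
            # Assign each data point to the nearest center.
            dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute each cluster center as the mean of the points assigned to it.
            new_centers = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)])
            if np.allclose(new_centers, centers):   # converged (local minimum)
                break
            centers = new_centers
        cost = (np.linalg.norm(points - centers[labels], axis=1) ** 2).sum()
        if cost < best_cost:                        # the solution depends on initialization
            best_centers, best_cost = centers, cost
    return best_centers, best_cost

centers, cost = kmeans(np.random.rand(500, 2), k=4)
print(centers.shape, cost)
```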

  19. Gaussian mixture model (GMM) • Mixture of Gaussians: a weighted sum of Gaussians, p(x) = Σ_{k=1..K} π_k N(x | μ_k, Σ_k), where the weights π_k are non-negative and sum to 1
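
A small numerical check of this weighted-sum definition, assuming a GMM fit with scikit-learn on toy data: the density computed by hand from the weights, means, and covariances matches the log-density returned by score_samples.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

data = np.random.rand(1000, 2)
gmm = GaussianMixture(n_components=3, random_state=0).fit(data)

# p(x) = sum_k pi_k * N(x | mu_k, Sigma_k), evaluated by hand at one point.
x = data[0]
p = sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
        for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_))

print(np.isclose(np.log(p), gmm.score_samples(x[None])[0]))   # should print True
```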

  20. Hard or soft assignment • K-means → hard assignment – Assign each descriptor to the closest cluster center – Count the number of descriptors assigned to each center • Gaussian mixture model → soft assignment – Estimate the distance to all centers – Sum the contributions over all descriptors • Represent the image by the resulting frequency histogram
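
A minimal sketch contrasting the two assignment schemes, assuming toy descriptors, a scikit-learn KMeans vocabulary, and a GaussianMixture with the same number of components; under soft assignment each descriptor spreads one unit of mass over the visual words according to its posterior responsibilities.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

K = 32
train = np.random.rand(3000, 16)     # stand-in for training descriptors
image = np.random.rand(200, 16)      # descriptors of one test image

# Hard assignment (k-means): count descriptors per nearest cluster center.
km = KMeans(n_clusters=K, n_init=3, random_state=0).fit(train)
hard_hist = np.bincount(km.predict(image), minlength=K).astype(float)

# Soft assignment (GMM): sum each descriptor's responsibilities over all Gaussians.
gmm = GaussianMixture(n_components=K, covariance_type='diag', random_state=0).fit(train)
soft_hist = gmm.predict_proba(image).sum(axis=0)

print(hard_hist.sum(), soft_hist.sum())   # both equal the number of descriptors (200)
```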

  21. Image representation • Each image is represented by a frequency histogram over the codewords • The vector typically has 1000–4000 dimensions and is normalized with the L1 or L2 norm • Fine grained – represents model instances • Coarse grained – represents object categories
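
A tiny sketch of the normalization step, assuming the raw word counts for one image are in a NumPy vector; L1 normalization yields relative frequencies, L2 yields a unit-length vector.

```python
import numpy as np

hist = np.array([4., 0., 7., 1., 0., 3.])    # toy visual-word counts for one image

l1 = hist / hist.sum()                        # relative frequencies, sums to 1
l2 = hist / np.linalg.norm(hist)              # unit Euclidean length

print(l1.sum(), np.linalg.norm(l2))           # 1.0 1.0
```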

  22. Bag-of-features for image classification • Pipeline recap: Step 1 – extract regions and compute descriptors; Step 2 – find clusters and frequencies; Step 3 – compute distance matrix and classify (SVM)

  23. Step 3: classification • Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes, e.g., a decision boundary separating zebra from non-zebra

  24. Training data • The training vectors are histograms, one per training image, labelled positive or negative • Train a classifier, e.g., an SVM
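
A minimal sketch of this training step with scikit-learn, assuming every image has already been converted to a normalized BoF histogram; the toy histograms, the linear kernel, and the label convention (+1 = object class present) are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 40, 100                    # 40 training images, 100 visual words

# Toy BoF histograms: positives and negatives use slightly different word frequencies.
pos = rng.dirichlet(np.linspace(1.0, 3.0, d), size=n // 2)
neg = rng.dirichlet(np.linspace(3.0, 1.0, d), size=n // 2)
X = np.vstack([pos, neg])
y = np.array([1] * (n // 2) + [-1] * (n // 2))    # +1 = object class present

clf = SVC(kernel='linear').fit(X, y)
print(clf.score(X, y), clf.predict(X[:2]))
```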

  25. Linear classifiers • Find a linear function (hyperplane) to separate the positive and negative examples: x_i positive: w · x_i + b ≥ 0; x_i negative: w · x_i + b < 0 • Which hyperplane is best?

  26. Linear classifiers – margin • With two features x_1 (roundness) and x_2 (color), a hyperplane that passes very close to the training points does not generalize well • Generalization is better if a margin is introduced around the decision boundary

  27. Nonlinear SVMs • Datasets that are linearly separable work out great • But what if the dataset is just too hard, e.g., points on a line that no single threshold can separate? • We can map it to a higher-dimensional space, e.g., x → (x, x²)

  28. Nonlinear SVMs • General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ : x → φ(x)
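
A tiny illustration of such a lifting on the one-dimensional toy case of the previous slide, assuming the split is "small |x|" versus "large |x|": no threshold on the line separates the classes, but after the (assumed) map φ(x) = (x, x²) a linear classifier does.

```python
import numpy as np
from sklearn.svm import LinearSVC

x = np.linspace(-3, 3, 61)
y = np.where(np.abs(x) < 1, 1, -1)            # inner points vs outer points

# In the original 1-D space no single threshold separates the classes.
flat = LinearSVC(C=10.0, max_iter=10000).fit(x[:, None], y)
print(flat.score(x[:, None], y))               # well below 1.0

# Lift with phi(x) = (x, x^2): a line in the lifted space now separates them.
phi = np.column_stack([x, x ** 2])
lifted = LinearSVC(C=10.0, max_iter=10000).fit(phi, y)
print(lifted.score(phi, y))                    # 1.0
```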

  29. Nonlinear SVMs • The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(x_i, x_j) = φ(x_i) · φ(x_j) • This gives a nonlinear decision boundary in the original feature space: f(x) = Σ_i α_i y_i K(x_i, x) + b
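
A small check of this dual form, assuming scikit-learn's SVC with a precomputed RBF kernel on toy 2-D data: the decision values recomputed from the learned α_i y_i (dual_coef_), the support vectors, and b reproduce decision_function.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1, 1, -1)   # ring-shaped classes

K = rbf_kernel(X, X, gamma=1.0)
clf = SVC(kernel='precomputed').fit(K, y)

# f(x) = sum_i alpha_i y_i K(x_i, x) + b, summed over the support vectors only.
manual = K[:, clf.support_] @ clf.dual_coef_[0] + clf.intercept_[0]
print(np.allclose(manual, clf.decision_function(K)))    # should print True
```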

  30. Kernels for bags of features • Histogram intersection kernel: I(h1, h2) = Σ_{i=1..N} min(h1(i), h2(i)) • Generalized Gaussian kernel: K(h1, h2) = exp(−(1/A) D(h1, h2)²) • D can be the Euclidean distance → RBF kernel • D can be the χ² distance: D(h1, h2) = Σ_{i=1..N} (h1(i) − h2(i))² / (h1(i) + h2(i))
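
Minimal sketches of these kernels for two BoF histograms, assuming L1-normalized NumPy vectors and a small epsilon to guard against empty bins; the bandwidth A is illustrative (in practice it is tuned, e.g., to a typical distance over the training set).

```python
import numpy as np

def intersection_kernel(h1, h2):
    """Histogram intersection: I(h1, h2) = sum_i min(h1(i), h2(i))."""
    return np.minimum(h1, h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance: D(h1, h2) = sum_i (h1(i) - h2(i))^2 / (h1(i) + h2(i))."""
    return ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()

def generalized_gaussian_kernel(h1, h2, A=1.0, dist=chi2_distance):
    """K(h1, h2) = exp(-D(h1, h2)^2 / A); with the Euclidean D this is the RBF kernel."""
    return np.exp(-dist(h1, h2) ** 2 / A)

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.4, 0.4, 0.2])
print(intersection_kernel(h1, h2),
      chi2_distance(h1, h2),
      generalized_gaussian_kernel(h1, h2))
```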
