  1. Bag-of-Visual-Words 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University

  2. What object do these parts belong to?

  3. Some local features are very informative. An object as a collection of local features (bag-of-features): • deals well with occlusion • scale invariant • rotation invariant

  4. (not so) crazy assumption: spatial information of local features can be ignored for object recognition (i.e., verification)

  5. CalTech6 dataset Works pretty well for image-level classification Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

  6. Bag-of-features: represent a data item (document, texture, image) as a histogram over features

  7. Bag-of-features: represent a data item (document, texture, image) as a histogram over features; an old idea (e.g., texture recognition and information retrieval)

  8. Texture recognition: histogram over a universal texton dictionary (Julesz, 1981; Mori, Belongie and Malik, 2001)

  9. Vector Space Model. G. Salton, "Mathematics and Information Retrieval", Journal of Documentation, 1979. Two example documents represented as counts over the same vocabulary:

                   Tartan  robot  CHIMP  CMU  bio  soft  ankle  sensor
     document 1:        1      6      2    1    0     0      0       1
     document 2:        0      4      0    1    4     5      3       2

     (example newspaper snippets generated with http://www.fodey.com/generators/newspaper/snippet.asp)

  10. A document (datapoint) is a vector of counts over each word (feature):

      v_d = [ n(w_1, d)  n(w_2, d)  ...  n(w_T, d) ]

      where n(·) counts the number of occurrences; it is just a histogram over words. What is the similarity between two documents?

  11. A document (datapoint) is a vector of counts over each word (feature):

      v_d = [ n(w_1, d)  n(w_2, d)  ...  n(w_T, d) ]

      where n(·) counts the number of occurrences; it is just a histogram over words. What is the similarity between two documents? Use any distance you want, but the cosine distance is fast:

      d(v_i, v_j) = cos θ = (v_i · v_j) / (‖v_i‖ ‖v_j‖)
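A minimal Python sketch of this cosine similarity, reusing the two count vectors from the vector space model slide (variable and function names are illustrative):

```python
import numpy as np

def cosine_similarity(v_i, v_j):
    """Cosine of the angle between two word-count histograms."""
    v_i = np.asarray(v_i, dtype=float)
    v_j = np.asarray(v_j, dtype=float)
    return v_i @ v_j / (np.linalg.norm(v_i) * np.linalg.norm(v_j))

# the two example documents from the vector space model slide
d1 = [1, 6, 2, 1, 0, 0, 0, 1]
d2 = [0, 4, 0, 1, 4, 5, 3, 2]
print(cosine_similarity(d1, d2))  # 1.0 = same direction, 0.0 = no shared words
```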

  12. but not all words are created equal

  13. TF-IDF (Term Frequency, Inverse Document Frequency): weigh each word by a heuristic:

      v_d = [ n(w_1, d) α_1   n(w_2, d) α_2   ...   n(w_T, d) α_T ]

      where the term frequency n(w_i, d) is scaled by the inverse document frequency

      α_i = log( |D| / Σ_{d'} 1[w_i ∈ d'] )

      (down-weights common terms)
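A short sketch of this TF-IDF re-weighting, assuming the documents are stacked into a counts matrix (the function name is an illustration, not part of the slides):

```python
import numpy as np

def tfidf(counts):
    """Re-weight raw counts n(w_i, d) by the inverse document frequency alpha_i.

    counts: (num_documents, vocabulary_size) matrix of word counts.
    """
    counts = np.asarray(counts, dtype=float)
    num_docs = counts.shape[0]                            # |D|
    doc_freq = np.count_nonzero(counts > 0, axis=0)       # how many documents contain w_i
    alpha = np.log(num_docs / np.maximum(doc_freq, 1))    # down-weights common terms
    return counts * alpha                                 # n(w_i, d) * alpha_i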

  14. Standard BOW pipeline (for image classification)

  15. Dictionary Learning: Learn Visual Words using clustering Encode: build Bags-of-Words (BOW) vectors for each image Classify: Train and test data using BOWs

  16. Dictionary Learning: Learn Visual Words using clustering 1. extract features (e.g., SIFT) from images

  17. Dictionary Learning: Learn Visual Words using clustering 2. Learn visual dictionary (e.g., K-means clustering)
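A sketch of this dictionary learning step, assuming local descriptors have already been extracted for each training image and that scikit-learn's KMeans is an acceptable stand-in for the clustering; the vocabulary size of 200 is just a placeholder:

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_dictionary(descriptors_per_image, vocab_size=200, seed=0):
    """Pool descriptors from all training images and cluster them into visual words."""
    all_descriptors = np.vstack(descriptors_per_image)    # (total_features, 128) for SIFT
    kmeans = KMeans(n_clusters=vocab_size, random_state=seed, n_init=10)
    kmeans.fit(all_descriptors)
    return kmeans                                         # cluster centers = visual words
```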

  18. Dictionary Learning: Learn Visual Words using clustering Encode: build Bags-of-Words (BOW) vectors for each image Classify: Train and test data using BOWs

  19. Encode: build Bags-of-Words (BOW) vectors for each image. 1. Quantization: each image feature gets associated to a visual word (nearest cluster center)

  20. Encode: build Bags-of-Words (BOW) vectors for each image 2. Histogram: count the number of visual word occurrences
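Putting slides 19 and 20 together, a hedged sketch of the encoding step, assuming a fitted KMeans dictionary as in the sketch above:

```python
import numpy as np

def encode_bow(descriptors, kmeans):
    """Quantize each descriptor to its nearest visual word, then histogram the counts."""
    vocab_size = kmeans.cluster_centers_.shape[0]
    words = kmeans.predict(descriptors)              # 1. quantization: nearest cluster center
    hist = np.bincount(words, minlength=vocab_size)  # 2. histogram of visual word occurrences
    return hist / max(hist.sum(), 1)                 # normalize so image size doesn't matter
```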

  21. Feature Extraction What kinds of features can we extract?

  22. • Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005 • Interest point detector: Csurka et al. 2004; Fei-Fei & Perona, 2005; Sivic et al. 2005 • Other methods: random sampling (Vidal-Naquet & Ullman, 2002); segmentation-based patches (Barnard et al. 2003)

  23. Detect patches [Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla '02] [Sivic & Zisserman '03] → Normalize patch → Compute SIFT descriptor [Lowe '99]
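A minimal sketch of this detect/normalize/describe step, assuming an OpenCV build that ships SIFT (cv2.SIFT_create is available in recent OpenCV releases); patch normalization happens inside the SIFT implementation:

```python
import cv2

def extract_sift(image_path):
    """Detect interest points and compute 128-D SIFT descriptors for one image."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    return keypoints, descriptors    # descriptors is (num_keypoints, 128), or None if empty
```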

  24. Visual Vocabulary (coding and vector quantization)

  25. Alternative perspective: visual vocabulary = code book, visual word = code vector. The codebook is used for quantizing features. A vector quantizer takes a feature vector and maps it to the index of the nearest code vector in a codebook.
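The vector quantizer described here, as a tiny sketch (brute-force nearest code vector; names are illustrative):

```python
import numpy as np

def quantize(feature, codebook):
    """Map one feature vector to the index of its nearest code vector in the codebook."""
    distances = np.linalg.norm(codebook - feature, axis=1)  # L2 distance to every code vector
    return int(np.argmin(distances))                        # index = visual word id
```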

  26. … Clustering

  27. Visual vocabulary … Clustering

  28. K-means Clustering. Given k: 1. Select initial centroids at random. 2. Assign each object to the cluster with the nearest centroid. 3. Compute each centroid as the mean of the objects assigned to it. 4. Repeat previous 2 steps until no change.

  29. 1. Select initial centroids at random

  30. 1. Select initial centroids at random. 2. Assign each object to the cluster with the nearest centroid.

  31. 3. Compute each centroid as the mean of the objects assigned to it (go to 2).

  32. 2. Assign each object to the cluster with the nearest centroid.

  33. Repeat the previous 2 steps until no change.
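A from-scratch NumPy sketch of the k-means loop walked through in slides 28 to 33 (random initialization, assignment, centroid update, repeat until assignments stop changing):

```python
import numpy as np

def kmeans(points, k, num_iters=100, seed=0):
    """Plain k-means clustering on an (n, d) array of points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)  # 1. init
    labels = None
    for _ in range(num_iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)              # 2. assign each point to nearest centroid
        if labels is not None and np.array_equal(new_labels, labels):
            break                                      # 4. stop when assignments no longer change
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):                    # keep old centroid if a cluster empties
                centroids[j] = points[labels == j].mean(axis=0)  # 3. centroid = mean of its points
    return centroids, labels
```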

  34. From what data should I learn the code book? • Codebook can be learned on separate training set • Provided the training set is sufficiently representative, the codebook will be “universal”

  35. Example visual vocabulary. Fei-Fei et al. 2005

  36. Example codebook … Appearance codebook Source: B. Leibe

  37. Another codebook … Appearance codebook. Source: B. Leibe

  38. Visual vocabularies: Issues • How to choose vocabulary size? Too small: visual words not representative of all patches. Too large: quantization artifacts, overfitting. • Computational efficiency: vocabulary trees (Nister & Stewenius, 2006)

  39. Histogram

  40. [Figure: bag-of-words histogram, frequency (y-axis) over codewords (x-axis)]

  41. Classification

  42. Given the bag-of-features representations of images from different classes, learn a classifier using machine learning (more on this soon)
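The slide leaves the classifier open ("more on this soon"); as one concrete possibility, a sketch that trains a linear SVM from scikit-learn on the BOW histograms:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_bow_classifier(train_histograms, train_labels):
    """Fit a linear classifier on one BOW histogram per training image."""
    clf = LinearSVC(C=1.0)
    clf.fit(np.asarray(train_histograms), np.asarray(train_labels))
    return clf

# at test time: predicted_classes = clf.predict(test_histograms)
```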

  43. Extension to bag-of-words models

  44. All of these images have the same color histogram! How can we encode the spatial layout?

  45. Spatial Pyramid representation level 0 Lazebnik, Schmid & Ponce (CVPR 2006)

  46. Spatial Pyramid representation level 0 level 1 Lazebnik, Schmid & Ponce (CVPR 2006)

  47. Spatial Pyramid representation level 0 level 1 level 2 Lazebnik, Schmid & Ponce (CVPR 2006)
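A hedged sketch of the spatial pyramid representation: per-cell BOW histograms over 1x1, 2x2, and 4x4 grids (levels 0, 1, 2) are concatenated into one long vector; the per-level weighting used by Lazebnik et al. is omitted for brevity, and all names are illustrative:

```python
import numpy as np

def spatial_pyramid(keypoint_xy, words, image_size, vocab_size, levels=3):
    """Concatenate per-cell BOW histograms over 1x1, 2x2, 4x4 grids (levels 0, 1, 2)."""
    keypoint_xy = np.asarray(keypoint_xy, dtype=float)   # (n, 2) keypoint locations
    words = np.asarray(words)                            # (n,) visual word index per keypoint
    width, height = image_size
    histograms = []
    for level in range(levels):
        cells = 2 ** level                                # grid is cells x cells at this level
        col = np.minimum((keypoint_xy[:, 0] * cells / width).astype(int), cells - 1)
        row = np.minimum((keypoint_xy[:, 1] * cells / height).astype(int), cells - 1)
        for r in range(cells):
            for c in range(cells):
                in_cell = (row == r) & (col == c)
                histograms.append(np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(histograms)                     # one long feature vector per image
```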
