Bag-of-Visual-Words
16-385 Computer Vision (Kris Kitani)
Carnegie Mellon University
What object do these parts belong to?
Some local features are very informative.
An object as a collection of local features (bag-of-features): spatial information of local features can be ignored for object recognition (i.e., verification)
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Works pretty well for image-level classification (e.g., on the Caltech-6 dataset)
represent a data item (document, texture, image) as a histogram over features
an old idea (e.g., texture recognition and information retrieval)
[Figure: a texture represented as a histogram over a universal texton dictionary (Mori, Belongie and Malik, 2001; Julesz, 1981)]
[Figure: two newspaper snippets (generated with http://www.fodey.com/generators/newspaper/snippet.asp) and their word-count histograms over the vocabulary {Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor}]
A document (datapoint) is a vector of counts over each word (feature)
What is the similarity between two documents?
v_d = [n(w_1, d)  n(w_2, d)  ···  n(w_T, d)], where n(w, d) counts the number of occurrences of word w in document d
just a histogram over words
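The count-vector representation described above can be sketched in a few lines of Python (the function name and the tiny example vocabulary are illustrative, not from the slides):

```python
from collections import Counter

def count_vector(doc_tokens, vocabulary):
    """Represent a document as a vector of word counts over a fixed vocabulary."""
    counts = Counter(doc_tokens)
    return [counts[w] for w in vocabulary]

vocab = ["tartan", "robot", "chimp", "cmu"]
doc = "cmu tartan robot tartan".split()
v = count_vector(doc, vocab)  # v == [2, 1, 0, 1]
```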
Use any distance you want, but the cosine distance is fast:
d(v_i, v_j) = cos θ = (v_i · v_j) / (‖v_i‖ ‖v_j‖)
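The cosine formula is a direct translation into code. A minimal sketch (the function name is mine):

```python
import math

def cosine_similarity(u, v):
    """cos(theta) between two count vectors: dot product over the norm product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Parallel histograms give 1.0; histograms with no words in common give 0.0.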
but not all words are created equal
weigh each word by a heuristic
Term Frequency Inverse Document Frequency
v_d = [n(w_1, d)  n(w_2, d)  ···  n(w_T, d)]  →  v_d = [n(w_1, d) α_1  n(w_2, d) α_2  ···  n(w_T, d) α_T]
term frequency: n(w_i, d); inverse document frequency: α_i
n(w_i, d) α_i = n(w_i, d) log( D / Σ_d′ 1[w_i ∈ d′] )
where D is the total number of documents and 1[w_i ∈ d′] indicates whether word w_i appears in document d′
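The TF-IDF reweighting can be sketched as follows, assuming the input is a list of raw count vectors (the function name is illustrative; words that appear in no document get weight 0 to avoid division by zero):

```python
import math

def tf_idf(count_vectors):
    """Reweight count vectors: alpha_i = log(D / number of docs containing word i)."""
    D = len(count_vectors)
    T = len(count_vectors[0])
    # document frequency of each word
    df = [sum(1 for v in count_vectors if v[i] > 0) for i in range(T)]
    alphas = [math.log(D / df[i]) if df[i] > 0 else 0.0 for i in range(T)]
    return [[v[i] * alphas[i] for i in range(T)] for v in count_vectors]

docs = [[1, 1], [0, 1]]       # word 1 appears in every doc, so its weight drops to 0
weighted = tf_idf(docs)
```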
(for image classification)
1. Dictionary learning: learn visual words using clustering
2. Encode: build Bag-of-Words (BoW) vectors for each image
3. Classify: train and test data using BoWs
Encode: build Bag-of-Words (BoW) vectors for each image. Each local feature is associated to a visual word (the nearest cluster center), and the BoW vector counts the number of occurrences of each visual word.
What kinds of features can we extract?
Detect patches [Mikolajczyk and Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03]
Normalize patch
Compute SIFT descriptor [Lowe '99]
Alternative perspective: a vector quantizer takes a feature vector and maps it to the index of the nearest code vector in a codebook.
visual vocabulary = code book
visual word = code vector
The codebook is used for quantizing features.
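A minimal sketch of the quantizer and of the BoW encoding it enables (function names and the toy 2-D codebook are illustrative; real features would be 128-D SIFT descriptors):

```python
def quantize(feature, codebook):
    """Map a feature vector to the index of the nearest code vector (visual word)."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sqdist(feature, codebook[k]))

def bow_histogram(features, codebook):
    """Bag-of-words encoding: count how many features fall on each visual word."""
    hist = [0] * len(codebook)
    for f in features:
        hist[quantize(f, codebook)] += 1
    return hist

codebook = [[0, 0], [10, 10]]                 # two "visual words"
hist = bow_histogram([[1, 1], [9, 9], [0, 2]], codebook)  # hist == [2, 1]
```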
Clustering
Visual vocabulary
Given k:
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it.
4. Repeat steps 2-3 until no change.
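The four steps above are Lloyd's k-means algorithm; a minimal sketch (the function name, seed, and iteration cap are my choices):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm, following the four steps on the slide."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # 1. random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]             # 2. assign to nearest centroid
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # 3. recompute each centroid as the mean of its assigned points
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:                          # 4. stop when nothing changes
            break
        centroids = new
    return centroids
```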
From what data should I learn the codebook? The codebook can be learned on a separate training set. If the training set is sufficiently representative, the codebook will be "universal".
Fei-Fei et al. 2005
[Figure: example patches clustered into an appearance codebook. Source: B. Leibe]
(Nister & Stewenius, 2006)
[Figure: bag-of-words histogram, frequency of each codeword]
Given the bag-of-features representations of images from different classes, learn a classifier using machine learning
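The slides leave the classifier open (linear SVMs are a common choice); as a self-contained stand-in, here is a nearest-class-mean classifier over BoW histograms. The function names and toy data are illustrative:

```python
def train_nearest_mean(bows, labels):
    """Compute the mean BoW vector of each class."""
    sums, counts = {}, {}
    for v, y in zip(bows, labels):
        s = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            s[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}

def classify(bow, class_means):
    """Predict the class whose mean histogram is closest."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(class_means, key=lambda y: sqdist(bow, class_means[y]))

means = train_nearest_mean([[5, 0], [4, 1], [0, 5], [1, 4]], ["a", "a", "b", "b"])
```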
How can we encode the spatial layout? All of these images have the same color histogram!
Spatial Pyramid representation (Lazebnik, Schmid & Ponce, CVPR 2006): compute BoW histograms over increasingly fine subdivisions of the image (level 0 is the whole image, level 1 a 2×2 grid, level 2 a 4×4 grid) and concatenate them.
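A minimal sketch of the pyramid descriptor, assuming features have already been quantized to visual-word indices with image coordinates (the function name and flat concatenation are mine; the paper additionally weights each level, which is omitted here):

```python
def spatial_pyramid(points_words, width, height, levels, vocab_size):
    """Concatenate per-cell BoW histograms at each pyramid level (level 0 = whole image)."""
    desc = []
    for lvl in range(levels + 1):
        n = 2 ** lvl                                  # n x n grid at this level
        hists = [[0] * vocab_size for _ in range(n * n)]
        for (x, y), w in points_words:
            cx = min(int(x * n / width), n - 1)       # grid cell containing the feature
            cy = min(int(y * n / height), n - 1)
            hists[cy * n + cx][w] += 1
        for h in hists:
            desc.extend(h)
    return desc

# two quantized features in a 100x100 image, 2-word vocabulary, levels 0-1
pw = [((0, 0), 0), ((99, 99), 1)]
desc = spatial_pyramid(pw, 100, 100, 1, 2)
```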