SLIDE 1

Bag-of-Visual-Words

16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

SLIDE 2

What object do these parts belong to?

SLIDE 3

Some local features are very informative.

An object as a collection of local features (bag-of-features):

  • deals well with occlusion
  • scale invariant
  • rotation invariant
SLIDE 4

(not so) crazy assumption

spatial information of local features can be ignored for object recognition (i.e., verification)

SLIDE 5

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

Works pretty well for image-level classification (Caltech-6 dataset)

SLIDE 6

Bag-of-features

represent a data item (document, texture, image) as a histogram over features

SLIDE 7

Bag-of-features

an old idea

(e.g., texture recognition and information retrieval)

represent a data item (document, texture, image) as a histogram over features

SLIDE 8

Texture recognition

[Figure: textures represented as histograms over a universal texton dictionary]

Julesz, 1981; Mori, Belongie and Malik, 2001

SLIDE 9

Vector Space Model

  • G. Salton. ‘Mathematics and Information Retrieval’ Journal of Documentation, 1979

[Figure: two example documents shown as word-count vectors over the vocabulary {Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor}]

http://www.fodey.com/generators/newspaper/snippet.asp
SLIDE 10

A document (datapoint) is a vector of counts over each word (feature)

What is the similarity between two documents?

v_d = [n(w_1,d)  n(w_2,d)  · · ·  n(w_T,d)]

n(w,d) counts the number of occurrences of word w in document d

just a histogram over words

SLIDE 11

A document (datapoint) is a vector of counts over each word (feature)

What is the similarity between two documents?

v_d = [n(w_1,d)  n(w_2,d)  · · ·  n(w_T,d)]

n(w,d) counts the number of occurrences of word w in document d

just a histogram over words

Use any distance you want, but the cosine distance is fast.

d(v_i, v_j) = cos θ = (v_i · v_j) / (‖v_i‖ ‖v_j‖)
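A minimal Python sketch of both ideas; the vocabulary and the two toy documents below are invented for illustration:

```python
import numpy as np

# Hypothetical toy vocabulary and documents (illustration only).
vocabulary = ["tartan", "robot", "chimp", "cmu", "bio", "soft", "ankle", "sensor"]
doc_a = "cmu tartan robot robot chimp robot".split()
doc_b = "soft ankle sensor robot bio soft".split()

def count_vector(doc, vocabulary):
    """v_d = [n(w_1,d), ..., n(w_T,d)]: raw word counts over the vocabulary."""
    return np.array([doc.count(w) for w in vocabulary], dtype=float)

def cosine_similarity(vi, vj):
    """cos(theta) = (vi . vj) / (||vi|| ||vj||)."""
    return float(vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)))

v_a, v_b = count_vector(doc_a, vocabulary), count_vector(doc_b, vocabulary)
print(v_a)                              # [1. 3. 1. 1. 0. 0. 0. 0.]
print(v_b)                              # [0. 1. 0. 0. 1. 2. 1. 1.]
print(cosine_similarity(v_a, v_b))      # similarity of the two documents
```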

SLIDE 12

but not all words are created equal

SLIDE 13

TF-IDF

weigh each word by a heuristic

Term Frequency Inverse Document Frequency

v_d = [n(w_1,d)  n(w_2,d)  · · ·  n(w_T,d)]   →   v_d = [n(w_1,d) α_1  n(w_2,d) α_2  · · ·  n(w_T,d) α_T]

term frequency × inverse document frequency:

n(w_i,d) α_i = n(w_i,d) · log( D / Σ_{d′} 1[w_i ∈ d′] )

where D is the total number of documents and the sum counts how many documents contain word w_i

  • (down-weights common terms)
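A small sketch of the reweighting; the count matrix below is an invented toy corpus, and `tf_idf` is a hypothetical helper name:

```python
import numpy as np

def tf_idf(counts):
    """Reweight raw counts n(w,d) by alpha_w = log(D / #{documents containing w})."""
    counts = np.asarray(counts, dtype=float)       # shape (D documents, T words)
    D = counts.shape[0]
    doc_freq = (counts > 0).sum(axis=0)            # how many documents contain each word
    alpha = np.log(D / np.maximum(doc_freq, 1))    # guard against words seen in no document
    return counts * alpha                          # n(w,d) * alpha_w, one row per document

# Hypothetical corpus: 3 documents over a 4-word vocabulary.
X = [[2, 0, 1, 3],
     [0, 1, 1, 0],
     [4, 0, 1, 1]]
print(tf_idf(X))   # the word present in all 3 documents (third column) is down-weighted to 0
```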
SLIDE 14

Standard BOW pipeline

(for image classification)

SLIDE 15

Dictionary Learning: Learn Visual Words using clustering
Encode: build Bags-of-Words (BOW) vectors for each image
Classify: Train and test data using BOWs

SLIDE 16

Dictionary Learning: Learn Visual Words using clustering

  • 1. extract features (e.g., SIFT) from images
SLIDE 17

Dictionary Learning: Learn Visual Words using clustering

  • 2. Learn visual dictionary (e.g., K-means clustering)
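A sketch of these two steps, assuming OpenCV (>= 4.4, where SIFT ships in the main module) and scikit-learn are available; the image paths and the vocabulary size k are placeholders:

```python
import cv2                      # assumes opencv-python >= 4.4 for cv2.SIFT_create
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(image_paths, k=500):
    """1. Extract SIFT descriptors from training images; 2. cluster them with k-means."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        if desc is not None:
            descriptors.append(desc)
    all_desc = np.vstack(descriptors)                 # (N, 128) stack of local features
    kmeans = KMeans(n_clusters=k, n_init=10).fit(all_desc)
    return kmeans.cluster_centers_                    # the k visual words (codebook)

# codebook = build_dictionary(["img1.jpg", "img2.jpg"], k=200)   # hypothetical paths
```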
SLIDE 18

Dictionary Learning: Learn Visual Words using clustering
Encode: build Bags-of-Words (BOW) vectors for each image
Classify: Train and test data using BOWs

SLIDE 19

Encode: build Bags-of-Words (BOW) vectors for each image

  • 1. Quantization: each image feature gets associated to a visual word (its nearest cluster center)

SLIDE 20

Encode: build Bags-of-Words (BOW) vectors for each image

  • 2. Histogram: count the number of visual word occurrences
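A numpy sketch of the encoding step, assuming a `codebook` array of cluster centers such as the one produced in the dictionary-learning sketch above; the descriptors below are random stand-ins:

```python
import numpy as np

def encode_bow(descriptors, codebook):
    """Quantize each local feature to its nearest visual word, then histogram the word ids."""
    # 1. Quantization: index of the nearest cluster center for every descriptor
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # 2. Histogram: count occurrences of each visual word (normalized to sum to 1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: 100 random "descriptors" and 5 random "visual words".
rng = np.random.default_rng(0)
bow = encode_bow(rng.normal(size=(100, 8)), rng.normal(size=(5, 8)))
print(bow)
```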
SLIDE 21

Feature Extraction

What kinds of features can we extract?

SLIDE 22
  • Regular grid
    • Vogel & Schiele, 2003
    • Fei-Fei & Perona, 2005
  • Interest point detector
    • Csurka et al. 2004
    • Fei-Fei & Perona, 2005
    • Sivic et al. 2005
  • Other methods
    • Random sampling (Vidal-Naquet & Ullman, 2002)
    • Segmentation-based patches (Barnard et al. 2003)

SLIDE 23

Detect patches
[Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla '02] [Sivic & Zisserman '03]

Normalize patch

Compute SIFT descriptor
[Lowe '99]

SLIDE 24

SLIDE 25

Visual Vocabulary

(coding and vector quantization)

SLIDE 26

A vector quantizer takes a feature vector and maps it to the index of the nearest code vector in a codebook.

visual vocabulary = codebook
visual word = code vector

The codebook is used for quantizing features.

Alternative perspective…

SLIDE 27

SLIDE 28

Clustering

SLIDE 29

Clustering

Visual vocabulary

SLIDE 30

K-means Clustering

Given k:
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it.
4. Repeat the previous 2 steps until no change.
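A minimal numpy sketch of these four steps (not an optimized or production k-means):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: random init, assign to nearest centroid, recompute means, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]      # 1. random initial centroids
    for _ in range(n_iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                             # 2. assign to nearest centroid
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])   # 3. recompute means
        if np.allclose(new_centroids, centroids):                 # 4. stop when nothing changes
            break
        centroids = new_centroids
    return centroids, labels

# Toy example: 200 random 2-D points clustered into k = 3 groups.
pts = np.random.default_rng(1).normal(size=(200, 2))
centers, assignments = kmeans(pts, k=3)
print(centers)
```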

SLIDE 31
  • 1. Select initial centroids at random

SLIDE 32
  • 1. Select initial centroids at random
  • 2. Assign each object to the cluster with the nearest centroid.

SLIDE 33
  • 1. Select initial centroids at random
  • 2. Assign each object to the cluster with the nearest centroid.
  • 3. Compute each centroid as the mean of the objects assigned to it (go to 2)

SLIDE 34
  • 1. Select initial centroids at random
  • 2. Assign each object to the cluster with the nearest centroid.
  • 3. Compute each centroid as the mean of the objects assigned to it (go to 2)
  • 2. Assign each object to the cluster with the nearest centroid.

SLIDE 35
  • 1. Select initial centroids at random
  • 2. Assign each object to the cluster with the nearest centroid.
  • 3. Compute each centroid as the mean of the objects assigned to it (go to 2)
  • 2. Assign each object to the cluster with the nearest centroid. Repeat previous 2 steps until no change

SLIDE 36

From what data should I learn the code book?

  • Codebook can be learned on a separate training set
  • Provided the training set is sufficiently representative, the codebook will be “universal”

SLIDE 37

Example visual vocabulary

Fei-Fei et al. 2005

SLIDE 38

Example codebook

Source: B. Leibe

Appearance codebook

SLIDE 39

Another codebook

Appearance codebook


Source: B. Leibe

SLIDE 40

Visual vocabularies: Issues

  • How to choose vocabulary size?
  • Too small: visual words not representative of all patches
  • Too large: quantization artifacts, overfitting
  • Computational efficiency
  • Vocabulary trees (Nister & Stewenius, 2006)

SLIDE 41

Histogram

SLIDE 42

[Figure: bag-of-words histogram of codeword frequencies]

SLIDE 43

Classification

SLIDE 44

Given the bag-of-features representations of images from different classes, learn a classifier using machine learning

(more on this soon)
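As one deliberately simple example of such a classifier, a nearest-neighbor rule over BOW histograms; the training histograms and labels below are synthetic:

```python
import numpy as np

def nearest_neighbor_classify(train_bows, train_labels, test_bow):
    """Label a test image with the class of the closest training BOW histogram."""
    dists = np.linalg.norm(train_bows - test_bow, axis=1)
    return train_labels[dists.argmin()]

# Synthetic example: 6 training histograms over a 4-word vocabulary, 2 classes.
train_bows = np.array([[.7, .1, .1, .1], [.6, .2, .1, .1], [.8, .0, .1, .1],
                       [.1, .1, .1, .7], [.1, .2, .1, .6], [.0, .1, .1, .8]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
print(nearest_neighbor_classify(train_bows, train_labels, np.array([.65, .15, .1, .1])))  # -> 0
```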

SLIDE 45

Extension to bag-of-words models

SLIDE 46

How can we encode the spatial layout?

All of these images have the same color histogram!

SLIDE 47

Spatial Pyramid representation

level 0

Lazebnik, Schmid & Ponce (CVPR 2006)

SLIDE 48

Spatial Pyramid representation

level 0, level 1

Lazebnik, Schmid & Ponce (CVPR 2006)

SLIDE 49

Spatial Pyramid representation

level 0, level 1, level 2

Lazebnik, Schmid & Ponce (CVPR 2006)
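A sketch of the pyramid construction, assuming each local feature comes with an (x, y) position and a visual-word id; it concatenates per-cell histograms over 1×1, 2×2, and 4×4 grids, and omits the per-level weighting used in the paper:

```python
import numpy as np

def spatial_pyramid(xy, words, k, image_size, levels=(0, 1, 2)):
    """Concatenate per-cell BOW histograms over 1x1, 2x2, 4x4, ... grids of the image."""
    w, h = image_size
    feats = []
    for level in levels:
        n = 2 ** level                                     # n x n grid at this level
        col = np.minimum((xy[:, 0] / w * n).astype(int), n - 1)
        row = np.minimum((xy[:, 1] / h * n).astype(int), n - 1)
        cell = row * n + col
        for c in range(n * n):                             # one histogram per grid cell
            feats.append(np.bincount(words[cell == c], minlength=k))
    return np.concatenate(feats).astype(float)

# Toy example: 50 features with random positions and word ids in a 640x480 image, k = 10 words.
rng = np.random.default_rng(0)
xy = rng.uniform([0, 0], [640, 480], size=(50, 2))
pyr = spatial_pyramid(xy, rng.integers(0, 10, size=50), k=10, image_size=(640, 480))
print(pyr.shape)   # (10 * (1 + 4 + 16),) = (210,)
```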