Lecture: Visual Bag of Words
Lecture: Visual Bag of Words
Juan Carlos Niebles and Ranjay Krishna, Stanford Vision and Learning Lab
Stanford University, 07-Nov-2019


SLIDE 1

Lecture: Visual Bag of Words

Juan Carlos Niebles and Ranjay Krishna
Stanford Vision and Learning Lab

SLIDE 2

CS 131 Roadmap

  • Pixels: convolutions, edges, descriptors
  • Segments: resizing, segmentation, clustering
  • Images: recognition, detection, machine learning
  • Videos: motion, tracking
  • Web: neural networks, convolutional neural networks

SLIDE 3

What we will learn today

  • Visual bag of words (BoW)
  • Spatial Pyramid Matching
  • Naïve Bayes


SLIDE 4

What we will learn today

  • Visual bag of words (BoW)
  • Spatial Pyramid Matching
  • Naïve Bayes


SLIDE 5

Object Bag of ‘words’

SLIDE 6

Origin 1: Texture Recognition

Example textures (from Wikipedia)

SLIDE 7

Origin 1: Texture Recognition

  • Texture is characterized by the repetition of basic elements, or textons

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

SLIDE 8

Origin 1: Texture recognition

Figure: textures represented as histograms over a universal texton dictionary

SLIDE 9

Origin 2: Bag-of-words models

  • Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

SLIDE 10

Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/

  • Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

SLIDE 11


SLIDE 12


SLIDE 13

Bags of features for object recognition

  • Works pretty well for image-level classification and for recognizing object instances

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

face, flowers, building

SLIDE 14

Bags of features for object recognition

Caltech6 dataset

Figure: bag of features vs. parts-and-shape model

SLIDE 15

Bag of features

  • First, take a bunch of images, extract features, and build up a “dictionary” or “visual vocabulary” – a list of common features
  • Given a new image, extract features and build a histogram – for each feature, find the closest visual word in the dictionary

SLIDE 16

Bag of features: outline

1. Extract features

SLIDE 17

Bag of features: outline

1. Extract features
2. Learn “visual vocabulary”

SLIDE 18

Bag of features: outline

1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary

SLIDE 19

Bag of features: outline

1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”

SLIDE 20

1. Feature extraction

  • Regular grid
    – Vogel & Schiele, 2003
    – Fei-Fei & Perona, 2005

SLIDE 21

1. Feature extraction

  • Regular grid
    – Vogel & Schiele, 2003
    – Fei-Fei & Perona, 2005
  • Interest point detector
    – Csurka et al. 2004
    – Fei-Fei & Perona, 2005
    – Sivic et al. 2005

SLIDE 22

1. Feature extraction

  • Regular grid
    – Vogel & Schiele, 2003
    – Fei-Fei & Perona, 2005
  • Interest point detector
    – Csurka et al. 2004
    – Fei-Fei & Perona, 2005
    – Sivic et al. 2005
  • Other methods
    – Random sampling (Vidal-Naquet & Ullman, 2002)
    – Segmentation-based patches (Barnard et al. 2003)

SLIDE 23

2. Learning the visual vocabulary

SLIDE 24

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

SLIDE 25

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Visual vocabulary

SLIDE 26

K-means clustering recap

  • Want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$:

$$D(X, M) = \sum_{k=1}^{K} \sum_{i \in \text{cluster } k} (x_i - m_k)^2$$

  • Algorithm:
    – Randomly initialize K cluster centers
    – Iterate until convergence:
      – Assign each data point to the nearest center
      – Recompute each cluster center as the mean of all points assigned to it
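The recap above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up 2-D blob data, not the lecture's code:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means following the recap: init, assign, recompute."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers from the data points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each data point to the nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each cluster center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):  # guard against empty clusters
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated 2-D blobs; k-means should put one center in each
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers, labels = kmeans(pts, k=2)
```

In a real BoW pipeline the points would be patch descriptors (e.g. SIFT vectors) rather than 2-D coordinates, but the iteration is identical.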

SLIDE 27

From clustering to vector quantization

  • Clustering is a common method for learning a visual vocabulary or codebook
    – Unsupervised learning process
    – Each cluster center produced by k-means becomes a codevector
    – Codebook can be learned on a separate training set
    – Provided the training set is sufficiently representative, the codebook will be “universal”
  • The codebook is used for quantizing features
    – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook
    – Codebook = visual vocabulary
    – Codevector = visual word
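A vector quantizer as just described (feature vector in, index of the nearest codevector out) is essentially a one-liner; the codebook and features below are invented toy values:

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of the nearest codevector."""
    # Pairwise distances, shape (num_features, num_codevectors)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])  # 3 "visual words"
feats = np.array([[0.5, -0.2], [9.0, 11.0], [1.0, 9.0]])
words = quantize(feats, codebook)  # -> array([0, 1, 2])
```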

SLIDE 28

Example visual vocabulary

Fei-Fei et al. 2005

SLIDE 29

Image patch examples of visual words

Sivic et al. 2005

SLIDE 30

Visual vocabularies: Issues

  • How to choose vocabulary size?
    – Too small: visual words not representative of all patches
    – Too large: quantization artifacts, overfitting
  • Computational efficiency
    – Vocabulary trees (Nister & Stewenius, 2006)

SLIDE 31

3. Image representation

Figure: each image is represented as a histogram of codeword frequencies
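Step 3 then turns each image's quantized features into a codeword-frequency histogram. A small sketch, with hypothetical word indices standing in for real quantized features:

```python
import numpy as np

def bow_histogram(word_indices, vocab_size):
    """Count how often each visual word occurs, normalized to sum to 1."""
    counts = np.bincount(word_indices, minlength=vocab_size).astype(float)
    return counts / counts.sum()

# Hypothetical image whose features were quantized to these word indices
words = np.array([0, 2, 2, 1, 2, 0])
hist = bow_histogram(words, vocab_size=4)  # -> [2/6, 1/6, 3/6, 0]
```

Normalizing makes histograms comparable across images with different numbers of detected features.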

SLIDE 32

Image classification

  • Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

SLIDE 33

Uses of BoW representation

  • Treat as a feature vector for a standard classifier
    – e.g., k-nearest neighbors, support vector machine
  • Cluster BoW vectors over an image collection
    – Discover visual themes
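As a sketch of the first use (feeding BoW vectors to a standard classifier), here is a 1-nearest-neighbor classifier on toy histograms; the class names and numbers are invented for illustration:

```python
import numpy as np

def nearest_neighbor_label(query_hist, train_hists, train_labels):
    """Classify a BoW histogram by its nearest training histogram (1-NN)."""
    d = np.linalg.norm(train_hists - query_hist, axis=1)
    return train_labels[d.argmin()]

# Toy BoW histograms: class "face" heavy on word 0, "car" heavy on word 2
train = np.array([[0.8, 0.1, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.2, 0.1, 0.7]])
labels = np.array(["face", "face", "car", "car"])
print(nearest_neighbor_label(np.array([0.75, 0.15, 0.1]), train, labels))
```

An SVM on the same vectors, as in Csurka et al. (2004), typically works better; 1-NN just keeps the sketch dependency-free.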

SLIDE 34

Large-scale image matching

  • Bag-of-words models have been useful in matching an image to a large database of object instances

11,400 images of game covers (Caltech games dataset)

How do I find this image in the database?

SLIDE 35

Large-scale image search

Build the database:

  – Extract features from the database images
  – Learn a vocabulary using k-means (typical k: 100,000)
  – Compute weights for each word
  – Create an inverted file mapping words → images
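The inverted file in the last step can be sketched as a word → image-set map; the image ids and word ids below are hypothetical:

```python
from collections import defaultdict

def build_inverted_file(image_words):
    """Map each visual word to the set of images containing it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

# Hypothetical database: image id -> visual words found in that image
db = {"img0": [3, 7, 7, 9], "img1": [7, 42], "img2": [9, 42]}
inv = build_inverted_file(db)
# Candidate matches for a query containing word 7: img0 and img1
print(sorted(inv[7]))
```

At query time only images sharing words with the query need scoring, which is what makes 100,000-word vocabularies practical.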

SLIDE 36

Weighting the words

  • Just as with text, some visual words are more discriminative than others
  • The bigger the fraction of documents a word appears in, the less useful it is for matching
    – e.g., a word that appears in all documents is not helping us
    – “the”, “and”, “or” vs. “cow”, “AT&T”, “Cher”
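One standard way to realize this weighting is inverse document frequency from text retrieval; the slide only says "compute weights for each word", so the exact formula here is an assumption:

```python
import math

def idf_weights(documents, vocab):
    """Inverse document frequency: rarer words get larger weights."""
    n = len(documents)
    weights = {}
    for w in vocab:
        df = sum(1 for doc in documents if w in doc)  # document frequency
        weights[w] = math.log(n / df) if df else 0.0
    return weights

docs = [{"the", "cow"}, {"the", "cher"}, {"the", "att"}]
w = idf_weights(docs, vocab={"the", "cow", "cher", "att"})
# "the" appears in every document -> weight log(3/3) = 0
```

With visual words, "documents" are database images and each word index plays the role of a term.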
SLIDE 37

Large-scale image search

Figure: query image and top 6 results

  • Cons:
    – Performance degrades as the database grows

SLIDE 38

Large-scale image search

  • Pros:
    – Works well for CD covers, movie posters
    – Real-time performance possible

Real-time retrieval from a database of 40,000 CD covers (Nister & Stewenius, Scalable Recognition with a Vocabulary Tree)

SLIDE 39

Example bag-of-words matches

SLIDE 40

Example bag-of-words matches

SLIDE 41

SLIDE 42

Bags of features for action recognition

Space-time interest points

Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

SLIDE 43

Bags of features for action recognition

Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

SLIDE 44

What about spatial info?


SLIDE 45

What we will learn today

  • Visual bag of words (BoW)
  • Spatial Pyramid Matching
  • Naïve Bayes
SLIDE 46

Pyramids

  • Very useful for representing images.
  • A pyramid is built from multiple scaled copies of the image.
  • Each level in the pyramid is 1/4 the size of the previous level.
  • The lowest level has the highest resolution.
  • The highest level has the lowest resolution.
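A minimal pyramid construction, subsampling by 2 along each axis so every level has 1/4 the pixels of the previous one (real pyramids low-pass filter before subsampling; this sketch skips that):

```python
import numpy as np

def build_pyramid(img, levels):
    """Each level halves height and width, so it has 1/4 the pixels."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = img[::2, ::2]  # plain subsampling, no anti-aliasing blur
        pyramid.append(img)
    return pyramid

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = build_pyramid(img, levels=3)
print([p.shape for p in pyr])  # [(8, 8), (4, 4), (2, 2)]
```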
SLIDE 47

Bag of words + pyramids

SLIDE 48

Bag of words + pyramids

SLIDE 49

Bag of words + pyramids
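Combining BoW with a spatial grid gives the spatial pyramid representation of Lazebnik et al.: concatenate histograms computed over a 1×1, 2×2, 4×4, … grid of cells. A simplified sketch (it omits the per-level match weights of the full scheme, and the word map is random toy data):

```python
import numpy as np

def spatial_pyramid(word_map, vocab_size, levels=2):
    """Concatenate per-cell BoW histograms over successively finer grids.

    word_map: 2-D array of visual-word indices, one per image location.
    """
    feats = []
    h, w = word_map.shape
    for level in range(levels + 1):
        cells = 2 ** level  # cells per side at this level
        for i in range(cells):
            for j in range(cells):
                cell = word_map[i*h//cells:(i+1)*h//cells,
                                j*w//cells:(j+1)*w//cells]
                feats.append(np.bincount(cell.ravel(), minlength=vocab_size))
    return np.concatenate(feats)

word_map = np.random.default_rng(0).integers(0, 5, size=(8, 8))
feat = spatial_pyramid(word_map, vocab_size=5, levels=2)
# 1 + 4 + 16 = 21 cells, each a 5-bin histogram -> 105 dimensions
print(feat.shape)  # (105,)
```

Unlike plain BoW, two images with the same words in different spatial layouts now get different representations.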

SLIDE 50

Lazebnik, Schmid & Ponce (CVPR 2006)

Scene category dataset

Multi-class classification results (100 training images per class)

Slide credit: Svetlana Lazebnik

SLIDE 51

Caltech101 dataset

http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html

Multi-class classification results (30 training images per class)

Slide credit: Svetlana Lazebnik

SLIDE 52

What we will learn today

  • Visual bag of words (BoW)
  • Spatial Pyramid Matching
  • Naïve Bayes
SLIDE 53

Naïve Bayes

  • Classify an image using its histogram of visual-word occurrences, where:
    – $y_j$ is the event of visual word $w_j$ appearing in the image,
    – $O(j)$ is the number of times word $w_j$ occurs in the image,
    – $n$ is the number of words in our vocabulary.

Csurka, Bray, Dance & Fan, 2004

SLIDE 54

Naïve Bayes – classification

  • Our goal is to classify the image represented by $\mathbf{y}$ as belonging to the class with the highest posterior probability:

$$c^* = \arg\max_c P(c \mid \mathbf{y})$$

SLIDE 55

Naïve Bayes – conditional independence

  • The Naïve Bayes classifier assumes that visual words are conditionally independent given the object class.
  • Therefore, we can multiply the probabilities of the individual visual words to obtain the joint probability.
  • Model for an image with word histogram $\mathbf{y}$ under object class $c$:

$$P(\mathbf{y} \mid c) = \prod_{j=1}^{n} P(y_j \mid c) = \prod_{j=1}^{n} P(w_j \mid c)^{O(j)}$$

  • How do we compute $P(w_j \mid c)$?

Csurka, Bray, Dance & Fan, 2004

SLIDE 56

Naïve Bayes – prior

  • Class priors $P(c)$ encode how likely we are to see one class versus others.
  • Note that:

$$\sum_{c} P(c) = 1$$

Csurka, Bray, Dance & Fan, 2004

SLIDE 57

Naïve Bayes – posterior

  • With the equations from the previous slides, we can now calculate the probability that an image represented by $\mathbf{y}$ belongs to class $c$, using Bayes' theorem:

$$P(c \mid \mathbf{y}) = \frac{P(c)\, P(\mathbf{y} \mid c)}{\sum_{c'} P(c')\, P(\mathbf{y} \mid c')}$$

SLIDE 58

Naïve Bayes – posterior

  • Expanding $P(\mathbf{y} \mid c)$ with the conditional-independence assumption:

$$P(c \mid \mathbf{y}) = \frac{P(c)\, P(\mathbf{y} \mid c)}{\sum_{c'} P(c')\, P(\mathbf{y} \mid c')} = \frac{P(c) \prod_{j=1}^{n} P(y_j \mid c)}{\sum_{c'} P(c') \prod_{j=1}^{n} P(y_j \mid c')}$$

SLIDE 59

Naïve Bayes – classification

  • We can now classify the image represented by $\mathbf{y}$ as belonging to the class with the highest probability:

$$c^* = \arg\max_c P(c \mid \mathbf{y}) = \arg\max_c \log P(c \mid \mathbf{y})$$

SLIDE 60

Let’s break down the posterior

The probability that $\mathbf{y}$ belongs to class $c_1$:

$$P(c_1 \mid \mathbf{y}) = \frac{P(c_1) \prod_{j=1}^{n} P(y_j \mid c_1)}{\sum_{c'} P(c') \prod_{j=1}^{n} P(y_j \mid c')}$$

And the probability that $\mathbf{y}$ belongs to class $c_2$:

$$P(c_2 \mid \mathbf{y}) = \frac{P(c_2) \prod_{j=1}^{n} P(y_j \mid c_2)}{\sum_{c'} P(c') \prod_{j=1}^{n} P(y_j \mid c')}$$

SLIDE 61

Both their denominators are the same

The probability that $\mathbf{y}$ belongs to class $c_1$:

$$P(c_1 \mid \mathbf{y}) = \frac{P(c_1) \prod_{j=1}^{n} P(y_j \mid c_1)}{\sum_{c'} P(c') \prod_{j=1}^{n} P(y_j \mid c')}$$

And the probability that $\mathbf{y}$ belongs to class $c_2$:

$$P(c_2 \mid \mathbf{y}) = \frac{P(c_2) \prod_{j=1}^{n} P(y_j \mid c_2)}{\sum_{c'} P(c') \prod_{j=1}^{n} P(y_j \mid c')}$$

SLIDE 62

Both their denominators are the same

  • Since we only want the max, we can ignore the denominator:

$$P(c_1 \mid \mathbf{y}) \propto P(c_1) \prod_{j=1}^{n} P(y_j \mid c_1)$$

$$P(c_2 \mid \mathbf{y}) \propto P(c_2) \prod_{j=1}^{n} P(y_j \mid c_2)$$

SLIDE 63

For the general class $c$:

$$P(c \mid \mathbf{y}) \propto P(c) \prod_{j=1}^{n} P(y_j \mid c)$$

SLIDE 64

For the general class $c$:

$$P(c \mid \mathbf{y}) \propto P(c) \prod_{j=1}^{n} P(y_j \mid c)$$

We can take the log:

$$\log P(c \mid \mathbf{y}) \propto \log P(c) + \sum_{j=1}^{n} \log P(y_j \mid c)$$

SLIDE 65

Naïve Bayes – classification

  • So, the classification becomes:

$$c^* = \arg\max_c P(c \mid \mathbf{y}) = \arg\max_c \log P(c \mid \mathbf{y}) = \arg\max_c \left[ \log P(c) + \sum_{j=1}^{n} \log P(y_j \mid c) \right]$$
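The final decision rule maps directly to code: with $P(y_j \mid c) = P(w_j \mid c)^{O(j)}$, the per-class score is $\log P(c) + \sum_j O(j)\,\log P(w_j \mid c)$. A sketch with an invented two-class, three-word model:

```python
import numpy as np

def naive_bayes_classify(counts, log_priors, log_word_probs):
    """Pick the class maximizing log P(c) + sum_j O(j) * log P(w_j | c).

    counts:         O(j), occurrences of each visual word in the image
    log_priors:     log P(c), one entry per class
    log_word_probs: log P(w_j | c), shape (num_classes, vocab_size)
    """
    scores = log_priors + log_word_probs @ counts
    return int(np.argmax(scores))

# Toy model: class 0 favors word 0, class 1 favors word 2
word_probs = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.2, 0.7]])
log_priors = np.log(np.array([0.5, 0.5]))
counts = np.array([5.0, 1.0, 0.0])  # image dominated by word 0
print(naive_bayes_classify(counts, log_priors, np.log(word_probs)))  # 0
```

In practice $P(w_j \mid c)$ is estimated from training histograms with smoothing so no word has probability zero, which would make the log undefined.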

SLIDE 66

What we have learned today

  • Visual bag of words (BoW)
  • Spatial Pyramid Matching
  • Naïve Bayes