Visual bag of words
Stanford University
07-Nov-2019
Lecture: Visual Bag of Words
Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab
CS 131 Roadmap
Pixels: Convolutions, Edges, Descriptors
Segments: Resizing, Segmentation, Clustering
Images: Recognition, Detection, Machine learning
Videos: Motion, Tracking
Web: Neural networks, Convolutional neural networks
Example textures (from Wikipedia)
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Universal texton dictionary histogram
US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
face, flowers, building
Bag of features; parts-and-shape model
– Vogel & Schiele, 2003 – Fei-Fei & Perona, 2005
– Csurka et al. 2004 – Fei-Fei & Perona, 2005 – Sivic et al. 2005
– Random sampling (Vidal-Naquet & Ullman, 2002) – Segmentation-based patches (Barnard et al. 2003)
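As an illustration of the random-sampling option listed above, here is a minimal sketch that draws square patches at uniformly random positions and flattens them into raw-pixel descriptors. The function name `sample_random_patches`, the patch size, and the use of raw pixels as descriptors are assumptions made for the example, not details from the cited work.

```python
import numpy as np

def sample_random_patches(image, patch_size, n_patches, seed=0):
    """Return n_patches square patches cut at uniformly random positions."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # Top-left corners, chosen so every patch lies fully inside the image.
    ys = rng.integers(0, h - patch_size + 1, size=n_patches)
    xs = rng.integers(0, w - patch_size + 1, size=n_patches)
    return np.stack([image[y:y + patch_size, x:x + patch_size]
                     for y, x in zip(ys, xs)])

# Toy grayscale "image" (hypothetical data).
image = np.arange(64 * 48, dtype=float).reshape(64, 48)
patches = sample_random_patches(image, patch_size=8, n_patches=10)

# Flatten each patch into a feature vector (raw pixels, for illustration only).
descriptors = patches.reshape(len(patches), -1)
```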
Slide credit: Josef Sivic
$$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} (x_i - m_k)^2$$
– Assign each data point to the nearest center
– Recompute each cluster center as the mean of all points assigned to it
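The two alternating steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: the function name `kmeans`, the choice to pass the initial centers explicitly, Euclidean distance, and the toy data are all assumptions.

```python
import numpy as np

def kmeans(points, init_centers, n_iters=100):
    """Alternate assignment and mean-update steps until assignments stop changing."""
    centers = np.array(init_centers, dtype=float)  # copy, so the input is untouched
    labels = np.full(len(points), -1)
    for _ in range(n_iters):
        # Step 1: assign each data point to the nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no assignment changed
        labels = new_labels
        # Step 2: recompute each center as the mean of the points assigned to it.
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Toy data: two well-separated groups standing in for patch descriptors.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(data, init_centers=data[[0, 3]])
# labels -> [0, 0, 0, 1, 1, 1]
```

K-means only finds a local optimum, so the result depends on the initial centers; in practice several initializations are often tried.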
– Unsupervised learning process
– Each cluster center produced by k-means becomes a codevector
– Codebook can be learned on separate training set
– Provided the training set is sufficiently representative, the codebook will be “universal”
– A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook
– Codebook = visual vocabulary
– Codevector = visual word
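A vector quantizer as described above can be sketched as a nearest-neighbor lookup against the codebook. The function name `quantize`, the use of Euclidean distance, and the tiny 2-D codebook are assumptions made for illustration.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector (row) to the index of its nearest codevector."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical codebook with three visual words, and four feature vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [3.5, 0.2], [0.2, 0.1]])

words = quantize(features, codebook)
# words -> [0, 1, 2, 0]
```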
Fei-Fei et al. 2005
Sivic et al. 2005
– Too small: visual words not representative of all patches
– Too large: quantization artifacts, overfitting
– Vocabulary trees (Nister & Stewenius, 2006)
[Figure axis label: frequency]
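Assuming the figure here is the usual bag-of-words histogram (how often each visual word occurs in an image), the representation can be sketched as a count over quantized word indices. The function name `bow_histogram` and the optional normalization to unit sum are assumptions.

```python
import numpy as np

def bow_histogram(word_indices, vocab_size, normalize=True):
    """Count occurrences of each visual word; optionally normalize to sum to 1."""
    counts = np.bincount(word_indices, minlength=vocab_size).astype(float)
    if normalize and counts.sum() > 0:
        counts /= counts.sum()
    return counts

# Word indices for one image (hypothetical), with a vocabulary of 4 words.
hist = bow_histogram(np.array([0, 1, 2, 0]), vocab_size=4)
# hist -> [0.5, 0.25, 0.25, 0.0]
```

Normalizing makes images with different numbers of extracted features comparable.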
– e.g., k-nearest neighbors, support vector machines
– Discover visual themes
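To make the k-nearest-neighbors option concrete, here is a minimal sketch that classifies a bag-of-words histogram by majority vote among its nearest training histograms. The function name `knn_predict`, the Euclidean distance, and the toy histograms and labels are assumptions for illustration.

```python
from collections import Counter

import numpy as np

def knn_predict(train_hists, train_labels, query_hist, k=3):
    """Majority vote among the k training histograms closest to the query."""
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 3-word histograms for two categories.
train_hists = np.array([[0.8, 0.1, 0.1],
                        [0.7, 0.2, 0.1],
                        [0.1, 0.1, 0.8],
                        [0.2, 0.1, 0.7]])
train_labels = ["face", "face", "building", "building"]

pred = knn_predict(train_hists, train_labels, np.array([0.75, 0.15, 0.10]), k=3)
# pred -> "face"
```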
11,400 images of game covers (Caltech games dataset)
How do I find this image in the database?
– Extract features from the database images
– Learn a vocabulary using k-means (typical k: 100,000)
– Compute weights for each word
– Create an inverted file mapping words → images
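The last step, the inverted file, can be sketched as a dictionary from visual-word index to the set of images containing that word. The names `build_inverted_file` and `candidates`, the image ids, and the simple shared-word count used for ranking are hypothetical choices for this example.

```python
from collections import defaultdict

def build_inverted_file(image_words):
    """image_words: dict of image id -> list of visual-word indices."""
    inverted = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            inverted[w].add(image_id)
    return inverted

def candidates(inverted, query_words):
    """Rank database images by the number of distinct words shared with the query."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in inverted.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

database = {"cover_a": [3, 7, 7, 42], "cover_b": [7, 8], "cover_c": [1, 2]}
inv = build_inverted_file(database)
ranked = candidates(inv, [7, 42, 99])
# ranked -> [("cover_a", 2), ("cover_b", 1)]
```

Only images that share at least one word with the query are touched, which is what makes lookup fast when the vocabulary is large.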
– e.g., a word that appears in all documents is not helping us
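One common weighting consistent with this remark is inverse document frequency (IDF), where a word found in every image gets weight zero. The source does not name the weighting scheme, so treating it as IDF is an assumption, as are the function name `idf_weights` and the convention of giving unseen words weight 0.

```python
import math

def idf_weights(image_words, vocab_size):
    """Weight each word by log(N / n_w): N images in total, n_w containing word w."""
    n_images = len(image_words)
    doc_freq = [0] * vocab_size
    for words in image_words:
        for w in set(words):  # count each image at most once per word
            doc_freq[w] += 1
    # Words that never occur get weight 0.0 (an arbitrary convention for this sketch).
    return [math.log(n_images / df) if df > 0 else 0.0 for df in doc_freq]

# Three images (hypothetical word lists) over a 4-word vocabulary.
images = [[0, 1, 1], [0, 2], [0, 1]]
weights = idf_weights(images, vocab_size=4)
# Word 0 appears in every image, so weights[0] is 0.0.
```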
Query image; top 6 results
– performance degrades as the database grows
– Works well for CD covers, movie posters
– Real-time performance possible
Real-time retrieval from a database of 40,000 CD covers
Nister & Stewenius, Scalable Recognition with a Vocabulary Tree
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
Space-time interest points
Lazebnik, Schmid & Ponce (CVPR 2006)
Multi-class classification results (100 training images per class)
Slide credit: Svetlana Lazebnik
http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
Multi-class classification results (30 training images per class)
Slide credit: Svetlana Lazebnik
– $x_i$ is the event of visual word $v_i$ appearing in the image,
– $N(i)$ is the number of times word $v_i$ occurs in the image,
– $m$ is the number of words in our vocabulary.
Csurka, Bray, Dance & Fan, 2004
$$c^* = \arg\max_c P(c \mid x)$$
$$P(x \mid c) = P(x_1, \dots, x_m \mid c) = \prod_{i=1}^{m} P(x_i \mid c)$$
$$P(c \mid x) = \frac{P(c) \prod_{i=1}^{m} P(x_i \mid c)}{\sum_{c'} P(c') \prod_{i=1}^{m} P(x_i \mid c')}$$
$$c^* = \arg\max_c P(c \mid x) = \arg\max_c \frac{P(c) \prod_{i=1}^{m} P(x_i \mid c)}{\sum_{c'} P(c') \prod_{i=1}^{m} P(x_i \mid c')}$$
$$P(c_1 \mid x) = \frac{P(c_1) \prod_{i=1}^{m} P(x_i \mid c_1)}{\sum_{c'} P(c') \prod_{i=1}^{m} P(x_i \mid c')}$$
$$P(c_2 \mid x) = \frac{P(c_2) \prod_{i=1}^{m} P(x_i \mid c_2)}{\sum_{c'} P(c') \prod_{i=1}^{m} P(x_i \mid c')}$$
The normalizing denominator of the posterior does not depend on $c$, so it can be dropped when comparing classes; taking the logarithm then turns the product into a sum:
$$c^* = \arg\max_c P(c \mid x) = \arg\max_c P(c) \prod_{i=1}^{m} P(x_i \mid c) = \arg\max_c \Big[ \log P(c) + \sum_{i=1}^{m} \log P(x_i \mid c) \Big]$$
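A minimal sketch of a Naive Bayes classifier over visual-word counts, scored in log space to avoid underflow from multiplying many small probabilities. Several choices here are assumptions rather than details from the slides: the multinomial form that weights each word by its count $N(i)$, Laplace smoothing with parameter `alpha`, integer class ids, and the function names `train_naive_bayes` and `predict`.

```python
import numpy as np

def train_naive_bayes(hists, labels, alpha=1.0):
    """hists: (n_images, m) word counts; labels: (n_images,) integer class ids."""
    classes = np.unique(labels)
    # log P(c): fraction of training images in each class.
    log_prior = np.log(np.array([(labels == c).mean() for c in classes]))
    # Per-class word counts with Laplace smoothing (alpha is an assumed hyperparameter).
    word_counts = np.array([hists[labels == c].sum(axis=0) for c in classes]) + alpha
    # log P(v_i | c): smoothed relative frequency of word i in class c.
    log_likelihood = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict(hist, classes, log_prior, log_likelihood):
    """Pick the class maximizing log P(c) + sum_i N(i) * log P(v_i | c)."""
    scores = log_prior + log_likelihood @ hist
    return classes[np.argmax(scores)]

# Toy training set: 3-word vocabulary, two classes (hypothetical counts).
hists = np.array([[5, 1, 0], [4, 2, 0], [0, 1, 6], [1, 0, 5]])
labels = np.array([0, 0, 1, 1])

model = train_naive_bayes(hists, labels)
pred = predict(np.array([3, 1, 0]), *model)
# pred -> 0
```

Smoothing keeps a word that never appeared in a class during training from forcing that class's probability to zero.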