Feature Representation – Vision
BoWs and Beyond
Praveen Krishnan
Feature Representation in Vision
- Low Level: local detectors and descriptors, Bag of Words
- Mid Level: parts, attributes
- Hierarchical: deep representations
Bag of Visual Words
- Visual vocabulary and vector quantization
- Spatial verification
- Inspirations from IR
- Advanced coding and pooling schemes: soft quantization, higher-order representations
BoWs
- An image is represented as a bag of visual words.
- Origins in text processing: Salton & McGill (1983).
Slide: ICCV 2005 short course, L. Fei-Fei
Origins in texture recognition
- Universal texton dictionary and texton histograms (Julesz, 1981).
BoWs Representation
(i) Interest point detection
(ii) Feature extraction
(iii) Vector quantization (against the visual vocabulary)
(iv) Coding and pooling
Figure courtesy: Tsai '12
The BoWs pipeline
- Local detectors & descriptors: SIFT, HOG, LBP, …
- Vocabulary: k-means, approximate k-means, GMM
- Coding: histograms, kernel codebooks, sparse codes, LLC, Fisher kernels, super vectors, VLAD
- Pooling: average, max
- Spatial: spatial pyramids, Min-Hash, LLAH
- Recognition & retrieval: SVMs, weighting schemes, query expansion, re-ranking, etc.
Detection
- Regular (dense) grid, sampled at multiple scales: Fei-Fei et al. '05, Bosch et al. '06
- Sparse, interest-point based: Mikolajczyk et al. '05, Csurka et al. '04
Description
- SIFT (Lowe '99), MSER (Matas et al. '02), HOG (Dalal et al. '05), and many more.
Clustering: Visual Vocabulary
- Partition the local descriptor space. Let $X = \{x_1, \dots, x_N\}$ be the set of $N$ local descriptors.
- k-means: minimize the sum of squared Euclidean distances between points and their nearest cluster centers, $\min_{B} \sum_{i=1}^{N} \min_{k} \| x_i - b_k \|_2^2$, where $B = \{b_1, \dots, b_K\}$ is the codebook.
- The cluster centers are the visual words (code words), each typified by the image patches assigned to it.
Sivic et al., ICCV '05
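To make the clustering step concrete, here is a minimal sketch of vocabulary construction; the function name and the choice of scikit-learn's MiniBatchKMeans are illustrative, not from the slides.

```python
# Sketch: building a visual vocabulary by clustering local descriptors.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptors, vocab_size=1000, seed=0):
    """Cluster local descriptors (N x d array, e.g. SIFT) into a codebook."""
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, random_state=seed)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_  # codebook B, shape (vocab_size, d)

# Usage with toy data standing in for descriptors from many training images.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((10000, 128)).astype(np.float32)
B = build_vocabulary(descriptors, vocab_size=256)
```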
Issues
- Size of vocabulary?
  - Too small: visual words are not representative of all patches.
  - Too large: quantization artifacts, overfitting.
- Generative or discriminative? Gaussian mixture models (more later).
- Computational efficiency:
  - Approximate k-means using randomized kd-trees (Philbin et al., CVPR '07).
  - Hierarchical k-means (Nister et al., CVPR '06).
Vector Quantization
- Assigns each feature to the nearest visual word in the vocabulary: hard quantization.
Pooling
- Gives invariance to changes in position, robustness to clutter, and compactness of representation.
- Types: sum (or average), and max.
- There goes the geometry too!
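A minimal sketch of hard quantization followed by pooling, assuming a codebook B as built above; all names are illustrative.

```python
# Sketch: hard vector quantization of descriptors against a codebook B,
# followed by sum/average or max pooling into an image-level vector.
import numpy as np

def vq_code(descriptors, B):
    """One-hot codes: each descriptor is assigned its nearest visual word."""
    d2 = ((descriptors[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((len(descriptors), len(B)))
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes

def pool(codes, mode="avg"):
    """Pool per-descriptor codes into one image-level vector."""
    if mode == "sum":
        return codes.sum(axis=0)
    if mode == "avg":
        return codes.mean(axis=0)
    return codes.max(axis=0)  # max pooling

# Usage with toy data: 500 descriptors, a 64-word codebook.
rng = np.random.default_rng(0)
X, B = rng.standard_normal((500, 16)), rng.standard_normal((64, 16))
bow_histogram = pool(vq_code(X, B), mode="avg")
```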
Pyramid Match Kernel
- Weighted sum of histogram intersections at multiple resolutions.
- Matches found at fine levels get more weight than those found at coarse levels.
- Used for matching in high-dimensional spaces.
Spatial Pyramid Matching
- Concatenate the histogram vectors at all pyramid levels.
Pyramid Match Kernel: Grauman et al., ICCV '05. Spatial Pyramid Matching: Lazebnik et al., CVPR '06.
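As a sketch of the spatial pyramid idea, the following builds a two-level pyramid (whole image plus a 2x2 grid) and concatenates the weighted histograms; the function and variable names are illustrative, and the 1/2, 1/2 level weighting is one common choice for a single-level pyramid.

```python
# Sketch: two-level spatial pyramid histogram over pre-quantized features.
import numpy as np

def spatial_pyramid(keypoints_xy, words, vocab_size, width, height):
    """keypoints_xy: (N, 2) positions; words: (N,) visual-word indices."""
    gx = np.minimum((keypoints_xy[:, 0] / (width / 2)).astype(int), 1)
    gy = np.minimum((keypoints_xy[:, 1] / (height / 2)).astype(int), 1)
    full = np.bincount(words, minlength=vocab_size).astype(float)
    cells = []
    for i in range(2):                      # histogram per 2x2 grid cell
        for j in range(2):
            mask = (gx == i) & (gy == j)
            cells.append(np.bincount(words[mask],
                                     minlength=vocab_size).astype(float))
    # Finer levels get at least as much weight as coarser ones.
    return np.concatenate([0.5 * full] + [0.5 * c for c in cells])
```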
Recognition
- Discriminative methods: k-nearest neighbor, SVMs (non-linear kernels).
- Generative methods: Naïve Bayes, Bayesian models (pLSA, LDA).
Ranking & Retrieval
- Nearest neighbor search and indexing: the agenda for this talk.
Similarity measures
- Cosine distance, L1 distance, Chi-square distance, Hellinger distance.
- L1: $d(x, y) = \sum_i |x_i - y_i|$
- Chi-square: $d(x, y) = \frac{1}{2}\sum_i \frac{(x_i - y_i)^2}{x_i + y_i}$
- Hellinger: $d^2(x, y) = \frac{1}{2}\sum_i (\sqrt{x_i} - \sqrt{y_i})^2$; the square root applies a discount to large values.
Earth Mover's Distance (EMD)
- Computes the dissimilarity between two distributions. Let $S = \{(s_1, w_{s_1}), \dots, (s_m, w_{s_m})\}$ be a distribution with $m$ elements and $Q = \{(q_1, w_{q_1}), \dots, (q_n, w_{q_n})\}$ a distribution with $n$ elements, and let $d_{ij}$ be the ground distance between elements $s_i$ and $q_j$. With $F = [f_{ij}]$ the flow that minimizes the overall cost, $\mathrm{EMD}(S, Q) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}}$.
- This is a transportation problem.
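For concreteness, minimal implementations of the histogram distances above; the 1/2 factor in the chi-square and the squared-Hellinger form follow the conventions used here, and the function names are illustrative.

```python
# Sketch: common distances between (L1-normalized, non-negative) histograms.
import numpy as np

def l1(x, y):
    return np.abs(x - y).sum()

def chi_square(x, y, eps=1e-10):
    # eps guards against empty bins in both histograms.
    return 0.5 * (((x - y) ** 2) / (x + y + eps)).sum()

def hellinger(x, y):
    # Square-rooting the bins discounts the influence of large values.
    return np.sqrt(0.5 * ((np.sqrt(x) - np.sqrt(y)) ** 2).sum())
```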
Evaluation measures
Notation: TP = true positives; FP = false positives; TN = true negatives; FN = false negatives.
- Precision: $P = \frac{TP}{TP + FP}$
- Recall: $R = \frac{TP}{TP + FN}$
- F-measure: $F = \frac{2PR}{P + R}$
- Mean Average Precision (mAP): area under the precision-recall curve, averaged over queries.
[Figure: example query on a cats-and-dogs database, with retrieved results marked TP/FP/TN/FN.]
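A sketch of how these measures are computed for one ranked result list; array and function names are illustrative.

```python
# Sketch: precision, recall and average precision for one ranked result list.
# `relevant` is a boolean array over the ranked results (True = TP).
import numpy as np

def precision_recall(relevant, num_relevant_in_db):
    tp = np.cumsum(relevant)              # TPs among the top-k results
    k = np.arange(1, len(relevant) + 1)
    precision = tp / k                    # P = TP / (TP + FP)
    recall = tp / num_relevant_in_db      # R = TP / (TP + FN)
    return precision, recall

def average_precision(relevant, num_relevant_in_db):
    p, _ = precision_recall(relevant, num_relevant_in_db)
    # Average of the precision values at each relevant (TP) position.
    return (p * relevant).sum() / num_relevant_in_db

# mAP is the mean of average_precision over all queries.
```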
Spatial Verification
- Use the position and shape of the underlying features to improve retrieval quality.
- Estimate a geometric transformation to remove outlier matches.
- Approaches: RANSAC, Hough transform.
- Both images have many matches – which is correct?
Slide credit: Cordelia Schmid
Fitting an affine transformation
- Assume we know the correspondences; how do we get the transformation?
Slide credit: Cordelia Schmid
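A sketch of the least-squares solution, assuming known correspondences: each correspondence contributes two rows to a linear system in the six affine parameters. All names are illustrative.

```python
# Sketch: least-squares affine fit from point correspondences src_i -> dst_i.
import numpy as np

def fit_affine(src, dst):
    """src, dst: (N, 2) arrays of corresponding points, N >= 3."""
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)            # [x'_1, y'_1, x'_2, y'_2, ...]
    A[0::2, 0:2] = src             # x' = a*x + b*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src             # y' = c*x + d*y + ty
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[0:4].reshape(2, 2)  # linear part [[a, b], [c, d]]
    t = params[4:6]                # translation [tx, ty]
    return M, t                    # dst ≈ src @ M.T + t
```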
RANSAC (Fischler & Bolles, 1981)
1. Randomly select a seed group of matches.
2. Compute the transformation from the seed group.
3. Find the inliers to this transformation.
4. If the number of inliers is sufficiently large, re-compute the least-squares estimate of the transformation on all of the inliers.
Keep the transformation with the largest number of inliers.
In short: randomly select a minimal subset of points, hypothesize a model, compute the error function, select the points consistent with the model, and repeat the hypothesize-and-verify loop. E.g., fitting a line (see the sketch below).
Slide credit: Kristen Grauman
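A minimal hypothesize-and-verify loop for the line-fitting example; the threshold, iteration count, and names are illustrative, and the data is assumed non-degenerate (at least one non-vertical sample pair).

```python
# Sketch: RANSAC for fitting a line y = m*x + c to 2D points.
import numpy as np

def ransac_line(points, num_iters=100, threshold=1.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(num_iters):
        i, j = rng.choice(len(points), size=2, replace=False)  # minimal subset
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue
        m = (y2 - y1) / (x2 - x1)                  # hypothesize a model
        c = y1 - m * x1
        err = np.abs(points[:, 1] - (m * points[:, 0] + c))  # error function
        inliers = err < threshold                  # points consistent with it
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-fit by least squares on all inliers of the best hypothesis.
    x, y = points[best_inliers, 0], points[best_inliers, 1]
    m, c = np.polyfit(x, y, deg=1)
    return m, c, best_inliers
```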
Making it faster
- Inverted indexing: a reverse lookup from visual words to images.
- Enables fast search by exploiting the sparse representation.
Image courtesy: Jawahar et al., DAS '14. [Figure: inverted index over #images × #visual words.]
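A toy sketch of an inverted index; the data layout (dict of sets plus a vote counter) is an illustrative choice.

```python
# Sketch: inverted index mapping each visual word to the images containing it.
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words: dict image_id -> iterable of visual-word ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in set(words):
            index[w].add(image_id)
    return index

def candidate_images(index, query_words):
    """Score database images by how many query words they share."""
    votes = Counter()
    for w in set(query_words):
        votes.update(index.get(w, ()))
    return votes.most_common()
```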
Weighting schemes
- Zipf's law: the frequency of any word is inversely proportional to its rank.
- TF-IDF weighting: weight each visual word by its term frequency times its inverse document frequency.
- Stop words: the most frequent visual words are discarded.
Image courtesy: Wikipedia
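A sketch of tf-idf weighting applied to a BoW histogram, in the spirit of Video Google (Sivic & Zisserman, ICCV '03); variable names are illustrative.

```python
# Sketch: tf-idf re-weighting of a raw BoW count histogram.
import numpy as np

def tfidf(hist, doc_freq, num_images):
    """hist: word counts for one image; doc_freq: images containing each word."""
    tf = hist / max(hist.sum(), 1)                      # term frequency
    idf = np.log(num_images / np.maximum(doc_freq, 1))  # inverse document freq.
    return tf * idf
```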
Improving the recall
- Query expansion: reformulating the query to increase its recall.
- Pipeline: query image → results → spatial verification → new query → new results; repeat. (Chum et al., Total Recall, ICCV '07)
Query expansion variants
- Baseline: transitive closure expansion, using a priority queue.
- Average query expansion (see the sketch below).
- Recursive average query expansion.
- Multiple image resolution expansion: compute the median image resolution, formulate queries for the other resolution bands (0, 4/5), (2/3, 3/2), (5/4, ∞), and do average query expansion for each band.
Chum et al., Total Recall, ICCV '07
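A sketch of the average query expansion step: average the query vector with its spatially verified top results and re-normalize before re-querying. Names are illustrative.

```python
# Sketch: average query expansion over BoW/tf-idf vectors.
import numpy as np

def average_query_expansion(query_vec, verified_result_vecs):
    """Average the original query with spatially verified top results."""
    expanded = np.vstack([query_vec] + list(verified_result_vecs)).mean(axis=0)
    return expanded / np.linalg.norm(expanded)  # re-normalize, then re-query
```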
Hard quantization (VQ): issues
- Codeword uncertainty and codeword plausibility.
Kernel codebooks
- Allow a degree of ambiguity in assigning code words to image features.
- Use kernel density estimation: the kernel size determines the amount of smoothing, and the kernel shape is related to the distance function.
Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008
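A sketch of soft assignment with a Gaussian kernel, in the spirit of the codeword-uncertainty formulation of van Gemert et al.; sigma and the mean-pooling choice are illustrative.

```python
# Sketch: Gaussian-kernel soft assignment of descriptors to a codebook B.
import numpy as np

def soft_assign(descriptors, B, sigma=1.0):
    d2 = ((descriptors[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2 * sigma ** 2))           # kernel value per codeword
    # "Codeword uncertainty": normalize per descriptor, then pool over image.
    return (K / K.sum(axis=1, keepdims=True)).mean(axis=0)
```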
From VQ to Sparse Coding
- Hard assignment in VQ is too restrictive; relax the cardinality constraint with an L1 norm, and combine with max pooling (Yang et al., CVPR '09).
Sparse Coding
- Objective: $\min_{U, V} \sum_{i=1}^{N} \| x_i - u_i V \|_2^2 + \lambda \| u_i \|_1$ s.t. $\| v_k \|_2 \le 1 \;\forall k$.
- Alternating optimization: fix $V$ and solve for $U$ [LASSO]; fix $U$ and solve for $V$ [least squares].
- Linear classification using the SPM kernel.
Yang et al., CVPR '09
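A sketch of the coding step (fix V, solve for U) using scikit-learn's LASSO-based SparseCoder; the alpha value and names are illustrative. The dictionary-update step (fix U, solve for V) is a constrained least squares; sklearn's DictionaryLearning alternates both steps internally.

```python
# Sketch: LASSO sparse coding of descriptors against a fixed dictionary V.
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_codes(X, V, alpha=0.1):
    """X: (N, d) descriptors; V: (K, d) codebook rows. Returns (N, K) codes U."""
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # SparseCoder expects unit rows
    coder = SparseCoder(dictionary=V, transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    return coder.transform(X)
```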
SC codes tend to be local anyway. Is locality more essential than sparsity?
Local Coordinate Coding (LCC) and Locality-constrained Linear Coding (LLC)
- Drop the sparsity term and evoke the locality term explicitly: $\min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|_2^2 + \lambda \| d_i \odot c_i \|_2^2$ s.t. $\mathbf{1}^\top c_i = 1 \;\forall i$.
- Here $\odot$ denotes element-wise multiplication, and $d_i$ is the locality adaptor that gives different weights to different basis vectors according to their similarity to $x_i$.
Wang et al., CVPR '10
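A sketch of the approximated LLC coding step, following the analytical solution in Wang et al.: take the k nearest codewords and solve a small regularized linear system. The values of k and beta are illustrative.

```python
# Sketch: approximated LLC code for one descriptor x against codebook B.
import numpy as np

def llc_code(x, B, k=5, beta=1e-4):
    """x: (d,) descriptor; B: (K, d) codebook. Returns a sparse (K,) code."""
    d2 = ((B - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:k]                 # locality: k nearest bases
    Z = B[nn] - x                           # shift bases to the origin
    C = Z @ Z.T + beta * np.eye(k)          # local covariance (regularized)
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                            # enforce the sum-to-one constraint
    code = np.zeros(len(B))
    code[nn] = w
    return code
```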
Comparison with VQ and SC
- Better reconstruction, local smooth sparsity, and an analytical solution (Wang et al., CVPR '10).
- LLC codes discover subspaces and respect the geometry of the data.
Slide credit: Kai Yu
So far…
- Representations extracted from count statistics (BoWs). Why not include other statistics, such as the mean and (co)variance of the local descriptors?
Slide credit: Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition
VLAD (Vector of Locally Aggregated Descriptors)
- Given a codebook $\{\mu_1, \dots, \mu_N\}$, e.g. learned with k-means, and a set of local descriptors $\{x_1, \dots, x_M\}$, accumulate the residuals per visual word: $v_k = \sum_{x_i : \mathrm{NN}(x_i) = \mu_k} (x_i - \mu_k)$.
- Characterizes the distribution w.r.t. the centers (first-order statistics).
- Dimensionality $D = N \times d$, where $N$ is the codebook size and $d$ the descriptor dimension.
[Figure: images and their corresponding VLAD descriptors.]
Jégou, Douze, Schmid and Pérez, "Aggregating local descriptors into a compact image representation", CVPR '10.
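A sketch of VLAD aggregation as defined above; the final L2 normalization is one common choice, and the names are illustrative.

```python
# Sketch: VLAD descriptor from local descriptors X and k-means centers B.
import numpy as np

def vlad(X, B):
    """X: (M, d) descriptors; B: (N, d) centers. Returns an (N*d,) vector."""
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    v = np.zeros_like(B)
    for k in range(len(B)):
        if (nearest == k).any():
            v[k] = (X[nearest == k] - B[k]).sum(axis=0)  # residuals to center
    v = v.reshape(-1)                                    # D = N * d dimensions
    return v / max(np.linalg.norm(v), 1e-12)             # L2-normalize
```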
Slides from: Florent Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition.
Software:
- The INRIA package: http://lear.inrialpes.fr/src/inria_fisher/
- The Oxford package: http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/
References
- J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, 2003.
- D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
- M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In Proc. VISAPP, 2009.
- K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.
- J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, 2008.
- J. C. van Gemert, J. M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders. Kernel codebooks for scene categorization. In Proc. ECCV, 2008.
- J. Yang et al. Linear spatial pyramid matching using sparse coding for image classification. In Proc. CVPR, 2009.
- J. Wang et al. Locality-constrained linear coding for image classification. In Proc. CVPR, 2010.
- F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In Proc. CVPR, 2007.
- F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In Proc. CVPR, 2010.
- S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, ISBN: 020139829, 1999.
- O. Chum, A. Mikulík, M. Perdoch, and J. Matas. Total recall II: Query expansion revisited. In Proc. CVPR, 2011.