Feature Representation in Vision: BoWs and Beyond
Praveen Krishnan


SLIDE 1

Feature Representation – Vision

BoWs and Beyond

Praveen Krishnan

SLIDE 2

Feature Representation in Vision

• Low Level
  • Local detectors and descriptors
  • Bag of Words
• Mid Level
  • Parts
  • Attributes
• Hierarchical
  • Deep representations

SLIDE 3

Low Level Vision

• Bag of Visual Words (BoWs)
  • Visual vocabulary
  • Vector quantization
  • Spatial verification
• Inspirations from IR
• Advanced coding and pooling schemes
  • Soft quantization
  • Higher order representation

SLIDE 4

Bag of Words (BoWs)

SLIDE 5

A quick walk-through…

• BoWs: an image becomes a bag of local patches.

[Figure: image decomposed into a bag of patches]

SLIDE 6

A quick walk-through…

• Origins in text processing: Salton & McGill (1983).

Slide: ICCV 2005 short course, L. Fei-Fei

SLIDE 7

A quick walk-through…

• Origins in texture recognition: Julesz (1981).

[Figure: textures represented as histograms over a universal texton dictionary]

SLIDE 8

A quick walk-through…

• BoWs representation pipeline:
  (i) interest point detection
  (ii) feature extraction
  (iii) vector quantization against the visual vocabulary
  (iv) coding and pooling

Figure courtesy: Tsai '12

SLIDE 9

Devil is in the details

• Local detectors & descriptors
  • SIFT, HOG, LBP, …
• Vocabulary
  • k-means, approximate k-means, GMM
• Coding and pooling
  • Histogram, kernel codebooks, sparse codes, LLC, Fisher kernels, super vectors, VLAD
  • Average, max
• Spatial verification
  • Spatial pyramids, Min-Hash, LLAH
• Recognition & retrieval
  • SVMs
  • Weighting schemes, query expansion, re-ranking, etc.

SLIDE 10

Devil is in the details (contd.)

• Same pipeline as the previous slide; assume dense sampling at multiple scales.

SLIDE 11

Feature Extraction

• Detection
  • Regular (dense) sampling
    • Fei-Fei et al. '05; Bosch et al. '06
  • Sparse / interest points
    • Mikolajczyk et al. '05; Csurka et al. '04
• Description
  • SIFT – Lowe '99
  • MSER – Matas et al. '02
  • HoG – Dalal et al. '05
  • many more…

[Figure: detected regions and their descriptors]

SLIDE 12

Learning Visual Vocabulary

• Partitioning the local descriptor space into informative regions.
• Let $\{x_i\}_{i=1}^{N}$ be $N$ SIFT descriptors drawn from a subset of the entire corpus.
• k-means (sketch below)
  • Minimize the sum of squared Euclidean distances between points and their nearest cluster centers:
    $\min_{B} \sum_{i=1}^{N} \min_{b_k \in B} \lVert x_i - b_k \rVert^2$
  • Here $B = \{b_1, \ldots, b_K\}$ is the codebook; its entries are the visual words (code words).

[Figure: clustering the descriptor space into a visual vocabulary, with image patch examples per visual word]

Sivic et al., ICCV '05
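A minimal sketch of vocabulary learning with scikit-learn's KMeans; the random matrix stands in for real SIFT descriptors, and the vocabulary size K is an illustrative choice, not from the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for N SIFT descriptors (128-D each); in practice these come
# from a detector/descriptor pipeline run over a subset of the corpus.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 128)).astype(np.float32)

K = 200  # vocabulary size (illustrative; real systems use 10^3 to 10^6)
kmeans = KMeans(n_clusters=K, n_init=1, random_state=0).fit(X)
codebook = kmeans.cluster_centers_  # B: one visual word per row
```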

SLIDE 13

Visual vocabulary

• Issues
  • Size of vocabulary?
    • Too small: visual words not representative of all patches.
    • Too large: quantization artifacts, overfitting.
  • Generative or discriminative learning?
    • Gaussian mixture models. (More later.)
  • Computational efficiency
    • Approximate k-means using randomized kd-trees. Philbin et al., CVPR '07
    • Hierarchical k-means. Nistér et al., CVPR '06

[Figure: vocabulary tree. Nistér et al., CVPR '06]

SLIDE 14

Coding

• Vector quantization (sketch below)
  • Assigns each feature to the nearest visual word in the vocabulary.
  • Hard quantization.

[Figure: descriptors assigned to codebook entries]
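A sketch of hard quantization and the resulting BoW histogram, reusing the `codebook` from the k-means snippet above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def vector_quantize(descriptors, codebook):
    """Hard assignment: index of the nearest visual word per descriptor."""
    return cdist(descriptors, codebook, "sqeuclidean").argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """Count how often each visual word occurs in one image."""
    assign = vector_quantize(descriptors, codebook)
    return np.bincount(assign, minlength=len(codebook))
```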

SLIDE 15

Pooling

• Invariance to changes in position and lighting conditions.
• Robustness to clutter.
• Compactness of representation.
• Types (sketch below)
  • Sum or average
  • Max
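A sketch of the pooling variants over a hypothetical (n_descriptors × K) array of per-descriptor codes (hard one-hot assignments, or the soft/sparse codes discussed later):

```python
import numpy as np

def pool(codes, mode="avg"):
    """Pool per-descriptor codes (n x K) into a single K-D image vector."""
    if mode == "sum":
        return codes.sum(axis=0)
    if mode == "avg":
        return codes.mean(axis=0)
    if mode == "max":  # keep the strongest response per visual word
        return codes.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")
```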

SLIDE 16

Pooling (contd.)

• Same as above, but note: pooling over the whole image discards spatial layout. There goes the geometry too.

SLIDE 17

Spatial Pooling

• Pyramid Match Kernel
  • Weighted sum of histogram intersections at multiple resolutions.
  • More weight for matches found at fine levels than at coarse levels.
  • Used for matching in high-dimensional spaces.
• Spatial Pyramid Matching (sketch below)
  • Concatenate the histogram vectors at all pyramid levels.

Pyramid Match Kernel: Grauman et al., ICCV '05. Spatial Pyramid Matching: Lazebnik et al., CVPR '06.
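A sketch of spatial-pyramid pooling under the usual assumptions: hard-assigned words with (x, y) keypoint positions, the 1/4/16-cell grids of levels 0-2, and the level weights from Lazebnik et al.:

```python
import numpy as np

def spatial_pyramid_bow(words, xy, image_wh, K, L=2):
    """Concatenate weighted BoW histograms over a spatial pyramid.

    words: (n,) hard-assigned visual-word indices
    xy:    (n, 2) keypoint positions
    image_wh: (width, height); K: vocabulary size; L: finest level.
    """
    w, h = image_wh
    feats = []
    for l in range(L + 1):
        cells = 2 ** l
        # Standard SPM weights: 1/2^L for level 0, 1/2^(L-l+1) otherwise.
        weight = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)
        cx = np.minimum((xy[:, 0] * cells / w).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / h).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                feats.append(weight * np.bincount(words[mask], minlength=K))
    return np.concatenate(feats)
```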

SLIDE 18

Recognition & Retrieval

• Recognition
  • Discriminative methods
    • k-nearest neighbor
    • SVMs (non-linear kernels)
  • Generative methods
    • Naïve Bayes
    • Bayesian models (pLSA, LDA)
• Ranking & retrieval (the agenda for this talk)
  • Nearest neighbor search
  • Indexing

SLIDE 19

Ranking & Retrieval

• Similarity measures (sketches below)
  • Cosine distance
  • L1 distance: $d(p, q) = \sum_i |p_i - q_i|$
  • Chi-square distance: $\chi^2(p, q) = \sum_i \frac{(p_i - q_i)^2}{p_i + q_i}$
  • Hellinger distance: $d(p, q) = \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$; the square root applies a discount to large values.
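Sketches of these measures on L1-normalized histograms p and q; the small epsilon guarding empty bins is an implementation choice, and constant factors vary by convention:

```python
import numpy as np

def l1_dist(p, q):
    return np.abs(p - q).sum()

def chi2_dist(p, q, eps=1e-10):
    return ((p - q) ** 2 / (p + q + eps)).sum()

def hellinger_dist(p, q):
    # Square-rooting the bins discounts large values.
    return ((np.sqrt(p) - np.sqrt(q)) ** 2).sum()

def cosine_dist(p, q, eps=1e-10):
    return 1.0 - (p @ q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps)
```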

SLIDE 20

Ranking & Retrieval

• Earth Mover's Distance (EMD)
  • Computes dissimilarity between distributions.
  • Let $S = \{(s_1, w_{s_1}), \ldots, (s_m, w_{s_m})\}$ be a distribution with m elements and $Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\}$ one with n elements, and let $d_{ij}$ be the distance between elements $s_i$ and $q_j$. The flow $F = [f_{ij}]$ that minimizes the overall cost is given by
    $\min_{F} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}$
    and $\mathrm{EMD}(S, Q) = \sum_{i,j} f_{ij} d_{ij} \,/\, \sum_{i,j} f_{ij}$.
  • An instance of the transportation problem (sketch below).
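A sketch of EMD as a transportation LP using scipy's linprog; it assumes both signatures carry equal total weight (e.g. both normalized to 1):

```python
import numpy as np
from scipy.optimize import linprog

def emd(sw, qw, D):
    """Earth Mover's Distance via the transportation problem.

    sw: (m,) source weights; qw: (n,) target weights;
    D: (m, n) ground distances d_ij. Assumes sw.sum() == qw.sum().
    """
    m, n = D.shape
    c = D.ravel()  # minimize sum_ij f_ij * d_ij over the flow f
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):             # row sums: outgoing flow = supply
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):             # column sums: incoming flow = demand
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([sw, qw])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun / sw.sum()      # normalize by total flow
```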

SLIDE 21

Ranking & Retrieval

• Evaluation measures
  • Notation: TP = true positives; FP = false positives; TN = true negatives; FN = false negatives.
  • Precision: $P = \frac{TP}{TP + FP}$
  • Recall: $R = \frac{TP}{TP + FN}$
  • F-measure: $F = \frac{2PR}{P + R}$
  • Mean Average Precision (mAP): area under the precision-recall curve, averaged over queries (sketch below).

[Figure: a query against a cats-and-dogs database, results marked TP/FP/TN/FN]
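A sketch of average precision from one ranked result list; mAP is its mean over queries:

```python
import numpy as np

def average_precision(relevant, n_relevant_total):
    """AP for one ranked result list.

    relevant: boolean array, relevant[i] is True if the rank-i result
    is a true positive; n_relevant_total: relevant items in the database.
    """
    relevant = np.asarray(relevant, dtype=bool)
    hits = np.cumsum(relevant)
    ranks = np.arange(1, len(relevant) + 1)
    precision_at_hits = hits[relevant] / ranks[relevant]
    return precision_at_hits.sum() / n_relevant_total

# mAP = np.mean([average_precision(r, n) for r, n in per_query_results])
```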

SLIDE 22

Re-ranking using Geometric Verification

• Use the position and shape of the underlying features to improve retrieval quality.
• Estimate a geometric transformation to remove outliers.
• Approaches:
  • RANSAC
  • Hough transform

[Figure: both images have many matches; which is correct?]

Slide credit: Cordelia Schmid

SLIDE 23

Re-ranking using Geometric Verification

• Fitting an affine transformation
  • Assume we know the correspondences; how do we get the transformation? (See the sketch below.)

Slide Credit: Cordelia Schmid
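A least-squares sketch for the question above, assuming known point correspondences:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine transform A with dst ≈ A @ [x, y, 1]^T.

    src, dst: (n, 2) corresponding points, n >= 3.
    """
    X = np.hstack([src, np.ones((len(src), 1))])  # (n, 3)
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2)
    return A.T                                    # (2, 3)
```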

SLIDE 24

Re-ranking using Geometric Verification

• RANSAC (Fischler & Bolles, 1981), sketched below:
  • Randomly select a seed group of matches.
  • Compute the transformation from the seed group.
  • Find inliers to this transformation.
  • If the number of inliers is sufficiently large, re-compute the least-squares estimate of the transformation on all of the inliers.
  • Keep the transformation with the largest number of inliers.

[Figure: fitting a line via the hypothesize-and-verify loop: randomly select a minimal subset of points, hypothesize a model, compute the error function, select the points consistent with the model, repeat]

Slide Credit: Kristen Grauman
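A sketch of the loop above for an affine model, reusing `fit_affine` from the previous slide's snippet; the threshold, iteration count, and inlier minimum are illustrative:

```python
import numpy as np

def ransac_affine(src, dst, n_iters=500, thresh=3.0, min_inliers=10, seed=0):
    """RANSAC for an affine transform between matched keypoints."""
    rng = np.random.default_rng(seed)
    best_A, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal subset
        A = fit_affine(src[idx], dst[idx])                 # hypothesize
        pred = src @ A[:, :2].T + A[:, 2]                  # apply model
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_A, best_inliers = A, inliers
    if best_inliers.sum() >= min_inliers:                  # refit on inliers
        best_A = fit_affine(src[best_inliers], dst[best_inliers])
    return best_A, best_inliers
```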

SLIDE 25

Inspirations from IR

• Making it faster
  • Inverted indexing (sketch below)
    • Reverse lookup: from visual word to the images containing it.
    • Enables fast search by exploiting the sparse representation.

[Figure: inverted file over #images × #visual words. Image courtesy: Jawahar et al., DAS '14]
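A minimal inverted-index sketch: a reverse lookup from visual word to the set of images containing it.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    """image_words: dict of image_id -> set of visual-word ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidates(index, query_words):
    """Images sharing at least one visual word with the query."""
    hits = set()
    for w in query_words:
        hits |= index.get(w, set())
    return hits
```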

SLIDE 26

Inspirations from IR

• Weighting schemes
  • Zipf's law: the frequency of any word is inversely proportional to its rank.
  • TF-IDF weighting (sketch below): $t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$, where $n_{id}$ is the count of word $i$ in image $d$, $n_d$ the total word count of $d$, $n_i$ the number of images containing word $i$, and $N$ the number of images.
  • Stop words: discard the top 5% most frequent visual words.

Image courtesy: Wikipedia
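A sketch of TF-IDF weighting on an (n_images × K) matrix of visual-word counts, following the formula above:

```python
import numpy as np

def tfidf(counts):
    """TF-IDF weights for a visual-word count matrix (n_images x K)."""
    n_docs = counts.shape[0]
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)        # images containing each word
    idf = np.log(n_docs / np.maximum(df, 1))
    return tf * idf
```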

SLIDE 27

Inspirations from IR

• Improving the recall
  • Query expansion: reformulating the query to increase its expressiveness, e.g. adding synonyms, jittering, etc.

[Figure: query image → results → spatial verification → new query → new results, repeated. Chum et al., Total Recall, ICCV '07]

SLIDE 28

Inspirations from IR

• Query expansion variants:
  • Baseline
  • Transitive closure expansion
    • Uses a priority queue.
  • Average query expansion (sketch below)
  • Recursive average query expansion
  • Multiple image resolution expansion
    • Compute the median image resolution.
    • Formulate queries for the other resolution bands: (0, 4/5), (2/3, 3/2), (5/4, ∞).
    • Do average query expansion for each band.

Chum et al., Total Recall, ICCV '07
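A simplified sketch of the average-query-expansion step: average the query's BoW vector with the top spatially verified results and re-issue the query. The function and its parameters are illustrative, not Chum et al.'s exact formulation:

```python
import numpy as np

def average_query_expansion(query_vec, verified_vecs, top_m=5):
    """New query vector from the original plus top verified results.

    verified_vecs: (n, K) BoW vectors of spatially verified top results.
    """
    expanded = np.vstack([query_vec, verified_vecs[:top_m]]).mean(axis=0)
    return expanded / np.linalg.norm(expanded)
```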

SLIDE 29

Advanced coding schemes

SLIDE 30

Lost in Quantization

• Hard quantization (VQ)
• Issues
  • Codeword uncertainty: a descriptor may be nearly as close to several codewords as to its nearest one.
  • Codeword plausibility: a descriptor may be far from every codeword, yet still gets assigned.

SLIDE 31

Modeling Uncertainty

• Kernel codebooks (sketch below)
  • Allow a degree of ambiguity when assigning code words to image features.
  • Use kernel density estimation:
    • Kernel size determines the amount of smoothing.
    • Kernel shape is related to the distance function.

Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008
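A sketch of Gaussian soft assignment in the spirit of kernel codebooks; the bandwidth `sigma` plays the role of the kernel size and is an illustrative value:

```python
import numpy as np
from scipy.spatial.distance import cdist

def kernel_codebook_assign(descriptors, codebook, sigma=100.0):
    """Soft assignment: each descriptor spreads mass over nearby codewords
    instead of hard-assigning to the single nearest one."""
    d2 = cdist(descriptors, codebook, "sqeuclidean")
    K = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian kernel responses
    return K / K.sum(axis=1, keepdims=True)  # normalize per descriptor
```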

SLIDE 32

Modeling Uncertainty

• Codeword uncertainty
• Codeword plausibility

[Figure: effect of modeling uncertainty and plausibility. Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008]

SLIDE 33

Encoding – Sparse Coding

• From VQ to SC
  • VQ constrains each code $u_m$ to a single active entry (cardinality one): too restrictive. Relax it with an L1 penalty on the codes:
    $\min_{U, V} \sum_{m} \lVert x_m - u_m V \rVert^2 + \lambda \lVert u_m \rVert_1$
• Max pooling

Yang et al., CVPR '09

SLIDE 34

Encoding – Sparse Coding

• Sparse coding (sketch below)
  • Fix V and solve for U [LASSO].
  • Fix U and solve for V [least squares].
• Linear classification using the SPM kernel.

Yang et al., CVPR '09
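A sketch of the alternation using scikit-learn's DictionaryLearning; the sizes are deliberately small for illustration, and the random matrix stands in for real descriptors:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 128))    # stand-in local descriptors

# Internally alternates the two steps above: LASSO for the codes U
# with the dictionary V fixed, least squares for V with U fixed.
dl = DictionaryLearning(n_components=64, alpha=1.0,
                        transform_algorithm="lasso_lars", random_state=0)
U = dl.fit_transform(X)            # sparse codes, (300, 64)
V = dl.components_                 # learned dictionary, (64, 128)

feature = np.abs(U).max(axis=0)    # image feature via max pooling
```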

SLIDE 35

Encoding

• SC results tend to be local.
• Is locality more essential than sparsity?
  • Local coordinate coding (LCC)
  • Locality-constrained linear coding (LLC)
    • Drop the sparsity term and evoke the locality term explicitly:
      $\min_{C} \sum_{i} \lVert x_i - B c_i \rVert^2 + \lambda \lVert d_i \odot c_i \rVert^2 \quad \text{s.t.}\ \mathbf{1}^\top c_i = 1$
    • Here $\odot$ denotes element-wise multiplication and $d_i$ is the locality adaptor that gives different weights to different basis vectors according to their similarity to $x_i$.

Wang et al., CVPR '10

SLIDE 36

Encoding – LLC

• Comparison with VQ and SC (sketch below)
  • Better reconstruction
  • Local smooth sparsity
  • Analytical solution

Wang et al., CVPR '10
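A sketch of the approximated LLC coding from Wang et al.: take the k nearest codewords, solve a small regularized least-squares system (the analytical solution), and renormalize so the code sums to one. The k and beta values are illustrative:

```python
import numpy as np
from scipy.spatial.distance import cdist

def llc_code(x, B, k=5, beta=1e-4):
    """Approximated LLC code for one descriptor x over codebook B."""
    d = cdist(x[None, :], B, "sqeuclidean").ravel()
    nn = np.argsort(d)[:k]                  # k nearest basis vectors
    z = B[nn] - x                           # shift basis to the origin
    C = z @ z.T                             # local covariance
    C += beta * np.trace(C) * np.eye(k)     # regularize for stability
    w = np.linalg.solve(C, np.ones(k))      # analytical solution
    w /= w.sum()                            # enforce 1^T c = 1
    code = np.zeros(len(B))
    code[nn] = w
    return code
```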

SLIDE 37

Interpretation so far…

• Discover subspaces
  • Each basis is a "direction".
  • Sparsity: each datum is a linear combination of only several bases.
  • Related to topic models.
• Geometry of the data manifold
  • Each basis is an "anchor point".
  • Sparsity is induced by locality: each datum is a linear combination of neighboring anchors.

Slide credit: Kai Yu

SLIDE 38

Higher Order Representation

SLIDE 39

Encoding

• So far…
  • Representation extracted from count statistics (BoWs).
• Why not include other statistics?
  • Mean of local descriptors
  • (Co)variance of local descriptors

Slide credit: Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition

SLIDE 40

VLAD-Vector of Locally Aggregated Descriptors

• Given a codebook $\{c_1, \ldots, c_N\}$, e.g. learned with k-means, and a set of local descriptors $\{x\}$ (sketch below):
  1. Assign each descriptor to its nearest center: $\mathrm{NN}(x) = \arg\min_{c_i} \lVert x - c_i \rVert$
  2. Compute the residual vectors: $v_i = \sum_{x:\, \mathrm{NN}(x) = c_i} (x - c_i)$
  3. Concatenate the $v_i$ and L2-normalize.

Slide credit: Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition
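A sketch of the three steps (assign, aggregate residuals, concatenate and L2-normalize):

```python
import numpy as np
from scipy.spatial.distance import cdist

def vlad(descriptors, codebook):
    """VLAD vector of length N*d for a codebook of N d-dim centers."""
    N, d = codebook.shape
    assign = cdist(descriptors, codebook, "sqeuclidean").argmin(axis=1)
    v = np.zeros((N, d))
    for i in range(N):
        members = descriptors[assign == i]
        if len(members):                       # residuals to center i
            v[i] = (members - codebook[i]).sum(axis=0)
    v = v.ravel()
    return v / max(np.linalg.norm(v), 1e-12)   # L2-normalize
```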

SLIDE 41

VLAD

 Characterizes the distribution w.r.t. center (first order

statistics)

 Dimensionality (D) = N x d. Here „N‟ is the codebook size

and „d‟ is the dimensionality of local descriptor.

Images and corresponding VLAD descriptors Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR‟10.

SLIDE 42

The Fisher vector

• Slides from: Florent Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition.

SLIDE 43

Packages

• The INRIA package:
  • http://lear.inrialpes.fr/src/inria_fisher/
• The Oxford package:
  • http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/

SLIDE 44

References

• J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, 2003.
• D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
• M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In Proc. VISAPP, 2009.
• K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.
• J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, 2008.
• J. C. van Gemert, J. M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders. Kernel codebooks for scene categorization. In Proc. ECCV, 2008.
• J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. CVPR, 2009.
• J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In Proc. CVPR, 2010.
SLIDE 45

References

• F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In Proc. CVPR, 2007.
• F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In Proc. CVPR, 2010.
• S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
• R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, ISBN: 020139829, 1999.
• O. Chum, A. Mikulík, M. Perdoch, and J. Matas. Total recall II: Query expansion revisited. In Proc. CVPR, 2011.

SLIDE 46

Thank You