Feature Representation – Vision
BoWs and Beyond
Praveen Krishnan
Feature Representation in Vision
- Low Level: local detectors and descriptors, Bag of Words
- Mid Level: parts, attributes
- Hierarchical: deep representations
Bag of Visual Words
- Visual vocabulary and vector quantization
- Spatial verification
- Inspirations from IR
- Advanced coding and pooling schemes: soft quantization, higher-order representations
BoWs
- An image is represented as a bag of visual words.
- Origins in text processing: Salton & McGill (1983).
Slide: ICCV 2005 short course, L. Fei-Fei
Origins in texture recognition
- Universal texton dictionary and texton histograms (Julesz, 1981).
BoWs Representation
(i) Interest point detection
(ii) Feature extraction
(iii) Vector quantization (against the visual vocabulary)
(iv) Coding and pooling
Figure courtesy: Tsai '12
The BoWs pipeline
- Local detectors & descriptors: SIFT, HOG, LBP, …
- Vocabulary: k-means, approximate k-means, GMM
- Coding: histograms, kernel codebooks, sparse codes, LLC, Fisher kernels, super vectors, VLAD
- Pooling: average, max
- Spatial: spatial pyramids, Min-Hash, LLAH
- Recognition & retrieval: SVMs, weighting schemes, query expansion, re-ranking, etc.
Detection
- Regular (dense) grid, sampled at multiple scales: Fei-Fei et al. '05, Bosch et al. '06
- Sparse, interest-point based: Mikolajczyk et al. '05, Csurka et al. '04
Description
- SIFT (Lowe '99), MSER (Matas et al. '02), HOG (Dalal et al. '05), and many more.
Clustering: Visual Vocabulary
- Partition the local descriptor space. Let $X = \{x_1, \dots, x_N\}$ be the set of $N$ local descriptors.
- k-means: minimize the sum of squared Euclidean distances between points and their nearest cluster centers, $\min_{B} \sum_{i=1}^{N} \min_{k} \| x_i - b_k \|_2^2$, where $B = \{b_1, \dots, b_K\}$ is the codebook.
- The cluster centers are the visual words (code words), each typified by the image patches assigned to it.
Sivic et al., ICCV '05
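To make the clustering step concrete, here is a minimal sketch of vocabulary construction; the function name and the choice of scikit-learn's MiniBatchKMeans are illustrative, not from the slides.

```python
# Sketch: building a visual vocabulary by clustering local descriptors.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptors, vocab_size=1000, seed=0):
    """Cluster local descriptors (N x d array, e.g. SIFT) into a codebook."""
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, random_state=seed)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_  # codebook B, shape (vocab_size, d)

# Usage with toy data standing in for descriptors from many training images.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((10000, 128)).astype(np.float32)
B = build_vocabulary(descriptors, vocab_size=256)
```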
Issues
- Size of vocabulary?
  - Too small: visual words are not representative of all patches.
  - Too large: quantization artifacts, overfitting.
- Generative or discriminative? Gaussian mixture models (more later).
- Computational efficiency:
  - Approximate k-means using randomized kd-trees (Philbin et al., CVPR '07).
  - Hierarchical k-means (Nister et al., CVPR '06).
Vector Quantization
- Assigns each feature to the nearest visual word in the vocabulary: hard quantization.
Pooling
- Gives invariance to changes in position, robustness to clutter, and compactness of representation.
- Types: sum (or average), and max.
- There goes the geometry too!
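A minimal sketch of hard quantization followed by pooling, assuming a codebook B as built above; all names are illustrative.

```python
# Sketch: hard vector quantization of descriptors against a codebook B,
# followed by sum/average or max pooling into an image-level vector.
import numpy as np

def vq_code(descriptors, B):
    """One-hot codes: each descriptor is assigned its nearest visual word."""
    d2 = ((descriptors[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((len(descriptors), len(B)))
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes

def pool(codes, mode="avg"):
    """Pool per-descriptor codes into one image-level vector."""
    if mode == "sum":
        return codes.sum(axis=0)
    if mode == "avg":
        return codes.mean(axis=0)
    return codes.max(axis=0)  # max pooling

# Usage with toy data: 500 descriptors, a 64-word codebook.
rng = np.random.default_rng(0)
X, B = rng.standard_normal((500, 16)), rng.standard_normal((64, 16))
bow_histogram = pool(vq_code(X, B), mode="avg")
```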
Pyramid Match Kernel
- Weighted sum of histogram intersections at multiple resolutions.
- Matches found at fine levels get more weight than those found at coarse levels.
- Used for matching in high-dimensional spaces.
Spatial Pyramid Matching
- Concatenate the histogram vectors at all pyramid levels.
Pyramid Match Kernel: Grauman et al., ICCV '05. Spatial Pyramid Matching: Lazebnik et al., CVPR '06.
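As a sketch of the spatial pyramid idea, the following builds a two-level pyramid (whole image plus a 2x2 grid) and concatenates the weighted histograms; the function and variable names are illustrative, and the 1/2, 1/2 level weighting is one common choice for a single-level pyramid.

```python
# Sketch: two-level spatial pyramid histogram over pre-quantized features.
import numpy as np

def spatial_pyramid(keypoints_xy, words, vocab_size, width, height):
    """keypoints_xy: (N, 2) positions; words: (N,) visual-word indices."""
    gx = np.minimum((keypoints_xy[:, 0] / (width / 2)).astype(int), 1)
    gy = np.minimum((keypoints_xy[:, 1] / (height / 2)).astype(int), 1)
    full = np.bincount(words, minlength=vocab_size).astype(float)
    cells = []
    for i in range(2):                      # histogram per 2x2 grid cell
        for j in range(2):
            mask = (gx == i) & (gy == j)
            cells.append(np.bincount(words[mask],
                                     minlength=vocab_size).astype(float))
    # Finer levels get at least as much weight as coarser ones.
    return np.concatenate([0.5 * full] + [0.5 * c for c in cells])
```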
Recognition
- Discriminative methods: k-nearest neighbor, SVMs (non-linear kernels).
- Generative methods: Naïve Bayes, Bayesian models (pLSA, LDA).
Ranking & Retrieval
- Nearest neighbor search and indexing: the agenda for this talk.
Similarity measures
- Cosine distance, L1 distance, Chi-square distance, Hellinger distance.
- L1: $d(x, y) = \sum_i |x_i - y_i|$
- Chi-square: $d(x, y) = \frac{1}{2}\sum_i \frac{(x_i - y_i)^2}{x_i + y_i}$
- Hellinger: $d^2(x, y) = \frac{1}{2}\sum_i (\sqrt{x_i} - \sqrt{y_i})^2$; the square root applies a discount to large values.
Earth Mover's Distance (EMD)
- Computes the dissimilarity between two distributions. Let $S = \{(s_1, w_{s_1}), \dots, (s_m, w_{s_m})\}$ be a distribution with $m$ elements and $Q = \{(q_1, w_{q_1}), \dots, (q_n, w_{q_n})\}$ a distribution with $n$ elements, and let $d_{ij}$ be the ground distance between elements $s_i$ and $q_j$. With $F = [f_{ij}]$ the flow that minimizes the overall cost, $\mathrm{EMD}(S, Q) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}}$.
- This is a transportation problem.
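For concreteness, minimal implementations of the histogram distances above; the 1/2 factor in the chi-square and the squared-Hellinger form follow the conventions used here, and the function names are illustrative.

```python
# Sketch: common distances between (L1-normalized, non-negative) histograms.
import numpy as np

def l1(x, y):
    return np.abs(x - y).sum()

def chi_square(x, y, eps=1e-10):
    # eps guards against empty bins in both histograms.
    return 0.5 * (((x - y) ** 2) / (x + y + eps)).sum()

def hellinger(x, y):
    # Square-rooting the bins discounts the influence of large values.
    return np.sqrt(0.5 * ((np.sqrt(x) - np.sqrt(y)) ** 2).sum())
```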
Evaluation measures
Notation: TP = true positives; FP = false positives; TN = true negatives; FN = false negatives.
- Precision: $P = \frac{TP}{TP + FP}$
- Recall: $R = \frac{TP}{TP + FN}$
- F-measure: $F = \frac{2PR}{P + R}$
- Mean Average Precision (mAP): area under the precision-recall curve, averaged over queries.
[Figure: example query on a cats-and-dogs database, with retrieved results marked TP/FP/TN/FN.]
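A sketch of how these measures are computed for one ranked result list; array and function names are illustrative.

```python
# Sketch: precision, recall and average precision for one ranked result list.
# `relevant` is a boolean array over the ranked results (True = TP).
import numpy as np

def precision_recall(relevant, num_relevant_in_db):
    tp = np.cumsum(relevant)              # TPs among the top-k results
    k = np.arange(1, len(relevant) + 1)
    precision = tp / k                    # P = TP / (TP + FP)
    recall = tp / num_relevant_in_db      # R = TP / (TP + FN)
    return precision, recall

def average_precision(relevant, num_relevant_in_db):
    p, _ = precision_recall(relevant, num_relevant_in_db)
    # Average of the precision values at each relevant (TP) position.
    return (p * relevant).sum() / num_relevant_in_db

# mAP is the mean of average_precision over all queries.
```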
Spatial Verification
- Use the position and shape of the underlying features to improve retrieval quality.
- Estimate a geometric transformation to remove outlier matches.
- Approaches: RANSAC, Hough transform.
- Both images have many matches – which is correct?
Slide credit: Cordelia Schmid
Fitting an affine transformation
- Assume we know the correspondences; how do we get the transformation?
Slide credit: Cordelia Schmid
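A sketch of the least-squares solution, assuming known correspondences: each correspondence contributes two rows to a linear system in the six affine parameters. All names are illustrative.

```python
# Sketch: least-squares affine fit from point correspondences src_i -> dst_i.
import numpy as np

def fit_affine(src, dst):
    """src, dst: (N, 2) arrays of corresponding points, N >= 3."""
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)            # [x'_1, y'_1, x'_2, y'_2, ...]
    A[0::2, 0:2] = src             # x' = a*x + b*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src             # y' = c*x + d*y + ty
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[0:4].reshape(2, 2)  # linear part [[a, b], [c, d]]
    t = params[4:6]                # translation [tx, ty]
    return M, t                    # dst ≈ src @ M.T + t
```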
RANSAC (Fischler & Bolles, 1981)
1. Randomly select a seed group of matches.
2. Compute the transformation from the seed group.
3. Find the inliers to this transformation.
4. If the number of inliers is sufficiently large, re-compute the least-squares estimate of the transformation on all of the inliers.
Keep the transformation with the largest number of inliers.
In short: randomly select a minimal subset of points, hypothesize a model, compute the error function, select the points consistent with the model, and repeat the hypothesize-and-verify loop. E.g., fitting a line (see the sketch below).
Slide credit: Kristen Grauman
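A minimal hypothesize-and-verify loop for the line-fitting example; the threshold, iteration count, and names are illustrative, and the data is assumed non-degenerate (at least one non-vertical sample pair).

```python
# Sketch: RANSAC for fitting a line y = m*x + c to 2D points.
import numpy as np

def ransac_line(points, num_iters=100, threshold=1.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(num_iters):
        i, j = rng.choice(len(points), size=2, replace=False)  # minimal subset
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue
        m = (y2 - y1) / (x2 - x1)                  # hypothesize a model
        c = y1 - m * x1
        err = np.abs(points[:, 1] - (m * points[:, 0] + c))  # error function
        inliers = err < threshold                  # points consistent with it
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-fit by least squares on all inliers of the best hypothesis.
    x, y = points[best_inliers, 0], points[best_inliers, 1]
    m, c = np.polyfit(x, y, deg=1)
    return m, c, best_inliers
```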
Making it faster
- Inverted indexing: a reverse lookup from visual words to images.
- Enables fast search by exploiting the sparse representation.
Image courtesy: Jawahar et al., DAS '14. [Figure: inverted index over #images × #visual words.]
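A toy sketch of an inverted index; the data layout (dict of sets plus a vote counter) is an illustrative choice.

```python
# Sketch: inverted index mapping each visual word to the images containing it.
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words: dict image_id -> iterable of visual-word ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in set(words):
            index[w].add(image_id)
    return index

def candidate_images(index, query_words):
    """Score database images by how many query words they share."""
    votes = Counter()
    for w in set(query_words):
        votes.update(index.get(w, ()))
    return votes.most_common()
```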
Weighting schemes
- Zipf's law: the frequency of any word is inversely proportional to its rank.
- TF-IDF weighting: weight each visual word by its term frequency times its inverse document frequency.
- Stop words: the most frequent visual words are discarded.
Image courtesy: Wikipedia
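A sketch of tf-idf weighting applied to a BoW histogram, in the spirit of Video Google (Sivic & Zisserman, ICCV '03); variable names are illustrative.

```python
# Sketch: tf-idf re-weighting of a raw BoW count histogram.
import numpy as np

def tfidf(hist, doc_freq, num_images):
    """hist: word counts for one image; doc_freq: images containing each word."""
    tf = hist / max(hist.sum(), 1)                      # term frequency
    idf = np.log(num_images / np.maximum(doc_freq, 1))  # inverse document freq.
    return tf * idf
```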
Improving the recall
- Query expansion: reformulating the query to increase its recall.
- Pipeline: query image → results → spatial verification → new query → new results; repeat. (Chum et al., Total Recall, ICCV '07)
Query expansion variants
- Baseline: transitive closure expansion, using a priority queue.
- Average query expansion (see the sketch below).
- Recursive average query expansion.
- Multiple image resolution expansion: compute the median image resolution, formulate queries for the other resolution bands (0, 4/5), (2/3, 3/2), (5/4, ∞), and do average query expansion for each band.
Chum et al., Total Recall, ICCV '07
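A sketch of the average query expansion step: average the query vector with its spatially verified top results and re-normalize before re-querying. Names are illustrative.

```python
# Sketch: average query expansion over BoW/tf-idf vectors.
import numpy as np

def average_query_expansion(query_vec, verified_result_vecs):
    """Average the original query with spatially verified top results."""
    expanded = np.vstack([query_vec] + list(verified_result_vecs)).mean(axis=0)
    return expanded / np.linalg.norm(expanded)  # re-normalize, then re-query
```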
Hard quantization (VQ): issues
- Codeword uncertainty and codeword plausibility.
Kernel codebooks
- Allow a degree of ambiguity in assigning code words to image features.
- Use kernel density estimation: the kernel size determines the amount of smoothing, and the kernel shape is related to the distance function.
Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008
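A sketch of soft assignment with a Gaussian kernel, in the spirit of the codeword-uncertainty formulation of van Gemert et al.; sigma and the mean-pooling choice are illustrative.

```python
# Sketch: Gaussian-kernel soft assignment of descriptors to a codebook B.
import numpy as np

def soft_assign(descriptors, B, sigma=1.0):
    d2 = ((descriptors[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2 * sigma ** 2))           # kernel value per codeword
    # "Codeword uncertainty": normalize per descriptor, then pool over image.
    return (K / K.sum(axis=1, keepdims=True)).mean(axis=0)
```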
From VQ to Sparse Coding
- Hard assignment in VQ is too restrictive; relax the cardinality constraint with an L1 norm, and combine with max pooling (Yang et al., CVPR '09).
Sparse Coding
- Objective: $\min_{U, V} \sum_{i=1}^{N} \| x_i - u_i V \|_2^2 + \lambda \| u_i \|_1$ s.t. $\| v_k \|_2 \le 1 \;\forall k$.
- Alternating optimization: fix $V$ and solve for $U$ [LASSO]; fix $U$ and solve for $V$ [least squares].
- Linear classification using the SPM kernel.
Yang et al., CVPR '09
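A sketch of the coding step (fix V, solve for U) using scikit-learn's LASSO-based SparseCoder; the alpha value and names are illustrative. The dictionary-update step (fix U, solve for V) is a constrained least squares; sklearn's DictionaryLearning alternates both steps internally.

```python
# Sketch: LASSO sparse coding of descriptors against a fixed dictionary V.
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_codes(X, V, alpha=0.1):
    """X: (N, d) descriptors; V: (K, d) codebook rows. Returns (N, K) codes U."""
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # SparseCoder expects unit rows
    coder = SparseCoder(dictionary=V, transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    return coder.transform(X)
```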
SC codes tend to be local anyway. Is locality more essential than sparsity?
Local Coordinate Coding (LCC) and Locality-constrained Linear Coding (LLC)
- Drop the sparsity term and evoke the locality term explicitly: $\min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|_2^2 + \lambda \| d_i \odot c_i \|_2^2$ s.t. $\mathbf{1}^\top c_i = 1 \;\forall i$.
- Here $\odot$ denotes element-wise multiplication, and $d_i$ is the locality adaptor that gives different weights to different basis vectors according to their similarity to $x_i$.
Wang et al., CVPR '10
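A sketch of the approximated LLC coding step, following the analytical solution in Wang et al.: take the k nearest codewords and solve a small regularized linear system. The values of k and beta are illustrative.

```python
# Sketch: approximated LLC code for one descriptor x against codebook B.
import numpy as np

def llc_code(x, B, k=5, beta=1e-4):
    """x: (d,) descriptor; B: (K, d) codebook. Returns a sparse (K,) code."""
    d2 = ((B - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:k]                 # locality: k nearest bases
    Z = B[nn] - x                           # shift bases to the origin
    C = Z @ Z.T + beta * np.eye(k)          # local covariance (regularized)
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                            # enforce the sum-to-one constraint
    code = np.zeros(len(B))
    code[nn] = w
    return code
```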
Comparison with VQ and SC
- Better reconstruction, local smooth sparsity, and an analytical solution (Wang et al., CVPR '10).
- LLC codes discover subspaces and respect the geometry of the data.
Slide credit: Kai Yu
So far…
- Representations extracted from count statistics (BoWs). Why not include other statistics, such as the mean and (co)variance of the local descriptors?
Slide credit: Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition
VLAD (Vector of Locally Aggregated Descriptors)
- Given a codebook $\{\mu_1, \dots, \mu_N\}$, e.g. learned with k-means, and a set of local descriptors $\{x_1, \dots, x_M\}$, accumulate the residuals per visual word: $v_k = \sum_{x_i : \mathrm{NN}(x_i) = \mu_k} (x_i - \mu_k)$.
- Characterizes the distribution w.r.t. the centers (first-order statistics).
- Dimensionality $D = N \times d$, where $N$ is the codebook size and $d$ the descriptor dimension.
[Figure: images and their corresponding VLAD descriptors.]
Jégou, Douze, Schmid and Pérez, "Aggregating local descriptors into a compact image representation", CVPR '10.
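A sketch of VLAD aggregation as defined above; the final L2 normalization is one common choice, and the names are illustrative.

```python
# Sketch: VLAD descriptor from local descriptors X and k-means centers B.
import numpy as np

def vlad(X, B):
    """X: (M, d) descriptors; B: (N, d) centers. Returns an (N*d,) vector."""
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    v = np.zeros_like(B)
    for k in range(len(B)):
        if (nearest == k).any():
            v[k] = (X[nearest == k] - B[k]).sum(axis=0)  # residuals to center
    v = v.reshape(-1)                                    # D = N * d dimensions
    return v / max(np.linalg.norm(v), 1e-12)             # L2-normalize
```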
Slides from: Florent Perronnin, CVPR '12 tutorial on Large-Scale Visual Recognition.
Software:
- The INRIA package: http://lear.inrialpes.fr/src/inria_fisher/
- The Oxford package: http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/
References
- J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, 2003.
- D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
- M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In Proc. VISAPP, 2009.
- K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.
- J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, 2008.
- J. C. van Gemert, J. M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders. Kernel codebooks for scene categorization. In Proc. ECCV, 2008.
- J. Yang et al. Linear spatial pyramid matching using sparse coding for image classification. In Proc. CVPR, 2009.
- J. Wang et al. Locality-constrained linear coding for image classification. In Proc. CVPR, 2010.
- F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In Proc. CVPR, 2007.
- F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In Proc. CVPR, 2010.
- S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, ISBN: 020139829, 1999.
- O. Chum, A. Mikulík, M. Perdoch, and J. Matas. Total recall II: Query expansion revisited. In Proc. CVPR, 2011.