Learning Visual Semantics: Models, Massive Computation, and Innovative Applications
SLIDE 1

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center

SLIDE 2

Evolution of Visual Features

  • Low level features and histogram
  • SIFT and bag-of-words models
  • Sparse coding
  • Super vector and Fisher vector
  • Deep CNN
SLIDE 3

Evolution of Visual Features

  • Low level features and histogram
  • SIFT and bag-of-words models
  • Sparse coding
  • Super vector and Fisher vector
  • Deep CNN

Fewer parameters -> More parameters

SLIDE 4

Evolution of Visual Features

  • Low level features and spatial histogram
  • SIFT and bag-of-words models
  • Sparse coding
  • Super vector and Fisher vector
  • Deep CNN

Three fundamental techniques have been used extensively:

  • 1. histogram
  • 2. spatial gridding
  • 3. filtering

SLIDE 5

Low Level Features and Spatial Pyramid

SLIDE 6

Raw Pixels as Features

Concatenate the raw pixels into a 1-D vector.

  • Application 1: Face recognition
  • Application 2: Handwritten digits
  • Tiny Image [Torralba et al 2007]: resize an image to a 32x32 color thumbnail, which corresponds to a 3072-dimensional vector

Pictures courtesy of Face Research Lab, Antonio Torralba, and Sam Roweis
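As a minimal sketch (assuming NumPy; the random array below stands in for a resized photo), the Tiny Image representation is just the flattening of a 32x32 color thumbnail into one long vector:

```python
import numpy as np

# A hypothetical 32x32 RGB thumbnail; random values stand in for a
# resized image, as in the Tiny Image representation.
thumb = np.random.rand(32, 32, 3)

# Concatenate the raw pixels into a single 1-D feature vector.
feature = thumb.reshape(-1)   # 32 * 32 * 3 = 3072 dimensions
```

Every pixel becomes one coordinate, so two images must be well aligned for this feature to be comparable.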

SLIDE 7

From Pixels to Histograms

The color histogram [Swain and Ballard 91] was proposed to model the distribution of colors in an image (one histogram per r, g, b channel).

We can extend the color histogram to:

  • Edge histogram
  • Shape context histogram
  • Local binary patterns (LBP)
  • Histogram of gradients

(Two visually different images can still have very similar color histograms.)

Unlike raw-pixel vectors, histograms are not sensitive to:

  • misalignment
  • scale transform
  • global rotation
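A per-channel color histogram of this kind can be sketched as follows (a hedged NumPy example, not any particular system's implementation; normalization makes it comparable across image sizes, and the check at the end illustrates insensitivity to global rotation):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel color histogram of an RGB image with values in [0, 1]."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0.0, 1.0))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()   # normalize so image size does not matter

img = np.random.rand(64, 48, 3)
h = color_histogram(img)

# A globally rotated image contains the same pixel values,
# so it yields exactly the same histogram.
h_rot = color_histogram(np.rot90(img))
```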
SLIDE 8

From Histogram to Spatialized Histogram

Problem of histograms: no spatial information! Two very different images can produce exactly the same histogram (example thanks to Erik Learned-Miller).

Remedies:

  • Histograms of spatial cells (e.g., [Ojala et al, PAMI’02])
  • Spatial pyramid matching [Lazebnik et al, CVPR’06]
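The spatialized histogram can be sketched as follows (an illustrative NumPy example; the function name `grid_histograms` and the pyramid levels 1x1, 2x2, 4x4 are assumptions, not from the slide):

```python
import numpy as np

def grid_histograms(gray, levels=(1, 2, 4), bins=8):
    """Concatenate intensity histograms over spatial cells at each pyramid level."""
    H, W = gray.shape
    feats = []
    for g in levels:                       # g x g grid at this level
        for i in range(g):
            for j in range(g):
                cell = gray[i * H // g:(i + 1) * H // g,
                            j * W // g:(j + 1) * W // g]
                hist, _ = np.histogram(cell, bins=bins, range=(0.0, 1.0))
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

f = grid_histograms(np.random.rand(64, 64))
# (1 + 4 + 16) cells x 8 bins = 168 dimensions
```

Because each cell gets its own histogram, swapping image regions now changes the feature, which the global histogram could not detect.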

SLIDE 9

IBM IMARS Spatial Gridding

First place in the 1st and 2nd ImageCLEF Medical Image Classification challenges

Task: Determine which modality a medical image belongs to.

  • Images from PubMed articles
  • 31 categories (x-ray, CT, MRI, ultrasound, etc.)
SLIDE 10

IBM IMARS Spatial Gridding

First place in the 1st and 2nd ImageCLEF Medical Image Classification challenges: http://www.imageclef.org/2012/medical

SLIDE 11

Image Filters

  • In addition to histograms, another group of features can be represented as “filters”. For example:
  • 1. Haar-like filters (Viola-Jones face detection)
  • 2. Gabor filters (simple cells in the visual cortex can be modeled by Gabor functions); widely used in fingerprint, iris, OCR, texture, and face recognition
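A Haar-like filter response can be computed in constant time from an integral image, as in Viola-Jones. A minimal NumPy sketch (the two-rectangle "left minus right" filter here is illustrative, not a specific Viola-Jones feature):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so any rectangle sum costs O(1)."""
    return img.cumsum(0).cumsum(1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using the integral image ii."""
    s = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        s -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        s -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

# A two-rectangle Haar-like response: left half minus right half.
img = np.random.rand(24, 24)
ii = integral_image(img)
resp = box_sum(ii, 0, 0, 24, 12) - box_sum(ii, 0, 12, 24, 24)
```

The O(1) rectangle sums are what make evaluating thousands of such filters per window affordable in a cascaded detector.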

SLIDE 12

SIFT Feature and Bag-of-Words Model

Classical features:

  • Raw pixels
  • Histogram features
    – Color histogram
    – Edge histogram
  • Frequency analysis
  • Image filters
  • Texture features
    – LBP
  • Scene features
    – GIST
  • Shape descriptors
  • Edge detection
  • Corner detection

1999: SIFT features and beyond

  • DoG
  • Hessian detector
  • Harris-Laplace
  • FAST
  • ORB
  • SIFT
  • HOG
  • SURF
  • DAISY
  • BRIEF

SLIDE 13

Scale-Invariant Feature Transform (SIFT)

David G. Lowe:

  • Object recognition from local scale-invariant features, ICCV 1999
  • Distinctive image features from scale-invariant keypoints, IJCV 2004

SIFT descriptor: histograms of gradient orientations, concatenated over spatial cells.

  • Histograms are more robust to position than raw pixels
  • Edge gradients are more distinctive than color for local patches

David Lowe’s excellent performance tuning:

  • Good parameters: 4 orientations, 4 x 4 grid
  • Soft assignment to spatial bins
  • Gaussian weighting over spatial location
  • Reduce the influence of large gradient magnitudes: thresholding + normalization
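A simplified SIFT-style descriptor following this recipe (4 orientations, 4x4 grid, clipping plus renormalization) might look like the NumPy sketch below; it is illustrative only, omitting Lowe's soft assignment and Gaussian spatial weighting:

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_ori=4):
    """Orientation histograms over a grid x grid spatial layout (SIFT-style)."""
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # orientation in [0, 2*pi)
    obin = np.minimum((ori / (2 * np.pi) * n_ori).astype(int), n_ori - 1)
    H, W = patch.shape
    desc = np.zeros((grid, grid, n_ori))
    for i in range(H):                               # hard-assign each pixel's
        for j in range(W):                           # gradient to a cell/bin
            desc[i * grid // H, j * grid // W, obin[i, j]] += mag[i, j]
    desc = desc.ravel()
    # Lowe-style robustness: normalize, clip large values, renormalize.
    desc /= max(np.linalg.norm(desc), 1e-12)
    desc = np.minimum(desc, 0.2)
    return desc / max(np.linalg.norm(desc), 1e-12)

d = sift_like_descriptor(np.random.rand(16, 16))
# 4 x 4 grid x 4 orientations = 64 dimensions
```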

SLIDE 14

Scale-Invariant Feature Transform (SIFT)

David G. Lowe:

  • Object recognition from local scale-invariant features, ICCV 1999
  • Distinctive image features from scale-invariant keypoints, IJCV 2004

SIFT detector: detect maxima and minima of the difference-of-Gaussian (DoG) in scale space. Post-processing: keep corner points but reject low-contrast and edge points.

  • In general object recognition, we may combine multiple detectors (e.g., Harris, Hessian) or use dense sampling for good performance.
  • Following SIFT, many works, including SURF, BRIEF, ORB, and BRISK, have been proposed for faster local feature extraction.
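The DoG detector can be sketched as follows (assuming SciPy; the sigma values and contrast threshold are illustrative, and the edge-point rejection step is omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(img, sigmas=(1.0, 1.6, 2.6, 4.2), thresh=0.02):
    """Scale-space maxima/minima of difference-of-Gaussian, with a contrast test."""
    blurred = np.stack([gaussian_filter(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]                   # DoG stack: (S-1, H, W)
    is_max = maximum_filter(dog, size=3) == dog        # 3x3x3 local maxima
    is_min = minimum_filter(dog, size=3) == dog        # ... and minima
    keep = (is_max | is_min) & (np.abs(dog) > thresh)  # reject low contrast
    keep[0], keep[-1] = False, False                   # need neighbors in scale
    return np.argwhere(keep)                           # rows: (scale, row, col)

# A single bright blob should yield a keypoint near its center.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))
kps = dog_keypoints(blob)
```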

SLIDE 15

Histogram of Local Features and Bag-of-Words Models

SLIDE 16

Histogram of Local Features

Each image is represented as a histogram over codewords: the frequency of each codeword among the image's local features.

dim = #codewords
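The codeword histogram can be sketched as follows (an illustrative NumPy example; in practice the codebook comes from k-means on training descriptors, while here random centers stand in):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword, then count."""
    # Pairwise squared distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)                        # index of nearest codeword
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                     # dim = #codewords

rng = np.random.default_rng(0)
codebook = rng.normal(size=(50, 128))            # stand-in for k-means centers
descs = rng.normal(size=(200, 128))              # local descriptors of one image
h = bow_histogram(descs, codebook)
```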

SLIDE 17

Histogram of Local Features + Spatial Gridding

Compute one codeword histogram per spatial grid cell and concatenate them.

dim = #codewords x #grids

SLIDE 18

Bag of Words Models

SLIDE 19

Bag-of-Words Representation

An object is modeled as a bag of visual “words”, just as a document is modeled as a bag of words in text and NLP. (Slide credit: Fei-Fei Li)

SLIDE 20

Topic Models for Bag-of-Words Representations

  • Unsupervised classification: Sivic et al., ICCV 2005
  • Supervised classification: Fei-Fei et al., CVPR 2005
  • Classification + segmentation: Cao and Fei-Fei, ICCV 2007

SLIDE 21

Pros and Cons of Bag-of-Words Models

Bag-of-words models are good at:

  • Modeling prior knowledge
  • Providing intuitive interpretation

But these models suffer from:

  • Loss of spatial information
  • Loss of information in the quantization into “visual words”

Images differ from texts! We need better coding approaches.
SLIDE 22

Sparse Coding

SLIDE 23

Sparse Coding

  • The naïve histogram uses vector quantization (VQ) as a hard assignment, while sparse coding provides a soft assignment.
  • Sparse coding: an l1 approximation of the l0 norm (sparse solution), i.e., min_a 0.5 ||x - Da||^2 + lambda ||a||_1
  • SC works better with max pooling (while traditional VQ uses average pooling).
  • References: [M. Ranzato et al, CVPR’07], [J. Yang et al, CVPR’09], [J. Wang et al, CVPR’10], [Y. Boureau et al, CVPR’10]
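A minimal sparse-coding sketch using ISTA (iterative soft-thresholding) to solve the l1-relaxed problem; this is illustrative NumPy code, not the solver used in the cited papers, and the dictionary here is random rather than learned:

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ a - x)              # gradient of the smooth term
        a = a - g / L                      # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)             # unit-norm codewords
x = rng.normal(size=64)
a = sparse_code(x, D)
# Soft assignment: a handful of active codewords rather than a single one.
```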

SLIDE 24

Sparse Coding + Spatial Pyramid

Yang et al, Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

Sparse coding + spatial pyramid + linear SVM

SLIDE 25

Efficient Approach

Locality-constrained linear coding (LLC):

  • 1. Find the k nearest codewords to the query descriptor
  • 2. Compute the sparse code using only those k neighbors

Significantly faster than naïve SC, e.g., O(1000a) -> O(5a). For further speedup, we can replace SC with least-squares regression, and the top-k search can also be accelerated.

[J. Wang et al, CVPR’10]; Matlab implementation: http://www.ifp.illinois.edu/~jyang29/LLC.htm
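The two steps above can be sketched with the analytical local least-squares solution (a hedged NumPy sketch in the spirit of LLC, not the authors' Matlab code; the regularization constant is illustrative):

```python
import numpy as np

def llc_code(x, codebook, k=5):
    """Locality-constrained coding: least squares over the k nearest codewords."""
    d2 = ((codebook - x) ** 2).sum(1)
    nn = np.argsort(d2)[:k]                 # step 1: k nearest codewords
    B = codebook[nn] - x                    # shift the local base to the query
    C = B @ B.T + 1e-6 * np.eye(k)          # local covariance (regularized)
    w = np.linalg.solve(C, np.ones(k))      # step 2: solve the small LS system
    w /= w.sum()                            # codes sum to one (shift invariance)
    code = np.zeros(len(codebook))
    code[nn] = w                            # all other coefficients stay zero
    return code

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1000, 128))
x = rng.normal(size=128)
c = llc_code(x, codebook, k=5)
```

Only a k x k system is solved per descriptor, which is where the O(1000a) -> O(5a) speedup comes from.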

SLIDE 26

Sparse Codes Are Not Necessarily Sparse

  • Hard quantization (VQ) gives the sparsest solution: a single nonzero coefficient per descriptor.
  • Sparse coding is less sparse: several coefficients are active.
  • After pooling, the image-level representation is not sparse at all.

Is the success of SC really due to sparsity?

SLIDE 27

Fisher Vector and Super Vector

SLIDE 28

Information Loss

  • Coding with information loss: VQ and sparse coding summarize each descriptor by a scalar coefficient per codeword.
  • Lossless coding: keep, for each codeword, a function of the descriptor rather than a scalar.
  • The significant difference: SC or VQ attaches a scalar to each codeword; lossless coding attaches a function!

SLIDE 29

Lossless Coding as a Mixture of Experts

  • Let’s look at each codeword as a “local expert”: Expert 1, Expert 2, Expert 3, ..., combined by a gating function (e.g., GMM, sparse GMM, harmonic k-means, etc.)
SLIDE 30

Pooling Towards an Image-Level Representation

For each mixture component, pool the gated contributions of all local descriptors, then normalize and concatenate the per-component results.

Both the Fisher vector and the super vector can be written in this form (with different subtraction, normalization, and scaling factors).

Related references:

  • Fisher vector [Perronnin et al, ECCV’10]
  • Super vector [X. Zhou, K. Yu, T. Zhang et al, ECCV’10]
  • HG [X. Zhou et al, ECCV’09]
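A super-vector-style pooling step can be sketched as follows (illustrative NumPy code assuming an isotropic GMM as the gating function; the actual Fisher vector and super vector add normalization terms not shown here):

```python
import numpy as np

def super_vector(X, means, weights):
    """Pool descriptors into per-component aggregated differences (SV-style)."""
    # Soft gating: posterior of each descriptor under each component
    # (isotropic Gaussians; shift by the row minimum for numerical stability).
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    post = np.exp(-0.5 * (d2 - d2.min(1, keepdims=True))) * weights
    post /= post.sum(1, keepdims=True)
    # For each component c: sum_n post[n, c] * (x_n - mu_c), then concatenate.
    sv = np.einsum('nc,ncd->cd', post, X[:, None, :] - means[None, :, :])
    return sv.ravel()                        # dim = C x d

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                # local descriptors of one image
means = rng.normal(size=(16, 8))             # GMM component means (stand-ins)
sv = super_vector(X, means, np.full(16, 1 / 16))
```

Each component contributes a d-dimensional function of the descriptors (the aggregated difference), not a scalar, which is the "lossless" idea above.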

SLIDE 31

Pooling Towards an Image-Level Representation

Big model: the dimension becomes C (#components) x d (#feature dims). For example, with C = 1000 and d = 128, the final dimension is 128K, which is 100+ times longer than that from SC or VQ!

SLIDE 32

Very Long Vector as Feature Representation

We can generate a very long image feature vector as discussed before. The strong feature we used for ImageNet LSVRC 2010:

  • Dense sampling: LBP + HOG, feature dim = 100 (after PCA)
  • GMM with 1024 components
  • 4 spatial grids (1 + 3x1)
  • Dimension of the image feature: 100 x 1024 x 4 = 0.41M

SLIDE 33

How do we train such big models?

SLIDE 34

For Small Datasets: Use the Kernel Trick!

Kernel trick:

  • 10K images => kernel matrix: 10K x 10K ~ 100M entries
  • Computational complexity depends on the size of the kernel matrix, which can be smaller than the feature dimension

We tried nonlinear kernels for face verification and got good results on the LFW dataset: Learning Locally-Adaptive Decision Functions for Person Verification, CVPR’13 (with Z. Li, S. Chang, F. Liang, T. Huang, and J. Smith)

SLIDE 35

For Large Datasets: Use Stochastic Gradient Descent

  • Suppose we are working on ImageNet data using 0.4M-dimensional feature vectors.
  • Total training data: 1.2M x 0.4M ~ 0.5T real values!
    – Too big to load into memory
    – Too many samples to use kernel tricks
  • Solution: Stochastic Gradient Descent (SGD)
    – Idea: estimate the gradient on a randomly picked sample
    – Compared with batch gradient descent, each update uses one sample instead of a full pass over the data
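The comparison with gradient descent (the equation image is missing from this transcript) can be reconstructed as the standard update rules; this is a textbook reconstruction, not necessarily the exact formula on the slide:

```latex
% Batch gradient descent: average the gradient over all N samples per update
w_{t+1} = w_t - \eta_t \, \frac{1}{N} \sum_{i=1}^{N} \nabla_w \ell(w_t; x_i, y_i)

% Stochastic gradient descent: use one randomly picked sample i_t per update
w_{t+1} = w_t - \eta_t \, \nabla_w \ell(w_t; x_{i_t}, y_{i_t})
```

Each SGD step is N times cheaper, and the sampled gradient is an unbiased estimate of the full gradient.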

SLIDE 36

SGD Can Be Very Simple to Implement

A 10-line binary SVM solver by Shai Shalev-Shwartz, using a decreasing learning rate.
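The solver shown on the slide is not preserved in this transcript; below is a hedged Python sketch in the spirit of Pegasos (Shalev-Shwartz et al.), with a 1/(lambda*t) decreasing learning rate; the toy data and hyperparameters are illustrative:

```python
import numpy as np

def sgd_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD for a linear binary SVM (labels in {-1, +1})."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # decreasing learning rate
            w *= (1 - eta * lam)           # gradient of the l2 regularizer
            if y[i] * (w @ X[i]) < 1:      # hinge loss is active
                w += eta * y[i] * X[i]
    return w

# Linearly separable toy data: the label is the sign of the first feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0])
w = sgd_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```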

SLIDE 37

Deep CNN and Related Tech

SLIDE 38

Deep CNN: A Bigger Model

Motivated by the studies of [Krizhevsky et al, NIPS’12] and [Y. LeCun et al, PIEEE’98], the deep convolutional neural network (CNN) became the newest winner of the ImageNet competition. The most popular CNN has:

  • 5 convolutional layers to learn filters
  • 2 fully connected layers
  • 60 million parameters
  • Stochastic gradient descent for training (again)

Why can we train such a big model now (and not in the 1990s)?

  • The rise of big datasets (ImageNet)
  • The blessing of GPU computing
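One convolution -> ReLU -> max-pooling stage of such a network can be sketched in NumPy (illustrative, with random untrained filters; real CNNs use GPU-optimized libraries and learn the filters by SGD):

```python
import numpy as np

def conv2d(x, filters, stride=1):
    """Valid convolution of an (H, W) input with (n, k, k) filters."""
    n, k, _ = filters.shape
    H, W = x.shape
    out_h, out_w = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.empty((n, out_h, out_w))
    for f in range(n):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
                out[f, i, j] = (patch * filters[f]).sum()
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, p=2):
    """Non-overlapping p x p max pooling per channel."""
    n, H, W = x.shape
    x = x[:, :H - H % p, :W - W % p]
    return x.reshape(n, H // p, p, W // p, p).max((2, 4))

# One stage: 8 random 5x5 filters on a 32x32 image -> (8, 14, 14) feature maps.
img = np.random.rand(32, 32)
feat = max_pool(relu(conv2d(img, np.random.randn(8, 5, 5))))
```

Stacking several such stages, followed by fully connected layers, gives the 60M-parameter architecture described above.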

SLIDE 39

Deep Learning Demo

http://smith-gpu.pok.ibm.com:8080/

SLIDE 40

Learning Representation From Big Data

Computer vision researchers have seen a big performance jump on large-scale datasets like ImageNet. Even earlier, researchers in speech/acoustics saw similar success in LVCSR and related tasks. In another field, text/NLP researchers are also moving quickly to large-scale learning: for example, the IBM Watson system used thousands of sub-systems to beat the human players in the Jeopardy! game.

Watson is hiring: www.ibm.com/watsonjobs

In particular, we are looking for winter interns to work on vision + NLP problems. Contact zhou@us.ibm.com

SLIDE 41

Conclusion

SLIDE 42

Conclusion

The mutual evolution of big data and big models:

  • Bigger and bigger models: Histogram -> Sparse coding (10K parameters) -> Super vector / Fisher vector (0.4M parameters) -> Deep CNN (60M parameters)
  • Bigger and bigger datasets: Small (e.g., Caltech101, 8K images) -> Medium (e.g., PASCAL, 10+K) -> Large (e.g., ImageNet, 1.2M)

Motivating questions:

  • How to develop scalable solutions for big data?
  • How to deal with situations with limited labeled data?

Please see the following talks for the answers!