Learning Representations for Visual Object Class Recognition Marcin - PowerPoint PPT Presentation

Introduction Method description Experiments Summary Learning Representations for Visual Object Class Recognition Marcin Marszałek Cordelia Schmid Hedi Harzallah Joost van de Weijer LEAR, INRIA Grenoble, Rhône-Alpes, France October 15th, 2007 Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Bag-of-Features Zhang, Marszałek, Lazebnik and Schmid [IJCV’07] Bag-of-Features (BoF) is an orderless distribution of local image features sampled from an image The representations are compared using χ 2 distance Channels can be combined to improve the accuracy Classification with non-linear Support Vector Machines Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Spatial pyramid Lazebnik, Schmid and Ponce [CVPR’06] Spatial grids allow for locally orderless description They can be viewed as an extension to Bag-of-Features level 0 level 1 level 2 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + � 1/4 � 1/4 � 1/2 They were shown to work on scene category and object class datasets Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Combining kernels Bosch, Zisserman and Munoz [CIVR’07], Varma and Ray [ICCV’07] It was shown that linear kernel combinations can be learned Through extensive search [Bosch’07] By extending the C-SVM objective function [Varma’07] We learn linear distance combinations instead Our approach can still be viewed as learning a kernel We exploit the kernel trick (it’s more than linear combination of kernels) No kernel parameters are set by hand, everything is learned Optimization task is more difficult Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Our approach: large number of channels In our approach images are represented with several BoFs, where each BoF is assigned to a cell of a spatial grid We combine various methods for sampling the image, describing the local content and organizing BoFs spatially With few samplers, descriptors and spatial grids we can generate tens of possible representations that we call “channels” Useful channels can be found on per-class basis by running a multi-goal genetic algorithm Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Overview of the processing chain Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification Image is sampled Regions are locally described with feature vectors Features are quantized (assigned to a vocabulary word) and spatially ordered (assigned to a grid cell) Various channels are combined in the kernel Image is classified with an SVM Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary P ASCAL VOC 2007 challenge Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification bottle car chair dog plant train Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Image sampling Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification Interest points detectors Harris-Laplace — detects corners [Mikołajczyk’04] Laplacian — detects blobs [Lindeberg’98] Dense sampling Multiscale grid with horizontal/vertical step of 6 pixels (half of the SIFT support area width/height) and scaling factor of 1.2 per scale-level Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Local description Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification SIFT — gradient orientation histogram [Lowe’04] SIFT+hue — SIFT with color [van de Weijer’06] SIFT descriptor gradient orientation color descriptor hue circle saturation saturation hue hue PAS — edgel histogram [Ferrari’06] Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Spatial organization Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification Visual vocabulary is created by clustering the features using k-means ( k = 4000) Spatial grids allow to separately describe the properties of roughly defined image regions 1x1 — standard Bag-of-Features 2x2 — defines four image quarters horizontal 3x1 — defines upper, middle and lower regions Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Support Vector Machines Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification We use non-linear Support Vector Machines The decision function has the following form � g ( x ) = i α i y i K ( x i , x ) − b We propose a multichannel extended Gaussian kernel � � � K ( x j , x k ) = exp − γ ch D ch ( x j , x k ) ch D ch ( x j , x k ) is a similarity measure ( χ 2 distance in our setup) for channel ch Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Support Vector Machines Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification We use non-linear Support Vector Machines The decision function has the following form � g ( x ) = i α i y i K ( x i , x ) − b We propose a multichannel extended Gaussian kernel � � � K ( x j , x k ) = exp − γ ch D ch ( x j , x k ) ch D ch ( x j , x k ) is a similarity measure ( χ 2 distance in our setup) for channel ch Problem: How to set each γ ch ? Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Weighting the channels Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification If we set γ ch to 1 / D ch we almost obtain (up to channels normalization) the method of Zhang et al. This approach demonstrated remarkable performance in both VOC’05 and VOC’06 We submit this approach as the “flat” method As γ ch controls the weight of channel ch in the sum, it can be used to select the most useful channels We run a genetic algorithm to optimize per-task γ ch , t kernel parameters and also C t SVM parameter The learned channel weights are used for the “genetic” submission Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Genetic algorithm to optimize SVM parameters Image → Sampler × Local descriptor × Spatial grid ⇒ Fusion → Classification The genoms encode the optimized parameters In every iteration (generation) Random genoms are added to the pool (population) 1 Cross-validation is used to evaluate the genoms 2 (individuals) simultaneously for each class The more useful the genom is the more chance it has to be 3 selected and combined with another good genom Information from combined genoms is randomly mixed 4 (crossed) and forms the next generation To better avoid local minimas, random genes are altered 5 (mutated) Useful genes and gene combinations survive and multiply Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Introduction Method description Experiments Summary Multiplying channels Channels ( γ ch = 1 / D ch ) # Average AP HS,LS × SIFT × 1,2x2 4 47.7 HS,LS,DS × SIFT × 1,2x2 6 52.6 HS,LS,DS × SIFT × 1,2x2,h3x1 9 53.3 HS,LS,DS × SIFT,SIFT+hue × 1,2x2,h3x1 18 54.0 HS,LS,DS × SIFT,SIFT+hue,PAS × 1,2x2,h3x1 21 54.2 DS × SIFT,SIFT+hue,PAS × 1,2x2,h3x1 9 51.8 Table: Class-averaged AP on VOC’07 validation set Combination of interest points and dense sampling boosts the performance, color and 3x1 grid are important The performance monotonically increases with the number of channels Last experiments show, that anything sensible (HoGs, different vocabularies) further helps the performance Learning Representations for Visual Object Class Recognition M. Marszałek, C. Schmid

Learning Representations for Visual Object Class Recognition Marcin - PowerPoint PPT Presentation

Introduction Method description Experiments Summary Learning Representations for Visual Object Class Recognition Marcin Marszaek Cordelia Schmid Hedi Harzallah Joost van de Weijer LEAR, INRIA Grenoble, Rhne-Alpes, France October 15th,

Object Oriented Object 3 Programming Object 1 Object 2 Object 4 For : COP 3330. Object

Rich representations for Rich representations for learning visual recognition learning visual

61A Lecture 16 Announcements String Representations String Representations 4 String

Visual Representations of Newspaper Information 2 | 25 | Visual Representations of

Inheritance II Is-a versus has-a When an object of class A has a n object of class B, use

Biovision team 2 Retina Visual cortex 3 Retina Visual cortex 3 Retina Visual cortex 3

CS6501: Deep Learning for Visual Recognition Object Detection: RCNN, Fast-RCNN, Faster-RCNN

Object-Oriented Databases Object Oriented Databases ODMG Standard Object Model, Object

Object oriented Object oriented Object oriented Object oriented approach and UML approach and

3/14/16 Review Class/Object Type Class Keyword class class Point

CHRONIC CHRONIC VISUAL LOSS VISUAL LOSS Wasu Supakornthanasarn, MD. Visual loss Sensory

A Model of Visual Imagery A Model of Visual Imagery John Abbondanza, OD, FCOVD John Abbondanza,

Overview Overview Visual displays Visual displays Visual and tactile displays Visual and

Image Processing Todays Class Image Representations: Matrices Image Representations: RGB,

CSE 331 The Object class; Object equality and the equals method slides created by Marty Stepp

CS 221 Review Mason Vail Inheritance (1) Every class - except the Object class - directly

Interactive Image Mining Annie Morin 1 , Nguyen-Khang Pham 1,2 1 TEXMEX/IRISA 2 Cantho

3D Vision Viktor Larsson Spring 2019 Schedule Feb 18 Introduction Feb 25 Geometry, Camera

E9 205 Machine Learning for Signal Processing 23-8-17 Outline Basics for Image Processing

ImageProof: Enabling Authentication for Large-Scale Image Retrieval Shangwei Guo 1 Jianliang Xu 1

Texture and materials Subhransu Maji CMPSCI 670: Computer Vision December 1, 2016 CMPSCI 670

image matching presented by Dmytro Mishkin joint work with Anastasia Mishchuk, Milan Pultar,

For Forensics Sake What to do when IR Strikes By : Joe Gumke Joe Gumke Twitter : @joegumke

B i n a r y h e a p s ( c h a p t e r s 2 0 . 3 2 0 . 5 ) L e f