

Visual Categorization With Bags of Keypoints
G. Csurka, C. Bray, C. Dance, and L. Fan. ECCV, 2004.
Presented by Shilpa Gulati, 2/15/2007

Basic Problem Addressed
- Find a method for Generic Visual Categorization.
- Visual Categorization: identifying whether objects of one or more types are present in an image.
- Generic: the method generalizes to new object types and is invariant to scale, rotation, affine transformation, lighting changes, occlusion, intra-class variations, etc.

Main Idea
- Apply the bag-of-keywords approach from text categorization to visual categorization.
- Construct a vocabulary of feature vectors from the clustered descriptors of images.

The Approach I: Training
- Extract interest points from a dataset of training images and attach descriptors to them.
- Cluster the keypoints and construct a set of vocabularies (why a set? next slide).
- Train a multi-class classifier using bags of keypoints built around the cluster centers.

Why a set of vocabularies?
- The approach is motivated by text categorization (spam filtering, for example).
- For text, the keywords have a clear meaning (Lottery! Deal! Affine Invariance), so finding a vocabulary is easy.
- For images, keypoints don't necessarily have repeatable meanings.
- Hence find a set of vocabularies, then experiment to find the best vocabulary and classifier.

The Approach II: Testing
- Given a new image, get its keypoint descriptors.
- Label each keypoint with its closest cluster center in feature space.
- Categorize the objects using the multi-class classifier learnt earlier: Naïve Bayes or Support Vector Machines (SVMs).
(A code sketch of both phases follows.)
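The two phases above reduce to a small amount of code once descriptors are available. Below is a minimal Python sketch, assuming the interest-point/SIFT stage (described on the next page) has already produced one descriptor array per image; the function names and the use of scikit-learn's KMeans are my own choices, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, k=1000):
    """Cluster the pooled 128-d descriptors of all training images into
    k centers V1..Vk; the centers are the visual vocabulary."""
    pooled = np.vstack(descriptor_sets)              # (total_keypoints, 128)
    return KMeans(n_clusters=k, n_init=10).fit(pooled).cluster_centers_

def bag_of_keypoints(descriptors, vocabulary):
    """Label each keypoint with its closest cluster center in feature
    space, then count the occurrences per center (the 'bag')."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                      # index of the closest Vj
    return np.bincount(nearest, minlength=len(vocabulary))

# Training: the bags of the labeled images feed a multi-class classifier
# (Naive Bayes or SVMs, sketched later). Testing: compute the query
# image's bag the same way and ask the classifier for a category.
```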

Feature Extraction and Description
- From a database of images, extract interest points using the Harris affine detector.
- It was shown in Mikolajczyk and Schmid (2002) that scale-invariant interest point detectors are not sufficient to handle affine transformations.
- Attach SIFT descriptors to the interest points. A SIFT descriptor is a 128-dimensional vector.
- SIFT descriptors were found to be best for matching in Mikolajczyk and Schmid (2003).

Visual Vocabulary Construction (slide inspired by [3])
- Use a k-means clustering algorithm to form a set of clusters V = {V1, V2, ..., Vm} of feature vectors.
- The feature vectors associated with the cluster centers (V1..Vm) form a vocabulary.
- Construct multiple vocabularies: find multiple sets of clusters using different values of k (see the sketch after this page).

Clustering Example (slide inspired by [3]; image taken from [2])
- Extract keypoint descriptors from a set of labeled images.
- Put each descriptor in the cluster, or "bag", with minimum distance from the cluster center.
- Count the number of keypoints in each bag.
- If a feature in image I is nearest to cluster center Vj, we say that keypoint j has occurred in image I.

Categorization by Naïve Bayes I: Training (slide inspired by [3])
- From the training images of category Ci, collect the counts n_i1, n_i2, ..., n_im: n_ij is the total number of times a feature "near" Vj occurs in training images of category i.

Categorization by Naïve Bayes II: Training (slide inspired by [3])
- P(Ci) = number of images of category Ci / total number of images.
- Over all images I of category Ci, for each keypoint Vj: P(Vj | Ci) = number of keypoints Vj in I / total number of keypoints in I = n_ij / n_i.
- But use Laplace smoothing to avoid numbers near zero: P(Vj | Ci) = (n_ij + 1) / (n_i + |V|).

Categorization by Naïve Bayes III: Testing (slide inspired by [3])
- P(Ci | Image) = β P(Ci) P(Image | Ci) = β P(Ci) P(V1, ..., Vm | Ci) = β P(Ci) ∏_j P(Vj | Ci), the product running over the vocabulary entries V1..Vm, with β a normalization constant.
(A Naïve Bayes sketch follows these slides.)
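Because keypoints lack the clear semantics of text keywords, the paper builds several candidate vocabularies rather than one. A short sketch of that step, reusing scikit-learn's KMeans from the previous sketch; the candidate k values here are illustrative only.

```python
from sklearn.cluster import KMeans

def build_vocabularies(pooled_descriptors, ks=(200, 500, 1000, 2500)):
    """One k-means vocabulary per candidate k. Downstream, each vocabulary
    is paired with each classifier, and the combination with the lowest
    validation error wins (k = 1000 in the paper's experiments)."""
    return {k: KMeans(n_clusters=k, n_init=5).fit(pooled_descriptors)
                                             .cluster_centers_
            for k in ks}
```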

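The three Naïve Bayes slides above translate directly into NumPy. This sketch works in log space for numerical stability and applies the Laplace-smoothed estimate P(Vj | Ci) = (n_ij + 1) / (n_i + |V|); the function and variable names are my own.

```python
import numpy as np

def train_naive_bayes(bags, labels, num_classes):
    """bags: (num_images, vocab_size) keypoint counts per training image;
    labels: (num_images,) integer categories. Assumes every category
    appears at least once in the training set."""
    labels = np.asarray(labels)
    num_images, vocab_size = bags.shape
    log_prior = np.log(np.bincount(labels, minlength=num_classes) / num_images)
    log_lik = np.empty((num_classes, vocab_size))
    for c in range(num_classes):
        n_ij = bags[labels == c].sum(axis=0)   # occurrences of each Vj in class c
        n_i = n_ij.sum()                       # total keypoints seen for class c
        # Laplace smoothing: P(Vj | Ci) = (n_ij + 1) / (n_i + |V|)
        log_lik[c] = np.log((n_ij + 1) / (n_i + vocab_size))
    return log_prior, log_lik

def classify_naive_bayes(bag, log_prior, log_lik):
    # log P(Ci | image) = const + log P(Ci) + sum_j count_j * log P(Vj | Ci)
    return int(np.argmax(log_prior + log_lik @ bag))
```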
SVM: Brief Introduction
- An SVM classifier finds a hyperplane that separates two-class data with maximum margin.
- The data instances closest to the hyperplane are called support vectors.
[Figure: a two-class dataset with linearly separable classes; the maximum-margin hyperplane gives the greatest separation between the classes; f(x) is the target (classifying) function.]

Categorization by SVM I: Training
- The classifying function is f(x) = sign(Σ_i y_i β_i K(x, x_i) + b).
- x_i is a feature vector from the training images; y_i is the label for x_i (yes, in category Ci, or no, not in Ci); β_i and b have to be learnt.
- Data is not always linearly separable (non-linear SVM): a function Φ maps the original data space to a higher-dimensional space, and the kernel is K(x, x_i) = Φ(x) · Φ(x_i).

Categorization by SVM II: Training
- For an image of category Ci, x_i is the vector formed by the number of occurrences of keypoints V in the image.
- The parameters are sometimes learnt using Sequential Quadratic Programming; the approach used in the paper is not mentioned.
- For the m-class problem, the authors train m SVMs, each distinguishing some category Ci from the other m - 1.

Categorization by SVM III: Testing
- Given a query image, assign it to the category with the highest SVM output (see the decision-function sketch after this page).

Experiments
- Two databases.
- DB1: in-house, 1779 images, 7 object classes: faces, buildings, trees, cars, phones, bikes, books. Some images contain objects from multiple classes, but a large proportion of each image is occupied by the target object.
- DB2: freely available from various sites, about 3500 images, 5 object classes: faces (frontal), airplanes (side), cars (rear), cars (side) and motorbikes (side).

Performance Metrics
- Confusion Matrix, M: m_ij = number of images from category j identified by the classifier as category i.
- Overall Error Rate, R: accuracy = total number of correctly classified test images / total number of test images; R = 1 - accuracy.
- Mean Rank, MR: MR for category j = E[rank of class j in the classified output | true class is j].
(A sketch of these metrics follows the SVM sketch.)
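To make the classifying function concrete, here is an illustrative NumPy sketch that evaluates f(x) = sign(Σ_i y_i β_i K(x, x_i) + b) for already-learnt parameters, using the linear kernel that performed best overall, plus the highest-output testing rule. How the authors actually solved for β_i and b is, as noted above, unspecified, so only the decision side is shown; the names are my own.

```python
import numpy as np

def svm_decision(x, support_vectors, y, beta, b):
    """f(x) = sign(sum_i y_i * beta_i * K(x, x_i) + b) with a linear
    kernel K(x, x_i) = x . x_i; labels y_i are in {+1, -1}."""
    score = sum(y_i * b_i * np.dot(x, x_i)
                for x_i, y_i, b_i in zip(support_vectors, y, beta)) + b
    return np.sign(score), score

def classify_one_vs_rest(x, svms):
    """svms: one (support_vectors, y, beta, b) tuple per category.
    Assign the query image's bag x to the category whose SVM output
    is highest, as on the testing slide."""
    scores = [svm_decision(x, *params)[1] for params in svms]
    return int(np.argmax(scores))
```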

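The three metrics on the previous page also reduce to a few lines of NumPy. In this sketch, mean_rank assumes the classifier exposes a score for every category on every test image; the variable names are my own.

```python
import numpy as np

def confusion_matrix(true, pred, num_classes):
    """m_ij = number of images from category j classified as category i."""
    M = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true, pred):
        M[p, t] += 1
    return M

def overall_error_rate(M):
    return 1.0 - np.trace(M) / M.sum()                   # R = 1 - accuracy

def mean_rank(scores, true):
    """scores: (num_images, num_classes) classifier outputs; rank 1 is
    the highest-scoring class. MR for class j averages the rank of the
    true class over the test images whose true class is j."""
    true = np.asarray(true)
    order = np.argsort(-scores, axis=1)                  # classes, best first
    ranks = np.argmax(order == true[:, None], axis=1) + 1
    return np.array([ranks[true == j].mean() for j in range(scores.shape[1])])
```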
Finding the Value of k
- Error rate decreases with increasing k.
- The decrease is small after k > 1000.
- Choose k = 1000: a good tradeoff between accuracy and speed.
[Graph of error rate vs. k for Naïve Bayes on DB1, with the selected operating point at k = 1000; graph taken from [2].]

Naïve Bayes Results for DB1
[Confusion matrix for Naïve Bayes on DB1, over the true categories faces, buildings, trees, phones, cars, bikes and books; table taken from [2].]
- Mean rank per true category: faces 1.49, buildings 1.88, trees 1.33, phones 1.33, cars 1.63, bikes 1.57, books 1.57.
- Overall error rate = 28%.

SVM Results
- The linear SVM gives the best results of the linear, quadratic and cubic kernels, except for cars; the quadratic kernel gives the best results on cars.
- How do we know these will work for other categories? What if we have to use higher degrees? Only time and more experiments will tell.

SVM Results for DB1
[Confusion matrix for SVM on DB1; table taken from [2].]
- Mean rank per true category: faces 1.04, buildings 1.77, trees 1.28, cars 1.30, phones 1.83, bikes 1.09, books 1.39.
- Error rate for faces = 2%, but with an increased rate of confusion between faces and the other categories, due to the larger number of faces in the training set.
- Overall error rate = 15%.

Multiple Object Instances: Correctly Classified
[Images taken from [2].]

Partially Visible Objects: Correctly Classified
[Images taken from [2].]

Images with Multi-Category Objects
[Images taken from [2].]

Conclusions
- Good results on the 7-category database.
- However, time information (for training and testing) is not provided!
- SVMs are superior to Naïve Bayes.
- The method is robust to background clutter.
- An extension is to test on databases where the target object does NOT form a large fraction of the image.
- It may be necessary to include geometric information.

SVM Results on DB2
[Confusion matrix for SVM on DB2; table taken from [2].]
- Correct-classification rates (diagonal entries, %): faces (frontal) 94, airplanes (side) 96.3, cars (rear) 97.7, cars (side) 99.6, motorbikes (side) 92.7; every off-diagonal confusion is below 3%.
- Mean rank per true category: faces (frontal) 1.07, airplanes (side) 1.04, cars (rear) 1.03, cars (side) 1.01, motorbikes (side) 1.09.

References
1. G. Csurka, C. Bray, C. Dance, and L. Fan. Visual Categorization with Bags of Keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, 2004.
2. G. Csurka, J. Willamowski, and C. Dance. Weak Geometry for Visual Categorization. Presentation slides, Xerox Research Centre Europe, Grenoble, France.
3. R. Mooney. CS 391L: Machine Learning - Text Categorization. Lecture slides, Computer Science Department, University of Texas at Austin.
