Category-level localization - Cordelia Schmid - PowerPoint PPT Presentation


  1. Category-level localization. Cordelia Schmid

  2. Recognition
   • Classification
     – Object present/absent in an image
     – Often presence of a significant amount of background clutter
   • Localization / Detection
     – Localize object within the frame
     – Bounding box or pixel-level segmentation

  3. Pixel-level object classification

  4. Difficulties
   • Intra-class variations
   • Scale and viewpoint change
   • Multiple aspects of categories

  5. Approaches
   • Intra-class variation => model the variations, mainly by learning from a large dataset, for example with SVMs
   • Scale + limited viewpoint changes => multi-scale approach or invariant local features
   • Multiple aspects of categories => separate detectors for each aspect (front/profile face), or build an approximate 3D “category” model

  6. Outline
   1. Sliding window detectors
   2. Features and adding spatial information
   3. Histogram of Oriented Gradients (HOG)
   4. State-of-the-art algorithms and PASCAL VOC

  7. Sliding window detector
   • Basic component: a binary car/non-car classifier that answers “yes, a car” or “no, not a car” for each window

  8. Sliding window detector
   • Detect objects in clutter by search with the car/non-car classifier
   • Sliding window: exhaustive search over position and scale

  9. Sliding window detector
   • Detect objects in clutter by search with the car/non-car classifier
   • Sliding window: exhaustive search over position and scale

  10. Detection by Classification
   • Detect objects in clutter by search with the car/non-car classifier
   • Sliding window: exhaustive search over position and scale (can use the same size window over a spatial pyramid of images)
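The exhaustive search over position and scale can be sketched as follows; the window size, stride, and scale set are illustrative choices, not values from the lecture:

```python
import numpy as np

def sliding_windows(image, win_h=128, win_w=64, stride=8,
                    scales=(1.0, 0.8, 0.64)):
    """Enumerate every window over position and scale.

    Returns (scale, y, x) for each window in the rescaled image; a real
    detector would crop each window and run the binary classifier on it.
    """
    windows = []
    for s in scales:
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                windows.append((s, y, x))
    return windows

img = np.zeros((240, 320))
wins = sliding_windows(img)
```

Even this modest 320×240 example yields hundreds of windows; finer strides and denser scale steps push the count toward the ~100,000 mentioned later in the lecture.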

  11. Classification vs. Detection: does the image contain a car?
   • Classification: unknown location + clutter => lots of invariance
   • Detection: uncluttered, normalized image => more “detail”

  12. Window (image) classification
   • Pipeline: training data → feature extraction → classifier → car/non-car
   • Features usually engineered
   • Classifier learnt from data

  13. Problems with sliding windows …
   • aspect ratio
   • granularity (finite grid)
   • partial occlusion
   • multiple responses
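The “multiple responses” problem is commonly handled by non-maximum suppression, which the lecture does not spell out here; a minimal greedy sketch, with an assumed IoU threshold:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes that overlap it too much, repeat on the remainder.
    Boxes are (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, float)
    order = np.argsort(-np.asarray(scores))   # best score first
    keep = []
    while len(order):
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0]) *
                 (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]       # suppress heavy overlaps
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
keep = nms(boxes, scores=[0.9, 0.8, 0.7])
```

Here the second box overlaps the first too much and is suppressed, while the distant third box survives.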

  14. Outline
   1. Sliding window detectors
   2. Features and adding spatial information
   3. Histogram of Oriented Gradients (HOG)
   4. State-of-the-art algorithms and PASCAL VOC

  15. BoW + spatial pyramids
   Start from BoW for a region of interest (ROI):
   • no spatial information recorded
   • sliding window detector
   Bag of words → feature vector

  16. Adding spatial information to bag of words
   • Concatenate the BoW histograms of sub-regions into one feature vector
   • Keeps a fixed-length feature vector for a window

  17. Spatial pyramid – represent correspondence: 1 BoW + 4 BoW + 16 BoW (histograms over successively finer grids, concatenated)
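The 1 + 4 + 16 BoW pyramid can be sketched as follows; the grid levels and vocabulary size are illustrative, and the input is assumed to already be a map of quantized visual-word indices:

```python
import numpy as np

def spatial_pyramid_bow(word_map, vocab_size, levels=(1, 2, 4)):
    """Concatenate BoW histograms over a 1x1, 2x2, and 4x4 grid.

    word_map: 2-D array of visual-word indices for one window.
    Keeps a fixed-length vector: (1 + 4 + 16) * vocab_size dimensions.
    """
    h, w = word_map.shape
    parts = []
    for g in levels:
        for i in range(g):
            for j in range(g):
                cell = word_map[i * h // g:(i + 1) * h // g,
                                j * w // g:(j + 1) * w // g]
                hist = np.bincount(cell.ravel(), minlength=vocab_size)
                parts.append(hist / max(cell.size, 1))  # per-cell normalization
    return np.concatenate(parts)

words = np.random.randint(0, 100, size=(64, 64))
vec = spatial_pyramid_bow(words, vocab_size=100)
```

The fixed length is the point: any window, at any position or scale, maps to a 21 × vocab_size vector that a single classifier can consume.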

  18. Dense visual words
   • Why extract only sparse image fragments?
   • Good where lots of invariance is needed, but not relevant to sliding window detection?
   • Extract dense visual words on an overlapping grid: patch / SIFT → quantize → word
   • More “detail” at the expense of invariance

  19. Outline
   1. Sliding window detectors
   2. Features and adding spatial information
   3. Histogram of Oriented Gradients + linear SVM classifier
   4. State-of-the-art algorithms and PASCAL VOC

  20. Feature: Histogram of Oriented Gradients (HOG)
   • tile the 64 × 128 pixel window into 8 × 8 pixel cells
   • each cell represented by a histogram (frequency vs. orientation) over 8 orientation bins (i.e. angles in range 0–180 degrees)
   [figure: image with dominant gradient direction, and the corresponding HOG]

  21. Histogram of Oriented Gradients (HOG) continued
   • Adds a second level of overlapping spatial bins, re-normalizing the orientation histograms over a larger spatial area
   • Feature vector dimension (approx) = 16 × 8 (for tiling) × 8 (orientations) × 4 (for blocks) = 4096
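The first stage of HOG, per-cell orientation histograms over the 64 × 128 window, can be sketched as below; block normalization (the second level described above) is omitted, and the function and parameter names are illustrative:

```python
import numpy as np

def hog_cells(gray, cell=8, bins=8):
    """Per-cell gradient-orientation histograms, the first stage of HOG.

    Gradients are binned into `bins` orientations over 0-180 degrees,
    weighted by gradient magnitude. Overlapping-block re-normalization
    is omitted from this sketch.
    """
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    h, w = gray.shape
    ch, cw = h // cell, w // cell
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist

window = np.random.rand(128, 64)   # the standard 64 x 128 detection window
H = hog_cells(window)
```

This yields the 16 × 8 grid of 8-bin histograms that the slide's dimension count starts from, before the ×4 block factor.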

  22. Window (image) classification
   • Pipeline: training data → feature extraction → classifier → pedestrian/non-pedestrian
   • HOG features
   • Linear SVM classifier
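A minimal stand-in for the linear SVM on HOG features, trained with Pegasos-style SGD on the hinge loss; the lecture does not specify a solver, so the training routine, hyperparameters, and toy data here are all illustrative:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Pegasos-style SGD on the regularized hinge loss.

    Labels y must be in {-1, +1}; returns (w, b) for f(x) = w.T x + b.
    """
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (w @ X[i] + b) < 1:           # margin violated
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                    # only shrink w
                w = (1 - lr * lam) * w
    return w, b

# toy separable "feature vectors": positives shifted from negatives
np.random.seed(1)
X = np.vstack([np.random.randn(50, 4) + 2.0,
               np.random.randn(50, 4) - 2.0])
y = np.array([1] * 50 + [-1] * 50)
w, b = train_linear_svm(X, y)
```

The learned (w, b) is exactly the f(x) = w^T x + b scored on each window at detection time.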

  23. Averaged examples

  24. Dalal and Triggs, CVPR 2005

  25. Learned model: average over positive training data, f(x) = w^T x + b

  26. Training a sliding window detector
   • Unlike training an image classifier, there are a (virtually) infinite number of possible negative windows
   • Training (learning) generally proceeds in three distinct stages:
     1. Bootstrapping: learn an initial window classifier from positives and random negatives
     2. Hard negatives: use the initial window classifier for detection on the training images (inference) and identify false positives with a high score
     3. Retraining: use the hard negatives as additional training data

  27. Training a sliding window detector
   • Object detection is inherently asymmetric: much more “non-object” than “object” data
   • Classifier needs to have a very low false positive rate
   • Non-object category is very complex – need lots of data

  28. Bootstrapping
   1. Pick negative training set at random
   2. Train classifier
   3. Run on training data
   4. Add false positives to training set
   5. Repeat from 2
   • Collect a finite but diverse set of non-object windows
   • Force classifier to concentrate on hard negative examples
   • For some classifiers can ensure equivalence to training on the entire data set
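The five-step bootstrapping loop can be sketched as follows; `train_fn`, `score_fn`, the toy nearest-centroid classifier, and all parameters are illustrative stand-ins, not the lecture's implementation:

```python
import numpy as np

def bootstrap(train_fn, score_fn, positives, negative_pool,
              rounds=3, init_neg=100, thresh=0.0):
    """Hard-negative bootstrapping: random negatives, train, run on the
    training data, add high-scoring false positives, retrain."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(negative_pool), size=init_neg, replace=False)
    negatives = negative_pool[idx]                # step 1: random negatives
    model = train_fn(positives, negatives)        # step 2: train
    for _ in range(rounds):                       # step 5: repeat
        scores = score_fn(model, negative_pool)   # step 3: run on training data
        hard = negative_pool[scores > thresh]     # false positives
        if len(hard) == 0:
            break
        negatives = np.vstack([negatives, hard])  # step 4: add to training set
        model = train_fn(positives, negatives)    # retrain
    return model

# toy nearest-centroid "classifier" standing in for the real one
train_fn = lambda P, N: P.mean(axis=0) - N.mean(axis=0)
score_fn = lambda m, X: X @ m
np.random.seed(0)
P = np.random.randn(40, 3) + 2.0
pool = np.random.randn(500, 3)
model = bootstrap(train_fn, score_fn, P, pool)
```

The loop keeps the negative set finite while steering training toward exactly the non-object windows the current model confuses with objects.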

  29. Example: train an upper body detector
   • Training data – used for training and validation sets
     – 33 Hollywood2 training movies
     – 1122 frames with upper bodies marked
   • First stage training (bootstrapping)
     – 1607 upper body annotations, jittered to 32k positive samples
     – 55k negatives sampled from the same set of frames
   • Second stage training (retraining)
     – 150k hard negatives found in the training data

  30. Training data – positive annotations

  31. Positive windows Note: common size and alignment

  32. Jittered positives

  33. Jittered positives

  34. Random negatives

  35. Random negatives

  36. Window (image) first stage classification
   • Pipeline: jittered positives + random negatives → HOG feature extraction → linear SVM classifier, f(x) = w^T x + b
   • find high-scoring false positive detections
   • these are the hard negatives for the next round of training
   • cost = # training images × inference on each image

  37. Hard negatives

  38. Hard negatives

  39. First stage performance on validation set

  40. Precision – Recall curve
   • Precision: % of returned windows that are correct
   • Recall: % of correct windows that are returned
   [plot: precision vs. recall, with classifier score decreasing along the curve]
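The curve is traced by sweeping the classifier score threshold from high to low; a sketch of computing precision and recall at every threshold (labels are 1 for correct windows, 0 otherwise):

```python
import numpy as np

def precision_recall(scores, labels):
    """Precision and recall at each score threshold, best score first."""
    order = np.argsort(-scores)              # decreasing classifier score
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                   # correct windows returned so far
    fp = np.cumsum(1 - labels)               # incorrect windows returned so far
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
labels = np.array([1, 1, 0, 1, 0])
p, r = precision_recall(scores, labels)
```

As the threshold drops, recall can only grow while precision typically falls, which is the downward-sloping shape on the slide.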

  41. Effects of retraining

  42. Side by side before retraining after retraining

  43. Side by side before retraining after retraining

  44. Accelerating sliding window search
   • Sliding window search is slow because so many windows are needed, e.g. x × y × scale ≈ 100,000 for a 320×240 image
   • Most windows are clearly not the object class of interest
   • Can we speed up the search?

  45. Cascaded classification
   • Build a sequence of classifiers with increasing complexity: more complex, slower, lower false positive rate
   • Window → Classifier 1 → Classifier 2 → … → Classifier N → possibly a face; each stage rejects non-faces
   • Reject easy non-objects using simpler and faster classifiers
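The cascade's early-reject control flow can be sketched as below; the stage classifiers and thresholds are toy stand-ins, not real face detectors:

```python
import numpy as np

def cascade(window, stages):
    """Run increasingly complex classifiers in sequence; reject the
    window as soon as any stage scores it below its threshold."""
    for clf, thresh in stages:
        if clf(window) < thresh:
            return False      # rejected early: cheap for easy non-faces
    return True               # survived every stage: possibly a face

# toy stages: a cheap mean test first, a "slower" variance test second
stages = [(lambda w: w.mean(), 0.3),
          (lambda w: w.var(), 0.01)]
face_like = np.array([0.1, 0.9, 0.2, 0.8])
flat = np.zeros(4)
```

Since most of the ~100,000 windows fail an early cheap stage, the expensive later stages run on only a small fraction of them.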
