Category-level localization
Cordelia Schmid

Recognition
• Classification
  – Object present/absent in an image
  – Often presence of a significant amount of background clutter
• Localization / Detection
  – Localize object within the frame
  – Bounding box or pixel-level segmentation

Pixel-level object classification

Difficulties
• Intra-class variations
• Scale and viewpoint change
• Multiple aspects of categories

Approaches
• Intra-class variation
  => Modeling of the variations, mainly by learning from a large dataset, for example by SVMs
• Scale + limited viewpoint changes
  => multi-scale approach or invariant local features
• Multiple aspects of categories
  => separate detectors for each aspect (e.g. front/profile face), or build an approximate 3D "category" model

Outline
1. Sliding window detectors
2. Features and adding spatial information
3. Histogram of Oriented Gradients (HOG)
4. State-of-the-art algorithms and PASCAL VOC

Sliding window detector
• Basic component: binary classifier
[Figure: car/non-car classifier answering "Yes, a car" or "No, not a car"]

Sliding window detector
• Detect objects in clutter by search
[Figure: car/non-car classifier]
• Sliding window: exhaustive search over position and scale

Detection by Classification
• Detect objects in clutter by search
[Figure: car/non-car classifier]
• Sliding window: exhaustive search over position and scale (can use same size window over a spatial pyramid of images)
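The exhaustive search over position and scale can be sketched as a generator that keeps the window size fixed and repeatedly downscales the image, as the slide suggests. This is a minimal sketch: `sliding_windows` is a hypothetical helper, and the 64×128 window, stride of 8 pixels and scale step of 1.2 are illustrative assumptions, not values given on this slide.

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Yield (x, y, scale) for every window position over an image pyramid.

    The window size is fixed; the image is downscaled by `scale_step` at each
    pyramid level. (x, y) are in the rescaled image's coordinates; multiply
    by `scale` to map a window back to the original image.
    """
    scale = 1.0
    while True:
        w = int(img_w / scale)
        h = int(img_h / scale)
        if w < win_w or h < win_h:
            break  # the window no longer fits in the downscaled image
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield x, y, scale
        scale *= scale_step


# Each yielded window would be cropped, turned into a feature vector and
# passed to the binary classifier.
n_windows = sum(1 for _ in sliding_windows(320, 240))
```

A smaller stride or scale step makes the grid finer (less "granularity" error) but increases the number of windows the classifier must score.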

Feature Extraction: Classification vs. Detection
[Figure: "Does the image contain a car?"]
• Classification: unknown location + clutter ⇒ lots of invariance
• Detection: uncluttered, normalized image ⇒ more "detail"

Window (Image) Classification
[Figure: Training Data → Feature Extraction → Classifier → Car/Non-car]
• Features usually engineered
• Classifier learnt from data

Problems with sliding windows …
• aspect ratio
• granularity (finite grid)
• partial occlusion
• multiple responses
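The slide does not name a remedy for multiple responses; one widely used technique is greedy non-maximum suppression (NMS), sketched below as an illustration rather than as the method of these slides. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 overlap threshold is an assumed default.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression.

    Visit boxes from highest to lowest score and keep a box only if it does
    not overlap any already-kept box by more than `thresh` IoU.
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep
```

Two heavily overlapping detections of the same object collapse to the higher-scoring one, while a detection elsewhere in the image survives.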

Outline
1. Sliding window detectors
2. Features and adding spatial information
3. Histogram of Oriented Gradients (HOG)
4. State-of-the-art algorithms and PASCAL VOC

BOW + Spatial pyramids
Start from BoW for region of interest (ROI)
• no spatial information recorded
• sliding window detector
[Figure: Bag of Words → Feature Vector]

Adding Spatial Information to Bag of Words
[Figure: Bag of Words histograms → Concatenate → Feature Vector]
Keeps fixed length feature vector for a window

Spatial Pyramid – represent correspondence
[Figure: pyramid levels with 1 BoW, 4 BoW and 16 BoW]
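The 1 + 4 + 16 pyramid can be sketched as follows: each visual word in the window is assigned to a cell at every level, one BoW histogram is built per cell, and all histograms are concatenated. The vector length is 21 × vocabulary size no matter how many features fall in the window, which is the fixed-length property noted above. This sketch assumes features arrive as hypothetical `(x, y, word_index)` triples and concatenates levels unweighted, as the slide describes; level weighting is left out.

```python
def bow_histogram(words, vocab_size):
    """Count how often each visual word index occurs."""
    hist = [0] * vocab_size
    for w in words:
        hist[w] += 1
    return hist


def spatial_pyramid(features, width, height, vocab_size, levels=(1, 2, 4)):
    """features: list of (x, y, word) inside a width x height window.

    For each level L, split the window into an L x L grid, build one BoW
    histogram per cell and concatenate them all: 1 + 4 + 16 = 21 histograms.
    """
    vector = []
    for L in levels:
        cells = [[] for _ in range(L * L)]
        for x, y, word in features:
            cx = min(int(x * L / width), L - 1)   # column of the cell
            cy = min(int(y * L / height), L - 1)  # row of the cell
            cells[cy * L + cx].append(word)
        for cell_words in cells:
            vector.extend(bow_histogram(cell_words, vocab_size))
    return vector
```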

Dense Visual Words
• Why extract only sparse image fragments?
• Good where lots of invariance is needed, but not relevant to sliding window detection?
• Extract dense visual words on an overlapping grid
[Figure: Patch / SIFT → Quantize → Word]
• More "detail" at the expense of invariance
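The two steps in the figure, sampling patches on an overlapping grid and quantizing each patch descriptor to its nearest visual word, can be sketched as below. Patch size 16 and step 8 are assumptions (any step smaller than the patch gives overlap); computing the SIFT descriptor itself is not shown, and the codebook is assumed to come from clustering training descriptors (for example k-means).

```python
def dense_grid(width, height, patch=16, step=8):
    """Top-left corners of patches on a regular grid; step < patch => overlap."""
    return [(x, y)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]


def quantize(descriptor, codebook):
    """Index of the nearest codeword under squared Euclidean distance."""
    def dist2(codeword):
        return sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
    return min(range(len(codebook)), key=lambda k: dist2(codebook[k]))
```

The resulting `(x, y, word)` triples are exactly the input a spatial pyramid over the window needs.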

Outline
1. Sliding window detectors
2. Features and adding spatial information
3. Histogram of Oriented Gradients + linear SVM classifier
4. State-of-the-art algorithms and PASCAL VOC

Feature: Histogram of Oriented Gradients (HOG)
[Figure: HOG image; per-cell histogram of frequency vs. orientation, with the dominant direction marked]
• tile 64 x 128 pixel window into 8 x 8 pixel cells
• each cell represented by histogram over 8 orientation bins (i.e. angles in range 0-180 degrees)
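The per-cell histograms can be sketched in plain Python: compute a centred finite-difference gradient at each pixel, fold its angle into 0–180 degrees, and add its magnitude to one of 8 bins of the pixel's 8×8 cell. This is a simplified sketch under stated assumptions: each vote goes to a single bin with no interpolation, border pixels are skipped, and the image is a grey-level 2-D list indexed `img[y][x]`; `cell_histograms` is a hypothetical name.

```python
import math


def cell_histograms(img, cell=8, bins=8):
    """Return hist[cy][cx][b]: gradient magnitude accumulated per 8x8 cell
    into `bins` unsigned-orientation bins spanning 0-180 degrees."""
    h, w = len(img), len(img[0])
    hist = [[[0.0] * bins for _ in range(w // cell)] for _ in range(h // cell)]
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # centred [-1, 0, 1] derivative
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # ignore gradient sign
            b = int(angle / bin_width) % bins  # % guards against rounding to 180
            cy, cx = y // cell, x // cell
            if cy < len(hist) and cx < len(hist[0]):
                hist[cy][cx][b] += mag
    return hist
```

For a 64×128 window this yields a 16 × 8 grid of cells, each with 8 bins.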

Histogram of Oriented Gradients (HOG) continued
• Adds a second level of overlapping spatial bins re-normalizing orientation histograms over a larger spatial area
• Feature vector dimension (approx) = 16 x 8 (for tiling) x 8 (orientations) x 4 (for blocks) = 4096
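The second level of overlapping bins can be sketched as grouping cells into blocks, normalizing each block, and concatenating. The slide does not fix the block layout or the norm; 2×2-cell blocks with a one-cell stride and plain L2 normalization are assumptions here. Under those assumptions a 16 × 8-cell window has 15 × 7 = 105 blocks, giving 105 × 4 cells × 8 bins = 3360 values. That is why the slide's 16 × 8 × 8 × 4 = 4096 is marked approximate: it counts every cell four times, while border cells belong to fewer blocks.

```python
import math


def hog_blocks(hist, eps=1e-6):
    """hist[cy][cx] is a list of orientation-bin values for one cell.

    Form overlapping 2x2-cell blocks (stride of one cell), L2-normalize each
    block's concatenated histograms and concatenate all blocks.
    """
    rows, cols = len(hist), len(hist[0])
    feature = []
    for by in range(rows - 1):
        for bx in range(cols - 1):
            block = (hist[by][bx] + hist[by][bx + 1]
                     + hist[by + 1][bx] + hist[by + 1][bx + 1])  # list concatenation
            norm = math.sqrt(sum(v * v for v in block) + eps ** 2)
            feature.extend(v / norm for v in block)
    return feature
```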

Window (Image) Classification
[Figure: Training Data → Feature Extraction → Classifier → pedestrian/non-pedestrian]
• HOG Features
• Linear SVM classifier

Averaged examples

Dalal and Triggs, CVPR 2005

[Figure: average over positive training data; learned model]
Learned model: $f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$
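The slides give the model $f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$ but not how $\mathbf{w}$ and $b$ are fitted. As one possible stand-in, the sketch below minimizes the regularized hinge loss of a linear SVM by Pegasos-style stochastic sub-gradient descent. The function names, `lam`, `epochs` and the choice to regularize $b$ like $\mathbf{w}$ (treating it as the weight of a constant feature) are all assumptions made for brevity.

```python
import random


def score(w, b, x):
    """Linear classifier score f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Stochastic sub-gradient descent (Pegasos-style step 1/(lam*t)) on
        lam/2 * ||(w, b)||^2 + mean(max(0, 1 - y_i * f(x_i))).
    Labels must be +1 (object) or -1 (non-object)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            active = y[i] * score(w, b, X[i]) < 1  # hinge loss is non-zero
            shrink = 1.0 - eta * lam               # step on the regularizer
            w = [shrink * wi for wi in w]
            b = shrink * b
            if active:                             # step on the hinge loss
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
                b += eta * y[i]
    return w, b
```

In the detector, each `X[i]` would be the HOG vector of one window, and a window is reported when `score(w, b, x)` exceeds a threshold.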

Training a sliding window detector
• Unlike training an image classifier, there are a (virtually) infinite number of possible negative windows
• Training (learning) generally proceeds in three distinct stages:
  1. Bootstrapping: learn an initial window classifier from positives and random negatives
  2. Hard negatives: use the initial window classifier for detection on the training images (inference) and identify false positives with a high score
  3. Retraining: use the hard negatives as additional training data

Training a sliding window detector
• Object detection is inherently asymmetric: much more "non-object" than "object" data
• Classifier needs to have very low false positive rate
• Non-object category is very complex – need lots of data

Bootstrapping
1. Pick negative training set at random
2. Train classifier
3. Run on training data
4. Add false positives to training set
5. Repeat from 2
• Collect a finite but diverse set of non-object windows
• Force classifier to concentrate on hard negative examples
• For some classifiers can ensure equivalence to training on entire data set
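The five bootstrapping steps can be sketched as a loop that is independent of the classifier: `train(positives, negatives)` and `score(model, x)` are supplied by the caller (for example a linear SVM on HOG vectors). In this sketch `negative_pool` is an assumed finite list standing in for "all object-free windows in the training images"; in practice step 3 means running the detector over every training image. The parameter names and defaults are hypothetical.

```python
import random


def bootstrap(positives, negative_pool, train, score,
              n_init=100, rounds=3, max_new=1000, seed=0):
    """Hard-negative mining. Returns (model, negatives used for training)."""
    rng = random.Random(seed)
    # 1. pick an initial negative training set at random
    chosen = set(rng.sample(range(len(negative_pool)),
                            min(n_init, len(negative_pool))))

    def negatives():
        return [negative_pool[i] for i in sorted(chosen)]

    model = train(positives, negatives())                 # 2. train classifier
    for _ in range(rounds):
        # 3. run on training data: unused negatives that still score > 0
        #    are false positives, i.e. hard negatives
        hard = [i for i in range(len(negative_pool))
                if i not in chosen and score(model, negative_pool[i]) > 0]
        if not hard:
            break                                         # nothing left to mine
        hard.sort(key=lambda i: score(model, negative_pool[i]), reverse=True)
        chosen.update(hard[:max_new])                     # 4. add false positives
        model = train(positives, negatives())             # 5. repeat from 2
    return model, negatives()
```

Capping each round with `max_new` and keeping the highest-scoring false positives first is one way to keep the training set finite while focusing on the hardest examples.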

Example: train an upper body detector
– Training data – used for training and validation sets
  • 33 Hollywood2 training movies
  • 1122 frames with upper bodies marked
– First stage training (bootstrapping)
  • 1607 upper body annotations jittered to 32k positive samples
  • 55k negatives sampled from the same set of frames
– Second stage training (retraining)
  • 150k hard negatives found in the training data
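Going from 1607 annotations to 32k positives means roughly 20 jittered samples per annotation (32,000 / 1607 ≈ 20). The slide does not say which perturbations were used; the sketch below assumes small random shifts, small isotropic rescaling (which keeps the aspect ratio, and hence the common window shape) and random horizontal mirroring. `jitter_box` and all of its ranges are hypothetical.

```python
import random


def jitter_box(box, n=20, max_shift=0.05, max_scale=0.05, rng=None):
    """Return n perturbed copies of box = (x, y, w, h) as (x, y, w, h, flip).

    Each copy's centre is shifted by up to max_shift of the box size, its
    size is scaled by a factor in [1 - max_scale, 1 + max_scale] (aspect
    ratio preserved), and flip=True means the crop should be mirrored.
    """
    if rng is None:
        rng = random.Random()
    x, y, w, h = box
    out = []
    for _ in range(n):
        s = 1.0 + rng.uniform(-max_scale, max_scale)
        nw, nh = w * s, h * s
        cx = x + w / 2 + rng.uniform(-max_shift, max_shift) * w
        cy = y + h / 2 + rng.uniform(-max_shift, max_shift) * h
        out.append((cx - nw / 2, cy - nh / 2, nw, nh, rng.random() < 0.5))
    return out
```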

Training data – positive annotations

Positive windows
Note: common size and alignment

Jittered positives

Random negatives

Window (Image) first stage classification
[Figure: jittered positives + random negatives → HOG Feature Extraction → Linear SVM Classifier, $f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$]
• find high-scoring false positive detections
• these are the hard negatives for the next round of training
• cost = # training images x inference on each image

Hard negatives

First stage performance on validation set

Precision – Recall curve
[Figure: sets of returned windows, correct windows and all windows]
• Precision: % of returned windows that are correct
• Recall: % of correct windows that are returned
[Plot: precision vs. recall, both axes from 0 to 1; classifier score decreasing along the curve]
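The curve is traced by lowering the classifier-score threshold: rank windows by decreasing score and, after returning the top k, compute precision and recall with the definitions above. This sketch assumes each returned window is already labelled correct or not (for instance by overlap with an annotation) and that `n_correct_total` counts all correct windows, including ones never returned, so recall can stay below 1. The function name is hypothetical.

```python
def precision_recall(scores, is_correct, n_correct_total):
    """Return a list of (precision, recall) points, one per rank k.

    precision = correct returned / returned
    recall    = correct returned / all correct windows
    """
    ranked = sorted(zip(scores, is_correct), key=lambda p: p[0], reverse=True)
    tp = 0
    curve = []
    for k, (_, correct) in enumerate(ranked, start=1):
        tp += bool(correct)
        curve.append((tp / k, tp / n_correct_total))
    return curve
```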

Effects of retraining

Side by side
[Figure: detections before retraining / after retraining]

Accelerating Sliding Window Search
• Sliding window search is slow because so many windows are needed, e.g. x × y × scale ≈ 100,000 for a 320×240 image
• Most windows are clearly not the object class of interest
• Can we speed up the search?

Cascaded Classification
• Build a sequence of classifiers with increasing complexity
[Figure: Window → Classifier 1 → Classifier 2 → … → Classifier N → Face; each classifier either passes "possibly a face" to the next stage or rejects the window as "non-face"; later stages are more complex, slower, with lower false positive rate]
• Reject easy non-objects using simpler and faster classifiers
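Evaluating one window with a cascade can be sketched as a loop over stages that stops at the first rejection, so most windows only pay for the cheapest classifiers. The `(classifier, threshold)` stage format is an assumption made for illustration; how each stage is trained is not covered here.

```python
def cascade_classify(window, stages):
    """stages: list of (classifier, threshold), cheapest first.

    A window is rejected as soon as any stage scores it below that stage's
    threshold; only windows that pass every stage are reported as objects.
    """
    for classifier, threshold in stages:
        if classifier(window) < threshold:
            return False  # early reject: later, costlier stages never run
    return True
```

Because most windows are easy non-objects, the average cost per window stays close to the cost of the first stage.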
