Hypercolumns for Object Segmentation and Fine-grained Localization
Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik
Göksu Erdoğan
Image Classification: horse, person, building
Slide credit: Bharath Hariharan
Object Detection
Detect and segment every instance of the category in the image
B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014.
Detect and segment every instance of the category in the image and label its parts
Detect every instance of the category in the image and mark its keypoints
§ Task: Assign category labels to images or bounding boxes
§ General approach: Use the output of the last layer of the CNN
§ This output is most sensitive to category-level semantic information
§ The information is generalized in the top layer
§ Is the output of the last layer of the CNN appropriate for finer-grained tasks?
§ Not the optimal representation!
§ The last layer of the CNN is mostly invariant to "nuisance" variables such as pose, illumination, articulation, precise location…
§ Pose and nuisance variables are precisely what we are interested in.
§ How can we get such information?
§ It is present in
§ Less sensitive
§ Top layers lose localization information
§ Bottom layers are not semantic enough
Simultaneous detection and segmentation. In ECCV, 2014
Combine subsampled intermediate layers with the top layer. Difference: upsampling
Pedestrian Detection with Unsupervised Multi-Stage Feature Learning, Sermanet et al.
§ Start from a detection (R-CNN)
§ Heatmaps
§ Use category-specific, instance-specific information to…
§ Classify each pixel in the detection window
SDS: Does the pixel belong to the object?
Part labeling: Which part does the pixel belong to?
Pose estimation: Does it lie on/near a particular keypoint?
§ Segmentation:
§ Probability that a particular location is inside the object
§ Part Labeling:
§ Separate heatmap for each part
§ Each heatmap is the probability that a location belongs to that part
§ Keypoint Prediction:
§ Separate heatmap for each keypoint
§ Each heatmap is the probability of the keypoint being at a particular location
§ Term derived from Hubel and Wiesel
§ Re-imagines old ideas:
§ Jets (Koenderink and van Doorn)
§ Pyramids (Burt and Adelson)
§ Filter Banks (Malik and Perona)
§ Upsample feature map F to f
§ f_i = Σ_k α_ik F_k: feature vector at location i
§ α_ik: weight that depends on the positions of i and k in the box
§ Concatenate features from every location into one long vector
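The upsampling and concatenation steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes bilinear interpolation for the weights α_ik, a 50x50 output grid, and three made-up feature maps. The names `bilinear_upsample` and `hypercolumns` are hypothetical.

```python
import numpy as np

def bilinear_upsample(F, out_h, out_w):
    """Upsample a (C, h, w) feature map to (C, out_h, out_w).

    Each output vector f_i is a weighted sum of at most four input
    vectors F_k, i.e. f_i = sum_k alpha_ik * F_k.
    """
    C, h, w = F.shape
    # Map output pixel centres to fractional input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = F[:, y0][:, :, x0] * (1 - wx) + F[:, y0][:, :, x1] * wx
    bot = F[:, y1][:, :, x0] * (1 - wx) + F[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def hypercolumns(feature_maps, out_h, out_w):
    """Upsample every map to a common grid and stack along channels."""
    ups = [bilinear_upsample(F, out_h, out_w) for F in feature_maps]
    return np.concatenate(ups, axis=0)

# Made-up feature maps standing in for three CNN layers (assumption).
rng = np.random.default_rng(0)
maps = [rng.normal(size=(4, 12, 12)),
        rng.normal(size=(8, 6, 6)),
        rng.normal(size=(16, 1, 1))]
hc = hypercolumns(maps, 50, 50)   # one 28-dimensional vector per location
```

Each column `hc[:, y, x]` is then the long feature vector for one location in the box.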
§ Fully connected layers contribute a global instance-specific bias
§ A different classifier for each location contributes a separate instance-specific bias
§ Simplest way to get a location-specific classifier:
§ Train separate classifiers at each of the 50x50 locations
§ What would be the problems with this approach?
1. Reduces the amount of data for each classifier during training
2. Computationally expensive
3. Classifiers vary with location
4. Risk of overfitting
§ Train a coarse KxK grid of classifiers and interpolate between them
§ Interpolate a grid of functions instead of values
§ Each classifier in the grid is a function g_k(.)
§ g_k(feature vector) = probability
§ Score of the i-th pixel: p_i = Σ_k α_ik g_k(f_i)
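A small sketch of interpolating between a grid of classifiers at test time. It assumes each g_k is a logistic classifier with its own weights, that a pixel's position is given in normalized box coordinates in [0, 1], and that the interpolation weights are bilinear with respect to the grid cell centres. The names `interp_weights` and `pixel_score` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interp_weights(pos, K):
    """1-D linear interpolation weights of a position in [0, 1]
    with respect to the K cell centres (k + 0.5) / K."""
    t = np.clip(pos * K - 0.5, 0, K - 1)   # fractional index into the centres
    k0 = int(np.floor(t))
    k1 = min(k0 + 1, K - 1)
    w = np.zeros(K)
    w[k0] += 1.0 - (t - k0)
    w[k1] += t - k0
    return w

def pixel_score(f_i, y, x, W, b):
    """Interpolated score sum_k alpha_ik * g_k(f_i).

    W: (K, K, D) classifier weights, b: (K, K) biases (assumed layout).
    """
    K = W.shape[0]
    alpha = np.outer(interp_weights(y, K), interp_weights(x, K))  # sums to 1
    g = sigmoid(W @ f_i + b)        # every classifier applied to f_i: (K, K)
    return float((alpha * g).sum())

# Made-up 2x2 grid of classifiers on 5-dimensional features (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2, 5))
b = rng.normal(size=(2, 2))
f = rng.normal(size=5)
score = pixel_score(f, 0.4, 0.7, W, b)
```

At a cell centre the score reduces to that cell's classifier alone; between centres it blends the neighbouring classifiers smoothly.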
§ Interpolation is not used at training time
§ Divide each box into a KxK grid
§ Training data for the k-th classifier consists only of pixels from the k-th grid cell across all training instances.
§ Train with logistic regression
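The training recipe above can be sketched as follows. This is only an illustration on synthetic data: the plain gradient-descent logistic regression, the learning rate, the step count, and the helper names `cell_index`, `train_logistic` and `train_grid` are all assumptions, not details from the slides.

```python
import numpy as np

def cell_index(y, x, K):
    """Grid cell (row, col) of a pixel at normalized box position (y, x)."""
    return min(int(y * K), K - 1), min(int(x * K), K - 1)

def train_logistic(X, t, lr=0.5, steps=500):
    """Gradient descent on the mean logistic loss; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - t                      # derivative of the loss w.r.t. the logit
        w -= lr * X.T @ grad / len(t)
        b -= lr * grad.mean()
    return w, b

def train_grid(features, positions, labels, K):
    """Train one classifier per grid cell, using only that cell's pixels.

    features: (N, D); positions: (N, 2) in [0, 1); labels: (N,) in {0, 1}.
    """
    cells = np.array([cell_index(y, x, K) for y, x in positions])
    models = {}
    for r in range(K):
        for c in range(K):
            mask = (cells[:, 0] == r) & (cells[:, 1] == c)
            if mask.any():
                models[(r, c)] = train_logistic(features[mask], labels[mask])
    return models

# Synthetic pixels pooled over all training boxes (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
pos = rng.uniform(size=(400, 2))
t = (X[:, 0] > 0).astype(float)
models = train_grid(X, pos, t, K=2)
```

No interpolation appears here: each classifier only ever sees pixels from its own cell.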
§ Upsampling large feature maps is expensive!
§ If classification and upsampling are linear:
§ Classification ∘ upsampling = Upsampling ∘ classification
§ Linear classification = 1x1 convolution
§ Extension: use nxn convolution
§ Classification = convolve, upsample, sum, sigmoid
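The commutation claim can be checked numerically. The sketch below assumes bilinear upsampling written as a pair of interpolation matrices and a single linear classifier split across two made-up feature maps. The helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interp_matrix(n_in, n_out):
    """(n_out, n_in) matrix of linear interpolation weights."""
    U = np.zeros((n_out, n_in))
    t = np.clip((np.arange(n_out) + 0.5) * n_in / n_out - 0.5, 0, n_in - 1)
    i0 = np.floor(t).astype(int)
    i1 = np.minimum(i0 + 1, n_in - 1)
    rows = np.arange(n_out)
    np.add.at(U, (rows, i0), 1.0 - (t - i0))
    np.add.at(U, (rows, i1), t - i0)
    return U

def upsample(F, size):
    """Bilinear upsampling of a (C, h, w) map to (C, size, size)."""
    Uh = interp_matrix(F.shape[1], size)
    Uw = interp_matrix(F.shape[2], size)
    return np.einsum('ih,chw,jw->cij', Uh, F, Uw)

def conv1x1(F, w):
    """Linear classifier applied at every location: (C, h, w) -> (h, w)."""
    return np.einsum('c,chw->hw', w, F)

rng = np.random.default_rng(0)
maps = [rng.normal(size=(4, 12, 12)), rng.normal(size=(8, 6, 6))]
weights = [rng.normal(size=4), rng.normal(size=8)]

# Slow path: upsample every feature map, concatenate, then classify.
hc = np.concatenate([upsample(F, 50) for F in maps], axis=0)
slow = sigmoid(conv1x1(hc, np.concatenate(weights)))

# Fast path: convolve, upsample, sum, sigmoid.
fast = sigmoid(sum(upsample(conv1x1(F, w)[None], 50)[0]
                   for F, w in zip(maps, weights)))
```

The fast path upsamples one score channel per layer instead of every feature channel, which is where the saving comes from.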
§ Use MCG candidates that overlap with the ground truth by 70% or more
§ For each candidate, find the ground-truth instance it overlaps most
§ Crop the ground truth to the expanded bounding box of the candidate
§ Label locations positive or negative according to the problem
§ Similar to the bounding box detection metric
§ Box overlap = area(detection box ∩ ground-truth box) / area(detection box ∪ ground-truth box)
§ If box overlap > threshold, correct
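The box overlap criterion is intersection over union. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the 0.5 threshold in the example is an assumption, since the slides do not state a value.

```python
def box_overlap(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

detection = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = box_overlap(detection, ground_truth)   # 1 / 7
is_correct = overlap > 0.5                       # threshold is an assumption
```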
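§ Similar to the bounding box detection metric
§ But with segments instead of bounding boxes
§ Each detection/GT comes with a segment
§ Segment overlap = area(detection segment ∩ ground-truth segment) / area(detection segment ∪ ground-truth segment)
§ If segment overlap > threshold, correct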
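The segment version of the metric is the same ratio computed on pixel masks. A short sketch, assuming both segments are boolean masks over the same image grid; the function name is hypothetical.

```python
import numpy as np

def segment_overlap(mask_a, mask_b):
    """Intersection over union of two boolean masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

det = np.zeros((4, 4), dtype=bool)
det[:2, :] = True            # detection covers rows 0 and 1
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, :] = True            # ground truth covers rows 1 and 2
seg_iou = segment_overlap(det, gt)   # 4 shared pixels / 12 in the union
```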
§ System 1:
§ Refinement step with the hypercolumn representation
§ Features:
§ Top-level fc7 features
§ Conv4 features
§ Pool2 features
§ 1/0 according to whether the location was inside the original region candidate
§ Coarse 10x10 discretization of the original candidate into a 100-dimensional vector
§ 10x10 grid of classifiers
§ Project predictions onto superpixels and average
§ System 2:
§ MCG instead of selective search
§ Expand the set of boxes by adding nearby high-scoring boxes after NMS
§ A general framework for fine-grained localization that:
§ Leverages information from multiple CNN layers
§ Achieves state-of-the-art performance on SDS and part labeling, and accurate results on keypoint prediction
§ Applying the hypercolumn representation to fine-grained tasks
§ Attribute classification
§ Action classification
§ …