2x speedup City Dusk Rainy Tunnel Overcast Daytime Sunny - - PowerPoint PPT Presentation
2x speedup
Scenes: Residential, Highway, Tunnel, Parking, City
Time of Day: Daytime, Dusk, Night
Weather: Rainy, Overcast, Snowy, Sunny
Tasks: Instance Segmentation Tracking, Bounding Box Tracking, Panoptic Segmentation, Drivable Area, Lane & Tagging
Tagging example: Sunny, City Street, Daytime
                        Pascal  COCO  Mapillary  Waymo  Argoverse  nuScenes  Youtube-BB  BDD100K
Images                  10K     328K  25K        -      -          -         -           -
Videos                  -       -     -          2K     113        1K        240K        100K
Crowd Sourced           √       √     √          x      x          x         √           √
Diverse Weather         √       √     √          √      √          √         √           √
>10 Objects per Image   x       √     √          √      √          √         x           √
Pixel Annotation        √       √     √          x      x          x         x           √
Tracking                x       x     x          √      √          √         √           √
Multitask               √       √     x          √      √          √         x           √
[Figure: dataset comparison of KITTI, MOT17, and BDD100K; values in thousands (×10^3)]
                    KITTI   MOT17   BDD100K
# Labeled Frames    8       34      318
# Instances         0.92    1.64    131
[Figure: dataset comparison of KITTI, MOTS, and BDD100K; values in thousands (×10^3)]
                    KITTI   MOTS    BDD100K
# Labeled Frames    8       3       28
# Instances         0.75    0.23    12.6
Quasi-Dense Instance Similarity Learning, Pang et al. ArXiv 2020
Object Detection
[Diagram: Frame 1 and Frame 2 each pass through a shared Backbone, a shared RPN, RoI Align, and a shared BBox Head with cls and reg outputs; Sparse GTs vs. Quasi-Dense Samples]
Instance Similarity Learning
[Diagram: Frame 1 and Frame 2 each pass through a shared Backbone, a shared RPN, RoI Align, and a shared Embedding Head, followed by Contrastive Learning; Sparse GTs vs. Quasi-Dense Samples]
Object Association
[Diagram: a shared Embedding Extractor embeds the Detections of the Current Frame and the Tracklets, Vanished Tracklets, and Backdrops of the Previous Frames; the two sets are matched with a Bi-directional Softmax]
High Similarity: Consistent
Low Similarity: Inconsistent (New Object, Vanished Object)
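The association step can be illustrated with a minimal NumPy sketch. It assumes that similarity is the dot product of embeddings and that the bi-directional softmax averages a softmax over tracklets with a softmax over detections. The function names, the greedy assignment, and the `match_thr` value are illustrative assumptions, and the sketch leaves out backdrops and vanished tracklets.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bi_softmax_similarity(det_emb, track_emb):
    """Average of detection-to-track and track-to-detection softmax scores.

    det_emb:   (num_detections, dim) embeddings from the current frame
    track_emb: (num_tracklets, dim) embeddings from previous frames
    """
    logits = det_emb @ track_emb.T                 # (num_detections, num_tracklets)
    return 0.5 * (softmax(logits, axis=1) + softmax(logits, axis=0))

def associate(det_emb, track_emb, match_thr=0.5):
    """Greedy matching (an assumption): detections with the strongest best score go first."""
    scores = bi_softmax_similarity(det_emb, track_emb)
    matches, used = {}, set()
    for d in np.argsort(-scores.max(axis=1)):
        t = int(np.argmax(scores[d]))
        if scores[d, t] > match_thr and t not in used:
            matches[int(d)] = t
            used.add(t)
    return matches  # unmatched detections would be treated as new objects

tracks = np.array([[10.0, 0.0], [0.0, 10.0]])
dets = np.array([[10.0, 0.0], [0.0, 10.0], [0.1, 0.1]])
print(associate(dets, tracks))
```

A detection scores highly only when it is the best match for a tracklet and that tracklet is also the best match for it, which is the consistency the slide refers to. The third detection above resembles neither tracklet and stays unmatched.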
Tasks: Image Tagging, Trajectory Prediction, Panoptic Segmentation, Object Detection, Semantic Segmentation, Drivable Area, Lane Marking, Bounding Box Tracking, Instance Segmentation Tracking, Domain Adaptation
Drivable Area Lane Markings
[Figure: Lane ODS-F (%) vs. # Images]
                                10K     20K     70K
Lane marking                    45.4    54.4    54.5
Lane marking w/ Drivable area   50.4    54.1    54.2

[Figure: Drivable IoU (%) vs. # Images]
                                10K     20K     70K
Drivable area                   64.2    71.1    71.4
Drivable area w/ Lane marking   64.4    71.7    72.2
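The slides do not define how Drivable IoU is computed. As a rough illustration, the sketch below assumes the standard intersection-over-union between a predicted label map and a ground-truth label map for one class. The function name `class_iou` and the label encoding (1 for drivable) are hypothetical.

```python
import numpy as np

def class_iou(pred, gt, cls):
    """Intersection over union for one class on integer label maps of equal shape."""
    p = pred == cls
    g = gt == cls
    union = np.logical_or(p, g).sum()
    if union == 0:
        return float("nan")  # class absent from both maps
    return float(np.logical_and(p, g).sum() / union)

# Toy 2x2 label maps: 1 marks drivable pixels, 0 marks background (assumed encoding).
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
print(class_iou(pred, gt, cls=1))
```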
Instance Segmentation
[Diagram: Inst-Seg: Image -> Instance Segmentation (7K Labeled Images). Inst-Seg w/ Det: Image -> Box Detection (70K Labeled Images) and Instance Segmentation (7K Labeled Images)]
                    AP      AP50    AP75
Inst-Seg            21.8    40.5    20.5
Inst-Seg w/ Det     24.5    45.4    21.6
[Diagram: Backbone, Box Head, Mask Head, Loss]
Only a subset of object instances have mask annotations
Abundant Box Annotations, Limited Mask Annotations
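One common way to train a mask head when only some instances carry masks is to compute the mask loss on the annotated instances alone, while box losses use every instance. The sketch below shows that baseline idea. It is not the ShapeProp method, and the names `partial_mask_loss` and `has_mask` are assumptions made for illustration.

```python
import numpy as np

def bce(prob, target, eps=1e-7):
    """Elementwise binary cross-entropy on probabilities."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))

def partial_mask_loss(mask_prob, mask_gt, has_mask):
    """Mean mask loss over the instances that have a mask annotation.

    mask_prob, mask_gt: (N, H, W) arrays; has_mask: (N,) boolean flags.
    Instances without a mask contribute nothing to the result.
    """
    if not has_mask.any():
        return 0.0
    per_instance = bce(mask_prob, mask_gt).mean(axis=(1, 2))
    return float(per_instance[has_mask].mean())

mask_prob = np.full((2, 2, 2), 0.5)
mask_gt = np.zeros((2, 2, 2))
has_mask = np.array([True, False])   # only the first instance is annotated
print(partial_mask_loss(mask_prob, mask_gt, has_mask))
```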
Learning Saliency Propagation for Semi-Supervised Instance Segmentation, Zhou et al. CVPR 2020
Abundant Box Annotations
Abundant boxes statistically provide knowledge of instance salient regions (part of the shape)
A well-generalized pixel relation can be learned from limited masks
The pixel relation can be inferred from low-level semantics (e.g., color, texture)
ShapeProp Module
[Diagram: the ShapeProp module inserted into an existing instance segmentation framework. Labels: Backbone, Box Head, Box Detection (e.g., Car), ROI Feature, Pixelwise classification, Instance Saliency, Propagating Saliency, Shape Activation, Activating Saliency, Fuse, Mask Head, Predicted Mask, Loss]
[Diagram: saliency propagation. Labels: ROI Feature, Conv, Propagation Weights, Normalize & Shuffle, Instance Saliency, propagation steps t = 0, 1, 2, ..., max(H, W), Propagated Features, Shape Activation, Reuse Conv Blocks, GT Mask, Reconstruction Loss]
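To give a feel for iterative propagation over t steps, here is a heavily simplified sketch. It assumes each pixel aggregates the saliency of its 3x3 neighborhood with per-pixel weights that are already normalized. In the diagram the weights come from conv layers on the ROI feature, whereas here they are simply passed in. The function `propagate` and its signature are hypothetical and do not reproduce the paper's module.

```python
import numpy as np

def propagate(saliency, weights, steps):
    """Repeated weighted aggregation over each pixel's 3x3 neighborhood.

    saliency: (H, W) initial instance saliency
    weights:  (9, H, W) non-negative weights, assumed normalized over axis 0
    steps:    number of iterations, e.g. max(H, W)
    """
    H, W = saliency.shape
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    out = saliency.astype(float)
    for _ in range(steps):
        padded = np.pad(out, 1)  # zero padding at the ROI border
        nxt = np.zeros_like(out)
        for k, (di, dj) in enumerate(offsets):
            nxt += weights[k] * padded[1 + di:1 + di + H, 1 + dj:1 + dj + W]
        out = nxt
    return out

H = W = 5
seed = np.zeros((H, W))
seed[2, 2] = 1.0                       # one salient pixel at the centre
uniform = np.full((9, H, W), 1.0 / 9)  # placeholder weights, not learned
print(propagate(seed, uniform, steps=max(H, W)).round(3))
```

With max(H, W) steps, information from any pixel can reach every other pixel of the ROI, which is one plausible reading of the step count shown in the diagram.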
Significant improvements over baselines in the class-wise semi-supervised setting (only a subset of classes have mask annotations): more than 10 points of AP gain
Significant improvements over baselines in the image-wise semi-supervised setting (only a subset of images have mask annotations): improves both single-stage and two-stage frameworks
The learned shape representation also brings gains in the fully supervised setting (all instances have mask annotations): improves segmentation quality and generalization of existing frameworks
[Figure: Saliency Propagation on BDD100K (Instance Saliency -> Shape Activation)]
[Figure: Class-wise Semi-supervised Instance Segmentation on COCO, w/ ShapeProp vs. w/o ShapeProp]
[Figure: Image-wise Semi-supervised Instance Segmentation on BDD100K, w/o ShapeProp]
[Diagram: Seg Tracking between Frame 1 and Frame 2, first alone and then combined with Detection, Segmentation, and Box Tracking]
          Seg Track   w/ Instance Seg   w/ Box Track   All
AP        13.0        18.7              19.7           23.3
MOTSA     30.4        33.7              40.3           41.4
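The slides report MOTSA without defining it. The sketch below assumes the definition commonly used for segmentation tracking, MOTSA = (|TP| - |FP| - |IDS|) / |M|, where |M| is the number of ground-truth masks. The function name and the counts in the usage line are made up for illustration and are unrelated to the numbers in the table.

```python
def motsa(tp, fp, ids, num_gt):
    """Multi-object tracking and segmentation accuracy from raw counts.

    tp: true-positive masks, fp: false-positive masks,
    ids: identity switches, num_gt: total ground-truth masks.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return (tp - fp - ids) / num_gt

# Illustrative counts only.
print(f"{100 * motsa(tp=80, fp=10, ids=5, num_gt=100):.1f}")
```

Because false positives and identity switches are subtracted, the score can be negative for a tracker that produces more errors than correct masks.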