Deep Watershed Transform for Instance Segmentation
Min Bai & Raquel Urtasun
To appear at IEEE CVPR 2017 in Hawaii Presented at NVIDIA GTC 2017
Semantic Segmentation
Input: RGB image
Output at each pixel: semantic label
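Semantic segmentation as described above amounts to choosing one class per pixel. The following is a minimal sketch that assumes a network has already produced a score for every class at every pixel. The `semantic_labels` helper and the toy scores are hypothetical illustrations and do not come from the talk.

```python
from typing import List

# Hypothetical per-pixel class scores, indexed as [row][col][class].
Scores = List[List[List[float]]]


def semantic_labels(scores: Scores) -> List[List[int]]:
    """Return the index of the highest-scoring class at every pixel."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


# A 1x3 toy "image" with 2 classes (0 = background, 1 = object).
toy_scores = [[[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]]
print(semantic_labels(toy_scores))  # [[0, 1, 1]]
```

Note that the output assigns the same label to every object pixel. It cannot tell two adjacent objects of the same class apart, which is the gap that instance segmentation fills.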
Instance Segmentation
Output at each pixel:
○ Semantic label
○ Instance label
  ■ Same for every pixel in an object
  ■ Different across objects
○ Difficulty: how should the problem be formulated?
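One reason the problem is awkward to formulate is that instance ids have no meaning of their own. Only "same object" versus "different object" matters, so any permutation of the ids describes the same segmentation. A network therefore cannot simply regress a fixed id per pixel. The small illustrative sketch below makes this concrete by relabeling instances in order of first appearance, so that permuted labelings compare equal. The `canonicalize` helper is hypothetical and is not part of the method presented in the talk.

```python
from typing import Dict, List

Labels = List[List[int]]


def canonicalize(instance_ids: Labels, background: int = 0) -> Labels:
    """Relabel instances in raster-scan order of first appearance.

    Two labelings that differ only by a permutation of instance ids
    map to the same output.
    """
    mapping: Dict[int, int] = {background: background}
    next_id = 1
    out: Labels = []
    for row in instance_ids:
        out_row = []
        for label in row:
            if label not in mapping:
                mapping[label] = next_id
                next_id += 1
            out_row.append(mapping[label])
        out.append(out_row)
    return out


a = [[0, 7, 7], [0, 3, 3]]  # two objects with ids 7 and 3
b = [[0, 2, 2], [0, 9, 9]]  # the same two objects with ids 2 and 9
print(canonicalize(a) == canonicalize(b))  # True
```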
Image credit: Davi Frossard
Image credit: http://www.rethinkrobotics.com/build-a-bot/
Image credit: Shenlong Wang
○ Our instance segmentation method leverages an existing technique: the watershed transform
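The classical watershed transform named in the title treats an image as a topographic surface. It floods that surface from seed basins and assigns each pixel to the basin that reaches it first. The code below is a minimal sketch of the textbook marker-based version, implemented with a priority queue. It assumes a 2D grid of heights and pre-chosen seed markers, and it deliberately leaves out everything learned in the talk's method. The function name and the toy input are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Grid = List[List[float]]


def watershed(height: Grid, markers: Dict[Tuple[int, int], int]) -> List[List[int]]:
    """Marker-based watershed by priority flooding.

    Pixels are reached in order of increasing height. Each unlabeled pixel
    takes the label of the flooded neighbour that reaches it first.
    Marker labels must be nonzero; 0 means "not yet flooded".
    """
    rows, cols = len(height), len(height[0])
    labels = [[0] * cols for _ in range(rows)]
    heap: List[Tuple[float, int, int, int]] = []
    for (r, c), lab in markers.items():
        labels[r][c] = lab
        heapq.heappush(heap, (height[r][c], r, c, lab))
    while heap:
        _, r, c, lab = heapq.heappop(heap)  # lowest pending pixel first
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = lab
                heapq.heappush(heap, (height[nr][nc], nr, nc, lab))
    return labels


# Two basins separated by a ridge at column 3. The ridge pixel goes to
# whichever basin reaches it first (here basin 1, by tie-breaking).
row = [[0, 1, 2, 3, 2, 1, 0]]
print(watershed(row, {(0, 0): 1, (0, 6): 2}))  # [[1, 1, 1, 1, 2, 2, 2]]
```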
Image credit: H. Zhao et al.
Image (left) credit: Adrian Fisher
Image source: Wikipedia: By Vivekj78 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15346899
Predicted direction at each pixel, pointing toward the direction of greatest ascent of the energy
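A direction that points toward the greatest ascent is the normalized gradient of the energy. The sketch below illustrates this under one assumption made purely for illustration: the energy is the Euclidean distance from each object pixel to the nearest background pixel. It then takes central-difference gradients and normalizes them. Both helpers are hypothetical, and the brute-force distance computation is only suitable for toy grids.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]


def distance_energy(mask: Mask) -> List[List[float]]:
    """Energy = Euclidean distance to the nearest background (0) pixel.

    Brute force; assumes the mask contains at least one background pixel.
    """
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 0]
    return [
        [
            min(math.hypot(r - br, c - bc) for br, bc in background) if mask[r][c] else 0.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def ascent_directions(energy: List[List[float]]) -> List[List[Tuple[float, float]]]:
    """Unit vector (d_row, d_col) of the central-difference gradient at each pixel."""
    rows, cols = len(energy), len(energy[0])

    def at(r: int, c: int) -> float:
        # Clamp indices to the border so edge pixels use one-sided differences.
        return energy[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            gr = (at(r + 1, c) - at(r - 1, c)) / 2.0
            gc = (at(r, c + 1) - at(r, c - 1)) / 2.0
            norm = math.hypot(gr, gc)
            # At a flat spot (e.g. the object's ridge) the direction is undefined.
            row.append((gr / norm, gc / norm) if norm > 0 else (0.0, 0.0))
        out.append(row)
    return out


dirs = ascent_directions(distance_energy([[0, 1, 1, 1, 0]]))
print(dirs[0][1:4])  # [(0.0, 1.0), (0.0, 0.0), (0.0, -1.0)]
```

In this example, the object pixels on either side of the centre point inward toward the energy peak.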
Figure: input image | semantic segmentation | gradient of energy landscape | energy landscape | predicted instances
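The last step of this pipeline turns an energy map into separate instances. The sketch below shows one plausible way to do this. It assumes that instances are recovered by keeping only the pixels of the semantic class whose energy exceeds a cut level, then labeling the 4-connected components of those pixels. The cut level, the function name, and the toy values are hypothetical.

```python
from collections import deque
from typing import List

Grid = List[List[float]]


def instances_from_energy(energy: Grid, semantic: List[List[int]], cut_level: float) -> List[List[int]]:
    """Label 4-connected components of semantic pixels whose energy exceeds cut_level.

    Returns an id map: 0 outside every component, 1..K for the K instances.
    """
    rows, cols = len(energy), len(energy[0])

    def inside(r: int, c: int) -> bool:
        return bool(semantic[r][c]) and energy[r][c] > cut_level

    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] or not inside(r, c):
                continue
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one component
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and not labels[nr][nc] and inside(nr, nc):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
    return labels


# Toy row: two touching objects meet at columns 3-4, where the energy dips to 1.
semantic = [[0, 1, 1, 1, 1, 1, 1, 0]]
energy = [[0, 1, 2, 1, 1, 2, 1, 0]]
print(instances_from_energy(energy, semantic, cut_level=0))  # [[0, 1, 1, 1, 1, 1, 1, 0]]
print(instances_from_energy(energy, semantic, cut_level=1))  # [[0, 0, 1, 0, 0, 2, 0, 0]]
```

The semantic mask alone merges the two objects into one. The higher cut separates them but shrinks each object to its core. A full system would presumably grow the cores back, for example by dilation, and that step is omitted here.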
Figure: input image | energy landscape | direction of gradient
The direction label changes much more sharply at instance boundaries than the energy does!
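A small worked example shows why the direction signal is sharper. Consider two neighbouring pixels on either side of the boundary between two touching objects. Both pixels lie near an edge, so their energies are equally low. Each direction, however, points into its own object, so the two vectors are opposite. The toy values below are assumed for illustration only.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine of the angle between two 2D vectors."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


# Neighbouring pixels on either side of a boundary between touching objects.
energy_left, energy_right = 1.0, 1.0
dir_left, dir_right = (0.0, -1.0), (0.0, 1.0)  # each points into its own object

print(abs(energy_left - energy_right))  # 0.0  (energy gives no signal)
print(cosine(dir_left, dir_right))      # -1.0 (directions are maximally different)
```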
Figure (ground truth vs. prediction): input image | semantic segmentation | directions | energy | instances
○ Training: approximately 25 hours in total on a single GP100 GPU
○ Inference: ~0.1 s per image for the forward pass
○ Thank you, NVIDIA, for the generous gift!
Image source: www.nvidia.com
Method                 AP*     AP* @ 50%   AP* @ 50m   AP* @ 100m
van den Brand et al.   2.3%    3.7%        3.9%        4.9%
Cordts et al.          4.6%    12.9%       7.7%        10.3%
Uhrig et al.           8.9%    21.1%       15.3%       16.7%
Ours                   19.4%   35.3%       31.4%       36.8%
* Average Precision (AP): higher is better. "@ 50%": 50% overlap threshold; "@ 50m" / "@ 100m": only objects within 50 m / 100 m are evaluated.
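The "AP @ 50%" column scores predicted instances against ground-truth instances at a 50% overlap (IoU) threshold. The sketch below is a simplified, single-class version of that idea. Predictions are visited from highest to lowest score and greedily matched to unmatched ground truth at IoU ≥ 0.5. Precision is then averaged at the rank of each true positive, with no interpolation. Standard benchmark protocols typically add more, for example averaging over several thresholds and classes, so the code illustrates the concept rather than reproducing the official evaluation. All names and data here are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Mask = FrozenSet[Tuple[int, int]]  # set of (row, col) pixels


def iou(a: Mask, b: Mask) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Tuple[float, Mask]], gts: List[Mask], thr: float = 0.5) -> float:
    """Simplified single-class AP at one IoU threshold (no interpolation)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = 0
    precision_sum = 0.0
    for rank, (_, mask) in enumerate(sorted(preds, key=lambda p: -p[0]), start=1):
        best, best_iou = -1, thr
        for i, gt in enumerate(gts):
            overlap = iou(mask, gt)
            if not matched[i] and overlap >= best_iou:  # best unmatched GT at >= thr
                best, best_iou = i, overlap
        if best >= 0:
            matched[best] = True
            tp += 1
            precision_sum += tp / rank  # precision at this true positive
    return precision_sum / len(gts)


A = frozenset({(0, 0), (0, 1)})
B = frozenset({(1, 0), (1, 1)})
preds = [
    (0.95, frozenset({(5, 5)})),  # false positive, ranked first
    (0.90, A),                    # IoU 1.0 with A
    (0.80, frozenset({(1, 0)})),  # IoU 0.5 with B, just meets the threshold
]
print(round(average_precision(preds, [A, B]), 3))  # 0.583
```

Because the false positive is ranked first, the precision at each true positive drops to 1/2 and 2/3, which pulls AP below 1.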
Recently, new approaches have achieved even higher performance.
Figure (qualitative results, six examples): input RGB | semantic segmentation | direction prediction | energy prediction | predicted instances | ground-truth instances
Figure: input RGB | semantic segmentation (ResNet) | predicted building instances
Method      Weighted Coverage*   AP*      Recall* @ 50%   Precision* @ 50%
FCN-8       41.92%               11.37%   21.50%          36.00%
ResNet-56   40.65%               12.13%   18.90%          45.36%
Ours        56.22%               21.22%   67.16%          63.67%
* higher is better
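The slide does not define these metrics. The sketch below follows common definitions, which are an assumption on our part. Weighted coverage is taken to be the best IoU of each ground-truth instance with any prediction, weighted by that instance's area. Recall and precision @ 50% are taken to be the fraction of ground-truth (or predicted) instances that have a counterpart with IoU above 50%. The helpers and toy masks are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Mask = FrozenSet[Tuple[int, int]]  # set of (row, col) pixels


def iou(a: Mask, b: Mask) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def weighted_coverage(gts: List[Mask], preds: List[Mask]) -> float:
    """Best IoU of each ground-truth instance, weighted by its area."""
    total_area = sum(len(g) for g in gts)
    if total_area == 0:
        return 0.0
    covered = sum(len(g) * max((iou(g, p) for p in preds), default=0.0) for g in gts)
    return covered / total_area


def recall_at(gts: List[Mask], preds: List[Mask], thr: float = 0.5) -> float:
    """Fraction of ground-truth instances matched by some prediction with IoU > thr."""
    return sum(any(iou(g, p) > thr for p in preds) for g in gts) / len(gts) if gts else 0.0


def precision_at(gts: List[Mask], preds: List[Mask], thr: float = 0.5) -> float:
    """Fraction of predictions that match some ground-truth instance with IoU > thr."""
    return sum(any(iou(p, g) > thr for g in gts) for p in preds) / len(preds) if preds else 0.0


G1 = frozenset((0, c) for c in range(4))   # 4-pixel building
G2 = frozenset({(2, 0), (2, 1)})           # 2-pixel building
P1 = frozenset((0, c) for c in range(3))   # IoU 0.75 with G1
P2 = frozenset({(2, 0), (3, 0), (3, 1)})   # IoU 0.25 with G2
print(round(weighted_coverage([G1, G2], [P1, P2]), 3))                # 0.583
print(recall_at([G1, G2], [P1, P2]), precision_at([G1, G2], [P1, P2]))  # 0.5 0.5
```

Weighting by area means that poorly covering a large building costs more than poorly covering a small one.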
supervision