Deep Watershed Transform for Instance Segmentation, Min Bai & Raquel Urtasun

SLIDE 1

Deep Watershed Transform for Instance Segmentation

Min Bai & Raquel Urtasun

To appear at IEEE CVPR 2017 in Hawaii
Presented at NVIDIA GTC 2017

SLIDE 2

Semantic Segmentation

  • Input: RGB Image
  • Output at each pixel:
    ○ Semantic label

SLIDE 3

Instance Segmentation

  • Input: RGB Image
  • Output at each pixel:
    ○ Semantic label
    ○ Instance label
      ■ Same for every pixel within an object
      ■ Different between objects
    ○ Difficulty: how should the problem be formulated?
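One way to see why the problem is hard to formulate: unlike semantic labels, instance IDs carry no meaning of their own, so any one-to-one renaming of the IDs describes exactly the same segmentation. The sketch below (plain Python; the function names and toy label maps are hypothetical) checks whether two instance maps are equivalent up to such a renaming.

```python
def flatten(label_map):
    """Turn a 2D list of per-pixel labels into a flat list."""
    return [v for row in label_map for v in row]


def same_partition(a, b):
    """True if two instance maps group pixels identically.

    Instance IDs are arbitrary, so two maps are equivalent when a one-to-one
    renaming of IDs turns one into the other.
    """
    flat_a, flat_b = flatten(a), flatten(b)
    if len(flat_a) != len(flat_b):
        return False
    forward, backward = {}, {}
    for x, y in zip(flat_a, flat_b):
        # Each ID must always pair with the same partner ID, in both directions.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


# Toy 2x4 instance maps: 0 = background, other numbers = instance IDs.
gt = [[0, 1, 1, 2],
      [0, 1, 1, 2]]
pred_renamed = [[0, 7, 7, 3],
                [0, 7, 7, 3]]  # same two objects, different IDs
pred_merged = [[0, 5, 5, 5],
               [0, 5, 5, 5]]   # the two objects collapsed into one

print(same_partition(gt, pred_renamed))  # True
print(same_partition(gt, pred_merged))   # False
```

Because the numbering is arbitrary, a network cannot simply be trained to output "instance 1" or "instance 2" at each pixel the way it outputs a semantic class.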

SLIDE 4

Applications

  • Object tracking

Image credit: Davi Frossard

SLIDE 5

Applications

  • Interacting with the environment

Image credit: http://www.rethinkrobotics.com/build-a-bot/

SLIDE 6

Applications

  • Provides useful information for other algorithms, such as optical flow

Image credit: Shenlong Wang

SLIDE 7

Semantic Segmentation

  • Semantic segmentation is a well-studied problem
    ○ Our instance segmentation method builds on an existing technique:
    ○ H. Zhao et al., Pyramid Scene Parsing Network, https://arxiv.org/abs/1612.01105

Image credit: H. Zhao et al.

SLIDE 8

Watershed Transform

  • Classical image segmentation technique

Image (left) credit: Adrian Fisher
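The slide only names the classical watershed transform. The usual intuition: treat the image as a landscape, flood it from its basins, and let region boundaries form where floods from different basins meet. Below is a minimal sketch of one common variant, marker-based watershed by priority flooding (in the style of Meyer's flooding algorithm), in plain Python. The function name, the toy landscape, and the seed markers are hypothetical; this variant assigns every pixel to a basin and draws no explicit watershed lines. It is the classical algorithm, not the learned approach of the later slides.

```python
import heapq


def watershed(elevation, markers):
    """Marker-based watershed by priority flooding.

    elevation: 2D list of numbers (the landscape to flood).
    markers:   2D list of ints, 0 = unlabeled, k > 0 = seed of basin k.
    Returns a new 2D list in which every pixel connected to a seed carries the
    label of the basin that reached it first. No watershed lines are drawn.
    """
    h, w = len(elevation), len(elevation[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0  # `order` breaks ties so the flooding is deterministic
    for r in range(h):
        for c in range(w):
            if labels[r][c] > 0:
                heapq.heappush(heap, (elevation[r][c], order, r, c))
                order += 1
    while heap:
        # Always expand the lowest pixel on the current flood front.
        _, _, r, c = heapq.heappop(heap)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (elevation[nr][nc], order, nr, nc))
                order += 1
    return labels


# Two valleys (elevation 0) separated by a ridge (elevation 3).
landscape = [[1, 0, 1, 3, 1, 0, 1],
             [1, 0, 1, 3, 1, 0, 1]]
seeds = [[0, 1, 0, 0, 0, 2, 0],
         [0, 0, 0, 0, 0, 0, 0]]
for row in watershed(landscape, seeds):
    print(row)
```

Columns 0 to 2 end up in basin 1 and columns 4 to 6 in basin 2; the ridge column goes to whichever flood reaches it first.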

SLIDE 9

Scalar Field and Gradient

Image source: Wikipedia, by Vivekj78 (own work), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15346899

  • Scalar field: a single number at each pixel
  • Gradient: a vector at each pixel, pointing in the direction of greatest ascent
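To make the two definitions concrete, the sketch below estimates the gradient of a small 2D scalar field with finite differences and normalizes each vector to a unit direction. It is plain Python; the function names and the linear test field are illustrative choices, and central differences are just one standard way to approximate a gradient on a pixel grid.

```python
import math


def gradient(field):
    """Finite-difference gradient (d/drow, d/dcol) of a 2D scalar field.

    Central differences in the interior, one-sided differences at the borders.
    """
    h, w = len(field), len(field[0])
    gy = [[0.0] * w for _ in range(h)]
    gx = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            r0, r1 = max(r - 1, 0), min(r + 1, h - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, w - 1)
            gy[r][c] = (field[r1][c] - field[r0][c]) / max(r1 - r0, 1)
            gx[r][c] = (field[r][c1] - field[r][c0]) / max(c1 - c0, 1)
    return gy, gx


def unit_direction(gy, gx, eps=1e-12):
    """Normalize each gradient vector; flat pixels get the zero vector."""
    out = []
    for row_y, row_x in zip(gy, gx):
        out_row = []
        for dy, dx in zip(row_y, row_x):
            norm = math.hypot(dy, dx)
            out_row.append((dy / norm, dx / norm) if norm > eps else (0.0, 0.0))
        out.append(out_row)
    return out


# Scalar field f(row, col) = row + 2 * col: it rises fastest toward +col.
field = [[r + 2 * c for c in range(4)] for r in range(3)]
gy, gx = gradient(field)
print(gy[1][1], gx[1][1])            # 1.0 2.0
print(unit_direction(gy, gx)[1][1])  # approximately (0.447, 0.894)
```

The unit vector (1, 2)/√5 points in the direction of greatest ascent of f; its length is discarded, which is exactly the "direction" quantity used later in the deck.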

SLIDE 10

Overview of Approach

[Figure: Input Image, Semantic Segmentation, Gradient of Energy Landscape, Energy Landscape, Predicted Instances]

SLIDE 11

Overview of Approach

[Figure: Input Image, Semantic Segmentation, Gradient of Energy Landscape, Energy Landscape, Predicted Instances]

SLIDE 12

Why Predict Direction First?

[Figure: Input Image, Energy Landscape, Direction of Gradient]

The direction label changes much more sharply across object boundaries than the energy does!
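The claim above can be checked on a one-dimensional toy row. Assuming the energy is the distance to the nearest pixel with a different label, and the direction is the sign of that energy's ascent (both simplifications chosen for illustration; the helper name is hypothetical), the two pixels on either side of the boundary between touching objects have identical energy but opposite directions.

```python
def energy_and_direction(labels):
    """1D toy: per-pixel energy and direction for a row of instance labels.

    energy    = distance to the nearest pixel with a different label
    direction = +1 / -1 pointing away from that nearest boundary
                (0 on background and where both sides are equally far)
    Assumes every foreground pixel has some differently labeled pixel in the row.
    """
    energy, direction = [], []
    for i, lab in enumerate(labels):
        if lab == 0:
            energy.append(0)
            direction.append(0)
            continue
        others = [j for j, other in enumerate(labels) if other != lab]
        d = min(abs(i - j) for j in others)
        nearest = [j for j in others if abs(i - j) == d]
        boundary_left = any(j < i for j in nearest)
        boundary_right = any(j > i for j in nearest)
        if boundary_left and boundary_right:
            direction.append(0)
        else:
            direction.append(1 if boundary_left else -1)
        energy.append(d)
    return energy, direction


# Two objects (IDs 1 and 2) touching between positions 3 and 4.
labels = [0, 1, 1, 1, 2, 2, 2, 0]
energy, direction = energy_and_direction(labels)
print(energy)     # [0, 1, 2, 1, 1, 2, 1, 0]
print(direction)  # [0, 1, 0, -1, 1, 0, -1, 0]
```

At the boundary between positions 3 and 4 the energy is 1 on both sides, so it offers no contrast there, whereas the direction flips from -1 to +1.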

SLIDE 13

Overall Network

SLIDE 14

Direction Prediction Network

[Figure: Input Image, Semantic Segmentation, Ground Truth Directions, Predicted Directions]
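The slides do not state the loss used to train the direction network. As one plausible illustration (an assumption, not a method confirmed here), predicted and ground-truth direction fields can be compared by the angle between their vectors, averaged over foreground pixels. The function names and the flat lists of (dy, dx) vectors are hypothetical.

```python
import math


def angular_error(gt, pred, eps=1e-12):
    """Angle in radians between two 2D direction vectors (dy, dx)."""
    gy, gx = gt
    py, px = pred
    norm_g, norm_p = math.hypot(gy, gx), math.hypot(py, px)
    if norm_g < eps or norm_p < eps:
        return 0.0  # assumption: an undefined direction contributes no error
    cos = (gy * py + gx * px) / (norm_g * norm_p)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


def mean_squared_angular_error(gt_dirs, pred_dirs, mask):
    """Average squared angle over pixels where mask is True (e.g. foreground)."""
    errors = [angular_error(g, p) ** 2
              for g, p, keep in zip(gt_dirs, pred_dirs, mask) if keep]
    return sum(errors) / len(errors) if errors else 0.0


gt_dirs = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0)]
pred_dirs = [(0.0, 2.0), (0.0, 1.0), (0.0, 1.0)]
mask = [True, True, False]  # third pixel is background and is ignored
print(mean_squared_angular_error(gt_dirs, pred_dirs, mask))  # pi**2 / 8, about 1.234
```

Measuring angles rather than raw vector differences makes the comparison independent of the predicted vector's length, which fits the idea of predicting a direction only.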

SLIDE 15

Energy Prediction Network

[Figure: Ground Truth Energy, Predicted Energy, Ground Truth Instances, Predicted Instances]
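To illustrate how an energy map can separate touching objects, the sketch below builds a quantized energy map from a toy ground-truth instance map, then recovers instances by thresholding the energy and taking connected components. The distance definition, the number of levels, and the cut level of 2 are illustrative assumptions, not values from these slides; in practice a predicted energy map would replace the ground-truth one. All names are hypothetical.

```python
from collections import deque


def energy_map(instances, levels=4):
    """Quantized 'energy': distance to the nearest pixel with another label.

    Background (label 0) gets energy 0. Distances are Manhattan (4-connected)
    and clipped to `levels`; both are simplifying assumptions for this sketch.
    """
    h, w = len(instances), len(instances[0])
    energy = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            lab = instances[r][c]
            if lab == 0:
                continue
            d = min((abs(r - rr) + abs(c - cc)
                     for rr in range(h) for cc in range(w)
                     if instances[rr][cc] != lab), default=levels)
            energy[r][c] = min(d, levels)
    return energy


def connected_components(mask):
    """Count 4-connected regions of True pixels; return (count, label grid)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return count, labels


# Two touching 3x3 objects: one blob in the semantic mask, two instances.
instances = [[0, 0, 0, 0, 0, 0, 0, 0],
             [0, 1, 1, 1, 2, 2, 2, 0],
             [0, 1, 1, 1, 2, 2, 2, 0],
             [0, 1, 1, 1, 2, 2, 2, 0],
             [0, 0, 0, 0, 0, 0, 0, 0]]
energy = energy_map(instances)
semantic_mask = [[lab > 0 for lab in row] for row in instances]
cut_mask = [[e >= 2 for e in row] for row in energy]  # hypothetical cut level
print(connected_components(semantic_mask)[0])  # 1
print(connected_components(cut_mask)[0])       # 2
```

Connected components of the semantic mask merge the two touching objects, while cutting the energy above the boundary ring leaves one seed per object. The seeds cover only object interiors, so a further step such as growing or dilating them would be needed to recover full masks.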

SLIDE 16

Training and Inference

  • Pre-train both networks
  • End-to-end fine-tuning
  • Network trained on an NVIDIA DGX-1
    ○ Approximately 25 hours of training in total on a single GP100 GPU
    ○ ~0.1 s per image for the forward pass
    ○ Thank you, NVIDIA, for the generous gift!

Image source: www.nvidia.com

SLIDE 17

Cityscapes Dataset

  • 2975 training / 500 validation / 1525 testing images
  • Instances: car, truck, bus, train, person, rider, motorcycle, bicycle

SLIDE 18

Cityscapes Dataset

  • 2975 training / 500 validation / 1525 testing images
  • Instances: car, truck, bus, train, person, rider, motorcycle, bicycle

SLIDE 19

Cityscapes Instance Segmentation Leaderboard

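Method                  AP*     AP* @ 50%   AP* @ 50m   AP* @ 100m
van den Brand et al.    2.3%    3.7%        3.9%        4.9%
Cordts et al.           4.6%    12.9%       7.7%        10.3%
Uhrig et al.            8.9%    21.1%       15.3%       16.7%
Ours                    19.4%   35.3%       31.4%       36.8%

* Average Precision (AP): higher is better. AP is averaged over IoU thresholds from 50% to 95%; AP @ 50% uses a single 50% IoU threshold; AP @ 50m and AP @ 100m consider only objects within 50 m and 100 m, respectively.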

Recently, new approaches have achieved even higher performance.

SLIDE 20

Sample Output

[Figure: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]

SLIDE 21

Sample Output

[Figure: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]

SLIDE 22

Sample Output

[Figure: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]

SLIDE 23

Sample Output

[Figure: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]

SLIDE 24

Sample Output

[Figure: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]

SLIDE 25

Sample Output

[Figure: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]

SLIDE 26

Preliminary TorontoCity Aerial Instance Segmentation

[Figure: Input RGB, Semantic Segmentation (ResNet), Predicted Building Instances]

SLIDE 27

Preliminary TorontoCity Aerial Instance Segmentation

Method       Weighted Coverage*   AP*      Recall* @ 50%   Precision* @ 50%
FCN-8        41.92%               11.37%   21.50%          36.00%
ResNet-56    40.65%               12.13%   18.90%          45.36%
Ours         56.22%               21.22%   67.16%          63.67%

* Higher is better.
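As a rough guide to the columns above, the sketch computes precision and recall at 50% IoU and an area-weighted coverage score on toy masks. It assumes the "@ 50%" columns refer to a 50% IoU threshold, uses a simple greedy one-to-one matching rule, and takes weighted coverage to be each ground-truth instance's best IoU with any prediction, weighted by that instance's area; the exact TorontoCity evaluation protocol may differ. Masks are sets of (row, col) pixels, and all names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precision_recall_at(preds, gts, thresh=0.5):
    """Greedy one-to-one matching of predictions to ground truth at an IoU threshold."""
    matched, tp = set(), 0
    for p in preds:
        best_j, best_iou = None, thresh
        for j, g in enumerate(gts):
            overlap = iou(p, g)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


def weighted_coverage(preds, gts):
    """Best IoU per ground-truth instance, weighted by that instance's area."""
    total_area = sum(len(g) for g in gts)
    covered = sum(len(g) * max((iou(g, p) for p in preds), default=0.0)
                  for g in gts)
    return covered / total_area if total_area else 0.0


gts = [{(0, 0), (0, 1), (0, 2), (0, 3)},  # 4-pixel building
       {(2, 0), (2, 1)}]                  # 2-pixel building
preds = [{(0, 0), (0, 1), (0, 2)},        # IoU 0.75 with the first building
         {(2, 1), (3, 1), (4, 1)},        # IoU 0.25 with the second
         {(5, 5)}]                        # false positive
print(precision_recall_at(preds, gts))    # precision 1/3, recall 0.5
print(weighted_coverage(preds, gts))      # (4 * 0.75 + 2 * 0.25) / 6, about 0.583
```

Coverage rewards partial overlaps that fall below the 50% cut, which is one reason it can move differently from precision and recall.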

SLIDE 28

In Summary...

  • Simple technique for instance segmentation
  • Encodes object instances as an energy map
  • Predicts the gradient direction as an intermediate task for better supervision