Pixel-Level Image Understanding with Semantic Segmentation - PowerPoint PPT Presentation




slide-1
SLIDE 1

Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation

Hengshuang Zhao The Chinese University of Hong Kong May 29, 2019

slide-2
SLIDE 2

Part I: Semantic Segmentation

slide-3
SLIDE 3

Semantic Segmentation

Figure: original images and their per-pixel annotations (labels: person, horse, car, background). Images adapted from PASCAL VOC 2012 and ADE20K.

slide-4
SLIDE 4

Fully Convolutional Network

FCN [Long et al. 2015]

slide-5
SLIDE 5

Conditional Random Field

DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]

slide-6
SLIDE 6

Encoder-Decoder

UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016], RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]

slide-7
SLIDE 7

Atrous Convolution / Dilated Convolution

DeepLabV1 [Chen et al. 2015], Dilation [Yu and Koltun 2016]
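
To make the idea concrete, here is a minimal 1-D sketch of an atrous (dilated) convolution in plain Python. It is an illustration only, not code from the cited networks; the function names `dilated_conv1d` and `receptive_field` are hypothetical. It shows that inserting gaps between kernel taps widens the receptive field without adding weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with `dilation - 1` gaps between taps."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilation):
    """Number of input samples seen by a single output of one dilated layer."""
    return (kernel_size - 1) * dilation + 1


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # same three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)   # taps at offsets 0, 1, 2
atrous = dilated_conv1d(signal, kernel, dilation=2)  # taps at offsets 0, 2, 4
```

With the same three weights, the dilated version compares samples four positions apart instead of two, which is the property segmentation networks use to enlarge context while keeping feature-map resolution.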

slide-8
SLIDE 8

Context Aggregation

Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016]
Large Kernel: GCN [Peng et al. 2017]
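
As a rough illustration of pooling-based context aggregation, the sketch below average-pools one feature map into grids of several sizes, in the spirit of pyramid pooling. It is a simplified assumption-laden example rather than the PSPNet implementation: the helper names and the bin sizes `(1, 2)` are hypothetical, and the upsampling and concatenation that would normally follow are omitted.

```python
import math


def adaptive_avg_pool2d(fmap, bins):
    """Average-pool a 2-D list `fmap` into a `bins` x `bins` grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for by in range(bins):
        y0, y1 = (by * h) // bins, math.ceil((by + 1) * h / bins)
        row = []
        for bx in range(bins):
            x0, x1 = (bx * w) // bins, math.ceil((bx + 1) * w / bins)
            cells = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap, bin_sizes=(1, 2)):
    """Return one pooled grid per bin size: coarse grids summarize global context."""
    return {b: adaptive_avg_pool2d(fmap, b) for b in bin_sizes}


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
pyramid = pyramid_pool(fmap)
```

The 1 x 1 level is a global average, while finer levels keep coarse spatial layout; a real network would apply this per channel and fuse the levels with the original features.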

slide-9
SLIDE 9

Neural Architecture Search

Search for backbone: Auto-DeepLab [Liu et al. 2019]
Search for head: DPC [Chen et al. 2018]

slide-10
SLIDE 10

Attention Mechanism

Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018], OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]
Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018]
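
The dot-product form of spatial attention can be sketched as follows. This is a deliberately stripped-down illustration, assuming the raw features serve as queries, keys, and values; the cited networks add learned projections and other details that are not reproduced here. The function name `spatial_self_attention` is hypothetical.

```python
import math


def spatial_self_attention(features):
    """features: list of N feature vectors, one per spatial position (H*W flattened).

    Each output vector is a softmax-weighted average of all input vectors,
    with weights given by dot-product similarity to the query position.
    """
    out = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(query)
        out.append([sum(wt * feat[d] for wt, feat in zip(weights, features))
                    for d in range(dim)])
    return out


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = spatial_self_attention(features)
```

Every position attends to every other position, so the cost grows quadratically with the number of pixels, which is one motivation for the cheaper variants in the list above.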

slide-11
SLIDE 11

Point-wise Spatial Attention Network (PSANet)

  • Conv & Dilated Conv: fixed grid; information flow is restricted to local regions
  • Pooling Operation: fixed weights at each position, not adaptive
  • Feature Correlation: relative position information is ignored
  • Point-wise Spatial Attention:
      • Long-range context aggregation for dense prediction
      • Bi-directional information propagation
      • Self-adaptively learned, location-sensitive masks
slide-12
SLIDE 12

Point-wise Spatial Attention Network

slide-13
SLIDE 13

Point-wise Spatial Attention Network

Information collection branch / Information distribution branch

Over-completed / Compact

slide-14
SLIDE 14

Point-wise Spatial Attention Network

Information collection branch / Information distribution branch

Over-completed / Compact

feature fusion: local & global

slide-15
SLIDE 15

Attention Mask Generation

slide-16
SLIDE 16

Incorporation with FCN

slide-17
SLIDE 17

Results on ADE20K and VOC 2012

  • ADE20K: information aggregation approaches
  • ADE20K: result on val set
  • PASCAL VOC 2012: result on val set
  • PASCAL VOC 2012: result on val set

slide-18
SLIDE 18

Results on Cityscapes

  • result on val set
  • result on test set (train with fine set)
  • result on test set (train with fine+coarse set)

slide-19
SLIDE 19

Visual Prediction on ADE20K

slide-20
SLIDE 20

Visual Prediction on VOC 2012

slide-21
SLIDE 21

Visual Prediction on Cityscapes

slide-22
SLIDE 22

Mask Visualization

slide-23
SLIDE 23

Part II: Panoptic Segmentation

slide-24
SLIDE 24

Semantic Segmentation

semantic segmentation: instances indistinguishable

slide-25
SLIDE 25

Instance Segmentation

instance segmentation: stuff unsolved

slide-26
SLIDE 26

Panoptic Segmentation

panoptic segmentation: stuff and things are solved, instances distinguishable

slide-27
SLIDE 27

Heuristic Combination

Instance: Mask R-CNN [He et al. 2017]
Semantic: PSPNet [Zhao et al. 2017]

redundant computation for independent models

slide-28
SLIDE 28

Heuristic Combination

Instance: Mask R-CNN [He et al. 2017]
Semantic: PSPNet [Zhao et al. 2017]

Heuristic Merge

heuristic merge logic is not end-to-end trainable
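
The slide does not spell out the merge rules, so the following is only one plausible sketch of such a heuristic, under stated assumptions: instances are pasted in descending score order, pixels already claimed are not overwritten, and leftover pixels take the semantic prediction if it is a stuff class. All names (`heuristic_merge`, the dictionary keys, the class ids) are hypothetical.

```python
def heuristic_merge(semantic, instances, stuff_classes):
    """Combine a semantic map and instance masks into (class_id, instance_id) labels.

    semantic:      2-D list of predicted class ids
    instances:     list of dicts with "score", "class_id" and "pixels" (set of (y, x))
    stuff_classes: set of class ids treated as stuff
    Pixels covered by neither an instance nor a stuff class stay None (void).
    """
    h, w = len(semantic), len(semantic[0])
    panoptic = [[None] * w for _ in range(h)]

    # Hand-written rule 1: higher-scoring instances win overlaps.
    ordered = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for inst_id, inst in enumerate(ordered, start=1):
        for (y, x) in inst["pixels"]:
            if panoptic[y][x] is None:
                panoptic[y][x] = (inst["class_id"], inst_id)

    # Hand-written rule 2: remaining pixels fall back to stuff predictions.
    for y in range(h):
        for x in range(w):
            if panoptic[y][x] is None and semantic[y][x] in stuff_classes:
                panoptic[y][x] = (semantic[y][x], 0)
    return panoptic


semantic = [[0, 0, 1],
            [0, 1, 1]]  # 0 = road (stuff), 1 = car (thing)
instances = [
    {"score": 0.6, "class_id": 1, "pixels": {(1, 1), (1, 2)}},
    {"score": 0.9, "class_id": 1, "pixels": {(0, 2), (1, 2)}},
]
merged = heuristic_merge(semantic, instances, stuff_classes={0})
```

Both rules are fixed, non-differentiable decisions applied after the two networks have run, which is why a merge of this kind cannot be trained end to end.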

slide-29
SLIDE 29

heuristic combination

slide-30
SLIDE 30

our end-to-end output

slide-31
SLIDE 31

Unified Panoptic Segmentation Network (UPSNet)

Unified Backbone Network: Save Computation!
Pixel-wise Classification: Consistent Estimation!

slide-32
SLIDE 32

Semantic & Instance Head

Semantic Head: FPN with Deformable Conv
Instance Head: Same as Mask R-CNN

slide-33
SLIDE 33

Panoptic Head

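Mask logits from Instance head: $Y_i$ (resize/pad)

Thing & Stuff logits from Semantic head: $X_{\text{thing}}$, $X_{\text{mask}_i}$, $X_{\text{stuff}}$

Panoptic logits: $N_{\text{inst}}$ instance channels of size $H \times W$ and $N_{\text{stuff}}$ stuff channels of size $H \times W$

Logits for Unknown: 1 channel, formed from two max operations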
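
The diagram can be read as the following parameter-free computation. This sketch assumes the formulation described in the UPSNet paper, which the slide itself does not state: each instance channel adds the thing logits of its class inside its box to its resized mask logits, and the unknown channel is the per-pixel maximum thing logit minus the maximum over the instance-cropped thing logits. All function and key names are hypothetical, and mask logits are assumed to be already resized and padded to H x W.

```python
def panoptic_logits(x_stuff, x_thing, instances):
    """Assemble panoptic logit channels: [stuff..., instances..., unknown].

    x_stuff:   list of N_stuff maps (H x W) from the semantic head
    x_thing:   list of thing-class maps (H x W) from the semantic head
    instances: list of dicts with "thing_index", "box" (y0, x0, y1, x1, end-exclusive)
               and "mask_logits" (H x W, already resized/padded)
    """
    h, w = len(x_stuff[0]), len(x_stuff[0][0])
    channels = [[row[:] for row in m] for m in x_stuff]
    x_masks = []
    for inst in instances:
        y0, x0, y1, x1 = inst["box"]
        cls_map = x_thing[inst["thing_index"]]
        # Thing logits of the instance's class, kept only inside its box.
        x_mask = [[cls_map[y][x] if y0 <= y < y1 and x0 <= x < x1 else 0.0
                   for x in range(w)] for y in range(h)]
        x_masks.append(x_mask)
        channels.append([[x_mask[y][x] + inst["mask_logits"][y][x]
                          for x in range(w)] for y in range(h)])
    # Assumed unknown channel: max over thing logits minus max over cropped logits.
    unknown = [[max(m[y][x] for m in x_thing)
                - max((m[y][x] for m in x_masks), default=0.0)
                for x in range(w)] for y in range(h)]
    channels.append(unknown)
    return channels


def argmax_per_pixel(channels):
    """Index of the winning channel at every pixel."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[max(range(len(channels)), key=lambda c: channels[c][y][x])
             for x in range(w)] for y in range(h)]


x_stuff = [[[2.0, 0.0, 0.0]]]   # one stuff class, image of size 1 x 3
x_thing = [[[0.0, 3.0, 3.0]]]   # one thing class
instances = [{"thing_index": 0, "box": (0, 1, 1, 2),
              "mask_logits": [[-5.0, 4.0, -5.0]]}]
logits = panoptic_logits(x_stuff, x_thing, instances)
labels = argmax_per_pixel(logits)
```

In this toy case the last pixel looks like a thing to the semantic head but is not covered by any detected instance, so the unknown channel wins there instead of forcing a stuff or instance label.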

slide-34
SLIDE 34

Performance Comparison

Plots: results on COCO (800 x 1300) and results on Cityscapes (1024 x 2048), each comparing UPSNet with MR-CNN-PSP.

slide-35
SLIDE 35

Detailed Results

  • result on COCO
  • result on Cityscapes
  • result on internal data
  • run time comparison

slide-36
SLIDE 36

Visual Prediction

  • result on COCO
  • result on Cityscapes

slide-37
SLIDE 37

Code Resources

I. Semantic Segmentation:

  • Caffe:
      • https://github.com/hszhao/PSPNet
      • https://github.com/hszhao/PSANet
      • https://github.com/hszhao/ICNet
  • PyTorch:
      • https://github.com/hszhao/semseg (new)
      • highly optimized codebase with better reimplementation results

II. Panoptic Segmentation:

  • PyTorch:
      • https://github.com/uber-research/UPSNet
      • the first open-sourced codebase for unified end-to-end panoptic segmentation
slide-38
SLIDE 38

Remaining Problems

I. Semantic Segmentation:

  • imbalanced classes: long-tail distribution
  • confusable classes: use the human annotators' confusion matrix (e.g., from ADE20K) as a prior
  • data augmentation: adaptive augmentation or auto augmentation
  • hard example mining: effective but not elegant
  • robustness and generalization: one model for different datasets
  • accuracy and efficiency: can both be achieved?

II. Panoptic Segmentation:

  • introduce learnable parameters into the panoptic head (e.g., 3D conv)
  • new frameworks with a single panoptic head
slide-39
SLIDE 39

Thanks!