Context Encoding for Semantic Segmentation, CVPR 2018, Salt Lake City

SLIDE 1

Context Encoding for Semantic Segmentation (EncNet)

Hang Zhang¹,², Kristin Dana¹, Jianping Shi³, Zhongyue Zhang², Xiaogang Wang⁴, Ambrish Tyagi², and Amit Agrawal²

¹Rutgers University, ²Amazon Inc, ³SenseTime, ⁴CUHK

CVPR 2018, Salt Lake City

SLIDE 2

Semantic Segmentation

  • Per-pixel predictions of object categories
  • A comprehensive scene description (object category, location and shape)

Examples from ADE20K Dataset.

SLIDE 3

Fully Convolutional Network [1] (FCN)

  • Meta-algorithm for semantic segmentation
  • Pre-trained CNN + Decoder
  • Translation equivariant

Figure credit: Long et al.

[1] Jonathan Long, Evan Shelhamer, and Trevor Darrell. “Fully Convolutional Networks for Semantic Segmentation”. CVPR 2015.

SLIDE 4

Difficulties in Predicting Categories and Shapes

  • Work refining shapes/boundaries:
      • Dilated/Atrous Convolution [2,3]
      • CRF Post-processing [4]
      • Adding Lateral/Skip Connections [5]
      • Enlarging Spatial Resolution [6]
  • Identifying categories remains difficult

[2] Chen et al. “Rethinking Atrous Convolution for Semantic Image Segmentation”. arXiv 2017.
[3] Fisher Yu and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions”. ICLR 2016.
[4] Shuai Zheng et al. “Conditional random fields as recurrent neural networks”. ICCV 2015.
[5] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. “SegNet: A deep convolutional encoder-decoder architecture for image segmentation”.
[6] Tobias Pohlen et al. “Full-resolution residual networks for semantic segmentation in street scenes”. CVPR 2017.

SLIDE 5

Challenges in Understanding Context

FCN results on the ADE20K dataset (ResNet-50, stride 8).

SLIDE 6

Increasing Receptive Field?

Using pyramid representations

  • PSPNet [7]: Spatial Pyramid Pooling
  • DeepLab-v3 [8]: large-rate Dilated/Atrous convolutions

“Is capturing contextual information the same as increasing the receptive-field size?”

Figure credit: Zhao et al.

[7] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia. “Pyramid Scene Parsing Network”. CVPR 2017.
[8] Chen et al. “Rethinking Atrous Convolution for Semantic Image Segmentation”. arXiv 2017.

SLIDE 7

Labeling an Image

Consider labeling a new image for the ADE20K dataset with 150 categories.

Scene Context:

SLIDE 8

Design a “Labeling Tool” for CNN

  • Scene Context
  • Narrowing the list of probable categories

Examples from ADE20K Dataset.

SLIDE 9

Capturing Contextual Info in Computer Vision

  • Classic pipeline (BoWs, VQ or VLAD): Feature extraction → Dictionary Learning → Encoding → Classifier
  • Deep TEN [9]: CNN → Encoding-Layer (Dictionary, Residuals, Assign, Aggregate)

Code available on GitHub

[9] Hang Zhang, Jia Xue, Kristin Dana. “Deep TEN: Texture Encoding Network”. CVPR 2017.

SLIDE 10

Context Encoding

  • Encoding Layer [9]
      • Considers $X \in \mathbb{R}^{C \times H \times W}$ as a set of $C$-dimensional features $X = \{x_1, \dots, x_N\}$, where $N = H \times W$
      • Learns a codebook $D = \{d_1, \dots, d_K\}$ and smoothing factors $S = \{s_1, \dots, s_K\}$
      • Outputs the residual encoder $e_k = \sum_{i=1}^{N} e_{ik}$, with

$$e_{ik} = \frac{\exp(-s_k \|r_{ik}\|^2)}{\sum_{j=1}^{K} \exp(-s_j \|r_{ij}\|^2)} \, r_{ik},$$

where the residuals are given by $r_{ik} = x_i - d_k$.

[9] Hang Zhang, Jia Xue, Kristin Dana. “Deep TEN: Texture Encoding Network”. CVPR 2017.
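The residual encoder above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `encoding_layer`, the list-based data layout, and the toy values are hypothetical, and the codebook and smoothing factors are fixed here rather than learned.

```python
import math

def encoding_layer(features, codebook, smoothing):
    """Aggregate N C-dimensional features into K residual encoders e_k.

    features : list of N feature vectors x_i (each a list of C floats)
    codebook : list of K codewords d_k (each a list of C floats)
    smoothing: list of K smoothing factors s_k
    """
    K, C = len(codebook), len(codebook[0])
    encoders = [[0.0] * C for _ in range(K)]
    for x in features:
        # Residuals r_ik = x_i - d_k for every codeword.
        residuals = [[xc - dc for xc, dc in zip(x, d)] for d in codebook]
        # Soft-assignment logits -s_k * ||r_ik||^2.
        logits = [-s * sum(r * r for r in res)
                  for s, res in zip(smoothing, residuals)]
        # Softmax over codewords, shifted by the max logit for stability.
        shift = max(logits)
        weights = [math.exp(l - shift) for l in logits]
        total = sum(weights)
        for k in range(K):
            a = weights[k] / total
            for c in range(C):
                encoders[k][c] += a * residuals[k][c]
    return encoders

# Toy example (hypothetical values): N = 4 features, C = 2, K = 2 codewords.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
encoders = encoding_layer(features, codebook, smoothing=[1.0, 1.0])
```

The output has a fixed size of K × C regardless of N, so the encoding does not depend on the spatial resolution of the feature map.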

SLIDE 11

Context Encoding Network (EncNet)

[Figure: EncNet architecture diagram with blocks labeled CNN, Context Encoding Module (Encode, FC), CONV, and an FC branch with SE-loss; feature-map dimensions labeled C, H, W.]

Notation: FC, fully connected layer; Conv, convolutional layer; Encode, Encoding Layer [9]; ⨂, channel-wise multiplication.

[9] Hang Zhang, Jia Xue, Kristin Dana. “Deep TEN: Texture Encoding Network”. CVPR 2017.
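The slide names the SE-loss branch without defining it. The sketch below assumes, following the EncNet paper rather than anything stated on the slide, that the SE-loss is a binary cross-entropy between sigmoid outputs of the FC branch and targets marking which categories are present in the image. The function names `presence_targets` and `se_loss` and all values are hypothetical.

```python
import math

def presence_targets(label_map, num_classes):
    """Binary vector: 1.0 if a category occurs anywhere in the label map."""
    present = {label for row in label_map for label in row}
    return [1.0 if c in present else 0.0 for c in range(num_classes)]

def se_loss(logits, targets):
    """Mean binary cross-entropy between sigmoid(logits) and the targets."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)

# Toy 2 x 3 ground-truth label map containing categories 0 and 2 only.
label_map = [[0, 0, 2],
             [0, 2, 2]]
targets = presence_targets(label_map, num_classes=4)
# Hypothetical FC-branch outputs that agree with the targets.
loss = se_loss([3.0, -3.0, 3.0, -3.0], targets)
```

Because the target only records presence, a small object contributes to this loss as much as a large one, unlike a per-pixel loss.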

SLIDE 12

Network Training of EncNet

  • ResNet with Dilation Strategy (stride 8)
  • Synchronized Cross-GPU Batch Normalization [10] (SyncBN)

[Figure: “Sync Once” cross-GPU BN implementation. The input $X = \{X_1, X_2, X_3, X_4\}$ and output $Y = \{Y_1, Y_2, Y_3, Y_4\}$ are split across GPU 1 to GPU 4; the GPUs exchange $\sum x_i$ and $\sum x_i^2$.]

[10] Ioffe and Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift”. ICML 2015.

SLIDE 13

Ablation Study of EncNet on PASCAL Context

Semantic segmentation results on the PASCAL-Context dataset (mIoU on 59 classes w/o background).

mIoU and pixAcc as a function of SE-loss weight $\alpha$.
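The two metrics reported here can be illustrated with a small sketch using their standard definitions from a confusion matrix. The helper names and the toy label maps are hypothetical, and benchmark-specific details such as ignored labels or background handling are not modeled.

```python
def confusion_matrix(pred, target, num_classes):
    """conf[t][p] counts pixels of ground-truth class t predicted as class p."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            conf[t][p] += 1
    return conf

def pix_acc(conf):
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(conf[c][c] for c in range(len(conf)))
    total = sum(sum(row) for row in conf)
    return correct / total

def mean_iou(conf):
    """Average of per-class intersection-over-union."""
    ious = []
    for c in range(len(conf)):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(row[c] for row in conf) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Toy 2 x 3 label maps with two classes; one pixel is misclassified.
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
conf = confusion_matrix(pred, target, num_classes=2)
acc = pix_acc(conf)
miou = mean_iou(conf)
```

In this toy case pixAcc is 5/6 while mIoU is lower, since the single error counts against the IoU of both classes.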

SLIDE 14

EncNet Results on PASCAL Context

Segmentation results on the PASCAL-Context dataset (mIoU on 60 classes w/ background).

SLIDE 15

EncNet Results on PASCAL VOC 2012

Results on PASCAL VOC 2012, showing per-class IoU on the first 5 categories.

Results on PASCAL VOC 2012 with COCO pre-training, showing per-class IoU on the first 5 categories.

[11] http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6

SLIDE 16

EncNet Results on ADE20K

Results on ADE20K validation set.

Results on ADE20K test set, with ranks in the COCO-Place challenge 2017. Our single model surpasses the winning entry of the COCO-Place challenge and PSPNet-269 (1st place in 2016).

[12] Leaderboard at http://sceneparsing.csail.mit.edu/

SLIDE 17

Visual Examples of EncNet in ADE20K

SLIDE 18

Conclusion

  • Context Encoding Module with EncNet
      • straightforward, light-weight
      • compatible with FCN based approaches
  • Superior performance on gold-standard benchmarks.
  • The complete systems are publicly available (including SyncBN)
      • Source training/evaluation code and pretrained models
      • https://github.com/zhanghang1989/PyTorch-Encoding
  • Poster #A5

The authors would like to thank Sean Liu from Amazon Lab 126, Sheng Zha and Mu Li from Amazon AI for helpful discussions and comments. We thank Amazon Web Services (AWS) for providing free EC2 access.

SLIDE 19

More EncNet Examples on ADE20K Dataset

SLIDE 20

More EncNet Examples on ADE20K Dataset

SLIDE 21

More EncNet Examples on ADE20K Dataset

SLIDE 22

More EncNet Examples on ADE20K Dataset

SLIDE 24

Prior Work in Featuremap Attention

  • Spatial Attention: Spatial Transformer Network
  • Channel-wise manipulation:
      • AdaIN or MSG-Net in style transfer
      • SE-Net
  • Relations and Differences with SE-Net:
      • Semantic Encoding, an explicit representation for global context
      • EncNet directly highlights the class-dependent features.

SLIDE 25

EncNet Experiments on CIFAR-10

Comparison of model depth, number of parameters, and test errors (%) on CIFAR-10.

Train and validation curves of EncNet-32k64d and the baseline SE-ResNet-64d on the CIFAR-10 dataset.

SLIDE 26

Context Encoding

  • Encoding Layer [9]
      • Outputs the residual encoder as encoded semantics $e = \sum_{k=1}^{K} \phi(e_k)$
  • Featuremap Attention
      • FC on encoded semantics, outputs scaling factors $\gamma = \delta(W e)$, where $W$ is the layer weight and $\delta$ is the sigmoid function.
      • Channel-wise multiplication $Y = X \otimes \gamma$

[9] Hang Zhang, Jia Xue, Kristin Dana. “Deep TEN: Texture Encoding Network”. CVPR 2017.
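The featuremap attention step can be sketched as follows. This is an illustrative sketch under assumptions rather than the released implementation: the function name `featuremap_attention`, the nested-list layout, and the toy weights are hypothetical, and the FC layer has no bias term because the slide's formula shows none.

```python
import math

def featuremap_attention(x, e, weight):
    """Scale each channel of x by gamma = sigmoid(W e).

    x      : feature maps as a list of C channels, each an H x W nested list
    e      : encoded semantics, a list of length D
    weight : FC weight W as C rows of length D (no bias term)
    """
    gamma = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, e))))
             for row in weight]
    # Channel-wise multiplication Y = X (x) gamma.
    y = [[[g * value for value in line] for line in channel]
         for channel, g in zip(x, gamma)]
    return y, gamma

x = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0
     [[1.0, 1.0], [1.0, 1.0]]]   # channel 1
e = [1.0, -1.0]
weight = [[0.0, 0.0],            # W e = 0, so gamma is exactly 0.5
          [10.0, -10.0]]         # W e = 20, so gamma is close to 1
y, gamma = featuremap_attention(x, e, weight)
```

Every spatial position in a channel is scaled by the same factor, so the attention depends only on the global encoded semantics and not on pixel location.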

SLIDE 27

Standard BN and Data Parallelism

[Figure: Batch Normalization [5] in training mode. The forward pass computes $\mu$ and $\sigma^2$ from $x$ to produce $y$; the backward pass propagates $\partial\ell/\partial y$, $\partial\ell/\partial\sigma^2$, $\partial\ell/\partial\mu$ and $\partial\ell/\partial x$.]

[Figure: Standard BN with data parallel implementation. The input $X = \{X_1, X_2, X_3, X_4\}$ is split across GPU 1 to GPU 4; GPU $k$ computes its own $\mu_k$ and $\sigma_k$ and produces $Y_k$, giving $Y = \{Y_1, Y_2, Y_3, Y_4\}$.]

[5] Ioffe and Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift”. ICML 2015.

SLIDE 28

Cross-GPU Batch Norm (“Sync twice”)

  • $\mu = \frac{\sum x_i}{N}$
  • $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N} + \epsilon}$

[Figure: “Sync twice” implementation [6,7] across GPU 1 to GPU 4, synchronizing $\mu$ and $\sigma^2$.]

[6] Peng, Chao, et al. “MegDet: A Large Mini-Batch Object Detector”. CVPR 2018.
[7] Liu, Shu, et al. “Path Aggregation Network for Instance Segmentation”. CVPR 2018.

SLIDE 29

Cross-GPU Batch Norm (“Sync once”)

  • $\mu = \frac{\sum x_i}{N}$
  • $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N} + \epsilon} = \sqrt{\frac{\sum x_i^2}{N} - \mu^2 + \epsilon} = \sqrt{\frac{\sum x_i^2}{N} - \frac{(\sum x_i)^2}{N^2} + \epsilon}$

[Figure: Our “Sync Once” implementation. The input $X = \{X_1, X_2, X_3, X_4\}$ and output $Y = \{Y_1, Y_2, Y_3, Y_4\}$ are split across GPU 1 to GPU 4; the GPUs exchange $\sum x_i$ and $\sum x_i^2$.]

[6] Peng, Chao, et al. “MegDet: A Large Mini-Batch Object Detector”. CVPR 2018.
[7] Liu, Shu, et al. “Path Aggregation Network for Instance Segmentation”. CVPR 2018.
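The “sync once” identity can be checked numerically with a short simulation. This sketch only imitates the GPUs with Python lists and performs no real cross-device communication; the function names `local_sums`, `sync_once_stats`, and `two_pass_stats` and the value of `eps` are assumptions made for illustration.

```python
import math

def local_sums(shard):
    """What each device computes on its own shard: (sum x, sum x^2, count)."""
    return sum(shard), sum(v * v for v in shard), len(shard)

def sync_once_stats(shards, eps=1e-5):
    """Combine per-device sums in a single reduction and return (mu, sigma)."""
    partials = [local_sums(s) for s in shards]
    # Single reduction over the per-device partial sums ("sync once").
    sum_x = sum(p[0] for p in partials)
    sum_x2 = sum(p[1] for p in partials)
    n = sum(p[2] for p in partials)
    mu = sum_x / n
    var = sum_x2 / n - mu * mu
    return mu, math.sqrt(var + eps)

def two_pass_stats(values, eps=1e-5):
    """Reference: mean first, then squared deviations from that mean."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)

shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # four simulated GPUs
mu, sigma = sync_once_stats(shards)
ref_mu, ref_sigma = two_pass_stats([v for s in shards for v in s])
```

Both routes give the same statistics here. In floating point, subtracting $\mu^2$ from the mean of squares can lose precision when the mean is large relative to the spread, which is the usual trade-off of this single-pass form.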

SLIDE 30

More EncNet Examples on ADE20K Dataset

SLIDE 31

More EncNet Examples on ADE20K Dataset

SLIDE 32

Failure Examples of EncNet

SLIDE 33

Highlights

  • Introduce a novel CNN architecture:
      • Context Encoding Network, EncNet (“Ink-Net”)
  • State-of-the-art performance:
      • 85.9% mIoU on PASCAL VOC 2012, 51.7% mIoU on PASCAL Context
      • On the ADE20K dataset, our single model surpasses the winning entry of the COCO Places challenge 2017 (achieving a final score of 0.5567)
  • The complete system is publicly available.
      • Source code and pretrained models