Fast-SCNN: Fast Semantic Segmentation Network. Rudra PK Poudel, Stephan Liwicki, Roberto Cipolla. Cambridge Research Laboratory, Toshiba Research Europe, UK. BMVC 2019.


slide-1
SLIDE 1

Fast-SCNN: Fast Semantic Segmentation Network

Rudra PK Poudel, Stephan Liwicki, Roberto Cipolla

Cambridge Research Laboratory, Toshiba Research Europe, UK

BMVC 2019

  • R. Poudel et al. (CRL)

Fast-SCNN: Fast Semantic Segmentation Network BMVC 2019 1 / 22

slide-2
SLIDE 2

Real-time Semantic Image Segmentation

What am I seeing and where is it? Real-time perception is critical for autonomous systems

slide-4
SLIDE 4

Motivation

Problem: SOTA models are accurate but resource hungry

  • Compute (floating-point operations)
  • Power consumption
  • Memory

Observations:

  1. The first few layers of a DCNN extract low-level features (Zeiler et al., 2014)
  2. A larger receptive field (context) is important for accuracy (Poudel et al., 2018)
  3. Spatial detail is necessary to preserve boundaries (Shelhamer et al., 2016)
  4. SOTA efficient models adopt multi-resolution and multi-branch architectures

slide-5
SLIDE 5

Motivation: First Few Layers Learn Low-level Features

Zeiler et al., ECCV 2014

slide-6
SLIDE 6

Motivation: Importance of Larger Receptive Field

[Figure: two-branch architecture in which a deep network for context and a shallow network for spatial detail are combined. Legend: Convolution Block, Bottleneck Residual Block, Depth-wise Separable Convolution Block, Feature Fusion Unit.]

ContextNet (Poudel et al., BMVC 2018)
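
The depth-wise separable convolution block named in the figure legend is a common way to cut parameters. A minimal sketch of the weight count, assuming a hypothetical 3 x 3 layer with 64 input and 128 output channels (these sizes are illustrative and not taken from the slides):

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep, round(std / sep, 2))  # 73728 8768 8.41
```

For this assumed layer the separable form needs roughly one eighth of the weights of the standard convolution.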

slide-7
SLIDE 7

Motivation: Importance of Spatial Details

[Figure: U-Net architecture, from input image tile to output segmentation map, with copy-and-crop skip connections between the contracting and expanding paths. Legend: conv 3x3 + ReLU, copy and crop, max pool 2x2, up-conv 2x2, conv 1x1.]

U-Net (Ronneberger et al., MICCAI 2015)

slide-8
SLIDE 8

Motivation: Efficient Multi-resolution Architectures

ICNet (Zhao et al., ECCV 2018)

slide-10
SLIDE 10

Proposed Model: Overview

Hypothesis: jointly learn the low-level features of multi-branch networks to increase model efficiency.

Fast-SCNN

The Learning to Down-sample module jointly learns the low-level features.

slide-11
SLIDE 11

Proposed Model: Learning to Down-sample

Learning to Down-sample

  • Sharing the computation of the multi-resolution branches improves efficiency
  • No need for multiple resizes and memory copies of the original input
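
To make the shared down-sampling concrete, here is a small sketch of how the spatial size shrinks through a stack of strided convolutions. It assumes three 3 x 3 layers with stride 2 and padding 1; the slides do not state the layer count or kernel sizes, so treat these as illustrative assumptions.

```python
def conv_out_size(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def downsample_shape(height: int, width: int, num_layers: int = 3) -> tuple:
    """Spatial shape after num_layers strided convolutions (assumed settings)."""
    for _ in range(num_layers):
        height = conv_out_size(height)
        width = conv_out_size(width)
    return height, width


# One pass over the full-resolution input yields the feature map that both
# the deep (context) path and the skip connection can reuse.
shared = downsample_shape(1024, 2048)
print(shared)  # (128, 256)
```

Under these assumptions the shared features sit at 1/8 of the input resolution, and no separately resized copy of the input image is needed.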

slide-12
SLIDE 12

Proposed Model: Larger Receptive Field

Going deeper with convnet

  • Fast-SCNN can be reduced to a convnet
  • Early sub-sampling/max-pooling layers increase receptive field and efficiency
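
The effect of early sub-sampling on the receptive field can be checked with the standard recurrence for stacked convolutions. The sketch below compares three hypothetical 3 x 3 layers with and without stride 2; the layer configuration is an assumption for illustration, not the network's actual layout.

```python
def receptive_field(layers) -> int:
    """Receptive field of a stack of (kernel, stride) layers, input to output."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


no_subsampling = receptive_field([(3, 1)] * 3)     # stride 1 everywhere
early_subsampling = receptive_field([(3, 2)] * 3)  # stride 2 everywhere
print(no_subsampling, early_subsampling)  # 7 15
```

With the same number of layers, striding roughly doubles the receptive field here, and each later layer also processes fewer pixels.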

slide-13
SLIDE 13

Proposed Model: Skip-Connection

Spatial details

  • A skip-connection helps to recover boundary information
  • We preferred a simple feature fusion module, i.e. addition only
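
A toy illustration of fusion by addition: the low-resolution map is upsampled to the high-resolution grid and the two are summed element-wise. This sketch uses a single channel, plain Python lists and nearest-neighbour upsampling for brevity; the upsampling method and any convolutions around the addition in the real module are not shown on the slide and are left out here.

```python
def upsample_nearest(fmap, factor: int):
    """Repeat every value factor times along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_by_addition(high_res, low_res, factor: int):
    """Upsample low_res to the size of high_res, then add element-wise."""
    up = upsample_nearest(low_res, factor)
    if len(up) != len(high_res) or len(up[0]) != len(high_res[0]):
        raise ValueError("feature maps do not align after upsampling")
    return [[h + u for h, u in zip(h_row, u_row)]
            for h_row, u_row in zip(high_res, up)]


# Hypothetical 4 x 4 skip features and 2 x 2 context features.
high = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
low = [[10, 20], [30, 40]]
fused = fuse_by_addition(high, low, 2)
print(fused[0])  # [11, 12, 23, 24]
```

Addition keeps the channel count unchanged, whereas concatenation would widen the tensor that later layers must process.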

slide-14
SLIDE 14

Proposed Model: Fast-SCNN

  • The deeper path at low resolution captures global context information
  • The shallow path focuses on high-resolution segmentation details
  • No need to learn low-level features separately
  • Quantization, network pruning and other techniques are also applicable

slide-15
SLIDE 15

Proposed Model: Qualitative Validation

[Figure panels: Input image; Skip-Connection: No; Skip-Connection: Yes]

slide-17
SLIDE 17

Fast-SCNN: Quantitative Evaluation

[Figure: accuracy (class mIoU, %) against runtime (fps on 2 MP images) for ENet, ICNet, ERFNet, BiSeNet*, ContextNet and Fast-SCNN*, with the real-time threshold marked. Our Fast-SCNN is highlighted against the other methods.]

∗ Nvidia Titan Xp (Pascal); Others Nvidia Titan X (Maxwell)

slide-18
SLIDE 18

Fast-SCNN: Quantitative Evaluation

Fast-SCNN balances accuracy and speed

Model        Class mIoU (%)   Category mIoU (%)   Params (millions)   FPS (1024x2048)
SegNet       56.1             79.8                29.46               1.6
ENet         58.3             80.4                0.37                20.4
ICNet        69.5             -                   6.68                30.3
ERFNet       68.0             86.5                2.1                 11.2
ContextNet   66.1             82.7                0.85                41.9
BiSeNet*     71.4             -                   5.8                 57.3
GUN*         70.4             -                   -                   33.3
Fast-SCNN*   68.0             84.7                1.11                123.5

∗ Nvidia Titan Xp (Pascal); Others Nvidia Titan X (Maxwell)
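
For readers unfamiliar with the mIoU columns, the following sketch computes mean intersection-over-union from a confusion matrix using the standard definition IoU = TP / (TP + FP + FN) per class. The 3-class matrix is made-up toy data, and the authors' actual evaluation script is not shown in the slides.

```python
def mean_iou(confusion) -> float:
    """Mean IoU; confusion[t][p] counts pixels of true class t predicted as p."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy confusion matrix (hypothetical pixel counts, 3 classes).
toy = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
print(round(100 * mean_iou(toy), 1))  # 67.2
```

Because every class contributes equally to the mean, small or thin classes affect the score as much as large ones.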

slide-19
SLIDE 19

Fast-SCNN: Input Size Variation

Fast-SCNN is efficient on smaller as well as larger scale input sizes

Input Size     Class mIoU (%)   Frames per second
1024 × 2048    68.0             123.5
512 × 1024     62.8             285.8
256 × 512      51.9             485.4
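
The frame rates in the table can be restated as per-frame latency and pixel throughput. The sketch below only rearranges the numbers reported on the slide; the helper names are introduced here for illustration.

```python
# (height, width) and frames per second, as reported in the table above.
results = [((1024, 2048), 123.5), ((512, 1024), 285.8), ((256, 512), 485.4)]


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


def megapixels_per_second(shape, fps: float) -> float:
    """Pixels processed per second, in millions."""
    height, width = shape
    return height * width * fps / 1e6


for shape, fps in results:
    print(shape, round(latency_ms(fps), 2), round(megapixels_per_second(shape, fps), 1))
```

Each halving of the input side length cuts the pixel count by four, but the frame rate rises by less than a factor of four, so pixel throughput falls from about 259 to about 150 and then about 64 megapixels per second.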

slide-20
SLIDE 20

Is ImageNet Pre-Training Necessary?

  • The total number of gradient updates is important
  • At least on the validation and test sets, ImageNet pre-training is not important!
  • A similar finding is reported in Rethinking ImageNet Pre-training by He et al. (ICCV 2019)

Model                           Class mIoU (%)
Fast-SCNN                       68.62
Fast-SCNN + ImageNet            69.15
Fast-SCNN + Coarse              69.22
Fast-SCNN + Coarse + ImageNet   69.19

[Figure: two plots of accuracy for Fast-SCNN, Fast-SCNN + ImageNet, Fast-SCNN + Coarse and Fast-SCNN + Coarse + ImageNet; the first against epochs (up to 1000), the second against iterations (×10^5).]
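
Epochs and gradient updates diverge as soon as the training set grows, which is why the two plots use different x-axes. A small sketch of the relationship follows; the dataset sizes, batch size and epoch count are hypothetical values chosen for illustration, not figures from the slides.

```python
import math


def gradient_updates(num_images: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps when every epoch covers the whole training set."""
    return math.ceil(num_images / batch_size) * epochs


def epochs_needed(num_images: int, batch_size: int, target_updates: int) -> int:
    """Smallest number of epochs that reaches at least target_updates steps."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return math.ceil(target_updates / steps_per_epoch)


# Hypothetical setup: a small set trained for 1000 epochs versus a larger set
# (for example, one extended with extra weakly labelled images).
small_updates = gradient_updates(3000, 12, 1000)
matching_epochs = epochs_needed(23000, 12, small_updates)
print(small_updates, matching_epochs)  # 250000 131
```

In this assumed setup the larger set reaches the same number of updates in far fewer epochs, so comparing runs by epoch alone can hide how much optimization each model actually received.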

slide-21
SLIDE 21

Fast-SCNN: Qualitative Evaluation

slide-22
SLIDE 22

Conclusion

Fast-SCNN is

  • memory, computation and power efficient
  • twice as fast as other state-of-the-art models
  • above real-time, i.e. 123.5 fps on 1024×2048 images
  • efficient and competitive on smaller as well as larger input sizes

We have shown that accuracy without ImageNet pre-training is comparable

Limitations: accuracy gap with bigger off-line models

Future work: apply to depth estimation and instance segmentation

slide-23
SLIDE 23

References

Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S. and Schiele, B., The Cityscapes Dataset for Semantic Urban Scene Understanding. In CVPR, 2016.

He, K., Girshick, R. and Dollár, P., Rethinking ImageNet Pre-training. In arXiv:1811.08883, 2018.

Poudel, R. P. K., Bonde, U., Liwicki, S. and Zach, C., ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time. In BMVC, 2018.

Ronneberger, O., Fischer, P. and Brox, T., U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, 2015.

Shelhamer, E., Long, J. and Darrell, T., Fully Convolutional Networks for Semantic Segmentation. In PAMI, 2016.

Zeiler, M. D. and Fergus, R., Visualizing and Understanding Convolutional Networks. In ECCV, 2014.

Zhao, H., Qi, X., Shen, X., Shi, J. and Jia, J., ICNet for Real-Time Semantic Segmentation on High-Resolution Images. In ECCV, 2018.

slide-24
SLIDE 24

Questions

Public implementations in PyTorch and TensorFlow are available on GitHub!

Thank you!
