Fast-SCNN: Fast Semantic Segmentation Network, by Rudra PK Poudel

  1. Fast-SCNN: Fast Semantic Segmentation Network. Rudra PK Poudel, Stephan Liwicki, Roberto Cipolla. Cambridge Research Laboratory (CRL), Toshiba Research Europe, UK. BMVC 2019.

  2. Real-time Semantic Image Segmentation. What am I seeing and where is it? Real-time perception is critical for autonomous systems.

  4. Motivation. Problem: SOTA models are accurate but resource hungry (compute, i.e. floating point ops; power consumption; memory). Observations: (1) the first few layers of a DCNN extract low-level features (Zeiler et al., 2014); (2) a larger receptive field (context) is important for accuracy (Poudel et al., 2018); (3) spatial detail is necessary to preserve boundaries (Shelhamer et al., 2016); (4) SOTA efficient models adopt multi-resolution and multi-branch architectures.

  5. Motivation: First Few Layers Learn Low-level Features (Zeiler et al., ECCV 2014).

  6. Motivation: Importance of Larger Receptive Field. Figure: ContextNet (Poudel et al., BMVC 2018), in which a deep network for context and a shallow network for spatial detail are combined by a feature fusion unit. Legend: convolution block, bottleneck residual block, depth-wise separable convolution block.
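
The depth-wise separable convolution block named in the figure legend is a common way to cut parameters and floating point ops. The sketch below only counts weights for a standard convolution and for its depth-wise separable counterpart; the channel counts and kernel size are hypothetical values chosen for illustration, not figures from the slides.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
dsc = depthwise_separable_params(c_in, c_out, k)
print(std, dsc, round(std / dsc, 2))  # 73728 8768 8.41
```

For this assumed layer the separable form needs roughly one eighth of the weights, which is the kind of saving efficient segmentation networks rely on.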

  7. Motivation: Importance of Spatial Details. Figure: U-Net architecture (Ronneberger et al., MICCAI 2015), from input image tile to output segmentation map. Legend: conv 3x3 + ReLU, copy and crop, max pool 2x2, up-conv 2x2, conv 1x1.

  8. Motivation: Efficient Multi-resolution Architectures. ICNet (Zhao et al., ECCV 2018).

  9. Motivation. Problem: SOTA models are accurate but resource hungry (compute, i.e. floating point ops; power consumption; memory). Observations: (1) the first few layers of a DCNN extract low-level features (Zeiler et al., 2014); (2) a larger receptive field (context) is important for accuracy (Poudel et al., 2018); (3) spatial detail is necessary to preserve boundaries (Shelhamer et al., 2016); (4) SOTA efficient models adopt multi-resolution and multi-branch architectures.

  10. Proposed Model: Overview. Hypothesis: jointly learn the low-level features of multi-branch networks to increase model efficiency. In Fast-SCNN, the Learning to Down-sample module jointly learns the low-level features.

  11. Proposed Model: Learning to Down-sample. Sharing the computation of the multi-resolution branches improves efficiency. No need for multiple resizes and memory copies of the original input.

  12. Proposed Model: Larger Receptive Field. Going deeper with convnets: Fast-SCNN can be reduced to a convnet. Early sub-sampling/max-pooling layers increase receptive field and efficiency.
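
To illustrate why early sub-sampling enlarges the receptive field, the sketch below applies the standard receptive-field recurrence to two stacks of equal depth. The layer lists are hypothetical and are not the Fast-SCNN configuration.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks of four 3x3 layers each.
no_subsampling = [(3, 1), (3, 1), (3, 1), (3, 1)]
early_subsampling = [(3, 2), (3, 2), (3, 2), (3, 1)]

print(receptive_field(no_subsampling))     # 9
print(receptive_field(early_subsampling))  # 31
```

With the same number of layers, the strided stack sees a 31-pixel window instead of 9, and its later layers also run on feature maps with 64 times fewer positions.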

  13. Proposed Model: Skip-Connection. The spatial-detail skip-connection helps to recover boundary information. We preferred a simple feature fusion module, i.e. addition only.
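
A minimal sketch of fusion by addition on single-channel feature maps stored as nested lists. The nearest-neighbour upsampling and the toy values are assumptions made for illustration; the slide only states that the two paths are combined by addition.

```python
def upsample_nearest(fmap, factor):
    """Repeat every value `factor` times along both axes."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in fmap
        for _ in range(factor)
    ]


def fuse_by_addition(high_res, low_res):
    """Upsample the low-resolution map to the high-resolution size, then add."""
    factor = len(high_res) // len(low_res)
    upsampled = upsample_nearest(low_res, factor)
    return [
        [h + u for h, u in zip(h_row, u_row)]
        for h_row, u_row in zip(high_res, upsampled)
    ]


# Toy inputs: a 4x4 detail map (skip-connection) and a 2x2 context map.
high = [[1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
low = [[10, 20], [30, 40]]
fused = fuse_by_addition(high, low)
for row in fused:
    print(row)
```

Addition needs no extra parameters and keeps the channel count unchanged, whereas concatenation would widen the tensor fed to the following layers.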

  14. Proposed Model: Fast-SCNN. The deeper path at low resolution captures global context information. The shallow path focuses on high-resolution segmentation details. No need to learn low-level features separately. Quantization, network pruning and other techniques are also applicable.
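
To make the two resolutions concrete, the sketch below tracks feature-map size through a shared stem and a deeper path. The stride lists are assumptions for illustration (three stride-2 layers in the shared stem, two more in the deep path); the slides do not state them.

```python
def downsample(size, strides):
    """Spatial size after a sequence of strided layers (ceil division per layer)."""
    h, w = size
    for s in strides:
        h, w = -(-h // s), -(-w // s)
    return h, w


input_size = (1024, 2048)
shared_stem_strides = [2, 2, 2]  # assumed, not taken from the slides
deep_path_strides = [2, 2]       # assumed, not taken from the slides

shallow_size = downsample(input_size, shared_stem_strides)
deep_size = downsample(shallow_size, deep_path_strides)
positions_ratio = (shallow_size[0] * shallow_size[1]) // (deep_size[0] * deep_size[1])

print(shallow_size, deep_size, positions_ratio)  # (128, 256) (32, 64) 16
```

Under these assumed strides, each layer in the deep path processes 16 times fewer spatial positions than a layer at the shallow path's resolution, which is why depth is affordable there.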

  15. Proposed Model: Qualitative Validation. Figure panels: input image; Skip-Connection: No; Skip-Connection: Yes.

  17. Fast-SCNN: Quantitative Evaluation. Figure: runtime (fps on 2 MP images) versus accuracy (% class mIoU) for our Fast-SCNN* and other methods (BiSeNet*, ContextNet, ICNet, ENet, ERFNet), with "Real-time" marked at 15 on the fps axis. * Nvidia Titan Xp (Pascal); others Nvidia Titan X (Maxwell).
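
The accuracy axis uses class mIoU. As a reminder of what that metric measures, here is a minimal sketch on flattened label lists; the toy labels are made up, and a real benchmark accumulates intersections and unions over the whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
print(round(miou, 3))  # 0.5
```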

  18. Fast-SCNN: Quantitative Evaluation. Fast-SCNN balances accuracy and speed.

      | Model      | Class mIoU (%) | Category mIoU (%) | Params (millions) | FPS on 1024x2048 |
      |------------|----------------|-------------------|-------------------|------------------|
      | SegNet     | 56.1           | 79.8              | 29.46             | 1.6              |
      | ENet       | 58.3           | 80.4              | 0.37              | 20.4             |
      | ICNet      | 69.5           | -                 | 6.68              | 30.3             |
      | ERFNet     | 68.0           | 86.5              | 2.1               | 11.2             |
      | ContextNet | 66.1           | 82.7              | 0.85              | 41.9             |
      | BiSeNet*   | 71.4           | -                 | 5.8               | 57.3             |
      | GUN*       | 70.4           | -                 | -                 | 33.3             |
      | Fast-SCNN* | 68.0           | 84.7              | 1.11              | 123.5            |

      * Nvidia Titan Xp (Pascal); others Nvidia Titan X (Maxwell).
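
Frame rates are easier to compare as per-frame latency and relative speed. The sketch below converts the FPS column listed on this slide; keep in mind that the starred entries were measured on a different GPU, so the ratios are indicative only.

```python
# FPS on 1024x2048 inputs, copied from the slide's table.
fps = {
    "SegNet": 1.6,
    "ENet": 20.4,
    "ICNet": 30.3,
    "ERFNet": 11.2,
    "ContextNet": 41.9,
    "BiSeNet*": 57.3,
    "GUN*": 33.3,
    "Fast-SCNN*": 123.5,
}

# Per-frame latency in milliseconds and Fast-SCNN's speed relative to each model.
latency_ms = {name: 1000.0 / rate for name, rate in fps.items()}
speedup = {name: fps["Fast-SCNN*"] / rate for name, rate in fps.items()}

for name in fps:
    print(f"{name:11s} {latency_ms[name]:7.1f} ms  x{speedup[name]:.2f}")
```

At 123.5 fps a frame takes about 8.1 ms, and the ratio to the next fastest entry (BiSeNet*, 57.3 fps) is a little above 2.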

  19. Fast-SCNN: Input Size Variation. Fast-SCNN is efficient on smaller as well as larger input sizes.

      | Input size  | Class mIoU (%) | Frames per second |
      |-------------|----------------|-------------------|
      | 1024 × 2048 | 68.0           | 123.5             |
      | 512 × 1024  | 62.8           | 285.8             |
      | 256 × 512   | 51.9           | 485.4             |
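
One way to read these numbers is as pixel throughput. The sketch below multiplies each input size by its frame rate; the values are the ones listed on this slide, and the helper name is made up for the example.

```python
# (height, width) -> (class mIoU %, frames per second), as listed on the slide.
results = {
    (1024, 2048): (68.0, 123.5),
    (512, 1024): (62.8, 285.8),
    (256, 512): (51.9, 485.4),
}


def megapixels_per_second(size, frames_per_second):
    """Processed pixels per second, in millions."""
    height, width = size
    return height * width * frames_per_second / 1e6


throughput = {
    size: megapixels_per_second(size, rate) for size, (_, rate) in results.items()
}
for size, mpps in throughput.items():
    print(size, round(mpps, 1))
```

Each halving of the side length removes three quarters of the pixels but raises the frame rate by well under 4x, so pixel throughput is highest at full resolution while accuracy drops at the smaller sizes.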

  20. Is ImageNet Pre-Training Necessary? The total number of gradient updates is important. At least on the validation and test sets, ImageNet pre-training is not important! Similar finding in Rethinking ImageNet Pre-training by He et al. (ICCV 2019).

      | Model                         | Class mIoU (%) |
      |-------------------------------|----------------|
      | Fast-SCNN                     | 68.62          |
      | Fast-SCNN + ImageNet          | 69.15          |
      | Fast-SCNN + Coarse            | 69.22          |
      | Fast-SCNN + Coarse + ImageNet | 69.19          |

      Figures: accuracy versus epochs (0 to 1000) and accuracy versus iterations (axis in units of 10^5) for the four models above.
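
Because the slide stresses gradient updates rather than epochs, it helps to see how the two relate. In the sketch below the dataset sizes and batch size are illustrative assumptions, not values taken from the slides.

```python
import math


def gradient_updates(num_images, batch_size, epochs):
    """Number of optimizer steps when every epoch visits each image once."""
    return math.ceil(num_images / batch_size) * epochs


def epochs_for_updates(num_images, batch_size, target_updates):
    """Smallest number of epochs that reaches at least `target_updates` steps."""
    return math.ceil(target_updates / math.ceil(num_images / batch_size))


# Illustrative assumptions: a small finely labelled set, a larger set that also
# includes coarse labels, and a fixed batch size.
fine_images, extra_coarse_images, batch_size = 2975, 20000, 12

fine_updates = gradient_updates(fine_images, batch_size, epochs=1000)
matching_epochs = epochs_for_updates(
    fine_images + extra_coarse_images, batch_size, fine_updates
)
print(fine_updates, matching_epochs)  # 248000 130
```

With more training images per epoch, far fewer epochs give the same number of updates, which is why curves plotted against epochs and against iterations can tell different stories.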

  21. Fast-SCNN: Qualitative Evaluation.

  22. Conclusion. Fast-SCNN is: memory, computation and power efficient; twice as fast as other state-of-the-art models above real-time, i.e. 123.5 fps on 1024 × 2048 images; efficient and competitive on smaller as well as larger input sizes. We have shown that accuracy without ImageNet pre-training is comparable. Limitations: accuracy gap with bigger off-line models. Future work: apply to depth estimation and instance segmentation.

  23. References
      Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In CVPR, 2016.
      He, K., Girshick, R., Dollár, P. Rethinking ImageNet Pre-training. arXiv:1811.08883, 2018.
      Poudel, R. P. K., Bonde, U., Liwicki, S., Zach, C. ContextNet: Exploring Context and Details for Semantic Segmentation in Real-time. In BMVC, 2018.
      Ronneberger, O., Fischer, P., Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, 2015.
      Shelhamer, E., Long, J., Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In PAMI, 2016.
      Zeiler, M. D., Fergus, R. Visualizing and Understanding Convolutional Networks. In ECCV, 2014.
      Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J. ICNet for Real-Time Semantic Segmentation on High-Resolution Images. In ECCV, 2018.

  24. Questions. Public implementations in PyTorch and TensorFlow are available on GitHub. Thank you!
