PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer
Duo Li, Anbang Yao, Qifeng Chen
Highlights
- investigates multi-scale architectures through the lens of kernel engineering instead of network engineering
- extends the scope of the conventional mono-scale convolution operation by developing our Poly-Scale Convolution (PSConv)
- brings about performance improvements on classification, detection, and segmentation tasks with no extra computational overhead
Motivation: Multi-Scale Architecture Design
- Single-Scale
  - AlexNet
  - VGGNet
  - …
- Multi-Scale
  - FCN → skip connection
  - Inception → parallel stream
  - …
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
Szegedy et al., Going Deeper with Convolutions, CVPR 2015.
Previous Work: Layer-Level Skip Connection
Previous Work: Filter-Level Parallel Stream
(Figure: parallel filter streams varying in dilation rate and kernel size)
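To make the parallel-stream idea concrete, here is a minimal PyTorch sketch (not part of the original slides) of a multi-scale block whose 3x3 branches differ only in dilation rate, in the spirit of Inception/ASPP; the class name, channel sizes, and rate set are illustrative.

```python
import torch
import torch.nn as nn

class ParallelStreamBlock(nn.Module):
    """Filter-level multi-scale block: parallel 3x3 branches with
    different dilation rates, fused by channel concatenation."""
    def __init__(self, in_ch, branch_ch, dilations=(1, 2, 4)):
        super().__init__()
        # padding=d keeps the spatial resolution identical across branches
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )

    def forward(self, x):
        # Each branch covers one receptive-field size.
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 64, 32, 32)
print(ParallelStreamBlock(64, 32)(x).shape)  # torch.Size([1, 96, 32, 32])
```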
Previous Work: Filter-Level Feature Pyramid
Motivation: Kernel-Level Feature Pyramid
(Figure: an input feature map convolved with convolutional filter banks; different colors denote different dilation rates)
Method
(Figure: standard convolution vs. dilated convolution vs. poly-scale convolution)
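A minimal sketch of the idea, assuming a simplified allocation where output filters are partitioned into blocks and each block receives one dilation rate (the paper interleaves rates cyclically along both channel axes of the kernel); the function name and the rate set (1, 2, 4, 8) are illustrative.

```python
import torch
import torch.nn.functional as F

def poly_scale_conv2d(x, weight, dilations=(1, 2, 4, 8)):
    """Naive poly-scale convolution: one weight tensor, several scales.
    Each block of output filters is applied at its own dilation rate,
    so a single layer produces a pyramid of receptive-field sizes."""
    out_ch, _, k, _ = weight.shape
    part = out_ch // len(dilations)
    outs = []
    for i, d in enumerate(dilations):
        w = weight[i * part:(i + 1) * part]  # filters assigned rate d
        # padding = d * (k // 2) keeps the spatial resolution unchanged
        outs.append(F.conv2d(x, w, padding=d * (k // 2), dilation=d))
    return torch.cat(outs, dim=1)

x = torch.randn(1, 16, 32, 32)
w = torch.randn(32, 16, 3, 3)  # same shape as a standard conv kernel
print(poly_scale_conv2d(x, w).shape)  # torch.Size([1, 32, 32, 32])
```

Note that the weight tensor has exactly the shape of a standard convolution, and dilation does not change the number of kernel taps, which is why the multi-scale behavior comes at no extra parameter or FLOP cost.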
Efficient Implementation
- Observation: feature channel indices are interchangeable.
- Implementation: group kernels sharing the same dilation rate and implement them with group convolution.
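A sketch of how this grouping could look in PyTorch: under a cyclic allocation, all kernel slices sharing a dilation rate form one "diagonal" of (output-group, input-group) pairs, and rolling the input channels by s groups aligns diagonal s with a plain group convolution. This is an illustrative reconstruction under those assumptions, not the authors' exact code (see the linked repo for that); the class name and rate set are made up.

```python
import torch
import torch.nn as nn

class CyclicPSConv2d(nn.Module):
    """Group-convolution sketch of poly-scale convolution. The whole
    layer runs as len(dilations) grouped convolutions instead of many
    tiny per-kernel operations."""
    def __init__(self, in_ch, out_ch, k=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.n = len(dilations)
        assert in_ch % self.n == 0 and out_ch % self.n == 0
        self.part = in_ch // self.n  # channels per group
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=d * (k // 2), dilation=d,
                      groups=self.n, bias=False)
            for d in dilations
        )

    def forward(self, x):
        out = 0
        for s, conv in enumerate(self.convs):
            # Rolling the input by s groups makes diagonal s of the
            # (output-group, input-group) grid the main diagonal, which
            # is exactly what a group convolution computes.
            out = out + conv(torch.roll(x, shifts=s * self.part, dims=1))
        return out
```

The n grouped convolutions together hold out_ch x in_ch x k x k weights, exactly matching one dense convolution, consistent with the "no extra computational overhead" claim in the highlights.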
Quantitative Results: ILSVRC 2012
Comparison to baseline models and SOTA multi-scale architectures on ImageNet
Quantitative Results: MS COCO 2017
Comparison to baselines with basic and cascade detectors on the COCO detection track
Quantitative Results: MS COCO 2017
Comparison to baselines with basic and cascade detectors on the COCO segmentation track
Qualitative Results: Scale Allocation
PS-ResNet-50 on ImageNet
■ indicates the starting residual block of each stage
PS-ResNeXt-29 on CIFAR-100
Conclusion
- a plug-and-play convolution operation for any deep learning model
- leads to consistent and considerable performance margins in a wide range of vision tasks, without bells and whistles
- code available for reproducibility: https://github.com/d-li14/PSConv