MUXConv: Information Multiplexing in Convolutional Neural Networks
Zhichao Lu, Kalyanmoy Deb and Vishnu Naresh Boddeti
Michigan State University
{luzhicha, kdeb, vishnu}@msu.edu
https://github.com/human-analysis/MUXConv
MUXConv
◮ New layer: Multiplexed Convolutions (Spatial + Channel Multiplexing)
◮ Idea: increase the flow of information between space and channels
◮ Goal: smaller model size, increased efficiency, maintained or improved performance
Spatial Multiplexing: Idea
[Figure: a D-channel feature map of size X × I is rearranged into 4D channels of size X/2 × I/2 and into D/4 channels of size 2X × 2I; each branch is then processed by a group-wise convolution.]
Spatial Multiplexing
[Figure: Subpixel — spatial-to-channel rearrangement with factor s: an X × I feature map becomes an X/s × I/s map with s² times as many channels.]
[Figure: Superpixel — channel-to-spatial rearrangement with factor s: the inverse map, trading channels for spatial resolution.]
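The subpixel (spatial-to-channel) and superpixel (channel-to-spatial) rearrangements above are pure index permutations, so they can be sketched with plain array reshapes. The snippet below is a minimal NumPy illustration, not the paper's implementation; the helper names `subpixel` and `superpixel` are my own.

```python
import numpy as np

def subpixel(x, s):
    """Spatial-to-channel: (C, H, W) -> (C*s*s, H//s, W//s)."""
    c, h, w = x.shape
    x = x.reshape(c, h // s, s, w // s, s)        # split each spatial dim by s
    x = x.transpose(0, 2, 4, 1, 3)                # move the s x s offsets next to channels
    return x.reshape(c * s * s, h // s, w // s)

def superpixel(x, s):
    """Channel-to-spatial: (C, H, W) -> (C//(s*s), H*s, W*s); inverse of subpixel."""
    c, h, w = x.shape
    x = x.reshape(c // (s * s), s, s, h, w)       # peel the s x s offsets off the channel dim
    x = x.transpose(0, 3, 1, 4, 2)                # interleave them back into the spatial dims
    return x.reshape(c // (s * s), h * s, w * s)

x = np.random.rand(8, 16, 16)                     # D=8 channels, 16x16 spatial
lo = subpixel(x, 2)                               # (32, 8, 8): 4x channels, half resolution
hi = superpixel(x, 2)                             # (2, 32, 32): quarter channels, double resolution
```

With matching index conventions the two maps are exact inverses, so applying one after the other recovers the input unchanged.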
Spatial Multiplexing: Evaluation
[Figure: ImageNet top-1 accuracy (%) vs. number of MAdds (millions), comparing MobileNetV2 with and without spatial multiplexing; accuracies range from 40.7% at the smallest budget to 72.7% at the largest.]
◮ Consistent accuracy improvement over the original depth-wise separable convolution
◮ Particularly effective in the low-MAdds regime
Channel Multiplexing: Idea
(a) Original residual block (ResNet-18, ResNet-34): 3×3 conv → 3×3 conv
(b) Bottleneck (ResNet-50, DenseNet-BC): 1×1 conv → 3×3 conv → 1×1 conv
(c) Inverted bottleneck (MobileNetV2/V3, MNASNet): 1×1 conv → depth-wise 3×3 conv → 1×1 conv
(d) Proposed: 1×1 conv → group-wise 3×3 conv → 1×1 conv
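The difference among these block designs is largely a parameter-count question. As a rough sketch (my own arithmetic, not from the slides): a k×k convolution with g groups costs (C_in/g)·C_out·k² weights, so group-wise convolution sits between a full convolution and a depth-wise one.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with `groups` groups (bias ignored)."""
    assert c_in % groups == 0
    return (c_in // groups) * c_out * k * k

full = conv_params(64, 64, 3)             # standard 3x3 conv: 64*64*9 = 36864 weights
depthwise = conv_params(64, 64, 3, 64)    # depth-wise 3x3 (one input channel per filter): 576
groupwise = conv_params(64, 64, 3, 4)     # group-wise 3x3 with 4 groups: 9216
```

A group-wise 3×3 with g groups is g× cheaper than the full convolution while, unlike the depth-wise case, still mixing channels within each group.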
[Figure: channel multiplexing within a block — channels 1…8 pass through 1×1 conv, SpatialMUX, and 1×1 conv; a mix-up step then interleaves the two halves of the channels (1 5 2 6 3 7 4 8). In a reduction block, one interleaved half (1 3 5 7) is copied forward and the other half (2 4 6 8) is left out.]
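The mix-up step in the diagram is just a fixed channel permutation that interleaves contiguous blocks of the channel dimension. A minimal sketch (the `channel_mixup` helper name is hypothetical; channels are assumed along the first axis):

```python
import numpy as np

def channel_mixup(x, groups=2):
    """Interleave `groups` contiguous channel blocks, e.g. [1..8] -> [1,5,2,6,3,7,4,8]."""
    c = x.shape[0]
    assert c % groups == 0
    x = x.reshape(groups, c // groups, *x.shape[1:])  # split channels into contiguous groups
    x = np.swapaxes(x, 0, 1)                          # pair up one member from each group
    return x.reshape(c, *x.shape[2:])                 # flatten back to interleaved order

channels = np.arange(1, 9).reshape(8, 1, 1)           # channels labeled 1..8
mixed = channel_mixup(channels)                       # channel order 1 5 2 6 3 7 4 8
```

Because the permutation is fixed, it costs no parameters and no multiply-adds, yet lets the following group-wise convolution see channels from both halves.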
Channel Multiplexing: Evaluation
[Figure: top-1 accuracy (%) vs. number of MAdds (millions) and number of parameters (millions) for three scaling knobs, starting from the base configuration (w=1.0, r=224, l=0.0): width multiplier (w=0.75, 0.5), input resolution (r=192, 160, 128), and channel multiplexing (l=0.25, 0.5, 0.75).]
◮ Consistently outperforms the existing scaling methods (width multiplier and input resolution)
Tri-Objective Search: Idea
[Figure: preference-guided search in objective space — a reference point and reference direction define a region of interest on the attainable objective set, anchored at the ideal point on the Pareto surface.]
◮ Simultaneously optimize for accuracy (↑), #Params (↓), and #MAdds (↓)
◮ User-preference-guided search through PBI¹ decomposition
¹ Qingfu Zhang and Hui Li. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712-731, 2007.
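For reference, the PBI (penalty-based boundary intersection) scalarization from the cited MOEA/D paper turns a preference direction $\lambda$ and the ideal point $z^*$ into a single objective (notation mine, following Zhang and Li; here $F(x)$ would collect the three objectives error, #Params, #MAdds):

```latex
\min_{x}\; g^{\mathrm{pbi}}(x \mid \lambda, z^{*}) = d_{1} + \theta\, d_{2},
\qquad
d_{1} = \frac{\left\lVert \left(F(x) - z^{*}\right)^{\top} \lambda \right\rVert}{\lVert \lambda \rVert},
\qquad
d_{2} = \left\lVert F(x) - \left(z^{*} + d_{1}\,\frac{\lambda}{\lVert \lambda \rVert}\right) \right\rVert
```

Here $d_{1}$ measures progress along the reference direction and $d_{2}$ the perpendicular deviation from it; the penalty $\theta$ keeps solutions close to the user-specified direction, which is what makes the search preference-guided.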
Tri-Objective Search: Evaluation
[Figure: search results on NASBench-101 — top-1 accuracy vs. training time (mins) and vs. number of parameters (millions); attainable models are shown with three reference points, comparing our search against regularized evolution.]
◮ On NASBench-101, our search is more efficient than regularized evolution²
² Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
ImageNet-1K Classification
[Figure: top-1 accuracy (%) vs. number of parameters (millions) and number of MAdds (millions) on ImageNet-1K, comparing MUXNet against MobileNetV2, MobileNetV3 (large/small), MnasNet, MixNet, FBNet, ChamNet, ProxylessNAS-GPU, NASNet-A, AmoebaNet-A, and DARTS.]
Model | Type | #MAdds | Ratio | #Params | Ratio | CPU (ms) | GPU (ms) | Top-1 (%) | Top-5 (%)
MUXNet-xs (ours) | auto | 66M‡ | 1.0x | 1.8M‡ | 1.0x | 6.8 | 18 | 66.7 | 86.8
MobileNetV2 0.5 | manual | 97M | 1.5x | 2.0M | 1.1x | 6.2 | 17 | 65.4 | 86.4
MobileNetV3 small | combined | 66M | 1.0x | 2.9M | 1.6x | 6.2‡ | 14 | 67.4 | -
MUXNet-s (ours) | auto | 117M‡ | 1.0x | 2.4M‡ | 1.0x | 9.5 | 25 | 71.6 | 90.3
MobileNetV1 | manual | 575M | 4.9x | 4.2M | 1.8x | 7.3 | 20 | 70.6 | 89.5
ShuffleNetV2 | manual | 146M | 1.3x | - | - | 6.8 | 11‡ | 69.4 | -
ChamNet-C | auto | 212M | 1.8x | 3.4M | 1.4x | - | - | 71.6 | -
MUXNet-m (ours) | auto | 218M‡ | 1.0x | 3.4M‡ | 1.0x | 14.7 | 42 | 75.3 | 92.5
MobileNetV2 | manual | 300M | 1.4x | 3.4M | 1.0x | 8.3‡ | 23 | 72.0 | 91.0
ShuffleNetV2 2× | manual | 591M | 2.7x | 7.4M | 2.2x | 11.0 | 22‡ | 74.9 | -
MnasNet-A1 | auto | 312M | 1.4x | 3.9M | 1.1x | 9.3‡ | 32 | 75.2 | 92.5
MobileNetV3 large | combined | 219M | 1.0x | 5.4M | 1.6x | 10.0‡ | 33 | 75.2 | -
MUXNet-l (ours) | auto | 318M‡ | 1.0x | 4.0M‡ | 1.0x | 19.2 | 74 | 76.6 | 93.2
MnasNet-A2 | auto | 340M | 1.1x | 4.8M | 1.2x | - | - | 75.6 | 92.7
FBNet-C | auto | 375M | 1.2x | 5.5M | 1.4x | 9.1‡ | 31 | 74.9 | -
EfficientNet-B0 | auto | 390M‡ | 1.2x | 5.3M | 1.3x | 14.4 | 46 | 76.3 | 93.2
MixNet-M | auto | 360M‡ | 1.1x | 5.0M | 1.2x | 24.3 | 79 | 77.0 | 93.3

‡ indicates the objective that the method explicitly optimizes through NAS.
Additional Experiments
Generalization to ImageNet-V2
[Figure: ImageNet vs. ImageNet-V2 top-5 accuracy (%) for ShuffleNetV2, ResNet-18, GoogLeNet, MobileNetV2, DARTS, MnasNet-A1, NASNet-A mobile, DenseNet-169, ResNeXt-50 32x4d, and MUXNets (ours); the accuracy drop from ImageNet to ImageNet-V2 ranges from 7.7 to 10.0 points across models.]
PASCAL VOC2007 Detection
Network | #MAdds | #Params | mAP (%)
VGG16 + SSD | 35B | 26.3M | 74.3
MobileNet + SSD | 1.6B | 9.5M | 67.6
MobileNetV2 + SSDLite | 0.7B | 3.4M | 67.4
MobileNetV2 + SSD | 1.4B | 8.9M | 73.2
MUXNet-m + SSDLite | 0.5B | 3.2M | 68.6
MUXNet-l + SSD | 1.4B | 9.9M | 73.8
ADE20K Semantic Segmentation
Network | #MAdds | #Params | mIoU (%) | Acc (%)
ResNet18 + C1 | 1.8B | 11.7M | 33.82 | 76.05
MobileNetV2 + C1 | 0.3B | 3.5M | 34.84 | 75.75
MUXNet-m + C1 | 0.2B | 3.4M | 32.42 | 75.00
ResNet18 + PPM | 1.8B | 11.7M | 38.00 | 78.64
MobileNetV2 + PPM | 0.3B | 3.5M | 35.76 | 77.77
MUXNet-m + PPM | 0.2B | 3.4M | 35.80 | 76.33
Additional Experiments
Transfer Learning on CIFAR
[Figure: transfer learning — top-1 accuracy (%) vs. number of parameters (millions) and number of mult-adds (millions) on CIFAR-10 (96-98%) and CIFAR-100 (81-88%), comparing MUXNet with ResNet-50, DenseNet-169, Inception-v3, MobileNetV1, MobileNetV2, NASNet-A mobile, EfficientNet-B0, and MixNet-M.]
Robustness to Degradations
[Figure: normalized top-5 accuracy under 19 common corruptions (brightness, contrast, defocus blur, elastic transform, fog, frost, gaussian blur, gaussian noise, glass blur, impulse noise, jpeg compression, motion blur, pixelate, saturate, shot noise, snow, spatter, speckle noise, zoom blur), comparing ShuffleNetV2, MobileNetV2, DARTS, MnasNet-A1, and MUXNet-m.]
Visualization on Segmentation Results
[Figure: qualitative segmentation results — test images, ground truth, and MUXNet-m + PPM predictions.]