  1. MUXConv: Information Multiplexing in Convolutional Neural Networks Zhichao Lu, Kalyanmoy Deb and Vishnu Naresh Boddeti Michigan State University {luzhicha, kdeb, vishnu}@msu.edu https://github.com/human-analysis/MUXConv

  2. MUXConv ◮ New layer: Multiplexed Convolutions (spatial + channel multiplexing) ◮ Idea: increase the flow of information between spatial locations and channels ◮ Goal: smaller model size and higher efficiency while maintaining or improving accuracy

  3. Spatial Multiplexing: Idea
[Figure: a spatial-to-channel (subpixel) rearrangement folds each 2×2 spatial block into the channel dimension, a group-wise convolution processes the multiplexed feature map at reduced resolution, and a channel-to-spatial (superpixel) rearrangement restores the original spatial resolution]
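The subpixel/superpixel rearrangements used by spatial multiplexing can be sketched with plain array reshapes. This is a minimal NumPy illustration of the two inverse operations (the function names are mine, not the paper's); in a real network they would wrap a learned group-wise convolution between them:

```python
import numpy as np

def space_to_channel(x, s):
    """Subpixel op: fold each s x s spatial block into the channel dim.
    x: (C, H, W) -> (C*s*s, H//s, W//s)."""
    c, h, w = x.shape
    x = x.reshape(c, h // s, s, w // s, s)   # split H and W into s x s blocks
    x = x.transpose(0, 2, 4, 1, 3)           # move block offsets next to C
    return x.reshape(c * s * s, h // s, w // s)

def channel_to_space(x, s):
    """Superpixel op (inverse): spread channel groups back onto the grid.
    x: (C*s*s, H, W) -> (C, H*s, W*s)."""
    cs, h, w = x.shape
    c = cs // (s * s)
    x = x.reshape(c, s, s, h, w)
    x = x.transpose(0, 3, 1, 4, 2)           # interleave offsets back into H, W
    return x.reshape(c, h * s, w * s)

x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
y = space_to_channel(x, 2)   # shape (8, 2, 2): 4x channels, 1/4 resolution
assert np.allclose(channel_to_space(y, 2), x)   # the round trip is lossless
```

Because the rearrangement is a pure permutation of values, no information is lost when trading spatial resolution for channels and back.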

  4. Spatial Multiplexing: Evaluation
[Figure: ImageNet top-1 accuracy (%) vs. number of MAdds (millions) for MobileNetV2 with and without spatial multiplexing; labeled accuracies range from 40.7% to 72.7%]
◮ Consistent accuracy improvement over the original depth-wise separable convolution ◮ Particularly effective in the low-MAdds regime

  5. Channel Multiplexing: Idea
[Figure: four block designs — (a) original residual block (ResNet-18, ResNet-34), (b) bottleneck (ResNet-50, DenseNet-BC), (c) inverted bottleneck (MobileNetV2/V3, MnasNet), (d) proposed block combining SpatialMUX with group-wise and depth-wise convolutions and a reduction block]
◮ Across the 1×1 layers, a subset of channels is left out (bypasses computation), a subset is copied, and a subset is mixed up (reordered) to encourage information exchange between channels

  6. Channel Multiplexing: Evaluation
[Figure: ImageNet top-1 accuracy (%) vs. number of MAdds (millions) and vs. number of parameters (millions), comparing channel multiplexing (l = 0.25, 0.5, 0.75) against width-multiplier (w = 0.75, 0.5) and input-resolution (r = 192, 160, 128) scaling from a common (w=1.0, r=224, l=0.0) starting point]
◮ Consistently outperforms existing scaling methods

  7. Tri-Objective Search: Idea
[Figure: decomposition-based search in objective space — a reference point and reference direction define a region of interest on the Pareto surface, relative to the ideal point and the attainable objective set]
◮ Simultaneously optimize accuracy (↑), #Params (↓), and #MAdds (↓) ◮ User-preference-guided search through PBI¹ decomposition

¹Qingfu Zhang and Hui Li. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712-731, 2007
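The PBI (penalty-based boundary intersection) scalarization from MOEA/D turns the three objectives into a single value relative to a user-chosen reference direction. A minimal sketch, assuming all objectives are minimized and using the common penalty default θ = 5 (the example objective values are illustrative, not from the paper):

```python
import numpy as np

def pbi(f, z_ideal, weight, theta=5.0):
    """PBI scalarization: d1 measures progress along the reference
    direction from the ideal point, d2 measures deviation from that
    direction; theta penalizes the deviation."""
    w = weight / np.linalg.norm(weight)
    d1 = np.dot(f - z_ideal, w)                 # distance along the direction
    d2 = np.linalg.norm(f - z_ideal - d1 * w)   # distance off the direction
    return d1 + theta * d2

# Illustrative objectives: (top-1 error, #Params in M, #MAdds in M)
f = np.array([0.25, 3.4, 218.0])     # a candidate architecture
z = np.array([0.20, 1.8, 66.0])      # ideal point (best seen per objective)
print(pbi(f, z, weight=np.array([1.0, 1.0, 1.0])))
```

Changing the weight vector steers the search toward different trade-offs among accuracy, parameters, and MAdds, which is how user preference enters the tri-objective search.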

  8. Tri-Objective Search: Evaluation
[Figure: top-1 accuracy vs. number of parameters (millions) and vs. training time (minutes) on NASBench-101, comparing attainable models found by our search under three reference points against regularized evolution]
◮ NASBench-101: our search is more efficient than regularized evolution²

²Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019

  9. ImageNet-1K Classification
[Figure: top-1 accuracy (%) vs. number of parameters (millions) and vs. number of MAdds (millions) for MUXNet against MobileNetV2, MobileNetV3 large/small, MnasNet, MixNet, FBNet, ChamNet, ProxylessNAS GPU, NASNet-A, AmoebaNet-A, and DARTS]

| Model | Type | #MAdds | Ratio | #Params | Ratio | CPU (ms) | GPU (ms) | Top-1 (%) | Top-5 (%) |
|---|---|---|---|---|---|---|---|---|---|
| MUXNet-xs (ours) | auto | 66M‡ | 1.0x | 1.8M‡ | 1.0x | 6.8 | 18 | 66.7 | 86.8 |
| MobileNetV2 0.5 | manual | 97M | 1.5x | 2.0M | 1.1x | 6.2 | 17 | 65.4 | 86.4 |
| MobileNetV3 small | combined | 66M | 1.0x | 2.9M | 1.6x | 6.2‡ | 14 | 67.4 | - |
| MUXNet-s (ours) | auto | 117M‡ | 1.0x | 2.4M‡ | 1.0x | 9.5 | 25 | 71.6 | 90.3 |
| MobileNetV1 | manual | 575M | 4.9x | 4.2M | 1.8x | 7.3 | 20 | 70.6 | 89.5 |
| ShuffleNetV2 | manual | 146M | 1.3x | - | - | 6.8 | 11‡ | 69.4 | - |
| ChamNet-C | auto | 212M | 1.8x | 3.4M | 1.4x | - | - | 71.6 | - |
| MUXNet-m (ours) | auto | 218M‡ | 1.0x | 3.4M‡ | 1.0x | 14.7 | 42 | 75.3 | 92.5 |
| MobileNetV2 | manual | 300M | 1.4x | 3.4M | 1.0x | 8.3‡ | 23 | 72.0 | 91.0 |
| ShuffleNetV2 2× | manual | 591M | 2.7x | 7.4M | 2.2x | 11.0 | 22‡ | 74.9 | - |
| MnasNet-A1 | auto | 312M | 1.4x | 3.9M | 1.1x | 9.3‡ | 32 | 75.2 | 92.5 |
| MobileNetV3 large | combined | 219M | 1.0x | 5.4M | 1.6x | 10.0‡ | 33 | 75.2 | - |
| MUXNet-l (ours) | auto | 318M‡ | 1.0x | 4.0M‡ | 1.0x | 19.2 | 74 | 76.6 | 93.2 |
| MnasNet-A2 | auto | 340M | 1.1x | 4.8M | 1.2x | - | - | 75.6 | 92.7 |
| FBNet-C | auto | 375M | 1.2x | 5.5M | 1.4x | 9.1‡ | 31 | 74.9 | - |
| EfficientNet-B0 | auto | 390M‡ | 1.2x | 5.3M | 1.3x | 14.4 | 46 | 76.3 | 93.2 |
| MixNet-M | auto | 360M‡ | 1.1x | 5.0M | 1.2x | 24.3 | 79 | 77.0 | 93.3 |

‡ indicates the objective that the method explicitly optimizes through NAS.

  10. Additional Experiments

Generalization to ImageNet-V2
[Figure: ImageNet vs. ImageNet-V2 top-5 accuracy (%) for ShuffleNetV2, ResNet18, MUXNet-s (ours), GoogLeNet, MobileNetV2, DARTS, MnasNet-A1, NASNet-A mobile, MUXNet-m (ours), MUXNet-l (ours), DenseNet-169, and ResNeXt50 32x4d; annotated accuracy drops range from 7.7 to 10.0 points]

PASCAL VOC2007 Detection
| Network | #MAdds | #Params | mAP (%) |
|---|---|---|---|
| VGG16 + SSD | 35B | 26.3M | 74.3 |
| MobileNet + SSD | 1.6B | 9.5M | 67.6 |
| MobileNetV2 + SSDLite | 0.7B | 3.4M | 67.4 |
| MobileNetV2 + SSD | 1.4B | 8.9M | 73.2 |
| MUXNet-m + SSDLite | 0.5B | 3.2M | 68.6 |
| MUXNet-l + SSD | 1.4B | 9.9M | 73.8 |

ADE20K Semantic Segmentation
| Network | #MAdds | #Params | mIoU (%) | Acc (%) |
|---|---|---|---|---|
| ResNet18 + C1 | 1.8B | 11.7M | 33.82 | 76.05 |
| MobileNetV2 + C1 | 0.3B | 3.5M | 34.84 | 75.75 |
| MUXNet-m + C1 | 0.2B | 3.4M | 32.42 | 75.00 |
| ResNet18 + PPM | 1.8B | 11.7M | 38.00 | 78.64 |
| MobileNetV2 + PPM | 0.3B | 3.5M | 35.76 | 77.77 |
| MUXNet-m + PPM | 0.2B | 3.4M | 35.80 | 76.33 |

  11. Additional Experiments

Transfer Learning on CIFAR
[Figure: CIFAR-10 and CIFAR-100 top-1 accuracy (%) vs. Mult-Adds (millions) and vs. parameters (millions) for MUXNet against ResNet-50, DenseNet-169, Inception v3, MobileNetV1, MobileNetV2, NASNet-A mobile, EfficientNet-B0, and MixNet-M]

Robustness to Degradations
[Figure: normalized top-5 accuracy for ShuffleNetV2, MobileNetV2, DARTS, MnasNet-A1, and MUXNet-m under various input degradations]

Visualization of Segmentation Results
[Figure: test images, ground truth, and MUXNet-m + PPM segmentation predictions]
