MUXConv: Information Multiplexing in Convolutional Neural Networks (PowerPoint Presentation)



SLIDE 1

MUXConv: Information Multiplexing in Convolutional Neural Networks

Zhichao Lu, Kalyanmoy Deb, and Vishnu Naresh Boddeti, Michigan State University

{luzhicha, kdeb, vishnu}@msu.edu https://github.com/human-analysis/MUXConv

SLIDE 2

MUXConv

◮ New layer: Multiplexed Convolutions (spatial + channel multiplexing)
◮ Idea: increase the flow of information between space and channels
◮ Goal: smaller model size, increased efficiency, maintained or improved performance

SLIDE 3

Spatial Multiplexing: Idea

[Figure: spatial multiplexing. A group-wise convolution is combined with two rearrangement operations: Subpixel (spatial-to-channel, folding each s×s spatial block into channels) and Superpixel (channel-to-spatial, the inverse unfolding).]
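These two rearrangements are plain tensor reshapes; the following NumPy sketch (an illustration, not the authors' implementation; `s` is the multiplexing factor and must divide the spatial dimensions) shows subpixel and superpixel as exact inverses:

```python
import numpy as np

def subpixel(x, s):
    """Spatial-to-channel: fold each s x s spatial block into channels.
    x: (C, H, W) -> (C*s*s, H//s, W//s)."""
    c, h, w = x.shape
    assert h % s == 0 and w % s == 0
    x = x.reshape(c, h // s, s, w // s, s)
    x = x.transpose(0, 2, 4, 1, 3)            # (C, s, s, H//s, W//s)
    return x.reshape(c * s * s, h // s, w // s)

def superpixel(x, s):
    """Channel-to-spatial: the exact inverse of subpixel.
    x: (C*s*s, H//s, W//s) -> (C, H, W)."""
    c, h, w = x.shape
    assert c % (s * s) == 0
    x = x.reshape(c // (s * s), s, s, h, w)
    x = x.transpose(0, 3, 1, 4, 2)            # (C, H//s, s, W//s, s)
    return x.reshape(c // (s * s), h * s, w * s)
```

Subpixel trades resolution for channels (so a subsequent group-wise convolution sees spatial context inside its channel groups), and superpixel restores the original layout.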

SLIDE 4

Spatial Multiplexing: Evaluation

[Figure: ImageNet Top-1 accuracy (%) vs number of MAdds (millions, log scale) for MobileNetV2 with and without spatial multiplexing; accuracy points range from 40.7% to 72.7%.]

◮ Consistent accuracy improvement over the original depth-wise separable convolution
◮ Particularly effective in the low-MAdds regime
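For reference, the MAdds on the x-axis of such comparisons can be counted analytically; a minimal sketch for a stride-1, 'same'-padded depth-wise separable convolution (the example layer shape below is hypothetical, not taken from the slide):

```python
def madds_depthwise_separable(h, w, c_in, c_out, k=3):
    """Multiply-adds of a stride-1, 'same'-padded depth-wise separable conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixing the channels
    return depthwise + pointwise

# Hypothetical layer: 112x112x32 -> 112x112x64
print(madds_depthwise_separable(112, 112, 32, 64))  # 29302784 (~29.3M MAdds)
```

The point-wise term usually dominates, which is why channel-side savings matter most in the low-MAdds regime.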

SLIDE 5

Channel Multiplexing: Idea

Building-block comparison:
(a) Original residual block (3×3 conv → 3×3 conv): ResNet-18, ResNet-34
(b) Bottleneck (1×1 conv → 3×3 conv → 1×1 conv): ResNet-50, DenseNet-BC
(c) Inverted bottleneck (1×1 conv → depth-wise 3×3 conv → 1×1 conv): MobileNetV2/V3, MnasNet
(d) Proposed (1×1 conv → group-wise 3×3 conv → 1×1 conv)

[Figure: channel multiplexing. Channels 1–8 flow through repeated 1×1 conv → SpatialMUX → 1×1 conv stages; a mix-up step interleaves channel groups between stages (1 2 3 4 5 6 7 8 → 1 5 2 6 3 7 4 8), and in a reduction block a subset of channels is copied forward while the rest are left out.]

SLIDE 6

Channel Multiplexing: Evaluation

[Figure: Top-1 accuracy (%) vs number of MAdds (millions) and vs number of parameters (millions) when scaling down from the baseline (w=1.0, r=224, l=0.0) via width multiplier (w=0.75, 0.5), input resolution (r=192, 160, 128), and channel multiplexing (l=0.25, 0.5, 0.75).]

◮ Consistently outperforms existing scaling methods

SLIDE 7

Tri-Objective Search: Idea

[Figure: reference-point / reference-direction guided search. A user-supplied reference point and reference direction define a region of interest on the attainable objective set; the ideal point anchors the Pareto surface.]

◮ Simultaneously optimize for accuracy (↑), #Params (↓), and #MAdds (↓)
◮ User-preference guided search through PBI¹ decomposition

¹Qingfu Zhang and Hui Li. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731, 2007.
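The PBI scalarization from Zhang and Li combines the distance along the reference direction with a penalized distance away from it, g(x | λ, z*) = d1 + θ·d2; a minimal sketch for minimization (the penalty θ = 5 is a conventional default, an assumption here, not a value stated on the slide):

```python
import numpy as np

def pbi(f, lam, z_star, theta=5.0):
    """Penalty-based boundary intersection scalarization.
    f: objective vector (minimization), lam: reference direction,
    z_star: ideal point, theta: penalty on deviation from the direction."""
    lam = np.asarray(lam, dtype=float)
    diff = np.asarray(f, dtype=float) - np.asarray(z_star, dtype=float)
    unit = lam / np.linalg.norm(lam)
    d1 = abs(diff @ unit)                    # progress along the direction
    d2 = np.linalg.norm(diff - d1 * unit)    # deviation off the direction
    return d1 + theta * d2
```

Minimizing g pulls solutions toward the ideal point while keeping them close to the user's reference direction, which is how the search stays inside the region of interest.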

SLIDE 8

Tri-Objective Search: Evaluation

[Figure: NASBench-101 search efficiency. Top-1 accuracy vs training time (mins) and vs number of parameters (millions), comparing our search under three reference points against regularized evolution over the set of attainable models.]

◮ NASBench-101: our search is more efficient than regularized evolution²

²Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.

SLIDE 9

ImageNet-1K Classification

[Figure: ImageNet Top-1 accuracy (%) vs number of parameters (millions) and vs number of MAdds (millions) for MUXNet against MobileNetV2, MobileNetV3 large/small, MnasNet, MixNet, FBNet, ChamNet, ProxylessNAS-GPU, NASNet-A, AmoebaNet-A, and DARTS.]

| Model | Type | #MAdds | Ratio | #Params | Ratio | CPU (ms) | GPU (ms) | Top-1 (%) | Top-5 (%) |
|---|---|---|---|---|---|---|---|---|---|
| MUXNet-xs (ours) | auto | 66M‡ | 1.0x | 1.8M‡ | 1.0x | 6.8 | 18 | 66.7 | 86.8 |
| MobileNetV2 0.5 | manual | 97M | 1.5x | 2.0M | 1.1x | 6.2 | 17 | 65.4 | 86.4 |
| MobileNetV3 small | combined | 66M | 1.0x | 2.9M | 1.6x | 6.2‡ | 14 | 67.4 | – |
| MUXNet-s (ours) | auto | 117M‡ | 1.0x | 2.4M‡ | 1.0x | 9.5 | 25 | 71.6 | 90.3 |
| MobileNetV1 | manual | 575M | 4.9x | 4.2M | 1.8x | 7.3 | 20 | 70.6 | 89.5 |
| ShuffleNetV2 | manual | 146M | 1.3x | – | – | 6.8 | 11‡ | 69.4 | – |
| ChamNet-C | auto | 212M | 1.8x | 3.4M | 1.4x | – | – | 71.6 | – |
| MUXNet-m (ours) | auto | 218M‡ | 1.0x | 3.4M‡ | 1.0x | 14.7 | 42 | 75.3 | 92.5 |
| MobileNetV2 | manual | 300M | 1.4x | 3.4M | 1.0x | 8.3‡ | 23 | 72.0 | 91.0 |
| ShuffleNetV2 2× | manual | 591M | 2.7x | 7.4M | 2.2x | 11.0 | 22‡ | 74.9 | – |
| MnasNet-A1 | auto | 312M | 1.4x | 3.9M | 1.1x | 9.3‡ | 32 | 75.2 | 92.5 |
| MobileNetV3 large | combined | 219M | 1.0x | 5.4M | 1.6x | 10.0‡ | 33 | 75.2 | – |
| MUXNet-l (ours) | auto | 318M‡ | 1.0x | 4.0M‡ | 1.0x | 19.2 | 74 | 76.6 | 93.2 |
| MnasNet-A2 | auto | 340M | 1.1x | 4.8M | 1.2x | – | – | 75.6 | 92.7 |
| FBNet-C | auto | 375M | 1.2x | 5.5M | 1.4x | 9.1‡ | 31 | 74.9 | – |
| EfficientNet-B0 | auto | 390M‡ | 1.2x | 5.3M | 1.3x | 14.4 | 46 | 76.3 | 93.2 |
| MixNet-M | auto | 360M‡ | 1.1x | 5.0M | 1.2x | 24.3 | 79 | 77.0 | 93.3 |

(Cells marked "–" were not present in the source.)

‡ indicates the objective that the method explicitly optimizes through NAS.

SLIDE 10

Additional Experiments

Generalization to ImageNet-V2

[Figure: Top-5 accuracy (%) on ImageNet vs ImageNet-V2 for ShuffleNetV2, ResNet18, MUXNet-s (ours), GoogLeNet, MobileNetV2, DARTS, MnasNet-A1, NASNet-A mobile, MUXNet-m (ours), MUXNet-l (ours), DenseNet-169, and ResNeXt50 32x4d; the annotated per-model accuracy drops range from 10.0 down to 7.7 points.]

PASCAL VOC2007 Detection

| Network | #MAdds | #Params | mAP (%) |
|---|---|---|---|
| VGG16 + SSD | 35B | 26.3M | 74.3 |
| MobileNet + SSD | 1.6B | 9.5M | 67.6 |
| MobileNetV2 + SSDLite | 0.7B | 3.4M | 67.4 |
| MobileNetV2 + SSD | 1.4B | 8.9M | 73.2 |
| MUXNet-m + SSDLite | 0.5B | 3.2M | 68.6 |
| MUXNet-l + SSD | 1.4B | 9.9M | 73.8 |

ADE20K Semantic Segmentation

| Network | #MAdds | #Params | mIoU (%) | Acc (%) |
|---|---|---|---|---|
| ResNet18 + C1 | 1.8B | 11.7M | 33.82 | 76.05 |
| MobileNetV2 + C1 | 0.3B | 3.5M | 34.84 | 75.75 |
| MUXNet-m + C1 | 0.2B | 3.4M | 32.42 | 75.00 |
| ResNet18 + PPM | 1.8B | 11.7M | 38.00 | 78.64 |
| MobileNetV2 + PPM | 0.3B | 3.5M | 35.76 | 77.77 |
| MUXNet-m + PPM | 0.2B | 3.4M | 35.80 | 76.33 |

SLIDE 11

Additional Experiments

Transfer Learning on CIFAR

[Figure: transfer learning on CIFAR-10 and CIFAR-100. Top-1 accuracy (%) vs number of parameters (millions) and vs number of mult-adds (millions) for MUXNet against ResNet-50, DenseNet-169, Inception v3, MobileNetV1, MobileNetV2, NASNet-A mobile, EfficientNet-B0, and MixNet-M.]

Robustness to Degradations

[Figure: normalized Top-5 accuracy (1.1–2.3) under 19 corruption types (brightness, contrast, defocus_blur, elastic_transform, fog, frost, gaussian_blur, gaussian_noise, glass_blur, impulse_noise, jpeg_compression, motion_blur, pixelate, saturate, shot_noise, snow, spatter, speckle_noise, zoom_blur) for ShuffleNetV2, MobileNetV2, DARTS, MnasNet-A1, and MUXNet-m.]

Visualization of Segmentation Results

[Figure: qualitative results. Columns: test images, ground truth, MUXNet-m + PPM predictions.]