MUXConv: Information Multiplexing in Convolutional Neural Networks (PowerPoint Presentation)



SLIDE 1

MUXConv: Information Multiplexing in Convolutional Neural Networks

Zhichao Lu, Kalyanmoy Deb, and Vishnu Naresh Boddeti, Michigan State University

{luzhicha, kdeb, vishnu}@msu.edu https://github.com/human-analysis/MUXConv

SLIDE 2

MUXConv

◮ New layer: Multiplexed Convolutions (spatial + channel multiplexing)
◮ Idea: increase the flow of information between space and channels
◮ Goal: smaller model size, increased efficiency, maintained or improved performance

SLIDE 3

Spatial Multiplexing: Idea

[Figure: spatial multiplexing. A group-wise convolution is combined with two rearrangement operations: Subpixel (spatial-to-channel, folding each s×s spatial block into channels) and Superpixel (channel-to-spatial, the inverse unfolding).]
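These two rearrangements are plain tensor reshapes; the following NumPy sketch (an illustration, not the authors' implementation; `s` is the multiplexing factor and must divide the spatial dimensions) shows subpixel and superpixel as exact inverses:

```python
import numpy as np

def subpixel(x, s):
    """Spatial-to-channel: fold each s x s spatial block into channels.
    x: (C, H, W) -> (C*s*s, H//s, W//s)."""
    c, h, w = x.shape
    assert h % s == 0 and w % s == 0
    x = x.reshape(c, h // s, s, w // s, s)
    x = x.transpose(0, 2, 4, 1, 3)            # (C, s, s, H//s, W//s)
    return x.reshape(c * s * s, h // s, w // s)

def superpixel(x, s):
    """Channel-to-spatial: the exact inverse of subpixel.
    x: (C*s*s, H//s, W//s) -> (C, H, W)."""
    c, h, w = x.shape
    assert c % (s * s) == 0
    x = x.reshape(c // (s * s), s, s, h, w)
    x = x.transpose(0, 3, 1, 4, 2)            # (C, H//s, s, W//s, s)
    return x.reshape(c // (s * s), h * s, w * s)
```

Subpixel trades resolution for channels (so a subsequent group-wise convolution sees spatial context inside its channel groups), and superpixel restores the original layout.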

SLIDE 4

Spatial Multiplexing: Evaluation

[Figure: ImageNet Top-1 accuracy (%) vs number of MAdds (millions, log scale) for MobileNetV2 with and without spatial multiplexing; accuracy points range from 40.7% to 72.7%.]

◮ Consistent accuracy improvement over the original depth-wise separable convolution
◮ Particularly effective in the low-MAdds regime
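For reference, the MAdds on the x-axis of such comparisons can be counted analytically; a minimal sketch for a stride-1, 'same'-padded depth-wise separable convolution (the example layer shape below is hypothetical, not taken from the slide):

```python
def madds_depthwise_separable(h, w, c_in, c_out, k=3):
    """Multiply-adds of a stride-1, 'same'-padded depth-wise separable conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixing the channels
    return depthwise + pointwise

# Hypothetical layer: 112x112x32 -> 112x112x64
print(madds_depthwise_separable(112, 112, 32, 64))  # 29302784 (~29.3M MAdds)
```

The point-wise term usually dominates, which is why channel-side savings matter most in the low-MAdds regime.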

SLIDE 5

Channel Multiplexing: Idea

Building-block comparison:
(a) Original residual block (3×3 conv → 3×3 conv): ResNet-18, ResNet-34
(b) Bottleneck (1×1 conv → 3×3 conv → 1×1 conv): ResNet-50, DenseNet-BC
(c) Inverted bottleneck (1×1 conv → depth-wise 3×3 conv → 1×1 conv): MobileNetV2/V3, MnasNet
(d) Proposed (1×1 conv → group-wise 3×3 conv → 1×1 conv)

[Figure: channel multiplexing. Channels 1–8 flow through repeated 1×1 conv → SpatialMUX → 1×1 conv stages; a mix-up step interleaves channel groups between stages (1 2 3 4 5 6 7 8 → 1 5 2 6 3 7 4 8), and in a reduction block a subset of channels is copied forward while the rest are left out.]

SLIDE 6

Channel Multiplexing: Evaluation

[Figure: Top-1 accuracy (%) vs number of MAdds (millions) and vs number of parameters (millions) when scaling down from the baseline (w=1.0, r=224, l=0.0) via width multiplier (w=0.75, 0.5), input resolution (r=192, 160, 128), and channel multiplexing (l=0.25, 0.5, 0.75).]

◮ Consistently outperforms existing scaling methods

SLIDE 7

Tri-Objective Search: Idea

[Figure: reference-point / reference-direction guided search. A user-supplied reference point and reference direction define a region of interest on the attainable objective set; the ideal point anchors the Pareto surface.]

◮ Simultaneously optimize for accuracy (↑), #Params (↓), and #MAdds (↓)
◮ User-preference guided search through PBI¹ decomposition

¹Qingfu Zhang and Hui Li. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731, 2007.
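The PBI scalarization from Zhang and Li combines the distance along the reference direction with a penalized distance away from it, g(x | λ, z*) = d1 + θ·d2; a minimal sketch for minimization (the penalty θ = 5 is a conventional default, an assumption here, not a value stated on the slide):

```python
import numpy as np

def pbi(f, lam, z_star, theta=5.0):
    """Penalty-based boundary intersection scalarization.
    f: objective vector (minimization), lam: reference direction,
    z_star: ideal point, theta: penalty on deviation from the direction."""
    lam = np.asarray(lam, dtype=float)
    diff = np.asarray(f, dtype=float) - np.asarray(z_star, dtype=float)
    unit = lam / np.linalg.norm(lam)
    d1 = abs(diff @ unit)                    # progress along the direction
    d2 = np.linalg.norm(diff - d1 * unit)    # deviation off the direction
    return d1 + theta * d2
```

Minimizing g pulls solutions toward the ideal point while keeping them close to the user's reference direction, which is how the search stays inside the region of interest.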

SLIDE 8

Tri-Objective Search: Evaluation

[Figure: NASBench-101 search efficiency. Top-1 accuracy vs training time (mins) and vs number of parameters (millions), comparing our search under three reference points against regularized evolution over the set of attainable models.]

◮ NASBench-101: our search is more efficient than regularized evolution²

²Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.

SLIDE 9

ImageNet-1K Classification

[Figure: ImageNet Top-1 accuracy (%) vs number of parameters (millions) and vs number of MAdds (millions) for MUXNet against MobileNetV2, MobileNetV3 large/small, MnasNet, MixNet, FBNet, ChamNet, ProxylessNAS-GPU, NASNet-A, AmoebaNet-A, and DARTS.]

| Model | Type | #MAdds | Ratio | #Params | Ratio | CPU (ms) | GPU (ms) | Top-1 (%) | Top-5 (%) |
|---|---|---|---|---|---|---|---|---|---|
| MUXNet-xs (ours) | auto | 66M‡ | 1.0x | 1.8M‡ | 1.0x | 6.8 | 18 | 66.7 | 86.8 |
| MobileNetV2 0.5 | manual | 97M | 1.5x | 2.0M | 1.1x | 6.2 | 17 | 65.4 | 86.4 |
| MobileNetV3 small | combined | 66M | 1.0x | 2.9M | 1.6x | 6.2‡ | 14 | 67.4 | – |
| MUXNet-s (ours) | auto | 117M‡ | 1.0x | 2.4M‡ | 1.0x | 9.5 | 25 | 71.6 | 90.3 |
| MobileNetV1 | manual | 575M | 4.9x | 4.2M | 1.8x | 7.3 | 20 | 70.6 | 89.5 |
| ShuffleNetV2 | manual | 146M | 1.3x | – | – | 6.8 | 11‡ | 69.4 | – |
| ChamNet-C | auto | 212M | 1.8x | 3.4M | 1.4x | – | – | 71.6 | – |
| MUXNet-m (ours) | auto | 218M‡ | 1.0x | 3.4M‡ | 1.0x | 14.7 | 42 | 75.3 | 92.5 |
| MobileNetV2 | manual | 300M | 1.4x | 3.4M | 1.0x | 8.3‡ | 23 | 72.0 | 91.0 |
| ShuffleNetV2 2× | manual | 591M | 2.7x | 7.4M | 2.2x | 11.0 | 22‡ | 74.9 | – |
| MnasNet-A1 | auto | 312M | 1.4x | 3.9M | 1.1x | 9.3‡ | 32 | 75.2 | 92.5 |
| MobileNetV3 large | combined | 219M | 1.0x | 5.4M | 1.6x | 10.0‡ | 33 | 75.2 | – |
| MUXNet-l (ours) | auto | 318M‡ | 1.0x | 4.0M‡ | 1.0x | 19.2 | 74 | 76.6 | 93.2 |
| MnasNet-A2 | auto | 340M | 1.1x | 4.8M | 1.2x | – | – | 75.6 | 92.7 |
| FBNet-C | auto | 375M | 1.2x | 5.5M | 1.4x | 9.1‡ | 31 | 74.9 | – |
| EfficientNet-B0 | auto | 390M‡ | 1.2x | 5.3M | 1.3x | 14.4 | 46 | 76.3 | 93.2 |
| MixNet-M | auto | 360M‡ | 1.1x | 5.0M | 1.2x | 24.3 | 79 | 77.0 | 93.3 |

(Cells marked "–" were not present in the source.)

‡ indicates the objective that the method explicitly optimizes through NAS.

SLIDE 10

Additional Experiments

Generalization to ImageNet-V2

[Figure: Top-5 accuracy (%) on ImageNet vs ImageNet-V2 for ShuffleNetV2, ResNet18, MUXNet-s (ours), GoogLeNet, MobileNetV2, DARTS, MnasNet-A1, NASNet-A mobile, MUXNet-m (ours), MUXNet-l (ours), DenseNet-169, and ResNeXt50 32x4d; the annotated per-model accuracy drops range from 10.0 down to 7.7 points.]

PASCAL VOC2007 Detection

| Network | #MAdds | #Params | mAP (%) |
|---|---|---|---|
| VGG16 + SSD | 35B | 26.3M | 74.3 |
| MobileNet + SSD | 1.6B | 9.5M | 67.6 |
| MobileNetV2 + SSDLite | 0.7B | 3.4M | 67.4 |
| MobileNetV2 + SSD | 1.4B | 8.9M | 73.2 |
| MUXNet-m + SSDLite | 0.5B | 3.2M | 68.6 |
| MUXNet-l + SSD | 1.4B | 9.9M | 73.8 |

ADE20K Semantic Segmentation

| Network | #MAdds | #Params | mIoU (%) | Acc (%) |
|---|---|---|---|---|
| ResNet18 + C1 | 1.8B | 11.7M | 33.82 | 76.05 |
| MobileNetV2 + C1 | 0.3B | 3.5M | 34.84 | 75.75 |
| MUXNet-m + C1 | 0.2B | 3.4M | 32.42 | 75.00 |
| ResNet18 + PPM | 1.8B | 11.7M | 38.00 | 78.64 |
| MobileNetV2 + PPM | 0.3B | 3.5M | 35.76 | 77.77 |
| MUXNet-m + PPM | 0.2B | 3.4M | 35.80 | 76.33 |

SLIDE 11

Additional Experiments

Transfer Learning on CIFAR

[Figure: transfer learning on CIFAR-10 and CIFAR-100. Top-1 accuracy (%) vs number of parameters (millions) and vs number of mult-adds (millions) for MUXNet against ResNet-50, DenseNet-169, Inception v3, MobileNetV1, MobileNetV2, NASNet-A mobile, EfficientNet-B0, and MixNet-M.]

Robustness to Degradations

[Figure: normalized Top-5 accuracy (1.1–2.3) under 19 corruption types (brightness, contrast, defocus_blur, elastic_transform, fog, frost, gaussian_blur, gaussian_noise, glass_blur, impulse_noise, jpeg_compression, motion_blur, pixelate, saturate, shot_noise, snow, spatter, speckle_noise, zoom_blur) for ShuffleNetV2, MobileNetV2, DARTS, MnasNet-A1, and MUXNet-m.]

Visualization of Segmentation Results

[Figure: qualitative results. Columns: test images, ground truth, MUXNet-m + PPM predictions.]