

SLIDE 1

DENSELY CONNECTED CONVOLUTIONAL NETWORKS

Gao Huang*, Zhuang Liu*, Laurens van der Maaten, Kilian Q. Weinberger

CVPR 2017

Cornell University Tsinghua University Facebook AI Research

Best paper award

SLIDE 2

CONVOLUTIONAL NETWORKS

LeNet → AlexNet → VGG → Inception → ResNet

SLIDE 3

STANDARD CONNECTIVITY

SLIDE 4

RESNET CONNECTIVITY

Deep residual learning for image recognition: [He, Zhang, Ren, Sun] (CVPR 2016)

Identity mappings promote gradient propagation.

+ : Element-wise addition

SLIDE 5

DENSE CONNECTIVITY

Each layer receives the feature maps of all preceding layers.

C : Channel-wise concatenation
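A minimal PyTorch sketch of the difference between the two combination rules on the previous slide and this one (the tensor shapes are illustrative assumptions):

```python
import torch

# Two feature maps with matching spatial size: (batch, channels, height, width).
x = torch.randn(1, 64, 32, 32)   # output of an earlier layer
h = torch.randn(1, 64, 32, 32)   # output of the current layer

# ResNet combines them by element-wise addition: the shape stays (1, 64, 32, 32).
resnet_out = x + h

# DenseNet combines them by channel-wise concatenation: the shape grows to
# (1, 128, 32, 32), so all earlier features remain directly accessible.
densenet_out = torch.cat([x, h], dim=1)
```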

SLIDE 6

DENSE AND SLIM

Every layer adds only k feature channels ("slim") and passes all of them on by concatenation.

k : Growth rate

SLIDE 7

FORWARD PROPAGATION

Each layer receives the concatenation of all preceding feature maps:

x_l = h_l([x_0, x_1, ..., x_{l-1}]), e.g. x_4 = h_4([x_0, x_1, x_2, x_3])

h_l : Batch Norm → ReLU → Convolution
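A minimal PyTorch sketch of this forward rule (the layer stack, growth rate k = 12, and channel counts below are illustrative assumptions, not values from the slides):

```python
import torch
import torch.nn as nn

def dense_block_forward(layers, x0):
    """Forward pass through a dense block.

    Each layer h_l sees the channel-wise concatenation of the block
    input x0 and the outputs of all preceding layers:
        x_l = h_l([x_0, x_1, ..., x_{l-1}])
    """
    features = [x0]
    for h in layers:
        x_l = h(torch.cat(features, dim=1))  # concatenate everything so far
        features.append(x_l)                 # make x_l available to later layers
    return torch.cat(features, dim=1)        # block output: all features

# Hypothetical example: 4 layers, growth rate k = 12, block input with 24 channels.
k = 12
layers = nn.ModuleList(
    nn.Conv2d(24 + i * k, k, kernel_size=3, padding=1) for i in range(4)
)
out = dense_block_forward(layers, torch.randn(1, 24, 32, 32))
print(out.shape)  # torch.Size([1, 72, 32, 32]) -- 24 + 4*k channels
```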

SLIDE 8

COMPOSITE LAYER IN DENSENET

x_5 = h_5([x_0, ..., x_4]) → k channels

h_l : Batch Norm → ReLU → Convolution (3×3)
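A sketch of this composite layer in PyTorch (the helper name composite_layer is mine; the Batch Norm → ReLU → 3×3 Conv order is as shown on the slide):

```python
import torch.nn as nn

def composite_layer(in_channels: int, k: int) -> nn.Sequential:
    """Composite function h_l: Batch Norm -> ReLU -> 3x3 Conv,
    producing k (the growth rate) output channels."""
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, k, kernel_size=3, padding=1, bias=False),
    )
```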

SLIDE 9

COMPOSITE LAYER IN DENSENET

WITH BOTTLENECK LAYER

Higher parameter and computational efficiency.

h_l : Batch Norm → ReLU → Convolution (1×1) → Batch Norm → ReLU → Convolution (3×3)

Channels: l×k → 4×k → k
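A corresponding PyTorch sketch of the bottleneck variant (the helper name and bias-free convolutions are assumptions; the channel progression l×k → 4×k → k follows the slide):

```python
import torch.nn as nn

def bottleneck_layer(in_channels: int, k: int) -> nn.Sequential:
    """Bottleneck variant of h_l: a 1x1 conv first reduces the l*k input
    channels to 4*k, then the 3x3 conv produces the k new features."""
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, 4 * k, kernel_size=1, bias=False),   # l*k -> 4*k
        nn.BatchNorm2d(4 * k),
        nn.ReLU(inplace=True),
        nn.Conv2d(4 * k, k, kernel_size=3, padding=1, bias=False),  # 4*k -> k
    )
```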

SLIDE 10

DENSENET

Input → Convolution → Dense Block 1 → Convolution → Pooling → Dense Block 2 → Convolution → Pooling → Dense Block 3 → Pooling → Linear → Output

Feature map sizes match within each block; the convolution + pooling transitions between blocks reduce the feature map size.
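A sketch of one transition layer between dense blocks (the slide only specifies convolution + pooling; the exact Batch Norm/ReLU placement is an assumption following common implementations):

```python
import torch.nn as nn

def transition_layer(in_channels: int, out_channels: int) -> nn.Sequential:
    """Transition between dense blocks: a 1x1 convolution followed by
    2x2 average pooling, which halves the feature map size so the next
    block again works on a single, fixed map size."""
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),  # BN/ReLU placement follows common implementations
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
```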

SLIDE 11

ADVANTAGES OF DENSE CONNECTIVITY

SLIDE 12

ADVANTAGE 1: STRONG GRADIENT FLOW

The error signal propagates directly to every layer through the dense connections: an implicit "deep supervision".

Deeply-supervised nets: [Lee, Xie, Gallagher, Zhang, Tu] (AISTATS 2015)

SLIDE 13

ADVANTAGE 2: PARAMETER & COMPUTATIONAL EFFICIENCY

ResNet connectivity:
Input: C channels → Output: C channels
#parameters: O(C×C)
Correlated features

DenseNet connectivity:
Input: l×k channels → Output: k channels
k: Growth rate, k << C
#parameters: O(l×k×k)
Diversified features
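A back-of-the-envelope check of the two parameter counts (C, k, and l below are hypothetical values, not from the slides):

```python
# Parameter count for a single 3x3 convolution, ignoring biases
# and batch-norm parameters.
def conv3x3_params(c_in: int, c_out: int) -> int:
    return 3 * 3 * c_in * c_out

C, k, l = 256, 12, 6   # ResNet width C; DenseNet growth rate k; layer index l

resnet_layer = conv3x3_params(C, C)        # O(C*C):   3*3*256*256 = 589,824
densenet_layer = conv3x3_params(l * k, k)  # O(l*k*k): 3*3*72*12   = 7,776
print(resnet_layer / densenet_layer)       # ~76x fewer parameters in this layer
```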

SLIDE 14

ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURES

Standard connectivity:

x → h1(x) → h2(x) → h3(x) → h4(x) (increasingly complex features)

Classifier: y = w4 h4(x)

The classifier uses only the most complex (highest-level) features.

SLIDE 15

ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURES

Dense connectivity:

x → h1(x) → h2(x) → h3(x) → h4(x) (increasingly complex features), with every output concatenated forward

Classifier: y = w0 x + w1 h1(x) + w2 h2(x) + w3 h3(x) + w4 h4(x)

The classifier uses features of all complexity levels.
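A small PyTorch sketch of how a single linear classifier over the concatenated features realizes this weighted sum (channel counts and the 10-class head are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical feature maps x, h1(x), ..., h4(x) of increasing complexity,
# all with matching spatial size (12 channels each, 8x8 maps).
feats = [torch.randn(1, 12, 8, 8) for _ in range(5)]

# Dense connectivity hands the classifier the concatenation of all of them;
# a single linear layer then realizes y = w0*x + w1*h1(x) + ... + w4*h4(x).
all_feats = torch.cat(feats, dim=1)     # (1, 60, 8, 8)
pooled = all_feats.mean(dim=(2, 3))     # global average pooling -> (1, 60)
classifier = nn.Linear(60, 10)          # 10 classes; weights split into w0..w4
y = classifier(pooled)
```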

SLIDE 16

RESULTS

SLIDE 17

RESULTS ON CIFAR-10

Test error (%), with / without data augmentation:

ResNet (110 layers, 1.7M): 6.41 / 11.26
ResNet (1001 layers, 10.2M): 4.62 / 10.56
DenseNet (100 layers, 0.8M): 4.5 / 5.9
DenseNet (250 layers, 15.3M): 3.6 / 5.2
Previous SOTA: 4.2 / 7.3

SLIDE 18

RESULTS ON CIFAR-100

Test error (%), with / without data augmentation:

ResNet (110 layers, 1.7M): 27.22 / 35.58
ResNet (1001 layers, 10.2M): 22.71 / 33.47
DenseNet (100 layers, 0.8M): 22.3 / 24.2
DenseNet (250 layers, 15.3M): 17.6 / 19.6
Previous SOTA: 20.5 / 28.2

SLIDE 19

RESULTS ON IMAGENET

[Plots: Top-1 error (%) vs GFLOPs and Top-1 error (%) vs #parameters (M), comparing DenseNet-121/169/201/264 (including k=48) against ResNet-34/50/101/152. DenseNets reach the same error with fewer parameters and less computation.]

Best result: DenseNet-264 (k=48), Top-1: 20.27%, Top-5: 5.17%

SLIDE 20

MULTI-SCALE DENSENET

(Preview)

Intermediate classifiers (Classifier 1 ... Classifier 4) are attached at multiple scales; at test time, evaluation stops at the first classifier whose confidence exceeds a threshold:

Classifier 1: cat 0.2 (0.2 ≱ threshold, continue) → Classifier 2: cat 0.4 (0.4 ≱ threshold, continue) → Classifier 3: cat 0.6 (0.6 > threshold, predict)

Multi-Scale DenseNet: [Huang, Chen, Li, Wu, van der Maaten, Weinberger] (arXiv preprint 1703.09844)
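A minimal sketch of this early-exit rule (`classifiers`, `features`, and the threshold value are hypothetical; batch size 1 is assumed):

```python
import torch

def anytime_predict(classifiers, features, threshold=0.5):
    """Early-exit inference sketch: try the intermediate classifiers in
    order and stop at the first one whose confidence clears the threshold.
    `classifiers` and `features` are hypothetical per-scale modules and
    their input tensors; batch size 1 is assumed."""
    for clf, feat in zip(classifiers, features):
        probs = torch.softmax(clf(feat), dim=1)
        confidence, label = probs.max(dim=1)
        if confidence.item() > threshold:  # e.g. cat: 0.6 > 0.5 -> exit here
            return label, confidence
    return label, confidence               # otherwise keep the last prediction
```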

SLIDE 21

MULTI-SCALE DENSENET

(Preview)

"Easy" test examples exit at an early classifier; "hard" examples continue to the later classifiers (Classifier 1 ... Classifier 4).

Inference speed: ~2.6x faster than ResNets, ~1.3x faster than DenseNets

SLIDE 22

Memory-efficient Torch implementation: https://github.com/liuzhuang13/DenseNet

Other implementations:

  • Our Caffe implementation
  • Our memory-efficient Caffe implementation
  • Our memory-efficient PyTorch implementation
  • PyTorch implementation by Andreas Veit
  • PyTorch implementation by Brandon Amos
  • MXNet implementation by Nicatio
  • MXNet implementation (supports ImageNet) by Xiong Lin
  • Tensorflow implementation by Yixuan Li
  • Tensorflow implementation by Laurent Mazare
  • Tensorflow implementation (with BC structure) by Illarion Khlestov
  • Lasagne implementation by Jan Schlüter
  • Keras implementation by tdeboissiere
  • Keras implementation by Roberto de Moura Estevão Filho
  • Keras implementation (with BC structure) by Somshubra Majumdar
  • Chainer implementation by Toshinori Hanya
  • Chainer implementation by Yasunori Kudo

SLIDE 23

REFERENCES

  • Kaiming He, et al. "Deep residual learning for image recognition." CVPR 2016.
  • Chen-Yu Lee, et al. "Deeply-supervised nets." AISTATS 2015.
  • Gao Huang, et al. "Deep networks with stochastic depth." ECCV 2016.
  • Gao Huang, et al. "Multi-Scale Dense Convolutional Networks for Efficient Prediction." arXiv preprint arXiv:1703.09844 (2017).
  • Geoff Pleiss, et al. "Memory-Efficient Implementation of DenseNets." arXiv preprint arXiv:1707.06990 (2017).