An Overview of Deep Residual Learning
Semih Yagcioglu
01.03.2016
Deep Residual Learning — Microsoft Research Asia (MSRA)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.
Ren et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks". NIPS 2015.
Slide Credit: He et al. (MSRA)
*improvements are relative numbers
ImageNet Classification top-5 error (%):
  ILSVRC'10  shallow                28.2
  ILSVRC'11  shallow                25.8
  ILSVRC'12  AlexNet, 8 layers      16.4
  ILSVRC'13  8 layers               11.7
  ILSVRC'14  VGG, 19 layers          7.3
  ILSVRC'14  GoogleNet, 22 layers    6.7
  ILSVRC'15  ResNet, 152 layers      3.57
Slide Credit: He et al. (MSRA)
PASCAL VOC 2007 Object Detection mAP (%):
  HOG, DPM               shallow       34
  AlexNet (RCNN)         8 layers      58
  VGG (RCNN)             16 layers     66
  ResNet (Faster RCNN)*  101 layers    86
*w/ other improvements & more data
Engines of visual recognition
AlexNet, 8 layers (ILSVRC 2012):
11x11 conv, 96, /4, pool/2 → 5x5 conv, 256, pool/2 → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256, pool/2 → fc 4096 → fc 4096 → fc 1000
VGG, 19 layers (ILSVRC 2014):
2x (3x3 conv, 64), pool/2 → 2x (3x3 conv, 128), pool/2 → 4x (3x3 conv, 256), pool/2 → 4x (3x3 conv, 512), pool/2 → 4x (3x3 conv, 512), pool/2 → fc 4096 → fc 4096 → fc 1000
GoogleNet, 22 layers (ILSVRC 2014):
[Architecture diagram: 7x7 and 3x3 stem convolutions with max pooling and local response norm, then stacked Inception modules — parallel 1x1, 3x3, and 5x5 convolutions plus max pooling, depth-concatenated — with two auxiliary softmax classifiers, ending in average pool → fc → softmax]
ResNet, 152 layers (ILSVRC 2015):
[Architecture diagram: 7x7 conv, 64, /2, pool/2, then stacked bottleneck blocks (1x1 → 3x3 → 1x1 conv) at widths 64/256, 128/512, 256/1024, and 512/2048 (stride 2 entering each stage), ending with average pooling and fc 1000 — shown at the same scale as AlexNet (8 layers) and VGG (19 layers)]
[Plot: CIFAR-10 train error (%) and test error (%) for plain 20-layer and 56-layer nets — the 56-layer net has higher error than the 20-layer net on both train and test]
[Plot: CIFAR-10 plain nets (plain-20/32/44/56), error (%) — deeper is worse: 56-layer > 44-layer > 32-layer > 20-layer]
[Plot: ImageNet-1000 plain nets (plain-18/34), error (%); solid: test/val, dashed: train — the 34-layer net is worse than the 18-layer net]
A shallower model (18 layers) and a deeper counterpart (34 layers):
[Diagram: plain 18-layer and 34-layer nets — 7x7 conv, 64, /2, then stages of 3x3 convs at widths 64, 128, 256, 512 (stride 2 entering each stage), fc 1000]
By construction, the deeper model need not have higher training error: copy the learned shallower model and set the "extra" layers to identity — the result matches the shallower model exactly. Yet deeper plain nets do show higher training error, so the solver cannot find such a solution when going deeper…
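The by-construction argument can be sketched numerically: a toy two-layer "learned" model stays exactly equivalent when identity layers are appended (a minimal NumPy sketch with made-up random weights, not the paper's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# A "learned" shallower model: two weight layers with a ReLU between.
W1 = rng.standard_normal((4, 4))
W2 = rng.standard_normal((4, 4))

def shallower(x):
    return W2 @ relu(W1 @ x)

# Deeper counterpart by construction: copy the shallower model and
# insert "extra" layers whose weights are the identity. ReLU is a
# no-op on the already non-negative hidden activations, so the extra
# layers change nothing.
I = np.eye(4)

def deeper(x):
    h = relu(W1 @ x)
    h = relu(I @ h)  # extra layer 1: identity weights
    h = relu(I @ h)  # extra layer 2: identity weights
    return W2 @ h

x = rng.standard_normal(4)
print(np.allclose(shallower(x), deeper(x)))  # True
```

So a solution with no-worse training error exists for the deeper net; the degradation problem is that plain-net optimization fails to find it.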
Take any two stacked layers:
  x → weight layer → relu → weight layer → H(x)
H(x) is any desired mapping. In a plain net we hope the 2 weight layers fit H(x) directly; in a residual net we instead hope they fit the residual F(x) = H(x) − x, and recover H(x) = F(x) + x through an identity shortcut.
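As a sketch of the residual formulation, here is a toy residual block with dense layers standing in for the 3x3 convolutions (illustrative only; the weight shapes and scales are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Two weight layers of a residual block, simplified from 3x3 convs to
# dense layers on a 64-d feature vector.
W1 = 0.1 * rng.standard_normal((64, 64))
W2 = 0.1 * rng.standard_normal((64, 64))

def residual_block(x):
    f = W2 @ relu(W1 @ x)  # F(x): what the two weight layers must fit
    return relu(f + x)     # H(x) = F(x) + x via the identity shortcut

x = rng.standard_normal(64)
y = residual_block(x)

# If the weights were zero, F(x) = 0 and the block would reduce to
# relu(x): near-identity mappings are easy for a residual block to express.
print(y.shape)  # (64,)
```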
via reddit: londons_explorer
Highway networks: a new architecture designed to ease gradient-based training of very deep networks via "information highways". A soft gate that depends on the data — alpha, a function of the activation with 0 < alpha < 1 — lets alpha times the activations pass through the layer's transformation, while 1 − alpha times the input is forwarded directly to the next layer.
LSTM: designed to avoid the long-term dependency problem, i.e. remembering information for long periods of time. Each memory cell has an input gate, an output gate, and an internal state that feeds into itself unperturbed across time steps.
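The gating idea above can be sketched as a single highway-style layer (a minimal NumPy sketch; the dense weights, tanh transform, and negative gate bias are illustrative assumptions, not the original architecture's exact form):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 8
Wh = 0.1 * rng.standard_normal((d, d))  # transform weights
Wt = 0.1 * rng.standard_normal((d, d))  # gate weights
bt = -2.0 * np.ones(d)                  # negative bias starts the gate near "carry"

def highway_layer(x):
    h = np.tanh(Wh @ x)           # transformed activations
    alpha = sigmoid(Wt @ x + bt)  # data-dependent soft gate, 0 < alpha < 1
    return alpha * h + (1.0 - alpha) * x  # gated mix: transform vs. carry

x = rng.standard_normal(d)
y = highway_layer(x)
print(y.shape)
```

A residual block can be read as the special case where the gate is fixed: the input is always carried through, and the layer only adds F(x) on top.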
plain net vs. ResNet:
[Diagram: 34-layer plain net and 34-layer ResNet side by side — both: 7x7 conv, 64, /2, pool/2, then stacks of 3x3 convs at widths 64, 128, 256, 512 (stride 2 entering each stage), avg pool, fc 1000; the ResNet adds an identity shortcut around every pair of 3x3 layers]
[Plot: CIFAR-10 plain nets (plain-20/32/44/56) vs. ResNets (ResNet-20/32/44/56/110), error (%); solid: test, dashed: train — plain-net error grows with depth, while ResNet error keeps falling all the way to ResNet-110]
[Plot: ImageNet plain nets (plain-18/34) vs. ResNets (ResNet-18/34), error (%); solid: test, dashed: train — the 34-layer plain net is worse than the 18-layer one, while ResNet-34 is better than ResNet-18]
Basic block (64-d input): 3x3, 64 → relu → 3x3, 64
Bottleneck block (256-d input, for ResNet-50/101/152): 1x1, 64 → relu → 3x3, 64 → relu → 1x1, 256
The two designs have similar complexity.
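The similar-complexity claim can be checked with a quick weight count (biases and batch-norm parameters ignored; back-of-the-envelope arithmetic, not the paper's exact accounting):

```python
# Count weights in each residual block design.
basic = 2 * (3 * 3 * 64 * 64)      # two 3x3 convs, 64 -> 64 channels
bottleneck = (1 * 1 * 256 * 64     # 1x1 conv reduces 256 -> 64
              + 3 * 3 * 64 * 64    # 3x3 conv on the thin 64-d features
              + 1 * 1 * 64 * 256)  # 1x1 conv restores 64 -> 256
print(basic, bottleneck)  # 73728 69632
```

Despite operating on 256-d features, the bottleneck block is slightly cheaper than the basic block, because the 1x1 convolutions keep the expensive 3x3 convolution on the reduced 64-d representation.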
10-crop testing, top-5 val error (%):
  ResNet-34   7.4
  ResNet-50   6.7
  ResNet-101  6.1
  ResNet-152  5.7
Even the 152-layer model has lower time complexity than VGG-16/19.
Ren et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
New ImageNet Record!
Szegedy et al, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016
Slide Credit: Fei-Fei Li & Andrej Karpathy & Justin Johnson