An Overview of Deep Residual Learning
Semih Yagcioglu
01.03.2016
Deep Residual Learning — Microsoft Research Asia (MSRA)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.
Ren et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks". NIPS 2015.
Slide Credit: He et al. (MSRA)
*improvements are relative numbers
ImageNet Classification top-5 error (%):
  ILSVRC'10  shallow                28.2
  ILSVRC'11  shallow                25.8
  ILSVRC'12  AlexNet, 8 layers      16.4
  ILSVRC'13  8 layers               11.7
  ILSVRC'14  VGG, 19 layers          7.3
  ILSVRC'14  GoogleNet, 22 layers    6.7
  ILSVRC'15  ResNet, 152 layers      3.57
Slide Credit: He et al. (MSRA)
PASCAL VOC 2007 Object Detection mAP (%):
  HOG, DPM               shallow       34
  AlexNet (RCNN)         8 layers      58
  VGG (RCNN)             16 layers     66
  ResNet (Faster RCNN)*  101 layers    86
*w/ other improvements & more data
Engines of visual recognition
AlexNet, 8 layers (ILSVRC 2012):
11x11 conv, 96, /4, pool/2 → 5x5 conv, 256, pool/2 → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256, pool/2 → fc 4096 → fc 4096 → fc 1000
VGG, 19 layers (ILSVRC 2014):
2x (3x3 conv, 64), pool/2 → 2x (3x3 conv, 128), pool/2 → 4x (3x3 conv, 256), pool/2 → 4x (3x3 conv, 512), pool/2 → 4x (3x3 conv, 512), pool/2 → fc 4096 → fc 4096 → fc 1000
GoogleNet, 22 layers (ILSVRC 2014):
[Architecture diagram: 7x7 and 3x3 stem convolutions with max pooling and local response norm, then stacked Inception modules — parallel 1x1, 3x3, and 5x5 convolutions plus max pooling, depth-concatenated — with two auxiliary softmax classifiers, ending in average pool → fc → softmax]
ResNet, 152 layers (ILSVRC 2015):
[Architecture diagram: 7x7 conv, 64, /2, pool/2, then stacked bottleneck blocks (1x1 → 3x3 → 1x1 conv) at widths 64/256, 128/512, 256/1024, and 512/2048 (stride 2 entering each stage), ending with average pooling and fc 1000 — shown at the same scale as AlexNet (8 layers) and VGG (19 layers)]
[Plot: CIFAR-10 train error (%) and test error (%) for plain 20-layer and 56-layer nets — the 56-layer net has higher error than the 20-layer net on both train and test]
[Plot: CIFAR-10 plain nets (plain-20/32/44/56), error (%) — deeper is worse: 56-layer > 44-layer > 32-layer > 20-layer]
[Plot: ImageNet-1000 plain nets (plain-18/34), error (%); solid: test/val, dashed: train — the 34-layer net is worse than the 18-layer net]
A shallower model (18 layers) and a deeper counterpart (34 layers):
[Diagram: plain 18-layer and 34-layer nets — 7x7 conv, 64, /2, then stages of 3x3 convs at widths 64, 128, 256, 512 (stride 2 entering each stage), fc 1000]
By construction, the deeper model need not have higher training error: copy the learned shallower model and set the "extra" layers to identity — the result matches the shallower model exactly. Yet deeper plain nets do show higher training error, so the solver cannot find such a solution when going deeper…
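The by-construction argument can be sketched numerically: a toy two-layer "learned" model stays exactly equivalent when identity layers are appended (a minimal NumPy sketch with made-up random weights, not the paper's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# A "learned" shallower model: two weight layers with a ReLU between.
W1 = rng.standard_normal((4, 4))
W2 = rng.standard_normal((4, 4))

def shallower(x):
    return W2 @ relu(W1 @ x)

# Deeper counterpart by construction: copy the shallower model and
# insert "extra" layers whose weights are the identity. ReLU is a
# no-op on the already non-negative hidden activations, so the extra
# layers change nothing.
I = np.eye(4)

def deeper(x):
    h = relu(W1 @ x)
    h = relu(I @ h)  # extra layer 1: identity weights
    h = relu(I @ h)  # extra layer 2: identity weights
    return W2 @ h

x = rng.standard_normal(4)
print(np.allclose(shallower(x), deeper(x)))  # True
```

So a solution with no-worse training error exists for the deeper net; the degradation problem is that plain-net optimization fails to find it.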
Take any two stacked layers:
  x → weight layer → relu → weight layer → H(x)
H(x) is any desired mapping. In a plain net we hope the 2 weight layers fit H(x) directly; in a residual net we instead hope they fit the residual F(x) = H(x) − x, and recover H(x) = F(x) + x through an identity shortcut.
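As a sketch of the residual formulation, here is a toy residual block with dense layers standing in for the 3x3 convolutions (illustrative only; the weight shapes and scales are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Two weight layers of a residual block, simplified from 3x3 convs to
# dense layers on a 64-d feature vector.
W1 = 0.1 * rng.standard_normal((64, 64))
W2 = 0.1 * rng.standard_normal((64, 64))

def residual_block(x):
    f = W2 @ relu(W1 @ x)  # F(x): what the two weight layers must fit
    return relu(f + x)     # H(x) = F(x) + x via the identity shortcut

x = rng.standard_normal(64)
y = residual_block(x)

# If the weights were zero, F(x) = 0 and the block would reduce to
# relu(x): near-identity mappings are easy for a residual block to express.
print(y.shape)  # (64,)
```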
via reddit: londons_explorer
Highway networks: a new architecture designed to ease gradient-based training of very deep networks via "information highways". A soft gate that depends on the data — alpha, a function of the activation with 0 < alpha < 1 — lets alpha times the activations pass through the layer's transformation, while 1 − alpha times the input is forwarded directly to the next layer.
LSTM: designed to avoid the long-term dependency problem, i.e. remembering information for long periods of time. Each memory cell has an input gate, an output gate, and an internal state that feeds into itself unperturbed across time steps.
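The gating idea above can be sketched as a single highway-style layer (a minimal NumPy sketch; the dense weights, tanh transform, and negative gate bias are illustrative assumptions, not the original architecture's exact form):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 8
Wh = 0.1 * rng.standard_normal((d, d))  # transform weights
Wt = 0.1 * rng.standard_normal((d, d))  # gate weights
bt = -2.0 * np.ones(d)                  # negative bias starts the gate near "carry"

def highway_layer(x):
    h = np.tanh(Wh @ x)           # transformed activations
    alpha = sigmoid(Wt @ x + bt)  # data-dependent soft gate, 0 < alpha < 1
    return alpha * h + (1.0 - alpha) * x  # gated mix: transform vs. carry

x = rng.standard_normal(d)
y = highway_layer(x)
print(y.shape)
```

A residual block can be read as the special case where the gate is fixed: the input is always carried through, and the layer only adds F(x) on top.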
plain net vs. ResNet:
[Diagram: 34-layer plain net and 34-layer ResNet side by side — both: 7x7 conv, 64, /2, pool/2, then stacks of 3x3 convs at widths 64, 128, 256, 512 (stride 2 entering each stage), avg pool, fc 1000; the ResNet adds an identity shortcut around every pair of 3x3 layers]
[Plot: CIFAR-10 plain nets (plain-20/32/44/56) vs. ResNets (ResNet-20/32/44/56/110), error (%); solid: test, dashed: train — plain-net error grows with depth, while ResNet error keeps falling all the way to ResNet-110]
[Plot: ImageNet plain nets (plain-18/34) vs. ResNets (ResNet-18/34), error (%); solid: test, dashed: train — the 34-layer plain net is worse than the 18-layer one, while ResNet-34 is better than ResNet-18]
Basic block (64-d input): 3x3, 64 → relu → 3x3, 64
Bottleneck block (256-d input, for ResNet-50/101/152): 1x1, 64 → relu → 3x3, 64 → relu → 1x1, 256
The two designs have similar complexity.
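The similar-complexity claim can be checked with a quick weight count (biases and batch-norm parameters ignored; back-of-the-envelope arithmetic, not the paper's exact accounting):

```python
# Count weights in each residual block design.
basic = 2 * (3 * 3 * 64 * 64)      # two 3x3 convs, 64 -> 64 channels
bottleneck = (1 * 1 * 256 * 64     # 1x1 conv reduces 256 -> 64
              + 3 * 3 * 64 * 64    # 3x3 conv on the thin 64-d features
              + 1 * 1 * 64 * 256)  # 1x1 conv restores 64 -> 256
print(basic, bottleneck)  # 73728 69632
```

Despite operating on 256-d features, the bottleneck block is slightly cheaper than the basic block, because the 1x1 convolutions keep the expensive 3x3 convolution on the reduced 64-d representation.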
10-crop testing, top-5 val error (%):
  ResNet-34   7.4
  ResNet-50   6.7
  ResNet-101  6.1
  ResNet-152  5.7
Even the 152-layer model has lower time complexity than VGG-16/19.
Ren et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
New ImageNet Record!
Szegedy et al, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016
Slide Credit: Fei-Fei Li & Andrej Karpathy & Justin Johnson