An Overview of Deep Residual Learning. Semih Yagcioglu, 01.03.2016. (PowerPoint presentation)



SLIDE 1

An Overview of Deep Residual Learning

Semih Yagcioglu

01.03.2016

SLIDE 2

Deep Residual Learning

  • Microsoft Research Asia (MSRA)
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.
  • Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks". NIPS 2015.
  • ILSVRC & COCO 2015 competitions
SLIDE 3

Slide Credit: He et al. (MSRA)

MSRA @ ILSVRC & COCO 2015 Competitions

  • 1st places in all five main tracks:
    • ImageNet Classification: "ultra-deep" 152-layer nets
    • ImageNet Detection: 16% better than 2nd
    • ImageNet Localization: 27% better than 2nd
    • COCO Detection: 11% better than 2nd
    • COCO Segmentation: 12% better than 2nd

*improvements are relative numbers

SLIDE 4

SLIDE 5

SLIDE 6

SLIDE 7

SLIDE 8

Slide Credit: He et al. (MSRA)

Revolution of Depth

ImageNet Classification top-5 error (%):

  • ILSVRC'10 (shallow): 28.2
  • ILSVRC'11 (shallow): 25.8
  • ILSVRC'12, AlexNet (8 layers): 16.4
  • ILSVRC'13 (8 layers): 11.7
  • ILSVRC'14, VGG (19 layers): 7.3
  • ILSVRC'14, GoogleNet (22 layers): 6.7
  • ILSVRC'15, ResNet (152 layers): 3.57

SLIDE 9

Slide Credit: He et al. (MSRA)

Revolution of Depth: Engines of visual recognition

PASCAL VOC 2007 Object Detection mAP (%):

  • HOG, DPM (shallow): 34
  • AlexNet, R-CNN (8 layers): 58
  • VGG, R-CNN (16 layers): 66
  • ResNet, Faster R-CNN* (101 layers): 86

*with other improvements & more data

SLIDE 10

Residual learning reformulates the learning procedure and redirects the information flow in deep neural networks.

SLIDE 11

Slide Credit: He et al. (MSRA)

Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012):
11x11 conv, 96, /4, pool/2 → 5x5 conv, 256, pool/2 → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256, pool/2 → fc 4096 → fc 4096 → fc 1000

SLIDE 12

Slide Credit: He et al. (MSRA)

Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012):
11x11 conv, 96, /4, pool/2 → 5x5 conv, 256, pool/2 → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256, pool/2 → fc 4096 → fc 4096 → fc 1000

VGG, 19 layers (ILSVRC 2014):
3x3 conv, 64 → 3x3 conv, 64, pool/2 → 3x3 conv, 128 → 3x3 conv, 128, pool/2 → 3x3 conv, 256 (x4), pool/2 → 3x3 conv, 512 (x4), pool/2 → 3x3 conv, 512 (x4), pool/2 → fc 4096 → fc 4096 → fc 1000

[Figure: GoogleNet, 22 layers (ILSVRC 2014): inception-module architecture diagram]

SLIDE 13

Slide Credit: He et al. (MSRA)

[Figure: AlexNet, 8 layers (ILSVRC 2012), VGG, 19 layers (ILSVRC 2014), and ResNet, 152 layers (ILSVRC 2015) drawn side by side; the ResNet column is a long stack of 1x1/3x3/1x1 bottleneck blocks]

Revolution of Depth

SLIDE 14

Slide Credit: He et al. (MSRA)

[Figure: ResNet-152 layer listing: 7x7 conv, 64, /2, pool/2, then stacked 1x1/3x3/1x1 bottleneck blocks with 64 and 128 filters]

Revolution of Depth

ResNet, 152 layers

SLIDE 15

Slide Credit: He et al. (MSRA)

[Figure: ResNet-152 layer listing, continued: 1x1/3x3/1x1 bottleneck blocks with 256 filters]

Revolution of Depth

ResNet, 152 layers

SLIDE 16

Slide Credit: He et al. (MSRA)

[Figure: ResNet-152 layer listing, continued: 1x1/3x3/1x1 bottleneck blocks with 256 filters]

Revolution of Depth

ResNet, 152 layers

SLIDE 17

Slide Credit: He et al. (MSRA)

[Figure: ResNet-152 layer listing, end: final 1x1/3x3/1x1 bottleneck blocks with 512 filters, avg pool, fc 1000]

Revolution of Depth

ResNet, 152 layers

SLIDE 18

Slide Credit: He et al. (MSRA)

Is learning better networks as simple as stacking more layers?

SLIDE 19

Slide Credit: He et al. (MSRA)

Simply stacking layers?

[Figure: CIFAR-10 training error (left) and test error (right) vs. iterations (1e4) for plain 20-layer and 56-layer nets; the 56-layer net is worse on both]

  • Plain nets: stacking 3x3 conv layers…
  • 56-layer net has higher training error and test error than 20-layer net
SLIDE 20

Slide Credit: He et al. (MSRA)

Simply stacking layers?

[Figure: error (%) vs. iterations (1e4) for plain nets. Left: CIFAR-10, plain-20/32/44/56. Right: ImageNet-1000, plain-18/34. Solid: test/val error; dashed: training error]

  • "Overly deep" plain nets have higher training error
  • A general phenomenon, observed in many datasets

SLIDE 21

Slide Credit: He et al. (MSRA)

[Figure: a shallower model (18 layers) next to a deeper counterpart (34 layers); the deeper net's "extra" layers are highlighted]

  • A deeper model should not have higher training error
  • A solution by construction:
    • original layers: copied from a learned shallower model
    • extra layers: set as identity
    • gives at least the same training error
  • Optimization difficulties: solvers cannot find the solution when going deeper…

SLIDE 22

Slide Credit: He et al. (MSRA)

Deep Residual Learning

  • Plain net: any two stacked layers

[Figure: x → weight layer → relu → weight layer → relu → H(x)]

H(x) is any desired mapping; hope the 2 weight layers fit H(x)

SLIDE 23

Slide Credit: He et al. (MSRA)

Deep Residual Learning

  • Residual net: let the 2 weight layers fit the residual F(x) = H(x) − x; an identity shortcut adds x back, so the block outputs H(x) = F(x) + x
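A minimal numeric sketch of this block (plain Python; the toy fully connected "weight layers" are hypothetical stand-ins for the paper's conv layers):

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(v, w, b):
    # toy fully connected "weight layer": out_i = sum_j w[i][j] * v[j] + b[i]
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi
            for row, bi in zip(w, b)]

def residual_block(x, w1, b1, w2, b2):
    # F(x): two weight layers with a ReLU in between
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    # H(x) = F(x) + x: the identity shortcut adds the input back,
    # followed by the final ReLU
    return relu([fi + xi for fi, xi in zip(f, x)])

# If both weight layers are zero, F(x) = 0 and the block reduces to the
# identity (for non-negative inputs): extra residual layers come "for free"
x = [1.0, 2.0]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
print(residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b))  # [1.0, 2.0]
```

This illustrates why the construction from the previous slide is easy to optimize: driving F(x) to zero is enough to recover the shallower model.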
SLIDE 24

Slide Credit: He et al. (MSRA)

Deep Residual Learning

SLIDE 25

Building Block Oversimplified

via reddit: londons_explorer

SLIDE 26

Slide Credit: He et al. (MSRA)

Related Works – Residual Representations

  • VLAD & Fisher Vector [Jegou et al. 2010], [Perronnin et al. 2007]: encoding residual vectors; powerful shallower representations.
  • Product Quantization (IVF-ADC) [Jegou et al. 2011]: quantizing residual vectors; efficient nearest-neighbor search.
  • MultiGrid & hierarchical preconditioning [Briggs et al. 2000], [Szeliski 1990, 2006]: solving residual sub-problems; efficient PDE solvers.
SLIDE 27

More Related Work

  • Highway Networks - Srivastava et al. (2015) (http://arxiv.org/abs/1505.00387)
    Introduces an architecture designed to ease gradient-based training of very deep networks by allowing unimpeded information flow across many layers on "information highways". A data-dependent soft gate passes alpha * H(x) through the layer and forwards (1 - alpha) * x directly to the next layer, where 0 < alpha < 1 is a function of the activation.
  • LSTM Networks - Hochreiter & Schmidhuber (1997)
    Designed to avoid the long-term dependency problem, i.e. remembering information for long periods of time. Each memory cell is associated with an input gate, an output gate, and an internal state that feeds into itself unperturbed across time steps.
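The highway gating rule above can be sketched for a single scalar unit (plain Python; the squared-input transform and the sigmoid gate parameterization are illustrative assumptions, not the paper's exact formulation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_unit(x, transform, gate_w, gate_b):
    # alpha: a data-dependent soft gate in (0, 1)
    alpha = sigmoid(gate_w * x + gate_b)
    # output = alpha * H(x) + (1 - alpha) * x
    return alpha * transform(x) + (1.0 - alpha) * x

# With a strongly negative gate bias, alpha -> 0 and the unit approaches
# the identity: the input is forwarded unimpeded to the next layer.
out = highway_unit(3.0, transform=lambda v: v * v, gate_w=0.0, gate_b=-20.0)
print(round(out, 6))  # 3.0
```

Note the contrast with ResNets: the highway shortcut is gated (and can be closed), while the residual shortcut is always the plain identity.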

SLIDE 28

Slide Credit: He et al. (MSRA)

[Figure: 34-layer plain net (left) and 34-layer ResNet (right): 7x7 conv, 64, /2, pool/2, then stacked 3x3 conv stages with 64, 128, 256, and 512 filters, avg pool, fc 1000; the ResNet adds shortcut connections]

Network "Design"

  • Keep it simple
  • Our basic design (VGG-style):
    • all 3x3 conv (almost)
    • spatial size /2 => # filters x2
    • Simple design; just deep!
  • Other remarks:
    • no max pooling (almost)
    • no hidden fc
    • no dropout

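The "spatial size /2 => # filters x2" rule keeps per-layer time complexity roughly constant, since the cost of a convolution scales with H·W·C_in·C_out·k². A quick sanity check (plain Python; the 56x56, 64-filter stage sizes are illustrative ImageNet-style numbers):

```python
def conv_cost(h, w, c_in, c_out, k=3):
    # multiply-accumulate count for a k x k convolution producing an
    # h x w x c_out output from an h x w x c_in input (stride 1, padded)
    return h * w * c_in * c_out * k * k

# a 3x3 conv stage at 56x56 with 64 filters...
before = conv_cost(56, 56, 64, 64)
# ...vs the next stage: spatial size halved, filter count doubled
after = conv_cost(28, 28, 128, 128)
print(before == after)  # True
```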

SLIDE 29

SLIDE 30

A Residual Block

SLIDE 31

Slide Credit: He et al. (MSRA)

Training

  • All plain/residual nets are trained from scratch
  • All plain/residual nets use Batch Normalization
  • Standard hyper-parameters & augmentation
SLIDE 32

Slide Credit: He et al. (MSRA)

CIFAR-10 experiments

[Figure: error (%) vs. iterations (1e4) on CIFAR-10. Left: plain-20/32/44/56 (deeper plain nets are worse). Right: ResNet-20/32/44/56/110 (deeper ResNets are better). Solid: test; dashed: train]

  • Deep ResNets can be trained without difficulties
  • Deeper ResNets have lower training error, and also lower test error

SLIDE 33

Slide Credit: He et al. (MSRA)

ImageNet experiments

[Figure: error (%) vs. iterations (1e4) on ImageNet. Left: plain-18/34 (the 34-layer plain net is worse). Right: ResNet-18/34 (the 34-layer ResNet is better). Solid: test; dashed: train]

  • Deep ResNets can be trained without difficulties
  • Deeper ResNets have lower training error, and also lower test error!
SLIDE 34

Slide Credit: He et al. (MSRA)

ImageNet experiments

  • A practical design for going deeper (used in ResNet-50/101/152)

[Figure: two blocks of similar complexity. Left, all-3x3 block on a 64-d input: 3x3, 64 → relu → 3x3, 64. Right, bottleneck block on a 256-d input: 1x1, 64 → relu → 3x3, 64 → relu → 1x1, 256]
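The "similar complexity" claim can be checked with back-of-the-envelope weight counts (plain Python sketch; biases and batch-norm parameters ignored):

```python
def conv_params(c_in, c_out, k):
    # weight count for a k x k convolution layer (biases ignored)
    return c_in * c_out * k * k

# all-3x3 block on a 64-d input: two 3x3 conv layers, 64 -> 64 -> 64
plain = conv_params(64, 64, 3) + conv_params(64, 64, 3)

# bottleneck block on a 256-d input: 1x1 reduce to 64, 3x3 at 64,
# then 1x1 restore to 256
bottleneck = (conv_params(256, 64, 1)
              + conv_params(64, 64, 3)
              + conv_params(64, 256, 1))

print(plain, bottleneck)  # 73728 69632
```

The bottleneck is actually slightly cheaper, while operating on a 4x wider (256-d) representation, which is what makes the very deep 50/101/152-layer variants affordable.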

SLIDE 35

Slide Credit: He et al. (MSRA)

ImageNet experiments

10-crop testing, top-5 val error (%):

  • ResNet-34: 7.4
  • ResNet-50: 6.7
  • ResNet-101: 6.1
  • ResNet-152: 5.7 (this model has lower time complexity than VGG-16/19)

  • Deeper ResNets have lower error
SLIDE 36

Slide Credit: He et al. (MSRA)

ImageNet experiments

[Figure: ImageNet Classification top-5 error (%) by year, as on the earlier slide: ILSVRC'10 28.2, '11 25.8, '12 AlexNet 16.4 (8 layers), '13 11.7, '14 VGG 7.3 (19 layers), '14 GoogleNet 6.7 (22 layers), '15 ResNet 3.57 (152 layers)]

SLIDE 37

Just classification?

A treasure from ImageNet is on learning features.

SLIDE 38

Slide Credit: He et al. (MSRA)

“Features matter.” (quote from [Girshick et al. 2014], the R-CNN paper)

  • Their results are all based on ResNet-101
  • Their features transfer well
SLIDE 39

Faster R-CNN

Ren et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.

SLIDE 40

Slide Credit: He et al. (MSRA)

Results on COCO – too many objects, let’s check carefully!

SLIDE 41

Slide Credit: He et al. (MSRA)

SLIDE 42

Slide Credit: He et al. (MSRA)

SLIDE 43

SLIDE 44

Fresh From The Torch Blog!

SLIDE 45

Implementation

SLIDE 46

SLIDE 47

SLIDE 48

SLIDE 49

Experiments

  • There are currently a few implementations of Deep Residual Learning
  • Torch (will be covered in detail in the upcoming weeks)
  • A few Python-based implementations (mxnet, lasagne)
  • I have played with a few of these implementations
  • Most of them are quite easy to grasp
  • But again, they are built upon existing frameworks
  • If you are not familiar with the abstractions, i.e. how things are implemented underneath, then it might not make much sense
  • The Torch implementation is the best I have seen in terms of readability and understandability
SLIDE 50

How to Get Started with the Lasagne Implementation

  1. Install requirements: sudo pip install -r https://raw.githubusercontent.com/Lasagne/Lasagne/master/requirements.txt
  2. Install Lasagne: sudo pip install https://github.com/Lasagne/Lasagne/archive/master.zip
  3. It will probably fail. Fix things accordingly.
  4. Download the CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
  5. Extract the CIFAR-10 dataset into a folder
  6. Download this folder and code: https://github.com/Lasagne/Recipes/tree/master/papers/deep_residual_learning
  7. Run Deep_Residual_Learning_CIFAR-10.py
SLIDE 51

Conclusions

  • Amazing results!
  • Simple idea
  • Deeper is better
  • Features matter
  • Their building block is so simple and effective that it will change the direction of upcoming work!
  • Actually, it just did!
    • Inception-v4 (3.08% top-5 error)
    • The Torch implementation reports better results than the paper!
SLIDE 52

24 Feb 2016

Hot From the Oven

New ImageNet Record!

Szegedy et al, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016

Slide Credit: Fei-Fei Li & Andrej Karpathy & Justin Johnson


SLIDE 53

Inception-v4

Szegedy et al, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016

SLIDE 54

Final Remarks

  • Seems quite easy to implement.
  • Several third-party frameworks are adopting this method.
  • The Torch implementation will be described in detail in the upcoming Torch tutorial!
  • Finally, the idea is quite simple and yields amazing results!
  • Compared to previous works, it requires less hardware and time!