Deep Residual Learning for Image Recognition (ILSVRC 2015 and MS COCO 2015 winner)



SLIDE 1

Deep Residual Learning for Image Recognition
K. He, X. Zhang, S. Ren and J. Sun (Microsoft Research)
Winner of ILSVRC 2015 and MS COCO 2015

Article overview by Ilya Kuzovkin
Computational Neuroscience Seminar, University of Tartu, 2016

SLIDE 2

THE IDEA

SLIDE 3

ImageNet: 1000 classes

SLIDES 4-11

  • 2012: 8 layers, 15.31% error
  • 2013: 9 layers, 2x params, 11.74% error
  • 2014: 19 layers, 7.41% error
  • 2015: ?

Is learning better networks as easy as stacking more layers?

  • Vanishing / exploding gradients: largely addressed by normalized initialization and intermediate normalization
  • Degradation problem

SLIDES 12-13

Degradation problem: “with the network depth increasing, accuracy gets saturated”

Not caused by overfitting: the deeper network also shows higher training error.

SLIDES 14-21

A thought experiment:

  • Train a shallow network (Conv → Conv → Conv → Conv) and test it: accuracy X%.
  • Construct a deeper network from it by stacking identity layers on top (Conv ×4 + Identity ×4) and test it: same performance, X%, by construction.
  • Now train the deeper counterpart (Conv ×8) from scratch and test it: worse!

“Our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time)”

“Solvers might have difficulties in approximating identity mappings by multiple nonlinear layers”
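To make the constructed solution concrete, here is a minimal sketch (assuming PyTorch; the toy layer sizes are invented for illustration) showing that a deeper network built from a shallow one plus identity layers computes exactly the same function:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for the trained shallow Conv stack.
    shallow = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
    )

    # Constructed deeper network: the same layers plus identity mappings.
    deeper = nn.Sequential(shallow, nn.Identity(), nn.Identity())

    x = torch.randn(1, 3, 32, 32)
    # By construction the deeper network matches the shallow one exactly,
    # so a solution at least this good exists; yet solvers trained from
    # scratch fail to find it.
    assert torch.equal(shallow(x), deeper(x))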

SLIDES 22-24

Add explicit identity connections and “solvers may simply drive the weights of the multiple nonlinear layers toward zero”.

H(x) is the true function we want to learn. Let's pretend we want to learn F(x) = H(x) - x instead. The original function is then H(x) = F(x) + x.

SLIDES 26-27

Network can decide how deep it needs to be… “The identity connections introduce neither extra parameter nor computation complexity”
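This is easy to check with the ResidualBlock sketched above: the shortcut is a plain addition, so the residual block has exactly as many parameters as its stack F alone:

    # The identity shortcut adds no parameters: the block and its plain
    # stack F have identical parameter counts.
    block = ResidualBlock(64)
    n_block = sum(p.numel() for p in block.parameters())
    n_plain = sum(p.numel() for p in block.f.parameters())
    assert n_block == n_plain  # 73,984 parameters either way for 64 channels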

SLIDES 28-29

  • 2012: 8 layers, 15.31% error
  • 2013: 9 layers, 2x params, 11.74% error
  • 2014: 19 layers, 7.41% error
  • 2015: 152 layers, 3.57% error

SLIDE 30

EXPERIMENTS AND DETAILS

SLIDES 31-33

  • Lots of 3x3 convolutional layers
  • VGG complexity is 19.6 billion FLOPs; the 34-layer ResNet is 3.6 billion FLOPs
  • Batch normalization
  • SGD with batch size 256 (optimizer settings are sketched below)
  • Up to 600,000 iterations
  • Learning rate 0.1, divided by 10 when the error plateaus
  • Momentum 0.9
  • No dropout
  • Weight decay 0.0001
  • 1.28 million training images, 50,000 validation images, 100,000 test images
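A sketch of those optimizer settings (assuming PyTorch and torchvision; ReduceLROnPlateau is one plausible way to implement “divide the learning rate by 10 when the error plateaus”, not necessarily the authors' exact schedule):

    import torch
    from torchvision.models import resnet34

    model = resnet34()  # torchvision's 34-layer ResNet (1000 classes)

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,             # initial learning rate
        momentum=0.9,
        weight_decay=1e-4,  # weight decay 0.0001; no dropout is used
    )

    # Divide the learning rate by 10 when the validation error plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1
    )
    # Training loop (not shown): after each validation pass,
    # call scheduler.step(val_error).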
SLIDES 34-36

  • The 34-layer ResNet has lower training error than the 18-layer one. This indicates that the degradation problem is well addressed and accuracy gains are obtained from increased depth.
  • The 34-layer ResNet reduces the top-1 error by 3.5% compared to its plain counterpart.
  • The 18-layer ResNet converges faster than its plain counterpart: ResNet eases the optimization by providing faster convergence at the early stage.

SLIDE 37

GOING DEEPER

SLIDE 38

Due to time complexity, the usual building block is replaced by the bottleneck block (sketched below). The 50/101/152-layer ResNets are built from these blocks.
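A sketch of the bottleneck design (assuming PyTorch; Bottleneck and the channel sizes follow the paper's 256 → 64 → 64 → 256 example): a 1x1 convolution reduces the channel count, the 3x3 convolution operates on the reduced representation, and a final 1x1 convolution restores the dimensions, with the identity shortcut around all three:

    import torch
    import torch.nn as nn

    class Bottleneck(nn.Module):
        """Bottleneck residual block: 1x1 reduce, 3x3, 1x1 expand."""

        def __init__(self, channels, reduced):
            super().__init__()
            self.f = nn.Sequential(
                nn.Conv2d(channels, reduced, 1, bias=False),  # 1x1 reduce
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
                nn.Conv2d(reduced, reduced, 3, padding=1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
                nn.Conv2d(reduced, channels, 1, bias=False),  # 1x1 expand
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.f(x) + x)

    # As in the 50/101/152-layer ResNets: 256 -> 64 -> 64 -> 256 channels.
    block = Bottleneck(256, 64)
    y = block(torch.randn(1, 256, 56, 56))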

SLIDE 41

ANALYSIS ON CIFAR-10

SLIDE 44

  • ImageNet Classification 2015: 1st place, 3.57% error
  • ImageNet Object Detection 2015: 1st place, 194 of 200 categories won
  • ImageNet Object Localization 2015: 1st place, 9.02% error
  • COCO Detection 2015: 1st place, 37.3%
  • COCO Segmentation 2015: 1st place, 28.2%

http://research.microsoft.com/en-us/um/people/kahe/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf