Deep learning / 5.6. Architecture choice and training protocol
François Fleuret
https://fleuret.org/ee559/
Nov 2, 2020
Choosing the network structure is a difficult exercise. There is no silver bullet.

- Re-use something "well known, that works", or at least start from there,
- split feature extraction / inference (although this is debatable),
- modulate the capacity until it overfits a small subset, but does not overfit / underfit the full set,
- capacity increases with more layers, more channels, larger receptive fields, or more units,
- regularization to reduce the capacity or induce sparsity,
- identify common paths for siamese-like architectures,
- identify what path(s) or sub-parts need more/less capacity,
- use prior knowledge about the "scale of meaningful context" to size filters / combinations of filters (e.g. knowing the size of objects in a scene, or the max duration of a sound snippet that matters),
- grid-search all the variations that come to mind (and hopefully have farms of GPUs to do so).

We will re-visit this list with additional regularization / normalization methods.
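The "overfit a small subset" check from the list above can be sketched as follows. This is a minimal stand-in, not the course's code: a softmax regression trained by plain gradient descent on a tiny random subset plays the role of the model; with enough capacity (here, as many dimensions as samples), the training loss should collapse toward zero, and failing this test signals a capacity or optimization problem.

```python
import numpy as np

# Hypothetical sanity check: a model with enough capacity should drive the
# training loss to ~0 on a tiny subset. A softmax regression on 10 random
# samples stands in for the "small subset" test.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 32))          # 10 samples, 32 features
y = rng.integers(0, 10, size=10)       # 10 classes, as in CIFAR10

W = np.zeros((32, 10))
for step in range(10_000):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(10), y]).mean()    # cross-entropy
    grad = X.T @ (p - np.eye(10)[y]) / 10         # softmax-CE gradient
    W -= 1e-1 * grad

# With 10 generic points in 32 dimensions the data is linearly separable,
# so the training loss should become very small.
assert loss < 0.1
```

The same test applies unchanged to a real network: take ~10–100 training samples, disable regularization and data augmentation, and verify the training loss goes to essentially zero before worrying about the full set.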
Regarding the learning rate, for training to succeed it has to

- reduce the loss quickly ⇒ large learning rate,
- not be trapped in a bad minimum ⇒ large learning rate,
- not bounce around in narrow valleys ⇒ small learning rate, and
- not oscillate around a minimum ⇒ small learning rate.

These constraints lead to a general policy of using a larger step size first, and a smaller one in the end. The practical strategy is to look at the losses and error rates across epochs, and to pick the learning rate and its adaptation accordingly: for instance, reducing it at discrete pre-defined steps, or with a geometric decay.
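The two decay policies mentioned above can be sketched as plain functions of the epoch; the milestones and factors here are illustrative defaults, not the ones used in the experiments that follow.

```python
def step_decay(epoch, lr0=1e-1, milestones=(25,), factor=0.5):
    """Reduce the learning rate at discrete pre-defined epochs."""
    return lr0 * factor ** sum(epoch >= m for m in milestones)

def geometric_decay(epoch, lr0=1e-1, gamma=0.97):
    """Multiply the learning rate by a constant factor every epoch."""
    return lr0 * gamma ** epoch
```

With these defaults, `step_decay` keeps the rate at 1e-1 through epoch 24 and halves it to 5e-2 from epoch 25 on, matching the protocol of the following slides. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR` and `ExponentialLR` implement these two policies.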
CIFAR10 data set: 32 × 32 color images, 50,000 train samples, 10,000 test samples (Krizhevsky, 2009, chap. 3).
Small convnet on CIFAR10, cross-entropy, batch size 100, η = 1e-1.

[Plot: train loss (log scale) and test accuracy vs. number of epochs, over 50 epochs]
Small convnet on CIFAR10, cross-entropy, batch size 100.

[Plot: train losses (log scale) vs. number of epochs, for lr = 2e-1, 1e-1, and 1e-2]
Using η = 1e-1 for 25 epochs, then reducing it.

[Plot: train losses (log scale) vs. number of epochs, with no change and with lr2 = 7e-2, 5e-2, or 2e-2 after epoch 25]
Using η = 1e-1 for 25 epochs, then η = 5e-2.

[Plot: train loss (log scale) and test accuracy vs. number of epochs]
While the test error still goes down, the test loss may increase: the loss gets even worse on mis-classified examples, and decreases less on the ones getting fixed.

[Plot: train loss, test loss (log scale), and test accuracy vs. number of epochs]
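This divergence between test loss and test error can be checked with two made-up samples (the numbers are illustrative, not from the slides): as training continues, a mis-classified sample becomes more confidently wrong while a correct one improves only slightly, so the mean loss rises although the error rate is unchanged.

```python
import numpy as np

def ce(p_true):
    """Cross-entropy of a sample, given the probability of its true class."""
    return -np.log(p_true)

# Probability assigned to the true class, before and after more training,
# for a binary problem (p_true < 0.5 means mis-classified).
before = np.array([0.6, 0.3])   # sample 1 correct, sample 2 mis-classified
after  = np.array([0.7, 0.1])   # sample 1 a bit better, sample 2 much worse

assert ce(before).mean() < ce(after).mean()   # mean loss increases,
assert (before < 0.5).mean() == (after < 0.5).mean()  # error rate unchanged
```

The log in the cross-entropy is what makes this possible: pushing one sample's true-class probability from 0.3 to 0.1 costs more loss than is gained by nudging another from 0.6 to 0.7.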
We can plot the train and test distributions of the per-sample loss

    ℓ = −log( exp(f_Y(X; w)) / Σ_k exp(f_k(X; w)) )

through epochs to visualize the over-fitting.

[Histograms of the per-sample loss (log scale), train vs. test, shown at epochs 1–10, then 15, 20, 25, 30, 35, 40, 45, and 50]
The end
References

- A. Krizhevsky. Learning multiple layers of features from tiny images. Master's thesis, University of Toronto, 2009.