
Deep Learning: Theory, History, State of the Art & Practical Tools

Deep Learning: Theory, History, State of the Art & Practical Tools by Ilya Kuzovkin ilya.kuzovkin@gmail.com Machine Learning Estonia http://neuro.cs.ut.ee 2016 Where it has started How it learns How it evolved What is the state of the art


  1. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 The Backwards Pass — updating weights http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  2. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 The Backwards Pass — updating weights Learning rate http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  3. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 The Backwards Pass — updating weights Learning rate Gradient descent update rule http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  4. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  5. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 • Repeat for w6, w7, w8 http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  6. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 • Repeat for w6, w7, w8 • In an analogous way for w1, w2, w3, w4 http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  7. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 • Repeat for w6, w7, w8 • In an analogous way for w1, w2, w3, w4 • Calculate the total error again: 0.291027924 (it was 0.298371109) http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

  8. How it learns Backpropagation Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99 • Repeat for w6, w7, w8 • In an analogous way for w1, w2, w3, w4 • Calculate the total error again: 0.291027924 (it was 0.298371109) • Repeat 10,000 times: 0.000035085 http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
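The numbers on these slides come from Matt Mazur's linked walkthrough. As a sanity check, here is a minimal NumPy sketch of that 2-2-2 network (a reconstruction from the walkthrough, not code from the talk); the weights, the learning rate of 0.5, and the choice to keep the biases fixed follow the post.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 2-2-2 network from the linked walkthrough: inputs, targets, weights, biases.
i = np.array([0.05, 0.10])              # inputs
t = np.array([0.01, 0.99])              # desired outputs
W1 = np.array([[0.15, 0.20],            # w1, w2  (inputs -> hidden 1)
               [0.25, 0.30]])           # w3, w4  (inputs -> hidden 2)
W2 = np.array([[0.40, 0.45],            # w5, w6  (hidden -> output 1)
               [0.50, 0.55]])           # w7, w8  (hidden -> output 2)
b1, b2 = 0.35, 0.60                     # biases (kept fixed, as in the walkthrough)
lr = 0.5                                # learning rate

for step in range(10000):
    # Forward pass
    h = sigmoid(W1 @ i + b1)
    o = sigmoid(W2 @ h + b2)
    if step == 0:
        print("error before training:", np.sum(0.5 * (t - o) ** 2))  # ~0.298371109
    # Backward pass for the squared error E = sum(0.5 * (t - o)**2)
    delta_o = (o - t) * o * (1 - o)            # dE/dnet at the output layer
    delta_h = (W2.T @ delta_o) * h * (1 - h)   # dE/dnet at the hidden layer
    # Gradient descent update rule: w <- w - lr * dE/dw
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, i)

h = sigmoid(W1 @ i + b1)
o = sigmoid(W2 @ h + b2)
print("error after 10,000 updates:", np.sum(0.5 * (t - o) ** 2))     # ~0.000035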

  9. How it learns Optimization methods Alec Radford “Introduction to Deep Learning with Python”

  10. How it learns Optimization methods Alec Radford “Introduction to Deep Learning with Python”
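These slides show Alec Radford's optimizer animations. As a rough illustration of why the update rules behave differently, here is a toy comparison of plain gradient descent and gradient descent with momentum on f(w) = w**2; the learning rate 0.1 and momentum coefficient 0.9 are illustrative values, not taken from the slides.

def grad(w):
    return 2.0 * w                   # gradient of the toy objective f(w) = w**2

lr, mu = 0.1, 0.9                    # learning rate and momentum (illustrative)
w_sgd, w_mom, v = 5.0, 5.0, 0.0

for _ in range(100):
    w_sgd -= lr * grad(w_sgd)        # plain gradient descent step
    v = mu * v - lr * grad(w_mom)    # momentum: accumulate a velocity ...
    w_mom += v                       # ... and move along it

print(w_sgd, w_mom)                  # both head toward the minimum at w = 0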

  11. How it evolved

  12. How it evolved 1-layer NN INPUT OUTPUT Alec Radford “Introduction to Deep Learning with Python”

  13. How it evolved 1-layer NN 92.5% on the MNIST test set INPUT OUTPUT Alec Radford “Introduction to Deep Learning with Python”

  14. How it evolved 1-layer NN 92.5% on the MNIST test set INPUT OUTPUT Alec Radford “Introduction to Deep Learning with Python”
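A minimal sketch of such a one-layer network on MNIST, written in Keras for concreteness (the slide's number comes from Alec Radford's tutorial code; the framework, optimizer, epoch count, and batch size below are assumptions, not the original setup):

import tensorflow as tf

# Load MNIST and flatten the 28x28 images into 784-dimensional vectors.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# One layer: 784 inputs connected directly to 10 softmax outputs.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,))
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=128)
print(model.evaluate(x_test, y_test))   # test accuracy typically lands near 92%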

  15. How it evolved One hidden layer Alec Radford “Introduction to Deep Learning with Python”

  16. How it evolved One hidden layer 98.2% on the MNIST test set Alec Radford “Introduction to Deep Learning with Python”

  17. How it evolved One hidden layer 98.2% on the MNIST test set Activity of 100 hidden neurons (out of 625) Alec Radford “Introduction to Deep Learning with Python”

  18. How it evolved Overfitting

  19. How it evolved Dropout Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014

  20. How it evolved Dropout Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014

  21. How it evolved Dropout Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014

  22. How it evolved Dropout Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014
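A minimal sketch of the dropout idea at training time, using the common "inverted dropout" variant rather than the paper's exact test-time rescaling: each unit is zeroed with some probability and the survivors are rescaled so the expected activation stays the same.

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.5):
    # Zero each unit with probability (1 - keep_prob), rescale the survivors.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.array([0.2, 0.7, 0.5, 0.9, 0.3, 0.8])
print(dropout(h))   # roughly half of the units are zeroed on each training pass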

  23. How it evolved ReLU X. Glorot, A. Bordes, Y. Bengio, “Deep Sparse Rectifier Neural Networks”, 2011

  24. How it evolved ReLU X. Glorot, A. Bordes, Y. Bengio, “Deep Sparse Rectifier Neural Networks”, 2011
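For reference, the rectifier from Glorot et al. is just f(x) = max(0, x); a quick sketch comparing it with the sigmoid:

import numpy as np

def relu(x):
    return np.maximum(0, x)          # rectifier: f(x) = max(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))      # [0.  0.  0.  0.5 2. ]  -- gradient is 1 for positive inputs
print(sigmoid(x))   # squashes everything into (0, 1); gradients shrink toward 0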

  25. How it evolved “Modern” ANN • Several hidden layers • ReLU activation units • Dropout

  26. How it evolved “Modern” ANN • Several hidden layers • ReLU activation units • Dropout

  27. How it evolved “Modern” ANN • Several hidden layers • ReLU activation units • Dropout 99.0% on the MNIST test set
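A hedged Keras sketch of such a "modern" fully connected network; the 625-unit layers echo the earlier slides, while the dropout rate of 0.5 and the SGD settings are assumed typical values, not the configuration behind the 99.0% figure.

import tensorflow as tf

# Several hidden layers with ReLU activations and dropout between them.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(625, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(625, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()   # training on flattened MNIST proceeds as in the earlier sketch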

  28. How it evolved Convolution

  29.–31. How it evolved Convolution Prewitt edge detector kernel: [+1 +1 +1; 0 0 0; -1 -1 -1]

  32.–38. How it evolved Convolution The Prewitt kernel slides over a small image patch whose upper rows are all 40 and lower rows are all 10; the element-wise products sum to 0 wherever the window covers a uniform region and to 90 where it straddles the boundary, so the output column reads 0, 90, 90, 0 and marks the edge

  39. How it evolved Convolution An edge detector is a handcrafted feature detector.
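The arithmetic on these slides can be reproduced directly; here is a small NumPy sketch (not code from the talk) that slides the Prewitt kernel over the bright-above-dark patch and recovers the 0 / 90 / 90 / 0 responses:

import numpy as np

# Prewitt horizontal-edge kernel from the slides.
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

# Image patch: a bright region (40s) above a dark region (10s).
image = np.array([[40, 40, 40],
                  [40, 40, 40],
                  [40, 40, 40],
                  [10, 10, 10],
                  [10, 10, 10],
                  [10, 10, 10]])

# Slide the kernel over every valid 3x3 window, multiply element-wise and sum.
rows = image.shape[0] - kernel.shape[0] + 1
cols = image.shape[1] - kernel.shape[1] + 1
out = np.zeros((rows, cols))
for r in range(rows):
    for c in range(cols):
        out[r, c] = np.sum(image[r:r + 3, c:c + 3] * kernel)

print(out.ravel())   # [ 0. 90. 90.  0.] -- a strong response only at the edge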

  40. How it evolved Convolution The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones

  41. How it evolved Convolution The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones http://yann.lecun.com/exdb/mnist/

  42. How it evolved Convolution The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones 99.50% on the MNIST test set (current best: 99.77% by a committee of 35 conv. nets) http://yann.lecun.com/exdb/mnist/
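A hedged Keras sketch of a small convolutional network for MNIST; the filter counts, kernel sizes, and dropout rate are illustrative choices, not the architecture behind the 99.50% figure.

import tensorflow as tf

# Convolutional layers learn their own filters; the dense softmax layer classifies.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()   # train on MNIST images reshaped to (28, 28, 1)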

  43. How it evolved More layers

  44. How it evolved More layers C. Szegedy, et al., “Going Deeper with Convolutions”, 2014

  45. How it evolved More layers C. Szegedy, et al., “Going Deeper with Convolutions”, 2014 ILSVRC 2015 winner — 152 (!) layers K. He et al., “Deep Residual Learning for Image Recognition”, 2015
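The key trick behind the 152-layer ILSVRC winner is the residual (skip) connection: each block outputs F(x) + x rather than F(x), which makes very deep networks easier to optimize. A minimal functional-API sketch of one such block (the filter count and layer arrangement are simplified assumptions, not the paper's exact block):

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Two conv layers compute F(x); the input is added back in at the end.
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.Add()([y, shortcut])        # the skip connection: F(x) + x
    return layers.Activation('relu')(y)

# Usage: the incoming feature map must already have `filters` channels
# so that the addition is shape-compatible.
inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
model.summary()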

  46. How it evolved Hyperparameters • Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization • Convolutional layers: size, stride, number of filters • Optimization method: learning rate, other method-specific constants, …

  47. How it evolved Hyperparameters: Grid search :( • Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization • Convolutional layers: size, stride, number of filters • Optimization method: learning rate, other method-specific constants, …

  48. How it evolved Hyperparameters: Grid search :( Random search :/ • Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization • Convolutional layers: size, stride, number of filters • Optimization method: learning rate, other method-specific constants, …

  49. How it evolved Hyperparameters: Grid search :( Random search :/ Bayesian optimization :) • Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization • Convolutional layers: size, stride, number of filters • Optimization method: learning rate, other method-specific constants, … Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”

  50. How it evolved Hyperparameters: Grid search :( Random search :/ Bayesian optimization :) Informal parameter search :) • Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization • Convolutional layers: size, stride, number of filters • Optimization method: learning rate, other method-specific constants, … Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”
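A small sketch of the random-search strategy from these slides; the hyperparameter ranges are illustrative and train_and_score is a placeholder for your own training and validation routine.

import random

def sample_config():
    # Hyperparameter ranges here are illustrative, not from the talk.
    return {
        "learning_rate": 10 ** random.uniform(-5, -1),    # log-uniform
        "num_layers": random.randint(1, 4),
        "units_per_layer": random.choice([64, 128, 256, 625]),
        "dropout": random.uniform(0.2, 0.6),
    }

def random_search(train_and_score, n_trials=20):
    # train_and_score(config) trains a network with the given hyperparameters
    # and returns, e.g., validation accuracy.
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config()
        score = train_and_score(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score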

  51. How it evolved Major Types of ANNs feedforward convolutional

  52. How it evolved Major Types of ANNs feedforward convolutional recurrent
