
Deep Learning (Partly) Demystified

Vladik Kreinovich

Department of Computer Science
University of Texas at El Paso
El Paso, Texas 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik


1. Overview

  • Successes of deep learning are partly due to the appropriate selection of activation functions, pooling functions, etc.

  • Most of these choices have been made based on empirical comparison and heuristic ideas.

  • In this talk, we show that many of these choices – and the surprising success of deep learning in the first place – can be explained by reasonably simple and natural mathematics.


2. Traditional Neural Networks: A Brief Reminder

  • To explain deep neural networks, let us first briefly recall the motivations behind traditional ones.

  • In the old days, computers were much slower.

  • This was a big limitation that prevented us from solving many important practical problems.

  • As a result, researchers started looking for ways to speed up computations.

  • If a task takes too long for one person, a natural idea is to ask others for help.

  • Several people can work on this task in parallel – and thus get the result faster; similarly:

    – if a computation task takes too long,
    – a natural idea is to have several processing units working in parallel.


3. Traditional Neural Networks (cont-d)

  • In this case, the overall computation time is just the time needed for each processing unit to finish its sub-task.

  • To minimize the overall time, it is therefore necessary to make these sub-tasks as simple as possible.

  • In data processing, the simplest possible functions to compute are linear functions.

  • However, if we only have processing units that compute linear functions, we will only be able to compute linear functions.

  • Indeed, a composition of linear functions is always linear (see the sketch below).

  • Thus, we need to supplement these units with some nonlinear units.
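As a quick numerical illustration of this point (ours, not from the talk), the following Python sketch checks that the composition of two linear maps f(x) = Ax + a and g(x) = Bx + b is itself a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)
A, a = rng.normal(size=(3, 4)), rng.normal(size=3)   # f(x) = A x + a
B, b = rng.normal(size=(2, 3)), rng.normal(size=2)   # g(x) = B x + b

x = rng.normal(size=4)
composed = B @ (A @ x + a) + b          # g(f(x))
equivalent = (B @ A) @ x + (B @ a + b)  # one linear map with matrix B A
assert np.allclose(composed, equivalent)
```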


4. Traditional Neural Networks (cont-d)

  • In general, the more inputs, the more complex (and thus longer) the resulting computations.

  • So, the fastest possible nonlinear units are the ones that compute functions of one variable.

  • So, our ideal computational device should consist of:

    – linear (L) units, and
    – nonlinear (NL) units that compute functions of one variable.

  • These units should work in parallel:

    – first, all the units from one layer work,
    – then all the units from the next layer, etc.

  • The fewer layers, the faster the resulting computations.

  • One can prove that 1- and 2-layer schemes do not have the universal approximation property.


5. Traditional Neural Networks (cont-d)

  • One can also prove that 3-layer schemes already have this property.

  • There are two possible 3-layer schemes: L-NL-L and NL-L-NL.

  • The first one is faster, since it uses the slower nonlinear units only once.

  • In this scheme, first, each unit from the first layer applies a linear transformation to the inputs x_1, . . . , x_n:

    z_k = ∑_{i=1}^{n} w_{ki} · x_i − w_{k0}.

  • The values w_{ki} are known as weights.

  • In the next (NL) layer, these values are transformed into y_k = s_k(z_k), for some nonlinear functions s_k(z).


6. Traditional Neural Networks (cont-d)

  • Finally, in the last (L) layer, the values y_k are linearly combined into the final result

    y = ∑_{k=1}^{K} W_k · y_k − W_0 = ∑_{k=1}^{K} W_k · s_k(∑_{i=1}^{n} w_{ki} · x_i − w_{k0}) − W_0.

  • This is exactly the formula that describes the traditional neural network.

  • In the traditional neural network, usually, all the NL neurons compute the same function – the sigmoid:

    s_k(z) = 1 / (1 + exp(−z)).
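For concreteness, here is a minimal Python sketch (ours; the function names and sizes are illustrative) of this L-NL-L computation with the sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def traditional_nn(x, w, w0, W, W0):
    """y = sum_k W_k * s(sum_i w_ki * x_i - w_k0) - W_0."""
    z = w @ x - w0            # first (L) layer: z_k = sum_i w_ki * x_i - w_k0
    y_hidden = sigmoid(z)     # (NL) layer: y_k = s(z_k)
    return W @ y_hidden - W0  # last (L) layer

# n = 3 inputs, K = 4 hidden neurons, randomly chosen weights:
rng = np.random.default_rng(1)
x = rng.normal(size=3)
w, w0 = rng.normal(size=(4, 3)), rng.normal(size=4)
W, W0 = rng.normal(size=4), rng.normal()
print(traditional_nn(x, w, w0, W, W0))
```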


7. Why Go Beyond Traditional Neural Networks

  • Traditional neural networks were invented when computers were reasonably slow.

  • This prevented computers from solving important practical problems.

  • For these computers, computation speed was the main objective.

  • As we have just shown, this need led to what we know as traditional neural networks.

  • Nowadays, computers are much faster.

  • In most practical applications, speed is no longer the main problem.

  • But traditional neural networks, while fast, have limited accuracy of their predictions.


8. The More Models We Have, the More Accurately We Can Approximate

  • As a result of training a neural network, we get the values of some parameters for which the corresponding model provides the best approximation to the actual data.

  • Let a denote the number of parameters.

  • Let b denote the number of bits representing each parameter.

  • Then, to represent all the parameters, we need N = a · b bits.

  • Different models obtained from training can be described by different N-bit sequences.

  • In general, there are 2^N possible N-bit sequences.

  • Thus, we can have 2^N possible models.

9. The More Models We Have (cont-d)

  • In these terms, training simply means selecting one of these 2^N possible models.

  • If we have only one model to represent the actual dependence, this model will be a very lousy description.

  • If we can have two models, we can have more accurate approximations.

  • In general, the more models we have, the more accurate a representation we can have.

  • We can illustrate this idea on the example of approximating real numbers from the interval [0, 1].

  • If we have only one model – e.g., the value x = 0.5 – then we approximate every other number with accuracy 0.5.


10. The More Models We Have (cont-d)

  • If we can have 10 models, then we can take the 10 values 0.05, 0.15, . . . , 0.95.

  • The first value approximates all the numbers from the interval [0, 0.1] with accuracy 0.05.

  • The second value approximates all the numbers from the interval [0.1, 0.2] with the same accuracy, etc.

  • By selecting one of these values, we can approximate any number from [0, 1] with accuracy 0.05 (see the sketch below).
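The same counting argument in a few lines of Python (our illustration): N bits give 2^N grid values, and a midpoint grid approximates any number in [0, 1] with accuracy 1/(2 · 2^N):

```python
def grid_accuracy(num_models: int) -> float:
    # midpoints of num_models equal subintervals of [0, 1]:
    # 1/(2m), 3/(2m), ..., each covering its subinterval with accuracy 1/(2m)
    return 1.0 / (2 * num_models)

print(grid_accuracy(10))   # 0.05, as in the 10-model example above
for N in range(1, 6):
    print(f"N = {N} bits -> {2**N} models, accuracy {grid_accuracy(2**N)}")
```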


11. How Many Models Can We Represent with a Traditional Neural Network?

  • Let us consider a traditional neural network with K neurons.

  • Each neuron k is characterized by several weights W_k and w_{ki}.

  • Let b denote the number of bits needed to describe all the weights corresponding to a single neuron.

  • Then, overall, to describe all possible bit sequences resulting from training, we need N = K · b bits.

  • As we mentioned, we can have 2^N different binary sequences of length N.

  • So, at first glance, one may think that we can thus represent 2^N different models.

  • However, the actual number of models is much smaller.

12. How Many Models (cont-d)

  • If we swap two neurons, the resulting function will not change:

    f(x_1, . . . , x_n) = ∑_{k=1}^{K} W_k · s(∑_{i=1}^{n} w_{ki} · x_i − w_{k0}) − W_0.

  • Indeed, a sum does not change if we swap two of the added numbers.

  • Similarly, if instead of swapping two neurons we apply any permutation, we get the exact same model (see the check below).

  • For K neurons, there are K! possible permutations.

  • Thus, K! different binary sequences represent the same model.

  • So, by using N bits, instead of 2^N possible models, we can only have 2^N/K! possible models.
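This permutation symmetry is easy to confirm numerically; in the sketch below (ours; tanh stands in for the activation s), relabeling the K hidden neurons leaves the output unchanged:

```python
import numpy as np

def nn_output(x, w, w0, W, W0, s=np.tanh):
    # f(x) = sum_k W_k * s(sum_i w_ki * x_i - w_k0) - W_0
    return W @ s(w @ x - w0) - W0

rng = np.random.default_rng(2)
x = rng.normal(size=3)
w, w0 = rng.normal(size=(5, 3)), rng.normal(size=5)   # K = 5 hidden neurons
W, W0 = rng.normal(size=5), rng.normal()

perm = rng.permutation(5)   # apply the same relabeling to all per-neuron weights
assert np.isclose(nn_output(x, w, w0, W, W0),
                  nn_output(x, w[perm], w0[perm], W[perm], W0))
```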


13. How Can We Achieve Better Accuracy: The Main Idea Behind Deep Learning

  • The more models we can represent, the more accurate the resulting approximation will be; so:

    – when the overall number of bits N is fixed – e.g., by the ability of our computers –
    – the only way to increase the number 2^N/K! of models is to decrease K!, i.e., to decrease K.

  • In traditional neural networks, all the neurons are, in effect, in one layer – known as the hidden layer.

  • The only way to decrease K is to make the number of neurons in each layer much smaller.

  • This means that instead of placing the neurons into a single layer, we place them in many layers.

  • We now have several layers – the construction is deep.

14. Which Activation Function Should We Use for Deep Learning?

  • To answer this question, we need to recall that usually, we process the values of physical quantities.

  • The numerical values of physical quantities depend on:

    – what measuring unit we use, and
    – for some quantities like temperature or time – what starting point we select for the measurement.

  • If we change a measuring unit to one which is λ times smaller, then all numerical values get multiplied by λ.

  • So, instead of the original numerical value x, we get a new numerical value x′ = λ · x.

  • For example, 2.5 feet becomes 12 · 2.5 = 30 inches.

15. Selecting an Activation Function (cont-d)

  • Similarly:

    – if we replace the original starting point with a new point which is x0 units earlier,
    – then each numerical value x is replaced by a new numerical value x′ = x + x0.

  • We want to select an activation function s(x) that would not depend on the choice of a measuring unit.

  • In other words, we want to make sure that:

    – if y = s(x) and we select a new measuring unit, i.e., switch to the new numerical values x′ = λ · x and y′ = λ · y,
    – then for these new values x′ and y′, we will have the exact same dependence: y′ = s(x′).


16. Selecting an Activation Function (cont-d)

  • Substituting the expressions x′ = λ · x and y′ = λ · y into this formula, we conclude that λ · y = s(λ · x).

  • Here, y = s(x), so we conclude that s(λ · x) = λ · s(x) for all possible x and all λ > 0.

  • For x = 1, we conclude that s(λ) = λ · s(1).

  • Let us denote s(1) by c+, and rename λ into z.

  • Then, we conclude that for all z > 0, we get s(z) = c+ · z.

  • For x = −1, we conclude that s(−λ) = λ · s(−1).

  • Let us denote −s(−1) by c− (so that s(−1) = −c−) and −λ by z (so that λ = −z).


17. Selecting an Activation Function (cont-d)

  • Then, for all negative values z, we have s(z) = (−c−) · (−z) = c− · z.

  • Thus, we conclude that the activation function s(z) should have the following piecewise linear form:

    – for z > 0, we have s(z) = c+ · z;
    – for z < 0, we have s(z) = c− · z.

  • Comment. We must have c+ ≠ c−; indeed:

    – otherwise, the function s(z) would be linear, and
    – we know that with linear functions, we can only describe linear dependencies.
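As a sanity check (ours, not part of the talk), the derived piecewise linear form is indeed scale-invariant for any constants c+ ≠ c−:

```python
import numpy as np

c_plus, c_minus = 1.0, 0.1   # illustrative slopes, c_plus != c_minus

def s(z):
    # s(z) = c_plus * z for z > 0, c_minus * z for z < 0 (and 0 at z = 0)
    return np.where(z > 0, c_plus * z, c_minus * z)

z = np.linspace(-5.0, 5.0, 101)
for lam in (0.5, 2.0, 12.0):                     # several choices of measuring unit
    assert np.allclose(s(lam * z), lam * s(z))   # s(lambda * z) = lambda * s(z)
```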


18. What Activation Function Is Actually Used in Deep Learning, and Why?

  • To uniquely determine a piecewise linear function, we need to select two real numbers: c+ and c−.

  • The simplest possible real numbers are 0 and 1.

  • Thus, the simplest possible piecewise linear function has the form:

    – for z > 0, we have s(z) = z;
    – for z < 0, we have s(z) = 0.

  • In other words, s(z) = max(z, 0).

  • This function is known as the rectified linear function (ReLU); it is the one actually used in deep learning.


19. It Does Not Matter Which Piecewise Linear Activation Function to Use

  • Indeed, the output of each neuron is linearly combined with other signals anyway.

  • And any piecewise linear function can be represented as a linear combination of max(z, 0) and z:

    s(z) = c− · z + (c+ − c−) · max(z, 0).

  • Indeed:

    – for z < 0, the right-hand side is equal to c− · z + (c+ − c−) · 0 = c− · z;
    – for z > 0, the right-hand side is equal to c− · z + (c+ − c−) · z = (c− + (c+ − c−)) · z = c+ · z.
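This identity is also easy to verify numerically; a short check (ours) with arbitrary slopes:

```python
import numpy as np

c_plus, c_minus = 2.0, -0.5   # arbitrary slopes of a piecewise linear activation

def s_piecewise(z):
    return np.where(z > 0, c_plus * z, c_minus * z)

def s_from_relu(z):
    # the linear combination from this slide: c_- * z + (c_+ - c_-) * max(z, 0)
    return c_minus * z + (c_plus - c_minus) * np.maximum(z, 0.0)

z = np.linspace(-4.0, 4.0, 81)
assert np.allclose(s_piecewise(z), s_from_relu(z))
```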


20. Why Cannot We Require Shift-Invariance Instead of Scale-Invariance?

  • We mentioned that the numerical value of a physical quantity changes:

    – when we change the measuring unit, and
    – when we change the starting point.

  • However, we only considered invariance with respect to changing the unit (scale-invariance).

  • What if we consider invariance with respect to changing the starting point (shift-invariance)?

  • We want to make sure that:

    – when y = s(x),
    – then for x′ = x + x0 and y′ = y + x0, we will have y′ = s(x′).


21. Shift-Invariance (cont-d)

  • Substituting the expressions for x′ and y′ into the formula y′ = s(x′), we get y + x0 = s(x + x0).

  • Here, s(x) = y, so we have s(x + x0) = s(x) + x0 for all possible values x and x0.

  • In particular, for x = 0, we get s(x0) = s(0) + x0.

  • Renaming s(0) as a and x0 as z, we conclude that s(z) = z + a.

  • This is a linear function – thus, such neurons cannot describe any non-linear process.


22. Need for Pooling

  • Often, we have a lot of data points to process.

  • For example, even for a not-very-high-resolution 1000 × 1000 picture, we have 1,000,000 pixels.

  • So, to process such an image, we need to process 1,000,000 numbers.

  • In a traditional neural network, we could use as many neurons as needed.

  • However, in a deep neural network, there are only a few neurons in the first layer.

  • Thus, before we start processing, we need to combine several input values into one.

  • A similar procedure can also be applied at a later stage.

  • This operation of combining several values into one is known as pooling.


23. Which Pooling Operation Shall We Use?

  • Let us consider the case when we pool two values a and b into a single value c.

  • Let us denote the resulting value c by p(a, b).

  • Of course, the pooling should not depend on the order, i.e., we should have p(a, b) = p(b, a).

  • In other words, the pooling operation should be commutative.

  • It is reasonable to require that the result of pooling will not change if we:

    – change the measuring unit, or
    – change the starting point for measurement.

  • If c = p(a, b), then c′ = p(a′, b′), where a′ = λ · a, b′ = λ · b, and c′ = λ · c.


24. Which Pooling Operation to Use (cont-d)

  • If c = p(a, b), then c′ = p(a′, b′), where a′ = a + a0, b′ = b + a0, and c′ = c + a0.

  • From the first requirement:

    – substituting the expressions a′ = λ · a, b′ = λ · b, and c′ = λ · c into the formula c′ = p(a′, b′),
    – we conclude that λ · c = p(λ · a, λ · b).

  • Here, c = p(a, b), so p(λ · a, λ · b) = λ · p(a, b).

  • From the second requirement:

    – substituting the expressions a′ = a + a0, b′ = b + a0, and c′ = c + a0 into the formula c′ = p(a′, b′),
    – we conclude that c + a0 = p(a + a0, b + a0).

  • Here, c = p(a, b), so p(a + a0, b + a0) = p(a, b) + a0.

  • Let us use the resulting formulas to find the value p(x, y) for all possible pairs (x, y).


25. Which Pooling Operation to Use (cont-d)

  • Without losing generality, we can assume that x < y.

  • Then, substituting a = 0, a0 = x, and b = y − x into the formula p(a + a0, b + a0) = p(a, b) + a0, we get:

    p(x, y) = p(0, y − x) + x.

  • Substituting λ = y − x, a = 0, and b = 1 into the formula p(λ · a, λ · b) = λ · p(a, b), we get:

    p(0, y − x) = (y − x) · p(0, 1).

  • Substituting this expression into the formula p(x, y) = p(0, y − x) + x and denoting p(0, 1) by α, we get:

    p(x, y) = x + α · (y − x) = α · y + (1 − α) · x = α · max(x, y) + (1 − α) · min(x, y).
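In code, this derived family of pooling operations – together with a numerical check of its required properties – looks as follows (our sketch):

```python
def pool(x, y, alpha):
    # p(x, y) = alpha * max(x, y) + (1 - alpha) * min(x, y)
    return alpha * max(x, y) + (1 - alpha) * min(x, y)

a, b, alpha = 0.3, 1.7, 0.25
lam, a0 = 3.0, 5.0
assert pool(a, b, alpha) == pool(b, a, alpha)                                # commutativity
assert abs(pool(lam * a, lam * b, alpha) - lam * pool(a, b, alpha)) < 1e-12  # scale-invariance
assert abs(pool(a + a0, b + a0, alpha) - (pool(a, b, alpha) + a0)) < 1e-12   # shift-invariance
```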


26. Pooling Four Values

  • Once we learn how to pool two values, we can pool four values easily:

    – divide the four values into two pairs,
    – pool the values within each pair, and then
    – pool the two pooling results into a single value.

  • It is reasonable to require that the result not depend on how we divide the 4 values into pairs.

  • Let us consider the values 0, 1, 1, and 2.

  • First, we combine 0 with 1 and 1 with 2.

  • Pooling 0 and 1 results in α · 1 + (1 − α) · 0 = α.

  • Pooling 1 and 2 results in α · 2 + (1 − α) · 1 = 2α + 1 − α = 1 + α.


27. Pooling Four Values (cont-d)

  • Here, 1 + α is always larger than α.

  • So, pooling the results α and 1 + α leads to α · (1 + α) + (1 − α) · α = α + α² + α − α² = 2α.

  • What if we instead combine 1 with 1 and 0 with 2?

  • Pooling 1 with 1 results in α · 1 + (1 − α) · 1 = 1.

  • Pooling 0 with 2 results in α · 2 + (1 − α) · 0 = 2α.

  • The result of pooling the two resulting values 1 and 2α depends on which of them is larger.

  • If 2α ≥ 1, i.e., if α ≥ 0.5, then we get α · (2α) + (1 − α) · 1 = 2α² + 1 − α.


28. Pooling Four Values (cont-d)

  • In this case, the desired equality is 2α² + 1 − α = 2α, i.e., 2α² − 3α + 1 = 0.

  • One can easily check that this quadratic equation has two solutions: α = 0.5 and α = 1.

  • If 2α ≤ 1, i.e., if α ≤ 0.5, then we get α · 1 + (1 − α) · 2α = α + 2α − 2α² = 3α − 2α².

  • In this case, the desired equality is 3α − 2α² = 2α, i.e., 2α² − α = 0.

  • One can easily check that this quadratic equation has two solutions: α = 0 and α = 0.5.

  • So, we have three options: α = 0, α = 0.5, and α = 1.

29. Pooling Four Values (cont-d)

  • If α = 0, then the pooling formula takes the form p(x, y) = min(x, y).

  • If α = 0.5, then the pooling formula takes the form p(x, y) = 0.5 · y + 0.5 · x, i.e., of the arithmetic average p(x, y) = (x + y)/2.

  • If α = 1, then the pooling formula takes the form p(x, y) = max(x, y).

  • We get three operations: minimum, maximum, and arithmetic average (see the check below).

  • These are indeed the ones which work most successfully in deep learning.
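The whole four-value argument can be replayed numerically (our sketch): on the test values 0, 1, 1, 2, the result is independent of the pairing only for α ∈ {0, 0.5, 1}:

```python
def pool(x, y, alpha):
    return alpha * max(x, y) + (1 - alpha) * min(x, y)

def pool4(vals, pairing, alpha):
    # pool within each pair, then pool the two pair results
    (i, j), (k, l) = pairing
    return pool(pool(vals[i], vals[j], alpha), pool(vals[k], vals[l], alpha), alpha)

vals = (0, 1, 1, 2)
pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    results = {round(pool4(vals, p, alpha), 9) for p in pairings}
    status = "pairing-independent" if len(results) == 1 else "pairing-dependent"
    print(f"alpha = {alpha}: {status}")
```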


30. Sensitivity of Deep Learning: Phenomenon

  • A problem with deep learning is that its results are often too sensitive to minor changes in the inputs.

  • For example, changing a few pixels in a picture of a cat may result in this picture being misclassified as a dog.

  • In practice, signals often come with noise.

  • It is not good that a small noise can ruin the results.

31. Sensitivity of Deep Learning: An Explanation

  • Each neuron is affected by the noise.

  • It can take the original noise level δ and amplify it to a higher level c · δ for some c > 1.

  • In deep learning, we have several (L) layers.

  • In the first layer, each neuron amplifies the noise level δ to c · δ.

  • Neurons in the second layer amplify it even more, to c · (c · δ) = c² · δ.

  • After the third layer, we get c³ · δ, etc.

  • After all L layers, we get c^L · δ.

  • The exponential function c^L grows very fast with L.

  • So, not surprisingly, we get a much higher noise level than for traditional neural networks (see the illustration below).
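A few lines of Python (ours; the values of δ and c are purely illustrative) show how fast this exponential amplification grows:

```python
delta, c = 0.01, 1.5   # illustrative initial noise level and per-layer factor

for L in (1, 2, 5, 10, 20):
    print(f"L = {L:2d} layers -> noise level {c**L * delta:.4f}")
# with c = 1.5, twenty layers already amplify the noise more than 3000-fold
```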


32. How to Deal with Sensitivity of Deep Learning

  • To train a traditional neural network, we feed it with actually observed patterns (x_1^(p), . . . , x_n^(p), y^(p)).

  • Then, we find the values of the corresponding weights that match all these patterns.

  • As a result, the trained network usually works well:

    – not only for the original patterns, but
    – also for modified versions of these patterns – e.g., when we add some noise.

  • For deep learning, we do not have this automatic success on noised patterns.

33. How to Deal with Sensitivity (cont-d)

  • So, to achieve such success, it is reasonable:

    – to artificially add noise to the patterns, and
    – to add such simulated-noise modifications of the original patterns when training the network.

  • We can also add noise to the inputs.

  • This idea seems to work reasonably well; a sketch of this augmentation is given below.
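A minimal sketch of this noise-augmentation idea (ours; the function name and the parameters sigma and copies are hypothetical, not from the talk):

```python
import numpy as np

def augment_with_noise(X, y, sigma=0.05, copies=5, seed=0):
    """Return the original patterns plus `copies` noisy versions of each."""
    rng = np.random.default_rng(seed)
    X_aug = [X] + [X + rng.normal(scale=sigma, size=X.shape) for _ in range(copies)]
    y_aug = [y] * (copies + 1)   # a slightly noised pattern keeps its label
    return np.concatenate(X_aug), np.concatenate(y_aug)

X = np.array([[0.0, 1.0], [1.0, 0.0]])   # two toy patterns
y = np.array([0, 1])
X_train, y_train = augment_with_noise(X, y)
print(X_train.shape, y_train.shape)       # (12, 2) (12,)
```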

34. Acknowledgments

This work was supported in part by the National Science Foundation via grants:

  • 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science), and

  • HRD-1242122 (Cyber-ShARE Center of Excellence).