
Deep Learning (Partly) Demystified - PowerPoint PPT Presentation



Deep Learning (Partly) Demystified

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, Texas 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik

1. Overview

• Successes of deep learning are partly due to the appropriate selection of activation functions, pooling functions, etc.
• Most of these choices have been made based on empirical comparison and heuristic ideas.
• In this talk, we show that many of these choices – and the surprising success of deep learning in the first place – can be explained by reasonably simple and natural mathematics.

2. Traditional Neural Networks: A Brief Reminder

• To explain deep neural networks, let us first briefly recall the motivations behind traditional ones.
• In the old days, computers were much slower.
• This was a big limitation that prevented us from solving many important practical problems.
• As a result, researchers started looking for ways to speed up computations.
• If a task takes too long for one person, a natural idea is to ask for help.
• Several people can work on this task in parallel – and thus get the result faster; similarly:
  – if a computation task takes too long,
  – a natural idea is to have several processing units working in parallel.

3. Traditional Neural Networks (cont-d)

• In this case, the overall computation time is just the time needed for each of the processing units to finish its sub-task.
• To minimize the overall time, it is therefore necessary to make these sub-tasks as simple as possible.
• In data processing, the simplest possible functions to compute are linear functions.
• However, if we only have processing units that compute linear functions, we will only compute linear functions.
• Indeed, a composition of linear functions is always linear (spelled out below).
• Thus, we need to supplement these units with some nonlinear units.
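To spell out why a composition of linear maps stays linear (the notation $A_1$, $A_2$, $b_1$, $b_2$ is ours, used only for this illustration, not from the talk):

  $f_1(x) = A_1 x + b_1, \quad f_2(y) = A_2 y + b_2 \;\Rightarrow\; f_2(f_1(x)) = A_2(A_1 x + b_1) + b_2 = (A_2 A_1)\,x + (A_2 b_1 + b_2),$

which is again of the same linear (affine) form: a matrix times x plus a constant vector.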

4. Traditional Neural Networks (cont-d)

• In general, the more inputs, the more complex (and thus longer) the resulting computations.
• So, the fastest possible nonlinear units are the ones that compute functions of one variable.
• So, our ideal computational device should consist of:
  – linear (L) units and
  – nonlinear units (NL) that compute functions of one variable.
• These units should work in parallel:
  – first, all the units from one layer will work,
  – then all units from another layer, etc.
• The fewer layers, the faster the resulting computations.
• One can prove that 1- and 2-layer schemes do not have a universal approximation property.

5. Traditional Neural Networks (cont-d)

• One can also prove that 3-layer schemes already have this property.
• There are two possible 3-layer schemes: L-NL-L and NL-L-NL.
• The first one is faster, since it uses the slower nonlinear units only once.
• In this scheme, first, each unit from the first layer applies a linear transformation to the inputs x_1, ..., x_n:

  $z_k = \sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}.$

• The values w_{ki} are known as weights.
• In the next NL layer, these values are transformed into y_k = s_k(z_k), for some nonlinear functions s_k(z).

6. Traditional Neural Networks (cont-d)

• Finally, in the last (L) layer, the values y_k are linearly combined into the final result:

  $y = \sum_{k=1}^{K} W_k \cdot y_k - W_0 = \sum_{k=1}^{K} W_k \cdot s_k\!\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0.$

• This is exactly the formula that describes the traditional neural network.
• In the traditional neural network, usually, all the NL neurons compute the same function – the sigmoid:

  $s_k(z) = \frac{1}{1 + \exp(-z)}.$
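As a sanity check, here is a minimal NumPy sketch of this L-NL-L formula; the function and array names, the sizes n = 3 and K = 4, and the random values are illustrative choices of ours, not part of the talk.

    # Forward pass of the traditional 3-layer (L-NL-L) network described above.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def traditional_nn(x, w, w0, W, W0):
        z = w @ x - w0     # first (L) layer: z_k = sum_i w_ki * x_i - w_k0
        y = sigmoid(z)     # middle (NL) layer: y_k = s_k(z_k), here s_k = sigmoid
        return W @ y - W0  # last (L) layer: y = sum_k W_k * y_k - W_0

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)         # inputs x_1, ..., x_n (n = 3)
    w = rng.normal(size=(4, 3))    # weights w_ki (K = 4 hidden neurons)
    w0 = rng.normal(size=4)        # thresholds w_k0
    W = rng.normal(size=4)         # output weights W_k
    W0 = rng.normal()              # output threshold W_0
    print(traditional_nn(x, w, w0, W, W0))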

7. Why Go Beyond Traditional Neural Networks

• Traditional neural networks were invented when computers were reasonably slow.
• This prevented computers from solving important practical problems.
• For these computers, computation speed was the main objective.
• As we have just shown, this need led to what we know as traditional neural networks.
• Nowadays, computers are much faster.
• In most practical applications, speed is no longer the main problem.
• But the traditional neural networks, while fast, have limited accuracy of their predictions.

8. The More Models We Have, the More Accurately We Can Approximate

• As a result of training a neural network, we get the values of some parameters for which the corresponding model provides the best approximation to the actual data.
• Let a denote the number of parameters.
• Let b denote the number of bits representing each parameter.
• Then, to represent all parameters, we need N = a · b bits.
• Different models obtained from training can be described by different N-bit sequences.
• In general, for N bits, there are 2^N possible N-bit sequences.
• Thus, we can have 2^N possible models.
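For a sense of scale (the specific numbers here are our illustration, not from the talk): a tiny model with a = 10 parameters stored with b = 8 bits each already gives

  $N = a \cdot b = 10 \cdot 8 = 80 \text{ bits}, \qquad 2^{80} \approx 1.2 \cdot 10^{24} \text{ possible models}.$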

9. The More Models We Have (cont-d)

• In these terms, training simply means selecting one of these 2^N possible models.
• If we have only one model to represent the actual dependence, this model will be a very lousy description.
• If we can have two models, we can have more accurate approximations.
• In general, the more models we have, the more accurate a representation we can have.
• We can illustrate this idea on the example of approximating real numbers from the interval [0, 1].
• If we have only one model – e.g., the value x = 0.5 – then we approximate every other number with accuracy 0.5.

10. The More Models We Have (cont-d)

• If we can have 10 models, then we can take the 10 values 0.05, 0.15, ..., 0.95.
• The first value approximates all the numbers from the interval [0, 0.1] with accuracy 0.05.
• The second value approximates all the numbers from the interval [0.1, 0.2] with the same accuracy, etc.
• By selecting one of these values, we can approximate any number from [0, 1] with accuracy 0.05, as the small check below illustrates.
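A quick numerical check of this accuracy claim (a sketch; the use of NumPy, the grid of test points, and the variable names are ours):

    # 10 equally spaced "models" 0.05, 0.15, ..., 0.95 on [0, 1];
    # the worst-case approximation error should be 1 / (2 * M) = 0.05.
    import numpy as np

    M = 10
    models = (np.arange(M) + 0.5) / M                   # 0.05, 0.15, ..., 0.95

    def approximate(x):
        return models[np.argmin(np.abs(models - x))]    # pick the closest model

    xs = np.linspace(0.0, 1.0, 10001)
    worst_error = max(abs(x - approximate(x)) for x in xs)
    print(worst_error)    # approximately 0.05 (up to float rounding)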
