
Why Squashing Functions in Multi-Layer Neural Networks - PowerPoint PPT Presentation



Why Squashing Functions in Multi-Layer Neural Networks

Julio C. Urenda 1, Orsolya Csiszár 2,3, Gábor Csiszár 4, József Dombi 5, Olga Kosheleva 1, Vladik Kreinovich 1, György Eigner 3

1 University of Texas at El Paso, USA
2 University of Applied Sciences Esslingen, Germany
3 Óbuda University, Budapest, Hungary
4 University of Stuttgart, Germany
5 University of Szeged, Hungary

E-mails: vladik@utep.edu, orsolya.csiszar@nik.uni-obuda.hu, gabor.csiszar@mp.imw.uni-stuttgart.de, dombi@inf.u-szeged.hu, olgak@utep.edu, eigner.gyorgy@nik.uni-obuda.hu

1. A Short Introduction

• In their successful applications, deep neural networks use a non-linear transformation s(z) = max(0, z).
• It is called a rectified linear activation function.
• Sometimes, more general transformations – called squashing functions – lead to even better results.
• In this talk, we provide a theoretical explanation for this empirical fact.
• To provide this explanation, let us first briefly recall:
  – why we need machine learning in the first place,
  – what deep neural networks are, and
  – what activation functions these neural networks use.

2. Machine Learning Is Needed

• For some simple systems, we know the equations that describe the system's dynamics.
• These equations may be approximate, but they are often good enough.
• With more complex systems (such as systems of systems), this is often no longer the case.
• Even when we have a good approximate model for each subsystem, the corresponding inaccuracies add up.
• So, the resulting model of the whole system is too inaccurate to be useful.
• We also need to use the records of the actual system's behavior when making predictions.
• Using the previous behavior to predict the future is called machine learning.

3. Deep Learning

• The most efficient machine learning technique is deep learning: the use of multi-layer neural networks.
• In general, on a layer of a neural network, we transform signals x_1, ..., x_n into a new signal y = s(Σ_{i=1}^n w_i · x_i + w_0).
• The coefficients w_i (called weights) are to be determined during training.
• s(z) is a non-linear function called the activation function.
• Most multi-layer neural networks use s(z) = max(z, 0), known as the rectified linear function (see the sketch below).
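As a minimal sketch of this layer computation (our own illustration; NumPy and the example numbers are not from the talk):

```python
import numpy as np

def relu(z):
    """Rectified linear activation: s(z) = max(0, z)."""
    return np.maximum(0.0, z)

def layer(x, w, w0, s=relu):
    """One layer of a neural network: y = s(sum_i w_i * x_i + w_0)."""
    return s(np.dot(w, x) + w0)

# Made-up numbers, purely for illustration: three inputs, one neuron.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, 0.8, -1.1])
print(layer(x, w, w0=0.2))  # relu(0.3 - 1.6 - 0.55 + 0.2) = relu(-1.65) = 0.0
```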

4. Shall We Go Beyond Rectified Linear?

• Preliminary analysis shows that for some applications:
  – it is more advantageous to use different activation functions for different neurons;
  – specifically, this was shown for a special family of squashing activation functions (implemented in the sketch below)
    S^(β)_{a,λ}(z) = 1/(λ · β) · ln[(1 + exp(β · (z − (a − λ/2)))) / (1 + exp(β · (z − (a + λ/2))))];
  – this family contains rectified linear neurons as a particular case.
• We explain the empirical success of squashing functions by showing that their formulas follow from reasonably natural symmetries.
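A direct Python transcription of this squashing formula may help; this is our own sketch, with parameter names (a, lam, beta) and default values chosen by us, and np.logaddexp used only for numerical stability:

```python
import numpy as np

def squashing(z, a=0.0, lam=1.0, beta=1.0):
    """Squashing activation S^(beta)_{a,lam}(z) from the slide above.

    np.logaddexp(0, t) computes ln(1 + exp(t)) without overflowing
    for large t, so this matches the formula but is numerically stable.
    """
    upper = np.logaddexp(0.0, beta * (z - (a - lam / 2)))
    lower = np.logaddexp(0.0, beta * (z - (a + lam / 2)))
    return (upper - lower) / (lam * beta)

z = np.linspace(-2.0, 2.0, 9)
# For large beta the output approaches a hard ramp: 0 below a - lam/2,
# 1 above a + lam/2, linear in between -- a rectified-linear shape clipped at 1.
print(squashing(z, a=0.0, lam=1.0, beta=50.0))
```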

5. How This Talk Is Structured

• First, we recall the main ideas of symmetries and invariance.
• Then, we recall how these ideas can be used to explain the efficiency of the sigmoid activation function s_0(z) = 1/(1 + exp(−z)) (see the sketch below).
• This function is used in traditional 3-layer neural networks.
• Finally, we use this information to explain the efficiency of squashing activation functions.
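For comparison with the squashing sketch above, here is the sigmoid in the same style (again our own illustration, not code from the talk):

```python
import numpy as np

def sigmoid(z):
    """Traditional sigmoid activation: s0(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)
print(sigmoid(z))
# Relation to the previous sketch (our observation, easy to verify numerically):
# as lam -> 0, squashing(z, a=0, lam, beta=1) approaches sigmoid(z).
```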

6. Which Transformations Are Natural?

• From the mathematical viewpoint, we can apply any non-linear transformation.
• However, some of these transformations are purely mathematical, with no clear physical interpretation.
• Other transformations are natural in the sense that they have physical meaning.
• What are natural transformations?

7. Numerical Values Change When We Change a Measuring Unit And/Or Starting Point

• In data processing, we deal with numerical values of different physical quantities.
• Computers just treat these values as numbers.
• However, from the physical viewpoint, the numerical values are not absolute; they change:
  – if we change the measuring unit and/or
  – the starting point for measuring the corresponding quantity.
• The corresponding changes in numerical values are clearly physically meaningful, i.e., natural.
• For example, we can measure a person's height in meters or in centimeters.

8. Numerical Values Change (cont-d)

• The same height of 1.7 m, when described in centimeters, becomes 170 cm.
• In general, if we replace the original measuring unit with a new unit which is λ times smaller, then:
  – instead of the original numerical value x,
  – we get a new numerical value λ · x – while the actual quantity remains the same.
• Such a transformation x → λ · x is known as scaling.
• For some quantities, e.g., for time or temperature, the numerical value also depends on the starting point.
• For example, we can measure the time from the moment when the talk started.
• Alternatively, we can use the usual calendar time, in which Year 0 is the starting point.

9. Numerical Values Change (cont-d)

• In general, if we replace the original starting point with a new one which is x_0 units earlier, then:
  – each original numerical value x
  – is replaced by a new numerical value x + x_0.
• Such a transformation x → x + x_0 is known as shift.
• In general, if we change both the measuring unit and the starting point, we get a linear transformation x → λ · x + x_0.
• A usual example of such a transformation is the transition from the Celsius to the Fahrenheit temperature scale: t_F = 1.8 · t_C + 32 (see the sketch below).
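These three kinds of transformations are easy to state in code; a small sketch (ours, with made-up example values):

```python
def scaling(x, lam):
    """Change of measuring unit: x -> lam * x (e.g., meters to centimeters, lam = 100)."""
    return lam * x

def shift(x, x0):
    """Change of starting point: x -> x + x0."""
    return x + x0

def linear(x, lam, x0):
    """Change of both unit and starting point: x -> lam * x + x0."""
    return lam * x + x0

print(scaling(1.7, 100))    # 1.7 m -> 170.0 cm
print(linear(25, 1.8, 32))  # 25 deg C -> 77.0 deg F
```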

10. Invariance

• Changing the measuring unit and/or starting point:
  – changes the numerical values but
  – does not change the actual quantity.
• It is therefore reasonable to require that physical equations do not change if we simply:
  – change the measuring unit and/or
  – change the starting point.
• Of course, to preserve the physical equations:
  – if we change the measuring unit and/or starting point for one quantity,
  – we may need to change the measuring units and/or starting points for other quantities as well.
• For example, there is a well-known relation d = v · t between distance d, velocity v, and time t (see the sketch below).
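A quick numerical illustration of this invariance requirement for d = v · t (our own sketch, not from the talk):

```python
# Rescaling v and t consistently forces a matching rescaling of d,
# and the relation d = v * t keeps its form in the new units.
v_kmh, t_h = 90.0, 2.0                  # velocity in km/h, time in hours
d_km = v_kmh * t_h                      # d = v * t gives 180 km

v_ms, t_s = v_kmh / 3.6, t_h * 3600.0   # km/h -> m/s, hours -> seconds
d_m = v_ms * t_s                        # the same law now gives 180000 m

assert abs(d_m - 1000.0 * d_km) < 1e-6  # lam_d = lam_v * lam_t = (1/3.6) * 3600 = 1000
```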
