
Orthogonal Bases Are the Best: A Theorem Justifying Bruno Apolloni’s Heuristic Neural Network Idea - PowerPoint PPT Presentation



Orthogonal Bases Are the Best: A Theorem Justifying Bruno Apolloni’s Heuristic Neural Network Idea

Jaime Nava and Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
500 W. University, El Paso, TX 79968, USA
Emails: jenava@miners.utep.edu, vladik@utep.edu

1. Neural Networks: Brief Reminder

• In the traditional (3-layer) neural networks, the input values $x_1, \ldots, x_n$:
  – first go through the non-linear layer of “hidden” neurons, resulting in the values
    $y_i = s_0\left(\sum_{j=1}^{n} w_{ij} \cdot x_j - w_{i0}\right)$, $1 \le i \le m$,
  – after which a linear neuron combines the results $y_i$ into the output
    $y = \sum_{i=1}^{m} W_i \cdot y_i - W_0$.
• Here, $W_i$ and $w_{ij}$ are weights selected based on the data, and $s_0(z)$ is a non-linear activation function.
• Usually, the “sigmoid” activation function is used:
    $s_0(z) = \dfrac{1}{1 + \exp(-z)}$.
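As a reading aid, here is a minimal sketch of this forward pass in NumPy; the sizes and weight values below are illustrative placeholders, not taken from the presentation:

```python
# A minimal sketch of the 3-layer forward pass described above.
import numpy as np

def sigmoid(z):
    # s_0(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, w0, W, W0):
    """x: inputs (n,); w: hidden weights (m, n); w0: hidden biases (m,);
    W: output weights (m,); W0: output bias (scalar)."""
    y = sigmoid(w @ x - w0)  # y_i = s_0(sum_j w_ij * x_j - w_i0)
    return W @ y - W0        # y = sum_i W_i * y_i - W_0

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])             # n = 3 inputs
print(forward(x, rng.normal(size=(2, 3)),  # m = 2 hidden neurons
              rng.normal(size=2), rng.normal(size=2), 0.1))
```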

2. Training a Neural Network: Reminder

• The weights $W_i$ and $w_{ij}$ are selected so as to fit the data, i.e., so that
  $y^{(k)} \approx f\left(x_1^{(k)}, \ldots, x_n^{(k)}\right)$, where:
  – $x_1^{(k)}, \ldots, x_n^{(k)}$ ($1 \le k \le N$) are given values of the inputs, and
  – $y^{(k)}$ are given values of the output.
• One of the problems with the traditional neural networks is that
  – in the process of learning – i.e., in the process of adjusting the values of the weights to fit the data –
  – some of the neurons are duplicated, i.e., we get $w_{ij} = w_{i'j}$ for some $i \ne i'$ and thus, $y_i = y_{i'}$.
• As a result, we do not fully use the learning capacity of a neural network: we could use fewer hidden neurons.
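The duplication problem can be made concrete with a small hypothetical check (illustrative, not part of the original work): hidden neurons whose weight rows coincide compute identical outputs, so one of them is redundant:

```python
# Hypothetical helper: find hidden neurons i, i' with w_ij = w_i'j,
# which (with equal biases) implies y_i = y_i'.
import numpy as np

def duplicated_pairs(w, tol=1e-6):
    """Return pairs (i, k), i < k, whose weight rows (nearly) coincide."""
    m = w.shape[0]
    return [(i, k) for i in range(m) for k in range(i + 1, m)
            if np.allclose(w[i], w[k], atol=tol)]

w = np.array([[1.0, 2.0],
              [1.0, 2.0],    # same row as neuron 0
              [0.5, -1.0]])
print(duplicated_pairs(w))   # [(0, 1)]: neurons 0 and 1 are redundant
```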

3. Apolloni’s Idea

• Problem (reminder):
  – in the process of learning – i.e., in the process of adjusting the values of the weights to fit the data –
  – some of the neurons are duplicated, i.e., we get $w_{ij} = w_{i'j}$ for some $i \ne i'$ and thus, $y_i = y_{i'}$.
• To avoid this problem, B. Apolloni et al. suggested that we orthogonalize the neurons during training.
• In other words, we make sure that the corresponding functions $y_i(x_1, \ldots, x_n)$ remain orthogonal:
  $\langle y_i, y_j \rangle = \int y_i(x) \cdot y_j(x)\,dx = 0$.
• Since the idea of Apolloni et al. works well, it is desirable to look for its precise mathematical justification.
• We provide such a justification in terms of symmetries.
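A sketch of what the orthogonality constraint amounts to computationally. The presentation does not fix how the integral $\langle y_i, y_j \rangle$ is evaluated; here it is approximated by a sample average over inputs, and classical Gram-Schmidt is used as one possible orthogonalization:

```python
# Sketch: approximate <y_i, y_j> by a sample average and
# orthogonalize the hidden-neuron outputs by Gram-Schmidt.
import numpy as np

def inner(u, v):
    # <y_i, y_j> ~ (1/N) sum_k y_i(x^(k)) * y_j(x^(k))
    return float(np.mean(u * v))

def gram_schmidt(Y):
    """Rows of Y: hidden-neuron outputs evaluated on N sample inputs."""
    Q = Y.astype(float).copy()
    for i in range(len(Q)):
        for j in range(i):
            Q[i] -= inner(Q[i], Q[j]) / inner(Q[j], Q[j]) * Q[j]
    return Q

Y = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1],   # nearly a duplicate of neuron 0
              [0.0, 1.0, -1.0]])
Q = gram_schmidt(Y)
print(round(inner(Q[0], Q[1]), 12))  # ~0: the near-duplicate is removed
```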

4. Why Symmetries?

• At first glance, the use of symmetries in neural networks may sound somewhat strange.
• Indeed, there are no explicit symmetries there.
• However, as we will show, hidden symmetries have been actively used in neural networks.
• For example, symmetries explain the empirically observed advantages of the sigmoid activation function
  $s_0(z) = \dfrac{1}{1 + \exp(-z)}$.

5. Symmetry: a Fundamental Property of the Physical World

• One of the main objectives of science: prediction.
• Basis for prediction: we observed similar situations in the past, and we expect similar outcomes.
• In mathematical terms: similarity corresponds to symmetry, and similarity of outcomes – to invariance.
• Example: we dropped the ball, and it fell down.
• Symmetries: shift, rotation, etc.
• In modern physics: theories are usually formulated in terms of symmetries (not differential equations).
• Natural idea: let us use symmetry to describe uncertainty as well.

6. Basic Symmetries: Scaling and Shift

• Typical situation: we deal with the numerical values of a physical quantity.
• Numerical values depend on the measuring unit.
• Scaling: if we use a new unit which is $\lambda$ times smaller, numerical values are multiplied by $\lambda$: $x \to \lambda \cdot x$.
• Example: $x$ meters $= 100 \cdot x$ cm.
• Another possibility: change the starting point.
• Shift: if we use a new starting point which is $s$ units before, then $x \to x + s$ (example: time).
• Together, scalings and shifts form linear transformations $x \to a \cdot x + b$.
• Invariance: physical formulas should not depend on the choice of a measuring unit or of a starting point.
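A tiny illustration (not from the slides): converting Celsius to Fahrenheit combines a scaling ($a = 9/5$) with a shift ($b = 32$):

```python
# Celsius -> Fahrenheit as a combined transformation x -> a*x + b.
def relabel(x, a, b):
    return a * x + b  # new numerical value of the same quantity

print(relabel(100.0, 9 / 5, 32.0))  # 212.0: same temperature, new labels
```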

7. Basic Nonlinear Symmetries

• Sometimes, a system also has nonlinear symmetries.
• If a system is invariant under $f$ and $g$, then:
  – it is invariant under their composition $f \circ g$, and
  – it is invariant under the inverse transformation $f^{-1}$.
• In mathematical terms, this means that symmetries form a group.
• In practice, at any given moment of time, we can only store and describe finitely many parameters.
• Thus, it is reasonable to restrict ourselves to finite-dimensional groups.
• Question (N. Wiener): describe all finite-dimensional groups that contain all linear transformations.
• Answer (for real numbers): all elements of this group are fractionally-linear $x \to (a \cdot x + b)/(c \cdot x + d)$.
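A standard way to see the group structure concretely (not stated on the slide): a fractionally-linear map $x \to (a \cdot x + b)/(c \cdot x + d)$ corresponds to the matrix $[[a, b], [c, d]]$, and composition of maps corresponds to matrix multiplication:

```python
# Fractionally-linear maps compose like 2x2 matrices multiply.
import numpy as np

def apply_flt(M, x):
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)

f = np.array([[2.0, 1.0], [0.0, 1.0]])  # the linear map x -> 2x + 1
g = np.array([[1.0, 3.0], [1.0, 2.0]])  # a genuinely fractional map

x = 0.5
print(apply_flt(f, apply_flt(g, x)))    # f(g(x)) = 3.8
print(apply_flt(f @ g, x))              # same value: composition = product
```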

8. Symmetries Explain the Choice of an Activation Function

• What needs explaining: the formula for the activation function $f(x) = 1/(1 + e^{-x})$.
• A change in the input starting point: $x \to x + s$.
• Reasonable requirement: the new output $f(x + s)$ is equivalent to $f(x)$ modulo an appropriate transformation.
• Reminder: all appropriate transformations are fractionally linear.
• Conclusion: $f(x + s) = \dfrac{a(s) \cdot f(x) + b(s)}{c(s) \cdot f(x) + d(s)}$.
• Differentiating both sides by $s$ and equating $s$ to 0, we get a differential equation for $f(x)$.
• Its known solution is the sigmoid activation function – which can thus be explained by symmetries.
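The differentiation step is standard but not spelled out on the slide; a sketch, assuming the normalization $a(0) = d(0) = 1$, $b(0) = c(0) = 0$ (so that $s = 0$ gives the identity transformation):

```latex
% Differentiating f(x+s) = (a(s) f(x) + b(s)) / (c(s) f(x) + d(s))
% with respect to s at s = 0 gives a Riccati equation with constant
% coefficients:
\[
  f'(x) = \dot b(0) + \bigl(\dot a(0) - \dot d(0)\bigr)\, f(x)
          - \dot c(0)\, f(x)^2 .
\]
% The sigmoid f(x) = 1/(1 + e^{-x}) satisfies the special case
% f'(x) = f(x) (1 - f(x)), i.e., the choice
% \dot b(0) = 0,  \dot a(0) - \dot d(0) = 1,  \dot c(0) = 1.
```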

9. Towards Formulating the Problem in Precise Terms

• We select a basis $e_0(x), e_1(x), \ldots, e_n(x), \ldots$ so that each function $f(x)$ is represented as $f(x) = \sum_i c_i \cdot e_i(x)$; e.g.:
  – Taylor series: $e_0(x) = 1$, $e_1(x) = x$, $e_2(x) = x^2$, ...
  – Fourier transform: $e_i(x) = \sin(\omega_i \cdot x)$.
• We store $c_0, c_1, \ldots$ instead of the original function $f(x)$.
• Criterion: e.g., the smallest number of bits needed to store $f(x)$ with given accuracy.
• Observation: storing $c_i$ and $-c_i$ takes the same space.
• Thus, changing one of the $e_i(x)$ to $e_i'(x) = -e_i(x)$ does not change accuracy or storage space, so:
  – if $e_0(x), \ldots, e_{i-1}(x), e_i(x), e_{i+1}(x), \ldots$ is an optimal basis, then
  – $e_0(x), \ldots, e_{i-1}(x), -e_i(x), e_{i+1}(x), \ldots$ is also optimal.
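A small numerical illustration of the sign-flip observation; the sine basis element and the sample grid are ad hoc choices for this example:

```python
# Flipping the sign of a basis element only flips the sign of the
# stored coefficient, so accuracy and storage cost are unchanged.
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 1000)
e1 = np.sin(x)              # basis element e_1(x)
f = 0.7 * e1                # function to be stored via coefficients

c1 = (f @ e1) / (e1 @ e1)         # coefficient w.r.t.  e_1
c1_flip = (f @ -e1) / (e1 @ e1)   # coefficient w.r.t. -e_1
                                  # (note <-e_1, -e_1> = <e_1, e_1>)
print(c1, c1_flip)          # ~0.7 and ~-0.7: same magnitude, same bits
```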
