

One More Advantage of Deep Learning: While in General, a Perfect Training of a Neural Network Is NP-Hard, It Is Feasible for Bounded-Width Deep Networks

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik

(based on a joint work with Chitta Baral)

1. Why Traditional Neural Networks: (Sanitized) History

• How do we make computers think?
• To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds.
• To make computers think, it is reasonable to analyze how we humans think.
• On the biological level, our brain processes information via special cells called neurons.
• Somewhat surprisingly, in the brain, signals are electric – just as in the computer.
• The main difference is that in a neural network, signals are sequences of identical pulses.

2. Why Traditional NN: (Sanitized) History

• The intensity of a signal is described by the frequency of pulses.
• A neuron has many inputs (up to $10^4$).
• All the inputs $x_1, \ldots, x_n$ are combined, with some loss, into a frequency $\sum_{i=1}^{n} w_i \cdot x_i$.
• Low inputs do not activate the neuron at all; high inputs lead to the largest activation.
• The output signal is a non-linear function
  $y = f\left(\sum_{i=1}^{n} w_i \cdot x_i - w_0\right).$
• In biological neurons, $f(x) = 1/(1 + \exp(-x))$.
• Traditional neural networks emulate such biological neurons.
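To make this neuron model concrete, here is a minimal Python sketch of a single traditional neuron; the inputs, weights, and threshold are made-up illustration values, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    """The biological activation function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, w0):
    """Output of a traditional neuron: y = f(sum_i w_i * x_i - w0)."""
    return sigmoid(np.dot(w, x) - w0)

# Illustration with made-up inputs, weights, and threshold
x = np.array([0.5, 1.0, -0.2])   # input frequencies x_1, ..., x_n
w = np.array([0.8, -0.3, 1.5])   # weights w_1, ..., w_n
print(neuron_output(x, w, w0=0.1))
```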

3. Why Traditional Neural Networks: Real History

• At first, researchers ignored non-linearity and only used linear neurons.
• They got good results and made many promises.
• The euphoria ended in the 1960s when MIT's Marvin Minsky and Seymour Papert published a book.
• Their main result was that a composition of linear functions is linear (I am not kidding).
• This ended the hopes of the original schemes.
• For some time, neural networks became a bad word.
• Then, smart researchers came up with a genius idea: let's make neurons non-linear.
• This revived the field.

4. Traditional Neural Networks: Main Motivation

• One of the main motivations for neural networks was that computers were slow.
• Although human neurons are much slower than a CPU, human processing was often faster.
• So, the main motivation was to make data processing faster.
• The idea was that:
  – since we are the result of billions of years of ever-improving evolution,
  – our biological mechanisms should be optimal (or close to optimal).

5. How the Need for Fast Computation Leads to Traditional Neural Networks

• To make processing faster, we need to have many fast processing units working in parallel.
• The fewer layers, the smaller the overall processing time.
• In nature, there are many fast linear processes – e.g., combining electric signals.
• As a result, linear processing (L) is faster than non-linear processing (NL).
• For non-linear processing, the more inputs, the longer it takes.
• So, the fastest non-linear processing units process just one input.
• It turns out that two layers are not enough to approximate any function.

6. Why One or Two Layers Are Not Enough

• With one linear (L) layer, we only get linear functions.
• With one nonlinear (NL) layer, we only get functions of one variable.
• With L → NL layers, we get $y = g\left(\sum_{i=1}^{n} w_i \cdot x_i - w_0\right)$.
• For these functions, the level sets $f(x_1, \ldots, x_n) = \text{const}$ are planes $\sum_{i=1}^{n} w_i \cdot x_i = c$.
• Thus, they cannot approximate, e.g., $f(x_1, x_2) = x_1 \cdot x_2$, for which the level set is a hyperbola.
• For NL → L layers, we get $f(x_1, \ldots, x_n) = \sum_{i=1}^{n} f_i(x_i)$.
• For all these functions, $d \stackrel{\text{def}}{=} \dfrac{\partial^2 f}{\partial x_1 \partial x_2} = 0$, so we also cannot approximate $f(x_1, x_2) = x_1 \cdot x_2$, for which $d = 1 \neq 0$.
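The mixed-partial argument can be checked numerically. The following sketch is my own illustration (tanh and a square are arbitrary example nonlinearities): it estimates $d = \partial^2 f / \partial x_1 \partial x_2$ for an additive NL → L model and for the product $x_1 \cdot x_2$.

```python
import numpy as np

def additive_model(x1, x2):
    """An NL -> L network computes f(x1, x2) = f1(x1) + f2(x2);
    tanh and the square are arbitrary example nonlinearities."""
    return np.tanh(x1) + x2 ** 2

def mixed_partial(f, x1, x2, h=1e-4):
    """Numerical estimate of d = d^2 f / (dx1 dx2) by central differences."""
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h * h)

# For any sum of one-variable functions, d is (numerically) 0 ...
print(mixed_partial(additive_model, 0.3, -0.7))        # ~ 0
# ... while for the product x1 * x2, d = 1, so no NL -> L network can match it.
print(mixed_partial(lambda a, b: a * b, 0.3, -0.7))    # ~ 1
```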

7. Why Three Layers Are Sufficient: Newton's Prism and Fourier Transform

• In principle, we can have two 3-layer configurations: L → NL → L and NL → L → NL.
• Since L is faster than NL, the fastest is L → NL → L:
  $y = \sum_{k=1}^{K} W_k \cdot f_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0.$
• Newton showed that a prism decomposes white light (or any light) into elementary colors.
• In precise terms, elementary colors are sinusoids $A \cdot \sin(w \cdot t) + B \cdot \cos(w \cdot t)$.
• Thus, every function can be approximated, with any accuracy, as a linear combination of sinusoids:
  $f(x_1) \approx \sum_k \left(A_k \cdot \sin(w_k \cdot x_1) + B_k \cdot \cos(w_k \cdot x_1)\right).$

8. Why Three Layers Are Sufficient (cont-d)

• Newton's prism result:
  $f(x_1) \approx \sum_k \left(A_k \cdot \sin(w_k \cdot x_1) + B_k \cdot \cos(w_k \cdot x_1)\right).$
• This result was theoretically proven later by Fourier.
• For $f(x_1, x_2)$, we get a similar expression for each $x_2$, with coefficients $A_k(x_2)$ and $B_k(x_2)$.
• We can similarly represent $A_k(x_2)$ and $B_k(x_2)$, thus getting products of sines, and it is known that, e.g.:
  $\cos(a) \cdot \cos(b) = \frac{1}{2} \cdot \left(\cos(a + b) + \cos(a - b)\right).$
• Thus, we get an approximation of the desired form with $f_k = \sin$ or $f_k = \cos$:
  $y = \sum_{k=1}^{K} W_k \cdot f_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right).$
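As a rough numerical illustration of this Fourier-style argument, one can fit a one-variable function with an L → NL → L network whose hidden units are sines and cosines. The target function, the fixed frequencies, and the least-squares fit below are my own illustrative choices, not part of the slides (which give an existence argument, not a training recipe).

```python
import numpy as np

def target(x):
    """An arbitrary example function to approximate (not from the slides)."""
    return x * np.exp(-x ** 2)

x = np.linspace(-np.pi, np.pi, 400)
K = 8                              # number of sinusoid "neurons" of each kind
freqs = np.arange(1, K + 1)        # fixed example frequencies w_k = 1, ..., K

# Hidden layer: f_k = sin or cos of a linear combination of the input
# (one input here, so the combination is simply w_k * x); the constant
# column plays the role of the output shift -W_0.
H = np.column_stack([np.sin(k * x) for k in freqs] +
                    [np.cos(k * x) for k in freqs] +
                    [np.ones_like(x)])

# Output layer: linear weights W_k chosen by least squares
W, *_ = np.linalg.lstsq(H, target(x), rcond=None)
print("max approximation error:", np.max(np.abs(H @ W - target(x))))
```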

9. Which Activation Functions f_k(z) Should We Choose

• A general 3-layer NN has the form
  $y = \sum_{k=1}^{K} W_k \cdot f_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0.$
• Biological neurons use $f(z) = 1/(1 + \exp(-z))$, but shall we simulate it?
• Simulations are not always efficient.
• E.g., airplanes have wings like birds, but they do not flap them.
• Let us analyze this problem theoretically.
• There is always some noise $c$ in the communication channel.
• So, we can consider either the original signals $x_i$ or the denoised ones $x_i - c$.
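For reference, here is a minimal Python sketch of this general 3-layer form, with the sigmoid used as every activation $f_k$; the sizes and random weights are made-up illustration values (the network is not trained).

```python
import numpy as np

def sigmoid(z):
    """The biological activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def three_layer_nn(x, W, w, w0, W0):
    """General 3-layer network
       y = sum_k W_k * f_k( sum_i w_ki * x_i - w_k0 ) - W_0,
       with every f_k taken to be the sigmoid."""
    hidden = sigmoid(w @ x - w0)   # K hidden signals
    return W @ hidden - W0

# Illustration with made-up sizes and untrained random weights
rng = np.random.default_rng(0)
n, K = 3, 5
x = rng.normal(size=n)
print(three_layer_nn(x,
                     W=rng.normal(size=K),
                     w=rng.normal(size=(K, n)),
                     w0=rng.normal(size=K),
                     W0=0.0))
```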
