From Traditional Neural Networks to Deep Learning and Beyond

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik

(Based on joint work with Chitta Baral, also with Olac Fuentes and Francisco Zapata)

1. Why Traditional Neural Networks: (Sanitized) History

• How do we make computers think?
• To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds.
• To make computers think, it is reasonable to analyze how we humans think.
• On the biological level, our brain processes information via special cells called neurons.
• Somewhat surprisingly, in the brain, signals are electric – just as in the computer.
• The main difference is that in a neural network, signals are sequences of identical pulses.

2. Why Traditional NN: (Sanitized) History

• The intensity of a signal is described by the frequency of pulses.
• A neuron has many inputs (up to 10^4).
• All the inputs x_1, ..., x_n are combined, with some loss, into a frequency ∑_{i=1}^n w_i · x_i.
• Low inputs do not activate the neuron at all; high inputs lead to the largest activation.
• The output signal is a non-linear function
  y = f(∑_{i=1}^n w_i · x_i − w_0).
• In biological neurons, f(x) = 1/(1 + exp(−x)).
• Traditional neural networks emulate such biological neurons.
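To make the neuron formula above concrete, here is a minimal Python sketch (not part of the original slides) of a single traditional neuron with the biological activation f(z) = 1/(1 + exp(−z)); the inputs, weights, and threshold below are made-up illustrative values.

    import math

    def sigmoid(z):
        # Biological-style activation: f(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + math.exp(-z))

    def neuron_output(inputs, weights, w0):
        # y = f(sum_i w_i * x_i - w_0)
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return sigmoid(weighted_sum - w0)

    # Made-up example: three inputs, three weights, one threshold w_0.
    print(neuron_output(inputs=[0.5, 1.0, -0.2], weights=[0.8, -0.3, 0.4], w0=0.1))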

3. Why Traditional Neural Networks: Real History

• At first, researchers ignored non-linearity and only used linear neurons.
• They got good results and made many promises.
• The euphoria ended in the 1960s when MIT's Marvin Minsky and Seymour Papert published a book.
• Their main result was that a composition of linear functions is linear (I am not kidding).
• This ended the hopes of the original schemes.
• For some time, neural networks became a bad word.
• Then, smart researchers came up with a genius idea: let's make neurons non-linear.
• This revived the field.

4. Traditional Neural Networks: Main Motivation

• One of the main motivations for neural networks was that computers were slow.
• Although human neurons are much slower than a CPU, human processing was often faster.
• So, the main motivation was to make data processing faster.
• The idea was that:
  – since we are the result of billions of years of ever-improving evolution,
  – our biological mechanisms should be optimal (or close to optimal).

5. How the Need for Fast Computation Leads to Traditional Neural Networks

• To make processing faster, we need to have many fast processing units working in parallel.
• The fewer layers, the smaller the overall processing time.
• In nature, there are many fast linear processes – e.g., combining electric signals.
• As a result, linear processing (L) is faster than non-linear processing (NL).
• For non-linear processing, the more inputs, the longer it takes.
• So, the fastest non-linear processing units process just one input.
• It turns out that two layers are not enough to approximate an arbitrary function.

6. Why One or Two Layers Are Not Enough

• With one linear (L) layer, we only get linear functions.
• With one nonlinear (NL) layer, we only get functions of one variable.
• With L → NL layers, we get g(∑_{i=1}^n w_i · x_i − w_0).
• For these functions, the level sets f(x_1, ..., x_n) = const are planes ∑_{i=1}^n w_i · x_i = c.
• Thus, they cannot approximate, e.g., f(x_1, x_2) = x_1 · x_2, for which the level set is a hyperbola.
• For NL → L layers, we get f(x_1, ..., x_n) = ∑_{i=1}^n f_i(x_i).
• For all these functions, d := ∂²f/(∂x_1 ∂x_2) = 0, so we also cannot approximate f(x_1, x_2) = x_1 · x_2, for which d = 1 ≠ 0.
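The mixed-derivative argument above can be checked symbolically. The following sketch (using Python's sympy package, not part of the original slides; g1 and g2 are made-up names for the one-variable units) confirms that any NL → L output has zero mixed derivative, while x_1 · x_2 does not.

    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    g1, g2 = sp.Function("g1"), sp.Function("g2")

    # An NL -> L network computes a sum of functions of one variable each:
    nl_then_l = g1(x1) + g2(x2)

    # Its mixed second derivative d = d^2 f / (dx1 dx2) is identically zero ...
    print(sp.diff(nl_then_l, x1, x2))   # prints: 0

    # ... while for the target f(x1, x2) = x1 * x2 it equals 1,
    # so such networks cannot approximate x1 * x2.
    print(sp.diff(x1 * x2, x1, x2))     # prints: 1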

7. Why Three Layers Are Sufficient: Newton's Prism and Fourier Transform

• In principle, we can have two 3-layer configurations: L → NL → L and NL → L → NL.
• Since L is faster than NL, the fastest is L → NL → L:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}) − W_0.
• Newton showed that a prism decomposes white light (or any light) into elementary colors.
• In precise terms, elementary colors are sinusoids A · sin(w · t) + B · cos(w · t).
• Thus, every function can be approximated, with any accuracy, as a linear combination of sinusoids:
  f(x_1) ≈ ∑_k (A_k · sin(w_k · x_1) + B_k · cos(w_k · x_1)).
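A linear combination of sinusoids is exactly an L → NL → L network whose nonlinear units are sin and cos. Here is a minimal numerical sketch (in Python with numpy, not part of the original slides; the frequencies w_k = 0, 1, ..., 7 and the test function are made-up illustrative choices) that fits the output-layer coefficients A_k, B_k by least squares.

    import numpy as np

    def fit_sinusoids(target, xs, freqs):
        # Hidden NL layer: sin(w_k * x) and cos(w_k * x) for each frequency w_k
        # (w_k = 0 gives the constant term via cos(0 * x) = 1).
        def hidden(x):
            return np.column_stack(
                [np.sin(w * x) for w in freqs] + [np.cos(w * x) for w in freqs]
            )
        # Output L layer: least-squares fit of the coefficients A_k, B_k.
        coeffs, *_ = np.linalg.lstsq(hidden(xs), target(xs), rcond=None)
        return lambda x: hidden(x) @ coeffs

    # Made-up smooth periodic test function.
    target = lambda x: np.exp(np.sin(x))
    xs = np.linspace(0.0, 2.0 * np.pi, 200)
    approx = fit_sinusoids(target, xs, freqs=range(0, 8))
    print(np.max(np.abs(approx(xs) - target(xs))))  # maximum error on the grid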

8. Why Three Layers Are Sufficient (cont-d)

• Newton's prism result:
  f(x_1) ≈ ∑_k (A_k · sin(w_k · x_1) + B_k · cos(w_k · x_1)).
• This result was theoretically proven later by Fourier.
• For f(x_1, x_2), we get a similar expression for each x_2, with coefficients A_k(x_2) and B_k(x_2).
• We can similarly represent A_k(x_2) and B_k(x_2), thus getting products of sines, and it is known that, e.g.,
  cos(a) · cos(b) = (1/2) · (cos(a + b) + cos(a − b)).
• Thus, we get an approximation of the desired form with f_k = sin or f_k = cos:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}).
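For the two-variable case, the intermediate step can be written out explicitly. The following LaTeX sketch (not from the original slides; a_{km}, b_{km}, and v_m are illustrative names for the second-level coefficients and frequencies) shows how the product-to-sum identity turns products of sinusoids into sinusoids of linear combinations of x_1 and x_2, which is exactly the L → NL → L form.

    f(x_1,x_2) \approx \sum_k \bigl(A_k(x_2)\,\sin(w_k x_1) + B_k(x_2)\,\cos(w_k x_1)\bigr),
    \qquad
    B_k(x_2) \approx \sum_m \bigl(a_{km}\,\sin(v_m x_2) + b_{km}\,\cos(v_m x_2)\bigr).

    % Substituting and applying, e.g., the product-to-sum identity
    \cos(w_k x_1)\cdot\cos(v_m x_2)
      = \tfrac{1}{2}\bigl(\cos(w_k x_1 + v_m x_2) + \cos(w_k x_1 - v_m x_2)\bigr),

    % every term becomes sin or cos applied to a linear combination of the inputs:
    f(x_1,x_2) \approx \sum_{k} W_k \, f_k\Bigl(\sum_i w_{ki} x_i - w_{k0}\Bigr),
    \qquad f_k \in \{\sin,\cos\}.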

9. Which Activation Functions f_k(z) Should We Choose

• A general 3-layer NN has the form:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}) − W_0.
• Biological neurons use f(z) = 1/(1 + exp(−z)), but shall we simulate it?
• Simulations are not always efficient.
• E.g., airplanes have wings like birds, but they do not flap them.
• Let us analyze this problem theoretically.
• There is always some noise c in the communication channel.
• So, we can consider either the original signals x_i or the denoised ones x_i − c.
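To make the general form concrete, here is a minimal Python sketch (not part of the original slides) of the forward pass y = ∑_k W_k · f_k(∑_i w_ki · x_i − w_k0) − W_0, with the same sigmoid used for every hidden unit; all weights below are made-up random values.

    import numpy as np

    def three_layer_nn(x, w, w0, W, W0, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
        # Hidden layer: f_k(sum_i w_ki * x_i - w_k0), here the same f for all k.
        hidden = f(w @ x - w0)
        # Linear output layer: y = sum_k W_k * hidden_k - W_0.
        return W @ hidden - W0

    # Made-up example: n = 3 inputs, K = 4 hidden units.
    rng = np.random.default_rng(0)
    x = np.array([0.5, -1.0, 2.0])
    w, w0 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W, W0 = rng.normal(size=4), 0.1
    print(three_layer_nn(x, w, w0, W, W0))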
