From Traditional Neural Networks to Deep Learning and Beyond

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik

(Based on joint work with Chitta Baral, also with Olac Fuentes and Francisco Zapata)

1. Why Traditional Neural Networks: (Sanitized) History

• How do we make computers think?
• To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds.
• To make computers think, it is reasonable to analyze how we humans think.
• On the biological level, our brain processes information via special cells called neurons.
• Somewhat surprisingly, in the brain, signals are electric – just as in the computer.
• The main difference is that in a neural network, signals are sequences of identical pulses.

2. Why Traditional NN: (Sanitized) History

• The intensity of a signal is described by the frequency of pulses.
• A neuron has many inputs (up to 10^4).
• All the inputs x_1, ..., x_n are combined, with some loss, into a frequency ∑_{i=1}^n w_i · x_i.
• Low inputs do not activate the neuron at all; high inputs lead to the largest activation.
• The output signal is a non-linear function
  y = f(∑_{i=1}^n w_i · x_i − w_0).
• In biological neurons, f(x) = 1/(1 + exp(−x)).
• Traditional neural networks emulate such biological neurons.
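To make the neuron formula above concrete, here is a minimal Python sketch (not part of the original slides) of a single traditional neuron with the biological activation f(z) = 1/(1 + exp(−z)); the inputs, weights, and threshold below are made-up illustrative values.

    import math

    def sigmoid(z):
        # Biological-style activation: f(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + math.exp(-z))

    def neuron_output(inputs, weights, w0):
        # y = f(sum_i w_i * x_i - w_0)
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return sigmoid(weighted_sum - w0)

    # Made-up example: three inputs, three weights, one threshold w_0.
    print(neuron_output(inputs=[0.5, 1.0, -0.2], weights=[0.8, -0.3, 0.4], w0=0.1))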

3. Why Traditional Neural Networks: Real History

• At first, researchers ignored non-linearity and only used linear neurons.
• They got good results and made many promises.
• The euphoria ended in the 1960s when MIT's Marvin Minsky and Seymour Papert published a book.
• Their main result was that a composition of linear functions is linear (I am not kidding).
• This ended the hopes of the original schemes.
• For some time, neural networks became a bad word.
• Then, smart researchers came up with a genius idea: let's make neurons non-linear.
• This revived the field.

4. Traditional Neural Networks: Main Motivation

• One of the main motivations for neural networks was that computers were slow.
• Although human neurons are much slower than a CPU, human processing was often faster.
• So, the main motivation was to make data processing faster.
• The idea was that:
  – since we are the result of billions of years of ever-improving evolution,
  – our biological mechanisms should be optimal (or close to optimal).

5. How the Need for Fast Computation Leads to Traditional Neural Networks

• To make processing faster, we need to have many fast processing units working in parallel.
• The fewer layers, the smaller the overall processing time.
• In nature, there are many fast linear processes – e.g., combining electric signals.
• As a result, linear processing (L) is faster than non-linear processing (NL).
• For non-linear processing, the more inputs, the longer it takes.
• So, the fastest non-linear processing units process just one input.
• It turns out that two layers are not enough to approximate an arbitrary function.

6. Why One or Two Layers Are Not Enough

• With one linear (L) layer, we only get linear functions.
• With one nonlinear (NL) layer, we only get functions of one variable.
• With L → NL layers, we get g(∑_{i=1}^n w_i · x_i − w_0).
• For these functions, the level sets f(x_1, ..., x_n) = const are planes ∑_{i=1}^n w_i · x_i = c.
• Thus, they cannot approximate, e.g., f(x_1, x_2) = x_1 · x_2, for which the level set is a hyperbola.
• For NL → L layers, we get f(x_1, ..., x_n) = ∑_{i=1}^n f_i(x_i).
• For all these functions, d := ∂²f/(∂x_1 ∂x_2) = 0, so we also cannot approximate f(x_1, x_2) = x_1 · x_2, for which d = 1 ≠ 0.
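The mixed-derivative argument above can be checked symbolically. The following sketch (using Python's sympy package, not part of the original slides; g1 and g2 are made-up names for the one-variable units) confirms that any NL → L output has zero mixed derivative, while x_1 · x_2 does not.

    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    g1, g2 = sp.Function("g1"), sp.Function("g2")

    # An NL -> L network computes a sum of functions of one variable each:
    nl_then_l = g1(x1) + g2(x2)

    # Its mixed second derivative d = d^2 f / (dx1 dx2) is identically zero ...
    print(sp.diff(nl_then_l, x1, x2))   # prints: 0

    # ... while for the target f(x1, x2) = x1 * x2 it equals 1,
    # so such networks cannot approximate x1 * x2.
    print(sp.diff(x1 * x2, x1, x2))     # prints: 1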

7. Why Three Layers Are Sufficient: Newton's Prism and Fourier Transform

• In principle, we can have two 3-layer configurations: L → NL → L and NL → L → NL.
• Since L is faster than NL, the fastest is L → NL → L:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}) − W_0.
• Newton showed that a prism decomposes white light (or any light) into elementary colors.
• In precise terms, elementary colors are sinusoids A · sin(w · t) + B · cos(w · t).
• Thus, every function can be approximated, with any accuracy, as a linear combination of sinusoids:
  f(x_1) ≈ ∑_k (A_k · sin(w_k · x_1) + B_k · cos(w_k · x_1)).
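A linear combination of sinusoids is exactly an L → NL → L network whose nonlinear units are sin and cos. Here is a minimal numerical sketch (in Python with numpy, not part of the original slides; the frequencies w_k = 0, 1, ..., 7 and the test function are made-up illustrative choices) that fits the output-layer coefficients A_k, B_k by least squares.

    import numpy as np

    def fit_sinusoids(target, xs, freqs):
        # Hidden NL layer: sin(w_k * x) and cos(w_k * x) for each frequency w_k
        # (w_k = 0 gives the constant term via cos(0 * x) = 1).
        def hidden(x):
            return np.column_stack(
                [np.sin(w * x) for w in freqs] + [np.cos(w * x) for w in freqs]
            )
        # Output L layer: least-squares fit of the coefficients A_k, B_k.
        coeffs, *_ = np.linalg.lstsq(hidden(xs), target(xs), rcond=None)
        return lambda x: hidden(x) @ coeffs

    # Made-up smooth periodic test function.
    target = lambda x: np.exp(np.sin(x))
    xs = np.linspace(0.0, 2.0 * np.pi, 200)
    approx = fit_sinusoids(target, xs, freqs=range(0, 8))
    print(np.max(np.abs(approx(xs) - target(xs))))  # maximum error on the grid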

8. Why Three Layers Are Sufficient (cont-d)

• Newton's prism result:
  f(x_1) ≈ ∑_k (A_k · sin(w_k · x_1) + B_k · cos(w_k · x_1)).
• This result was theoretically proven later by Fourier.
• For f(x_1, x_2), we get a similar expression for each x_2, with coefficients A_k(x_2) and B_k(x_2).
• We can similarly represent A_k(x_2) and B_k(x_2), thus getting products of sines, and it is known that, e.g.,
  cos(a) · cos(b) = (1/2) · (cos(a + b) + cos(a − b)).
• Thus, we get an approximation of the desired form with f_k = sin or f_k = cos:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}).
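For the two-variable case, the intermediate step can be written out explicitly. The following LaTeX sketch (not from the original slides; a_{km}, b_{km}, and v_m are illustrative names for the second-level coefficients and frequencies) shows how the product-to-sum identity turns products of sinusoids into sinusoids of linear combinations of x_1 and x_2, which is exactly the L → NL → L form.

    f(x_1,x_2) \approx \sum_k \bigl(A_k(x_2)\,\sin(w_k x_1) + B_k(x_2)\,\cos(w_k x_1)\bigr),
    \qquad
    B_k(x_2) \approx \sum_m \bigl(a_{km}\,\sin(v_m x_2) + b_{km}\,\cos(v_m x_2)\bigr).

    % Substituting and applying, e.g., the product-to-sum identity
    \cos(w_k x_1)\cdot\cos(v_m x_2)
      = \tfrac{1}{2}\bigl(\cos(w_k x_1 + v_m x_2) + \cos(w_k x_1 - v_m x_2)\bigr),

    % every term becomes sin or cos applied to a linear combination of the inputs:
    f(x_1,x_2) \approx \sum_{k} W_k \, f_k\Bigl(\sum_i w_{ki} x_i - w_{k0}\Bigr),
    \qquad f_k \in \{\sin,\cos\}.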

9. Which Activation Functions f_k(z) Should We Choose

• A general 3-layer NN has the form:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}) − W_0.
• Biological neurons use f(z) = 1/(1 + exp(−z)), but shall we simulate it?
• Simulations are not always efficient.
• E.g., airplanes have wings like birds, but they do not flap them.
• Let us analyze this problem theoretically.
• There is always some noise c in the communication channel.
• So, we can consider either the original signals x_i or the denoised ones x_i − c.
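To make the general form concrete, here is a minimal Python sketch (not part of the original slides) of the forward pass y = ∑_k W_k · f_k(∑_i w_ki · x_i − w_k0) − W_0, with the same sigmoid used for every hidden unit; all weights below are made-up random values.

    import numpy as np

    def three_layer_nn(x, w, w0, W, W0, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
        # Hidden layer: f_k(sum_i w_ki * x_i - w_k0), here the same f for all k.
        hidden = f(w @ x - w0)
        # Linear output layer: y = sum_k W_k * hidden_k - W_0.
        return W @ hidden - W0

    # Made-up example: n = 3 inputs, K = 4 hidden units.
    rng = np.random.default_rng(0)
    x = np.array([0.5, -1.0, 2.0])
    w, w0 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W, W0 = rng.normal(size=4), 0.1
    print(three_layer_nn(x, w, w0, W, W0))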
