SLIDE 1

One More Advantage of Deep Learning: While in General, A Perfect Training of a Neural Network Is NP-Hard, It Is Feasible for Bounded-Width Deep Networks

Vladik Kreinovich

Department of Computer Science, University of Texas at El Paso, El Paso, TX 79968, USA, vladik@utep.edu, http://www.cs.utep.edu/vladik (based on joint work with Chitta Baral)

SLIDE 2

1. Why Traditional Neural Networks: (Sanitized) History

  • How do we make computers think?
  • To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds.
  • To make computers think, it is reasonable to analyze how we humans think.
  • On the biological level, our brain processes information via special cells called neurons.
  • Somewhat surprisingly, in the brain, signals are electric – just as in the computer.
  • The main difference is that in a neural network, signals are sequences of identical pulses.

SLIDE 3

2. Why Traditional NN: (Sanitized) History

  • The intensity of a signal is described by the frequency of pulses.
  • A neuron has many inputs (up to 10^4).
  • All the inputs x1, . . . , xn are combined, with some loss, into a frequency ∑_{i=1}^n wi · xi.
  • Low inputs do not activate the neuron at all; high inputs lead to the largest activation.
  • The output signal is a non-linear function y = f(∑_{i=1}^n wi · xi − w0).
  • In biological neurons, f(x) = 1/(1 + exp(−x)).
  • Traditional neural networks emulate such biological neurons (a small sketch of such a neuron follows this slide).
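
To make the above formulas concrete, here is a minimal sketch of one such neuron (Python; the inputs, weights, and threshold are made-up illustrative values, not from the slides):

```python
import math

def sigmoid(z):
    # biological-style activation f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, w0):
    # y = f(sum_i w_i * x_i - w_0)
    z = sum(wi * xi for wi, xi in zip(w, x)) - w0
    return sigmoid(z)

# illustrative values only
print(neuron_output(x=[0.5, 1.2, -0.3], w=[0.8, -0.4, 1.1], w0=0.2))
```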

SLIDE 4

3. Why Traditional Neural Networks: Real History

  • At first, researchers ignored non-linearity and only used linear neurons.
  • They got good results and made many promises.
  • The euphoria ended in the 1960s when MIT’s Marvin Minsky and Seymour Papert published a book.
  • Their main result was that a composition of linear functions is linear (I am not kidding).
  • This ended the hopes for the original schemes.
  • For some time, “neural networks” became a bad word.
  • Then, smart researchers came up with a genius idea: let’s make neurons non-linear.
  • This revived the field.
SLIDE 5

4. Traditional Neural Networks: Main Motivation

  • One of the main motivations for neural networks was that computers were slow.
  • Although human neurons are much slower than a CPU, human processing was often faster.
  • So, the main motivation was to make data processing faster.
  • The idea was that:
    – since we are the result of billions of years of ever-improving evolution,
    – our biological mechanisms should be optimal (or close to optimal).

SLIDE 6

5. How the Need for Fast Computation Leads to Traditional Neural Networks

  • To make processing faster, we need to have many fast processing units working in parallel.
  • The fewer the layers, the smaller the overall processing time.
  • In nature, there are many fast linear processes – e.g., combining electric signals.
  • As a result, linear processing (L) is faster than non-linear (NL) processing.
  • For non-linear processing, the more inputs, the longer it takes.
  • So, the fastest non-linear processing (NL) units process just one input.
  • It turns out that two layers are not enough to approximate an arbitrary function.

SLIDE 7

6. Why One or Two Layers Are Not Enough

  • With 1 linear (L) layer, we only get linear functions.
  • With one nonlinear (NL) layer, we only get functions of one variable.
  • With L→NL layers, we get g(∑_{i=1}^n wi · xi − w0).
  • For these functions, the level sets f(x1, . . . , xn) = const are planes ∑_{i=1}^n wi · xi = c.
  • Thus, they cannot approximate, e.g., f(x1, x2) = x1 · x2, for which the level set is a hyperbola.
  • For NL→L layers, we get f(x1, . . . , xn) = ∑_{i=1}^n fi(xi).
  • For all these functions, d (def)= ∂²f/(∂x1 ∂x2) = 0, so we also cannot approximate f(x1, x2) = x1 · x2, for which d = 1 ≠ 0 (a quick numerical check follows this slide).
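
As a sanity check of this last argument, the following sketch estimates d by finite differences (Python, my own illustration; the functions f1, f2 and the evaluation point are arbitrary choices):

```python
import math

def mixed_partial(f, x1, x2, h=1e-4):
    # central finite-difference estimate of d = d^2 f / (dx1 dx2)
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h * h)

sum_of_univariate = lambda x1, x2: math.sin(x1) + math.exp(x2)  # an NL->L function f1(x1) + f2(x2)
product = lambda x1, x2: x1 * x2

print(mixed_partial(sum_of_univariate, 0.7, -0.3))  # approximately 0
print(mixed_partial(product, 0.7, -0.3))            # approximately 1
```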

SLIDE 8

7. Why Three Layers Are Sufficient: Newton’s Prism and Fourier Transform

  • In principle, we can have two 3-layer configurations: L→NL→L and NL→L→NL.
  • Since L is faster than NL, the fastest is L→NL→L:
    y = ∑_{k=1}^K Wk · fk(∑_{i=1}^n wki · xi − wk0) − W0
    (a small sketch of this configuration follows this slide).
  • Newton showed that a prism decomposes white light (or any light) into elementary colors.
  • In precise terms, elementary colors are sinusoids A · sin(w · t) + B · cos(w · t).
  • Thus, every function can be approximated, with any accuracy, as a linear combination of sinusoids:
    f(x1) ≈ ∑_k (Ak · sin(wk · x1) + Bk · cos(wk · x1)).
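
Here is a minimal sketch of the L→NL→L configuration (Python; the number of hidden units K = 2, the sinusoidal activations, and all weights are made-up illustrative values):

```python
import math

def three_layer(x, W, W0, w, w0, activations):
    # y = sum_k W_k * f_k( sum_i w_ki * x_i - w_k0 ) - W_0
    hidden = [f(sum(wki * xi for wki, xi in zip(wk, x)) - wk0)
              for f, wk, wk0 in zip(activations, w, w0)]   # NL layer: one input each
    return sum(Wk * hk for Wk, hk in zip(W, hidden)) - W0  # final L layer

# K = 2 units with sinusoidal activations, as in the Fourier argument above
y = three_layer(x=[0.4, 1.5],
                W=[1.0, -0.5], W0=0.1,
                w=[[2.0, 0.3], [-1.0, 0.7]], w0=[0.2, -0.4],
                activations=[math.sin, math.cos])
print(y)
```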

SLIDE 9

8. Why Three Layers Are Sufficient (cont-d)

  • Newton’s prism result: f(x1) ≈ ∑_k (Ak · sin(wk · x1) + Bk · cos(wk · x1)).
  • This result was theoretically proven later by Fourier.
  • For f(x1, x2), we get a similar expression for each x2, with coefficients Ak(x2) and Bk(x2).
  • We can similarly represent Ak(x2) and Bk(x2), thus getting products of sines and cosines, and it is known that, e.g., cos(a) · cos(b) = (1/2) · (cos(a + b) + cos(a − b)).
  • Thus, we get an approximation of the desired form, with fk = sin or fk = cos:
    y = ∑_{k=1}^K Wk · fk(∑_{i=1}^n wki · xi − wk0).
SLIDE 10

9. Which Activation Functions fk(z) Should We Choose

  • A general 3-layer NN has the form
    y = ∑_{k=1}^K Wk · fk(∑_{i=1}^n wki · xi − wk0) − W0.
  • Biological neurons use f(z) = 1/(1 + exp(−z)), but shall we simulate it?
  • Simulations are not always efficient.
  • E.g., airplanes have wings like birds, but they do not flap them.
  • Let us analyze this problem theoretically.
  • There is always some noise c in the communication channel.
  • So, we can consider either the original signals xi or the denoised ones xi − c.

SLIDE 11

10. Which fk(z) Should We Choose (cont-d)

  • The results should not change if we perform a full or partial denoising z → z′ = z − c.
  • Denoising means replacing y = f(z) with y′ = f(z − c).
  • So, f(z) should not change under the shift z → z − c.
  • Of course, f(z) cannot remain literally the same: if f(z) = f(z − c) for all c, then f(z) = const.
  • The idea is that once we shift z, we should get the same formula after applying a natural y-re-scaling Tc: f(z − c) = Tc(f(z)).
  • Linear re-scalings are natural: they correspond to changing units and starting points (like C to F).

SLIDE 12

11. Which Transformations Are Natural?

  • The inverse Tc^(−1) of a natural re-scaling Tc should also be natural.
  • A composition y → Tc(Tc′(y)) of two natural re-scalings Tc and Tc′ should also be natural.
  • In mathematical terms, natural re-scalings form a group.
  • For practical purposes, we should only consider re-scalings determined by finitely many parameters.
  • So, we look for a finite-parametric group containing all linear transformations.

SLIDE 13

12. A Somewhat Unexpected Approach

  • N. Wiener, in Cybernetics, notes that when we approach an object, we go through distinct phases:
    – first, we see a blob (the image is invariant under all transformations);
    – then, we start distinguishing angles from smooth curves, but not sizes (projective transformations);
    – after that, we detect parallel lines (affine transformations);
    – then, we detect relative sizes (similarities);
    – finally, we see the exact shapes and sizes.
  • Are there other transformation groups?
  • Wiener argued: if there were other groups, after billions of years of evolution, we would use them.
  • So he conjectured that there are no other groups.
SLIDE 14

13. Wiener Was Right

  • Wiener’s conjecture was indeed proven in the 1960s.
  • In the 1-D case, this means that all our transformations are fractionally linear:
    f(z − c) = (A(c) · f(z) + B(c)) / (C(c) · f(z) + D(c)).
  • For c = 0, we get A(0) = D(0) = 1, B(0) = C(0) = 0.
  • Differentiating the above equation with respect to c and taking c = 0, we get a differential equation for f(z):
    −df/dz = (A′(0) · f(z) + B′(0)) − f(z) · (C′(0) · f(z) + D′(0)).
  • So, df / (−C′(0) · f² + (A′(0) − D′(0)) · f + B′(0)) = −dz.
  • Integrating, we indeed get f(z) = 1/(1 + exp(−z)) (after an appropriate linear re-scaling of z and f(z)); a worked special case follows this slide.
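
As an illustration of the last step, here is the integration worked out for one representative choice of constants (my own example, not from the slides: C′(0) = −1, A′(0) − D′(0) = −1, B′(0) = 0, which turns the equation into the logistic equation df/dz = f · (1 − f)):

```latex
% A worked special case (illustrative constants chosen above):
\[
  \frac{df}{dz} = f\,(1-f)
  \;\Rightarrow\;
  \int\!\Big(\frac{1}{f}+\frac{1}{1-f}\Big)\,df = \int dz
  \;\Rightarrow\;
  \ln\frac{f}{1-f} = z + \mathrm{const}
  \;\Rightarrow\;
  f(z) = \frac{1}{1+\exp(-(z+\mathrm{const}))},
\]
% i.e., the sigmoid, up to the shift absorbed by the slide's "appropriate linear re-scaling".
```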

SLIDE 15

14. How to Train Traditional Neural Networks: Main Idea

  • Reminder: a 3-layer neural network has the form
    y = ∑_{k=1}^K Wk · f(∑_{i=1}^n wki · xi − wk0) − W0.
  • We need to find the weights that best describe the observations (x1^(p), . . . , xn^(p), y^(p)), 1 ≤ p ≤ P.
  • We find the weights that minimize the mean square approximation error
    E (def)= ∑_{p=1}^P (y^(p) − yNN^(p))², where yNN^(p) = ∑_{k=1}^K Wk · f(∑_{i=1}^n wki · xi^(p) − wk0) − W0.
  • The simplest minimization algorithm is gradient descent: wi → wi − λ · ∂E/∂wi (a small training sketch follows this slide).
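
A minimal sketch of this training procedure (Python; the tiny K = 2, n = 1 network, the data, the learning rate, and the finite-difference gradient are all illustrative simplifications; the next slide's backpropagation computes the same derivatives much faster):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nn_output(x, w):
    # flat parameter vector w = [W1, w11, w10, W2, w21, w20, W0] for K = 2 units, n = 1 input
    W1, w11, w10, W2, w21, w20, W0 = w
    return W1 * sigmoid(w11 * x - w10) + W2 * sigmoid(w21 * x - w20) - W0

data = [(0.0, 0.1), (0.5, 0.6), (1.0, 0.8)]  # made-up patterns (x^(p), y^(p))

def E(w):
    # mean-square approximation error over all patterns
    return sum((y - nn_output(x, w)) ** 2 for x, y in data)

def gradient_step(w, lam=0.1, h=1e-6):
    grad = [(E(w[:i] + [w[i] + h] + w[i+1:]) - E(w)) / h for i in range(len(w))]
    return [wi - lam * gi for wi, gi in zip(w, grad)]  # w_i -> w_i - lambda * dE/dw_i

w = [0.5, 1.0, 0.0, -0.5, 1.0, 0.5, 0.0]
for _ in range(100):
    w = gradient_step(w)
print(E(w))
```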

SLIDE 16

15. Towards Faster Differentiation

  • To achieve high accuracy, we need many neurons.
  • Thus, we need to find many weights.
  • To apply gradient descent, we need to compute all partial derivatives ∂E/∂wi.
  • Differentiating a function f is easy:
    – the expression f is a sequence of elementary steps,
    – so we take into account that (f ± g)′ = f′ ± g′, (f · g)′ = f′ · g + f · g′, (f(g))′ = f′(g) · g′, etc.
  • For a function that takes T steps to compute, computing f′ thus takes c0 · T steps, with c0 ≤ 3.
  • However, for a function of n variables, we need to compute n derivatives.
  • This would take time n · c0 · T ≫ T: this is too long.
SLIDE 17

16. Faster Differentiation: Backpropagation

  • Idea:
    – instead of starting from the variables,
    – start from the last step, and compute ∂E/∂v for all intermediate results v.
  • For example, if the very last step is E = a · b, then ∂E/∂a = b and ∂E/∂b = a.
  • At each step, if we know ∂E/∂v and v = a · b, then ∂E/∂a = (∂E/∂v) · b and ∂E/∂b = (∂E/∂v) · a.
  • At the end, we get all n derivatives ∂E/∂wi in time c0 · T ≪ c0 · T · n.
  • This is known as backpropagation (a tiny sketch of the idea follows this slide).
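
Here is a tiny reverse-mode sketch of this idea (Python, my own illustration; it assumes each intermediate result is used only once, which is enough to show the backward pass):

```python
class Value:
    def __init__(self, val, parents=()):
        # parents: pairs (node, local derivative d(this)/d(node))
        self.val, self.grad, self.parents = val, 0.0, parents

    def __mul__(self, other):
        return Value(self.val * other.val, ((self, other.val), (other, self.val)))

    def __add__(self, other):
        return Value(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def backward(self, upstream=1.0):
        # push dE/d(this) to the inputs, one backward pass for all derivatives
        self.grad += upstream
        for node, local in self.parents:
            node.backward(upstream * local)

# E = (w1 * x + w2) * w3 with made-up values; one backward pass gives all dE/dw_i
w1, w2, w3, x = Value(0.5), Value(-1.0), Value(2.0), Value(3.0)
E = (w1 * x + w2) * w3
E.backward()
print(w1.grad, w2.grad, w3.grad)   # 6.0, 2.0, 0.5
```
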
SLIDE 18

17. Beyond Traditional NN

  • Nowadays, computer speed is no longer a big problem.
  • What is a problem is accuracy: even after thousands of iterations, the NNs do not learn well.
  • So, instead of computation speed, we would like to maximize learning accuracy.
  • We can still consider L and NL elements.
  • For the same number of variables wi, we want to get more accurate approximations.
  • For a given number of variables and a given accuracy, we get N possible combinations.
  • If all combinations correspond to different functions, we can implement N functions.
  • However, if some combinations lead to the same function, we implement fewer different functions.

SLIDE 19

18. From Traditional NN to Deep Learning

  • For a traditional NN with K neurons, each of the K! permutations of the neurons leaves the resulting function unchanged.
  • Thus, instead of N functions, we only implement N/K! ≪ N functions.
  • Thus, to increase accuracy, we need to minimize the number K of neurons in each layer.
  • To get a good accuracy, we need many parameters, thus many neurons.
  • Since each layer is small, we thus need many layers.
  • This is the main idea behind deep learning.
SLIDE 20

19. Computational (Bit) Complexity of NN Learning: Formulation of the Problem

  • In general, a NN consists of several layers, each of which has several neurons.
  • We feed the inputs to the neurons of the 1st (input) layer.
  • A neuron i from each layer generates a signal which is sent to one or more neurons in the next layer.
  • The signal generated by a neuron i depends:
    – on the signals xi1, . . . , xik sent to it by neurons of the previous layer,
    – on parameters wi describing this neuron, and
    – on parameters wij,i describing the connections:
    xi = fi(xi1, . . . , xik, wi, wi1,i, . . . , wik,i).

SLIDE 21

20. Formulation of the Problem (cont-d)

  • Training means finding the values wi and wij for which:
    – for all given inputs (x1^(k), . . . , xn^(k)),
    – the signal of the output layer is sufficiently close to the desired value(s) y^(k).
  • Let S be the number of bits sufficient to represent each of the values xi, wi, or wij.
SLIDE 22

21. What Is Feasible, What Is A Problem, and What Is NP-Hard: A Brief Reminder

  • Some algorithms are feasible, some are not.
  • There is no perfect definition of feasibility.
  • The best definition we have is: an algorithm A is feasible if there exists a polynomial P(n) for which ∀x (t_A(x) ≤ P(len(x))).
  • Some problems can be solved by a feasible algorithm.
  • In practice, for most problems:
    – once we have a candidate for a solution,
    – we can feasibly check whether this candidate is indeed a solution.
  • For example, in math, once a detailed proof is given, we can check it – but finding the proof is difficult.
  • In physics, once a formula is given, we can check whether it fits the data.

SLIDE 23

22. What Is Feasible etc. (cont-d)

  • In engineering, we can check whether a given design satisfies the specs.
  • Such problems are called Non-deterministic Polynomial (NP):
    – once we have guessed a solution (non-deterministic means guessing is needed),
    – we can feasibly confirm that it is indeed a solution.
  • It may be that P = NP, in which case all these problems can be feasibly solved.
  • Most computer scientists believe that P ≠ NP, but this is still an open problem.

SLIDE 24

23. What Is NP-Hard

  • What is known is that some NP problems are harder than others – in the sense that:
    – every problem from the class NP
    – can be reduced to this particular problem.
  • These problems are known as NP-hard.
  • Historically, the first example was propositional satisfiability (SAT):
    – given a propositional formula (v1 ∨ ¬v2) & (v1 ∨ v2 ∨ ¬v3) & . . . ,
    – check whether it is true for some values of the vi.

SLIDE 25

24. Perfect Training of a Neural Network Is NP-Hard: A Straightforward Result

  • To prove NP-hardness, let us reduce SAT to this problem.
  • For each SAT formula with n variables, we design a 3-layer network with 1 pattern, no inputs, and y^(1) = 1.
  • Each of the n neurons of the first layer has a 1-bit parameter wi and generates the signal vi = wi.
  • Neurons from the 2nd layer compute the truth values of the clauses – like v1 ∨ ¬v2.
  • A neuron from the 3rd layer applies & to all the results.
  • Training here means finding values vi for which the original formula holds (a sketch of this reduction follows this slide).
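
An illustrative sketch of this reduction (Python; the clause encoding and the brute-force search over weight settings are my own simplifications, used only to show that perfect training of this network on the single pattern y = 1 is exactly SAT):

```python
from itertools import product

# (v1 or not v2) and (v1 or v2 or not v3): a literal is (variable index, negated?)
clauses = [[(0, False), (1, True)], [(0, False), (1, False), (2, True)]]

def network_output(w, clauses):
    # 1st layer: v_i = w_i;  2nd layer: one OR-neuron per clause;  3rd layer: AND of all clauses
    clause_values = [any(w[i] != neg for i, neg in clause) for clause in clauses]
    return int(all(clause_values))

# "perfect training" of this network = finding 1-bit weights with output 1 = solving SAT
n = 3
solutions = [w for w in product([0, 1], repeat=n) if network_output(w, clauses) == 1]
print(solutions[:3])  # e.g. (1, 0, 0) satisfies both clauses
```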

SLIDE 26

25. Perfect Training Is Feasible for Bounded-Width Deep Networks: A New Result

  • Let us assume that each layer has ≤ B neurons.
  • We want to describe the processing of all P patterns in each layer.
  • For each layer, to describe its weights and outputs, we need:
    – ≤ B neurons’ parameters,
    – ≤ B² connection parameters, and
    – ≤ B outputs per pattern.
  • Overall, per layer, we need ≤ c (def)= (B² + B + B · P) · S bits.
  • The signal from each layer is uniquely determined by the signals from the previous layer.
  • Let us list all the bits layer-by-layer.
SLIDE 27

26. New Result (cont-d)

  • We need to find bits that satisfy several conditions, each of which connects only bits bi and bj with |i − j| ≤ 2c.
  • For such localized formulas, there is a feasible algorithm for finding bits satisfying all the conditions.
  • In this algorithm, at each step i = 0, 1, . . ., we compute:
    – the list Li of all the tuples (bi, . . . , bi+2c)
    – that satisfy all the conditions that involve only bits bj with j ≤ i + 2c.
  • For i = 0, we simply check all 2^(2c) (= const) tuples.
  • To get from i to i + 1, for each tuple from Li, we consider the two possible values of a new bit bi+1+2c.
  • Due to localization, possible new conditions that involve this bit only involve bits bj with j ≥ i + 1.
  • So, we can check all these conditions.
SLIDE 28

27. New Result (cont-d)

  • For each checked bit, we add the resulting tuple (bi+1, . . . , bi+1+2c) to the list Li+1.
  • Each step requires a constant amount of time.
  • At the end, in time linear in the number of layers, we check whether perfect training is possible.
  • And we can always go back, bit by bit, and find the corresponding parameters (a small sketch of this scan follows this slide).
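
Here is a minimal sketch of this sliding-window scan (Python; the encoding of the "localized" conditions as predicates over windows of W = 2c + 1 consecutive bits is a hypothetical simplification of the actual training conditions):

```python
from itertools import product

def solve_local(n_bits, W, conditions):
    """Find bits b[0..n_bits-1] satisfying all local conditions, or return None.
    conditions: list of (s, pred), where pred sees the W bits b[s], ..., b[s+W-1]."""
    by_start = {}
    for s, pred in conditions:
        by_start.setdefault(s, []).append(pred)

    def fits(s, window_bits):
        return all(pred(window_bits) for pred in by_start.get(s, []))

    # L[i] maps each still-feasible window (b[i], ..., b[i+W-1]) to one parent window in L[i-1]
    L = [{t: None for t in product((0, 1), repeat=W) if fits(0, t)}]
    for i in range(1, n_bits - W + 1):
        L.append({})
        for t in L[i - 1]:
            for b in (0, 1):               # two possible values of the new bit
                t_new = t[1:] + (b,)
                if fits(i, t_new) and t_new not in L[i]:
                    L[i][t_new] = t
    if not L[-1]:
        return None                        # perfect training impossible
    # go back window-by-window and reconstruct one solution
    t = next(iter(L[-1]))
    bits = list(t)
    for i in range(len(L) - 1, 0, -1):
        t = L[i][t]
        bits.insert(0, t[0])
    return bits

# toy example with hypothetical conditions: W = 3, "the first window sums to 2",
# and "no three equal bits in a row"
conds = [(0, lambda w: sum(w) == 2)] + \
        [(s, lambda w: not (w[0] == w[1] == w[2])) for s in range(0, 6)]
print(solve_local(n_bits=8, W=3, conditions=conds))
```

Each step examines at most a constant number of tuples (at most 2^W, a constant once the width B, and hence c, is bounded), so the total time is linear in the number of bits, i.e., linear in the number of layers.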

SLIDE 29

28. Acknowledgments

This work was supported in part:

  • by Arizona State University, and
  • by the US National Science Foundation grant HRD-1242122.