Lecture 27: Neural Networks and Deep Learning — Mark Hasegawa-Johnson



SLIDE 1

Lecture 27: Neural Networks and Deep Learning

Mark Hasegawa-Johnson April 6, 2020 License: CC-BY 4.0. You may remix or redistribute if you cite the source.

SLIDE 2

Outline

  • Why use more than one layer?
  • Biological inspiration
  • Representational power: the XOR function
  • Two-layer neural networks
  • The Fundamental Theorem of Calculus
  • Feature learning for linear classifiers
  • Deep networks
  • Biological inspiration: features computed from features
  • Flexibility: convolutional, recurrent, and gated architectures
SLIDE 3

Biological Inspiration: McCulloch-Pitts Artificial Neuron, 1943

[Figure: a neuron with inputs x1, x2, …, xD, weights w1, w2, …, wD, and output u(wΒ·x), where u is the unit step function.]

  • In 1943, McCulloch & Pitts proposed that biological neurons have a nonlinear activation function (a step function) whose input is a weighted linear combination of the currents generated by other neurons.
  • They showed many examples of mathematical and logical functions that could be computed using networks of simple neurons like this.
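As a sketch, the McCulloch-Pitts model is just a dot product followed by a step function; the function names below are mine, not from the slides:

```python
def u(a):
    """Unit step activation: 1 if a > 0, else 0."""
    return 1 if a > 0 else 0

def mcculloch_pitts(w, x):
    """Output u(w . x) for weights w and inputs x."""
    return u(sum(wi * xi for wi, xi in zip(w, x)))
```

A negative weight on a constant input of 1 plays the role of a firing threshold.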

SLIDE 4

Biological Inspiration: Hodgkin & Huxley

Hodgkin & Huxley won the Nobel prize for their model of cell membranes, which provided lots more detail about how the McCulloch-Pitts model works in nature. Their nonlinear model has two step functions:

  • J < threshold1: V = βˆ’75 mV
  • threshold1 < J < threshold2: V has a spike, then returns to rest.
  • threshold2 < J: V spikes periodically

Hodgkin & Huxley Circuit Model of a Neuron Membrane

By Krishnavedala - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=21725464

Membrane voltage versus time. As current passes 0mA, spike appears. As current passes 10mA, spike train appears.

By Alexander J. White - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30310965

SLIDE 5

Biological Inspiration: Neuronal Circuits

  • Even the simplest actions involve more than one neuron, acting in sequence in a neuronal circuit.
  • One of the simplest neuronal circuits is a reflex arc, which may contain just two neurons:
  • The sensor neuron detects a stimulus, and communicates an electrical signal to …
  • The motor neuron, which activates the muscle.

Illustration of a reflex arc: sensor neuron sends a voltage spike to the spinal column, where the resulting current causes a spike in a motor neuron, whose spike activates the muscle.

By MartaAguayo - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=39181552

SLIDE 6

Biological Inspiration: Neuronal Circuits

  • A circuit composed of many neurons can compute the autocorrelation function of an input sound, and from the autocorrelation, can estimate the pitch frequency.
  • The circuit depends on output neurons, C, that each compute a step function in response to the sum of two different input neurons, A and B.

J.C.R. Licklider, β€œA Duplex Theory of Pitch Perception,” Experientia VII(4):128-134, 1951

SLIDE 7

Perceptron

  • Rosenblatt was granted a patent for the β€œperceptron,” an electrical circuit model of a neuron.
  • The perceptron is basically a network of McCulloch-Pitts neurons.
  • Rosenblatt’s key innovation was the perceptron learning algorithm.

SLIDE 8

A McCulloch-Pitts Neuron can compute some logical functions…

When the features are binary (y_j ∈ {0,1}), many (but not all!) binary functions can be re-written as linear functions. For example, the OR function f* = (y1 ∨ y2) can be re-written as f* = 1 if:

y1 + y2 βˆ’ 0.5 > 0

Similarly, the AND function f* = (y1 ∧ y2) can be re-written as f* = 1 if:

y1 + y2 βˆ’ 1.5 > 0

[Figures: the OR and AND decision boundaries plotted in the (y1, y2) plane.]
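These threshold rules can be checked directly in code (a minimal sketch; function names are mine):

```python
def u(a):
    """Unit step: 1 if a > 0, else 0."""
    return 1 if a > 0 else 0

def f_or(y1, y2):
    # f* = 1 if y1 + y2 - 0.5 > 0
    return u(y1 + y2 - 0.5)

def f_and(y1, y2):
    # f* = 1 if y1 + y2 - 1.5 > 0
    return u(y1 + y2 - 1.5)
```

Running all four binary input pairs through each function reproduces the OR and AND truth tables.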

SLIDE 9

… but not all.

  • Not all logical functions can be written as linear classifiers!
  • Minsky and Papert wrote a book called Perceptrons in 1969. Although the book said many other things, the only thing most people remembered about the book was that: β€œA linear classifier cannot learn an XOR function.”
  • Because of that statement, most people gave up working on neural networks from about 1969 to about 2006.
  • Minsky and Papert also proved that a two-layer neural net can compute an XOR function. But most people didn’t notice.

SLIDE 10

Outline

  • Why use more than one layer?
  • Biological inspiration
  • Representational power: the XOR function
  • Two-layer neural networks
  • The Fundamental Theorem of Calculus
  • Feature learning for linear classifiers
  • Deep networks
  • Biological inspiration: features computed from features
  • Flexibility: convolutional, recurrent, and gated architectures
SLIDE 11

The Fundamental Theorem of Calculus

The Fundamental Theorem of Calculus (proved by Isaac Newton) says that

g(y) = lim_{Ξ”β†’0} [ B(y + Ξ”) βˆ’ B(y) ] / Ξ”

where B(y) is the integral of g(y).

Illustration of the Fundamental Theorem of Calculus: any smooth function is the derivative of its own integral. The integral can be approximated as the sum of rectangles, with error going to zero as the width goes to zero.

By Kabel - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=11034713

SLIDE 12

The Fundamental Theorem of Calculus

Imagine the following neural network. Each neuron computes

h_l(y) = u(y βˆ’ lΞ”)

where u(x) is the unit step function. Define

x_l = B(lΞ”) βˆ’ B((l βˆ’ 1)Ξ”)

Then, for any smooth function B(x),

B(y) = lim_{Ξ”β†’0} Ξ£_{l=βˆ’βˆž}^{∞} x_l h_l(y)

[Figure: input y and a constant 1 feed a row of step-function hidden nodes; their outputs, weighted by x_1, x_2, x_3, …, are summed to produce the output, labeled A(x).]

By Kabel - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=11034713

SLIDE 13

The Fundamental Theorem of Calculus

Imagine the following neural network. Each neuron computes

h_l(y) = u(y βˆ’ lΞ”)

where u(x) is the unit step function. Define

x_l = g(lΞ”) βˆ’ g((l βˆ’ 1)Ξ”)

Then, for any smooth function g(x),

g(y) = lim_{Ξ”β†’0} Ξ£_{l=βˆ’βˆž}^{∞} x_l h_l(y)

[Figure: the same network, with output labeled f(x).]

By Kabel - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=11034713
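This construction can be tried numerically. A sketch, assuming a Gaussian test function (my choice, since the weights x_l must decay for a finite sum to approximate the infinite one):

```python
import math

def u(a):
    """Unit step: 1 if a > 0, else 0."""
    return 1.0 if a > 0 else 0.0

def step_net_approx(g, y, delta, l_min=-1000, l_max=1000):
    """Approximate g(y) as sum_l x_l * u(y - l*delta), where
    x_l = g(l*delta) - g((l-1)*delta).  The sum telescopes to
    g(L*delta) for the largest L with L*delta < y, so the error
    vanishes as delta -> 0."""
    total = 0.0
    for l in range(l_min, l_max + 1):
        x_l = g(l * delta) - g((l - 1) * delta)
        total += x_l * u(y - l * delta)
    return total

def gaussian(t):
    # Test function (my choice): smooth and decaying, so the weights
    # x_l vanish far from the origin.
    return math.exp(-t * t)
```

With delta = 0.01 the network output agrees with the Gaussian to within about delta times the maximum slope.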

SLIDE 14

The Neural Network Representer Theorem

(Barron, 1993, β€œUniversal Approximation Bounds for Superpositions of a Sigmoidal Function”)

For any vector function g(yβƒ—) that is sufficiently smooth, and whose limit as yβƒ— β†’ ∞ decays sufficiently, there is a two-layer neural network with N sigmoidal hidden nodes h_l(yβƒ—) and second-layer weights x_l such that

g(yβƒ—) = lim_{Nβ†’βˆž} Ξ£_{l=1}^{N} x_l h_l(yβƒ—)

[Figure: input yβƒ— and a constant 1 feed N sigmoidal hidden nodes, whose outputs are weighted by x_1, …, x_N and summed to produce g(yβƒ—).]

By Kabel - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=11034713

SLIDE 15

Outline

  • Why use more than one layer?
  • Biological inspiration
  • Representational power: the XOR function
  • Two-layer neural networks
  • The Fundamental Theorem of Calculus
  • Feature learning for linear classifiers
  • Deep networks
  • Biological inspiration: features computed from features
  • Flexibility: convolutional, recurrent, and gated architectures
SLIDE 16

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats?

Idea #3:
  • y1 = tameness (# times the animal comes when called, out of 40).
  • y2 = weight of the animal, in pounds.
  • If 0.5 y1 + 0.5 y2 > 20, call it a dog. Otherwise, call it a cat.

This is called a β€œlinear classifier” because 0.5 y1 + 0.5 y2 = 20 is the equation for a line.

SLIDE 17

The feature selection problem

  • The biggest problem people had with linear classifiers, until back-propagation came along, was: which features should I observe?
  • (TAMENESS? Really? What is that, and how do you measure it?)
  • Example: linear discriminant analysis was invented by Ronald Fisher (1936) using 4 measurements of irises:
  • Sepal width & length
  • Petal width & length
  • How did he come up with those measurements? Why are they good measurements?

By Nicoguaro - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=46257808
Extracted from Mature_flower_diagram.svg by Mariana Ruiz LadyofHats - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2273307

SLIDE 18

Feature Learning: A way to think about neural nets

The solution to the β€œfeature selection” problem turns out to be, in many cases, totally easy: if you don’t know the features, then learn them!

Define a two-layer neural network. The first-layer weights are x_kj^(1). The first layer computes

h_k(yβƒ—) = Οƒ( Ξ£_{j=1}^{D+1} x_kj^(1) y_j )

The second-layer weights are x_k^(2). The second layer computes

g(yβƒ—) = Ξ£_{k=1}^{N} x_k^(2) h_k(yβƒ—)

[Figure: inputs y_1, …, y_D and a constant 1 feed hidden nodes h_1, …, h_N through first-layer weights x_kj^(1); the hidden nodes feed the output g(yβƒ—) through second-layer weights x_k^(2).]
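The two equations above amount to a short forward pass. This sketch uses a logistic sigmoid for the activation and appends a constant 1 to the input for the bias term (naming is mine, not from the slides):

```python
import math

def sigma(a):
    """Sigmoidal activation (logistic)."""
    return 1.0 / (1.0 + math.exp(-a))

def two_layer_net(y, W1, w2):
    """Forward pass of a two-layer network.
    y  : list of D inputs (a constant 1 is appended for the bias).
    W1 : N rows of (D+1) first-layer weights x_kj^(1).
    w2 : N second-layer weights x_k^(2).
    """
    y_aug = list(y) + [1.0]                       # bias input
    h = [sigma(sum(wkj * yj for wkj, yj in zip(row, y_aug)))
         for row in W1]                           # first layer: features
    return sum(wk * hk for wk, hk in zip(w2, h))  # second layer: linear
```

With all weights zero, every hidden node outputs sigma(0) = 0.5, so the output is just the sum of the second-layer weights times 0.5.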

SLIDE 19

Feature Learning: A way to think about neural nets

For example, consider the XOR problem. Suppose we create two hidden nodes:

h_1(yβƒ—) = u(0.5 βˆ’ y1 βˆ’ y2)
h_2(yβƒ—) = u(y1 + y2 βˆ’ 1.5)

Then the XOR function f* = (y1 βŠ• y2) is given by

f* = 1 βˆ’ h_1(yβƒ—) βˆ’ h_2(yβƒ—)

[Figure: in the (y1, y2) plane, h_1 = 1 below the line y1 + y2 = 0.5, h_2 = 1 above the line y1 + y2 = 1.5; here in the middle, both h_1 and h_2 are zero, and XOR = 1.]
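A quick check of this construction (the sign convention below makes the output 1 exactly when neither hidden node fires, matching the middle region of the figure):

```python
def u(a):
    """Unit step: 1 if a > 0, else 0."""
    return 1 if a > 0 else 0

def xor_net(y1, y2):
    h1 = u(0.5 - y1 - y2)   # fires when both inputs are 0
    h2 = u(y1 + y2 - 1.5)   # fires when both inputs are 1
    return 1 - h1 - h2      # 1 only in the middle region
```

Evaluating all four input pairs reproduces the XOR truth table, which no single linear threshold can do.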

SLIDE 20

Feature Learning: A way to think about neural nets

In general, this is one of the most useful ways to think about neural nets:

  • The first layer learns a set of features.
  • The second layer learns a linear classifier, using those features as its input.

SLIDE 21

Outline

  • Why use more than one layer?
  • Biological inspiration
  • Representational power: the XOR function
  • Two-layer neural networks
  • The Fundamental Theorem of Calculus
  • Feature learning for linear classifiers
  • Deep networks
  • Biological inspiration: features computed from features
  • Flexibility: convolutional, recurrent, and gated architectures
SLIDE 22

Biological Inspiration: Simple, Complex, and Hypercomplex Cells in the Visual Cortex

  • D. Hubel and T. Wiesel (1959, 1962, Nobel Prize 1981) found that the human visual cortex consists of a hierarchy of simple, complex, and hypercomplex cells.
  • Simple cells (in visual area 1, called V1) fire when you see a simple pattern of colors in a particular orientation (figure (b), at right).

By Chavez01 at English Wikipedia - Transferred from en.wikipedia to Commons using CommonsHelper, Public Domain, https://commons.wikimedia.org/w/index.php?curid=4431766

Gabor filter-type receptive field typical for a simple cell. Blue regions indicate inhibition, red facilitation.

By English Wikipedia user Joe pharos, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=7437457 Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

SLIDE 23
Biological Inspiration: Simple, Complex, and Hypercomplex Cells in the Visual Cortex

  • D. Hubel and T. Wiesel (1959, 1962, Nobel Prize 1981) found that the human visual cortex consists of a hierarchy of simple, complex, and hypercomplex cells.
  • Complex cells are sensitive to moving stimuli of a particular orientation traveling in a particular direction (figure (d) at right).
  • Complex cells can be modeled as linear combinations of simple cells!

View of the brain from behind. Brodmann area 17 = red; 18 = orange; 19 = yellow. By Washington irving at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1643737

SLIDE 24

Biological Inspiration: Simple, Complex, and Hypercomplex Cells in the Visual Cortex

  • D. Hubel and T. Wiesel (1959, 1962, Nobel Prize 1981) found that the human visual cortex consists of a hierarchy of simple, complex, and hypercomplex cells.
  • Hypercomplex cells are sensitive to moving stimuli of a particular orientation traveling in a particular direction, and they also stop firing if the stimulus gets too long.
  • Hypercomplex cells can be modeled as linear combinations of complex cells!

View of the brain from behind. Brodmann area 17 = red; 18 = orange; 19 = yellow. By Washington irving at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1643737

SLIDE 25

Biological Inspiration: Simple, Complex, and Hypercomplex Cells in the Visual Cortex

Hubel & Wiesel’s simple, complex, and hypercomplex cells have been modeled as a hierarchy, in which each type of cell computes a linear combination of the type below it, followed by a nonlinear activation function.

[Figure: hierarchy diagram — Simple Cells β†’ Complex Cells β†’ Hypercomplex Cells.]

SLIDE 26

Outline

  • Why use more than one layer?
  • Biological inspiration
  • Representational power: the XOR function
  • Two-layer neural networks
  • The Fundamental Theorem of Calculus
  • Feature learning for linear classifiers
  • Deep networks
  • Biological inspiration: features computed from features
  • Flexibility: convolutional, recurrent, and gated architectures
SLIDE 27

Flexibility: many types of deep networks

The other reason to use deep neural networks is that, with a deep enough network, many types of learning algorithms are possible, far beyond simple classifiers.

  • Convolutional neural networks: output depends on the shape of the input, regardless of where it occurs in the image.
  • Recurrent neural networks: output depends on past values of the output.
  • Gated neural networks: one set of cells is capable of turning another set of cells on or off.

SLIDE 28

Convolutional Neural Network

In a convolutional neural network, the multiplicative first layer is replaced by a convolutional first layer:

h_l(yβƒ—) = Οƒ( Ξ£_{m=βˆ’D}^{D} x_m y_{lβˆ’m} )

By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45679374
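A sketch of this layer for a 1-D input, with zero padding outside the signal (the padding choice is my assumption, not from the slide). Shifting the input shifts the hidden activations by the same amount:

```python
import math

def sigma(a):
    """Sigmoidal activation (logistic)."""
    return 1.0 / (1.0 + math.exp(-a))

def conv_layer(y, x, D):
    """h_l = sigma( sum_{m=-D..D} x[m] * y[l-m] ), with zero padding
    outside the input.  x is indexed m = -D..D via x[m + D]."""
    N = len(y)
    h = []
    for l in range(N):
        s = 0.0
        for m in range(-D, D + 1):
            if 0 <= l - m < N:       # zero outside the signal
                s += x[m + D] * y[l - m]
        h.append(sigma(s))
    return h
```

Because the same weights x are reused at every position l, a pattern in the input produces the same response wherever it occurs (away from the edges).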

SLIDE 29

Recurrent Neural Network

In a recurrent neural network, the hidden nodes at time t depend on the hidden nodes at time tβˆ’1:

h_{l,t} = Οƒ( Ξ£_{m=1}^{D} v_{lm} y_{m,t} + Ξ£_{j=1}^{N} w_{lj} h_{j,tβˆ’1} )

By fdeloche - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=60109157
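One step of this recurrence, as a sketch (the matrix layout and the zero initial state are my assumptions):

```python
import math

def sigma(a):
    """Sigmoidal activation (logistic)."""
    return 1.0 / (1.0 + math.exp(-a))

def rnn_step(y_t, h_prev, V, W):
    """h_{l,t} = sigma( sum_m v_lm y_{m,t} + sum_j w_lj h_{j,t-1} ).
    V : N x D input weights; W : N x N recurrent weights."""
    return [
        sigma(sum(v * ym for v, ym in zip(V[l], y_t)) +
              sum(w * hj for w, hj in zip(W[l], h_prev)))
        for l in range(len(V))
    ]

def run_rnn(ys, V, W):
    """Run the recurrence over a sequence; hidden state starts at zero."""
    h = [0.0] * len(V)
    for y_t in ys:
        h = rnn_step(y_t, h, V, W)
    return h
```

The W term is what gives the network internal memory: the state at time t summarizes the whole input history.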

SLIDE 30

Gated Neural Network

In a gated neural network (like the β€œlong short-term memory” network shown here), the outputs of some hidden nodes are called β€œgates”:

h = Οƒ(v_1 y + c_1) ∈ [0,1]

The gates are then multiplied by the outputs of other hidden nodes, effectively turning them on or off:

d = v_2 y + c_2
g(y) = h Γ— d

[Figure: input y and a constant 1 feed a gate node and a linear node; their outputs are multiplied to produce g(y).]
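A minimal sketch of one gated unit as defined above (scalar weights; names are mine):

```python
import math

def sigma(a):
    """Sigmoidal activation (logistic), so the gate lies in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-a))

def gated_unit(y, v1, c1, v2, c2):
    """g(y) = h * d, where the gate h = sigma(v1*y + c1)
    multiplies the linear hidden node d = v2*y + c2."""
    h = sigma(v1 * y + c1)   # gate, in [0, 1]
    d = v2 * y + c2          # gated hidden node
    return h * d
```

With a large positive bias the gate saturates near 1 and passes d through; with a large negative bias it saturates near 0 and shuts d off.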

SLIDE 31

What are these architectures for?

  • Convolutional neural networks: output depends on the shape of the input, regardless of where it occurs in the image.
  • Recurrent neural networks: output depends on past values of the output.
  • Gated neural networks: one set of cells is capable of turning another set of cells on or off.

SLIDE 32

Conclusions

  • Why use more than one layer?
  • Biological inspiration: the simplest neuronal network in the human body, the reflex arc, still uses at least two neurons.
  • Representational power: the XOR function can’t be computed with a one-layer network (a perceptron), but it can be computed with two layers.
  • Two-layer neural networks
  • The Fundamental Theorem of Calculus means that a two-layer network can approximate any function f(x) arbitrarily well, as the number of hidden nodes goes to infinity.
  • A useful way to think about neural nets: the last layer is a linear classifier; all of the other layers compute features for the last layer to use.
  • Deep networks
  • Biological inspiration: human vision (and hearing) compute complex and hypercomplex features from simpler features.
  • Flexibility: convolutional = independent of where it occurs in space, recurrent = has internal memory, gated = one hidden node can turn another node on or off.