

SLIDE 1

Artificial Neural Networks

STAT 27725/CMSC 25400: Machine Learning

Shubhendu Trivedi

University of Chicago

November 2015

SLIDE 2

Things we will look at today

  • Biological Neural Networks as inspiration for Artificial Neural Networks
  • Model of a neuron (Perceptron)
  • Multi-layer Perceptrons
  • Training feedforward networks: the Backpropagation algorithm
  • Deep Learning: Convolutional Neural Networks
  • Visualization of learned features

SLIDE 3

Neural Networks

  • The human brain has an estimated $10^{11}$ neurons, each connected, on average, to $10^4$ others
  • Inputs come from the dendrites and are aggregated in the soma. If the neuron starts firing, impulses are propagated to other neurons via axons
  • Neuron activity is typically excited or inhibited through connections to other neurons
  • The fastest neuron switching times are known to be on the order of $10^{-3}$ seconds - quite slow compared to computer switching speeds of $10^{-10}$ seconds
  • Yet, humans are surprisingly quick in making complex decisions: for example, it takes roughly $10^{-1}$ seconds to visually recognize your mother

SLIDE 4

Neural Networks

  • Note that the sequence of neuron firings that can take place during this interval cannot possibly be more than a few hundred steps (given the switching speed of the neurons)
  • Thus, the depth of the network cannot be great (there is a clear layer-by-layer organization in the visual system)
  • This observation has led many to speculate that the information-processing abilities of biological neural systems must follow from highly parallel processes, operating on representations that are distributed over many neurons

SLIDE 5

Neural Networks

  • Neurons are simple. But their arrangement in multi-layered networks is very powerful
  • They self-organize. Learning is, effectively, a change in organization (or connection strengths)
  • Humans are very good at recognizing patterns. How does the brain do it?

SLIDE 6

Neural Networks

  • In the perceptual system, neurons represent features of the sensory input
  • The brain learns to extract many layers of features. Features in one layer represent more complex combinations of features in the layer below (e.g. Hubel & Wiesel (vision), 1959, 1962)
  • How can we imitate such a process on a computer?

SLIDE 7

Neural Networks

[Slide credit: Thomas Serre]

SLIDE 8

First Generation Neural Networks: McCulloch & Pitts (1943)

SLIDE 9

A Model Adaptive Neuron

  • This is just a Perceptron (seen earlier in class)
  • Assumes the data are linearly separable. Simple stochastic algorithm for learning the linear classifier
  • Theorem (Novikoff, 1962): let $(w, w_0)$ be a linear separator with $\|w\| = 1$ and margin $\gamma$. Then the Perceptron converges after $O\left(\frac{(\max_i \|x_i\|)^2}{\gamma^2}\right)$ updates
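As a concrete illustration, here is a minimal NumPy sketch of the Perceptron learning rule (the function and variable names are ours, not from the slides):

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Perceptron learning rule: X is (n, d), labels y are in {-1, +1}.
    By Novikoff's theorem, this converges if the data are linearly separable."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # rotate the hyperplane toward the example
                b += yi
                mistakes += 1
        if mistakes == 0:                # a full pass with no mistakes: converged
            break
    return w, b
```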

SLIDE 10

Perceptron as a model of the brain?

  • The Perceptron was developed in the 1950s
  • Key publication: The perceptron: a probabilistic model for information storage and organization in the brain, Frank Rosenblatt, Psychological Review, 1958
  • Goal: pattern classification
  • From "Mechanization of Thought Process" (1959): "The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted."
  • Another ancient milestone: Hebbian learning rule (Donald Hebb, 1949)

SLIDE 11

Perceptron as a model of the brain?

  • The Mark I Perceptron machine was the first implementation of the perceptron algorithm
  • The machine was connected to a camera that used a 20×20 array of cadmium sulfide photocells to produce a 400-pixel image
  • The main visible feature is a patchboard that allowed experimentation with different combinations of input features. To the right of that are arrays of potentiometers that implemented the adaptive weights

SLIDE 12

Adaptive Neuron: Perceptron

  • A perceptron represents a decision surface in a $d$-dimensional space as a hyperplane
  • Works only for those sets of examples that are linearly separable
  • Many boolean functions can be represented by a perceptron: AND, OR, NAND, NOR (see the sketch below)
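For example, a single threshold unit with hand-picked weights computes AND. A minimal sketch (these particular weights are one valid choice among many):

```python
import numpy as np

def threshold_unit(x, w, b):
    # The unit fires iff w . x + b > 0
    return int(np.dot(w, x) + b > 0)

# AND: fires only when both inputs are 1
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, threshold_unit(np.array(x), np.array([1.0, 1.0]), -1.5))

# XOR admits no such (w, b): no single hyperplane separates
# {(0, 1), (1, 0)} from {(0, 0), (1, 1)}
```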

SLIDE 13

Problems?

  • If the features are complex enough, anything can be classified
  • Thus the features are really hand-coded. But the perceptron comes with a clever algorithm for weight updates
  • If the features are restricted, then some interesting tasks cannot be learned, and thus perceptrons are fundamentally limited in what they can do. Famous examples: XOR, the Group Invariance Theorems (Minsky & Papert, 1969)

SLIDE 14

Coda

  • Single neurons are not able to solve complex tasks (linear decision boundaries). They are limited in the input-output mappings they can learn to model
  • More layers of linear units are not enough (still linear). A fixed non-linearity at the output is not good enough either
  • We could have multiple layers of adaptive, non-linear hidden units. These are called Multi-layer Perceptrons
  • These were considered a solution for representing nonlinearly separable functions in the 70s
  • Many local minima: the Perceptron convergence theorem does not apply
  • Intuitive conjecture in the 60s: there is no learning algorithm for multilayer perceptrons

SLIDE 15

Multi-layer Perceptrons

  • Digression: kernel methods
  • We have looked at what each individual neuron looks like
  • But we did not mention activation functions; some common choices are sketched below
  • How can we learn the weights?
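The slide's figure of activation functions is not reproduced here; as a sketch, three commonly used choices (not necessarily the ones pictured on the original slide) are:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))  # squashes to (0, 1)

def tanh(a):
    return np.tanh(a)                # squashes to (-1, 1), zero-centered

def relu(a):
    return np.maximum(0.0, a)        # rectified linear unit
```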

SLIDE 16

Learning multiple layers of features

[Slide: G. E. Hinton]

SLIDE 17

Review: neural networks

[Figure: feedforward network with inputs $x_1, \dots, x_d$ (plus bias $x_0 \equiv 1$), hidden units $h$ (plus bias $h_0 \equiv 1$), output $f$, first-layer weights $w^{(1)}_{ij}$, and second-layer weights $w^{(2)}_j$]

Feedforward operation, from input $x$ to output $\hat{y}$:
$$\hat{y}(x; w) = f\left( \sum_{j=1}^{m} w^{(2)}_j \, h\left( \sum_{i=1}^{d} w^{(1)}_{ij} x_i + w^{(1)}_{0j} \right) + w^{(2)}_0 \right)$$

Slide adapted from TTIC 31020, Gregory Shakhnarovich
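A direct transcription of this feedforward operation into NumPy (a sketch; the shapes and names are our own, with the biases kept as separate vectors rather than $x_0 \equiv 1$, $h_0 \equiv 1$):

```python
import numpy as np

def forward(x, W1, b1, w2, b2, h=np.tanh, f=lambda a: a):
    """Feedforward pass: W1 is (d, m), b1 is (m,), w2 is (m,), b2 is a scalar.
    Computes y_hat = f( sum_j w2[j] * h( sum_i W1[i, j] * x[i] + b1[j] ) + b2 )."""
    a = x @ W1 + b1        # input to each hidden unit
    z = h(a)               # hidden-unit outputs
    return f(z @ w2 + b2)  # network output
```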

SLIDE 18

Training the network

Error of the network on a training set $\{(x_i, y_i)\}_{i=1}^{N}$:
$$L(X; w) = \sum_{i=1}^{N} \frac{1}{2}\left(y_i - \hat{y}(x_i; w)\right)^2$$

Generally, there is no closed-form solution; we resort to gradient descent. We then need to evaluate the derivative of $L$ on a single example. Let's start with a simple linear model $\hat{y} = \sum_j w_j x_{ij}$:
$$\frac{\partial L(x_i)}{\partial w_j} = \underbrace{(\hat{y}_i - y_i)}_{\text{error}}\, x_{ij}$$
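In code, the corresponding stochastic gradient step for this linear model is a one-liner (a sketch; `eta` is our name for the step size):

```python
import numpy as np

def sgd_step_linear(w, xi, yi, eta=0.01):
    """One stochastic gradient step on L(x_i) = (1/2) * (y_i - w . x_i)^2."""
    error = xi @ w - yi          # (y_hat_i - y_i)
    return w - eta * error * xi  # the gradient is the error times the input
```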

SLIDE 19

Backpropagation

General unit activation in a multilayer network:
$$z_t = h\left( \sum_j w_{jt} z_j \right)$$

[Figure: unit $t$ receives inputs $z_1, \dots, z_s$ through weights $w_{1t}, \dots, w_{st}$]

Forward propagation: calculate, for each unit, $a_t = \sum_j w_{jt} z_j$.

The loss $L$ depends on $w_{jt}$ only through $a_t$:
$$\frac{\partial L}{\partial w_{jt}} = \frac{\partial L}{\partial a_t} \frac{\partial a_t}{\partial w_{jt}} = \frac{\partial L}{\partial a_t}\, z_j$$
SLIDE 20

Backpropagation

$$\frac{\partial L}{\partial w_{jt}} = \underbrace{\frac{\partial L}{\partial a_t}}_{\delta_t}\, z_j$$

Output unit with linear activation: $\delta_t = \hat{y} - y$.

Hidden unit $z_t = h(a_t)$, which sends its output to the units in $S$:
$$\delta_t = \sum_{s \in S} \frac{\partial L}{\partial a_s} \frac{\partial a_s}{\partial a_t} = h'(a_t) \sum_{s \in S} w_{ts} \delta_s, \qquad \text{where } a_s = \sum_{j: j \to s} w_{js}\, h(a_j)$$
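This recursion is the core of backpropagation; as a sketch, here is how it plays out for a stack of fully connected layers (the `Ws`/`As`/`Zs` layout is our own convention, not from the slides):

```python
import numpy as np

def backward(Ws, As, Zs, delta_out, h_prime):
    """Backpropagate deltas through a stack of layers.
    Ws[l]: weight matrix from layer l to layer l+1 (shape (n_l, n_{l+1}));
    Zs[l]: outputs z of layer l (Zs[0] is the input x);
    As[l]: pre-activations a of layer l (for l > 0);
    delta_out: delta at the output, e.g. y_hat - y for squared loss."""
    delta = delta_out
    grads = [None] * len(Ws)
    for l in reversed(range(len(Ws))):
        grads[l] = np.outer(Zs[l], delta)  # dL/dW[l][j, t] = z_j * delta_t
        if l > 0:
            # delta_t = h'(a_t) * sum_s w_ts * delta_s
            delta = h_prime(As[l]) * (Ws[l] @ delta)
    return grads
```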

SLIDE 21

Backpropagation: example

Output activation: $f(a) = a$. Hidden activation:
$$h(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}, \qquad h'(a) = 1 - h(a)^2$$

[Figure: network with inputs $x_0, x_1, \dots, x_d$, hidden units $1, \dots, m$, output $f$, and weights $w^{(1)}_{ij}$, $w^{(2)}_j$]

Given example $x$, feed the inputs forward:

input to hidden: $a_j = \sum_{i=0}^{d} w^{(1)}_{ij} x_i$,
hidden output: $z_j = \tanh(a_j)$,
net output: $\hat{y} = a = \sum_{j=0}^{m} w^{(2)}_j z_j$.
SLIDE 22

Backpropagation: example

Recall the forward pass:
$$a_j = \sum_{i=0}^{d} w^{(1)}_{ij} x_i, \qquad z_j = \tanh(a_j), \qquad \hat{y} = a = \sum_{j=0}^{m} w^{(2)}_j z_j$$

Error on example $x$: $L = \frac{1}{2}(y - \hat{y})^2$.

Output unit: $\delta = \frac{\partial L}{\partial a} = \hat{y} - y$.

Next, compute the $\delta$s for the hidden units:
$$\delta_j = (1 - z_j^2)\, w^{(2)}_j \delta$$

Derivatives w.r.t. the weights:
$$\frac{\partial L}{\partial w^{(1)}_{ij}} = \delta_j x_i, \qquad \frac{\partial L}{\partial w^{(2)}_j} = \delta z_j$$

Update the weights: $w^{(2)}_j \leftarrow w^{(2)}_j - \eta \delta z_j$ and $w^{(1)}_{ij} \leftarrow w^{(1)}_{ij} - \eta \delta_j x_i$, where $\eta$ is the learning rate.
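The whole example fits in a few lines of NumPy. A sketch of one training step (names are our own; bias terms are omitted for brevity):

```python
import numpy as np

def train_step(x, y, W1, w2, eta=0.01):
    """One backpropagation step for the tanh network above.
    W1: (d, m) input-to-hidden weights; w2: (m,) hidden-to-output weights."""
    # Forward pass
    a = x @ W1              # a_j = sum_i W1[i, j] * x_i
    z = np.tanh(a)          # z_j = tanh(a_j)
    y_hat = z @ w2          # linear output unit

    # Backward pass
    delta = y_hat - y                    # output delta
    delta_h = (1 - z**2) * (w2 * delta)  # hidden deltas, using h'(a) = 1 - z^2

    # Gradient descent updates with learning rate eta
    w2 -= eta * delta * z                # dL/dw2_j  = delta * z_j
    W1 -= eta * np.outer(x, delta_h)     # dL/dW1_ij = delta_j * x_i
    return W1, w2
```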

SLIDE 23

Multidimensional output

Loss on example $(x, y)$ with $K$ output units:
$$L = \frac{1}{2} \sum_{k=1}^{K} (y_k - \hat{y}_k)^2$$

[Figure: network with inputs $x_0, x_1, \dots, x_d$, hidden units $1, \dots, m$, and output units $f_1, \dots, f_K$; weights $w^{(1)}_{ij}$ into the hidden layer and $w^{(2)}_{jk}$ into the outputs]

Now, for each output unit, $\delta_k = \hat{y}_k - y_k$; for hidden unit $j$,
$$\delta_j = (1 - z_j^2) \sum_{k=1}^{K} w^{(2)}_{jk} \delta_k$$

SLIDE 24

Multilayer Perceptrons

  • Theoretical result [Cybenko, 1989]: a 2-layer net with linear output can approximate any continuous function over a compact domain to arbitrary accuracy (given enough hidden units!)
  • The more hidden layers, the better... in theory
  • Large neural networks need a lot of labeled data, and optimization is hard
  • Neural networks went out of fashion for this reason, roughly between 1990 and 2005
  • Since 2006, they have made a comeback, mostly due to the availability of large datasets, more computational resources, and a number of tricks to make them work
  • They now return very competitive, state-of-the-art performance in tasks with perceptual input such as vision and speech (better than human performance in some tasks), and are dominating the landscape in Natural Language Processing

SLIDE 25

Deep Learning: Convolutional Neural Networks

SLIDE 26

Hierarchical Representations

Let’s elaborate a bit.

SLIDE 27

Why use Deep Multi Layered Models?

Argument 1: Visual scenes are hierarchically organized (so is language; neural nets for that next time)

SLIDE 28

Why use Deep Multi Layered Models?

  • Argument 2: Biological vision is hierarchically organized, and we want to glean some ideas from there
  • Argument 3: Shallow representations are inefficient at representing highly varying functions

SLIDE 29

Why use Deep Multi Layered Models?

[Figure: Honglak Lee]

SLIDE 30

Motivation: Vision

  • How can we produce good internal representations of visual data to support recognition?
  • What do we mean by good? The learning machine should be able to classify objects into classes, and not be affected by things such as pose, scale, position of the object in the image, lighting conditions, clutter, occlusion, etc.
  • One way of attempting this has resulted in a breed of feedforward neural networks with a very specific kind of architecture: Convolutional Neural Networks. This architecture tries to capture some of the above invariances
  • Originally introduced in 1989 (Backpropagation applied to handwritten zip code recognition, Y. LeCun, 1989; Gradient-based learning applied to document recognition, LeCun et al., 1998)

SLIDE 31

Convolutional Neural Networks

Figure: Yann LeCun

SLIDE 32

Convolutional Neural Networks

Feedforward feature extraction: convolve the input with learned filters → non-linearity → spatial pooling → normalization. Training is done by backpropagating errors. A sketch of one such stage follows below.
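A minimal sketch of one stage in plain NumPy ('valid' convolution, ReLU non-linearity, 2×2 max pooling; the normalization step is omitted, and real implementations use heavily optimized libraries):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(fm):
    """Non-overlapping 2x2 max pooling: halves each spatial dimension."""
    H, W = fm.shape
    fm = fm[:H - H % 2, :W - W % 2]  # crop to even size
    return fm.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def conv_stage(img, kernels):
    # One feature map per learned filter: convolve, rectify, pool
    return [max_pool_2x2(np.maximum(0.0, conv2d_valid(img, k))) for k in kernels]
```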

SLIDE 33

Convolutional Layer

SLIDE 34

Convolutional Layer

SLIDE 35

Convolutional Layer

SLIDE 36

Convolutional Layer

[Illustration by Ranzato; Mathieu et al., Fast Training of CNNs through FFTs]

SLIDE 37

Convolutional Layer

Learn multiple such filters. If 100 filters are used, we get 100 feature maps.

SLIDE 38

Subsampling

  • Pass each "pixel" of the feature map through a non-linearity
  • Subsample to reduce the size of the feature map, e.g. to half (it need not be half)
  • Repeat the convolutions on these reduced images, followed by the non-linearity and subsampling
  • Eventually we will have feature maps of size 1. These are fed to a classifier, such as an SVM, for the final classification

SLIDE 39

Visualizing Features: ImageNet Challenge 2012

  • ImageNet: 14 million labeled images with 20,000 classes
  • Images gathered from the internet and labeled by humans via Amazon Mechanical Turk
  • Challenge: 1.2 million training images, 1000 classes

SLIDE 40

Visualizing Features: ImageNet Challenge 2012

  • The winning model ("AlexNet") was a convolutional network similar to LeCun et al., 1998
  • More data: 1.2 million images versus a few thousand
  • Fast two-GPU implementation trained for a week
  • Better regularization (DropOut, next time?)
  • [A. Krizhevsky, I. Sutskever, G. E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012]

SLIDE 41

Layer 1 filters

SLIDE 42

Layer 2 Patches

SLIDE 43

Layer 2 Patches

SLIDE 44

Layer 3 Patches

SLIDE 45

Layer 3 Patches

SLIDE 46

Layer 4 Patches

SLIDE 47

Layer 4 Patches

SLIDE 48

Evolution of Filters

SLIDE 49

Evolution of Filters

SLIDE 50

SLIDE 51
Conv. Net Successes

  • Very active area of research
  • The best accuracies on Google Street View, MNIST, traffic sign recognition, object detection, face recognition (DeepFace is used on FB) and detection, and semantic segmentation are obtained using convolutional networks
  • Current networks are deeper (GoogLeNet has 22 layers; "Highway Networks" (Schmidhuber et al.) can be deeper still)

SLIDE 52

Next time

  • Regularization in training feedforward networks
  • Basic ideas of neural generative models (Restricted Boltzmann Machines, Deep Belief Nets) and Autoencoders; connections to manifold learning
  • Recurrent Neural Networks (modeling sequences)
  • Time permitting: Recursive Neural Networks and neural word embeddings
