

SLIDE 1

CS 6355: Structured Prediction

Neural Networks

Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

SLIDE 2

This lecture

  β€’ What is a neural network?
  β€’ Training neural networks
  β€’ Practical concerns
  β€’ Neural networks and structures

SLIDE 3

This lecture

  β€’ What is a neural network?
– The hypothesis class
– Structure, expressiveness
  β€’ Training neural networks
  β€’ Practical concerns
  β€’ Neural networks and structures

SLIDE 4

We have seen linear threshold units

Prediction: $\mathrm{sgn}(\mathbf{w}^\top \mathbf{x} + b) = \mathrm{sgn}\left(\sum_i w_i x_i + b\right)$ (features, a dot product, a threshold).

Learning: various algorithms (perceptron, SVM, logistic regression, …); in general, minimize a loss.

But where do these input features come from? What if the features were outputs of another classifier?

SLIDE 5

Features from classifiers

SLIDE 6

Features from classifiers

SLIDE 7

Features from classifiers

Each of these connections has its own weight as well

SLIDE 8

Features from classifiers

SLIDE 9

Features from classifiers

This is a two-layer feed-forward neural network

SLIDE 10

Features from classifiers

[Figure labels: the output layer, the hidden layer, the input layer.] This is a two-layer feed-forward neural network. Think of the hidden layer as learning a good representation of the inputs.

SLIDE 11

Features from classifiers

The dot product followed by the threshold constitutes a neuron. Five neurons in this picture (four in the hidden layer and one output). This is a two-layer feed-forward neural network.

SLIDE 12

But where do the inputs come from?

What if the inputs were the outputs of a classifier? [Figure label: the input layer.] We can make a three-layer network…. And so on.

SLIDE 13

Let us try to formalize this

SLIDE 14

Neural networks

  β€’ A robust approach for approximating real-valued, discrete-valued or vector-valued functions
  β€’ Among the most effective general-purpose supervised learning methods currently known
– Especially for complex and hard-to-interpret data such as real-world sensory data
  β€’ The Backpropagation algorithm for neural networks has been shown successful in many practical problems
– handwritten character recognition, speech recognition, object recognition, some NLP problems

SLIDE 15

Biological neurons

The first drawing of brain cells, by Santiago RamΓ³n y Cajal in 1899.

Neurons, the core components of the brain and the nervous system, consist of:

  β€’ 1. Dendrites that collect information from other neurons
  β€’ 2. An axon that generates outgoing spikes
SLIDE 16

Biological neurons

The first drawing of brain cells, by Santiago RamΓ³n y Cajal in 1899.

Neurons, the core components of the brain and the nervous system, consist of:

  β€’ 1. Dendrites that collect information from other neurons
  β€’ 2. An axon that generates outgoing spikes

Modern artificial neurons are β€œinspired” by biological neurons, but there are many, many fundamental differences. Don't take the similarity seriously (likewise for claims in the news about the β€œemergence” of intelligent behavior).

SLIDE 17

Artificial neurons

Functions that very loosely mimic a biological neuron

A neuron accepts a collection of inputs (a vector x) and produces an output by:

– Applying a dot product with weights w and adding a bias b
– Applying a (possibly non-linear) transformation called an activation

$\text{output} = \text{activation}(\mathbf{w}^\top \mathbf{x} + b)$

Dot product, then a threshold activation; other activations are possible.

SLIDE 18

Activation functions

$\text{output} = \text{activation}(\mathbf{w}^\top \mathbf{x} + b)$

Name of the neuron, and its activation function $\text{activation}(z)$:

  β€’ Linear unit: $z$
  β€’ Threshold/sign unit: $\mathrm{sgn}(z)$
  β€’ Sigmoid unit: $\frac{1}{1 + \exp(-z)}$
  β€’ Rectified linear unit (ReLU): $\max(0, z)$
  β€’ Tanh unit: $\tanh(z)$

Many more activation functions exist (sinusoid, sinc, gaussian, polynomial, …). Activation functions are also called transfer functions.

SLIDE 19

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph:

– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another, and are associated with weights

  β€’ To define a neural network, we need to specify:

– The structure of the graph
  β€’ How many nodes, the connectivity
– The activation function on each node
– The edge weights

SLIDE 20

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph:

– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another, and are associated with weights

  β€’ To define a neural network, we need to specify:

– The structure of the graph
  β€’ How many nodes, the connectivity
– The activation function on each node
– The edge weights

[Figure: input, hidden, and output layers, with weights $w_{ij}$ on the edges.]

The structure and activations are called the architecture of the network: typically predefined, part of the design of the classifier. The edge weights are learned from data.

SLIDE 21

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph:

– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another, and are associated with weights

  β€’ To define a neural network, we need to specify:

– The structure of the graph
  β€’ How many nodes, the connectivity
– The activation function on each node
– The edge weights

[Figure: input, hidden, and output layers, with weights $w_{ij}$ on the edges.]

SLIDE 22

A neural network

A function that converts inputs to outputs, defined by a directed acyclic graph:

– Nodes, organized in layers, correspond to neurons
– Edges carry the output of one neuron to another, and are associated with weights

  β€’ To define a neural network, we need to specify:

– The structure of the graph
  β€’ How many nodes, the connectivity
– The activation function on each node
– The edge weights

[Figure: input, hidden, and output layers, with weights $w_{ij}$ on the edges.]

The structure and activations are called the architecture of the network: typically predefined, part of the design of the classifier. The edge weights are learned from data.

SLIDE 23

A brief history of neural networks

  β€’ 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions
  β€’ 1949: Hebb suggested a learning rule that has some physiological plausibility
  β€’ 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron
  β€’ 1969: Minsky and Papert studied the neuron from a geometrical perspective
  β€’ 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)
  β€’ 2003-today: More compute, more data, deeper networks

See also: http://people.idsia.ch/~juergen/deep-learning-overview.html

SLIDE 24

What functions do neural networks express?

SLIDE 25

A single neuron with threshold activation

Prediction $= \mathrm{sgn}(b + w_1 x_1 + w_2 x_2)$

[Figure: positive and negative points in the plane, separated by the line $b + w_1 x_1 + w_2 x_2 = 0$.]
SLIDE 26

Two layers, with threshold activations

In general, convex polygons

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

SLIDE 27

Three layers with threshold activations

In general, unions of convex polygons

Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

SLIDE 28

Neural networks are universal function approximators

  β€’ Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
  β€’ Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
  β€’ Two-layer threshold networks can express any Boolean function
– Exercise: Prove this
  β€’ VC dimension of a threshold network with edges $E$: $VC = O(|E| \log |E|)$
  β€’ VC dimension of sigmoid networks with nodes $V$ and edges $E$:
– Upper bound: $O(|V|^2 |E|^2)$
– Lower bound: $\Omega(|E|^2)$

Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness
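A sketch of the argument for this exercise: composing linear units only yields another linear function, since

$$W_2 (W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 = (W_2 W_1)\,\mathbf{x} + (W_2 \mathbf{b}_1 + \mathbf{b}_2),$$

so any stack of purely linear layers collapses to a single linear layer.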

SLIDE 29

An example network

[Figure: the example network. The bias feature is always 1; the hidden units use sigmoid activations; the output unit uses a linear activation.]

Naming conventions for this example:

  β€’ Inputs: x
  β€’ Hidden: z
  β€’ Output: y
SLIDE 30

The forward pass

Given an input x, how is the output predicted?

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

Questions?

SLIDE 31

This lecture

  β€’ What is a neural network?
  β€’ Training neural networks
– Backpropagation
  β€’ Practical concerns
  β€’ Neural Networks and Structures

SLIDE 32

Training a neural network

  β€’ Given
– A network architecture (layout of neurons, their connectivity and activations)
– A dataset of labeled examples: $S = \{(\mathbf{x}_i, y_i)\}$
  β€’ The goal: Learn the weights of the neural network
  β€’ Remember: For a fixed architecture, a neural network is a function parameterized by its weights
– Prediction: $\hat{y} = NN(\mathbf{x}, \mathbf{w})$

SLIDE 33

Back to our running example

Given an input x, how is the output predicted?

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

SLIDE 34

Back to our running example

Given an input x, how is the output predicted?

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

Suppose the true label for this example is a number $y^*$. We can write the square loss for this example as:

$L = \frac{1}{2} (y - y^*)^2$

SLIDE 35

Learning as loss minimization

We have a classifier NN that is completely defined by its weights. Learn the weights by minimizing a loss $L$, perhaps with a regularizer:

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

How do we solve the optimization problem?
SLIDE 36

Stochastic gradient descent

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Given a training set $S = \{(\mathbf{x}_i, y_i)\}$, $\mathbf{x} \in \mathbb{R}^d$:

  β€’ 1. Initialize parameters $\mathbf{w}$
  β€’ 2. For epoch = 1 … T:
    1. Shuffle the training set
    2. For each training example $(\mathbf{x}_i, y_i) \in S$:
      β€’ Treat this example as the entire dataset
      β€’ Compute the gradient of the loss: $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
      β€’ Update: $\mathbf{w} \leftarrow \mathbf{w} - \gamma_t\, \nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
  β€’ 3. Return $\mathbf{w}$

$\gamma_t$: learning rate, many tweaks possible. The objective is not convex; initialization can be important.
SLIDE 37

Stochastic gradient descent

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Given a training set $S = \{(\mathbf{x}_i, y_i)\}$, $\mathbf{x} \in \mathbb{R}^d$:

  β€’ 1. Initialize parameters $\mathbf{w}$
  β€’ 2. For epoch = 1 … T:
    1. Shuffle the training set
    2. For each training example $(\mathbf{x}_i, y_i) \in S$:
      β€’ Treat this example as the entire dataset
      β€’ Compute the gradient of the loss: $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
      β€’ Update: $\mathbf{w} \leftarrow \mathbf{w} - \gamma_t\, \nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
  β€’ 3. Return $\mathbf{w}$

$\gamma_t$: learning rate, many tweaks possible. The objective is not convex; initialization can be important.

Have we solved everything?

SLIDE 38

The derivative of the loss function?

If the neural network is a differentiable function, we can find the gradient $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$:

– Or maybe its sub-gradient
– This is decided by the activation functions and the loss function

It was easy for SVMs and logistic regression:

– Only one layer

But how do we find the sub-gradient of a more complex function?

– Eg: A recent paper used a ~150-layer neural network for image classification!

We need an efficient algorithm: Backpropagation

SLIDE 39

The derivative of the loss function?

If the neural network is a differentiable function, we can find the gradient $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$:

– Or maybe its sub-gradient
– This is decided by the activation functions and the loss function

It was easy for SVMs and logistic regression:

– Only one layer

But how do we find the sub-gradient of a more complex function?

– Eg: A recent paper used a ~150-layer neural network for image classification!

We need an efficient algorithm: Backpropagation

SLIDE 40

Checkpoint

Where are we?

If we have a neural network (structure, activations and weights), we can make a prediction for an input. If we have the true label of the input, then we can define the loss for that example. If we can take the derivative of the loss with respect to each of the weights, we can take a gradient step in SGD.

Questions?

SLIDE 41

Reminder: Chain rule for derivatives

– If $z$ is a function of $y$, and $y$ is a function of $x$
  β€’ Then $z$ is a function of $x$ as well
– Question: how do we find $\frac{dz}{dx}$?

Slide courtesy Richard Socher

SLIDE 42

Reminder: Chain rule for derivatives

– If $z$ is (a function of $y_1$) + (a function of $y_2$), and the $y_i$'s are functions of $x$
  β€’ Then $z$ is a function of $x$ as well
– Question: how do we find $\frac{dz}{dx}$?

Slide courtesy Richard Socher

SLIDE 43

Reminder: Chain rule for derivatives

– If $z$ is a sum of functions of the $y_i$'s, and the $y_i$'s are functions of $x$
  β€’ Then $z$ is a function of $x$ as well
– Question: how do we find $\frac{dz}{dx}$?

Slide courtesy Richard Socher
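Written out, the version of the rule that the next slides use: if $z = \sum_i f_i(y_i)$ and each $y_i$ is a function of $x$, then

$$\frac{dz}{dx} = \sum_i \frac{\partial z}{\partial y_i} \frac{dy_i}{dx}.$$

For example, if $z = y_1^2 + y_2$ with $y_1 = 2x$ and $y_2 = \sin x$, then $\frac{dz}{dx} = 2y_1 \cdot 2 + 1 \cdot \cos x = 8x + \cos x$.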

SLIDE 44

Backpropagation

$L = \frac{1}{2}(y - y^*)^2$

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

SLIDE 45

Backpropagation

We want to compute $\frac{\partial L}{\partial w^{o}_{ij}}$ and $\frac{\partial L}{\partial w^{h}_{ij}}$.

$L = \frac{1}{2}(y - y^*)^2$

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

SLIDE 46

Backpropagation

Applying the chain rule to compute the gradient (and remembering partial computations along the way to speed things up).

We want to compute $\frac{\partial L}{\partial w^{o}_{ij}}$ and $\frac{\partial L}{\partial w^{h}_{ij}}$.

$L = \frac{1}{2}(y - y^*)^2$

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

SLIDE 47

Output layer

Backpropagation example. Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$, with loss $L = \frac{1}{2}(y - y^*)^2$.

$\frac{\partial L}{\partial w^{o}_{01}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{o}_{01}}$

SLIDE 48

Output layer

Backpropagation example. Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$, with loss $L = \frac{1}{2}(y - y^*)^2$.

$\frac{\partial L}{\partial w^{o}_{01}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{o}_{01}}$

$\frac{\partial L}{\partial y} = y - y^*, \qquad \frac{\partial y}{\partial w^{o}_{01}} = 1$

SLIDE 49

Output layer

Backpropagation example. Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$, with loss $L = \frac{1}{2}(y - y^*)^2$.

$\frac{\partial L}{\partial w^{o}_{11}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{o}_{11}}$

SLIDE 50

Output layer

Backpropagation example. Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$, with loss $L = \frac{1}{2}(y - y^*)^2$.

$\frac{\partial L}{\partial w^{o}_{11}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{o}_{11}}$

$\frac{\partial L}{\partial y} = y - y^*, \qquad \frac{\partial y}{\partial w^{o}_{11}} = z_1$

We have already computed the partial derivative $\frac{\partial L}{\partial y}$ for the previous case. Cache it to speed things up!

SLIDE 51

Hidden layer derivatives

Backpropagation example. We want $\frac{\partial L}{\partial w^{h}_{22}}$.

$L = \frac{1}{2}(y - y^*)^2$

Output: $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$

$z_1 = \sigma(w^{h}_{01} + w^{h}_{11}\, x_1 + w^{h}_{21}\, x_2)$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

SLIDE 52

Hidden layer derivatives

Backpropagation example, with $L = \frac{1}{2}(y - y^*)^2$:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}}$

SLIDE 53

Hidden layer

Backpropagation example, with $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial}{\partial w^{h}_{22}} \left( w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2 \right)$

SLIDE 54

Hidden layer

Backpropagation example, with $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial}{\partial w^{h}_{22}} \left( w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2 \right) = \frac{\partial L}{\partial y} \left( w^{o}_{11} \frac{\partial z_1}{\partial w^{h}_{22}} + w^{o}_{21} \frac{\partial z_2}{\partial w^{h}_{22}} \right)$

$z_1$ is not a function of $w^{h}_{22}$.

SLIDE 55

Hidden layer

Backpropagation example, with $y = w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2$:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial}{\partial w^{h}_{22}} \left( w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2 \right) = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial w^{h}_{22}}$

SLIDE 56

Hidden layer

Backpropagation example:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial}{\partial w^{h}_{22}} \left( w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2 \right) = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial w^{h}_{22}}$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$

SLIDE 57

Hidden layer

Backpropagation example:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial}{\partial w^{h}_{22}} \left( w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2 \right) = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial w^{h}_{22}}$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$; call the argument of $\sigma$ here $s$.

SLIDE 58

Hidden layer

Backpropagation example:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y} \frac{\partial}{\partial w^{h}_{22}} \left( w^{o}_{01} + w^{o}_{11}\, z_1 + w^{o}_{21}\, z_2 \right) = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial s} \frac{\partial s}{\partial w^{h}_{22}}$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$; call the argument of $\sigma$ here $s$.

SLIDE 59

Hidden layer

Backpropagation example:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial s} \frac{\partial s}{\partial w^{h}_{22}}$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$; call the argument of $\sigma$ here $s$.

Each of these partial derivatives is easy:

$\frac{\partial L}{\partial y} = y - y^*, \qquad \frac{\partial z_2}{\partial s} = z_2 (1 - z_2), \qquad \frac{\partial s}{\partial w^{h}_{22}} = x_2$

Why? Because $z_2(s)$ is the logistic function we have already seen.
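For completeness, the one-line derivation behind $\frac{\partial z_2}{\partial s} = z_2(1 - z_2)$:

$$\sigma(s) = \frac{1}{1 + e^{-s}} \implies \frac{d\sigma}{ds} = \frac{e^{-s}}{(1 + e^{-s})^2} = \sigma(s) \cdot \frac{e^{-s}}{1 + e^{-s}} = \sigma(s)\,(1 - \sigma(s)).$$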

SLIDE 60

Hidden layer

Backpropagation example:

$\frac{\partial L}{\partial w^{h}_{22}} = \frac{\partial L}{\partial y}\, w^{o}_{21} \frac{\partial z_2}{\partial s} \frac{\partial s}{\partial w^{h}_{22}}$

$z_2 = \sigma(w^{h}_{02} + w^{h}_{12}\, x_1 + w^{h}_{22}\, x_2)$; call the argument of $\sigma$ here $s$.

Each of these partial derivatives is easy:

$\frac{\partial L}{\partial y} = y - y^*, \qquad \frac{\partial z_2}{\partial s} = z_2 (1 - z_2), \qquad \frac{\partial s}{\partial w^{h}_{22}} = x_2$

Why? Because $z_2(s)$ is the logistic function we have already seen.

More important: We have already computed many of these partial derivatives because we are proceeding from top to bottom.

SLIDE 61

The Backpropagation Algorithm

Repeated application of the chain rule for partial derivatives:

– First perform the forward pass from inputs to the output
– Compute the loss
– From the loss, proceed backwards to compute partial derivatives using the chain rule
– Cache partial derivatives as you compute them
  β€’ They will be used for lower layers
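Putting the running example together, a minimal NumPy sketch of one forward pass followed by one backward pass for the two-layer network above (the weight layout matches the earlier forward-pass sketch; an illustration, not library code):

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def forward_backward(x1, x2, y_star, Wh, wo):
        # Forward pass: cache pre-activations s, hidden units z, output y
        s = Wh[0] + Wh[1] * x1 + Wh[2] * x2
        z = sigmoid(s)
        y = wo[0] + wo[1] * z[0] + wo[2] * z[1]
        loss = 0.5 * (y - y_star) ** 2

        # Backward pass: dL/dy is computed once and reused everywhere below
        dL_dy = y - y_star
        grad_wo = dL_dy * np.array([1.0, z[0], z[1]])  # dL/dw^o
        dL_dz = dL_dy * wo[1:]                         # chain through y
        dL_ds = dL_dz * z * (1 - z)                    # sigmoid derivative
        grad_Wh = np.outer([1.0, x1, x2], dL_ds)       # dL/dw^h
        return loss, grad_Wh, grad_wo

    Wh = np.array([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.8]])
    wo = np.array([0.2, 0.7, -0.3])
    loss, gWh, gwo = forward_backward(1.0, -1.0, 0.5, Wh, wo)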

SLIDE 62

Mechanizing learning

  β€’ Backpropagation gives you the gradient that will be used for gradient descent
– SGD gives us a generic learning algorithm
– Backpropagation is a generic method for computing partial derivatives
  β€’ A recursive algorithm that proceeds from the top of the network to the bottom
  β€’ Modern neural network libraries implement automatic differentiation using backpropagation
– Allows easy exploration of network architectures
– Don't have to keep deriving the gradients by hand each time
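As an illustration, a minimal sketch of the running example in PyTorch (assuming the library is available; weights initialized randomly for illustration): a single backward() call runs backpropagation and fills in every partial derivative.

    import torch

    x = torch.tensor([1.0, -1.0])
    y_star = torch.tensor(0.5)

    Wh = torch.randn(2, 3, requires_grad=True)  # hidden weights, incl. bias column
    wo = torch.randn(3, requires_grad=True)     # output weights, incl. bias

    xb = torch.cat([torch.ones(1), x])          # prepend the always-1 bias feature
    z = torch.sigmoid(Wh @ xb)                  # hidden layer
    y = wo @ torch.cat([torch.ones(1), z])      # linear output
    loss = 0.5 * (y - y_star) ** 2

    loss.backward()                             # backpropagation, automatically
    print(Wh.grad, wo.grad)                     # dL/dw for every weight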

SLIDE 63

Stochastic gradient descent

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Given a training set $S = \{(\mathbf{x}_i, y_i)\}$, $\mathbf{x} \in \mathbb{R}^d$:

  β€’ 1. Initialize parameters $\mathbf{w}$
  β€’ 2. For epoch = 1 … T:
    1. Shuffle the training set
    2. For each training example $(\mathbf{x}_i, y_i) \in S$:
      β€’ Treat this example as the entire dataset
      β€’ Compute the gradient of the loss $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$ using backpropagation
      β€’ Update: $\mathbf{w} \leftarrow \mathbf{w} - \gamma_t\, \nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
  β€’ 3. Return $\mathbf{w}$

$\gamma_t$: learning rate, many tweaks possible. The objective is not convex; initialization can be important.

The usual stochastic gradient descent tricks apply here.

SLIDE 64

This lecture

  β€’ What is a neural network?
  β€’ Training neural networks
  β€’ Practical concerns
  β€’ Neural Networks and Structures

SLIDE 65

Practical concerns

  β€’ 1. Addressing problems with SGD
  β€’ 2. Preventing overfitting
  β€’ 3. Number of hidden layers

SLIDE 66

Training neural networks with SGD

  β€’ No guarantee of convergence; training may oscillate or reach a local minimum
  β€’ In practice, many large networks are trained on large amounts of data for realistic problems
  β€’ Many epochs (tens of thousands) may be needed for adequate training
– Large data sets may require many hours of CPU or GPU time
– Sometimes even specialized hardware
  β€’ Termination criteria: number of epochs, a threshold on training set error, no decrease in error, increased error on a validation set
  β€’ To avoid local minima: several trials with different random initial weights, with majority or voting techniques

SLIDE 67

Preventing overfitting

  β€’ Running too many epochs may over-train the network and result in over-fitting
  β€’ Keep a held-out validation set and test accuracy after every epoch
  β€’ Maintain the weights of the best-performing network on the validation set, and return them when performance decreases significantly beyond that (see the sketch after this list)
  β€’ To avoid losing training data to validation:
– Use k-fold cross-validation to determine the average number of epochs that optimizes validation performance
– Train on the full data set using this many epochs to produce the final results
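A minimal sketch of that early-stopping recipe (train_one_epoch and accuracy are hypothetical helpers standing in for one SGD pass and validation evaluation; the 0.05 tolerance is an arbitrary choice):

    import copy

    def train_with_early_stopping(model, train_set, val_set, max_epochs=1000):
        best_acc, best_model = 0.0, copy.deepcopy(model)
        for epoch in range(max_epochs):
            train_one_epoch(model, train_set)  # hypothetical: one SGD pass
            acc = accuracy(model, val_set)     # hypothetical: validation accuracy
            if acc > best_acc:
                # maintain the weights of the best network seen so far
                best_acc, best_model = acc, copy.deepcopy(model)
            elif acc < best_acc - 0.05:
                break                          # significantly worse: stop
        return best_model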

SLIDE 68

Number of hidden units

  β€’ Too few hidden units prevent the system from adequately fitting the data and learning the concept.
  β€’ Using too many hidden units leads to over-fitting.
  β€’ A similar cross-validation method can be used to determine an appropriate number of hidden units.

SLIDE 69

This lecture

  β€’ What is a neural network?
  β€’ Training neural networks
  β€’ Practical concerns
  β€’ Neural Networks and Structures

SLIDE 70

What do neural networks bring us?

β€œDeep learning” is a combination of various modeling and optimization ideas.

From our perspective, two important ideas stand out:

1. Neural networks for scoring outputs
– Non-linear scoring functions
– Much wider design space

2. Distributed representations
– Learned vector-valued representations can coalesce superficially distinct objects
– Eg: β€œcat” and β€œfeline” share overlap in meaning, but …

SLIDE 71

Why Distributed Representations

Think about feature representations.

[Figure: one-hot vectors for Cat, Dog, Tiger, Table.] These vectors do not capture inherent similarities; distances or dot products are all equal.

SLIDE 72

Why Distributed Representations

Think about feature representations.

[Figure: dense vectors for Cat, Dog, Tiger, Table.] Dense vector (often lower-dimensional) representations can capture similarities better.
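A small sketch of the contrast (the dense values are made up for illustration):

    import numpy as np

    # One-hot vectors: all distinct pairs are equally (dis)similar
    cat, dog, table = np.eye(3)
    print(cat @ dog, cat @ table)          # prints 0.0 0.0: no notion of similarity

    # Dense vectors: similar concepts can get similar representations
    cat_d   = np.array([0.9, 0.8, 0.1])
    dog_d   = np.array([0.8, 0.9, 0.2])
    table_d = np.array([0.0, 0.1, 0.9])
    print(cat_d @ dog_d, cat_d @ table_d)  # cat is far closer to dog than to table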

SLIDE 73

Neural Networks in the age of structures

How can we exploit expressive scoring functions and distributed representations for structures?

Ideas?

SLIDE 74

Some possible approaches

  β€’ Treat neural networks as graphical models
– Each neuron with a sigmoid activation function expresses a probability distribution over a single bit
– This approach gives us Restricted Boltzmann Machines
  β€’ Adapt standard conditional random fields to use distributed representations
  β€’ Treat neural networks as simple scoring functions
– We can still do inference over the neural networks
– For eg: Greedy inference over a sequence
– Or perhaps more complex inference
  β€’ An open question

SLIDE 75

Predicting sequences

Recurrent Neural Networks; Long Short-Term Memory and its siblings

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://karpathy.github.io/2015/05/21/rnn-effectiveness/

SLIDE 76

Neural networks are prediction machines

[Figure: input β†’ neural network β†’ prediction; e.g., images labeled β€œcat” and β€œburrito”.]

We can assign labels to inputs. But what if the label of an input depends on a previous state of the network? Vanilla neural networks:

  β€’ 1. Do not have persistent memory
  β€’ 2. Cannot deal with varying-sized inputs

SLIDE 77

Sequential prediction: Examples

  β€’ Language models: β€œIt was a dark and stormy _______”
– Constructing sentences automatically requires us to remember what we constructed before
  β€’ Speech recognition
– Convert a sequence of audio signals to words
– The word at time t may depend on what word was predicted at time (t-1)
  β€’ Event extraction from movies
– Watch a movie and predict what events are happening
– The events at a particular scene probably depend on both the video signal and the events that were predicted in the previous scene
  β€’ … Many more examples

SLIDE 78

Recurrent Neural Networks: Networks with β€œloops”

[Figure: sequential input, sequential output, recurrent connections.] The same template is repeated over time.

SLIDE 79

Various configurations possible

[Figure: vanilla networks; sequence output (eg: image captioning); sequence input (eg: sentiment analysis); seq2seq (eg: translation).]

SLIDE 80

Insides of an RNN

Each recurrent neuron maintains a state vector h that it updates. Forward pass:

  β€’ 1. Accept input $\mathbf{x}_t$
  β€’ 2. Update the state: $\mathbf{h}_{t+1} = \text{activation}(W^{h}\, \mathbf{h}_t + W^{x}\, \mathbf{x}_t)$
  β€’ 3. Produce $\text{output} = \text{activation}(W^{o}\, \mathbf{h}_{t+1})$
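A minimal NumPy sketch of this recurrence, unrolled over a sequence (tanh is chosen as the activation; the weight names follow the reconstruction above):

    import numpy as np

    def rnn_step(h, x, Wh, Wx, Wo):
        h_next = np.tanh(Wh @ h + Wx @ x)  # update the state from old state and input
        output = np.tanh(Wo @ h_next)      # produce an output from the new state
        return h_next, output

    def run_rnn(xs, h0, Wh, Wx, Wo):
        # the same template (the same weights) is repeated at every time step
        h, outputs = h0, []
        for x in xs:
            h, out = rnn_step(h, x, Wh, Wx, Wo)
            outputs.append(out)
        return outputs

    h0 = np.zeros(3)
    Wh, Wx, Wo = np.eye(3) * 0.5, np.ones((3, 2)) * 0.1, np.ones((1, 3))
    outs = run_rnn([np.array([1.0, 0.0]), np.array([0.0, 1.0])], h0, Wh, Wx, Wo)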

SLIDE 81

An example: Character level language model

SLIDE 82

The problem: Vanishing gradient

RNNs are particularly prone to the vanishing gradient problem: they don't seem to be able to learn long-range dependencies [Hochreiter 1991, Bengio et al 1994].

β€œI grew up in France…. I speak ____”

The answer: better control over the memory, via Long Short-Term Memory (LSTM) units.

SLIDE 83

Inside a recurrent neuron

SLIDE 84

Inside a Long Short Term Memory unit

Adds an additional memory to the cell

SLIDE 85

Let us zoom in

Cell state

SLIDE 86

Let us zoom in

The β€œforget gate”: Use the current input to decide what to erase in the cell state

SLIDE 87

Let us zoom in

Create a new cell state and also a filter that decides what part of the newly created cell state should be remembered

SLIDE 88

Let us zoom in

New cell state = remaining part of previous state + newly computed information

SLIDE 89

Let us zoom in

Finally, output = filtered version of the new cell state
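Collecting the five zoom-in steps, the standard LSTM equations (following the formulation in the colah.github.io post linked earlier; $\sigma$ is the sigmoid, $\odot$ the elementwise product, $[h_{t-1}, x_t]$ concatenation):

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \qquad \text{forget gate: what to erase in the cell state}$$

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \quad \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C) \qquad \text{new candidate state and its filter}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad \text{remaining previous state + new information}$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \quad h_t = o_t \odot \tanh(C_t) \qquad \text{output: filtered new cell state}$$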

SLIDE 90

Examples: Generating Shakespeare

https://karpathy.github.io/2015/05/21/rnn-effectiveness/

A three-layer RNN, 512 hidden nodes in each layer; millions of parameters

SLIDE 91

Examples: Generating Audio

https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

SLIDE 92

Predicting sequences

LSTMs are a fundamental unit of recurrent neural networks:

– They are here to stay
– Essential component of sequence-to-sequence models
– Massive in terms of the number of parameters
  β€’ The Google neural language model has billions of parameters
– Several variants exist, but all have a similar flavor
  β€’ Eg: The gated recurrent unit is a simpler variant

SLIDE 93

Summary

  β€’ Neural networks combine expressive scoring functions with distributed input representations
  β€’ Several open questions still remain. Some examples:
– How do we incorporate output dependencies between vector-valued representations?
– Structures offer a clean approach for modeling compositionality. How do we compose distributed representations that are scored with neural networks?
– How do we incorporate inference and domain knowledge within neural networks, perhaps to guide training or for improved predictions?