

slide-1
SLIDE 1

Neural Networks

Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net 2019 CS420 Machine Learning, Lecture 4

http://wnzhang.net/teaching/cs420/index.html

slide-2
SLIDE 2

Breaking News of AI in 2016

  • AlphaGo wins Lee Sedol (4-1)

https://www.goratings.org/ https://deepmind.com/research/alphago/

slide-3
SLIDE 3

Machine Learning in AlphaGo

  • Policy Network
  • Supervised Learning
  • Predict the best next human move
  • Reinforcement Learning
  • Learning to select the next move to maximize the winning rate
  • Value Network
  • Expectation of winning given the board state
  • Implemented by (deep) neural networks

slide-4
SLIDE 4

Neural Networks

  • Neural networks are the basis of deep learning

  • Perceptron
  • Multi-layer Perceptron
  • Convolutional Neural Network
  • Recurrent Neural Network

slide-5
SLIDE 5

Real Neurons

  • Cell structures
  • Cell body
  • Dendrites
  • Axon
  • Synaptic terminals

Slides credit: Ray Mooney

slide-6
SLIDE 6

Neural Communication

  • Electrical potential across the cell membrane exhibits spikes called action potentials.
  • A spike originates in the cell body, travels down the axon, and causes synaptic terminals to release neurotransmitters.
  • The chemical diffuses across the synapse to dendrites of other neurons.
  • Neurotransmitters can be excitatory or inhibitory.
  • If the net input of neurotransmitters to a neuron from other neurons is excitatory and exceeds some threshold, it fires an action potential.

Slides credit: Ray Mooney

slide-7
SLIDE 7

Real Neural Learning

  • Synapses change size and strength with experience.
  • Hebbian learning: When two connected neurons

are firing at the same time, the strength of the synapse between them increases.

  • “Neurons that fire together, wire together.”
  • These motivate the research of artificial neural nets

Slides credit: Ray Mooney

slide-8
SLIDE 8

Brief History of Artificial Neural Nets

  • The First wave
  • 1943 McCulloch and Pitts proposed the McCulloch-Pitts neuron

model

  • 1958 Rosenblatt introduced the simple single layer networks now

called Perceptrons.

  • 1969 Minsky and Papert’s book Perceptrons demonstrated the

limitation of single layer perceptrons, and almost the whole field went into hibernation.

  • The Second wave
  • 1986 The Back-Propagation learning algorithm for Multi-Layer

Perceptrons was rediscovered and the whole field took off again.

  • The Third wave
  • 2006 Deep (neural networks) Learning gains popularity and
  • 2012 made significant break-through in many applications.

Slides credit: Jun Wang

slide-9
SLIDE 9

Artificial Neuron Model

  • Model the network as a graph with cells as nodes and synaptic connections as weighted edges; w_{ji} is the weight of the edge from node i to node j
  • Model the net input to cell j as

    net_j = \sum_i w_{ji} o_i

  • Cell output is (T_j is the threshold for unit j):

    o_j = \begin{cases} 0 & \text{if } net_j < T_j \\ 1 & \text{if } net_j \ge T_j \end{cases}

(Figure: a unit receiving inputs from nodes 2-6 through weights w_{12}, ..., w_{16}, and the step output jumping from 0 to 1 at net_j = T_j)

McCulloch and Pitts [1943]

Slides credit: Ray Mooney

slide-10
SLIDE 10

Perceptron Model

  • Rosenblatt's single layer perceptron [1958]
  • Rosenblatt [1958] further proposed the perceptron as the first model for learning with a teacher (i.e., supervised learning)
  • Focused on how to find appropriate weights w_m for a two-class classification task
  • y = 1: class one
  • y = -1: class two
  • Activation function

    \varphi(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}

  • Prediction

    \hat{y} = \varphi\Big( \sum_{i=1}^{m} w_i x_i + b \Big)
slide-11
SLIDE 11

Training Perceptron

  • Rosenblatt's single layer perceptron [1958]
  • Activation function

    \varphi(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}

  • Prediction

    \hat{y} = \varphi\Big( \sum_{i=1}^{m} w_i x_i + b \Big)

  • Training

    w_i \leftarrow w_i + \eta (y - \hat{y}) x_i, \qquad b \leftarrow b + \eta (y - \hat{y})

  • Equivalent to the rules:
  • If the output is correct, do nothing
  • If the output is too high, lower the weights on active inputs
  • If the output is too low, increase the weights on active inputs
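
A minimal NumPy sketch of this training rule on a toy, linearly separable problem; the data, number of epochs and learning rate are assumptions for illustration, not values from the slides:

```python
import numpy as np

# Toy OR-style data (hypothetical), labels in {+1, -1}
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., 1.])

w = np.zeros(2)   # weights w_i
b = 0.0           # bias b
eta = 0.1         # learning rate

def predict(x):
    # phi(sum_i w_i x_i + b), with phi(z) = 1 if z >= 0 else -1
    return 1.0 if np.dot(w, x) + b >= 0 else -1.0

for epoch in range(20):
    for xi, yi in zip(X, y):
        y_hat = predict(xi)
        # Rosenblatt update: only changes the weights when the prediction is wrong
        w += eta * (yi - y_hat) * xi
        b += eta * (yi - y_hat)

print(w, b, [predict(xi) for xi in X])
```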

slide-12
SLIDE 12

Properties of Perceptron

  • Rosenblatt's single layer perceptron [1958]
  • Rosenblatt proved the convergence of the learning algorithm if the two classes are linearly separable (i.e., the patterns lie on opposite sides of a hyperplane)
  • Many people hoped that such a machine could be the basis for artificial intelligence

(Figure: Class 1 and Class 2 separated by the line w_1 x_1 + w_2 x_2 + b = 0 in the x_1-x_2 plane)

slide-13
SLIDE 13

Properties of Perceptron

  • The XOR problem

    Input x          Output y
    x1     x2        x1 XOR x2
    0      0         0
    0      1         1
    1      0         1
    1      1         0

  • However, Minsky and Papert [1969] showed that some rather elementary computations, such as the XOR problem, could not be done by Rosenblatt's one-layer perceptron
  • Rosenblatt believed the limitations could be overcome if more layers of units were added, but no learning algorithm was known to obtain the weights
  • Due to the lack of learning algorithms, people left the neural network paradigm for almost 20 years

XOR is not linearly separable: the two classes (true and false) cannot be separated using a single line in the x_1-x_2 plane.

slide-14
SLIDE 14
Hidden Layers and Backpropagation (1986~)

  • Adding hidden layer(s) (internal representation) allows the network to learn a mapping that is not constrained by a linearly separable decision boundary x_1 w_1 + x_2 w_2 + b = 0
  • Each hidden node realizes one of the lines bounding the convex region

(Figure: a single unit with weights w_1, w_2 and bias b separating class 1 from class 2 with one line, versus a two-layer network whose hidden units bound a convex region of class 1 surrounded by class 2)

slide-15
SLIDE 15
Hidden Layers and Backpropagation (1986~)

  • But the solution is quite often not unique

(Figure: two different two-layer networks with sign activation functions, solution 1 and solution 2, both solving the XOR problem from the previous slide's truth table; the number in each circle is a threshold, and two lines are necessary to divide the sample space accordingly)

http://www.cs.stir.ac.uk/research/publications/techreps/pdf/TR148.pdf
http://recognize-speech.com/basics/introduction-to-artificial-neural-networks
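
As a concrete check of this idea, here is a minimal sketch of a two-layer network of threshold (sign-style) units that computes XOR; the particular weights and thresholds are one of many valid solutions and are an assumption, not the values from the slide's figure:

```python
def step(z):
    # Threshold activation: 1 if z >= 0 else 0
    return 1.0 if z >= 0 else 0.0

def xor_net(x1, x2):
    # Hidden layer: each unit realizes one of the two separating lines
    h1 = step(x1 + x2 - 0.5)   # fires when at least one input is 1
    h2 = step(x1 + x2 - 1.5)   # fires only when both inputs are 1
    # Output unit combines them: "h1 AND NOT h2"
    return step(h1 - 2.0 * h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_net(a, b)))   # prints the XOR truth table
```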

slide-16
SLIDE 16

Hidden Layers and Backpropagation (1986~)

Two-layer feedforward neural network

  • Feedforward: messages move forward from the input nodes, through the hidden nodes (if any), to the output nodes. There are no cycles or loops in the network

(Figure: input layer, hidden layer and output layer connected by two sets of weight parameters)

slide-17
SLIDE 17

Single / Multiple Layers of Calculation

  • Single-layer function

    f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2, \qquad f_\theta(x) = \sigma(\theta_0 + \theta_1 x + \theta_2 x^2)

  • Multiple-layer function, with non-linear activation functions

    h_1(x) = \tanh(\theta_0 + \theta_1 x + \theta_2 x^2)
    h_2(x) = \tanh(\theta_3 + \theta_4 x + \theta_5 x^2)
    f_\theta(x) = f_\theta(h_1(x), h_2(x)) = \sigma(\theta_6 + \theta_7 h_1 + \theta_8 h_2)

    where \sigma(x) = \frac{1}{1 + e^{-x}}, \quad \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}
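
To make the composition concrete, here is a minimal NumPy sketch of the two-layer function above; the particular θ values are arbitrary placeholders, not values from the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder parameters theta_0 ... theta_8 (assumed for illustration)
theta = np.array([0.1, -0.5, 0.3, 0.2, 0.4, -0.1, 0.0, 1.5, -2.0])

def f_theta(x):
    # Hidden layer: two tanh units over the features (1, x, x^2)
    h1 = np.tanh(theta[0] + theta[1] * x + theta[2] * x**2)
    h2 = np.tanh(theta[3] + theta[4] * x + theta[5] * x**2)
    # Output layer: sigmoid over the hidden activations
    return sigmoid(theta[6] + theta[7] * h1 + theta[8] * h2)

print(f_theta(0.5))
```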
slide-18
SLIDE 18

Non-linear Activation Functions

  • Sigmoid
  • Tanh
  • Rectified Linear Unit (ReLU)

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \tanh(z) = \frac{1 - e^{-2z}}{1 + e^{-2z}}, \qquad \mathrm{ReLU}(z) = \max(0, z)

slide-19
SLIDE 19

Universal Approximation Theorem

  • A feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions
  • on compact subsets of R^n
  • under mild assumptions on the activation function
  • such as Sigmoid, Tanh and ReLU

[Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators." Neural Networks 2.5 (1989): 359-366.]

slide-20
SLIDE 20

Universal Approximation

  • A multi-layer perceptron can approximate any continuous function on a compact subset of R^n

(Figure: illustrations of approximating functions with combinations of sigmoid and tanh units, where \sigma(x) = \frac{1}{1 + e^{-x}} and \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}})

slide-21
SLIDE 21
Hidden Layers and Backpropagation (1986~)

  • One of the efficient algorithms for multi-layer neural networks is the Backpropagation algorithm
  • It was re-introduced in 1986 and neural networks regained their popularity

Note: backpropagation appears to have first been found by Werbos [1974], and was then independently rediscovered around 1985 by Rumelhart, Hinton, and Williams [1986] and by Parker [1985]

(Figure: forward pass through the weight parameters, error calculation at the output, and error backpropagation through the network)

slide-22
SLIDE 22

Learning NN by Back-Propagation

  • Compare outputs with the correct answer to get the error [LeCun, Bengio and Hinton. Deep Learning. Nature 2015.]

    \frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial z_k}\frac{\partial z_k}{\partial w_{jk}} = \frac{\partial E}{\partial z_k}\, y_j

slide-23
SLIDE 23

Learning NN by Back-Propagation

(Figure: training instances labeled "face" / "no face" are fed into a two-layer network with inputs x_1, x_2, ..., x_m and outputs y_0, y_1, with targets d_1 = 1 and d_2 = 0; the error is calculated at the outputs and back-propagated through the weight parameters)

slide-24
SLIDE 24

Make a Prediction

(Figure: a two-layer feedforward neural network with input layer x_1, x_2, ..., x_m, hidden layer units h^{(1)}_1, h^{(1)}_2, ..., h^{(1)}_j, output layer units y_1, ..., y_k and target labels d_1, ..., d_k; the first-layer weights are w^{(1)}_{j,m} and the second-layer weights are w^{(2)}_{k,j})

Feed-forward prediction for an input x = (x_1, \ldots, x_m):

    h^{(1)}_j = f^{(1)}(net^{(1)}_j) = f^{(1)}\Big( \sum_m w^{(1)}_{j,m} x_m \Big)

    y_k = f^{(2)}(net^{(2)}_k) = f^{(2)}\Big( \sum_j w^{(2)}_{k,j} h^{(1)}_j \Big)

where

    net^{(1)}_j = \sum_m w^{(1)}_{j,m} x_m, \qquad net^{(2)}_k = \sum_j w^{(2)}_{k,j} h^{(1)}_j
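
A minimal NumPy sketch of this forward pass; the layer sizes, the choice of sigmoid for both f^{(1)} and f^{(2)}, and the random weights are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

m, J, K = 4, 3, 2                      # input, hidden and output sizes (assumed)
W1 = rng.normal(size=(J, m))           # first-layer weights  w^(1)_{j,m}
W2 = rng.normal(size=(K, J))           # second-layer weights w^(2)_{k,j}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    net1 = W1 @ x                      # net^(1)_j = sum_m w^(1)_{j,m} x_m
    h1 = sigmoid(net1)                 # h^(1)_j = f^(1)(net^(1)_j)
    net2 = W2 @ h1                     # net^(2)_k = sum_j w^(2)_{k,j} h^(1)_j
    y = sigmoid(net2)                  # y_k = f^(2)(net^(2)_k)
    return y, h1, net1, net2

x = np.array([1.0, 0.5, -0.3, 0.8])    # example input (assumed)
y, h1, net1, net2 = forward(x)
print(y)
```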

slide-25
SLIDE 25

Make a Prediction

(Repeated from the previous slide: the same two-layer network and feed-forward prediction equations.)

slide-26
SLIDE 26

Make a Prediction

(Repeated from the previous slides: the same two-layer network and feed-forward prediction equations.)

slide-27
SLIDE 27

When Backprop/Learn Parameters

(Figure: the same two-layer feedforward network, with the output error d_k - y_k propagated backwards)

Backprop to learn the parameters. The squared error is

    E(W) = \frac{1}{2} \sum_k (y_k - d_k)^2

Notation:

    net^{(1)}_j = \sum_m w^{(1)}_{j,m} x_m, \qquad net^{(2)}_k = \sum_j w^{(2)}_{k,j} h^{(1)}_j, \qquad \delta_k = (d_k - y_k) f'^{(2)}(net^{(2)}_k)

Update of the second-layer weights:

    \Delta w^{(2)}_{k,j} = -\eta \frac{\partial E(W)}{\partial w^{(2)}_{k,j}} = -\eta (y_k - d_k) \frac{\partial y_k}{\partial net^{(2)}_k} \frac{\partial net^{(2)}_k}{\partial w^{(2)}_{k,j}} = \eta (d_k - y_k) f'^{(2)}(net^{(2)}_k) h^{(1)}_j = \eta \delta_k h^{(1)}_j

i.e. \Delta w^{(2)}_{k,j} = \eta \cdot \mathrm{Error}_k \cdot \mathrm{Output}_j = \eta \delta_k h^{(1)}_j, and

    w^{(2)}_{k,j} \leftarrow w^{(2)}_{k,j} + \Delta w^{(2)}_{k,j}

slide-28
SLIDE 28

When Backprop/Learn Parameters

(Figure: the same two-layer feedforward network; the output error is propagated back to the first-layer weights)

Notation as before:

    net^{(1)}_j = \sum_m w^{(1)}_{j,m} x_m, \qquad net^{(2)}_k = \sum_j w^{(2)}_{k,j} h^{(1)}_j, \qquad \delta_k = (d_k - y_k) f'^{(2)}(net^{(2)}_k), \qquad E(W) = \frac{1}{2} \sum_k (y_k - d_k)^2

Update of the first-layer weights:

    \Delta w^{(1)}_{j,m} = -\eta \frac{\partial E(W)}{\partial w^{(1)}_{j,m}} = -\eta \frac{\partial E(W)}{\partial h^{(1)}_j} \frac{\partial h^{(1)}_j}{\partial w^{(1)}_{j,m}} = \eta \sum_k (d_k - y_k) f'^{(2)}(net^{(2)}_k) w^{(2)}_{k,j} f'^{(1)}(net^{(1)}_j) x_m = \eta \delta_j x_m

where \delta_j = f'^{(1)}(net^{(1)}_j) \sum_k \delta_k w^{(2)}_{k,j}, so that

    \Delta w^{(1)}_{j,m} = \eta \cdot \mathrm{Error}_j \cdot \mathrm{Output}_m = \eta \delta_j x_m, \qquad w^{(1)}_{j,m} \leftarrow w^{(1)}_{j,m} + \Delta w^{(1)}_{j,m}
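
A minimal NumPy sketch of both weight updates together, assuming sigmoid activations for both layers (so f'(net) = f(net)(1 - f(net))); the layer sizes, learning rate and the single toy training example are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, J, K = 4, 3, 2                        # layer sizes (assumed)
W1 = rng.normal(scale=0.5, size=(J, m))  # w^(1)_{j,m}
W2 = rng.normal(scale=0.5, size=(K, J))  # w^(2)_{k,j}
eta = 0.5

x = np.array([1.0, 0.5, -0.3, 0.8])      # one toy training input
d = np.array([1.0, 0.0])                 # its target labels

for step in range(100):
    # Forward pass
    net1 = W1 @ x;  h1 = sigmoid(net1)
    net2 = W2 @ h1; y  = sigmoid(net2)

    # Backward pass
    delta_k = (d - y) * y * (1.0 - y)             # delta_k = (d_k - y_k) f'(net^(2)_k)
    delta_j = (W2.T @ delta_k) * h1 * (1.0 - h1)  # delta_j = f'(net^(1)_j) sum_k delta_k w^(2)_{k,j}

    # Weight updates: Delta w = eta * delta * (input to that weight)
    W2 += eta * np.outer(delta_k, h1)
    W1 += eta * np.outer(delta_j, x)

print(y)  # the outputs should have moved towards the targets d
```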

slide-29
SLIDE 29

An example for Backprop

https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf

slide-30
SLIDE 30

An example for Backprop

Consider the sigmoid activation function

    f_{Sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad f'_{Sigmoid}(x) = f_{Sigmoid}(x)\,(1 - f_{Sigmoid}(x))

Then the backprop updates become

    \delta_k = (d_k - y_k) f'^{(2)}(net^{(2)}_k), \qquad \Delta w^{(2)}_{k,j} = \eta \cdot \mathrm{Error}_k \cdot \mathrm{Output}_j = \eta \delta_k h^{(1)}_j

    \delta_j = f'^{(1)}(net^{(1)}_j) \sum_k \delta_k w^{(2)}_{k,j}, \qquad \Delta w^{(1)}_{j,m} = \eta \cdot \mathrm{Error}_j \cdot \mathrm{Output}_m = \eta \delta_j x_m

https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf

slide-31
SLIDE 31

Let us do some calculation

https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf

Consider the simple network below (figure not shown). Assume that the neurons have a sigmoid activation function, and:
 1. Perform a forward pass on the network
 2. Perform a reverse pass (training) once (target = 0.5)
 3. Perform a further forward pass and comment on the result
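
The slide's figure (with its specific inputs and weights) is not reproduced here, but the three steps can be sketched on an assumed tiny 2-2-1 sigmoid network; all numbers below are placeholders, not necessarily the values from the handout:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed 2-2-1 network (inputs and weights are placeholders)
x  = np.array([0.35, 0.9])
W1 = np.array([[0.1, 0.8],
               [0.4, 0.6]])
w2 = np.array([0.3, 0.9])
target, eta = 0.5, 1.0

# 1. Forward pass
h = sigmoid(W1 @ x)
y = sigmoid(w2 @ h)
print("first output:", y)

# 2. Reverse pass (one backprop step on the squared error, target = 0.5)
delta_out = (target - y) * y * (1 - y)
delta_hid = h * (1 - h) * w2 * delta_out
w2 += eta * delta_out * h
W1 += eta * np.outer(delta_hid, x)

# 3. Second forward pass: the output should have moved closer to the target
h = sigmoid(W1 @ x)
y = sigmoid(w2 @ h)
print("output after one update:", y)
```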

slide-32
SLIDE 32

Let us do some calculation

https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf

slide-33
SLIDE 33

A demo from Google

http://playground.tensorflow.org/

slide-34
SLIDE 34

Non-linear Activation Functions

  • Sigmoid
  • Tanh
  • Rectified Linear Unit (ReLU)

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \tanh(z) = \frac{1 - e^{-2z}}{1 + e^{-2z}}, \qquad \mathrm{ReLU}(z) = \max(0, z)

slide-35
SLIDE 35

Activation functions

https://theclevermachine.wordpress.com/tag/tanh-function/

(Figure: plots of f_linear, f_Sigmoid and f_tanh together with their derivatives f'_linear, f'_Sigmoid and f'_tanh)

slide-36
SLIDE 36

Activation functions

  • Logistic Sigmoid:

    f_{Sigmoid}(x) = \frac{1}{1 + e^{-x}}

  • Output range [0, 1]
  • Motivated by biological neurons; can be interpreted as the probability of an artificial neuron "firing" given its inputs
  • However, saturated neurons make gradients vanish (why?)
  • Its derivative:

    f'_{Sigmoid}(x) = f_{Sigmoid}(x)\,(1 - f_{Sigmoid}(x))

slide-37
SLIDE 37

Activation functions

  • Tanh function

    f_{tanh}(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

  • Output range [-1, 1]
  • Thus strongly negative inputs to the tanh map to negative outputs
  • Only zero-valued inputs are mapped to near-zero outputs
  • These properties make the network less likely to get "stuck" during training
  • Its gradient:

    f'_{tanh}(x) = 1 - f_{tanh}(x)^2

https://theclevermachine.wordpress.com/tag/tanh-function/

slide-38
SLIDE 38

Activation Functions

  • ReLU (rectified linear unit)

    f_{ReLU}(x) = \max(0, x)

  • Another version is the Noisy ReLU:

    f_{NoisyReLU}(x) = \max(0, x + N(0, \delta(x)))

  • ReLU can be approximated by the softplus function

    f_{Softplus}(x) = \log(1 + e^{x})

  • The ReLU gradient doesn't vanish as we increase x
  • It can be used to model positive numbers
  • It is fast, as there is no need to compute the exponential function
  • It eliminates the necessity of a "pretraining" phase
  • The derivative:

    f'_{ReLU}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40811.pdf

slide-39
SLIDE 39

Activation Functions

  • ReLU (rectified linear unit): f_{ReLU}(x) = \max(0, x), which can be approximated by the softplus function f_{Softplus}(x) = \log(1 + e^{x})
  • The only non-linearity comes from the path selection, with individual neurons being active or not
  • It allows sparse representations: for a given input, only a subset of neurons are active (sparse propagation of activations and gradients)
  • Additional activation functions: Leaky ReLU, Exponential LU, Maxout, etc.

http://www.jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf
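
A short NumPy sketch of the activation functions discussed on the last few slides and their derivatives; the noise scale in the noisy ReLU is an arbitrary placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # f'_Sigmoid = f_Sigmoid (1 - f_Sigmoid)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # f'_tanh = 1 - f_tanh^2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(float)  # 1 if x > 0 else 0

def softplus(x):
    return np.log1p(np.exp(x))    # smooth approximation of ReLU

def noisy_relu(x, scale=0.1):     # noise scale is a placeholder
    return np.maximum(0.0, x + rng.normal(0.0, scale, size=np.shape(x)))

z = np.linspace(-3.0, 3.0, 7)
print(relu(z), d_relu(z))
print(softplus(z))
print(noisy_relu(z))
```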

slide-40
SLIDE 40

Error/Loss function

  • Recall stochastic gradient descent
  • Update from a randomly picked example (but in practice do a batch update)

    w \leftarrow w - \eta \frac{\partial L(w)}{\partial w}

  • Squared error loss for one binary output:

    L(w) = \frac{1}{2}\big(y - f_w(x)\big)^2

(Figure: input x, network output f_w(x))
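
A minimal sketch of this SGD step for a single sigmoid output unit f_w(x) = σ(wᵀx); the toy data and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(w, x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))   # f_w(x) = sigma(w^T x)

X = rng.normal(size=(100, 3))                    # toy inputs (assumed)
y = (X[:, 0] + X[:, 1] > 0).astype(float)        # toy binary targets (assumed)

w, eta = np.zeros(3), 0.1
for epoch in range(10):
    for i in rng.permutation(len(X)):            # randomly picked examples
        out = f(w, X[i])
        # dL/dw for L = 0.5 * (y - f_w(x))^2 with a sigmoid output
        grad = -(y[i] - out) * out * (1.0 - out) * X[i]
        w -= eta * grad                          # w <- w - eta * dL/dw
print(w)
```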

slide-41
SLIDE 41

Error/Loss function

  • Softmax (cross-entropy loss) for multiple classes:

    L(w) = -\sum_k \big( d_k \log \hat{y}_k + (1 - d_k) \log(1 - \hat{y}_k) \big)

    where

    \hat{y}_k = \frac{\exp\big( \sum_j w^{(2)}_{k,j} h^{(1)}_j \big)}{\sum_{k'} \exp\big( \sum_j w^{(2)}_{k',j} h^{(1)}_j \big)}

  • d_k are one-hot encoded class labels (class labels follow a multinomial distribution)

(Figure: the same two-layer network, with the softmax applied over the output layer)
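
A small NumPy sketch of the softmax output and the cross-entropy loss above; the pre-activation values are placeholders:

```python
import numpy as np

def softmax(net):
    # Subtract the max for numerical stability; does not change the result
    e = np.exp(net - np.max(net))
    return e / e.sum()

# Placeholder second-layer pre-activations net^(2)_k = sum_j w^(2)_{k,j} h^(1)_j
net2 = np.array([2.0, 0.5, -1.0])
y_hat = softmax(net2)

d = np.array([1.0, 0.0, 0.0])   # one-hot encoded class label
loss = -np.sum(d * np.log(y_hat) + (1 - d) * np.log(1 - y_hat))
print(y_hat, loss)
```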
slide-42
SLIDE 42

Advanced Topic of this Lecture

Deep Learning

As a prologue of the DL Course in the next semester

slide-43
SLIDE 43

What is Deep Learning

  • Deep learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.
  • Mostly implemented via neural networks

[LeCun, Bengio and Hinton. Deep Learning. Nature 2015.]

slide-44
SLIDE 44

Deep Neural Network (DNN)

  • Multi-layer perceptron with many hidden layers
slide-45
SLIDE 45

Difficulty of Training Deep Nets

  • Lack of big data
  • Now we have a lot of big data
  • Lack of computational resources
  • Now we have GPUs and HPCs
  • Easy to get into a (bad) local minimum
  • Now we use pre-training techniques & various optimization algorithms
  • Gradient vanishing
  • Now we use ReLU
  • Regularization
  • Now we use Dropout
slide-46
SLIDE 46

Dropout

  • Dropout randomly ‘drops’ units from a layer on each training step, creating ‘sub-architectures’ within the model
  • It can be viewed as a type of sampling of a smaller network within a larger network
  • It prevents neural networks from overfitting

Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
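
A minimal sketch of (inverted) dropout applied to a layer's activations during training; the drop probability 0.5 is a typical but assumed value:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, training=True):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    if not training:
        return h                          # no dropout at test time
    mask = rng.random(h.shape) >= p_drop  # keep each unit with prob. 1 - p_drop
    return h * mask / (1.0 - p_drop)      # rescale so the expected activation is unchanged

h = np.array([0.2, 1.3, -0.7, 0.9, 0.4])
print(dropout(h))                    # training step: some units are dropped
print(dropout(h, training=False))    # test time: unchanged
```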

slide-47
SLIDE 47

Convolutional neural networks: Receptive field

  • Receptive field: neurons in the retina respond to light stimulus in restricted regions of the visual field
  • Animal experiments on receptive fields of two retinal ganglion cells
  • Fields are circular areas of the retina
  • The cell (upper part) responds when the center is illuminated and the surround is darkened
  • The cell (lower part) responds when the center is darkened and the surround is illuminated
  • Both cells give on- and off-responses when both center and surround are illuminated, but neither response is as strong as when only the center or the surround is illuminated

Contributed by Hubel and Wiesel for their studies of the visual system of a cat
Hubel D.H.: The Visual Cortex of the Brain. Sci Amer 209:54-62, 1963

(Figure: "On"-center field and "Off"-center field responses when the light is on)

slide-48
SLIDE 48

Convolutional neural networks

  • Sparse connectivity by local correlation
  • Filter: the inputs of a hidden unit in layer m come from a subset of units in layer m-1 that have spatially connected receptive fields
  • Shared weights
  • Each filter is replicated across the entire visual field. These replicated units share the same weights and form a feature map.

(Figure: 1-d and 2-d cases of one filter at layer m connected to layer m-1; edges that have the same color have the same weight, and in the 2-d case the subscripts are weights)

http://deeplearning.net/tutorial/lenet.html

slide-49
SLIDE 49

Convolutional Neural Network (CNN)

[Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11) 1998]

slide-50
SLIDE 50

Convolution Layer

  • Example: a 10x10 input image convolved with a 3x3 filter results in an 8x8 output image

(Figure: convolution of the 10x10 input image with the 3x3 kernel f, producing the 8x8 output)

slide-51
SLIDE 51
Convolution Layer

  • Example: a 10x10 input image convolved with a 3x3 filter results in an 8x8 output image
  • 3 different filters (with different weights) lead to three 8x8 output feature maps

(Figure: the 10x10 input image, three 3x3 kernels each followed by the activation function f, and the resulting three 8x8 feature maps)
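
A naive NumPy sketch of a "valid" 2-D convolution, illustrating why a 10x10 image and a 3x3 kernel give an 8x8 output (10 - 3 + 1 = 8); the random image and kernel are placeholders:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' convolution (really cross-correlation, as used in most CNNs)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))   # 10 - 3 + 1 = 8 in each dimension
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((10, 10))
kernel = rng.random((3, 3))
print(conv2d_valid(image, kernel).shape)   # (8, 8)
```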

slide-52
SLIDE 52
Pooling / Subsampling Layer

  • Pooling: partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum or average value
  • Max pooling: the maximum in a 2x2 filter
  • Average pooling: the average in a 2x2 filter

Max pooling
  • reduces computation and
  • is a way of taking the most responsive node of the given interest region,
  • but may result in loss of accurate spatial information
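
A short NumPy sketch of 2x2 non-overlapping max and average pooling on a feature map; assuming even height and width keeps the reshaping simple:

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 non-overlapping pooling; assumes even height and width."""
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)   # group into 2x2 sub-regions
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))            # average pooling

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(x, "max"))    # 2x2 map of maxima
print(pool2x2(x, "avg"))    # 2x2 map of averages
```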

slide-53
SLIDE 53

Use Case: Face Recognition

slide-54
SLIDE 54

Use Case: Digits Recognition

  • MNIST (handwritten digits) Dataset:
  • 60k training and 10k test examples
  • Test error rate 0.95%

(Figure: the 82 test-set errors made by LeNet-5; for each image the correct answer is on the left and the machine's answer is on the right)

(Figure: the LeNet-5 architecture with layers C1, S2, C3, S4, C5, F6 and the output layer)

  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998

http://yann.lecun.com/exdb/mnist/

slide-55
SLIDE 55

More General Image Recognition

  • ImageNet
  • Over 15M labeled high-resolution images
  • Roughly 22K categories
  • Collected from the web and labeled by Amazon Mechanical Turk
  • The image/scene classification challenge
  • Metric: Hit@5 error rate, i.e. make 5 guesses about the image label

http://cognitiveseo.com/blog/6511/will-google-read-rank-images-near-future/

Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.

slide-56
SLIDE 56

Leadertable (ImageNet image classification)

Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.

  • Unofficial human error is around 5.1% on a subset
  • Why is there still human error? When labeling, human raters judged whether an image belongs to a given class (binary classification), while the challenge is a 1000-class classification problem

(Leaderboard excerpt: U. of Toronto's SuperVision, a 7-layer network; GoogLeNet, a 22-layer network; Microsoft ResNet (ILSVRC'15, 2015), a 152-layer network, with 3.57% error)

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

slide-57
SLIDE 57

Use Case: Text Classification

  • Word embedding: map each word to a k-dimensional dense vector
  • CNN kernel: an n x k matrix to explore patterns over n neighboring words
  • Max-over-time pooling: find the most salient pattern from the text for each kernel
  • MLP: further feature interaction and distilling of high-level patterns

[Kim, Y. 2014. Convolutional neural networks for sentence classification. EMNLP 2014.]

slide-58
SLIDE 58

Recurrent Neural Network (RNN)

  • To model sequential data
  • Text
  • Time series
  • Trained by Back-Propagation Through Time (BPTT)

Two-layer feedforward network:

    s = f(xU), \qquad o = f(sV)

where x is the input vector, o the output vector, s the hidden state vector, U the layer-1 parameter matrix, V the layer-2 parameter matrix, and f is tanh or ReLU.

Adding time-dependency of the hidden state s, with state transition parameter matrix W:

    s_{t+1} = f(x_{t+1} U + s_t W), \qquad o_{t+1} = f(s_{t+1} V)

[http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/]
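
A minimal NumPy sketch of this recurrent update, unrolled over a short input sequence; the dimensions, sequence length and random parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hid, d_out, T = 3, 5, 2, 4               # assumed dimensions and sequence length
U = rng.normal(scale=0.5, size=(d_in, d_hid))    # input-to-hidden
W = rng.normal(scale=0.5, size=(d_hid, d_hid))   # hidden-to-hidden (state transition)
V = rng.normal(scale=0.5, size=(d_hid, d_out))   # hidden-to-output

xs = rng.normal(size=(T, d_in))                  # an input sequence x_1, ..., x_T
s = np.zeros(d_hid)                              # initial hidden state

for x_t in xs:
    s = np.tanh(x_t @ U + s @ W)                 # s_{t+1} = f(x_{t+1} U + s_t W)
    o = np.tanh(s @ V)                           # o_{t+1} = f(s_{t+1} V)
    print(o)
```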
slide-59
SLIDE 59

Different RNNs

  • Different architecture for various tasks
  • Strongly recommend Andrej Karpathy’s blog
  • http://karpathy.github.io/2015/05/21/rnn-effectiveness/

(Figure: RNN architectures for different tasks: vanilla NN, image captioning, text generation, text classification, sentiment analysis, machine translation, dialogue system, stock price estimation, video frame classification)

slide-60
SLIDE 60

Use Case: Language Model

  • Word-level or even character-level language model
  • Given previous words/characters, predict the next

[http://karpathy.github.io/2015/05/21/rnn-effectiveness/]

slide-61
SLIDE 61

Use Case: Machine Translation

  • Encode/decode RNN
  • First, encode the input sentence (into a vector e.g. h3)
  • Then decode the vector into the sentence in another

language

[http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/]

slide-62
SLIDE 62

Problem of RNN

Gap dependency [http://colah.github.io/posts/2015-08-Understanding-LSTMs/]

  • Problem: RNN cannot nicely leverage the early information

Long-term dependency

slide-63
SLIDE 63

Long Short-Term Memory (LSTM)

[http://colah.github.io/posts/2015-08-Understanding-LSTMs/]

[Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.]

slide-64
SLIDE 64

LSTM Cell

  • An LSTM cell learns to decide what to remember and what to forget

SRN cell (simple recurrent network):

    s_t = \tanh(x_t U + s_{t-1} W)

An LSTM cell adds an input gate i, a forget gate f and an output gate o_t, together with a "candidate" hidden state, a cell internal memory c_t and a hidden state s_t; \sigma is the sigmoid (a control signal between 0 and 1) and \circ denotes elementwise multiplication.

(Figure: the SRN cell mapping (s_{t-1}, x_t) to s_t, versus the LSTM cell mapping (s_{t-1}, c_{t-1}, x_t) to (s_t, c_t, o_t) through the three gates)

[http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
[Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.]
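
The slide only shows the cell diagram; the standard LSTM update equations (written in the same U/W notation as the SRN cell, with biases omitted) can be sketched as follows. The dimensions and random parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 3, 4                                   # assumed dimensions

def mats():
    # One input-to-hidden and one hidden-to-hidden matrix per gate
    return (rng.normal(scale=0.5, size=(d_in, d_hid)),
            rng.normal(scale=0.5, size=(d_hid, d_hid)))

Ui, Wi = mats(); Uf, Wf = mats(); Uo, Wo = mats(); Ug, Wg = mats()

def lstm_cell(x_t, s_prev, c_prev):
    i = sigmoid(x_t @ Ui + s_prev @ Wi)              # input gate
    f = sigmoid(x_t @ Uf + s_prev @ Wf)              # forget gate
    o = sigmoid(x_t @ Uo + s_prev @ Wo)              # output gate
    g = np.tanh(x_t @ Ug + s_prev @ Wg)              # "candidate" hidden state
    c = f * c_prev + i * g                           # cell internal memory
    s = o * np.tanh(c)                               # hidden state
    return s, c

s, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):               # a toy sequence of length 5
    s, c = lstm_cell(x_t, s, c)
print(s)
```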

slide-65
SLIDE 65

Use Case: Text Generation

  • A demo on character-level text generation
  • http://cs.stanford.edu/people/karpathy/recurrentjs/

(Figure: an LSTM reads the input "<START> I love machinelearning really" and is trained to output "I love machinelearning really <END>", one token at a time)

slide-66
SLIDE 66

Use Case: Named Entity Recognition

[Guillaume Lample et al. Neural Architectures for Named Entity Recognition. NAACL-HLT]

slide-67
SLIDE 67

Word embedding

  • From bag of words to word embedding
  • Use a real-valued vector in R^m to represent a word (concept), e.g.

    v("cat") = (0.2, -0.4, 0.7, ...)
    v("mat") = (0.0, 0.6, -0.1, ...)

  • Continuous bag of words (CBOW) model (word2vec)
  • Input/output words x/y are one-hot encoded
  • The hidden layer is shared for all input words; the hidden nodes give the N-dimensional vector representation of a word

(Figure: the CBOW architecture with its hidden-node computation, cross-entropy loss and gradient updates. V: vocabulary size; C: number of input words; v: row vector of the input matrix W; v': row vector of the output matrix W')

Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).

slide-68
SLIDE 68

Remarkable properties from Word embedding

  • Simple algebraic operations with the word vectors:

    v("woman") − v("man") ≃ v("aunt") − v("uncle")
    v("woman") − v("man") ≃ v("queen") − v("king")

(Figure: vector offsets for the gender relation and for the singular/plural relation between two words)

  • A word relationship is defined by subtracting two word vectors, and the result is added to another word. Thus, for example, Paris − France + Italy = Rome. Using X = v("biggest") − v("big") + v("small") as the query and searching for the nearest word based on cosine distance results in v("smallest")

Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. "Linguistic Regularities in Continuous Space Word Representations." HLT-NAACL. 2013.
Zou, Will Y., et al. "Bilingual Word Embeddings for Phrase-Based Machine Translation." EMNLP. 2013.
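
A minimal sketch of such an analogy query by cosine distance over a toy embedding table; the 3-dimensional vectors are made-up placeholders, not real word2vec embeddings:

```python
import numpy as np

# Toy, made-up embeddings (real word2vec vectors would have ~100-300 dimensions)
emb = {
    "big":      np.array([0.9, 0.1, 0.0]),
    "biggest":  np.array([0.9, 0.1, 0.8]),
    "small":    np.array([-0.7, 0.2, 0.0]),
    "smallest": np.array([-0.7, 0.2, 0.8]),
}

def nearest(query, exclude):
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Return the vocabulary word (outside the query words) closest to the query vector
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], query))

# X = v("biggest") - v("big") + v("small"), then find the nearest remaining word
X = emb["biggest"] - emb["big"] + emb["small"]
print(nearest(X, exclude={"biggest", "big", "small"}))   # -> "smallest"
```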

slide-69
SLIDE 69

Neural Language models

  • n-gram model
  • Construct conditional probabilities for the next word, given combinations of the last n-1 words (contexts)
  • Neural language model
  • associate with each word a distributed word feature vector (word embedding),
  • express the joint probability function of word sequences using those vectors, and
  • learn simultaneously the word feature vectors and the parameters of that probability function

(Figure: Bengio's neural probabilistic language model: the previous words w_{t-n+1}, ..., w_{t-2}, w_{t-1} are mapped through a shared look-up table C to feature vectors C(w_{t-n+1}), ..., C(w_{t-1}), fed through a tanh layer and a softmax across words, where the i-th output is P(w_t = i | context); most of the computation happens in the softmax layer)

Bengio, Yoshua, et al. "Neural probabilistic language models." Innovations in Machine Learning. Springer Berlin Heidelberg, 2006. 137-186.

slide-70
SLIDE 70

RNN based Language models

Elman J L. Finding structure in time[J]. Cognitive science, 1990, 14(2): 179-211. Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.

  • The limitation of the feedforward network approach:
  • it has to fix the context length
  • Recurrent networks solve this issue
  • by keeping a (hidden) context and updating it over time

Elman's RNN LM:

    x(t) = [w(t), s(t-1)]

  • x(t) is the input vector, formed by concatenating the vector w(t) representing the current word and the hidden state s at time t-1; w(t) is the one-hot encoding of a word
  • s(t) is the state of the network (the hidden layer), with a sigmoid for the hidden layer
  • The output is denoted y(t), with a softmax for the output layer

slide-71
SLIDE 71

Learning to align visual and language data

Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  • Regional CNN + Bi-directional RNN

– associates the two modalities through a common, multimodal embedding space

slide-72
SLIDE 72

Learning to generate image descriptions

Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  • Trained CNN on images + RNN with sentence

– The RNN takes a word, the previous context and defines a distribution over the next word – The RNN is conditioned on the image information at the first time step – START and END are special tokens.

slide-73
SLIDE 73

Summary

  • Universal approximation: two-layer neural networks can approximate any continuous function
  • Backpropagation is the most important training scheme for multi-layer neural networks so far
  • Deep learning, i.e. deep architectures of NNs trained with big data, works incredibly well
  • Neural networks combined with other machine learning models achieve further success

slide-74
SLIDE 74

Reference Materials

  • Prof. Geoffrey Hinton's Coursera course
  • https://www.coursera.org/learn/neural-networks
  • Prof. Jun Wang’s DL tutorial in UCL (special thanks)
  • http://www.slideshare.net/JunWang5/deep-learning-61493694
  • Prof. Fei-fei Li’s CS231n in Stanford
  • http://cs231n.stanford.edu/
  • Prof. Kai Yu’s DL Course in SJTU
  • http://speechlab.sjtu.edu.cn/~kyu/node/10
  • Michael Nielsen’s online DL book
  • http://neuralnetworksanddeeplearning.com/
  • Research Blogs
  • Andrej Karpathy: http://karpathy.github.io/
  • Christopher Olah: http://colah.github.io/