

SLIDE 1

Neural Networks: Multi-Layer Networks & Back-Propagation

  • M. Soleymani

Artificial Intelligence, Sharif University of Technology, Spring 2019. Most slides have been adapted from Bhiksha Raj, 11-785, CMU 2019, and some from Fei-Fei Li et al., cs231n, Stanford 2017.

SLIDE 2

Reasons to study neural computation

  • Neuroscience: To understand how the brain actually works.

– It's very big, very complicated, and made of stuff that dies when you poke it around, so we need to use computer simulations.

  • AI: To solve practical problems by using novel learning algorithms inspired by the brain.

– Learning algorithms can be very useful even if they are not how the brain actually works.

SLIDE 3

SLIDE 4

A typical cortical neuron

  • Gross physical structure:

– There is one axon that branches.
– There is a dendritic tree that collects input from other neurons.

  • Axons typically contact dendritic trees at synapses.

– A spike of activity in the axon causes charge to be injected into the post-synaptic neuron.

  • Spike generation:

– There is an axon hillock that generates outgoing spikes whenever enough charge has flowed in at synapses to depolarize the cell membrane.

SLIDE 5

Binary threshold neurons

  • McCulloch-Pitts (1943): influenced von Neumann.

– First compute a weighted sum of the inputs.
– Send out a spike of activity if the weighted sum exceeds a threshold.
– McCulloch and Pitts thought that each spike is like the truth value of a proposition, and each neuron combines truth values to compute the truth value of another proposition!

  • The unit computes g(Σ_i w_i x_i) over inputs x_1, …, x_N, where g is the activation function.

SLIDE 6

A better figure

  • A threshold unit

– "Fires" if the weighted sum of the inputs minus the "bias" T is positive:

z = Σ_i w_i x_i − T
y = 1 if z ≥ 0, else 0
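As a minimal sketch of the threshold unit above (the function and variable names are mine, not from the slides):

```python
import numpy as np

def threshold_unit(x, w, T):
    """McCulloch-Pitts threshold unit: fires iff the weighted sum reaches T."""
    z = np.dot(w, x) - T
    return 1 if z >= 0 else 0

# With weights of 1 and threshold 1.5, the unit behaves as an AND gate.
w = np.array([1.0, 1.0])
print(threshold_unit(np.array([1, 1]), w, 1.5))  # 1
print(threshold_unit(np.array([1, 0]), w, 1.5))  # 0
```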

SLIDE 7

McCulloch-Pitts neuron: binary threshold

z = 1 if Σ_i w_i x_i ≥ T, else 0    (T: activation threshold)

Equivalently, add an input fixed at 1 with bias weight b = −T:

z = 1 if Σ_i w_i x_i + b ≥ 0, else 0

This is equivalent to the binary McCulloch-Pitts neuron.

SLIDE 8

Neural nets and the brain

  • Neural nets are composed of networks of computational models of neurons called perceptrons.

SLIDE 9

The perceptron

  • A threshold unit

– "Fires" if the weighted sum of inputs exceeds a threshold.
– Electrical engineers call this a threshold gate.

  • A basic unit of Boolean circuits.

y = 1 if Σ_i w_i x_i ≥ T, else 0

SLIDE 10

The "soft" perceptron (logistic)

z = Σ_i w_i x_i − T
y = 1 / (1 + exp(−z))

  • A "squashing" function instead of a threshold at the output.

– The sigmoid "activation" replaces the threshold.

  • Activation: the function that acts on the weighted combination of inputs (and threshold).

SLIDE 11

Sigmoid neurons

  • These give a real-valued output that is a smooth and bounded function of their total input.
  • Typically they use the logistic function.

– It has nice derivatives.

SLIDE 12

Other "activations"

  • The activation does not always have to be a squashing function.

– We will hear more about activations later.

  • We will continue to assume a "threshold" activation in this lecture.

Examples: tanh(z), the softplus log(1 + e^z), and the sigmoid 1 / (1 + exp(−z)).
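The activations listed above can be written directly (a sketch; the function names are mine):

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)); squashes to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    """log(1 + e^z): smooth, non-squashing, unbounded above."""
    return np.log1p(np.exp(z))

z = np.array([-2.0, 0.0, 2.0])
print(np.tanh(z))   # squashes to (-1, 1); 0 at z = 0
print(sigmoid(z))   # squashes to (0, 1); 0.5 at z = 0
print(softplus(z))  # near 0 for very negative z, near z for large positive z
```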

SLIDE 13

Perceptron

z = Σ_i w_i x_i + b

  • Learn this function

– A step function across a hyperplane

SLIDE 14

Learning the perceptron

  • Given a number of input-output pairs, learn the weights and bias.

– Learn W = [w_1, …, w_N] and b, given several (X, z) pairs, where

z = 1 if Σ_i w_i x_i + b ≥ 0, else 0

SLIDE 15

Restating the perceptron

  • Restating the perceptron equation by adding another dimension to X:

z = 1 if Σ_{i=1}^{d+1} w_i x_i ≥ 0, else 0

where x_{d+1} = 1 and w_{d+1} = −T.

  • Note that the boundary Σ_{i=1}^{d+1} w_i x_i = 0 is now a hyperplane through the origin.

SLIDE 16

The Perceptron Problem

  • Find the hyperplane Σ_{i=1}^{d+1} w_i x_i = 0 that perfectly separates the two groups of points.

SLIDE 17

Perceptron Algorithm: Summary

  • Cycle through the training instances.
  • Only update w on misclassified instances.
  • If an instance is misclassified:

– If the instance is positive class: w = w + x
– If the instance is negative class: w = w − x
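The summary above translates almost line-for-line into code. A sketch, assuming NumPy, with the bias absorbed as a trailing weight (the helper name and toy data are mine):

```python
import numpy as np

def perceptron_train(X, y, epochs=20):
    """Cycle through the training instances; update w only on misclassified
    ones: w += x for the positive class, w -= x for the negative class."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if np.sign(w @ xi) != yi:   # misclassified (sign 0 counts as wrong)
                w += yi * xi
    return w

# Linearly separable toy data; the last column of 1s absorbs the bias.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])
w = perceptron_train(X, y)
print(np.sign(X @ w))  # [-1. -1.  1.  1.]
```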

SLIDE 18

A Simple Method: The Perceptron Algorithm

  • Initialize: randomly initialize the hyperplane.

– i.e., randomly initialize the normal vector w.

  • Classification rule: sign(wᵀx)

– Vectors on the same side of the hyperplane as w are assigned the +1 class (blue), and those on the other side are assigned −1 (red).

  • The random initial plane will make mistakes.

SLIDE 19

How to learn the weights: multi-class example

This example has been adapted from Hinton's slides, "NN for Machine Learning", Coursera, 2015.

SLIDE 20

How to learn the weights: multi-class example

  • If correct: no change.
  • If wrong:

– Lower the score of the wrong answer (by subtracting the input from the weight vector of the wrong answer).
– Raise the score of the target (by adding the input to the weight vector of the target class).

This example has been adapted from Hinton's slides, "NN for Machine Learning", Coursera, 2015.

SLIDES 21–25

(These slides step through the multi-class example with the same update rule as Slide 20. Adapted from Hinton's slides, "NN for Machine Learning", Coursera, 2015.)

SLIDE 26

Single-layer networks as template matching

  • The weights for each class act as a template (sometimes also called a prototype) for that class.

– The winner is the most similar template.

  • The ways in which hand-written digits vary are much too complicated to be captured by simple template matches of whole shapes.
  • To capture all the allowable variations of a digit, we need to learn the features that it is composed of.

SLIDE 27

What binary threshold neurons cannot do

  • A binary threshold output unit cannot even tell if two single-bit features are the same!
  • A geometric view of what binary threshold neurons cannot do: the positive and negative cases cannot be separated by a plane.

SLIDE 28

Networks with hidden units

  • Networks without hidden units are very limited in the input-output mappings they can learn to model.

– More layers of linear units do not help: the result is still linear.
– Fixed output non-linearities are not enough.

  • We need multiple layers of adaptive, non-linear hidden units. But how can we train such nets?

SLIDE 29

The multi-layer perceptron

  • A network of perceptrons

– Generally "layered"

SLIDE 30

Feed-forward neural networks

  • Also called the Multi-Layer Perceptron (MLP)

SLIDE 31

MLP with a single hidden layer

  • Two-layer MLP (the number of layers of adaptive weights is counted):

o_k(x) = f( Σ_{j=0}^{M} w_jk^[2] z_j )
⇒ o_k(x) = f( Σ_{j=0}^{M} w_jk^[2] g( Σ_{i=0}^{d} w_ij^[1] x_i ) )

with bias units x_0 = 1 and z_0 = 1; i = 0, …, d (inputs), j = 1, …, M (hidden units), k = 1, …, L (outputs).
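The two-layer formula above amounts to two matrix-vector products with a nonlinearity in between. A sketch with made-up shapes (all names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer MLP: a sigmoid hidden layer followed by
    a linear output layer. Shapes: W1 is (M, d), W2 is (L, M)."""
    z = sigmoid(W1 @ x + b1)   # hidden activations g(W1 x + b1)
    o = W2 @ z + b2            # outputs (no squashing at the output here)
    return o

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # d = 3 inputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # M = 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # L = 2 outputs
print(mlp_forward(x, W1, b1, W2, b2).shape)     # (2,)
```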

SLIDE 32

Beyond linear models

f = W x    →    f = W^[2] g(W^[1] x)

SLIDE 33

Beyond linear models

f = W x    →    f = W^[2] g(W^[1] x)    →    f = W^[3] g(W^[2] g(W^[1] x))

SLIDE 34

The MLP learns to extract features

  • An MLP with one hidden layer is a generalized linear model:

– o_k(x) = f( Σ_{j=1}^{M} w_jk^[2] φ_j(x) )
– φ_j(x) = g( Σ_{i=0}^{d} w_ij^[1] x_i )
– The form of the nonlinearity (the basis functions φ_j) is adapted from the training data (not fixed in advance).

  • φ_j is defined by parameters that can also be adapted during training.
  • Thus, we don't need expert knowledge or time-consuming tuning of hand-crafted features.

SLIDE 35

Defining "depth"

  • What is a "deep" network?
SLIDE 36

The multi-layer perceptron

  • Inputs are real or Boolean stimuli.
  • Outputs are real or Boolean values.

– Can have multiple outputs for a single input.

  • What can this network compute?

– What kinds of input/output relationships can it model?

SLIDE 37

MLPs approximate functions

  • MLPs can compose Boolean functions.
  • MLPs can compose real-valued functions.
  • What are the limitations?
SLIDE 38

The perceptron as a Boolean gate

  • A perceptron can model any simple binary Boolean gate.

– With weights of 1 on X and Y, a threshold of 2 gives X AND Y, and a threshold of 1 gives X OR Y.

SLIDE 39

MLP as Boolean Functions

  • MLPs are universal Boolean functions.

– Any function over any number of inputs and any number of outputs.

  • But how many "layers" will they need?
SLIDE 40

How many layers for a Boolean MLP?

  • A Boolean function is just a truth table.

Truth table (showing the input combinations for which the output is 1):

X1 X2 X3 X4 X5 | Y
 0  0  1  1  0 | 1
 0  1  0  1  1 | 1
 0  1  1  0  0 | 1
 1  0  0  0  1 | 1
 1  0  1  1  1 | 1
 1  1  0  0  1 | 1
SLIDE 41

How many layers for a Boolean MLP?

  • Expressed in disjunctive normal form (one AND term per row of the truth table where the output is 1):

Y = X̄1 X̄2 X3 X4 X̄5 + X̄1 X2 X̄3 X4 X5 + X̄1 X2 X3 X̄4 X̄5 + X1 X̄2 X̄3 X̄4 X5 + X1 X̄2 X3 X4 X5 + X1 X2 X̄3 X̄4 X5
SLIDES 42–48

(These slides repeat the DNF of Slide 41 while building the network: one AND gate per term in a single hidden layer over X1 … X5, followed by an OR gate at the output.)

SLIDE 49

How many layers for a Boolean MLP?

  • Any truth table can be expressed in this manner!
  • A one-hidden-layer MLP is a universal Boolean function.
  • But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
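The DNF construction can be checked in code: one hidden threshold unit (an AND gate) per row of the truth table whose output is 1, and an OR unit on top. A sketch (the helper name and weight encoding are mine):

```python
import numpy as np

def dnf_mlp(minterms, x):
    """One-hidden-layer threshold MLP implementing a truth table.
    Each hidden unit is an AND gate for one minterm; the output is an OR."""
    hidden = []
    for row in minterms:
        # AND gate: weight +1 for a positive literal, -1 for a negated one.
        # w @ x equals the count of positive literals only when x matches row.
        w = np.where(np.array(row) == 1, 1, -1)
        T = sum(row)
        hidden.append(1 if w @ x >= T else 0)
    return 1 if sum(hidden) >= 1 else 0   # OR gate over the hidden layer

# The six rows of the slide's truth table where Y = 1.
minterms = [(0, 0, 1, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0),
            (1, 0, 0, 0, 1), (1, 0, 1, 1, 1), (1, 1, 0, 0, 1)]
print(dnf_mlp(minterms, np.array([0, 0, 1, 1, 0])))  # 1 (a listed row)
print(dnf_mlp(minterms, np.array([0, 0, 0, 0, 0])))  # 0
```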

SLIDE 50

MLPs as universal approximators

SLIDE 51

MLP as a continuous-valued regression

  • A simple 3-unit MLP with a "summing" output unit can generate a "square pulse" over an input.

– The output f(x) is 1 only if the input lies between T1 and T2.
– T1 and T2 can be arbitrarily specified.
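A sketch of the square-pulse construction, with two threshold units combined at the summing output with weights +1 and −1 (the function names are mine):

```python
def step(z):
    """Threshold unit on a scalar pre-activation."""
    return 1 if z >= 0 else 0

def square_pulse(x, T1, T2):
    """Unit A fires for x >= T1, unit B for x >= T2 (with T1 < T2);
    the summing output A - B is 1 exactly when T1 <= x < T2."""
    return step(x - T1) - step(x - T2)

print(square_pulse(1.5, 1.0, 2.0))  # 1 (inside the pulse)
print(square_pulse(2.5, 1.0, 2.0))  # 0 (outside)
```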

SLIDE 52

MLP as a continuous-valued regression

  • A simple 3-unit MLP can generate a "square pulse" over an input.
  • An MLP with many units can model an arbitrary function over an input, to arbitrary precision.

– Simply make the individual pulses narrower.

  • A one-layer MLP can model an arbitrary function of a single input.
SLIDE 53

Summary

  • MLPs are universal Boolean functions.
  • MLPs are universal classifiers.
  • MLPs are universal function approximators.
  • A single-layer MLP can approximate anything to arbitrary precision.

– But it could be exponentially or even infinitely wide in its input size.

  • Deeper MLPs can achieve the same precision with far fewer neurons.

– Deeper networks are more expressive.

SLIDE 54

Learning problem

  • Given: the architecture of the network.
  • Training data: a set of input-output pairs (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(N), y^(N)).
  • We want to find the function f on the input space that produces the output.

– We consider the neural network as a parametric function f(x; W).

SLIDE 55

What is f()? Typical network

  • We assume a "layered" network for simplicity.
  • Generic terminology:

– We will refer to the inputs as the input units.

  • No neurons here: the "input units" are just the inputs.

– We refer to the outputs as the output units.
– Intermediate units are "hidden" units.

SLIDE 56

What we learn: the parameters of the network

  • Given: the architecture of the network.
  • The parameters of the network: the weights and biases.

– The weights associated with the blue arrows in the picture.

  • Learning the network: determining the values of these parameters such that the network computes the desired function.

SLIDE 57

Problem setup

  • Given: the architecture of the network.
  • Training data: a set of input-output pairs (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(N), y^(N)).
  • We want to find the function f.

– We consider the neural network as a parametric function f(x; W).

  • We need a loss function that penalizes the obtained output f(x; W) when the desired output is y:

E = (1/N) Σ_t loss( f(x^(t); W), y^(t) )

SLIDE 58

Training an MLP

  • We define a differentiable loss or divergence between the output of the network and the desired output for the training instances.

– And a total error, which is the average divergence over all training instances.

  • We optimize the network parameters to minimize this error.

SLIDE 59

Representing the output

  • If the desired output is real-valued, no special tricks are necessary.

– Scalar output: a single output neuron, o = scalar (real value).
– Vector output: as many output neurons as the dimension of the desired output, o = [o_1, o_2, …, o_L] (vector of real values).
SLIDE 60

Examples of loss functions

  • For real-valued output vectors, the (scaled) L2 divergence is popular:

Err(y, o) = ½ ‖y − o‖² = ½ Σ_k (y_k − o_k)²

– Squared Euclidean distance between the true and the desired output.
– Note: this is differentiable:

dErr/do_k = −(y_k − o_k) = o_k − y_k
∇_o Err(y, o) = [o_1 − y_1, o_2 − y_2, …, o_L − y_L]
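The L2 divergence and its gradient in code (a sketch; the names are mine):

```python
import numpy as np

def l2_loss(y, o):
    """Scaled L2 divergence between desired output y and network output o."""
    return 0.5 * np.sum((y - o) ** 2)

def l2_grad(y, o):
    """Gradient of the loss w.r.t. the output: d/do of 0.5*||y - o||^2 = o - y."""
    return o - y

y = np.array([1.0, 0.0])
o = np.array([0.8, 0.3])
print(l2_loss(y, o))  # 0.5*(0.2^2 + 0.3^2), about 0.065
print(l2_grad(y, o))  # [-0.2  0.3]
```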

SLIDE 61

Classification: activation function

  • With a threshold activation, the neuron's output is a flat function with zero derivative everywhere, except at 0, where it is non-differentiable.

– You can vary the weights a lot without changing the error.
– There is no indication of which direction to change the weights to reduce the error.

SLIDE 62

Activation function

  • A differentiable activation makes the neuron differentiable, with non-zero derivatives over much of the input space.

– Small changes in a weight can result in non-negligible changes in the output.
– This enables us to estimate the parameters using gradient descent techniques.

SLIDE 63

Differentiable Activation

  • This particular one has a nice interpretation.
SLIDE 64

For a binary classifier: logistic regression

  • For a binary classifier with scalar output o ∈ (0, 1), label y ∈ {0, 1}, and o = σ(z), the cross-entropy between the probability distribution [o, 1 − o] and the ideal output probability [y, 1 − y] is popular:

L(y, o) = −y log(o) − (1 − y) log(1 − o)

  • Derivative:

dL(y, o)/do = −1/o if y = 1,  and  1/(1 − o) if y = 0
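A sketch of the binary cross-entropy and its derivative (the function names are mine):

```python
import math

def bce(y, o):
    """Binary cross-entropy between label y in {0,1} and predicted probability o."""
    return -y * math.log(o) - (1 - y) * math.log(1 - o)

def bce_grad(y, o):
    """Derivative w.r.t. o: -1/o when y = 1, +1/(1-o) when y = 0."""
    return -1.0 / o if y == 1 else 1.0 / (1.0 - o)

print(round(bce(1, 0.9), 4))  # 0.1054: small loss for a confident correct guess
print(round(bce(1, 0.1), 4))  # 2.3026: large loss for a confident wrong guess
print(bce_grad(1, 0.5))       # -2.0
```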

SLIDE 65

Choosing the cost function: examples

  • Regression problem: SSE

E = Σ_t E_t
E_t = ½ (o^(t) − y^(t))²   (one-dimensional output)
E_t = ½ ‖o^(t) − y^(t)‖² = ½ Σ_k (o_k^(t) − y_k^(t))²   (multi-dimensional output)

  • Classification problem: cross-entropy

– Binary classification: loss_t = −y^(t) log o^(t) − (1 − y^(t)) log(1 − o^(t)), where the output layer uses a sigmoid activation function.

SLIDE 66

Multi-class output: one-hot representations

  • Consider a network that must distinguish whether an input is a cat, a dog, a camel, a hat, or a flower.
  • For inputs of each of the five classes, the desired output is:

cat: [1 0 0 0 0]ᵀ  dog: [0 1 0 0 0]ᵀ  camel: [0 0 1 0 0]ᵀ  hat: [0 0 0 1 0]ᵀ  flower: [0 0 0 0 1]ᵀ

  • For an input of any class, we will have a five-dimensional vector output with four zeros and a single 1 at the position of the class.
  • This is a one-hot vector.

SLIDE 67

Multi-class networks

  • For a multi-class classifier with N classes, the one-hot representation will have N binary outputs.

– An N-dimensional binary vector.

  • The neural network's output too must ideally be binary (N−1 zeros and a single 1 in the right place).
  • More realistically, it will be a probability vector.

– N probability values that sum to 1.

SLIDE 68

Vector activation example: Softmax

  • Example: the softmax vector activation, o_i = exp(z_i) / Σ_j exp(z_j).

– The parameters are the weights and biases.

SLIDE 69

Vector Activations

  • We can also have neurons with multiple coupled outputs:

[o_1, o_2, …, o_M] = f(x_1, x_2, …, x_d; W)

– The function f(·) operates on a set of inputs to produce a set of outputs.
– Modifying a single parameter in W will affect all outputs.

SLIDE 70

Multi-class classification: output

  • Softmax vector activation is often used at the output of multi-class classifier nets:

z_i = Σ_j w_ji a_j   (a_j: outputs of the previous layer)
o_i = exp(z_i) / Σ_j exp(z_j)

  • This can be viewed as the probability o_i = P(class = i | x).
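A sketch of the softmax activation; subtracting max(z) before exponentiating is a standard numerical-stability trick that is not on the slide:

```python
import numpy as np

def softmax(z):
    """Softmax over a score vector; shift by max(z) to avoid overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)        # probabilities, largest for the largest score
print(p.sum())  # 1.0
```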
SLIDE 71

For multi-class classification

  • The desired output y is a one-hot vector [0 0 … 1 … 0 0 0] with the 1 in the c-th position (for class c).
  • The actual output will be a probability distribution [o_1, o_2, …, o_L].
  • The cross-entropy between the desired one-hot output and the actual output:

L(y, o) = −Σ_i y_i log o_i = −log o_c

  • Derivative:

dL(y, o)/do_i = −1/o_c for the c-th component, 0 for the remaining components
∇_o L(y, o) = [0 0 … −1/o_c … 0 0]

The slope is negative w.r.t. o_c, which indicates that increasing o_c will reduce the divergence.

SLIDE 72

For multi-class classification

  • The cross-entropy and its derivative are as on the previous slide.
  • Note: even when y = o, the derivative is not 0.

SLIDE 73

For multi-class classification

  • It is sometimes useful to set the target output to [ε ε … 1 − (K−1)ε … ε ε], with the value 1 − (K−1)ε in the c-th position (for class c) and ε elsewhere, for some small ε.
  • The cross-entropy remains:

L(y, o) = −Σ_i y_i log o_i

  • Derivative:

dL(y, o)/do_i = −(1 − (K−1)ε)/o_c for the c-th component, and −ε/o_i for the remaining components.

SLIDE 74

Choosing the cost function: examples

  • Regression problem: SSE

E = Σ_t E_t, with E_t = ½ (o^(t) − y^(t))² for one-dimensional output, and E_t = ½ ‖o^(t) − y^(t)‖² for multi-dimensional output.

  • Classification problem: cross-entropy

– Binary classification: loss_t = −y^(t) log o^(t) − (1 − y^(t)) log(1 − o^(t)), with a sigmoid output o = 1/(1 + e^(−z)).
– Multi-class classification: loss_t = −log o_c(t), where the output is found by a softmax layer, o_i = exp(z_i) / Σ_j exp(z_j).

SLIDE 75

Problem setup

  • Given: the architecture of the network.
  • Training data: a set of input-output pairs (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(N), y^(N)).
  • We need a loss function that penalizes the obtained output o = f(x; W) when the desired output is y:

E(W) = (1/N) Σ_t loss(o^(t), y^(t)) = (1/N) Σ_t loss( f(x^(t); W), y^(t) )

  • Minimize E w.r.t. W, which contains the weights w_ij^[l] and biases b_j^[l].

SLIDE 76

How to adjust the weights for multi-layer networks?

  • We need multiple layers of adaptive, non-linear hidden units. But how can we train such nets?

– We need an efficient way of adapting all the weights, not just the last layer.
– Learning the weights going into hidden units is equivalent to learning features.
– This is difficult because nobody is telling us directly what the hidden units should do.

SLIDE 77

Find the weights by optimizing the cost

  • Start from random weights, then adjust them iteratively to get a lower cost.
  • Update the weights according to the gradient of the cost function.

Source: http://3b1b.co

SLIDE 78

How does the network learn?

  • Which changes to the weights improve the cost the most?
  • The magnitude of each element of the gradient ∇E shows how sensitive the cost is to that weight or bias.

Source: http://3b1b.co

SLIDE 79

Training multi-layer networks

  • Back-propagation

– The training algorithm used to adjust the weights in multi-layer networks (based on the training data).
– The back-propagation algorithm is based on gradient descent.
– It uses the chain rule and dynamic programming to compute gradients efficiently.

SLIDE 80

Training Neural Nets through Gradient Descent

Total training error: E = Σ_t loss(o^(t), y^(t))

  • Gradient descent algorithm:
  • Initialize all weights and biases w_ij^[l].

– Using the extended notation: the bias is also represented as a weight.

  • Do:

– For every layer l, for all i, j, update:

w_ij^[l] = w_ij^[l] − η · dE/dw_ij^[l]

  • Until E has converged.
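The loop above, sketched on a toy one-neuron problem. The gradient is taken by finite differences just to keep the example self-contained; backprop computes the same quantity far more efficiently (all names and the toy cost are mine):

```python
import numpy as np

def numerical_gradient(E, w, eps=1e-6):
    """Finite-difference estimate of dE/dw at w (slow but simple)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (E(w + d) - E(w - d)) / (2 * eps)
    return g

# Toy cost: fit a linear neuron o = w.x to one target, E = 0.5*(o - y)^2.
x, y = np.array([1.0, 2.0]), 3.0
E = lambda w: 0.5 * (w @ x - y) ** 2

w = np.zeros(2)
for _ in range(200):                      # "do until E has converged"
    w -= 0.1 * numerical_gradient(E, w)   # w = w - eta * dE/dw
print(E(w) < 1e-6)  # True: the cost has been driven to (nearly) zero
```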

SLIDE 81

The derivative

  • Computing the derivative:

Total training error: E = Σ_t loss(o^(t), y^(t))
Total derivative: dE/dw_ij^[l] = Σ_t d loss(o^(t), y^(t)) / dw_ij^[l]

SLIDE 82

Training by gradient descent

  • Initialize all weights w_ij^[l].
  • Do:

– For all i, j, l, initialize dE/dw_ij^[l] = 0.
– For all t = 1 … N:

  • For every layer l, for all i, j: compute d loss(o^(t), y^(t))/dw_ij^[l] and accumulate dE/dw_ij^[l] += d loss(o^(t), y^(t))/dw_ij^[l]

– For every layer l, for all i, j, update:

w_ij^[l] = w_ij^[l] − (η/N) · dE/dw_ij^[l]

SLIDE 83

The derivative

  • So we must first figure out how to compute the derivative of the divergences of individual training inputs.

Total training error: E = Σ_t loss(o^(t), y^(t))
Total derivative: dE/dw_ij^[l] = Σ_t d loss(o^(t), y^(t)) / dw_ij^[l]

SLIDE 84

Calculus Refresher: Basic rules of calculus

  • For any differentiable function y(x) with derivative dy/dx, the following must hold for sufficiently small Δx:

Δy ≈ (dy/dx) · Δx

  • For any differentiable function y(x_1, x_2, …, x_M) with partial derivatives ∂y/∂x_1, ∂y/∂x_2, …, ∂y/∂x_M, the following must hold for sufficiently small Δx_1, …, Δx_M:

Δy ≈ (∂y/∂x_1) Δx_1 + (∂y/∂x_2) Δx_2 + … + (∂y/∂x_M) Δx_M

SLIDE 85

Calculus Refresher: Chain rule

  • For any nested function y = f(g(x)): dy/dx = (df/dg) · (dg/dx).

SLIDE 86

Simple chain rule

  • z = f(g(x)) with y = g(x) ⇒ dz/dx = (dz/dy) · (dy/dx)

SLIDE 87

Multiple paths chain rule

  • If y depends on x through several intermediate variables g_1(x), …, g_M(x), the derivatives along each path add up: dy/dx = Σ_i (∂y/∂g_i) · (dg_i/dx).

SLIDE 88

Returning to our problem

  • How to compute d loss(o, y) / dw_ij^[l] ?

SLIDE 89

Backpropagation: Notation

  • a^[0] ← input
  • output ← a^[L]
  • Each layer applies g(·): z^[l] = W^[l] a^[l−1], a^[l] = g(z^[l]).

SLIDE 90

Output as a composite function

Output = a^[L] = g(z^[L]) = g(W^[L] a^[L−1]) = g(W^[L] g(W^[L−1] a^[L−2])) = g(W^[L] g(W^[L−1] … g(W^[2] g(W^[1] x))))

For convenience, we use the same activation function for all layers. However, output-layer neurons most commonly do not need an activation function (they show class scores or real-valued targets).

SLIDE 91

Special case: affine functions

  • A matrix W^[l] and bias b^[l] operate on the vector a^[l−1] to produce the vector z^[l]:

z^[l] = W^[l] a^[l−1] + b^[l],  with ∂z^[l]/∂a^[l−1] = W^[l]

SLIDE 92

Backward-pass vector

  • Assume we have ∂loss/∂a^[l]. Then:

∂loss/∂z^[l] = ∂loss/∂a^[l] · ∂a^[l]/∂z^[l] = ∂loss/∂a^[l] ⊙ g′(z^[l])
∂loss/∂a^[l−1] = (W^[l])ᵀ ∂loss/∂z^[l]
∂loss/∂W^[l] = ∂loss/∂z^[l] (a^[l−1])ᵀ
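The three recursions above are enough to implement backpropagation for a small network. A sketch for a two-layer sigmoid net with L2 loss, checked against a finite difference (the names, shapes, and loss choice are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, W1, W2):
    """One forward/backward pass implementing the slide's recursions:
    dz = da * g'(z), da_prev = W.T @ dz, dW = outer(dz, a_prev)."""
    # Forward
    z1 = W1 @ x
    a1 = sigmoid(z1)
    z2 = W2 @ a1
    a2 = sigmoid(z2)
    loss = 0.5 * np.sum((a2 - y) ** 2)
    # Backward
    da2 = a2 - y                # dloss/da for the L2 loss
    dz2 = da2 * a2 * (1 - a2)   # sigmoid' = a * (1 - a)
    dW2 = np.outer(dz2, a1)
    da1 = W2.T @ dz2
    dz1 = da1 * a1 * (1 - a1)
    dW1 = np.outer(dz1, x)
    return loss, dW1, dW2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([1.0, 0.0])
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
loss, dW1, dW2 = forward_backward(x, y, W1, W2)
# Check one weight's gradient against a finite difference.
eps = 1e-6
Wp = W1.copy()
Wp[0, 0] += eps
num = (forward_backward(x, y, Wp, W2)[0] - loss) / eps
print(abs(num - dW1[0, 0]) < 1e-4)  # True: backprop matches the numeric check
```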

SLIDE 93

How to propagate the gradients backward

  • For a node z = f(x, y): given the upstream gradient dL/dz, the chain rule gives the downstream gradients dL/dx = (dL/dz)(∂z/∂x) and dL/dy = (dL/dz)(∂z/∂y).

SLIDE 94

How to propagate the gradients backward (continued)

SLIDE 95

Patterns in backward flow

  • add gate: gradient distributor (passes the upstream gradient unchanged to both inputs)
  • max gate: gradient router (routes the full upstream gradient to the larger input, zero to the other)
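These two patterns in code (a sketch; the function names are mine):

```python
def add_gate_backward(dz):
    """Add gate: distributes the upstream gradient to both inputs unchanged."""
    return dz, dz

def max_gate_backward(x, y, dz):
    """Max gate: routes the upstream gradient to whichever input was larger."""
    return (dz, 0.0) if x >= y else (0.0, dz)

print(add_gate_backward(2.0))            # (2.0, 2.0)
print(max_gate_backward(3.0, 1.0, 2.0))  # (2.0, 0.0), all gradient goes to x
```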

SLIDE 96

Modularized implementation: forward / backward API

SLIDE 97

Modularized implementation: forward / backward API (continued)
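In the modularized style, each node caches its inputs in `forward` and converts the upstream gradient into input gradients in `backward`. A sketch of a multiply gate in that API (the class name follows cs231n's example; the details are mine):

```python
class MultiplyGate:
    """A computation-graph node with a forward/backward API."""
    def forward(self, x, y):
        self.x, self.y = x, y   # cache inputs for the backward pass
        return x * y
    def backward(self, dz):
        dx = self.y * dz        # local gradient dz/dx = y
        dy = self.x * dz        # local gradient dz/dy = x
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dx, dy = gate.backward(1.0)
print(z, dx, dy)  # -12.0 -4.0 3.0
```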

SLIDE 98

Mini-batch SGD

  • Loop:

1. Sample a batch of data.
2. Forward-prop it through the graph (network), get the loss.
3. Backprop to calculate the gradients.
4. Update the parameters using the gradient.
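The four-step loop above, sketched on a toy problem whose optimum is the data mean (all names and the toy objective are mine):

```python
import numpy as np

def sgd(grad_fn, w, data, lr=0.1, batch_size=2, steps=100, seed=0):
    """Mini-batch SGD: sample a batch, get its gradient, update the parameter."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        batch = data[rng.choice(len(data), size=batch_size, replace=False)]
        w = w - lr * grad_fn(w, batch)   # step 4: parameter update
    return w

# Toy objective: average of 0.5*(w - x)^2 over the data; its gradient on a
# batch is mean(w - x), and the optimum is the mean of the data.
data = np.array([1.0, 2.0, 3.0, 4.0])
grad_fn = lambda w, batch: np.mean(w - batch)
w = sgd(grad_fn, 0.0, data)
print(abs(w - 2.5) < 0.5)  # True: w hovers near the data mean 2.5
```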

SLIDE 99

Converting error derivatives into a learning procedure

  • The backpropagation algorithm is an efficient way of computing the gradient of the error function w.r.t. the weights and biases.
  • There are many other decisions to be made to turn these derivatives into a learning procedure:

– Convergence or optimization issues: how do we use the error derivatives?
– Generalization issues: how can we improve its decisions on unseen data?

SLIDE 100

Resources

  • Please see the following notes:

– http://cs231n.stanford.edu/handouts/derivatives.pdf
– http://cs231n.stanford.edu/handouts/linear-backprop.pdf
– http://cs231n.github.io/optimization-2/