Slide 1

CS7015 (Deep Learning) : Lecture 2

McCulloch Pitts Neuron, Thresholding Logic, Perceptrons, Perceptron Learning Algorithm and Convergence, Multilayer Perceptrons (MLPs), Representation Power of MLPs

Mitesh M. Khapra

Department of Computer Science and Engineering, Indian Institute of Technology Madras

Slide 2

Module 2.1: Biological Neurons

Slide 3

[Figure: an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, an aggregation/activation σ, and output y]

Artificial Neuron

The most fundamental unit of a deep neural network is called an artificial neuron. Why is it called a neuron? Where does the inspiration come from? The inspiration comes from biology (more specifically, from the brain). Biological neurons = neural cells = neural processing units. We will first see what a biological neuron looks like ...

Slide 4

Biological Neurons∗

∗Image adapted from

https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg

dendrite: receives signals from other neurons
synapse: point of connection to other neurons
soma: processes the information
axon: transmits the output of this neuron

Slide 5

Let us see a very cartoonish illustration of how a neuron works. Our sense organs interact with the outside world. They relay information to the neurons. The neurons (may) get activated and produce a response (laughter in this case).

Slide 6

Of course, in reality, it is not just a single neuron which does all this. There is a massively parallel interconnected network of neurons. The sense organs relay information to the lowest layer of neurons. Some of these neurons may fire (in red) in response to this information and in turn relay information to other neurons they are connected to. These neurons may also fire (again, in red) and the process continues, eventually resulting in a response (laughter in this case). An average human brain has around 10^11 (100 billion) neurons!

Slide 7

A simplified illustration

This massively parallel network also ensures that there is division of work. Each neuron may perform a certain role or respond to a certain stimulus.

Slide 8

The neurons in the brain are arranged in a hierarchy. We illustrate this with the help of the visual cortex (a part of the brain) which deals with processing visual information. Starting from the retina, the information is relayed to several layers (follow the arrows). We observe that the layers V1, V2 to AIT form a hierarchy (from identifying simple visual forms to high level objects).

Slide 9

Sample illustration of hierarchical processing∗

∗Idea borrowed from Hugo Larochelle’s lecture slides

Slide 10

Disclaimer: I understand very little about how the brain works! What you saw so far is an overly simplified explanation of how the brain works! But this explanation suffices for the purpose of this course!

Slide 11

Module 2.2: McCulloch Pitts Neuron

Slide 12

[Figure: McCulloch Pitts neuron with inputs x1, x2, ..., xn ∈ {0, 1}, aggregation g, decision f, and output y ∈ {0, 1}]

McCulloch (a neuroscientist) and Pitts (a logician) proposed a highly simplified computational model of the neuron (1943). g aggregates the inputs and the function f takes a decision based on this aggregation. The inputs can be excitatory or inhibitory. y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) = Σ_{i=1}^{n} xi

y = f(g(x)) = 1 if g(x) ≥ θ
            = 0 if g(x) < θ

θ is called the thresholding parameter. This is called Thresholding Logic.
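A minimal sketch of this unit in Python (the function name and interface are illustrative, not from the lecture):

```python
def mp_neuron(inputs, inhibitory, theta):
    """McCulloch Pitts unit: binary inputs, binary output."""
    # y = 0 if any inhibitory input is on
    if any(x == 1 and inh for x, inh in zip(inputs, inhibitory)):
        return 0
    g = sum(inputs)                 # g(x) = sum of the inputs
    return 1 if g >= theta else 0   # f(g(x)) with thresholding parameter theta

# AND of two (excitatory) inputs: fires only when g(x) >= theta = 2
print(mp_neuron([1, 1], [False, False], theta=2))   # 1
print(mp_neuron([1, 0], [False, False], theta=2))   # 0
```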

Slide 13

Let us implement some boolean functions using this McCulloch Pitts (MP) neuron ...

Slide 14

[Figures: McCulloch Pitts units implementing various boolean functions]

- A McCulloch Pitts unit: inputs x1, x2, x3, output y ∈ {0, 1}, threshold θ
- x1 AND !x2∗: inputs x1, x2 (x2 inhibitory), threshold θ = 1
- AND function: inputs x1, x2, x3, threshold θ = 3
- NOR function: inputs x1, x2 (both inhibitory)
- OR function: inputs x1, x2, x3, threshold θ = 1
- NOT function: input x1 (inhibitory)

∗circle at the end indicates inhibitory input: if any inhibitory input is 1, the output will be 0
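These units can be checked directly with the mp_neuron sketch above (the AND/OR thresholds come from the slide; taking θ = 0 with inhibitory inputs for NOR and NOT is the usual construction and is my assumption here):

```python
# x1 AND !x2: x2 is inhibitory, theta = 1
print(mp_neuron([1, 0], [False, True], theta=1))    # 1
print(mp_neuron([1, 1], [False, True], theta=1))    # 0 (inhibited)

# 3-input AND (theta = 3) and 3-input OR (theta = 1)
print(mp_neuron([1, 1, 1], [False] * 3, theta=3))   # 1
print(mp_neuron([0, 1, 0], [False] * 3, theta=1))   # 1

# NOR and NOT: all inputs inhibitory, theta = 0 (assumed construction)
print(mp_neuron([0, 0], [True, True], theta=0))     # 1
print(mp_neuron([1], [True], theta=0))              # 0
```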

Slide 15

Can any boolean function be represented using a McCulloch Pitts unit? Before answering this question let us first see the geometric interpretation of an MP unit ...

Slide 16

[Figure: MP unit for the OR function with inputs x1, x2 and threshold θ = 1]

OR function: x1 + x2 = Σ_{i=1}^{2} xi ≥ 1

[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) in the x1-x2 plane and the line x1 + x2 = θ = 1]

A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves: points lying on or above the line Σ_{i=1}^{n} xi − θ = 0 and points lying below this line. In other words, all inputs which produce an output 0 will be on one side (Σ_{i=1}^{n} xi < θ) of the line and all inputs which produce an output 1 will lie on the other side (Σ_{i=1}^{n} xi ≥ θ) of this line. Let us convince ourselves about this with a few more examples (if it is not already clear from the math).

Slide 17

[Figure: MP unit for the AND function with inputs x1, x2 and threshold θ = 2]

AND function: x1 + x2 = Σ_{i=1}^{2} xi ≥ 2

[Plot: the four input points and the line x1 + x2 = θ = 2; only (1, 1) lies on or above it]

[Figure: MP unit for a tautology (always ON) with inputs x1, x2 and threshold θ = 0]

[Plot: the four input points and the line x1 + x2 = θ = 0; all four points lie on or above it]

Slide 18

[Figure: MP unit for the 3-input OR function with threshold θ = 1, and a plot of the 8 input points with the plane x1 + x2 + x3 = θ = 1]

What if we have more than 2 inputs? Well, instead of a line we will have a plane. For the OR function, we want a plane such that the point (0, 0, 0) lies on one side and the remaining 7 points lie on the other side of the plane.

Slide 19

The story so far ... A single McCulloch Pitts neuron can be used to represent boolean functions which are linearly separable. Linear separability (for boolean functions): there exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane).

Slide 20

Module 2.3: Perceptron

Slide 21

The story ahead ... What about non-boolean (say, real) inputs? Do we always need to hand code the threshold? Are all inputs equal? What if we want to assign more weight (importance) to some inputs? What about functions which are not linearly separable?

Slide 22

[Figure: a perceptron with inputs x1, ..., xn, weights w1, ..., wn, and output y]

Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958). It is a more general computational model than McCulloch–Pitts neurons. Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights; inputs are no longer limited to boolean values. It was refined and carefully analyzed by Minsky and Papert (1969); their model is referred to as the perceptron model here.

Slide 23

[Figure: a perceptron with inputs x0 = 1, x1, ..., xn, weights w0 = −θ, w1, ..., wn, and output y]

y = 1 if Σ_{i=1}^{n} wi ∗ xi ≥ θ
  = 0 if Σ_{i=1}^{n} wi ∗ xi < θ

Rewriting the above,

y = 1 if Σ_{i=1}^{n} wi ∗ xi − θ ≥ 0
  = 0 if Σ_{i=1}^{n} wi ∗ xi − θ < 0

A more accepted convention,

y = 1 if Σ_{i=0}^{n} wi ∗ xi ≥ 0
  = 0 if Σ_{i=0}^{n} wi ∗ xi < 0

where x0 = 1 and w0 = −θ
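As a small illustration of this convention (Python; the helper name and the example weights are mine, not from the slides; the weight values anticipate the OR solution derived later):

```python
import numpy as np

def perceptron_output(x, w):
    """Perceptron with the convention x0 = 1, w0 = -theta."""
    x = np.concatenate(([1.0], x))          # prepend x0 = 1
    return 1 if np.dot(w, x) >= 0 else 0    # fire iff sum_{i=0}^{n} wi*xi >= 0

w = np.array([-1.0, 1.1, 1.1])              # w0 = -theta = -1, w1 = w2 = 1.1
print(perceptron_output(np.array([0, 0]), w))   # 0
print(perceptron_output(np.array([1, 0]), w))   # 1
```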

Slide 24

We will now try to answer the following questions: Why are we trying to implement boolean functions? Why do we need weights? Why is w0 = −θ called the bias?

Slide 25

[Figure: a perceptron with inputs x0 = 1, x1, x2, x3, weights w0 = −θ, w1, w2, w3, and output y]

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan

Consider the task of predicting whether we would like a movie or not. Suppose we base our decision on 3 inputs (binary, for simplicity). Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs. Specifically, even if the actor is not Matt Damon and the genre is not thriller, we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan. w0 is called the bias as it represents the prior (prejudice). A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, or director [θ = 0].

Slide 26

What kind of functions can be implemented using the perceptron? Any difference from McCulloch Pitts neurons?

Slide 27

McCulloch Pitts Neuron (assuming no inhibitory inputs)

y = 1 if Σ_{i=0}^{n} xi ≥ 0
  = 0 if Σ_{i=0}^{n} xi < 0

Perceptron

y = 1 if Σ_{i=0}^{n} wi ∗ xi ≥ 0
  = 0 if Σ_{i=0}^{n} wi ∗ xi < 0

From the equations it should be clear that even a perceptron separates the input space into two halves. All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side. In other words, a single perceptron can only be used to implement linearly separable functions. Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued. We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights).

Slide 28

x1  x2  OR
0   0   0     w0 + Σ_{i=1}^{2} wi xi < 0
0   1   1     w0 + Σ_{i=1}^{2} wi xi ≥ 0
1   0   1     w0 + Σ_{i=1}^{2} wi xi ≥ 0
1   1   1     w0 + Σ_{i=1}^{2} wi xi ≥ 0

w0 + w1 · 0 + w2 · 0 < 0 ⟹ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 ⟹ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 ⟹ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 ≥ 0 ⟹ w1 + w2 ≥ −w0

One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible).

[Plot: the four input points and the line −1 + 1.1x1 + 1.1x2 = 0, which separates (0, 0) from the other three points]

Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (try it!).
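A quick check (sketch) that this particular solution does implement OR:

```python
from itertools import product

w0, w1, w2 = -1.0, 1.1, 1.1         # the solution proposed above
for x1, x2 in product([0, 1], repeat=2):
    y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
    print(x1, x2, "->", y)           # 0 0 -> 0, otherwise 1 (i.e., OR)
```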

Slide 29

Module 2.4: Errors and Error Surfaces

Slide 30

Let us fix the threshold (−w0 = 1) and try different values of w1, w2. Say, w1 = −1, w2 = −1. What is wrong with this line? We make an error on 3 out of the 4 inputs. Let us try some more values of w1, w2 and note how many errors we make:

w1     w2     errors
−1     −1     3
1.5    0      1
0.45   0.45   3

We are interested in those values of w0, w1, w2 which result in 0 error. Let us plot the error surface corresponding to different values of w0, w1, w2.

[Plot: the four input points and the candidate lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0, −1 + (0.45)x1 + (0.45)x2 = 0]
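The error counts in this table can be reproduced with a few lines of Python (a sketch; count_errors is my own helper name):

```python
from itertools import product

def count_errors(w0, w1, w2):
    """Number of OR inputs misclassified by the line w0 + w1*x1 + w2*x2 = 0."""
    return sum((1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0) != (x1 or x2)
               for x1, x2 in product([0, 1], repeat=2))

for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45)]:   # threshold fixed at -w0 = 1
    print(w1, w2, count_errors(-1, w1, w2))          # errors: 3, 1, 3
```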

Slide 31

For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2. For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make. For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0, or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0. We are interested in finding an algorithm which finds the values of w1, w2 which minimize this error.

Slide 32

Module 2.5: Perceptron Learning Algorithm

Slide 33

We will now see a more principled approach for learning these weights and threshold, but before that let us answer this question... Apart from implementing boolean functions (which does not look very interesting), what can a perceptron be used for? Our interest lies in the use of the perceptron as a binary classifier. Let us see what this means...

Slide 34

[Figure: a perceptron with inputs x0 = 1, x1, x2, ..., xn, weights w0 = −θ, w1, w2, ..., wn, and output y]

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan
x4 = imdbRating (scaled to 0 to 1)
...
xn = criticsRating (scaled to 0 to 1)

Let us reconsider our problem of deciding whether to watch a movie or not. Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision. Further, suppose we represent each movie with n features (some boolean, some real valued). We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision. In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, ..., wn).

Slide 35

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and Σ_{i=0}^{n} wi ∗ xi < 0 then
        w = w + x;
    end
    if x ∈ N and Σ_{i=0}^{n} wi ∗ xi ≥ 0 then
        w = w − x;
    end
end
// the algorithm converges when all the inputs are classified correctly

Why would this work? To understand why this works we will have to get into a bit of Linear Algebra and a bit of geometry...
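A runnable sketch of this algorithm (Python with NumPy; the explicit convergence check, the iteration cap, and all names are mine, not from the lecture):

```python
import numpy as np

def train_perceptron(P, N, max_iters=10000, seed=0):
    """Perceptron learning algorithm; points in P and N already have x0 = 1 prepended."""
    rng = np.random.default_rng(seed)
    data = [(np.asarray(x, float), 1) for x in P] + [(np.asarray(x, float), 0) for x in N]
    w = rng.normal(size=data[0][0].shape)            # initialize w randomly
    for _ in range(max_iters):
        if all((w @ x >= 0) == (y == 1) for x, y in data):
            return w                                 # converged: everything classified correctly
        x, y = data[rng.integers(len(data))]         # pick a random x in P ∪ N
        if y == 1 and w @ x < 0:
            w = w + x
        elif y == 0 and w @ x >= 0:
            w = w - x
    raise RuntimeError("no convergence (is the data linearly separable?)")

# OR function, points given as (x0=1, x1, x2)
w = train_perceptron(P=[(1, 0, 1), (1, 1, 0), (1, 1, 1)], N=[(1, 0, 0)])
print(w)    # one of many separating planes, e.g. w0 < 0 and w1, w2 >= -w0
```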

Slide 36

Consider two vectors w and x:

w = [w0, w1, w2, ..., wn]
x = [1, x1, x2, ..., xn]

w · x = wᵀx = Σ_{i=0}^{n} wi ∗ xi

We can thus rewrite the perceptron rule as

y = 1 if wᵀx ≥ 0
  = 0 if wᵀx < 0

We are interested in finding the line wᵀx = 0 which divides the input space into two halves. Every point (x) on this line satisfies the equation wᵀx = 0. What can you tell about the angle (α) between w and any point (x) which lies on this line? The angle is 90° (∵ cos α = wᵀx / (||w|| ||x||) = 0). Since the vector w is perpendicular to every point on the line, it is actually perpendicular to the line itself.

Slide 37

Consider some points (vectors) which lie in the positive half space of this line (i.e., wᵀx ≥ 0). What will be the angle between any such vector and w? Obviously, less than 90°. What about points (vectors) which lie in the negative half space of this line (i.e., wᵀx < 0)? What will be the angle between any such vector and w? Obviously, greater than 90°. Of course, this also follows from the formula (cos α = wᵀx / (||w|| ||x||)). Keeping this picture in mind, let us revisit the algorithm.

[Figure: positive points p1, p2, p3 and negative points n1, n2, n3 in the x1-x2 plane, the vector w, and the line wᵀx = 0]

Slide 38

Algorithm: Perceptron Learning Algorithm (repeated from above; w = w + x for a misclassified x ∈ P, w = w − x for a misclassified x ∈ N)

cos α = wᵀx / (||w|| ||x||)

For x ∈ P, if w · x < 0 then it means that the angle (α) between this x and the current w is greater than 90° (but we want α to be less than 90°). What happens to the new angle (α_new) when w_new = w + x?

cos(α_new) ∝ w_newᵀ x
           ∝ (w + x)ᵀ x
           ∝ wᵀx + xᵀx
           ∝ cos α + xᵀx

cos(α_new) > cos α

Thus α_new will be less than α and this is exactly what we want.

Slide 39

Algorithm: Perceptron Learning Algorithm (repeated from above)

cos α = wᵀx / (||w|| ||x||)

For x ∈ N, if w · x ≥ 0 then it means that the angle (α) between this x and the current w is less than 90° (but we want α to be greater than 90°). What happens to the new angle (α_new) when w_new = w − x?

cos(α_new) ∝ w_newᵀ x
           ∝ (w − x)ᵀ x
           ∝ wᵀx − xᵀx
           ∝ cos α − xᵀx

cos(α_new) < cos α

Thus α_new will be greater than α and this is exactly what we want.

Slide 40

We will now see this algorithm in action for a toy dataset

Slide 41

[Figure: the toy dataset with positive points p1, p2, p3, negative points n1, n2, n3, and the vector w after each correction]

We initialized w to a random value. We observe that currently, w · x < 0 (∵ angle > 90°) for all the positive points and w · x ≥ 0 (∵ angle < 90°) for all the negative points (the situation is exactly opposite of what we actually want it to be). We now run the algorithm by randomly going over the points. Randomly pick a point (say, p1), apply correction w = w + x ∵ w · x < 0 (you can check the angle visually). Randomly pick a point (say, p2), apply correction w = w + x ∵ w · x < 0 (you can check the angle visually). Randomly pick a point (say, n1), apply correction w = w − x ∵ w · x ≥ 0 (you can check the angle visually).

Slide 42

Module 2.6: Proof of Convergence

Slide 43

Now that we have some faith and intuition about why the algorithm works, we will see a more formal proof of convergence ...

Slide 44

Theorem

Definition: Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers w0, w1, ..., wn exist such that every point (x1, x2, ..., xn) ∈ P satisfies Σ_{i=1}^{n} wi ∗ xi > w0 and every point (x1, x2, ..., xn) ∈ N satisfies Σ_{i=1}^{n} wi ∗ xi < w0.

Proposition: If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector wt a finite number of times. In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector wt is found after a finite number of steps t which can separate the two sets.

Proof: On the next slide

Slide 45

Setup: If x ∈ N then −x ∈ P (∵ wᵀx < 0 ⟹ wᵀ(−x) ≥ 0). We can thus consider a single set P′ = P ∪ N⁻ and for every element p ∈ P′ ensure that wᵀp ≥ 0. Further, we will normalize all the p's so that ||p|| = 1 (notice that this does not affect the solution: if wᵀ(p/||p||) ≥ 0 then wᵀp ≥ 0). Let w∗ be the normalized solution vector (we know one exists as the data is linearly separable).

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;
N ← inputs with label 0;
N⁻ contains negations of all points in N;
P′ ← P ∪ N⁻;
Initialize w randomly;
while !convergence do
    Pick random p ∈ P′;
    p ← p / ||p|| (so now, ||p|| = 1);
    if w · p < 0 then
        w = w + p;
    end
end
// the algorithm converges when all the inputs are classified correctly
// notice that we do not need the other if condition because by construction we want all points in P′ to lie in the positive half space w · p ≥ 0
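A small sketch of this reformulated version (Python; it reuses the OR data from the earlier sketch, and the helper names are mine):

```python
import numpy as np

def train_perceptron_prime(P, N, max_iters=10000, seed=0):
    """PLA on P' = P ∪ (−N) with every point normalized to unit length."""
    rng = np.random.default_rng(seed)
    P_prime = [np.asarray(p, float) for p in P] + [-np.asarray(n, float) for n in N]
    P_prime = [p / np.linalg.norm(p) for p in P_prime]     # now ||p|| = 1
    w = rng.normal(size=P_prime[0].shape)
    for _ in range(max_iters):
        if all(w @ p >= 0 for p in P_prime):               # every p in the positive half space
            return w
        p = P_prime[rng.integers(len(P_prime))]
        if w @ p < 0:
            w = w + p                                      # the only correction needed
    raise RuntimeError("no convergence")

print(train_perceptron_prime(P=[(1, 0, 1), (1, 1, 0), (1, 1, 1)], N=[(1, 0, 0)]))
```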

Slide 46

Observations:
w∗ is some optimal solution which exists but we don't know what it is.
We do not make a correction at every time-step.
We make a correction only if wᵀ · pi ≤ 0 at that time step.
So at time-step t we would have made only k (≤ t) corrections.
Every time we make a correction a quantity δ gets added to the numerator.
So by time-step t, a quantity kδ gets added to the numerator.

Proof: Now suppose at time step t we inspected the point pi and found that wᵀ · pi ≤ 0. We make a correction wt+1 = wt + pi. Let β be the angle between w∗ and wt+1.

cos β = (w∗ · wt+1) / ||wt+1||

Numerator = w∗ · wt+1 = w∗ · (wt + pi)
          = w∗ · wt + w∗ · pi
          ≥ w∗ · wt + δ            (δ = min{w∗ · pi | ∀i})
          ≥ w∗ · (wt−1 + pj) + δ
          ≥ w∗ · wt−1 + w∗ · pj + δ
          ≥ w∗ · wt−1 + 2δ
          ≥ w∗ · w0 + kδ           (by induction)

Slide 47

Proof (continued): So far we have,

wᵀ · pi ≤ 0 (and hence we made the correction)
cos β = (w∗ · wt+1) / ||wt+1||  (by definition)
Numerator ≥ w∗ · w0 + kδ  (proved by induction)

Denominator² = ||wt+1||²
             = (wt + pi) · (wt + pi)
             = ||wt||² + 2 wt · pi + ||pi||²
             ≤ ||wt||² + ||pi||²     (∵ wt · pi ≤ 0)
             ≤ ||wt||² + 1           (∵ ||pi||² = 1)
             ≤ (||wt−1||² + 1) + 1
             ≤ ||wt−1||² + 2
             ≤ ||w0||² + k           (by the same induction we used for δ)

Slide 48

Proof (continued): So far we have,

wᵀ · pi ≤ 0 (and hence we made the correction)
cos β = (w∗ · wt+1) / ||wt+1||  (by definition)
Numerator ≥ w∗ · w0 + kδ  (proved by induction)
Denominator² ≤ ||w0||² + k  (proved by induction)

cos β ≥ (w∗ · w0 + kδ) / √(||w0||² + k)

cos β thus grows proportional to √k. As k (the number of corrections) increases, cos β can become arbitrarily large. But since cos β ≤ 1, k must be bounded by a maximum number. Thus, there can only be a finite number of corrections (k) to w and the algorithm will converge!

Slide 49

Coming back to our questions ...
What about non-boolean (say, real) inputs? Real valued inputs are allowed in a perceptron.
Do we always need to hand code the threshold? No, we can learn the threshold.
Are all inputs equal? What if we want to assign more weight (importance) to some inputs? A perceptron allows weights to be assigned to inputs.
What about functions which are not linearly separable? Not possible with a single perceptron, but we will see how to handle this ..

Slide 50

Module 2.7: Linearly Separable Boolean Functions

Slide 51

So what do we do about functions which are not linearly separable? Let us see one such simple boolean function first.

Slide 52

x1  x2  XOR
0   0   0     w0 + Σ_{i=1}^{2} wi xi < 0
0   1   1     w0 + Σ_{i=1}^{2} wi xi ≥ 0
1   0   1     w0 + Σ_{i=1}^{2} wi xi ≥ 0
1   1   0     w0 + Σ_{i=1}^{2} wi xi < 0

w0 + w1 · 0 + w2 · 0 < 0 ⟹ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 ⟹ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 ⟹ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 ⟹ w1 + w2 < −w0

The fourth condition contradicts conditions 2 and 3. Hence we cannot have a solution to this set of inequalities.

[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) colored by the XOR output]

And indeed you can see that it is impossible to draw a line which separates the red points from the blue points.
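A quick numerical sanity check of this impossibility (a sketch; the grid of candidate weights is arbitrary):

```python
import numpy as np
from itertools import product

def is_xor(w0, w1, w2):
    """Does the perceptron w0 + w1*x1 + w2*x2 >= 0 reproduce XOR on all 4 inputs?"""
    return all((w0 + w1 * x1 + w2 * x2 >= 0) == bool(x1 ^ x2)
               for x1, x2 in product([0, 1], repeat=2))

grid = np.linspace(-3, 3, 61)
print(any(is_xor(w0, w1, w2) for w0 in grid for w1 in grid for w2 in grid))   # False
```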

Slide 53

[Plot: a scatter of positive (+) and negative (o) points that cannot be separated by a line]

Most real world data is not linearly separable and will always contain some outliers. In fact, sometimes there may not be any outliers but still the data may not be linearly separable. We need computational units (models) which can deal with such data. While a single perceptron cannot deal with such data, we will show that a network of perceptrons can indeed deal with such data.

Slide 54

Before seeing how a network of perceptrons can deal with linearly inseparable data, we will discuss boolean functions in some more detail ...

Slide 55

How many boolean functions can you design from 2 inputs? Let us begin with some easy ones which you already know ..

x1  x2  f1  f2  f3  f4  f5  f6  f7  f8  f9  f10  f11  f12  f13  f14  f15  f16
0   0   0   0   0   0   0   0   0   0   1   1    1    1    1    1    1    1
0   1   0   0   0   0   1   1   1   1   0   0    0    0    1    1    1    1
1   0   0   0   1   1   0   0   1   1   0   0    1    1    0    0    1    1
1   1   0   1   0   1   0   1   0   1   0   1    0    1    0    1    0    1

Of these, how many are linearly separable? (It turns out all except XOR and !XOR - feel free to verify.) In general, how many boolean functions can you have for n inputs? 2^(2^n). How many of these 2^(2^n) functions are not linearly separable? For the time being, it suffices to know that at least some of these may not be linearly separable (I encourage you to figure out the exact answer :-) )
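This can also be checked by brute force (a sketch; the weight grid is coarse but sufficient for 2 inputs):

```python
import numpy as np
from itertools import product

points = list(product([0, 1], repeat=2))          # (0,0), (0,1), (1,0), (1,1)
grid = np.linspace(-2, 2, 21)

def linearly_separable(truth):
    """Is there a line w0 + w1*x1 + w2*x2 = 0 realizing this truth table?"""
    return any(all((w0 + w1 * x1 + w2 * x2 >= 0) == t
                   for (x1, x2), t in zip(points, truth))
               for w0 in grid for w1 in grid for w2 in grid)

count = sum(linearly_separable(truth) for truth in product([False, True], repeat=4))
print(count)    # 14 of the 16 functions; the two exceptions are XOR and !XOR
```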

Slide 56

Module 2.8: Representation Power of a Network of Perceptrons

Slide 57

We will now see how to implement any boolean function using a network of perceptrons ...

Slide 58

[Figure: a network with inputs x1, x2, four hidden perceptrons each with bias = −2, and an output perceptron y with weights w1, w2, w3, w4; red edges indicate w = −1, blue edges indicate w = +1]

For this discussion, we will assume True = +1 and False = −1. We consider 2 inputs and 4 perceptrons. Each input is connected to all the 4 perceptrons with specific weights. The bias (w0) of each perceptron is −2 (i.e., each perceptron will fire only if the weighted sum of its inputs is ≥ 2). Each of these perceptrons is connected to an output perceptron by weights (which need to be learned). The output of this perceptron (y) is the output of this network.

Slide 59

[Figure: the same network with the hidden outputs labelled h1, h2, h3, h4; red edges indicate w = −1, blue edges indicate w = +1]

Terminology: This network contains 3 layers. The layer containing the inputs (x1, x2) is called the input layer. The middle layer containing the 4 perceptrons is called the hidden layer. The final layer containing one output neuron is called the output layer. The outputs of the 4 perceptrons in the hidden layer are denoted by h1, h2, h3, h4. The red and blue edges are called layer 1 weights. w1, w2, w3, w4 are called layer 2 weights.

Slide 60

[Figure: the network with hidden perceptrons h1, h2, h3, h4 labelled by the input patterns {−1,−1}, {−1,1}, {1,−1}, {1,1}, bias = −2, and output y with weights w1, w2, w3, w4; red edges indicate w = −1, blue edges indicate w = +1]

We claim that this network can be used to implement any boolean function (linearly separable or not)! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network. Astonishing claim! Well, not really, if you understand what is going on. Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input): the first perceptron fires for {−1,−1}, the second perceptron fires for {−1,1}, the third perceptron fires for {1,−1}, and the fourth perceptron fires for {1,1}. Let us see why this network works by taking an example.

Slide 61

[Figure: the same network, repeated for reference; red edges indicate w = −1, blue edges indicate w = +1]

Let w0 be the bias of the output neuron (i.e., it will fire if Σ_{i=1}^{4} wi hi ≥ w0).

x1  x2  XOR  h1  h2  h3  h4  Σ_{i=1}^{4} wi hi
0   0   0    1   0   0   0   w1
0   1   1    0   1   0   0   w2
1   0   1    0   0   1   0   w3
1   1   0    0   0   0   1   w4

This results in the following four conditions to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0. Unlike before, there are no contradictions now and the system of inequalities can be satisfied. Essentially each wi is now responsible for one of the 4 possible inputs and can be adjusted to get the desired output for that input.
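A small sketch of this construction (Python; inputs use True = +1, False = −1 as assumed above, and the layer-2 weights chosen here are just one assignment satisfying w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0 with w0 = 1):

```python
import numpy as np

def perceptron(x, w, bias):
    """Fires (returns 1) iff bias + w · x >= 0."""
    return 1 if bias + np.dot(w, x) >= 0 else 0

# Layer 1: each hidden unit detects exactly one input pattern (bias = -2).
# Row k holds the incoming weights of h_k; -1 is a "red" edge, +1 a "blue" edge.
W1 = np.array([[-1, -1],    # h1 fires only for (-1, -1)
               [-1, +1],    # h2 fires only for (-1, +1)
               [+1, -1],    # h3 fires only for (+1, -1)
               [+1, +1]])   # h4 fires only for (+1, +1)

w2, bias2 = np.array([0, 1, 1, 0]), -1   # output fires iff sum(wi*hi) >= 1 (= w0)

def network(x1, x2):
    x = np.array([x1, x2])                               # inputs in {-1, +1}
    h = np.array([perceptron(x, w, -2) for w in W1])     # hidden layer h1..h4
    return perceptron(h, w2, bias2)                      # output perceptron y

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, "->", network(x1, x2))   # 0, 1, 1, 0  (XOR)
```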

Slide 62

[Figure: the same network, repeated for reference]

It should be clear that the same network can be used to represent the remaining 15 boolean functions also. Each boolean function will result in a different set of non-contradicting inequalities which can be satisfied by appropriately setting w1, w2, w3, w4. Try it!

Slide 63

What if we have more than 2 inputs?

Slide 64

Again, each of the 8 perceptrons will fire only for one of the 8 inputs. Each of the 8 weights in the second layer is responsible for one of the 8 inputs and can be adjusted to produce the desired output for that input.

[Figure: a network with inputs x1, x2, x3, eight hidden perceptrons each with bias = −3, and an output perceptron y with weights w1, ..., w8]

Slide 65

What if we have n inputs?

Slide 66

Theorem: Any boolean function of n inputs can be represented exactly by a network of perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer containing 1 perceptron.

Proof (informal): We just saw how to construct such a network.

Note: A network of 2^n + 1 perceptrons is not necessary but sufficient. For example, we already saw how to represent the AND function with just 1 perceptron.

Catch: As n increases, the number of perceptrons in the hidden layer obviously increases exponentially.

Slide 67

Again, why do we care about boolean functions? How does this help us with our original problem, which was to predict whether we like a movie or not? Let us see!

Slide 68

[Figure: the 3-input network from before, and a data matrix whose rows p1, p2, ... (label y = 1) and n1, n2, ... (label y = 0) contain the feature values xi1, xi2, ..., xin of each movie]

We are given this data about our past movie experience. For each movie, we are given the values of the various factors (x1, x2, ..., xn) that we base our decision on, and we are also given the value of y (like/dislike). The pi's are the points for which the output was 1 and the ni's are the points for which it was 0. The data may or may not be linearly separable. The proof that we just saw tells us that it is possible to have a network of perceptrons and learn the weights in this network such that for any given pi or nj the output of the network will be the same as yi or yj (i.e., we can separate the positive and the negative points).

Slide 69

The story so far ... Networks of the form that we just saw (containing an input layer, an output layer, and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short). More appropriate terminology would be "Multilayered Network of Perceptrons", but MLP is the more commonly used name. The theorem that we just saw gives us the representation power of an MLP with a single hidden layer. Specifically, it tells us that an MLP with a single hidden layer can represent any boolean function.
