
SLIDE 1

Algorithms in Nature

Neural Networks (NN)

SLIDE 2

Mimicking the brain

  • In the early days of AI there was a lot of interest in developing models that can mimic human thinking.
  • While no one knew exactly how the brain works (and, even though there has been a lot of progress since, there is still a lot we do not know), some of the basic computational units were known.
  • A key component of these units is the neuron.
SLIDE 3

The Neuron

  • A cell in the brain
  • Highly connected to other neurons
  • Thought to perform computations by integrating signals from other neurons
  • Outputs of these computations may be transmitted to one or more neurons

SLIDE 4

Biological inspiration

  • The nervous system is built using relatively simple units, the neurons, so copying their behaviour and functionality could provide solutions to problems related to interpretation and optimization.

SLIDE 5

Biological inspiration

[Diagram of a neuron: dendrites, soma (cell body), axon]

SLIDE 6

Biological inspiration

[Diagram: axons, dendrites, synapses] Synapses are the edges in this network, responsible for transmitting information between the neurons.

SLIDE 7

Biological inspiration

  • The spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse.
  • The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron.
  • The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron.
  • The contribution of the signals depends on the strength of the synaptic connection.

SLIDE 8

What can we do with NN?

  • Classification
  • Regression

Input: real-valued variables. Output: one or more real values.

  • Examples:
    • Predict the price of Google’s stock from Microsoft’s stock price
    • Predict distance to obstacle from various sensors
SLIDE 9

Recall: Regression

  • In linear regression we assume that y and x are related by the following equation: y = wx + ε

[Plot: data points and fitted line in the (x, y) plane]

SLIDE 10

Multivariate regression: Least squares

  • We already presented a solution for determining the parameters of a general linear regression problem. Define:

Φ = ⎡ φ0(x1) φ1(x1) … φm(x1) ⎤
    ⎢ φ0(x2) φ1(x2) … φm(x2) ⎥
    ⎢   ⋮      ⋮         ⋮    ⎥
    ⎣ φ0(xn) φ1(xn) … φm(xn) ⎦

With the model y = wTφ(x) + ε, deriving w we get:

w = (ΦTΦ)−1ΦTy

SLIDE 11

Multivariate regression: Least squares

  • The solution turns out to be:

w = (ΦTΦ)−1ΦTy

  • We need to invert a k × k matrix (k − 1 is the number of features)
  • This takes O(k³)
  • Depending on k this can be rather slow
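As an illustrative sketch (not from the original slides), the closed-form solution w = (ΦᵀΦ)⁻¹Φᵀy can be computed with NumPy. The polynomial basis and the toy data here are assumptions for demonstration only:

```python
import numpy as np

def design_matrix(x, m):
    """Build Φ with columns φ_0(x)=1, φ_1(x)=x, ..., φ_m(x)=x^m
    (a polynomial basis is just one possible choice of φ)."""
    return np.vander(x, m + 1, increasing=True)

def least_squares(x, y, m=1):
    phi = design_matrix(x, m)
    # Solve (Φ^T Φ) w = Φ^T y directly; this linear solve is where
    # the O(k^3) cost mentioned on the next slide comes from.
    return np.linalg.solve(phi.T @ phi, phi.T @ y)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0            # noiseless line, so w should recover [1, 2]
w = least_squares(x, y, m=1)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard, numerically safer way to evaluate this formula.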

SLIDE 12

Where we are

  • Linear regression – solved!
  • But
    • Solution may be slow
    • Does not address general regression problems of the form y = f(wTx)

SLIDE 13

Back to NN: Perceptron

  • The basic processing unit of a neural net

y = f(∑i wixi)

[Diagram: input layer x1 … xk plus a constant input 1, with weights w0, w1, …, wk feeding the output unit y]
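A minimal sketch of this processing unit (the function names and toy numbers are illustrative assumptions, not from the slides):

```python
import numpy as np

def perceptron(x, w, f=lambda s: s):
    """Single perceptron unit: y = f(sum_i w_i x_i).
    A constant input 1 is prepended so that w[0] acts as the bias w_0.
    With f = identity this is exactly a linear unit."""
    x = np.concatenate(([1.0], x))
    return f(np.dot(w, x))

# weights w0=0.5, w1=1.0, w2=-1.0 applied to inputs (2, 3):
# 0.5*1 + 1.0*2 - 1.0*3 = -0.5
out = perceptron(np.array([2.0, 3.0]), np.array([0.5, 1.0, -1.0]))
```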

SLIDE 14

Linear regression

  • Let’s start by setting f(∑wixi) = ∑wixi
  • We are back to linear regression
  • Unlike our original linear regression solution, for perceptrons we will use a different strategy
  • Why?

y = ∑i wixi

[Diagram: inputs x1 … xk and constant input 1 with weights w0 … wk]

SLIDE 15

Gradient descent

z = (f(w) − y)²

[Plot: z as a function of w; slope = ∂z/∂w, with increments ∆z, ∆w]

  • Going in the opposite direction to the slope will lead to a smaller z
  • But not too much, otherwise we would go beyond the optimal w
SLIDE 16

Gradient descent

  • Going in the opposite direction to the slope will lead to a smaller z
  • But not too much, otherwise we would go beyond the optimal w
  • We thus update the weights by setting:

w ← w − λ ∂z/∂w

where λ is a small constant which is intended to prevent us from passing the optimal w

SLIDE 17

Example when choosing the ‘right’ λ

  • We get a monotonically decreasing error as we perform more updates

SLIDE 18

Gradient descent for linear regression

  • Taking the derivative w.r.t. each wi for a sample x:

∂/∂wi (y − ∑k wkxk)² = −2xi (y − ∑k wkxk)

  • And if we have n measurements then

∂/∂wi ∑j (yj − wTxj)² = −2 ∑j xj,i (yj − wTxj)

where xj,i is the i’th value of the j’th input vector

SLIDE 19

Gradient descent for linear regression

  • If we have n measurements then

∂/∂wi ∑j (yj − wTxj)² = −2 ∑j xj,i (yj − wTxj)

  • Set δj = (yj − wTxj)
  • Then our update rule can be written as

wi ← wi + 2λ ∑j δj xj,i

SLIDE 20

Gradient descent algorithm for linear regression

1. Choose λ
2. Start with a guess for w
3. Compute δj = yj − wTxj for all j
4. For all i set wi ← wi + 2λ ∑j δj xj,i
5. If no improvement in ∑j (yj − wTxj)², stop. Otherwise go to step 3
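The algorithm above can be sketched in a few lines of NumPy (a hedged sketch: λ, the stopping tolerance, and the toy data are illustrative assumptions):

```python
import numpy as np

def gd_linear_regression(X, y, lam=0.01, max_iter=10000, tol=1e-12):
    n, k = X.shape
    w = np.zeros(k)                       # step 2: initial guess
    prev_err = np.inf
    for _ in range(max_iter):
        delta = y - X @ w                 # step 3: δ_j = y_j - w^T x_j
        w = w + 2 * lam * X.T @ delta     # step 4: w_i += 2λ Σ_j δ_j x_{j,i}
        err = np.sum((y - X @ w) ** 2)    # step 5: stop when no improvement
        if prev_err - err < tol:
            break
        prev_err = err
    return w

# first column of 1s plays the role of the constant input for w_0
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])             # generated by y = 1 + 2x
w = gd_linear_regression(X, y)
```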

SLIDE 21

Gradient descent vs. matrix inversion

  • Advantages of matrix inversion
  • No iterations
  • No need to specify parameters
  • Closed form solution in a predictable time
  • Advantages of gradient descent
  • Applicable regardless of the number of parameters
  • General, applies to other forms of regression
SLIDE 22

Perceptrons for classification

  • So far we discussed regression
  • However, perceptrons can also be used for classification
  • For example, output 1 if wTx > 0 and -1 otherwise
  • Problem?
SLIDE 23

Regression for classification

  • Assume we would like to use linear regression to learn

the parameters for a classification problem

  • Problems?

[Plot: data labeled 1 and −1 with the optimal regression model]

wTx ≥ 0 ⇒ classify as 1
wTx < 0 ⇒ classify as -1

SLIDE 24

The sigmoid function

  • To classify using regression models we replace the linear function with the sigmoid function:

g(h) = 1 / (1 + e^(−h))

  • Using the sigmoid we set (for binary classification problems) p(y | x;θ), always between 0 and 1:

p(y = 0 | x;θ) = 1 / (1 + e^(wTx))
p(y = 1 | x;θ) = 1 − p(y = 0 | x;θ) = e^(wTx) / (1 + e^(wTx))
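A short sketch of the sigmoid and the two class probabilities above (the example weight and input vectors are illustrative assumptions):

```python
import numpy as np

def g(h):
    """The sigmoid function g(h) = 1 / (1 + e^{-h})."""
    return 1.0 / (1.0 + np.exp(-h))

def p_y0(w, x):
    """p(y = 0 | x) = 1 / (1 + e^{w^T x})."""
    return 1.0 / (1.0 + np.exp(w @ x))

def p_y1(w, x):
    """p(y = 1 | x) = 1 - p(y = 0 | x) = e^{w^T x} / (1 + e^{w^T x})."""
    return 1.0 - p_y0(w, x)

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.25])
total = p_y0(w, x) + p_y1(w, x)   # the two probabilities sum to 1
```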

SLIDE 25

The sigmoid function

  • To classify using regression models we replace the linear function with the sigmoid function:

g(h) = 1 / (1 + e^(−h))

  • Using the sigmoid we set (for binary classification problems) p(y | x;θ), always between 0 and 1:

p(y = 0 | x;θ) = 1 / (1 + e^(wTx))
p(y = 1 | x;θ) = 1 − p(y = 0 | x;θ) = e^(wTx) / (1 + e^(wTx))

We can use the sigmoid function as part of the perceptron when using it for classification.

SLIDE 26

Logistic regression vs. Linear regression

p(y = 0 | x;θ) = 1 / (1 + e^(wTx))

p(y = 1 | x;θ) = e^(wTx) / (1 + e^(wTx))

SLIDE 27

Non linear regression with NN

  • So how do we find the parameters?
  • Least squares minimization when using a sigmoid function in a NN:

min_w ∑j (yj − g(wTxj))²,  where g(x) = 1 / (1 + e^(−x))

Taking the derivative w.r.t. wi we get:

∂/∂wi ∑j (yj − g(wTxj))² = −2 ∑j (yj − g(wTxj)) g(wTxj)(1 − g(wTxj)) xj,i

using g′(x) = g(x)(1 − g(x))

SLIDE 28

Deriving g’(x)

  • Recall that g(x) is the sigmoid function, so g(x) = 1 / (1 + e^(−x))
  • The derivation of g′(x) is below:

g′(x) = e^(−x) / (1 + e^(−x))² = g(x)(1 − g(x))

SLIDE 29

New target function for NN

  • So how do we find the parameters?
  • Least squares minimization when using a sigmoid function in a NN:

min_w ∑j (yj − g(wTxj))²,  where g(x) = 1 / (1 + e^(−x))

Taking the derivative w.r.t. wi we get:

∂/∂wi ∑j (yj − g(wTxj))² = −2 ∑j (yj − g(wTxj)) g(wTxj)(1 − g(wTxj)) xj,i = −2 ∑j δj gj(1 − gj) xj,i

where gj ≝ g(wTxj), δj ≝ yj − gj, and g′(x) = g(x)(1 − g(x))

SLIDE 30

Revised algorithm for sigmoid regression

1. Choose λ
2. Start with a guess for w
3. Compute δj = yj − g(wTxj) for all j
4. For all i set wi ← wi + 2λ ∑j δj gj(1 − gj) xj,i
5. If no improvement in ∑j (yj − g(wTxj))², stop. Otherwise go to step 3
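The sigmoid update rule above can be sketched as follows (a hedged sketch: λ, the iteration count, and the separable toy data are illustrative assumptions):

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_regression(X, y, lam=0.5, iters=2000):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        gj = g(X @ w)                    # g_j = g(w^T x_j)
        delta = y - gj                   # δ_j = y_j - g_j
        # w_i <- w_i + 2λ Σ_j δ_j g_j (1 - g_j) x_{j,i}
        w = w + 2 * lam * X.T @ (delta * gj * (1 - gj))
    return w

# first column of 1s is the constant input; labels separable in x
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = sigmoid_regression(X, y)
preds = (g(X @ w) > 0.5).astype(float)
```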

SLIDE 31

Multilayer neural networks

  • So far we discussed networks with one layer.
  • But these networks can be extended to combine several

layers, increasing the set of functions that can be represented using a NN

[Diagram: input layer x1, x2 and constant 1, with weights w0,1, w1,1, w2,1, w0,2, w1,2, w2,2 feeding hidden units v1 = g(wTx) and v2 = g(wTx); hidden-layer weights w1, w2 feed the output y = g(wTv)]

Input layer → Hidden layer → Output layer

SLIDE 32

Learning the parameters for multilayer networks

  • Gradient descent works by connecting the output to the inputs.
  • But how do we use it for a multilayer network?
  • We need to account for both the output weights and the hidden layer weights.

[Diagram: inputs x1, x2, 1 feeding hidden units v1 = g(wTx) and v2 = g(wTx), which feed the output y = g(wTv)]

SLIDE 33

Learning the parameters for multilayer networks

  • If we know the values of the internal layer, it is easy to compute the update rule for the output weights w1 and w2:

wi ← wi + 2λ ∑j δj gj(1 − gj) vj,i

where δj = (yj − g(wTvj))

[Diagram: inputs x1, x2, 1 feeding hidden units v1 = g(wTx) and v2 = g(wTx), which feed the output y = g(wTv)]

SLIDE 34

Learning the parameters for multilayer networks

  • It’s easy to compute the update rule for the output weights w1 and w2:

wi ← wi + 2λ ∑j δj gj(1 − gj) vj,i

where δj = (yj − g(wTvj))

[Diagram: inputs x1, x2, 1 feeding hidden units v1 = g(wTx) and v2 = g(wTx), which feed the output y = g(wTv)]

But what is the error associated with each of the hidden layer states?

SLIDE 35

Backpropagation

  • A method for distributing the error among hidden layer states
  • Using the error for each of these states we can employ gradient descent to update them
  • Set

∆j,i = wiδj(1 − gj)gj

(the output error δj weighted by wi)

[Diagram: inputs x1, x2, 1 feeding hidden units v1 = g(wTx) and v2 = g(wTx), which feed the output y = g(wTv)]

SLIDE 36

Backpropagation

  • A method for distributing the error among hidden layer states
  • Using the error for each of these states we can employ gradient descent to update them
  • Set ∆j,i = wiδj(1 − gj)gj
  • Our update rule changes to:

wk,i ← wk,i + 2λ ∑j ∆j,i gj,i(1 − gj,i) xj,k

SLIDE 37

Backpropagation

wk,i ← wk,i + 2λ ∑j ∆j,i gj,i(1 − gj,i) xj,k

The correct error term for each hidden state can be determined by taking the partial derivative of each of the weight parameters of the hidden layer w.r.t. the global error function*:

Err = ∑j (yj − g(∑i wi g(wiTxj)))²

*See RN book for details (pages 746-747)

SLIDE 38

Revised algorithm for multilayered neural network

1. Choose λ
2. Start with a guess for w, wi
3. Compute values vj,i for all hidden layer states i and inputs j
4. Compute δj = yj − g(wTvj) for all j
5. Compute ∆j,i = wiδj(1 − gj)gj
6. For all i set wi ← wi + 2λ ∑j δj gj(1 − gj) vj,i
7. For all k and i set wk,i ← wk,i + 2λ ∑j ∆j,i gj,i(1 − gj,i) xj,k
8. If no improvement in ∑j δj² + ∑i ∑j ∆j,i², stop. Otherwise go to step 3
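The steps above can be sketched for a one-hidden-layer network (a hedged sketch: the XOR-style data, network size, λ, iteration count, and random initialization are all illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def g(h):
    return 1.0 / (1.0 + np.exp(-h))

def train(X, y, hidden=4, lam=0.1, iters=4000):
    n, d = X.shape
    W = rng.normal(scale=0.5, size=(hidden, d))  # step 2: hidden weights
    w = rng.normal(scale=0.5, size=hidden)       # step 2: output weights
    errs = []
    for _ in range(iters):
        V = g(X @ W.T)                 # step 3: hidden values v_{j,i}
        out = g(V @ w)                 # output g_j = g(w^T v_j)
        delta = y - out                # step 4: δ_j
        errs.append(np.sum(delta ** 2))
        # step 5: ∆_{j,i} = w_i δ_j (1 - g_j) g_j
        Delta = np.outer(delta * out * (1 - out), w)
        # step 6: output weights; step 7: hidden weights
        w = w + 2 * lam * V.T @ (delta * out * (1 - out))
        W = W + 2 * lam * (Delta * V * (1 - V)).T @ X
    return W, w, errs

# XOR is not linearly separable, so a hidden layer is required
X = np.array([[1.0, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias + inputs
y = np.array([0.0, 1.0, 1.0, 0.0])
W, w, errs = train(X, y)
```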

SLIDE 39

Examples

Figure 1: Feedforward ANN designed and tested for prediction of tactical air combat maneuvers.

SLIDE 40

Deep learning

SLIDE 41

Historical background:

First generation neural networks

  • Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
    – There was a neat learning algorithm for adjusting the weights.
    – But perceptrons are fundamentally limited in what they can learn to do.

[Sketch of a typical perceptron from the 1960’s: input units (e.g. pixels) → non-adaptive hand-coded features → output units (e.g. class labels, such as Bomb vs. Toy)]

SLIDE 42

Second generation neural networks (~1985)

[Diagram: input vector → hidden layers → outputs. Compare outputs with the correct answer to get an error signal; back-propagate the error signal to get derivatives for learning.]

SLIDE 43

What is wrong with back-propagation?

  • It requires labeled training data.
    – Almost all data is unlabeled.
  • The learning time does not scale well.
    – It is very slow in networks with multiple hidden layers.
  • It can get stuck in poor local optima.
SLIDE 44

Overcoming the limitations of back-propagation

  • Keep the efficiency and simplicity of using a gradient method for adjusting the weights, but use it for modeling the structure of the sensory input.
    – Iteratively learn the different layers.
    – Adjust the weights to maximize the probability that a generative model would have produced the sensory input.
    – Learn p(image), not p(label | image), for the lower layers.

SLIDE 45

Iterative learning of layers

[Diagram: input layer → hidden layer → reconstruction]

SLIDE 46

Iterative learning of layers

[Diagram: input layer → hidden layer → reconstruction → second hidden layer]

SLIDE 47

The final 50 x 256 weights

Each neuron grabs a different feature.

SLIDE 48

How well can we reconstruct the digit images from the binary feature activations?

[Figure: data and reconstructions from activated binary features, shown for new test images from the digit class that the model was trained on, and for images from an unfamiliar digit class (the network tries to see every image as a 2)]

SLIDE 49

Training a deep network

(the main reason RBMs are interesting)

  • First train a layer of features that receive input directly from the pixels.
  • Then treat the activations of the trained features as if they were pixels and learn features of features in a second hidden layer.
  • It can be proved that each time we add another layer of features we improve a variational lower bound on the log probability of the training data.
    – The proof is slightly complicated.
    – But it is based on a neat equivalence between an RBM and a deep directed model (described later).

SLIDE 50

Samples generated by letting the associative memory run with one label clamped. There are 1000 iterations of alternating Gibbs sampling between samples.

SLIDE 51

Features learned

SLIDE 52

Features learned

SLIDE 53

What you should know

  • Linear regression
  • Solving a linear regression problem
  • Gradient descent
  • Perceptrons
  • Sigmoid functions for classification
  • Multilayered neural networks
  • Backpropagation
SLIDE 54

Deriving g’(x)

  • Recall that g(x) is the sigmoid function, so g(x) = 1 / (1 + e^(−x))
  • The derivation of g′(x) is below:

g′(x) = e^(−x) / (1 + e^(−x))² = g(x)(1 − g(x))

SLIDE 55

The Energy of a joint configuration

E(v,h) = −∑i,j vi hj wij

where vi is the binary state of visible unit i, hj is the binary state of hidden unit j, wij is the weight between units i and j, and E(v,h) is the energy with configuration v on the visible units and h on the hidden units. Note that

−∂E(v,h)/∂wij = vi hj
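A tiny sketch of the energy E(v,h) = −∑ vi hj wij for binary state vectors (the example sizes and weight values are illustrative assumptions):

```python
import numpy as np

def energy(v, h, W):
    """E(v,h) = -sum_{i,j} v_i h_j w_ij; note -dE/dw_ij = v_i h_j."""
    return -float(v @ W @ h)

v = np.array([1, 0, 1])          # binary states of the visible units
h = np.array([1, 1])             # binary states of the hidden units
W = np.array([[0.5, -0.2],
              [0.3,  0.1],
              [-0.4, 0.6]])      # w_ij couples visible unit i to hidden unit j
# only terms with v_i = h_j = 1 contribute:
# -(0.5 - 0.2 - 0.4 + 0.6) = -0.5
E = energy(v, h, W)
```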

SLIDE 56

Using energies to define probabilities

  • The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energy of all other joint configurations.
  • The probability of a configuration of the visible units is the sum of the probabilities of all the joint configurations that contain it.

p(v,h) = e^(−E(v,h)) / ∑u,g e^(−E(u,g))

p(v) = ∑h e^(−E(v,h)) / ∑u,g e^(−E(u,g))

(the denominator, ∑u,g e^(−E(u,g)), is the partition function)

SLIDE 57

A picture of the maximum likelihood learning algorithm for an RBM

∂ log p(v)/∂wij = ⟨vi hj⟩⁰ − ⟨vi hj⟩^∞

[Diagram: alternating Gibbs sampling over units i and j at t = 0, t = 1, t = 2, …, t = ∞; ⟨vi hj⟩ is measured at t = 0 (the data) and at t = ∞ (a “fantasy”)]

Start with a training vector on the visible units. Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel.

SLIDE 58

A quick way to learn an RBM

> <

j ih

v

1

> <

j ih

v

i j i j t = 0 t = 1

) (

1

> < − > < = ∆

j i j i ij

h v h v w ε

Start with a training vector on the visible units. Update all the hidden units in parallel Update the all the visible units in parallel to get a “reconstruction”. Update the hidden units again.

This is not following the gradient of the log likelihood. But it works well. It is approximately following the gradient of another

  • bjective function (Carreira-Perpinan & Hinton, 2005).

reconstruction data
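One such quick (CD-1 style) weight update can be sketched as follows (a hedged sketch: the RBM size, ε, and the sampling details are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def g(h):
    return 1.0 / (1.0 + np.exp(-h))

def cd1_update(v0, W, eps=0.1):
    """One contrastive-divergence step: Δw_ij = ε(<v_i h_j>^0 - <v_i h_j>^1)."""
    # up: sample binary hidden units given the data vector
    h0 = (rng.random(W.shape[1]) < g(v0 @ W)).astype(float)
    # down: sample a "reconstruction" of the visible units
    v1 = (rng.random(W.shape[0]) < g(W @ h0)).astype(float)
    # up again: hidden probabilities driven by the reconstruction
    h1_prob = g(v1 @ W)
    return eps * (np.outer(v0, h0) - np.outer(v1, h1_prob))

W = rng.normal(scale=0.1, size=(4, 3))    # 4 visible, 3 hidden units
v0 = np.array([1.0, 0.0, 1.0, 0.0])       # one training vector
W = W + cd1_update(v0, W)
```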

SLIDE 59

What you should know

  • Linear regression
  • Solving a linear regression problem
  • Gradient descent
  • Perceptrons
  • Sigmoid functions for classification
  • Multilayered neural networks
  • Backpropagation