 
CS344: Introduction to Artificial Intelligence (associated lab: CS386)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 32: Sigmoid neuron; Feedforward N/W; Error Backpropagation
29th March, 2011
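The Perceptron Model
Inputs $x_1, \dots, x_{n-1}, x_n$ reach the neuron through weights $w_1, \dots, w_{n-1}, w_n$. Threshold = $\theta$, Output = $y$:
$$y = \begin{cases} 1 & \text{for } \sum_i w_i x_i \ge \theta \\ 0 & \text{otherwise} \end{cases}$$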
[Figure: step function; output $y$ plotted against $\sum_i w_i x_i$, jumping to 1 at $\theta$]
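The threshold rule of the perceptron can be sketched in a few lines of Python. This is only an illustration: the function name `perceptron_output` and the AND-gate weights used in the example are assumptions, not values from the lecture.

```python
def perceptron_output(weights, inputs, theta):
    """Return 1 if the weighted sum of inputs reaches the threshold, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# Hypothetical example: a 2-input AND gate with w = (1, 1) and theta = 1.5.
and_outputs = [perceptron_output((1, 1), (a, b), 1.5)
               for a in (0, 1) for b in (0, 1)]
```

With these assumed values only the input (1, 1) reaches the threshold, so the neuron behaves as an AND gate.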
Perceptron Training Algorithm
1. Start with a random value of $w$, ex: <0,0,0…>
2. Test for $w \cdot x_i > 0$. If the test succeeds for $i = 1, 2, \dots, n$ then return $w$
3. Modify $w$: $w_{next} = w_{prev} + x_{fail}$, and go back to step 2
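A minimal sketch of the training loop above, under stated assumptions: each input vector is augmented with a −1 component for the threshold, and vectors of the zero class are negated, so that the single test $w \cdot x_i > 0$ covers both classes. The function name `train_perceptron`, the step limit, and the AND-gate data are illustrative choices, not part of the lecture.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(vectors, max_steps=1000):
    """Repeat w_next = w_prev + x_fail until w . x > 0 holds for every vector."""
    w = [0] * len(vectors[0])          # start from <0,0,0...>
    for _ in range(max_steps):
        failed = next((x for x in vectors if dot(w, x) <= 0), None)
        if failed is None:
            return w                   # the test succeeded for all vectors
        w = [wi + xi for wi, xi in zip(w, failed)]
    raise RuntimeError("no separating weight vector found within max_steps")

# Assumed encoding for AND: (x1, x2, -1) for class 1, negated for class 0.
and_vectors = [(0, 0, 1), (0, -1, 1), (-1, 0, 1), (1, 1, -1)]
w_and = train_perceptron(and_vectors)
```

The last component of `w_and` plays the role of the threshold $\theta$. The step limit is only a guard: for a function that is not linearly separable the loop would otherwise never terminate.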
Feedforward Network
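Example - XOR
Calculation of XOR: $x_1 \oplus x_2 = x_1\bar{x}_2 + \bar{x}_1 x_2$. The output neuron has θ = 0.5, $w_1$ = 1, $w_2$ = 1 and takes the two hidden neurons $x_1\bar{x}_2$ and $\bar{x}_1 x_2$ as inputs.

Calculation of $\bar{x}_1 x_2$:

$x_1$   $x_2$   $\bar{x}_1 x_2$   constraint
0     0     0       $0 < \theta$
0     1     1       $w_2 \ge \theta$
1     0     0       $w_1 < \theta$
1     1     0       $w_1 + w_2 < \theta$

These are satisfied by θ = 1, $w_1$ = −1, $w_2$ = 1.5.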
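Example - XOR
[Figure: the complete network. Output neuron: θ = 0.5, $w_1$ = 1, $w_2$ = 1, fed by the hidden neurons $x_1\bar{x}_2$ and $\bar{x}_1 x_2$, each with threshold 1. Hidden neuron $x_1\bar{x}_2$: weight 1.5 from $x_1$, −1 from $x_2$. Hidden neuron $\bar{x}_1 x_2$: weight −1 from $x_1$, 1.5 from $x_2$.]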
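The XOR network can be checked by evaluating it on all four inputs. The sketch below uses the weights from the slides (1.5 and −1 into each hidden neuron, 1 and 1 into the output neuron, output threshold 0.5) and assumes a threshold of 1 for both hidden neurons; the function names are illustrative.

```python
def step(net, theta):
    """Perceptron output: 1 if net reaches the threshold, else 0."""
    return 1 if net >= theta else 0

def xor_network(x1, x2):
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)     # hidden neuron: x1 AND (NOT x2)
    h2 = step(-1.0 * x1 + 1.5 * x2, 1.0)    # hidden neuron: (NOT x1) AND x2
    return step(1.0 * h1 + 1.0 * h2, 0.5)   # output neuron: h1 OR h2

xor_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
# xor_table == {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```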
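Can Linear Neurons Work?
[Figure: network with inputs $x_1$, $x_2$, hidden neurons $h_1$, $h_2$ and one output neuron. Each neuron has a linear I-O characteristic: $y = m_1 x + c_1$ for $h_1$, $y = m_2 x + c_2$ for $h_2$, $y = m_3 x + c_3$ for the output neuron.]
$$h_1 = m_1(w_1 x_1 + w_2 x_2) + c_1$$
$h_2$ has the same linear form in $x_1$ and $x_2$, with its own weights and constants.
$$Out = (w_5 h_1 + w_6 h_2) + c_3 = k_1 x_1 + k_2 x_2 + k_3$$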
Note: The whole structure shown in the earlier slide is reducible to a single neuron with the behavior
$$Out = k_1 x_1 + k_2 x_2 + k_3$$
Claim: A neuron with linear I-O behavior can't compute X-OR.
Proof: Considering all possible cases [assuming 0.1 and 0.9 as the lower and upper thresholds]:
For (0,0), Zero class: $m(w_1 \cdot 0 + w_2 \cdot 0 - \theta) + c < 0.1 \Rightarrow c - m\theta < 0.1$
For (0,1), One class: $m(w_2 \cdot 1 + w_1 \cdot 0 - \theta) + c > 0.9 \Rightarrow m w_2 - m\theta + c > 0.9$
For (1,0), One class: $m(w_1 \cdot 1 + w_2 \cdot 0 - \theta) + c > 0.9 \Rightarrow m w_1 - m\theta + c > 0.9$
For (1,1), Zero class: $m(w_1 \cdot 1 + w_2 \cdot 1 - \theta) + c < 0.1 \Rightarrow m w_1 + m w_2 - m\theta + c < 0.1$
These equations are inconsistent: adding the two One-class inequalities and subtracting the (0,0) inequality gives $m w_1 + m w_2 - m\theta + c > 1.7$, which contradicts the (1,1) inequality. Hence X-OR can't be computed.
Observations:
1. A linear neuron can't compute X-OR.
2. A multilayer FFN with linear neurons is collapsible to a single linear neuron, hence no additional power due to the hidden layer.
3. Non-linearity is essential for power.
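The collapse of a linear two-layer network into a single linear neuron can be verified numerically. In this sketch the names `w3`, `w4`, `m2` and `c2` for the second hidden neuron are assumptions (only $h_1$ is written out on the slide), and all parameter values are arbitrary integers chosen for illustration.

```python
def linear_network(x1, x2, p):
    """Two linear hidden neurons feeding a linear output neuron."""
    h1 = p["m1"] * (p["w1"] * x1 + p["w2"] * x2) + p["c1"]
    h2 = p["m2"] * (p["w3"] * x1 + p["w4"] * x2) + p["c2"]  # assumed form of h2
    return (p["w5"] * h1 + p["w6"] * h2) + p["c3"]

def collapse(p):
    """Coefficients k1, k2, k3 of the equivalent single linear neuron."""
    k1 = p["w5"] * p["m1"] * p["w1"] + p["w6"] * p["m2"] * p["w3"]
    k2 = p["w5"] * p["m1"] * p["w2"] + p["w6"] * p["m2"] * p["w4"]
    k3 = p["w5"] * p["c1"] + p["w6"] * p["c2"] + p["c3"]
    return k1, k2, k3

# Arbitrary illustrative parameters (integers keep the arithmetic exact).
params = {"m1": 2, "c1": 1, "m2": -3, "c2": 4, "c3": 5,
          "w1": 1, "w2": -2, "w3": 3, "w4": 2, "w5": -1, "w6": 2}
k1, k2, k3 = collapse(params)
same = all(linear_network(a, b, params) == k1 * a + k2 * b + k3
           for a in range(-2, 3) for b in range(-2, 3))
```

Whatever values are chosen, `same` stays true: the hidden layer adds no computing power as long as every neuron is linear.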
Multilayer Perceptron
Gradient Descent Technique
• Let E be the error at the output layer:
$$E = \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}(t_i - o_i)_j^2$$
• $t_i$ = target output; $o_i$ = observed output
• i is the index going over the n neurons in the outermost layer
• j is the index going over the p patterns (1 to p)
• Ex: XOR: p = 4 and n = 1
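As a small illustration of the error definition, the double sum can be computed directly. The observed outputs below are made-up numbers for an XOR network (p = 4 patterns, n = 1 output neuron), not results from the lecture.

```python
def total_error(targets, outputs):
    """E = 1/2 * sum over patterns j and output neurons i of (t_i - o_i)^2."""
    return 0.5 * sum((t - o) ** 2
                     for t_row, o_row in zip(targets, outputs)
                     for t, o in zip(t_row, o_row))

# XOR: one row per pattern, one column per output neuron.
xor_targets = [[0], [1], [1], [0]]
hypothetical_outputs = [[0.1], [0.9], [0.8], [0.2]]   # assumed values
E = total_error(xor_targets, hypothetical_outputs)    # approximately 0.05
```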
Weights in a FF NN
• $w_{mn}$ is the weight of the connection from the $n$th neuron to the $m$th neuron
• The E vs $W$ surface is a complex surface in the space defined by the weights $w_{ij}$
• $-\frac{\delta E}{\delta w_{mn}}$ gives the direction in which a movement of the operating point in the $w_{mn}$ co-ordinate space will result in maximum decrease in error:
$$\Delta w_{mn} \propto -\frac{\delta E}{\delta w_{mn}}$$
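The rule $\Delta w_{mn} \propto -\delta E / \delta w_{mn}$ can be illustrated on a toy error surface. Everything in this sketch is an assumption made for illustration: the bowl-shaped `toy_error`, the finite-difference estimate of the partial derivatives, and the step size `eta`.

```python
def numerical_gradient(error_fn, weights, eps=1e-6):
    """Estimate dE/dw_k for every weight by central differences."""
    grad = []
    for k in range(len(weights)):
        up, down = list(weights), list(weights)
        up[k] += eps
        down[k] -= eps
        grad.append((error_fn(up) - error_fn(down)) / (2 * eps))
    return grad

def descent_step(error_fn, weights, eta=0.1):
    """Move every weight against its partial derivative of the error."""
    grad = numerical_gradient(error_fn, weights)
    return [w - eta * g for w, g in zip(weights, grad)]

def toy_error(w):
    """Hypothetical error surface with its minimum at w = (1, -2)."""
    return (w[0] - 1) ** 2 + (w[1] + 2) ** 2

start = [0.0, 0.0]
error_before = toy_error(start)
error_after = toy_error(descent_step(toy_error, start))
```

One step from the origin lowers the error, since each weight moves in the direction of decreasing E.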
Sigmoid neurons
• Gradient Descent needs a derivative computation, which is not possible in the perceptron due to the discontinuous step function used!
• Sigmoid neurons with easy-to-compute derivatives are used instead: $y \to 1$ as $x \to \infty$, $y \to 0$ as $x \to -\infty$
• Computing power comes from the non-linearity of the sigmoid function.
Derivative of Sigmoid function
$$y = \frac{1}{1 + e^{-x}}$$
$$\frac{dy}{dx} = -\frac{-e^{-x}}{(1 + e^{-x})^2} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right) = y(1 - y)$$
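The identity $dy/dx = y(1 - y)$ can be checked numerically against a finite-difference estimate. The function names and the sample points are illustrative choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Closed form from the derivation: y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)

def finite_difference(x, eps=1e-6):
    """Central-difference estimate of dy/dx, used only as a cross-check."""
    return (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

max_gap = max(abs(sigmoid_derivative(x) - finite_difference(x))
              for x in (-3.0, -1.0, 0.0, 1.0, 3.0))
```

The gap stays tiny at every sample point, and the derivative peaks at x = 0, where y = 0.5 and y(1 − y) = 0.25.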
Training algorithm
• Initialize weights to random values.
• For input $x = \langle x_n, x_{n-1}, \dots, x_0 \rangle$, modify weights as follows (Target output = t, Observed output = o):
$$\Delta w_i \propto -\frac{\delta E}{\delta w_i}, \qquad E = \frac{1}{2}(t - o)^2$$
• Iterate until $E < \delta$ (threshold)
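Calculation of $\Delta w_i$
$$\frac{\delta E}{\delta w_i} = \frac{\delta E}{\delta net} \times \frac{\delta net}{\delta w_i} \qquad \left(\text{where } net = \sum_{i=0}^{n} w_i x_i\right)$$
$$\frac{\delta E}{\delta w_i} = \frac{\delta E}{\delta o} \times \frac{\delta o}{\delta net} \times \frac{\delta net}{\delta w_i} = -(t - o)\, o\, (1 - o)\, x_i$$
$$\Delta w_i = -\eta \frac{\delta E}{\delta w_i} \qquad (\eta = \text{learning constant},\ 0 \le \eta \le 1)$$
$$\Delta w_i = \eta\, (t - o)\, o\, (1 - o)\, x_i$$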
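The update rule $\Delta w_i = \eta (t - o)\, o (1 - o)\, x_i$ can be tried on a single sigmoid neuron. This is a sketch under assumptions: the OR function as training data, a constant first input of 1 acting as the bias, zero initial weights (instead of random ones, for reproducibility), η = 0.5, and a fixed number of passes instead of the $E < \delta$ stopping test.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def neuron_output(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_sigmoid_neuron(patterns, eta=0.5, epochs=5000):
    """Apply delta_w_i = eta * (t - o) * o * (1 - o) * x_i after each pattern."""
    w = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for x, t in patterns:
            o = neuron_output(w, x)
            delta = eta * (t - o) * o * (1.0 - o)
            w = [wi + delta * xi for wi, xi in zip(w, x)]
    return w

def total_error(w, patterns):
    return 0.5 * sum((t - neuron_output(w, x)) ** 2 for x, t in patterns)

# Assumed data: OR function, inputs written as (bias, x1, x2).
or_patterns = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
initial_error = total_error([0.0, 0.0, 0.0], or_patterns)
w_or = train_sigmoid_neuron(or_patterns)
final_error = total_error(w_or, or_patterns)
predictions = [round(neuron_output(w_or, x)) for x, _ in or_patterns]
```

Because the bias input is the same for every pattern, the weights on $x_1$ and $x_2$ change only for patterns where those inputs are non-zero, in line with the rule that $\Delta w_i$ scales with $x_i$.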
Observations
Does the training technique support our intuition?
• The larger the $x_i$, the larger is $\Delta w_i$
• Error burden is borne by the weight values corresponding to large input values