1. Fundamentals of Computational Neuroscience 2e, December 27, 2009. Chapter 6: Feed-forward mapping networks.

2. Digital representation of a letter A. [Figure: a binary pixel grid encoding the letter 'A'; pixels are numbered consecutively row by row and each pixel holds the value 0 or 1.] Optical character recognition: predict meaning from features, e.g., given the feature vector x, what is the character y? This is a mapping $f: \mathbf{x} \in S_1^n \rightarrow \mathbf{y} \in S_2^m$.

3. Examples given by look-up table.

Boolean AND function:
x1  x2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

Look-up table for a non-Boolean example function:
x1  x2 | y
 1   2 | -1
 2   1 |  1
 3  -2 |  5
-1  -1 |  7
...  ... | ...
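Such a look-up table can be written directly as paired input and target arrays. The following is a minimal sketch (variable names are illustrative, not from the book) that stores the Boolean AND table in the column-per-pattern convention used by the scripts later in this chapter:

% Boolean AND look-up table as training data
% (one pattern per column, as in perceptronTrain.m and mlp.m below)
x = [0 0 1 1; ...
     0 1 0 1];      % rows are x1 and x2
y = [0 0 0 1];      % desired output y = x1 AND x2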

4. The population node as perceptron. Update rule: $\mathbf{r}^{\text{out}} = g(w\,\mathbf{r}^{\text{in}})$, or component-wise $r_i^{\text{out}} = g(\sum_j w_{ij} r_j^{\text{in}})$. For example, with $r_i^{\text{in}} = x_i$, $\tilde{y} = r^{\text{out}}$, and a linear gain function $g(x) = x$: $\tilde{y} = w_1 x_1 + w_2 x_2$. [Figure: a single node with inputs $r_1^{\text{in}}$, $r_2^{\text{in}}$, weights $w_1$, $w_2$, summation $\Sigma$ and gain function $g$; the output $\tilde{y}$ forms a plane over the $(x_1, x_2)$ space.]
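As a concrete illustration (a minimal sketch, not taken from the book), the component-wise update rule can be evaluated in a single MATLAB line; the threshold value 0.5 matches the one used later in perceptronTrain.m:

% forward pass of a perceptron population node: r_out = g(w * r_in)
w     = [0.2 -0.4; 0.7 0.1];   % example weight matrix (2 output, 2 input nodes)
r_in  = [1; 0];                % example input rates
g     = @(h) h > 0.5;          % threshold gain function (assumption)
r_out = g(w * r_in)            % component-wise r_i^out = g(sum_j w_ij r_j^in)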

5. How to find the right weight values? Objective (error) function, for example the mean square error (MSE): $E = \frac{1}{2}\sum_i (r_i^{\text{out}} - y_i)^2$. Gradient descent method: $w_{ij} \leftarrow w_{ij} - \epsilon \frac{\partial E}{\partial w_{ij}} = w_{ij} - \epsilon\,(r_i^{\text{out}} - y_i)\, r_j^{\text{in}}$ for the MSE with linear gain. [Figure: the error surface E(w), with gradient descent moving the weight towards a minimum.]

Algorithm:
Initialize weights arbitrarily.
Repeat until the error is sufficiently small:
  Apply a sample pattern to the input nodes: $r_i^0 := r_i^{\text{in}} = \xi_i^{\text{in}}$.
  Calculate the rates of the output nodes: $r_i^{\text{out}} = g(\sum_j w_{ij} r_j^{\text{in}})$.
  Compute the delta term for the output layer: $\delta_i = g'(h_i^{\text{out}})(\xi_i^{\text{out}} - r_i^{\text{out}})$.
  Update the weight matrix by adding the term: $\Delta w_{ij} = \epsilon\, \delta_i\, r_j^{\text{in}}$.
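The gradient in this update is not derived on the slide; for the MSE with a linear gain function it follows in one step:

E = \frac{1}{2}\sum_i \left(r_i^{\mathrm{out}} - y_i\right)^2, \qquad
r_i^{\mathrm{out}} = \sum_j w_{ij} r_j^{\mathrm{in}}
\quad\Rightarrow\quad
\frac{\partial E}{\partial w_{ij}} = \left(r_i^{\mathrm{out}} - y_i\right) r_j^{\mathrm{in}},

so the gradient-descent step is exactly the delta rule $\Delta w_{ij} = \epsilon\,(\xi_i^{\mathrm{out}} - r_i^{\mathrm{out}})\, r_j^{\mathrm{in}}$ with $g'(h) = 1$; for a general gain function the extra factor $g'(h_i^{\mathrm{out}})$ appears, as in the algorithm above.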

6. Example: OCR. [Figure: A. Training pattern, displayed with >> displayLetter(1). B. Learning curve: average number of wrong bits versus training step. C. Generalization ability: average number of wrong letters versus the fraction of flipped bits, compared for a threshold activation function and a max activation function.]

7. Example: Boolean function.

A. Boolean OR function:
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1
The OR function is linearly separable: a single threshold node with weights $w_1 = w_2 = 1$ and threshold $\Theta = 1$ implements it, with the decision line $w_1 x_1 + w_2 x_2 = \Theta$.

B. Boolean XOR function:
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
No single line separates the two classes, so a single threshold perceptron cannot represent XOR.
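A quick numerical check (a sketch using the weights and threshold from panel A, not code from the book) confirms that the threshold node reproduces OR but cannot reproduce XOR:

% threshold node with w1 = w2 = 1 and Theta = 1 (the OR solution above)
x     = [0 0 1 1; 0 1 0 1];          % input patterns (one per column)
y_or  = [0 1 1 1];
y_xor = [0 1 1 0];
w = [1 1]; Theta = 1;
y_hat = (w * x) >= Theta;            % node output for all four patterns
isequal(y_hat, logical(y_or))        % true: OR is reproduced
isequal(y_hat, logical(y_xor))       % false: XOR is not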

8. perceptronTrain.m

%% Letter recognition with threshold perceptron
clear; clf;
nIn=12*13; nOut=26;
wOut=rand(nOut,nIn)-0.5;

% training vectors
load pattern1;
rIn=reshape(pattern1', nIn, 26);
rDes=diag(ones(1,26));

% Updating and training network
for training_step=1:20;
  % test all pattern
  rOut=(wOut*rIn)>0.5;
  distH=sum(sum((rDes-rOut).^2))/26;
  error(training_step)=distH;
  % training with delta rule
  wOut=wOut+0.1*(rDes-rOut)*rIn';
end

plot(0:19,error)
xlabel('Training step')
ylabel('Average Hamming distance')
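perceptronTrain.m loads its training bitmaps from pattern1.mat, which is distributed with the book and not reproduced here. The reshape(pattern1', nIn, 26) call is consistent with a 26 x 156 matrix holding one 12 x 13 letter bitmap per row; for a dry run without the data file, a random binary stand-in of that shape (purely illustrative, not actual letters) could replace the load pattern1 line:

% illustrative stand-in for the book's pattern1.mat (random bitmaps, not letters)
pattern1 = rand(26, 12*13) > 0.5;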

9. The multilayer perceptron (MLP). [Figure: a network with $n^{\text{in}}$ input nodes, $n^{\text{h}}$ hidden nodes and $n^{\text{out}}$ output nodes, connected by the weight matrices $w^{\text{h}}$ and $w^{\text{out}}$.] Update rule: $\mathbf{r}^{\text{out}} = g^{\text{out}}\!\left(w^{\text{out}} g^{\text{h}}(w^{\text{h}} \mathbf{r}^{\text{in}})\right)$. Learning rule (error backpropagation): $w_{ij} \leftarrow w_{ij} - \epsilon \frac{\partial E}{\partial w_{ij}}$.

10. The error-backpropagation algorithm.
Initialize weights arbitrarily.
Repeat until the error is sufficiently small:
  Apply a sample pattern to the input nodes: $r_i^0 := r_i^{\text{in}} = \xi_i^{\text{in}}$.
  Propagate the input through the network by calculating the rates of nodes in successive layers $l$: $r_i^l = g(h_i^l) = g(\sum_j w_{ij}^l r_j^{l-1})$.
  Compute the delta term for the output layer: $\delta_i^{\text{out}} = g'(h_i^{\text{out}})(\xi_i^{\text{out}} - r_i^{\text{out}})$.
  Back-propagate the delta terms through the network: $\delta_i^{l-1} = g'(h_i^{l-1}) \sum_j w_{ji}^l \delta_j^l$.
  Update each weight matrix by adding the term: $\Delta w_{ij}^l = \epsilon\, \delta_i^l\, r_j^{l-1}$.
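The factors r.*(1-r) that appear in mlp.m below are the derivative of the logistic gain function used there; the standard identity is

g(h) = \frac{1}{1 + e^{-h}} \quad\Rightarrow\quad g'(h) = g(h)\bigl(1 - g(h)\bigr),

so $g'(h_i) = r_i (1 - r_i)$ once the rate $r_i = g(h_i)$ has been computed.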

11. mlp.m

%% MLP with backpropagation learning on XOR problem
clear; clf;
N_i=2; N_h=2; N_o=1;
w_h=rand(N_h,N_i)-0.5; w_o=rand(N_o,N_h)-0.5;

% training vectors (XOR)
r_i=[0 1 0 1 ; 0 0 1 1];
r_d=[0 1 1 0];

% Updating and training network with sigmoid activation function
for sweep=1:10000;
  % training randomly on one pattern
  i=ceil(4*rand);
  r_h=1./(1+exp(-w_h*r_i(:,i)));
  r_o=1./(1+exp(-w_o*r_h));
  d_o=(r_o.*(1-r_o)).*(r_d(:,i)-r_o);
  d_h=(r_h.*(1-r_h)).*(w_o'*d_o);
  w_o=w_o+0.7*(r_h*d_o')';
  w_h=w_h+0.7*(r_i(:,i)*d_h')';
  % test all pattern
  r_o_test=1./(1+exp(-w_o*(1./(1+exp(-w_h*r_i)))));
  d(sweep)=0.5*sum((r_o_test-r_d).^2);
end
plot(d)

12. MLP for XOR function. [Figure: left, a 2-2-1 network that solves the XOR problem; right, the learning curve for the XOR problem, training error versus training steps.]

13. MLP approximating a sine function. [Figure: the network output f(x) plotted against x.]
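The slide shows only the result; below is a minimal sketch of how such a fit could be set up. It reuses the structure of mlp.m but with a linear output node (so the output is not confined to the interval (0,1)) and bias terms; the hidden-layer size, learning rate and sample points are illustrative assumptions, not values from the book.

% MLP regression on samples of a sine function (illustrative sketch)
x = linspace(0, 2*pi, 20); y = sin(x);            % training samples (assumption)
N_h = 8;                                          % hidden nodes (assumption)
w_h = rand(N_h,1)-0.5; b_h = rand(N_h,1)-0.5;     % hidden weights and biases
w_o = rand(1,N_h)-0.5; b_o = 0;                   % linear output node
eta = 0.01;                                       % learning rate (assumption)
for sweep = 1:20000
    k   = ceil(numel(x)*rand);                    % pick a random sample
    r_h = 1./(1+exp(-(w_h*x(k)+b_h)));            % sigmoidal hidden layer
    r_o = w_o*r_h + b_o;                          % linear output, g'(h) = 1
    d_o = y(k) - r_o;                             % output delta term
    d_h = (r_h.*(1-r_h)).*(w_o'*d_o);             % back-propagated delta term
    w_o = w_o + eta*d_o*r_h'; b_o = b_o + eta*d_o;
    w_h = w_h + eta*d_h*x(k); b_h = b_h + eta*d_h;
end
r_fit = w_o*(1./(1+exp(-(w_h*x + b_h*ones(1,numel(x)))))) + b_o;
plot(x, y, 'o', x, r_fit, '-')                    % samples versus network fit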

14. Overfitting and underfitting. [Figure: noisy data points around the true mean, together with an underfitting and an overfitting curve f(x).] Regularization, for example $E = \frac{1}{2}\sum_i (r_i^{\text{out}} - y_i)^2 + \frac{\gamma}{2}\sum_{ij} w_{ij}^2$.
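In the delta-rule update this regularizer adds a weight-decay term, since $\partial\bigl(\frac{\gamma}{2}\sum w^2\bigr)/\partial w_{ij} = \gamma\, w_{ij}$. In the notation of perceptronTrain.m this would read as follows (the value of gamma is an illustrative assumption):

% delta rule with weight decay from the regularized error function
gamma = 0.01;                                   % regularization strength (assumption)
wOut  = wOut + 0.1*(rDes-rOut)*rIn' - 0.1*gamma*wOut;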

15. Support Vector Machines. Linear large-margin classifier. [Figure: two classes of points in the $(x_1, x_2)$ plane separated by a maximum-margin hyperplane.]

16. SVM: Kernel trick. [Figure: A. a case that is not linearly separable in the original space; B. the same data after a feature-space mapping $\phi(\mathbf{x})$, where it becomes linearly separable.]
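The slides give no SVM code; as a rough illustration only, MATLAB's Statistics and Machine Learning Toolbox (if available) can train a kernel SVM on a toy non-separable data set with fitcsvm. The XOR-like data here is an assumption chosen to mirror panel A:

% toy data that is not linearly separable in the (x1, x2) plane
X = [0 0; 0 1; 1 0; 1 1];
Y = [-1; 1; 1; -1];
% a Gaussian (RBF) kernel corresponds to an implicit feature map phi(x)
% in which the two classes become linearly separable
model  = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', 1);
labels = predict(model, X)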

17. Further Readings
Simon Haykin (1999), Neural Networks: A Comprehensive Foundation, MacMillan (2nd edition).
John Hertz, Anders Krogh, and Richard G. Palmer (1991), Introduction to the Theory of Neural Computation, Addison-Wesley.
Berndt Müller, Joachim Reinhardt, and Michael Thomas Strickland (1995), Neural Networks: An Introduction, Springer.
Christopher M. Bishop (2006), Pattern Recognition and Machine Learning, Springer.
Laurence F. Abbott and Sacha B. Nelson (2000), Synaptic plasticity: taming the beast, Nature Neuroscience (suppl.), 3: 1178-83.
Christopher J. C. Burges (1998), A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2: 121-167.
Alex J. Smola and Bernhard Schölkopf (2004), A tutorial on support vector regression, Statistics and Computing, 14: 199-222.
David E. Rumelhart, James L. McClelland, and the PDP Research Group (1986), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press.
Peter McLeod, Kim Plunkett, and Edmund T. Rolls (1998), Introduction to Connectionist Modelling of Cognitive Processes, Oxford University Press.
E. Bruce Goldstein (1999), Sensation & Perception, Brooks/Cole Publishing Company (5th edition).
