(C) Regression, layered neural networks

(C) Regression, layered neural networks - PowerPoint PPT Presentation

(C) Regression, layered neural networks - Networks of continuous units - Regression problems - Gradient descent, backpropagation of error - The role of the learning rate - Online learning, stochastic approximation


  1. (C) Regression, layered neural networks - Networks of continuous units - Regression problems - Gradient descent, backpropagation of error - The role of the learning rate - Online learning, stochastic approximation

  2. Of Neurons and Networks - biological neurons (very brief): single neurons, synapses and networks, synaptic plasticity and learning; a simplified description as inspiration for artificial neural networks; artificial neural networks: architectures and types of networks, e.g. recurrent attractor neural networks (associative memory) and feed-forward neural networks (classification/regression)

  3. Of Neurons and Networks - neurons: highly specialized cells with a cell body (soma), incoming dendrites, and a branched axon; many neurons (≳ 10^12 in the human cortex), highly connected (≳ 1000 neighbors); pre-synaptic and post-synaptic cells meet at the synaptic cleft; action potentials / spikes: cells generate electric pulses that travel along the axon and its branches

  4. Of Neurons and Networks - synapses: a pre-synaptic pulse arriving at an excitatory/inhibitory synapse triggers/hinders post-synaptic spike generation; vesicles release transmitter into the synaptic cleft, where receptors pick it up; excitatory: the incoming pulse increases the postsynaptic membrane potential; inhibitory: it decreases the post-synaptic potential; all-or-nothing response: potential exceeds threshold ⇨ postsynaptic neuron fires, potential is sub-threshold ⇨ postsynaptic neuron rests

  5. Of Neurons and Networks - simplified description of neural activity: replace single spikes by firing rates (e.g. spikes / ms), i.e. the mean activity S(t) [figure: spike train vs. mean activity over time in ms]

  6. Of Neurons and Networks - (mean) local potential at neuron i (with activity S_i): weighted sum of incoming activities, x_i = Σ_j w_ij S_j; synaptic weights: w_ij > 0 for an excitatory synapse, w_ij = 0 for no connection, w_ij < 0 for an inhibitory synapse

  7. Activation Function - non-linear response: S_i = h(Σ_j w_ij S_j); an important class of functions: sigmoidal activations with minimal activity h(x → −∞) ≡ 0, maximal activity h(x → +∞) ≡ 1, and monotonic increase h'(x) > 0; just one example: h(x_i) = ½ (1 + tanh[γ(x_i − θ)]) with gain parameter γ, local threshold θ, and x_i = Σ_j w_ij S_j

  8. Activation Function - non-linear response: S_i = g(Σ_j w_ij S_j); sigmoidal activation with minimal activity g(x → −∞) ≡ −1, maximal activity g(x → +∞) ≡ 1, and monotonic increase g'(x) > 0; just one example: g(x_i) = tanh[γ(x_i − θ)] with gain parameter γ, local threshold θ, and x_i = Σ_j w_ij S_j
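A minimal sketch of slides 6-8 in NumPy: the local potential x_i = Σ_j w_ij S_j and the two sigmoidal activations, h with range [0, 1] and g with range [−1, 1]. The weight and activity values are hypothetical, chosen only for illustration.

```python
import numpy as np

def local_potential(w_i, S):
    """Weighted sum of incoming activities, x_i = sum_j w_ij * S_j."""
    return np.dot(w_i, S)

def h(x, gamma=1.0, theta=0.0):
    """Sigmoidal activation with outputs in [0, 1]: (1 + tanh(gamma*(x - theta))) / 2."""
    return 0.5 * (1.0 + np.tanh(gamma * (x - theta)))

def g(x, gamma=1.0, theta=0.0):
    """Sigmoidal activation with outputs in [-1, 1]: tanh(gamma*(x - theta))."""
    return np.tanh(gamma * (x - theta))

# Example: one unit receiving three inputs (illustrative numbers only).
w_i = np.array([0.5, -1.0, 0.3])   # synaptic weights
S   = np.array([1.0, 0.2, -0.5])   # activities of the sending units
x_i = local_potential(w_i, S)
print(h(x_i, gamma=2.0, theta=0.1), g(x_i, gamma=2.0, theta=0.1))
```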

  9. McCulloch-Pitts Neurons - an extreme case: infinite gain γ → ∞, so that g(x_i) = tanh[γ(x_i − θ)] → sign[x_i − θ] = +1 for x_i ≥ θ and −1 for x_i < θ; McCulloch-Pitts [1943]: the model neuron is either quiescent or maximally active, graded responses are not considered (don't confuse the local threshold θ with the all-or-nothing threshold in spiking neurons)
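A small numerical sketch of the infinite-gain limit: as γ grows, tanh[γ(x − θ)] approaches the all-or-nothing response sign[x − θ] of a McCulloch-Pitts unit. The threshold and input values are arbitrary examples.

```python
import numpy as np

theta = 0.2
x = np.array([-1.0, 0.1, 0.3, 0.5])

# Increasing gain steepens the sigmoid towards a step function.
for gamma in (1.0, 10.0, 1000.0):
    print(gamma, np.tanh(gamma * (x - theta)))

# For comparison: the McCulloch-Pitts response (+1 for x >= theta, -1 otherwise).
print(np.where(x >= theta, 1, -1))
```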

  10. Synaptic Plasticity - D. Hebb [1949], hypothesis of Hebbian learning: consider a presynaptic neuron A, a postsynaptic neuron B, and an excitatory synapse w_BA. If A and B (frequently) fire at the same time, the excitatory synaptic strength w_BA increases → a memory effect that will favor joint activity in the future. For symmetrized firing rates −1 ≤ S_A, S_B ≤ +1, the change of synaptic strength is Δw_BA ∝ S_A S_B (pre-synaptic × post-synaptic activity)
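A sketch of the Hebbian update for symmetrized rates S_A, S_B ∈ [−1, +1]: the weight change is proportional to pre- times post-synaptic activity. The proportionality constant eta is a hypothetical learning rate, not specified on the slide.

```python
def hebb_update(w_BA, S_A, S_B, eta=0.1):
    """Delta w_BA proportional to S_A * S_B; eta is a hypothetical learning rate."""
    return w_BA + eta * S_A * S_B

w_BA = 0.0
# Joint activity strengthens the synapse, anti-correlated activity weakens it.
w_BA = hebb_update(w_BA, S_A=+1.0, S_B=+1.0)   # -> +0.1
w_BA = hebb_update(w_BA, S_A=+1.0, S_B=-1.0)   # -> 0.0
print(w_BA)
```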

  11. Artificial Neural Networks - in the following: networks assembled from simple firing-rate neurons, connected by weights (real-valued synaptic strengths); various architectures and types of networks, e.g. attractor neural networks / recurrent networks: dynamical systems such as the Hopfield model, a network of McCulloch-Pitts neurons that can operate as an associative memory by learning of the synaptic interactions (pictured: N = 5 neurons, partial connectivity)
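A rough sketch of an attractor network used as associative memory (Hopfield-type), assuming full connectivity and Hebbian weights; the details differ from the partially connected N = 5 example pictured on the slide, and the stored patterns are invented for illustration.

```python
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1],
                     [1, 1, -1, -1, 1]])          # stored +/-1 patterns
N = patterns.shape[1]

W = (patterns.T @ patterns) / N                   # Hebbian weight matrix
np.fill_diagonal(W, 0.0)                          # no self-interactions

S = np.array([1, -1, 1, -1, -1])                  # noisy version of pattern 0
for _ in range(5):                                # synchronous McCulloch-Pitts updates
    S = np.where(W @ S >= 0, 1, -1)
print(S)                                          # retrieves the stored pattern
```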

  12. Feed-forward Networks - layered architecture: input layer (external stimulus), directed connections (here: only to the next layer), hidden units (internal representation), output unit(s) (function of the input vector); here: 6-3-4-1 architecture; S_i = g(Σ_j w_ij S_j), with j running over the previous layer only
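A minimal forward pass through a layered feed-forward network with the 6-3-4-1 architecture mentioned on the slide; the weights are random placeholders and g = tanh is assumed for all units.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [6, 3, 4, 1]                         # input, two hidden layers, output

# One weight matrix per connection between consecutive layers.
weights = [rng.normal(scale=0.5, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(xi, weights, g=np.tanh):
    """S_i = g(sum_j w_ij S_j), with j running over the previous layer only."""
    S = xi
    for W in weights:
        S = g(W @ S)
    return S

xi = rng.normal(size=6)                            # external stimulus (input layer)
print(forward(xi, weights))                        # single output value
```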

  13. The Perceptron Revisited - input units ξ_j ∈ ℝ, weights w_j ∈ ℝ, w ∈ ℝ^N, single output unit S = sign(Σ_{j=1}^N w_j ξ_j − θ); the output is a "linearly separable function" of the input variables, parameterized by the weight vector w and the threshold θ
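A tiny sketch of the perceptron output S = sign(Σ_j w_j ξ_j − θ); the weight vector, input, and threshold are illustrative values, and the convention S = +1 for inputs exactly on the threshold matches slide 9.

```python
import numpy as np

def perceptron(xi, w, theta=0.0):
    """+1 if the weighted sum reaches the threshold, -1 otherwise."""
    return 1 if np.dot(w, xi) >= theta else -1

w  = np.array([1.0, 1.0])              # weight vector
xi = np.array([0.4, 0.3])              # input
print(perceptron(xi, w, theta=0.5))    # +1: the input lies on the positive side
```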

  14. Convergent Two-layer Architecture - input units ξ_j ∈ ℝ, ξ ∈ ℝ^N; input-to-hidden weights w_j^(k); hidden layer units S_k = g(Σ_j w_j^(k) ξ_j); hidden-to-output weights v_k; single output unit σ; output = non-linear function of the input variables: σ = g(Σ_{k=1}^K v_k S_k) = g(Σ_k v_k g(Σ_j w_j^(k) ξ_j)), parameterized by the set of all weights (and thresholds)
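A sketch of the convergent two-layer architecture: K hidden units with input-to-hidden weights w^(k) (rows of W) and hidden-to-output weights v_k. The sizes and random weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 3
W = rng.normal(size=(K, N))       # rows are the vectors w^(k)
v = rng.normal(size=K)            # hidden-to-output weights

def two_layer_output(xi, W, v, g=np.tanh):
    """sigma = g( sum_k v_k * g( w^(k) . xi ) )"""
    S = g(W @ xi)                 # hidden layer activities S_k
    return g(np.dot(v, S))        # single output unit

xi = rng.normal(size=N)
print(two_layer_output(xi, W, v))
```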

  15. Networks of Continuous Nodes - continuous activation functions, e.g. g(x) = tanh(γx), for all nodes in the network; given a network architecture, the weights (and thresholds) parameterize a function (input/output relation) ξ ∈ ℝ^N → σ(ξ) ∈ ℝ (here: single output unit). Learning as a regression problem: a set of P examples with real-valued labels {ξ^μ, τ^μ = τ(ξ^μ)}, μ = 1, ..., P; training: (approximately) implement σ(ξ^μ) = τ(ξ^μ) for all μ; generalization: σ(ξ) ≈ τ(ξ) in the application to novel data

  16. Error Measure and Training - training strategy: employ an error measure for the comparison of student and teacher outputs; just one very popular and plausible choice is the quadratic deviation e(σ, τ) = ½ (σ − τ)²; cost function: E = (1/P) Σ_{μ=1}^P e^μ = (1/2P) Σ_{μ=1}^P (σ(ξ^μ) − τ(ξ^μ))²; the cost function is defined for a given set of example data, guides the training process, and is a differentiable function of the weights and thresholds; training by gradient descent minimization of E
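A sketch of the quadratic cost E = (1/2P) Σ_μ (σ(ξ^μ) − τ^μ)² for a set of P examples; the student used here (a single tanh unit with random weights) and the placeholder labels are assumptions for illustration only.

```python
import numpy as np

def cost(sigma, xi_examples, tau_labels):
    """Mean quadratic deviation between student outputs and target labels."""
    errors = np.array([sigma(xi) - tau for xi, tau in zip(xi_examples, tau_labels)])
    return 0.5 * np.mean(errors ** 2)

# Hypothetical usage with a single continuous unit as the student:
rng = np.random.default_rng(3)
w = rng.normal(size=4)
sigma = lambda xi: np.tanh(np.dot(w, xi))
xi_examples = rng.normal(size=(10, 4))
tau_labels  = rng.normal(size=10)                  # placeholder real-valued labels
print(cost(sigma, xi_examples, tau_labels))
```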

  17. A Single Unit - ξ_j ∈ ℝ, ξ ∈ ℝ^N, w ∈ ℝ^N, σ = g(Σ_{j=1}^N w_j ξ_j); E(w) = (1/2P) Σ_{μ=1}^P (g(w·ξ^μ) − τ^μ)²; ∂E(w)/∂w_k = (1/P) Σ_{μ=1}^P (g(w·ξ^μ) − τ^μ) g'(w·ξ^μ) ξ_k^μ; ∇_w E(w) = (1/P) Σ_{μ=1}^P (g(w·ξ^μ) − τ^μ) g'(w·ξ^μ) ξ^μ
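A direct implementation of the single-unit gradient ∇_w E = (1/P) Σ_μ (g(w·ξ^μ) − τ^μ) g'(w·ξ^μ) ξ^μ, assuming g = tanh so that g'(x) = 1 − tanh(x)²; the data below are random placeholders.

```python
import numpy as np

def gradient(w, xi_examples, tau_labels):
    """grad_w E for sigma = tanh(w . xi) and the quadratic cost."""
    grad = np.zeros_like(w)
    for xi, tau in zip(xi_examples, tau_labels):
        x = np.dot(w, xi)
        grad += (np.tanh(x) - tau) * (1.0 - np.tanh(x) ** 2) * xi
    return grad / len(tau_labels)

rng = np.random.default_rng(4)
w = rng.normal(size=3)
xi_examples = rng.normal(size=(8, 3))
tau_labels  = rng.normal(size=8)
print(gradient(w, xi_examples, tau_labels))
```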

  18. Backpropagation of Error - convenient calculation of the gradient in multilayer networks (← chain rule); example: continuous two-layer network with K hidden units; inputs ξ ∈ ℝ^N, weights w_k ∈ ℝ^N, k = 1, 2, ..., K; hidden units σ_k(ξ) = g(w_k · ξ); output σ(ξ) = h(Σ_{j=1}^K v_j g(w_j · ξ)); the weights w_k and v_k are used downward for the calculation of the hidden states and the output, and upward for the calculation of the gradient; exercise: derive ∂E/∂v_k and ∇_{w_k} E
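A sketch of backpropagation for the two-layer network of this slide, with g = tanh for the hidden units and, as a simplifying assumption, a linear output h(x) = x (the slide leaves h general). It computes the gradients of E = (1/2P) Σ_μ (σ(ξ^μ) − τ^μ)² with respect to v and the w_k via the chain rule.

```python
import numpy as np

def backprop_gradients(W, v, xi_examples, tau_labels):
    """W: (K, N) matrix whose rows are the w_k; v: (K,) hidden-to-output weights."""
    grad_W = np.zeros_like(W)
    grad_v = np.zeros_like(v)
    for xi, tau in zip(xi_examples, tau_labels):
        hidden = np.tanh(W @ xi)                  # forward ("downward"): hidden states S_k
        sigma = np.dot(v, hidden)                 # output, with linear h
        delta = sigma - tau                       # output error
        grad_v += delta * hidden                  # dE/dv_k = delta * S_k
        # chain rule ("upward"): dE/dw_k = delta * v_k * g'(w_k . xi) * xi
        grad_W += np.outer(delta * v * (1.0 - hidden ** 2), xi)
    P = len(tau_labels)
    return grad_W / P, grad_v / P

# Hypothetical usage with random weights and placeholder data:
rng = np.random.default_rng(5)
N, K, P = 4, 3, 12
W, v = rng.normal(size=(K, N)), rng.normal(size=K)
xi_examples, tau_labels = rng.normal(size=(P, N)), rng.normal(size=P)
gW, gv = backprop_gradients(W, v, xi_examples, tau_labels)
```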

  19. Backpropagation - A.E. Bryson, Y.-C. Ho (1969). Applied Optimal Control: Optimization, Estimation and Control. Blaisdell Publishing, p. 481; P. Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University; D.E. Rumelhart, G.E. Hinton, R.J. Williams (1986). Learning representations by back-propagating errors. Nature 323 (6088): 533-536

  20. Backpropagation - [figure: 1987, 1995]

  21. The negative gradient gives the direction of steepest descent in E; simple gradient-based minimization of E: a sequence w_0 → w_1 → ... → w_t → w_{t+1} → ... with w_{t+1} = w_t − η ∇E|_{w_t} approaches some minimum (?) of E; learning rate η: controls the step size of the algorithm, has to be small enough to ensure convergence, and should be as large as possible to facilitate fast learning
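A minimal gradient descent loop w_{t+1} = w_t − η ∇E(w_t) for the single unit of slide 17. The teacher vector, data, learning rate, and number of steps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, P = 3, 50
xi_examples = rng.normal(size=(P, N))
w_teacher = np.array([1.0, -0.5, 0.3])                     # hypothetical target rule
tau_labels = np.tanh(xi_examples @ w_teacher)

def grad_E(w):
    """Gradient of E(w) = (1/2P) sum_mu (tanh(w.xi^mu) - tau^mu)^2."""
    x = xi_examples @ w
    return ((np.tanh(x) - tau_labels) * (1.0 - np.tanh(x) ** 2)) @ xi_examples / P

w, eta = rng.normal(size=N), 0.5
for t in range(500):
    w = w - eta * grad_E(w)                                # steepest-descent step

print(w)    # should approach the teacher vector as E is minimized
```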
