
Neural Networks (Chapter 19, Sections 1–5)



  1. Outline
     ♦ Brains
     ♦ Neural networks
     ♦ Perceptrons
     ♦ Multilayer perceptrons
     ♦ Applications of neural networks

     Brains
     10^11 neurons of > 20 types, 10^14 synapses, 1 ms–10 ms cycle time.
     Signals are noisy "spike trains" of electrical potential.
     [Figure: anatomy of a neuron, showing the cell body (soma), nucleus, dendrites, axon, axonal arborization, and synapses onto the dendrites of other cells.]

     McCulloch–Pitts "unit"
     Output is a "squashed" linear function of the inputs:
         a_i ← g(in_i) = g(Σ_j W_j,i a_j)
     [Figure: a single unit. Input links carry activations a_j with weights W_j,i; a fixed bias input a_0 = −1 carries the bias weight W_0,i; the input function computes in_i = Σ_j W_j,i a_j; the activation function g produces the output a_i = g(in_i), which is sent along the output links.]

     Activation functions
     (a) is a step function or threshold function; (b) is a sigmoid function 1/(1 + e^(−x)).
     Changing the bias weight W_0,i moves the threshold location.

     Implementing logical functions
     [Figure: threshold units computing AND (W_0 = 1.5, W_1 = W_2 = 1), OR (W_0 = 0.5, W_1 = W_2 = 1), and NOT (W_0 = −0.5, W_1 = −1).]
     McCulloch and Pitts: every Boolean function can be implemented.
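     The following is a minimal sketch (in Python; the function and variable names are chosen here for illustration and are not from the slides) of a McCulloch–Pitts threshold unit using the bias convention a_0 = −1 and the step activation. The AND, OR, and NOT weight settings above are checked on all Boolean inputs.

     def step(x):
         """Step (threshold) activation: fire iff the weighted input is positive."""
         return 1 if x > 0 else 0

     def unit(weights, inputs):
         """weights = [W_0, W_1, ..., W_n]; inputs = [x_1, ..., x_n].
         A fixed input a_0 = -1 carries the bias weight W_0 (slide convention)."""
         activations = [-1] + list(inputs)
         in_i = sum(w * a for w, a in zip(weights, activations))
         return step(in_i)

     AND = [1.5, 1, 1]     # fires only when both inputs are 1
     OR  = [0.5, 1, 1]     # fires when at least one input is 1
     NOT = [-0.5, -1]      # fires when its single input is 0

     for x1 in (0, 1):
         for x2 in (0, 1):
             print(x1, x2, "AND:", unit(AND, [x1, x2]), "OR:", unit(OR, [x1, x2]))
         print(x1, "NOT:", unit(NOT, [x1]))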

  2. Network structures
     Feed-forward networks:
     – single-layer perceptrons
     – multi-layer perceptrons
     Feed-forward networks implement functions and have no internal state.
     Recurrent networks:
     – Hopfield networks have symmetric weights (W_i,j = W_j,i); g(x) = sign(x), a_i = ±1; holographic associative memory
     – Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayesian networks
     – recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate, etc.

     Feed-forward example
     [Figure: a network with input units 1 and 2, hidden units 3 and 4, and output unit 5, connected by weights W_1,3, W_1,4, W_2,3, W_2,4, W_3,5, W_4,5.]
     A feed-forward network is a parameterized family of nonlinear functions:
         a_5 = g(W_3,5 · a_3 + W_4,5 · a_4)
             = g(W_3,5 · g(W_1,3 · a_1 + W_2,3 · a_2) + W_4,5 · g(W_1,4 · a_1 + W_2,4 · a_2))
     (A code sketch of this computation appears after this section.)

     Perceptrons
     [Figure: (left) a single-layer perceptron, with input units connected directly to output units by weights W_j,i; (right) the output of a two-input sigmoid perceptron plotted as a function of x_1 and x_2.]

     Expressiveness of perceptrons
     Consider a perceptron with g = step function (Rosenblatt, 1957, 1960).
     It can represent AND, OR, NOT, majority, etc.
     It represents a linear separator in the input space: Σ_j W_j x_j > 0, i.e., W · x > 0.
     [Figure: points in the (I_1, I_2) plane for (a) I_1 and I_2, (b) I_1 or I_2, (c) I_1 xor I_2; the first two are linearly separable, xor is not.]

     Perceptron learning
     Learn by adjusting weights to reduce the error on the training set.
     The squared error for an example with input x and true output y is
         E = 1/2 Err^2 ≡ 1/2 (y − h_W(x))^2
     Perform optimization search by gradient descent:
         ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y − g(Σ_{j=0..n} W_j x_j))
                 = −Err × g′(in) × x_j
     Simple weight update rule:
         W_j ← W_j + α × Err × g′(in) × x_j
     E.g., a positive error ⇒ increase the network output ⇒ increase weights on positive inputs, decrease them on negative inputs.
     (A code sketch of this update rule also appears after this section.)

     Perceptron learning contd.
     The perceptron learning rule converges to a consistent function for any linearly separable data set.
     [Figure: learning curves (proportion correct on the test set vs. training set size) comparing the perceptron with a decision tree on two different learning problems.]
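     As promised above, here is a short sketch of the feed-forward computation for the five-unit example network (Python; the weight values and input activations are arbitrary numbers chosen only to make the example runnable).

     import math

     def g(x):
         """Sigmoid activation."""
         return 1.0 / (1.0 + math.exp(-x))

     # Weights named after the slide: W_13 connects unit 1 to unit 3, and so on.
     W_13, W_14, W_23, W_24, W_35, W_45 = 0.4, -0.6, 0.8, 0.1, 1.2, -0.9
     a_1, a_2 = 0.5, 1.0                   # input activations (arbitrary)

     a_3 = g(W_13 * a_1 + W_23 * a_2)      # hidden unit 3
     a_4 = g(W_14 * a_1 + W_24 * a_2)      # hidden unit 4
     a_5 = g(W_35 * a_3 + W_45 * a_4)      # output: the nested expression from the slide
     print(a_5)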

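     The next sketch implements the perceptron learning rule above (Python; the function name, learning rate, epoch count, and the OR training set are illustrative choices, not from the slides), using a sigmoid g so that g′ is well defined.

     import math, random

     def g(x):
         return 1.0 / (1.0 + math.exp(-x))      # sigmoid activation

     def g_prime(x):
         return g(x) * (1.0 - g(x))             # derivative of the sigmoid

     def train_perceptron(examples, n_inputs, alpha=0.5, epochs=2000):
         """examples: list of (x, y) pairs, x a list of n_inputs values, y in {0, 1}.
         A fixed input a_0 = -1 carries the bias weight W_0 (slide convention)."""
         W = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
         for _ in range(epochs):
             for x, y in examples:
                 a = [-1.0] + list(x)                         # prepend the bias input
                 in_ = sum(w_j * a_j for w_j, a_j in zip(W, a))
                 err = y - g(in_)                             # Err = y - h_W(x)
                 for j in range(len(W)):                      # W_j <- W_j + alpha*Err*g'(in)*x_j
                     W[j] += alpha * err * g_prime(in_) * a[j]
         return W

     # OR is linearly separable, so the learned outputs approach 0, 1, 1, 1.
     data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
     W = train_perceptron(data, n_inputs=2)
     for x, y in data:
         print(x, y, round(g(sum(w_j * a_j for w_j, a_j in zip(W, [-1.0] + x))), 2))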
  3. Multilayer perceptrons
     Layers are usually fully connected;
     numbers of hidden units are typically chosen by hand.
     [Figure: a layered network with input units a_k, hidden units a_j (connected to the inputs by weights W_k,j), and output units a_i (connected to the hidden units by weights W_j,i).]

     Expressiveness of MLPs
     All continuous functions with 2 layers, all functions with 3 layers.
     [Figure: two surface plots of h_W(x_1, x_2) produced by small multilayer networks.]

     Back-propagation learning
     Output layer: same as for a single-layer perceptron,
         W_j,i ← W_j,i + α × a_j × Δ_i
     where Δ_i = Err_i × g′(in_i).
     Hidden layer: back-propagate the error from the output layer:
         Δ_j = g′(in_j) Σ_i W_j,i Δ_i
     Update rule for weights in the hidden layer:
         W_k,j ← W_k,j + α × a_k × Δ_j
     (Most neuroscientists deny that back-propagation occurs in the brain.)

     Back-propagation derivation
     The squared error on a single example is defined as
         E = 1/2 Σ_i (y_i − a_i)^2,
     where the sum is over the nodes in the output layer.
         ∂E/∂W_j,i = −(y_i − a_i) ∂a_i/∂W_j,i = −(y_i − a_i) ∂g(in_i)/∂W_j,i
                   = −(y_i − a_i) g′(in_i) ∂in_i/∂W_j,i
                   = −(y_i − a_i) g′(in_i) ∂/∂W_j,i (Σ_j W_j,i a_j)
                   = −(y_i − a_i) g′(in_i) a_j = −a_j Δ_i

     Back-propagation derivation contd.
         ∂E/∂W_k,j = −Σ_i (y_i − a_i) ∂a_i/∂W_k,j = −Σ_i (y_i − a_i) ∂g(in_i)/∂W_k,j
                   = −Σ_i (y_i − a_i) g′(in_i) ∂in_i/∂W_k,j = −Σ_i Δ_i ∂/∂W_k,j (Σ_j W_j,i a_j)
                   = −Σ_i Δ_i W_j,i ∂a_j/∂W_k,j = −Σ_i Δ_i W_j,i ∂g(in_j)/∂W_k,j
                   = −Σ_i Δ_i W_j,i g′(in_j) ∂in_j/∂W_k,j
                   = −Σ_i Δ_i W_j,i g′(in_j) ∂/∂W_k,j (Σ_k W_k,j a_k)
                   = −Σ_i Δ_i W_j,i g′(in_j) a_k = −a_k Δ_j

     Back-propagation learning contd.
     At each epoch, sum the gradient updates for all examples and apply them.
     [Figure: total error on the training set plotted against the number of epochs (0 to 400).]
     Usual problems with slow convergence, local minima.
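     Below is a compact sketch of back-propagation for a network with one hidden layer, following the Δ_i / Δ_j notation above (Python; the function name, network sizes, learning rate, and the per-example update schedule are illustrative choices — the slide sums the gradient updates over each epoch before applying them, and the a_0 = −1 bias inputs are omitted here for brevity).

     import math, random

     def g(x):
         return 1.0 / (1.0 + math.exp(-x))      # sigmoid activation
     def g_prime(x):
         return g(x) * (1.0 - g(x))             # its derivative

     def backprop_train(examples, n_in, n_hid, n_out, alpha=0.5, epochs=1000):
         """examples: list of (x, y) with x of length n_in and y of length n_out."""
         # W_kj[k][j]: input k -> hidden j;  W_ji[j][i]: hidden j -> output i
         W_kj = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in)]
         W_ji = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid)]
         for _ in range(epochs):
             for x, y in examples:
                 # forward pass
                 in_j = [sum(W_kj[k][j] * x[k] for k in range(n_in)) for j in range(n_hid)]
                 a_j  = [g(v) for v in in_j]
                 in_i = [sum(W_ji[j][i] * a_j[j] for j in range(n_hid)) for i in range(n_out)]
                 a_i  = [g(v) for v in in_i]
                 # output-layer deltas: Delta_i = Err_i * g'(in_i)
                 d_i = [(y[i] - a_i[i]) * g_prime(in_i[i]) for i in range(n_out)]
                 # hidden-layer deltas: Delta_j = g'(in_j) * sum_i W_j,i * Delta_i
                 d_j = [g_prime(in_j[j]) * sum(W_ji[j][i] * d_i[i] for i in range(n_out))
                        for j in range(n_hid)]
                 # weight updates from the slide
                 for j in range(n_hid):
                     for i in range(n_out):
                         W_ji[j][i] += alpha * a_j[j] * d_i[i]   # W_j,i += alpha * a_j * Delta_i
                 for k in range(n_in):
                     for j in range(n_hid):
                         W_kj[k][j] += alpha * x[k] * d_j[j]     # W_k,j += alpha * a_k * Delta_j
         return W_kj, W_ji

     Calling backprop_train on a small labelled data set returns the two weight matrices; making a prediction is the same forward pass used inside the training loop.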

  4. Back-propagation learning contd.
     [Figure: learning curve of % correct on the test set vs. training set size, comparing a multilayer network with a decision tree.]

     Handwritten digit recognition
     3-nearest-neighbor = 2.4% error
     400–300–10 unit MLP = 1.6% error
     LeNet: 768–192–30–10 unit MLP = 0.9% error

     Summary
     Most brains have lots of neurons; each neuron ≈ linear–threshold unit (?)
     Perceptrons (one-layer networks) are insufficiently expressive.
     Multi-layer networks are sufficiently expressive; they can be trained by gradient descent, i.e., error back-propagation.
     Many applications: speech, driving, handwriting, credit cards, etc.
