

slide-1
SLIDE 1

Neural networks

Chapter 20

Chapter 20 1

slide-2
SLIDE 2

Outline

♦ Brains
♦ Neural networks
♦ Perceptrons
♦ Multilayer networks
♦ Applications of neural networks

Chapter 20 2

slide-3
SLIDE 3

Brains

10¹¹ neurons of > 20 types, 10¹⁴ synapses, 1ms–10ms cycle time
Signals are noisy “spike trains” of electrical potential

[Figure: anatomy of a neuron — cell body (soma) with nucleus, dendrites, axon with axonal arborization, and synapses connecting to axons from other cells.]

Chapter 20 3

slide-4
SLIDE 4

McCulloch–Pitts “unit”

Output is a “squashed” linear function of the inputs:

ai ← g(ini) = g(Σj Wj,i aj)

[Figure: a single unit. Input links deliver activations aj, each weighted by Wj,i; a fixed bias input a0 = −1 enters with bias weight W0,i. The input function computes ini = Σj Wj,i aj, the activation function g gives the output ai = g(ini), which is passed along the output links.]
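A minimal Python sketch of one such unit (the helper names and the choice of a sigmoid for g are my own assumptions, not from the slides):

import math

def unit_output(weights, activations, g):
    # Computes ai = g(ini) with ini = sum_j Wj,i * aj.
    # weights[0] is the bias weight W0,i; activations[0] is the fixed bias input a0 = -1.
    in_i = sum(w * a for w, a in zip(weights, activations))
    return g(in_i)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Example call with two real inputs plus the bias input:
print(unit_output([1.5, 1.0, 1.0], [-1.0, 1.0, 1.0], sigmoid))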

Chapter 20 4

slide-5
SLIDE 5

Activation functions

[Figure: two activation functions g(ini) plotted against ini, each rising to +1 — (a) a hard step, (b) a smooth sigmoid.]

(a) is a step function or threshold function
(b) is a sigmoid function 1/(1 + e−x)

Changing the bias weight W0,i moves the threshold location
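The two choices of g in Python, for reference (a small sketch; the function names are mine):

import math

def step(x):
    # Hard threshold: output +1 once the total input reaches 0.
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    # Soft threshold 1 / (1 + e^-x).
    return 1.0 / (1.0 + math.exp(-x))

# With the fixed bias input a0 = -1, the unit computes g(-W0,i + sum of the other
# weighted inputs), so the step unit fires exactly when the weighted inputs reach
# W0,i: changing the bias weight moves the threshold location.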

Chapter 20 5

slide-6
SLIDE 6

Implementing logical functions

McCulloch and Pitts: every Boolean function can be implemented (with a large enough network)

AND? OR? NOT? MAJORITY?

Chapter 20 6

slide-7
SLIDE 7

Implementing logical functions

McCulloch and Pitts: every Boolean function can be implemented (with a large enough network)

AND: W0 = 1.5, W1 = 1, W2 = 1

OR: W0 = 0.5, W1 = 1, W2 = 1

NOT: W0 = −0.5, W1 = −1
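These settings can be checked with a tiny Python script (a sketch; the step unit and helper names are mine, using the bias input a0 = −1 as above):

from itertools import product

def threshold_unit(w0, weights, inputs):
    # Fires (returns 1) when sum_j Wj*xj >= W0, i.e. a step unit with bias input -1.
    total = -w0 + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

AND = lambda x1, x2: threshold_unit(1.5, [1, 1], [x1, x2])
OR  = lambda x1, x2: threshold_unit(0.5, [1, 1], [x1, x2])
NOT = lambda x1:     threshold_unit(-0.5, [-1], [x1])

for x1, x2 in product((0, 1), repeat=2):
    print(x1, x2, AND(x1, x2), OR(x1, x2))   # truth tables for AND and OR
print(NOT(0), NOT(1))                        # 1 0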

Chapter 20 7

slide-8
SLIDE 8

Network structures

Feed-forward networks:
– single-layer perceptrons
– multi-layer networks
Feed-forward networks implement functions, have no internal state

Recurrent networks:
– Hopfield networks have symmetric weights (Wi,j = Wj,i); g(x) = sign(x), ai = ±1; holographic associative memory
– Boltzmann machines use stochastic activation functions, ≈ MCMC in BNs
– recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate etc.

Chapter 20 8

slide-9
SLIDE 9

Feed-forward example

[Figure: a network with input units 1 and 2, hidden units 3 and 4, and output unit 5; weights W1,3, W1,4, W2,3, W2,4 on the input–hidden links and W3,5, W4,5 on the hidden–output links.]

Feed-forward network = a parameterized family of nonlinear functions:

a5 = g(W3,5 · a3 + W4,5 · a4)
   = g(W3,5 · g(W1,3 · a1 + W2,3 · a2) + W4,5 · g(W1,4 · a1 + W2,4 · a2))
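The same expression transcribed directly into Python (the weight values below are invented, purely to make the call concrete):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))   # sigmoid activation

def a5(a1, a2, W):
    # Output of the 2-input, 2-hidden-unit, 1-output network in the figure.
    a3 = g(W["1,3"] * a1 + W["2,3"] * a2)
    a4 = g(W["1,4"] * a1 + W["2,4"] * a2)
    return g(W["3,5"] * a3 + W["4,5"] * a4)

W = {"1,3": 0.5, "2,3": -0.4, "1,4": 0.3, "2,4": 0.8, "3,5": 1.2, "4,5": -0.7}
print(a5(1.0, 0.0, W))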

Chapter 20 9

slide-10
SLIDE 10

Perceptrons

[Figure: (left) a single-layer perceptron — input units connected directly to output units by weights Wj,i; (right) the perceptron output plotted as a function of two inputs x1 and x2, with values between 0 and 1.]

Chapter 20 10

slide-11
SLIDE 11

Expressiveness of perceptrons

Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)

Can represent AND, OR, NOT, majority, etc.

Represents a linear separator in input space:

Σj Wj xj > 0   or   W · x > 0

[Figure: the input space (I1, I2) for three Boolean functions — (a) I1 and I2, (b) I1 or I2, (c) I1 xor I2. In (a) and (b) a straight line separates the positive from the negative points; in (c) no such line exists, so xor is not linearly separable.]
Chapter 20 11

slide-12
SLIDE 12

Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

Chapter 20 12

slide-13
SLIDE 13

Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

Perform optimization search by gradient descent:

∂E/∂Wj = ?

Chapter 20 13

slide-14
SLIDE 14

Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

Perform optimization search by gradient descent:

∂E/∂Wj = Err × ∂Err/∂Wj = Err × ∂/∂Wj (y − g(Σj=0..n Wj xj))

Chapter 20 14

slide-15
SLIDE 15

Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

Perform optimization search by gradient descent:

∂E/∂Wj = Err × ∂Err/∂Wj = Err × ∂/∂Wj (y − g(Σj=0..n Wj xj))
       = −Err × g′(in) × xj

Chapter 20 15

slide-16
SLIDE 16

Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

Perform optimization search by gradient descent:

∂E/∂Wj = Err × ∂Err/∂Wj = Err × ∂/∂Wj (y − g(Σj=0..n Wj xj))
       = −Err × g′(in) × xj

Simple weight update rule: Wj ← Wj + α × Err × g′(in) × xj

E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on −ve inputs

Chapter 20 16

slide-17
SLIDE 17

Perceptron learning

W = random initial values
for iter = 1 to T
    for i = 1 to N (all examples)
        x = input for example i
        y = output for example i
        Wold = W
        Err = y − g(Wold · x)
        for j = 1 to M (all weights)
            Wj = Wj + α · Err · g′(Wold · x) · xj
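A runnable Python version of this loop, assuming g is the sigmoid (the data set, learning rate, and epoch count below are my own choices for illustration):

import math, random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)

def train_perceptron(examples, alpha=0.5, epochs=1000):
    # examples: list of (x, y) pairs where x already includes the bias input x0 = -1.
    m = len(examples[0][0])
    W = [random.uniform(-0.5, 0.5) for _ in range(m)]
    for _ in range(epochs):
        for x, y in examples:
            in_old = sum(w * xi for w, xi in zip(W, x))   # Wold . x
            err = y - g(in_old)
            W = [w + alpha * err * g_prime(in_old) * xi for w, xi in zip(W, x)]
    return W

# Learn OR (linearly separable); each input vector starts with the bias input -1.
data = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]
W = train_perceptron(data)
print([round(g(sum(w * xi for w, xi in zip(W, x)))) for x, _ in data])   # should print [0, 1, 1, 1]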

Chapter 20 17

slide-18
SLIDE 18

Perceptron learning contd.

Derivative of sigmoid g(x) can be written in simple form:

g(x) = 1/(1 + e−x)

g′(x) = ?

Chapter 20 18

slide-19
SLIDE 19

Perceptron learning contd.

Derivative of sigmoid g(x) can be written in simple form:

g(x) = 1/(1 + e−x)

g′(x) = e−x/(1 + e−x)² = e−x g(x)²

Also, g(x) = 1/(1 + e−x) ⇒ g(x) + e−x g(x) = 1 ⇒ e−x = (1 − g(x))/g(x)

So g′(x) = [(1 − g(x))/g(x)] g(x)² = (1 − g(x)) g(x)
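A quick numerical sanity check of this identity (a throwaway script, not part of the slides):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime_closed_form(x):
    return (1.0 - g(x)) * g(x)

def g_prime_numeric(x, h=1e-6):
    # Central-difference estimate of the derivative.
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-2.0, 0.0, 3.0):
    print(x, g_prime_closed_form(x), g_prime_numeric(x))   # the two estimates agree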

Chapter 20 19

slide-20
SLIDE 20

Perceptron learning contd.

Perceptron learning rule converges to a consistent function for any linearly separable data set

[Figure: proportion correct on test set vs. training set size, comparing perceptron and decision tree — (left) MAJORITY on 11 inputs, (right) RESTAURANT data.]

Chapter 20 20

slide-21
SLIDE 21

Multilayer networks

Layers are usually fully connected; numbers of hidden units typically chosen by hand

[Figure: a multilayer network — input units with activations ak, connected by weights Wk,j to hidden units with activations aj, connected by weights Wj,i to output units with activations ai.]

Chapter 20 21

slide-22
SLIDE 22

Expressiveness of MLPs

All continuous functions w/ 1 hidden layer, all functions w/ 2 hidden layers

[Figure: two example output surfaces hW(x1, x2) of a small multilayer network plotted over the inputs x1 and x2, illustrating the nonlinear functions it can represent.]

Chapter 20 22

slide-23
SLIDE 23

Training a MLP

In general have n output nodes,

E ≡ ½ Σi Erri² ,

where Erri = (yi − ai) and the sum Σi runs over all nodes in the output layer.

Need to calculate ∂E/∂Wij for any Wij.

Chapter 20 23

slide-24
SLIDE 24

Training a MLP cont.

Can approximate derivatives by:

f′(x) ≈ (f(x + h) − f(x)) / h

∂E/∂Wij (W) ≈ (E(W + (0, . . . , h, . . . , 0)) − E(W)) / h

What would this entail for a network with n weights?

Chapter 20 24

slide-25
SLIDE 25

Training a MLP cont.

Can approximate derivatives by:

f′(x) ≈ (f(x + h) − f(x)) / h

∂E/∂Wij (W) ≈ (E(W + (0, . . . , h, . . . , 0)) − E(W)) / h

What would this entail for a network with n weights?

  • one iteration would take O(n²) time: each of the n weights needs its own evaluation of E, and each evaluation of E costs O(n)

Complicated networks have tens of thousands of weights, so O(n²) time is intractable. Back-propagation is a recursive method of calculating all of these derivatives in O(n) time.
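The finite-difference idea in Python (a sketch; loss stands for the error E as a function of the whole weight vector, and the quadratic example is made up):

def numerical_gradient(loss, W, h=1e-5):
    # Approximates dE/dWi by perturbing one weight at a time.
    # Each of the len(W) perturbations needs a full evaluation of loss,
    # which itself touches every weight -- hence O(n^2) work per gradient.
    base = loss(W)
    grad = []
    for i in range(len(W)):
        W_perturbed = list(W)
        W_perturbed[i] += h
        grad.append((loss(W_perturbed) - base) / h)
    return grad

print(numerical_gradient(lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2, [0.0, 0.0]))
# approximately [-2.0, 4.0]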

Chapter 20 25

slide-26
SLIDE 26

Back-propagation learning

In general have n output nodes,

E ≡ ½ Σi Erri² ,

where Erri = (yi − ai) and the sum Σi runs over all nodes in the output layer.

Output layer: same as for single-layer perceptron,

Wj,i ← Wj,i + α × aj × ∆i   where ∆i = Erri × g′(ini)

Hidden layers: back-propagate the error from the output layer:

∆j = g′(inj) Σi Wj,i ∆i .

Update rule for weights in hidden layers: Wk,j ← Wk,j + α × ak × ∆j .

Chapter 20 26

slide-27
SLIDE 27

Back-propagation derivation

For a node i in the output layer:

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i

Chapter 20 27

slide-28
SLIDE 28

Back-propagation derivation

For a node i in the output layer:

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i
         = −(yi − ai) ∂g(ini)/∂Wj,i

Chapter 20 28

slide-29
SLIDE 29

Back-propagation derivation

For a node i in the output layer:

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i
         = −(yi − ai) ∂g(ini)/∂Wj,i
         = −(yi − ai) g′(ini) ∂ini/∂Wj,i

Chapter 20 29

slide-30
SLIDE 30

Back-propagation derivation

For a node i in the output layer:

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i
         = −(yi − ai) ∂g(ini)/∂Wj,i
         = −(yi − ai) g′(ini) ∂ini/∂Wj,i
         = −(yi − ai) g′(ini) ∂/∂Wj,i (Σk Wk,i ak)

Chapter 20 30

slide-31
SLIDE 31

Back-propagation derivation

For a node i in the output layer:

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i
         = −(yi − ai) ∂g(ini)/∂Wj,i
         = −(yi − ai) g′(ini) ∂ini/∂Wj,i
         = −(yi − ai) g′(ini) ∂/∂Wj,i (Σk Wk,i ak)
         = −(yi − ai) g′(ini) aj = −aj ∆i

where ∆i = (yi − ai) g′(ini)

Chapter 20 31

slide-32
SLIDE 32

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = ?

Chapter 20 32

slide-33
SLIDE 33

“Reminder”: Chain rule for partial derivatives

For f(x, y), with f differentiable wrt x and y, and x and y differentiable wrt u and v:

∂f/∂u = (∂f/∂x)(∂x/∂u) + (∂f/∂y)(∂y/∂u)   and   ∂f/∂v = (∂f/∂x)(∂x/∂v) + (∂f/∂y)(∂y/∂v)

Chapter 20 33

slide-34
SLIDE 34

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = ∂/∂Wk,j E(aj1, aj2, . . . , ajm)

where {ji} are the indices of the nodes in the same layer as node j.

Chapter 20 34

slide-35
SLIDE 35

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)

where the sum Σi runs over all other nodes i in the same layer as node j.

Chapter 20 35

slide-36
SLIDE 36

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)
         = (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j

Chapter 20 36

slide-37
SLIDE 37

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)
         = (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j
         = (∂E/∂aj) · g′(inj) ak

Chapter 20 37

slide-38
SLIDE 38

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)
         = (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j
         = (∂E/∂aj) · g′(inj) ak

∂E/∂aj = ?

Chapter 20 38

slide-39
SLIDE 39

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)
         = (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j
         = (∂E/∂aj) · g′(inj) ak

∂E/∂aj = ∂/∂aj E(ak1, ak2, . . . , akm)

where {ki} are the indices of the nodes in the layer after node j.

Chapter 20 39

slide-40
SLIDE 40

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)
         = (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j
         = (∂E/∂aj) · g′(inj) ak

∂E/∂aj = Σk (∂E/∂ak)(∂ak/∂aj)

where the sum Σk runs over all nodes k that node j connects to.

Chapter 20 40

slide-41
SLIDE 41

Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)
         = (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j
         = (∂E/∂aj) · g′(inj) ak

∂E/∂aj = Σk (∂E/∂ak)(∂ak/∂aj) = Σk (∂E/∂ak) g′(ink) Wj,k

Chapter 20 41

slide-42
SLIDE 42

Back-propagation derivation: hidden layer

If we define ∆j ≡ g′(inj) Σk Wj,k ∆k

then ∂E/∂Wk,j = −∆j ak

Chapter 20 42

slide-43
SLIDE 43

Back-propagation pseudocode

for iter = 1 to T
    for e = 1 to N (all examples)
        x = input for example e
        y = output for example e
        run x forward through network, computing all {ai}, {ini}
        for all nodes i (in reverse order)
            compute ∆i = (yi − ai) × g′(ini)      if i is an output node
                         g′(ini) Σk Wi,k ∆k        otherwise
        for all weights Wj,i
            Wj,i = Wj,i + α × aj × ∆i
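A compact NumPy rendering of this pseudocode for a single hidden layer (a sketch under my own naming; the XOR data, layer sizes, and learning rate are invented for illustration, and convergence is not guaranteed for every random seed):

import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))          # sigmoid activation

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)                      # g'(x) = g(x)(1 - g(x))

def backprop_train(X, Y, n_hidden=4, alpha=0.5, epochs=5000, seed=0):
    # One-hidden-layer network trained with the update rules above, one example at a time.
    rng = np.random.default_rng(seed)
    W_kj = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # input -> hidden weights
    W_ji = rng.uniform(-0.5, 0.5, (n_hidden, Y.shape[1]))   # hidden -> output weights
    for _ in range(epochs):
        for x, y in zip(X, Y):
            in_j = x @ W_kj;  a_j = g(in_j)                  # forward pass
            in_i = a_j @ W_ji;  a_i = g(in_i)
            delta_i = (y - a_i) * g_prime(in_i)              # output-layer deltas
            delta_j = g_prime(in_j) * (W_ji @ delta_i)       # back-propagated hidden deltas
            W_ji += alpha * np.outer(a_j, delta_i)           # Wj,i <- Wj,i + alpha * aj * delta_i
            W_kj += alpha * np.outer(x, delta_j)             # Wk,j <- Wk,j + alpha * ak * delta_j
    return W_kj, W_ji

# Example: learn XOR; each input vector carries a fixed bias feature of -1.
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W_kj, W_ji = backprop_train(X, Y)
print(np.round(g(g(X @ W_kj) @ W_ji)).ravel())   # expected to approach [0. 1. 1. 0.]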

Chapter 20 43

slide-44
SLIDE 44

Back-propagation learning contd.

At each epoch, sum gradient updates for all examples and apply

Restaurant data:

[Figure: total error on the training set vs. number of epochs, for the restaurant data.]

Usual problems with slow convergence, local minima

Chapter 20 44

slide-45
SLIDE 45

Back-propagation learning contd.

Restaurant data:

[Figure: % correct on test set vs. training set size, comparing a multilayer network and a decision tree on the restaurant data.]

Chapter 20 45

slide-46
SLIDE 46

Handwritten digit recognition

3-nearest-neighbor = 2.4% error
400–300–10 unit MLP = 1.6% error
LeNet: 768–192–30–10 unit MLP = 0.9% error

Chapter 20 46

slide-47
SLIDE 47

Summary

Most brains have lots of neurons; each neuron ≈ linear–threshold unit (?)

Perceptrons (one-layer networks) insufficiently expressive

Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation

Many applications: speech, driving, handwriting, credit cards, etc.

Chapter 20 47