MLPs with Backpropagation (CS 472)

  1. MLPs with Backpropagation

  2. Multilayer Nets?
    • Linear systems: F(cx) = cF(x),  F(x + y) = F(x) + F(y)
    • [Figure: a two-layer linear network I → N → M → Z.]
    • Composing linear layers collapses to a single linear layer: Z = M(N I) = (MN) I = P I
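As a quick sanity check of the linearity argument (my own NumPy sketch; the matrix shapes are arbitrary), composing two linear layers gives exactly the same map as the single collapsed matrix P = MN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with no nonlinearity: Z = M(N I)
I = rng.standard_normal(4)        # input vector
N = rng.standard_normal((3, 4))   # first layer weights
M = rng.standard_normal((2, 3))   # second layer weights

two_layer = M @ (N @ I)           # Z = M(N I)
P = M @ N                         # collapse the two layers into one matrix P = MN
one_layer = P @ I                 # Z = P I

print(np.allclose(two_layer, one_layer))  # True: the second linear layer adds no power
```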

  3. Early Attempts
    • Committee Machine: randomly connected, adaptive TLUs feed a non-adaptive vote-taking unit (majority logic)
    • "Least Perturbation Principle": for each pattern, if the output is incorrect, change just enough weights into the internal units to give a majority; choose the units closest to their threshold (LPP & changing undecided nodes)

  4. Perceptron (Frank Rosenblatt)
    • Simple Perceptron: S-units (Sensor) → A-units (Association) → R-units (Response)
    • Weights from the S-units to the A-units are random and fixed; the weights into the R-units are adaptive
    • Variations on Delta rule learning
    • Why S-A units?

  5. Backpropagation
    • Rumelhart (1986), Werbos (1974), …, explosion of neural net interest
    • Multi-layer supervised learning
    • Able to train multi-layer perceptrons (and other topologies)
    • Uses a differentiable sigmoid function, which is the smooth (squashed) version of the threshold function
    • Error is propagated back through earlier layers of the network
    • A very fast, efficient way to compute gradients!

  6. Multi-layer Perceptrons trained with BP
    • Can compute arbitrary mappings
    • Training algorithm less obvious
    • First of many powerful multi-layer learning algorithms

  7. Responsibility Problem
    [Figure: the network's output is 1 where 0 was wanted.]

  8. Multi-Layer Generalization

  9. Multilayer nets are universal function approximators
    • Input, output, and an arbitrary number of hidden layers
    • 1 hidden layer is sufficient for a DNF representation of any Boolean function – one hidden node per positive conjunct, output node set to the "Or" function (see the sketch after this slide)
    • 2 hidden layers allow an arbitrary number of labeled clusters
    • 1 hidden layer is sufficient to approximate all bounded continuous functions
    • 1 hidden layer was the most common in practice, but recently… deep networks show excellent results!
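To make the DNF claim concrete, here is a minimal sketch (the Boolean function and all weights are my own illustrative choices, not from the slides): one hidden threshold node per positive conjunct and an "Or" output node reproduce f(x1, x2, x3) = (x1 AND NOT x2) OR (x2 AND x3) on all eight inputs.

```python
import numpy as np

def step(net):
    """Hard threshold unit (TLU): fires (1) when its net input is positive."""
    return (np.asarray(net) > 0).astype(int)

# Target Boolean function in DNF: f(x1, x2, x3) = (x1 AND NOT x2) OR (x2 AND x3)
# One hidden threshold node per positive conjunct; the output node computes OR.
W_hidden = np.array([[1.0, -1.0, 0.0],    # conjunct 1: x1 AND NOT x2
                     [0.0,  1.0, 1.0]])   # conjunct 2: x2 AND x3
b_hidden = np.array([-0.5, -1.5])
w_out = np.array([1.0, 1.0])              # OR of the two conjuncts
b_out = -0.5

for bits in range(8):
    x1, x2, x3 = (bits >> 2) & 1, (bits >> 1) & 1, bits & 1
    x = np.array([x1, x2, x3], dtype=float)
    h = step(W_hidden @ x + b_hidden)      # which conjuncts are satisfied
    y = int(step(w_out @ h + b_out))       # OR them together
    assert y == int((x1 and not x2) or (x2 and x3))
print("1 hidden layer reproduces the DNF exactly")
```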

  10. [Figure: a 2-2-1 network z ← (n1, n2) ← (x1, x2); the corner points (0,0), (0,1), (1,0), (1,1) are plotted in the x1–x2 input space and again after being remapped by hidden nodes n1 and n2 into the n1–n2 space.]

  11. Backpropagation
    • Multi-layer supervised learner
    • Gradient descent weight updates
    • Sigmoid activation function (smoothed threshold logic)
    • Backpropagation requires a differentiable activation function

  12. [Figure: output values of .99 and .01 versus targets of 1 and 0.]

  13. Multi-layer Perceptron (MLP) Topology
    [Figure: input layer (nodes i), hidden layer(s) (nodes j), output layer (nodes k).]

  14. Backpropagation Learning Algorithm
    • Until convergence (low error or other stopping criteria) do
      – Present a training pattern
      – Calculate the error of the output nodes (based on T − Z)
      – Calculate the error of the hidden nodes (based on the error of the output nodes, which is propagated back to the hidden nodes)
      – Continue propagating error back until the input layer is reached
      – Then update all weights based on the standard delta rule with the appropriate error function δ: Δw_ij = C δ_j Z_i
    (A runnable sketch of this loop follows this slide.)
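The loop above translates fairly directly into code. Below is a minimal on-line (per-pattern) sketch, assuming one hidden layer, sigmoid activations, inputs already augmented with a +1 bias, and the notation of the later slides (C = learning constant, Z = node outputs); `train_epoch` and the weight-matrix layout are my own illustrative choices, not the course's reference implementation.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_epoch(X, T, W_hid, W_out, C=1.0):
    """One epoch of on-line backpropagation for a 1-hidden-layer MLP.
    X: training patterns, each row already ending in the +1 bias input.
    W_hid, W_out: weight matrices, bias weight in the last column of each row.
    """
    for x, t in zip(X, T):
        # Forward pass
        z_hid = sigmoid(W_hid @ x)                    # hidden outputs
        z_hid_b = np.append(z_hid, 1.0)               # add the bias unit
        z_out = sigmoid(W_out @ z_hid_b)              # output values Z

        # Error signals: output nodes use (T - Z); hidden nodes use error
        # propagated back through the (old) output-layer weights
        delta_out = (t - z_out) * z_out * (1 - z_out)
        delta_hid = (W_out[:, :-1].T @ delta_out) * z_hid * (1 - z_hid)

        # Standard delta-rule updates: delta_w_ij = C * delta_j * Z_i
        W_out += C * np.outer(delta_out, z_hid_b)
        W_hid += C * np.outer(delta_hid, x)
    return W_hid, W_out

# Example call with the 2-2-1 shape of the later BP-1 exercise (inputs include the +1 bias):
X = np.array([[0., 0., 1.], [0., 1., 1.]])
T = np.array([[1.], [0.]])
W_hid = np.ones((2, 3))   # 2 hidden nodes, each seeing 2 inputs + bias
W_out = np.ones((1, 3))   # 1 output node seeing 2 hidden outputs + bias
W_hid, W_out = train_epoch(X, T, W_hid, W_out, C=1.0)
```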

  15. Activation Function and its Derivative
    • The node activation function f(net) is commonly the sigmoid:  Z_j = f(net_j) = 1 / (1 + e^(−net_j))
    • The derivative of the activation function is a critical part of the algorithm:  f'(net_j) = Z_j (1 − Z_j)
    [Figure: the sigmoid rises from 0 toward 1 (value .5 at net = 0); its derivative is bell-shaped with a peak of .25 at net = 0; both plotted for net from −5 to 5.]
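A minimal sketch of these two functions (the names `f` and `f_prime` are mine), with a finite-difference check that f'(net) = Z(1 − Z) really is the slope of the sigmoid:

```python
import numpy as np

def f(net):
    """Sigmoid activation: Z = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def f_prime(net):
    """Derivative written in terms of the output itself: f'(net) = Z (1 - Z)."""
    z = f(net)
    return z * (1.0 - z)

print(f(0.0), f_prime(0.0))  # 0.5 and 0.25: midpoint of the squash, maximum slope
# Central-difference check of the derivative formula at net = 0
print(abs((f(1e-4) - f(-1e-4)) / 2e-4 - f_prime(0.0)) < 1e-6)  # True
```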

  16. Backpropagation Learning Equations
    Δw_ij = C δ_j Z_i
    δ_j = (T_j − Z_j) f'(net_j)        [Output Node]
    δ_j = (Σ_k δ_k w_jk) f'(net_j)     [Hidden Node]
    [Figure: network with input nodes i, hidden nodes j, and output nodes k.]

  17. Network Equations
    Output: O_j = f(net_j) = 1 / (1 + e^(−net_j)),   f'(net_j) = ∂O_j/∂net_j = O_j (1 − O_j)
    Δw_ij (general node): Δw_ij = C O_i δ_j
    Δw_ij (output node): δ_j = (t_j − O_j) f'(net_j), so Δw_ij = C O_i δ_j = C O_i (t_j − O_j) f'(net_j)
    Δw_ij (hidden node): δ_j = (Σ_k δ_k w_jk) f'(net_j), so Δw_ij = C O_i δ_j = C O_i (Σ_k δ_k w_jk) f'(net_j)
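The same equations written as small helper functions, one per slide line, in scalar form (function names are my own). The final prints reuse the slide-19 numbers as a spot check.

```python
import numpy as np

def f(net):                    # sigmoid output O_j
    return 1.0 / (1.0 + np.exp(-net))

def f_prime_from_output(o_j):  # f'(net_j) = O_j (1 - O_j)
    return o_j * (1.0 - o_j)

def delta_output(t_j, o_j):
    """Error signal for an output node: (t_j - O_j) f'(net_j)."""
    return (t_j - o_j) * f_prime_from_output(o_j)

def delta_hidden(o_j, deltas_k, w_jk):
    """Error signal for a hidden node: (sum_k delta_k w_jk) f'(net_j)."""
    return np.dot(deltas_k, w_jk) * f_prime_from_output(o_j)

def weight_update(C, o_i, delta_j):
    """General update for weight w_ij: delta_w_ij = C O_i delta_j."""
    return C * o_i * delta_j

print(weight_update(C=1.0, o_i=0.731, delta_j=delta_output(t_j=1.0, o_j=0.921)))        # ≈ .0042, cf. slide 19
print(delta_hidden(o_j=0.731, deltas_k=np.array([0.00575]), w_jk=np.array([1.0])))      # ≈ .00113, cf. slide 19
```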

  18. BP-1) A 2-2-1 backpropagation model has initial weights as shown. Work through one cycle of learning for the following pattern(s). Assume 0 momentum and a learning constant of 1. Round calculations to 3 significant digits to the right of the decimal. Give values for all nodes and links for activation, output, error signal, weight delta, and final weights. Nodes 4, 5, 6, and 7 are just input nodes and do not have a sigmoidal output. For each node calculate the following (show the necessary equation for each): a = , o = , δ = , Δw = , w = . Hint: calculate bottom-top-bottom. [Figure: output node 1; hidden nodes 2 and 3 plus a +1 bias node 4 feeding node 1; input nodes 5 and 6 plus a +1 bias node 7 feeding nodes 2 and 3.] a) All weights initially 1.0. Training Patterns: 1) 0 0 → 1, 2) 0 1 → 0

  19. BP-1)
    net2 = Σ w_i x_i = (1·0 + 1·0 + 1·1) = 1;   net3 = 1
    o2 = 1/(1+e^(−net)) = 1/(1+e^(−1)) = 1/(1+.368) = .731;   o3 = .731;   o4 = 1
    net1 = (1·.731 + 1·.731 + 1) = 2.462;   o1 = 1/(1+e^(−2.462)) = .921
    δ1 = (t1 − o1) o1 (1 − o1) = (1 − .921)(.921)(1 − .921) = .00575
    Δw21 = C δ_j o_i = C δ1 o2 = 1 · .00575 · .731 = .00420;   Δw31 = 1 · .00575 · .731 = .00420;   Δw41 = 1 · .00575 · 1 = .00575
    δ2 = o_j (1 − o_j) Σ_k δ_k w_jk = o2 (1 − o2)(δ1 w21) = .731 (1 − .731)(.00575 · 1) = .00113;   δ3 = .00113
    Δw52 = C δ_j o_i = C δ2 o5 = 1 · .00113 · 0 = 0;   Δw62 = 0;   Δw72 = 1 · .00113 · 1 = .00113
    Δw53 = 0;   Δw63 = 0;   Δw73 = 1 · .00113 · 1 = .00113
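The arithmetic above can be reproduced in a few lines. This sketch rounds intermediate values to 3 decimals, as the problem statement asks, so the printed values match the slide:

```python
from math import exp

def sig(net):
    return 1.0 / (1.0 + exp(-net))

C = 1.0                      # learning constant
x5, x6, x7 = 0.0, 0.0, 1.0   # pattern 1 is "0 0 -> 1"; node 7 is the +1 bias input
t1 = 1.0
w = 1.0                      # all weights start at 1.0

# Forward pass (rounding to 3 decimals, as the problem statement asks)
net2 = w*x5 + w*x6 + w*x7             # = 1; net3 is identical
o2 = o3 = round(sig(net2), 3)         # .731
o4 = 1.0                              # +1 bias into the output node
net1 = round(w*o2 + w*o3 + w*o4, 3)   # 2.462
o1 = round(sig(net1), 3)              # .921

# Backward pass
d1 = (t1 - o1) * o1 * (1 - o1)        # .00575
d2 = o2 * (1 - o2) * (d1 * w)         # propagated through the OLD weight w21 = 1 -> .00113

print(o2, o1)                                        # 0.731 0.921
print(round(d1, 5), round(d2, 5))                    # 0.00575 0.00113
print(round(C * d1 * o2, 5), round(C * d2 * x7, 5))  # 0.0042 0.00113  (delta w21, delta w72)
```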

  20. Backprop Homework
    • For your homework, update the weights for the second pattern of the training set: 0 1 → 0
    • Then go to the link below (the TensorFlow Neural Network Playground) and play around with the BP simulation. Try different training sets, layers, inputs, etc. and get a feel for what the nodes are doing. You do not have to hand anything in for this part.
    • http://playground.tensorflow.org/

  21. Activation Function and its Derivative
    • The node activation function f(net) is commonly the sigmoid:  Z_j = f(net_j) = 1 / (1 + e^(−net_j))
    • The derivative of the activation function is a critical part of the algorithm:  f'(net_j) = Z_j (1 − Z_j)
    [Figure: the sigmoid rises from 0 toward 1 (value .5 at net = 0); its derivative is bell-shaped with a peak of .25 at net = 0; both plotted for net from −5 to 5.]

  22. Inductive Bias & Intuition
    • Node saturation – avoid early, but all right later
      – When saturated, an incorrect output node will still have low error
      – Start with weights close to 0
      – Saturated error even when wrong?
      – Multiple TSS drops
      – Not exactly 0 weights (can get stuck); random small Gaussian with 0 mean
      – Can train with target/error deltas (e.g. .1 and .9 instead of 0 and 1) – see the sketch after this slide
    • Intuition – manager/worker interaction – gives some stability
    • Inductive bias – start with a simple net (small weights, initially linear changes) and smoothly build a more complex surface until the stopping criteria are met
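A minimal sketch of the two practical suggestions above (small zero-mean Gaussian initialization and softened 0/1 targets); the 0.1 weight scale is an arbitrary illustrative choice, and the .1/.9 endpoints are the values mentioned in the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_out, scale=0.1):
    """Small zero-mean Gaussian weights: close to 0 (near-linear region of the
    sigmoid) but not exactly 0, so nodes don't get stuck."""
    return rng.normal(loc=0.0, scale=scale, size=(n_out, n_in))

def soften_targets(t, low=0.1, high=0.9):
    """Map 0/1 targets to .1/.9 so the sigmoid is never pushed into saturation."""
    return np.where(t > 0.5, high, low)

W = init_weights(n_in=3, n_out=2)
targets = soften_targets(np.array([0, 1, 1, 0]))
print(W.round(3))
print(targets)   # [0.1 0.9 0.9 0.1]
```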

  23. Multi-layer Perceptron (MLP) Topology
    [Figure: input layer (nodes i), hidden layer(s) (nodes j), output layer (nodes k).]

  24. Local Minima
    • Most algorithms which have difficulties with simple tasks get much worse with more complex tasks
    • Good news with MLPs
    • Many dimensions make for many descent options
    • Local minima are more common with simple/toy problems, rare with larger problems and larger nets
    • Even if there are occasional minima problems, one could simply train multiple times and pick the best
    • Some algorithms add noise to the updates to escape minima

  25. Local Minima and Neural Networks
    • A neural network can get stuck in local minima for small networks, but for most large networks (many weights), local minima rarely occur in practice
    • This is because with so many dimensions of weights it is unlikely that we are in a minimum in every dimension simultaneously – there is almost always a way down

  26. Stopping Criteria and Overfit Avoidance
    [Figure: SSE vs. epochs – training-set error keeps falling while validation/test-set error eventually turns back up.]
    • More training data (vs. overtraining – one-epoch limit)
    • Validation set – save the weights which do the best job so far on the validation set
    • Keep training for enough epochs to be fairly sure that no more improvement will occur (e.g. once you have trained m epochs with no further improvement, stop and use the best weights so far, or retrain with all the data)
      – Note: if using n-way CV with a validation set, do n runs with 1 of the n data partitions as the validation set. Save the number of training epochs for each run. To get a final model you can train on all the data and stop after the average number of epochs, or a little less than the average since there is more data.
    • Specific BP techniques for avoiding overfit
      – Fewer hidden nodes is not a great approach because it may underfit
      – Weight decay (later), error deltas, Dropout (discussed with ensembles)
    (A sketch of validation-based early stopping follows this slide.)
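A sketch of the validation-set procedure, assuming hypothetical `train_step(model)` and `validation_sse(model)` callables standing in for whatever network and BP update is being used; `patience` plays the role of the slide's m epochs with no further improvement:

```python
import copy

def train_with_early_stopping(model, train_step, validation_sse, max_epochs=1000, patience=50):
    """Keep the weights that did best on the validation set; stop once `patience`
    epochs pass with no further improvement, then return the saved best weights."""
    best_model = copy.deepcopy(model)
    best_sse = validation_sse(model)
    epochs_since_improvement = 0
    epochs_run = 0

    for epoch in range(max_epochs):
        train_step(model)                 # one epoch of backprop on the training set
        sse = validation_sse(model)
        epochs_run = epoch + 1
        if sse < best_sse:                # new best on the validation set: save the weights
            best_sse = sse
            best_model = copy.deepcopy(model)
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break                     # fairly sure no more improvement will occur

    # epochs_run can be recorded per CV run to pick a final training length on all the data
    return best_model, best_sse, epochs_run
```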
