
Cognitive Modeling

Lecture 12: Connectionist Networks: Multi-layer Networks; Backpropagation

Frank Keller
School of Informatics, University of Edinburgh

keller@inf.ed.ac.uk


Overview

Multi-layer networks:

limits of single-layer networks;
multi-layer networks: a solution to XOR;
properties of multi-layer networks;
training multi-layer networks: backpropagation.

Reading: McLeod et al. (1998, Ch. 5).


2-D Representation of Boolean Functions

Visualize the relationship between the inputs (plotted as points in 2-D space) and the desired output (a line dividing the space): the XOR problem is not linearly separable. Single-layer networks can only represent linearly separable problems.

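One way to see this concretely is to search a whole grid of weights and thresholds for a single threshold unit: units computing AND and OR turn up immediately, but no setting ever computes XOR. A minimal sketch in Python (the grid and its step size are arbitrary choices, not part of the original slides):

```python
import itertools

def threshold_unit(w1, w2, theta, x1, x2):
    """Single threshold unit: fires iff the weighted input sum reaches theta."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

def separable(target):
    """Search a coarse weight/threshold grid for a unit computing `target`."""
    grid = [x / 2 for x in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, theta in itertools.product(grid, repeat=3):
        if all(threshold_unit(w1, w2, theta, x1, x2) == target(x1, x2)
               for x1, x2 in itertools.product([0, 1], repeat=2)):
            return True
    return False

print(separable(lambda a, b: a & b))   # AND: True
print(separable(lambda a, b: a | b))   # OR:  True
print(separable(lambda a, b: a ^ b))   # XOR: False (not linearly separable)
```

Any finer grid gives the same result for XOR, since no dividing line exists at all.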

Solving XOR with Hidden Units

Consider the following network:

3-layer, feedforward;
2 units in the hidden layer;
hidden and output units are threshold units with θ = 1.

Representations at the hidden layer:

Input    Hidden    Target
         h1  h2
0 0      0   0     0
0 1      1   0     1
1 0      1   0     1
1 1      1   1     0

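The hidden-unit table above can be checked directly in code. A minimal sketch, assuming one standard choice of weights in which h1 computes OR and h2 computes AND; the exact weights of the original diagram are not recoverable from the text:

```python
def step(net, theta=1.0):
    """Threshold unit: fires iff the net input reaches theta (here theta = 1)."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    # Hidden layer (weights are an assumption; the original diagram is lost):
    h1 = step(1.0 * x1 + 1.0 * x2)    # OR:  fires if either input is on
    h2 = step(0.6 * x1 + 0.6 * x2)    # AND: 0.6 + 0.6 = 1.2 >= 1 only for (1, 1)
    # Output: h1 excites, h2 inhibits, so the unit fires for "OR but not AND":
    return step(1.0 * h1 - 2.0 * h2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))   # prints 0, 1, 1, 0: XOR
```

The hidden layer re-codes the inputs into a representation that is linearly separable for the output unit.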


Solving XOR with Hidden Units

Problem: current learning rules cannot be used for hidden units:

we don’t know what the error is at these nodes;
the delta rule requires that we know the desired activation (the target t):

∆w = 2ε δ f′(net) ain,    where δ = t − a

Solution: an algorithm based on forward propagation of activity and backpropagation of error through the network.


Backpropagation of Error

(a) Forward propagation of activity:

netout = ∑ w ahidden
aout = f(netout)

(b) Backward propagation of error:

δhidden = f′(nethidden) ∑ w δout

The error signal travels backward through the same weights that carried the activity forward.

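In code, the two passes have the same shape: activity flows forward through the weights, and error flows backward through the same weights. A minimal sketch for a single unit and its layer of senders, assuming logistic activation (all names here are illustrative):

```python
import math

def f(net):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-net))

def forward(w, a_hidden):
    """(a) Forward pass: net_out = sum of w * a_hidden, a_out = f(net_out)."""
    net_out = sum(wj * aj for wj, aj in zip(w, a_hidden))
    return f(net_out)

def delta_hidden(a_hid, w_out, deltas_out):
    """(b) Backward pass for one hidden unit:
    delta_hidden = f'(net_hidden) * sum of w * delta_out.
    For the logistic f, f'(net) = a * (1 - a), so the unit's stored
    activation a_hid is all that is needed for the derivative."""
    return a_hid * (1.0 - a_hid) * sum(wk * dk for wk, dk in zip(w_out, deltas_out))
```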

Learning in Multi-layer Networks

The generalized delta rule:

∆wij = η δi aj

δi = f′(neti)(ti − ai)    for output nodes
δi = f′(neti) ∑k δk wki    for hidden nodes

where f′(neti) = ai(1 − ai) for the logistic activation function.

Multi-layer networks can, in principle, learn any mapping function: they are not constrained to linearly separable problems. But while a solution exists for any mapping problem, backpropagation is not guaranteed to find it (unlike the perceptron convergence rule).

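Putting the rule together for a concrete network: the sketch below performs one generalized-delta-rule update for a small 2-2-1 network with logistic units. The architecture, the function names, and the omission of bias weights are simplifying assumptions, not part of the slides:

```python
import math

def f(net):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-net))

def backprop_step(x, t, W_hid, W_out, eta=0.5):
    """One generalized-delta-rule update for a 2-2-1 logistic network.
    W_hid[i][j]: weight from input j to hidden unit i;
    W_out[i]:    weight from hidden unit i to the single output.
    Bias weights are omitted for brevity."""
    # Forward propagation of activity:
    a_hid = [f(sum(w * xj for w, xj in zip(row, x))) for row in W_hid]
    a_out = f(sum(w * ah for w, ah in zip(W_out, a_hid)))
    # Backward propagation of error, with f'(net) = a * (1 - a):
    delta_out = a_out * (1 - a_out) * (t - a_out)           # output node
    delta_hid = [ah * (1 - ah) * delta_out * W_out[i]       # hidden nodes
                 for i, ah in enumerate(a_hid)]
    # Weight changes, delta_w_ij = eta * delta_i * a_j:
    W_out = [w + eta * delta_out * ah for w, ah in zip(W_out, a_hid)]
    W_hid = [[w + eta * d * xj for w, xj in zip(row, x)]
             for row, d in zip(W_hid, delta_hid)]
    return W_hid, W_out, a_out
```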

Learning in Multi-layer Networks

Reason why backpropagation sometimes fails to find the correct mapping function: it gets stuck in a local minimum instead of finding the global minimum.

[Figure: error surface; backprop gets trapped in a local minimum while the global minimum lies elsewhere.]



Example of Backpropagation

Consider the following network, containing hidden nodes. Calculate the weight changes for both layers of the network, assuming targets of 1 and 1 for the output units.

[Figure: network diagram with input activations and current weights.]

The generalized delta rule:

∆wij = η δi aj

δi = f′(neti)(ti − ai)    for output nodes
δi = f′(neti) ∑k δk wki    for hidden nodes

where f′(neti) = ai(1 − ai).

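The network and weights in the original figure are not recoverable, so the following run of the backprop_step sketch above uses made-up values purely to illustrate how the calculation goes through:

```python
# The network in the original figure is lost, so these weights and inputs
# are made-up values, used only to show the calculation step by step:
W_hid = [[0.5, -0.5], [0.3, 0.7]]       # input -> hidden weights
W_out = [1.0, -1.0]                     # hidden -> output weights
W_hid, W_out, a_out = backprop_step(x=[1, 0], t=1.0,
                                    W_hid=W_hid, W_out=W_out)
print(a_out)   # output before the update; all weights have moved toward t = 1
```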


Some Comments

Single layer networks (perceptrons):

can only solve problems that are linearly separable;
but a solution is guaranteed by the perceptron convergence rule.

Multi-layer networks (with hidden units):

can in principle solve any input-output mapping function;
backpropagation performs a gradient descent on the error surface;
it can get caught in a local minimum;
so it is not guaranteed to find the solution.

Finding solutions:

manipulate learning rule parameters: learning rate, momentum (see the sketch after this list);
brute-force search (sampling) of the error surface to find a good starting position in weight space; this is computationally impractical for complex networks.
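Momentum, mentioned above, adds a fraction α of the previous weight change to the current one, which can carry the weights across shallow bumps in the error surface. A sketch of the modified update (α = 0.9 and all names here are assumptions):

```python
def update_with_momentum(w, delta_i, a_j, prev_dw, eta=0.5, alpha=0.9):
    """Delta rule with momentum: dw(t) = eta * delta_i * a_j + alpha * dw(t-1).
    The momentum term keeps each weight moving in its recent direction,
    which can carry it past small local minima in the error surface."""
    dw = eta * delta_i * a_j + alpha * prev_dw
    return w + dw, dw   # new weight, plus the change to reuse at the next step
```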


Biological Plausibility

Backpropagation requires bi-directional signals:

forward propagation of activation and backward propagation of error;
nodes must ‘know’ the strengths of all synaptic connections to compute their error: the computation is non-local.

But axons are uni-directional transmitters. Possible justification: backpropagation explains what is learned, not how.

Network architecture:

successful learning crucially depends on the number of hidden units;
there is no way to know, a priori, what that number is.

Alternative solution: use a network with a local learning rule, e.g., Hebbian learning; as sketched below, such a rule needs no backward error signal.

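The contrast with backpropagation is easy to state in code: a Hebbian update uses only the two activations meeting at the synapse. A one-line sketch (the names and learning rate are illustrative):

```python
def hebbian_update(w, a_pre, a_post, eta=0.1):
    """Hebbian learning: dw = eta * a_pre * a_post.
    Both factors are available locally at the synapse, so no error signal
    needs to be propagated backward through uni-directional axons."""
    return w + eta * a_pre * a_post
```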


Summary

There are simple Boolean functions (e.g., XOR) that a single-layer perceptron can’t represent;
hidden layers need to be introduced to fix this problem;
the perceptron learning rule needs to be extended to the generalized delta rule;
the generalized rule performs forward propagation of activity and backpropagation of error through the network;
it is not guaranteed to find a global minimum of the error function, and might get stuck in local minima;
backpropagation has limited biological plausibility.


References

McLeod, Peter, Kim Plunkett, and Edmund T. Rolls. 1998. Introduction to Connectionist Modelling of Cognitive Processes. Oxford: Oxford University Press.

Plaut, David C., James L. McClelland, Mark S. Seidenberg, and Karalyn Patterson. 1996. Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review 103: 56–115.

Seidenberg, Mark S., and James L. McClelland. 1989. A distributed developmental model of word recognition and naming. Psychological Review 96: 523–568.
