Perceptrons


SLIDE 1

Introduction: Neural Networks The Perceptron Multilayer Perceptrons Training MLPs Applying MLPs

Perceptrons

Steven J Zeil

Old Dominion Univ.

Fall 2010


Perceptrons

1. Introduction: Neural Networks
2. The Perceptron: Using Perceptrons; Training
3. Multilayer Perceptrons: Structure
4. Training MLPs: Backpropagation; Improving Convergence; Overtraining; Tuning Network Size
5. Applying MLPs: Structuring Networks; Dimensionality Reduction; Time Delay Neural Networks; Recurrent Networks


Neural Networks

Networks of processing units (neurons) with connections (synapses) between them
  • Large number of neurons: ~10^10
  • Large connectivity: ~10^5 connections per neuron
  • Parallel processing
  • Robust


Computing via NN

Not so much an attempt to imitate the brain as inspired by it
  • A model for massively parallel processing
  • Simplest building block: the perceptron

SLIDE 2


The Perceptron



Perceptron

(Rosenblatt, 1962) The w_i are connection weights:

y = w^T x

where

  • w = [w_0, w_1, ..., w_d]
  • x = [1, x_1, x_2, ..., x_d] (augmented input vector)
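As a small sketch (not from the slides), the perceptron output is just a dot product over the augmented input:

```python
def perceptron(w, x):
    """Compute y = w^T x for an augmented input x = [1, x1, ..., xd]."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical weights: d = 2 inputs, bias w0 = -1
w = [-1.0, 0.5, 0.5]
x = [1.0, 2.0, 2.0]   # augmented with a leading 1
y = perceptron(w, x)  # -1 + 1 + 1 = 1.0
```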


Basic Uses

y = w^T x + w_0
  • Linear regression
  • Linear discriminant between 2 classes

Use multiple perceptrons for K > 2 classes


Perceptron Output Functions

Many perceptrons have a "post-processing" function at the output node. A common choice is the threshold:

y = 1 if w^T x > 0, and y = 0 otherwise

Useful for classification.
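A minimal sketch of the threshold output (the AND weights below are made up, not from the slides):

```python
def threshold_perceptron(w, x):
    """Classify with a threshold on w^T x: return 1 if positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Hypothetical weights implementing AND on augmented inputs [1, x1, x2]
w_and = [-1.5, 1.0, 1.0]
labels = [threshold_perceptron(w_and, [1.0, a, b])
          for a in (0.0, 1.0) for b in (0.0, 1.0)]
# labels follows the AND truth table over (a, b)
```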

SLIDE 3


Sigmoid Output Functions

Useful when we need differentiability or the ability to estimate posterior probabilities:

y = sigmoid(o) = 1 / (1 + exp[−w^T x])
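A sketch of the sigmoid output function:

```python
import math

def sigmoid_output(w, x):
    """Sigmoid perceptron: y = 1 / (1 + exp(-w^T x))."""
    o = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-o))

# When w^T x = 0, the output sits at 0.5, the decision boundary
y = sigmoid_output([0.0, 1.0], [1.0, 0.0])
```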


K Classes

Compute one output per class: o_i = w_i^T x

Use softmax: y_i = exp(o_i) / Σ_k exp(o_k)

Choose C_i if y_i = max_k y_k
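A sketch of the K-class rule above (the weight matrix is made up for illustration):

```python
import math

def softmax_classify(W, x):
    """One weight vector per class; return (posteriors y, chosen class index)."""
    o = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    m = max(o)                              # subtract max for numerical stability
    e = [math.exp(oi - m) for oi in o]
    s = sum(e)
    y = [ei / s for ei in e]
    return y, y.index(max(y))

W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # 3 hypothetical classes, augmented inputs
y, c = softmax_classify(W, [1.0, 3.0])      # o = [1.0, 3.0, 2.0]
```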


Training

Perceptrons allow online (incremental) training rather than the usual batch training:
  • No need to store the whole sample
  • Adjusts to slow changes in the problem domain

Incremental form of gradient descent: update after each training instance, stepping opposite the gradient.

LMS update:

Δw_ij^t = η (r_i^t − y_i^t) x_j^t

η is the learning factor; its size controls the rate of convergence and stability.
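A sketch of one online LMS step for a single output (η and the instance are made up):

```python
def lms_update(w, x, r, eta=0.1):
    """One online LMS step: w_j += eta * (r - y) * x_j, with y = w^T x."""
    y = sum(wj * xj for wj, xj in zip(w, x))
    return [wj + eta * (r - y) * xj for wj, xj in zip(w, x)]

# Repeatedly presenting one instance drives w^T x toward the target r
w = [0.0, 0.0]
for _ in range(100):
    w = lms_update(w, [1.0, 2.0], 1.0)
y_final = sum(wj * xj for wj, xj in zip(w, [1.0, 2.0]))
```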


Update Rule: Regression

The error function is

E^t(w | x^t, r^t) = (1/2)(r^t − w^T x^t)^2

with gradient components

∂E^t/∂w_i = −(r^t − w^T x^t) x_i^t = −(r^t − y^t) x_i^t

Therefore, to move opposite the gradient (downhill),

Δw_i^t = η (r^t − y^t) x_i^t

SLIDE 4


Update Rule: Classification

Δw_ij^t = η (r_i^t − y_i^t) x_j^t

For K = 2, y^t = sigmoid(w^T x) leads to the same update function as for regression. For K > 2, softmax leads to the same update as well.


Example: Learning Boolean Functions

Example: a spreadsheet demonstrates that perceptrons can learn linearly separable functions (AND, OR, NAND, ...) but cannot learn XOR.

(Minsky & Papert, 1969) This result nearly halted all work on neural networks until 1982.


Multilayer Perceptrons



Multilayer Perceptrons

Adds one or more hidden layers (Rumelhart et al., 1986):

y_i = v_i^T z = Σ_{h=1}^{H} v_ih z_h + v_i0

z_h = sigmoid(w_h^T x) = 1 / (1 + exp[−(Σ_{j=1}^{d} w_hj x_j + w_h0)])
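The two equations above can be sketched as a forward pass (the example weights are made up):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(W, V, x):
    """Forward pass: z_h = sigmoid(w_h^T x + w_h0), y_i = v_i^T z + v_i0.

    Each row of W is [w_h1, ..., w_hd, w_h0]; each row of V is [v_i1, ..., v_iH, v_i0].
    """
    z = [sigmoid(sum(w[j] * x[j] for j in range(len(x))) + w[-1]) for w in W]
    y = [sum(v[h] * z[h] for h in range(len(z))) + v[-1] for v in V]
    return z, y

# One hidden layer of H = 2 units, a single linear output
W = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]]
V = [[1.0, 1.0, -0.5]]
z, y = mlp_forward(W, V, [0.0, 0.0])   # both hidden activations are sigmoid(0) = 0.5
```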

SLIDE 5


Learning XOR


MLP as a Universal Approximator

Any function with continuous inputs and outputs can be approximated by an MLP. Given two hidden layers, one can be used to divide the input domain into regions and the other to compute a piecewise-linear regression function over those regions. The hidden layers may need to be arbitrarily wide.


Training MLPs



Training MLPs: Backpropagation

y_i = v_i^T z = Σ_{h=1}^{H} v_ih z_h + v_i0

z_h = sigmoid(w_h^T x) = 1 / (1 + exp[−(Σ_{j=1}^{d} w_hj x_j + w_h0)])

Given the z values, we could train the v as we do a single-layer perceptron:

Δv_h = η Σ_t (r^t − y^t) z_h^t

How do we get the w?

SLIDE 6


Backpropagation (cont.)

Apply the chain rule ∂E/∂w_hj = (∂E/∂y)(∂y/∂z_h)(∂z_h/∂w_hj):

Δw_hj = −η ∂E/∂w_hj
      = −η Σ_t [−(r^t − y^t) (∂y/∂z_h)(∂z_h/∂w_hj)]
      = −η Σ_t [−(r^t − y^t) v_h (∂z_h/∂w_hj)]
      = η Σ_t (r^t − y^t) v_h z_h^t (1 − z_h^t) x_j^t


Backpropagation Algorithm

Initialize all v_ih and w_hj to rand(−0.01, 0.01)
repeat
    for all (x^t, r^t) ∈ X in random order do
        for h = 1, ..., H do
            z_h ← sigmoid(w_h^T x^t)
        end for
        for i = 1, ..., K do
            y_i ← v_i^T z
        end for
        for i = 1, ..., K do
            Δv_i ← η (r_i^t − y_i^t) z
        end for
        for h = 1, ..., H do
            Δw_h ← η (Σ_i (r_i^t − y_i^t) v_ih) z_h (1 − z_h) x^t
        end for
        for i = 1, ..., K do
            v_i ← v_i + Δv_i
        end for
        for h = 1, ..., H do
            w_h ← w_h + Δw_h
        end for
    end for
until convergence
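A minimal pure-Python sketch of this loop for the K = 1 case with a sigmoid output, trained on XOR; the hyperparameters (H, η, epochs, seed) are made-up assumptions, not from the slides:

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_backprop(X, R, H=4, eta=0.5, epochs=4000, seed=1):
    """Online backpropagation: one hidden layer, one sigmoid output."""
    random.seed(seed)
    d = len(X[0])
    # each w[h] is [w_h1, ..., w_hd, w_h0]; v is [v_1, ..., v_H, v_0]
    w = [[random.uniform(-0.01, 0.01) for _ in range(d + 1)] for _ in range(H)]
    v = [random.uniform(-0.01, 0.01) for _ in range(H + 1)]
    for _ in range(epochs):
        for x, r in zip(X, R):
            z = [sigmoid(sum(w[h][j] * x[j] for j in range(d)) + w[h][d])
                 for h in range(H)]
            y = sigmoid(sum(v[h] * z[h] for h in range(H)) + v[H])
            delta = r - y
            # compute both updates from the same forward pass, then apply
            dv = [eta * delta * zh for zh in z] + [eta * delta]
            dw = [[eta * delta * v[h] * z[h] * (1 - z[h]) * xj
                   for xj in x + [1.0]] for h in range(H)]
            for h in range(H):
                for j in range(d + 1):
                    w[h][j] += dw[h][j]
            for h in range(H + 1):
                v[h] += dv[h]
    return w, v

def predict(w, v, x):
    z = [sigmoid(sum(w[h][j] * x[j] for j in range(len(x))) + w[h][-1])
         for h in range(len(w))]
    return sigmoid(sum(v[h] * z[h] for h in range(len(z))) + v[-1])

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
R = [0.0, 1.0, 1.0, 0.0]                  # XOR targets
w, v = train_backprop(X, R)
preds = [predict(w, v, x) for x in X]
err = sum((r - p) ** 2 for r, p in zip(R, preds))
```

The squared error of an untrained network hovers around 1.0 here (all outputs near 0.5), so training should push `err` well below that.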

Applying Backpropagation

Batch learning: make multiple passes over the entire sample.
  • Update v and w after each entire pass
  • Each pass is called an epoch

Online learning: one pass, with a smaller η.


Example of Batch Learning

SLIDE 7


Multiple hidden Levels

Multiple hidden levels are possible. Backpropagation generalizes to any number of levels.

25 Introduction: Neural Networks The Perceptron Multilayer Perceptrons Training MLPs Applying MLPs

Improving Convergence

Momentum: attempts to damp out oscillations by averaging in the "trend" of prior updates:

Δw_i^t = −η ∂E^t/∂w_i + α Δw_i^{t−1},   0.5 ≤ α < 1.0

Adaptive learning rate: keep η large while learning is progressing well, decreasing it later:

Δη = +a if E^{t+τ} < E^t, and Δη = −bη otherwise

Note that the increase is arithmetic, but the decrease is geometric.
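Both rules are one-liners; as a sketch (the constants a, b, η, α are made-up defaults):

```python
def momentum_step(grad, prev_update, eta=0.1, alpha=0.9):
    """Delta w_t = -eta * grad + alpha * Delta w_{t-1}."""
    return -eta * grad + alpha * prev_update

def adapt_eta(eta, err_now, err_before, a=0.01, b=0.5):
    """Arithmetic increase on improvement, geometric decrease otherwise."""
    return eta + a if err_now < err_before else eta - b * eta

u = momentum_step(2.0, 0.5)        # -0.1*2.0 + 0.9*0.5 = 0.25
eta_up = adapt_eta(0.1, 1.0, 2.0)  # error fell: eta grows by a
eta_dn = adapt_eta(0.1, 2.0, 1.0)  # error rose: eta halves
```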


OverTraining

MLPs are subject to overtraining, partly due to their large number of parameters, but it is also a function of training time:
  • The weights w_i start near zero; in effect, the parameters are ignored
  • Early training steps move the more important attributes' weights away from zero
  • As training continues, we start fitting to noise by moving the weights of less important attributes away from zero

In effect, this adds more parameters to the model over time.


Overtraining Example

SLIDE 8


Overtraining Example


Tuning Network Size

Destructive: remove units or connections that are unnecessary.
Constructive: add units or connections to improve performance.


Destructive Tuning

Weight decay: give each weight a tendency to decay towards zero unless it is refreshed by additional training examples:

Δw_i = −η ∂E/∂w_i − λ w_i

Equivalent to gradient-descent training with the error function

E′ = E + (λ/2) Σ_i w_i²

penalizing solutions with large numbers of non-zero weights.
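A sketch of one decay-augmented update over a weight vector (η and λ are made-up defaults):

```python
def weight_decay_step(w, grad, eta=0.1, lam=0.01):
    """Apply Delta w_i = -eta * grad_i - lambda * w_i to each weight."""
    return [wi - eta * gi - lam * wi for wi, gi in zip(w, grad)]

# With a zero gradient, the weights simply shrink toward zero
w = weight_decay_step([1.0, -2.0], [0.0, 0.0])
```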


Constructive Tuning

Train an initial small network. If error is high, add a hidden unit and retrain.
  • Dynamic Node Creation
  • Cascade Correlation

SLIDE 9


Dynamic Node Creation

Start with a hidden layer containing one hidden unit; new nodes are added to that layer.
  • Never increases the number of layers
  • Weights of the new unit are initialized randomly
  • Already-trained weights start from their trained values


Cascade Correlation

Each new node becomes the only node in a new layer, connected to all of the existing hidden units and to all inputs.
  • Weights of the new unit are initialized randomly
  • Already-trained weights are frozen at their trained values


Applying MLPs



Structuring Networks

When we have knowledge of the input structure (e.g., vision):
  • pixels are arranged in rectangular arrays
  • locally correlated structures (e.g., edges) are important
  • the network can be organized as a hierarchical cone of features

SLIDE 10


Weight Sharing

Take advantage of uniformity over a spatial dimension.
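As an illustrative sketch (not from the slides), sharing one small weight vector across every position of a 1-D input is exactly a convolution:

```python
def shared_weight_layer(w, x):
    """Apply the same weights w at every position of x (a 1-D convolution)."""
    k = len(w)
    return [sum(w[i] * x[p + i] for i in range(k))
            for p in range(len(x) - k + 1)]

# A hypothetical 2-tap edge detector shared across all positions
out = shared_weight_layer([1.0, -1.0], [0.0, 0.0, 1.0, 1.0])
```

Only two weights are trained, no matter how long the input is, which is the point of sharing.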


Hints

Prior knowledge of equivalent cases, e.g., invariance to common graphic transforms, can be exploited in several ways:
  • Use it to auto-expand the training set ("virtual examples")
  • Reduce equivalent cases to a canonical form during pre-processing
  • Incorporate it into the network structure (e.g., weight sharing)
  • Augment the error function to penalize violations of the equivalence:

E′ = E + λ_h E_h, where E_h = (g(x | θ) − g(x′ | θ))² if x′ is equivalent to x


Dimensionality Reduction

Looking at the weights of a trained MLP can give hints as to which input attributes are significant. In any MLP, if the number of units in the first hidden layer is less than the number of inputs, we are doing dimensionality reduction. In an auto-associator, we train an MLP to reproduce its own inputs, using an intermediate hidden layer with fewer units than the number of inputs.


Linear Auto-Associator

In essence, performs principal components analysis: the weight vectors span the same space as the principal eigenvectors.

SLIDE 11


Non-Linear Auto-Associator

Nonlinear dimensionality reduction


Time Delay Neural Networks

For learning time sequences


Recurrent Networks

In effect, adds limited memory to MLPs. Trained by unfolding in time (similar to loop unrolling).


Unfolding in Time
