SLIDE 1

Perceptrons

Steven J Zeil

Old Dominion Univ.

Fall 2010

SLIDE 2

Perceptrons

1. Introduction: Neural Networks
2. The Perceptron
   • Using Perceptrons
   • Training
3. Multilayer Perceptrons
   • Structure
4. Training MLPs
   • Backpropagation
   • Improving Convergence
   • OverTraining
   • Tuning Network Size
5. Applying MLPs
   • Structuring Networks
   • Dimensionality Reduction
   • Time Delay Neural Networks
   • Recurrent Networks

SLIDE 3

Neural Networks

Networks of processing units (neurons) with connections (synapses) between them
  • Large number of neurons: ~10^10
  • Large connectivity: ~10^5
  • Parallel processing
  • Robust

SLIDE 4

Computing via NN

Not so much an attempt to imitate the brain as inspired by it
A model for massively parallel processing
Simplest building block: the perceptron

SLIDE 5

The Perceptron

SLIDE 6

Perceptron

Rosenblatt, 1962
The w_i are connection weights.
y = w^T x, where
  • w = [w_0, w_1, . . . , w_d]
  • x = [1, x_1, x_2, . . . , x_d] (the augmented input vector)
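A minimal sketch of this computation (NumPy; the example weights are illustrative, not from the slides):

    import numpy as np

    def perceptron_output(w, x):
        """Compute y = w^T x for an augmented input vector."""
        x_aug = np.concatenate(([1.0], x))  # prepend the bias input x_0 = 1
        return w @ x_aug

    # Illustrative values only: a 2-input perceptron with bias weight w_0 = -0.5
    w = np.array([-0.5, 1.0, 1.0])   # [w0, w1, w2]
    print(perceptron_output(w, np.array([0.3, 0.4])))  # ≈ 0.2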

SLIDE 7

Basic Uses

y = w^T x + w_0 (with the augmented input vector, simply y = w^T x)
  • Linear regression
  • Linear discriminant between 2 classes
  • Use multiple perceptrons for K > 2 classes

SLIDE 8

Perceptron Output Functions

Many perceptrons have a “post-processing” function at the output node. A common choice is the threshold:
y = 1 if w^T x > 0, and y = 0 otherwise
Useful for classification.

SLIDE 9

Sigmoid Output Functions

Useful when we need differentiability or the ability to estimate posterior probabilities:
y = sigmoid(o) = 1 / (1 + exp(−w^T x))
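A corresponding sketch of the sigmoid output, using the same augmented-input convention (weights illustrative):

    import numpy as np

    def sigmoid_output(w, x_aug):
        """y = 1 / (1 + exp(-w^T x)) for an augmented input vector."""
        return 1.0 / (1.0 + np.exp(-(w @ x_aug)))

    print(sigmoid_output(np.array([-0.5, 1.0, 1.0]), np.array([1.0, 0.3, 0.4])))  # ≈ 0.55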

SLIDE 10

K Classes

o_i = w_i^T x
Use softmax: y_i = exp(o_i) / Σ_k exp(o_k)
Choose C_i if y_i = max_k y_k
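A sketch of this softmax computation for K classes (the weight matrix W, whose rows are the w_i, is illustrative; subtracting the maximum is a standard numerical-stability step not mentioned on the slide):

    import numpy as np

    def softmax_classify(W, x_aug):
        """o_i = w_i^T x; y_i = exp(o_i) / sum_k exp(o_k); choose the class with largest y_i."""
        o = W @ x_aug                      # one output per class
        y = np.exp(o - o.max())            # subtract max for numerical stability
        y /= y.sum()
        return y, int(np.argmax(y))        # posterior estimates and chosen class index

    W = np.array([[0.1, 1.0, -1.0],        # illustrative weights for K = 3 classes
                  [0.0, -0.5, 0.5],
                  [0.2, 0.3, 0.3]])
    print(softmax_classify(W, np.array([1.0, 0.5, 0.5])))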

SLIDE 11

Training

Allows online (incremental) training rather than the usual batch training:
  • No need to store the whole sample
  • Adjusts to slow changes in the problem domain
Incremental form of gradient descent: update after each training instance, moving in the direction of steepest descent (the negative gradient).
LMS update: Δw_ij^t = η (r_i^t − y_i^t) x_j^t
η is the learning factor; its size controls the rate of convergence and stability.
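A sketch of a single online LMS update under these definitions (single output; η and the data values are illustrative):

    import numpy as np

    def lms_update(w, x_aug, r, eta=0.1):
        """Delta w_j = eta * (r - y) * x_j, applied after one training instance."""
        y = w @ x_aug                       # current prediction
        return w + eta * (r - y) * x_aug    # move weights to reduce this instance's error

    w = np.zeros(3)
    w = lms_update(w, np.array([1.0, 2.0, -1.0]), r=1.0)
    print(w)   # [ 0.1  0.2 -0.1]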

SLIDE 12

Update Rule: Regression

The error function is
E^t(w | x^t, r^t) = (1/2) (r^t − w^T x^t)^2
with gradient components
∂E^t / ∂w_j = −(r^t − w^T x^t) x_j^t = −(r^t − y^t) x_j^t
Therefore, to move opposite the gradient (gradient descent),
Δw_j^t = η (r^t − y^t) x_j^t
or, with multiple outputs, Δw_ij^t = η (r_i^t − y_i^t) x_j^t.

SLIDE 13

Update Rule: Classification

Δw_ij^t = η (r_i^t − y_i^t) x_j^t
For K = 2, y^t = sigmoid(w^T x) leads to the same update rule as for regression.
For K > 2, softmax leads to the same update as well.

SLIDE 14

Example: Learning Boolean Functions

Example: a spreadsheet demonstration shows that perceptrons can learn linearly separable functions (AND, OR, NAND, . . . ) but cannot learn XOR.

Minsky & Papert (1969) highlighted this limitation, which nearly halted all work on neural networks until 1982.
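A small illustrative experiment in the same spirit (not the spreadsheet referenced above): train a thresholded perceptron with the update rule from the previous slides on AND and on XOR, and compare the final training error.

    import numpy as np

    def train_perceptron(X, r, eta=0.1, epochs=100):
        """Online training of a threshold perceptron with the LMS-style update."""
        X_aug = np.hstack([np.ones((len(X), 1)), X])   # augmented inputs
        w = np.random.uniform(-0.01, 0.01, X_aug.shape[1])
        for _ in range(epochs):
            for x, target in zip(X_aug, r):
                y = 1.0 if w @ x > 0 else 0.0
                w += eta * (target - y) * x
        preds = (X_aug @ w > 0).astype(float)
        return w, np.mean(preds != r)                   # weights, training error

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    print("AND error:", train_perceptron(X, np.array([0, 0, 0, 1.]))[1])  # typically 0.0
    print("XOR error:", train_perceptron(X, np.array([0, 1, 1, 0.]))[1])  # > 0: not linearly separable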

SLIDE 15

Multilayer Perceptrons

SLIDE 16

Multilayer Perceptrons

Adds one or more hidden layers:
y_i = v_i^T z = Σ_{h=1}^{H} v_ih z_h + v_i0
z_h = sigmoid(w_h^T x) = 1 / (1 + exp[−(Σ_{j=1}^{d} w_hj x_j + w_h0)])

(Rumelhart et al., 1986)
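A minimal sketch of this forward pass (NumPy; the matrix shapes and example sizes are my own choices, not given on the slide):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def mlp_forward(W, V, x):
        """One hidden layer: z_h = sigmoid(w_h^T x), y_i = v_i^T z (linear output)."""
        x_aug = np.concatenate(([1.0], x))        # augmented input (bias x_0 = 1)
        z = sigmoid(W @ x_aug)                    # W is H x (d+1)
        z_aug = np.concatenate(([1.0], z))        # bias unit for the output layer
        return V @ z_aug, z                       # V is K x (H+1)

    # Illustrative sizes: d = 2 inputs, H = 3 hidden units, K = 1 output
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.01, 0.01, (3, 3))
    V = rng.uniform(-0.01, 0.01, (1, 4))
    y, z = mlp_forward(W, V, np.array([0.5, -1.0]))
    print(y, z)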

SLIDE 17

Learning XOR

SLIDE 18

MLP as a Universal Approximator

Any function with continuous inputs and outputs can be approximated by an MLP.
Given two hidden layers, one layer can be used to divide the input domain into regions and the other to compute a piecewise-linear regression function over them.
The hidden layers may need to be arbitrarily wide.

SLIDE 19

Training MLPs

SLIDE 20

Training MLPs: Backpropagation

y_i = v_i^T z = Σ_{h=1}^{H} v_ih z_h + v_i0
z_h = sigmoid(w_h^T x) = 1 / (1 + exp[−(Σ_{j=1}^{d} w_hj x_j + w_h0)])

Given the z values, we could train the v as we do a single-layer perceptron:
Δv_h = η Σ_t (r^t − y^t) z_h^t

SLIDE 21

Backpropagation (cont.)

For the first-layer weights, apply the chain rule:
Δw_hj = −η ∂E/∂w_hj
      = −η Σ_t (∂E^t/∂y^t) (∂y^t/∂z_h) (∂z_h/∂w_hj)
      = −η Σ_t [−(r^t − y^t)] v_h (∂z_h/∂w_hj)
      = −η Σ_t [−(r^t − y^t)] v_h z_h^t (1 − z_h^t) x_j^t
      = η Σ_t (r^t − y^t) v_h z_h^t (1 − z_h^t) x_j^t

SLIDE 22

Backpropagation Algorithm

Initialize all v_ih and w_hj to rand(−0.01, 0.01)
repeat
    for all (x^t, r^t) ∈ X, in random order do
        for h = 1, . . . , H do
            z_h ← sigmoid(w_h^T x^t)
        end for
        for i = 1, . . . , K do
            y_i ← v_i^T z
        end for
        for i = 1, . . . , K do
            Δv_i ← η (r_i^t − y_i^t) z
        end for
        for h = 1, . . . , H do
            Δw_h ← η (Σ_i (r_i^t − y_i^t) v_ih) z_h (1 − z_h) x^t
        end for
        for i = 1, . . . , K do
            v_i ← v_i + Δv_i
        end for
        for h = 1, . . . , H do
            w_h ← w_h + Δw_h
        end for
    end for
until convergence
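A runnable sketch of this algorithm in NumPy (a single hidden layer with linear outputs; the array shapes, fixed epoch count in place of a convergence test, and example sizes are my own choices, not from the slide):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def train_mlp(X, R, H=5, eta=0.1, epochs=200, rng=np.random.default_rng(0)):
        """Online backpropagation for a one-hidden-layer MLP with linear outputs."""
        N, d = X.shape
        K = R.shape[1]
        X_aug = np.hstack([np.ones((N, 1)), X])             # bias input x_0 = 1
        W = rng.uniform(-0.01, 0.01, (H, d + 1))            # hidden-layer weights w_hj
        V = rng.uniform(-0.01, 0.01, (K, H + 1))            # output-layer weights v_ih
        for _ in range(epochs):                             # stands in for "repeat ... until convergence"
            for t in rng.permutation(N):                    # instances in random order
                x, r = X_aug[t], R[t]
                z = sigmoid(W @ x)                          # z_h = sigmoid(w_h^T x)
                z_aug = np.concatenate(([1.0], z))
                y = V @ z_aug                               # y_i = v_i^T z
                dV = eta * np.outer(r - y, z_aug)           # Δv_i = η (r_i − y_i) z
                back = (r - y) @ V[:, 1:]                   # Σ_i (r_i − y_i) v_ih
                dW = eta * np.outer(back * z * (1 - z), x)  # Δw_h = η (...) z_h (1 − z_h) x
                V += dV
                W += dW
        return W, V

Usage would be train_mlp(X, R) with X of shape (N, d) and targets R of shape (N, K).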

SLIDE 23

Applying Backpropagation

Batch learning: make multiple passes over the entire sample.
  • Update v and w after each full pass.
  • Each pass is called an epoch.
Online learning: a single pass, with a smaller η.

SLIDE 24

Example of Batch Learning

SLIDE 25

Multiple Hidden Levels

Multiple hidden levels are possible. Backpropagation generalizes to any number of levels.

SLIDE 26

Improving Convergence

Momentum: attempts to damp out oscillations by averaging in the “trend” of prior updates:
Δw_i^t = −η ∂E^t/∂w_i + α Δw_i^{t−1}, with 0.5 ≤ α < 1.0

Adaptive learning rate: keep η large while learning is going on, decreasing it later:
Δη = +a if E^{t+τ} < E^t
Δη = −bη otherwise
Note that the increase is arithmetic, but the decrease is geometric.
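A hedged sketch of the momentum update (the quadratic error, α, and η values are illustrative, not from the slide):

    import numpy as np

    def momentum_step(w, grad, prev_dw, eta=0.1, alpha=0.9):
        """One momentum update: Delta w^t = -eta * dE/dw + alpha * Delta w^{t-1}."""
        dw = -eta * grad + alpha * prev_dw
        return w + dw, dw

    # Illustrative use on E(w) = w^2 (gradient 2w), starting at w = 1
    w, dw = np.array([1.0]), np.zeros(1)
    for _ in range(5):
        w, dw = momentum_step(w, 2 * w, dw)
        print(w)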

SLIDE 27

OverTraining

MLPs are subject to overtraining, partly due to the large number of parameters, but also as a function of training time.
  • The w_i start near zero; in effect, the parameters are ignored.
  • Early training steps move the more important attributes’ weights away from zero.
  • As training continues, we start fitting to noise by moving the weights of less important attributes away from zero.
  • In effect, more parameters are added to the model over time.


SLIDE 31

Overtraining Example

SLIDE 32

Overtraining Example

SLIDE 33

Tuning Network Size

Destructive: remove units or connections that are unnecessary.
Constructive: add units or connections to improve performance.

SLIDE 34

Destructive Tuning

Weight decay: give each weight a tendency to decay toward zero unless it is refreshed by additional training examples:
Δw_i = −η ∂E/∂w_i − λ w_i
This is equivalent to gradient descent training with the error function
E′ = E + (λ/2) Σ_i w_i^2
which penalizes solutions with large numbers of non-zero weights.
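A minimal sketch of the weight-decay update (the gradient and λ values are illustrative, not from the slide):

    import numpy as np

    def weight_decay_step(w, grad, eta=0.1, lam=0.01):
        """Delta w_i = -eta * dE/dw_i - lambda * w_i: gradient step plus decay toward zero."""
        return w + (-eta * grad - lam * w)

    w = np.array([0.8, -0.3, 0.0])
    grad = np.array([0.2, -0.1, 0.0])       # hypothetical dE/dw
    print(weight_decay_step(w, grad))       # weights shrink unless the gradient refreshes them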

SLIDE 35

Constructive Tuning

Train an initial small network. If the error is high, add a hidden unit and retrain.
Two approaches:
  • Dynamic Node Creation
  • Cascade Correlation

SLIDE 36

Dynamic Node Creation

Start with a hidden layer containing one hidden unit; new nodes are added to that layer.
Never increases the number of layers.
Weights of the new unit are initialized randomly; already-trained weights start from their trained values.

SLIDE 37

Cascade Correlation

Each new node becomes the only node in a new layer, connected to all of the existing hidden units and to all inputs.
Weights of the new unit are initialized randomly; already-trained weights are frozen at their trained values.

SLIDE 38

Applying MLPs

SLIDE 39

Structuring Networks

Useful when we have knowledge of the input structure (e.g., vision):
  • Pixels are arranged in rectangular arrays.
  • Locally correlated structures (e.g., edges) are important.
  • Suggests a hierarchical cone of increasingly abstract local features.

SLIDE 40

Weight Sharing

Take advantage of uniformity over a spatial dimension.

SLIDE 41

Hints

Prior knowledge of equivalent cases, e.g., invariance to common graphic transforms, can be exploited in several ways:
  • Use it to auto-expand the training set (“virtual examples”)
  • Reduce equivalent cases to a canonical form during pre-processing
  • Incorporate it into the network structure (e.g., weight sharing)
  • Augment the error function to penalize violations of the equivalence:
    E′ = E + λ_h E_h, where E_h = (g(x | θ) − g(x′ | θ))^2 if x′ is equivalent to x

SLIDE 42

Dimensionality Reduction

Looking at the weights of a trained MLP can give hints as to which input attributes are significant.
In any MLP, if the number of units in the first hidden layer is less than the number of inputs, we are doing dimensionality reduction.
In an auto-associator, we train an MLP to reproduce its own inputs, using an intermediate hidden layer with fewer units than the number of inputs.
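A sketch of the auto-associator setup, reusing the hypothetical train_mlp from the backpropagation sketch above (the data and the 2-unit bottleneck are illustrative choices):

    # Train the network to reproduce its own inputs through a narrow hidden layer.
    # W then maps inputs to the reduced representation z (here 2 dimensions).
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))            # 5-dimensional inputs
    W, V = train_mlp(X, R=X, H=2, eta=0.05)  # targets are the inputs themselves
    X_aug = np.hstack([np.ones((100, 1)), X])
    z = 1.0 / (1.0 + np.exp(-(X_aug @ W.T))) # encoded 2-D representation of each input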

SLIDE 43

Linear Auto-Associator

In essence, performs principal components analysis.
The weight vectors span the same space as the principal eigenvectors.

SLIDE 44

Non-Linear Auto-Associator

Nonlinear dimensionality reduction

SLIDE 45

Time Delay Neural Networks

For learning time sequences

SLIDE 46

Recurrent Networks

In effect, adds limited memory to MLPs.
Train by unfolding in time (similar to loop unrolling).

SLIDE 47

Unfolding in Time
