

SLIDE 1

Neural Network Backpropagation

3-2-16

SLIDE 2

Recall from Monday...

Perceptrons can only classify linearly separable data.

SLIDE 3

Multi-layer networks

  • Can represent any boolean function (e.g. XOR; see the sketch after this list).
  • We don’t want to build them by hand, so we need a way to train them.
  • Algorithm: backpropagation.

○ You’ve already seen this in action in yesterday’s lab.
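To make the first bullet concrete: a two-layer network of threshold units can compute XOR, which no single perceptron can represent. A minimal hand-built Python sketch (the weights and names are mine, not from the slides):

    # XOR with a hand-built two-layer network of threshold units.
    def step(x):
        """Threshold activation: active (1) above zero, else inactive (0)."""
        return 1 if x > 0 else 0

    def xor_net(x1, x2):
        h1 = step(x1 + x2 - 0.5)    # hidden unit 1 computes "x1 OR x2"
        h2 = step(x1 + x2 - 1.5)    # hidden unit 2 computes "x1 AND x2"
        return step(h1 - h2 - 0.5)  # output: "OR but not AND" = XOR

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))  # prints the XOR truth table

Backpropagation exists precisely so we never have to pick weights like these by hand.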

SLIDE 4

Backpropagation networks

  • Backpropagation can be applied to any directed acyclic neural network.
  • Activation functions must be differentiable.
  • Activation functions should be non-linear.
  • Layered networks allow training to be parallelized within each layer.

[Figure: an example network diagram with numeric edge weights, next to plots of candidate activation functions labeled “OK” / “not OK”]
SLIDE 5

Sigmoid activation functions

  • We want something like a threshold.

○ Neuron is inactive below the threshold; active above it.

  • We need something differentiable.

○ Required for gradient descent.
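The slide names the sigmoid but not its formula; the standard logistic sigmoid satisfies both bullets. A minimal sketch (function names are mine):

    import math

    def sigmoid(x):
        """Logistic sigmoid: a smooth, differentiable 'soft threshold'."""
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_prime(x):
        """Derivative of the sigmoid, expressible via the sigmoid itself."""
        s = sigmoid(x)
        return s * (1.0 - s)

    print(sigmoid(0.0))        # 0.5, the midpoint of the soft threshold
    print(sigmoid_prime(0.0))  # 0.25, the maximum slope

The identity $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$ is what makes the gradient computations on the next slides cheap: once the forward pass has computed $\sigma(x)$, the derivative costs one extra multiplication.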

SLIDE 6

Gradient descent

  • Define the squared error at each output node as $E = \frac{1}{2}(t - o)^2$, where $t$ is the target value and $o$ is the node’s output.
  • Update weights to reduce error.

○ Take a step in the direction of steepest descent: $w \leftarrow w - \eta\,\partial E / \partial w$, where $\eta$ is the learning rate and $\partial E / \partial w$ is the derivative of the error with respect to the weight.

[Figure: error surface over two weights w0, w1, with a steepest-descent step marked]
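Numerically, the step is one multiply-and-subtract per weight. A toy sketch for a single linear node $o = w \cdot x$ (all values are mine, chosen for illustration):

    # Repeated gradient descent steps on E = 0.5 * (t - o)^2 with o = w * x.
    x, t = 2.0, 1.0    # input and target
    w, eta = 0.2, 0.1  # initial weight and learning rate

    for i in range(5):
        o = w * x             # node output
        dE_dw = -(t - o) * x  # derivative of error w.r.t. the weight
        w -= eta * dE_dw      # step in the direction of steepest descent
        print(f"step {i}: w = {w:.4f}, error = {0.5 * (t - w * x) ** 2:.5f}")

The error shrinks on every step as $w$ approaches $t / x = 0.5$.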

SLIDE 7

Computing the error gradient

… algebra ensues …
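The algebra the slide elides is a chain-rule computation. For a sigmoid output node with net input $\text{net} = \sum_i w_i x_i$ and output $o = \sigma(\text{net})$, the standard derivation (my reconstruction, not copied from the slide) runs:

    \frac{\partial E}{\partial w_i}
      = \frac{\partial}{\partial w_i}\,\tfrac{1}{2}(t - o)^2
      = -(t - o)\,\frac{\partial o}{\partial w_i}
      = -(t - o)\,\sigma'(\text{net})\,x_i
      = -(t - o)\,o\,(1 - o)\,x_i

Substituting into the update rule from Slide 6 gives $w_i \leftarrow w_i + \eta\,(t - o)\,o\,(1 - o)\,x_i$.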

SLIDE 8

Gradient descent step for output nodes

[Figure: worked numeric example of one gradient descent update on an output node’s incoming weights, shown before and after the step]
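In code, the same update for a sigmoid output node looks like this (a sketch; the numbers are mine, not recovered from the slide’s diagram):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    inputs  = [1.0, 2.0]  # illustrative input activations
    weights = [1.0, 1.2]  # incoming weights before the step
    t, eta  = 0.5, 0.1    # target value and learning rate

    net = sum(w * x for w, x in zip(weights, inputs))
    o = sigmoid(net)                 # forward pass: node output
    delta = (t - o) * o * (1.0 - o)  # error term from the derivation above
    weights = [w + eta * delta * x for w, x in zip(weights, inputs)]

    print(f"output {o:.3f}, delta {delta:.4f}, new weights {weights}")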

SLIDE 9

Backpropagation

Key idea: at hidden units there is no target value, so instead of the error function we use the change already computed at the next layer (the successors’ error terms).

  • Determine the node’s contribution to its successors.
  • Update incoming weights using this “error”; see the formula below.
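In symbols (my notation, consistent with the Slide 7 derivation): for a hidden node $j$ with output $o_j$ and successor nodes $k$,

    \delta_j = o_j\,(1 - o_j)\,\sum_k w_{jk}\,\delta_k,
    \qquad
    w_{ij} \leftarrow w_{ij} + \eta\,\delta_j\,o_i

where $\delta_k$ is each successor’s error term and $w_{jk}$ the weight from $j$ to $k$; at output nodes, $\delta_k = (t_k - o_k)\,o_k\,(1 - o_k)$. This is what lets the algorithm on the next slide sweep layer by layer from the output back to the input.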

SLIDE 10

Backpropagation algorithm

for run in 1:training_runs:
    for example in training_data:
        run example through network
        compute error for each output node
        for each layer (starting from output):
            for each node in layer:
                gradient descent update on incoming weights
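A minimal runnable version of this loop for one hidden layer (a sketch under my own conventions; the slides fix neither the network shape nor a data set, and an unlucky random initialisation can occasionally stall in a poor local minimum):

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def forward(x, W1, W2):
        # Run the example through the network; each weight row carries
        # a trailing bias weight, paired with a constant input of 1.0.
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in W2]
        return h, o

    def train(data, n_in, n_hidden, n_out, eta=0.5, runs=10000):
        W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        W2 = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        for _ in range(runs):
            for x, t in data:
                h, o = forward(x, W1, W2)
                # Error terms: output layer first, then backpropagate.
                d_out = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
                d_hid = [hj * (1 - hj) * sum(W2[k][j] * d_out[k] for k in range(n_out))
                         for j, hj in enumerate(h)]
                # Gradient descent update on incoming weights, layer by layer.
                for k in range(n_out):
                    for j, v in enumerate(h + [1.0]):
                        W2[k][j] += eta * d_out[k] * v
                for j in range(n_hidden):
                    for i, v in enumerate(x + [1.0]):
                        W1[j][i] += eta * d_hid[j] * v
        return W1, W2

    # Learn XOR, the boolean function from Slide 3.
    xor = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
    W1, W2 = train(xor, n_in=2, n_hidden=3, n_out=1)
    for x, t in xor:
        _, o = forward(x, W1, W2)
        print(x, "->", round(o[0], 2))  # outputs approach the 0/1 targets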

SLIDE 11

Exercise: run a backprop step on this network

[Figure: the weighted network from Slide 4 (edge weights 0.5, 0.2, 0.8, 1.2, 3.0, 0.1, 1.5, 2.7, 0.3, 1.6), annotated with inputs 2 and 1 and target outputs t = 0.1 and t = 0.8]