Neural Net Backpropagation
3/20/17
Recall: Limitations of Perceptrons
AND and OR are linearly separable. XOR isn’t
What is the output of the network?

f(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases} \qquad f(x) = \frac{1}{1 + e^{-x}} \qquad f(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases}
Two reasons the perceptron algorithm won't work: the step activation has no useful derivative, and we have no target values for hidden nodes (what error do we assign to those nodes?). Key idea: stochastic gradient descent (SGD), using the derivative of the error with respect to each weight.
Differentiable activations: sigmoid, tanh, ReLU.

\sigma(x) = \frac{1}{1 + e^{-x}} \qquad \frac{d\sigma(x)}{dx} = \sigma(x)(1 - \sigma(x))

\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}} \qquad \frac{d\tanh(x)}{dx} = 1 - \tanh^2(x)

\mathrm{ReLU}(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases} \qquad \frac{d\,\mathrm{ReLU}(x)}{dx} = \begin{cases} 0 & x \le 0 \\ 1 & x > 0 \end{cases}
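As a concrete reference, here is a minimal sketch of these three activations and their derivatives (a NumPy implementation; the function names are mine, not the slides'):

import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    # d sigma / dx = sigma(x) (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # d tanh / dx = 1 - tanh^2(x); np.tanh computes (1 - e^{-2x}) / (1 + e^{-2x})
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    # ReLU(x) = 0 for x < 0, x for x >= 0
    return np.maximum(0.0, x)

def d_relu(x):
    # d ReLU / dx = 0 for x <= 0, 1 for x > 0
    return np.where(x > 0, 1.0, 0.0)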
Define the error as the squared difference between a node's output and the target:

E(\vec{w}, \vec{x}) = (t - o)^2

where, for a sigmoid unit,

o = \frac{1}{1 + e^{-\vec{w} \cdot \vec{x}}} \qquad \vec{w} \cdot \vec{x} = \sum_i w_i x_i

We want \frac{\partial E}{\partial w_i} for each weight.

… algebra ensues …

\frac{\partial E}{\partial w_i} = -o(1 - o)(t - o)x_i

(The constant factor of 2 from differentiating the square is dropped; it can be absorbed into the learning rate.)
Gradient descent update, with learning rate \alpha:

w_i += -\alpha \frac{\partial E}{\partial w_i} \quad\Longrightarrow\quad w_i += \alpha\, o(1 - o)(t - o)x_i
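A one-function sketch of this rule for a single sigmoid unit (assuming NumPy; the name sgd_step is mine):

import numpy as np

def sgd_step(w, x, t, alpha):
    # o = sigma(w . x)
    o = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    # w_i += alpha * o (1 - o) (t - o) x_i, for every weight at once
    return w + alpha * o * (1.0 - o) * (t - o) * x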
Example (sigmoid unit, \alpha = .5, output o = .7, target t = .9, inputs x_0 = 2 and x_1 = 1.2):

w_0 += .5 \cdot .7(1 - .7)(.9 - .7) \cdot 2 \;\rightarrow\; w_0 = 1.04

w_1 += .5 \cdot .7(1 - .7)(.9 - .7) \cdot 1.2 \;\rightarrow\; w_1 = -.97
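Checking the arithmetic in plain Python (the starting weights w0 = 1 and w1 = -1 are my inference from the results shown, not stated on the slide):

alpha, o, t = 0.5, 0.7, 0.9
x = [2.0, 1.2]        # inputs x_0, x_1 from the example
w = [1.0, -1.0]       # assumed starting weights
for i in range(2):
    w[i] += alpha * o * (1 - o) * (t - o) * x[i]
print(w)              # [1.042, -0.9748], matching ~1.04 and ~-.97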
To train multi-layer networks, we must propagate the error back through the previous layers.

Let \delta_k be the error we computed for output node k. The error for hidden node h comes from the sum of its contributions to the errors of each output node.
For a sigmoid output node k:

\delta_k = o_k(1 - o_k)(t_k - o_k)

For a hidden node h, summing over the nodes k it feeds into:

\delta_h = o_h(1 - o_h) \sum_{k \in \text{next layer}} w_{hk}\delta_k

and each incoming weight is updated as before: w_i += \alpha\,\delta_h x_i.
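A small sketch of these two delta formulas (pure Python; storing weights[h][k] as the weight from hidden node h to next-layer node k is my layout choice):

def output_deltas(outputs, targets):
    # delta_k = o_k (1 - o_k) (t_k - o_k)
    return [o * (1 - o) * (t - o) for o, t in zip(outputs, targets)]

def hidden_deltas(hidden_outputs, weights, next_deltas):
    # delta_h = o_h (1 - o_h) * sum over k of w_hk * delta_k
    return [o * (1 - o) * sum(w * d for w, d in zip(w_row, next_deltas))
            for o, w_row in zip(hidden_outputs, weights)]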
for 1:training runs:
    for example in shuffled training data:
        run example through network
        compute error for each output node
        for each layer (starting from output):
            for each node in layer:
                gradient descent update on incoming weights
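As one concrete instance of this loop, here is a self-contained NumPy sketch that trains a 2-2-1 sigmoid network on XOR; the architecture, bias terms, initialization, learning rate, and epoch count are my illustrative choices, not the slides':

import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])                # XOR targets

W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)     # input -> hidden
W2, b2 = rng.normal(size=2), 0.0                  # hidden -> output
alpha = 0.5

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

for run in range(10000):                          # training runs
    for i in rng.permutation(len(X)):             # shuffled training data
        x, t = X[i], T[i]
        h = sigma(W1 @ x + b1)                    # run example through network
        o = sigma(W2 @ h + b2)
        delta_o = o * (1 - o) * (t - o)           # output-node delta
        delta_h = h * (1 - h) * W2 * delta_o      # hidden-node deltas
        W2 += alpha * delta_o * h                 # gradient descent updates
        b2 += alpha * delta_o
        W1 += alpha * np.outer(delta_h, x)
        b1 += alpha * delta_h

print([round(float(sigma(W2 @ sigma(W1 @ x + b1) + b2)), 2) for x in X])
# should typically approach [0, 1, 1, 0]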
Key formulas:

\sigma(x) = \frac{1}{1 + e^{-x}}

w_i += \alpha\, o(1 - o)(t - o)x_i \qquad \delta_h = o_h(1 - o_h) \sum_{k \in \text{next layer}} w_{hk}\delta_k