Learning From Data Lecture 21 Neural Networks: Backpropagation
Forward propagation: algorithmic computation of h(x)
Backpropagation: algorithmic computation of ∂e(x)/∂weights
- M. Magdon-Ismail
CSCI 4100/6100
recap: The Neural Network
[Diagram: a feedforward network with input layer ℓ = 0, hidden layers 0 < ℓ < L, and output layer ℓ = L producing h(x).]
Zooming into a hidden node
[Diagram: zooming into a node in hidden layer ℓ: the outputs x(ℓ−1) of layer (ℓ − 1) feed the node through the weights W(ℓ) to form its signal s(ℓ); the node applies θ to s(ℓ), its output becomes a component of x(ℓ), and x(ℓ) feeds layer (ℓ + 1) through W(ℓ+1).]
layers ℓ = 0, 1, 2, . . . , L
layer ℓ has “dimension” d(ℓ) ⟹ d(ℓ) + 1 nodes (including the bias node)
W(ℓ) = [ w(ℓ)_1  w(ℓ)_2  · · ·  w(ℓ)_{d(ℓ)} ]
W = {W(1), W(2), . . . , W(L)} ← specifies the network
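As a concrete illustration (not from the slides), W can be held as a list of matrices, W(ℓ) of shape (d(ℓ−1) + 1) × d(ℓ); the names here are illustrative:

```python
import numpy as np

# Illustrative sketch: store W = {W(1), ..., W(L)} as a list of arrays.
# W[l-1] holds W(l), of shape (d(l-1) + 1, d(l)): one row per node of
# layer l-1 (including the bias node), one column per node of layer l.
def init_weights(dims, rng=None):
    """dims = [d(0), d(1), ..., d(L)], e.g. [2, 10, 1]."""
    rng = rng or np.random.default_rng(0)
    return [rng.normal(scale=0.1, size=(dims[l - 1] + 1, dims[l]))
            for l in range(1, len(dims))]

W = init_weights([2, 10, 1])   # a 2-10-1 network
```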
Linear Signal
s(ℓ) = (W(ℓ))ᵗ x(ℓ−1), componentwise

s(ℓ)_j = (w(ℓ)_j)ᵗ x(ℓ−1),  j = 1, . . . , d(ℓ)   (recall the linear signal s = wᵗx)
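In code (a sketch using the weight list from the previous snippet; the input values are made up):

```python
import numpy as np

# The linear signal of layer 1 for an input x = (0.5, -0.3):
x0 = np.array([1.0, 0.5, -0.3])   # x(0) = [1; x], bias in component 0
s1 = W[0].T @ x0                  # s(1) = (W(1))^t x(0), shape (d(1),)
```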
Forward propagation
[Diagram: x = x(0) → W(1) → x(1) → W(2) → · · · → W(L) → x(L) = h(x).]
1: x(0) ← x
2: for ℓ = 1 to L do
3:   s(ℓ) ← (W(ℓ))ᵗ x(ℓ−1)
4:   x(ℓ) ← [1; θ(s(ℓ))]
5: end for
6: h(x) = x(L)
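A minimal sketch of this algorithm, assuming tanh units throughout (θ = tanh); the function name and the convention of returning all layers' outputs are illustrative:

```python
import numpy as np

def forward(W, x):
    """Forward propagation. W: list [W(1), ..., W(L)]; x: input sans bias.
    Returns xs = [x(0), ..., x(L)] (each with the bias 1 prepended) and
    ss = [s(1), ..., s(L)]."""
    xs, ss = [np.concatenate(([1.0], np.asarray(x, float)))], []
    for Wl in W:                       # l = 1, ..., L
        s = Wl.T @ xs[-1]              # s(l) = (W(l))^t x(l-1)
        ss.append(s)
        xs.append(np.concatenate(([1.0], np.tanh(s))))  # x(l) = [1; θ(s(l))]
    return xs, ss                      # h(x) = xs[-1][1:]
```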
Minimizing Ein
W = {W(1), W(2), . . . , W(L)} determines h; choose W to minimize the in-sample error Ein(W) = (1/N) Σₙ (h(xₙ) − yₙ)².
Gradient Descent
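The update itself is the usual gradient descent step; a sketch, with the step size η and the gradient routine (built in the slides ahead) as assumptions:

```python
# One batch gradient descent step: W(l) ← W(l) − η G(l), where
# G = [G(1), ..., G(L)] is the gradient of Ein (computed below via
# backpropagation) and eta is an illustrative step size.
def gd_step(W, G, eta=0.1):
    return [Wl - eta * Gl for Wl, Gl in zip(W, G)]
```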
Gradient of Ein
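With the squared error e(x) = (h(x) − y)² used throughout, Ein(W) = (1/N) Σₙ e(xₙ), so the gradient decomposes point by point:

∂Ein/∂W(ℓ) = (1/N) Σₙ ∂e(xₙ)/∂W(ℓ).

It therefore suffices to compute ∂e(x)/∂W(ℓ) for a single data point.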
Numerical approach
Perturb one weight at a time and estimate each partial derivative numerically: approximate, and inefficient (extra forward propagations for every single weight).
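A sketch of the finite-difference estimate; Ein_fn is an assumed callable that evaluates Ein at a given weight set:

```python
# Central finite difference for one partial derivative of Ein with
# respect to the weight w(l)_{ij}. Accurate only to O(eps^2), and it
# costs two full evaluations of Ein per weight -- approximate and
# inefficient, but handy as a sanity check on backpropagation.
def numerical_partial(Ein_fn, W, l, i, j, eps=1e-6):
    Wp = [Wl.copy() for Wl in W]; Wp[l][i, j] += eps
    Wm = [Wl.copy() for Wl in W]; Wm[l][i, j] -= eps
    return (Ein_fn(Wp) - Ein_fn(Wm)) / (2 * eps)
```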
Algorithmic approach
Compute the gradient exactly and efficiently via the chain rule.
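Define δ(ℓ) = ∂e/∂s(ℓ). Since s(ℓ)_j = Σᵢ w(ℓ)_ij x(ℓ−1)_i, the chain rule gives

∂e/∂w(ℓ)_ij = (∂e/∂s(ℓ)_j) · (∂s(ℓ)_j/∂w(ℓ)_ij) = δ(ℓ)_j · x(ℓ−1)_i,

i.e. ∂e/∂W(ℓ) = x(ℓ−1) (δ(ℓ))ᵗ. The next slide shows how to get every δ(ℓ) in one backward pass.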
δ(ℓ) and the chain rule
[Diagram: δ(ℓ) is assembled in layer ℓ from layer (ℓ + 1): multiply δ(ℓ+1) by W(ℓ+1), then componentwise by θ′(s(ℓ)).]

δ(ℓ) = θ′(s(ℓ)) ⊗ [W(ℓ+1) δ(ℓ+1)]_1^{d(ℓ)}

(⊗ is componentwise multiplication; the 0th (bias) component of W(ℓ+1) δ(ℓ+1) is not used)
The Backpropagation algorithm
1: δ(L) ← 2(x(L) − y) · θ′(s(L))
2: for ℓ = L − 1 down to 1 do
3:   compute θ′(s(ℓ)) componentwise
4:   δ(ℓ) ← θ′(s(ℓ)) ⊗ [W(ℓ+1) δ(ℓ+1)]_1^{d(ℓ)}
5: end for
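A sketch matching this algorithm, again assuming tanh units (so θ′(s) = 1 − θ(s)², i.e. 1 − x(ℓ) ⊗ x(ℓ) in terms of the forward-pass outputs), built on the earlier forward() snippet:

```python
import numpy as np

def backprop(W, xs, y):
    """Compute [δ(1), ..., δ(L)] from the forward-pass outputs xs."""
    L = len(W)
    deltas = [None] * (L + 1)
    xL = xs[-1][1:]                              # x(L) = θ(s(L))
    deltas[L] = 2 * (xL - y) * (1 - xL ** 2)     # δ(L) = 2(x(L)−y)·θ'(s(L))
    for l in range(L - 1, 0, -1):                # l = L−1 down to 1
        theta_prime = 1 - xs[l][1:] ** 2         # θ'(s(l)) for tanh
        # drop the 0th (bias) component of W(l+1) δ(l+1)
        deltas[l] = theta_prime * (W[l] @ deltas[l + 1])[1:]
    return deltas[1:]
```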
Gradient Descent on Ein
1: Initialize: Ein = 0; for ℓ = 1, . . . , L, G(ℓ) = 0 · W(ℓ).
2: for each data point xₙ (n = 1, . . . , N) do
3:   forward propagate to compute x(0), . . . , x(L)
4:   backpropagate to compute δ(1), . . . , δ(L)
5:   Ein ← Ein + (1/N)(x(L) − yₙ)²
6:   for ℓ = 1, . . . , L do
7:     G(ℓ)(xₙ) = x(ℓ−1)(δ(ℓ))ᵗ
8:     G(ℓ) ← G(ℓ) + (1/N) G(ℓ)(xₙ)
9:   end for
10: end for
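The same loop as a sketch, built on the earlier forward() and backprop() snippets (X is an N × d(0) array, y an N-vector):

```python
import numpy as np

def ein_and_gradient(W, X, y):
    """Return Ein(W) and G = [∂Ein/∂W(1), ..., ∂Ein/∂W(L)]."""
    N = len(X)
    Ein = 0.0
    G = [np.zeros_like(Wl) for Wl in W]            # G(l) = 0 · W(l)
    for xn, yn in zip(X, y):
        xs, _ = forward(W, xn)                     # forward propagation
        deltas = backprop(W, xs, yn)               # backpropagation
        Ein += float(np.sum((xs[-1][1:] - yn) ** 2)) / N
        for l in range(len(W)):
            # G(ℓ) += (1/N) x(ℓ−1)(δ(ℓ))ᵗ, with ℓ = l + 1
            G[l] += np.outer(xs[l], deltas[l]) / N
    return Ein, G
```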
Digits Data
[Plot: log10(error) vs. log10(iteration), comparing SGD with (batch) gradient descent on the digits data.]
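For reference, the SGD variant in the plot updates after every data point rather than after a full pass; a sketch built on the earlier snippets (η is illustrative):

```python
import numpy as np

def sgd_epoch(W, X, y, eta=0.01, rng=None):
    """One SGD pass: W(l) ← W(l) − η x(l−1)(δ(l))^t after each point."""
    rng = rng or np.random.default_rng(0)
    for n in rng.permutation(len(X)):
        xs, _ = forward(W, X[n])
        deltas = backprop(W, xs, y[n])
        W = [Wl - eta * np.outer(xs[l], deltas[l])
             for l, Wl in enumerate(W)]
    return W
```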