Learning From Data Lecture 21 Neural Networks: Backpropagation



SLIDE 1

Learning From Data Lecture 21 Neural Networks: Backpropagation

Forward propagation: algorithmic computation of h(x)
Backpropagation: algorithmic computation of ∂e(x)/∂weights

M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

recap: The Neural Network

Biology → Engineering

[Figure: a multilayer network with input layer ℓ = 0, hidden layers 0 < ℓ < L, and output layer ℓ = L. Each node applies the transformation θ(s) to its incoming signal s; the network maps the inputs x1, . . . , xd to the output h(x).]

SLIDE 3

Zooming into a Hidden Node

[Figure: zooming in on one hidden node in layer ℓ. The outputs x(ℓ−1) of layer (ℓ − 1) are weighted by W(ℓ) and summed into the signal s(ℓ); the node applies θ, and its output feeds forward through W(ℓ+1) into layer (ℓ + 1).]

layer ℓ parameters:
  signals in    s(ℓ)      d(ℓ)-dimensional input vector
  outputs       x(ℓ)      (d(ℓ) + 1)-dimensional output vector
  weights in    W(ℓ)      (d(ℓ−1) + 1) × d(ℓ)-dimensional matrix
  weights out   W(ℓ+1)    (d(ℓ) + 1) × d(ℓ+1)-dimensional matrix

layers ℓ = 0, 1, 2, . . . , L; layer ℓ has “dimension” d(ℓ) ⇒ d(ℓ) + 1 nodes

W(ℓ) = [ w_1(ℓ)  w_2(ℓ)  · · ·  w_{d(ℓ)}(ℓ) ]   (column j is the weight vector into node j of layer ℓ)

W = {W(1), W(2), . . . , W(L)} ← specifies the network


SLIDE 4

The Linear Signal

The input signal s(ℓ) is a linear combination (using the weights) of the outputs x(ℓ−1) of the previous layer:

s(ℓ) = (W(ℓ))ᵀ x(ℓ−1)


Component by component, the rows of (W(ℓ))ᵀ are the (w_j(ℓ))ᵀ, so

s_j(ℓ) = (w_j(ℓ))ᵀ x(ℓ−1),   j = 1, . . . , d(ℓ)   (recall the linear signal s = wᵀx)

s(ℓ) −θ→ x(ℓ)


SLIDE 5

Forward Propagation: Computing h(x)

x = x(0) −W(1)→ s(1) −θ→ x(1) −W(2)→ s(2) −θ→ x(2) · · · −W(L)→ s(L) −θ→ x(L) = h(x)

Forward propagation to compute h(x):

1: x(0) ← x  [Initialization]
2: for ℓ = 1 to L do  [Forward Propagation]
3:   s(ℓ) ← (W(ℓ))ᵀ x(ℓ−1)
4:   x(ℓ) ← [1; θ(s(ℓ))]   (the bias 1 stacked on top of θ(s(ℓ)))
5: end for
6: h(x) = x(L)  [Output]
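This algorithm translates almost line for line into code. Below is a minimal Python/NumPy sketch under the slide's conventions (the name forward_prop and the list-based storage are my own choices, not from the lecture): Ws[ℓ−1] holds W(ℓ) with the dimensions from SLIDE 3, and a bias 1 is prepended to every x(ℓ).

import numpy as np

def forward_prop(x, Ws, theta=np.tanh):
    """Forward propagation: compute the signals s(l) and outputs x(l).

    Ws[l-1] is W(l), a (d(l-1)+1) x d(l) array whose row 0 multiplies the bias.
    Returns (ss, xs, h), where h = theta(s(L)) is the network output h(x).
    """
    xs = [np.concatenate(([1.0], np.asarray(x, dtype=float)))]  # x(0) <- [1; x]
    ss = [None]                                   # s(0) does not exist
    for W in Ws:                                  # for l = 1 to L
        s = W.T @ xs[-1]                          # s(l) = (W(l))^T x(l-1)
        ss.append(s)
        xs.append(np.concatenate(([1.0], theta(s))))  # x(l) = [1; theta(s(l))]
    return ss, xs, xs[-1][1:]                     # h(x): x(L) minus its bias component

# Tiny example: a 2 -> 3 -> 1 network with small random weights.
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.1, size=(3, 3)),   # W(1): (d(0)+1) x d(1)
      rng.normal(scale=0.1, size=(4, 1))]   # W(2): (d(1)+1) x d(2)
print(forward_prop([0.5, -1.0], Ws)[2])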


SLIDE 6

Minimizing Ein

Ein(h) = Ein(W) = (1/N) Σ_{n=1}^{N} (h(xn) − yn)²

W = {W(1), W(2), . . . , W(L)}

[Figure: Ein versus a weight w for the sign, linear, and tanh transformations.]

Using θ = tanh makes Ein differentiable, so we can use gradient descent to reach a local minimum.


SLIDE 7

Gradient Descent

W(t + 1) = W(t) − η∇Ein(W(t))
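As a sketch, one such update step in Python, assuming the weights are stored as a list of layer matrices and some grad_Ein(Ws) returns the matching list of gradient matrices (as computed on the later slides; both names are illustrative):

def gradient_descent_step(Ws, grad_Ein, eta=0.1):
    # W(t+1) = W(t) - eta * grad Ein(W(t)), applied to each layer matrix
    Gs = grad_Ein(Ws)
    return [W - eta * G for W, G in zip(Ws, Gs)]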


SLIDE 8

Gradient of Ein

Ein(w) = (1/N) Σ_{n=1}^{N} e(h(xn), yn),  writing en = e(h(xn), yn)

∂Ein(w)/∂W(ℓ) = (1/N) Σ_{n=1}^{N} ∂en/∂W(ℓ)

We need ∂e(x)/∂W(ℓ).


SLIDE 9

Numerical Approach

∂e(x)/∂W(ℓ)_ij ≈ [ e(x | W(ℓ)_ij + ∆) − e(x | W(ℓ)_ij − ∆) ] / (2∆)

This estimate is approximate, and inefficient: it needs two error evaluations for every single weight.
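A minimal Python/NumPy sketch of this finite-difference estimate (numerical_gradient and err are illustrative names; err(x, y, Ws) could be the squared error computed with the forward_prop sketch above). It is useful for checking backpropagation, but note the cost: two error evaluations, hence two forward propagations, per weight.

import numpy as np

def numerical_gradient(x, y, Ws, err, delta=1e-5):
    """Central differences: de(x)/dW(l)_ij ~ [e(W_ij + delta) - e(W_ij - delta)] / (2 delta)."""
    Gs = []
    for W in Ws:                      # one gradient matrix per layer
        G = np.zeros_like(W)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                w_ij = W[i, j]
                W[i, j] = w_ij + delta
                e_plus = err(x, y, Ws)
                W[i, j] = w_ij - delta
                e_minus = err(x, y, Ws)
                W[i, j] = w_ij        # restore the original weight
                G[i, j] = (e_plus - e_minus) / (2 * delta)
        Gs.append(G)
    return Gs

For example, with squared error one could pass err = lambda x, y, Ws: (forward_prop(x, Ws)[2][0] - y)**2.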


SLIDE 10

Algorithmic Approach

e(x) is a function of s(ℓ), and s(ℓ) = (W(ℓ))ᵀ x(ℓ−1), so by the chain rule

∂e/∂W(ℓ) = (∂s(ℓ)/∂W(ℓ)) · (∂e/∂s(ℓ))ᵀ = x(ℓ−1) (δ(ℓ))ᵀ

where the sensitivity is δ(ℓ) = ∂e/∂s(ℓ).
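In code, the per-point gradient is just an outer product; a quick NumPy shape check (the values are placeholders):

import numpy as np

x_prev = np.ones(4)              # x(l-1): (d(l-1)+1)-dimensional, bias included
delta_l = np.ones(3)             # delta(l) = de/ds(l): d(l)-dimensional
G = np.outer(x_prev, delta_l)    # de/dW(l) = x(l-1) (delta(l))^T
print(G.shape)                   # (4, 3): the same shape as W(l), (d(l-1)+1) x d(l)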


SLIDE 11

Computing δ(ℓ) Using the Chain Rule

δ(1) ← δ(2) ← · · · ← δ(L−1) ← δ(L)

Multiple applications of the chain rule:

∆s(ℓ) −θ→ ∆x(ℓ) −W(ℓ+1)→ ∆s(ℓ+1) −→ · · · −→ ∆e(x)

[Figure: δ(ℓ+1) in layer (ℓ + 1) propagates back through W(ℓ+1) and is multiplied componentwise by θ′(s(ℓ)) to produce δ(ℓ) in layer ℓ.]

δ(ℓ) = θ′(s(ℓ)) ⊗ [W(ℓ+1) δ(ℓ+1)]_{1, . . . , d(ℓ)}

Here ⊗ denotes componentwise multiplication, and the subscript keeps components 1, . . . , d(ℓ): the 0th (bias) component of W(ℓ+1)δ(ℓ+1) is not used.


SLIDE 12

The Backpropagation Algorithm

δ(1) ← δ(2) ← · · · ← δ(L−1) ← δ(L)

Backpropagation to compute the sensitivities δ(ℓ) (assume s(ℓ) and x(ℓ) have been computed for all ℓ):

1: δ(L) ← 2(x(L) − y) · θ′(s(L))  [Initialization]
2: for ℓ = L − 1 to 1 do  [Back-Propagation]
3:   compute (for tanh hidden nodes): θ′(s(ℓ)) = [1 − x(ℓ) ⊗ x(ℓ)]_{1, . . . , d(ℓ)}
4:   δ(ℓ) ← θ′(s(ℓ)) ⊗ [W(ℓ+1) δ(ℓ+1)]_{1, . . . , d(ℓ)}   (⊗ is componentwise multiplication)
5: end for
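A minimal Python/NumPy sketch of this loop, continuing the conventions of the forward_prop sketch above (back_prop is my name; tanh nodes and the squared error of SLIDE 13 are assumed):

import numpy as np

def back_prop(y, ss, xs, Ws):
    """Compute the sensitivities delta(l) = de/ds(l) for l = 1..L.

    ss, xs are the signals/outputs from forward propagation; Ws[l-1] is W(l).
    """
    L = len(Ws)
    deltas = [None] * (L + 1)
    out = xs[L][1:]                                 # theta(s(L)); for tanh, theta' = 1 - theta^2
    deltas[L] = 2 * (out - y) * (1 - out**2)        # step 1: delta(L)
    for l in range(L - 1, 0, -1):                   # steps 2-5: l = L-1 down to 1
        theta_prime = 1 - xs[l][1:]**2              # [1 - x(l) (x) x(l)], components 1..d(l)
        deltas[l] = theta_prime * (Ws[l] @ deltas[l + 1])[1:]   # drop bias component 0
    return deltas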


SLIDE 13

Algorithm for Gradient Descent on Ein

Algorithm to Compute Ein(w) and g = ∇Ein(w):
Input: weights w = {W(1), . . . , W(L)}; data D.
Output: error Ein(w) and gradient g = {G(1), . . . , G(L)}.

1: Initialize: Ein = 0; for ℓ = 1, . . . , L: G(ℓ) = 0 · W(ℓ).
2: for each data point xn (n = 1, . . . , N) do
3:   compute x(ℓ) for ℓ = 0, . . . , L  [forward propagation]
4:   compute δ(ℓ) for ℓ = 1, . . . , L  [backpropagation]
5:   Ein ← Ein + (1/N)(x_1(L) − yn)²
6:   for ℓ = 1, . . . , L do
7:     G(ℓ)(xn) = x(ℓ−1) (δ(ℓ))ᵀ
8:     G(ℓ) ← G(ℓ) + (1/N) G(ℓ)(xn)
9:   end for
10: end for

Can do batch version or sequential version (SGD).
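Putting the pieces together, a minimal sketch of this algorithm plus a batch gradient descent loop, built on the forward_prop and back_prop sketches above (ein_and_gradient and train are my names; the step size and iteration count are arbitrary illustrations):

import numpy as np

def ein_and_gradient(X, Y, Ws):
    """Return Ein(w) and the gradient matrices g = {G(1), ..., G(L)}."""
    N, L = len(X), len(Ws)
    Ein = 0.0
    Gs = [np.zeros_like(W) for W in Ws]              # step 1: G(l) = 0 * W(l)
    for xn, yn in zip(X, Y):                         # step 2: each data point
        ss, xs, h = forward_prop(xn, Ws)             # step 3: forward propagation
        deltas = back_prop(yn, ss, xs, Ws)           # step 4: backpropagation
        Ein += (h[0] - yn)**2 / N                    # step 5: Ein += (1/N)(x1(L) - yn)^2
        for l in range(1, L + 1):                    # steps 6-9
            Gs[l - 1] += np.outer(xs[l - 1], deltas[l]) / N   # G(l) += (1/N) x(l-1)(delta(l))^T
    return Ein, Gs

def train(X, Y, Ws, eta=0.1, iters=1000):            # batch version of gradient descent
    for _ in range(iters):
        Ein, Gs = ein_and_gradient(X, Y, Ws)
        Ws = [W - eta * G for W, G in zip(Ws, Gs)]   # W(t+1) = W(t) - eta * g
    return Ws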


SLIDE 14

Digits Data

[Figures: the digits data in the (average intensity, symmetry) feature plane, and log10(error) versus log10(iteration) for gradient descent and SGD.]
