Training Neural Nets
COMPSCI 527 — Computer Vision
COMPSCI 527 — Computer Vision Training Neural Nets 1 / 29
Outline
1. The Softmax Simplex
2. Loss and Risk
3. Back-Propagation
4. Stochastic Gradient Descent
5. Regularization
The Softmax Simplex
The softmax function maps a score vector $z$ to a probability vector:

$$ p = \operatorname{softmax}(z) \overset{\text{def}}{=} \frac{\exp(z)}{\mathbf{1}^T \exp(z)} \in \mathbb{R}^K $$
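In code, a numerically stable softmax subtracts $\max(z)$ before exponentiating; the shift cancels in the ratio but prevents overflow. A minimal NumPy sketch:

```python
import numpy as np

def softmax(z):
    """Compute p = exp(z) / (1^T exp(z)), shifted by max(z) for stability.

    The shift does not change the result, since exp(z - m) = exp(z) / exp(m)
    scales numerator and denominator by the same factor.
    """
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
# p is entrywise nonnegative and sums to 1, so it lies on the simplex
```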
The output of softmax lies on the probability simplex

$$ \Delta_K \overset{\text{def}}{=} \Big\{ p \in \mathbb{R}^K : p_i \geq 0, \ \sum_{i=1}^{K} p_i = 1 \Big\} $$

[Figure: the simplex for K = 2, the segment between (1, 0) and (0, 1) in the (p1, p2) plane with midpoint (1/2, 1/2), and for K = 3, the triangle with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1) in (p1, p2, p3) space with center (1/3, 1/3, 1/3).]
Loss and Risk
The training risk is the average loss over the training set:

$$ \text{risk}(w) = \frac{1}{N} \sum_{n=1}^{N} \ell_n(w) $$
For example, with $K = 5$ classes the network output is $p = \frac{\exp(z)}{\mathbf{1}^T \exp(z)} \in \mathbb{R}^5$.
The cross-entropy loss for a sample with true label $y$:

$$ \ell(y, p) = -\log p_y = -\log \frac{\exp(z_y)}{\mathbf{1}^T \exp(z)} = \log(\mathbf{1}^T \exp(z)) - z_y $$
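The last form of the loss, $\log(\mathbf{1}^T \exp(z)) - z_y$, is the log-sum-exp expression; it can be evaluated stably without forming $p$ explicitly. A NumPy sketch:

```python
import numpy as np

def cross_entropy(z, y):
    """Cross-entropy loss log(1^T exp(z)) - z_y via a shifted log-sum-exp.

    Shifting by m = max(z) keeps exp() from overflowing; adding m back
    restores the exact value of log-sum-exp.
    """
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m))) - z[y]

z = np.array([2.0, -1.0, 0.5])
loss = cross_entropy(z, y=0)  # small when z_y dominates the other scores
```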
Back-Propagation
Write the network as a chain of layers: $x^{(0)} = x_n$, $x^{(k)} = f^{(k)}(x^{(k-1)}, w^{(k)})$ for $k = 1, \dots, K$, with output $x^{(K)} = p$ and loss $\ell_n$. The chain rule gives, for each layer $k$:

$$ \frac{\partial \ell_n}{\partial w^{(k)}} = \frac{\partial \ell_n}{\partial x^{(k)}} \frac{\partial x^{(k)}}{\partial w^{(k)}} \qquad \frac{\partial \ell_n}{\partial x^{(k-1)}} = \frac{\partial \ell_n}{\partial x^{(k)}} \frac{\partial x^{(k)}}{\partial x^{(k-1)}} $$

The recursion starts at the output with

$$ \frac{\partial \ell_n}{\partial x^{(K)}} = \frac{\partial \ell}{\partial p} $$
The local Jacobians $\partial x^{(k)} / \partial w^{(k)}$ and $\partial x^{(k)} / \partial x^{(k-1)}$ depend only on layer $k$. Starting from $\partial \ell_n / \partial x^{(K)} = \partial \ell / \partial p$, we sweep backwards through the layers and stack the per-layer gradients $\partial \ell_n / \partial w^{(k)}$ into a single vector $\partial \ell_n / \partial w$.
For a three-layer network $x_n = x^{(0)} \to f^{(1)} \to x^{(1)} \to f^{(2)} \to x^{(2)} \to f^{(3)} \to x^{(3)} = p$, with parameters $w^{(1)}, w^{(2)}, w^{(3)}$, label $y$, and loss $\ell_n = \ell(y, p)$, the backward pass computes, in order:

$$ \frac{\partial \ell_n}{\partial x^{(3)}} = \frac{\partial \ell}{\partial p} $$

$$ \frac{\partial \ell_n}{\partial w^{(3)}} = \frac{\partial \ell_n}{\partial x^{(3)}} \frac{\partial x^{(3)}}{\partial w^{(3)}} \qquad \frac{\partial \ell_n}{\partial x^{(2)}} = \frac{\partial \ell_n}{\partial x^{(3)}} \frac{\partial x^{(3)}}{\partial x^{(2)}} $$

$$ \frac{\partial \ell_n}{\partial w^{(2)}} = \frac{\partial \ell_n}{\partial x^{(2)}} \frac{\partial x^{(2)}}{\partial w^{(2)}} \qquad \frac{\partial \ell_n}{\partial x^{(1)}} = \frac{\partial \ell_n}{\partial x^{(2)}} \frac{\partial x^{(2)}}{\partial x^{(1)}} $$

$$ \frac{\partial \ell_n}{\partial w^{(1)}} = \frac{\partial \ell_n}{\partial x^{(1)}} \frac{\partial x^{(1)}}{\partial w^{(1)}} \qquad \frac{\partial \ell_n}{\partial x^{(0)}} = \frac{\partial \ell_n}{\partial x^{(1)}} \frac{\partial x^{(1)}}{\partial x^{(0)}} $$

Stacking the per-layer results gives the full gradient:

$$ \frac{\partial \ell_n}{\partial w} = \begin{bmatrix} \partial \ell_n / \partial w^{(1)} \\ \partial \ell_n / \partial w^{(2)} \\ \partial \ell_n / \partial w^{(3)} \end{bmatrix} $$
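The backward recursion can be sketched numerically. In the NumPy sketch below, the two-layer tanh/softmax network, its shapes, and the variable names are illustrative assumptions, not the slides' exact example; the point is the right-to-left application of the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network: x1 = tanh(W1 x0), p = softmax(W2 x1),
# loss = -log p_y (cross-entropy).
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x0, y = rng.normal(size=3), 1

def forward(W1, W2, x0):
    x1 = np.tanh(W1 @ x0)
    z = W2 @ x1
    p = np.exp(z - z.max()); p /= p.sum()
    return x1, p

x1, p = forward(W1, W2, x0)

# Backward pass: chain rule, applied right to left.
dz = p.copy(); dz[y] -= 1.0   # dl/dz = p - e_y for softmax + cross-entropy
dW2 = np.outer(dz, x1)        # dl/dW2 = dl/dz * dz/dW2
dx1 = W2.T @ dz               # dl/dx1 = dl/dz * dz/dx1
da = dx1 * (1 - x1 ** 2)      # through tanh: d tanh(a)/da = 1 - tanh(a)^2
dW1 = np.outer(da, x0)        # dl/dW1
```

A finite-difference check on any single weight confirms the analytic gradient.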
Back-Propagation
∂x = V (easy!)
∂w: What is ∂z ∂V ? Three subscripts: ∂zi ∂vjk .
∂w is a 2 × 8 matrix
COMPSCI 527 — Computer Vision Training Neural Nets 13 / 29
Back-Propagation
∂z ∂w =
∂w1 ∂z1 ∂w2 ∂z1 ∂w3 ∂z1 ∂w4 ∂z1 ∂w5 ∂z1 ∂w6 ∂z1 ∂w7 ∂z1 ∂w8 ∂z2 ∂w1 ∂z2 ∂w2 ∂z2 ∂w3 ∂z2 ∂w4 ∂z2 ∂w5 ∂z2 ∂w6 ∂z2 ∂w7 ∂z2 ∂w8
∂w =
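This block structure can be checked numerically. In the sketch below the specific $x$ and $V$ are arbitrary choices for illustration; since $z$ is linear in $w$, perturbing one entry of $w$ changes $z$ by exactly the corresponding column of the Jacobian.

```python
import numpy as np

# z = V x with V of shape (2, 4); w = vec(V) stacks the rows of V,
# so dz/dw is the 2 x 8 block matrix [[x^T, 0], [0, x^T]].
x = np.array([1.0, 2.0, 3.0, 4.0])
J = np.kron(np.eye(2), x)           # builds [[x^T, 0], [0, x^T]], shape (2, 8)

V = np.arange(8.0).reshape(2, 4)    # any V works: the map is linear in w
z = V @ x
```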
Stochastic Gradient Descent
[Figure: the risk as a function of the weights w.]
Gradient descent on the risk takes steps $w_{t+1} = w_t - \alpha \, \nabla \text{risk}(w_t)$, where

$$ \nabla \text{risk}(w) = \frac{1}{N} \sum_{n=1}^{N} \nabla \ell_n(w) $$

so the full step is the sum of the per-sample steps $-\frac{\alpha}{N} \nabla \ell_1(w_t), \dots, -\frac{\alpha}{N} \nabla \ell_N(w_t)$.
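As a minimal numerical illustration of the full-gradient step (a sketch under assumed values, not an example from the slides): on the one-dimensional quadratic risk $w^2/2$, whose gradient is $w$, each step multiplies the iterate by $(1 - \alpha)$, so the iterates contract geometrically toward the minimizer at 0.

```python
# Plain gradient descent on risk(w) = w^2 / 2, with gradient w.
# alpha and the starting point are illustrative choices.
alpha, w = 0.3, 5.0
for t in range(50):
    w = w - alpha * w   # w_{t+1} = w_t - alpha * grad risk(w_t) = (1 - alpha) w_t
```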
Stochastic gradient descent applies the per-sample steps $-\frac{\alpha}{N} \nabla \ell_1(w_t), \dots, -\frac{\alpha}{N} \nabla \ell_N(w_t)$ one at a time, updating $w$ after each, rather than summing them into a single step.
Each step moves along a single-sample gradient:

$$ w_{t+1} = w_t - \frac{\alpha}{N} \nabla \ell_n(w_t) $$

[Figure: an SGD trajectory of iterates $w_t$; individual steps are noisy, but they make progress downhill on average.]
[Figure: source: https://towardsdatascience.com/]
Mini-batch gradient descent is a compromise: split the training set into batches of size $B$ and average the gradient over batch $j$ before each update:

$$ w^{(j)} = w^{(j-1)} - \frac{\alpha}{B} \sum_{n=(j-1)B+1}^{jB} \nabla \ell_n(w^{(j-1)}) $$
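The mini-batch loop can be sketched as follows. Assumptions in this NumPy sketch: a toy noiseless least-squares risk, illustrative values of $\alpha$ and $B$, and reshuffling the data each epoch (a common practice, not something the slides specify).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X w_true exactly, so the risk minimizer is w_true.
N, B, alpha = 100, 10, 0.1
X = rng.normal(size=(N, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for epoch in range(200):
    perm = rng.permutation(N)                            # reshuffle each epoch
    for j in range(N // B):
        idx = perm[j * B:(j + 1) * B]                    # batch j of size B
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / B      # batch-average gradient
        w = w - alpha * grad                             # one mini-batch step
```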
Regularization
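One common regularizer is weight decay, which adds a penalty $\frac{\lambda}{2} \|w\|^2$ to the risk, shrinking the weights toward zero. The NumPy sketch below is an illustration of this idea under assumed values of $\lambda$ and $\alpha$ on a toy least-squares risk; it is not necessarily the regularizer these slides covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares data with a little noise; lambda and alpha are illustrative.
N, lam, alpha = 50, 1.0, 0.05
X = rng.normal(size=(N, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

# Gradient descent on risk(w) + (lam/2) ||w||^2: the penalty adds lam * w
# to the gradient, which decays the weights a little at every step.
w = np.zeros(3)
for t in range(2000):
    grad = X.T @ (X @ w - y) / N + lam * w
    w = w - alpha * grad

# For comparison: the unregularized least-squares solution.
w_plain = np.linalg.lstsq(X, y, rcond=None)[0]
```

The regularized solution has a strictly smaller norm than the unregularized one, which is the shrinkage effect of the penalty.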