Neural Networks II
Chen Gao Virginia Tech
Spring 2019
ECE-5424G / CS-5824
Neural Networks

Origins: algorithms that try to mimic the brain.

What is this? A single neuron in the brain: it receives inputs and produces an output.
Slide credit: Andrew Ng
An artificial neuron:

x = [x_0, x_1, x_2, x_3]^T,   θ = [θ_0, θ_1, θ_2, θ_3]^T

h_θ(x) = 1 / (1 + e^{-θ^T x})

x_0 is the "bias unit"; x_1, x_2, x_3 are the "input"; h_θ(x) is the "output"; θ are the "weights" ("parameters").
The bias b only changes the position of the hyperplane; it does not change its orientation.
Slide credit: Hugo Larochelle
The range of the activation is determined by g(·).

Sigmoid: activation between 0 and 1:
g(x) = 1 / (1 + e^{-x})
Tanh: activation between -1 and 1:
g(x) = tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
ReLU: sparse activities (exact zeros for negative pre-activations):
g(x) = relu(x) = max(0, x)

Softmax (for multi-way outputs; the activations sum to 1):
g(x) = softmax(x) = [ e^{x_1} / Σ_i e^{x_i}, …, e^{x_k} / Σ_i e^{x_i} ]
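A minimal NumPy sketch of these activation functions (the function names are mine, not from the slides):

```python
import numpy as np

def sigmoid(x):
    # activation between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # activation between -1 and 1
    return np.tanh(x)

def relu(x):
    # sparse activations: exactly zero for all negative inputs
    return np.maximum(0.0, x)

def softmax(x):
    # subtract the max for numerical stability; the outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()
```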
Universal approximation: "a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units" (Hornik, 1991).
A neural network: Layer 1 (input x_0, x_1, x_2, x_3), Layer 2 (hidden, a_0^{(2)}, …, a_3^{(2)}), Layer 3 (output h_Θ(x)).

Notation:
a_j^{(l)} = "activation" of unit j in layer l
Θ^{(l)} = matrix of weights controlling the function mapping from layer l to layer l + 1

a_1^{(2)} = g(Θ_{10}^{(1)} x_0 + Θ_{11}^{(1)} x_1 + Θ_{12}^{(1)} x_2 + Θ_{13}^{(1)} x_3)
a_2^{(2)} = g(Θ_{20}^{(1)} x_0 + Θ_{21}^{(1)} x_1 + Θ_{22}^{(1)} x_2 + Θ_{23}^{(1)} x_3)
a_3^{(2)} = g(Θ_{30}^{(1)} x_0 + Θ_{31}^{(1)} x_1 + Θ_{32}^{(1)} x_2 + Θ_{33}^{(1)} x_3)
h_Θ(x) = g(Θ_{10}^{(2)} a_0^{(2)} + Θ_{11}^{(2)} a_1^{(2)} + Θ_{12}^{(2)} a_2^{(2)} + Θ_{13}^{(2)} a_3^{(2)})

If a network has s_j units in layer j and s_{j+1} units in layer j + 1, then Θ^{(j)} has size s_{j+1} × (s_j + 1).
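The size rule s_{j+1} × (s_j + 1) can be sanity-checked in code; the layer sizes below are the slide's 3-input, 3-hidden-unit, 1-output network:

```python
import numpy as np

layer_sizes = [3, 3, 1]  # units per layer, excluding the bias units

# Theta(j) maps layer j to layer j+1 and has shape s_{j+1} x (s_j + 1);
# the +1 column multiplies the bias unit of layer j
thetas = [np.zeros((layer_sizes[j + 1], layer_sizes[j] + 1))
          for j in range(len(layer_sizes) - 1)]

for j, t in enumerate(thetas, start=1):
    print(f"Theta({j}) shape: {t.shape}")  # (3, 4) then (1, 4)
```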
Vectorized notation:

x = [x_0; x_1; x_2; x_3],   z^{(2)} = [z_1^{(2)}; z_2^{(2)}; z_3^{(2)}]

a_1^{(2)} = g(Θ_{10}^{(1)} x_0 + Θ_{11}^{(1)} x_1 + Θ_{12}^{(1)} x_2 + Θ_{13}^{(1)} x_3) = g(z_1^{(2)})
a_2^{(2)} = g(Θ_{20}^{(1)} x_0 + Θ_{21}^{(1)} x_1 + Θ_{22}^{(1)} x_2 + Θ_{23}^{(1)} x_3) = g(z_2^{(2)})
a_3^{(2)} = g(Θ_{30}^{(1)} x_0 + Θ_{31}^{(1)} x_1 + Θ_{32}^{(1)} x_2 + Θ_{33}^{(1)} x_3) = g(z_3^{(2)})
h_Θ(x) = g(Θ_{10}^{(2)} a_0^{(2)} + Θ_{11}^{(2)} a_1^{(2)} + Θ_{12}^{(2)} a_2^{(2)} + Θ_{13}^{(2)} a_3^{(2)}) = g(z^{(3)})

z is the "pre-activation".
Why do we need g(·)? Without the nonlinearity, a stack of layers collapses into a single linear (or logistic) model.
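One way to see this: with no g(·), two linear layers compose into one combined linear map. A quick NumPy check (random matrices; bias terms omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
theta1 = rng.standard_normal((3, 4))
theta2 = rng.standard_normal((1, 3))

# Without a nonlinearity, the two-layer map equals a single linear map
two_layer = theta2 @ (theta1 @ x)
one_layer = (theta2 @ theta1) @ x
print(np.allclose(two_layer, one_layer))  # True: depth adds nothing without g
```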
Vectorized forward propagation:

a_1^{(2)} = g(z_1^{(2)}),   a_2^{(2)} = g(z_2^{(2)}),   a_3^{(2)} = g(z_3^{(2)})

z^{(2)} = Θ^{(1)} x = Θ^{(1)} a^{(1)}
a^{(2)} = g(z^{(2)})   (add a_0^{(2)} = 1)
z^{(3)} = Θ^{(2)} a^{(2)}
h_Θ(x) = a^{(3)} = g(z^{(3)})

z^{(l)} is the "pre-activation" of layer l.

The forward pass is a chain: x = a^{(1)} → z^{(2)} → a^{(2)} → z^{(3)} → a^{(3)} = h_Θ(x).
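The vectorized forward pass can be sketched in NumPy (random weights for illustration; the network is the slide's 3-input, 3-hidden-unit, 1-output example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, theta1, theta2):
    a1 = np.concatenate(([1.0], x))            # a(1) = x, with bias unit x0 = 1
    z2 = theta1 @ a1                           # z(2) = Theta(1) a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # a(2) = g(z(2)), add a0(2) = 1
    z3 = theta2 @ a2                           # z(3) = Theta(2) a(2)
    return sigmoid(z3)                         # h(x) = a(3) = g(z(3))

rng = np.random.default_rng(0)
theta1 = rng.standard_normal((3, 4))  # s2 x (s1 + 1)
theta2 = rng.standard_normal((1, 4))  # s3 x (s2 + 1)
h = forward(rng.standard_normal(3), theta1, theta2)
```

Since the output unit is a sigmoid, h always lands strictly between 0 and 1.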
How do we evaluate a network? Compare the cost functions.

Logistic regression: J(θ) = -(1/m) Σ_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 - y^{(i)}) log(1 - h_θ(x^{(i)})) ] + (λ/2m) Σ_j θ_j^2

Neural network: the same cross-entropy cost, summed over all K output units and regularized over all weights Θ.
To run gradient descent, we need to compute: J(Θ) and the partial derivatives ∂J(Θ)/∂Θ_{ij}^{(l)}.
Given one training example (x, y), forward propagation:

a^{(1)} = x
z^{(2)} = Θ^{(1)} a^{(1)}
a^{(2)} = g(z^{(2)})   (add a_0^{(2)})
z^{(3)} = Θ^{(2)} a^{(2)}
a^{(3)} = g(z^{(3)})   (add a_0^{(3)})
z^{(4)} = Θ^{(3)} a^{(3)}
a^{(4)} = g(z^{(4)}) = h_Θ(x)
Intuition: δ_j^{(l)} = "error" of node j in layer l.

For each output unit (layer L = 4):
δ^{(4)} = a^{(4)} - y

For a hidden layer, apply the chain rule along the forward pass:
δ^{(3)} = ∂J/∂z^{(3)} = (∂J/∂a^{(4)}) (∂a^{(4)}/∂z^{(4)}) (∂z^{(4)}/∂a^{(3)}) (∂a^{(3)}/∂z^{(3)})
        = (Θ^{(3)})^T δ^{(4)} .* g'(z^{(3)})

Recall: z^{(3)} = Θ^{(2)} a^{(2)},  a^{(3)} = g(z^{(3)}),  z^{(4)} = Θ^{(3)} a^{(3)},  a^{(4)} = g(z^{(4)}).
Backpropagation algorithm:

Training set {(x^{(1)}, y^{(1)}), …, (x^{(m)}, y^{(m)})}
Set Δ^{(l)} = 0 for all l
For i = 1 to m:
  Set a^{(1)} = x^{(i)}
  Perform forward propagation to compute a^{(l)} for l = 2, …, L
  Use y^{(i)} to compute δ^{(L)} = a^{(L)} - y^{(i)}
  Compute δ^{(L-1)}, δ^{(L-2)}, …, δ^{(2)}
  Δ^{(l)} := Δ^{(l)} + δ^{(l+1)} (a^{(l)})^T
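A sketch of this loop in NumPy, assuming sigmoid units throughout (all names are mine; the network shape is the slides' 4-layer example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, thetas):
    """Accumulate the Delta(l) terms over a training set.
    thetas[l] maps layer l+1 to layer l+2, shape s_{l+2} x (s_{l+1} + 1)."""
    Delta = [np.zeros_like(t) for t in thetas]
    for x, y in zip(X, Y):
        # forward propagation, storing every layer's activation;
        # a bias unit a0 = 1 is prepended at every non-output layer
        a = [np.concatenate(([1.0], x))]          # a(1) = x, plus bias
        for l, theta in enumerate(thetas):
            out = sigmoid(theta @ a[-1])
            if l < len(thetas) - 1:
                out = np.concatenate(([1.0], out))
            a.append(out)
        delta = a[-1] - y                          # delta(L) = a(L) - y
        for l in range(len(thetas) - 1, -1, -1):
            Delta[l] += np.outer(delta, a[l])      # Delta(l) += delta(l+1) a(l)^T
            if l > 0:
                g_prime = a[l][1:] * (1.0 - a[l][1:])        # g'(z(l)) for sigmoid
                delta = (thetas[l].T @ delta)[1:] * g_prime  # drop the bias row
    return Delta

# the slides' 4-layer network: 3 inputs, two hidden layers of 3 units, 1 output
rng = np.random.default_rng(0)
thetas = [rng.standard_normal((3, 4)),
          rng.standard_normal((3, 4)),
          rng.standard_normal((1, 4))]
grads = backprop(rng.standard_normal((5, 3)), rng.random((5, 1)), thetas)
```

Each accumulated Δ^{(l)} has the same shape as Θ^{(l)}, so (after dividing by m and adding regularization) it can be used directly in a gradient-descent update.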
Activation functions and their derivatives:

Sigmoid: g(x) = 1 / (1 + e^{-x}),   g'(x) = g(x)(1 - g(x))
Tanh:    g(x) = tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}),   g'(x) = 1 - g(x)^2
ReLU:    g(x) = relu(x) = max(0, x),   g'(x) = 1_{x > 0}
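These derivative formulas can be verified numerically against a centered finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def numeric_grad(g, x, eps=1e-5):
    # centered finite difference: (g(x+eps) - g(x-eps)) / (2 eps)
    return (g(x + eps) - g(x - eps)) / (2 * eps)

x = 0.7
assert np.isclose(numeric_grad(sigmoid, x), sigmoid(x) * (1 - sigmoid(x)))
assert np.isclose(numeric_grad(np.tanh, x), 1 - np.tanh(x) ** 2)
relu = lambda v: np.maximum(0.0, v)
assert np.isclose(numeric_grad(relu, x), 1.0)  # x > 0, so g'(x) = 1
```

The same trick (gradient checking) is commonly used to validate a backpropagation implementation before trusting it.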
Training a neural network: first pick a network architecture (number of layers and number of hidden units per layer).

Early stopping: monitor the error on a held-out validation set during training and stop once it increases, even while training error keeps decreasing.
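A minimal sketch of the early-stopping rule, assuming the validation error is recorded once per epoch (the patience parameter is my addition, a common refinement):

```python
def early_stopping(val_errors, patience=3):
    """Return the epoch with the lowest validation error, scanning until the
    error has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, bad = err, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break  # validation error is increasing: stop training
    return best_epoch

# toy validation curve: decreases, then overfitting sets in and it rises
curve = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.61, 0.7]
print(early_stopping(curve))  # prints 3: epoch 3 (error 0.5) was the minimum
```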