Deep Learning
Srihari
Numerical Computation
Sargur N. Srihari srihari@cedar.buffalo.edu
This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/CSE676
Topics
– Overflow and Underflow
Acknowledgements: Goodfellow, Bengio, Courville, Deep Learning, MIT Press, 2016
– log 0 is −∞ (which becomes not-a-number, NaN, when used in further arithmetic operations)
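A short NumPy sketch of this failure mode may help. The variable names are illustrative and not from the slides: a value that underflows to zero makes the logarithm return −∞, and arithmetic on that result then produces NaN.

```python
import numpy as np

# exp(-1000) is far below the smallest positive float64, so it underflows to 0.0.
tiny = np.exp(-1000.0)

# Silence the divide-by-zero / invalid-value warnings NumPy would otherwise emit.
with np.errstate(divide="ignore", invalid="ignore"):
    log_tiny = np.log(tiny)            # log 0 evaluates to -inf
    difference = log_tiny - log_tiny   # (-inf) - (-inf) is not a number

print(tiny, log_tiny, difference)
```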
$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$
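Evaluating the softmax definition directly can overflow when some x_i is large, and can produce 0/0 when every x_i is very negative. A common remedy, described in the textbook cited in the acknowledgements, is to subtract max_i x_i before exponentiating, which leaves the result unchanged analytically. The sketch below assumes a 1-D input; the function name is only illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1-D array."""
    x = np.asarray(x, dtype=float)
    z = x - np.max(x)      # largest exponent becomes exp(0) = 1, so no overflow
    e = np.exp(z)
    return e / np.sum(e)   # denominator is at least 1, so no division by zero

probs = softmax([1000.0, 1001.0, 1002.0])  # a naive exp(1000.0) would overflow
```

Because only differences between the inputs matter, `softmax([1, 2, 3])` and `softmax([1001, 1002, 1003])` agree.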
$\max_{i,j} \left| \frac{\lambda_i}{\lambda_j} \right|$
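As a rough illustration, the eigenvalue ratio above can be computed directly. The sketch takes the ratio over eigenvalue magnitudes, as in the textbook's definition of the condition number; the helper name and both example matrices are hypothetical.

```python
import numpy as np

def eig_condition_number(A):
    """Ratio of the largest to the smallest eigenvalue magnitude of A."""
    magnitudes = np.abs(np.linalg.eigvals(A))
    return magnitudes.max() / magnitudes.min()

well = np.array([[2.0, 0.0], [0.0, 1.0]])    # eigenvalues 2 and 1
poor = np.array([[1.0, 0.0], [0.0, 1e-8]])   # eigenvalues 1 and 1e-8

kappa_well = eig_condition_number(well)
kappa_poor = eig_condition_number(poor)
```

A large ratio, as for `poor`, signals that inverting the matrix amplifies small errors in the input.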
$f(x) = \frac{1}{2} \| Ax - b \|_2^2$
$\frac{\partial}{\partial x_i} f(x)$
$\nabla_x f(x)$
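One way to make the gradient concrete is to estimate each partial derivative numerically. The following is a minimal sketch using central differences; the step size `h` and the test function are assumptions chosen for illustration.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h                      # perturb only coordinate i
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

# Hypothetical test function with analytic gradient [2*x0 + 3*x1, 3*x0].
f = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]
g = numerical_gradient(f, [1.0, 2.0])
```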
$u^T \nabla_x f(x)$
$\min_{u,\, u^T u = 1} u^T \nabla_x f(x) = \min_{u,\, u^T u = 1} \| u \|_2 \, \| \nabla_x f(x) \|_2 \cos\theta$
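The minimisation over unit vectors can be checked numerically: since the directional derivative is the gradient norm times cos θ, the minimiser should be the unit vector pointing opposite to the gradient. The sketch below uses a made-up gradient and random unit vectors as a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.array([3.0, -4.0])                 # hypothetical gradient at some point x

# Candidate direction: the unit vector opposite to the gradient.
u_star = -g / np.linalg.norm(g)
best = u_star @ g                         # equals -||g||_2, i.e. cos(theta) = -1

# Compare against many random unit vectors u; u @ g = ||g||_2 * cos(theta).
U = rng.normal(size=(1000, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)
random_slopes = U @ g
```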
$x' = x - \epsilon \nabla_x f(x)$
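Repeating this update gives the basic gradient descent loop. The sketch below assumes a fixed learning rate and a fixed number of steps; the objective is a hypothetical quadratic chosen so the minimiser is known.

```python
import numpy as np

def gradient_descent(grad, x0, eps=0.1, steps=100):
    """Repeat x' = x - eps * grad(x) for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eps * grad(x)
    return x

# Hypothetical objective f(x) = (x0 - 1)^2 + (x1 + 2)^2, minimised at (1, -2).
grad_f = lambda x: 2.0 * (x - np.array([1.0, -2.0]))
x_min = gradient_descent(grad_f, [0.0, 0.0])
```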
$f(x - \epsilon \nabla_x f(x))$
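One simple way to choose ε is a line search: evaluate the expression above for several candidate values of ε and keep the one giving the smallest objective. The candidate list and the objective here are assumptions made for the example.

```python
import numpy as np

def line_search_step(f, grad, x, candidates=(1e-3, 1e-2, 1e-1, 1.0)):
    """Evaluate f(x - eps * grad(x)) for each candidate eps; keep the best."""
    g = grad(x)
    values = [f(x - eps * g) for eps in candidates]
    best = int(np.argmin(values))
    return candidates[best], x - candidates[best] * g

f = lambda x: float(np.sum(x ** 2))     # hypothetical objective
grad = lambda x: 2.0 * x
x_start = np.array([3.0, -1.0])
eps_best, x_new = line_search_step(f, grad, x_start)
```

For this objective the largest candidate overshoots the minimum, so an intermediate step size wins.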
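$f(x) = \frac{1}{2} \| Ax - b \|_2^2$
$\nabla_x f(x) = A^T A x - A^T b$
while $\| A^T A x - A^T b \|_2 > \delta$ do
  $x \leftarrow x - \epsilon \left( A^T A x - A^T b \right)$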
$E_D(w) = \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - w^T \phi(x_n) \right\}^2$
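The least-squares iteration above translates almost line for line into NumPy. This is a sketch under stated assumptions: the data, the step size, and the `max_iter` safety cap are not from the slides, and the result is compared with `np.linalg.lstsq` only as a sanity check.

```python
import numpy as np

def least_squares_gd(A, b, eps=0.01, delta=1e-8, max_iter=100_000):
    """Minimise 0.5 * ||Ax - b||^2 by gradient descent."""
    x = np.zeros(A.shape[1])
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(max_iter):           # safety cap, not part of the slide's loop
        g = AtA @ x - Atb               # gradient of the objective
        if np.linalg.norm(g) <= delta:  # stop once ||A^T A x - A^T b||_2 <= delta
            break
        x = x - eps * g
    return x

A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # hypothetical data
b = np.array([1.0, 2.0, 3.0])
x_gd = least_squares_gd(A, b, eps=0.1)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

The step size must be small relative to the largest eigenvalue of A^T A, otherwise the iteration diverges.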
$\nabla_x f(x)$
$\frac{\partial^2}{\partial x_i \partial x_j} f$
$\frac{\partial^2}{\partial x^2} f$
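[Figure: effect of curvature on a gradient descent step. Negative curvature: decrease is faster than predicted by the gradient. No curvature: the gradient predicts the decrease correctly. Positive curvature: decrease is slower than expected, and the function may actually increase.]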
$H(f)(x)_{i,j} = \frac{\partial^2}{\partial x_i \partial x_j} f(x)$
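The Hessian definition can be approximated entry by entry with finite differences. This is a minimal sketch, assuming a smooth function and a hand-picked step size `h`; the test function is hypothetical and chosen because its Hessian is easy to write down.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference estimate of H(f)(x)[i, j] = d^2 f / (dx_i dx_j)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

# Hypothetical test function with analytic Hessian [[2*x1, 2*x0], [2*x0, 6*x1]].
f = lambda x: x[0] ** 2 * x[1] + x[1] ** 3
H = numerical_hessian(f, [1.0, 2.0])
```

The estimate is symmetric, as expected wherever the second partial derivatives are continuous.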
– original value of f,
– expected improvement due to slope, and
– correction to be applied due to curvature
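$f(x) \approx f(x^{(0)}) + (x - x^{(0)})^T g + \frac{1}{2} (x - x^{(0)})^T H (x - x^{(0)})$
$f(x^{(0)} - \epsilon g) \approx f(x^{(0)}) - \epsilon g^T g + \frac{1}{2} \epsilon^2 g^T H g$
$\epsilon^* \approx \frac{g^T g}{g^T H g}$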
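The three-term expansion and the resulting step size can be verified on a quadratic, where the second-order Taylor series is exact. The matrix `H` and the point `x0` below are hypothetical, and the formula assumes g^T H g is positive.

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])    # hypothetical positive definite Hessian
x0 = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ H @ x              # quadratic, so the expansion is exact
g = H @ x0                                 # gradient at x0

eps_star = (g @ g) / (g @ H @ g)           # step size from the formula above
predicted = f(x0) - eps_star * (g @ g) + 0.5 * eps_star**2 * (g @ H @ g)
actual = f(x0 - eps_star * g)
```

Any other step size along the negative gradient gives a value of f at least as large as `actual`.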
$f(x) = x_1^2 - x_2^2$
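Taking the example here to be f(x) = x_1^2 − x_2^2, the saddle-point function used in the textbook cited in the acknowledgements (an assumption about what this slide shows), the second derivative test can be sketched with the eigenvalues of the Hessian. The function name is illustrative.

```python
import numpy as np

# Hessian of the assumed example f(x) = x_1^2 - x_2^2 (constant for a quadratic).
H = np.array([[2.0, 0.0], [0.0, -2.0]])

def classify_critical_point(H):
    """Second derivative test based on the signs of the Hessian eigenvalues."""
    eigvals = np.linalg.eigvalsh(H)        # eigvalsh: H is symmetric
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point"
    return "inconclusive"                  # at least one zero eigenvalue

kind = classify_critical_point(H)
```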
$f(x) \approx f(x^{(0)}) + (x - x^{(0)})^T \nabla_x f(x^{(0)}) + \frac{1}{2} (x - x^{(0)})^T H(f)(x^{(0)}) (x - x^{(0)})$
$x^* = x^{(0)} - H(f)(x^{(0)})^{-1} \nabla_x f(x^{(0)})$
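A single Newton update can be sketched as follows. Solving a linear system avoids forming the inverse Hessian explicitly. The quadratic used here is hypothetical; for a positive definite quadratic, one step lands exactly on the minimiser.

```python
import numpy as np

def newton_step(grad, hess, x0):
    """One Newton update: x* = x0 - H(x0)^{-1} grad(x0)."""
    # Solve H d = grad instead of forming the inverse explicitly.
    return x0 - np.linalg.solve(hess(x0), grad(x0))

# Hypothetical positive definite quadratic f(x) = 0.5 x^T Q x - c^T x.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, 1.0])
grad = lambda x: Q @ x - c
hess = lambda x: Q

x_star = newton_step(grad, hess, np.array([10.0, -7.0]))
```

For non-quadratic functions the update is applied repeatedly, and it is only appropriate near a minimum where the Hessian is positive definite.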
$| f(x) - f(y) | \le L \, \| x - y \|_2$
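As a quick empirical illustration of the Lipschitz bound, the sketch below samples random pairs of points and checks the inequality for sin, whose derivative is bounded by 1 in magnitude. The choice of function and sampling range is an assumption for the example; sampling can only fail to find a violation, it cannot prove the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin            # |d/dx sin(x)| <= 1, so sin is Lipschitz with L = 1
L = 1.0

x = rng.uniform(-10.0, 10.0, size=10_000)
y = rng.uniform(-10.0, 10.0, size=10_000)

# Check |f(x) - f(y)| <= L * |x - y| on every sampled pair
# (small tolerance added for floating-point rounding).
holds = np.abs(f(x) - f(y)) <= L * np.abs(x - y) + 1e-12
```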
$f(x) = \frac{1}{2} \| Ax - b \|_2^2$
$L(x, \lambda) = f(x) + \lambda \left( x^T x - 1 \right)$
$\min_x \max_{\lambda,\, \lambda \ge 0} L(x, \lambda)$
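The min-max problem above can be approached by alternating two steps, following the treatment in the textbook cited in the acknowledgements: for fixed λ the minimising x solves (A^T A + 2λI) x = A^T b, and λ is then increased while x^T x exceeds 1. The sketch below assumes A^T A is invertible at λ = 0; the data, learning rate, and iteration count are hypothetical.

```python
import numpy as np

def constrained_least_squares(A, b, lr=0.1, steps=5000):
    """Minimise 0.5 * ||Ax - b||^2 subject to x^T x <= 1."""
    n = A.shape[1]
    lam = 0.0
    for _ in range(steps):
        # Inner minimisation over x for fixed lam (stationary point of L).
        x = np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(n), A.T @ b)
        # Outer gradient ascent on lam; dL/dlam = x^T x - 1, keep lam >= 0.
        lam = max(0.0, lam + lr * (x @ x - 1.0))
    x = np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(n), A.T @ b)
    return x, lam

A = np.array([[1.0, 0.0], [0.0, 1.0]])     # hypothetical data
b = np.array([3.0, 4.0])                   # unconstrained solution has norm 5
x_c, lam_c = constrained_least_squares(A, b)
```

Here the unconstrained solution violates the constraint, so the multiplier grows until the solution is pulled back onto the unit sphere.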
$L(x, \lambda, \alpha) = f(x) + \sum_i \lambda_i g^{(i)}(x) + \sum_j \alpha_j h^{(j)}(x)$
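The generalized Lagrangian is a plain weighted sum, which a few lines of Python make explicit. The sketch follows the textbook convention that the g^(i) are equality constraints (g = 0) and the h^(j) are inequality constraints (h ≤ 0); the example problem and multiplier values are hypothetical.

```python
import numpy as np

def generalized_lagrangian(f, gs, hs, x, lam, alpha):
    """L(x, lam, alpha) = f(x) + sum_i lam_i * g_i(x) + sum_j alpha_j * h_j(x)."""
    equality = sum(l * g(x) for l, g in zip(lam, gs))       # g_i(x) = 0
    inequality = sum(a * h(x) for a, h in zip(alpha, hs))   # h_j(x) <= 0
    return f(x) + equality + inequality

# Hypothetical problem: minimise ||x||^2 subject to x_0 + x_1 = 1 and x_0 <= 0.25.
f = lambda x: float(x @ x)
gs = [lambda x: x[0] + x[1] - 1.0]
hs = [lambda x: x[0] - 0.25]

x = np.array([0.25, 0.75])                 # feasible, with both constraints at zero
value = generalized_lagrangian(f, gs, hs, x, lam=[2.0], alpha=[3.0])
```

At this point both constraint functions evaluate to zero, so the Lagrangian equals f(x) regardless of the multipliers.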