Lecture 12:
− Computational Graph
− Backpropagation
Aykut Erdem
March 2016, Hacettepe University
Administrative
− Assignment 2 is due March 20, 2016!
− Midterm exam on Thursday, March 24, 2016
− You are responsible for the material from the beginning till the end
− You can prepare and bring a full-page cheat sheet
− The next assignment is due April 7, 2016 − You will implement a 2-layer Neural Network
A deep network applies, at each layer, a linear mapping Wx followed by a nonlinear function, plus a loss to measure the quality of the estimate so far:

y_i = W_i x_i,  x_{i+1} = σ(y_i)

x_1 → x_2 → x_3 → x_4 → y, with loss l(y, y_i)

slide by Alex Smola
o_k(x) = g(w_k0 + Σ_{j=1}^{J} h_j(x) w_kj)
h_j(x) = f(v_j0 + Σ_{i=1}^{D} x_i v_ji)
(j indexing hidden units, k indexing the output units, D the number of inputs)

Common choices for the nonlinearities:
σ(z) = 1 / (1 + exp(−z)),  tanh(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z)),  ReLU(z) = max(0, z)

slide by Raquel Urtasun, Richard Zemel, Sanja Fidler
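A sketch of a forward pass through this one-hidden-layer network, with f = ReLU and g = σ. The sizes and weight values below are made-up toy numbers for illustration, not values from the slides:

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, V, v0, W, w0, f=relu, g=sigmoid):
    # hidden units: h_j(x) = f(v_j0 + sum_i x_i v_ji)
    h = [f(v0[j] + sum(x[i] * V[j][i] for i in range(len(x))))
         for j in range(len(V))]
    # outputs: o_k(x) = g(w_k0 + sum_j h_j(x) w_kj)
    return [g(w0[k] + sum(h[j] * W[k][j] for j in range(len(h))))
            for k in range(len(W))]

# toy network: D = 2 inputs, J = 2 hidden units, 1 output
out = forward(x=[1.0, -2.0],
              V=[[0.5, -0.5], [1.0, 1.0]], v0=[0.0, 0.0],
              W=[[1.0, 1.0]], w0=[0.0])
```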
[http://cs231n.github.io/neural-networks-1/]
[figure: class scores for example training images]
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Where we are:
− We defined a (linear) score function, and a loss function that quantifies our unhappiness with the scores across the training data.
− TODO: efficiently find the parameters that minimize the loss function (optimization).
There are also fancier update formulas (momentum, Adagrad, RMSProp, Adam, …)
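To make the update rules concrete, here is a sketch of the vanilla gradient descent update and the momentum variant mentioned above, on a made-up 1-D toy loss f(w) = w² (step sizes and names are illustrative, not from the slides):

```python
def grad(w):
    # gradient of the toy loss f(w) = w**2
    return 2.0 * w

step_size = 0.1

# vanilla gradient descent: step against the gradient
w_vanilla = 5.0
for _ in range(100):
    w_vanilla += -step_size * grad(w_vanilla)

# momentum update: keep a running velocity v that accumulates gradients
w_mom, v, mu = 5.0, 0.0, 0.9
for _ in range(100):
    v = mu * v - step_size * grad(w_mom)
    w_mom += v
```

Both runs drive w toward the minimum at 0; momentum overshoots and oscillates on this toy problem but still converges.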
(image credits to Alec Radford)
Computational graph for a linear classifier:

x (input image) ──┐
                  ├─ [*] → s (scores) → [hinge loss] ──┐
W (weights) ──────┤                                    ├─ [+] → L (loss)
                  └─ [R] (regularization) ─────────────┘
The same machinery scales to much larger computational graphs, e.g. mapping an input tape to a loss.
Simple example: f(x, y, z) = (x + y) z, e.g. x = −2, y = 5, z = −4.
Want: the gradients ∂f/∂x, ∂f/∂y, ∂f/∂z.

Forward pass: q = x + y = 3, then f = q z = −12.
Local gradients: ∂q/∂x = 1, ∂q/∂y = 1, ∂f/∂q = z, ∂f/∂z = q.
Chain rule: ∂f/∂x = (∂f/∂q)(∂q/∂x) = z = −4, ∂f/∂y = z = −4, and ∂f/∂z = q = 3.

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
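This whole forward/backward computation fits in a few lines of Python (variable names are illustrative):

```python
# forward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass: local gradients chained with the upstream gradient (df/df = 1)
dfdz = q             # df/dz = q = 3
dfdq = z             # df/dq = z = -4
dfdx = dfdq * 1.0    # dq/dx = 1, so df/dx = -4
dfdy = dfdq * 1.0    # dq/dy = 1, so df/dy = -4
```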
During the forward pass each gate stores its activations; during the backward pass each gate multiplies its "local gradient" by the gradient flowing in from above, and passes the result on to its inputs.

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Another example: f(w, x) = 1 / (1 + exp(−(w0 x0 + w1 x1 + w2))), with
w0 = 2, x0 = −1, w1 = −3, x1 = −2, w2 = −3.

Forward pass: w0 x0 + w1 x1 + w2 = 1, so f = σ(1) ≈ 0.73.
Backward pass, gate by gate, each step computing [local gradient] x [its gradient]:
− the *(−1) gate: (−1) * (−0.20) = 0.20
− the + gate: [1] x [0.2] = 0.2 (both inputs!)
− the * gate: x0: [2] x [0.2] = 0.4, w0: [-1] x [0.2] = -0.2

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
The sigmoid function σ(x) = 1 / (1 + exp(−x)) has the derivative dσ/dx = (1 − σ(x)) σ(x), so the chain of gates above can be collapsed into a single sigmoid gate whose backward pass is simply (0.73) * (1 - 0.73) = 0.2.
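This two-input sigmoid neuron example can be written out in Python, using the collapsed sigmoid gate for the backward pass; the weight and input values are the ones used in the example above, the variable names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [2.0, -3.0, -3.0]        # w2 acts as the bias
x = [-1.0, -2.0]

# forward pass
dot = w[0] * x[0] + w[1] * x[1] + w[2]   # = 1.0
f = sigmoid(dot)                          # ~ 0.73

# backward pass through the single sigmoid gate: dsigma/dz = (1 - f) * f
ddot = (1.0 - f) * f                      # ~ 0.2
dx = [w[0] * ddot, w[1] * ddot]           # ~ [0.4, -0.6]
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]
```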
Implementation: a Graph (or Net) object runs the forward pass over all gates in topological order, and the backward pass over them in reverse. (Rough pseudo code)
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Example gate: the * gate with inputs x, y and output z = x * y (x, y, z are scalars).
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
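A minimal runnable sketch of this forward() / backward() gate API, wired up on the earlier (x + y) z example; the class names and wiring are illustrative, not the slides' exact code:

```python
class MultiplyGate:
    def forward(self, x, y):
        self.x, self.y = x, y       # cache activations for the backward pass
        return x * y

    def backward(self, dz):
        # local gradients d(xy)/dx = y and d(xy)/dy = x, chained with upstream dz
        return self.y * dz, self.x * dz

class AddGate:
    def forward(self, x, y):
        return x + y

    def backward(self, dz):
        # the + gate simply distributes the upstream gradient to both inputs
        return dz, dz

# f(x, y, z) = (x + y) * z with x = -2, y = 5, z = -4
add_gate, mul_gate = AddGate(), MultiplyGate()
q = add_gate.forward(-2.0, 5.0)      # forward pass in topological order
f = mul_gate.forward(q, -4.0)
dq, dz = mul_gate.backward(1.0)      # backward pass in reverse order
dx, dy = add_gate.backward(dq)
```

A real Net object would hold a topologically sorted list of such gates and loop over it, forward then backward.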
Summary
− Neural nets will be very large: there is no hope of writing down the gradient formula by hand for all parameters.
− Backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates.
− Implementations maintain a graph structure, where the nodes implement the forward() / backward() API.
− forward: compute the result of an operation and save any intermediates needed for gradient computation in memory.
− backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs.