Lecture 4: Backpropagation and Neural Networks part 1
Fei-Fei Li & Andrej Karpathy & Justin Johnson
13 Jan 2016
Administrative:
A1 is due Jan 20 (Wednesday). ~150 hours left.
Warning: Jan 18 (Monday) is a holiday (no class / office hours).
Also note: lectures are non-exhaustive. Read the course notes for completeness.
I'll hold make-up office hours on Wed Jan 20, 5pm @ Gates 259.
scores function SVM loss data loss + regularization
(image credits to Alec Radford)
[Computational graph: inputs x and W feed the scores s, the scores feed the hinge loss, and the hinge loss plus the regularization term R sum (+) to the total loss L]
“local gradient”
(-1) * (-0.20) = 0.20
[local gradient] x [upstream gradient]:
[1] x [0.2] = 0.2
[1] x [0.2] = 0.2 (both inputs!)
[local gradient] x [upstream gradient]:
x0: [2] x [0.2] = 0.4
w0: [-1] x [0.2] = -0.2
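The multiply gate's local gradient with respect to each input is simply the other input's value. A minimal sketch checking the numbers above against a centered numeric gradient (the values w0 = 2, x0 = -1 and the upstream gradient 0.2 are taken from the slide's worked circuit):

```python
# Multiply gate: z = w0 * x0.
# Analytic local gradients: dz/dx0 = w0, dz/dw0 = x0.
# Backprop multiplies each local gradient by the upstream gradient.
w0, x0 = 2.0, -1.0
upstream = 0.2

dx0 = w0 * upstream   # [2] x [0.2] = 0.4
dw0 = x0 * upstream   # [-1] x [0.2] = -0.2

# Numeric check with a centered difference on z = w0 * x0
h = 1e-5
num_dx0 = ((w0 * (x0 + h)) - (w0 * (x0 - h))) / (2 * h) * upstream
num_dw0 = (((w0 + h) * x0) - ((w0 - h) * x0)) / (2 * h) * upstream

print(dx0, dw0)  # 0.4 -0.2
```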
sigmoid function
sigmoid gate
sigmoid gradient: dσ/dx = σ(x)(1 - σ(x)), e.g. (0.73) * (1 - 0.73) ≈ 0.2
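The sigmoid's derivative has the convenient closed form σ(x)(1 - σ(x)), so the backward pass can reuse the forward-pass output. A small sketch (x = 1 here, so σ(x) ≈ 0.73 as on the slide):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 1.0
s = sigmoid(x)               # forward: ~0.731
local_grad = s * (1 - s)     # backward reuses s: (0.73) * (1 - 0.73) ~ 0.2

# Numeric check with a centered difference
h = 1e-5
num = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(round(local_grad, 2))  # 0.2
```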
add gate: gradient distributor
max gate: gradient router
mul gate: gradient "switcher" (each input's gradient is the other input's value times the upstream gradient)
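These behaviors fall straight out of the local gradients; a quick sketch illustrating each gate's backward rule (the input values are arbitrary):

```python
upstream = 2.0
a, b = 3.0, -4.0

# add gate: local gradient is 1 for every input, so it distributes
# the upstream gradient unchanged to all inputs
da, db = upstream, upstream             # both get 2.0

# max gate: local gradient is 1 for the winning input, 0 for the rest,
# so it routes the full upstream gradient to the larger input only
dmax_a = upstream if a > b else 0.0     # a won: 2.0
dmax_b = upstream if b >= a else 0.0    # 0.0

# mul gate: local gradient w.r.t. each input is the *other* input's
# value, so it "switches" (and scales) the incoming gradient
dmul_a = b * upstream                   # -8.0
dmul_b = a * upstream                   # 6.0

print(da, db, dmax_a, dmax_b, dmul_a, dmul_b)
```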
Graph (or Net) object. (Rough pseudocode)
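A minimal sketch of what one node in such a Graph/Net object might look like (class and method names here are illustrative, not from any particular library): each gate caches its inputs on forward and applies the chain rule on backward, and the graph calls forward on every node in topological order and backward in the reverse order.

```python
class MultiplyGate:
    """One graph node: caches inputs on forward, applies chain rule on backward."""
    def forward(self, x, y):
        self.x, self.y = x, y      # save intermediates needed for backward
        return x * y

    def backward(self, dz):
        dx = self.y * dz           # [local gradient] x [upstream gradient]
        dy = self.x * dz
        return dx, dy

gate = MultiplyGate()
z = gate.forward(-2.0, 3.0)        # forward pass: -6.0
dx, dy = gate.backward(1.0)        # backward pass: (3.0, -2.0)
print(z, dx, dy)
```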
"local gradient": this is now the Jacobian matrix (the derivative of each element of z w.r.t. each element of x), since x, y, z are now vectors.
f(x) = max(0, x) (elementwise), on a 4096-d input vector, producing a 4096-d output
Q: what is the size of the Jacobian matrix? [4096 x 4096]
f(x) = max(0, x) (elementwise), now on 100 4096-d input vectors: in practice we process an entire minibatch (e.g. 100) at once, i.e. the Jacobian would technically be a [409,600 x 409,600] matrix :\
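In practice you never form that Jacobian explicitly: for an elementwise op the Jacobian is diagonal, so the backward pass is just an elementwise product. A sketch for the ReLU case (random data for illustration):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(100, 4096)   # minibatch of 100 4096-d input vectors
z = np.maximum(0, x)             # forward: f(x) = max(0, x) elementwise

dz = np.random.randn(100, 4096)  # upstream gradient, same shape as z
dx = dz * (x > 0)                # backward: the Jacobian is diagonal, so just
                                 # zero the gradient wherever the input was <= 0
print(dx.shape)                  # (100, 4096) -- no 409,600^2 matrix needed
```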
E.g. for the SVM:
margins
In summary:
- neural nets will be very large: no hope of writing down gradient formulas by hand for all parameters
- backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
- implementations maintain a graph structure, where the nodes implement the forward() / backward() API
- forward: compute the result of an operation and save any intermediates needed for gradient computation in memory
- backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs
2-layer net: W1 (3072 → 100), W2 (100 → 10)
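With those sizes, the 2-layer net f = W2 max(0, W1 x) is two matrix multiplies with a ReLU in between; a sketch with CIFAR-10-style sizes (random weights, for illustration only):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(3072)               # input: flattened 32x32x3 image
W1 = np.random.randn(100, 3072) * 0.01  # first layer weights: 3072 -> 100
W2 = np.random.randn(10, 100) * 0.01    # second layer weights: 100 -> 10

h = np.maximum(0, W1 @ x)               # hidden layer with ReLU nonlinearity
s = W2 @ h                              # class scores
print(h.shape, s.shape)                 # (100,) (10,)
```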
Full implementation of training a 2-layer Neural Network needs ~11 lines:
from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
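The referenced post trains a tiny sigmoid network with full-batch gradient descent on a 4-example toy dataset; a lightly commented version in that style (variable names follow the post's convention):

```python
import numpy as np

np.random.seed(1)
X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])  # 4 training examples, 3 features
y = np.array([[0,1,1,0]]).T                      # targets
syn0 = 2 * np.random.random((3, 4)) - 1          # weights: 3 -> 4
syn1 = 2 * np.random.random((4, 1)) - 1          # weights: 4 -> 1
for j in range(60000):
    l1 = 1 / (1 + np.exp(-X.dot(syn0)))          # forward: hidden layer (sigmoid)
    l2 = 1 / (1 + np.exp(-l1.dot(syn1)))         # forward: output (sigmoid)
    l2_delta = (y - l2) * (l2 * (1 - l2))        # backward through output sigmoid
    l1_delta = l2_delta.dot(syn1.T) * (l1 * (1 - l1))  # chain rule to hidden layer
    syn1 += l1.T.dot(l2_delta)                   # full-batch update: descends
    syn0 += X.T.dot(l1_delta)                    # the squared error
print(np.round(l2, 2).ravel())
```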
sigmoid activation function
Be very careful with your brain analogies. Biological neurons:
- dendrites can perform complex non-linear computations
- synapses are not a single weight but a complex non-linear dynamical system
[Dendritic Computation. London and Hausser]
"Fully-connected" layers:
"2-layer Neural Net", or "1-hidden-layer Neural Net"
"3-layer Neural Net", or "2-hidden-layer Neural Net"
We can efficiently evaluate an entire layer of neurons.
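One matrix multiply evaluates every neuron in a layer at once. The lecture's forward pass for a small 3-layer net is along these lines (layer sizes and the random initialization here are arbitrary, for illustration):

```python
import numpy as np

np.random.seed(0)
f = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigmoid activation function

x = np.random.randn(3, 1)                # input vector (3x1)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

h1 = f(W1 @ x + b1)    # first hidden layer: all 4 neurons in one matmul
h2 = f(W2 @ h1 + b2)   # second hidden layer
out = W3 @ h2 + b3     # output neuron
print(h1.shape, h2.shape, out.shape)
```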
(you can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)
Do not use size of neural network as a regularizer. Use stronger regularization instead:
reverse-mode differentiation (if you want the effect of many things on one thing)
forward-mode differentiation (if you want the effect of one thing on many things)