Lecture 6: Training Neural Networks, Part 2
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 6 - 25 Jan 2016
Administrative: A2 is out. It’s meaty. It’s due Feb 5 (next Friday). You’ll implement: Neural Nets (with Layer Forward/Backward API), Batch Norm, Dropout, ConvNets.
“Xavier initialization” [Glorot et al., 2010] Reasonable initialization. (Mathematical derivation assumes linear activations)
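A minimal sketch of the idea, assuming the common fan-in variant (zero-mean Gaussian weights with standard deviation 1/sqrt(fan_in)); the paper itself also discusses a variant that averages fan-in and fan-out. The names `xavier_init` and `layer_output_variance` are made up for this illustration, and plain Python lists are used to keep it dependency-free. For a linear layer with unit-variance inputs, the output variance should stay close to 1.

```python
import math
import random

def xavier_init(fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out weight matrix with std 1/sqrt(fan_in)."""
    rng = rng or random.Random(0)
    scale = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(fan_out)]
            for _ in range(fan_in)]

def layer_output_variance(fan_in, fan_out, num_samples=200, seed=0):
    """Empirical variance of a linear layer's outputs for unit-variance inputs."""
    rng = random.Random(seed)
    W = xavier_init(fan_in, fan_out, rng)
    outputs = []
    for _ in range(num_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        for j in range(fan_out):
            outputs.append(sum(x[i] * W[i][j] for i in range(fan_in)))
    mean = sum(outputs) / len(outputs)
    return sum((o - mean) ** 2 for o in outputs) / len(outputs)
```

The check only covers the linear case, which matches the assumption stated on the slide; with saturating or rectifying nonlinearities the variance can still drift from layer to layer.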
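Batch Normalization [Ioffe and Szegedy, 2015]. Normalize: $\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}$. And then allow the network to squash the range if it wants to: $y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}$.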
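The two steps can be sketched as a training-time forward pass over a mini-batch. This is an illustrative sketch only: the function name `batchnorm_forward`, the list-of-rows data layout, and the small `eps` added for numerical stability are assumptions, and the running averages used at test time are left out.

```python
import math

def batchnorm_forward(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale by gamma and shift by beta."""
    n = len(batch)
    dims = len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            x_hat = (batch[i][d] - mean) / math.sqrt(var + eps)
            out[i][d] = gamma[d] * x_hat + beta[d]
    return out
```

With `gamma` set to the batch standard deviation and `beta` to the batch mean, the layer approximately recovers its input, which is the sense in which the network can undo the normalization if it wants to.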
Loss barely changing: Learning rate is probably too low
Simple gradient descent update. Now: complicate.
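For reference, the simple update amounts to one line per step. The sketch below is illustrative; `sgd_step` and the toy loss f(w) = (w - 3)^2 are assumptions, not part of the slides.

```python
def sgd_step(params, grads, learning_rate):
    """Vanilla update: move each parameter against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad, learning_rate=0.1)
```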
Image credits: Alec Radford
Suppose loss function is steep vertically but shallow horizontally:
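A small experiment makes the behaviour concrete. The quadratic below, f(x, y) = 0.5 * (x^2 + 50 * y^2), is a hypothetical stand-in for such a loss (shallow along x, steep along y); the learning rate and step count are likewise arbitrary choices for the demo.

```python
def loss_grad(x, y):
    """Gradient of f(x, y) = 0.5 * (x**2 + 50 * y**2): shallow in x, steep in y."""
    return x, 50.0 * y

def run_sgd(x, y, learning_rate, steps):
    """Return the list of (x, y) iterates produced by plain gradient descent."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = loss_grad(x, y)
        x, y = x - learning_rate * gx, y - learning_rate * gy
        path.append((x, y))
    return path

path = run_sgd(x=10.0, y=1.0, learning_rate=0.035, steps=20)
```

In this run the y coordinate flips sign on every step (jitter along the steep direction) while x has covered only about half the distance to the minimum after 20 steps (very slow progress along the shallow direction).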
Notice momentum overshooting the target, but overall getting to the minimum much faster.
Ordinary momentum update (figure: momentum step + gradient step = actual step).
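As a sketch, the ordinary momentum update keeps a velocity `v` that decays by a factor `mu` and is pushed by the negative gradient; the parameters then move along the velocity. The function name `momentum_step` and the list-based state are assumptions for illustration.

```python
def momentum_step(x, v, grad, learning_rate, mu):
    """v accumulates a decaying sum of gradient steps; x moves along v."""
    # Momentum step (mu * v) plus gradient step (-learning_rate * grad).
    v = [mu * vi - learning_rate * gi for vi, gi in zip(v, grad)]
    # Actual step: the sum of the two.
    x = [xi + vi for xi, vi in zip(x, v)]
    return x, v
```

Typical values for `mu` are in the range of roughly 0.5 to 0.99; that range is quoted from common practice rather than from this slide.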
Figure: momentum update (momentum step, gradient step, actual step) vs. Nesterov momentum update (momentum step, “lookahead” gradient step, actual step). The “lookahead” gradient step is a bit different than the original.
Nesterov: the only difference...
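Slightly inconvenient… usually we have $\theta_{t-1}$ and $\nabla f(\theta_{t-1})$, not the gradient at the lookahead point. Variable transform and rearranging saves the day: with $\phi_{t-1} = \theta_{t-1} + \mu v_{t-1}$, replace all thetas with phis, rearrange and obtain: $v_t = \mu v_{t-1} - \epsilon \nabla f(\phi_{t-1})$, $\phi_t = \phi_{t-1} - \mu v_{t-1} + (1 + \mu) v_t$.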
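The rearrangement can be checked numerically. The sketch below implements both forms for a scalar parameter and a toy loss f(theta) = 0.5 * theta^2; the function names and the toy loss are assumptions made for the illustration. If phi is initialized to theta + mu * v, the two forms should stay related by phi = theta + mu * v after every step.

```python
def grad_f(theta):
    """Gradient of the toy loss f(theta) = 0.5 * theta**2 (an assumption for the demo)."""
    return theta

def nesterov_lookahead(theta, v, learning_rate, mu):
    """Original form: evaluate the gradient at the looked-ahead point theta + mu * v."""
    v_new = mu * v - learning_rate * grad_f(theta + mu * v)
    return theta + v_new, v_new

def nesterov_transformed(phi, v, learning_rate, mu):
    """Rearranged form: phi = theta + mu * v, so the gradient is taken at phi itself."""
    v_new = mu * v - learning_rate * grad_f(phi)
    phi_new = phi - mu * v + (1.0 + mu) * v_new
    return phi_new, v_new
```

The transformed form is convenient because it only needs the gradient at the variable that is actually stored, which matches the usual forward/backward interface.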
AdaGrad update: added element-wise scaling of the gradient based on the historical sum of squares in each dimension [Duchi et al., 2011]
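A minimal sketch of this scaling, assuming a per-parameter `cache` that accumulates squared gradients and a small `eps` to avoid division by zero (both names, and `adagrad_step` itself, are illustrative):

```python
import math

def adagrad_step(x, cache, grad, learning_rate, eps=1e-7):
    """Scale each gradient component by the root of its accumulated squared history."""
    cache = [c + g * g for c, g in zip(cache, grad)]
    x = [xi - learning_rate * g / (math.sqrt(c) + eps)
         for xi, g, c in zip(x, grad, cache)]
    return x, cache
```

On the first step a dimension with gradient 10 and one with gradient 0.1 both move by about `learning_rate`, which shows the equalizing effect. Because the cache only grows, the effective step size shrinks over time.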
RMSProp update [Tieleman and Hinton, 2012]
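A sketch of RMSProp as it is usually written: the accumulated sum of squared gradients is replaced by an exponentially decaying average, so the effective step size does not shrink to zero. The function name and the default `decay_rate` of 0.99 are assumptions for illustration.

```python
import math

def rmsprop_step(x, cache, grad, learning_rate, decay_rate=0.99, eps=1e-7):
    """Like AdaGrad, but the squared-gradient cache is a leaky running average."""
    cache = [decay_rate * c + (1.0 - decay_rate) * g * g
             for c, g in zip(cache, grad)]
    x = [xi - learning_rate * g / (math.sqrt(c) + eps)
         for xi, g, c in zip(x, grad, cache)]
    return x, cache
```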
Introduced in a slide in Geoff Hinton’s Coursera class, lecture 6
Cited by several papers as:
Figure legend: adagrad, rmsprop
Adam update [Kingma and Ba, 2014] (incomplete, but close): combines a momentum term with RMSProp-like scaling.
Adam update [Kingma and Ba, 2014]: momentum, RMSProp-like scaling, and bias correction (only relevant in first few iterations when t is small). The bias correction compensates for the fact that m, v are initialized at zero and need some time to “warm up”.
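Putting the three pieces together, a sketch of one Adam step might look as follows. The function name is illustrative, the default hyperparameters are the values commonly quoted for Adam, and `t` is assumed to count steps starting at 1.

```python
import math

def adam_step(x, m, v, grad, t, learning_rate=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    # Momentum-like running average of the gradient.
    m = [beta1 * mi + (1.0 - beta1) * g for mi, g in zip(m, grad)]
    # RMSProp-like running average of the squared gradient.
    v = [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction: m and v start at zero and are too small early on.
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    x = [xi - learning_rate * mh / (math.sqrt(vh) + eps)
         for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x, m, v
```

On the very first step the corrected estimates equal the raw gradient and its square, so the parameter moves by roughly `learning_rate` regardless of the gradient's scale.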
=> Learning rate decay over time!
step decay: e.g. decay learning rate by half every few epochs.
exponential decay: $\alpha = \alpha_0 e^{-kt}$
1/t decay: $\alpha = \alpha_0 / (1 + kt)$
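The three schedules can be written as small functions. This is a sketch: the function names, the `drop` factor and the `epochs_per_drop` interval are illustrative choices, while `lr0` and `k` stand for the initial learning rate and the decay constant.

```python
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=5):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def exponential_decay(lr0, t, k):
    """lr0 * exp(-k * t)."""
    return lr0 * math.exp(-k * t)

def inverse_time_decay(lr0, t, k):
    """lr0 / (1 + k * t), the 1/t schedule."""
    return lr0 / (1.0 + k * t)
```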
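second-order Taylor expansion: $J(\theta) \approx J(\theta_0) + (\theta - \theta_0)^\top \nabla_\theta J(\theta_0) + \frac{1}{2} (\theta - \theta_0)^\top H (\theta - \theta_0)$
Solving for the critical point we obtain the Newton parameter update: $\theta^* = \theta_0 - H^{-1} \nabla_\theta J(\theta_0)$
notice: no hyperparameters! (e.g. learning rate)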
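To see the "no learning rate" property in action, here is a sketch of one Newton step for a two-parameter problem, with the 2x2 Hessian inverted in closed form. The function name and the toy quadratic used in the note below are assumptions; real networks have far too many parameters for an explicit Hessian inverse.

```python
def newton_step(theta, grad, hessian):
    """One Newton update for a 2-D problem: theta - H^{-1} grad."""
    (a, b), (c, d) = hessian
    det = a * d - b * c
    # Apply the closed-form inverse of the 2x2 Hessian to the gradient.
    dx = (d * grad[0] - b * grad[1]) / det
    dy = (-c * grad[0] + a * grad[1]) / det
    return [theta[0] - dx, theta[1] - dy]
```

For the quadratic f(x, y) = 0.5 * (x^2 + 50 * y^2), the gradient at (10, 1) is (10, 50) and the Hessian is [[1, 0], [0, 50]], so a single call lands on the minimum at the origin even though the curvature differs by a factor of 50 between the two directions.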
Dropout [Srivastava et al., 2014]
Example forward pass with a 3-layer network using dropout
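The code shown on the slide did not survive extraction, so the following is a stand-in sketch rather than the original: a training-time forward pass through two ReLU hidden layers with dropout and a linear output layer. All names are illustrative, weights are stored as lists of rows, and `p` is assumed to be the probability of keeping a unit.

```python
import random

def linear_layer(W, b, x):
    """W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu_layer(W, b, x):
    """ReLU(W x + b)."""
    return [max(0.0, h) for h in linear_layer(W, b, x)]

def dropout(h, p, rng):
    """Zero each activation independently; p is the probability of keeping a unit."""
    return [hi if rng.random() < p else 0.0 for hi in h]

def train_forward(params, x, p, rng):
    """Training-time forward pass: dropout after each hidden layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = dropout(relu_layer(W1, b1, x), p, rng)
    h2 = dropout(relu_layer(W2, b2, h1), p, rng)
    return linear_layer(W3, b3, h2)
```

A fresh mask is sampled on every call, so two forward passes on the same input generally give different outputs during training.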
Forces the network to have a redundant representation. Figure: the features "has an ear", "has a tail", "is furry", "has claws" and "mischievous look" feed a cat score, with some of them dropped (marked X).
Figure: a neuron a with inputs x, y and weights w0, w1.
With p=0.5, using all inputs in the forward pass would inflate the activations by 2x from what the network was “used to” during training! => Have to compensate by scaling the activations back down by ½.
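The factor can be verified by enumerating every dropout mask for the small neuron in the figure. In this sketch the function names are hypothetical, `p` is the probability of keeping an input, and the neuron is taken to be linear so that the expectation is exact.

```python
from itertools import product

def expected_train_activation(weights, inputs, p):
    """Average the neuron output over every dropout mask, weighted by its probability."""
    total = 0.0
    for mask in product([0, 1], repeat=len(inputs)):
        prob = 1.0
        for m in mask:
            prob *= p if m else (1.0 - p)
        total += prob * sum(w * x * m for w, x, m in zip(weights, inputs, mask))
    return total

def predict_activation(weights, inputs, p):
    """Use all inputs at test time, scaled by the keep probability p."""
    return p * sum(w * x for w, x in zip(weights, inputs))
```

For weights (2, -1), inputs (3, 4) and p = 0.5, the unscaled test-time output is 2, while both functions return 1: the scaled test-time activation matches the training-time expectation.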
(see class notes...) fun guaranteed.
[LeNet-5, LeCun et al., 1998]
RECEPTIVE FIELDS OF SINGLE NEURONES IN THE CAT'S STRIATE CORTEX
RECEPTIVE FIELDS, BINOCULAR INTERACTION AND FUNCTIONAL ARCHITECTURE IN THE CAT'S VISUAL CORTEX
“sandwich” architecture (SCSCSC…); simple cells: modifiable parameters; complex cells: perform pooling
LeNet-5
Classification, Retrieval [Krizhevsky 2012]
Detection [Faster R-CNN: Ren, He, Girshick, Sun 2015]; Segmentation [Farabet et al., 2012]
NVIDIA Tegra X1
self-driving cars
[Taigman et al. 2014] [Simonyan et al. 2014] [Goodfellow 2014]
[Toshev, Szegedy 2014] [Mnih 2013]
[Ciresan et al. 2013] [Sermanet et al. 2011] [Ciresan et al.]
[Denil et al. 2014] [Turaga et al., 2010]
Whale recognition, Kaggle Challenge; [Mnih and Hinton, 2010]
[Vinyals et al., 2015]
reddit.com/r/deepdream
Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition [Cadieu et al., 2014]