Deep Learning Basics Lecture 6: Convolutional NN
Princeton University COS 495 Instructor: Yingyu Liang
Review: convolutional layers
Convolution: two dimensional case
Input:
a b c d
e f g h
i j k l
Kernel/filter:
w x
y z
Feature map (first two entries): aw + bx + ey + fz, bw + cx + fy + gz
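The sliding-window computation above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the function name `conv2d_valid` and the numeric values standing in for a..l and w..z are assumptions chosen for the example. As in the figure, the kernel is not flipped and is only placed where it fits entirely inside the input.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no flipping, no padding) and
    return the feature map of weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical numbers standing in for the symbolic entries.
image = [
    [1, 2, 3, 4],     # a b c d
    [5, 6, 7, 8],     # e f g h
    [9, 10, 11, 12],  # i j k l
]
kernel = [
    [1, 2],  # w x
    [3, 4],  # y z
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[44, 54, 64], [84, 94, 104]]
```

The top-left entry is aw + bx + ey + fz = 1*1 + 2*2 + 5*3 + 6*4 = 44, and the entry to its right is bw + cx + fy + gz = 54.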
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
Parameter sharing: the same kernel weights are shared by all output nodes. Quantities labeled in the figure: number of output nodes, number of input nodes, kernel size.
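One way to see the weight sharing is to write a one-dimensional convolution as an ordinary matrix multiplication and count the distinct weights. The sketch below is illustrative only; the helper names and the input and kernel values are assumptions, not taken from the lecture.

```python
def conv1d_valid(x, kernel):
    """1-D convolution without flipping or padding."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def conv_as_matrix(n_inputs, kernel):
    """Dense weight matrix equivalent to conv1d_valid.
    Every row reuses the same kernel weights, shifted by one position."""
    k = len(kernel)
    n_outputs = n_inputs - k + 1
    matrix = [[0.0] * n_inputs for _ in range(n_outputs)]
    for i in range(n_outputs):
        for j in range(k):
            matrix[i][i + j] = kernel[j]
    return matrix

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical input nodes
kernel = [0.5, -1.0, 2.0]       # hypothetical kernel of size 3

W = conv_as_matrix(len(x), kernel)
dense_out = [sum(w * v for w, v in zip(row, x)) for row in W]
conv_out = conv1d_valid(x, kernel)

shared_params = len(kernel)        # weights the convolution actually learns
dense_params = len(W) * len(W[0])  # weights a fully connected layer would need
print(conv_out, dense_out, shared_params, dense_params)
```

Both computations give the same outputs, but the convolution stores only as many weights as the kernel size, while an unconstrained dense layer of the same shape would store (output nodes) x (input nodes) weights.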
"Gradient-based learning applied to document recognition", by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Filter: 5x5, stride: 1x1, #filters: 6
Pooling: 2x2, stride: 2
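A 2x2 pooling window with stride 2 halves each spatial dimension. The slide does not say which pooling function is applied, so the sketch below shows both max and average pooling as assumptions; the function names and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, reducer=max):
    """Apply `reducer` to each size x size window, moving by `stride`."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

# Hypothetical 4x4 feature map.
fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 6, 1, 1],
]

max_pooled = pool2d(fm, reducer=max)
avg_pooled = pool2d(fm, reducer=mean)
print(max_pooled)  # [[4, 5], [6, 3]]
print(avg_pooled)  # [[2.5, 2.0], [2.25, 1.75]]
```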
Filter: 5x5x6, stride: 1x1, #filters: 16
Pooling: 2x2, stride: 2
Weight matrix: 400x120
Weight matrix: 120x84
Weight matrix: 84x10
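The layer settings listed above fit together if the input is a 32x32 single-channel image. That input size is an assumption here (the slides as transcribed do not state it), but it is consistent with the 400-row weight matrix, since 5 x 5 x 16 = 400. The following sketch only does the shape arithmetic; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output side length of a pooling layer."""
    return (size - window) // stride + 1

side, channels = 32, 1                  # assumption: 32x32 grayscale input
trace = [("input", side, channels)]

side, channels = conv_out(side, 5), 6   # Filter: 5x5, stride: 1x1, #filters: 6
trace.append(("conv1", side, channels))
side = pool_out(side)                   # Pooling: 2x2, stride: 2
trace.append(("pool1", side, channels))
side, channels = conv_out(side, 5), 16  # Filter: 5x5x6, stride: 1x1, #filters: 16
trace.append(("conv2", side, channels))
side = pool_out(side)                   # Pooling: 2x2, stride: 2
trace.append(("pool2", side, channels))

flattened = side * side * channels      # length of the vector fed to the first weight matrix
fc_shapes = [(400, 120), (120, 84), (84, 10)]
fc_weights = [rows * cols for rows, cols in fc_shapes]

for name, s, c in trace:
    print(f"{name}: {s}x{s}x{c}")
print("flattened:", flattened)
print("fully connected weights per layer:", fc_weights)
```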
Updated in April 2016; check online for more recent ones
MILA lab at University of Montreal)
machine learning algorithms in Lua
networks in Theano
$\hat{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} l(\theta, x_i, y_i) + R(\theta)$, where the hypothesis is parametrized by $\theta$
$\theta_{t+1} = \theta_t - \eta_t \nabla \hat{L}(\theta_t)$
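As a small illustration of this update, the sketch below runs gradient descent on a regularized empirical loss for a one-parameter model. Everything specific in it is an assumption made for the example: the model (prediction theta * x), the squared loss, the regularizer R(theta) = (lam / 2) * theta^2, the data, and the constant step size in place of a schedule.

```python
# Hypothetical training data for a one-parameter model y ~ theta * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lam = 0.1   # assumed regularization strength

def loss_grad(theta):
    """Gradient of (1/n) * sum 0.5 * (theta*x - y)^2 + (lam/2) * theta^2."""
    n = len(xs)
    data_term = sum((theta * x - y) * x for x, y in zip(xs, ys)) / n
    return data_term + lam * theta

theta = 0.0
eta = 0.1   # assumed constant step size
for t in range(200):
    # Move against the gradient of the full regularized empirical loss.
    theta = theta - eta * loss_grad(theta)

# For this quadratic loss the minimizer is known in closed form.
closed_form = sum(x * y for x, y in zip(xs, ys)) / (
    sum(x * x for x in xs) + len(xs) * lam)
print(theta, closed_form)
```

Each step uses every training point, which is what the mini-batch variant avoids.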
$(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$
$\theta_{t+1} = \theta_t - \eta_t \nabla \left[ \frac{1}{b} \sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) + R(\theta_t) \right]$
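A minimal sketch of the mini-batch update, under stated assumptions: a one-parameter model with squared loss, noise-free hypothetical data, a constant step size, no regularizer (lam = 0), and batches taken in order and cycled with a modulo once the data is exhausted. None of these choices come from the lecture.

```python
# Hypothetical noise-free data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
b = 2       # batch size
lam = 0.0   # assumed: no regularization in this toy run

def minibatch_grad(theta, batch):
    """Gradient of the average squared loss over the batch plus the regularizer."""
    data_term = sum((theta * x - y) * x for x, y in batch) / len(batch)
    return data_term + lam * theta

theta = 0.0
eta = 0.05  # assumed constant step size
num_batches = len(xs) // b
for t in range(200):
    # Batch t covers points t*b+1 .. t*b+b; wrap around when the data runs out.
    start = (t % num_batches) * b
    batch = list(zip(xs[start:start + b], ys[start:start + b]))
    theta = theta - eta * minibatch_grad(theta, batch)

print(theta)
```

Each step touches only b points instead of the whole training set, so the gradient is a noisy but cheap estimate of the full gradient.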
can take larger steps
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
Contour: loss function
Path: SGD with momentum
Arrow: stochastic gradient
$(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$
$v_t = \alpha v_{t-1} - \eta_t \nabla \left[ \frac{1}{b} \sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) + R(\theta_t) \right]$
$\theta_{t+1} = \theta_t + v_t$
then is increased to 0.9 or higher.
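The momentum update can be sketched as follows: keep a velocity that is a decayed copy of the previous velocity minus the step size times the mini-batch gradient, then add the velocity to the parameter. This is an illustration under assumptions, not the lecture's code: the one-parameter model, the data, the constant step size, and the fixed momentum coefficient of 0.9 (rather than a schedule that raises it over time) are all chosen for the example.

```python
# Hypothetical noise-free data generated by y = 2 * x.
xs = [1.0, 2.0, 2.0, 1.0]
ys = [2.0, 4.0, 4.0, 2.0]
b = 2        # batch size
lam = 0.0    # assumed: no regularization in this toy run

def minibatch_grad(theta, batch):
    """Gradient of the average squared loss over the batch plus the regularizer."""
    data_term = sum((theta * x - y) * x for x, y in batch) / len(batch)
    return data_term + lam * theta

theta = 0.0
velocity = 0.0
eta = 0.05    # assumed constant step size
alpha = 0.9   # assumed fixed momentum coefficient
num_batches = len(xs) // b
for t in range(500):
    start = (t % num_batches) * b
    batch = list(zip(xs[start:start + b], ys[start:start + b]))
    # Velocity accumulates past gradients with exponential decay.
    velocity = alpha * velocity - eta * minibatch_grad(theta, batch)
    # The parameter moves by the velocity, not by the raw gradient.
    theta = theta + velocity

print(theta)
```

Setting alpha to 0 recovers plain mini-batch SGD, which makes the role of the velocity term easy to check.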