Deep Learning Basics Lecture 6: Convolutional NN
Princeton University COS 495
Instructor: Yingyu Liang


  1. Deep Learning Basics Lecture 6: Convolutional NN Princeton University COS 495 Instructor: Yingyu Liang

  2. Review: convolutional layers

  3. Convolution: two dimensional case. Input (3x4): a b c d / e f g h / i j k l. Kernel/filter (2x2): w x / y z. Feature map entries: aw + bx + ey + fz, bw + cx + fy + gz, cw + dx + gy + hz; ew + fx + iy + jz, fw + gx + jy + kz, gw + hx + ky + lz
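The sliding-window computation on this slide can be sketched in NumPy (an illustrative sketch, not course code; the symbolic letters a..l and w..z are replaced with concrete numbers, and "convolution" here is cross-correlation, following the Deep Learning book's convention):

```python
import numpy as np

def conv2d_valid(inp, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the input and,
    at each position, sum the elementwise products."""
    H, W = inp.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i+kh, j:j+kw] * kernel)
    return out

# 3x4 input (a..l) and 2x2 kernel (w x / y z) as concrete numbers
inp = np.arange(1.0, 13.0).reshape(3, 4)   # a=1, b=2, ..., l=12
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])            # w=1, x=2, y=3, z=4
fm = conv2d_valid(inp, kernel)             # 2x3 feature map
# fm[0, 0] = aw + bx + ey + fz = 1*1 + 2*2 + 5*3 + 6*4 = 44
```

A 3x4 input and 2x2 kernel yield a 2x3 feature map, matching the slide's "valid" sliding-window picture.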

  4. Convolutional layers: the same weights are shared for all output nodes (m output nodes, kernel size k, n input nodes). Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  5. Terminology Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  6. Case study: LeNet-5

  7. LeNet-5 • Proposed in "Gradient-based learning applied to document recognition", by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998

  8. LeNet-5 • Proposed in "Gradient-based learning applied to document recognition", by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998 • Apply convolution on 2D images (MNIST) and use backpropagation

  9. LeNet-5 • Proposed in "Gradient-based learning applied to document recognition", by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998 • Apply convolution on 2D images (MNIST) and use backpropagation • Structure: 2 convolutional layers (with pooling) + 3 fully connected layers • Input size: 32x32x1 • Convolution kernel size: 5x5 • Pooling: 2x2
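As a sanity check on these numbers, the spatial size can be traced layer by layer with the valid-convolution formula (a small illustrative sketch; the layer names C1/S2/C3/S4 follow the paper):

```python
def conv_out(n, k, stride=1):
    # valid convolution: output size = (n - k) // stride + 1
    return (n - k) // stride + 1

def pool_out(n, k=2, stride=2):
    # non-overlapping pooling window
    return (n - k) // stride + 1

n = 32                 # input: 32x32x1
n = conv_out(n, 5)     # C1: 5x5 conv, 6 maps  -> 28
n = pool_out(n)        # S2: 2x2 pool          -> 14
n = conv_out(n, 5)     # C3: 5x5 conv, 16 maps -> 10
n = pool_out(n)        # S4: 2x2 pool          -> 5
flat = n * n * 16      # 5*5*16 = 400, matching the 400x120 weight matrix later
```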

  10. LeNet-5 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  11. LeNet-5 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  12. LeNet-5 Filter: 5x5, stride: 1x1, #filters: 6 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  13. LeNet-5 Pooling: 2x2, stride: 2 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
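The 2x2, stride-2 pooling step can be sketched as follows (illustrative only; note that LeNet-5's original "subsampling" layers are trainable average pooling, while plain max pooling, shown here, is the common modern choice):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2: each output cell summarizes one
    disjoint 2x2 block of the input (max or average)."""
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 1., 0.],
              [2., 0., 0., 3.]])
pool2x2(x)   # -> [[4., 8.], [2., 3.]]  (halves each spatial dimension)
```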

  14. LeNet-5 Filter: 5x5x6, stride: 1x1, #filters: 16 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  15. LeNet-5 Pooling: 2x2, stride: 2 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  16. LeNet-5 Weight matrix: 400x120 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  17. Weight matrix: 84x10 LeNet-5 Weight matrix: 120x84 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
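The weight-matrix sizes on these slides imply the parameter counts below (a hedged illustration: the original LeNet-5 connects each C3 map to only a subset of S2 maps via a connection table, so the C3 count here, which assumes full connectivity, overstates the paper's actual number):

```python
def conv_params(k, in_ch, out_ch):
    # (k*k*in_ch weights + 1 bias) per output map
    return (k * k * in_ch + 1) * out_ch

def fc_params(n_in, n_out):
    # weight matrix n_in x n_out, plus one bias per output
    return (n_in + 1) * n_out

c1  = conv_params(5, 1, 6)    # 156
c3  = conv_params(5, 6, 16)   # 2416 (full-connectivity assumption)
f5  = fc_params(400, 120)     # 48120, from the 400x120 weight matrix
f6  = fc_params(120, 84)      # 10164, from the 120x84 weight matrix
out = fc_params(84, 10)       # 850,   from the 84x10 weight matrix
```

Most of the parameters sit in the fully connected layers, which is typical of early CNNs.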

  18. Software platforms for CNN (updated in April 2016; check online for more recent ones)

  19. Platform: Marvin (marvin.is)

  20. Platform: Marvin

  21. LeNet in Marvin: convolutional layer

  22. LeNet in Marvin: pooling layer

  23. LeNet in Marvin: fully connected layer

  24. Platform: Caffe (caffe.berkeleyvision.org)

  25. LeNet in Caffe

  26. Platform: Tensorflow (tensorflow.org)

  27. Platform: Tensorflow (tensorflow.org)

  28. Platform: Tensorflow (tensorflow.org)

  29. Others • Theano: CPU/GPU symbolic expression compiler in Python (from the MILA lab at the University of Montreal) • Torch: provides a Matlab-like environment for state-of-the-art machine learning algorithms, in Lua • Lasagne: a lightweight library to build and train neural networks in Theano • See: http://deeplearning.net/software_links/

  30. Optimization: momentum

  31. Basic algorithms • Minimize the (regularized) empirical loss $\hat{L}(\theta) = \frac{1}{n} \sum_{t=1}^{n} l(\theta, x_t, y_t) + R(\theta)$, where the hypothesis is parametrized by $\theta$ • Gradient descent: $\theta_{t+1} = \theta_t - \eta_t \nabla \hat{L}(\theta_t)$
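A minimal NumPy sketch of this gradient-descent update on a toy least-squares loss (the data, step size, and iteration count are invented for illustration, and the regularizer $R$ is omitted):

```python
import numpy as np

# Toy noiseless regression problem: y = X @ theta_true
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true

def grad(theta):
    # gradient of the empirical loss (1/n) * sum_t (x_t . theta - y_t)^2
    return 2 * X.T @ (X @ theta - y) / len(y)

theta = np.zeros(3)
eta = 0.1                       # constant step size eta_t
for t in range(500):
    theta = theta - eta * grad(theta)   # theta_{t+1} = theta_t - eta_t * grad
```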

  32. Mini-batch stochastic gradient descent • Instead of one data point, work with a small batch of $m$ points $(x_{tm+1}, y_{tm+1}), \ldots, (x_{tm+m}, y_{tm+m})$ • Update rule: $\theta_{t+1} = \theta_t - \eta_t \nabla \left[ \frac{1}{m} \sum_{1 \le j \le m} l(\theta_t, x_{tm+j}, y_{tm+j}) + R(\theta_t) \right]$
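The mini-batch update can be sketched on the same toy least-squares setup (illustrative values throughout; batch size $m = 20$, no regularizer):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
theta_true = np.array([0.5, -1.0, 2.0])
y = X @ theta_true              # noiseless targets

def batch_grad(theta, xb, yb):
    # gradient of (1/m) * sum over the batch of (x . theta - y)^2
    return 2 * xb.T @ (xb @ theta - yb) / len(yb)

theta = np.zeros(3)
eta, m = 0.05, 20
for epoch in range(200):
    perm = rng.permutation(len(y))      # reshuffle each pass over the data
    for t in range(len(y) // m):
        idx = perm[t * m:(t + 1) * m]   # the t-th mini-batch
        theta -= eta * batch_grad(theta, X[idx], y[idx])
```

Each update touches only $m$ points, so it is far cheaper per step than full gradient descent while following the gradient in expectation.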

  33. Momentum • Drawback of SGD: can be slow when the gradient is small • Observation: when the gradient is consistent across consecutive steps, we can take larger steps • Metaphor: a marble rolling down a gentle slope

  34. Momentum Contour: loss function Path: SGD with momentum Arrow: stochastic gradient Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  35. Momentum • Work with a small batch of $m$ points $(x_{tm+1}, y_{tm+1}), \ldots, (x_{tm+m}, y_{tm+m})$ • Keep a momentum variable $v_t$, and set a decay rate $\beta$ • Update rule: $v_t = \beta v_{t-1} - \eta_t \nabla \left[ \frac{1}{m} \sum_{1 \le j \le m} l(\theta_t, x_{tm+j}, y_{tm+j}) + R(\theta_t) \right]$, $\theta_{t+1} = \theta_t + v_t$

  36. Momentum • Keep a momentum variable $v_t$, and set a decay rate $\beta$ • Update rule: $v_t = \beta v_{t-1} - \eta_t \nabla \left[ \frac{1}{m} \sum_{1 \le j \le m} l(\theta_t, x_{tm+j}, y_{tm+j}) + R(\theta_t) \right]$, $\theta_{t+1} = \theta_t + v_t$ • Practical guide: $\beta$ is set to 0.5 until the initial learning stabilizes, and is then increased to 0.9 or higher.
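The momentum update, including the practical schedule of raising $\beta$ from 0.5 to 0.9, can be sketched on the same toy problem (all data and hyperparameter values are invented for illustration; the switch point for $\beta$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, 0.0, -1.5])
y = X @ theta_true              # noiseless targets

def batch_grad(theta, xb, yb):
    # mini-batch gradient of the squared loss (no regularizer)
    return 2 * xb.T @ (xb @ theta - yb) / len(yb)

theta, v = np.zeros(3), np.zeros(3)
eta, m = 0.05, 20
for epoch in range(200):
    beta = 0.5 if epoch < 20 else 0.9   # raise beta once learning stabilizes
    perm = rng.permutation(len(y))
    for t in range(len(y) // m):
        idx = perm[t * m:(t + 1) * m]
        # v_t = beta * v_{t-1} - eta * gradient;  theta_{t+1} = theta_t + v_t
        v = beta * v - eta * batch_grad(theta, X[idx], y[idx])
        theta = theta + v
```

When consecutive gradients agree, $v$ accumulates and the effective step grows up to roughly $\eta/(1-\beta)$, which is the "larger steps on consistent slopes" behavior described on slide 33.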
