Lecture 6: Convolutional NN Princeton University COS 495 - PowerPoint PPT Presentation

Deep Learning Basics Lecture 6: Convolutional NN Princeton University COS 495 Instructor: Yingyu Liang

Review: convolutional layers

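Convolution: two-dimensional case
Input:
a b c d
e f g h
i j k l
Kernel/filter:
w x
y z
Feature map: each entry is the weighted sum of one 2x2 window of the input, e.g. aw + bx + ey + fz for the top-left window and bw + cx + fy + gz for the window one step to the right.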
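The arithmetic above can be checked with a short sketch. It assumes, as most deep-learning libraries do, that "convolution" means sliding the kernel without flipping it (strictly, cross-correlation), and it keeps only the positions where the kernel fits entirely inside the input ("valid" mode, no padding). The name `conv2d_valid` is hypothetical; passing string concatenation as `mul` reproduces the symbolic entries from the slide.

```python
import operator


def conv2d_valid(image, kernel, mul=operator.mul, total=sum):
    """Slide `kernel` over `image` without flipping it and without padding.

    Each output entry covers one kernel-sized window: `mul` pairs an input
    value with its kernel weight, and `total` combines the products.
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            total([mul(image[i + u][j + v], kernel[u][v])
                   for u in range(kh) for v in range(kw)])
            for j in range(W - kw + 1)
        ]
        for i in range(H - kh + 1)
    ]


# Symbolic check with the slide's letters: "a" + "w" -> "aw", terms joined by " + ".
LETTERS = [list("abcd"), list("efgh"), list("ijkl")]
KERNEL = [list("wx"), list("yz")]
symbolic = conv2d_valid(LETTERS, KERNEL, mul=lambda p, q: p + q, total=" + ".join)
print(symbolic[0][0])  # aw + bx + ey + fz
print(symbolic[0][1])  # bw + cx + fy + gz

# Numeric example: a 3x4 input and a 2x2 kernel give a 2x3 feature map.
print(conv2d_valid([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], [[1, 0], [0, 1]]))
# [[7, 9, 11], [15, 17, 19]]
```

The output is smaller than the input because a 2x2 window fits in only 2 x 3 positions of a 3x4 grid.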

Convolutional layers: the same weights are shared by all output nodes (m output nodes, kernel size k, n input nodes). Figure from Deep Learning, by Goodfellow, Bengio, and Courville
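To make the weight sharing concrete, the sketch below assumes a one-dimensional layer with stride 1 and no padding, so that m = n − k + 1. It computes the layer once with a single shared kernel and once as an ordinary m x n matrix multiply, showing that a convolutional layer is a fully connected layer whose matrix reuses the same k weights in every row. All function names are illustrative.

```python
def conv1d_shared(x, w):
    """Each of the m = n - k + 1 outputs applies the SAME k weights to a window of x."""
    n, k = len(x), len(w)
    return [sum(w[u] * x[i + u] for u in range(k)) for i in range(n - k + 1)]


def as_dense_matrix(w, n):
    """Build the m x n matrix a fully connected layer would need for the same map."""
    k = len(w)
    m = n - k + 1
    W = [[0] * n for _ in range(m)]
    for i in range(m):
        for u in range(k):
            W[i][i + u] = w[u]  # the same k values in every row, shifted by one column
    return W


def matvec(W, x):
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


x = [1, 2, 3, 4, 5]  # n = 5 input nodes
w = [1, -1]          # k = 2 shared weights
dense = as_dense_matrix(w, len(x))
print(conv1d_shared(x, w))  # [-1, -1, -1, -1]
print(matvec(dense, x))     # [-1, -1, -1, -1]
print(len(w), "shared parameters vs", len(dense) * len(x), "in a dense layer")
```

Both computations agree, but the convolutional layer stores only k weights instead of m x n.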

Terminology Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Case study: LeNet-5

LeNet-5
• Proposed in “Gradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
• Applies convolution to 2D images (MNIST) and trains with backpropagation
• Structure: 2 convolutional layers (with pooling) + 3 fully connected layers
• Input size: 32x32x1
• Convolution kernel size: 5x5
• Pooling: 2x2
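The slide gives only the pooling window size. The sketch below assumes max pooling with the 2x2 window moved in steps of 2; the 1998 paper itself used a trainable averaging ("subsampling") layer, so treat this as the common modern variant rather than the original. `max_pool2d` is a hypothetical helper.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window, moving `stride` cells at a time."""
    H, W = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i + u][j + v] for u in range(size) for v in range(size))
            for j in range(0, W - size + 1, stride)
        ]
        for i in range(0, H - size + 1, stride)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
print(max_pool2d(fmap))  # [[4, 5], [6, 7]]

# A 2x2 window with stride 2 halves each spatial side, e.g. 28x28 becomes 14x14.
pooled = max_pool2d([[0] * 28 for _ in range(28)])
print(len(pooled), len(pooled[0]))  # 14 14
```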

LeNet-5 Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

LeNet-5, layer by layer:
• Convolution: filter 5x5, stride 1x1, 6 filters (32x32x1 → 28x28x6)
• Pooling: 2x2, stride 2 (28x28x6 → 14x14x6)
• Convolution: filter 5x5x6, stride 1x1, 16 filters (14x14x6 → 10x10x16)
• Pooling: 2x2, stride 2 (10x10x16 → 5x5x16)
• Fully connected: weight matrix 400x120 (the 5x5x16 = 400 pooled values are flattened)
• Fully connected: weight matrix 120x84
• Fully connected: weight matrix 84x10 (one output per digit class)
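The walkthrough can be traced numerically. The sketch follows the slides' description as written (full 5x5xC filters, stride 1, no padding, 2x2/stride-2 pooling) and additionally assumes one bias per filter and per fully connected output, and parameter-free pooling. The 1998 paper connects its second convolutional layer to only a subset of the earlier maps and gives its subsampling layers trainable coefficients, so its parameter counts differ. Function names are hypothetical.

```python
def out_size(size, window, stride=1):
    """Spatial side length after a window slides with the given stride and no padding."""
    return (size - window) // stride + 1


def lenet5_summary(side=32, channels=1):
    """List (layer, output shape, #parameters) for the LeNet-5 variant on the slides."""
    rows = []
    for n_filters in (6, 16):
        # Convolution: n_filters kernels of size 5x5xchannels, one bias each (assumed).
        params = n_filters * (5 * 5 * channels + 1)
        side, channels = out_size(side, 5), n_filters
        rows.append((f"conv 5x5, {n_filters} filters", (side, side, channels), params))
        # Pooling: 2x2 window, stride 2, modeled without trainable parameters.
        side = out_size(side, 2, stride=2)
        rows.append(("pool 2x2, stride 2", (side, side, channels), 0))
    fan_in = side * side * channels  # 5 * 5 * 16 = 400 flattened values
    for fan_out in (120, 84, 10):
        # Fully connected: fan_in x fan_out weight matrix plus fan_out biases (assumed).
        rows.append((f"fc {fan_in}x{fan_out}", (fan_out,), fan_in * fan_out + fan_out))
        fan_in = fan_out
    return rows


rows = lenet5_summary()
for name, shape, params in rows:
    print(f"{name:22s} {str(shape):14s} {params:6d}")
print("total parameters:", sum(p for _, _, p in rows))  # total parameters: 61706
```

Most of the parameters sit in the first fully connected layer (400x120), while the two convolutional layers together need fewer than 2,600 because their weights are shared.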

Software platforms for CNN (updated in April 2016; check online for more recent ones)

Platform: Marvin (marvin.is)

Platform: Marvin by

LeNet in Marvin: convolutional layer

LeNet in Marvin: pooling layer

LeNet in Marvin: fully connected layer

Platform: Caffe (caffe.berkeleyvision.org)

LeNet in Caffe

Platform: TensorFlow (tensorflow.org)

Others
• Theano – CPU/GPU symbolic expression compiler in Python (from the MILA lab at the University of Montreal)
• Torch – provides a Matlab-like environment for state-of-the-art machine learning algorithms in Lua
• Lasagne – a lightweight library to build and train neural networks in Theano
• See: http://deeplearning.net/software_links/

Optimization: momentum

Basic algorithms
• Minimize the (regularized) empirical loss
$$\hat{L}_R(\theta) = \frac{1}{n}\sum_{t=1}^{n} l(\theta, x_t, y_t) + R(\theta)$$
where the hypothesis is parametrized by $\theta$
• Gradient descent
$$\theta_{t+1} = \theta_t - \eta_t \nabla \hat{L}_R(\theta_t)$$
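A minimal sketch of full-batch gradient descent on a toy problem. Everything about the problem is an assumption made for illustration: a scalar parameter θ, the squared loss l(θ, x, y) = ½(θx − y)², the regularizer R(θ) = ½λθ², and a constant step size η_t = η. Because this objective is quadratic, its minimizer has the closed form θ* = mean(xy) / (mean(x²) + λ), which the result can be checked against.

```python
# Toy data (illustrative): y = 3x exactly.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [3.0 * x for x in XS]


def grad_l(theta, x, y):
    """Gradient of the assumed loss l(theta, x, y) = 0.5 * (theta * x - y) ** 2."""
    return (theta * x - y) * x


def grad_R(theta, lam):
    """Gradient of the assumed regularizer R(theta) = 0.5 * lam * theta ** 2."""
    return lam * theta


def gradient_descent(xs, ys, lam=0.01, eta=0.05, steps=200, theta=0.0):
    """theta_{t+1} = theta_t - eta * gradient of the full regularized empirical loss."""
    n = len(xs)
    for _ in range(steps):
        # Average the per-example gradients over ALL n points, then add grad R.
        g = sum(grad_l(theta, x, y) for x, y in zip(xs, ys)) / n + grad_R(theta, lam)
        theta = theta - eta * g
    return theta


lam = 0.01
theta_hat = gradient_descent(XS, YS, lam=lam)
mean_xy = sum(x * y for x, y in zip(XS, YS)) / len(XS)
mean_xx = sum(x * x for x in XS) / len(XS)
theta_star = mean_xy / (mean_xx + lam)
print(theta_hat, theta_star)  # the two agree to many decimal places
```

Each step touches every data point, which is what makes plain gradient descent expensive on large data sets.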

Mini-batch stochastic gradient descent
• Instead of one data point, work with a small batch of $b$ points $(x_{tb+1}, y_{tb+1}), \dots, (x_{tb+b}, y_{tb+b})$
• Update rule
$$\theta_{t+1} = \theta_t - \eta_t \nabla\left[\frac{1}{b}\sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) + R(\theta_t)\right]$$
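The mini-batch update can be sketched on the same kind of toy problem (scalar θ, squared loss, ½λθ² regularizer, all assumptions). The sketch takes batches in order, as the indices tb+1, …, tb+b suggest, and wraps around to the start of the data once it runs out; that wrap-around is a choice of this sketch, and in practice the data are usually reshuffled on each pass. `b`, `eta`, and the data are illustrative.

```python
# Toy data (illustrative): y = 3x exactly.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [3.0 * x for x in XS]


def grad_l(theta, x, y):
    """Gradient of the assumed loss l(theta, x, y) = 0.5 * (theta * x - y) ** 2."""
    return (theta * x - y) * x


def grad_R(theta, lam):
    """Gradient of the assumed regularizer R(theta) = 0.5 * lam * theta ** 2."""
    return lam * theta


def minibatch_sgd(xs, ys, b=2, lam=0.0, eta=0.05, steps=200, theta=0.0):
    """theta_{t+1} = theta_t - eta * (loss gradient averaged over batch t + grad R)."""
    n = len(xs)
    for t in range(steps):
        # Batch t holds points tb, ..., tb + b - 1 (0-based), wrapping past the end.
        batch = [(t * b + i) % n for i in range(b)]
        g = sum(grad_l(theta, xs[j], ys[j]) for j in batch) / b + grad_R(theta, lam)
        theta = theta - eta * g
    return theta


print(minibatch_sgd(XS, YS))  # close to 3.0, the exact slope of the toy data
```

With λ = 0 and noise-free data, every batch gradient vanishes at θ = 3, so each step shrinks the error and the iterates converge there even though each step sees only b points.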

Momentum
• Drawback of SGD: it can be slow when the gradient is small
• Observation: when the gradient is consistent across consecutive steps, we can take larger steps
• Metaphor: a marble rolling down a gentle slope

Momentum (figure): contours show the loss function, the path shows SGD with momentum, and arrows show the stochastic gradient. Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Momentum
• Work with a small batch of $b$ points $(x_{tb+1}, y_{tb+1}), \dots, (x_{tb+b}, y_{tb+b})$
• Keep a momentum variable $v_t$ and set a decay rate $\alpha$
• Update rule
$$v_t = \alpha v_{t-1} - \eta_t \nabla\left[\frac{1}{b}\sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) + R(\theta_t)\right]$$
$$\theta_{t+1} = \theta_t + v_t$$
• Practical guide: $\alpha$ is set to 0.5 until the initial learning stabilizes and is then increased to 0.9 or higher.
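A sketch of the momentum update on the same toy problem (scalar θ, squared loss, ½λθ² regularizer, all assumptions). It follows the practical guide by using α = 0.5 for an initial `warmup` period and 0.9 afterwards; reading "until the initial learning stabilizes" as a fixed 20 steps is an assumption of the sketch. The run below uses the whole data set as one batch (b = n) so the outcome is easy to predict; a smaller `b` gives the mini-batch version.

```python
# Toy data (illustrative): y = 3x exactly.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [3.0 * x for x in XS]


def grad_l(theta, x, y):
    """Gradient of the assumed loss l(theta, x, y) = 0.5 * (theta * x - y) ** 2."""
    return (theta * x - y) * x


def grad_R(theta, lam):
    """Gradient of the assumed regularizer R(theta) = 0.5 * lam * theta ** 2."""
    return lam * theta


def sgd_momentum(xs, ys, b=2, lam=0.0, eta=0.05, steps=400, warmup=20, theta=0.0):
    """v_t = alpha * v_{t-1} - eta * g_t;  theta_{t+1} = theta_t + v_t."""
    n = len(xs)
    v = 0.0  # momentum variable, starts at rest
    for t in range(steps):
        # alpha = 0.5 during the (assumed) warm-up period, then 0.9.
        alpha = 0.5 if t < warmup else 0.9
        batch = [(t * b + i) % n for i in range(b)]
        g = sum(grad_l(theta, xs[j], ys[j]) for j in batch) / b + grad_R(theta, lam)
        v = alpha * v - eta * g  # decay the old velocity, add the new gradient step
        theta = theta + v
    return theta


print(sgd_momentum(XS, YS, b=len(XS), steps=1000))  # full-batch run; close to 3.0
```

Because `v` accumulates past steps, consistent gradients add up into larger moves, which is the "rolling marble" behavior described earlier.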
