Convolutional Neural Networks Basics
Praveen Krishnan
Overview
- Paradigm Shift
- Simple Network
- Convolutional Network
- Layers
- Case Study 1: AlexNet
- Training
- Generalization
- Visualizations
- Transfer Learning
- Case Study 2: JAZNet
- Practical Aspects: gradient checks, data, GPU, coding/libraries
Paradigm Shift
From hand-crafted feature extraction (SIFT, HoG, …) followed by a classifier, to feature learning (CNN, RBM, …) followed by a classifier.
Layers form a hierarchical decomposition (L1-L4), e.g. coding → pooling → part models → object ("sparrow").
A simple network
x0 → f1(·; w1) → x1 → … → x(n-1) → fn(·; wn) → xn

Here each output xj depends on the previous input x(j-1) through a function fj with parameters wj.
Feed forward neural network

[Diagram: an input vector x0 = (x0,1, …, x0,d) is mapped through weight matrices W1, …, Wn to an output vector xn = (xn,1, …, xn,c); the figure zooms in on one layer.]

The output xn is compared against the one-hot ground truth y = [0, 0, …, 1, …, 0] by a LOSS layer, producing a scalar z.

Weight updates are made using back propagation of gradients through W1, …, Wn.
Convolutional Network
Fully connected layer (200x200x3 input)
- #Hidden units: 120,000
- #Params: 12 billion
- Needs huge training data to prevent overfitting!

Locally connected layer (3x3x3 receptive field)
- #Hidden units: 120,000
- #Params: 1.08 million
- Useful when the image is highly registered
Convolutional Network
Convolutional layer (one shared 3x3x3 filter)
- #Hidden units: 120,000
- #Params: 27
- #Feature maps: 1
- Exploits the stationarity property.
Convolutional Network
- Use of multiple feature maps.
- Sharing parameters.
- Exploits stationarity of statistics.
- Preserves locality of pixel dependencies.

[Diagram: a 200x200x3 image convolved with 3x3x3 receptive fields, one set of shared weights per feature map.]
Convolutional Network
Image size: W1 x H1 x D1 (e.g. 200x200x3)
Receptive field size: F x F, stride S
#Feature maps: K
Q. Find out W2, H2 and D2 of the output volume.
Convolutional Network
Image size: W1 x H1 x D1 (e.g. 200x200x3); receptive field F x F, stride S; K feature maps.

    W2 = (W1 - F)/S + 1
    H2 = (H1 - F)/S + 1
    D2 = K

It is also common to zero-pad the input to preserve its spatial size.
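A quick sanity check of these formulas as a minimal Python sketch; the helper name and the optional zero-padding argument P are illustrative:

    def conv_output_size(W1, H1, D1, F, S, K, P=0):
        """Output volume of a conv layer: W1xH1xD1 input, FxF receptive
        field, stride S, K feature maps, zero padding P on each side."""
        W2 = (W1 - F + 2 * P) // S + 1
        H2 = (H1 - F + 2 * P) // S + 1
        D2 = K
        return W2, H2, D2

    # 200x200x3 image, 3x3 receptive field, stride 1, 5 feature maps:
    print(conv_output_size(200, 200, 3, F=3, S=1, K=5))  # (198, 198, 5)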
Convolutional Layer
For output feature map i of layer n:

    y_i^n = f( sum_{j=1..F} w_ij^n * x_j^(n-1) )

Here "f" is a non-linear activation function, F = no. of input feature maps, n = layer index, and "*" represents convolution/correlation.
- Q. Is there a difference between correlation and convolution in a learned network?
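A naive numpy sketch of this layer's forward pass under the definition above (single output map, valid cross-correlation; the helper name is illustrative):

    import numpy as np

    def conv_forward(x, w, f=np.tanh):
        """x: input maps (F, H, W); w: filters (F, k, k) for one output map.
        Computes y = f(sum_j w_j * x_j) with valid cross-correlation."""
        F, H, W = x.shape
        _, k, _ = w.shape
        y = np.zeros((H - k + 1, W - k + 1))
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                y[i, j] = np.sum(w * x[:, i:i + k, j:j + k])
        return f(y)

As to the question: since the filters are learned, a network trained with correlation simply learns flipped filters relative to true convolution, so the distinction makes no practical difference.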
Activation Functions
Sigmoid, tanh, ReLU, Leaky ReLU, maxout
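Minimal numpy sketches of these activations; the maxout sketch assumes its k linear pieces are already computed (Goodfellow et al.'s formulation):

    import numpy as np

    def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
    def tanh(x): return np.tanh(x)
    def relu(x): return np.maximum(0.0, x)
    def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)

    def maxout(z):
        # z: (k, n) pre-activations from k linear pieces; max across pieces
        return z.max(axis=0)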
A Typical Supervised CNN Architecture
A typical deep convolutional network stacks other layers besides convolution: pooling, normalization, fully connected, etc.

CONV → POOL → NORM → CONV → POOL → NORM → FC → SOFTMAX
Pooling Layer
Aggregation over space or feature type. Provides invariance to image transformations and makes the representation more compact.
Pooling types: max, average, L2, etc.

Max pooling example (pool size 2x2, stride 2):

    Input:       Output:
    2 8 9 4      8 9
    3 6 5 7      5 7
    3 1 6 4
    2 5 7 3
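A minimal numpy sketch of max pooling matching the example above:

    import numpy as np

    def max_pool(x, size=2, stride=2):
        """Max pooling over a 2-D feature map (no padding)."""
        H, W = x.shape
        out = np.zeros((1 + (H - size) // stride, 1 + (W - size) // stride))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                r, c = i * stride, j * stride
                out[i, j] = x[r:r + size, c:c + size].max()
        return out

    x = np.array([[2, 8, 9, 4], [3, 6, 5, 7], [3, 1, 6, 4], [2, 5, 7, 3]])
    print(max_pool(x))  # [[8. 9.] [5. 7.]]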
Normalization
Local contrast normalization (Jarrett et al., ICCV'09)
- Reduces illumination artifacts.
- Performs local subtractive and divisive normalization.

Local response normalization (Krizhevsky et al., NIPS'12)
- A form of lateral inhibition across channels.
Batch normalization (More later)
Fully connected
- Multi-layer perceptron; plays the role of a classifier.
- Generally used in the final layers to classify the object, represented in terms of discriminative parts and higher semantic entities.
Case Study: AlexNet
- Winner of ImageNet LSVRC-2012.
- Trained over 1.2M images using SGD with regularization.
- Deep architecture (60M parameters).
- Optimized GPU implementation (cuda-convnet).
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS 2012.
Case Study: AlexNet
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS 2012.

Architecture:
CONV 11x11, 96 maps → LRN → MAX POOL 2x2 →
CONV 5x5, 256 maps → LRN → MAX POOL 2x2 →
CONV 3x3, 384 maps → CONV 3x3, 384 maps → CONV 3x3, 256 maps → MAX POOL 2x2 →
FC 4096 → FC 4096 → SOFTMAX 1000
Training
Learning: minimizing the loss function (incl. regularization) w.r.t. the parameters of the network.

Mini-batch stochastic gradient descent:
- Sample a batch of data.
- Forward propagation.
- Backward propagation.
- Parameter update.
Training
Back propagation
Consider a layer f with parameters w, which maps its input x^(n-1) to its output x^n = f(x^(n-1), w).

Here z is the scalar loss computed by the loss function h. By the chain rule, the derivative of the loss w.r.t. the parameters is

    ∂z/∂w = (∂z/∂x^n) (∂f/∂w)

and the gradient passed down to the layer's input is

    ∂z/∂x^(n-1) = (∂z/∂x^n) (∂f/∂x^(n-1))

a recursive equation which is applicable to each layer.
Training
Parameter update
Stochastic gradient descent:

    θ = θ - η ∇θ z

Here η is the learning rate and θ is the set of all parameters.

Stochastic gradient descent with momentum:

    v = μ v - η ∇θ z
    θ = θ + v

More in the coming slides…
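A minimal numpy sketch of both update rules (the learning-rate and momentum values are illustrative):

    import numpy as np

    def sgd_step(theta, grad, lr=0.01):
        return theta - lr * grad

    def momentum_step(theta, v, grad, lr=0.01, mu=0.9):
        v = mu * v - lr * grad      # velocity accumulates gradients
        return theta + v, v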
Training
Loss functions.
Measures the compatibility between prediction and ground truth.
one vs. rest classification
Soft-max classifier (cross-entropy loss):

    L = -log( e^{x_y} / Σ_j e^{x_j} )

Derivative w.r.t. x_i:

    ∂L/∂x_i = p_i - 1(i = y),  where p_i = e^{x_i} / Σ_j e^{x_j}

Proof?
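A minimal numpy sketch of the soft-max loss and its derivative, with the usual max-subtraction trick for numerical stability:

    import numpy as np

    def softmax_loss(x, y):
        """x: class scores (C,); y: ground-truth class index."""
        x = x - x.max()                   # stability shift
        p = np.exp(x) / np.exp(x).sum()   # class probabilities
        loss = -np.log(p[y])
        grad = p.copy()
        grad[y] -= 1.0                    # dL/dx_i = p_i - 1(i == y)
        return loss, grad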
Training
Loss functions.
one vs. rest classification
Hinge loss:

    L = Σ_{j≠y} max(0, x_j - x_y + 1)

Hinge loss is a convex function but not differentiable; a sub-gradient exists. Sub-gradient w.r.t. x_i:

    ∂L/∂x_j = 1(x_j - x_y + 1 > 0)  for j ≠ y
    ∂L/∂x_y = -Σ_{j≠y} 1(x_j - x_y + 1 > 0)
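A matching numpy sketch of the hinge loss and its sub-gradient (margin fixed at 1, as above):

    import numpy as np

    def hinge_loss(x, y, delta=1.0):
        """Multi-class hinge loss; x: scores (C,), y: true class index."""
        margins = np.maximum(0.0, x - x[y] + delta)
        margins[y] = 0.0
        loss = margins.sum()
        grad = (margins > 0).astype(float)   # sub-gradient for j != y
        grad[y] = -grad.sum()                # true class collects -count
        return loss, grad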
Training
Loss functions.
Regression
Euclidean loss / squared loss:

    L = (1/2) Σ_i (x_i - y_i)^2

Derivative w.r.t. x_i:

    ∂L/∂x_i = x_i - y_i
Training
Visualization of the loss function

[Figure: loss surface over θ, annotated with initialization, step size/learning rate, step direction, and momentum.]

Typically viewed as a highly non-convex function, but more recently it's believed to have smoother surfaces, though with many saddle regions!
Training
Momentum
- Better convergence rates.
- Physical perspective: the gradient affects the velocity of the update; velocity builds up along directions of consistent gradient.
- Momentum update:

    v = μ v - η ∇θ z    (velocity)
    θ = θ + v           (position)
Training
Learning Rates (η)
- Controls the kinetic energy of the updates.
- It is important to know when to decay η.
- Common methods (annealing, sketched below):
  - Step decay
  - Exponential/log-space decay
  - Manual
- Adaptive learning-rate methods: Adagrad, RMSprop
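Minimal sketches of two annealing schedules (the decay constants are illustrative):

    import math

    def step_decay(lr0, epoch, drop=0.5, every=10):
        """Multiply the learning rate by `drop` every `every` epochs."""
        return lr0 * (drop ** (epoch // every))

    def exp_decay(lr0, epoch, k=0.05):
        """Exponential decay: lr = lr0 * exp(-k * epoch)."""
        return lr0 * math.exp(-k * epoch)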
Figure courtesy: Fei-Fei et al., cs231n
Training
Initialization
- Never initialize weights to all zeros or the same value. (Why? All units then compute the same output and receive the same gradient, so they never differentiate.)
- Popular techniques:
  - Random values sampled from N(0, 1).
  - Xavier initialization (Glorot et al., JMLR'10): the scale depends on the number of input and output neurons (fan-in and fan-out); initial weights are sampled from N(0, Var(w)) with

        Var(w) = 2 / (fan_in + fan_out)

  - Pre-training, e.g. using RBMs (Hinton et al., Science 2006).
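A minimal numpy sketch of Xavier initialization under this variance rule:

    import numpy as np

    def xavier_init(fan_in, fan_out):
        """Weights ~ N(0, 2/(fan_in + fan_out)), per Glorot et al. '10."""
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.randn(fan_out, fan_in) * std

    W1 = xavier_init(784, 256)   # e.g. first layer of an MNIST net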
Training
Generalization
How to prevent?
- Underfitting: use deeper networks.
- Overfitting:
  - Stopping at the right time.
  - Weight penalties: L1, L2, max norm.
  - Dropout.
  - Model ensembles (e.g. the same model with different initializations).
[Plot: top-5 error and training/validation (val-1, val-2) accuracy vs. epoch, illustrating overfitting.]
Generalization
Dropout
- Stochastic regularization; the idea is applicable to many other networks.
- Hidden units are dropped out randomly (each retained with fixed probability p, say 0.5), temporarily, while training.
- At test time all units are preserved, but their outputs are scaled by p.
- Dropout together with a max-norm constraint is found to be useful.
Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014. [Figure: network before and after dropout.]
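A minimal numpy sketch of dropout for a single layer, with the train/test behavior described above (here p is the retention probability):

    import numpy as np

    def dropout(h, p=0.5, train=True):
        """h: layer activations; keep each unit with probability p."""
        if train:
            mask = (np.random.rand(*h.shape) < p)
            return h * mask      # drop units temporarily during training
        return h * p             # test time: keep all units, scale by p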
Generalization
[Figure: features learned by an autoencoder with one hidden layer on the MNIST dataset, without and with dropout; dropout yields noticeably sparser features.]
Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014.
Generalization
Batch Normalization
Covariate shift
- Defined as a change in the distribution of a function's domain.
- Randomized mini-batches reduce the effect of covariate shift.

Internal covariate shift
- Updates to the current layer's parameters change the distribution of the inputs to successive layers.
- This slows down training and requires careful initialization.

Image credit: https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b
Generalization
Batch Normalization
- Fixes the distribution of each layer's inputs as training progresses.
- Faster convergence.
Ioffe and Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015.
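A minimal numpy sketch of the batch-norm transform for a mini-batch (γ and β are the learned scale and shift; the running statistics used at test time are omitted):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        """x: (N, D) mini-batch; normalize per feature, then scale/shift."""
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
        return gamma * x_hat + beta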
Some results on ImageNet
Top-5 classification accuracy. [Chart: GoogLeNet vs. Clarifai vs. AlexNet. Source: Krizhevsky et al., NIPS'12]
Visualizing CNNs
CNNs are cool, but some of the questions below need answers before we move forward:
- How do I interpret the learned filters?
- What is it that stimulates/excites a neuron?
- How do I decide the architecture, or improve existing ones?

To answer these we need to probe the learned models:
- Deconvolutional networks [Zeiler et al., ICCV'11, ECCV'14]
- Synthesizing images [Simonyan et al., ICLR'14; Mahendran et al., CVPR'15]

Visualizing the first conv. layer is possible, but how about the later layers? (Source: Krizhevsky et al., NIPS'12)
Zeiler and Fergus. Visualizing and Understanding Convolutional Networks. ECCV 2014.
Visualizing CNNs
Deconvnets
- Non-parametric approach: projects feature activations back to the input space.
- Analyses a trained model, using validation data to interpret the feature activations.
- Visualizes a single activation, not the joint activity.
- Helps in understanding the generalizing ability of CNNs.

Zeiler and Fergus. Visualizing and Understanding Convolutional Networks. ECCV 2014. (Source: Zeiler et al., ECCV'14)
Visualizing CNNs
A. How do I interpret the learned filters? [Figure: deconvnet reconstructions; one filter responds to grass. Source: Zeiler and Fergus, ECCV 2014]
Visualizing CNNs
A. What is it that stimulates/excites a neuron? [Figure: top activations per neuron. Source: Zeiler and Fergus, ECCV 2014]
Visualizing CNNs
Two techniques: class model visualization and image-specific class saliency visualization. [Example figure: "washing machine".]
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. CoRR 2014.
Visualizing CNNs
Class Model Visualization
- Find an L2-regularized image I which maximizes the class score:

    argmax_I  S_c(I) - λ ||I||^2

  Here S_c(I) is the score of class c before the soft-max.
- Initialize with the mean image.
- Back-propagate to update the input pixels, keeping the weights of the intermediate layers fixed.

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. CoRR 2014.
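A schematic sketch of this gradient-ascent procedure; class_score_grad stands in for a framework-specific backward pass through the fixed, trained network and is hypothetical:

    import numpy as np

    def class_model_visualization(class_score_grad, shape, steps=200,
                                  lr=1.0, lam=1e-4, mean_image=None):
        """Gradient ascent on the input to maximize S_c(I) - lam*||I||^2.
        class_score_grad(I) must return dS_c/dI for the fixed net."""
        I = np.zeros(shape) if mean_image is None else mean_image.copy()
        for _ in range(steps):
            g = class_score_grad(I) - 2.0 * lam * I   # regularized gradient
            I += lr * g                               # ascend the class score
        return I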
Visualizing CNNs
Image-Specific Class Saliency Visualization
- Understanding the spatial support of a class in a specific image.
- The class score is a non-linear mapping of the image, but around a given image I0 it can be approximated with a first-order Taylor expansion:

    S_c(I) ≈ w^T I + b,  where  w = ∂S_c/∂I evaluated at I0

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. CoRR 2014.
[Figure: original image, its spatial support (saliency map), the derived object localization mask, and the GrabCut segmentation.]
Transfer Learning
A key observation from the visualizations: early layers learn general features (Gabor filters, color blobs), while later layers learn task-specific ones (e.g. dog faces).
Yosinski J, Clune J, Bengio Y, and Lipson H. How transferable are features in deep neural networks? NIPS 2014.
Transfer Learning
A key observation from the visualizations raises further questions:
- Can we quantify layer generality/specificity?
- Where does the transition occur?
- Is the transition sudden or spread over layers?
Yosinski J, Clune J, Bengio Y, and Lipson H. How transferable are features in deep neural networks? NIPS 2014.
Transfer Learning
Transfer performance experiment
- Two tasks, A and B.
- Types of networks: selffer (BnB / BnB+) and transfer (AnB / AnB+), where "+" denotes fine-tuning of the transferred layers.
- Datasets: random split and dissimilar split.

Observations:
- Higher-level neurons are more specialized.
- Co-adapted neurons exist between layers, which makes optimization difficult.

Yosinski J, Clune J, Bengio Y, and Lipson H. How transferable are features in deep neural networks? NIPS 2014. [Plot: accuracy of baseB, BnB, BnB+, AnB, and AnB+.]
Transfer Learning
Take away message
Initializing a network with transferred features almost always gives better generalization.

Notes:
- If the target dataset is small, retrain only the softmax layer.
- If the target dataset is reasonably large, retrain a larger portion of the network, fine-tuning the initial layers.
Transfer Learning
Benchmarks
Razavian et al., CVPRW 2014; Chatfield et al., BMVC 2014
Case Study: JAZNet
Scene text recognition
- Recognition of word images holistically.
- Synthetic generation of training data.
- State-of-the-art results on most document datasets.

Model 1: encoding words model, with a synthetic data engine.
- Challenge: huge number of target classes, ~90K (the size of the English vocabulary).
- Incremental training; uses 9M training samples.

Jaderberg, Max, et al. Synthetic data and artificial neural networks for natural scene text recognition. arXiv 2014.
Case Study: JAZNet
Model 2: sequence of characters.
Model 3: bag of N-grams.
Jaderberg, Max, et al. Synthetic data and artificial neural networks for natural scene text recognition. arXiv 2014.
Practical Aspects
Data preprocessing
- (0,1) normalization
- Whitening

Data augmentation
- Perturb the image to make the model more resilient to variations:
  - Cropping
  - Flipping
  - Jittering
  - Degradation models specific to the modality (e.g. text)
- Applicable at both train and test time.
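A minimal numpy sketch of two common augmentations, random cropping and horizontal flipping (the crop size is illustrative):

    import numpy as np

    def random_augment(img, crop=24):
        """img: (H, W, C). Random crop followed by a random horizontal flip."""
        H, W, _ = img.shape
        r = np.random.randint(0, H - crop + 1)
        c = np.random.randint(0, W - crop + 1)
        out = img[r:r + crop, c:c + crop]
        if np.random.rand() < 0.5:
            out = out[:, ::-1]   # flip left-right
        return out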
Practical Aspects
Modern CNN libraries
- Theano
- Torch
- Caffe
- MatConvNet
- TensorFlow
- and many more…