CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS Neural networks Fully - PowerPoint PPT Presentation

CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS

Neural networks • Fully connected networks • Neuron • Non-linearity • Softmax layer • DNN training • Loss function and regularization • SGD and backprop • Learning rate • Overfitting – dropout, batchnorm • CNN, RNN, LSTM, GRU <- This class

Notes on non-linearity • Sigmoid: the model gets stuck (the gradient vanishes) when inputs fall far from 0. The output is always positive. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/

Notes on non-linearity • Tanh: the output can be positive or negative. The model still gets stuck (the gradient vanishes) when inputs fall far from 0. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/

Notes on non-linearity • ReLU: constant, high gradient for positive inputs. Fast to compute. The gradient is zero for negative inputs, so those units stop updating. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/

Notes on non-linearity • Leaky ReLU: the negative part now has some gradient. On real tasks it does not seem much better than ReLU. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/
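The notes on the four non-linearities can be checked numerically. The sketch below is only an illustration: the function names and the leak factor `alpha=0.01` are assumptions, not taken from the slides. It prints each derivative far from 0 and at 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # near 0 when |z| is large

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2  # also near 0 when |z| is large

def d_relu(z):
    return 1.0 if z > 0 else 0.0  # no gradient on the negative side

def d_leaky_relu(z, alpha=0.01):  # alpha is an assumed leak factor
    return 1.0 if z > 0 else alpha

for z in (-10.0, 0.0, 10.0):
    print(z, d_sigmoid(z), d_tanh(z), d_relu(z), d_leaky_relu(z))
```

Sigmoid and tanh lose almost all gradient at both extremes, ReLU loses it only on the negative side, and leaky ReLU keeps a small gradient everywhere.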

So far we have learned about bases and projections

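Projections and neural network weights • $w^T x$ [Figure: x projected onto the weight vector w]

Projections and neural network weights • $W^T x$ [Figure: x projected onto the weight vectors $w_1$ and $w_2$]

Projections and neural network weights • $W^T x$ • Fisher projection $= V^T W^T x = (WV)^T x$ [Figure: x is projected onto $w_1$, $w_2$ to give y; y is then projected onto the LDA projections $v_1$, $v_2$]

Projections and neural network weights • $W^T x$ • Fisher projection $= V^T W^T x = (WV)^T x$ [Figure: x is projected onto $w_1$, $w_2$ to give y; y is then projected onto the LDA projections $v_1$, $v_2$; the merged directions are $Wv_1$ and $Wv_2$]

Projections and neural network weights • $W^T x$ • Neural network layers as feature transforms • Non-linearity prevents merging of layers • Fisher projection $= V^T W^T x = (WV)^T x$ [Figure: x is projected onto $w_1$, $w_2$ to give y; y is then projected onto the LDA projections $v_1$, $v_2$; the merged directions are $Wv_1$ and $Wv_2$]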
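A small numeric check of the merging claim, as a sketch only. The matrices `W`, `V` and the input `x` are made-up values and the helper names are hypothetical. Two linear projections collapse into the single projection $(WV)^T x$, but putting a ReLU between them changes the result, so the layers can no longer be merged.

```python
def project(W, x):
    """Compute W^T x, with W given as a list of rows (len(x) rows, k columns)."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]

def matmul(A, B):
    """Plain matrix product A B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]  # 3 inputs -> 2 features (assumed values)
V = [[2.0], [-1.0]]                         # 2 features -> 1 output (assumed values)
x = [1.0, 2.0, 3.0]

y = project(W, x)                      # first layer: W^T x
two_layers = project(V, y)             # second layer: V^T (W^T x)
merged = project(matmul(W, V), x)      # single layer: (WV)^T x

relu_y = [max(0.0, v) for v in y]      # non-linearity between the layers
with_relu = project(V, relu_y)

print(two_layers, merged, with_relu)
```

Here `two_layers` and `merged` agree, while `with_relu` differs because the ReLU zeroes the negative feature before the second projection.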

Shift in feature space • $W^T x$ • What happens if I have a person that is off-frame? • Need another filter that is shifted • Can we do better? [Figure: the same projection diagram, x projected onto $w_1$, $w_2$ and then onto the LDA projections $v_1$, $v_2$]

Convolution • Continuous convolution: $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$ • Meaning of $(t - \tau)$: flip, then shift

Convolution visually Demo https://en.wikipedia.org/wiki/Convolution

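Discrete convolution • Discrete convolution: $(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$ • Continuous convolution: $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$ • Same concept as the continuous version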
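A minimal sketch of discrete convolution for finite sequences, written directly from the sum $\sum_m f[m]\, g[n - m]$. The function name and the example sequences are assumptions for illustration. The index `n - m` is what performs the "flip, then shift".

```python
def convolve(f, g):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            if 0 <= n - m < len(g):      # g is treated as zero outside its support
                out[n] += f[m] * g[n - m]
    return out

f = [1, 2, 3]
g = [0, 1, 0.5]
print(convolve(f, g))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```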

Matched filters • We can use convolution to detect things that match our pattern • Convolution can be considered a filter (Why? Take ASR next semester :) ) • If the filter detects our pattern, it shows up as a clear peak even if there is noise. • Demo

Matched filters [Figure: signal (blue) and matched filter (red); the filter output shows a matched peak where the pattern occurs]
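The demo can be approximated with a few lines of code. This is a sketch under stated assumptions: the pattern, the noise level, and the hiding position `start` are all made up, and the noise is bounded uniform noise so the outcome does not depend on luck. The pattern is hidden in the noisy signal, and sliding the same pattern over the signal gives the largest score exactly where it was hidden.

```python
import random

def correlate_valid(signal, pattern):
    """Slide the pattern over the signal (no flip) and score every offset."""
    k = len(pattern)
    return [sum(signal[i + j] * pattern[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

random.seed(0)
pattern = [1.0, 1.0, 1.0, -1.0, 1.0]   # assumed pattern to detect
start = 20                              # assumed position of the pattern
signal = [random.uniform(-0.2, 0.2) for _ in range(50)]  # bounded noise
for j, p in enumerate(pattern):
    signal[start + j] += p

scores = correlate_valid(signal, pattern)
peak = max(range(len(scores)), key=lambda i: scores[i])
print("pattern hidden at", start, "peak found at", peak)
```

With this pattern the score at the true position is at least 4, while every other offset stays at or below 2, so the peak survives the noise.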

Convolution and Cross-Correlation • Convolution: flip the filter, then shift • (Cross-)correlation: shift without flipping • Convolution and cross-correlation are the same if g(t) is symmetric (an even function). • For some unknown reason, people use "convolution" in CNNs to mean cross-correlation. From this point onwards, when we say convolution we mean cross-correlation.
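The relation between the two operations is easy to verify on short sequences. In this sketch (function names and example values are assumptions), convolution is implemented as cross-correlation with a flipped filter, so the two agree for a symmetric filter and differ for an asymmetric one.

```python
def cross_correlate(f, g):
    """Valid cross-correlation: slide g over f without flipping."""
    k = len(g)
    return [sum(f[i + j] * g[j] for j in range(k)) for i in range(len(f) - k + 1)]

def convolve_valid(f, g):
    """Valid convolution: flip g, then slide it over f."""
    return cross_correlate(f, g[::-1])

f = [1, 2, 4, 8, 16]
symmetric = [1, 2, 1]
asymmetric = [1, 0, -1]

print(cross_correlate(f, symmetric), convolve_valid(f, symmetric))    # identical
print(cross_correlate(f, asymmetric), convolve_valid(f, asymmetric))  # opposite signs
```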

2D convolution • Flip and shift in 2D • But we no longer flip [Figure: our matched filter slides over the image; the output peaks where the pattern matches]

Shift in feature space • $W^T x$ • What happens if I have a person that is off-frame? • Need another filter that is shifted [Figure: the same projection diagram, x projected onto $w_1$, $w_2$ and then onto the LDA projections $v_1$, $v_2$]

Shift in feature space • $W^T x$ • What happens if I have a person that is off-frame? • Ans: Convolution with W as the filter [Figure: the same projection diagram, x projected onto $w_1$, $w_2$ and then onto the LDA projections $v_1$, $v_2$]

Convolutional neural networks • A neural network with convolutions! (Cross-correlation, to be precise.) • But we get peaks at different locations. From the point of view of the network, these are two different things.

Pooling layers/Subsampling layers • Combine different locations into one • One possible method is to use a max • Interpretation: Yes, I found a cat somewhere [Figure: max taken over the output of 1 filter]
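A one-dimensional sketch of the "found a cat somewhere" interpretation. The pattern and the two inputs are made-up values and the function name is hypothetical. The same pattern placed at two positions gives two different filter responses, but the max over all locations is the same.

```python
def cross_correlate(signal, pattern):
    """CNN-style 'convolution': slide the pattern without flipping."""
    k = len(pattern)
    return [sum(signal[i + j] * pattern[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

pattern = [1, 2, 1]
left = [0, 1, 2, 1, 0, 0, 0, 0]    # pattern near the start
right = [0, 0, 0, 0, 1, 2, 1, 0]   # same pattern, shifted

resp_left = cross_correlate(left, pattern)    # [4, 6, 4, 1, 0, 0]
resp_right = cross_correlate(right, pattern)  # [0, 0, 1, 4, 6, 4]

print(resp_left != resp_right)            # the network sees two different things
print(max(resp_left), max(resp_right))    # max pooling: 6 and 6
```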

Convolutional filters • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: a 100 x 100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving three 98x98 convolution outputs (output1, output2, output3)]

Convolutional filters • A convolutional layer consists of • Small filter patches • Pooling to remove variation • Example of one step: element-wise product and sum of (4, 5, 6) with (1, 2, 3) gives 4*1 + 5*2 + 6*3 = 32 [Figure: a 100 x 100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving three 98x98 convolution outputs (output1, output2, output3)]

Convolutional filters • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: a 100 x 100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving three 98x98 convolution outputs (output1, output2, output3)]
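The sizes on this slide follow from sliding a 3x3 filter over a 100 x 100 image without padding: 100 - 3 + 1 = 98 positions per axis. The sketch below assumes stride 1 and no padding; the image contents and the filter weights are arbitrary illustrative values, and the function name is hypothetical.

```python
def conv2d_valid(image, kernel):
    """CNN-style 2D 'convolution' (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

# One multiply-and-sum step, as in the worked example on the slide.
step = sum(p * w for p, w in zip([4, 5, 6], [1, 2, 3]))
print(step)  # 32

image = [[(r * c) % 7 for c in range(100)] for r in range(100)]  # dummy 100 x 100 image
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]                    # assumed 3x3 filter
out = conv2d_valid(image, kernel)
print(len(out), len(out[0]))  # 98 98
```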

Pooling/subsampling • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: each 98x98 convolution output passes through a 3x3 max filter with no overlap, giving a 33x33 layer output (98/3 rounded up, so the last window is partial)]

Pooling/subsampling • A convolutional layer consists of • Small filter patches • Pooling to remove variation • Example of one step: max(4, 5, 6) = 6 [Figure: each 98x98 convolution output passes through a 3x3 max filter with no overlap, giving a 33x33 layer output (98/3 rounded up, so the last window is partial)]

Pooling/subsampling • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: each 98x98 convolution output passes through a 3x3 max filter with no overlap, giving a 33x33 layer output (98/3 rounded up, so the last window is partial)]
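A sketch of non-overlapping 3x3 max pooling on a 98x98 map. The `keep_edge` flag is a hypothetical parameter added here to show the two common conventions: keeping the partial window at the border gives 33x33, dropping it gives 32x32. The map contents are dummy values.

```python
def max_pool2d(fmap, size=3, keep_edge=True):
    """Non-overlapping max pooling; keep_edge controls the partial border window."""
    h, w = len(fmap), len(fmap[0])
    if keep_edge:
        out_h, out_w = -(-h // size), -(-w // size)   # ceiling division
    else:
        out_h, out_w = h // size, w // size           # floor division
    return [[max(fmap[r][c]
                 for r in range(i * size, min((i + 1) * size, h))
                 for c in range(j * size, min((j + 1) * size, w)))
             for j in range(out_w)] for i in range(out_h)]

fmap = [[r * 98 + c for c in range(98)] for r in range(98)]  # dummy 98x98 output
kept = max_pool2d(fmap, 3, keep_edge=True)
dropped = max_pool2d(fmap, 3, keep_edge=False)
print(len(kept), len(kept[0]))        # 33 33
print(len(dropped), len(dropped[0]))  # 32 32
print(max([4, 5, 6]))                 # 6, one pooling step
```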

CNN overview • Filter size, number of filters, filter shift (stride), and pooling rate are all design parameters • Usually followed by a fully connected network at the end • The CNN part is good at learning low-level features • The DNN part combines these into high-level features and classifies https://en.wikipedia.org/wiki/Convolutional_neural_network#/media/File:Typical_cnn.png

Parameter sharing in convolutional neural networks • $W^T x$ • Cats at different locations might need two neurons, one per location, in a fully connected NN. • A CNN shares the parameters within 1 filter • The network is no longer fully connected [Figure labels: Layer n-1, Layer n (convolutional, pooling), Layer n+1]
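To put a number on parameter sharing, the sketch below compares weight counts for mapping a 100 x 100 input to one 98x98 output map, reusing the sizes from the earlier slides. Biases are ignored, which is an assumption made here for simplicity.

```python
in_h = in_w = 100
k = 3
out_h, out_w = in_h - k + 1, in_w - k + 1   # 98 x 98

# Fully connected: every output unit has its own weight for every input pixel.
fc_weights = (in_h * in_w) * (out_h * out_w)

# Convolution: every output unit reuses the same 3x3 filter.
conv_weights = k * k

print(fc_weights)    # 96040000
print(conv_weights)  # 9
```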

Pooling/subsampling • Max filter -> Maxout • Backward pass? • The gradient passes through the maximum location and is 0 elsewhere
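The backward rule can be sketched for a single pooling window. The function names and the upstream gradient value are illustrative assumptions. The forward pass remembers where the maximum was, and the backward pass routes the whole incoming gradient to that position.

```python
def max_pool_forward(window):
    """Return the max value and the index where it occurred."""
    idx = max(range(len(window)), key=lambda i: window[i])
    return window[idx], idx

def max_pool_backward(upstream_grad, idx, n):
    """Gradient w.r.t. the window inputs: upstream at the argmax, 0 elsewhere."""
    grad = [0.0] * n
    grad[idx] = upstream_grad
    return grad

window = [4.0, 5.0, 6.0]
out, idx = max_pool_forward(window)
grad = max_pool_backward(0.5, idx, len(window))
print(out, idx, grad)  # 6.0 2 [0.0, 0.0, 0.5]
```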

Norms (p-norm or Lp-norm) • For any real number $p \ge 1$: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$ • For $p = \infty$: $\|x\|_\infty = \max_i |x_i|$ • We'll see more of p-norms when we get to neural networks https://en.wikipedia.org/wiki/Lp_space
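A short sketch of the p-norm, using the standard definition from the linked Lp space article; the function name and the example vector are assumptions. As p grows, the p-norm approaches the max norm, which is why a p-norm filter behaves like a softer version of max pooling.

```python
def p_norm(x, p):
    """p-norm of a vector for p >= 1; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
print(p_norm(x, 1))             # 7.0
print(p_norm(x, 2))             # 5.0
print(p_norm(x, float("inf")))  # 4.0
print(p_norm(x, 50))            # very close to 4.0
```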

Pooling/subsampling • Max filter -> Maxout • Backward pass? • The gradient passes through the maximum location and is 0 elsewhere • P-norm filter • Fully connected layer (1x1 convolutions) • Recently, people care less about pooling as a way to introduce shift invariance and more about it as a dimension reduction (since conv layers usually have a higher dimension than the input)

1x1 Convolutions • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: a 100 x 100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving three 98x98 convolution outputs (output1, output2, output3)]

1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK over the K outputs of the previous layer [Figure: the 98x98 convolution outputs output1, output2, output3 from the previous layer]

1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK over the K outputs of the previous layer • Sum over channels [Figure: the 98x98 convolution outputs output1, output2, output3 from the previous layer]

1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK over the K outputs of the previous layer • Sum over channels [Figure: the 98x98 convolution outputs output1, output2, output3 from the previous layer are combined into a new 98x98 convolution output1]

1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK over the K outputs of the previous layer • Sum over channels • If we have fewer 1x1 filters than the previous level has outputs, we just perform dimensionality reduction. [Figure: the 98x98 convolution outputs output1, output2, output3 from the previous layer are combined into new 98x98 convolution outputs output1 and output2]
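A sketch of a 1x1 convolution as a weighted sum over channels at every pixel. The function name, the tiny 2x2 maps, and the weights are made-up values for illustration. With K = 3 input maps and 2 filters, the spatial size is unchanged and the channel dimension drops from 3 to 2.

```python
def conv1x1(fmaps, filters):
    """fmaps: K maps of size HxW. filters: list of filters, each with K weights.
    Returns one HxW map per filter, summing over the K input channels."""
    K, H, W = len(fmaps), len(fmaps[0]), len(fmaps[0][0])
    return [[[sum(w[k] * fmaps[k][r][c] for k in range(K))
              for c in range(W)] for r in range(H)]
            for w in filters]

fmaps = [
    [[1, 2], [3, 4]],          # output1 of the previous layer
    [[10, 20], [30, 40]],      # output2
    [[100, 200], [300, 400]],  # output3
]
filters = [[1, 1, 1], [1, 0, -1]]  # two assumed 1x1x3 filters

out = conv1x1(fmaps, filters)
print(len(fmaps), "->", len(out), "channels")
print(out[0])  # [[111, 222], [333, 444]]
print(out[1])  # [[-99, -198], [-297, -396]]
```

Applying the same weights at every pixel is equivalent to running one small fully connected layer across the channels at each location.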

Common schemes • INPUT -> [CONV -> RELU -> POOL]*N -> [FC -> RELU]*M -> FC • INPUT -> [CONV -> RELU -> CONV -> RELU -> POOL]*N -> [FC -> RELU]*M -> FC • If you are working with images, just use a winning architecture.
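To see how the first scheme shrinks an image, the sketch below traces the spatial size through N blocks of [CONV -> RELU -> POOL]. All settings are assumptions chosen for illustration, not values from the slides: 3x3 convolutions without padding, 2x2 non-overlapping pooling that drops any partial window, and N = 3.

```python
def conv_out(size, kernel=3):
    return size - kernel + 1   # no padding, stride 1

def pool_out(size, window=2):
    return size // window      # non-overlapping, partial window dropped

def trace_shapes(size, n_blocks, kernel=3, window=2):
    """Spatial size after each layer of INPUT -> [CONV -> RELU -> POOL]*N."""
    shapes = [("INPUT", size)]
    for _ in range(n_blocks):
        size = conv_out(size, kernel)
        shapes.append(("CONV", size))
        shapes.append(("RELU", size))   # element-wise, size unchanged
        size = pool_out(size, window)
        shapes.append(("POOL", size))
    return shapes

for name, size in trace_shapes(100, 3):
    print(name, f"{size}x{size}")
```

The fully connected part then takes the flattened final maps as its input vector.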
