Convolutional Neural Networks Kaitlin Palmer San Diego State - - PDF document




Convolutional Neural Networks

Kaitlin Palmer San Diego State University

Outline

  • What are Convolutional Neural Networks (CNN)
  • Why use a CNN
  • Typical Layout

– Kernel Size – Stride Size/Padding – Pooling

  • Keras Implementation


What are CNNs?

  • Neural networks that use convolution (or cross-correlation) of a weight (kernel) and bias, rather than general matrix multiplication, in at least one layer

What are CNNs?

$s(t) = \int w(a)\, x(t - a)\, da$

  • Spaceship example: smoothing noisy radar measurements
  • s(t) – smoothed estimate of the position
  • x(t) – raw radar position measurement
  • a – age of a measurement
  • w(t − a) – weighting function
  • w is a valid probability density function
  • w is 0 for all negative arguments

ISS tracking Data: https://www.nasa.gov/pdf/686319main_AP_ED_Stats_RadarData.pdf
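The smoothing above can be sketched in NumPy. The radar readings and the weighting function below are made-up illustrative values, not the NASA data linked above; note that `np.convolve` applies the kernel flipped, which matches the $x(t-a)\,w(a)$ form directly.

```python
import numpy as np

# Noisy 1D position measurements x(t) (hypothetical values).
x = np.array([5.0, 5.5, 4.8, 6.1, 5.9, 6.4, 6.2, 7.0])

# Weighting function w(a): recent measurements (small age a) weigh more.
# A valid density (non-negative, sums to 1), zero for negative ages.
w = np.array([0.5, 0.3, 0.2])

# Smoothed estimate s(t) = sum_a x(t - a) w(a): a convolution.
s = np.convolve(x, w, mode="valid")
print(s)  # first value: 4.8*0.5 + 5.5*0.3 + 5.0*0.2 = 5.05
```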


What are CNNs?

$s(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a)$

  • s(t) – feature map
  • x(a) – input (multi-dimensional array)
  • w(t − a) – kernel (multi-dimensional array)
  • This is the discretized form of the convolution above

What are CNNs?

  • What is convolution?
  • Practice example 1D (summation of the products)

Input: 1 1 2 5 3 1
Kernel: 1 1
Output (sum of products at each position): 2 3 7 8 4
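The 1D practice example can be checked with NumPy (`np.convolve` flips the kernel, which makes no difference here because [1, 1] is symmetric):

```python
import numpy as np

x = np.array([1, 1, 2, 5, 3, 1])  # input
w = np.array([1, 1])              # kernel

# Each output element is the sum of products of the kernel
# with one window of the input.
s = np.convolve(x, w, mode="valid")
print(s)  # [2 3 7 8 4]
```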


What are CNNs?

Multi-dimensional Array

Cross-correlation: $S(i, j) = \sum_{m} \sum_{n} I(i + m,\, j + n)\, K(m, n)$

Beware matrix flipping: convolution vs. cross-correlation

Convolution: $S(i, j) = \sum_{m} \sum_{n} I(i - m,\, j - n)\, K(m, n)$
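A quick NumPy sketch of the distinction: flipping the kernel in both axes turns cross-correlation into convolution, so the two agree only for kernels symmetric under a 180° rotation. The helper names here are mine.

```python
import numpy as np

def cross_correlate2d(I, K):
    """S(i, j) = sum_{m,n} I(i+m, j+n) K(m, n), valid region only."""
    kh, kw = K.shape
    H = I.shape[0] - kh + 1
    W = I.shape[1] - kw + 1
    S = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

def convolve2d(I, K):
    # Convolution = cross-correlation with the kernel flipped in both axes.
    return cross_correlate2d(I, np.flip(K))

I = np.arange(16.0).reshape(4, 4)
K = np.array([[1.0, 2.0], [3.0, 4.0]])
print(cross_correlate2d(I, K)[0, 0])  # 34.0
print(convolve2d(I, K)[0, 0])         # 16.0 -- differs because K is asymmetric
```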

Why CNNs?

  • Sparse Interactions
  • Parameter sharing
  • Equivariant Representations


Why CNNs

  • Sparse Interactions

– AKA sparse connectivity or sparse weights
– Fewer parameters: the kernels stored are smaller than the input
– Tens or hundreds of parameters to learn vs. millions


  • Fig. 9.2 Goodfellow et al.

Why CNNs

  • Parameter Sharing

– Same parameter used for more than one function in a model
– Weights applied to one input are applied elsewhere
– Each member of the kernel is used at every position
– One set of parameters is learned, regardless of location


Why CNNs

  • Sparse Interactions

– Receptive field
– Few direct connections, but units in deeper layers are indirectly connected to most of the input image


  • Fig. 9.4 Goodfellow et al.

Why CNNs

  • Parameter Sharing

– Each kernel value is used at every position of the input
– Convolution example (320 × 280 input, 319 × 280 output, 2-element kernel):

  • Convolution: 319 × 280 × 3 = 267,960 operations (two multiplications and one addition per output pixel)
  • Matrix multiplication: 320 × 280 × 319 × 280 > 8 billion entries, making convolution about 4 billion times more efficient


  • Fig. 9.5 Goodfellow et al.
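The arithmetic on this slide checks out directly:

```python
# Edge-detection example (Goodfellow et al., Fig. 9.5): 320 x 280 input,
# 319 x 280 output, 2-element kernel.
conv_ops = 319 * 280 * 3                # two multiplies + one add per output pixel
dense_entries = 320 * 280 * 319 * 280   # weight-matrix entries for a dense layer

print(conv_ops)            # 267960
print(dense_entries)       # 8003072000, i.e. > 8 billion
print(dense_entries // 2)  # ~4 billion: ratio vs. the kernel's 2 stored parameters
```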


Why CNNs

  • Equivariance to translation

– If the input shifts, the output shifts by the same amount
– An event that moves later in time (or location) in the input shifts the same way in the output
– Not naturally invariant to rotation or scale


Why CNNs


  • Edge Detection Example
  • Fig. 9.6 Goodfellow et al.


Vertical edge detection (Andrew Ng 2017):

A 6 × 6 image whose left three columns are 8 and right three columns are 0, convolved with the 3 × 3 kernel

1 0 -1
1 0 -1
1 0 -1

gives the 4 × 4 output

0 24 24 0
0 24 24 0
0 24 24 0
0 24 24 0
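This output can be reproduced with a small cross-correlation loop (which is how deep-learning libraries actually implement "convolution"):

```python
import numpy as np

# 6x6 image: left half bright (8), right half dark (0).
img = np.zeros((6, 6))
img[:, :3] = 8.0

# Vertical edge detector.
K = np.array([[1.0, 0.0, -1.0]] * 3)

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(img[i:i + 3, j:j + 3] * K)
print(out)  # every row is [0. 24. 24. 0.]: the edge lights up
```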

CNN Layout

  • Kernel Size (typically odd)
  • Stride/ padding
  • Pooling


CNN Layout

  • Padding


CNN Layout

  • Why padding?

– Input shrinks at each layer
– Edge effects

  • Padding types and terminology

– Valid: no padding
– Same: output size equals input size
– Full: enough padding that every pixel is visited k times in each direction
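`np.convolve` exposes the same three conventions by name, which makes the size rules easy to see in 1D:

```python
import numpy as np

x = np.ones(6)  # input of length n = 6
w = np.ones(3)  # kernel of length k = 3

print(len(np.convolve(x, w, "valid")))  # n - k + 1 = 4  (no padding)
print(len(np.convolve(x, w, "same")))   # n = 6          (output matches input)
print(len(np.convolve(x, w, "full")))   # n + k - 1 = 8  (every pixel visited k times)
```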


Layout - Step Size

  • AKA stride
  • Equivalent to hop size: how far the kernel advances between applications
  • Downsamples within the network
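Kernel size, padding, and stride combine in the usual output-size rule, floor((n + 2p − k) / s) + 1. A small helper (the function name is mine, not from the slides):

```python
def conv_output_size(n, k, p=0, s=1):
    """Output length of a 1D convolution over n inputs with kernel size k,
    padding p on each side, and stride s."""
    return (n + 2 * p - k) // s + 1

print(conv_output_size(6, 3))       # valid, stride 1   -> 4
print(conv_output_size(6, 3, p=1))  # 'same' padding    -> 6
print(conv_output_size(6, 3, s=2))  # stride 2 downsamples -> 2
```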


Convolutions on RGB

[Figure: a 6 × 6 × 3 RGB input convolved with a 3 × 3 × 3 kernel gives a 4 × 4 output. Andrew Ng]


CNN Layout - Pooling

  • Pooling layers
  • Invariant to small translations of the input
  • Replace net output with a summary statistic:

– Max pooling
– Neighborhood average
– L2 norm
– Weighted average by distance from the central pixel
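A minimal 2×2 max-pooling sketch in NumPy, using a reshape to form the non-overlapping pooling windows:

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    h, w = x.shape
    # Split into (row-pair, row-in-pair, col-pair, col-in-pair), then
    # take the max within each 2x2 block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
print(max_pool_2x2(x))  # [[4 8] [9 7]]
```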


Pooling

  • Pooling

– Equivalent to an infinitely strong prior
– Max pooling example


  • Fig. 9.8 Goodfellow et al.


Pooling

  • Down sampling

– Computational efficiency
– Possible to use fewer pooling units than detector units
– Pool over regions spaced k pixels apart


  • Fig. 9.10 Goodfellow et al.

Pooling

  • Invariance to translation


  • Fig. 9.9 Goodfellow et al.


Pooling Invariance

Yann LeCun: http://yann.lecun.com/exdb/lenet/stroke-width.html

CNN Layout


  • Fig. 9.9 Goodfellow et al.


Keras Implementation

LeCun et al. 1998, Gradient-Based Learning Applied to Document Recognition

Keras Implementation LeNet-5


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(6, kernel_size=(5, 5), activation='tanh', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, (5, 5), activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(120, (5, 5), activation='tanh'))
model.add(Flatten())  # flatten feature maps before the dense layers
model.add(Dense(84, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))


Convolution Backpropagation

  • Convolution, backpropagation from output to weights, and backpropagation from output to input

  • Kernel stack K
  • Multidimensional input (e.g. image) V
  • Stride s
  • Convolution output (feature map) Z
  • Loss function J


Backpropagation

Convolution: $Z = c(K, V, s)$
Loss function: $J(V, K)$

Tensor of derivatives of the loss with respect to the feature map: $G_{i,j,k} = \dfrac{\partial}{\partial Z_{i,j,k}} J(V, K)$

Backpropagation from output to kernel (derivatives with respect to the kernel):
$g(G, V, s)_{i,j,k,l} = \dfrac{\partial}{\partial K_{i,j,k,l}} J(V, K) = \sum_{m,n} G_{i,m,n}\, V_{j,\,(m-1)s+k,\,(n-1)s+l}$

Backpropagation through the hidden layer (derivatives with respect to the input):
$h(K, G, s)_{i,j,k} = \dfrac{\partial}{\partial V_{i,j,k}} J(V, K) = \sum_{\substack{l,m \ \mathrm{s.t.}\ (l-1)s+m=j}} \; \sum_{\substack{n,p \ \mathrm{s.t.}\ (n-1)s+p=k}} \; \sum_{q} K_{q,i,m,p}\, G_{q,l,n}$
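The kernel-gradient formula can be sanity-checked numerically in 1D: for Z = conv(V, K) and the loss J = sum(Z²)/2, the analytic gradient ∂J/∂K should match finite differences. This check is my own, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal(8)  # input
K = rng.standard_normal(3)  # kernel

def forward(V, K):
    # Valid cross-correlation: Z[j] = sum_m V[j + m] K[m]
    n = len(V) - len(K) + 1
    return np.array([np.dot(V[j:j + len(K)], K) for j in range(n)])

Z = forward(V, K)
G = Z.copy()  # dJ/dZ for J = 0.5 * sum(Z**2)

# Analytic gradient: dJ/dK[m] = sum_j G[j] V[j + m]
grad_K = np.array([np.dot(G, V[m:m + len(Z)]) for m in range(len(K))])

# Finite-difference check of each kernel entry.
eps = 1e-6
num = np.zeros_like(K)
for m in range(len(K)):
    Kp = K.copy(); Kp[m] += eps
    Km = K.copy(); Km[m] -= eps
    num[m] = (0.5 * np.sum(forward(V, Kp)**2)
              - 0.5 * np.sum(forward(V, Km)**2)) / (2 * eps)

print(np.allclose(grad_K, num, atol=1e-5))  # True
```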


Structured Output


  • For pixel-wise labeling of images, pooling is not always necessary

  • Fig. 9.17 Goodfellow et al.

https://sthalles.github.io/deep_segmentation_network/

Locally Connected Layers


  • AKA unshared convolution
  • Features occupy a small portion of space, but not all of space
  • e.g., look for a chin only in the bottom half of an image

$Z_{i,j,k} = \sum_{l,m,n} \left[ V_{l,\, j+m-1,\, k+n-1}\, w_{i,j,k,l,m,n} \right]$

  • Fig. 9.14 Goodfellow et al.: fully connected layer, locally connected layer (patch size 2), convolutional layer
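The parameter-count contrast among the three layer types is easy to tabulate; the sizes below are illustrative assumptions, not from the slides.

```python
# Illustrative sizes: 32x32 single-channel input, 3x3 receptive field,
# one 30x30 output map.
in_h = in_w = 32
k = 3
out_h = out_w = 30

dense = (in_h * in_w) * (out_h * out_w)  # every output sees every input
local = (out_h * out_w) * (k * k)        # small patches, but unshared weights
conv = k * k                             # one kernel shared across all positions

print(dense, local, conv)  # 921600 8100 9
```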


Tiled Convolution

  • Midway between locally connected layers and convolutional layers
  • Learn a set of kernels to rotate through as we move across space
  • Immediate neighbors get different filters, but memory increases only by a factor of the size of the kernel set

Traditional convolution ~ tiled convolution with t = 1; locally connected layer (patch size 2); tiled convolution (t = 2)

  • Fig. 9.16 Goodfellow et al.

$Z_{i,j,k} = \sum_{l,m,n} V_{l,\, j+m-1,\, k+n-1}\, K_{i,l,m,n,\, j\%t+1,\, k\%t+1}$

Data Types

  • Flexibility in CNNs
  • Multiple input sizes


Data Types

  • 1D


Single channel and multi-channel 1D examples (e.g., position, rotation, and scale over time; image: www.riotgames.com)

Data Types

  • 2D


Single Channel / Multi-channel


Data Types

  • 3D


Single Channel Multi Channel

Random or Unsupervised Features

  • Learning features is expensive

– Every gradient step requires full forward/back prop

  • Use features not trained in a supervised fashion


Random or Unsupervised Features

  • Random kernel initialization
  • Design kernels by hand
  • Learn kernels with an unsupervised criterion


Random or Unsupervised Features

  • Random kernel initialization

– As before, random weights often perform surprisingly well
– Need to test multiple architectures

  • Good approach:

– Build multiple architectures
– Set random weights
– Train only the last layer; pick the best architecture, then train it with full backpropagation


Random or Unsupervised Features

  • Learn kernels (k) using an unsupervised criterion

– Allows features to be determined separately from the classifier at the end of the architecture
– What unsupervised tools have we used so far?
– Apply k-means clustering to image patches; use each centroid as a convolution kernel
– Extract k-means features for the entire training set and use them as the last layer before classification
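The k-means idea can be sketched end-to-end in NumPy: sample image patches, cluster them with a tiny k-means loop, and treat each centroid as a convolution kernel. Everything here (patch size, k, the random image) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

# 1. Sample all 5x5 patches from the image, flattened to vectors.
patches = np.array([image[i:i + 5, j:j + 5].ravel()
                    for i in range(12) for j in range(12)])

# 2. Tiny k-means: the centroids become unsupervised kernels.
k = 4
centroids = patches[rng.choice(len(patches), k, replace=False)]
for _ in range(10):
    # Assign each patch to its nearest centroid.
    labels = np.argmin(((patches[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    # Move each centroid to the mean of its assigned patches.
    for c in range(k):
        if np.any(labels == c):
            centroids[c] = patches[labels == c].mean(axis=0)

# 3. Reshape centroids back into 5x5 convolution kernels.
kernels = centroids.reshape(k, 5, 5)
print(kernels.shape)  # (4, 5, 5)
```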


Random or Unsupervised Features

  • Hand designed features


Neurobiologically Inspired Networks

  • Hubel and Wiesel, 1959, 1962, 1968

Image: utdallas.edu

https://www.youtube.com/watch?v=IOHayh06LJ4

Neurobiological Basis

  • Simple cells

– Roughly linear – Feature selection

  • Complex cells

– Nonlinear – Invariant to some transformations of simple cell features


Neurobiological Basis

  • fMRI scans of human V1
  • Visual stimulus roughly represented in V1


Tootell et al. Proceedings of the National Academy of Sciences, Feb 1998, 95 (3) 811-817; DOI: 10.1073/pnas.95.3.811


Neurological Basis


  • Visual cortex (V1)
  • What CNNs capture:

– The 2D structure of V1 and the retina (light in the lower half of the visual field ~ activation in the corresponding part of V1); neural networks likewise operate on 2-dimensional maps
– 'Simple cells': a linear function of the image in a small, spatially localized receptive field (cf. detector units)
– 'Complex cells': invariant to small shifts in position (cf. pooling layers over spatial locations; maxout units as well)
– Grandmother cells: the concept of cells that respond to your grandmother regardless of location and scale
– The 'Halle Berry neuron' (Quiroga et al. 2005) fires for an image, a drawing, or the written name


Neurobiological Basis

  • How CNNs differ

– Fovea: high-resolution detection in a small region, surrounded by low resolution
– Quick eye movements ('saccades') glimpse relevant parts of the scene (cf. the Hermann grid optical illusion)
– Neural networks receive high resolution everywhere

  • The visual system is part of an integrated system, both physical (hearing, smelling, etc.) and nebulous (mood, thoughts)


Neurological Basis

  • Understands scenes

– Objects, relationships between objects, 3D geometric information

  • Feedback throughout the brain
  • Activation and pooling functions in the brain? Probably quite different; there is no sharp distinction between 'simple' and 'complex' cells (the same cell with different 'parameters')


Neurological Basis

  • Training? Neurobiology isn't much help.
  • Time-delay neural networks

– Biologically implausible
– A 1D CNN applied to a time series

  • Understanding neural networks vs. the mammalian visual cortex

– In the neural network: plot the image of a convolutional kernel. Deep layers?
– In biology: 'reverse correlation'

  • Drill into the skull, inject electrodes into the brain, restrain the animal, show images of white noise, and record the output


Neurological Basis

Reverse correlation approximates a neuron's response as a linear function of the image:

$s(I) = \sum_{x} \sum_{y} w(x, y)\, I(x, y)$


Gabor Function

$w(x, y;\ \alpha, \beta_x, \beta_y, f, \phi, x_0, y_0, \tau) = \alpha \exp\!\left(-\beta_x x'^2 - \beta_y y'^2\right) \cos\!\left(f x' + \phi\right)$

$x' = (x - x_0)\cos(\tau) + (y - y_0)\sin(\tau)$
$y' = -(x - x_0)\sin(\tau) + (y - y_0)\cos(\tau)$

  • Gating term: $\alpha \exp(-\beta_x x'^2 - \beta_y y'^2)$
  • $\alpha$ – magnitude of the response
  • $\beta_x, \beta_y$ – how quickly the receptive field falls off
  • $f, \phi$ – how the cell responds to light along the x' axis
  • $x_0, y_0, \tau$ – location terms: translation and rotation of x and y, with $\tau$ in radians from the horizontal

Gabor Functions

  • Fig. 9.18 Goodfellow et al.

Location parameters: $x_0, y_0, \tau$. Gaussian scale parameters: $\beta_x, \beta_y$. Sinusoid parameters: $f, \phi$
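A direct NumPy transcription of the Gabor function; the parameter values below are arbitrary examples.

```python
import numpy as np

def gabor(x, y, alpha, beta_x, beta_y, f, phi, x0, y0, tau):
    """Gabor function: a Gaussian gating term times a sinusoid along x'."""
    xp = (x - x0) * np.cos(tau) + (y - y0) * np.sin(tau)
    yp = -(x - x0) * np.sin(tau) + (y - y0) * np.cos(tau)
    return alpha * np.exp(-beta_x * xp**2 - beta_y * yp**2) * np.cos(f * xp + phi)

# Evaluate on a grid to get a kernel-like weight map.
xs, ys = np.meshgrid(np.linspace(-3, 3, 9), np.linspace(-3, 3, 9))
w = gabor(xs, ys, alpha=1.0, beta_x=0.5, beta_y=0.5, f=2.0, phi=0.0,
          x0=0.0, y0=0.0, tau=0.0)
print(w.shape)  # (9, 9)
print(w[4, 4])  # 1.0 at the center when phi = 0
```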