SLIDE 1

Deep Convolutional Neural Nets

COMPSCI 371D — Machine Learning

SLIDE 2

Outline

1 Why Neural Networks?
2 Circuits
3 Neurons, Layers, and Networks
4 Correlation and Convolution
5 AlexNet

SLIDE 3

Why Neural Networks?

  • Neural networks are very expressive (large H)
  • They can approximate any well-behaved function from a hypercube in R^d to an interval in R within any ε > 0
  • Universal approximators
  • However:
  • Complexity grows exponentially with d = dim(X)
  • L_T is not convex (not even close)
  • Large H ⇒ overfitting ⇒ lots of data!
  • Amazon's Mechanical Turk (large-scale data labeling) made neural networks possible
  • Even so, we cannot keep up with the curse of dimensionality!

SLIDE 4

Why Neural Networks?

  • Neural networks are data hungry
  • Availability of lots of data is not a sufficient explanation
  • There must be deeper reasons
  • Special structure of image space (or audio space)?
  • Specialized network architectures?
  • Regularization tricks and techniques?
  • We don’t really know. Stay tuned...
  • Be prepared for some hand-waving and empirical statements

SLIDE 5

Circuits

  • Describe the implementation of h : X → Y on a computer
  • Algorithm: a finite sequence of steps
  • Circuit: many gates of few types, wired together
  • In a computer, the gates are NAND gates. We'll use neurons instead
  • Algorithms and circuits are equivalent
  • An algorithm can simulate a circuit
  • A computer is a circuit that runs algorithms!
  • A computer really only computes Boolean functions...

SLIDE 6

Circuits

Deep Neural Networks as Circuits

  • Neural networks are typically described as circuits
  • Nearly always implemented as algorithms
  • One gate, the neuron
  • Many neurons that receive the same input form a layer
  • A cascade of layers is a network
  • A deep network has many layers
  • Layers with a special constraint are called convolutional

SLIDE 7

Neurons, Layers, and Networks

The Neuron

  • y = ρ(a(x)), where a(x) = v^T x + b, with x ∈ R^d and y ∈ R
  • v are the gains, b is the bias
  • Together, w = [v^T, b]^T are the weights
  • ρ(a) = max(0, a) (ReLU, Rectified Linear Unit)

[Figure: a single neuron. The inputs x_1, ..., x_d are multiplied by the gains v_1, ..., v_d and summed with the bias b to form a; the ReLU ρ maps a to the output y. A side plot shows ρ(a) = max(0, a).]
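A minimal NumPy sketch of this neuron (the vectors and bias below are illustrative values, not from the slides):

```python
import numpy as np

def neuron(x, v, b):
    """Single neuron: ReLU of the affine function a(x) = v^T x + b."""
    a = v @ x + b
    return max(0.0, a)          # rho(a) = max(0, a)

x = np.array([1.0, -2.0, 0.5])  # input in R^3
v = np.array([0.3, 0.1, -0.4])  # gains
print(neuron(x, v, b=0.2))      # about 0.1
```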

SLIDE 8

Neurons, Layers, and Networks

The Neuron as a Pattern Matcher (Almost)

  • The pattern on the left is a drumbeat g (a pattern template)
  • Which of the other two patterns x is a drumbeat?
  • Normalize both g and x so that ‖g‖ = ‖x‖ = 1
  • Then g^T x is the cosine of the angle between the patterns
  • If g^T x ≥ −b for some threshold b, output a = g^T x + b (the amount by which the cosine exceeds the threshold)
  • Otherwise, output 0
  • y = ρ(g^T x + b)
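A small sketch of the neuron-as-pattern-matcher idea; the random 25-sample "drumbeat" and the threshold b = −0.9 are made-up stand-ins, not values from the slides:

```python
import numpy as np

def match_score(x, g, b=-0.9):
    """Normalize g and x, then y = rho(g^T x + b): the neuron fires only
    when the cosine of the angle between the patterns exceeds -b."""
    g = g / np.linalg.norm(g)
    x = x / np.linalg.norm(x)
    return max(0.0, g @ x + b)

rng = np.random.default_rng(0)
g = rng.standard_normal(25)                     # stand-in for the drumbeat template
print(match_score(g, g))                        # ~0.1: identical pattern, cosine = 1
print(match_score(rng.standard_normal(25), g))  # almost surely 0.0 for an unrelated clip
```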

SLIDE 9

Neurons, Layers, and Networks

The Neuron as a Pattern Matcher (Almost)

  • y = ρ(v^T x + b)
  • A neuron is a pattern matcher, except for normalization
  • In neural networks, normalization may happen in later or earlier layers
  • This interpretation is not necessary to understand neural networks
  • Nice to have a mental model, though
  • Many neurons wired together can approximate any function we want
  • A neural network is a function approximator

SLIDE 10

Neurons, Layers, and Networks

Layers and Networks

  • A layer is a set of neurons that share the same input

[Figure: a layer. The input x = (x_1, ..., x_d) feeds every neuron, and the neurons' outputs together form the layer output y.]

  • A neural network is a cascade of layers
  • A neural network is deep if it has many layers
  • Two layers can make a universal approximator
  • If neurons did not have nonlinearities, any cascade of layers would collapse to a single layer
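A minimal sketch of a layer and of a two-layer cascade (the sizes and random weights are illustrative):

```python
import numpy as np

def layer(x, V, b):
    """Fully-connected layer: every neuron sees the same input x."""
    return np.maximum(0.0, V @ x + b)   # elementwise ReLU of V x + b

rng = np.random.default_rng(0)
d, e1, e2 = 4, 8, 3                     # input and layer sizes (illustrative)
V1, b1 = rng.standard_normal((e1, d)), rng.standard_normal(e1)
V2, b2 = rng.standard_normal((e2, e1)), rng.standard_normal(e2)

x = rng.standard_normal(d)
y = layer(layer(x, V1, b1), V2, b2)     # a cascade of two layers
print(y.shape)                          # (3,)
```

Without the ReLU, the cascade would compute V2 (V1 x + b1) + b2, a single affine map; that is the collapse mentioned in the last bullet.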

SLIDE 11

Correlation and Convolution

Convolutional Layers

  • A layer with input x ∈ R^d and output y ∈ R^e has e neurons, each with d gains and one bias
  • Total of (d + 1)e weights to be trained in a single layer
  • For images, d and e are on the order of hundreds of thousands
  • Or even millions
  • Too many parameters
  • Convolutional layers are layers restricted in a special way
  • Many fewer parameters to train
  • Also good justification in terms of basic principles

SLIDE 12

Correlation and Convolution

Hierarchy, Locality, Reuse

  • To find a person, look for a face, a torso, limbs, ...
  • To find a face, look for eyes, nose, ears, mouth, hair, ...
  • To find an eye, look for a circle, some corners, some curved edges, ...
  • A hierarchical image model is less sensitive to viewpoint, body configuration, ...
  • Hierarchy leads to a cascade of layers
  • Low-level features are local: a neuron doesn't need to see the entire image
  • Circles are circles, regardless of where they show up: a single neuron can be reused to look for circles anywhere in the image

SLIDE 13

Correlation and Convolution

Correlation, Locality, and Reuse

  • Does the drumbeat on the left show up in the clip on the right?
  • Drumbeat g has 25 samples, clip x has 100
  • Make 100 − 25 + 1 = 76 neurons that look for g in every possible position

  • y_i = ρ(v_i^T x + b_i), where v_i^T = [0, ..., 0, g_0, ..., g_24, 0, ..., 0] has i zeros before g and 75 − i zeros after it, for i = 0, ..., 75

  • Gain matrix V (each row is a copy of g, shifted one place to the right of the row above):

        V = [ g_0 ... g_24   0    ...    0
               0   g_0  ...  g_24 ...    0
              ...                       ...
               0   ...    0   g_0 ...  g_24 ]

SLIDE 14

Correlation and Convolution

Compact Computation

  • Gain matrix V as above: each row is a copy of g = [g_0, ..., g_24], shifted one place to the right of the row above
  • z_i = v_i^T x = Σ_{a=0}^{24} g_a x_{i+a}   for i = 0, ..., 75
  • In general,
    z_i = Σ_{a=0}^{k−1} g_a x_{i+a}   for i = 0, ..., e − 1 = 0, ..., d − k

  • (One-dimensional) correlation
  • g is the kernel
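A direct translation of this correlation formula into NumPy (a sketch; the random drumbeat and clip are placeholders):

```python
import numpy as np

def correlate1d(x, g):
    """z_i = sum_{a=0}^{k-1} g_a * x_{i+a}, for i = 0, ..., d - k."""
    d, k = len(x), len(g)
    return np.array([g @ x[i:i + k] for i in range(d - k + 1)])

rng = np.random.default_rng(0)
g = rng.standard_normal(25)     # kernel: the 25-sample drumbeat
x = rng.standard_normal(100)    # the 100-sample clip
z = correlate1d(x, g)
print(z.shape)                  # (76,) = 100 - 25 + 1
print(np.allclose(z, np.correlate(x, g, mode='valid')))   # True
```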

SLIDE 15

Correlation and Convolution

A Small Example

z_i = Σ_{a=0}^{2} g_a x_{i+a}   for i = 0, ..., 5

z = V x, where

    V = [ g_0 g_1 g_2  0   0   0   0   0
           0  g_0 g_1 g_2  0   0   0   0
           0   0  g_0 g_1 g_2  0   0   0
           0   0   0  g_0 g_1 g_2  0   0
           0   0   0   0  g_0 g_1 g_2  0
           0   0   0   0   0  g_0 g_1 g_2 ]

(kernel length k = 3, input length d = 8, output length e = 6)
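The same small example in code, building V explicitly and checking that z = V x matches the correlation (the kernel values are made up):

```python
import numpy as np

g = np.array([1.0, -2.0, 0.5])      # g_0, g_1, g_2 (illustrative values)
d, k = 8, len(g)
e = d - k + 1                       # 6 outputs

V = np.zeros((e, d))                # banded gain matrix
for i in range(e):
    V[i, i:i + k] = g               # row i holds g, shifted i places

x = np.arange(1.0, d + 1.0)         # x = [1, 2, ..., 8]
z = V @ x
print(np.allclose(z, np.correlate(x, g, mode='valid')))   # True
```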

SLIDE 16

Correlation and Convolution

  • A layer whose gain matrix V is a correlation matrix is called a convolutional layer
  • Also includes biases b
  • The correlation of x with g = [g_0, ..., g_{k−1}] is the convolution of x with r = [r_0, ..., r_{k−1}] = [g_{k−1}, ..., g_0]
  • There are deep reasons why mathematicians prefer convolution
  • We do not need to get into these, but see the notes
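A two-line check of this relationship (the values are illustrative): correlating with g gives the same result as convolving with the reversed kernel r.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
g = np.array([1.0, 0.0, -1.0])
r = g[::-1]                              # r = [g_{k-1}, ..., g_0]
print(np.correlate(x, g, mode='valid'))  # [-2. -2. -2.]
print(np.convolve(x, r, mode='valid'))   # same values
```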

SLIDE 17

Correlation and Convolution

Input Padding

  • If the input has d entries and the kernel has k, then the output has e = d − k + 1 entries
  • This shrinkage is inconvenient when cascading several layers
  • Pad the input with k − 1 zeros to make the output have d entries
  • Padding is typically asymmetric when the index is time, symmetric when the index is position in space

[Figure: the input x is padded with zeros on both sides to form x′; correlating x′ with g yields an output z with as many entries as x.]

  • Padded or shape-preserving or ‘same’ correlation
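A sketch of shape-preserving ("same") correlation with symmetric zero padding, as for a spatial index:

```python
import numpy as np

def correlate1d_same(x, g):
    """Pad x with k - 1 zeros so the output has as many entries as x."""
    k = len(g)
    left = (k - 1) // 2                      # symmetric split of the padding
    xp = np.pad(x, (left, k - 1 - left))     # padded input x'
    return np.correlate(xp, g, mode='valid')

x = np.arange(1.0, 9.0)                      # 8 samples
g = np.array([1.0, 2.0, 1.0])                # k = 3
print(correlate1d_same(x, g).shape)          # (8,), same length as the input
```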

SLIDE 18

Correlation and Convolution

2D Correlation

  • Generalizes in a straightforward way to 2D images:

z_{ij} = Σ_{a=0}^{k_1−1} Σ_{b=0}^{k_2−1} g_{ab} x_{i+a, j+b}

for i = 0, ..., e_1 − 1 = 0, ..., d_1 − k_1 and j = 0, ..., e_2 − 1 = 0, ..., d_2 − k_2
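A direct, loop-based sketch of the 2D formula (the small image and averaging kernel are illustrative):

```python
import numpy as np

def correlate2d(x, g):
    """z_ij = sum_{a,b} g_ab * x_{i+a, j+b}."""
    d1, d2 = x.shape
    k1, k2 = g.shape
    e1, e2 = d1 - k1 + 1, d2 - k2 + 1
    z = np.empty((e1, e2))
    for i in range(e1):
        for j in range(e2):
            z[i, j] = np.sum(g * x[i:i + k1, j:j + k2])
    return z

x = np.arange(36.0).reshape(6, 6)    # a small 6 x 6 "image"
g = np.ones((3, 3)) / 9.0            # 3 x 3 averaging kernel
print(correlate2d(x, g).shape)       # (4, 4) = (6 - 3 + 1, 6 - 3 + 1)
```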

SLIDE 19

Correlation and Convolution

Stride

  • Output z_{ij} is often similar to z_{i,j+1} and z_{i+1,j}
  • Images often vary slowly over space
  • Reduce the redundancy in the output by computing correlations with a stride s_m greater than one
  • Only compute every s_m-th output value in dimension m ∈ {1, 2}
  • Output size shrinks from d_1 × d_2 to about d_1/s_1 × d_2/s_2
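The same 2D correlation computed with a stride of 2 in each dimension (a sketch; the sizes are illustrative):

```python
import numpy as np

def correlate2d_strided(x, g, s1=2, s2=2):
    """Correlation evaluated only at every s1-th row and s2-th column."""
    k1, k2 = g.shape
    rows = range(0, x.shape[0] - k1 + 1, s1)
    cols = range(0, x.shape[1] - k2 + 1, s2)
    return np.array([[np.sum(g * x[i:i + k1, j:j + k2]) for j in cols]
                     for i in rows])

x = np.arange(64.0).reshape(8, 8)
g = np.ones((3, 3))
print(correlate2d_strided(x, g).shape)   # (3, 3), roughly (8/2) x (8/2)
```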

SLIDE 20

Correlation and Convolution

Max Pooling

  • Another way to reduce output resolution is max pooling
  • This is a layer of its own, separate from correlation
  • Consider k × k windows with stride s
  • Often s = k (adjacent, non-overlapping windows)
  • For each window, output the maximum value
  • Output is about d_1/s × d_2/s
  • Returns the highest response in the window, rather than the response in a fixed position
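A sketch of max pooling over non-overlapping k × k windows (stride s = k):

```python
import numpy as np

def max_pool(x, k=2):
    """Output the maximum of each non-overlapping k x k window of x."""
    d1, d2 = x.shape
    d1, d2 = d1 - d1 % k, d2 - d2 % k            # drop any ragged border
    w = x[:d1, :d2].reshape(d1 // k, k, d2 // k, k)
    return w.max(axis=(1, 3))                    # maximum over each window

x = np.arange(36.0).reshape(6, 6)
print(max_pool(x, k=2).shape)                    # (3, 3): resolution halved
```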

SLIDE 21

AlexNet

The Input Layer of AlexNet

  • AlexNet, circa 2012, classifies color images into one of 1000 categories
  • Trained on ImageNet, a large database with millions of labeled images

[Figure: the input layer of AlexNet. The input x is correlated with a bank of convolution kernels (each kernel sees a small receptive field) to produce feature maps a; the ReLU gives response maps h(a); max pooling then produces the layer output y = π(h(a)).]

SLIDE 22

AlexNet

A More Compact Drawing

[Figure: a more compact drawing of the same layer: a 224 × 224 × 3 input, 11 × 11 convolution kernels, 55 × 55 × 96 feature and response maps, and a 27 × 27 × 96 output after max pooling.]

SLIDE 23

AlexNet

[Figure: the full AlexNet architecture. A 224×224×3 input passes through convolutional stages with 11×11, 5×5, and 3×3 kernels, producing maps of sizes 55×55×48, 27×27×96, 13×13×192, 13×13×192, and 13×13×128, followed by dense layers of sizes 2048×1, 2048×1, and 1000×1.]

SLIDE 24

AlexNet

Output

  • The last layer of a neural net used for classification is a soft-max layer:
    p = σ(y) = exp(y) / (1^T exp(y))

  • The function from x to p is (nearly) differentiable
  • Use cross-entropy loss on p to train
  • After training, replace loss function with arg max p
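A sketch of the soft-max output, the cross-entropy loss used for training, and the arg max used after training (the class scores are illustrative):

```python
import numpy as np

def softmax(y):
    """p = sigma(y) = exp(y) / (1^T exp(y)), computed stably."""
    e = np.exp(y - y.max())
    return e / e.sum()

y = np.array([2.0, 0.5, -1.0])        # scores for 3 classes
p = softmax(y)
print(p.sum())                        # 1.0: p is a probability vector
print(-np.log(p[0]))                  # cross-entropy loss if the true class is 0
print(np.argmax(p))                   # prediction used after training: 0
```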

SLIDE 25

AlexNet

AlexNet Numbers

  • Input is 224 × 224 × 3 (color image)
  • First layer has 96 feature maps of size 55 × 55
  • A fully-connected layer would have about 224 × 224 × 3 × 55 × 55 × 96 ≈ 4.4 × 10^10 weights
  • With convolutional kernels of size 11 × 11, there are only 96 × 11² = 11,616 weights
  • That's a big deal! Locality and reuse
  • Most of the complexity is in the last few, fully-connected layers, which still have millions of parameters
  • More recent neural nets have much lighter final layers, but many more layers
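A quick arithmetic check of the two weight counts above (counting only the gains, as the slide does):

```python
# Fully-connected first layer: each of the 55*55*96 outputs would need
# its own gain for every one of the 224*224*3 inputs
print(f"{224 * 224 * 3 * 55 * 55 * 96:.2e}")   # about 4.37e+10

# Convolutional first layer: 96 kernels with 11 x 11 gains each
print(96 * 11 ** 2)                            # 11616
```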
