  1. Deep Convolutional Neural Nets — COMPSCI 371D Machine Learning

  2. Outline
  1. Why Neural Networks?
  2. Circuits
  3. Neurons, Layers, and Networks
  4. Correlation and Convolution
  5. AlexNet

  3. Why Neural Networks?
  • Neural networks are very expressive (large H)
  • They can approximate any well-behaved function from a hypercube in R^d to an interval in R to within any ε > 0
  • Universal approximators
  • However:
  • Complexity grows exponentially with d = dim(X)
  • L_T is not convex (not even close)
  • Large H ⇒ overfitting ⇒ lots of data needed!
  • Amazon's Mechanical Turk made neural networks possible
  • Even so, we cannot keep up with the curse of dimensionality!

  4. Why Neural Networks?
  • Neural networks are data hungry
  • Availability of lots of data is not a sufficient explanation
  • There must be deeper reasons
  • Special structure of image space (or audio space)?
  • Specialized network architectures?
  • Regularization tricks and techniques?
  • We don't really know. Stay tuned...
  • Be prepared for some hand-waving and empirical statements

  5. Circuits
  • Describe the implementation of h : X → Y on a computer
  • Algorithm: a sequence of finite steps
  • Circuit: many gates of few types, wired together
  • In ordinary hardware the gates are NAND gates; we will use neurons instead
  • Algorithms and circuits are equivalent
  • An algorithm can simulate a circuit
  • A computer is a circuit that runs algorithms!
  • A computer really only computes Boolean functions...

  6. Deep Neural Networks as Circuits
  • Neural networks are typically described as circuits
  • They are nearly always implemented as algorithms
  • One gate: the neuron
  • Many neurons that receive the same input form a layer
  • A cascade of layers is a network
  • A deep network has many layers
  • Layers with a special constraint are called convolutional

  7. The Neuron
  • y = ρ(a(x)) with a(x) = v^T x + b, where x ∈ R^d and y ∈ R
  • v are the gains, b is the bias
  • Together, w = [v, b]^T are the weights
  • ρ(a) = max(0, a) (ReLU, Rectified Linear Unit) (see the sketch below)
  • [Figure: a single neuron with inputs x_1, ..., x_d, gains v_1, ..., v_d, bias b, sum a, and output y = ρ(a)]
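A minimal sketch of a single ReLU neuron in NumPy; the example numbers are made up for illustration and are not from the slides:

```python
import numpy as np

def relu(a):
    # rho(a) = max(0, a), applied elementwise
    return np.maximum(0.0, a)

def neuron(x, v, b):
    # a = v^T x + b, y = rho(a)
    return relu(v @ x + b)

# Example: a neuron with d = 3 inputs
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.4, 0.1, -0.3])   # gains
b = 0.2                          # bias
print(neuron(x, v, b))           # rho(0.05 + 0.2) = 0.25
```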

  8. The Neuron as a Pattern Matcher (Almost)
  • The left pattern is a drumbeat g (a pattern template)
  • Which of the other two patterns x is a drumbeat?
  • Normalize both g and x so that ‖g‖ = ‖x‖ = 1
  • Then g^T x is the cosine of the angle between the patterns
  • If g^T x ≥ −b for some threshold b, output a = g^T x + b (the amount by which the cosine exceeds the threshold); otherwise, output 0
  • y = ρ(g^T x + b) (see the sketch below)
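A hedged sketch of the pattern-matcher view: normalize a template g and a candidate x, so g^T x is the cosine of their angle, then apply the ReLU threshold. The signals and the threshold value below are invented for illustration:

```python
import numpy as np

def match_score(g, x, b):
    g = g / np.linalg.norm(g)            # ||g|| = 1
    x = x / np.linalg.norm(x)            # ||x|| = 1
    return np.maximum(0.0, g @ x + b)    # y = rho(g^T x + b)

rng = np.random.default_rng(0)
g = rng.standard_normal(25)                    # template ("drumbeat")
close = g + 0.1 * rng.standard_normal(25)      # similar pattern
far = rng.standard_normal(25)                  # unrelated pattern
b = -0.8                                       # fires only if the cosine exceeds 0.8
print(match_score(g, close, b), match_score(g, far, b))
```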

  9. The Neuron as a Pattern Matcher (Almost)
  • y = ρ(v^T x + b)
  • A neuron is a pattern matcher, except for normalization
  • In neural networks, normalization may happen in earlier or later layers
  • This interpretation is not necessary to understand neural networks
  • It is nice to have a mental model, though
  • Many neurons wired together can approximate any function we want
  • A neural network is a function approximator

  10. Layers and Networks
  • A layer is a set of neurons that share the same input
  • [Figure: a layer of neurons mapping the shared input x = (x_1, ..., x_d) to outputs y_1, ..., y_e]
  • A neural network is a cascade of layers
  • A neural network is deep if it has many layers
  • Two layers can make a universal approximator
  • If neurons did not have nonlinearities, any cascade of layers would collapse to a single layer (see the sketch below)
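A small sketch of a two-layer cascade, plus a check that without the nonlinearity the two layers collapse into a single affine map. Shapes and values are made up for illustration:

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def layer(x, V, b, rho=relu):
    # One layer: every neuron sees the same input x
    return rho(V @ x + b)

rng = np.random.default_rng(1)
d, h, e = 4, 8, 2
x = rng.standard_normal(d)
V1, b1 = rng.standard_normal((h, d)), rng.standard_normal(h)
V2, b2 = rng.standard_normal((e, h)), rng.standard_normal(e)

y = layer(layer(x, V1, b1), V2, b2)            # two-layer network

# Without nonlinearities the cascade is a single affine map
identity = lambda a: a
y_lin = layer(layer(x, V1, b1, identity), V2, b2, identity)
V, b = V2 @ V1, V2 @ b1 + b2
print(np.allclose(y_lin, V @ x + b))            # True
```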

  11. Convolutional Layers
  • A layer with input x ∈ R^d and output y ∈ R^e has e neurons, each with d gains and one bias
  • Total of (d + 1)e weights to be trained in a single layer
  • For images, d and e are on the order of hundreds of thousands or even millions (see the sketch below)
  • Too many parameters
  • Convolutional layers are layers restricted in a special way
  • Many fewer parameters to train
  • There is also a good justification in terms of basic principles
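A back-of-the-envelope comparison of the two parameter counts; the image and kernel sizes below are illustrative assumptions, not numbers from the slides:

```python
d = 256 * 256        # input entries (a modest grayscale image)
e = 256 * 256        # output entries, same size
k = 5 * 5            # a small 2D kernel

fully_connected = (d + 1) * e      # (d + 1)e weights for one unrestricted layer
convolutional = k + 1              # one kernel plus one bias, reused everywhere

print(f"fully connected: {fully_connected:,} weights")   # about 4.3 billion
print(f"convolutional:   {convolutional} weights")
```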

  12. Hierarchy, Locality, Reuse
  • To find a person, look for a face, a torso, limbs, ...
  • To find a face, look for eyes, nose, ears, mouth, hair, ...
  • To find an eye, look for a circle, some corners, some curved edges, ...
  • A hierarchical image model is less sensitive to viewpoint, body configuration, ...
  • Hierarchy leads to a cascade of layers
  • Low-level features are local: a neuron doesn't need to see the entire image
  • Circles are circles, regardless of where they show up: a single neuron can be reused to look for circles anywhere in the image

  13. Correlation, Locality, and Reuse
  • Does the drumbeat on the left show up in the clip on the right?
  • Drumbeat g has 25 samples, the clip x has 100
  • Make 100 − 25 + 1 = 76 neurons that look for g in every possible position
  • y_i = ρ(v_i^T x + b_i), where v_i^T = [0, ..., 0, g_0, ..., g_24, 0, ..., 0] places the kernel g starting at position i
  • The gain matrix V stacks these rows: each row is a copy of [g_0, ..., g_24] shifted one position to the right relative to the row above, with zeros elsewhere (see the sketch below)
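A sketch of this banded gain matrix V, checked against NumPy's built-in valid correlation; the drumbeat and clip below are random stand-ins for the audio signals on the slide:

```python
import numpy as np

def correlation_matrix(g, d):
    """Rows are shifted copies of g; V @ x computes the valid correlation."""
    k = len(g)
    e = d - k + 1                        # number of output samples (76 here)
    V = np.zeros((e, d))
    for i in range(e):
        V[i, i:i + k] = g                # kernel g starting at position i
    return V

rng = np.random.default_rng(2)
g = rng.standard_normal(25)              # drumbeat template
x = rng.standard_normal(100)             # audio clip
V = correlation_matrix(g, len(x))

# V @ x agrees with NumPy's 'valid' cross-correlation of x with g
print(np.allclose(V @ x, np.correlate(x, g, mode="valid")))   # True
```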

  14. Compact Computation
  • With the gain matrix V above, z_i = v_i^T x = Σ_{a=0}^{24} g_a x_{i+a} for i = 0, ..., 75
  • In general, z_i = Σ_{a=0}^{k−1} g_a x_{i+a} for i = 0, ..., e − 1 = 0, ..., d − k (see the sketch below)
  • This is (one-dimensional) correlation
  • g is the kernel
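The general formula written as an explicit loop and checked against np.correlate; the example signals are made up:

```python
import numpy as np

def correlate1d(x, g):
    # z_i = sum_{a=0}^{k-1} g_a * x_{i+a}, for i = 0, ..., d - k
    d, k = len(x), len(g)
    e = d - k + 1                                   # output length
    return np.array([sum(g[a] * x[i + a] for a in range(k)) for i in range(e)])

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
g = np.array([1.0, 0.0, -1.0])
print(correlate1d(x, g))                            # [ 1.  3. -3. -2.]
print(np.correlate(x, g, mode="valid"))             # same result
```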

  15. A Small Example
  • z_i = Σ_{a=0}^{2} g_a x_{i+a} for i = 0, ..., 5 (kernel length k = 3, input length d = 8)
  • z = V x with
    V = [ g_0 g_1 g_2  0   0   0   0   0
           0  g_0 g_1 g_2  0   0   0   0
           0   0  g_0 g_1 g_2  0   0   0
           0   0   0  g_0 g_1 g_2  0   0
           0   0   0   0  g_0 g_1 g_2  0
           0   0   0   0   0  g_0 g_1 g_2 ]

  16. Correlation and Convolution
  • A layer whose gain matrix V is a correlation matrix is called a convolutional layer
  • It also includes biases b
  • The correlation of x with g = [g_0, ..., g_{k−1}] is the convolution of x with the reversed kernel r = [r_0, ..., r_{k−1}] = [g_{k−1}, ..., g_0] (see the check below)
  • There are deep reasons why mathematicians prefer convolution
  • We do not need to get into these, but see the notes
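A quick numerical check of this correlation/convolution relationship in NumPy, on made-up signals:

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
g = np.array([1.0, 0.0, -1.0])
r = g[::-1]                                  # reversed kernel

corr = np.correlate(x, g, mode="valid")      # correlation of x with g
conv = np.convolve(x, r, mode="valid")       # convolution of x with r
print(np.allclose(corr, conv))               # True
```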

  17. Input Padding
  • If the input has d entries and the kernel has k, then the output has e = d − k + 1 entries
  • This shrinkage is inconvenient when cascading several layers
  • Pad the input with k − 1 zeros to make the output have d entries
  • Padding is typically asymmetric when the index is time, symmetric when the index is position in space
  • [Figure: the input x is padded with zeros into x′, and correlating x′ with g gives an output z with as many entries as x]
  • This is padded, shape-preserving, or 'same' correlation (see the sketch below)
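A sketch of padded ('same') correlation with symmetric zero padding, as used for spatial signals; the example signals are made up:

```python
import numpy as np

def same_correlate1d(x, g):
    k = len(g)
    # k - 1 zeros in total, split symmetrically around the input
    left = (k - 1) // 2
    right = k - 1 - left
    x_padded = np.pad(x, (left, right))              # zero padding
    return np.correlate(x_padded, g, mode="valid")

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
g = np.array([1.0, 0.0, -1.0])
z = same_correlate1d(x, g)
print(len(x), len(z))                 # 6 6: the output keeps the input length
```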

  18. 2D Correlation
  • The definition generalizes in a straightforward way to 2D images:
    z_ij = Σ_{a=0}^{k_1−1} Σ_{b=0}^{k_2−1} g_ab x_{i+a, j+b}
    for i = 0, ..., e_1 − 1 = 0, ..., d_1 − k_1 and j = 0, ..., e_2 − 1 = 0, ..., d_2 − k_2 (see the sketch below)
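The 2D formula written directly as nested loops; the small image and kernel below are made-up examples:

```python
import numpy as np

def correlate2d_valid(x, g):
    d1, d2 = x.shape
    k1, k2 = g.shape
    e1, e2 = d1 - k1 + 1, d2 - k2 + 1
    z = np.zeros((e1, e2))
    for i in range(e1):
        for j in range(e2):
            # z_ij = sum_a sum_b g_ab * x_{i+a, j+b}
            z[i, j] = np.sum(g * x[i:i + k1, j:j + k2])
    return z

x = np.arange(25, dtype=float).reshape(5, 5)       # 5x5 "image"
g = np.array([[1.0, 0.0], [0.0, -1.0]])            # 2x2 kernel
print(correlate2d_valid(x, g))                     # 4x4 output
```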

  19. Stride
  • Output z_ij is often similar to z_{i,j+1} and z_{i+1,j}
  • Images often vary slowly over space
  • Reduce the redundancy in the output by computing correlations with a stride s_m greater than one
  • Only compute every s_m-th output value in dimension m ∈ {1, 2} (see the sketch below)
  • Output size shrinks from d_1 × d_2 to about d_1/s_1 × d_2/s_2
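A sketch of stride on top of an ordinary valid 2D correlation, assuming SciPy is available for the baseline correlation; keeping only every s_m-th output is done here by slicing the full result, which gives the same values a strided loop would compute:

```python
import numpy as np
from scipy.signal import correlate2d

def strided_correlate2d(x, g, s1, s2):
    z = correlate2d(x, g, mode="valid")    # ordinary (unstrided) correlation
    return z[::s1, ::s2]                   # keep every s1-th row, s2-th column

x = np.arange(64, dtype=float).reshape(8, 8)
g = np.ones((3, 3))
print(strided_correlate2d(x, g, 2, 2).shape)   # (3, 3), roughly (8/2) x (8/2)
```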
