

  1. Neural Networks, Hugo Larochelle (@hugo_larochelle), Google Brain

  2. NEURAL NETWORKS
• What we'll cover:
‣ computer vision architectures: convolutional networks, data augmentation, residual networks
‣ natural language processing architectures: word embeddings, recurrent neural networks, long short-term memory networks (LSTMs)
[figure: a feed-forward network mapping an input x = (x_1, ..., x_j, ..., x_d) to an output f(x)]

  3. Neural Networks Computer vision

  4. COMPUTER VISION Topics: computer vision, object recognition
• Computer vision is the design of computers that can process visual data and accomplish some given task
‣ we will focus on object recognition: given some input image, identify which object it contains
[figure: a 150 x 112 pixel image labeled ''sunflower'', from the Caltech 101 dataset]

  5. COMPUTER VISION Topics: computer vision
• We can design neural networks that are specifically adapted for such problems
‣ they must deal with very high-dimensional inputs: 150 x 150 pixels = 22500 inputs, or 3 x 22500 if the pixels are RGB
‣ they can exploit the 2D topology of pixels (or 3D for video data)
‣ they can build in invariance to certain variations we can expect, such as translations and changes in illumination
• Convolutional networks leverage these ideas through local connectivity, parameter sharing, and pooling/subsampling of hidden units (a rough parameter count is sketched below)
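
To make the dimensionality argument concrete, here is a back-of-the-envelope computation; the 100-unit hidden layer and the 5 x 5 receptive field are assumptions for illustration, not numbers from the slides:

```python
# Back-of-the-envelope parameter counts (hidden layer size and receptive
# field are assumed values, not taken from the slides).
height, width, channels = 150, 150, 3
n_inputs = height * width * channels             # 3 x 22500 = 67500 inputs
n_hidden = 100                                   # assumed hidden layer size

# Fully connected: one weight per (input, hidden unit) pair.
fully_connected = n_inputs * n_hidden            # 6,750,000 weights

# Locally connected: each hidden unit sees only an r x r patch (all channels).
r = 5                                            # assumed receptive field
locally_connected = r * r * channels * n_hidden  # 7,500 weights

print(fully_connected, locally_connected)
```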

  6. COMPUTER VISION Topics: local connectivity
• First idea: use a local connectivity of hidden units
‣ each hidden unit is connected only to a subregion (patch) of the input image, with receptive field r
‣ it is connected to all channels: 1 if a grayscale image, 3 (R, G, B) for a color image
• This solves the following problems:
‣ a fully connected hidden layer would have an unmanageable number of parameters
‣ computing the linear activations of the hidden units would be very expensive

  7. COMPUTER VISION Topics: local connectivity
• Units are connected to all channels: 1 channel if a grayscale image, 3 channels (R, G, B) if a color image (see the sketch below)
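
A minimal NumPy sketch of this local connectivity; the array sizes, the patch position, and the tanh non-linearity are assumptions for illustration:

```python
# One hidden unit connected to a single r x r patch of the input,
# across all 3 channels, as the slide describes (sizes assumed).
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((3, 150, 150))   # (channels, height, width) RGB input
r = 5                               # receptive field size (assumed)

# This unit's weights: one r x r matrix per input channel.
W = rng.standard_normal((3, r, r))

i, j = 40, 60                       # top-left corner of the patch (assumed)
patch = image[:, i:i + r, j:j + r]  # the patch, taken from all channels
hidden_unit = np.tanh(np.sum(W * patch))  # linear activation + non-linearity
```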

  8. COMPUTER VISION Topics: parameter sharing
• Second idea: share the matrix of parameters across certain units
‣ units organized into the same ''feature map'' share parameters
‣ hidden units within a feature map cover different positions in the image
[figure: feature maps 1, 2 and 3 over the input; same color = same matrix of connections; W_ij is the matrix connecting the i-th input channel with the j-th feature map]


  14. COMPUTER VISION Topics: parameter sharing
• This solves the following problems:
‣ it reduces the number of parameters even further
‣ it extracts the same features at every position (the features are ''equivariant''; a small demonstration follows)
[figure: feature maps 1, 2 and 3; same color = same matrix of connections; W_ij is the matrix connecting the i-th input channel with the j-th feature map]
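
A small numerical check of the equivariance claim, under assumed shapes and using scipy.signal.correlate2d (my addition, not part of the deck): translating the input translates the feature map by the same amount.

```python
# Because the same kernel is applied at every position, shifting the
# input shifts the feature map identically (all values assumed).
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))

x = np.zeros((8, 8))
x[2, 2] = 1.0                                      # a single bright pixel
x_shifted = np.roll(x, shift=(1, 1), axis=(0, 1))  # same pixel, shifted by 1

fmap = correlate2d(x, kernel, mode='valid')
fmap_shifted = correlate2d(x_shifted, kernel, mode='valid')

# The feature map moved with the input:
assert np.allclose(np.roll(fmap, (1, 1), axis=(0, 1)), fmap_shifted)
```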

  15. COMPUTER VISION Topics: parameter sharing (Jarrett et al. 2009)
• Each feature map forms a 2D grid of features
‣ it can be computed with a discrete convolution (∗) of a kernel matrix k_ij, which is the hidden weights matrix W_ij with its rows and columns flipped
‣ x_i is the i-th channel of the input
‣ k_ij is the convolution kernel
‣ g_j is a learned scaling factor
‣ y_j is the hidden layer (a bias could also have been added)
y_j = g_j tanh( Σ_i k_ij ∗ x_i )
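
A sketch of this formula with NumPy and SciPy; the channel counts, kernel size, and random initialization are assumptions, and the optional bias is omitted as in the slide:

```python
# Compute y_j = g_j * tanh( sum_i k_ij * x_i ) for each feature map j
# (shapes assumed; scipy's convolve2d performs the 2D convolution).
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
n_channels, n_feature_maps = 3, 4
x = rng.random((n_channels, 32, 32))                         # channels x_i
k = rng.standard_normal((n_channels, n_feature_maps, 5, 5))  # kernels k_ij
g = rng.random(n_feature_maps)                               # scalings g_j

feature_maps = []
for j in range(n_feature_maps):
    # sum over input channels of the 2D convolution x_i * k_ij
    s = sum(convolve2d(x[i], k[i, j], mode='valid') for i in range(n_channels))
    feature_maps.append(g[j] * np.tanh(s))
y = np.stack(feature_maps)   # shape (n_feature_maps, 28, 28)
```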

  16. COMPUTER VISION Topics: discrete convolution
• The convolution of an image x with an r x r kernel k is computed as follows:
(x ∗ k)_ij = Σ_pq x_{i+p, j+q} k_{r-p, r-q}
• Example:
x = [  0  80  40 ]     k = [ 0    0.25 ]
    [ 20  40   0 ]         [ 0.5  1    ]
    [  0   0  40 ]

  17. COMPUTER VISION Topics: discrete convolution
• The convolution of an image x with a kernel k is computed as follows:
(x ∗ k)_ij = Σ_pq x_{i+p, j+q} k_{r-p, r-q}
• Example: k̃ = k with its rows and columns flipped
k = [ 0    0.25 ]     k̃ = [ 1     0.5 ]
    [ 0.5  1    ]          [ 0.25  0   ]

  18. COMPUTER VISION Topics: discrete convolution
• Example, first output entry: 1 x 0 + 0.5 x 80 + 0.25 x 20 + 0 x 40 = 45
x ∗ k = [ 45   . ]
        [  .   . ]

  19. COMPUTER VISION Topics: discrete convolution
• Example, second output entry: 1 x 80 + 0.5 x 40 + 0.25 x 40 + 0 x 0 = 110
x ∗ k = [ 45  110 ]
        [  .    . ]

  20. COMPUTER VISION Topics: discrete convolution
• Example, third output entry: 1 x 20 + 0.5 x 40 + 0.25 x 0 + 0 x 0 = 40
x ∗ k = [ 45  110 ]
        [ 40    . ]

  21. COMPUTER VISION Topics: discrete convolution
• Example, fourth output entry: 1 x 40 + 0.5 x 0 + 0.25 x 0 + 0 x 40 = 40
x ∗ k = [ 45  110 ]
        [ 40   40 ]
(this worked example is reproduced in code below)
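
The worked example from slides 16-21 can be reproduced with scipy.signal.convolve2d (my addition, not part of the deck), which flips the kernel internally exactly as the slides do by hand:

```python
# Reproduce the slides' 3x3-image, 2x2-kernel convolution example.
import numpy as np
from scipy.signal import convolve2d

x = np.array([[ 0, 80, 40],
              [20, 40,  0],
              [ 0,  0, 40]], dtype=float)
k = np.array([[0.0, 0.25],
              [0.5, 1.0 ]])

out = convolve2d(x, k, mode='valid')   # flips k, then slides it over x
print(out)                             # [[ 45. 110.]
                                       #  [ 40.  40.]]
```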

  22. COMPUTER VISION Topics: discrete convolution
• Pre-activations from channel x_i into feature map y_j can be computed by:
‣ obtaining the convolution kernel k_ij = W̃_ij, i.e. the connection matrix W_ij with its rows and columns flipped
‣ applying the convolution x_i ∗ k_ij
• This is equivalent to computing the discrete correlation of x_i with W_ij (checked in the sketch below)
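
A quick numerical check of this equivalence; the array sizes are arbitrary and the use of scipy is my addition:

```python
# Convolving x_i with the flipped weights equals correlating x_i with W_ij.
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
x_i = rng.random((8, 8))             # one input channel (assumed size)
W_ij = rng.standard_normal((3, 3))   # connection weights for one feature map

k_ij = W_ij[::-1, ::-1]              # kernel = W_ij with rows and columns flipped

conv = convolve2d(x_i, k_ij, mode='valid')
corr = correlate2d(x_i, W_ij, mode='valid')
assert np.allclose(conv, corr)
```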

  23. COMPUTER VISION Topics: discrete convolution
• Simple illustration: x_i ∗ k_ij, where k_ij = W̃_ij and
W = [ 0    0.5 ]
    [ 0.5  0   ]
[figure: a small binary input image x_i and the resulting feature map x_i ∗ k_ij]

  24. COMPUTER VISION Topics: discrete convolution
• With a non-linearity, we get a detector of a feature at any position in the image: sigm( 0.02 (x_i ∗ k_ij) - 4 )
[figure: the input x_i and the detector output, with values near 0.75 where the feature is present and near 0.02 elsewhere]
(a small sketch of this detector follows)
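
A sketch of this detector: the constants 0.02 and 4 and the weights W come from the slides, while the input image below is an assumed example containing the diagonal feature; its outputs land near the 0.75 / 0.02 values visible in the slide's figure.

```python
# Feature detector: squash the convolution output with a logistic sigmoid.
import numpy as np
from scipy.signal import convolve2d

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

x_i = np.array([[  0,   0, 255,   0],
                [  0, 255,   0,   0],
                [255,   0,   0,   0]], dtype=float)  # a diagonal stroke (assumed)

# The slide's W is unchanged by flipping its rows and columns, so k = W here.
k_ij = np.array([[0.0, 0.5],
                 [0.5, 0.0]])

detector = sigm(0.02 * convolve2d(x_i, k_ij, mode='valid') - 4.0)
print(detector.round(2))   # ~0.75 where the feature is present, ~0.02 elsewhere
```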

  25. COMPUTER VISION Topics: discrete convolution
• We can use ''zero padding'' to allow the kernel to go over the borders of the image (a sketch follows)
[figure: the input x_i surrounded by a border of zeros, and the resulting larger feature map x_i ∗ k_ij]
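
A sketch of zero padding with assumed sizes; scipy's border modes correspond to different amounts of padding, and padding by hand gives the same result:

```python
# Zero padding lets the kernel "go over the borders" of the input.
import numpy as np
from scipy.signal import convolve2d

x_i = np.arange(16, dtype=float).reshape(4, 4)   # a 4x4 input (assumed)
k_ij = np.ones((3, 3)) / 9.0                     # a 3x3 averaging kernel (assumed)

valid = convolve2d(x_i, k_ij, mode='valid')   # no padding: 2x2 output
same  = convolve2d(x_i, k_ij, mode='same')    # pad to keep the 4x4 size
full  = convolve2d(x_i, k_ij, mode='full')    # maximal padding: 6x6 output

# Equivalently, pad with zeros by hand and run a 'valid' convolution:
padded = np.pad(x_i, pad_width=2)             # 2 = kernel size - 1
assert np.allclose(convolve2d(padded, k_ij, mode='valid'), full)
```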
