Neural Networks | Hugo Larochelle (@hugo_larochelle) | Google Brain



slide-1
SLIDE 1

Neural Networks

Hugo Larochelle ( @hugo_larochelle ) Google Brain

slide-2
SLIDE 2

NEURAL NETWORKS

2

  • What we’ll cover
  • computer vision architectures
  • convolutional networks
  • data augmentation
  • residual networks
  • natural language processing architectures
  • word embeddings
  • recurrent neural networks
  • long short-term memory networks (LSTMs)

[diagram: a generic feed-forward neural network mapping inputs x1, ..., xd through hidden layers to the output f(x)]

slide-3
SLIDE 3

Neural Networks

Computer vision

slide-4
SLIDE 4

COMPUTER VISION

4

Topics: computer vision, object recognition

  • Computer vision is the design of computers that can process visual data and accomplish some given task
  • we will focus on object recognition: given some input image, identify which object it contains

[figure: a 112 x 150 pixel image labeled ‘‘sun flower’’, from the Caltech 101 dataset]

slide-5
SLIDE 5

COMPUTER VISION

5

Topics: computer vision

  • We can design neural networks that are specifically adapted for such problems

  • must deal with very high-dimensional inputs
  • 150 x 150 pixels = 22500 inputs, or 3 x 22500 if RGB pixels
  • can exploit the 2D topology of pixels (or 3D for video data)
  • can build in invariance to certain variations we can expect
  • translations, illumination, etc.
  • Convolutional networks leverage these ideas
  • local connectivity
  • parameter sharing
  • pooling / subsampling hidden units
slide-6
SLIDE 6

COMPUTER VISION

6

Topics: local connectivity

  • First idea: use a local connectivity of hidden units
  • each hidden unit is connected only to a subregion (patch) of the input image
  • it is connected to all channels
  • 1 if grayscale image
  • 3 (R, G, B) for color image
  • Solves the following problems:
  • fully connected hidden layer would have an unmanageable number of parameters
  • computing the linear activations of the hidden units would be very expensive

[diagram: hidden units connected to local patches of the input image; the patch a unit sees is its receptive field r]

slide-7
SLIDE 7

COMPUTER VISION

7

Topics: local connectivity

  • Units are connected to all channels:
  • 1 channel if grayscale image, 3 channels (R, G, B) if color image

[diagram: a hidden unit connected to the same patch in every input channel]

slide-8
SLIDE 8

COMPUTER VISION

8

Topics: parameter sharing

  • Second idea: share matrix of parameters across certain units
  • units organized into the same ‘‘feature map’’ share parameters
  • hidden units within a feature map cover different positions in the image
  • Wij is the matrix connecting the ith input channel with the jth feature map

[diagram: feature map 1, feature map 2, feature map 3; same color = same matrix]
slide-14
SLIDE 14

COMPUTER VISION

9

Topics: parameter sharing

  • Solves the following problems:
  • reduces even more the number of parameters
  • will extract the same features at every position (features are ‘‘equivariant’’)

[diagram: feature map 1, feature map 2, feature map 3; same color = same matrix]

Wij is the matrix connecting the ith input channel with the jth feature map
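To make these savings concrete, here is a small back-of-the-envelope count in Python; the image size matches the earlier 150 x 150 example, but the number of hidden units and the 5 x 5 patch size are illustrative choices, not numbers from the slides.

```python
H = W = 150                    # input image size (as in the earlier example)
n_hidden = 100                 # hidden units / positions per feature map (illustrative)
patch = 5 * 5                  # receptive field size of each hidden unit

fully_connected = n_hidden * H * W     # every unit sees every pixel
locally_connected = n_hidden * patch   # every unit only sees its patch
shared = patch                         # one feature map: all units share the same 5x5 matrix

print(fully_connected, locally_connected, shared)   # 2250000 2500 25
```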

slide-15
SLIDE 15

COMPUTER VISION

10

Topics: parameter sharing

  • Each feature map forms a 2D grid of features
  • can be computed with a discrete convolution ( ∗ ) of a kernel matrix kij, which is the hidden weights matrix Wij with its rows and columns flipped

[figure: input image and resulting feature maps, Jarrett et al. 2009]

    yj = gj tanh( Σi kij ∗ xi )

  • xi is the ith channel of the input
  • kij is the convolution kernel
  • gj is a learned scaling factor
  • yj is the hidden layer (a bias could also have been added)
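A minimal numpy/scipy sketch of this computation, assuming a multi-channel input and a bank of 2D kernels; the array shapes and the use of scipy.signal.convolve2d are my own illustrative choices, not the slide's.

```python
import numpy as np
from scipy.signal import convolve2d

def feature_maps(x, k, g):
    """x: (C, H, W) input channels; k: (C, J, kh, kw) kernels; g: (J,) scaling factors."""
    C, J = k.shape[0], k.shape[1]
    maps = []
    for j in range(J):
        # y_j = g_j * tanh( sum_i  k_ij * x_i )
        s = sum(convolve2d(x[i], k[i, j], mode="valid") for i in range(C))
        maps.append(g[j] * np.tanh(s))
    return np.stack(maps)

x = np.random.randn(3, 8, 8)        # 3-channel 8x8 input
k = np.random.randn(3, 4, 3, 3)     # 4 feature maps, 3x3 kernels
y = feature_maps(x, k, np.ones(4))
print(y.shape)                      # (4, 6, 6)
```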

slide-16
SLIDE 16

COMPUTER VISION

11

Topics: discrete convolution

  • The convolution of an image x with a kernel k is computed as follows:

    (x ∗ k)ij = Σp,q  xi+p,j+q  kr−p,r−q

  • Example:

[figure: input values 80 40 20 40 40 and kernel k = 0.25 0.5 1]

slide-17
SLIDE 17

COMPUTER VISION

11

Topics: discrete convolution

  • The convolution of an image x with a kernel k is computed as follows:

    (x ∗ k)ij = Σp,q  xi+p,j+q  kr−p,r−q

  • Example:

[figure: input values 80 40 20 40 40; kernel k = 0.25 0.5 1; k̃ = k with rows and columns flipped = 1 0.5 0.25]

slide-18
SLIDE 18

COMPUTER VISION

12

Topics: discrete convolution

  • The convolution of an image x with a kernel k is computed as follows:

    (x ∗ k)ij = Σp,q  xi+p,j+q  kr−p,r−q

  • Example:

[figure: sliding the flipped kernel 1 0.5 0.25 over the input 80 40 20 40 40; first output value: 1 x 0 + 0.5 x 80 + 0.25 x 20 + 0 x 40 = 45]

slide-19
SLIDE 19

COMPUTER VISION

13

Topics: discrete convolution

  • The convolution of an image x with a kernel k is computed as follows:

    (x ∗ k)ij = Σp,q  xi+p,j+q  kr−p,r−q

  • Example:

[figure: second output value: 1 x 80 + 0.5 x 40 + 0.25 x 40 + 0 x 0 = 110; result so far: 45 110]

slide-20
SLIDE 20

COMPUTER VISION

14

Topics: discrete convolution

  • The convolution of an image x with a kernel k is computed as follows:

    (x ∗ k)ij = Σp,q  xi+p,j+q  kr−p,r−q

  • Example:

[figure: third output value: 1 x 20 + 0.5 x 40 + 0.25 x 0 + 0 x 0 = 40; result so far: 45 110 40]

slide-21
SLIDE 21

COMPUTER VISION

15

Topics: discrete convolution

  • The convolution of an image x with a kernel k is computed as follows:

    (x ∗ k)ij = Σp,q  xi+p,j+q  kr−p,r−q

  • Example:

[figure: fourth output value: 1 x 40 + 0.5 x 0 + 0.25 x 0 + 0 x 40 = 40; final result: 45 110 40 40]
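The flip relation between convolution and correlation can be checked directly with scipy; the small arrays below are my own illustration, not the values from the slides. (mode="same" with a zero-filled boundary gives the zero-padded variant discussed a few slides later.)

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.array([[0., 80., 40.],
              [20., 40., 40.],
              [0., 0., 255.]])
k = np.array([[0.25, 0.5],
              [1.0, 0.0]])

# convolving with k is the same as correlating with k flipped along both axes
conv = convolve2d(x, k, mode="valid")
corr = correlate2d(x, np.flip(k), mode="valid")
print(np.allclose(conv, corr))   # True
print(conv)
```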

slide-22
SLIDE 22

COMPUTER VISION

16

Topics: discrete convolution

  • Pre-activations from channel xi into feature map yj can be computed by:
  • getting the convolution kernel kij = W̃ij, i.e. the connection matrix Wij with its rows and columns flipped
  • applying the convolution xi ∗ kij
  • This is equivalent to computing the discrete correlation of xi with Wij

slide-23
SLIDE 23

COMPUTER VISION

17

Topics: discrete convolution

  • Simple illustration: xi ∗ k̃ij, where k̃ij = Wij

[figure: binary input image xi (pixel values 0 / 128 / 255) and the result of xi ∗ k̃ij with a small kernel of 0.5 weights]

slide-24
SLIDE 24

COMPUTER VISION

18

Topics: discrete convolution

  • With a non-linearity, we get a detector of a feature at any position in the image

[figure: input image xi and the detector output sigm(0.02 · (xi ∗ k̃ij) − 4), i.e. logistic((value − 200) / 50)]

slide-25
SLIDE 25

COMPUTER VISION

19

Topics: discrete convolution

  • Can use ‘‘zero padding’’ to allow going over the borders ( ∗ )

[figure: input image xi and the convolution xi ∗ kij without padding]

slide-26
SLIDE 26

COMPUTER VISION

19

Topics: discrete convolution

  • Can use ‘‘zero padding’’ to allow going over the borders ( ∗ )

[figure: the input xi surrounded by a border of zeros, and the resulting convolution xi ∗ kij computed over the padded image]

slide-28
SLIDE 28

COMPUTER VISION

20

Topics: pooling, stride

  • Illustration of pooling (2x2) + subsampling using stride (2)
  • Solves the following problems:
  • introduces invariance to local translations
  • reduces the number of hidden units in hidden layer

[figure: a grid of feature values; each non-overlapping 2x2 block is reduced to a single value by taking the max]
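A minimal numpy sketch of 2x2 max pooling with stride 2; the feature values below are loosely based on the figure and otherwise illustrative.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (H, W) feature map (H, W even)."""
    H, W = x.shape
    # group pixels into non-overlapping 2x2 blocks, then take the max of each block
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[0.19, 0.19, 0.75, 0.02],
                 [0.02, 0.19, 0.19, 0.02],
                 [0.02, 0.75, 0.02, 0.02],
                 [0.75, 0.02, 0.02, 0.02]])
print(max_pool_2x2(fmap))
# [[0.19 0.75]
#  [0.75 0.02]]
```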

slide-29
SLIDE 29

COMPUTER VISION

21

Topics: pooling and subsampling

  • Illustration of local translation invariance
  • both images given the same feature map after pooling/subsampling

[figure: two input images with the feature shifted by one pixel; after the ‘‘complex cell’’ (pooling/subsampling) layer both give the same feature map 0.19 0.19 0.75 0.02]

slide-30
SLIDE 30

CONVOLUTIONAL NETWORK

22

Topics: convolutional network

  • Convolutional neural network alternates between the convolutional and pooling layers

[figure: convolutional network alternating convolution and pooling/subsampling layers, ending with fully connected layers (from Yann LeCun)]

slide-31
SLIDE 31

CONVOLUTIONAL NETWORK

23

Topics: convolutional network

  • Output layer is a regular, fully connected layer with softmax non-linearity
  • output provides an estimate of the conditional probability of each class
  • The network is trained by stochastic gradient descent
  • backpropagation is used similarly as in a fully connected network
  • we have seen how to pass gradients through element-wise activation functions
  • we also need to pass gradients through the convolution operation and the pooling operation
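A minimal PyTorch sketch of such a network trained with stochastic gradient descent; the layer sizes, number of classes and input resolution are illustrative assumptions, not the slide's.

```python
import torch
import torch.nn as nn

# a small convolutional network: conv -> pool -> conv -> pool -> fully connected
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),         # 10 classes; the softmax is folded into the loss
)

loss_fn = nn.CrossEntropyLoss()        # log-softmax + negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 3, 32, 32)          # a fake mini-batch of 32x32 RGB images
y = torch.randint(0, 10, (4,))
loss = loss_fn(model(x), y)
loss.backward()                        # backprop through conv, pooling and linear layers
optimizer.step()
```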

slide-32
SLIDE 32

CONVOLUTIONAL NETWORK

24

Topics: residual networks, bottleneck feature maps, batch normalization

  • Very deep models are often used, with residual connections and bottleneck feature maps
  • reduces potential problems with vanishing gradients
  • Batch normalization is adapted to also normalize across spatial locations

[figure: a regular residual block (two 3x3, 64 convolutions on a 64-d input) next to a bottleneck residual block (1x1, 64 then 3x3, 64 then 1x1, 256 convolutions on a 256-d input), each with ReLUs and a skip connection]

Deep Residual Learning for Image Recognition, He et al. 2015
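A hedged PyTorch sketch of the 1x1 / 3x3 / 1x1 bottleneck residual block shown in the figure, with batch normalization after each convolution; the exact ordering and hyperparameters follow common practice and may differ from the paper's details.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of a 1x1 -> 3x3 -> 1x1 bottleneck residual block (He et al. 2015 style)."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # the skip connection lets gradients flow around the convolutions
        return self.relu(self.body(x) + x)

block = Bottleneck()
print(block(torch.randn(1, 256, 14, 14)).shape)   # torch.Size([1, 256, 14, 14])
```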

slide-33
SLIDE 33

INVARIANCE BY DATA AUGMENTATION

25

Topics: generating additional examples

  • Invariances built into a convolutional network:
  • small translations: due to convolution and max pooling
  • small illumination changes: due to local contrast normalization
  • It is not invariant to other important variations such as rotations and scale changes
  • However, it’s easy to artificially generate data with such transformations
  • could use such data as additional training data
  • neural network will learn to be invariant to such transformations
slide-34
SLIDE 34

INVARIANCE BY DATA AUGMENTATION

26

Topics: generating additional examples

[figure: an original image and versions generated by translation, rotation and scaling, each cropped and with the transformation undone]
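A possible way to generate such additional examples, assuming the torchvision library; the specific transforms and their parameters are illustrative, not taken from the slides.

```python
import torchvision.transforms as T

# each epoch sees a randomly transformed copy of every training image
augment = T.Compose([
    T.RandomRotation(degrees=15),                    # small rotations
    T.RandomResizedCrop(112, scale=(0.8, 1.0)),      # random scaling + crop back to a fixed size
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2),     # small illumination changes
    T.ToTensor(),
])

# e.g. torchvision.datasets.ImageFolder("train/", transform=augment)
```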

slide-35
SLIDE 35

Neural Networks

Natural language processing

slide-36
SLIDE 36

NEURAL NETWORKS FOR NLP

28

  • What we’ll cover
  • how to feed text data to neural networks
  • preprocessing
  • word representations (embeddings) with lookup table
  • neural network language modeling
  • how to classify text data with neural networks
  • average word embedding
  • recurrent neural networks (RNNs)
  • long short-term memory (LSTM) networks
slide-37
SLIDE 37

NATURAL LANGUAGE PROCESSING

29

Topics: tokenization

  • Typical preprocessing steps of text data
  • tokenize text (from a long string to a list of token strings)
  • for many datasets, this has already been done for you
  • splitting into tokens based on spaces and separating punctuation is good enough in English or French

‘‘ He’s spending 7 days in San Francisco. ’’
→ ‘‘ He ’’ ‘‘ ’s ’’ ‘‘ spending ’’ ‘‘ 7 ’’ ‘‘ days ’’ ‘‘ in ’’ ‘‘ San Francisco ’’ ‘‘ . ’’
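A minimal tokenizer sketch along these lines; note that keeping ‘‘ San Francisco ’’ as a single token, as on the slide, would need an extra step (e.g. a list of multi-word expressions).

```python
import re

def tokenize(text):
    """Split on whitespace and separate punctuation and clitics like "'s"."""
    return re.findall(r"'s|[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]", text)

print(tokenize("He's spending 7 days in San Francisco."))
# ['He', "'s", 'spending', '7', 'days', 'in', 'San', 'Francisco', '.']
```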

slide-38
SLIDE 38

NATURAL LANGUAGE PROCESSING

30

Topics: lemmatization

  • Typical preprocessing steps of text data
  • lemmatize tokens (put into standard form)
  • the specific lemmatization will depend on the problem we want to solve
  • we can remove variations of words that are not relevant to the task at hand

‘‘ He ’’ ‘‘ ’s ’’ ‘‘ spending ’’ ‘‘ 7 ’’ ‘‘ days ’’ ‘‘ in ’’ ‘‘ San Francisco ’’ ‘‘ . ’’
→ ‘‘ he ’’ ‘‘ be ’’ ‘‘ spend ’’ ‘‘ NUMBER ’’ ‘‘ day ’’ ‘‘ in ’’ ‘‘ San Francisco ’’ ‘‘ . ’’

slide-39
SLIDE 39

NATURAL LANGUAGE PROCESSING

31

Topics: vocabulary

  • Typical preprocessing steps of text data
  • form vocabulary of words that maps lemmatized words to a unique ID (position of word in vocabulary)
  • different criteria can be used to select which words are part of the vocabulary
  • pick most frequent words
  • ignore uninformative words from a user-defined short list (ex.: ‘‘ the ’’, ‘‘ a ’’, etc.)
  • all words not in the vocabulary will be mapped to a special ‘‘out-of-vocabulary’’ ID
  • Typical vocabulary sizes will vary between 10 000 and 250 000

slide-40
SLIDE 40

NATURAL LANGUAGE PROCESSING

32

Topics: vocabulary

  • Example:
  • We will note word IDs with the symbol w
  • can think of w as a categorical feature for the original word
  • we will sometimes refer to w as a word, for simplicity

Vocabulary:

  Word         w
  ‘‘ the ’’    1
  ‘‘ and ’’    2
  ‘‘ dog ’’    3
  ‘‘ . ’’      4
  ‘‘ OOV ’’    5

‘‘ the ’’ ‘‘ cat ’’ ‘‘ and ’’ ‘‘ the ’’ ‘‘ dog ’’ ‘‘ play ’’ ‘‘ . ’’  →  1 5 2 1 3 5 4
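A minimal sketch of this mapping, using the toy vocabulary above.

```python
# "OOV" catches any word that is not in the vocabulary
vocab = {"the": 1, "and": 2, "dog": 3, ".": 4}
OOV_ID = 5

def to_ids(tokens):
    return [vocab.get(t, OOV_ID) for t in tokens]

print(to_ids(["the", "cat", "and", "the", "dog", "play", "."]))
# [1, 5, 2, 1, 3, 5, 4]
```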

slide-41
SLIDE 41

NATURAL LANGUAGE PROCESSING

33

Topics: one-hot encoding

  • From its word ID, we get a basic representation of a word through the one-hot encoding of the ID
  • the one-hot vector of an ID is a vector filled with 0s, except for a 1 at the position associated with the ID
  • ex.: for vocabulary size D=10, the one-hot vector of word ID w=4 is

    e(w) = [ 0 0 0 1 0 0 0 0 0 0 ]

  • a one-hot encoding makes no assumption about word similarity
  • ||e(w) − e(w’)||² = 0 if w = w’
  • ||e(w) − e(w’)||² = 2 if w ≠ w’
  • all words are equally different from each other
  • this is a natural representation to start with, though a poor one
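A small numpy sketch of the one-hot encoding and the distances just mentioned (word IDs taken as 1-based, matching the slides).

```python
import numpy as np

def one_hot(w, D):
    """One-hot encoding of word ID w (1-based) in a size-D vocabulary."""
    e = np.zeros(D)
    e[w - 1] = 1.0
    return e

print(one_hot(4, 10))                                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.sum((one_hot(4, 10) - one_hot(4, 10))**2))    # 0.0  (same word)
print(np.sum((one_hot(4, 10) - one_hot(7, 10))**2))    # 2.0  (different words)
```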
slide-42
SLIDE 42

NATURAL LANGUAGE PROCESSING

34

Topics: one-hot encoding

  • The major problem with the one-hot representation is that it is very high-dimensional
  • the dimensionality of e(w) is the size of the vocabulary
  • a typical vocabulary size is ≈ 100 000
  • a window of 10 words would correspond to an input vector of at least 1 000 000 units!

  • This has 2 consequences:
  • vulnerability to overfitting
  • millions of inputs means millions of parameters to train in a regular neural network
  • computationally expensive
  • not all computations can be sparsified (ex.: reconstruction in autoencoder)
slide-43
SLIDE 43

WORD REPRESENTATIONS

35

Topics: continuous word representation

  • Idea: learn a continuous representation of words
  • each word w is associated with a real-valued vector C(w)

  Word          w    C(w)
  ‘‘ the ’’     1    [ 0.6762, -0.9607, 0.3626, -0.2410, 0.6636 ]
  ‘‘ a ’’       2    [ 0.6859, -0.9266, 0.3777, -0.2140, 0.6711 ]
  ‘‘ have ’’    3    [ 0.1656, -0.1530, 0.0310, -0.3321, -0.1342 ]
  ‘‘ be ’’      4    [ 0.1760, -0.1340, 0.0702, -0.2981, -0.1111 ]
  ‘‘ cat ’’     5    [ 0.5896, 0.9137, 0.0452, 0.7603, -0.6541 ]
  ‘‘ dog ’’     6    [ 0.5965, 0.9143, 0.0899, 0.7702, -0.6392 ]
  ‘‘ car ’’     7    [ -0.0069, 0.7995, 0.6433, 0.2898, 0.6359 ]
  ...           ...  ...

slide-44
SLIDE 44

WORD REPRESENTATIONS

36

Topics: continuous word representation

  • Idea: learn a continuous representation of words
  • we would like the distance ||C(w) − C(w’)|| to reflect meaningful similarities between words

[figure: 2D visualization of learned word representations where related words cluster together: modal verbs (MAY, WOULD, COULD, SHOULD, MIGHT, MUST, CAN, ...), numbers (ONE, TWO, THREE, ...), months (JANUARY, FEBRUARY, ...), days of the week (MONDAY, TUESDAY, ...) (from Blitzer et al. 2004)]

slide-45
SLIDE 45

WORD REPRESENTATIONS

37

Topics: continuous word representation

  • Idea: learn a continuous representation of words
  • we could then use these representations as input to a neural network
  • to represent a window of 10 words [w1, ... , w10], we concatenate the representations of each word

    x = [C(w1)⊤, ... , C(w10)⊤]⊤

  • We learn these representations by gradient descent
  • we don’t only update the neural network parameters
  • we also update each representation C(w) in the input x with a gradient step

    C(w) ← C(w) − α ∇C(w) l

    where l is the loss function optimized by the neural network

slide-46
SLIDE 46

WORD REPRESENTATIONS

38

Topics: word representations as a lookup table

  • Let C be a matrix whose rows are the representations C(w)
  • obtaining C(w) corresponds to the multiplication e(w)⊤ C
  • viewed differently, we are projecting e(w) onto the columns of C
  • this is a reduction of the dimensionality of the one-hot representations e(w)
  • this is a continuous transformation, through which we can propagate gradients
  • In practice, we implement C(w) with a lookup table, not with a multiplication
  • C(w) returns an array pointing to the wth row of C
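A small numpy check that the multiplication e(w)⊤ C and the row lookup give the same vector; sizes are illustrative.

```python
import numpy as np

V, d = 5, 3                       # vocabulary size and embedding size (illustrative)
C = np.random.randn(V, d)         # one row per word
w = 4                             # a word ID (1-based, as on the slides)

e = np.zeros(V); e[w - 1] = 1.0   # one-hot encoding of w
print(np.allclose(e @ C, C[w - 1]))   # True: the multiplication is just a row lookup
```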
slide-47
SLIDE 47

NEURAL NETWORK LANGUAGE MODEL

39

Topics: neural network language model

  • Solution: model the conditional p(wt | wt−(n−1) , ... , wt−1) with a neural network
  • learn word representations to allow transfer to n-grams not observed in the training corpus

[figure: neural network language model of Bengio, Ducharme, Vincent and Jauvin, 2003: the previous words wt−n+1, ... , wt−1 are mapped through a shared lookup table C to C(wt−n+1), ... , C(wt−1), fed through a tanh hidden layer (matrices W1, ... , Wn−1), and a softmax output layer whose i-th output is P(wt = i | context); most of the computation is in the softmax, normalized across words]
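A minimal PyTorch sketch in the spirit of this model (a shared lookup table, a tanh hidden layer and a softmax over the vocabulary); layer sizes and names are illustrative assumptions, not those of Bengio et al.

```python
import torch
import torch.nn as nn

class NGramLM(nn.Module):
    """Predict word w_t from the previous n-1 word IDs."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=500, n=4):
        super().__init__()
        self.C = nn.Embedding(vocab_size, emb_dim)        # shared lookup table C
        self.hidden = nn.Linear((n - 1) * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)          # softmax over the vocabulary

    def forward(self, context):                           # context: (batch, n-1) word IDs
        x = self.C(context).flatten(1)                    # concatenate C(w_{t-n+1}), ..., C(w_{t-1})
        return self.out(torch.tanh(self.hidden(x)))       # logits for P(w_t = i | context)

model = NGramLM()
context = torch.randint(0, 10000, (8, 3))                 # batch of 8 three-word contexts
loss = nn.CrossEntropyLoss()(model(context), torch.randint(0, 10000, (8,)))
loss.backward()    # gradients flow into the word representations C as well
```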

slide-51
SLIDE 51

NEURAL NETWORK LANGUAGE MODEL

40

Topics: neural network language model

  • Can potentially generalize to contexts not seen in training set
  • example: p(‘‘ eating ’’ | ‘‘ the ’’, ‘‘ cat ’’, ‘‘ is ’’)
  • imagine 4-gram [‘‘ the ’’, ‘‘ cat ’’, ‘‘ is ’’, ‘‘ eating ’’ ] is not in the training corpus, but [‘‘ the ’’, ‘‘ dog ’’, ‘‘ is ’’, ‘‘ eating ’’ ] is
  • if the word representations of ‘‘ cat ’’ and ‘‘ dog ’’ are similar, then the neural network will be able to generalize to the case of ‘‘ cat ’’
  • neural network could learn similar word representations for those words based on other 4-grams:
    [‘‘ the ’’, ‘‘ cat ’’, ‘‘ was ’’, ‘‘ sleeping ’’ ]
    [‘‘ the ’’, ‘‘ dog ’’, ‘‘ was ’’, ‘‘ sleeping ’’ ]

slide-52
SLIDE 52

NEURAL NETWORK LANGUAGE MODEL

41

Topics: word representation gradients

  • We know how to propagate gradients in such a network
  • we know how to compute the gradient for the linear activation of the hidden layer
  • let’s note the submatrix connecting wt−i and the hidden layer as Wi
  • The gradient wrt C(w) for any w is

    ∇C(w) l = Σi=1…n−1  1(wt−i = w)  Wi⊤ ∇a(x) l

[figure: the same neural language model diagram, with the lookup table C shared across input positions]

slide-53
SLIDE 53

NEURAL NETWORK LANGUAGE MODEL

42

Topics: word representation gradients

  • Example: [‘‘ the ’’, ‘‘ dog ’’, ‘‘ and ’’, ‘‘ the ’’, ‘‘ cat ’’ ], with word IDs w3 = 21, w4 = 3, w5 = 14, w6 = 21 and target w7 = ‘‘ cat ’’
  • the loss is l = − log p(‘‘ cat ’’ | ‘‘ the ’’, ‘‘ dog ’’, ‘‘ and ’’, ‘‘ the ’’)
  • Only need to update the representations C(3), C(14) and C(21):

    ∇C(3) l = W3⊤ ∇a(x) l
    ∇C(14) l = W2⊤ ∇a(x) l
    ∇C(21) l = W1⊤ ∇a(x) l + W4⊤ ∇a(x) l
    ∇C(w) l = 0 for all other words w
slide-54
SLIDE 54

CLASSIFYING TEXT DATA

43

Topics: neural network architectures for text classification

  • Need to go from word representations to a text representation (e.g. sentences, documents, etc.)
  • from the text representation, we can feed into (multiple) feed-forward layers
  • how to go from a sequence of word embeddings to a single text embedding?
  • Depending on the complexity of the problem, the best architecture will vary

  • average word embedding
  • recurrent neural networks (RNNs)
  • long short-term memory (LSTM) networks
slide-55
SLIDE 55

AVERAGE WORD EMBEDDING

44

Topics: average word embedding

  • Simply average the embeddings
  • referred to as mean pooling
  • may use the sum if the number of words is important
  • may use a TF-IDF weighting within the average

[figure: word IDs w1, ... , w4 are mapped through the lookup table C to C(w1), ... , C(w4), averaged (no weights here), then passed through layers W(1), W(2), W(3) and a softmax whose i-th output is P(y = i-th class | w)]
slide-56
SLIDE 56

RECURRENT NEURAL NETWORK

45

Topics: recurrent neural network

  • Have a hidden layer per position
  • layer at position t depends on layer at t−1
  • use the layer at the last position as the representation of the text w
  • ‘‘Recurrent’’ because weights are shared across positions
  • may initialize h0 = 0

    h(1)t = tanh(b(1) + U(1) h(1)t−1 + W(1) C(wt))

[figure: word IDs w1, ... , w4 are mapped through the lookup table C to C(w1), ... , C(w4); each feeds a hidden state h(1)t through W(1), the states are chained through U(1), and the last state feeds the softmax whose i-th output is P(y = i-th class | w)]
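A minimal numpy sketch of this recurrence, reusing the same W, U and b at every position and returning the last hidden state as the text representation; sizes are illustrative.

```python
import numpy as np

def rnn_text_representation(word_ids, C, W, U, b):
    """h_t = tanh(b + U h_{t-1} + W C(w_t)); returns the last hidden state."""
    h = np.zeros(U.shape[0])                    # h_0 = 0
    for w in word_ids:
        h = np.tanh(b + U @ h + W @ C[w - 1])   # same W, U, b reused at every position
    return h

d, H = 3, 4                                     # embedding and hidden sizes (illustrative)
C = np.random.randn(5, d)
W, U, b = np.random.randn(H, d), np.random.randn(H, H), np.zeros(H)
print(rnn_text_representation([1, 5, 2, 1, 3], C, W, U, b).shape)   # (4,)
```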

slide-57
SLIDE 57

RECURRENT NEURAL NETWORK

46

Topics: deep RNN

  • Easy to turn into a deep RNN
  • example with depth 2

    h(1)t = tanh(b(1) + U(1) h(1)t−1 + W(1) C(wt))
    h(2)t = tanh(b(2) + U(2) h(2)t−1 + W(2) h(1)t)

[figure: two stacked recurrent layers; the first-layer states h(1)1, ... , h(1)4 feed the second-layer states h(2)1, ... , h(2)4, and the last second-layer state feeds the softmax (through W(3)) whose i-th output is P(y = i-th class | w)]

slide-58
SLIDE 58

RECURRENT NEURAL NETWORK

47

Topics: bidirectional RNN

  • Can extract a representation in either direction
  • Bidirectional RNN:
  • have 2 RNNs, one in each direction
  • concatenate the representations from both directions

[figure: a forward RNN and a backward RNN over the word embeddings C(w1), ... , C(w4); their final states are concatenated and fed (through W(3)) to the softmax whose i-th output is P(y = i-th class | w)]

slide-59
SLIDE 59

RECURRENT NEURAL NETWORK

48

Topics: recurrent neural network

  • RNNs easily suffer from the vanishing gradient problem
  • Long short-term memory (LSTM) networks address this issue

[figure: recurrent network over C(w1), C(w2), C(w3), with hidden states h1, h2, h3 connected through matrices W and U]
slide-60
SLIDE 60

LONG SHORT-TERM MEMORY NETWORK

49

Topics: long short-term memory (LSTM) network

  • Layer ht is a function of memory cells

[figure: LSTM over C(w1), C(w2), C(w3): each position has a memory cell and a hidden state ht; matrices W, U and V are shared across positions]

Hochreiter, Schmidhuber 1995

slide-61
SLIDE 61

LONG SHORT-TERM MEMORY NETWORK

49

Topics: long short-term memory (LSTM) network

  • Layer ht is a function of memory cells

[figure: the same LSTM diagram, highlighting the input, forget and output gates]

Input, forget, output gates:

    it = sigm(b[i] + U[i] ht−1 + W[i] C(wt))
    ft = sigm(b[f] + U[f] ht−1 + W[f] C(wt))
    ot = sigm(b[o] + U[o] ht−1 + W[o] C(wt))

Hochreiter, Schmidhuber 1995

slide-63
SLIDE 63

LONG SHORT-TERM MEMORY NETWORK

49

Topics: long short-term memory (LSTM) network

  • Layer ht is a function of memory cells

[figure: the same LSTM diagram, highlighting the memory cell state]

Cell state:

    c̃t = tanh(b[c] + U[c] ht−1 + W[c] C(wt))
    ct = ft ⊙ ct−1 + it ⊙ c̃t

Hochreiter, Schmidhuber 1995

slide-65
SLIDE 65

LONG SHORT-TERM MEMORY NETWORK

49

Topics: long short-term memory (LSTM) network

  • Layer ht is a function of memory cells

[figure: the same LSTM diagram, highlighting the hidden layer]

Hidden layer:

    ht = ot ⊙ tanh(ct)

Hochreiter, Schmidhuber 1995

slide-66
SLIDE 66

LONG SHORT-TERM MEMORY NETWORK

50

Topics: long short-term memory (LSTM) network

  • To sum up:

Input, forget, output gates:

    it = sigm(b[i] + U[i] ht−1 + W[i] C(wt))
    ft = sigm(b[f] + U[f] ht−1 + W[f] C(wt))
    ot = sigm(b[o] + U[o] ht−1 + W[o] C(wt))

Cell state:

    c̃t = tanh(b[c] + U[c] ht−1 + W[c] C(wt))
    ct = ft ⊙ ct−1 + it ⊙ c̃t

Hidden layer:

    ht = ot ⊙ tanh(ct)
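A minimal numpy sketch of one LSTM step following these equations; parameter shapes and names are illustrative.

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p holds the W, U, b parameters of each gate and of the cell."""
    i = sigm(p["b_i"] + p["U_i"] @ h_prev + p["W_i"] @ x)   # input gate
    f = sigm(p["b_f"] + p["U_f"] @ h_prev + p["W_f"] @ x)   # forget gate
    o = sigm(p["b_o"] + p["U_o"] @ h_prev + p["W_o"] @ x)   # output gate
    c_tilde = np.tanh(p["b_c"] + p["U_c"] @ h_prev + p["W_c"] @ x)
    c = f * c_prev + i * c_tilde                             # cell state
    h = o * np.tanh(c)                                       # hidden layer
    return h, c

d, H = 3, 4                                                  # illustrative sizes
p = {f"W_{g}": np.random.randn(H, d) for g in "ifoc"}
p.update({f"U_{g}": np.random.randn(H, H) for g in "ifoc"})
p.update({f"b_{g}": np.zeros(H) for g in "ifoc"})
h, c = np.zeros(H), np.zeros(H)
for x in np.random.randn(5, d):                              # e.g. 5 word embeddings C(w_t)
    h, c = lstm_step(x, h, c, p)
```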

slide-69
SLIDE 69

LONG SHORT-TERM MEMORY NETWORK

50

Topics: long short-term memory (LSTM) network

  • To sum up:

Input, forget, output gates:

    it = sigm(b[i] + U[i] ht−1 + W[i] C(wt))
    ft = sigm(b[f] + U[f] ht−1 + W[f] C(wt))
    ot = sigm(b[o] + U[o] ht−1 + W[o] C(wt))

    } the gates control the flow of information into (it, ft) and out (ot) of the cell

Cell state:

    c̃t = tanh(b[c] + U[c] ht−1 + W[c] C(wt))
    ct = ft ⊙ ct−1 + it ⊙ c̃t

    } the cell state maintains information on the input

Hidden layer:

    ht = ot ⊙ tanh(ct)

    } the hidden layer sees what passes through the output gate

slide-70
SLIDE 70

LONG SHORT-TERM MEMORY NETWORK

51

Topics: long-term dependencies, forget bias initialization

  • Why is it better at learning long-term dependencies?

    ct = ft ⊙ ct−1 + it ⊙ c̃t

slide-71
SLIDE 71

LONG SHORT-TERM MEMORY NETWORK

51

Topics: long-term dependencies, forget bias initialization

  • Why is it better at learning long-term dependencies?

    ct = ft ⊙ ft−1 ⊙ ct−2 + ft ⊙ it−1 ⊙ c̃t−1 + it ⊙ c̃t

slide-74
SLIDE 74

LONG SHORT-TERM MEMORY NETWORK

51

Topics: long-term dependencies, forget bias initialization

  • Why is it better at learning long-term dependencies?

    ct = Σt′=0…t  ft ⊙ · · · ⊙ ft′+1 ⊙ it′ ⊙ c̃t′

  • As long as the forget gates are open (close to 1), the gradient may pass over very long time gaps
  • saturation of the forget gates doesn’t stop gradient flow
  • suggests that a better initialization of the forget gate bias b[f] is ≫ 0 (e.g. 1)
  • Easy to compute gradients with automatic differentiation
  • known as backprop through time (BPTT)

slide-75
SLIDE 75

THANK YOU!

52