
CS7015 (Deep Learning) : Lecture 23 Generative Adversarial Networks (GANs)



  1. CS7015 (Deep Learning) : Lecture 23, Generative Adversarial Networks (GANs). Mitesh M. Khapra, Department of Computer Science and Engineering, Indian Institute of Technology Madras. (1/38)

  2. Module 23.1: Generative Adversarial Networks - The intuition

  3. So far we have looked at generative models which explicitly model the joint probability distribution or a conditional probability distribution. For example, in RBMs we learn P(X, H), in VAEs we learn Q_θ(z|X) and P_φ(X|z), whereas in AR models we learn P(X) through the factorization p(x1) p(x2|x1) p(x3|x1, x2) p(x4|x1, x2, x3) ... What if we are only interested in sampling from the distribution and don't really care about the explicit density function P(X)? What does this mean? Let us see. [Figure: diagrams of an RBM (visible units V ∈ {0, 1}^m, hidden units H ∈ {0, 1}^n, weights W ∈ R^{m×n}), a VAE (encoder Q_θ(z|X), decoder P_φ(X|z) with mean μ and covariance Σ), and an autoregressive model over x1, ..., x4.]

  4. As usual we are given some training data (say, MNIST images) which obviously comes from some underlying distribution. Our goal is to generate more images from this distribution (i.e., create images which look similar to the images from the training data). In other words, we want to sample from a complex high dimensional distribution which is intractable (recall that RBMs, VAEs and AR models deal with this intractability in their own way).

  5. GANs take a different approach to this problem: the idea is to sample from a simple tractable distribution (say, z ∼ N(0, I)) and then learn a complex transformation from this to the training distribution. In other words, we will take a z ∼ N(0, I) and learn to make a series of complex transformations on it so that the output looks as if it came from our training distribution. [Figure: a sample z ∼ N(0, I) passed through a complex transformation to produce a generated image.]

  6. What can we use for such a complex transformation? A neural network. How do you train such a neural network? Using a two-player game. There are two players in the game: a generator and a discriminator. The job of the generator is to produce images which look so natural that the discriminator thinks that the images came from the real data distribution. The job of the discriminator is to get better and better at distinguishing between true images and generated (fake) images. [Figure: the generator maps z ∼ N(0, I) to an image; the discriminator takes real images and generated images and outputs real or fake.]

  7. So let's look at the full picture. Let G_φ be the generator and D_θ be the discriminator (φ and θ are the parameters of G and D, respectively). We have a neural network based generator which takes as input a noise vector z ∼ N(0, I) and produces X̂ = G_φ(z). We have a neural network based discriminator which could take as input a real X or a generated X̂ = G_φ(z) and classify the input as real/fake.

  8. What should be the objective function of the overall network? Let's look at the objective function of the generator first. Given an image generated by the generator as G_φ(z), the discriminator assigns a score D_θ(G_φ(z)) to it. This score will be between 0 and 1 and will tell us the probability of the image being real or fake. For a given z, the generator would want to maximize log D_θ(G_φ(z)) (the log likelihood of being real) or, equivalently, minimize log(1 − D_θ(G_φ(z))).
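These two views of the generator's objective move in the same direction as the score improves, which a few illustrative (made-up) scores make concrete:

```python
import math

# Hypothetical discriminator scores D_theta(G_phi(z)) for three generated
# images (illustrative numbers, not from the lecture).
scores = [0.1, 0.5, 0.9]

for d in scores:
    # log D rises and log(1 - D) falls as the generated image fools
    # the discriminator more (d -> 1).
    print(f"D(G(z)) = {d}: log D = {math.log(d):.3f}, "
          f"log(1 - D) = {math.log(1 - d):.3f}")
```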

  9. This is just for a single z, and the generator would like to do this for all possible values of z. For example, if z were discrete and drawn from a uniform distribution (i.e., p(z) = 1/N ∀ z) then the generator's objective function would be

      min_φ (1/N) Σ_{i=1}^{N} log(1 − D_θ(G_φ(z_i)))

However, in our case z is continuous and not uniform (z ∼ N(0, I)), so the equivalent objective function would be

      min_φ ∫ p(z) log(1 − D_θ(G_φ(z))) dz = min_φ E_{z∼p(z)} [log(1 − D_θ(G_φ(z)))]
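Such an expectation over z is exactly what a Monte Carlo average over sampled noise estimates. A minimal sketch, with a one-dimensional stand-in "generator" and a sigmoid "discriminator" chosen purely for illustration (neither comes from the lecture):

```python
import math
import random

random.seed(0)

def G(z):
    # Stand-in "generator": shifts the noise (an assumption for illustration).
    return z + 2.0

def D(x):
    # Stand-in "discriminator": sigmoid score in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Monte Carlo estimate of E_{z ~ N(0, 1)}[log(1 - D(G(z)))]
m = 100_000
est = sum(math.log(1.0 - D(G(random.gauss(0.0, 1.0)))) for _ in range(m)) / m
print(est)
```

In training, the generator's signal is the gradient of this minibatch estimate with respect to φ.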

  10. Now let's look at the discriminator. The task of the discriminator is to assign a high score to real images and a low score to fake images, and it should do this for all possible real images and all possible fake images. In other words, it should try to maximize the following objective function:

      max_θ E_{x∼p_data} [log D_θ(x)] + E_{z∼p(z)} [log(1 − D_θ(G_φ(z)))]
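On a finite minibatch each expectation becomes a sample average; a small sketch with made-up scores shows that a discriminator which separates real from fake scores higher on this objective than one that outputs 0.5 everywhere:

```python
import math

# Hypothetical scores (illustrative numbers, not from the lecture).
real_scores = [0.9, 0.8, 0.95]    # D_theta(x) on real images: want near 1
fake_scores = [0.1, 0.2, 0.05]    # D_theta(G_phi(z)) on fakes: want near 0

def disc_objective(real, fake):
    # Empirical version of E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    return (sum(math.log(d) for d in real) / len(real)
            + sum(math.log(1.0 - d) for d in fake) / len(fake))

good = disc_objective(real_scores, fake_scores)
coin_flip = disc_objective([0.5] * 3, [0.5] * 3)   # undecided discriminator
print(good, coin_flip)
```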

  11. If we put the objectives of the generator and discriminator together we get a minimax game:

      min_φ max_θ [E_{x∼p_data} log D_θ(x) + E_{z∼p(z)} log(1 − D_θ(G_φ(z)))]

The first term in the objective depends only on the parameters of the discriminator (θ). The second term depends on the parameters of the generator (φ) as well as the discriminator (θ). The discriminator wants to maximize the second term whereas the generator wants to minimize it (hence it is a two-player game).

  12. So the overall training proceeds by alternating between these two steps.

Step 1: Gradient ascent on the discriminator:

      max_θ [E_{x∼p_data} log D_θ(x) + E_{z∼p(z)} log(1 − D_θ(G_φ(z)))]

Step 2: Gradient descent on the generator:

      min_φ E_{z∼p(z)} log(1 − D_θ(G_φ(z)))

In practice, the above generator objective does not work well and we use a slightly modified objective. Let us see why.

  13. When the sample is likely fake (D(G(z)) close to 0), we want to give feedback to the generator (using gradients). However, in this region the curve of the loss log(1 − D(G(z))) is very flat, and the gradient would be close to 0. Trick: instead of minimizing the likelihood of the discriminator being correct, i.e., minimizing log(1 − D(G(z))), maximize the likelihood of the discriminator being wrong, i.e., maximize log D(G(z)). In effect, the objective remains the same but the gradient signal becomes better. [Figure: the losses log(1 − D(G(z))) and −log D(G(z)) plotted against D(G(z)) ∈ [0, 1]; the former is flat near 0 while the latter is steep there.]
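Differentiating both losses with respect to the score d = D(G(z)) makes the difference in gradient signal explicit:

```python
# Gradient of each generator loss w.r.t. the score d = D(G(z)):
#   d/dd [ log(1 - d) ] = -1 / (1 - d)   (saturating: magnitude ~1 near d = 0)
#   d/dd [ -log d ]     = -1 / d         (non-saturating: blows up near d = 0)
for d in [0.01, 0.1, 0.5]:
    print(f"d = {d}: saturating grad = {-1 / (1 - d):.2f}, "
          f"non-saturating grad = {-1 / d:.2f}")
```

So when the discriminator confidently rejects a fake, the modified (non-saturating) loss still supplies a strong gradient, while the original one does not.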

  14. With that we are now ready to see the full algorithm for training GANs.

 1: procedure GAN-TRAINING
 2:   for number of training iterations do
 3:     for k steps do
 4:       • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z)
 5:       • Sample a minibatch of m examples {x^(1), ..., x^(m)} from the data generating distribution p_data(x)
 6:       • Update the discriminator by ascending its stochastic gradient:
            ∇_θ (1/m) Σ_{i=1}^{m} [log D_θ(x^(i)) + log(1 − D_θ(G_φ(z^(i))))]
 7:     end for
 8:     • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z)
 9:     • Update the generator by ascending its stochastic gradient:
            ∇_φ (1/m) Σ_{i=1}^{m} log D_θ(G_φ(z^(i)))
10:   end for
11: end procedure
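One iteration of this loop can be sketched in plain Python with tiny stand-in models: a one-parameter "generator" G(z) = z + b and a logistic "discriminator" D(x) = sigmoid(wx + c). The models, the data distribution N(3, 1), and all hyperparameters are assumptions for illustration, not the lecture's networks; the gradients are simply the derivatives of the log-sigmoid terms in the algorithm above.

```python
import math
import random

random.seed(0)
sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))

b = 0.0           # generator parameter (phi):        G(z) = z + b
w, c = 0.1, 0.0   # discriminator parameters (theta): D(x) = sigmoid(w*x + c)
lr, m = 0.01, 64

zs = [random.gauss(0.0, 1.0) for _ in range(m)]   # noise minibatch
xs = [random.gauss(3.0, 1.0) for _ in range(m)]   # "real" minibatch, mean 3

def disc_objective(w, c, b):
    # (1/m) sum_i [log D(x_i) + log(1 - D(G(z_i)))]
    return sum(math.log(sigmoid(w * x + c))
               + math.log(1.0 - sigmoid(w * (z + b) + c))
               for x, z in zip(xs, zs)) / m

# Step 1: gradient ASCENT on the discriminator parameters theta = (w, c)
gw = gc = 0.0
for x, z in zip(xs, zs):
    s, sf = sigmoid(w * x + c), sigmoid(w * (z + b) + c)
    gw += (1 - s) * x - sf * (z + b)    # d/dw of the summand
    gc += (1 - s) - sf                  # d/dc of the summand
w2, c2 = w + lr * gw / m, c + lr * gc / m

# Step 2: gradient ASCENT on the generator parameter phi = b, using the
# non-saturating objective (1/m) sum_i log D(G(z_i))
gb = sum((1 - sigmoid(w2 * (z + b) + c2)) * w2 for z in zs) / m
b2 = b + lr * gb

print(disc_objective(w, c, b), disc_objective(w2, c2, b))
```

After the discriminator step its objective on this minibatch increases, and the generator step nudges b toward the data mean; a real implementation repeats these two steps, resampling minibatches each iteration.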

  15. Module 23.2: Generative Adversarial Networks - Architecture

  16. We will now look at one of the popular neural network architectures used for the generator and discriminator: Deep Convolutional GANs (DCGANs). For the discriminator, any CNN based classifier with a single (real/fake) output can be used (e.g., VGG, ResNet, etc.). Figure: Generator (Radford et al., 2015) (left) and discriminator (Yeh et al., 2016) (right) used in DCGAN.

  17. Architecture guidelines for stable Deep Convolutional GANs: Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator). Use batchnorm in both the generator and the discriminator. Remove fully connected hidden layers for deeper architectures. Use ReLU activation in the generator for all layers except the output, which uses tanh. Use LeakyReLU activation in the discriminator for all layers.
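As a concrete check of the fractional-strided (transposed) convolution guideline, the standard output-size relation out = (in − 1)·stride − 2·padding + kernel shows how kernel 4, stride 2, padding 1 (the commonly cited DCGAN settings, assumed here rather than read off the slide) double the feature map at every generator layer:

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    # Spatial size produced by a fractional-strided (transposed) convolution.
    return (size - 1) * stride - 2 * padding + kernel

sizes = [4]   # generator starts from a 4x4 projection of z
for _ in range(4):
    sizes.append(deconv_out(sizes[-1]))
print(sizes)  # each layer doubles the spatial resolution
```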

  18. Module 23.3: Generative Adversarial Networks - The Math Behind it

  19. We will now delve a bit deeper into the objective function used by GANs and see what it implies. Suppose we denote the true data distribution by p_data(x) and the distribution of the data generated by the model by p_G(x). What do we want to happen at the end of training? We want p_G(x) = p_data(x). Can we prove this formally even though the model is not explicitly computing this density? We will try to prove this over the next few slides.
