
CS7015 (Deep Learning) : Lecture 23 Generative Adversarial Networks (GANs)



  1. CS7015 (Deep Learning) : Lecture 23, Generative Adversarial Networks (GANs). Mitesh M. Khapra, Department of Computer Science and Engineering, Indian Institute of Technology Madras. (1/38)

  2. Module 23.1: Generative Adversarial Networks - The intuition

  3. So far we have looked at generative models which explicitly model the joint probability distribution or a conditional probability distribution. For example, in RBMs we learn P(X, H), in VAEs we learn Q_θ(z|X) and P_φ(X|z), whereas in AR models we learn P(X) through the factorization p(x1) p(x2|x1) p(x3|x1, x2) p(x4|x1, x2, x3) ... What if we are only interested in sampling from the distribution and don't really care about the explicit density function P(X)? What does this mean? Let us see. [Figure: diagrams of an RBM (visible units V ∈ {0, 1}^m, hidden units H ∈ {0, 1}^n, weights W ∈ R^{m×n}), a VAE (encoder Q_θ(z|X), decoder P_φ(X|z) with mean μ and covariance Σ), and an autoregressive model over x1, ..., x4.]

  4. As usual we are given some training data (say, MNIST images) which obviously comes from some underlying distribution. Our goal is to generate more images from this distribution (i.e., create images which look similar to the images from the training data). In other words, we want to sample from a complex high dimensional distribution which is intractable (recall that RBMs, VAEs and AR models deal with this intractability in their own way).

  5. GANs take a different approach to this problem: the idea is to sample from a simple tractable distribution (say, z ∼ N(0, I)) and then learn a complex transformation from this to the training distribution. In other words, we will take a z ∼ N(0, I) and learn to make a series of complex transformations on it so that the output looks as if it came from our training distribution. [Figure: a sample z ∼ N(0, I) passed through a complex transformation to produce a generated image.]

  6. What can we use for such a complex transformation? A neural network. How do you train such a neural network? Using a two-player game. There are two players in the game: a generator and a discriminator. The job of the generator is to produce images which look so natural that the discriminator thinks that the images came from the real data distribution. The job of the discriminator is to get better and better at distinguishing between true images and generated (fake) images. [Figure: the generator maps z ∼ N(0, I) to an image; the discriminator takes real images and generated images and outputs real or fake.]

  7. So let's look at the full picture. Let G_φ be the generator and D_θ be the discriminator (φ and θ are the parameters of G and D, respectively). We have a neural network based generator which takes as input a noise vector z ∼ N(0, I) and produces X̂ = G_φ(z). We have a neural network based discriminator which could take as input a real X or a generated X̂ = G_φ(z) and classify the input as real/fake.

  8. What should be the objective function of the overall network? Let's look at the objective function of the generator first. Given an image generated by the generator as G_φ(z), the discriminator assigns a score D_θ(G_φ(z)) to it. This score will be between 0 and 1 and will tell us the probability of the image being real or fake. For a given z, the generator would want to maximize log D_θ(G_φ(z)) (the log likelihood of being real) or, equivalently, minimize log(1 − D_θ(G_φ(z))).
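These two views of the generator's objective move in the same direction as the score improves, which a few illustrative (made-up) scores make concrete:

```python
import math

# Hypothetical discriminator scores D_theta(G_phi(z)) for three generated
# images (illustrative numbers, not from the lecture).
scores = [0.1, 0.5, 0.9]

for d in scores:
    # log D rises and log(1 - D) falls as the generated image fools
    # the discriminator more (d -> 1).
    print(f"D(G(z)) = {d}: log D = {math.log(d):.3f}, "
          f"log(1 - D) = {math.log(1 - d):.3f}")
```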

  9. This is just for a single z, and the generator would like to do this for all possible values of z. For example, if z were discrete and drawn from a uniform distribution (i.e., p(z) = 1/N ∀ z) then the generator's objective function would be

      min_φ (1/N) Σ_{i=1}^{N} log(1 − D_θ(G_φ(z_i)))

However, in our case z is continuous and not uniform (z ∼ N(0, I)), so the equivalent objective function would be

      min_φ ∫ p(z) log(1 − D_θ(G_φ(z))) dz = min_φ E_{z∼p(z)} [log(1 − D_θ(G_φ(z)))]
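Such an expectation over z is exactly what a Monte Carlo average over sampled noise estimates. A minimal sketch, with a one-dimensional stand-in "generator" and a sigmoid "discriminator" chosen purely for illustration (neither comes from the lecture):

```python
import math
import random

random.seed(0)

def G(z):
    # Stand-in "generator": shifts the noise (an assumption for illustration).
    return z + 2.0

def D(x):
    # Stand-in "discriminator": sigmoid score in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Monte Carlo estimate of E_{z ~ N(0, 1)}[log(1 - D(G(z)))]
m = 100_000
est = sum(math.log(1.0 - D(G(random.gauss(0.0, 1.0)))) for _ in range(m)) / m
print(est)
```

In training, the generator's signal is the gradient of this minibatch estimate with respect to φ.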

  10. Now let's look at the discriminator. The task of the discriminator is to assign a high score to real images and a low score to fake images, and it should do this for all possible real images and all possible fake images. In other words, it should try to maximize the following objective function:

      max_θ E_{x∼p_data} [log D_θ(x)] + E_{z∼p(z)} [log(1 − D_θ(G_φ(z)))]
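On a finite minibatch each expectation becomes a sample average; a small sketch with made-up scores shows that a discriminator which separates real from fake scores higher on this objective than one that outputs 0.5 everywhere:

```python
import math

# Hypothetical scores (illustrative numbers, not from the lecture).
real_scores = [0.9, 0.8, 0.95]    # D_theta(x) on real images: want near 1
fake_scores = [0.1, 0.2, 0.05]    # D_theta(G_phi(z)) on fakes: want near 0

def disc_objective(real, fake):
    # Empirical version of E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    return (sum(math.log(d) for d in real) / len(real)
            + sum(math.log(1.0 - d) for d in fake) / len(fake))

good = disc_objective(real_scores, fake_scores)
coin_flip = disc_objective([0.5] * 3, [0.5] * 3)   # undecided discriminator
print(good, coin_flip)
```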

  11. If we put the objectives of the generator and discriminator together we get a minimax game:

      min_φ max_θ [E_{x∼p_data} log D_θ(x) + E_{z∼p(z)} log(1 − D_θ(G_φ(z)))]

The first term in the objective depends only on the parameters of the discriminator (θ). The second term depends on the parameters of the generator (φ) as well as the discriminator (θ). The discriminator wants to maximize the second term whereas the generator wants to minimize it (hence it is a two-player game).

  12. So the overall training proceeds by alternating between these two steps.

Step 1: Gradient ascent on the discriminator:

      max_θ [E_{x∼p_data} log D_θ(x) + E_{z∼p(z)} log(1 − D_θ(G_φ(z)))]

Step 2: Gradient descent on the generator:

      min_φ E_{z∼p(z)} log(1 − D_θ(G_φ(z)))

In practice, the above generator objective does not work well and we use a slightly modified objective. Let us see why.

  13. When the sample is likely fake (D(G(z)) close to 0), we want to give feedback to the generator (using gradients). However, in this region the curve of the loss log(1 − D(G(z))) is very flat, and the gradient would be close to 0. Trick: instead of minimizing the likelihood of the discriminator being correct, i.e., minimizing log(1 − D(G(z))), maximize the likelihood of the discriminator being wrong, i.e., maximize log D(G(z)). In effect, the objective remains the same but the gradient signal becomes better. [Figure: the losses log(1 − D(G(z))) and −log D(G(z)) plotted against D(G(z)) ∈ [0, 1]; the former is flat near 0 while the latter is steep there.]
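Differentiating both losses with respect to the score d = D(G(z)) makes the difference in gradient signal explicit:

```python
# Gradient of each generator loss w.r.t. the score d = D(G(z)):
#   d/dd [ log(1 - d) ] = -1 / (1 - d)   (saturating: magnitude ~1 near d = 0)
#   d/dd [ -log d ]     = -1 / d         (non-saturating: blows up near d = 0)
for d in [0.01, 0.1, 0.5]:
    print(f"d = {d}: saturating grad = {-1 / (1 - d):.2f}, "
          f"non-saturating grad = {-1 / d:.2f}")
```

So when the discriminator confidently rejects a fake, the modified (non-saturating) loss still supplies a strong gradient, while the original one does not.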

  14. With that we are now ready to see the full algorithm for training GANs.

 1: procedure GAN-TRAINING
 2:   for number of training iterations do
 3:     for k steps do
 4:       • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z)
 5:       • Sample a minibatch of m examples {x^(1), ..., x^(m)} from the data generating distribution p_data(x)
 6:       • Update the discriminator by ascending its stochastic gradient:
            ∇_θ (1/m) Σ_{i=1}^{m} [log D_θ(x^(i)) + log(1 − D_θ(G_φ(z^(i))))]
 7:     end for
 8:     • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z)
 9:     • Update the generator by ascending its stochastic gradient:
            ∇_φ (1/m) Σ_{i=1}^{m} log D_θ(G_φ(z^(i)))
10:   end for
11: end procedure
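One iteration of this loop can be sketched in plain Python with tiny stand-in models: a one-parameter "generator" G(z) = z + b and a logistic "discriminator" D(x) = sigmoid(wx + c). The models, the data distribution N(3, 1), and all hyperparameters are assumptions for illustration, not the lecture's networks; the gradients are simply the derivatives of the log-sigmoid terms in the algorithm above.

```python
import math
import random

random.seed(0)
sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))

b = 0.0           # generator parameter (phi):        G(z) = z + b
w, c = 0.1, 0.0   # discriminator parameters (theta): D(x) = sigmoid(w*x + c)
lr, m = 0.01, 64

zs = [random.gauss(0.0, 1.0) for _ in range(m)]   # noise minibatch
xs = [random.gauss(3.0, 1.0) for _ in range(m)]   # "real" minibatch, mean 3

def disc_objective(w, c, b):
    # (1/m) sum_i [log D(x_i) + log(1 - D(G(z_i)))]
    return sum(math.log(sigmoid(w * x + c))
               + math.log(1.0 - sigmoid(w * (z + b) + c))
               for x, z in zip(xs, zs)) / m

# Step 1: gradient ASCENT on the discriminator parameters theta = (w, c)
gw = gc = 0.0
for x, z in zip(xs, zs):
    s, sf = sigmoid(w * x + c), sigmoid(w * (z + b) + c)
    gw += (1 - s) * x - sf * (z + b)    # d/dw of the summand
    gc += (1 - s) - sf                  # d/dc of the summand
w2, c2 = w + lr * gw / m, c + lr * gc / m

# Step 2: gradient ASCENT on the generator parameter phi = b, using the
# non-saturating objective (1/m) sum_i log D(G(z_i))
gb = sum((1 - sigmoid(w2 * (z + b) + c2)) * w2 for z in zs) / m
b2 = b + lr * gb

print(disc_objective(w, c, b), disc_objective(w2, c2, b))
```

After the discriminator step its objective on this minibatch increases, and the generator step nudges b toward the data mean; a real implementation repeats these two steps, resampling minibatches each iteration.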

  15. Module 23.2: Generative Adversarial Networks - Architecture

  16. We will now look at one of the popular neural network architectures used for the generator and discriminator: Deep Convolutional GANs (DCGANs). For the discriminator, any CNN based classifier with a single (real/fake) output can be used (e.g., VGG, ResNet, etc.). Figure: Generator (Radford et al., 2015) (left) and discriminator (Yeh et al., 2016) (right) used in DCGAN.

  17. Architecture guidelines for stable Deep Convolutional GANs: Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator). Use batchnorm in both the generator and the discriminator. Remove fully connected hidden layers for deeper architectures. Use ReLU activation in the generator for all layers except the output, which uses tanh. Use LeakyReLU activation in the discriminator for all layers.
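As a concrete check of the fractional-strided (transposed) convolution guideline, the standard output-size relation out = (in − 1)·stride − 2·padding + kernel shows how kernel 4, stride 2, padding 1 (the commonly cited DCGAN settings, assumed here rather than read off the slide) double the feature map at every generator layer:

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    # Spatial size produced by a fractional-strided (transposed) convolution.
    return (size - 1) * stride - 2 * padding + kernel

sizes = [4]   # generator starts from a 4x4 projection of z
for _ in range(4):
    sizes.append(deconv_out(sizes[-1]))
print(sizes)  # each layer doubles the spatial resolution
```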

  18. Module 23.3: Generative Adversarial Networks - The Math Behind it

  19. We will now delve a bit deeper into the objective function used by GANs and see what it implies. Suppose we denote the true data distribution by p_data(x) and the distribution of the data generated by the model by p_G(x). What do we want to happen at the end of training? We want p_G(x) = p_data(x). Can we prove this formally even though the model is not explicitly computing this density? We will try to prove this over the next few slides.
