Generative Adversarial Networks - Stefano Ermon, Aditya Grover


  1. Generative Adversarial Networks. Stefano Ermon, Aditya Grover. Stanford University. Lecture 9.

  2. Recap: Model families
     - Autoregressive Models: $p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_{<i})$
     - Variational Autoencoders: $p_\theta(x) = \int p_\theta(x, z)\, dz$
     - Normalizing Flow Models: $p_X(x; \theta) = p_Z\!\left(f_\theta^{-1}(x)\right) \left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right|$
     All of the above families are based on maximizing likelihoods (or approximations thereof).
     Is the likelihood a good indicator of the quality of samples generated by the model?
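
A small worked instance of the change-of-variables formula above, using an illustrative 1D affine flow: take $x = f_\theta(z) = az + b$ with $a \neq 0$ and $z \sim p_Z$. Then $f_\theta^{-1}(x) = (x - b)/a$ and $\partial f_\theta^{-1}(x)/\partial x = 1/a$, so
$$p_X(x; \theta) = p_Z\!\left(\tfrac{x - b}{a}\right) \frac{1}{|a|},$$
which for $p_Z = \mathcal{N}(0, 1)$ is exactly the density of $\mathcal{N}(b, a^2)$.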

  3. Towards likelihood-free learning
     Case 1: The optimal generative model gives both the best sample quality and the highest test log-likelihood.
     For imperfect models, however, achieving high log-likelihoods might not imply good sample quality, and vice versa (Theis et al., 2016).

  4. Towards likelihood-free learning
     Case 2: Great test log-likelihoods, poor samples. E.g., consider a discrete noise mixture model
     $p_\theta(x) = 0.01\, p_{\text{data}}(x) + 0.99\, p_{\text{noise}}(x)$,
     for which 99% of the samples are just noise.
     Taking logs, we get a lower bound:
     $\log p_\theta(x) = \log[0.01\, p_{\text{data}}(x) + 0.99\, p_{\text{noise}}(x)] \geq \log\big(0.01\, p_{\text{data}}(x)\big) = \log p_{\text{data}}(x) - \log 100$
     For expected log-likelihoods, we therefore know that:
     - Lower bound: $E_{p_{\text{data}}}[\log p_\theta(x)] \geq E_{p_{\text{data}}}[\log p_{\text{data}}(x)] - \log 100$
     - Upper bound (via non-negativity of KL): $E_{p_{\text{data}}}[\log p_{\text{data}}(x)] \geq E_{p_{\text{data}}}[\log p_\theta(x)]$
     As we increase the dimension of $x$, the absolute value of $\log p_{\text{data}}(x)$ increases proportionally, but $\log 100$ remains constant. Hence $E_{p_{\text{data}}}[\log p_\theta(x)] \approx E_{p_{\text{data}}}[\log p_{\text{data}}(x)]$ in very high dimensions.
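
A quick numeric check of this argument, as a minimal sketch. The choice $p_{\text{data}} = \mathcal{N}(0, I_d)$ and $p_{\text{noise}} = \mathcal{N}(0, 100\, I_d)$ below is an illustrative assumption; the point is that the gap between $E_{p_{\text{data}}}[\log p_{\text{data}}(x)]$ and $E_{p_{\text{data}}}[\log p_\theta(x)]$ stays below $\log 100$ while $|E_{p_{\text{data}}}[\log p_{\text{data}}(x)]|$ grows linearly with the dimension $d$.

```python
import numpy as np

def iid_gauss_logpdf(x, sigma):
    """log N(x; 0, sigma^2 I) evaluated row-wise."""
    d = x.shape[1]
    return -0.5 * d * np.log(2 * np.pi * sigma**2) - 0.5 * (x**2).sum(axis=1) / sigma**2

rng = np.random.default_rng(0)
for d in [1, 10, 100, 1000]:
    x = rng.standard_normal((5000, d))              # x ~ p_data = N(0, I_d)
    log_p_data = iid_gauss_logpdf(x, sigma=1.0)
    log_p_noise = iid_gauss_logpdf(x, sigma=10.0)   # p_noise = N(0, 100 I_d)
    # log p_theta(x) = log(0.01 p_data(x) + 0.99 p_noise(x)), computed stably
    log_p_theta = np.logaddexp(np.log(0.01) + log_p_data,
                               np.log(0.99) + log_p_noise)
    gap = log_p_data.mean() - log_p_theta.mean()
    print(f"d={d:4d}  E[log p_data] ~ {log_p_data.mean():9.1f}   gap ~ {gap:4.2f}  (log 100 = {np.log(100):.2f})")
```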

  5. Towards likelihood-free learning
     Case 3: Great samples, poor test log-likelihoods. E.g., memorizing the training set:
     - Samples look exactly like the training set (cannot do better!)
     - The test set is assigned zero probability (cannot do worse!)
     The above cases suggest that it can be useful to disentangle likelihoods and samples.
     Likelihood-free learning considers objectives that do not depend directly on a likelihood function.

  6. Comparing distributions via samples
     Given finite sets of samples from two distributions, $S_1 = \{x \sim P\}$ and $S_2 = \{x \sim Q\}$, how can we tell whether these samples come from the same distribution (i.e., whether $P = Q$)?

  7. Two-sample tests
     Given $S_1 = \{x \sim P\}$ and $S_2 = \{x \sim Q\}$, a two-sample test considers the following hypotheses:
     - Null hypothesis $H_0$: $P = Q$
     - Alternative hypothesis $H_1$: $P \neq Q$
     A test statistic $T$ compares $S_1$ and $S_2$, e.g., the difference in the means or variances of the two sets of samples.
     If $T$ is less than a threshold $\alpha$, accept $H_0$; otherwise reject it.
     Key observation: the test statistic is likelihood-free, since it does not involve the densities $P$ or $Q$ (only the samples).
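
A minimal sketch of such a test in code. The statistic (absolute difference of sample means) and the permutation-based threshold are illustrative choices; the point is only that the procedure touches the samples, never the densities $P$ or $Q$.

```python
import numpy as np

def two_sample_test(S1, S2, num_perms=1000, alpha=0.05, seed=0):
    """Permutation two-sample test with statistic T = |mean(S1) - mean(S2)| summed over dimensions."""
    rng = np.random.default_rng(seed)
    T_obs = np.abs(S1.mean(axis=0) - S2.mean(axis=0)).sum()
    pooled = np.concatenate([S1, S2])
    n = len(S1)
    T_null = []
    for _ in range(num_perms):                    # distribution of T under H0: P = Q
        perm = rng.permutation(len(pooled))
        A, B = pooled[perm[:n]], pooled[perm[n:]]
        T_null.append(np.abs(A.mean(axis=0) - B.mean(axis=0)).sum())
    threshold = np.quantile(T_null, 1 - alpha)
    return "reject H0" if T_obs > threshold else "accept H0"

rng = np.random.default_rng(1)
print(two_sample_test(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (500, 2))))    # typically: accept H0
print(two_sample_test(rng.normal(0, 1, (500, 2)), rng.normal(0.5, 1, (500, 2))))  # typically: reject H0
```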

  8. Generative modeling and two-sample tests
     A priori, we assume direct access to $S_1 = \mathcal{D} = \{x \sim p_{\text{data}}\}$.
     In addition, we have a model distribution $p_\theta$. Assume that the model distribution permits efficient sampling (e.g., directed models), and let $S_2 = \{x \sim p_\theta\}$.
     Alternative notion of distance between distributions: train the generative model to minimize a two-sample test objective between $S_1$ and $S_2$.

  9. Two-sample test via a discriminator
     Finding a good two-sample test objective in high dimensions is hard.
     In the generative-model setup, we know that $S_1$ and $S_2$ come from different distributions, $p_{\text{data}}$ and $p_\theta$ respectively.
     Key idea: learn a statistic that maximizes a suitable notion of distance between the two sets of samples $S_1$ and $S_2$.

  10. Generative Adversarial Networks
     A two-player minimax game between a generator and a discriminator.
     [Diagram: $z \to G_\theta \to x$]
     Generator: a directed, latent-variable model with a deterministic mapping from $z$ to $x$ given by $G_\theta$.
     It minimizes a two-sample test objective (in support of the null hypothesis $p_{\text{data}} = p_\theta$).

  11. Generative Adversarial Networks
     A two-player minimax game between a generator and a discriminator.
     [Diagram: $x \to D_\phi \to y$]
     Discriminator: any function (e.g., a neural network) that tries to distinguish "real" samples from the dataset and "fake" samples generated by the model.
     It maximizes the two-sample test objective (in support of the alternative hypothesis $p_{\text{data}} \neq p_\theta$).
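
The two players can be written down compactly in code. A minimal PyTorch sketch, where the MLP architectures, the layer widths, and the 784-dimensional data (e.g., flattened 28x28 images) are illustrative assumptions:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784   # assumed sizes for illustration

# Generator G_theta: deterministic mapping from noise z to a sample x
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# Discriminator D_phi: maps x to the probability y that x is a real data point
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # z ~ p(z) = N(0, I)
x_fake = G(z)                     # "fake" samples from p_G
print(D(x_fake).shape)            # torch.Size([16, 1]), values in (0, 1)
```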

  12. Example of GAN objective
     Training objective for the discriminator:
     $\max_D V(G, D) = E_{x \sim p_{\text{data}}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]$
     For a fixed generator $G$, the discriminator is performing binary classification with the cross-entropy objective:
     - Assign probability 1 to true data points $x \sim p_{\text{data}}$
     - Assign probability 0 to fake samples $x \sim p_G$
     Optimal discriminator: $D^*_G(x) = \dfrac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}$
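
The binary-classification view can be made concrete: maximizing $V(G, D)$ over $D$ is the same as minimizing the standard binary cross-entropy loss with label 1 for real points and label 0 for generated points. A sketch that reuses the hypothetical G and D modules from the previous sketch:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, x_real, x_fake):
    """-V(G, D) estimated on a minibatch: cross entropy with labels 1 (real) and 0 (fake)."""
    # BCE(D(x), 1) = -log D(x)   and   BCE(D(x), 0) = -log(1 - D(x))
    loss_real = F.binary_cross_entropy(D(x_real), torch.ones(len(x_real), 1))
    loss_fake = F.binary_cross_entropy(D(x_fake), torch.zeros(len(x_fake), 1))
    return loss_real + loss_fake

# One gradient-ascent step on V(G, D) w.r.t. D is one descent step on this loss, e.g.:
# discriminator_loss(D, x_real, G(z).detach()).backward()
```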

  13. Example of GAN objective
     Training objective for the generator:
     $\min_G V(G, D) = E_{x \sim p_{\text{data}}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]$
     For the optimal discriminator $D^*_G(\cdot)$, we have
     $V(G, D^*_G) = E_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}\right] + E_{x \sim p_G}\!\left[\log \frac{p_G(x)}{p_{\text{data}}(x) + p_G(x)}\right]$
     $= E_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{\frac{p_{\text{data}}(x) + p_G(x)}{2}}\right] + E_{x \sim p_G}\!\left[\log \frac{p_G(x)}{\frac{p_{\text{data}}(x) + p_G(x)}{2}}\right] - \log 4$
     $= D_{KL}\!\left[p_{\text{data}}, \frac{p_{\text{data}} + p_G}{2}\right] + D_{KL}\!\left[p_G, \frac{p_{\text{data}} + p_G}{2}\right] - \log 4$
     $= 2\, D_{JSD}[p_{\text{data}}, p_G] - \log 4$
     (the sum of the two KL terms is twice the Jensen-Shannon divergence, JSD)
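
A quick numeric sanity check of this identity, with an arbitrary pair of discrete distributions chosen purely for illustration: plugging $D^*_G$ into $V$ agrees with $2\, D_{JSD}[p_{\text{data}}, p_G] - \log 4$.

```python
import numpy as np

p_data = np.array([0.5, 0.3, 0.2])   # arbitrary discrete distributions for the check
p_G    = np.array([0.2, 0.2, 0.6])

def kl(p, q):
    return np.sum(p * np.log(p / q))

d_star = p_data / (p_data + p_G)                         # optimal discriminator D*_G(x)
V = np.sum(p_data * np.log(d_star)) + np.sum(p_G * np.log(1 - d_star))

m = (p_data + p_G) / 2
jsd = 0.5 * (kl(p_data, m) + kl(p_G, m))

print(V, 2 * jsd - np.log(4))                            # the two numbers agree
```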

  14. Jensen-Shannon divergence
     Also called the symmetric KL divergence:
     $D_{JSD}[p, q] = \frac{1}{2}\left( D_{KL}\!\left[p, \frac{p+q}{2}\right] + D_{KL}\!\left[q, \frac{p+q}{2}\right] \right)$
     Properties:
     - $D_{JSD}[p, q] \geq 0$
     - $D_{JSD}[p, q] = 0$ iff $p = q$
     - $D_{JSD}[p, q] = D_{JSD}[q, p]$
     - $\sqrt{D_{JSD}[p, q]}$ satisfies the triangle inequality $\to$ Jensen-Shannon distance
     Optimal generator for the JSD / negative cross-entropy GAN: $p_G = p_{\text{data}}$.
     For the optimal discriminator $D^*_{G^*}(\cdot)$ and generator $G^*(\cdot)$, we have $V(G^*, D^*_{G^*}) = -\log 4$.

  15. The GAN training algorithm
     - Sample a minibatch of $m$ training points $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ from $\mathcal{D}$
     - Sample a minibatch of $m$ noise vectors $z^{(1)}, z^{(2)}, \ldots, z^{(m)}$ from $p_z$
     - Update the generator parameters $\theta$ by stochastic gradient descent:
       $\nabla_\theta V(G_\theta, D_\phi) = \frac{1}{m} \nabla_\theta \sum_{i=1}^{m} \log\!\left(1 - D_\phi(G_\theta(z^{(i)}))\right)$
     - Update the discriminator parameters $\phi$ by stochastic gradient ascent:
       $\nabla_\phi V(G_\theta, D_\phi) = \frac{1}{m} \nabla_\phi \sum_{i=1}^{m} \left[\log D_\phi(x^{(i)}) + \log\!\left(1 - D_\phi(G_\theta(z^{(i)}))\right)\right]$
     - Repeat for a fixed number of epochs
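
A minimal PyTorch sketch of these alternating updates, assuming the hypothetical G, D, and latent_dim from the earlier sketch plus a data_loader yielding real minibatches and a num_epochs setting; the Adam optimizer and its learning rate are illustrative substitutes for the plain stochastic gradient steps above.

```python
import torch

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
eps = 1e-7   # keeps the logs finite when D outputs exactly 0 or 1

for epoch in range(num_epochs):
    for x_real in data_loader:                          # minibatch of m training points
        z = torch.randn(x_real.shape[0], latent_dim)    # minibatch of m noise vectors

        # Update theta by gradient descent on (1/m) sum_i log(1 - D_phi(G_theta(z_i)))
        g_loss = torch.log(1 - D(G(z)) + eps).mean()
        opt_G.zero_grad(); g_loss.backward(); opt_G.step()

        # Update phi by gradient ascent on (1/m) sum_i [log D_phi(x_i) + log(1 - D_phi(G_theta(z_i)))],
        # i.e., gradient descent on the negation
        d_loss = -(torch.log(D(x_real) + eps).mean()
                   + torch.log(1 - D(G(z).detach()) + eps).mean())
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()
```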

  16. Alternating optimization in GANs
     $\min_\theta \max_\phi V(G_\theta, D_\phi) = E_{x \sim p_{\text{data}}}[\log D_\phi(x)] + E_{z \sim p(z)}[\log(1 - D_\phi(G_\theta(z)))]$

  17. Which one is real?
     Both images are generated via GANs!

  18. Frontiers in GAN research
     GANs have been successfully applied to several domains and tasks. However, working with GANs can be very challenging in practice:
     - Unstable optimization
     - Mode collapse
     - Evaluation
     Many "bag of tricks" heuristics are applied to train GANs successfully.

  19. Optimization challenges
     Theorem (informal): if the generator updates are made in function space and the discriminator is optimal at every step, then the generator is guaranteed to converge to the data distribution.
     These are unrealistic assumptions! In practice, the generator and discriminator losses keep oscillating during GAN training.
     There is no robust stopping criterion in practice (unlike in likelihood-based learning).

  20. Mode collapse
     GANs are notorious for suffering from mode collapse.
     Intuitively, this refers to the phenomenon where the generator of a GAN collapses to one or a few samples (dubbed "modes").

  21. Mode collapse
     The true distribution is a mixture of Gaussians.
     The generator distribution keeps oscillating between different modes.
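
The toy setup behind this kind of demonstration is easy to reproduce. Below is a sketch of a 2D target distribution; the choice of 8 modes on a ring and their spacing are assumptions for illustration (the slide only says "mixture of Gaussians"). A collapsed generator produces samples around only one or a few of these modes at a time.

```python
import numpy as np

def sample_ring_mixture(n, num_modes=8, radius=2.0, std=0.05, seed=0):
    """Draw n points from a mixture of `num_modes` Gaussians placed evenly on a circle."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * rng.integers(num_modes, size=n) / num_modes
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return centers + std * rng.standard_normal((n, 2))

x_true = sample_ring_mixture(1000)   # "true" samples whose modes the generator should all cover
```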

  22. Mode collapse
     Fixes to mode collapse are mostly empirically driven: alternative architectures, adding regularization terms, injecting small noise perturbations, etc.
     See "How to Train a GAN? Tips and tricks to make GANs work" by Soumith Chintala: https://github.com/soumith/ganhacks
