Generative Adversarial Networks (GANs)
- M. Soleymani
Sharif University of Technology, Spring 2020. Most slides are based on Fei-Fei Li and colleagues' lectures (CS231n, Stanford, 2018), with some slides from Raymond Yeh et al. (CS598LAZ, Illinois, 2017).
– Fully visible belief networks
– Variational autoencoders (variational inference)
– GANs, in contrast, are trained without even needing to explicitly define a density function.
Ian Goodfellow, Generative Adversarial Networks, NIPS 2016 Tutorial.
Goal: sample from a complex, high-dimensional training distribution.
– No direct way to do this!
– Solution: sample from a simple distribution, e.g., random noise.
– Then, learn a transformation to the training distribution.
Q: What can represent this complex transformation?
– A neural network!
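This idea can be sketched with a toy, untrained generator: a small MLP (hypothetical layer sizes) that maps simple Gaussian noise to sample-shaped outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained two-layer MLP generator: maps 16-dim noise
# vectors to 784-dim outputs (e.g., flattened 28x28 "images").
W1 = rng.normal(scale=0.1, size=(16, 128))
W2 = rng.normal(scale=0.1, size=(128, 784))

def generator(z):
    h = np.maximum(0.0, z @ W1)   # ReLU hidden layer
    return np.tanh(h @ W2)        # squash outputs into [-1, 1]

z = rng.normal(size=(5, 16))      # sample from a simple prior: N(0, I)
x_fake = generator(z)
print(x_fake.shape)               # (5, 784)
```

Training (below) is what turns this random transformation into one whose outputs look like the training data.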
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
[Figure: GAN architecture. Input noise z goes into the generator network G, producing a generated sample G(z); the discriminator network D receives either G(z) or a real sample x and outputs real/fake.]
J^{(D)} = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]
J^{(G)} = \mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right]

\theta_d^* = \arg\max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]
\theta_g^* = \arg\max_{\theta_g} \; \mathbb{E}_{z \sim p(z)}\left[\log D_{\theta_d}(G_{\theta_g}(z))\right]

D_{\theta_d}(x): discriminator output for real samples; D_{\theta_d}(G_{\theta_g}(z)): discriminator output for generated samples.
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
– Generator: tries to fool the discriminator by generating real-looking samples.
– Discriminator: evaluates the samples produced by the generator network.
\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
D_{\theta_d}(x): discriminator output for real samples; D_{\theta_d}(G_{\theta_g}(z)): discriminator output for generated samples.
The discriminator (\theta_d) wants to maximize the objective so that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake). The generator (\theta_g) wants to minimize the objective so that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real).
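In a numpy sketch (sigmoid discriminator outputs assumed), the two objectives become:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes E[log D(x)] + E[log(1 - D(G(z)))];
    # equivalently, it minimizes the negated objective below.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss_minimax(d_fake):
    # G minimizes E[log(1 - D(G(z)))] in the original minimax game.
    return np.mean(np.log(1.0 - d_fake))

d_real = np.array([0.9, 0.8])  # D's outputs on real samples (D wants these near 1)
d_fake = np.array([0.1, 0.2])  # D's outputs on generated samples (D wants 0, G wants 1)
print(discriminator_loss(d_real, d_fake))
print(generator_loss_minimax(d_fake))
```

Note the generator loss depends on the discriminator's parameters only through d_fake, which is what makes this a two-player game rather than a single optimization.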
J^{(D)} = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right], \qquad J^{(G)} = -J^{(D)}

\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
The training procedure for G is to maximize the probability of D making a mistake.
– Generator G: creates (fake) samples that the discriminator cannot distinguish from real ones.
– Discriminator D: distinguishes fake samples from real ones.
Ian J. Goodfellow et al. Generative Adversarial Networks, 2014.
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
The gradient of the generator loss is relatively flat for likely-fake samples, yet these are exactly the samples from which we want to improve the generator.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
In the generator, instead of minimizing the likelihood of the discriminator being correct, maximize the likelihood of the discriminator being wrong. This gives a higher gradient for likely-fake samples, as intended.
Aside: jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training; this is an active area of research.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
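The gradient difference can be checked numerically. A sketch, treating each loss as a function of d = D(G(z)) directly:

```python
import numpy as np

d = np.array([0.01, 0.5, 0.99])   # D(G(z)): from "obviously fake" to "fooling D"

# Minimax generator loss log(1 - d): derivative wrt d is -1 / (1 - d).
grad_minimax = -1.0 / (1.0 - d)
# Non-saturating loss -log d: derivative wrt d is -1 / d.
grad_nonsaturating = -1.0 / d

# Near d = 0 (samples the discriminator confidently rejects), the minimax
# gradient is tiny (about -1) while the non-saturating one is large (about -100).
print(grad_minimax)
print(grad_nonsaturating)
```

So the non-saturating loss pushes hardest exactly where the generator is currently worst, which is what stabilizes early training.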
[Figure: GAN training setup. Noise z goes into the generator network G, producing a generated sample G(z); the discriminator network D classifies G(z) versus real samples x from the training data as real/fake.]
θ_d is updated with θ_g fixed; θ_g is updated with θ_d fixed.
Each player's cost depends on the other player's parameters, but each player can only optimize its own parameters.
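The alternating scheme can be sketched as a loop skeleton (all names are hypothetical stand-ins; a real implementation would backpropagate through actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n):       # stand-in for drawing real training samples
    return rng.normal(3.0, 0.5, size=n)

def sample_noise(n):      # simple prior over z
    return rng.normal(size=n)

def train(steps, batch=32, k=1):
    history = []
    for step in range(steps):
        for _ in range(k):                 # k discriminator steps, theta_g fixed
            x = sample_data(batch)
            z = sample_noise(batch)
            # ... update theta_d by ascending its objective on (x, G(z)) ...
        z = sample_noise(batch)            # one generator step, theta_d fixed
        # ... update theta_g by descending its objective on G(z) ...
        history.append(step)
    return history

print(len(train(5)))   # 5
```

The original paper uses k = 1 in its experiments, but allows several discriminator steps per generator step.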
[Figure: noise z goes into the generator network G, producing a generated sample G(z).]
After training, the generator network alone is used for sample generation.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
[Figure: generated samples on MNIST and CIFAR-10. The rightmost column shows the nearest training sample to the generated sample in the second-rightmost column.]
[Figure: generated samples on TFD and CIFAR-10.] Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
– An all-convolutional net: pooling layers are replaced with strided convolutions (discriminator) and fractionally-strided convolutions (generator).
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
Learning a conditional model p(y|x)
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
– This representation can capture useful high-level, abstract semantic properties.
– However, it is difficult to make direct use of it.
Interpolation between a series of 9 random points in Z
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
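Latent-space interpolation amounts to decoding points on a line between two codes. A sketch (linear interpolation shown; spherical interpolation is also common for Gaussian latents):

```python
import numpy as np

rng = np.random.default_rng(0)
z0 = rng.normal(size=100)          # two random points in Z (100-dim, as in DCGAN)
z1 = rng.normal(size=100)

alphas = np.linspace(0.0, 1.0, 9)  # 9 evenly spaced steps, endpoints included
z_interp = np.stack([(1.0 - a) * z0 + a * z1 for a in alphas])

print(z_interp.shape)              # (9, 100)
# Feeding each row through the trained generator yields the smooth
# image-to-image transition shown on the slide.
```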
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
[Figure: samples generated by the model.]
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
[Figure: samples generated by the model, and the sample corresponding to the average Z of the images above.]
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
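The vector arithmetic operates on averaged latent codes. A toy sketch with random stand-in vectors (in the DCGAN paper, each concept's Z is the average over three exemplar images):

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_z(n_examples=3, dim=100):
    # Stand-in for: collect the latent codes of 3 exemplar images
    # of a visual concept and average them, as done in the DCGAN paper.
    return rng.normal(size=(n_examples, dim)).mean(axis=0)

z_smiling_woman = avg_z()
z_neutral_woman = avg_z()
z_neutral_man = avg_z()

# "smiling woman" - "neutral woman" + "neutral man" ~ "smiling man"
z_smiling_man = z_smiling_woman - z_neutral_woman + z_neutral_man
print(z_smiling_man.shape)   # (100,)
```

Decoding the resulting vector through the trained generator produces the transformed image.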
– When both models have sufficient capacity, the Nash equilibrium of the game corresponds to:
– p_g = p_data, and D(x) = 1/2 for all x.
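The value 1/2 follows in one step from the optimal discriminator. For a fixed generator with model distribution p_g, maximizing the objective pointwise in D(x) gives:

```latex
D_G^*(x) = \arg\max_{D(x)} \; p_{\text{data}}(x)\log D(x) + p_g(x)\log\bigl(1 - D(x)\bigr)
         = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

At the equilibrium the generator matches the data distribution, p_g = p_data, so D_G^*(x) = 1/2 for every x.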
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014. At convergence, G(z) is drawn from the same distribution as the training data.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.
– Image to image tasks
– Text-to-Image Generation
– Speech synthesis (text to speech)
– Text generation
Using a Generative Adversarial Network, CVPR 2017.
Using a Generative Adversarial Network, CVPR 2017.
Phillip Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017.
– An L1/L2 loss is minimized by averaging all plausible outputs, which causes blurry results.
– The GAN objective instead: "make the output indistinguishable from reality."
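The pix2pix generator objective combines a conditional GAN term with an L1 reconstruction term. A sketch (a non-saturating GAN term is assumed; lam = 100 follows the paper's setting):

```python
import numpy as np

def pix2pix_generator_loss(d_fake, y_pred, y_true, lam=100.0):
    # Conditional GAN term: fool the discriminator on the translated image ...
    gan_term = -np.mean(np.log(d_fake))
    # ... plus an L1 term keeping the output close to the ground truth,
    # while the GAN term prevents the over-smoothing L1 alone would cause.
    l1_term = np.mean(np.abs(y_pred - y_true))
    return gan_term + lam * l1_term

d_fake = np.array([0.5])            # discriminator score on the translated image
y = np.zeros((4, 4))                # toy "images"
print(pix2pix_generator_loss(d_fake, y, y))   # -log(0.5), since the L1 term is 0
```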
Phillip Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017.
Phillip Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017.
– Classify whether each N × N patch in an image is real or fake.
– The responses on all patches are averaged to give the final output.
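A sketch of that decision rule: given the discriminator's grid of per-patch scores, the image-level response is simply their mean.

```python
import numpy as np

def patchgan_decision(patch_scores):
    # patch_scores: an (H, W) grid with one real/fake score per N x N patch;
    # the image-level output averages the responses over all patches.
    return float(np.mean(patch_scores))

scores = np.array([[0.9, 0.7],
                   [0.8, 0.6]])     # toy 2x2 grid of patch responses
print(patchgan_decision(scores))    # approximately 0.75
```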
Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
– Learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y, using an adversarial loss.
Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
The adversarial loss encourages G to translate X into outputs indistinguishable from domain Y, while D_Y aims to distinguish between translated samples G(x) and real samples y.
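Cycle consistency adds the constraint F(G(x)) ≈ x (and symmetrically G(F(y)) ≈ y), typically as an L1 penalty. A toy sketch with scalar "translators" where F exactly inverts G:

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    # L1 penalty on the round trip X -> Y -> X, as in CycleGAN.
    return np.mean(np.abs(F(G(x)) - x))

G = lambda x: 2.0 * x + 1.0         # toy forward "translator"  X -> Y
F = lambda y: (y - 1.0) / 2.0       # toy backward "translator" Y -> X (inverts G)

x = np.array([0.0, 1.0, 2.0])
print(cycle_consistency_loss(x, G, F))   # 0.0, a perfect reconstruction
```

This is the term that makes unpaired training possible: without it, G could map all of X to any point whose distribution matches Y.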
Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
– Treating (text, image) pairs as joint observations, the discriminator judges pairs as real or fake.
– In addition to real and fake inputs to the discriminator, a third type is added: real images paired with mismatched text.
– Non-convergence: gradient-descent-based training is only guaranteed to reach an equilibrium in convex games; GAN training may oscillate instead of converging.
– Mode collapse: GANs easily suffer from mode collapse, which causes the model to produce only a small number of distinct outputs.
Salimans et al., Improved Techniques for Training GANs, 2016.
Metz et al, Unrolled Generative Adversarial Networks, ICLR 2017.
– LSGAN (Mao et al., 2016), EBGAN (Zhao et al., 2016), BEGAN (Berthelot et al., 2017), f-GAN (Nowozin et al., 2016), …
– Wasserstein GAN (Arjovsky et al., 2017; Gulrajani et al., 2017)
– Spectral normalization as a regularization technique (Miyato et al., ICLR 2018)
– Mini-Batch discrimination (Salimans et al., 2016) – unrolled GANs (Metz et al., 2016) – using labels
Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR 2018.
https://github.com/hindupuravinash/the-gan-zoo
– Generate new images, faces, photographs, cartoon characters, human poses
– Image-to-Image Translation
– Text-to-image translation
– Language Generation – Dialogue Generation – Machine Translation
– Molecule generation – Drug discovery
– Missing Data Imputation – Conditional Generation – Semi-supervised Learning
Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
– Pros: beautiful, state-of-the-art samples!
– Cons: trickier / more unstable to train; can't solve inference queries such as p(x), p(z|x).
– Active research: better loss functions and more stable training; conditional GANs and GANs for all kinds of applications; the quality of generated samples is still a challenge; controlling the diversity of the generated samples is difficult.
– Don’t work with an explicit density function