slide-1
SLIDE 1

Generative Adversarial Networks (GANs)

  • M. Soleymani

Sharif University of Technology, Spring 2020. Most slides are based on Fei-Fei Li and colleagues' lectures, CS231n, Stanford 2018, and some slides from Raymond Yeh et al., CS598LAZ, Illinois, 2017.

1

slide-2
SLIDE 2

Generative Models

  • Explicitly working with the distribution (likelihood):

– Fully visible belief networks
– Variational autoencoder (variational inference)

  • GAN as an implicit generative model

– trained without even needing to explicitly define a density function.

Ian Goodfellow, Generative Adversarial Networks, NIPS 2016 Tutorial.

2

slide-3
SLIDE 3

Generative Adversarial Networks

  • Problem: Want to sample from complex, high-dimensional training distribution.

– No direct way to do this!

  • Solution:

– Sample from a simple distribution, e.g., random noise.
– Then, learn a transformation to the training distribution.

  • Q: What can we use to represent this complex transformation?

– A neural network!

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.

3
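To make the slide's recipe concrete, here is a minimal sketch (my own illustration, not from the slides; `latent_dim`, `data_dim`, and the tiny MLP are placeholder choices): noise is drawn from a simple standard normal distribution and pushed through a neural network that, once trained, maps it onto the training distribution.

```python
import torch
import torch.nn as nn

latent_dim = 100   # dimension of the simple noise distribution p(z)
data_dim = 784     # e.g., flattened 28x28 images

# The "complex transformation" is just a neural network G: z -> x
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, data_dim),
    nn.Tanh(),   # outputs in [-1, 1], matching images normalized to that range
)

z = torch.randn(16, latent_dim)   # sample from the simple distribution
fake_samples = generator(z)       # training makes these resemble the data
print(fake_samples.shape)         # torch.Size([16, 784])
```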

slide-4
SLIDE 4

Training GANs: Two-player game

  • Generator network: try to fool the discriminator by generating real-looking images

  • Discriminator network: try to distinguish between real and fake images

4

slide-5
SLIDE 5

GAN architecture

5

[Architecture diagram: noise input → Generator network → generated sample → Discriminator network → real/fake output; the discriminator also receives real samples.]

slide-6
SLIDE 6

GAN architecture

6

[Architecture diagram: noise z → Generator network G → generated sample G(z) → Discriminator network D → real/fake output; D also receives samples x from the training data.]

slide-7
SLIDE 7

Training GANs

  • Generator network: intends to generate real-looking samples
  • Discriminator network: intends to distinguish between real and fake samples

7

$J^{(D)} = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$

$J^{(G)} = \mathbb{E}_{z \sim p(z)}[\log D(G(z))]$

$\theta_d^* = \arg\max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D_{\theta_d}(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D_{\theta_d}(G_{\theta_g}(z)))]$

$\theta_g^* = \arg\max_{\theta_g} \; \mathbb{E}_{z \sim p(z)}[\log D_{\theta_d}(G_{\theta_g}(z))]$

Here $D(x)$ is the discriminator output for real samples and $D(G(z))$ is the discriminator output for generated samples.

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

slide-8
SLIDE 8

Training GANs: Two-player game

  • Generator network: intends to generate real-looking samples

– tries to fool the discriminator by generating real-looking samples

  • Discriminator network: intends to distinguish between real and fake samples

– evaluates the samples produced by the generator network

8

$\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D_{\theta_d}(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D_{\theta_d}(G_{\theta_g}(z)))]$

Here $D(x)$ is the discriminator output for real samples and $D(G(z))$ is the discriminator output for generated samples.

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

  • Discriminator ($\theta_d$) wants to maximize the objective such that $D(x)$ is close to 1 (real) and $D(G(z))$ is close to 0 (fake).
  • Generator ($\theta_g$) wants to minimize the objective such that $D(G(z))$ is close to 1 (the discriminator is fooled into thinking the generated $G(z)$ is real).
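As a concrete reading of this objective, the sketch below (placeholder code of my own, not from the slides) expresses the two players' terms on the discriminator's logits: the discriminator minimizes a binary cross-entropy that is equivalent to maximizing log D(x) + log(1 − D(G(z))), and the minimax generator directly minimizes log(1 − D(G(z))).

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logits, d_fake_logits):
    # Maximizing log D(x) + log(1 - D(G(z))) is equivalent to minimizing this BCE.
    real_loss = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))    # -log D(x)
    fake_loss = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))   # -log(1 - D(G(z)))
    return real_loss + fake_loss

def generator_minimax_loss(d_fake_logits):
    # The generator's term in the minimax objective: minimize log(1 - D(G(z))).
    # (Slides 11-13 explain why this saturates and is usually replaced.)
    return torch.log(1.0 - torch.sigmoid(d_fake_logits) + 1e-8).mean()
```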

slide-9
SLIDE 9

Training GANs: Two-player game

  • Train jointly in a minimax game:
  • Objectives of the generators and discriminators:

9

$J^{(D)} = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$

$J^{(G)} = -J^{(D)}$

$\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D_{\theta_d}(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D_{\theta_d}(G_{\theta_g}(z)))]$

The training procedure for G is to maximize the probability of D making a mistake.

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

slide-10
SLIDE 10

GAN: Adversarial

  • Two networks:

– Generator G: creates (fake) samples that the discriminator cannot distinguish

  • trained to capture the data distribution

– Discriminator D: distinguishes fake and real samples

  • estimates the probability that a sample came from the training data rather than G.
  • Equilibrium is a saddle point of the discriminator loss

Ian J. Goodfellow et al. Generative Adversarial Networks, 2014.

10

slide-11
SLIDE 11

Training GANs: Two-player game

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

11

The gradient of $\log(1 - D(G(z)))$ is relatively flat where samples are likely fake (i.e., where $D(G(z))$ is near 0), which is exactly the region where we want a strong signal to improve the generator.

slide-12
SLIDE 12

Training GANs: Two-player game

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.

12

slide-13
SLIDE 13

Training GANs: Two-player game

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

13

For the generator, instead of minimizing the likelihood of the discriminator being correct, maximize the likelihood of the discriminator being wrong, i.e., maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$. This gives a higher gradient for likely fake samples, as intended (see the sketch below).
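A minimal sketch of this non-saturating trick (my own placeholder code, following common practice rather than anything reproduced on the slide): the generator simply re-labels its fake samples as real when computing the cross-entropy, which is equivalent to maximizing log D(G(z)).

```python
import torch
import torch.nn.functional as F

def generator_nonsaturating_loss(d_fake_logits):
    # Maximize log D(G(z))  <=>  minimize -log D(G(z)),
    # i.e., BCE with the "real" label applied to fake samples.
    return F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
```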

slide-14
SLIDE 14

Training GANs: Two-player game

Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training and is an active area of research.

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.

14

slide-15
SLIDE 15

GAN Training

15

[Architecture diagram: noise z → Generator network G → generated sample G(z) → Discriminator network D → real/fake output; D also receives samples x from the training data.]

$\theta_d$ is updated with $\theta_g$ fixed; $\theta_g$ is updated with $\theta_d$ fixed.

Each player’s cost depends on the parameters of the other player; however, each player can only optimize its own parameters.

slide-16
SLIDE 16

GAN architecture

16

[Architecture diagram: noise z → Generator network G → generated sample G(z) → Discriminator network D → real/fake output; D also receives samples x from the training data.]

After training, the generator network alone is used for sample generation.

slide-17
SLIDE 17

Training GAN

17

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.

slide-18
SLIDE 18

Putting it together: GAN training algorithm

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.

18
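The algorithm itself is not reproduced in this transcript, so the sketch below is a minimal PyTorch rendering of the usual procedure: alternate k discriminator steps with one generator step (the original paper uses k = 1). The network sizes, optimizer settings, and names are my own assumptions, and the generator uses the non-saturating loss from the earlier slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 100
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(real_batch, k=1):
    batch_size = real_batch.size(0)

    # --- k discriminator updates (generator held fixed) ---
    for _ in range(k):
        z = torch.randn(batch_size, latent_dim)
        fake = G(z).detach()   # do not backpropagate into G here
        d_loss = (F.binary_cross_entropy_with_logits(D(real_batch),
                                                     torch.ones(batch_size, 1))
                  + F.binary_cross_entropy_with_logits(D(fake),
                                                       torch.zeros(batch_size, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- one generator update (discriminator held fixed) ---
    z = torch.randn(batch_size, latent_dim)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)),
                                                torch.ones(batch_size, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# usage, per minibatch of flattened images normalized to [-1, 1]:
# d_loss, g_loss = train_step(real_images.view(-1, 784))
```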

slide-19
SLIDE 19

Generate samples by GAN

Samples on MNIST, TFD, and CIFAR-10; the rightmost column shows the nearest training sample to the generated sample in the second-rightmost column.

19

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

slide-20
SLIDE 20

Deep Convolutional GANs (DCGANs)

  • Generator: an upsampling network with fractionally-strided convolutions
  • Discriminator is a convolutional network
  • Conditioning on a class label improves the generated samples

20

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.

All convolutional net
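As a sketch of what such an upsampling generator looks like (the layer sizes below are the commonly used 64x64 DCGAN configuration, not necessarily the exact figure on the slide; the class name is my own), each fractionally-strided (transposed) convolution doubles the spatial resolution:

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim=100, base_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, base_channels * 8, 4, 1, 0, bias=False),        # 1x1 -> 4x4
            nn.BatchNorm2d(base_channels * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base_channels * 8, base_channels * 4, 4, 2, 1, bias=False), # 4x4 -> 8x8
            nn.BatchNorm2d(base_channels * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base_channels * 4, base_channels * 2, 4, 2, 1, bias=False), # 8x8 -> 16x16
            nn.BatchNorm2d(base_channels * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base_channels * 2, base_channels, 4, 2, 1, bias=False),     # 16x16 -> 32x32
            nn.BatchNorm2d(base_channels), nn.ReLU(True),
            nn.ConvTranspose2d(base_channels, 3, 4, 2, 1, bias=False),                     # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, latent_dim), reshaped to a 1x1 "spatial" tensor
        return self.net(z.view(z.size(0), -1, 1, 1))

G = DCGANGenerator()
print(G(torch.randn(4, 100)).shape)   # torch.Size([4, 3, 64, 64])
```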

slide-21
SLIDE 21

DCGAN: Convolutional Architecture

21

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.

slide-22
SLIDE 22

Conditional GAN

  • M. Mirza, S. Osindero, Conditional Generative Adversarial Nets, 2014.

22

Learning a conditional model p(y|x) often gives much better samples.
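One common way to condition both networks (a sketch under my own naming, following the general recipe of Mirza & Osindero rather than their exact architecture) is to embed the class label y and concatenate it with the generator's noise vector and with the discriminator's input:

```python
import torch
import torch.nn as nn

latent_dim, num_classes, data_dim = 100, 10, 784

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Tanh(),
        )

    def forward(self, z, labels):
        # condition by concatenating the label embedding to the noise vector
        return self.net(torch.cat([z, self.label_emb(labels)], dim=1))

class ConditionalDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(data_dim + num_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x, labels):
        # the discriminator judges the (sample, label) pair jointly
        return self.net(torch.cat([x, self.label_emb(labels)], dim=1))

G, D = ConditionalGenerator(), ConditionalDiscriminator()
z = torch.randn(8, latent_dim)
y = torch.randint(0, num_classes, (8,))
print(D(G(z, y), y).shape)   # torch.Size([8, 1])
```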
slide-23
SLIDE 23

Conditional GAN: MNIST

  • M. Mirza, S. Osindero, Conditional Generative Adversarial Nets, 2014.

23

slide-24
SLIDE 24

DCGAN

24

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.

slide-25
SLIDE 25

Disentangling the representation

  • GANs learn a representation z of the image x.

– this representation can capture useful high-level abstract semantic properties of the data
– However, it is difficult to make use of it.

25

slide-26
SLIDE 26

DCGAN: Latent space

26

Interpolation between a series of 9 random points in Z

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.

slide-27
SLIDE 27

Vector arithmetic for visual concepts

27

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.

Samples generated by model

slide-28
SLIDE 28

Vector arithmetic for visual concepts

28

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.

Samples generated by the model, and the sample corresponding to the average z of the above images.

slide-29
SLIDE 29

Vector arithmetic for visual concepts

29

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
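The arithmetic itself is simple once a generator is trained; the sketch below is purely illustrative (in practice the z vectors would be the latent codes of a few hand-picked generated samples per concept, averaged as on the previous slide, and G would be a trained generator such as the DCGAN sketched earlier):

```python
import torch

latent_dim = 100

# Average the latent codes of a few samples sharing a visual concept.
# Random vectors are used here only to keep the sketch runnable.
z_smiling_woman = torch.randn(3, latent_dim).mean(dim=0, keepdim=True)
z_neutral_woman = torch.randn(3, latent_dim).mean(dim=0, keepdim=True)
z_neutral_man   = torch.randn(3, latent_dim).mean(dim=0, keepdim=True)

z_result = z_smiling_woman - z_neutral_woman + z_neutral_man
# image = G(z_result)   # decoding z_result should yield a smiling man
```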

slide-30
SLIDE 30

Optimal Discriminator and Generator

  • For a fixed generator G, given enough capacity the optimal discriminator is:

$D^*(x) = \dfrac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$

  • If we insert the above discriminator in the objective:

– when both models have sufficient capacity, the Nash equilibrium of the game corresponds to:

$p_g = p_{\text{data}}$

D(x) will be 1/2 for all x.

30

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

i.e., G(z) is drawn from the same distribution as the training data.

slide-31
SLIDE 31

Global Optimality

31

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
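The derivation behind this slide is only referenced in the transcript, so it is reconstructed here from the cited paper: plugging the optimal discriminator $D^*$ into the value function gives

$C(G) = \max_D V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right] = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \,\|\, p_g)$

Since the Jensen–Shannon divergence is non-negative and zero only when the two distributions coincide, $C(G)$ attains its global minimum $-\log 4$ exactly when $p_g = p_{\text{data}}$.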

slide-32
SLIDE 32

Training

32

Ian Goodfellow et al., Generative Adversarial Nets, NIPS 2014.

slide-33
SLIDE 33

GAN Applications

  • Conditional image generation tasks

– Image to image tasks

  • style transfer, super-resolution, in-painting, etc.

– Text-to-Image Generation

  • Other generative models

– Speech synthesis: text-to-speech
– Text generation

33

slide-34
SLIDE 34

SRGAN: weighted combination of several loss components

  • C. Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, CVPR 2017.

34
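The "weighted combination" in the title can be sketched as follows (my own function and argument names; the content term uses feature maps from a pretrained network such as VGG, and the 1e-3 weighting of the adversarial term follows the setting reported in the paper):

```python
import torch
import torch.nn.functional as F

def srgan_generator_loss(sr_features, hr_features, d_sr_logits, adv_weight=1e-3):
    # Content loss: MSE between feature maps of the super-resolved image and
    # the high-resolution target (features from a pretrained VGG, for example).
    content_loss = F.mse_loss(sr_features, hr_features)
    # Adversarial loss: push the discriminator's output on SR images toward "real".
    adversarial_loss = F.binary_cross_entropy_with_logits(
        d_sr_logits, torch.ones_like(d_sr_logits))
    # Perceptual loss = weighted combination of the components.
    return content_loss + adv_weight * adversarial_loss
```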

slide-35
SLIDE 35

SRGAN: weighted combination of several loss components

35

  • C. Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, CVPR 2017.

slide-36
SLIDE 36

Image-to-Image Translation (pix2pix)

Phillip Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017.

36

slide-37
SLIDE 37

Image-to-Image Translation (pix2pix)

  • Euclidean distance (used in the traditional methods) as a loss function causes blurring

– it is minimized by averaging all plausible outputs

  • These networks not only learn the mapping from input to output, but also learn a loss function to train this mapping.

– Objective: “make the output indistinguishable from reality”

  • this automatically learns a loss function appropriate for satisfying this goal

37

Phillip Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017.
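A compact way to see the pix2pix generator objective (a sketch with my own names; the weight of 100 on the L1 term is the value commonly used with this model) is as a conditional adversarial term plus an L1 reconstruction term:

```python
import torch
import torch.nn.functional as F

def pix2pix_generator_loss(d_fake_logits, fake_output, target, l1_weight=100.0):
    # Conditional adversarial term: make outputs indistinguishable from reality.
    adv_loss = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # L1 reconstruction term: keeps the output close to the ground truth
    # (L1 blurs less than the Euclidean/L2 loss discussed above).
    l1_loss = F.l1_loss(fake_output, target)
    return adv_loss + l1_weight * l1_loss
```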

slide-38
SLIDE 38

Image-to-Image Translation (pix2pix)

38

Phillip Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017.

  • Provides noise only in the form of dropout, applied on several layers of the generator at both training and test time.
  • Generator has a “U-Net”-based architecture
  • Discriminator is a convolutional “PatchGAN” classifier, which only penalizes structure at the scale of image patches.

– classifies whether each N × N patch in an image is real or fake
– responses on all patches are averaged
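A PatchGAN discriminator can be sketched as a fully convolutional network whose output is a grid of per-patch real/fake logits that are then averaged (the layer sizes below are illustrative, not the pix2pix configuration, and the real pix2pix discriminator also receives the conditioning input concatenated with the image):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=3, base_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base_channels, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base_channels, base_channels * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base_channels * 2, 1, 4, stride=1, padding=1),  # per-patch logit map
        )

    def forward(self, x):
        patch_logits = self.net(x)             # (batch, 1, H', W'): one logit per patch
        return patch_logits.mean(dim=[2, 3])   # average the responses over all patches

D = PatchDiscriminator()
print(D(torch.rand(2, 3, 256, 256)).shape)   # torch.Size([2, 1])
```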

slide-39
SLIDE 39

Unpaired data

39

Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.

slide-40
SLIDE 40

cycleGAN

  • For many tasks, paired training data will not be available.
  • learn a mapping G : X → Y in the absence of paired examples

– such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.

  • learn to synthesize pairs of corresponding images without correspondence supervision
  • couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).

40

Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
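The cycle-consistency term can be written down directly (a sketch with my own names; λ = 10 is a typical weight, and the full cycleGAN objective adds adversarial losses for both mapping directions):

```python
import torch.nn.functional as nnf

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    # G: X -> Y and F: Y -> X are the two generators.
    forward_cycle = nnf.l1_loss(F(G(real_x)), real_x)    # F(G(x)) ≈ x
    backward_cycle = nnf.l1_loss(G(F(real_y)), real_y)   # G(F(y)) ≈ y
    return lam * (forward_cycle + backward_cycle)
```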

slide-41
SLIDE 41

cycleGAN

The adversarial loss encourages G to translate X into outputs indistinguishable from domain Y, while D_Y aims to distinguish between translated samples G(x) and real samples y.

41

Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.

slide-42
SLIDE 42

cycleGAN

42

Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.

slide-43
SLIDE 43

cycleGAN

43

Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.

slide-44
SLIDE 44

cycleGAN: failure cases

44

Jun-Yan Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.

slide-45
SLIDE 45

Generative Adversarial Text to Image Synthesis

  • Translating visual concepts from characters to pixels.

45

  • S. Reed et al., Generative Adversarial Text to Image Synthesis, 2016.
slide-46
SLIDE 46

Generative Adversarial Text to Image Synthesis

  • S. Reed et al., Generative Adversarial Text to Image Synthesis, 2016.

46

slide-47
SLIDE 47

Generative Adversarial Text to Image Synthesis

  • To train a conditional GAN, (c, x) pairs are treated as joint observations, and the discriminator usually just judges pairs as real or fake.
  • In addition to the real / fake inputs to the discriminator, we add a third type consisting of real images with mismatched text.

47

  • S. Reed et al., Generative Adversarial Text to Image Synthesis, 2016.
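The matching-aware discriminator update can be sketched as three terms (my own function and variable names, following the GAN-CLS recipe of the paper; the averaging of the two "fake" terms mirrors the paper's algorithm):

```python
import torch
import torch.nn.functional as F

def matching_aware_d_loss(D, real_img, fake_img, matching_txt, mismatched_txt):
    # Three kinds of inputs for a text-conditional discriminator D(image, text):
    s_real  = D(real_img, matching_txt)      # real image, matching text   -> label 1
    s_wrong = D(real_img, mismatched_txt)    # real image, mismatched text -> label 0
    s_fake  = D(fake_img, matching_txt)      # fake image, matching text   -> label 0
    bce = F.binary_cross_entropy_with_logits
    return (bce(s_real, torch.ones_like(s_real))
            + 0.5 * (bce(s_wrong, torch.zeros_like(s_wrong))
                     + bce(s_fake, torch.zeros_like(s_fake))))
```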
slide-48
SLIDE 48

GAN Challenges

  • Difficulty of training

– Non-convergence

  • training GANs requires finding Nash equilibria in high-dimensional, continuous, non-convex games

– Mode collapse: GANs easily suffer from mode collapse

  • Thus, applications of GANs are often limited to problems where it is acceptable for the model to produce a small number of distinct outputs

  • Thus, the quality and variation of the generated samples remain a challenge.

  • Evaluation: there is no clearly justified way to quantitatively score samples.

48

Salimans et al., Improved Techniques for Training GANs, 2016.

slide-49
SLIDE 49

Mode Collapse

49

Metz et al, Unrolled Generative Adversarial Networks, ICLR 2017.

slide-50
SLIDE 50

Better training approaches

  • To alleviate training problems, several different objectives were proposed:

– LSGAN (Mao et al., 2016), EBGAN (Zhao et al., 2016), BEGAN (Berthelot et al., 2017), f-GAN (Nowozin et al., 2016), …
– Wasserstein GAN (Arjovsky et al., 2017; Gulrajani et al., 2017)
– Spectral normalization as a regularization technique (Miyato et al., ICLR 2018)

  • And some specific techniques for avoiding mode collapse, e.g.:

– Minibatch discrimination (Salimans et al., 2016)
– Unrolled GANs (Metz et al., 2016)
– Using labels

50
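As one concrete example of these stabilization techniques, spectral normalization can be applied to each discriminator layer with a one-line wrapper in PyTorch (the architecture below is illustrative, not from the slides):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Constraining each layer's spectral norm keeps the discriminator roughly
# 1-Lipschitz, which stabilizes GAN training (Miyato et al., ICLR 2018).
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 16 * 16, 1)),   # assumes 64x64 inputs
)
```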

slide-51
SLIDE 51

Generating high-resolution images

  • Starts from LapGAN (NIPS 2015)
  • ProGAN: Progressive Growing GANs (ICLR 2018)
  • Style-Based Generator Architecture for GANs (CVPR 2019)
  • BigGAN (ICLR 2019)

51

slide-52
SLIDE 52

Progressive Growing

52

Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR 2018.

slide-53
SLIDE 53

GAN Zoo

https://github.com/hindupuravinash/the-gan-zoo

53

slide-54
SLIDE 54

Summary of Image Generation Applications

  • Unconditional image generation for creative or fun works

– Generate new images, faces, photographs, cartoon characters, human poses

  • Conditional image generation

– Image-to-Image Translation

  • Face aging
  • Super-resolution
  • Inpainting
  • Colorization
  • Photos to emojis
  • Face frontal view generation
  • Photograph editing

– Text-to-image translation

54

slide-55
SLIDE 55

GANs Broader Applications

  • 3D object generation and manipulation
  • Video prediction and editing
  • Text and language generation

– Language generation
– Dialogue generation
– Machine translation

  • Graph generation

– Molecule generation
– Drug discovery

55

slide-56
SLIDE 56

GAN-based or Adversarial Learning

  • Semi-supervised learning
  • Adversarial domain adaptation
  • GAN for data cleaning and/or imputation
  • Multi-task Learning
  • Imbalanced data classification
  • Example: HexaGAN (Hwang et al., ICML 2019): Generative Adversarial Nets for Real World Classification

– Missing data imputation
– Conditional generation
– Semi-supervised learning

56

slide-57
SLIDE 57

GANs

  • Don’t work with an explicit density function
  • Take game-theoretic approach: learn to generate from training distribution through a 2-player game

  • Pros:

– Beautiful, state-of-the-art samples!

  • Cons:

– Trickier / more unstable to train
– Can’t solve inference queries such as p(x), p(z|x)

  • Active areas of research:

– Better loss functions, more stable training
– Conditional GANs, GANs for all kinds of applications
– The quality of generated samples is still a challenge
– Controlling the diversity of the generated samples is difficult

57

slide-58
SLIDE 58

Summary

  • GANs: learn to generate from training distribution by a game-theoretic approach

– Don’t work with an explicit density function

  • State-of-the-art samples have been generated using GANs
  • Tricky (and more unstable) to train
  • Controlling the diversity of the generated samples is difficult
  • GANs are more successful in image generation than text generation

58

slide-59
SLIDE 59

Summary

59

slide-60
SLIDE 60

Resources

  • Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
  • Ian Goodfellow, “Generative Adversarial Nets”, NIPS 2016 Tutorial.

60