

SLIDE 1

AMMI – Introduction to Deep Learning
10.1. Generative Adversarial Networks

François Fleuret
https://fleuret.org/ammi-2018/
Thu Sep 6 16:09:56 CAT 2018

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

SLIDES 2–6

A different approach to learning high-dimensional generative models is the Generative Adversarial Network (GAN), proposed by Goodfellow et al. (2014). The idea behind GANs is to train two networks jointly:

  • a discriminator D to classify samples as “real” or “fake”,
  • a generator G to map a [simple] fixed distribution to samples that fool D.

[Diagram: real samples → D → “real”; Z → G → generated samples → D → “fake”]

The approach is adversarial since the two networks have antagonistic objectives.

SLIDES 7–8

A bit more formally, let 풳 be the signal space and D the latent space dimension.

  • The generator G : ℝ^D → 풳 is trained so that [ideally], if it gets a random normally distributed Z as input, it produces a sample following the data distribution as output.

  • The discriminator D : 풳 → [0, 1] is trained so that, if it gets a sample as input, it predicts whether it is genuine.
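These two mappings can be made concrete in a few lines of PyTorch; below is a minimal sketch of the signatures only, using the toy architecture of the minimal example on slide 22, with signal space 풳 = ℝ²:

import torch
from torch import nn

z_dim = 8                                    # latent dimension, the "D" of this slide

# G : R^D -> X (here X = R^2) and D : X -> [0, 1]
G = nn.Sequential(nn.Linear(z_dim, 100), nn.ReLU(), nn.Linear(100, 2))
D = nn.Sequential(nn.Linear(2, 100), nn.ReLU(), nn.Linear(100, 1), nn.Sigmoid())

z = torch.randn(16, z_dim)                   # Z ~ N(0, I), one row per sample
x_fake = G(z)                                # points in the signal space, shape (16, 2)
p_genuine = D(x_fake)                        # estimated probability of being real, in (0, 1)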

SLIDES 9–10

If G is fixed, to train D given a set of “real points” x_n ∼ µ, n = 1, …, N, we can generate z_n ∼ 풩(0, I), n = 1, …, N, build a two-class data-set

  풟 = { (x_1, 1), …, (x_N, 1), (G(z_1), 0), …, (G(z_N), 0) },

whose first half are real samples ∼ µ and whose second half are fake samples ∼ µG, and minimize the binary cross-entropy

  ℒ(D) = − 1/(2N) [ Σ_{n=1}^N log D(x_n) + Σ_{n=1}^N log(1 − D(G(z_n))) ]
       = − 1/2 [ Ê_{X∼µ}[ log D(X) ] + Ê_{X∼µG}[ log(1 − D(X)) ] ],

where µ is the true distribution of the data, µG is the distribution of G(Z) with Z ∼ 풩(0, I), and Ê denotes the empirical average.
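Translated directly into PyTorch, one such discriminator update could look as follows; a minimal sketch assuming model_G, model_D, and z_dim as in the minimal example of slide 22, and a tensor real_samples of shape (N, 2):

import torch
from torch import nn, optim

def train_D_step(model_G, model_D, real_samples, z_dim, optimizer_D):
    N = real_samples.size(0)
    z = torch.randn(N, z_dim)
    with torch.no_grad():                      # G is fixed here
        fake_samples = model_G(z)

    # The binary cross-entropy of the slide: real labeled 1, fake labeled 0
    loss = - 0.5 * (model_D(real_samples).log().mean()
                    + (1 - model_D(fake_samples)).log().mean())

    optimizer_D.zero_grad()
    loss.backward()
    optimizer_D.step()
    return loss.item()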

SLIDES 11–12

The situation is slightly more complicated, since we also want to optimize G to maximize D’s loss. Goodfellow et al. (2014) provide an analysis of the resulting equilibrium of that strategy.

SLIDES 13–15

Let’s define

  V(D, G) = E_{X∼µ}[ log D(X) ] + E_{X∼µG}[ log(1 − D(X)) ],

which is high if D is doing a good job (low cross-entropy), and low if G fools D.

Our ultimate goal is a G∗ that fools any D, so

  G∗ = argmin_G max_D V(D, G).

If we define the optimal discriminator for a given generator

  D∗_G = argmax_D V(D, G),

our objective becomes

  G∗ = argmin_G V(D∗_G, G).

SLIDES 16–17

We have

  V(D, G) = E_{X∼µ}[ log D(X) ] + E_{X∼µG}[ log(1 − D(X)) ]
          = ∫ ( µ(x) log D(x) + µG(x) log(1 − D(x)) ) dx.

Since

  argmax_d ( µ(x) log d + µG(x) log(1 − d) ) = µ(x) / (µ(x) + µG(x)),

and D∗_G = argmax_D V(D, G), if there is no regularization on D, we get

  ∀x, D∗_G(x) = µ(x) / (µ(x) + µG(x)).
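The pointwise maximization invoked above is elementary calculus; spelled out in LaTeX, with the shorthands a = µ(x) and b = µG(x):

\[
f(d) = a \log d + b \log(1 - d), \qquad
f'(d) = \frac{a}{d} - \frac{b}{1 - d} = 0
\iff a(1 - d) = b\,d
\iff d = \frac{a}{a + b},
\]
\[
f''(d) = -\frac{a}{d^2} - \frac{b}{(1 - d)^2} < 0,
\]
so this critical point is indeed the maximum, giving D∗_G(x) = µ(x) / (µ(x) + µG(x)).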

SLIDES 18–19

So, since ∀x, D∗_G(x) = µ(x) / (µ(x) + µG(x)), we get

  V(D∗_G, G) = E_{X∼µ}[ log D∗_G(X) ] + E_{X∼µG}[ log(1 − D∗_G(X)) ]
             = E_{X∼µ}[ log µ(X) / (µ(X) + µG(X)) ] + E_{X∼µG}[ log µG(X) / (µ(X) + µG(X)) ]
             = D_KL( µ ‖ (µ + µG)/2 ) + D_KL( µG ‖ (µ + µG)/2 ) − log 4
             = 2 D_JS(µ, µG) − log 4,

where D_JS is the Jensen–Shannon divergence, a standard dissimilarity measure between distributions.
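This identity is easy to sanity-check numerically on discrete distributions; a short sketch (the distributions p and q below are arbitrary illustrative choices standing in for µ and µG):

import numpy as np

p = np.array([0.5, 0.3, 0.2])          # stands in for µ
q = np.array([0.1, 0.2, 0.7])          # stands in for µG
m = (p + q) / 2

kl = lambda a, b: np.sum(a * np.log(a / b))

# V(D*_G, G) from its definition, with D*(x) = p(x) / (p(x) + q(x))
d_star = p / (p + q)
v = np.sum(p * np.log(d_star)) + np.sum(q * np.log(1 - d_star))

# ... and from the identity 2 D_JS(p, q) - log 4
d_js = 0.5 * (kl(p, m) + kl(q, m))
assert np.isclose(v, 2 * d_js - np.log(4))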

SLIDES 20–21

To recap: if there is no capacity limitation for D, and if we define

  V(D, G) = E_{X∼µ}[ log D(X) ] + E_{X∼µG}[ log(1 − D(X)) ],

computing

  G∗ = argmin_G max_D V(D, G)

amounts to computing

  G∗ = argmin_G D_JS(µ, µG),

where D_JS is a reasonable dissimilarity measure between distributions.

Although this derivation provides a nice formal framework, in practice D is not “fully” optimized to [come close to] D∗_G when optimizing G. In our minimal example, we alternate gradient steps to improve G and D.

SLIDE 22

z_dim, nb_hidden = 8, 100

model_G = nn.Sequential(nn.Linear(z_dim, nb_hidden),
                        nn.ReLU(),
                        nn.Linear(nb_hidden, 2))

model_D = nn.Sequential(nn.Linear(2, nb_hidden),
                        nn.ReLU(),
                        nn.Linear(nb_hidden, 1),
                        nn.Sigmoid())

SLIDE 23

batch_size, lr = 10, 1e-3

optimizer_G = optim.Adam(model_G.parameters(), lr = lr)
optimizer_D = optim.Adam(model_D.parameters(), lr = lr)

for e in range(nb_epochs):
    for t, real_batch in enumerate(real_samples.split(batch_size)):

        z = real_batch.new(real_batch.size(0), z_dim).normal_()
        fake_batch = model_G(z)

        D_scores_on_real = model_D(real_batch)
        D_scores_on_fake = model_D(fake_batch)

        if t%2 == 0:
            loss = (1 - D_scores_on_fake).log().mean()
            optimizer_G.zero_grad()
            loss.backward()
            optimizer_G.step()
        else:
            loss = - (1 - D_scores_on_fake).log().mean() \
                   - D_scores_on_real.log().mean()
            optimizer_D.zero_grad()
            loss.backward()
            optimizer_D.step()
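The loop above also assumes the imports, nb_epochs, and a real_samples tensor; a hedged sketch of a setup that makes it run end-to-end (the arc-shaped 2D toy data is an illustrative guess, not necessarily the lecture's actual dataset):

import math
import torch
from torch import nn, optim

nb_epochs = 100

# Illustrative 2D "real" data: noisy points on a circular arc
theta = torch.rand(1000) * math.pi
real_samples = torch.stack((4 + 2 * theta.cos(), 4 + 2 * theta.sin()), dim=1) \
               + 0.1 * torch.randn(1000, 2)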

SLIDES 24–38

[Plots: real (“Real”) vs. synthetic (“Synth”) samples from the minimal example at successive stages of training, for latent dimensions D = 2, D = 8, and D = 32.]

SLIDES 39–41

In more realistic settings, the fake samples may initially be so “unrealistic” that the response of D saturates. That causes the loss for G,

  Ê_{X∼µG}[ log(1 − D(X)) ],

to be far in the exponential tail of D’s sigmoid and to have zero gradient, since log(1 + ε) ≃ ε does not correct it in any way.

Goodfellow et al. suggest replacing this term with the non-saturating cost

  − Ê_{X∼µG}[ log D(X) ],

so that the log fixes D’s exponential behavior. The resulting optimization problem has the same optima as the original one. The loss for D remains unchanged.
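In the minimal example of slide 23, this non-saturating cost amounts to changing one line of the generator update; a sketch under the same variable names:

if t%2 == 0:
    # Original, saturating generator loss: minimize log(1 - D(G(z)))
    # loss = (1 - D_scores_on_fake).log().mean()

    # Non-saturating variant: minimize -log(D(G(z))) instead
    loss = - D_scores_on_fake.log().mean()
    optimizer_G.zero_grad()
    loss.backward()
    optimizer_G.step()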

slide-42
SLIDE 42

a) b) c) d) Figure 2: Visualization of samples from the model. Rightmost column shows the nearest training example of

the neighboring sample, in order to demonstrate that the model has not memorized the training set. Samples are fair random draws, not cherry-picked. Unlike most other visualizations of deep generative models, these images show actual samples from the model distributions, not conditional means given samples of hidden units. Moreover, these samples are uncorrelated because the sampling process does not depend on Markov chain

  • mixing. a) MNIST b) TFD c) CIFAR-10 (fully connected model) d) CIFAR-10 (convolutional discriminator

and “deconvolutional” generator)

(Goodfellow et al., 2014)

Fran¸ cois Fleuret AMMI – Introduction to Deep Learning / 10.1. Generative Adversarial Networks 13 / 29

SLIDE 43

Deep Convolutional GAN

SLIDE 44

“We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.” (Radford et al., 2015)

SLIDE 45

Radford et al. converged to the following rules:

  • Replace pooling layers with strided convolutions in D and strided transposed convolutions in G,
  • use batchnorm in both D and G,
  • remove fully connected hidden layers,
  • use ReLU in G except for the output, which uses Tanh,
  • use LeakyReLU activation in D for all layers.

SLIDES 46–47

Figure 1: DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution Z is projected to a small spatial extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.

(Radford et al., 2015)

We can have a look at the reference implementation provided in https://github.com/pytorch/examples.git

slide-48
SLIDE 48

# default nz = 100, ngf = 64 class Generator(nn.Module): def __init__(self, ngpu): super(Generator, self).__init__() self.ngpu = ngpu self.main = nn.Sequential( # input is Z, going into a convolution nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False), nn.BatchNorm2d(ngf * 8), nn.ReLU(True), # state size. (ngf*8) x 4 x 4 nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 4), nn.ReLU(True), # state size. (ngf*4) x 8 x 8 nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 2), nn.ReLU(True), # state size. (ngf*2) x 16 x 16 nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf), nn.ReLU(True), # state size. (ngf) x 32 x 32 nn.ConvTranspose2d( ngf, nc, 4, 2, 1, bias=False), nn.Tanh() # state size. (nc) x 64 x 64 )

Fran¸ cois Fleuret AMMI – Introduction to Deep Learning / 10.1. Generative Adversarial Networks 18 / 29

slide-49
SLIDE 49

# default nz = 100, ndf = 64 class Discriminator(nn.Module): def __init__(self, ngpu): super(Discriminator, self).__init__() self.ngpu = ngpu self.main = nn.Sequential( # input is (nc) x 64 x 64 nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True), # state size. (ndf) x 32 x 32 nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True), # state size. (ndf*2) x 16 x 16 nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True), # state size. (ndf*4) x 8 x 8 nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True), # state size. (ndf*8) x 4 x 4 nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid() )

Fran¸ cois Fleuret AMMI – Introduction to Deep Learning / 10.1. Generative Adversarial Networks 19 / 29
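To check the advertised tensor shapes, one can instantiate both excerpts; a small sketch (nc = 3 is assumed here, as for color images; .main is called directly because the excerpts above omit the forward method of the reference implementation):

import torch

nz, ngf, ndf, nc = 100, 64, 64, 3      # defaults of the reference implementation

netG = Generator(ngpu=1)
netD = Discriminator(ngpu=1)

z = torch.randn(16, nz, 1, 1)          # a batch of 16 latent vectors
fake = netG.main(z)                    # -> torch.Size([16, 3, 64, 64])
scores = netD.main(fake)               # -> torch.Size([16, 1, 1, 1]), values in [0, 1]
print(fake.shape, scores.shape)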

SLIDES 50–51

# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

criterion = nn.BCELoss()

fixed_noise = torch.randn(opt.batchSize, nz, 1, 1, device=device)
real_label = 1
fake_label = 0

# setup optimizer
optimizerD = optim.Adam(netD.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))

SLIDE 52

############################
# (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
###########################
# train with real
netD.zero_grad()
real_cpu = data[0].to(device)
batch_size = real_cpu.size(0)
label = torch.full((batch_size,), real_label, device=device)

output = netD(real_cpu)
errD_real = criterion(output, label)
errD_real.backward()
D_x = output.mean().item()

# train with fake
noise = torch.randn(batch_size, nz, 1, 1, device=device)
fake = netG(noise)
label.fill_(fake_label)

output = netD(fake.detach())
errD_fake = criterion(output, label)
errD_fake.backward()
D_G_z1 = output.mean().item()
errD = errD_real + errD_fake

optimizerD.step()

SLIDE 53

############################
# (2) Update G network: maximize log(D(G(z)))
###########################
netG.zero_grad()
label.fill_(real_label)  # fake labels are real for generator cost

output = netD(fake)
errG = criterion(output, label)
errG.backward()
D_G_z2 = output.mean().item()

optimizerG.step()

Note that this update implements the − log(D(G(z))) trick: the binary cross-entropy of D(G(z)) against the target 1 is exactly − log D(G(z)).

SLIDE 54

Real images from LSUN’s “bedroom” class.

SLIDE 55

Fake images after 1 epoch (3M images).

SLIDE 56

Fake images after 2 epochs.

SLIDE 57

Fake images after 5 epochs.

SLIDE 58

Fake images after 10 epochs.

SLIDE 59

Fake images after 20 epochs.

SLIDES 60–61

Training a standard GAN often results in two pathological behaviors:

  • Oscillations without convergence. Contrary to standard loss minimization, we have no guarantee here that the loss will actually decrease.

  • The infamous “mode collapse”, when G models very well a small sub-population of the data, concentrating on a few modes.

Additionally, performance is hard to assess, and evaluation is often a “beauty contest”.

SLIDE 62

The end

SLIDE 63

References

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. CoRR, abs/1406.2661, 2014.

A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.