Taxonomy of Generative Models
Prof. Leal-Taixé and Prof. Niessner
Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017


SLIDE 1

Taxonomy of Generative Models

  • Prof. Leal-Taixé and Prof. Niessner

Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

SLIDE 2

Taxonomy of Generative Models

Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

SLIDE 3

Generative Adversarial Networks

SLIDE 4

Generative Adversarial Networks (GANs)

https://github.com/hindupuravinash/the-gan-zoo

SLIDE 5

Convolution and Deconvolution

Convolution: no padding, no stride
Transposed convolution: no padding, no stride

https://github.com/vdumoulin/conv_arithmetic
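Alongside the animations at that repo, the output-size arithmetic for both operations can be sketched as plain formulas (a minimal illustration, not from the slides):

```python
def conv_out(i, k, p=0, s=1):
    """Output size of a convolution: input size i, kernel k, padding p, stride s."""
    return (i + 2 * p - k) // s + 1

def deconv_out(i, k, p=0, s=1):
    """Output size of a transposed convolution; inverts conv_out."""
    return (i - 1) * s - 2 * p + k

# "no padding, no stride" as in the figures: 4x4 input, 3x3 kernel
out = conv_out(4, 3)        # convolution shrinks 4 -> 2
back = deconv_out(out, 3)   # transposed convolution grows 2 -> 4
```

This is why the transposed convolution is the natural upsampling counterpart in decoder networks: it reverses the size change of the corresponding convolution.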

SLIDE 6

Autoencoder

Conv → Deconv

SLIDE 7

Reconstruction: Autoencoder

Conv → Deconv: Input Image → Output Image

Reconstruction Loss (often L2)
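The L2 reconstruction loss on this slide, as a tiny NumPy sketch (the arrays stand in for input and reconstructed images):

```python
import numpy as np

def l2_reconstruction_loss(x, x_rec):
    """Sum of squared distances between input and reconstruction."""
    return float(np.sum((x - x_rec) ** 2))

x = np.array([0.0, 1.0, 1.0, 0.0])    # toy "input image"
x_perfect = x.copy()                  # perfect reconstruction
x_blurry = np.full_like(x, x.mean())  # constant mean image
```

Note that the mean image already does fairly well under L2; this is the root of the blurriness issue discussed a few slides later.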

SLIDE 8

Training Autoencoders

Latent space z, dim(z) < dim(x); input x → reconstruction x’
Input images vs. reconstructed images

SLIDE 9

Decoder as Generative Model

Latent space z, dim(z) < dim(x)
“Test time”: reconstruction from a ‘random’ vector → Output Image

Reconstruction Loss (often L2)

SLIDE 10

Decoder as Generative Model

Interpolation between two chair models

[Dosovitskiy et al. 14] Learning to Generate Chairs

SLIDE 11

Decoder as Generative Model

Morphing between chair models

[Dosovitskiy et al. 14] Learning to Generate Chairs

SLIDE 12

Decoder as Generative Model

Latent space z, dim(z) < dim(x)
“Test time”: reconstruction from a ‘random’ vector

Reconstruction Loss: often L2, i.e., sum of squared distances
  • → L2 distributes error equally → the mean is optimal → the result is blurry

Instead of L2, can we “learn” a loss function?

SLIDE 13

Generative Adversarial Networks (GANs)

[Goodfellow et al. 14] GANs (slide credit: McGuinness)

Diagram: z → G → G(z) → D → D(G(z))

SLIDE 14

Generative Adversarial Networks (GANs)

[Goodfellow et al. 14] GANs (slide credit: McGuinness)

Diagram: z → G → G(z) → D → D(G(z)); real x → D → D(x)

SLIDE 15

Generative Adversarial Networks (GANs)

[Goodfellow et al. 14/16] GANs

D distinguishes real data from fake data

SLIDE 16

GANs: Loss Functions

Discriminator loss and generator loss: binary cross entropy

  • Minimax game:
    – G minimizes the probability that D is correct
    – The equilibrium is a saddle point of the discriminator loss
  • → D provides supervision (i.e., gradients) for G

[Goodfellow et al. 14/16] GANs
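Both losses can be sketched directly on discriminator outputs (NumPy; `d_real` and `d_fake` are hypothetical D outputs in (0, 1), and the function names are mine, not the slide's notation):

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Binary cross entropy."""
    return float(-np.mean(target * np.log(pred + eps)
                          + (1 - target) * np.log(1 - pred + eps)))

def d_loss(d_real, d_fake):
    """Discriminator: push real outputs toward 1, fake outputs toward 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def g_loss_minimax(d_fake):
    """Minimax generator loss: minimize log(1 - D(G(z)))."""
    return float(np.mean(np.log(1.0 - d_fake + 1e-12)))

def g_loss_heuristic(d_fake):
    """Non-saturating heuristic: minimize -log D(G(z))."""
    return float(-np.mean(np.log(d_fake + 1e-12)))

d_fake = np.array([0.01, 0.02])  # D confidently rejects all fakes
```

When D rejects every generated sample, the minimax loss saturates near zero (so G gets almost no gradient), while the heuristic loss stays large; that contrast motivates the heuristic on the next slide.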

SLIDE 17

GANs: Loss Functions

  • Heuristic method (often used in practice):
    – G maximizes the log-probability of D being mistaken
    – G can still learn even when D rejects all generator samples

[Goodfellow et al. 14/16] GANs

SLIDE 18

Alternating Gradient Updates

  • Step 1: Fix G, and perform a gradient step to update D
  • Step 2: Fix D, and perform a gradient step to update G

SLIDE 19

Vanilla GAN

https://papers.nips.cc/paper/5423-generative-adversarial-nets

SLIDE 20

Training a GAN

https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f

SLIDE 21

GANs: Loss Functions

Minimax vs. heuristic

[Goodfellow et al. 14/16] GANs

SLIDE 22

DCGAN: Generator

Generator of Deep Convolutional GANs

DCGAN: https://github.com/carpedm20/DCGAN-tensorflow

SLIDE 23

DCGAN: Results

Results on MNIST

SLIDE 24

DCGAN: Results

Results on CelebA (200k relatively well-aligned portrait photos)

SLIDE 25

DCGAN: Results

Asian face dataset

SLIDE 26

DCGAN: Results

SLIDE 27

DCGAN: Results

Loss of D and G on a custom dataset

SLIDE 28

“Bad” Training Curves

https://stackoverflow.com/questions/44313306/dcgans-discriminator-getting-too-strong-too-quickly-to-allow-generator-to-learn

SLIDE 29

“Good” Training Curves

https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f

SLIDE 30

“Good” Training Curves

https://stackoverflow.com/questions/42690721/how-to-interpret-the-discriminators-loss-and-the-generators-loss-in-generative

SLIDE 31

Training Schedules

  • Adaptive schedules
  • For instance:

    while loss_discriminator > t_d:
        train discriminator
    while loss_generator > t_g:
        train generator
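Such an adaptive schedule can be made runnable as follows; the loss callables, train-step callables, and thresholds t_d, t_g are hypothetical stand-ins for real training code:

```python
def adaptive_schedule(loss_d, loss_g, train_d, train_g, t_d, t_g, max_steps=1000):
    """Train D while its loss exceeds t_d, then G while its loss
    exceeds t_g. max_steps guards against non-convergence."""
    steps = 0
    while loss_d() > t_d and steps < max_steps:
        train_d()
        steps += 1
    steps = 0
    while loss_g() > t_g and steps < max_steps:
        train_g()
        steps += 1

# toy stand-ins: each training step lowers the corresponding loss
state = {"d": 1.0, "g": 1.0}
adaptive_schedule(lambda: state["d"], lambda: state["g"],
                  lambda: state.update(d=state["d"] - 0.1),
                  lambda: state.update(g=state["g"] - 0.1),
                  t_d=0.5, t_g=0.5)
```

In practice this whole schedule would sit inside the outer training loop, re-balancing D and G every few iterations.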

SLIDE 32

Weak vs. Strong Discriminator

  • Need balance
  • Discriminator too weak?
    – No good gradients (G cannot get better than its teacher…)
  • Generator too weak?
    – Discriminator will always be right

SLIDE 33

Mode Collapse

  • min_G max_D V(G, D) ≠ max_D min_G V(G, D)
  • D in the inner loop → convergence to the correct distribution
  • G in the inner loop → easily converges to one sample

[Metz et al. 16]

SLIDE 34

Mode Collapse

  • Data dimensionality fixed (512)
  • Performance correlates with the number of modes
  • → More modes, smaller recovery rate!
  • → Part of the reason why we often see GAN results on specific domains (e.g., faces)

Slide credit: Ming-Yu Liu

SLIDE 35

Mode Collapse

  • Performance correlates with the dimension of the manifold
  • → Larger latent space, more mode collapse

Slide credit: Ming-Yu Liu

SLIDE 36

Problems with Global Structure

SLIDE 37

Problems with Counting

SLIDE 38

Evaluation of GAN Performance

SLIDE 39

Evaluation of GAN Performance

  • Main difficulty of GANs: we don’t know how good they are
  • People cherry-pick results in papers → some of them will always look good, but how to quantify?
  • Do we only memorize or do we generalize?
  • GANs are difficult to evaluate! [Theis et al., ICLR 2016]

SLIDE 40

Evaluation of GAN Performance

Human evaluation:
  • Every n updates, show a series of predictions
  • Check training curves
  • What does ‘look good’ mean at the beginning?
  • Need variety!
  • But we don’t have ‘realistic’ predictions yet…
  • If it doesn’t look good? Go back, try different hyperparameters…

SLIDE 41

Evaluation of GAN Performance

Inception Score (IS)
  • Measures saliency and diversity
  • Train an accurate classifier
  • Train an image generation model (conditional)
  • Check how accurately the classifier recognizes the generated images
  • Makes some assumptions about the data distributions…

SLIDE 42

Evaluation of GAN Performance

Inception Score (IS)
  • Saliency: check whether the generated images can be classified with high confidence (i.e., high scores on only a single class)
  • Diversity: check whether we obtain samples from all classes

What if we only have one good image per class?
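Both properties combine in the score exp(E_x KL(p(y|x) ‖ p(y))); a NumPy sketch on toy classifier outputs (stand-ins for the real Inception network's predictions):

```python
import numpy as np

def inception_score(p_yx):
    """p_yx: (N, C) rows of classifier distributions p(y|x) for N
    generated images; higher = confident (salient) AND diverse."""
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    kl = np.sum(p_yx * (np.log(p_yx + 1e-12) - np.log(p_y + 1e-12)), axis=1)
    return float(np.exp(kl.mean()))

# confident and diverse: each image clearly a different class
diverse = np.eye(4) * 0.97 + 0.01
diverse /= diverse.sum(axis=1, keepdims=True)
# mode collapse: every image looks like class 0
collapsed = np.tile(diverse[0], (4, 1))
```

Note how this relates to the slide's question: one good image per class, repeated, would still score highly, which is a known weakness of IS.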

SLIDE 43

Evaluation of GAN Performance

  • Could also look at the discriminator:
    – If we end up with a strong discriminator, then the generator must also be good
    – Use D’s features for a classification network
    – Only fine-tune the last layer
    – If class accuracy is high → we have a good D and G

Caveat: not sure if people do this… couldn’t find a paper

SLIDE 44

Next: Making GANs Work in Practice

  • Training / hyperparameters (most important)
  • Choice of loss function
  • Choice of architecture

SLIDE 45

GAN Hacks: Normalize Inputs

  • Normalize the inputs between -1 and 1
  • Use tanh as the last layer of the generator output
  • No-brainer

https://github.com/soumith/ganhacks

SLIDE 46

GAN Hacks: Sampling

  • Use a spherical z
  • Don’t sample from a uniform distribution; sample from a Gaussian distribution
  • When doing interpolations, interpolate via a great circle rather than a straight line from point A to point B
  • Tom White’s “Sampling Generative Networks” reference code https://github.com/dribnet/plat has more details
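Great-circle interpolation (slerp) can be sketched as follows (NumPy; a minimal version of the idea in White's "Sampling Generative Networks"):

```python
import numpy as np

def slerp(t, a, b):
    """Spherical interpolation from latent a to b, t in [0, 1]."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):       # nearly parallel: fall back to lerp
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)  # stays on the unit circle, unlike a straight line
```

A straight-line midpoint between Gaussian samples has atypically small norm; slerp keeps intermediate points in the high-density shell where the generator was trained.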

SLIDE 47

GAN Hacks: BatchNorm

  • Use batch norm
  • Construct different mini-batches for real and fake images, i.e., each mini-batch must contain only all real images or all generated images

SLIDE 48

GAN Hacks: Use ADAM

  • See the Adam usage in [Radford et al. 15]
  • SGD for the discriminator
  • ADAM for the generator

SLIDE 49

GAN Hacks: One-sided Label Smoothing

  • Prevent the discriminator from giving too large a gradient signal to the generator:
  • Use some value smaller than 1 for the real label, e.g., 0.9
  • → reduces confidence, i.e., makes the discriminator ‘weaker’
  • → prevents D from extrapolating to encourage extreme samples

Salimans et al. 16 “Improved Techniques for Training GANs”
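As a sketch, one-sided smoothing simply replaces the real-label target 1 with e.g. 0.9 in the discriminator's BCE (NumPy; the function name is mine):

```python
import numpy as np

def d_loss_smoothed(d_real, d_fake, smooth=0.9, eps=1e-12):
    """Discriminator BCE with one-sided label smoothing:
    real targets become `smooth` (< 1), fake targets stay 0."""
    loss_real = -np.mean(smooth * np.log(d_real + eps)
                         + (1.0 - smooth) * np.log(1.0 - d_real + eps))
    loss_fake = -np.mean(np.log(1.0 - d_fake + eps))
    return float(loss_real + loss_fake)
```

With smoothing, the optimum for the real outputs sits below 1, so D is actually penalized for being overconfident and its gradients to G stay bounded.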

SLIDE 50

GAN Hacks: Historical Generator Batches

  • Helps stabilize discriminator training in the early stages

Shrivastava et al. 17 “Learning from Simulated and Unsupervised Images through Adversarial Training”

SLIDE 51

GAN Hacks: Avoid Sparse Gradients

  • Stability of the GAN game suffers if gradients are sparse
  • LeakyReLU → good in both G and D
  • Downsample → use average pooling, or conv + stride
  • Upsample → deconv + stride, PixelShuffle

[Shi et al. 16] https://arxiv.org/pdf/1609.05158.pdf

SLIDE 52

Exponential Averaging of Weights

  • Problem: the discriminator is noisy due to SGD
  • Taking the final weights of a GAN would be biased toward the latest iterations (i.e., the latest training samples)
  • → take an exponential average of the weights
  • → keep a second ‘vector’ of weights that is averaged
  • → almost no cost; an average of the weights from the last n iterations
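The averaged-weights idea, sketched in NumPy (`decay` is an assumed hyperparameter; it controls the effective window, roughly the last 1/(1-decay) iterations):

```python
import numpy as np

def ema_update(avg_w, w, decay=0.9):
    """Keep a second copy of the weights, exponentially averaged."""
    return decay * avg_w + (1.0 - decay) * w

# after training, evaluate with avg_w instead of the raw final weights
avg_w = np.zeros(3)
for _ in range(50):      # pretend SGD keeps returning weights
    w = np.ones(3)       # near the same optimum
    avg_w = ema_update(avg_w, w)
```

One extra multiply-add per parameter per step, which is the "almost no cost" claim on the slide.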

SLIDE 53

New Objective Functions

SLIDE 54

New Objective Functions

“The heuristic is standard…”
  • EBGAN: “Energy-based Generative Adversarial Networks”
  • BEGAN: “Boundary Equilibrium GAN”
  • WGAN: “Wasserstein Generative Adversarial Networks”
  • LSGAN: “Least Squares Generative Adversarial Networks”
  • …

The loss function alone will not make it suddenly work!

SLIDE 55

GAN Losses: EBGAN

  • The discriminator is an autoencoder (Energy-based GAN)
  • A good autoencoder: we want the reconstruction cost D(x) for real images to be low
  • A good critic: we want to penalize the discriminator if the reconstruction error for generated images drops below a value m

https://medium.com/@jonathan_hui/gan-energy-based-gan-ebgan-boundary-equilibrium-gan-began-4662cceb7824

SLIDE 56

GAN Losses: BEGAN

  • Similar to EBGAN
  • Instead of a reconstruction loss, measure the difference between the data distributions of real and generated images

https://medium.com/@jonathan_hui/gan-energy-based-gan-ebgan-boundary-equilibrium-gan-began-4662cceb7824

SLIDE 57

GAN Losses: WGAN

  • Earth Mover Distance / Wasserstein distance
  • Minimum amount of work to move earth from p(x) to q(x)

https://medium.com/@jonathan_hui/gan-wasserstein-gan-wgan-gp-6a1a2aa1b490
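In 1-D the Earth Mover Distance has a closed form: the area between the two CDFs. A toy sketch (assuming unit-spaced histogram bins, which is my simplification):

```python
import numpy as np

def emd_1d(p, q):
    """Wasserstein-1 distance between two 1-D histograms on
    unit-spaced bins: total |difference| of their CDFs."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))

p = np.array([1.0, 0.0, 0.0])  # all earth in bin 0
q = np.array([0.0, 0.0, 1.0])  # all earth in bin 2
# moving one unit of earth across two bins costs 2
```

Unlike the Jensen-Shannon divergence behind the vanilla GAN loss, this distance still varies smoothly when the two distributions do not overlap.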

SLIDE 58

GAN Losses: WGAN

  • Formulate EMD via its dual: W(p, q) = sup over all 1-Lipschitz f of E_{x~p}[f(x)] - E_{x~q}[f(x)]
  • 1-Lipschitz function: upper bound between densities

https://medium.com/@jonathan_hui/gan-wasserstein-gan-wgan-gp-6a1a2aa1b490

SLIDE 59

GAN Losses: WGAN

  • f is a critic function, defined by a neural network
  • → f needs to be 1-Lipschitz; WGAN restricts the maximum weight value in f: the weights of the discriminator must lie within a certain range controlled by a hyperparameter c

https://medium.com/@jonathan_hui/gan-wasserstein-gan-wgan-gp-6a1a2aa1b490
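The critic objective and the clipping step, sketched in NumPy (a real implementation clips after every optimizer update; the function names are mine):

```python
import numpy as np

def critic_loss(f_real, f_fake):
    """WGAN critic maximizes E[f(real)] - E[f(fake)];
    negated here so it can be minimized."""
    return float(-(np.mean(f_real) - np.mean(f_fake)))

def clip_weights(weights, c=0.01):
    """Weight clipping: force every weight into [-c, c] so the
    critic is (crudely) kept Lipschitz."""
    return [np.clip(w, -c, c) for w in weights]

layers = [np.array([[0.5, -0.3]]), np.array([0.02])]
clipped = clip_weights(layers)
```

Note that f outputs unbounded scores, not probabilities, so there is no sigmoid or log in the critic loss.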

SLIDE 60

GAN Losses: WGAN

SLIDE 61

GAN Losses: WGAN

SLIDE 62

GAN Losses: WGAN

SLIDE 63

GAN Losses: WGAN

SLIDE 64

GAN Losses: WGAN

SLIDE 65

GAN Losses: WGAN

+ mitigates mode collapse
+ the generator still learns when the critic performs well
+ actual convergence

  • Enforcing the Lipschitz constraint is difficult
  • Weight clipping is “terrible”:
    – clip value too high: takes a long time to reach the limit; slow training
    – clip value too small: vanishing gradients when the layer size is big

SLIDE 66

GAN Losses

  • Many more variations!
  • High-level understanding: the “loss” is a meta-loss used to train the actual loss (i.e., D), which provides gradients for G
  • Always start simple: if things don’t converge, don’t randomly shuffle the loss around; always try easy things first (AE, VAE, ‘simple heuristic’ GAN)

SLIDE 67

GAN Architectures

SLIDE 68

Multiscale GANs

Credit: Li/Karpathy/Johnson

SLIDE 69

Multiscale GANs

Credit: Li/Karpathy/Johnson

SLIDE 70

Progressive Growing of GANs

https://github.com/tkarras/progressive_growing_of_gans [Karras et al. 17]

SLIDE 71

Diagram: latent → G (4×4 up to 64×64) → generated image → D (64×64 down to 4×4) → real or fake

SLIDE 72

Diagram: latent → G (4×4 up to 64×64) → generated image → D (64×64 down to 4×4) → real or fake

SLIDE 73

Diagram: latent → G (4×4 up to 1024×1024) → generated image → D (1024×1024 down to 4×4) → real or fake

SLIDE 74

Diagram: latent → G (4×4 up to 1024×1024) → generated image → D (1024×1024 down to 4×4) → real or fake

There’s waves everywhere! But where’s the shore?

SLIDE 75

Diagram: the 64×64 network shown next to the full 1024×1024 one

There it is!

SLIDE 76

Diagram: training starts at low resolution: latent → G (4×4) → D (4×4) → real or fake

SLIDE 77

Diagram: latent → G (4×4) → D (4×4) → real or fake

SLIDE 78

Diagram: grown to 8×8: latent → G (4×4, 8×8) → D (8×8, 4×4) → real or fake

SLIDE 79

Diagram: grown to 1024×1024: latent → G (4×4 up to 1024×1024) → D (1024×1024 down to 4×4) → real or fake

SLIDE 80

Generator grows by replicated blocks: 4×4 → 2x → 8×8 → 2x → 16×16 → 2x → 32×32
Each block: nearest-neighbor upsampling followed by a 3×3 convolution

SLIDE 81

toRGB: a 1×1 convolution mapping features to RGB, available at each resolution (4×4 up to 32×32)

SLIDE 82

Diagram: during growth, toRGB outputs exist at both the old and the new resolution

SLIDE 83

Diagram: the old (upsampled) and the new toRGB outputs are blended with a linear crossfade (+)
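The linear crossfade itself is a one-liner (NumPy sketch; `alpha` ramps from 0 to 1 while the newly added resolution block fades in):

```python
import numpy as np

def crossfade(old_rgb, new_rgb, alpha):
    """Blend the upsampled old-resolution output with the new
    higher-resolution output; alpha goes 0 -> 1 during growth."""
    return (1.0 - alpha) * old_rgb + alpha * new_rgb

old = np.zeros((8, 8, 3))  # upsampled 4x4 output (toy values)
new = np.ones((8, 8, 3))   # output of the freshly added 8x8 block
```

Fading the new block in gradually avoids shocking the already-trained lower-resolution layers with a randomly initialized output.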

SLIDE 84

Diagram: G grows 4×4 → 8×8 → 16×16 → 32×32 (2x upsampling, toRGB); D mirrors it with 32×32 → 16×16 → 8×8 → 4×4 (0.5x downsampling, fromRGB)

SLIDE 85

SLIDE 86

Progressive Growing of GANs

https://github.com/tkarras/progressive_growing_of_gans [Karras et al. 17]

SLIDE 87

Lots of GAN Variations

  • Hundreds of GAN papers in the last two years
    – Mostly with different losses
    – Extremely hard to train and evaluate

SLIDE 88

Next Lectures

  • Next Monday (the 17th): more on generative models
    – Conditional GANs (cGANs)!
  • We are still working on feedback for the presentations; we will send it around asap.
  • Keep working on the projects!