Introduction to Generative Adversarial Network (GAN) Hongsheng Li - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Introduction to Generative Adversarial Network (GAN)

Hongsheng Li Department of Electronic Engineering Chinese University of Hong Kong

Adversarial – adj. opposing; confrontational

1

slide-2
SLIDE 2

Generative Models

  • Density Estimation

– Discriminative model: $p(y \mid x)$

  • y=0 for elephant, y=1 for horse

– Generative model: $p(x \mid y)$

  • Elephant (y=0): $p(x \mid y=0)$
  • Horse (y=1): $p(x \mid y=1)$
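The difference between the two model types can be sketched in a few lines. This is a minimal illustration, not part of the slides: the 1-D feature values, the Gaussian class-conditional densities and the equal priors are all assumptions made for the example. It fits p(x | y) for each class and then recovers p(y | x) with Bayes' rule.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_gaussian(samples):
    """Maximum-likelihood mean and standard deviation of a list of floats."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, math.sqrt(var)

# Hypothetical 1-D feature (say, body weight in tonnes) for each class.
elephant_x = [4.8, 5.5, 6.1, 5.0, 5.9]   # y = 0
horse_x = [0.4, 0.5, 0.6, 0.45, 0.55]    # y = 1

# Generative modelling: estimate p(x | y) per class, plus an assumed prior p(y).
params = {0: fit_gaussian(elephant_x), 1: fit_gaussian(horse_x)}
prior = {0: 0.5, 1: 0.5}

def posterior_horse(x):
    """Discriminative quantity p(y=1 | x), obtained from p(x | y) via Bayes' rule."""
    joint = {y: gaussian_pdf(x, *params[y]) * prior[y] for y in (0, 1)}
    return joint[1] / (joint[0] + joint[1])
```

A discriminative model would learn `posterior_horse` directly; the generative route also yields densities that can be sampled from.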

2

slide-3
SLIDE 3

Generative Models

  • Sample Generation

Training samples Model samples

3

slide-4
SLIDE 4

Generative Models

  • Sample Generation

Training samples Model samples Training samples

4

slide-5
SLIDE 5

Generative Models

  • Generative model
  • GAN is a generative model

– Mainly focuses on sample generation
– Possible to do both

[Figure labels: data, $p_{\text{data}}$, $p_{\text{model}}$, sample generation]

5

slide-6
SLIDE 6

Why Worth Studying?

  • Excellent test of our ability to use high-dimensional, complicated probability distributions

  • Missing data

– Semi-supervised learning

[Figure labels: $p_{\text{model}}$, sample generation]

6

slide-7
SLIDE 7

Why Worth Studying?

  • Multi-modal outputs

– Example: next frame prediction

Lotter et al. 2015

7

slide-8
SLIDE 8

Why Worth Studying?

  • Image generation tasks

– Example: single-image super-resolution

Ledig et al 2016

8

slide-9
SLIDE 9

Why Worth Studying?

  • Image generation tasks

– Example: Image-to-Image Translation
– https://affinelayer.com/pixsrv/

Isola et al 2016

9

slide-10
SLIDE 10

Why Worth Studying?

  • Image generation tasks

– Example: Text-to-Image Generation

Zhang et al 2016

10

slide-11
SLIDE 11

How does GAN Work?

  • Adversarial – adj. opposing; confrontational
  • Two networks:

– Generator G: creates (fake) samples that the discriminator cannot distinguish from real ones
– Discriminator D: determines whether samples are fake or real

[Figure: Generator and Discriminator compete]

11

slide-12
SLIDE 12

The Generator

  • G: a differentiable function

– modeled as a neural network

  • Input:

– z: random noise vector from some simple prior distribution

  • Output:

– x = G(z): generated samples

[Figure: z → Generator → x]

12

slide-13
SLIDE 13

The Generator

  • The dimension of z should be at least as large as that of x

[Figure: z → Generator → G(z) = x; distribution labels $p_{\text{model}}$ and $p_{\text{data}}$]

13

slide-14
SLIDE 14

The Discriminator

  • D: modeled as a neural network
  • Input:

– Real sample
– Generated sample x

  • Output:

– 1 for real samples
– 0 for fake samples

[Figure: x (real data) → Discriminator → 1]

14

slide-15
SLIDE 15

Generative Adversarial Networks

15

slide-16
SLIDE 16

Cost Functions

  • The discriminator outputs a value D(x) indicating the chance that x is a real image
  • For real images, the ground-truth labels are 1. For generated images, the labels are 0.
  • Our objective is to maximize the chance of recognizing real images as real and generated images as fake
  • The objective for the discriminator can be defined as
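Written out for the discriminator, whose task the bullets above describe, the objective takes the following form. This is the standard formulation from Goodfellow et al. 2014 and is assumed here; $p_{\text{data}}$ is the data distribution and $p_z$ (a symbol introduced for this sketch) is the prior on the noise vector z.

```latex
\max_D V(D) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
            + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term rewards D(x) close to 1 on real images, the second rewards D(G(z)) close to 0 on generated ones.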

16

slide-17
SLIDE 17

Cost Functions

  • For the generator G, the objective is to generate images x = G(z) with the highest possible value of D(x), so as to fool the discriminator
  • The cost function is
  • The overall GAN training is therefore a min-max game
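In the standard formulation (Goodfellow et al. 2014, assumed here), the generator cost and the resulting min-max game can be written as below; $p_z$ denotes the prior on z and is a symbol introduced for this sketch.

```latex
\min_G V(G) = \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
                      + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The generator only influences the second term, which is why its cost drops the expectation over real data.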

17

slide-18
SLIDE 18

Training Procedure

  • The generator and the discriminator are trained jointly by alternating gradient descent

– Fix the generator’s parameters and perform a single iteration of gradient descent on the discriminator using the real and the generated images
– Fix the discriminator and train the generator for a single iteration
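The alternating scheme can be sketched on a 1-D toy problem. Everything concrete here is an assumption for illustration: the generator is the affine map G(z) = a*z + b, the discriminator is the logistic model D(x) = sigmoid(w*x + c), the "real" data are drawn from N(4, 1), and the gradients are derived by hand for the min-max costs.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=200, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator G(z) = a*z + b (hypothetical toy model)
    w, c = 0.1, 0.0   # discriminator D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]   # assumed data distribution
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # Step 1: generator fixed, one gradient-descent step on the discriminator.
        # Loss: -[log D(x_real) + log(1 - D(x_fake))], averaged over the batch.
        gw = gc = 0.0
        for xr, xf in zip(real, fake):
            dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
            gw += (dr - 1.0) * xr + df * xf
            gc += (dr - 1.0) + df
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Step 2: discriminator fixed, one gradient-descent step on the generator.
        # Loss: log(1 - D(G(z))); its derivative w.r.t. the logit is -D(G(z)).
        ga = gb = 0.0
        for zi in z:
            df = sigmoid(w * (a * zi + b) + c)
            ga += -df * w * zi
            gb += -df * w
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c
```

Each loop iteration performs exactly one discriminator update followed by one generator update, reusing the same noise batch for both.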

18

slide-19
SLIDE 19

The Algorithm

19

slide-20
SLIDE 20

Illustration of the Learning

  • Generative adversarial learning aims to learn a model distribution that matches the actual data distribution

[Figure legend: discriminator, data distribution, model distribution]

20

slide-21
SLIDE 21

Generator diminished gradient

  • However, the generator suffers from a diminishing-gradient problem, because the discriminator usually wins early against the generator
  • Early in training it is easy to distinguish generated images from real ones, so D(G(z)) is close to 0 and the generator's cost approaches 0, i.e. -log(1 - D(G(z))) → 0
  • The gradient for the generator then also vanishes, which makes gradient descent optimization very slow
  • To improve on this, the GAN provides an alternative function for backpropagating the gradient to the generator
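The alternative is usually written as follows. This is the non-saturating generator objective from Goodfellow et al. 2014, stated here as an assumption about what the slide showed; $\theta_g$ (generator parameters) and $p_z$ (noise prior) are symbols introduced for this sketch.

```latex
\text{original (minimize):} \quad
  \nabla_{\theta_g}\, \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

\text{alternative (maximize):} \quad
  \nabla_{\theta_g}\, \mathbb{E}_{z \sim p_z}\left[\log D(G(z))\right]
```

Both push D(G(z)) upward, but the second keeps a large gradient when D(G(z)) is near 0.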

21


slide-22
SLIDE 22

Comparison between Two Losses

22

slide-23
SLIDE 23

Non-Saturating Game

  • In the min-max game, the generator maximizes the same cross-entropy that the discriminator minimizes
  • Now, the generator maximizes the log-probability of the discriminator being mistaken
  • Heuristically motivated; the generator can still learn even when the discriminator successfully rejects all generator samples
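The difference between the two generator costs shows up directly in their gradients. The sketch below assumes the discriminator output is a sigmoid of a logit s, so that D(G(z)) = sigmoid(s), and compares the derivative of each cost with respect to s; the function names are made up for the example.

```python
import math

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def saturating_grad(s):
    """d/ds of log(1 - sigmoid(s)), the min-max generator cost."""
    return -sigmoid(s)

def non_saturating_grad(s):
    """d/ds of -log(sigmoid(s)), the non-saturating generator cost."""
    return sigmoid(s) - 1.0

# A very negative logit means the discriminator confidently rejects the sample.
for s in (-8.0, -4.0, 0.0, 4.0):
    print(f"logit={s:+.1f}  D(G(z))={sigmoid(s):.4f}  "
          f"min-max grad={saturating_grad(s):+.4f}  "
          f"non-saturating grad={non_saturating_grad(s):+.4f}")
```

When the discriminator rejects a sample with high confidence, the min-max gradient is close to zero while the non-saturating gradient stays close to -1.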

23

slide-24
SLIDE 24

Deep Convolutional Generative Adversarial Networks (DCGAN)

  • All convolutional nets
  • No global average pooling
  • Batch normalization
  • ReLU

Radford et al. 2016

24

slide-25
SLIDE 25

Deep Convolutional Generative Adversarial Networks (DCGAN)

Radford et al. 2016

  • LSUN bedroom (about 3 million training images)

25

slide-26
SLIDE 26

Manipulating Learned z

26

slide-27
SLIDE 27

Manipulating Learned z

27

slide-28
SLIDE 28

Image Super-resolution with GAN

Ledig et al. 2016

28

slide-29
SLIDE 29

Image Super-resolution with GAN

29

slide-30
SLIDE 30

Image Super-resolution with GAN

30

slide-31
SLIDE 31

Image Super-resolution with GAN

[Figure panels: bicubic, SRResNet, SRGAN, original]

31

slide-32
SLIDE 32

Context-Encoder for Image Inpainting

  • For a pre-defined region, synthesize the image contents

Pathak et al 2016

32

slide-33
SLIDE 33

Context-Encoder for Image Inpainting

  • For a pre-defined region, synthesize the image contents

Pathak et al 2016

33

slide-34
SLIDE 34

Context-Encoder for Image Inpainting

  • Overall framework

Original region Synthetic region

34

slide-35
SLIDE 35

Context-Encoder for Image Inpainting

  • The objective

35

slide-36
SLIDE 36

Context-Encoder for Image Inpainting

36

slide-37
SLIDE 37

Image Inpainting with Partial Convolution

Liu et al. 2018

  • Partial convolution for handling missing data
  • L1 loss: minimizing the pixel differences between the generated images and their ground-truth images
  • Perceptual loss: minimizing the difference between the VGG features of the generated images and those of their ground-truth images
  • Style loss (Gram matrix): minimizing the difference between the Gram matrices of the generated images' features and those of their ground-truth images
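The three loss terms share one building block, a distance between two arrays. The sketch below is a simplified illustration under stated assumptions: feature maps are plain lists of channels (in practice they would be VGG activations), the Gram matrix is scaled by 1/N, and all distances are L1. The helper names are hypothetical.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized flat lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def flatten(rows):
    """Concatenate a list of lists into one flat list."""
    return [v for row in rows for v in row]

def gram_matrix(features):
    """features: list of C channels, each a flat list of N activations.
    Returns the C x C matrix of channel inner products, scaled by 1/N."""
    n = len(features[0])
    return [[sum(p * q for p, q in zip(fi, fj)) / n for fj in features]
            for fi in features]

def perceptual_loss(feat_generated, feat_target):
    """L1 distance between feature maps (stand-ins for VGG features)."""
    return l1_loss(flatten(feat_generated), flatten(feat_target))

def style_loss(feat_generated, feat_target):
    """L1 distance between the Gram matrices of the two feature maps."""
    return l1_loss(flatten(gram_matrix(feat_generated)),
                   flatten(gram_matrix(feat_target)))
```

The pixel-level L1 loss is `l1_loss` applied to the flattened images themselves; the perceptual and style terms apply the same distance to features and to feature correlations respectively.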

37

slide-38
SLIDE 38

Image Inpainting with Partial Convolution: Results

Liu et al. 2018

38

slide-39
SLIDE 39

Texture Synthesis with Patch-based GAN

Li and Wand 2016

  • Synthesize textures for input images

39

slide-40
SLIDE 40

Texture Synthesis with Patch-based GAN

Li and Wand 2016

[Figure labels: MSE loss, adversarial loss]

40

slide-41
SLIDE 41

Texture Synthesis with Patch-based GAN

Li and Wand 2016

41

slide-42
SLIDE 42

Texture Synthesis with Patch-based GAN

Li and Wand 2016

42

slide-43
SLIDE 43

Conditional GAN

  • GAN is too free. How to add some constraints?
  • Add conditional variables y into the generator

Mirza and Osindero 2014

[Figure labels: training samples, model samples]

43

slide-44
SLIDE 44

Conditional GAN

  • GAN is too free. How to add some constraints?
  • Add conditional variables y into G and D

Mirza and Osindero 2014

44

slide-45
SLIDE 45

Conditional GAN

Mirza and Osindero 2014

45

slide-46
SLIDE 46

Conditional GAN

Mirza and Osindero 2014

46

slide-47
SLIDE 47

Conditional GAN

Mirza and Osindero 2014

  • Positive samples for D

– True data + corresponding conditioning variable

  • Negative samples for D

– Synthetic data + corresponding conditioning variable
– True data + non-corresponding conditioning variable
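The three kinds of discriminator samples can be assembled as below. This is a sketch under assumptions: `build_discriminator_batch`, the `generator(y, rng)` callable and the label set are hypothetical names for illustration, and samples are represented by arbitrary Python objects.

```python
import random

def build_discriminator_batch(pairs, generator, labels, rng):
    """pairs: list of (real_x, y). Returns a list of (x, y, target) triples."""
    batch = []
    for real_x, y in pairs:
        # Positive: true data with its own conditioning variable.
        batch.append((real_x, y, 1))
        # Negative: synthetic data with the conditioning variable it was generated from.
        batch.append((generator(y, rng), y, 0))
        # Negative: true data paired with a non-corresponding conditioning variable.
        wrong_y = rng.choice([label for label in labels if label != y])
        batch.append((real_x, wrong_y, 0))
    return batch
```

The third kind of sample forces D to check that the data matches the condition, not just that the data looks real.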

47

slide-48
SLIDE 48

Text-to-Image Synthesis

Reed et al 2015

48

slide-49
SLIDE 49

StackGAN: Text to Photo-realistic Images

Zhang et al. 2016

  • How do humans draw a figure?

– A coarse-to-fine manner

49

slide-50
SLIDE 50

StackGAN: Text to Photo-realistic Images

Zhang et al. 2016

  • Use stacked GAN structure for text-to-image synthesis

50

slide-51
SLIDE 51

StackGAN: Text to Photo-realistic Images

  • Use stacked GAN structure for text-to-image synthesis

51

slide-52
SLIDE 52

StackGAN: Text to Photo-realistic Images

  • Conditioning augmentation
  • No random noise vector z for Stage-2
  • Conditioning both stages on text helps achieve better results
  • Spatial replication for the text conditional variable
  • Negative samples for D

– True images + non-corresponding texts
– Synthetic images + corresponding texts

52

slide-53
SLIDE 53

Conditioning Augmentation

  • How to train parameters like the mean and variance of a Gaussian distribution $N(\mu_0, \sigma_0)$
  • Sample from the standard Normal distribution $N(0, 1)$
  • Multiply with $\sigma_0$ and then add $\mu_0$
  • The re-parameterization trick
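The re-parameterization trick amounts to two lines of arithmetic. In this minimal sketch, `mu` and `sigma` stand for the mean and standard deviation that a network would predict from the text embedding; the function name and the list-based representation are assumptions for illustration.

```python
import random

def sample_conditioning(mu, sigma, rng):
    """Draw c ~ N(mu, sigma^2) element-wise as c = mu + sigma * eps, eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]        # noise from the standard Normal
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]
```

Because the randomness lives entirely in `eps`, the sample is a deterministic, differentiable function of `mu` and `sigma`, so gradients can flow back into the parameters that produced them.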

53

slide-54
SLIDE 54

More StackGAN Results on Flower

54

slide-55
SLIDE 55

More StackGAN Results on COCO

55

slide-56
SLIDE 56

StackGAN-v2: Architecture

  • Approximate multi-scale image distributions jointly
  • Approximate conditional and unconditional image distributions jointly

56

slide-57
SLIDE 57

StackGAN-v2: Results

57

slide-58
SLIDE 58

Progressive Growing of GAN

  • Shares a similar spirit with StackGAN-v1/-v2 but uses a different training strategy

58

slide-59
SLIDE 59

Progressive Growing of GAN

  • Impressively realistic face images

59

slide-60
SLIDE 60

Image-to-Image Translation with Conditional GAN

Isola et al. 2016

60

slide-61
SLIDE 61

Image-to-Image Translation with Conditional GAN

  • Incorporate L1 loss into the objective function
  • Adopt the U-net structure for the generator

[Figure: encoder-decoder vs. encoder-decoder with skips (U-Net)]
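The combined objective with the L1 term can be written as below. This follows the formulation in Isola et al. and is assumed here rather than taken from the slide; x is the input image, y the target image, z the noise, and $\lambda$ a weighting factor.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
                         + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]

G^{*} = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
```

The L1 term keeps the output close to the ground truth at low frequencies, while the adversarial term is left to handle high-frequency detail.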

61

slide-62
SLIDE 62

Patch-based Discriminator

  • Separate each image into N x N patches
  • Instead of distinguishing whether the whole image is real or fake, train a patch-based discriminator
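A patch-based decision can be sketched as follows. This is a simplification under assumptions: the image is a 2-D list, patches are non-overlapping, `score_patch` is a hypothetical stand-in for the learned per-patch discriminator, and the per-patch scores are averaged into one output.

```python
def split_into_patches(image, n):
    """Split a 2-D list (H x W) into non-overlapping n x n patches, row by row."""
    h, w = len(image), len(image[0])
    return [[row[c:c + n] for row in image[r:r + n]]
            for r in range(0, h - n + 1, n)
            for c in range(0, w - n + 1, n)]

def patch_discriminator(image, n, score_patch):
    """Score every patch as real/fake and average the per-patch scores."""
    scores = [score_patch(patch) for patch in split_into_patches(image, n)]
    return sum(scores) / len(scores)
```

Since each score depends only on a local patch, the discriminator models local texture statistics and has far fewer inputs per decision than a whole-image classifier.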

62

slide-63
SLIDE 63

More Results

63

slide-64
SLIDE 64

More Results

64

slide-65
SLIDE 65

CycleGAN

  • All previous methods require paired training data, i.e., exact input-output pairs, which can be extremely difficult to obtain in practice

slide-66
SLIDE 66

CycleGAN

  • The framework learns two mapping functions (generators) G : X → Y and F : Y → X with two domain discriminators $D_X$ and $D_Y$
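What ties the two mappings together without paired data is cycle consistency, the idea that gives CycleGAN its name; it is not spelled out on the slide and is added here as background. The sketch below uses scalar "images" and hand-written toy mappings, all of which are assumptions for illustration.

```python
def cycle_consistency_loss(xs, ys, G, F):
    """Mean L1 error of the round trips x -> G -> F and y -> F -> G."""
    forward = sum(abs(F(G(x)) - x) for x in xs) / len(xs)
    backward = sum(abs(G(F(y)) - y) for y in ys) / len(ys)
    return forward + backward

# Toy mappings between two scalar "domains" (hypothetical).
G = lambda x: 2.0 * x + 1.0          # X -> Y
F_inverse = lambda y: (y - 1.0) / 2  # Y -> X, exact inverse of G
F_wrong = lambda y: y / 2.0          # Y -> X, not an inverse of G

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0]
loss_consistent = cycle_consistency_loss(xs, ys, G, F_inverse)
loss_inconsistent = cycle_consistency_loss(xs, ys, G, F_wrong)
```

The loss vanishes only when F undoes G and G undoes F, which constrains the otherwise under-determined unpaired mapping.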

slide-67
SLIDE 67

CycleGAN: Results

slide-68
SLIDE 68

CycleGAN: Results

slide-69
SLIDE 69

S2-GAN: Decomposing difficult problems into subproblems

  • Generating indoor images
  • Generating surface normal map + surface style

map

69

slide-70
SLIDE 70

Style-GAN

70

slide-71
SLIDE 71

S2-GAN Results

71

slide-72
SLIDE 72

Insights

  • Some insights

– Decomposing the problems into easier problems
– Spatially well-aligned conditioning variables are generally better

72

slide-73
SLIDE 73

Non-convergence in GANs

  • Finding the equilibrium is a two-player game
  • Exploiting convexity in function space, GAN training is theoretically guaranteed to converge if we can modify the density functions directly, but:

– Instead, we modify G (sample generation function) and D (density ratio), not densities
– We represent G and D as highly non-convex parametric functions

  • “Oscillation”: can train for a very long time, generating very many different categories of samples, without clearly generating better samples
  • Mode collapse: most severe form of non-convergence
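Non-convergence of simultaneous gradient updates can be seen on a toy game that is not from the slides: one player minimizes V(x, y) = x*y over x while the other maximizes it over y. The unique equilibrium is (0, 0), yet the iterates circle around it.

```python
import math

def simultaneous_gradient_steps(x, y, lr, steps):
    """Player x minimizes V = x*y, player y maximizes it; both step at once.
    Returns the distance from the equilibrium (0, 0) after each step."""
    radii = []
    for _ in range(steps):
        gx, gy = y, x                      # dV/dx = y, dV/dy = x
        x, y = x - lr * gx, y + lr * gy    # descent for x, ascent for y
        radii.append(math.hypot(x, y))
    return radii

radii = simultaneous_gradient_steps(1.0, 1.0, lr=0.1, steps=100)
```

Each step multiplies the squared distance from the equilibrium by (1 + lr^2), so the players orbit outward instead of settling, regardless of how small the learning rate is.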

73

slide-74
SLIDE 74

Mode Collapse

  • D in inner loop: convergence to correct distribution
  • G in inner loop: place all mass on most likely point
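The two bullets contrast the two possible nestings of the optimization, which are not interchangeable. In the notation of the min-max game (V is assumed to denote the GAN value function):

```latex
\min_G \max_D V(G, D) \;\neq\; \max_D \min_G V(G, D)
```

The left-hand side, with D in the inner loop, is the intended problem; alternating training that behaves like the right-hand side lets G map every z to the single point D currently rates most likely to be real.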

Metz et al 2016

74

slide-75
SLIDE 75

Mode Collapse Causes Low Output Diversity

75

slide-76
SLIDE 76

Conditioning Augmentation

76

slide-77
SLIDE 77

Minibatch Features

  • Add minibatch features that classify each example by comparing it to other members of the minibatch (Salimans et al 2016)
  • Nearest-neighbor style features detect if a minibatch contains samples that are too similar to each other
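A closeness feature of this kind can be sketched as below. It is a simplified stand-in, not the method of Salimans et al.: the learned projection tensor is omitted, and the feature is just the sum of exp(-L1 distance) from each sample to every other sample in the batch. The function name is hypothetical.

```python
import math

def minibatch_closeness(features):
    """For each sample, sum exp(-L1 distance) to every other sample in the batch."""
    out = []
    for i, fi in enumerate(features):
        total = 0.0
        for j, fj in enumerate(features):
            if i != j:
                total += math.exp(-sum(abs(a - b) for a, b in zip(fi, fj)))
        out.append(total)
    return out

collapsed = minibatch_closeness([[1.0, 1.0]] * 4)
diverse = minibatch_closeness([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
```

A collapsed batch yields large values for every sample, giving the discriminator a direct signal that the generator's outputs are too similar to each other.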

77

slide-78
SLIDE 78

Minibatch GAN on CIFAR

78

slide-79
SLIDE 79

Minibatch GAN on ImageNet

79

slide-80
SLIDE 80

Cherry-picked Results

80

slide-81
SLIDE 81

Problems with Counting

81

slide-82
SLIDE 82

Problems with Perspective

82

slide-83
SLIDE 83

83

slide-84
SLIDE 84

Unrolled GANs

  • A toy example

Metz et al 2016

84

slide-85
SLIDE 85

Tips and Tricks for training GAN

  • Normalize the inputs

– Normalize the images between -1 and 1
– Tanh as the last layer of the generator output

  • Modified loss function

– Because of the vanishing gradients (Goodfellow et al 2014), use max log D(G(z)) for the generator instead of min log(1 - D(G(z)))
– Flip labels when training generator: real = fake, fake = real

Soumith et al 2016
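The input normalization is a simple affine rescaling. The sketch below assumes 8-bit pixel values in [0, 255]; the helper names are made up for the example.

```python
import math

def to_model_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def to_pixel(value):
    """Map a generator output in [-1, 1] back to an integer in [0, 255]."""
    return int(round((value + 1.0) * 127.5))

# tanh keeps any pre-activation inside (-1, 1), the same range as the real images.
generated = [math.tanh(h) for h in (-3.0, 0.2, 5.0)]
```

With real images rescaled this way, a tanh output layer puts generated and real samples on the same scale before they reach the discriminator.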

85

slide-86
SLIDE 86

Tips and Tricks for training GAN

  • Use a spherical Z

– Don’t sample from a Uniform distribution
– Sample from a Gaussian distribution
– When doing interpolations, do the interpolation via a great circle, rather than a straight line (White et al 2016)
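Great-circle interpolation is commonly implemented as spherical linear interpolation (slerp). The sketch below is one way to write it with plain lists; the fallback to a straight line for nearly parallel or antiparallel vectors is a choice made for this example.

```python
import math

def slerp(t, p0, p1):
    """Interpolate from p0 (t=0) to p1 (t=1) along a great circle."""
    norm0 = math.sqrt(sum(a * a for a in p0))
    norm1 = math.sqrt(sum(b * b for b in p1))
    cos_omega = sum(a * b for a, b in zip(p0, p1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))   # angle between the vectors
    if math.sin(omega) < 1e-8:
        # Nearly parallel or antiparallel: fall back to a straight line.
        return [(1.0 - t) * a + t * b for a, b in zip(p0, p1)]
    w0 = math.sin((1.0 - t) * omega) / math.sin(omega)
    w1 = math.sin(t * omega) / math.sin(omega)
    return [w0 * a + w1 * b for a, b in zip(p0, p1)]

midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

For two unit vectors, the slerp midpoint keeps norm 1, whereas the straight-line midpoint shrinks toward the origin, a region a Gaussian prior in high dimensions rarely samples.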

86

slide-87
SLIDE 87

Tips and Tricks for training GAN

  • Batch normalization

– Compute mean and standard deviation of features
– Normalize features (subtract mean, divide by standard deviation)

Soumith et al 2016

87

slide-88
SLIDE 88

Tips and Tricks for training GAN

  • Batch normalization in G

Soumith et al 2016

88

slide-89
SLIDE 89

Tips and Tricks for training GAN

  • Reference Batch Normalization

– Fix a reference batch $R = \{r^{(1)}, \dots, r^{(m)}\}$
– Given new inputs $X = \{x^{(1)}, \dots, x^{(m)}\}$
– Normalize the features of X using the mean and standard deviation from R
– Every $x^{(i)}$ is always treated the same, regardless of which other examples appear in the minibatch

Salimans et al 2016

89

slide-90
SLIDE 90

Tips and Tricks for training GAN

  • Virtual Batch Normalization

– Reference batch norm can overfit to the reference batch. A partial solution is virtual batch norm
– Fix a reference batch $R = \{r^{(1)}, \dots, r^{(m)}\}$
– Given new inputs $X = \{x^{(1)}, \dots, x^{(m)}\}$
– For each $x^{(i)}$ in X:

  • Construct a virtual minibatch V containing $x^{(i)}$ and all of R
  • Compute the mean and standard deviation of V
  • Normalize the features of $x^{(i)}$ using this mean and standard deviation

Salimans et al 2016
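Reference and virtual batch normalization differ only in which statistics are used. The sketch below works on scalar features for brevity; the small `eps` added to the variance and the function names are assumptions for illustration.

```python
import math

def mean_std(values, eps=1e-5):
    """Mean and (eps-stabilized) standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)

def reference_batch_norm(xs, reference):
    """Normalize every x with the statistics of the fixed reference batch R."""
    mean, std = mean_std(reference)
    return [(x - mean) / std for x in xs]

def virtual_batch_norm(xs, reference):
    """Normalize each x with the statistics of the virtual batch V = R plus x."""
    out = []
    for x in xs:
        mean, std = mean_std(reference + [x])
        out.append((x - mean) / std)
    return out
```

In both variants the result for one input does not depend on the other inputs in the minibatch; the virtual variant costs one extra statistics computation per input.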

90

slide-91
SLIDE 91

Tips and Tricks for training GAN

  • Use Adam optimizer

– Use SGD for discriminator & Adam for generator

  • Avoid Sparse Gradients: ReLU, MaxPool

– The stability of the GAN game suffers if you have sparse gradients
– LeakyReLU = good (in both G and D)
– For downsampling, use: Average Pooling, Conv2d + stride
– For upsampling, use: Bilinear Interpolation, PixelShuffle

91

slide-92
SLIDE 92

Tips and Tricks for training GAN

  • Use Soft and Noisy Labels

– Default cost
– Label Smoothing (Salimans et al. 2016)
– For real ones (label=1), replace the label with 0.9; for fake ones (label=0), keep it at 0

92

slide-93
SLIDE 93

Tips and Tricks for training GAN

  • Use Soft and Noisy Labels

– Make the labels noisy for the discriminator:

  • Occasionally flip the labels when training the discriminator
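Soft and noisy labels can be produced by one small helper when building the discriminator's targets. The function name and the flip probability of 0.05 are hypothetical values chosen for this sketch; only the 0.9 target for real samples comes from the slides.

```python
import random

def discriminator_targets(is_real, rng, real_value=0.9, flip_prob=0.05):
    """Targets for D with one-sided label smoothing and occasional label flips."""
    targets = []
    for real in is_real:
        if rng.random() < flip_prob:
            real = not real                    # noisy label: occasionally flip
        # One-sided smoothing: real -> 0.9, fake stays at 0.
        targets.append(real_value if real else 0.0)
    return targets
```

These targets are used only when training the discriminator; the generator's update is left unchanged.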

93

slide-94
SLIDE 94

Tips and Tricks for training GAN

  • Track failures early

– D loss goes to 0: failure mode
– Check norms of gradients: > 100 is bad
– When training well, D loss has low variance and is going down. Otherwise, D loss is spiky
– If loss of G steadily decreases, then it’s fooling D with garbage

94

slide-95
SLIDE 95

Tips and Tricks for training GAN

  • Discrete variables in Conditional GANs

– Use an Embedding layer
– Add as additional channels to images
– Keep embedding dimensionality low and upsample to match image channel size

95

slide-96
SLIDE 96

Tips and Tricks for training GAN

  • Use label information when possible

– Used as conditioning variable
– Auxiliary classifier GAN

[Figure: Conditional GAN vs. AC-GAN]

96

slide-97
SLIDE 97

Tips and Tricks for training GAN

  • Balancing G and D

– Usually the discriminator “wins”
– Good thing: theoretical justifications are based on assuming D is perfect
– Usually D is bigger and deeper than G
– Sometimes run D more often than G. Mixed results
– Do not try to limit D to avoid making it “too smart”

  • Use non-saturating cost
  • Use label smoothing

97

slide-98
SLIDE 98

Research Directions

  • Research direction of GANs

– Better network structures
– Better objective functions
– Novel problem setups
– Use of adversarial losses in other CV/ML applications
– Theories on GAN

98