Introduction to Generative Models (and GANs) - Haoqiang Fan - PowerPoint PPT Presentation


SLIDE 1

Introduction to Generative Models (and GANs)

Haoqiang Fan (fhq@megvii.com)

  • Nov. 2017

Figures adapted from the NIPS 2016 tutorial "Generative Adversarial Networks" (this attribution applies throughout the deck).

SLIDE 2

Generative Models: Learning the Distributions

  • Discriminative: learns the likelihood of the label given the input
  • Generative: performs density estimation (learns the data distribution) to allow sampling


SLIDE 3

Loss function for distribution: Ambiguity and the “blur” effect

MSE: a discriminative model trained with mean squared error just smooths (averages) over all the possibilities, which produces the "blur"; a generative model can instead commit to one plausible sample.
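A minimal PyTorch sketch (not from the slides) of this averaging effect: when the target is genuinely ambiguous, the MSE-optimal constant prediction is the mean of the modes, not any one of them.

```python
import torch

# Ambiguous target: for the same input, y is +1 or -1 with equal probability.
y = (torch.rand(1000, 1) < 0.5).float() * 2 - 1

w = torch.zeros(1, requires_grad=True)   # model: a single constant prediction
opt = torch.optim.SGD([w], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    ((w - y) ** 2).mean().backward()     # MSE loss over both possibilities
    opt.step()

print(w.item())  # ~0.0: MSE averages the two modes instead of picking one
```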

SLIDE 4

Ambiguity and the “blur” effect


Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

SLIDE 5

Example Applications of Generative Models

SLIDE 6

Image Generation from Sketch


iGAN: Interactive Image Generation via Generative Adversarial Networks

SLIDE 7

Interactive Editing


Neural Photo Editing with Introspective Adversarial Networks

SLIDE 8

Image to Image Translation


SLIDE 9

How Generative Models are Trained

SLIDE 10

Learning Generative Models


SLIDE 11

Taxonomy of Generative Models


SLIDE 12

Exact Model: Real NVP (real-valued non-volume preserving)

Density estimation using Real NVP https://arxiv.org/abs/1605.08803

SLIDE 13

Real NVP: Invertible Non-linear Transforms

Density estimation using Real NVP
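A minimal sketch of Real NVP's affine coupling layer (the hidden size and the small nets s and t are illustrative assumptions, not the paper's architecture; dim is assumed even): half of the input passes through unchanged, and the other half is scaled and shifted by functions of the first half, so the transform is invertible and its log-determinant is cheap.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        self.s = nn.Sequential(nn.Linear(half, 64), nn.ReLU(), nn.Linear(64, half), nn.Tanh())
        self.t = nn.Sequential(nn.Linear(half, 64), nn.ReLU(), nn.Linear(64, half))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.s(x1), self.t(x1)
        y2 = x2 * torch.exp(s) + t            # y1 = x1 passes through unchanged
        log_det = s.sum(dim=1)                # log|det J| = sum of the scales
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        s, t = self.s(y1), self.t(y1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)
```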

SLIDE 14

Real NVP: Examples

Density estimation using Real NVP

SLIDE 15

Real NVP

Restriction on the source domain: it must have the same dimensionality as the target (the transform must be invertible).

SLIDE 16

Variational Auto-Encoder

Auto-encoding with noise injected into the hidden (latent) variable
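A minimal VAE sketch (layer sizes 784/400/20 are illustrative assumptions): the encoder outputs a mean and log-variance, noise enters through the reparameterization trick z = mu + sigma * eps, and the loss combines reconstruction with a KL term toward N(0, I).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(784, 400)
        self.mu = nn.Linear(400, 20)
        self.logvar = nn.Linear(400, 20)
        self.dec = nn.Sequential(nn.Linear(20, 400), nn.ReLU(), nn.Linear(400, 784))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # noisy hidden variable
        return self.dec(z), mu, logvar

def vae_loss(recon_logits, x, mu, logvar):
    rec = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return rec + kl
```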

SLIDE 17

Variational Auto-Encoder

SLIDE 18

SLIDE 19

SLIDE 20

VAE: Examples

SLIDE 21

Generative Adversarial Networks (GAN)


SLIDE 22

DCGAN

Train D with Loss(D(real), 1) and Loss(D(G(random)), 0); train G with Loss(D(G(random)), 1).

http://gluon.mxnet.io/chapter14_generative-adversarial-networks/dcgan.html
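A minimal PyTorch sketch of those update rules (the linked tutorial uses MXNet Gluon; G, D, and the optimizers are assumed to exist, with D producing one logit per image):

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real, z_dim=100):
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # Train D: push D(real) toward 1 and D(G(z)) toward 0.
    opt_D.zero_grad()
    fake = G(torch.randn(real.size(0), z_dim))
    loss_D = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    loss_D.backward()
    opt_D.step()

    # Train G: push D(G(z)) toward 1.
    opt_G.zero_grad()
    loss_G = F.binary_cross_entropy_with_logits(D(fake), ones)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```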

SLIDE 23

DCGAN: Examples

SLIDE 24

DCGAN: Example of Feature Manipulation

Vector arithmetic in the latent feature space

SLIDE 25

Conditional, Cross-domain Generation

Generative Adversarial Text to Image Synthesis


SLIDE 26

GAN training problems: unstable losses

http://guimperarnau.com/files/blog/Fantastic-GANs-and-where-to-find-them/crazy_loss_function.jpg

SLIDE 27

GAN training problems: Mini-batch Fluctuation

The generated distribution differs substantially even between consecutive mini-batches.


SLIDE 28

GAN training problems: Mode Collapse

Lack of diversity in generated results.


SLIDE 29

Improve GAN training: Label Smoothing

Improves stability of training
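A minimal sketch of one-sided label smoothing, assuming a logit-output discriminator: soften the real target from 1.0 to about 0.9 while leaving the fake target at 0.

```python
import torch
import torch.nn.functional as F

def d_loss_smoothed(d_real_logits, d_fake_logits, smooth=0.9):
    real_targets = torch.full_like(d_real_logits, smooth)   # 0.9 instead of 1.0
    fake_targets = torch.zeros_like(d_fake_logits)          # fake targets stay 0
    return (F.binary_cross_entropy_with_logits(d_real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets))
```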

SLIDE 30

Improve GAN training: Wasserstein GAN

Use a linear loss instead of the log loss for the critic
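A minimal sketch of the WGAN losses, where D is the critic; weight clipping, as in the original WGAN paper, stands in for the Lipschitz constraint.

```python
import torch

def critic_loss(D, real, fake):
    return -(D(real).mean() - D(fake).mean())   # maximize E[D(real)] - E[D(fake)]

def generator_loss(D, fake):
    return -D(fake).mean()                      # maximize E[D(G(z))]

def clip_weights(D, c=0.01):
    for p in D.parameters():
        p.data.clamp_(-c, c)                    # crude Lipschitz constraint
```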

SLIDE 31

WGAN: Stabilized Training Curve

SLIDE 32

WGAN: Non-vanishing Gradient

SLIDE 33

Loss-Sensitive GAN

SLIDE 34

SLIDE 35

The GAN Zoo

https://github.com/hindupuravinash/the-gan-zoo

SLIDE 36

SLIDE 37

SLIDE 38

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

CycleGAN: Correspondence from Unpaired Data

SLIDE 45

CycleGAN
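A minimal sketch of the cycle-consistency loss that makes unpaired correspondence learnable (G_ab and G_ba are the two assumed translators; lam = 10 follows the CycleGAN paper):

```python
import torch.nn.functional as F

def cycle_loss(G_ab, G_ba, a, b, lam=10.0):
    # Translating A -> B -> A (and B -> A -> B) should reconstruct the input.
    return lam * (F.l1_loss(G_ba(G_ab(a)), a) +
                  F.l1_loss(G_ab(G_ba(b)), b))
```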

SLIDE 46

CycleGAN: Bad Cases

SLIDE 47

DiscoGAN

Cross-domain relation

SLIDE 48

DiscoGAN

SLIDE 49

[Figure] Underdetermined CycleGAN pattern vs. information-preserving GeneGAN pattern: images A and B are decomposed into background and attribute ("how much smile?") parts, and the crossed recombinations (Au, Bε and Aε, Bu) reconstruct B while preserving the "smiling from A" attribute.

SLIDE 50

GeneGAN: shorter pathway improves training

Crossbreeds and reproductions

SLIDE 51

GeneGAN: Object Transfiguration

Transfer "my" hairstyle to him, not just a hairstyle.

SLIDE 52

GeneGAN: Interpolation in Object Subspace

Check the direction of the hair.

Bilinearly interpolated ε instances

SLIDE 53

Math behind Generative Models

Those who don’t care about math or theory can open their PyTorch now...

SLIDE 54

Formulation of Generative Models

sampling vs. density estimation

SLIDE 55

RBM
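The slide's formulas did not survive extraction; the standard RBM definition (visible vector x, hidden vector y, matching the notation on the later MCMC slides) is:

```latex
\begin{align}
  E(x, y) &= -x^\top W y - b^\top x - c^\top y \\
  P(x, y) &= \frac{e^{-E(x, y)}}{Z},
  \qquad Z = \sum_{x, y} e^{-E(x, y)}
\end{align}
```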

SLIDE 56

RBM

It is NP-hard to estimate the partition function Z

SLIDE 57

RBM

It is NP-hard to sample from P

SLIDE 58

Score Matching

Let L be the likelihood function; the score is V(x) = ∇x log L(x). If two distributions' scores match everywhere, the distributions also match.
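A one-line derivation (standard, not from the slide) of why the score is useful for energy models such as RBMs: the intractable partition function Z drops out.

```latex
\begin{align}
  V(x) = \nabla_x \log p(x)
       = \nabla_x \log \frac{e^{-E(x)}}{Z}
       = -\nabla_x E(x) - \underbrace{\nabla_x \log Z}_{=\,0}
\end{align}
```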

SLIDE 59

Markov Chain Monte Carlo

From each node a, walk to a "neighbor" b with probability proportional to p(b). The neighborhood relation must be reciprocal (a <-> b). Walk for long enough to reach equilibrium.

[Diagram] Two states a and b, with transition probabilities 1/N and (p(a)/p(b))/N between them.
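A minimal sketch of one common realization of this walk (the Metropolis rule; p may be unnormalized, and the neighborhood function is an illustrative assumption):

```python
import random

def mcmc_sample(p, neighbors, x0, steps=10000):
    """Metropolis walk: the equilibrium distribution is proportional to p."""
    x = x0
    for _ in range(steps):
        y = random.choice(neighbors(x))              # reciprocal neighborhood
        if random.random() < min(1.0, p(y) / p(x)):  # accept with prob min(1, p(y)/p(x))
            x = y
    return x

# Usage sketch: sample from unnormalized p(i) = i + 1 on a ring of 10 states.
print(mcmc_sample(lambda i: i + 1, lambda i: [(i - 1) % 10, (i + 1) % 10], 0))
```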

SLIDE 60

MCMC in RBM

Sample x given y; sample y given x; sample x given y; ... In theory, repeat for long enough. In practice, repeat a few times ("burn-in").
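A minimal NumPy sketch of this alternating (block Gibbs) sampling, using the standard RBM conditionals sigmoid(W y + b) and sigmoid(Wᵀ x + c); W, b, c are assumed to be trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs(W, b, c, x, steps=10):
    """Alternate hidden/visible sampling; x is a float 0/1 vector of shape b.shape."""
    rng = np.random.default_rng(0)
    for _ in range(steps):                 # in theory: until equilibrium
        y = (rng.random(c.shape) < sigmoid(W.T @ x + c)).astype(float)  # y | x
        x = (rng.random(b.shape) < sigmoid(W @ y + b)).astype(float)    # x | y
    return x
```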

SLIDE 61

RBM: Learned “Filters”

SLIDE 62

From Density to Sample

Given a density function p(x) as a black box, can we efficiently sample from it? No! Example: p(x) ∝ [MD5(x) == 0]. Unless we query Ω(N) points, it is hard to find any x with nonzero density.

SLIDE 63

From Sample to Density

Given a black-box sampler G, can we efficiently estimate the density (frequency) of a given x? Naive bound: Ω(ε⁻²) samples for absolute error, Ω(ε⁻² / p(x)) for relative error. Essentially we cannot do better. Example: sample x uniformly at random, but retry iff x = 0; detecting this tiny change requires on the order of 1/p(x) samples.

SLIDE 64

What can be done if only samples are available?

Problem: given a black-box sampler G, decide whether (1) it is uniform or (2) it is ε-far from uniform. How should distance between distributions (p: G, q: uniform) be defined?

  • Statistical distance: ½ Σ |p(x) - q(x)|
  • L2 distance: Σ (p(x) - q(x))²
  • KL divergence: Σ q(x) log(q(x) / p(x))

SLIDE 65

Uniformity Check using q(x)log(q(x)/p(x))

Impossible to check with KL unless Ω(N) samples are obtained. Consider the T-sample product distributions {1,2,...,N}^T and {1,2,...,N-1}^T: their KL divergence is unbounded, yet their statistical distance (= Σ max(p(x) - q(x), 0)) is only 1 - ((N-1)/N)^T = o(1) when T = o(N), since ((N-1)/N)^T = 1 - o(1). Statistical distance is the best distinguisher's advantage over a random guess: advantage = 2 · |Pr(guess correct) - 0.5|.

SLIDE 66

Uniformity Check using L2 Distance

Σ (p(x) - q(x))² = Σ [p(x)² + q(x)² - 2 p(x) q(x)] = Σ p(x)² - 1/N (for q uniform).

Σ p(x)² is the collision probability (the chance of seeing the same x twice in a row), so it can be estimated by counting collisions. Algorithm: draw T samples, count the pairs with x[i] == x[j] for i < j, and divide by C(T,2). A variance calculation shows that estimating it to accuracy O(ε²) is enough.
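A minimal sketch of the collision-counting estimator described above:

```python
import random
from collections import Counter

def collision_estimate(samples):
    """Estimate sum_x p(x)^2 as the fraction of colliding sample pairs."""
    T = len(samples)
    counts = Counter(samples)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())  # pairs with x[i] == x[j]
    return collisions / (T * (T - 1) // 2)                       # divide by C(T, 2)

# Usage sketch: a uniform sampler over N = 100 values should give about 1/100.
print(collision_estimate([random.randrange(100) for _ in range(5000)]))
```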

SLIDE 67

Uniformity Check using L1 Distance

Estimate the collision probability to within 1 ± O(ε²); O(ε⁻⁴ √N) samples are enough.

SLIDE 68

Lessons Learned: What We Can Get From Samples

Given samples, some properties of the distribution can be learned, while others cannot.

SLIDE 69

Discriminator based distances

max_D E_{x~p}[D(x)] - E_{y~q}[D(y)]

  • If 0 ≤ D ≤ 1: statistical distance
  • If D is Lipschitz continuous: Wasserstein distance

SLIDE 70

Wasserstein Distance

Duality: the earth mover (primal) definition and the discriminator (dual) definition coincide.
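The equations on this slide were lost in extraction; the standard statement they refer to (Kantorovich-Rubinstein duality) is:

```latex
\begin{align}
  W(P, Q) &= \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \gamma} \|x - y\|
    && \text{(earth mover / primal)} \\
          &= \sup_{\|D\|_{L} \le 1} \mathbb{E}_{x \sim P}[D(x)] - \mathbb{E}_{y \sim Q}[D(y)]
    && \text{(discriminator / dual)}
\end{align}
```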

SLIDE 71

Estimating Wasserstein Distance in High Dimension

The curse of dimensionality: there is no algorithm that, for arbitrary distributions P and Q supported in an n-dimensional ball of radius r, takes poly(n) samples from P and Q and estimates W(P,Q) to precision o(1)·r w.h.p.

SLIDE 72

Finite Sample Version of EMD

Let W_N(P,Q) be the expected EMD between N samples from P and N samples from Q. Then:

W_N(P,Q) ≥ W(P,Q)
W(P,Q) ≥ W_N(P,Q) - min(W_N(P,P), W_N(Q,Q))
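A minimal sketch of computing the finite-sample EMD between two equal-size sample sets, via minimum-cost matching with SciPy's Hungarian solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def sample_emd(xs, ys):
    cost = cdist(xs, ys)                       # pairwise distances, N x N
    rows, cols = linear_sum_assignment(cost)   # min-cost perfect matching
    return cost[rows, cols].mean()

# Usage sketch: even two sample sets from the SAME distribution have positive
# EMD, which is why the slide subtracts min(W_N(P,P), W_N(Q,Q)) as a correction.
P = np.random.randn(200, 10)
print(sample_emd(P[:100], P[100:]))
```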

SLIDE 73

Projected Wasserstein Distance

The k-dimensional projected EMD: let σ be a random k-dimensional subspace and compute the EMD after projecting the samples onto σ. Since projection never increases distances, this serves as a lower-bounding approach.

SLIDE 74

Game Theory: The Generator - Discriminator Game

Stackelberg game (one player commits to its strategy first):

  • min_D max_G
  • min_G max_D

Nash equilibrium: a pair (G, D) from which neither G nor D will deviate. Which of these values is the largest?

SLIDE 75

Linear Model

Minimax theorem: for a linear (convex-concave) payoff, min max = max min, so the order of play does not matter and an equilibrium exists.
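The statement behind this slide, in standard form (mixed strategies over probability simplices, bilinear payoff matrix A):

```latex
\min_{x \in \Delta_m} \max_{y \in \Delta_n} x^\top A y
  \;=\; \max_{y \in \Delta_n} \min_{x \in \Delta_m} x^\top A y
```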

SLIDE 76

The Future of GANs

  • Guaranteed stabilization: new distances
  • Broader application: apply adversarial loss in XX / different types of data

SLIDE 77

References

GAN Tutorial: https://arxiv.org/pdf/1701.00160.pdf
Slides: https://media.nips.cc/Conferences/2016/Slides/6202-Slides.pdf