Wasserstein GAN
Martin Arjovsky, Soumith Chintala, Léon Bottou, ICML 2017
Presented by Yaochen Xie, 12-22-2017
Contents
❖ GAN and its applications [1]
❖ GAN vs. Variational Auto-Encoder [2]
❖ What's wrong with GAN [3], [4]
❖ JS Divergence and KL Divergence [3], [4]
❖ Wasserstein Distance [4], [5]
❖ WGAN and its Implementation [4]
D and G play the following two-player minimax game with the value function V(D, G):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
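As a rough illustration (not code from the paper), the value function translates into the following PyTorch losses; the networks, data, and sizes below are toy placeholders:

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; all names and sizes are illustrative.
netG = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
netD = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

real = torch.randn(64, 2) + 3.0   # stand-in for a batch from p_data
z = torch.randn(64, 8)            # noise from p_z

# Discriminator ascends V(D, G): maximize E[log D(x)] + E[log(1 - D(G(z)))].
d_real = netD(real)
d_fake = netD(netG(z).detach())   # generator frozen during the D step
loss_D = -(torch.log(d_real) + torch.log(1.0 - d_fake)).mean()
loss_D.backward()

# Generator descends V(D, G): minimize E[log(1 - D(G(z)))].
loss_G = torch.log(1.0 - netD(netG(z))).mean()
loss_G.backward()
```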
Applications: Conditional GAN, Triangle GAN.
[Figure: generated-image comparison across panels labeled Input, LR, TV, GAN, and Real]
GAN vs. VAE: the VAE adds a constraint on the encoding network that forces it to generate latent vectors roughly following a unit Gaussian distribution. To judge generation quality, the VAE uses a per-pixel reconstruction error, whereas the GAN uses a learned discriminator.
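For comparison, a minimal sketch of the VAE objective from [2]; the tensors x_hat, mu, and logvar are placeholders standing in for the decoder's reconstruction and the encoder's outputs:

```python
import torch
import torch.nn.functional as F

# Placeholder tensors; shapes are illustrative.
x = torch.rand(64, 784)
x_hat = torch.rand(64, 784)
mu = torch.zeros(64, 20)
logvar = torch.zeros(64, 20)

# The VAE judges quality with a reconstruction error on the input itself...
recon = F.mse_loss(x_hat, x, reduction='sum')
# ...plus a closed-form KL(q(z|x) || N(0, I)) pushing latents toward a
# unit Gaussian. (A GAN instead delegates the quality judgment to D.)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + kl
```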
What's wrong with GAN:
❖ Gradient vanishing
❖ Unstable training, not converging
❖ Mode collapse
Discrete distributions: $KL(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
Continuous distributions: $KL(P \| Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
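A quick NumPy illustration with made-up distributions P and Q:

```python
import numpy as np

# Two made-up discrete distributions over three outcomes.
P = np.array([0.5, 0.4, 0.1])
Q = np.array([0.3, 0.3, 0.4])

kl_pq = np.sum(P * np.log(P / Q))   # KL(P || Q) ≈ 0.232
kl_qp = np.sum(Q * np.log(Q / P))   # KL(Q || P) ≈ 0.315 (not symmetric)
```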
KL divergence measures the discrepancy between two distributions. Notice that $KL(P \| Q)$ is not equal to $KL(Q \| P)$; because it is asymmetric, KL divergence rigorously cannot be considered a distance (it is not a metric).
JS divergence is a symmetrized and smoothed version of the KL divergence: $JS(P \| Q) = \frac{1}{2} KL(P \| M) + \frac{1}{2} KL(Q \| M)$ with $M = \frac{1}{2}(P + Q)$. When the two distributions are far from each other (disjoint supports), $JS(P \| Q)$ saturates at the constant $\log 2$.
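A small NumPy check, on illustrative distributions, that JS hits its ceiling of log 2 once the supports are disjoint:

```python
import numpy as np

def kl(p, q):
    mask = p > 0                     # terms with p(x) = 0 contribute 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint supports: JS equals log 2 ≈ 0.6931 and stays there,
# no matter how far apart the two supports are moved.
P = np.array([0.5, 0.5, 0.0, 0.0])
Q = np.array([0.0, 0.0, 0.5, 0.5])
print(js(P, Q), np.log(2))
```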
Cross entropy: $H(p, q) = -\sum_x p(x) \log q(x)$. The discriminator's loss based on cross entropy is the binary cross entropy $-[y \log D(x) + (1 - y) \log(1 - D(x))]$. What if p and q belong to continuous distributions?
Now we fix G and let D be optimal:

$$D^*(x) = \frac{p_r(x)}{p_r(x) + p_g(x)}$$

Substituting $D^*$ into the value function gives $V(G, D^*) = -\log 4 + 2 \cdot JS(P_r \| P_g)$, i.e., up to a constant, 2 times the Jensen–Shannon divergence.
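A numeric sanity check of this identity on toy discrete distributions (the values of Pr and Pg are arbitrary):

```python
import numpy as np

Pr = np.array([0.6, 0.3, 0.1])
Pg = np.array([0.2, 0.3, 0.5])

D_star = Pr / (Pr + Pg)             # the optimal discriminator D*(x)
V = np.sum(Pr * np.log(D_star)) + np.sum(Pg * np.log(1 - D_star))

M = 0.5 * (Pr + Pg)
JS = 0.5 * np.sum(Pr * np.log(Pr / M)) + 0.5 * np.sum(Pg * np.log(Pg / M))
print(np.isclose(V, -np.log(4) + 2 * JS))   # True
```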
So far, optimizing the GAN loss (with an optimal discriminator) is equivalent to minimizing the JS divergence between $P_r$ and $P_g$.
Gradient vanishing! When $P_r$ and $P_g$ have (nearly) disjoint supports, which is typical when both concentrate on low-dimensional manifolds, $JS(P_r \| P_g)$ is the constant $\log 2$ and supplies no gradient to the generator.
With the alternative generator loss $-\log D(G(z))$, when G is fixed and D is optimal, minimizing the loss is equivalent to minimizing
$$KL(P_g \| P_r) - 2 \cdot JS(P_r \| P_g)$$
⇒ mode collapse and unstable training. The reverse KL term is badly asymmetric:
❖ where $p_g(x) > 0$ but $p_r(x) \to 0$, its contribution to the KL $\to \infty$: producing unrealistic samples is penalized enormously;
❖ where $p_g(x) \to 0$ but $p_r(x) > 0$, its contribution $\to 0$: dropping modes of the real data costs almost nothing.
The generator therefore retreats to a few safe, repetitive outputs: mode collapse.
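The asymmetry in two lines of NumPy, where eps is a numerical stand-in for a density approaching 0:

```python
import numpy as np

eps = 1e-12   # stands in for a density approaching 0

# Per-point contribution p_g(x) * log(p_g(x) / p_r(x)) to KL(Pg || Pr):
drop = eps * np.log(eps / 0.5)        # p_g -> 0 where p_r > 0: ≈ 0
spurious = 0.5 * np.log(0.5 / eps)    # p_g > 0 where p_r -> 0: ≈ 13.5, grows without bound
print(drop, spurious)
```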
The KL and JS divergences induce topologies that are too strong for the loss function to be continuous in the generator's parameters.
We want a measurement of distance between distributions under which the loss is continuous. The Earth-Mover (Wasserstein-1) distance is

$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}[\|x - y\|],$$

where $\Pi(P_r, P_g)$ is the set of all joint distributions whose marginals are $P_r$ and $P_g$. By Theorem 1 of [4], $W(P_r, P_\theta)$ is continuous everywhere and differentiable almost everywhere if $g_\theta$ is locally Lipschitz with finite expectation of the local Lipschitz constants.
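The parallel-lines example from [4] reduced to 1-D, using scipy's 1-D Wasserstein distance: $P_0$ puts all mass at 0 and $P_\theta$ at $\theta$, so their JS is $\log 2$ for every $\theta \neq 0$, while $W_1 = |\theta|$ shrinks smoothly and can still provide a useful gradient:

```python
import numpy as np
from scipy.stats import wasserstein_distance

for theta in [2.0, 1.0, 0.5, 0.1]:
    # All samples of one distribution at 0, all of the other at theta.
    w = wasserstein_distance(np.zeros(1000), np.full(1000, theta))
    print(theta, w)   # prints theta twice: W1 equals |theta|
```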
Geometric interpretation of the optimal transport map [5]: Brenier potential, Minkowski theorem, Alexandrov theorem.
By the Kantorovich–Rubinstein duality,

$$W(P_r, P_g) = \sup_{Lip(f) \le 1} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)],$$

where Lip(f) denotes the minimal Lipschitz constant for f.
https://vincentherrmann.github.io/blog/wasserstein/
Compared with the original GAN, WGAN makes four changes (see the sketch below):
❖ Remove the sigmoid from the discriminator's last layer, turning it into a critic $f_w$;
❖ Drop the logs from the losses: $L_D = \mathbb{E}_{x \sim P_g}[f_w(x)] - \mathbb{E}_{x \sim P_r}[f_w(x)]$ and $L_G = -\mathbb{E}_{x \sim P_g}[f_w(x)]$;
❖ Clip the critic's weights to a fixed box $[-c, c]$ after every update, enforcing a Lipschitz constraint;
❖ Use RMSProp rather than momentum-based optimizers such as Adam.
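A minimal, self-contained training sketch of the four changes on toy 2-D data; network sizes and data are illustrative, while lr = 5e-5, c = 0.01, and n_critic = 5 follow the defaults reported in [4]:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
# Change 1: the critic f_w has no sigmoid on its last layer.
f = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

# Change 4: RMSProp instead of momentum-based optimizers.
opt_f = torch.optim.RMSprop(f.parameters(), lr=5e-5)
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
c, n_critic = 0.01, 5

for step in range(1000):
    for _ in range(n_critic):             # train the critic several steps per G step
        real = torch.randn(64, 2) + 3.0   # stand-in for real samples
        fake = G(torch.randn(64, 8)).detach()
        loss_f = f(fake).mean() - f(real).mean()   # Change 2: L_D, no log
        opt_f.zero_grad(); loss_f.backward(); opt_f.step()
        for p in f.parameters():
            p.data.clamp_(-c, c)          # Change 3: weight clipping
    loss_G = -f(G(torch.randn(64, 8))).mean()      # Change 2: L_G, no log
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```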
[1] Ian J. Goodfellow et al. Generative Adversarial Nets.
[2] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes.
[3] Martin Arjovsky and Léon Bottou. Towards Principled Methods for Training Generative Adversarial Networks.
[4] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN.
[5] Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, and David Xianfeng Gu. A Geometric View of Optimal Transportation and Generative Model.