Intro Tutorial on GANs


SLIDE 1

Intro Tutorial on GANs

Michela Paganini

March 21, 2018

Fermilab Machine Learning Group Meeting

SLIDE 2

Outline

Overview:

  • Generative modeling
  • Generative Adversarial Networks (GANs)

Hands-on: build a vanilla GAN on FashionMNIST

GAN improvements (f-GAN, WGAN, WGAN-GP)

Hands-on: build a WGAN on FashionMNIST


SLIDE 3

Task: given a training dataset, generate more samples from the same data distribution.

Why do we care?

  • HEP: fast simulation
  • Domain adaptation
  • Latent representation learning
  • Simulation and planning for RL


SLIDE 4

Generative Modeling


Build a generative model with probability distribution $p_\text{model}(x)$ that approximates the (unknown) data distribution $p_\text{data}(x)$.

SLIDE 5

Taxonomy of Generative Models


From I. Goodfellow:

Maximum likelihood
  • Explicit density
      - Tractable density: fully visible belief nets (NADE, MADE, PixelRNN), change-of-variables models (nonlinear ICA)
      - Approximate density: variational (VAE), Markov chain (Boltzmann machine)
  • Implicit density
      - Markov chain: GSN
      - Direct: GAN

SLIDE 6

Generative Adversarial Networks

  • 2-player game between a generator and a discriminator
  • The latent prior, mapped to sample space, implicitly defines a distribution
  • The discriminator tells how fake or real a sample looks via a score

[Figure: the generator transforms noise into a realistic sample; the discriminator distinguishes real samples (drawn from the real data) from fake ones.]
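To make the two players concrete, here is a minimal sketch in PyTorch; the layer sizes and latent dimension are illustrative choices, not the notebook's exact architecture:

    import torch
    import torch.nn as nn

    latent_dim = 100  # size of the latent noise vector z (illustrative)

    # Generator: transforms noise z into a flattened 28x28 image
    G = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 512), nn.LeakyReLU(0.2),
        nn.Linear(512, 28 * 28), nn.Tanh(),  # pixels scaled to [-1, 1]
    )

    # Discriminator: scores how real a (flattened) sample looks, in (0, 1)
    D = nn.Sequential(
        nn.Linear(28 * 28, 512), nn.LeakyReLU(0.2),
        nn.Linear(512, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    z = torch.randn(64, latent_dim)  # sample the latent prior
    fake = G(z)                      # implicitly defined model distribution
    score = D(fake)                  # discriminator's real/fake score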

SLIDE 7

Minimax Formulation

Construct a two-person zero-sum minimax game with value

    $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_\text{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$

We have an inner maximization by D and an outer minimization by G.
 With a perfect discriminator, the generator minimizes $2\,\mathrm{JSD}(p_\text{data} \,\|\, p_g) - \log 4$.
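Read as code, the value is just two empirical expectations; a sketch in PyTorch, assuming D outputs probabilities:

    import torch

    def value_fn(D, G, real, z, eps=1e-7):
        """Empirical estimate of V(D, G) on one batch."""
        v_real = torch.log(D(real) + eps).mean()      # E_{x~p_data}[log D(x)]
        v_fake = torch.log(1 - D(G(z)) + eps).mean()  # E_z[log(1 - D(G(z)))]
        return v_real + v_fake  # D takes ascent steps on this, G descent steps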


SLIDE 8

How to train your GANs

Alternate gradient updates: ascent steps on D (the inner maximization) and descent steps on G (the outer minimization).

Heuristic, non-saturating loss function: train G to maximize

    $\mathbb{E}_{z \sim p_z}[\log D(G(z))]$

instead of minimizing

    $\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
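In code, the two generator objectives differ by a single line; a sketch in PyTorch, reusing the hypothetical D, G, z names from the earlier sketch:

    import torch

    eps = 1e-7
    d_fake = D(G(z))  # discriminator probabilities on generated samples

    # Saturating loss (straight from the minimax value):
    # gradients vanish early in training, when D easily rejects G's samples
    loss_saturating = torch.log(1 - d_fake + eps).mean()

    # Non-saturating heuristic: same fixed point, much stronger early gradients
    loss_heuristic = -torch.log(d_fake + eps).mean()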


SLIDE 9

Demo #1


Switch to the Jupyter notebook for the 1st hands-on activity:
 training a vanilla GAN on FashionMNIST

SLIDE 10

f-Divergence

Divergence: "a function which establishes the 'distance' of one probability distribution to the other on a statistical manifold."

f-divergences: $D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx$, for convex $f$ with $f(1) = 0$.

Convex conjugate: $f^*(t) = \sup_u \{\, ut - f(u) \,\}$, which yields the variational lower bound $D_f(P \,\|\, Q) \ge \sup_T \left( \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))] \right)$.

We have no access to the distributions in functional form, so use empirical expectations (sample averages) instead.
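As a sketch of the empirical-expectation trick (PyTorch; T is any network playing the role of the variational function, and the conjugate shown is for the KL choice of f):

    import torch

    def f_star_kl(t):
        # Convex conjugate of f(u) = u log u (KL divergence): f*(t) = exp(t - 1)
        return torch.exp(t - 1)

    def fdiv_lower_bound(T, x_p, x_q):
        # D_f(P || Q) >= E_{x~P}[T(x)] - E_{x~Q}[f*(T(x))], estimated on samples
        return T(x_p).mean() - f_star_kl(T(x_q)).mean()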


SLIDE 11

f-GAN

Extend the GAN formalism: any f-divergence can be used as the GAN objective, by minimizing over the generator and maximizing the variational lower bound over the critic:

    $\min_\theta \max_\omega \; \mathbb{E}_{x \sim p_\text{data}}[T_\omega(x)] - \mathbb{E}_{x \sim q_\theta}[f^*(T_\omega(x))]$
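In code this is one line per batch; a sketch reusing the hypothetical T, G, real, z names from the previous sketches:

    import torch

    def f_star_kl(t):
        return torch.exp(t - 1)  # conjugate for the KL choice of f

    # One evaluation of the f-GAN objective on a batch:
    # F(theta, omega) = E_{x~p_data}[T(x)] - E_z[f*(T(G(z)))]
    objective = T(real).mean() - f_star_kl(T(G(z))).mean()
    # The critic T takes ascent steps on this; the generator G, descent steps.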


SLIDE 12

Problems with f-GANs & (some) solutions

If at any point the supports of the real and generated distributions have no overlap, this family of measures collapses to a null or infinite distance, and the generator receives no useful gradient.


(Some) solutions:

  • Instance noise
  • Label flipping
  • Label smoothing
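A sketch of what two of these tricks look like inside a training step (PyTorch; the noise scale and smoothing value are illustrative, and G, z, real are the hypothetical names from earlier):

    import torch

    sigma = 0.1        # instance-noise scale (illustrative)
    real_target = 0.9  # one-sided label smoothing instead of a hard 1.0

    # Instance noise: blur both distributions so their supports overlap
    fake = G(z)
    real_noisy = real + sigma * torch.randn_like(real)
    fake_noisy = fake + sigma * torch.randn_like(fake)

    # Smoothed discriminator targets for the real batch
    targets_real = torch.full((real.size(0), 1), real_target)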

SLIDE 13

Solving the Disjoint Support Problem

Other distance metrics: Integral Probability Metrics, Wasserstein distances, proper scoring rules.

E.g., replace the family of f-divergences with the Wasserstein-1 distance, often referred to as the Earth Mover's Distance (EMD).

Intuition: think of PDFs as mounds of dirt; the EMD describes how much "work" it takes to transform one mound of dirt into the other.

It accounts for both "mass" and distance, i.e., it works for disjoint PDFs!


Excellent WGAN review!
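A quick numerical check of that intuition (Python; scipy.stats.wasserstein_distance computes the 1-D Wasserstein-1 distance):

    import numpy as np
    from scipy.stats import wasserstein_distance

    # Two narrow, completely disjoint 1-D distributions
    p_samples = np.random.normal(loc=0.0, scale=0.1, size=10_000)
    q_samples = np.random.normal(loc=5.0, scale=0.1, size=10_000)

    # A JS divergence would saturate at log 2 here; W1 still reports
    # how far the "dirt" must move:
    print(wasserstein_distance(p_samples, q_samples))  # ~5.0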

SLIDE 14

The Wasserstein-1 Distance

    $W_1(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \gamma}[\|x - y\|]$

…is intractable! The infimum runs over an uncountably infinite set of candidate couplings of the two distributions.

Kantorovich-Rubinstein duality to the rescue:

    $W_1(P, Q) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)]$

What does this give us? A restriction to K-Lipschitz functions, which we can optimize over.


Excellent blog post about WGAN and KR-duality from V. Herrmann!

SLIDE 15

Lipschitz?

We can constrain a neural network to be K-Lipschitz!

This lets us parameterize f(x) as a neural network and clamp the weights to lie in a compact space, say [-c, c]. This guarantees that the network f is K-Lipschitz, with K = r(c) for some function r.

Great! The network can now operate in Wasserstein-1 space up to a constant factor. f is now a critic.


[Figure: for a Lipschitz continuous function, there is a double cone (shown in white) whose vertex can be translated along the graph, so that the graph always remains entirely outside the cone.]
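Weight clipping is a single line in the training loop; a sketch in PyTorch (c = 0.01 is the default in the WGAN paper; "critic" is an assumed name for the network f):

    c = 0.01  # clipping threshold
    for p in critic.parameters():
        p.data.clamp_(-c, c)  # constrain weights to the compact set [-c, c]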

SLIDE 16

The WGAN Algorithm

Training the critic:

  • For each batch of real samples, we want the output of f to be as big (1.0) as possible
  • For each batch of fake samples, we want the output of f to be as small (-1.0) as possible

Training the generator:

  • We want f to be as big (1.0) as possible on the fake samples
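One full WGAN iteration, sketched in PyTorch (n_critic = 5 follows the paper; the data iterator, optimizers, and the G, f, c, latent_dim, batch_size names are assumed setup):

    n_critic = 5  # critic updates per generator update (paper default)

    for _ in range(n_critic):
        real = next(data_iter)                    # batch of real samples
        z = torch.randn(real.size(0), latent_dim)
        # Critic maximizes E[f(real)] - E[f(fake)]; minimize the negation
        loss_c = f(G(z).detach()).mean() - f(real).mean()
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in f.parameters():
            p.data.clamp_(-c, c)  # weight clipping (Lipschitz constraint)

    # Generator maximizes E[f(fake)]; minimize the negation
    z = torch.randn(batch_size, latent_dim)
    loss_g = -f(G(z)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()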


Excellent blog post about WGAN from A. Irpan!

SLIDE 17

WGAN Introspection

The critic loss gives a reliable metric that correlates with sample quality, so the critic can safely be trained to convergence.


SLIDE 18

WGAN Deficiencies

Restricting the neural net to have weights in a compact space restricts its expressivity (capacity underutilization).

Gradients explode or vanish, depending on the clipping threshold c.

Why not make the network exactly 1-Lipschitz instead?


SLIDE 19

WGAN-GP

WGAN with Gradient Penalty.

TL;DR: penalize the critic for having a gradient norm too far from unity; this is a better way to encourage 1-Lipschitz critics.

For every real sample, build a fake sample, and randomly linearly interpolate between the two; the penalty is evaluated at these interpolates.
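The penalty term, sketched in PyTorch (lam = 10 is the coefficient used in the WGAN-GP paper; samples are assumed flattened):

    import torch

    lam = 10.0  # gradient-penalty coefficient (WGAN-GP paper)

    def gradient_penalty(critic, real, fake):
        # Random linear interpolation between paired real and fake samples
        alpha = torch.rand(real.size(0), 1)
        x_hat = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
        out = critic(x_hat)
        grads, = torch.autograd.grad(out, x_hat,
                                     grad_outputs=torch.ones_like(out),
                                     create_graph=True)
        # Penalize the critic's gradient norm for straying from 1
        return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()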


SLIDE 20

Demo #2


Switch back to the Jupyter notebook for the 2nd hands-on activity:
 training a WGAN on FashionMNIST

SLIDE 21

Thank You!


You can find me at: michela.paganini@yale.edu

Questions?

SLIDE 22

Theoretical dynamics of minimax GANs for optimal D

From the original paper, we know that the optimal discriminator is

    $D^*_G(x) = \frac{p_\text{data}(x)}{p_\text{data}(x) + p_g(x)}$

Define the generator objective by solving for the infinite-capacity discriminator: we can rewrite the value as

    $C(G) = \max_D V(G, D) = \mathbb{E}_{x \sim p_\text{data}}[\log D^*_G(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D^*_G(x))]$

Simplifying notation and applying some algebra,

    $C(G) = -\log 4 + \mathrm{KL}\!\left(p_\text{data} \,\Big\|\, \tfrac{p_\text{data} + p_g}{2}\right) + \mathrm{KL}\!\left(p_g \,\Big\|\, \tfrac{p_\text{data} + p_g}{2}\right)$

But we recognize this as a summation of two KL divergences, and can combine these into the Jensen-Shannon divergence:

    $C(G) = -\log 4 + 2\,\mathrm{JSD}(p_\text{data} \,\|\, p_g)$

This yields a unique global minimum, $C(G) = -\log 4$, precisely when $p_g = p_\text{data}$.
