Yale
Intro Tutorial on GANs
Michela Paganini
March 21, 2018
Fermilab Machine Learning Group Meeting
Outline
Overview: generative modeling; Generative Adversarial Networks (GANs)
Hands-on: build a vanilla GAN on FashionMNIST
GAN improvements (f-GAN, WGAN, WGAN-GP)
Hands-on: build a WGAN on FashionMNIST
Task: given a training dataset, generate more samples from the same data distribution.
Why do we care?
Taxonomy of generative models (from I. Goodfellow), all based on maximum likelihood:
Explicit, tractable density: fully visible belief nets (NADE, MADE, PixelRNN); change-of-variables models (nonlinear ICA)
Explicit, approximate density: variational (VAE); Markov chain (Boltzmann machine)
Implicit density, Markov chain: GSN
Implicit density, direct: GAN
2-player game between generator and discriminator.
The latent prior, mapped to sample space by the generator, implicitly defines a distribution.
The discriminator tells how fake or real a sample looks via a score.
Discriminator: distinguish real samples from fake samples.
Generator: transform noise into a realistic sample.
Construct a two-person zero-sum minimax game with value
V(G, D) = E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))],
solved as min_G max_D V(G, D): an inner maximization by D and an outer minimization by G.
With a perfect discriminator, the generator minimizes the Jensen-Shannon divergence between p_data and p_g, up to constants (see the derivation in the backup slide at the end).
Training alternates gradient updates: ascent on D (to maximize V) and descent on G (to minimize it).
Heuristic (non-saturating) generator loss: maximize E_z[log D(G(z))] instead of minimizing E_z[log(1 - D(G(z)))], which gives the generator stronger gradients early in training.
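A minimal sketch of the alternating updates with the non-saturating loss, in the spirit of the "vanilla GAN on FashionMNIST" hands-on. The PyTorch framing, architectures, and hyperparameters are illustrative assumptions, not necessarily those of the original notebook.

```python
# Minimal vanilla GAN training loop (non-saturating loss) on FashionMNIST.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

latent_dim = 100
device = "cuda" if torch.cuda.is_available() else "cpu"

G = nn.Sequential(  # generator: noise -> flattened 28x28 image in [-1, 1]
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
).to(device)

D = nn.Sequential(  # discriminator: image -> probability of being real
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
).to(device)

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

loader = DataLoader(
    datasets.FashionMNIST("data", train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                          ])),
    batch_size=64, shuffle=True)

for epoch in range(5):
    for real, _ in loader:
        real = real.view(real.size(0), -1).to(device)
        b = real.size(0)
        ones = torch.ones(b, 1, device=device)
        zeros = torch.zeros(b, 1, device=device)

        # Discriminator step: ascend on log D(x) + log(1 - D(G(z)))
        z = torch.randn(b, latent_dim, device=device)
        fake = G(z).detach()
        loss_D = bce(D(real), ones) + bce(D(fake), zeros)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # Generator step (non-saturating): maximize log D(G(z))
        z = torch.randn(b, latent_dim, device=device)
        loss_G = bce(D(G(z)), ones)
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```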
Divergence: a function that establishes the "distance" of one probability distribution to another on a statistical manifold.
f-divergences: D_f(P || Q) = ∫ q(x) f(p(x)/q(x)) dx, for a convex f with f(1) = 0.
Convex conjugate: f*(t) = sup_u { u·t - f(u) }, which gives the variational lower bound
D_f(P || Q) ≥ sup_T ( E_{x~P}[T(x)] - E_{x~Q}[f*(T(x))] ).
We have no access to the distributions in functional form; use empirical expectations over samples instead.
This extends the GAN formalism (f-GAN): any f-divergence can be used as the GAN objective, with the discriminator playing the role of the variational function T.
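A small sketch of one such objective, using the variational bound above for the (forward) KL divergence, where f(u) = u log u and f*(t) = exp(t - 1). The function name and PyTorch framing are assumptions for illustration.

```python
# f-GAN-style estimate of a lower bound on KL(P_data || P_model).
# The critic maximizes this quantity; the generator works against it.
import torch

def fgan_kl_objective(critic, real, fake):
    t_real = critic(real)   # T(x) on real samples
    t_fake = critic(fake)   # T(x) on generated samples
    # E_P[T(x)] - E_Q[f*(T(x))], with f*(t) = exp(t - 1) for forward KL
    return t_real.mean() - torch.exp(t_fake - 1).mean()
```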
If at any point during training the supports of the data and model distributions have no overlap, this family of divergences collapses to a constant or infinite value, providing no useful gradient to the generator.
Practical stabilization tricks: instance noise, label flipping, label smoothing (see the sketch below).
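Two of these tricks made concrete, as they might appear inside a discriminator update. The noise scale and smoothing value are illustrative assumptions, not prescriptions from the slides.

```python
import torch

def smoothed_targets(batch_size, real=True, device="cpu"):
    # One-sided label smoothing: real targets at 0.9 instead of 1.0,
    # so the discriminator never becomes overconfident on real data.
    return torch.full((batch_size, 1), 0.9 if real else 0.0, device=device)

def add_instance_noise(images, sigma=0.1):
    # Instance noise: add small Gaussian noise to both real and fake inputs
    # so the two distributions overlap and the divergence stays informative.
    return images + sigma * torch.randn_like(images)
```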
Other distance metrics: Integral Probability Metrics, Wasserstein distances, proper scoring rules.
E.g., replace the family of f-divergences with the Wasserstein-1 distance, often referred to as the Earth Mover's Distance (EMD).
Intuition: think of PDFs as mounds of dirt; EMD describes how much "work" it takes to transform one mound of dirt into another.
It accounts for both "mass" and distance, so it remains meaningful even for disjoint PDFs!
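A tiny numerical illustration of that intuition (not from the slides), using SciPy's 1D Wasserstein distance on empirical samples.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=10_000)   # samples from N(0, 1)
b = rng.normal(loc=3.0, scale=1.0, size=10_000)   # samples from N(3, 1)

# Even when the two "mounds of dirt" barely overlap, the distance stays
# finite and grows smoothly with their separation (here ~3.0).
print(wasserstein_distance(a, b))
```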
Excellent WGAN review!
The primal form of the Wasserstein-1 distance,
W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x, y) ~ γ}[ ||x - y|| ],
is intractable: the infimum runs over an uncountably infinite set of candidate distribution pairings (couplings).
Kantorovich-Rubinstein duality to the rescue! What does this give us?
W(P_r, P_g) = sup_{||f||_L ≤ 1} E_{x ~ P_r}[f(x)] - E_{x ~ P_g}[f(x)],
i.e. a restriction of the problem to K-Lipschitz functions f.
Excellent blog post about WGAN and KR-duality from V. Herrmann!
We can constrain a neural network to be K-Lipschitz!
This lets us parameterize f(x) as a neural network and clamp its weights to a compact space, say [-c, c].
This guarantees that the network f is K-Lipschitz, with K = r(c) for some function r of the clipping threshold.
Great! The network can now estimate the Wasserstein-1 distance up to a constant factor; f is now a critic rather than a discriminator.
For a Lipschitz continuous function, there is a double cone whose vertex can be translated along the graph so that the graph always remains entirely outside the cone.
Training the critic:
For each batch of real samples, we want the output of f to be as big (+1.0) as possible.
For each batch of fake samples, we want the output of f to be as small (-1.0) as possible.
Training the generator:
We want f, evaluated on generated samples, to be as big (+1.0) as possible.
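A sketch of these updates, including the weight clipping described above. The PyTorch framing and hyperparameter values (clip threshold, number of critic steps) are illustrative assumptions in the spirit of the WGAN hands-on, not a transcription of it.

```python
import torch

def wgan_step(critic, generator, real, opt_C, opt_G,
              latent_dim=100, clip=0.01, n_critic=5):
    b = real.size(0)

    # Critic: maximize E[f(real)] - E[f(fake)], i.e. minimize the negative.
    for _ in range(n_critic):
        z = torch.randn(b, latent_dim, device=real.device)
        fake = generator(z).detach()
        loss_C = critic(fake).mean() - critic(real).mean()
        opt_C.zero_grad(); loss_C.backward(); opt_C.step()

        # Weight clipping keeps the critic (roughly) K-Lipschitz.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)

    # Generator: maximize E[f(fake)], i.e. minimize -E[f(fake)].
    z = torch.randn(b, latent_dim, device=real.device)
    loss_G = -critic(generator(z)).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_C.item(), loss_G.item()
```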
Excellent blog post about WGAN from A. Irpan!
The Wasserstein estimate is a reliable metric, so the critic can safely be trained to convergence.
Problems with weight clipping:
Restricting the network's weights to a compact space restricts its expressivity (capacity underutilization).
Gradients can explode or vanish depending on the clipping threshold.
Why not make the network 1-Lipschitz directly?
WGAN with Gradient Penalty (WGAN-GP)
TL;DR: penalize the critic for having a gradient norm too far from unity; this is a better way to encourage 1-Lipschitz critics.
For every real sample, build a fake sample and randomly linearly interpolate between the two, x̂ = ε·x_real + (1 - ε)·x_fake with ε ~ U[0, 1]; the penalty added to the critic loss is λ (||∇_x̂ f(x̂)||_2 - 1)^2.
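A sketch of the penalty term itself, assuming flattened inputs of shape (batch, features); the PyTorch framing and λ = 10 follow common practice rather than the slides verbatim.

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Randomly interpolate between each real sample and its paired fake sample.
    eps = torch.rand(real.size(0), 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)

    # Gradient of the critic output with respect to the interpolated inputs.
    out = critic(x_hat)
    grads = torch.autograd.grad(outputs=out, inputs=x_hat,
                                grad_outputs=torch.ones_like(out),
                                create_graph=True)[0]

    # Penalize deviation of the gradient norm from 1; add this to the critic loss.
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```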
You can find me at: michela.paganini@yale.edu
Questions?
From the original paper, we know that the optimal discriminator is
D*_G(x) = p_data(x) / (p_data(x) + p_g(x)).
Define the generator's objective when solving for the infinite-capacity discriminator, C(G) = max_D V(G, D). We can rewrite the value as
C(G) = E_{x ~ p_data}[log D*_G(x)] + E_{x ~ p_g}[log(1 - D*_G(x))].
Simplifying notation and applying some algebra,
C(G) = -log 4 + KL(p_data || (p_data + p_g)/2) + KL(p_g || (p_data + p_g)/2).
We recognize this as a summation of two KL divergences and can combine them into the Jensen-Shannon divergence:
C(G) = -log 4 + 2 · JSD(p_data || p_g).
This yields a unique global minimum, C(G) = -log 4, precisely when p_g = p_data.