GANs, Optimal Transport, and Implicit Distribution Estimation (PowerPoint Presentation)



SLIDE 1

Intro. Adversarial Framework GANs Optimization Optimal Transport

GANs, Optimal Transport, and Implicit Distribution Estimation

Tengyuan Liang

Econometrics and Statistics

1 / 40



SLIDE 4

OUTLINE

Implicit Distribution Estimation. Given i.i.d. Y_1, ..., Y_n ∼ ν, use a transformation T : R^d → R^d to represent and learn the unknown distribution Y ∼ ν via a simple Z ∼ µ (say uniform or Gaussian):

T(Z) ≈ Y (close in distribution?), equivalently T_#µ ≈ ν.
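As a concrete illustration of the pushforward idea (added here, not from the slides), the classical Box-Muller map is an explicit T that pushes µ = Unif([0,1]²) forward to ν = N(0, I_2):

```python
import math
import random

def box_muller(z1: float, z2: float) -> tuple[float, float]:
    """An explicit transformation T : (0,1]^2 -> R^2 pushing the uniform
    distribution forward to the standard Gaussian N(0, I_2)."""
    r = math.sqrt(-2.0 * math.log(z1))
    return r * math.cos(2 * math.pi * z2), r * math.sin(2 * math.pi * z2)

random.seed(0)
# Sample Z ~ mu = Unif([0,1]^2); 1 - random() keeps z1 in (0,1] for the log.
xs = [box_muller(1 - random.random(), random.random()) for _ in range(100_000)]
first = [x for x, _ in xs]
mean = sum(first) / len(first)
var = sum((x - mean) ** 2 for x in first) / len(first)
print(round(mean, 2), round(var, 2))  # approximately 0.0 and 1.0
```

Here T is known in closed form; the point of implicit distribution estimation is to learn such a T from samples of ν.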

SLIDE 5

OUTLINE

Implicit Distribution Estimation

Generative Adversarial Networks
  • statistical rates
  • pair regularization
  • optimization

Optimal Transport
  • estimate the Wasserstein metric, vs.
  • estimate under the Wasserstein metric

SLIDE 6

GENERATIVE ADVERSARIAL NETWORKS

  • GAN: Goodfellow et al. (2014)
  • WGAN: Arjovsky et al. (2017); Arjovsky and Bottou (2017)
  • MMD GAN: Li, Swersky, and Zemel (2015); Dziugaite, Roy, and Ghahramani (2015); Arbel, Sutherland, Bińkowski, and Gretton (2018)
  • f-GAN: Nowozin, Cseke, and Tomioka (2016)
  • Sobolev GAN: Mroueh et al. (2017)
  • many others: Liu, Bousquet, and Chaudhuri (2017); Tolstikhin, Gelly, Bousquet, Simon-Gabriel, and Schölkopf (2017)


SLIDE 8

GENERATIVE ADVERSARIAL NETWORKS

Generator g_θ, discriminator f_ω:

U(θ, ω) = E_{Y∼ν}[f_ω(Y)] − E_{Z∼µ}[f_ω(g_θ(Z))]   (Y: target, Z: input)

min_θ max_ω U(θ, ω)

GANs are widely used in practice, however...
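To make the objective concrete, here is a minimal numerical sketch (my own toy choices, not from the talk): a location generator g_θ(z) = z + θ and a linear discriminator f_ω(x) = ω·x, for which the empirical objective collapses to ω times a difference of sample means.

```python
import random

def U_hat(theta: float, omega: float, ys: list[float], zs: list[float]) -> float:
    """Empirical GAN objective U(theta, omega) = E_n[f_w(Y)] - E_m[f_w(g_t(Z))]
    for the toy choices g_theta(z) = z + theta and f_omega(x) = omega * x."""
    e_target = sum(omega * y for y in ys) / len(ys)
    e_fake = sum(omega * (z + theta) for z in zs) / len(zs)
    return e_target - e_fake

random.seed(1)
ys = [random.gauss(2.0, 1.0) for _ in range(5000)]   # target Y ~ nu = N(2, 1)
zs = [random.gauss(0.0, 1.0) for _ in range(5000)]   # input  Z ~ mu = N(0, 1)

# With a linear discriminator, U(theta, omega) = omega * (mean(Y) - theta - mean(Z)).
# The inner max over |omega| <= 1 is |mean(Y) - theta - mean(Z)|, so the outer
# min is attained when the generator matches the first moment:
theta_star = sum(ys) / len(ys) - sum(zs) / len(zs)
print(round(theta_star, 2))  # close to 2.0
```

Richer discriminator classes force the generator to match more than the mean, which is exactly the role of the discriminator metric introduced below.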

SLIDE 9

MUCH NEEDS TO BE UNDERSTOOD, IN THEORY

  • Approximation: what distributions can be approximated by the generator (g_θ)_#µ?
  • Statistical: given n samples, what is the statistical/generalization error rate?
  • Computational: local convergence for practical optimization; how to stabilize?
  • Landscape: are local saddle points good globally?



SLIDE 12

FORMULATION

T_G: class of generator transformations; F_D: class of discriminator functions; ν: target distribution.

Population:

g* ∈ argmin_{g∈T_G} max_{f∈F_D} { E_{X∼g_#µ}[f(X)] − E_{Y∼ν}[f(Y)] }

Empirical (with ν̂_n the empirical distribution):

ĝ ∈ argmin_{g∈T_G} max_{f∈F_D} { E_{X∼g_#µ}[f(X)] − E_{Y∼ν̂_n}[f(Y)] }

ĝ_#µ serves as the estimate for ν.

  • Density learning/estimation has a long history in nonparametric statistics: model the target density ρ_ν ∈ W^α, a Sobolev space with smoothness α ≥ 0. Stone (1982); Nemirovski (2000); Tsybakov (2009); Wasserman (2006)
  • GAN statistical theory is needed: Arora and Zhang (2017); Arora et al. (2017a,b); Liu et al. (2017)


SLIDE 14

DISCRIMINATOR METRIC

Define the critic metric (integral probability metric, IPM):

d_F(µ, ν) := sup_{f∈F} | E_{X∼µ} f(X) − E_{Y∼ν} f(Y) |.

  • F Lipschitz-1: Wasserstein metric d_W
  • F bounded by 1: total variation/Radon metric d_TV
  • RKHS H, F = {f ∈ H : ‖f‖_H ≤ 1}: MMD GAN
  • F of Sobolev smoothness β: Sobolev GAN

Statistical question: what is the statistical error rate E d_F(ν, ν̂_n) with n i.i.d. samples, for a range of F and of ν with certain regularity?
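For the RKHS unit ball the IPM has a closed form: d_F(µ, ν)² = E k(X, X′) + E k(Y, Y′) − 2 E k(X, Y), the squared maximum mean discrepancy. A small sketch of the biased plug-in estimator with a Gaussian kernel (toy data and bandwidth are my own choices):

```python
import math
import random

def gaussian_kernel(x: float, y: float, sigma: float = 1.0) -> float:
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_biased(xs: list[float], ys: list[float]) -> float:
    """Biased (V-statistic) estimate of MMD^2 = E k(X,X') + E k(Y,Y') - 2 E k(X,Y),
    the squared IPM over the unit ball of the Gaussian-kernel RKHS."""
    kxx = sum(gaussian_kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [random.gauss(1.0, 1.0) for _ in range(200)]  # shifted target
print(mmd2_biased(xs, xs))  # 0: d_F(mu, mu) = 0
print(mmd2_biased(xs, ys))  # strictly positive for distinct samples
```

For the Lipschitz or Sobolev balls the inner supremum has no such closed form, which is one reason those IPMs are approximated by a parametrized discriminator in practice.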


SLIDE 16

SUMMARY OF FIRST HALF OF TALK

Adversarial framework (nonparametric), evaluation metric d_F:
  • Sobolev GAN: minimax optimal; generator class G = Sobolev W^α, discriminator class F = Sobolev W^β
  • MMD GAN: upper bound; G = smooth subspace of an RKHS, F = RKHS H
  • oracle results: G = any (G†), F = Sobolev W^β

Generative Adversarial Networks (parametric):
  • leaky-ReLU GANs, metric d_TV: upper bound; G and F leaky-ReLU networks (F‡, m*)
  • any GANs, metrics d_TV, d_KL, d_H: oracle results; G and F neural networks (G†, F‡, m*)
  • Lipschitz GANs, metric d_W: oracle results; G and F Lipschitz neural networks (G†, F‡, m*)

The symbols (G†) and (F‡) denote mis-specification of the generator class and the discriminator class respectively, and (m*) indicates dependence on the number of generator samples.

SLIDE 17

Implicit distribution estimators (GANs, optimal transport) vs. explicit density estimators (KDE, projection/series estimators, ...)

SLIDE 18

Adversarial Framework (nonparametric)


SLIDE 20

MINIMAX OPTIMAL RATES: SOBOLEV GAN

Consider the target class G := {ν : ρ_ν ∈ W^α} (Sobolev space with smoothness α) and the evaluation metric F = W^β with smoothness β.

Theorem (L. '17 & L. '18, Sobolev). The minimax optimal rate is

inf_{ν̃_n} sup_{ν∈G} E d_F(ν, ν̃_n) ≍ n^{−(α+β)/(2α+d)} ∨ n^{−1/2},

where ν̃_n is any estimator based on n samples in dimension d.

Liang (2017); Singh et al. (2018); Weed and Berthet (2019)



SLIDE 23

MINIMAX OPTIMAL RATES: MMD GAN

Consider a reproducing kernel Hilbert space (RKHS) H:
  • integral operator T with eigenvalue decay t_i ≍ i^{−κ}, 0 < κ < ∞
  • evaluation metric F = {f ∈ H : ‖f‖_H ≤ 1}
  • target density ρ_ν in G = {ν : ‖T^{−(α−1)/2} ρ_ν‖_H ≤ 1}, with smoothness α

Theorem (L. '18, RKHS). The minimax optimal rate is

inf_{ν̃_n} sup_{ν∈G} E d_F(ν, ν̃_n) ≾ n^{−(α+1)κ/(2ακ+2)} ∨ n^{−1/2}.

κ > 1 (finite intrinsic dimension, ∑_{i≥1} t_i = ∑_{i≥1} i^{−κ} ≤ C): parametric rate, n^{−(α+1)κ/(2ακ+2)} ∨ n^{−1/2} = n^{−1/2}.
κ < 1: the sample complexity to reach error ε scales as n ≍ ε^{−2 − (2/(α+1))(1/κ − 1)}, with effective dimension 1/κ.



SLIDE 26

ORACLE INEQUALITY FOR GANS

The generator class may not contain the target ν: take an oracle approach. Let T_G be any class of generator transformations, let the discriminator metric be F_D = W^β, and let the target density satisfy ρ_ν ∈ W^α. The pushforwards ĝ_#µ and g̃_#µ are implicit density estimators.

Corollary (L. '17). With the empirical ν̂_n as plug-in, the GAN estimator

ĝ ∈ argmin_{g∈T_G} max_{f∈F_D} { E_{X∼g_#µ}[f(X)] − E_{Y∼ν̂_n}[f(Y)] }

attains a sub-optimal rate:

E d_{F_D}(ĝ_#µ, ν) ≤ min_{g∈T_G} d_{F_D}(g_#µ, ν) + n^{−β/d} ∨ (log n/√n).

Canas and Rosasco (2012): β = 1.

Corollary (L. '17). In contrast, with a regularized empirical ν̃_n as plug-in,

g̃ ∈ argmin_{g∈T_G} max_{f∈F_D} { E_{X∼g_#µ}[f(X)] − E_{Y∼ν̃_n}[f(Y)] },

a faster rate is attainable:

E d_{F_D}(g̃_#µ, ν) ≤ min_{g∈T_G} d_{F_D}(g_#µ, ν) + n^{−(α+β)/(2α+d)} ∨ (1/√n).

SLIDE 27

SUB-OPTIMALITY AND REGULARIZATION

Regularization helps achieve the faster rate: use a "smoothed" empirical estimate ν̃_n, which serves as regularization. For example, kernel smoothing,

ν̃_n(x) = (1/(n h_n^d)) ∑_{i=1}^n K((x − x_i)/h_n),

and SGD works.

It turns out this is used in practice, called "instance noise" or "data augmentation".

Sønderby et al. (2016); Liang et al. (2017); Arjovsky and Bottou (2017); Mescheder et al. (2018)
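A minimal sketch of the smoothed plug-in (my own toy implementation): sampling from the Gaussian-kernel-smoothed ν̃_n is exactly drawing a data point at random and adding h·N(0,1) noise, which is what "instance noise" does in practice.

```python
import random

def sample_smoothed(data: list[float], h: float) -> float:
    """Draw one sample from the kernel-smoothed empirical measure: pick a data
    point uniformly, then add Gaussian noise of bandwidth h. Equivalent to
    sampling from the Gaussian kernel density estimate."""
    return random.choice(data) + h * random.gauss(0.0, 1.0)

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(500)]  # observed y_1..y_n
smoothed = [sample_smoothed(data, h=0.3) for _ in range(20_000)]

mean_s = sum(smoothed) / len(smoothed)
var_s = sum((x - mean_s) ** 2 for x in smoothed) / len(smoothed)
# The smoothed measure keeps the data mean and inflates the variance by h^2.
print(round(mean_s, 2), round(var_s, 2))
```

Feeding these perturbed samples to the discriminator, rather than the raw y_i, is the regularized plug-in ν̃_n of the corollary above.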

SLIDE 28

Generative Adversarial Networks and Pair Regularization (parametric)

SLIDE 29

Consider the parametrized GAN estimator

θ̂_{m,n} ∈ argmin_{θ: g_θ∈G} max_{ω: f_ω∈F} { Ê_m f_ω(g_θ(Z)) − Ê_n f_ω(Y) },

with m generator samples and n target samples. How well do GANs learn the distribution, under an objective evaluation metric, say d_TV((g_{θ̂_{m,n}})_#µ, ν)?



SLIDE 32

GENERALIZED ORACLE INEQUALITY

  • approximation errors:

A_1(F, G, ν) := sup_θ inf_ω ‖ log(ρ_ν/ρ_{µ_θ}) − f_ω ‖,   A_2(G, ν) := inf_θ ‖ log(ρ_{µ_θ}/ρ_ν) ‖^{1/2}

  • stochastic error:

S_{n,m}(F, G) := √(Pdim(F) log(m∧n)/(m∧n)) ∨ √(Pdim(F∘G) log(m)/m),

where Pdim(·) is the pseudo-dimension of the neural network class.

Theorem (L. '18, generalized oracle inequality).

E d_TV²(ν, (g_{θ̂_{m,n}})_#µ),  E d_W²(ν, (g_{θ̂_{m,n}})_#µ),  E d_KL(ν ‖ (g_{θ̂_{m,n}})_#µ) + E d_KL((g_{θ̂_{m,n}})_#µ ‖ ν)
    ≤ A_1(F, G, ν) + A_2(G, ν) + S_{n,m}(F, G).

We emphasize the interplay between (G, F) as a pair of tuning parameters for regularization.

SLIDE 33

A similar result holds for the Hellinger metric d_H, for non-absolutely-continuous (g_θ)_#µ and ν.

  • approximation errors:

A_1(F, G, ν) := sup_θ inf_ω ‖ (√ρ_ν − √ρ_{µ_θ})/(√ρ_ν + √ρ_{µ_θ}) − f_ω ‖,   A_2(G, ν) := inf_θ ‖ (√ρ_ν − √ρ_{µ_θ})/(√ρ_ν + √ρ_{µ_θ}) ‖

Theorem (L. '18, generalized oracle inequality).

E d_TV²(ν, (g_{θ̂_{m,n}})_#µ),  E d_H²(ν, (g_{θ̂_{m,n}})_#µ)  ≤  A_1(F, G, ν) + A_2(G, ν) + S_{n,m}(F, G).

SLIDE 34

PAIR REGULARIZATION

Fix G; as F increases: A_1(F, G, ν) decreases, A_2(G, ν) stays constant, S_{n,m}(F, G) increases.
Fix F; as G increases: A_1(F, G, ν) increases, A_2(G, ν) decreases, S_{n,m}(F, G) increases.

[Diagram: generator class vs. discriminator class trade-off, and which error term dominates.]

SLIDE 35

Applications of pair regularization

SLIDE 36

APPLICATION I: PARAMETRIC RATES FOR LEAKY RELU NETWORKS

Let the generator G and discriminator F both be leaky ReLU networks with depth L (width properly chosen, depending on the dimension), and suppose the target density is realizable by the generator:

log ρ_{(g_θ)_#µ}(x) = c_1 ∑_{l=1}^{L−1} ∑_{i=1}^{d} 1{m_{li}(x) ≥ 0} + c_0.

[Figure: generator and discriminator network architectures.]

Bai et al. (2018)

SLIDE 37

APPLICATION I: PARAMETRIC RATES FOR LEAKY RELU NETWORKS

Theorem (L. '18, leaky ReLU). When the generator G and discriminator F are both leaky ReLU networks with depth L (width properly chosen, depending on the dimension),

E d_TV²(ν, (g_{θ̂_{m,n}})_#µ) ≾ √(d²L² log(dL)) · (log m/m ∨ log n/n).

The result holds for very deep networks, with depth L = o(√n / log n).

SLIDE 38

APPLICATION II: LEARNING MULTIVARIATE GAUSSIAN

Corollary (L. '18, Gaussian). Consider ν ∼ N(µ, Σ). With proper choices of the architecture and activation, GANs enjoy near-optimal sample complexity (with respect to the dimension d):

E d_TV²(ν, (g_{θ̂_{m,n}})_#µ) ≾ √(d² log d / (n ∧ m)).

SLIDE 39

PAIR REGULARIZATION: WHY GANS MIGHT BE BETTER

[Diagram: generator class vs. discriminator class, with regimes dominated in turn by classic parametric models, nonparametric density estimation, and data memorization / empirical deviation.]

SLIDE 40

Optimization (local convergence)


SLIDE 42

FORMULATION

Generator g_θ, discriminator f_ω:

U(θ, ω) = E_{Y∼ν}[h_1 ∘ f_ω(Y)] − E_{Z∼µ}[h_2 ∘ f_ω(g_θ(Z))]   (Y: target, Z: input)

min_θ max_ω U(θ, ω)

  • global optimization for general U(θ, ω) is hard: Singh et al. (2000); Pfau and Vinyals (2016); Salimans et al. (2016)

A local saddle point (θ*, ω*) is one with no incentive to deviate locally:

U(θ*, ω) ≤ U(θ*, ω*) ≤ U(θ, ω*),  for (θ, ω) in an open neighborhood of (θ*, ω*).

  • also called a local Nash equilibrium (NE)
  • modest goal: properly initialized, the algorithm converges to a local NE


SLIDE 44

INTERACTION MATTERS: ∂²U(θ, ω)/∂θ∂ω

Stable equilibrium: geometrically fast local convergence. However, the "interaction term" matters and slows down the convergence ⇐ curse.
Unstable equilibrium? It turns out the "interaction term" matters again: utilizing it renders geometrically fast convergence ⇐ blessing.
Motivation for: optimistic mirror descent, extra-gradients, negative momentum, ...

SLIDE 45

"However, no guarantees are known beyond the convex-concave setting and, more importantly for the paper, even in convex-concave games, no guarantees are known for the last-iterate pair."

— Daskalakis, Ilyas, Syrgkanis, and Zeng (2017)


SLIDE 47

GEOMETRICALLY FAST CONVERGENCE TO UNSTABLE EQUILIBRIUM

OMD, proposed in Daskalakis et al. (2017); see also Rakhlin and Sridharan (2013):

θ_{t+1} = θ_t − 2η ∇_θ U(θ_t, ω_t) + η ∇_θ U(θ_{t−1}, ω_{t−1})
ω_{t+1} = ω_t + 2η ∇_ω U(θ_t, ω_t) − η ∇_ω U(θ_{t−1}, ω_{t−1})

For the bilinear game U(θ, ω) = θᵀCω, to obtain an ε-close solution:

shown in Daskalakis et al. (2017):  T ≿ ε^{−4} log(1/ε) · Poly(λ_max(CCᵀ)/λ_min(CCᵀ))

Theorem (L. & Stokes, '18).  We proved:  T ≿ log(1/ε) · λ_max(CCᵀ)/λ_min(CCᵀ),

further generalized beyond the bilinear game in Mokhtari et al. (2019).
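A sketch of why optimism helps on the scalar bilinear game U(θ, ω) = θω (C = 1; the step size is my own choice): simultaneous gradient descent-ascent spirals outward from the equilibrium at the origin, while the OMD iterates converge to it.

```python
def gda(theta, omega, eta, steps):
    """Simultaneous gradient descent-ascent on U(theta, omega) = theta * omega."""
    for _ in range(steps):
        theta, omega = theta - eta * omega, omega + eta * theta
    return theta, omega

def omd(theta, omega, eta, steps):
    """Optimistic mirror descent: a gradient step plus a correction using the
    previous gradient, following the update on this slide."""
    th_prev, om_prev = theta, omega
    for _ in range(steps):
        th_new = theta - 2 * eta * omega + eta * om_prev
        om_new = omega + 2 * eta * theta - eta * th_prev
        th_prev, om_prev = theta, omega
        theta, omega = th_new, om_new
    return theta, omega

norm = lambda p: (p[0] ** 2 + p[1] ** 2) ** 0.5
print(norm(gda(1.0, 0.0, 0.1, 1000)))   # diverges: norm grows geometrically
print(norm(omd(1.0, 0.0, 0.1, 2000)))   # converges geometrically to (0, 0)
```

The GDA iterate norm multiplies by √(1 + η²) every step, while the OMD recurrence has both eigenvalues strictly inside the unit circle for small η, which is the geometric rate in the theorem.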

SLIDE 48

GEOMETRICALLY FAST CONVERGENCE TO UNSTABLE EQUILIBRIUM

[Figure: radius vs. gradient step (1000 to 6000), comparing the predictive method with OMD.]

SLIDE 49

  • Statistical :-) given n samples, what is the statistical/generalization error rate?
  • Approximation :-( what distributions can be approximated by the generator g_θ(Z)?
  • Computational :-O local convergence for practical optimization; how to stabilize?
  • Landscape :-( are local saddle points good globally?

Other approach? Theory of optimal transport ⇒ GANs?


SLIDE 51

OPTIMAL TRANSPORT

Wasserstein-p metric:

W_p(µ, ν) := ( inf_{π∈Π(µ,ν)} ∫_{X×Y} ‖x − y‖^p dπ )^{1/p},   Π(µ, ν) the set of all couplings.

Theorem (Brenier, '87, p = 2). Let X = Y = R^d, and let µ, ν be absolutely continuous w.r.t. the Lebesgue measure. There exists a unique convex ψ_opt : R^d → R such that

(1/2) W_2²(µ, ν) = inf_{π∈Π(µ,ν)} ∫ (1/2)‖x − y‖² dπ = ∫ (‖x‖²/2 − ψ_opt(x)) µ(dx) + ∫ (‖y‖²/2 − ψ*_opt(y)) ν(dy).

Here ψ*(y) = sup_x { ⟨y, x⟩ − ψ(x) } is the Legendre-Fenchel conjugate of ψ.

Peyré et al. (2019)

SLIDE 52

OPTIMAL TRANSPORT

Approximation :-) On [0, 1]^d with Z ∼ Unif([0, 1]^d), the map (∇ψ)(Z) for a convex ψ can represent the distribution ν!

Theorem (Brenier, '87, p = 2). Let X = Y = R^d, and let µ, ν be absolutely continuous w.r.t. the Lebesgue measure. There exists a unique convex ψ_opt : R^d → R such that

(1/2) W_2²(µ, ν) = inf_{π∈Π(µ,ν)} ∫ (1/2)‖x − y‖² dπ = ∫ (‖x‖²/2 − ψ_opt(x)) µ(dx) + ∫ (‖y‖²/2 − ψ*_opt(y)) ν(dy)
                 = ∫ (1/2)‖x − (∇ψ_opt)(x)‖² µ(dx),   with ν = (∇ψ_opt)_#µ.

Peyré et al. (2019)
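In one dimension the Brenier map is explicit (a standard fact, added here as an illustration): the monotone, hence gradient-of-convex, map from µ = Unif([0,1]) to ν is the quantile function F_ν^{−1}. For a Gaussian target, (∇ψ)(Z) = Φ^{−1}(Z):

```python
import random
from statistics import NormalDist

# 1-d Brenier map from Unif([0,1]) to N(0,1): the monotone (gradient-of-convex)
# map is the Gaussian quantile function z -> Phi^{-1}(z).
brenier_map = NormalDist(0.0, 1.0).inv_cdf

random.seed(0)
us = [random.random() for _ in range(100_000)]
# inv_cdf requires the open interval (0, 1); filter guards the boundary.
pushed = [brenier_map(u) for u in us if 0.0 < u < 1.0]

m = sum(pushed) / len(pushed)
v = sum((x - m) ** 2 for x in pushed) / len(pushed)
print(round(m, 2), round(v, 2))  # approximately 0.0 and 1.0
```

In higher dimensions no such closed form exists in general, which is where the dual formulations on the next slides come in.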

SLIDE 53

OPTIMAL TRANSPORT

Computation :-) linear program, or smooth convex program; simple landscape.

Recall the input measure µ is given and the target is the empirical measure ν̂_n:

(1/2) W_2²(µ, ν̂_n) = sup_φ { ∫ φ^c(x) µ(dx) + ∫ φ(y) ν̂_n(dy) },   where φ^c(x) := inf_y { (1/2)‖x − y‖² − φ(y) }.

Genevay, Cuturi, Peyré, and Bach (2016)

SLIDE 54

OPTIMAL TRANSPORT

Computation :-) linear program, or smooth convex program; simple landscape.

Add ε-entropic regularization:

(1/2) W_{2,ε}²(µ, ν̂_n) = sup_φ { ∫ φ^c_ε(x) µ(dx) + ∫ φ(y) ν̂_n(dy) },

where φ^c_ε(x) := −ε log [ ∫ exp( −((1/2)‖x − y‖² − φ(y))/ε ) ν̂_n(dy) ].

On data y_1, ..., y_n, the optimization reduces to SGD on [φ(y_1), ..., φ(y_n)] ∈ R^n.

Genevay, Cuturi, Peyré, and Bach (2016)
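A compact sketch of this semi-dual ascent (toy sizes; full-batch gradient instead of SGD; ε, the step size, and the data are my own choices). The potential vector [φ(y_1), ..., φ(y_n)] is ascended on the concave objective ∫ φ^c_ε dµ + (1/n) Σ_j φ(y_j), with µ approximated by a fixed sample:

```python
import math
import random

def phi_c_eps(x, ys, phi, eps):
    """Smoothed c-transform: -eps * log((1/n) * sum_j exp((phi_j - 0.5|x-y_j|^2)/eps))."""
    a = [(phi[j] - 0.5 * (x - ys[j]) ** 2) / eps for j in range(len(ys))]
    amax = max(a)  # stabilized log-sum-exp
    return -eps * (amax + math.log(sum(math.exp(t - amax) for t in a) / len(ys)))

def semi_dual(xs, ys, phi, eps):
    """Entropic semi-dual objective, with mu approximated by the fixed sample xs."""
    return (sum(phi_c_eps(x, ys, phi, eps) for x in xs) / len(xs)
            + sum(phi) / len(ys))

def ascend(xs, ys, phi, eps, eta, steps):
    """Full-batch gradient ascent on the concave semi-dual in [phi_1..phi_n]."""
    n = len(ys)
    for _ in range(steps):
        grad = [1.0 / n] * n
        for x in xs:
            a = [(phi[j] - 0.5 * (x - ys[j]) ** 2) / eps for j in range(n)]
            amax = max(a)
            w = [math.exp(t - amax) for t in a]
            s = sum(w)
            for j in range(n):
                grad[j] -= w[j] / s / len(xs)  # minus softmax weight, averaged over mu
        phi = [p + eta * g for p, g in zip(phi, grad)]
    return phi

random.seed(0)
xs = [random.random() for _ in range(50)]          # mu ~ Unif([0,1]), fixed sample
ys = [random.gauss(0.5, 0.2) for _ in range(20)]   # data y_1..y_n from nu
phi0 = [0.0] * len(ys)
f0 = semi_dual(xs, ys, phi0, eps=0.5)
phi = ascend(xs, ys, phi0, eps=0.5, eta=0.1, steps=100)
f1 = semi_dual(xs, ys, phi, eps=0.5)
print(f1 > f0)  # ascent on a concave objective increases the value
```

Replacing the average over the fixed xs with a fresh draw X ∼ µ at each step gives exactly the SGD scheme on R^n described on this slide.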

SLIDE 55

Varying ε and solving W_{2,ε}²(µ, ν̂_n) induces the transportation map

(Id − ∇φ^c)(x) = ∑_{i=1}^n y_i exp(−((1/2)‖x − y_i‖² − φ(y_i))/ε) / ∑_{i=1}^n exp(−((1/2)‖x − y_i‖² − φ(y_i))/ε).

On data y_1, ..., y_n, the optimization reduces to SGD on [φ(y_1), ..., φ(y_n)] ∈ R^n.
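The induced map is a softmax-weighted average of the data points (a sketch with my own numerical stabilization). As ε → 0 with φ fixed, the weights collapse onto the best-scoring y_i, which illustrates the data-memorization issue discussed on the next slides:

```python
import math

def transport_map(x: float, ys: list[float], phi: list[float], eps: float) -> float:
    """Barycentric map (Id - grad phi^c)(x): average of the y_i weighted by
    softmax(-(0.5|x - y_i|^2 - phi_i)/eps), computed stably via max-subtraction."""
    scores = [-(0.5 * (x - y) ** 2 - p) / eps for y, p in zip(ys, phi)]
    smax = max(scores)
    w = [math.exp(s - smax) for s in scores]
    return sum(y * wi for y, wi in zip(ys, w)) / sum(w)

ys = [0.0, 1.0, 2.0]
phi = [0.0, 0.0, 0.0]
print(transport_map(0.1, ys, phi, eps=1.0))    # a blended average of the y_i
print(transport_map(0.1, ys, phi, eps=0.01))   # nearly 0.0: snaps to the nearest y_i
```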


SLIDE 57

OPTIMAL TRANSPORT AND PAIR REGULARIZATION

Recall the input measure µ is given and the target is the empirical measure ν̂_n:

(1/2) W_2²(µ, ν̂_n) = sup_φ { ∫ φ^c(x) µ(dx) + ∫ φ(y) ν̂_n(dy) },   where φ^c(x) := inf_y { (1/2)‖x − y‖² − φ(y) }.

Analogy to GANs:
  • φ : R^d → R as the discriminator function
  • Id − ∇φ^c : R^d → R^d as the generator transformation

However, (Id − ∇φ^c)_#µ = ν̂_n: data memorization, and

W_2((Id − ∇φ^c)_#µ, ν) = W_2(ν̂_n, ν) ≍ n^{−1/d}.

SLIDE 58

PAIR REGULARIZATION, AGAIN

Analogy to GANs: φ : R^d → R as the discriminator function; Id − ∇φ^c : R^d → R^d as the generator transformation.

Solution: pair regularization, F* = {φ regular}, G* = {Id − ∇φ^c regular}, for a better statistical rate.

[Diagram: generator class vs. discriminator class trade-off.]

SLIDE 59

Estimating Transportation Cost

SLIDE 60

ANOTHER APPLICATION OF PAIR REGULARIZATION

Regularity in OT, Caffarelli (1992, 1991): µ, ν ∈ C^α Hölder.

Statistical question: estimate the "transportation cost" W_2²(µ, ν) based on n i.i.d. samples y_1, ..., y_n ∼ ν. Suppose µ ∼ Unif([0, 1]^d) is known.

Lemma (L. & Sadhanala, '19).

sup_{ν∈C^α} E | W̃_n − W_2²(µ, ν) | ≾ n^{−(2α+2)/(2α+d)} + n^{−1/2}.

SLIDE 61

Elbow phenomenon: for α ≥ d/2 − 2, one gets the parametric rate.

SLIDE 62

Pair regularization: φ ∈ C^{α+2} and Id − ∇φ^c ∈ C^{α+1}, by Caffarelli (1992, 1991).

SLIDE 63

Estimating the cost is typically an easier problem than estimating the measure under W_2, or than estimating the transportation map T under the metric E_{X∼µ} ‖T̂(X) − T(X)‖².

Hütter and Rigollet (2019)


SLIDE 65

BACK TO THE ADVERSARIAL FRAMEWORK

Two related problems.

Estimating under the metric/loss. Theorem (L. '17):

inf_{ν̃_n} sup_{ν∈G} E d_F²(ν, ν̃_n) ≍ n^{−(2α+2β)/(2α+d)} ∨ n^{−1},   G = W^α, F = W^β.

No elbow phenomenon in α. Liang (2017); Singh et al. (2018); Weed and Berthet (2019)

Estimating the metric/loss itself. Theorem (L. & Sadhanala, '19):

inf_{W̃_n} sup_{ν∈G} E | W̃_n − d_F²(µ, ν) |² ≍ n^{−(8α+8β)/(4α+d)} ∨ n^{−1},   G = W^α, F = W^β.

Elbow phenomenon at α = d/4 − 2β; typically an easier problem.


SLIDE 67

HOWEVER, FOR THE WASSERSTEIN METRIC

Theorem (L. '19). Consider d ≥ 2 and the domain Ω = [0, 1]^d. Given n i.i.d. samples y_1, ..., y_n from ν,

(log log n / log n) · n^{−(α+1)/(2α+d)} ≾ inf_{W̃_n} sup_{ν∈C^α} E | W̃_n − W_1(µ, ν) | ≾ n^{−(α+1)/(2α+d)},

while, as we know,

inf_{ν̃_n} sup_{ν∈C^α} E W_1(ν̃_n, ν) ≍ n^{−(α+1)/(2α+d)}.

Estimating the Wasserstein-1 metric itself is almost as hard as estimating under the Wasserstein-1 metric.

SLIDE 68

HOWEVER, FOR THE WASSERSTEIN METRIC

  • the main technicality is in deriving the lower bound: wavelets
  • construct two composite/fuzzy hypotheses using delicate priors with matching log n moments, such that the Wasserstein metrics differ sufficiently
  • calculate the total variation metric directly on the posterior of the data (sum-product form), via a telescoping trick


SLIDE 70

SUMMARY

  • In this talk, we study statistical rates for d(T̂_#µ, ν) and d̂(µ, ν), with ν = T*_#µ: Implicit Distribution Estimation, motivated by GANs and OT.
  • Conceptually: learning the distribution via a transformation/transportation, vs. estimating the transformation/transportation difficulty.
  • Closely related problems in the lens of Optimal Transport: d(T̂_#µ, ν) (harder) induces a plug-in estimate of d̂(µ, ν) (easier), which sometimes induces a transportation map in turn.
  • The idea of pair regularization: what GANs have over classical nonparametrics.

Many interesting open problems remain, both statistical and computational, with new insights on regularization and adaptivity.

SLIDE 71

References

Thank you!

Liang, T. (2018). On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results. arXiv:1811.03179, under review.
Liang, T. & Stokes, J. (2018). Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks. arXiv:1802.06132, AISTATS 2019.
Liang, T. (2019). On the Minimax Optimality of Estimating the Wasserstein Metric. arXiv:1908.10324.
Liang, T. & Sadhanala, V. (2019). Working paper.

Michael Arbel, Dougal J. Sutherland, Mikołaj Bińkowski, and Arthur Gretton. On gradient regularizers for MMD GANs. arXiv:1805.11565, 2018.
Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. arXiv:1701.04862, 2017.
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv:1701.07875, 2017.
Sanjeev Arora and Yi Zhang. Do GANs actually learn the distribution? An empirical study. arXiv:1706.08224, 2017.
Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (GANs). arXiv:1703.00573, 2017a.
Sanjeev Arora, Andrej Risteski, and Yi Zhang. Theoretical limitations of encoder-decoder GAN architectures. arXiv:1711.02651, 2017b.
Yu Bai, Tengyu Ma, and Andrej Risteski. Approximability of discriminators implies diversity in GANs. arXiv:1806.10586, 2018.
Luis A. Caffarelli. Some regularity properties of solutions of Monge-Ampère equation. Communications on Pure and Applied Mathematics, 44(8-9):965-969, 1991.
Luis A. Caffarelli. The regularity of mappings with a convex potential. Journal of the American Mathematical Society, 5(1):99-104, 1992.
Guillermo Canas and Lorenzo Rosasco. Learning probability measures with respect to optimal transport metrics. In Advances in Neural Information Processing Systems, pages 2492-2500, 2012.
Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training GANs with optimism. arXiv preprint