
GANs, Optimal Transport, and Implicit Distribution Estimation - PowerPoint PPT Presentation

GANs, Optimal Transport, and Implicit Distribution Estimation. Tengyuan Liang, Econometrics and Statistics. Slide transcript (40 slides).


  1. GANs, Optimal Transport, and Implicit Distribution Estimation. Tengyuan Liang, Econometrics and Statistics.

  2. Talk sections: Intro, Adversarial Framework, GANs Optimization, Optimal Transport.

  3. Outline: Implicit Distribution Estimation. Given i.i.d. $Y_1, \dots, Y_n \sim \nu$. Use a transformation $T : \mathbb{R}^d \to \mathbb{R}^d$ to represent and learn the unknown distribution $Y \sim \nu$ via a simple $Z \sim \mu$ (say Uniform or Gaussian). Close in distribution? $T(Z) \approx Y$.

  4. (cont.) Equivalently? $T_\# \mu \approx \nu$.
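
To make the pushforward notation concrete, here is a minimal numerical sketch (not from the slides; the exponential target and the inverse-CDF map are illustrative assumptions): if $T$ is the inverse CDF of the target, then $T_\# \mu = \nu$ exactly, so samples $T(Z)$ should match the data $Y$ in distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-d illustration: target nu = Exp(1), base mu = Uniform(0, 1).
# The inverse-CDF map T(z) = -log(1 - z) pushes mu forward onto nu,
# i.e. T_# mu = nu, so T(Z) and Y share the same distribution.
def T(z):
    return -np.log1p(-z)

Z = rng.uniform(size=100_000)                   # Z ~ mu
Y = rng.exponential(scale=1.0, size=100_000)    # Y ~ nu (i.i.d. "data")

# Compare a few quantiles of T(Z) and Y as a crude check that T_# mu ≈ nu.
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(T(Z), qs))
print(np.quantile(Y, qs))
```

In GANs, $T$ is instead parametrized by a neural network $g_\theta$ and learned from the samples $Y_1, \dots, Y_n$ rather than written down in closed form.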

  5. Outline: Implicit Distribution Estimation. Generative Adversarial Networks vs. Optimal Transport: statistical rates; estimate the Wasserstein metric vs. pair regularization; estimate under the Wasserstein metric; optimization.

  6. Generative Adversarial Networks. • GAN: Goodfellow et al. (2014). • WGAN: Arjovsky et al. (2017); Arjovsky and Bottou (2017). • MMD GAN: Li, Swersky, and Zemel (2015); Dziugaite, Roy, and Ghahramani (2015); Arbel, Sutherland, Bińkowski, and Gretton (2018). • f-GAN: Nowozin, Cseke, and Tomioka (2016). • Sobolev GAN: Mroueh et al. (2017). • Many others: Liu, Bousquet, and Chaudhuri (2017); Tolstikhin, Gelly, Bousquet, Simon-Gabriel, and Schölkopf (2017).

  7. Generative Adversarial Networks (illustration slide; only the title is recoverable).

  8. Generative Adversarial Networks. Generator $g_\theta$, discriminator $f_\omega$: $U(\theta, \omega) = \mathbb{E}_{Y \sim \nu}[f_\omega(Y)] - \mathbb{E}_{Z \sim \mu}[f_\omega(g_\theta(Z))]$, with $Y$ the target and $Z$ the input, solved as $\min_\theta \max_\omega U(\theta, \omega)$. GANs are widely used in practice, however...
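
A minimal PyTorch sketch of the alternating min-max updates on $U(\theta, \omega)$ (an illustration, not the talk's implementation; the two-layer networks, weight clipping, and the synthetic Gaussian target are assumptions made for the example):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 2

# Generator g_theta: maps base noise Z ~ mu into R^d.
g = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
# Discriminator f_omega: the critic maximized in the inner problem.
f = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(g.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-4)

def sample_target(n):
    # Stand-in for i.i.d. draws Y ~ nu (here a shifted Gaussian).
    return torch.randn(n, d) * 0.5 + 2.0

for step in range(2000):
    Y = sample_target(128)        # Y ~ nu (target)
    Z = torch.randn(128, d)       # Z ~ mu (input)

    # Inner max over omega: ascend U = E f(Y) - E f(g(Z)).
    loss_f = -(f(Y).mean() - f(g(Z).detach()).mean())
    opt_f.zero_grad()
    loss_f.backward()
    opt_f.step()
    with torch.no_grad():
        # Crude Lipschitz control via WGAN-style weight clipping.
        for p in f.parameters():
            p.clamp_(-0.05, 0.05)

    # Outer min over theta: the E f(Y) term is constant in theta,
    # so minimizing U amounts to maximizing E f(g(Z)).
    loss_g = -f(g(Z)).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

The weight clipping is one crude way to keep $f_\omega$ roughly Lipschitz, anticipating the Wasserstein/IPM view of the discriminator discussed below.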

  9. Much needs to be understood, in theory. • Approximation: which distributions can be approximated by the generator $(g_\theta)_\# \mu$? • Statistical: given $n$ samples, what is the statistical/generalization error rate? • Computational: local convergence of practical optimization; how to stabilize it? • Landscape: are local saddle points good globally?

  10. Formulation. $\mathcal{T}_G$: class of generator transformations; $\mathcal{F}_D$: class of discriminator functions. For the target distribution $\nu$ (population problem): $g^* \in \arg\min_{g \in \mathcal{T}_G} \max_{f \in \mathcal{F}_D} \big\{ \mathbb{E}_{X \sim g_\# \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)] \big\}$.

  11. (cont.) For the empirical distribution $\hat{\nu}_n$ (empirical problem): $\hat{g} \in \arg\min_{g \in \mathcal{T}_G} \max_{f \in \mathcal{F}_D} \big\{ \mathbb{E}_{X \sim g_\# \mu}[f(X)] - \mathbb{E}_{Y \sim \hat{\nu}_n}[f(Y)] \big\}$, with $\hat{g}_\# \mu$ as the estimate for $\nu$.

  12. (cont.) • Density learning/estimation has a long history in nonparametric statistics: model the target density $\rho_\nu \in W^\alpha$, a Sobolev space with smoothness $\alpha \ge 0$. Stone (1982); Nemirovski (2000); Tsybakov (2009); Wasserman (2006). • GAN statistical theory is needed: Arora and Zhang (2017); Arora et al. (2017a,b); Liu et al. (2017).

  13. Discriminator metric. Define the critic metric (IPM): $d_{\mathcal{F}}(\mu, \nu) := \sup_{f \in \mathcal{F}} \big| \mathbb{E}_{X \sim \mu} f(X) - \mathbb{E}_{Y \sim \nu} f(Y) \big|$.

  14. (cont.) • $\mathcal{F}$ = Lipschitz-1 functions: Wasserstein metric $d_W$. • $\mathcal{F}$ = functions bounded by 1: total variation/Radon metric $d_{TV}$. • RKHS $\mathcal{H}$, $\mathcal{F} = \{ f \in \mathcal{H}, \|f\|_{\mathcal{H}} \le 1 \}$: MMD GAN. • $\mathcal{F}$ = Sobolev functions with smoothness $\beta$: Sobolev GAN. Statistical question: what is the statistical error rate with $n$ i.i.d. samples, $\mathbb{E}\, d_{\mathcal{F}}(\nu, \hat{\nu}_n)$, for a range of $\mathcal{F}$ and $\nu$ with certain regularity?
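
Two of these IPMs are easy to estimate directly from samples in one dimension, which gives a feel for what the critic metric measures. A small sketch (illustrative only; the Gaussian samples and the kernel bandwidth are assumptions): sorted samples give the empirical Wasserstein-1 distance, and a Gaussian-kernel Gram matrix gives the (biased) MMD.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=2000)   # samples from mu
Y = rng.normal(0.5, 1.0, size=2000)   # samples from nu

# F = 1-Lipschitz functions: on the real line the IPM is the Wasserstein-1
# distance; for equal sample sizes it equals the mean gap of sorted samples.
w1 = np.mean(np.abs(np.sort(X) - np.sort(Y)))

# F = unit ball of a Gaussian RKHS: the IPM is the MMD (biased V-statistic).
def gram(a, b, bw=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw ** 2))

mmd2 = gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()
print(f"W1 ≈ {w1:.3f}, MMD ≈ {np.sqrt(max(mmd2, 0.0)):.3f}")
```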

  15. Summary of first half of talk.
      Adversarial framework (nonparametric), evaluation metric $d_{\mathcal{F}}$:
      • Sobolev GAN: minimax optimal; generator class Sobolev $W^\alpha$, discriminator class Sobolev $W^\beta$.
      • MMD GAN: upper bound; generator class a smooth subspace of the RKHS, discriminator class RKHS $\mathcal{H}$.
      • Oracle results: any generator class (G†), discriminator class Sobolev $W^\beta$.
      Generative Adversarial Networks (parametric):
      • $d_{TV}$: leaky-ReLU GANs, upper bound; generator and discriminator leaky-ReLU networks (F‡, m*).
      • $d_{TV}$, $d_{KL}$, $d_{H}$: any GANs, oracle results; generator and discriminator neural networks (G†, F‡, m*).
      • $d_{W}$: Lipschitz GANs, oracle results; generator and discriminator Lipschitz neural networks (G†, F‡, m*).

  16. (cont.) The symbols (G†) and (F‡) denote mis-specification of the generator class and the discriminator class, respectively; (m*) indicates dependence on the number of generator samples.

  17. Implicit distribution estimators (GANs, optimal transport) vs. explicit density estimators (KDE, projection/series estimators, ...).
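
The contrast can be made concrete with a short sketch (illustrative; the toy location-scale "generator" is an assumption, not a method from the talk): an explicit estimator such as a Gaussian KDE exposes a density that can be evaluated pointwise, whereas an implicit estimator only exposes a sampler $T(Z)$.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
Y = rng.normal(1.0, 0.7, size=500)    # i.i.d. data Y_1, ..., Y_n ~ nu

# Explicit estimator: KDE returns a density we can evaluate pointwise.
kde = gaussian_kde(Y)
print("estimated density at y = 1:", kde(np.array([1.0]))[0])

# Implicit estimator: we only get a sampler T(Z); here a toy "generator"
# that shifts/scales Gaussian noise to the empirical mean and std of Y.
def sample_implicit(n):
    Z = rng.normal(size=n)            # Z ~ mu
    return Y.mean() + Y.std() * Z     # T(Z); no density formula is exposed

draws = sample_implicit(10_000)
print("implicit-model sample mean/std:", draws.mean(), draws.std())
```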

  18. Adversarial Framework (nonparametric).

  19. Minimax optimal rates: Sobolev GAN. Consider the target class $\mathcal{G} := \{ \nu : \rho_\nu \in W^\alpha \}$, a Sobolev space with smoothness $\alpha$, and the evaluation metric $\mathcal{F} = W^\beta$ with smoothness $\beta$.

  20. (cont.) Theorem (L. '17 & L. '18, Sobolev). The minimax optimal rate is $\inf_{\tilde{\nu}_n} \sup_{\nu \in \mathcal{G}} \mathbb{E}\, d_{\mathcal{F}}(\nu, \tilde{\nu}_n) \asymp n^{-\frac{\alpha+\beta}{2\alpha+d}} \vee n^{-\frac{1}{2}}$, where $\tilde{\nu}_n$ is any estimator based on $n$ samples in $d$ dimensions. Liang (2017); Singh et al. (2018); Weed and Berthet (2019).
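
A worked numeric instance of the rate (values chosen purely for illustration, not from the talk):

```latex
% Illustrative instance: dimension d = 10, target smoothness \alpha = 1,
% critic smoothness \beta = 1.
\[
  n^{-\frac{\alpha+\beta}{2\alpha+d}} \vee n^{-\frac{1}{2}}
  = n^{-\frac{2}{12}} \vee n^{-\frac{1}{2}}
  = n^{-1/6},
\]
% so the nonparametric term dominates; the parametric rate n^{-1/2}
% takes over exactly when \beta \ge d/2.
```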

  21. Minimax optimal rates: MMD GAN. Consider a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$: • integral operator $T$ with eigenvalue decay $t_i \asymp i^{-\kappa}$, $0 < \kappa < \infty$; • evaluation metric $\mathcal{F} = \{ f \in \mathcal{H} \mid \|f\|_{\mathcal{H}} \le 1 \}$; • target density $\rho_\nu$ in $\mathcal{G} = \{ \nu \mid \| T^{-\frac{\alpha-1}{2}} \rho_\nu \|_{\mathcal{H}} \le 1 \}$ with smoothness $\alpha$.

  22. (cont.) Theorem (L. '18, RKHS). The minimax optimal rate satisfies $\inf_{\tilde{\nu}_n} \sup_{\nu \in \mathcal{G}} \mathbb{E}\, d_{\mathcal{F}}(\nu, \tilde{\nu}_n) \precsim n^{-\frac{(\alpha+1)\kappa}{2\alpha\kappa+2}} \vee n^{-\frac{1}{2}}$.
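
Again, a numeric instance of the bound (values chosen purely for illustration, not from the talk):

```latex
% Illustrative instances of the exponent (\alpha+1)\kappa / (2\alpha\kappa + 2).
% With \kappa = 1 and \alpha = 1 the exponent is 2/4 = 1/2, so
\[
  n^{-\frac{(\alpha+1)\kappa}{2\alpha\kappa+2}} \vee n^{-\frac{1}{2}} = n^{-\frac{1}{2}},
\]
% while a slower eigenvalue decay \kappa = 1/2 (still with \alpha = 1) gives
% exponent 1/3 and hence the rate n^{-1/3}.
```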
