 
Intro. Adversarial Framework GANs Optimization Optimal Transport

GANs, Optimal Transport, and Implicit Distribution Estimation
Tengyuan Liang
Econometrics and Statistics

OUTLINE

Implicit Distribution Estimation

Given i.i.d. samples $Y_1, \ldots, Y_n \sim \nu$, use a transformation $T : \mathbb{R}^d \to \mathbb{R}^d$ to represent and learn the unknown distribution $Y \sim \nu$ via a simple $Z \sim \mu$ (say, Uniform or Gaussian).

Close in distribution? $T(Z) \approx Y$, or equivalently, $T_\# \mu \approx \nu$?

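The pushforward $T_\# \mu$ can be sampled without ever writing down a density: draw $Z \sim \mu$ and return $T(Z)$. The sketch below illustrates this under assumed choices that are not from the talk: $\mu = N(0, 1)$, the affine map $T(z) = 2z + 3$, and the hypothetical helper name `pushforward_samples`.

```python
import random
import statistics

def pushforward_samples(T, base_sampler, m, seed=0):
    """Draw m samples from T#mu by sampling Z ~ mu and returning T(Z)."""
    rng = random.Random(seed)
    return [T(base_sampler(rng)) for _ in range(m)]

# Assumed example (not from the talk): mu = N(0, 1) and T(z) = 2z + 3,
# so that T#mu is N(3, 4), i.e. mean 3 and standard deviation 2.
def T(z):
    return 2.0 * z + 3.0

def standard_normal(rng):
    return rng.gauss(0.0, 1.0)

samples = pushforward_samples(T, standard_normal, m=20000)
sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
```

In this toy case the sample mean and standard deviation should land near 3 and 2. In general $T$ would be learned, and closeness of $T_\# \mu$ to $\nu$ would be judged by a metric on distributions rather than by two moments.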
OUTLINE

Implicit Distribution Estimation: Generative Adversarial Networks vs. Optimal Transport

Generative Adversarial Networks
• statistical rates
• pair regularization
• optimization

Optimal Transport
• estimate the Wasserstein metric
• estimate under the Wasserstein metric

GENERATIVE ADVERSARIAL NETWORKS

• GAN: Goodfellow et al. (2014)
• WGAN: Arjovsky et al. (2017); Arjovsky and Bottou (2017)
• MMD GAN: Li, Swersky, and Zemel (2015); Dziugaite, Roy, and Ghahramani (2015); Arbel, Sutherland, Bińkowski, and Gretton (2018)
• f-GAN: Nowozin, Cseke, and Tomioka (2016)
• Sobolev GAN: Mroueh et al. (2017)
• many others: Liu, Bousquet, and Chaudhuri (2017); Tolstikhin, Gelly, Bousquet, Simon-Gabriel, and Schölkopf (2017)

GENERATIVE ADVERSARIAL NETWORKS

Generator $g_\theta$, discriminator $f_\omega$:

$$U(\theta, \omega) = \mathbb{E}_{Y \sim \nu}\,[f_\omega(Y)] - \mathbb{E}_{Z \sim \mu}\,[f_\omega(g_\theta(Z))],$$

where $\nu$ is the target distribution and $\mu$ is the input distribution. The training problem is

$$\min_\theta \max_\omega U(\theta, \omega).$$

GANs are widely used in practice; in theory, however, much remains to be understood.

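To make the objective $U(\theta, \omega)$ concrete, here is a minimal sketch that evaluates its sample-average version for a fixed generator and discriminator. The distributions ($\nu = N(3, 1)$, $\mu = N(0, 1)$), the shift generator $g_\theta(z) = z + \theta$, the linear discriminator $f(x) = x$, and the name `gan_objective` are all illustrative assumptions, not choices made in the talk.

```python
import random

def gan_objective(f, g, y_samples, z_samples):
    """Sample-average version of U = E[f(Y)] - E[f(g(Z))]."""
    term_target = sum(f(y) for y in y_samples) / len(y_samples)
    term_generated = sum(f(g(z)) for z in z_samples) / len(z_samples)
    return term_target - term_generated

rng = random.Random(0)
ys = [rng.gauss(3.0, 1.0) for _ in range(5000)]  # assumed target Y ~ nu = N(3, 1)
zs = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # assumed input  Z ~ mu = N(0, 1)

def f(x):
    # Assumed linear discriminator f_omega(x) = omega * x with omega = 1.
    return x

# Assumed shift generators g_theta(z) = z + theta.
u_mismatched = gan_objective(f, lambda z: z, ys, zs)        # theta = 0
u_matched = gan_objective(f, lambda z: z + 3.0, ys, zs)     # theta = 3
```

With these assumptions the objective is roughly 3 when the generator ignores the shift and roughly 0 when it matches the target mean, which is the signal the discriminator passes back to the generator.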
MUCH NEEDS TO BE UNDERSTOOD, IN THEORY

• Approximation: which distributions can be approximated by the generator $(g_\theta)_\# \mu$?
• Statistical: given $n$ samples, what is the statistical/generalization error rate?
• Computational: local convergence for practical optimization; how to stabilize?
• Landscape: are local saddle points good globally?

FORMULATION

$\mathcal{T}_G$: class of generator transformations. $\mathcal{F}_D$: class of discriminator functions.

Population version ($\nu$ is the target distribution):

$$g^* \in \arg\min_{g \in \mathcal{T}_G} \max_{f \in \mathcal{F}_D} \left\{ \mathbb{E}_{X \sim g_\# \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)] \right\}$$

Empirical version ($\hat\nu_n$ is the empirical distribution):

$$\hat g \in \arg\min_{g \in \mathcal{T}_G} \max_{f \in \mathcal{F}_D} \left\{ \mathbb{E}_{X \sim g_\# \mu}[f(X)] - \mathbb{E}_{Y \sim \hat\nu_n}[f(Y)] \right\}$$

$\hat g_\# \mu$ serves as the estimate for $\nu$.

• Density learning/estimation has a long history in nonparametric statistics: model the target density $\rho_\nu \in W^\alpha$, the Sobolev space with smoothness $\alpha \ge 0$. Stone (1982); Nemirovski (2000); Tsybakov (2009); Wasserman (2006)
• GAN statistical theory is needed. Arora and Zhang (2017); Arora et al. (2017a,b); Liu et al. (2017)

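The empirical min-max problem can be sketched by brute force when both classes are tiny. The example below assumes, purely for illustration, a location family of generators $g_\theta(z) = z + \theta$ on a finite grid and a two-element discriminator class $\{x, -x\}$. The inner expectation over $g_\# \mu$ is also replaced by an average over generator samples, which is an additional approximation. The name `empirical_gan_estimate` is hypothetical.

```python
import random

def empirical_gan_estimate(thetas, discriminators, y_samples, z_samples):
    """Grid search for theta minimizing max_f { mean f(g_theta(Z)) - mean f(Y) }."""
    n, m = len(y_samples), len(z_samples)
    best_theta, best_value = None, float("inf")
    for theta in thetas:
        # Assumed generator class: g_theta(z) = z + theta.
        xs = [z + theta for z in z_samples]
        # Inner maximization over the (finite) discriminator class.
        worst = max(
            sum(f(x) for x in xs) / m - sum(f(y) for y in y_samples) / n
            for f in discriminators
        )
        if worst < best_value:
            best_theta, best_value = theta, worst
    return best_theta, best_value

rng = random.Random(1)
ys = [rng.gauss(2.0, 1.0) for _ in range(2000)]  # assumed target nu = N(2, 1)
zs = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # assumed input  mu = N(0, 1)
thetas = [k / 10 for k in range(0, 41)]          # grid 0.0, 0.1, ..., 4.0
discriminators = [lambda x: x, lambda x: -x]     # symmetric class {f, -f}

theta_hat, value = empirical_gan_estimate(thetas, discriminators, ys, zs)
```

Because the discriminator class here only sees means, the inner maximum equals the absolute difference of sample means, and the selected `theta_hat` is the grid point closest to the difference of the two sample means. Richer discriminator classes would constrain more than the first moment.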
DISCRIMINATOR METRIC

Define the critic metric (integral probability metric, IPM)

$$d_{\mathcal{F}}(\mu, \nu) := \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim \mu} f(X) - \mathbb{E}_{Y \sim \nu} f(Y) \right|.$$

• $\mathcal{F}$ Lip-1: Wasserstein metric $d_W$
• $\mathcal{F}$ bounded by 1: total variation/Radon metric $d_{TV}$
• RKHS $\mathcal{H}$, $\mathcal{F} = \{ f \in \mathcal{H}, \|f\|_{\mathcal{H}} \le 1 \}$: MMD GAN
• $\mathcal{F}$ with Sobolev smoothness $\beta$: Sobolev GAN

Statistical question: what is the statistical error rate $\mathbb{E}\, d_{\mathcal{F}}(\nu, \hat\mu_n)$ with $n$ i.i.d. samples, for a range of $\mathcal{F}$ and $\nu$ with certain regularity?

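Two of the IPMs listed above have simple plug-in forms for one-dimensional samples, sketched here under stated assumptions. For the Lip-1 class, the Wasserstein-1 distance between two empirical distributions with the same number of points is the mean absolute difference of the sorted samples. For the RKHS unit ball, the IPM is the MMD, computed below with the biased (V-statistic) estimate and an assumed Gaussian kernel of bandwidth 1. The function names are hypothetical.

```python
import math
import random

def wasserstein1_1d(xs, ys):
    """W1 between two equal-size 1D empirical distributions (sorted coupling)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def mmd_gaussian(xs, ys, bandwidth=1.0):
    """Biased plug-in MMD with an assumed Gaussian kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    mmd_sq = mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
    return math.sqrt(max(mmd_sq, 0.0))  # guard against tiny negative round-off

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [x + 1.0 for x in xs]  # the same sample shifted by exactly 1

w1_shift = wasserstein1_1d(xs, ys)
mmd_same = mmd_gaussian(xs, xs)
mmd_shift = mmd_gaussian(xs, ys)
```

Shifting a sample by 1 moves every sorted point by 1, so `w1_shift` equals 1 up to floating-point error, while `mmd_same` is 0 and `mmd_shift` is strictly positive. The two metrics respond to the same perturbation on different scales, which is one reason the choice of $\mathcal{F}$ matters for the rates.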
SUMMARY OF FIRST HALF OF TALK

| Goal | Evaluation metric | Results | Generator class G | Discriminator class F | Property |
| Adversarial Framework (nonparametric) | d_F | Sobolev GAN: minimax optimal | Sobolev W^α | Sobolev W^β | |
| Adversarial Framework (nonparametric) | d_F | MMD GAN: upper bound | smooth subspace in RKHS | RKHS H | |
| Adversarial Framework (nonparametric) | d_F | oracle results | any | Sobolev W^β | G† |
| Generative Adversarial Networks (parametric) | d_TV | leaky-ReLU GANs: upper bound | leaky-ReLU | leaky-ReLU | F‡, m* |
| Generative Adversarial Networks (parametric) | d_TV, d_KL, d_H | any GANs: oracle results | neural networks | neural networks | G†, F‡, m* |
| Generative Adversarial Networks (parametric) | d_W | Lipschitz GANs: oracle results | Lipschitz neural networks | Lipschitz neural networks | G†, F‡, m* |

The symbols (G†) and (F‡) denote mis-specification of the generator class and the discriminator class, respectively, and (m*) indicates dependence on the number of generator samples.

Implicit Distribution Estimator: GANs, Optimal Transport
vs.
Explicit Density Estimator: KDE, Projection/Series Estimator, ...

Adversarial Framework (nonparametric)

MINIMAX OPTIMAL RATES: SOBOLEV GAN

Consider the target class $\mathcal{G} := \{ \nu : \rho_\nu \in W^\alpha \}$, where $W^\alpha$ is the Sobolev space with smoothness $\alpha$, and the evaluation metric $\mathcal{F} = W^\beta$ with smoothness $\beta$.

Theorem (L. '17 & L. '18, Sobolev). The minimax optimal rate is

$$\inf_{\tilde\nu_n} \sup_{\nu \in \mathcal{G}} \mathbb{E}\, d_{\mathcal{F}}(\nu, \tilde\nu_n) \asymp n^{-\frac{\alpha+\beta}{2\alpha+d}} \vee n^{-\frac{1}{2}}.$$

Here $\tilde\nu_n$ is any estimator based on $n$ samples, and $d$ is the dimension.

Liang (2017); Singh et al. (2018); Weed and Berthet (2019)

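Since $n^{-a} \vee n^{-1/2} = n^{-\min(a, 1/2)}$ for $n \ge 1$, the rate in the theorem is governed by a single exponent. The small helper below (the name `sobolev_gan_rate_exponent` is hypothetical) computes it, and makes visible that the parametric rate $n^{-1/2}$ takes over exactly when $\beta \ge d/2$, because $(\alpha+\beta)/(2\alpha+d) \ge 1/2$ is equivalent to $2\beta \ge d$.

```python
def sobolev_gan_rate_exponent(alpha, beta, d):
    """Exponent r such that the stated minimax rate equals n^{-r}.

    r = min((alpha + beta) / (2 * alpha + d), 1/2).
    """
    if alpha < 0 or beta < 0 or d <= 0:
        raise ValueError("expected alpha >= 0, beta >= 0, d > 0")
    nonparametric = (alpha + beta) / (2 * alpha + d)
    return min(nonparametric, 0.5)

# Illustrative parameter choices (not from the talk):
r_rough_critic = sobolev_gan_rate_exponent(alpha=1, beta=1, d=4)   # beta < d/2
r_boundary = sobolev_gan_rate_exponent(alpha=1, beta=2, d=4)       # beta = d/2
r_smooth_critic = sobolev_gan_rate_exponent(alpha=2, beta=3, d=2)  # beta > d/2
```

For the first choice the exponent is $1/3$, so the rate is slower than parametric. For the other two the exponent is capped at $1/2$.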
MINIMAX OPTIMAL RATES: MMD GAN

Consider a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$:

• integral operator $\mathcal{T}$ with eigenvalue decay $t_i \asymp i^{-\kappa}$, $0 < \kappa < \infty$
• evaluation metric $\mathcal{F} = \{ f \in \mathcal{H} \mid \|f\|_{\mathcal{H}} \le 1 \}$
• target density $\rho_\nu$ in $\mathcal{G} = \{ \nu \mid \|\mathcal{T}^{-\frac{\alpha-1}{2}} \rho_\nu\|_{\mathcal{H}} \le 1 \}$ with smoothness $\alpha$

Theorem (L. '18, RKHS). The minimax optimal rate is

$$\inf_{\tilde\nu_n} \sup_{\nu \in \mathcal{G}} \mathbb{E}\, d_{\mathcal{F}}(\nu, \tilde\nu_n) \precsim n^{-\frac{(\alpha+1)\kappa}{2\alpha\kappa+2}} \vee n^{-\frac{1}{2}}.$$

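The exponent in the RKHS bound can be tabulated the same way. The sketch below (the name `rkhs_mmd_rate_exponent` is hypothetical) returns $\min\{(\alpha+1)\kappa/(2\alpha\kappa+2),\, 1/2\}$. A short calculation shows that $(\alpha+1)\kappa/(2\alpha\kappa+2) \ge 1/2$ is equivalent to $\kappa \ge 1$, so under this bound the $n^{-1/2}$ term dominates whenever the eigenvalues decay at least as fast as $i^{-1}$.

```python
def rkhs_mmd_rate_exponent(alpha, kappa):
    """Exponent r such that the stated upper bound equals n^{-r}.

    r = min((alpha + 1) * kappa / (2 * alpha * kappa + 2), 1/2).
    """
    if alpha < 0 or kappa <= 0:
        raise ValueError("expected alpha >= 0 and kappa > 0")
    nonparametric = (alpha + 1) * kappa / (2 * alpha * kappa + 2)
    return min(nonparametric, 0.5)

# Illustrative parameter choices (not from the talk):
r_slow_decay = rkhs_mmd_rate_exponent(alpha=1, kappa=0.5)  # kappa < 1
r_boundary = rkhs_mmd_rate_exponent(alpha=3, kappa=1)      # kappa = 1
r_fast_decay = rkhs_mmd_rate_exponent(alpha=1, kappa=2)    # kappa > 1
```

With $\alpha = 1$ and $\kappa = 0.5$ the exponent is $1/3$. At and beyond $\kappa = 1$ it is $1/2$ regardless of $\alpha$.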