On Minimax Optimality of GANs for Robust Mean Estimation




  1. On Minimax Optimality of GANs for Robust Mean Estimation. Kaiwen Wu (1,2), with Gavin Weiguang Ding (3), Ruitong Huang (3) and Yaoliang Yu (1,2). (1) University of Waterloo, (2) Vector Institute, (3) Borealis AI. Wu, Ding, Huang & Yu, GANs for Robust Mean Estimation, 1 / 13

  2. The success of generative adversarial networks (GANs) (Arjovsky et al. 2017; Goodfellow et al. 2014; Li et al. 2017; Miyato et al. 2018).

  3. (Slide 2, continued) But... what if the training data is noisy?

  4. Huber's contamination model (Huber 1964): $X_1, X_2, \dots, X_n \sim (1 - \epsilon)\, N(\theta, I_p) + \epsilon H$, where $N(\theta, I_p)$ is the clean distribution and $H$ is an arbitrary contamination distribution.

  5. (Slide 3, continued) Task: compute an estimator $\hat\theta \approx \theta$.

  6. Goal: small RMSE in the worst case, $\sup_H \mathbb{E}\|\hat\theta - \theta\|$. Sample average: infinite error in the worst case. Coordinate-wise median: error of order $\sqrt{p}\,\epsilon$. Tukey's median (Tukey 1975): optimal error $\epsilon$, but NP-hard to compute. Statistically optimal and computationally feasible estimators: Diakonikolas et al. 2016; Lai et al. 2016.
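A small simulation makes the contrast between the sample average and the coordinate-wise median concrete. The sketch below uses Python with NumPy; the slides contain no code. It assumes one particular contamination $H$, a point mass shifted by 10 in every coordinate, and the function names are hypothetical. It is not a worst-case construction, so it only illustrates the trends listed above.

```python
import numpy as np


def huber_sample(n, p, eps, theta, outlier_shift, rng):
    """Draw n points from (1 - eps) * N(theta, I_p) + eps * H.

    Illustrative assumption: H is a point mass at theta + outlier_shift * (1, ..., 1).
    The contamination model allows any H; this is only one simple choice.
    """
    X = rng.standard_normal((n, p)) + theta
    is_outlier = rng.random(n) < eps
    X[is_outlier] = theta + outlier_shift  # replace roughly an eps fraction of rows
    return X


def estimation_errors(X, theta):
    """Return (sample-mean error, coordinate-wise-median error) in Euclidean norm."""
    mean_err = np.linalg.norm(X.mean(axis=0) - theta)
    median_err = np.linalg.norm(np.median(X, axis=0) - theta)
    return mean_err, median_err


rng = np.random.default_rng(0)
eps, shift, n = 0.1, 10.0, 5000
errors = {}
for p in (10, 100):
    theta = np.zeros(p)
    X = huber_sample(n, p, eps, theta, shift, rng)
    errors[p] = estimation_errors(X, theta)
    print(f"p={p}: mean error {errors[p][0]:.2f}, coordinate-wise median error {errors[p][1]:.2f}")
```

Raising `shift` inflates the sample-mean error without bound, while the median error stays roughly fixed. Moving from $p = 10$ to $p = 100$ should multiply the median error by about $\sqrt{10}$, in line with the $\sqrt{p}\,\epsilon$ rate.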

  7. Robust Mean Estimation via GANs: $\hat\theta := \arg\min_{\eta} \sup_{T \in \mathcal{T}} \; \mathbb{E}_{\text{data}}[T(X)] - \mathbb{E}_{N(\eta, I_p)}[s(T(Y))]$, where $N(\eta, I_p)$ is the generator and $\mathcal{T}$ is the discriminator function class. Which discriminator class $\mathcal{T}$ guarantees small estimation error?
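A quick worked example shows why the choice of discriminator class matters. Suppose, purely for illustration, that $s$ is the identity and $\mathcal{T}$ is the class of unit-norm linear functions. This class is not one of those studied in the slides. The inner supremum then has a closed form (Cauchy–Schwarz), and the estimator collapses to the sample average, which has unbounded worst-case error under contamination:

```latex
\mathcal{T}_{\mathrm{lin}} = \{\, x \mapsto v^\top x : \|v\|_2 \le 1 \,\}, \qquad
\sup_{T \in \mathcal{T}_{\mathrm{lin}}} \Big\{ \mathbb{E}_{\text{data}}[T(X)] - \mathbb{E}_{N(\eta, I_p)}[T(Y)] \Big\}
= \sup_{\|v\|_2 \le 1} v^\top (\bar{X} - \eta) = \|\bar{X} - \eta\|_2,
\qquad\Longrightarrow\qquad
\hat\theta = \arg\min_{\eta} \|\bar{X} - \eta\|_2 = \bar{X}.
```

Linear discriminators only compare first moments, so they cannot separate contamination from a shift in the mean. Richer but still controlled classes are therefore needed.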

  8. f-GAN (Nowozin et al. 2016): the discriminator is a one-hidden-layer network, $\mathcal{T} = \{ g(\sum_{i=1}^{l} w_i \sigma(u_i^\top x + b_i)) : \|w\|_1 \le \kappa \}$. Theorem (f-GAN): under mild assumptions on the activations, $\|\hat\theta_n - \theta\| \lesssim \sqrt{p/n} \vee \epsilon$ with high probability. This generalizes the results of Gao et al. (2019) on TV-GAN and JS-GAN.
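The constraint $\|w\|_1 \le \kappa$ is what keeps this discriminator class small. One way to enforce it during projected-gradient training is to project $w$ onto the $\ell_1$ ball after each step. This is an assumption: the slides do not say how the class is optimized. The sketch evaluates such a discriminator with a logistic-sigmoid activation and identity $g$, both illustrative choices, and uses the standard sort-based Euclidean projection onto the $\ell_1$ ball. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminator(X, U, b, w, g=lambda t: t):
    """Evaluate T(x) = g(sum_i w_i * sigma(u_i^T x + b_i)) for each row x of X.

    X: (n, p) inputs; U: (l, p) hidden weights; b: (l,) biases; w: (l,) output weights.
    Assumptions: sigma is the logistic sigmoid and g defaults to the identity.
    """
    return g(sigmoid(X @ U.T + b) @ w)


def project_l1_ball(w, kappa):
    """Euclidean projection of w onto {w : ||w||_1 <= kappa} (sort-based algorithm)."""
    if np.abs(w).sum() <= kappa:
        return w.copy()
    u = np.sort(np.abs(w))[::-1]  # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - kappa)[0][-1]  # last index that stays nonzero
    lam = (css[rho] - kappa) / (rho + 1.0)  # soft-threshold level
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


rng = np.random.default_rng(0)
n, p, l, kappa = 5, 3, 4, 1.0
X = rng.standard_normal((n, p))
U = rng.standard_normal((l, p))
b = rng.standard_normal(l)
w = project_l1_ball(3.0 * rng.standard_normal(l), kappa)  # enforce ||w||_1 <= kappa
print(discriminator(X, U, b, w))
```

The sigmoid takes values in $(0, 1)$, so with identity $g$ every discriminator output is bounded by $\kappa$ in absolute value. This boundedness is one intuition for why the $\ell_1$ constraint controls the class.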

  9. MMD-GAN (Dziugaite et al. 2015; Li et al. 2017): the discriminator is the unit ball in an RKHS, $\mathcal{T} = \{ f \in \mathcal{H}_k : \|f\|_{\mathcal{H}_k} \le 1 \}$. We focus on the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$. Theorem: with appropriate tuning of the bandwidth ($\sigma = \sqrt{p}$), $\|\hat\theta_n - \theta\| \lesssim \sqrt{p/n} \vee \sqrt{p}\,\epsilon$.

  10. (Slide 7, continued) Theorem: for any bandwidth $\sigma$, there exists a contamination $H$ such that $\|\hat\theta - \theta\| \gtrsim \sqrt{p}\,\epsilon$.
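For a Gaussian generator $N(\eta, I_p)$ and the Gaussian kernel, the MMD-GAN objective can be evaluated in closed form. This gives a simple way to compute the estimator without training a network. The derivation is ours, not the slides'. First, $\mathbb{E}_{Y \sim N(\eta, I_p)} k(x, Y) = (\sigma^2/(\sigma^2+1))^{p/2} \exp(-\|x - \eta\|^2 / (2(\sigma^2+1)))$. Second, the generator-generator term does not depend on $\eta$. Minimizing the MMD over $\eta$ therefore amounts to maximizing $\sum_i \exp(-\|X_i - \eta\|^2 / (2(\sigma^2+1)))$. Setting the gradient to zero yields a weighted-mean fixed point, that is, a Gaussian mean-shift iteration. The function name, the median initialization, and the point-mass contamination below are assumptions of this sketch.

```python
import numpy as np


def mmd_gan_gaussian_estimate(X, sigma, n_iter=500, tol=1e-10):
    """Minimize MMD(empirical data, N(eta, I_p)) over eta, Gaussian kernel of bandwidth sigma.

    Uses the closed form of E_{Y ~ N(eta, I_p)} k(x, Y): the objective reduces to
    maximizing sum_i exp(-||X_i - eta||^2 / (2 * tau2)) with tau2 = sigma^2 + 1,
    whose stationary points satisfy eta = weighted mean of X (mean-shift update).
    """
    tau2 = sigma**2 + 1.0
    eta = np.median(X, axis=0)  # robust initialization (an assumption of this sketch)
    for _ in range(n_iter):
        log_w = -np.sum((X - eta) ** 2, axis=1) / (2.0 * tau2)
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        new_eta = w @ X / w.sum()
        if np.linalg.norm(new_eta - eta) < tol:
            return new_eta
        eta = new_eta
    return eta


rng = np.random.default_rng(1)
n, p, eps = 2000, 20, 0.1
theta = np.ones(p)
X = rng.standard_normal((n, p)) + theta
X[rng.random(n) < eps] = theta + 10.0  # hypothetical point-mass contamination
theta_hat = mmd_gan_gaussian_estimate(X, sigma=np.sqrt(p))  # bandwidth sigma = sqrt(p)
mmd_err = np.linalg.norm(theta_hat - theta)
mean_err = np.linalg.norm(X.mean(axis=0) - theta)
print(f"MMD-GAN error {mmd_err:.3f}, sample-mean error {mean_err:.3f}")
```

Far-away outliers receive essentially zero weight, so in this example the error is close to the $\sqrt{p/n}$ sampling level. The lower bound above says that a more cleverly placed $H$ (not this one) can still force an error of order $\sqrt{p}\,\epsilon$.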

  11. Simulation (MMD-GAN). [Figure: (a) estimation error $\|\hat\theta - \theta\|$ against $\|\tilde\theta - \theta\|/\sqrt{p}$ for bandwidths $\sigma \in \{5, 7.5, 10, 15, 20\}$ and point-mass contaminations $\delta_{\tilde\theta}$ in 100 dimensions; (b) $\|\hat\theta - \theta\|$ against $\sqrt{p}$ for different dimensions $p$ with $\sigma = \sqrt{p}$.]

  12. Wasserstein GAN (Arjovsky et al. 2017): the discriminator class is the set of 1-Lipschitz functions, $\mathcal{T} = \{ f : |f(x) - f(y)| \le \|x - y\|, \ \forall x, y \in \mathcal{X} \}$. Theorem: in one dimension, the estimation error is bounded, $|\hat\theta - \theta| \asymp \epsilon$. In high dimensions, empirically, $\|\hat\theta - \theta\| \asymp \sqrt{p}\,\epsilon$.
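In one dimension, the WGAN objective with 1-Lipschitz discriminators is the 1-Wasserstein distance, by Kantorovich–Rubinstein duality. That distance can be written as $\int_0^1 |F_n^{-1}(u) - \eta - \Phi^{-1}(u)|\,du$. Approximating this integral by the midpoint rule on the $n$ order statistics turns the minimization over $\eta$ into a median: $\hat\theta = \mathrm{median}_i\,(X_{(i)} - \Phi^{-1}((i - 1/2)/n))$. This is our simplification, not the authors' procedure. The sketch uses NumPy and the standard library's `statistics.NormalDist`; the function name and the contamination are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def wgan_1d_estimate(x):
    """Approximately minimize W1(empirical(x), N(eta, 1)) over eta.

    Midpoint-rule approximation of W1 = int_0^1 |F_n^{-1}(u) - eta - Phi^{-1}(u)| du:
    the minimizer of (1/n) * sum_i |x_(i) - z_i - eta| is the median of x_(i) - z_i.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    # z_i = Phi^{-1}((i - 1/2) / n), standard normal quantiles at the midpoints
    z = np.array([NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
    return float(np.median(x_sorted - z))


rng = np.random.default_rng(2)
n, eps, theta = 5000, 0.1, 0.0
x = rng.standard_normal(n) + theta
x[rng.random(n) < eps] = theta + 10.0  # hypothetical point-mass contamination
theta_hat = wgan_1d_estimate(x)
print(f"WGAN-1D estimate {theta_hat:.3f}, sample mean {x.mean():.3f}")
```

With $\epsilon = 0.1$ the estimate should land within a few tenths of $\theta$, consistent with an error of order $\epsilon$. The sample mean, by contrast, is pulled roughly $\epsilon \times 10 = 1$ away.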

  13. Minimizing the Wasserstein distance directly via the Sinkhorn divergence. [Figure: (a) WGAN in 1 dimension; (b) WGAN in p dimensions. Plotted: estimation error $\|\hat\theta - \theta\|$ against $\sqrt{p}$ for $\lambda \in \{0.1, 0.05, 0.01\}$.]
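The slides do not give details of the Sinkhorn computation, so what follows is a rough sketch under stated assumptions. It uses uniform weights on samples and Euclidean ground cost, which matches the 1-Lipschitz (W1) setting, and treats $\lambda$ as the entropic regularization strength. The entropic OT cost is computed with log-domain Sinkhorn iterations. It is then debiased into one common form of the Sinkhorn divergence, $S(X, Y) = \mathrm{OT}_\lambda(X, Y) - \tfrac12 \mathrm{OT}_\lambda(X, X) - \tfrac12 \mathrm{OT}_\lambda(Y, Y)$. Minimizing $S$ between data and generator samples over $\eta$ (not shown) is one way to fit the location directly. All function names are hypothetical.

```python
import numpy as np


def _logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along one axis."""
    m = A.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(A - m).sum(axis=axis))


def entropic_ot_cost(X, Y, lam, n_iter=500):
    """Transport cost <P, C> of the entropy-regularized plan between uniform empirical
    measures on the rows of X (n, d) and Y (m, d), with C_ij = ||X_i - Y_j||."""
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    n, m = C.shape
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternate dual updates so the row sums, then the column sums, of P match a and b.
        f = -lam * _logsumexp((g[None, :] - C) / lam + log_b[None, :], axis=1)
        g = -lam * _logsumexp((f[:, None] - C) / lam + log_a[:, None], axis=0)
    log_P = (f[:, None] + g[None, :] - C) / lam + log_a[:, None] + log_b[None, :]
    return float(np.sum(np.exp(log_P) * C))


def sinkhorn_divergence(X, Y, lam):
    """Debiased Sinkhorn divergence (one common definition)."""
    return (entropic_ot_cost(X, Y, lam)
            - 0.5 * entropic_ot_cost(X, X, lam)
            - 0.5 * entropic_ot_cost(Y, Y, lam))


rng = np.random.default_rng(3)
X = rng.standard_normal((100, 1))
S_same = sinkhorn_divergence(X, X, lam=0.1)
S_shift = sinkhorn_divergence(X, X + 3.0, lam=0.1)
print(f"S(X, X) = {S_same:.4f}, S(X, X + 3) = {S_shift:.3f}")
```

The debiasing terms make $S(X, X) = 0$, and shifting the sample by 3 gives a value close to the 1-Wasserstein distance of 3. Smaller $\lambda$ tracks W1 more closely but needs more iterations to converge.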

  14. Extensions of f-GAN: unknown covariance; sparse mean estimation. ◮ $\theta$ has at most $s$ nonzero entries. ◮ Sparsity constraints on both the discriminator and the generator. Discriminator: $\mathcal{T} = \{ g(\sum_{i=1}^{l} w_i \sigma(u_i^\top x + b_i)) : \|w\|_1 \le \kappa, \ \|u_i\|_0 \le 2s \}$. Generator: $\{ N(\eta, I_p) : \|\eta\|_0 \le s \}$. Theorem: $\|\hat\theta_n - \theta\| \asymp \sqrt{s \log(ep/s)/n} \vee \epsilon$.
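Two ingredients of the sparse extension are easy to write down. The generator constraint $\|\eta\|_0 \le s$ can be enforced by hard thresholding, that is, keeping the $s$ largest-magnitude entries. The other ingredient is the rate itself. The sketch assumes projected updates on $\eta$; the slides do not say how the sparse GAN is optimized. It also drops the constants hidden in $\asymp$. The helper names are hypothetical.

```python
import math

import numpy as np


def project_sparse(eta, s):
    """Euclidean projection onto {eta : ||eta||_0 <= s}: keep the s largest-magnitude entries."""
    out = np.zeros_like(eta)
    if s <= 0:
        return out  # argsort(...)[-0:] would select everything, so handle s = 0 explicitly
    idx = np.argsort(np.abs(eta))[-s:]
    out[idx] = eta[idx]
    return out


def sparse_rate(n, p, s, eps):
    """sqrt(s * log(e * p / s) / n) v eps from the sparse theorem, ignoring constants."""
    return max(math.sqrt(s * math.log(math.e * p / s) / n), eps)


def dense_rate(n, p, eps):
    """sqrt(p / n) v eps from the non-sparse f-GAN theorem, ignoring constants."""
    return max(math.sqrt(p / n), eps)


print(project_sparse(np.array([0.1, -3.0, 2.0, 0.5]), 2))
print(sparse_rate(n=1000, p=1000, s=10, eps=0.0), dense_rate(n=1000, p=1000, eps=0.0))
```

For $p = n = 1000$ and $s = 10$ the sparse rate is about $0.24$, versus $1.0$ for $\sqrt{p/n}$. This shows how much the sparsity constraints on both players can buy when $\epsilon$ is small.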

  15. Simulation (sparse mean estimation). [Figure: (a) varying sparsity $s$; (b) sparse vs. non-sparse.]

  16. Summary: characterized the minimax optimality of several GAN formulations.
     – Complete characterization of the discriminator function class
     – Computational complexity of GANs
