On Minimax Optimality of GANs for Robust Mean Estimation


SLIDE 1

On Minimax Optimality of GANs for Robust Mean Estimation

Kaiwen Wu (1,2), with Gavin Weiguang Ding (3), Ruitong Huang (3) and Yaoliang Yu (1,2)
(1) University of Waterloo  (2) Vector Institute  (3) Borealis AI

Wu, Ding, Huang & Yu GANs for Robust Mean Estimation 1 / 13

SLIDE 2

The success of generative adversarial networks (GANs)

(Arjovsky et al. 2017; Goodfellow et al. 2014; Li et al. 2017; Miyato et al. 2018)

SLIDE 3

The success of generative adversarial networks (GANs)

(Arjovsky et al. 2017; Goodfellow et al. 2014; Li et al. 2017; Miyato et al. 2018)

But... what if the training data is noisy?

SLIDE 4

Huber's contamination model (Huber 1964): X₁, X₂, …, Xₙ ∼ (1 − ε) N(θ, I_p) + ε H

SLIDE 5

Huber's contamination model (Huber 1964): X₁, X₂, …, Xₙ ∼ (1 − ε) N(θ, I_p) + ε H

Goal: compute an estimator θ̂ ≈ θ
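As a concrete illustration (ours, not from the slides), drawing a contaminated sample from the Huber model takes a few lines; the point-mass contamination H below is an arbitrary illustrative choice:

```python
import numpy as np

def sample_huber(n, theta, eps, contaminate, rng):
    """Draw n points from the mixture (1 - eps) * N(theta, I_p) + eps * H."""
    p = theta.shape[0]
    X = theta + rng.standard_normal((n, p))    # inliers from N(theta, I_p)
    mask = rng.random(n) < eps                 # each point is contaminated w.p. eps
    X[mask] = contaminate(int(mask.sum()), p)  # outliers drawn from H
    return X

rng = np.random.default_rng(0)
theta = np.zeros(100)
point_mass = lambda m, p: np.full((m, p), 5.0)  # H: a point mass far from theta
X = sample_huber(2000, theta, eps=0.1, contaminate=point_mass, rng=rng)
```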

SLIDE 6

Goal: small worst-case error sup_H E‖θ̂ − θ‖ over contaminations H.

- Sample average: infinite error in the worst case
- Coordinate-wise median: √p ε error
- Tukey's median (Tukey 1975): optimal error ε, but NP-hard to compute
- Statistically optimal and computationally feasible estimators exist (Diakonikolas et al. 2016; Lai et al. 2016)
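A quick simulation (ours, not the authors') makes the gap concrete: a distant point mass drags the sample average far from θ, while the coordinate-wise median stays within roughly √p ε:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, eps = 100, 5000, 0.1
theta = np.zeros(p)

X = theta + rng.standard_normal((n, p))  # inliers from N(theta, I_p)
X[: int(eps * n)] = 10.0                 # 10% contamination at a distant point mass

mean_err = np.linalg.norm(X.mean(axis=0) - theta)          # dragged by outliers
median_err = np.linalg.norm(np.median(X, axis=0) - theta)  # roughly sqrt(p) * eps
print(mean_err, median_err)
```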

SLIDE 7

Robust Mean Estimation via GANs

θ̂ := argmin_η sup_{T ∈ T} E_data[T(X)] − E_{N(η, I_p)}[s(T(Y))]

N(η, I_p) is the generator; T is the discriminator, drawn from the function class T.

Which discriminator class T guarantees small estimation error?
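To see how the choice of T shapes the estimator, here is a toy example of ours (with s taken to be the identity): for the linear class {x ↦ vᵀx : ‖v‖₂ ≤ 1}, the inner sup has the closed form ‖mean(X) − η‖₂, so the GAN estimator degenerates to the sample average, exactly the non-robust baseline; richer discriminator classes are what make robustness possible.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 5)) + 1.0   # clean data from N(1, I_5)

def inner_sup(eta, X):
    # sup_{||v||<=1} E_data[v.T X] - E_{N(eta,I)}[v.T Y] = ||mean(X) - eta||
    return np.linalg.norm(X.mean(axis=0) - eta)

# Minimizing over eta by (sub)gradient descent recovers the sample average
eta = np.zeros(5)
for _ in range(500):
    diff = X.mean(axis=0) - eta
    eta += 0.05 * diff / (np.linalg.norm(diff) + 1e-12)
print(np.linalg.norm(eta - X.mean(axis=0)))
```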

SLIDE 8

f-GAN (Nowozin et al. 2016)

Discriminator is a one-hidden-layer network:
T = { Σ_{i=1}^l wᵢ σ(uᵢᵀx + bᵢ) : ‖w‖₁ ≤ κ }

Theorem (f-GAN). Under mild assumptions on the activations, with high probability
‖θ̂ₙ − θ‖ ≲ √(p/n) ∨ ε.

This generalizes the results of Gao et al. (2019) on TV-GAN and JS-GAN.
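A sketch of ours of this discriminator class (sigmoid is an illustrative activation choice, and the sort-based ℓ1 projection is a standard way to enforce ‖w‖₁ ≤ κ, not something the slides specify):

```python
import numpy as np

def discriminator(x, w, U, b):
    """T(x) = sum_i w_i * sigmoid(u_i.T x + b_i): one hidden layer."""
    return w @ (1.0 / (1.0 + np.exp(-(U @ x + b))))

def project_l1(w, kappa):
    """Euclidean projection of w onto the l1 ball of radius kappa."""
    if np.abs(w).sum() <= kappa:
        return w
    u = np.sort(np.abs(w))[::-1]               # magnitudes, descending
    css = np.cumsum(u) - kappa
    rho = np.nonzero(u * np.arange(1, w.size + 1) > css)[0][-1]
    tau = css[rho] / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

w = project_l1(np.array([3.0, -2.0, 0.5]), kappa=1.0)  # projects to [1, 0, 0]
```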

SLIDE 9

MMD-GAN (Dziugaite et al. 2015; Li et al. 2017)

Discriminator is the unit ball of an RKHS: T = { f ∈ H_k : ‖f‖_{H_k} ≤ 1 }
We focus on the Gaussian kernel: k(x, y) = exp(−‖x − y‖² / (2σ²))

Theorem. With appropriate tuning of the bandwidth (σ = √p),
‖θ̂ₙ − θ‖ ≲ √(p/n) ∨ √p ε.
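A small numerical sketch of ours (a crude line search stands in for the full optimization, and the point-mass contamination is an arbitrary choice): estimate θ by minimizing the empirical MMD between the data and a sample from N(η, I_p):

```python
import numpy as np

def mmd2(X, Y, sigma):
    """Biased empirical squared MMD with the Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() - 2.0 * k(X, Y).mean() + k(Y, Y).mean()

rng = np.random.default_rng(4)
p, n, eps = 4, 300, 0.1
sigma = np.sqrt(p)                          # bandwidth sigma = sqrt(p)
X = rng.standard_normal((n, p)) + 2.0       # inliers: theta = (2, ..., 2)
X[: int(eps * n)] = 8.0                     # contamination at a point mass
Z = rng.standard_normal((n, p))             # fixed noise: Y = eta + Z ~ N(eta, I_p)

# crude line search for eta along the all-ones direction
ts = np.linspace(0.0, 4.0, 81)
vals = [mmd2(X, t * np.ones(p) + Z, sigma) for t in ts]
eta_hat = ts[int(np.argmin(vals))] * np.ones(p)
print(np.linalg.norm(eta_hat - 2.0))
```

The Gaussian kernel downweights the distant outliers exponentially, so the minimizer lands near the true mean here; the theorem's √p ε term shows this protection degrades with dimension.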

SLIDE 10

MMD-GAN (Dziugaite et al. 2015; Li et al. 2017)

Discriminator is the unit ball of an RKHS: T = { f ∈ H_k : ‖f‖_{H_k} ≤ 1 }
We focus on the Gaussian kernel: k(x, y) = exp(−‖x − y‖² / (2σ²))

Theorem. With appropriate tuning of the bandwidth (σ = √p),
‖θ̂ₙ − θ‖ ≲ √(p/n) ∨ √p ε.

Theorem. For any bandwidth σ, there exists a contamination H such that
‖θ̂ − θ‖ ≳ √p ε.

SLIDE 11

Simulation

Figure: (a) estimation error ‖θ̂ − θ‖ against ‖θ̃ − θ‖/√p for bandwidths σ ∈ {5, 7.5, 10, 15, 20} and contamination locations θ̃, in dimension 100; (b) estimation error against √p (from 3 to 10) for different dimensions p, with σ = √p.
SLIDE 12

Wasserstein GAN (Arjovsky et al. 2017)

Discriminator is the class of 1-Lipschitz functions: T = { f : |f(x) − f(y)| ≤ ‖x − y‖, ∀x, y ∈ X }.

Theorem. In one dimension, the estimation error is bounded: |θ̂ − θ| ≍ ε.
In high dimensions, ‖θ̂ − θ‖ ≍ √p ε empirically.
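The one-dimensional claim can be checked numerically (a sketch of ours, with an arbitrary point-mass contamination): W₁ between the empirical measure and N(η, 1) is approximated by matching sorted samples, and the minimizer over η of Σᵢ |x₍ᵢ₎ − (η + g₍ᵢ₎)| is the median of the residuals x₍ᵢ₎ − g₍ᵢ₎, so the error stays O(ε) even though 10% of the data sits at 50:

```python
import numpy as np

rng = np.random.default_rng(5)
n, eps, theta = 2001, 0.1, 3.0
x = theta + rng.standard_normal(n)
x[: int(eps * n)] = 50.0             # 10% contamination at a distant point

g = np.sort(rng.standard_normal(n))  # sorted reference sample from N(0, 1)
# argmin_eta sum_i |x_(i) - (eta + g_(i))| is the median of the residuals
eta_hat = np.median(np.sort(x) - g)
print(abs(eta_hat - theta))
```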

SLIDE 13

Minimizing Wasserstein distance directly by Sinkhorn divergence

Figure: (a) WGAN in 1 dimension; (b) WGAN in p dimensions: estimation error ‖θ̂ − θ‖ against √p (from 2 to 10) for regularization λ ∈ {0.1, 0.05, 0.01}.
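A minimal Sinkhorn sketch of ours in one dimension (absolute-distance cost to mimic W₁; the regularization λ is larger than the values in the plot purely for numerical stability, since small λ would need log-domain iterations):

```python
import numpy as np

def sinkhorn_cost(x, y, lam, iters=200):
    """Entropic OT cost <P, C> between two 1-d samples, cost |x - y|."""
    n, m = x.size, y.size
    C = np.abs(x[:, None] - y[None, :])
    K = np.exp(-C / lam)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                           # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                  # transport plan
    return float((P * C).sum())

rng = np.random.default_rng(6)
n, eps, theta = 200, 0.1, 3.0
x = theta + rng.standard_normal(n)
x[: int(eps * n)] = 8.0                              # 10% contamination
z = rng.standard_normal(n)                           # Y = t + z ~ N(t, 1)

ts = np.linspace(2.0, 4.0, 21)                       # crude line search over t
costs = [sinkhorn_cost(x, t + z, lam=0.5) for t in ts]
t_hat = ts[int(np.argmin(costs))]
print(abs(t_hat - theta))
```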

SLIDE 14

Extension of f-GAN

- Unknown covariance
- Sparse mean estimation: θ has at most s nonzero entries; sparsity constraints on both the discriminator and the generator

Discriminator: T = { Σ_{i=1}^l wᵢ σ(uᵢᵀx + bᵢ) : ‖w‖₁ ≤ κ, ‖uᵢ‖₀ ≤ 2s }

Generator: { N(η, I_p) : ‖η‖₀ ≤ s }

Theorem. ‖θ̂ₙ − θ‖ ≍ √(s log(ep/s) / n) ∨ ε.
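The benefit of exploiting sparsity can be sanity-checked with a much simpler (non-GAN) stand-in of ours: a coordinate-wise median followed by hard thresholding to the s largest-magnitude coordinates:

```python
import numpy as np

def sparse_robust_mean(X, s):
    """Coordinate-wise median, hard-thresholded to its s largest-magnitude coordinates."""
    med = np.median(X, axis=0)
    keep = np.argsort(np.abs(med))[-s:]  # indices of the s largest coordinates
    out = np.zeros_like(med)
    out[keep] = med[keep]
    return out

rng = np.random.default_rng(7)
p, n, s, eps = 200, 1000, 5, 0.05
theta = np.zeros(p)
theta[:s] = 3.0                          # s-sparse true mean
X = theta + rng.standard_normal((n, p))
X[: int(eps * n)] = -4.0                 # 5% contamination at a point mass
theta_hat = sparse_robust_mean(X, s)
print(np.linalg.norm(theta_hat - theta))
```

Thresholding zeroes out the noise in the p − s null coordinates, which is where the √(s log(ep/s)/n) rate (rather than √(p/n)) comes from.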

SLIDE 15

Simulation

Figure: (a) varying sparsity s; (b) sparse vs. non-sparse estimation.

SLIDE 16

Summary

Characterize minimax optimality of several GAN formulations
- Complete characterization of the discriminator function class
- Computational complexity of GANs
