SLIDE 1

On the Statistical Rate of Nonlinear Recovery in Generative Models with Heavy-tailed Data

Xiaohan Wei, Zhuoran Yang, and Zhaoran Wang

University of Southern California, Princeton University and Northwestern University

June 12th, 2019

SLIDES 2–4

Generative Model vs Sparsity in Signal Recovery

  • Classical sparsity: the structure of the signals depends on the basis.
  • Generative model: explicit parametrization of a low-dimensional signal manifold.
  • Previous works: [Bora et al. 2017], [Hand et al. 2018], [Mardani et al. 2017].

SLIDES 5–9

Nonlinear Recovery via Generative Models

Given: a generative model G : Rk → Rd and a measurement matrix X ∈ Rm×d.

Goal: recover G(θ∗) up to scaling from nonlinear observations y = f(XG(θ∗)).

Challenges:

  1. High-dimensional recovery: k ≪ d, m ≪ d.
  2. Non-Gaussian X and unknown nonlinearity f.
  3. Observations y can be heavy-tailed.
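To make the setup concrete, here is a small synthetic instance of the observation model. This is a minimal numpy sketch: the generator weights, the Student-t measurement rows, and the cubic nonlinearity are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, m = 5, 500, 200               # k << d and m << d: the high-dimensional regime

# Illustrative two-layer ReLU generator with zero bias, G : R^k -> R^d.
W1 = rng.normal(size=(64, k)) / np.sqrt(k)
W2 = rng.normal(size=(d, 64)) / np.sqrt(64)

def G(theta):
    return W2 @ np.maximum(W1 @ theta, 0.0)

theta_star = rng.normal(size=k)     # unknown latent code

# Non-Gaussian measurement matrix with a known row density p
# (here: i.i.d. Student-t entries, so p factorizes entrywise).
X = rng.standard_t(df=5.0, size=(m, d))

# Unknown entrywise nonlinearity; cubing makes the observations heavy-tailed.
f = lambda u: u ** 3
y = f(X @ G(theta_star))
```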

SLIDES 10–12

Our Method: Stein + Adaptive Thresholding

Suppose the rows of X := [X1, · · · , Xm]T ∈ Rm×d have density p : Rd → R. Define the (row-wise) score transformation:

  Sp(X) := [Sp(X1), · · · , Sp(Xm)]T = [∇ log p(X1), · · · , ∇ log p(Xm)]T.

(First-order) Stein's identity: when E[f′(⟨Xi, G(θ∗)⟩)] > 0,

  E[Sp(X)T y] ∝ G(θ∗).

(Second-order) Stein's identity: when E[f′′(⟨Xi, G(θ∗)⟩)] > 0, there is a constant δ such that

  E[Sp(X)T diag(y) Sp(X)] ∝ G(θ∗)G(θ∗)T + δ · Id×d.

Adaptive thresholding: suppose ‖yi‖Lq < ∞ for some q > 4, and set τm ∝ m^(2/q); truncate

  ỹi = sign(yi) · (|yi| ∧ τm),  i ∈ {1, 2, · · · , m}.
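Both ingredients are one-liners once the row density p is known. Below is a minimal numpy sketch under the synthetic setup above (i.i.d. Student-t entries, so the score acts coordinate-wise); the function names and the choice q = 5 are illustrative, not from the paper.

```python
import numpy as np

def score_t(X, df=5.0):
    # Row-wise score transformation Sp(X) = [grad log p(X1), ..., grad log p(Xm)]^T
    # for i.i.d. Student-t entries: d/dx log t_df(x) = -(df + 1) x / (df + x^2).
    return -(df + 1.0) * X / (df + X ** 2)

def truncate(y, q=5.0):
    # Adaptive thresholding with tau_m proportional to m^(2/q): clip |y_i| at tau_m.
    tau = y.shape[0] ** (2.0 / q)
    return np.sign(y) * np.minimum(np.abs(y), tau)

def stein_direction(X, y, df=5.0, q=5.0):
    # First-order Stein estimate: (1/m) Sp(X)^T y~ is proportional to G(theta*)
    # in expectation, so normalizing recovers the direction up to sign/scaling.
    v = score_t(X, df).T @ truncate(y, q) / y.shape[0]
    return v / np.linalg.norm(v)
```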
SLIDES 13–14

Our Method: Stein + Adaptive Thresholding

Least-squares estimator:

  θ̂ ∈ argminθ∈Rk ‖G(θ) − (1/m) Sp(X)T ỹ‖2².

Main performance theorem:

Theorem (Wei, Yang and Wang, 2019)
For any accuracy level ε ∈ (0, 1], suppose (1) E[f′(⟨Xi, G(θ∗)⟩)] > 0, (2) the generative model G is a ReLU network with zero bias, and (3) the number of measurements satisfies m ∝ kε⁻² log d. Then, with high probability,

  ‖ G(θ̂)/‖G(θ̂)‖2 − G(θ∗)/‖G(θ∗)‖2 ‖2 ≤ ε.

Similar results hold for more general Lipschitz generators G.
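The least-squares step fits θ to the Stein vector v = (1/m) Sp(X)T ỹ, which is a nonconvex problem; the slides do not prescribe a particular solver. Below is a hedged gradient-descent sketch for the two-layer ReLU generator from the earlier snippet (fit_theta_ls, the step size, and the iteration count are all illustrative, not the paper's method).

```python
import numpy as np

def fit_theta_ls(v, W1, W2, steps=2000, lr=1e-2, seed=0):
    # Gradient descent on L(theta) = ||G(theta) - v||_2^2 with
    # G(theta) = W2 relu(W1 theta); one heuristic for the argmin, not *the* solver.
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=W1.shape[1])
    for _ in range(steps):
        h = W1 @ theta
        r = W2 @ np.maximum(h, 0.0) - v               # residual G(theta) - v
        grad = W1.T @ ((h > 0) * (W2.T @ (2.0 * r)))  # chain rule through the ReLU
        theta -= lr * grad
    return theta

# Usage under the synthetic setup above:
# v = score_t(X).T @ truncate(y) / y.shape[0]
# theta_hat = fit_theta_ls(v, W1, W2)
```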

SLIDES 15–16

Our Method: Stein + Adaptive Thresholding

PCA-type estimator:

  θ̂ ∈ argmax‖G(θ)‖2=1 G(θ)T Sp(X)T diag(ỹ) Sp(X) G(θ).

Main performance theorem:

Theorem (Wei, Yang and Wang, 2019)
For any accuracy level ε ∈ (0, 1], suppose (1) E[f′′(⟨Xi, G(θ∗)⟩)] > 0, (2) the generative model G is a ReLU network with zero bias, and (3) the number of measurements satisfies m ∝ kε⁻² log d. Then, with high probability,

  ‖ G(θ̂) − G(θ∗)/‖G(θ∗)‖2 ‖2 ≤ ε.

Similar results hold for more general Lipschitz generators G.
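By the second-order Stein identity, M = (1/m) Sp(X)T diag(ỹ) Sp(X) concentrates around a rank-one spike plus a multiple of the identity, so its top eigenvector aligns with ±G(θ∗)/‖G(θ∗)‖2. One simple surrogate for the constrained maximization is therefore: take the top eigenvector of M, then project it onto the range of G. The sketch below is a hedged heuristic reusing the helpers defined earlier, not the paper's exact procedure.

```python
import numpy as np

def pca_direction(X, y, W1, W2, df=5.0, q=5.0):
    # Build M = (1/m) Sp(X)^T diag(y~) Sp(X), take its top eigenvector,
    # then project onto the generator's range via the least-squares solver.
    S = score_t(X, df)
    yt = truncate(y, q)
    M = (S.T * yt) @ S / y.shape[0]      # equals S^T diag(yt) S / m
    w, V = np.linalg.eigh(M)             # symmetric eigendecomposition
    u = V[:, np.argmax(w)]               # top eigenvector, sign is arbitrary
    # Heuristic projection onto range(G); in practice one would fit both +u
    # and -u and keep whichever gives the smaller residual.
    theta = fit_theta_ls(u, W1, W2)
    g = W2 @ np.maximum(W1 @ theta, 0.0)
    return g / np.linalg.norm(g)
```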

SLIDE 17

Thank you!

Poster 198, Pacific Ballroom, 6:30-9:00 pm