Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming (PowerPoint PPT Presentation)

SLIDE 1

May 9

Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming

Yuxin Chen†, Yuejie Chi∗, Andrea J. Goldsmith† Stanford University†, Ohio State University∗

Page 1

SLIDE 2
High-Dimensional Sequential Data / Signals

  • Data Stream / Stochastic Processes
  • Each data instance can be high-dimensional
  • We're interested in the information in the data rather than the data themselves

  • Covariance Estimation
  • second-order statistics Σ ∈ Rn×n
  • cornerstone of many information processing tasks

Page 2

SLIDE 3

What are Quadratic Measurements?

  • Quadratic Measurements
  • obtain m measurements of Σ taking the form

      y_i ≈ a_i⊤ Σ a_i,   1 ≤ i ≤ m

  • rank-1 measurements!

Page 3
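As a toy illustration of this measurement model (sizes and the Gaussian sampling vectors below are assumptions, not from the talk), each y_i is a linear functional of Σ against the rank-1 matrix a_i a_i⊤:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 3, 100

# Hypothetical ground-truth rank-r covariance Sigma = L L^T.
L = rng.standard_normal((n, r))
Sigma = L @ L.T

# m rank-1 quadratic measurements y_i = a_i^T Sigma a_i.
A = rng.standard_normal((m, n))           # rows are the sampling vectors a_i
y = np.einsum("in,nk,ik->i", A, Sigma, A)

# Each y_i equals <a_i a_i^T, Sigma>, i.e. a LINEAR functional of Sigma
# against the rank-1 matrix a_i a_i^T -- hence "rank-1 measurements".
y_alt = np.array([np.trace(np.outer(a, a) @ Sigma) for a in A])
assert np.allclose(y, y_alt)
```

Because Σ is positive semidefinite, every y_i = ‖L⊤a_i‖² is nonnegative, which is what makes these "energy" measurements in the applications that follow.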


SLIDE 5

Example: Applications in Spectral Estimation

  • High-frequency wireless and signal processing (Energy Measurements)
  • Spectral estimation of stationary processes (possibly sparse)
  • Channel Estimation in MIMO Channels

Page 4


SLIDE 7

[Figures: phase-space tomography and phase-retrieval illustrations; courtesy of Chi et al. and Candès et al.]

Example: Applications in Optics

  • Phase Space Tomography
  • measure correlation functions of a wave field
  • Phase Retrieval
  • signal recovery from magnitude measurements

Page 5

SLIDE 8

[Figure: binary data stream, by Kazmin]

Example: Applications in Data Streams

  • Covariance Sketching
  • data stream: real-time data {x_t}_{t=1}^∞ arriving sequentially at a high rate

  • Challenges
  • limited memory
  • computational efficiency
  • hopefully a single pass over the data

Page 6


SLIDE 11

Proposed Quadratic Sketching Method

1) Sketching:

  • at each time t, obtain a quadratic sketch (a_i⊤ x_t)²
    — a_i: sketching vector

2) Aggregation:

  • all sketches are aggregated into m measurements

      y_i = a_i⊤ ( (1/T) Σ_{t=1}^T x_t x_t⊤ ) a_i ≈ a_i⊤ Σ a_i,   1 ≤ i ≤ m

  • Benefits:
  • one pass
  • minimal storage (as will be shown)

Page 7
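The one-pass sketch-and-aggregate scheme can be simulated in a few lines; the stream, sizes, and Gaussian sketching vectors below are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, T = 10, 40, 500

A = rng.standard_normal((m, n))   # fixed sketching vectors a_i
s = np.zeros(m)                   # running sums: the only state we keep

# One pass over the stream: each x_t is touched once and then discarded.
X = rng.standard_normal((T, n))   # stand-in for the stream {x_t}
for x_t in X:
    s += (A @ x_t) ** 2           # quadratic sketches (a_i^T x_t)^2

y = s / T                         # aggregated measurements

# By linearity, y_i = a_i^T (1/T sum_t x_t x_t^T) a_i exactly,
# even though the sample covariance was never formed in memory.
Sigma_emp = X.T @ X / T
y_check = np.einsum("in,nk,ik->i", A, Sigma_emp, A)
assert np.allclose(y, y_check)
```

Note the storage: m scalars plus the sketching vectors, rather than the n × n sample covariance, which is the point of the "minimal storage" claim.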

SLIDE 12

Problem Formulation

  • Given: m (≪ n²) quadratic measurements y = {y_i}_{i=1}^m

      y_i = a_i⊤ Σ a_i + η_i,   i = 1, · · · , m

  • a_i: sampling vectors
  • η = {η_i}_{i=1}^m: noise terms
  • more concise operator form:

      y = A(Σ) + η

  • Goal: recover Σ ∈ Rn×n
  • Sampling model
  • sub-Gaussian i.i.d. sampling vectors

Page 8

SLIDE 13

Geometry of Covariance Structure

[Illustration (after Piet Mondrian): 1) low rank 2) Toeplitz low rank 3) jointly sparse and low rank]

  • # unknowns > # stored measurements
  • exploit low-dimensional structures!
  • Structures considered in this talk:
  • low rank
  • Toeplitz low rank
  • simultaneously sparse and low-rank

Page 9

SLIDE 14

Low Rank

  • Low-Rank Structure:
  • a few components explain most of the data variability
  • metric learning, array signal processing, collaborative filtering ...
  • rank(Σ) = r ≪ n.

Page 10

SLIDE 15

Trace Minimization for Low-Rank Structure

  • Trace Minimization

      (TraceMin)   minimize_M  trace(M)              ← low rank
                   s.t.  ‖A(M) − y‖₁ ≤ ǫ,            ← noise bound
                         M ⪰ 0

  • inspired by Candès et al. for phase retrieval

Page 11

SLIDE 16

Near-Optimal Recovery for Low-Rank Structure

minimize tr(M) s.t. ‖A(M) − y‖₁ ≤ ǫ, M ⪰ 0

Theorem 1 (Low Rank). With high prob., for all Σ with rank(Σ) ≤ r, the solution Σ̂ to TraceMin obeys

      ‖Σ̂ − Σ‖_F ≲ ‖Σ − Σ_r‖_∗ / √r    ← due to imperfect structure
                  + ǫ / m              ← due to noise

provided that m ≳ rn. (Σ_r: rank-r approx of Σ)

  • Exact recovery in the noiseless case
  • Universal recovery: simultaneously works for all low-rank matrices
  • Robust recovery when Σ is approximately low-rank
  • Stable recovery against bounded noise

Page 12

SLIDE 17

Phase Transition for Low-Rank Recovery

[Figure: empirical success probability over Monte Carlo trials (n = 50), r/n vs m/n², with the theoretic sampling limit overlaid]

  • Near-Optimal Storage Complexity!
  • degrees of freedom ≈ rn

Page 13

SLIDE 18

Toeplitz Low Rank

  • Toeplitz Low-Rank Structure:
  • Spectral sparsity!
    ∗ possibly off-the-grid frequency spikes (Vandermonde decomposition)
  • wireless communication, array signal processing ...
  • rank(Σ) = r ≪ n.

Page 14
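A minimal sketch of the Vandermonde decomposition behind spectral sparsity, with hypothetical frequencies and powers (complex-valued for simplicity; the talk's setting is the real analogue):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 16, 3

# Hypothetical off-the-grid frequencies in [0, 1) and positive powers.
freqs = rng.uniform(0, 1, size=r)
powers = rng.uniform(0.5, 2.0, size=r)

# Vandermonde decomposition: Sigma = sum_j p_j v(f_j) v(f_j)^H,
# where v(f) = [1, e^{i 2 pi f}, ..., e^{i 2 pi f (n-1)}].
V = np.exp(2j * np.pi * np.outer(np.arange(n), freqs))  # n x r Vandermonde
Sigma = (V * powers) @ V.conj().T

# Sigma[a, b] depends only on a - b, so it is Hermitian Toeplitz,
# and its rank equals the number of spectral spikes r << n.
assert np.allclose(Sigma[1:, 1:], Sigma[:-1, :-1])      # Toeplitz
assert np.linalg.matrix_rank(Sigma) == r                # low rank
```

This is why a Toeplitz low-rank covariance has only ≈ r degrees of freedom (frequencies plus powers), far fewer than the rn of a generic rank-r matrix.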

SLIDE 19

Trace Minimization for Toeplitz Low-Rank Structure

  • Trace Minimization

      (ToepTraceMin)   minimize_M  trace(M)              ← low rank
                       s.t.  ‖A(M) − y‖₂ ≤ ǫ₂,           ← noise bound
                             M ⪰ 0, M is Toeplitz

Page 15

SLIDE 20

Near-Optimal Recovery for Toeplitz Low-Rank Structure

minimize tr(M) s.t. ‖A(M) − y‖₂ ≤ ǫ₂, M ⪰ 0, M is Toeplitz

Theorem 2 (Toeplitz Low Rank). With high prob., for all Toeplitz Σ with rank(Σ) ≤ r, the solution Σ̂ to ToepTraceMin obeys

      ‖Σ̂ − Σ‖_F ≲ ǫ₂ / √m    ← due to noise

provided that m ≳ r poly log(n).

  • Exact recovery in the absence of noise
  • Universal recovery: simultaneously works for all Toeplitz low-rank matrices
  • Stable recovery against bounded noise

[Illustration: Toeplitz ball]

Page 16

SLIDE 21

Phase Transition for Toeplitz Low-Rank Recovery

[Figure: empirical success probability over Monte Carlo trials (n = 50), rank r vs number of measurements m, with the theoretic sampling limit overlaid]

  • Near-Optimal Storage Complexity!
  • degrees of freedom ≈ r

Page 17

SLIDE 22

Simultaneous Structure

  • Joint Structure: Σ is simultaneously sparse and low-rank.
  • rank: r
  • sparsity: k
  • SVD: Σ = UΛU⊤, where U = [u_1, · · · , u_r]

Page 18
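A quick construction of a jointly sparse, rank-r covariance of the form Σ = UΛU⊤ (toy sizes and random support chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, r = 50, 8, 2

# U has only k nonzero rows, so Sigma = U Lambda U^T is supported on a
# k x k principal submatrix: jointly k-sparse and rank-r.
support = rng.choice(n, size=k, replace=False)
U = np.zeros((n, r))
U[support] = rng.standard_normal((k, r))
Lam = np.diag(rng.uniform(1.0, 2.0, size=r))
Sigma = U @ Lam @ U.T

assert np.linalg.matrix_rank(Sigma) == r
nz_rows = np.unique(np.nonzero(Sigma)[0])
assert set(nz_rows) <= set(support)      # sparsity inherited from U
```

Such a matrix has on the order of kr free parameters, the degree-of-freedom count that the recovery guarantees below are measured against.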

SLIDE 23

Convex Relaxation for Simultaneous Structure

  • Convex Relaxation

      minimize_M  trace(M) + λ‖M‖₁        ← low rank + sparsity
      s.t.  ‖A(M) − y‖₁ ≤ ǫ,              ← noise bound
            M ⪰ 0

  • coincides with Li and Voroninski for the rank-1 case

Page 19

SLIDE 24

Exact Recovery for Simultaneous Structure

minimize tr(M) + λ‖M‖₁ s.t. A(M) = y, M ⪰ 0

Theorem 3 (Simultaneous Structure). The SDP with λ ∈ [1/n, 1/N_Σ] is exact with high probability, provided that

      m ≳ r log n / λ²    (1)

where N_Σ := max{ ‖sign(Σ_Ω)‖, √( (k/r) Σ_{i=1}^r ‖u_i‖₁² ) }.

  • Exact recovery with appropriate regularization parameters
  • Question: how good is the storage complexity (1)?

Page 20


SLIDE 26

Compressible Covariance Matrices: Near-Optimal Recovery

Definition (Compressible Matrices)

  • non-zero entries of u_i exhibit power-law decays
  • ‖u_i‖₁ = O(poly log(n)).

Corollary 1 (Compressible Case). For compressible covariance matrices, the SDP with λ ≈ 1/√k is exact w.h.p., provided that

      m ≳ kr · poly log(n).

  • Near-Minimal Measurements!
  • degrees of freedom: Θ(kr)

Page 22
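To see why Θ(kr) measurements are near-minimal, compare the degree-of-freedom count against the ambient n² entries of Σ for some illustrative sizes (numbers are hypothetical, not from the talk):

```python
# Ambient dimension vs. intrinsic degrees of freedom for a jointly
# k-sparse, rank-r covariance (illustrative sizes).
n, k, r = 1000, 50, 5
ambient = n * n   # entries of an unstructured Sigma: 1,000,000
dof = k * r       # parameters of a k-sparse rank-r factor: 250
print(ambient // dof)   # compression factor of the structured model
```

Up to the poly log(n) factor, the corollary says the sketch needs only about as many measurements as there are free parameters.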


SLIDE 28

Stability and Robustness

  • noise: ‖η‖₁ ≤ ǫ
  • imperfect structural assumption: Σ = Σ_Ω (simultaneously sparse and low-rank) + Σ_c (residuals)

Theorem 4. Under the same λ as in Theorem 3 or Corollary 1,

      ‖Σ̂ − Σ_Ω‖_F ≲ (1/√r) ( ‖Σ_c‖_∗ + λ‖Σ_c‖₁ )    ← due to imperfect structure
                    + ǫ / m                           ← due to noise

  • stable against bounded noise
  • robust against imperfect structural assumptions

Page 24


SLIDE 31

Mixed-Norm RIP (for Low-Rank and Joint Structure)

  • Restricted Isometry Property: a powerful notion for compressed sensing

      ∀X in some class :  ‖B(X)‖₂ ≈ ‖X‖_F

  • unfortunately, it does NOT hold for quadratic models
  • A Mixed-Norm Variant:

      (RIP-ℓ₂/ℓ₁)   ∀X in some class :  ‖B(X)‖₁ ≈ ‖X‖_F

  • does NOT hold for A, but holds after A is debiased
  • a very simple proof for PhaseLift!

Page 25
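A numerical illustration of the mixed-norm idea, assuming Gaussian sampling and a simple pairwise-differencing debiasing (one common scheme; the talk's exact debiasing operator may differ): after debiasing, the average ℓ₁ energy of the measurements scales like ‖X‖_F across random test matrices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 15, 20000   # m measurement *pairs* (hypothetical demo sizes)

def debiased_l1(X, rng):
    """(1/m) ||B(X)||_1, where B(X)_i = a_i'^T X a_i' - a_i^T X a_i
    differences two independent quadratic measurements to cancel the
    trace bias E[a^T X a] = tr(X)."""
    A1 = rng.standard_normal((m, n))
    A2 = rng.standard_normal((m, n))
    q1 = np.einsum("in,nk,ik->i", A1, X, A1)
    q2 = np.einsum("in,nk,ik->i", A2, X, A2)
    return np.mean(np.abs(q1 - q2))

ratios = []
for _ in range(3):
    G = rng.standard_normal((n, n))
    X = (G + G.T) / 2                    # random symmetric test matrix
    ratios.append(debiased_l1(X, rng) / np.linalg.norm(X, "fro"))

# The ratios cluster around a common constant: (1/m)||B(X)||_1 ~ ||X||_F,
# an RIP-l2/l1 behavior, even though the plain l2 RIP fails here.
spread = max(ratios) / min(ratios)
assert spread < 1.3
```

Without the differencing step the measurements concentrate around tr(X) rather than 0, which is why the undebiased A cannot satisfy any such isometry.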

SLIDE 32

Concluding Remarks

  • Our approach / analysis works for other structural models
  • Sparse covariance matrix
  • Low-Rank plus Sparse matrix
  • The way ahead
  • Sparse inverse covariance matrix
  • Beyond sub-Gaussian sampling
  • Online recovery algorithms

Page 26

SLIDE 33

Q&A

Full-length version available at arXiv: Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming http://arxiv.org/abs/1310.0807

Thank You! Questions?

Page 27