SLIDE 1
Spectral Analysis of Stationary Stochastic Process
Hanxiao Liu hanxiaol@cs.cmu.edu February 20, 2016
SLIDE 2 Outline
◮ Stationarity
◮ The time-frequency dual
◮ Spectral representation
◮ Marginal/conditional dependencies
◮ Inference
SLIDE 3 Stationary Stochastic Process
Strong stationarity: ∀ t_1, …, t_k, h,
    (X(t_1), …, X(t_k)) =_D (X(t_1 + h), …, X(t_k + h))    (1)
where =_D denotes equality in distribution.
Weak/2nd-order stationarity:
    E|X(t)|^2 < ∞    ∀ t    (2)
    E(X(t)) = µ    ∀ t    (3)
    Cov(X(t), X(t + h)) = Γ(h)    ∀ t, h    (4)
The r.h.s. does not depend on t.
◮ Γ(h): the autocovariance function (marginal dependencies)
◮ Γ(0): the variance (power) of X
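To make the definitions concrete, here is a minimal Python sketch (my own illustration, not from the slides) that simulates a weakly stationary AR(1) series and checks that the sample mean and a lag-5 sample autocovariance computed on two disjoint time windows roughly agree; the coefficient 0.6 and the window boundaries are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a weakly stationary AR(1): X(t) = 0.6 X(t-1) + eps(t), eps ~ N(0, 1).
N = 20000
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.6 * x[t - 1] + rng.normal()

def autocov(z, h):
    """Sample autocovariance at lag h."""
    zc = z - z.mean()
    return np.mean(zc * zc) if h == 0 else np.mean(zc[:-h] * zc[h:])

# Statistics from two disjoint windows should roughly agree, because neither
# the mean nor Cov(X(t), X(t+h)) depends on t.
w1, w2 = x[1000:10000], x[10000:19000]
print(w1.mean(), w2.mean())            # both close to 0
print(autocov(w1, 5), autocov(w2, 5))  # both close to 0.6**5 / (1 - 0.6**2) ≈ 0.12
```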
SLIDE 4 Spectral Representation Theorem
X(t) = ∫_{-π}^{π} e^{iωt} dZ(ω)    (5)
◮ E[dZ(ω) dZ^*(ω′)] = 0 if ω ≠ ω′ (orthogonal increments).
◮ ^* denotes the Hermitian (conjugate) transpose.
Compared to X(t) itself, we are more interested in Γ(h) (see illustrative animations A and B).
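As a rough numerical illustration of (5) (my own sketch, not part of the slides), one can synthesize a real-valued stationary series from independent random spectral increments on a finite frequency grid; the AR(1)-shaped target density, the grid size, and the series length below are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretize the spectral representation X(t) = \int e^{iwt} dZ(w) on a grid
# over (0, pi), using an AR(1)-shaped spectral density as the target:
# s(w) = 1 / (2*pi*|1 - phi*e^{-iw}|^2) with phi = 0.6 (arbitrary choice).
phi, K = 0.6, 2000
omega = (np.arange(K) + 0.5) * np.pi / K
dw = np.pi / K
s = 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * omega)) ** 2)

# Real-valued analogue of independent increments dZ(w): independent N(0, 1)
# amplitudes for the cosine and sine components at each grid frequency.
a, b = rng.normal(size=K), rng.normal(size=K)
t = np.arange(1500)
X = (np.sqrt(2 * s * dw) * (a * np.cos(np.outer(t, omega))
                            + b * np.sin(np.outer(t, omega)))).sum(axis=1)

# The lag-1 sample autocovariance should be close to the AR(1) value
# phi / (1 - phi^2) ≈ 0.94, as predicted by Gamma(h) = \int e^{iwh} s(w) dw.
Xc = X - X.mean()
print(np.mean(Xc[:-1] * Xc[1:]))
```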
SLIDE 5 Spectral Representation Theorem
Γ(h) = E[X(t + h) X^*(t)]    (6)
     = E[∫_ω ∫_{ω′} e^{iω(t+h)} e^{-iω′t} dZ(ω) dZ^*(ω′)]    (7)
     = ∫_ω ∫_{ω′} e^{iω(t+h)} e^{-iω′t} E[dZ(ω) dZ^*(ω′)]    (8)
     = ∫ e^{iωh} E[dZ(ω) dZ^*(ω)]    (9)
     = ∫ e^{iωh} s(ω) dω    (10)
◮ Γ(h): covariance at lag h (time domain)
◮ s(ω): covariance at frequency ω (frequency domain)
SLIDE 6 Spectral Density Function
The Fourier transform pair:
    Γ(h) = ∫_{-π}^{π} e^{iωh} s(ω) dω    (11)
    s(ω) = (1/2π) Σ_{h=-∞}^{∞} Γ(h) e^{-iωh}    (12)
We call s the spectral density function, since
    Γ(0) = ∫_{-π}^{π} s(ω) dω    (13)
i.e. Γ(0) = Cov(X(t), X(t)) is the cumulative effect of s(ω) over all frequencies.
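A small numerical sanity check of the pair (11)-(13) (my own addition) for a univariate AR(1), whose spectral density and autocovariance have standard closed forms; the coefficient and grid resolution are arbitrary.

```python
import numpy as np

phi, sigma2, K = 0.6, 1.0, 200000
omega = (np.arange(K) + 0.5) * 2 * np.pi / K - np.pi   # grid over (-pi, pi)
dw = 2 * np.pi / K

# AR(1) spectral density: s(w) = sigma^2 / (2*pi*|1 - phi*e^{-iw}|^2).
s = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * omega)) ** 2)

# Eq. (11): Gamma(h) = \int e^{iwh} s(w) dw, vs. the AR(1) closed form
# Gamma(h) = sigma^2 * phi^h / (1 - phi^2).
for h in [0, 1, 5]:
    gamma_h = (np.exp(1j * omega * h) * s).sum().real * dw
    print(h, gamma_h, sigma2 * phi ** h / (1 - phi ** 2))

# Eq. (13): Gamma(0) is the total power, i.e. the integral of s over [-pi, pi].
print(s.sum() * dw, sigma2 / (1 - phi ** 2))
```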
SLIDE 7 Marginal Dependencies
Estimating Γ(h): the sample autocovariance function
    Γ̂(h) = (1/N) Σ_{t=0}^{N−h−1} (X(t) − X̄)(X(t + h) − X̄)^⊤    (14)
Asymptotic normality holds under mild assumptions.
Estimating s(ω): the periodogram. Let ω_k = 2πk/N; then
    I(ω_k) = d(k) d(k)^* → ŝ(ω_k)    (15)
where d(k) := (1/√N) Σ_{t=0}^{N−1} X(t) e^{-iω_k t} is obtained via the DFT.
◮ a bad estimator in general
◮ a good estimator with appropriate smoothing
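A hedged Python sketch (mine, in the spirit of (14)-(15)) that computes the sample autocovariance, the raw periodogram via the FFT, and a simple moving-average smoothing of it; the data-generating AR(1), the 1/(2π) scaling convention, and the window length are my choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated AR(1) data standing in for the observed series.
N, phi = 4096, 0.6
x = np.zeros(N)
for t in range(1, N):
    x[t] = phi * x[t - 1] + rng.normal()
xc = x - x.mean()

# Sample autocovariance, eq. (14), univariate case.
gamma_hat = np.array([np.sum(xc[: N - h] * xc[h:]) / N for h in range(50)])

# Periodogram, eq. (15): d(k) = (1/sqrt(N)) * sum_t x(t) e^{-i w_k t}.
d = np.fft.fft(xc) / np.sqrt(N)
I = np.abs(d) ** 2 / (2 * np.pi)   # scaled to match s(w) = (1/2pi) sum_h Gamma(h) e^{-iwh}
omega_k = 2 * np.pi * np.arange(N) / N

# The raw periodogram is noisy ("bad estimator"); smooth it across frequencies.
window = np.ones(31) / 31
s_hat = np.convolve(I, window, mode="same")
print(gamma_hat[:3], s_hat[:3])
```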
SLIDE 8 Conditional Dependence
For time series i and j,
    X_i ⊥ X_j | X_{V∖{i,j}}    (16)
    ⇐⇒ Cov(X_i(t), X_j(t + h) | X_{V∖{i,j}}) = 0, ∀ h    (17)
    ⇐⇒ (Γ(h)^{-1})_{ij} = 0, ∀ h    (18)
    ⇐⇒ (s(ω)^{-1})_{ij} = 0, ∀ ω ∈ [0, 2π]    (19)
Inferring conditional dependencies
◮ = inferring (the sparsity pattern of) Γ(h)^{-1}
◮ = inferring (the sparsity pattern of) s(ω)^{-1}
Applicable to any stationary X
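As an illustration of criterion (19) (my own sketch, not from the slides): given spectral density matrices on a frequency grid, invert them and look at the largest off-diagonal magnitude across frequencies; entries that stay at (numerical) zero suggest a missing edge. The tolerance and the toy spectrum are arbitrary.

```python
import numpy as np

def missing_edges(S, tol=1e-8):
    """S: array of shape (K, m, m), spectral density matrices on a frequency
    grid. Returns pairs (i, j) with max_w |(s(w)^{-1})_{ij}| < tol, i.e.
    candidate conditional independencies in the sense of eq. (19)."""
    S_inv = np.linalg.inv(S)                 # invert at every frequency
    max_abs = np.max(np.abs(S_inv), axis=0)  # (m, m): max magnitude over frequencies
    m = S.shape[1]
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if max_abs[i, j] < tol]

# Toy example: 3 series whose inverse spectrum has a zero (0, 2) entry at all w.
K, m = 64, 3
S = np.empty((K, m, m), dtype=complex)
for k, w in enumerate(np.linspace(0, 2 * np.pi, K, endpoint=False)):
    P = np.array([[2.0, 0.5 * np.exp(1j * w), 0.0],
                  [0.5 * np.exp(-1j * w), 2.0, 0.3],
                  [0.0, 0.3, 2.0]])          # Hermitian, diagonally dominant
    S[k] = np.linalg.inv(P)
print(missing_edges(S))   # expected: [(0, 2)]
```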
SLIDE 9 Autoregressive Gaussian Process
The autoregressive (AR) process:
    X(t) = − Σ_{h=1}^{p} A_h X(t − h) + ε(t)    (20)
where ε(t) is Gaussian white noise ∼ N(0, Σ).
We would like to parametrize s(ω)^{-1} with A:
◮ inferring conditional dependencies for an AR process can then be cast as an optimization problem w.r.t. A
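A minimal Python sketch (my addition) that simulates a bivariate AR(1) in the sign convention of (20); the coefficient matrix and noise covariance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# X(t) = -A1 @ X(t-1) + eps(t),  eps(t) ~ N(0, Sigma)   (sign convention of eq. (20))
A1 = np.array([[-0.5, 0.2],
               [0.0, -0.3]])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])
L = np.linalg.cholesky(Sigma)

N, m = 5000, 2
X = np.zeros((N, m))
for t in range(1, N):
    X[t] = -A1 @ X[t - 1] + L @ rng.normal(size=m)

print(X.mean(axis=0))   # close to zero: -A1 has spectral radius < 1, so the model is stable
```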
SLIDE 10
Filter Theorem
For any stationary X and any sequence {a_t} with Σ_{t=-∞}^{∞} |a_t| < ∞, the filtered
process Y(t) = Σ_{h=-∞}^{∞} a_h X(t − h) is stationary, with
    s_Y(ω) = |A(e^{iω})|^2 s_X(ω)    (21)
where A(z) = Σ_{h=-∞}^{∞} a_h z^{-h}.
In the 1-d AR case, ε(t) = x(t) + Σ_{h=1}^{p} a_h x(t − h), hence
    s(ω)^{-1} = |A(e^{iω})|^2 / σ^2
The multi-dimensional analogue:
    s(ω)^{-1} = A(e^{iω})^* Σ^{-1} A(e^{iω})    (22)
where A(z) = Σ_{h=0}^{p} A_h z^{-h} and A_0 := I.
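A hedged sketch (my own) of evaluating (22) on a frequency grid for given AR coefficients; constant factors such as 2π are dropped, as on the slide, and the example matrices are arbitrary.

```python
import numpy as np

def inverse_spectrum(A_list, Sigma, omegas):
    """Evaluate s(w)^{-1} = A(e^{iw})^* Sigma^{-1} A(e^{iw}) on a grid, with
    A(z) = sum_h A_h z^{-h} and A_0 = I (eq. (22), constant factors dropped)."""
    Sigma_inv = np.linalg.inv(Sigma)
    m = Sigma.shape[0]
    out = np.empty((len(omegas), m, m), dtype=complex)
    for k, w in enumerate(omegas):
        Aw = sum(A_h * np.exp(-1j * w * h) for h, A_h in enumerate(A_list))
        out[k] = Aw.conj().T @ Sigma_inv @ Aw
    return out

# Example: the bivariate AR(1) used earlier, with A_0 = I.
A_list = [np.eye(2), np.array([[-0.5, 0.2], [0.0, -0.3]])]
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
omegas = np.linspace(0, 2 * np.pi, 8, endpoint=False)
print(inverse_spectrum(A_list, Sigma, omegas)[0].round(3))
```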
SLIDE 11 Parametrized Spectral Density
Parametrize s(ω)^{-1} by the AR parameters:
    s(ω)^{-1} = (Σ_{h=0}^{p} A_h e^{-ihω})^* Σ^{-1} (Σ_{h=0}^{p} A_h e^{-ihω})    (23)
              = Y_0 + (1/2) Σ_{h=1}^{p} (Y_h e^{-ihω} + Y_h^⊤ e^{ihω})    (24)
where Y_0 = Σ_{h=0}^{p} A_h^⊤ Σ^{-1} A_h and Y_h = 2 Σ_{i=0}^{p−h} A_i^⊤ Σ^{-1} A_{i+h}.
With B_h := Σ^{-1/2} A_h, this becomes Y_0 = Σ_{h=0}^{p} B_h^⊤ B_h and Y_h = 2 Σ_{i=0}^{p−h} B_i^⊤ B_{i+h}.
    (s(ω)^{-1})_{ij} = 0 ⇐⇒ (Y_h)_{ij} = (Y_h)_{ji} = 0 for all h = 0, …, p,
i.e. linear constraints over Y ⇐⇒ quadratic constraints over B.
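A short sketch (mine) that computes Y_0, …, Y_p from B_h = Σ^{-1/2} A_h and checks that (24) built from the Y_h matches a direct evaluation of (23) at a test frequency; the matrices and the frequency 0.7 are arbitrary.

```python
import numpy as np

def Y_from_B(B_list):
    """Y_0 = sum_h B_h^T B_h,  Y_h = 2 * sum_i B_i^T B_{i+h}  for h = 1..p."""
    p = len(B_list) - 1
    Y = [sum(B.T @ B for B in B_list)]
    for h in range(1, p + 1):
        Y.append(2 * sum(B_list[i].T @ B_list[i + h] for i in range(p - h + 1)))
    return Y

# Arbitrary example with p = 1, m = 2.
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
A_list = [np.eye(2), np.array([[-0.5, 0.2], [0.0, -0.3]])]
M = np.linalg.inv(np.linalg.cholesky(Sigma))   # any M with M^T M = Sigma^{-1} works
B_list = [M @ A for A in A_list]
Y = Y_from_B(B_list)

# Check that (24) built from Y agrees with a direct evaluation of (23).
w = 0.7
Aw = sum(A * np.exp(-1j * w * h) for h, A in enumerate(A_list))
lhs = Aw.conj().T @ np.linalg.inv(Sigma) @ Aw
rhs = Y[0] + 0.5 * sum(Y[h] * np.exp(-1j * w * h) + Y[h].T * np.exp(1j * w * h)
                       for h in range(1, len(Y)))
print(np.allclose(lhs, rhs))   # True
```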
SLIDE 12 Conditional MLE
Simplification: fix x(1), …, x(p). Then
    ε(t) = Σ_{h=0}^{p} A_h x(t − h)    (25)
         = [A_0, …, A_p] [x(t); x(t − 1); …; x(t − p)] := A x̃(t) ∼ N(0, Σ)    (26)
i.e. a least-squares estimation problem. The (conditional) likelihood is
    L = exp(−(1/2) Σ_{t=p+1}^{N} x̃(t)^⊤ A^⊤ Σ^{-1} A x̃(t)) / [(2π)^{m(N−p)/2} (det Σ)^{(N−p)/2}]
      = exp(−(1/2) Σ_{t=p+1}^{N} x̃(t)^⊤ B^⊤ B x̃(t)) / [(2π)^{m(N−p)/2} (det B_0)^{p−N}]    (27)
where B := Σ^{-1/2} A, so that (det Σ)^{(N−p)/2} = (det B_0)^{p−N}.
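A hedged sketch (my own) of the conditional least-squares estimate implied by (25)-(26): regress x(t) on its stacked lags, conditioning on the first p samples, and read off A_1, …, A_p and Σ. Function and variable names are mine.

```python
import numpy as np

def conditional_ls(X, p):
    """Conditional least-squares fit of x(t) = -(A_1 x(t-1) + ... + A_p x(t-p)) + eps(t),
    conditioning on the first p samples. X has shape (N, m).
    Returns ([A_1, ..., A_p], Sigma_hat)."""
    N, m = X.shape
    # Rows of Z are the stacked lags [x(t-1), ..., x(t-p)] for t = p, ..., N-1.
    Z = np.hstack([X[p - h:N - h] for h in range(1, p + 1)])   # (N-p, m*p)
    Y = X[p:]                                                  # (N-p, m)
    C, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # x(t) ~ C^T z(t), so [A_1,...,A_p] = -C^T
    A = [-C.T[:, h * m:(h + 1) * m] for h in range(p)]
    resid = Y - Z @ C
    return A, resid.T @ resid / (N - p)

# Quick check on a simulated bivariate AR(1) (same model as the earlier sketches).
rng = np.random.default_rng(4)
A1 = np.array([[-0.5, 0.2], [0.0, -0.3]])
X = np.zeros((4000, 2))
for t in range(1, 4000):
    X[t] = -A1 @ X[t - 1] + rng.normal(size=2)
A_hat, Sigma_hat = conditional_ls(X, p=1)
print(A_hat[0].round(2))   # close to A1
```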
SLIDE 13 Regularized ML
Maximizing the log-likelihood (27) is equivalent to
    min_B  −2 log det B_0 + tr(C B^⊤ B)    (28)
where C is the sample second-moment matrix of the stacked vector x̃(t); the solution is given by the Yule-Walker equations.
Enforcing sparsity over s(ω)^{-1}:
    min_B  −2 log det B_0 + tr(C B^⊤ B) + γ ‖D(B^⊤ B)‖_1    (29)
Convex relaxation (with Z in place of B^⊤ B):
    min_{Z ⪰ 0}  − log det Z_00 + tr(C Z) + γ ‖D(Z)‖_1    (30)
◮ Exact if rank(Z^*) ≤ m.
◮ Bregman divergence + ℓ1-regularization: well studied.
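The sparse problems (29)-(30) need a dedicated solver, but the unpenalized problem (28) reduces to the Yule-Walker equations; here is a hedged univariate sketch (my own, in the sign convention of (20)) that solves the Yule-Walker system from sample autocovariances.

```python
import numpy as np

def yule_walker(x, p):
    """Univariate Yule-Walker estimate for x(t) = -(a_1 x(t-1) + ... + a_p x(t-p)) + eps(t).
    Returns (a, sigma2)."""
    xc = x - x.mean()
    N = len(xc)
    gamma = np.array([np.sum(xc[: N - h] * xc[h:]) / N for h in range(p + 1)])
    # Toeplitz system: sum_h a_h * Gamma(k - h) = -Gamma(k), k = 1, ..., p.
    R = gamma[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    a = np.linalg.solve(R, -gamma[1:p + 1])
    sigma2 = gamma[0] + a @ gamma[1:p + 1]
    return a, sigma2

# Quick check on an AR(1) with coefficient 0.6 (i.e. a_1 = -0.6 in this convention).
rng = np.random.default_rng(5)
x = np.zeros(10000)
for t in range(1, 10000):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(yule_walker(x, p=1))   # a_1 close to -0.6, sigma2 close to 1
```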
SLIDE 14 Non-stationary Extensions
With stationarity:
    s(ω) = (1/2π) Σ_{h=-∞}^{∞} Γ(h) e^{-iωh}    (31)
Without stationarity? The Wigner-Ville spectrum:
    s(t, ω) = (1/2π) Σ_{h} Γ(t + h/2, t − h/2) e^{-iωh}    (32)
Other types of power spectra:
◮ Rihaczek spectrum
◮ (Generalized) evolutionary spectrum
SLIDE 15
Reference I
Bach, F. R. and Jordan, M. I. (2004). Learning graphical models for stationary time series. IEEE Transactions on Signal Processing, 52(8):2189–2199.
Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. The Annals of Statistics, 43(4):1535–1567.
Matz, G. and Hlawatsch, F. (2003). Time-varying power spectra of nonstationary random processes.
Pereira, J., Ibrahimi, M., and Montanari, A. (2010). Learning networks of stochastic differential equations. In Advances in Neural Information Processing Systems, pages 172–180.
Songsiri, J., Dahl, J., and Vandenberghe, L. (2010). Graphical models of autoregressive processes. In Convex Optimization in Signal Processing and Communications, pages 89–116.
SLIDE 16
Reference II
Songsiri, J. and Vandenberghe, L. (2010). Topology selection in graphical models of autoregressive processes. Journal of Machine Learning Research, 11:2671–2705.
Tank, A., Foti, N. J., and Fox, E. B. (2015). Bayesian structure learning for stationary time series. In Uncertainty in Artificial Intelligence (UAI 2015), Amsterdam, The Netherlands, pages 872–881.