SLIDE 1
Spectral Analysis of Stationary Stochastic Process
Hanxiao Liu hanxiaol@cs.cmu.edu February 20, 2016
SLIDE 2 Outline
◮ Stationarity
◮ The time-frequency dual
◮ Spectral representation
◮ Marginal/conditional dependencies
◮ Inference
SLIDE 3 Stationary Stochastic Process
Strong stationarity: ∀ t_1, …, t_k, h,
    (X(t_1), …, X(t_k)) =_D (X(t_1 + h), …, X(t_k + h))    (1)
where =_D denotes equality in distribution.
Weak/2nd-order stationarity:
    E|X(t)|^2 < ∞    ∀ t    (2)
    E(X(t)) = µ    ∀ t    (3)
    Cov(X(t), X(t + h)) = Γ(h)    ∀ t, h    (4)
The r.h.s. does not depend on t.
◮ Γ(h): the autocovariance function (marginal dependencies)
◮ Γ(0): the variance (power) of X
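To make the definitions concrete, here is a minimal Python sketch (my own illustration, not from the slides) that simulates a weakly stationary AR(1) series and checks that the sample mean and a lag-5 sample autocovariance computed on two disjoint time windows roughly agree; the coefficient 0.6 and the window boundaries are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a weakly stationary AR(1): X(t) = 0.6 X(t-1) + eps(t), eps ~ N(0, 1).
N = 20000
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.6 * x[t - 1] + rng.normal()

def autocov(z, h):
    """Sample autocovariance at lag h."""
    zc = z - z.mean()
    return np.mean(zc * zc) if h == 0 else np.mean(zc[:-h] * zc[h:])

# Statistics from two disjoint windows should roughly agree, because neither
# the mean nor Cov(X(t), X(t+h)) depends on t.
w1, w2 = x[1000:10000], x[10000:19000]
print(w1.mean(), w2.mean())            # both close to 0
print(autocov(w1, 5), autocov(w2, 5))  # both close to 0.6**5 / (1 - 0.6**2) ≈ 0.12
```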
SLIDE 4 Spectral Representation Theorem
X(t) = ∫_{-π}^{π} e^{iωt} dZ(ω)    (5)
◮ E[dZ(ω) dZ^*(ω′)] = 0 if ω ≠ ω′ (orthogonal increments).
◮ ^* denotes the Hermitian (conjugate) transpose.
Compared to X(t) itself, we are more interested in Γ(h) (see illustrative animations A and B).
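As a rough numerical illustration of (5) (my own sketch, not part of the slides), one can synthesize a real-valued stationary series from independent random spectral increments on a finite frequency grid; the AR(1)-shaped target density, the grid size, and the series length below are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretize the spectral representation X(t) = \int e^{iwt} dZ(w) on a grid
# over (0, pi), using an AR(1)-shaped spectral density as the target:
# s(w) = 1 / (2*pi*|1 - phi*e^{-iw}|^2) with phi = 0.6 (arbitrary choice).
phi, K = 0.6, 2000
omega = (np.arange(K) + 0.5) * np.pi / K
dw = np.pi / K
s = 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * omega)) ** 2)

# Real-valued analogue of independent increments dZ(w): independent N(0, 1)
# amplitudes for the cosine and sine components at each grid frequency.
a, b = rng.normal(size=K), rng.normal(size=K)
t = np.arange(1500)
X = (np.sqrt(2 * s * dw) * (a * np.cos(np.outer(t, omega))
                            + b * np.sin(np.outer(t, omega)))).sum(axis=1)

# The lag-1 sample autocovariance should be close to the AR(1) value
# phi / (1 - phi^2) ≈ 0.94, as predicted by Gamma(h) = \int e^{iwh} s(w) dw.
Xc = X - X.mean()
print(np.mean(Xc[:-1] * Xc[1:]))
```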
SLIDE 5 Spectral Representation Theorem
Γ(h) = E[X(t + h) X^*(t)]    (6)
     = E[∫_ω ∫_{ω′} e^{iω(t+h)} e^{-iω′t} dZ(ω) dZ^*(ω′)]    (7)
     = ∫_ω ∫_{ω′} e^{iω(t+h)} e^{-iω′t} E[dZ(ω) dZ^*(ω′)]    (8)
     = ∫ e^{iωh} E[dZ(ω) dZ^*(ω)]    (9)
     = ∫ e^{iωh} s(ω) dω    (10)
◮ Γ(h): covariance at lag h (time domain)
◮ s(ω): covariance at frequency ω (frequency domain)
SLIDE 6 Spectral Density Function
The Fourier transform pair:
    Γ(h) = ∫_{-π}^{π} e^{iωh} s(ω) dω    (11)
    s(ω) = (1/2π) Σ_{h=-∞}^{∞} Γ(h) e^{-iωh}    (12)
We call s the spectral density function, since
    Γ(0) = ∫_{-π}^{π} s(ω) dω    (13)
i.e. Γ(0) = Cov(X(t), X(t)) is the cumulative effect of s(ω) over all frequencies.
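A small numerical sanity check of the pair (11)-(13) (my own addition) for a univariate AR(1), whose spectral density and autocovariance have standard closed forms; the coefficient and grid resolution are arbitrary.

```python
import numpy as np

phi, sigma2, K = 0.6, 1.0, 200000
omega = (np.arange(K) + 0.5) * 2 * np.pi / K - np.pi   # grid over (-pi, pi)
dw = 2 * np.pi / K

# AR(1) spectral density: s(w) = sigma^2 / (2*pi*|1 - phi*e^{-iw}|^2).
s = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * omega)) ** 2)

# Eq. (11): Gamma(h) = \int e^{iwh} s(w) dw, vs. the AR(1) closed form
# Gamma(h) = sigma^2 * phi^h / (1 - phi^2).
for h in [0, 1, 5]:
    gamma_h = (np.exp(1j * omega * h) * s).sum().real * dw
    print(h, gamma_h, sigma2 * phi ** h / (1 - phi ** 2))

# Eq. (13): Gamma(0) is the total power, i.e. the integral of s over [-pi, pi].
print(s.sum() * dw, sigma2 / (1 - phi ** 2))
```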
SLIDE 7 Marginal Dependencies
Estimating Γ(h): the sample autocovariance function
    Γ̂(h) = (1/N) Σ_{t=0}^{N−h−1} (X(t) − X̄)(X(t + h) − X̄)^⊤    (14)
Asymptotic normality holds under mild assumptions.
Estimating s(ω): the periodogram. Let ω_k = 2πk/N; then
    I(ω_k) = d(k) d(k)^* → ŝ(ω_k)    (15)
where d(k) := (1/√N) Σ_{t=0}^{N−1} X(t) e^{-iω_k t} is obtained via the DFT.
◮ a bad estimator in general
◮ a good estimator with appropriate smoothing
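A hedged Python sketch (mine, in the spirit of (14)-(15)) that computes the sample autocovariance, the raw periodogram via the FFT, and a simple moving-average smoothing of it; the data-generating AR(1), the 1/(2π) scaling convention, and the window length are my choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated AR(1) data standing in for the observed series.
N, phi = 4096, 0.6
x = np.zeros(N)
for t in range(1, N):
    x[t] = phi * x[t - 1] + rng.normal()
xc = x - x.mean()

# Sample autocovariance, eq. (14), univariate case.
gamma_hat = np.array([np.sum(xc[: N - h] * xc[h:]) / N for h in range(50)])

# Periodogram, eq. (15): d(k) = (1/sqrt(N)) * sum_t x(t) e^{-i w_k t}.
d = np.fft.fft(xc) / np.sqrt(N)
I = np.abs(d) ** 2 / (2 * np.pi)   # scaled to match s(w) = (1/2pi) sum_h Gamma(h) e^{-iwh}
omega_k = 2 * np.pi * np.arange(N) / N

# The raw periodogram is noisy ("bad estimator"); smooth it across frequencies.
window = np.ones(31) / 31
s_hat = np.convolve(I, window, mode="same")
print(gamma_hat[:3], s_hat[:3])
```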
SLIDE 8 Conditional Dependence
For time series i and j,
    X_i ⊥ X_j | X_{V∖{i,j}}    (16)
    ⇐⇒ Cov(X_i(t), X_j(t + h) | X_{V∖{i,j}}) = 0, ∀ h    (17)
    ⇐⇒ (Γ(h)^{-1})_{ij} = 0, ∀ h    (18)
    ⇐⇒ (s(ω)^{-1})_{ij} = 0, ∀ ω ∈ [0, 2π]    (19)
Inferring conditional dependencies
◮ = inferring (the sparsity pattern of) Γ(h)^{-1}
◮ = inferring (the sparsity pattern of) s(ω)^{-1}
Applicable to any stationary X
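As an illustration of criterion (19) (my own sketch, not from the slides): given spectral density matrices on a frequency grid, invert them and look at the largest off-diagonal magnitude across frequencies; entries that stay at (numerical) zero suggest a missing edge. The tolerance and the toy spectrum are arbitrary.

```python
import numpy as np

def missing_edges(S, tol=1e-8):
    """S: array of shape (K, m, m), spectral density matrices on a frequency
    grid. Returns pairs (i, j) with max_w |(s(w)^{-1})_{ij}| < tol, i.e.
    candidate conditional independencies in the sense of eq. (19)."""
    S_inv = np.linalg.inv(S)                 # invert at every frequency
    max_abs = np.max(np.abs(S_inv), axis=0)  # (m, m): max magnitude over frequencies
    m = S.shape[1]
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if max_abs[i, j] < tol]

# Toy example: 3 series whose inverse spectrum has a zero (0, 2) entry at all w.
K, m = 64, 3
S = np.empty((K, m, m), dtype=complex)
for k, w in enumerate(np.linspace(0, 2 * np.pi, K, endpoint=False)):
    P = np.array([[2.0, 0.5 * np.exp(1j * w), 0.0],
                  [0.5 * np.exp(-1j * w), 2.0, 0.3],
                  [0.0, 0.3, 2.0]])          # Hermitian, diagonally dominant
    S[k] = np.linalg.inv(P)
print(missing_edges(S))   # expected: [(0, 2)]
```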
SLIDE 9 Autoregressive Gaussian Process
The autoregressive (AR) process:
    X(t) = − Σ_{h=1}^{p} A_h X(t − h) + ε(t)    (20)
where ε(t) is Gaussian white noise ∼ N(0, Σ).
We would like to parametrize s(ω)^{-1} with A:
◮ inferring conditional dependencies for an AR process can then be cast as an optimization problem w.r.t. A
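A minimal Python sketch (my addition) that simulates a bivariate AR(1) in the sign convention of (20); the coefficient matrix and noise covariance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# X(t) = -A1 @ X(t-1) + eps(t),  eps(t) ~ N(0, Sigma)   (sign convention of eq. (20))
A1 = np.array([[-0.5, 0.2],
               [0.0, -0.3]])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])
L = np.linalg.cholesky(Sigma)

N, m = 5000, 2
X = np.zeros((N, m))
for t in range(1, N):
    X[t] = -A1 @ X[t - 1] + L @ rng.normal(size=m)

print(X.mean(axis=0))   # close to zero: -A1 has spectral radius < 1, so the model is stable
```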
SLIDE 10
Filter Theorem
For any stationary X and any sequence {a_t} with Σ_{t=-∞}^{∞} |a_t| < ∞, the filtered
process Y(t) = Σ_{h=-∞}^{∞} a_h X(t − h) is stationary, with
    s_Y(ω) = |A(e^{iω})|^2 s_X(ω)    (21)
where A(z) = Σ_{h=-∞}^{∞} a_h z^{-h}.
In the 1-d AR case, ε(t) = x(t) + Σ_{h=1}^{p} a_h x(t − h), hence
    s(ω)^{-1} = |A(e^{iω})|^2 / σ^2
The multi-dimensional analogue:
    s(ω)^{-1} = A(e^{iω})^* Σ^{-1} A(e^{iω})    (22)
where A(z) = Σ_{h=0}^{p} A_h z^{-h} and A_0 := I.
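A hedged sketch (my own) of evaluating (22) on a frequency grid for given AR coefficients; constant factors such as 2π are dropped, as on the slide, and the example matrices are arbitrary.

```python
import numpy as np

def inverse_spectrum(A_list, Sigma, omegas):
    """Evaluate s(w)^{-1} = A(e^{iw})^* Sigma^{-1} A(e^{iw}) on a grid, with
    A(z) = sum_h A_h z^{-h} and A_0 = I (eq. (22), constant factors dropped)."""
    Sigma_inv = np.linalg.inv(Sigma)
    m = Sigma.shape[0]
    out = np.empty((len(omegas), m, m), dtype=complex)
    for k, w in enumerate(omegas):
        Aw = sum(A_h * np.exp(-1j * w * h) for h, A_h in enumerate(A_list))
        out[k] = Aw.conj().T @ Sigma_inv @ Aw
    return out

# Example: the bivariate AR(1) used earlier, with A_0 = I.
A_list = [np.eye(2), np.array([[-0.5, 0.2], [0.0, -0.3]])]
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
omegas = np.linspace(0, 2 * np.pi, 8, endpoint=False)
print(inverse_spectrum(A_list, Sigma, omegas)[0].round(3))
```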
SLIDE 11 Parametrized Spectral Density
Parametrize s(ω)^{-1} by the AR parameters:
    s(ω)^{-1} = (Σ_{h=0}^{p} A_h e^{-ihω})^* Σ^{-1} (Σ_{h=0}^{p} A_h e^{-ihω})    (23)
              = Y_0 + (1/2) Σ_{h=1}^{p} (Y_h e^{-ihω} + Y_h^⊤ e^{ihω})    (24)
where Y_0 = Σ_{h=0}^{p} A_h^⊤ Σ^{-1} A_h and Y_h = 2 Σ_{i=0}^{p−h} A_i^⊤ Σ^{-1} A_{i+h}.
With B_h := Σ^{-1/2} A_h, this becomes Y_0 = Σ_{h=0}^{p} B_h^⊤ B_h and Y_h = 2 Σ_{i=0}^{p−h} B_i^⊤ B_{i+h}.
    (s(ω)^{-1})_{ij} = 0 ⇐⇒ (Y_h)_{ij} = (Y_h)_{ji} = 0 for all h = 0, …, p,
i.e. linear constraints over Y ⇐⇒ quadratic constraints over B.
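A short sketch (mine) that computes Y_0, …, Y_p from B_h = Σ^{-1/2} A_h and checks that (24) built from the Y_h matches a direct evaluation of (23) at a test frequency; the matrices and the frequency 0.7 are arbitrary.

```python
import numpy as np

def Y_from_B(B_list):
    """Y_0 = sum_h B_h^T B_h,  Y_h = 2 * sum_i B_i^T B_{i+h}  for h = 1..p."""
    p = len(B_list) - 1
    Y = [sum(B.T @ B for B in B_list)]
    for h in range(1, p + 1):
        Y.append(2 * sum(B_list[i].T @ B_list[i + h] for i in range(p - h + 1)))
    return Y

# Arbitrary example with p = 1, m = 2.
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
A_list = [np.eye(2), np.array([[-0.5, 0.2], [0.0, -0.3]])]
M = np.linalg.inv(np.linalg.cholesky(Sigma))   # any M with M^T M = Sigma^{-1} works
B_list = [M @ A for A in A_list]
Y = Y_from_B(B_list)

# Check that (24) built from Y agrees with a direct evaluation of (23).
w = 0.7
Aw = sum(A * np.exp(-1j * w * h) for h, A in enumerate(A_list))
lhs = Aw.conj().T @ np.linalg.inv(Sigma) @ Aw
rhs = Y[0] + 0.5 * sum(Y[h] * np.exp(-1j * w * h) + Y[h].T * np.exp(1j * w * h)
                       for h in range(1, len(Y)))
print(np.allclose(lhs, rhs))   # True
```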
SLIDE 12 Conditional MLE
Simplification: fix x(1), …, x(p). Then
    ε(t) = Σ_{h=0}^{p} A_h x(t − h)    (25)
         = [A_0, …, A_p] [x(t); x(t − 1); …; x(t − p)] := A x̃(t) ∼ N(0, Σ)    (26)
i.e. a least-squares estimation problem. The (conditional) likelihood is
    L = exp(−(1/2) Σ_{t=p+1}^{N} x̃(t)^⊤ A^⊤ Σ^{-1} A x̃(t)) / [(2π)^{m(N−p)/2} (det Σ)^{(N−p)/2}]
      = exp(−(1/2) Σ_{t=p+1}^{N} x̃(t)^⊤ B^⊤ B x̃(t)) / [(2π)^{m(N−p)/2} (det B_0)^{p−N}]    (27)
where B := Σ^{-1/2} A, so that (det Σ)^{(N−p)/2} = (det B_0)^{p−N}.
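A hedged sketch (my own) of the conditional least-squares estimate implied by (25)-(26): regress x(t) on its stacked lags, conditioning on the first p samples, and read off A_1, …, A_p and Σ. Function and variable names are mine.

```python
import numpy as np

def conditional_ls(X, p):
    """Conditional least-squares fit of x(t) = -(A_1 x(t-1) + ... + A_p x(t-p)) + eps(t),
    conditioning on the first p samples. X has shape (N, m).
    Returns ([A_1, ..., A_p], Sigma_hat)."""
    N, m = X.shape
    # Rows of Z are the stacked lags [x(t-1), ..., x(t-p)] for t = p, ..., N-1.
    Z = np.hstack([X[p - h:N - h] for h in range(1, p + 1)])   # (N-p, m*p)
    Y = X[p:]                                                  # (N-p, m)
    C, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # x(t) ~ C^T z(t), so [A_1,...,A_p] = -C^T
    A = [-C.T[:, h * m:(h + 1) * m] for h in range(p)]
    resid = Y - Z @ C
    return A, resid.T @ resid / (N - p)

# Quick check on a simulated bivariate AR(1) (same model as the earlier sketches).
rng = np.random.default_rng(4)
A1 = np.array([[-0.5, 0.2], [0.0, -0.3]])
X = np.zeros((4000, 2))
for t in range(1, 4000):
    X[t] = -A1 @ X[t - 1] + rng.normal(size=2)
A_hat, Sigma_hat = conditional_ls(X, p=1)
print(A_hat[0].round(2))   # close to A1
```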
SLIDE 13 Regularized ML
Maximizing the log-likelihood (27) is equivalent to
    min_B  −2 log det B_0 + tr(C B^⊤ B)    (28)
where C is the sample second-moment matrix of the stacked vector x̃(t); the solution is given by the Yule-Walker equations.
Enforcing sparsity over s(ω)^{-1}:
    min_B  −2 log det B_0 + tr(C B^⊤ B) + γ ‖D(B^⊤ B)‖_1    (29)
Convex relaxation (with Z in place of B^⊤ B):
    min_{Z ⪰ 0}  − log det Z_00 + tr(C Z) + γ ‖D(Z)‖_1    (30)
◮ Exact if rank(Z^*) ≤ m.
◮ Bregman divergence + ℓ1-regularization: well studied.
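The sparse problems (29)-(30) need a dedicated solver, but the unpenalized problem (28) reduces to the Yule-Walker equations; here is a hedged univariate sketch (my own, in the sign convention of (20)) that solves the Yule-Walker system from sample autocovariances.

```python
import numpy as np

def yule_walker(x, p):
    """Univariate Yule-Walker estimate for x(t) = -(a_1 x(t-1) + ... + a_p x(t-p)) + eps(t).
    Returns (a, sigma2)."""
    xc = x - x.mean()
    N = len(xc)
    gamma = np.array([np.sum(xc[: N - h] * xc[h:]) / N for h in range(p + 1)])
    # Toeplitz system: sum_h a_h * Gamma(k - h) = -Gamma(k), k = 1, ..., p.
    R = gamma[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    a = np.linalg.solve(R, -gamma[1:p + 1])
    sigma2 = gamma[0] + a @ gamma[1:p + 1]
    return a, sigma2

# Quick check on an AR(1) with coefficient 0.6 (i.e. a_1 = -0.6 in this convention).
rng = np.random.default_rng(5)
x = np.zeros(10000)
for t in range(1, 10000):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(yule_walker(x, p=1))   # a_1 close to -0.6, sigma2 close to 1
```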
SLIDE 14 Non-stationary Extensions
With stationarity:
    s(ω) = (1/2π) Σ_{h=-∞}^{∞} Γ(h) e^{-iωh}    (31)
Without stationarity? The Wigner-Ville spectrum:
    s(t, ω) = (1/2π) Σ_{h} Γ(t + h/2, t − h/2) e^{-iωh}    (32)
Other types of power spectra:
◮ Rihaczek spectrum
◮ (Generalized) evolutionary spectrum
SLIDE 15
Reference I
Bach, F. R. and Jordan, M. I. (2004). Learning graphical models for stationary time series. IEEE Transactions on Signal Processing, 52(8):2189–2199.
Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. The Annals of Statistics, 43(4):1535–1567.
Matz, G. and Hlawatsch, F. (2003). Time-varying power spectra of nonstationary random processes.
Pereira, J., Ibrahimi, M., and Montanari, A. (2010). Learning networks of stochastic differential equations. In Advances in Neural Information Processing Systems, pages 172–180.
Songsiri, J., Dahl, J., and Vandenberghe, L. (2010). Graphical models of autoregressive processes. In Convex Optimization in Signal Processing and Communications, pages 89–116.
SLIDE 16
Reference II
Songsiri, J. and Vandenberghe, L. (2010). Topology selection in graphical models of autoregressive processes. Journal of Machine Learning Research, 11:2671–2705.
Tank, A., Foti, N. J., and Fox, E. B. (2015). Bayesian structure learning for stationary time series. In Uncertainty in Artificial Intelligence (UAI 2015), Amsterdam, The Netherlands, pages 872–881.