Stationarity
DS-GA 1013 / MATH-GA 2824: Mathematical Tools for Data Science
https://cims.nyu.edu/~cfgranda/pages/MTDS_spring20/index.html
Carlos Fernandez-Granda

Outline: Translation · Linear translation-invariant models · Stationary signals and PCA · Wiener filtering
Motivation
Goal: estimate a signal y ∈ R^N from noisy data x ∈ R^N. This is a regression problem. What is the optimal estimator? What is the optimal linear estimator?
Stationarity Translation Linear translation-invariant models Stationary signals and PCA Wiener filtering
Circular translation

We focus on circular translations, which wrap around. We denote by x^{↓s} the sth circular translation of a vector x ∈ C^N:

    x^{↓s}[j] = x[(j − s) mod N],   0 ≤ j ≤ N − 1
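As a quick sanity check, this wraparound shift is exactly what numpy's np.roll computes (a minimal sketch; the example vector is arbitrary):

```python
import numpy as np

# The s-th circular translation x^{↓s}[j] = x[(j - s) mod N]
# corresponds to np.roll, which shifts entries forward with wraparound.
x = np.array([1.0, 2.0, 3.0, 4.0])
s = 1
x_shift = np.roll(x, s)  # [4., 1., 2., 3.]

# Equivalent index-by-index definition:
N = len(x)
x_shift_manual = np.array([x[(j - s) % N] for j in range(N)])
assert np.allclose(x_shift, x_shift_manual)
```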
Effect of shift on sinusoids

Shifting a sinusoid modifies its phase:

    ψ_k^{↓s}[l] = exp( i2πk(l − s) / N ) = exp( −i2πks / N ) ψ_k[l]
Effect of translation in the Fourier domain

Let x ∈ C^N with DFT x̂, and let y := x^{↓s}. Then

    ŷ[k] := ⟨x^{↓s}, ψ_k⟩ = ⟨x, ψ_k^{↓−s}⟩
          = ⟨x, exp( i2πks / N ) ψ_k⟩
          = exp( −i2πks / N ) ⟨x, ψ_k⟩
          = exp( −i2πks / N ) x̂[k]
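This shift property can be verified numerically (a sketch; numpy's FFT uses the same DFT sign convention as the derivation above):

```python
import numpy as np

# Shift theorem: the DFT of a circular shift equals the original DFT
# multiplied entrywise by exp(-i 2π k s / N).
N, s = 8, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

y_hat = np.fft.fft(np.roll(x, s))                   # DFT of shifted signal
phase = np.exp(-2j * np.pi * np.arange(N) * s / N)  # predicted phase factor
assert np.allclose(y_hat, phase * np.fft.fft(x))
```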
Stationarity Translation Linear translation-invariant models Stationary signals and PCA Wiener filtering
Linear translation-invariant (LTI) function

A function F from C^N to C^N is linear if for any x, y ∈ C^N and any α ∈ C

    F(x + y) = F(x) + F(y),   F(αx) = αF(x),

and translation invariant if for any shift 0 ≤ s ≤ N − 1

    F(x^{↓s}) = F(x)^{↓s}
Parametrizing a linear function

Let e_j be the jth standard basis vector (e_j[j] = 1 and e_j[k] = 0 for k ≠ j), and let F_L : C^N → C^N be a linear function. Then

    F_L(x) = F_L( ∑_{j=0}^{N−1} x[j] e_j )
           = ∑_{j=0}^{N−1} x[j] F_L(e_j)
           = [ F_L(e_0)  F_L(e_1)  ···  F_L(e_{N−1}) ] x
           = Mx
Parametrizing an LTI function

Let F : C^N → C^N be linear and translation invariant. Since e_j = e_0^{↓j},

    F(x) = F( ∑_{j=0}^{N−1} x[j] e_j )
         = ∑_{j=0}^{N−1} x[j] F(e_j)
         = ∑_{j=0}^{N−1} x[j] F( e_0^{↓j} )
         = ∑_{j=0}^{N−1} x[j] F(e_0)^{↓j}
Impulse response

Standard basis vectors can be interpreted as impulses. LTI functions are characterized by their impulse response h_F := F(e_0).
Circular convolution

The circular convolution of two vectors x, y ∈ C^N is defined as

    x ∗ y [j] := ∑_{s=0}^{N−1} x[s] y^{↓s}[j],   0 ≤ j ≤ N − 1
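A direct implementation of this definition (a sketch using np.roll for the circular shifts; the example vectors are arbitrary):

```python
import numpy as np

# Circular convolution straight from the definition:
# (x * y)[j] = sum_s x[s] y^{↓s}[j].
def circ_conv(x, y):
    N = len(x)
    return sum(x[s] * np.roll(y, s) for s in range(N))

x = np.array([1.0, 2.0, 0.0, 0.0])
y = np.array([1.0, 1.0, 0.0, 0.0])
print(circ_conv(x, y))  # [1. 3. 2. 0.]
```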
Convolution example

[Figures: x, y, and x ∗ y]
Circular convolution

The 2D circular convolution between X ∈ C^{N×N} and Y ∈ C^{N×N} is

    X ∗ Y [j1, j2] := ∑_{s1=0}^{N−1} ∑_{s2=0}^{N−1} X[s1, s2] Y^{↓(s1,s2)}[j1, j2],   0 ≤ j1, j2 ≤ N − 1
Convolution example

[Figures: X, Y, and X ∗ Y]
LTI functions as convolution with impulse response

For any LTI function F : C^N → C^N and any x ∈ C^N,

    F(x) = ∑_{j=0}^{N−1} x[j] F(e_0)^{↓j} = x ∗ h_F

For any 2D LTI function F : C^{N×N} → C^{N×N} and any X ∈ C^{N×N},

    F(X) = X ∗ H_F
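A numerical sketch of this characterization, using a circular 3-point moving average as the LTI map (the particular map is an arbitrary choice for illustration):

```python
import numpy as np

# Any LTI map equals convolution with its impulse response h_F = F(e_0).
# Example LTI map: a circular 3-point moving average.
def F(x):
    return (x + np.roll(x, 1) + np.roll(x, -1)) / 3.0

N = 6
e0 = np.zeros(N)
e0[0] = 1.0
h = F(e0)                                            # impulse response

x = np.arange(N, dtype=float)
Fx_via_conv = sum(x[s] * np.roll(h, s) for s in range(N))
assert np.allclose(F(x), Fx_via_conv)
```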
Convolution in time is multiplication in frequency

Let y := x1 ∗ x2 for x1, x2 ∈ C^N. Then

    ŷ[k] = x̂1[k] x̂2[k],   0 ≤ k ≤ N − 1
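A quick check of the convolution theorem with random vectors (a sketch; any x1, x2 work):

```python
import numpy as np

# The DFT of a circular convolution is the entrywise product of the DFTs.
rng = np.random.default_rng(1)
N = 16
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)

y = sum(x1[s] * np.roll(x2, s) for s in range(N))   # circular convolution
assert np.allclose(np.fft.fft(y), np.fft.fft(x1) * np.fft.fft(x2))
```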
Convolution in time is multiplication in frequency

Let Y := X1 ∗ X2 for X1, X2 ∈ C^{N×N}. Then

    Ŷ[k1, k2] = X̂1[k1, k2] X̂2[k1, k2],   0 ≤ k1, k2 ≤ N − 1
Proof

    ŷ[k] := ⟨x1 ∗ x2, ψ_k⟩
          = ∑_{s=0}^{N−1} x1[s] ⟨x2^{↓s}, ψ_k⟩
          = ∑_{s=0}^{N−1} x1[s] (1/N) ∑_{j=0}^{N−1} exp( −i2πjs / N ) x̂2[j] ⟨ψ_j, ψ_k⟩
          = ∑_{j=0}^{N−1} x̂2[j] (1/N) ⟨ψ_j, ψ_k⟩ ∑_{s=0}^{N−1} x1[s] exp( −i2πjs / N )
          = ∑_{j=0}^{N−1} x̂1[j] x̂2[j] (1/N) ⟨ψ_j, ψ_k⟩
          = x̂1[k] x̂2[k],

where the last step uses ⟨ψ_j, ψ_k⟩ = N if j = k and 0 otherwise.
[Figures: x and its DFT x̂, y and ŷ, the pointwise product x̂ ∘ ŷ, and the convolution x ∗ y]
[Figures: X and its DFT X̂, Y and Ŷ, the pointwise product X̂ ∘ Ŷ, and the convolution X ∗ Y (spectra on a log scale)]
Convolution in time is multiplication in frequency

LTI functions just scale Fourier coefficients! The DFT of the impulse response is the transfer function of the LTI function. For any LTI function F and any x ∈ C^N,

    F(x) = (1/N) ∑_{k=0}^{N−1} ĥ_F[k] x̂[k] ψ_k

For any 2D LTI function F and any X ∈ C^{N×N},

    F(X) = (1/N²) ∑_{k1=0}^{N−1} ∑_{k2=0}^{N−1} Ĥ_F[k1, k2] X̂[k1, k2] Φ_{k1, k2}
Stationarity Translation Linear translation-invariant models Stationary signals and PCA Wiener filtering
Signal with translation-invariant statistics

[Figures: sample covariance matrix, its eigenvalue spectrum, and the first principal directions]
Stationary signals

x̃ is wide-sense or weak-sense stationary if

1. it has a constant mean: E( x̃[j] ) = µ, 0 ≤ j ≤ N − 1,

2. there is a function ac_x̃ such that

       E( x̃[j1] x̃[j2] ) = ac_x̃( (j2 − j1) mod N ),   0 ≤ j1, j2 ≤ N − 1,

   i.e. it has translation-invariant covariance.
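A small simulation sketch: white noise passed through a circular moving average is wide-sense stationary, so its sample covariance matrix depends (approximately) only on (j2 − j1) mod N. The particular generating process is an assumption for the demo.

```python
import numpy as np

# Samples of a stationary process: circular moving average of white noise.
rng = np.random.default_rng(2)
N, n = 32, 20000
z = rng.standard_normal((n, N))
x = (z + np.roll(z, 1, axis=1) + np.roll(z, 2, axis=1)) / 3.0

Sigma = x.T @ x / n                                   # sample covariance matrix
# Stationarity: Sigma[j1, j2] depends only on (j2 - j1) mod N,
# so circular diagonals are (approximately) constant.
ac = np.array([Sigma[0, j] for j in range(N)])        # first row ≈ autocovariance
assert np.allclose(Sigma[1, (1 + np.arange(N)) % N], ac, atol=0.05)
```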
Autocovariance

ac_x̃ is the autocovariance of x̃. For any j,

    ac_x̃(j) = ac_x̃(−j) = ac_x̃(N − j)

The covariance matrix is

    Σ_x̃ = [ ac_x̃(0)    ac_x̃(N−1)  ···  ac_x̃(1)
            ac_x̃(1)    ac_x̃(0)    ···  ac_x̃(2)
            ···
            ac_x̃(N−1)  ac_x̃(N−2)  ···  ac_x̃(0) ]

        = [ a_x̃  a_x̃^{↓1}  a_x̃^{↓2}  ···  a_x̃^{↓N−1} ]

where a_x̃ := ( ac_x̃(0), ac_x̃(1), ac_x̃(2), ··· )
Circulant matrix

Each column is a unit circular shift of the previous column:

    C = [ a d c b
          b a d c
          c b a d
          d c b a ]
Sample covariance matrix

[Figure: sample covariance matrix]
Eigendecomposition of a circulant matrix

Any circulant matrix C ∈ C^{N×N} can be written as

    C = (1/N) F*_{[N]} Λ F_{[N]},

where F_{[N]} is the DFT matrix and Λ is a diagonal matrix.

Proof

For any vector x ∈ C^N, with c equal to the first column of C,

    Cx = c ∗ x = (1/N) F*_{[N]} diag(ĉ) F_{[N]} x
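This diagonalization can be checked directly (a sketch; the first column c is an arbitrary symmetric vector, chosen to look like an autocovariance):

```python
import numpy as np

# A circulant matrix is diagonalized by the DFT matrix:
# C = (1/N) F* diag(ĉ) F, where c is the first column of C.
N = 5
c = np.array([3.0, 1.0, 0.5, 0.5, 1.0])
C = np.column_stack([np.roll(c, j) for j in range(N)])  # circulant matrix

F = np.fft.fft(np.eye(N))                               # DFT matrix F_[N]
C_rebuilt = (F.conj().T @ np.diag(np.fft.fft(c)) @ F) / N
assert np.allclose(C, C_rebuilt)
```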
Eigendecomposition of a circulant covariance matrix

A valid eigendecomposition is given by

    C = ( (1/√N) F*_{[N]} ) diag(ĉ) ( (1/√N) F_{[N]} )

If the entries of ĉ have different values, the singular vectors are sinusoids!

PCA on a stationary vector

Let x̃ be wide-sense stationary with autocovariance vector a_x̃. The eigendecomposition of the covariance matrix of x̃ equals

    Σ_x̃ = (1/N) F* diag(â_x̃) F
CIFAR-10 images

[Figures: rows of the covariance matrix of CIFAR-10 images, and its principal directions]
PCA of natural images

Principal directions tend to be sinusoidal. This suggests using 2D sinusoids for dimensionality reduction. JPEG compresses images using the discrete cosine transform (DCT):

1. The image is divided into 8 × 8 patches.

2. Each DCT band is quantized differently (more bits for lower frequencies).

[Figures: DCT basis vectors; projection of each 8×8 block onto its first 1, 5, 15, 30, and 50 DCT coefficients]
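A JPEG-style sketch of this idea: project an 8×8 block onto a few low-frequency 2D DCT basis vectors. The orthonormal DCT-II matrix is built explicitly here, and the block contents and the choice of kept bands are arbitrary assumptions for the demo (JPEG's actual quantization tables differ).

```python
import numpy as np

# Orthonormal DCT-II basis matrix: D[k, j] = c_k cos(π (2j+1) k / (2M)).
M = 8
j = np.arange(M)
D = np.sqrt(2.0 / M) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * M))
D[0, :] = np.sqrt(1.0 / M)               # DC row gets a different normalization
assert np.allclose(D @ D.T, np.eye(M))   # orthonormal basis

rng = np.random.default_rng(3)
block = rng.random((M, M))
coeffs = D @ block @ D.T                 # 2D DCT coefficients of the block
k1, k2 = np.meshgrid(j, j, indexing="ij")
mask = (k1 + k2) < 4                     # keep only 10 low-frequency bands
approx = D.T @ (coeffs * mask) @ D       # compressed reconstruction
assert np.allclose(D.T @ coeffs @ D, block)   # full basis reconstructs exactly
```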
Stationarity Translation Linear translation-invariant models Stationary signals and PCA Wiener filtering
Signal estimation

Goal: estimate an N-dimensional signal from N-dimensional data. The minimum-MSE estimator is the conditional mean, which is usually intractable. What is the linear minimum-MSE estimator?
Linear MMSE

Let ỹ and x̃ be N-dimensional zero-mean random vectors. If Σ_x̃ is full rank, then

    Σ_x̃^{−1} Σ_x̃ỹ = arg min_B E( ‖ỹ − Bᵀ x̃‖²₂ ),

where Σ_x̃ỹ := E( x̃ ỹᵀ )
Proof

The cost function can be decomposed into

    E( ‖ỹ − Bᵀ x̃‖²₂ ) = ∑_{j=1}^{N} E( ( ỹ[j] − B_jᵀ x̃ )² )

Each term is a linear regression problem with optimal estimator

    Σ_x̃^{−1} (Σ_x̃ỹ)_j = arg min_{B_j} E( ( ỹ[j] − x̃ᵀ B_j )² ),

where (Σ_x̃ỹ)_j is the jth column of Σ_x̃ỹ
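A sketch of this estimator with covariances estimated from samples; the linear-plus-noise data model is an assumption for the demo, and the result is compared against per-coordinate least squares, which solves the same problem:

```python
import numpy as np

# Linear MMSE estimator B = Σx⁻¹ Σxy for zero-mean vectors.
rng = np.random.default_rng(4)
n, N = 5000, 4
A = rng.standard_normal((N, N))
x = rng.standard_normal((n, N))
y = x @ A.T + 0.1 * rng.standard_normal((n, N))      # noisy linear relation

Sx = x.T @ x / n                                     # sample Σx
Sxy = x.T @ y / n                                    # sample Σxy
B = np.linalg.solve(Sx, Sxy)                         # Σx⁻¹ Σxy
B_ls, *_ = np.linalg.lstsq(x, y, rcond=None)         # least-squares solution
assert np.allclose(B, B_ls)
```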
Joint stationarity

x̃ and ỹ are jointly wide-sense or weak-sense stationary if

1. they are each wide-sense or weak-sense stationary,

2. there is a function cc_x̃ỹ such that

       E( x̃[j1] ỹ[j2] ) = cc_x̃ỹ( (j2 − j1) mod N ),   0 ≤ j1, j2 ≤ N − 1,

   i.e. they have translation-invariant cross-covariance.
Cross-covariance

cc_x̃ỹ is the cross-covariance of x̃ and ỹ:

    Σ_x̃ỹ = [ cc_x̃ỹ(0)    cc_x̃ỹ(N−1)  ···  cc_x̃ỹ(1)
             cc_x̃ỹ(1)    cc_x̃ỹ(0)    ···  cc_x̃ỹ(2)
             ···
             cc_x̃ỹ(N−1)  cc_x̃ỹ(N−2)  ···  cc_x̃ỹ(0) ]

          = [ c_x̃ỹ  c_x̃ỹ^{↓1}  c_x̃ỹ^{↓2}  ···  c_x̃ỹ^{↓N−1} ]

where c_x̃ỹ := ( cc_x̃ỹ(0), cc_x̃ỹ(1), cc_x̃ỹ(2), ··· )
Wiener filter

Let x̃ and ỹ be zero-mean and jointly stationary. The linear estimate of ỹ given x̃ that minimizes the MSE is the convolution of x̃ with the Wiener filter w, defined by

    ŵ[k] := Cov( x̃_F[k], ỹ_F[k] ) / Var( x̃_F[k] ),   0 ≤ k ≤ N − 1,

where x̃_F and ỹ_F denote the DFT coefficients of x̃ and ỹ, and

    Cov( x̃_F[k], ỹ_F[k] ) := E( x̃_F[k] ỹ_F[k]* ),   Var( x̃_F[k] ) := E( |x̃_F[k]|² )
Proof

Since x̃ and ỹ are jointly stationary,

    Σ_x̃  = (1/N) F* diag(â_x̃) F
    Σ_x̃ỹ = (1/N) F* diag(ĉ_x̃ỹ) F

so that

    Σ_x̃ỹᵀ Σ_x̃^{−1} = ( (1/N) F* diag(â_x̃) F )^{−1} (1/N) F* diag(ĉ_x̃ỹ) F
                    = (1/N) F* diag(â_x̃^{−1}) F (1/N) F* diag(ĉ_x̃ỹ) F
                    = (1/N) F* diag( â_x̃^{−1} ∘ ĉ_x̃ỹ ) F,

using F F* = N I.
Proof

    Σ_x̃F := E( F x̃ (F x̃)* ) = F E( x̃ x̃ᵀ ) F* = F Σ_x̃ F*
          = F (1/N) F* diag(â_x̃) F F*
          = N diag(â_x̃)

Hence

    â_x̃[k] = Var( x̃_F[k] ) / N,   0 ≤ k ≤ N − 1
Proof

    Σ_x̃F ỹF := E( F x̃ (F ỹ)* ) = F E( x̃ ỹᵀ ) F* = F Σ_x̃ỹ F*
            = F (1/N) F* diag(ĉ_x̃ỹ) F F*
            = N diag(ĉ_x̃ỹ)

Hence

    ĉ_x̃ỹ[k] = Cov( x̃_F[k], ỹ_F[k] ) / N,   0 ≤ k ≤ N − 1
Proof

Combining the previous identities,

    Σ_x̃ỹᵀ Σ_x̃^{−1} = (1/N) F* diag( â_x̃^{−1} ∘ ĉ_x̃ỹ ) F
                    = (1/N) F* diag_{k=0}^{N−1}( Cov( x̃_F[k], ỹ_F[k] ) / Var( x̃_F[k] ) ) F,

which is convolution with the Wiener filter w.
Least squares

Given a training set (y1, x1), (y2, x2), ..., (yn, xn), the optimal LTI estimator

    w := arg min_v ∑_{j=1}^{n} ‖ y_j − v ∗ x_j ‖²₂

is the Wiener filter with transfer function

    ŵ[k] = cov( X̂[k], Ŷ[k] ) / var( X̂[k] ),   0 ≤ k ≤ N − 1,

where

    cov( X̂[k], Ŷ[k] ) := (1/n) ∑_{j=1}^{n} x̂_j[k]* ŷ_j[k],
    var( X̂[k] )       := (1/n) ∑_{j=1}^{n} |x̂_j[k]|²
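This per-frequency least-squares solution can be sketched numerically; the "true" filter h and the noiseless data model are assumptions for the demo, in which the empirical Wiener filter recovers h exactly:

```python
import numpy as np

# Empirical Wiener filter from a training set {(y_j, x_j)}:
# ŵ[k] = Σ_j conj(x̂_j[k]) ŷ_j[k] / Σ_j |x̂_j[k]|², one frequency at a time.
rng = np.random.default_rng(5)
n, N = 200, 16
h = np.exp(-np.arange(N) / 2.0)                       # "true" filter (assumed for the demo)
X = rng.standard_normal((n, N))
Y = np.real(np.fft.ifft(np.fft.fft(X, axis=1) * np.fft.fft(h), axis=1))  # y_j = h * x_j

Xf, Yf = np.fft.fft(X, axis=1), np.fft.fft(Y, axis=1)
w_hat = (np.conj(Xf) * Yf).mean(axis=0) / (np.abs(Xf) ** 2).mean(axis=0)
assert np.allclose(w_hat, np.fft.fft(h))
```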
Proof

Writing each residual in the Fourier basis,

    ∑_{j=1}^{n} ‖ y_j − v ∗ x_j ‖²₂ = ∑_{j=1}^{n} ‖ (1/N) F* ( ŷ_j − v̂ ∘ x̂_j ) ‖²₂
                                    = (1/N) ∑_{j=1}^{n} ‖ ŷ_j − v̂ ∘ x̂_j ‖²₂
                                    = (1/N) ∑_{j=1}^{n} ∑_{k=0}^{N−1} | ŷ_j[k] − v̂[k] x̂_j[k] |²
                                    =: (1/N) ∑_{k=0}^{N−1} C_k( v̂[k] ),

so each C_k can be minimized separately over v̂[k].
Denoising

Measurements: x̃ = ỹ + z̃, where z̃ is zero-mean Gaussian noise with variance σ², independent of ỹ.

Noise

A linear transformation A z̃ of a Gaussian vector with mean µ and covariance matrix Σ is Gaussian with mean Aµ and covariance matrix A Σ A*. The Fourier coefficients of the noise are therefore Gaussian with zero mean and covariance matrix

    F_{[N]} σ² I F*_{[N]} = N σ² I

i.e. they are iid Gaussian with variance Nσ².
Wiener filter

    Cov( x̃_F[k], ỹ_F[k] ) = E( x̃_F[k] ỹ_F[k]* )
                           = E( ỹ_F[k] ỹ_F[k]* ) + E( z̃_F[k] ỹ_F[k]* )
                           = Var( ỹ_F[k] )

    Var( x̃_F[k] ) = Var( ỹ_F[k] ) + Var( z̃_F[k] ) = Var( ỹ_F[k] ) + Nσ²

    ŵ[k] = Var( ỹ_F[k] ) / ( Var( ỹ_F[k] ) + Nσ² ),   0 ≤ k ≤ N − 1
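A denoising sketch of this shrinkage rule. The signal prior (a circular moving average of white noise, so that Var(ỹ_F[k]) is known in closed form) is an assumption for the demo:

```python
import numpy as np

# Wiener denoising for x̃ = ỹ + z̃: shrink each Fourier coefficient by
# Var(ỹF[k]) / (Var(ỹF[k]) + N σ²).
rng = np.random.default_rng(6)
N, sigma, tau = 64, 0.5, 3.0
h = np.zeros(N)
h[:8] = 1.0 / 8.0                                     # low-pass averaging filter (assumed prior)
g = tau * rng.standard_normal(N)
y = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(g)))   # stationary low-pass signal
x = y + sigma * rng.standard_normal(N)                # noisy measurement

var_yF = N * tau**2 * np.abs(np.fft.fft(h)) ** 2      # Var(ỹF[k]) under this prior
w_hat = var_yF / (var_yF + N * sigma**2)              # Wiener transfer function
y_est = np.real(np.fft.ifft(w_hat * np.fft.fft(x)))   # denoised estimate

assert np.mean((y_est - y) ** 2) < np.mean((x - y) ** 2)  # lower MSE than raw data
```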
Audio data

[Figure: audio waveform, 4.0-4.5 s]

Audio data: variance of Fourier coefficients

[Figure: variance of the Fourier coefficients on a log scale, frequencies up to ±4000 Hz]
Wiener filter: σ = 0.02

[Figure: transfer function (frequency) and filter (time)]

Wiener filter: σ = 0.1

[Figure: transfer function (frequency) and filter (time)]

Wiener filter: σ = 0.5

[Figure: transfer function (frequency) and filter (time)]
Example: σ = 0.1

[Figures: clean, noisy, and denoised waveforms and the variances of their Fourier coefficients]

[Figure: zoomed-in comparison of the signal, the denoised signal, and the noisy data]

Example: σ = 0.5

[Figures: clean, noisy, and denoised waveforms and the variances of their Fourier coefficients]

[Figure: zoomed-in comparison of the signal, the denoised signal, and the noisy data]
Image data

[Figure: variance of the 2D Fourier coefficients on a log scale]

Wiener filter: σ = 0.04

[Figure: 2D transfer function (frequency) and filter (space)]

Wiener filter: σ = 0.1

[Figure: 2D transfer function (frequency) and filter (space)]

Wiener filter: σ = 0.2

[Figure: 2D transfer function (frequency) and filter (space)]
Example: σ = 0.04

[Figures: 2D spectra of the clean, noisy, and denoised images]

Example: σ = 0.1

[Figures: 2D spectra of the clean, noisy, and denoised images]

Example: σ = 0.2

[Figures: 2D spectra of the clean, noisy, and denoised images]