Stationarity (DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science)
SLIDE 1

Stationarity

DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science

https://cims.nyu.edu/~cfgranda/pages/MTDS_spring20/index.html

Carlos Fernandez-Granda

SLIDE 2

Stationarity: Translation · Linear translation-invariant models · Stationary signals and PCA · Wiener filtering

SLIDE 3

Motivation

Goal: estimate a signal y ∈ R^N from noisy data x ∈ R^N. This is a regression problem. What is the optimal estimator? What is the best linear estimator?

SLIDE 4

Stationarity: Translation · Linear translation-invariant models · Stationary signals and PCA · Wiener filtering

SLIDE 5

Circular translation

We focus on circular translations, which wrap around. We denote by x↓s the sth circular translation of a vector x ∈ C^N. For all 0 ≤ j ≤ N − 1,

x↓s[j] = x[(j − s) mod N]
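In numpy terms, the circular translation x↓s is exactly `np.roll`; a minimal sketch (the function name `circular_shift` is ours, not from the slides):

```python
import numpy as np

def circular_shift(x, s):
    """Return x shifted circularly by s, i.e. y[j] = x[(j - s) % N]."""
    N = len(x)
    return np.array([x[(j - s) % N] for j in range(N)])

x = np.arange(8.0)
# np.roll implements exactly this wrap-around shift
assert np.array_equal(circular_shift(x, 3), np.roll(x, 3))
# shifting by N is the identity
assert np.array_equal(circular_shift(x, 8), x)
```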

SLIDE 6

Effect of shift on sinusoids

Shifting a sinusoid modifies its phase:

ψk↓s[l] = exp( i2πk(l − s)/N ) = exp( −i2πks/N ) ψk[l]
SLIDE 7

Effect of translation in Fourier domain

Let x ∈ C^N with DFT x̂, and let y := x↓s. Then

ŷ[k] := ⟨x↓s, ψk⟩ = ⟨x, ψk↓−s⟩
      = ⟨x, exp( i2πks/N ) ψk⟩
      = exp( −i2πks/N ) ⟨x, ψk⟩
      = exp( −i2πks/N ) x̂[k]
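This phase-shift property is easy to check numerically; `np.fft.fft` uses the same exp(−i2πjk/N) convention, so the same phase factor appears (a sketch with arbitrary test sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 16, 5
x = rng.standard_normal(N)

lhs = np.fft.fft(np.roll(x, s))                        # DFT of the shifted signal x↓s
k = np.arange(N)
rhs = np.exp(-2j * np.pi * k * s / N) * np.fft.fft(x)  # phase factor times DFT of x
assert np.allclose(lhs, rhs)
```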

SLIDE 8

Stationarity: Translation · Linear translation-invariant models · Stationary signals and PCA · Wiener filtering

SLIDE 9

Linear translation-invariant (LTI) function

A function F from C^N to C^N is linear if for any x, y ∈ C^N and any α ∈ C,

F(x + y) = F(x) + F(y),   F(αx) = αF(x),

and translation invariant if for any shift 0 ≤ s ≤ N − 1,

F(x↓s) = F(x)↓s

SLIDE 10

Parametrizing a linear function

Let ej be the jth standard basis vector (ej[j] = 1 and ej[k] = 0 for k ≠ j), and let FL : C^N → C^N be a linear function. Then

FL(x) = FL( Σ_{j=0}^{N−1} x[j] ej ) = Σ_{j=0}^{N−1} x[j] FL(ej) = [ FL(e0) | FL(e1) | · · · | FL(e_{N−1}) ] x = Mx

SLIDE 11

Parametrizing an LTI function

Let F : C^N → C^N be linear and translation invariant. Since ej = e0↓j,

F(x) = F( Σ_{j=0}^{N−1} x[j] ej ) = Σ_{j=0}^{N−1} x[j] F(ej) = Σ_{j=0}^{N−1} x[j] F(e0↓j) = Σ_{j=0}^{N−1} x[j] F(e0)↓j

SLIDE 12

Impulse response

Standard basis vectors can be interpreted as impulses. LTI functions are characterized by their impulse response hF := F(e0).

SLIDE 13

Circular convolution

The circular convolution of two vectors x, y ∈ C^N is defined as

x ∗ y [j] := Σ_{s=0}^{N−1} x[s] y↓s[j],   0 ≤ j ≤ N − 1
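The definition translates directly into code, with `np.roll(y, s)` playing the role of y↓s (a sketch; the function name is ours):

```python
import numpy as np

def circular_convolve(x, y):
    """x*y[j] = sum_s x[s] y↓s[j], with np.roll(y, s) as y↓s."""
    N = len(x)
    out = np.zeros(N)
    for s in range(N):
        out = out + x[s] * np.roll(y, s)
    return out

rng = np.random.default_rng(1)
x, y = rng.standard_normal(8), rng.standard_normal(8)
# circular convolution is commutative
assert np.allclose(circular_convolve(x, y), circular_convolve(y, x))
```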

SLIDE 14

Convolution example: x [1D plot]

SLIDE 15

Convolution example: y [1D plot]

SLIDE 16

Convolution example: x ∗ y [1D plot]

SLIDE 17

Circular convolution

The 2D circular convolution of X ∈ C^{N×N} and Y ∈ C^{N×N} is

X ∗ Y [j1, j2] := Σ_{s1=0}^{N−1} Σ_{s2=0}^{N−1} X[s1, s2] Y↓(s1,s2)[j1, j2],   0 ≤ j1, j2 ≤ N − 1

SLIDE 18

Convolution example: x [2D plot]

SLIDE 19

Convolution example: y [2D plot]

SLIDE 20

Convolution example: x ∗ y [2D plot]

SLIDE 21

LTI functions as convolution with impulse response

For any LTI function F : C^N → C^N and any x ∈ C^N,

F(x) = Σ_{j=0}^{N−1} x[j] F(e0)↓j = x ∗ hF

For any 2D LTI function F : C^{N×N} → C^{N×N} and any X ∈ C^{N×N},

F(X) = X ∗ HF

SLIDE 22

Convolution in time is multiplication in frequency

Let y := x1 ∗ x2 with x1, x2 ∈ C^N. Then

ŷ[k] = x̂1[k] x̂2[k],   0 ≤ k ≤ N − 1
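A quick numerical check of this convolution theorem (a sketch; the convolution itself is computed straight from the definition):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 32
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)

# circular convolution straight from the definition, with np.roll(x2, s) as x2↓s
conv = sum(x1[s] * np.roll(x2, s) for s in range(N))

# the DFT of the convolution is the elementwise product of the DFTs
assert np.allclose(np.fft.fft(conv), np.fft.fft(x1) * np.fft.fft(x2))
```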

SLIDE 23

Convolution in time is multiplication in frequency

Let Y := X1 ∗ X2 for X1, X2 ∈ C^{N×N}. Then

Ŷ[k1, k2] = X̂1[k1, k2] X̂2[k1, k2]

SLIDE 24

Proof

ŷ[k] := ⟨x1 ∗ x2, ψk⟩ = ⟨ Σ_{s=0}^{N−1} x1[s] x2↓s, ψk ⟩
= Σ_{s=0}^{N−1} x1[s] ⟨ (1/N) Σ_{j=0}^{N−1} exp( −i2πjs/N ) x̂2[j] ψj, ψk ⟩
= Σ_{j=0}^{N−1} x̂2[j] (1/N) ⟨ψj, ψk⟩ Σ_{s=0}^{N−1} x1[s] exp( −i2πjs/N )
= Σ_{j=0}^{N−1} x̂1[j] x̂2[j] (1/N) ⟨ψj, ψk⟩ = x̂1[k] x̂2[k]

where the last step uses ⟨ψj, ψk⟩ = N if j = k and 0 otherwise.

SLIDE 25

x [plot]

SLIDE 26

x̂ [plot]

SLIDE 27

y [plot]

SLIDE 28

ŷ [plot]

SLIDE 29

x̂ ◦ ŷ [plot]

SLIDE 30

x ∗ y [plot]

SLIDE 31

X [image]

SLIDE 32

X̂ [image, log scale]

SLIDE 33

Y [image]

SLIDE 34

Ŷ [image, log scale]

SLIDE 35

X̂ ◦ Ŷ [image, log scale]

SLIDE 36

X ∗ Y [image]

SLIDE 37

Convolution in time is multiplication in frequency

LTI functions just scale Fourier coefficients! The DFT of the impulse response is called the transfer function of the LTI function. For any LTI function F and any x ∈ C^N,

F(x) = (1/N) Σ_{k=0}^{N−1} ĥF[k] x̂[k] ψk

For any 2D LTI function F and any X ∈ C^{N×N},

F(X) = (1/N²) Σ_{k1=0}^{N−1} Σ_{k2=0}^{N−1} ĤF[k1, k2] X̂[k1, k2] Φk1,k2

SLIDE 38

Stationarity: Translation · Linear translation-invariant models · Stationary signals and PCA · Wiener filtering

SLIDE 39

Signal with translation-invariant statistics

SLIDE 40

Sample covariance matrix [image]

SLIDE 41

Eigenvalues [log-scale plot]

SLIDE 42

Principal directions 1–10 [plots]

SLIDE 43

Principal directions 15–90 [plots]

SLIDE 44

Stationary signals

x̃ is wide-sense (or weak-sense) stationary if

1. it has a constant mean: E(x̃[j]) = µ, 0 ≤ j ≤ N − 1
2. there is a function ac_x̃ such that E(x̃[j1] x̃[j2]) = ac_x̃((j2 − j1) mod N), 0 ≤ j1, j2 ≤ N − 1,

i.e. it has translation-invariant covariance.

SLIDE 45

Autocovariance

ac_x̃ is the autocovariance of x̃. For any j, ac_x̃(j) = ac_x̃(−j) = ac_x̃(N − j).

Σ_x̃ = [ ac_x̃(0)    ac_x̃(N−1)  · · ·  ac_x̃(1)
         ac_x̃(1)    ac_x̃(0)    · · ·  ac_x̃(2)
         · · ·
         ac_x̃(N−1)  ac_x̃(N−2)  · · ·  ac_x̃(0) ]
     = [ a_x̃ | a_x̃↓1 | a_x̃↓2 | · · · | a_x̃↓(N−1) ]

where a_x̃ := [ ac_x̃(0)  ac_x̃(1)  ac_x̃(2)  · · · ]^T

SLIDE 46

Circulant matrix

Each column is a unit circular shift of the previous column:

[ a d c b
  b a d c
  c b a d
  d c b a ]

SLIDE 47

Sample covariance matrix [image]

SLIDE 48

Eigendecomposition of circulant matrix

Any circulant matrix C ∈ C^{N×N} can be written as

C = (1/N) F*[N] Λ F[N]

where F[N] is the DFT matrix and Λ is a diagonal matrix.
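This factorization is easy to verify numerically. Below, C is built column by column from circular shifts of its first column c (as on the circulant-matrix slide), and F is the symmetric DFT matrix; the diagonal entries are the DFT of c (a sketch):

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0, 4.0])   # first column of the circulant matrix
N = len(c)
# columns are the circular shifts of c
C = np.column_stack([np.roll(c, s) for s in range(N)])
F = np.fft.fft(np.eye(N))            # DFT matrix: F @ x equals np.fft.fft(x)

# C = (1/N) F* diag(c_hat) F, with eigenvalues c_hat = fft(c)
assert np.allclose(C, F.conj().T @ np.diag(np.fft.fft(c)) @ F / N)
```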

SLIDE 49

Proof

For any vector x ∈ C^N,

Cx = c ∗ x = (1/N) F*[N] diag(ĉ) F[N] x

where c is the first column of C.

SLIDE 50

Eigendecomposition of circulant covariance matrix

A valid eigendecomposition is given by

C = ( (1/√N) F*[N] ) diag(ĉ) ( (1/√N) F[N] )

If the entries of ĉ have different values, the singular vectors are sinusoids!

SLIDE 51

PCA on stationary vector

Let x̃ be wide-sense stationary with autocovariance vector a_x̃. The eigendecomposition of the covariance matrix of x̃ equals

Σ_x̃ = (1/N) F* diag(â_x̃) F

SLIDE 52

CIFAR-10 images

SLIDE 53

Rows of covariance matrix: 1, 4, 8, 12 [plots]

SLIDE 54

Rows of covariance matrix: 15, 16, 17, 18 [plots]

SLIDE 55

Rows of covariance matrix: 20, 24, 28, 32 [plots]

SLIDE 56

Principal directions 1–10 [images]

SLIDE 57

Principal directions 15–90 [images]

SLIDE 58

Principal directions 100–1000 [images]

SLIDE 59

PCA of natural images

Principal directions tend to be sinusoidal. This suggests using 2D sinusoids for dimensionality reduction. JPEG compresses images using the discrete cosine transform (DCT):

1. The image is divided into 8 × 8 patches
2. Each DCT band is quantized differently (more bits for lower frequencies)
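The transform step above can be sketched with SciPy's orthonormal 2D DCT; keeping only a 4 × 4 low-frequency band is our crude stand-in for JPEG's per-band quantization, not the actual JPEG pipeline:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(4)
patch = rng.standard_normal((8, 8))      # stand-in for one 8x8 image patch

coeffs = dctn(patch, norm="ortho")       # 2D DCT of the patch
lowpass = np.zeros_like(coeffs)
lowpass[:4, :4] = coeffs[:4, :4]         # keep only the lowest 4x4 frequency band
approx = idctn(lowpass, norm="ortho")    # lossy reconstruction

# the orthonormal DCT is invertible: keeping every band reconstructs the patch
assert np.allclose(idctn(coeffs, norm="ortho"), patch)
```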

SLIDE 60

DCT basis vectors

SLIDE 61

Projection of each 8×8 block onto first DCT coefficients: 1, 5, 15, 30, 50 [images]

SLIDE 62

Stationarity: Translation · Linear translation-invariant models · Stationary signals and PCA · Wiener filtering

SLIDE 63

Signal estimation

Goal: estimate an N-dimensional signal from N-dimensional data. The minimum-MSE estimator is the conditional mean, which is usually intractable. What about the linear minimum-MSE estimator?

SLIDE 64

Linear MMSE

Let ỹ and x̃ be N-dimensional zero-mean random vectors. If Σ_x̃ is full rank, then

Σ_x̃^{−1} Σ_x̃ỹ = arg min_B E( ‖ỹ − B^T x̃‖²₂ )

where Σ_x̃ỹ := E( x̃ ỹ^T ).
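The formula can be sanity-checked with population covariances: writing the MSE in terms of Σ_x̃, Σ_x̃ỹ, Σ_ỹ (zero-mean case), the coefficient matrix Σ_x̃^{−1} Σ_x̃ỹ beats any perturbation. A sketch with an arbitrary random covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5
A = rng.standard_normal((2 * N, 2 * N))
Sigma = A @ A.T                                  # covariance of the stacked vector (x, y)
Sigma_x, Sigma_xy = Sigma[:N, :N], Sigma[:N, N:]
Sigma_y = Sigma[N:, N:]

def mse(B):
    # E ||y - B^T x||^2 expanded in terms of the covariances (zero-mean case)
    return np.trace(Sigma_y) - 2 * np.trace(B.T @ Sigma_xy) + np.trace(B.T @ Sigma_x @ B)

B_opt = np.linalg.solve(Sigma_x, Sigma_xy)       # Sigma_x^{-1} Sigma_xy

# any perturbation of the optimal coefficient matrix increases the MSE
for _ in range(5):
    assert mse(B_opt) < mse(B_opt + 0.1 * rng.standard_normal((N, N)))
```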

SLIDE 65

Proof

The cost function can be decomposed into

E( ‖ỹ − B^T x̃‖²₂ ) = Σ_{j=1}^{N} E( ( ỹ[j] − B_j^T x̃ )² )

Each term is a linear regression problem with optimal estimator

Σ_x̃^{−1} (Σ_x̃ỹ)_j = arg min_{B_j} E( ( ỹ[j] − x̃^T B_j )² )

where (Σ_x̃ỹ)_j is the jth column of Σ_x̃ỹ.

SLIDE 66

Joint stationarity

x̃ and ỹ are jointly wide-sense (or weak-sense) stationary if

1. each of them is wide-sense stationary
2. there is a function cc_x̃ỹ such that E(x̃[j1] ỹ[j2]) = cc_x̃ỹ((j2 − j1) mod N), 0 ≤ j1, j2 ≤ N − 1,

i.e. they have translation-invariant cross-covariance.

SLIDE 67

Cross-covariance

cc_x̃ỹ is the cross-covariance of x̃ and ỹ.

Σ_x̃ỹ = [ cc_x̃ỹ(0)    cc_x̃ỹ(N−1)  · · ·  cc_x̃ỹ(1)
          cc_x̃ỹ(1)    cc_x̃ỹ(0)    · · ·  cc_x̃ỹ(2)
          · · ·
          cc_x̃ỹ(N−1)  cc_x̃ỹ(N−2)  · · ·  cc_x̃ỹ(0) ]
       = [ c_x̃ỹ | c_x̃ỹ↓1 | c_x̃ỹ↓2 | · · · | c_x̃ỹ↓(N−1) ]

where c_x̃ỹ := [ cc_x̃ỹ(0)  cc_x̃ỹ(1)  cc_x̃ỹ(2)  · · · ]^T

SLIDE 68

Wiener filter

Let x̃ and ỹ be zero-mean and jointly stationary. The linear estimate of ỹ given x̃ that minimizes the MSE is the convolution of x̃ with the Wiener filter w, defined by

ŵ[k] := Cov( x̃F[k], ỹF[k] ) / Var( x̃F[k] ),   0 ≤ k ≤ N − 1

where x̃F and ỹF denote the DFT coefficients of x̃ and ỹ, and

Cov( x̃F[k], ỹF[k] ) := E( x̃F[k] conj(ỹF[k]) ),   Var( x̃F[k] ) := E( |x̃F[k]|² ),   0 ≤ k ≤ N − 1

SLIDE 69

Proof

Σ_x̃ = (1/N) F* diag(â_x̃) F,   Σ_x̃ỹ = (1/N) F* diag(ĉ_x̃ỹ) F

Σ_x̃ỹ^T Σ_x̃^{−1} = ( (1/N) F* diag(â_x̃) F )^{−1} (1/N) F* diag(ĉ_x̃ỹ) F
= (1/N) F* diag(â_x̃^{−1}) F (1/N) F* diag(ĉ_x̃ỹ) F
= (1/N) F* diag(â_x̃^{−1} ĉ_x̃ỹ) F

(using F F* = N I)

SLIDE 70

Proof

Σ_x̃F := E( F x̃ (F x̃)* ) = F E( x̃ x̃^T ) F* = F Σ_x̃ F*
= F (1/N) F* diag(â_x̃) F F*
= N diag(â_x̃)

so â_x̃[k] = Var( x̃F[k] ) / N,   0 ≤ k ≤ N − 1

SLIDE 71

Proof

Σ_x̃F ỹF := E( F x̃ (F ỹ)* ) = F E( x̃ ỹ^T ) F* = F Σ_x̃ỹ F*
= F (1/N) F* diag(ĉ_x̃ỹ) F F*
= N diag(ĉ_x̃ỹ)

so ĉ_x̃ỹ[k] = Cov( x̃F[k], ỹF[k] ) / N,   0 ≤ k ≤ N − 1

SLIDE 72

Proof

Σ_x̃ỹ^T Σ_x̃^{−1} = (1/N) F* diag(â_x̃^{−1} ĉ_x̃ỹ) F
= (1/N) F* diag( Cov(x̃F[k], ỹF[k]) / Var(x̃F[k]), 0 ≤ k ≤ N − 1 ) F

i.e. applying the estimator is a convolution with the filter whose transfer function is ŵ.
SLIDE 73

Least squares

Given a training set (y1, x1), (y2, x2), . . . , (yn, xn), the optimal LTI estimator

w := arg min_v Σ_{j=1}^{n} ‖yj − v ∗ xj‖²₂

is a Wiener filter with transfer function

ŵ[k] = cov( X̂[k], Ŷ[k] ) / var( X̂[k] ),   0 ≤ k ≤ N − 1

where

cov( X̂[k], Ŷ[k] ) := (1/n) Σ_{j=1}^{n} conj(x̂j[k]) ŷj[k],   var( X̂[k] ) := (1/n) Σ_{j=1}^{n} |x̂j[k]|²,   0 ≤ k ≤ N − 1
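This empirical transfer function takes a few lines to compute. In the sketch below the training outputs are generated by a known LTI system (the decaying impulse response h is an arbitrary choice of ours), so the estimate should recover its DFT exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
N, n = 64, 50
h = np.exp(-np.arange(N) / 4.0)          # impulse response of the system to identify
xs = rng.standard_normal((n, N))
# outputs: circular convolution of each training input with h, via the DFT
ys = np.real(np.fft.ifft(np.fft.fft(xs, axis=1) * np.fft.fft(h), axis=1))

X, Y = np.fft.fft(xs, axis=1), np.fft.fft(ys, axis=1)
# per-frequency least squares: empirical cov / empirical var
w_hat = (np.conj(X) * Y).mean(axis=0) / (np.abs(X) ** 2).mean(axis=0)

# noiseless LTI training data: the estimated transfer function matches fft(h)
assert np.allclose(w_hat, np.fft.fft(h))
```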

SLIDE 74

Proof

Σ_{j=1}^{n} ‖yj − v ∗ xj‖²₂ = Σ_{j=1}^{n} ‖ (1/N) F* ( ŷj − v̂ ◦ x̂j ) ‖²₂
= (1/N) Σ_{j=1}^{n} ‖ ŷj − v̂ ◦ x̂j ‖²₂
= (1/N) Σ_{j=1}^{n} Σ_{k=0}^{N−1} | ŷj[k] − v̂[k] x̂j[k] |²
=: (1/N) Σ_{k=0}^{N−1} Ck( v̂[k] )

Each Ck can be minimized separately, which yields the per-frequency least-squares solution ŵ[k].

SLIDE 75

Denoising

Measurements: x̃ = ỹ + z̃, where z̃ is zero-mean Gaussian noise with variance σ², independent of ỹ.

SLIDE 76

Noise

A linear transformation Az̃ of a Gaussian vector with mean µ and covariance matrix Σ is Gaussian with mean Aµ and covariance matrix AΣA*. The Fourier coefficients of the noise are therefore Gaussian with zero mean and covariance matrix

F[N] σ²I F*[N] = Nσ² I

i.e. they are iid Gaussian with variance Nσ².
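A Monte Carlo check of the Nσ² variance (a sketch; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
N, sigma, trials = 32, 0.5, 100_000
z = sigma * rng.standard_normal((trials, N))     # iid Gaussian noise, variance sigma^2
zF = np.fft.fft(z, axis=1)

# the empirical variance of every Fourier coefficient is close to N * sigma^2
var = np.mean(np.abs(zF) ** 2, axis=0)
assert np.allclose(var, N * sigma ** 2, rtol=0.05)
```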

SLIDE 77

Wiener filter

Cov( x̃F[k], ỹF[k] ) = E( x̃F[k] conj(ỹF[k]) ) = E( ỹF[k] conj(ỹF[k]) ) + E( z̃F[k] conj(ỹF[k]) ) = Var( ỹF[k] )

(the cross term vanishes because z̃ and ỹ are independent and zero mean)

Var( x̃F[k] ) = Var( ỹF[k] ) + Var( z̃F[k] ) = Var( ỹF[k] ) + Nσ²

ŵ[k] = Var( ỹF[k] ) / ( Var( ỹF[k] ) + Nσ² ),   0 ≤ k ≤ N − 1
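A small denoising sketch under an assumed prior: the clean signal is white noise smoothed by a filter g (our choice, not from the slides), so Var(ỹF[k]) = N |ĝ[k]|², and the factors of N cancel in ŵ:

```python
import numpy as np

rng = np.random.default_rng(7)
N, sigma, trials = 64, 0.5, 2000
g = np.exp(-np.arange(N) / 3.0)          # smoothing filter defining the signal prior
g_hat2 = np.abs(np.fft.fft(g)) ** 2

# Var(yF[k]) = N |g_hat[k]|^2 and Var(zF[k]) = N sigma^2, so the N's cancel:
w_hat = g_hat2 / (g_hat2 + sigma ** 2)

mse_noisy = mse_denoised = 0.0
for _ in range(trials):
    e = rng.standard_normal(N)
    y = np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(e)))   # stationary clean signal
    x = y + sigma * rng.standard_normal(N)                     # noisy measurements
    y_est = np.real(np.fft.ifft(w_hat * np.fft.fft(x)))        # Wiener-filtered estimate
    mse_noisy += np.mean((x - y) ** 2)
    mse_denoised += np.mean((y_est - y) ** 2)

assert mse_denoised < mse_noisy          # the Wiener filter reduces the MSE
```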

SLIDE 78

Audio data [waveform, 4.0–4.5 s]

SLIDE 79

Audio data: Variance of Fourier coefficients [log-scale plot vs. frequency (Hz)]

SLIDE 80

Wiener filter: σ = 0.02 [frequency- and time-domain plots]

SLIDE 81

Wiener filter: σ = 0.1 [frequency- and time-domain plots]

SLIDE 82

Wiener filter: σ = 0.5 [frequency- and time-domain plots]

SLIDE 83

Example: σ = 0.1 (clean, noisy, and denoised signals and their spectra) [plots]

SLIDE 84

Example: σ = 0.1 (zoomed comparison of signal, denoised signal, and noisy data) [plot]

SLIDE 85

Example: σ = 0.5 (clean, noisy, and denoised signals and their spectra) [plots]

SLIDE 86

Example: σ = 0.5 (zoomed comparison of signal, denoised signal, and noisy data) [plot]

SLIDE 87

Image data [images]

SLIDE 88

Image data: Variance of Fourier coefficients [2D log-scale plot over k1, k2]

SLIDE 89

Wiener filter: σ = 0.04 [frequency- and space-domain plots]

SLIDE 90

Wiener filter: σ = 0.1 [frequency- and space-domain plots]

SLIDE 91

Wiener filter: σ = 0.2 [frequency- and space-domain plots]

SLIDE 92

Example: σ = 0.04 (clean, noisy, and denoised spectra) [plots]

SLIDE 93

Example: σ = 0.1 (clean, noisy, and denoised spectra) [plots]

SLIDE 94

Example: σ = 0.2 (clean, noisy, and denoised spectra) [plots]