Estimation of Autoregressive Processes with Sparse Parameters


slide-1
SLIDE 1

Estimation of Autoregressive Processes with Sparse Parameters

Abbas Kazemipour

MAST Group Meeting, University of Maryland, College Park (kaazemi@umd.edu)

November 18, 2015

slide-2
SLIDE 2

Overview


slide-3
SLIDE 3

Introduction

1 AR(p):

x_k = θ_1 x_{k−1} + θ_2 x_{k−2} + · · · + θ_p x_{k−p} + w_k = θ^T x_{k−p}^{k−1} + w_k,   (1)

2 {w_k}: i.i.d. innovation sequence.

3 Assumption 1: σ_w² = 1 (WLOG).

4 Output of an LTI system with the z-transform of its impulse response being

H(z) = 1 / (1 − Σ_i θ_i z^{−i}).   (2)
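As a quick illustration of the model in (1), here is a minimal Python sketch that simulates a sparse AR(p) process with unit-variance Gaussian innovations; the support, coefficient values, sample size, and burn-in below are illustrative choices, not taken from the slides.

```python
import numpy as np

def simulate_sparse_ar(theta, n, burn_in=500, rng=None):
    """Simulate x_k = theta_1 x_{k-1} + ... + theta_p x_{k-p} + w_k with w_k ~ N(0, 1)."""
    rng = np.random.default_rng(rng)
    p = len(theta)
    x = np.zeros(burn_in + n)
    for k in range(p, burn_in + n):
        # x[k-p:k][::-1] is (x_{k-1}, ..., x_{k-p}), matching theta = (theta_1, ..., theta_p)
        x[k] = theta @ x[k - p:k][::-1] + rng.standard_normal()
    return x[burn_in:]

# Illustrative sparse parameter vector: p = 100 with s = 3 nonzero entries,
# chosen so that sum(|theta_i|) < 1 (see Assumption 2 below, which gives stability).
p = 100
theta = np.zeros(p)
theta[[0, 9, 49]] = [0.4, -0.25, 0.2]

x = simulate_sparse_ar(theta, n=500, rng=0)
print(x.shape)  # (500,)
```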


slide-7
SLIDE 7

Introduction

1 Assumption 2: ‖θ‖₁ = Σ_i |θ_i| ≤ 1 − ǫ < 1

All the poles inside the unit circle ⇒ stable

2 The AR(p) process {x_i}_{i=−∞}^{∞} in (1) is stationary (in the strict sense).

3 By (2), the power spectral density of the process:

S(ω) = σ_w² / |1 − Σ_i θ_i e^{−jωi}|² = 1 / |1 − Σ_i θ_i e^{−jωi}|²   (3)

4 Assumption 3: θ is an s-sparse vector with s ≪ p.
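Continuing the sketch above, the power spectral density in (3) can be evaluated directly from θ on a frequency grid, with σ_w² = 1 as in Assumption 1. This is an illustrative snippet, not code from the talk.

```python
import numpy as np

def ar_psd(theta, omegas, sigma_w2=1.0):
    """S(omega) = sigma_w^2 / |1 - sum_i theta_i exp(-j*omega*i)|^2, i = 1..p."""
    p = len(theta)
    lags = np.arange(1, p + 1)
    # E[w, i] = exp(-j * omega_w * i) for each frequency / lag pair
    E = np.exp(-1j * np.outer(omegas, lags))
    denom = np.abs(1.0 - E @ theta) ** 2
    return sigma_w2 / denom

omegas = np.linspace(0, np.pi, 512)
S = ar_psd(theta, omegas)   # `theta` from the previous sketch
print(S.min(), S.max())     # bounded away from 0 and infinity under Assumption 2
```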


slide-11
SLIDE 11

Introduction

1 We observe n consecutive snapshots of length p (a total of n + p − 1 samples): {x_k}_{k=−p+1}^{n}

2 Questions:

What is a good optimization problem for estimating θ?
How does a suitable n compare to p, s order-wise?
How does such an estimator perform compared to the traditional estimation methods?


slide-16
SLIDE 16

Introduction

1 Yule-Walker equations:

R_{p×p} θ = r_{−p}^{−1},   r_0 = θ^T r_{−p}^{−1} + 1 = 1 / |1 − Σ_i θ_i|²,   (4)

2 R := R_{p×p} = E[x_1^p (x_1^p)^T]: the p × p covariance matrix of the process

3 r_k = E[x_i x_{i+k}] is the k-th autocorrelation

4 If n ≫ p ⇒ estimate R and the r_k's + Yule-Walker

Does not exploit the sparsity of θ
Does not perform well when n is small or comparable with p
Poor estimates, mainly because this requires an inversion of R̂_p, which might not be numerically stable

5 Usually biased estimates are used, at the cost of distorting the Yule-Walker equations.
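For reference, a minimal sketch of the classical (unregularized) Yule-Walker fit in (4), using biased sample autocorrelations and a Toeplitz R̂; the estimator choices and solver call here are illustrative assumptions, not the exact procedure from the slides.

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def yule_walker(x, p):
    """Solve R_hat theta = r_hat for an AR(p) fit from biased autocorrelation estimates."""
    n = len(x)
    # r_hat[k] ~ E[x_i x_{i+k}], biased (divide by n) estimate, k = 0..p
    r = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])
    R = toeplitz(r[:p])           # p x p sample autocovariance matrix
    return solve(R, r[1:p + 1])   # may be ill-conditioned when n is not much larger than p

theta_yw = yule_walker(x, p=100)  # `x` from the simulation sketch above
```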


slide-24
SLIDE 24

Our Formulation

1 LASSO-type estimator given by a conditional log-likelihood penalization:

minimize over θ ∈ R^p:   (1/n)‖x_1^n − Xθ‖₂² + γ_n‖θ‖₁,   (5)

where

X = [ x_{n−1}   x_{n−2}   · · ·   x_{n−p}
      x_{n−2}   x_{n−3}   · · ·   x_{n−p−1}
        ⋮         ⋮        ⋱        ⋮
      x_0       x_{−1}    · · ·   x_{−p+1} ].   (6)

2 X is a Toeplitz matrix with highly correlated elements.
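A minimal numpy sketch of the estimator in (5): build the history matrix X of (6) from the observed samples and solve the ℓ1-penalized least-squares problem with ISTA (proximal gradient). The step size, iteration count, and γ_n value are illustrative assumptions on my part, not the constants from the theory.

```python
import numpy as np

def build_history_matrix(x, p):
    """x holds samples x_{-p+1}, ..., x_n in time order (length n + p).
    Returns X as in (6) (rows ordered so that row i regresses x_{i+1}) and y = (x_1, ..., x_n)."""
    n = len(x) - p
    y = x[p:]
    X = np.empty((n, p))
    for i in range(n):
        X[i] = x[i:i + p][::-1]   # (x_i, x_{i-1}, ..., x_{i-p+1}) in array coordinates
    return X, y

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_ista(X, y, gamma, n_iter=2000):
    """Minimize (1/n)||y - X theta||_2^2 + gamma * ||theta||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    theta_hat = np.zeros(p)
    for _ in range(n_iter):
        grad = -2.0 / n * X.T @ (y - X @ theta_hat)
        theta_hat = soft_threshold(theta_hat - grad / L, gamma / L)
    return theta_hat

# Illustrative use with the simulated data above (the first p samples serve as the history).
X, y = build_history_matrix(x, p=100)
theta_sp = lasso_ista(X, y, gamma=0.1)
```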


slide-26
SLIDE 26

Results

Theorem

If σ_s(θ) = O(√s), there exist positive constants c₁, c₂, c₃ and c_ǫ such that for n > c_ǫ s p^{2/3} (log p)^{2/3} and a choice of regularization parameter γ_n = c₁ √(log p / n), any solution θ̂_sp to (5) satisfies the bound

‖θ̂_sp − θ‖₂ ≤ c₂ √(s log p / n) + c₂ σ_s(θ) (log p / n)^{1/4},   (7)

with probability greater than 1 − O(1/n^{c₃}).

1 n can be much less than p
2 Better than Yule-Walker


slide-29
SLIDE 29

Simulation: p = 100, n = 500, s = 3, γ_n = 0.1

[Figure: Recovery results for n = 500, p = 100, s = 3. Panels: True Parameters, Regularized ML, Yule-Walker.]

slide-30
SLIDE 30

Proof of the Main Theorem

Lemma (Cone Condition)

For the choice of regularization parameter γ_n = (2/n)‖X^T(x_1^n − Xθ)‖_∞, the optimal error h = θ̂ − θ belongs to the cone

C := {h ∈ R^p : ‖h_{S^c}‖₁ ≤ 3‖h_S‖₁}.   (8)

slide-31
SLIDE 31

Proof of the Main Theorem

Definition (Restricted Eigenvalue Condition)

X is said to satisfy the RE condition of order s if

λ_max(s)‖θ‖₂² ≥ (1/n) θ^T X^T X θ = (1/n)‖Xθ‖₂² ≥ λ_min(s)‖θ‖₂²,   (9)

for all θ which are s-sparse.

1 Essentially requires the eigenvalues of all n × s submatrices of X to be bounded and strictly positive.


slide-33
SLIDE 33

Proof of the Main Theorem

Definition (Restricted Strong Convexity)

X is said to satisfy the RSC condition of order s if

(1/n) h^T X^T X h = (1/n)‖Xh‖₂² ≥ κ‖h‖₂²,   ∀h ∈ C.   (10)

slide-34
SLIDE 34

Proof of the Main Theorem

Lemma (Theorem 1 of Negahban)

If X satisfies the RSC condition then any optimal solution θ̂ satisfies

‖θ̂ − θ‖₂ ≤ 2√s γ_n / κ,   ‖θ̂ − θ‖₁ ≤ 6 s γ_n / κ.   (⋆)

slide-35
SLIDE 35

Proof of the Main Theorem

Lemma (Lemma 4.1 of Bickel)

If X satisfies the RE condition of order s⋆ = (r + 1)s then the RSC condition is also satisfied with

κ = √(λ_min((r + 1)s)) (1 − 3 √(λ_max(rs) / (r λ_min((r + 1)s)))).   (11)

slide-36
SLIDE 36

Proof of the Main Theorem

1 Step 1: Finding a lower bound on κ

2 Step 2: Finding an upper bound on γ_n


slide-38
SLIDE 38

Finding a lower bound on κ

Lemma (Haykin)

Let R ∈ R^{k×k} be the k × k covariance matrix of a stationary process with power spectral density S(ω), and denote its maximum and minimum eigenvalues by φ_max(k) and φ_min(k) respectively. Then

φ_min(k) ↓ inf_ω S(ω),   (12)

and

φ_max(k) ↑ sup_ω S(ω).   (13)
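The lemma can be checked numerically by building Toeplitz covariance matrices of increasing order k and tracking their extreme eigenvalues against inf_ω S(ω) and sup_ω S(ω). A rough sketch, under the (illustrative) assumption that biased sample autocovariances from one long simulated realization stand in for the true r_k:

```python
import numpy as np
from scipy.linalg import toeplitz

# `simulate_sparse_ar` and `theta` are from the first sketch above
x_long = simulate_sparse_ar(theta, n=20000, rng=1)
n = len(x_long)
r = np.array([x_long[:n - k] @ x_long[k:] / n for k in range(301)])

for k in (10, 50, 100, 300):
    eigs = np.linalg.eigvalsh(toeplitz(r[:k]))
    # extreme eigenvalues should approach inf_w S(w) and sup_w S(w) as k grows
    print(k, eigs.min(), eigs.max())
```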

slide-39
SLIDE 39

Convergence of the Eigenvalues of R

[Figure: Minimum and maximum eigenvalues of R, λ_min(k) and λ_max(k), as a function of k (left), and the power spectral density of the process (right), for n = 500, p = 100, s = 3.]

slide-40
SLIDE 40

Finding a lower bound on κ

Corollary (RE of R)

Under the assumptions of our problem, the covariance matrix R of an AR process satisfies the RE condition (of any order) with λ_max = 1/ǫ² and λ_min = 1/4.

Proof.

For an AR(p) process, S(ω) = σ_w² / |1 − Σ_i θ_i e^{−jωi}|². Using the assumption ‖θ‖₁ ≤ 1 − ǫ in conjunction with Lemma 6 proves the claim.

1 The RE condition also holds for any stationary process satisfying inf_ω S(ω) > 0 and sup_ω S(ω) < ∞!


slide-42
SLIDE 42

Finding a lower bound on κ

Lemma (RE of R̂)

If R satisfies the RE condition with parameters λ_max and λ_min, then R̂ satisfies the RE condition of order s⋆ with parameters λ′_max = λ_max + t s⋆ and λ′_min = λ_min − t s⋆, where t = max_{i,j} |R̂_ij − R_ij|.

Proof.

For every s⋆-sparse vector θ we have

θ^T R̂ θ ≥ θ^T R θ − t‖θ‖₁² ≥ (λ_min − t s⋆)‖θ‖₂²,
θ^T R̂ θ ≤ θ^T R θ + t‖θ‖₁² ≤ (λ_max + t s⋆)‖θ‖₂²,

which is what we claimed.


slide-45
SLIDE 45

Finding a lower bound on κ

Lemma (RE of R̂)

If R satisfies the RE condition with parameters λ_max and λ_min, then R̂ satisfies the RE condition of order s⋆ with parameters λ′_max = λ_max + t s⋆ and λ′_min = λ_min − t s⋆, where t = max_{i,j} |R̂_ij − R_ij|.

1 This holds for every t.
2 We will be interested in t = λ_min / (2s⋆).
3 Noting that R̂ = X^T X / n, Corollary 7 + Lemma 5 with r = 288(1/ǫ² + 1/8) ⇒ X satisfies the RSC with parameter κ = 1/(4√2).
4 In order to complete this bound it only remains to show that t can be chosen to be suitably small.


slide-50
SLIDE 50

Concentration Inequality

Lemma (Concentration Inequality: Theorem 4 of Rudzkis)

Let x_{−p+1}^n be samples of a stationary process which satisfies x_k = Σ_{j=−∞}^{∞} b_{j−k} w_j, where the w_k's are i.i.d. random variables with

|E(w_j^k)| ≤ C^k k!,   k = 2, 3, · · ·,   (14)

and Σ_{j=−∞}^{∞} |b_j| < ∞. Then the biased sample autocorrelation given by

r̂_k^b = (1/(n + k)) Σ_{i,j=1, j−i=k}^{n+k} x_i x_j

satisfies

P(|r̂_k^b − r_k^b| > t) ≤ c₁ exp(−c₂ t^{3/2} √(n + k)),   (15)

for positive constants c₁ and c₂ which are independent of the dimensions of the problem.
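For concreteness, a small sketch of the biased and unbiased sample autocorrelation estimates that the concentration results here refer to; the normalizations below follow the usual conventions (divide by n or by n − k), which is an assumption on my part and may differ from the exact indexing used on the slide.

```python
import numpy as np

def sample_autocorr(x, max_lag, unbiased=False):
    """Sample autocorrelations r_hat_k = sum_i x_i x_{i+k} / norm for k = 0..max_lag."""
    n = len(x)
    r = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        norm = (n - k) if unbiased else n
        r[k] = x[:n - k] @ x[k:] / norm
    return r

r_biased = sample_autocorr(x, max_lag=100)                  # `x` from the earlier sketch
r_unbiased = sample_autocorr(x, max_lag=100, unbiased=True)
```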

slide-51
SLIDE 51

Concentration Inequality

Corollary (Concentration Inequality for Unbiased Estimates)

The unbiased estimate satisfies

P(|r̂_k − r_k| > t) ≤ c₁ exp(−c₂ n^{3/2} t^{3/2} / (n + k)).   (16)

1 Choose t⋆ = λ_min / (2(r + 1)s) = c_ǫ/s + union bound (k = p in all inequalities):

P(max_{i,j} |R̂_ij − R_ij| > t⋆) ≤ c₁ p² exp(−c₂ n^{3/2} t⋆^{3/2} / (n + p)) ≤ c₁ exp(−c_ǫ n^{3/2} / (s^{3/2}(n + p)) + 2 log p).

2 p ≫ n + choose n > c s p^{2/3} (log p)^{2/3} ⇒ bound on κ.


slide-54
SLIDE 54

Finding an upper bound on γn

1 Gradient of the objective function (1/n)‖x_1^n − Xθ‖₂²:

∇L(θ) := (2/n) X^T (x_1^n − Xθ),

2 Lemmas 8 and 4 suggest that a suitable choice of the regularization parameter is given by γ_n = ‖∇L(θ)‖_∞.


slide-56
SLIDE 56

Finding an upper bound on γn

1 First, it is easy to check that by the uncorrelatedness of the w_k's we have

E[∇L(θ)] = (2/n) E[X^T (x_1^n − Xθ)] = (2/n) E[X^T w_1^n] = 0.   (17)

2 In linear regression terminology, (17) is known as the orthogonality principle.

3 We show that ∇L(θ) is concentrated around its mean.


slide-59
SLIDE 59

Finding an upper bound on γn

1 We have

(∇L(θ))_i = (2/n) (x_{−i+1}^{n−i})^T w_1^n

2 The j-th element in this expansion is of the form y_j = x_{n−i−j+1} w_{n−j+1}.

3 It is easy to check that the sequence y_1^n is a martingale with respect to the filtration given by F_j = σ(x_{−p+1}^{n−j+1}), where σ(·) denotes the sigma-field generated by the random variables in its argument.


slide-62
SLIDE 62

Finding an upper bound on γn

1 We will now state the following concentration result for sums of dependent random variables [?]:

Proposition

Fix n ≥ 1. Let the Z_j's be subgaussian F_j-measurable random variables satisfying, for each j = 1, 2, · · ·, n, E[Z_j | F_{j−1}] = 0 almost surely. Then there exists a constant c such that for all t > 0,

P(|(1/n) Σ_{j=1}^n (Z_j − E[Z_j])| ≥ t) ≤ exp(−c n t²).

slide-63
SLIDE 63

Finding an upper bound on γn

1 Since the y_j's are a product of two independent subgaussian random variables, they are subgaussian as well.

2 Proposition 1 implies that

P(|∇L(θ)_i| ≥ t) ≤ exp(−c n t²).   (18)

By the union bound, we get

P(‖∇L(θ)‖_∞ ≥ t) ≤ exp(−c t² n + log p).   (19)

Choosing t = √((1 + α₁)/c) √(log p / n) for some α₁ > 0 yields

P(‖∇L(θ)‖_∞ ≥ √((1 + α₁)/c) √(log p / n)) ≤ 2 exp(−α₁ log p) ≤ 2/n^{α₁}.


slide-65
SLIDE 65

Finding an upper bound on γn

1 Hence γ_n ≤ d₂ √(log p / n) with d₂ := √((1 + α₁)/c), with probability at least 1 − 2/n^{α₁}.

2 Combined with the result of Corollary 12, for n > d₁ s p^{2/3} (log p)^{2/3} we get the claim of Theorem 1.


slide-67
SLIDE 67

Future Work

1 Greedy methods
2 Penalized Yule-Walker
3 Dynamic ℓ1 reconstruction
4 Dynamic Durbin-Levinson


slide-71
SLIDE 71

Other Methods

1 Penalized Yule-Walker

minimize over θ ∈ R^p:   ‖R̂θ − r̂_{−p}^{−1}‖₂ + λ‖θ‖₁

Need to do fourth-moment analysis instead

2 Instead try

minimize over θ ∈ R^p:   ‖R̂θ − r̂_{−p}^{−1}‖₁ + λ‖θ‖₁

slide-72
SLIDE 72

Summary of Simulation Methods

1 Regularized ML

minimize over θ ∈ R^p:   ‖y − Xθ‖₂ + λ‖θ‖₁

2 Yule-Walker ℓ2,1

minimize over θ ∈ R^p:   ‖r̂_{−p}^{−1} − R̂θ‖₂ + λ‖θ‖₁

3 Yule-Walker ℓ1,1

minimize over θ ∈ R^p:   ‖r̂_{−p}^{−1} − R̂θ‖₁ + λ‖θ‖₁

4 Least-squares solutions to Yule-Walker and Maximum Likelihood (traditional methods)
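A rough sketch of the two penalized Yule-Walker variants above, built from the sample autocorrelations of the earlier sketches; the use of cvxpy as the solver and the value of λ are my assumptions, not specified in the talk.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import toeplitz

p, lam = 100, 0.1
r = sample_autocorr(x, max_lag=p)          # from the autocorrelation sketch above
R_hat, r_hat = toeplitz(r[:p]), r[1:p + 1]

theta_v = cp.Variable(p)
for norm in (2, 1):                        # Yule-Walker l2,1 and l1,1 variants
    obj = cp.Minimize(cp.norm(R_hat @ theta_v - r_hat, norm) + lam * cp.norm1(theta_v))
    cp.Problem(obj).solve()
    print(norm, np.count_nonzero(np.abs(theta_v.value) > 1e-6))   # size of recovered support
```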

slide-73
SLIDE 73

Simulation Results for n = 60, p = 200, s = 4

[Figure: Estimated coefficients for True Parameters, Regularized ML, ML Least Squares, OMP, Yule-Walker ℓ2,1, Yule-Walker Least Squares, OMP, and Yule-Walker ℓ1,1.]

slide-74
SLIDE 74

Simulation Results for n = 120, p = 200, s = 4

[Figure: Estimated coefficients for the same eight methods as above.]

slide-75
SLIDE 75

Simulation Results for n = 180, p = 200, s = 4

[Figure: Estimated coefficients for the same eight methods as above.]

slide-76
SLIDE 76

Simulation Results for n = 280, p = 200, s = 4

[Figure: Estimated coefficients for the same eight methods as above.]

slide-77
SLIDE 77

Simulation Results for n = 700, p = 200, s = 4

[Figure: Estimated coefficients for the same eight methods as above.]

slide-78
SLIDE 78

Simulation Results for n = 4000, p = 200, s = 4

[Figure: Estimated coefficients for the same eight methods as above.]

slide-79
SLIDE 79

MSE for different values of n, p = 300, s = 3

[Figure: MSE versus n (log-log scale) for Regularized ML, Least Squares, Yule-Walker ℓ1,1, Yule-Walker, Regularized ML + OMP, Yule-Walker ℓ2,1, and OMP + Yule-Walker.]

slide-80
SLIDE 80

OMP

Table: Autoregressive Orthogonal Matching Pursuit (AROMP)

Input: L(θ), s⋆.   Output: θ̂_AROMP^{(s⋆)}

Initialization: start with the index set S^{(0)} = ∅ and the initial estimate θ̂_AROMP^{(0)} = 0

for k = 1, 2, · · ·, s⋆:
    j = arg max_i |(∇L(θ̂_AROMP^{(k−1)}))_i|
    S^{(k)} = S^{(k−1)} ∪ {j}
    θ̂_AROMP^{(k)} = arg min over supp(θ) ⊂ S^{(k)} of L(θ)
end
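A minimal Python sketch of the AROMP iteration above, instantiated for the least-squares loss L(θ) = (1/n)‖y − Xθ‖₂². Carrying out the restricted minimization by an ordinary least-squares solve on the active set is an implementation choice of mine, not something specified on the slide.

```python
import numpy as np

def aromp(X, y, s_star):
    """Greedy AROMP: pick the coordinate with the largest gradient magnitude,
    then refit L(theta) restricted to the current support."""
    n, p = X.shape
    theta_hat = np.zeros(p)
    support = []
    for _ in range(s_star):
        grad = -2.0 / n * X.T @ (y - X @ theta_hat)   # gradient of (1/n)||y - X theta||^2
        grad[support] = 0.0   # coordinates already refit have (near-)zero gradient anyway
        j = int(np.argmax(np.abs(grad)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        theta_hat = np.zeros(p)
        theta_hat[support] = coef
    return theta_hat

theta_aromp = aromp(X, y, s_star=3)   # X, y from the LASSO sketch above
```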

slide-81
SLIDE 81

Main Theoretical Result

Theorem

If θ is (s, ξ, 2)-compressible for some ξ < 1/2, there exist constants d′_{1ǫ}, d′_{2ǫ}, d′_{3ǫ} and d′₄ such that for n > d′_{1ǫ} s^{2/3} p^{2/3} (log s)^{2/3} log p, the AROMP estimate satisfies the bound

‖θ̂_AROMP − θ‖₂ ≤ d′_{2ǫ} √(s log s log p / n) + d′_{3ǫ} (log s / s)^{1/ξ − 2}   (20)

after s⋆ = O_ǫ(s log s) iterations with probability greater than 1 − O(1/n^{d′₄}).

slide-82
SLIDE 82

Application to Financial Data

1 Crude oil price: Cushing, OK WTI Spot Price FOB dataset
2 The dataset consists of 7429 daily values
3 Outliers removed by visual inspection, n = 4000
4 Long-memory time series → first-order differencing
5 Model order selection of low importance here

slide-83
SLIDE 83

Application to Financial Data


slide-84
SLIDE 84

Conclusions

1 First-order differences show Gaussian behavior
2 Given no outliers, our method predicts a sudden change in prices every 40, 100, 150 days
3 Yule-Walker is bad!
4 Greedy is good!

slide-85
SLIDE 85

Minimax Framework

1 Minimax estimation risk over the class of (good) stationary processes:

R_e(θ̂) = sup_Θ E[‖θ̂ − θ‖₂²]^{1/2}.   (21)

2 The minimax estimator:

θ̂_minimax = arg min over θ̂ of R_e(θ̂).   (22)

3 Typically cannot be constructed → interested in order-optimal estimators:

R_e(θ̂) ≤ c R_e(θ̂_minimax).   (23)

4 Can also define the minimax prediction risk:

R_p(θ̂) = sup E[(x_k − θ̂′ x_{k−p}^{k−1})²].   (24)

slide-86
SLIDE 86

Minimax Framework

1 ℓ2-regularized LS problem: [Goldenhauser 2001]
2 Slightly weaker exponential inequality
3 p⋆ = ⌊−1/2 log(1 − ǫ) log n⌋ is minimax optimal
4 Requires a sample size exponentially large in p
5 Our result: the ℓ1-regularized LS estimator is minimax optimal
6 Can afford higher orders

slide-87
SLIDE 87

Minimax Optimality

Theorem

Let x_1^n be samples of an AR process with s-sparse parameters satisfying ‖θ‖₁ ≤ 1 − ǫ. Then with a choice of p⋆ = O_ǫ(n^{2/3}) we have:

c_ǫ √(s/n) ≤ R_e(θ̂_minimax) ≤ R_e(θ̂_sp) ≤ c′_ǫ R_e(θ̂_minimax),

that is, the ℓ1-regularized LS estimator is minimax optimal modulo logarithmic factors.

slide-88
SLIDE 88

Minimax Optimality

Theorem

Let x_{−p+1}^n be samples of an AR process with Gaussian innovations. There exist positive constants c_ǫ and c′_ǫ such that for n > c_ǫ s p^{2/3} (log p)^{2/3} we have:

R_p(θ̂_sp) ≤ c′_ǫ (1/n² + (1 − ǫ)^{2p} s + s/n) + 1.   (25)

1 Large n → the prediction error variance is very close to the variance of the innovations.

slide-89
SLIDE 89

Proofs:

1 Define the event:

A := {max_{i,j} |R̂_ij − R_ij| ≤ t⋆}.

2 P(A^c) ≤ c₁ exp(−c_ǫ n^{3/2} / (s^{3/2}(n + p)) + 2 log p).

slide-90
SLIDE 90

Proofs:

1

R_e(θ̂_minimax)² ≤ R_e(θ̂_sp)² = sup E[‖θ̂_sp − θ‖₂²]
 ≤ P(A) (c₂ √(s log p / n) + c₂ σ_s(θ) (log p / n)^{1/4})² + P(A^c) ‖θ̂_sp − θ‖₂²
 ≤ (c₂ √(s log p / n) + c₂ σ_s(θ) (log p / n)^{1/4})² + 4(1 − ǫ)² c₁ exp(−c_ǫ n^{3/2} / (s^{3/2}(n + p)) + 2 log p).

2 For n > c_ǫ s p^{2/3} (log p)^{2/3}, the first term will be the dominant factor.

slide-91
SLIDE 91

Proofs: Converse

1 Assumption: Gaussian innovations

Lemma (Fano's Inequality)

Let Z be a class of densities with a subclass Z⋆ of densities f_{θ_i}, i ∈ {0, · · ·, 2^s}, such that for any two distinct θ₁, θ₂ ∈ Z⋆: D(f_{θ₁} ‖ f_{θ₂}) ≤ β. Let θ̂ be an estimate of the parameters. Then

sup_j P(θ̂ ≠ θ_j | H_j) ≥ 1 − (β + log 2) / s,   (26)

where H_j denotes the hypothesis that θ_j is the true parameter, and induces the probability measure P(· | H_j).

slide-92
SLIDE 92

Proofs: Converse

1 Class Z of AR processes defined over a fixed subset S ⊂ {1, 2, · · ·, p} satisfying |S| = s and by the s-sparse parameter set given by:

θ_j = ±η e^{−N} 𝟙_S(j),   (27)

where η and N remain to be chosen.

2 Add the all-zero vector θ to Z.

3 |Z| = 2^s + 1.

slide-93
SLIDE 93

Proofs: Converse

Lemma (Gilbert-Varshamov Lemma)

∃ Z⋆ ⊂ Z such that |Z⋆| ≥ 2^{⌊s/8⌋} + 1, and any two distinct θ₁, θ₂ ∈ Z⋆ differ in at least s/16 components!

2 ‖θ₁ − θ₂‖₂ ≥ (1/4)√s e^{−N} := α.   (28)

3 Arbitrary estimate θ̂: hypothesis testing problem between the 2^{⌊s/8⌋} + 1 hypotheses H_j: θ = θ_j ∈ Z⋆, and the minimum-distance decoding strategy.

slide-94
SLIDE 94

Proofs: Converse

1 Markov's inequality:

sup_Z E[‖θ̂ − θ‖₂] ≥ sup_{Z⋆} E[‖θ̂ − θ‖₂] ≥ (α/2) sup_{Z⋆} P(‖θ̂ − θ‖₂ ≥ α/2) = (α/2) sup_{j=0,···,2^{⌊s/8⌋}} P(θ̂ ≠ θ_j | H_j).   (29)

slide-95
SLIDE 95

Proofs: Converse

1 f_{θ_i}: joint pdf of {x_k}_{k=1}^n conditioned on {x_k}_{k=−p+1}^0, i ∈ {0, · · ·, 2^s}.

2 Gaussian innovations, for i ≠ j:

D(f_{θ_i} ‖ f_{θ_j}) ≤ sup_{i≠j} E[log(f_{θ_i}/f_{θ_j}) | H_i]
 ≤ sup_{i≠j} E[−(1/2) Σ_{k=1}^n ((x_k − θ′_i x_{k−p}^{k−1})² − (x_k − θ′_j x_{k−p}^{k−1})²) | H_i]
 ≤ sup_{i≠j} (n/2) E[((θ_i − θ_j)′ x_{k−p}^{k−1})² | H_i]
 = (n/2) sup_{i≠j} (θ_i − θ_j)′ R_{p×p} (θ_i − θ_j)
 ≤ (n/2) sup_{i≠j} ‖θ_i − θ_j‖₂² λ_max ≤ η n s e^{−2N} / ǫ² := β.   (30)

slide-96
SLIDE 96

Proofs: Converse

1 Using Fano's:

sup_Z E[‖θ̂ − θ‖₂] ≥ (√s e^{−N} / 8) (1 − 8(η n s e^{−2N}/ǫ² + log 2) / s).

2 Choose η = ǫ² and N = log n, for large enough s and n.

3 Any θ ∈ Z satisfies ‖θ‖₁ ≤ 1 − ǫ.

slide-97
SLIDE 97

Statistical Tests of Goodness-of-Fit

1 The residues (estimated innovations) of the process:

ê_k = x_k − θ̂′ x_{k−p}^{k−1},   k = 1, 2, · · ·, n.

2 Goal: quantify how close the sequence {ê_i}_{i=1}^n is to an i.i.d. realization of an unknown (mostly absolutely continuous) distribution F₀.

Lemma (Glivenko-Cantelli Theorem)

If the samples are generated from F₀, the theorem suggests that

sup_t |F̂_n(t) − F₀(t)| → 0 almost surely.

slide-98
SLIDE 98

Statistical Tests of Goodness-of-Fit

1 Kolmogorov-Smirnov (KS) test statistic

K_n := sup_t |F̂_n(t) − F₀(t)|,

2 Cramer-von Mises (CvM) statistic

C_n := ∫ (F̂_n(t) − F₀(t))² dF₀(t),

3 Anderson-Darling (AD) statistic

A_n := ∫ (F̂_n(t) − F₀(t))² / (F₀(t)(1 − F₀(t))) dF₀(t).

slide-99
SLIDE 99

Statistical Tests of Goodness-of-Fit

1 KS

K_n = max_{1≤i≤n} max{ |i/n − F₀(e_i)|, |(i − 1)/n − F₀(e_i)| },

2 CvM

n C_n = 1/(12n) + Σ_{i=1}^n (F₀(e_i) − (2i − 1)/(2n))²,

3 AD

n A_n = −n − (1/n) Σ_{i=1}^n (2i − 1) [log F₀(e_i) + log(1 − F₀(e_i))].
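A small sketch computing the three statistics from a residual sequence, taking F₀ to be the standard normal CDF. The choice of F₀, the sorting of the residuals, and the AD line (which follows the standard computational formula using the reversed order statistics) are assumptions on my part.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def gof_statistics(residuals):
    """KS, CvM (as n*C_n) and AD (as n*A_n) statistics against F0 = standard normal."""
    e = np.sort(np.asarray(residuals))
    n = len(e)
    F = np.clip(np.array([normal_cdf(t) for t in e]), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    ks = np.max(np.maximum(np.abs(i / n - F), np.abs((i - 1) / n - F)))
    cvm = 1.0 / (12 * n) + np.sum((F - (2 * i - 1) / (2 * n)) ** 2)
    ad = -n - np.sum((2 * i - 1) * (np.log(F) + np.log(1 - F[::-1]))) / n
    return ks, cvm, ad

# Example: residuals e_k = x_k - theta_hat' x_{k-p}^{k-1} from the regularized fit above
ks, cvm, ad = gof_statistics(y - X @ theta_sp)
print(ks, cvm, ad)
```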

slide-100
SLIDE 100

Spectral Forms

1 Based on the similarities of the spectrogram of the data and the estimated power-spectral density of the process.

Lemma

Let S(ω) be the (normalized) power-spectral density of a stationary process with bounded condition number, and Ŝ_n(ω) be the spectrogram of the n samples of a realization of such a process. Then for all ω we have:

(√n / 2) ∫^ω (Ŝ_n(λ) − S(λ)) dλ → Z(ω) in distribution,   (31)

where Z(ω) is a mean-zero Gaussian process.

2 Spectral KS, CvM, AD tests ...

slide-101
SLIDE 101

Spectral Forms

Table: Statistical Tests for Synthetic Data using Minimax Order Selection

Estimator \ Test    CvM       AD        KS         SCvM
θ                   0.30685   1.5419    0.030611   0.0092007
θ̂_sp                0.33946   1.722     0.030428   0.0089813
θ̂_LS                0.67812   5.1189    0.037651   0.016709
θ̂_yw,ℓ1,1           0.41924   2.3264    0.039603   0.008389
θ̂_yw,ℓ2,1           0.34459   1.8007    0.032277   0.0089091
θ̂_AROMP             0.28592   1.451     0.027876   0.0092705
θ̂_ARYOMP            0.28776   1.4601    0.029668   0.0092142
θ̂_yw                0.65198   4.8698    0.034141   0.024712