

Estimation of Autoregressive Processes with Sparse Parameters
Abbas Kazemipour, MAST Group Meeting
University of Maryland, College Park
kaazemi@umd.edu
November 18, 2015

Overview


  1. Introduction
1. Yule-Walker equations:
       R_{p×p} θ = r_{-p}^{-1},    r_0 = θᵀ r_{-p}^{-1} + σ_w²    (4)
2. R := R_{p×p} = E[x_1^p (x_1^p)ᵀ]: p × p covariance matrix of the process
3. r_k = E[x_i x_{i+k}] is the k-th autocorrelation
4. If n ≫ p ⇒ estimate R and the r_k's, then solve the Yule-Walker equations
   - Does not exploit the sparsity of θ
   - Does not perform well when n is small or comparable with p
   - Poor estimates, mainly because an inversion of R̂_p is required, which might not be numerically stable
5. Usually biased estimates are used, at the cost of distorting the Yule-Walker equations
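To make the classical baseline concrete, here is a minimal Python sketch (not from the talk) of Yule-Walker estimation: simulate an AR(p) process, estimate the autocorrelations, and solve the Toeplitz system. All parameter values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)

# Simulate an AR(p) process x_k = sum_i theta_i x_{k-i} + w_k (illustrative parameters).
p, n = 4, 2000
theta_true = np.array([0.5, 0.0, 0.0, -0.3])
x = np.zeros(n + p)
w = rng.standard_normal(n + p)
for k in range(p, n + p):
    x[k] = theta_true @ x[k - p:k][::-1] + w[k]
x = x[p:]

# Sample autocorrelations r_k = E[x_i x_{i+k}] (biased estimates, normalized by n).
r = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])

# Yule-Walker: solve R_{pxp} theta = r_{-p}^{-1}; this inverts the estimated Toeplitz matrix.
R_hat = toeplitz(r[:p])
theta_yw = np.linalg.solve(R_hat, r[1:])
sigma2_w = r[0] - theta_yw @ r[1:]   # innovation variance from r_0 = theta' r + sigma_w^2
print("Yule-Walker estimate:", np.round(theta_yw, 3), " sigma_w^2 ~", round(float(sigma2_w), 3))
```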

  4. Our Formulation
1. LASSO-type estimator given by a conditional log-likelihood penalization:

       minimize_{θ ∈ R^p}  (1/n) ‖x_1^n − Xθ‖_2² + γ_n ‖θ‖_1,    (5)

   where

       X = [ x_{n−1}   x_{n−2}   ⋯   x_{n−p}
             x_{n−2}   x_{n−3}   ⋯   x_{n−p−1}
               ⋮          ⋮       ⋱     ⋮
             x_0       x_{−1}    ⋯   x_{−p+1} ].    (6)

2. X is a Toeplitz matrix with highly correlated elements.
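A minimal sketch of how (5) might be solved in practice, assuming scikit-learn's Lasso as the solver (the talk does not specify an implementation); the data matrix is built as in (6), and since sklearn's objective is (1/2n)‖y − Xθ‖² + α‖θ‖₁, the mapping α = γ_n/2 matches (5). All sizes and coefficients below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Sparse AR(p) model, illustrative parameters.
p, n = 100, 500
theta_true = np.zeros(p)
theta_true[[0, 9, 39]] = [0.4, -0.25, 0.2]         # s = 3 nonzero coefficients
burn = 500
x = np.zeros(n + p + burn)
for k in range(p, len(x)):
    x[k] = theta_true @ x[k - p:k][::-1] + rng.standard_normal()
x = x[-(n + p):]                                   # keep x_{-p+1}, ..., x_n

# Regression pair: each row of X holds the p samples immediately preceding its target.
y = x[p:]                                          # x_1, ..., x_n
X = np.column_stack([x[p - j - 1:n + p - j - 1] for j in range(p)])

gamma_n = 0.1
fit = Lasso(alpha=gamma_n / 2, fit_intercept=False, max_iter=10000).fit(X, y)
print("support found:", np.nonzero(np.abs(fit.coef_) > 1e-3)[0])
```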

  6. Results

Theorem. If σ_s(θ) = O(√s), there exist positive constants c_1, c_2, c_3 and c_ε such that for n > c_ε s p^{2/3} (log p)^{2/3} and a choice of regularization parameter γ_n = c_1 √(log p / n), any solution θ̂_sp to (5) satisfies the bound

    ‖θ̂_sp − θ‖_2 ≤ c_2 √(s log p / n) + c_2 σ_s(θ) (log p / n)^{1/4},    (7)

with probability greater than 1 − O(1/n^{c_3}).

1. n can be much less than p
2. Better than Yule-Walker

  9. Simulation: p = 100, n = 500, s = 3 and γ_n = 0.1
[Figure: Recovery results for n = 500, p = 100, s = 3; panels show the true parameters, the regularized ML estimate, and the Yule-Walker estimate.]

  10. Proof of the Main Theorem

Lemma (Cone Condition). For a choice of the regularization parameter γ_n = (2/n) ‖Xᵀ(x_1^n − Xθ)‖_∞, the optimal error h = θ̂ − θ belongs to the cone

    C := { h ∈ R^p : ‖h_{S^c}‖_1 ≤ 3 ‖h_S‖_1 }.    (8)

  11. Proof of the Main Theorem

Definition (Restricted Eigenvalue Condition). X is said to satisfy the RE condition of order s if

    λ_max(s) ‖θ‖_2² ≥ (1/n) θᵀXᵀXθ = (1/n) ‖Xθ‖_2² ≥ λ_min(s) ‖θ‖_2²,    (9)

for all s-sparse θ.
1. Essentially requires the eigenvalues of all n × s submatrices of X to be bounded and strictly positive.
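One way to get intuition for λ_min(s) and λ_max(s) is to sample random size-s column subsets and look at the extreme eigenvalues of their scaled Gram matrices. The sketch below uses an i.i.d. Gaussian design as a stand-in for the correlated AR data matrix, and only samples supports rather than enumerating all of them; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative design matrix; in the talk X is the (highly correlated) AR data matrix.
n, p, s = 200, 400, 5
X = rng.standard_normal((n, p))

# For random supports S of size s, record the extreme eigenvalues of (1/n) X_S^T X_S.
# RE of order s asks that these stay bounded away from 0 and infinity over all supports.
lam_min, lam_max = np.inf, 0.0
for _ in range(500):
    S = rng.choice(p, size=s, replace=False)
    G = X[:, S].T @ X[:, S] / n
    eig = np.linalg.eigvalsh(G)
    lam_min, lam_max = min(lam_min, eig[0]), max(lam_max, eig[-1])

print(f"over sampled supports: lambda_min(s) >= {lam_min:.3f}, lambda_max(s) <= {lam_max:.3f}")
```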

  13. Proof of the Main Theorem

Definition (Restricted Strong Convexity). X is said to satisfy the RSC condition of order s if

    (1/n) hᵀXᵀXh = (1/n) ‖Xh‖_2² ≥ κ ‖h‖_2²,    ∀ h ∈ C.    (10)

  14. Proof of the Main Theorem

Lemma (Theorem 1 of Negahban). If X satisfies the RSC condition then any optimal solution θ̂ satisfies

    ‖θ̂ − θ‖_2 ≤ 2√s γ_n / κ,    ‖θ̂ − θ‖_1 ≤ 6 s γ_n / κ.    (⋆)

  15. Proof of the Main Theorem

Lemma (Lemma 4.1 of Bickel). If X satisfies the RE condition of order s⋆ = (r + 1)s then the RSC condition is also satisfied with

    κ = λ_min((r + 1)s) ( 1 − 3 √( λ_max(rs) / (r λ_min((r + 1)s)) ) ).    (11)

  16. Proof of the Main Theorem
1. Step 1: Finding a lower bound on κ
2. Step 2: Finding an upper bound on γ_n

  18. Finding a lower bound on κ

Lemma (Haykin). Let R ∈ R^{k×k} be the k × k covariance matrix of a stationary process with power spectral density S(ω), and denote its maximum and minimum eigenvalues by φ_max(k) and φ_min(k) respectively. Then

    φ_min(k) ↓ inf_ω S(ω),    (12)
and
    φ_max(k) ↑ sup_ω S(ω).    (13)
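The monotone convergence in (12)-(13) is easy to see numerically for an AR(1) process, whose autocovariances and spectral density are available in closed form; the coefficient and the values of k below are illustrative choices.

```python
import numpy as np
from scipy.linalg import toeplitz

# AR(1) with coefficient a (illustrative): r_k = a^|k| / (1 - a^2), S(w) = 1 / |1 - a e^{-jw}|^2.
a = 0.6
S = lambda w: 1.0 / np.abs(1 - a * np.exp(-1j * w)) ** 2
w_grid = np.linspace(0, np.pi, 2001)
S_inf, S_sup = S(w_grid).min(), S(w_grid).max()   # = 1/(1+a)^2 and 1/(1-a)^2

for k in [5, 20, 80, 320]:
    r = a ** np.arange(k) / (1 - a ** 2)          # autocovariances r_0, ..., r_{k-1}
    eig = np.linalg.eigvalsh(toeplitz(r))
    print(f"k={k:4d}: phi_min={eig[0]:.4f} -> inf S={S_inf:.4f},  "
          f"phi_max={eig[-1]:.4f} -> sup S={S_sup:.4f}")
```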

  19. Convergence of the Eigenvalues of R
[Figure: minimum and maximum eigenvalues λ_min(k) and λ_max(k) of R as functions of k, alongside the power spectral density.]

  20. Finding a lower bound on κ

Corollary (RE of R). Under the assumptions of our problem, for an AR process R satisfies the RE condition (of any order) with λ_max = 1/ε² and λ_min = 1/4.

Proof. For an AR(p) process
    S(ω) = σ_w² / |1 − Σ_i θ_i e^{−jω}|².
Using the assumption Σ_i |θ_i| ≤ 1 − ε in conjunction with Lemma 6 proves the claim.

1. The RE condition also holds for any stationary process satisfying inf_ω S(ω) > 0 and sup_ω S(ω) < ∞!

  22. Finding a lower bound on κ

Lemma (RE of R̂). If R satisfies the RE condition with parameters λ_max and λ_min, then R̂ satisfies the RE condition of order s⋆ with parameters λ′_max = λ_max + t s⋆ and λ′_min = λ_min − t s⋆, where t = max_{i,j} |R̂_ij − R_ij|.

Proof. For every s⋆-sparse vector θ we have
    θᵀ R̂ θ ≥ θᵀ R θ − t ‖θ‖_1² ≥ (λ_min − t s⋆) ‖θ‖_2²,
    θᵀ R̂ θ ≤ θᵀ R θ + t ‖θ‖_1² ≤ (λ_max + t s⋆) ‖θ‖_2²,
which is what we claimed.

  25. Finding a lower bound on κ

Lemma (RE of R̂). If R satisfies the RE condition with parameters λ_max and λ_min, then R̂ satisfies the RE condition of order s⋆ with parameters λ′_max = λ_max + t s⋆ and λ′_min = λ_min − t s⋆, where t = max_{i,j} |R̂_ij − R_ij|.

1. This holds for every t.
2. We will be interested in t = λ_min / (2 s⋆).
3. Noting that R̂ = XᵀX/n, Corollary 7 + Lemma 5 with r = 288/(ε² + 1/8) imply that X satisfies the RSC condition with parameter κ = 1/(4√2).
4. To complete this bound it only remains to show that t can be chosen to be suitably small.

  30. Concentration Inequality

Lemma (Concentration Inequality: Theorem 4 of Rudzkis). Let x_{−p+1}^n be samples of a stationary process which satisfies x_k = Σ_{j=−∞}^{∞} b_{j−k} w_j, where the w_k's are i.i.d. random variables with

    |E(w_j^K)| ≤ C^K K!,    K = 2, 3, ⋯,    (14)

and Σ_{j=−∞}^{∞} |b_j| < ∞. Then the biased sample autocorrelation given by

    r̂_k^b = (1/(n + k)) Σ_{i,j=1, j−i=k}^{n+k} x_i x_j

satisfies

    P( |r̂_k^b − r_k^b| > t ) ≤ c_1 exp( −c_2 t^{3/2} √(n + k) ),    (15)

for positive constants c_1 and c_2 which are independent of the dimensions of the problem.

  31. Concentration Inequality

Corollary (Concentration Inequality for Unbiased Estimates). The unbiased estimate satisfies

    P( |r̂_k − r_k| > t ) ≤ c_1 exp( −c_2 n^{3/2} t^{3/2} / (n + k) ).    (16)

1. Choose t⋆ = λ_min / (2(r + 1)s) = c_ε / s and apply the union bound (with k = p in all inequalities):

    P( max_{i,j} |R̂_ij − R_ij| > t⋆ ) ≤ c_1 p² exp( −c_2 n^{3/2} t⋆^{3/2} / (n + p) )
                                      ≤ c_1 exp( −c_ε n^{3/2} / (s^{3/2}(n + p)) + 2 log p ).

2. For p ≫ n, choosing n > c s p^{2/3} (log p)^{2/3} gives the desired bound on κ.
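A quick empirical look at the quantity t = max_{i,j} |R̂_ij − R_ij| that this step controls, using an AR(1) process whose exact autocorrelations are known in closed form; here a Toeplitz plug-in built from unbiased sample autocorrelations stands in for R̂ (an illustrative substitution, not the talk's exact construction), and all sizes are arbitrary.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(3)

# AR(1) with coefficient a: exact autocorrelations r_k = a^|k| / (1 - a^2) (illustrative model).
a, p = 0.5, 50
r_true = a ** np.arange(p) / (1 - a ** 2)
R_true = toeplitz(r_true)

for n in [200, 800, 3200, 12800]:
    x = np.zeros(n + 200)
    for k in range(1, len(x)):
        x[k] = a * x[k - 1] + rng.standard_normal()
    x = x[200:]
    # Unbiased sample autocorrelations and the resulting estimate of R.
    r_hat = np.array([x[:n - k] @ x[k:] / (n - k) for k in range(p)])
    t = np.abs(toeplitz(r_hat) - R_true).max()
    print(f"n={n:6d}: max_ij |R_hat - R| = {t:.4f}")
```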

  34. Finding an upper bound on γ_n
1. Gradient of the objective function ‖x_1^n − Xθ‖_2²:

       ∇L(θ) := (2/n) Xᵀ(x_1^n − Xθ),

2. Lemmas 8 and 4 suggest that a suitable choice of the regularization parameter is given by γ_n = ‖∇L(θ)‖_∞.

  36. Finding an upper bound on γ_n
1. First, it is easy to check that by the uncorrelatedness of the w_k's we have

       E[∇L(θ)] = (2/n) E[ Xᵀ(x_1^n − Xθ) ] = (2/n) E[ Xᵀ w_1^n ] = 0.    (17)

2. In linear regression terminology, (17) is known as the orthogonality principle.
3. We show that ∇L(θ) is concentrated around its mean.
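The orthogonality principle (17) can be checked by simulation: at the true θ the residual x_1^n − Xθ is exactly the innovation sequence, so every coordinate of ∇L(θ) concentrates around zero. A small sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

p, n = 20, 5000
theta = np.zeros(p)
theta[[0, 4]] = [0.5, -0.3]                       # illustrative sparse AR parameters

# Simulate and form the regression pair (X, y) as before.
x = np.zeros(n + p + 500)
for k in range(p, len(x)):
    x[k] = theta @ x[k - p:k][::-1] + rng.standard_normal()
x = x[-(n + p):]
y = x[p:]
X = np.column_stack([x[p - j - 1:n + p - j - 1] for j in range(p)])

# grad L(theta) = (2/n) X^T (y - X theta); at the true theta the residual is the innovation
# sequence, which is uncorrelated with the past, so the gradient is close to zero.
grad = 2.0 / n * X.T @ (y - X @ theta)
print("||grad L(theta_true)||_inf =", np.abs(grad).max())
```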

  39. Finding an upper bound on γ_n
1. We have
       (∇L(θ))_i = (2/n) (x_{−i+1}^{n−i})ᵀ w_1^n.
2. The j-th element in this expansion is of the form y_j = x_{n−i−j+1} w_{n−j+1}.
3. It is easy to check that the sequence y_1^n is a martingale with respect to the filtration given by
       F_j = σ( x_{−p+1}^{n−j+1} ),
where σ(·) denotes the sigma-field generated by the random variables in its argument.

  42. Finding an upper bound on γ_n
1. We will now state the following concentration result for sums of dependent random variables [?]:

Proposition. Fix n ≥ 1. Let the Z_j's be subgaussian F_j-measurable random variables satisfying, for each j = 1, 2, ⋯, n,
    E[ Z_j | F_{j−1} ] = 0   almost surely.
Then there exists a constant c such that for all t > 0,
    P( | (1/n) Σ_{j=1}^n (Z_j − E[Z_j]) | ≥ t ) ≤ exp( −c n t² ).

  43. Finding an upper bound on γ_n
1. Since the y_j's are a product of two independent subgaussian random variables, they are subgaussian as well.
2. Proposition 1 implies that
       P( |∇L(θ)_i| ≥ t ) ≤ exp( −c n t² ).    (18)
   By the union bound, we get
       P( ‖∇L(θ)‖_∞ ≥ t ) ≤ exp( −c t² n + log p ).    (19)
   Choosing
       t = √( (1 + α_1) log p / (c n) )
   for some α_1 > 0 yields
       P( ‖∇L(θ)‖_∞ ≥ √( (1 + α_1) log p / (c n) ) ) ≤ 2 exp( −α_1 log p ) ≤ 2 / n^{α_1}.

  45. Finding an upper bound on γ_n
1. Hence γ_n ≤ d_2 √( (1 + α_1) log p / n ), for a constant d_2 determined by c, with probability at least 1 − 2/n^{α_1}.
2. Combined with the result of Corollary 12, for n > d_1 s p^{2/3} (log p)^{2/3} we get the claim of Theorem 1.

  47. Future Work
1. Greedy methods
2. Penalized Yule-Walker
3. Dynamic ℓ_1 reconstruction
4. Dynamic Durbin-Levinson

  51. Other Methods
1. Penalized Yule-Walker:

       minimize_{θ ∈ R^p}  ‖R̂θ − r̂_{-p}^{-1}‖_2 + λ ‖θ‖_1

   Need to do a fourth-moment analysis instead.
2. Instead try

       minimize_{θ ∈ R^p}  ‖R̂θ − r̂_{-p}^{-1}‖_1 + λ ‖θ‖_1

  52. Summary of Simulation Methods
1. Regularized ML:
       minimize_{θ ∈ R^p}  ‖y − Xθ‖_2 + λ ‖θ‖_1
2. Yule-Walker ℓ_{2,1}:
       minimize_{θ ∈ R^p}  ‖r̂_{-p}^{-1} − R̂θ‖_2 + λ ‖θ‖_1
3. Yule-Walker ℓ_{1,1}:
       minimize_{θ ∈ R^p}  ‖r̂_{-p}^{-1} − R̂θ‖_1 + λ ‖θ‖_1
4. Least-squares solutions to Yule-Walker and maximum likelihood (traditional methods)
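One possible implementation of the two penalized Yule-Walker variants is through a generic convex solver; the sketch below uses CVXPY, which is an assumption about tooling rather than what was used for the talk's figures, and the instance size and λ are illustrative.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import toeplitz

rng = np.random.default_rng(5)

# Small sparse AR instance (illustrative).
p, n = 60, 300
theta_true = np.zeros(p)
theta_true[[1, 10, 30]] = [0.4, -0.3, 0.2]
x = np.zeros(n + p + 500)
for k in range(p, len(x)):
    x[k] = theta_true @ x[k - p:k][::-1] + rng.standard_normal()
x = x[-n:]

# Plug-in estimates of R and r from biased sample autocorrelations.
r_hat = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])
R_hat, r_vec = toeplitz(r_hat[:p]), r_hat[1:]

lam = 0.05
theta = cp.Variable(p)
for name, loss in [("YW l_{2,1}", cp.norm2(R_hat @ theta - r_vec)),
                   ("YW l_{1,1}", cp.norm1(R_hat @ theta - r_vec))]:
    cp.Problem(cp.Minimize(loss + lam * cp.norm1(theta))).solve()
    support = np.nonzero(np.abs(theta.value) > 1e-3)[0]
    print(name, "support:", support)
```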

  53. Simulation Results for n = 60, p = 200, s = 4
[Figure: estimated parameter vectors for True Parameters, Regularized ML, ML Least Squares, Yule-Walker ℓ_{2,1}, Yule-Walker Least Squares, Yule-Walker ℓ_{1,1}, and the OMP variants.]

  54. Simulation Results for n = 120, p = 200, s = 4
[Figure: same panels as above.]

  55. Simulation Results for n = 180, p = 200, s = 4
[Figure: same panels as above.]

  56. Simulation Results for n = 280, p = 200, s = 4
[Figure: same panels as above.]

  57. Simulation Results for n = 700, p = 200, s = 4
[Figure: same panels as above.]

  58. Simulation Results for n = 4000, p = 200, s = 4
[Figure: same panels as above.]

  59. MSE for different values of n, p = 300, s = 3
[Figure: log-log plot of MSE versus n for Regularized ML, Least Squares, Yule-Walker ℓ_{1,1}, Yule-Walker, Regularized ML + OMP, Yule-Walker ℓ_{2,1}, and OMP + Yule-Walker.]

  60. OMP

Table: Autoregressive Orthogonal Matching Pursuit (AROMP)

    Input: L(θ), s⋆
    Output: θ̂_AROMP^{(s⋆)}
    Initialization: start with the index set S^{(0)} = ∅ and the initial estimate θ̂_AROMP^{(0)} = 0
    for k = 1, 2, ⋯, s⋆
        j = argmax_i |(∇L(θ̂_AROMP^{(k−1)}))_i|
        S^{(k)} = S^{(k−1)} ∪ {j}
        θ̂_AROMP^{(k)} = argmin_{supp(θ) ⊂ S^{(k)}} L(θ)
    end
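A direct, minimal Python reading of the AROMP table: for the squared-error loss, the restricted minimization in each iteration reduces to a least-squares fit on the selected columns. Everything outside the table itself (data generation, sizes, the choice of s⋆) is an illustrative assumption.

```python
import numpy as np

def aromp(X, y, s_star):
    """Greedy matching pursuit on the squared-error loss L(theta) = (1/n)||y - X theta||_2^2."""
    n, p = X.shape
    theta, S = np.zeros(p), []
    for _ in range(s_star):
        grad = 2.0 / n * X.T @ (y - X @ theta)   # grad L as defined in the talk (sign immaterial for the argmax)
        j = int(np.argmax(np.abs(grad)))
        if j not in S:
            S.append(j)
        sol, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)   # refit restricted to the current support
        theta = np.zeros(p)
        theta[S] = sol
    return theta, S

# Illustrative run on synthetic sparse AR data (sizes and support are arbitrary choices).
rng = np.random.default_rng(6)
p, n = 100, 400
theta_true = np.zeros(p)
theta_true[[2, 17, 60]] = [0.4, -0.3, 0.2]
x = np.zeros(n + p + 500)
for k in range(p, len(x)):
    x[k] = theta_true @ x[k - p:k][::-1] + rng.standard_normal()
x = x[-(n + p):]
y = x[p:]
X = np.column_stack([x[p - j - 1:n + p - j - 1] for j in range(p)])
theta_hat, S = aromp(X, y, s_star=6)
print("selected indices:", sorted(S))
```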

  61. Main Theoretical Result

Theorem. If θ is (s, ξ, 2)-compressible for some ξ < 1/2, there exist constants d′_{1ε}, d′_{2ε}, d′_{3ε} and d′_4 such that for n > d′_{1ε} s^{2/3} p^{2/3} (log s)^{2/3} log p, the AROMP estimate satisfies the bound

    ‖θ̂_AROMP − θ‖_2 ≤ d′_{2ε} √(s log s log p / n) + d′_{3ε} s^{ξ − 1/2}    (20)

after s⋆ = O_ε(s log s) iterations, with probability greater than 1 − O(1/n^{d′_4}).

  62. Application to Financial Data
1. Crude oil price of Cushing, OK WTI Spot Price FOB dataset.
2. The dataset consists of 7429 daily values.
3. Outliers removed by visual inspection; n = 4000.
4. Long-memory time series → first-order differencing.
5. Model order selection is of low importance here.

  63. Application to Financial Data [figure slide]

  64. Conclusions
1. First-order differences show Gaussian behavior.
2. Given no outliers, our method predicts a sudden change in prices every 40, 100, and 150 days.
3. Yule-Walker is bad!
4. Greedy is good!

  65. Minimax Framework
1. Minimax estimation risk over the class of good stationary processes:

       R_e(θ̂) = sup ( E[ ‖θ̂ − θ‖_2² ] )^{1/2}.    (21)

2. The minimax estimator:

       θ̂_minimax = argmin_{θ̂ ∈ Θ} R_e(θ̂).    (22)

3. Typically cannot be constructed → interested in order-optimal estimators:

       R_e(θ̂) ≤ c R_e(θ̂_minimax).    (23)

4. Can also define the minimax prediction risk:

       R_p(θ̂) = sup E[ (x_k − θ̂′ x_{k−p}^{k−1})² ].    (24)

  66. Minimax Framework
1. ℓ_2-regularized LS problem: [Goldenshluger & Zeevi 2001]
2. Slightly weaker exponential inequality
3. p⋆ = ⌊ −log n / (2 log(1 − ε)) ⌋ is minimax optimal
4. Requires a sample size exponentially large in p
5. Our result: the ℓ_1-regularized LS estimator is minimax optimal
6. Can afford higher orders

  67. Minimax Optimality

Theorem. Let x_1^n be samples of an AR process with s-sparse parameters satisfying ‖θ‖_1 ≤ 1 − ε. Then, with a choice of p⋆ = O_ε(n^{3/2}), we have

    c_ε √(s/n) ≤ R_e(θ̂_minimax) ≤ R_e(θ̂_sp) ≤ c′_ε R_e(θ̂_minimax),

that is, the ℓ_1-regularized LS estimator is minimax optimal modulo logarithmic factors.

  68. Minimax Optimality

Theorem. Let x_{−p+1}^n be samples of an AR process with Gaussian innovations. There exist positive constants c_ε and c′_ε such that for n > c_ε s p^{2/3} (log p)^{2/3} we have

    R_p(θ̂_sp) ≤ c′_ε ( (s + √s)/n + (1 − ε)² p / n² ) + 1.    (25)

1. Large n → the prediction error variance is very close to the variance of the innovations.

  69. Proofs
1. Define the event
       A := { max_{i,j} |R̂_ij − R_ij| ≤ t⋆ }.
2. Then
       P(A^c) ≤ c_1 exp( −c_ε n^{3/2} / (s^{3/2}(n + p)) + 2 log p ).

  70. Proofs
1.
    R_e(θ̂_minimax)² ≤ R_e(θ̂_sp)² = sup E[ ‖θ̂ − θ‖_2² ]
      ≤ P(A) ( c_2 √(s log p / n) + c_2 σ_s(θ) (log p / n)^{1/4} )² + P(A^c) ‖θ̂_sp − θ‖_2²
      ≤ ( c_2 √(s log p / n) + c_2 σ_s(θ) (log p / n)^{1/4} )²
        + 4(1 − ε)² c_1 exp( −c_ε n^{3/2} / (s^{3/2}(n + p)) + 2 log p ).
2. For n > c_ε s p^{2/3} (log p)^{2/3}, the first term is the dominant factor.

  71. Proofs: Converse
1. Assumption: Gaussian innovations.

Lemma (Fano's Inequality). Let Z be a class of densities with a subclass Z⋆ of densities f_{θ_i}, i ∈ {0, ⋯, 2^s}, such that for any two distinct θ_1, θ_2 ∈ Z⋆,
    D(f_{θ_1} ‖ f_{θ_2}) ≤ β.
Let θ̂ be an estimate of the parameters. Then
    sup_j P( θ̂ ≠ θ_j | H_j ) ≥ 1 − (β + log 2)/s,    (26)
where H_j denotes the hypothesis that θ_j is the true parameter and induces the probability measure P(· | H_j).

  72. Proofs: Converse
1. Class Z of AR processes defined over a fixed subset S ⊂ {1, 2, ⋯, p} satisfying |S| = s, with the s-sparse parameters given by
       θ_j = ±η e^{−N} 𝟙_S(j),    (27)
   where η and N remain to be chosen.
2. Add the all-zero vector θ to Z.
3. |Z| = 2^s + 1.

  73. Proofs: Converse

Lemma (Gilbert-Varshamov Lemma). There exists Z⋆ ⊂ Z such that |Z⋆| ≥ 2^{⌊s/8⌋} + 1 and any two distinct θ_1, θ_2 ∈ Z⋆ differ in at least s/16 components!

1. Hence
       ‖θ_1 − θ_2‖_2 ≥ (1/4) √s e^{−N} =: α.    (28)
2. For an arbitrary estimate θ̂: a hypothesis testing problem between the 2^{⌊s/8⌋} + 1 hypotheses H_j : θ = θ_j ∈ Z⋆, with the minimum distance decoding strategy.

  74. Proofs: Converse
1. Markov's inequality:
       sup_Z E[ ‖θ̂ − θ‖_2 ] ≥ sup_{Z⋆} E[ ‖θ̂ − θ‖_2 ]
         ≥ (α/2) sup_{Z⋆} P( ‖θ̂ − θ‖_2 ≥ α/2 )
         = (α/2) sup_{j = 0, ⋯, 2^{⌊s/8⌋}} P( θ̂ ≠ θ_j | H_j ).    (29)

  75. Proofs: Converse
1. f_{θ_i}: joint pdf of {x_k}_{k=1}^n conditioned on {x_k}_{k=−p+1}^0, i ∈ {0, ⋯, 2^s}.
2. With Gaussian innovations, for i ≠ j,
       D(f_{θ_i} ‖ f_{θ_j}) ≤ sup_{i≠j} E[ log(f_{θ_i}/f_{θ_j}) | H_i ]
         ≤ sup_{i≠j} E[ −(1/2) Σ_{k=1}^n ( (x_k − θ_i′ x_{k−p}^{k−1})² − (x_k − θ_j′ x_{k−p}^{k−1})² ) | H_i ]
         ≤ sup_{i≠j} (n/2) E[ ( (θ_i − θ_j)′ x_{k−p}^{k−1} )² | H_i ]
         = (n/2) sup_{i≠j} (θ_i − θ_j)′ R_{p×p} (θ_i − θ_j)
         ≤ (n λ_max / 2) sup_{i≠j} ‖θ_i − θ_j‖_2² ≤ η n s e^{−2N} / ε² =: β.    (30)

  76. Proofs: Converse
1. Using Fano's inequality:
       sup_Z E[ ‖θ̂ − θ‖_2 ] ≥ (√s e^{−N} / 8) ( 1 − 8( η n s e^{−2N}/ε² + log 2 ) / s ).
2. Choose η = ε² and N = log n, for large enough s and n.
3. Any θ ∈ Z satisfies ‖θ‖_1 ≤ 1 − ε.

  77. Statistical Tests of Goodness-of-Fit
1. The residues (estimated innovations) of the process:
       ê_k = x_k − θ̂′ x_{k−p}^{k−1},    k = 1, 2, ⋯, n.
2. Goal: quantify how close the sequence {ê_i}_{i=1}^n is to an i.i.d. realization of an unknown (mostly absolutely continuous) distribution F_0.

Lemma (Glivenko-Cantelli Theorem). If the samples are generated from F_0, then
       sup_t |F̂_n(t) − F_0(t)| → 0   almost surely.

  78. Statistical Tests of Goodness-of-Fit
1. Kolmogorov-Smirnov (KS) test statistic:
       K_n := sup_t |F̂_n(t) − F_0(t)|,
2. Cramér-von Mises (CvM) statistic:
       C_n := ∫ ( F̂_n(t) − F_0(t) )² dF_0(t),
3. Anderson-Darling (AD) statistic:
       A_n := ∫ ( F̂_n(t) − F_0(t) )² / ( F_0(t)(1 − F_0(t)) ) dF_0(t).

  79. Statistical Tests of Goodness-of-Fit
In terms of the ordered residuals e_(1) ≤ ⋯ ≤ e_(n):
1. KS:
       K_n = max_{1≤i≤n} max( i/n − F_0(e_(i)), F_0(e_(i)) − (i−1)/n ),
2. CvM:
       nC_n = 1/(12n) + Σ_{i=1}^n ( F_0(e_(i)) − (2i−1)/(2n) )²,
3. AD:
       nA_n = −n − (1/n) Σ_{i=1}^n (2i−1) ( log F_0(e_(i)) + log(1 − F_0(e_(n+1−i))) ).
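A small sketch implementing these three statistics from the order-statistic formulas above; F_0 is taken to be the standard normal CDF as an illustrative choice (in practice the residuals would first be standardized), and the AD weights follow the textbook order-statistic convention.

```python
import numpy as np
from scipy.stats import norm

def gof_statistics(e, F0=norm.cdf):
    """KS, CvM and AD statistics of residuals e against a reference CDF F0."""
    e = np.sort(np.asarray(e, dtype=float))
    n = len(e)
    u = F0(e)                                  # F_0 evaluated at the ordered residuals
    i = np.arange(1, n + 1)
    K_n = np.max(np.maximum(i / n - u, u - (i - 1) / n))
    nC_n = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
    nA_n = -n - np.sum((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1]))) / n
    return K_n, nC_n, nA_n

# Example: i.i.d. standard normal "residuals" should yield small statistics.
rng = np.random.default_rng(7)
print(gof_statistics(rng.standard_normal(1000)))
```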

  80. Spectral Forms
1. Based on the similarity between the spectrogram of the data and the estimated power spectral density of the process.

Lemma. Let S(ω) be the (normalized) power spectral density of a stationary process with bounded condition number, and let Ŝ_n(ω) be the spectrogram of n samples of a realization of such a process. Then for all ω we have

       √n ∫_0^ω ( Ŝ_n(λ) − S(λ) )/2 dλ →_d Z(ω),    (31)

where Z(ω) is a mean-zero Gaussian process.
2. Spectral KS, CvM, AD tests ...
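A rough sketch of how a spectral KS-type statistic in the spirit of (31) could be computed, with a periodogram standing in for the spectrogram; the AR(1) model, the normalization of both densities to unit area, and the omission of constant factors are all simplifying assumptions made for illustration.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(8)

# AR(1) realization and its exact spectral density (illustrative model).
a, n = 0.6, 4096
x = np.zeros(n + 500)
for k in range(1, len(x)):
    x[k] = a * x[k - 1] + rng.standard_normal()
x = x[500:]

freqs, S_hat = periodogram(x, fs=2 * np.pi)        # crude stand-in for the spectrogram
S_true = 1.0 / np.abs(1 - a * np.exp(-1j * freqs)) ** 2

# Normalize both densities to unit area on [0, pi], then take the sup of the
# cumulative (integrated) difference, scaled by sqrt(n), as a spectral KS-type statistic.
dw = freqs[1] - freqs[0]
S_hat = S_hat / (S_hat.sum() * dw)
S_true = S_true / (S_true.sum() * dw)
stat = np.sqrt(n) * np.max(np.abs(np.cumsum((S_hat - S_true) * dw)))
print("spectral KS-type statistic:", round(float(stat), 3))
```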
