Lasso Regularization Paths for NARMAX Models via Coordinate Descent


SLIDE 1

Lasso Regularization Paths for NARMAX Models via Coordinate Descent

Antônio H. Ribeiro, Luis A. Aguirre

Universidade Federal de Minas Gerais (UFMG), Brazil

American Control Conference, June 29, 2018, Milwaukee, USA

A. H. Ribeiro, L. A. Aguirre (UFMG) | Lasso Regularization Paths for NARMAX | ACC 2018 | 1 / 18

SLIDE 2

Problem Statement

Figure: The system identification problem.

SLIDE 3

Prediction Error Methods Framework

Cost Function

V(\theta) = \sum_k \Big( \underbrace{y[k]}_{\text{observed}} - \underbrace{\hat{y}_\theta[k]}_{\text{predicted}} \Big)^2 .

SLIDE 4

Linear-in-the-Parameters Model

Linear-in-the-parameters models:

\hat{y}_\theta[k] = \sum_i \theta_i \cdot \underbrace{x_i(y[k-1], u[k-1])}_{\text{basis functions}},

Ordinary least-squares formulation:

\min_\theta \sum_k \big( y[k] - \hat{y}_\theta[k] \big)^2 \;\Rightarrow\; \min_\theta \| y - X\theta \|_2^2 .
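As a minimal illustration of this least-squares step (a hypothetical sketch, not code from the paper: the lag choices, NumPy usage, and noise-free data are assumptions for clarity):

```python
import numpy as np

# Simulate a noise-free first-order system, then recover its parameters
# by solving min_theta ||y - X theta||_2^2, where the columns of X are
# the lagged "basis functions" y[k-1] and u[k-1].
rng = np.random.default_rng(0)
N = 200
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.5 * y[k - 1] - 0.5 * u[k - 1]

X = np.column_stack([y[:-1], u[:-1]])  # regressor matrix
target = y[1:]

theta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(theta)  # [0.5, -0.5] up to rounding, since the data is noise-free
```

Because the model is linear in theta, the fit reduces to a single linear algebra call regardless of how nonlinear the basis functions are in the data.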

SLIDE 5

L1 Penalty: The Lasso

\min_\theta \| y - X\theta \|_2^2 + \lambda \|\theta\|_1 ,

Figure: Lasso interpretation (Tibshirani, 1996).

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267–288.

SLIDE 6

Literature Review: Solving the Lasso Problem

Quadratic Programming (Tibshirani, 1996); LARS, the Least Angle Regression algorithm (Efron et al., 2004); Coordinate Descent (Friedman et al., 2007, 2009, 2010).

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.

Friedman, J., Hastie, T., Höfling, H., and Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332.

Friedman, J., Hastie, T., and Tibshirani, R. (2009). Glmnet: lasso and elastic-net regularized generalized linear models. R package version 1.4.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1.

SLIDE 9

Coordinate Descent Algorithm

One-at-a-time coordinate optimization:

\theta_j \leftarrow \arg\min_{\theta_j} \| y - X\theta \|_2^2 + \lambda \|\theta\|_1 ,

Figure: Soft-threshold operator.

SLIDE 10

Coordinate Descent Algorithm

One-at-a-time coordinate optimization has the closed-form update:

\theta_j \leftarrow \frac{1}{\|x_j\|^2} \, S\Big( \big( \underbrace{(y - X\theta)}_{r} + x_j \theta_j \big)^T x_j \,;\; \lambda \Big),

Figure: Soft-threshold operator.
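The soft-threshold operator shown in the figure shrinks its argument toward zero by lambda and clips it to exactly zero inside the interval [-lambda, lambda]; a one-line sketch (function name is illustrative, not from the paper):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-threshold operator: S(z; lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(3.0, 1.0))   # 2.0  (shrunk toward zero by lam)
print(soft_threshold(-3.0, 1.0))  # -2.0
print(soft_threshold(0.5, 1.0))   # 0.0  (clipped: |z| <= lam)
```

This clipping is what makes the lasso set coefficients exactly to zero, performing structure selection along the regularization path.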

SLIDE 11

Coordinate Descent Algorithm

Optimization Problem

\min_\theta \| y - X\theta \|_2^2 + \lambda \|\theta\|_1 ,

Repeat:

1. \theta_j \leftarrow \frac{1}{\|x_j\|^2} S\big( (r + x_j \theta_j)^T x_j \,;\; \lambda \big)

2. Update r = y - X\theta

3. Next j.
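The three steps above can be sketched in a few lines (a hypothetical illustration, not the paper's implementation; it uses NumPy, follows the slide's penalty scaling convention, and keeps the residual updated in place rather than recomputing y - X theta from scratch):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for min ||y - X theta||_2^2 + lam ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    r = y.copy()                       # residual r = y - X theta (theta = 0)
    sq_norms = (X ** 2).sum(axis=0)    # ||x_j||^2, precomputed once
    for _ in range(n_sweeps):
        for j in range(p):
            rho = (r + X[:, j] * theta[j]) @ X[:, j]   # (r + x_j theta_j)^T x_j
            new_tj = soft_threshold(rho, lam) / sq_norms[j]
            r += X[:, j] * (theta[j] - new_tj)          # O(N) residual update
            theta[j] = new_tj
    return theta

# Small demo on synthetic data with a sparse true coefficient vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.01 * rng.standard_normal(50)
print(lasso_cd(X, y, lam=5.0))
```

With lam = 0 the loop reduces to plain coordinate descent on the least-squares cost, and for large lam every coefficient is clipped to zero, which is a convenient sanity check.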

SLIDE 12

Coordinate Descent Algorithm

Optimization Problem

\min_\theta \| y - X\theta \|_2^2 + \lambda \|\theta\|_1 ,

Repeat:

1. \theta_j \leftarrow \frac{1}{\|x_j\|^2} S\big( (r + x_j \theta_j)^T x_j \,;\; \lambda \big) \quad \to O(N)

2. Update r = y - X\theta \quad \to O(N)

3. Next j.

SLIDE 13

NARMAX Model

Assuming that r[k] = y[k] - \hat{y}_\theta[k] and

\hat{y}_\theta[k] = \sum_{i=1}^{p} \theta_i \cdot x_i\big( \underbrace{y[k-1], u[k-1]}_{\text{measured values}}, \underbrace{r[k-1]}_{\text{noise term}} \big).

Estimated parameter:

\hat{\theta} = \arg\min_\theta \| y - X(y, u, r)\,\theta \|_2^2 .

SLIDE 14

Extended Least Squares

Optimization Problem

\min_\theta \| y - X(y, u, r)\,\theta \|_2^2 ,

Repeat:

1. \hat{\theta}^{(i+1)} \leftarrow \arg\min_\theta \| y - X(y, u, r^{(i)})\,\theta \|_2^2

2. r^{(i+1)} \leftarrow y - X(y, u, r^{(i)})\,\hat{\theta}^{(i+1)}

3. i \leftarrow i + 1.
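A minimal sketch of this iteration, using the linear system from the examples later in the deck (y[k] = 0.5y[k-1] - 0.5u[k-1] + 0.5v[k-1] + v[k]); the data sizes, noise level, and iteration count are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Generate data from the ARMAX system used in Example I.
rng = np.random.default_rng(0)
N = 5000
u = rng.standard_normal(N)
v = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.5 * y[k - 1] - 0.5 * u[k - 1] + 0.5 * v[k - 1] + v[k]

# Extended least squares: alternate a least-squares fit with a refresh of
# the residual sequence r that serves as a proxy for the noise regressor.
r = np.zeros(N)  # r^(0) = 0: the first pass is plain least squares
for _ in range(10):
    X = np.column_stack([y[:-1], u[:-1], r[:-1]])  # X(y, u, r^(i))
    theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    r[1:] = y[1:] - X @ theta                      # r^(i+1)
print(theta)  # roughly [0.5, -0.5, 0.5]
```

The first iteration ignores the noise term entirely; each refresh of r gives the next fit a better estimate of the unmeasured noise sequence v, which is why the v[k-1] coefficient becomes estimable at all.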

SLIDE 15

Coordinate Descent Algorithm (Revisited)

Optimization Problem

\min_\theta \| y - X(y, u, r)\,\theta \|_2^2 + \lambda \|\theta\|_1 ,

Repeat:

1. Update x_j if it depends on r

2. \theta_j^{+} \leftarrow \frac{1}{\|x_j\|^2} S\big( (r + x_j \theta_j)^T x_j \,;\; \lambda \big)

3. Update r = y - X\theta

4. Next j.

SLIDE 16

Coordinate Descent Algorithm (Revisited)

Optimization Problem

\min_\theta \| y - X(y, u, r)\,\theta \|_2^2 + \lambda \|\theta\|_1 ,

Repeat:

1. Update x_j if it depends on r \quad \to O(N)

2. \theta_j^{+} \leftarrow \frac{1}{\|x_j\|^2} S\big( (r + x_j \theta_j)^T x_j \,;\; \lambda \big) \quad \to O(N)

3. Update r = y - X\theta \quad \to O(N)

4. Next j.
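The revisited loop can be sketched for the three-regressor model [y[k-1], u[k-1], r[k-1]] (a hypothetical illustration only, not the paper's NarmaxLasso.jl implementation; the model structure, lambda value, and sweep count are assumptions):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def narmax_lasso_cd(y, u, lam, n_sweeps=100):
    """Coordinate descent where the noise-lag regressor column is rebuilt
    from the current residual sequence before its own coordinate update."""
    t = y[1:]                                      # targets y[k]
    cols = [y[:-1], u[:-1], np.zeros(len(y) - 1)]  # last column holds r[k-1]
    theta = np.zeros(3)
    r_full = np.zeros(len(y))                      # residual sequence r[k]
    for _ in range(n_sweeps):
        for j in range(3):
            if j == 2:
                cols[2] = r_full[:-1]              # step 1: update x_j (depends on r)
            X = np.column_stack(cols)
            xj = X[:, j]
            nxj = xj @ xj
            if nxj == 0.0:                         # all-zero column on the first pass
                continue
            r = t - X @ theta
            # step 2: soft-threshold coordinate update
            theta[j] = soft_threshold((r + xj * theta[j]) @ xj, lam) / nxj
            r_full[1:] = t - X @ theta             # step 3: update the residual
    return theta

# Data from the linear example: y[k] = 0.5y[k-1] - 0.5u[k-1] + 0.5v[k-1] + v[k]
rng = np.random.default_rng(0)
N = 5000
u = rng.standard_normal(N)
v = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.5 * y[k - 1] - 0.5 * u[k - 1] + 0.5 * v[k - 1] + v[k]
print(narmax_lasso_cd(y, u, lam=1.0))
```

Compared with the plain lasso loop, the only extra work is the O(N) refresh of the residual-dependent column, so the per-coordinate cost stays linear in the data length.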

SLIDE 17

Example I

The dataset was generated from the linear system:

y[k] = 0.5y[k-1] - 0.5u[k-1] + 0.5v[k-1] + v[k].

SLIDE 18

Example I

The dataset was generated from the linear system:

y[k] = 0.5y[k-1] - 0.5u[k-1] + 0.5v[k-1] + v[k].

We fit the following linear model to the training data (30 regressors):

y[k] = \sum_{i=1}^{10} \theta_i \, y[k-i] + \sum_{i=1}^{10} \theta_{i+10} \, u[k-i] + \sum_{i=1}^{10} \theta_{i+20} \, r[k-i].

SLIDE 19

Example I

Figure: Estimated parameter vector θ as a function of λ (log scale, λ from 10^-4 to 10^-1); the curves for y[k-1], u[k-1], and v[k-1] are labeled. Estimated system: y[k] = 0.48y[k-1] - 0.50u[k-1] + 0.44v[k-1].

SLIDE 20

Example II

The dataset was generated from the nonlinear system (Chen et al., 1990):

y[k] = (0.8 - 0.5 exp(-y[k-1]^2)) y[k-1] + u[k-1] - (0.3 + 0.9 exp(-y[k-1]^2)) y[k-2] + 0.2u[k-2] + 0.1u[k-1]u[k-2] + 0.1v[k-1] + 0.3v[k-2] + v[k],

and we fit a polynomial model of degree 2 with 44 regressors to it.

Chen, S., Billings, S. A., and Grant, P. M. (1990). Non-linear system identification using neural networks. International Journal of Control, 51(6):1191–1214.
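Simulating this benchmark is straightforward; a hypothetical sketch (the Gaussian choices for u and v and the noise level are assumptions for illustration, not the paper's experimental setup):

```python
import numpy as np

# Simulate the Chen et al. (1990) benchmark system given above.
rng = np.random.default_rng(0)
N = 1000
u = rng.standard_normal(N)
v = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    e1 = np.exp(-y[k - 1] ** 2)  # state-dependent coefficient term
    y[k] = ((0.8 - 0.5 * e1) * y[k - 1] + u[k - 1]
            - (0.3 + 0.9 * e1) * y[k - 2] + 0.2 * u[k - 2]
            + 0.1 * u[k - 1] * u[k - 2]
            + 0.1 * v[k - 1] + 0.3 * v[k - 2] + v[k])
print(y[:5])
```

Although linear in the parameters of a polynomial expansion, the system itself is genuinely nonlinear: the effective coefficients on y[k-1] and y[k-2] depend on the current output through the exponential term.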

SLIDE 21

Example II

Figure: Estimated parameter vector θ as a function of λ (log scale, λ from 10^-4 to 10^0). For the optimal λ the mean absolute error on the validation set is 1.03 and the model includes the regressors y[k-1], u[k-1], y[k-3], y[k-2], u[k-2], r[k-1], r[k-2], y[k-1]y[k-2], u[k-1]u[k-2], y[k-3]r[k-1], y[k-2]u[k-2].

SLIDE 22

Related Work

Wang, H., Li, G., and Tsai, C.-L. (2007). Regression coefficient and autoregressive order shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 69(1):63–78.

Yoon, Y. J., Park, C., and Lee, T. (2013). Penalized regression models with autoregressive error terms. Journal of Statistical Computation and Simulation, 83(9):1756–1772.

SLIDE 23

Conclusion

1. Timings;
2. Convergence;
3. Scaling;
4. Elastic net.

SLIDE 27

Acknowledgments

The implementation is available at: https://github.com/antonior92/NarmaxLasso.jl