Analysis of Big Dependent Data in Economics and Finance


  1. Analysis of Big Dependent Data in Economics and Finance. Ruey S. Tsay, Booth School of Business, University of Chicago. September 2016.

  2. Outline
     1. Big data? Machine learning? Data science? What is in it for economics and finance?
     2. Real-world data are often dynamically dependent
     3. A simple example: methods for independent data may fail
     4. Trade-off between simplicity and reality
     5. Some methods useful for analyzing big dependent data in economics and finance
     6. Examples
     7. Concluding remarks

  3. Big dependent data
     1. Accurate information is the key to success in the competitive global economy. Information age.
     2. What is big data? High dimension (many variables)? Large sample size? Both?
     3. Not all big data sets are useful: confounding & noise.
     4. Need to develop methods to efficiently extract useful information from big data.
     5. Know the limitations of big data.
     6. Issues emerging from big data: privacy? Ethical issues?
     7. Focus here: methods for analyzing big dependent data in economics and finance.

  4. What are available? Statistical methods focus on sparsity (simplicity):
     1. Various penalized regressions, e.g. the Lasso and its extensions
     2. Various dimension reduction methods and models
     Common framework used: independent observations, with limited extensions to stationary data. Real data are often dynamically dependent!
     Some useful concepts in analyzing big data:
     1. Parsimony vs. sparsity: a parsimonious model need not be sparse (see the next slide)
     2. Simplicity vs. reality: trade-off between feasibility & sophistication

  5. Parsimonious, not sparse. A simple example:
     $y_t = c + \beta \sum_{i=1}^{k} x_{it} + \epsilon_t$,
     where k is large, the $x_{it}$ are not perfectly correlated, and the $\epsilon_t$ are iid $N(0, \sigma^2)$. The model has three parameters ($c$, $\beta$, $\sigma^2$), so it is parsimonious, but it is not sparse because y depends on all explanatory variables. In some applications, $\sum_{i=1}^{k} x_{it}$ is a close approximation to the first principal component; for example, the level of interest rates is important to an economy. The fused Lasso can solve this difficulty in some situations.
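A minimal R sketch of this point, with made-up values (k = 50 regressors sharing a hypothetical common level f, Tn = 200 observations, c = 1, beta = 0.5): the model has only three parameters, yet y loads on every regressor, and the row sum of X tracks the first principal component.

```r
set.seed(1)
k <- 50; Tn <- 200
f <- rnorm(Tn)                                    # common level shared by all regressors
X <- matrix(rnorm(Tn * k, sd = 0.5), Tn, k) + f   # x_it = f_t + noise (f recycled by column)
y <- 1 + 0.5 * rowSums(X) + rnorm(Tn)             # c = 1, beta = 0.5, sigma = 1

fit <- lm(y ~ rowSums(X))                   # parsimonious: three parameters in total
coef(fit)                                   # recovers c and beta

pc1 <- prcomp(X)$x[, 1]                     # first principal component of X
abs(cor(rowSums(X), pc1))                   # close to 1: the sum approximates the first PC
```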

  6. What is LASSO regression? Model (assume mean-adjusted data):
     $y_i = \sum_{j=1}^{p} \beta_j X_{j,i} + \epsilon_i$.
     Matrix form, with X the design matrix: $Y = X\beta + \epsilon$.
     Objective function (applicable, in particular, when p > T):
     $\hat{\beta}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_1 \right)$,
     where $\lambda \ge 0$ is a penalty parameter, $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$, and $\|Y - X\beta\|_2^2 = \sum_{i=1}^{T} (y_i - X_i'\beta)^2$.
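The criterion is easy to state in code; a minimal sketch (lasso_objective is our own name, and glmnet minimizes a slightly differently scaled version of the same loss):

```r
# Evaluate the Lasso criterion ||Y - X beta||_2^2 / T + lambda * ||beta||_1
# for a candidate beta. This evaluates the objective; it is not the solver.
lasso_objective <- function(Y, X, beta, lambda) {
  Tn <- length(Y)
  sum((Y - X %*% beta)^2) / Tn + lambda * sum(abs(beta))
}
```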

  7. What is the big deal? Sparsity. Using convexity, the Lasso is equivalent to the constrained problem
     $\hat{\beta}_{opt}(R) = \arg\min_{\beta:\, \|\beta\|_1 \le R} \|Y - X\beta\|_2^2 / T$.
     Old friend, ridge regression:
     $\hat{\beta}_{Ridge}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_2^2 \right)$, or
     $\hat{\beta}(R) = \arg\min_{\beta:\, \|\beta\|_2^2 \le R} \|Y - X\beta\|_2^2 / T$.
     Special case p = 2: $\|Y - X\beta\|_2^2 / T$ is quadratic; the constraint region $\|\beta\|_1 \le R$ is diamond-shaped, yet $\|\beta\|_2^2 \le R$ is a circle. Thus the Lasso leads to sparsity.
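The contrast is easy to see numerically; a minimal sketch using glmnet (alpha = 1 gives the Lasso, alpha = 0 gives ridge; the data here are made up for illustration):

```r
library(glmnet)
set.seed(2)
X <- matrix(rnorm(100 * 20), 100, 20)
Y <- drop(X[, 1:3] %*% c(2, -1, 1) + rnorm(100))  # only 3 active variables

lasso <- glmnet(X, Y, alpha = 1)            # L1 penalty
ridge <- glmnet(X, Y, alpha = 0)            # L2 penalty

# At a comparable penalty level, the Lasso sets many coefficients exactly to
# zero, while ridge only shrinks them toward zero:
sum(coef(lasso, s = 0.1) == 0)
sum(coef(ridge, s = 0.1) == 0)
```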

  8. Computation and extensions
     1. Optimization: least angle regression (lars) by Efron et al. (2004) makes the computation very efficient.
     2. Extensions:
        - Group Lasso: Yuan and Lin (2006). Subsets of X have specific meaning, e.g. treatment.
        - Elastic net: Zou and Hastie (2005). Uses a combination of L1 and L2 penalties.
        - SCAD [smoothly clipped absolute deviation]: Fan and Li (2001). Nonconcave penalized likelihood.
        - Various Bayesian methods: the penalty function plays the role of the prior.
     3. Packages available in R: lars, glmnet, gamlr, gbm, and many others.

  9. A simulated example: p = 300, T = 150, X iid N(0, 1), $\epsilon_i$ iid N(0, 0.25).
     $y_i = x_{3i} + 2(x_{4i} + x_{5i} + x_{7i}) - 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + \epsilon_i$
     1. How? R demonstration (a sketch follows below).
     2. Selection of $\lambda$? Cross-validation (10-fold), measuring prediction accuracy.
     3. The commands lars and cv.lars of the package lars.
     4. The commands glmnet and cv.glmnet of the package glmnet.
     5. Relationship between the two packages (alpha = 0).
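A sketch of this demonstration using cv.glmnet (the seed and the sel variable name are ours; the lars route via cv.lars is analogous):

```r
library(glmnet)
set.seed(3)
p <- 300; Tn <- 150
X <- matrix(rnorm(Tn * p), Tn, p)

beta <- numeric(p)                          # true coefficient vector
beta[3] <- 1
beta[c(4, 5, 7)] <- 2
beta[c(11, 12, 13, 21, 22, 30)] <- -2
y <- drop(X %*% beta + rnorm(Tn, sd = 0.5)) # Var(eps) = 0.25

cv  <- cv.glmnet(X, y, alpha = 1, nfolds = 10)    # 10-fold CV to select lambda
sel <- which(coef(cv, s = "lambda.min")[-1] != 0) # nonzero coefficients, intercept dropped
sel   # ideally 3, 4, 5, 7, 11, 12, 13, 21, 22, 30 (perhaps plus a few extras)
```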

  10. Lasso may fail for dependent data
     1. Data generating model: scalar Gaussian autoregressive AR(3) model,
        $x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t$, $a_t \sim N(0, 1)$.
        Generate 2000 observations; see Figure 1.
     2. Big-data setup: dependent variable $x_t$, t = 11, ..., 2000; regressors $X_t = [x_{t-1}, x_{t-2}, \ldots, x_{t-10}, z_{1t}, \ldots, z_{10,t}]$, where the $z_{it}$ are iid N(0, 1). Dimension = 20, sample size 1990.
     3. Run the Lasso regression via the lars package of R (a sketch follows below); see Figure 2 for the results. Lag 3, $x_{t-3}$, was not selected: the Lasso fails in this case.
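A minimal sketch of the experiment (the seed and variable names are ours):

```r
library(lars)
set.seed(4)
n <- 2000
a <- rnorm(n)
x <- numeric(n)
for (t in 4:n)                              # simulate the AR(3) series
  x[t] <- 1.9 * x[t - 1] - 0.8 * x[t - 2] - 0.1 * x[t - 3] + a[t]

idx  <- 11:n                                # t = 11, ..., 2000
lags <- sapply(1:10, function(k) x[idx - k])        # x_{t-1}, ..., x_{t-10}
Z    <- matrix(rnorm(length(idx) * 10), ncol = 10)  # irrelevant z_{1t}, ..., z_{10,t}

fit <- lars(cbind(lags, Z), x[idx], type = "lasso")
fit   # prints the order in which variables enter the path; lag 3 is typically absent
```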

  11. [Figure 1: Time plot of the simulated AR(3) time series with 2000 observations.]

  12. [Figure 2: Results of the Lasso regression for the AR(3) series; coefficient paths of the standardized coefficients against |beta|/max|beta|.]

  13. OLS works if we entertain AR models. Run the linear regression using the first three variables of $X_t$. Fitted model:
     $x_t = 1.902 x_{t-1} - 0.807 x_{t-2} - 0.095 x_{t-3} + \epsilon_t$, $\hat{\sigma}_\epsilon = 1.01$.
     All estimates are statistically significant with p-values less than $2.22 \times 10^{-5}$. The residuals are well behaved, e.g. Q(10) = 12.23 with p-value 0.20 (after adjusting the degrees of freedom). Simple time series methods work for dependent data.
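A sketch of this fit, reusing x and n from the AR(3) simulation above (we fit without an intercept, matching the model as written, which is an assumption on our part):

```r
ar_dat <- data.frame(xt = x[4:n],
                     x1 = x[3:(n - 1)],     # x_{t-1}
                     x2 = x[2:(n - 2)],     # x_{t-2}
                     x3 = x[1:(n - 3)])     # x_{t-3}
ols <- lm(xt ~ x1 + x2 + x3 - 1, data = ar_dat)
summary(ols)                                # estimates near 1.9, -0.8, -0.1

# Ljung-Box test Q(10) on the residuals; fitdf = 3 adjusts the degrees of
# freedom for the three fitted AR coefficients:
Box.test(resid(ols), lag = 10, type = "Ljung-Box", fitdf = 3)
```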

  14. Why does the Lasso fail? Two possibilities:
     1. Scaling effect: the Lasso standardizes each variable in $X_t$. For unit-root nonstationary time series, standardization might wash out the dependence in the stationary part.
     2. Multicollinearity: unit-root time series have strong serial correlations [the ACF approaches 1 at all lags].
     This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data! (A quick numerical check follows below.)
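Both effects are easy to verify on the simulated series, again reusing x and n from the sketch above:

```r
acf(x, lag.max = 10, plot = FALSE)          # autocorrelations stay near 1 at every lag

lag3 <- sapply(1:3, function(k) x[(11 - k):(n - k)])  # x_{t-1}, x_{t-2}, x_{t-3}
cor(lag3)                                   # pairwise correlations essentially 1
```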

  15. Possible solutions:
     1. Re-parameterization using time series properties.
     2. Use different penalties for different parameters.
     The first approach is easier. For this particular time series, define $\Delta x_t = (1 - B) x_t$ and $\Delta^2 x_t = (1 - B)^2 x_t$. Then
     $x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t = x_{t-1} + \Delta x_{t-1} - 0.1 \Delta^2 x_{t-1} + a_t$
     = double + single + stationary + $a_t$.
     The coefficients of $x_{t-1}$, $\Delta x_{t-1}$, $\Delta^2 x_{t-1}$ are 1, 1, and $-0.1$, respectively. (A sketch of this re-parameterized regression follows below.)
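A sketch of the re-parameterized regression, once more reusing x and n; the fit should recover the coefficients 1, 1, and -0.1:

```r
dx  <- c(NA, diff(x))                       # Delta x_t = x_t - x_{t-1}
d2x <- c(NA, diff(dx))                      # Delta^2 x_t

rp_dat <- data.frame(xt   = x[4:n],
                     lev  = x[3:(n - 1)],   # x_{t-1}
                     dif  = dx[3:(n - 1)],  # Delta x_{t-1}
                     dif2 = d2x[3:(n - 1)]) # Delta^2 x_{t-1}
lm(xt ~ lev + dif + dif2 - 1, data = rp_dat)  # coefficients ~ 1, 1, -0.1
```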
