SLIDE 1

Analysis of Big Dependent Data in Economics and Finance

Ruey S. Tsay, Booth School of Business, University of Chicago. September 2016

Ruey S. Tsay Big Dependent Data 1 / 72

SLIDE 2

Outline

1. Big data? Machine learning? Data science? What is in it for economics and finance?
2. Real-world data are often dynamically dependent
3. A simple example: methods for independent data may fail
4. Trade-off between simplicity and reality
5. Some methods useful for analyzing big dependent data in economics and finance
6. Examples
7. Concluding remarks

SLIDE 3

Big dependent data

1. Accurate information is the key to success in the competitive global economy: the information age.
2. What is big data? High dimension (many variables)? Large sample size? Both?
3. Not all big data sets are useful: confounding and noise.
4. Need to develop methods to extract useful information from big data efficiently.
5. Know the limitations of big data.
6. Issues emerging from big data: privacy? ethics?
7. Focus here: methods for analyzing big dependent data in economics and finance.

SLIDE 4

What are available?

Statistical methods:

1. Focus on sparsity (simplicity)
2. Various penalized regressions, e.g. Lasso and its extensions
3. Various dimension-reduction methods and models
4. Common framework used: independent observations, with limited extensions to stationary data. Real data are often dynamically dependent!

Some useful concepts in analyzing big data:

1. Parsimony vs sparsity: parsimony does not imply sparsity
2. Simplicity vs reality: a trade-off between feasibility and sophistication

SLIDE 5

Parsimonious, not sparse

A simple example:

y_t = c + Σ_{i=1}^k β x_{it} + ε_t = c + β Σ_{i=1}^k x_{it} + ε_t,

where k is large, the x_{it} are not perfectly correlated, and the ε_t are iid N(0, σ²). The model has three parameters, so it is parsimonious, but it is not sparse because y depends on all k explanatory variables. In some applications, Σ_{i=1}^k x_{it} is a close approximation to the first principal component; for example, the level of interest rates is important to an economy. Fused Lasso can handle this difficulty in some situations.
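A quick numerical check of this point: with positively equicorrelated regressors, the equal-weight sum Σ_i x_{it} is essentially the first principal component. The sketch below uses a hypothetical design (k = 20, pairwise correlation 0.5, chosen for illustration only), numpy only.

```python
import numpy as np

rng = np.random.default_rng(42)
T, k, rho = 500, 20, 0.5

# Equicorrelated design: x's are positively, but not perfectly, correlated
cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)
x = rng.multivariate_normal(np.zeros(k), cov, size=T)
xc = x - x.mean(axis=0)

# First principal component score of the centered x's
eigvals, eigvecs = np.linalg.eigh(np.cov(xc, rowvar=False))
pc1 = xc @ eigvecs[:, -1]          # eigh sorts eigenvalues in ascending order

# Equal-weight sum of the regressors, as in the parsimonious model above
row_sum = xc.sum(axis=1)
corr = np.corrcoef(pc1, row_sum)[0, 1]
print(abs(corr))                   # essentially 1
```

The sign of a principal component is arbitrary, hence the absolute value in the check.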

SLIDE 6

What is LASSO regression?

Model (assume mean-adjusted data): y_i = Σ_{j=1}^p β_j X_{j,i} + ε_i. In matrix form, with X the design matrix, Y = Xβ + ε. The objective function (usable, in particular, when p > T) is

β̂(λ) = arg min_β { ||Y − Xβ||_2² / T + λ ||β||_1 },

where λ ≥ 0 is a penalty parameter, ||β||_1 = Σ_{j=1}^p |β_j|, and ||Y − Xβ||_2² = Σ_{i=1}^T (y_i − X_i′β)².
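The minimizer can be computed with any proximal-gradient routine; the slides use R's lars/glmnet, but the same objective can be solved in a Python sketch with a hand-rolled ISTA loop (the simulated design and the penalty value are illustrative assumptions, not from the slides).

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize ||y - X b||_2^2 / T + lam * ||b||_1 by proximal gradient (ISTA)."""
    T, p = X.shape
    b = np.zeros(p)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(X.T @ X / T).max())  # 1 / Lipschitz
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ b - y) / T
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(0)
T, p = 100, 50
X = rng.standard_normal((T, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]               # sparse truth (illustrative)
y = X @ beta + 0.1 * rng.standard_normal(T)

b_hat = lasso_ista(X, y, lam=0.1)
print(np.flatnonzero(np.abs(b_hat) > 1e-6))   # indices of selected variables
```

The soft-threshold step is what produces exact zeros, i.e. the sparsity discussed on the next slide.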

SLIDE 7

What is the big deal?

Sparsity: using convexity, LASSO is equivalent to the constrained problem

β̂_opt(R) = arg min_{β: ||β||_1 ≤ R} ||Y − Xβ||_2² / T.

Old friend: ridge regression,

β̂_Ridge(λ) = arg min_β { ||Y − Xβ||_2² / T + λ||β||_2² }, or
β̂(R) = arg min_{β: ||β||_2² ≤ R} ||Y − Xβ||_2² / T.

Special case p = 2: ||Y − Xβ||_2² / T is quadratic in β; the constraint region ||β||_1 ≤ R is diamond-shaped, whereas ||β||_2² ≤ R is a circle. The corners of the diamond are why LASSO leads to sparsity.

SLIDE 8

Computation and extensions

1. Optimization: least angle regression (lars) by Efron et al. (2004) makes the computation very efficient.
2. Extensions:
   - Group lasso: Yuan and Lin (2006). Subsets of X have specific meaning, e.g. treatments.
   - Elastic net: Zou and Hastie (2005). Uses a combination of L1 and L2 penalties.
   - SCAD (smoothly clipped absolute deviation): Fan and Li (2001). Nonconcave penalized likelihood.
   - Various Bayesian methods: the penalty function is the prior.
3. Packages available in R: lars, glmnet, gamlr, gbm, and many others.

SLIDE 9

A simulated example

p = 300, T = 150, X iid N(0, 1), ε_i iid N(0, 0.25), and

y_i = x_{3i} + 2(x_{4i} + x_{5i} + x_{7i}) − 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + ε_i.

1. How? R demonstration
2. Selection of λ? Cross-validation (10-fold), measuring prediction accuracy
3. The commands lars and cv.lars of the package lars
4. The commands glmnet and cv.glmnet of the package glmnet
5. Relationship between the two packages (alpha = 0)
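The same experiment can be replicated outside R; the sketch below reproduces the design in Python, with a hand-rolled coordinate-descent Lasso standing in for lars/glmnet and a fixed penalty in place of 10-fold cross-validation (both substitutions are illustrative).

```python
import numpy as np

def soft(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for ||y - X b||^2 / (2T) + lam * ||b||_1."""
    T, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / T
    r = y.copy()                          # current residual y - X b
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]           # partial residual without variable j
            b[j] = soft(X[:, j] @ r / T, lam) / col_ss[j]
            r -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(1)
T, p = 150, 300
X = rng.standard_normal((T, p))
beta = np.zeros(p)
beta[2] = 1.0                             # x3 (0-based index 2)
beta[[3, 4, 6]] = 2.0                     # x4, x5, x7
beta[[10, 11, 12, 20, 21, 29]] = -2.0     # x11, x12, x13, x21, x22, x30
y = X @ beta + rng.normal(0.0, 0.5, T)    # eps ~ N(0, 0.25), i.e. sd 0.5

sel = np.flatnonzero(np.abs(lasso_cd(X, y, lam=0.1)) > 1e-8)
print(sel)                                # should contain the ten true variables
```

With independent observations and strong signals, the true support is recovered (possibly along with a few spurious small coefficients); the next slide shows why dependence changes this picture.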

SLIDE 10

Lasso may fail for dependent data

1

Data generating model: scalar Gaussian autoregressive, AR(3), model xt = 1.9xt−1 − 0.8xt−2 − 0.1xt−3 + at, at ∼ N(0, 1). Generate 2000 observations. See Figure 1.

2

Big data setup

Dependent xt: t = 11, . . . , 2000 Regressors: Xt = [xt−1, xt−2, . . . , xt−10, z1t, . . . , z10,t], where zit are iid N(0, 1). Dimension = 20, sample size 1990.

3

Run the Lasso regression via the lars package of R. See Figure 2 for results. Lag 3, xt−3 was not selected. Lasso fails in this case.

SLIDE 11

Figure: Time plot of the simulated AR(3) time series with 2000 observations.

SLIDE 12

Figure: Results of the Lasso regression (lars coefficient paths) for the AR(3) series.

SLIDE 13

OLS works if we entertain AR models

Run the linear regression using the first three variables of X_t. The fitted model is x_t = 1.902x_{t−1} − 0.807x_{t−2} − 0.095x_{t−3} + ε_t, with σ_ε = 1.01. All estimates are statistically significant, with p-values less than 2.22 × 10⁻⁵. The residuals are well behaved, e.g. Q(10) = 12.23 with p-value 0.20 (after adjusting the degrees of freedom). Simple time series methods work for dependent data.
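The fit is easy to reproduce; the sketch below simulates a fresh AR(3) sample (so the estimates differ slightly from the numbers above) and runs the least-squares regression on the first three lags.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = np.zeros(n + 3)
for t in range(3, n + 3):
    x[t] = 1.9 * x[t-1] - 0.8 * x[t-2] - 0.1 * x[t-3] + rng.standard_normal()
x = x[3:]

# OLS of x_t on its first three lags
Y = x[3:]
X = np.column_stack([x[2:-1], x[1:-2], x[:-3]])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(b, 3))                 # close to (1.9, -0.8, -0.1)
```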

SLIDE 14

Why does lasso fail?

Two possibilities:

1. Scaling effect: Lasso standardizes each variable in X_t. For unit-root nonstationary time series, standardization might wash out the dependence in the stationary part.
2. Multicollinearity: unit-root time series have strong serial correlations (the ACFs approach 1 at all lags).

This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data!

SLIDE 15

Possible solutions

1. Re-parameterization using time series properties
2. Use different penalties for different parameters

The first approach is easier. For this particular time series, define ∆x_t = (1 − B)x_t and ∆²x_t = (1 − B)²x_t. Then

x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t = x_{t−1} + ∆x_{t−1} − 0.1∆²x_{t−1} + a_t = double unit root + single unit root + stationary + a_t.

The coefficients of x_{t−1}, ∆x_{t−1}, and ∆²x_{t−1} are 1, 1, and −0.1, respectively.
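The identity holds because the AR polynomial factors into a double unit root times a stationary factor, 1 − 1.9B + 0.8B² + 0.1B³ = (1 − B)²(1 + 0.1B), which is easy to verify numerically:

```python
import numpy as np

# AR polynomial of x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t,
# written as phi(B) x_t = a_t with phi(B) = 1 - 1.9B + 0.8B^2 + 0.1B^3
phi = np.array([1.0, -1.9, 0.8, 0.1])

# (1 - B)^2 (1 + 0.1 B): two unit roots times a stationary factor
factored = np.polymul(np.polymul([1.0, -1.0], [1.0, -1.0]), [1.0, 0.1])
print(factored)                       # coefficients of 1, B, B^2, B^3
```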

SLIDE 16

Different frameworks for LASSO

The X-matrix of the conventional LASSO consists of (x_{t−1}, x_{t−2}, …, x_{t−10}, z_{1t}, …, z_{10,t}), where the z_{it} are iid N(0, 1). Under the re-parameterization, the X-matrix becomes (x_{t−1}, ∆x_{t−1}, ∆²x_{t−1}, …, ∆²x_{t−8}, z_{1t}, …, z_{10,t}). The two X-matrices provide theoretically the same information; however, the first has high multicollinearity, whereas the second does not, especially after standardization.

SLIDE 17

Figure: Comparison of β-estimates from the lars results under the two parameterizations.

SLIDE 18

Theoretical justification

Focus on the particular series x_t used. Some properties of the series are:

1. T^{−4} Σ_{t=1}^T x_t² ⇒ ∫_0^1 W̄(s)² ds, where W̄(s) = ∫_0^s W(u) du and W(s) is standard Brownian motion
2. T^{−5/2} Σ_{t=1}^T x_t ⇒ ∫_0^1 W̄(s) ds
3. T^{−3} Σ_{t=1}^T x_t ∆x_t ⇒ ∫_0^1 W̄(s) W(s) ds
4. T^{−2} Σ_{t=1}^T (∆x_t)² ⇒ ∫_0^1 W(s)² ds

Standardization may wash out the ∆x_{t−1} and ∆²x_{t−1} parts.

SLIDE 19

Examples of big dependent data

1. Daily returns of U.S. stocks
2. Demand for electricity at 30-minute intervals
3. Daily spreads of CDS (credit default swaps) of selected companies
4. Monthly unemployment rates of the 50 U.S. states
5. Interest rates of an economy
6. Air pollution measurements at multiple locations and health risk; complex spatio-temporal data in general

SLIDE 20

Figure: Sample sizes of U.S. daily stock returns in 2012 and 2013: mean 6681, range (6593, 6774).

SLIDE 21

Figure: Densities of daily log returns of U.S. stocks in 2012 and 2013.

SLIDE 22

Figure: Empirical densities of electricity demand at 30-minute intervals in Adelaide, Australia, July 6, 1997 to March 31, 2007, by day of the week (Monday through Sunday).

SLIDE 23

Figure: Time plots of monthly state unemployment rates of the U.S. from 1976.1 to 2015.9.

SLIDE 24

Some statistical methods

Goal: extract useful information, including pooling.

1. Classification and cluster analysis: K-means, tree-based classification, model-based classification
2. Factor models and extensions: orthogonal factor model, approximate factor model, dynamic factor model, constrained factor models (column and row constraints), X_t = R f_t C + e_t
3. Generalizations of Lasso methods to dependent data, e.g. LASSO for nowcasting vs MIDAS

SLIDE 25

Constrained factor models

Column (variable) constraint only: Tsai and Tsay (2010). Let z_t be a k-dimensional time series,

z_t = H ω f_t + ε_t,  t = 1, …, T,

where H is a known k × r matrix, f_t is an m-dimensional common factor, and ω is an r × m matrix of unknown loading parameters. For the observed data in matrix form, Z = F ω′ H′ + ε.

SLIDE 26

A simple illustration

Monthly log returns of 10 stocks from 2001 to 2011:

1. Semiconductor: TXN, MU, INTC, TSM
2. Pharmaceutical: PFE, MRK, LLY
3. Investment bank: JPM, MS, GS

The constraint matrix is H = [h_1, h_2, h_3], where
h_1 = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)′
h_2 = (0, 0, 0, 0, 1, 1, 1, 0, 0, 0)′
h_3 = (0, 0, 0, 0, 0, 0, 0, 1, 1, 1)′

SLIDE 27

Table: Estimation results of constrained and orthogonal factor models.

        Constrained model: L = Hω         Orthogonal model: PCA
Stock   L1     L2     L3     Σε,i         L1     L2     L3     Σε,i
TXN     0.76   0.26   0.27   0.28         0.79   0.20   0.32   0.24
MU      0.76   0.26   0.27   0.28         0.67   0.36   0.29   0.34
INTC    0.76   0.26   0.27   0.28         0.79   0.18   0.33   0.23
TSM     0.76   0.26   0.27   0.28         0.80   0.27   0.16   0.26
PFE     0.44  -0.68   0.10   0.34         0.49  -0.64  -0.03   0.35
MRK     0.44  -0.68   0.10   0.34         0.40  -0.69   0.23   0.31
LLY     0.44  -0.68   0.10   0.34         0.45  -0.70   0.06   0.31
JPM     0.74   0.06  -0.43   0.27         0.72   0.02  -0.35   0.36
MS      0.74   0.06  -0.43   0.27         0.76   0.05  -0.43   0.25
GS      0.74   0.06  -0.43   0.27         0.75   0.12  -0.50   0.18
e.v.    4.58   1.65   0.88                4.63   1.68   0.93

Variability explained: 70.6% (constrained) vs 72.4% (PCA).

SLIDE 28

Both row and column constraints

Tsai et al. (2016). T observations and k variables. In data-matrix form,

Z = F_1 ω_1′ H′ + G F_2 ω_2′ + G F_3 ω_3′ H′ + E,

where G denotes a known T × m row-constraint matrix.

SLIDE 30

Figure: Time plots of monthly housing starts (in logarithms) of 9 U.S. census divisions, 1997-2006: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific.

SLIDE 31

Figure: Time series plots of the common factors (two columns each of F1, F2, F3) for a DCF model of order (r, p, q) = (2, 2, 2), estimated by maximum likelihood.

SLIDE 32

Figure: Time series plots of the G F_2 ω_2′ term of a fitted DCF model of order (2,2,2), estimated by maximum likelihood, for the nine U.S. census divisions.

SLIDE 33

Figure: Time series plots of the F_1 ω_1′ H′ term of a fitted DCF model of order (2,2,2), estimated by maximum likelihood, for the nine U.S. census divisions.

SLIDE 34

Figure: Time series plots of the G F_3 ω_3′ H′ term of a fitted DCF model of order (2,2,2), estimated by maximum likelihood, for the nine U.S. census divisions.

SLIDE 35

Matrix-valued variables

Consider simultaneously n macroeconomic variables in k countries:

        U.S.     Italy    Spain    ...   Canada
GDP     X11,t    X12,t    X13,t    ...   X1k,t
Unem    X21,t    X22,t    X23,t    ...   X2k,t
CPI     X31,t    X32,t    X33,t    ...   X3k,t
...
M1      Xn1,t    Xn2,t    Xn3,t    ...   Xnk,t

On-going work: only preliminary results are available. See Chen et al. (2016).

SLIDE 36

Classification

A possible approach: use a two-step procedure.

1. Transform the dependent big data into functions, e.g. probability densities
2. Apply classification methods to the functional data

The density functions of daily log returns of U.S. stocks serve as an example: we can classify the density functions and then draw statistical inference.

SLIDE 37

Illustration of classification

Cluster analysis of density functions. Consider the time series of density functions {f_t(x)}. For simplicity, assume the densities are evaluated at equally spaced grid points {x_1 < x_2 < … < x_N} ⊂ D with increment ∆x, so that the data become {f_t(x_i) | t = 1, …, T; i = 1, …, N}. Using the Hellinger distance (HD), we consider two methods: K-means and tree-based classification.

SLIDE 38

Hellinger distance of two density functions

Let f(x) and g(x) be two density functions on a common domain D ⊂ R, both absolutely continuous w.r.t. the Lebesgue measure. The Hellinger distance (HD) between f and g is defined as

H(f, g)² = (1/2) ∫_D (√f(x) − √g(x))² dx = 1 − ∫_D √(f(x) g(x)) dx.

Basic properties:

1. H(f, g) ≥ 0
2. H(f, g) = 0 if and only if f(x) = g(x) almost everywhere.
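On a grid, both expressions for H(f, g)² can be computed directly; the sketch below uses two illustrative normal densities (grid range and spacing are arbitrary choices) and checks them against the closed form 1 − exp(−(μ₁ − μ₂)²/(8σ²)) for equal-variance normals.

```python
import numpy as np

def hellinger2(f, g, dx):
    """Squared Hellinger distance between densities sampled on a common grid."""
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2) * dx

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
norm = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
f, g = norm(0.0, 1.0), norm(1.0, 1.0)

h2 = hellinger2(f, g, dx)
h2_alt = 1.0 - np.sum(np.sqrt(f * g)) * dx    # the equivalent second form
print(h2, h2_alt, hellinger2(f, f, dx))
```

The two forms agree up to discretization error, and H(f, f) = 0 exactly.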

SLIDE 39

K-means method

For a given K, the K-means method seeks a partition of the densities, say C_1, …, C_K, such that

1. ∪_{k=1}^K C_k = {f_t(x)}
2. C_i ∩ C_j = ∅ for i ≠ j
3. the sum of within-cluster variations V = Σ_{k=1}^K V(C_k) is minimized, where the within-cluster variation is V(C_k) = Σ_{t_1, t_2 ∈ C_k} H(f_{t_1}, f_{t_2})².

It turns out this can easily be done by applying the K-means method with squared Euclidean distance to the square-root densities {√f_t(x)}.
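A minimal sketch of that equivalence: ordinary K-means (plain Lloyd iterations with a deterministic initialization, sketch only) applied to square-root densities separates two illustrative groups of normal densities.

```python
import numpy as np

def kmeans(Z, K, n_iter=50):
    """Plain Lloyd's algorithm with squared Euclidean distance (sketch only)."""
    centers = Z[np.linspace(0, len(Z) - 1, K).astype(int)].copy()
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([Z[labels == k].mean(axis=0) for k in range(K)])
    return labels

# Toy densities on a grid: ten centered near 0, ten centered near 4
x = np.linspace(-10.0, 10.0, 801)
norm = lambda m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)
means = list(np.linspace(0.0, 0.5, 10)) + list(np.linspace(4.0, 4.5, 10))
dens = np.array([norm(m) for m in means])

# Hellinger K-means = Euclidean K-means on the square-root densities
labels = kmeans(np.sqrt(dens), K=2)
print(labels)
```

The first ten and last ten densities land in different clusters.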

SLIDE 40

Example of K-means

Consider the 48 density functions of half-hour electricity demand on Monday in Adelaide, Australia. With K = 4 clusters, we have:

k   Elements (time index)          Calendar hours
1   17 to 44                       8:00 AM to 10:00 PM
2   15, 16, 45 to 48, 1, 2, 3      7:00-8:00 AM; 10:00 PM-1:30 AM
3   4, 5, 13, 14                   1:30-2:30 AM; 6:00-7:00 AM
4   6 to 12                        2:30-6:00 AM

Result: the clusters capture daily activities, namely (1) an active period, (2) a transition period, (3) a light-sleeping period, and (4) a sound-sleeping period.

SLIDE 41

Figure: Density functions of half-hour electricity demand on Monday in Adelaide, Australia. The sample period is from July 6, 1997 to March 31, 2007.

SLIDE 42

Figure: Results of K-means cluster analysis based on squared Hellinger distance for electricity demand on Monday. Different colors denote different clusters.

SLIDE 43

Tree-based classification

Let Z_t = (z_{1t}, …, z_{pt})′ denote p covariates. We use an iterative procedure to build a binary tree, starting with the root C_0 = {f_t(x)}.

1. For each covariate z_{it}, let z_{i(j)} be its jth order statistic.
   (a) Divide C_0 into two sub-clusters C_{i,j,1} = {f_t(x) | z_{it} ≤ z_{i(j)}} and C_{i,j,2} = {f_t(x) | z_{it} > z_{i(j)}}.
   (b) Compute the sum of within-cluster variations H(i, j) = V(C_{i,j,1}) + V(C_{i,j,2}).
   (c) Find the smallest j, say v_i, such that H(i, v_i) = min_j H(i, j).
2. Select i ∈ {1, …, p}, say I, such that H(I, v_I) = min_i H(i, v_i).
3. Use covariate z_{It} with threshold v_I to grow two new leaves, i.e. C_{1,1} = C_{I,v_I,1} and C_{1,2} = C_{I,v_I,2}.
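The first split of the procedure can be sketched as follows, with a single covariate and toy densities whose shape changes at a known cutoff (all choices illustrative): the threshold minimizing the sum of within-cluster Hellinger variations recovers that cutoff.

```python
import numpy as np

def hell2(f, g, dx):
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2) * dx

def within_var(cluster, dx):
    """Sum of pairwise squared Hellinger distances within a cluster."""
    return sum(hell2(cluster[a], cluster[b], dx)
               for a in range(len(cluster)) for b in range(a + 1, len(cluster)))

x = np.linspace(-8.0, 12.0, 1001)
dx = x[1] - x[0]
norm = lambda m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)
z = np.arange(20.0)                                  # single covariate
dens = np.array([norm(0.0 if zi < 10 else 4.0) for zi in z])

# Scan the order statistics of z; pick the split minimizing H(i, j)
best = min((within_var(dens[z <= c], dx) + within_var(dens[z > c], dx), c)
           for c in z[:-1])
print(best[1])                                       # chosen threshold
```

The minimizing threshold is the last covariate value of the first group, i.e. the split is exactly at the cutoff.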

SLIDE 44

Tree-based procedure continued

Next, consider C_{1,1} and C_{1,2} as roots of branches and apply the same procedure, with their associated covariates, to find candidates for growth. The only modification is as follows: when considering C_{1,1} for further division, we treat C_{1,2} as a leaf in computing the sum of within-cluster variations, and similarly, when considering C_{1,2}, we treat C_{1,1} as a leaf. This growth procedure is iterated until the desired number of clusters K is reached.

SLIDE 45

Example of tree-based classification

Consider the density functions of U.S. daily log stock returns in 2012 and 2013. Using the first-differenced VIX index as the explanatory variable and K = 4, we obtain four clusters: (−∞, −0.73], (−0.73, 0.39], (0.39, 1.19], and (1.19, ∞). The cluster sizes are 104, 259, 86, and 53, respectively. Note that a positive z_t signifies an increase in market volatility (uncertainty).

SLIDE 46

What drove the U.S. financial market?

Figure: Time plots of the market fear factor (the VIX index) and its change series, 2012-2013.

SLIDE 47

Figure: Results of tree-based cluster analysis for the daily densities of log returns of U.S. stocks in 2012 and 2013. The first-differenced series of the VIX index (dvix) is used as the explanatory variable. The numbers of elements in the clusters are 53, 86, 259, and 104 for dvix > 1.19, 1.19 ≥ dvix > 0.39, 0.39 ≥ dvix > −0.73, and dvix ≤ −0.73, respectively.

SLIDE 48

Model-based classification

Work directly on the observed multiple time series.

1. Postulate a general univariate model for all time series, e.g. an AR(p) model
2. Time series in a cluster follow the same model: pool the data to estimate common parameters
3. Time series in different clusters follow different models
4. May be estimated by Markov chain Monte Carlo methods
5. May employ scale mixtures of normal innovations to handle outliers

Such models have been widely studied, e.g. by Wang et al. (2013) and Fruehwirth-Schnatter (2011), among others.

SLIDE 49

Application

1. Apply to the monthly unemployment rates of the 50 U.S. states
2. Use out-of-sample predictions to compare with other methods, including Lasso
3. For 1-step to 5-step-ahead predictions, the model-based method works well in comparison; see Wang et al. (2013, JoF).

SLIDE 50

Table: Root mean squared errors (RMSE) and mean absolute errors (MAE), both ×10⁴, of m-step-ahead out-of-sample forecasts.

           RMSE × 10⁴               MAE × 10⁴
Method    m=1   m=2   m=3   m=4    m=1   m=2   m=3   m=4
UAR       1616  1492  1791  2073   879   994   1268  1386
VAR       2676  2095  2129  2759   1349  1353  1506  1624
Lasso25   1798  1833  2063  2504   1245  1250  1332  1401
Lasso15   1714  1798  1855  2028   1186  1228  1296  1399
G-Lasso   1877  1865  1882  1905   1291  1290  1306  1327
LVAR      1550  1716  1806  1904   1065  1298  1210  1355
Pls10     1239  1531  1679  1873   909   1028  1263  1226
Pls30     1395  1651  1835  1890   933   1092  1281  1320
Pls50     1685  1871  2006  1967   940   1158  1304  1377
Pls70     1914  2040  2182  1953   996   1222  1362  1432
Pls100    2187  2279  2313  2123   1099  1342  1480  1552
Pcr10     1276  1829  2077  2108   890   1073  1247  1415
Pcr30     1577  1837  2049  1769   888   1093  1261  1321
Pcr50     1546  1805  2017  1759   880   1035  1209  1260
Pcr70     1594  1837  2049  1769   886   1042  1221  1283
Pcr100    1649  2117  2202  2163   1068  1243  1324  1421
MBC       1607  1703  1809  1961   885   1035  1225  1361
rMBC      1225  1481  1691  1839   873   1027  1193  1295

SLIDE 51

Functional PCA: Singular value decomposition

1. A tool to study the time evolution of the return distributions.
2. Data set: in this particular instance, each density function is evaluated at 512 points, so Y = [Y_it = f_t(x_i) | i = 1, …, N; t = 1, …, T] is 512 × 502.
3. Perform the singular value decomposition Ỹ = UDV′, where Ỹ denotes the mean-adjusted data matrix, U is an N × N unitary matrix, D is an N × T rectangular diagonal matrix, and V is a T × T unitary matrix.
4. This is a simple form of functional PCA. (With large samples, smoothing of the PCs is not needed.)
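The SVD step can be sketched as follows; the matrix Y here is a random stand-in for the actual density data (same 512 × 502 shape), so only the mechanics carry over.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 512, 502
Y = rng.standard_normal((N, T))          # stand-in for [f_t(x_i)]

# Subtract the mean density, then decompose
Yc = Y - Y.mean(axis=1, keepdims=True)
U, d, Vt = np.linalg.svd(Yc, full_matrices=False)

# Columns of U are the PC functions; d * Vt gives the day-by-day loadings
var_share = d ** 2 / (d ** 2).sum()      # proportion of variability per PC
print(np.allclose((U * d) @ Vt, Yc))     # exact reconstruction
```

The squared singular values, suitably normalized, are the variances plotted in the scree plot below.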

SLIDE 52

Scree plot


Figure: Scree plot of PCA for daily return densities in 2012 and 2013.

SLIDE 53

The first 6 PC functions


Figure: The first 6 PC functions for daily log return densities in 2012 and 2013.

SLIDE 54

The next 6 PC functions


Figure: The 7th-12th PC functions for daily log return densities in 2012 and 2013.

SLIDE 55

Meaning of PC functions? 1st

Figure: Mean density ± first PC. Peak and tails: mean + standardized first PC shown in red.

SLIDE 56

Meaning of PC functions? 2nd

Figure: Mean density ± second PC: midrange returns.

SLIDE 57

Meaning of PC functions? 3rd

Figure: Mean density ± third PC: curvature.

SLIDE 58

Approximate factor models

The model is

f_t(x) = Σ_{i=1}^p λ_{t,i} g_i(x) + ε_t(x),

where g_i(x) denotes the ith common factor and ε_t(x) is the noise function.

1. A generalization of the orthogonal factor model that allows the error functions to be correlated.
2. Only asymptotically identified, under some regularity conditions.
3. FPCA provides a way to estimate approximate factor models.

SLIDE 59

Loadings of the first PC function

Figure: Scatter plot of first-PC loadings vs changes in the VIX index. The red line denotes a lowess fit.

SLIDE 60

Functional PC via Thresholding

1. Zero appears to be a reasonable and natural threshold
2. Regime 1: dvix ≥ 0, with 244 days (the volatile, "bad" state)
3. Regime 2: dvix < 0, with 258 days (the calm, "good" state)
4. Perform PCA of the density functions for each regime
5. The differences between the regimes are clearly seen
6. This leads to different approximate factor models for the density functions

SLIDE 61

Scree plots

Figure: Scree plots of the PCA for each regime (dvix ≥ 0 and dvix < 0).

SLIDE 62

The first 6 PC functions

Figure: The first 6 PC functions of daily log return densities for each regime; the red line is for the calm state, Regime 2.

SLIDE 63

Approximate factor models

1. Use approximate factor models with the first 12 principal component functions
2. Compare the overall fits with and without thresholding
3. For Regime 1 (positive dvix): randomly select day 17
4. For Regime 2 (negative dvix): randomly select day 420
5. Check (a) observed vs fitted values and (b) residuals with and without thresholding
6. With 12 components both approaches fare well, but thresholding provides improvements

SLIDE 64

Comparison: day 17 (in Regime 1)

Figure: Top: the observed density (black) and its fits using all data (red) and thresholding (blue). Bottom: approximation errors using all data (black) and thresholding (red).

SLIDE 65

Comparison: day 420 (in Regime 2)

Figure: Top: the observed density (black) and its fits using all data (red) and thresholding (blue). Bottom: approximation errors using all data (black) and thresholding (red).

SLIDE 66

Lasso and beyond

1. Need to exploit parsimony, beyond sparsity
2. Need to take prior knowledge into account. We have accumulated a lot of knowledge in diverse scientific areas; how do we take advantage of it?
3. Variable selection is not sufficient. More importantly, what are the proper measurements to take? What questions can a given big data set answer?

SLIDE 67

An illustration

Every country has many interest rate series, which

1. have different maturities
2. serve different financial purposes

What is the information embedded in those interest rate series? Consider U.S. weekly constant maturity interest rates:

1. from January 8, 1982 to October 30, 2015
2. maturities: 3m, 6m, 1y, 2y, 3y, 5y, 7y, 10y, and 30y*

SLIDE 68

Figure: Time plots of U.S. weekly interest rates with different maturities, 1/8/1982 to 10/30/2015.

SLIDE 69

Figure: Scree plot of U.S. weekly interest rates.

SLIDE 70

Figure: Time plots of the first four principal components of U.S. weekly interest rates.

SLIDE 71

Implication?

In Lasso-type analysis,

1. should we use the interest rate series directly, even with the group lasso? This leads to sparsity.
2. should we apply PCA first and then use the PCs? This leads to parsimony.
3. should we develop other possibilities? Fused lasso? Factor models?

SLIDE 72

Concluding Remarks

1. Big dependent data appear in many applications
2. Methods developed for independent big data may fail
3. Statistical methods for big dependent data are relatively under-developed
4. Some new challenges emerge, and new opportunities exist
5. Simple modifications of the traditional methods might work well
6. Both theory and methods require further research
