
Big Data Analytics in Economics: What Have We Learned so Far, and Where Should We Go From Here?
Norman R. Swanson and Weiqi Xiong*
June 2017
*Rutgers University. Prepared for the Workshop on Forecasting at Deutsche Bundesbank, Sept. 2017

• The availability of big data at many frequencies and for many variables is a key driving force for applied and theoretical work.
• Methodological and empirical advances have accumulated very quickly in recent years.
• I will discuss a very few of the advances in forecasting that are due in large part to this phenomenon: model building and model selection methods.

I. Model Building:
• Discuss: Factor Models and Diffusion Indices
  - Principal component analysis
  - Sparse principal component analysis
  - Independent component analysis
• Mention: Mixed Frequency (MF) Indices
  - Hybrid models using MF and diffusion indices
  - Modeling with switching and surveys

• Discuss: Machine Learning, Variable Selection, and Shrinkage
  - Bagging
  - Boosting
  - Ridge regression
  - Least angle regression
  - Lasso
  - Elastic net
  - Non-negative garrote
  - Hybrid factor models using the above methods

II. Model Selection:
• Loss Function Dependent Tests
  - Pairwise Comparison
  - Data Snooping or Multiple Comparison
• Robust Forecast Comparison
  - Stochastic Dominance Methods
  - Robust to Choice of Loss Function

Factor model:
$X_t = \Lambda_0 + \Lambda F_t + \varepsilon_t$
where $X_t$ is an $N \times 1$ vector, $\Lambda$ is an $N \times r$ factor loading matrix, $\Lambda_0$ is an $N \times 1$ intercept, $F_t$ is the unobserved $r \times 1$ factor vector, and $\varepsilon_t$ is an error term.

Forecasting model:
$y_{t+h} = W_t \beta_W + F_t \beta_F + \varepsilon_{t+h}$

• The above model includes additional variables $W_t$; autoregressive structure is key.
• Allow for random walk, AR, and VAR strawman models.
• The factor model is an approximation. The underlying model may not have a factor structure, but a complex and rich covariance structure (e.g. in MC studies) across the X variables lends itself to principal component type shrinkage.
• What about mixed frequency models? Estimation of diffusion indices?
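The two equations above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it extracts a single factor (r = 1) by power iteration on X'X, omits $W_t$ and the intercept details of a full specification, and uses simulated data. The function names (`first_factor`, `ols_simple`, `diffusion_index_forecast`) are hypothetical.

```python
import random

def first_factor(X, iters=300):
    """First principal-component factor of a T x N data matrix X (list of rows).
    Columns are demeaned; loadings are found by power iteration on X'X.
    The factor is identified only up to sign and scale."""
    T, N = len(X), len(X[0])
    means = [sum(row[i] for row in X) / T for i in range(N)]
    Z = [[row[i] - means[i] for i in range(N)] for row in X]
    lam = [N ** -0.5] * N  # starting loadings (assumes they are not orthogonal to the truth)
    for _ in range(iters):
        f = [sum(z[i] * lam[i] for i in range(N)) for z in Z]            # f = Z lam
        new = [sum(Z[t][i] * f[t] for t in range(T)) for i in range(N)]  # Z'f
        norm = sum(v * v for v in new) ** 0.5
        lam = [v / norm for v in new]
    factor = [sum(z[i] * lam[i] for i in range(N)) for z in Z]
    return factor, lam

def ols_simple(x, y):
    """Intercept and slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def diffusion_index_forecast(X, y, h=1):
    """Forecast y_{T+h} from the regression of y_{t+h} on the estimated F_t."""
    factor, _ = first_factor(X)
    a, b = ols_simple(factor[:-h], y[h:])  # pairs F_t with y_{t+h}
    return a + b * factor[-1]

# Simulated example (all numbers are illustrative assumptions).
random.seed(0)
T, N = 120, 8
F = [0.0]
for _ in range(T - 1):
    F.append(0.8 * F[-1] + random.gauss(0, 1))        # AR(1) factor
load = [random.uniform(0.5, 1.5) for _ in range(N)]   # factor loadings
X = [[load[i] * F[t] + random.gauss(0, 0.5) for i in range(N)] for t in range(T)]
y = [0.0] + [2.0 * F[t - 1] + random.gauss(0, 0.5) for t in range(1, T)]
forecast = diffusion_index_forecast(X, y, h=1)
print("one-step-ahead diffusion index forecast:", forecast)
```

In practice one would use a linear algebra library for the eigen-decomposition and select r with an information criterion; the loop above only makes the mechanics visible.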

• What about the usefulness of sparseness (SPCA, discussed later) and zero restrictions in factor loadings? E.g., the ability to isolate potential control variables for policy analysis. Interpretability remains an issue.
• Armah and Swanson (2010): factor “proxy” selection -> small set of observables as predictors. Parsimonious model selection?
• Key predictors = “variable subset”? Targeted predictors (e.g. Bai and Ng (2007, 2008))?
• In Carrasco and Rossi (2016) -> factors chosen using cross validation … explicitly considers the “target variable”. What about also selecting factor loadings based on the target variable? I.e., three layers here:
  (i) Traditional approach of using the highest-eigenvalue factors.
  (ii) Select factors other than the highest-eigenvalue ones, given the target variable.
  (iii) Use (ii) and also determine “adjusted” loadings = shrinkage = lasso …

• Might lack of sparseness be of interest?
  - If variables that are not usually relevant are included, and these variables “jump” under structural change, then this may impart robustness to structural instability.
  - Turning point stability of predictions ...
• But sparseness is useful to isolate potential control variables … interpretability.
• Again leads to the methodology of Bai and Ng, i.e., targeted predictors.
• What about: couple the shrinkage regression approach with factor/loadings shrinkage methods, such as sparse PCA, and also include a set of W targeted “stability predictors”, say, or a factor constructed using these stability predictors.
• Kim and Swanson (2014, 2016) -> SPCA then shrinkage, or shrinkage followed by ICA, SPCA or PCA dimension reduction = lasso, elastic net -> get targeted predictors … then construct factors …
• Or directly “shrink” factors to a particular target …
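One simple way to make "targeted predictors" concrete is a hard-thresholding screen: keep only those columns of X whose individual predictive regression for the target has a large t-statistic, and build factors from the retained columns afterwards. The sketch below shows only the screening step. It is an illustration in the spirit of the targeted-predictor idea, not a reproduction of any cited paper's procedure; the cutoff of 1.96, the absence of $W_t$ controls, and the function names are all assumptions.

```python
import random

def t_stat(x, y):
    """t-statistic on the slope from regressing y on a constant and x
    (classical standard error, assuming homoskedastic errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    resid = [(b - my) - slope * (a - mx) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return slope / (s2 / sxx) ** 0.5

def targeted_columns(X, y, h=1, cutoff=1.96):
    """Indices i for which |t| > cutoff when y_{t+h} is regressed on X_{i,t}."""
    keep = []
    for i in range(len(X[0])):
        xi = [row[i] for row in X[:-h]]   # X_{i,t}
        if abs(t_stat(xi, y[h:])) > cutoff:
            keep.append(i)
    return keep

# Simulated example: only columns 0 and 1 help predict y one step ahead.
random.seed(1)
T, N = 150, 6
X = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
y = [0.0] + [1.5 * X[t - 1][0] - 1.0 * X[t - 1][1] + random.gauss(0, 1)
             for t in range(1, T)]
keep = targeted_columns(X, y, h=1)
print("retained predictor columns:", keep)
```

Factors would then be estimated from `[[row[i] for i in keep] for row in X]` rather than from the full panel; a soft-thresholding variant would replace the t-test with a lasso or elastic net fit.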

Independent Component Analysis
• Assume the factors F are statistically independent.
• As is evident from the figure, ICA is exactly the same as PCA if the demixing matrix is the factor loading coefficient matrix associated with PCA.
• In general, PCA yields uncorrelated factors with descending variance => easy "ordering"; ICA components have no such natural ordering.
• Moreover, those components explaining the largest share of the variance are often assumed to be the "relevant" ones for subsequent use in diffusion index forecasting.

For simplicity, consider two observables, $X = (X_1, X_2)$.
• PCA transforms $X$ into uncorrelated components $F = (F_1, F_2)$. Joint pdf characterized by $E(F_1 F_2) = E(F_1)E(F_2)$.
• ICA finds a demixing matrix which transforms the observed $X$ into independent components $F^* = (F_1^*, F_2^*)$. Joint pdf characterized by $E(F_1^{*p} F_2^{*q}) = E(F_1^{*p}) E(F_2^{*q})$ for positive integers $p$ and $q$.
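The gap between "uncorrelated" and "independent" can be seen in a small simulation. The sketch below is only an illustration under assumed inputs: two independent uniform sources with unit variance are rotated by 45 degrees, which leaves the pair uncorrelated (the second-moment condition) while breaking the higher-moment factorization that independence requires. `cross_gap` is a hypothetical helper.

```python
import random

random.seed(0)
n = 100_000
a = 3 ** 0.5  # a uniform variable on [-a, a] has variance 1

# Two independent sources.
s1 = [random.uniform(-a, a) for _ in range(n)]
s2 = [random.uniform(-a, a) for _ in range(n)]

# Rotate the sources by 45 degrees: still uncorrelated, no longer independent.
r = 2 ** -0.5
f1 = [r * (u + v) for u, v in zip(s1, s2)]
f2 = [r * (u - v) for u, v in zip(s1, s2)]

def mean(v):
    return sum(v) / len(v)

def cross_gap(u, v, p, q):
    """Sample estimate of E[u^p v^q] - E[u^p] E[v^q]."""
    joint = mean([x ** p * y ** q for x, y in zip(u, v)])
    return joint - mean([x ** p for x in u]) * mean([y ** q for y in v])

print("rotated pair, p=q=1:", cross_gap(f1, f2, 1, 1))  # close to 0
print("rotated pair, p=q=2:", cross_gap(f1, f2, 2, 2))  # clearly non-zero
print("sources,      p=q=2:", cross_gap(s1, s2, 2, 2))  # close to 0
```

For this design the population value of the p = q = 2 gap for the rotated pair is 0.4 - 1 = -0.6, so a criterion based on second moments alone cannot tell the rotated pair from the sources, whereas one based on higher moments can.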

• Use multiple frequencies of data?
• Pastcasting, nowcasting, forecasting, and “continuous” updating.
• Example: Factor MIDAS used for predicting quarterly data via the use of monthly factors (Marcellino and Schumacher (2010)).

The MIDAS model for forecasting $h_q$ quarters ahead is
$Y_{t_q + h_q} = \beta_0 + \beta_1 B(L^{1/m}, \theta) \hat{F}^{(3)}_{t_m} + \varepsilon_{t_q + h_q}$,
$B(L^{1/m}, \theta) = \sum_{j=0}^{j_{max}} b(j, \theta) L^{j/m}$,
$b(j, \theta) = \dfrac{\exp(\theta_1 j + \theta_2 j^2)}{\sum_{j=0}^{j_{max}} \exp(\theta_1 j + \theta_2 j^2)}$ (Almon distributed lag),
where $\hat{F}^{(3)}_{t_m}$ is skip sampled from the monthly factor $\hat{F}_{t_m}$, $\hat{F}_{t_m}$ is a set of monthly factors, and $\theta = (\theta_1, \theta_2)$.
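The weighting and skip-sampling steps can be sketched as follows. This is a minimal illustration, assuming m = 3 months per quarter, a single monthly factor, and given values of $\theta$; in applications $\theta$, $\beta_0$ and $\beta_1$ are estimated jointly, typically by nonlinear least squares. The function names are hypothetical.

```python
import math

def almon_weights(theta1, theta2, j_max):
    """Exponential Almon lag weights b(j, theta), j = 0..j_max, summing to one."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(j_max + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def midas_regressor(monthly_factor, weights, m=3):
    """Apply B(L^{1/m}, theta) to a monthly factor and keep every m-th month.

    monthly_factor[k] is the factor in month k; quarters end in months
    m-1, 2m-1, ... The result has one weighted value per quarter for which
    enough monthly lags are available."""
    out = []
    for end in range(m - 1, len(monthly_factor), m):
        if end - (len(weights) - 1) < 0:
            continue  # not enough lags yet for this quarter
        out.append(sum(w * monthly_factor[end - j] for j, w in enumerate(weights)))
    return out

# With theta = (0, 0) the weights are flat, so each quarterly value is the
# simple average of the three months in that quarter.
flat = almon_weights(0.0, 0.0, j_max=2)
monthly = [float(k) for k in range(12)]
quarterly = midas_regressor(monthly, flat)
print("flat weights:", flat)
print("quarterly regressor:", quarterly)

# A declining lag profile puts the most weight on the latest month.
declining = almon_weights(-0.5, -0.05, j_max=5)
print("declining weights:", [round(w, 3) for w in declining])
```

The quarterly regressor would then enter an ordinary regression for $Y_{t_q + h_q}$; keeping the weights a function of only two parameters is what keeps the mixed-frequency model parsimonious.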

Sparseness is not present in ridge regression, but may be useful for interpretation of factors. The key idea is to be able to (uniquely) estimate regression coefficients when the number of variables > sample size.

• Optimization problems that treat such multicollinearity:
  - Ridge (Hoerl): $\min_\beta \|y - X\beta\|^2$ s.t. $\|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2 \le \lambda$.
  - Lasso (Tibshirani): $\min_\beta \|y - X\beta\|^2$ s.t. $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j| \le \lambda$.
  - Elastic Net (Zou, Hastie): $\min_\beta \|y - X\beta\|^2$ s.t. $\sum_{j=1}^{p} |\beta_j| \le \lambda_1$ and $\sum_{j=1}^{p} \beta_j^2 \le \lambda_2$.
• Ridge is the original, but lasso (least absolute shrinkage and selection operator) shrinks some parameters all the way to zero.
• Elastic net (Zou and Hastie (2005)) combines the two.
• If we do not care about sparsity, how about neural nets as an alternative? Overfitting matters: how big an issue is it in factor analysis w/o sparseness, in the sense of PEER?
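A worked special case may help show why lasso produces exact zeros while ridge does not. It is a sketch under two assumptions that are not on the slide: the problems are written in the equivalent penalized (Lagrangian) form with penalty weight $\lambda$, and the regressors are orthonormal ($X'X = I$), so that $\hat\beta^{OLS} = X'y$ and $\|y - X\beta\|^2 = \text{const} + \sum_j (\beta_j - \hat\beta^{OLS}_j)^2$.

```latex
\begin{aligned}
\hat\beta^{ridge}_j
  &= \arg\min_{b}\; (b - \hat\beta^{OLS}_j)^2 + \lambda b^2
   = \frac{\hat\beta^{OLS}_j}{1 + \lambda}, \\
\hat\beta^{lasso}_j
  &= \arg\min_{b}\; (b - \hat\beta^{OLS}_j)^2 + \lambda |b|
   = \operatorname{sgn}\!\big(\hat\beta^{OLS}_j\big)\,
     \max\!\big(|\hat\beta^{OLS}_j| - \lambda/2,\; 0\big).
\end{aligned}
```

For example, with $\hat\beta^{OLS} = (3, 0.4)$ and $\lambda = 1$, ridge gives $(1.5, 0.2)$ while lasso gives $(2.5, 0)$: ridge rescales every coefficient, whereas lasso subtracts a fixed amount and sets the small coefficient exactly to zero.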

Circling Back
• Consider SPCA (Zou, Hastie and Tibshirani (2006)), which adds the sparseness feature of lasso (elastic net) to PCA. How? Reformulate PCA as a regression-type optimization problem, and then impose the lasso (elastic net = double shrinkage). Consider the penalized regression form of the optimization problems outlined above:
  $\hat\beta_{lasso} = \arg\min_\beta \|y - \sum_{j=1}^{N} X_j \beta_j\|^2 + \lambda_1 \sum_{j=1}^{N} |\beta_j|$.
  $\hat\beta_{elastic\ net} = (1 + \lambda_2) \arg\min_\beta \|y - \sum_{j=1}^{N} X_j \beta_j\|^2 + \lambda_1 \sum_{j=1}^{N} |\beta_j| + \lambda_2 \sum_{j=1}^{N} \beta_j^2$.
• 2-stage SPCA? Replace y with F -> ridge is PCA, then add the L1-norm penalty.
• Constraint: lasso can select at most T of N variables when N > T in PCA construction.
• Economic interpretability of factors. Couple SPCA for factors with further targeted (on predictor variable) penalized regression?
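The penalized forms above can be solved one coefficient at a time. The sketch below uses cyclic coordinate descent, which is one standard way to do this and not necessarily what the cited papers use. It returns the "naive" elastic net minimizer, without the $(1 + \lambda_2)$ rescaling shown above; setting `lam2 = 0` gives lasso and `lam1 = 0` gives ridge. Function names, penalty values and the simulated data are assumptions.

```python
import random

def soft(c, k):
    """Soft-thresholding operator: sign(c) * max(|c| - k, 0)."""
    if c > k:
        return c - k
    if c < -k:
        return c + k
    return 0.0

def naive_elastic_net(X, y, lam1, lam2, sweeps=200):
    """Minimize ||y - X b||^2 + lam1 * sum|b_j| + lam2 * sum b_j^2
    by cyclic coordinate descent (no intercept; X is a list of rows)."""
    T, N = len(X), len(X[0])
    b = [0.0] * N
    resid = list(y)  # equals y - X b while b = 0
    col_ss = [sum(X[t][j] ** 2 for t in range(T)) for j in range(N)]
    for _ in range(sweeps):
        for j in range(N):
            # c = x_j' (y - sum_{k != j} x_k b_k)
            c = sum(X[t][j] * resid[t] for t in range(T)) + col_ss[j] * b[j]
            new = soft(c, lam1 / 2.0) / (col_ss[j] + lam2)
            delta = new - b[j]
            if delta != 0.0:
                for t in range(T):
                    resid[t] -= X[t][j] * delta
                b[j] = new
    return b

# Simulated example: only the first two of eight regressors matter.
random.seed(1)
T, N = 80, 8
X = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 1) for row in X]

lasso = naive_elastic_net(X, y, lam1=100.0, lam2=0.0)
ridge = naive_elastic_net(X, y, lam1=0.0, lam2=10.0)
enet = naive_elastic_net(X, y, lam1=100.0, lam2=10.0)
print("lasso:", [round(v, 2) for v in lasso])
print("ridge:", [round(v, 2) for v in ridge])
print("enet: ", [round(v, 2) for v in enet])
```

The lasso and elastic net fits set the irrelevant coefficients exactly to zero, while ridge shrinks every coefficient without zeroing any, which is the sparseness contrast drawn on the slide.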

Recall that the L1 norm does not necessarily lead to sparsity, but the L1 regularization term (the penalty) on the weights/coefficients in the model does.

L2-norm (e.g. least squares regression)     L1-norm (e.g. LAD regression)
----------------------------------------    ----------------------------------------------
Not so robust to outliers                   Robust
Stable solution                             Unstable solution for small data perturbations
Unique solution                             Possibly multiple solutions
Non-sparsity                                Sparsity
Computational efficiency (anal. soln)       Comput. inefficiency (what if non-sparse?)
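Two rows of the comparison above (robustness and uniqueness) show up already in the simplest case, fitting a constant. Under L2 loss the fitted constant is the sample mean; under L1 loss it is a sample median. The data below are made up for illustration.

```python
import statistics

def l2_fit(y):
    """Minimizer of sum (y_i - c)^2 over c: the sample mean."""
    return statistics.mean(y)

def l1_fit(y):
    """A minimizer of sum |y_i - c| over c: the sample median."""
    return statistics.median(y)

def l1_loss(y, c):
    return sum(abs(v - c) for v in y)

clean = [1.0, 2.0, 3.0, 4.0]
dirty = [1.0, 2.0, 3.0, 400.0]  # one outlier replaces the last observation

# Robustness: the L2 fit moves with the outlier, the L1 fit does not.
print("L2 fit:", l2_fit(clean), "->", l2_fit(dirty))   # 2.5 -> 101.5
print("L1 fit:", l1_fit(clean), "->", l1_fit(dirty))   # 2.5 -> 2.5

# Non-uniqueness: with an even sample size, every c between the two middle
# observations gives the same L1 loss.
print([l1_loss(clean, c) for c in (2.0, 2.5, 3.0)])    # [4.0, 4.0, 4.0]
```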

• Diebold and Mariano (1995), White (2000), Chao, Corradi and Swanson (2001), Clark and McCracken (2001, 2013), Corradi and Swanson (2006) …
• Key Question: Should We Utilize Loss Function Specific Measures, or Not?
  $H_0: E(g(u_{0,t+h}) - g(u_{1,t+h})) = 0$
  $H_A: E(g(u_{0,t+h}) - g(u_{1,t+h})) \neq 0$
• Pairwise Accuracy: $DM_P = \bar{d} / \hat\sigma_{\bar d} \xrightarrow{d} N(0,1)$, where $d_t = g(u_{0,t+h}) - g(u_{1,t+h})$, $\bar{d} = \frac{1}{P} \sum_{t=R+1}^{T} d_t$, and $\hat\sigma_{\bar d} = \hat\sigma_{d} / \sqrt{P}$.
• Causality: $m_P = P^{-1/2} \sum_{t=R+1}^{T} u_{0,t+h} X_t$
• Big Data: $S_P = \max_{k=1,\dots,m} DM_P(1,k)$
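The pairwise and "max" statistics above are straightforward to compute from two or more series of out-of-sample forecast errors. The sketch below rests on simplifying assumptions: the loss $g$ is quadratic by default, and $\hat\sigma_d$ is the plain sample standard deviation of $d_t$, which ignores serial correlation (for h > 1 a long-run variance estimator would normally be used instead). The function names are hypothetical.

```python
import math

def dm_statistic(u0, u1, loss=lambda u: u * u):
    """DM_P = d_bar / (sigma_d / sqrt(P)) with d_t = loss(u0_t) - loss(u1_t).

    Positive values indicate that model 0 has the larger average loss."""
    d = [loss(a) - loss(b) for a, b in zip(u0, u1)]
    P = len(d)
    d_bar = sum(d) / P
    var_d = sum((x - d_bar) ** 2 for x in d) / (P - 1)  # assumes no serial correlation
    return d_bar / math.sqrt(var_d / P)

def max_dm(u_benchmark, alternatives, loss=lambda u: u * u):
    """S_P = max over k of the DM statistic of the benchmark against model k."""
    return max(dm_statistic(u_benchmark, u_k, loss) for u_k in alternatives)

# Made-up forecast errors for a benchmark and two alternative models.
u_bench = [1.0, -2.0, 1.5, -1.0, 2.0, -1.5]
u_alt_a = [0.5, -1.0, 1.0, -0.5, 1.5, -0.5]
u_alt_b = [1.2, -1.8, 1.4, -1.1, 2.1, -1.4]

print("DM vs A:", dm_statistic(u_bench, u_alt_a))
print("DM vs B:", dm_statistic(u_bench, u_alt_b))
print("S_P:    ", max_dm(u_bench, [u_alt_a, u_alt_b]))
```

The pairwise statistic is compared with standard normal critical values, as stated above. The maximum over many models is not standard normal under the null, which is why the data snooping literature cited above relies on other critical values.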

Stochastic Dominance Methods
• General Loss Forecast Superiority <-> 1st Order Stochastic Dominance:
  $u_1 \succeq_G u_2$ iff $E(L(u_1)) \le E(L(u_2))$, $\forall L \in \mathcal{L}_G$
• Convex Loss Forecast Superiority <-> 2nd Order Stochastic Dominance:
  $u_1 \succeq_C u_2$ iff $E(L(u_1)) \le E(L(u_2))$, $\forall L \in \mathcal{L}_C$
• $E(L(u_1)) \le E(L(u_2))$ for all $L$ iff $G(x) \le 0$.
• Implementation:
  $G(x) = (F_2(x) - F_1(x))\,\mathrm{sgn}(x)$
  $C(x) = \int_{-\infty}^{x} (F_1(t) - F_2(t))\,dt\; 1(x < 0) + \int_{x}^{\infty} (F_2(t) - F_1(t))\,dt\; 1(x \ge 0)$
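Sample analogues of $G(x)$ and $C(x)$ can be computed by plugging in empirical distribution functions of the two forecast error series. The sketch below assumes $F_1$ and $F_2$ are the CDFs of $u_1$ and $u_2$, takes sgn(x) = 1 for x >= 0 and -1 otherwise, and uses the identities $\int_{-\infty}^{x} F(t)\,dt = E\max(x - u, 0)$ and $\int_{x}^{\infty} (1 - F(t))\,dt = E\max(u - x, 0)$ to avoid numerical integration. It only evaluates the functions on a grid; it is not a test with critical values. Function names and data are assumptions.

```python
def ecdf(sample, x):
    """Empirical CDF of sample evaluated at x."""
    return sum(1 for u in sample if u <= x) / len(sample)

def G_stat(u1, u2, x):
    """Sample analogue of G(x) = (F2(x) - F1(x)) * sgn(x)."""
    sgn = 1.0 if x >= 0 else -1.0
    return (ecdf(u2, x) - ecdf(u1, x)) * sgn

def C_stat(u1, u2, x):
    """Sample analogue of C(x), using expected shortfall/excess identities."""
    n1, n2 = len(u1), len(u2)
    if x < 0:
        # integral of (F1 - F2) from -inf to x
        i1 = sum(max(x - u, 0.0) for u in u1) / n1
        i2 = sum(max(x - u, 0.0) for u in u2) / n2
    else:
        # integral of (F2 - F1) from x to +inf
        i1 = sum(max(u - x, 0.0) for u in u1) / n1
        i2 = sum(max(u - x, 0.0) for u in u2) / n2
    return i1 - i2

# Made-up example: model 1's errors are model 2's errors scaled toward zero,
# so model 1 should be weakly preferred under both criteria.
u2 = [-2.0, -1.0, 1.0, 2.0]
u1 = [0.5 * u for u in u2]
grid = sorted(set(u1 + u2 + [0.0]))

print("max G over grid:", max(G_stat(u1, u2, x) for x in grid))
print("max C over grid:", max(C_stat(u1, u2, x) for x in grid))
```

Both maxima being non-positive is consistent with model 1 being superior for general loss and, reading the $C(x)$ condition by analogy, for convex loss. A formal test would compare a scaled version of these maxima with appropriate critical values.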
