Big Data Analytics in Economics: What Have We Learned so Far, and Where Should We Go From Here? Norman R. Swanson and Weiqi Xiong* June 2017 *Rutgers University prep. for Workshop on Forecasting at Deutsche Bundesbank, Sept., 2017


slide-1
SLIDE 1

Big Data Analytics in Economics: What Have We Learned so Far, and Where Should We Go From Here?

Norman R. Swanson and Weiqi Xiong*

June 2017

*Rutgers University

  • Prepared for the Workshop on Forecasting at Deutsche Bundesbank, September 2017
slide-2
SLIDE 2
  • The availability of big data, at many frequencies and for many variables, is a key driving force for applied and theoretical work.
  • Methodological and empirical advances have accumulated very quickly in recent years.
  • I will discuss a few of the advances in forecasting that are due in large part to this phenomenon: model building and model selection methods.


slide-3
SLIDE 3
  • I. Model Building
  • Discuss: Factor Models and Diffusion Indices
      • Principal component analysis
      • Sparse principal component analysis
      • Independent component analysis
  • Mention: Mixed Frequency (MF) Indices
      • Hybrid models using MF and diffusion indices
      • Modeling with switching and surveys

slide-4
SLIDE 4
  • Discuss: Machine Learning, Variable Selection, and Shrinkage
      • Bagging
      • Boosting
      • Ridge regression
      • Least angle regression
      • Lasso
      • Elastic net
      • Non-negative garrote
      • Hybrid factor models using the above methods

slide-5
SLIDE 5
  • II. Model Selection
  • Loss Function Dependent Tests
      • Pairwise Comparison
      • Data Snooping or Multiple Comparison
  • Robust Forecast Comparison
      • Stochastic Dominance Methods
      • Robust to Choice of Loss Function

slide-6
SLIDE 6
$$X_t = \Lambda_0 + \Lambda F_t + \varepsilon_t$$

where $X_t$ is an $N \times 1$ vector, $\Lambda$ is an $N \times r$ factor loading matrix, $\Lambda_0$ is an $N \times 1$ intercept, $F_t$ is an unobserved $r \times 1$ factor vector, and $\varepsilon_t$ is an error term.

$$y_{t+h} = W_t \beta_W + F_t \beta_F + \varepsilon_{t+h}$$

  • The forecasting model above includes the $W_t$ variables: autoregressive structure and key additional variables.
  • Allow for random walk, AR, and VAR strawman models.
  • The factor model is an approximation. The underlying model may not have a factor structure, but a complex and rich covariance structure (e.g. in MC studies) across the X variables lends itself to principal component type shrinkage.
  • What about mixed frequency models? Estimation of diffusion indices?
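A minimal sketch of how the factor model and forecasting equation on this slide can be combined into a diffusion index forecast. It assumes NumPy, standardized data, principal components as the factor estimator, and a simulated panel; the function names `pca_factors` and `diffusion_index_forecast`, and all data, are hypothetical and not taken from the presentation.

```python
import numpy as np

def pca_factors(X, r):
    """Estimate r factors from a T x N panel by principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each series
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r] * s[:r]                        # T x r estimated factors

def diffusion_index_forecast(y, X, W, r, h):
    """Regress y_{t+h} on [1, W_t, F_t], then forecast from the last date."""
    F = pca_factors(X, r)
    Z = np.column_stack([np.ones(len(y)), W, F])   # regressors dated t
    coef, *_ = np.linalg.lstsq(Z[:-h], y[h:], rcond=None)
    return float(Z[-1] @ coef)                     # forecast of y_{T+h}

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
T, N, r, h = 200, 50, 2, 1
F_true = rng.standard_normal((T, r))
X = F_true @ rng.standard_normal((r, N)) + 0.5 * rng.standard_normal((T, N))
y = np.zeros(T)
y[1:] = F_true[:-1] @ np.array([1.0, -0.5]) + 0.1 * rng.standard_normal(T - 1)
W = y.reshape(-1, 1)                               # W_t holds the current value of y
forecast = diffusion_index_forecast(y, X, W, r, h)
```

In a real-time exercise the factors and regression coefficients would be re-estimated at each forecast origin using only data available at that date.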

slide-7
SLIDE 7
  • What about the usefulness of sparseness (SPCA, discussed later) and zero restrictions in factor loadings? E.g., the ability to isolate potential control variables for policy analysis. Interpretability remains an issue.
  • Armah and Swanson (2010): factor “proxy” selection, i.e., a small set of observables as predictors. Parsimonious model selection?
  • Key predictors = “variable subset”? Targeted predictors (e.g. Bai and Ng (2007, 2008))?
  • In Carrasco and Rossi (2016), factors are chosen using cross validation, which explicitly considers the “target variable”. What about also selecting factor loadings based on the target variable? I.e., three layers here:
      • (i) Traditional approach of using the highest eigenvalue factors.
      • (ii) Select factors other than the highest eigenvalue ones, given the target variable.
      • (iii) Use (ii) and also determine “adjusted” loadings = shrinkage = lasso …
slide-8
SLIDE 8
  • Might lack of sparseness be of interest?
      • Variables that are not usually relevant are included, and if these variables “jump” under structural change, this may impart robustness to structural instability.
      • Turning point stability of predictions ...
  • But sparseness is useful for isolating potential control variables … interpretability.
  • This again leads to the methodology of Bai and Ng, i.e., targeted predictors.
  • What about coupling a shrinkage regression approach with factor/loadings shrinkage methods, such as sparse PCA, and also including a set of W targeted “stability predictors”, say, or a factor constructed using these stability predictors?
  • Kim and Swanson (2014, 2016): SPCA then shrinkage, or shrinkage followed by ICA, SPCA or PCA dimension reduction = lasso, elastic net -> get targeted predictors … then construct factors …
  • Or directly “shrink” factors to a particular target …
slide-9
SLIDE 9
  • Independent Component Analysis
  • Assume the F are statistically independent.
  • As is evident from the above figure, ICA is exactly the same as PCA if the demixing matrix is the factor loading coefficient matrix associated with PCA.
  • In general, PCA yields uncorrelated factors with descending variance => easy "ordering".
  • Moreover, those components explaining the largest share of the variance are often assumed to be the "relevant" ones for subsequent use in diffusion index forecasting.


slide-10
SLIDE 10
  • For simplicity, consider two observables, $X = (X_1, X_2)$.
  • PCA transforms $X$ into uncorrelated components $F = (F_1, F_2)$.
  • Joint pdf characterized by $E(F_1 F_2) = E(F_1) E(F_2)$.
  • ICA finds a demixing matrix which transforms the observed $X$ into independent components $F^* = (F_1^*, F_2^*)$.
  • Joint pdf characterized by $E(F_1^{*p} F_2^{*q}) = E(F_1^{*p}) E(F_2^{*q})$.
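The distinction between the two moment conditions on this slide can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: two independent uniform sources are rotated by 45 degrees, which leaves the mixtures uncorrelated (the PCA condition, p = q = 1) while the higher-order condition targeted by ICA fails. The helper `cross_moment_gap` and the simulated data are hypothetical.

```python
import math
import random

random.seed(0)
n = 20000
a = math.sqrt(3.0)  # uniform on [-a, a] has unit variance

# Two independent, non-Gaussian sources (simulated for illustration).
s1 = [random.uniform(-a, a) for _ in range(n)]
s2 = [random.uniform(-a, a) for _ in range(n)]

# Rotate by 45 degrees: the mixtures are uncorrelated but not independent.
x1 = [(u + v) / math.sqrt(2.0) for u, v in zip(s1, s2)]
x2 = [(u - v) / math.sqrt(2.0) for u, v in zip(s1, s2)]

def mean(values):
    return sum(values) / len(values)

def cross_moment_gap(u, v, p, q):
    """Sample analogue of E[u^p v^q] - E[u^p] E[v^q]; zero under independence."""
    joint = mean([ui ** p * vi ** q for ui, vi in zip(u, v)])
    return joint - mean([ui ** p for ui in u]) * mean([vi ** q for vi in v])

gap_11_mixed = cross_moment_gap(x1, x2, 1, 1)    # near 0: uncorrelated
gap_22_mixed = cross_moment_gap(x1, x2, 2, 2)    # far from 0: dependent
gap_22_sources = cross_moment_gap(s1, s2, 2, 2)  # near 0: independent
```

For these sources the population value of the (2, 2) gap for the mixtures is 0.4 - 1 = -0.6, so zero correlation alone does not recover the independent components.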

slide-11
SLIDE 11
  • Use multiple frequencies of data?
      • Pastcasting, nowcasting, forecasting, and “continuous” updating.
  • Example: Factor MIDAS used for predicting quarterly data via the use of monthly factors (Marcellino and Schumacher (2010)).
  • MIDAS model for forecasting $h_q$ quarters ahead is

$$Y_{t_q + h_q} = \beta_0 + \beta_1 B(L^{1/m}; \theta) \hat{F}^{(3)}_{t_m} + \varepsilon_{t_q + h_q}$$

$$B(L^{1/m}; \theta) = \sum_{j=0}^{j_{\max}} b(j; \theta) L^{j/m}$$

$$b(j; \theta) = \frac{\exp(\theta_1 j + \theta_2 j^2)}{\sum_{j=0}^{j_{\max}} \exp(\theta_1 j + \theta_2 j^2)}, \qquad \theta = (\theta_1, \theta_2) \quad \text{(Almon distributed lag)}$$

  • $\hat{F}_{t_m}$ is a set of monthly factors.
  • $\hat{F}^{(3)}_{t_m}$ is skip sampled from the monthly factor $\hat{F}_{t_m}$.
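The exponential Almon lag weights in the MIDAS polynomial are simple to compute. The following is a minimal sketch; the function names `exp_almon_weights` and `midas_regressor` and the example parameter values are assumptions chosen for illustration.

```python
import math

def exp_almon_weights(theta1, theta2, j_max):
    """Weights b(j; theta) for j = 0, ..., j_max, normalized to sum to one."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(j_max + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def midas_regressor(monthly_factor, weights):
    """Apply weights to the most recent monthly values (newest observation last).

    weights[0] multiplies the latest month, weights[1] the month before, etc.
    """
    newest_first = monthly_factor[::-1]
    return sum(w * f for w, f in zip(weights, newest_first))

# Example with hypothetical parameter values: theta2 < 0 gives declining weights.
weights = exp_almon_weights(0.0, -0.1, 5)
```

Because only two parameters govern the whole lag profile, the number of coefficients does not grow with `j_max`, which is the parsimony argument for MIDAS.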

slide-12
SLIDE 12

Sparseness is not present in ridge regression, but may be useful for interpretation of factors. The key idea is to be able to (uniquely) estimate regression coefficients when the number of variables exceeds the sample size.

  • Optimization problems that treat such multicollinearity:

Ridge (Hoerl):

$$\min_{\beta} \|y - X\beta\|^2 \quad \text{s.t.} \quad \|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2 \le \tau$$

Lasso (Tibshirani):

$$\min_{\beta} \|y - X\beta\|^2 \quad \text{s.t.} \quad \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j| \le \tau$$

Elastic Net (Zou, Hastie):

$$\min_{\beta} \|y - X\beta\|^2 \quad \text{s.t.} \quad \sum_{j=1}^{p} |\beta_j| \le \tau_1 \ \text{and} \ \sum_{j=1}^{p} \beta_j^2 \le \tau_2$$

  • Ridge is the original, but lasso (least absolute shrinkage and selection operator) shrinks some parameters all the way to zero.
  • Elastic net (Zou and Hastie (2005)) combines the two.
  • If we do not care about sparsity, how about neural nets as an alternative?
  • Overfitting matters: how big an issue is it in factor analysis without sparseness, in the sense of PEER?
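To illustrate how the three penalties behave when there are more variables than observations, here is a minimal coordinate descent sketch. It is an assumption-laden illustration: it solves the penalized (Lagrangian) form 0.5‖y − Xb‖² + lam1‖b‖₁ + 0.5·lam2‖b‖₂² rather than the constrained form, uses simulated data, and the function name `elastic_net_cd` is hypothetical.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly zero inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def elastic_net_cd(X, y, lam1, lam2, sweeps=200):
    """Coordinate descent; lam2 = 0 gives lasso, lam1 = 0 gives ridge."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                        # y - X beta, starting from beta = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of column j with the residual excluding beta_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam1) / (col_ss[j] + lam2)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Simulated example with p > n; only the first two variables matter.
random.seed(1)
n, p = 20, 30
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + 0.1 * random.gauss(0, 1) for row in X]
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p))

ridge = elastic_net_cd(X, y, 0.0, 1.0)             # dense coefficients
lasso = elastic_net_cd(X, y, 0.5 * lam_max, 0.0)   # some exact zeros
enet = elastic_net_cd(X, y, 0.25 * lam_max, 1.0)   # both penalties
```

The soft-thresholding step is what produces exact zeros under the L1 penalty; with only the L2 penalty every coefficient is shrunk but generally remains nonzero.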

slide-13
SLIDE 13
  • Circling back: consider SPCA (Zou, Hastie and Tibshirani (2006)), which adds the sparseness feature of lasso (elastic net) to PCA. How? Reformulate PCA as a regression-type optimization problem, and then impose the lasso (elastic net = double shrinkage). Consider the penalized regression form of the optimization problems outlined above:

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \left\| y - \sum_{j=1}^{N} X_j \beta_j \right\|^2 + \lambda_1 \sum_{j=1}^{N} |\beta_j|$$

$$\hat{\beta}_{\text{elastic net}} = (1 + \lambda_2) \arg\min_{\beta} \left\| y - \sum_{j=1}^{N} X_j \beta_j \right\|^2 + \lambda_1 \sum_{j=1}^{N} |\beta_j| + \lambda_2 \sum_{j=1}^{N} \beta_j^2$$

  • 2-stage SPCA? Replace y with F -> ridge is PCA, then add the L1-norm penalty.
  • Constraint: lasso can select at most T of N variables when N > T in PCA construction.
  • Economic interpretability of factors. Couple SPCA for factors with further targeted (on predictor variable) penalized regression?

slide-14
SLIDE 14

Recall that the L1 norm does not necessarily lead to sparsity, but the L1 regularization term (the penalty) on the weights/coefficients in the model does.

L2-norm (e.g. least squares regression)        L1-norm (e.g. LAD regression)
Not so robust to outliers                      Robust
Stable solution                                Unstable solution for small data perturbations
Unique solution                                Possibly multiple solutions
Non-sparsity                                   Sparsity
Computational efficiency (analytical solution) Computational inefficiency (what if non-sparse?)
slide-15
SLIDE 15
  • Diebold and Mariano (1995), White (2000), Chao, Corradi and Swanson (2001), Clark and McCracken (2001, 2013), Corradi and Swanson (2006) …
  • Key Question: Should We Utilize Loss Function Specific Measures, or Not?
  • Pairwise Accuracy

$$H_0: E[g(u_{0,t+h}) - g(u_{1,t+h})] = 0 \qquad H_A: E[g(u_{0,t+h}) - g(u_{1,t+h})] \ne 0$$

$$DM_P = \frac{\bar{d}_t}{\hat{\sigma}_{\bar{d}_t}} \xrightarrow{d} N(0,1),$$

where $\bar{d}_t = \frac{1}{P} \sum_{t=R+1}^{T} \hat{d}_t$, $\hat{d}_t = g(\hat{u}_{0,t+h}) - g(\hat{u}_{1,t+h})$, and $\hat{\sigma}_{\bar{d}_t} = \hat{\sigma}_{d_t} / \sqrt{P}$.

  • Causality

$$m_P = P^{-1/2} \sum_{t=R+1}^{T} \hat{u}_{0,t+h} X_t$$

  • Big Data

$$S_P = \max_{k=1,\dots,m} DM_P(1,k)$$
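A minimal sketch of the pairwise Diebold-Mariano statistic defined on this slide. It assumes quadratic loss by default and serially uncorrelated loss differentials (reasonable for h = 1; longer horizons would normally call for a HAC variance estimator, which is not shown). The function names and example errors are hypothetical.

```python
import math

def dm_statistic(errors0, errors1, loss=lambda u: u * u):
    """DM statistic: mean loss differential over its standard error."""
    d = [loss(a) - loss(b) for a, b in zip(errors0, errors1)]
    P = len(d)
    d_bar = sum(d) / P
    var_d = sum((x - d_bar) ** 2 for x in d) / (P - 1)   # sample variance of d_t
    return d_bar / math.sqrt(var_d / P)

def two_sided_pvalue(stat):
    """Two-sided p-value from the standard normal limiting distribution."""
    return math.erfc(abs(stat) / math.sqrt(2.0))

# Hypothetical forecast errors from a benchmark (0) and an alternative (1).
errors0 = [1.0, 2.0, 1.0, 2.0]
errors1 = [0.0, 0.0, 0.0, 0.0]
stat = dm_statistic(errors0, errors1)
```

A positive statistic indicates that the benchmark has the larger average loss; the sign flips if the two error series are swapped.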

slide-16
SLIDE 16

Stochastic Dominance Methods

  • General Loss Forecast Superiority <-> 1st Order Stochastic Dominance
  • Convex Loss Forecast Superiority <-> 2nd Order Stochastic Dominance
  • Implementation:

$$u_1 \succeq_G u_2 \ \text{iff} \ E[L(u_1)] \le E[L(u_2)], \ \forall L \in \mathcal{L}_G$$

$$u_1 \succeq_C u_2 \ \text{iff} \ E[L(u_1)] \le E[L(u_2)], \ \forall L \in \mathcal{L}_C$$

$$G(x) = (F_2(x) - F_1(x))\,\mathrm{sgn}(x)$$

$$C(x) = \int_{-\infty}^{x} (F_1(t) - F_2(t))\,dt\; 1\{x < 0\} + \int_{x}^{\infty} (F_2(t) - F_1(t))\,dt\; 1\{x \ge 0\}$$

$$E[L(u_1)] \le E[L(u_2)] \ \text{for all } L \ \text{iff} \ G(x) \le 0$$


slide-17
SLIDE 17
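$$F_k(x) = P(u_{k,t} \le x), \qquad F_{k,n}(x) = P^{-1} \sum_{t=R}^{T} 1\{u_{k,t} \le x\}$$

$$TG_n^{+} = \max_{k=2,\dots,m} \sup_{x \in X^{+}} \sqrt{n}\, G_{k,n}(x) \quad \text{and} \quad TG_n^{-} = \max_{k=2,\dots,m} \sup_{x \in X^{-}} \sqrt{n}\, G_{k,n}(x)$$

$$TC_n^{+} = \max_{k=2,\dots,m} \sup_{x \in X^{+}} \sqrt{n}\, C_{k,n}(x) \quad \text{and} \quad TC_n^{-} = \max_{k=2,\dots,m} \sup_{x \in X^{-}} \sqrt{n}\, C_{k,n}(x),$$

where

$$G_{k,n}(x) = (F_{k,n}(x) - F_{1,n}(x))\,\mathrm{sgn}(x)$$

$$C_{k,n}(x) = \int_{-\infty}^{x} (F_{1,n}(s) - F_{k,n}(s))\,ds\; 1\{x < 0\} + \int_{x}^{\infty} (F_{k,n}(s) - F_{1,n}(s))\,ds\; 1\{x \ge 0\}$$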

slide-18
SLIDE 18
  • The Models
  • Dynamic Nelson Siegel: a ‘small data’ model (Diebold and Li (2006)) with time decay parameter $\lambda_t$, maturity parameter $\tau$, and level, slope and curvature ‘factors’ (i.e., the betas), so that the factor loading on the level factor is one, etc.

$$y_t(\tau) = \beta_{1,t} + \beta_{2,t} \left( \frac{1 - \exp(-\lambda_t \tau)}{\lambda_t \tau} \right) + \beta_{3,t} \left( \frac{1 - \exp(-\lambda_t \tau)}{\lambda_t \tau} - \exp(-\lambda_t \tau) \right) + \varepsilon_t$$

  • Slope factor increase -> slope of curve increases, as short rates increase more than long rates in this case …
  • Dimension Reduction (Big Data) Models

$$y_{t+h} = \beta' W_t + \gamma_b' F_t^{b} + \gamma_s' F_t^{s} + \varepsilon_{t+h}$$

  • Strawman Econometric Models

$$y_{t+h} = \beta' W_t + \varepsilon_{t+h}$$
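The Nelson-Siegel factor loadings can be computed directly from the decay parameter and maturity. The sketch below is illustrative only: it fixes the decay parameter at 0.0609 with maturities in months, the value used by Diebold and Li (2006), which is an assumption here and not necessarily the presenters' choice; the function names are hypothetical.

```python
import math

def ns_loadings(tau, lam):
    """Level, slope, and curvature loadings at maturity tau (tau > 0)."""
    x = lam * tau
    slope = (1.0 - math.exp(-x)) / x
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature

def ns_yield(tau, lam, level, slope, curvature):
    """Fitted yield at maturity tau given the three factor values."""
    l1, l2, l3 = ns_loadings(tau, lam)
    return level * l1 + slope * l2 + curvature * l3

LAM = 0.0609  # assumed decay parameter, maturities measured in months
loadings_by_maturity = {tau: ns_loadings(tau, LAM) for tau in (12, 24, 36, 60, 120)}
```

The slope loading declines with maturity, so a change in the slope factor moves short rates more than long rates, while the curvature loading peaks at medium maturities.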

slide-19
SLIDE 19
  • Use zero coupon U.S. Treasury yield curve data, monthly, 1982-2016; Gurkaynak, Sack, and Wright (2006).
  • Target variables are 1, 2, 3, 5, and 10 year maturity yields.
  • Forecast horizons are h = 1, 3, 12 months.
  • Prediction subsamples: 1992-99, 2000-07, 2008-16, and recession/expansion.
  • Small data panel has N=10, T=415.
  • Big data panel uses the FRED-MD dataset with 103 macroeconomic variables.
  • Predictions are constructed in real time, and estimations are based on rolling windows.
  • Model Selection: MSFE and DM Tests.
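A rolling-window forecasting exercise with MSFE comparison can be sketched as follows. This is a toy illustration under stated assumptions: simulated AR(1) data stand in for yields, the two models are an AR(1) and a random walk strawman, and the window length and function names are hypothetical.

```python
import random

def ols_ar1(sample):
    """OLS of y_t on a constant and y_{t-1}; returns (intercept, slope)."""
    x, z = sample[:-1], sample[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    slope = sxz / sxx
    return mz - slope * mx, slope

def mean_squared(errors):
    return sum(e * e for e in errors) / len(errors)

def rolling_msfe(y, window):
    """One-step-ahead MSFE of AR(1) and random walk forecasts, rolling window."""
    err_ar, err_rw = [], []
    for end in range(window, len(y)):
        sample = y[end - window:end]       # only data available at forecast time
        a, b = ols_ar1(sample)
        err_ar.append(y[end] - (a + b * sample[-1]))
        err_rw.append(y[end] - sample[-1])
    return mean_squared(err_ar), mean_squared(err_rw)

# Simulated stationary AR(1) series (an assumption for illustration).
random.seed(42)
y = [0.0]
for _ in range(499):
    y.append(0.3 * y[-1] + random.gauss(0, 1))
msfe_ar, msfe_rw = rolling_msfe(y, window=120)
```

The two error series produced inside the loop are exactly the inputs a Diebold-Mariano comparison would use to judge whether an MSFE difference is statistically significant.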
slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
  • The Models
slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27
  • MSFE-Best Models
slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
  • Forecast Combination (1-Month Ahead)
slide-34
SLIDE 34
  • Subsamples 1 and 2: DNS+FB models usually win, including in 17 of 20 maturity/horizon permutations.
  • Subsample 3: DNS+FB wins in only 2 of 10 cases for h=1 and 3, across maturities. Post Great Recession confusion?
  • Entire Sample Period: For h=1,3, DNS+FB wins 7 of 10 times.
  • Evidence for h=12 is much more mixed; AR, VAR, and pure DNS often ‘win’.
  • 1 or 2 factors are always ‘best’.
slide-35
SLIDE 35
  • AND the ‘best’ models are almost always significantly better than the AR(1) straw-man model.
  • DNS model ‘winners’ are of the ‘vector’ variety. DNS factors do not evolve independently of one another, when predicting.
  • Thus, DNS factors are best predicted using other DNS factors AND big data diffusion indexes.
  • DNS+FB evidence is even stronger for the recession subsample: DNS+FB wins in 13 of 15 horizon/maturity permutations.
  • NOT so for the expansion subsample: DNS+FB wins in 7 of 15 permutations.
slide-36
SLIDE 36
  • Forecast Combination is not optimal in our experiments.
  • Combinations fail to win in 15 of 20 permutations, for h=1, across all subsamples.
  • Combinations fail to win in 18 of 20 permutations, for h=3, across all subsamples and bond maturities.
  • Combinations fail to win in 17 of 20 permutations, for h=12, across all subsamples and bond maturities.


slide-37
SLIDE 37
  • Big (and wide) data analysis is a burgeoning area of research, and many interesting methodological advances remain to be discovered and also empirically analyzed.
  • Not only do we have more data than ever to propel this empirical research, but we also have many useful new tools with which to work, ranging from data shrinkage methods to varieties of latent factor modelling.
  • Thank You!
slide-38
SLIDE 38