SLIDE 1

Efficient Programming in Stata and Mata II: Obtaining Non-Standard Distributions for a Cointegration Test via Simulation

Sebastian Kripfganz, University of Exeter Business School
Daniel C. Schneider, Max Planck Institute for Demographic Research
German Stata Users Group Meeting, June 22, 2018, Konstanz

SLIDE 2

Last Year's Talk

  • efficient coding strategies:
    • use common sense
    • use your knowledge of your software (Stata, of course!)
    • use your knowledge of matrix algebra
  • case study: the -ardl- estimation command
    • last year: optimal lag selection
    • this talk: simulation of finite sample distributions

Efficient Programming in Stata/Mata Kripfganz/Schneider German Stata Meeting 2018

SLIDE 3

Stationarity vs. Non-Stationarity

  • fundamental distinction in time series analysis (TSA)
  • mostly about time series with a unit root: I(0) vs. I(1)
  • non-stationary time series behave fundamentally differently

SLIDE 4

Multiple Time Series Analysis

Long-run relationship: Some time series are bound together due to equilibrium forces even though the individual time series might move considerably.

SLIDE 5

The ARDL Model and the Bounds Test

  • Pesaran / Shin / Smith (2001) (PSS) derive the asymptotic coefficient distributions under the opposing assumptions of stationary vs. non-stationary regressors, the basis for their bounds test for a levels relationship.
  • They provide critical value (CV) tables obtained via simulation.
SLIDE 6

ARDL Toy Model Estimation

SLIDE 7

ARDL Toy Model Estimation

SLIDE 8

Simulation Project Outline

  • PSS bounds test very popular, but CV tables only cover a limited number of cases
  ⇒ computational / simulation project:
    1. simulate distributions for all combinations of c, I, k, q, T
    2. store calculated statistics / distributions
    3. run response surface regressions (RSR), where the depvars are distributional quantiles
    4. implement and distribute an ARDL postestimation feature that displays RSR-based CVs / p-values

SLIDE 9

Response Surface Regressions (RSR)

  • idea: for each c, I, k: regress quantile of distribution ~ g(T, q). We implement variations thereof.
  • use predicted values for a particular T, q as CVs in applied work
  • introduced by MacKinnon (1991, 1994, 1996)
  • other Stata commands, e.g.
    • ersur (Baum/Otero 2017)
    • kssur, ksur (Otero/Smith 2017)
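
The regression idea can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' Stata/Mata code. It assumes, purely for simplicity, that g depends on T only and is a polynomial in 1/T; the functional form, all function names, and the numbers are hypothetical.

```python
def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p + 1):
                A[r][j] -= f * A[c][j]
    # back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

def design_row(T):
    # ASSUMED functional form g(T): constant, 1/T, 1/T^2
    return [1.0, 1.0 / T, 1.0 / T ** 2]

def fit_surface(sample_sizes, quantiles):
    """Regress simulated quantiles on the response surface terms."""
    return ols([design_row(T) for T in sample_sizes], quantiles)

def predict_cv(coefs, T):
    """Predicted quantile (critical value) for sample size T."""
    return sum(b * x for b, x in zip(coefs, design_row(T)))

# synthetic illustration: the "simulated" quantiles are generated from
# made-up coefficients, not taken from any real simulation
sizes = [20, 30, 50, 100, 200, 400, 1000]
made_up = [-3.0, -5.0, -10.0]
quantiles = [predict_cv(made_up, T) for T in sizes]

coefs = fit_surface(sizes, quantiles)
print(predict_cv(coefs, 60))  # CV for a sample size that was not simulated
```

In the project, one such regression would be run per combination of c, I, k and per quantile, with the lag order q entering g as well.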

SLIDE 10

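The Computational Task

Similar to PSS, the DGP is

$$y_t = y_{t-1} + \varepsilon_{yt}$$
$$\mathbf{x}_t = \mathbf{P}\,\mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_{xt}$$

for $t = 1, 2, \dots, T + 50$ (including 50 burn-in periods), and where $(y_0, \mathbf{x}_0')' = \mathbf{0}$, $\boldsymbol{\varepsilon}_t \sim N(\mathbf{0}, \mathbf{I}_{k+1})$, and

  • $\mathbf{P} = \mathbf{0}$ ($I(0)$ regressors)
  • $\mathbf{P} = \mathbf{I}_k$ ($I(1)$ regressors)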
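
One draw from this DGP can be sketched as follows. This is an illustrative Python version, not the authors' Mata implementation; the function name and arguments are hypothetical, and the shocks are drawn one at a time for readability rather than speed.

```python
import random

def simulate_dgp(T, k, integrated, rng, burn_in=50):
    """Simulate y_t = y_{t-1} + e_yt and x_t = P x_{t-1} + e_xt.

    P = I_k if `integrated` is True (I(1) regressors), else P = 0 (I(0)).
    Starts from zero, discards `burn_in` periods, returns T observations.
    """
    y_prev = 0.0
    x_prev = [0.0] * k
    y_path, x_path = [], []
    for t in range(T + burn_in):
        # independent standard-normal shocks, e_t ~ N(0, I_{k+1})
        e_y = rng.gauss(0.0, 1.0)
        e_x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        y_prev = y_prev + e_y                       # y is always a random walk
        if integrated:
            x_prev = [xp + e for xp, e in zip(x_prev, e_x)]   # P = I_k
        else:
            x_prev = e_x                                      # P = 0
        if t >= burn_in:
            y_path.append(y_prev)
            x_path.append(list(x_prev))
    return y_path, x_path

# one replication with k = 2 I(1) regressors; the seed is arbitrary
y, X = simulate_dgp(T=100, k=2, integrated=True, rng=random.Random(12345))
```

Each replication would then feed the simulated series into the test regression and keep only the resulting F- or t-statistic.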

SLIDE 11

The Computational Task

project size:
  • results in ~160,000,000,000 stats
  • implies several months of computation (“Oh my!”)
  • implies ~600GB disk space (“Oh dear!”)

Symbol   Meaning                Values                         # values
c        deterministics cases   1, 2, …, 5 (F); 1, 3, 5 (t)    8
I        integration order      0, 1                           2
k        # of regressors        0, 1, …, 10                    11
q        # of lags              0, 1, …, 4, 6, 8, 12           8
T        sample size            20, 22, …, 400, 500, 1000      18
r        # replications         100,000
m        # meta replications    100

SLIDE 12

Reducing Data Size

Idea, omitting details: i) round to 3 decimal places, ii) store tabulation
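
The rounding-and-tabulation idea can be sketched like this. The Python code below is only an illustration under simple assumptions (integer keys, a basic cumulative-share quantile rule); the function names are hypothetical and the omitted details of the actual storage format are not reproduced.

```python
from collections import Counter

def tabulate(stats, decimals=3):
    """Round each statistic and count how often each rounded value occurs.

    Keys are integers (value * 10**decimals), so equal rounded values
    always fall into the same cell.
    """
    scale = 10 ** decimals
    return Counter(round(s * scale) for s in stats)

def quantile_from_table(table, p, decimals=3):
    """Smallest rounded value whose cumulative share reaches p (0 < p <= 1)."""
    total = sum(table.values())
    running = 0
    for key in sorted(table):
        running += table[key]
        if running >= p * total:
            return key / 10 ** decimals
    raise ValueError("empty table")

# four made-up statistics collapse into three (value, count) cells
table = tabulate([0.1234, 0.1236, 0.1241, 2.0004])
median = quantile_from_table(table, 0.5)
```

With 100,000 replications per design point, many statistics share the same rounded value, so the table of (value, count) pairs is much shorter than the raw list, and quantiles can still be read off the stored tabulation later.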

SLIDE 13

Reducing Data Size

  • Achieved size reduction: over 90%
  • After -zipfile-, data occupy 10GB
  • Solving this was crucial as now computational steps can be separated.
  • But: takes up 20% of computation time
  • . help data types, . help compress
    • Data transformations and data types
    • Years, age in years
  • Wish list item: if Mata supported all numeric types of Stata
    • Could implement more complex storage ideas in Mata and its mmat files
    • Could write (de-)compression in terms of a class

SLIDE 14

Simulation & Multiple Stata Instances

SLIDE 15

Simulation & Multiple Stata Instances

Windows / DOS batch file to fire up Stata instances

SLIDE 16

Simulation & Multiple Stata Instances

  • Multiple instances
    • help entry: [GSW] B.5 Stata batch mode
    • careful with any kind of file saving operations, e.g. logs
    • batch file to kill processes?
  • RNG streams
    • new in Stata 15
    • . help set rngstream
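
How the two points interact can be sketched with a small launcher. The slides use a Windows batch file for this; the Python version below is a stand-in that only builds the command lines. The executable name, the do-file name, and the directory layout are hypothetical, and the do-file is assumed to read its first argument and use it as the stream number (for instance in `set rngstream`).

```python
from pathlib import Path

def build_jobs(n_instances, stata_exe, do_file, work_root):
    """Return one (command, working directory) pair per Stata instance.

    Each instance gets its own working directory, so that batch-mode log
    files do not overwrite each other, and its own stream number, passed
    to the do-file as an argument.
    """
    jobs = []
    for stream in range(1, n_instances + 1):
        workdir = Path(work_root) / f"instance{stream:02d}"
        # "-b do <file> <args>" is Stata's batch-mode syntax on Unix-like
        # systems; on Windows the "/e" switch plays the same role.
        cmd = [stata_exe, "-b", "do", do_file, str(stream)]
        jobs.append((cmd, workdir))
    return jobs

# hypothetical names; nothing is launched here
jobs = build_jobs(4, "stata-mp", "/project/simulate.do", "runs")
for cmd, workdir in jobs:
    print(workdir, " ".join(cmd))
```

To actually start the instances, each pair could be handed to `subprocess.Popen(cmd, cwd=workdir)` after creating the directory. An absolute do-file path is used because every instance runs in a different working directory.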

SLIDE 17

Mata Code Optimization

  • necessary to examine each expression for speed improvements
  • examples of smaller improvements
    • row extraction instead of column extraction
    • inner vector product: sum of squares vs. cross() vs. multiplication
  • most important code features
    • pre-calculation of cross-products, accessing through indexing
    • use pointer variables to facilitate storing numbers
    • experiment with inverters / solvers
  • not pursued: C/C++
    • Stata/Mata has a MUCH better convenience-speed trade-off
    • Stata/Mata great in other respects too: version control
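
The pre-calculation point can be illustrated with a toy example. The sketch below is in Python, not Mata, and uses made-up data; the helper `cross` merely mimics the idea of a cross-product function and is not Mata's `cross()`. The full cross-product matrix is computed once, and each candidate model then picks its rows and columns by index instead of passing over the data again.

```python
def cross(X, Z):
    """Cross-product X'Z for matrices stored as lists of rows."""
    return [[sum(x[i] * z[j] for x, z in zip(X, Z)) for j in range(len(Z[0]))]
            for i in range(len(X[0]))]

def select(M, rows, cols):
    """Submatrix of M picked out by index lists."""
    return [[M[i][j] for j in cols] for i in rows]

# toy data: 5 observations, 4 candidate regressors (made-up numbers)
X = [[1.0, 2.0, 0.5, 3.0],
     [1.0, 1.0, 1.5, 2.0],
     [1.0, 4.0, 2.5, 1.0],
     [1.0, 3.0, 3.5, 5.0],
     [1.0, 5.0, 4.5, 4.0]]

XX = cross(X, X)  # computed once, outside any loop over models

# inside the loop, each candidate model only indexes into XX
for cols in ([0, 1], [0, 1, 2], [0, 2, 3]):
    XX_sub = select(XX, cols, cols)
    X_sub = [[row[j] for j in cols] for row in X]
    # same matrix as a fresh cross-product, without a new pass over the data
    assert XX_sub == cross(X_sub, X_sub)
```

The saving grows with the number of models that share the same data, which is the situation in a simulation looping over many lag orders and regressor counts.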

SLIDE 18

Mata Code Optimization

Usage of pointer variables

SLIDE 19

Mata Code Optimization

Loop structure

SLIDE 20

Project Results: ARDL Toy Example

SLIDE 21

Project Results: ARDL Toy Example

PSS values vs. response surface regression based values

SLIDE 22

Project Results: E.g. Dickey-Fuller

Besides Cheung and Lai (1995), the existing literature largely neglects the lag-order dependence of the finite-sample critical values.

(t-statistic, k=0, case (iii), α=5%)

SLIDE 23

Recap

  • Non-stationary time series and cointegration, ardl and the PSS bounds test
  • Simulation project: Improve CV tables for bounds test
    • Storing large quantity of numbers
    • Computation time
      • Multiple Stata instances
      • Code improvements within Mata

SLIDE 24

Thank you!

Questions? Comments?

schneider@demogr.mpg.de
See also: the ardl discussion thread on the Stata Forum
. net install ardl, from(http://www.kripfganz.de/stata/)
Paper available at http://www.kripfganz.de/research/index.html

SLIDE 25

References

Cheung, Y.-W. and K. S. Lai (1995a). Lag order and critical values of the augmented Dickey-Fuller test. Journal of Business & Economic Statistics 13 (3), 277-280.

Kripfganz, S. and D. C. Schneider (2018). Response Surface Regressions for Critical Value Bounds and Approximate p-values in Equilibrium Correction Models. Manuscript, University of Exeter and Max Planck Institute for Demographic Research. Available at www.kripfganz.de/research/Kripfganz_Schneider_ec.html.

MacKinnon, J. G. (1991). Critical values for cointegration tests. In R. F. Engle and C. W. J. Granger (Eds.), Long-Run Economic Relationships: Readings in Cointegration, Chapter 13, pp. 267-276. Oxford: Oxford University Press.

MacKinnon, J. G. (1994). Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal of Business & Economic Statistics 12 (2), 167-176.

MacKinnon, J. G. (1996). Numerical distribution functions for unit root and cointegration tests. Journal of Applied Econometrics 11 (6), 601-618.

Otero, J. and C. F. Baum (2017). Response surface models for the Elliott, Rothenberg, and Stock unit-root test. Stata Journal 17 (4), 985-1002.

Otero, J. and J. Smith (2017). Response surface models for OLS and GLS detrending-based unit-root tests in nonlinear ESTAR models. Stata Journal 17 (3), 704-722.

Pesaran, M. H., Y. Shin, and R. J. Smith (2001). Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics 16 (3), 289-326.
