Speeding Up the ARDL Estimation Command: A Case Study in Efficient Programming in Stata and Mata


  1. Speeding Up the ARDL Estimation Command: A Case Study in Efficient Programming in Stata and Mata
     - Sebastian Kripfganz (University of Exeter)
     - Daniel C. Schneider (Max Planck Institute for Demographic Research)
     - German Stata Users Group Meeting, June 23, 2017

  2. Contents
     - Efficient Coding
     - Digression: A Tiny Bit of Asymptotic Notation
     - The ARDL Model
     - Optimal Lag Selection
     - Incremental Code Improvements

  3. Introduction
     - Long code execution times are more than a nuisance: they negatively affect the quality of research.
     - Strategies for speeding up execution:
       - lower-level language
       - parallelization
       - writing efficient code
     - Efficient coding is often the best choice.
       - Moving to lower-level languages is tedious.
       - In many settings, speed improvements are higher than through parallelization.

  4. Introduction: Speed of Stata and Mata
     - C is the reference: it is compiled to machine instructions.
     - Post by Bill Gould (2014) on the Stata Forum:
       - Stata (interpreted) code is 50-200 times slower than C.
       - Mata compiled byte-code is 5-6 times slower than C.
       - => Mata is 10-40 times faster than Stata.
     - In real-world applications, Mata is ~2 times slower than C.
       - Mata has built-in C routines based on very efficient code.

  5. Introduction: Efficient Coding Strategies
     - Using Common Sense
       - An if-condition requires at least N comparisons (one per observation). Use in-conditions instead, if possible.
       - Multiplying two 100x100 matrices requires about 2*100^3 = 2,000,000 arithmetic operations.
     - Using Knowledge of Your Software (Stata, of course!)
       - Examples:
         - Mata: passing of arguments to functions
         - Efficient operators and functions (e.g. Mata's colon operator and its c-conformability)
       - Read the Stata and Mata programming manuals.
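
The operation count quoted for the 100x100 product can be checked with a short sketch. Python is used here only as a neutral illustration (the talk itself is about Stata and Mata), and the function names `matmul_with_count` and `square_matmul_ops` are hypothetical. The sketch assumes the textbook algorithm, in which each output element costs n multiplications and n - 1 additions.

```python
def matmul_with_count(a, b):
    """Multiply two matrices (lists of rows); return (product, arithmetic op count)."""
    n_rows, inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    ops = 0
    product = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            total = a[i][0] * b[0][j]  # first product, no addition yet
            ops += 1
            for l in range(1, inner):
                total += a[i][l] * b[l][j]  # one multiplication plus one addition
                ops += 2
            out_row.append(total)
        product.append(out_row)
    return product, ops


def square_matmul_ops(n):
    """Closed form for two n x n matrices: n^2 elements, 2n - 1 operations each."""
    return n * n * (2 * n - 1)


if __name__ == "__main__":
    print(square_matmul_ops(100))  # close to 2 * 100**3
```

For n = 100 the closed form gives 1,990,000 operations, which is the "about 2*100^3" figure on the slide.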

  6. Introduction: Efficient Coding Strategies
     - Using Knowledge of Matrix Algebra
       - Translating mathematical formulas one-to-one into matrix language expressions is oftentimes (very!) inefficient.
       - Examples:
         - diagonal matrices ($D$):
           - multiplication of a matrix by $D$: don't do it! Mata: use c-conformability of the colon operator (see [M-2] op_colon)
           - inverse: flip (take the reciprocals of) the diagonal elements instead of calling a matrix solver / inverter function ($O(n)$ vs. $O(n^3)$)
         - block diagonal matrices:
           - multiplication: just multiply the diagonal blocks; this takes only a fraction $1/s^2$ of the operations of the full product, where $s$ is the number of diagonal blocks
           - inverse: invert individual blocks
         - order of matrix multiplication / parenthesization:
           - $b = (X'X)^{-1}(X'y)$ is faster than $b = (X'X)^{-1}X'y$; e.g. for $k = 10$, $N = 10{,}000$: matrix multiplications are 11 times faster!
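
A minimal Python sketch of two of these points follows; it is an illustration under stated assumptions, not the authors' Mata code, and all function names are hypothetical. `scale_rows` and `diag_inverse` show the diagonal-matrix shortcuts. `ols_mult_counts` counts scalar multiplications for the two parenthesizations, assuming that $X'X$ and its inverse are already available (they are common to both expressions) and that the unparenthesized expression is evaluated left to right. Under that reading the factor of roughly 11 for $k = 10$, $N = 10{,}000$ is reproduced.

```python
def scale_rows(d, a):
    """Return diag(d) * A by scaling row i of A by d[i], without forming diag(d)."""
    return [[d_i * a_ij for a_ij in row] for d_i, row in zip(d, a)]


def diag_inverse(d):
    """Invert diag(d) by taking reciprocals of the diagonal: O(n) instead of O(n^3)."""
    return [1.0 / d_i for d_i in d]


def mult_count(rows, inner, cols):
    """Scalar multiplications in a textbook (rows x inner) by (inner x cols) product."""
    return rows * inner * cols


def ols_mult_counts(n_obs, k):
    """Multiplications needed once (X'X)^-1 is available, for both orderings."""
    # (X'X)^-1 (X'y): first X'y, (k x N)(N x 1), then (k x k)(k x 1)
    vector_first = mult_count(k, n_obs, 1) + mult_count(k, k, 1)
    # ((X'X)^-1 X') y: first (k x k)(k x N), then (k x N)(N x 1)
    left_to_right = mult_count(k, k, n_obs) + mult_count(k, n_obs, 1)
    return vector_first, left_to_right


if __name__ == "__main__":
    fast, slow = ols_mult_counts(n_obs=10_000, k=10)
    print(fast, slow, round(slow / fast, 2))
```

The saving comes from never forming the $k \times N$ intermediate matrix: once $X'y$ is a $k \times 1$ vector, the remaining product is tiny.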

  7. Asymptotic Notation
     - Definition: An algorithm with input size $n$ and running time $T(n)$ is said to be $\Theta(g(n))$ ("theta of g of n"), or to have an asymptotically tight bound $g(n)$, if there exist positive real numbers $c_1, c_2, n_0 > 0$ such that $c_1 g(n) \le T(n) \le c_2 g(n) \;\; \forall n \ge n_0$.
     - [Figure: "T(n) = 0.8n^3 - 1000n^2 + 1000n + 10e9 is O(n^3)". Vertical axis: # of arithmetic operations (0 to 6e9); horizontal axis: algorithm input size n (0 to 2000); curves: T(n), 0.2 * n^3, 0.801 * n^3.]
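
The definition can be explored numerically with a small Python sketch. Two assumptions are made here and should be read as such: the constant term of the plotted polynomial appears as "10e9" in the extracted text and is taken to be 10^9 below, and the bounding constants $c_1 = 0.2$ and $c_2 = 0.801$ are read off the figure legend. The function names are hypothetical. A scan over a finite range is only a sanity check, not a proof of the bound.

```python
from fractions import Fraction


def running_time(n, constant=10**9):
    """T(n) = 0.8 n^3 - 1000 n^2 + 1000 n + constant (constant value is an assumption)."""
    return Fraction(4, 5) * n**3 - 1000 * n**2 + 1000 * n + constant


def bounds_hold(n, c1=Fraction(1, 5), c2=Fraction(801, 1000)):
    """Check c1 * g(n) <= T(n) <= c2 * g(n) for g(n) = n^3, in exact arithmetic."""
    g = n**3
    return c1 * g <= running_time(n) <= c2 * g


def smallest_n0(n_max):
    """Smallest n0 such that the bounds hold for every n in [n0, n_max], else None."""
    n0 = None
    for n in range(n_max, 0, -1):
        if not bounds_hold(n):
            break
        n0 = n
    return n0


if __name__ == "__main__":
    print(smallest_n0(2000))
```

Exact fractions are used instead of floats so that the comparison at the boundary is not affected by rounding.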

  8. Asymptotic Notation
     - $O(g(n))$ ("(big) oh of g of n"), as opposed to $\Theta(g(n))$, is used here to denote only an upper bound. Notation differs in the literature.
     - Technically, $\Theta(g(n))$ and $O(g(n))$ are sets of functions, so we write e.g. $T(n) \in O(g(n))$.
     - For matrix operations, $g(n)$ is frequently $n$ raised to some low integer power.
       - $\Theta(n)$ is much better than $\Theta(n^2)$, which in turn is much better than $\Theta(n^3)$.
       - (Square) matrix multiplication is $\Theta(n^3)$: each element of the new $n \times n$ matrix is a sum of $n$ terms. Costly!
       - Many types of matrix inversion, e.g. the LU-decomposition, are also $\Theta(n^3)$. Costly!
       - Inner vector products are $\Theta(n)$.
     - When $T(n)$ is an $i$-th order polynomial, the leading term asymptotically dominates: $T(n) \in O(n^i)$.
     - $\Theta(a^n)$ is worse than $\Theta(n^a)$; $\Theta(\lg n)$ is better than $\Theta(n)$.

  9. ARDL: Model Setup
     - ARDL$(p, q_1, \ldots, q_k)$: autoregressive distributed lag model
     - Popular, long-standing single-equation time-series model for continuous variables
     - Linear model: $y_t = c_0 + c_1 t + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{i=0}^{q} \boldsymbol{\beta}_i' \mathbf{x}_{t-i} + u_t, \quad u_t \sim iid(0, \sigma^2)$
     - $(y_t, \mathbf{x}_t')'$ can be purely $I(0)$, purely $I(1)$, or cointegrated: the model can be used to test for cointegration (bounds testing procedure; Pesaran, Shin, and Smith, 2001). => The econometrics of ARDL can be complicated.
     - net install ardl, from(http://www.kripfganz.de/stata)
     - This talk: programming; for the statistics of ardl, see Kripfganz/Schneider (2016).
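
To make the regressor structure of the linear model concrete, here is a minimal Python sketch that builds the regressor rows for a single explanatory variable. It is not the ardl command's implementation. The function name `ardl_design`, the restriction to one regressor x, and the zero-based time index used for the trend are all assumptions made for illustration.

```python
def ardl_design(y, x, p, q):
    """Build rows [1, t, y_{t-1}, ..., y_{t-p}, x_t, ..., x_{t-q}] and targets y_t."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if p < 1 or q < 0:
        raise ValueError("need p >= 1 and q >= 0")
    start = max(p, q)  # first period for which all lags are available
    rows, targets = [], []
    for t in range(start, len(y)):
        row = [1.0, float(t)]                        # constant and linear trend
        row += [y[t - i] for i in range(1, p + 1)]   # lags 1..p of the dependent variable
        row += [x[t - i] for i in range(0, q + 1)]   # lags 0..q of the regressor
        rows.append(row)
        targets.append(y[t])
    return rows, targets


if __name__ == "__main__":
    rows, targets = ardl_design([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], p=2, q=1)
    print(rows[0], targets[0])
```

Each row has 2 + p + (q + 1) columns, and the first max(p, q) observations are lost to lagging.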

  10. ARDL: Computational Considerations
      - Despite its complex statistical properties, estimating an ARDL model is just based on OLS!
      - The computationally costly parts are:
        - determination of optimal lag orders (e.g. via AIC or BIC): treated at length in this talk
        - simulation of test distributions for cointegration testing (PSS 2001, Narayan 2005): not covered by this talk

  11. Optimal Lag Selection: The Problem
      - For $k + 1$ variables (indepvars + depvar) and maxlags lags for each variable, run a regression and calculate an information criterion (IC) for each possible lag combination, then select the model with the best IC value.
      - Example: 2 variables (v1 v2), maxlags = 2:
        - regress v1 L(1/1).v1 L(0/0).v2
        - regress v1 L(1/2).v1 L(0/0).v2
        - regress v1 L(1/1).v1 L(0/1).v2
        - regress v1 L(1/2).v1 L(0/1).v2
        - regress v1 L(1/1).v1 L(0/2).v2
        - regress v1 L(1/2).v1 L(0/2).v2
      - The # of regressions to run is exponential in $k$: $\text{maxlags} \cdot (\text{maxlags} + 1)^k$

        | k + 1 | maxlags | # regressions |
        |-------|---------|---------------|
        | 3     | 4       | 100           |
        | 3     | 8       | ≈ 650         |
        | 4     | 8       | ≈ 5,800       |
        | 6     | 8       | ≈ 470,000     |
        | 8     | 8       | ≈ 38,000,000  |
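
The counts on this slide follow directly from the formula maxlags * (maxlags + 1)^k. A short Python check (the function name `n_regressions` is hypothetical) reproduces the exact values that the slide rounds:

```python
def n_regressions(n_vars, maxlags):
    """Number of lag combinations for n_vars = k + 1 variables (depvar included).

    The dependent variable takes lags 1..maxlags (maxlags choices); each of the
    k independent variables takes lags 0..maxlags (maxlags + 1 choices).
    """
    k = n_vars - 1
    return maxlags * (maxlags + 1) ** k


if __name__ == "__main__":
    for n_vars, maxlags in [(2, 2), (3, 4), (3, 8), (4, 8), (6, 8), (8, 8)]:
        print(n_vars, maxlags, n_regressions(n_vars, maxlags))
```

The two-variable example with maxlags = 2 gives 6, matching the six regress commands listed.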

  12. Lag Selection: Preliminaries
      - Lag combination matrix for $k = 3$ and maxlags $= 2$:
        $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 2 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \\ \vdots & \vdots & \vdots \\ 2 & 2 & 2 \end{bmatrix}$
      - e.g. row 3, $\begin{bmatrix} 1 & 0 & 2 \end{bmatrix}$, corresponds to regressors L.v1 L(0/0).v2 L(0/2).v3 = $\begin{bmatrix} v1_{t-1} & v2_t & v3_t & v3_{t-1} & v3_{t-2} \end{bmatrix}$
      - called "lagcombs" in pseudo-code to follow
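
One way to generate such a lag combination matrix is a Cartesian product over the admissible lag orders. The Python sketch below is an illustration only and is not how the ardl command builds "lagcombs"; the function names `lag_combinations` and `regressor_list` are hypothetical.

```python
from itertools import product


def lag_combinations(n_vars, maxlags):
    """Rows of the lag combination matrix, in the row order shown on the slide.

    Column 1 (dependent variable) runs over 1..maxlags; the remaining columns
    (independent variables) run over 0..maxlags, with the last column varying fastest.
    """
    depvar_lags = range(1, maxlags + 1)
    indepvar_lags = range(0, maxlags + 1)
    columns = [depvar_lags] + [indepvar_lags] * (n_vars - 1)
    return [list(combo) for combo in product(*columns)]


def regressor_list(lags, names):
    """Translate one row into Stata time-series operator notation."""
    terms = [f"L(1/{lags[0]}).{names[0]}"]
    terms += [f"L(0/{q}).{name}" for q, name in zip(lags[1:], names[1:])]
    return " ".join(terms)


if __name__ == "__main__":
    lagcombs = lag_combinations(n_vars=3, maxlags=2)
    print(len(lagcombs), lagcombs[2])
    print(regressor_list(lagcombs[2], ["v1", "v2", "v3"]))
```

Row 3 translates to L(1/1).v1 L(0/0).v2 L(0/2).v3, where L(1/1).v1 is the same regressor as the slide's L.v1.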
