Adaptive Estimation of Autoregressive Models with Time-Varying Variances

Ke-Li Xu∗ and Peter C. B. Phillips†

Yale University

January 15, 2006

Abstract

Stable autoregressive models of known finite order are considered with martingale difference errors scaled by an unknown nonparametric time-varying function generating heterogeneity. An important special case involves structural change in the error variance, but in most practical cases the pattern of variance change over time is unknown and may involve shifts at unknown discrete points in time, continuous evolution or combinations of the two. This paper develops kernel-based estimators of the residual variances and associated adaptive least squares (ALS) estimators of the autoregressive coefficients. These are shown to be asymptotically efficient, having the same limit distribution as the infeasible generalized least squares (GLS). Comparisons of the efficient procedure and ordinary least squares (OLS) reveal that least squares can be extremely inefficient in some cases while nearly optimal in others. Simulations show that, when least squares works well, the adaptive estimators perform comparably well, whereas when least squares works poorly, major efficiency gains are achieved by the new estimators.

Keywords: Adaptive estimation, autoregression, heterogeneity, weighted regression.

JEL classification: C14, C22

∗Department of Applied Mathematics, Yale University, 51 Prospect Street, New Haven, Connecticut USA 06520. E-mail address: keli.xu@yale.edu.

†Corresponding author. Department of Economics, Cowles Foundation for Research in Economics, Yale University, P. O. Box 208281, New Haven, Connecticut USA 06520-8281. Telephone: +1-203-432-3695. Fax: +1-203-432-6167. E-mail address: peter.phillips@yale.edu.


1 Introduction

Recently, robust estimation and inference methods have been developed in autoregressions to account for potential conditional heteroskedasticity in the innovation process. In this spirit, Kuersteiner (2001, 2002) developed efficient instrumental variables estimators for autoregressive moving average (ARMA) models and autoregressive models of finite (p-th) order (AR(p)). Goncalves and Kilian (2004a, 2004b) used bootstrap methods to robustify inference in AR(p) and AR($\infty$) models with unknown conditional heteroskedasticity. These methods and results rely on the assumption that the unconditional variance of the errors is constant over time.

Unconditional homoskedasticity seems unrealistic in practice, especially in view of the recent emphasis in the empirical literature on structural change modeling for economic time series. To accommodate models with error variance changes, Wichern, Miller and Hsu (1976) investigated the AR(1) model when there are a finite number of step changes at unknown time points in the error variance. These authors used iterative maximum likelihood methods to locate the change points and then estimated the error variances in each block by averaging the squared least squares residuals. The resulting feasible weighted least squares estimator was shown to be efficient for the specific model considered. Alternative methods to detect step changes in the variances of time series models have been studied by Abraham and Wei (1984), Baufays and Rasson (1985), Tsay (1988), Park, Lee and Jeon (2000), Lee and Park (2001), de Pooter and van Dijk (2004) and Galeano and Peña (2004).

In practice, the pattern of variance changes over time, which may be discrete or continuous, is unknown to the econometrician and it seems desirable to use methods which can adapt to a wide range of possibilities. Accordingly, this paper seeks to develop an efficient estimation procedure which adapts to the presence of different and unknown forms of variance dynamics. We focus on the stable AR(p) model whose errors are assumed to be martingale differences multiplied by a time-varying scale factor which is a continuous or discontinuous function of time, thereby permitting a spectrum of variance dynamics that includes step changes and smooth transition functions of time.

Efficient estimation of linear models with heteroskedasticity under iid assumptions was earlier investigated by Carroll (1982) and Robinson (1987), and more recently by Kitamura, Tripathi and Ahn (2004) using empirical likelihood methods in a general conditional moment setting. In


the time series context, Harvey and Robinson (1988) considered a regression model with deterministically trending regressors, whose error is an AR(p) process scaled by a continuous function of time. Hansen (1995) considered the linear regression model, nesting autoregressive models as special cases, when the conditional variance of the model error is a function of a covariate that has the form of a nearly integrated stochastic process with no deterministic drift. In this case, the nearly integrated process is scaled by the factor $T^{-1/2}$, where $T$ is the sample size, to obtain a nondegenerate limit theory. For nearly integrated covariates with deterministic drift, the corresponding normalization would be $T^{-1}$ and Hansen's model would be analogous to the model considered here. Regression models in which the conditional variance of the error is an unscaled function of an integrated time series have recently been investigated by Chung and Park (2004) using Brownian local time limit methods developed in Park and Phillips (1999, 2001).

Recently, increasing attention has been paid to potential structural error variance changes in integrated process models. The effects of breaks in the innovation variance on unit root tests and stationarity tests were studied by Hamori and Tokihisa (1997), Kim, Leybourne and Newbold (2002), Busetti and Taylor (2003) and Cavaliere (2004a). A general framework to analyze the effect of time-varying variances on unit root tests was given in Cavaliere (2004b) and Cavaliere and Taylor (2004). By contrast, little work of this general nature has been done on stable autoregressions, most of the attention in the literature being concerned with the case of step changes in the error variance, as discussed above. The present paper therefore contributes by focusing on efficient estimation of the AR(p) model with time-varying variances of a general form that includes step changes as a special case. Robust inference in such models is dealt with in another paper (Phillips and Xu, 2005).

The remainder of the paper proceeds as follows. Section 2 introduces the model and assumptions and develops a limit theory for a class of weighted least squares estimators, including efficient (infeasible) generalized least squares (GLS). A range of examples shows that OLS can be extremely inefficient asymptotically in some cases while nearly optimal in others. Section 3 proposes a kernel-based estimator of the residual variance and shows the associated adaptive least squares estimator to be asymptotically efficient, in the sense of having the same limit distribution as the infeasible GLS estimator. Simulation experiments are conducted to assess the finite sample performance of the adaptive estimator in Section 4. Section 5 concludes. Proofs of the main


results are collected in two appendices.

2 The Model

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_t\}$ a sequence of increasing $\sigma$-fields of $\mathcal{F}$. Suppose the sample $\{Y_{-p+1}, \cdots, Y_0, Y_1, \cdots, Y_T\}$ from the following data generating process for the time series $Y_t$ is observed:

$$A(L)Y_t = u_t, \qquad (1)$$

$$u_t = \sigma_t \varepsilon_t, \qquad (2)$$

where $L$ is the lag operator and $A(L) = 1 - \beta_1 L - \beta_2 L^2 - \cdots - \beta_p L^p$, $\beta_p \neq 0$, is assumed to have all roots outside the unit circle; the lag order $p$ is finite and known. We assume $\{\sigma_t\}$ is a deterministic sequence and $\{\varepsilon_t\}$ is a martingale difference sequence with respect to $\{\mathcal{F}_t\}$, where $\mathcal{F}_t = \sigma(\varepsilon_s, s \leq t)$ is the $\sigma$-field generated by $\{\varepsilon_s, s \leq t\}$, with unit conditional variance, i.e. $E(\varepsilon_t^2 | \mathcal{F}_{t-1}) = 1$ a.s. for all $t$. The conditional variance of $\{u_t\}$ is then characterized fully by the multiplicative factor $\sigma_t$, i.e. $E(u_t^2 | \mathcal{F}_{t-1}) = \sigma_t^2$ a.s. This paper focuses on unconditional heteroskedasticity, and $\sigma_t^2$ is modeled as a general deterministic function, which rules out conditional dependence of $\sigma_t$ on the past events of $Y_t$.

The autoregressive coefficient vector $\beta = (\beta_1, \beta_2, \cdots, \beta_p)'$ is taken as the parameter of interest. Ordinary least squares (OLS) estimation gives

$$\hat{\beta} = \left( \sum_{t=1}^{T} X_{t-1} X_{t-1}' \right)^{-1} \left( \sum_{t=1}^{T} X_{t-1} Y_t \right),$$

where $X_{t-1} = (Y_{t-1}, Y_{t-2}, \cdots, Y_{t-p})'$. Throughout the rest of the paper we impose the following conditions.

Assumption

(i). The variance term $\sigma_t = g\left(\frac{t}{T}\right)$, where $g(\cdot)$ is a measurable and strictly positive function on the interval $[0, 1]$ such that $0 < C_1 < \inf_{r \in [0,1]} g(r) \leq \sup_{r \in [0,1]} g(r) < C_2 < \infty$ for some positive numbers $C_1$ and $C_2$, and $g(r)$ satisfies a Lipschitz condition except at a finite number of points of discontinuity;

(ii). $\{\varepsilon_t\}$ is strong mixing ($\alpha$-mixing) with $E(\varepsilon_t | \mathcal{F}_{t-1}) = 0$ and $E(\varepsilon_t^2 | \mathcal{F}_{t-1}) = 1$ a.s. for all $t$;

(iii). There exist $\mu > 1$ and $C > 0$ such that $\sup_t E\varepsilon_t^{4\mu} < C < \infty$.
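To make the setup concrete, here is a minimal simulation-and-estimation sketch of the DGP (1)-(2) and the OLS estimator. The function names and the particular choices of $g$ and $\beta$ are ours, for illustration only; they are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(beta, g, T, burn=100):
    """Simulate A(L)Y_t = u_t with u_t = g(t/T)*eps_t, eps_t iid N(0,1)."""
    p = len(beta)
    Y = np.zeros(T + burn)
    for t in range(p, T + burn):
        r = max(t - burn, 1) / T                 # relative position; clipped pre-sample
        Y[t] = beta @ Y[t - p: t][::-1] + g(r) * rng.standard_normal()
    return Y[burn:]

def ols_ar(Y, p):
    """OLS: beta_hat = (sum X X')^{-1}(sum X Y), X_{t-1} = (Y_{t-1},...,Y_{t-p})'."""
    T = len(Y) - p
    X = np.column_stack([Y[p - j - 1: p - j - 1 + T] for j in range(p)])
    beta_hat, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)
    return beta_hat

g = lambda r: np.sqrt(1.0 + 24.0 * (r >= 0.9))   # single late variance shift, delta = 5
Y = simulate_ar(np.array([0.5, -0.2]), g, T=500)
print(ols_ar(Y, p=2))
```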

Remarks. (1) In contrast to modeling $\sigma_t$ in a setting with finitely many parameters, Assumption (i) is nonparametric and $\sigma_t$ depends only on the relative position of the error in the sample. Similar formulations have been widely used in the econometric literature, for example by Robinson (1989, 1991) in the estimation of time-varying parameters of linear and nonlinear regressions, and by Harvey and Robinson (1988) in the efficient estimation of regressions with deterministically trending regressors. In recent work, Cavaliere (2004b) analyzes the effects of heteroskedasticity on unit root tests using this specification of the error variance.

(2) Under Assumption (i) the function $g$ is integrable on the interval $[0, 1]$ to any finite order. For brevity, we write $\int_0^1 g^m(r)\,dr$ as $\int g^m$ for any finite positive integer $m$. Formally, of course, the assumption induces a triangular array structure on the processes $u_t$ and $Y_t$, but we dispense with the additional affix $T$ in the arguments that follow.

Under the stated assumptions, the process $Y_t$ has the Wold representation

$$Y_t = \sum_{i=0}^{\infty} \alpha_i u_{t-i}, \qquad (3)$$

where the coefficients $\{\alpha_i\}$ satisfy

$$\sum_{i=0}^{\infty} |\alpha_i| < \infty. \qquad (4)$$

Under Assumptions (i)-(iii), $\hat{\beta}$ is asymptotically normal with limit distribution (Phillips and Xu, 2005):

$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \Lambda), \qquad (5)$$

where $\Lambda = \frac{\int g^4}{\left( \int g^2 \right)^2} \Gamma^{-1}$ and $\Gamma$ is the $p \times p$ positive definite matrix with $(i, j)$-th element $\gamma_{|i-j|}$, where $\gamma_k = \sum_{i=0}^{\infty} \alpha_i \alpha_{i+k} < \infty$ for $0 \leq k \leq p - 1$. The matrix $\Gamma^{-1}$ can be consistently estimated by

$$\hat{\Gamma}^{-1} = \left( \hat{\gamma}_{|i-j|} \right)_{i,j}^{-1}, \qquad (6)$$

where $\hat{\gamma}_0, \hat{\gamma}_1, \cdots, \hat{\gamma}_{p-1}$ are the first $p$ elements in the first column of the $(p^2 \times p^2)$ matrix $[I_{p^2} - F \otimes F]^{-1}$, where $\otimes$ denotes the Kronecker product and $F$ is the estimated companion matrix with first row $(\hat{\beta}_1, \hat{\beta}_2, \cdots, \hat{\beta}_p)$ and $I_{p-1}$ in the lower-left block:

$$F = \begin{pmatrix} \hat{\beta}_1 & \hat{\beta}_2 & \cdots & \hat{\beta}_p \\ I_{p-1} & & 0 \end{pmatrix}.$$

Result (5) is a consequence of the following more general theorem.

Theorem 1 Suppose $w_t^2$ is nonstochastic and satisfies (i) $0 < w_t^2 < C < \infty$ for all $t$ and some finite positive number $C > 0$; (ii) there exists a function $w(\cdot)$ on $[0, 1]$, continuous except for a finite number of discontinuities, such that $w_{[Tr]}^2 \to w^2(r)$ for any $r \in [0, 1]$ at which $w(\cdot)$ is continuous; (iii) $\int w^2 > 0$. Then, under Assumptions (i)-(iii), the weighted least squares (WLS) estimator

$$\hat{\beta}_{WLS} = \left( \sum_{t=1}^{T} w_t^2 X_{t-1} X_{t-1}' \right)^{-1} \left( \sum_{t=1}^{T} w_t^2 X_{t-1} Y_t \right) \qquad (7)$$

satisfies

$$\sqrt{T}(\hat{\beta}_{WLS} - \beta) \xrightarrow{d} N\left( 0,\ \frac{\int w^4 g^4}{\left( \int w^2 g^2 \right)^2} \Gamma^{-1} \right), \qquad (8)$$

as $T \to \infty$.

Naturally, the estimator with the smallest asymptotic variance matrix in the class (7) is achieved by generalized least squares (GLS),

$$\beta^* = \left( \sum_{t=1}^{T} X_{t-1} X_{t-1}' \sigma_t^{-2} \right)^{-1} \left( \sum_{t=1}^{T} X_{t-1} Y_t \sigma_t^{-2} \right), \qquad (9)$$

with weights $w_t^2 = \sigma_t^{-2}$ (the optimality of $\beta^*$ can also be justified by the theory of unbiased linear estimating equations, as in Godambe (1960) and Durbin (1960)), in which case

$$\sqrt{T}(\beta^* - \beta) \xrightarrow{d} N(0, \Gamma^{-1}), \qquad (10)$$

as $T \to \infty$.
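As an implementation note, the quantities in (6) are straightforward to compute. The sketch below (function names are ours) exploits the fact that the first column of $[I_{p^2} - F \otimes F]^{-1}$ is the vec of the companion-form covariance matrix under unit innovation variance, whose first $p$ entries are $\gamma_0, \cdots, \gamma_{p-1}$.

```python
import numpy as np

def gamma_inv_hat(beta_hat):
    """Estimate Gamma^{-1} as in (6) from fitted AR coefficients beta_hat."""
    p = len(beta_hat)
    F = np.zeros((p, p))
    F[0, :] = beta_hat                 # first row: estimated AR coefficients
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)     # shift block I_{p-1}
    A = np.eye(p * p) - np.kron(F, F)
    e1 = np.zeros(p * p)
    e1[0] = 1.0
    first_col = np.linalg.solve(A, e1)     # first column of [I - F kron F]^{-1}
    gam = first_col[:p]                    # gamma_hat_0, ..., gamma_hat_{p-1}
    Gamma = np.array([[gam[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.inv(Gamma)
```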

Remarks. Clearly, the asymptotic variance matrix of $\hat{\beta}$ differs from that of $\beta^*$ by the factor $\int g^4 / \left( \int g^2 \right)^2$, and since $\Gamma^{-1}$ is invariant to the function $g(\cdot)$, the inefficiency of the OLS estimator $\hat{\beta}$ depends crucially on this factor. The following examples¹ show that the factor can be large and OLS can be very inefficient in some cases, whereas in others the factor is close to unity and OLS is close to optimal.

Example 1 (A single abrupt shift in the innovation variance) Let $\tau \in [0, 1]$ and let $g(r)$ be the step function

$$g(r)^2 = \sigma_0^2 + (\sigma_1^2 - \sigma_0^2)\, 1\{r \geq \tau\}, \quad r \in [0, 1],$$

giving error variance $\sigma_0^2$ before the break point $[T\tau]$ and $\sigma_1^2$ afterwards. The steepness of the variance shift is measured by the ratio $\delta := \sigma_1/\sigma_0$ of the post-break and pre-break standard deviations. By (5) the asymptotic variance matrix of OLS is

$$\Lambda = \frac{\tau + (1 - \tau)\delta^4}{\left( \tau + (1 - \tau)\delta^2 \right)^2}\, \Gamma^{-1} := f_1^2(\tau, \delta)\, \Gamma^{-1},$$

where $f_1^2(\tau, \delta) = \left( \tau + (1 - \tau)\delta^2 \right)^{-2} \left( \tau + (1 - \tau)\delta^4 \right)$, which is a function of the break date $\tau$ and the shift magnitude $\delta$.

Figure 1 plots the value of $f_1(\tau, \delta)$ across $\delta \in [0.01, 100]$ for different values of $\tau$. The variance of the OLS estimator largely depends on where the break in the innovation variance occurs. For a negative ($\delta < 1$) shift, $f_1(\tau, \delta)$ increases steeply as $\delta$ decreases when $\tau = 0.1$, and is relatively steady and nearly unity when $\tau = 0.9$. The graph shows that OLS has large variance when the break occurs at the beginning ($\tau = 0.1$) but much smaller variance, in fact close to that of infeasible GLS, when the break is at the end ($\tau = 0.9$) of the sample. This difference is explained by the fact that when the break in variance occurs early in the sample, the large innovation variance in the early part of the sample affects all later observations via the autoregressive mechanism. By contrast, when the break occurs near the end of the sample, only later observations are directly affected, so the impact of a negative shift is small. This argument applies when there is a negative shift (a shift to a smaller variance at the end of the sample), and a reverse argument applies in

¹We follow the formulation of the variance function in Cavaliere (2004b) (Section 5, pages 271-283), who investigates heteroskedastic unit root testing.


the case of a positive shift. In fact, under a positive ($\delta > 1$) shift, OLS has large variance when the shift occurs late ($\tau = 0.9$) but small variance, more closely approximating infeasible GLS, when it occurs early ($\tau = 0.1$) in the sample. These phenomena are confirmed in the simulation experiments for the Gaussian AR(1) case reported in Section 4.

Example 2 (Trending variances in the innovations) Let $m$ be a positive integer and let $g(r)$ be

$$g(r)^2 = \sigma_0^2 + (\sigma_1^2 - \sigma_0^2)\, r^m, \quad r \in [0, 1],$$

giving error variance changing from $\sigma_0^2$ to $\sigma_1^2$ continuously according to an $m$-th order power function. Then

$$\Lambda = \frac{1 + 2(\delta^2 - 1)/(m + 1) + (\delta^2 - 1)^2/(2m + 1)}{\left[ 1 + (\delta^2 - 1)/(m + 1) \right]^2}\, \Gamma^{-1} := f_2^2(m, \delta)\, \Gamma^{-1},$$

where $f_2^2(m, \delta) = \left( 1 + \frac{\delta^2 - 1}{m + 1} \right)^{-2} \left( 1 + \frac{2(\delta^2 - 1)}{m + 1} + \frac{(\delta^2 - 1)^2}{2m + 1} \right)$ and $\delta = \sigma_1/\sigma_0$.

Figure 2 plots the value of $f_2(m, \delta)$ across $\delta \in [0.01, 100]$ for different values of $m$, so that both positive ($\delta > 1$) and negative ($\delta < 1$) trending heteroskedasticity is allowed. Compared with the case of a single abrupt shift in the innovation variance (Example 1), the multiplicative factor $f_2(m, \delta)$ changes more steadily for a given value of $m$, especially when $m$ is small (say, $m = 1$). For large $m$ (say, $m = 6$), OLS sustains substantial inefficiency when there is positive trending heteroskedasticity ($\delta > 1$).
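The two inefficiency factors are elementary to evaluate numerically; the small sketch below (ours, not from the paper) reproduces the qualitative behavior described above.

```python
import numpy as np

def f1(tau, delta):
    """f_1(tau, delta): sqrt of the OLS/GLS asymptotic variance ratio, Example 1."""
    return np.sqrt((tau + (1 - tau) * delta**4) / (tau + (1 - tau) * delta**2) ** 2)

def f2(m, delta):
    """f_2(m, delta): the analogous factor for the power trend of Example 2."""
    d = delta**2 - 1.0
    return np.sqrt((1 + 2 * d / (m + 1) + d**2 / (2 * m + 1)) / (1 + d / (m + 1)) ** 2)

print(f1(0.1, 0.2), f1(0.9, 0.2))   # early negative shift: large; late shift: near 1
print(f2(1, 5.0), f2(6, 5.0))       # inefficiency grows with m when delta > 1
```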

3 Adaptive Estimation

The GLS estimator $\beta^*$ in (9) is infeasible, since the true values of $\sigma_t$ are unknown. To produce a feasible procedure, we propose a kernel-based estimator $\tilde{\beta}$ employing nonparametric estimates of the residual variances and having the same asymptotic distribution as $\beta^*$. Let $K(z)$ be a kernel function defined on the real line such that $K(z)$ is continuous at all but a finite number of points, $0 \leq \sup_{-\infty < z < \infty} K(z) < C$ for some finite real number $C$, and $\int_{-\infty}^{\infty} K(z)\,dz = 1$. Let

$$\hat{u}_t = Y_t - X_{t-1}' \hat{\beta}$$


be the OLS residuals and define the weighted squared residuals

$$\hat{\sigma}_t^2 = \sum_{i=1}^{T} w_{ti}\, \hat{u}_i^2,$$

where

$$w_{ti} = \frac{K\left( \frac{t-i}{Tb} \right)}{\sum_{i=1}^{T} K\left( \frac{t-i}{Tb} \right)} := \frac{K_{ti}}{\sum_{i=1}^{T} K_{ti}}, \quad \text{with } K_{ti} := K\left( \frac{t-i}{Tb} \right),$$

and $b$ is the bandwidth parameter, dependent on $T$. The implementation of the estimator $\hat{\sigma}_t^2$ depends on the choice of kernel function $K$ and the bandwidth $b$. Consider the uniform kernel $K(z) = 0.5$ for $|z| \leq 1$, and $K(z) = 0$ otherwise. Then

$$\hat{\sigma}_t^2 = \frac{1}{2Tb} \sum_{|i-t| \leq Tb} \hat{u}_i^2$$

is the average of the $\hat{u}_i^2$ for $i$ falling into the bin with center $t$ and length $2Tb$. Kernel functions with infinite support are also possible, such as the Gaussian kernel, $K(z) = (2\pi)^{-1/2} \exp(-z^2/2)$ for $-\infty < z < \infty$. In this case, $w_{ti}$ assigns smaller weights to those $\hat{u}_i^2$'s whose $i$ is far from $t$.
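A minimal sketch of this variance estimator, assuming the Gaussian kernel (the function name is ours):

```python
import numpy as np

def kernel_variance(u_hat, b):
    """Kernel estimate of sigma_t^2: weighted average of squared residuals,
    with weights w_{ti} = K((t-i)/(Tb)) / sum_i K((t-i)/(Tb))."""
    T = len(u_hat)
    t = np.arange(1, T + 1)
    # Gaussian kernel; the uniform kernel K(z) = 0.5*1{|z|<=1} also works.
    z = (t[:, None] - t[None, :]) / (T * b)
    K = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    w = K / K.sum(axis=1, keepdims=True)       # rows sum to one
    return w @ u_hat**2                        # sigma_hat_t^2, t = 1..T
```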

Define the adaptive least squares (ALS) estimator of $\beta$ by

$$\tilde{\beta} = \left( \sum_{t=1}^{T} X_{t-1} X_{t-1}' \hat{\sigma}_t^{-2} \right)^{-1} \left( \sum_{t=1}^{T} X_{t-1} Y_t \hat{\sigma}_t^{-2} \right). \qquad (11)$$

We use the following assumptions, which modify and extend the earlier assumptions to facilitate the development of an asymptotic theory for $\tilde{\beta}$.

Assumption

(iii'). There exists some finite positive number $C$ such that $\sup_t E(\varepsilon_t^8) < C < \infty$;

(iv). $E(\varepsilon_t^3 | \mathcal{F}_{t-1}) = 0$ a.s.;

(v). As $T \to \infty$, $b + \frac{1}{Tb^2} \to 0$.
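Combining the pieces, here is a sketch of the two-step ALS estimator in (11), reusing the hypothetical kernel_variance() helper sketched above (all names are ours):

```python
import numpy as np

def als(Y, p, b):
    """Two-step ALS sketch for an AR(p): OLS, kernel variance estimates from
    the OLS residuals, then weighted LS with weights 1/sigma_hat_t^2 as in (11)."""
    T = len(Y) - p
    X = np.column_stack([Y[p - j - 1: p - j - 1 + T] for j in range(p)])
    y = Y[p:]
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_ols
    W = 1.0 / kernel_variance(u_hat, b)        # weights sigma_hat_t^{-2}
    XtWX = X.T @ (X * W[:, None])
    XtWy = X.T @ (y * W)
    return np.linalg.solve(XtWX, XtWy)         # beta_tilde
```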

Remarks. We replace Assumption (iii) by the stronger Assumption (iii'), which requires the existence of eighth moments of $\varepsilon_t$ for all $t$. This moment condition simplifies the proof of the main theorem and is, no doubt, stronger than necessary. Assumption (v) is a rate condition that requires $b \to 0$ at a rate slower than $T^{-1/2}$. Assumption (iv) is satisfied if $\varepsilon_t$ has a symmetric distribution conditional on the lagged observations, which is somewhat restrictive. This assumption could be avoided, and the main theorem below (Theorem 2) would still hold, if we replaced the estimator $\hat{\sigma}_t^2$ by the leave-one-out version

$$\hat{\hat{\sigma}}_t^2 = \sum_{i=1, i \neq t}^{T} w_{ti}\, \hat{u}_i^2. \qquad (12)$$

We note in the simulations that the performance of the ALS estimator based on (12) is dominated by that based on (11), so we do not pursue this estimator further here. The main result is as follows.

Theorem 2 Under Assumptions (i)-(v), with (iii') in place of (iii), as $T \to \infty$,

$$\sqrt{T}(\tilde{\beta} - \beta) = \sqrt{T}(\beta^* - \beta) + o_p(1) \xrightarrow{d} N(0, \Gamma^{-1}),$$

where $\Gamma^{-1}$ is estimated by (6).

Remarks. (1) In practice, the bandwidth parameter $b$ used in estimating the function $g$ can be chosen by cross-validation on the average squared error (see Wong, 1983). Let $\hat{\hat{\sigma}}_t^2$ be defined as in (12). The cross-validatory choice of $b$ is the value $b^*$ which minimizes

$$\widehat{CV}(b) = \frac{1}{T} \sum_{t=1}^{T} \left( \hat{u}_t^2 - \hat{\hat{\sigma}}_t^2 \right)^2.$$
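A direct grid-search implementation of this cross-validation rule (a sketch; the grid and the Gaussian kernel are our choices):

```python
import numpy as np

def cv_bandwidth(u_hat, grid):
    """Cross-validatory bandwidth: minimize CV(b) = mean (u_hat_t^2 - s2_loo_t)^2,
    where s2_loo is the leave-one-out kernel estimate (12)."""
    T = len(u_hat)
    t = np.arange(1, T + 1)
    best_b, best_cv = None, np.inf
    for b in grid:
        z = (t[:, None] - t[None, :]) / (T * b)
        K = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
        np.fill_diagonal(K, 0.0)                  # leave observation t out
        w = K / K.sum(axis=1, keepdims=True)
        s2_loo = w @ u_hat**2
        cv = np.mean((u_hat**2 - s2_loo) ** 2)
        if cv < best_cv:
            best_b, best_cv = b, cv
    return best_b

# e.g. b_star = cv_bandwidth(u_hat, np.linspace(0.02, 0.3, 15))
```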

(2) Alternative estimators include the one employed by Harvey and Robinson (1988), who deal with time series regression with trending regressors. Rather than estimating each $\sigma_t^2$ separately, they split the data into $K$ blocks and estimate $\sigma_t^2$ within a block by the average of the $\hat{u}_t^2$ in that block, so only $K$ distinct estimates are used. It can be shown² that, under the regularity assumptions, the resulting weighted least squares estimator of $\beta$ also has the same asymptotic distribution as $\tilde{\beta}$ if

$$\frac{1}{T_1} + \frac{T}{T_1^2} + \frac{T_2}{T} \to 0, \quad \text{as } T \to \infty,$$

where $T_1$ and $T_2$ are the minimum and maximum lengths of the $K$ blocks. Compared to our estimator, this estimator is cheaper to compute, but it does not efficiently integrate the information in the $\hat{u}_s^2$ for $s$ close to $t$ when estimating $\sigma_t^2$, especially when $t$ is close to the boundary of a block. Furthermore, unreported simulation results show that its performance is dominated by our kernel-based estimator in most cases.

²The proof is available from the authors upon request.
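For comparison, a sketch of the block-based alternative just described (equally spaced block boundaries are our simplifying choice):

```python
import numpy as np

def block_variance(u_hat, K):
    """Harvey-Robinson style block estimate: split the sample into K blocks
    and use the within-block average of squared residuals for every t
    in that block."""
    T = len(u_hat)
    s2 = np.empty(T)
    edges = np.linspace(0, T, K + 1).astype(int)
    for k in range(K):
        lo, hi = edges[k], edges[k + 1]
        s2[lo:hi] = np.mean(u_hat[lo:hi] ** 2)
    return s2
```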

4 Simulations

This section examines the finite sample performance of the ALS efficient procedure proposed in Section 3 using simulations of the heteroskedastic AR(1) model

$$Y_t = \beta Y_{t-1} + u_t, \quad u_t = \sigma_t \varepsilon_t, \quad \text{where } \sigma_t = g\left( \frac{t}{T} \right).$$

The values $\beta \in \{-0.5, 0.1, 0.9\}$ are used, and $\varepsilon_t \sim$ iid $N(0, 1)$. Our simulation design basically follows Cavaliere (2004b) and Cavaliere and Taylor (2004). The $g$ functions generating heteroskedasticity are taken as the step function and polynomial function used in Examples 1 and 2, viz.,

Model 1: $g(r)^2 = \sigma_0^2 + (\sigma_1^2 - \sigma_0^2)\, 1\{r \geq \tau\}$, $r \in [0, 1]$;

Model 2: $g(r)^2 = \sigma_0^2 + (\sigma_1^2 - \sigma_0^2)\, r^m$, $r \in [0, 1]$.

In Model 1, the break date $\tau$ is chosen from $\{0.1, 0.5, 0.9\}$ and the ratio of post-break to pre-break standard deviations $\delta = \sigma_1/\sigma_0$ is set to the values $\{0.2, 5\}$. In Model 2, the order of the polynomial function is taken from $\{1, 2, 6\}$, and $\delta \in \{0.2, 5\}$. Without loss of generality, we set $\sigma_0 = 1$. The estimates of $\beta$ are obtained with sample sizes $T = 60$ and $T = 200$, and the number of replications is set to 10,000. We report estimates of $\beta$ obtained by OLS, infeasible GLS and ALS. The label "ALS1" denotes the kernel-based ALS estimator (11) using a fixed bandwidth parameter $b$: $b = 0.1333$ when $T = 60$, and $b = 0.040$ when $T = 200$. The label "ALS2" refers to the ALS estimator with the bandwidth parameter chosen by the cross-validation procedure suggested in Section 3.
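A condensed sketch of this Monte Carlo design for Model 1, reusing the hypothetical simulate_ar(), ols_ar() and als() helpers sketched earlier (the design details here are an illustration, not the paper's exact code):

```python
import numpy as np

def rmse_ratio_model1(beta=0.1, tau=0.1, delta=0.2, T=200, b=0.040, reps=1000):
    """RMSE of OLS and ALS relative to infeasible GLS in Model 1."""
    g = lambda r: np.sqrt(1.0 + (delta**2 - 1.0) * (r >= tau))
    est = {"ols": [], "als": [], "gls": []}
    for _ in range(reps):
        Y = simulate_ar(np.array([beta]), g, T)
        est["ols"].append(ols_ar(Y, 1)[0])
        est["als"].append(als(Y, 1, b)[0])
        s2 = g(np.arange(1, T) / T) ** 2       # infeasible GLS: true 1/sigma_t^2
        x, y = Y[:-1], Y[1:]
        est["gls"].append(np.sum(x * y / s2) / np.sum(x * x / s2))
    rmse = {k: np.sqrt(np.mean((np.array(v) - beta) ** 2)) for k, v in est.items()}
    return rmse["ols"] / rmse["gls"], rmse["als"] / rmse["gls"]
```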


Table 1 reports the ratio of the root mean squared error (RMSE) of each estimator considered relative to the RMSE of GLS in Model 1. OLS is clearly inefficient, and the ALS estimator works reasonably well in all cases considered. The largest inefficiency of OLS is observed when an early shift in the innovation variance is negative, for instance $(\tau, \delta) = (0.1, 0.2)$, and when a late shift is positive, for instance $(\tau, \delta) = (0.9, 5)$. The former is explained by the fact that the large variance early in the sample affects all later observations, and the latter by the fact that the large variance in the last part of the sample means that the OLS estimator is more closely approximated by the terms involving the last few observations, thereby effectively reducing the sample size. In both these cases, substantial efficiency gains are achieved by the ALS estimator. In contrast, when there is a positive early shift or a negative late shift in the innovation variance, for instance $(\tau, \delta) = (0.1, 5)$ or $(0.9, 0.2)$, OLS works nearly as well as GLS, especially when the sample size is large, and the ALS estimator performs comparably well with OLS in those cases. The densities of the OLS and ALS estimators (after cross-validation) in the cases mentioned above are plotted in Figure 3. In panels (a) and (b), the significant improvement of the ALS estimator upon OLS can be seen, while in panels (c) and (d) we observe little difference between the two estimators. We also note that the cross-validation procedure for choosing the bandwidth of the ALS estimator works satisfactorily, though it seems to be dominated by the estimator using the specified fixed bandwidth. When the sample size is increased from $T = 60$ to $T = 200$, the ALS estimators have smaller RMSE ratios, while no such improvement is observed for OLS.

Table 2 reports the ratio of the RMSEs of the estimators considered relative to the RMSE of GLS in Model 2. The RMSE of the OLS estimator is more steady across the parameters of the heteroskedasticity function than in Model 1. The ALS estimator works remarkably well: its RMSE exceeds that of GLS by less than 10% in all cases considered, especially when the sample size is large. The densities of the OLS and ALS estimators (after cross-validation) when $m \in \{2, 6\}$ and $\delta \in \{0.2, 5\}$ are plotted in Figure 4.

Simulation results, along with those not reported here, also show that, in both models, the improvement of the ALS procedure relative to OLS is insensitive to the location of the true value of the autoregressive parameter $\beta$, as long as $|\beta| < 1$.

We also checked the homoskedastic case $\delta = 1$ and report the results in Table 1. OLS is equivalent to GLS when the errors are homoskedastic, so the ratio of the RMSE of OLS relative to


GLS is unity. We observe that in this case the RMSE ratio of the ALS estimator is also close to one, so ALS may be used satisfactorily even when the errors are homoskedastic. In summary, the kernel-based ALS estimator and the cross-validation procedure appear to perform very well, at least within the simulation design considered. Their advantages are clear: the procedure is convenient for practical use and has uniformly good performance over the parameter space.

5 Further Remarks

This paper considers efficient estimation of finite order autoregressive models under unconditional heteroskedasticity. Several extensions of the approach taken in the paper are possible. One of them is the efficient estimation of unconditionally heteroskedastic stable autoregressions of possibly infinite order. The issue there is whether our nonparametric feasible GLS estimator is still asymptotically efficient when the order of the autoregression, $p$, increases with the sample size $T$. We leave these topics for future research.

6 Appendix A: Proofs of the Theorems.

This section gives the proofs of Theorem 1 and Theorem 2.

Proof of Theorem 1. The WLS estimator $\hat{\beta}_{WLS}$ satisfies

$$\sqrt{T}(\hat{\beta}_{WLS} - \beta) = \left( \frac{1}{T} \sum_{t=1}^{T} w_t^2 X_{t-1} X_{t-1}' \right)^{-1} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} w_t^2 X_{t-1} u_t \right). \qquad (13)$$

It is easy to show that under Assumptions (i)-(iii), $\{w_t^2 Y_{t-h} Y_{t-h-k} - w_t^2 E(Y_{t-h} Y_{t-h-k})\}$ is mean-zero $L_1$-NED (near-epoch dependent) on $\{\varepsilon_t\}$ for $1 \leq h \leq p$, $0 \leq k \leq p - h$, and therefore an $L_1$-mixingale with respect to $\mathcal{F}_t$. It is uniformly integrable by applying Lemma 1(a) with $\mu = 2$. By the law of large numbers for $L_1$-mixingales (Andrews, 1988) we have

$$\frac{1}{T} \sum_t \left( w_t^2 Y_{t-h} Y_{t-h-k} - w_t^2 E(Y_{t-h} Y_{t-h-k}) \right) \xrightarrow{p} 0. \qquad (14)$$

Lemma A(ii) of Phillips and Xu (2005) shows that for every continuity point $r$ of $g(\cdot)$,

$$\lim_{T \to \infty} E\, Y_{[Tr]-h} Y_{[Tr]-h-k} = g^2(r)\, \gamma_k, \qquad (15)$$

where $[\cdot]$ denotes the integer part. So by (14),

$$\frac{1}{T} \sum_t w_t^2 Y_{t-h} Y_{t-h-k} = \frac{1}{T} \sum_t w_t^2 E(Y_{t-h} Y_{t-h-k}) + o_p(1) = \sum_{t=1}^{T} \int_{t/T}^{(t+1)/T} w_{[Tr]}^2\, E\, Y_{[Tr]-h} Y_{[Tr]-h-k}\, dr + o_p(1) = \int_{1/T}^{(T+1)/T} w_{[Tr]}^2\, E\, Y_{[Tr]-h} Y_{[Tr]-h-k}\, dr + o_p(1) \xrightarrow{p} \left( \int w^2 g^2 \right) \gamma_k. \qquad (16)$$

So we have $\frac{1}{T} \sum_{t=1}^{T} w_t^2 X_{t-1} X_{t-1}' \xrightarrow{p} \left( \int w^2 g^2 \right) \Gamma$. Next we show that

$$\frac{1}{T} \sum_{t=1}^{T} w_t^4 X_{t-1} X_{t-1}' u_t^2 \xrightarrow{p} \left( \int w^4 g^4 \right) \Gamma, \qquad (17)$$

which holds if $\frac{1}{T} \sum_{t=1}^{T} w_t^4 Y_{t-h} Y_{t-h-k} u_t^2 \xrightarrow{p} \left( \int w^4 g^4 \right) \gamma_k$ for $1 \leq h \leq p$, $0 \leq k \leq p - h$. Indeed, since $\{w_t^4 Y_{t-h} Y_{t-h-k} u_t^2 - w_t^4 \sigma_t^2 Y_{t-h} Y_{t-h-k}, \mathcal{F}_t\}$ is a martingale difference sequence,

$$\frac{1}{T} \sum_{t=1}^{T} w_t^4 Y_{t-h} Y_{t-h-k} u_t^2 = \frac{1}{T} \sum_{t=1}^{T} w_t^4 \sigma_t^2\, E\, Y_{t-h} Y_{t-h-k} + o_p(1) \xrightarrow{p} \left( \int w^4 g^4 \right) \gamma_k,$$

by arguments similar to (16). Furthermore, $E \| w_t^2 X_{t-1} u_t \|^4 < \infty$ by Lemma 1(b) with $\mu = 2$. By the central limit theorem for vector martingale differences,

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} w_t^2 X_{t-1} u_t \xrightarrow{d} N\left( 0, \left( \int w^4 g^4 \right) \Gamma \right).$$

Then Theorem 1 follows from (13).

Proof of Theorem 2. We follow closely the proof of the theorem in Robinson (1987), using some of his notation. First, note that $\tilde{\beta}$ satisfies

$$\sqrt{T}(\tilde{\beta} - \beta) = \left( \frac{1}{T} \sum_{t=1}^{T} X_{t-1} X_{t-1}' \hat{\sigma}_t^{-2} \right)^{-1} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t \hat{\sigma}_t^{-2} \right).$$

Define $a(f) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t f_t^{-2}$ and $A(f) = \frac{1}{T} \sum_{t=1}^{T} X_{t-1} X_{t-1}' f_t^{-2}$. Then we have $\sqrt{T}(\beta^* - \beta) = A(\sigma)^{-1} a(\sigma)$ and

$$\sqrt{T}(\tilde{\beta} - \beta) = A(\hat{\sigma})^{-1} a(\hat{\sigma}) = A(\sigma)^{-1} a(\sigma) + A(\hat{\sigma})^{-1} \left( a(\hat{\sigma}) - a(\sigma) \right) - A(\sigma)^{-1} \left( A(\hat{\sigma}) - A(\sigma) \right) A(\hat{\sigma})^{-1} a(\sigma).$$

We have $A(\sigma) \xrightarrow{p} \Gamma$, which is positive definite, and $a(\sigma) = O_p(1)$, which follows from Markov's inequality and

$$E\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Y_{t-h} u_t \sigma_t^{-2} \right)^2 = \frac{1}{T} \sum_{t=1}^{T} \sigma_t^{-4}\, E\, Y_{t-h}^2 u_t^2 \leq C\, \frac{1}{T} \sum_{t=1}^{T} E\, Y_{t-h}^2 u_t^2 < \infty,$$

by Lemma 1(b) and Assumption (i). Hence Theorem 2 follows if we prove

$$A(\hat{\sigma}) - A(\sigma) \xrightarrow{p} 0, \qquad a(\hat{\sigma}) - a(\sigma) \xrightarrow{p} 0. \qquad (18)$$

Define $\tilde{\sigma}_t^2 = \sum_{i=1}^{T} w_{ti} u_i^2$ and $\bar{\sigma}_t^2 = \sum_{i=1}^{T} w_{ti} \sigma_i^2$. Then (18) follows from the following six results, as in Robinson (1987): (a) $a(\hat{\sigma}) - a(\tilde{\sigma}) \xrightarrow{p} 0$; (b) $a(\tilde{\sigma}) - a(\bar{\sigma}) \xrightarrow{p} 0$; (c) $a(\bar{\sigma}) - a(\sigma) \xrightarrow{p} 0$; (d) $A(\hat{\sigma}) - A(\tilde{\sigma}) \xrightarrow{p} 0$; (e) $A(\tilde{\sigma}) - A(\bar{\sigma}) \xrightarrow{p} 0$; (f) $A(\bar{\sigma}) - A(\sigma) \xrightarrow{p} 0$. These are shown as follows.

(a) Since $a(\hat{\sigma}) - a(\tilde{\sigma}) = \frac{1}{\sqrt{T}} \sum_t X_{t-1} u_t \frac{\tilde{\sigma}_t^2 - \hat{\sigma}_t^2}{\hat{\sigma}_t^2 \tilde{\sigma}_t^2}$, we have, by the triangle and Cauchy-Schwarz inequalities,

$$\left\| a(\hat{\sigma}) - a(\tilde{\sigma}) \right\| \leq \left( \min_{1 \leq t \leq T} \tilde{\sigma}_t^2 \right)^{-1} \left( \min_{1 \leq t \leq T} \hat{\sigma}_t^2 \right)^{-1} \sum_{t=1}^{T} \frac{\| X_{t-1} u_t \|}{\sqrt{T}} \left| \tilde{\sigma}_t^2 - \hat{\sigma}_t^2 \right| \leq \left( \min_{1 \leq t \leq T} \tilde{\sigma}_t^2 \right)^{-1} \left( \min_{1 \leq t \leq T} \hat{\sigma}_t^2 \right)^{-1} \left( \frac{1}{T} \sum_{t=1}^{T} \| X_{t-1} u_t \|^2 \right)^{1/2} \left( \sum_{t=1}^{T} \left| \tilde{\sigma}_t^2 - \hat{\sigma}_t^2 \right|^2 \right)^{1/2} = O_p\left( \frac{1}{Tb} \right) \xrightarrow{p} 0,$$

by Lemmas 1, 7, 9 and 10.

(b) We write

$$a(\tilde{\sigma}) - a(\bar{\sigma}) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t \left( \tilde{\sigma}_t^{-2} - \bar{\sigma}_t^{-2} \right) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2) \bar{\sigma}_t^{-4} + \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2)^2 \tilde{\sigma}_t^{-2} \bar{\sigma}_t^{-4}, \qquad (19)$$

which holds since for any two nonzero real numbers $p$ and $q$ we have the identity $p^{-1} - q^{-1} = (q - p) q^{-2} + (q - p)^2 p^{-1} q^{-2}$. We show that the two terms of (19) vanish in probability. For the first term, we note that $\{X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2) \bar{\sigma}_t^{-4}, \mathcal{F}_t\}$ is a martingale difference sequence. Indeed, we have

$$E\left( X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2) \bar{\sigma}_t^{-4} \,\middle|\, \mathcal{F}_{t-1} \right) = \bar{\sigma}_t^{-2} E(X_{t-1} u_t | \mathcal{F}_{t-1}) - \bar{\sigma}_t^{-4} \left( E\left( X_{t-1} u_t \sum_{i=1, i \neq t}^{T} w_{ti} u_i^2 \,\middle|\, \mathcal{F}_{t-1} \right) + w_{tt}\, E(X_{t-1} u_t^3 | \mathcal{F}_{t-1}) \right). \qquad (20)$$

By Assumption (iv), $E(X_{t-1} u_t^3 | \mathcal{F}_{t-1}) = X_{t-1} E(u_t^3 | \mathcal{F}_{t-1}) = 0$. Further, we have $E\left( X_{t-1} u_t \sum_{i \neq t} w_{ti} u_i^2 \,\middle|\, \mathcal{F}_{t-1} \right) = 0$, which holds since for the terms with $i > t$,

$$E\left( X_{t-1} u_t u_i^2 \,\middle|\, \mathcal{F}_{t-1} \right) = X_{t-1} E\left( u_t\, E(u_i^2 | \mathcal{F}_{i-1}) \,\middle|\, \mathcal{F}_{t-1} \right) = \sigma_i^2\, X_{t-1} E(u_t | \mathcal{F}_{t-1}) = 0,$$

and for the terms with $i < t$, $E\left( X_{t-1} u_t u_i^2 \,\middle|\, \mathcal{F}_{t-1} \right) = X_{t-1} u_i^2\, E(u_t | \mathcal{F}_{t-1}) = 0$. Thus, by (20), $E\left( X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2) \bar{\sigma}_t^{-4} \,\middle|\, \mathcal{F}_{t-1} \right) = 0$. So the first term of (19) converges to zero in probability by the Markov inequality and

$$E\left\| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2) \bar{\sigma}_t^{-4} \right\|^2 \leq \frac{C}{T} \sum_{t=1}^{T} E \| X_{t-1} u_t \|^2 (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2)^2 \leq \frac{C}{T} \sum_{t=1}^{T} \left( E \| X_{t-1} u_t \|^4 \right)^{1/2} \left( E (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2)^4 \right)^{1/2} \leq \left( \max_t E (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2)^4 \right)^{1/2} \cdot \frac{C}{T} \sum_{t=1}^{T} \left( E \| X_{t-1} u_t \|^4 \right)^{1/2} = O\left( \frac{1}{Tb} \right) \to 0,$$

by Lemmas 1 and 5. For the second term of (19),

$$\left\| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_{t-1} u_t (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2)^2 \tilde{\sigma}_t^{-2} \bar{\sigma}_t^{-4} \right\| \leq C \left( \frac{1}{T} \sum_{t=1}^{T} \| X_{t-1} u_t \|^2 \right)^{1/2} \left( \sum_{t=1}^{T} (\bar{\sigma}_t^2 - \tilde{\sigma}_t^2)^4 \right)^{1/2} = O_p\left( \frac{1}{T^{1/2} b} \right) \xrightarrow{p} 0,$$

by Lemmas 1 and 5. This completes the proof of (b).

(c) First we note that

$$\sigma_t^2 \left( \bar{\sigma}_t^{-2} - \sigma_t^{-2} \right)^2 \leq \bar{\sigma}_t^{-4} \sigma_t^{-2} \left| \bar{\sigma}_t^2 + \sigma_t^2 \right| \cdot \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| \leq C \left| \bar{\sigma}_t^2 - \sigma_t^2 \right|. \qquad (21)$$

Since $\{X_{t-1} u_t\}$ is a martingale difference sequence, we get

$$E \left\| a(\bar{\sigma}) - a(\sigma) \right\|^2 = \frac{1}{T} \sum_{t=1}^{T} E\left( \| X_{t-1} \|^2 u_t^2 \right) \left( \bar{\sigma}_t^{-2} - \sigma_t^{-2} \right)^2 = \frac{1}{T} \sum_{t=1}^{T} E\left( \| X_{t-1} \|^2 E(u_t^2 | \mathcal{F}_{t-1}) \right) \left( \bar{\sigma}_t^{-2} - \sigma_t^{-2} \right)^2 = \frac{1}{T} \sum_{t=1}^{T} E \| X_{t-1} \|^2\, \sigma_t^2 \left( \bar{\sigma}_t^{-2} - \sigma_t^{-2} \right)^2 \leq \frac{C}{T} \sum_{t=1}^{T} E \| X_{t-1} \|^2 \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| \leq C \max_t E \| X_{t-1} \|^2 \cdot \frac{1}{T} \sum_{t=1}^{T} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| = o(1),$$

by Lemmas 1 and 11.

(d) It follows from

$$\left\| A(\hat{\sigma}) - A(\tilde{\sigma}) \right\| \leq \left( \min_{1 \leq t \leq T} \tilde{\sigma}_t^2 \right)^{-1} \left( \min_{1 \leq t \leq T} \hat{\sigma}_t^2 \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} \| X_{t-1} \|^2 \left| \tilde{\sigma}_t^2 - \hat{\sigma}_t^2 \right| \leq C \max_t \left| \tilde{\sigma}_t^2 - \hat{\sigma}_t^2 \right| \cdot \frac{1}{T} \sum_{t=1}^{T} \| X_{t-1} \|^2 = O_p\left( \frac{1}{\sqrt{Tb}} \right),$$

by Lemmas 1, 7, 8 and 9.

(e) This can be proved in the same way as (d) by employing Lemma 6.

(f) It follows from

$$\left\| A(\bar{\sigma}) - A(\sigma) \right\| \leq \left( \min_{1 \leq t \leq T} \bar{\sigma}_t^2 \right)^{-1} \left( \min_{1 \leq t \leq T} \sigma_t^2 \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} \| X_{t-1} \|^2 \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| \leq \left( \min_{1 \leq t \leq T} \bar{\sigma}_t^2 \right)^{-1} \left( \min_{1 \leq t \leq T} \sigma_t^2 \right)^{-1} \max_t \| X_{t-1} \|^2 \cdot \frac{1}{T} \sum_{t=1}^{T} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| = o_p(1),$$

by Lemmas 1, 4 and 11.


7 Appendix B: Supplementary Lemmas and Proofs.

This section states and proves some results (Lemmas 1-11) used in the proofs of the theorems.

Lemma 1 For $1 \leq \mu < \infty$ and $1 \leq h \leq p$: (a) $\sup_{1 \leq t \leq T} E\, Y_{t-h}^{2\mu} < \infty$ holds if $\sup_{1 \leq t \leq T} E \varepsilon_t^{2\mu} < \infty$; (b) $\sup_{1 \leq t \leq T} E \left( Y_{t-h} u_t \right)^{2\mu} < \infty$ holds if $\sup_{1 \leq t \leq T} E \varepsilon_t^{4\mu} < \infty$.

Proof. (a) Note that $Y_{t-h}^2 = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \alpha_k \alpha_l u_{t-h-k} u_{t-h-l}$ and

$$E \left| u_{t-h-k} u_{t-h-l} \right|^{\mu} \leq \left( E u_{t-h-k}^{2\mu}\, E u_{t-h-l}^{2\mu} \right)^{1/2} < \infty.$$

So we have

$$E\, Y_{t-h}^{2\mu} = \left\| Y_{t-h}^2 \right\|_{\mu}^{\mu} \leq \left( \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} |\alpha_k \alpha_l| \left\| u_{t-h-k} u_{t-h-l} \right\|_{\mu} \right)^{\mu} \leq C \left( \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} |\alpha_k \alpha_l| \right)^{\mu} = C \left( \sum_{k=0}^{\infty} |\alpha_k| \right)^{2\mu} < \infty.$$

(b) Since $Y_{t-h}^2 u_t^2 = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \alpha_k \alpha_l u_{t-h-k} u_{t-h-l} u_t^2$ and

$$E \left| u_{t-h-k} u_{t-h-l} u_t^2 \right|^{\mu} \leq \left( E u_{t-h-k}^{4\mu}\, E u_{t-h-l}^{4\mu} \right)^{1/4} \left( E u_t^{4\mu} \right)^{1/2} < \infty,$$

we have

$$E \left( Y_{t-h} u_t \right)^{2\mu} = \left\| Y_{t-h}^2 u_t^2 \right\|_{\mu}^{\mu} \leq \left( \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} |\alpha_k \alpha_l| \left\| u_{t-h-k} u_{t-h-l} u_t^2 \right\|_{\mu} \right)^{\mu} \leq C \left( \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} |\alpha_k \alpha_l| \right)^{\mu} < \infty.$$

Lemma 2 For any $1 \leq t \leq T$, $\frac{1}{Tb} \sum_{i=1}^{T} K_{ti} \to \int_{-\infty}^{\infty} K(z)\,dz = 1$, where $K_{ti} = K\left( \frac{t-i}{Tb} \right)$.

Proof. Let $t - i = [Tx]$, where $x$ is a real number with $|x| < 1$. Then, substituting $z = x/b$,

$$\frac{1}{Tb} \sum_{i=1}^{T} K_{ti} = \sum_{i=1}^{T} \int_{(t-i)/T}^{(t-i+1)/T} K\left( \frac{[Tx]}{Tb} \right) d\left( \frac{x}{b} \right) = \sum_{i=1}^{T} \int_{(t-i)/Tb}^{(t-i+1)/Tb} K\left( \frac{[Tbz]}{Tb} \right) dz = \int_{(t-T)/Tb}^{t/Tb} K\left( \frac{[Tbz]}{Tb} \right) dz \to \int_{-\infty}^{\infty} K(z)\,dz = 1.$$

Lemma 3 $\max_{t,i} w_{ti} = O\left( \frac{1}{Tb} \right)$.

Proof. It follows from $w_{ti} = \left( \frac{1}{Tb} \sum_{i=1}^{T} K_{ti} \right)^{-1} \frac{K_{ti}}{Tb}$ and Lemma 2.

Lemma 4 $\min_{1 \leq t \leq T} \bar{\sigma}_t^2 \geq c > 0$.

Proof. It follows from $\min_{1 \leq t \leq T} \bar{\sigma}_t^2 \geq \min_{1 \leq i \leq T} \sigma_i^2 \cdot \left( \sum_{i=1}^{T} w_{ti} \right) \geq \inf_{s \in [0,1]} g^2(s) \geq c > 0$.

Lemma 5 $\max_{1 \leq t \leq T} E \left| \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right|^4 = O\left( \frac{1}{(Tb)^2} \right)$.

Proof. We make use of Burkholder's inequality (BI) (cf. Shiryaev (1995), p. 499): for a martingale difference sequence $\xi_1, \cdots, \xi_T$ and $p > 1$, there exist constants $A_p$ and $B_p$ such that

$$A_p \left\| \left( \sum_{t=1}^{T} \xi_t^2 \right)^{1/2} \right\|_p \leq \left\| \sum_{t=1}^{T} \xi_t \right\|_p \leq B_p \left\| \left( \sum_{t=1}^{T} \xi_t^2 \right)^{1/2} \right\|_p.$$

Let $a_i = u_i^2 - \sigma_i^2$; then $\{a_i\}$ is a martingale difference sequence and $E a_i^4 < \infty$. Then

$$E \left( \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right)^4 = E \left( \sum_{i=1}^{T} w_{ti} a_i \right)^4 \overset{\text{BI}\,(p=4)}{\leq} C\, E \left( \sum_{i=1}^{T} w_{ti}^2 a_i^2 \right)^2 \leq \frac{C}{(Tb)^2}\, E \left( \sum_{i=1}^{T} w_{ti} a_i^2 \right)^2 \leq \frac{C}{(Tb)^2} \sum_{i=1}^{T} w_{ti}\, E a_i^4 = O\left( \frac{1}{(Tb)^2} \right),$$

where the second inequality uses Lemma 3, and the last inequality is by the Jensen inequality $f\left( \sum_{i=1}^{T} w_{ti} a_i^2 \right) \leq \sum_{i=1}^{T} w_{ti} f(a_i^2)$ with the convex function $f(x) = x^2$.

Lemma 6 $\max_t \left| \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right|^{\delta} = O_p\left( T^{-\delta/4} b^{-\delta/2} \right)$ for $\delta = 1, 2$.

Proof. It holds since

$$P\left( \max_t \left| \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right|^{\delta} > C\, T^{-\delta/4} b^{-\delta/2} \right) \leq \sum_{t=1}^{T} P\left( \left| \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right|^{\delta} > C\, T^{-\delta/4} b^{-\delta/2} \right) \leq C^{-4/\delta}\, T b^2 \sum_{t=1}^{T} E \left| \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right|^4 \leq C^{-4/\delta}\, T b^2 \cdot T \cdot O\left( \frac{1}{(Tb)^2} \right) = O(1).$$

Lemma 7 $\left( \min_{1 \leq t \leq T} \tilde{\sigma}_t^2 \right)^{-1} = O_p(1)$ as $T \to \infty$.

Proof. It follows from Lemma 4 and

$$\min_{1 \leq t \leq T} \bar{\sigma}_t^2 \leq \min_{1 \leq t \leq T} \tilde{\sigma}_t^2 + \max_t \left| \tilde{\sigma}_t^2 - \bar{\sigma}_t^2 \right| = \min_{1 \leq t \leq T} \tilde{\sigma}_t^2 + o_p(1).$$

Lemma 8 $\max_{1 \leq t \leq T} \left| \hat{\sigma}_t^2 - \tilde{\sigma}_t^2 \right| = O_p\left( \frac{1}{\sqrt{Tb}} \right)$.

Proof. Note that

$$\hat{\sigma}_t^2 - \tilde{\sigma}_t^2 = \sum_{i=1}^{T} w_{ti} \left( \hat{u}_i^2 - u_i^2 \right) = \sum_{i=1}^{T} w_{ti} \left( (\hat{\beta} - \beta)' X_{i-1} X_{i-1}' (\hat{\beta} - \beta) - 2 u_i X_{i-1}' (\hat{\beta} - \beta) \right),$$

and $\max_t \sum_{i=1}^{T} w_{ti}^2 \leq \max_{t,i} w_{ti} \cdot \sum_{i=1}^{T} w_{ti} = O\left( \frac{1}{Tb} \right)$. Thus

$$\max_{1 \leq t \leq T} \left| \hat{\sigma}_t^2 - \tilde{\sigma}_t^2 \right| \leq \max_{1 \leq t \leq T} \sum_{i=1}^{T} w_{ti} \left\| \hat{\beta} - \beta \right\|^2 \left\| X_{i-1} \right\|^2 + 2 \max_{1 \leq t \leq T} \sum_{i=1}^{T} w_{ti} \left\| u_i X_{i-1}' \right\| \left\| \hat{\beta} - \beta \right\| \leq \max_{t,i} w_{ti} \cdot \left\| \hat{\beta} - \beta \right\|^2 \sum_{i=1}^{T} \left\| X_{i-1} \right\|^2 + 2 \left\| \hat{\beta} - \beta \right\| \cdot \left( \max_t \sum_{i=1}^{T} w_{ti}^2 \right)^{1/2} \left( \sum_{i=1}^{T} \left\| u_i X_{i-1}' \right\|^2 \right)^{1/2} = O_p\left( \frac{1}{Tb} \right) + O_p\left( \frac{1}{\sqrt{Tb}} \right) = O_p\left( \frac{1}{\sqrt{Tb}} \right).$$

Lemma 9 $\left( \min_{1 \leq t \leq T} \hat{\sigma}_t^2 \right)^{-1} = O_p(1)$ as $T \to \infty$.

Proof. It follows from Lemma 7 and

$$\min_{1 \leq t \leq T} \tilde{\sigma}_t^2 \leq \min_{1 \leq t \leq T} \hat{\sigma}_t^2 + \max_t \left| \hat{\sigma}_t^2 - \tilde{\sigma}_t^2 \right| = \min_{1 \leq t \leq T} \hat{\sigma}_t^2 + o_p(1).$$

Lemma 10 $\sum_{t=1}^{T} \left( \hat{\sigma}_t^2 - \tilde{\sigma}_t^2 \right)^2 = O_p\left( \frac{1}{(Tb)^2} \right)$.

Proof. Since

$$\hat{\sigma}_t^2 - \tilde{\sigma}_t^2 = \sum_{i=1}^{T} w_{ti} \left( \hat{u}_i^2 - u_i^2 \right) = (\hat{\beta} - \beta)' \left( \sum_{i=1}^{T} w_{ti}^2 X_{i-1} X_{i-1}' \right) (\hat{\beta} - \beta) - 2 \left( \sum_{i=1}^{T} w_{ti}^2 u_i X_{i-1}' \right) (\hat{\beta} - \beta),$$

we have

$$\sum_{t=1}^{T} \left( \hat{\sigma}_t^2 - \tilde{\sigma}_t^2 \right)^2 \leq C \left\| \hat{\beta} - \beta \right\|^4 \sum_{t=1}^{T} \left( \sum_{i=1}^{T} w_{ti}^2 \left\| X_{i-1} \right\|^2 \right)^2 + C \left\| \hat{\beta} - \beta \right\|^2 \sum_{t=1}^{T} \left( \sum_{i=1}^{T} w_{ti}^2 \left\| u_i X_{i-1}' \right\| \right)^2. \qquad (22)$$

The first term of (22) is bounded by

$$C \left\| \hat{\beta} - \beta \right\|^4 \sum_{t=1}^{T} \left( \sup_i \left\| X_{i-1} \right\|^2 \cdot \max_{t,i} w_{ti} \cdot \sum_{i=1}^{T} w_{ti} \right)^2 = O_p\left( \frac{1}{T^3 b^2} \right),$$

and similarly the second term of (22) is $O_p\left( \frac{1}{T^2 b^2} \right)$. So Lemma 10 follows.

Lemma 11 $\frac{1}{T} \sum_{t=1}^{T} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| = o(1)$.

Proof. Without loss of generality, and given the sample size $T$, suppose $\frac{t_1}{T}, \cdots, \frac{t_D}{T}$ are the discontinuity points of $g$; then $D$ is a finite number (independent of $T$). For $M > 0$ define $\sum_i' = \sum_{|i-t| \leq MTb}$ and $\sum_i'' = \sum_{|i-t| > MTb}$. Then for $t \neq t_1, \cdots, t_D$,

$$\frac{1}{Tb} \sum_{i=1}^{T} K_{ti} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| \leq \frac{1}{Tb} \sum_{i=1}^{T} K_{ti} \left| \sigma_i^2 - \sigma_t^2 \right| \leq \frac{1}{Tb} \sum_i{}' K_{ti} \left| g^2\left( \frac{i}{T} \right) - g^2\left( \frac{t}{T} \right) \right| + \frac{C}{Tb} \sum_i{}'' K_{ti}. \qquad (23)$$

The first term of (23) is bounded by

$$C \max_{|i-t| \leq MTb} \left| \frac{i-t}{T} \right| \cdot \frac{1}{Tb} \sum_i{}' K_{ti} \leq C\, \frac{MTb}{T} \cdot \frac{1}{Tb} \cdot 2MTb \to 0$$

as $T \to \infty$ (since $b \to 0$). For the second term, similarly to the proof of Lemma 2 we can get

$$\frac{C}{Tb} \sum_i{}'' K_{ti} \to C \int_{|z| \geq M} K(z)\,dz.$$

Thus (23) converges to zero by letting $T \to \infty$ and then $M \to \infty$. In view of Lemma 2, we conclude that $\max_{t \neq t_1, \cdots, t_D} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| = o(1)$. Thus

$$\frac{1}{T} \sum_{t=1}^{T} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| = \frac{1}{T} \sum_{t \in \{t_1, \cdots, t_D\}} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| + \frac{1}{T} \sum_{t=1, t \neq t_1, \cdots, t_D}^{T} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| \leq \frac{D}{T} C + \frac{T - D}{T} \max_{t \neq t_1, \cdots, t_D} \left| \bar{\sigma}_t^2 - \sigma_t^2 \right| = o(1).$$
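As an informal numerical check of Lemmas 2 and 3 (our own illustration, not part of the paper), the normalized weights $w_{ti}$ sum to one by construction and the largest weight scales like $1/(Tb)$:

```python
import numpy as np

# Informal check of Lemmas 2-3 with the Gaussian kernel.
for T, b in [(200, 0.1), (800, 0.05)]:
    t = np.arange(1, T + 1)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / (T * b)) ** 2) / np.sqrt(2 * np.pi)
    w = K / K.sum(axis=1, keepdims=True)
    # rows sum to 1; max_t,i w_ti * (T*b) stays roughly constant as T grows
    print(T, b, w.sum(axis=1).min(), w.max() * T * b)
```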

8 Acknowledgement

Phillips gratefully acknowledges support from the Kelly Foundation at the Business School, University of Auckland, and from the NSF under Grant No. SES 04-142254. The authors gratefully acknowledge valuable comments from participants at the second meeting of the Singapore Econometric Study Group (SESG) at Singapore Management University.


References

[1] Abraham, B., Wei, W., 1984. Inference about the parameters of a time series model with changing variance. Metrika 31, 183-194.

[2] Andrews, D. W. K., 1988. Laws of large numbers for dependent non-identically distributed random variables. Econometric Theory 4, 458-467.

[3] Baufays, P., Rasson, J. P., 1985. Variance changes in autoregressive models. In: Time Series Analysis: Theory and Practice, 2nd ed. Springer, New York.

[4] Busetti, F., Taylor, A. M. R., 2003. Variance shifts, structural breaks, and stationarity tests. Journal of Business and Economic Statistics 21(4), 510-531.

[5] Carroll, R. J., 1982. Adapting for heteroskedasticity in linear models. Annals of Statistics 10, 1224-1233.

[6] Cavaliere, G., 2004a. Testing stationarity under a permanent variance shift. Economics Letters 82, 403-408.

[7] Cavaliere, G., 2004b. Unit root tests under time-varying variance shifts. Econometric Reviews 23, 259-292.

[8] Cavaliere, G., Taylor, A. M. R., 2004. Testing for unit roots in time series models with non-stationary volatility. Working paper, University of Birmingham.

[9] Chung, H., Park, J. Y., 2004. Nonstationary nonlinear heteroskedasticity in regression. Working paper, Rice University.

[10] de Pooter, M., van Dijk, D., 2004. Testing for changes in volatility in heteroskedastic time series - a further examination. Econometric Institute Report EI 2004-38.

[11] Durbin, J., 1960. Estimation of parameters in time-series regression models. Journal of the Royal Statistical Society, Series B 22, 139-153.

[12] Galeano, P., Peña, D., 2004. Variance changes detection in multivariate time series. Working paper 04-13, Statistics and Econometrics Series 05, Universidad Carlos III de Madrid, Spain.

[13] Goncalves, S., Kilian, L., 2004a. Bootstrapping autoregressions with conditional heteroskedasticity of unknown form. Journal of Econometrics 123, 89-120.

[14] Godambe, V. P., 1960. An optimum property of regular maximum likelihood equation. Annals of Mathematical Statistics 31, 1208-1211.

[15] Goncalves, S., Kilian, L., 2004b. Asymptotic and bootstrap inference for AR(∞) processes with conditional heteroskedasticity. CIRANO Working paper.

[16] Hansen, B. E., 1995. Regression with nonstationary volatility. Econometrica 63, 1113-1132.

[17] Harvey, A. C., Robinson, P. M., 1988. Efficient estimation of nonstationary time series regression. Journal of Time Series Analysis 9, 201-214.

[18] Hamori, S., Tokihisa, A., 1997. Testing for a unit root in the presence of a variance shift. Economics Letters 57, 245-253.

[19] Kim, T. H., Leybourne, S., Newbold, P., 2002. Unit root tests with a break in innovation variance. Journal of Econometrics 109, 365-387.

[20] Kitamura, Y., Tripathi, G., Ahn, H., 2004. Empirical likelihood-based inference in conditional moment restriction models. Econometrica 72, 1667-1714.

[21] Kuersteiner, G. M., 2001. Optimal instrumental variables estimation for ARMA models. Journal of Econometrics 104, 359-405.

[22] Kuersteiner, G. M., 2002. Efficient IV estimation for autoregressive models with conditional heteroskedasticity. Econometric Theory 18(3), 547-583.

[23] Lee, S., Park, S., 2001. The cusum of squares test for scale changes in infinite order moving average models. Scandinavian Journal of Statistics 28(4), 625-644.

[24] Park, J., Phillips, P. C. B., 1999. Asymptotics for nonlinear transformations of integrated time series. Econometric Theory 15, 269-298.

[25] Park, J., Phillips, P. C. B., 2001. Nonlinear regression with integrated time series. Econometrica 69, 117-161.

[26] Park, S., Lee, S., Jeon, J., 2000. The cusum of squares test for variance changes in infinite order autoregressive models. Journal of the Korean Statistical Society 29, 351-361.

[27] Phillips, P. C. B., Xu, K.-L., 2005. Inference in autoregression under heteroskedasticity. Mimeo, Yale University.

[28] Robinson, P. M., 1987. Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form. Econometrica 55, 875-891.

[29] Robinson, P. M., 1989. Nonparametric estimation of time-varying parameters. In: Hackl, P. (Ed.), Statistical Analysis and Forecasting of Economic Structural Change. North-Holland, Amsterdam, 253-264.

[30] Robinson, P. M., 1991. Time-varying nonlinear regression. In: Hackl, P., Westlund, A. H. (Eds.), Economic Structural Change. Springer-Verlag, Berlin, 179-190.

[31] Shiryaev, A. N., 1995. Probability. Springer-Verlag, New York.

[32] Tsay, R. S., 1988. Outliers, level shifts and variance changes in time series. Journal of Forecasting 7, 1-20.

[33] Wichern, D., Miller, R., Hsu, D., 1976. Changes of variance in first order autoregression time series models - with an application. Applied Statistics 25, 248-256.


Table 1: Ratios of RMSE relative to GLS in Model 1 (levels of RMSE reported for GLS, in brackets)

                    -------------- T = 60 --------------    -------------- T = 200 -------------
  β      τ    δ      OLS     ALS1    ALS2    GLS              OLS     ALS1    ALS2    GLS
 -0.5   0.1  0.2    2.1204  1.3246  1.3405  [.0967]          2.3136  1.1564  1.2091  [.0583]
             1      1.0000  1.0101  1.0130  [.1190]          1.0000  1.0030  1.0058  [.0569]
             5      1.0329  1.0595  1.0570  [.1156]          1.0446  1.0471  1.0450  [.0613]
        0.5  0.2    1.5621  1.2714  1.3052  [.0987]          1.4704  1.1026  1.1364  [.0562]
             5      1.3140  1.1129  1.1521  [.1147]          1.3639  1.0698  1.1177  [.0608]
        0.9  0.2    1.1820  1.1767  1.1811  [.1023]          1.0915  1.1185  1.1217  [.0564]
             5      2.0619  1.2267  1.2602  [.1198]          2.4099  1.1157  1.1857  [.0601]
  0.1   0.1  0.2    2.1256  1.3755  1.4076  [.1113]          2.3017  1.1224  1.1831  [.0648]
             1      1.0000  1.0197  1.0095  [.1296]          1.0000  1.0094  1.0051  [.0659]
             5      1.0324  1.0516  1.0424  [.1259]          1.0430  1.0415  1.0467  [.0732]
        0.5  0.2    1.4741  1.2324  1.2612  [.1150]          1.4650  1.1155  1.1547  [.0643]
             5      1.2784  1.1029  1.1326  [.1310]          1.3786  1.0504  1.0693  [.0698]
        0.9  0.2    1.1527  1.1665  1.1575  [.1161]          1.0970  1.1070  1.1183  [.0655]
             5      2.0710  1.2388  1.2740  [.1252]          2.2879  1.0839  1.1138  [.0690]
  0.9   0.1  0.2    1.9045  1.2771  1.3360  [.0624]          2.3275  1.1754  1.2246  [.0295]
             1      1.0000  1.0044  1.0081  [.0776]          1.0000  1.0041  1.0055  [.0365]
             5      1.0352  1.0441  1.0388  [.0797]          1.0516  1.0526  1.0540  [.0337]
        0.5  0.2    1.7187  1.2607  1.3005  [.0607]          1.6318  1.1637  1.2052  [.0279]
             5      1.5026  1.1886  1.2416  [.0794]          1.3985  1.0535  1.0773  [.0358]
        0.9  0.2    1.2994  1.2706  1.2591  [.0617]          1.1829  1.1299  1.1558  [.0289]
             5      2.2604  1.2429  1.3065  [.0695]          2.3215  1.0857  1.1646  [.0346]

Figure 1: The values of $f_1(\tau, \delta)$ (y-axis) in Example 1 across $\delta \in [0.01, 100]$ (x-axis) for different values of $\tau$: (a) $\tau = 0.1$; (b) $\tau = 0.5$; (c) $\tau = 0.9$. [Plot not reproduced.]


Table 2: Ratios of RMSE relative to GLS in Model 2 (levels of RMSE reported for GLS, in brackets)

                    -------------- T = 60 --------------    -------------- T = 200 -------------
  β      m    δ      OLS     ALS1    ALS2    GLS              OLS     ALS1    ALS2    GLS
 -0.5    1   0.2    1.1329  1.0269  1.0500  [.1151]          1.1344  1.0371  1.0370  [.0613]
             5      1.0869  1.0214  1.0471  [.1223]          1.1005  1.0245  1.0226  [.0610]
         2   0.2    1.1408  1.0739  1.0823  [.1105]          1.0781  1.0173  1.0243  [.0624]
             5      1.2286  1.0447  1.0696  [.1193]          1.2579  1.0336  1.0226  [.0587]
         6   0.2    1.0926  1.0861  1.0856  [.1095]          1.0474  1.0550  1.0400  [.0610]
             5      1.5504  1.0607  1.0994  [.1192]          1.5361  1.0251  1.0412  [.0639]
  0.1    1   0.2    1.1297  1.0406  1.0608  [.1260]          1.1149  1.0343  1.0362  [.0672]
             5      1.1428  1.0364  1.0573  [.1305]          1.1251  1.0295  1.0269  [.0743]
         2   0.2    1.0887  1.0465  1.0619  [.1257]          1.0875  1.0383  1.0389  [.0678]
             5      1.1949  1.0324  1.0597  [.1332]          1.2854  1.0294  1.0287  [.0695]
         6   0.2    1.0607  1.0573  1.0573  [.1248]          1.0376  1.0258  1.0223  [.0713]
             5      1.5141  1.0553  1.0930  [.1317]          1.6076  1.0442  1.0438  [.0689]
  0.9    1   0.2    1.1460  1.0378  1.0634  [.0708]          1.1552  1.0179  1.0278  [.0317]
             5      1.0962  1.0204  1.0398  [.0800]          1.1121  1.0247  1.0268  [.0352]
         2   0.2    1.1312  1.0501  1.0615  [.0702]          1.0603  1.0303  1.0249  [.0344]
             5      1.2342  1.0468  1.0820  [.0843]          1.2578  1.0194  1.0172  [.0340]
         6   0.2    1.1097  1.0933  1.0987  [.0716]          1.0302  1.0365  1.0301  [.0345]
             5      1.5187  1.0642  1.1141  [.0820]          1.6012  1.0278  1.0291  [.0339]

Figure 2: The values of $f_2(m, \delta)$ (y-axis) in Example 2 across $\delta \in [0.01, 100]$ (x-axis) for different values of $m$: (a) $m = 1$; (b) $m = 2$; (c) $m = 6$. [Plot not reproduced.]

Figure 3: Densities of the OLS (solid lines) and ALS2 (after cross-validation, dashed lines) estimators in Model 1, for $\beta \in \{0.1, 0.9\}$ and $T \in \{60, 200\}$: panels (a), (b) $\tau = 0.1$, $\delta = 0.2$; panels (c), (d) $\tau = 0.1$, $\delta = 5$. [Plot not reproduced.]

Figure 4: Densities of the OLS (solid lines) and ALS2 (after cross-validation, dashed lines) estimators in Model 2, for $\beta \in \{0.1, 0.9\}$ and $T \in \{60, 200\}$, with $m = 2$: panels (a), (b) $\delta = 0.2$; panels (c), (d) $\delta = 5$. [Plot not reproduced.]


Bootstrapping Autoregression under Nonstationary Volatility

Ke-Li Xu∗

Program in Applied Mathematics, Department of Mathematics, Yale University, 51 Prospect Street, 111, New Haven, Connecticut, USA 06511-8937. E-mail: keli.xu@yale.edu

January 15, 2006

Abstract

This paper studies a stable autoregressive model around a polynomial trend with nonstationary conditional variances. The formulation of the volatility process is general, including a wide variety of deterministic and stochastic nonstationary volatility specifications existing in the literature. The aim of the paper is three-fold. First, it develops a limit theory for least squares estimates and shows how nonstationary volatility affects consistency, convergence rates and the asymptotic distribution of slope and drift parameters in different ways. Second, it studies the conventional residual-based bootstrap (iid bootstrap) and wild bootstrap procedures in the presence of nonstationary volatility, and shows that the iid bootstrap is invalid in general while the wild bootstrap is first-order asymptotically valid only when the estimates are (mixed) Gaussian, neglecting potential leverage effects. Third, a CUSUM of squares test for variance constancy is shown to be consistent against a wide range of nonstationary volatility. Simulations reveal that, under the nonstationary volatility considered, more accurate coverage rates of confidence intervals are achieved by the wild bootstrap, when it is valid, than by those based on the asymptotic approximation or the iid bootstrap.

Keywords: Autoregressions, Bootstrap, CUSUM of squares test, Nonstationary volatility, Stochastic volatility, Wild bootstrap.

JEL Classification: C15, C22.

∗I gratefully acknowledge my advisor, Peter Phillips, for his guidance and suggestions. I also benefited from comments by Donald Andrews, Paramaraj Jegenathan and other seminar participants at Yale University.


1 INTRODUCTION

The problem of nonstationarity in the variances of time series models has been extensively investigated in the literature. The simplest approach to accounting for nonstationary variances is to assume the variances of the observations to be a continuous or discrete function of time. Wichern et al. (1976) proposed a two-stage method to detect step changes of variance in a Gaussian stable first-order autoregressive model and found strong evidence for variance changes in daily stock price data. Discrete variance changes in independent series or time series have also been studied by Abraham and Wei (1984), Baufays and Rasson (1985), Inclán (1993), Inclán and Tiao (1994), Park, Lee and Jeon (2000), de Pooter and van Dijk (2004) and Galeano and Peña (2004), and a general framework accommodating various kinds of deterministic volatility patterns was provided recently by Phillips and Xu (2006), allowing for structural changes, trending behavior, or seasonal effects in volatility across time.

Nonstationary variances may also be modeled by assuming the volatility (viz., conditional standard deviation) process to be a transformation of a latent integrated process. Hansen (1995) formulated the conditional variances of martingale difference (m.d.) innovations in a regression model as a nonlinear continuous function of a nearly integrated autoregressive stochastic volatility process. Chung and Park (2005) treated the volatility as an asymptotically homogeneous or integrable function of an unscaled integrated time series, as distinct from Hansen's framework, where a scale normalization is needed.

Parallel to this line of research, other authors have considered the effects of nonstationary volatility on regression models where the regressors are integrated, e.g. unit root tests (Hamori and Tokihisa, 1997, Boswijk, 2001, 2005, Kim et al., 2002, and Cavaliere, 2004b) and stationarity tests (Busetti and Taylor, 2003 and Cavaliere, 2004a), and some robust versions are still under investigation (see Cavaliere and Taylor, 2004 and Beare, 2005, for robust unit root tests, and Cavaliere and Taylor, 2005, for robust stationarity tests).

The current paper studies residual-based bootstrap inference in the context of a finite-order stable autoregressive process (AR(p)) around a polynomial trend in the presence of deterministic or stochastic nonstationary volatility. While conventional inference procedures based on asymptotic approximations can be poor in some cases even when the sample size is relatively large, as documented by simulations in Phillips and Xu (2006) for the case of certain step changes in volatility, it


is natural to turn to the bootstrap for improvements. To compare the asymptotics of the statistics of interest and their bootstrap analogs, in Section 3.1 we develop a limit theory under a general model of nonstationary volatility processes, nesting as special cases several formulations of nonstationary volatility existing in the literature. Two kinds of recursive residual-based bootstrap method, the conventional iid bootstrap and the wild bootstrap, are studied in Section 3.2. We show that the conventional iid bootstrap (see, e.g., Bose, 1988, 1990, Kreiss and Franke, 1992) breaks down under heterogeneous innovations, which is not surprising since it resamples the error terms as if they were independent and identically distributed. The wild bootstrap proposed by Wu (1986) and Liu (1988), which allows heteroskedasticity in the errors, seems to be the better choice when the volatility process is nonstationary. We establish that the validity of the wild bootstrap depends crucially on the (mixed) Gaussianity of the limit distribution of the least squares estimates. The recursive-design wild bootstrap in a zero mean stable autoregression was studied recently by Goncalves and Kilian (2004), who established validity conditions for the wild bootstrap under conditional heteroskedasticity allowing for a variety of GARCH-type models and stationary stochastic volatility models. However, they assumed the unconditional variance to be constant over time. The current paper extends their approach to account for nonstationary volatility, as well as conditional heteroskedasticity in the innovations. In Section 3.3, we show that a CUSUM test for variance constancy, proposed by Inclán and Tiao (1994), can be used to test against a wide range of continuous or discrete, deterministic or stochastic nonstationary volatility, not just discrete variance changes. Simulation results reported in Section 4 reveal that, under the nonstationary volatility considered, more accurate coverage rates of confidence intervals are achieved by the wild bootstrap, when it is valid, than by those based on the asymptotic approximation or the iid bootstrap. Section 5 concludes. Proofs of the main results are collected in an appendix.
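To fix ideas, here is a minimal sketch of the recursive-design wild bootstrap for a zero-mean stable AR(p). This is our own simplified illustration, with Rademacher multipliers and no trend; the paper's exact scheme may differ in details.

```python
import numpy as np

rng = np.random.default_rng(1)

def wild_bootstrap_ar(Y, p, B=999):
    """Recursive-design wild bootstrap: bootstrap errors are u_hat_t * eta_t
    with external iid eta_t (Rademacher here), preserving each residual's
    scale and hence any unconditional heteroskedasticity."""
    T = len(Y) - p
    X = np.column_stack([Y[p - j - 1: p - j - 1 + T] for j in range(p)])
    y = Y[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta
    boot = np.empty((B, p))
    for r in range(B):
        eta = rng.choice([-1.0, 1.0], size=T)          # external multipliers
        Ystar = np.concatenate([Y[:p], np.zeros(T)])   # keep original pre-sample values
        for t in range(T):
            lags = Ystar[t: t + p][::-1]               # (Y*_{t+p-1}, ..., Y*_t)
            Ystar[t + p] = lags @ beta + u_hat[t] * eta[t]
        Xs = np.column_stack([Ystar[p - j - 1: p - j - 1 + T] for j in range(p)])
        boot[r], *_ = np.linalg.lstsq(Xs, Ystar[p:], rcond=None)
    return boot    # B draws of beta*, e.g. for percentile confidence intervals
```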

2 THE MODEL AND ASSUMPTIONS

Consider the AR(p) process around a deterministic $m$-th order polynomial trend (with intercept)

$$Y_t = \sum_{i=0}^{m} \delta_i t^i + \sum_{j=1}^{p} \theta_j Y_{t-j} + \varepsilon_t. \qquad (1)$$


The slope coefficients $\theta_1, \cdots, \theta_p$ are such that all roots of $\theta(L) = 1 - \sum_{j=1}^{p} \theta_j L^j$ are outside the unit circle. Considering a polynomial time trend, not merely a linear trend, gives more flexibility in modeling deterministically trending behavior of economic time series, since any smooth nonlinear function of time admits a polynomial approximation. The error $\varepsilon_t$ in (1) is cast as a stochastic volatility model of general form

$$\varepsilon_t = \sigma_t \eta_t, \qquad (2)$$

where $\eta_t$ is iid with zero mean and unit variance, and $\sigma_t$ is a strictly positive stochastic volatility process to be specified. Let $\mathcal{F}_{t-1} = \sigma\{\eta_{t-j}, \sigma_{t+1-j}, j \geq 1\}$, the sigma-field generated by the random variables $\eta_{t-j}, \sigma_{t+1-j}, j \geq 1$, be the observable information set up to time $t - 1$, and assume $\eta_t$ is independent of $\mathcal{F}_{t-1}$. Then $\sigma_t^2 = Var(Y_t | \mathcal{F}_{t-1})$, viz., the conditional variance of $Y_t$ is characterized by the process $\sigma_t^2$. Since $\sigma_t^2$ may depend on past events, the formulation (2) also features conditional heteroskedasticity.

To focus on the nonstationarity of the volatility process $\sigma_t^2$, we make the following assumption. Throughout the paper, integrals are taken over the limits 0 to 1 if not specified. The symbol $[\cdot]$ denotes the integer part, and $\Rightarrow$ denotes weak convergence with respect to the uniform metric on $[0, 1]$. For brevity, stochastic integrals like $\int X(r)\,dY(r)$ and integrals with respect to Lebesgue measure like $\int X(r)\,dr$ are written as $\int X\,dY$ and $\int X$, respectively.

Assumption 1. (i) The sequences $\sigma_t$ and $\eta_t$ satisfy

$$\begin{pmatrix} n^{-1/2} \sum_{\tau=1}^{t} f_{\tau j} \eta_{\tau} \\ \gamma_n^{-1} \sigma_t \end{pmatrix} := \begin{pmatrix} n^{-1/2} \sum_{\tau=1}^{[ns]} f_{\tau j} \eta_{\tau} \\ \gamma_n^{-1} \sigma_{[ns]} \end{pmatrix} \Rightarrow \begin{pmatrix} W_j(s) \\ \sigma(s) \end{pmatrix}, \qquad (3)$$

as $n \to \infty$, for $j = 0, 1, \cdots$, and some nonrandom sequence $\gamma_n$, where

$$f_{\tau j} = \begin{cases} 1, & \text{if } j = 0, \\ \eta_{\tau - j}, & \text{if } j = 1, 2, \cdots, \end{cases}$$

and $\sigma(\cdot)$ is a nonnegative process with piecewise continuous sample paths and $E\left( \int \sigma^2 \right) < \infty$.

(ii) There exists a finite positive number $C$ such that for some $a > 0$, $\max_{1 \leq t \leq n} E \eta_t^{4+a} \leq C < \infty$ and $\max_{1 \leq t \leq n} E\left[ \gamma_n^{-1} \sigma_t \right]^{4+a} \leq C < \infty$, uniformly in $n$.


Assumption 1(i) is similar in spirit to Boswijk (2005)'s framework. Note that the joint convergence of $W_j(s)$ and $\sigma(s)$ is assumed in (3). Under independence of $\{\sigma_t\}$ and $\{\eta_t\}$, (3) reduces to the marginal convergence

$$\gamma_n^{-1} \sigma_t := \gamma_n^{-1} \sigma_{[ns]} \Rightarrow \sigma(s), \qquad (4)$$

as $n \to \infty$, since $n^{-1/2} \sum_{\tau=1}^{[ns]} f_{\tau j} \eta_{\tau} \Rightarrow W_j(s)$ holds by an invariance principle for m.d. sequences. Here the nonrandom sequence $\gamma_n$ is referred to as the asymptotic order of nonstationary volatility in what follows. The key assumption is (4), viz., the volatility process $\sigma_t$, redefined as a stochastic process on $[0, 1]$ after proper standardization, converges weakly as the sample size approaches infinity. It is natural to assume (4) in stochastic volatility models, which are proposed in the finance literature as discrete-time approximations of continuous-time models (see, e.g., Shephard, 2005). It also has the advantage that the volatility can be generated exactly from a continuous-time process. While we focus on nonstationarity here, stationary volatility is of course also allowed.

The weak convergence (4) is satisfied by a broad range of nonstationary volatility models studied in the literature. One simple case is the deterministic volatility studied by Phillips and Xu (2006). In this case, $\sigma_t$ is specified as a deterministic piecewise continuous function of time $t$:

$$\sigma_t = g\left( \frac{t}{n} \right), \qquad (5)$$

allowing for abrupt change points, trending or seasonal effects in the volatility process. In this case (4) is satisfied with $\gamma_n = 1$ and $\sigma(s) = g(s)$.

Another instance takes $\sigma_t$ to be a function of an autoregressive process with autoregressive roots local to unity, as studied in Hansen (1995). With more general assumptions on $\eta_t$, he modeled the volatility process as

$$\sigma_t = g\left( c_1 + c_2 n^{-1/2} S_t^0 \right),$$

where $\rho(L) S_t^0 = z_t$ with $\rho(L) = 1 - \rho_1 L$ and $\rho_1 = 1 - c_3/n$, the $c_i$ ($i = 1, \cdots, 4$) are constants, and $z_t$ is a conditionally homoskedastic m.d. sequence. By standard near unit root asymptotics, as $n \to \infty$,

$$\sigma_t \Rightarrow \sigma(s) = g\left( c_1 + c_2 U_{c_3}(s) \right),$$

slide-37
SLIDE 37

where U c3(·) is a diffusion process depending on c3.1 After suitable normalization (other than n−1/2), (4) is also satisfied when S0

t is specified as a nonstationary long memory process by letting ρ(L) =

(1 − L)1+d, where −1/2 < d < 1/2, with certain re-normalization. For all the cases aforementioned the scalar sequence γn = 1. (4) is also satisfied when σt is an asymptotically homogeneous function of an integrated time series: σt = g(zt). (6) The function g(·) is asymptotically homogeneous if there exists a locally Riemann integrable function ϑ(·), such that for large λ, g(λs) = ν(λ)ϑ(s) + o(ν(λ)) uniformly in s over any compact interval (see Park and Phillips, 1999, 2001). Assume the invariance principle holds for (1 − L)zt with limit Brownian motion W(s) : n−1/2z[ns] ⇒ W(s), and (4) is satisfied with γn = ν(√n) and σ(s) = ϑ(W(s)) : γ−1

n σt = γ−1 n g(√n · n−1/2z[ns]) = ϑ(n−1/2z[ns]) + o(1) ⇒ ϑ(W(s)) = σ(s).

The volatility process in (6) is termed nonstationary nonlinear heteroskedasticity (NNH) by Park (2002) and regression models with NNH are studied by Chung and Park (2005). Sometimes we impose the following stronger assumption. Assumption 2. σ(s) is independent of W j(s) for any j = 0, 1, · · · . Under the high-level Assumption 2 the least squares estimates of (1) are asymptotically mixed Gaussian (Theorem 1), and the wild bootstrap recovers the correct limit distribution with some further restrictions on the growth rate of γn (Section 3.2). Deterministic nonstationary volatility satisfies Assumption 2 trivially and, for nonstationary stochastic volatility, lower level assumptions

  • n ηt and latent variables may be explored, depending on particular (parameterized) model of

volatility process (see Hansen, 1995, Assumption 4).

1Phillips and Xu (2006) noticed that if St were nearly integrated with drift and of the form St = c4t+ S0 t , say,

then extending Hansen’s approach would lead to the following conditional variance function σt = g(c1 + c2T −1(c4t + S0

t )),

with a natural normalization factor of T −1 in place of T −1/2. For such extensions, Hansen’s model is asymptotically equivalent to (5) (with centain redefinition of g(·)).

6

slide-38
SLIDE 38

3 MAIN RESULTS

3.1 Limit Theory

This subsection develops an asymptotic distribution theory of the model in (1) under nonstationary volatility satisfying Assumption 1. Before proceeding, we introduce a useful transformation of the regressors in (1). Hamilton (1994) used this transformation in deriving the limit distribution of an AR(p) process around a linear trend, and we can build, with some extra effort, a more general framework that gives us the flexibility to analyze the AR(p) process around a polynomial trend in (1) (see also Ing, 2003). Rewrite Yt as Yt =

m

X

i=0

µiti +

p

X

j=1

θj e Yt−j + εt, (7) where e Yt−j = Yt−j − Pm

i=0 µi(t − j)i, 0 ≤ j ≤ p, and

µm = δm 1 − Pp

j=1 θj

, µm−i = δm−i + Pi

l=1(−1)l¡m−i+l l

¢ µ Pp

j=1 jlθj

¶ µm−i+l 1 − Pp

j=1 θj

, 1 ≤ i ≤ m. (8) Then the regressors in (7) involve a polynomial trend and lags of a zero mean stable AR(p) process e Yt, which follows θ(L)e Yt = εt. (9) In matrix form, regression equations (1) and (7) can be written as Yt = X0

tβ + εt = e

X0

tα + εt = X0 tG0(G0)−1β + εt,

(10) where Xt = (Yt−1, · · · , Yt−p, 1, t, · · · , tm)0, β = (θ1, · · · , θp,δ0, · · · , δm)0, e Xt = GXt = (e Yt−1, · · · , e Yt−p, 1, t, · · · , tm)0, α = (G0)−1β = (θ1, · · · , θp,µ0, · · · , µm)0, 7

slide-39
SLIDE 39

and the (m + p + 1) × (m + p + 1) matrix G

0 =

⎛ ⎜ ⎝ Ip H0

(m+1)×p

Im+1 ⎞ ⎟ ⎠ , with H = µ Hjk ¶ = − Pm−k+1

h=0

µk+h−1 ¡k+h−1

h

¢ (−j)h. Let b β and b α be the OLS estimators of regres- sions of Yt on Xt and e Xt(t = 1, · · · , n), respectively. It is easy to show b α = (G0)−1b β and thus we have b β − β = G0(b α − α). (11) Under the assumptions in Section 2, θ(L)−1 = P∞

j=0 ψjLj with ψ0 = 1, where the coefficients ψj

decline exponentially, satisfying P∞

j=0 j|ψj| < ∞. Following the notation in Goncalves and Killian

(2004), we let bj = (ψj−1, · · · , ψj−p) for j ≥ 1, with ψj = 0 for j < 0, then (e Yt−1, · · · , e Yt−p)0 = P∞

j=1 bjεt−j. Denote Ω = P∞ j=1 bjb0

  • j. The following Lemma 1 evaluates the asymptotic distribution
  • f b

α. Lemma 1. (i) Let the (p + m + 1) × (p + m + 1) diagonal matrix Υ = diag(√n, · · · , √n | {z }

p times

, √nγ−1

n , n3/2γ−1 n , · · · , n(2m+1)/2γ−1 n )

and the (m + 1) × (m + 1) (Hilbert) matrix R = µ Rij ¶ , where Rij = (i + j − 1)−1 for i, j = 1, · · · , m + 1. Then under Assumption 1, as n → ∞ Υ(b α − α) ⇒ Q−1 Z V (r)dBp+m+1(r), where Bp+m+1(r) is a vector Brownian motion with variance matrix Λ = ⎛ ⎜ ⎝ Ω ιι0 ⎞ ⎟ ⎠ with ι = ( 1, · · · , 1 | {z }

(m+1) times

)0, Q = ⎛ ⎜ ⎝ Ω R σ2 0p×(m+1) 0(m+1)×p R ⎞ ⎟ ⎠ and V (r) = diag(σ2(r)Ip, σ(r), rσ(r), · · · , rmσ(r)). 8

slide-40
SLIDE 40

(ii) Under Assumption 1 and 2, as n → ∞ Υ(b α − α) ⇒ MN p+m+1(0, Q−1( Z V ΛV )Q−1), where MN p+m+1(0, Q−1( R V ΛV )Q−1) means a (p + m + 1)-dimensional mixed Gaussian distri- bution with mixing variate Q−1( R V ΛV )Q−1. The following theorem is a consequence of Lemma 1 in view of (11). Theorem 1. (i) Under Assumption 1, as n → ∞ √n(b θ − θ) ⇒ Ω−1/2 R σ2 Z σ2dWp; (12) √n(b δi − δi) ⇒ g0

p+i+1Q−1

Z V dBp+m+1, if n−iγn → c, |c| < ∞, (13) n(2i+1)/2γ−1

n (b

δi − δi) ⇒ g

p+i+1Q−1

Z V dBp+m+1, if niγ−1

n

→ 0, (14) for i = 0, · · · , m, where g0

p+i+1 is the same as the (p+i+1)−th row of G0 except with (p+i+1)−th

element c rather than 1 (when c = 1, g0

p+i+1 is the (p + i + 1) − th row of G0), and g p+i+1 is the

same as (p+i+1)−th row of G0 except with first p elements 0’s rather than elements in (i+1)−th row of H0. (ii) Under Assumption 1 and 2, as n → ∞ √n(b θ − θ) ⇒ MN p µ 0,

R σ4 ( R σ2)2 Ω−1

¶ (15) √n(b δi − δi) ⇒ MN p+m+1 µ 0, g0

p+i+1Q−1

µR V ΛV ¶ Q−1gp+i+1 ¶ , if n−iγn → c, |c| < ∞, (16) n(2i+1)/2γ−1

n (b

δi − δi) ⇒ MN p+m+1 µ 0, g

p+i+1Q−1

µR V ΛV ¶ Q−1gp+i+1 ¶ , if niγ−1

n

→ 0, (17) for i = 0, · · · , m, where g0

p+i+1 and g p+i+1 are defined in part (i).

  • Remarks. (1). (Convergence rates) Theorem 1 shows that under Assumption 1, the estimate of

the slope parameter vector b θ is √n−consistent, unaffected by nonstationary volatility. In contrast, 9

slide-41
SLIDE 41

both the convergence rate and the limit distribution of the estimate of the drift term, b δi(0 ≤ i ≤ m), depend on the asymptotic order of nonstationary volatility γn. Specifically, b δi is √n−consistent if γn ∝ nk, k ∈ (−∞, i] and its limit distribution is that in (13); b δi is n(2i+1)/2γ−1

n −consistent (slower than √n) if

γn ∝ nk, k ∈ (i, i + 1 2) and its limit distribution is that in (14); b δi is inconsistent if γn ∝ nk, k ∈ [i + 1 2, ∞). All b δi(0 ≤ i ≤ m) are consistent if γn ∝ nk, k ∈ (−∞, 1/2). (18) These results are in contrast to the findings in Chung and Park (2005), who studied asymptotics of linear regressions with strictly exogenous regressors under nonstationary volatility, and their results do not apply in our context. (2). To see the possibility of non-Gaussian of OLS estimates in the limit under nonstationary volatility and the role Assumption 2 plays, we consider the following stronger assumption than Assumption 2: σt is independent of ηs for all t and s, (19) and zero mean AR(1) model: Yt = θYt−1 + εt, where εt is an m.d. sequence satisfying (2) and Assumption 1. By the Dambis-Dubins-Schwarz Theorem, the Ito integral R σ2dW1 (p = 1) in the limit distribution (12) equals in distribution a random time changed Brownian motion W1(R σ4), which is Gaussian conditional on R σ4 and unconditionally Gaussian only if Assumption 2 holds. On the other hand, when (19) is satisfied we can derive the asymptotical normality in (15) by a CLT for m.d. arrays, conditional on σt which were looked as if they were fixed, while when (19) is not satisfied, we have to rely on weak convergence to an Ito integral to obtain the results in (12).2 This

2Conventional limit theory of θ relies on applying a CLT to the scaled sum n−1/2 Pn t=1 εtεt−k for fixed k. For this

10

slide-42
SLIDE 42

also gives the intuition why the wild bootstrap procedure works asymptotically in our context only when Assumption 2 is satisfied. Under Assumption 1 and 2, the (mixed) Gaussianity of OLS estimates implies that we can construct tests of hypotheses on regression coefficients provided that b β is consistent. For instance, t−tests are asymptotically standard normal under the null hypothesis provided that (18) holds. Corollary 1. Let b εt = Yt − X0

tb

β = Yt − e X0

tb

α and Υ as defined in Lemma 1. Then under Assumption 1 and 2 and (18), as n → ∞, tb

θj = √n(b

θj − θj)/ q b Cjj ⇒ N(0, 1), for j = 1, · · · , p; (21) tb

δ0 =

⎧ ⎪ ⎨ ⎪ ⎩ √n(b δ0 − δ0)/ q b Cp+1,p+1 ⇒ N(0, 1), if γn → c, |c| < ∞; √nγ−1

n (b

δ0 − δ0)/ q b Cp+1,p+1 ⇒ N(0, 1), if γn → ∞ and

γn √n → 0;

tb

δi = √n(b

δi − δi)/ q b Cp+1+i,p+1+i ⇒ N(0, 1), for i = 1, · · · , m, (22) where b Cjj is the (j, j)−element of b C and b C = (Υ−1 Pn

t=1 XtX0 t)−1(Pn t=1 XtX0 tb

ε2

t)(Pn t=1 XtX0 tΥ−1)−1

is the robust estimator of the asymptotic covariance matrix of b β. Corollary 1 makes it clear that t−tests can be constructed using coefficient estimates and es- timated residuals from the regression (1), rather than the transformed regression (7). We do not require knowledge of the matrix G either. When constructing the t−test on the intercept δ0 we need to take into consideration the asymptotic order of the nonstationary volatility γn. But for the t−tests on the slope and drift (except the intercept) coefficients, we do not need the information of γn provided that b β is consistent.

to be true, one condition of the CLT for m.d. arrays is (see e.g., White, 1999) n−1

n

X

t=1

ε2

t ε2 t−k − n−1 n

X

t=1

Eε2

t ε2 t−k = op(1).

(20) In our case with γn = 1 it can shown that n−1 Pn

t=1 ε2 t ε2 t−k ⇒ R σ4 and the limit is generally a random variable.

Thus (20) is not satisfied.

11

slide-43
SLIDE 43

3.2 Residual-Based Bootstrap

In this subsection, we analyze the asymptotic properties of the residual-based bootstrap method for the autoregressive model in (1) in the presence of nonstationary volatility. Let b εt = Yt − Pm

i=0 b

δiti − Pp

j=1 b

θjYt−j denote the residuals of the OLS estimate of (1). The (recursive design) bootstrap data generating process (DGP) follows Y ∗

t = m

X

i=0

b δiti +

p

X

j=1

b θjY ∗

t−j + b

ε∗

t = X∗0 t b

β + b ε∗

t ,

(23) where X∗

t = (Y ∗ t−1, · · · , Y ∗ t−p, 1, t, · · · , tm)0 and b

ε∗

t (t = 1, · · · , n) is drawn randomly from certain

distribution utilizing the information in b εt(t = 1, · · · , n). The sequence {Y ∗

t } is initialized at t = 0

with Y ∗

0 = 0. The bootstrap estimator of β, b

β

∗ = (b

θ

∗ 1, · · · ,b

θ

∗ p,b

δ

∗ 0, · · · ,b

δ

∗ m)0 is obtained by applying

OLS to the bootstrap data {Y ∗

t : t = 1, · · · , n}. Bootstrap versions of the statistics of interest, such

as normalized sampling error or t−statistics, are constructed by using the bootstrap estimator b β

and seeing b β as the true value of the bootstrap DGP. The empirical distribution of these bootstrap statistics are used to approximate the distribution of the original statistics of interest. For the conventional residual-based bootstrap, for each t = 1, · · · , n, b ε∗

t is drawn independently

from the same distribution - the empirical distribution of {b εt : t = 1, · · · , n}. It can be shown that √n(b θ

∗ − b

θ)

dP ∗

⇒ N(0, Ω−1/2), in probability, (24) where

dP ∗

⇒ in probability means weak convergence under the bootstrap probability measure occurring in a set with probability converging to one. The conventional bootstrap treats εt as independent and identically distributed, and thus fails to capture the potential heteroskedasticity in the errors (see Goncalves and Kilian (2004) for an explanation why the conventional residual-based bootstrap fails to recover the asymptotic covariance matrix in the presence of conditional heteroskedasticity). The wild bootstrap proposed by Wu (1986) and Liu (1988) is designed to allow heteroskedasticity in linear regression models. In the time series context , it has been considered by Kreiss (1997), Hafner and Herwartz (2000) and Anatolyev (2004). For the residual-based wild bootstrap, b ε∗

t = b

εtvt with vt an iid zero mean sequence with unit variance such that E∗v4

t < ∞. Unlike the standard

bootstrap, the wild bootstrap draws each b ε∗

t from a different distribution with mean zero and variance

b ε2

  • t. The following theorem evaluates the limit behavior of wild bootstrap analogs of b

θ − θ and b δ − δ. 12

slide-44
SLIDE 44

Theorem 2. Under Assumption 1 and (18), as n → ∞, √n(b θ

∗ − b

θ)

dP ∗

⇒ MN p ⎛ ⎝0,

R σ4 ÃR

σ2

! 2 Ω−1

⎞ ⎠ , in probability; √n(b δ

∗ 0 − b

δ0) ⇒ MN p+m+1 µ 0, g0

p+1Q−1

µR V ΛV ¶ Q−1gp+1 ¶ , if γn → c, |c| < ∞; √nγ−1

n (b

δ

∗ 0 − b

δ0) ⇒ MN p+m+1 µ 0, g

p+1Q−1

µR V ΛV ¶ Q−1gp+1 ¶ , if γn → ∞ and γn √n → 0; √n(b δ

∗ i − b

δi)

dP ∗

⇒ MN p+m+1 µ 0, g0

p+i+1Q−1

µR V ΛV ¶ Q−1g0

p+i+1

¶ , in probability. Theorem 2 shows that the centered and standardized residual-based wild bootstrap estimators converge to a mixed Gaussian distribution, implicitly neglecting potential leverage effects. Theorem 2 still holds if we use Yt−j in place of Y ∗

t−j in the bootstrap DGP in (23).

We do not pursue this (fixed-design) bootstrap scheme further here, since we observe in the simulations that its finite- sample performance in constructing percentile-t confidence interval is inferior to the recursive-design bootstrap scheme in (23). Let b C∗

jj be the (j, j)−element of b

C∗, where b C∗ = (Υ−1 Pn

t=1 X∗ t X∗0 t )−1(Pn t=1 X∗ t X∗0 t e

ε∗2

t )(Pn t=1 X∗ t X∗0 t Υ−1)−1

and e ε∗

t = Y ∗ t − X∗0 t b

β

∗. The following corollary shows that under Assumption 1 and 2, the empirical

distribution of tb

θ

∗ j and tb

δ

∗ i can be used to approximate the distribution of tb

θj and tb δi in (21) and

(22) respectively. Corollary 2. Under Assumption 1 and (18), as n → ∞, tb

θ

∗ j = √n(b

θ

∗ j − b

θj)/ q b C∗

jj dP ∗

⇒ N(0, 1), for j = 1, · · · , p; tb

δ

∗ 0 =

⎧ ⎪ ⎨ ⎪ ⎩ √n(b δ

∗ 0 − b

δ0)/ q b C∗

p+1,p+1 dP ∗

⇒ N(0, 1), if γn → c, |c| < ∞; √nγ−1

n (b

δ

∗ 0 − b

δ0)/ q b C∗

p+1,p+1 dP ∗

⇒ N(0, 1), if γn → ∞ and

γn √n → 0;

tb

δ

∗ i = √n(b

δ

∗ i − b

δi)/ q b C∗

p+1+i,p+1+i dP ∗

⇒ N(0, 1), for i = 1, · · · , m, 13

slide-45
SLIDE 45

3.3 Testing Homoskedasticity

This subsection studies a CUSUM of squares test for variance constancy. The CUSUM of squares test introduced by Brown et al (1975) was used in Inclán and Tiao (1994) to test for step variance changes in iid Gaussian random sequences. Park, Lee and Jeon (2001) applied the CUSUM test to residuals of infinite order autoregressions. In our context, it will be shown that the CUSUM test can be used to test for a wide range of continuous or discrete, deterministic or stochastic nonstationary volatility, not just discrete variance changes. Let CK = PK

t=1 b

ε2

t and DK = CK/Cn − K/n for

K = 1, · · · , n. Define Ξ = max

1≤K≤n

√nb σ2|DK|/b , where b σ2 = n−1 Pn

t=1 b

ε2

t, b

2 = n−1 Pn

t=1 b

ε4

t − (b

σ2)2. Theorem 3. (i) Under the null hypothesis H0 : σt is a constant for all t, Ξ ⇒ sups∈[0,1] |W(s)|, where W(s) = W(s) − sW(1). (ii) Under the alternative hypothesis H1 : σt is not constant over t but satisfies Assumption 1, n−1/2Ξ ⇒ [(η − 1) R σ4]−1/2 sups∈[0,1] ¯ ¯ ¯ ¯ R s

0 σ2 − s R σ2

¯ ¯ ¯ ¯ , where η = Eη4. Theorem 3 derives the limit distribution of the CUSUM test Ξ under both the null and alternative

  • hypotheses. Evidently the test is consistent against a wide class of nonstationary volatility satisfying

Assumption 1, as illustrated in Section 2.

4 SIMULATIONS

In this section, we report a simulation experiment to assess the finite-sample accuracy of the wild bootstrap approximation when it is asymptotically valid. We focus on the zero mean AR(1) model with deterministic nonstationary volatility. The DGP is Yt = θYt−1 + εt, (25) 14

slide-46
SLIDE 46

Table 1: Actual type I errors of confidence intervals based on asymptotic distribution (ASY), conventional

recursive design bootstrap (CB), and wild bootstrap (WB), under deterministic nonstationary volatility in (26), for θ ∈ {0.5, 0.9}, τ ∈ {0.2, 0.5, 0.8}, δ ∈ {0.2, 5} and the sample size n = {50, 100, 250}.

θ = 0.5 θ = 0.9 ASY CB WB ASY CB WB τ δ L R L R L R L R L R L R n = 50 .2 .2 .223 .141 .205 .211 .097 .132 .335 .012 .254 .107 .106 .091 1 .067 .024 .045 .072 .051 .082 .198 .003 .100 .088 .108 .092 5 .104 .031 .078 .082 .082 .085 .197 .003 .109 .065 .113 .062 .5 .2 .150 .056 .131 .109 .094 .081 .268 .002 .168 .070 .102 .098 5 .155 .058 .134 .114 .093 .080 .228 .012 .138 .116 .085 .072 .8 .2 .107 .030 .078 .072 .075 .074 .235 .002 .123 .053 .107 .069 5 .227 .112 .216 .187 .114 .105 .291 .085 .226 .215 .105 .082 n = 100 .2 .2 .241 .113 .223 .174 .089 .083 .314 .037 .228 .114 .089 .082 1 .082 .025 .059 .052 .066 .058 .131 .016 .071 .071 .077 .078 5 .103 .037 .083 .063 .074 .057 .147 .018 .087 .078 .070 .077 .5 .2 .141 .082 .122 .114 .069 .073 .216 .017 .138 .099 .064 .093 5 .151 .075 .123 .105 .078 .077 .200 .032 .148 .111 .085 .088 .8 .2 .090 .048 .073 .073 .073 .067 .150 .012 .087 .083 .071 .087 5 .224 .122 .213 .166 .095 .075 .283 .088 .222 .183 .069 .080 n = 250 .2 .2 .205 .176 .198 .198 .044 .081 .277 .092 .231 .156 .054 .086 1 .053 .036 .044 .055 .045 .062 .099 .012 .062 .060 .058 .062 5 .089 .045 .065 .058 .057 .051 .132 .023 .083 .068 .063 .062 .5 .2 .138 .082 .120 .103 .058 .047 .160 .057 .105 .110 .049 .081 5 .122 .103 .113 .130 .050 .073 .195 .054 .134 .123 .056 .081 .8 .2 .080 .052 .061 .069 .047 .058 .119 .022 .080 .076 .061 .068 5 .218 .164 .204 .201 .078 .067 .307 .105 .253 .177 .078 .077 where εt = σtηt, θ ∈ {0.5, 0.9}, and ηt ∼ iidN(0, 1). We assume the nonstationarity of the volatility process is due to a step variance change of εt from σ2

0 to σ2 1 at time [nτ], viz., σt = σ(t/n), where

σ(r)2 = σ2

0 + (σ2 1 − σ2 0)1{r≥τ}, r ∈ [0, 1].

(26) The steepness of the variance change is characterized by the post-break and pre-break variance ratio δ2 = σ2

1/σ2

  • 0. We set τ ∈ {0.2, 0.5, 0.8}, δ ∈ {0.2, 5} and σ2

0 = 1 without loss of generality. We also

consider the homoskedasticity case when δ = 1. The sample size is set to be 50, 100 and 250. For the bootstrap method, the bootstrap replication is 999 and we choose vt ∼ iidN(0, 1) for the wild 15

slide-47
SLIDE 47

bootstrap. We are interested in the coverage accuracy of the nominal 90% central percentile−t confidence interval for θ. Three methods of constructing confidence intervals are compared and all of them are based on OLS estimation of (25) with an intercept. In Table 1 we report the actual type I errors of confidence intervals using the asymptotic distribution (ASY), conventional recursive design bootstrap (CB) and wild bootstrap (WB) procedures. For each method we report the percentage

  • f samples in which the true value of θ lay to the left (right) of estimated confidence interval in the

row labeled L (R). The accuracy of the asymptotic interval is poor, the type I error being very different from 0.05 in general and the actual coverage rate is much lower than the nominal 90%. Significant deviations are observed when a positive (δ > 1, e.g., δ = 5) variance shift occurs at the end of the sample (e.g., τ = 0.8), and a negative (δ < 1, e.g., δ = 0.2) variance shift occurs at the beginning of the sample (e.g., τ = 0.2). This is similar to the findings in Phillips and Xu (2006), who study the size distortion of the t−test based on the asymptotic critical value. Furthermore, the performance

  • f the asymptotic interval improves little as the sample size increases. The conventional bootstrap

interval has a similar actual type I error as the asymptotic interval, and in some cases it performs even worse. The wild bootstrap interval has less coverage distortion than the other two methods, although in some cases (e.g., the late positive shift and the early negative shift mentioned above) it is not satisfactory. As the sample size increases, while the performance of other two procedures improves little, the type I error of the wild bootstrap interval is fairly close to 0.05 in all cases considered, and appears to be robust to the position and the magnitude of the shift.

5 DISCUSSION

In this paper we establish a limit theory for estimation of the autoregressive model around a poly- nomial trend under general nonstationary volatility. The residual-based wild bootstrap procedure is shown to be consistent when the estimates are (mixed) Gaussian. Simulations reveal that the wild bootstrap possesses much better finite sample performance than inference procedures based on the asymptotic approximation. It is natural to ask whether there exists a bootstrap method for recovering the correct limit distribution in general, allowing for leverage effects. However, even if a consistent bootstrap method exists, it may not be expected to produce an asymptotic refinement, 16

slide-48
SLIDE 48

since in the presence of nonstationary volatility, the limit distribution is non-pivotal, depending on a nuisance parameter involving the volatility function σt. One possible solution for this complica- tion is to estimate the volatility function σt nonparametrically, as in Hansen (1995) and Xu and Phillips (2005), and then apply the standard residual-based bootstrap method to the appropriately re-weighted regression equation. One specification of a volatility process for which least squares estimators are mixed Gaussian in the limit without imposing the independent assumption (like Assumption 2) is studied by Chung and Park (2005). They assumed the volatility function to be an integrable transformation of an integrated process. Asymptotic validity of the wild bootstrap is to be expected in this case but needs verification. We assume in the current study that the lag length of the autoregression, p, is known. This assumption is unrealistic in practical applications, of course, where information criteria are typically used to choose p. However, the asymptotic properties of the various information criteria used to select the order of autoregressions under nonstationary volatility, are still unknown, although it seems likely that the framework in Ploberger and Phillips (1996) and Phillips (1996) is general enough to apply in this case. Another possible extension of the current work is to study the consistency of the (unrestricted

  • r restricted) residual-based wild bootstrap when it is applied to first-order or higher-order autore-

gressive processes with possible unit roots under nonstationary volatility. All of these topics are left for future research.

6 APPENDIX: PROOFS.

In this appendix we state the proofs of the main results presented in Section 3. For a d(d ≥ 1)−dimensional vector x = (x1, · · · , xd), its Euclidean norm is defined as |x| = (Pd

i=1 x2 i )1/2. For

a random variable x, its Lq(q ≥ 1)−norm is defined as kxkq = (E|x|q)1/q. The following lemma is useful throughout the section. Lemma A1 (Billingsley, 1968, Theorem 4.2, page 25; see also Brockwell and Davis, 1991, Proposition 6.3.9) If ξn and χnL (L, n ≥ 1) are random vector such that (i) χnL ⇒ χL as n → ∞ for each L ≥ 1, 17

slide-49
SLIDE 49

(ii) χL ⇒ χ as L → ∞, and (iii) lim

L→∞

µ lim sup

n→∞ P(|ξn − χnL| > )

¶ = 0 for every > 0, then ξn ⇒ χ as n → ∞. Proof of Lemma 1. (i) Let e Xt = (Z0

t, T 0 t)0 where Zt = (e

Yt−1, · · · , e Yt−p)0 and Tt = (1, t, · · · , tm)0. Since Υ(b α − α) = µ γ−2

n Υ−1 Pn t=1 e

Xt e X0

tΥ−1

¶−1 µ γ−2

n Υ−1 Pn t=1 e

Xtεt ¶ , Lemma 1 (i) follows from γ−2

n Υ−1 n

X

t=1

e Xt e X0

tΥ−1 ⇒ Q

(27) and γ−2

n Υ−1 Pn t=1 e

Xtεt ⇒ R V dBp+m+1, where the first is from (a) (b) (c) and the second from (d) and (e) in the following: (a) n−1γ−2

n

Pn

t=1 ZtZ0 t ⇒ Ω

R σ2; (b) for i = 0, · · · , m, n−(i+1)γ−1

n

Pn

t=1 tiZt p

→ 0; (c) Υ

−1TtT 0 tΥ −1 → R, where Υ = diag(n1/2, n3/2, · · · , n(2m+1)/2);

(d) n−1/2γ−2

n

Pn

t=1 Ztεt ⇒

R σ2dBp, where Bp(r) is a vector Brownian motion with the variance matrix Ω. (e) γ−1

n Υ −1 Pn t=1 Ttεt ⇒

R diag(σ, rσ, · · · , rmσ)dBm+1, where Bm+1(r) is a vector Brownian motion with the variance matrix ιι0, independent of Bp(r) in (d). We prove (a) first. Since Zt = P∞

j=1 bjεt−j, we define

ξn

def

= n−1γ−2

n n

X

t=1

ZtZ0

t = ∞

X

i,j=1

bib0

j

µ n−1γ−2

n

Pn

t=1 εt−iεt−j

¶ . We are to show ξn ⇒ Ω R σ2 using Lemma A1. First we establish that, for fixed i and j, n−1γ−2

n n

X

t=1

εt−iεt−j ⇒ Z σ2, if i = j,

p

→ 0 if i 6= j. (28) 18

slide-50
SLIDE 50

If i = j, E(n−1γ−2

n n

X

t=1

ε2

t−i − n−1 n

X

t=1

γ−2

n σ2 t−i)2

= n−2E[

n

X

t=1

γ−2

n σ2 t−i(η2 t−i − 1)]2

= n−2

n

X

t=1

[γ−4

n Eσ4 t−i]E(η2 t−i − 1)2 + 2n−2γ−4 n n

X

s<t,s,t=1

E[σ2

t−iσ2 s−i(η2 s−i − 1)]E(η2 t−i − 1)

| {z }

=0

= n−2

n

X

t=1

[γ−4

n Eσ4 t−i]E(η2 t−i − 1)2 = op(1),

by Assumption 1(ii). So by Markov inequality n−1γ−2

n

Pn

t=1 ε2 t−i = n−1 Pn t=1[γ−1 n σt−i]2 + op(1) ⇒

R σ2 by Assumption 1(i). If i 6= j, without loss of generality, assume i < j. Similarly it can be shown n−1γ−2

n n

X

t=1

εt−iεt−j = n−1γ−2

n n

X

t=1

Eεt−iεt−j + op(1) = n−1γ−2

n n

X

t=1

Eσt−iηt−iσt−jEηt−j | {z }

=0

+ op(1) = op(1). Thus (28) holds. Since Zt = P∞

j=1 bjεt−j, we write

n−1γ−2

n n

X

t=1

ZtZ0

t = ∞

X

i,j=1

bib0

j

µ n−1γ−2

n

Pn

t=1 εt−iεt−j

¶ For fixed L, let χnL

def

= PL

i,j=1 bib0 j

µ n−1γ−2

n

Pn

t=1 εt−iεt−j

¶ ⇒ PL

i=1 bib0 i

R σ2 def = χL as n → ∞ by (28). As L → ∞, χL ⇒ P∞

i=1 bib0 i

R σ2 = Ω R σ2 def = χ. Further, we have P(|ξn − χnL| > ) ≤ −1E ¯ ¯ ¯ ¯ P∞

i,j=L+1 bib0 j[n−1γ−2 n

Pn

t=1 εt−iεt−j]

¯ ¯ ¯ ¯ (29) ≤ −1

X

i,j=L+1

|bi| · |bj| µ n−1 Pn

t=1[γ−4 n Eσ2 t−iEσ2 t−j]1/2

¶ ≤ C−1

X

i,j=L+1

|bi| · |bj|, for any n and some constant C → 0, as L → ∞, where the exchange of order of expectation and infinite sum follows absolute summability ad hoc (see, e.g., Rao, 1973, page 111). Thus (a) follows from Lemma A1. 19

slide-51
SLIDE 51

(b). We write ξn

def

= n−(i+1)γ−1

n

Pn

t=1 tiZt = P∞ j=1 bj[n−1 Pn t=1(t/n)iγ−1 n εt−j].

Let χnL

def

= PL

j=1 bj[n−1 Pn t=1(t/n)iγ−1 n εt−j]. For fixed j, (t/n)iγ−1 n εt−j is a uniformly integrable L1−mixingale

since E[(t/n)iγ−1

n εt−j]2 ≤ γ−2 n Eε2 t−j = γ−2 n Eσ2 t−j < ∞. By the law of large numbers (Andrews,

1988) n−1 Pn

t=1(t/n)iγ−1 n εt−j p

→ 0, and so χnL

p

→ 0 as n → ∞. By the similar arguments in (29) we have P(|ξn − χnL| > ) → 0 as L → ∞. Thus (b) follows from Lemma A1. (c). It follows from the fact that n−(i+1) Pn

t=1 ti → (i + 1)−1 for i = 0, 1, · · · .

(d). For j = 0, 1, · · · , define Sj

t := n−1/2 Pt τ=1 fτjητ, where fτj is as in Assumption 1 (i). Let

ξn

def

= n−1/2γ−2

n

Pn

t=1 Ztεt = P∞ j=1 bjn−1/2 Pn t=1 γ−2 n σt−jσtηt−jηt and

χnL

def

=

L

X

j=1

bjn−1/2

n

X

t=1

γ−2

n σt−jσtηt−jηt = L

X

j=1

bj

n

X

t=1

γ−2

n σt−jσtdSj t .

For fixed j, Pn

t=1 γ−2 n σt−jσtdSj t ⇒

R σ2dW j as n → ∞, by Assumption A(i) and Theorem 2.1 of Hansen (1992). So for fixed L, χnL ⇒ PL

j=1 bj

R σ2dW j def = χL as n → ∞. Note that W j is indepen- dent over j. In particular, for j ≥ 1, ES0

t Sj t = n−1 P τ,s Eητ−jητηs, = n−1 P τ,s EητEητ−jηs = 0,

if τ > s; = n−1 P

τ,s Eητ−jEητηs = 0, if τ ≤ s. For i, j ≥ 1, without loss of generality, as-

sume i < j, ESi

tSj t = n−1 P τ,s Eητ−iητηs−jηs, = n−1 P τ,s EητEητ−iηs−jηs = 0, if τ > s; ==

n−1 P

τ,s Eητ−iEητηs−jηs = 0, if τ ≤ s.

Let Bj

p = bjW j and Bp = P∞ j=1 Bj

  • p. As L → ∞, χL ⇒ P∞

j=1 bj

R σ2dW j = P∞

j=1

R σ2dBj

p =

R σ2dBp

def

= χ, by dominated convergence theorem and noting |χL| ≤ P∞

j=1 |bj| ·

¯ ¯ ¯ ¯ R σ2dW j ¯ ¯ ¯ ¯ and E µR σ2dW j ¶2 = E( R σ4) < ∞. Also P(|ξn − χnL| > ) ≤ −2E ¯ ¯ ¯ ¯ P∞

j=L+1 bjn−1/2 Pn t=1 γ−2 n σt−jσtηt−jηt

¯ ¯ ¯ ¯

2

≤ −2

X

j=L+1

|bj|2 µ n−1 Pn

t=1 Eη4[γ−8 n Eσ4 t−jEσ4 t]1/2

¶ → 0, as L → ∞ by Assumption 1 (ii). Thus (d) follows from Lemma A1. (e) It follows from γ−1

n Υ −1 n

X

t=1

Ttεt =

n

X

t=1

√nΥ

−1Tt · γ−1 n σt · d(n−1/2 t

X

τ=1

ητ) 20

slide-52
SLIDE 52

⇒ Z (σ, rσ, · · · , rmσ)0dW 0 = Z diag(σ, rσ, · · · , rmσ)ιdW 0 = Z diag(σ, rσ, · · · , rmσ)dBm+1 by Assumption 1(i) and Theorem 2.1 in Hansen (1992). Independence of Bm+1(r) in (e) and Bp(r) in (d) follows from the independence of W 0(s) and W j(s) for j ≥ 1. So the proof of Lemma 1 (i) is complete. ¥ Proof of Corollary 1. If we establish (Υ−1

n

X

t=1

e Xt e X0

t)−1( n

X

t=1

e Xt e X0

tb

ε2

t)( n

X

t=1

e Xt e X0

tΥ−1)−1 ⇒ Q−1(

Z V ΛV )Q−1, (30) then using Xt = G−1 e Xt we have b C = (Υ−1

n

X

t=1

XtX0

t)−1( n

X

t=1

XtX0

tb

ε2

t)( n

X

t=1

XtX0

tΥ−1)−1

= (Υ−1G−1

n

X

t=1

e Xt e X0

tG0−1Υ−1)−1(Υ−1G−1 n

X

t=1

e Xt e X0

tb

ε2

tG0−1Υ−1)(Υ−1G−1 n

X

t=1

e Xt e X0

tG0−1Υ−1)−1

= ΥG0 µ Pn

t=1 e

Xt e X0

t

¶−1 µ Pn

t=1 e

Xt e X0

tb

ε2

t

¶ µ Pn

t=1 e

Xt e X0

t

¶−1 GΥ. Thus b Cjj = Υjg0

j

µ Pn

t=1 e

Xt e X0

t

¶−1 µ Pn

t=1 e

Xt e X0

tb

ε2

t

¶ µ Pn

t=1 e

Xt e X0

t

¶−1 Υjgj = g0

j(ΥjΥ−1)

µ Υ−1 Pn

t=1 e

Xt e X0

t

¶−1 µ Pn

t=1 e

Xt e X0

tb

ε2

t

¶ µ Pn

t=1 e

Xt e X0

tΥ−1

¶−1 (ΥjΥ−1)gj For 1 ≤ j ≤ p, √n(b θj − θj)/ q [Q−1( R V ΛV )Q−1]jj ⇒ N(0, 1) and b Cjj ⇒ [Q−1( R V ΛV )Q−1]jj. For 1 ≤ i ≤ m, if n−1/2γn → 0, √n(b δi − δi)/ s g0

p+i+1Q−1

µR V ΛV ¶ Q−1gp+i+1 ⇒ N(0, 1) and b Cp+1+i,p+1+i ⇒ g0

p+i+1Q−1

µR V ΛV ¶ Q−1gp+i+1 with c = 0. For i = 0, if γn → c, √n(b δ0 − δ0)/ s g0

p+1Q−1

µR V ΛV ¶ Q−1gp+1 ⇒ N(0, 1) and b Cp+1,p+1 ⇒ g0

p+1Q−1

µR V ΛV ¶ Q−1gp+1; if γn → ∞ and n−1/2γn → 0, √nγ−1

n (b

δ0−δ0)/ s [Q−1 µR V ΛV ¶ Q−1]p+1,p+1 ⇒ N(0, 1) and b Cp+1,p+1 ⇒ 21

slide-53
SLIDE 53

[Q−1 µR V ΛV ¶ Q−1]p+1,p+1. Then Corollary 1 holds. Now we prove (30), which follows from γ−4

n Υ−1 n

X

t=1

e Xt e X0

tb

ε2

tΥ−1 ⇒

Z V ΛV. (31) First we have γ−4

n Υ−1 n

X

t=1

e Xt e X0

tε2 tΥ−1 ⇒

Z V ΛV, (32) provided that the following three results hold: (a). γ−4

n n−1 Pn t=1 ZtZ0 tε2 t ⇒ Ω

R σ4; (b). γ−3

n n−(i+1) Pn t=1 tiZtε2 t ⇒ 0, for i = 0, 1, · · · m;

(c). γ−2

n Υ −1 Pn t=1 TtT 0 tε2 tΥ −1 ⇒

R (σ, rσ, · · · , rmσ)0(σ, rσ, · · · , rmσ); We prove (a) first. Note that γ−4

n n−1 Pn t=1 ZtZ0 tε2 t = P∞ i,j=1 bib0 j 1 n

Pn

t=1 γ−4 n εt−iεt−jε2

  • t. For

fixed L, PL

i,j=1 bib0 j 1 n

Pn

t=1 γ−4 n εt−iεt−jε2 t ⇒ PL i=1 bib0 i

R σ4 as n → ∞. Then (a) holds by Lemma A1 provided that P( ¯ ¯ ¯ ¯ P∞

i,j=L+1 bib0 j 1 n

Pn

t=1 γ−4 n εt−iεt−jε2 t

¯ ¯ ¯ ¯ > ) → 0, which follows from arguments similar to those in (29) since max1≤t≤n Eη4

t, max1≤t≤n E[γ−1 n σt]4 ≤

C < ∞ (Assumption 1 (ii)). (b). Note that for fixed j ≥ 1, γ−3

n n−(i+1) Pn t=1 tiεt−jε2 t ⇒ 0. Indeed, γ−3 n (t/n)i(εt−jε2 t −εt−jσ2 t)

is a uniformly integrable m.d. sequence. By a WLLN for m.d. sequences, γ−3

n n−(i+1) n

X

t=1

tiεt−jε2

t

= n−1

n

X

t=1

(t/n)iγ−3

n εt−jσ2 t + op(1)

= n−1/2

n

X

t=1

(t/n)iγ−3

n σt−jσ2 td

µ n−1/2 Pt

τ=1 ητ−j

¶ + op(1) ⇒ n−1/2 Z riσ3dW 0 + op(1) = op(1). Thus γ−3

n n−(i+1) Pn t=1 tiZtε2 t = P∞ j=1 bjn−1 Pn t=1(t/n)iγ−3 n εt−jε2 t ⇒ 0 by Lemma A1

provided that P( ¯ ¯ ¯ ¯ P∞

j=L+1 bj 1 n

Pn

t=1(t/n)iγ−3 n εt−jε2 t

¯ ¯ ¯ ¯ > ) → 0, as L → ∞, which follows from the similar arguments in (29). 22

slide-54
SLIDE 54

(c). By a WLLN for m.d. sequences γ−2

n Υ −1 n

X

t=1

TtT 0

tε2 tΥ −1

= γ−2

n Υ −1 n

X

t=1

TtT 0

tσ2 tΥ −1 + op(1)

=

n

X

t=1

(γ−1

n Υ −1Ttσt)(γ−1 n Υ −1Ttσt)

0 + op(1)

⇒ Z (σ, rσ, · · · , rmσ)0(σ, rσ, · · · , rmσ). So (c) holds. Combining the results (a), (b) and (c) gives (32). Then γ−4

n Υ−1 n

X

t=1

e Xt e X0

tb

ε2

tΥ−1

= γ−4

n Υ−1 n

X

t=1

e Xt e X0

tε2 tΥ−1 + γ−2 n Υ−1 n

X

t=1

e Xt e X0

tγ−2 n (b

ε2

t − ε2 t)Υ−1

= γ−4

n Υ−1 n

X

t=1

e Xt e X0

tε2 tΥ−1 + op(1) ⇒

Z V ΛV, where the first equality follows from (27) and the fact that γ−2

n (b

ε2

t − ε2 t) = Op(n−1/2), which holds

since b εt = εt − X0

t(b

β − β) = εt − e X0

t(b

α − α) = εt + Op(γn/√n). (33) By (27) again, (30) holds and the proof of Corollary 1 is complete. Proof of Theorem 2. Define e Y ∗

t−j = Y ∗ t−j − Pm i=0 b

µi(t − j)i, 0 ≤ j ≤ p. Let b µm and b µm−i(1 ≤ i ≤ m) be the right-hand side of (8) except using δi, θi and µi instead of b δi,b θi and b µi. So (23) can rewritten as Y ∗

t = m

X

i=0

b µiti +

p

X

j=1

b θj e Y ∗

t−j + b

ε∗

t = e

X∗0

t b

α + b ε∗

t ,

where e X∗

t = (e

Y ∗

t−1, · · · , e

Y ∗

t−p, 1, t, · · · , tm)0. Let b

α∗ be the OLS coefficient estimates of regressing Y ∗

t

  • n 1, t, · · · , tm, e

Y ∗

t−1, · · · , e

Y ∗

t−p. To prove Theorem 2, it suffices to show

Υ(b α∗ − b α) = (γ−2

n Υ−1 n

X

t=1

e X∗

t e

X∗0

t Υ−1)−1(γ−2 n Υ−1 n

X

t=1

e X∗

t b

ε∗

t ) dP ∗

⇒ MN p+m+1(0, Q−1( Z V ΛV )Q−1), in probability, which follows from results 23

slide-55
SLIDE 55

(a) n−1γ−2

n

Pn

t=1 Z∗ t Z∗0 t dP ∗

⇒ Ω R σ2, in probability, where Z∗

t = (e

Y ∗

t−1, · · · , e

Y ∗

t−p)0;

(b) for i = 0, · · · , m, n−(i+1)γ−1

n

Pn

t=1 tiZ∗ t p∗

→ 0, in probability; (c) n−1/2γ−2

n

Pn

t=1 Z∗ t b

ε∗

t dP ∗

⇒ MN p(0, Ω R σ4), in probability; (d) γ−1

n Υ −1 Pn t=1 Ttb

ε∗

t dP ∗

⇒ MN(0, R (σ, rσ, · · · , rmσ)0(σ, rσ, · · · , rmσ)), in probability. (e). E∗(n−1/2γ−2

n

Pn

t=1 Z∗ t b

ε∗

t )0(γ−1 n Υ −1 Pn t=1 Ttb

ε∗

t ) = 0.

Here by ξ∗

n p∗

→ 0 in probability we mean P∗(|ξ∗

n| > ) = op(1) for any > 0. Note that ξn p

→ 0 implies ξn

p∗

→ 0. Using the autoregression in (9), under Assumption 1 b θ can be shown to be strongly consistent for (converges a.s − P to) θ by establishing n−1 Pn

t=1 γ−2 n e

Yt−1εt

a.s−P

→ 0, which holds by the strong LLN for martingale differences (see, e.g. White, 1999, Theorem 3.76) provided that kγ−2

n e

Yt−1εtk2 ≤ ° ° ° ° P∞

j=0 ψjγ−2 n εt−jεt

° ° ° °

2

X

j=0

|ψj| · kγ−2

n εt−jεtk2 ≤ C ∞

X

j=0

|ψj| < ∞. Here we make the simplifying assumption that the bootstrap DGP (23) is initialized at minus infinity, since the initial condition does not affect the limit distribution theory as long as it is

  • bounded. Thus, for sufficiently large n > n, b

θ(L) = b θ0 + b θ1L + · · · + b θpLp is invertible, a.s − P, and e Y ∗

t = b

θ(L)−1b ε∗

t def

= P∞

j=0 b

ψjb ε∗

t−j with P∞ j=0 |b

ψj| < ∞, a.s − P. Define b bj = (b ψj−1, · · · , b ψj−p) for j ≥ 1, with b ψj = 0 for j < 0. Thus Z∗

t = P∞ j=1 b

bjb ε∗

t−j. Denote b

Ω = P∞

j=1b

bjb b0

j.

We prove (a) by using Lemma A in the bootstrap world. First we show for fixed i and j, n−1

n

X

t=1

γ−2

n b

ε∗

t−ib

ε∗

t−j dp∗

⇒ Z σ2, if i = j,

p∗

→ 0 if i 6= j. (34) When i = j, we have n−1 Pn

t=1 γ−2 n b

ε∗2

t−i − n−1 Pn t=1 γ−2 n b

ε2

t−i p∗

→ 0 provided that E∗ µ n−1 Pn

t=1 γ−2 n b

ε∗2

t−i − n−1 Pn t=1 γ−2 n b

ε2

t−i

¶2 = n−2γ−4

n E∗

µ Pn

t=1 b

ε2

t−i(v2 t−i − 1)

¶2 = n−2

n

X

t=1

µ γ−1

n b

εt−i ¶4 E∗(v2

t−i − 1)2

≤ Cn−2

n

X

t=1

µ γ−1

n b

εt−i ¶4 = op(1), 24

slide-56
SLIDE 56

which holds since γ−1

n b

εt = γ−1

n εt + op(1) = Op(1)

(35) by (33), provided that γn/√n → 0. The case when i 6= j in (34) can be proved similarly. For fixed L, we have PL

i,j=1b

bib b0

j

µ n−1 Pn

t=1 γ−2 n b

ε∗

t−ib

ε∗

t−j

dp∗

⇒ PL

i=1 bib0 i

R σ2, as n → ∞, by (11), consistent estimation of the impulse responses and the continuous mapping theorem, and PL

i=1 bib0 i

R σ2 dp∗ ⇒ P∞

i=1 bib0 i

R σ2 when L → ∞. Now (a) holds provided that P∗ µ¯ ¯ ¯ ¯ P∞

i,j=L+1 b

bib b0

j(n−1γ−2 n

Pn

t=1 b

ε∗

t−ib

ε∗

t−j)

¯ ¯ ¯ ¯ > ¶ ≤ −1E∗ ¯ ¯ ¯ ¯ P∞

i,j=L+1 b

bib b0

j

µ n−1 Pn

t=1 γ−2 n b

ε∗

t−ib

ε∗

t−j

¶¯ ¯ ¯ ¯ ≤ −1

X

i,j=L+1

|b bi| · |b bj| µ n−1 Pn

t=1(γ−4 n b

ε2

t−ib

ε2

t−j)1/2

p

→ 0, as L → ∞, by (35) and P∞

i=1 |b

bi| < ∞, a.s − P. (b). By the Markov inequality it suffices to show E∗(n−(i+1)γ−1

n n

X

t=1

tiZ∗

t )

0(n−(i+1)γ−1

n n

X

t=1

tiZ∗

t ) = op(1).

Indeed, E∗(n−(i+1)γ−1

n n

X

t=1

tiZ∗

t )

0(n−(i+1)γ−1

n n

X

t=1

tiZ∗

t )

= n−2(i+1)γ−2

n n

X

t=1

t2iE∗Z∗0

t Z∗ t ≤ n−2 n

X

t=1

γ−2

n E∗Z∗0 t Z∗ t .

Since Z∗

t = Pt−1 j=0 b

bjb ε∗

t−j, we have γ−2 n E∗Z∗0 t Z∗ t = γ−2 n

Pt−1

j=0b

b0

jb

bjb ε2

t−j = Op(1) for t ≤ n. So (b) holds.

(c). First we establish for any fixed i ≥ 1, n−1/2γ−2

n n

X

t=1

b ε∗

t−ib

ε∗

t dP ∗

⇒ MN p(0, Z σ4), in probability. (36) Define F∗

t to be the sigma field generated by {vs : s ≤ t}, then γ−2 n b

ε∗

t−ib

ε∗

t is an m.d. array with

25

slide-57
SLIDE 57

respect to F∗

t . Note that

° ° ° °n−1 Pn

t=1 γ−4 n (ε2 t−iε2 t − σ2 t−iσ2 t)

° ° ° °

1+a 1+a

≤ n−1−a µ Pn

t=1 γ−4 n (||ε2 t−iε2 t||1+a + ||σ2 t−iσ2 t||1+a)

¶1+a → 0, since γ−4

n ||ε2 t−iε2 t||1+a = γ−4 n (Eε2+2a t−i ε2+2a t

)

1 1+a ≤ (γ−8

n Eσ4+4a t−i Eη4+4a t−i Eσ4+4a t

Eη4+4a

t

)

1 2+2a < ∞ by

Assumption 1 (i) and similarly γ−4

n ||σ2 t−iσ2 t||1+a < ∞. Thus n−1γ−4 n

Pn

t=1 E∗b

ε∗2

t−ib

ε∗2

t

= n−1 Pn

t=1 γ−4 n b

ε2

t−ib

ε2

t =

n−1 Pn

t=1 γ−4 n ε2 t−iε2 t +op(1) = n−1 Pn t=1 γ−4 n σ2 t−iσ2 t +op(1) ⇒ R σ4. Then (36) holds by the CLT for

m.d. arrays provided that n−1γ−4

n n

X

t=1

b ε∗2

t−ib

ε∗2

t − n−1 n

X

t=1

γ−4

n b

ε2

t−ib

ε2

t = op∗(1)

(37) and n−2γ−8

n n

X

t=1

E∗b ε∗4

t−iE∗b

ε∗4

t

= op∗(1). (38) (37) holds since E∗[n−1γ−4

n n

X

t=1

(b ε∗2

t−ib

ε∗2

t − b

ε2

t−ib

ε2

t)]2

= n−2γ−8

n E∗[ n

X

t=1

b ε2

t−ib

ε2

t(v2 t−iv2 t − 1)]2

≤ Cn−2

n

X

t=1

γ−8

n b

ε4

t−ib

ε4

t = op(1),

by (35), and (38) holds since n−2γ−8

n n

X

t=1

E∗b ε∗4

t−iE∗b

ε∗4

t

= n−2γ−8

n n

X

t=1

b ε4

t−ib

ε4

tE∗v4 t−iE∗v4 t ≤ Cn−2 n

X

t=1

γ−8

n b

ε4

t−ib

ε4

t = op(1).

Note that n−1/2γ−2

n

Pn

t=1 Z∗ t b

ε∗

t = P∞ i=1 b

bi µ n−1/2 Pn

t=1 γ−2 n b

ε∗

t−ib

ε∗

t

¶ . Thus (c) can be proved by using Lemma A1 provided that P∗ µ¯ ¯ ¯ ¯ P∞

i=L+1 b

bi µ n−1/2 Pn

t=1 γ−2 n b

ε∗

t−ib

ε∗

t

¶¯ ¯ ¯ ¯ < ¶ ≤ −2E∗ ¯ ¯ ¯ ¯ P∞

i=L+1 b

bi µ n−1/2 Pn

t=1 γ−2 n b

ε∗

t−ib

ε∗

t

¶¯ ¯ ¯ ¯

2

≤ −2

X

i=L+1

|b bi|2 µ n−1 Pn

t=1(γ−8 n b

ε4

t−ib

ε4

t)1/2

¶ = op(1), as L → ∞, 26

slide-58
SLIDE 58

by (35) and P∞

i=1 |b

bi| < ∞, a.s − P. (d). It suffices to show that for λ = (λ1, · · · , λm+1)0 > 0 with Pm+1

i=1 λi = 1,

λ0γ−1

n Υ −1 n

X

t=1

Ttb ε∗

t dP ∗

⇒ MN(0, Z λ0(σ, rσ, · · · , rmσ)0(σ, rσ, · · · , rmσ)λ0), in probability. (39) Note that γ−1

n n−1/2Υ −1Ttb

ε∗

t = γ−1 n (Pm+1 i=1 λi(t/n)i)b

ε∗

t = γ−1 n πtb

ε∗

t , where πt = Pm+1 i=1 λi(t/n)i ≤ 1,

is an m.d. array with respect to F∗

t = σ{vs : s ≤ t}, with E∗[γ−1 n πtb

ε∗

t ]2 = γ−2 n πtb

ε2

  • t. Since

E∗[n−1

n

X

t=1

γ−2

n π2 tb

ε∗2

t − n−1 n

X

t=1

γ−2

n π2 tb

ε2

t]2

= n−2E∗[

n

X

t=1

γ−2

n π2 tb

ε2

t(v2 t − 1)]2 ≤ n−2E∗[ n

X

t=1

γ−2

n b

ε2

t(v2 t − 1)]2

= n−2

n

X

t=1

γ−4

n b

ε4

tE∗(v2 t − 1)2 ≤ Cn−2 n

X

t=1

γ−4

n b

ε4

t = op(1),

by (35), then by Markov inequality and πt → Pm+1

i=1 λiri,

n−1

n

X

t=1

γ−2

n π2 tb

ε∗2

t

= n−1

n

X

t=1

γ−2

n π2 tb

ε2

t + op∗(1) ⇒

Z λ0(σ, rσ, · · · , rmσ)0(σ, rσ, · · · , rmσ)λ0 Further we have n−2

n

X

t=1

γ−4

n π4 tE∗b

ε∗4

t

= n−2

n

X

t=1

γ−4

n π4 tb

ε4

tE∗v4 t ≤ Cn−2 n

X

t=1

γ−4

n b

ε4

t = op(1).

Then (39) holds by the CLT for m.d. arrays. (e). It follows from E∗Z∗0

t b

ε∗

tb

ε∗

s = E∗Z∗0 t E∗b

ε∗

tb

ε∗

s = 0 if s ≥ t, E∗Z∗0 t b

ε∗

tb

ε∗

s = E∗b

ε∗

t E∗Z∗0 t b

ε∗

s = 0 if

s < t.¥ Proof of Corollary 2. The proof uses arguments similar to the proof of Corollary 1. Proof of Theorem 3. Let e CK = PK

t=1 ε2 t and e

DK = e CK/ e Cn − K/n for K = 1, · · · , n. It can be shown that DK = e DK + Op(1/n). (40) 27

slide-59
SLIDE 59

(i) Since γ−2

n b

σ2 ⇒ R σ2, γ−4

n b

2 ⇒ (η − 1) R σ4, it suffices to show √n e DK ⇒ µR σ2 ¶−1 µ (η − 1) R σ4 ¶1/2 W(t), which follows from (40) and Theorem 1 in Inclán and Tiao (1994). (ii) It suffices to show e DK ⇒ µR σ2 ¶−1 µR s

0 σ2 − s

R σ2 ¶ . Let ζt = ε2

t −σ2 t and σ2 ζ = (η−1)

R σ4, then n−1γ−4

n

Pn

t=1 ζ2 t ⇒ σ2 ζ. It follows from the invariance principle for m.d. arrays that

n−1/2γ−2

n K

X

t=1

ζt − Kn−3/2γ−2

n n

X

t=1

ζt ⇒ σζW(s) − σζsW(1) := σζW(s). On the other hand, n−1/2

K

X

t=1

ζt − Kn−3/2

n

X

t=1

ζt = (n−1

n

X

t=1

ε2

t)√n e

DK − n−1/2

K

X

t=1

σ2

t + Kn−3/2 n

X

t=1

σ2

t

gives e DK = µ n−1γ−2

n

Pn

t=1 ε2 t

¶−1 [n−1/2 µ n−1/2γ−2

n

PK

t=1 ζt − Kn−3/2γ−2 n

Pn

t=1 ζt

¶ +n−1

K

X

t=1

γ−2

n σ2 t − Kn−2 n

X

t=1

γ−2

n σ2 t]

⇒ µR σ2 ¶−1 µR s

0 σ2 − s R σ2

¶ .¥

References

[1] Anatolyev, S. (2004). Robustness of residual-based bootstrap to composition of serially corre- lated errors. New Economic School, Moscow, Russia, Working paper. [2] Andrews, D. W. K. (1988). Laws of large numbers for dependent non-identically distributed random variables. Econometric Theory 4, 458-467. [3] Beare, B. (2005). Robustifying unit root test to permanent changes in innovation variance. Yale University, mimeographed. [4] Billingsley, P. (1968). Convergence of Probability Measures. J. Wiley, New York. 28

slide-60
SLIDE 60

[5] Bose, A. (1988). Edgeworth correction by bootstrapin autoregression. Annals of Statistics 16, 1709—1722. [6] Bose, A. (1990). Bootstrap in moving average models. Annals of the Institute of Statistical Mathematics 42, 753-768. [7] Boswijk H.P. (2001). Testing for a unit root with near-integrated volatility, Tinbergen Institute, Discussion Paper. [8] Boswijk H.P. (2005). Adaptive testing for a unit root with nonstationary volatility, Tinbergen Institute, Discussion Paper. [9] Brockwell, P.J. and R.A. Davis. (1991). Time Series: Theory and Methods, 2nd Edition. Springer, New York. [10] Brown, R. L., J. Durbin and J. Evans. (1975). Techniques for testing the constancy of regression relationship over time. Journal of Royal Statistical Society, Series (B) 37, 149-163. [11] Busetti, F. and A. M. R. Taylor. (2003). Variance shifts, structural breaks, and stationarity

  • tests. Journal of Business and Economic Statistics 21(4), 510-31.

[12] Cavaliere, G. (2004a). Testing stationarity under a permanent variance shift. Economics Letters 82, 403-408. [13] Cavaliere, G. (2004b). Unit root tests under time-varying variance shifts. Econometric Reviews 23, 259-292. [14] Cavaliere, G. and A. M. R. Taylor. (2004). Testing for unit roots in time series models with non-stationary volatility. University of Birmingham, Working paper. [15] Cavaliere, G. and A. M. R. Taylor. (2005). Stationarity tests under time-varying variances. Econometric Theory 21, 1112-1129 . [16] Chung, H. and J. Y. Park. (2004). Nonstationary nonlinear heteroskedasticity in regression. Rice University, Working paper. [17] Goncalves, S. and L. Kilian. (2004). Bootstrapping autoregression with conditional het- eroskedasticity of unknown form. Journal of Econometrics 123, 89-120. 29

slide-61
SLIDE 61

[18] Hafner, C.M. and H. Herwartz. (2000). Testing for linear autoregressive dynamics under het-

  • eroskedasticity. Econometrics Journal 3, 177-197.

[19] Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press, Princeton. [20] Hamori, S. and A. Tokihisa. (1997). Testing for a unit root in the presence of a variance shift. Economics Letters 57, 245-253. [21] Hansen, B. E. (1995). Regression with nonstationary volatility. Econometrica 63, 1113-1132. [22] Inclan, C. and G. Tiao. (1994). Use of cumulative sums of squares for retrospective detection

  • f changes of variance. Journal of American Statistical Association 89, 913-923.

[23] Ing, C.-K. (2003). Multistep prediction in autoregressive processes. Econometric Theory 19, 254—279. [24] Kim, T. H., S. Leybourne and P. Newbold. (2002). Unit root tests with a break in innovation

  • variance. Journal of Econometrics 109, 365-387.

[25] Kreiss,

  • J. P. (1997). Asymptotic properties of residual bootstrap for autoregressions.

Manuscript, Institute for Mathematical Stochastics, Technical University of Braunschweig, Ger- many. [26] Kreiss, J. P. and J. Franke. (1992). Bootstrapping stationary autoregressive movingaverage

  • models. Journal of Time Series Analysis 13, 297-317.

[27] Liu, R.Y. (1988). Bootstrapp rocedure under some non-i.i.d. models. Annals of Statistics 16, 1696—1708. [28] Mammen, E. (1993). Bootstrapand wild bootstrapfor high dimensional linear models. Annals

  • f Statistics 21, 255—285.

[29] Park, J. and P. C. B. Phillips. (1999). Asymptotics for nonlinear transformations of integrated time series. Econometric Theory 15, 269-298. [30] Park, J. and P. C. B. Phillips. (2001). Nonlinear regression with integrated time series. Econo- metrica 69, 117-161. 30

slide-62
SLIDE 62

[31] Park, S., S. Lee and J. Jeon. (2000). The cusum of squares test for variance changes in infinite

  • rder autoregressive models. Journal of the Korean Statistical Society 29, 351-361.

[32] Phillips, P. C. B. (1996). Econometric Model Determination. Econometrica 64, 763-812. [33] Phillips, P. C. B. and W. Ploberger. (1996). An Asymptotic Theory of Bayesian Inference for Time Series. Econometrica 64, 381-413. [34] Phillips, P. C. B. and K.-L. Xu. (2006). Inference in autoregression under heteroskedasticity. forthcoming in Journal of Time Series Analysis. [35] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. New York: Wiley. [36] Shephard, N. (2005). Stochastic Volatility: Selected Readings. Oxford University Press, Oxford. [37] White, H. (1999). Asymptotic Theory for Econometricians, 2nd Edition. Academic Press, Lon- don. [38] Wichern, D., R. Miller and D. Hsu. (1976). Changes of variance in first order autoregression time series models - with an application. Applied Statistics 25, 248-256. [39] Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Annals of Statistics 14, 1261—1295. [40] Xu, K.-L. and P. C. B. Phillips. (2005). Adaptive estimation of autoregressive models with time-varying variances. Yale University, mimeographed. 31