High-dimensional functional time series forecasting Yuan Gao, Han - - PDF document
High-dimensional functional time series forecasting Yuan Gao, Han - - PDF document
High-dimensional functional time series forecasting Yuan Gao, Han Lin Shang, Yanrong Yang Abstract In this paper, we consider the problem of modeling and forecasting mortality rate data of N correlated populations simultaneously. The data are
2 Yuan Gao, Han Lin Shang, Yanrong Yang
(2006) on theoretical properties, Yao et al. (2005) for sparse longitudinal data, Locantore et al. (1999) and Viviani et al. (2005) for some interesting applications. Existing FPCA method has been developed for independent observations, which is a serious weakness when we are dealing with time series data. In this paper, we adopt a dynamic FPCA approach (H¨
- rmann et al.
2015, Panaretos & Tavakoli 2013), where serial dependence between the curves are taken into account. With dynamic FPCA, functional time series are reduced to a vector time series, where the individual component processes are mutually uncorrelated principal component scores. It is often the case that we collect a vector of N functions at a single time point t. If these N functions are assumed to be correlated, multivariate functional models should be considered. Classical multivariate FPCA concatenates the multiple functions into one to perform univariate FPCA (Ramsay & Silverman 2005). Chiou et al. (2014) suggested normalizing each random function as a preliminary step before concatenation. Berrendero et al. (2011) studied functional version of principal component ananlysis, where multivariate functional data are reduced to one or two functions rather than vectors. However, existing models dealing with multivariate functional data either fail to handle data with a large N (as in the classical FPCA approach),
- r are hard to implement practically (as in Berrendero et al. 2011).
We propose in this paper a two-fold dimension reduction model to model and forecast high-dimensional functional time series. By high-dimension, we allow that the dimension of the functional time series N to be as large as or even larger than the length of observed functional time series T. The dimension reduction process is straightforward and easy to implement: 1) Dynamic functional principal component analysis is performed separately on each set of functional time series, resulting in N sets of principal component scores of low dimension K (typically less than 5); 2) The first principal component scores from each of N sets of functional time series are combined into an N ×T matrix. Multivariate principal component analysis is conducted to further reduce the dimension into an r ×T matrix (r ≪ N). The same is done for the second, third until the Kth principal component
- scores. The vector of N functional time series is reduced to rK what we call factors.
3) Fit scalar time series to each factor and produce forecasts. The forecast factors can be used to construct forecast functions. The proposed dimension reduction model is essentially using a matrix of small dimension (r × K) to represent the covariation of the original N functional time series. Elements of the reduced matrix are uncorrelated and it is adequate to model each component with scalar time series models. In the second step above mentioned, multivariate principal component analysis should be tailored to dependent data settings. We adopt factor models which are frequently used for dimension reduction. Some early application of factor analysis to multiple time series include Anderson (1963), Priestley et al. (1974) and Brillinger (1981). Time series in high-dimensional settings where N → ∞ together with T are studied in Chamberlain (1983), Bai (2003) and Lam et al. (2011). Among these, we adopt similar approach considered in Lam et al. (2011), where the model is conceptually simple and asymptotic properties are established. We are interested in applying the proposed model to mortality rate forecasting. There has been literature where age-specific mortality rates are modeled as functional data. Hyndman & Ullah (2007) proposed to forecast mortality and fertility rates with a robust functional time series model. Hyndman et al. (2013) introduced a product-ratio model to forecast multiple populations. However, none of the existing models known are applicable when the number of populations is large. In this paper, we focus on Japanese sub- national mortality rates, where data are available for each of the 47 prefectures. The data set contains yearly age and sex-specific mortality rates in a span of 41 years from 1975 to 2015. We show the superiority of our model which forecasts 47 populations simultaneously by comparing with the independent functional time series model.
High-dimensional functional time series forecasting 3
The remainder of the paper is organized as follows. In Section 2, more detailed background on dynamic FPCA is introduced and the two-fold dimension reduction model is proposed. We show simulation studies in Section 3, and in Section 4, we apply our model to Japanese age and sex-specific mortality rate data. Conclusion and discussion are in Section 5.
2 Research methods
In this section, we introduce dynamic FPCA and factor model for the two-fold dimension reduction process.
2.1 Dynamic functional principal component analysis
In this first step, dynamic FPCA is performed on each population separately. We consider stationary N- dimensional functional time series X X Xt : t ∈ Z, where X X Xt = (X1
t (u),...,XN t (u))⊤. Xi t (u) stands for the function
from ith population at time t. It takes values in the space H := L2(I ) of real-valued square integrable functions on I . The space H is a Hilbert space, equipped with the inner product x,y :=
- I x(u)y(u)du. For each
i = 1,...,N, we assume Xi
t has a continuous mean function, µi(u) and an auto-covariance function at lag h,
γi
h(u,v), where
µi(u) = E[Xi(u)], γi
h(u,v) = cov[Xi t (u),Xi t+h(v)]
(1) The long-run covariance function is defined as ci(u,v) =
∞
∑
h=−∞
γi
h(u,v)
(2) Using ci(u,v) as a kernel operator, we define the operator C by: Ci(x)(u) =
- I ci(u,v)x(v)dv,
u,v ∈ I (3) The kernel is symmetric, and non-negative definite. Thus by Mercer’s Theorem, the operator C admits an eigendecomposition Ci(x) =
∞
∑
k=1
λkx,υkυk, (4) where (λl : l ≥ 1) are C’s eigenvalues in descending order and (υl : l ≥ 1) the corresponding normalized
- eigenfunctions. By Karhunen-Lo`
eve Theorem, Xi can be represented with Xi
t (u) = ∞
∑
k=1
β i
t,kυi k(u)
(5)
4 Yuan Gao, Han Lin Shang, Yanrong Yang
where β i
t,k =
- I Xi
t (u)υi k(u)du is the kth principal component score at time t. The infinite-dimensional
functions can be approximated by the first K principal component scores: Xi
t (u) = K
∑
k=1
β i
t,kυi k(u)+εi t (u),
(6) where εi
t (u) is the error term that captures the remaining terms from K +1 to ∞. For each population, the
infinite-dimension functional time series is reduced to a K-dimension vector.
2.2 Factor model
With the first step dimension reduction, we now have principal component scores β i
t,k, where i = 1,...,N.
Following the early work by Bai (2009), we consider the following factor model for β β β i,t = (β 1
t,k,...,β N t,k)⊤.
For each k = 1,...,K, let β β βt,k = A A Akω ω ωt,k +e e ei,t, t = 1,...T, (7) where β β βt,k is the vector that contains the kth principal component score of all N functional time series. ω ω ωt,k is an r ×1 unobserved factor time series. A A Ak is an N ×r unknown constant factor loading matrix, and e e et,k is idiosyncratic error with mean 0 and variance σ2
i,t. It is assumed that r ≪ N. Thus we manage to use a
vector of much smaller dimension to capture the covariation of the kth principal component scores between N
- populations. The same is done for each group of principal component scores. So K factor models are fitted.
With two-fold dimension reduction, the original functional time series can be approximated by Xi
t (u) = K
∑
k=1
[A A Akω ω ωt,k]iυi
k(u)+θ i t (u),
(8) where [A A Akω ω ωt,k]i is the ith element in the vector A A Akω ω ωt,k, and θ i
t (u) is the error term from two steps of ap-
- proximation. The factors ω
ω ωt,k is the only thing captures variations with time, which would be used for forecasting.
2.3 Estimation
We want to estimate A A Ak
k k, ω
ω ωt,k and υi
k(u) in (8). In the dynamic FPCA step, the long-run covariance function
ci(u,v) can be estimated by
- ci(u,v) =
∞
∑
h=−∞
W h q
- γi
h(u,v),
(9) where ˆ γi
h(u,v) =
- 1
T ∑T−h j=1 (Xi j(u)− ¯
Xj(u))(Xj+h(v)− ¯ X(v)), h ≥ 0
1 T ∑T j=1−h(Xi j(u)− ¯
Xj(u))(Xj+h(v)− ¯ X(v)), h < 0
High-dimensional functional time series forecasting 5
where W is a weight function with W(0) = 1, W(u) = W(−u), W(u) = 0 if |u| > m for some m > 0, and W is continuous on [−m,m]. Some possible choices include Bartlett, Parzen, Tukey-Hanning, Quadratic spectral and Flat-top functions (Andrews 1991, Andrews & Monahan 1992). Here q is a bandwidth parameter. Rice & Shang (2017) proposed a plug-in procedure to select q. We use ci(u,v) as the kernel of the operator C, with which we can estimate υi
k(u) by performing eigende-
composition on C. υi
k(u) is the normalized eigenfunction that corresponds to the kth largest eigenvalue. The
empirical principal component scores β i
t,k =
- I Xi
t (u)
υi
k(u)du, can be calculated by numerical integration.
The estimates β i
t,k are then fitted to a factor model. The estimation of latent factors for high-dimensional
time series can be found in Lam et al. (2011). A natural estimator for A A Ak is defined as A A Ak = ( a a a1,..., a a ar), where
- a
a a j is the jth eigenvector of A A Ak, and
- A
A Ak =
h0
∑
h=1
- Σ
Σ Σ h,k Σ Σ Σ
⊤ h,k,
- Σ
Σ Σ h = 1 T −h
T−h
∑
h=1
- β
β βt+h,k − ˜ β β β k
- β
β βt,k − ˜ β β β k ⊤ , (10) where ˜ β β β k = 1
T ∑T t=1
β β βt,k. Thus the kth factor can be estimated by
- ω
ω ωt,k = A A A
⊤ k
β β βt,k (11) The estimator for the original function Xi
t (u) is
- Xi
t (u) = K
∑
k=1
- A
A Ak ω ω ωt,k
- i
υi
k(u),
i = 1,...,N, t = 1,...,T (12) where
- A
A Ak ω ω ωt,k
- i is the ith element of the vector
A A Ak ω ω ωt,k. In both the FPCA and factor model step, there is a need to choose the number of dimensions retained. Some commonly used methods include: 1) Ensuring a certain fraction of the data variation is explained (FVE) (Chiou 2012); 2) Ratio-based estimator (Lam et al. 2011); 3) Cross-validation (Rice & Silverman 1991); 4) Bootstrapping (Hall & Vial 2006); 5) Information criteria (Yao et al. 2005).
2.4 Forecasting
With two-fold dimension reduction, information of serial correlation is contained in the factors ω ω ωt,k. To forecast N-dimensional functional time series, we could instead make forecast on the estimated factors. Scalar or vector time series models could be applied. We suggest univariate time series models, such as autoregressive moving average (ARMA) models, since the factors are mutually uncorrelated. Consequently, we need to fit r ×K ARMA models on the factors. The prediction of the functions could be calculated:
6 Yuan Gao, Han Lin Shang, Yanrong Yang
E[ Xi
t+h|t(u)] = K
∑
k=1
- A
A Ak ω ω ωt+h|t,k
- i
υi
k(u),
i = 1,...,N, t = 1,...,T, (13) where Xi
t+h|t(u) is the h-step ahead forecast at time t.
3 Simulation Studies 3.1 Data generation
We illustrate our method using simulated data. We compare results using the proposed model and univariate model where each population is modeled and predicted separately. In order to generate N correlated samples of functional time series, we first construct the principal component scores for all populations. The first functional principal component scores for N populations are generated with β β β 1,t = A A A1,tω ω ω1,t +e e et, where e e e1,t are independent N(0,4) random variables, and A A A1,t is an N ×2
- matrix. The first column of A
A A1,t, has 4cos(2πi/N) as the ith element. The second column has 2sin(2πi/N) as the ith element. ω ω ωt are generated using a VAR(1) model with ω ω ωt =
- 0.7 0.2
0.2 0.7
- ω
ω ωt−1 +ξ ξ ξt, where ξ ξ ξ 1,t ∼ N(0,1) The second principal component scores for N populations can also be constructed using similar scheme with β β β 2,t = A A A2,tω ω ω2,t +e e et, where e e et are independent N(0,1) random variable. The first principal component makes up most of the variations in the simulated data. The first column of A A A2,t has cos(2πi/N) as the ith
- element. The second column has 1
2 sin(2πi/N) as the ith element. ω
ω ωt are generated using a VAR(1) model with ω ω ωt =
- 0.3 0.2
0.2 0.3
- ω
ω ωt−1 +ξ ξ ξt, where ξ ξ ξ 2,t ∼ N(0,1). While the functional principal components are constructed for each population, the two basis functions are generated using υ(i)
1 (u) = sin(πu/3+πi/8) and υ(i) 2 (u) = cos(πu/2−πi/3). The functional time series for
the ith population is constructed by X(i)
t (u) =
- β
β β 1,t
- i υ(i)
1 (u)+
- β
β β 2,t
- i υ(i)
2
where [·]i denotes the ith element of the vector.
3.2 Model fitting
When fitting models to the data, we use expanding window prediction: The data is divided into a training set and a test set. The proposed models are fitted to the training set and forecasts are made based on fitted models. Then the test set is used for forecast evaluation. Each time we increase the training size by one unit and refit the model. New forecasts are produced each time. At last all the prediction error for one, two and three-step ahead forecasts are collected and then taken means. We use mean absolute forecast error (MAFE) and mean squared forecast error (MSFE):
High-dimensional functional time series forecasting 7
MAFE(h) = 1 (21−h)× p
20
∑
η=h p
∑
j=1
- yn+η(x j)−
fn+η|n+η−h(xj)
- ,
MSFE(h) = 1 (21−h)× p
20
∑
η=h p
∑
j=1
- yn+η(xj)−
fn+η|n+η−h(x j) 2 , (14) where fn+η|n+η−h represents the h-step-ahead prediction using the first n+η −h years fitted in the model, and yn+η(xj) denotes the holdout function.
3.3 Result
We use various combinations of number of populations N and sample size T, and see how the proposed method works when both N and T grow larger. Table 1 shows the sample mean of the mean absolute forecast errors in different settings. Each number in the table is the mean of h-step ahead errors taken over all N
- populations. We show only short-term forecast: h = 1,2,3. It could be seen that the proposed model produces
smaller forecast errors under all settings and all forecast horizons. In each setting, the MAFE increases with forecast horizon, but in general the error values do not inflate with the growth of both N, T values. Table 1: Mean absolute forecast error
(N, T) h Individual forecast Proposed model 1 2.372 2 2 2. . .2 2 28 8 83 3 3 (50,100) 2 2.766 2 2 2. . .6 6 63 3 30 3 3.088 2 2 2. . .8 8 89 9 90 1 2.353 2 2 2. . .1 1 19 9 90 (50,200) 2 2.783 2 2 2. . .5 5 55 5 54 4 4 3 3.023 2 2 2. . .7 7 78 8 84 4 4 1 2.646 2 2 2. . .4 4 46 6 63 3 3 (100,200) 2 2.995 2 2 2. . .7 7 77 7 72 2 2 3 3.202 3 3 3. . .0 01 1 15 5 5 1 2.333 2 2 2. . .1 1 16 6 69 9 9 (200,200) 2 2.736 2 2 2. . .5 5 53 3 39 9 9 3 2.960 2 2 2. . .7 7 74 4 49 9 9
4 Empirical studies
Japanese sub-national mortality rates in 47 prefectures is used to demonstrate the effectiveness of our proposed
- method. Available at Japan Mortality Database (2017), the data set contains yearly age-specific mortality
rates in a span of 41 years from 1975 to 2015. The observations are yearly mortality curves from ages 0 to 110 years, where the age is treated as the continuum in the rate function. In this study, the data at ages 95 and
- lder are grouped together, to avoid problems associated with erratic rates at these ages. One example of how
8 Yuan Gao, Han Lin Shang, Yanrong Yang
the data looks like is displayed in Figure 1. The figure presents log smoothed female age-specific mortality rates in Tokyo prefecture, where red color represents more distant data and purple represents more recent
- years. The curves are smoothed using penalized regression splines with a monotonically increasing constraint
after the age of 65 (see Wood 1994, Hyndman & Ullah 2007). The dimension of the functional time series is 47, which is greater than the sample size 41. With the two-fold dimension reduction model, we use the first three principal component scores for each population, and the first three factors. The first 26 years of data (from 1975 to 2000) are allocated to the training set, and the last 15 years of data from (2001 to 2015) are allocated to the testing set. We compare the forecast accuracy of our proposed method with the independent functional time series model, where each sub-national population is forecast
- individually. It is found that the proposed method outperforms the independent model in general. Specifically,
in 24 out of 47 prefectures, the proposed model produces smaller MAFE in all one to ten forecast horizons. In 45 out of 47 prefectures, the proposed model produce smaller mean of MAFE taken across all forecast horizons. The one five and ten-step ahead MAFE are shown in Figure 2. The prediction errors generally increase with forecast horizon. The proposed model produces lower mean error level with also visible smaller variation across forecast horizon. 20 40 60 80 −10 −8 −6 −4 −2
Tokyo: female death rates (1975−2015)
Age Log death rate
- Fig. 1: Log smoothed female mortality rates in the Tokyo prefecture from 1975 to 2015.
High-dimensional functional time series forecasting 9
- 0.001
0.002 0.003
- nestep
fivestep tenstep
forecast horizon Mean absolute forecast error Model
multi uni
- Fig. 2: MAFE values across forecast horizon. Green color represents the proposed model and purple represents
the individual forecasting model. We have also modeled and forecast Japanese male mortality rates. In 19 out of 47 prefectures, the proposed model produces smaller MAFE in all one to ten forecast horizons. In 30 out of 47 prefectures, the proposed model produces smaller mean MAFE taken across all forecast horizons.
5 Conclusions
We have proposed a two-fold dimension reduction model for high-dimension functional time series data. The model proved to work well in simulations and empirical analysis. We apply our model to Japanese sub-national mortality data, where age specific mortality rate data are from 47 prefectures. Compared with existing forecasting approaches, the proposed method has its merits in three ways: 1) When multiple populations of mortality rates are believed to be correlated, our approach allows one to model and forecast all populations simultaneously. 2) When the number of populations tend to be large, our approach is still feasible and prediction error does not inflate. 3) The proposed method is simple in practical implementation.
10 Yuan Gao, Han Lin Shang, Yanrong Yang
References
Anderson, T. (1963), ‘The use of factor analysis in the statistical analysis of multiple time series’, Psychome- trika 28, 1–25. Andrews, D. (1991), ‘Heteroskedasticity and autocorrelation consistent covariance matrix estimation’, Econo- metrica: Journal of the Econometric Society pp. 817–858. Andrews, D. & Monahan, J. (1992), ‘An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator’, Econometrica pp. 953–966. Bai, J. (2003), ‘Inferential theory for factor models of large dimensions’, Econometrica 71, 135–171. Bai, J. (2009), ‘Panel data models with interactive fixed effects’, Econometrica 77, 1229–1279. Berrendero, J., Justel, A. & Svarc, M. (2011), ‘Principal components for multivariate functional data’, Computational Statistics and Data Analysis 55(9), 2619–2634. Brillinger, D. (1981), Time Series Data Analysis and Theory, extended ed., Holder-Day, San Francisco. Chamberlain, G. (1983), ‘Funds, factors, and diversification in arbitrage pricing models’, Econometrica 70, 191–221. Chiou, J. M. (2012), ‘Dynamical functional prediction and classification with application to traffic flow prediction’, The Annals of Applied Statistics 6(4), 1588–1614. Chiou, J. M., Chen, Y. T. & Yang, Y. F. (2014), ‘Multivariate functional principal component analysis: A normalization approach’, Statistica Sinica 24(4), 1571–1596. Hall, P. & Hosseini-Nasab, M. (2006), ‘On properties of functional principal components analysis’, Journal
- f the Royal Statistical Society, Statistical Methodology, Series B 68(Part 1), 109–126.
Hall, P., M¨ uller, H. G. & Wang, J. L. (2006), ‘Properties of principal component methoda for functional and longitudinal data analysis’, The Annals of Statistics 34(3), 1493–1517. Hall, P. & Vial, C. (2006), ‘Assessing the finite dimensionality of functional data’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(4), 689–705. H¨
- rmann, S., Kidzi´
nski, L. & Hallin, M. (2015), ‘Dynamic functional principal components’, Journal of the Royal Statistical Society, Statistical Methodology, Series B 77, 319–348. Hyndman, R. J., Booth, H. & Yasmeen, F. (2013), ‘Coherent mortality forecasting: The product-ratio method with functional time series models’, Demography 50(1), 261–283. Hyndman, R. J. & Ullah, M. S. (2007), ‘Robust forecasting of mortality and fertility rates: A fucntional data approach’, Computational Statistics and Data Analysis 51(10), 4942–4956. Japan Mortality Database (2017), National Institute of Population and Social Security Research. Accessed at 8 March 2016. URL: http://www.ipss.go.jp/p-toukei/JMD/index-en.html. Lam, C., Yao, Q. & Bathia, N. (2011), ‘Estimation of latent factors for high-dimensional time series’, Biometrika 98, 901–918. Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T. & Cohen, K. L. (1999), ‘Robust principal componet analysis for functional data’, Test 8(1), 1–73. Panaretos, V. M. & Tavakoli, S. (2013), ‘Cram´ er-karhunen-lo` eve representation and harmonic principal component analysis of functional time series’, Stochastic Processes and their Applications 123, 2779– 2807. Priestley, M., Rao, T. & Tong, J. (1974), ‘Applications of principal component analysis and factor analysis in the identification of multivariable systems’, IEEE Trans. Auto. Contr 19, 730–734. Ramsay, J. O. & Silverman, B. W. (2002), Applied Functional Data Analysis, Springer, New York. Ramsay, J. O. & Silverman, B. W. (2005), Functional Data Analysis, Springer, New York. Rice, G. & Shang, H. L. (2017), ‘A plug-in bandwidth selection procedure for long run covariance estimation with stationary functional time series’, Journal of Time Series Analysis 38, 591–609.
High-dimensional functional time series forecasting 11
Rice, J. & Silverman, B. (1991), ‘Estimating the mean and covariance structure nonparametrically when the data are curves’, Journal of the Royal Statistical Society. Series B (Methodological) 53(1), 233–243. Viviani, R., Gr¨
- n, G. & Spitzer, M. (2005), ‘Functional principal component analysis of fMRI data’, Human