Big data and machine learning in macroeconomics: Some challenges and prospects
Eleni Kalamara, George Kapetanios, Felix Kempf (King's College London)
Motivation: Macroeconomic forecasts have been, to put it mildly, receiving bad press... yet new data sources that capture economic phenomena appear all the time.
New forms of data: e.g. searches, surveys.
Methods are needed that can handle the ever-growing amount of data, avoid overfitting and improve forecast accuracy (e.g. factor models).
We discuss some challenges that big data and machine learning pose for macroeconomics and provide some proposals on ways forward.
Most machine learning methods do not account for the time series nature of macroeconomic data; indeed they can't, since most seem best suited to stationary data (certainly the neural net ones are).
There are approaches, but we need one tailored to macroeconomics.
A simple example on unobserved variables and forecasting:
$$X_i = F + \epsilon_i, \quad i = 1, \ldots, N,$$
where the $X_i$ are observed, $F$ and $\epsilon_i$ are unobserved, $F \sim niid(0, \sigma_f^2)$ and $\epsilon_i \sim niid(0, \sigma_i^2)$.
We are interested in $Var(F \mid X_1, \ldots, X_N)$. Is $Var(F \mid \bar{X})$, with $\bar{X} = \frac{1}{N}\sum_i X_i$, a good enough alternative?
Only if $\sigma_i^2 = \sigma^2$ for all $i$ (see details).
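As a quick illustration of why heteroskedasticity matters here, the closed forms (with $\sigma_f^2 = 1$, as in the appendix derivation) can be checked numerically. A minimal sketch; the function names are our own, and the formulas assume the Sherman-Morrison result derived in the appendix:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100

def var_f_given_all(sig2):
    # Var(F | X_1, ..., X_N) = 1 / (1 + sum_i 1/sigma_i^2), via Sherman-Morrison
    return 1.0 / (1.0 + np.sum(1.0 / sig2))

def var_f_given_mean(sig2):
    # Var(F | Xbar) = 1 / (1 + N^2 / sum_i sigma_i^2)
    return 1.0 / (1.0 + len(sig2) ** 2 / np.sum(sig2))

sig2_hom = np.full(N, 2.0)           # equal idiosyncratic variances
sig2_het = rng.uniform(1.0, 3.0, N)  # heterogeneous, as in the simulations

for sig2 in (sig2_hom, sig2_het):
    # Gap is zero for equal variances, strictly positive otherwise
    print(var_f_given_mean(sig2) - var_f_given_all(sig2))
```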
The $N$-dimensional balanced dataset:
$$X_t = \Lambda F_t + \xi_t \quad (1)$$
The unstructured dataset:
$$Z_t = B_t F_t + \epsilon_t \quad (2)$$
(See example: there are, say, $k_t$ events at every period and each event can be represented by a vector of different scores.)
The factors follow
$$F_t = C F_{t-1} + \eta_t \quad (3)$$
Define
$$Y_t = \begin{pmatrix} X_t \\ Z_t \end{pmatrix}, \qquad \Lambda_{0,t} = \begin{pmatrix} \Lambda \\ B_t \end{pmatrix}, \qquad \zeta_t = \begin{pmatrix} \xi_t \\ \epsilon_t \end{pmatrix},$$
and rewrite:
$$Y_t = \Lambda_{0,t} F_t + \zeta_t \quad \text{(measurement eq.)}$$
$$F_t = C F_{t-1} + \eta_t \quad \text{(transition eq.)}$$
where $\Lambda_{0,t} = (\Lambda, B_t)'$.
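Filtering the stacked system is a standard Kalman-filter exercise; below is a minimal numpy sketch that allows the loading matrix (and the observation dimension) to change with $t$, as required when $Z_t$ has a varying number of rows. The function name and the initialisation are illustrative choices, not the authors' code:

```python
import numpy as np

def kalman_filter_tv(Y, Lam, C, R, Q):
    """Kalman filter for Y_t = Lam_t F_t + zeta_t, F_t = C F_{t-1} + eta_t,
    where the loading Lam[t] (and the dimension of Y[t]) may change with t."""
    k = C.shape[0]
    f, P = np.zeros(k), 10.0 * np.eye(k)   # loose (near-diffuse) initialisation
    out = []
    for t in range(len(Y)):
        f, P = C @ f, C @ P @ C.T + Q      # prediction step (transition eq.)
        L = Lam[t]
        S = L @ P @ L.T + R[t]             # innovation variance
        K = P @ L.T @ np.linalg.inv(S)     # Kalman gain
        f = f + K @ (Y[t] - L @ f)         # update step (measurement eq.)
        P = (np.eye(k) - K @ L) @ P
        out.append(f.copy())
    return np.array(out)
```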
Mixed frequencies are handled naturally: $X_t$ may be observed at a lower frequency and $Z_t$ at a higher one, so the filtered $F_t$ serves as a high-frequency factor.
Different specifications for the DGP of $Z_t = B_t F_t + \epsilon_t$ (exact model), with idiosyncratic error variances $\sigma_{it}^2 \cdot I_{\max(k_t)}$, $\sigma_{it}^2 \in U(1, 3)$.
Model 1: do not include $Z_t$ (standard factor model), i.e. $X_t = \Lambda F_t + \xi_t$.
Model 2:
$$Y_t^* = \begin{pmatrix} X_t \\ Z_t^* \end{pmatrix} = \begin{pmatrix} \Lambda \\ B_t^* \end{pmatrix} F_t + \begin{pmatrix} \xi_t \\ \epsilon_t^* \end{pmatrix},$$
where $Z_t^* = \frac{1}{k_t}\sum_{k=1}^{k_t} Z_{kt}$ is the average of the unstructured dataset at each point in time $t$, with $Var(\epsilon_t^*) = \bar{\sigma}_{i,t}^2 / \max k_t$.
Keep the same factor structure.
True parameters: $\beta = 0.5$, $\sigma_i^2 \in U(1, 3)$.

               Model 1                             Model 2
T \ max(k_t)    10     50    100    500   1000     10     50    100    500   1000
50            0.666  0.215  0.190  0.076  0.096  0.339  0.307  0.306  0.193  0.103
100           0.662  0.222  0.266  0.168  0.123  0.995  0.362  0.421  0.229  0.167
200           0.280  0.243  0.261  0.181  0.102  0.461  0.386  0.409  0.246  0.139

Table: Average relative RMSE of the high-dimensional state space (HSS) model over Model 1 and Model 2 respectively. Model 1 does not include the unstructured dataset ($Z_t$); Model 2 includes the average of $Z_t$.
Suppose articles appear monthly. Let $z_t$ be a $k_t \times 1$ vector of sentiment scores, where $k_t$ is the number of articles in period $t$ and $k_t \in \{1, \ldots, M\}$. This implies that for $s$ with $k_t < s \le M$, the observations $z_t^s$ are missing.
Empirical application (see example): forecasting with sentiment/uncertainty scores extracted from newspaper articles¹ (see text methods).
The forecasting regression is
$$\hat{x}_{t+h} = \hat{\alpha} + \hat{\beta} x_t + \sum_j \hat{\gamma}_j \, \chi_{jt},$$
where the $\chi_{jt}$ are macro/financial factors (Redl, 2017).
¹The sentiment of each article is measured using a dictionary-based method.
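A minimal sketch of the direct $h$-step regression above, assuming `chi` holds the macro/financial factors column-wise; `direct_forecast` is a hypothetical helper name, not the authors' code:

```python
import numpy as np

def direct_forecast(x, chi, h):
    """Direct h-step forecast x_{t+h} = a + b x_t + sum_j g_j chi_{jt} by OLS."""
    X = np.column_stack([np.ones(len(x)), x, chi])         # regressors dated t
    coef, *_ = np.linalg.lstsq(X[:-h], x[h:], rcond=None)  # targets dated t+h
    return X[-1] @ coef                                    # forecast from last obs.
```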
Text model              h = 3      h = 6      h = 9
Loughran sentiment      0.828**    0.823***   0.850***
Harvard sentiment       0.831***   0.813***   0.850***
vader sentiment         0.853***   0.856***   0.864***
stability sentiment     0.874      0.824***   0.845***
…                       0.885***   0.932      0.930
tf idf econom           0.889*     0.865***   0.906*
Nyman sentiment         0.933***   0.964***   0.972
economcounts            0.938**    0.933**    0.965**
tf idf uncert           0.939      0.951      0.934
alexopoulos 09          0.964      0.953      0.973***
uncertaincounts         0.973      0.951      0.971
Afinn sentiment         0.985      1.004      1.001
baker bloom davis       1.001      0.967      0.973
husted                  1.001      0.979      0.983

Table: Relative RMSEs, based on the estimated factor using each text method. * denotes rejection at the 10% level, ** at the 5% level and *** at the 1% level (Diebold-Mariano test).
Text model              h = 3      h = 6      h = 9
Harvard sentiment       0.861**    0.745      0.688
Loughran sentiment      0.881      0.803      0.754
economcounts            0.905      0.852      0.829
…                       0.910      0.855      0.824
stability sentiment     0.922      0.865      0.835
uncertaincounts         0.925      0.893      0.883
tf idf econom           0.926      0.871      0.845
vader sentiment         0.929*     0.863      0.809
alexopoulos 09          0.943      0.919      0.913
tf idf uncert           0.957      0.922      0.919
Nyman sentiment         0.960      0.923      0.893
Afinn sentiment         0.975      0.970      0.967
husted                  0.979      0.972      0.986
baker bloom davis       0.983      0.978      0.980

Table: Relative RMSEs, based on the estimated factor using each text method. * denotes rejection at the 10% level, ** at the 5% level and *** at the 1% level (Diebold-Mariano test).
We next allow ML models to account for structural breaks using the kernel-based approach of Giraitis et al. (2014).
We adapt popular ML methods for macroeconomic applications. In particular, we examine the support vector regressor (SVR) (Vapnik, 1998) and neural nets (Friedman et al., 2001), and propose a theoretical framework that allows for structural breaks.
A general definition of a multi-layer (deep) neural network follows. Let $x = (x_1, \ldots, x_p)'$ be the input vector and $h_1, \ldots, h_L$ be vectors of activation functions (see types) for each of the $L$ (hidden) layers of the network, representing non-linear transformations of the data. Denote by $g_l$ the $l$-th layer, a vector of functions whose length equals the number of nodes $J_l$ in that layer, with $g_0 = x$. The overall structure of the network is
$$G = g_L(g_{L-1}(\cdots g_1(g_0(\cdot)) \cdots)), \qquad g_l(x) = W_{1,l} \, h_l(W_{2,l} \, x + b_l) \quad \forall \, 1 \le l \le L,$$
where $W_{1,l}$, $W_{2,l}$ and $b_l$ are matrices and vectors of weight parameters.
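The definition translates directly into code. A minimal numpy forward pass, with tanh as an illustrative choice of activation and random weights standing in for estimated ones:

```python
import numpy as np

def layer(x, W1, W2, b, h=np.tanh):
    # g_l(x) = W_{1,l} h_l(W_{2,l} x + b_l)
    return W1 @ h(W2 @ x + b)

def network(x, params):
    # G = g_L(g_{L-1}(... g_1(x) ...)); params holds (W_{1,l}, W_{2,l}, b_l) per layer
    g = x
    for W1, W2, b in params:
        g = layer(g, W1, W2, b)
    return g

rng = np.random.default_rng(1)
p, J = 4, 8  # input dimension and nodes per hidden layer (illustrative sizes)
params = [
    (rng.normal(size=(J, J)), rng.normal(size=(J, p)), rng.normal(size=J)),
    (rng.normal(size=(1, J)), rng.normal(size=(J, J)), rng.normal(size=J)),
]
print(network(rng.normal(size=p), params))  # scalar network output
```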
The model can then be written as (Friedman et al., 2001)
$$y_t = G(x_t, \beta^0) + \varepsilon_t, \quad t = 1, \ldots, T, \quad (2)$$
where $x_t$ is $p \times 1$, $\beta^0$ is $k \times 1$ and contains all model parameters, and $G$ denotes the overall nonlinear mapping. We estimate this model by penalised least squares, i.e.
$$\hat{\beta} = \arg\min_{\beta} \; \frac{\| y - G(X, \beta) \|_2^2}{T} + \lambda \| \beta \|_1,$$
where $y = (y_1, \ldots, y_T)'$ and $G(X, \beta) = (G(x_1, \beta), \ldots, G(x_T, \beta))'$.
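The penalised objective itself is a one-liner. A sketch, assuming `G` is a callable implementing the network above (the minimiser would then be obtained numerically):

```python
import numpy as np

def penalised_loss(beta, y, X, G, lam):
    # ||y - G(X, beta)||_2^2 / T + lam * ||beta||_1
    resid = y - G(X, beta)
    return resid @ resid / len(y) + lam * np.sum(np.abs(beta))
```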
Let the model now be extended to the case
$$y_t = G(x_t, \beta_t^0) + \varepsilon_t, \quad (3)$$
where $\beta_t^0$ is a persistent, bounded, possibly stochastic process with
$$\| \beta_t^0 - \beta_s^0 \| \le C \, \frac{|t - s|}{\min(t, s)} \sup_{s \le h \le t} \| \beta_h^0 \|. \quad (4)$$
We estimate this model by time-varying penalised least squares, i.e.
$$\hat{\beta}_t = \arg\min_{\beta} \; \frac{\| y - G(X, \beta) \|_{w_t, 2}^2}{T} + \lambda \| \beta \|_1,$$
where $\| y - G(X, \beta) \|_{w_t, 2}^2 = \sum_{j=1}^T w_{t,j} \left( y_j - G(x_j, \beta) \right)^2$ and $w_{t,j} = K\!\left( \frac{t - j}{H} \right)$.
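The kernel weights are straightforward to construct. A sketch using a Gaussian kernel as one common choice; the rescaling so that the weights sum to $T$ anticipates the constraint in the weighted SVR dual later on:

```python
import numpy as np

def kernel_weights(T, t, H, K=lambda u: np.exp(-0.5 * u**2)):
    # w_{t,j} = K((t - j)/H), rescaled so that sum_j w_{t,j} = T
    j = np.arange(1, T + 1)
    w = K((t - j) / H)
    return w * (T / w.sum())
```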
$\beta_t^0$ is allowed to be stochastic. We show that kernel-based estimation of $\beta_t^0$ is, in many contexts (regression, ML), consistent and asymptotically normal, even if $\beta_t^0$ is stochastic, provided it satisfies a smoothness condition. We extend existing results to this setting, providing sharp exponential probability inequalities for weighted, randomly scaled sums of mixing and possibly fat-tailed data, which allows for time-varying estimation.
Theorem. Let model (3) with condition (4) hold, and let (i) $\varepsilon_t$ be a martingale difference process that is independent of $x_t$ and (ii) $G$ be a function with bounded first derivatives. Then, for all $t$,
$$\frac{\big\| G(X, \hat{\beta}_t) - G(X, \beta^{(T),0}) \big\|_{w_t, 2}^2}{T} = O_p\!\left( \left( \frac{\log k}{H} \right)^{1/2} \sup_t \| \beta_t^0 \| \right), \quad (5)$$
where $\beta^{(T),0} = (\beta_1^{0\prime}, \ldots, \beta_T^{0\prime})'$ and $G(X, \beta^{(T),0}) = (G(x_1, \beta_1^0), \ldots, G(x_T, \beta_T^0))'$.
Consider the linear model
$$y_t = \beta^{0\prime} x_t + \varepsilon_t. \quad (6)$$
The support vector regression estimator solves
$$\hat{\beta} = \min_{\beta} \| \beta \|^2 \quad \text{s.t.} \quad y_t - \beta' x_t \le \epsilon + \xi_t, \quad \beta' x_t - y_t \le \epsilon + \xi_t^*, \quad \xi_t^*, \xi_t \ge 0,$$
where $\epsilon$ denotes a preselected error margin (tuning parameter) and $\xi_t^*, \xi_t$ are slack variables.
Dual formulation of the problem:
$$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{T} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, x_i' x_j \; - \; \epsilon \sum_{i=1}^{T} (\alpha_i + \alpha_i^*) \; + \; \sum_{i=1}^{T} (\alpha_i - \alpha_i^*) \, y_i$$
subject to $\sum_{i=1}^{T} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \ge 0$. Then
$$\hat{\beta} = \sum_{i=1}^{T} (\alpha_i - \alpha_i^*) \, x_i.$$
The value of the parameter $\epsilon$ defines a margin of tolerance within which no penalty is assigned to errors. The formulation can thus be viewed as a penalised optimisation procedure in which a positive constant controls the penalty imposed on observations lying outside the $\epsilon$-margin, which helps to prevent overfitting (Steinwart and Christmann, 2008).
Following Giraitis et al. (2014) and Kapetanios and Zikes (2018), we incorporate weights $w_{t,j} = K\!\left( \frac{t - j}{H} \right)$, where the kernel $K$ places most weight on the observation of interest and decays for more distant observations, and the bandwidth satisfies $H = o(T)$, $H \to \infty$. The weighted dual becomes
$$\max_{\alpha_t, \alpha_t^*} \; -\frac{1}{2} \sum_{i,j=1}^{T} w_{t,j} w_{t,i} \, (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, x_i' x_j \; - \; \epsilon \sum_{i=1}^{T} w_{t,i} (\alpha_i + \alpha_i^*) \; + \; \sum_{i=1}^{T} w_{t,i} (\alpha_i - \alpha_i^*) \, y_i$$
subject to $\sum_{i=1}^{T} (\alpha_i - \alpha_i^*) = 0$, $\alpha_i, \alpha_i^* \ge 0$, and $\sum_{i=1}^{T} w_{t,i} = T$. Then
$$\hat{\beta}_t = \sum_{i=1}^{T} w_{t,i} (\alpha_{ti} - \alpha_{ti}^*) \, x_i.$$
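In practice, a weighted SVR can be approximated with off-the-shelf tools: scikit-learn's per-observation `sample_weight` rescales the penalty on each slack variable, which mimics, but is not identical to, the weighted dual above. A sketch with illustrative tuning values:

```python
import numpy as np
from sklearn.svm import SVR

def tv_svr_fit(X, y, t, H, C=1.0, eps=0.1):
    """Weighted (time-varying) SVR centred on period t via per-observation weights."""
    T = len(y)
    j = np.arange(1, T + 1)
    w = np.exp(-0.5 * ((t - j) / H) ** 2)  # Gaussian kernel weights
    w *= T / w.sum()                       # normalise: weights sum to T
    model = SVR(kernel="linear", C=C, epsilon=eps)
    model.fit(X, y, sample_weight=w)       # weights rescale each slack penalty
    return model
```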
Dataset²,³
²Focus on portfolios rather than individual assets because they have more stable betas, higher signal-to-noise ratios, and are less prone to missing-data issues. Data from Kenneth French's website.
³150 risk factors at the monthly frequency for the period from July 1976 to December 2017.
Table: Out-of-sample relative RMSEs using time-varying ML and standard ML models. Benchmark: AR(1).

Steps ahead        (1)    (3)    (6)    (9)    (12)
Cnsm  TVSVM      0.750  0.735  0.734  0.733  0.829
      SVM        0.868  0.844  0.830  0.839  0.831
      TVNN       0.940  1.000  1.000  0.890  0.983
      NN         1.106  1.030  1.090  0.925  1.050
      TV-BOOST   1.01   1.299  1.066  0.991  0.966
Manuf TVSVM      0.684  0.864  0.800  0.605  0.798
      SVM        0.892  0.871  0.809  0.811  0.793
      TVNN       1.035  1.068  0.947  0.984  0.887
      NN         0.968  1.299  0.991  0.944  0.936
      TV-BOOST   1.026  0.976  0.960  0.976  0.978
Table: Out-of-sample relative RMSEs using time-varying ML and standard ML models. Benchmark: AR(1).

Steps ahead         (1)    (3)    (6)    (9)    (12)
HiTech TVSVM      0.608  0.604  0.805  0.809  0.813
       SVM        0.829  0.811  0.803  0.822  0.821
       TVNN       0.982  0.987  0.997  0.964  0.908
       NN         0.926  0.927  0.926  0.944  0.983
       TV-BOOST   1.032  0.981  0.991  0.992  0.989
We forecast up to twelve steps ahead using a large panel of survey indicators. We estimate time-varying neural nets and support vector regressions and compare them with a standard AR(1). For comparison, forecasts are also derived using the standard neural nets and support vector regressors under the same specifications.
In-sample period: 2000m1-2009m12

Model    h = 1      h = 3      h = 6      h = 9     h = 12
TVNN     0.750***   0.698*     0.742*     0.840     0.825*
NN       0.843      0.721      0.778      0.843     0.900***
TVSVR    0.945      0.925      0.887      0.915     0.987
SVR      0.716      0.646***   0.664***   0.696*    0.764

Table: Average RMSEs at h = 1, 3, 6, 9, 12 for the time-varying and standard ML models relative to the AR(1). Asterisks report the Diebold and Mariano (1995) test with Harvey's (1997) adjustment for predictive accuracy: * denotes rejection at the 10% level, ** at the 5% level and *** at the 1% level.
We provide an alternative toolbox for interpreting deep neural networks in the context of macroeconomic modelling. In this presentation, we address one of the key criticisms of (deep) neural networks: the limited scope for model interpretation. First preliminary results look quite interesting, so we pursue this idea further. For some dependent variables, variable influence changes at particular points in time (e.g. times of increased volatility) while remaining constant otherwise; for other dependent variables, this variable influence seems to change more frequently.
There is no clear definition of interpretability, but it can be summarised as the degree to which a human can understand the cause of a decision; see Miller (2018). Why do we care about interpretability at all if accuracy is competitive? A single metric such as out-of-sample MSE is an incomplete description; e.g. see Doshi-Velez and Kim (2017). There are, however, many other reasons why interpretability is relevant; e.g. see Molnar (2019).
There already exist various approaches to enhancing model interpretability in ML; examples include individual conditional expectation plots, accumulated local effects and Shapley-based methods (e.g. Goldstein et al., 2015; Apley, 2016; Joseph, 2019).
In this project, we turn our focus explicitly to neural networks in macroeconomics.
The central motivation for this research is two-fold:
1. Universal approximation: Hornik et al. (1989).
2. Time-varying effects: e.g. see Kapetanios (2007).
We therefore hope to make meaningful contributions to current debates by offering time-varying analysis tools.
In its most fundamental form, we describe an arbitrary economic process as
$$y_t = E(y_t \mid x_{t-1}) + \epsilon_t, \quad (8)$$
where
$$E(y_t \mid x_{t-1}) = g(x_{i,t-1}, \theta) \quad (9)$$
is a (non-)linear approximation of the true but unknown DGP. Note the lag in $x$, which is supported by publication delays but also by idiosyncratic persistence. Moreover, we allow $x$ to include lags of $y$. All model parameters, including the weights and biases as well as the hyper-parameters, are denoted by $\theta$. What is $g(\cdot)$?
We investigate the case where we approximate $g(\cdot)$ with a neural network. In general, for feedforward networks, $g(\cdot)$ takes the form
$$g(X, \Theta) = \sigma_L(\ldots \sigma_2(W_2^T \sigma_1(W_1^T X + b_1) + b_2)), \quad (10)$$
where the $\sigma_l$ are activation functions and $l = 1, 2, \ldots, L$ indexes the layers. The weights $W$ and biases $b$ are summarised in $\Theta$, while all model parameters, including weights, biases and other hyper-parameters, are denoted by $\theta$; e.g. see Goodfellow et al. (2016). The considered loss function is
$$\tilde{L}(Y, X, \theta) = L(Y, X, \Theta) + \lambda \Omega(\Theta), \quad \text{with} \quad L(Y, X, \Theta) = (g(X, \Theta) - Y)^T (g(X, \Theta) - Y). \quad (11)$$
With $L_1$ and $L_2$ regularisation respectively, the loss function becomes
$$\tilde{L}(Y, X, \theta) = (g(X, \Theta) - Y)^T (g(X, \Theta) - Y) + \lambda \|\Theta\|_1 \quad (12)$$
$$\tilde{L}(Y, X, \theta) = (g(X, \Theta) - Y)^T (g(X, \Theta) - Y) + \tfrac{1}{2} \lambda \Theta^T \Theta \quad (13)$$
In this first preliminary draft, we consider a baseline set of specifications (more in future drafts). Many other specifications are imaginable!
We propose using partial derivatives at each point in time to evaluate variable influence over time:
$$I_{ij,t} = \frac{\partial g_j(X, \theta)}{\partial x_{i,t-p}}. \quad (14)$$
Note that in this preliminary draft each target variable $j$ ($j = 1, 2, \ldots, N$) has its own $g(\cdot)$; alternatives in which the network output is multidimensional are also imaginable. Due to the inherent non-linearity of the neural network, we expect the derivative to vary over time. The motivation for using the partial derivative is that it can be interpreted as the marginal influence of each input variable on the process at each point in time. It can also be used to test Granger causality.
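Once a network is fitted, $I_{ij,t}$ can be obtained by automatic differentiation or, as in the minimal sketch below, by central finite differences; `g` is assumed to map a $T \times p$ input array to $T$ fitted values:

```python
import numpy as np

def partial_derivatives(g, X, eps=1e-5):
    """Numerical I_{ij,t}: derivative of g at each observation w.r.t. each input."""
    T, p = X.shape
    derivs = np.zeros((T, p))
    for i in range(p):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, i] += eps
        Xm[:, i] -= eps
        derivs[:, i] = (g(Xp) - g(Xm)) / (2 * eps)  # central differences
    return derivs
```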
Moreover, we propose confidence bands as an extension of equation (14). In particular, we use the moving-block bootstrap: blocks of length $b$ are drawn at random with replacement from the $T - b + 1$ overlapping blocks of the original data, where $T$ is the total number of observations, and are concatenated in the order they were drawn to build the bootstrapped observations; e.g. see Kunsch (1989). For each bootstrapped dataset, we fit a neural network as described before. We then calculate the partial derivatives, but with respect to the original input data:
$$I_{ij}^{B,t} = \frac{\partial g_{j,B}(\cdot)}{\partial x_{i,t-p}}, \quad (15)$$
where $B$ indicates the respective bootstrap replication.
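Generating the resampling indices is simple. A sketch of the index construction (the block length $b$ is a tuning choice):

```python
import numpy as np

def moving_block_indices(T, b, rng):
    """Indices of one moving-block bootstrap sample (Kunsch, 1989): draw
    ceil(T/b) of the T - b + 1 overlapping length-b blocks with replacement,
    concatenate in drawn order, truncate to T observations."""
    starts = rng.integers(0, T - b + 1, size=-(-T // b))  # ceil division
    idx = np.concatenate([np.arange(s, s + b) for s in starts])
    return idx[:T]
```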
Similarly to the methodology applied in linear settings, we propose impulse response functions (IRFs) in the context of neural networks. Generally, we apply the framework
$$IRF(h, \nu, \Omega) = E[y_{t+h} \mid \nu_t, \Omega_{t-1}] - E[y_{t+h} \mid \Omega_{t-1}], \quad (16)$$
where $\nu_t$ denotes the structural shocks at time $t$; e.g. see Koop et al. (1996). Given
$$y_t = g(x_{i,t-1}, \theta) + \epsilon_t, \quad (17)$$
with
$$E[\epsilon_t \epsilon_t'] = \Sigma \quad (18)$$
a symmetric, positive definite covariance matrix whose off-diagonals are non-zero, we apply a Cholesky decomposition of the covariance matrix, $\Sigma = PP'$, where $P$ is a lower-triangular matrix. It follows that
$$u_t = P \nu_t, \quad (19)$$
with $u_t$ being the reduced-form residuals; e.g. see Sims (1980).
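Since $g(\cdot)$ is non-linear, the expectations in (16) have no closed form and are typically simulated in the spirit of Koop et al. A Monte Carlo sketch under our assumptions; the function signature and shock convention are illustrative, not the authors' implementation:

```python
import numpy as np

def girf(g, x0, P, shock, horizon, Sigma, n_sim, rng):
    """Simulated GIRF: E[y_{t+h} | nu_t, Omega_{t-1}] - E[y_{t+h} | Omega_{t-1}].
    g maps the lagged vector to next period's fitted values; the identified
    impulse P @ shock is added to the period-0 disturbance of the shocked path."""
    k = len(x0)
    paths = np.zeros((2, n_sim, horizon, k))
    for s in range(n_sim):
        eps = rng.multivariate_normal(np.zeros(k), Sigma, size=horizon)
        for shocked in (0, 1):
            x = x0.copy()
            for h in range(horizon):
                e = eps[h] + (P @ shock if (shocked and h == 0) else 0.0)
                x = g(x) + e
                paths[shocked, s, h] = x
    return paths[1].mean(axis=0) - paths[0].mean(axis=0)  # horizon x k array
```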
We consider the US economy as an empirical example, with GDP, inflation, unemployment, export prices, S&P 500 returns and the Fed Funds rate as our dependent variables, and their lagged values as explanatory variables. The dataset ranges from Q4 1984 to Q2 2019. In particular, we include export prices as another price variable to avoid encountering the price puzzle (e.g. see Sims (1992), Buch et al. (2014), Balke et al. (1994)); however, we find that an impulse response analysis (e.g. using a VAR) occasionally still displays the price puzzle, depending on where we split the data. We transform the variables to make them stationary. In this draft, we feed the network the already-transformed data; in next steps we will experiment with standardised and/or raw data. Since the Cholesky decomposition is sensitive to variable ordering, we use the following ordering:
GDP → CPI → Unemployment → Export Prices → S&P 500 → Rates
Figure: Full Sample (in- and out-of-sample) predictions
Figure: Partial Derivatives: CPI
Figure: Partial Derivatives: GDP
Figure: Partial Derivatives: Rates
The sensitivity pattern differs across dependent variables. For GDP, the network is most sensitive to lagged values of GDP most of the time; it is only during times of increased volatility that the network also becomes more sensitive to changes in other explanatory variables. For the policy rate, the network responds to changes in almost all variables; in absolute value, CPI and Unemployment seem to have the largest influence.
Negative shocks in rates lead to positive responses in CPI, as supported by economic theory. The response is time-dependent: during the GFC, a negative shock in rates cannot offset the effect of the crisis, and CPI falls despite reduced rates.
Rates respond negatively to a negative shock in GDP, as expected by economic theory. The same shock prior to the GFC leads to a lower level of rates than after the crisis.
Rates also respond negatively to a negative shock in CPI, in line with economic theory. The effects of a shock in CPI seem to be more pronounced prior to the GFC.
We find that both partial derivatives and non-linear impulse responses can help shed some light on economic theory. First preliminary results look promising, so we will pursue this project further. There is room for improvement, in particular with regard to model tuning and selection.
Alexopoulos, M., Cohen, J., et al. (2009). Uncertain times, uncertain measures. University of Toronto Department of Economics Working Paper, 352.
Apley, D. W. (2016). Visualizing the effects of predictor variables in black box supervised learning models. arXiv preprint arXiv:1612.08468.
Baker, S. R., Bloom, N., and Davis, S. J. (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics, 131(4):1593-1636.
Balke, N. S., Emery, K. M., et al. (1994). Understanding the price puzzle. Federal Reserve Bank of Dallas Economic Review, Fourth Quarter, pages 15-26.
Buch, C. M., Eickmeier, S., and Prieto, E. (2014). Macroeconomic factors and microlevel bank behavior. Journal of Money, Credit and Banking, 46(4):715-751.
Correa, R., Garud, K., Londono, J. M., Mislang, N., et al. (2017). Constructing a dictionary for financial stability. Technical report, Board of Governors of the Federal Reserve System (US).
Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Feng, G., Giglio, S., and Xiu, D. (2019). Taming the factor zoo: A test of new factors. Technical report, National Bureau of Economic Research.
Friedman, J., Hastie, T., and Tibshirani, R. (2001). The elements of statistical learning, volume 1. Springer Series in Statistics, New York.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189-1232.
Gilbert, C. H. E. (2014). Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International Conference on Weblogs and Social Media (ICWSM-14). Available at (20/04/16): http://comp.social.gatech.edu/papers/icwsm14.
Giraitis, L., Kapetanios, G., and Yates, T. (2014). Inference on stochastic time-varying coefficient models. Journal of Econometrics, 179(1):46–65.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1):44-65.
Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y. (2016). Deep learning, volume 1. MIT Press, Cambridge.
Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366.
Hu, G., Bhargava, P., Fuhrmann, S., Ellinger, S., and Spasojevic, N. (2017). Analyzing users' sentiment towards popular consumer industries and brands on twitter. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on, pages 381-388. IEEE.
Hu, M. and Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168-177. ACM.
Husted, L. F., Rogers, J., and Sun, B. (2017). Monetary policy uncertainty. International Finance Discussion Papers 1215, Board of Governors of the Federal Reserve System (U.S.).
Joseph, A. (2019). Shapley regressions: A framework for statistical inference on machine learning models. arXiv preprint arXiv:1903.04209.
Kapetanios, G. (2007). Measuring conditional persistence in nonlinear time series. Oxford Bulletin of Economics and Statistics, 69(3):363-386.
Kapetanios, G. and Zikes, F. (2018). Time-varying lasso. Economics Letters, 169:1-6.
Kunsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, pages 1217-1241.
Loughran, T. and McDonald, B. (2013). IPO first-day returns, offer price revisions, volatility, and form s-1 language. Journal of Financial Economics, 109(2):307-326.
Miller, T. (2018). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence.
Molnar, C. (2019). Interpretable machine learning. Lulu.com.
Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903.
Nyman, R., Kapadia, S., Tuckett, D., Gregory, D., Ormerod, P., and Smith, R. (2018). News and narratives in financial systems: exploiting big data for systemic risk assessment. Bank of England Staff Working Papers, 704.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica: Journal of the Econometric Society, pages 1-48.
Sims, C. A. (1992). Interpreting the macroeconomic time series facts: The effects of monetary policy. European Economic Review, 36(5):975-1000.
Steinwart, I. and Christmann, A. (2008). Support vector machines. Springer Science & Business Media.
Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3):1139-1168.
Vapnik, V., Burges, C. J., Kaufman, L., Smola, A. J., and Drucker, H. (1997). Support vector regression machines. In Advances in neural information processing systems, pages 155–161.
Text methods (dictionary types: positive and negative, Boolean, computer science-based):
- Financial stability (Correa et al., 2017)
- Economic uncertainty (Alexopoulos et al., 2009)
- VADER sentiment (Gilbert, 2014)
- Finance-oriented (Loughran and McDonald, 2013)
- Monetary policy uncertainty (Husted et al., 2017)
- 'Opinion' sentiment (Hu et al., 2017; Hu and Liu, 2004)
- Afinn sentiment (Nielsen, 2011)
- Economic Policy Uncertainty (Baker et al., 2016)
- Punctuation economy (this paper; see details)
- Harvard IV (used in Tetlock (2007))
- Anxiety-excitement (Nyman et al., 2018)
- Single word counts of "uncertain" and "econom"
- tf-idf applied to "uncertain" and "econom"
True parameters: $\sigma = 1$, $q = 0.9$.

               Model 1                  Model 2
T \ max(k_t)   100    500    1000      100    500    1000
50           0.928  0.750  0.640     0.928  0.757  0.635
100          0.920  0.741  0.642     0.914  0.742  0.640
400          0.921  0.742  0.635     0.920  0.755  0.640
1000         0.914  0.742  0.6357    0.921  0.748  0.647

Table: Average RMSEs of the high-dimensional state space model relative to the comparator models. Model 1 does not include the unstructured dataset ($Z_t$); Model 2 includes the average of $Z_t$.
Activation function types include, e.g., the logistic sigmoid $\frac{1}{1 + \exp(-x)}$.
We use
$$Var(F \mid X_1, \ldots, X_N) = \Sigma_{FF} - \Sigma_{FX} \Sigma_{XX}^{-1} \Sigma_{XF}.$$
Given that $Var(X_i) = 1 + \sigma_i^2$, $Var(\bar{X}) = 1 + \frac{1}{N^2}\sum_{i=1}^N \sigma_i^2$, $Cov(X_i, F) = 1$ and $Cov(\bar{X}, F) = 1$, we have
$$Var(F \mid \bar{X}) = 1 - \frac{1}{1 + \frac{1}{N^2}\sum_{i=1}^N \sigma_i^2},$$
and, applying the Sherman-Morrison formula,
$$Var(F \mid X_1, \ldots, X_N) = 1 - \left( \sum_{i=1}^N \frac{1}{\sigma_i^2} - \frac{\sum_{i=1}^N \sum_{j=1}^N \frac{1}{\sigma_i^2 \sigma_j^2}}{1 + \sum_{i=1}^N \frac{1}{\sigma_i^2}} \right).$$
It holds that $Var(F \mid \bar{X}) \ge Var(F \mid X_1, \ldots, X_N)$, i.e.
$$\sum_{i=1}^N \frac{1}{\sigma_i^2} - \frac{\sum_{i=1}^N \sum_{j=1}^N \frac{1}{\sigma_i^2 \sigma_j^2}}{1 + \sum_{i=1}^N \frac{1}{\sigma_i^2}} \ge \frac{1}{1 + \frac{1}{N^2}\sum_{i=1}^N \sigma_i^2}.$$
If $\sigma_i^2 = \sigma^2$ for all $i$, this becomes
$$\frac{N}{\sigma^2} - \frac{N^2 / \sigma^4}{1 + N / \sigma^2} - \frac{1}{1 + \sigma^2 / N} \ge 0. \quad (20)$$
Set $\alpha = N / \sigma^2$. Then
$$\alpha - \frac{\alpha^2}{1 + \alpha} - \frac{1}{1 + \frac{1}{\alpha}} \ge 0.$$
Multiplying through by $(1 + \alpha)$ and noting that $\frac{1}{1 + 1/\alpha} = \frac{\alpha}{1 + \alpha}$ gives $\alpha(1 + \alpha) - \alpha^2 - \alpha = 0$, so the inequality holds with equality: with homoskedastic idiosyncratic errors, conditioning on $\bar{X}$ is exactly as informative as conditioning on all of $X_1, \ldots, X_N$.