Assessing uncertainty of the temporal EBLUP: a resampling-based approach

Luís N. Pereira, MSc 1; Pedro S. Coelho, PhD 2
1 University of the Algarve - ESGHT, Portugal
2 New University of Lisbon - ISEGI, Portugal

Rome, 11th July 2008

Outline
- Background
- Objectives
- Methods
  - Rao-Yu Model
  - Uncertainty measures of the EBLUP
- Monte Carlo Simulation Study
- Application with real data
- Conclusion
Background

Small Area Estimation (SAE) is about how to produce reliable estimates of domain characteristics when the sample sizes within the domains are very small or even zero.

We need to employ indirect estimators that borrow strength from related areas and time periods through linking models based on auxiliary information, such as recent census and current administrative data.

Such estimators are often based on linear mixed models (LMM).
The Empirical Best Linear Unbiased Prediction (EBLUP) approach is the most popular method for the estimation of the model parameters.

When time-series and cross-sectional data are available, longitudinal LMM are useful.

While EBLUP estimators are easy to obtain, measuring their quality is a challenging problem.

Although there is much theory about measuring the uncertainty of the EBLUP under general LMM, very little research has been done on comparing the approaches.
Objectives

- To introduce a parametric bootstrap procedure and a weighted jackknife method for MSPE estimation of the EBLUP, under a cross-sectional and time-series stationary model
- To compare the performance of the resampling-based methods with the Taylor series-based method
- To apply these methods to real data from the Prices of the Habitation Transaction Survey
Rao-Yu Model

Rao and Yu (1994) proposed the following model:

$$\hat{\theta}_{it} = \theta_{it} + e_{it}, \qquad \theta_{it} = \mathbf{x}'_{it}\boldsymbol{\beta} + v_i + u_{it}, \qquad u_{it} = \rho\, u_{i,t-1} + \varepsilon_{it}, \quad |\rho| < 1,$$

where:
- $\theta_{it}$ is the parameter of inferential interest for the $i$th small area at the $t$th time point ($i = 1, \dots, m$; $t = 1, \dots, T$);
- $\hat{\theta}_{it}$ is its design-unbiased direct survey estimator;
- $e_{it}$ are independent, normally distributed sampling errors with $E(e_{it} \mid \theta_{it}) = 0$;
- $\mathbf{x}_{it}$ ($p \times 1$) is a column vector of area-by-time specific auxiliary variables;
- $\boldsymbol{\beta}$ ($p \times 1$) is a column vector of regression parameters;
- $v_i \overset{iid}{\sim} N(0, \sigma_v^2)$ are random area-specific effects;
- $u_{it}$ are random area-by-time specific effects with $\varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2)$, following a common AR(1) process for each $i$.
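The model above is straightforward to simulate. The sketch below (ours, not the authors' code) draws one data set under the Rao-Yu model, assuming unit sampling variances and a stationary AR(1) start; the defaults mirror the simulation design used later in the talk ($m=28$, $T=7$, $\boldsymbol{\beta} = (1,2)'$):

```python
import numpy as np

def simulate_rao_yu(m=28, T=7, beta=(1.0, 2.0), sigma2_v=0.5,
                    sigma2=0.25, rho=0.4, seed=0):
    """Draw one data set from the Rao-Yu model:
    theta_it = x_it' beta + v_i + u_it,  u_it = rho * u_{i,t-1} + eps_it,
    theta_hat_it = theta_it + e_it, with e_it ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(m, T))           # auxiliary variable
    X = np.stack([np.ones((m, T)), x], axis=-1)      # x_it = (1, x_it)'
    v = rng.normal(0.0, np.sqrt(sigma2_v), size=m)   # area-specific effects
    u = np.empty((m, T))                             # AR(1) area-by-time effects
    u[:, 0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - rho ** 2)), size=m)
    for t in range(1, T):
        u[:, t] = rho * u[:, t - 1] + rng.normal(0.0, np.sqrt(sigma2), size=m)
    theta = X @ np.asarray(beta) + v[:, None] + u    # true small-area parameters
    theta_hat = theta + rng.normal(size=(m, T))      # direct survey estimates
    return X, theta, theta_hat
```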
They showed that the model can be expressed in matrix form as:

$$\hat{\boldsymbol{\theta}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{v} + \mathbf{u} + \mathbf{e},$$

with $\mathbf{v} \sim N(\mathbf{0}, \sigma_v^2 \mathbf{I}_m)$, $\mathbf{u} \sim N(\mathbf{0}, \sigma^2 (\mathbf{I}_m \otimes \boldsymbol{\Gamma}))$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{R})$, where $\mathbf{R} = \mathrm{diag}_{1 \le i \le m;\, 1 \le t \le T}(\sigma_{it}^2)$ and $\boldsymbol{\Gamma} = \{\rho^{|i-j|}\}/(1-\rho^2)$.

Assuming $e_{it} \overset{iid}{\sim} N(0, 1)$, then:

$$\mathbf{V} = \mathrm{Cov}(\hat{\boldsymbol{\theta}}) = \mathrm{blockdiag}_{1 \le i \le m}(\mathbf{V}_i), \qquad \mathbf{V}_i = \sigma_v^2 \mathbf{J}_T + \sigma^2 \boldsymbol{\Gamma} + \mathbf{I}_T.$$

Assuming that $\rho$ is known, the vector of variance components is $\boldsymbol{\psi} = [\sigma_v^2(\rho),\, \sigma^2(\rho)]'$.
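The covariance structure above can be assembled directly from the definitions (a minimal sketch assuming unit sampling variances, i.e. R = I; function names are ours):

```python
import numpy as np

def gamma_matrix(T, rho):
    """Gamma = {rho^|t-s|} / (1 - rho^2): covariance block of the
    stationary AR(1) process (up to the factor sigma^2)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

def V_i(T, sigma2_v, sigma2, rho):
    """V_i = sigma2_v * J_T + sigma2 * Gamma + I_T, the T x T block of
    Cov(theta_hat) for one area under unit sampling variances."""
    return (sigma2_v * np.ones((T, T))
            + sigma2 * gamma_matrix(T, rho)
            + np.eye(T))
```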
The BLUP estimator (Rao and Yu, 1994):

$$\tilde{\theta}_{it}(\boldsymbol{\psi}) = \mathbf{x}'_{it}\tilde{\boldsymbol{\beta}} + (\sigma_v^2 \mathbf{1}'_T + \sigma^2 \boldsymbol{\gamma}'_t)\mathbf{V}_i^{-1}(\hat{\boldsymbol{\theta}}_i - \mathbf{X}_i\tilde{\boldsymbol{\beta}}),$$

where $\boldsymbol{\gamma}_t$ is the $t$th row of $\boldsymbol{\Gamma} = \{\rho^{|i-j|}\}/(1-\rho^2)$ and $\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\hat{\boldsymbol{\theta}}$.

The EBLUP estimator (Rao and Yu, 1994), where $\boldsymbol{\psi}$ was estimated through a method of moments:

$$\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}) = \mathbf{x}'_{it}\hat{\boldsymbol{\beta}} + (\hat{\sigma}_v^2 \mathbf{1}'_T + \hat{\sigma}^2 \hat{\boldsymbol{\gamma}}'_t)\hat{\mathbf{V}}_i^{-1}(\hat{\boldsymbol{\theta}}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}}),$$

where $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{V}}^{-1}\hat{\boldsymbol{\theta}}$.
Analytical approximation of the MSPE

Under normality of the $\{e_{it}\}$, $\{v_i\}$ and $\{u_{it}\}$ (Kackar and Harville, 1984):

$$\mathrm{MSE}[\tilde{\theta}_{it}(\boldsymbol{\psi})] = g_{1it}(\boldsymbol{\psi}) + g_{2it}(\boldsymbol{\psi}),$$

$$\mathrm{MSE}[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}})] = g_{1it}(\boldsymbol{\psi}) + g_{2it}(\boldsymbol{\psi}) + E[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}) - \tilde{\theta}_{it}(\boldsymbol{\psi})]^2,$$

with

$$g_{1it}(\boldsymbol{\psi}) = \sigma_v^2 + \sigma^2(1-\rho^2)^{-1} - (\sigma_v^2\mathbf{1}'_T + \sigma^2\boldsymbol{\gamma}'_t)\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t), \quad [O(1)]$$

$$g_{2it}(\boldsymbol{\psi}) = [\mathbf{x}'_{it} - (\sigma_v^2\mathbf{1}'_T + \sigma^2\boldsymbol{\gamma}'_t)\mathbf{V}_i^{-1}\mathbf{X}_i](\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}[\mathbf{x}'_{it} - (\sigma_v^2\mathbf{1}'_T + \sigma^2\boldsymbol{\gamma}'_t)\mathbf{V}_i^{-1}\mathbf{X}_i]'. \quad [O(m^{-1})]$$

In the context of the Rao-Yu model, the term $E[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}) - \tilde{\theta}_{it}(\boldsymbol{\psi})]^2$ cannot be analytically evaluated (Rao and Yu, 1994).
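The leading terms $g_{1it}$ and $g_{2it}$ can be evaluated directly from the formulas above. A sketch (ours) under the same unit-sampling-variance assumption:

```python
import numpy as np

def g1_g2(X, sigma2_v, sigma2, rho):
    """Leading MSPE terms of the temporal BLUP, with sampling
    variances set to 1 (so g1 is the same for every area)."""
    m, T, p = X.shape
    Gamma = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T))) \
        / (1.0 - rho ** 2)
    Vi = sigma2_v * np.ones((T, T)) + sigma2 * Gamma + np.eye(T)
    Vi_inv = np.linalg.inv(Vi)
    b = sigma2_v * np.ones((T, T)) + sigma2 * Gamma   # row t: sigma2_v 1' + sigma2 gamma_t'
    # g1_t = sigma2_v + sigma2/(1-rho^2) - b_t' Vi^{-1} b_t
    g1 = np.array([sigma2_v + sigma2 / (1.0 - rho ** 2) - b[t] @ Vi_inv @ b[t]
                   for t in range(T)])
    XtVX_inv = np.linalg.inv(sum(X[i].T @ Vi_inv @ X[i] for i in range(m)))
    g2 = np.empty((m, T))
    for i in range(m):
        for t in range(T):
            d = X[i, t] - (b[t] @ Vi_inv) @ X[i]      # x_it' - b_t' Vi^{-1} X_i
            g2[i, t] = d @ XtVX_inv @ d
    return g1, g2
```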
Rao and Yu (1994) obtained the following $O(m^{-1})$ approximation:

$$E[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}) - \tilde{\theta}_{it}(\boldsymbol{\psi})]^2 \approx g_{3it}(\boldsymbol{\psi}) = \mathrm{tr}(\mathbf{A}_t \boldsymbol{\Sigma}^*),$$

where $\mathbf{A}_t$ is a $2 \times 2$ matrix with

$$a_{11t} = [\boldsymbol{\gamma}_t - \boldsymbol{\Gamma}\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)]'\,\mathbf{V}_i^{-1}\,[\boldsymbol{\gamma}_t - \boldsymbol{\Gamma}\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)],$$

$$a_{22t} = [\mathbf{1}_T - \mathbf{J}_T\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)]'\,\mathbf{V}_i^{-1}\,[\mathbf{1}_T - \mathbf{J}_T\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)],$$

$$a_{12t} = a_{21t} = [\boldsymbol{\gamma}_t - \boldsymbol{\Gamma}\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)]'\,\mathbf{V}_i^{-1}\,[\mathbf{1}_T - \mathbf{J}_T\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)],$$

and $\boldsymbol{\Sigma}^* = \mathrm{Cov}[\hat{\sigma}_v^2(\rho);\, \hat{\sigma}^2(\rho)]'$.

Rao and Yu (1994) proposed the following approximately unbiased estimator:

$$\mathrm{mspe}_{RY}[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}})] = g_{1it}(\hat{\boldsymbol{\psi}}) + g_{2it}(\hat{\boldsymbol{\psi}}) + 2\,g_{3it}(\hat{\boldsymbol{\psi}}).$$
Bootstrap approximation of the MSPE

Following the ideas of Butar and Lahiri (2003) and González-Manteiga et al. (2008), the parametric bootstrap procedure works as follows:

1) Estimate $\hat{\sigma}_v^2(\rho)$ and $\hat{\sigma}^2(\rho)$ using the method of moments, and then estimate $\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\beta}}(\hat{\boldsymbol{\psi}}, \hat{\boldsymbol{\theta}})$ based on the Rao-Yu model
2) Generate $\mathbf{v}^* \sim N(\mathbf{0}, \hat{\sigma}_v^2 \mathbf{I}_m)$
3) Generate $\mathbf{u}^*$ from $\boldsymbol{\varepsilon}^* \sim N(\mathbf{0}, \hat{\sigma}^2 \mathbf{I})$, independently of $\mathbf{v}^*$
4) Generate $\mathbf{e}^* \sim N(\mathbf{0}, \mathbf{I})$, independently of $\mathbf{v}^*$ and $\mathbf{u}^*$. Then construct $\hat{\boldsymbol{\theta}}^* = \mathbf{X}\hat{\boldsymbol{\beta}} + \mathbf{Z}\mathbf{v}^* + \mathbf{u}^* + \mathbf{e}^*$, assuming that $\rho$ is known
5) Construct the bootstrap data $(\hat{\boldsymbol{\theta}}^*, \mathbf{X})$
6) Fit the model to $\hat{\boldsymbol{\theta}}^*$ and estimate $\hat{\boldsymbol{\beta}}_B^* = \hat{\boldsymbol{\beta}}(\hat{\boldsymbol{\psi}}, \hat{\boldsymbol{\theta}}^*)$
7) Estimate $\hat{\boldsymbol{\psi}}^* = [\hat{\sigma}_v^{2*}, \hat{\sigma}^{2*}]'$ from $\hat{\boldsymbol{\theta}}^*$. Then fit the model to $\hat{\boldsymbol{\theta}}^*$ and estimate $\hat{\boldsymbol{\beta}}_E^* = \hat{\boldsymbol{\beta}}(\hat{\boldsymbol{\psi}}^*, \hat{\boldsymbol{\theta}}^*)$
8) Compute the bootstrap temporal BLUP from $\hat{\boldsymbol{\theta}}^*$:
$$\tilde{\theta}_{it,B}^*(\hat{\boldsymbol{\psi}}) = \mathbf{x}'_{it}\hat{\boldsymbol{\beta}}_B^* + (\hat{\sigma}_v^2\mathbf{1}'_T + \hat{\sigma}^2\hat{\boldsymbol{\gamma}}'_t)\mathbf{V}_i^{-1}(\hat{\boldsymbol{\psi}})(\hat{\boldsymbol{\theta}}_i^* - \mathbf{X}_i\hat{\boldsymbol{\beta}}_B^*)$$
9) Compute the bootstrap temporal EBLUP from $\hat{\boldsymbol{\theta}}^*$:
$$\tilde{\theta}_{it,E}^*(\hat{\boldsymbol{\psi}}^*) = \mathbf{x}'_{it}\hat{\boldsymbol{\beta}}_E^* + (\hat{\sigma}_v^{2*}\mathbf{1}'_T + \hat{\sigma}^{2*}\hat{\boldsymbol{\gamma}}_t^{*\prime})\mathbf{V}_i^{-1}(\hat{\boldsymbol{\psi}}^*)(\hat{\boldsymbol{\theta}}_i^* - \mathbf{X}_i\hat{\boldsymbol{\beta}}_E^*)$$
10) Repeat steps 2)-9) $B$ times, obtaining $\tilde{\theta}_{it,E}^{*}(\hat{\boldsymbol{\psi}}_b^*)$ and $\tilde{\theta}_{it,B}^{*}(\hat{\boldsymbol{\psi}})_b$ ($b = 1, \dots, B$)
11) Calculate a bootstrap estimate of $g_{3it}$:
$$\tilde{g}_{3it}^* = B^{-1}\sum_{b=1}^{B}\big[\tilde{\theta}_{it,E}^{*}(\hat{\boldsymbol{\psi}}_b^*) - \tilde{\theta}_{it,B}^{*}(\hat{\boldsymbol{\psi}})_b\big]^2$$

Following the lines of Butar and Lahiri (2003), a bias-corrected bootstrap estimator is:

$$\mathrm{mspe}_B[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}})] = 2\big[g_{1it}(\hat{\boldsymbol{\psi}}) + g_{2it}(\hat{\boldsymbol{\psi}})\big] - B^{-1}\sum_{b=1}^{B}\big[g_{1it}(\hat{\boldsymbol{\psi}}_b^*) + g_{2it}(\hat{\boldsymbol{\psi}}_b^*)\big] + \tilde{g}_{3it}^*.$$
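Steps 2)-11) can be sketched as follows (ours, assuming $\rho$ known and unit sampling variances; `fit_moments` is a placeholder for the method-of-moments estimator of $\boldsymbol{\psi}$, which the slides do not spell out, and `eblup` a placeholder for the predictor of steps 8-9):

```python
import numpy as np

def bootstrap_g3(theta_hat, X, sigma2_v_hat, sigma2_hat, rho,
                 fit_moments, eblup, B=250, seed=0):
    """Parametric bootstrap estimate of g3_it: average squared gap
    between the bootstrap EBLUP (psi re-estimated) and the bootstrap
    BLUP (psi held at the original estimate)."""
    rng = np.random.default_rng(seed)
    m, T, p = X.shape
    _, beta_hat = eblup(theta_hat, X, sigma2_v_hat, sigma2_hat, rho)  # step 1
    g3_draws = np.empty((B, m, T))
    for b in range(B):
        # steps 2-4: regenerate v*, u*, e* from the fitted model
        v = rng.normal(0.0, np.sqrt(sigma2_v_hat), m)
        u = np.empty((m, T))
        u[:, 0] = rng.normal(0.0, np.sqrt(sigma2_hat / (1.0 - rho ** 2)), m)
        for t in range(1, T):
            u[:, t] = rho * u[:, t - 1] + rng.normal(0.0, np.sqrt(sigma2_hat), m)
        theta_star_hat = X @ beta_hat + v[:, None] + u + rng.normal(size=(m, T))
        # steps 6-9: BLUP at psi_hat vs. EBLUP at re-estimated psi*
        blup_star, _ = eblup(theta_star_hat, X, sigma2_v_hat, sigma2_hat, rho)
        s2v_star, s2_star = fit_moments(theta_star_hat, X, rho)
        eblup_star, _ = eblup(theta_star_hat, X, s2v_star, s2_star, rho)
        g3_draws[b] = (eblup_star - blup_star) ** 2
    return g3_draws.mean(axis=0)                     # step 11
```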
Jackknife approximation of the MSPE

Following the ideas of Jiang et al. (2002) and Chen and Lahiri (2008), the Taylor series approximation of the jackknife MSPE estimator of the EBLUP is:

$$\mathrm{mspe}_J[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}})] = g_{1it}(\hat{\boldsymbol{\psi}}) + g_{2it}(\hat{\boldsymbol{\psi}}) - [\nabla g_{1t}(\hat{\boldsymbol{\psi}})]'\hat{\mathbf{c}}_{WJ} + \mathrm{tr}[\mathbf{A}_t(\hat{\boldsymbol{\psi}})\hat{\boldsymbol{\upsilon}}_{WJ}] + \mathrm{tr}\big[\hat{\mathbf{L}}_t(\hat{\boldsymbol{\psi}})(\hat{\boldsymbol{\theta}}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}})(\hat{\boldsymbol{\theta}}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}})'\hat{\mathbf{L}}_t(\hat{\boldsymbol{\psi}})'\hat{\boldsymbol{\upsilon}}_{WJ}\big],$$

where:

$$\hat{\mathbf{c}}_{WJ} = \sum_{j=1}^{m} w_{j,it}(\hat{\boldsymbol{\psi}}_{-j} - \hat{\boldsymbol{\psi}}), \qquad \hat{\boldsymbol{\upsilon}}_{WJ} = \sum_{j=1}^{m} w_{j,it}(\hat{\boldsymbol{\psi}}_{-j} - \hat{\boldsymbol{\psi}})(\hat{\boldsymbol{\psi}}_{-j} - \hat{\boldsymbol{\psi}})',$$

$\hat{\boldsymbol{\psi}}_{-j}$ is the estimator of $\boldsymbol{\psi}$ after deleting the $j$th small-area data, and $w_{j,it}$ are weights that satisfy $w_{j,it} = 1 + O(m^{-1})$.

Here $\nabla g_{1t}(\boldsymbol{\psi}) = (\partial g_{1it}/\partial \sigma_v^2,\; \partial g_{1it}/\partial \sigma^2)'$ is the gradient of $g_{1it}(\boldsymbol{\psi})$, with:

$$\partial g_{1it}/\partial \sigma^2 = (1-\rho^2)^{-1} - 2\boldsymbol{\gamma}'_t\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t) + (\sigma_v^2\mathbf{1}'_T + \sigma^2\boldsymbol{\gamma}'_t)\mathbf{V}_i^{-1}\boldsymbol{\Gamma}\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t),$$

$$\partial g_{1it}/\partial \sigma_v^2 = 1 - 2\,\mathbf{1}'_T\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t) + (\sigma_v^2\mathbf{1}'_T + \sigma^2\boldsymbol{\gamma}'_t)\mathbf{V}_i^{-1}\mathbf{J}_T\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t),$$

and $\mathbf{L}_t(\boldsymbol{\psi}) = (\partial \mathbf{b}_t/\partial \sigma_v^2,\; \partial \mathbf{b}_t/\partial \sigma^2)'$ is a partitioned matrix of the two $T \times 1$ blocks, with:

$$\partial \mathbf{b}_t/\partial \sigma^2 = \mathbf{V}_i^{-1}\big[\boldsymbol{\gamma}_t - \boldsymbol{\Gamma}\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)\big],$$

$$\partial \mathbf{b}_t/\partial \sigma_v^2 = \mathbf{V}_i^{-1}\big[\mathbf{1}_T - \mathbf{J}_T\mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)\big],$$

where $\mathbf{b}_t(\boldsymbol{\psi}) = \mathbf{V}_i^{-1}(\sigma_v^2\mathbf{1}_T + \sigma^2\boldsymbol{\gamma}_t)$.
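The jackknife quantities $\hat{\mathbf{c}}_{WJ}$ and $\hat{\boldsymbol{\upsilon}}_{WJ}$ amount to refitting the variance components $m$ times, once with each area deleted. A sketch (ours), using the simple equal weights $w_j = (m-1)/m$; `fit_psi` is a placeholder for the method-of-moments estimator of $(\sigma_v^2, \sigma^2)$:

```python
import numpy as np

def jackknife_moments(theta_hat, X, fit_psi):
    """Delete-one-area jackknife: c_WJ = sum_j w_j (psi_{-j} - psi_hat)
    and upsilon_WJ = sum_j w_j (psi_{-j} - psi_hat)(psi_{-j} - psi_hat)'."""
    m = theta_hat.shape[0]
    psi_hat = np.asarray(fit_psi(theta_hat, X))
    keep = np.arange(m)
    diffs = np.empty((m, 2))
    for j in range(m):
        mask = keep != j                              # delete area j
        diffs[j] = np.asarray(fit_psi(theta_hat[mask], X[mask])) - psi_hat
    w = (m - 1) / m                                   # simple weights
    c_wj = w * diffs.sum(axis=0)
    ups_wj = w * diffs.T @ diffs
    return c_wj, ups_wj
```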
Monte Carlo simulation study

Simulation design: $m = 28$; $T = 7$; $p = 2$; $\boldsymbol{\beta} = (1, 2)'$; $\sigma_v^2 \in \{.5,\, 1.\}$; $\sigma^2 \in \{.00,\, .25,\, .50,\, 1.00\}$; $\rho \in \{.0,\, .2,\, .4,\, .8\}$; $x_{it} \overset{iid}{\sim} \mathrm{Unif}(0, 1)$; $\mathbf{x}_{it} = (1, x_{it})'$.

We computed for each data set ($l = 1, \dots, 1000$): $\mathrm{mspe}_{RY}^{(l)}[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}^{(l)})]$, $\mathrm{mspe}_{J1}^{(l)}[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}^{(l)})]$ and $\mathrm{mspe}_{J2}^{(l)}[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}^{(l)})]$.
For the jackknife estimators, two sets of weights were used (defining J1 and J2, respectively):

$$w_{j,it} = \frac{m-1}{m}, \qquad w_{j,it} = 1 - \mathbf{x}'_{j,it}\Big(\sum_{i=1}^{m}\sum_{t=1}^{T}\mathbf{x}_{it}\mathbf{x}'_{it}\Big)^{-1}\mathbf{x}_{j,it}.$$

We generated $B = 250$ bootstrap data sets, and then computed for each initial data set ($l = 1, \dots, 1000$): $\mathrm{mspe}_{B1}^{(l)}[\tilde{\theta}_{it}(\hat{\boldsymbol{\psi}}^{(l)})]$, with $\tilde{g}_{3it}^* = B^{-1}\sum_{b=1}^{B}[\tilde{\theta}_{it,E}^*(\hat{\boldsymbol{\psi}}_b^*) - \tilde{\theta}_{it,B}^*(\hat{\boldsymbol{\psi}})_b]^2$.

The MSPE estimators were evaluated through:

$$\mathrm{AARB}_a = \frac{100}{mT}\sum_{i=1}^{m}\sum_{t=1}^{T}\left|\frac{L^{-1}\sum_{l=1}^{L}\mathrm{mspe}_{it}^{a(l)} - \mathrm{MSPE}_{it}}{\mathrm{MSPE}_{it}}\right|,$$

$$\mathrm{ARMSE}_a = \frac{100}{mTL}\sum_{i=1}^{m}\sum_{t=1}^{T}\sum_{l=1}^{L}\frac{\big(\mathrm{mspe}_{it}^{a(l)} - \mathrm{MSPE}_{it}\big)^2}{\mathrm{MSPE}_{it}}, \qquad a \in \{RY, B1, J1, J2\}.$$
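The two evaluation criteria take only a few lines to compute. A sketch (ours; the array layout is an assumption):

```python
import numpy as np

def aarb_armse(mspe_draws, mspe_true):
    """Evaluation criteria from the simulation study.
    mspe_draws: (L, m, T) MSPE estimates over L simulated data sets;
    mspe_true:  (m, T) empirical MSPE used as the reference."""
    rel_bias = (mspe_draws.mean(axis=0) - mspe_true) / mspe_true
    aarb = 100.0 * np.abs(rel_bias).mean()                       # AARB (%)
    armse = 100.0 * ((mspe_draws - mspe_true) ** 2
                     / mspe_true).mean()                         # ARMSE (%)
    return aarb, armse
```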
RBN denotes the percentage of areas where the relative bias (RB) is negative.
Table 1: RBN, AARB and ARMSE of MSPE estimators, ρ=0.0

            σ_v²   PR-MSPE  B1-MSPE  J1-MSPE  J2-MSPE
RBN (%)     0.5    77.679   54.607   38.000   47.500
            1.0    74.464   57.964   69.143   61.750
AARB (%)    0.5    25.042   35.209   22.617   22.575
            1.0    16.127   16.901   13.781   12.851
ARMSE (%)   0.5     4.464    8.789    4.313    3.888
            1.0     3.089    4.295    2.093    2.012
Figure 1: Box-and-whisker plots of RB of MSPE estimators, ρ=0.0
Table 2: RBN, AARB and ARMSE of MSPE estimators, ρ=0.2

            σ²     σ_v²   RY-MSPE  B1-MSPE  J1-MSPE  J2-MSPE
RBN (%)     0.25   0.5    49.408   40.194   51.449   46.235
                   1.0    44.184   41.378   39.408   39.663
            0.50   0.5    48.490   42.265   44.867   44.673
                   1.0    39.837   24.224   37.347   36.939
            1.00   0.5    41.133   19.214   38.694   38.204
                   1.0    31.500   10.622   30.010   29.286
AARB (%)    0.25   0.5    26.388   32.833   30.168   31.100
                   1.0    26.700   30.287   32.105   31.985
            0.50   0.5    16.401   26.399   17.374   17.278
                   1.0    17.176   24.605   18.333   18.286
            1.00   0.5     9.770   14.992   10.156   10.122
                   1.0    10.854   21.099   11.399   11.431
ARMSE (%)   0.25   0.5     2.886    4.578    4.466    4.678
                   1.0     2.999    3.866    4.786    4.774
            0.50   0.5     1.677    1.690    1.864    1.838
                   1.0     1.762    3.582    2.006    1.993
            1.00   0.5     0.752    1.619    0.811    0.805
                   1.0     0.903    3.242    0.995    1.001
Figure 2: Box-and-whisker plots of RB of MSPE estimators, ρ=0.2
Table 3: RBN, AARB and ARMSE of MSPE estimators, ρ=0.4

            σ²     σ_v²   RY-MSPE  B1-MSPE  J1-MSPE  J2-MSPE
RBN (%)     0.25   0.5    76.541   71.980   67.612   67.367
                   1.0    73.898   67.408   65.296   67.449
            0.50   0.5    73.337   66.969   64.337   63.908
                   1.0    74.857   68.724   67.949   67.612
            1.00   0.5    51.418   44.806   47.235   47.020
                   1.0    56.816   46.520   52.735   52.449
AARB (%)    0.25   0.5    31.607   35.633   38.391   38.503
                   1.0    28.119   28.938   33.028   33.086
            0.50   0.5    22.858   21.495   25.819   25.864
                   1.0    20.858   17.046   23.231   23.248
            1.00   0.5    12.236   11.776   13.477   13.508
                   1.0    11.549   10.464   12.758   12.785
ARMSE (%)   0.25   0.5     4.587    6.693    7.559    7.632
                   1.0     3.872    4.715    5.812    6.005
            0.50   0.5     3.394    3.453    4.854    4.881
                   1.0     3.002    2.455    4.809    4.849
            1.00   0.5     1.176    1.092    1.439    1.446
                   1.0     1.108    0.918    1.351    1.357
Figure 3: Box-and-whisker plots of RB of MSPE estimators, ρ=0.4
Application with real data
Real time series obtained from the Prices of the Habitation Transaction Survey (PHTS) and the Prices of Bank Evaluation in the Habitation Survey (PBEHS) were used.

Data are available on a quarterly basis (T=7). The main goal is to estimate the mean price of habitation transactions at NUTS III level; the auxiliary variable is the mean price of bank evaluation at NUTS III level. 28 NUTS III regions were used as domains of interest (m=28).
Table 4: Sample size, mean estimates and coefficients of variation for the EBLUP estimator

Domain  n_i   μ̂_it   CV       Domain  n_i   μ̂_it   CV
  1       1    646   5.9%       15     39    735   3.8%
  2       1    714   5.3%       16     34    867   3.6%
  3       7    661   5.6%       17     17    814   3.5%
  4       6    718   5.2%       18     40    876   3.4%
  5      31    763   5.1%       19     12    960   3.4%
  6      18    704   5.1%       20     24    974   3.1%
  7      19    670   4.9%       21     77    937   2.5%
  8      56    756   4.6%       22     49    866   2.5%
  9      12    769   4.4%       23     26   1128   2.4%
 10      23    820   4.4%       24     90    956   1.8%
 11      19    804   4.3%       25     89   1172   1.6%
 12      17    666   4.2%       26    405   1041   1.3%
 13      22    710   4.1%       27    263   1073   1.0%
 14      27    658   3.9%       28    488   1321   0.7%
Figure 4: Coefficients of variation for the EBLUP estimator
Conclusion
- It is difficult to find one MSPE estimator which performs better than the others in both bias and precision;
- All estimators have absolute relative biases and MSEs of the same order of magnitude;
- The resampling-based approaches outperform the asymptotic analytical approximation in several situations;
- The bootstrap estimator tends to show a performance similar to the jackknife estimators;
- It seems suitable to use resampling-based methods to estimate the uncertainty of the temporal EBLUP, as an alternative to estimators based on long analytical developments.
Further Research
- Assess different measures of uncertainty for:
  - different numbers of small areas and time points;
  - unknown ρ.
- Use resampling-based methods under more complex longitudinal small area models in which it is impossible to obtain analytical approximations of the MSPE of the EBLUP.
References
Butar, F.B., & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small area estimators. Journal of Statistical Planning and Inference, 112, 63-76.

Chen, S., & Lahiri, P. (2008). On mean squared prediction error estimation in small area estimation problems. Communications in Statistics - Theory and Methods, 37, 1792-1798.

González-Manteiga, W., Lombardía, M., Molina, I., Morales, D., & Santamaría, L. (2008). Bootstrap mean squared error of a small-area EBLUP. Journal of Statistical Computation and Simulation, 78, 443-462.

Jiang, J., Lahiri, P., & Wan, S.-M. (2002). A unified jackknife theory for empirical best prediction with M-estimation. The Annals of Statistics, 30, 1782-1810.

Kackar, R.N., & Harville, D.A. (1984). Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association, 79, 853-862.

Prasad, N.G.N., & Rao, J.N.K. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163-171.

Rao, J.N.K., & Yu, M. (1994). Small-area estimation by combining time-series and cross-sectional data. The Canadian Journal of Statistics, 22(4), 511-528.