SLIDE 1

Robust Fay Herriot Estimators in Small Area Estimation

Sebastian Warnholz, Statistical Consultancy – FU Berlin, 5th May 2016

SLIDE 2

Outline

◮ Small Area Estimation
◮ Area Level Models
◮ Robust Area Level Models
◮ Example & Simulation Study

S3RI Research Seminars

SLIDE 3

Small Area Estimation

◮ SAE: Estimation of population parameters for small domains / areas
◮ Problem: Direct estimates may have insufficient precision (high variance)
  ◮ Estimates may be based on survey data that was not designed to make predictions for small domains
  ◮ Very few or no sampled units are available within the target domains
◮ Methods used in SAE borrow strength to improve domain predictions by
  ◮ using additional data sources
  ◮ exploiting correlation structures (space and time)
  ◮ often using models

SLIDE 4

Models in SAE

◮ Area level models:
  ◮ Use information on the area level, e.g. aggregates such as a direct estimator
  ◮ Are used when unit level information is not available
  ◮ May be useful to reduce computational complexity
◮ Unit level models:
  ◮ Use the sampled observations directly
  ◮ May provide more precise parameter estimates due to the larger number of observations

SLIDE 5

Area Level Models

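◮ Fay and Herriot (1979):
  ◮ $\bar y_i = \theta_i + e_i$, with $e_i \sim N(0, \sigma^2_{e_i})$, $i = 1, \dots, D$
  ◮ $\theta_i = x_i^\top \beta + v_i$, with $v_i \sim N(0, \sigma^2_v)$
◮ Combining both, an estimator for the population mean can be derived:
  $$\hat\theta_i^{FH} = \hat\gamma_i \bar y_i + (1 - \hat\gamma_i)\, x_i^\top \hat\beta, \qquad \hat\gamma_i = \frac{\hat\sigma^2_v}{\hat\sigma^2_v + \sigma^2_{e_i}}$$
◮ When $\sigma^2_{e_i} \gg \hat\sigma^2_v$, we rely more on the synthetic estimator $x_i^\top \hat\beta$
◮ When $\sigma^2_{e_i} \ll \hat\sigma^2_v$, the direct estimator $\bar y_i$ is preferred
◮ $\sigma^2_{e_i}$ is assumed to be known under the model; in practice we may use the estimated sampling variance of the direct estimator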
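A minimal sketch of the FH estimator above, using NumPy. The slides do not say how $\hat\sigma^2_v$ is estimated; here we assume the Prasad-Rao moment estimator (truncated at zero) and a GLS estimate of $\beta$. The function names are hypothetical.

```python
import numpy as np


def prasad_rao_sigma2_v(y, X, sigma2_e):
    """Moment estimator of sigma^2_v (Prasad and Rao), truncated at zero."""
    D, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)            # OLS residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages x_i^T (X^T X)^{-1} x_i
    est = (resid @ resid - np.sum(sigma2_e * (1.0 - h))) / (D - p)
    return max(est, 0.0)


def fay_herriot(y, X, sigma2_e, sigma2_v=None):
    """FH estimator: theta_i = gamma_i * y_i + (1 - gamma_i) * x_i^T beta."""
    if sigma2_v is None:
        sigma2_v = prasad_rao_sigma2_v(y, X, sigma2_e)
    w = 1.0 / (sigma2_v + sigma2_e)                                  # diagonal of V^{-1}
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))    # GLS estimate
    gamma = sigma2_v / (sigma2_v + sigma2_e)                         # shrinkage factors
    theta = gamma * y + (1.0 - gamma) * (X @ beta)
    return theta, beta, gamma, sigma2_v
```

Passing `sigma2_v=0.0` forces $\hat\gamma_i = 0$, i.e. the purely synthetic estimator, which is a quick way to check the shrinkage logic.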

SLIDE 6

Outliers in Area Level Models

$$\bar y_i = x_i^\top \beta + v_i + e_i$$

◮ Area level outliers are outliers in the random effect $v_i$, i.e. all units within a domain are outlying
  ◮ Here a robust method can be beneficial
◮ Unit level outliers are outliers in $e_i$, i.e. single units
  ◮ If we use estimated sampling variances for $\sigma^2_{e_i}$, an outlying unit inflates the estimated variance, so $\hat\gamma_i$ shrinks and the FH model automatically shifts weight towards the synthetic estimator
  ◮ When the sampling variances are unreliable, they may be replaced by a more stable estimate based on generalised variance functions
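A toy illustration of the unit level case, with made-up numbers: one outlying unit inflates the estimated sampling variance $s^2/n$ of the direct estimator and therefore pushes $\hat\gamma_i$ towards zero. We assume simple random sampling and ignore the finite population correction; `sigma2_v` is a hypothetical value, not an estimate.

```python
import statistics


def direct_and_variance(sample):
    """Sample mean and its estimated variance s^2 / n (SRS, no finite population correction)."""
    n = len(sample)
    return statistics.fmean(sample), statistics.variance(sample) / n


def shrinkage(sigma2_v, sigma2_e):
    """FH shrinkage factor gamma = sigma2_v / (sigma2_v + sigma2_e)."""
    return sigma2_v / (sigma2_v + sigma2_e)


clean = [10.2, 9.8, 10.5, 9.9, 10.1, 10.0, 9.7, 10.3]
with_outlier = clean[:-1] + [25.0]   # one unit replaced by an outlying value
sigma2_v = 0.05                      # hypothetical between-area variance

results = {}
for label, sample in [("clean", clean), ("with outlier", with_outlier)]:
    mean, var_e = direct_and_variance(sample)
    results[label] = (mean, var_e, shrinkage(sigma2_v, var_e))
    print(f"{label}: mean={mean:.2f}, var_e={var_e:.3f}, gamma={results[label][2]:.2f}")
```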

SLIDE 7

Robust Area Level Methods – Review

◮ When framed as a violation of the distributional assumption (of $v_i$):
  ◮ Transform the response, i.e. the direct estimator – Sugasawa and Kubokawa (2015)
  ◮ Replace the distribution, e.g. with a
    ◮ generalised normal distribution: Fabrizi and Trivisano (2010)
    ◮ t-distribution: Bell and Huang (2006)
    ◮ Cauchy distribution: Datta and Lahiri (1995)
◮ When we still believe in the normal distribution:
  ◮ Use influence functions in the context of hierarchical Bayes: Ghosh, Maiti and Roy (2008)
  ◮ Use influence functions in the context of linear mixed models: Sinha and Rao (2009)
  ◮ M-quantile regression: Chambers and Tzavidis (2006)

SLIDE 8

Robust Area Level Methods – Method

◮ Here the method by Sinha and Rao (2009) is adapted for area level models, framed as a linear mixed model
  $$y \sim N\big(X\beta,\ \underbrace{Z V_v Z^\top + V_e}_{V}\big)$$
◮ Restrict the influence of the residuals in the ML estimation equations. E.g. for the regression parameters we use
  $$X^\top V^{-1} U^{1/2} \psi\big(U^{-1/2}(y - X\beta)\big) = 0$$
  instead of
  $$X^\top V^{-1} (y - X\beta) = 0$$
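A small sketch of the robustified equation for $\beta$, under several assumptions: Huber's $\psi$ with tuning constant $b = 1.345$ (a common default; the slides do not fix $\psi$), the FH case $Z = I$ so that $V$ is diagonal, and $U$ taken as the diagonal of $V$, which then equals $V$. It evaluates the left-hand side for a given $\beta$ and shows that an outlying area's contribution is bounded, whereas it grows without limit in the classical equation.

```python
import numpy as np


def huber_psi(u, b=1.345):
    """Huber's psi: identity on [-b, b], clipped to -b / +b outside."""
    return np.clip(u, -b, b)


def beta_score(beta, y, X, V_diag, psi=huber_psi):
    """Left-hand side of X^T V^{-1} U^{1/2} psi(U^{-1/2} (y - X beta)).

    Assumes V is diagonal (FH model with Z = I) and U = diag(V) = V.
    """
    sd = np.sqrt(V_diag)
    r = (y - X @ beta) / sd               # standardised residuals U^{-1/2}(y - X beta)
    return X.T @ (sd * psi(r) / V_diag)   # X^T V^{-1} U^{1/2} psi(r)


X = np.column_stack([np.ones(5), np.arange(5.0)])
V = np.ones(5)
beta = np.array([0.0, 1.0])
y_out = X @ beta
y_out[4] += 100.0                         # one outlying area
print(beta_score(beta, y_out, X, V, psi=lambda u: u))   # classical equation
print(beta_score(beta, y_out, X, V))                    # robust equation
```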

SLIDE 9

Robust Area Level Methods – Method

◮ Solving these robust estimation equations leads to outlier robust parameter estimates, $\hat\beta^\psi$ and $\hat\sigma^{2,\psi}_v$, and outlier robust predictions $\hat v_i^\psi$:
  $$\hat\theta_i^{RFH} = x_i^\top \hat\beta^\psi + \hat v_i^\psi$$
◮ In the setting of linear mixed models this representation is the robust empirical best linear unbiased prediction (REBLUP)
◮ The MSE of these predictions can be computed using a parametric bootstrap or an approximation based on the results of Chambers, Chandra and Tzavidis (2011)
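A quick consistency check, written as a derivation. Assume the FH case $Z = I$, $V_e = \operatorname{diag}(\sigma^2_{e_i})$, $V_v = \sigma^2_v I$ and plug the non-robust choice $\psi(u) = u$ into the random-effects estimation equation given in the appendix slides (then $U^{1/2}\psi(U^{-1/2}x) = x$ for any diagonal $U$). For area $i$ the REBLUP collapses to the FH estimator:

```latex
\begin{aligned}
\psi(u) = u \;&\Rightarrow\; \frac{\bar y_i - x_i^\top\beta - v_i}{\sigma^2_{e_i}} - \frac{v_i}{\sigma^2_v} = 0 \\
&\Rightarrow\; \hat v_i = \frac{\sigma^2_v}{\sigma^2_v + \sigma^2_{e_i}}\,\bigl(\bar y_i - x_i^\top\beta\bigr) = \gamma_i\,\bigl(\bar y_i - x_i^\top\beta\bigr) \\
&\Rightarrow\; x_i^\top\beta + \hat v_i = \gamma_i\,\bar y_i + (1 - \gamma_i)\,x_i^\top\beta .
\end{aligned}
```

A bounded $\psi$ therefore only changes the prediction for areas whose standardised residuals exceed the tuning constant.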

SLIDE 10

Robust Area Level Methods – Extensions

◮ Framed as linear mixed effects models, we can incorporate spatial and temporal correlation in the random effects:
  ◮ Simultaneous autoregressive process – Pratesi and Salvati (2008)
  ◮ Random intercept + temporal autocorrelation – Rao and Yu (1994)
  ◮ Combining spatial and temporal correlation – Marhuenda et al. (2013)
◮ The same idea for robust predictions can be used for these methods
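A sketch of the covariance structures these extensions place on the random effects, assuming a SAR process $v = \rho W v + u$ with a row-standardised proximity matrix $W$ for the spatial case, and a stationary AR(1) process over $T$ periods for the temporal case. The exact parametrisations in the cited papers may differ; the function names and the toy neighbourhood are hypothetical.

```python
import numpy as np


def sar_covariance(W, rho, sigma2):
    """Cov(v) for v = rho W v + u, u ~ N(0, sigma2 I): sigma2 (I - rho W)^{-1} (I - rho W)^{-T}."""
    A_inv = np.linalg.inv(np.eye(W.shape[0]) - rho * W)
    return sigma2 * A_inv @ A_inv.T


def ar1_covariance(T, rho, sigma2):
    """Stationary AR(1) v_t = rho v_{t-1} + u_t: Cov(v_t, v_s) = sigma2 rho^|t-s| / (1 - rho^2)."""
    t = np.arange(T)
    return sigma2 * rho ** np.abs(t[:, None] - t[None, :]) / (1.0 - rho**2)


# Four areas on a line; neighbours share a border, rows standardised to sum to one.
adjacency = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
W = adjacency / adjacency.sum(axis=1, keepdims=True)
print(np.round(sar_covariance(W, rho=0.5, sigma2=1.0), 3))
print(np.round(ar1_covariance(T=3, rho=0.5, sigma2=1.0), 3))
```

With $\rho = 0$ both structures reduce to independent random effects, which recovers the standard FH model.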

SLIDE 11

Robust Area Level Methods – Optimisation

◮ Sinha and Rao (2009) derived Newton-Raphson algorithms based on a Taylor series expansion of the estimation equations (unit level models)
◮ Schmid (2011) minimised the squared estimation equations for the variance components – more stable
◮ Schoch (2012) uses an IRWLS algorithm for β and a robust method of moments estimator for the variance parameters – more stable with respect to starting values
◮ Chatrchi (2012) uses a fixed point algorithm for the variance components – slow, but stable with respect to starting values
◮ For area level models:
  ◮ IRWLS algorithm for the regression parameters
  ◮ Fixed point algorithm for the random effects
  ◮ For variance components:
    ◮ Fixed point algorithm for variances
    ◮ Newton-Raphson for correlation parameters

SLIDE 12

Robust Area Level Methods – Software

◮ R packages:
  ◮ rsae – implements the methods by Schoch (2012) for unit level models
  ◮ saeRobust (about to be released) – implements the presented methods for
    ◮ Standard RFH
    ◮ Spatial RFH
    ◮ Temporal RFH
    ◮ Spatio-temporal RFH

SLIDE 13

CBS Data Example

◮ The target statistic is the mean tax turnover of 20 industry sectors in the Netherlands
◮ A synthetic population with 63981 observations is available
  ◮ Based on the Structural Business Survey (SBS), an annual survey in the Netherlands conducted by CBS
◮ In this example one sample is drawn with a design similar to that of the SBS:
  ◮ Stratified by size class (number of employees) of the firms
  ◮ SRSWOR within each stratum
  ◮ Large firms are selected with probability one
◮ Sample sizes range between 9 and 1052; 5074 overall
◮ This is repeated 500 times and the estimates are compared to the population parameters
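A sketch of such a design for a single domain, in plain Python with made-up stratum values: SRSWOR within each size-class stratum, a take-all stratum for large firms (sample size equal to stratum size), and the Horvitz-Thompson estimator of the mean with weights $N_h/n_h$. The function names and data are hypothetical; the actual SBS allocation is not given in the slides.

```python
import random


def stratified_srswor(strata, n_h, seed=None):
    """SRSWOR of n_h[h] units per stratum; returns (y, weight) pairs with weight N_h / n_h.

    A take-all stratum (e.g. large firms) is one with n_h[h] == len(strata[h]),
    i.e. inclusion probability one and weight one.
    """
    rng = random.Random(seed)
    sample = []
    for h, units in strata.items():
        n = n_h[h]
        weight = len(units) / n              # inverse inclusion probability
        sample.extend((y, weight) for y in rng.sample(units, n))
    return sample


def ht_mean(sample, N):
    """Horvitz-Thompson estimator of the mean for a known population size N."""
    return sum(weight * y for y, weight in sample) / N


# Hypothetical turnover values by size class for one industry sector.
strata = {"small": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
          "medium": [10.0, 12.0, 14.0, 16.0],
          "large": [100.0, 150.0]}
N = sum(len(units) for units in strata.values())
sample = stratified_srswor(strata, {"small": 2, "medium": 2, "large": 2}, seed=1)
print(ht_mean(sample, N))
```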

SLIDE 14

Modeling Strategy

$$\bar y_i = \beta_0 + \beta_1 \bar y_{i,t-1} + v_i + e_i$$

◮ $\bar y_i$ is the direct estimator, based on the Horvitz-Thompson (HT) estimator
◮ $\bar y_{i,t-1}$ is the true tax turnover from the previous period
◮ The sampling variances under the FH model, $\sigma^2_{e_i}$, are either based on the estimated standard error of the direct estimator, or smoothed using a generalised variance function
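A sketch of GVF smoothing, assuming one simple form in which the log of the estimated sampling variance is linear in the log of the domain sample size, $\log \hat\sigma^2_{e_i} = a + b \log n_i$, fitted by least squares (see Wolter, 2007, for generalised variance functions in general). The slides do not state which GVF was used; the function name and data are hypothetical.

```python
import numpy as np


def gvf_smooth(var_hat, n):
    """Replace estimated sampling variances by fitted values of log(var) = a + b log(n)."""
    log_n = np.log(np.asarray(n, dtype=float))
    Z = np.column_stack([np.ones_like(log_n), log_n])
    coef, *_ = np.linalg.lstsq(Z, np.log(var_hat), rcond=None)
    return np.exp(Z @ coef)


# Hypothetical domain sample sizes and noisy variance estimates around 4 / n.
rng = np.random.default_rng(0)
n = np.array([9, 25, 60, 150, 400, 1052])
var_hat = 4.0 / n * rng.lognormal(sigma=0.3, size=n.size)
print(np.round(gvf_smooth(var_hat, n), 4))
```

Fitting on the log scale keeps the smoothed variances positive; back-transforming with `exp` ignores the retransformation bias, which seems acceptable for a sketch.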

SLIDE 15

QQ Plots

[Figure: QQ plots for RFH and FH; panels: random effects, and residuals / sqrt(samplingVar), each against theoretical quantiles]

SLIDE 16

Coefficient of Variation

[Figure: CV in % by domain (sorted by increasing CV of the direct estimator) for the direct, EBLUP and REBLUP estimates]

SLIDE 17

RBIAS & RRMSE

[Figure: RBIAS in % and RRMSE in % for Direct, FH, RFH, FH.GVF and RFH.GVF]

SLIDE 18

Discussion

◮ Outlier robust predictions may be beneficial to address area level outliers
◮ Unit level outliers?
◮ MSE estimation is problematic in scenarios where the estimated variance of the random effect is very small

SLIDE 19

Thank you for your attention!

Sebastian Warnholz (Sebastian.Warnholz@fu-berlin.de)

SLIDE 20

Bibliography

◮ Bell / Huang (2006): Using the t-distribution to Deal with Outliers in Small Area Estimation, Proceedings of Statistics Canada Symposium 2006: Methodological Issues in Measuring Population Health

◮ Chatrchi (2012): Robust Estimation of Variance Components in Small Area Estimation, MA thesis, School of Mathematics and Statistics, Carleton University, Ottawa, Canada

◮ Datta / Lahiri (1995): Robust Hierarchical Bayes Estimation of Small Area Characteristics in the Presence of Covariates and Outliers, Journal of Multivariate Analysis 54, pp. 310–328

◮ Ghosh / Maiti / Roy (2008): Influence functions and robust Bayes and empirical Bayes small area estimation, Biometrika 95 (3), pp. 573–585

◮ Fabrizi / Trivisano (2010): Robust Linear Mixed Models for Small Area Estimation, Journal of Statistical Planning and Inference 140, pp. 433–43

SLIDE 21

Bibliography

◮ Fay / Herriot (1979): Estimation of income for small places: An application of James-Stein procedures to census data, Journal of the American Statistical Association 74 (366), pp. 269–277

◮ Gershunskaya (2010): Robust Small Area Estimation Using a Mixture Model, Section on Survey Methods, JSM, pp. 2783–2796

◮ Marhuenda / Molina / Morales (2013): Small area estimation with spatio-temporal Fay-Herriot models, Computational Statistics and Data Analysis 58, pp. 308–325

◮ Pratesi / Salvati (2008): Small area estimation: the EBLUP estimator based on spatially correlated random area effects, Statistical Methods & Applications 17, pp. 113–141

◮ Rao / Yu (1994): Small-Area Estimation by Combining Time-Series and Cross-Sectional Data, Canadian Journal of Statistics 22 (4), pp. 511–528

◮ Schmid (2011): Spatial Robust Small Area Estimation applied on Business Data, PhD thesis, University of Trier

SLIDE 22

Bibliography

◮ Schoch (2012): Robust Unit-Level Small Area Estimation: A Fast Algorithm for Large Datasets, Austrian Journal of Statistics 41 (4), pp. 243–265

◮ Sinha / Rao (2009): Robust small area estimation, The Canadian Journal of Statistics 37 (3), pp. 381–399

◮ Sugasawa / Kubokawa (2015): Parametric transformed Fay–Herriot model for small area estimation, Journal of Multivariate Analysis 139, pp. 295–311

◮ Wolter (2007): Introduction to Variance Estimation, Springer

SLIDE 23

Mean-difference Plot

[Figure: mean-difference plots; RFH panel: direct − reblup against (direct + reblup) / 2; FH panel: direct − eblup against (direct + eblup) / 2]

SLIDE 24

Quality Measures

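◮ Relative Root Mean Square Error:
  $$\mathrm{RRMSE}^m_i = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\frac{\hat\theta^m_{i,r} - \theta_{i,r}}{\theta_{i,r}}\right)^2}$$
◮ Relative Bias:
  $$\mathrm{RBIAS}^m_i = \frac{1}{R}\sum_{r=1}^{R}\frac{\hat\theta^m_{i,r} - \theta_{i,r}}{\theta_{i,r}}$$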
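The two measures above as a short NumPy sketch, assuming the Monte Carlo results for one method $m$ are stored as arrays of shape (R, D), i.e. replications by domains. The factor 100 is an assumption chosen to match the percentages in the plots; the function names are hypothetical.

```python
import numpy as np


def rrmse(estimates, truth):
    """RRMSE per domain in %; estimates and truth have shape (R, D)."""
    rel = (estimates - truth) / truth
    return 100.0 * np.sqrt(np.mean(rel**2, axis=0))


def rbias(estimates, truth):
    """RBIAS per domain in %; estimates and truth have shape (R, D)."""
    rel = (estimates - truth) / truth
    return 100.0 * np.mean(rel, axis=0)
```

The population value $\theta_{i,r}$ carries an index $r$ as in the formulas, so the same code also covers designs where the population is regenerated in every replication.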

SLIDE 25

Estimation Equations

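◮ For variance parameter $\delta_l$:
  $$\psi(r)^\top U^{1/2} V^{-1} \frac{\partial V}{\partial \delta_l} V^{-1} U^{1/2} \psi(r) - \operatorname{tr}\left(K V^{-1} \frac{\partial V}{\partial \delta_l}\right) = 0$$
◮ For random effects:
  $$Z^\top V_e^{-1} U_e^{1/2} \psi\left(U_e^{-1/2}(y - X\beta - Zv)\right) - V_v^{-1} U_u^{1/2} \psi\left(U_u^{-1/2} v\right) = 0$$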
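A sketch of the variance equation for the FH case, under several assumptions: $Z = I$, $V = U = \operatorname{diag}(\sigma^2_v + \sigma^2_{e_i})$, Huber's $\psi$ with $b = 1.345$, and $K = c\,I$ with $c = E[\psi^2(u)]$, $u \sim N(0, 1)$ (our reading of Sinha and Rao). With $\delta = \sigma^2_v$ and $\partial V / \partial \sigma^2_v = I$, the equation reduces to $\sum_i \psi(r_i)^2 / a_i - c \sum_i 1/a_i = 0$ with $a_i = \sigma^2_v + \sigma^2_{e_i}$. Writing $c \sum_i 1/a_i = \sigma^2_v\, c \sum_i a_i^{-2} + c \sum_i \sigma^2_{e_i} a_i^{-2}$ gives the fixed-point update below; it is our own derivation, not necessarily the exact algorithm behind the talk. The residuals $y - X\beta$ are taken as given and the names are hypothetical.

```python
import math

import numpy as np


def huber_psi(u, b=1.345):
    """Huber's psi: identity on [-b, b], clipped outside."""
    return np.clip(u, -b, b)


def huber_k(b=1.345):
    """c = E[psi_b(u)^2] for u ~ N(0, 1)."""
    phi = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))
    return 2.0 * Phi - 1.0 - 2.0 * b * phi + 2.0 * b * b * (1.0 - Phi)


def variance_equation(s2v, resid, s2e, b=1.345):
    """Left-hand side of the sigma^2_v equation in the FH case."""
    a = s2v + s2e
    psi = huber_psi(resid / np.sqrt(a), b)
    return np.sum(psi**2 / a) - huber_k(b) * np.sum(1.0 / a)


def fit_sigma2_v(resid, s2e, b=1.345, s2v=1.0, tol=1e-10, max_iter=500):
    """Fixed-point iteration for sigma^2_v given residuals y - X beta."""
    c = huber_k(b)
    for _ in range(max_iter):
        a = s2v + s2e
        psi = huber_psi(resid / np.sqrt(a), b)
        new = (np.sum(psi**2 / a) - c * np.sum(s2e / a**2)) / (c * np.sum(1.0 / a**2))
        new = max(new, 1e-8)              # keep the variance strictly positive
        if abs(new - s2v) < tol:
            return new
        s2v = new
    return s2v
```

With $\psi(u) = u$ (and hence $c = 1$) the update coincides with Fisher scoring for the ML estimate of $\sigma^2_v$ in the FH model, which is one reason to expect stable behaviour.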

SLIDE 26

Algorithms

$$\beta^{(m+1)} = \left(X^\top V^{-1} W_1(\beta^{(m)}) X\right)^{-1} X^\top V^{-1} W_1(\beta^{(m)})\, y$$

$$v^{(m+1)} = \left(Z^\top V_e^{-1} W_2(v^{(m)}) Z + V_v^{-1} W_3(v^{(m)})\right)^{-1} \times Z^\top V_e^{-1} W_2(v^{(m)})\, (y - X\beta)$$
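A sketch of both updates for the FH case, assuming $Z = I$, $V_e = \operatorname{diag}(\sigma^2_{e_i})$, $V_v = \sigma^2_v I$, each $U$ equal to the diagonal of the corresponding covariance, Huber's $\psi$ with $b = 1.345$, and weights $W_k = \operatorname{diag}(\psi(r)/r)$ evaluated at the current iterate. $\sigma^2_v$ is held fixed here; in a full fit it would be updated in an outer loop. Since the $\beta$ equation does not involve $v$, $\beta$ is iterated to convergence first. The names are hypothetical.

```python
import numpy as np


def huber_psi(u, b=1.345):
    """Huber's psi: identity on [-b, b], clipped outside."""
    return np.clip(u, -b, b)


def huber_weights(r, b=1.345):
    """Diagonal of W = diag(psi(r) / r), with weight 1 where r == 0."""
    w = np.ones_like(r)
    nz = r != 0
    w[nz] = huber_psi(r[nz], b) / r[nz]
    return w


def robust_fh(y, X, s2e, s2v, b=1.345, tol=1e-10, max_iter=1000):
    """IRWLS for beta and fixed-point iteration for v in the FH model (Z = I)."""
    V = s2v + s2e                                          # diagonal of V
    beta = np.linalg.solve(X.T @ (X / V[:, None]), X.T @ (y / V))   # GLS start
    for _ in range(max_iter):
        w1 = huber_weights((y - X @ beta) / np.sqrt(V), b)
        new = np.linalg.solve(X.T @ ((w1 / V)[:, None] * X), X.T @ (w1 / V * y))
        done = np.max(np.abs(new - beta)) < tol
        beta = new
        if done:
            break
    res = y - X @ beta
    v = s2v / V * res                                      # EBLUP as starting value
    for _ in range(max_iter):
        w2 = huber_weights((res - v) / np.sqrt(s2e), b)
        w3 = huber_weights(v / np.sqrt(s2v), b)
        new = (w2 / s2e) / (w2 / s2e + w3 / s2v) * res     # diagonal form of the v update
        done = np.max(np.abs(new - v)) < tol
        v = new
        if done:
            break
    return beta, v, X @ beta + v
```

With a very large `b` every weight is one, so the first update of $\beta$ is the GLS estimate and the $v$ update returns $\gamma_i(\bar y_i - x_i^\top\hat\beta)$, i.e. the FH EBLUP.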