SLIDE 1

Trier, August 2011

CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVEYS

Rodolphe Priam, Natalie Shlomo

Southampton Statistical Sciences Research Institute, University of Southampton, United Kingdom

SAE, August 2011

The BLUE-ETS Project is financed by the grant agreement no: 244767 under Theme 8 of the 7th Framework Programme (FP7) of the European Union, Socio-economic Sciences and Humanities.

SLIDE 2

BUSINESS SURVEYS

  • Statistical units are organisational entities in a country
  • Interest lies in small area (domain) estimates
  • Business registers provide unit-level covariates
  • Distributions are typically skewed, with outliers
  • Transformations, such as the log, are used to satisfy normality assumptions

SLIDE 3

SMALL AREA ESTIMATION

  • A central problem in many areas of social statistics, and more recently also in business statistics

  • Estimation of the mean in diverse domains (areas $i = 1, \dots, M$)
  • True population means $\bar{Y}_1, \dots, \bar{Y}_M$ and design-based estimates $\hat{\bar{Y}}_{w;1}, \dots, \hat{\bar{Y}}_{w;M}$
  • Estimated small area means (EBLUP), used because the sample size in area $i$ is small

SLIDE 4

SMALL AREA ESTIMATION AND BENCHMARKING

  • Small area estimation of the total in the different domains $i = 1, \dots, M$, giving model-based estimates $\hat{\theta}_{y;1}, \dots, \hat{\theta}_{y;M}$
  • Problem: the total estimated by the model,

$$\tilde{T}_y = \sum_i w_i \hat{\theta}_{y;i},$$

should match the design-based estimate of the population total,

$$\hat{T}_y = \sum_i w_i \hat{\bar{Y}}_{w;i}.$$

  • Solution: benchmark the estimates with an appropriate method
  • As a consequence, estimation becomes more robust to misspecification of the model

SLIDE 5

NESTED ERROR UNIT LEVEL MODEL

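  • The Battese, Harter and Fuller (1988) (BHF) model for small areas $i = 1, \dots, M$:

$$Y_i = X_i\beta + u_i \mathbf{1}_{N_i} + e_i$$

  • The target parameter of interest is the area mean:

$$\bar{Y}_i = N_i^{-1}\,\mathbf{1}_{N_i}' Y_i$$

  • The EBLUP for non-negligible sampling fractions:

$$\hat{\theta}^{f}_{y;i} = f_i \bar{y}_i + (1 - f_i)\left[\bar{X}_{ic}'\hat{\beta}_{GLS} + \hat{u}_i\right]$$

where $f_i = n_i/N_i$ is the sampling fraction and $\bar{X}_{ic}$ is the mean of the covariates over the non-sampled units of area $i$.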
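A minimal NumPy sketch of this EBLUP follows. It rests on four assumptions. The variance components $\sigma_u^2$ and $\sigma_e^2$ are treated as known, although in practice they are estimated. Every area is sampled. The covariate rows include an intercept. The standard BHF shrinkage $\hat{u}_i = \gamma_i(\bar{y}_i - \bar{x}_i'\hat{\beta})$ is used, with $\gamma_i = \sigma_u^2/(\sigma_u^2 + \sigma_e^2/n_i)$. GLS is computed as OLS on Fuller-Battese transformed data. The function name and argument layout are hypothetical.

```python
import numpy as np

def eblup_bhf(y_s, x_s, area_s, xbar_nonsample, N, sigma_u2, sigma_e2):
    """EBLUP of area means under the BHF nested error model (illustrative sketch).

    y_s, x_s, area_s   : sampled responses, covariate rows (incl. intercept), area labels 0..m-1
    xbar_nonsample     : (m, p) covariate means over the non-sampled units (X_ic)
    N                  : (m,) area population sizes
    sigma_u2, sigma_e2 : variance components, treated as known in this sketch
    """
    m, p = xbar_nonsample.shape
    n = np.bincount(area_s, minlength=m)
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n)          # shrinkage factors gamma_i
    ybar = np.bincount(area_s, weights=y_s, minlength=m) / n
    xbar = np.column_stack(
        [np.bincount(area_s, weights=x_s[:, k], minlength=m) for k in range(p)]
    ) / n[:, None]
    # Fuller-Battese transformation: OLS on the transformed data reproduces GLS
    alpha = 1.0 - np.sqrt(1.0 - gamma)
    y_t = y_s - alpha[area_s] * ybar[area_s]
    x_t = x_s - alpha[area_s][:, None] * xbar[area_s]
    beta_gls, *_ = np.linalg.lstsq(x_t, y_t, rcond=None)
    u_hat = gamma * (ybar - xbar @ beta_gls)              # predicted area effects
    f = n / N                                             # sampling fractions f_i
    theta = f * ybar + (1.0 - f) * (xbar_nonsample @ beta_gls + u_hat)
    return theta, beta_gls, u_hat, gamma
```

The transformation is used because $(I - \alpha_i J/n_i)^2$ is proportional to $V_i^{-1}$ when $(1 - \alpha_i)^2 = 1 - \gamma_i$. This lets one `lstsq` call replace explicit inversion of the block-diagonal covariance matrix.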

SLIDE 6

BENCHMARKING AT THE LINEAR SCALE (1/2)

  • Existing methods considered (see, for instance, Wang et al. (2008))

The ratio method, with a multiplicative term:

$$\hat{\theta}^{RT}_{y;i} = \hat{\theta}^{f}_{y;i}\,\frac{\hat{T}_y}{\tilde{T}_y}$$

An additive term with variance weighting:

$$\hat{\theta}^{VAR}_{y;i} = \hat{\theta}^{f}_{y;i} + \frac{N_i\left(\hat{\sigma}_u^2 + \hat{\sigma}_e^2/n_i\right)}{\sum_{j=1}^{m} N_j^2\left(\hat{\sigma}_u^2 + \hat{\sigma}_e^2/n_j\right)}\left(\hat{T}_y - \tilde{T}_y\right)$$
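A short sketch of the ratio and variance-weighted benchmarks follows. It assumes area weights $w_i = N_i$, so that $\tilde{T}_y = \sum_i N_i \hat{\theta}^{f}_{y;i}$, and it takes the EBLUPs, $\hat{T}_y$ and the variance components as inputs. The function names are hypothetical. Both functions return estimates whose $N_i$-weighted total equals $\hat{T}_y$ exactly.

```python
import numpy as np

def ratio_benchmark(theta, N, T_hat):
    """Ratio method: rescale every area estimate by T_hat / T_tilde."""
    T_tilde = np.sum(N * theta)              # model-based total, assuming w_i = N_i
    return theta * (T_hat / T_tilde)

def variance_weighted_benchmark(theta, N, n, sigma_u2, sigma_e2, T_hat):
    """Additive method: share T_hat - T_tilde in proportion to N_i * (sigma_u^2 + sigma_e^2 / n_i)."""
    T_tilde = np.sum(N * theta)
    v = sigma_u2 + sigma_e2 / n              # area-level variance proxy
    a = N * v / np.sum(N**2 * v)             # weights satisfy sum_i N_i * a_i = 1
    return theta + a * (T_hat - T_tilde)
```

The ratio method changes every area in the same proportion. The additive method moves noisier areas (small $n_i$) further, which is the intended effect of the variance weighting.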

Pfeffermann and Barnard (1991):

$$\hat{\theta}^{PB}_{y;i} = f_i \bar{y}_i + (1 - f_i)\left[\bar{X}_{ic}'\hat{\beta}_{PB} + \hat{u}_{i;PB}\right]$$

where

$$\hat{\eta}_{PB} = \hat{\eta} - \frac{C R'\left(R\hat{\eta} - r\right)}{R C R'}, \qquad \hat{\eta} = \left(\hat{\beta}_{GLS}', \hat{u}_1, \dots, \hat{u}_M\right)', \qquad r = \hat{T}_y - n\bar{y}, \qquad R\hat{\eta}_{PB} = r,$$

$$R = \left(\sum_{i=1}^{M} (N_i - n_i)\bar{X}_{ic}',\; N_1 - n_1, \dots, N_m - n_m,\; N_{m+1}, \dots, N_M\right)$$

  • Ugarte et al. (2009) applied this constrained model to a business survey covering several regions, including variance calculations
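The Pfeffermann and Barnard adjustment can be sketched in a few lines, under these assumptions. Every area is sampled ($m = M$, as in the simulation that follows). $\hat{\eta}$ stacks $\hat{\beta}$ and the $\hat{u}_i$. `C` is any user-supplied symmetric positive-definite matrix standing in for $C$. The helper names are hypothetical. The test checks that the adjusted predictor reproduces $\hat{T}_y$ when areas are weighted by $N_i$.

```python
import numpy as np

def pb_constraint(N, n, ybar, xbar_nonsample):
    """Row vector R and the sample total n*ybar for eta = (beta', u_1, ..., u_m)',
    assuming every area is sampled (m = M)."""
    R = np.concatenate([((N - n)[:, None] * xbar_nonsample).sum(axis=0), N - n])
    sample_total = np.sum(n * ybar)
    return R.astype(float), sample_total

def pb_adjust(eta_hat, C, R, r):
    """Move eta_hat onto {eta : R @ eta = r}; C shapes how the correction is spread."""
    CRt = C @ R                               # C R' (R is a 1-D row vector)
    return eta_hat - CRt * (R @ eta_hat - r) / (R @ CRt)
```

Because $R\hat{\eta}_{PB} = r$ exactly, $\sum_i N_i \hat{\theta}^{PB}_{y;i} = n\bar{y} + r = \hat{T}_y$, whatever positive-definite `C` is used.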

SLIDE 7

BENCHMARKING AT THE LINEAR SCALE (2/2)

  • We propose the following method

Augment the unconstrained least-squares system by adding one row and one column to the original GLS system. The added column stacks area-level blocks, $w_a = (w_{a;1}', w_{a;2}', \dots, w_{a;m}')'$, built from $N_i$ and $n_i$. The entries of the added row are sums over the sampled areas $i = 1, \dots, m$ involving $N_i - n_i$, $n_i$ and $\hat{\gamma}_i$. The coefficient vector of the augmented system is denoted $\beta_{PSW}$.

  • The benchmarking equation is obtained from the orthogonality of the residual to the newly added column
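The exact PSW augmentation cannot be recovered in full from the slide, so the sketch below shows a related, generic technique rather than the authors' method. It solves least squares under a single linear equality constraint $c'b = d$ by bordering the normal equations with one extra row and one extra column (the KKT system). In a benchmarking setting, `c` and `d` would encode the benchmark equation. All names here are hypothetical.

```python
import numpy as np

def constrained_ls(X, y, c, d):
    """Minimise ||y - X b||^2 subject to c @ b = d (generic sketch, not the PSW method).

    The normal equations X'X b = X'y are bordered by one row and one column:
        [X'X  c] [b  ]   [X'y]
        [c'   0] [lam] = [d  ]
    """
    p = X.shape[1]
    K = np.zeros((p + 1, p + 1))
    K[:p, :p] = X.T @ X
    K[:p, p] = c                      # added column
    K[p, :p] = c                      # added row
    rhs = np.concatenate([X.T @ y, [d]])
    sol = np.linalg.solve(K, rhs)
    return sol[:p]                    # sol[p] is the Lagrange multiplier
```

Here the last row of the bordered system enforces the constraint exactly. The PSW construction instead obtains the benchmark equation through the orthogonality of the residual to the added column.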

SLIDE 8

SIMULATION FOR LINEAR CASE

  • Nested error unit level regression model
  • B = 1000 populations generated
  • M = 30 areas (no empty areas)
  • Sampling fractions $f_i \approx 4\%$
  • $\sigma_u = 0.1$, $\sigma_e = 0.3$, and $\beta = (2, 0.25)^T$
  • $x_{ij} \sim N(m_i, s_i)$, with $m_i \sim N(10, 3)$ and $s_i = 2$

(Figures: one population generated; two areas in the population.)
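One simulated population and sample can be sketched as follows. The slides do not give area sizes, so $N_i = 250$ and $n_i = 10$ (giving $f_i = 4\%$) are hypothetical choices. The second argument of $N(\cdot,\cdot)$ is read as a standard deviation, which is also an assumption. Sampling is simple random sampling without replacement within each area.

```python
import numpy as np

def generate_population(rng, M=30, N_i=250, beta=(2.0, 0.25), sigma_u=0.1, sigma_e=0.3):
    """One population from y_ij = beta0 + beta1 * x_ij + u_i + e_ij.
    N_i = 250 and the mean/standard-deviation reading of N(a, b) are assumptions."""
    m_i = rng.normal(10.0, 3.0, size=M)           # area-specific covariate means
    s_i = 2.0
    area = np.repeat(np.arange(M), N_i)
    x = rng.normal(m_i[area], s_i)
    u = rng.normal(0.0, sigma_u, size=M)          # area random effects
    e = rng.normal(0.0, sigma_e, size=area.size)  # unit-level errors
    y = beta[0] + beta[1] * x + u[area] + e
    return area, x, y

def srs_within_areas(rng, area, n_i=10):
    """Simple random sample without replacement of n_i units per area."""
    idx = [rng.choice(np.flatnonzero(area == i), size=n_i, replace=False)
           for i in np.unique(area)]
    return np.concatenate(idx)
```

Repeating these two calls B = 1000 times with a shared `numpy.random.Generator` gives a replicate structure like the one described on the slide.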

SLIDE 9

SIMULATION RESULT FOR LINEAR CASE (1/2)

Measure    (1)        (2)      (3)      (4)      (5)
BIASREL    0.06%      0.58%    0.60%    0.60%    0.60%
AARB       0.04%      0.60%    0.62%    0.62%    0.62%
ARMSE      1.31%      1.45%    1.46%    1.46%    1.47%
DIFFTOT    4.0×10²    0.000    0.000    0.000    0.000

(1) EBLUP, $\hat{\theta}^{f}_{y;i}$; (2) Ratio benchmark, $\hat{\theta}^{RT}_{y;i}$; (3) Variance-weighted benchmark, $\hat{\theta}^{VAR}_{y;i}$; (4) Pfeffermann and Barnard benchmark, $\hat{\theta}^{PB}_{y;i}$; (5) Proposed method benchmark, $\hat{\theta}^{PSW}_{y;i}$
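The slides do not define the reported measures. The sketch below uses definitions that are common in small area simulation studies, and these are assumptions that may not match the ones behind the table. BIASREL is left out because its definition is not given. `est` and `truth` hold estimated and true area means over B replicates and M areas. DIFFTOT is read as the mean absolute gap between the $N_i$-weighted model total and $\hat{T}_y$.

```python
import numpy as np

def aarb(est, truth):
    """Average absolute relative bias over areas; est and truth have shape (B, M)."""
    rel_bias = ((est - truth) / truth).mean(axis=0)       # relative bias of each area
    return np.abs(rel_bias).mean()

def armse(est, truth):
    """Average relative root mean squared error over areas."""
    rel_mse = (((est - truth) / truth) ** 2).mean(axis=0)
    return np.sqrt(rel_mse).mean()

def difftot(est, N, T_hat):
    """Mean absolute gap between sum_i N_i * est_i and T_hat over the B replicates."""
    return np.abs(est @ N - T_hat).mean()
```

Under this reading, any benchmarked estimator has DIFFTOT equal to zero by construction, which matches the zeros in columns (2) to (5).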

SLIDE 10

SIMULATION RESULT FOR LINEAR CASE (2/2)

(Figure: results for areas 1 to 30 under estimators 1 to 5.)

1 EBLUP; 2 Ratio benchmark; 3 Variance-weighted benchmark; 4 Pfeffermann and Barnard benchmark; 5 Proposed method benchmark

SLIDE 11

LOG TRANSFORMATION FOR SKEWED VARIABLE

  • In the BHF model,

$$y_{ij} = x_{ij}'\beta + u_i + e_{ij}$$

  • In business surveys, distributions are skewed
  • Log-normal transformation:

$$z_{ij} = \exp\left(x_{ij}'\beta + u_i + e_{ij}\right)$$

  • New formulation of the predictors

SLIDE 12

BACK-TRANSFORMATION WITH BIAS CORRECTION

  • A nearly unbiased estimator is:

$$\hat{\theta}^{f,sum}_{z;i} = f_i \bar{z}_i + (1 - f_i)\,\frac{1}{N_i - n_i}\sum_{j \in U_i \setminus s_i} \exp\left(\hat{y}_{ij} + \hat{\alpha}_i\right) \qquad (1)$$

The bias correction $\hat{\alpha}_i$ can be defined at the unit level or at the area level (see Chambers and Dorfman (2003) and Molina (2009))

  • Another formulation, from Kurnia, Notodiputro and Chambers (2009):

$$\hat{\theta}^{*,\exp}_{z;i} = \exp\left(\hat{\theta}^{*}_{y;i} + \tilde{\alpha}_i\right) \qquad (2)$$

  • The bias correction is the modified area-level term $\tilde{\alpha}_i$
  • We propose the corrective term $\tilde{\alpha}_i^{(2)}$ and compare it to $\tilde{\alpha}_i^{(1)}$, where $\hat{\Sigma}_i$ is the covariance matrix of the covariates.
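Estimator (1) for a single area can be sketched as below. It assumes log-scale predictions $\hat{y}_{ij}$ for the non-sampled units and a scalar correction $\hat{\alpha}_i$ are already available. The function name is hypothetical.

```python
import numpy as np

def estimator_1(z_sample, yhat_nonsample, alpha):
    """Sketch of estimator (1) for one area: the observed z_ij for the n_i sampled units,
    and exp(yhat_ij + alpha) for the N_i - n_i non-sampled units, combined with f_i = n_i / N_i."""
    n = z_sample.size
    N = n + yhat_nonsample.size
    f = n / N
    return f * z_sample.mean() + (1.0 - f) * np.exp(yhat_nonsample + alpha).mean()
```

A correction is needed because, if $\log z \sim N(\mu, \sigma^2)$, then $E(z) = \exp(\mu + \sigma^2/2)$. The naive back-transform $\exp(\mu)$ therefore underestimates the mean, and $\hat{\alpha}_i$ plays the role of $\sigma^2/2$.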

SLIDE 13

BACK-TRANSFORMATION WITH BIAS CORRECTION

  • Approaches under model (1)
  • Chambers and Dorfman (2003) introduce several estimators, including the rast predictor and the smearing predictor
  • Fabrizi, Ferrante and Pacei (2007) compare these estimators to a naïve predictor without bias correction; the twice-smeared estimator performed best in their simulation
  • Chandra and Chambers (2011) discuss calibration after a log-transformation

SLIDE 14

BENCHMARKING AFTER BACK-TRANSFORMATION

Compare benchmarking at different stages, with back-transformation and bias correction by either:

(a) $\hat{\alpha}_i = \left(\hat{\sigma}_u^2 + \hat{\sigma}_e^2\right)/2$, or

(b) $\tilde{\alpha}_i = \hat{\alpha}_i + \hat{\beta}'\hat{\Sigma}_i\hat{\beta}/2$

  • Ratio method under different scenarios:
  • $\hat{\theta}^{RT,f}_{z;i}$: no benchmark at log scale, back-transformed method (2), bias correction (a)
  • $\hat{\theta}^{RT,VAR}_{z;i}$, $\hat{\theta}^{RT,PB}_{z;i}$, $\hat{\theta}^{RT,PSW}_{z;i}$: benchmark at log scale, back-transformed method (2), bias correction (a)
  • $\hat{\theta}^{RT,f,sum}_{z;i}$: no benchmark at log scale, back-transformed method (1), bias correction (a)
  • $\hat{\theta}^{RT,f,2}_{z;i}$: no benchmark at log scale, back-transformed method (2), bias correction (b)
  • $\hat{\theta}^{MLC}_{z;i}$: maximization of the log-likelihood of the BHF model under constraints, back-transformed method (2) and bias correction (b)
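The two bias corrections and a final ratio adjustment on the original scale are sketched below. Correction (b) adds $\hat{\beta}'\hat{\Sigma}_i\hat{\beta}/2$, presumably to account for the within-area variability of $x_{ij}'\beta$ when $\hat{\Sigma}_i$ is the covariates' covariance matrix; this reading is an interpretation. The ratio step assumes area weights $N_i$. All function names are hypothetical.

```python
import numpy as np

def alpha_a(sigma_u2, sigma_e2):
    """Bias correction (a): half of the total log-scale variance."""
    return 0.5 * (sigma_u2 + sigma_e2)

def alpha_b(sigma_u2, sigma_e2, beta, Sigma_i):
    """Bias correction (b): correction (a) plus beta' Sigma_i beta / 2."""
    beta = np.asarray(beta, dtype=float)
    return alpha_a(sigma_u2, sigma_e2) + 0.5 * beta @ Sigma_i @ beta

def back_transform_2(theta_log, alpha):
    """Back-transformation in the form of (2): exp(log-scale area estimate + correction)."""
    return np.exp(np.asarray(theta_log, dtype=float) + alpha)

def ratio_to_original_scale(theta_z, N, T_hat):
    """Ratio adjustment on the original scale so that sum_i N_i * theta_i = T_hat."""
    return theta_z * (T_hat / np.sum(N * theta_z))
```

With an intercept, the first row and column of $\hat{\Sigma}_i$ are zero, so only the slope coefficients contribute to correction (b).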

SLIDE 15

SIMULATION RESULT FOR NON-LINEAR CASE (1/2)

Code  Estimator                           BIASREL   AARB     ARMSE    DIFFTOT
Not benchmarked
1a    $\hat{\theta}^{f,sum}_{z;i}$        0.39%     0.66%    5.81%    5.6×10⁴
2a    $\hat{\theta}^{f}_{z;i}$            11.16%    10.89%   12.05%   3.0×10⁵
3a    $\hat{\theta}^{f,2}_{z;i}$          0.47%     0.28%    5.75%    7.1×10⁴
4a    $\hat{\theta}^{VAR}_{z;i}$          8.77%     8.50%    10.01%   2.5×10⁵
5a    $\hat{\theta}^{PB}_{z;i}$           8.77%     8.49%    10.01%   2.5×10⁵
6a    $\hat{\theta}^{PSW}_{z;i}$          8.75%     8.49%    10.02%   2.5×10⁵
Benchmarked
1b    $\hat{\theta}^{RT,f,sum}_{z;i}$     2.99%     3.30%    6.87%    0.00
2b    $\hat{\theta}^{RT,f}_{z;i}$         2.84%     3.15%    6.84%    0.00
3b    $\hat{\theta}^{RT,f,2}_{z;i}$       3.03%     3.34%    6.90%    0.00
4b    $\hat{\theta}^{RT,VAR}_{z;i}$       2.83%     3.15%    6.84%    0.00
5b    $\hat{\theta}^{RT,PB}_{z;i}$        2.87%     3.18%    6.86%    0.00
6b    $\hat{\theta}^{RT,PSW}_{z;i}$       2.90%     3.20%    6.90%    0.00
7b    $\hat{\theta}^{MLC}_{z;i}$          2.58%     2.89%    6.69%    0.00

  • 2b: no benchmark at log scale, back-transformed method (2), bias correction (a), ratio adjusted
  • 4b, 5b, 6b: benchmark at log scale, back-transformed method (2), bias correction (a), ratio adjusted
  • 1b: no benchmark at log scale, back-transformed method (1), bias correction (a), ratio adjusted
  • 3b: no benchmark at log scale, back-transformed method (2), bias correction (b), ratio adjusted
  • 7b: MLC adjustment, back-transformed method (2), bias correction (b)

SLIDE 16

SIMULATION RESULT FOR NON-LINEAR CASE (2/2)

(Figure: results for areas 1 to 30 under estimators 1a to 6a and 1b to 7b, grouped A to D.)

  • Group A: all estimates benchmarked to the original scale using the ratio method or the MLC method ('1b' to '7b')
  • Group B: no benchmark; back-transformed method (1) with bias correction (a) ('1a'), and back-transformed method (2) with bias correction (b) ('3a')
  • Group C: benchmark at log scale but no benchmark to the original scale; back-transformed method (2) and bias correction (a) ('4a', '5a', '6a')
  • Group D: no benchmark; back-transformed method (2) and bias correction (a) ('2a')

SLIDE 17

CONCLUSION

  • We have used the nested error unit level regression model
  • In the linear case, the benchmarking methods perform similarly
  • In the non-linear case, the benchmarking methods differ depending on the back-transformation and on the stage at which benchmarking is applied
  • Ratio adjustment after benchmarking at the log scale and back-transformation gives results comparable to the case where the log scale is not benchmarked
  • Future research:
  • Performance under more realistic populations, including empty areas
  • Comparison with alternative methods, for example robust methods for small area models
  • Inclusion of survey weights and variance estimates

Thanks for your attention