Estimation of Complex Small Area Parameters with Application to Poverty Indicators



SLIDE 1

SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS

Estimation of Complex Small Area Parameters with Application to Poverty Indicators

J.N.K. Rao

School of Mathematics and Statistics, Carleton University

(Joint work with Isabel Molina)


SLIDE 3

NOTATION

  • $U$: finite population of size $N$.
  • Population partitioned into $D$ subsets $U_1, \ldots, U_D$ of sizes $N_1, \ldots, N_D$, called domains or areas.
  • Variable of interest $Y$.
  • $Y_{dj}$: value of $Y$ for unit $j$ from domain $d$.
  • Target: to estimate the domain parameters
    \[ \delta_d = h(Y_{d1}, \ldots, Y_{dN_d}), \quad d = 1, \ldots, D. \]
  • We want to use data from a sample $S \subset U$ of size $n$ drawn from the whole population.
  • $S_d = S \cap U_d$: sub-sample from domain $d$, of size $n_d$.
  • Problem: $n_d$ is small for some domains.

SLIDE 4

DIRECT ESTIMATORS

  • Direct estimator: an estimator that uses only the sample data from the corresponding domain.
  • Small area/domain: a subset of the population that is a target of inference and for which the direct estimator does not have enough precision.
  • What does “enough precision” mean? Some National Statistical Offices (GB, Spain) allow a maximum coefficient of variation of 20%.
  • Indirect estimator: borrows strength from other areas.

SLIDE 5

NESTED-ERROR REGRESSION MODEL

  • Model: with $x_{dj}$ the auxiliary variables at unit level,
    \[ Y_{dj} = x_{dj}'\beta + u_d + e_{dj}, \quad u_d \overset{iid}{\sim} N(0, \sigma_u^2), \quad e_{dj} \overset{iid}{\sim} N(0, \sigma_e^2). \]
  • Vector of variance components: $\theta = (\sigma_u^2, \sigma_e^2)'$.
  • BLUP of $\bar{Y}_d$: predict the non-sample values by $\hat{Y}_{dj} = x_{dj}'\hat{\beta}_{WLS} + \hat{u}_d$ and take
    \[ \hat{\bar{Y}}_d^{BLUP} = \frac{1}{N_d} \Big( \sum_{j \in s_d} Y_{dj} + \sum_{j \in r_d} \hat{Y}_{dj} \Big), \quad d = 1, \ldots, D. \]
  • Empirical BLUP (EBLUP): with $\hat{\theta}$ an estimator of $\theta$,
    \[ \hat{\bar{Y}}_d^{EBLUP} = \hat{\bar{Y}}_d^{BLUP}(\hat{\theta}). \]

Battese, Harter & Fuller (1988), JASA
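
The BLUP of the area mean can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that $\beta$, $\sigma_u^2$ and $\sigma_e^2$ are known (the slide uses $\hat{\beta}_{WLS}$ and, for the EBLUP, $\hat{\theta}$), and it uses the standard predictor $\hat{u}_d = \gamma_d(\bar{y}_{ds} - \bar{x}_{ds}'\beta)$ with $\gamma_d = \sigma_u^2/(\sigma_u^2 + \sigma_e^2/n_d)$. The function name `blup_area_mean` and its arguments are hypothetical.

```python
import numpy as np

def blup_area_mean(y_s, X_s, X_r, beta, sigma2_u, sigma2_e):
    """BLUP of one area mean under the nested-error model.

    Assumption: beta and the variance components are treated as known.
    y_s, X_s: sample responses and covariates; X_r: non-sample covariates.
    """
    n_d = len(y_s)
    gamma_d = sigma2_u / (sigma2_u + sigma2_e / n_d)
    # Predicted area effect: shrunken mean of the sample residuals.
    u_hat = gamma_d * np.mean(y_s - X_s @ beta)
    # Predicted values for the non-sampled units.
    y_r_hat = X_r @ beta + u_hat
    N_d = n_d + X_r.shape[0]
    # Observed sample values plus predicted non-sample values, averaged.
    return (y_s.sum() + y_r_hat.sum()) / N_d
```

When $\sigma_u^2 = 0$ the predictor reduces to the regression-synthetic value $x_{dj}'\beta$ for every non-sampled unit, which is why the area effect matters for borrowing strength.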

SLIDE 6

SOME POVERTY AND INCOME INEQUALITY MEASURES

  • FGT poverty indicator
  • Gini coefficient
  • Sen index
  • Theil index
  • Generalized entropy
  • Fuzzy monetary index


SLIDE 7

FGT POVERTY INDICATORS

  • $E_{dj}$: welfare measure for individual $j$ in domain $d$; for instance, equivalised annual net income.
  • $z$: poverty line.
  • FGT family of poverty indicators for domain $d$:
    \[ F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} \left( \frac{z - E_{dj}}{z} \right)^{\alpha} I(E_{dj} < z), \quad \alpha = 0, 1, 2. \]
    When α = 0 ⇒ Poverty incidence
    When α = 1 ⇒ Poverty gap
    When α = 2 ⇒ Poverty severity

Foster, Greer & Thorbecke (1984), Econometrica
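
The FGT formula translates directly into code. The following is a small sketch for one domain; the function name `fgt` and the toy welfare values are illustrative only and do not come from the talk.

```python
import numpy as np

def fgt(welfare, z, alpha):
    """FGT poverty indicator of order alpha for one domain.

    welfare: welfare values E_dj of all units in the domain; z: poverty line.
    """
    welfare = np.asarray(welfare, dtype=float)
    poor = welfare < z                       # indicator I(E_dj < z)
    gap = np.where(poor, (z - welfare) / z, 0.0)
    # gap ** 0 equals 1 everywhere, so multiplying by `poor` leaves the indicator.
    return np.mean(gap ** alpha * poor)

# Toy domain with four units and poverty line z = 10.
incomes = [2.0, 5.0, 8.0, 12.0]
incidence = fgt(incomes, z=10.0, alpha=0)   # share of units below the line
poverty_gap = fgt(incomes, z=10.0, alpha=1) # mean relative shortfall
severity = fgt(incomes, z=10.0, alpha=2)    # mean squared relative shortfall
```

In the toy domain three of the four units fall below the line, so the incidence is 3/4, while the gap and severity weight each poor unit by its relative shortfall.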

SLIDE 8

FGT POVERTY INDICATORS

  • Complex non-linear quantities (non-continuous): even though the FGT poverty indicators are also means,
    \[ F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} F_{\alpha dj}, \quad F_{\alpha dj} = \left( \frac{z - E_{dj}}{z} \right)^{\alpha} I(E_{dj} < z), \]
    we cannot assume normality for the $F_{\alpha dj}$.
  • It is not easy to obtain small area estimators with good bias and MSE properties.
  • A method valid for estimating poverty measures in small areas for any α, and for other poverty or inequality measures, would be desirable.

SLIDE 9

SMALL AREA ESTIMATION

  • Due to the relative nature of the poverty line, poverty usually has low frequency, so a large sample size is needed. In Spain, the poverty line for 2006 was 6557 euros, with approx. 20% of the population under the line.
  • The Survey on Income and Living Conditions (EU-SILC) has limited sample size. In the Spanish SILC 2006, n = 34,389 out of N = 43,162,384 (8 out of 10,000).

SLIDE 10

SAMPLE SIZES OF PROVINCES BY GENDER

  • Direct estimators for Spanish provinces are not very precise.
  • Provinces × Gender → Small areas (52 × 2).
  • CVs of direct and EB estimators of poverty incidences for 5 selected provinces:

Province    Gender   n_d    Obs. Poor   CV Dir.   CV EB
Soria       F        17     6           40.37     16.52
Tarragona   M        129    18          19.85     16.15
Córdoba     F        230    73          7.52      6.73
Badajoz     M        472    175         7.12      3.57
Barcelona   F        1483   191         6.67      5.37

SLIDE 11

EB METHOD (EMPIRICAL BEST/BAYES)

  • Vector with population elements for domain $d$:
    \[ y_d = (Y_{d1}, \ldots, Y_{dN_d})' = (y_{ds}', y_{dr}')' \]
  • Target parameter: $\delta_d = h(y_d)$.
  • Best estimator: the estimator $\hat{\delta}_d$ that minimizes the MSE is
    \[ \hat{\delta}_d^{B} = E_{y_{dr}}(\delta_d \mid y_{ds}). \]
  • Best estimator of $F_{\alpha d}$: we need to express $\delta_d = F_{\alpha d}$ in terms of a vector $y_d = (y_{ds}', y_{dr}')'$, $F_{\alpha d} = h_\alpha(y_d)$, for which we can derive the distribution of $y_{dr} \mid y_{ds}$.

SLIDE 12

EB METHOD FOR POVERTY ESTIMATION

  • Assumption: there exists a transformation $Y_{dj} = T(E_{dj})$ of the welfare variables $E_{dj}$ that follows a normal distribution (i.e., the nested-error model with normal errors $u_d$ and $e_{dj}$).
  • FGT poverty indicator as a function of the transformed variables:
    \[ F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} \left( \frac{z - T^{-1}(Y_{dj})}{z} \right)^{\alpha} I\{ T^{-1}(Y_{dj}) < z \}. \]
  • EB estimator of $F_{\alpha d}$:
    \[ \hat{F}_{\alpha d}^{EB} = E_{y_{dr}}[F_{\alpha d} \mid y_{ds}], \quad F_{\alpha d} = h_\alpha(y_d). \]

SLIDE 13

EB METHOD FOR POVERTY ESTIMATION

  • Distribution: $y_d \overset{ind}{\sim} N(\mu_d, V_d)$, $d = 1, \ldots, D$, where
    \[ y_d = \begin{pmatrix} y_{ds} \\ y_{dr} \end{pmatrix}, \quad \mu_d = \begin{pmatrix} \mu_{ds} \\ \mu_{dr} \end{pmatrix}, \quad V_d = \begin{pmatrix} V_{ds} & V_{dsr} \\ V_{drs} & V_{dr} \end{pmatrix}. \]
  • Distribution of $y_{dr}$ given $y_{ds}$:
    \[ y_{dr} \mid y_{ds} \sim N(\mu_{dr|ds}, V_{dr|ds}), \]
    where
    \[ \mu_{dr|ds} = \mu_{dr} + V_{drs} V_{ds}^{-1} (y_{ds} - \mu_{ds}), \quad V_{dr|ds} = V_{dr} - V_{drs} V_{ds}^{-1} V_{dsr}. \]

SLIDE 14

EB METHOD FOR POVERTY ESTIMATION

  • For the nested-error model:
    \[ \mu_{dr|ds} = X_{dr}\beta + \sigma_u^2 1_{N_d - n_d} 1_{n_d}' V_{ds}^{-1} (y_{ds} - X_{ds}\beta), \]
    \[ V_{dr|ds} = \sigma_u^2 (1 - \gamma_d) 1_{N_d - n_d} 1_{N_d - n_d}' + \sigma_e^2 I_{N_d - n_d}, \]
    where $\gamma_d = \sigma_u^2 (\sigma_u^2 + \sigma_e^2/n_d)^{-1}$.
  • Model for simulations:
    \[ y_{dr} = \mu_{dr|ds} + v_d 1_{N_d - n_d} + \epsilon_{dr}, \]
    with $v_d \sim N\{0, \sigma_u^2 (1 - \gamma_d)\}$ and $\epsilon_{dr} \sim N(0_{N_d - n_d}, \sigma_e^2 I_{N_d - n_d})$.
  • We only need to generate $N + D$ univariate normal random variables.

Molina and Rao (2010), CJS
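
As a numerical sanity check, the closed-form conditional moments for the nested-error model can be compared with the general multivariate-normal formulas. The sketch below is illustrative only; the function names are hypothetical, and `mu_s`, `mu_r` stand for $X_{ds}\beta$ and $X_{dr}\beta$.

```python
import numpy as np

def conditional_moments_generic(mu_s, mu_r, V_s, V_rs, V_r, y_s):
    """Mean and covariance of y_r | y_s for a jointly normal vector."""
    V_s_inv = np.linalg.inv(V_s)
    mean = mu_r + V_rs @ V_s_inv @ (y_s - mu_s)
    cov = V_r - V_rs @ V_s_inv @ V_rs.T
    return mean, cov

def conditional_moments_nested(mu_s, mu_r, sigma2_u, sigma2_e, y_s):
    """Closed forms for the nested-error model (no matrix inversion)."""
    n_d, m = len(mu_s), len(mu_r)
    gamma_d = sigma2_u / (sigma2_u + sigma2_e / n_d)
    # sigma_u^2 * 1 1' V_s^{-1} (y_s - mu_s) equals gamma_d times the mean residual.
    mean = mu_r + gamma_d * np.mean(y_s - mu_s)
    cov = sigma2_u * (1 - gamma_d) * np.ones((m, m)) + sigma2_e * np.eye(m)
    return mean, cov

# Small example: n_d = 4 sampled units, 3 non-sampled units.
s2u, s2e = 0.3, 0.7
mu_s, mu_r = np.array([1.0, 2.0, 0.5, 1.5]), np.array([0.0, 1.0, 2.0])
y_s = np.array([1.4, 2.1, 0.2, 2.0])
V_s = s2u * np.ones((4, 4)) + s2e * np.eye(4)
V_rs = s2u * np.ones((3, 4))
V_r = s2u * np.ones((3, 3)) + s2e * np.eye(3)

m_gen, C_gen = conditional_moments_generic(mu_s, mu_r, V_s, V_rs, V_r, y_s)
m_nest, C_nest = conditional_moments_nested(mu_s, mu_r, s2u, s2e, y_s)
```

The covariance $\sigma_u^2(1-\gamma_d) 1 1' + \sigma_e^2 I$ is exactly the covariance of $v_d 1 + \epsilon_{dr}$, which is why one area-level draw plus independent unit-level draws is enough.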

SLIDE 15

MONTE CARLO APPROXIMATION

(a) Generate $L$ non-sample vectors $y_{dr}^{(\ell)}$, $\ell = 1, \ldots, L$, from the (estimated) conditional distribution of $y_{dr} \mid y_{ds}$.
(b) Attach the sample elements to form a population vector $y_d^{(\ell)} = (y_{ds}, y_{dr}^{(\ell)})$, $\ell = 1, \ldots, L$.
(c) Calculate the poverty measure with each population vector, $F_{\alpha d}^{(\ell)} = h_\alpha(y_d^{(\ell)})$, $\ell = 1, \ldots, L$. Then take the average over the $L$ Monte Carlo generations:
    \[ \hat{F}_{\alpha d}^{EB} = E_{y_{dr}}[F_{\alpha d} \mid y_{ds}] \cong \frac{1}{L} \sum_{\ell=1}^{L} F_{\alpha d}^{(\ell)}. \]
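
Steps (a) to (c) can be sketched for a single area as follows. This is a hedged illustration rather than the authors' code: the model parameters are passed in as if already estimated, `mu_s` and `mu_r` stand for $X_{ds}\hat{\beta}$ and $X_{dr}\hat{\beta}$, and the default back-transformation `np.exp` assumes $T = \log$, which the slides do not specify. The function name `eb_fgt_monte_carlo` is hypothetical.

```python
import numpy as np

def eb_fgt_monte_carlo(y_s, mu_s, mu_r, sigma2_u, sigma2_e, z, alpha,
                       inv_transform=np.exp, L=50, rng=None):
    """Monte Carlo approximation of the EB estimator of F_alpha for one area.

    Assumptions: parameters are given (estimated elsewhere) and the welfare
    variable is recovered with inv_transform (np.exp, i.e. T = log, by default).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_d, m = len(y_s), len(mu_r)
    gamma_d = sigma2_u / (sigma2_u + sigma2_e / n_d)
    cond_mean = mu_r + gamma_d * np.mean(y_s - mu_s)
    draws = np.empty(L)
    for ell in range(L):
        # (a) non-sample vector: one area-level draw v_d plus unit-level errors
        v_d = rng.normal(0.0, np.sqrt(sigma2_u * (1 - gamma_d)))
        y_r = cond_mean + v_d + rng.normal(0.0, np.sqrt(sigma2_e), size=m)
        # (b) attach the observed sample values
        y_d = np.concatenate([y_s, y_r])
        # (c) poverty measure on the original welfare scale
        welfare = inv_transform(y_d)
        poor = welfare < z
        gap = np.where(poor, (z - welfare) / z, 0.0)
        draws[ell] = np.mean(gap ** alpha * poor)
    return draws.mean()
```

If the area is fully sampled (`mu_r` empty), every replicate reproduces the sample, and the estimator equals the FGT indicator computed from the observed values.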

SLIDE 16

NON-SAMPLED AREAS

  • $Y_{dj}^{(\ell)}$ for $j = 1, \ldots, N_d$ and $\ell = 1, \ldots, L$ are generated from
    \[ Y_{dj}^{(\ell)} = x_{dj}'\hat{\beta} + u_d^{(\ell)} + e_{dj}^{(\ell)}, \quad u_d^{(\ell)} \overset{iid}{\sim} N(0, \hat{\sigma}_u^2), \quad e_{dj}^{(\ell)} \overset{iid}{\sim} N(0, \hat{\sigma}_e^2). \]
  • Calculate $\hat{F}_{\alpha d}^{(\ell)}$ from $\{Y_{dj}^{(\ell)}\}$ and use
    \[ \hat{F}_{\alpha d}^{EB} \simeq \frac{1}{L} \sum_{\ell=1}^{L} \hat{F}_{\alpha d}^{(\ell)}. \]
  • $\hat{F}_{\alpha d}^{EB}$ is a synthetic estimator.

SLIDE 17

MSE ESTIMATION

  • Construct bootstrap populations $\{Y_{dj}^{*(b)}, b = 1, \ldots, B\}$ from
    \[ Y_{dj}^{*} = x_{dj}'\hat{\beta} + u_d^{*} + e_{dj}^{*}, \quad j = 1, \ldots, N_d, \; d = 1, \ldots, D, \]
    \[ u_d^{*} \overset{iid}{\sim} N(0, \hat{\sigma}_u^2), \quad e_{dj}^{*} \overset{iid}{\sim} N(0, \hat{\sigma}_e^2). \]
  • Calculate the bootstrap population parameters $F_{\alpha d}^{*}(b)$.
  • From each bootstrap population, take the sample with the same indexes $S$ as in the initial sample and calculate the EB estimators $\hat{F}_{\alpha d}^{EB*}(b)$ using the bootstrap sample data $y_s^{*}$ and the known $x_{dj}$.
    \[ mse_{*}(\hat{F}_{\alpha d}^{EB}) = \frac{1}{B} \sum_{b=1}^{B} \{ \hat{F}_{\alpha d}^{EB*}(b) - F_{\alpha d}^{*}(b) \}^2 \]
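
The bootstrap loop has a simple generic shape, sketched below for one area. This is only a skeleton under stated assumptions: `generate_population`, `target` and `estimator` are hypothetical callables supplied by the user, and in the method described above `estimator` would refit the model and recompute the EB estimator from each bootstrap sample, which is not implemented here.

```python
import numpy as np

def bootstrap_mse(generate_population, target, estimator, sample_idx,
                  B=200, rng=None):
    """Parametric bootstrap MSE: average squared error over B replicates.

    generate_population(rng) -> bootstrap population vector for the area
    target(y_pop)            -> true bootstrap parameter F*(b)
    estimator(y_sample)      -> estimate recomputed from the bootstrap sample
    sample_idx               -> indexes of the original sample (kept fixed)
    """
    rng = np.random.default_rng() if rng is None else rng
    sq_err = np.empty(B)
    for b in range(B):
        y_pop = generate_population(rng)
        truth = target(y_pop)
        estimate = estimator(y_pop[sample_idx])
        sq_err[b] = (estimate - truth) ** 2
    return sq_err.mean()
```

As a toy check with the sample mean standing in for the EB estimator and the population mean as the target, the area effect cancels and the result should approach $\sigma_e^2 (1/n_d - 1/N_d)$ as $B$ grows.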

SLIDE 18

WORLD BANK (WB) / ELL METHOD

  • Elbers et al. (2003) also used a nested-error model on the transformed variables $Y_{dj}$, using clusters as $d$.
  • For comparability, we take the cluster as the small area.
  • Generate $A$ bootstrap populations $\{Y_{dj}^{*}(a), a = 1, \ldots, A\}$.
  • Calculate $F_{\alpha d}^{*}(a)$, $a = 1, \ldots, A$. Then the ELL estimator is:
    \[ \hat{F}_{\alpha d}^{ELL} = \frac{1}{A} \sum_{a=1}^{A} F_{\alpha d}^{*}(a) = F_{\alpha d}^{*}(\cdot) \]

SLIDE 19

WORLD BANK (WB) / ELL METHOD

  • MSE estimator:
    \[ mse(\hat{F}_{\alpha d}^{ELL}) = \frac{1}{A} \sum_{a=1}^{A} \{ F_{\alpha d}^{*}(a) - F_{\alpha d}^{*}(\cdot) \}^2 \]
  • If the mean $\bar{Y}_d$ is the parameter of interest, then $\hat{\bar{Y}}_d^{ELL} \simeq \bar{X}_d \hat{\beta}$.
  • $\hat{\bar{Y}}_d^{ELL}$ is a regression synthetic estimator.
  • For non-sampled areas, $\hat{F}_{\alpha d}^{ELL}$ is essentially equivalent to $\hat{F}_{\alpha d}^{EB}$.
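
The ELL estimator and its MSE estimator, as described on these two slides, can be sketched for one area as follows. This is an illustration under assumptions: populations are generated from the fitted marginal model with the estimates held fixed, the function name `ell_estimates` is hypothetical, and `h` is any user-supplied function of the generated area vector.

```python
import numpy as np

def ell_estimates(X_d, beta_hat, sigma2_u_hat, sigma2_e_hat, h, A=100, rng=None):
    """ELL-type estimator and MSE estimator for one area.

    Each replicate draws a fresh area effect and unit errors from the fitted
    marginal model; no conditioning on the sample values takes place.
    """
    rng = np.random.default_rng() if rng is None else rng
    N_d = X_d.shape[0]
    F_star = np.empty(A)
    for a in range(A):
        u = rng.normal(0.0, np.sqrt(sigma2_u_hat))
        e = rng.normal(0.0, np.sqrt(sigma2_e_hat), size=N_d)
        F_star[a] = h(X_d @ beta_hat + u + e)
    estimate = F_star.mean()                   # F*(.)
    mse = np.mean((F_star - estimate) ** 2)    # spread of the replicates
    return estimate, mse
```

With `h=np.mean` the simulated area effects and errors average out over replicates, so the estimate approaches $\bar{X}_d \hat{\beta}$, the regression synthetic estimator noted above.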

SLIDE 20

MODEL-BASED EXPERIMENT

  • We simulated $I = 1000$ populations from the nested-error model.
  • For each population, we computed the true domain poverty measures.
  • We computed the MSE of the EB estimators as
    \[ MSE(\hat{F}_{\alpha d}^{EB}) = \frac{1}{I} \sum_{i=1}^{I} \left( \hat{F}_{\alpha d}^{EB(i)} - F_{\alpha d}^{(i)} \right)^2, \quad d = 1, \ldots, D. \]
  • Similarly for the direct and ELL estimators.

SLIDE 21

MODEL-BASED EXPERIMENT

Sizes:
  $N = 20000$, $D = 80$
  $N_d = 250$, $d = 1, \ldots, D$
  $n_d = 50$, $d = 1, \ldots, D$

Variance components:
  $\sigma_e^2 = (0.5)^2$, $\sigma_u^2 = (0.15)^2$

SLIDE 22

MODEL-BASED EXPERIMENT

Explanatory variables:
  $X_1 \in \{0, 1\}$, $p_{1d} = 0.3 + 0.5d/80$, $d = 1, \ldots, D$.
  $X_2 \in \{0, 1\}$, $p_{2d} = 0.2$, $d = 1, \ldots, D$.

Coefficients: $\beta = (3, 0.03, -0.04)'$.

  • The response increases when moving from $X_1 = 0$ to $X_1 = 1$, and decreases when moving from $X_2 = 0$ to $X_2 = 1$.
  • The “richest” people are those with $X_1 = 1$ and $X_2 = 0$.
  • The last areas have “richer” individuals than the first areas, i.e., poverty decreases with the area index.
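
One population of the simulation design can be generated as in the sketch below. It uses the sizes, success probabilities, coefficients and variance components stated on the slides; the function name `simulate_population` and the returned layout are hypothetical, and nothing is assumed here about the transformation $T$ or the sampling scheme.

```python
import numpy as np

def simulate_population(D=80, N_d=250, beta=(3.0, 0.03, -0.04),
                        sigma_u=0.15, sigma_e=0.5, rng=None):
    """Generate one population from the nested-error model of the experiment."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.asarray(beta)
    area = np.repeat(np.arange(1, D + 1), N_d)     # area label of every unit
    p1 = 0.3 + 0.5 * area / 80                     # P(X1 = 1) grows with the area index
    x1 = rng.binomial(1, p1)
    x2 = rng.binomial(1, 0.2, size=area.size)      # P(X2 = 1) = 0.2 in every area
    X = np.column_stack([np.ones(area.size), x1, x2])
    u = rng.normal(0.0, sigma_u, size=D)           # area effects
    e = rng.normal(0.0, sigma_e, size=area.size)   # unit-level errors
    y = X @ beta + u[area - 1] + e
    return area, X, y

area, X, y = simulate_population(rng=np.random.default_rng(0))
```

Because $p_{1d}$ increases with $d$ and the coefficient of $X_1$ is positive, the expected response is higher in the last areas, which matches the remark that poverty decreases with the area index.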

SLIDE 23

POVERTY INCIDENCE

  • Bias negligible for all three estimators (EB, direct and ELL).
  • EB much more efficient than ELL and direct estimators.
  • ELL even less efficient than direct estimators!

[Figure panels: a) Bias (%) of poverty incidence (×100) by area; b) MSE (×10^4) of poverty incidence by area. Series: EB, Sample (direct), ELL.]

Figure 1. a) Bias and b) MSE of EB, direct and ELL estimators of poverty incidences $F_{0d}$ for each area $d$.

SLIDE 24

POVERTY GAP

  • Same conclusions as for poverty incidence.

[Figure panels: a) Bias (%) of poverty gap (×100) by area; b) MSE (×10^4) of poverty gap by area. Series: EB, Direct, ELL.]

Figure 2. a) Bias and b) MSE of EB, direct and ELL estimators of poverty gaps $F_{1d}$ for each area $d$.

SLIDE 25

BOOTSTRAP MSE

  • The bootstrap MSE tracks the true MSE.

[Figure panels: a) MSE of poverty incidence (×10^4) by area; b) MSE of poverty gap (×10^4) by area. Series: True MSE, Bootstrap MSE.]

Figure 3. True MSEs and bootstrap estimators (×10^4) of EB estimators with $B = 500$ for each area $d$.

SLIDE 26

CENSUS EB METHOD

  • When sample data cannot be linked with census auxiliary data, in steps (a) and (b) of the EB method generate a full census from
    \[ y_d = \hat{\mu}_{d|ds} + v_d 1_{N_d} + \epsilon_d, \quad \hat{\mu}_{d|ds} = X_d \hat{\beta} + \hat{\sigma}_u^2 1_{N_d} 1_{n_d}' \hat{V}_{ds}^{-1} (y_{ds} - X_{ds}\hat{\beta}). \]
  • Practically the same as the original EB method.

[Figure panels: a) Mean (×100), axis labelled "Poverty incidence (x100)", by area; b) MSE (×10^4), axis labelled "MSE poverty incidence (x10000)", by area. Series: EB, Census EB.]

Figure 4. a) Mean and b) MSE of EB and Census EB estimators of poverty gaps $F_{1d}$ for each area $d$.

SLIDE 27

FAST EB METHOD

  • For large populations or computationally complex indicators.
  • Instead of generating a full census in the EB method, generate only samples from the conditional distribution and compute direct estimators instead of true values.
  • The fast EB method is quite close to the original EB.

[Figure: MSE of PI by area. Series: EB, Design-based, ELL, fastEB.]

Figure 5. MSE (×10^4) of EB, direct, ELL and fast EB estimators of PI.

Ferretti, Molina & Lemmi, Submitted to JISAS

SLIDE 28

SKEW-NORMAL EB

  • Nested-error model with $e_{dj}$ skew normal:
    \[ u_d \overset{iid}{\sim} N(0, \sigma_u^2), \quad e_{dj} \overset{iid}{\sim} SN(0, \sigma_e^2, \lambda_e), \quad \theta = (\beta', \sigma_u^2, \sigma_e^2, \lambda_e)'. \]
    $\lambda_e = 0$ corresponds to the Normal case.
  • As in the Normal case, the EB estimator can be computed by generating only univariate normal variables, conditionally given a half-normal variable $T = t$.
  • SN-EB was computed assuming $\theta$ is known.
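
The role of the half-normal variable can be illustrated with the standard stochastic representation of the skew normal: if $T$ is half-normal and $W$ is standard normal, independent of $T$, then $\delta T + \sqrt{1 - \delta^2}\, W$ is skew normal with $\delta = \lambda/\sqrt{1 + \lambda^2}$. The sketch below assumes Azzalini's direct parameterization with location 0 and scale $\sigma_e$; whether the talk uses exactly this parameterization is not stated, and the function name `rskewnormal` is hypothetical.

```python
import numpy as np

def rskewnormal(size, scale, lam, rng=None):
    """Draw skew-normal errors with location 0, given scale and skewness lam.

    Assumption: Azzalini's direct parameterization. Given the half-normal
    draw T = t, the remaining randomness is a single univariate normal.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = lam / np.sqrt(1.0 + lam ** 2)
    t = np.abs(rng.normal(size=size))   # half-normal variable T
    w = rng.normal(size=size)           # independent standard normal
    return scale * (delta * t + np.sqrt(1.0 - delta ** 2) * w)

# Errors with scale 0.5 and skewness 5, and the symmetric case lam = 0.
e_skew = rskewnormal(200_000, 0.5, 5.0, rng=np.random.default_rng(0))
e_norm = rskewnormal(200_000, 0.5, 0.0, rng=np.random.default_rng(1))
```

Under this parameterization the errors have mean $\sigma_e \delta \sqrt{2/\pi}$ rather than 0, so with $\lambda_e = 0$ the draws reduce to $N(0, \sigma_e^2)$ and are centred at zero.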

SLIDE 29

SKEW-NORMAL EB SIMULATION

  • EB is biased under significant skewness (λ > 1), unlike SN-EB.

[Figure panels: a) Bias of SN-EB estimator; b) Bias of EB estimator. Both show the bias of the poverty gap (×100) by area. Series: SN(0.5,10), SN(0.5,5), SN(0.5,3), SN(0.5,2), SN(0.5,1).]

Figure 6. Bias of a) SN-EB estimator and b) EB estimator under skew normal distributions for error term for λ = 1, 2, 3, 5, 10.

Diallo & Rao, Work in progress

SLIDE 30

SKEW-NORMAL EB SIMULATION

  • RMSE = MSE(EB)/MSE(SN-EB)
  • SN-EB significantly more efficient than EB when λ > 1.

[Figure: RMSE = MSE(MR)/MSE(SN-MR) for the poverty gap, by area. Series: SN(0.5,10), SN(0.5,5), SN(0.5,3), SN(0.5,2), SN(0.5,1), SN(0.5,0).]

Figure 7. RMSE for skewness parameter λ = 1, 2, 3, 5, 10.

SLIDE 31

SMALL AREA DISTRIBUTION FUNCTION

  • EB is good for estimating other non-linear characteristics such as the distribution function.

[Figure panels: a) Mean (×100) of the distribution function against z; series: true distribution function, EB, Direct, ELL. b) MSE (×10^4) of the distribution function estimators against z; series: EB, Direct, ELL.]

Figure 8. a) Mean of true, EB, direct and ELL estimators of the distribution function and b) MSE of estimators for area d = 1.

SLIDE 32

HIERARCHICAL BAYES METHOD

  • Reparameterized nested-error model:
    \[ y_{di} \mid u_d, \beta, \sigma^2 \overset{ind}{\sim} N(x_{di}'\beta + u_d, \sigma^2), \quad u_d \mid \rho, \sigma^2 \overset{ind}{\sim} N\left( 0, \frac{\rho}{1 - \rho} \sigma^2 \right). \]
  • Noninformative prior: $\pi(\beta, \sigma^2, \rho) \propto 1/\sigma^2$.
  • Proper posterior density (provided $X$ has full column rank):
    \[ \pi(u, \beta, \sigma^2, \rho \mid y_s) = \pi_1(u \mid \beta, \sigma^2, \rho, y_s) \, \pi_2(\beta \mid \sigma^2, \rho, y_s) \, \pi_3(\sigma^2 \mid \rho, y_s) \, \pi_4(\rho \mid y_s) \]
  • $u_d \mid \beta, \sigma^2, \rho, y_s \overset{ind}{\sim}$ Normal, $\beta \mid \sigma^2, \rho, y_s \sim$ Normal, $\sigma^{-2} \mid \rho, y_s \sim$ Gamma.
  • $\pi_4(\rho \mid y_s)$ is not simple, but $\rho$-values can be generated from it using a grid method.

Rao, Nandram & Molina, Work in progress
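
A grid method for drawing from a one-dimensional density on $(0, 1)$ can be sketched as follows. The exact form of $\pi_4(\rho \mid y_s)$ is not given in the slides, so the example uses a purely hypothetical log-density (a Beta(3, 5) kernel) as a stand-in; the function name `sample_from_grid` and the grid size are also assumptions.

```python
import numpy as np

def sample_from_grid(log_density, n_draws, grid_size=1000, rng=None):
    """Approximate draws from a density on (0, 1) known up to a constant."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(0.0, 1.0, grid_size + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])      # evaluate at cell midpoints
    logp = log_density(mids)
    probs = np.exp(logp - logp.max())          # subtract the max for stability
    probs /= probs.sum()
    cells = rng.choice(grid_size, size=n_draws, p=probs)
    # Spread each draw uniformly within its selected cell.
    width = edges[1] - edges[0]
    return edges[cells] + rng.uniform(size=n_draws) * width

# Hypothetical stand-in for log pi_4(rho | y_s): a Beta(3, 5) kernel.
def toy_log_density(rho):
    return 2.0 * np.log(rho) + 4.0 * np.log1p(-rho)

rho_draws = sample_from_grid(toy_log_density, 20_000, rng=np.random.default_rng(0))
```

Given each drawn $\rho$, the remaining parameters can then be generated in turn from the Gamma and Normal conditionals listed above, so no Markov chain is required.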

SLIDE 33

HIERARCHICAL BAYES METHOD

  • Very similar to the original EB method (frequentist validity).

[Figure panels: a) Mean (×100), axis labelled "Poverty incidence (x100)", by area; b) MSE (×10^4), axis labelled "MSE poverty incidence (x10000)", by area. Series: EB, HB.]

Figure 9. a) Mean and b) MSE of EB and HB estimators of poverty gaps $F_{1d}$ for each area $d$.

Rao, Nandram & Molina, Work in progress

SLIDE 34

CONCLUSIONS

  • We studied EB and HB estimation of complex small area parameters.
  • The method is applicable to unit-level data.
  • The EB method assumes normality for some transformation of the variable of interest. The EB work has been extended to skew-normal distributions.
  • It requires knowledge of all population values of the auxiliary variables.
  • It requires computational effort because a large number of populations are generated. A fast EB method is available.

SLIDE 35

CONCLUSIONS

  • The original EB method, unlike the ELL method, requires linking the sample with census data for the auxiliary variables. The Census EB method avoids the linking and is practically the same as the original EB.
  • Both the EB and ELL methods assume that the sample is non-informative, that is, the model for the population also holds for the sample. Under informative sampling, both methods are probably biased. An extension of the EB method accounting for informative sampling is currently being studied.