Current Trends in Small Area Estimation Research Partha Lahiri - PowerPoint PPT Presentation

Current Trends in Small Area Estimation Research Partha Lahiri JPSM, University of Maryland, College Park, USA Paper to be presented at Q2008, Rome, Italy, July 10, 2008

What is a Small Area?
• A subpopulation of interest for which the sample size is not adequate to produce reliable direct estimates.
• Examples:

Geographic Region        Small Area
Nation                   State
State                    County, school district

Demographic Group        Small Domain
Broad group              Narrow groups by sex/race/ethnicity

Examples
• Survey of drug use in Nebraska, N=4300. Boone County has n=14, and only 1 white female aged 25-44 was sampled.
• In SAIPE, about one-third of the counties are in the sample.
• In NHANES III, a majority of US states have no sample.

A Historical Note
• 11th century England and 17th century Canada
  – Based on census or administrative records.
• The past three decades
  – Increasing demand for small area statistics, due to their growing use in formulating policies and programs, in the allocation of government funds, and in regional planning.

Design Issues
Ref: Singh et al. (1994), Marker (2001), Rao (2003)
• Stratification
  – Use a large number of smaller strata
• Degree of clustering
  – Minimize clustering
• Sample allocation
  – Reallocate sample from large planned domains to smaller planned domains
• Rolling samples (ACS), multiple frames
• In the Canadian LFS, max(CV) for UI regions was reduced by about half using compromise allocation.

Planned Domains:
• Minimize a weighted sum of sampling variances of direct small area estimators subject to a fixed overall sample size. Ref: Longford (2006)
• Minimize total sample size (or cost) subject to desired tolerances on the area sampling variances and on the aggregate sampling variance. Ref: Rao (2007)
• Achieve (approximately) equal RRMSE of GREG for the planned domains subject to a fixed cost. Ref: Gabler, Ganninger, Münnich, and others

• Achieve equal RRMSE of EBP (or of the estimator to be used) for the planned domains subject to a fixed cost.
However, “the client will always require more than specified at the design stage” (Fuller, 1999).
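As a worked illustration of the first criterion (minimizing a weighted sum of sampling variances under a fixed overall sample size), here is a sketch under simplifying assumptions that are not in the slides: simple random sampling within each planned domain $d$, no finite population correction, sampling variance $S_d^2/n_d$, and user-chosen weights $w_d$. The symbols $w_d$, $S_d$, $n_d$, $n$, and $\lambda$ are introduced here only for illustration.

```latex
\min_{n_1,\dots,n_D}\ \sum_{d=1}^{D} w_d \frac{S_d^2}{n_d}
\quad \text{subject to} \quad \sum_{d=1}^{D} n_d = n .

% Lagrangian stationarity condition for each domain d:
-\,w_d \frac{S_d^2}{n_d^2} + \lambda = 0
\;\;\Longrightarrow\;\;
n_d = \frac{\sqrt{w_d}\, S_d}{\sqrt{\lambda}} .

% Imposing the sample size constraint gives the allocation:
n_d = n\, \frac{\sqrt{w_d}\, S_d}{\sum_{k=1}^{D} \sqrt{w_k}\, S_k} .
```

Under these assumptions, larger weights on small domains shift sample toward them, which is the reallocation idea listed under the design issues.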

Issues in Small Area Estimation
1. Definition of small areas
2. Identification of relevant sources of information
3. Method of combining information
4. Small area estimates
5. Accuracy of the SAE method
6. Robust validation
7. Computer programming
8. Presentation of SAE statistics

Borrowing Strength:
• Relevant sources of information
  – Census data
  – Administrative information
  – Related surveys
• Method of combining information
  – Choice of good small area models
  – Use of a good statistical methodology

Synthetic Estimators
1944 Radio Listening Survey, Hansen, Hurwitz and Madow (1953, p. 483-486): To estimate the median number of radio stations heard during the day for over 500 counties (small areas). The following explicit regression equation, based on data for 85 counties, was used:
$\hat{y}_i = 0.52 + 0.74\,x_i$,
where, for county $i$,
$y_i$: estimate obtained from the personal interview survey
$x_i$: estimate obtained from the mail survey

County Crop Production (Stasny et al., 1991)
To estimate wheat production for each county of Kansas:
$\hat{y}_{ij} = \hat\beta_0 + \hat\beta_1 x_{1ij} + \cdots + \hat\beta_p x_{pij}$, where
$y_{ij}$: wheat production of the $j$th farm in the $i$th county
$x_{ij} = (1, x_{1ij}, \ldots, x_{pij})'$: a vector of auxiliary variables
Regression-synthetic estimator:
$\hat{Y}_i = \sum_j \hat{y}_{ij} = N_i\hat\beta_0 + X_{i1}\hat\beta_1 + \cdots + X_{ip}\hat\beta_p$
The total number of farms $N_i$ and the totals of the auxiliary variables $X_{il}$ $(l = 1, \ldots, p)$ are known.

Ratio adjustment: $\hat{Y}_{i,adj} = \dfrac{\hat{Y}_i}{\sum_i \hat{Y}_i}\,\hat{Y}$, where $\hat{Y}$ is the direct design-based estimate for the state from a large probability sample.
NCHS synthetic state estimates for health variables: assume homogeneity within carefully constructed post-strata.
More refined synthetic estimation: SPREE.
World Bank method: Elbers et al. (2003), Haslett-Jones (2005)
Off-the-shelf methods: Schirm and Zaslavsky (1997)
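The regression-synthetic estimator and the ratio adjustment can be sketched in a few lines. This is a minimal illustration, not the procedure of Stasny et al.: the function names and all numbers are hypothetical, and the fitted coefficients are assumed to be given.

```python
def regression_synthetic(beta, n_farms, x_totals):
    """County total: N_i * beta_0 + sum_l X_il * beta_l.

    beta     -- fitted coefficients [beta_0, beta_1, ..., beta_p] (assumed given)
    n_farms  -- known number of farms N_i in the county
    x_totals -- known county totals [X_i1, ..., X_ip] of the auxiliary variables
    """
    return n_farms * beta[0] + sum(b * x for b, x in zip(beta[1:], x_totals))


def ratio_adjust(synthetic_totals, state_direct):
    """Scale county estimates so that they add up to the direct state estimate."""
    total = sum(synthetic_totals)
    return [y * state_direct / total for y in synthetic_totals]


# Hypothetical inputs: two counties, two auxiliary variables.
beta = [2.0, 0.5, 1.5]
counties = [(100, [400.0, 50.0]), (50, [100.0, 20.0])]
synthetic = [regression_synthetic(beta, n, x) for n, x in counties]
adjusted = ratio_adjust(synthetic, state_direct=700.0)
```

The adjusted county estimates keep their relative sizes but sum to the state-level direct estimate.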

Basic Area Level Model
To estimate small area means $Y_i$ using direct design-based estimates $y_i$ and area-level auxiliary variables $x_i$.
A basic area-level model:
Level 1: $\hat\theta_i = g(y_i) \overset{ind}{\sim} N(\theta_i, \psi_i)$
Level 2: $\theta_i = g(Y_i) \overset{ind}{\sim} N(x_i^T\beta, \tau^2)$
Fay and Herriot (1979): $g(Y_i) = \log(Y_i)$

Carter and Rolph (1974), Efron and Morris (1975): $g(Y_i) = \arcsin\left(\sqrt{Y_i}\right)$
SAIPE: $g(Y_i) = Y_i$ for state-level estimation of the proportion of poor school-age children, and $g(Y_i) = \log(Y_i)$ for county-level poverty counts of school-age children.
The model can be written as a simple linear mixed normal model:
$\hat\theta_i = \theta_i + e_i = x_i^T\beta + v_i + e_i$, where
$e_i$: sampling error; $e_i \overset{ind}{\sim} N(0, \psi_i)$
$v_i$: area-specific random effects; $v_i \overset{iid}{\sim} N(0, \tau^2)$

Supplementary Information Used
• Per-capita income for the county
• Value of housing for the place
• Value of housing for the county
• IRS adjusted gross income per exemption for the place
• IRS adjusted gross income per exemption for the county

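The BP:
$\hat\theta_i^{BP} = E(\theta_i \mid \hat\theta_i; \beta, \tau^2)$
$= x_i^T\beta + \gamma_i(\hat\theta_i - x_i^T\beta)$
$= \gamma_i\hat\theta_i + (1 - \gamma_i)\,x_i^T\beta$,
where $\gamma_i = \dfrac{\tau^2}{\tau^2 + \psi_i}$.
EBP (or EBLUP): $\hat\theta_i^{EBP} = E(\theta_i \mid \hat\theta_i; \hat\beta, \hat\tau^2)$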
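The shrinkage form of the BP is easy to check numerically. The sketch below takes the regression prediction $x_i^T\beta$ and the variances as given inputs; the function name and the numbers are illustrative only.

```python
def best_predictor(theta_hat, xb, tau2, psi):
    """BP under the basic area-level model.

    theta_hat -- direct estimate for the area (transformed scale)
    xb        -- regression prediction x_i' beta
    tau2      -- model variance tau^2
    psi       -- sampling variance psi_i (treated as known)
    """
    gamma = tau2 / (tau2 + psi)          # shrinkage factor gamma_i
    return gamma * theta_hat + (1.0 - gamma) * xb


# A noisy direct estimate (psi large relative to tau2) is pulled
# most of the way toward the regression prediction.
print(best_predictor(theta_hat=10.0, xb=6.0, tau2=1.0, psi=3.0))  # 7.0
```

Replacing `xb` and `tau2` with estimates gives the EBP; with `tau2 = 0` the function returns the regression prediction itself.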

Different MSEs of the EBP:
(i) $E(\hat\theta_i^{EBP} - \theta_i)^2$
(ii) $E\left[(\hat\theta_i^{EBP} - \theta_i)^2 \mid \theta_i\right]$
(iii) $E\left[(\hat\theta_i^{EBP} - \theta_i)^2 \mid \hat\theta_i\right]$
(iv) $E\left[(\hat\theta_i^{EBP} - \theta_i)^2 \mid \hat\theta_i,\ i = 1, \ldots, m\right]$
The majority of research has focused on estimating the unconditional MSE (i).

$MSE(\hat\theta_i^{EBP}) \approx g_{1i}(\tau^2) + g_{2i}(\tau^2) + g_{3i}(\tau^2)$
$g_{1i}(\tau^2) = MSE(\hat\theta_i^{BP})$
$g_{2i}(\tau^2)$: the extra variability due to the estimation of $\beta$
$g_{3i}(\tau^2)$: the extra variability due to the estimation of $\tau^2$
Ref: Prasad and Rao (1990) and Datta and Lahiri (2000)
The terms $g_{2i}(\tau^2)$ and $g_{3i}(\tau^2)$ are of the same order, and that order is lower than that of the leading term $g_{1i}(\tau^2)$. PR and DL obtained a second-order unbiased (or nearly unbiased) estimator of the unconditional MSE using the above approximation and correcting the bias of $g_{1i}(\hat\tau^2)$.
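For reference, the three terms have standard closed forms under the basic area-level model with known $\psi_i$. The block below writes them out in the notation used above. $\overline{V}(\hat\tau^2)$, the asymptotic variance of the estimator of $\tau^2$, is a symbol introduced here, and the final line is the form usually given for the moment estimator of $\tau^2$ in Prasad and Rao (1990).

```latex
g_{1i}(\tau^2) = \gamma_i \psi_i = \frac{\tau^2 \psi_i}{\tau^2 + \psi_i},
\qquad
g_{2i}(\tau^2) = (1-\gamma_i)^2\, x_i^T
  \Big[ \sum_{j=1}^{m} \frac{x_j x_j^T}{\tau^2 + \psi_j} \Big]^{-1} x_i,
\qquad
g_{3i}(\tau^2) = \frac{\psi_i^2}{(\tau^2 + \psi_i)^3}\, \overline{V}(\hat\tau^2).

% Plugging in \hat\tau^2 underestimates g_{1i} by about g_{3i}, which is
% why g_{3i} appears twice in the nearly unbiased estimator:
mse_i^{PR} = g_{1i}(\hat\tau^2) + g_{2i}(\hat\tau^2) + 2\, g_{3i}(\hat\tau^2).
```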

Longford (2007): The PR MSE estimator did not perform well in estimating the design-based MSE for the EURAREA project.
Zhang (2007): The PR MSE estimator, averaged over areas, tracks the average of the design-based MSE for large m, if the model holds.
Different resampling methods [jackknife and parametric bootstrap] have been proposed by Butar and Lahiri (2003), Jiang, Lahiri and Wan (2002), Hall and Maiti (2006), Pfeffermann and Glickmann (2004), and Chatterjee and Lahiri (2007). Compared to the Taylor series method, they performed well in simulations; see Fabrizi et al. (2007) and Pereira and Pedro (2008).

Issues:
The method uses a simple model and results in an EBP which is design-consistent.
Normality: The EBP method is extendable to specified non-normal distributions for the sampling errors and random effects. For unspecified non-normality of the sampling errors and random effects, one can use EBLUP [Lahiri and Rao, 1995], certain adaptive methods [Lahiri, 2002; Fabrizi and Trivisano, 2007], or linear EB [Ghosh and Lahiri, 1987; Cocchi and Mouchart].

Known sampling variances $\psi_i$: GVF-type methods are generally used. The method usually does not consider the small area effect, and the uncertainty in estimating the sampling variances is not included in the EBP.
In some situations, standard estimates [REML, ML, ANOVA, etc.] of the model variance $\tau^2$ can be zero. When $\hat\tau^2$ is zero, the EBLUP reduces to the regression-synthetic estimate. One way to avoid the problem is to use the ADM or AML estimates [Morris, 1987; Li and Lahiri, 2007].
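The zero-estimate problem is easy to reproduce. The sketch below assumes an intercept-only model ($x_i = 1$) and a Prasad-Rao type moment estimator truncated at zero. The function names and the data are hypothetical, chosen so that the direct estimates vary less than their sampling variances would suggest.

```python
def moment_tau2(theta_hat, psi):
    """Truncated moment estimator of tau^2 for an intercept-only model."""
    m = len(theta_hat)
    mean = sum(theta_hat) / m
    rss = sum((t - mean) ** 2 for t in theta_hat)
    # For an intercept-only OLS fit the leverage of every area is 1/m.
    return max(0.0, (rss - sum(psi) * (1.0 - 1.0 / m)) / (m - 1))


def ebp_intercept_only(theta_hat, psi, tau2):
    """EBP with the common mean estimated by weighted least squares."""
    w = [1.0 / (tau2 + p) for p in psi]
    mu = sum(wi * t for wi, t in zip(w, theta_hat)) / sum(w)
    gamma = [tau2 / (tau2 + p) for p in psi]
    return mu, [g * t + (1.0 - g) * mu for g, t in zip(gamma, theta_hat)]


theta_hat = [10.0, 10.2, 9.9, 10.1]   # direct estimates, tightly clustered
psi = [1.0, 1.0, 1.0, 1.0]            # sampling variances, treated as known
tau2 = moment_tau2(theta_hat, psi)    # truncated to 0.0 for these data
mu, ebp = ebp_intercept_only(theta_hat, psi, tau2)
```

With `tau2` equal to zero every shrinkage factor is zero, so each EBP equals the synthetic value `mu` and the direct estimates are ignored entirely.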

A simple back transformation is often used to obtain the estimate of $Y_i$. The optimum property of the BP is lost by such a back transformation.
The BP of $Y_i = g^{-1}(\theta_i)$: $E\left(g^{-1}(\theta_i) \mid \hat\theta_i; \beta, \tau^2\right)$
An EBP of $Y_i$: $E\left(g^{-1}(\theta_i) \mid \hat\theta_i; \hat\beta, \hat\tau^2\right)$
The rationale behind the transformation $g(\cdot)$ rests on the Taylor series argument, and the transformation is used primarily to stabilize the variance. A direct modeling of the direct estimates is possible, but this is likely to lead to a non-linear, non-normal mixed model.
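A worked case shows what is lost. Assuming the normal area-level model above with $g = \log$ and known $\beta$ and $\tau^2$, the conditional distribution of $\theta_i$ is normal, so the BP of $Y_i$ follows from the lognormal mean:

```latex
\theta_i \mid \hat\theta_i \;\sim\; N\!\left(\hat\theta_i^{BP},\ \gamma_i \psi_i\right)
\quad\Longrightarrow\quad
E\!\left(e^{\theta_i} \mid \hat\theta_i;\ \beta, \tau^2\right)
  = \exp\!\left(\hat\theta_i^{BP} + \tfrac{1}{2}\gamma_i\psi_i\right).

% The simple back transformation gives \exp(\hat\theta_i^{BP}) instead,
% which is smaller by the factor \exp(\gamma_i \psi_i / 2).
```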

Confidence Interval:
The intuitive interval [Cox, 1976]
$\hat\theta_i^{EBP} \pm 1.96\sqrt{g_{1i}(\hat\tau^2)}$
has an undercoverage problem. The correction
$\hat\theta_i^{EBP} \pm 1.96\sqrt{mse_i^{PR}}$
does not solve the problem: it has either an undercoverage or an overcoverage problem.

Parametric bootstrap interval:
$\left(\hat\theta_i^{EBP} - L\sqrt{g_{1i}(\hat\tau^2)},\ \hat\theta_i^{EBP} - U\sqrt{g_{1i}(\hat\tau^2)}\right)$,
where $L$ and $U$ are obtained from the parametric bootstrap histogram of
$\dfrac{\hat\theta_i^{*EBP} - \theta_i^*}{\sqrt{g_{1i}(\hat\tau^{*2})}}$ [Ref: Chatterjee, Lahiri and Li, 2008].
Hall and Maiti (2006) have an alternative parametric bootstrap method, but the method is synthetic (Rao, 2005).
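The parametric bootstrap interval can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Chatterjee, Lahiri and Li: the model is intercept-only, $\tau^2$ is estimated by a truncated moment estimator with a small positive floor (a crude stand-in for a strictly positive estimator such as ADM), and all function names and data are hypothetical. The lower endpoint subtracts the upper bootstrap quantile of the pivot, which is one consistent reading of the interval.

```python
import random
from math import sqrt


def fit(theta_hat, psi, floor=0.05):
    """Estimate the common mean and tau^2; the floor keeps g1 positive."""
    m = len(theta_hat)
    mean = sum(theta_hat) / m
    rss = sum((t - mean) ** 2 for t in theta_hat)
    tau2 = max(floor, (rss - sum(psi) * (1.0 - 1.0 / m)) / (m - 1))
    w = [1.0 / (tau2 + p) for p in psi]
    mu = sum(wi * t for wi, t in zip(w, theta_hat)) / sum(w)
    return mu, tau2


def ebp(theta_hat, psi, mu, tau2):
    return [(tau2 * t + p * mu) / (tau2 + p) for t, p in zip(theta_hat, psi)]


def g1(psi, tau2):
    return [tau2 * p / (tau2 + p) for p in psi]


def bootstrap_intervals(theta_hat, psi, B=1000, level=0.95, seed=0):
    rng = random.Random(seed)
    m = len(psi)
    mu, tau2 = fit(theta_hat, psi)
    est = ebp(theta_hat, psi, mu, tau2)
    se = [sqrt(g) for g in g1(psi, tau2)]
    pivots = [[] for _ in range(m)]
    for _ in range(B):
        # Generate area parameters and direct estimates from the fitted model.
        theta_star = [rng.gauss(mu, sqrt(tau2)) for _ in range(m)]
        y_star = [rng.gauss(t, sqrt(p)) for t, p in zip(theta_star, psi)]
        mu_s, tau2_s = fit(y_star, psi)          # re-estimate in each replicate
        est_s = ebp(y_star, psi, mu_s, tau2_s)
        g_s = g1(psi, tau2_s)
        for i in range(m):
            pivots[i].append((est_s[i] - theta_star[i]) / sqrt(g_s[i]))
    alpha = 1.0 - level
    intervals = []
    for i in range(m):
        p = sorted(pivots[i])
        q_lo = p[int(alpha / 2 * (B - 1))]
        q_hi = p[int((1 - alpha / 2) * (B - 1))]
        intervals.append((est[i] - q_hi * se[i], est[i] - q_lo * se[i]))
    return intervals


theta_hat = [12.1, 9.4, 15.3, 7.8, 11.0, 13.6]   # hypothetical direct estimates
psi = [1.0, 2.0, 0.5, 1.5, 1.0, 0.8]             # hypothetical sampling variances
intervals = bootstrap_intervals(theta_hat, psi, B=500, seed=1)
```

Replicates in which the variance estimate hits the floor produce very large pivots, which is one practical reason to prefer estimators of $\tau^2$ that stay strictly positive.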

Estimation of Small Area Proportions: Two Basic Area Models
Ref: Liu, Lahiri and Kalton (2007)
Model 1:
Level 1: $p_{iw} \mid P_i \overset{ind}{\sim} N\!\left(P_i,\ \dfrac{P_i(1-P_i)}{n_i}\,deff_i\right)$
Level 2: $\text{logit}(P_i) \overset{ind}{\sim} N(x_i^T\beta, \tau^2)$
Model 2:
Level 1: $p_{iw} \mid P_i \overset{ind}{\sim} \text{Beta}\!\left(P_i,\ \dfrac{P_i(1-P_i)}{n_i}\,deff_i\right)$
Level 2: $\text{logit}(P_i) \overset{ind}{\sim} N(x_i^T\beta, \tau^2)$
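In Model 2 the Beta distribution is read here as being indexed by its mean and variance; that reading is an assumption, since the slide does not spell out the parameterization. Under it, the usual shape parameters $a_i$ and $b_i$ (symbols introduced here) follow by matching moments:

```latex
% A Beta(a, b) variable has mean a/(a+b) and variance
% mean(1-mean)/(a+b+1). Matching mean P_i and variance
% P_i(1-P_i)\,deff_i/n_i gives a_i + b_i + 1 = n_i/deff_i, so
a_i = P_i \left( \frac{n_i}{deff_i} - 1 \right),
\qquad
b_i = (1 - P_i) \left( \frac{n_i}{deff_i} - 1 \right),
\qquad \text{requiring } n_i > deff_i .
```

The ratio $n_i/deff_i$ is the effective sample size, so the Beta model is well defined only for areas whose effective sample size exceeds one.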


