
Applying Bivariate Binomial-Logit Normal Models to Small Area Estimation Carolina Franco and William R. Bell U.S. Census Bureau Center for Statistical Research and Methodology SAE 2013, Bangkok, Thailand September 2, 2013

Introduction Methods Discussion Disclaimer This presentation and the paper are released to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed on statistical, methodological, technical, or operational issues are those of the authors and not necessarily those of the U.S. Census Bureau. Franco and Bell (U.S. Census Bureau) Applying Bivariate Binomial-Logit Normal Models to Small Area Estimation 2/25

Introduction

The U.S. Census Bureau’s SAIPE (Small Area Income and Poverty Estimates) program estimates poverty for various age groups for states, counties, and school districts of the U.S.
Our focus: poverty estimates of school-aged (5-17) children for counties.
Inference is currently based on 1-year data from the American Community Survey (ACS), covariates from administrative records, and a Census 2000 long-form estimate.

The American Community Survey (ACS)

Approximately 3 million addresses sampled per year since 2005.
Questions: demographic, income, disabilities, health insurance, etc.
Sampling design: stratification, systematic sampling, clustering of persons, etc.
Estimation procedure: basic weights undergo several adjustments for nonresponse, calibration to population controls, etc.
Supplanted the census “long form”, which sampled about 1/6 of the population every 10 years (last long form in 2000).
Publishes 1-year, 3-year, and 5-year estimates for billions of estimands each year.

Why Consider a Bivariate Model in this Problem?

The county SAIPE model has traditionally used a previous Census long-form estimate as an important regression variable.
Census 2000 long-form data are increasingly out of date.
The ACS 5-year estimate from the years prior to the production year may be a good alternative (Huang and Bell, 2012).
Sampling error in the census county estimates is currently ignored in the modelling.
Due to the smaller sample size, this is less acceptable with the ACS 5-year data.
A bivariate model can allow for both sampling errors.

The Fay-Herriot Model (1979)

The model for $m$ small areas:

$$ y_i = Y_i + e_i, \qquad i = 1, \ldots, m \tag{1} $$
$$ Y_i = x_i' \beta + u_i \tag{2} $$

$Y_i$ is the population characteristic of interest for area $i$.
$y_i$ is the direct survey estimate of $Y_i$.
$e_i$ is the sampling error in $y_i$, generally assumed to be $N(0, v_i)$, independent with $v_i$ known.
$u_i$ is the area $i$ random effect, usually assumed to be i.i.d. $N(0, \sigma_u^2)$ and independent of the $e_i$.
$x_i$ and $\beta$ are the regression variables and coefficients.
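The slide states the model only. As orientation, here is a minimal sketch of the standard best predictor under (1) and (2) when $\beta$ and $\sigma_u^2$ are treated as known: each direct estimate is shrunk toward its regression fit with weight $\gamma_i = \sigma_u^2 / (\sigma_u^2 + v_i)$. The function name and the numbers are made up for illustration and are not taken from the presentation.

```python
def fay_herriot_predictor(y, xb, v, sigma2_u):
    """Shrink each direct estimate y[i] toward its regression fit xb[i].

    y        : direct survey estimates y_i
    xb       : regression fits x_i' beta (beta assumed known here)
    v        : known sampling variances v_i
    sigma2_u : random-effect variance sigma_u^2 (assumed known here)
    """
    preds = []
    for y_i, xb_i, v_i in zip(y, xb, v):
        gamma_i = sigma2_u / (sigma2_u + v_i)  # weight on the direct estimate
        preds.append(gamma_i * y_i + (1.0 - gamma_i) * xb_i)
    return preds


# Illustrative (made-up) inputs: same direct estimate and regression fit,
# with sampling variance ranging from none to very large.
preds = fay_herriot_predictor(
    y=[2.0, 2.0, 2.0], xb=[1.0, 1.0, 1.0], v=[0.0, 1.0, 1e9], sigma2_u=1.0
)
print(preds)
```

With no sampling variance the predictor returns the direct estimate; with very large sampling variance it falls back on the regression fit.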

The SAIPE 5-17 County Production Poverty Model

The model is of the form of (1) and (2) with a logarithmic transformation.
$y_i$ = log of the ACS estimate of the number of persons age 5-17 in poverty for county $i$.
$Y_i$ = log of the true number of persons age 5-17 in poverty in the county.
$\beta$ and $\sigma_u^2$ are estimated by ML.
Prediction results are translated back from the log scale using properties of the lognormal distribution.
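The slide does not spell out which lognormal properties are used. A minimal sketch, assuming the log-scale prediction is treated as normal with mean `mu` and variance `s2` (both names are hypothetical): the mean and variance on the original scale then follow from the usual lognormal moment formulas. This is an illustration, not the SAIPE production code.

```python
import math


def lognormal_mean(mu, s2):
    """E[exp(Z)] for Z ~ N(mu, s2)."""
    return math.exp(mu + 0.5 * s2)


def lognormal_var(mu, s2):
    """Var[exp(Z)] for Z ~ N(mu, s2)."""
    return (math.exp(s2) - 1.0) * math.exp(2.0 * mu + s2)


# Made-up example: log-scale prediction log(100) with variance 0.04.
print(lognormal_mean(math.log(100.0), 0.04), lognormal_var(math.log(100.0), 0.04))
```

Simply exponentiating `mu` would give the median on the original scale, which is smaller than the mean whenever `s2 > 0`.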

The SAIPE 5-17 County Production Poverty Model: Regression Variables

log of the number of “poor child exemptions” for the county, i.e., child exemptions claimed on tax returns whose adjusted gross income falls below the official poverty threshold;
log of the number of county SNAP benefits recipients in July of the previous year;
log of the estimated county population age 0-17 as of July 1;
log of the total number of child exemptions in the county claimed on tax returns; and
log of the Census 2000 county estimate of the number of children in poverty ages 5 to 17.

Some Issues with the Current Production Model

For some counties with small samples, the direct ACS estimate of the number of 5-17 year-olds in poverty is zero.
Since logs cannot be taken of these zero estimates, such counties are dropped from the model fitting.
Using the production model, one can still produce estimates for all counties.
Our bivariate GLMM approach, which uses a generalized variance function (GVF) to estimate the sampling variances, does not require dropping any counties from the fitting.

A Univariate Binomial/Logit Normal Model

Let $y_i$ be the sampled count, $n_i$ the sample size, and $p_i$ the true proportion for county $i$.

Univariate Binomial/Logit Normal Model:

$$ y_i \mid p_i, n_i \sim \mathrm{Bin}(n_i, p_i), \qquad i = 1, \ldots, m \tag{3} $$
$$ \mathrm{logit}(p_i) = x_i' \beta + u_i \tag{4} $$

where $\mathrm{logit}(p_i) = \log[p_i / (1 - p_i)]$ and $u_i \sim N(0, \sigma_u^2)$.

This model does not incorporate the complex sampling features of the data!
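Fitting (3) and (4) by maximum likelihood requires integrating the random effect $u_i$ out of the binomial likelihood. The sketch below approximates that integral for a single area with Gauss-Hermite quadrature. It is an illustration only: the function name, the number of nodes, and the inputs are assumptions, and it is not the authors' fitting code.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def marginal_binomial_logit_normal(y, n, xb, sigma_u, n_nodes=30):
    """Approximate P(y | n) = integral of Bin(y; n, p(u)) N(u; 0, sigma_u^2) du,

    where p(u) = logit^{-1}(xb + u) and xb stands for x_i' beta.
    """
    nodes, weights = hermgauss(n_nodes)        # rule for weight exp(-t^2)
    u = math.sqrt(2.0) * sigma_u * nodes       # change of variables u = sqrt(2)*sigma*t
    p = 1.0 / (1.0 + np.exp(-(xb + u)))        # inverse logit at each node
    pmf = math.comb(n, y) * p**y * (1.0 - p) ** (n - y)
    return float(np.sum(weights * pmf) / math.sqrt(math.pi))


# Made-up example: the marginal probabilities over y = 0..n should sum to 1.
total = sum(marginal_binomial_logit_normal(y, 10, -1.5, 0.7) for y in range(11))
print(total)
```

A full fit would sum the log of this quantity over areas and maximize over $\beta$ and $\sigma_u$.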

Use of Effective Sample Sizes

Due to the complex sampling design, we use “effective” sample sizes $\tilde{n}_i$ and sample counts $\tilde{y}_i$ based on the design effect:

$$ \tilde{n}_i = \frac{\tilde{p}_i (1 - \tilde{p}_i)}{\mathrm{Var}(\hat{p}_i)}, \qquad \tilde{y}_i = \tilde{n}_i \times \hat{p}_i $$

$\hat{p}_i$ are the direct ACS estimates; $\tilde{p}_i$ are preliminary estimates of $p_i$ based on $\hat{p}_i$, defined such that they cannot be zero.
We then substitute $(\tilde{n}_i, \tilde{y}_i)$ for $(n_i, y_i)$ in the Binomial/Logit Normal Model, rounding to the nearest integer.
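A small sketch of the effective sample size calculation described on this slide. The function name and the input values are made up. The slide does not say whether the count is computed from the rounded or the unrounded effective sample size; the sketch assumes both quantities are computed unrounded and then rounded separately.

```python
def effective_sample(p_tilde, p_hat, var_p_hat):
    """Return rounded (n_eff, y_eff) for one county.

    p_tilde   : preliminary nonzero estimate of the proportion
    p_hat     : direct survey estimate of the proportion (may be zero)
    var_p_hat : sampling variance attached to p_hat
    """
    n_eff = p_tilde * (1.0 - p_tilde) / var_p_hat
    y_eff = n_eff * p_hat
    # Assumption: round each quantity separately. Note that Python's round()
    # sends exact halves to the nearest even integer.
    return round(n_eff), round(y_eff)


print(effective_sample(p_tilde=0.2, p_hat=0.18, var_p_hat=0.0016))
print(effective_sample(p_tilde=0.05, p_hat=0.0, var_p_hat=0.00095))
```

The second call shows why a nonzero $\tilde{p}_i$ matters: a county with a zero direct estimate still gets a positive effective sample size and an effective count of zero.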

The Bivariate Binomial/Logit Normal Model

$$ \tilde{y}_{1i} \mid p_{1i}, \tilde{n}_{1i} \sim \mathrm{Bin}(\tilde{n}_{1i}, p_{1i}), \qquad \tilde{y}_{2i} \mid p_{2i}, \tilde{n}_{2i} \sim \mathrm{Bin}(\tilde{n}_{2i}, p_{2i}) \tag{5} $$
$$ \mathrm{logit}(p_{1i}) = x_{1i}' \beta_1 + u_{1i}, \qquad \mathrm{logit}(p_{2i}) = x_{2i}' \beta_2 + u_{2i} \tag{6} $$
$$ \begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim \text{i.i.d. } N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} $$

for $i = 1, \ldots, m$.

Application: $(\tilde{y}_{1i}, \tilde{n}_{1i})$ and $(\tilde{y}_{2i}, \tilde{n}_{2i})$ are the effective sample counts of children aged 5-17 in poverty and the effective sample sizes based on the 2011 ACS 1-year and the 2006-2010 ACS 5-year estimates.
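To make the structure of (5) and (6) concrete, the sketch below simulates data from the bivariate model. All numbers (number of areas, effective sample sizes, linear predictors, and the entries of $\Sigma$) are invented for illustration and have no connection to the ACS data.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 2000                                   # number of areas (made up)
n1 = np.full(m, 50)                        # effective sample sizes, survey 1
n2 = np.full(m, 200)                       # effective sample sizes, survey 2
xb1 = np.full(m, -1.5)                     # stands in for x_1i' beta_1
xb2 = np.full(m, -1.5)                     # stands in for x_2i' beta_2
Sigma = np.array([[0.5, 0.4],
                  [0.4, 0.5]])             # covariance of (u_1i, u_2i)

# Correlated random effects, one pair per area.
u = rng.multivariate_normal(np.zeros(2), Sigma, size=m)

# Inverse logit gives the two true proportions for each area.
p1 = 1.0 / (1.0 + np.exp(-(xb1 + u[:, 0])))
p2 = 1.0 / (1.0 + np.exp(-(xb2 + u[:, 1])))

# Given the proportions, the two counts are drawn independently.
y1 = rng.binomial(n1, p1)
y2 = rng.binomial(n2, p2)

# The observed proportions are still correlated across the two surveys,
# because the random effects are.
corr = np.corrcoef(y1 / n1, y2 / n2)[0, 1]
print(round(float(corr), 2))
```

Setting the off-diagonal entry of `Sigma` to zero removes the cross-survey dependence, which is what makes the second survey uninformative about the first in that case.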

Comments

$\tilde{y}_{1i}$ and $\tilde{y}_{2i}$ are assumed conditionally independent (given $p_{1i}, \tilde{n}_{1i}$ and $p_{2i}, \tilde{n}_{2i}$) since the ACS samples are drawn approximately independently each year.
Unconditionally, these are dependent due to the correlation of the random effects $u_{1i}$ and $u_{2i}$.
To avoid excluding observations from the fitting, we use a Generalized Variance Function (GVF) to generate estimates of the sampling variance even for counties that have an observed count of zero.
We use SAS’s NLMIXED for fitting the model.

The GVF: Introduction

Using ACS direct sampling variances $\hat{S}_i^2$ for each survey, our GVF model is:

$$ E(S_i^2) = \mathrm{GVF}_i = \gamma_0 \, \bigl(p_i (1 - p_i)\bigr)^{\gamma_1} \, (Rw_i)^{\gamma_2} \tag{7} $$

$$ Rw_i := \frac{\sum_{j=1}^{n_i} w_{ij}^2}{\left(\sum_{j=1}^{n_i} w_{ij}\right)^2} $$

where $w_{ij}$ is the weight of household $j$ in county $i$, and $n_i$ is the sample size of county $i$.
$Rw_i$ is an estimate of the inverse of the effective sample size when there is no clustering (Kish, 1987).
Only counties with $S_i^2 \neq 0$ that meet a minimum sample size threshold are used in the fitting.
The log of equation (7) can be fitted as a linear model.
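Taking logs of (7) gives $\log \mathrm{GVF}_i = \log \gamma_0 + \gamma_1 \log(p_i(1 - p_i)) + \gamma_2 \log(Rw_i)$, which is linear in $(\log \gamma_0, \gamma_1, \gamma_2)$. The sketch below computes $Rw_i$ from weights and fits the log form by ordinary least squares. The function names and data are hypothetical, and the slide does not say whether the authors used plain or weighted least squares or any bias adjustment, so treat plain least squares as an assumption.

```python
import numpy as np


def rw(weights):
    """Rw = sum(w^2) / (sum(w))^2 for one county's household weights."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w**2) / np.sum(w) ** 2)


def gvf(p, rw_vals, gamma0, gamma1, gamma2):
    """Evaluate gamma0 * (p(1-p))^gamma1 * Rw^gamma2."""
    return gamma0 * (p * (1.0 - p)) ** gamma1 * rw_vals**gamma2


def fit_gvf(s2, p, rw_vals):
    """Least-squares fit of log(s2) on [1, log(p(1-p)), log(Rw)]."""
    s2, p, rw_vals = (np.asarray(a, dtype=float) for a in (s2, p, rw_vals))
    X = np.column_stack([np.ones_like(p), np.log(p * (1.0 - p)), np.log(rw_vals)])
    coef, *_ = np.linalg.lstsq(X, np.log(s2), rcond=None)
    return float(np.exp(coef[0])), float(coef[1]), float(coef[2])


# Made-up, noise-free data generated from known parameters.
p = np.array([0.10, 0.20, 0.30, 0.15, 0.25])
rw_vals = np.array([0.010, 0.020, 0.005, 0.030, 0.015])
s2 = gvf(p, rw_vals, 1.5, 0.9, 1.1)
print(fit_gvf(s2, p, rw_vals))
```

Counties with a direct variance of zero cannot enter this fit because the log is undefined, which matches the exclusion stated on the slide; the fitted function can still be evaluated for them.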

Initial Values for the GVF and Iterative Approach

$\tilde{p}_i = \mathrm{logit}^{-1}(x_i' \hat{\eta})$, where $\hat{\eta}$ solves the optimization problem

$$ \min_{\eta} \sum_{i=1}^{m} \left( \hat{p}_i - \mathrm{logit}^{-1}(x_i' \eta) \right)^2 \tag{8} $$

$\hat{p}_i$ are the direct ACS estimates. Note that $\tilde{p}_i$ cannot be zero.
These $\tilde{p}_i$ are used in the GVF model to estimate $\gamma_0, \gamma_1, \gamma_2$.
We then use the fitted GVF model (7) to estimate $\mathrm{GVF}_i$ for all counties.
We fit the bivariate binomial/logit normal model using these $\mathrm{GVF}_i$ for the sampling variances $\widehat{\mathrm{Var}}(\hat{p}_i)$.
Iterative approach: the $\tilde{p}_i$ are updated, and the steps are repeated.
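Problem (8) is a nonlinear least-squares fit of the direct estimates to an inverse-logit curve. The slide does not name a solver; the sketch below uses plain Gauss-Newton iterations as one possible choice. The function names, starting value, and data are assumptions made for illustration.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_initial_p(p_hat, X, n_iter=50, tol=1e-10):
    """Minimize sum_i (p_hat_i - expit(x_i' eta))^2 by Gauss-Newton.

    Returns (eta_hat, p_tilde), where p_tilde = expit(X @ eta_hat).
    """
    eta = np.zeros(X.shape[1])                 # assumed starting value
    for _ in range(n_iter):
        p = expit(X @ eta)
        J = (p * (1.0 - p))[:, None] * X       # Jacobian of expit(X eta) in eta
        step, *_ = np.linalg.lstsq(J, p_hat - p, rcond=None)
        eta = eta + step
        if np.max(np.abs(step)) < tol:
            break
    return eta, expit(X @ eta)


# Made-up data: 20 areas, an intercept and one covariate.
X = np.column_stack([np.ones(20), np.linspace(-1.0, 1.0, 20)])
eta_true = np.array([-1.5, 0.8])
p_hat = expit(X @ eta_true)

eta_hat, p_tilde = fit_initial_p(p_hat, X)
print(eta_hat)

# A direct estimate of exactly zero still yields a strictly positive p_tilde.
p_hat_zero = p_hat.copy()
p_hat_zero[0] = 0.0
_, p_tilde_zero = fit_initial_p(p_hat_zero, X)
print(p_tilde_zero[0])
```

Because the fitted values come from an inverse logit of a finite linear predictor, they lie strictly between 0 and 1, which is the property the slide relies on.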
