Understanding Variance Estimator Bias in Stratified Two-Stage Sampling

SLIDE 1

Understanding Variance Estimator Bias in Stratified Two-Stage Sampling

Khoa Dong1, Tim Trudell1, Yang Cheng1, Eric Slud1,2

1U.S. Census Bureau, 2University of Maryland

Joint Statistical Meetings Vancouver, CA July 29, 2018

SLIDE 2

Disclaimer

This presentation is intended to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed on statistical, methodological, technical, or operational issues are those of the authors and not necessarily those of the U.S. Census Bureau.

SLIDE 3

Outline

  • 1. Motivation
  • 2. Overview of Current Population Survey (CPS)
  • 3. Problem description
  • 4. CPS variance estimation
  • 5. Simulation results

SLIDE 4

Motivation

  • When estimating the household response rate $p$ and its variance for CPS non-self-representing (NSR) primary sampling units (PSUs), we observed unusually high values of the estimated variance $\widehat{\mathrm{Var}}(\hat{p})$.

  • We wanted to better understand the cause of this result.

SLIDE 5

Current Population Survey

  • One of the oldest surveys in the U.S. (in operation since 1942)
  • Measures the national unemployment rate
  • Monthly sample of ~72,000 households

SLIDE 6

Primary Sampling Unit

  • PSU: either a county or a group of contiguous counties
  • Two types of PSU:
    • Self-representing (SR)
    • Non-self-representing (NSR)

SLIDE 7

CPS Sample Design

  • Two-stage stratified sampling design for NSR PSUs:
    • First stage: select one PSU per stratum with probability proportional to size, where size is the civilian noninstitutionalized population aged 16 and over (CNP 16+)
    • Second stage: systematic sampling within the selected PSUs
  • Systematic sampling within SR PSUs
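
The two-stage NSR design above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CPS production procedure: the PSU identifiers, their sizes (standing in for CNP 16+ counts), the household lists, and the sampling interval are all hypothetical, and the helper names `select_psu_pps` and `systematic_sample` are invented for this example.

```python
import random


def select_psu_pps(psus, rng):
    """Select one PSU from a stratum with probability proportional to size.

    psus: list of (psu_id, size) pairs; size is a hypothetical stand-in
    for the PSU's CNP 16+ count.
    """
    total = sum(size for _, size in psus)
    u = rng.uniform(0, total)
    cumulative = 0.0
    for psu_id, size in psus:
        cumulative += size
        if u <= cumulative:
            return psu_id
    return psus[-1][0]  # guard against floating-point round-off


def systematic_sample(units, interval, rng):
    """Take a random start in [0, interval), then every interval-th unit."""
    start = rng.randrange(interval)
    return units[start::interval]


# Hypothetical strata, each a list of (psu_id, size) pairs.
rng = random.Random(2018)
strata = {
    "stratum_1": [("psu_11", 5000), ("psu_12", 3000), ("psu_13", 2000)],
    "stratum_2": [("psu_21", 4000), ("psu_22", 6000)],
}
# Hypothetical household lists: 50 households per PSU.
households = {
    psu_id: [f"{psu_id}-hh{k}" for k in range(50)]
    for psus in strata.values()
    for psu_id, _ in psus
}

for name, psus in strata.items():
    chosen = select_psu_pps(psus, rng)                        # first stage
    sample = systematic_sample(households[chosen], 10, rng)   # second stage
    print(name, chosen, len(sample))
```

With 50 households and an interval of 10, every systematic sample here contains exactly 5 households. For an SR PSU, which is in the sample with certainty, only the `systematic_sample` step would apply.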

SLIDE 8

CPS Sample Design

  • PSUs are selected once every 10 years
  • 852 PSUs selected in the first stage (2010 design): 506 SR and 346 NSR
  • Approximately 80% of the CNP 16+ population lives in SR PSUs

SLIDE 9

Key Labor Force Estimates

  • Noninstitutionalized civilian labor force statistics:
  • Unemployment/employment levels
  • Unemployment rate
  • Labor force participation rate

SLIDE 10

Problem

  • Estimate the monthly household response rate $p$ and its variance $\mathrm{Var}(\hat{p})$ for CPS households in NSR PSUs.
  • The sample is at the household level: one record for each sampled household in each month.
  • The response $y_i$ is binary: 1 = response, 0 = nonresponse.
  • Time period: March 2017 – March 2018

SLIDE 11

NSR Household Response Rates, March 2017 – March 2018

[Figure: monthly Household Response Rate in NSR PSUs]

SLIDE 12

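Estimated Variance for Response and Nonresponse Rates, March 2017 – March 2018

  • Since $1 - \hat{p}$ differs from $\hat{p}$ only by a sign and a constant, we expect to see $\widehat{\mathrm{Var}}(\hat{p}) = \widehat{\mathrm{Var}}(1 - \hat{p})$, but the two estimates are NOT equal.
  • Our chosen variance estimator therefore introduces bias in some way.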

SLIDE 13

CPS Variance Estimation

  • Due to the CPS sample design, there is no direct variance estimator formula:
    • Only one PSU is selected per NSR stratum, so the between-PSU variance within a stratum cannot be estimated directly
    • Systematic sampling is used within PSUs
  • CPS currently uses the balanced repeated replication (BRR) method for NSR PSUs.

SLIDE 14

CPS Variance Estimation

  • BRR variance estimator:

$$\widehat{\mathrm{Var}}(\hat{Y}) = \frac{1}{R(1-K)^2}\sum_{r=1}^{R}\left(\hat{Y}_r - \hat{Y}\right)^2$$

where $\hat{Y}_r$ = the $r$-th replicate estimate of $Y$, $\hat{Y}$ = the full-sample estimate of $Y$, $R$ = the number of replicates, and $K$ = the perturbation factor, $0 \le K < 1$.

  • BRR requires selecting two PSUs per stratum, but CPS selects only one PSU per stratum, so strata are collapsed to make pseudo-strata.
  • Ideally, each pseudo-stratum contains exactly two perfectly matched strata.
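
The BRR formula above translates directly into a short function. This is a sketch assuming the full-sample and replicate estimates are already computed; `fay_brr_variance` is a hypothetical name, and its parameter `k` plays the role of the perturbation factor $K$.

```python
def fay_brr_variance(full_estimate, replicate_estimates, k):
    """BRR variance with perturbation factor k:

        1 / (R * (1 - k)**2) * sum over r of (Y_r - Y)**2
    """
    if not 0 <= k < 1:
        raise ValueError("perturbation factor k must satisfy 0 <= k < 1")
    if not replicate_estimates:
        raise ValueError("need at least one replicate estimate")
    n_reps = len(replicate_estimates)
    sum_sq = sum((y_r - full_estimate) ** 2 for y_r in replicate_estimates)
    return sum_sq / (n_reps * (1 - k) ** 2)


# With R = 160 and k = 0.5 the multiplier is 1 / (160 * 0.25) = 4 / 160.
print(fay_brr_variance(10.0, [11.0, 9.0], k=0.5))  # prints 4.0
```

Setting `k = 0` gives classical BRR, whose replicate factors are 2 and 0 (half the sample is dropped); with $K > 0$ (Fay's method) the factors are $1 \pm (1-K)$, so every stratum stays in every replicate.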

SLIDE 15

BRR with Pseudo-Strata

  • Suppose we want to estimate a population total $Y$ using

$$\hat{Y} = \sum_{h=1}^{L}\hat{Y}_h$$

where $L$ denotes the number of strata.

  • Consider the simple case where $L$ is even: we estimate the variance of $\hat{Y}$ by combining the $L$ strata into $G$ groups of two strata each ($L = 2G$).

SLIDE 16

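BRR with Pseudo-Strata

  • Hence,

$$\hat{Y} = \sum_{g=1}^{G}\hat{Y}_g = \sum_{g=1}^{G}\left(\hat{Y}_{g1} + \hat{Y}_{g2}\right)$$

$$\mathrm{Var}(\hat{Y}) = \sum_{g=1}^{G}\left[\mathrm{Var}(\hat{Y}_{g1}) + \mathrm{Var}(\hat{Y}_{g2})\right] = \sum_{g=1}^{G}\left(\sigma_{g1}^2 + \sigma_{g2}^2\right)$$

where $\sigma_{gh}^2 = \mathrm{Var}(\hat{Y}_{gh})$; the variance of the sum splits into stratum variances because the strata are sampled independently.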

SLIDE 17

BRR with Pseudo-Strata

  • The $r$-th replicate estimate of $Y$:

$$\hat{Y}_r = \sum_{g=1}^{G}\left[\left(1 + (1-K)\delta_{gr}\right)\hat{Y}_{g1} + \left(1 - (1-K)\delta_{gr}\right)\hat{Y}_{g2}\right]$$

where $\delta_{gr} = 1$ if the first stratum in the $g$-th group is selected and $\delta_{gr} = -1$ if the second stratum in the $g$-th group is selected.

  • The $\delta_{gr}$ are chosen from the entries of a Hadamard matrix.
  • Rows of a Hadamard matrix are mutually orthogonal:

$$\sum_{r=1}^{R}\delta_{gr}\delta_{kr} = 0 \quad (\forall\, g \ne k)$$

SLIDE 18

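BRR with Pseudo-Strata

$$\hat{Y}_r - \hat{Y} = \sum_{g=1}^{G}(1-K)\,\delta_{gr}\left(\hat{Y}_{g1} - \hat{Y}_{g2}\right)$$

$$\left(\hat{Y}_r - \hat{Y}\right)^2 = \sum_{g=1}^{G}(1-K)^2\,\delta_{gr}^2\left(\hat{Y}_{g1} - \hat{Y}_{g2}\right)^2 + \sum_{g=1}^{G}\sum_{k=1,\,k \ne g}^{G}(1-K)^2\,\delta_{gr}\delta_{kr}\left(\hat{Y}_{g1} - \hat{Y}_{g2}\right)\left(\hat{Y}_{k1} - \hat{Y}_{k2}\right)$$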

SLIDE 19

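BRR with Pseudo-Strata

$$\frac{1}{R(1-K)^2}\sum_{r=1}^{R}\left(\hat{Y}_r - \hat{Y}\right)^2 = \frac{1}{R(1-K)^2}\sum_{r=1}^{R}\sum_{g=1}^{G}(1-K)^2\left(\hat{Y}_{g1} - \hat{Y}_{g2}\right)^2 + \frac{1}{R(1-K)^2}\sum_{g=1}^{G}\sum_{k=1,\,k \ne g}^{G}(1-K)^2\left(\hat{Y}_{g1} - \hat{Y}_{g2}\right)\left(\hat{Y}_{k1} - \hat{Y}_{k2}\right)\sum_{r=1}^{R}\delta_{gr}\delta_{kr}$$

The first term uses $\delta_{gr}^2 = 1$, so each of the $R$ replicates contributes the same amount; the second term vanishes because $\sum_{r}\delta_{gr}\delta_{kr} = 0$ for $g \ne k$.

  • Therefore,

$$\widehat{\mathrm{Var}}(\hat{Y}) = \sum_{g=1}^{G}\left(\hat{Y}_{g1} - \hat{Y}_{g2}\right)^2 = \sum_{g=1}^{G}\left(\hat{Y}_{g1}^2 + \hat{Y}_{g2}^2 - 2\hat{Y}_{g1}\hat{Y}_{g2}\right)$$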
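
A quick numerical check of this identity: for any perturbation factor, Fay's BRR computed from a fully orthogonal Hadamard matrix reproduces $\sum_g (\hat{Y}_{g1} - \hat{Y}_{g2})^2$ up to rounding. The sketch assumes five hypothetical groups with made-up stratum totals and a Sylvester Hadamard matrix of order 8; the helper names are hypothetical.

```python
import random


def sylvester_hadamard(order):
    """Hadamard matrix of a power-of-2 order via Sylvester's construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def fay_brr_from_pairs(pairs, k, order=8):
    """Fay BRR variance of Y = sum_g (Y_g1 + Y_g2) using delta_{gr} = h[g + 1][r]."""
    h = sylvester_hadamard(order)
    full = sum(y1 + y2 for y1, y2 in pairs)
    sum_sq = 0.0
    for r in range(order):
        rep = sum((1 + (1 - k) * h[g + 1][r]) * y1 + (1 - (1 - k) * h[g + 1][r]) * y2
                  for g, (y1, y2) in enumerate(pairs))
        sum_sq += (rep - full) ** 2
    return sum_sq / (order * (1 - k) ** 2)


def collapsed_pair_variance(pairs):
    """Closed form derived above: sum_g (Y_g1 - Y_g2)**2."""
    return sum((y1 - y2) ** 2 for y1, y2 in pairs)


# Five hypothetical groups (G = 5 <= order - 1 = 7).
rng = random.Random(1)
pairs = [(rng.uniform(50, 150), rng.uniform(50, 150)) for _ in range(5)]
for k in (0.0, 0.5):
    print(k, fay_brr_from_pairs(pairs, k), collapsed_pair_variance(pairs))
```

The perturbation factor cancels: it scales each replicate deviation by $(1-K)$, and the estimator divides by $(1-K)^2$.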

SLIDE 20

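Bias in BRR with Pseudo-Strata

  • Taking expectations, with $\mu_{gh} = E(\hat{Y}_{gh})$ and $\sigma_{gh}^2 = \mathrm{Var}(\hat{Y}_{gh})$, and using the independence of $\hat{Y}_{g1}$ and $\hat{Y}_{g2}$ (so that $E(\hat{Y}_{g1}\hat{Y}_{g2}) = \mu_{g1}\mu_{g2}$):

$$E\left[\sum_{g=1}^{G}\left(\hat{Y}_{g1}^2 + \hat{Y}_{g2}^2 - 2\hat{Y}_{g1}\hat{Y}_{g2}\right)\right] = \sum_{g=1}^{G}\left[\left(\sigma_{g1}^2 + \mu_{g1}^2\right) + \left(\sigma_{g2}^2 + \mu_{g2}^2\right) - 2\mu_{g1}\mu_{g2}\right]$$

$$= \sum_{g=1}^{G}\left(\sigma_{g1}^2 + \sigma_{g2}^2\right) + \sum_{g=1}^{G}\left(\mu_{g1} - \mu_{g2}\right)^2 = \mathrm{Var}(\hat{Y}) + \mathrm{Bias}^2$$

  • The squared-bias term is nonnegative, so it ADDS to the variance estimate.
  • The squared-bias term would be zero if the pair of PSUs in each group were perfectly matched, i.e., $\mu_{g1} = \mu_{g2}$ for every $g$.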
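
The bias decomposition can be illustrated by Monte Carlo. In this sketch each stratum estimate $\hat{Y}_{gh}$ is drawn independently from a normal distribution with made-up means $\mu_{gh}$ and standard deviations $\sigma_{gh}$ (hypothetical values, not CPS data); the average of $\sum_g (\hat{Y}_{g1} - \hat{Y}_{g2})^2$ over many draws should land near $\mathrm{Var}(\hat{Y}) + \mathrm{Bias}^2$. The function name is hypothetical.

```python
import random


def mean_collapsed_estimate(mus, sigmas, n_sims, seed=0):
    """Average of sum_g (Y_g1 - Y_g2)**2 over simulated stratum estimates.

    mus, sigmas: lists of (value for stratum 1, value for stratum 2) per group.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        for (m1, m2), (s1, s2) in zip(mus, sigmas):
            y1 = rng.gauss(m1, s1)
            y2 = rng.gauss(m2, s2)
            total += (y1 - y2) ** 2
    return total / n_sims


# Hypothetical groups: the first pair is perfectly matched, the others are not.
mus = [(100.0, 100.0), (120.0, 80.0), (60.0, 90.0)]
sigmas = [(5.0, 5.0), (4.0, 6.0), (3.0, 3.0)]

true_var = sum(s1 ** 2 + s2 ** 2 for s1, s2 in sigmas)   # Var(Y-hat) = 120
bias_sq = sum((m1 - m2) ** 2 for m1, m2 in mus)          # Bias^2 = 2500
print(mean_collapsed_estimate(mus, sigmas, 20000), true_var + bias_sq)
```

Here the squared bias (2500) dwarfs the true variance (120), which is the mechanism by which poorly matched pairs inflate the BRR estimate.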

SLIDE 21

How Are Strata Collapsed?

  • In CPS, strata are collapsed into pseudo-strata using an objective function of:
    • Unemployment
    • Civilian labor force
    • Children aged 0–17 at or below 200% of the poverty level

SLIDE 22

Simulation Overview

  • Use one month of CPS data (March 2018), which includes pseudo-stratum information.
  • For each household, generate responses $y_i$ iid from a Bernoulli distribution with $p = 0.03, 0.06, \ldots, 0.99$.
  • For each $p$:
    • Run 5,000 simulations.
    • Compute the true variance and the BRR variance.
    • Compute the squared-bias term $\sum_{g=1}^{G}\left(\mu_{g1} - \mu_{g2}\right)^2$.
  • Compare the true variance with the BRR variance after adjusting for the bias.

SLIDE 23

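Simulation Computation

  • Total number of households: $\hat{N} = \sum_{i=1}^{n} w_i$, where $w_i$ is the weight of household $i$
  • Full-sample estimated response count: $\hat{Y} = \sum_{i=1}^{n} w_i y_i$
  • Replicate $r$ estimated response count: $\hat{Y}_r = \sum_{i=1}^{n} w_i y_i f_{ir}$, where the replicate factor $f_{ir}$ is either 1.5 or 0.5 (that is, $1 \pm (1-K)$ with $K = 0.5$).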

SLIDE 24

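Simulation Computation

  • BRR variance of $\hat{Y}$ (with $R = 160$ replicates and $K = 0.5$, so $1/[R(1-K)^2] = 4/160$):

$$\widehat{\mathrm{Var}}_{\mathrm{BRR}}(\hat{Y}) = \frac{4}{160}\sum_{r=1}^{160}\left(\hat{Y}_r - \hat{Y}\right)^2$$

  • BRR variance of the response rate $p$:

$$\widehat{\mathrm{Var}}_{\mathrm{BRR}}(\hat{p}) = \frac{1}{\hat{N}^2}\,\widehat{\mathrm{Var}}_{\mathrm{BRR}}(\hat{Y})$$

assuming $N = \hat{N}$ is fixed from outside knowledge.

  • $\mu_{gh} = p \times N_{gh}$, where $N_{gh}$ is the household total in stratum $h$ of group $g$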

SLIDE 25

BRR Variance vs. True Variance

[Figure: BRRVar and TrueVar plotted against p]

As $p$ gets close to 1, the BRR variance estimate departs far from the true variance.

SLIDE 26

BRR Variance vs. True Variance

[Figure: BRRVar, BRRVar - BiasSq, and TrueVar plotted against p]

Not all of the bias is explained by the squared-bias term, due to:

  • Strata collapsed based on a different set of covariates
  • AHS MOS used instead of CPS
  • AHS MOS not currently updated (it dates from the 2010 design)

SLIDE 27

Summary

  • For the NSR component:
    • CPS collapses strata to make pseudo-strata.
    • There is no perfect matching of strata, which introduces bias into the variance estimator.
    • The bias becomes large as $p$ gets close to 1.
    • A quick fix is to use $\widehat{\mathrm{Var}}(1 - \hat{p})$ for large $p$.
  • CPS is designed for civilian labor force statistics, so we expect more bias when estimating the variance of other statistics.

SLIDE 28

Questions?

Thank You!

khoa.dong@census.gov
