Introduction to Survey Weights for 2009-2010 National Adult Tobacco Survey



SLIDE 1

Introduction to Survey Weights for 2009-2010 National Adult Tobacco Survey
Sean Hu, MD, MS, DrPH

Office on Smoking and Health
Presented at a webinar, April 25, 2012

National Center for Chronic Disease Prevention and Health Promotion Office on Smoking and Health

SLIDE 2


Outline

  • Overview of design and weighting methodology

  • Weighting procedures
  • Final weights
SLIDE 3

OVERVIEW OF DESIGN AND WEIGHTING METHODOLOGY

SLIDE 4

Total Survey Error Framework (Groves et al., 2004)

[Figure: Measurement side: Construct → Measurement → Response → Edited Data, with validity, measurement error, and processing error along the way. Representation side: Inferential Population → Target Population → Sampling Frame → Sample → Respondents → Post-adjusted Data, with coverage error, sampling error, nonresponse error, and adjustment error along the way. Both sides lead to the survey statistic.]

SLIDE 5

Major survey errors in a random-digit-dialing (RDD) survey such as BRFSS:

  • Measurement (self-reported data)
  • Nonresponse
  • Coverage
  • Sampling

SLIDE 6

Percentage of U.S. Households Without Landline Telephones

[Figure: percentages by year, 1963 to 2001]

  • Based on National Health Interview Survey data
SLIDE 7

Percentage of U.S. Households Without Landline Telephones

[Figure: percentages by year, 1963 to early 2011]

  • Based on National Health Interview Survey data
SLIDE 8

Percentage of U.S. Households Without Landline Telephones

[Figure: percentages by year, 1963 to early 2011; cell-phone-only households: 15.8% (late 2007), 24.5% (late 2009), 31.6% (early 2011)]

  • Based on National Health Interview Survey data

SLIDE 9

A Dual-Frame RDD Survey

  • Dual-frame RDD expands a traditional landline-based RDD survey into a dual-frame survey of landline and cell phone numbers
  • Reaches 98% of U.S. households
SLIDE 10

Landline and Cell phone populations and frames

[Figure: Venn diagram of the cell phone and landline populations: A = landline only; B/C = landline and cell phone; D = cell phone only]

Non-overlapping dual frames: landline RDD frame = A + B; cell phone-only frame = D

SLIDE 11

RDD Sampling

Disproportionate Stratified Sampling

  • DSS: Landline telephone numbers are classified into strata that are either high density (listed 1+ block telephone numbers) or medium density (not-listed 1+ block telephone numbers) to improve the yield of residential telephone numbers. The sampling ratio (high density : medium density) is 1.5:1.
  • NATS sampling frame:
    – Landline sampling frame: a listed landline stratum and a not-listed landline stratum
    – Cell phone stratum


SLIDE 12

NATS final disposition code distribution

Category of final disposition code | Landline No. | Landline % | Cell phone No. | Cell phone %
Complete & partial complete | 110,634 | 5.5 | 7,947 | 2.0
Eligible but not interviewed | 93,217 | 4.6 | 4,112 | 1.0
Unknown eligibility, non-interview | 519,773 | 25.6 | 247,905 | 62.5
Not eligible | 1,303,822 | 64.3 | 136,932 | 34.5
Total | 2,027,446 | 100.0 | 396,896 | 100.0

AAPOR Response Rate 4 (RR4): landline = 46.4%; cell phone = 41.2%.

There were many ineligible frame members.

SLIDE 13

Demographic Distributions between 2010 Census and 2009-2010 NATS

Characteristic | 2010 Census No. | 2010 Census % | 2009-2010 NATS Unweighted % | 2009-2010 NATS Weighted %
Gender: Male | 151,781,326 | 49.16 | 39.17 | 48.52
Gender: Female | 156,964,212 | 50.84 | 60.68 | 51.23
Gender: Unknown | | | 0.15 | 0.25
Age: 18-24 | 30,672,088 | 13.08 | 4.32 | 12.93
Age: 25-44 | 82,134,554 | 35.02 | 24.42 | 36.01
Age: 45-64 | 81,489,445 | 34.74 | 41.61 | 32.81
Age: ≥65 | 40,267,984 | 17.17 | 27.26 | 16.11
Age: Unknown | | | 2.39 | 2.39
Race/Ethnicity: White only, non-Hispanic | 196,817,552 | 63.75 | 82.02 | 67.98
Race/Ethnicity: Black only, non-Hispanic | 37,685,848 | 12.21 | 7.31 | 11.52
Race/Ethnicity: Asian only, non-Hispanic | 14,465,124 | 4.69 | 1.8 | 2.67
Race/Ethnicity: Other, non-Hispanic | 9,299,420 | 3.01 | 3.64 | 3.62
Race/Ethnicity: Hispanic | 50,477,594 | 16.35 | 3.83 | 12.99
Race/Ethnicity: Unknown | | | 1.4 | 1.22

Males, young adults, and racial/ethnic minorities were under-represented in the unweighted survey sample.

SLIDE 14

What is weighting?

  • Weighting is a process used to reduce bias in the sample.
  • Corrects for differences in the probability of selection and for nonresponse and noncoverage errors
  • Adjusts the distribution of age, race, gender, and other demographic and related variables in the sample to match the entire population
  • Allows findings to be generalized to the whole population.

SLIDE 15

Sample Weights

  • Are assigned to each sample member
  • Reflect differences between the distribution of the sample and that of the population
  • Can be viewed as the number of population members that the sample unit represents
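To make the "number of population members represented" reading concrete, here is a minimal Python sketch with hypothetical respondent weights and a hypothetical yes/no outcome (current smoking). The sum of the weights estimates the population size, and the weighted proportion can differ noticeably from the unweighted one.

```python
# Hypothetical respondents: (final weight, is_current_smoker).
# Each weight is read as the number of population members the respondent represents.
respondents = [
    (1200.0, True),
    (800.0, False),
    (1500.0, False),
    (500.0, True),
]


def weighted_total(rows):
    """Estimated population size: the sum of the weights."""
    return sum(w for w, _ in rows)


def weighted_prevalence(rows):
    """Weighted proportion: weights of 'yes' cases divided by all weights."""
    return sum(w for w, y in rows if y) / weighted_total(rows)


def unweighted_prevalence(rows):
    """Plain share of 'yes' cases, ignoring the weights."""
    return sum(1 for _, y in rows if y) / len(rows)


print(weighted_total(respondents))         # 4000.0
print(weighted_prevalence(respondents))    # 0.425
print(unweighted_prevalence(respondents))  # 0.5
```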

SLIDE 16

NATS WEIGHTING PROCEDURES

1. Calculate design weights
2. Adjust for unlisted frame members with unknown eligibility status
3. Adjust for people who did not respond to the survey (nonresponse adjustment)
4. Poststratification

SLIDE 17

Design Weights

  • Stratum design weight: the inverse of the probability of selection (1/P)
  • For the landline sample, multiplied by the number of adults in the household (NUMADULT)
  • For the landline sample, multiplied by the inverse of the number of phones in the household (1/NUMPHONE)
  • Design Weight = (1/P) * (1/NUMPHONE) * NUMADULT
    – P = probability of selection
    – NUMPHONE = number of phones within the household
    – NUMADULT = number of adults eligible for the survey within the household
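The design weight formula can be sketched in Python as below. The function name and the example values are hypothetical. One assumption is labeled in the code: because the slide lists the NUMPHONE and NUMADULT factors for the landline sample only, cell phone cases here receive 1/P alone.

```python
def design_weight(p_selection, frame, num_phone=1, num_adult=1):
    """Design weight following the slide: (1/P) * (1/NUMPHONE) * NUMADULT.

    Assumption: the NUMPHONE and NUMADULT factors apply only to the landline
    sample (as the slide states); cell phone cases get 1/P alone.
    """
    if not 0 < p_selection <= 1:
        raise ValueError("probability of selection must be in (0, 1]")
    weight = 1.0 / p_selection
    if frame == "landline":
        if num_phone < 1 or num_adult < 1:
            raise ValueError("NUMPHONE and NUMADULT must both be at least 1")
        weight *= (1.0 / num_phone) * num_adult
    return weight


# Hypothetical landline case: P = 1/2000, two phone lines, three adults.
print(design_weight(0.0005, "landline", num_phone=2, num_adult=3))  # about 3000
# Hypothetical cell phone case with the same P.
print(design_weight(0.0005, "cell"))  # about 2000
```

The two household factors pull in opposite directions: more phone lines raise the household's chance of being dialed (so its weight shrinks), while more adults lower each adult's chance of being the selected respondent (so the weight grows).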

SLIDE 18

Adjustment for unknown eligibility status

  • Purpose: to adjust for sampled members whose eligibility status is unknown.
  • Stratum weight = the inverse of the probability of selection in a stratum = total frame members in the stratum / sampled members in the stratum
    – Sampled members include completed, refused, non-contact (unknown eligibility status), and ineligible members.
    – Sampled members with unknown eligibility could be nonrespondents, or they could be ineligible.
  • Adjusted factor = (completed + refused) / (completed + refused + ineligible)
    – It is the estimated proportion of eligible members among sampled members with known eligibility status in a stratum.
  • Adjusted stratum weight = stratum weight * adjusted factor
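The three quantities on this slide can be computed for a single stratum as in the Python sketch below. The function name and all counts are hypothetical and serve only to show the arithmetic.

```python
def unknown_eligibility_adjustment(frame_total, completed, refused, ineligible, unknown):
    """Return (stratum weight, adjusted factor, adjusted stratum weight) for one stratum."""
    # Sampled members = completed + refused + non-contact (unknown eligibility) + ineligible.
    sampled = completed + refused + ineligible + unknown
    stratum_weight = frame_total / sampled
    # Share of known-eligible cases among cases whose eligibility is known.
    adjusted_factor = (completed + refused) / (completed + refused + ineligible)
    return stratum_weight, adjusted_factor, stratum_weight * adjusted_factor


# Hypothetical stratum: 100,000 frame members, 2,000 sampled.
sw, factor, adjusted = unknown_eligibility_adjustment(
    frame_total=100_000, completed=300, refused=200, ineligible=1_000, unknown=500
)
print(sw, round(factor, 4), round(adjusted, 2))  # 50.0 0.3333 16.67
```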

SLIDE 19

Nonresponse Adjustment

  • If every person sampled agreed to do the survey, then weighted estimates using just the design weights would be sufficient to estimate the population values.
  • However, every survey has some level of nonresponse.
  • We can adjust for this nonresponse by spreading the weight of the nonrespondents to the respondents.
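NATS itself spreads the weight using a response-propensity model (Steps 2 and 3). As a simpler illustration of the same idea, here is a weighting-class adjustment in Python, a different and commonly used technique rather than the NATS procedure: within each hypothetical adjustment cell, respondents' weights are scaled up so they also carry the weight of that cell's eligible nonrespondents.

```python
from collections import defaultdict


def weighting_class_adjustment(units):
    """Spread nonrespondents' weight onto respondents in the same cell.

    `units` holds eligible sample members only, as (cell, weight, responded).
    Returns one adjusted weight per unit; nonrespondents get 0.0.
    """
    cell_total = defaultdict(float)       # weight of all eligible members per cell
    cell_resp_total = defaultdict(float)  # weight of respondents per cell
    for cell, weight, responded in units:
        cell_total[cell] += weight
        if responded:
            cell_resp_total[cell] += weight

    adjusted = []
    for cell, weight, responded in units:
        if responded:
            adjusted.append(weight * cell_total[cell] / cell_resp_total[cell])
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical eligible sample in two adjustment cells.
sample = [
    ("A", 100.0, True),
    ("A", 100.0, False),
    ("A", 200.0, True),
    ("B", 50.0, True),
    ("B", 150.0, False),
]
new_weights = weighting_class_adjustment(sample)
print([round(w, 2) for w in new_weights])  # [133.33, 0.0, 266.67, 200.0, 0.0]
```

The total weight (600) is preserved. A cell with no respondents would lose its weight entirely, which is why such cells are usually collapsed with neighboring cells before adjusting.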

SLIDE 20

Nonresponse Adjustment (continued)

  • In order to perform this adjustment, we need the following:
    – The status of each interview:
      • Complete, partially complete
      • Refusal
      • Unable to contact
      • Ineligible
    – Characteristics for all members of the sample, including refusals, no contacts, and ineligibles.

SLIDE 21

Step 1: Using Auxiliary Data

  • A final adjustment can be made to bring the sample weights of those we did contact up to the level of the population.
  • This is done when accurate population totals are available from a source other than the sample frame.
  • This is a way to account for under- or overcoverage of certain demographic groups.
  • For example, we frequently under-cover males aged 18-24. If we have good estimates of the population for this group, we can adjust the weighted total for the group up to the amount in the auxiliary data.

SLIDE 22

Using Auxiliary Data (Continued)

Auxiliary data from the American Community Survey (ACS) at various geographic levels were appended to the sample frame:

  • Block group level information for listed landline sample
  • County FIPS code information for unlisted sample
  • Area code information for the cell sample

The following variables are used in the nonresponse adjustment procedure:

  • Population density
  • Proportion white
  • Proportion African-American
  • Proportion Hispanic
  • Proportion of families below 150% of the poverty line
  • Proportion that are high school graduates
  • Proportion that completed a Bachelor’s degree
SLIDE 23

Step 2: Predict response propensity

  • Auxiliary data are needed to model sample units' response propensities.
    – A logistic regression is used to obtain a probability of response (ρ) for every unit:
      • the outcome is response status, and the independent variables are the auxiliary data.
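A minimal sketch of Step 2 in Python with NumPy, fitting the logistic model by Newton-Raphson (iteratively reweighted least squares). The two auxiliary variables and the simulated response outcome are hypothetical stand-ins for the ACS variables listed for the nonresponse adjustment; a production analysis would normally rely on a statistical package's logistic regression routine.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                              # gradient of the log-likelihood
        information = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def response_propensity(X, beta):
    """Predicted probability of response (rho) for every unit."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


# Simulated sample units with two hypothetical auxiliary variables.
rng = np.random.default_rng(2012)
n = 2000
density = rng.normal(size=n)                  # standardized population density
pct_bachelor = rng.uniform(0.0, 1.0, size=n)  # proportion with a Bachelor's degree
X = np.column_stack([np.ones(n), density, pct_bachelor])
true_beta = np.array([-0.5, -0.4, 1.0])
responded = (rng.uniform(size=n) < response_propensity(X, true_beta)).astype(float)

beta_hat = fit_logistic(X, responded)
rho = response_propensity(X, beta_hat)
print("estimated coefficients:", np.round(beta_hat, 2))
print("range of rho:", round(float(rho.min()), 3), round(float(rho.max()), 3))
```

Because the model has an intercept, the fitted propensities average exactly to the observed response rate at convergence, which is a quick sanity check on the fit.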

SLIDE 24

Step 3: A new weight is calculated

$$W_{h,i,j,2} = W_{h,i,j,1} \times \frac{1}{P_{h,i,j}}$$

where $W_{h,i,j,2}$ is the nonresponse-adjusted weight for the jth unlisted frame member, $W_{h,i,j,1}$ is that member's weight before the nonresponse adjustment, and $P_{h,i,j}$ is the predicted probability of response for the jth unlisted respondent from the logistic model.

We also make the following ratio adjustment:

$$W_{h,i,j,3} = W_{h,i,j,2} \times \frac{\sum_{j} W_{h,i,j,1}}{\sum_{j} W_{h,i,j,2}}$$
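The inverse-propensity step and the ratio adjustment on this slide can be sketched in Python for one (h, i) cell. The summation domain of the ratio adjustment is an assumption here: the numerator sums the pre-adjustment weights over all sampled members of the cell and the denominator sums the inverse-propensity weights over respondents, so the final weights add back up to the pre-adjustment total. The weights and propensities are hypothetical.

```python
def nonresponse_adjust(units):
    """units: list of (w1, responded, p) for all sampled members of one (h, i) cell.

    p is only used for respondents. Returns (w2, w3) for the respondents only.
    """
    # W2 = W1 * 1/P: inverse-propensity weighting of respondents.
    w2 = [w1 / p for w1, responded, p in units if responded]
    # Ratio adjustment (assumed domain): all-member W1 total over respondent W2 total.
    ratio = sum(w1 for w1, _, _ in units) / sum(w2)
    w3 = [w * ratio for w in w2]
    return w2, w3


cell = [
    (100.0, True, 0.5),
    (100.0, False, 0.4),
    (200.0, True, 0.6),
    (50.0, False, 0.3),
]
w2, w3 = nonresponse_adjust(cell)
print([round(w, 2) for w in w2])  # [200.0, 333.33]
print(round(sum(w3), 2))          # 450.0
```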

SLIDE 25

Poststratification

  • Poststratification adjusts the sample to the target population to ensure that the distribution of the sample matches the distribution in the population for a chosen set of variables.
  • Purpose: to reduce nonresponse and coverage bias
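A minimal sketch of cell-based ("traditional") poststratification in Python; NATS itself used a logistic-regression-based approach, so this illustrates the general idea only. It rescales hypothetical respondent weights so that each age group's weighted total equals the 2010 Census adult count from the Census vs. NATS demographic table; the respondent weights are made up.

```python
# 2010 Census adult counts by age group (from the Census vs. NATS comparison).
CENSUS_2010_ADULTS = {
    "18-24": 30_672_088,
    "25-44": 82_134_554,
    "45-64": 81_489_445,
    "65+": 40_267_984,
}


def poststratify(rows, controls):
    """rows: list of (cell, weight). Scale each weight by control total / weighted cell total."""
    cell_sums = {}
    for cell, weight in rows:
        cell_sums[cell] = cell_sums.get(cell, 0.0) + weight
    empty = set(controls) - set(cell_sums)
    if empty:
        raise ValueError(f"no respondents in cells: {sorted(empty)}")
    return [(cell, weight * controls[cell] / cell_sums[cell]) for cell, weight in rows]


# Hypothetical respondents with weights after nonresponse adjustment.
sample = [
    ("18-24", 1_000_000.0),
    ("25-44", 20_000_000.0),
    ("25-44", 15_000_000.0),
    ("45-64", 30_000_000.0),
    ("65+", 25_000_000.0),
    ("65+", 5_000_000.0),
]
adjusted = poststratify(sample, CENSUS_2010_ADULTS)

totals = {}
for cell, weight in adjusted:
    totals[cell] = totals.get(cell, 0.0) + weight
print({cell: round(total) for cell, total in totals.items()})  # matches CENSUS_2010_ADULTS
```

The under-covered 18-24 group receives by far the largest adjustment factor, mirroring the under-representation of young adults shown in the demographic table.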
SLIDE 26

Selecting variables in Poststratification

There are three ways to select the variables:

  • When using a dual frame, we poststratify to the population totals for each phone type (cell-only; landline, including dual users)
  • Use common demographic variables such as age, gender, and race/ethnicity
  • Select variables that are most highly correlated with your outcome of interest, such as education and marital status

SLIDE 27

Poststratification

Several possible approaches:

1. Use a single large age × gender × education table to calculate the weights (traditional poststratification).
  • However, crosstabs may not be available for the population,
  • and some cells of the sample table may have small sample sizes.
2. Iterative solutions:
  • Manual version (stepwise programming in statistical software)
  • Automatic version (e.g., raking software)
  • Logistic regression based solutions
    – NATS used the logistic regression approach.
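Raking (iterative proportional fitting) is the "automatic" iterative option named above. NATS used a logistic regression approach instead, so this Python sketch shows a different technique for comparison: it repeatedly rescales the weights to match one margin at a time until all margins agree. Categories, starting weights, and population margins are hypothetical, and both margins must sum to the same population total.

```python
def rake(rows, margins, max_iter=100, tol=1e-8):
    """Rake weights to one-way population margins (iterative proportional fitting).

    rows: list of dicts with a "weight" key plus one key per raking variable.
    margins: {variable: {category: population total}}; every margin must sum
    to the same population total.
    """
    weights = [row["weight"] for row in rows]
    for _ in range(max_iter):
        largest_change = 0.0
        for var, targets in margins.items():
            # Current weighted total of each category of this variable.
            sums = {category: 0.0 for category in targets}
            for row, w in zip(rows, weights):
                sums[row[var]] += w
            # Rescale so this variable's weighted totals hit their targets.
            for k, row in enumerate(rows):
                factor = targets[row[var]] / sums[row[var]]
                weights[k] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break
    return weights


# Hypothetical starting weights and population margins.
sample = [
    {"gender": "male", "age": "18-44", "weight": 100.0},
    {"gender": "male", "age": "45+", "weight": 200.0},
    {"gender": "female", "age": "18-44", "weight": 150.0},
    {"gender": "female", "age": "45+", "weight": 250.0},
]
margins = {
    "gender": {"male": 480.0, "female": 520.0},
    "age": {"18-44": 450.0, "45+": 550.0},
}
raked = rake(sample, margins)


def margin_total(var, category):
    """Weighted total of one category after raking."""
    return sum(w for row, w in zip(sample, raked) if row[var] == category)


print(round(margin_total("gender", "male"), 3), round(margin_total("age", "45+"), 3))
```

Unlike the single big table of traditional poststratification, raking needs only the one-way population margins, which is why it helps when full crosstabs are unavailable.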

SLIDE 28

Imputation and Trimming

  • Missing values of the poststratification variables were imputed.
  • Weight trimming is applied to constrain the most extreme weights and thereby reduce variance.
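The slide does not state the trimming rule NATS used. A common generic approach, sketched here in Python under that caveat, caps weights at a chosen value and redistributes the trimmed excess proportionally over the untrimmed weights so that the weighted total is preserved. The cap and the weights are hypothetical.

```python
def trim_weights(weights, cap, max_iter=100):
    """Cap weights at `cap`, spreading the excess over the untrimmed weights.

    Repeats because redistribution can push other weights above the cap.
    The total of the weights is preserved.
    """
    total = sum(weights)
    if cap * len(weights) < total:
        raise ValueError("cap is too small to preserve the total weight")
    w = list(weights)
    for _ in range(max_iter):
        excess = sum(x - cap for x in w if x > cap)
        if excess <= 1e-12 * total:
            break
        w = [min(x, cap) for x in w]
        free = sum(x for x in w if x < cap)  # weights still below the cap
        if free == 0:
            break
        w = [x + excess * x / free if x < cap else x for x in w]
    return w


trimmed = trim_weights([10.0, 20.0, 30.0, 400.0], cap=150.0)
print([round(x, 2) for x in trimmed])  # [53.33, 106.67, 150.0, 150.0]
```

Here the redistributed excess pushes the third weight past the cap on the first pass, so a second pass is needed; that is why the loop repeats.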

SLIDE 29

Cell Phone Respondent Issue

  • For states with very few cell phone respondents, including those respondents would result in large unequal weighting effects and, consequently, large variances for population estimates.
  • We chose 200 cell phone respondents as the cutoff for including cell phone respondents at the state level.

SLIDE 30

Correlation between unequal weighting effect (UWE) and number of cell phone respondents
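The deck does not give its formula for the unequal weighting effect. A standard choice is Kish's approximation, UWE = n·Σw²/(Σw)², which equals 1 + CV² of the weights; the Python sketch below computes it for hypothetical weights. A few very large weights, as when a handful of cell phone respondents must represent a state's whole cell-only population, push the UWE well above 1.

```python
def unequal_weighting_effect(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


print(unequal_weighting_effect([1.0, 1.0, 1.0, 1.0]))            # 1.0
print(round(unequal_weighting_effect([1.0, 1.0, 1.0, 3.0]), 4))  # 1.3333
# Hypothetical state: 95 landline cases with weight 1, 5 cell cases with weight 20.
print(round(unequal_weighting_effect([1.0] * 95 + [20.0] * 5), 2))  # 5.51
```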

SLIDE 31

Final Weights

  • WT_NATIONAL:
    – for national estimates
    – uses all respondents (both landline and cell phone respondents) in all states
  • WT_LANDLINE:
    – only for landline respondents
    – uses landline respondents only
  • WT_STATE:
    – for state estimates
    – for states with 200 or more cell phone cases, uses both landline and cell phone respondents
    – for states with fewer than 200 cell phone cases, uses landline cases only
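The state-level rule for WT_STATE can be written as a small Python helper. The function name and the way respondent types are represented are hypothetical; only the 200-case cutoff comes from the slides.

```python
CELL_CUTOFF = 200  # minimum number of cell phone respondents (from the slides)


def wt_state_sample(n_cell_respondents):
    """Respondent types that enter WT_STATE estimates for one state."""
    if n_cell_respondents >= CELL_CUTOFF:
        return ("landline", "cell")
    return ("landline",)


print(wt_state_sample(250))  # ('landline', 'cell')
print(wt_state_sample(120))  # ('landline',)
```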

SLIDE 32

Questions or comments?

SLIDE 33

Thanks!

Contact information:
Sean Hu, MD, MS, DrPH
Telephone: 770-488-5845
E-mail: shu@cdc.gov

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
