Tools for Identifying When to Modify a Surveys Data Collection - - PowerPoint PPT Presentation


Univariate Tests for Phase Capacity: Tools for Identifying When to Modify a Surveys Data Collection Protocol Workshop on Responsive and Adaptive Survey Design Bureau of Labor Statistics Washington, DC March 14, 2018 Taylor Lewis 1 Senior


slide-1
SLIDE 1

Univariate Tests for Phase Capacity: Tools for Identifying When to Modify a Survey’s Data Collection Protocol

Taylor Lewis¹, Senior Data Scientist, U.S. Office of Personnel Management

Workshop on Responsive and Adaptive Survey Design, Bureau of Labor Statistics, Washington, DC, March 14, 2018

¹ The opinions, findings, and conclusions expressed in this presentation are those of the authors and do not necessarily reflect those of the U.S. Office of Personnel Management. Note: all references in this presentation can be found in the JOS Fall 2017 special issue article of the same name.

slide-2
SLIDE 2

Outline

I. Introduction
– Background and definitions
– Illustration of phase capacity in the 2011 Federal Employee Viewpoint Survey (FEVS)

II. Tests for Phase Capacity
– Rao, Glickman, and Glynn (2008): a test based on multiple imputation for nonresponse
– New method: a test amenable to weight adjustment methods for nonresponse

III. FEVS Application Comparing the Two Phase Capacity Tests

IV. Limitations and Avenues for Further Research

slide-3
SLIDE 3
I. Introduction

slide-4
SLIDE 4

Nonresponse and Nonrespondent Follow-Up

  • Invariably, not all sampled entities respond to the initial survey solicitation
  • Most surveys repeatedly follow up with nonrespondents through additional mailings, phone calls, household visits, etc., often aiming to meet a preset response rate target
  • Each subsequent reminder brings in a new “wave” of data, which tends to be progressively smaller and thus affects estimates less and less
  • Other temporal delineations of waves are possible
slide-5
SLIDE 5

Notion of Phase Capacity

  • In their discussion of responsive survey design, Groves and Heeringa (2006) define the following terms:
– design phase: data collection period with stable frame, sample, and recruitment protocol
– phase capacity: point during a design phase at which additional responses cease influencing key statistics
  • Rather than fixating on a target response rate, they argue one should change design phases (e.g., switch mode, increase incentive) or discontinue nonrespondent follow-up altogether once phase capacity has been reached
  • Problem for practitioners: no calculable rule given
slide-6
SLIDE 6

Illustration of Phase Capacity in FEVS

  • Federal Employee Viewpoint Survey (FEVS): a yearly organizational climate survey administered by the U.S. Office of Personnel Management (OPM) to a sample of ~1.3M federal employees from 80+ agencies
  • Web-based instrument composed mainly of attitudinal items posed on a five-point Likert-type scale
  • Key statistics are “percent positive” estimates based on dichotomizing responses, for example “Completely Agree” or “Agree” selections versus all other possible responses
  • Weekly reminder emails are sent to nonrespondents
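
The dichotomization behind a “percent positive” estimate can be sketched in a few lines. This is a minimal illustration rather than OPM's production code: the function name `percent_positive`, the example answers, and the non-positive response labels are hypothetical; only the two positive labels come from the slide.

```python
# Labels counted as "positive" (taken from the slide's example).
POSITIVE = {"Completely Agree", "Agree"}

def percent_positive(responses, weights=None):
    """Weighted share (in percent) of responses that fall in POSITIVE."""
    if weights is None:
        weights = [1.0] * len(responses)  # unweighted case
    total = sum(weights)
    positive = sum(w for r, w in zip(responses, weights) if r in POSITIVE)
    return 100.0 * positive / total

# Hypothetical answers to a single item: 2 of 4 are positive.
answers = ["Agree", "Neutral", "Completely Agree", "Disagree"]
print(percent_positive(answers))
```

Passing nonresponse-adjusted weights as the second argument gives the adjusted version of the same statistic.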
slide-7
SLIDE 7

Illustration of Phase Capacity in FEVS (2)

[Table: FEVS 2011 reminder schedule and achieved responses by wave of data collection (a calendar week) for three example agencies]

slide-8
SLIDE 8

Illustration of Phase Capacity in FEVS (3)

[Figure: trend of an example agency’s nonresponse-adjusted percent positive statistic for FEVS 2011 Item 4, with 95% confidence limits]

  • The trend shown is a commonly observed FEVS pattern (Sigman et al., 2014)
  • Goal: identify estimate stability (i.e., phase capacity) as soon as possible, then change design phase

slide-9
SLIDE 9
II. Tests for Phase Capacity

slide-10
SLIDE 10

Rao, Glickman, and Glynn (RGG) – MI Test

  • Rao, Glickman, and Glynn (RGG) (2008) studied retrospective “stopping rules”; the best-performing method involves multiple imputation (MI)
  • The idea is to multiply impute (Rubin, 1987) the missing data M times (M ≥ 2) for nonrespondents as of wave k, then delete the responses obtained during wave k specifically and repeat for nonrespondents as of wave k – 1. The result is 2M completed data sets and two nonresponse-adjusted MI point estimates
  • A t-test is carried out by dividing the difference between the two point estimates by the square root of an estimate of the MI variance of the difference
  • Phase capacity is declared once the test statistic is no longer statistically significant
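
Applied retrospectively wave by wave, the rule amounts to scanning the sequence of test statistics for the first insignificant one. The sketch below assumes the statistics have already been computed; the function name, the example values, and the 1.96 cutoff (a two-sided 5% normal approximation) are illustrative assumptions, not part of the RGG procedure as presented.

```python
def first_phase_capacity_wave(t_stats, first_wave=2, critical=1.96):
    """Return the first wave whose test statistic is insignificant, else None.

    t_stats[j] is the statistic comparing wave first_wave + j with the wave
    before it. critical=1.96 is an assumed two-sided 5% normal cutoff.
    """
    for offset, t in enumerate(t_stats):
        if abs(t) < critical:
            return first_wave + offset
    return None

# Hypothetical statistics for waves 2 through 5.
print(first_phase_capacity_wave([3.1, 2.4, 0.9, 0.2]))  # 4
```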
slide-11
SLIDE 11

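Visualization of MI Test

Calculations:

$$\hat{d}_M = \frac{1}{M} \sum_{m=1}^{M} \hat{d}_m$$

$$U_M = \frac{1}{M} \sum_{m=1}^{M} \mathrm{var}(\hat{d}_m)$$

$$B_M = \frac{1}{M-1} \sum_{m=1}^{M} \left( \hat{d}_m - \hat{d}_M \right)^2$$

$$H_0: d_M = 0 \qquad H_1: d_M \neq 0$$

$$t = \hat{d}_M \Big/ \sqrt{U_M + \left(1 + \frac{1}{M}\right) B_M}$$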
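
The combining calculations for the MI test can be sketched as follows. The inputs are the M per-imputation differences and their estimated variances; the function name and example numbers are hypothetical, and the degrees-of-freedom formula is Rubin's (1987) standard reference value, which is an assumption here because the slide does not show it.

```python
import math

def mi_difference_test(d_hats, var_d_hats):
    """Return (d_hat_M, t, df) for M imputed-data estimates of the difference."""
    M = len(d_hats)
    if M < 2 or len(var_d_hats) != M:
        raise ValueError("need M >= 2 differences and one variance per difference")
    d_hat_M = sum(d_hats) / M                                # pooled point estimate
    U_M = sum(var_d_hats) / M                                # within-imputation variance
    B_M = sum((d - d_hat_M) ** 2 for d in d_hats) / (M - 1)  # between-imputation variance
    total_var = U_M + (1 + 1 / M) * B_M
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    t = d_hat_M / math.sqrt(total_var)
    # Assumption: Rubin's (1987) reference degrees of freedom; not on the slide.
    if B_M == 0:
        df = math.inf
    else:
        df = (M - 1) * (1 + U_M / ((1 + 1 / M) * B_M)) ** 2
    return d_hat_M, t, df

# Hypothetical inputs for M = 5 imputations.
d_hat_M, t, df = mi_difference_test(
    [0.010, 0.020, 0.015, 0.005, 0.025], [0.0004] * 5
)
print(round(d_hat_M, 3), round(t, 2), round(df, 1))
```

Note that even if every imputed data set had the same within-imputation variance, the between-imputation term keeps the total variance above it, which matters when comparing this test with a weighting-based one.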

slide-12
SLIDE 12

New Method: A Test Amenable to Weighting

  • Alternatively, one could weight up the wave k and wave k – 1 respondents, respectively, producing two sets of nonresponse-adjusted weights, $w_1^{k}$ and $w_1^{k-1}$
  • Fundamentals of Taylor series linearization (Woodruff, 1971) can be used to derive the variance of the wave-specific weighted mean difference, which is a function of p = 4 totals
  • Replication variance estimation methods could also be used (Wolter, 2007), and unlike the MI test, the weighting version generalizes to other point estimates, not strictly means
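
One reading of the four totals is the pair of weighted sums in each ratio mean: $\hat{Y}_1^{k} = \sum_i w_{1i}^{k} y_i$, $\hat{N}_1^{k} = \sum_i w_{1i}^{k}$, and the same two sums for wave k – 1. Linearizing $\hat{Y}_1^{k}/\hat{N}_1^{k} - \hat{Y}_1^{k-1}/\hat{N}_1^{k-1}$ then gives the variate $u_i = w_{1i}^{k}(y_i - \hat{\bar{y}}_1^{k})/\hat{N}_1^{k} - w_{1i}^{k-1}(y_i - \hat{\bar{y}}_1^{k-1})/\hat{N}_1^{k-1}$, where the wave k – 1 weight is 0 for anyone who first responded in wave k. The sketch below implements that idea under loudly simplifying assumptions: a single stratum, a with-replacement variance estimator, and adjusted weights treated as fixed (so variability from the adjustment itself is ignored). The function name and data are hypothetical.

```python
import math

def weighting_difference_test(y, w_k, w_prev):
    """Return (delta, var, t) for the difference of two weighted means.

    y      : responses for everyone who responded through wave k
    w_k    : adjusted weights using respondents through wave k
    w_prev : adjusted weights using respondents through wave k-1
             (0 for those who first responded in wave k)
    """
    n = len(y)
    N_k, N_prev = sum(w_k), sum(w_prev)
    mean_k = sum(w * v for w, v in zip(w_k, y)) / N_k
    mean_prev = sum(w * v for w, v in zip(w_prev, y)) / N_prev
    delta = mean_k - mean_prev
    # Linearized variate for the difference of the two ratio means.
    u = [wk * (v - mean_k) / N_k - wp * (v - mean_prev) / N_prev
         for v, wk, wp in zip(y, w_k, w_prev)]
    u_bar = sum(u) / n
    # Assumed with-replacement, single-stratum variance estimator.
    var = n / (n - 1) * sum((ui - u_bar) ** 2 for ui in u)
    t = delta / math.sqrt(var) if var > 0 else math.nan
    return delta, var, t

# Hypothetical data: 6 wave-1 respondents, 4 more in wave 2, N = 60.
y = [1, 2, 2, 1, 2, 2, 2, 1, 2, 2]
w_prev = [10.0] * 6 + [0.0] * 4
w_k = [6.0] * 10
delta, var, t = weighting_difference_test(y, w_k, w_prev)
print(round(delta, 4), var > 0)
```

If a wave brings in no new respondents and the weights are unchanged, every $u_i$ is 0 and so is the estimated variance, which is why the function returns `nan` for the statistic in that case.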

slide-13
SLIDE 13

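Visualization of the Weighting Test

$$\hat{\bar{y}}_1^{2} = \frac{\sum_{i=1}^{10} w_{1i}^{2} y_i}{\sum_{i=1}^{10} w_{1i}^{2}} = \frac{\hat{Y}_1^{2}}{\hat{N}_1^{2}} = \frac{100.86}{60} = 1.681$$

$$\hat{\bar{y}}_1^{1} = \frac{\sum_{i=1}^{6} w_{1i}^{1} y_i}{\sum_{i=1}^{6} w_{1i}^{1}} = \frac{\hat{Y}_1^{1}}{\hat{N}_1^{1}} = \frac{99.96}{60} = 1.666$$

$$\hat{\delta}_1^{2} = \hat{\bar{y}}_1^{2} - \hat{\bar{y}}_1^{1} = .015$$

$$\mathrm{var}(\hat{\delta}_1^{2}) \approx \mathrm{var}\left( \sum_{i=1}^{10} u_i \right) = .00567$$

$$H_0: \delta_1^{2} = 0 \qquad H_1: \delta_1^{2} \neq 0$$

$$t = \frac{.015}{\sqrt{.00567}} \approx .2$$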

slide-14
SLIDE 14
III. FEVS Application Comparing the Two Phase Capacity Tests

slide-15
SLIDE 15

Details of 2011 FEVS Application

  • Investigated the 7 percent positive estimates comprising the Job Satisfaction Index for a purposive sample of three agencies
  • Treating the ultimate respondent set as the full sample, used time stamps to group responses by field period week and conducted the two test versions retrospectively; the full sample was used to compute “relative nonresponse error”
  • Used categorical demographics on the sample frame (gender, minority status, supervisor status, work unit, and work location) to adjust for nonresponse as follows:
– MI version: demographics served as main effects in a sequence of imputation models fit independently by agency, using IVEware (Raghunathan et al., 2001), with M = 5
– Weighting version: demographics served as raking dimensions (Kalton and Flores-Cervantes, 2003)
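
Raking adjusts respondent weights so that their sums match known frame totals on each dimension in turn, cycling until the margins agree. The sketch below is a generic iterative proportional fitting loop, not the FEVS production routine; the function name, the two dimensions shown, and all totals are hypothetical.

```python
def rake(weights, groups, targets, iterations=50, tol=1e-8):
    """Rake weights to marginal totals.

    weights : base weights, one per respondent
    groups  : one dict per respondent mapping dimension -> category
    targets : dict mapping dimension -> {category: frame total}
    """
    w = list(weights)
    for _ in range(iterations):
        max_change = 0.0
        for dim, totals in targets.items():
            # Current weighted total in each category of this dimension.
            current = {}
            for wi, g in zip(w, groups):
                current[g[dim]] = current.get(g[dim], 0.0) + wi
            # Scale every respondent so the category totals hit the targets.
            for i, g in enumerate(groups):
                factor = totals[g[dim]] / current[g[dim]]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins already match
            break
    return w

# Hypothetical respondents and frame totals for two dimensions.
groups = [
    {"gender": "M", "supervisor": "Y"},
    {"gender": "M", "supervisor": "N"},
    {"gender": "F", "supervisor": "Y"},
    {"gender": "F", "supervisor": "N"},
]
targets = {"gender": {"M": 30, "F": 30}, "supervisor": {"Y": 20, "N": 40}}
raked = rake([1.0] * 4, groups, targets)
print([round(x, 2) for x in raked])
```

For the phase capacity test, the same routine would be run twice, once on respondents through wave k and once on respondents through wave k – 1, to obtain the two weight sets.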

slide-16
SLIDE 16

Results of 2011 FEVS Application

Comments:

  • The MI version of the test tends to declare phase capacity sooner, with only one instance calling for a 3rd wave of data collection
  • Because the nonresponse-adjusted estimates tend to increase with each wave, stopping sooner leaves a larger residual nonresponse (NR) error

slide-17
SLIDE 17

Interpretation of Application Results

  • The issue is that the variance decreases to 0 more quickly for the weighting version. Consider the extreme case of no new respondents: the variance of the difference would be 0 for the weighting version, but for the MI version the $\hat{d}_m$’s are not necessarily 0
  • Results from a simulation study discussed in the article shed more light on this claim: all else equal, the weighting version has a smaller estimated variance and is therefore more sensitive to point estimate changes

slide-18
SLIDE 18
IV. Limitations and Avenues for Further Research

slide-19
SLIDE 19

Study Limitations

  • Despite the aversion to the phrase “stopping rule,” stopping was the only design phase change investigated in this research
  • Data must be collected and processed in real time, and it was tacitly assumed that the full sample is “active”; this may be impractical for in-person surveys covering a vast geographical expanse, although the tests could be applied to subsamples
  • Actual adoption of these approaches in FEVS would face resistance because:
– It is desirable to treat each agency equitably; beginning in FEVS 2012, the field period was preset to 6 weeks for all agencies
– Higher scores are better, so there may be opposition to any change believed to reduce point estimates, a shortened field period included

slide-20
SLIDE 20

Ideas for Further Research

  • Work out a more formal theoretical understanding of why the covariance is not accounted for equivalently in the two tests
  • Derive variants of the MI test for point estimates other than means, so that more comparisons could be made against the weighting version
  • Chapter 4 of Lewis (2014) extends the weighting version of the phase capacity test to multivariate settings; something similar could be done for the MI version of the test
  • Both phase capacity testing methods discussed today were retrospective in nature; future research could develop prospective variants in the spirit of the one proposed by Wagner and Raghunathan (2010)
slide-21
SLIDE 21

Thanks!

Questions/Comments? Taylor.Lewis@opm.gov
