SLIDE 1

Using Prior Wave Information and Paradata: Can They Help to Predict Response Outcomes and Call Sequence Length in a Longitudinal Study?

Olga Maslovskaya, Gabriele Durrant and Peter W.F. Smith
University of Southampton
14 March 2018, Washington DC

SLIDE 2

Introduction

• In face-to-face surveys, interviewers make several visits (calls) to households to obtain a response; this produces a call sequence.
• Aim for survey practice: to identify (and ideally avoid) unsuccessful and long call sequences.

SLIDE 3

Introduction (2)

• Paper 1 – Wave 1 data only: Durrant, Maslovskaya and Smith (2015), Journal of Survey Statistics and Methodology 3(3): 397-424.
• Paper 2 – Waves 1 and 2, longitudinal context: Durrant, Maslovskaya and Smith (2017), Journal of Official Statistics 33(3): 801-834.

SLIDE 4
Main Research Questions

• Can we predict the final call sequence length and the final outcome early in the data collection process of the current wave, taking into account information from current and previous waves?
• In other words:
  – Can we predict, after for example the third call, whether a household is going to respond or not?
  – Can we predict how many calls it is going to take to reach the final outcome?

SLIDE 5

Further Research Questions

The ability of 'classical' nonresponse models without call record data to predict nonresponse is often limited (R² values well below 10%) (Olson et al. 2012; Olson and Groves 2012; West and Groves 2013).

• How predictive are models which include call record data?
• Does their ability to predict improve once more call record data are available (e.g. including earlier call information, or information from previous waves in a longitudinal study)?
• How can these models best be used in survey practice?

SLIDE 6

Data

• UK Understanding Society Survey (Waves 1 and 2)
• Wave 1: call record data, interviewers' observation data, survey data (responding households only)
• Wave 2: call record data, interviewers' observation data
• Analysis sample: 10,630 households (households with 4 or more calls)
• Maximum number of calls in Wave 1: 30
• Methodological difficulty: household IDs differ between waves (there is no such thing as a longitudinal household), so it is difficult to follow households, but possible to follow individuals within households.

SLIDE 7

Dependent Variables and Models

SLIDE 8

Modelling Strategy and Explanatory Variables

SLIDE 9
Assessment of Models

• Focus on the ability of the models to predict length and outcome.
• To compare different models and to assess the quality of model prediction and model fit:
  – the pseudo-R² statistic (the proportion of variation in the dependent variable that is explained by the model);
  – concepts from epidemiology to assess the accuracy of models (Plewis et al. 2012):
    • discrimination (sensitivity and specificity);
    • prediction (positive and negative predicted values);
    • AUC, the area under the ROC (receiver operating characteristic) curve.
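As an illustration of these accuracy measures (a sketch, not the authors' code), the snippet below dichotomises model-predicted response probabilities at a cut-off c and computes sensitivity, specificity and the positive and negative predicted values from the resulting 2×2 confusion table; the data are toy values:

```python
def confusion_measures(y_true, probs, c=0.5):
    """Classify a case as a predicted respondent when its predicted
    probability is at least c, then compute the four epidemiology-style
    accuracy measures from the resulting 2x2 confusion table."""
    y_pred = [1 if p >= c else 0 for p in probs]
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    tn = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    return {
        "sensitivity": tp / (tp + fn),  # P(predicted respondent | respondent)
        "specificity": tn / (tn + fp),  # P(predicted nonrespondent | nonrespondent)
        "ppv": tp / (tp + fp),          # P(respondent | predicted respondent)
        "npv": tn / (tn + fn),          # P(nonrespondent | predicted nonrespondent)
    }

# Toy data: 1 = household responded, 0 = household did not respond.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
print(confusion_measures(y, p, c=0.5))
```

Varying c trades sensitivity against specificity, which is exactly what the ROC curve on the next slide summarises.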

SLIDE 10

Assessment of Models

SLIDE 11

Area under the Curve (AUC)

• A receiver operating characteristic (ROC) curve summarises predictive power for all possible values of c (the cut-off for π) by plotting sensitivity as a function of (1 − specificity).
• The higher the sensitivity at a given specificity, the higher the predictive power.
• The greater the area under the curve (AUC), the greater the predictive power; AUC values range from 0.5 (no discrimination) to 1 (perfect discrimination).
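A minimal sketch of the AUC (illustrative, not the presentation's implementation): the rank-based form, which equals the probability that a randomly chosen responding case receives a higher predicted probability than a randomly chosen nonresponding case, and is equivalent to the area under the ROC curve taken over all cut-offs c:

```python
def auc(y_true, probs):
    """Rank-based AUC: the share of (responder, nonresponder) pairs in
    which the responder has the higher predicted probability, with ties
    counting one half; equivalent to the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # perfect discrimination: 1.0
print(auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # no discrimination: 0.5
```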

SLIDE 12

Results

SLIDE 13

Results

SLIDE 14

Results

SLIDE 15

Results

SLIDE 16

Results

SLIDE 17

Results

SLIDE 18

Results

SLIDE 19

Results: Sensitivity

SLIDE 20

Results: Positive Predicted Values

Positive predicted value, P(true category k | predicted category k), for k = 1, 2, 3, 4:

Model   Short Successful   Short Unsuccessful   Long Successful   Long Unsuccessful
        (n=5304)           (n=1400)             (n=2216)          (n=1710)
1       49.9%              0.0%                 0.0%              0.0%
2       50.5%              66.7%                37.5%             34.6%
3       50.9%              46.7%                37.2%             37.2%
4       51.8%              39.3%                32.6%             36.6%
5       53.3%              43.0%                38.3%             37.2%
6       53.7%              37.9%                35.0%             39.1%
7       54.8%              48.3%                36.8%             40.4%
8       62.1%              48.6%                42.2%             41.1%

Of the cases predicted to be long unsuccessful, 41.1% are predicted correctly (Model 8).
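The percentages above are per-category positive predicted values. A sketch of how such values come out of a 4×4 confusion table (toy counts, not the study's data): for each category k, divide the diagonal count by the total of the predicted-as-k column.

```python
def ppv_by_category(confusion, labels):
    """confusion[i][j] = number of cases whose true category is labels[i]
    and whose predicted category is labels[j]. The positive predicted
    value for category k is the diagonal count divided by the
    predicted-as-k column total."""
    n = len(labels)
    ppv = {}
    for k in range(n):
        predicted_as_k = sum(confusion[i][k] for i in range(n))
        ppv[labels[k]] = confusion[k][k] / predicted_as_k if predicted_as_k else 0.0
    return ppv

labels = ["short successful", "short unsuccessful",
          "long successful", "long unsuccessful"]
toy = [[40, 5, 3, 2],    # true: short successful
       [6, 20, 2, 4],    # true: short unsuccessful
       [3, 2, 15, 6],    # true: long successful
       [1, 3, 5, 12]]    # true: long unsuccessful
print(ppv_by_category(toy, labels))
```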

SLIDE 21
Summary (1)

• Basic geographic information is not very predictive.
• Controlling for survey data from Wave 1 increases the R² but not the other assessment indicators (the classical nonresponse model is not very predictive!).
• Call record data from Wave 1 are highly significant and improve prediction (but not by very much); time of calls and time between calls are all significant variables, but their impact on prediction is limited.
• Interviewers' observation variables: some are significant in the models and slightly improve the predictive power.

SLIDE 22
Summary (2)

• Indicators of change (between Wave 1 and Wave 2):
  – indicators of change in interviewers' observations are significant and improve prediction (but not by very much);
  – the variable 'household has changed household composition' is highly significant (but not very predictive).
• Big increase in prediction once the call outcomes of the most recent calls in Wave 2 are included (the more calls, the more useful, as expected).
• Modelling length and final outcome jointly improves prediction (the variables in the length model and in the outcome model can differ).

SLIDE 23
Conclusions

• The novelty is to model sequence length, to model length and outcome jointly, and to condition on all current and previous information on a case (including survey data, call data and interviewer observations).
• The most recent call outcome is the most predictive: this could indicate that the response process depends on current circumstances more than on previous history and characteristics.
• The approach can be implemented in survey practice quite easily, using standard methodology.
• Survey managers may wish to weigh the probability of a successful outcome against sequence length.

SLIDE 24

References

Durrant, G., Maslovskaya, O. and P.W.F. Smith (2015) Modelling final outcome and length of call sequence to improve efficiency in interviewer call scheduling. Journal of Survey Statistics and Methodology 3(3): 397-424.

Durrant, G., Maslovskaya, O. and P.W.F. Smith (2017) Using prior wave information and paradata: Can they help to predict response outcomes and call sequence length in a longitudinal study? Journal of Official Statistics 33(3): 801-834.

Olson, K. and R.M. Groves (2012) An examination of within-person variation in response propensity over the data collection field period. Journal of Official Statistics 28: 29-51.

Olson, K., J.D. Smyth and H.M. Wood (2012) Does giving people their preferred survey mode actually increase survey participation rates? An experimental examination. Public Opinion Quarterly 76: 611-635.

Plewis, I., Ketende, S. and L. Calderwood (2012) Assessing the accuracy of response propensities in longitudinal studies. Survey Methodology 38(2): 167-171.

West, B.T. and R.M. Groves (2013) A propensity-adjusted interviewer performance indicator. Public Opinion Quarterly 77: 352-374.