Weights in Understanding Society (Olena Kaminska)

slide-1
SLIDE 1

An initiative by the Economic and Social Research Council, with scientific leadership by the Institute for Social and Economic Research, University of Essex, and survey delivery by NatCen Social Research and Kantar Public

Weights in Understanding Society

Olena Kaminska

slide-2
SLIDE 2

Topics covered?

  • Should I use weights?
  • How to select a correct weight?
  • I want higher sample size but you have 0-weights
  • Can I create my own tailored weights?
slide-3
SLIDE 3

Should I use weights?

slide-4
SLIDE 4

The easiest way to represent the population with UKHLS: the svyset command in Stata

use [...]a_indall.dta
svyset a_psu [pweight=a_psnenus_xw], strata(a_strata)
svy: tabulate a_ethn_dv
svy: logistic a_single_dv a_dvage

slide-5
SLIDE 5

Effect of weights in UKHLS: Country distribution

            Population (%)   Unweighted (%)
England          84.2             79.8
Wales             4.7              6.2
Scotland          8.2              7.8
NI                2.8              6.2

UKHLS: Wave 8 (2016-2017) estimates of the 0+ population
Population: ONS mid-2016 population estimates

slide-6
SLIDE 6

Effect of weights in UKHLS: Country distribution

            Population (%)   Unweighted (%)   Weighted (%)
England          84.2             79.8            84.5
Wales             4.7              6.2             4.6
Scotland          8.2              7.8             8.1
NI                2.8              6.2             2.8

UKHLS: Wave 8 (2016-2017) estimates of the 0+ population
Population: ONS mid-2016 population estimates
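To make the effect in the table concrete, here is a minimal Python sketch of a one-variable ratio adjustment using the population and unweighted percentages quoted on the slide. It is only an illustration of why weighting moves the country distribution back towards the population: the actual UKHLS weights are built from selection probabilities and nonresponse corrections, not from this toy calculation, and the names `ratio_weight` and `weighted_shares` are hypothetical.

```python
# Percentages quoted on the slide (population vs unweighted sample).
population_share = {"England": 84.2, "Wales": 4.7, "Scotland": 8.2, "NI": 2.8}
sample_share = {"England": 79.8, "Wales": 6.2, "Scotland": 7.8, "NI": 6.2}

# Toy ratio weight per country: above 1 where the sample under-represents
# the population, below 1 where it over-represents it.
ratio_weight = {c: population_share[c] / sample_share[c] for c in sample_share}


def weighted_shares(shares, weights):
    """Return the percentage distribution after applying one weight per group."""
    total = sum(shares[c] * weights[c] for c in shares)
    return {c: 100 * shares[c] * weights[c] / total for c in shares}


weighted = weighted_shares(sample_share, ratio_weight)
for country in sample_share:
    print(f"{country:10s} weight={ratio_weight[country]:.2f} "
          f"weighted share={weighted[country]:.1f}%")
```

In this sketch Northern Ireland, which is over-represented in the unweighted sample, gets a weight well below 1, while England gets a weight slightly above 1.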

slide-7
SLIDE 7

Effect of weights in UKHLS: General election 2017

                          Population (%)   Unweighted (%)
Conservatives                  42.4             36.5
Labour                         40.0             48.9
Liberal Democrat                7.4              7.3
Scottish National Party         3.0              1.9
Plaid Cymru                     0.5              0.3
Green Party                     1.6              1.7

UKHLS: Wave 8 estimates for July-December 2017, excluding NI
Population: 2017 UK general election results from Wikipedia

slide-8
SLIDE 8

Effect of weights in UKHLS: General election 2017

                          Population (%)   Unweighted (%)   Weighted (%)
Conservatives                  42.4             36.5            42.5
Labour                         40.0             48.9            40.6
Liberal Democrat                7.4              7.3             7.8
Scottish National Party         3.0              1.9             2.7
Plaid Cymru                     0.5              0.3             0.4
Green Party                     1.6              1.7             2.4

UKHLS: Wave 8 estimates for July-December 2017, excluding NI; the weight is adjusted because BHPS is also excluded
Population: 2017 UK general election results from Wikipedia

slide-9
SLIDE 9

What if I ran my analysis without weights?

  • Your results may be far off or only a little off, but you will not know which

slide-10
SLIDE 10

How to select a weight for my analysis?

slide-11
SLIDE 11
slide-12
SLIDE 12

Naming convention for Understanding Society weights

w_xxxyyzz_aa

w_ (wave): a_, b_, c_, d_, …

xxx:
  • hhd: household
  • psn: persons 0+
  • ind: persons 16+
  • yth: persons 10-15

yy:
  • en: enumeration
  • in: interview
  • px: interview or proxy
  • 5m: “extra 5 minutes”
  • sc: self-completion
  • ns: nurse visit
  • bd: blood

zz:
  • us: GPS & EMB
  • bh: BHPS
  • ub: GPS, EMB & BHPS
  • ui: GPS, EMB, BHPS & IEMB
  • 91: BHPS original sample
  • 01: BHPS original sample + boosts

_aa:
  • _xw: cross-sectional analysis weight
  • _lw: longitudinal weight
  • _xd: cross-sectional design weight
  • _li: longitudinal inclusion weight
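The naming convention can be read as a small lookup scheme. The Python sketch below is a hypothetical helper (not part of any Understanding Society tooling) that assembles and decodes weight names from the codes listed on the slide. It only checks that each code appears on the slide; it does not know which combinations actually exist in the released data files, so always confirm a name against the dataset.

```python
# Codes transcribed from the naming-convention slide.
POPULATION = {"hhd": "household", "psn": "persons 0+",
              "ind": "persons 16+", "yth": "persons 10-15"}
INSTRUMENT = {"en": "enumeration", "in": "interview", "px": "interview or proxy",
              "5m": "extra 5 minutes", "sc": "self-completion",
              "ns": "nurse visit", "bd": "blood"}
SAMPLE = {"us": "GPS & EMB", "bh": "BHPS", "ub": "GPS, EMB & BHPS",
          "ui": "GPS, EMB, BHPS & IEMB", "91": "BHPS original sample",
          "01": "BHPS original sample + boosts"}
KIND = {"xw": "cross-sectional analysis weight", "lw": "longitudinal weight",
        "xd": "cross-sectional design weight", "li": "longitudinal inclusion weight"}


def wave_prefix(wave):
    """Map wave number 1, 2, 3, ... to the letter prefix a, b, c, ..."""
    if not 1 <= wave <= 26:
        raise ValueError("wave must be between 1 and 26")
    return chr(ord("a") + wave - 1)


def weight_name(wave, population, instrument, sample, kind):
    """Assemble w_xxxyyzz_aa from its parts (hypothetical helper)."""
    for code, table in ((population, POPULATION), (instrument, INSTRUMENT),
                        (sample, SAMPLE), (kind, KIND)):
        if code not in table:
            raise ValueError(f"unknown code: {code!r}")
    return f"{wave_prefix(wave)}_{population}{instrument}{sample}_{kind}"


def describe(name):
    """Decode a weight name back into readable parts."""
    prefix, body, kind = name.split("_")
    return {"wave": ord(prefix) - ord("a") + 1,
            "population": POPULATION[body[:3]],
            "instrument": INSTRUMENT[body[3:5]],
            "sample": SAMPLE[body[5:7]],
            "kind": KIND[kind]}


print(weight_name(1, "psn", "en", "us", "xw"))
print(describe("i_psnenus_lw"))
```

For example, wave 1, persons 0+, enumeration, GPS & EMB, cross-sectional gives the `a_psnenus_xw` weight used in the svyset example earlier in the deck.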

slide-13
SLIDE 13

_aa part: is your analysis longitudinal or cross-sectional?

  • Longitudinal _lw
  • Cross-sectional _xw
slide-14
SLIDE 14

w_ part: which waves do you use?

  • (Last) wave of your analysis: e.g. wave 9: i_ weight

Wave:   1 2 3 4 5 6 7 8 9
Prefix: a b c d e f g h i

slide-15
SLIDE 15

xxx part: whom do you study?

  • Household-level analysis: _hhdenzz_xw in _hhresp.dta
  • Everyone in the household (0+): _psnenzz_ weight in _indall.dta
  • Youth analysis (10-15): _ythsczz_xw in _youth.dta
  • Adults (16+): _indyyzz_aa in _indresp.dta
slide-16
SLIDE 16

yy part: analysis of adults

  • Questions asked of proxies: _indpxzz_
  • Questions in the main questionnaire: _indinzz_
  • Questions in the self-completion questionnaire: _indsczz_
  • Questions from the nurse visit: _indnszz_
  • Questions using information from blood samples: _indbdzz_
  • Extra 5 minutes questionnaire: _ind5mzz_
slide-17
SLIDE 17

Combination of instruments

Use the lowest level of analysis for your weight:

Level of analysis   Questions available for
5                   Household level (all enumerated individuals)
4                   Adult proxy and main interview
3                   Adult main interview only (no proxy)
2                   Adult self-completion interview
2                   Extra 5 minutes interview
2                   Youth questionnaire
2                   Nurse visit
1                   Information from blood sample

slide-18
SLIDE 18

zz part: which waves?

  • Wave 6 onwards (BHPS+GPS+EMB+IEMB): _XXXXXui_
  • Wave 2 onwards (BHPS+GPS+EMB): _XXXXXub_
  • Wave 1 onwards (GPS+EMB): _XXXXXus_
  • 2001 onwards (BHPS, including NI): _XXXXX01_lw
  • 1991 onwards (BHPS, excluding NI): _XXXXX91_lw
slide-19
SLIDE 19

I want higher sample size but you have 0-weights

slide-20
SLIDE 20

Why 0 weights?

  • TSMs (temporary sample members) are not part of a longitudinal sample by design: they all have 0 longitudinal weight
  • ‘TSMs from wave 1’ are non-eligible people in eligible EMB and IEMB households (they started at wave 1 and 6): always 0 weights, even in their wave 1
  • Longitudinal weights assume participation in all waves, so anyone who missed at least one wave has 0 weight
  • Cross-sectional weights require household participation in all waves (although ui weights require participation in waves 1, 2, 6 and onwards)

slide-21
SLIDE 21

Zero weights

Sample size           Estimate   Std Error   CI low   CI high
20                     0.193      0.096      -0.008    0.395
30                     0.171      0.075       0.017    0.325
40                     0.188      0.067       0.053    0.323
50                     0.139      0.051       0.037    0.241
75                     0.097      0.036       0.025    0.168
100                    0.106      0.033       0.039    0.172
300                    0.171      0.028       0.115    0.226
500                    0.146      0.020       0.107    0.185
1000                   0.138      0.013       0.113    0.164
5000                   0.133      0.006       0.122    0.144
10000                  0.133      0.004       0.125    0.140
15000                  0.132      0.003       0.126    0.138
20000                  0.131      0.003       0.126    0.136
30000                  0.131      0.002       0.127    0.135
33818                  0.132      0.002       0.128    0.136
39,289 (unweighted)    0.150      0.002       0.146    0.153

Proportion of natural/adoptive/step mothers of a child under 16, from wave 8

slide-22
SLIDE 22

Zero weights

[Chart: estimate, CI low and CI high (vertical axis, about -0.05 to 0.45) plotted against sample size (20, 30, 40, 50, 75, 100, 300, 500, 1000, 5000, 10000, 15000, 20000, 30000, 33818)]

Proportion and CIs of natural/adoptive/step mothers of a child under 16, from wave 8

slide-23
SLIDE 23

I want more (sample size):

  • First, analyse with our weights
  • If significant: just publish that
  • If not significant and p=0.6: it’s unlikely that adding 20% of sample will take p below 0.05
  • If p is marginal: worth considering tailored weighting
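A rough back-of-the-envelope check of the advice above can be sketched in Python. The assumptions are mine, not the slide's: the point estimate stays the same, the standard error shrinks with the square root of the sample size, and the test is a two-sided normal (z) test. The function name `p_after_sample_increase` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_after_sample_increase(p_now, increase=0.20):
    """Approximate two-sided p-value after enlarging the sample.

    Assumes an unchanged estimate and a standard error proportional
    to 1/sqrt(n), so the z statistic grows by sqrt(1 + increase).
    """
    std_normal = NormalDist()
    z_now = std_normal.inv_cdf(1 - p_now / 2)   # z implied by the current p
    z_new = z_now * sqrt(1 + increase)          # z with the larger sample
    return 2 * (1 - std_normal.cdf(z_new))


for p in (0.6, 0.06):
    print(f"p = {p:.2f} now, roughly {p_after_sample_increase(p):.3f} "
          f"with 20% more sample")
```

Under these assumptions a p-value of 0.6 stays far from 0.05 after a 20% gain in sample size, while a marginal value such as 0.06 can cross the threshold, which is the case where tailored weighting may be worth the effort.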

slide-24
SLIDE 24

Tailored weights

slide-25
SLIDE 25

Start with one of our weights

  • You are studying wave 1 and wave 9: start with wave 1 weight and model wave 9 response conditional on wave 1;
  • You are studying wave 8 to 9 change: start with wave 6 issue weight and model wave 8-9 joint response conditional on wave 6 positive weight;
  • You are studying youth questions at wave 1 and main questionnaire at wave 9: start with enumerated weight i_psnenus_lw
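The idea of starting from a supplied weight and modelling later response can be sketched with a weighting-class adjustment, a deliberately simple stand-in for the response model mentioned above (in practice a regression-based propensity model with several wave 1 predictors would be more usual). Everything in the sketch is an assumption for illustration: the field names `base_weight`, `cell` and `responded`, the made-up records, and the choice of cells. It also ignores newborns, deaths, emigration and other eligibility changes.

```python
from collections import defaultdict


def attrition_adjusted_weights(people):
    """Weighting-class nonresponse adjustment (illustrative only).

    Each record needs 'base_weight' (the supplied starting weight),
    'cell' (a class built from starting-wave predictors) and
    'responded' (True if the person responded at the later wave).
    Respondents' base weights are divided by the weighted response
    rate of their cell; everyone else gets 0.
    """
    issued = defaultdict(float)
    responding = defaultdict(float)
    for person in people:
        issued[person["cell"]] += person["base_weight"]
        if person["responded"]:
            responding[person["cell"]] += person["base_weight"]

    adjusted = []
    for person in people:
        if person["responded"] and person["base_weight"] > 0:
            rate = responding[person["cell"]] / issued[person["cell"]]
            adjusted.append(person["base_weight"] / rate)
        else:
            adjusted.append(0.0)
    return adjusted


# Made-up records: 'cell' stands in for a wave 1 characteristic.
people = [
    {"base_weight": 1.0, "cell": "owner", "responded": True},
    {"base_weight": 1.0, "cell": "owner", "responded": True},
    {"base_weight": 2.0, "cell": "renter", "responded": True},
    {"base_weight": 2.0, "cell": "renter", "responded": False},
    {"base_weight": 0.0, "cell": "renter", "responded": True},
]
adjusted = attrition_adjusted_weights(people)
print(adjusted)
```

In this toy data the responding renter carries the weight of the renter who dropped out, so the adjusted weights sum to the same total as the base weights.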

slide-26
SLIDE 26

If you want your own attrition adjustment

  • Start with either:
      • Wave 1 (GPS+EMB) weight: a_psnenus_xw
      • Wave 2 issue weight (BHPS+GPS+EMB): b_psnenub_li
      • Wave 6 issue weight (BHPS+GPS+EMB): f_psnenub_li
  • Use predictors from wave 1, 2 or 6 respectively
  • Remember to take into account adjustments for newborns, deaths, moving out of the country and turning 16 (entering the adult questionnaire)
  • You can create your own cross-sectional weights too, through a weight share

slide-27
SLIDE 27

BHPS 1991 only

Timeline: 1991 (Original Sample), 1999, 2001, 2009-10, 2014-15

Weight = 1/prob
prob = prob_selection * prob_w1 * prob_attr

  • prob_selection: selection probability reflecting sample design
  • prob_w1: correction for household and person nonresponse at wave 1
  • prob_attr: correction for nonresponse after wave 1
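As a worked instance of the formula above, the short Python sketch below multiplies the three probabilities and inverts the product. The numeric values are invented purely for illustration and do not come from BHPS; the function name `bhps_weight` is hypothetical.

```python
def bhps_weight(prob_selection, prob_w1, prob_attr):
    """Weight = 1 / (prob_selection * prob_w1 * prob_attr)."""
    for p in (prob_selection, prob_w1, prob_attr):
        if not 0 < p <= 1:
            raise ValueError("each probability must be in (0, 1]")
    return 1.0 / (prob_selection * prob_w1 * prob_attr)


# Invented example: selected with probability 1 in 1000, a 70% chance of
# responding at wave 1 and an 80% chance of staying in the panel afterwards.
weight = bhps_weight(prob_selection=0.001, prob_w1=0.7, prob_attr=0.8)
print(round(weight, 1))
```

Each additional stage of nonresponse lowers the overall probability and therefore raises the weight of the people who remain.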

slide-28
SLIDE 28

UKHLS samples

  • 1991: Original Sample
  • 1999: Sc and W boost
  • 2001: NI sample
  • 2009-10: GPS + EMB samples
  • 2014-15: IEMB+NIB samples

slide-29
SLIDE 29

UKHLS samples

  • 1991: Original Sample
  • 1999: Sc and W boost
  • 2001: NI sample
  • 2009-10: GPS + EMB samples
  • 2014-15: IEMB+NIB samples

For each person we infer where they lived in ’91, ’99, ’01, ’09-’10 and ’14-’15 (E, W, Sc, NI or abroad) using:

  • the place they were selected at
  • for long-term members, what we know about where and when they moved
  • for new members, survey questions
  • for IEMB sample members, the question all of them were asked about where they lived

For ethnic minority groups, place is more detailed: residence in ’09-’10 and ’14-’15 (London borough, postcode sector)

slide-30
SLIDE 30

Design weight at wave 8

  • 1991: Original Sample
  • 1999: Sc and W boost
  • 2001: NI sample
  • 2009-10: GPS + EMB samples
  • 2014-15: IEMB+NIB samples

Dweight = 1/Dprob

Dprob is a sum of 17 selection probabilities:

Dprob = probE91 + probSc91 + probW91
      + probSc99 + probW99
      + probNI01
      + probE09 + probSc09 + probW09 + probNI09
      + pembE09 + pembSc09 + pembW09
      + piembE14 + piembSc14 + piembW14
      + pnib14

Newborns get their mother’s Dweight

slide-31
SLIDE 31

Issue weights at waves 2 and 6

  • 1991: Original Sample
  • 1999: Sc and W boost
  • 2001: NI sample
  • 2009-10: GPS + EMB samples
  • 2014-15: IEMB+NIB samples

Iweight = 1/Iprob

nr: probability of wave 1 response and retention until wave 2 (wave 6)

Iprob combines 17*2 probabilities:

Iprob = probE91*nrE91 + probSc91*nrSc91 + probW91*nrW91
      + probSc99*nrSc99 + probW99*nrW99
      + probNI01*nrNI01
      + probE09*nrE09 + probSc09*nrSc09 + probW09*nrW09 + probNI09*nrNI09
      + pembE09*nreE09 + pembSc09*nreSc09 + pembW09*nreW09
      + piembE14*nrieE14 + piembSc14*nrieSc14 + piembW14*nrieW14
      + pnib14*nrnib14

Newborns get their mother’s Iweight
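The structure of the issue weight (and of the design weight, which is the same sum without the nr terms) can be sketched in Python as below. The component names follow the slide, but the numeric values are invented and the function `inverse_probability_weight` is a hypothetical illustration, not the production weighting code.

```python
def inverse_probability_weight(selection_prob, retention_prob=None):
    """Return 1 / sum(prob * nr) over the sample components.

    selection_prob maps a component name (e.g. 'probE09') to the chance
    of being selected through that route. retention_prob maps the same
    keys to the matching nr term; pass None for a design weight, where
    every nr term is treated as 1.
    """
    total = 0.0
    for component, prob in selection_prob.items():
        nr = 1.0 if retention_prob is None else retention_prob[component]
        total += prob * nr
    if total <= 0:
        raise ValueError("zero inclusion probability: weight is undefined")
    return 1.0 / total


# Invented person who could only have entered via the 2009 England GPS
# sample or the 2009 England EMB sample; all other components are 0.
selection = {"probE09": 0.0005, "pembE09": 0.002}
retention = {"probE09": 0.5, "pembE09": 0.4}   # stand-ins for nrE09, nreE09

dweight = inverse_probability_weight(selection)
iweight = inverse_probability_weight(selection, retention)
print(round(dweight, 1), round(iweight, 1))
```

Because each nr term is at most 1, the issue weight in this sketch is never smaller than the design weight for the same person.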

slide-32
SLIDE 32

If you want to start with selection probabilities

  • In addition to the previous slide, remember to correct for wave 1 nonresponse (around 30-40%)
  • Nonresponse predictors must be available for respondents and nonrespondents (from external sources linked to UKHLS)
  • Unless you use just GPS, remember the 34 components of selection probabilities and nonresponse probabilities before you combine them
  • Correcting just for attrition with the design weight only is better than nothing but will miss a substantial part of nonresponse
slide-33
SLIDE 33

The main point to remember

  • Weights are like a magic wand: use svyset with psu, strata and weight, and go straight to your analysis

svyset n_psu [pw=weight], strata(n_strata)
svy:

  • The only task is to select the correct weight
slide-34
SLIDE 34

If you have further questions

  • Read our User Guide: https://www.understandingsociety.ac.uk/documentation
  • Ask us at the User Support Forum: https://iserswww.essex.ac.uk/support/projects/support
  • Email us at usersupport@understandingsociety.ac.uk
  • Ask us at the Helpdesk Hour; if you want to join, email us at usersupport@understandingsociety.ac.uk
  • Selecting the correct weight video: https://www.youtube.com/watch?v=6xwrIdUmxts