Sample design and attrition in MCS



slide-1
SLIDE 1

Sample design and attrition in MCS

Tarek Mostafa Centre for Longitudinal Studies t.mostafa@ioe.ac.uk

slide-2
SLIDE 2

Outline

  • The MCS sample and design
  • Attrition in the Millennium Cohort Study: what do we know about attrition in MCS, and what can we do about it?
  • Access to MCS documentation
  • Data structure and how to merge datasets
slide-3
SLIDE 3

The MCS sample

The MCS population is defined as:

All children born between 1 September 2000 and 31 August 2001 (for England and Wales), and between 23 November 2000 and 11 January 2002 (for Scotland and Northern Ireland, see 2.2), alive and living in the UK at age nine months, and eligible to receive Child Benefit at that age; and, after nine months, for as long as they remain living in the UK. Child Benefit (CB) was then a universal benefit.

slide-4
SLIDE 4

The MCS population

  • The population includes:
  • 1. Children living in non-household situations (women's refuges, hostels, hospitals, prisons etc.) at age nine months. This holds in principle; in practice there may have been none.
  • 2. Children not born in the UK but established as resident in the UK at age nine months.
  • The population excludes:
  • 1. Children who died before age nine months.
  • 2. UK-born children who emigrated from the UK before age nine months.
  • 3. Children not established as resident in the UK at age nine months, e.g. children of foreign diplomats, asylum seekers etc.

slide-5
SLIDE 5

The MCS sample design

  • The population was stratified by UK country: England, Wales, Scotland and Northern Ireland.
  • Each country had two strata: advantaged and disadvantaged. England had an additional stratum for areas with a high percentage of ethnic minorities.
  • The primary sampling unit is the electoral ward. Small wards with very few births were combined into ‘super-wards’.
  • Minorities and disadvantaged families were oversampled.
  • Identified sample: 27,201 | Issued sample: 24,180
  • Productive at wave 1: 18,552
  • 692 new families joined MCS in wave 2.
slide-6
SLIDE 6

Problems of non-response/attrition

  • Distinction between unit (respondent) non-response and item non-response (the focus here is on the former)
  • Types of non-response (each has separate reasons): non-contact, refusal, inability, out of scope/ineligible
  • Non-response is on the increase in all surveys
  • Non-response may not be permanent (in a panel survey)
  • Effects of non-response/attrition
slide-7
SLIDE 7

Attrition

Definition

  • Attrition is the discontinued participation of some individuals in a longitudinal survey for reasons that are unknown and/or beyond the control of the researcher.

slide-8
SLIDE 8

Productive sample over time

[Bar chart: productive sample size by wave. Wave 1: 18,552; Wave 2: 15,590; Wave 3: 15,246; Wave 4: 13,857; Wave 5: 13,287.]

slide-9
SLIDE 9

Non-response

Outcome            Wave 1   Wave 2   Wave 3   Wave 4   Wave 5
Productive         18,552   15,590   15,246   13,857   13,287
Not Issued            692        -        -    2,213    2,851
Ineligible              -      167      300      126       78
Untraced Movers         -      687      547      706      388
Refusal                 -    1,739    2,315    1,811    2,196
Non-Contact             -      930      546      123      438
Other                   -      131      290      408        6
Total              19,244   19,244   19,244   19,244   19,244
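The non-response table can be sanity-checked with a few lines of Python. This is only a sketch: the placement of the empty cells is an inference, chosen so that every wave's column adds up to the stated total of 19,244, and the function names are illustrative.

```python
# Outcome counts per wave, transcribed from the non-response table.
# None marks a cell that is empty in the table (placement inferred from
# the requirement that each column sums to 19,244).
outcomes = {
    "Productive":      [18552, 15590, 15246, 13857, 13287],
    "Not Issued":      [692, None, None, 2213, 2851],
    "Ineligible":      [None, 167, 300, 126, 78],
    "Untraced Movers": [None, 687, 547, 706, 388],
    "Refusal":         [None, 1739, 2315, 1811, 2196],
    "Non-Contact":     [None, 930, 546, 123, 438],
    "Other":           [None, 131, 290, 408, 6],
}
TOTAL = 19244


def column_totals(table):
    """Sum every outcome category within each wave, treating empty cells as 0."""
    n_waves = len(next(iter(table.values())))
    return [sum(row[w] or 0 for row in table.values()) for w in range(n_waves)]


def productive_share(table, total=TOTAL):
    """Productive cases as a percentage of the total sample, per wave."""
    return [round(100 * n / total, 1) for n in table["Productive"]]


totals = column_totals(outcomes)
shares = productive_share(outcomes)
for wave, (t, s) in enumerate(zip(totals, shares), start=1):
    print(f"Wave {wave}: total={t}, productive={s}%")
```

Each wave's total matching 19,244 is a quick check that no outcome category was dropped when the table was transcribed.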

slide-10
SLIDE 10

Types of non-response

[Chart: number of cases by type of non-response (Not Issued, Ineligible, Untraced Movers, Refusal, Non-Contact, Other), Waves 1 to 5.]

slide-11
SLIDE 11

Monotone vs. non-monotone response

Type of non-response   Freq.     %
Monotone               5,023     26.1
Non-monotone           3,773     19.6
All waves              10,448    54.3
Total                  19,244    100.0

Monotone response: respondents dropped out without coming back. Non-monotone response: an interrupted response pattern over time (new families are a special case of non-monotone).
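The three categories can be made concrete with a small classifier. This is a sketch under one assumption: each sample member is represented by a hypothetical list of 0/1 flags, one per wave, where 1 means productive.

```python
def classify(pattern):
    """Classify a response pattern (one 0/1 flag per wave, 1 = productive)."""
    flags = [bool(x) for x in pattern]
    if not any(flags):
        raise ValueError("a sample member must be productive in at least one wave")
    if all(flags):
        return "all waves"
    first_miss = flags.index(False)
    # Monotone: productive from wave 1 onwards, then never productive again
    # after the first missed wave.
    if first_miss > 0 and not any(flags[first_miss:]):
        return "monotone"
    # Everything else is an interrupted pattern, including new families
    # who were not productive at wave 1 and joined later.
    return "non-monotone"


for p in ([1, 1, 0, 0, 0], [1, 0, 1, 1, 1], [0, 1, 1, 1, 1], [1, 1, 1, 1, 1]):
    print(p, classify(p))
```

A family that joined at wave 2 starts with a 0 followed by a 1, so it falls into the non-monotone group without any special handling.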

slide-12
SLIDE 12

Sample composition over time: gender

[Chart: sample composition by gender (%), Waves 1 to 5.]

        Wave 1   Wave 2   Wave 3   Wave 4   Wave 5
Boy     51.37    51       51.05    50.61    50.46
Girl    48.63    49       48.95    49.39    49.54

slide-13
SLIDE 13

Sample composition over time: class & ‘race’

[Chart: share (%) of ‘Non-White’ and ‘Managerial and professional’ in the sample, Waves 1 to 5.]

slide-14
SLIDE 14
Effects of non-response/attrition

  • Missing data: smaller samples, fewer transitions, incomplete histories.
  • Biases in results (all surveys): non-response is disproportionate among some groups (mobile, disadvantaged, young, men, long working hours). This is a problem if it is linked to the survey's topic focus or variables.

slide-15
SLIDE 15

Non-response bias - implications

  • Ignore the problem: equivalent to assuming that there is no sample bias.
  • Use adjustment techniques such as weighting and imputation.

slide-16
SLIDE 16

Sampling and attrition weights

  • Sampling weights: adjust the sample composition to take account of over-sampling in the first wave.
  • Attrition weights: adjust the sample composition to take account of the loss of particular types of respondents.
  • Adjustment means giving more importance (weight) to a particular group.
  • Overall weight = sampling weight x attrition weight
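To illustrate how the two weights combine, here is a minimal Python sketch. The respondents, outcome values and weights are invented for illustration and are not MCS data.

```python
def overall_weight(sampling_wgt, attrition_wgt):
    """Overall weight is the product of the sampling and attrition weights."""
    return sampling_wgt * attrition_wgt


def weighted_mean(values, weights):
    """Mean of values in which each observation counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: (outcome, sampling weight, attrition weight).
# The first two come from an over-sampled group, so their sampling weight is below 1.
rows = [
    (10.0, 0.5, 1.0),
    (10.0, 0.5, 2.0),
    (20.0, 2.0, 1.0),
    (20.0, 2.0, 1.5),
]
values = [y for y, _, _ in rows]
weights = [overall_weight(s, a) for _, s, a in rows]

print("unweighted mean:", sum(values) / len(values))
print("weighted mean:  ", weighted_mean(values, weights))
```

The weighted mean moves towards the outcome of the group that was under-represented in the achieved sample, which is what "giving more importance to a particular group" means in practice.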

slide-17
SLIDE 17

Attrition weights construction

  • Logistic response models
  • Dependent variable: binary response outcome (0/1) in wave M.
  • Independent variables: characteristics of respondents in previous waves.
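The sketch below shows one common way to turn a response model into attrition weights: take the inverse of each respondent's predicted probability of responding. The slides do not say that MCS weights are built exactly this way, so treat the inverse-probability step as an assumption. To stay dependency-free, the example uses a single categorical predictor, for which a logistic model's fitted probabilities equal the observed response rate in each group. The groups and records are hypothetical.

```python
from collections import defaultdict


def response_propensities(records):
    """records: (group, responded) pairs. Returns the response rate per group,
    which is what a logistic model with only this categorical predictor would fit."""
    counts = defaultdict(lambda: [0, 0])  # group -> [respondents, total]
    for group, responded in records:
        counts[group][0] += int(responded)
        counts[group][1] += 1
    return {g: r / n for g, (r, n) in counts.items()}


def attrition_weights(records):
    """Inverse-probability weight for respondents, None for non-respondents."""
    p = response_propensities(records)
    return [1.0 / p[g] if responded else None for g, responded in records]


# Hypothetical wave M outcomes by a characteristic measured at an earlier wave.
records = (
    [("owner", 1)] * 4 + [("owner", 0)] * 1
    + [("renter", 1)] * 2 + [("renter", 0)] * 2
)
propensities = response_propensities(records)
weights = attrition_weights(records)
print(propensities)
print(weights)
```

Respondents from the group with the lower response rate receive the larger weight, so they stand in for the similar families who dropped out.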

slide-18
SLIDE 18

MCS weights

Weights                                        Wave   Variable name
Sampling weight (country specific analyses)    -      weights1
Sampling weight (whole of UK analyses)         -      weights2
Overall weights (country specific analyses)    1      aovwt1
Overall weights (whole of UK analyses)         1      aovwt2
Overall weights (country specific analyses)    2      bovwt1
Overall weights (whole of UK analyses)         2      bovwt2
Overall weights (country specific analyses)    3      covwt1
Overall weights (whole of UK analyses)         3      covwt2
Overall weights (country specific analyses)    4      dovwt1
Overall weights (whole of UK analyses)         4      dovwt2
Overall weights (country specific analyses)    5      eovwt1
Overall weights (whole of UK analyses)         5      eovwt2

  • Overall weights = sampling weights x attrition weights.
  • Weights are prefixed in alphabetical order depending on the wave.
  • If you are doing an analysis with an outcome at wave 4, use the corresponding weight at the same wave.
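The naming pattern in the weights table is regular enough to generate. A small Python helper (the function name is illustrative) that returns the overall weight variable for a given wave and type of analysis:

```python
# Wave number -> alphabetical prefix, as listed in the weights table.
WAVE_PREFIX = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}


def overall_weight_name(wave, whole_uk):
    """Return the overall weight variable name for a wave.

    whole_uk=True  -> whole of UK analyses (suffix 2)
    whole_uk=False -> country specific analyses (suffix 1)
    """
    if wave not in WAVE_PREFIX:
        raise ValueError(f"no overall weight listed for wave {wave}")
    suffix = "2" if whole_uk else "1"
    return f"{WAVE_PREFIX[wave]}ovwt{suffix}"


print(overall_weight_name(3, whole_uk=True))   # covwt2
print(overall_weight_name(4, whole_uk=False))  # dovwt1
```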

slide-19
SLIDE 19

Applying survey design in Stata: svy

svyset sptn00 [pweight=covwt2], strata(pttype2) fpc(nh2)

  • sptn00: electoral ward ID (the primary sampling unit).
  • covwt2: weight (you need to choose the correct one for your wave and analysis).
  • pttype2: stratum ID, passed to strata().
  • nh2: finite population correction, passed to fpc().
slide-20
SLIDE 20

MCS datasets and access

slide-21
SLIDE 21

Documentation

  • Guide to the Datasets
  • Questionnaires (CAPI and paper)
  • Technical reports on sampling, response and fieldwork; Data notes
  • Data Dictionary
  • User Guides to Initial Findings (per sweep), to geographical identifiers, psychological scales, derived variables.

  • Online bibliography
slide-22
SLIDE 22

Available datasets: main survey data

Sweep: MCS1, MCS2, MCS3, MCS4, MCS5, MCS6
Age: 9 months, 3 years, 5 years, 7 years, 11 years, 14 years

  • Longitudinal family file: X X X X X X
  • Parental interview: X X X X X X
  • Household grid: X X X X X X
  • Child assessment: X X X X X
  • Child measurement: X X X
  • Neighbourhood assessment: X
  • Older siblings: X X
  • Child self completion: X X X
  • Consent to data linkage: X X X X X X
  • Derived variables: X X X X X X

slide-23
SLIDE 23

Available datasets: additional data

Sweep: MCS1, MCS2, MCS3, MCS4, MCS5, MCS6
Age: 9 months, 3 years, 5 years, 7 years, 11 years, 14 years

  • Geographical linked data: X X X X
  • Foundation stage profile: X
  • Teacher survey: X X X
  • Birth registration and maternity episodes: X
  • Health visitor survey: X
  • Oral fluid: X X
  • Activity monitor: X X
  • Time use record: X
  • Nursery observations: Undeposited

slide-24
SLIDE 24

Access

  • Registration for UK based researchers has the following steps:
  • Apply for a username and password. This can be done on this page: http://www.data-archive.ac.uk/sign-up/credentials-application
  • Complete an online registration form after logging in.
  • In the process of downloading the data, you will also be asked to register your project online (30 words).
  • All of this is quick and straightforward.
slide-25
SLIDE 25

Access

  • Website: http://discover.ukdataservice.ac.uk/series/?sn=2000031
  • MCS datasets come under three different licence types:
  • 1. End User Licence = easy!
  • 2. Special Access Licence = difficult
  • 3. Secure Access Licence = difficult, and impossible for non-UK based researchers.

slide-26
SLIDE 26

Access

  • All aforementioned datasets (with End User Licence) are accessible and downloadable once users are fully registered.
  • Special Licence Access (Hospital of Birth): available to non-UK based researchers, but the process is more difficult.
  • Secure Access: access to sensitive information such as geographical identifiers and admin data. Possible to link to many datasets. Data are not downloadable and can be accessed only via remote desktop.

slide-27
SLIDE 27

Secure Access – linking

Geographical identifiers

Possible to link area-level datasets to MCS: level of poverty of the neighbourhood, presence of services or other amenities, etc.

  • MCS1-MCS4: Ward level
  • MCS1-MCS4: Lower Super Output Area
  • MCS1-MCS4: Output area

Education administrative datasets

  • MCS1-MCS4: Linked Education Administrative Dataset - Scotland
  • MCS1-MCS4: Linked Education Administrative Dataset - Wales
  • MCS1-MCS4: Linked Education Administrative Dataset - England

slide-28
SLIDE 28

Data Structure and Linking Datasets

slide-29
SLIDE 29

Linking Datasets

  • Understand the layout of the datasets
  • How you link depends on what you want

    – Family - bedrooms in household
    – Interview - cohort child outcomes
    – Respondent - income of mother
    – Twins and triplets

slide-30
SLIDE 30

Dataset Layout

MCS1

mcsid     ampnum00   ampsex00   ....   appnum00   appsex00   ...
M10001N   1          female            2          male
M10002P   1          female            2          male
M10007U   2          male              1          female
M10008V   1          female            2          male
M10011Q   1          female            2          male

MCS2

mcsid     bmpnum00   bmpsex00   ....   bppnum00   bppsex00   ...
M10001N   2          male              1          female
M10002P   1          female            2          male
M10007U   2          male              1          female
M10008V   1          female            2          male
M10014T   1          female            2          male

Variables:

  • mcsid: MCS family identifier
  • ampnum00: Sweep 1 main respondent person no.
  • ampsex00: Sweep 1 main respondent sex
  • appnum00: Sweep 1 partner respondent person no.
  • appsex00: Sweep 1 partner respondent sex
  • The leading ‘b’ in the MCS2 variable names is the sweep identifier.

Notes:

  • The mother is not always the main respondent.
  • Sometimes main respondent and partner swap around between sweeps.
  • Families don’t always appear at each sweep.
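A family-level link across the two sweeps amounts to a full (outer) merge on mcsid. The following Python sketch uses the example rows from the layout slide and only the main respondent columns; the `_merge` flag is an illustrative addition that records which sweeps a family appears in.

```python
# Main respondent columns for the example families on the layout slide.
mcs1 = {
    "M10001N": {"ampnum00": 1, "ampsex00": "female"},
    "M10002P": {"ampnum00": 1, "ampsex00": "female"},
    "M10007U": {"ampnum00": 2, "ampsex00": "male"},
    "M10008V": {"ampnum00": 1, "ampsex00": "female"},
    "M10011Q": {"ampnum00": 1, "ampsex00": "female"},
}
mcs2 = {
    "M10001N": {"bmpnum00": 2, "bmpsex00": "male"},
    "M10002P": {"bmpnum00": 1, "bmpsex00": "female"},
    "M10007U": {"bmpnum00": 2, "bmpsex00": "male"},
    "M10008V": {"bmpnum00": 1, "bmpsex00": "female"},
    "M10014T": {"bmpnum00": 1, "bmpsex00": "female"},
}


def outer_merge(left, right):
    """Merge two family-level files on mcsid, keeping families found in either."""
    merged = {}
    for mcsid in sorted(set(left) | set(right)):
        row = {}
        row.update(left.get(mcsid, {}))
        row.update(right.get(mcsid, {}))
        if mcsid in left and mcsid in right:
            row["_merge"] = "both"
        elif mcsid in left:
            row["_merge"] = "mcs1 only"
        else:
            row["_merge"] = "mcs2 only"
        merged[mcsid] = row
    return merged


merged = outer_merge(mcs1, mcs2)
for mcsid, row in merged.items():
    print(mcsid, row)
```

The merge flag makes the families that are missing from one sweep visible (M10011Q and M10014T here), and the merged row for M10001N shows that the main respondent's person number changed between sweeps.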

slide-31
SLIDE 31

Same Respondent Linkage

MCS1, main respondent

mcsid     ampnum00   ampsex00   ....
M10001N   1          female
M10002P   1          female
M10007U   2          male
M10008V   1          female
M10011Q   1          female

MCS1, partner

mcsid     appnum00   appsex00   ...
M10001N   2          male
M10002P   2          male
M10007U   1          female
M10008V   2          male
M10011Q   2          male

MCS2, main respondent

mcsid     bmpnum00   bmpsex00   ....
M10001N   2          male
M10002P   1          female
M10007U   2          male
M10008V   1          female
M10014T   1          female

MCS2, partner

mcsid     bppnum00   bppsex00   ...
M10001N   1          female
M10002P   2          male
M10007U   1          female
M10008V   2          male
M10014T   2          male

  • Stack equivalent variables
  • Potentially rename variables
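Stacking and renaming can be sketched in Python as follows. The example rows come from the slide; the common names (sweep, role, pnum, sex) are hypothetical, and the sketch assumes that the person number identifies the same individual within a family across sweeps.

```python
# (person no., sex) per family, one dictionary per file on the slide.
mcs1_main = {"M10001N": (1, "female"), "M10002P": (1, "female"),
             "M10007U": (2, "male"), "M10008V": (1, "female"),
             "M10011Q": (1, "female")}
mcs1_partner = {"M10001N": (2, "male"), "M10002P": (2, "male"),
                "M10007U": (1, "female"), "M10008V": (2, "male"),
                "M10011Q": (2, "male")}
mcs2_main = {"M10001N": (2, "male"), "M10002P": (1, "female"),
             "M10007U": (2, "male"), "M10008V": (1, "female"),
             "M10014T": (1, "female")}
mcs2_partner = {"M10001N": (1, "female"), "M10002P": (2, "male"),
                "M10007U": (1, "female"), "M10008V": (2, "male"),
                "M10014T": (2, "male")}


def stack(files):
    """Stack equivalent variables into one long file with common names."""
    long_rows = []
    for sweep, role, data in files:
        for mcsid, (pnum, sex) in data.items():
            long_rows.append({"mcsid": mcsid, "sweep": sweep, "role": role,
                              "pnum": pnum, "sex": sex})
    return long_rows


def link_person(long_rows):
    """Map (mcsid, pnum) to the role that person held at each sweep."""
    history = {}
    for row in long_rows:
        history.setdefault((row["mcsid"], row["pnum"]), {})[row["sweep"]] = row["role"]
    return history


long_rows = stack([(1, "main", mcs1_main), (1, "partner", mcs1_partner),
                   (2, "main", mcs2_main), (2, "partner", mcs2_partner)])
history = link_person(long_rows)
print(history[("M10001N", 1)])  # role of person 1 in family M10001N at each sweep
```

Once the files are stacked, following one person is a lookup on (mcsid, pnum) rather than on the main or partner file they happened to be in at a given sweep.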

slide-32
SLIDE 32

MCS wave 5 data structure

  • The data structure changed in MCS 5.
  • Instead of having separate data files for main respondents and partners, the observations are now appended into one file.
  • Each person in the household has a unique identifier. You need to restrict your sample based on these identifiers and on the research you want to do.
  • See the user guide: http://www.cls.ioe.ac.uk/page.aspx?&sitesectionid=1266&sitesectiontitle=User+Guides

slide-33
SLIDE 33

Variable Naming

cmreofa0 - S3 Main: How often reads to CM C1

  • 1. c - sweep identifier
  • 2. m - respondent identifier (main respondent)
  • 3. reof - CAPI question name
  • 4. a - cohort member identifier (0, or a, b, c)
  • 5. 0 - multi-code identifier (0 or a…; 0 means no multi-code on question)

cpreofb0 - S3 Partner: How often reads to CM C2

  • 1. c - sweep identifier
  • 2. p - respondent identifier (partner)
  • 3. reof - CAPI question name
  • 4. b - cohort member identifier (0, or a, b, c)
  • 5. 0 - multi-code identifier (0 or a…; 0 means no multi-code on question)
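The naming scheme can be expressed as a small parser. This Python sketch assumes that the first two characters and the last two characters are the single-character identifiers and that everything in between is the CAPI question name; the function name and dictionary keys are illustrative.

```python
def parse_variable_name(name):
    """Split an MCS variable name into the five parts of the naming scheme."""
    if len(name) < 5:
        raise ValueError(f"{name!r} is too short to follow the naming scheme")
    return {
        "sweep": name[0],            # a, b, c, ... in sweep order
        "respondent": name[1],       # m = main respondent, p = partner
        "question": name[2:-2],      # CAPI question name
        "cohort_member": name[-2],   # 0, or a, b, c
        "multi_code": name[-1],      # 0 when the question has no multi-code
    }


print(parse_variable_name("cmreofa0"))
print(parse_variable_name("cpreofb0"))
```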

slide-34
SLIDE 34

Derived Variables

  • Respondent Identity and Response
  • Household composition
  • Ethnicity (Parents/ Carers and Cohort Members)
  • Education (Highest NVQ)
  • Employment and Occupation coding
  • Religion
  • Income
  • Housing
  • Anthropometry
  • Psychological Scales
slide-35
SLIDE 35

Finding Variables

  • There are many MCS files and each file contains a huge number of variables.
  • This can make it difficult to locate the specific variables you are interested in.
  • The questionnaires for each sweep are available on the CLS website, as is a searchable online data dictionary.
  • In Stata, the lookfor command searches variable names and labels for keywords.
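For readers working outside Stata, the idea behind a keyword search over names and labels can be sketched in a few lines of Python. This loosely mimics the behaviour described above and is not Stata's implementation; the function name is simply borrowed from the Stata command, and the labels are taken from examples elsewhere in these slides.

```python
def lookfor(variables, *keywords):
    """Return names of variables whose name or label contains any keyword.

    variables: dict mapping variable name -> variable label.
    Matching is case-insensitive and on substrings.
    """
    hits = []
    for name, label in variables.items():
        text = f"{name} {label}".lower()
        if any(k.lower() in text for k in keywords):
            hits.append(name)
    return hits


variables = {
    "cmreofa0": "S3 Main: How often reads to CM C1",
    "cpreofb0": "S3 Partner: How often reads to CM C2",
    "ampsex00": "Sweep 1 Main respondent sex",
}
print(lookfor(variables, "reads"))
print(lookfor(variables, "SEX"))
```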

slide-36
SLIDE 36

Data Dictionary

slide-37
SLIDE 37

Website resources

  • http://www.cls.ioe.ac.uk/mcs
  • http://www.spsstools.net/
  • http://www.stata.com/links/resources-for-learningstata/

slide-38
SLIDE 38

Tarek Mostafa Centre for Longitudinal Studies T.Mostafa@ioe.ac.uk