Sample design and attrition in MCS Tarek Mostafa Centre for - - PowerPoint PPT Presentation
Sample design and attrition in MCS Tarek Mostafa Centre for - - PowerPoint PPT Presentation
Sample design and attrition in MCS Tarek Mostafa Centre for Longitudinal Studies t.mostafa@ioe.ac.uk Outline The MCS sample and design Attrition in the Millennium Cohort Study. What do we know about attrition in MCS? What can we do
Outline
- The MCS sample and design
- Attrition in the Millennium Cohort Study.
What do we know about attrition in MCS? What can we do about it?
- Access to MCS documentation.
- Data structure and how to merge datasets.
The MCS sample
MCS population is defined as
All children born between 1 September 2000 and 31 August 2001 (for England and Wales), and between 23 November 2000 and 11 January 2002 (for Scotland and Northern Ireland, see 2.2), alive and living in the UK at age nine months, and eligible to receive Child Benefit at that age. CB was then a universal benefit. and, after nine months: for as long as they remain living in the UK.
The MCS population
- The population includes:
- 1. Children living in non-household situations (women's refuges, hostels,
hospitals, prisons etc.) at age nine months in principle – in practice none?.
- 2. Children not born in the UK but established as resident in the UK at age
nine months.
- The population excludes:
- 1. Children who died before age 9 months.
- 2. UK-born children who emigrated from the UK before 9 months.
- 3. Children not established as resident in the UK at age nine months- e.g.
children of foreign diplomats, asylum seekers etc.
The MCS sample design
- The population was stratified by UK country - England, Wales,
Scotland and Northern Ireland.
- Each country had two strata: advantaged and disadvantaged
- families. England had an additional one for areas with high
percentage of ethnic minorities.
- The primary sampling unit is the electoral ward. Small wards
with very few births were combined into ‘super-wards’.
- Minorities and disadvantaged families were over sampled.
- Identified sample: 27201 | Issued sample: 24180
- Productive at wave 1: 18552
- About 692 new families joined MCS in wave 2.
Problems of non-response/attrition
- Distinction between unit (respondents’) non
response and item non-response (focus on former here)
- Types of non-response (have separate reasons)
Non-contact Refusal Inability Out of scope/ineligible
- Non-response on increase in all surveys
- Non-response may not be permanent (in panel
survey)
- Effects of non-response/attrition
Attrition
Definition
- Attrition is the discontinued participation of
some individuals in a longitudinal survey for reasons that are unknown and/or beyond the control of the researcher
Productive sample over time
18,552 15,590 15,246 13,857 13,287 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Pructive sample size
Non-response
Outcome Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Productive 18,552 15,590 15,246 13,857 13,287 Not Issued 692 2,213 2,851 Ineligible 167 300 126 78 Untraced Movers 687 547 706 388 Refusal 1,739 2,315 1,811 2,196 Non-Contact 930 546 123 438 Other 131 290 408 6 Total 19,244 19,244 19,244 19,244 19,244
Types of non-response
500 1,000 1,500 2,000 2,500 3,000
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5
Not Issued Ineligible Untraced Movers Refusal Non-Contact Other
Monotone vs. non-monotone response
Type of non-response Freq. %
Monotone 5,023 26.1 Non-monotone 3,773 19.6 All waves 10,448 54.3 Total 19,244 100.0
Monotone response: respondents dropped out without coming back. Non-monotone: interrupted response pattern over time. (New families are special case of non-monotone).
Sample composition over time: gender
51.37 51 51.05 50.61 50.46 48.63 49 48.95 49.39 49.54 47 47.5 48 48.5 49 49.5 50 50.5 51 51.5 52 Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Boy Girl
Sample composition over time: class & ‘race’
5 10 15 20 25 30 35 Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Non-White Managerial and professional
- Missing data
Smaller samples, fewer transitions, incomplete histories.
- Biases in results (all surveys)
Disproportionate to some groups (mobile, disadvantaged, young, men, long working hours) Problem if linked to survey topic focus/variables
Effects of non-response/attrition
Non-response bias - implications
- Ignore the problem : equivalent to assuming that
there are no sample bias.
- Use adjustment techniques such as weighting
and imputations.
Sampling and attrition weights
- Sampling weights: adjust the sample
composition to take account of over-sampling in the first wave.
- Attrition weights: adjust the sample
composition to take account of the loss of particular type of respondents.
- Adjustment means giving more importance
(weight) to a particular group.
- Overall weights = Sampling Wgt x Attrition
Wgt
Attrition weights construction
- Logistic response models
- Dependent variable: binary response
- utcome (0/1) in wave M.
- Independent variables: characteristics of
respondents in previous waves.
MCS weights
Weights Wave Variable name Sampling weight (country specific analyses) weights1 Sampling weight (whole of UK analyses) weights2 Overall weights (country specific analyses) 1 aovwt1 Overall weights (whole of UK analyses) 1 aovwt2 Overall weights (country specific analyses) 2 bovwt1 Overall weights (whole of UK analyses) 2 bovwt2 Overall weights (country specific analyses) 3 covwt1 Overall weights (whole of UK analyses) 3 covwt2 Overall weights (country specific analyses) 4 dovwt1 Overall weights (whole of UK analyses) 4 dovwt2 Overall weights (country specific analyses) 5 eovwt1 Overall weights (whole of UK analyses) 5 eovwt2
Overall weights = Sampling weights x Attrition weights Weights are prefixed in alphabetical
- rder depending on the wave
If you are doing an analysis with an
- utcome at wave 4 use the
corresponding weight at same wave
Applying survey design in Stata: svy
svyset sptn00 [pweight=covwt2], strata(pttype2) fpc(nh2)
- Sptn00: Electoral ward ID.
- Covwt2: weight (you need to choose the correct one).
- Strata: Stratum ID.
- Nh2: finite population coefficient.
MCS datasets and access
Documentation
- Guide to the Datasets
- Questionnaires (CAPI and paper)
- Technical reports on sampling, response and
fieldwork, Data notes
- Data Dictionary
- User Guides to Initial Findings (per sweep), to
geographical identifiers, psychological scales, derived variables.
- Online bibliography
Available datasets: main survey data
MCS1 MCS2 MCS3 MCS4 MCS5 MCS6 Age 9 months 3 years 5 years 7 years 11 years 14 years Longitudinal family file X X X X X X Parental interview X X X X X X Household grid X X X X X X Child assessment X X X X X Child measurement X X X Neighbourhood assessment X Older siblings X X Child self completion X X X Consent to data linkage X X X X X X Derived variables X X X X X X
Available datasets: additional data
MCS1 MCS2 MCS3 MCS4 MCS5 MCS6 Age 9 months 3 years 5 years 7 years 11 years 14 years Geographical linked data X X X X Foundation stage profile X Teacher survey X X X Birth registration and maternity espisodes X Health visitor survey X Oral fluid X X Activity monitor X X Time use record X Nursery observations Undeposited
Access
- Registration for UK based researchers has the following steps:
- Apply for a username and password. This can be done on this page:
http://www.data-archive.ac.uk/sign-up/credentials-application
- Complete an online registration form after logging in.
- In the process of downloading the data, they will also be asked to
register their project online (30 words).
- All that is quick and straight forward.
Access
- Website:
http://discover.ukdataservice.ac.uk/series/?sn=2000031
- MCS datasets come under three different licence
types:
- 1. End User Licence = easy!
- 2. Special Access Licence = difficult
- 3. Secure Access Licence = difficult & impossible for
non-UK based researchers.
Access
- All aforementioned datasets (with End User Licence) are
accessible and downloadable once the users are fully registered.
- Special Licence Access: Hospital of Birth: Special Licence
- Access. Available to non-UK based researcher, but process
more difficult.
- Secure Access: Access to sensitive information such as
geographical identifiers and admin data. Possible to link to many datasets. Data are not downloadable and can be access only via remote desktop.
Secure Access – linking
Geographical identifiers
Possible to link area-level dataset to MCS: level of poverty of their neighbourhood, presence of services or other amenities, etc.
MCS1-MCS4: Ward level MCS1-MCS4: Lower Super Output Area MCS1-MCS4: Output area
Education administrative Datasets
MCS1-MCS4: Linked Education Administrative Dataset - Scotland MCS1-MCS4: Linked Education Administrative Dataset - Wales MCS1-MCS4: Linked Education Administrative Dataset - England
Data Structure and Linking Datasets
Linking Datasets
- Understand the layout of the datasets
- How you link depends on what you want
– Family - bedrooms in household – Interview - Cohort child outcomes – Respondent - income of mother. – Twins and triplets.
Dataset Layout
mcsid ampnum00 ampsex00 .... appnum00 appsex00 ... M10001N 1 female 2 male M10002P 1 female 2 male M10007U 2 male 1 female M10008V 1 female 2 male M10011Q 1 female 2 male mcsid bmpnum00 bmpsex00 .... bppnum00 bppsex00 ... M10001N 2 male 1 female M10002P 1 female 2 male M10007U 2 male 1 female M10008V 1 female 2 male M10014T 1 female 2 male
MCS1 MCS2
Sweep 1 Main respondent person no. Sweep 1 Main respondent sex Sweep 1 Partner respondent sex Sweep 1 Partner respondent person no. MCS Family Identifier Sweep Identifier ‘b’ Mother not always main respondent Sometimes they swap around Families don’t always appear at each sweep
mcsid ampnum00 ampsex00 .... M10001N 1 female M10002P 1 female M10007U 2 male M10008V 1 female M10011Q 1 female mcsid bmpnum00 bmpsex00 .... M10001N 2 male M10002P 1 female M10007U 2 male M10008V 1 female M10014T 1 female
MCS1 MCS2
mcsid appnum00 appsex00 ... M10001N 2 male M10002P 2 male M10007U 1 female M10008V 2 male M10011Q 2 male mcsid bppnum00 bppsex00 ... M10001N 1 female M10002P 2 male M10007U 1 female M10008V 2 male M10014T 2 male
Stack equivalent variables Potentially rename variables
Same Respondent Linkage
MCS wave 5 data structure
- The data structure changed in MCS 5.
- Instead of having separate data files for main
respondents and partners, now the observations are appended into one file.
- Each person in the household has a unique identifier.
You need to restrict your sample based on these identifiers and on the research you want to do.
- See user guide
http://www.cls.ioe.ac.uk/page.aspx?&sitesectionid=1266 &sitesectiontitle=User+Guides
Variable Naming
cmreofa0 - S3 Main: How often reads to CM C1
- 1. c - sweep identifier
- 2. m - respondent identifier
- 3. reof - CAPI Question name
– 0, or a, b, c - cohort member identifier – 0 or a… - no multi-code on question
- cpreofb0 - S3 Partner: How
- ften reads to CM C2
- c - sweep identifier
- m - respondent identifier
- reof - CAPI Question name
- 0, or a, b, c - cohort member
identifier
- 0 or a… - no multi-code on
question
Derived Variables
- Respondent Identity and Response
- Household composition
- Ethnicity (Parents/ Carers and Cohort Members)
- Education (Highest NVQ)
- Employment and Occupation coding
- Religion
- Income
- Housing
- Anthropometry
- Psychological Scales
Finding Variables
- There are many MCS files and each file contains a huge
number of variables
- This can make it difficult to locate specific variables you
are interested in
- The questionnaires for each sweep are available on the
CLS website, as is a searchable online data dictionary In Stata, command lookfor Allows to search variables and labels based on key words.
Data Dictionary
Website resources
- http://www.cls.ioe.ac.uk/mcs
- http://www.spsstools.net/
- http://www.stata.com/links/resources-for-
learningstata/
Tarek Mostafa Centre for Longitudinal Studies T.Mostafa@ioe.ac.uk