A Short Introduction to the German Socioeconomic Panel Study
Overview
1. What is the SOEP
1a Topics Covered by the SOEP
1b Survey Methodology and Instruments
1c Sample Development
2. Data Structure
2a Datasets
2b Variable Names and Missings
2c Identifiers and Pointers
2d PPFAD and HPFAD – two important datasets
3. Weighting Strategy
4. Documentation, Online Support and Paneldata.org
5. Data Access
Part 1: What is the SOEP
What is the SOEP?
SOEP Core – Main study
- Since 1984
- 18 partial samples
The German Socio-economic Panel Study (GSOEP)?
Basic information on SOEPcore
Household Panel Study: Representative annual panel survey of private households in Germany
- Since 1984
- In 2018 about 15,000 households
- Individual information on all individuals living in the sampled household
- Information for up to three generations due to the genealogical design
- Life-course perspective with a focus on well-being
- Theory-based multidisciplinary mix of subjective and objective data on households, individuals and families
- Local context indicators allow for regional comparison (available only at selected on-site research data centers)
- Aim: Provision of micro-data for research in the social and behavioral sciences as well as economics, for universities and independent research centers in Germany and abroad
Wagner, Frick & Schupp (2007): The German socio-economic panel study (SOEP). Scope, evolution and enhancements. Schmollers Jahrbuch: Zeitschrift für Wirtschafts- und Sozialwissenschaften ; journal of applied social science studies 127:139-169.
Information gathered in SOEP Households
Household Panel Study with information on individuals within the households
- Information on the household is gathered from the household head
- Personal interviews with all adult household members (since 1984)
- Since 2001: All individuals aged 17, with retrospective questions on youth
- Since 2014: Personal interviews with children living in the household aged 11-12
- Since 2016: Personal interviews with children living in the household aged 13-14
- Proxy information for younger children living in the household
- Proxy information on individuals who dropped out of the survey due to death or moving abroad
Composition of SOEP Core
Active Panel
- Original Sample (West (1984), former East (1990))
- Demographic Inflows (births, marriages, adoptions, in-movers, out-movers)
- Refreshment Samples (Refreshments)
- Special Samples (Immigration, Family Type)
Attritors (Refusals, Deaths, Moving Abroad)
Total Population: Private Households in Germany
SOEP-Core
Sample Development (Individuals) by Subsamples
Kroh, Martin. 2014. Documentation of Sample Sizes and Panel Attrition in the German Socio-Economic Panel (SOEP) (1984 until 2012). SOEP Survey Papers 177: Series D. Berlin: SOEP/DIW.
[Figure: number of individuals (axis ticks 5,000 to 35,000) by survey year (1984 to 2016) and subsample]
Subsamples (sample, start year, type):
- M4 2016 Refugee/family (2013-2015)
- M3 2016 Refugee (2013-2015)
- M2 2015 Migration (2009-2013)
- M1 2013 Migration (1995-2011)
- L3 2011 Family Type (Single-Parent)
- L2 2010 Family Type (Low-Income)
- L1 2010 Birth Cohort (2007-2010)
- K 2012 Refreshment
- J 2011 Refreshment
- I 2009 Innovation Sample
- H 2006 Refreshment
- G 2002 High Income
- F 2000 Refreshment
- E 1998 Refreshment
- D 1994/5 Migration (1984-1994, West)
- C 1990 Initial Sample (East)
- B 1984 Migration (until 1983, West)
- A 1984 Initial Sample (West)
Part 1a: Topics covered by the SOEP
Topics covered in the SOEP
Multi-topic study with a multidisciplinary perspective, covering questions concerning:
- Demography and population
- Housing
- Education and qualification
- Occupation and employment
- Income, taxes and social security
- Health
- Attitudes, values and personality
- Subjective wellbeing (Life satisfaction).
- Migration and integration
→ Additional topics with multi-annual survey rhythms
Topics with multi-annual survey rhythm*
[Figure: survey years (1986 to 2015) in which each topic was covered]
- Consumption and savings
- Health
- Globalisation & transnationalisation
- Grip strength
- Reciprocity
- Big Five
- Risk aversion
- Trust
- Health (SF12, BMI)
- Transport and energy consumption
- Work environment
- Expectations about the future
- Family and social networks
- Time use and preferences
- Labor market & subjective indicators
- Training & occupational qualification
- Household finance and wealth
- Social security and retirement
- Neighborhood
*Complete list of topics is available in the Desktop Companion (2017): http://about.paneldata.org/soep/dtc/content.html
Life Course Perspective
Information from "cradle to grave"
→ Ideally for all individuals in the household
Life stages shown: planning stage, fetal phase, birth, preschool, primary school, secondary school, adult life, memories & widow(er) pensions, death
Information gathered along the life course:
- Physical and mental condition of the mother
- Physical indicators, health, personality, care
- Physical indicators, health, personality, care, recreation, school performance
- Type of school, school performance, recreation, health, personality
- Values, relationships (parents), recreation, school performance, goals
- Objective and subjective living conditions, personality traits, values, physical and cognitive potential
Dimensionality of gathered information
Information is gathered on a temporal dimension spanning past, present and future, and covers both subjective and objective information.
Extension of survey instruments besides standardized questionnaires:
- Objective measure of physical characteristics (Grip Strength)
- Instruments measuring cognitive potential
- Behavioral experiments
- Innovative modules in SOEP-IS
Subjective and Objective Information
[Figure: grip strength, first measurement right hand (0 to 80), by age (20 to 80) and sex (men, women)]
Source: DIW Berlin / SOEP (Ed.) (2016): SOEP 2016 – Erhebungsinstrumente 2016 (Welle 33) des Sozio-oekonomischen Panels: Greifkrafttest, Stichproben A-L3. SOEP Survey Papers, No. 363.
Temporal Dimension of gathered information
Information regarding the present:
- current employment status or current life satisfaction
Information regarding the past:
- Retrospective questions about events
- How often have you changed jobs in the recent years?
- Retrospectively assessed history of events since age of 15
- Employment or family status
- Monthly calendar of income and employment
- Employment status from January to December of the past year
Information regarding the future (expectations):
- Expected life satisfaction in 5 years
- Expectations regarding (re-)employment, retirement, expectations on schooling
Scales Manual
The Socio-economic Panel Study (SOEP)
SOEP Core
- 1984 - today
- 18 subsamples, regular refreshments
SOEP IS - Innovation Sample
- From 2012 onwards
- Subsamples E and I
- Opportunity for external users to propose survey modules
SOEP RS - Related Studies
- Questions and data structure of SOEP
- Berlin Aging Study II (BASE II)
- Families in Germany (FiD)
- PIAAC-L
- BRISE
Part 1b: Survey Methodology and Instruments
Development of the Survey Method
URL: http://about.paneldata.org/soep/dtc/design.html
Primarily face-to-face Interview, earlier PAPI now CAPI
What is surveyed in SOEP-Core?
- Standard instruments (annually)
  - Household questionnaire (household head)
    - Temporally changing household information
  - Individual questionnaire (every person in the household aged > 17/18)
    - Temporally changing individual information
- Biographical instruments (once, during first interview)
  - Supplementary Biography Questionnaire (aged 18 and above, since 1989)
    - Retrospective questions regarding important life events and biographical data (e.g. place of birth, nationality, nationality of parents)
Standard and Biographical Survey Instruments
[Figure: timeline (2000 to 2016) of age-specific instruments]
Instrument (age group): starting with birth cohort
- Newborns: 2002/2003
- 2-3 years (Muki2): 2002
- 5-6 years (Muki3): 2002
- 7-8 years (FID): 2002
- 9-10 years (Muki5): 2002
- 11-12 years* (pre-teen): 2002
- 13-14 years* (early youth): 2002
- 17 years** (youth): 1983
* Personally interviewed, with parental consent
** Personally interviewed
Target-group-specific Instruments
What is surveyed in SOEP-Core?
- Short questionnaire "Luecke" (gap) (from 2006 onwards)
  - Temporary unit non-response
- The deceased person (from 2009 onwards)
  - Family relationship, cause of death, testament
- Your life abroad (2006, 2007, 2012, 2013, 2014)
  - Reasons for leaving, desire to return
- IAB-SOEP migration sample (since 2013 and 2015 respectively) (M1-M2)
- IAB-BAMF-SOEP survey of refugees (since 2016 and 2017 respectively) (M3-M5)
Target-group-specific Instruments
Migration Samples
[Figure: in-migration to the SOEP; immigration periods covered by the migration samples]
- B (1950-1984)
- D (1984-1994)
- F / I / J / L1 / M1 (1995-2013)
- M2 (2009-2015)
- M3/5 (2013-2016)
Source: Statistisches Bundesamt (figures before 1989 refer only to in-migration into the territory of the Federal Republic of Germany). Own figure.
Identification of migrants and their descendants in SOEP:
- 1. PPFAD: Identification via variables in the dataset (for instance BIOIMMIG)
- 2. BIOIMMIG: Identification via membership in the dataset
- 3. Samples: Identification via sample (immigrant and M samples)
[Figure: relation of the datasets with respect to N: PPFAD (all individuals who ever participated in the SOEP), BIOIMMIG (all individuals who ever answered the BIOIMMIG questionnaire), samples (e.g. M1/M4). Own figure.]
Migrants and their Descendants in the SOEP
Mainz, 4th May 2018, SOEPcampus
PPFAD: Identification via variables in the dataset
- Remember: PPFAD contains central information on every individual who ever lived in a SOEP household
- For each of these variables there is a corresponding "Info" variable providing information regarding the quality of the generated information (GERMBORNINFO, CORIGININFO, IMMIYEARINFO, MIGBACKINFO, REFINFO)
Variable name: Label (and value labels)
- GERMBORN: Born in Germany; [1] born in Germany or immigrated before 1950, [2] not born in Germany
- CORIGIN: Country of birth; [1] Germany, [2] Turkey, etc.
- IMMIYEAR: Year of immigration (after 1950)
- MIGBACK: Migration background; [1] none, [2] direct or [3] indirect migration background
- REFBACK: Flight background
Further information in Liebau, Schmitt and Schacht (2017).
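As a minimal sketch of identification via PPFAD variables, the following Python snippet selects persons with a direct or indirect migration background. The rows are invented for illustration, the lowercase column names are an assumption, and only the MIGBACK value labels (1 = none, 2 = direct, 3 = indirect) are taken from the slide.

```python
# Illustrative rows only; real PPFAD records come from the SOEP data files.
# Value labels follow the slide: 1 = none, 2 = direct, 3 = indirect.
MIGBACK_LABELS = {1: "none", 2: "direct", 3: "indirect"}

ppfad_rows = [
    {"persnr": 101, "migback": 1},
    {"persnr": 102, "migback": 2},
    {"persnr": 103, "migback": 3},
    {"persnr": 104, "migback": -1},  # negative codes mark missing values
]


def migration_background(row):
    """Return the MIGBACK label, or None for missing or unknown codes."""
    return MIGBACK_LABELS.get(row["migback"])


# Keep everyone with a direct or indirect migration background.
migrants = [
    row["persnr"]
    for row in ppfad_rows
    if migration_background(row) in ("direct", "indirect")
]
print(migrants)
```

Persons with a missing code are neither counted as migrants nor as non-migrants here; how to treat them is an analysis decision.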
Migrants and their Descendants in the SOEP
BIOIMMIG: Identification via membership in the dataset
- Remember: BIOIMMIG contains central information on all individuals who ever answered a BIOIMMIG questionnaire (introduced with the life-course questionnaire for wave M in 1996)
- BIOIMMIG currently contains 46 variables
- Long format (combine via persnr + syear)
Variable name: Label (and value labels)
- BIIMGRP: Immigration group at point of immigration; [1] East German (in questionnaire until 1995), [2] emigrants, Eastern Europe, [3] Germans, abroad, [4] EU member state (until 2009 EC), [5] asylum seeker, refugee, [6] other foreigners, [7] others
- BIRESPER: Status of residence
- BIREASON: Main reason for immigration
- BISTAYY: Time of planned stay
Further information in Goebel and Strauch (2017).
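The "combine via persnr + syear" step can be sketched as a left join on the two keys. This is an illustration with invented rows; `left_join` and the `employed` column are hypothetical, and only `persnr`, `syear` and `biimgrp` are names from the slides.

```python
# Hypothetical person-year rows; only persnr, syear and biimgrp are SOEP names.
bioimmig = [
    {"persnr": 201, "syear": 2013, "biimgrp": 5},
    {"persnr": 201, "syear": 2014, "biimgrp": 5},
    {"persnr": 202, "syear": 2013, "biimgrp": 4},
]
person_years = [
    {"persnr": 201, "syear": 2013, "employed": 0},
    {"persnr": 201, "syear": 2014, "employed": 1},
    {"persnr": 203, "syear": 2013, "employed": 1},
]


def left_join(left, right, keys=("persnr", "syear")):
    """Attach matching columns from `right` to every row of `left`."""
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        merged.append({**match, **row})  # unmatched rows stay unchanged
    return merged


combined = left_join(person_years, bioimmig)
print(combined)
```

Person 203 never answered a BIOIMMIG questionnaire, so that row simply has no `biimgrp` entry after the join.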
Migrants and their Descendants in the SOEP
Documentation of Questionnaires
URL: https://www.diw.de/de/diw_02.c.222729.de/instrumente_feldarbeit.html
What is surveyed in SOEP-Core?
Questionnaires in 2016
- Standard instruments
- Biographical instruments
- Mother-child instruments
- Pupils/early youth
- Deceased person
- Target-group-specific questionnaires (migrants & refugees)
Part 1c: Sample Development
SOEP-Core
Sample Development (Individuals) by Subsamples
Kroh, Martin. 2014. Documentation of Sample Sizes and Panel Attrition in the German Socio-Economic Panel (SOEP) (1984 until 2012). SOEP Survey Papers 177: Series D. Berlin: SOEP/DIW.
[Figure: number of individuals (axis ticks 5,000 to 35,000) by survey year (1984 to 2016) and subsample]
Subsamples (sample, start year, type):
- M4 2016 Refugee/family (2013-2015)
- M3 2016 Refugee (2013-2015)
- M2 2015 Migration (2009-2013)
- M1 2013 Migration (1995-2011)
- L3 2011 Family Type (Single-Parent)
- L2 2010 Family Type (Low-Income)
- L1 2010 Birth Cohort (2007-2010)
- K 2012 Refreshment
- J 2011 Refreshment
- I 2009 Innovation Sample
- H 2006 Refreshment
- G 2002 High Income
- F 2000 Refreshment
- E 1998 Refreshment
- D 1994/5 Migration (1984-1994, West)
- C 1990 Initial Sample (East)
- B 1984 Migration (until 1983, West)
- A 1984 Initial Sample (West)
Part 2: Data Structure of the SOEP
Wide vs. Long Format
The SOEP is delivered in two formats:
Wide
- Annual datasets are organized in wave-specific datasets.
- Each dataset is available separately for every year of the survey.
Long
- Annual information is combined into a long format.
Data structure and naming conventions differ slightly across formats, but both formats contain the complete information.
Wide vs. Long Format
What format should I use?
- The choice of format depends on the type of research question you are trying to answer and thus on:
  - Whether your question/method requires cross-sectional or longitudinal data
  - The number of observations you need per item
  - The specific variables you are planning to use
- Consider:
  - The Long-Format datasets are very large and require large amounts of computational power, so opening datasets and running data-management operations may take fairly long.
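To make the difference between the two formats concrete, the sketch below reshapes a wide layout (one row per person, one column per wave) into a long layout (one row per person and year). The column names `a_sat` and `b_sat` and the helper `wide_to_long` are hypothetical and do not follow the actual SOEP naming; only the wave tokens a = 1984 and b = 1985 and the key names `persnr` and `syear` come from the slides.

```python
# Wave tokens and years as given on the slides (a = 1984, b = 1985).
WAVE_YEARS = {"a": 1984, "b": 1985}

# Hypothetical wide data: one row per person, one column per wave.
wide_rows = [
    {"persnr": 1, "a_sat": 7, "b_sat": 8},
    {"persnr": 2, "a_sat": 5, "b_sat": 6},
]


def wide_to_long(rows, stub, wave_years):
    """Turn wave-prefixed columns into one row per person and survey year."""
    long_rows = []
    for row in rows:
        for token, year in sorted(wave_years.items(), key=lambda kv: kv[1]):
            column = f"{token}_{stub}"
            if column in row:  # a person may not be observed in every wave
                long_rows.append(
                    {"persnr": row["persnr"], "syear": year, stub: row[column]}
                )
    return long_rows


long_rows = wide_to_long(wide_rows, "sat", WAVE_YEARS)
for entry in long_rows:
    print(entry)
```

The long layout has more rows but a single column per item, which is why it suits longitudinal methods and also why the files grow large.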
Part 2a: Datasets
Data Structure and Naming Conventions
Regardless of the format, we provide three different types of data; these are organized in different types of datasets.
Labels shown: raw survey data, tracking data, generated variables, survey data
Data Structure and Naming Conventions
General naming conventions for datasets
In both versions, the SOEP provides "speaking" names for datasets, which contain hints with regard to:
- Underlying survey instruments
- Type of data
- Subject of inquiry
- Wave (for Wide Format only)
Data Structure and Naming Conventions – Long Format
Long: Naming conventions for datasets
Dataset names indicate the underlying survey instruments, the type of data and the subject of inquiry:
- Original: p (person), h (household), kid (children), jugend (youth), vp (deceased person), luecke, abroad, cog*, gripstr, timepref, trust, ...
- Generated: *gen, bio*, *kalen, equiv, *spell, kal, ...
- Tracking: *brutto, *pfad, pbr_exit, ...
- Survey: *hrf, design, exit, pbr_hhch, ...
In what form are the data available?
- Yearly survey data are first organized in wave-specific datasets
- $ stands for the respective wave token
Additional wave identifiers in the wide format
wave token | year | wave nr.
a | 1984 | 1
b | 1985 | 2
… | … | …
bf | 2015 | 32
bg | 2016 | 33
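The wave-token table can be reproduced with a small helper. This is a sketch under the assumption that tokens run a, b, ..., z, ba, bb, ... without gaps, which matches the rows shown (a = 1984, bf = 2015, bg = 2016); the function names are hypothetical.

```python
import string


def wave_number(token):
    """Wave number for a token, reading it as base 26 with a = 0, then adding 1."""
    if not token:
        raise ValueError("empty wave token")
    value = 0
    for letter in token.lower():
        if letter not in string.ascii_lowercase:
            raise ValueError(f"invalid wave token: {token!r}")
        value = value * 26 + (ord(letter) - ord("a"))
    return value + 1


def wave_year(token):
    """Survey year for a token, given that wave 1 (token a) is 1984."""
    return 1983 + wave_number(token)


for token in ("a", "b", "z", "ba", "bf", "bg"):
    print(token, wave_number(token), wave_year(token))
```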
Indication for Data Set Names
[Figure: components of SOEP dataset names in the wide and long formats: wave ($), unit (P, H), instrument (kid, jugend; kind, luecke, ausl, jugend, ...), generated (gen, equiv, kal), tracking data (brutto, pfad), biography datasets (B), spell datasets, cumulative datasets (*l)]
Part 2b: Variable Names and Missings
Variable names in long
DVTXXXX
1st character (D): Dataset identifier
- P = Person, H = Household, L = Biography, J = Youth
2nd character (V): Variable identifier
- L = numeric variable
- A = alphanumeric variable
- C = numeric variable containing original information for recoded variables
3rd character (T): Topic identifier
- a = demography and population
- b = work and employment
- c = income, taxes and social security
- d = family and social networks
- e = health and care
- f = habitation, equipment and services of private households
- g = education and qualification
- h = attitudes, values and personality
- i = time use and ecological conduct
- j = integration, migration and transnationalization
- k = survey methodology
4th-7th character (XXXX): running number
Examples:
- ple0010: person, numeric, health & care, 0010 (birth year)
- pli0044: person, numeric, time use, 0044 (time spent on childcare, weekday)
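The DVTXXXX scheme can be decoded mechanically. The following sketch parses a long-format variable name into its parts, using only the identifiers listed on the slide; `parse_long_name` is a hypothetical helper, not part of any SOEP tool.

```python
import re

DATASETS = {"p": "person", "h": "household", "l": "biography", "j": "youth"}
VARIABLE_TYPES = {
    "l": "numeric",
    "a": "alphanumeric",
    "c": "numeric, original information for recoded variables",
}
TOPICS = {
    "a": "demography and population",
    "b": "work and employment",
    "c": "income, taxes and social security",
    "d": "family and social networks",
    "e": "health and care",
    "f": "habitation, equipment and services of private households",
    "g": "education and qualification",
    "h": "attitudes, values and personality",
    "i": "time use and ecological conduct",
    "j": "integration, migration and transnationalization",
    "k": "survey methodology",
}
# Dataset letter, variable letter, topic letter, four-digit running number.
NAME_PATTERN = re.compile(r"^([phlj])([lac])([a-k])(\d{4})$")


def parse_long_name(name):
    """Split a DVTXXXX variable name into its four components."""
    match = NAME_PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"not a DVTXXXX name: {name!r}")
    dataset, var_type, topic, number = match.groups()
    return {
        "dataset": DATASETS[dataset],
        "type": VARIABLE_TYPES[var_type],
        "topic": TOPICS[topic],
        "number": number,
    }


print(parse_long_name("ple0010"))
print(parse_long_name("pli0044"))
```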
Variable names in wide – v.34 onwards
WU_Q_I_q
W: Wave identifier
- 1-2 letters, e.g. C = 1986, BG = 2016
U: Unit identifier
- 1 letter, e.g. H = Household, P = Person, K = Child, J = Youth
Q: Question number
- 2-3 digits, leading zero if < 10, e.g. 001, 24, 157
I: Item identifier
- 2 digits with leading zeros, e.g. 01, 04, 15
q: Questionnaire identifier
- 2 digits, defined in the instrument variable, e.g. q51
Examples:
- AP_06: wave A (1984), person, question 6
- BEK_78_10: wave BE (2014), child, question 78, item 10
- BHP_109_01_q75: wave BH (2017), person, question 109, item 1, questionnaire 75**
** Personal Biography Questionnaire (M3-M4 re-interviewed; CAPI)
Mainz, 3rd May 2018, SOEPcampus
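The wide-format scheme can be sketched as a regular expression. The pattern below is an assumption derived from the three example names on the slide (uppercase wave and unit letters, optional item and questionnaire parts); `parse_wide_name` is a hypothetical helper.

```python
import re

UNITS = {"H": "household", "P": "person", "K": "child", "J": "youth"}

# Wave (1-2 letters), unit (1 letter), question number, then optional item
# and optional questionnaire identifier, as in BHP_109_01_q75.
WIDE_PATTERN = re.compile(
    r"^(?P<wave>[A-Z]{1,2})(?P<unit>[HPKJ])_(?P<question>\d{2,3})"
    r"(?:_(?P<item>\d{2}))?(?:_q(?P<questionnaire>\d{2}))?$"
)


def parse_wide_name(name):
    """Split a wide-format variable name into its components."""
    match = WIDE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised wide-format name: {name!r}")
    parts = match.groupdict()

    def to_int(text):
        return None if text is None else int(text)

    return {
        "wave": parts["wave"],
        "unit": UNITS[parts["unit"]],
        "question": int(parts["question"]),
        "item": to_int(parts["item"]),
        "questionnaire": to_int(parts["questionnaire"]),
    }


for example in ("AP_06", "BEK_78_10", "BHP_109_01_q75"):
    print(example, parse_wide_name(example))
```

Because the wave part may be one or two letters, the pattern relies on the unit letter sitting directly before the underscore to split the prefix.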
Conventions for missing values in SOEP
Code | Meaning
-1 | No reply
-2 | Does not apply
-3 | Not valid
-4 | Invalid multiple reply
-5 | Not included in this version of the questionnaire
-6 | Questionnaire with different filtering
-8 | Question not part of questionnaire*
*Only applies to datasets in Long Format.
Note: SOEP data contain no system missings ("." in Stata)
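Because missing values are stored as negative codes rather than system missings, they have to be recoded before computing statistics. A minimal sketch with invented values follows; `clean` is a hypothetical helper and the code labels are those from the table.

```python
# Missing-value codes and their meaning, as listed on the slide.
MISSING_CODES = {
    -1: "No reply",
    -2: "Does not apply",
    -3: "Not valid",
    -4: "Invalid multiple reply",
    -5: "Not included in this version of the questionnaire",
    -6: "Questionnaire with different filtering",
    -8: "Question not part of questionnaire",
}


def clean(value):
    """Return None for SOEP missing codes, otherwise the value itself."""
    return None if value in MISSING_CODES else value


# Invented answers on a 0-10 scale, including two missing codes.
answers = [7, -1, 5, -2, 9]
valid = [v for v in answers if clean(v) is not None]
naive_mean = sum(answers) / len(answers)  # wrongly treats -1 and -2 as answers
clean_mean = sum(valid) / len(valid)
print(naive_mean, clean_mean)
```

The naive mean is pulled down by the negative codes, which is the typical error when the recoding step is skipped.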
Part 2c: Identifiers and Pointers
Combining information from different datasets
Combining information from different levels
- How does the birth of a third child affect the household income?
Relevant information can be merged via identifiers.
- Wide: persnr, hhnr, $hhnr, spellnr, syear
- Long: pid, cid, hid, syear
Combining information from different persons
- How does entering unemployment affect the grades of children?
Information from different persons can be combined via pointers.
- Wide: partnr$$, kidnr, $kmutti, $kmup, mnr, vnr
- Long: partnr, kidnr, mnr, vnr
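Pointers hold the person number of a related person, so information can be looked up across persons. The sketch below attaches a mother's unemployment status to her children via `mnr`; the rows, the `unemployed` and `grade` columns and the helper `attach_mother` are invented for illustration, while `pid`, `syear` and `mnr` are the long-format names from the slide.

```python
# Hypothetical adult records, keyed by the identifiers (pid, syear).
persons = {
    (1, 2016): {"unemployed": 1},
    (2, 2016): {"unemployed": 0},
}

# Hypothetical child records; mnr points to the mother's person number.
children = [
    {"pid": 10, "syear": 2016, "mnr": 1, "grade": 3.0},
    {"pid": 11, "syear": 2016, "mnr": 2, "grade": 2.0},
    {"pid": 12, "syear": 2016, "mnr": 9, "grade": 1.7},  # mother not in data
]


def attach_mother(child_rows, person_index):
    """Follow the mnr pointer and copy the mother's status to each child."""
    linked = []
    for child in child_rows:
        mother = person_index.get((child["mnr"], child["syear"]))
        status = None if mother is None else mother["unemployed"]
        linked.append({**child, "mother_unemployed": status})
    return linked


linked = attach_mother(children, persons)
for row in linked:
    print(row)
```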
Part 2d: PPFAD and HPFAD – two important datasets
PPFAD contains relevant information on all individuals who ever lived in a SOEP household
- Person number (adults, youth, children)
- Sex, year of birth, month of birth, year of death, migration background
- Sample (psample)
- Current household number (hhnrakt / $hhnr)
- Survey status ($netto, $netold)
- Sample region (East or West Germany)
HPFAD contains time-consistent and time-variant information at the household level, including the original household number, sample region, response status and current household number
PPFAD and HPFAD – two basic datasets
Response Status ($netto)
[Figure: categories of response status, e.g. respondents, children, etc.]
Part 3: Weighting Strategy
Empirical social research
- Aim: statements about populations
- Inference based on smaller research populations (samples)
- Analysis methods often assume random sampling
Problem:
- Alternative sample designs have many advantages, but they can lead to selection by design
- Sampled units can refuse to participate in the interview (self-selection)
→ Ignoring sample selectivity can distort parameter estimates
→ Possible solution: weighting
Weighting
Sample development (from brutto to netto)
Weighting
Population: private households in Germany
Steps between stages: sampling design, non-response (response rate), leaving the population, non-response (attrition)
- Brutto wave 1: drawn, i.e. households to be interviewed
- Netto wave 1: interviewed households
- Brutto wave 2: households to be interviewed again
- Netto wave 2: interviewed households
Weights: design (in design.dta), $HHRF, BHBLEIB, BHHRF; cross-sectional and longitudinal
Weighting
Advantage of combinable weights:
- Cross-sectional weight in wave $ = inverse of the product of the probability of being drawn, the response probability in wave 1 and all response probabilities up to wave $
- Modular principle
1. Sampling design → DESIGN
2. Non-response analysis wave 1
3. Post-stratification wave 1 → AHHRF
4. Non-response analysis wave 2 → BHBLEIB
5. Post-stratification wave 2 → BHHRF
6. …
7. Non-response analysis wave 32 → BFHBLEIB
8. Post-stratification wave 32 → BFHHRF
Weighting
Modular principle → weights can be modified individually
- Remove the design information: AHHRF / DESIGN
- Remove the cross-sectional post-stratification: BCHHRF x BDHBLEIB = BDHHRFNEU instead of BDHHRF
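The last line can be read as chaining a cross-sectional weight with the staying factor of the following wave. The sketch below assumes that $HBLEIB is a multiplicative staying factor (an inverse staying probability); the numeric values and the helper `chained_weight` are invented for illustration.

```python
def chained_weight(start_weight, staying_factors):
    """Multiply a cross-sectional start weight by successive staying factors."""
    weight = start_weight
    for factor in staying_factors:
        weight *= factor
    return weight


# Invented values: cross-sectional household weight in wave BC and the
# staying factor for wave BD (assumed to be an inverse staying probability).
bchhrf = 1200.0
bdhbleib = 1.25

# BCHHRF x BDHBLEIB = BDHHRFNEU, i.e. a wave BD weight without the
# cross-sectional post-stratification of wave BD.
bdhhrfneu = chained_weight(bchhrf, [bdhbleib])
print(bdhhrfneu)
```

Passing several staying factors extends the same product across more than one wave.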
Part 4: Documentation, Online Support and Paneldata.org
Part 4a: Documentation and Online Support
Documentation: Overview
Documentation
Important documents for working with the GSOEP
- Desktop Companion: basic survey information
- Questionnaires
  - PDF on website
  - Soon also under instruments on "Paneldata.org"
- Fieldwork reports: documentation of fieldwork
  - PDF on website
- Documentation of "generated variables":
  - PDF on website
Desktop-Companion
http://about.paneldata.org/soep/dtc/
Questionnaires and Fieldwork reports
http://www.diw.de/en/diw_02.c.238114.en/questionnaires_fieldwork_documents.html
Documentation of Generated Variables
https://www.diw.de/en/diw_01.c.591537.en/soep_documentation_generated_variables_v33_1.html
www.diw.de/de/soep
SOEPlit
Database of publications based on the GSOEP: https://data.soep.de/soep-core/publ/
Note: the database covers only the publications known to the SOEP team and is not complete. Users are asked to always report their publications.
Part 4b: paneldata.org
paneldata.org
Overview of all studies of the SOEP and other panels
paneldata.org
All data sets and increasingly also all questionnaires
paneldata.org
Working with the Basket
Script generator for cross-sections with Stata, SPSS and R
Part 5: Data Access
Data Access
https://www.diw.de/en/diw_02.c.222836.en/access.html
Data Access
- Step 1: Apply for a contract
- Step 2: Sign the contract and send it back
- Step 3: Order the data
Thank you for your attention.
DIW Berlin, Deutsches Institut für Wirtschaftsforschung e.V.
Mohrenstraße 58, 10117 Berlin
www.diw.de
Editor: Sandra Bohmann, sbohmann@diw.de