Introduction to Survey Statistics – Day 1 Survey Methodology 101
Federico Vegetti Central European University University of Heidelberg
Goals of the course
By the end of this course you should have learned What are the main
◮ To answer “what” questions
  ◮ Accounts, Indicators, Associations, Syntheses, Typologies
◮ To answer “why” questions
  ◮ Ideally addressed with experiments (but not only)
◮ Can we find the hypothesized relationship in the data? Is it
◮ Can we trust the data at all?
◮ Describes Y as a variable generated by a Gaussian process
◮ Describes how a set of predictors X is associated with Y
◮ Tells how well this description fits the data (R²)
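The three points above can be sketched in a few lines of Python. This is only an illustration, assuming the model being described is an ordinary linear regression (the slide title did not survive). The data, the coefficient values and the variable names are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one predictor x, and an outcome y that is a linear
# function of x plus Gaussian noise (values chosen only for illustration).
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])

# Ordinary least squares estimate of the coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2: share of the variance of y that the fitted values reproduce.
fitted = X @ beta
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("intercept, slope:", beta)
print("R^2:", r_squared)
```

The estimated slope describes how X is associated with Y, and R² summarizes how well that description fits these particular data.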
◮ What is the difference in vote share for AfD between West and
◮ How many Italians believe that vaccines cause autism?
◮ (though “big data” are becoming more and more popular)
◮ Accuracy, Relevance, Cost-efficiency, Timeliness, Accessibility,
◮ The more accurate our data, the more credible our inference
◮ The difference between the values that we observe for a given
◮ The difference between the values that we observe in the
◮ E.g. Telephone interviews are likely to produce different errors
◮ In this case, construct = concept
◮ E.g. voting for a right-wing party as a proxy for being
◮ It may pose a validity problem
◮ For instance, we want to measure mathematical ability, so we
◮ Jan is usually very good at maths, but that morning he has a
◮ The value of mathematical ability that would be obtained by
◮ When the distortion in the measurement is directional
◮ E.g. our maths problems are too easy to solve, so everyone gets
◮ When this is the case, the measurement is said to be biased
◮ The measured quantity may be unstable, so the same person
◮ E.g. How much do you generally agree with your partner about
◮ The episodes that you recall when you think of an answer are
◮ This type of error inflates the variability of the measure
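The contrast between the two kinds of measurement error above can be illustrated with a small simulation. This is a sketch under made-up assumptions: the true scores, the size of the directional shift (+5) and the spread of the random noise are hypothetical values, not taken from the slides.

```python
import random
import statistics

random.seed(1)

# Hypothetical true scores for 1,000 respondents.
true_scores = [random.gauss(50, 10) for _ in range(1000)]

# Directional (systematic) error: every measurement is shifted the same way.
biased = [t + 5 for t in true_scores]

# Random error: zero-mean noise is added to each measurement.
noisy = [t + random.gauss(0, 10) for t in true_scores]

def summary(values):
    """Return the mean and the standard deviation of a list of values."""
    return statistics.mean(values), statistics.stdev(values)

for label, values in [("true", true_scores), ("biased", biased), ("noisy", noisy)]:
    mean, sd = summary(values)
    print(f"{label:>6}: mean={mean:.2f} sd={sd:.2f}")
```

In this setup the biased measure moves the mean but leaves the spread unchanged, while the noisy measure keeps the mean roughly in place and inflates the spread.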
◮ E.g. The mean income in our data will have a different error
◮ E.g. If we do an online survey we will be able to reach only the
◮ Target population: all German citizens
◮ Sample frame: registered telephone users in Germany
◮ Among internet users: 41
◮ Among internet non-users: 48
◮ Share of internet non-users: 10%
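The arithmetic behind these three figures can be worked through as follows. The extracted text does not say what the values 41 and 48 measure, so they are treated here simply as the value of some survey variable in each group. The variable names are chosen for the example, and the comparison with an online-only survey is an assumption about what the slide illustrates.

```python
# Figures from the slide; what they measure is not stated there.
mean_users = 41.0       # value among internet users
mean_nonusers = 48.0    # value among internet non-users
share_nonusers = 0.10   # share of internet non-users in the population

# Full-population value: average of the two groups weighted by their shares.
mean_population = (1 - share_nonusers) * mean_users + share_nonusers * mean_nonusers

# Assumption: an online-only survey reaches internet users only,
# so it would estimate mean_users rather than mean_population.
coverage_bias = mean_users - mean_population

print(round(mean_population, 2))  # 41.7
print(round(coverage_bias, 2))    # -0.7
```

The same bias can be written as the share of non-users times the difference between users and non-users, 0.10 × (41 − 48) = −0.7. It grows with both the size of the excluded group and how different that group is.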
◮ E.g. People of working age who have a phone but are never at
◮ Observe all individuals within the groups (single-stage)
◮ Sample again within groups (multistage)
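The difference between the two designs above can be sketched in code. This is a minimal illustration, assuming that groups are first drawn at random. The population, the group sizes and the function names `single_stage` and `multistage` are hypothetical.

```python
import random

# Hypothetical population: 20 groups (clusters) of 50 individuals each.
population = {g: [f"group{g}_person{i}" for i in range(50)] for g in range(20)}

def single_stage(pop, n_groups, rng):
    """Sample whole groups and observe every individual in them."""
    chosen = rng.sample(sorted(pop), n_groups)
    return [person for g in chosen for person in pop[g]]

def multistage(pop, n_groups, n_per_group, rng):
    """Sample groups, then sample individuals again within each group."""
    chosen = rng.sample(sorted(pop), n_groups)
    return [person for g in chosen for person in rng.sample(pop[g], n_per_group)]

rng = random.Random(2)
one_stage_sample = single_stage(population, 5, rng)
two_stage_sample = multistage(population, 5, 10, rng)
print(len(one_stage_sample), len(two_stage_sample))  # 250 50
```

With the same number of selected groups, the multistage design interviews fewer people per group, which is usually the point of sampling again within groups.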
◮ Example: personal income question, where richer people are less
◮ ESS, WVS, EES, CSES
◮ E.g. very long multi-item indexes, very complex explanations
◮ E.g. we pay only when the questionnaire is complete
◮ Surveys usually come with weights: it helps to know what is
◮ There are many diagnostics to assess the quality of
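As a rough illustration of what survey weights do, the sketch below reweights a sample in which one group is over-represented. Everything in it is an assumption made for the example: the two groups, their values, the population shares, and the simple rule "weight = population share / sample share". Real surveys document their own weighting schemes, which may differ.

```python
# Hypothetical sample: 8 "young" respondents and 2 "old" respondents,
# each carrying a value of some survey variable.
sample = [("young", 10.0)] * 8 + [("old", 20.0)] * 2

# Hypothetical known population shares of the two groups.
population_share = {"young": 0.5, "old": 0.5}

# Share of each group in the sample.
n = len(sample)
sample_share = {
    g: sum(1 for grp, _ in sample if grp == g) / n for g in population_share
}

# Assumed weighting rule: population share divided by sample share.
weight = {g: population_share[g] / sample_share[g] for g in population_share}

unweighted_mean = sum(y for _, y in sample) / n
weighted_mean = (
    sum(weight[g] * y for g, y in sample) / sum(weight[g] for g, _ in sample)
)

print(unweighted_mean, round(weighted_mean, 2))
```

Here the unweighted mean is pulled toward the over-represented group, and the weighted mean restores the 50/50 balance assumed for the population.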