Kristian Wiklund | 2014-03-05 | Page 1
Collecting information from people
Photo CC-BY-2.0 Richard Riley http://www.flickr.com/photos/rileyroxx/169900126/
Kristian Wiklund
› What is a survey?
› Different ways of collecting data from people
– Questionnaires, (Qualitative) Interviews, Focus Groups
› Break with discussion
› Design: survey design and how not to ask questions
– Total survey quality
– Sampling and frames
– Non-sampling validity threats (valid for all designs above)
› Self-reporting, scales, order, …
› A survey concerns a set of objects: the population
– The goal of the survey is to describe the population in terms of one or more characteristics of interest
– To do this, a sampling frame is needed, from which a sample of the population is selected to be included in the survey
› The survey makes observations on the sample
– The observed measurements are used to make inferences on the population
[Diagram: Population → Sampling Frame → Sample → Describe the population]
[Stat100]
› Create a profile of a group as a whole
– The design of the survey makes it possible to generalize the answers of a selected few to the group
› Types of research
– Exploratory: “What is going on here?”
– Descriptive: “What are the characteristics of the population?”
– Explanatory: “Why is something happening?”
– Predictive: “What is the likelihood of something happening?”
› Structured: following a questionnaire
– All interviews conducted in exactly the same way; no explanations, no clarifications (“What does it mean to you?”)
– Comparable to self-administered questionnaires
› Semi-structured: a base set of questions, with clarification as needed
– Flexibility, more options for interpretation
› Free form: having a nice chat
– Captures the reactions of the respondent (“Discourse Analysis”)
› A spectrum ranging from questionnaires to focus groups
CC-BY-NC Ikhlasul Amal http://www.flickr.com/photos/21372148@N00/2443194039
“Let’s put together a questionnaire and send to some people and check…”
› Ideal situation:
– Researcher writes a number of questions, distributes the survey on Twitter, drinks some coffee, and writes a couple of journal papers while waiting for the data to drop in
– Hordes of willing respondents answer the questions
– Researcher analyzes the data with Excel (it basically analyzes itself), writes a paper with some nice bar charts, sends the paper to ICSE
– Paper accepted, nice trip to Florence, wine, best paper award
› In reality:
– Researcher intent is encoded into questions
› Was that done right?
– Researcher finds subjects and convinces them to answer
› Was the selection done right?
– Questions transmit researcher intent to the subject
› Were they interpreted correctly?
– Subject decides to answer and formulates an answer
› Is the answer reliable?
– Researcher encodes the answers into something that can be analyzed
› Was that done right?
– Data is analyzed
› Was that done right?
› Access to subjects › Performing the interviews without introducing bias › Reliable collection of data
CC-BY-NC-SA Gabe McIntyre http://www.flickr.com/photos/38366783@N00/2617316249
› “The key to successful interviewing is learning how to probe effectively…
› ...that is, to stimulate an informant to produce more information…
› ...without injecting yourself so much into the interaction that you only get a reflection of yourself in the data.”
[Weiss2000]
› The same validity issues as with questionnaire design apply
– ...and will be described in detail later
› In addition, there is a clear risk of interviewer bias
› Useful skills
– Active listening
› http://www.babblingengineer.com/communication/how-to-improve-your-active-listening/
– Coaching
http://www.faculty.londondeanery.ac.uk/e-learning/appraisal/skilful-questioning-and-active-listening
› Allow people to answer in their own terms
› Use non-leading, open questions to get started; probe for details later
› Let the subject exhaust one question before moving on
› Have enough time allocated
› I strongly suggest that the interviews are recorded
– Informed consent
› Multiple reasons:
– You will not capture everything in your notes
› Not even if someone else takes the notes
– Self-evaluation
› Did you introduce bias? › Was the design followed? › Personal quirks (“uuuhhhmm…”)
– Capture “how” and not only “what”
http://www.flickr.com/photos/labanex/8668665270/
› Tooling:
– An additional computer works okay; a separate microphone is recommended, but I have had success with the built-in microphone
› Transcription
– Get software support for transcription
› I use Express Scribe on a Mac
– Transcription takes time, 2-10 times the interview time
› And you will likely feel silly for most of the time listening to yourself
http://www.flickr.com/photos/strandarchives/9273941774/in/photostream/
› “A focus group is a carefully planned discussion designed to obtain perceptions on a defined area of interest in a permissive, non-threatening environment”
› ...or what most people in industry would call a “workshop”
– Which makes this a very marketable skill.
CC-BY-SA-NC 2.0 Some rights reserved by Nebraska Library Commission [Fisk2005]
› Use for:
– Qualitative information
– Insights into a new area
– Preparation for a larger, more formal, study
› Don’t use for:
– Quantitative information
– Confidential information
– Situations that may get out of hand
› Emotionally charged discussions
› Participants with an agenda of their own
› Objectives and questions
– as important here as for interviews and questionnaires
› Agenda for the focus group meeting
– Introduction, questions, summary
› Facilitator
– A neutral person with sufficient domain knowledge and the capability to move the work forward
– Not necessarily the researcher
› Note-taker/Assistant facilitator › Group member selection
– 6 to 12 people, without dependencies such as manager/employee
› Location
– “Not the office”
› Basically the same skills and risks as for interviews › Group bias risk:
– A group will easily bikeshed on issues that are not really relevant for the research
› Parkinson’s Law of Triviality: › “Briefly stated, it means that the time spent on any item of the agenda will be in inverse proportion to the sum involved.” [Parkinson1957]
› For each question, iterate:
– Open question
– Deeper follow-up probes to keep the ball rolling
– Summary of the findings by the moderator
For more information: http://www.hse.gov.uk/stress/standards/pdfs/focusgroups.pdf and http://www.cse.lehigh.edu/~glennb/mm/FocusGroups.htm
› Self-organize a group and discuss:
– Is there anything in your research area that can be answered by
› A survey
› Interviews
› Focus groups
– Why? Why not?
– What are the main challenges in conducting this type of study in your area?
› The main problem with surveys is that their complexity is commonly underestimated.
– It is commonly perceived to be trivial to do a survey.
– In reality, there are limitless ways to go wrong.
› A survey is a fixed design, which means that we have to live with our mistakes. › Hence, it is very important to be aware of the potential problems.
› Survey research is a branch of statistics
– This is evident when the theoretical background is studied
› Statistical concepts such as sampling theory are well developed, while psychological concepts such as how to ask questions are still being developed
– For a very long time, it was assumed that if a survey was designed correctly with respect to sampling, it would be correct
– It is now known that many factors influence the success of a survey
› A survey should be designed to minimize the total survey error [Biemer2011]
(After Biemer2011) Total Survey Error
› Mean Squared Error = Bias² + Variance
– Variable error contributes the variance; systematic error contributes the bias
› Sampling error
– Sampling scheme, sample size, estimator choice
› Non-sampling error
– Specification, nonresponse, frame, measurement, data processing
› Many of the contributors to TSE are not possible to measure directly
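The decomposition of the mean squared error into bias² + variance can be checked numerically. A minimal sketch, with a made-up true value and a hypothetical systematic offset standing in for a design flaw such as a leading question:

```python
import random

# Simulate many repetitions of the same (flawed) survey estimate.
# The offset models systematic error; the noise models variable error.
random.seed(1)
true_value = 50.0
offset = 3.0  # hypothetical systematic error

estimates = [true_value + offset + random.gauss(0, 2.0) for _ in range(10_000)]

mean_est = sum(estimates) / len(estimates)
bias = mean_est - true_value
variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# MSE decomposes exactly into bias^2 + variance
assert abs(mse - (bias ** 2 + variance)) < 1e-9
```

Note that collecting more responses shrinks the variance term but leaves the bias term untouched, which is why bias must be removed by design.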
› The sampling design influences the TSE
– Frame, number of samples, stratification
› The design of the survey instrument influences the TSE
– How the questions are formulated influences the result
– The order of the questions influences the result
– The sensitivity of the issue influences the result
– The options and ranges given as possible answers influence the result
› The way the survey is administered influences the TSE
– On-line, phone, face to face, paper
› The personal traits of the researcher influence the TSE
– Language, dialect, gender, ethnicity, haircut, glasses, clothes
[Figure: target diagrams illustrating the combinations of high/low bias and high/low variance, after Fortmann-Roe]
› It is important to remember that
– Response rate alone cannot be used as a quality indicator!
– The keywords in the newspaper article are “risk” and “could”, meaning “further analysis is required”
› If responses are received from all strata in the population, we get a low bias and can manage the variance with statistics
http://www.dn.se/nyheter/sverige/resultaten-kan-bli-snedvridna
› Basically, the same process as all other research:
– Set the goals: what do you want to capture?
– Decide on the target population and sample size: who will you ask?
– Determine the questions: what will you ask?
– Pre-test the survey: test the questions
– Conduct the survey: ask the questions
– Analyze the data collected: produce the report [Kuter2001]
After [Biemer2003], the survey process:
Research Objectives → Concepts → How to collect data → Sampling Design → Questionnaire Design → Planning for data collection and processing → Data Collection and Processing → Analysis (with design validation throughout)
Truman defeats Dewey
› Gallup predicted a victory for Dewey
– Printing lead times required printing before counting was complete
› Lesson learned:
– The sampling frame was wrong: by using telephones (in 1948) to do the survey, the group likely to vote for Truman was largely excluded.
– A cattle feed company selling feed sacks decorated with elephants and donkeys got a better result. [Curran2002]
› Gallup frame: people having phones
– In 1948, mostly rich people, likely to vote for Republicans
› Feed company frame: people buying animal feed
– In the 1940s, all social groups needed animal feed
› Probability sampling
– Each object in the frame has an equal probability of being chosen
– Frames are carefully designed to be representative of the population
– Results can be statistically analyzed
› Non-probability sampling
– Not representative of the population
– Convenience sampling
› Take what you get, such as web visitor surveys
– Snowball sampling
– Results can generally not be statistically analyzed [Sommer2006]
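The difference is easy to see in code. A minimal sketch of probability sampling (simple random sampling from a hypothetical frame; the frame contents are made up):

```python
import random

# Simple random sampling: every object in the frame has the same
# probability of selection, so the sample supports statistical inference.
random.seed(42)
frame = [f"employee-{i}" for i in range(1000)]   # hypothetical sampling frame
sample = random.sample(frame, 50)                # drawn without replacement

assert len(set(sample)) == 50                    # 50 distinct objects
```

A convenience sample, by contrast, would be whoever happened to visit the web page: there is no frame, so no selection probability can be stated.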
› Snowball sampling relies on spreading a survey via contacts
– Do we know what the frame is? – Do we know what the population is? – Is it possible to generalize the findings?
› Snowball sampling still has to be used sometimes, for example when a community is hard to reach
– Drug users, prostitutes, black hat hackers, …
[Sommer2006]
http://www.flickr.com/photos/juniorvelo/2200372991/lightbox/
The following equations are used to calculate sample size. All calculations for proportions can be done with “normal statistics”, using “Yes=1, No=0” For the details, please read [Stat100] and [Stat414]
› For proportions:
– Uses the same equations as for the population mean.
– For an error margin of 100·d%, with a large population, use n₀ = z²·p(1−p)/d² ≈ 1/d² (taking z ≈ 2 and the worst case p = 0.5).
– Example: if d = 0.05, then n₀ = 400; we need 400 samples. [Biemer2003]
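A sketch of the calculation, assuming the usual normal-approximation sample size formula for a proportion with the conservative worst case p = 0.5 (the function name is mine):

```python
import math

def sample_size_proportion(d, z=1.96, p=0.5):
    """Samples needed to estimate a proportion within +/- d
    (large population). p = 0.5 is the conservative worst case."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# With z rounded up to 2, the formula reduces to n0 = 1/d^2,
# which gives the example above: d = 0.05 -> n0 = 400.
assert sample_size_proportion(0.05, z=2) == 400
print(sample_size_proportion(0.05))  # 385 with the exact z = 1.96
```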
› Fewer samples are needed for small populations.
› N = the size of the population
› If (1 − n₀/N) < 0.9, i.e. the sample is more than 10% of the population, then we need to correct for population size: n = n₀ / (1 + n₀/N)
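The correction above, the standard finite population correction, as a small sketch (the helper name is mine):

```python
import math

def corrected_sample_size(n0, N):
    """Apply the finite population correction when the uncorrected
    sample size n0 exceeds ~10% of the population N."""
    if 1 - n0 / N < 0.9:                   # i.e. n0/N > 0.1, correction needed
        return math.ceil(n0 / (1 + n0 / N))
    return n0

assert corrected_sample_size(400, 1000) == 286     # small population: fewer samples
assert corrected_sample_size(400, 100_000) == 400  # large population: no change
```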
› Assume that we want to estimate the average salary in Stockholm
– If we perform a purely random sample, areas of high or low income may be underrepresented
– Instead, we stratify the population and create one frame per stratum
– The strata are sampled, and we use the information to calculate the average for the whole population
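A sketch of the stratified estimate for the salary example: sample each stratum separately, then weight each stratum mean by the stratum's share of the population. The strata, sizes, and salary ranges are all made up for illustration.

```python
import random

random.seed(0)
strata = {  # name -> (population size, (min salary, max salary)), all made up
    "high-income area": (20_000, (60_000, 120_000)),
    "mid-income area":  (50_000, (30_000, 60_000)),
    "low-income area":  (30_000, (20_000, 30_000)),
}
N = sum(size for size, _ in strata.values())

estimate = 0.0
for size, (lo, hi) in strata.values():
    sample = [random.uniform(lo, hi) for _ in range(100)]  # sample the stratum
    stratum_mean = sum(sample) / len(sample)
    estimate += (size / N) * stratum_mean  # weight by population share

print(round(estimate))  # city-wide average salary estimate
```

Because every stratum is guaranteed to be sampled, no income group can be missed by chance, which is the point of stratification.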
› Not all respondents are qualified to provide an answer
– Gender, skills, affiliation, etc
› This means that one may need a mechanism to screen the respondents.
– Adjusting the frame after sampling
– In particular if uncontrolled sampling is used
› Is the respondent qualified to contribute to a survey about software engineering principle X?
Reasons for Refusal
› Not motivated, lack of time, fear of being registered, bad timing, screening, survey too difficult, business policy, low priority, too expensive to answer, sensitive questions, boring, bad questionnaire [Biemer2003]
› Reciprocation – induced by a gift or perceived benefits
– Prepaid incentives work better than promised incentives
› A prepaid incentive creates a social contract with the respondent, while promised incentives are perceived as payment for services rendered
› Consistency – Complying to the survey request is consistent with the respondent’s beliefs and values
[Biemer2003]
› Social validation – respondents are more likely to participate if they believe that others participate
– Example: Facebook shows that “X likes…” in their ads
› Authority – a higher response rate is likely if the survey comes from a “legitimate authority”
– In our case, it is likely that it is more efficient to use the MDH brand when creating company-external surveys, and to use the company brand when creating internal surveys [Biemer2003]
› Scarcity – It is more likely to get a response if the survey is perceived as a rare opportunity to make one’s opinion heard
– “last day of the survey”, “only 1 in 10000 are contacted”, …
› Liking – Subjects will be more willing to respond if they like the researcher
– Nice person, similar values, dress code, dialect, … [Biemer2003]
› Consider how the analysis is to be done when designing the questions
– Not all numbers are numbers that can be averaged and analyzed
– If the analysis requires ordinal measurement, make sure that ordinal data is created. [Babbie1990]
› “How often do you compile?” (seldom, often, very often) › “How often do you attend meetings?” (seldom, often, very often)
– Vs
› “Do you compile more often than you attend meetings?”
– This forces the issue and eliminates the risk of using different scales for different variables.
› Open question › “What are the effects of a modular software architecture?”
– (free-text answer)
› Closed question › “What is the primary benefit of a modular software architecture?”
– Testability
– Maintainability
– Scalability
– Other
› Qualitative data that need interpretation and coding before use.
– Coding: Classifying the text into usable categories
› Useful for questions about behavioral frequency, or when it is hard to produce a good multiple-choice list
– Requires that the respondent is able to express the answer
› Multiple-choice answers that should be
– Non-overlapping
– Unambiguous
– Exhaustive
› Problems
– Missing alternatives
› If the respondent has another alternative not in the list, we force the respondent to select a wrong answer
– Primacy, recency, satisficing
A double-barreled question asks at least two things simultaneously › "My company's sustainability and corporate responsibility efforts have increased my overall satisfaction with working here”
– The example assumes that it is important to the respondent that the company has such a policy.
– It also assumes that a good policy makes the respondent more satisfied.
– A negative answer could indicate that
› The respondent does not care (the satisfaction is unchanged)
› The respondent cares and is dissatisfied with the policy; the company is either doing too much or too little work in the area
– A positive answer indicates that the respondent is happy
› The policy could be either good or bad; who knows what motivates people?
Simplified Response Process
› Understand the question
– Listen to or read the question
– The brain translates it into something that has meaning for the respondent
› Retrieve data
– Use the interpretation of the question to retrieve data from memory
› Formulate an answer
– Filter the information into something the respondent is comfortable sharing
– Translate the data into something that fits the question
› “Conversational norms” may influence the survey result
› In conversation, we seldom transition rapidly and without warning to an entirely new conversational context
– In a survey, the previous context will linger with the respondent and may introduce bias.
› The context includes the researcher, the previous questions, the instructions, the survey title and introduction, …
› Example: [Schwarz1999], page 96
› Satisficing is a strategy that minimizes the effort needed to respond.
– “Providing an answer” is the priority, not providing the right answer.
› Occurs when
– The task is difficult
– The motivation is low
› A special case is straight-lining: giving the same answer to every item in a grid of questions
– Avoid designs that encourage this behavior
› Acquiescence – the tendency to agree with a statement
› Example:
– Asked “Do you think that the United States should forbid public speeches against democracy?”, 54% of respondents said “yes”, they should be forbidden
– Asked “Do you think the United States should allow speeches against democracy?”, 75% said “yes”, suggesting that only 25% would not allow such public speeches [Biemer2003]
› Occurs in multiple-choice questions
› Primacy is the tendency to favor the first options
– This primarily occurs in written surveys
› Recency is the tendency to favor the last options
– This primarily occurs in verbal surveys
› Example: “What is most important?”
– Travel – Money – Environment – Family
› Primacy:
– TRAVEL, money, blahblahblah
› Recency:
– Blahblah, environment, FAMILY
› Rating scales [Schwarz1999]
– “How successful are you in life?”
› How do we define successful? Different people have different definitions.
– The scales used influenced the result
› 0 to 10: 0 is “not at all successful”, 10 is “extremely successful”
› −5 to 5: −5 is “not at all successful”, 5 is “extremely successful”
› The “middle” alternatives of a frequency scale are assumed to be normal. This introduces a bias.
– With a scale centered on long viewing times: 37.5% report watching TV for more than 2.5 hours per day
– With a scale centered on short viewing times: 16.2% report watching TV for more than 2.5 hours per day
› Do not use rating or frequency scales, use open questions.
[Schwarz1999]
› “How often did you buy vegetables in the first quarter of 2013”
– Basically impossible to answer.
– Encourages satisficing by counting, estimating, and guessing.
› “Major life events” may be recalled for up to one year, minor events for a significantly shorter time [Kitchenham2002]
› People are less likely to be honest in reporting sensitive issues; they will conform to what they believe is “normal”, or answer in a way that increases their social status
– “How much alcohol do you consume in a week?” [Biemer2003]
– “How much meat do you eat?” [Hebert2008]
› Mitigation
– Use open questions and neutral wording to avoid signaling what is normal.
– Increase anonymity
› For example, use web-based surveys rather than interviews
› Can be done by handing a computer to the subject during the final part of an interview
– Put demographics at the end of the survey
› Swedes seldom use the extreme values in a rating question
– [Source: Personal communication]
– Other effects are present in other cultures
› This makes pre-testing and instrument validation very important for international surveys
› Survey research measures a subset of a population and uses the measurements to draw conclusions about the population.
› Total survey error is more important than response rate.
› Survey research is a fixed design – do a good design!
› Bias must be removed by design.
› Snowball sampling should be avoided.
› Trigger the respondent motivators to increase the response rate.
› The way we design the questions and the questionnaire is highly influential on the total survey error.
› Cultural effects cannot be ignored in international surveys.
› [Biemer2003] Biemer, P., & Lyberg, L. (2003). Introduction to Survey Quality. New York: Wiley. http://onlinelibrary.wiley.com/doi/10.1002/0471458740.fmatter/summary
› [Biemer2011] Biemer, P. P. (2011). Total Survey Error: Design, Implementation, and Evaluation. Public Opinion Quarterly, 74(5), 817–848. doi:10.1093/poq/nfq058
› [Wiki2013a] http://en.wikipedia.org/wiki/Census_of_Quirinius
› [Wiki2013b] http://en.wikipedia.org/wiki/Census_in_Egypt
› [Wiki2013c] http://en.wikipedia.org/wiki/Domesday_Book
› [RSV2013] http://www.skatteverket.se/privat/folkbokforing/omfolkbokforing/folkbokforingigaridag/densvenskafolkbokforingenshistoriaundertresekler.4.18e1b10334ebe8bc80004141.html
› [Schwarz1999] Schwarz, N. (1999). Self-reports: How the questions shape the answers. American Psychologist, 54(2), 93–105. http://psycnet.apa.org/journals/amp/54/2/93/
› [Friedman] http://academic.brooklyn.cuny.edu/economic/friedman/rateratingscales.htm
› [Torkar2003] A Survey on Testing and Reuse
› [Curran2002] http://www.csudh.edu/dearhabermas/sampling01.htm
› [Fortmann-Roe] http://scott.fortmann-roe.com/docs/BiasVariance.html
› [Hammel2013] https://onlinecourses.science.psu.edu/stat414/node/210
› [Kuter2001] http://lte-projects.umd.edu/charm/survey.html
› [Stockholm] http://www.statistikomstockholm.se/index.php/statistik-pa-karta/arbetsmarknad-kartor
› [Stat100] https://onlinecourses.science.psu.edu/stat100/node/15
› [Sommer2006] http://psychology.ucdavis.edu/sommerb/sommerdemo/sampling/
› [Hebert2008] Hebert, J. R., et al. (2008). Social Desirability Trait Influences on Self-Reported Dietary Measures among Diverse Participants in a Multicenter Multiple Risk Factor Trial. Journal of Nutrition, 138(1), 226S–234.
› [Kitchenham2002] Kitchenham, B., & Pfleeger, S. L. (2002). Principles of survey research, part 3: Constructing a survey instrument. ACM SIGSOFT Software Engineering Notes, 27(2).
› [Ji2008] Ji et al. (2008). Some lessons learned in conducting software engineering surveys in China. ESEM ’08.
› [Babbie1990] Babbie, E. (1990). Survey Research Methods.
› [Fisk2005] http://hccedl.cc.gatech.edu/taxonomy/docInfo.php?doc=40
› [Weiss2000] http://www.jhsph.edu/research/centers-and-institutes/center-for-refugee-and-disaster-response/publications_tools/publications/qualresearch.html
› [Parkinson1957] Parkinson, C. Northcote (1957). Parkinson’s Law, and Other Studies in Administration.
› [Stat414] https://onlinecourses.science.psu.edu/stat414/node/264