ACMS 20340 Statistics for Life Sciences Chapter 7: Samples and Observational Studies

Obtaining Data How do we obtain the data we need for statistical analysis? ◮ Suppose we’re interested in answering the question, “What percent of Americans drive to work daily?” ◮ We couldn’t possibly ask every American. ◮ However, we can get information from a sample chosen to represent the whole population. ◮ But how do we choose such a sample?

The Language of Sampling Some important terminology: ◮ The population in a statistical study is the entire group of individuals about which we want information. ◮ A sample is a part of the population from which we actually collect information. ◮ A sampling design describes exactly how we choose a sample from the population.

The Challenges of Sampling Choosing a sample from a large or varied population can be challenging. We should ask ourselves: ◮ “What population do we want to describe?” ◮ “What variables do we want to measure?” Statistical studies cannot always sample from the entire population of interest due to practical or ethical reasons.

Sampling Designs Done Wrong Convenience Sample: The experimenter selects which individuals to measure (usually those close at hand or easy to contact). Voluntary Response Sample: Individuals decide for themselves whether or not to participate. The design of a statistical study is biased if it systematically favors certain outcomes. In both of these designs the sample is determined by choice, either the experimenter’s or the individuals’. Parts of the population might never be chosen with these designs!

Sampling Designs Done Right Probability sampling removes bias by selecting individuals based on chance. A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample selected. (This is stronger than saying that every individual has the same chance of being selected.) How do we pick such a sample?
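
The parenthetical remark above can be made concrete with a short calculation. This is only a sketch: the symbol N for the population size and the four-person population {A, B, C, D} are assumptions introduced here for illustration, not part of the slides.

```latex
% N = population size (assumed symbol), n = sample size.
% Under an SRS every set of n individuals is equally likely:
P(\text{a given set of } n \text{ individuals is the sample}) = \frac{1}{\binom{N}{n}}

% It follows that every individual has the same chance of inclusion:
P(\text{a given individual is in the sample})
  = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}

% The converse fails. Take N = 4, n = 2 and flip a fair coin to choose
% between the two sets \{A, B\} and \{C, D\}:
P(\text{A is in the sample}) = \tfrac{1}{2} = \frac{n}{N},
\qquad
P(\{A, C\} \text{ is the sample}) = 0 \neq \frac{1}{\binom{4}{2}} = \frac{1}{6}
```

The coin-flip design gives every individual the same chance, yet four of the six possible pairs can never be selected, so it is not an SRS.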

Table of Random Digits A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties: ◮ Each entry in the table is equally likely to be any of the 10 digits 0 through 9. ◮ The entries are independent of each other: knowledge of one part of the table gives no information about any other part.

How to Use a Table of Random Digits 1. Assign to each member of the population a numerical label of the same length. 2. Select a place in the table and begin reading blocks of digits of the length we chose in step 1. 3. Include in the sample those individuals whose labels we find in the table. 4. If a block matches no label, or repeats a label already chosen, ignore it and keep reading until the sample has n individuals.
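
The table-reading steps can be sketched in a few lines of Python. The function name `sample_from_digit_table`, the 30-person population, and the digit string are all assumptions made for illustration; the digit string simply stands in for one line of a printed table.

```python
def sample_from_digit_table(labels, n, digits):
    """Pick n labels by reading fixed-width blocks from a string of digits."""
    width = len(labels[0])
    if any(len(label) != width for label in labels):
        raise ValueError("all labels must have the same length")
    valid = set(labels)
    chosen = []
    # Read the digits left to right in non-overlapping blocks of `width`.
    for start in range(0, len(digits) - width + 1, width):
        block = digits[start:start + width]
        # Skip blocks that match no label or repeat one already chosen.
        if block in valid and block not in chosen:
            chosen.append(block)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before finding n distinct labels")


# Hypothetical population of 30 individuals labeled 01 through 30.
labels = [f"{i:02d}" for i in range(1, 31)]
# Illustrative digits standing in for a line of a random digit table.
digits = "1922395034057562871396409125314254482853"

sample = sample_from_digit_table(labels, 5, digits)
print(sample)  # → ['19', '22', '05', '13', '25']
```

Blocks such as 39, 50 and 34 are skipped because no individual carries those labels, which is why the sample takes 14 blocks to fill.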

Hooray For Technology! Commonly, we use a random number generator. ◮ SRS applet available on the book’s website. ◮ www.random.org
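
As a rough illustration of what a random number generator does for us, the sketch below draws an SRS with Python's standard `random` module. The helper name `simple_random_sample`, the 100-person population, and the seed are assumptions chosen for the example; `random.Random.sample` draws without replacement, so every set of n individuals is equally likely.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n; a seed makes the draw reproducible."""
    rng = random.Random(seed)
    # sample() chooses n distinct items, each subset being equally likely.
    return rng.sample(population, n)


# Hypothetical population of 100 labeled individuals.
population = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(population, 10, seed=2024)
print(sample)
```

Fixing the seed is useful for a classroom demonstration; for a real study one would normally leave it unset.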

Advanced Sampling Designs: Stratified Random Sample While SRSs are ideal, they have some shortcomings. ◮ Suppose we are interested in including both majority and minority groups in our study. ◮ However, our SRS may include only a few members of one such minority group, or even no members at all. ◮ For instance, we could sample students about their sporting interests, but we want to be sure to include the opinions of scuba divers in our study. Stratified random sample: Sample important groups within the population separately and then combine the results. ◮ To include the opinions of scuba divers in our study, we can use a stratified random sample by choosing one sample from the scuba divers and another from the non-scuba divers. ◮ The results can then be combined.
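
A stratified random sample along these lines might look like the following sketch. The student records, the count of 20 scuba divers among 1000 students, the stratum sample sizes, and the function name `stratified_sample` are all hypothetical.

```python
import random


def stratified_sample(population, stratum_of, sizes, seed=None):
    """Draw a separate SRS from each stratum.

    stratum_of maps an individual to a stratum name;
    sizes maps each stratum name to the sample size wanted from it.
    """
    rng = random.Random(seed)
    # Split the population into strata.
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    # Take an independent SRS within each stratum.
    return {name: rng.sample(strata[name], n) for name, n in sizes.items()}


# Hypothetical student body: 20 scuba divers among 1000 students.
students = [{"id": i, "scuba": i < 20} for i in range(1000)]

result = stratified_sample(
    students,
    stratum_of=lambda s: "scuba" if s["scuba"] else "non-scuba",
    sizes={"scuba": 10, "non-scuba": 40},
    seed=1,
)
print(len(result["scuba"]), len(result["non-scuba"]))  # → 10 40
```

Here the divers are sampled at a much higher rate than everyone else, so when the results are combined each stratum should be weighted by its share of the population rather than pooled directly.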

Advanced Sampling Designs: Multistage Random Sample Another shortcoming: If the population is very large or is spread over a large area, choosing an SRS can be logistically challenging. Multistage random sample: Choose SRSs within SRSs. ◮ For example, to sample the sporting interests of students at ND, we could first choose a random selection of dorms and then, for each chosen dorm, choose an SRS of the students who live there.
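
The dorm example can be sketched as a two-stage draw. The dorm names, the 12 dorms of 50 students each, and the function name `multistage_sample` are invented for illustration.

```python
import random


def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of members within each one."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters (dorms) to visit.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: choose individuals only within the chosen clusters.
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical campus: 12 dorms with 50 students each.
dorms = {
    f"dorm_{d}": [f"dorm_{d}_student_{s}" for s in range(50)]
    for d in range(12)
}

sample = multistage_sample(dorms, n_clusters=3, n_per_cluster=5, seed=3)
for dorm, students in sample.items():
    print(dorm, students)
```

Only three dorms need to be visited, which is the logistical advantage. The price is that this is no longer an SRS of all students, since sets of students spread across more than three dorms can never be selected.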

Observation vs. Experiment There are two distinct settings for collecting data, both of which involve sampling: ◮ An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. ◮ The purpose is to describe a group or situation. ◮ An experiment deliberately imposes some treatment on individuals in order to observe their responses. ◮ The purpose is to study any possible causation due to the treatment. We consider the specifics of observational studies in this chapter, and the specifics of experimental design in the next chapter.

Some Types of Observational Studies We will focus on three in particular: ◮ Sample Surveys ◮ Case-Control Studies ◮ Cohort Studies

Sample Surveys A sample survey is an observational study that tries to answer questions about a population. In conducting a sample survey, one asks the members of a sample one or more questions (spoken or written). ◮ Opinion surveys ◮ Election polling

Potential Pitfalls of Sample Surveys Even when we have an SRS, the sample still may be biased. Some sources of bias in sample surveys: 1. undercoverage 2. non-response 3. response bias 4. wording of questions To avoid these problems, try to think of all the possible ways one might not get an accurate, random sample even after using randomization to select the sample.

Undercoverage If one or more groups of the population are left out of the process of generating the sample, then the survey will undercover the population. ◮ For example, in a telephone survey, people who do not have phones will be left out. ◮ If the stated population for the survey includes people without phones, this would be an example of undercoverage. If a certain group is not included in our sample, this is not necessarily due to undercoverage. ◮ If a group has the potential to be in the sample and isn’t, that is fine. ◮ Undercoverage occurs only if there is no way for the sampling process to select individuals from some subgroup.
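
A small calculation shows why undercoverage biases a survey no matter how large the sample is. Every number below is invented for illustration (a population of 1000, of whom 100 have no phone, with different driving rates in the two groups), and `drive_rate` is a hypothetical helper.

```python
def drive_rate(groups, reachable):
    """Share of drivers among the groups the sampling process can reach."""
    drivers = sum(g["drivers"] for name, g in groups.items() if name in reachable)
    people = sum(g["size"] for name, g in groups.items() if name in reachable)
    return drivers / people


# Invented counts, for illustration only.
groups = {
    "has phone": {"size": 900, "drivers": 720},
    "no phone": {"size": 100, "drivers": 40},
}

true_rate = drive_rate(groups, {"has phone", "no phone"})  # 760 / 1000
phone_rate = drive_rate(groups, {"has phone"})             # 720 / 900

print(f"whole population: {true_rate:.2f}")    # → whole population: 0.76
print(f"telephone frame:  {phone_rate:.2f}")   # → telephone frame:  0.80
```

Even a perfect SRS of phone owners estimates 0.80 rather than 0.76, because the people without phones had no chance of being selected.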

Nonresponse Nonresponse occurs when an individual cannot be contacted or refuses to take part after being selected to be part of the sample. If enough people of a certain type refuse to participate, the omission of this group of people can bias the survey.

Response Rates

Response Bias Some survey questions may be on sensitive topics, and as a result, the responder may exaggerate or understate his or her answers. ◮ “How much do you weigh?” ◮ “Have you ever committed a felony?” It may not be just the content of a question that results in response bias, but also certain traits of the interviewer. ◮ The interviewer’s gender. ◮ The interviewer’s ethnicity.

Wording of Questions The wording of the questions can have a large effect on the answers given. ◮ Confusing or leading questions can change a survey’s outcome. ◮ Questions worded like “Do you agree that it is awful that...” prompt you to give a particular response. ◮ Questions may also be too complicated and confusing. Many questions are standardized to allow comparison with earlier studies.

Wording Differences

Case-Control Study The second type of observational study that we consider is the case-control study. In a case-control observational study, we consider samples of individuals from two different groups: (i) case-subjects are selected based on a defined outcome, and (ii) a control group of subjects is selected separately to serve as a baseline with which the case group is compared. Once these two samples are selected, we look for exposure factors in the subjects’ past (the retrospective approach).

Some Pros and Cons of Case-Control Studies Case-control studies are useful for studying rare conditions. However, selecting controls can be challenging. Not all case-control studies are so careful about the choice of control group: Historical-control designs are case-control studies that utilize existing data from previous studies to make up the control group. Historical-control designs may introduce confounding variables (to be defined shortly). When approval by an ethics committee is hard to obtain, historical-control designs may be the only available tool.

Cohort Study The last type of observational study we will consider is the cohort study. Cohort studies enlist individuals with a common demographic and keep track of them over a long period of time (the prospective approach). Individuals who later develop a condition are compared with those who don’t. In general, cohort studies examine the compounded effect of certain factors over time. ◮ They are good for studying common conditions. ◮ However, they can be very expensive.

Example of a Cohort Study

Confounding variables Two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be distinguished from each other. Observational studies of the effect of a variable often fail because of confounding. For example, moderate use of alcohol is associated with better health. Observational studies suggest wine has a better effect on health than other alcoholic beverages. Is there a confounding variable?
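
To see how a lurking variable could produce such an association, consider the sketch below. The counts are entirely invented and income is only an assumed candidate for the confounder; nothing here comes from a real study. Within each income group the drink makes no difference, yet wine looks better overall because wine drinkers are concentrated in the healthier group.

```python
# (income group, drink) -> (number healthy, number of people); invented data.
counts = {
    ("high income", "wine"): (72, 80),
    ("high income", "other"): (18, 20),
    ("low income", "wine"): (12, 20),
    ("low income", "other"): (48, 80),
}


def healthy_rate(counts, drink, income=None):
    """Share healthy among drinkers of `drink`, optionally within one income group."""
    healthy = total = 0
    for (group, d), (h, t) in counts.items():
        if d == drink and (income is None or group == income):
            healthy += h
            total += t
    return healthy / total


# Ignoring income, wine drinkers look healthier.
print(healthy_rate(counts, "wine"), healthy_rate(counts, "other"))  # → 0.84 0.66

# Within each income group the two rates are identical.
for income in ("high income", "low income"):
    print(income,
          healthy_rate(counts, "wine", income),
          healthy_rate(counts, "other", income))
```

With observational data alone we usually cannot separate the groups this cleanly, which is why the effects of the drink and of the lurking variable cannot be distinguished.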

Confound It!
