Unit 1: Introduction to data
Lecture 1: Data collection, observational studies, and experiments
Statistics 101
Nicole Dalzell, Duke University
May 13, 2015

Welcome to Stat 101!
Section 1: Introduction to Inference


  1. Introduction to Inference
     [Figure: two bar charts of baby-name counts by first initial (A to Z), titled “Initials − All names in 2012 (M)” and “Initials − All names in 2012 (F)”; the vertical axes run from 0 to 1,200 and from 0 to 3,000 respectively.]
     Sta 101 (N. Dalzell, Duke University), U1 - L1: Data coll., obs. studies, experiments, May 13, 2015, slide 14 / 59

  2. Introduction to Inference: Step 4: Form a conclusion
     In 2012, about 3,000 babies were given a first name that began with “J”, based on the data from the Social Security database.
     The list of babies from the Social Security data set is a sample, a group of individuals taken from the entire population. The number of individuals in the sample is usually denoted with the letter n.
     A statistic is any function of the data collected in the sample (e.g., mean, median, etc.). So the count of the babies in the Social Security data set for 2012 who have first initial “J” is a statistic.
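
To make the terms sample, n, and statistic concrete, here is a minimal Python sketch. The list of names is a made-up toy sample (an assumption for illustration, not the Social Security data), and `count_initial` is a hypothetical helper name.

```python
# Hypothetical toy sample of first names (NOT the Social Security data).
sample = ["James", "Olivia", "Jacob", "Emma", "Julia", "Noah", "Ava", "Jack"]

n = len(sample)  # sample size, usually denoted n


def count_initial(names, initial):
    """Statistic: number of names in the sample starting with `initial`."""
    return sum(1 for name in names if name.upper().startswith(initial.upper()))


j_count = count_initial(sample, "j")  # a statistic (a count)
j_share = j_count / n                 # another statistic (a sample proportion)

print(f"n = {n}, names starting with J = {j_count}, proportion = {j_share:.2f}")
```

Both `j_count` and `j_share` are functions of the sample only, which is what makes them statistics rather than population parameters.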

  5. Populations and Samples: Data Collection
     Be aware that there exist “bad” samples. “There are three kinds of lies: lies, damned lies, and statistics.”
     If poor sampling techniques are used, then the observed statistics will not be applicable to the true population of interest.
     Example data collection: Raise your hand if you have been on an airplane in the past two years. What does this tell us about how many 17-30 year olds have ridden an airplane in the past two years?

  8. Sampling from a population: Census
     Wouldn’t it be better to just include everyone and “sample” the entire population, i.e. conduct a census?
     Some individuals are hard to locate or hard to measure, and these difficult-to-find people may have certain characteristics that distinguish them from the rest of the population.
     Populations rarely stand still. Even if you could take a census, the population changes constantly, so it’s never possible to get a perfect measure.
     http://www.npr.org/templates/story/story.php?storyId=125380052

  13. Sampling from a population: Exploratory analysis to inference
      Sampling is natural. Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.
      When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough, that’s exploratory analysis.
      If you generalize and conclude that your entire soup needs salt, that’s an inference.
      For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).
      If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
      If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.

  14. Sampling bias: Landon vs. FDR
      A historical example of a biased sample yielding misleading results: In 1936, Landon sought the Republican presidential nomination opposing the re-election of FDR.

  15. Sampling bias: The Literary Digest Poll
      The Literary Digest polled about 10 million Americans, and got responses from about 2.4 million.
      The poll showed that Landon would likely be the overwhelming winner and FDR would get only 43% of the votes.
      Election result: FDR won, with 62% of the votes.
      The magazine was completely discredited because of the poll, and was soon discontinued.

  16. Sampling bias: The Literary Digest Poll - what went wrong?
      The magazine had surveyed its own readers, registered automobile owners, and registered telephone users.
      These groups had incomes well above the national average of the day (remember, this is Great Depression era), which resulted in lists of voters far more likely to support Republicans than a truly typical voter of the time, i.e. the sample was not representative of the American population at the time.

  17. Sampling bias: Large samples are preferable, but...
      The Literary Digest election poll was based on a sample size of 2.4 million, which is huge, but since the sample was biased, it did not yield an accurate prediction.
      Back to the soup analogy: If the soup is not well stirred, it doesn’t matter how large a spoon you have, it will still not taste right. If the soup is well stirred, a small spoon will suffice to test the soup.
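
The point that a huge biased sample can lose to a small random one can be illustrated with a short simulation. This is only a sketch under invented assumptions: the population size, the share of wealthy voters, and the support rates in each group are made-up numbers, not historical data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: each voter is (is_wealthy, supports_fdr).
# All proportions below are illustrative assumptions, not historical figures.
population = []
for _ in range(200_000):
    wealthy = rng.random() < 0.25
    p_fdr = 0.40 if wealthy else 0.70
    population.append((wealthy, rng.random() < p_fdr))


def share_fdr(voters):
    """Proportion of the given voters who support FDR."""
    return sum(supports for _, supports in voters) / len(voters)


true_share = share_fdr(population)  # the population parameter

# Large but biased sample: drawn only from wealthy voters.
wealthy_only = [v for v in population if v[0]]
big_biased = rng.sample(wealthy_only, 40_000)

# Small simple random sample from the whole population.
small_random = rng.sample(population, 1_000)

print(f"population:         {true_share:.3f}")
print(f"biased, n = 40,000: {share_fdr(big_biased):.3f}")
print(f"random, n = 1,000:  {share_fdr(small_random):.3f}")
```

Under these assumptions the large sample reproduces the opinion of the wealthy subgroup very precisely, which is exactly the wrong quantity, while the small random sample lands close to the population value.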

  22. Sampling bias: A few sources of bias
      Non-response: If only a (non-random) fraction of the randomly sampled people choose to respond to a survey, the sample may no longer be representative of the population.
      Voluntary response: Occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue, and hence is not representative of the population. (edition.com, Aug 29, 2013)
      Convenience sample: Individuals who are easily accessible are more likely to be included in the sample.
      What type of bias do reviews on Amazon.com have? What about reviews on RateMyProfessor.com?

  23. Sampling bias: Participation question
      A school district is considering whether it will no longer allow high school students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents by mail, asking them whether or not the parents would object to this policy change. Of 6,000 surveys that go out, 1,200 are returned. Of these 1,200 surveys that were completed, 960 agreed with the policy change and 240 disagreed. Which of the following statements are true?
      I. Some of the mailings may have never reached the parents.
      II. The school district has strong support from parents to move forward with the policy approval.
      III. It is possible that a majority of the parents of high school students disagree with the policy change.
      IV. The survey results are unlikely to be biased because all parents were mailed a survey.
      (a) Only I (b) I and II (c) I and III (d) III and IV (e) Only IV
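
The arithmetic behind the survey question can be checked with a few lines of Python. The counts come from the slide; the "bounds" simply consider the two extreme cases for the parents who did not return a survey, which is an illustrative device rather than part of the original question.

```python
mailed = 6000     # surveys sent out
returned = 1200   # surveys completed and returned
agreed = 960      # respondents who agreed with the policy change
disagreed = 240   # respondents who disagreed

response_rate = returned / mailed              # fraction of parents who responded
agree_among_respondents = agreed / returned    # support among respondents only

non_respondents = mailed - returned

# Extreme cases for support among ALL mailed parents:
lowest_support = agreed / mailed                        # every non-respondent disagrees
highest_support = (agreed + non_respondents) / mailed   # every non-respondent agrees

print(f"response rate:            {response_rate:.0%}")
print(f"agree (respondents only): {agree_among_respondents:.0%}")
print(f"agree (all parents):      between {lowest_support:.0%} and {highest_support:.0%}")
```

The wide gap between the two bounds shows how little the respondents alone pin down when most of the surveys never come back.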

  26. Sampling bias: A picture’s worth a lot, but...
      A lot of the time we only have part of the story.
      BabyCenter: “Our data comes from nearly half a million parents who shared their baby’s name with us in 2014.” http://www.babycenter.com/top-baby-names-2014
      1. The Netflix effect
         Orange is the New Black: Galina, Piper, Nicky, Alex, Gloria
         House of Cards: Garrett, Claire, Robin, Wright
      2. A blizzard of Frozen names (Elsa, Hans, Kristin)
      Are we comfortable making decisions about these name trends based on this data?
      The “name Elsa soared 29 percent on our list of names for baby girls”. Is this sample statistic enough for us to conclude that the population parameter, the percent of newborn girls in the United States who are named Elsa, increased from 2013 to 2014? http://www.babycenter.com/top-baby-names-2014

  27. Observational studies and experiments: Causality versus Correlation
      1. Has the popularity of Frozen caused an increase in the number of baby girls in 2014 that were named Elsa? (Causal effect)
      2. Is the popularity of Frozen related to the increase in the number of baby girls in 2014 that were named Elsa? (Correlation, or relationship)
      We collect our data differently depending on the type of relationship (causal or correlation) that we are interested in.

  29. Observational studies and experiments
      An experimental study is a controlled study in which the researchers impose treatments upon the subjects.
      Subjects are assigned to control and treatment groups using random assignment.
      Experiments are the preferred method of data collection because results can often be attributed as causal, i.e., we can conclude that the treatments caused the response of the study.
      Experiments are not always feasible or ethical.
      An observational study is a study in which the researchers did not assign the subjects to treatments.
      Observational studies retain the notion of treatment and control groups.
      Observational studies still require the researcher to clearly define a research question. This requires identification of the response variable that they will measure on each subject in the study.

  31. Observational studies and experiments: Experimental vs Observational Datasets (cont.)
      Example: We want to consider the effect of drinking alcohol during pregnancy on rates of Fetal Alcohol Syndrome.
      Research question (population/parameter)?
      Should we use experimental or observational data?
      What potential biases should we be cautious of? Response bias, non-response bias, undercoverage bias.

  34. Observational studies and experiments (Recap)
      Observational study: Researchers collect data in a way that does not directly interfere with how the data arise, i.e. they merely “observe”, and can only establish an association between the explanatory and response variables.
      Experiment: Researchers randomly assign subjects to various treatments in order to establish causal connections between the explanatory and response variables.
      If you’re going to walk away with one thing from this class, let it be “correlation does not imply causation”. http://xkcd.com/552/

  39. Cereal breakfast
      What type of study is this, an observational study or an experiment?
      “Girls who regularly ate breakfast, particularly one that includes cereal, were slimmer than those who skipped the morning meal, according to a study that tracked nearly 2,400 girls for 10 years. [...] As part of the survey, the girls were asked once a year what they had eaten during the previous three days.”
      This is an observational study since the researchers merely observed the behavior of the girls (subjects) as opposed to imposing treatments on them.
      What is the conclusion of the study? There is an association between girls eating breakfast and being slimmer.
      Who sponsored the study? General Mills.

  43. Cereal breakfast: 3 possible explanations
      1. Eating breakfast causes girls to be thinner.
      2. Being thin causes girls to eat breakfast.
      3. A third variable is responsible for both. What could it be?
      Extraneous variables that affect both the explanatory and the response variable, and that make it seem like there is a relationship between the two, are called confounding variables.
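
The third explanation can be illustrated with a small simulation. This sketch assumes a hypothetical confounder, an "active lifestyle" flag, with invented probabilities; nothing here comes from the actual study. In the simulated data breakfast has no effect on being slim, yet an association still appears when the confounder is ignored.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Each simulated girl is (active, eats_breakfast, slim).
# "active" is a HYPOTHETICAL confounder; all probabilities are made up.
girls = []
for _ in range(100_000):
    active = rng.random() < 0.5
    eats_breakfast = rng.random() < (0.8 if active else 0.3)
    slim = rng.random() < (0.7 if active else 0.3)  # depends on active only
    girls.append((active, eats_breakfast, slim))


def slim_rate(rows):
    """Proportion of the given rows that are slim."""
    return sum(is_slim for _, _, is_slim in rows) / len(rows)


def gap(rows):
    """Slim rate among breakfast eaters minus slim rate among skippers."""
    eaters = [r for r in rows if r[1]]
    skippers = [r for r in rows if not r[1]]
    return slim_rate(eaters) - slim_rate(skippers)


overall_gap = gap(girls)
gap_active = gap([r for r in girls if r[0]])
gap_inactive = gap([r for r in girls if not r[0]])

print(f"gap ignoring the confounder: {overall_gap:+.3f}")
print(f"gap among active girls:      {gap_active:+.3f}")
print(f"gap among inactive girls:    {gap_inactive:+.3f}")
```

Comparing girls within the same level of the confounder makes the apparent breakfast effect shrink toward zero, which is the signature of confounding in this toy setup.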

  44. Observations and variables: Types of variables
      all variables
        numerical
          continuous (measured)
          discrete (counted)
        categorical
          regular categorical (unordered categories)
          ordinal (ordered categories)

  56. Observations and variables: Types of variables (cont.)
      type: small, midsize or large (categorical, ordinal)
      price: average price in $1000’s (numerical, continuous)
      mpgCity: city mileage per gallon (numerical, continuous)
      drivetrain: front, rear, 4WD (categorical)
      passengers: passenger capacity (numerical, discrete)
      weight: car weight in pounds (numerical, continuous)

  58. Principles of experimental design
      1. Control: Compare treatment of interest to a control group.
      2. Randomize: Randomly assign subjects to treatments.
      3. Replicate: Within a study, replicate by collecting a sufficiently large sample. Or replicate the entire study.
      4. Block: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups.

  63. Principles of experimental design: More on blocking
      We would like to design an experiment to investigate whether energy gels make you run faster:
      - Treatment: energy gel
      - Control: no energy gel
      It is suspected that energy gels might affect pro and amateur athletes differently, so we block for pro status:
      - Divide the sample into pro and amateur athletes.
      - Randomly assign pro athletes to treatment and control groups.
      - Randomly assign amateur athletes to treatment and control groups.
      - Pro/amateur status is then equally represented in the resulting treatment and control groups.
      Why is this important? Can you think of other variables to block for?
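
The blocking steps in the energy gel example can be sketched in code. This is a minimal illustration, not part of the lecture: the function name `blocked_assignment`, the roster of runners, and the even split within each block are all assumptions made for the example.

```python
import random
from collections import defaultdict


def blocked_assignment(subjects, block_of, seed=None):
    """Randomly assign subjects to 'treatment' or 'control' within each block.

    subjects: list of subject identifiers (hypothetical names)
    block_of: dict mapping each subject to its block label, e.g. 'pro' or 'amateur'
    Returns a dict mapping each subject to 'treatment' or 'control'.
    """
    rng = random.Random(seed)

    # Step 1: divide the sample into blocks.
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of[subject]].append(subject)

    # Step 2: randomize separately inside every block.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2  # assumed even split; odd blocks give control one extra
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment


# Hypothetical roster: 4 pro and 6 amateur runners.
status = {f"pro_{i}": "pro" for i in range(4)}
status.update({f"amateur_{i}": "amateur" for i in range(6)})
groups = blocked_assignment(list(status), status, seed=1)
```

Because the shuffle happens inside each block, half of the pros and half of the amateurs end up with the energy gel, so pro status cannot be mixed up with the treatment effect.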

  64. Principles of experimental design: Participation question
      A study is designed to test the effect of light level and noise level on the exam performance of students. The researcher also believes that light and noise levels might have different effects on males and females, and so wants to make sure both genders are represented equally under the different conditions. Which of the following is correct?
      (a) There are 3 explanatory variables (light, noise, gender) and 1 response variable (exam performance)
      (b) There are 2 explanatory variables (light and noise), 1 blocking variable (gender), and 1 response variable (exam performance)
      (c) There is 1 explanatory variable (gender) and 3 response variables (light, noise, exam performance)
      (d) There are 2 blocking variables (light and noise), 1 explanatory variable (gender), and 1 response variable (exam performance)

  66. Principles of experimental design: Difference between blocking and explanatory variables
      - Factors (explanatory variables) are conditions we can impose on the experimental units.
      - Blocking variables are characteristics that the experimental units come with and that we would like to control for.
      - Blocking is like stratifying, except that it is used in experimental settings when randomly assigning, as opposed to when sampling.
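
To make the contrast with blocking concrete, here is a minimal sketch of stratified sampling. The helper `stratified_sample` and the toy population are hypothetical and only illustrate the idea: stratifying selects some units from each group, whereas blocking keeps every unit and randomizes treatments within each group.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` units from each stratum.

    population: list of unit identifiers (hypothetical)
    stratum_of: dict mapping each unit to its stratum label
    per_stratum: units to draw per stratum (must not exceed the stratum size)
    """
    rng = random.Random(seed)

    # Group the population by stratum.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of[unit]].append(unit)

    # Sample without replacement inside each stratum.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical population: 12 pro and 8 amateur athletes.
labels = {f"pro_{i}": "pro" for i in range(12)}
labels.update({f"amateur_{i}": "amateur" for i in range(8)})
chosen = stratified_sample(list(labels), labels, per_stratum=3, seed=0)
```

Here only 6 of the 20 athletes are selected, 3 per stratum; no treatment is assigned at this stage.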

  67. Principles of experimental design: More experimental design terminology...
      - Placebo: fake treatment, often given to the control group in medical studies
      - Placebo effect: experimental units showing improvement simply because they believe they are receiving a special treatment
      - Blinding: when experimental units do not know whether they are in the control or treatment group
      - Double-blind: when neither the experimental units nor the researchers know who is in the control group and who is in the treatment group

  68. Recap: Participation question
      What is the main difference between observational studies and experiments?
      (a) Experiments take place in a lab while observational studies do not need to.
      (b) In an observational study we only look at what happened in the past.
      (c) Most experiments use random assignment while observational studies do not.
      (d) Observational studies are completely useless since no causal inference can be made based on their findings.

  70. Recap: Random assignment vs. random sampling
      - Random sampling, random assignment (ideal experiment): causal conclusion, generalized to the whole population.
      - Random sampling, no random assignment (most observational studies): no causal conclusion; correlation statement generalized to the whole population.
      - No random sampling, random assignment (most experiments): causal conclusion, only for the sample.
      - No random sampling, no random assignment (bad observational studies): no causal conclusion; correlation statement only for the sample.
      Random assignment separates causation from correlation; random sampling separates generalizability from no generalizability.
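
The logic of this recap can be written as a small lookup. The function `scope_of_inference` is a hypothetical helper, written only to show that the two design choices act independently: random assignment decides whether a causal claim is supported, and random sampling decides whether the claim generalizes.

```python
def scope_of_inference(random_sampling: bool, random_assignment: bool) -> str:
    """Describe what a study design supports, following the recap above."""
    # Random assignment is what licenses causal language.
    if random_assignment:
        claim = "causal conclusion"
    else:
        claim = "no causal conclusion, correlation statement"

    # Random sampling is what licenses generalizing beyond the sample.
    if random_sampling:
        scope = "generalized to the whole population"
    else:
        scope = "only for the sample"

    return f"{claim}, {scope}"


# Print all four combinations of the two design choices.
for sampling in (True, False):
    for assignment in (True, False):
        print(sampling, assignment, "->", scope_of_inference(sampling, assignment))
```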

  71. Recap: More...
      Want more baby name analysis? Freakonomics podcast: How Much Does Your Name Matter?
      http://freakonomics.com/2013/04/08/how-much-does-your-name-matter-a-new-freakonomics-radio-podcast/

  72. Syllabus & policies
      1. Welcome to Stat 101!: Introduction to Inference; Populations and Samples; Sampling from a population; Sampling bias; Observational studies and experiments; Cereal breakfast; Observations and variables; Principles of experimental design; Recap
      2. Syllabus & policies: Logistics; Goals and topics; Details; Support; Policies; Tips
      3. To do

  73. Logistics: General Info
      Instructor: Nicole Dalzell - nmd16@stat.duke.edu, Old Chemistry 214
      Lecture: MTuWThF 11:30 AM - 12:45 PM, Old Chemistry 201
      Lab: TuWTh 1 PM - 2 PM, Old Chemistry 101
      Office hours (tentative): Monday 2-3 PM, Wednesday 3-4 PM, Friday 2-3 PM, or by appointment

  74. Logistics: Required materials
      Textbook: OpenIntro Statistics, by Diez, Barr, Çetinkaya-Rundel. CreateSpace, 2nd Edition, 2012. ISBN: 978-1478217206
      (Optional) Calculator: You might need a four function calculator that can do square roots for this class. No limitation on the type of calculator you can use.

  75. Logistics: Webpage
      https://stat.duke.edu/~nmd16/courses/Summer15/sta101.001-1/
