
Introduction Marc H. Mehlman marcmehlman@yahoo.com University of - PowerPoint PPT Presentation



  1. Introduction
     Marc H. Mehlman, marcmehlman@yahoo.com, University of New Haven
     “To understand God’s thoughts, we must study statistics, for these are the measure of his purpose.” – Florence Nightingale
     “Statistics: the mathematical theory of ignorance.” – Morris Kline
     “Statistics means never having to say you’re certain.” – Anonymous
     Marc Mehlman (University of New Haven) Introduction 1 / 23

  2. Table of Contents
     1 Introduction
     2 Studies
     3 Blocking
     4 Data Collection
     5 Random Samples vs Simple Random Samples
     6 Correlation
     7 Error

  3. Introduction
     “Data. Data. Data. I can’t make bricks without clay.” – Sherlock Holmes
     “In God we trust. All others must bring data.” – W. Edwards Deming

  4. Introduction: Population and Sample
     The distinction between population and sample is basic to statistics. To make sense of any sample result, you must know what population the sample represents.
     The population in a statistical study is the entire group of individuals about which we want information.
     A sample is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population.
     Diagram: collect data from a representative sample of the population, then make an inference about the population.

  5. Introduction
     Definition
     If the sample is the entire population, it is called a census.
     A variable is a measurable characteristic of individuals within the population.
     The distribution of a variable is the frequency with which it takes each of its values.
     Data is a variable’s values from the sample.
     Statistics is the science of drawing inferences about the population from data.
     Statistics’ origins: anecdotes and noticing patterns in random happenings (samples).
     Assumption: the sample is collected from a random subset of the population, i.e., the sample is a random sample.
     Definition
     The population can be parameterized. For instance, if one is interested in the weights of all Americans of age x, then x is a parameter.

  6. Introduction
     Example
     From the 50,000 residents of the town of Milford, 300 were selected randomly and asked if they have ever had cancer. The population is the 50,000 residents, the sample is the 300 randomly selected residents, and the variable is cancer/no cancer. It was too costly to contact all 50,000 residents, so the actual distribution of cancer among the entire population is inferred from the distribution of cancer among the 300 randomly sampled residents.
     Definition (Types of Variables)
     qualitative (categorical): descriptive. Examples: color of eyes, gender, city born in.
     quantitative: numeric. Examples: height, miles per gallon, temperature, etc.
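The Milford example can be mimicked with a small simulation. This is only a sketch: the population and sample sizes come from the slide, but the cancer rate (`ASSUMED_RATE`) and all variable names are made up for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

POPULATION_SIZE = 50_000   # residents of Milford (from the example)
SAMPLE_SIZE = 300          # residents actually contacted (from the example)
ASSUMED_RATE = 0.04        # hypothetical true cancer rate, invented for illustration

# Population: one categorical value (True = has had cancer) per resident.
population = [random.random() < ASSUMED_RATE for _ in range(POPULATION_SIZE)]

# Sample: 300 distinct residents chosen at random.
sample = random.sample(population, SAMPLE_SIZE)

population_proportion = sum(population) / POPULATION_SIZE  # unknown in practice
sample_proportion = sum(sample) / SAMPLE_SIZE              # what the survey reports

print(f"population proportion: {population_proportion:.3f}")
print(f"sample proportion:     {sample_proportion:.3f}")
```

In a real survey only `sample_proportion` is observed; the inference step is the claim that it is close to `population_proportion`.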

  7. Introduction
     Definition (Types of Quantitative Variables)
     discrete: discrete range. Examples: number of children someone has, number of coins in pocket.
     continuous: continuous range. Examples: weight, speed.
     Definition (Categories of Data)
     nominal level of measurement: labels, no ordering. Examples: colors or yes/no.
     ordinal level of measurement: ordered, but distances between data values are meaningless. Examples: grades, ranks.
     interval level of measurement: ordered, distances between data values have meaning, but no natural zero. Example: height of class members above sea level.
     ratio level of measurement: interval level plus a natural zero (ratios are meaningful). Example: height of class members above classroom floor.
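The difference between the interval and ratio levels can be checked with the slide's own example. The sketch below uses made-up heights and a hypothetical floor elevation (`FLOOR_ELEVATION`); the point is only that shifting the zero preserves differences but changes ratios.

```python
# Heights of three class members above the classroom floor, in metres (made-up values).
height_above_floor = {"Ann": 1.50, "Bob": 1.80, "Cy": 2.00}

FLOOR_ELEVATION = 30.0  # hypothetical elevation of the classroom floor above sea level

# The same people measured from sea level: every value shifts by the same amount.
height_above_sea = {name: h + FLOOR_ELEVATION for name, h in height_above_floor.items()}

# Differences (interval information) survive the shift ...
diff_floor = height_above_floor["Cy"] - height_above_floor["Ann"]
diff_sea = height_above_sea["Cy"] - height_above_sea["Ann"]

# ... but ratios do not, because sea level is not a natural zero for a person's height.
ratio_floor = height_above_floor["Cy"] / height_above_floor["Ann"]
ratio_sea = height_above_sea["Cy"] / height_above_sea["Ann"]

print(f"difference: {diff_floor:.2f} (floor) vs {diff_sea:.2f} (sea level)")
print(f"ratio:      {ratio_floor:.3f} (floor) vs {ratio_sea:.3f} (sea level)")
```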

  8. Studies

  9. Studies: Observation vs. Experiment
     When our goal is to understand cause and effect, experiments are the only source of fully convincing data. The distinction between observational study and experiment is one of the most important in statistics.
     An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. The purpose is to describe some group or situation.
     An experiment deliberately imposes some treatment on individuals to measure their responses. The purpose is to study whether the treatment causes a change in the response.
     In contrast to observational studies, experiments don’t just observe individuals or ask them questions. They actively impose some treatment in order to measure the response.

  10. Studies
      Definition
      cross-sectional study: data collected at one point in time.
      retrospective study: historical data.
      prospective or longitudinal study: ongoing, collecting future data too. Example: Framingham Heart Study.
      cohort study: usually a longitudinal study; data is compared between cohorts (individuals who share similar characteristics or experiences). Example: Minnesota Twin Family Study (MTFS).
      For an experiment:
      Definition
      single blinding: subjects are not aware of whether they are in the control group or the treatment group.
      double blinding: neither subjects nor experimenters know who is in the control group or the treatment group.

  11. Blocking

  12. Blocking
      Blocking: group like subjects together to reduce variance.
      Example: men with men and women with women. For instance, women may be more prone to a disease than men, or a treatment may help men and harm women. By blocking men and women apart, gender differences can be isolated.
      randomized block design vs completely randomized design: assign treatments randomly within each block versus assigning treatments to subjects at large.
      “Block what you can; randomize what you cannot.”
      Moral: Group like subjects together to reduce variance.
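The two designs can be contrasted in a few lines. This is a minimal sketch under stated assumptions: the 20 subjects, the two-arm treatment/control split, and the helper names (`completely_randomized`, `randomized_block`, `count_men`) are all hypothetical.

```python
import random

random.seed(1)

# Hypothetical subjects: (id, sex). Ten men and ten women, invented for illustration.
subjects = [(i, "M" if i % 2 == 0 else "F") for i in range(20)]

def completely_randomized(subjects):
    """Shuffle everyone together, then split into treatment/control halves."""
    shuffled = subjects[:]
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def randomized_block(subjects, block_of):
    """Randomize separately inside each block, so each block is split evenly."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    design = {"treatment": [], "control": []}
    for members in blocks.values():
        random.shuffle(members)
        half = len(members) // 2
        design["treatment"] += members[:half]
        design["control"] += members[half:]
    return design

def count_men(group):
    return sum(1 for _, sex in group if sex == "M")

crd = completely_randomized(subjects)
rbd = randomized_block(subjects, block_of=lambda s: s[1])

print("completely randomized, men in treatment:", count_men(crd["treatment"]))
print("randomized block, men in treatment:     ", count_men(rbd["treatment"]))
```

Under the block design the treatment group always contains exactly half of the men and half of the women, so sex cannot be unevenly split between arms by chance; the completely randomized design gives no such guarantee.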

  13. Blocking: Sample Size
      big: more reliable results. Replication is the measure of reliability, i.e., if the experiment is replicated, one would get similar results.
      small: cheaper, faster.
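The link between sample size and replication can be illustrated by repeating a simulated survey many times and measuring how much the results vary. The population proportion `TRUE_P`, the two sample sizes, and the helper `replicate` are assumptions chosen for this sketch.

```python
import random
import statistics

random.seed(2)
TRUE_P = 0.3  # hypothetical population proportion, invented for illustration

def replicate(n, times=200):
    """Repeat a survey of size n and return the sample proportions obtained."""
    return [sum(random.random() < TRUE_P for _ in range(n)) / n for _ in range(times)]

# Spread of the replicated results: smaller spread means more similar replications.
spread_small = statistics.pstdev(replicate(25))
spread_large = statistics.pstdev(replicate(2500))

print(f"spread of results with n = 25:   {spread_small:.4f}")
print(f"spread of results with n = 2500: {spread_large:.4f}")
```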

  14. Data Collection

  15. Data Collection
      Types of Sampling
      self-selected sample: one could send out questions in a mailing and only get answers from those who choose to reply.
      systematic sampling: Example: every 50th subject.
      convenience sampling: sample those that are easiest to sample.
      stratified sampling: sample from each stratum (subgroup).
      cluster sampling: sample everyone in randomly selected clusters.
      Stratified sampling gives better results (see blocking).
      Multistage Sampling Example
      1 take a random sample of size 8 of the states.
      2 take a simple random sample of size 7 of the counties in each state.
      3 take a random sample of 6 cities.
      4 take a random sample of 5 voters.
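Three of the sampling schemes listed above can be sketched on a toy sampling frame. Everything here is hypothetical: the frame of 1,000 subjects, the way strata and clusters are assigned, and the function names.

```python
import random

random.seed(3)

# Hypothetical sampling frame: 1,000 subjects, each tagged with one of 4 strata
# and one of 50 clusters (20 subjects per cluster). All labels are made up.
frame = [{"id": i, "stratum": i % 4, "cluster": i // 20} for i in range(1000)]

def systematic_sample(frame, step):
    """Every `step`-th subject, starting from a random offset."""
    start = random.randrange(step)
    return frame[start::step]

def stratified_sample(frame, per_stratum):
    """A simple random sample drawn separately from each stratum."""
    strata = {}
    for row in frame:
        strata.setdefault(row["stratum"], []).append(row)
    return [row for members in strata.values()
            for row in random.sample(members, per_stratum)]

def cluster_sample(frame, n_clusters):
    """Everyone in a few randomly selected clusters."""
    cluster_ids = sorted({row["cluster"] for row in frame})
    chosen = set(random.sample(cluster_ids, n_clusters))
    return [row for row in frame if row["cluster"] in chosen]

systematic = systematic_sample(frame, step=50)       # every 50th subject
stratified = stratified_sample(frame, per_stratum=5)
clustered = cluster_sample(frame, n_clusters=2)

print(len(systematic), len(stratified), len(clustered))
```

Note the structural difference: the stratified sample touches every stratum but only a few members of each, while the cluster sample touches only a few clusters but every member of them.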

  16. Random Samples vs Simple Random Samples

  17. Random Samples vs Simple Random Samples
      Definition
      random sample, $x_1, \cdots, x_n$:
      1 each subject is as likely to be $x_i$ as any other subject
      2 the $x_1, \cdots, x_n$ are independent: just because $x_i$ = Bob does not mean that $x_j$ cannot be Bob.
      simple random sample: out of $N$ subjects choose $n$ randomly so that the probability of any $n$ subjects being chosen is the same as that of any other $n$ subjects ($\binom{N}{n} = \frac{N!}{n!(N-n)!}$).
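The two definitions map onto two functions in Python's standard `random` module: `random.choices` draws independently (with replacement) and `random.sample` draws distinct subjects. The five-person population below is made up for illustration.

```python
import math
import random

random.seed(4)
subjects = ["Ann", "Bob", "Cy", "Dee", "Eve"]  # made-up population, N = 5
n = 3

# Random sample: n independent draws, so the same subject can appear more than once.
random_sample = random.choices(subjects, k=n)

# Simple random sample: n distinct subjects, every subset of size n equally likely.
simple_random_sample = random.sample(subjects, n)

# Number of possible subsets of size n: N! / (n! (N - n)!).
n_subsets = math.comb(len(subjects), n)
by_factorials = math.factorial(5) // (math.factorial(3) * math.factorial(2))

print("random sample:       ", random_sample)
print("simple random sample:", simple_random_sample)
print("possible subsets:    ", n_subsets)
```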

  18. Random Samples vs Simple Random Samples
      Note:
      1 subjects can be chosen more than once in a random sample, but not in a simple random sample. A simple random sample does not assume independence for the sample.
      2 the book’s definition is not quite right.
      Convention: When can a simple random sample be thought of as a random sample? Answer: when the sample size is 5% or less of the size of the entire population.
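One common justification for the 5% convention, which the slide itself does not state, is the finite population correction: when n of N subjects are drawn without replacement, the standard error of the sample mean is the with-replacement value multiplied by sqrt((N - n) / (N - 1)). The sketch below evaluates that factor for a few sampling fractions; the population size and the function name are assumptions.

```python
import math

def finite_population_correction(N, n):
    """Factor multiplying sigma / sqrt(n) when sampling n of N without replacement."""
    return math.sqrt((N - n) / (N - 1))

N = 50_000  # hypothetical population size
for fraction in (0.01, 0.05, 0.25, 0.50):
    n = int(N * fraction)
    factor = finite_population_correction(N, n)
    print(f"n/N = {fraction:.0%}: correction factor = {factor:.3f}")
```

At a 5% sampling fraction the factor is still close to 1, so treating the simple random sample as independent draws changes the standard error only slightly; at larger fractions the two models diverge.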

  19. Correlation
