SLIDE 1

Marc Mehlman

Introduction

Marc H. Mehlman

marcmehlman@yahoo.com

University of New Haven

“To understand God’s thoughts, we must study statistics, for these are the measure of his purpose.” – Florence Nightingale
“Statistics: the mathematical theory of ignorance.” – Morris Kline
“Statistics means never having to say you’re certain.” – Anonymous

Marc Mehlman (University of New Haven) Introduction 1 / 23

SLIDE 2

Table of Contents

1 Introduction
2 Studies
3 Blocking
4 Data Collection
5 Random Samples vs Simple Random Samples
6 Correlation
7 Error

SLIDE 3

Introduction

“Data. Data. Data. I can’t make bricks without clay.” – Sherlock Holmes
“In God we trust. All others must bring data.” – W. Edwards Deming

SLIDE 4

Introduction

Population and Sample

The distinction between population and sample is basic to statistics. To make sense of any sample result, you must know what population the sample represents.

The population in a statistical study is the entire group of individuals about which we want information.

A sample is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population.

[Figure: Population and Sample. Collect data from a representative sample, then make an inference about the population.]

SLIDE 5

Introduction

Definition
If the sample is the entire population, it is called a census.
A variable is a measurable characteristic of individuals within the population.
The distribution of a variable describes how frequently the variable takes each of its values.
Data are a variable’s values from the sample.
Statistics is the science of drawing inferences about the population from data.

Origins of statistics: anecdotes and noticing patterns in random happenings (samples).

Assumption: the sample is collected from a random subset of the population, i.e., the sample is a random sample.

Definition
The population can be parameterized. For instance, if one is interested in the weights of all Americans of age x, then x is a parameter.

SLIDE 6

Introduction

Example
From the 50,000 residents of the town of Milford, 300 were selected randomly and asked if they have ever had cancer. The population is the 50,000 residents, the sample is the 300 randomly selected residents, and the variable is cancer/no cancer. It was too costly to contact all 50,000 residents, so the distribution of cancer among the entire population is inferred from the distribution of cancer among the 300 randomly sampled residents.

Definition (Types of Variables)
qualitative (categorical): descriptive. Examples: eye color, gender, city of birth.
quantitative: numeric. Examples: height, miles per gallon, temperature, etc.
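The Milford example can be mimicked with a short simulation. This is only a sketch: the population and sample sizes come from the example, but the number of residents who have had cancer (`ASSUMED_CASES`) is invented for illustration, since the slides give no figure.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

POPULATION_SIZE = 50_000  # residents of Milford (from the example)
SAMPLE_SIZE = 300         # residents actually surveyed (from the example)
ASSUMED_CASES = 2_000     # hypothetical count of residents who have had cancer

# Build a made-up population: True = has had cancer, False = has not.
population = [True] * ASSUMED_CASES + [False] * (POPULATION_SIZE - ASSUMED_CASES)

# Draw 300 distinct residents at random and record the variable for each.
sample = rng.sample(population, SAMPLE_SIZE)

population_rate = sum(population) / POPULATION_SIZE  # normally unknown
sample_rate = sum(sample) / SAMPLE_SIZE              # what the survey observes

print(f"population rate: {population_rate:.3f}")
print(f"sample rate:     {sample_rate:.3f}")
```

In a real study only `sample_rate` is available; the inference step is using it as an estimate of `population_rate`.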

SLIDE 7

Introduction

Definition (Types of Quantitative Variables)
discrete: discrete range. Examples: number of children someone has, number of coins in a pocket.
continuous: continuous range. Examples: weight, speed.

Definition (Categories of Data)
nominal level of measurement: labels, no ordering. Examples: colors, yes/no.
ordinal level of measurement: ordered, but distances between data values are meaningless. Examples: grades, ranks.
interval level of measurement: ordered, and distances between data values have meaning, but there is no natural zero. Example: height of class members above sea level.
ratio level of measurement: interval level plus a natural zero (ratios are meaningful). Example: height of class members above the classroom floor.
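The difference between the interval and ratio levels can be checked with a little arithmetic. In this sketch the two class members, their heights, and the floor's elevation are all hypothetical values chosen for illustration.

```python
# Heights are in whole centimetres so the arithmetic is exact.
FLOOR_ELEVATION_CM = 2_000  # hypothetical: the classroom floor is 20 m above sea level

# Ratio level: zero means "no height at all".
above_floor = {"Ann": 160, "Bob": 180}

# Interval level: the zero point (sea level) is arbitrary for this purpose.
above_sea = {name: h + FLOOR_ELEVATION_CM for name, h in above_floor.items()}

# Differences are the same on both scales.
diff_floor = above_floor["Bob"] - above_floor["Ann"]
diff_sea = above_sea["Bob"] - above_sea["Ann"]

# Ratios change when the zero point moves, so only the floor-based ratio is meaningful.
ratio_floor = above_floor["Bob"] / above_floor["Ann"]
ratio_sea = above_sea["Bob"] / above_sea["Ann"]

print(diff_floor, diff_sea)
print(ratio_floor, round(ratio_sea, 4))
```

Saying "Bob is 1.125 times as tall as Ann" makes sense only for the heights measured from the floor; the sea-level figures support statements about differences but not about ratios.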

SLIDE 8

Studies

SLIDE 9

Studies

Observation vs. Experiment

In contrast to observational studies, experiments don’t just observe individuals or ask them questions. They actively impose some treatment in order to measure the response.

An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. The purpose is to describe some group or situation.

An experiment deliberately imposes some treatment on individuals to measure their responses. The purpose is to study whether the treatment causes a change in the response.

When our goal is to understand cause and effect, experiments are the only source of fully convincing data.

The distinction between observational study and experiment is one of the most important in statistics.

SLIDE 10

Studies

Definition
cross-sectional study: data collected at one point in time.
retrospective study: uses historical data.
prospective or longitudinal study: ongoing, collecting future data too. Example: Framingham Heart Study.
cohort study: usually a longitudinal study in which data are compared between cohorts (individuals who share similar characteristics or experiences). Example: Minnesota Twin Family Study (MTFS).

For an experiment:

Definition
single blinding: subjects are not aware of whether they are in the control group or the treatment group.
double blinding: neither the subjects nor the experimenters know who is in the control group or the treatment group.

SLIDE 11

Blocking

SLIDE 12

Blocking

Blocking: group like subjects together to reduce variance.

Example: men with men and women with women. For instance, women may be more prone to a disease than men, or a treatment may help men and harm women. By blocking men and women apart, gender differences can be isolated.

randomized block design vs completely randomized design: assign treatments randomly within each block versus assigning treatments to subjects at large.

“Block what you can; randomize what you cannot.”

Moral: Group like subjects together to reduce variance.
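The two designs can be compared with a small sketch. The subject names, group sizes, and helper functions below are hypothetical and exist only for illustration. The point is that a randomized block design shuffles within each block, so every block is split evenly between treatment and control, while a completely randomized design offers no such guarantee.

```python
import random

rng = random.Random(1)

# Hypothetical subjects, each tagged with the blocking variable.
subjects = [(f"M{i}", "male") for i in range(6)] + [(f"F{i}", "female") for i in range(4)]


def completely_randomized(subjects, rng):
    """Shuffle all subjects at large; the first half get the treatment."""
    names = [name for name, _ in subjects]
    rng.shuffle(names)
    half = len(names) // 2
    return {name: ("treatment" if i < half else "control") for i, name in enumerate(names)}


def randomized_block(subjects, rng):
    """Shuffle separately inside each block so every block is split evenly."""
    blocks = {}
    for name, block in subjects:
        blocks.setdefault(block, []).append(name)
    assignment = {}
    for names in blocks.values():
        rng.shuffle(names)
        half = len(names) // 2
        for i, name in enumerate(names):
            assignment[name] = "treatment" if i < half else "control"
    return assignment


def treated_per_block(assignment, subjects):
    """Count how many subjects in each block received the treatment."""
    counts = {block: 0 for _, block in subjects}
    for name, block in subjects:
        if assignment[name] == "treatment":
            counts[block] += 1
    return counts


print("completely randomized:", treated_per_block(completely_randomized(subjects, rng), subjects))
print("randomized block:     ", treated_per_block(randomized_block(subjects, rng), subjects))
```

With blocking, the treated counts are always 3 men and 2 women here; without it, the split between men and women varies from run to run.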

SLIDE 13

Blocking

Sample Size

big: more reliable results. Replication is the measure of reliability, i.e., if the experiment is replicated, one would get similar results.
small: cheaper, faster.
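One way to see the link between sample size and replication is to repeat the same sampling experiment many times and measure how much the results disagree. The sketch below does this with an invented population (10,000 uniform values); the population, the number of replications, and the function name are assumptions for illustration only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]


def spread_of_replicated_means(n, replications=200):
    """Replicate 'draw n subjects and average them'; return the std. dev. of the averages."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(replications)]
    return statistics.stdev(means)


for n in (10, 100, 1000):
    print(f"n = {n:4d}: spread of replicated sample means = {spread_of_replicated_means(n):.2f}")
```

The spread shrinks as n grows, which is the sense in which a bigger sample gives more reliable (more replicable) results, at the price of more cost and time.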

SLIDE 14

Data Collection

SLIDE 15

Data Collection

Types of Sampling
self-selected sample: one could send out questions in a mailing and only get answers from those who choose to reply.
systematic sampling: Example: every 50th subject.
convenience sampling: sample those who are easiest to sample.
stratified sampling: sample from each stratum (subgroup).
cluster sampling: sample everyone in randomly selected clusters.

Stratified sampling gives better results; see blocking.

Multistage Sampling Example
1 Take a random sample of size 8 of the states.
2 Take a simple random sample of size 7 of the counties in each state.
3 Take a random sample of 6 cities.
4 Take a random sample of 5 voters.
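Three of the sampling schemes listed above can be sketched in a few lines. The sampling frame (1,000 subjects in four regions) and all function names are hypothetical; the sketch only shows how the selection rules differ.

```python
import random

rng = random.Random(7)

# Hypothetical sampling frame: 1,000 subjects, each belonging to one of 4 regions.
REGIONS = ["north", "south", "east", "west"]
frame = [{"id": i, "region": REGIONS[i % 4]} for i in range(1000)]


def group_by(frame, key):
    """Split the frame into groups that share the same value of `key`."""
    groups = {}
    for subject in frame:
        groups.setdefault(subject[key], []).append(subject)
    return groups


def systematic_sample(frame, step, rng):
    """Every `step`-th subject, starting from a random offset."""
    start = rng.randrange(step)
    return frame[start::step]


def stratified_sample(frame, key, per_stratum, rng):
    """A simple random sample of fixed size from every stratum."""
    chosen = []
    for members in group_by(frame, key).values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen


def cluster_sample(frame, key, num_clusters, rng):
    """Everyone in a few randomly selected clusters."""
    groups = group_by(frame, key)
    picked = rng.sample(sorted(groups), num_clusters)
    return [subject for name in picked for subject in groups[name]]


print(len(systematic_sample(frame, 50, rng)))            # every 50th subject
print(len(stratified_sample(frame, "region", 10, rng)))  # 10 from each region
print(len(cluster_sample(frame, "region", 2, rng)))      # all subjects in 2 regions
```

Stratified sampling touches every subgroup, while cluster sampling skips whole subgroups; a multistage scheme like the one above applies a random selection step at each level of nesting.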

SLIDE 16

Random Samples vs Simple Random Samples

SLIDE 17

Random Samples vs Simple Random Samples

Definition
random sample, $x_1, \dots, x_n$:
1 each subject is as likely to be $x_i$ as any other subject
2 the $x_1, \dots, x_n$ are independent: just because $x_i$ = Bob does not mean that $x_j$ cannot be Bob.

simple random sample: out of $N$ subjects choose $n$ randomly so that the probability of any $n$ subjects being chosen is the same as that of any other $n$ subjects (there are $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ such sets of $n$ subjects).
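The two definitions correspond to sampling with and without replacement. The sketch below uses Python's standard `random.choices` (independent draws, repeats allowed) and `random.sample` (a subset of distinct subjects); the five subject names are hypothetical.

```python
import math
import random

rng = random.Random(3)

subjects = ["Ann", "Bob", "Cal", "Dee", "Eve"]  # hypothetical population, N = 5
n = 3

# Random sample: n independent draws, so the same subject may appear more than once.
random_sample = rng.choices(subjects, k=n)

# Simple random sample: one of the equally likely subsets of n distinct subjects.
simple_random_sample = rng.sample(subjects, n)

# Number of possible simple random samples: N! / (n! (N - n)!).
num_subsets = math.comb(len(subjects), n)

print("random sample:       ", random_sample)
print("simple random sample:", simple_random_sample)
print("possible subsets:    ", num_subsets)
```

Each of the `num_subsets` possible subsets has probability `1 / num_subsets` of being the simple random sample.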

SLIDE 18

Random Samples vs Simple Random Samples

Note:
1 Subjects can be chosen more than once in a random sample, but not in a simple random sample. A simple random sample does not assume independence for the sample.
2 The book’s definition is not quite right.

Convention: When can a simple random sample be thought of as a random sample? Answer: when the sample size is 5% or less of the size of the entire population.
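One common way to motivate the 5% convention, though it does not appear in the slides, is the finite population correction $\sqrt{(N-n)/(N-1)}$. This factor scales the standard error of a sample mean when sampling without replacement, relative to independent draws. The sketch below evaluates it for a hypothetical population of 50,000; the function name and the sample sizes are assumptions for illustration.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Factor by which sampling without replacement shrinks the standard error."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


N = 50_000  # hypothetical population size
for n in (500, 2_500, 12_500, 25_000):
    fraction = n / N
    fpc = finite_population_correction(N, n)
    print(f"sample is {fraction:6.1%} of population: correction factor = {fpc:.4f}")
```

When the sample is 5% of the population or less, the factor stays within about 2.5% of 1, so treating the simple random sample as a random sample changes little.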

SLIDE 19

Correlation

SLIDE 20

Correlation

“Correlation is not Causation”

Definition
Confounding: when two variables can produce the same result, so it is not possible to tell which variable is responsible.

Example
IQs have increased over the last few decades as nutrition and money spent on education have increased. Is it nutrition? Spending? Another factor?

Example
Children with bigger feet tend to be better readers. Do large feet cause better reading skills? Lurking variable:

SLIDE 21

Correlation

Example
People who carry matches in their pockets have a higher chance of cancer. Does carrying matches cause cancer? Lurking variable:

Example
A larger number of firefighters at a fire site indicates a greater amount of damage. The lurking variable is · · ·
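A lurking variable can be imitated with simulated data. In the sketch below, a hidden variable `z` drives both `x` and `y`, and neither `x` nor `y` has any effect on the other. All three variables, their distributions, and the `pearson` helper are invented for illustration and are not tied to any of the examples above.

```python
import random

rng = random.Random(5)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hidden variable z; x and y each equal z plus their own independent noise.
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

overall = pearson(x, y)

# Hold z nearly constant by keeping only the cases where z lies in a narrow band.
band = [(xi, yi) for xi, yi, zi in zip(x, y, z) if abs(zi) < 0.1]
within_band = pearson([b[0] for b in band], [b[1] for b in band])

print(f"correlation of x and y overall:       {overall:.2f}")
print(f"correlation with z held nearly fixed: {within_band:.2f}")
```

The strong overall correlation mostly disappears once the hidden variable is held nearly fixed, which is the signature of a lurking variable rather than of a causal link between `x` and `y`.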

SLIDE 22

Error

SLIDE 23

Error

Definition
sampling error: the difference between properties of the sample and those of the true population that is due to randomization.
nonsampling error: error from improper collection or analysis of samples.
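The two kinds of error behave differently, as a small simulation suggests. The population below and the flawed collection rule (only subjects with values above 45 can be reached) are both invented for illustration.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population of 20,000 measurements.
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# Sampling error: a proper simple random sample still misses the truth by chance.
proper_sample = rng.sample(population, 400)
sampling_error = statistics.fmean(proper_sample) - true_mean

# Nonsampling error: a flawed collection procedure that can only reach
# subjects whose value is above 45 (an invented flaw).
reachable = [value for value in population if value > 45]
flawed_sample = rng.sample(reachable, 400)
nonsampling_error = statistics.fmean(flawed_sample) - true_mean

print(f"true mean:                     {true_mean:.2f}")
print(f"error of proper random sample: {sampling_error:+.2f}")
print(f"error of flawed collection:    {nonsampling_error:+.2f}")
```

Taking a larger sample shrinks the first error but not the second, because the flaw lies in how the data are collected rather than in the luck of the draw.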