SLIDE 1

Unit 5: Inference for categorical variables Lecture 1: Inference for proportions

Statistics 101

Thomas Leininger

June 12, 2013

SLIDE 2

Single population proportion

1. Single population proportion
   Identifying when a sample proportion is nearly normal
   Confidence intervals for a proportion
   Choosing a sample size when estimating a proportion
   Hypothesis testing for a proportion

2. Small sample inference for a proportion
   Carnival Game
   Paul the octopus

Statistics 101 U5 - L1: Inf. for proportions Thomas Leininger

SLIDE 3

Single population proportion

Many research questions involve proportions

Who will win the election? (http://elections.huffingtonpost.com/2012/results)

Who will win the NBA finals?

Mac or PC?

Is this the cutest baby in the world?

SLIDE 7

Single population proportion

Question

Two scientists want to know if a certain drug is effective against high blood pressure. The first scientist wants to give the drug to 1000 people with high blood pressure and see how many of them experience lower blood pressure levels. The second scientist wants to give the drug to 500 people with high blood pressure, and not give the drug to another 500 people with high blood pressure, and see how many in both groups experience lower blood pressure levels. Which is the better way to test this drug?

(a) All 1000 get the drug
(b) 500 get the drug, 500 don’t

SLIDE 9

Single population proportion

Results from the GSS

The GSS asks the same question. Below is the distribution of responses from the 2010 survey:

Response                        Count
All 1000 get the drug              99
500 get the drug, 500 don’t       571
Total                             670

SLIDE 12

Single population proportion

Parameter and point estimate

We would like to estimate the proportion of all Americans who have a good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t.” What are the parameter of interest and the point estimate?

Parameter of interest: Proportion of all Americans who have a good intuition about experimental design.

$p$ (a population proportion)

Point estimate: Proportion of sampled Americans who have a good intuition about experimental design.

$\hat{p}$ (a sample proportion) $= 571/670 = 0.85$

SLIDE 16

Single population proportion

Inference on a proportion

What percent of all Americans have a good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”?

We can answer this research question using a confidence interval, which we know is always of the form

point estimate ± ME

And we also know that ME = critical value × standard error of the point estimate.

$SE_{\hat{p}} = ?$

Standard error of a sample proportion

$SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

SLIDE 21

Single population proportion Identifying when a sample proportion is nearly normal

Sample proportions are also nearly normally distributed

Central limit theorem for proportions

Sample proportions will be nearly normally distributed with mean equal to the population proportion, $p$, and standard error equal to $\sqrt{\frac{p(1-p)}{n}}$.

$\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$

But of course this is true only under certain conditions... any guesses?

Independent observations, and at least 10 successes and 10 failures.

Note: If $p$ is unknown (most cases), we use $\hat{p}$ when doing a CI and $p_0$ when doing a HT.
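The near-normality claim can be checked informally by simulation. The following is a minimal Python sketch, not part of the original lecture: it treats p = 0.85 and n = 670 (borrowed from the GSS example) as if they were the true population values, and the helper name `simulate_p_hats`, the seed, and the number of repetitions are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=1):
    """Draw `reps` samples of size n and return each sample's proportion of successes."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Assumed "true" values, borrowed from the GSS example for illustration only.
p, n = 0.85, 670
p_hats = simulate_p_hats(p, n, reps=1000)

theoretical_se = math.sqrt(p * (1 - p) / n)
print(f"mean of simulated p-hats: {statistics.mean(p_hats):.4f}")
print(f"SD of simulated p-hats:   {statistics.stdev(p_hats):.4f}")
print(f"sqrt(p(1-p)/n):           {theoretical_se:.4f}")
```

If the conditions hold, the simulated mean should land close to p and the simulated standard deviation close to the formula's standard error.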

SLIDE 26

Single population proportion Confidence intervals for a proportion

Back to experimental design...

The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Estimate (using a 95% confidence interval) the proportion of all Americans who have a good intuition about experimental design.

Given: $n = 670$, $\hat{p} = 0.85$. First check conditions.

1. Independence: The sample is random, therefore we can assume that one respondent’s response is independent of another.

2. Success-failure: 571 people answered correctly (successes) and 99 answered incorrectly (failures); both are greater than 10.

SLIDE 28

Single population proportion Confidence intervals for a proportion

Question

We are given that $n = 670$, $\hat{p} = 0.85$; we also just learned that the standard error of the sample proportion is $SE = \sqrt{\frac{p(1-p)}{n}}$. Which of the below is the correct calculation of the 95% confidence interval?

(a) $0.85 \pm 1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}}$ → (0.82, 0.88)
(b) $0.85 \pm 1.65 \times \sqrt{\frac{0.85 \times 0.15}{670}}$
(c) $0.85 \pm 1.96 \times \frac{0.85 \times 0.15}{\sqrt{670}}$
(d) $571 \pm 1.96 \times \sqrt{\frac{571 \times 99}{670}}$
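As a quick check of option (a), the interval can be computed directly. This is a small Python sketch rather than anything from the original slides; the function name `proportion_ci` is an illustrative choice, and the critical value 1.96 is the usual value for 95% confidence.

```python
import math

def proportion_ci(p_hat, n, z_star=1.96):
    """Normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # SE uses the observed proportion
    me = z_star * se                          # margin of error
    return p_hat - me, p_hat + me

lower, upper = proportion_ci(0.85, 670)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (0.82, 0.88)
```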

SLIDE 36

Single population proportion Choosing a sample size when estimating a proportion

Choosing a sample size

How many people should you sample in order to cut the margin of error of a 95% confidence interval down to 1%?

$ME = z^\star \times SE$

$0.01 \ge 1.96 \times \sqrt{\frac{0.85 \times 0.15}{n}}$ → Use estimate for $\hat{p}$ from previous study

$0.01^2 \ge 1.96^2 \times \frac{0.85 \times 0.15}{n}$

$n \ge \frac{1.96^2 \times 0.85 \times 0.15}{0.01^2}$

$n \ge 4898.04$ → $n$ should be at least 4,899

SLIDE 39

Single population proportion Choosing a sample size when estimating a proportion

What if there isn’t a previous study?

... use $\hat{p} = 0.5$

why?

if you don’t know any better, 50-50 is a good guess

$\hat{p} = 0.5$ gives the most conservative estimate: the highest possible sample size
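The "most conservative" claim follows from the fact that the required sample size is proportional to p(1 − p), which is largest at p = 0.5. The Python sketch below illustrates this numerically on a grid of candidate values; the grid spacing and variable names are arbitrary choices made for illustration.

```python
# Evaluate p(1 - p) on a grid of candidate proportions 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]
variance_terms = {p: p * (1 - p) for p in candidates}

# The candidate with the largest p(1 - p) demands the largest sample size.
worst_case_p = max(variance_terms, key=variance_terms.get)
print(worst_case_p, variance_terms[worst_case_p])  # 0.5 0.25
```

Any other guess for the proportion gives a smaller p(1 − p), so a sample size planned with 0.5 is large enough whatever the true proportion turns out to be.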

SLIDE 41

Single population proportion Hypothesis testing for a proportion

CI vs. HT for proportions

Success-failure condition:

CI: At least 10 observed successes and failures (use $\hat{p}$)
HT: At least 10 expected successes and failures (use $p_0$)

Standard error:

CI: calculate using observed sample proportion: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
HT: calculate using the null value: $SE = \sqrt{\frac{p_0(1-p_0)}{n}}$
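To make the distinction concrete, the Python sketch below applies both versions to the GSS counts (571 of 670), using a null value of 0.80 purely as an illustrative choice. The helper names `success_failure_ok` and `standard_error` are hypothetical, not part of any course package.

```python
import math

def success_failure_ok(n, p, minimum=10):
    """Check that both n*p and n*(1-p) reach the minimum count."""
    return n * p >= minimum and n * (1 - p) >= minimum

def standard_error(p, n):
    return math.sqrt(p * (1 - p) / n)

n = 670
p_hat = 571 / 670   # observed proportion, used for the CI
p0 = 0.80           # illustrative null value, used for the HT

ci_ok = success_failure_ok(n, p_hat)   # observed counts: 571 and 99
ht_ok = success_failure_ok(n, p0)      # expected counts: 536 and 134
se_ci = standard_error(p_hat, n)
se_ht = standard_error(p0, n)

print(f"CI: condition met = {ci_ok}, SE = {se_ci:.4f}")
print(f"HT: condition met = {ht_ok}, SE = {se_ht:.4f}")
```

The two standard errors differ because the confidence interval is built around what was observed, while the hypothesis test assumes the null value is true.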

SLIDE 47

Single population proportion Hypothesis testing for a proportion

The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design?

$H_0: p = 0.80$
$H_A: p > 0.80$

$SE = \sqrt{\frac{0.80 \times 0.20}{670}} = 0.0154$

$Z = \frac{0.85 - 0.80}{0.0154} = 3.25$

p-value $= 1 - 0.9994 = 0.0006$

[Figure: distribution of sample proportions, with 0.8 and 0.85 marked on the axis]

Since the p-value is low, we reject $H_0$. The data provide convincing evidence that more than 80% of Americans have a good intuition on experimental design.
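The test statistic and p-value can be reproduced without a normal table. This Python sketch uses only the standard library; `upper_tail` is a hypothetical helper built on `math.erfc`. Because it does not round the standard error to 0.0154 first, it gives a Z of about 3.24 rather than 3.25, and the p-value still rounds to 0.0006.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p_hat, p0 = 670, 0.85, 0.80
se = math.sqrt(p0 * (1 - p0) / n)   # the HT uses the null value in the SE
z = (p_hat - p0) / se
p_value = upper_tail(z)             # one-sided test, HA: p > 0.80

print(f"SE = {se:.4f}, Z = {z:.2f}, p-value = {p_value:.4f}")
```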

SLIDE 48

Single population proportion Hypothesis testing for a proportion

Question

11% of 1,001 Americans responding to a 2006 Gallup survey stated that they have objections to celebrating Halloween on religious grounds. At 95% confidence level, the margin of error for this survey is ±3%. A news piece on this study’s findings states: “More than 10% of all Americans have objections on religious grounds to celebrating Halloween.” At 95% confidence level, is this news piece’s statement justified?

(a) Yes
(b) No
(c) Cannot tell
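One way to reason about this question is to build the interval from the reported point estimate and margin of error, then ask whether every plausible value exceeds 10%. The Python sketch below does exactly that; the variable names are illustrative, and this is one reading of "justified", not an official answer key.

```python
point_estimate = 0.11
margin_of_error = 0.03
claimed_threshold = 0.10

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

# The claim "more than 10%" is supported only if the whole interval lies above 10%.
claim_supported = lower > claimed_threshold
print(f"95% CI: ({lower:.2f}, {upper:.2f}); entire interval above 10%? {claim_supported}")
```

Since the interval runs from about 8% to 14%, values at or below 10% remain plausible at this confidence level.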

SLIDE 52

Small sample inference for a proportion Carnival Game

Suppose we want to set up a carnival game at the NC state fair this year. Can we estimate the proportion of times people can throw a ball and hit a target?

https://commons.wikimedia.org/wiki/File:Archery Target 80cm.svg

SLIDE 58

Small sample inference for a proportion Carnival Game

Let’s build a CI

Conditions:

1. Independence: We can assume that each throw is independent of another.

2. Sample size: Are the numbers of successes and failures both larger than 10?

So what do we do? Since the sample size isn’t large enough to use CLT-based methods, we use a simulation method instead: http://lock5stat.com/statkey/

SLIDE 61

Small sample inference for a proportion Paul the octopus

Famous predictors

Before this guy... There was this guy...

SLIDE 65

Small sample inference for a proportion Paul the octopus

Paul the Octopus - psychic?

Paul the Octopus predicted 8 World Cup games, and predicted them all correctly.

Does this provide convincing evidence that Paul actually has psychic powers?

How unusual would this be if he was just randomly guessing (with a 50% chance of guessing correctly)?

Hypotheses:

$H_0: p = 0.5$
$H_A: p > 0.5$
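Under the null hypothesis the number of correct guesses follows a binomial distribution, so the chance of 8 out of 8 can also be computed exactly. The Python sketch below is an addition for illustration, assuming independent guesses that each have a 50% chance of being right; it is not the simulation approach the lecture goes on to use.

```python
from math import comb

n_games = 8
p_guess = 0.5

# Binomial probability of exactly 8 successes in 8 independent 50-50 guesses.
prob_all_correct = comb(n_games, n_games) * p_guess**n_games * (1 - p_guess)**0
print(prob_all_correct)  # 0.00390625
```

Because 8 correct is the most extreme outcome possible, this probability is also the exact one-sided p-value for these hypotheses.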

SLIDE 70

Small sample inference for a proportion Paul the octopus

Conditions

1. Independence: We can assume that each guess is independent of another.

2. Sample size: The expected numbers of successes and failures are both smaller than 10.

$8 \times 0.5 = 4$

So what do we do? Since the sample size isn’t large enough to use CLT-based methods, we can use a simulation method instead. How could we simulate this hypothesis test?

SLIDE 71

Small sample inference for a proportion Paul the octopus

Application exercise: Simulation testing for one proportion

Which of the following methods is the best way to calculate the p-value of the hypothesis test evaluating if Paul the Octopus’ predictions are unusually higher than random guessing?

(a) Flip a coin 8 times, record the proportion of times where all 8 tosses were heads. Repeat this many times, and calculate the proportion of simulations where all 8 tosses were heads.
(b) Roll a die 8 times, record the proportion of times where all 8 rolls were 6s. Repeat this many times, and calculate the proportion of simulations where all 8 rolls were 6s.
(c) Flip a coin 10,000 times, record the proportion of heads. Repeat this many times, and calculate the proportion of simulations where more than 50% of tosses are heads.
(d) Flip a coin 10,000 times, calculate the proportion of heads.

SLIDE 73

Small sample inference for a proportion Paul the octopus

Simulate

Question

Flip a coin 8 times. Did you get all heads?

(a) Yes
(b) No

SLIDE 74

Small sample inference for a proportion Paul the octopus

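# Paul's 8 predictions, all recorded as correct ("yes")
paul <- rep("yes", 8)
# One-sided, simulation-based test of H0: p = 0.5 against HA: p > 0.5
inference(paul, est = "proportion", type = "ht", method = "simulation",
          success = "yes", null = 0.5, alternative = "greater", seed = 290)

Output:

Single proportion -- success: yes
Summary statistics: p_hat = 1 ; n = 8
Randomizing, please wait...
H0: p = 0.5
HA: p > 0.5
p-value = 0.0037

[Figures: bar plot of the sample (8 "yes" responses); randomization distribution of simulated sample proportions from 0.0 to 1.0, with the observed value of 1 marked]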
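For readers without access to the `inference` function used above, the same idea can be sketched in plain Python: simulate many sets of 8 fair coin flips and count how often all 8 come up "correct". The function name `simulate_p_value`, the 10,000 repetitions, and the reuse of 290 as a seed are assumptions made for illustration. Python's random number generator differs from R's, so this will not reproduce the 0.0037 above digit for digit.

```python
import random

def simulate_p_value(n_games=8, observed_correct=8, p_null=0.5, reps=10_000, seed=290):
    """Share of simulated sets of guesses that do at least as well as the observed result."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(reps):
        # One simulated set: each guess is correct with probability p_null.
        correct = sum(1 for _ in range(n_games) if rng.random() < p_null)
        if correct >= observed_correct:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps

p_value = simulate_p_value()
print(f"simulated p-value: {p_value:.4f}")
```

With enough repetitions the simulated value should settle near the exact probability 0.5^8 ≈ 0.0039.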

SLIDE 75

Small sample inference for a proportion Paul the octopus

Conclusions

Question

Which of the following is false?

(a) If in fact Paul was randomly guessing, the probability that he would get the result of all 8 games correct is 0.0037.
(b) Reject $H_0$; the data provide convincing evidence that Paul did better than randomly guessing.
(c) We may have made a Type I error.
(d) The probability that Paul is psychic is 0.0037.
