STAT 113 Analytic Inference for a Single Proportion, Colin Reimer Dawson



SLIDE 1

STAT 113 Analytic Inference for a Single Proportion

Colin Reimer Dawson

Oberlin College

7-10 April 2017

SLIDE 2

Outline

  • Theoretical Approximation of SE
  • Single Proportion: Sampling Distribution, Confidence Interval, Hypothesis Test
  • Single Mean: Sampling Distribution, Confidence Interval, T-distribution, Hypothesis Test

SLIDE 4

Outline Theoretical Approximation of SE Single Proportion Single Mean

Limits of Normal Approximation So Far

  • We have still needed to do all that randomization / resampling to calculate the standard error.
  • We can avoid that with some more theory.

4 / 48

SLIDE 5

Cases to Address

We will need standard errors to do CIs and tests for the following parameters:

  1. Single Proportion (now)
  2. Single Mean (today)
  3. Difference of Proportions (Thursday)
  4. Difference of Means (Thursday)
  5. Mean of Differences (new! next week)

SLIDE 6

Analytic Approximations of Sampling Distributions

| Param. | Stat. | Randomization | Theory SE | Test Dist. |
|---|---|---|---|---|
| $p$ | $\hat{p}$ | Simulate from $p_0$ | $\sqrt{p_0(1-p_0)/n}$ | Normal |
| $\mu$ | $\bar{x}$ | Bootstrap + shift | $s/\sqrt{n}$ | $t_{n-1}$ |
| $p_A - p_B$ | $\hat{p}_A - \hat{p}_B$ | Scramble groups | $\sqrt{p_A(1-p_A)/n_A + p_B(1-p_B)/n_B}$ | Normal |
| $\mu_A - \mu_B$ | $\bar{x}_A - \bar{x}_B$ | Scramble groups | $\sqrt{s_A^2/n_A + s_B^2/n_B}$ | $t_{\min(n_A-1,\,n_B-1)}$ |
| $\mu_D$ | $\bar{x}_D$ | Flip pairs* | $s_D/\sqrt{n_D}$ | $t_{n_D-1}$ |
| $\rho$ | $r$ | Scramble pairings | $\sqrt{(1-r^2)/(n-2)}$ | $t_{n-2}$ |

CI: Statistic ± Critical Value × SE

Standardized Test Statistic: (Statistic − Null Param.) / SE

SLIDE 9

Sampling Distribution of a Sample Proportion

[Figure: 3 × 3 grid of sampling distributions of $\hat{p}$; the horizontal axis in every panel is $\hat{p}$, from 0.0 to 1.0.]

  • Columns: values of p (left: 0.1; middle: 0.5; right: 0.9)
  • Rows: values of n (top: 10; middle: 50; bottom: 1000)

SLIDE 10

Things Affecting the Standard Error for $\hat{p}$

  1. Sample Size (n)
     • Increasing n makes the standard error go ______.
  2. Population Proportion (p)
     • What values of p make SE larger?

SLIDE 11

Distribution of $\hat{p}$

  • Condition: The sampling distribution of $\hat{p}$ is approximately Normal when there are at least 10 expected cases of each outcome: $np \geq 10$ and $n(1 - p) \geq 10$
  • Mean: $p$
  • Standard deviation (standard error):

$$SE_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
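The formula and condition for the distribution of the sample proportion can be checked against simulation. The sketch below is only an illustration in base R: the helper names `se_prop`, `normal_ok`, and `sim_se_prop`, the seed, and the number of replications are assumptions, not part of the slides. It reuses the values of p and n from the grid of sampling distributions.

```r
# Sketch: compare the simulated SD of p-hat with sqrt(p(1 - p) / n).
set.seed(113)  # arbitrary seed, for reproducibility only

# Theoretical standard error of a sample proportion
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

# Normal-approximation condition: at least 10 expected cases of each outcome
normal_ok <- function(p, n) (n * p >= 10) && (n * (1 - p) >= 10)

# Simulate many sample proportions and take their standard deviation
sim_se_prop <- function(p, n, reps = 10000) {
  p_hat <- rbinom(reps, size = n, prob = p) / n
  sd(p_hat)
}

# Same values of p and n as the 3 x 3 grid of sampling distributions
grid <- expand.grid(p = c(0.1, 0.5, 0.9), n = c(10, 50, 1000))
grid$se_theory <- mapply(se_prop, grid$p, grid$n)
grid$se_sim    <- mapply(sim_se_prop, grid$p, grid$n)
grid$normal_ok <- mapply(normal_ok, grid$p, grid$n)
print(grid)
```

The simulated and theoretical columns should agree closely in every row, while `normal_ok` flags only the combinations where the Normal shape is a reasonable approximation.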

SLIDE 13

CI Summary: Single Proportion

To compute a confidence interval for a proportion when the bootstrap distribution for $\hat{p}$ is approximately Normal (i.e., counts for both outcomes ≥ 10), use

$$\hat{p} \pm Z^* \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where $Z^*$ is the Z-score of the endpoint appropriate for the confidence level, computed from a standard normal (N(0, 1)).
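As a rough illustration of the interval formula for a single proportion, here is a minimal base R sketch. The function name `prop_ci` and the counts (60 successes out of 100) are hypothetical and are not taken from the slides.

```r
# Sketch: Normal-approximation confidence interval for a single proportion.
prop_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n
  # Check the "at least 10 of each outcome" condition on the observed counts
  if (x < 10 || (n - x) < 10) {
    warning("Fewer than 10 cases of one outcome; Normal approximation may be poor.")
  }
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # e.g. about 1.96 for 95%
  se_hat <- sqrt(p_hat * (1 - p_hat) / n)     # SE estimated from p-hat
  c(lower = p_hat - z_star * se_hat, upper = p_hat + z_star * se_hat)
}

# Hypothetical data: 60 "successes" in 100 trials
ci <- prop_ci(x = 60, n = 100, conf_level = 0.95)
ci
```

Changing `conf_level` only changes `z_star`; the standard error stays the same.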

SLIDE 14

Example: Kissing Right

Most people are right-handed, and even the right eye is dominant for most people. Developmental biologists have suggested that late-stage human embryos tend to turn their heads to the right. In a study reported in Nature (2003), German bio-psychologist Onur Güntürkün studied kissing couples in public places such as airports, train stations, beaches, and parks. They observed 124 couples, age 13-70 years. For each kissing couple observed, the researchers noted whether the couple leaned their heads to the right or to the left.

Let’s find a 95% confidence interval for p, the proportion of all couples who lean right.

SLIDE 16

P-values for a sample proportion from a Standard Normal

Computing P-values when the null sampling distribution is approximately Normal (i.e., $np_0 \geq 10$ and $n(1 - p_0) \geq 10$) is the reverse process:

  1. Convert $\hat{p}$ to a z-score within the theoretical distribution:

$$Z_{observed} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

  2. Find the relevant area beyond $Z_{observed}$ using a Standard Normal.
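The two steps for a proportion P-value can be sketched in base R as follows. The function name `prop_z_test` and the example counts are hypothetical; note that the standard error uses the null value p0, not the sample proportion.

```r
# Sketch: z-test for a single proportion using the Standard Normal.
prop_z_test <- function(x, n, p0,
                        alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n
  se0 <- sqrt(p0 * (1 - p0) / n)   # SE computed under the null value p0
  z_obs <- (p_hat - p0) / se0      # step 1: standardize p-hat
  p_value <- switch(alternative,   # step 2: area beyond z_obs
    two.sided = 2 * pnorm(-abs(z_obs)),
    greater   = pnorm(z_obs, lower.tail = FALSE),
    less      = pnorm(z_obs)
  )
  list(z = z_obs, p_value = p_value)
}

# Hypothetical data: 60 "successes" in 100 trials, testing p0 = 0.5
res <- prop_z_test(x = 60, n = 100, p0 = 0.5)
res
```

For a two-sided alternative the tail area is doubled, which is why the sketch uses `2 * pnorm(-abs(z_obs))`.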

SLIDE 17

Example: Kissing Right

Recall the study reported in Nature (2003), in which Onur Güntürkün observed 124 kissing couples, age 13-70 years, and noted whether each couple leaned their heads to the right or to the left.

Let’s assess how strong the evidence is against the null hypothesis that couples are equally likely to lean right and left.

SLIDE 20

Distribution of Sample Means

  • Central Limit Theorem: The sampling distribution of $\bar{x}$ is approximately Normal for “sufficiently large” samples, or when the population distribution is Normal.
  • As the sample size n goes up, the standard error goes ______.
  • Pairs: What effect do you expect the population standard deviation to have on the standard error of the distribution of sample means? Why?

SLIDE 21

Distribution of $\bar{x}$

  • Population with mean µ and standard deviation σ
  • Conditions: The sampling distribution of $\bar{x}$ is Normal if
     • Population is Normal, or
     • Sample size is large (roughly, can use n ≥ 27)
  • Mean: µ
  • Standard deviation (standard error):

$$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
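A small simulation can illustrate the standard error formula for a sample mean. This is only a sketch: the Exponential population (chosen because it is skewed and has σ = 1), the sample size, the seed, and the number of replications are all assumptions made for illustration.

```r
# Sketch: SD of simulated sample means versus sigma / sqrt(n).
set.seed(113)   # arbitrary seed, for reproducibility only

n     <- 50      # sample size
reps  <- 10000   # number of simulated samples
sigma <- 1       # SD of an Exponential(rate = 1) population (mean is also 1)

# Draw many samples from the skewed population and record each sample mean
x_bars <- replicate(reps, mean(rexp(n, rate = 1)))

se_sim    <- sd(x_bars)        # empirical standard error
se_theory <- sigma / sqrt(n)   # theoretical standard error

c(simulated = se_sim, theory = se_theory)
```

A histogram of `x_bars` (for example `hist(x_bars)`) should look roughly Normal even though the population is strongly skewed.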

SLIDE 23

CI Summary: Single Mean

To compute a confidence interval for a mean when the sampling distribution for $\bar{x}$ is approximately Normal (i.e., Normal population, or “large” n), use

$$\bar{x} \pm Z^* \cdot \frac{\sigma}{\sqrt{n}}$$

where $Z^*$ is the Z-score of the endpoint appropriate for the confidence level, computed from a standard normal (N(0, 1)).

SLIDE 24

Example: Mean Atlanta Commute Time

library("mosaic"); library("Lock5Data"); data("CommuteAtlanta")
dotPlot(~Time, data = CommuteAtlanta, width = 10, cex = 4)

[Figure: dot plot of Time (horizontal axis) against Count (vertical axis).]

nrow(CommuteAtlanta)
## [1] 500
mean(~Time, data = CommuteAtlanta)
## [1] 29.11

SLIDE 25

Atlanta Commute Time: Bootstrap CI

Bootstrap.means <- do(10000) * mean(~Time, data = resample(CommuteAtlanta))
CI.99.boot <- quantile(~mean, data = Bootstrap.means, prob = c(0.005, 0.995))
CI.99.boot
##     0.5%    99.5%
## 26.84399 31.58002

SLIDE 26

Commute Time: Pure Bootstrap CI

dotPlot(~mean, data = Bootstrap.means, width = 0.1, cex = 20,
        groups = mean >= CI.99.boot[1] & mean <= CI.99.boot[2])

[Figure: dot plot of the bootstrap means (horizontal axis: mean; vertical axis: Count), with points grouped by whether they fall inside the 99% interval.]

SLIDE 27

Atlanta Commute Time: Analytic CI

  • Confidence interval: $\bar{x} \pm Z^* \cdot SE$
  • $\bar{x} = 29.11$
  • $Z^* \approx 1.96$
  • SE: $\sigma / \sqrt{n}$
  • $n = 500$
  • Wait, where do we get σ?

SLIDE 29

Using s instead of σ

  • We can approximate SE with $s / \sqrt{n}$, but need to account for the fact that s itself is an estimate (differing between samples).
  • “95% of sample means are within 2SE of µ” is no longer accurate: the percentage is less than this.
  • How much less depends on how good an estimate s is of σ (i.e., depends on n).
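The shortfall described in these bullets can be seen in a quick simulation. The sketch below assumes a Normal population with µ = 0 and σ = 1 and uses 1.96 in place of the rounded "2"; the helper name `coverage_z`, the seed, and the sample sizes are illustrative choices, not from the slides.

```r
# Sketch: how often is the sample mean within 1.96 * s / sqrt(n) of mu?
set.seed(113)   # arbitrary seed, for reproducibility only

coverage_z <- function(n, reps = 20000, mu = 0, sigma = 1) {
  z_star <- qnorm(0.975)   # about 1.96
  hits <- replicate(reps, {
    x <- rnorm(n, mean = mu, sd = sigma)
    # SE is estimated with s (sd of this sample), not the true sigma
    abs(mean(x) - mu) <= z_star * sd(x) / sqrt(n)
  })
  mean(hits)   # proportion of samples whose mean is "close enough"
}

sizes <- c(5, 30, 200)
cov_by_n <- sapply(sizes, coverage_z)
names(cov_by_n) <- paste0("n = ", sizes)
cov_by_n
```

The proportion falls noticeably below 95% for small n and approaches 95% as n grows and s becomes a better estimate of σ.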

SLIDE 30

Degrees of Freedom

Recall

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

$n - 1$ is the “degrees of freedom”, or the number of “pieces of information” we have about variability. Bigger df → more accurate reflection of σ.
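To make the role of n − 1 concrete, the following base R sketch computes s by hand and compares it with the built-in `sd()`. The data vector is made up for illustration.

```r
# Sketch: sample standard deviation computed with n - 1 degrees of freedom.
x <- c(12, 15, 9, 20, 14)   # hypothetical data, for illustration only

n  <- length(x)
df <- n - 1                 # degrees of freedom

# Sum of squared deviations from the sample mean, divided by df
s_manual  <- sqrt(sum((x - mean(x))^2) / df)
s_builtin <- sd(x)          # R's sd() also divides by n - 1

all.equal(s_manual, s_builtin)
```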

SLIDE 31

The t family of distributions

When we know σ, we have

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

i.e., z-scores calculated from sample means have a Standard Normal distribution.

When we don’t know σ (almost always), estimate it with s; then

$$T = \frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1}$$

SLIDE 32

A family of t distributions

[Figure: t density curves for df = 1, df = 5, and df = 30, plotted together with the Standard Normal; horizontal axis $(\bar{x} - \mu) / (s / \sqrt{n})$ from −4 to 4, vertical axis density from 0.0 to 0.4.]

SLIDE 33

Tail Probabilities in t distributions

xpt(c(-2, 2), df = 1)
## [1] 0.1475836 0.8524164

[Figure: t density with df = 1, split at −2 and 2; areas 0.148 (left tail), 0.705 (middle), 0.148 (right tail).]

SLIDE 34

Tail Probabilities in t distributions

xpt(c(-2, 2), df = 3)
## [1] 0.06966298 0.93033702

[Figure: t density with df = 3, split at −2 and 2; areas 0.070 (left tail), 0.861 (middle), 0.070 (right tail).]

SLIDE 35

Tail Probabilities in t distributions

xpt(c(-2, 2), df = 5)
## [1] 0.05096974 0.94903026

[Figure: t density with df = 5, split at −2 and 2; areas 0.051 (left tail), 0.898 (middle), 0.051 (right tail).]

SLIDE 36

Tail Probabilities in t distributions

xpt(c(-2, 2), df = 30)
## [1] 0.02731252 0.97268748

[Figure: t density with df = 30, split at −2 and 2; areas 0.027 (left tail), 0.945 (middle), 0.027 (right tail).]

SLIDE 37

Tail Probabilities in Standard Normal distribution

xpnorm(c(-2, 2))
## If X ~ N(0, 1), then
##   P(X <= -2) = P(Z <= -2) = 0.02275013
##   P(X <=  2) = P(Z <=  2) = 0.97724987
##   P(X >  -2) = P(Z >  -2) = 0.97724987
##   P(X >   2) = P(Z >   2) = 0.02275013

[Figure: Standard Normal density, split at −2 and 2; areas 0.023 (left tail), 0.954 (middle), 0.023 (right tail).]
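The tail probabilities from the preceding slides can be collected into one table. This sketch uses base R's `pt()` and `pnorm()` rather than the mosaic plotting wrappers; the data frame layout and column names are illustrative choices.

```r
# Sketch: area below -2 for several t distributions and the Standard Normal.
dfs <- c(1, 3, 5, 30)

tails <- data.frame(
  dist       = c(paste0("t, df = ", dfs), "Standard Normal"),
  lower_tail = c(pt(-2, df = dfs), pnorm(-2))   # P(T <= -2) or P(Z <= -2)
)

# By symmetry, the area between -2 and 2 is 1 minus both tails
tails$middle <- 1 - 2 * tails$lower_tail
print(tails)
```

Reading down the table, the tails shrink and the middle area grows toward the Standard Normal value as df increases.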

SLIDE 38

Quantiles of t distributions

xqt(c(0.025, 0.975), df = 1)
## [1] -12.7062  12.7062

[Figure: t density with df = 1; areas 0.025 (left tail), 0.950 (middle), 0.025 (right tail).]

SLIDE 39

Quantiles of t distributions

xqt(c(0.025, 0.975), df = 3)
## [1] -3.182446  3.182446

[Figure: t density with df = 3; areas 0.025 (left tail), 0.950 (middle), 0.025 (right tail).]

SLIDE 40

Quantiles of t distributions

xqt(c(0.025, 0.975), df = 5)
## [1] -2.570582  2.570582

[Figure: t density with df = 5; areas 0.025 (left tail), 0.950 (middle), 0.025 (right tail).]

SLIDE 41

Quantiles of t distributions

xqt(c(0.025, 0.975), df = 30)
## [1] -2.042272  2.042272

[Figure: t density with df = 30; areas 0.025 (left tail), 0.950 (middle), 0.025 (right tail).]

SLIDE 42

Quantiles of Standard Normal distribution

xqnorm(c(0.025, 0.975))
## P(X <= -1.95996398454005) = 0.025
## P(X <= 1.95996398454005) = 0.975
## P(X > -1.95996398454005) = 0.975
## P(X > 1.95996398454005) = 0.025
## [1] -1.959964  1.959964

[Figure: Standard Normal density; areas 0.025 (left tail), 0.950 (middle), 0.025 (right tail).]
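The 95% critical values from the preceding slides can likewise be tabulated side by side. This is a base R sketch using `qt()` and `qnorm()`; the data frame and its column names are illustrative.

```r
# Sketch: 97.5th percentiles (95% critical values) for several distributions.
dfs <- c(1, 3, 5, 30)

crit <- data.frame(
  dist           = c(paste0("t, df = ", dfs), "Standard Normal"),
  critical_value = c(qt(0.975, df = dfs), qnorm(0.975))
)
print(crit)
```

With few degrees of freedom the critical value is much larger than 1.96, so intervals built from small samples are correspondingly wider.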

SLIDE 43

CI Summary: Single Mean

To compute a confidence interval for a mean when the sampling distribution for $\bar{x}$ is approximately Normal (i.e., Normal population, or “large” n) and σ is unknown (which is almost always), use

$$\bar{x} \pm t^*_{n-1} \cdot \frac{s}{\sqrt{n}}$$

where $t^*_{n-1}$ is the quantile appropriate for the confidence level, computed from a t-distribution with $n - 1$ degrees of freedom.

SLIDE 44

Atlanta Commute Time: Analytic CI

  • Confidence interval: $\bar{x} \pm T^* \cdot \widehat{SE}$
  • $\bar{x} = 29.11$
  • Get $T^*$ using the confidence level and $df = n - 1$:

xqt(c(0.025, 0.975), df = 500 - 1)
## [1] -1.964729  1.964729

  • $\widehat{SE} = s / \sqrt{n}$:

sd(~Time, data = CommuteAtlanta)  # Need to find s first
## [1] 20.71831
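Putting the pieces together, a 95% interval for the mean Atlanta commute time could be computed as in the sketch below. It uses only the summary values reported on the slides (sample mean 29.11, s = 20.71831, n = 500) and base R's `qt()` with n − 1 degrees of freedom, as in the CI summary for a single mean; the variable names are illustrative.

```r
# Sketch: analytic t-based confidence interval from summary statistics.
x_bar <- 29.11      # sample mean reported on the slides
s     <- 20.71831   # sample standard deviation reported on the slides
n     <- 500        # sample size

conf_level <- 0.95
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # critical value
se_hat <- s / sqrt(n)                                # estimated standard error

ci <- c(lower = x_bar - t_star * se_hat,
        upper = x_bar + t_star * se_hat)
ci
```

Setting `conf_level <- 0.99` gives an interval that can be compared with the 99% bootstrap percentile interval computed earlier.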

SLIDE 46

P-values for a sample mean

Computing P-values when the null sampling distribution is approximately Normal (i.e., population is Normal OR sample size is “large”) and σ is unknown (which is almost always) is the reverse process:

  1. Convert $\bar{x}$ to a t-statistic within the theoretical distribution:

$$T_{observed} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

  2. Find the relevant area beyond $T_{observed}$ using a t distribution with $n - 1$ degrees of freedom.

SLIDE 47

Example: Mean Body Temperature

data("BodyTemp50")
dotPlot(~BodyTemp, data = BodyTemp50)

[Figure: dot plot of BodyTemp (horizontal axis) against Count (vertical axis).]

mean(~BodyTemp, data = BodyTemp50)  # find the sample mean (x-bar)
## [1] 98.26
sd(~BodyTemp, data = BodyTemp50)  # find the sample sd (s)
## [1] 0.7653197

SLIDE 48

Example: Mean Body Temperature

  • $H_0: \mu = 98.6$
  • Sample mean (standardized): $T_{obs} = \dfrac{\bar{x} - \mu_0}{\widehat{SE}}$
  • $\bar{x} = 98.26$, $\mu_0 = 98.6$
  • $\widehat{SE} = s / \sqrt{n}$
  • $s = 0.765$, $n = 50$
  • Calculate $t_{obs}$:

t.obs <- (98.26 - 98.6) / (0.765 / sqrt(50)); t.obs
## [1] -3.142697

  • Once we have $T_{obs}$, find the P-value from a t-distribution with $df = n - 1$:

P.value <- 2 * xpt(-3.14, df = 50 - 1, lower.tail = TRUE); P.value
## [1] 0.002861716
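The same steps can be wrapped into a small helper that works from summary statistics alone. This is a sketch in base R: the function name `t_test_summary` is hypothetical, and it uses `pt()` in place of mosaic's `xpt()`. Because it uses the unrounded s and unrounded t statistic, its results differ slightly from the rounded hand calculation.

```r
# Sketch: two-sided one-sample t-test from summary statistics.
t_test_summary <- function(x_bar, s, n, mu0) {
  se_hat  <- s / sqrt(n)                        # estimated standard error
  t_obs   <- (x_bar - mu0) / se_hat             # standardized sample mean
  p_value <- 2 * pt(-abs(t_obs), df = n - 1)    # area in both tails
  list(t = t_obs, df = n - 1, p_value = p_value)
}

# Body temperature summary values reported on the slides
res <- t_test_summary(x_bar = 98.26, s = 0.7653197, n = 50, mu0 = 98.6)
res
```

When the raw data are available, base R's `t.test(BodyTemp50$BodyTemp, mu = 98.6)` should carry out the same test directly.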