SLIDE 1

Variability of a Difference CI and Test for Difference of Proportions CI and Test for Difference of Means

STAT 113 Analytic Intervals and Tests for Differences Between Two Groups

Colin Reimer Dawson

Oberlin College

November 9, 2017

SLIDE 2

Cases to Address

We will need standard errors to do CIs and tests for the following parameters:

  1. Single Proportion (Last Time)
  2. Single Mean (Wrap Up Today)
  3. Difference of Proportions (Today)
  4. Difference of Means (Today)
  5. Mean of Differences (Next Week)

SLIDE 3

  • Variability of a Difference
  • CI and Test for Difference of Proportions
  • CI and Test for Difference of Means

SLIDE 4

Example: Penguins Again!

Penguin Breeding

The scientists who studied whether metal bands were harmful to penguin survival also examined whether they affected the penguins’ breeding patterns. For the metal-band group, 39 of 122 penguin-seasons (breeding seasons combined across penguins) resulted in offspring (32%). In the control group, 70 of 160 penguin-seasons resulted in offspring (44%).

  • If we want to construct a confidence interval or do a test about the difference in the proportion of “breeding opportunities” that were successful, what is the relevant population parameter?
  • What is the relevant sample statistic?

SLIDE 5

Outline

  • Variability of a Difference
  • CI and Test for Difference of Proportions
  • CI and Test for Difference of Means

SLIDE 6

Variance and Standard Error of Differences

  • With two independent samples, A and B, quantities such as $\hat{p}_A - \hat{p}_B$ that depend on both random samples have two independent sources of variability.
  • So the difference is more variable than either sample statistic alone.
  • Specifically, the variance of the difference is the sum of the separate variances:

$$s^2_{\hat{p}_A - \hat{p}_B} = s^2_{\hat{p}_A} + s^2_{\hat{p}_B}$$
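The additivity of variances can be checked empirically. The following is a minimal simulation sketch, not part of the original slides: the population proportions (0.32 and 0.44), the sample sizes, the seed, and the helper name `sample_proportion` are all hypothetical choices made for illustration.

```python
import random
import statistics


def sample_proportion(p, n, rng):
    """Proportion of successes in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n


rng = random.Random(113)   # fixed seed so the run is repeatable
p_a, n_a = 0.32, 122       # hypothetical population A and sample size
p_b, n_b = 0.44, 160       # hypothetical population B and sample size
reps = 5000                # number of simulated pairs of samples

# Draw the two samples independently, then take the difference each time.
phat_a = [sample_proportion(p_a, n_a, rng) for _ in range(reps)]
phat_b = [sample_proportion(p_b, n_b, rng) for _ in range(reps)]
diffs = [a - b for a, b in zip(phat_a, phat_b)]

var_a = statistics.pvariance(phat_a)
var_b = statistics.pvariance(phat_b)
var_diff = statistics.pvariance(diffs)

print(f"var(phat_A)            = {var_a:.6f}")
print(f"var(phat_B)            = {var_b:.6f}")
print(f"sum of the variances   = {var_a + var_b:.6f}")
print(f"var(phat_A - phat_B)   = {var_diff:.6f}")
```

The last two printed values should agree closely, and both should exceed either individual variance, which is the claim on this slide.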

SLIDE 7

Variance and Standard Error of Proportions

  • Recall: across all random samples, the standard deviation of the sample proportions (i.e., the standard error) is

$$s_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

where $p$ is the population proportion and $n$ is the sample size.

  • The variance of $\hat{p}$ is the square of this; i.e., the same thing without the square root.

SLIDE 8

Standard Error of Difference of Proportions

So the variance of the difference between two independent sample proportions is

$$s^2_{\hat{p}_A - \hat{p}_B} = \frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}$$

and the standard deviation (i.e., standard error) of the difference is

$$s_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$$

SLIDE 9

Standard Error of Difference of Means

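The exact same reasoning applies to the standard error of a difference between means of two independent samples:

$$s_{\bar{x}_A} = \frac{\sigma_A}{\sqrt{n_A}}, \qquad s_{\bar{x}_B} = \frac{\sigma_B}{\sqrt{n_B}}$$

$$s^2_{\bar{x}_A} = \frac{\sigma_A^2}{n_A}, \qquad s^2_{\bar{x}_B} = \frac{\sigma_B^2}{n_B}$$

$$s^2_{\bar{x}_A - \bar{x}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}$$

$$s_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}$$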

SLIDE 10

Analytic Approximations of Sampling Distributions

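| Param. | Stat. | Randomization | Theory SE | Test Dist. |
|---|---|---|---|---|
| $p$ | $\hat{p}$ | Simulate from $p_0$ | $\sqrt{\frac{p_0(1 - p_0)}{n}}$ | Normal |
| $\mu$ | $\bar{x}$ | Bootstrap + shift | $\frac{s}{\sqrt{n}}$ | $t_{n-1}$ |
| $p_A - p_B$ | $\hat{p}_A - \hat{p}_B$ | Scramble groups | $\sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$ | Normal |
| $\mu_A - \mu_B$ | $\bar{x}_A - \bar{x}_B$ | Scramble groups | $\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$ | $t_{\min(n_A, n_B) - 1}$ |
| $\mu_D$ | $\bar{x}_D$ | Flip pairs* | $\frac{s_D}{\sqrt{n_D}}$ | $t_{n_D - 1}$ |
| $\rho$ | $r$ | Scramble pairings | $\sqrt{\frac{1 - r^2}{n - 2}}$ | $t_{n-2}$ |

CI: Statistic $\pm$ Critical Value $\times$ SE

$$\text{Standardized Test Statistic} = \frac{\text{Statistic} - \text{Null Param.}}{SE}$$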

SLIDE 11

Outline

  • Variability of a Difference
  • CI and Test for Difference of Proportions
  • CI and Test for Difference of Means

SLIDE 12

Distribution of $\hat{p}_A - \hat{p}_B$

  • Condition: The sampling distribution of $\hat{p}_A - \hat{p}_B$ is approximately Normal when there are at least 10 cases in all four combinations:

$$n_A p_A \ge 10, \qquad n_A(1 - p_A) \ge 10, \qquad n_B p_B \ge 10, \qquad n_B(1 - p_B) \ge 10$$

  • Mean: $p_A - p_B$
  • Standard deviation (standard error):

$$SE_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$$
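A short sketch of how the four-count condition might be checked in practice. It assumes, since the population proportions are unknown, that the observed counts of successes and failures stand in for $n p$ and $n(1 - p)$. The function name `normal_approx_ok` is hypothetical; the counts are the penguin breeding data used elsewhere in these slides (39 of 122 metal-band seasons, 70 of 160 control seasons).

```python
def normal_approx_ok(successes_a, n_a, successes_b, n_b, minimum=10):
    """Check that all four success/failure counts reach the minimum.

    Assumption: observed counts are used in place of n*p and n*(1-p),
    because the population proportions are not known.
    """
    counts = {
        "A successes": successes_a,
        "A failures": n_a - successes_a,
        "B successes": successes_b,
        "B failures": n_b - successes_b,
    }
    return all(c >= minimum for c in counts.values()), counts


# Penguin breeding data: metal-band group (A) and control group (B).
ok, counts = normal_approx_ok(39, 122, 70, 160)
for label, count in counts.items():
    print(f"{label}: {count}")
print("Normal approximation reasonable:", ok)
```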

SLIDE 13

Confidence Interval for a Difference of Proportions

CI Summary: Difference of Proportions

To compute a confidence interval for a difference of two proportions when the sampling distribution of $\hat{p}_A - \hat{p}_B$ is approximately Normal (see the previous slide for conditions):

  1. Find the standardized endpoints, $Z^*$, for the confidence level, using a standard Normal.
  2. “Destandardize” to get the endpoints:

$$\hat{p}_A - \hat{p}_B \pm Z^* \cdot \sqrt{\frac{\hat{p}_A(1 - \hat{p}_A)}{n_A} + \frac{\hat{p}_B(1 - \hat{p}_B)}{n_B}}$$

Why do we use $\hat{p}_A$ and $\hat{p}_B$ in the standard error, again?

SLIDE 14

P-values for a difference of two sample proportions from a Standard Normal

Computing P-values when the null sampling distribution is approximately Normal (see previously stated conditions) is the reverse process:

  1. Convert $\hat{p}_A - \hat{p}_B$ to a z-score within the theoretical null sampling distribution (i.e., using its mean and standard deviation):

$$Z_{\text{observed}} = \frac{\hat{p}_A - \hat{p}_B - 0}{?}$$

  2. Find the relevant area beyond $Z_{\text{observed}}$ using a Standard Normal.

SLIDE 15

Null Standard Error

  • Problem: $H_0$ states what the difference is, but the standard error depends on each population proportion:

$$s_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$$

  • This is not a function of the difference.
  • But $H_0$ says that $p_A$ and $p_B$ are the same thing, so we can estimate this single number using $\hat{p}_{\text{combined}}$, the proportion of the relevant category across both groups.
  • Note: we already hold this proportion constant when doing a randomization test.

SLIDE 16

P-values for a difference of two proportions

Computing P-values when the null sampling distribution is approximately Normal (see previously stated conditions) is the reverse process:

  1. Convert $\hat{p}_A - \hat{p}_B$ to a z-score within the theoretical null sampling distribution (i.e., using its mean and standard deviation):

$$Z_{\text{observed}} = \frac{\hat{p}_A - \hat{p}_B - 0}{\sqrt{\frac{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})}{n_A} + \frac{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})}{n_B}}} = \frac{\hat{p}_A - \hat{p}_B - 0}{\sqrt{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$$

  2. Find the relevant area beyond $Z_{\text{observed}}$ using a Standard Normal.

SLIDE 17

Example: Penguins Again!

Penguin Breeding

The scientists who studied whether metal bands were harmful to penguin survival also examined whether they affected the penguins’ breeding patterns. For the metal-band group, 39 of 122 penguin-seasons (breeding seasons combined across penguins) resulted in offspring (32%). In the control group, 70 of 160 penguin-seasons resulted in offspring (44%).

$n_{\text{metal}} =$
$n_{\text{control}} =$
$\hat{p}_{\text{metal}} =$
$\hat{p}_{\text{control}} =$
$\hat{p}_{\text{metal}} - \hat{p}_{\text{control}} =$

SLIDE 18

Penguins Breed Confidence (Interval)

Is the Normal approximation reasonable?

CI: point estimate $\pm\ Z^* \cdot SE$

$$SE = \sqrt{\frac{\hat{p}_A(1 - \hat{p}_A)}{n_A} + \frac{\hat{p}_B(1 - \hat{p}_B)}{n_B}} =$$

$Z^* =$

Find a 90% CI for $p_A - p_B$.
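One way to carry out this computation is sketched below. It is an illustration rather than the course's own solution: the function name `diff_prop_ci` is hypothetical, group A is taken to be the metal-band group and group B the control group, and the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_ci(x_a, n_a, x_b, n_b, level):
    """Normal-theory CI for p_A - p_B from success counts and sample sizes."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Unpooled SE: each sample proportion estimates its own population value.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Z* leaves (1 - level) / 2 in each tail of a standard Normal.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p_a - p_b
    return diff, se, z_star, (diff - z_star * se, diff + z_star * se)


# Metal-band group: 39 of 122; control group: 70 of 160; 90% confidence.
diff, se, z_star, (lower, upper) = diff_prop_ci(39, 122, 70, 160, 0.90)
print(f"difference = {diff:.4f}")
print(f"SE         = {se:.4f}")
print(f"Z*         = {z_star:.3f}")
print(f"90% CI     = ({lower:.4f}, {upper:.4f})")
```

With A as the metal-band group, both endpoints come out negative, so the interval lies entirely below zero.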

SLIDE 19

Do metal bands reduce breeding chances?

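$$Z_{\text{observed}} = \frac{\text{observed difference} - \text{null difference}}{\text{standard error}}$$

$$SE = \sqrt{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}$$

$\hat{p}_{\text{combined}} =$
$SE =$
$Z_{\text{observed}} =$
P-value =
Decision?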
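The test can be sketched in a few lines. This is an illustration under stated assumptions, not the course's own solution: the function name `pooled_z_test` is hypothetical, group A is the metal-band group, and the alternative is taken to be one-sided ($p_{\text{metal}} < p_{\text{control}}$) because the slide asks whether bands reduce breeding chances.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z_test(x_a, n_a, x_b, n_b):
    """z statistic for H0: p_A = p_B, using the combined (pooled) proportion."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Under H0 both groups share one proportion, estimated from all cases.
    p_combined = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_combined * (1 - p_combined) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b - 0) / se
    return p_combined, se, z


# Metal-band group: 39 of 122; control group: 70 of 160.
p_combined, se, z_observed = pooled_z_test(39, 122, 70, 160)

# Assumed one-sided alternative: area below z_observed in a standard Normal.
p_value = NormalDist().cdf(z_observed)

print(f"p_combined = {p_combined:.4f}")
print(f"SE         = {se:.4f}")
print(f"Z_observed = {z_observed:.3f}")
print(f"P-value    = {p_value:.4f}")
```

The decision then depends on comparing the P-value to whatever significance level was chosen in advance; the slide does not state one.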

SLIDE 20

Outline

  • Variability of a Difference
  • CI and Test for Difference of Proportions
  • CI and Test for Difference of Means

SLIDE 21

Distribution of Difference of Sample Means

Distribution of $\bar{x}$ ($\sigma$ known)

  • Two populations with means $\mu_A$ and $\mu_B$, and standard deviations $\sigma_A$ and $\sigma_B$
  • Conditions: Sampling distribution of $\bar{x}_A - \bar{x}_B$ is Normal if
      • Populations are Normal, or
      • Sample sizes in each group are large enough (roughly, can use $n_A, n_B \ge 27$)
  • Mean: $\mu_A - \mu_B$
  • Standard deviation (standard error):

$$SE_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}$$

SLIDE 22

Estimating $\sigma_A$ and $\sigma_B$

Distribution of standardized difference

  • Since we don’t know $\sigma_A$ and $\sigma_B$, we use

$$SE_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

  • Now, the standardized difference has a t-distribution:

$$\frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \sim t_{df}$$

  • A conservative rule is to set $df = \min(n_A, n_B) - 1$

SLIDE 23

Example: Credit Card Use and Tip Percentage

Credit Cards and Tip Percentage

We analyze the percent tip left on 157 bills from the First Crush bistro in Northern New York State. The mean percent tip left on the 106 bills paid in cash was 16.39 with a standard deviation of 5.05. The mean percent tip left on the 51 bills paid with a credit card was 17.10 with a standard deviation of 2.47.

    library("mosaic")
    library("Lock5Data")
    data("RestaurantTips")
    # Dot plots of percent tip, one panel per value of Credit (n / y)
    dotPlot(~PctTip | Credit, data = RestaurantTips, width = 1, cex = 1)

[Figure: dot plots of PctTip (horizontal axis) against Count (vertical axis), in separate panels for Credit = n and Credit = y]

SLIDE 24

Credit Cards and Tip Percentage

We analyze the percent tip left on 157 bills from the First Crush bistro in Northern New York State. The mean percent tip left on the 106 bills paid in cash was 16.39 with a standard deviation of 5.05. The mean percent tip left on the 51 bills paid with a credit card was 17.10 with a standard deviation of 2.47.

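$n_{\text{card}} =$
$n_{\text{cash}} =$
$\bar{x}_{\text{card}} =$
$\bar{x}_{\text{cash}} =$
$s^2_{\text{card}} =$
$s^2_{\text{cash}} =$
$\bar{x}_{\text{card}} - \bar{x}_{\text{cash}} =$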

SLIDE 25

Card Vs. Cash Confidence Interval

Is the Normal approximation reasonable?

CI: point estimate $\pm\ T^* \cdot SE$

$$SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} =$$

$T^* =$

Find a 99% CI for $\mu_{\text{card}} - \mu_{\text{cash}}$.

SLIDE 26

Do people who pay differently tip differently?

$$T_{\text{observed}} = \frac{\text{observed difference} - \text{null difference}}{\text{standard error}}$$

$$SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

$SE =$
$T_{\text{observed}} =$
P-value =
Decision?
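The card-versus-cash interval and test can both be sketched from the summary statistics quoted in the example (106 cash bills: mean 16.39, SD 5.05; 51 card bills: mean 17.10, SD 2.47). This is an illustration under stated assumptions, not the course's own solution. Python's standard library has no t-distribution, so the helpers `t_pdf`, `t_cdf`, and `t_quantile` below are hypothetical stand-ins that integrate the t density numerically; in R, `pt` and `qt` provide the same quantities. The test is assumed to be two-sided, since the slide asks whether tips differ, and the degrees of freedom follow the conservative rule $\min(n_A, n_B) - 1$.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(prob, df):
    """Invert t_cdf by bisection; intended for 0.5 < prob < 1."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < prob:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Summary statistics from the tip example.
n_card, mean_card, sd_card = 51, 17.10, 2.47
n_cash, mean_cash, sd_cash = 106, 16.39, 5.05

diff = mean_card - mean_cash
se = sqrt(sd_card**2 / n_card + sd_cash**2 / n_cash)
df = min(n_card, n_cash) - 1          # conservative degrees of freedom

# 99% CI: T* leaves 0.5% in each tail.
t_star = t_quantile(0.995, df)
ci = (diff - t_star * se, diff + t_star * se)

# Two-sided test of H0: mu_card - mu_cash = 0.
t_observed = (diff - 0) / se
p_value = 2 * (1 - t_cdf(abs(t_observed), df))

print(f"difference = {diff:.2f}, SE = {se:.4f}, df = {df}")
print(f"T* = {t_star:.3f}, 99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"T_observed = {t_observed:.3f}, two-sided P-value = {p_value:.3f}")
```

Numerical integration is used only to keep the sketch free of third-party dependencies; any statistics package's t-distribution functions would serve the same purpose.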

SLIDE 27

Analytic Approximations of Sampling Distributions

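| Param. | Stat. | Randomization | Theory SE | Test Dist. |
|---|---|---|---|---|
| $p$ | $\hat{p}$ | Simulate from $p_0$ | $\sqrt{\frac{p_0(1 - p_0)}{n}}$ | Normal |
| $\mu$ | $\bar{x}$ | Bootstrap + shift | $\frac{s}{\sqrt{n}}$ | $t_{n-1}$ |
| $p_A - p_B$ | $\hat{p}_A - \hat{p}_B$ | Scramble groups | $\sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$ | Normal |
| $\mu_A - \mu_B$ | $\bar{x}_A - \bar{x}_B$ | Scramble groups | $\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$ | $t_{\min(n_A, n_B) - 1}$ |
| $\mu_D$ | $\bar{x}_D$ | Flip pairs* | $\frac{s_D}{\sqrt{n_D}}$ | $t_{n_D - 1}$ |
| $\rho$ | $r$ | Scramble pairings | $\sqrt{\frac{1 - r^2}{n - 2}}$ | $t_{n-2}$ |

CI: Statistic $\pm$ Critical Value $\times$ SE

$$\text{Standardized Test Statistic} = \frac{\text{Statistic} - \text{Null Param.}}{SE}$$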
