Inference for Proportions
Marc H. Mehlman marcmehlman@yahoo.com
University of New Haven
Based on Rare Event Rule: “rare events happen – but not to me”.
Marc Mehlman (University of New Haven) Inference for Proportions 1 / 20
Inference for a Single Proportion
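Let $X_1, \dots, X_n$ be a random sample with $X_j \sim \mathrm{BIN}(1, p)$, so that $X = \sum_{j=1}^{n} X_j \sim \mathrm{BIN}(n, p)$.
The sample proportion is $\hat{p} \overset{\text{def}}{=} \frac{X}{n} = \bar{X}$.
The standard error of $\hat{p}$ is $SE_{\hat{p}} \overset{\text{def}}{=} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.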
We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion $\hat{p}$?
$$\hat{p} = \frac{23}{440} \approx 0.052$$

For a 90% confidence level, z* = 1.645. Using the large-sample method:
$$m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.645 \sqrt{\frac{0.052(1-0.052)}{440}} = 1.645 \times 0.0106 \approx 0.017$$

90% CI for p: $\hat{p} \pm m = 0.052 \pm 0.017$

With 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054
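The large-sample calculation above can be checked with a short Python sketch. The function name `large_sample_ci` is made up for illustration, and z* is computed from the standard library's `statistics.NormalDist` instead of being read from the table. This is a sketch under those assumptions, not code from the course.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(successes, n, confidence):
    """Return (p_hat, margin) for the large-sample interval p_hat +/- z* * SE."""
    p_hat = successes / n
    # z* is the standard Normal quantile that leaves (1 - C)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Arthritis example: 23 of 440 patients reported adverse symptoms
p_hat, m = large_sample_ci(23, 440, 0.90)
print(f"p_hat = {p_hat:.3f}, margin = {m:.3f}")
print(f"90% CI: ({p_hat - m:.3f}, {p_hat + m:.3f})")
```

Because the code does not round p̂ and z* before multiplying, its endpoints can differ from the hand calculation in the third decimal place.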
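The “plus four” method gives reasonably accurate confidence intervals. It acts as if the sample contained four additional observations: two successes and two failures. Thus, the new sample size is n + 4 and the count of successes is X + 2.

The “plus four” estimate of p is:
$$\tilde{p} = \frac{X + 2}{n + 4}$$

An approximate level C confidence interval is:
$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$

Use this method when C is at least 90% and the sample size is at least 10.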
We want a 90% CI for the population proportion of arthritis patients who suffer some “adverse symptoms.”

What is the value of the “plus four” estimate of p?
$$\tilde{p} = \frac{23 + 2}{440 + 4} = \frac{25}{444} \approx 0.056$$

An approximate 90% confidence interval for p using the “plus four” method is:
$$m = z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} = 1.645 \sqrt{\frac{0.056(1-0.056)}{444}} = 1.645 \times 0.011 \approx 0.018$$

90% CI for p: $\tilde{p} \pm m = 0.056 \pm 0.018$

With 90% confidence, between 3.8% and 7.4% of the population of arthritis patients taking this pain medication experience some adverse symptoms.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96   0.98   0.99   0.995  0.998  0.999
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291
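The “plus four” interval for this example can be reproduced with a few lines of Python. The helper name `plus_four_ci` is hypothetical, and z* = 1.645 is passed in directly from the table; treat this as an illustrative sketch only.

```python
from math import sqrt


def plus_four_ci(successes, n, z_star):
    """Return (p_tilde, margin) for the single-proportion plus-four interval."""
    # Act as if two extra successes and two extra failures had been observed
    p_tilde = (successes + 2) / (n + 4)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, margin


# Arthritis example: 23 of 440 patients, 90% confidence (z* = 1.645)
p_tilde, m = plus_four_ci(23, 440, 1.645)
print(f"p_tilde = {p_tilde:.3f}, margin = {m:.3f}")
print(f"90% CI: ({p_tilde - m:.3f}, {p_tilde + m:.3f})")
```

Compared with the large-sample interval, the plus-four estimate moves the center slightly toward 1/2, which is what makes it better behaved when the observed count of successes is small.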
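The sample size needed for a level C confidence interval with margin of error m is approximately
$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*),$$
where $p^*$ is an educated guess of p.
1 Round n up to ensure it is a positive integer.
2 The closer one’s educated guess, $p^*$, of p is to 1/2, the safer one is.
3 $n = \frac{(z^*)^2}{4m^2}$ (i.e., when $p^* = 1/2$) is the most conservative estimate of n.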
What sample size would we need in order to achieve a margin of error of no more than 0.01 (1 percentage point) with a 90% confidence level?

We could use 0.5 for our guessed p*. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer “adverse symptoms” (a better guess than 50%). For a 90% confidence level, z* = 1.645.

$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$$

Rounding up, to obtain a margin of error of no more than 0.01 we need a sample size n of at least 2436 arthritis patients.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054
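The sample-size calculation, including the rounding-up step, can be sketched in Python as follows. The function name `required_sample_size` and its default `p_guess=0.5` (the conservative choice) are illustrative assumptions, not part of the original slides.

```python
from math import ceil


def required_sample_size(margin, z_star, p_guess=0.5):
    """Smallest integer n with z* * sqrt(p*(1 - p*)/n) <= margin."""
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    # Always round up: rounding down would give a margin slightly above the target
    return ceil(n)


# Educated guess p* = 0.1, margin 0.01, 90% confidence (z* = 1.645)
n_informed = required_sample_size(0.01, 1.645, p_guess=0.1)
# Conservative choice p* = 0.5
n_conservative = required_sample_size(0.01, 1.645)
print(n_informed, n_conservative)
```

With z* = 1.645 and p* = 0.1 the formula gives about 2435.4, so the rounded-up sample size is 2436. The conservative choice p* = 1/2 needs well over twice as many patients.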
A potato-chip producer has just received a truckload of potatoes from its main supplier. If more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the α = 0.10 significance level. What should the producer conclude?

We want to perform a test at the α = 0.10 significance level of
H0: p = 0.08
Ha: p > 0.08
where p is the actual proportion of potatoes in this shipment with blemishes.

If conditions are met, we should do a one-sample z test for the population proportion p.
Random: The supervisor took a random sample of 500 potatoes from the shipment.
Normal: Assuming H0: p = 0.08 is true, the expected numbers of blemished and unblemished potatoes are np0 = 500(0.08) = 40 and n(1 – p0) = 500(0.92) = 460, respectively. Because both of these values are at least 10, we should be safe doing Normal calculations.
The sample proportion of blemished potatoes is $\hat{p} = 47/500 = 0.094$.

Test statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.094 - 0.08}{\sqrt{\frac{0.08(0.92)}{500}}} = 1.15$$

P-value: The desired P-value is P(z ≥ 1.15) = 1 – 0.8749 = 0.1251.

Since our P-value, 0.1251, is greater than the chosen significance level of α = 0.10, we fail to reject H0. There is not sufficient evidence to conclude that the shipment contains more than 8% blemished potatoes.
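As a rough check of the potato example, the one-sample z test can be written in Python using only the standard library. The function name `one_sample_z_test_upper` is hypothetical, and the sketch covers only the upper-tailed alternative Ha: p > p0 used here.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test_upper(successes, n, p0):
    """Return (z, p_value) for H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # The standard deviation uses p0, the value of p assumed under H0
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability P(Z >= z)
    return z, p_value


# Potato example: 47 blemished potatoes out of 500, p0 = 0.08, alpha = 0.10
z, p_value = one_sample_z_test_upper(47, 500, 0.08)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < 0.10 else "fail to reject H0")
```

The code keeps z unrounded, so its P-value differs slightly from the table-based 0.1251, but the conclusion at α = 0.10 is the same.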
Comparing Two Proportions
We often need to compare two treatments using two independent samples. For large enough samples, the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately Normal.

However, neither $p_1$ nor $p_2$ is known.
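Let $D \overset{\text{def}}{=} \hat{p}_X - \hat{p}_Y$. Then
1 D is approximately normal for large $n_X$ and $n_Y$.
2 $\mu_D = \mu_{\hat{p}_X} - \mu_{\hat{p}_Y} = p_X - p_Y$.
3 $\sigma_D^2 = \sigma_{\hat{p}_X}^2 + \sigma_{\hat{p}_Y}^2 = \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}$.
The standard error of D is $SE_D \overset{\text{def}}{=} \sqrt{\frac{\hat{p}_X(1-\hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1-\hat{p}_Y)}{n_Y}}$.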
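The variance formula in item 3 can be justified in a few steps. The intermediate lines below are filled in here and are not from the slides. They assume the two samples are independent and write X and Y for the binomial counts of successes, so that $\hat{p}_X = X/n_X$ and $\hat{p}_Y = Y/n_Y$.

```latex
\begin{align*}
\sigma_D^2 = \operatorname{Var}(\hat{p}_X - \hat{p}_Y)
  &= \operatorname{Var}(\hat{p}_X) + \operatorname{Var}(\hat{p}_Y)
     && \text{(independent samples)} \\
  &= \frac{\operatorname{Var}(X)}{n_X^2} + \frac{\operatorname{Var}(Y)}{n_Y^2}
     && \text{(since } \hat{p}_X = X/n_X,\ \hat{p}_Y = Y/n_Y\text{)} \\
  &= \frac{n_X\, p_X(1-p_X)}{n_X^2} + \frac{n_Y\, p_Y(1-p_Y)}{n_Y^2}
     && \text{(binomial variance)} \\
  &= \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}.
\end{align*}
```

Note that the variances add even though the estimators are subtracted, which is why the standard error of a difference is larger than either individual standard error.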
Lyme disease is spread by infected ticks. Ticks feed mainly on mice. Mice feed on acorns. An experiment compared two similar forest areas in a year with low acorn amounts. One area was supplied with large amounts of acorns, and the other was left untouched. The next spring the mice populations were compared:

                          trapped mice   breeding mice
Area 1: high in acorns         72             54
Area 2: low in acorns          17             10

Find a large-sample 95% confidence interval for the difference in the proportion of breeders in the high-acorn and low-acorn areas. Also find the plus-four 95% confidence interval.

Solution for the large-sample 95% confidence interval:
$$(\hat{p}_X - \hat{p}_Y) \pm z^* \, SE_D = \left(\frac{54}{72} - \frac{10}{17}\right) \pm 1.96 \sqrt{\frac{\frac{54}{72}\left(1 - \frac{54}{72}\right)}{72} + \frac{\frac{10}{17}\left(1 - \frac{10}{17}\right)}{17}} = 0.1617647 \pm 0.2544338.$$
Thus the answer is (−0.09, 0.42) (don’t imply more accuracy than there is).

Solution for the plus-four 95% confidence interval:
$$(\tilde{p}_X - \tilde{p}_Y) \pm z^* \, SE_D = \left(\frac{55}{74} - \frac{11}{19}\right) \pm 1.96 \sqrt{\frac{\frac{55}{74}\left(1 - \frac{55}{74}\right)}{74} + \frac{\frac{11}{19}\left(1 - \frac{11}{19}\right)}{19}} = 0.1642959 \pm 0.2432937.$$
Thus the answer is (−0.08, 0.41).
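Both intervals in the mice example can be recomputed with one small Python function. The name `two_proportion_ci` and its `plus_four` flag are illustrative assumptions. The plus-four variant adds one success and one failure to each sample, matching the counts 55/74 and 11/19 used in the solution.

```python
from math import sqrt


def two_proportion_ci(x, n_x, y, n_y, z_star, plus_four=False):
    """Return (difference, margin) for a CI on p_X - p_Y."""
    if plus_four:
        # Four imaginary observations in total: one success and one failure per sample
        x, n_x, y, n_y = x + 1, n_x + 2, y + 1, n_y + 2
    p_x, p_y = x / n_x, y / n_y
    se_d = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    return p_x - p_y, z_star * se_d


# Mice example: 54 of 72 breeding (high acorns), 10 of 17 breeding (low acorns)
diff, m = two_proportion_ci(54, 72, 10, 17, 1.96)
diff4, m4 = two_proportion_ci(54, 72, 10, 17, 1.96, plus_four=True)
print(f"large-sample: {diff:.4f} +/- {m:.4f}")
print(f"plus-four:    {diff4:.4f} +/- {m4:.4f}")
```

The large-sample difference 54/72 − 10/17 is about 0.162, while the plus-four difference 55/74 − 11/19 is about 0.164. Both intervals contain 0, so neither rules out equal breeding proportions at the 95% level.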
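Let $X_1, \dots, X_{n_X}$ and $Y_1, \dots, Y_{n_Y}$ be independent random samples where $X_j \sim \mathrm{BIN}(1, p_X)$ and $Y_k \sim \mathrm{BIN}(1, p_Y)$. Let $H_0 : p_X = p_Y = p$ where p is unknown. Define the pooled estimate, $\hat{p}$, and the pooled standard error, $SE_{D_p}$, to be
$$\hat{p} \overset{\text{def}}{=} \frac{n_X \hat{p}_X + n_Y \hat{p}_Y}{n_X + n_Y}$$
and
$$SE_{D_p} \overset{\text{def}}{=} \sqrt{\frac{\hat{p}(1-\hat{p})}{n_X} + \frac{\hat{p}(1-\hat{p})}{n_Y}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}.$$
The test statistic is
$$z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} = \frac{\hat{p}_X - \hat{p}_Y}{SE_{D_p}} \sim N(0, 1) \text{ under } H_0.$$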
Gastric Freezing

Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes to cool the stomach for an hour in the hope of reducing acid production and relieving ulcer pain. The treatment was shown to be safe and to significantly reduce ulcer pain, and it was widely used for years. A randomized comparative experiment later compared the outcome of gastric freezing with that of a placebo: 28 of the 82 patients subjected to gastric freezing improved, while 30 of the 78 in the control group improved.

$H_0: p_{gf} = p_{placebo}$
$H_a: p_{gf} > p_{placebo}$
Results: 28 of the 82 patients subjected to gastric freezing improved; 30 of the 78 patients in the control group improved.

$$\hat{p}_{pooled} = \frac{28 + 30}{82 + 78} = 0.3625$$

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.342 - 0.385}{\sqrt{0.3625 \times 0.6375 \left(\frac{1}{82} + \frac{1}{78}\right)}} = \frac{-0.043}{0.076} \approx -0.57$$

[Figure: plot with horizontal axis $\hat{p}_{gf} - \hat{p}_{plac}$]

The P-value is greater than 50%. Gastric freezing was not significantly better than a placebo (P-value > 0.1), and this treatment was abandoned. ALWAYS USE A CONTROL!!!
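The pooled two-proportion z test for the gastric freezing data can be sketched in Python as follows. The function name `pooled_two_proportion_z_test` is hypothetical, and only the upper-tailed alternative Ha: pX > pY is handled, to match the hypotheses in this example.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z_test(x, n_x, y, n_y):
    """Return (p_pooled, z, p_value) for H0: p_X = p_Y against Ha: p_X > p_Y."""
    p_x, p_y = x / n_x, y / n_y
    # Under H0 both samples share one p, so pool the successes
    p_pooled = (x + y) / (n_x + n_y)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n_x + 1 / n_y))
    z = (p_x - p_y) / se_pooled
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability P(Z >= z)
    return p_pooled, z, p_value


# Gastric freezing: 28 of 82 improved; placebo: 30 of 78 improved
p_pooled, z, p_value = pooled_two_proportion_z_test(28, 82, 30, 78)
print(f"pooled p = {p_pooled:.4f}, z = {z:.2f}, P-value = {p_value:.2f}")
```

Because the observed difference is negative while the alternative is upper-tailed, the P-value comes out above 0.5, in line with the conclusion that gastric freezing was not significantly better than a placebo.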