
Inference for Proportions Marc H. Mehlman marcmehlman@yahoo.com - PowerPoint PPT Presentation



  1. Inference for Proportions Marc H. Mehlman marcmehlman@yahoo.com University of New Haven Based on Rare Event Rule: “rare events happen – but not to me”. Marc Mehlman (University of New Haven) Inference for Proportions 1 / 20

  2. Table of Contents: (1) Inference for a Single Proportion; (2) Comparing Two Proportions.

  3. Section: Inference for a Single Proportion

  4. Inference for a Single Proportion

     Let $X_1, \dots, X_n$ be a random sample from $\mathrm{BIN}(1, p)$. Then $X = \sum_{j=1}^{n} X_j \sim \mathrm{BIN}(n, p)$.

     Definition. The sample proportion is $\hat{p} \overset{\text{def}}{=} X/n = \bar{X}$. The standard error of $\hat{p}$ is $SE_{\hat{p}} \overset{\text{def}}{=} \sqrt{\hat{p}(1-\hat{p})/n}$.

     By the CLT, $\bar{X}$ is approximately $N\left(p, \sqrt{p(1-p)/n}\right)$ for big $n$, and also $\hat{p}$ is approximately $p$ for big $n$. Thus for big $n$, $\bar{X}$ is approximately $N\left(\hat{p}, \sqrt{\hat{p}(1-\hat{p})/n}\right)$.

     Theorem (Large Sample Confidence Interval for $p$). The margin of error is $m = z^\star \sqrt{\hat{p}(1-\hat{p})/n} = z^\star SE_{\hat{p}}$ and the confidence interval is $\hat{p} \pm m$. Use this interval for confidence levels of 90% or more and when the numbers of successes and failures are both at least 15.

  5. Inference for a Single Proportion

     We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

     What is the sample proportion $\hat{p}$? $\hat{p} = 23/440 \approx 0.052$.

     For a 90% confidence level, $z^\star = 1.645$:

     C    0.50   0.60   0.70   0.80   0.90   0.95   0.96
     z*   0.674  0.841  1.036  1.282  1.645  1.960  2.054

     Using the large sample method, $m = z^\star \sqrt{\hat{p}(1-\hat{p})/n} = 1.645 \sqrt{0.052(1-0.052)/440} = 1.645 \times 0.0106 \approx 0.017$, so the 90% CI for $p$ is $\hat{p} \pm m = 0.052 \pm 0.017$.

     With 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.
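
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `large_sample_ci` is hypothetical and not from the slides, and $z^\star = 1.645$ is taken from the table rather than computed.

```python
import math

def large_sample_ci(successes, n, z_star):
    """Large-sample interval p_hat +/- z* * sqrt(p_hat (1 - p_hat) / n).

    Hypothetical helper for illustration; returns (p_hat, m, lower, upper).
    """
    p_hat = successes / n                      # sample proportion X / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    m = z_star * se                            # margin of error
    return p_hat, m, p_hat - m, p_hat + m

# Arthritis example from the slide: 23 of 440 patients, 90% confidence.
p_hat, m, lower, upper = large_sample_ci(23, 440, 1.645)
print(f"p_hat = {p_hat:.3f}, m = {m:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

Without intermediate rounding the upper endpoint comes out near 0.070 rather than 0.069; the slide's value follows from rounding $\hat{p}$ and $m$ to three decimals before adding them.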

  6. Inference for a Single Proportion

     “Plus four” confidence interval for $p$

     The “plus four” method gives reasonably accurate confidence intervals. We act as if we had four additional observations, two successes and two failures. Thus, the new sample size is $n + 4$ and the count of successes is $X + 2$.

     The “plus four” estimate of $p$ is $\tilde{p} = \dfrac{\text{count of successes} + 2}{\text{count of all observations} + 4}$.

     An approximate level $C$ confidence interval is $\tilde{p} \pm m$, with $m = z^\star SE = z^\star \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$.

     Use this method when $C$ is at least 90% and the sample size is at least 10.

  7. Inference for a Single Proportion

     We want a 90% CI for the population proportion of arthritis patients who suffer some “adverse symptoms.”

     What is the value of the “plus four” estimate of $p$? $\tilde{p} = \dfrac{23 + 2}{440 + 4} = \dfrac{25}{444} \approx 0.056$.

     An approximate 90% confidence interval for $p$ using the “plus four” method is $\tilde{p} \pm m$, where $m = z^\star \sqrt{\tilde{p}(1-\tilde{p})/(n+4)} = 1.645 \sqrt{0.056(1-0.056)/444} = 1.645 \times 0.011 \approx 0.018$, that is, $0.056 \pm 0.018$.

     With 90% confidence, between 3.8% and 7.4% of the population of arthritis patients taking this pain medication experience some adverse symptoms.

     C    0.50   0.60   0.70   0.80   0.90   0.95   0.96   0.98   0.99   0.995  0.998  0.999
     z*   0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291
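
A minimal sketch of the “plus four” computation for the same data follows. The function name `plus_four_ci` is an illustrative assumption, not something defined in the slides.

```python
import math

def plus_four_ci(successes, n, z_star):
    """'Plus four' interval: add two successes and two failures, then
    proceed as in the large-sample method with n + 4 observations.

    Hypothetical helper for illustration; returns (p_tilde, m, lower, upper).
    """
    p_tilde = (successes + 2) / (n + 4)                  # plus-four estimate
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))    # its standard error
    m = z_star * se                                      # margin of error
    return p_tilde, m, p_tilde - m, p_tilde + m

# Arthritis example: 23 of 440 patients, 90% confidence (z* = 1.645).
p_tilde, m4, lower4, upper4 = plus_four_ci(23, 440, 1.645)
print(f"p_tilde = {p_tilde:.3f}, m = {m4:.3f}, CI = ({lower4:.3f}, {upper4:.3f})")
```

With a sample this large the adjustment shifts the interval only slightly; it matters more for small samples or proportions near 0 or 1.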

  8. Inference for a Single Proportion

     Theorem (Sample Size). Given a desired margin of error $m$, one should choose the following sample size $n$ to obtain the confidence interval $\hat{p} \pm m$ (or $\tilde{p} \pm m$) for $p$:

     $n = \left(\dfrac{z^\star}{m}\right)^2 p^\star (1 - p^\star)$ when $p^\star$ is an educated guess of what $p$ is;

     $n = \left(\dfrac{z^\star}{2m}\right)^2$ with no educated guess of $p$.

     Notes:
     1. Round $n$ up to ensure it is a positive integer.
     2. The closer one's educated guess $p^\star$ of $p$ is to 1/2, the safer one is.
     3. $n = \dfrac{(z^\star)^2}{4m^2}$ (i.e., when $p^\star = 1/2$) is the most conservative estimate of $n$.


  12. Inference for a Single Proportion

      What sample size would we need in order to achieve a margin of error of no more than 0.01 (1 percentage point) with a 90% confidence level?

      We could use 0.5 for our guessed $p^\star$. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer “adverse symptoms” (a better guess than 50%).

      For a 90% confidence level, $z^\star = 1.645$:

      C    0.50   0.60   0.70   0.80   0.90   0.95   0.96
      z*   0.674  0.841  1.036  1.282  1.645  1.960  2.054

      $n = \left(\dfrac{z^\star}{m}\right)^2 p^\star (1 - p^\star) = \left(\dfrac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$

      To obtain a margin of error of no more than 0.01, we need a sample size $n$ of at least 2436 arthritis patients.
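
The sample-size rule can be sketched as below. The function name `sample_size` and its `p_guess` parameter are hypothetical, introduced only for illustration. With $z^\star = 1.645$, $m = 0.01$ and $p^\star = 0.1$, the product $(164.5)^2 \times 0.09$ evaluates to about 2435.4, which rounds up to 2436.

```python
import math

def sample_size(m, z_star, p_guess=None):
    """Smallest n giving margin of error at most m (hypothetical helper).

    With an educated guess p_guess:  n = (z*/m)^2 * p_guess * (1 - p_guess).
    Without one (conservative case): n = (z*/(2m))^2, i.e. p_guess = 1/2.
    """
    if p_guess is None:
        raw = (z_star / (2 * m)) ** 2
    else:
        raw = (z_star / m) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(raw)   # always round up

n_guess = sample_size(0.01, 1.645, p_guess=0.1)   # educated guess of 10%
n_conservative = sample_size(0.01, 1.645)         # no guess
print(n_guess, n_conservative)
```

The conservative choice requires nearly three times as many patients here, which is the price of making no assumption about $p$.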

  13. Inference for a Single Proportion

      Theorem (Large Sample z-Test for a Population Proportion). Let $X_1, \dots, X_n$ be a random sample where $X_j \sim \mathrm{BIN}(1, p)$ and such that $np \ge 10$ and $n(1-p) \ge 10$. Let $H_0: p = p_0$ where $p$ is unknown. Then

      $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0, 1)$

      is a test statistic for $H_0$.

  14. Inference for a Single Proportion

      Example. A potato-chip producer has just received a truckload of potatoes from its main supplier. If the producer determines that more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the $\alpha = 0.10$ significance level. What should the producer conclude?

      We want to perform a test at the $\alpha = 0.10$ significance level of

      $H_0: p = 0.08$
      $H_a: p > 0.08$

      where $p$ is the actual proportion of potatoes in this shipment with blemishes. If conditions are met, we should do a one-sample z test for the population proportion $p$.

      Random: The supervisor took a random sample of 500 potatoes from the shipment.

      Normal: Assuming $H_0: p = 0.08$ is true, the expected numbers of blemished and unblemished potatoes are $np_0 = 500(0.08) = 40$ and $n(1 - p_0) = 500(0.92) = 460$, respectively. Because both of these values are at least 10, we should be safe doing Normal calculations.

  15. Inference for a Single Proportion

      Example (continued). The sample proportion of blemished potatoes is $\hat{p} = 47/500 = 0.094$.

      Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.094 - 0.08}{\sqrt{0.08(0.92)/500}} = 1.15$

      P-value: The desired P-value is $P(z \ge 1.15) = 1 - 0.8749 = 0.1251$.

      Since our P-value, 0.1251, is greater than the chosen significance level of $\alpha = 0.10$, we fail to reject $H_0$. There is not sufficient evidence to conclude that the shipment contains more than 8% blemished potatoes. The producer will use this truckload of potatoes to make potato chips.
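
The potato example can be reproduced with the standard library alone. This is a sketch under stated assumptions: the function name `one_sample_z_test` is hypothetical, and the upper-tail Normal probability is computed from `math.erfc` instead of being read from a table, so the P-value differs slightly from the slide's table-based 0.1251.

```python
import math

def one_sample_z_test(successes, n, p0):
    """One-sided (H_a: p > p0) large-sample z test for a proportion.

    Hypothetical helper for illustration; returns (p_hat, z, p_value).
    """
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)    # standard deviation of p_hat under H_0
    z = (p_hat - p0) / se0                # test statistic
    # Upper-tail standard Normal probability: P(Z >= z) = erfc(z / sqrt(2)) / 2
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value

# Potato example: 47 blemished out of 500, H_0: p = 0.08, alpha = 0.10.
p_hat_pot, z_stat, p_val = one_sample_z_test(47, 500, 0.08)
alpha = 0.10
print(f"p_hat = {p_hat_pot:.3f}, z = {z_stat:.2f}, P-value = {p_val:.4f}")
print("reject H_0" if p_val < alpha else "fail to reject H_0")
```

Because $z$ is not rounded to 1.15 before the tail probability is taken, the computed P-value is a little below 0.125, but the conclusion at $\alpha = 0.10$ is the same.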
