Inference for Proportions Marc H. Mehlman marcmehlman@yahoo.com - PowerPoint PPT Presentation

Inference for Proportions Marc H. Mehlman marcmehlman@yahoo.com University of New Haven Based on Rare Event Rule: “rare events happen – but not to me”. Marc Mehlman (University of New Haven) Inference for Proportions 1 / 20

Table of Contents
1 Inference for a Single Proportion
2 Comparing Two Proportions

Inference for a Single Proportion

Let $X_1, \dots, X_n$ be a random sample from BIN(1, p). Then $X = \sum_{j=1}^{n} X_j \sim \mathrm{BIN}(n, p)$.

Definition
The sample proportion is
$$\hat{p} \stackrel{\text{def}}{=} \frac{X}{n} = \bar{X}.$$
The standard error of $\hat{p}$ is
$$SE_{\hat{p}} \stackrel{\text{def}}{=} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

By the CLT, $\bar{X}$ is approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$ for big n, and also $\hat{p}$ is approximately $p$ for big n. Thus for big n, $\bar{X}$ is approximately $N\left(\hat{p}, \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$.

Theorem (Large Sample Confidence Interval for p)
$$\text{margin of error} = m = z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = z^\star \, SE_{\hat{p}}$$
and the confidence interval is $\hat{p} \pm m$.
Use this interval for confidence levels of 90% or more and when the numbers of successes and failures are both at least 15.

Example
We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion $\hat{p}$?
$$\hat{p} = \frac{23}{440} \approx 0.052$$

For a 90% confidence level, z* = 1.645.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054

Using the large sample method:
$$m = z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.645 \sqrt{\frac{0.052(1-0.052)}{440}} = 1.645 \times 0.0106 \approx 0.017$$
90% CI for p: $\hat{p} \pm m = 0.052 \pm 0.017$

With 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.
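The arithmetic of this example can be checked with a short Python sketch. The function name `large_sample_ci` is hypothetical (it does not come from the slides), and z* is computed from the standard normal distribution rather than read from the table, so it is 1.6449 rather than the rounded 1.645.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(successes, n, confidence=0.90):
    """Large-sample confidence interval p_hat +/- z* sqrt(p_hat(1 - p_hat)/n).

    Hypothetical helper for illustration; not part of the original slides.
    """
    p_hat = successes / n
    # z* leaves (1 - confidence) / 2 probability in each tail of N(0, 1)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Arthritis example: 23 patients with adverse symptoms out of 440
p_hat, margin, (low, high) = large_sample_ci(23, 440, confidence=0.90)
print(f"p_hat = {p_hat:.3f}, m = {margin:.3f}, CI = ({low:.3f}, {high:.3f})")
```

Because the sketch carries unrounded intermediate values, its endpoints can differ from the hand calculation in the last printed digit.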

"Plus four" confidence interval for p

The "plus four" method gives reasonably accurate confidence intervals. We act as if we had four additional observations, two successes and two failures. Thus, the new sample size is n + 4 and the count of successes is X + 2.

The "plus four" estimate of p is:
$$\tilde{p} = \frac{\text{count of successes} + 2}{\text{count of all observations} + 4}$$

An approximate level C confidence interval is $\tilde{p} \pm m$, with
$$m = z^\star \, SE_{\tilde{p}} = z^\star \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$

Use this method when C is at least 90% and the sample size is at least 10.

Example
We want a 90% CI for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the value of the "plus four" estimate of p?
$$\tilde{p} = \frac{23 + 2}{440 + 4} = \frac{25}{444} \approx 0.056$$

An approximate 90% confidence interval for p using the "plus four" method is:
$$m = z^\star \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} = 1.645 \sqrt{\frac{0.056(1-0.056)}{444}} = 1.645 \times 0.011 \approx 0.018$$
90% CI for p: $\tilde{p} \pm m = 0.056 \pm 0.018$

With 90% confidence, between 3.8% and 7.4% of the population of arthritis patients taking this pain medication experience some adverse symptoms.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96   0.98   0.99   0.995  0.998  0.999
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291
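A minimal Python sketch of the "plus four" calculation follows. The name `plus_four_ci` is an illustrative assumption, and z* is again computed from the standard normal distribution instead of being taken from the table.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(successes, n, confidence=0.90):
    """'Plus four' interval: add two successes and two failures, then proceed
    as in the large-sample method with sample size n + 4.

    Hypothetical helper for illustration; not part of the original slides.
    """
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, margin, (p_tilde - margin, p_tilde + margin)


# Arthritis example: 23 of 440 patients report adverse symptoms
p_tilde, margin, (low, high) = plus_four_ci(23, 440, confidence=0.90)
print(f"p_tilde = {p_tilde:.3f}, m = {margin:.3f}, CI = ({low:.3f}, {high:.3f})")
```

For these data the sketch agrees with the hand calculation to three decimal places, giving an interval of roughly 0.038 to 0.074.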

Theorem (Sample Size)
Given a desired margin of error, m, one should choose the following sample size, n, to obtain the confidence interval $\hat{p} \pm m$ (or $\tilde{p} \pm m$) of p:
$$n = \begin{cases} \left(\dfrac{z^\star}{m}\right)^2 p^\star (1 - p^\star) & \text{when } p^\star \text{ is an educated guess of what } p \text{ is} \\[2ex] \left(\dfrac{z^\star}{2m}\right)^2 & \text{with no educated guess of } p \end{cases}$$

Note:
1 round up n to ensure it is a positive integer.
2 the closer one's educated guess, $p^\star$, of p is to 1/2, the safer one is.
3 $n = \dfrac{(z^\star)^2}{4m^2}$ (i.e., when $p^\star = 1/2$) is the most conservative estimate of n.

Example
What sample size would we need in order to achieve a margin of error of no more than 0.01 (1 percentage point) with a 90% confidence level?

We could use 0.5 for our guessed p*. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer "adverse symptoms" (a better guess than 50%).

For a 90% confidence level, z* = 1.645.
$$n = \left(\frac{z^\star}{m}\right)^2 p^\star (1 - p^\star) = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$$

To obtain a margin of error of no more than 0.01, we need a sample size n of at least 2436 arthritis patients.
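The sample size formula is easy to evaluate directly. The sketch below uses a hypothetical helper, `sample_size`, with z* = 1.645 taken from the table; leaving `p_guess` at its default of 0.5 gives the conservative case.

```python
from math import ceil


def sample_size(margin, z_star, p_guess=0.5):
    """Smallest n with z* sqrt(p*(1 - p*)/n) <= margin.

    p_guess = 0.5 is the conservative choice when no educated guess is available.
    Hypothetical helper for illustration; not part of the original slides.
    """
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


# 90% confidence (z* = 1.645), margin of error 0.01
n_guess = sample_size(0.01, 1.645, p_guess=0.1)  # educated guess p* = 0.1
n_conservative = sample_size(0.01, 1.645)        # no guess: p* = 0.5
print(n_guess, n_conservative)
```

Evaluating (1.645/0.01)^2 (0.1)(0.9) gives about 2435.4, which rounds up to 2436. With no educated guess the requirement rises to 6766, which shows how much a defensible guess of p* can save.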

Theorem (Large Sample z-Test for a Population Proportion)
Let $X_1, \dots, X_n$ be a random sample where $X_j \sim \mathrm{BIN}(1, p)$ and such that $np \ge 10$ and $n(1-p) \ge 10$. Let $H_0: p = p_0$ where p is unknown. Then
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \sim N(0, 1)$$
is a test statistic for $H_0$.

Example
A potato-chip producer has just received a truckload of potatoes from its main supplier. If the producer determines that more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the α = 0.10 significance level. What should the producer conclude?

We want to perform a test at the α = 0.10 significance level of
$$H_0: p = 0.08 \qquad H_a: p > 0.08$$
where p is the actual proportion of potatoes in this shipment with blemishes.

If conditions are met, we should do a one-sample z test for the population proportion p.

Random: The supervisor took a random sample of 500 potatoes from the shipment.

Normal: Assuming $H_0: p = 0.08$ is true, the expected numbers of blemished and unblemished potatoes are $np_0 = 500(0.08) = 40$ and $n(1 - p_0) = 500(0.92) = 460$, respectively. Because both of these values are at least 10, we should be safe doing Normal calculations.

Example (continued)
The sample proportion of blemished potatoes is $\hat{p} = 47/500 = 0.094$.

Test statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.094 - 0.08}{\sqrt{\dfrac{0.08(0.92)}{500}}} = 1.15$$

P-value: The desired P-value is
$$P(z \ge 1.15) = 1 - 0.8749 = 0.1251$$

Since our P-value, 0.1251, is greater than the chosen significance level of α = 0.10, we fail to reject $H_0$. There is not sufficient evidence to conclude that the shipment contains more than 8% blemished potatoes. The producer will use this truckload of potatoes to make potato chips.
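The test can be reproduced with the Python standard library. The function name `one_proportion_z_test` is a hypothetical label for this sketch, which computes only the upper-tailed P-value that matches $H_a: p > p_0$.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Large-sample z test of H0: p = p0 against Ha: p > p0.

    Hypothetical helper for illustration; not part of the original slides.
    """
    p_hat = successes / n
    # The standard error uses p0, because the test assumes H0 is true
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area under N(0, 1)
    return z, p_value


# Potato example: 47 blemished potatoes in a sample of 500, H0: p = 0.08
alpha = 0.10
z, p_value = one_proportion_z_test(47, 500, p0=0.08)
reject = p_value < alpha
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

The sketch does not round z to two decimals before looking up the tail area, so its P-value is slightly below the table-based 0.1251. The conclusion is the same: the P-value exceeds α = 0.10 and $H_0$ is not rejected.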
