Unit 5: Inference for categorical variables Lecture 2: Inference for 2-sample proportions Statistics 101 Thomas Leininger June 12, 2013

Announcements
Quiz tomorrow
Project reports...
Statistics 101 (Thomas Leininger) U5 - L2: Inf. for 2-sample prop. June 12, 2013

Difference of two proportions: Melting ice cap
Question: Scientists predict that global warming may have big effects on the polar regions within the next 100 years. One of the possible effects is that the northern ice cap may completely melt. Would this bother you a great deal, some, a little, or not at all if it actually happened?
(a) A great deal
(b) Some
(c) A little
(d) Not at all

Difference of two proportions: Results from the GSS
The GSS asks the same question. Below is the distribution of responses from the 2010 survey:
A great deal   454
Some           124
A little        52
Not at all      50
Total          680

Difference of two proportions: Parameter and point estimate
Parameter of interest: Difference between the proportions of all Duke students and all Americans who would be bothered a great deal by the northern ice cap completely melting: $p_{Duke} - p_{US}$
Point estimate: Difference between the proportions of sampled Duke students and sampled Americans who would be bothered a great deal by the northern ice cap completely melting: $\hat{p}_{Duke} - \hat{p}_{US}$

Difference of two proportions: Inference for comparing proportions
The details are the same as before...
CI: point estimate ± margin of error
HT: Use $Z = \frac{\text{point estimate} - \text{null value}}{SE}$ to find the appropriate p-value.
We just need the appropriate standard error of the point estimate ($SE_{\hat{p}_{Duke} - \hat{p}_{US}}$), which is the only new concept.
Standard error of the difference between two sample proportions:
$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
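The two formulas on this slide can be sketched in Python as follows. This is a minimal illustration rather than part of the original lecture; the function names `se_diff_props` and `z_stat` and the example numbers are assumptions chosen only for demonstration.

```python
import math

def se_diff_props(p1, n1, p2, n2):
    """Standard error of (p1_hat - p2_hat): sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def z_stat(point_estimate, null_value, se):
    """Z = (point estimate - null value) / SE."""
    return (point_estimate - null_value) / se

# Hypothetical proportions and sample sizes, for illustration only.
se = se_diff_props(0.5, 100, 0.5, 100)
print(round(se, 4))  # sqrt(0.005), about 0.0707
```

Which values are plugged in for p1 and p2 depends on whether a confidence interval or a hypothesis test is being carried out.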

Difference of two proportions: Confidence intervals for difference of proportions
Conditions for CI for difference of proportions
1. Independence
   within groups:
   - The US group is sampled randomly and we’re assuming that the Duke group represents a random sample as well.
   - $n_{Duke}$ < 10% of all Duke students and 680 < 10% of all Americans.
   We can assume that the attitudes of Duke students in the sample are independent of each other, and attitudes of US residents in the sample are independent of each other as well.
   between groups: The sampled Duke students and the US residents are independent of each other.
2. Success-failure: At least 10 observed successes and 10 observed failures in the two groups.

Difference of two proportions: Confidence intervals for difference of proportions
Application exercise: CI for difference of proportions
Construct a 95% confidence interval for the difference between the proportions of Duke students and Americans who would be bothered a great deal by the melting of the northern ice cap ($p_{Duke} - p_{US}$).
Data               Duke    US
A great deal               454
Not a great deal           226
Total                      680
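One way to carry out this kind of calculation is sketched below in Python. The US counts (454 of 680) come from the slide; the Duke counts are not given in this transcript, so the values used here are hypothetical placeholders, and the function name `ci_diff_props` is likewise an assumption. The critical value 1.96 corresponds to a 95% confidence level.

```python
import math

def ci_diff_props(x1, n1, x2, n2, z_star=1.96):
    """CI for p1 - p2, using the observed proportions in the standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# US counts are from the slide (454 of 680).
# The Duke counts below are HYPOTHETICAL placeholders, not the class data.
duke_great_deal, duke_n = 60, 100
lower, upper = ci_diff_props(duke_great_deal, duke_n, 454, 680)
print(f"95% CI for p_Duke - p_US: ({lower:.3f}, {upper:.3f})")
```

If the resulting interval contains 0, the data are consistent with no difference between the two population proportions at that confidence level.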

Difference of two proportions: HT for comparing proportions
Question: Which of the following is the correct set of hypotheses for testing if the proportion of all Duke students who would be bothered a great deal by the melting of the northern ice cap differs from the proportion of all Americans who do?
(a) $H_0: p_{Duke} = p_{US}$; $H_A: p_{Duke} \neq p_{US}$
(b) $H_0: \hat{p}_{Duke} = \hat{p}_{US}$; $H_A: \hat{p}_{Duke} \neq \hat{p}_{US}$
(c) $H_0: p_{Duke} - p_{US} = 0$; $H_A: p_{Duke} - p_{US} \neq 0$
(d) $H_0: p_{Duke} = p_{US}$; $H_A: p_{Duke} < p_{US}$
Both (a) and (c) are correct.

Difference of two proportions: HT for comparing proportions
Flashback to working with one proportion
When constructing a confidence interval for a population proportion, we check if the observed number of successes and failures are at least 10:
$n\hat{p} \geq 10$ and $n(1 - \hat{p}) \geq 10$
When conducting a hypothesis test for a population proportion, we check if the expected number of successes and failures are at least 10:
$np_0 \geq 10$ and $n(1 - p_0) \geq 10$

Difference of two proportions: HT for comparing proportions
Pooled estimate of a proportion
In the case of comparing two proportions where $H_0: p_1 = p_2$, there isn’t a given null value we can use to calculate the expected number of successes and failures in each sample.
Therefore, we need to first find a common (pooled) proportion for the two groups, and use that in our analysis.
This simply means finding the proportion of total successes among the total number of observations.
Pooled estimate of a proportion:
$\hat{p} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2}$

Difference of two proportions: HT for comparing proportions
Application exercise: Pooled estimate of a proportion - in context
Calculate the estimated pooled proportion of Duke students and Americans who would be bothered a great deal by the melting of the northern ice cap. Which sample proportion ($\hat{p}_{Duke}$ or $\hat{p}_{US}$) is the pooled estimate closer to? Why?
Data               Duke    US
A great deal               454
Not a great deal           226
Total                      680
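A pooled estimate, together with the expected-count version of the success-failure check, might be computed as in the Python sketch below. The function names and the counts (30 of 50 and 60 of 150) are hypothetical and chosen only for illustration; they are not the class data.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Total successes divided by total observations across both groups."""
    return (x1 + x2) / (n1 + n2)

def expected_counts_ok(p_pool, n, threshold=10):
    """Success-failure check for a test of H0: p1 = p2, using expected counts."""
    return n * p_pool >= threshold and n * (1 - p_pool) >= threshold

# HYPOTHETICAL counts for illustration: 30 of 50 and 60 of 150.
p_pool = pooled_proportion(30, 50, 60, 150)
print(p_pool)  # 90 / 200 = 0.45
print(expected_counts_ok(p_pool, 50), expected_counts_ok(p_pool, 150))
```

In this made-up example the pooled value 0.45 sits closer to the second group's 0.40 than to the first group's 0.60, because the pooled estimate is a weighted average and the second group contributes three times as many observations.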

Difference of two proportions: HT for comparing proportions
Application exercise: HT for comparing proportions
Do these data suggest that the proportion of all Duke students who would be bothered a great deal by the melting of the northern ice cap differs from the proportion of all Americans who do? Calculate the test statistic, the p-value, and interpret your conclusion in context of the data.
Data          Duke    US
$\hat{p}$             0.668
n                     680
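The test described in this exercise could be sketched in Python as below, assuming a two-sided alternative and a pooled proportion in the standard error. The function name `two_prop_z_test` is an assumption, and the Duke counts are hypothetical placeholders because the class data are not in this transcript. The two-sided p-value uses the identity 2(1 - Φ(|z|)) = erfc(|z| / √2) for the standard normal distribution.

```python
import math

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion in the SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se               # null value is 0
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return z, p_value

# US counts are from the slides (454 of 680).
# The Duke counts are HYPOTHETICAL placeholders, not the class data.
z, p_value = two_prop_z_test(60, 100, 454, 680)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value would count as evidence that the two population proportions differ; a large one would mean the observed difference is plausibly due to sampling variability.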

Recap: inference for one proportion
Population parameter: $p$, point estimate: $\hat{p}$
Conditions:
- independence - random sample
- at least 10 successes and failures - if not → randomization
Standard error: $SE = \sqrt{\frac{p(1 - p)}{n}}$
- for CI: use $\hat{p}$
- for HT: use $p_0$
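The distinction between the two standard errors can be illustrated with a short Python sketch. The sample (454 of 680) is the US group from the earlier slides; the null value 0.5 and the function name `se_one_prop` are assumptions made only for this illustration.

```python
import math

def se_one_prop(p, n):
    """SE = sqrt(p(1-p)/n); pass p_hat for a CI, p0 for a hypothesis test."""
    return math.sqrt(p * (1 - p) / n)

# US sample from the earlier slides: 454 of 680 answered "a great deal".
n = 680
p_hat = 454 / n
se_ci = se_one_prop(p_hat, n)  # CI: based on the observed proportion
# p0 = 0.5 is a HYPOTHETICAL null value chosen only for illustration.
se_ht = se_one_prop(0.5, n)    # HT: based on the null value
print(round(se_ci, 4), round(se_ht, 4))
```

The two values differ slightly because p(1 - p) is largest at p = 0.5, so the null-based standard error here is the larger of the two.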

Recap: comparing two proportions
Population parameter: $(p_1 - p_2)$, point estimate: $(\hat{p}_1 - \hat{p}_2)$
Conditions:
- independence within groups - random sample and 10% condition met for both groups
- independence between groups
- at least 10 successes and failures in each group - if not → randomization
$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
- for CI: use $\hat{p}_1$ and $\hat{p}_2$
- for HT:
  - when $H_0: p_1 = p_2$: use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
  - when $H_0: p_1 - p_2 =$ (some value other than 0): use $\hat{p}_1$ and $\hat{p}_2$ - this is pretty rare

Recap: Reference - standard error calculations
             one sample                          two samples
mean         $SE = \frac{s}{\sqrt{n}}$           $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
proportion   $SE = \sqrt{\frac{p(1 - p)}{n}}$    $SE = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
When working with means, it’s very rare that $\sigma$ is known, so we usually use $s$.
When working with proportions,
- if doing a hypothesis test, $p$ comes from the null hypothesis
- if constructing a confidence interval, use $\hat{p}$ instead
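For completeness, the "mean" row of the reference table can be written as two small Python functions. The function names and the example inputs are assumptions chosen so the arithmetic is easy to check by hand.

```python
import math

def se_one_mean(s, n):
    """One-sample standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_two_means(s1, n1, s2, n2):
    """Two-sample standard error: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical inputs for illustration only.
print(se_one_mean(10, 25))                  # 10 / 5 = 2.0
print(round(se_two_means(3, 9, 4, 16), 4))  # sqrt(1 + 1), about 1.4142
```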

Small sample inference for difference between two proportions: Back of the hand
There is a saying “know something like the back of your hand.” Describe an experiment to test if people really do know the backs of their hands.
In the MythBusters episode, 11 out of 12 people guessed the backs of their hands correctly.

Small sample inference for difference between two proportions: Back of the hand
Comparing back of the hand to palm of the hand
MythBusters also asked these people to guess the palms of their hands. This time 7 out of the 12 people guessed correctly. The data are summarized below.
           Back   Palm   Total
Correct     11      7     18
Wrong        1      5      6
Total       12     12     24

Small sample inference for difference between two proportions: Back of the hand
Proportion of correct guesses
           Back   Palm   Total
Correct     11      7     18
Wrong        1      5      6
Total       12     12     24
Proportion of correct in the back group: $\frac{11}{12} = 0.917$
Proportion of correct in the palm group: $\frac{7}{12} = 0.583$
Difference: 33.3% more correct in the back of the hand group.
Based on the proportions we calculated, do you think the chance of guessing the back of the hand correctly is higher than palm of the hand?
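With only 1 wrong guess in the back group, the success-failure condition is not met, and the recap slides point to randomization in that case. The Python sketch below shows one way such a randomization test might look. It is an illustration under stated assumptions, not the lecture's own procedure: the one-sided alternative (back better than palm), the number of repetitions, the seed, and the function name `randomization_p_value` are all choices made here.

```python
import random

def randomization_p_value(correct_a, n_a, correct_b, n_b, reps=10_000, seed=1):
    """Estimate P(simulated difference >= observed) under random reassignment."""
    rng = random.Random(seed)
    total_correct = correct_a + correct_b
    # 1 = correct guess, 0 = wrong guess, pooled across both groups
    outcomes = [1] * total_correct + [0] * (n_a + n_b - total_correct)
    observed_diff = correct_a / n_a - correct_b / n_b
    hits = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        # With the total fixed, the difference in proportions is at least
        # the observed one exactly when group A has at least correct_a 1s.
        if sum(outcomes[:n_a]) >= correct_a:
            hits += 1
    return observed_diff, hits / reps

# Counts from the slide: back 11 of 12 correct, palm 7 of 12 correct.
observed_diff, p_value = randomization_p_value(11, 12, 7, 12)
print(f"observed difference = {observed_diff:.3f}, simulated p-value = {p_value:.3f}")
```

Each shuffle mimics a world in which back and palm are equally easy to recognize, so the simulated p-value estimates how often a difference at least as large as the observed one would arise by chance alone.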
