Unit 5: Inference for categorical variables Lecture 1: Inference for - PowerPoint PPT Presentation

Unit 5: Inference for categorical variables Lecture 1: Inference for proportions Statistics 101 Thomas Leininger June 12, 2013

Many research questions involve proportions Who will win the election? http://elections.huffingtonpost.com/2012/results Statistics 101 (Thomas Leininger) U5 - L1: Inf. for proportions June 12, 2013 2 / 23

Single population proportion: Question
Two scientists want to know if a certain drug is effective against high blood pressure. The first scientist wants to give the drug to 1000 people with high blood pressure and see how many of them experience lower blood pressure levels. The second scientist wants to give the drug to 500 people with high blood pressure, not give the drug to another 500 people with high blood pressure, and see how many in both groups experience lower blood pressure levels. Which is the better way to test this drug?
(a) All 1000 get the drug
(b) 500 get the drug, 500 don’t

Single population proportion: Results from the GSS
The GSS asks the same question; below is the distribution of responses from the 2010 survey:
Response                       Count
All 1000 get the drug             99
500 get the drug, 500 don’t      571
Total                            670

Single population proportion: Parameter and point estimate
We would like to estimate the proportion of all Americans who have a good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”. What are the parameter of interest and the point estimate?
Parameter of interest: Proportion of all Americans who have a good intuition about experimental design.
Point estimate: Proportion of sampled Americans who have a good intuition about experimental design.

Single population proportion: Inference on a proportion
What percent of all Americans have a good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”?
We can answer this research question using a confidence interval, which we know is always of the form
point estimate ± ME
And we also know that ME = critical value × standard error of the point estimate.
$SE_{\hat{p}} = ?$
Standard error of a sample proportion: $SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

Single population proportion: Identifying when a sample proportion is nearly normal
Sample proportions are also nearly normally distributed.
Central limit theorem for proportions: Sample proportions will be nearly normally distributed with mean equal to the population proportion, $p$, and standard error equal to $\sqrt{\frac{p(1-p)}{n}}$.
$\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$
But of course this is true only under certain conditions... any guesses?
Note: If $p$ is unknown (most cases), we use $\hat{p}$ when doing a CI and $p_0$ when doing a HT.
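The central limit theorem statement above can be checked with a small simulation. This is only a sketch: the population proportion `p = 0.85`, the number of repetitions, and the helper `sample_proportion` are assumptions chosen for illustration, not values or code from the lecture.

```python
import random
import statistics
from math import sqrt

# Assumed values for illustration only: in practice p is unknown.
p = 0.85      # hypothetical population proportion
n = 670       # sample size
reps = 2000   # number of simulated samples

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_proportion(p, n, rng):
    """Draw n independent yes/no responses and return the proportion of yeses."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

p_hats = [sample_proportion(p, n, rng) for _ in range(reps)]

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.stdev(p_hats)
theory_se = sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {sim_mean:.4f} (theory: {p})")
print(f"SD of sample proportions:   {sim_sd:.4f} (theory: {theory_se:.4f})")
```

The simulated mean should land close to $p$ and the simulated standard deviation close to $\sqrt{p(1-p)/n}$.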

Single population proportion: Confidence intervals for a proportion
Back to experimental design...
The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Estimate (using a 95% confidence interval) the proportion of all Americans who have a good intuition about experimental design.
Given: $n = 670$, $\hat{p} = 0.85$. First check conditions.
1. Independence:
2. Success-failure:

Single population proportion: Confidence intervals for a proportion
Question
We are given that $n = 670$, $\hat{p} = 0.85$; we also just learned that the standard error of the sample proportion is $SE = \sqrt{\frac{p(1-p)}{n}}$. Which of the below is the correct calculation of the 95% confidence interval?
(a) $0.85 \pm 1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}}$
(b) $0.85 \pm 1.65 \times \sqrt{\frac{0.85 \times 0.15}{670}}$
(c) $0.85 \pm 1.96 \times \frac{0.85 \times 0.15}{\sqrt{670}}$
(d) $571 \pm 1.96 \times \sqrt{\frac{571 \times 99}{670}}$
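A minimal sketch of the interval calculation, assuming the rounded $\hat{p} = 0.85$ given on the slide and the usual critical value of 1.96 for 95% confidence. The variable names are illustrative.

```python
from math import sqrt

n = 670
successes = 571
failures = n - successes        # 99
p_hat = 0.85                    # rounded value given on the slide (571/670)
z_star = 1.96                   # critical value for 95% confidence

# Success-failure condition for a CI: check the observed counts.
assert successes >= 10 and failures >= 10

se = sqrt(p_hat * (1 - p_hat) / n)   # standard error using p_hat
me = z_star * se                     # margin of error
ci = (p_hat - me, p_hat + me)

print(f"SE = {se:.4f}, ME = {me:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the interval runs from roughly 0.82 to 0.88.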

Single population proportion: Choosing a sample size when estimating a proportion
Choosing a sample size
How many people should you sample in order to cut the margin of error of a 95% confidence interval down to 1%?
$ME = z^\star \times SE$
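Solving $ME = z^\star \sqrt{p(1-p)/n}$ for $n$ gives $n \ge (z^\star)^2\, p(1-p) / ME^2$. The sketch below applies that rearrangement; using the GSS estimate $\hat{p} = 0.85$ as the planning value is an assumption, and `required_sample_size` is a hypothetical helper name.

```python
from math import ceil

def required_sample_size(p, me, z_star=1.96):
    """Smallest whole n with z_star * sqrt(p * (1 - p) / n) <= me."""
    return ceil(z_star ** 2 * p * (1 - p) / me ** 2)

# Assumed planning value: the earlier GSS estimate of 0.85.
n_needed = required_sample_size(p=0.85, me=0.01)
print(n_needed)
```

The result is always rounded up, since rounding down would leave the margin of error slightly above the target.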

Single population proportion: Choosing a sample size when estimating a proportion
What if there isn’t a previous study?
... use $\hat{p} = 0.5$
Why?
If you don’t know any better, 50-50 is a good guess.
$\hat{p} = 0.5$ gives the most conservative estimate: the highest possible sample size.
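The "most conservative" claim follows because the required sample size is proportional to $p(1-p)$, which is largest at $p = 0.5$. A quick numerical check over a grid of candidate proportions (the grid itself is an arbitrary choice for illustration):

```python
# Evaluate p * (1 - p) on a grid of candidate proportions.
candidates = [i / 100 for i in range(1, 100)]   # 0.01, 0.02, ..., 0.99
variance_term = {p: p * (1 - p) for p in candidates}

# The proportion that maximizes p * (1 - p), and hence the required n.
worst_case_p = max(variance_term, key=variance_term.get)
print(worst_case_p, variance_term[worst_case_p])
```

With a 1% margin of error and $z^\star = 1.96$, this worst case corresponds to $1.96^2 \times 0.25 / 0.01^2 = 9604$ people.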

Single population proportion: Hypothesis testing for a proportion
CI vs. HT for proportions
Success-failure condition:
CI: At least 10 observed successes and failures (use $\hat{p}$)
HT: At least 10 expected successes and failures (use $p_0$)
Standard error:
CI: calculate using the observed sample proportion: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
HT: calculate using the null value: $SE = \sqrt{\frac{p_0(1-p_0)}{n}}$

Single population proportion: Hypothesis testing for a proportion
The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design?
$H_0: p = 0.80$
$H_A: p > 0.80$
SE =
Z =
p-value =
[Figure: sampling distribution of sample proportions, with 0.8 and 0.85 marked]
Conclusion:
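One way to work through this test numerically is sketched below. It assumes the rounded $\hat{p} = 0.85$ from the slide and uses the null value $p_0 = 0.80$ in the standard error, as the CI vs. HT comparison prescribes. `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from math import sqrt
from statistics import NormalDist

n = 670
p_hat = 0.85   # observed proportion (571/670, rounded as on the slide)
p_0 = 0.80     # null value

# Success-failure condition for a test: expected counts under H0.
assert n * p_0 >= 10 and n * (1 - p_0) >= 10

se = sqrt(p_0 * (1 - p_0) / n)        # standard error using the null value
z = (p_hat - p_0) / se                # test statistic
p_value = 1 - NormalDist().cdf(z)     # one-sided: P(Z > z)

print(f"SE = {se:.4f}, Z = {z:.2f}, p-value = {p_value:.4f}")
```

Under these assumptions Z comes out a little above 3.2 and the p-value is well below 0.01.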

Single population proportion: Hypothesis testing for a proportion
Question
11% of 1,001 Americans responding to a 2006 Gallup survey stated that they have objections to celebrating Halloween on religious grounds. At 95% confidence level, the margin of error for this survey is ±3%. A news piece on this study’s findings states: “More than 10% of all Americans have objections on religious grounds to celebrating Halloween.” At 95% confidence level, is this news piece’s statement justified?
(a) Yes
(b) No
(c) Cannot tell
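A sketch of the reasoning, under the usual reading that the claim is supported only if every value in the 95% confidence interval exceeds 10%. The variable names are illustrative.

```python
p_hat = 0.11           # reported sample proportion
me = 0.03              # reported margin of error at 95% confidence
claimed_floor = 0.10   # the news piece claims "more than 10%"

lower, upper = p_hat - me, p_hat + me

# The claim is supported only if the whole interval lies above 10%.
claim_supported = lower > claimed_floor

print(f"95% CI: ({lower:.2f}, {upper:.2f}); claim supported: {claim_supported}")
```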

Small sample inference for a proportion: Carnival Game
Suppose we want to set up a carnival game at the NC state fair this year. Can we estimate the proportion of times people can throw a ball and hit a target?
https://commons.wikimedia.org/wiki/File:Archery Target 80cm.svg

Small sample inference for a proportion: Carnival Game
Let’s build a CI
Conditions:
1. Independence: We can assume that each guess is independent of another.
2. Sample size: Are the numbers of successes and failures both larger than 10?
So what do we do?
http://lock5stat.com/statkey/

Small sample inference for a proportion: Paul the octopus
Famous predictors
Before this guy... There was this guy...

Small sample inference for a proportion: Paul the octopus
Paul the Octopus - psychic?
Paul the Octopus predicted 8 World Cup games, and predicted them all correctly.
Does this provide convincing evidence that Paul actually has psychic powers?
How unusual would this be if he were just randomly guessing (with a 50% chance of guessing correctly)?
Hypotheses:
$H_0$:
$H_A$:

Small sample inference for a proportion: Paul the octopus
Conditions
1. Independence: We can assume that each guess is independent of another.
2. Sample size: The expected numbers of successes and failures are both smaller than 10: 8 × 0.5 = 4.
So what do we do?
Since the sample size isn’t large enough to use CLT-based methods, we can use a simulation method instead.
How could we simulate this hypothesis test?

Small sample inference for a proportion: Paul the octopus
Application exercise: Simulation testing for one proportion
Which of the following methods is the best way to calculate the p-value of the hypothesis test evaluating whether Paul the Octopus’ predictions are unusually higher than random guessing?
(a) Flip a coin 8 times, record the proportion of times where all 8 tosses were heads. Repeat this many times, and calculate the proportion of simulations where all 8 tosses were heads.
(b) Roll a die 8 times, record the proportion of times where all 8 rolls were 6s. Repeat this many times, and calculate the proportion of simulations where all 8 rolls were 6s.
(c) Flip a coin 10,000 times, record the proportion of heads. Repeat this many times, and calculate the proportion of simulations where more than 50% of tosses are heads.
(d) Flip a coin 10,000 times, calculate the proportion of heads.
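A coin-flip simulation of this kind can be sketched in a few lines. It assumes the null hypothesis is random guessing ($p = 0.5$) against the alternative $p > 0.5$, which the slides leave for the reader to fill in. The seed, the number of simulations, and the helper `all_correct` are arbitrary choices for illustration.

```python
import random

rng = random.Random(2013)   # fixed seed so the run is repeatable
n_games = 8                 # predictions Paul made
n_sims = 100_000            # number of simulated sets of 8 guesses

def all_correct(rng, n_games):
    """Simulate n_games fair coin flips; True if every flip is heads."""
    return all(rng.random() < 0.5 for _ in range(n_games))

# Proportion of simulations at least as extreme as 8 out of 8 correct.
hits = sum(all_correct(rng, n_games) for _ in range(n_sims))
p_value_sim = hits / n_sims

# For comparison, the exact probability under random guessing is 1/256.
p_value_exact = 0.5 ** n_games

print(f"simulated p-value: {p_value_sim:.4f}, exact: {p_value_exact:.4f}")
```

Because 8 out of 8 is the most extreme possible outcome, "at least as extreme" here reduces to "all 8 correct".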
