Unit 5: Inference for categorical data. 2. Comparing two proportions

Sta 101 - Spring 2015
Duke University, Department of Statistical Science
March 19, 2015
Dr. Windle
Slides posted at http://bitly.com/windle2

Announcements

▶ Review materials will be posted over the weekend.
▶ PA 5 will open Monday morning (3/23 at 12:01am) and close Tuesday night (3/24 at 11:59pm).
▶ Project 1 due date will be pushed back to Monday morning 3/30.

CLT also describes the distribution of $\hat{p}_1 - \hat{p}_2$

▶ Sampling distribution:

$$(\hat{p}_1 - \hat{p}_2) \sim N\left(\text{mean} = (p_1 - p_2),\ SE = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}\right)$$

Conditions:
▶ Independence: Random sample/assignment + 10% rule
▶ Success-Failure: At least 10 expected successes and failures for each group

When we do not know or assume anything about $p_1$ and $p_2$:
▶ Success-Failure: At least 10 observed successes and failures for each group

For theoretical HT where $H_0: p_1 = p_2$, pool!

For an independent groups hypothesis test with $H_0: p_1 = p_2$:

$$(\hat{p}_1 - \hat{p}_2) \sim N\left(\text{mean} = 0,\ SE = \sqrt{\frac{p(1 - p)}{n_1} + \frac{p(1 - p)}{n_2}}\right)$$

▶ Best guess of $p$:

$$\hat{p}_{pool} = \frac{\text{total successes}}{\text{total sample size}} = \frac{suc_1 + suc_2}{n_1 + n_2}$$

▶ Best guess of SE:

$$SE_{pool} = \sqrt{\frac{\hat{p}_{pool}(1 - \hat{p}_{pool})}{n_1} + \frac{\hat{p}_{pool}(1 - \hat{p}_{pool})}{n_2}}$$

▶ Success-Failure: At least 10 "expected" successes and failures for each group (since we do not know $p$, use $\hat{p}_{pool}$).
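The pooling formulas can be sketched in a few lines of base R. This is only an illustration: the function name `pooled_z_test` and the example counts (45 of 100 and 30 of 100) are assumptions made up for this sketch and do not come from the slides.

```r
# Pooled two-proportion z test, following the slide formulas.
# The function name and the example counts are illustrative assumptions.
pooled_z_test <- function(suc1, n1, suc2, n2) {
  p1_hat <- suc1 / n1
  p2_hat <- suc2 / n2
  # Best guess of the common p under H0: p1 = p2
  p_pool <- (suc1 + suc2) / (n1 + n2)
  # Standard error with p_pool substituted into both terms
  se_pool <- sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
  # Test statistic; the null value of the difference is 0
  z <- (p1_hat - p2_hat) / se_pool
  # Two-sided p-value from the standard normal distribution
  p_value <- 2 * pnorm(-abs(z))
  list(p_pool = p_pool, se_pool = se_pool, z = z, p_value = p_value)
}

res <- pooled_z_test(suc1 = 45, n1 = 100, suc2 = 30, n2 = 100)
print(res)
```

With these made-up counts the pooled proportion is 75/200 = 0.375, so each group has 37.5 expected successes and 62.5 expected failures, and the success-failure condition holds.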

When S-F fails, simulate!

▶ If the S-F condition is met, can do theoretical inference: Z test, Z interval
▶ If the S-F condition is not met, must use simulation based methods: randomization test, bootstrap interval

Clicker question

Suppose in group 1 30 out of 50 observations are successes, and in group 2 20 out of 60 observations are successes. What is the pooled proportion?

(a) $\frac{30}{50}$
(b) $\frac{20}{60}$
(c) $\frac{30}{50} + \frac{20}{60}$
(d) $\frac{30 + 20}{50 + 60}$
(e) $\frac{\frac{30}{50} + \frac{20}{60}}{2}$

"Healthy adults immunized with an experimental malaria vaccine, called PfSPZ may be completely protected from infection, according to government researchers," reported Time magazine in Aug 2013. The vaccine contains weakened forms of the live parasite -- Plasmodium falciparum -- responsible for causing malaria. In a randomized trial, none of the six patients who received the vaccine developed malaria, while five of the six who were not vaccinated became infected. Do these data provide convincing evidence of a difference in rate of malaria infection?

                        Outcome
Group         Malaria   No malaria   Total
Vaccine          0          6           6
No vaccine       5          1           6
Total            5          7          12

http://healthland.time.com/2013/08/09/malaria-vaccine-shows-strongest-protection-yet-against-parasite/

$H_0: p_T = p_C$
$H_A: p_T \neq p_C$

Conditions:
1. Independence: Patients are randomly assigned to treatment groups
2. Success-failure: ?
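One way to look at the success-failure question for the malaria data is to compute the expected counts under the null hypothesis from the pooled proportion. The sketch below does this in base R; `sf_check` is a hypothetical helper name, and the threshold of 10 is the rule of thumb from the conditions slide.

```r
# Expected counts under H0 use the pooled proportion.
# sf_check is a hypothetical helper, not part of the course code.
sf_check <- function(successes, n, threshold = 10) {
  # Pooled proportion: total successes over total sample size
  p_pool <- sum(successes) / sum(n)
  # Expected successes and failures in each group under H0
  expected <- rbind(success = n * p_pool, failure = n * (1 - p_pool))
  colnames(expected) <- names(n)
  list(p_pool = p_pool, expected = expected, met = all(expected >= threshold))
}

# Counts taken from the malaria table: "success" here means malaria
malaria <- c(vaccine = 0, no_vaccine = 5)
group_n <- c(vaccine = 6, no_vaccine = 6)

check <- sf_check(malaria, group_n)
print(check$expected)
print(check$met)
```

Each group has 2.5 expected malaria cases and 3.5 expected non-cases, all well below 10, so the success-failure condition is not met and a simulation based method is called for.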

Clicker question

Assuming that the null hypothesis ($H_0: p_T = p_C$) is true, which of the following is the pooled proportion of patients with malaria in the two groups?

(a) $\frac{6}{12} = 0.5$
(b) $\frac{5}{12} = 0.417$
(c) $\frac{0}{5} = 0$
(d) $\frac{6}{7} = 0.857$
(e) $\frac{7}{12} = 0.583$

Clicker question

Assuming that the null hypothesis ($H_0: p_T = p_C$) is true, how many patients would we expect to get infected with malaria in the vaccine group?

(a) $0.417 \times 12 = 5$
(b) $0.417 \times 6 = 2.5$
(c) $0.417 \times 5 = 2.085$
(d) $0.5 \times 6 = 3$
(e) $0.583 \times 12 = 7$

Both questions refer to the malaria table (Vaccine: 0 malaria, 6 no malaria; No vaccine: 5 malaria, 1 no malaria; totals 5, 7, and 12).

Simulation scheme

1. Use 12 index cards, where each card represents an experimental unit.
2. Mark 5 of the cards as "malaria" and the remaining 7 as "no malaria".
3. Shuffle the cards and split into two groups of size 6, for vaccine and no vaccine.
4. Calculate the difference between the proportions of "malaria" in the vaccine and no vaccine decks, and record this number.
5. Repeat steps (3) and (4) many times to build a randomization distribution of differences in simulated proportions.

Simulate in R

    # download() is assumed to come from the downloader package;
    # inference() is the custom function provided for the course.
    library(downloader)
    download("https://stat.duke.edu/~mc301/data/vacc_malaria.csv",
             destfile = "vacc_malaria.csv")
    vacc_malaria = read.csv("vacc_malaria.csv")
    inference(vacc_malaria$outcome, vacc_malaria$group, success = "malaria",
              est = "proportion", type = "ht", null = 0,
              alternative = "twosided", method = "simulation", seed = 1028)

Output:

    Response variable: categorical, Explanatory variable: categorical
    Difference between two proportions -- success: malaria
    Summary statistics:
                x
    y            no vaccine vaccine Sum
      malaria             5       0   5
      no malaria          1       6   7
      Sum                 6       6  12
    Observed difference between proportions (no vaccine-vaccine) = 0.8333
    H0: p_no vaccine - p_vaccine = 0
    HA: p_no vaccine - p_vaccine != 0
    p-value = 0.0152

[Figures: mosaic plot of vacc_malaria$outcome by vacc_malaria$group; histogram of the randomization distribution with the observed difference 0.8333 marked]
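The index card scheme can also be sketched directly in base R, without the course `inference()` function. This is a minimal illustration under stated assumptions: the number of shuffles (10000) is arbitrary, the difference is taken as no vaccine minus vaccine to match the reported 0.8333, and the random draws will not match those of `inference()` even with the same seed.

```r
set.seed(1028)   # any seed works; this one is reused from the slide
n_sim <- 10000   # number of shuffles, chosen arbitrarily for this sketch

# Steps 1-2: 12 "cards", 5 marked malaria and 7 marked no malaria
cards <- c(rep("malaria", 5), rep("no malaria", 7))

# Observed difference in proportions (no vaccine - vaccine)
obs_diff <- 5 / 6 - 0 / 6

# Steps 3-5: shuffle, split into two groups of 6, record the difference
sim_diffs <- replicate(n_sim, {
  shuffled <- sample(cards)
  no_vaccine <- shuffled[1:6]
  vaccine <- shuffled[7:12]
  mean(no_vaccine == "malaria") - mean(vaccine == "malaria")
})

# Two-sided p-value: share of shuffles at least as extreme as observed
# (small tolerance guards against floating point ties)
p_value <- mean(abs(sim_diffs) >= abs(obs_diff) - 1e-8)
print(p_value)
```

The simulated p-value should land near the exact two-sided probability for this shuffle, 14/924 ≈ 0.0152, which agrees with the value reported in the output above.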

Summary of main ideas

1. CLT also describes the distribution of $\hat{p}_1 - \hat{p}_2$
2. For theoretical HT where $H_0: p_1 = p_2$, pool!
3. When S-F fails, simulate!

Application exercise: App Ex 5.2

See course website for details.
