Unit 3: Foundations for inference
1. Variability in estimates and CLT
GOVT 3990 - Spring 2020
Cornell University

Outline
1. Housekeeping
2. Main ideas
   1. Sample statistics vary from sample to sample
   2. CLT describes the shape, center, and spread of sampling distributions
   3. CLT only applies when independence and sample size/skew conditions are met
3. Exercises [time permitting]
4. Summary

Announcements
◮ Decks online
◮ Grades
◮ Problem Set and Lab now due Friday

Sample statistics vary from sample to sample
◮ We are often interested in population parameters.
◮ Since complete populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
◮ Sample statistics vary from sample to sample.
◮ Quantifying how sample statistics vary provides a way to estimate the margin of error associated with our point estimate.
◮ But before we get to quantifying the variability among samples, let's try to understand how and why point estimates vary from sample to sample.

Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means of their ages to be the same, somewhat different, or very different?

We would like to estimate the average number of drinks it takes students to get drunk.
◮ We will assume that our population consists of 146 students.
◮ Assume also that we don't have the resources to collect data from all 146, so we will take a sample of size n = 10.

If we randomly select observations from this data set, which values are most likely to be selected, and which are least likely?

[Figure: histogram of the population; x-axis: number of drinks to get drunk (0 to 10); y-axis: count (0 to 25)]

Social Media Activity Survey
Back in 2015 we surveyed all 146 students of GOVT 111 and asked them, among other things, about their social media activity. For instance, we asked how many social media accounts they had. These were their answers (student ID, then answer):

 ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans
  1    7 |  21    6 |  41    6 |  61   10 |  81    6 | 101    4 | 121    6 | 141    4
  2    5 |  22    2 |  42   10 |  62    7 |  82    5 | 102    7 | 122    5 | 142    6
  3    4 |  23    6 |  43    3 |  63    4 |  83    6 | 103    6 | 123    3 | 143    6
  4    4 |  24    7 |  44    6 |  64    5 |  84    8 | 104    8 | 124    2 | 144    4
  5    6 |  25    3 |  45   10 |  65    6 |  85    4 | 105    3 | 125    2 | 145    5
  6    2 |  26    6 |  46    4 |  66    6 |  86   10 | 106    6 | 126    5 | 146    5
  7    3 |  27    5 |  47    3 |  67    6 |  87    5 | 107    2 | 127   10
  8    5 |  28    8 |  48    3 |  68    7 |  88   10 | 108    5 | 128    4
  9    5 |  29    0 |  49    6 |  69    7 |  89    8 | 109    1 | 129    1
 10    6 |  30    8 |  50    8 |  70    5 |  90    5 | 110    5 | 130    4
 11    1 |  31    5 |  51    8 |  71   10 |  91    4 | 111    5 | 131   10
 12   10 |  32    9 |  52    8 |  72    3 |  92  0.5 | 112    4 | 132    8
 13    4 |  33    7 |  53    2 |  73  5.5 |  93    3 | 113    4 | 133   10
 14    4 |  34    5 |  54    4 |  74    7 |  94    3 | 114    9 | 134    6
 15    6 |  35    5 |  55    8 |  75   10 |  95    5 | 115    4 | 135    6
 16    3 |  36    7 |  56    3 |  76    6 |  96    6 | 116    3 | 136    6
 17   10 |  37    4 |  57    5 |  77    6 |  97    4 | 117    3 | 137    7
 18    8 |  38    0 |  58    5 |  78    5 |  98    4 | 118    4 | 138    3
 19    5 |  39    4 |  59    8 |  79    4 |  99    2 | 119    4 | 139   10
 20   10 |  40    3 |  60    4 |  80    5 | 100    5 | 120    8 | 140    4

◮ Now let's sample, with replacement, ten student IDs (the white cells):
  > sample(1:146, size = 10, replace = TRUE)
  [1] 59 121 88 46 58 72 82 81 5 10
◮ Find the students with these IDs: their answers are 8, 6, 10, 4, 5, 3, 5, 6, 6, and 6.
◮ Calculate the sample mean of their answers: (8 + 6 + 10 + 4 + 5 + 3 + 5 + 6 + 6 + 6) / 10 = 5.9
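The lookup and averaging steps can also be done in R rather than by hand. The sketch below is only an illustration: the vector name `accounts` is an assumption (it does not appear in the deck), and its values are transcribed from the survey table in ID order. It reuses the ten IDs drawn on the slide so the result can be compared with the hand calculation.

```r
# Answers of all 146 students in ID order (IDs 1 to 146), transcribed from the
# survey table. The name `accounts` is illustrative, not from the deck.
accounts <- c(
  7, 5, 4, 4, 6, 2, 3, 5, 5, 6, 1, 10, 4, 4, 6, 3, 10, 8, 5, 10,      # IDs 1-20
  6, 2, 6, 7, 3, 6, 5, 8, 0, 8, 5, 9, 7, 5, 5, 7, 4, 0, 4, 3,         # IDs 21-40
  6, 10, 3, 6, 10, 4, 3, 3, 6, 8, 8, 8, 2, 4, 8, 3, 5, 5, 8, 4,       # IDs 41-60
  10, 7, 4, 5, 6, 6, 6, 7, 7, 5, 10, 3, 5.5, 7, 10, 6, 6, 5, 4, 5,    # IDs 61-80
  6, 5, 6, 8, 4, 10, 5, 10, 8, 5, 4, 0.5, 3, 3, 5, 6, 4, 4, 2, 5,     # IDs 81-100
  4, 7, 6, 8, 3, 6, 2, 5, 1, 5, 5, 4, 4, 9, 4, 3, 3, 4, 4, 8,         # IDs 101-120
  6, 5, 3, 2, 2, 5, 10, 4, 1, 4, 10, 8, 10, 6, 6, 6, 7, 3, 10, 4,     # IDs 121-140
  4, 6, 6, 4, 5, 5                                                    # IDs 141-146
)
stopifnot(length(accounts) == 146)

# The ten IDs drawn on the slide. To draw a fresh sample instead, use:
# ids <- sample(1:146, size = 10, replace = TRUE)
ids <- c(59, 121, 88, 46, 58, 72, 82, 81, 5, 10)

answers <- accounts[ids]  # look up each sampled student's answer
x_bar <- mean(answers)    # sample mean of the ten answers

print(answers)
print(x_bar)
```

With the slide's IDs, `x_bar` comes out to 5.9, the same value as the hand calculation. A fresh call to `sample()` will generally give a different set of IDs and therefore a different sample mean.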

Activity: Creating a sampling distribution
Repeat this, now on your own, and report your sample mean.
  > sample(1:146, size = 10, replace = TRUE)
1. Find the students with these IDs in the table of answers above.
2. Calculate the sample mean and round it to 2 decimal places.

Sampling distribution
What you just constructed is called a sampling distribution.
What are the shape and center of this distribution? Based on this distribution, what do you think is the true population average?
Answer: 5.39
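The class activity collects one sample mean per student. A simulation can repeat the same draw many more times. The sketch below makes several assumptions that are not in the deck: the vector name `accounts`, the seed, and the 10,000 repetitions. The data values are transcribed from the survey table in ID order.

```r
# Answers of all 146 students in ID order, transcribed from the survey table.
# The name `accounts` is illustrative, not from the deck.
accounts <- c(
  7, 5, 4, 4, 6, 2, 3, 5, 5, 6, 1, 10, 4, 4, 6, 3, 10, 8, 5, 10,      # IDs 1-20
  6, 2, 6, 7, 3, 6, 5, 8, 0, 8, 5, 9, 7, 5, 5, 7, 4, 0, 4, 3,         # IDs 21-40
  6, 10, 3, 6, 10, 4, 3, 3, 6, 8, 8, 8, 2, 4, 8, 3, 5, 5, 8, 4,       # IDs 41-60
  10, 7, 4, 5, 6, 6, 6, 7, 7, 5, 10, 3, 5.5, 7, 10, 6, 6, 5, 4, 5,    # IDs 61-80
  6, 5, 6, 8, 4, 10, 5, 10, 8, 5, 4, 0.5, 3, 3, 5, 6, 4, 4, 2, 5,     # IDs 81-100
  4, 7, 6, 8, 3, 6, 2, 5, 1, 5, 5, 4, 4, 9, 4, 3, 3, 4, 4, 8,         # IDs 101-120
  6, 5, 3, 2, 2, 5, 10, 4, 1, 4, 10, 8, 10, 6, 6, 6, 7, 3, 10, 4,     # IDs 121-140
  4, 6, 6, 4, 5, 5                                                    # IDs 141-146
)

set.seed(3990)   # arbitrary seed, only so the run is reproducible
n_reps <- 10000  # number of repeated samples (an assumption)

# Each repetition: draw 10 IDs with replacement, then average their answers.
sample_means <- replicate(n_reps, {
  ids <- sample(1:146, size = 10, replace = TRUE)
  mean(accounts[ids])
})

# Plot the sampling distribution when running interactively.
if (interactive()) {
  hist(sample_means, main = "Sample means, n = 10", xlab = "sample mean")
}

print(mean(accounts))      # population mean
print(mean(sample_means))  # center of the simulated sampling distribution
print(sd(accounts))        # spread of individual answers
print(sd(sample_means))    # spread of sample means
```

The population mean of the transcribed data rounds to 5.39, and the simulated sample means should center close to it while varying less than the individual answers do.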

Average number of Syracuse games attended
Next let's look at the population data for the number of Syracuse basketball games attended:

[Figure: histogram of the population distribution; x-axis: number of games attended (0 to 70); y-axis: count (0 to 150)]

Average number of Syracuse games attended (cont.)
Sampling distribution, n = 10:

[Figure: histogram of sample means from samples of n = 10; x-axis: 5 to 20; y-axis: count]

What does each observation in this distribution represent?
Sample mean, x̄, of samples of size n = 10.

Is the variability of the sampling distribution smaller or larger than the variability of the population distribution?
Smaller; sample means will vary less than individual observations.

Average number of Syracuse games attended (cont.)
Sampling distribution, n = 30:

[Figure: histogram of sample means from samples of n = 30; x-axis: 2 to 12; y-axis: count]

How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
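One way to explore the n = 10 versus n = 30 question is by simulation. The games-attended data are not reproduced in these slides, so the sketch below substitutes a hypothetical right-skewed population drawn from an exponential distribution. The population, the seed, the helper name `sample_means_for`, and the number of repetitions are all assumptions made for illustration.

```r
set.seed(2020)  # arbitrary seed, only so the run is reproducible

# Hypothetical right-skewed population standing in for the games-attended
# data, which are not given in the slides.
population <- rexp(10000, rate = 1 / 10)

# Illustrative helper: draw `reps` samples of size n (with replacement)
# and return the mean of each sample.
sample_means_for <- function(n, reps = 5000) {
  replicate(reps, mean(sample(population, size = n, replace = TRUE)))
}

means_10 <- sample_means_for(10)
means_30 <- sample_means_for(30)

# Center: both sampling distributions sit near the population mean.
print(c(pop = mean(population), n10 = mean(means_10), n30 = mean(means_30)))

# Spread: sample means vary less than individuals, and less still at n = 30.
print(c(pop = sd(population), n10 = sd(means_10), n30 = sd(means_30)))

# For comparison: population SD divided by sqrt(n), for n = 10 and n = 30.
print(sd(population) / sqrt(c(10, 30)))

# Shape: compare the two histograms when running interactively.
if (interactive()) {
  par(mfrow = c(1, 2))
  hist(means_10, main = "n = 10", xlab = "sample mean")
  hist(means_30, main = "n = 30", xlab = "sample mean")
}
```

In this sketch the center stays put, the spread shrinks roughly in line with the population standard deviation divided by the square root of n, and the histogram for n = 30 should look less skewed than the one for n = 10.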
