Sampling Distribution of a Statistic

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II
Review of Basic Concepts: Sampling Distributions and the CLT

Sampling Distribution of a Statistic

Recall: a statistic is a summary calculated from a sample.
Statistics vary from sample to sample.
If samples are chosen randomly, the variation of a statistic is also random.
That is, under random sampling, a statistic is a random variable.

Sampling Distribution

Every random variable has a probability distribution, usually represented by either:
  a probability density function, such as a normal density; or
  a probability mass function, such as the binomial or Poisson probability functions.
In the special case of a statistic, its probability distribution is also called its sampling distribution.

Fuel Consumption Example

For example, suppose we view the 100 fuel consumption values as a population, and draw a random sample of size 25:

    mean(sample(epagas$MPG, 25))
    # 36.944

If we draw more samples, we get a different sample mean each time:

    mean(sample(epagas$MPG, 25))
    # 37.044
    mean(sample(epagas$MPG, 25))
    # 37.088

If we draw many samples, we begin to see the sampling distribution:

    sampleMeans = rep(NA, 1000)
    for (i in 1:length(sampleMeans))
      sampleMeans[i] = mean(sample(epagas$MPG, 25))
    hist(sampleMeans)

Note that the sample means are:
  distributed around the population mean of 37 mpg;
  not as widely dispersed as the original 100 measurements.

Some Theoretical Results

If $Y_1, Y_2, \dots, Y_n$ are randomly sampled from some population with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of their mean $\bar{Y}$ satisfies:
  for any $n$,
    Mean: $E(\bar{Y}) = \mu_{\bar{Y}} = \mu$,
    Standard error of estimate: $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$;
  for large $n$, $\bar{Y}$ is approximately normally distributed (Central Limit Theorem).
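These results can be checked by simulation. The following is a minimal sketch, not part of the original slides: it assumes a skewed stand-in population (exponential with rate 1, so that the population mean and standard deviation are both 1) instead of the fuel consumption data, and the variable names are illustrative.

```r
# Simulation check of E(Ybar) = mu and SE(Ybar) = sigma / sqrt(n).
# Assumption: an exponential(rate = 1) population, for which mu = 1 and sigma = 1.
set.seed(1)
n     <- 25      # sample size
reps  <- 10000   # number of simulated samples
mu    <- 1
sigma <- 1

# One sample mean per simulated sample
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean_of_means <- mean(sample_means)   # should be close to mu
se_empirical  <- sd(sample_means)     # should be close to sigma / sqrt(n)
se_theory     <- sigma / sqrt(n)

print(c(mean_of_means = mean_of_means,
        se_empirical  = se_empirical,
        se_theory     = se_theory))

# Even though the population is skewed, the histogram looks roughly normal
hist(sample_means)
```

The population here is effectively infinite, so the formula $\sigma / \sqrt{n}$ applies without any finite-population adjustment.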

Review of Basic Concepts: Point Estimate of a Population Mean

Inference About a Parameter: Point Estimation

For example, the population mean, $\mu$.
A good estimator of $\mu$ should have a sampling distribution that is:
  centered around $\mu$;
  with little dispersion.
We often make these ideas specific by using the mean and standard error.
Consider the sample mean, $\bar{Y}$:
  centering: $\mu_{\bar{Y}} = \mu$; $\bar{Y}$ is unbiased;
  dispersion: $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$; $\bar{Y}$ has a small standard error of estimate when $n$ is large.

In fact, when the original data are normally distributed, $\bar{Y}$ has the smallest standard error of any unbiased estimator.
That is, the sample mean $\bar{Y}$ is a Minimum Variance Unbiased Estimator (MVUE).
In other cases, $\bar{Y}$ is usually a good estimator of $\mu$, but not the best.
  For data with the uniform distribution, the midrange is better.
  For data with the double exponential (Laplace) distribution, the sample median is better.
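The comparison between estimators can be illustrated by simulation. This is a rough sketch under assumptions not in the slides: a uniform(0, 1) population, a standard Laplace population generated as the difference of two independent exponential(1) variables (base R has no Laplace generator), and hypothetical helper names `midrange`, `rlaplace`, and `sd_of`.

```r
# Compare the simulated standard errors of competing estimators of the center.
set.seed(2)
n    <- 25     # sample size
reps <- 5000   # number of simulated samples

# Midrange: the average of the smallest and largest observations
midrange <- function(y) (min(y) + max(y)) / 2

# Standard Laplace draws: the difference of two independent exponential(1) variables
rlaplace <- function(n) rexp(n) - rexp(n)

# Simulated standard error of an estimator under a given sampling function
sd_of <- function(estimator, rdist) sd(replicate(reps, estimator(rdist(n))))

uniform_se <- c(mean = sd_of(mean, runif), midrange = sd_of(midrange, runif))
laplace_se <- c(mean = sd_of(mean, rlaplace), median = sd_of(median, rlaplace))

print(uniform_se)   # the midrange should show the smaller standard error
print(laplace_se)   # the median should show the smaller standard error
```

All three estimators are centered on the middle of these symmetric populations, so comparing standard errors is a fair comparison of their quality here.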

The sample mean $\bar{Y}$ is always the Best Linear Unbiased Estimator (BLUE):
For any constants $w_1, w_2, \dots, w_n$ with $\sum w_i = 1$, if $W$ is the estimator
  $$W = \sum_{i=1}^{n} w_i Y_i$$
then $W$ is unbiased:
  $$\mu_W = \sum_{i=1}^{n} w_i \mu = \mu;$$
but the standard error of estimate is
  $$\sigma_W = \sqrt{\sum_{i=1}^{n} w_i^2 \sigma^2} \ge \frac{\sigma}{\sqrt{n}} = \sigma_{\bar{Y}}.$$
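The inequality in the standard error is stated without proof. One way to see it, sketched here as a supplementary step that uses only the constraint $\sum w_i = 1$, is to expand the squared deviations of the weights from the equal weights $1/n$:

```latex
\begin{align*}
\sum_{i=1}^{n} \left( w_i - \frac{1}{n} \right)^2
  &= \sum_{i=1}^{n} w_i^2 - \frac{2}{n} \sum_{i=1}^{n} w_i + \frac{n}{n^2} \\
  &= \sum_{i=1}^{n} w_i^2 - \frac{2}{n} + \frac{1}{n}
   = \sum_{i=1}^{n} w_i^2 - \frac{1}{n} .
\end{align*}
% The left-hand side is a sum of squares, so it is non-negative:
\begin{align*}
\sum_{i=1}^{n} w_i^2 \ge \frac{1}{n}
\quad\Longrightarrow\quad
\sigma_W = \sigma \sqrt{\sum_{i=1}^{n} w_i^2} \ge \frac{\sigma}{\sqrt{n}} = \sigma_{\bar{Y}} .
\end{align*}
```

Equality holds only when every $w_i = 1/n$, that is, when $W$ is the sample mean itself.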

Review of Basic Concepts: Confidence Interval for a Population Mean

Inference About a Parameter: Interval Estimation

Recall that, by the Central Limit Theorem, when $n$ is large, $\bar{Y}$ is approximately normally distributed.
That is,
  $$\frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$$
approximately follows the standard normal distribution.

So the chance that
  $$\mu - 1.96 \frac{\sigma}{\sqrt{n}} \le \bar{Y} \le \mu + 1.96 \frac{\sigma}{\sqrt{n}}$$
is approximately 95%.
Equivalently, the chance that
  $$\bar{Y} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + 1.96 \frac{\sigma}{\sqrt{n}}$$
is approximately 95%.
We say that
  $$\bar{Y} \pm 1.96 \frac{\sigma}{\sqrt{n}}$$
is an approximate 95% confidence interval (CI) for $\mu$.
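The 95% figure refers to how often the interval covers $\mu$ across repeated samples, which a simulation makes concrete. This is a minimal sketch under assumed values: a normal population with mean 37 (echoing the fuel consumption example) and a hypothetical standard deviation of 2.4, treated as known.

```r
# Estimate the coverage of Ybar +/- 1.96 * sigma / sqrt(n) by simulation.
# Assumptions: normal population, mu = 37, sigma = 2.4 (hypothetical, treated as known).
set.seed(3)
mu    <- 37
sigma <- 2.4
n     <- 25
reps  <- 10000

half_width <- 1.96 * sigma / sqrt(n)

# For each simulated sample, record whether the interval contains mu
covered <- replicate(reps, {
  ybar <- mean(rnorm(n, mean = mu, sd = sigma))
  (ybar - half_width <= mu) && (mu <= ybar + half_width)
})

coverage <- mean(covered)
print(coverage)   # should be close to 0.95
```

In each repetition $\mu$ stays fixed and the interval moves, so the probability statement is about the procedure, not about any one computed interval.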

To calculate the end-points of this approximate confidence interval, we need to know the additional parameter $\sigma$.
Typically $\sigma$ is unknown, so we cannot compute this CI.
But we can estimate $\sigma$ by the sample standard deviation $s$, and use the alternative confidence interval
  $$\bar{Y} \pm 1.96 \frac{s}{\sqrt{n}}.$$
When $n$ is large, the chance that
  $$\bar{Y} - 1.96 \frac{s}{\sqrt{n}} \le \mu \le \bar{Y} + 1.96 \frac{s}{\sqrt{n}}$$
is still approximately 95%.

What if $n$ is not large?

In small samples, we can still construct a confidence interval, but it has the correct coverage probability only if the original data are approximately normally distributed.
The key is to replace $\pm 1.96$, the 2.5% and 97.5% points of the normal distribution, with $\pm t_{.025, n-1}$, the 2.5% and 97.5% points of Student's t-distribution with $(n - 1)$ degrees of freedom: for normally distributed data, the chance that
  $$\bar{Y} - t_{.025, n-1} \frac{s}{\sqrt{n}} \le \mu \le \bar{Y} + t_{.025, n-1} \frac{s}{\sqrt{n}}$$
is 95%.
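In R, the t-based interval can be computed directly from this formula with `qt()`, or obtained from `t.test()`. The sketch below uses a small simulated sample as a hypothetical stand-in for real data; the values 37 and 2.4 are assumptions chosen only for illustration.

```r
# Small-sample 95% confidence interval for a mean, by formula and via t.test().
set.seed(4)
y <- rnorm(10, mean = 37, sd = 2.4)   # hypothetical small sample
n <- length(y)

# Upper 2.5% point of Student's t with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Ybar +/- t * s / sqrt(n)
ci_manual <- mean(y) + c(-1, 1) * t_crit * sd(y) / sqrt(n)

# The same interval from the built-in one-sample t procedure
ci_builtin <- as.numeric(t.test(y, conf.level = 0.95)$conf.int)

print(ci_manual)
print(ci_builtin)
```

The two results agree because `t.test()` uses the same formula for its confidence interval.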

Tables of the t-distribution show that when $n$ is large, the percent points are very close to those of the normal distribution.
So it's reasonable to use the t-distribution percent points whenever the confidence interval is based on the sample $s$ instead of the population $\sigma$.
M&S give formulas for a general $100(1 - \alpha)\%$ confidence interval:
  $$\bar{Y} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}};$$
  here $\alpha = .05$ for a 95% CI;
  in some situations, $\alpha = .01$ for a 99% CI is preferred;
  other values are rarely used.
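The convergence of the t percent points to the normal ones can be seen without tables. This short sketch uses an arbitrary, illustrative set of degrees of freedom.

```r
# Compare upper 2.5% points of Student's t with the standard normal value.
df       <- c(5, 10, 30, 100, 1000)   # illustrative degrees of freedom
t_points <- qt(0.975, df = df)
z_point  <- qnorm(0.975)

comparison <- data.frame(df      = df,
                         t_point = round(t_points, 3),
                         z_point = round(z_point, 3))
print(comparison)
```

The t points are always larger than the normal point and shrink toward it as the degrees of freedom grow, so using them with large $n$ costs very little interval width.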

Review of Basic Concepts: Testing a Hypothesis About a Mean

Inference About a Parameter: Testing a Hypothesis

A point estimate is the most likely value of the parameter.
A confidence interval is a calibrated range of plausible values.
Sometimes we just want to know whether a particular value is plausible.
We assess its plausibility by testing statistical hypotheses.

Example: $\mu_0$ is an interesting value of the population mean $\mu$.
  Null hypothesis, $H_0: \mu = \mu_0$.
  Alternative hypothesis, $H_a: \mu \ne \mu_0$.
Data are a sample of size $n$ with mean $\bar{y}$ and standard deviation $s$.
Basic idea: $H_0$ is implausible if $\bar{y}$ is far from $\mu_0$.

To be precise:
  $$t = \frac{|\bar{y} - \mu_0|}{s / \sqrt{n}}$$
measures how far $\bar{y}$ is from $\mu_0$, as a multiple of the standard error of estimate.
Basic idea: reject $H_0$ if $t$ is large.
To be precise: choose a level of significance $\alpha$; again often $\alpha = .05$.
Reject $H_0$ if $t > t_{\alpha/2, n-1}$.
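The test procedure translates almost line by line into R. The sketch below uses simulated data and a null value of 35, both hypothetical choices for illustration, and compares the hand calculation with the built-in `t.test()`.

```r
# Two-tailed one-sample t test, by formula and via t.test().
set.seed(5)
y     <- rnorm(25, mean = 37, sd = 2.4)   # hypothetical sample
mu0   <- 35                               # hypothetical null value
alpha <- 0.05
n     <- length(y)

# Distance of the sample mean from mu0, in standard errors
t_stat <- abs(mean(y) - mu0) / (sd(y) / sqrt(n))

# Critical value t_{alpha/2, n-1}
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Decision rule: reject H0 if t exceeds the critical value
reject <- t_stat > t_crit

# Built-in test; its statistic is signed, so compare absolute values
builtin <- t.test(y, mu = mu0)

print(c(t_stat = t_stat, t_crit = t_crit))
print(reject)
print(builtin$p.value)
```

Rejecting when $t > t_{\alpha/2, n-1}$ is equivalent to rejecting when the two-sided p-value reported by `t.test()` is below $\alpha$.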

We can show that when $H_0$ is true, that is $\mu = \mu_0$, and the data are normally distributed, the chance of (incorrectly) rejecting $H_0$ is $\alpha$.
That is, $\alpha$ is the chance of making a Type I error.
If a statistician always follows this procedure, true null hypotheses will be rejected only $100\alpha\%$ of the time.
So when a null hypothesis is rejected, either it was actually false, or one of these infrequent errors occurred.
Note: We never accept $H_0$, we only fail to reject it.
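The Type I error rate can be estimated by repeatedly testing a null hypothesis that is true. This is a minimal sketch with assumed values (normal data, $\mu_0 = 37$, a hypothetical $\sigma = 2.4$, and a small sample size of 10).

```r
# Estimate the Type I error rate of the two-tailed t test when H0 is true.
# Assumptions: normal data, mu0 = 37, sigma = 2.4 (hypothetical), n = 10.
set.seed(6)
mu0   <- 37
sigma <- 2.4
n     <- 10
alpha <- 0.05
reps  <- 10000

t_crit <- qt(1 - alpha / 2, df = n - 1)

# Each repetition draws a sample with true mean mu0 and applies the decision rule
rejected <- replicate(reps, {
  y <- rnorm(n, mean = mu0, sd = sigma)
  abs(mean(y) - mu0) / (sd(y) / sqrt(n)) > t_crit
})

type1_rate <- mean(rejected)
print(type1_rate)   # should be close to alpha
```

Every rejection in this simulation is an error, since the data are generated with $\mu = \mu_0$.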

This is a two-tailed test: we reject $H_0$ if $\bar{y}$ is too far from $\mu_0$ in either direction.
In regression analysis, almost all tests are two-tailed.
M&S discuss one-tailed tests, and provide an example.
Deciding which hypothesis is $H_0$ and which is $H_a$ may not be easy.
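For comparison, R's `t.test()` supports both kinds of test through its `alternative` argument. The sketch below is not the M&S example; it uses simulated data and a hypothetical null value of 36 to show how the one-tailed and two-tailed p-values relate.

```r
# One-tailed versus two-tailed p-values for the same data and null value.
set.seed(7)
y   <- rnorm(25, mean = 37, sd = 2.4)   # hypothetical sample
mu0 <- 36                               # hypothetical null value

two_sided <- t.test(y, mu = mu0, alternative = "two.sided")$p.value
greater   <- t.test(y, mu = mu0, alternative = "greater")$p.value   # Ha: mu > mu0
less      <- t.test(y, mu = mu0, alternative = "less")$p.value      # Ha: mu < mu0

print(c(two_sided = two_sided, greater = greater, less = less))
```

The two one-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of them, which is why the direction of $H_a$ should be chosen before looking at the data.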
