Business Statistics

$\mu$: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS

CONTENTS
▪ Estimating parameters
▪ The sampling distribution
▪ Confidence intervals for $\mu$
▪ Hypothesis tests for $\mu$
▪ The $t$-distribution
▪ Comparison of $z$ and $t$
▪ Old exam question
▪ Further study

ESTIMATING PARAMETERS
Central task in inferential statistics
▪ Estimation
  ▪ estimating a parameter (population value) from a sample
▪ Example
  ▪ what proportion of cars in Amsterdam is electric?
  ▪ population value: $\pi$
  ▪ a sample of size $n = 200$ cars yields 26 electric cars
  ▪ so, $p = 26/200 = 0.13$
  ▪ this suggests $\pi \approx 0.13$

ESTIMATING PARAMETERS
Terminology
▪ Parameter
  ▪ a descriptive characteristic of the population
  ▪ e.g., $\mu$, $\pi$, $\sigma$ (or $\sigma^2$)
▪ Estimator
  ▪ a statistic derived from a sample to infer the value of a population parameter
  ▪ e.g., $\bar{X}$, $P$, $S$ (or $S^2$)
▪ Estimate
  ▪ the value of the estimator in a particular sample
  ▪ e.g., $\bar{x}$, $p$, $s$ (or $s^2$)

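ESTIMATING PARAMETERS
| Population parameter | Estimator | Estimate |
| --- | --- | --- |
| Mean $\mu$ | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
| Standard deviation $\sigma$ | $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$ | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ |
| Proportion $\pi$ | $P = \frac{X}{n}$ | $p = \frac{x}{n}$ |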
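The three estimates in the table can be computed directly. The following is a minimal sketch in Python; the five data values and the variable names are illustrative assumptions, while the proportion reuses the electric-car counts from the earlier example.

```python
import math

# Hypothetical sample of n = 5 measurements (illustrative values only).
sample = [2.0, 2.1, 1.9, 2.3, 2.2]
n = len(sample)

# Estimate of the mean: x_bar = (1/n) * sum(x_i)
x_bar = sum(sample) / n

# Estimate of the standard deviation:
# s = sqrt( sum((x_i - x_bar)^2) / (n - 1) )
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

# Estimate of a proportion: p = x / n, using the electric-car counts
# (26 electric cars in a sample of 200 cars).
p = 26 / 200

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, p = {p:.2f}")
```

Note the divisor $n - 1$ (not $n$) in the sample standard deviation, as in the table.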

ESTIMATING PARAMETERS
▪ Another example (Amsterdam, 2015):
  ▪ what is the mean price of a glass of beer?
  ▪ population value: $\mu$
  ▪ a sample of size $n = 64$ glasses of beer yields $\bar{x}$ = €2.06
  ▪ this suggests that $\mu \approx$ €2.06
▪ But suppose we had taken a different sample
  ▪ again with sample size $n = 64$
  ▪ but now perhaps yielding $\bar{x}$ = €2.13
  ▪ then we would estimate $\mu \approx$ €2.13
▪ Obviously there is sampling variation
  ▪ so there is a distribution of $\bar{x}$-values (the sampling distribution of $\bar{X}$)
▪ Solution: point estimates and confidence intervals

THE SAMPLING DISTRIBUTION
▪ Example
  ▪ Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}
  ▪ The population parameters are:
    ▪ $\mu = 1.5$
    ▪ $\sigma = 1.118$

THE SAMPLING DISTRIBUTION
▪ Sample $n = 2$ values and calculate $\bar{x}$
▪ Do this for all possible samples of size $n = 2$
▪ You will get a distribution of $\bar{x}$-values: the distribution of $\bar{X}$
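The enumeration described above is small enough to carry out in full. The sketch below assumes that samples are ordered and drawn with replacement (so there are $4^2 = 16$ equally likely samples); the slides do not state this explicitly.

```python
from collections import Counter
from itertools import product
import math

population = [0, 1, 2, 3]
n = 2

# Assumption: ordered samples drawn with replacement (4^2 = 16 samples).
sample_means = [sum(sample) / n for sample in product(population, repeat=n)]

# Sampling distribution of the sample mean: value -> probability.
counts = Counter(sample_means)
distribution = {m: c / len(sample_means) for m, c in sorted(counts.items())}

mean_of_means = sum(m * p for m, p in distribution.items())
sd_of_means = math.sqrt(
    sum(p * (m - mean_of_means) ** 2 for m, p in distribution.items())
)

for m, p in distribution.items():
    print(f"x_bar = {m:.1f}  probability = {p:.4f}")
print(f"mean of x_bar = {mean_of_means:.3f}")
print(f"sd of x_bar   = {sd_of_means:.3f}")
```

Under this assumption the mean of the $\bar{x}$-values equals the population mean $\mu = 1.5$, and their standard deviation equals $\sigma/\sqrt{2} \approx 0.791$, which is smaller than $\sigma = 1.118$.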

THE SAMPLING DISTRIBUTION
▪ We will study the variability of an estimate of a population parameter obtained from a sample statistic
▪ We will do so by studying how the sample statistic varies when you draw a different sample
▪ Example:
  ▪ GMAT scores of MBA students
  ▪ $N = 2637$
  ▪ $\mu = 520.78$
  ▪ $\sigma = 86.60$

THE SAMPLING DISTRIBUTION
▪ Consider eight random samples, each of size $n = 5$
  ▪ the sample means ($\bar{x}_1 = 504.0$, $\bar{x}_2 = 576.0$, …, $\bar{x}_8 = 582$) tend to be close to the population mean ($\mu = 520.78$)
  ▪ sometimes a bit lower, sometimes a bit higher

THE SAMPLING DISTRIBUTION
▪ The dot plots show that the sample means ($\bar{x}_1, \dots, \bar{x}_8$) have much less variation than the individual data points ($x_1, \dots, x_{2637}$)

THE SAMPLING DISTRIBUTION
▪ An estimator is a random variable since samples vary
  ▪ so we write it as a capital letter, e.g., $X$, $\bar{X}$, $S$, etc.
▪ The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of (a fixed) size $n$ is taken
  ▪ so we write $X \sim N(\mu, \sigma^2)$, etc.

THE SAMPLING DISTRIBUTION
▪ The sampling distribution of $\bar{X}$
  ▪ for a population with $\mu = \mu_X$ and $\sigma^2 = \sigma_X^2$
▪ If the CLT holds: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$
  ▪ this states 3 things: shape, mean, dispersion
▪ So, the statistic $\bar{X}$
  ▪ is normally distributed
  ▪ has mean $\mu_X$
  ▪ and has standard deviation $\frac{\sigma_X}{\sqrt{n}}$
▪ Fortunately, the CLT holds pretty often

THE SAMPLING DISTRIBUTION
▪ The standard deviation of the distribution of sample means $\bar{X}$
  ▪ is given by $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
  ▪ has a special name: standard error of the mean
  ▪ is often abbreviated as the standard error (SE)
    ▪ that's a bit confusing, because we will meet more standard errors later on
  ▪ decreases with increasing sample size
  ▪ but only according to the "law of diminishing returns" ($1/\sqrt{n}$)
  ▪ is often calculated by software (SPSS, etc.)
  ▪ is the basis for confidence intervals and hypothesis tests (see later)
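The "diminishing returns" behaviour of the standard error can be seen by evaluating $\sigma/\sqrt{n}$ for a few sample sizes. This sketch uses $\sigma = 86.60$ from the GMAT example; the function name and the sample sizes other than $n = 5$ are illustrative assumptions.

```python
import math

sigma = 86.60  # population standard deviation from the GMAT example


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Each step quadruples the sample size (sizes chosen for illustration).
for n in (5, 20, 80, 320):
    print(f"n = {n:3d}  SE = {standard_error(sigma, n):6.2f}")
```

Each fourfold increase in $n$ only halves the standard error, so ever larger samples are needed for the same gain in precision.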

EXERCISE 1 What is the meaning of the standard error?

CONFIDENCE INTERVALS FOR $\mu$
▪ A sample mean $\bar{x}$ is a point estimate of the population mean $\mu$
  ▪ it is the best possible estimate of $\mu$
  ▪ but it will probably not be completely right
▪ A confidence interval (CI) for the mean is a range of possible values for $\mu$: $\mu_{\text{lower}} \le \mu \le \mu_{\text{upper}}$
  ▪ such that the interval $\text{CI}_{\mu} = [\mu_{\text{lower}}, \mu_{\text{upper}}]$ contains the true value ($\mu$) with a certain probability (e.g., 95%)
Note: to simplify notation, we drop the "$X$" from $\mu_X$ from now on and write just $\mu$.

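CONFIDENCE INTERVALS FOR $\mu$
▪ From the CLT it follows that under certain conditions:
  ▪ the distribution of $\bar{X}$ is normal
  ▪ the best estimate of $\mu$ (the mean of $\bar{X}$) is $\bar{x}$
  ▪ the standard deviation of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$
▪ This implies that:
  ▪ with probability 2.5%, $\bar{X} < \mu - 1.96\frac{\sigma}{\sqrt{n}} \Rightarrow \mu > \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
  ▪ with probability 2.5%, $\bar{X} > \mu + 1.96\frac{\sigma}{\sqrt{n}} \Rightarrow \mu < \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$
  ▪ so with probability 95%, $\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
▪ So, if we find a sample mean $\bar{x}$, we can construct the following 95% confidence interval for $\mu$:
  $\text{CI}_{\mu,0.95} = \left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$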

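CONFIDENCE INTERVALS FOR $\mu$
Three notations for a confidence interval for $\mu$
▪ $\left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$
▪ $\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
▪ $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$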

CONFIDENCE INTERVALS FOR $\mu$
Example
▪ Population
  ▪ $\mu = 520.78$ (unknown)
  ▪ $\sigma = 86.60$ (known)
  ▪ normally distributed (assumed)
▪ Sample
  ▪ $n = 5$ (chosen)
  ▪ $\bar{x} = 504.0$ (estimated)
▪ Calculation
  ▪ standard error of the mean: $\frac{86.60}{\sqrt{5}} = 38.73$
  ▪ $1.96 \times 38.73 = 75.91$
  ▪ $\text{CI}_{\mu,0.95} = [428.09, 579.91]$
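The calculation in this example can be reproduced in a few lines. This is a minimal sketch assuming $\sigma$ is known, as on the slide; the function name `z_confidence_interval` is a hypothetical helper, not part of any library.

```python
import math


def z_confidence_interval(x_bar, sigma, n, z=1.96):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n), sigma known."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the GMAT example: x_bar = 504.0, sigma = 86.60, n = 5.
lower, upper = z_confidence_interval(x_bar=504.0, sigma=86.60, n=5)
print(f"95% CI for mu: [{lower:.2f}, {upper:.2f}]")
```

Rounded to two decimals this gives the interval $[428.09, 579.91]$ from the slide.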

EXERCISE 2 Write the confidence interval $[428.09, 579.91]$ in two alternative ways.

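CONFIDENCE INTERVALS FOR $\mu$
▪ The factor 1.96 is of course related to the 95% probability
▪ Other confidence levels use a different factor $z_{\alpha/2}$
  ▪ where $z_{\alpha/2}$ is such that $P(|Z| \le z_{\alpha/2}) = 1 - \alpha$ if $Z$ is drawn from a standard normal ($z$-) distribution
▪ General form of a $(1 - \alpha) \times 100\%$ confidence interval of the mean:
  $\text{CI}_{\mu,1-\alpha} = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$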
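The factor $z_{\alpha/2}$ for a given confidence level can be looked up in a table or computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` and the chosen confidence levels are illustrative assumptions.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value z_{alpha/2} for confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    # Upper alpha/2 point of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 90%, 95%, and 99% confidence this yields factors of about 1.645, 1.960, and 2.576.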

CONFIDENCE INTERVALS FOR $\mu$
▪ Trade-off
  ▪ narrow CI ⇔ low confidence level
  ▪ wide CI ⇔ high confidence level
▪ Choice of confidence level depends on application
  ▪ more precision required for a refinery than for a dairy farm

CONFIDENCE INTERVALS FOR $\mu$
▪ A confidence interval either does or does not contain $\mu$
▪ The confidence level quantifies the risk
▪ Out of 100 confidence intervals at the 95% level, approximately 95 will contain $\mu$, while approximately 5 will not
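This long-run interpretation can be checked by simulation. The sketch below assumes a normally distributed population with the GMAT parameters; the seed and the number of trials are arbitrary choices.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

# GMAT example; in practice mu is unknown to the analyst.
mu, sigma, n = 520.78, 86.60, 5
z = 1.96
trials = 10_000  # number of simulated samples (arbitrary choice)

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    margin = z * sigma / math.sqrt(n)
    # Count the intervals that contain the true mean.
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction should come out close to 0.95. Any single interval, however, either contains $\mu$ or it does not.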

HYPOTHESIS TESTS FOR $\mu$
▪ We can use the standard error to perform a hypothesis test
  ▪ recall that $\text{CI}_{\mu,0.95} = [428.09, 579.91]$
▪ Suppose we hypothesize $\mu = 550$
▪ The value 550 is inside the 95% confidence interval for $\mu$
  ▪ therefore the sample statistic and its confidence interval do not suggest that the hypothesis ($\mu = 550$) is wrong
  ▪ and we will not reject the hypothesis
  ▪ notice that we didn't say that $\mu = 550$; we only said that we can't reject it (at a 5% significance level)

HYPOTHESIS TESTS FOR $\mu$
▪ Another example: suppose we hypothesize that $\mu = 600$
▪ The value 600 is outside the 95% confidence interval for $\mu$
  ▪ finding a confidence interval that does not contain $\mu$ happens in only 5% of the cases
  ▪ so we conclude that $\mu \ne 600$ (at a 5% significance level)
  ▪ therefore the sample statistic and its confidence interval suggest that the hypothesis ($\mu = 600$) is wrong
  ▪ and we will reject the hypothesis
Much more on hypothesis tests later on!
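The decision rule used in these two examples (reject when the hypothesized value lies outside the 95% confidence interval) can be written as a small function. This is a sketch with $\sigma$ assumed known; `reject_at_5_percent` is a hypothetical helper name.

```python
import math


def reject_at_5_percent(mu_0, x_bar, sigma, n, z=1.96):
    """Reject the hypothesis mu = mu_0 if mu_0 lies outside the 95% CI."""
    margin = z * sigma / math.sqrt(n)
    return not (x_bar - margin <= mu_0 <= x_bar + margin)


# GMAT sample: x_bar = 504.0, sigma = 86.60, n = 5.
for mu_0 in (550, 600):
    rejected = reject_at_5_percent(mu_0, x_bar=504.0, sigma=86.60, n=5)
    decision = "reject" if rejected else "do not reject"
    print(f"hypothesis mu = {mu_0}: {decision}")
```

With the GMAT sample, the hypothesis $\mu = 550$ is not rejected and $\mu = 600$ is rejected, matching the slides.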

THE $t$-DISTRIBUTION
▪ A closer look at $\text{CI}_{\mu,0.95} = \left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$
▪ Given a sample mean $\bar{x}$, you can find a 95% confidence interval for the population mean $\mu$
▪ Sounds great when you don't know $\mu$ ...
▪ ... but it assumes you do know $\sigma$!
▪ There are many situations in which you don't know $\mu$ and you also don't know $\sigma$
▪ So what to do?

THE $t$-DISTRIBUTION
▪ A simple strategy
  ▪ If the population standard deviation $\sigma$ is unknown, we can estimate it with the sample standard deviation $s$
  ▪ Then we use $\pm 1.96\frac{s}{\sqrt{n}}$ instead of $\pm 1.96\frac{\sigma}{\sqrt{n}}$
▪ But we pay a price for that
  ▪ The reason is that $s$ is itself an estimate of $\sigma$, and therefore uncertain
  ▪ The price we pay is that the factor "1.96" must be somewhat larger
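Why the factor must grow can be illustrated by simulation: if $\sigma$ is replaced by $s$ while the factor stays at 1.96, fewer than 95% of the intervals contain $\mu$. The sketch below assumes a normally distributed population with the GMAT parameters and $n = 5$; the seed and the number of trials are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

mu, sigma, n = 520.78, 86.60, 5
trials = 20_000  # number of simulated samples (arbitrary choice)

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (divisor n - 1)
    margin = 1.96 * s / math.sqrt(n)  # sigma replaced by s, factor kept at 1.96
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage_with_s = covered / trials
print(f"coverage with s and factor 1.96: {coverage_with_s:.3f}")
```

For such a small sample the coverage falls noticeably below 0.95, so a factor larger than 1.96 is needed to restore the stated confidence level.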
