 
              HYPOTHESES: LOGIC AND FRAMEWORK Business Statistics
CONTENTS ▪ A hypothesis test ▪ Hypotheses ▪ Rejection region and significance level ▪ Five-step procedure for hypothesis tests ▪ More on hypotheses ▪ Old exam question ▪ Further study
A HYPOTHESIS TEST ▪ Suppose a beverage company wants to test whether its bottles are filled with 1 liter ▪ more than 1 liter: not competitive ▪ less than 1 liter: trouble with the consumers' association ▪ They take a random sample of 9 bottles ▪ and find $\bar{x} = 1.02$ liter ▪ Can they claim $\mu = 1$ liter? ▪ Assume: ▪ the population is normally distributed ▪ the population has standard deviation $\sigma = 0.003$ liter ▪ so: $X \sim N(\mu = ?,\ \sigma = 0.003)$
A HYPOTHESIS TEST If all assumptions (including $\mu = 1$!) are true: ▪ the sampling distribution of the mean $\bar{X}$ ▪ is normal ▪ has mean $\mu_{\bar{X}} = \mu_X = 1$ ▪ has standard deviation $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{9}} = \frac{0.003}{3} = 0.001$ ▪ So, the probability of finding a sample mean $\bar{X} = 1.02$ or even larger (why "or even larger"? We'll go into that soon) is given by ▪ $P_N(\bar{X} \ge 1.02) = P_Z\left(\frac{\bar{X} - 1}{0.001} \ge \frac{1.02 - 1}{0.001}\right) = P_Z(Z \ge 20) = 0.000 \approx 0\%$ ▪ very, very unlikely! ▪ So, you can reject the claim $\mu_X = 1$ with high confidence!
A HYPOTHESIS TEST Now, suppose you had found $\bar{x} = 1.002$ liter ▪ The probability of finding a sample mean $\bar{X} = 1.002$ or even larger is given by ▪ $P_N(\bar{X} \ge 1.002) = P_Z\left(\frac{\bar{X} - 1}{0.001} \ge \frac{1.002 - 1}{0.001}\right) = P_Z(Z \ge 2) = 0.02275 \approx 2.3\%$ ▪ not very likely, but it may certainly happen now and then ▪ So, you can reject the claim $\mu_X = 1$ with some confidence ▪ but you know that there is some chance of making the wrong decision ▪ Or: you can decide not to reject the claim $\mu_X = 1$ ▪ because you know that it may still be true, despite the data
HYPOTHESES In general, a hypothesis is an unproven assertion ▪ In statistics: ▪ a hypothesis is a claim about a (population!) parameter ▪ Examples: ▪ the mean monthly cell phone bill in this city is 42 dollars ($\mu = 42$) ▪ the proportion of adults in this city with an iPhone is at least 0.68 ($\pi \ge 0.68$) ▪ the variance of spending on fashion for men is not smaller than that for women ($\sigma^2_{\text{men}} \ge \sigma^2_{\text{women}}$) ▪ the median life expectancy is the same for all three income groups ($M_1 = M_2 = M_3$)
HYPOTHESES Statistical hypotheses have the following aspects: ▪ A (population!) parameter ▪ $\mu$, $\pi$, $\sigma^2$, etc. ▪ In case of one sample: a benchmark ▪ $\mu = 181$, $\pi \le 0.2$, etc. ▪ In case of several samples: a comparison ▪ $\mu_1 = \mu_2$, $\pi_1 - \pi_2 \le 0.2$, $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$, etc. A hypothesis test is a decision between two competing, mutually exclusive and collectively exhaustive hypotheses about the value(s) of the parameter(s)
HYPOTHESES ▪ Examples of a hypothesis test: ▪ $H_0: \mu = 181$ versus $H_1: \mu \ne 181$ ▪ $H_0: \pi \le 0.2$ versus $H_1: \pi > 0.2$ ▪ Terminology ▪ $H_0$ is the null hypothesis (on which the test focuses) ▪ $H_1$ is the alternative hypothesis ▪ We focus on $H_0$ ▪ so if we reject $H_0$, we automatically accept $H_1$ ▪ while if we do not reject $H_0$, we “maintain” $H_0$ (but we do not reject $H_1$, and we do not accept $H_0$)
EXERCISE 1 A government official wants to proudly announce that unemployment is under 4%. Which hypothesis should he test?
REJECTION REGION AND SIGNIFICANCE LEVEL ▪ Example: ▪ $H_0: \mu = 181$ versus $H_1: \mu \ne 181$ ▪ We collect data and perform the hypothesis test ▪ Two possible outcomes: ▪ reject $H_0$, so accept $H_1$, and conclude $\mu \ne 181$ ▪ do not reject $H_0$, and conclude that there is no evidence to reject $\mu = 181$ ▪ Whatever the decision is, you may be wrong ▪ there is sampling variation ▪ you may always have an exceptional sample ▪ example: if you want to test whether a coin is fair, it may happen that you get only “heads” in your sample, even if the coin is fair!
REJECTION REGION AND SIGNIFICANCE LEVEL ▪ Between rejecting and not rejecting, there is a boundary ▪ This boundary defines the risk you are prepared to take ▪ if you want to test whether a coin is fair, and you use a sample of size 20, how many “heads” will induce you to reject the null hypothesis ($\pi = 0.5$)? ▪ You will determine a rejection region ▪ for instance: you will reject the null hypothesis ($\pi = 0.5$) when you obtain 5 heads or fewer, or 15 heads or more ▪ You use a pre-established significance level to determine the boundaries of the rejection region
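The coin rule above (reject for 5 heads or fewer, or 15 or more) can be tied to a significance level by adding up binomial probabilities under $H_0$. This is a minimal sketch, assuming a symmetric two-sided rule and the exact binomial distribution with $n = 20$ and $\pi = 0.5$; all function names are hypothetical.

```python
from math import comb

N_TOSSES = 20   # sample size from the slide
P_FAIR = 0.5    # H0: pi = 0.5


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_prob_under_h0(c: int, n: int = N_TOSSES, p: float = P_FAIR) -> float:
    """P(reject H0 | H0 true) for the rule: reject if heads <= c or heads >= n - c."""
    lower_tail = sum(binom_pmf(k, n, p) for k in range(0, c + 1))
    upper_tail = sum(binom_pmf(k, n, p) for k in range(n - c, n + 1))
    return lower_tail + upper_tail


def widest_symmetric_cutoff(alpha: float, n: int = N_TOSSES, p: float = P_FAIR) -> int:
    """Largest c whose rejection region keeps P(reject | H0) <= alpha (-1 if none)."""
    best = -1
    for c in range(0, (n + 1) // 2):  # keep the two tails from overlapping
        if rejection_prob_under_h0(c, n, p) <= alpha:
            best = c
    return best


print(f"{rejection_prob_under_h0(5):.4f}")  # rule '5 or fewer, or 15 or more'
print(widest_symmetric_cutoff(0.05))
```

Under these assumptions the rule rejects a fair coin with probability about 0.041; moving the boundaries to 6 and 14 would raise that to about 0.115, so 5 and 15 give the widest symmetric rule that stays within $\alpha = 0.05$.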
REJECTION REGION AND SIGNIFICANCE LEVEL ▪ So, you define a significance level ▪ conventional symbol $\alpha$ ▪ often taken to be 0.05 ▪ but 0.1, 0.01, 0.005, 0.001, etc. are also used often ▪ There is a close link between ▪ the confidence level ($1 - \alpha$, as used in a confidence interval) ▪ and the significance level ($\alpha$, as used in a hypothesis test) ▪ confidence level + significance level = 1
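The link between the two levels shows up in the critical values: a two-sided test at significance level $\alpha$ and a $(1 - \alpha)$ confidence interval for a mean with known $\sigma$ use the same standard normal cut-off. This is a minimal sketch, assuming a two-sided test on a normally distributed statistic and Python 3.8+; `two_sided_z_crit` is a hypothetical helper.

```python
from statistics import NormalDist


def two_sided_z_crit(alpha: float) -> float:
    """Critical value z with P(Z >= z) = alpha / 2 for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.1, 0.05, 0.01, 0.005, 0.001):
    confidence = 1 - alpha  # confidence level + significance level = 1
    print(f"alpha = {alpha:<6} confidence level = {confidence:.3f}  z_crit = {two_sided_z_crit(alpha):.3f}")
```

For the five levels listed this gives about 1.645, 1.960, 2.576, 2.807 and 3.291; 1.96 is the familiar value for $\alpha = 0.05$.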
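REJECTION REGION AND SIGNIFICANCE LEVEL ▪ Suppose we have a sample and want to see whether it comes from a distribution with mean $\mu_0$ ▪ assuming normality of the population ▪ assuming a known value for $\sigma$ ▪ testing $\mu = \mu_0$ at significance level $\alpha = 5\%$ ▪ We want to determine boundary values for $\bar{X}$ beyond which the claim $\mu = \mu_0$ becomes unlikely ▪ So, we distribute $\alpha = 5\%$ equally over both sides: ▪ upper boundary: $P(\bar{X} \ge \bar{x}_{\text{upper}}) = 0.025$ ▪ lower boundary: $P(\bar{X} \le \bar{x}_{\text{lower}}) = 0.025$ ▪ Because $P(Z \ge 1.96) = 0.025$ for a standard normal $Z$, these boundaries are $\bar{x}_{\text{lower}} = \mu_0 - 1.96\,\sigma_{\bar{X}}$ and $\bar{x}_{\text{upper}} = \mu_0 + 1.96\,\sigma_{\bar{X}}$ ▪ If the value of the test statistic is in the rejection region ▪ so if $\bar{x}_{\text{data}} \le \mu_0 - 1.96\,\sigma_{\bar{X}}$ or $\bar{x}_{\text{data}} \ge \mu_0 + 1.96\,\sigma_{\bar{X}}$ ▪ we reject $H_0: \mu = \mu_0$ and accept $H_1: \mu \ne \mu_0$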
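REJECTION REGION AND SIGNIFICANCE LEVEL [Figure: rejection region for the non-standardized statistic $\bar{X}$ ($\alpha = 0.05$): a central "Do not reject $H_0$" area of $1 - \alpha = 0.95$ around $\mu_0$, and "Reject $H_0$" tails of $\alpha/2 = 0.025$ each, below $\bar{x}_{\text{crit}} = \mu_0 - 1.96\,\sigma_{\bar{X}}$ and above $\bar{x}_{\text{crit}} = \mu_0 + 1.96\,\sigma_{\bar{X}}$]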
REJECTION REGION AND SIGNIFICANCE LEVEL ▪ The rejection region in this test is defined by the boundary values $\mu_0 - 1.96\,\sigma_{\bar{X}}$ and $\mu_0 + 1.96\,\sigma_{\bar{X}}$ ▪ But we can also standardize the test statistic, and focus on $Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$ rather than on $\bar{X}$ ▪ The rejection region in this test is then defined by the boundary values $-1.96$ and $1.96$ ▪ If the value of your standardized test statistic is in the rejection region ▪ so if $z_{\text{data}} \le -1.96$ or $z_{\text{data}} \ge 1.96$ ▪ reject $H_0: \mu = \mu_0$ and accept $H_1: \mu \ne \mu_0$
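Both descriptions of the rejection region lead to the same decision, since $Z$ is just a linear rescaling of $\bar{X}$. This is a minimal sketch, assuming a normal population with known $\sigma$; the bottle numbers ($\mu_0 = 1$, $\sigma = 0.003$, $n = 9$) are reused purely as illustrative input, and `two_sided_z_test` is a hypothetical helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(x_bar: float, mu_0: float, sigma: float, n: int, alpha: float = 0.05):
    """Test H0: mu = mu_0 against H1: mu != mu_0 with known sigma (normal population).

    Returns z_data, the X_bar boundaries, and the decision from each form of the region.
    """
    se = sigma / sqrt(n)                          # standard deviation of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05

    # Non-standardized form: compare x_bar with mu_0 -/+ z_crit * se
    lower, upper = mu_0 - z_crit * se, mu_0 + z_crit * se
    reject_by_xbar = x_bar <= lower or x_bar >= upper

    # Standardized form: compare z_data with -z_crit and +z_crit
    z_data = (x_bar - mu_0) / se
    reject_by_z = z_data <= -z_crit or z_data >= z_crit

    return z_data, (lower, upper), reject_by_xbar, reject_by_z


z_data, bounds, reject_xbar, reject_z = two_sided_z_test(x_bar=1.002, mu_0=1.0, sigma=0.003, n=9)
print(f"z_data = {z_data:.2f}, X_bar boundaries = ({bounds[0]:.5f}, {bounds[1]:.5f})")
print(f"reject H0 (X_bar form): {reject_xbar}, reject H0 (Z form): {reject_z}")
```

With $\bar{x} = 1.002$ the boundaries are about 0.99804 and 1.00196, and $z_{\text{data}} = 2.00 \ge 1.96$, so both forms reject at $\alpha = 0.05$.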
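REJECTION REGION AND SIGNIFICANCE LEVEL [Figure: rejection region for the standardized statistic $Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$ ($\alpha = 0.05$): a central "Do not reject $H_0$" area of $1 - \alpha = 0.95$ around 0, and "Reject $H_0$" tails of $\alpha/2 = 0.025$ each, below $z_{\text{crit}} = -1.96$ and above $z_{\text{crit}} = +1.96$]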
EXERCISE 2 Suppose we test a hypothesis on the mean, $H_0: \mu = 310$, with significance level $\alpha = 0.05$. We sample data and calculate a test statistic $t = -2.13$. What do we conclude?
FIVE-STEP PROCEDURE FOR HYPOTHESIS TESTS Five-step procedure ▪ step 1: state the hypotheses and the significance level ▪ step 2: choose a sample statistic and determine the rejection region (qualitatively) ▪ step 3: determine the null distribution, and state and/or check the requirements needed ▪ step 4: calculate the value of the test statistic and its critical value(s) ▪ step 5: draw conclusions Every book and course arranges these steps somewhat differently. Never mind: the same elements always reappear.
FIVE-STEP PROCEDURE FOR HYPOTHESIS TESTS Using an example about the mean body height $\mu_X$ of a population with $\sigma_X^2 = 225\ \text{cm}^2$ ▪ On the basis of a sample of size $n = 100$ with $\bar{x} = 179.1$ cm Step 1 ▪ State the hypotheses and the significance level ▪ null hypothesis $H_0: \mu_X = 181$ ▪ alternative hypothesis $H_1: \mu_X \ne 181$ ▪ significance level $\alpha = 0.05$
FIVE-STEP PROCEDURE FOR HYPOTHESIS TESTS Step 2 ▪ Choose a sample statistic and determine the rejection region (qualitatively) ▪ sample statistic: sample mean $\bar{X}$ ▪ because the hypothesis is about $\mu_X$ ▪ rejection region: reject $H_0$ when $\bar{x}$ is “too small” or “too large” ▪ because both situations suggest that $H_0$ is probably wrong
FIVE-STEP PROCEDURE FOR HYPOTHESIS TESTS Step 3 ▪ Determine the null distribution, and state and/or validate the requirements needed A. sampling distribution of $\bar{X}$ under $H_0$: $\bar{X} \sim N\left(181, \frac{225}{100}\right)$, i.e. mean 181 and variance $225/100 = 2.25$ ▪ or even better: $Z = \frac{\bar{X} - 181}{\sqrt{225/100}} \sim N(0, 1)$ ▪ where the sample statistic $\bar{X}$ is transformed into a standardized test statistic $Z$ B. requirements: because $n = 100 \ge 30$, the sampling distribution of $\bar{X}$ will indeed be approximately normal ▪ no additional assumptions are needed
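FIVE-STEP PROCEDURE FOR HYPOTHESIS TESTS Step 4 ▪ Calculate the value of the test statistic and its critical value(s) ▪ value of $Z$ calculated from the data: $\frac{179.1 - 181}{\sqrt{225/100}} = \frac{-1.9}{1.5} = -1.267$ ▪ we write this as $z_{\text{calc}} = -1.267$ ▪ critical values of $Z$ from the table are $z_{\text{crit,lower},0.025} = -1.96$ and $z_{\text{crit,upper},0.025} = 1.96$ ▪ rejection region for $Z$ is $R_{\text{crit}} = (-\infty, -1.96] \cup [1.96, \infty)$ [Figure: standard normal axis marking $-1.96$, $0$ and $+1.96$, with $z_{\text{calc}} = -1.267$ between $-1.96$ and $0$]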
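Steps 1 to 4 of the body-height example can be mirrored in a few lines. This is a minimal sketch, assuming Python 3.8+; the critical value comes from `statistics.NormalDist` instead of a table, and the variable names are illustrative. It stops at reporting whether $z_{\text{calc}}$ falls in the rejection region, leaving the conclusion to step 5.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses and significance level (values from the slides)
MU_0 = 181.0    # H0: mu_X = 181, H1: mu_X != 181
ALPHA = 0.05

# Data and known population variance
SIGMA_SQ = 225.0  # cm^2
N = 100
X_BAR = 179.1     # cm

# Step 3: under H0, X_bar ~ N(181, 225/100); its standard deviation is sqrt(2.25)
se = sqrt(SIGMA_SQ / N)  # 1.5 cm

# Step 4: value of the test statistic and the two-sided critical values
z_calc = (X_BAR - MU_0) / se
z_crit_upper = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96
z_crit_lower = -z_crit_upper
in_rejection_region = z_calc <= z_crit_lower or z_calc >= z_crit_upper

print(f"z_calc = {z_calc:.3f}")
print(f"rejection region: (-inf, {z_crit_lower:.2f}] U [{z_crit_upper:.2f}, inf)")
print(f"z_calc in rejection region: {in_rejection_region}")
```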