Business Statistics: Comparing the variance of two populations


Dec 03, 2022


SLIDE 1

TWO $\sigma^2$S: COMPARISONS

Business Statistics

SLIDE 2

▪ Comparing the variance of two populations
▪ The $F$-distribution
▪ The $F$-test
▪ Levene’s test
▪ Old exam question
▪ Further study
CONTENTS

SLIDE 3

▪ So far, the emphasis was on differences in centrality
▪ There are also questions on differences in dispersion
▪ Context:
  ▪ you can choose between two drilling machines
  ▪ both make holes of the specified size
  ▪ but the precision of the two may be different
  ▪ so you do an experiment (intended hole size: 3 mm)

COMPARING THE VARIANCE OF TWO POPULATIONS

SLIDE 4

▪ A second case
▪ Recall that we can compare two population means under the assumption of equal population variances
  ▪ using the pooled variance
▪ Thus, we may need to check if the population variances are indeed equal
COMPARING THE VARIANCE OF TWO POPULATIONS

SLIDE 5

Test statistic to consider
▪ A combination of $S_X^2$ and $S_Y^2$
▪ but the null distribution of $S_X^2 - S_Y^2$ is problematic
▪ Instead, $\frac{S_X^2}{S_Y^2}$ is possible
▪ or $\frac{S_Y^2}{S_X^2}$ (sometimes easier)

▪ What is the hypothesis?
  ▪ $H_0: \sigma_X^2 = \sigma_Y^2$
  ▪ $H_0: \sigma_X^2 \geq \sigma_Y^2$
  ▪ $H_0: \sigma_X^2 \leq \sigma_Y^2$
  ▪ but $\sigma_X^2 = \sigma_Y^2 + 3$ etc. is not possible!

COMPARING THE VARIANCE OF TWO POPULATIONS

These hypotheses are equivalent to $\frac{\sigma_X^2}{\sigma_Y^2} = 1$ (or $\geq 1$ or $\leq 1$)
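
One way to see why the ratio is the more convenient statistic: multiplying all observations by the same factor (which is what a different, unknown common variance amounts to) leaves $S_X^2/S_Y^2$ unchanged, while $S_X^2 - S_Y^2$ changes with the scale. The sketch below only illustrates this point. The data values and the helper names `sample_variance` and `ratio_and_difference` are made up for the example and do not come from the slides.

```python
def sample_variance(xs):
    """Unbiased sample variance s^2 (divides by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((v - mean) ** 2 for v in xs) / (n - 1)

# Hypothetical hole sizes (mm) for two machines; NOT data from the slides.
x = [3.01, 2.98, 3.02, 2.99, 3.00]
y = [3.05, 2.95, 3.03, 2.97, 3.00]

def ratio_and_difference(scale):
    """Return (s_x^2 / s_y^2, s_x^2 - s_y^2) after multiplying all data by `scale`."""
    sx2 = sample_variance([scale * v for v in x])
    sy2 = sample_variance([scale * v for v in y])
    return sx2 / sy2, sx2 - sy2

ratio_1, diff_1 = ratio_and_difference(1.0)
ratio_10, diff_10 = ratio_and_difference(10.0)

print(ratio_1, ratio_10)  # the ratio is the same (up to rounding error)
print(diff_1, diff_10)    # the difference grows by a factor scale**2 = 100
```

Because the ratio does not depend on the common scale, its distribution under $H_0$ can be tabulated once and for all, which is not true of the difference.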

SLIDE 6

▪ For normally distributed populations, it is known that:
  ▪ under $H_0: \sigma_X^2 = \sigma_Y^2$: $\frac{S_X^2}{S_Y^2} \sim F_{n_X-1,\,n_Y-1}$
  ▪ where $F_{df_1,df_2}$ is the $F$-distribution
  ▪ with $df_1$ and $df_2$ degrees of freedom
  ▪ note: $F$ is a ratio of two variances, use $df_1$ for the numerator and $df_2$ for the denominator
THE $F$-DISTRIBUTION
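
The distributional claim can be checked by simulation. The sketch below draws many pairs of normal samples with equal variances, with $n_X = 7$ and $n_Y = 6$ (so $df_1 = 6$, $df_2 = 5$), and counts how often the variance ratio exceeds 6.98, the tabulated upper 2.5% point of $F_{6,5}$. The function names, the number of trials and the seed are arbitrary choices for this illustration.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance s^2 (divides by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((v - mean) ** 2 for v in xs) / (n - 1)

def simulate_tail_fraction(n_x, n_y, threshold, sigma=1.0, trials=20000, seed=1):
    """Fraction of simulated ratios s_x^2 / s_y^2 above `threshold` when H0 is true."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        # Both samples come from normal populations with the SAME variance.
        xs = [rng.gauss(0.0, sigma) for _ in range(n_x)]
        ys = [rng.gauss(0.0, sigma) for _ in range(n_y)]
        if sample_variance(xs) / sample_variance(ys) > threshold:
            exceed += 1
    return exceed / trials

frac = simulate_tail_fraction(7, 6, threshold=6.98)
print(frac)  # should be close to 0.025
```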

SLIDE 7

▪ So, we compute the value of a test statistic $F_{\text{calc}} = \frac{s_X^2}{s_Y^2}$
  ▪ and expect it to be around 1 if $H_0$ is true
  ▪ and reject $H_0$ if $F_{\text{calc}}$ is “too” small or “too” large
▪ we need to look up the critical values of the $F$-distribution
THE $F$-DISTRIBUTION

[Figure: $F$-distribution with $F_{\text{crit,lower}}$, the value 1, and $F_{\text{crit,upper}}$ marked]
Of course (!) you expect $F \approx 1$ when $H_0$ is true

SLIDE 8

▪ $F$-distribution
  ▪ like $\chi^2$: not symmetrical and strictly positive
  ▪ need to find $F_{\text{crit,lower}}$ and $F_{\text{crit,upper}}$
THE $F$-DISTRIBUTION

SLIDE 9

Looking up critical values for $F$
[Figure: $F$-table lookup, with labels $\alpha/2$, $df_2$ and $df_1$]
Is this $F_{\text{crit,lower}}$ or $F_{\text{crit,upper}}$?
THE $F$-DISTRIBUTION

SLIDE 10

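▪ Finding $F_{\text{crit,lower}}$ when you know $F_{\text{crit,upper}}$: $F_{\text{crit,lower}}(df_1, df_2) = \frac{1}{F_{\text{crit,upper}}(df_2, df_1)}$
▪ so $F_{\text{crit,lower}} = \frac{1}{3.85} = 0.26$
THE $F$-DISTRIBUTION
$\frac{S_1^2}{S_2^2} > a \iff \frac{S_2^2}{S_1^2} < \frac{1}{a}$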
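
The reciprocal rule follows from the fact that, under $H_0$ and normality, the reciprocal of an $F$-distributed ratio is again $F$-distributed with the degrees of freedom swapped. A short derivation, where $F_{\text{crit,upper}}(df_2, df_1)$ denotes the upper $\alpha/2$ critical value (the function-style notation is chosen here for readability):

```latex
\begin{align*}
F = \frac{S_1^2}{S_2^2} \sim F_{df_1,df_2}
  &\;\Longrightarrow\;
  \frac{1}{F} = \frac{S_2^2}{S_1^2} \sim F_{df_2,df_1} \\
P\!\left(\frac{1}{F} > F_{\text{crit,upper}}(df_2,df_1)\right) = \frac{\alpha}{2}
  &\;\Longleftrightarrow\;
  P\!\left(F < \frac{1}{F_{\text{crit,upper}}(df_2,df_1)}\right) = \frac{\alpha}{2} \\
  &\;\Longrightarrow\;
  F_{\text{crit,lower}}(df_1,df_2) = \frac{1}{F_{\text{crit,upper}}(df_2,df_1)}
\end{align*}
```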

SLIDE 11

Step 1:
▪ $H_0: \sigma_1^2 = \sigma_2^2$; $H_1: \sigma_1^2 \neq \sigma_2^2$; $\alpha = 0.05$
Step 2:
▪ sample statistic: $F = \frac{S_1^2}{S_2^2}$; reject for “too small” and “too large” values
Step 3:
▪ null distribution: $F \sim F_{df_1,df_2}$
▪ both populations must be normally distributed (no CLT here!)
Step 4:
▪ $F_{\text{calc}} = \frac{s_1^2}{s_2^2}$; $F_{\text{crit,lower}} = \cdots$; $F_{\text{crit,upper}} = \cdots$ (use $\alpha/2$)
Step 5:
▪ reject $H_0$ if $F_{\text{calc}} < F_{\text{crit,lower}}$ or $F_{\text{calc}} > F_{\text{crit,upper}}$
THE $F$-TEST

SLIDE 12

Example:
▪ machine 1 gives $s_1^2 = 0.012$ with $n_1 = 6$
▪ machine 2 gives $s_2^2 = 0.016$ with $n_2 = 7$
Computations:
▪ $F_{\text{calc}} = \frac{0.012}{0.016} = 0.75$
▪ swap the two machines
  ▪ $F_{\text{calc}} = \frac{0.016}{0.012} = 1.33$
▪ null distribution: $F \sim F_{6,5}$
▪ $F_{\text{crit,upper}} = F_{6,5;0.025} = 6.98$
▪ $F_{\text{calc}} < F_{\text{crit,upper}}$, do not reject $H_0$
THE $F$-TEST
upper critical values can be read in the $F$-table, so easier to look up $F_{\text{crit,upper}}$
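
The same example can be reproduced numerically instead of with a table. The sketch below uses only the Python standard library. The helpers `_beta_cf`, `f_cdf` and `f_ppf` are names invented for this illustration: they evaluate the $F$ cumulative distribution through a standard continued-fraction expansion of the regularized incomplete beta function and invert it by bisection. In practice a statistics package would normally supply these functions.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction used for the regularized incomplete beta function."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def f_cdf(f, df1, df2):
    """P(F <= f) for an F-distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    a, b = df1 / 2.0, df2 / 2.0
    x = df1 * f / (df1 * f + df2)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_ppf(p, df1, df2):
    """Quantile: the value f with P(F <= f) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.05
s2_1, n_1 = 0.012, 6  # machine 1
s2_2, n_2 = 0.016, 7  # machine 2

# Machine 2 in the numerator, as in the "swapped" computation on the slide.
f_calc = s2_2 / s2_1
df1, df2 = n_2 - 1, n_1 - 1
f_crit_upper = f_ppf(1 - alpha / 2, df1, df2)
f_crit_lower = f_ppf(alpha / 2, df1, df2)
reject = f_calc < f_crit_lower or f_calc > f_crit_upper

print(round(f_calc, 2), round(f_crit_upper, 2), reject)
```

The computed upper critical value should agree with the tabulated 6.98, and the lower critical value should equal the reciprocal of the upper critical value with the degrees of freedom swapped.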

SLIDE 13

Trick to avoid the use of $F_{\text{crit,lower}}$ when testing $H_0: \sigma_1^2 = \sigma_2^2$
▪ put largest sample variance in numerator
▪ so calculated value of test statistic is $F_{\text{calc}} = \frac{s_1^2}{s_2^2}$ or $F_{\text{calc}} = \frac{s_2^2}{s_1^2}$
▪ with this, $F_{\text{calc}} \geq 1$, so we need only consider $F_{\text{crit,upper}}$
  ▪ because $F_{\text{crit,lower}}$ is always $< 1$
▪ formally reject for “too large” and “too small”
▪ so keep using $\alpha/2$ and not $\alpha$ (it is still a two-sided test)
THE $F$-TEST
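
A small sketch of the trick, using the drilling-machine figures from the earlier example. The point to watch is that the degrees of freedom must travel with the variances: whichever sample supplies the numerator also supplies $df_1$. The function name `f_stat_largest_on_top` is hypothetical.

```python
def f_stat_largest_on_top(s2_1, n_1, s2_2, n_2):
    """Return (F_calc, df_numerator, df_denominator) with the larger s^2 on top."""
    if s2_1 >= s2_2:
        return s2_1 / s2_2, n_1 - 1, n_2 - 1
    return s2_2 / s2_1, n_2 - 1, n_1 - 1

# Machine 1: s^2 = 0.012, n = 6; machine 2: s^2 = 0.016, n = 7.
f_calc, df_num, df_den = f_stat_largest_on_top(0.012, 6, 0.016, 7)
print(round(f_calc, 2), df_num, df_den)  # 1.33 6 5
```

The resulting $F_{\text{calc}}$ is then compared with the upper $\alpha/2$ critical value of $F_{df_{\text{num}},\,df_{\text{den}}}$ only.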

SLIDE 14

One-sided tests: what is different compared to two-sided?
Step 1:
▪ $H_0: \sigma_1^2 \geq \sigma_2^2$; $H_1: \sigma_1^2 < \sigma_2^2$; $\alpha = 0.05$
▪ reformulate as $H_0: \sigma_2^2 \leq \sigma_1^2$; $H_1: \sigma_2^2 > \sigma_1^2$; $\alpha = 0.05$
Step 2:
▪ sample statistic: $F = \frac{S_2^2}{S_1^2}$; reject for “too large” values
Step 3:
▪ null distribution: $F \sim F_{df_2,df_1}$
▪ both populations must be normally distributed
Step 4:
▪ $F_{\text{calc}} = \frac{s_2^2}{s_1^2}$; $F_{\text{crit,upper}} = \cdots$ (use $\alpha$)
Step 5:
▪ reject $H_0$ if $F_{\text{calc}} > F_{\text{crit,upper}}$
THE $F$-TEST

SLIDE 15

Consider a sample 1 with $n_1 = 9$ and $s_1 = 4$ and a sample 2 with $n_2 = 7$ and $s_2 = 5$. We want to test $H_0: \sigma_1^2 = \sigma_2^2$.

a. What is step 2?
b. What is step 3?

EXERCISE 1

SLIDE 16

Recall that SPSS also did a comparison of two variances when we asked for comparing two means
▪ Levene’s test
LEVENE’S TEST

SLIDE 17

The Levene test
▪ is based on a different principle
▪ requires no normal populations (!)
▪ also yields an $F_{\text{calc}}$
  ▪ but with different values of df
▪ reject for large values only
▪ yields a $p$-value for $H_0: \sigma_1^2 = \sigma_2^2$
LEVENE’S TEST
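
The slides do not spell out how the Levene statistic is computed. A common textbook form, assumed here, is a one-way ANOVA $F$ statistic applied to the absolute deviations of each observation from its own group mean, with $(k - 1,\, N - k)$ degrees of freedom for $k$ groups and $N$ observations in total. Software may centre on the median instead of the mean, so the output of a particular package can differ. The function name and the toy data are hypothetical.

```python
def levene_statistic(*groups):
    """Mean-based Levene statistic W and its degrees of freedom (k - 1, N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = []
    for g in groups:
        mean_g = sum(g) / len(g)
        z.append([abs(v - mean_g) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA sums of squares on the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)

# Toy data, chosen so the arithmetic is easy to follow by hand.
w, dfs = levene_statistic([1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
print(w, dfs)
```

With two groups the numerator degrees of freedom is always 1, which is one way in which the df differ from those of the ordinary $F$-test.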

SLIDE 18

▪ When comparing two $\mu$s with the $t$-test, we needed to estimate $\sigma_1^2$ and $\sigma_2^2$
  ▪ we could estimate $\sigma_1^2$ by $s_1^2$ and $\sigma_2^2$ by $s_2^2$
  ▪ or assume $\sigma_1^2 = \sigma_2^2$ and use the pooled $s_P^2$ to estimate both
▪ The second is only allowed if the two sample variances $s_1^2$ and $s_2^2$ are not “too unequal”
▪ Levene’s test tests this
  ▪ a low $p$-value means: unequal, so don’t do it
  ▪ do not necessarily use the same $\alpha$
▪ Why Levene and not the “normal” $F$-test?
LEVENE’S TEST
Usually, a low $p$-value means bingo, but not so here ...
Rule of thumb: use $\alpha = 0.1$
Levene is a non-parametric test ...
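
The decision rule can be sketched as follows. The pooled variance is the usual degrees-of-freedom-weighted average $s_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. The Levene $p$-values below are made-up inputs and the function names are hypothetical; only the threshold 0.1 comes from the rule of thumb above.

```python
def pooled_variance(s2_1, n_1, s2_2, n_2):
    """Pooled estimate of a common variance, weighting each s^2 by its df."""
    return ((n_1 - 1) * s2_1 + (n_2 - 1) * s2_2) / (n_1 + n_2 - 2)

def use_pooled(levene_p_value, alpha_levene=0.10):
    """True if Levene's test gives no reason to doubt equal variances."""
    return levene_p_value >= alpha_levene

# Hypothetical Levene p-values, for illustration only.
print(use_pooled(0.32))  # True: pooling is defensible
print(use_pooled(0.04))  # False: estimate the two variances separately

# Pooled variance for the drilling-machine figures used earlier in the deck.
s2_p = pooled_variance(0.012, 6, 0.016, 7)
print(s2_p)
```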

SLIDE 19

26 March 2015, Q1c
OLD EXAM QUESTION

SLIDE 20

▪ Doane & Seward 5/E 10.7
▪ Tutorial exercises week 3
▪ Fisher $F$-test
FURTHER STUDY