SLIDE 1
Business Statistics
TWO MEANS OR MEDIANS: COMPARISONS
CONTENTS
▪ Comparing two samples
▪ Comparing two unrelated samples
▪ Comparing the means of two unrelated samples
▪ Comparing the medians of two unrelated samples
▪ Old exam question
▪ Further study
SLIDE 2
SLIDE 3
COMPARING TWO SAMPLES
It often happens that we want to compare two situations:
▪ do I sell more when there is music in my shop?
▪ is the expensive machine more precise than the cheap one?
▪ are advertisements on TV and on the internet equally profitable?
▪ do people buy more on Tuesdays than on Wednesdays?
▪ in couples, who drinks more: the man or the woman?
▪ etc.
SLIDE 4
COMPARING TWO SAMPLES
In all these questions we compare two populations
▪ Situation 1: two populations (or sub-populations) with a similar variable
  ▪ sales in 105 days without music
  ▪ sales in 96 days with music
▪ Data matrix: two options
SPSS requires this data presentation
SLIDE 5
COMPARING TWO SAMPLES
▪ Situation 2: one sample with paired observations
  ▪ drinks of the man in 78 couples
  ▪ drinks of the woman in the same 78 couples
▪ Data matrix: one option only
▪ Will be discussed in a later lecture
SLIDE 6
COMPARING TWO UNRELATED SAMPLES
Situation 1
▪ independent samples/unrelated samples
▪ introduce symbols for the two random variables
▪ e.g., using $X_1$ and $X_2$
  ▪ $X_1$ with sample $X_{1,1}, X_{1,2}, \ldots, X_{1,n_1}$ and $X_2$ with sample $X_{2,1}, X_{2,2}, \ldots, X_{2,n_2}$
▪ or using $X$ and $Y$
  ▪ $X$: $X_1, X_2, \ldots, X_{n_X}$ and $Y$: $Y_1, Y_2, \ldots, Y_{n_Y}$
▪ sample sizes can be different
Or of course using “meaningful” indices: $X_B$ and $X_G$ for Belgium and Germany. Not $B$ and $G$, because we need to stress that it is “about” a variable $X$ (like sales)
SLIDE 7
COMPARING TWO UNRELATED SAMPLES
We want to test hypotheses such as
▪ are the means equal?
  ▪ $H_0: \mu_X = \mu_Y$ or $H_0: \mu_1 = \mu_2$ or $H_0: \mu_{X_1} = \mu_{X_2}$ or ...
▪ are the variances equal?
  ▪ $H_0: \sigma_X^2 = \sigma_Y^2$ or etc.
▪ are the proportions equal?
  ▪ $H_0: \pi_X = \pi_Y$ or etc.
Also:
▪ inequalities, like $H_0: \mu_X \geq \mu_Y$
▪ and non-zero differences, like $H_0: \mu_X = \mu_Y + 85$
SLIDE 8
COMPARING TWO UNRELATED SAMPLES
Context:
▪ sample $X_1$: sales in $n_1 = 105$ days without music
▪ sample $X_2$: sales in $n_2 = 96$ days with music
General idea:
▪ $X_1 \sim \text{distribution}(\mu_1)$ and $X_2 \sim \text{distribution}(\mu_2)$: is $\mu_1 = \mu_2$?
SLIDE 9
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Assumption (for now!):
▪ $X \sim N(\mu_X, \sigma_X^2)$
▪ $Y \sim N(\mu_Y, \sigma_Y^2)$
▪ in words: both samples come from normally distributed populations with known variances
Question
▪ are $\mu_X$ and $\mu_Y$ different?
▪ can we test this, on the basis of the (limited!) evidence concerning $\bar{x}$ and $\bar{y}$?
▪ so, can we reject $H_0: \mu_X = \mu_Y$?
To decide
▪ use $\bar{X} - \bar{Y} \sim N\left(\mu_{\bar{X}-\bar{Y}}, \sigma_{\bar{X}-\bar{Y}}^2\right)$
So, the sampling distribution of the difference of means is normal
SLIDE 10
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0,1)$
As it turns out, for two samples, we have
$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$
▪ $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
  ▪ for instance $H_0: \mu_X = \mu_Y$ or $H_0: \mu_X - \mu_Y = 85$
▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
▪ but what is $\sigma_{\bar{X}-\bar{Y}}$?
SLIDE 11
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$
As it turns out, for two independent samples, we have
$\sigma_{\bar{X}-\bar{Y}}^2 = \sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2$, so
$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$
▪ recall that variances add up when $X$ and $Y$ are independent
▪ e.g., $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ but also $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$
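The claim that the variances of the two sample means add up, even for a difference, can be checked with a small simulation. The sketch below is an illustration only: the population means, standard deviations, sample sizes, the seed, and the helper name `simulate_diff_of_means` are all assumptions chosen for the demonstration, not values from the slides.

```python
import random
import statistics


def simulate_diff_of_means(mu_x, sigma_x, n_x, mu_y, sigma_y, n_y,
                           reps=4000, seed=1):
    """Draw `reps` pairs of independent normal samples and return the
    list of observed differences x_bar - y_bar."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n_x))
        y_bar = statistics.fmean(rng.gauss(mu_y, sigma_y) for _ in range(n_y))
        diffs.append(x_bar - y_bar)
    return diffs


# Hypothetical populations (assumed values, not from the slides)
sigma_x, n_x = 30.0, 20
sigma_y, n_y = 40.0, 25
diffs = simulate_diff_of_means(500.0, sigma_x, n_x, 500.0, sigma_y, n_y)

# Theory: sigma_X^2 / n_X + sigma_Y^2 / n_Y = 900/20 + 1600/25 = 45 + 64
theoretical_var = sigma_x ** 2 / n_x + sigma_y ** 2 / n_y
simulated_var = statistics.variance(diffs)
print(theoretical_var, simulated_var)
```

The simulated variance should land close to the theoretical value of 109, and not close to the difference 45 - 64, which would be negative and therefore impossible for a variance.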
SLIDE 12
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Example
Context:
▪ do I sell more when there is music in my shop?
Experiment
▪ on some days the music is turned on, on other days the music is turned off
▪ you keep track of the sales during each day
Data:
▪ sample of sales on days without music ($x_1, x_2, \ldots, x_{105}$)
▪ sample of sales on days with music ($y_1, y_2, \ldots, y_{96}$)
Five step procedure
SLIDE 13
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
▪ Step 1:
  ▪ $H_0: \mu_X = \mu_Y$; $H_1: \mu_X \neq \mu_Y$; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic: $\bar{X} - \bar{Y}$
  ▪ reject for “too large” and “too small” values
▪ Step 3:
  ▪ null distribution: $\frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma_{\bar{X}-\bar{Y}}} = \frac{\bar{X} - \bar{Y}}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$
  ▪ valid because ...
▪ Step 4:
  ▪ $z_{\text{calc}} = \ldots$
  ▪ $z_{\text{crit}} = \ldots$
▪ Step 5:
  ▪ reject or not reject because ...
in a minute we will supply full details and a worked example ...
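As a rough illustration of steps 3 to 5, the procedure can be sketched in code for the case of known population variances. The sample means and variances below are made-up numbers (only the sample sizes 105 and 96 come from the example), and the function name `two_sample_z_test` is an assumption for this sketch.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean_x, mean_y, var_x, n_x, var_y, n_y, alpha=0.05):
    """Two-sided z-test of H0: mu_X = mu_Y with known population variances."""
    # Step 3: standard error of x_bar - y_bar under independence
    se = math.sqrt(var_x / n_x + var_y / n_y)
    # Step 4: calculated and critical values
    z_calc = (mean_x - mean_y) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: reject for "too large" and "too small" values
    reject = abs(z_calc) > z_crit
    return z_calc, z_crit, reject


# Hypothetical summary data (means and variances are assumptions)
z_calc, z_crit, reject = two_sample_z_test(
    mean_x=500.0, mean_y=512.0,
    var_x=945.0, n_x=105,
    var_y=1536.0, n_y=96,
)
print(z_calc, z_crit, reject)
```

With these invented numbers the standard error is $\sqrt{9 + 16} = 5$, so $z_{\text{calc}} = -12/5 = -2.4$, which lies outside $\pm 1.96$ and would lead to rejecting $H_0$.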
SLIDE 14
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
▪ But, wait ...
▪ ... isn’t it weird to assume that $\sigma_X^2$ and $\sigma_Y^2$ are known, while $\mu_X$ and $\mu_Y$ are not known?
▪ In reality the population variances will often be unknown as well!
  ▪ remember we had the same problem in the one-sample case?
  ▪ there we decided to estimate the value of $\sigma^2$ with the value of $s^2$
  ▪ and paid the price of using the wider $t$-distribution
  ▪ here we will do the same: estimate the two $\sigma^2$-values with two $s^2$-values
  ▪ and pay the same price: use the $t$-distribution instead of the $z$-distribution
SLIDE 15
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$\frac{\bar{X} - \mu_{\bar{X}}}{S_{\bar{X}}} \sim t_{\text{df}}$
As it turns out, for two samples, we have
$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{S_{\bar{X}-\bar{Y}}} \sim t_{\text{df}}$
▪ $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
▪ but what is $s_{\bar{X}-\bar{Y}}$?
▪ and how to choose df?
SLIDE 16
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Two options for $s_{\bar{X}-\bar{Y}}$:
▪ 1: estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively
▪ 2: assuming $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ and estimating $\sigma^2$ as the weighted average of both sample variances
The two options lead to different values of df
SLIDE 17
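COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 1:
▪ estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively (Welch’s test)
  $s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$
▪ testing with $t$-distribution with
  $\text{df} = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{\left(s_X^2 / n_X\right)^2}{n_X - 1} + \frac{\left(s_Y^2 / n_Y\right)^2}{n_Y - 1}}$
Compare to $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$
quick rule, but bad approximation: $\text{df} \approx \min(n_X - 1, n_Y - 1)$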
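The degrees-of-freedom formula for Option 1 is easy to mistype, so a small sketch may help. The sample variances below are invented for illustration (only the sample sizes 105 and 96 echo the music example), and the function name `welch_se_and_df` is an assumption.

```python
import math


def welch_se_and_df(s2_x, n_x, s2_y, n_y):
    """Standard error and Welch degrees of freedom from two sample variances."""
    a = s2_x / n_x  # estimated variance of x_bar
    b = s2_y / n_y  # estimated variance of y_bar
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return se, df


# Hypothetical sample variances (assumptions, not from the slides)
se, df = welch_se_and_df(s2_x=945.0, n_x=105, s2_y=1536.0, n_y=96)
quick_rule_df = min(105 - 1, 96 - 1)
print(se, df, quick_rule_df)
```

For these numbers the Welch formula gives a df of roughly 180, while the quick rule gives 95. The quick rule never exceeds the Welch value, so it is the more conservative of the two, and here it is also very far off.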
SLIDE 18
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 2:
▪ estimating the common $\sigma^2$ from both samples
  ▪ a “weighted mean” of $s_X^2$ and $s_Y^2$, the pooled variance $s_P^2$
    $s_P^2 = \frac{(n_X - 1) s_X^2 + (n_Y - 1) s_Y^2}{(n_X - 1) + (n_Y - 1)}$
▪ and $s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}$
▪ testing with $t$-distribution with $\text{df} = (n_X - 1) + (n_Y - 1) = n_X + n_Y - 2$
Compare to $s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$
You can read the pooled variance as $\frac{\text{df}_X s_X^2 + \text{df}_Y s_Y^2}{\text{df}_X + \text{df}_Y}$
You can read the degrees of freedom as $\text{df}_X + \text{df}_Y$
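The pooled-variance computation of Option 2 can be sketched as follows. The sample variances and sizes are invented so that the arithmetic is easy to follow by hand, and the function name `pooled_se_and_df` is an assumption.

```python
import math


def pooled_se_and_df(s2_x, n_x, s2_y, n_y):
    """Pooled variance, standard error of x_bar - y_bar, and degrees of freedom,
    assuming both populations share one common variance."""
    df_x, df_y = n_x - 1, n_y - 1
    # Weighted mean of the sample variances, weights are the df of each sample
    s2_p = (df_x * s2_x + df_y * s2_y) / (df_x + df_y)
    se = math.sqrt(s2_p / n_x + s2_p / n_y)
    return s2_p, se, df_x + df_y


# Hypothetical values: s_X^2 = 4 with n_X = 6, s_Y^2 = 10 with n_Y = 11
s2_p, se, df = pooled_se_and_df(s2_x=4.0, n_x=6, s2_y=10.0, n_y=11)
print(s2_p, se, df)
```

Here $s_P^2 = (5 \cdot 4 + 10 \cdot 10)/15 = 8$, which lies between the two sample variances and closer to the one from the larger sample, and $\text{df} = 5 + 10 = 15$.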
SLIDE 19
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
SLIDE 20
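EXERCISE 1
When to use:
a. $z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}}$
b. $t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$
c. $t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}}$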
SLIDE 21
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Use of SPSS
[SPSS screenshot: a data set on Computer Anxiety Rating split by gender]
SLIDE 22
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
[SPSS output: results split by gender; results of $t$-test]
SLIDE 23
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Zoom in
Annotations on the SPSS output:
▪ $t$-test with pooled estimate of $\sigma_X^2 = \sigma_Y^2$
▪ $t$-test with separate estimates of $\sigma_X^2$ and $\sigma_Y^2$
▪ value of the $t$-statistic ($t_{\text{calc}}$)
▪ degrees of freedom
▪ $p$-value (2-sided)
SLIDE 24
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
And one more thing ...
Annotations on the SPSS output:
▪ tests of the assumption of equal variance $H_0: \sigma_X^2 = \sigma_Y^2$ versus $H_1: \sigma_X^2 \neq \sigma_Y^2$
▪ $p$-value for this test
SLIDE 25
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For these two tests, we need both $\bar{X}$ and $\bar{Y}$ to be normally distributed
▪ This means that at least one of the following three requirements holds:
  ▪ $X$ is a normally distributed population
  ▪ $X$ has a symmetric distribution and $n_X \geq 15$
  ▪ $n_X \geq 30$
▪ Also for $Y$ one of these requirements must hold
  ▪ but not necessarily the same one as for $X$
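The three requirements can be read as a simple decision rule that is applied to each sample separately. The following sketch encodes that rule; the function name `sample_mean_is_normal` and its parameter names are assumptions, and the thresholds 15 and 30 are the ones stated above.

```python
def sample_mean_is_normal(n, population_normal=False, population_symmetric=False):
    """Return True when the sample mean may be treated as normally
    distributed according to the three requirements on the slide."""
    if population_normal:
        return True  # requirement 1: normal population, any n
    if population_symmetric and n >= 15:
        return True  # requirement 2: symmetric population and n >= 15
    return n >= 30   # requirement 3: large sample


def t_test_allowed(x_ok, y_ok):
    """Both samples must satisfy one of the requirements, not necessarily the same one."""
    return x_ok and y_ok


# Hypothetical situation: X symmetric with n = 18, Y unknown shape with n = 45
x_ok = sample_mean_is_normal(18, population_symmetric=True)
y_ok = sample_mean_is_normal(45)
print(t_test_allowed(x_ok, y_ok))
```

In this made-up situation the first sample qualifies through the symmetry requirement and the second through its size, so the test may be used.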
SLIDE 26
EXERCISE 2
a. Suppose $X$ and $Y$ are two distributions. We sample with $n_X = 10$ and $n_Y = 20$ and we want to test $H_0: \mu_X = \mu_Y$. Do we need to make any assumption?
b. Same as a, but now we know that $X$ and $Y$ are symmetrically distributed.
c. Same as a, but now we know that $X$ and $Y$ are normally distributed.
SLIDE 27
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
▪ Recall the non-parametric one-sample test for the median
  ▪ the Wilcoxon signed ranks test
  ▪ replacing the values by ranks and testing the sum of the positive ranks
▪ Can we also develop a non-parametric (rank-order) test for two unrelated samples?
▪ Yes we can: Wilcoxon-Mann-Whitney test
  ▪ named after Frank Wilcoxon, Henry Mann, and Donald Whitney
  ▪ also named Wilcoxon (Mann-Whitney) test, Mann-Whitney test, etc.
  ▪ requirement: similar shape of distribution
SLIDE 28
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
▪ Computational steps of the Wilcoxon-Mann-Whitney test
  ▪ combine both samples ($X$ and $Y$)
  ▪ assign ranks to the combined sample
  ▪ ties get an average rank
  ▪ sum the ranks of both samples separately ($W_X$ and $W_Y$)
  ▪ compare the test statistic $W_X$ (or $W_Y$) to a critical value from the table
SLIDE 29
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Example (same as before)
▪ Sample data are collected on the capacity rates (in %) for two factories
  ▪ factory A, the rates are 71, 82, 77, 94, 88
  ▪ factory B, the rates are 85, 82, 92, 97
▪ Are the median operating rates for the two factories the same (at a significance level $\alpha = 0.05$)?
SLIDE 30
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Example
▪ data A: $x_i$ ($n_X = 5$)
▪ data B: $y_i$ ($n_Y = 4$)
▪ one case of ties (82)
▪ $W_Y = 24.5$

Capacity                  Rank
Factory A   Factory B     Factory A   Factory B
71                        1
77                        2
82                        3.5
            82                        3.5
            85                        5
88                        6
            92                        7
94                        8
            97                        9
Rank sums:                20.5        24.5

a tie: observations 3 and 4 are 82, so assign rank 3.5
to facilitate the discussion, we focus on the sample with the smallest sample size
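The ranking steps, including the average rank for ties, can be reproduced with a short script. The data are the factory rates from the example; the helper names `average_ranks` and `rank_sums` are assumptions for this sketch.

```python
def average_ranks(values):
    """Return 1-based ranks for `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # mean of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(sample_x, sample_y):
    """Rank the combined sample and sum the ranks per sample."""
    ranks = average_ranks(list(sample_x) + list(sample_y))
    w_x = sum(ranks[:len(sample_x)])
    w_y = sum(ranks[len(sample_x):])
    return w_x, w_y


factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]
w_a, w_b = rank_sums(factory_a, factory_b)
print(w_a, w_b)  # 20.5 24.5
```

As a check, the two rank sums always add up to $1 + 2 + \cdots + 9 = 45$ for nine observations.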
SLIDE 31
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Testing the Wilcoxon-Mann-Whitney $W$ statistic
▪ using a table of critical values
  ▪ included in tables at exam
▪ using a normal approximation
  ▪ valid for large samples when the Wilcoxon-Mann-Whitney table of critical values is not sufficient
SLIDE 32
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Table of critical values of Wilcoxon statistic
▪ for $n_Y = n_1 = 4$ and $n_X = n_2 = 5$ at $\alpha = 0.05$:
  ▪ $W_{\text{lower}} = 11$, $W_{\text{upper}} = 29$
  ▪ $R_{\text{crit}} = [0, 11] \cup [29, 45]$
Why 0 and 45? 0 because the sum of the ranks cannot be smaller than 0. 45 because the sum of the ranks here cannot be larger than 45.
SLIDE 33
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Conclusion from small sample Wilcoxon-Mann-Whitney test
▪ $W_Y = 24.5$ is between $W_{\text{lower}} = 11$ and $W_{\text{upper}} = 29$
▪ Therefore, do not reject the null hypothesis ($H_0: M_X = M_Y$) at the 5% level
▪ There is not enough evidence to conclude that the medians are different
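The table-based decision amounts to checking whether the rank sum falls in the rejection region. A minimal sketch, using the critical values 11 and 29 and the rank sum 24.5 from the example; the function name `in_rejection_region` is an assumption.

```python
def in_rejection_region(w, w_lower, w_upper):
    """Two-sided decision: True when w is at or below the lower critical value
    or at or above the upper critical value."""
    return w <= w_lower or w >= w_upper


# Rank sum of the smaller sample (factory B) and the tabulated critical values
reject = in_rejection_region(24.5, w_lower=11, w_upper=29)
print(reject)  # False, so H0 is not rejected
```

A rank sum of, say, 10 or 30 for the sample of size 4 would fall in the rejection region instead.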
SLIDE 34
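COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Large sample approximation
▪ Under $H_0$, it can be shown that
  ▪ $E(W_X) = \frac{n_X (n_X + n_Y + 1)}{2}$
  ▪ $\text{var}(W_X) = \frac{n_X n_Y (n_X + n_Y + 1)}{12}$
▪ Further, when $n_X \geq 10$ or $n_Y \geq 10$, we use a normal approximation:
  ▪ $W_X \sim N\left(\frac{n_X (n_X + n_Y + 1)}{2}, \frac{n_X n_Y (n_X + n_Y + 1)}{12}\right)$
  ▪ $Z = \frac{W_X - \frac{n_X (n_X + n_Y + 1)}{2}}{\sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}}} \sim N(0,1)$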
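The normal approximation can be sketched as a small function that returns the mean, the variance and the standardized rank sum. The sample sizes and the rank sum in the usage lines are invented for illustration, and the function name `wmw_normal_approximation` is an assumption.

```python
import math
from statistics import NormalDist


def wmw_normal_approximation(w_x, n_x, n_y):
    """Mean and variance of W_X under H0, and the standardized value z."""
    mean_w = n_x * (n_x + n_y + 1) / 2
    var_w = n_x * n_y * (n_x + n_y + 1) / 12
    z = (w_x - mean_w) / math.sqrt(var_w)
    return mean_w, var_w, z


# Hypothetical data: n_X = 12, n_Y = 15 and an observed rank sum W_X = 130
mean_w, var_w, z_calc = wmw_normal_approximation(130, n_x=12, n_y=15)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05
print(mean_w, var_w, z_calc, abs(z_calc) > z_crit)
```

For these invented numbers $E(W_X) = 12 \cdot 28 / 2 = 168$ and $\text{var}(W_X) = 12 \cdot 15 \cdot 28 / 12 = 420$, so the observed rank sum lies about 1.85 standard deviations below its expectation, which is not beyond $\pm 1.96$.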
SLIDE 35
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Large sample approximation (continued)
▪ so you can compute $z_{\text{calc}} = \frac{W_{X,\text{calc}} - \frac{n_X (n_X + n_Y + 1)}{2}}{\sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}}}$
▪ and compare it to $z_{\text{crit}}$ (e.g., $\pm 1.96$)
SLIDE 36
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Use of SPSS
Annotations on the SPSS output:
▪ value of the test statistic: 345
▪ $z$-score with normal approximation
▪ $p$-value (2-sided)
SLIDE 37
OLD EXAM QUESTION
21 May 2015, Q2a
SLIDE 38