Business Statistics CONTENTS Comparing two samples Comparing two - - PowerPoint PPT Presentation



slide-1
SLIDE 1

TWO μS OR MEDIANS: COMPARISONS

Business Statistics

slide-2
SLIDE 2

CONTENTS

Comparing two samples
Comparing two unrelated samples
Comparing the means of two unrelated samples
Comparing the medians of two unrelated samples
Old exam question
Further study

slide-3
SLIDE 3

COMPARING TWO SAMPLES

It often happens that we want to compare two situations
▪ do I sell more when there is music in my shop?
▪ is the expensive machine more precise than the cheap one?
▪ are advertisements on TV and on the internet equally profitable?
▪ do people buy more on Tuesdays than on Wednesdays?
▪ in couples, who drinks more: the man or the woman?
▪ etc.

slide-4
SLIDE 4

COMPARING TWO SAMPLES

In all these questions we compare two populations
▪ Situation 1: two populations (or sub-populations) with a similar variable
  ▪ sales in 105 days without music
  ▪ sales in 96 days with music
▪ Data matrix: two options

[Figure: the two data-matrix layouts, one of them marked "SPSS requires this data presentation"]

slide-5
SLIDE 5

COMPARING TWO SAMPLES

▪ Situation 2: one sample with paired observations
  ▪ drinks of the man in 78 couples
  ▪ drinks of the woman in the same 78 couples
▪ Data matrix: one option only
▪ Will be discussed in a later lecture

slide-6
SLIDE 6

COMPARING TWO UNRELATED SAMPLES

Situation 1
▪ independent samples/unrelated samples
▪ introduce symbols for the two random variables
▪ e.g., using $X_1$ and $X_2$
  ▪ $X_1$ with sample $X_{1,1}, X_{1,2}, \ldots, X_{1,n_1}$ and $X_2$ with sample $X_{2,1}, X_{2,2}, \ldots, X_{2,n_2}$
▪ or using $X$ and $Y$
  ▪ $X$: $X_1, X_2, \ldots, X_{n_X}$ and $Y$: $Y_1, Y_2, \ldots, Y_{n_Y}$
▪ sample sizes can be different

Or of course using "meaningful" indices: $X_B$ and $X_G$ for Belgium and Germany. Not $B$ and $G$, because we need to stress that it is "about" a variable $X$ (like sales).

slide-7
SLIDE 7

COMPARING TWO UNRELATED SAMPLES

We want to test hypotheses such as
▪ are the means equal?
  ▪ $H_0: \mu_X = \mu_Y$ or $H_0: \mu_1 = \mu_2$ or $H_0: \mu_{X_1} = \mu_{X_2}$ or ...
▪ are the variances equal?
  ▪ $H_0: \sigma_X^2 = \sigma_Y^2$ or etc.
▪ are the proportions equal?
  ▪ $H_0: \pi_X = \pi_Y$ or etc.

Also:
▪ inequalities, like $H_0: \mu_X \geq \mu_Y$
▪ and non-zero differences, like $H_0: \mu_X = \mu_Y + 85$

slide-8
SLIDE 8

COMPARING TWO UNRELATED SAMPLES

Context:
▪ sample $X_1$: sales in $n_1 = 105$ days without music
▪ sample $X_2$: sales in $n_2 = 96$ days with music

General idea:
▪ $X_1 \sim \text{distribution}(\theta_1)$ and $X_2 \sim \text{distribution}(\theta_2)$: is $\theta_1 = \theta_2$?

slide-9
SLIDE 9

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Assumption (for now!):
▪ $X \sim N(\mu_X; \sigma_X^2)$
▪ $Y \sim N(\mu_Y; \sigma_Y^2)$
▪ in words: both samples come from normally distributed populations with known variances

Question
▪ are $\mu_X$ and $\mu_Y$ different?
▪ can we test this, on the basis of the (limited!) evidence concerning $\bar{x}$ and $\bar{y}$?
▪ so, can we reject $H_0: \mu_X = \mu_Y$?

To decide
▪ use $\bar{X} - \bar{Y} \sim N\left(\mu_{\bar{X}-\bar{Y}}, \sigma_{\bar{X}-\bar{Y}}^2\right)$

So, the sampling distribution of the difference of means is normal

slide-10
SLIDE 10

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0,1)$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$$
▪ $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
  ▪ for instance $H_0: \mu_X = \mu_Y$ or $H_0: \mu_X - \mu_Y = 85$
▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
▪ but what is $\sigma_{\bar{X}-\bar{Y}}$?

slide-11
SLIDE 11

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

For one sample, we had
$$\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$$
As it turns out, for two independent samples, we have $\sigma_{\bar{X}-\bar{Y}}^2 = \sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2$, so
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$$
▪ recall that variances add up when $X$ and $Y$ are independent
  ▪ e.g., $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ but also $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$
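The claim that variances add for both the sum and the difference of independent variables can be checked numerically. The sketch below is only an illustration: the standard deviations (3 and 4), the number of draws, and the seed are arbitrary assumptions, not values from the lecture.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def simulate_variances(sigma_x=3.0, sigma_y=4.0, draws=100_000, seed=1):
    """Draw independent normal X and Y; return var(X + Y) and var(X - Y)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [rng.gauss(0.0, sigma_x) for _ in range(draws)]
    ys = [rng.gauss(0.0, sigma_y) for _ in range(draws)]
    var_sum = variance([x + y for x, y in zip(xs, ys)])
    var_diff = variance([x - y for x, y in zip(xs, ys)])
    return var_sum, var_diff

var_sum, var_diff = simulate_variances()
# Both estimates should be close to 3**2 + 4**2 = 25.
print(round(var_sum, 2), round(var_diff, 2))
```

Subtracting does not reduce the spread: the uncertainty in $Y$ contributes to $X - Y$ just as it does to $X + Y$.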

slide-12
SLIDE 12

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Example

Context:
▪ do I sell more when there is music in my shop?

Experiment
▪ on some days the music is turned on, on other days the music is turned off
▪ you keep track of the sales during each day

Data:
▪ sample of sales on days without music ($x_1, x_2, \ldots, x_{105}$)
▪ sample of sales on days with music ($y_1, y_2, \ldots, y_{96}$)

Five step procedure

slide-13
SLIDE 13

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

▪ Step 1:
  ▪ $H_0: \mu_X = \mu_Y$; $H_1: \mu_X \neq \mu_Y$; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic: $\bar{X} - \bar{Y}$
  ▪ reject for "too large" and "too small" values
▪ Step 3:
  ▪ null distribution
$$\frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma_{\bar{X}-\bar{Y}}} = \frac{\bar{X} - \bar{Y}}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$$
  ▪ valid because ...
▪ Step 4:
  ▪ $z_{\text{calc}} = \ldots$
  ▪ $z_{\text{crit}} = \ldots$
▪ Step 5:
  ▪ reject or not reject because ...

in a minute we will supply full details and a worked example ...
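As a rough sketch of how steps 3 to 5 could be carried out when the population variances are known, the function below computes the calculated and critical $z$-values for a two-sided test. Only the sample sizes 105 and 96 come from the slides; the sample means and population standard deviations are hypothetical numbers chosen for illustration, and the function name is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(xbar, ybar, sigma_x, sigma_y, n_x, n_y,
                      alpha=0.05, delta0=0.0):
    """Two-sided z-test of H0: mu_X - mu_Y = delta0 with known variances."""
    # Standard error of the difference of the two sample means.
    se = sqrt(sigma_x ** 2 / n_x + sigma_y ** 2 / n_y)
    z_calc = (xbar - ybar - delta0) / se
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))
    reject = abs(z_calc) > z_crit
    return z_calc, z_crit, p_value, reject

# Hypothetical daily sales: 105 days without music, 96 days with music.
z_calc, z_crit, p_value, reject = two_sample_z_test(
    xbar=1000.0, ybar=1040.0, sigma_x=100.0, sigma_y=120.0, n_x=105, n_y=96
)
print(f"z_calc={z_calc:.3f}, z_crit={z_crit:.3f}, p={p_value:.4f}, reject={reject}")
```

With these made-up inputs the calculated value falls outside the acceptance region, so the null hypothesis of equal means would be rejected at the 5% level.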

slide-14
SLIDE 14

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

▪ But, wait ...
  ▪ ... isn't it weird to assume that $\sigma_X^2$ and $\sigma_Y^2$ are known, while $\mu_X$ and $\mu_Y$ are not known?
▪ In reality the population variances will often be unknown as well!
  ▪ remember we had the same problem in the one-sample case?
  ▪ there we decided to estimate the value of $\sigma^2$ with the value of $s^2$
  ▪ and paid the price of using the wider $t$-distribution
  ▪ here we will do the same: estimate the two $\sigma^2$-values with two $s^2$-values
  ▪ and pay the same price: use the $t$-distribution instead of the $z$-distribution

slide-15
SLIDE 15

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{S_{\bar{X}}} \sim t_{\text{df}}$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{S_{\bar{X}-\bar{Y}}} \sim t_{\text{df}}$$
▪ $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
▪ but what is $s_{\bar{X}-\bar{Y}}$?
▪ and how to choose df?

slide-16
SLIDE 16

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Two options for $s_{\bar{X}-\bar{Y}}$:
▪ 1: estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively
▪ 2: assuming $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ and estimating $\sigma^2$ as the weighted average of both sample variances

The two options lead to different values of df

slide-17
SLIDE 17

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Option 1:
▪ estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively (Welch's test)
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$$
▪ testing with $t$-distribution with
$$\text{df} = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \frac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}$$

Compare to $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$

quick rule, but bad approximation: $\text{df} \approx \min(n_X - 1, n_Y - 1)$

slide-18
SLIDE 18

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Option 2:
▪ estimating the common $\sigma^2$ from both samples
  ▪ a "weighted mean" of $s_X^2$ and $s_Y^2$, the pooled variance $s_P^2$
$$s_P^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{(n_X - 1) + (n_Y - 1)}$$
▪ and
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}$$
▪ testing with $t$-distribution with $\text{df} = (n_X - 1) + (n_Y - 1) = n_X + n_Y - 2$

Compare to $s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$

You can read $s_P^2$ as $\frac{\text{df}_X s_X^2 + \text{df}_Y s_Y^2}{\text{df}_X + \text{df}_Y}$

You can read df as $\text{df}_X + \text{df}_Y$
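The pooled-variance route of Option 2 can be sketched in the same way. The helper names and the numbers in the usage lines are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
from math import sqrt

def pooled_variance(s2_x, n_x, s2_y, n_y):
    """Weighted mean of two sample variances, weighted by degrees of freedom."""
    df_x, df_y = n_x - 1, n_y - 1
    return (df_x * s2_x + df_y * s2_y) / (df_x + df_y)

def pooled_se_and_df(s2_x, n_x, s2_y, n_y):
    """Standard error of the mean difference and df under equal variances."""
    s2_p = pooled_variance(s2_x, n_x, s2_y, n_y)
    se = sqrt(s2_p / n_x + s2_p / n_y)
    return se, n_x + n_y - 2

# Hypothetical small samples: s_X^2 = 4 with n_X = 5, s_Y^2 = 10 with n_Y = 3.
s2_p = pooled_variance(4.0, 5, 10.0, 3)   # (4*4 + 2*10) / 6
se, df = pooled_se_and_df(4.0, 5, 10.0, 3)
print(s2_p, round(se, 4), df)
```

Because the weights are the degrees of freedom, the larger sample pulls the pooled variance toward its own sample variance.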

slide-19
SLIDE 19

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

slide-20
SLIDE 20

EXERCISE 1

When to use:

  a. $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}}$

  b. $t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$

  c. $t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}}$

slide-21
SLIDE 21

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Use of SPSS

[Figure: a data set on Computer Anxiety Rating split by gender]

slide-22
SLIDE 22

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

[Figure: SPSS output with two panels: results split by gender; results of $t$-test]

slide-23
SLIDE 23

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

Zoom in
▪ $t$-test with pooled estimate of $\sigma_X^2 = \sigma_Y^2$
▪ $t$-test with separate estimates of $\sigma_X^2$ and $\sigma_Y^2$
▪ value of the $t$-statistic ($t_{\text{calc}}$)
▪ degrees of freedom
▪ $p$-value (2-sided)

slide-24
SLIDE 24

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

And one more thing ...
▪ tests of the assumption of equal variance: $H_0: \sigma_X^2 = \sigma_Y^2$ versus $H_1: \sigma_X^2 \neq \sigma_Y^2$
▪ $p$-value for this test

slide-25
SLIDE 25

COMPARING THE MEANS OF TWO UNRELATED SAMPLES

For these two tests, we need both $\bar{X}$ and $\bar{Y}$ to be normally distributed
▪ This means that at least one of the following three requirements holds:
  ▪ $X$ is a normally distributed population
  ▪ $X$ has a symmetric distribution and $n_X \geq 15$
  ▪ $n_X \geq 30$
▪ Also for $Y$ one of these requirements must hold
  ▪ but not necessarily the same one as for $X$

slide-26
SLIDE 26
EXERCISE 2

  a. Suppose $X$ and $Y$ are two distributions. We sample with $n_X = 10$ and $n_Y = 20$ and we want to test $H_0: \mu_X = \mu_Y$. Do we need to make any assumption?

  b. Same as a, but now we know that $X$ and $Y$ are symmetrically distributed.

  c. Same as a, but now we know that $X$ and $Y$ are normally distributed.

slide-27
SLIDE 27

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

▪ Recall the non-parametric one-sample test for the median
  ▪ the Wilcoxon signed ranks test
  ▪ replacing the values by ranks and testing the sum of the positive ranks
▪ Can we also develop a non-parametric (rank-order) test for two unrelated samples?
▪ Yes we can: Wilcoxon-Mann-Whitney test
  ▪ named after Frank Wilcoxon, Henry Mann, and Donald Whitney
  ▪ also named Wilcoxon (Mann-Whitney) test, Mann-Whitney test, etc.
  ▪ requirement: similar shape of distribution

slide-28
SLIDE 28

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

▪ Computational steps of the Wilcoxon-Mann-Whitney test
  ▪ combine both samples ($X$ and $Y$)
  ▪ assign ranks to the combined sample
  ▪ ties get an average rank
  ▪ sum the ranks of both samples separately ($T_X$ and $T_Y$)
  ▪ compare the test statistic $T_X$ (or $T_Y$) to a critical value from the table

slide-29
SLIDE 29

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Example (same as before)
▪ Sample data are collected on the capacity rates (in %) for two factories
  ▪ factory A, the rates are 71, 82, 77, 94, 88
  ▪ factory B, the rates are 85, 82, 92, 97
▪ Are the median operating rates for the two factories the same (at a significance level $\alpha = 0.05$)?

slide-30
SLIDE 30

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Example
▪ data A: $x_i$ ($n_X = 5$)
▪ data B: $y_i$ ($n_Y = 4$)
▪ one case of ties (82)
▪ $T_Y = 24.5$

Capacity                  Rank
Factory A   Factory B     Factory A   Factory B
71                        1
77                        2
82                        3.5
            82                        3.5
            85                        5
88                        6
            92                        7
94                        8
            97                        9
Rank sums:                20.5        24.5

a tie: observations 3 and 4 are 82, so assign rank 3.5

to facilitate the discussion, we focus on the sample with the smallest sample size
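The ranking steps can be reproduced in a few lines of code. The sketch below uses the factory data from the example; the helper name `average_ranks` is an assumption of this sketch, and tied values receive the average of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2  # average of 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]

ranks = average_ranks(factory_a + factory_b)   # rank the combined sample
t_a = sum(ranks[:len(factory_a)])              # rank sum for factory A
t_b = sum(ranks[len(factory_a):])              # rank sum for factory B
print(t_a, t_b)  # 20.5 24.5
```

A quick sanity check is that the two rank sums add up to $n(n+1)/2$ with $n = n_X + n_Y$, here $9 \cdot 10 / 2 = 45$.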

slide-31
SLIDE 31

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Testing the Wilcoxon-Mann-Whitney $T$ statistic
▪ using a table of critical values
  ▪ included in tables at exam
▪ using a normal approximation
  ▪ valid for large samples when the Wilcoxon-Mann-Whitney table of critical values is not sufficient

slide-32
SLIDE 32

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Table of critical values of Wilcoxon statistic
▪ for $n_Y = n_1 = 4$ and $n_X = n_2 = 5$ at $\alpha = 0.05$:
  ▪ $T_{\text{lower}} = 11$, $T_{\text{upper}} = 29$
  ▪ $R_{\text{crit}} = [0, 11] \cup [29, 45]$

Why 0 and 45? 0 because the sum of the ranks cannot be smaller than 0. 45 because the sum of the ranks here cannot be larger than 45.

slide-33
SLIDE 33

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Conclusion from small sample Wilcoxon-Mann-Whitney test
▪ $T_Y = 24.5$ is between $T_{\text{lower}} = 11$ and $T_{\text{upper}} = 29$
▪ Therefore, do not reject the null hypothesis ($H_0: M_X = M_Y$) at the 5% level
▪ There is not enough evidence to conclude that the medians are different

slide-34
SLIDE 34

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Large sample approximation
▪ Under $H_0$, it can be shown that
  ▪ $E(T_Y) = \dfrac{n_Y(n_X + n_Y + 1)}{2}$
  ▪ $\operatorname{var}(T_Y) = \dfrac{n_X n_Y (n_X + n_Y + 1)}{12}$
▪ Further, when $n_X \geq 10$ or $n_Y \geq 10$, we use a normal approximation:
  ▪ $T_Y \sim N\left(\dfrac{n_Y(n_X + n_Y + 1)}{2}, \dfrac{n_X n_Y (n_X + n_Y + 1)}{12}\right)$
  ▪ $Z = \dfrac{T_Y - \frac{n_Y(n_X + n_Y + 1)}{2}}{\sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}}} \sim N(0,1)$

slide-35
SLIDE 35

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Large sample approximation (continued)
▪ so you can compute
$$z_{\text{calc}} = \frac{T_{Y,\text{calc}} - \frac{n_Y(n_X + n_Y + 1)}{2}}{\sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}}}$$
▪ and compare it to $z_{\text{crit}}$ (e.g., $\pm 1.96$)
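The standardisation above is a one-line calculation once the rank sum is known. The sketch below implements the formula as stated, without any correction for ties, and the function name is hypothetical. The factory figures ($T_Y = 24.5$, $n_X = 5$, $n_Y = 4$) are reused only to show the arithmetic; those samples are too small for the normal approximation to be trusted.

```python
from math import sqrt

def wmw_z(t_y, n_x, n_y):
    """Normal-approximation z-score for the rank sum T_Y (no tie correction)."""
    mean = n_y * (n_x + n_y + 1) / 2        # E(T_Y) under H0
    var = n_x * n_y * (n_x + n_y + 1) / 12  # var(T_Y) under H0
    return (t_y - mean) / sqrt(var)

# Arithmetic illustration with the factory example.
z_calc = wmw_z(t_y=24.5, n_x=5, n_y=4)
print(round(z_calc, 3))  # compare with +/-1.96 for a two-sided test at 5%
```

Here $E(T_Y) = 20$ and $\operatorname{var}(T_Y) = 200/12$, so the observed rank sum lies about 1.1 standard deviations above its expected value, in line with the table-based conclusion not to reject $H_0$.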

slide-36
SLIDE 36

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES

Use of SPSS
▪ $T = 345$
▪ $z$-score with normal approximation
▪ $p$-value (2-sided)

slide-37
SLIDE 37

OLD EXAM QUESTION

21 May 2015, Q2a

slide-38
SLIDE 38

FURTHER STUDY

Doane & Seward 5/E 10.1-10.3, 16.4
extra document "Wilcoxon Mann-Whitney test"
Tutorial exercises week 3
▪ $z$-test (known variance)
▪ $t$-test (pooled variance)
▪ $t$-test (separate variance estimates)
▪ Wilcoxon Mann-Whitney test