MEDIAN: NON-PARAMETRIC TESTS
Business Statistics
▪ Hypotheses on the median
▪ The sign test
▪ The Wilcoxon signed ranks test
▪ Old exam question
▪ Further study
CONTENTS
▪ The median is a central value that may be more suitable for strongly asymmetric distributions
▪ and for distributions with fat tails
▪ Can we test a population median?
▪ e.g., $H_0: M = 400$
▪ Note:
▪ for a more or less symmetric distribution, $M \approx \mu$, so a $t$-test of the mean is appropriate (if $n \ge 15$)
▪ although it is perhaps more sensitive to large positive or negative outliers in the sample
HYPOTHESES ON THE MEDIAN
$M$ is here the population median. Think of it as a Greek letter ...
▪ What is the median of a sample?
▪ it is the middle value of the sorted data, i.e. approximately $x_{(n/2)}$
▪ So, if $H_0: M = 400$ were true, approximately half of the data in the sample would be lower, and half would be higher
▪ Therefore, if we count the number of data points that are lower and compare it to the number of observations, we can develop a test statistic
▪ Two varieties of such non-parametric tests today:
▪ sign test
▪ Wilcoxon signed rank test
HYPOTHESES ON THE MEDIAN
The sign test
▪ involves simply counting the number of positive or negative signs in a sequence of $n$ signs
▪ is based on the binomial distribution
▪ can be applied without requirements on the population distribution
THE SIGN TEST
Computational steps:
▪ for each data point $x_i$, compute the difference with the median ($M$) of the null hypothesis ($H_0$): $d_i = x_i - M$
▪ omit zero differences ($d_i = 0$); the effective sample size is $n'$
▪ assign $+1$ to positive differences ($d_i > 0$) and $-1$ to negative differences ($d_i < 0$)
▪ the test statistic $X$ is the number of $+1$ signs (= number of observations above $M$)
THE SIGN TEST
Example:
Context: battery life until failure (in hours)
▪ $H_0: M = 400$; $H_1: M \ne 400$
▪ use $\alpha = 0.05$
▪ sample of $n = 13$ observations ($x_1, \dots, x_{13}$)
▪ reject for large and for small numbers of positive signs
THE SIGN TEST
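Example ($H_0: M = 400$):
▪ data: $x_i$ ($i = 1, \dots, 13$)
▪ difference with $M$: $d_i = x_i - 400$
▪ no cases where $d_i = 0$, so $n' = n$
▪ $s_i = \begin{cases} 1 & \text{if } d_i > 0 \\ -1 & \text{if } d_i < 0 \end{cases}$
▪ $s_i^{(+)} = \begin{cases} 1 & \text{if } d_i > 0 \\ 0 & \text{if } d_i < 0 \end{cases}$
▪ $x = \sum_{i=1}^{n'} s_i^{(+)} = 8$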
THE SIGN TEST
  x_i   x_i - 400   s_i   s_i(+)
  342       -58      -1      0
  426        26       1      1
  317       -83      -1      0
  545       145       1      1
  264      -136      -1      0
  451        51       1      1
 1049       649       1      1
  631       231       1      1
  512       112       1      1
  266      -134      -1      0
  492        92       1      1
  562       162       1      1
  298      -102      -1      0
Example (continued):
▪ $x = 8$
▪ under $H_0$: $X \sim \mathrm{bin}(13, 0.5)$
▪ $P_{\mathrm{bin}(13, 0.5)}(X \ge 8) = 0.291$
▪ why $\ge 8$?
▪ if we would reject for 8, we would also reject for 9, 10, ...
▪ $p$-value: $2 \times 0.291 = 0.581$
▪ why $2 \times$?
▪ because the alternative hypothesis is two-sided
▪ the $p$-value exceeds $\alpha = 0.05$, so there is no reason to reject $H_0$
THE SIGN TEST
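The sign test example can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `sign_test` is hypothetical, and the two-sided p-value is taken as twice the smaller binomial tail, which matches the doubling used in the example.

```python
from math import comb

# Battery-life data (hours) from the example table.
data = [342, 426, 317, 545, 264, 451, 1049, 631, 512, 266, 492, 562, 298]
m0 = 400  # hypothesised median under H0


def sign_test(values, median0):
    """Return (x, n_eff, upper_tail, two_sided_p) for the sign test."""
    # Differences with the hypothesised median; zero differences are omitted.
    diffs = [v - median0 for v in values if v != median0]
    n_eff = len(diffs)
    # Test statistic: number of positive signs.
    x = sum(1 for d in diffs if d > 0)
    # Tail probabilities of the bin(n_eff, 0.5) distribution.
    upper = sum(comb(n_eff, k) for k in range(x, n_eff + 1)) / 2**n_eff
    lower = sum(comb(n_eff, k) for k in range(0, x + 1)) / 2**n_eff
    # Assumption: two-sided p-value = 2 x smaller tail, capped at 1.
    two_sided = min(1.0, 2 * min(upper, lower))
    return x, n_eff, upper, two_sided


x, n_eff, upper, p_value = sign_test(data, m0)
print(f"x = {x}, n' = {n_eff}, P(X >= x) = {upper:.3f}, p-value = {p_value:.3f}")
```

With the data above, the function counts 8 positive signs out of 13, in line with the table.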
Suppose we have more observations ($n = 130$) and find $x = 80$. Can you look up $P_{\mathrm{bin}(130, 0.5)}(X \ge 80)$?
EXERCISE 1
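When a binomial table does not go up to $n = 130$, the tail probability can be computed directly. The sketch below is an illustration rather than the intended exam approach: it sums the exact binomial probabilities and, as an assumption not taken from the slides, compares the result with a normal approximation that uses a continuity correction. The function names are hypothetical.

```python
from math import comb, erfc, sqrt


def binom_half_upper_tail(n, x):
    """Exact P(X >= x) for X ~ bin(n, 0.5), using integer arithmetic."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2**n


def normal_upper_tail(n, x):
    """Normal approximation of P(X >= x) for X ~ bin(n, 0.5).

    Assumption: a continuity correction of 0.5 is applied.
    """
    mean = n * 0.5
    sd = sqrt(n * 0.25)
    z = (x - 0.5 - mean) / sd
    return 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for standard normal Z


exact = binom_half_upper_tail(130, 80)
approx = normal_upper_tail(130, 80)
print(f"exact: {exact:.4f}, normal approximation: {approx:.4f}")
```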
In the sign test, we replace the numerical values by signs (+ or −)
Advantage:
▪ we don't need any assumption on normality, symmetry, etc.
▪ that's why we say it's non-parametric: we don't have to assume a certain distribution with parameters
Disadvantage:
▪ we discard much information, so that the test is not very sensitive (has low "power"; see later)
Are there other non-parametric tests that are more powerful?
▪ is there a compromise between value and sign that still needs some assumptions, but not too many assumptions?
Yes, replacing data by their rank
THE SIGN TEST
Wilcoxon signed rank test
▪ involves comparing the sum of ranks of the values larger than the test value with the sum of ranks of the values smaller than the test value
Computational steps:
▪ for each data point $x_i$, compute the absolute difference with the median ($M$) of the null hypothesis: $|d_i| = |x_i - M|$
▪ omit zero differences ($d_i = 0$); the effective sample size is $n'$
▪ assign ranks ($1, \dots, n'$) to the $|d_i|$
▪ reassign + and − to the ranks, according to the sign of $d_i$
▪ the test statistic ($W$) is the sum of the positive ranks
THE WILCOXON SIGNED RANK TEST
Example ($H_0: M = 400$):
▪ data: $x_i$ ($i = 1, \dots, 13$)
▪ difference with $M$: $d_i = x_i - 400$
▪ no cases where $d_i = 0$, so $n' = n$
▪ $w = \sum_{i=1}^{n'} r_i^{(+)} = 61$
▪ under $H_0$: $W \sim\ ?$ (use table)
▪ $P_{H_0}(W \ge 61) =\ ?$
THE WILCOXON SIGNED RANK TEST
  x_i   x_i - 400   |x_i - 400|   r_i   r_i(+)
  342       -58          58        -3
  426        26          26         1      1
  317       -83          83        -4
  545       145         145        10     10
  264      -136         136        -9
  451        51          51         2      2
 1049       649         649        13     13
  631       231         231        12     12
  512       112         112         7      7
  266      -134         134        -8
  492        92          92         5      5
  562       162         162        11     11
  298      -102         102        -6
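The ranking steps in the table above can be sketched in Python. This is a minimal illustration with a hypothetical function name (`wilcoxon_w`); it assumes there are no ties among the absolute differences, which holds for this data set. Tied values would need averaged ranks, which this sketch does not handle.

```python
# Battery-life data (hours) from the example table.
data = [342, 426, 317, 545, 264, 451, 1049, 631, 512, 266, 492, 562, 298]
m0 = 400  # hypothesised median under H0


def wilcoxon_w(values, median0):
    """Return (w, signed_ranks) for the Wilcoxon signed rank test.

    Assumption: no ties among the absolute differences.
    """
    # Differences with the hypothesised median; zero differences are omitted.
    diffs = [v - median0 for v in values if v != median0]
    # Indices ordered by absolute difference: smallest |d_i| gets rank 1.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    signed_ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        # Reattach the sign of the difference to its rank.
        signed_ranks[i] = rank if diffs[i] > 0 else -rank
    # Test statistic: sum of the positive ranks.
    w = sum(r for r in signed_ranks if r > 0)
    return w, signed_ranks


w, signed_ranks = wilcoxon_w(data, m0)
print("signed ranks:", signed_ranks)
print("w =", w)
```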
Testing the median using the Wilcoxon $W$ statistic
▪ small samples: using a table of critical values
▪ included in tables at exam
▪ large samples: using a normal approximation of $W$
▪ valid when $n \ge 20$
▪ The test is only valid for symmetrically distributed populations
▪ if not, use the sign test
THE WILCOXON SIGNED RANK TEST
Small samples: critical values of Wilcoxon statistic
▪ two-sided, $\alpha = 0.05$, $n = 13$: $w_{\text{lower}} = 17$ and $w_{\text{upper}} = 74$
▪ $R_{\text{crit}} = [0, 17] \cup [74, 91]$
▪ $w_{\text{calc}} = 61$, so do not reject $H_0$ at $\alpha = 0.05$
THE WILCOXON SIGNED RANK TEST
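For small $n$, the null distribution of $W$ behind such a table can be enumerated exactly: under $H_0$ (and symmetry), each rank $1, \dots, n$ is positive or negative with probability 0.5, so every one of the $2^n$ sign patterns is equally likely. The sketch below is an illustration under that assumption, with hypothetical names. It counts the sign patterns per value of $W$ and then evaluates the tail probabilities relevant for the example.

```python
def wilcoxon_null_counts(n):
    """counts[w] = number of the 2**n sign patterns with positive-rank sum w."""
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    counts[0] = 1  # all ranks negative: W = 0
    for rank in range(1, n + 1):
        # Each rank is either excluded (negative) or included (positive);
        # iterate downwards so every rank is used at most once.
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


n = 13
counts = wilcoxon_null_counts(n)
total = 2**n

p_upper_61 = sum(counts[61:]) / total   # P(W >= 61) under H0
p_lower_17 = sum(counts[:18]) / total   # P(W <= 17) under H0
p_upper_74 = sum(counts[74:]) / total   # P(W >= 74) under H0

print(f"P(W >= 61) = {p_upper_61:.4f}")
print(f"P(W <= 17) = {p_lower_17:.4f}, P(W >= 74) = {p_upper_74:.4f}")
```

Because the distribution is symmetric around $n(n+1)/4$, the two critical-region tails are equal, which is why the lower and upper critical values mirror each other.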
Lower and Upper Critical Values W of Wilcoxon Signed-Ranks Test
one-tail:   α = 0.05    α = 0.025   α = 0.01    α = 0.005
two-tail:   α = 0.10    α = 0.05    α = 0.02    α = 0.01
 n          (lower , upper)
 5           0 , 15     --- , ---   --- , ---   --- , ---
 6           2 , 19      0 , 21     --- , ---   --- , ---
 7           3 , 25      2 , 26      0 , 28     --- , ---
 8           5 , 31      3 , 33      1 , 35      0 , 36
 9           8 , 37      5 , 40      3 , 42      1 , 44
10          10 , 45      8 , 47      5 , 50      3 , 52
11          13 , 53     10 , 56      7 , 59      5 , 61
12          17 , 61     13 , 65     10 , 68      7 , 71
13          21 , 70     17 , 74     12 , 79     10 , 81
Table is available at the exam (and on the course website)
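Large samples: under $H_0$, it can be shown that
▪ $E(W) = \dfrac{n(n+1)}{4}$
▪ $\mathrm{var}(W) = \dfrac{n(n+1)(2n+1)}{24}$
Further, for $n \ge 20$, approximately:
▪ $\dfrac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \sim N(0, 1)$
▪ so you can compute $z_{\text{calc}} = \dfrac{w_{\text{calc}} - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$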
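The standardisation can be written as a small helper. This is a sketch with hypothetical function names; it applies the formulas for $E(W)$ and $\mathrm{var}(W)$ as stated, without a continuity correction. The call below reuses $n = 13$ and $w = 61$ from the battery example purely to show the arithmetic. Since $n < 20$ there, the normal approximation is not recommended for that sample and the table of critical values should be used instead.

```python
from math import erfc, sqrt


def wilcoxon_z(w, n):
    """Standardise the Wilcoxon statistic: z = (w - E(W)) / sqrt(var(W))."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (w - mean) / sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))  # equals 2 * P(Z >= |z|)


# Illustration only: n = 13 is below the n >= 20 guideline.
z_calc = wilcoxon_z(61, 13)
print(f"z_calc = {z_calc:.3f}, two-sided p-value = {two_sided_p(z_calc):.3f}")
```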