SLIDE 1
CHOOSING THE RIGHT TEST
Business Statistics
CONTENTS
▪ Key questions
▪ Roadmaps for statistical tests
▪ A decision tree
▪ Old exam question
▪ Further study
SLIDE 2
SLIDE 3
▪ number of variables
▪ 1, 2, more than 2
▪ number of subpopulations
▪ 1, 2, more than 2
▪ types of data
▪ numerical, categorical
▪ parameter to test
▪ centrality, dispersion, proportion, ...
▪ characteristics of the population
▪ normal, symmetric, ...
▪ paired/independent variables
▪ association vs. comparison
KEY QUESTIONS
SLIDE 4
Some tests can be conceived in different ways
▪ Example:
▪ ANOVA: comparing μ of >2 numerical variables X₁, X₂, X₃, ...
▪ or: association between categorical X and numerical Y
▪ Here we focus on the most usual approach
KEY QUESTIONS
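The two views of ANOVA correspond to two layouts of the same data. The sketch below uses made-up numbers and hypothetical names (`wide`, `wide_to_long`) to show how one column per group can be rewritten as pairs of a group label and a value.

```python
# Hypothetical data: one numerical variable per group ("comparison" view).
wide = {
    "group_1": [4.1, 5.0, 4.6],
    "group_2": [5.9, 6.3, 6.1],
    "group_3": [5.2, 4.8, 5.5],
}

def wide_to_long(columns):
    """Turn {group label: values} into a list of (group label, value) pairs."""
    return [(group, value) for group, values in columns.items() for value in values]

# "Association" view: each row has a categorical label and a numerical value.
long_rows = wide_to_long(wide)
```

Both layouts hold the same nine observations; only the way the question is phrased differs.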
SLIDE 5
One sample
▪ centrality H₀: μ = μ₀ or H₀: M = M₀
▪ CLT-conditions and σ known: z-test
▪ CLT-conditions and σ unknown: t-test
▪ symmetric distribution: Wilcoxon signed ranks test
▪ sign test
▪ dispersion H₀: σ² = σ₀²
▪ normal population: χ²-test
▪ proportion H₀: π = π₀
▪ binomial test
▪ nπ ≥ 5 and n(1 − π) ≥ 5: normal approximation
ROADMAPS FOR STATISTICAL TESTS
CLT-conditions:
▪ n < 15: normal population
▪ 15 ≤ n < 30: symmetric population
▪ n ≥ 30: no restrictions
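The one-sample centrality roadmap and the CLT-conditions can be written as a small decision function. This is a sketch of one reading of the slide: the shape labels ('normal', 'symmetric', 'other') and the function names are hypothetical, and the order of the fallbacks (Wilcoxon signed ranks before sign test) is an assumption.

```python
def clt_conditions_met(n, population_shape):
    """population_shape is 'normal', 'symmetric', or 'other' (hypothetical labels)."""
    if n >= 30:
        return True  # no restrictions
    if n >= 15:
        # A normal population is also symmetric.
        return population_shape in ("normal", "symmetric")
    return population_shape == "normal"

def one_sample_centrality_test(n, population_shape, sigma_known):
    """Pick a test for the mean (or median) of one sample."""
    if clt_conditions_met(n, population_shape):
        return "z-test" if sigma_known else "t-test"
    if population_shape in ("normal", "symmetric"):
        return "Wilcoxon signed ranks test"
    return "sign test"
```

For example, a symmetric population with 10 observations fails the CLT-conditions, so the sketch returns the Wilcoxon signed ranks test.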
SLIDE 6
Two related (dependent) samples
▪ comparison of “similar” variables:
▪ convert into one-sample situation
▪ e.g., D = X_after − X_before and H₀: μ_D = 0
▪ association between two “dissimilar” variables:
▪ two numerical variables:
▪ normal populations: correlation (t-test)
▪ rank correlation (z-test)
▪ normal error term: simple regression (F-test and t-test)
ROADMAPS FOR STATISTICAL TESTS
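Converting two related samples into a one-sample situation amounts to taking the after-minus-before difference per pair and testing whether its mean is zero. A minimal sketch, with a hypothetical function name and made-up measurements:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """One-sample t statistic on the paired differences, with its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up example: four subjects measured before and after a treatment.
t_stat, df = paired_t_statistic([10, 12, 9, 11], [12, 13, 9, 14])
```

The resulting statistic is compared with a t distribution with n − 1 degrees of freedom, exactly as in the one-sample case.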
SLIDE 7
Two independent subpopulations
▪ centrality H₀: μ₁ = μ₂ or H₀: M₁ = M₂
▪ CLT-conditions and σ₁ and σ₂ known: z-test
▪ CLT-conditions and σ₁ = σ₂ unknown: t-test
▪ CLT-conditions and σ₁ and σ₂ unknown but not necessarily equal: t-test
▪ distribution equally-shaped: Wilcoxon-Mann-Whitney test
▪ dispersion H₀: σ₁² = σ₂²
▪ normal populations: F-test
▪ Levene’s test
▪ proportion H₀: π₁ = π₂
▪ z-test
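The z-test for two proportions can be sketched as below. The function name and arguments are hypothetical, and the pooled standard error is the usual textbook choice under the null hypothesis of equal proportions, not something the slide spells out.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Return (z, two-sided p-value) for the null hypothesis of equal proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    # Under the null hypothesis both groups share one proportion, so pool them.
    pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up example: 45 of 100 versus 30 of 100 successes.
z, p_value = two_proportion_z_test(45, 100, 30, 100)
```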
ROADMAPS FOR STATISTICAL TESTS
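The two t-test variants for independent samples differ in how the standard error and the degrees of freedom are computed. The sketch below assumes the standard pooled-variance formula for the equal-variance case and the Welch-Satterthwaite approximation for the unequal-variance case; the slide does not name these formulas, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """t statistic and degrees of freedom assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(sample1, sample2):
    """t statistic and approximate degrees of freedom without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ, which is why the equal-variance assumption matters most for unbalanced samples.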
SLIDE 8
More than two independent subpopulations
▪ centrality H₀: μ₁ = μ₂ = μ₃ or H₀: M₁ = M₂ = M₃
▪ normal populations and equal variances: ANOVA
▪ equally-shaped distributions and group sizes > 5: Kruskal-Wallis test
▪ independence of two categorical variables
▪ expected count ≥ 5: χ²-test on contingency table
▪ dispersion H₀: σ₁² = σ₂² = σ₃²
▪ Levene’s test
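For the test of independence on a contingency table, the expected-count condition can be checked before the statistic is computed. A minimal sketch with hypothetical function names; the table is a list of rows of observed counts.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def expected_counts_ok(table, minimum=5):
    """Condition from the slide: every expected count is at least 5."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom, (rows - 1) * (columns - 1)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df
```

The statistic is then compared with a chi-square distribution with the returned degrees of freedom.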
ROADMAPS FOR STATISTICAL TESTS
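The ANOVA comparison of more than two group means reduces to an F ratio of between-group to within-group variability. The sketch below uses the standard one-way ANOVA formulas with a hypothetical function name; it computes only the statistic and its degrees of freedom, not the p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k

# Made-up example with three groups of three observations.
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```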
SLIDE 9
More than two related samples
▪ dependence of one numerical variable on several other numerical variables:
▪ normal error term and linear relation: multiple regression (F-test and t-test)
▪ dependence of one numerical variable on several other categorical variables:
▪ normal error term: multiple regression with dummy variables (F-test and t-test)
ROADMAPS FOR STATISTICAL TESTS
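Regression with dummy variables requires recoding each categorical variable as 0/1 columns, leaving out one reference level. A minimal sketch; the function name and the default choice of reference level (first in sorted order) are assumptions for this illustration.

```python
def dummy_encode(values, reference=None):
    """Encode a categorical variable as 0/1 columns, one per non-reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level in sorted order
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows

# Made-up example: car colors with "blue" as the reference level.
columns, rows = dummy_encode(["red", "blue", "green", "blue"])
```

A variable with k levels yields k − 1 dummy columns; the reference level is the row of all zeros.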
SLIDE 10
Which test to use when:
- a. comparing the mean of the income of men and women?
- b. comparing the variance of the income of men and women?
- c. the relation between the color of a car and its probability of being involved in an accident?
- d. the relation between the color of a car and the gender of its owner?
- e. the relation between the mean of the income and ethnicity (black, white, Asian)?
- f. the relation between income and IQ?
EXERCISE 1
SLIDE 11
A DECISION TREE
SLIDE 12
A DECISION TREE
SLIDE 13
A DECISION TREE
SLIDE 14
A DECISION TREE
SLIDE 15
A DECISION TREE
SLIDE 16
21 May 2015, Q2d
OLD EXAM QUESTION
SLIDE 17