SLIDE 1
Business Statistics CONTENTS Back to the promise Back to the - - PowerPoint PPT Presentation
MISCELLANEOUS TOPICS Business Statistics
CONTENTS
▪ Back to the promise
▪ Back to the learning goals
▪ Standardizing data
▪ Assessing normality and symmetry
▪ Dealing with gaps in the tables
▪ The asterisk notation
▪ Review of distributions
▪ Old exam
SLIDE 2
SLIDE 3
Regression analysis: −0.12 (−2.11)*, where * means $p < .05$
BACK TO THE PROMISE
SLIDE 4
Wilcoxon test paired samples significant BACK TO THE PROMISE
SLIDE 5
Academic skills ▪ abstraction
Research skills ▪ translating
Quantitative skills ▪ a variety of methods
Knowledge ▪ reading and writing statistics
BACK TO THE LEARNING OBJECTIVES
SLIDE 6
We have often standardized a test statistic
▪ example: $\bar{X} \to \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
But one also frequently encounters standardized data
▪ Standardization of data is done by subtracting the mean and dividing by the standard deviation
▪ So: $x_i \to z_i = \dfrac{x_i - \bar{x}}{s_x}$
STANDARDIZING DATA
This is done not only for data from a normal population, but often for all sorts of data
SLIDE 7
Some properties:
▪ $\bar{Z} = 0$ (the mean of standardized data is 0)
▪ $s_Z = s_Z^2 = 1$ (the variance and the standard deviation of standardized data are both 1)
▪ standardized data is dimensionless (has no unit)
Interpretation
▪ each value $z_i = \dfrac{x_i - \bar{x}}{s_x}$ measures how many standard deviations that value is removed from the mean
▪ examples:
  ▪ −2.5 is far in the left tail
  ▪ 0.2 is pretty central, a bit to the right of the mean
STANDARDIZING DATA
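A minimal Python sketch of standardizing data, using only the standard library. The data values and the helper name `standardize` are made up for illustration, and the sample standard deviation (dividing by n − 1) is assumed.

```python
import statistics

def standardize(values):
    """Return z-scores: subtract the sample mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - mean) / sd for x in values]

# Hypothetical data, for illustration only
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
z_scores = standardize(data)

print([round(z, 2) for z in z_scores])
print(round(statistics.mean(z_scores), 10))   # mean of the z-scores: 0 up to rounding
print(round(statistics.stdev(z_scores), 10))  # standard deviation of the z-scores: 1 up to rounding
```

Standardizing shifts and rescales the data but does not change the shape of its distribution.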
SLIDE 8
Which statements are true?
a. $z$-scores can be made for numerical and categorical variables
b. $z$-scores have a skewness and kurtosis of 0
c. $z$-scores are normally distributed when $n \ge 30$
d. A $z$-score of 3.2 means pretty high compared to most other data points
EXERCISE 1
SLIDE 9
In many cases we need to make assumptions on “normal populations” or “symmetric populations”
Is there a way to assess this?
Qualitatively:
▪ histograms
▪ box plots
Quantitatively:
▪ skewness
▪ kurtosis
ASSESSING NORMALITY AND SYMMETRY
Not really helpful to judge normality for small sample sizes
No formal test, but rules of thumb; see next slide
SLIDE 10
Practical rules of thumb that work more or less
Normality:
▪ −1 ≤ skewness ≤ 1
▪ −1 ≤ kurtosis ≤ 1
Symmetry:
▪ −1 ≤ skewness ≤ 1
ASSESSING NORMALITY AND SYMMETRY
Statistics: sales
N                          18
Mean                       3.956
Std. Deviation             2.0893
Skewness                   .716
Std. Error of Skewness     .536
Kurtosis                   −.297
Std. Error of Kurtosis     1.038
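The rules of thumb can be sketched as simple range checks in Python. The function names are hypothetical; the two input values are the skewness (.716) and kurtosis (−.297) reported for the variable sales in the output above.

```python
def within_rule_of_thumb(value, bound=1.0):
    """True when the statistic lies in the interval [-bound, bound]."""
    return -bound <= value <= bound

def looks_symmetric(skewness):
    """Rule of thumb for symmetry: -1 <= skewness <= 1."""
    return within_rule_of_thumb(skewness)

def looks_normal(skewness, kurtosis):
    """Rule of thumb for normality: skewness and kurtosis both in [-1, 1]."""
    return within_rule_of_thumb(skewness) and within_rule_of_thumb(kurtosis)

# Values read from the output for the variable "sales"
sales_skewness = 0.716
sales_kurtosis = -0.297

print(looks_symmetric(sales_skewness))
print(looks_normal(sales_skewness, sales_kurtosis))
```

Both checks pass for the sales data, so by these rules of thumb there is no reason to doubt symmetry or normality.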
SLIDE 11
Try to look up $P(Z \le 0.402)$
▪ table gives $P(Z \le 0.410) = 0.6591$
▪ and $P(Z \le 0.400) = 0.6554$
▪ but not $P(Z \le 0.402)$
DEALING WITH GAPS IN THE TABLES
SLIDE 12
Three solutions
▪ take nearest value
  ▪ $P(Z \le 0.402) \approx P(Z \le 0.400) = 0.6554$
▪ make a linear interpolation
  ▪ $P(Z \le 0.402) \approx 0.8\,P(Z \le 0.400) + 0.2\,P(Z \le 0.410) = P(Z \le 0.400) + \frac{2}{10}(0.6591 - 0.6554) = 0.65614$
▪ use a conservative value
  ▪ either $P(Z \le 0.402) \approx P(Z \le 0.400) = 0.6554$
  ▪ or $P(Z \le 0.402) \approx P(Z \le 0.410) = 0.6591$
  ▪ depends on use (confidence interval, type of critical value)
Unless specified, we leave it up to you
▪ the differences are tiny anyhow
DEALING WITH GAPS IN THE TABLES
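The linear interpolation can be sketched in Python as follows. The helper names `interpolate` and `normal_cdf` are made up for illustration; the comparison with the exact value via the error function is an addition, not part of the slides.

```python
import math

def interpolate(x, x0, p0, x1, p1):
    """Linear interpolation between two neighbouring table entries (x0, p0) and (x1, p1)."""
    weight = (x - x0) / (x1 - x0)  # 0.2 for x = 0.402 between 0.400 and 0.410
    return p0 + weight * (p1 - p0)

def normal_cdf(z):
    """Standard normal CDF computed from the error function, for comparison."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

approx = interpolate(0.402, 0.400, 0.6554, 0.410, 0.6591)
exact = normal_cdf(0.402)

print(round(approx, 5))  # interpolated table value
print(round(exact, 5))   # value computed without the table
```

The interpolated and the computed value differ only in the fifth decimal, which illustrates why the choice between the three solutions rarely matters.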
SLIDE 13
Another type of gap: degrees of freedom
Example: $t_{crit}(df = 68)$
▪ recommendation: conservative, so use $df = 65$
DEALING WITH GAPS IN THE TABLES
“conservative” sometimes means round upwards, sometimes round downwards
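For degrees of freedom, the conservative choice is to round downwards: a smaller df gives a larger critical $t$-value, so it becomes harder to reject. A minimal Python sketch, in which the function name and the list of tabulated df values are assumptions (a real table may list other values):

```python
def conservative_df(df, tabulated):
    """Largest tabulated df that does not exceed the requested df (rounds downwards)."""
    candidates = [d for d in tabulated if d <= df]
    if not candidates:
        raise ValueError("no tabulated df at or below the requested value")
    return max(candidates)

# Hypothetical excerpt of the df column of a t table
tabulated_df = [50, 55, 60, 65, 70, 75, 80]

print(conservative_df(68, tabulated_df))  # 65
```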
SLIDE 14
Recall the introductory slides
▪ “−.12 (−2.11)*”
▪ what does that mean?
THE ASTERISK NOTATION
SLIDE 15
In many journal articles in business and economics, regression models are used
▪ Every regression coefficient has
  ▪ an estimated value ($b_1$, etc.)
  ▪ a standard error of the estimate ($s_{B_1}$, etc.)
  ▪ a $t$-value based on $H_0: \beta_1 = 0$, etc. ($t_{calc} = \dfrac{b_1 - 0}{s_{B_1}}$, etc.)
  ▪ a $p$-value for this: $p\text{-value} = P(|t| \ge |t_{calc}|)$, computed under $H_0$ for a two-sided test
THE ASTERISK NOTATION
SLIDE 16
In a journal we would need to report most of these
▪ This gives long sentences: “The estimated coefficient for uniqueness is $b = -.12$, with a $t$-value of $-2.11$, giving a $p$-value between 0.01 and 0.05.”
▪ Therefore, this is often abbreviated: “−.12 (−2.11)*”
▪ Usual conventions with the asterisks:
  ▪ * means $0.01 \le p\text{-value} < 0.05$
  ▪ ** means $0.001 \le p\text{-value} < 0.01$
  ▪ *** means $p\text{-value} < 0.001$
THE ASTERISK NOTATION
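The asterisk conventions amount to a lookup on the $p$-value. A minimal Python sketch: the function names are hypothetical, the layout with the $t$-value in parentheses is an assumption about the journal style, and the $p$-value 0.04 is a made-up number inside the reported range 0.01 to 0.05.

```python
def significance_stars(p_value):
    """Map a p-value to the usual asterisk notation."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 5% level

def format_coefficient(estimate, t_value, p_value):
    """Abbreviated report: estimate, t-value in parentheses, asterisks."""
    return f"{estimate:.2f} ({t_value:.2f}){significance_stars(p_value)}"

# Estimate and t-value from the slides; the p-value 0.04 is hypothetical
print(format_coefficient(-0.12, -2.11, 0.04))
```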
SLIDE 17
$Z$ statistic: $Z = \dfrac{T - \mu_T}{\sigma_T} \sim N(0,1)$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\mu = \mu_0$, when $\sigma^2$ is known ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $M = M_0$ (through Wilcoxon’s $W$) (when $n \ge 20$)
  ▪ $\mu_X - \mu_Y = \mu_0$, when $\sigma_X^2$ and $\sigma_Y^2$ are known ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\pi = \pi_0$ (when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$)
  ▪ $\pi_X - \pi_Y = \pi_0$ (when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$)
  ▪ $\rho_S = 0$, in Spearman correlation test (when $n \ge 20$)
  ▪ $M_1 = M_2$, when $n_1 \ge 10$ and $n_2 \ge 10$ in Mann-Whitney test
REVIEW OF DISTRIBUTIONS
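As one instance of the general form $(T - \mu_T)/\sigma_T$, here is a hedged Python sketch of the $Z$ test for a single proportion, including the check on $n\pi$ and $n(1 - \pi)$. The function name and the example counts are made up; a two-sided alternative is assumed.

```python
import math

def proportion_z_test(successes, n, pi0):
    """Z statistic and two-sided p-value for H0: pi = pi0 (normal approximation)."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("need n*pi0 >= 5 and n*(1 - pi0) >= 5 for the normal approximation")
    p_hat = successes / n
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)  # standard deviation of p_hat under H0
    z = (p_hat - pi0) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

# Hypothetical example: 60 successes in 100 trials, H0: pi = 0.5
z, p_value = proportion_z_test(60, 100, 0.5)
print(round(z, 2), round(p_value, 4))
```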
SLIDE 18
$t$ statistic: $t = \dfrac{T - \mu_T}{s_T} \sim t_{df}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\mu = \mu_0$, when $\sigma^2$ is unknown ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\mu_X - \mu_Y = \mu_0$, when $\sigma_X^2$ and $\sigma_Y^2$ are unknown ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\beta = \beta_0$, in regression analysis
  ▪ $\rho = 0$, in Pearson correlation test
REVIEW OF DISTRIBUTIONS
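For the first case in the list, a one-sample test on the mean with unknown variance, the statistic can be sketched in Python as below. The function name and the data are hypothetical; the critical value or $p$-value would still be looked up in a $t$ table with the returned df.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0, with sigma unknown."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # s / sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Hypothetical data, for illustration only
t_calc, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(round(t_calc, 3), df)
```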
SLIDE 19
$\chi^2$ statistic: $\chi^2 = \dfrac{df \times S^2}{\sigma^2} \sim \chi^2_{df}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\sigma^2 = \sigma_0^2$, of a normal population
  ▪ $M_X = M_Y = \dots = M_Z$, in Kruskal-Wallis test
  ▪ independence in contingency tables when $n_{exp} \ge 5$
REVIEW OF DISTRIBUTIONS
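For the variance test, the formula translates directly into code. A small Python sketch with a hypothetical function name and made-up data, taking $df = n - 1$:

```python
import statistics

def variance_chi_square(values, sigma0_squared):
    """Chi-square statistic df * s^2 / sigma0^2 and its df, for H0: sigma^2 = sigma0^2."""
    df = len(values) - 1
    sample_variance = statistics.variance(values)  # s^2, with n - 1 in the denominator
    return df * sample_variance / sigma0_squared, df

# Hypothetical data and hypothesized variance
chi2_calc, chi2_df = variance_chi_square([1.0, 2.0, 3.0, 4.0, 5.0], sigma0_squared=2.0)
print(chi2_calc, chi2_df)
```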
SLIDE 20
$F$ statistic: $F = \dfrac{S_1^2}{S_2^2} \sim F_{df_1, df_2}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\sigma_X^2 = \sigma_Y^2$, of two normal populations
  ▪ $\sigma_X^2 = \sigma_Y^2 = \dots = \sigma_Z^2$, with Levene’s test
  ▪ overall fit in regression analysis
  ▪ $\mu_X = \mu_Y = \dots = \mu_Z$, in ANOVA
REVIEW OF DISTRIBUTIONS
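For the comparison of two variances, the $F$ statistic is simply the ratio of the two sample variances. A Python sketch under the assumption of two independent samples; the function name and the data are made up for illustration.

```python
import statistics

def variance_ratio_f(sample_1, sample_2):
    """F statistic s1^2 / s2^2 with numerator and denominator degrees of freedom."""
    f_calc = statistics.variance(sample_1) / statistics.variance(sample_2)
    return f_calc, len(sample_1) - 1, len(sample_2) - 1

# Hypothetical samples
f_calc, df_1, df_2 = variance_ratio_f([1.0, 2.0, 3.0, 4.0, 5.0],
                                      [2.0, 4.0, 6.0, 8.0, 10.0])
print(f_calc, df_1, df_2)
```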
SLIDE 21
binomial statistic: $X \sim \text{bin}(n, \pi)$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\pi = \pi_0$, in a repeated Bernoulli experiment
  ▪ $M = M_0$, in the sign test
REVIEW OF DISTRIBUTIONS
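With a binomial statistic no table of a continuous distribution is needed; a tail probability can be summed directly. A Python sketch with a hypothetical function name, applied to a made-up sign test in which 8 of 10 observations lie above $M_0$ (so $\pi = 0.5$ under $H_0$):

```python
import math

def binomial_upper_tail(k, n, pi):
    """P(X >= k) for X ~ bin(n, pi), summed exactly from the binomial probabilities."""
    return sum(math.comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(k, n + 1))

# Hypothetical sign test: 8 of 10 observations above the hypothesized median
p_upper = binomial_upper_tail(8, 10, 0.5)
print(p_upper)  # one-sided p-value, 56/1024
```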
SLIDE 22
26 March 2015, Q1i OLD EXAM QUESTION
SLIDE 23