Business Statistics: Miscellaneous Topics

Feb 28, 2024


SLIDE 1

MISCELLANEOUS TOPICS

Business Statistics

SLIDE 2

CONTENTS

▪ Back to the promise
▪ Back to the learning goals
▪ Standardizing data
▪ Assessing normality and symmetry
▪ Dealing with gaps in the tables
▪ The asterisk notation
▪ Review of distributions
▪ Old exam question
▪ Further study

SLIDE 3

BACK TO THE PROMISE

Regression analysis: −0.12 (−2.11)*, where * means $p < .05$

SLIDE 4

BACK TO THE PROMISE

Wilcoxon test, paired samples, significant

SLIDE 5

BACK TO THE LEARNING OBJECTIVES

Academic skills
▪ abstraction
Research skills
▪ translating
Quantitative skills
▪ a variety of methods
Knowledge
▪ reading and writing statistics

SLIDE 6

STANDARDIZING DATA

We have often standardized a test statistic
▪ example: $\bar{X} \rightarrow \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

But one also frequently encounters standardized data
▪ Standardization of data is done by subtracting the mean and dividing by the standard deviation
▪ So: $x_i \rightarrow z_i = \dfrac{x_i - \bar{x}}{s_x}$

Note: this is done not only for data from a normal population, but often for all sorts of data

SLIDE 7

STANDARDIZING DATA

Some properties:
▪ $\bar{Z} = 0$ (the mean of standardized data is 0)
▪ $s_Z = s_Z^2 = 1$ (the standard deviation and the variance of standardized data are both 1)
▪ standardized data is dimensionless (has no unit)

Interpretation
▪ each value $z_i = \dfrac{x_i - \bar{x}}{s_x}$ measures how many standard deviations that value is removed from the mean
▪ examples:
  ▪ −2.5 is far in the left tail
  ▪ 0.2 is pretty central, a bit to the right of the mean
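The standardization rule and its two properties (mean 0, standard deviation 1) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `standardize` and the `sales` values are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the sample mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]


# Hypothetical data, for illustration only
sales = [2.0, 3.5, 4.0, 4.5, 6.0]
z = standardize(sales)

print([round(v, 2) for v in z])  # the standardized values
print(abs(mean(z)) < 1e-9)       # mean of the z-scores is 0 (up to rounding)
print(abs(stdev(z) - 1) < 1e-9)  # standard deviation of the z-scores is 1
```

Because the unit of the data cancels in the division, the same z-scores result whether the sales are recorded in euros or in thousands of euros.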

SLIDE 8

EXERCISE 1

Which statements are true?

a. $z$-scores can be made for numerical and categorical variables
b. $z$-scores have a skewness and kurtosis of 0
c. $z$-scores are normally distributed when $n \ge 30$
d. A $z$-score of 3.2 means pretty high compared to most other data points

SLIDE 9

ASSESSING NORMALITY AND SYMMETRY

In many cases we need to make assumptions about “normal populations” or “symmetric populations”

Is there a way to assess this?

Qualitatively:
▪ histograms
▪ box plots

Quantitatively:
▪ skewness
▪ kurtosis

Notes:
▪ not really helpful to judge normality for small sample sizes
▪ no formal test, but rules of thumb; see next slide

SLIDE 10

ASSESSING NORMALITY AND SYMMETRY

Practical rules of thumb that work more or less

Normality:
▪ −1 ≤ skewness ≤ 1
▪ −1 ≤ kurtosis ≤ 1

Symmetry:
▪ −1 ≤ skewness ≤ 1

Statistics: sales

N                        18
Mean                     3.956
Std. Deviation           2.0893
Skewness                 0.716
Std. Error of Skewness   0.536
Kurtosis                 0.297
Std. Error of Kurtosis   1.038
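The rules of thumb can be written as two small checks. The sketch below applies them to the skewness and kurtosis reported in the table above, and also shows one way to compute both measures from raw data. The bias-adjusted sample formulas used here are an assumption about what the software reports, and all function names are hypothetical.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed formula, needs n >= 3)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed formula, needs n >= 4)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((v - m) / s) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def rule_symmetric(skewness, bound=1.0):
    """Rule of thumb for symmetry: -1 <= skewness <= 1."""
    return -bound <= skewness <= bound


def rule_normal(skewness, kurtosis, bound=1.0):
    """Rule of thumb for normality: skewness and kurtosis both within [-1, 1]."""
    return rule_symmetric(skewness, bound) and -bound <= kurtosis <= bound


# Values reported for the variable "sales" in the table above
print(rule_normal(0.716, 0.297))  # True: neither rule of thumb is violated

# Hypothetical raw data, for illustration only
data = [1, 2, 3, 4, 5]
print(round(sample_skewness(data), 3), round(sample_excess_kurtosis(data), 3))
```

A result of `True` only means that the rule of thumb raises no objection; it is not a formal test of normality.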

SLIDE 11

DEALING WITH GAPS IN THE TABLES

Try to look up $P(Z \le 0.402)$
▪ table gives $P(Z \le 0.410) = 0.6591$
▪ and $P(Z \le 0.400) = 0.6554$
▪ but not $P(Z \le 0.402)$

SLIDE 12

DEALING WITH GAPS IN THE TABLES

Three solutions
▪ take the nearest value
  ▪ $P(Z \le 0.402) \approx P(Z \le 0.400) = 0.6554$
▪ make a linear interpolation
  ▪ $P(Z \le 0.402) \approx 0.8\,P(Z \le 0.400) + 0.2\,P(Z \le 0.410) = P(Z \le 0.400) + \frac{2}{10}(0.6591 - 0.6554) = 0.65614$
▪ use a conservative value
  ▪ either $P(Z \le 0.402) \approx P(Z \le 0.400) = 0.6554$
  ▪ or $P(Z \le 0.402) \approx P(Z \le 0.410) = 0.6591$
  ▪ depends on use (confidence interval, type of critical value)

Unless specified, we leave it up to you
▪ the differences are tiny anyhow
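The linear interpolation can be reproduced in a few lines. This is a minimal sketch: the helper `interpolate` is a hypothetical name, and the comparison with `statistics.NormalDist` is added only to show how small the interpolation error is.

```python
from statistics import NormalDist


def interpolate(x, x_low, p_low, x_high, p_high):
    """Linear interpolation between two neighbouring table entries."""
    weight = (x - x_low) / (x_high - x_low)  # 0.2 for x = 0.402
    return p_low + weight * (p_high - p_low)


# Table entries from the slide
approx = interpolate(0.402, 0.400, 0.6554, 0.410, 0.6591)
print(round(approx, 5))  # 0.65614

# Value computed directly from the standard normal distribution
exact = NormalDist().cdf(0.402)
print(abs(approx - exact) < 0.001)  # True: the two agree to about three decimals
```

The nearest-value answer 0.6554 and the interpolated answer 0.65614 differ by less than 0.001, which illustrates why the choice between the three solutions rarely matters.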

SLIDE 13

DEALING WITH GAPS IN THE TABLES

Another type of gap: degrees of freedom

Example: $t_{\text{crit}}(\text{df} = 68)$
▪ recommendation: conservative, so use df = 65

Note: “conservative” sometimes means rounding upwards, sometimes rounding downwards
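For degrees of freedom, rounding down is the conservative direction: a smaller df gives a larger critical $t$-value, so the test rejects less readily and a confidence interval becomes slightly wider. The sketch below picks the largest tabulated df that does not exceed the requested one. The list `TABULATED_DF` is a hypothetical stand-in for the rows of a printed table, not the actual table used in the course.

```python
from bisect import bisect_right

# Hypothetical df rows of a printed t-table (assumption, for illustration only)
TABULATED_DF = [30, 35, 40, 45, 50, 55, 60, 65, 70, 80, 90, 100]


def conservative_df(df, tabulated=TABULATED_DF):
    """Return the largest tabulated df that does not exceed the requested df."""
    position = bisect_right(tabulated, df)
    if position == 0:
        raise ValueError("requested df is below the smallest tabulated value")
    return tabulated[position - 1]


print(conservative_df(68))  # 65
print(conservative_df(70))  # 70: an exact match needs no rounding
```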

SLIDE 14

THE ASTERISK NOTATION

Recall the introductory slides
▪ “−.12 (−2.11)*”
▪ what does that mean?

SLIDE 15

THE ASTERISK NOTATION

In many journal articles in business and economics, regression models are used
▪ Every regression coefficient has
  ▪ an estimated value ($b_1$, etc.)
  ▪ a standard error of the estimate ($s_{B_1}$, etc.)
  ▪ a $t$-value based on $H_0\colon \beta_1 = 0$, etc. ($t_{\text{calc}} = \dfrac{b_1 - 0}{s_{B_1}}$, etc.)
  ▪ a $p$-value for this: $p\text{-value} = P(t_{\text{calc}} \ge t_{\text{crit}})$

SLIDE 16

THE ASTERISK NOTATION

In a journal we would need to report most of these
▪ This gives long sentences: “The estimated coefficient for uniqueness is $b = -.12$, with a $t$-value of $-2.11$, giving a $p$-value between 0.01 and 0.05.”
▪ Therefore, this is often abbreviated: “−.12 (−2.11)*”
▪ Usual conventions with the asterisks:
  ▪ * means 0.01 ≤ $p$-value < 0.05
  ▪ ** means 0.001 ≤ $p$-value < 0.01
  ▪ *** means $p$-value < 0.001
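The asterisk convention is a simple mapping from a $p$-value to a string. The following sketch encodes the three thresholds listed above; the function names and the $p$-value 0.04 are hypothetical (the slide only states that the $p$-value lies between 0.01 and 0.05).

```python
def significance_stars(p_value):
    """Map a p-value to the usual asterisk notation."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 5% level: no asterisk


def report(coefficient, t_value, p_value):
    """Format a coefficient as 'estimate (t-value)stars'."""
    return f"{coefficient:.2f} ({t_value:.2f}){significance_stars(p_value)}"


# Hypothetical p-value of 0.04, consistent with "between 0.01 and 0.05"
print(report(-0.12, -2.11, 0.04))  # -0.12 (-2.11)*
```

Journals differ in the exact thresholds and in whether the number in parentheses is a $t$-value or a standard error, so the table note should always be checked.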

SLIDE 17

REVIEW OF DISTRIBUTIONS

$Z$ statistic: $Z = \dfrac{T - \mu_T}{\sigma_T} \sim N(0,1)$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\mu = \mu_0$, when $\sigma^2$ is known ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $M = M_0$ (through Wilcoxon’s $W$) (when $n \ge 20$)
  ▪ $\mu_X - \mu_Y = \mu_0$, when $\sigma_X^2$ and $\sigma_Y^2$ are known ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\pi = \pi_0$ (when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$)
  ▪ $\pi_X - \pi_Y = \pi_0$ (when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$)
  ▪ $\rho_S = 0$, in Spearman correlation test (when $n \ge 20$)
  ▪ $M_1 = M_2$, when $n_1 \ge 10$ and $n_2 \ge 10$ in Mann-Whitney test
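As one instance of the $Z$ statistic, the test of $\pi = \pi_0$ can be sketched as follows, including the condition $n\pi \ge 5$ and $n(1 - \pi) \ge 5$. The function name, the two-sided alternative, and the numbers (60 successes in 100 trials, $\pi_0 = 0.5$) are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi_0):
    """Z statistic and two-sided p-value for H0: pi = pi_0."""
    if n * pi_0 < 5 or n * (1 - pi_0) < 5:
        raise ValueError("normal approximation not justified: need n*pi >= 5 and n*(1-pi) >= 5")
    p_hat = successes / n
    standard_error = sqrt(pi_0 * (1 - pi_0) / n)  # standard error under H0
    z = (p_hat - pi_0) / standard_error           # (T - mu_T) / sigma_T
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical sample: 60 successes in 100 trials, testing pi_0 = 0.5
z_value, p_value = proportion_z_test(successes=60, n=100, pi_0=0.5)
print(round(z_value, 2), round(p_value, 4))
```

Here $T$ is the sample proportion, $\mu_T = \pi_0$ and $\sigma_T = \sqrt{\pi_0(1 - \pi_0)/n}$, which matches the general form of the $Z$ statistic.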

SLIDE 18

REVIEW OF DISTRIBUTIONS

$t$ statistic: $t = \dfrac{T - \mu_T}{s_T} \sim t_{\text{df}}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\mu = \mu_0$, when $\sigma^2$ is unknown ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\mu_X - \mu_Y = \mu_0$, when $\sigma_X^2$ and $\sigma_Y^2$ are unknown ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\beta = \beta_0$, in regression analysis
  ▪ $\rho = 0$, in Pearson correlation test

SLIDE 19

REVIEW OF DISTRIBUTIONS

$\chi^2$ statistic: $\chi^2 = \dfrac{\text{df} \times S^2}{\sigma^2} \sim \chi^2_{\text{df}}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\sigma^2 = \sigma_0^2$, of a normal population
  ▪ $M_X = M_Y = \cdots = M_Z$, in Kruskal-Wallis test
  ▪ independence in contingency tables when $n_{\text{exp}} \ge 5$

SLIDE 20

REVIEW OF DISTRIBUTIONS

$F$ statistic: $F = \dfrac{S_1^2}{S_2^2} \sim F_{\text{df}_1, \text{df}_2}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\sigma_X^2 = \sigma_Y^2$, of two normal populations
  ▪ $\sigma_X^2 = \sigma_Y^2 = \cdots = \sigma_Z^2$, with Levene’s test
  ▪ overall fit in regression analysis
  ▪ $\mu_X = \mu_Y = \cdots = \mu_Z$, in ANOVA

SLIDE 21

REVIEW OF DISTRIBUTIONS

binomial statistic: $X \sim \text{bin}(n, \pi)$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\pi = \pi_0$, in a repeated Bernoulli experiment
  ▪ $M = M_0$, in the sign test
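The sign test shows how the binomial distribution is used directly: under $H_0\colon M = M_0$, the number of observations above $M_0$ follows $\text{bin}(n, 0.5)$. The sketch below computes an exact two-sided $p$-value. Dropping observations equal to $M_0$ and doubling the smaller tail are common conventions assumed here, and the data are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ bin(n, pi)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)


def sign_test_p_value(values, median_0):
    """Exact two-sided p-value for H0: M = median_0."""
    above = sum(1 for v in values if v > median_0)
    below = sum(1 for v in values if v < median_0)
    n = above + below        # observations equal to median_0 are dropped (assumption)
    k = min(above, below)    # size of the smaller group
    tail = sum(binomial_pmf(i, n, 0.5) for i in range(k + 1))
    return min(1.0, 2 * tail)  # double the smaller tail, capped at 1


# Hypothetical data: 9 of the 10 values lie above the hypothesized median of 10
data = [12, 15, 9, 14, 17, 13, 16, 18, 11, 19]
p_value_sign = sign_test_p_value(data, median_0=10)
print(round(p_value_sign, 4))  # 0.0215
```

With $n = 10$ and one value below the hypothesized median, the tail probability is $(1 + 10)/2^{10}$, which doubled gives about 0.0215.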

SLIDE 22

OLD EXAM QUESTION

26 March 2015, Q1i
