Business Statistics: Miscellaneous Topics (PowerPoint PPT Presentation)



slide-1
SLIDE 1

MISCELLANEOUS TOPICS

Business Statistics

slide-2
SLIDE 2

▪ Back to the promise
▪ Back to the learning goals
▪ Standardizing data
▪ Assessing normality and symmetry
▪ Dealing with gaps in the tables
▪ The asterisk notation
▪ Review of distributions
▪ Old exam question
▪ Further study

CONTENTS

slide-3
SLIDE 3

Regression analysis
−0.12 −2.11 *
* $p < .05$

BACK TO THE PROMISE

slide-4
SLIDE 4

Wilcoxon test paired samples significant BACK TO THE PROMISE

slide-5
SLIDE 5

Academic skills ▪ abstraction
Research skills ▪ translating
Quantitative skills ▪ a variety of methods
Knowledge ▪ reading and writing statistics

BACK TO THE LEARNING OBJECTIVES

slide-6
SLIDE 6

We have often standardized a test statistic
▪ example: $\bar{X} \to \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

But one also frequently encounters standardized data
▪ Standardization of data is done by subtracting the mean and dividing by the standard deviation
▪ So: $x_i \to z_i = \frac{x_i - \bar{x}}{s_x}$

STANDARDIZING DATA

Not only for data from a normal population, but it is often done for all sorts of data

slide-7
SLIDE 7

Some properties:
▪ $\bar{z} = 0$ (the mean of standardized data is 0)
▪ $s_z = s_z^2 = 1$ (the variance and the standard deviation of standardized data are both 1)
▪ standardized data is dimensionless (has no unit)

Interpretation
▪ each value $z_i = \frac{x_i - \bar{x}}{s_x}$ measures how many standard deviations that value is removed from the mean
▪ examples:
  ▪ −2.5 is far in the left tail
  ▪ 0.2 is pretty central, a bit to the right of the mean

STANDARDIZING DATA
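
The two properties above (mean 0, standard deviation 1) can be checked numerically. The following is a minimal sketch, assuming the sample standard deviation (with n − 1 in the denominator) is meant; the data values and the name `standardize` are hypothetical and chosen only for illustration.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the sample mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - mean) / sd for x in values]


# Hypothetical data, not taken from the slides
sales = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
z_scores = standardize(sales)

print([round(z, 2) for z in z_scores])          # the standardized values
print(round(statistics.mean(z_scores), 10))     # mean of the z-scores
print(round(statistics.stdev(z_scores), 10))    # standard deviation of the z-scores
```

Because the z-scores are obtained by dividing a quantity by another quantity in the same unit, the result carries no unit, whatever the unit of the original data.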

slide-8
SLIDE 8

Which statements are true?

  • a. $z$-scores can be made for numerical and categorical variables
  • b. $z$-scores have a skewness and kurtosis of 0
  • c. $z$-scores are normally distributed when $n \ge 30$
  • d. A $z$-score of 3.2 means pretty high compared to most other data points

EXERCISE 1

slide-9
SLIDE 9

In many cases we need to make assumptions on “normal populations” or “symmetric populations”
Is there a way to assess this?
Qualitatively:
▪ histograms
▪ box plots
Quantitatively:
▪ skewness
▪ kurtosis

ASSESSING NORMALITY AND SYMMETRY

Not really helpful to judge normality for small sample sizes
No formal test, but rules of thumb; see next slide

slide-10
SLIDE 10

Practical rules of thumb that work more or less
Normality:
▪ −1 ≤ skewness ≤ 1
▪ −1 ≤ kurtosis ≤ 1
Symmetry:
▪ −1 ≤ skewness ≤ 1

ASSESSING NORMALITY AND SYMMETRY

Statistics: sales
N                         18
Mean                      3.956
Std. Deviation            2.0893
Skewness                  0.716
Std. Error of Skewness    0.536
Kurtosis                  −0.297
Std. Error of Kurtosis    1.038

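
The rules of thumb can be applied to raw data with a few lines of code. This is a sketch only: it assumes the bias-adjusted sample formulas for skewness and excess kurtosis that statistical packages commonly report (the slides do not state which formula is used), and the function names and data are hypothetical.

```python
import math


def _z_values(values):
    """Deviations from the mean, in units of the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed formula, requires n >= 3)."""
    n = len(values)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in _z_values(values))


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed formula, requires n >= 4)."""
    n = len(values)
    fourth = sum(z ** 4 for z in _z_values(values))
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def within_rule_of_thumb(value):
    """Rule of thumb from the slide: the statistic lies between -1 and 1."""
    return -1 <= value <= 1


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical, perfectly symmetric data
skew = sample_skewness(data)
kurt = sample_excess_kurtosis(data)
print(round(skew, 4), within_rule_of_thumb(skew))
print(round(kurt, 4), within_rule_of_thumb(kurt))
```

With only five observations the outcome says little, which matches the remark that such judgments are not really helpful for small samples.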
slide-11
SLIDE 11

Try to look up $P(Z \le 0.402)$
▪ table gives $P(Z \le 0.410) = 0.6591$
▪ and $P(Z \le 0.400) = 0.6554$
▪ but not $P(Z \le 0.402)$

DEALING WITH GAPS IN THE TABLES

slide-12
SLIDE 12

Three solutions
▪ take nearest value
  ▪ $P(Z \le 0.402) \approx P(Z \le 0.400) = 0.6554$
▪ make a linear interpolation
  ▪ $P(Z \le 0.402) \approx 0.8\,P(Z \le 0.400) + 0.2\,P(Z \le 0.410) = P(Z \le 0.400) + \frac{2}{10}(0.6591 - 0.6554) = 0.65614$
▪ use a conservative value
  ▪ either $P(Z \le 0.402) \approx P(Z \le 0.400) = 0.6554$
  ▪ or $P(Z \le 0.402) \approx P(Z \le 0.410) = 0.6591$
  ▪ depends on use (confidence interval, type of critical value)

Unless specified, we leave it up to you
▪ the differences are tiny anyhow

DEALING WITH GAPS IN THE TABLES
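
The linear interpolation can be reproduced in a few lines. The sketch below uses the two table values from the slide; the function name `interpolate` is hypothetical, and the comparison with `statistics.NormalDist` from the Python standard library is added only to show how small the difference is.

```python
from statistics import NormalDist


def interpolate(z, z_low, p_low, z_high, p_high):
    """Linear interpolation between two neighbouring table entries."""
    weight = (z - z_low) / (z_high - z_low)  # 0.2 for z = 0.402
    return p_low + weight * (p_high - p_low)


# Table values quoted on the slide
approx = interpolate(0.402, 0.400, 0.6554, 0.410, 0.6591)

# Value computed directly from the standard normal distribution
exact = NormalDist().cdf(0.402)

print(round(approx, 5))
print(round(exact, 5))
print(abs(exact - approx) < 1e-4)
```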

slide-13
SLIDE 13

Another type of gap: degrees of freedom
Example: $t_{\text{crit}}$ for df = 68
▪ recommendation: conservative, so use df = 65

DEALING WITH GAPS IN THE TABLES

“conservative” sometimes means round upwards, sometimes round downwards
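
For a $t$ critical value, fewer degrees of freedom give a slightly larger critical value, so rounding df down is the cautious choice. A minimal sketch of that lookup follows; the list of tabulated df values is hypothetical (a real table may use different steps), as is the function name.

```python
from bisect import bisect_right


def conservative_df(df, tabulated):
    """Largest tabulated df that does not exceed the requested df.

    `tabulated` must be sorted in increasing order.
    """
    position = bisect_right(tabulated, df)
    if position == 0:
        raise ValueError("requested df is below the smallest tabulated value")
    return tabulated[position - 1]


# Hypothetical excerpt of the df column of a t table
tabulated_df = [40, 45, 50, 55, 60, 65, 70, 75, 80]

print(conservative_df(68, tabulated_df))  # df = 68 is not tabulated
print(conservative_df(70, tabulated_df))  # df = 70 is tabulated and used as is
```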

slide-14
SLIDE 14

Recall the introductory slides
▪ “−.12 −2.11 *”
▪ what does that mean?

THE ASTERISK NOTATION

slide-15
SLIDE 15

In many journal articles in business and economics, regression models are used
▪ Every regression coefficient has
  ▪ an estimated value ($b_1$, etc.)
  ▪ a standard error of the estimate ($s_{B_1}$, etc.)
  ▪ a $t$-value based on $H_0: \beta_1 = 0$, etc. ($t_{\text{calc}} = \frac{b_1 - 0}{s_{B_1}}$, etc.)
  ▪ a $p$-value for this: $p\text{-value} = P(t_{\text{calc}} \ge t_{\text{crit}})$

THE ASTERISK NOTATION

slide-16
SLIDE 16

In a journal we would need to report most of these
▪ This gives long sentences: “The estimated coefficient for uniqueness is $b = −.12$, with a $t$-value of −2.11, giving a $p$-value between 0.01 and 0.05.”
▪ Therefore, this is often abbreviated: “−.12 −2.11 *”
▪ Usual conventions with the asterisks:
  ▪ * means 0.01 ≤ $p$-value < 0.05
  ▪ ** means 0.001 ≤ $p$-value < 0.01
  ▪ *** means $p$-value < 0.001

THE ASTERISK NOTATION
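
The asterisk convention is a simple mapping from a $p$-value to a string. The sketch below encodes the three thresholds from the slide; the function names are hypothetical, and the $p$-value of 0.04 is an assumed illustration (the slide only says it lies between 0.01 and 0.05).

```python
def significance_stars(p_value):
    """Map a p-value to the usual asterisk notation."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 5% level: no asterisk


def abbreviate(coefficient, t_value, p_value):
    """Short report: coefficient, t-value, asterisks."""
    return f"{coefficient:.2f} {t_value:.2f} {significance_stars(p_value)}".rstrip()


# Coefficient and t-value from the slide; p = 0.04 is an assumed value
print(abbreviate(-0.12, -2.11, 0.04))
```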

slide-17
SLIDE 17

$Z$ statistic: $Z = \frac{T - \mu_T}{\sigma_T} \sim N(0,1)$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\mu = \mu_0$, when $\sigma^2$ is known ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $M = M_0$ (through Wilcoxon’s $W$) (when $n \ge 20$)
  ▪ $\mu_X - \mu_Y = \mu_0$, when $\sigma_X^2$ and $\sigma_Y^2$ are known ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\pi = \pi_0$ (when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$)
  ▪ $\pi_X - \pi_Y = \pi_0$ (when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$)
  ▪ $\rho_S = 0$, in Spearman correlation test (when $n \ge 20$)
  ▪ $M_1 = M_2$, when $n_1 \ge 10$ and $n_2 \ge 10$ in Mann-Whitney test

REVIEW OF DISTRIBUTIONS
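
As one instance of the $Z$ statistic, the test of a single proportion can be sketched as follows. The counts are hypothetical, the two-sided alternative is an assumption, and the function name is made up for this example; the condition check mirrors the $n\pi \ge 5$ and $n(1 - \pi) \ge 5$ requirement in the list.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, pi0):
    """Z statistic and two-sided p-value for H0: pi = pi0."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("need n*pi0 >= 5 and n*(1 - pi0) >= 5 for the normal approximation")
    p_hat = successes / n
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)  # standard deviation of p_hat under H0
    z = (p_hat - pi0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided (assumed alternative)
    return z, p_value


# Hypothetical sample: 60 successes in 100 trials, H0: pi = 0.5
z, p_value = proportion_z_test(successes=60, n=100, pi0=0.5)
print(round(z, 3), round(p_value, 4))
```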

slide-18
SLIDE 18

$t$ statistic: $t = \frac{T - \mu_T}{s_T} \sim t_{df}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\mu = \mu_0$, when $\sigma^2$ is unknown ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\mu_X - \mu_Y = \mu_0$, when $\sigma_X^2$ and $\sigma_Y^2$ are unknown ($n < 15$; $15 \le n < 30$; $n \ge 30$)
  ▪ $\beta = \beta_0$, in regression analysis
  ▪ $\rho = 0$, in Pearson correlation test

REVIEW OF DISTRIBUTIONS
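
For the first case in the list (one mean, variance unknown), the generic estimator $T$ is the sample mean and $s_T$ is its estimated standard error. A minimal sketch with hypothetical data and a hypothetical function name:

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0 (sigma unknown)."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # s / sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical sample, not taken from the slides
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_calc, df = one_sample_t(sample, mu0=5.0)
print(round(t_calc, 3), df)
```

The resulting value is then compared with the critical value from the $t$ table for the given df.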

slide-19
SLIDE 19

$\chi^2$ statistic: $\chi^2 = \frac{df \times S^2}{\sigma^2} \sim \chi^2_{df}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\sigma^2 = \sigma_0^2$, of a normal population
  ▪ $M_X = M_Y = \cdots = M_Z$, in Kruskal-Wallis test
  ▪ independence in contingency tables when $n_{\text{exp}} \ge 5$

REVIEW OF DISTRIBUTIONS
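
The formula at the top of this slide corresponds to the first case in the list, the test of one variance. A short sketch, with hypothetical data, a hypothetical hypothesized variance, and a made-up function name:

```python
import statistics


def variance_chi_square(values, sigma0_squared):
    """Chi-square statistic and degrees of freedom for H0: sigma^2 = sigma0^2."""
    df = len(values) - 1
    sample_variance = statistics.variance(values)  # S^2 with n - 1 in the denominator
    return df * sample_variance / sigma0_squared, df


# Hypothetical sample and hypothesized variance
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
chi2_calc, df = variance_chi_square(sample, sigma0_squared=0.25)
print(round(chi2_calc, 4), df)
```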

slide-20
SLIDE 20

$F$ statistic: $F = \frac{S_1^2}{S_2^2} \sim F_{df_1, df_2}$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\sigma_X^2 = \sigma_Y^2$, of two normal populations
  ▪ $\sigma_X^2 = \sigma_Y^2 = \cdots = \sigma_Z^2$, with Levene’s test
  ▪ overall fit in regression analysis
  ▪ $\mu_X = \mu_Y = \cdots = \mu_Z$, in ANOVA

REVIEW OF DISTRIBUTIONS
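
For the comparison of two variances, the $F$ statistic is the ratio of the two sample variances, with one df value per sample. A sketch under the assumption that the first sample goes in the numerator; the data and the function name are hypothetical.

```python
import statistics


def variance_ratio(sample_1, sample_2):
    """F statistic with its numerator and denominator degrees of freedom."""
    f_calc = statistics.variance(sample_1) / statistics.variance(sample_2)
    return f_calc, len(sample_1) - 1, len(sample_2) - 1


# Hypothetical samples from two groups
group_x = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
group_y = [4.0, 6.0, 5.0, 7.0, 3.0]
f_calc, df_1, df_2 = variance_ratio(group_x, group_y)
print(round(f_calc, 4), df_1, df_2)
```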

slide-21
SLIDE 21

binomial statistic: $X \sim \text{bin}(n, \pi)$
▪ Used for testing the following parameters/hypotheses:
  ▪ $\pi = \pi_0$, in a repeated Bernoulli experiment
  ▪ $M = M_0$, in the sign test

REVIEW OF DISTRIBUTIONS
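
In the sign test, the number of observations above the hypothesized median follows bin($n$, 0.5) under $H_0$, so an exact $p$-value is a binomial tail probability. The sketch below assumes a one-sided alternative (median larger than $M_0$); the counts and the function name are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, pi):
    """P(X >= k) for X ~ bin(n, pi)."""
    return sum(comb(n, i) * pi ** i * (1 - pi) ** (n - i) for i in range(k, n + 1))


# Hypothetical outcome: 8 of 10 observations lie above the hypothesized median
n_observations = 10
n_above = 8
p_value = binomial_upper_tail(n_above, n_observations, 0.5)
print(round(p_value, 4))
```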

slide-22
SLIDE 22

26 March 2015, Q1i

OLD EXAM QUESTION

slide-23
SLIDE 23

Doane & Seward 5/E ▪ missing
Tutorial exercises ▪ week 6

FURTHER STUDY