Business Statistics CONTENTS The correlation coefficient The rank - - PowerPoint PPT Presentation




slide-1
SLIDE 1

๐œ: ESTIMATES AND TESTS

Business Statistics

slide-2
SLIDE 2

▪ The correlation coefficient
▪ The rank correlation coefficient
▪ Testing the correlation coefficient
▪ Non-linear relationships
▪ Old exam question
▪ Further study

CONTENTS

slide-3
SLIDE 3

Correlation coefficient
▪ $r_{X,Y} = \dfrac{s_{X,Y}}{s_X s_Y}$

Or written in full
▪ $r_{X,Y} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

Or using the sums-of-squares notation
▪ $r_{X,Y} = \dfrac{SS_{X,Y}}{\sqrt{SS_{X,X} \, SS_{Y,Y}}}$

THE CORRELATION COEFFICIENT
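The two ways of writing the coefficient can be checked against each other numerically. The following is a minimal sketch, not part of the original slides: the function names and the five data pairs are made up for illustration.

```python
import math

def pearson_r_sums_of_squares(x, y):
    """r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def pearson_r_covariance(x, y):
    """r = s_xy / (s_x * s_y), using sample (n - 1) denominators."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Made-up paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

r_ss = pearson_r_sums_of_squares(x, y)
r_cov = pearson_r_covariance(x, y)
print(r_ss, r_cov)  # both forms give r = 0.9 (up to rounding)
```

The factors $n - 1$ cancel between numerator and denominator, which is why the covariance form and the sums-of-squares form agree.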

slide-4
SLIDE 4

Correlation coefficient
▪ for two related numerical variables (paired data: $(X, Y) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$)
▪ $-1 \le r \le 1$
▪ indicator of linear association between two numerical variables

Alternative names:
▪ Pearson correlation coefficient
▪ Pearson product-moment correlation coefficient
▪ named after Karl Pearson, 1857-1936

THE CORRELATION COEFFICIENT

slide-5
SLIDE 5

Scatter plots showing various situations

THE CORRELATION COEFFICIENT

slide-6
SLIDE 6

Some points to observe
▪ The correlation coefficient and the slope of the regression line share only their “sign”; the size of $r$ says nothing about the size of the slope
  ▪ if $r = 1$, the points fall on a straight line with slope > 0
  ▪ if $r = -1$, the points fall on a straight line with slope < 0
▪ Interchanging $X$ and $Y$ will not change the correlation coefficient
  ▪ so $r_{X,Y} = r_{Y,X}$
▪ Rescaling $X$ or $Y$ (by a positive factor) will not change the correlation coefficient
  ▪ in particular, $r$ is not sensitive to changes in units of $X$ or $Y$
  ▪ because for correlation, the variables are standardized
▪ Correlation coefficients are sensitive to outliers

THE CORRELATION COEFFICIENT
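The symmetry, rescaling, and outlier points can be tried out on a small data set. This is a sketch under stated assumptions: the helper `pearson_r`, the data, the factor 100 (standing in for a change of units), and the added outlier (20, 0) are all made up for illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient via sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Made-up paired data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]

r = pearson_r(x, y)

# Interchanging X and Y leaves r unchanged.
r_swapped = pearson_r(y, x)

# Changing the units of X (e.g. metres to centimetres) leaves r unchanged.
x_rescaled = [100.0 * xi for xi in x]
r_rescaled = pearson_r(x_rescaled, y)

# A single hypothetical outlier can change r drastically, here even its sign.
r_outlier = pearson_r(x + [20.0], y + [0.0])

print(r, r_swapped, r_rescaled, r_outlier)
```

In this example one extra point is enough to turn a strongly positive coefficient into a negative one, which is why a scatter plot should accompany any reported value of $r$.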

slide-7
SLIDE 7

Note:
▪ correlation does not imply causality
▪ sources: tylervigen.com and forbes.com

THE CORRELATION COEFFICIENT

slide-8
SLIDE 8

THE CORRELATION COEFFICIENT

slide-9
SLIDE 9

Which figure has a larger correlation coefficient?

EXERCISE 1

slide-10
SLIDE 10

Can we do a hypothesis test on the correlation coefficient?
First acknowledge:
▪ $r$ is the correlation coefficient of the two samples
▪ $\rho$ is the correlation coefficient of the bivariate population
▪ so the null hypothesis would be $H_0: \rho = 0$ or $H_0: \rho \ge 0.3$ etc.
▪ never $r = 0$ or so: hypotheses are statements about the population, not about the sample

TESTING THE CORRELATION COEFFICIENT

slide-11
SLIDE 11

And the null distribution?
▪ It is known that $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}} \sim t_{n-2}$

Important limitation:
▪ The distribution of the test statistic $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}}$ is only valid for $\rho = 0$ (crucial in step 3)
▪ so we can only test $H_0: \rho = 0$
▪ fortunately, that's by far the most interesting hypothesis
▪ Test of $\rho = \rho_0 \ne 0$: Google “Fisher transformation”; not in this course

TESTING THE CORRELATION COEFFICIENT

slide-12
SLIDE 12

▪ Step 1:
  ▪ $H_0: \rho = 0$; $H_1: \rho \ne 0$; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic: $R$; reject for “too small” and “too large” values
▪ Step 3:
  ▪ if $H_0$ is true, $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}} \sim t_{n-2}$
  ▪ normally distributed populations needed
▪ Step 4:
  ▪ as usual (insert $r$ for calculated value of $R$)
▪ Step 5:
  ▪ as usual

TESTING THE CORRELATION COEFFICIENT
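Steps 3 to 5 can be sketched in a few lines. The numbers below are hypothetical: a sample correlation of 0.6 from 12 pairs, and the two-sided 5% critical value 2.228 for 10 degrees of freedom as read from a standard t table. The function name is likewise an assumption for illustration.

```python
import math

def correlation_t_statistic(r, n):
    """Test statistic r / sqrt((1 - r^2) / (n - 2)); valid under H0: rho = 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical sample result, for illustration only.
r = 0.6
n = 12

t_calc = correlation_t_statistic(r, n)

# Two-sided critical value for alpha = 0.05 and n - 2 = 10 degrees of
# freedom, taken from a t table.
t_crit = 2.228

reject_h0 = abs(t_calc) > t_crit
print(round(t_calc, 3), reject_h0)  # 2.372 True
```

With these numbers the calculated value falls just inside the rejection region, so $H_0: \rho = 0$ would be rejected at the 5% level.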

slide-13
SLIDE 13

What is the meaning of rejecting $H_0: \rho = 0$?
Conclude: there is a significant linear correlation between $X$ and $Y$
▪ meaning: the correlation is not 0
▪ do not conclude: $X$ causes $Y$ (or $Y$ causes $X$)
▪ do not conclude: $X$ has a large influence on $Y$ (or the other way around)
There is an important difference between a correlation coefficient and a regression coefficient
▪ we will come back to this soon

TESTING THE CORRELATION COEFFICIENT
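A worked example may help separate "significant" from "strong". The numbers are hypothetical: suppose a sample of $n = 1000$ pairs gives $r = 0.1$.

```latex
\[
t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}
  = \frac{0.1}{\sqrt{(1 - 0.01)/998}}
  \approx \frac{0.1}{0.0315}
  \approx 3.175
\]
```

With 998 degrees of freedom the two-sided 5% critical value is about 1.96, so $H_0: \rho = 0$ is rejected even though $r = 0.1$ describes only a weak linear association. Rejection says the correlation is not 0; it does not say the correlation is large.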

slide-14
SLIDE 14

What to do in case of a non-linear relation?
▪ Two suggestions:
  ▪ transform, e.g. $\log X$ vs $\log Y$
  ▪ use ranked data

NON-LINEAR RELATIONSHIPS

slide-15
SLIDE 15

Suggestion 1: log transformation
▪ life expectancy vs. GDP/capita → life expectancy vs. log(GDP/capita)
▪ $r = 0.646$ → $r = 0.774$

NON-LINEAR RELATIONSHIPS
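The effect of a log transformation can be reproduced on a toy data set. The figures below are made up and are not the life-expectancy data behind the slide; they only mimic a relation that flattens out as GDP per capita grows.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient via sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Made-up values: GDP per capita and life expectancy in years.
gdp_per_capita = [500, 1000, 2000, 4000, 8000, 16000, 32000, 64000]
life_expectancy = [52, 57, 61, 66, 70, 74, 78, 81]

r_raw = pearson_r(gdp_per_capita, life_expectancy)

# Same data, but with GDP per capita on a log scale.
log_gdp = [math.log(g) for g in gdp_per_capita]
r_log = pearson_r(log_gdp, life_expectancy)

print(round(r_raw, 3), round(r_log, 3))
```

On this toy data the coefficient rises after the transformation because the relation is much closer to a straight line on the log scale.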

slide-16
SLIDE 16

▪ Note: zero linear correlation does not exclude a strong non-linear relation
  ▪ e.g., quadratic

NON-LINEAR RELATIONSHIPS
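A quadratic relation on values placed symmetrically around zero makes this concrete. The data are constructed for illustration: $y$ is fully determined by $x$, yet the linear correlation is zero.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient via sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Constructed example: a perfect quadratic relation y = x^2.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # 0.0: no linear association, despite the exact relation
```

The positive and negative halves of the parabola cancel in the numerator $SS_{X,Y}$, so $r$ misses the relation entirely.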

slide-17
SLIDE 17

Suggestion 2: with ranked data
▪ replace data ($X$ and $Y$) by ranks (→ $X_r$ and $Y_r$)
▪ compute the (Pearson) correlation coefficient of $X_r$ and $Y_r$: $r_S(X, Y) = r(X_r, Y_r)$
▪ This is the rank correlation coefficient $r_S$
▪ Also called the Spearman correlation coefficient
  ▪ after Charles Spearman, 1863-1945

NON-LINEAR RELATIONSHIPS
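The recipe (rank both variables, then apply the Pearson formula to the ranks) can be sketched as follows. The helper names and the data are made up for illustration. Giving tied values the average of their ranks is a common convention assumed here; the slides do not say how ties are handled.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient via sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def average_ranks(values):
    """Ranks 1..n; tied values share the average of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman_r(x, y):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Constructed example: y = x^3 is increasing but not linear.
x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]

r = pearson_r(x, y)
r_s = spearman_r(x, y)
print(round(r, 3), r_s)
```

Because $y$ increases whenever $x$ does, the ranks agree perfectly and $r_S = 1$, while $r$ stays below 1 because the points do not lie on a straight line.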

slide-18
SLIDE 18

Of course many properties for $r$ also hold for $r_S$:
▪ $-1 \le r_S \le 1$
▪ If $r_S > 0$, increasing (decreasing) $x$-values tend to be accompanied by increasing (decreasing) $y$-values
▪ If $r_S < 0$, increasing (decreasing) $x$-values tend to be accompanied by decreasing (increasing) $y$-values

NON-LINEAR RELATIONSHIPS

slide-19
SLIDE 19

Example
▪ life expectancy vs. GDP/capita
▪ or life expectancy vs. log(GDP/capita)
▪ $r_S = 0.828$
  ▪ for both (obviously: the log is monotone increasing, so it leaves the ranks unchanged)

NON-LINEAR RELATIONSHIPS
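Why the rank correlation is identical with and without the log can be seen directly from the ranks. A minimal sketch with made-up GDP figures; the helper `ranks` assumes there are no ties.

```python
import math

def ranks(values):
    """Rank 1 for the smallest value; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

# Made-up GDP per capita values, for illustration only.
gdp_per_capita = [800, 45000, 3500, 12000, 60000, 1500]
log_gdp = [math.log(g) for g in gdp_per_capita]

ranks_raw = ranks(gdp_per_capita)
ranks_log = ranks(log_gdp)

same_ranks = ranks_raw == ranks_log
print(ranks_raw, same_ranks)  # [1, 5, 3, 4, 6, 2] True
```

Since $r_S$ is computed from the ranks only, any increasing transformation of $X$ or $Y$ leaves it unchanged.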

slide-20
SLIDE 20

Can we also test hypotheses for the rank correlation coefficient?
▪ i.e. $H_0: \rho_S = 0$
We can use a similar but slightly different test than for $\rho$
▪ i.e. $\dfrac{R_S}{\sqrt{1/(n - 1)}} \sim N(0, 1)$
▪ which requires $n \ge 20$, but not normality of $X$ and $Y$

NON-LINEAR RELATIONSHIPS
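The rank-correlation test can be sketched with the standard library only. The sample values are hypothetical: $r_S = 0.5$ from $n = 26$ pairs. The function name and the two-sided p-value computation are assumptions for illustration, not taken from the slides.

```python
import math
from statistics import NormalDist

def rank_correlation_z(r_s, n):
    """Test statistic r_S / sqrt(1 / (n - 1)); approximately N(0, 1) under H0."""
    if n < 20:
        raise ValueError("the normal approximation requires n >= 20")
    return r_s / math.sqrt(1 / (n - 1))

# Hypothetical sample result, for illustration only.
r_s = 0.5
n = 26

z_calc = rank_correlation_z(r_s, n)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))

reject_h0 = p_value < 0.05
print(round(z_calc, 3), round(p_value, 4), reject_h0)
```

Here the statistic equals $0.5 \cdot \sqrt{25} = 2.5$, which exceeds the two-sided 5% critical value of 1.96, so $H_0: \rho_S = 0$ would be rejected.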

slide-21
SLIDE 21

21 May 2015, Q1k-l

OLD EXAM QUESTION

slide-22
SLIDE 22

▪ Doane & Seward 5/E 12.1, 16.7
▪ Tutorial exercises week 5 hypothesis test

FURTHER STUDY