SLIDE 1
: ESTIMATES AND TESTS
Business Statistics
CONTENTS
▪ The correlation coefficient
▪ The rank correlation coefficient
▪ Testing the correlation coefficient
▪ Non-linear relationships
▪ Old exam question
▪ Further study
THE CORRELATION COEFFICIENT
SLIDE 2
SLIDE 3
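Correlation coefficient
▪ $r_{X,Y} = \dfrac{s_{X,Y}}{s_X\, s_Y}$, where $s_{X,Y}$ is the sample covariance and $s_X$, $s_Y$ are the sample standard deviations
Or written in full
▪ $r_{X,Y} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Or using the sums-of-squares notation
▪ $r_{X,Y} = \dfrac{SS_{X,Y}}{\sqrt{SS_{X,X}\, SS_{Y,Y}}}$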
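The following minimal Python sketch checks that the three forms above agree. It uses only the standard library. The helper names and the five data pairs are hypothetical and serve only as an illustration. The sums-of-squares version uses the computational shortcut $SS_{X,Y} = \sum x_i y_i - n\bar{x}\bar{y}$, which is algebraically equal to $\sum (x_i - \bar{x})(y_i - \bar{y})$.

```python
import math


def mean(values):
    return sum(values) / len(values)


def r_covariance_form(x, y):
    """r = s_XY / (s_X * s_Y), with sample (n - 1) divisors."""
    n = len(x)
    mx, my = mean(x), mean(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


def r_full_form(x, y):
    """r written out in full with deviations from the means."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x)
                    * sum((yi - my) ** 2 for yi in y))
    return num / den


def r_ss_form(x, y):
    """r = SS_XY / sqrt(SS_XX * SS_YY), using the computational shortcuts."""
    n = len(x)
    mx, my = mean(x), mean(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * mx * my
    ss_xx = sum(xi * xi for xi in x) - n * mx * mx
    ss_yy = sum(yi * yi for yi in y) - n * my * my
    return ss_xy / math.sqrt(ss_xx * ss_yy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(r_covariance_form(x, y), r_full_form(x, y), r_ss_form(x, y))
```

For these pairs, all three forms give $6/\sqrt{60} = \sqrt{0.6} \approx 0.7746$. The factor $n-1$ in the covariance form cancels between numerator and denominator, which is why the full form does not need it.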
SLIDE 4
Correlation coefficient
▪ for two related numerical variables (paired data: $(X, Y) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$)
▪ $-1 \le r \le 1$
▪ indicator of linear association between two numerical variables
Alternative names:
▪ Pearson correlation coefficient
▪ Pearson product-moment correlation coefficient
  ▪ named after Karl Pearson, 1857-1936
SLIDE 5
Scatter plots showing various situations
SLIDE 6
Some points to observe
▪ There is only a "sign" relation between the correlation coefficient and the slope of the regression line: both have the same sign, but the size of $r$ says nothing about the size of the slope
  ▪ if $r = 1$, the points fall on a straight line with slope $> 0$
  ▪ if $r = -1$, the points fall on a straight line with slope $< 0$
▪ Interchanging $X$ and $Y$ will not change the correlation coefficient
  ▪ so $r_{X,Y} = r_{Y,X}$
▪ Rescaling $X$ or $Y$ (multiplying by a positive constant and/or adding a constant) will not change the correlation coefficient
  ▪ in particular, $r$ is not sensitive to changes in units of $X$ or $Y$
  ▪ because for correlation, the variables are standardized
▪ Correlation coefficients are sensitive to outliers
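The points above can be checked numerically. This is a minimal sketch with made-up data and hypothetical helper names (`pearson_r`, `ols_slope`). It shows four things:
▪ swapping $X$ and $Y$ leaves $r$ unchanged
▪ a positive linear rescaling of $X$ leaves $r$ unchanged
▪ the least-squares slope changes with the units but keeps the sign of $r$
▪ a single outlier can flip the sign of $r$

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical data
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
r = pearson_r(x, y)

# Interchanging X and Y: same r
print(r, pearson_r(y, x))

# Positive linear rescaling of X (multiply by 100, add 7): same r,
# while the regression slope is divided by 100
x_rescaled = [100 * v + 7 for v in x]
print(pearson_r(x_rescaled, y), ols_slope(x, y), ols_slope(x_rescaled, y))

# A negative factor flips the sign of r
print(pearson_r([-v for v in x], y))

# One outlier added at (7, -10) pulls r from strongly positive to negative
print(pearson_r(x + [7.0], y + [-10.0]))
```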
SLIDE 7
Note:
▪ correlation does not imply causation
  ▪ sources: tylervigen.com and forbes.com
SLIDE 8
SLIDE 9
EXERCISE 1
Which figure has a larger correlation coefficient?
SLIDE 10
TESTING THE CORRELATION COEFFICIENT
Can we do a hypothesis test on the correlation coefficient?
First acknowledge:
▪ $r$ is the correlation coefficient of the sample (the $n$ observed pairs)
▪ $\rho$ is the correlation coefficient of the bivariate population
  ▪ so the null hypothesis would be $H_0: \rho = 0$ or $H_0: \rho \ge 0.3$, etc.
  ▪ never $H_0: r = 0$ or similar!
SLIDE 11
And the null distribution?
▪ It is known that
  $$\frac{r}{\sqrt{(1-r^2)/(n-2)}} \sim t_{n-2}$$
Important limitation:
▪ The distribution of the test statistic $\frac{r}{\sqrt{(1-r^2)/(n-2)}}$ is only valid for $\rho = 0$ (crucial in step 3)
  ▪ so we can only test $H_0: \rho = 0$
  ▪ fortunately, that is by far the most interesting hypothesis
Test of $H_0: \rho = \rho_0 \neq 0$: Google "Fisher transformation"; not in this course
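Multiplying the numerator and denominator of the test statistic by $\sqrt{n-2}$ gives an equivalent form that is often easier to evaluate by hand. The numbers in the second line ($r = 0.5$ from $n = 27$ pairs) are hypothetical and only illustrate the arithmetic.

```latex
t = \frac{r}{\sqrt{(1-r^2)/(n-2)}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}

t = \frac{0.5\,\sqrt{25}}{\sqrt{1-0.25}} = \frac{2.5}{0.866} \approx 2.89 > t_{25;\,0.975} \approx 2.060
```

With these hypothetical values, $H_0: \rho = 0$ would be rejected at $\alpha = 0.05$ (two-sided).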
SLIDE 12
▪ Step 1:
  ▪ $H_0: \rho = 0$; $H_1: \rho \neq 0$; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic: $r$; reject for "too small" and "too large" values
▪ Step 3:
  ▪ if $H_0$ is true,
    $$\frac{r}{\sqrt{(1-r^2)/(n-2)}} \sim t_{n-2}$$
  ▪ normally distributed populations needed (bivariate normal)
▪ Step 4:
  ▪ as usual (insert the calculated value of $r$ into the test statistic)
▪ Step 5:
  ▪ as usual
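Steps 1 to 5 can be sketched in Python using only the standard library. This illustration rests on a few assumptions:
▪ the helper names are hypothetical
▪ the data are made up
▪ the critical value $t_{n-2;\,0.975}$ is passed in from a t table rather than computed (with SciPy one could use `scipy.stats.t.ppf`, but that is not needed here)

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_test(x, y, t_crit):
    """Two-sided test of H0: rho = 0 against H1: rho != 0.

    Step 1 (H0, H1, alpha) is fixed by the caller through t_crit: the upper
    alpha/2 critical value of t with n - 2 degrees of freedom, from a t table.
    """
    n = len(x)
    r = pearson_r(x, y)                           # step 2: sample statistic
    if r * r >= 1:                                # all points on a line
        t = math.copysign(math.inf, r)
    else:
        t = r / math.sqrt((1 - r * r) / (n - 2))  # step 3: ~ t_{n-2} under H0
    reject = abs(t) > t_crit                      # steps 4-5: compare, decide
    return r, t, reject


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data, n = 5
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r, t, reject = correlation_t_test(x, y, t_crit=3.182)  # t_{3; 0.975}
print(round(r, 3), round(t, 3), reject)
```

For these five pairs, $r = \sqrt{0.6} \approx 0.775$ and $t = \sqrt{4.5} \approx 2.12$. Because $t$ is below $t_{3;\,0.975} = 3.182$, $H_0: \rho = 0$ is not rejected at $\alpha = 0.05$. With only $n = 5$ pairs, even a fairly large $r$ is not significant.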
SLIDE 13
What is the meaning of rejecting $H_0: \rho = 0$?
Conclude: there is a significant linear correlation between $X$ and $Y$
▪ meaning: the correlation is not 0
▪ do not conclude: $X$ causes $Y$ (or $Y$ causes $X$)
▪ do not conclude: $X$ has a large influence on $Y$ (or the other way around)
There is an important difference between a correlation coefficient and a regression coefficient
▪ we will come back to this soon
SLIDE 14
NON-LINEAR RELATIONSHIPS
What to do in case of a non-linear relation?
▪ Two suggestions:
  ▪ transform, e.g. $\log X$ vs. $\log Y$
  ▪ use ranked data
SLIDE 15
Suggestion 1: log transformation
▪ life expectancy vs. GDP/capita: $r = 0.646$
▪ → life expectancy vs. log(GDP/capita): $r = 0.774$
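Here is a minimal sketch of Suggestion 1 on synthetic, noise-free data (not the life-expectancy data from the slide). When $y$ depends linearly on $\log x$, taking the log of $x$ turns a curved pattern into a straight line and raises $r$ to 1. The helper name and the numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, noise-free data: y increases linearly in log(x)
x = [500, 1000, 2000, 5000, 10000, 20000, 50000]
y = [50 + 5 * math.log(v) for v in x]

r_raw = pearson_r(x, y)                         # curved relation: r < 1
r_log = pearson_r([math.log(v) for v in x], y)  # straight line: r = 1
print(round(r_raw, 3), round(r_log, 3))
```

Changing the base of the logarithm is a positive rescaling, so the choice of base does not affect $r$.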
SLIDE 16
▪ Note: zero linear correlation does not exclude a strong non-linear relation
  ▪ e.g., quadratic
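Here is a quick numerical illustration with hypothetical data. On $x$-values symmetric around 0, $y = x^2$ is a perfect (non-linear) relation, yet $r = 0$. The helper name is hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # symmetric around 0
y = [v * v for v in x]                       # exact quadratic relation
print(pearson_r(x, y))
```

For each $x_i$, the term $(x_i - \bar{x})(y_i - \bar{y})$ is cancelled by the corresponding term for $-x_i$. The numerator of $r$ is therefore exactly zero.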
SLIDE 17
Suggestion 2: with ranked data
▪ replace the data ($X$ and $Y$) by their ranks (→ $R_X$ and $R_Y$)
▪ compute the (Pearson) correlation coefficient of $R_X$ and $R_Y$:
  $$r_s(X, Y) = r(R_X, R_Y)$$
▪ This is the rank correlation coefficient $r_s$
  ▪ also called the Spearman correlation coefficient
  ▪ after Charles Spearman, 1863-1945
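Suggestion 2 can be written as a minimal Python sketch: rank both variables, then apply the ordinary Pearson formula to the ranks. Tied values get the average of the ranks they occupy. This is a common convention and an assumption here, since the slide does not discuss ties. The function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Rank correlation: Pearson r applied to the ranks R_X and R_Y."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]       # hypothetical data
y = [v ** 3 for v in x]      # monotone but non-linear
print(pearson_r(x, y), spearman_r(x, y))
```

For this cubic relation, Pearson's $r$ is below 1 (about 0.94). Because the relation is perfectly monotone, $r_s = 1$.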
SLIDE 18
Of course, many properties of $r$ also hold for $r_s$:
▪ $-1 \le r_s \le 1$
▪ If $r_s > 0$, increasing (decreasing) $x$-values tend to be accompanied by increasing (decreasing) $y$-values
▪ If $r_s < 0$, increasing (decreasing) $x$-values tend to be accompanied by decreasing (increasing) $y$-values
SLIDE 19
Example
▪ life expectancy vs. GDP/capita
▪ or life expectancy vs. log GDP/capita
  ▪ $r_s = 0.828$
  ▪ for both (obviously! log is strictly increasing, so it does not change any rank)
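Why "obviously"? The small sketch below uses hypothetical data; the simple `ranks` helper is hypothetical too and assumes there are no ties. The log transform changes Pearson's $r$, but it leaves every rank, and hence $r_s$, exactly the same.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(x, y):
    return pearson_r(ranks(x), ranks(y))


gdp = [800, 1500, 3000, 9000, 20000, 45000]  # hypothetical GDP/capita
life = [52, 58, 57, 70, 73, 71]              # hypothetical life expectancy
log_gdp = [math.log(v) for v in gdp]

print(pearson_r(gdp, life), pearson_r(log_gdp, life))    # differ
print(spearman_r(gdp, life), spearman_r(log_gdp, life))  # identical
```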
SLIDE 20
Can we also test hypotheses for the rank correlation coefficient?
▪ i.e. $H_0: \rho_s = 0$
We can use a test similar to, but slightly different from, the one for $r$
▪ i.e., under $H_0$, approximately
  $$\frac{r_s}{1/\sqrt{n-1}} \sim N(0, 1)$$
▪ which requires $n \ge 20$, but not normality of $X$ and $Y$
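The large-sample test can be sketched with the standard library. This sketch rests on three assumptions:
▪ the function name and the input $r_s = 0.5$, $n = 26$ are hypothetical
▪ the two-sided p-value uses $2(1 - \Phi(|z|)) = \operatorname{erfc}(|z|/\sqrt{2})$
▪ $H_0$ is rejected at $\alpha = 0.05$ when $|z| > 1.96$

```python
import math


def spearman_z_test(r_s, n, z_crit=1.96):
    """Large-sample two-sided test of H0: rho_s = 0.

    Uses r_s / (1 / sqrt(n - 1)) ~ N(0, 1) under H0, which needs n >= 20;
    z_crit = 1.96 corresponds to alpha = 0.05.
    """
    if n < 20:
        raise ValueError("normal approximation requires n >= 20")
    z = r_s / (1 / math.sqrt(n - 1))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_two_sided, abs(z) > z_crit


z, p, reject = spearman_z_test(0.5, 26)  # hypothetical r_s and n
print(z, round(p, 4), reject)
```

Here $z = 0.5\sqrt{25} = 2.5$ and $p \approx 0.012$, so $H_0: \rho_s = 0$ would be rejected at $\alpha = 0.05$.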
SLIDE 21
OLD EXAM QUESTION
21 May 2015, Q1k-l
SLIDE 22