Bus 701: Advanced Statistics
Harald Schmidbauer
© Harald Schmidbauer & Angi Rösch, 2007
Chapter 12: Correlation
12.1 Introduction
Assumptions and the
Example: The expenditure (in euros) of 508 customers for certain groups of goods at a supermarket was recorded. Among the recorded variables: expenditure for . . .
What is the relation between these variables, if there is any? Scatterplots will give us a first insight.
Expenditure for bread and cheese.
[Scatterplot; axes: bread, cheese]
(Shown: only those customers who actually bought both groups.)
Expenditure for dairy products and cheese.
[Scatterplot; axes: dairy, cheese]
(Shown: only those customers who actually bought both groups.)
Expenditure for tea/coffee and fruit.
[Scatterplot; axes: tea & coffee, fruit]
(Shown: only those customers who actually bought both groups.)
Example: Weekly returns on stock indices DAX (gdaxi) and CAC 40 (fchi).
[Time series plot, 2004 to 2006: weekly return on DAX (black) and CAC 40 (red)]
There is obviously a close association between DAX and CAC 40. But to investigate this, another display is more useful.
Using a scatterplot.
[Scatterplot; axes: return on DAX, return on CAC 40]
The scatterplot reveals the high correlation between returns on DAX and returns on CAC 40.
Defining the covariance.
[Figure: a point $(x_i, y_i)$, the quadrants I, II, III, IV, and a rectangle with side $x_i - \bar{x}$]
Area: $(x_i - \bar{x})(y_i - \bar{y})$
The covariance is defined as the average size of all rectangles:
$\mathrm{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
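The definition of the covariance as an average of signed rectangle areas can be sketched in a few lines of Python. The function name `covariance` and the toy expenditure values are illustrative assumptions, not data from the supermarket example; the divisor is n, as in the definition.

```python
def covariance(xs, ys):
    """Average of the signed rectangle areas (x_i - mean_x) * (y_i - mean_y), divisor n."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical toy data (euros), not taken from the 508 customers.
bread = [1.0, 2.0, 3.0, 4.0]
cheese = [2.0, 4.0, 5.0, 9.0]

print(covariance(bread, cheese))  # 2.75
```

Note that many libraries divide by n - 1 instead of n by default, so their results differ slightly from this definition.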
Interpreting the covariance.
[Figure: a point $(x_i, y_i)$, the quadrants I, II, III, IV, and a rectangle with side $x_i - \bar{x}$]
In I and III: $(x_i - \bar{x})(y_i - \bar{y}) > 0$
In II and IV: $(x_i - \bar{x})(y_i - \bar{y}) < 0$
If the points $(x_i, y_i)$ are predominantly in quadrants . . .
. . . I and III: $\mathrm{cov}(X, Y) > 0$
. . . II and IV: $\mathrm{cov}(X, Y) < 0$
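The quadrant argument can be made concrete by splitting the covariance into the part contributed by points with positive rectangle area (quadrants I and III) and the part contributed by points with negative area (II and IV). This is only a sketch; the function name `quadrant_contributions` and the data are hypothetical.

```python
def quadrant_contributions(xs, ys):
    """Split cov(X, Y) into the contribution of quadrants I/III and of quadrants II/IV."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    positive = 0.0  # rectangles from quadrants I and III
    negative = 0.0  # rectangles from quadrants II and IV
    for x, y in zip(xs, ys):
        area = (x - mean_x) * (y - mean_y)
        if area > 0:
            positive += area
        elif area < 0:
            negative += area
    return positive / n, negative / n


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 3.0, 6.0]

pos, neg = quadrant_contributions(xs, ys)
print(pos, neg, pos + neg)  # 1.5 -0.25 1.25
```

Here the positive rectangles dominate, so the covariance (the sum of both parts) is positive.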
Some properties of the covariance.
are associated.
in particular: The covariance depends on the unit of measurement, which sometimes makes it difficult to use. This is why we often prefer to investigate the relationship between two variables using the correlation rather than the covariance.
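To illustrate the dependence on the unit of measurement, the sketch below computes the covariance of the same hypothetical expenditures once in euros and once in cents. The data and the helper name `covariance` are assumptions made for illustration.

```python
def covariance(xs, ys):
    """Covariance with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical expenditures in euros.
bread_eur = [1.0, 2.0, 3.0, 4.0]
cheese_eur = [2.0, 4.0, 5.0, 9.0]

# The same expenditures expressed in cents.
bread_ct = [100 * v for v in bread_eur]
cheese_ct = [100 * v for v in cheese_eur]

cov_eur = covariance(bread_eur, cheese_eur)
cov_ct = covariance(bread_ct, cheese_ct)

# Both variables were rescaled by 100, so the covariance grows by 100 * 100.
print(cov_eur, cov_ct)
```

The association between the two variables is unchanged, yet the number describing it is 10,000 times larger.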
Some properties of the correlation.
are associated.
r = 1 (r = −1) exactly when all points (xi, yi) are on a straight line with positive (negative) slope.
in particular: The correlation does not depend on the unit of measurement.
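A minimal sketch of the unit-independence property, assuming the usual Pearson definition r = cov(X, Y) / (s_X s_Y), where s_X and s_Y are the standard deviations computed with divisor n. The function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expenditures in euros, and the same values in cents.
bread_eur = [1.0, 2.0, 3.0, 4.0]
cheese_eur = [2.0, 4.0, 5.0, 9.0]
bread_ct = [100 * v for v in bread_eur]
cheese_ct = [100 * v for v in cheese_eur]

r_eur = pearson_r(bread_eur, cheese_eur)
r_ct = pearson_r(bread_ct, cheese_ct)
print(r_eur, r_ct)  # the two values agree up to rounding error
```

The scale factors cancel because they appear in both the covariance and the standard deviations.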
Correlation patterns I: r > 0, i.e. the linear relation between X and Y is positive
r large (r ≈ 0.95): [scatterplot; axes: X, Y]
r smaller (r ≈ 0.75): [scatterplot; axes: X, Y]
Correlation patterns II: r < 0, i.e. the linear relation between X and Y is negative
|r| large (r ≈ −0.95): [scatterplot; axes: X, Y]
|r| smaller (r ≈ −0.55): [scatterplot; axes: X, Y]
Correlation patterns III: r close to 0, with no apparent relation between X and Y
$s_Y^2$ small; r ≈ −0.14: [scatterplot; axes: X, Y]
$s_Y^2$ larger; r ≈ −0.04: [scatterplot; axes: X, Y]
Correlation patterns IV: r not meaningful because there is a nonlinear relation between X and Y
formally, r ≈ 0: [scatterplot; axes: X, Y]
formally, r ≈ 0: [scatterplot; axes: X, Y]
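A small constructed case shows how a perfect but nonlinear relation can yield r = 0. The sketch assumes the usual Pearson definition of r; the data (Y = X² on values symmetric around zero) and the function name `pearson_r` are hypothetical and are not the data behind the plots.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation with divisor n in covariance and variances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


# Y is completely determined by X, but the relation is a parabola, not a line.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

The positive and negative rectangles cancel exactly, so r says nothing about the strength of this relation.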
Expenditure for bread and cheese.
[Scatterplot; axes: bread, cheese]
r = 0.41
Moderate positive correlation.
Expenditure for dairy products and cheese.
[Scatterplot; axes: dairy, cheese]
r = 0.05
Practically uncorrelated.
Expenditure for tea/coffee and fruit.
[Scatterplot; axes: tea & coffee, fruit]
r = −0.13
The correlation is negative, but this has no meaningful interpretation.
Weekly returns on stock indices DAX and CAC 40.
[Scatterplot; axes: return on DAX, return on CAC 40]
r = 0.925
Returns on DAX and CAC 40 are highly correlated. Here, the correlation is very useful.
Another application of correlation.
variables X, Y .
(Xt) = X1, X2, X3, . . .
series (Xt).
time series.
Example: Monthly car sales in Turkey.
[Scatterplot; axes: X_t, X_{t+1}]
r = 0.76 This high autocorrelation can be used for forecasting purposes.
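The correlation between X_t and X_{t+1} can be sketched by pairing each observation with its successor and applying the usual Pearson formula to those pairs. The function names and the monthly figures below are hypothetical; they are not the Turkish car sales data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation with divisor n in covariance and variances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


def lag1_autocorrelation(series):
    """Correlation of the pairs (X_t, X_{t+1}) formed from consecutive observations."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    current = series[:-1]    # X_1, ..., X_{n-1}
    following = series[1:]   # X_2, ..., X_n
    return pearson_r(current, following)


# Hypothetical monthly sales figures.
sales = [21000.0, 24000.0, 30000.0, 28000.0, 35000.0, 41000.0, 39000.0, 47000.0]
print(lag1_autocorrelation(sales))
```

A value close to 1 means that a month with high sales tends to be followed by another month with high sales, which is what makes the previous value informative for forecasting.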