SLIDE 1

Bus 701: Advanced Statistics

Harald Schmidbauer

© Harald Schmidbauer & Angi Rösch, 2007
SLIDE 2

Chapter 12: Correlation

SLIDE 3

12.1 Introduction

Assumptions and the problem. In this chapter, we assume that observations $(x_i, y_i)$, $i = 1, \dots, n$, from a bivariate metric variable $(X, Y)$ are given. How can we measure the degree of linear dependence between $X$ and $Y$? Whatever the goal of our analysis is, the first step is usually to plot the data.

SLIDE 4

12.1 Introduction

Example: The expenditure (in euros) of 508 customers for certain groups of goods at a supermarket was recorded. Among the recorded variables was the expenditure for:

  • bread
  • cheese
  • dairy products
  • fruit
  • tea & coffee

What is the relation between these variables? Is there any? Scatterplots will provide a first insight.

SLIDE 5

12.1 Introduction

Expenditure for bread and cheese.

[Figure: scatterplot of customer expenditure; axes: bread, cheese.]

(Shown: only those customers who actually bought both groups.)

SLIDE 6

12.1 Introduction

Expenditure for dairy products and cheese.

[Figure: scatterplot of customer expenditure; axes: dairy, cheese.]

(Shown: only those customers who actually bought both groups.)

SLIDE 7

12.1 Introduction

Expenditure for tea/coffee and fruit.

[Figure: scatterplot of customer expenditure; axes: tea & coffee, fruit.]

(Shown: only those customers who actually bought both groups.)

SLIDE 8

12.1 Introduction

Example: Weekly returns on stock indices DAX (gdaxi) and CAC 40 (fchi).

[Figure: time series of weekly returns on DAX (black) and CAC 40 (red), 2004 to 2006.]

There is obviously a close association between DAX and CAC 40. But to investigate this, another display is more useful.

SLIDE 9

12.1 Introduction

Using a scatterplot.

[Figure: scatterplot; axes: return on DAX, return on CAC 40.]

The scatterplot reveals the high correlation between returns on DAX and returns on CAC 40.

SLIDE 10

12.2 Covariance

Defining the covariance.

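[Figure: the point $(x_i, y_i)$ in the $x$-$y$ plane, which is divided into quadrants I, II, III, IV; a rectangle with side lengths $x_i - \bar{x}$ and $y_i - \bar{y}$ is marked.]

Area: $(x_i - \bar{x})(y_i - \bar{y})$

The covariance is defined as the average size of all rectangles:

$$\operatorname{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$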
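The definition can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the function name `covariance` and the sample data are assumptions made up for the example. It divides by n, as in the definition here, rather than by n − 1.

```python
def covariance(xs, ys):
    """Average signed rectangle area: (1/n) * sum((x_i - x_bar) * (y_i - y_bar))."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical observations, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
print(covariance(x, y))
```

Most of these points lie in quadrants I and III relative to the means, so the result is positive.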

SLIDE 11

12.2 Covariance

Interpreting the covariance.

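[Figure: the point $(x_i, y_i)$ in the $x$-$y$ plane, which is divided into quadrants I, II, III, IV; a rectangle with side lengths $x_i - \bar{x}$ and $y_i - \bar{y}$ is marked.]

In I and III: $(x_i - \bar{x})(y_i - \bar{y}) > 0$

In II and IV: $(x_i - \bar{x})(y_i - \bar{y}) < 0$

If the points $(x_i, y_i)$ are predominantly in quadrants

  • I and III: $\operatorname{cov}(X, Y) > 0$
  • II and IV: $\operatorname{cov}(X, Y) < 0$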

SLIDE 12

12.2 Covariance

Some properties of the covariance.

  • The sign of $\operatorname{cov}(X, Y)$ tells us in which direction $X$ and $Y$ are associated.
  • The covariance is symmetric: $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)$.
  • It holds that $\operatorname{cov}(aX + b, Y) = a \cdot \operatorname{cov}(X, Y)$; in particular, the covariance depends on the unit of measurement. This sometimes makes it difficult to use.

This is why we often prefer to investigate the relationship between two variables using the correlation rather than the covariance.
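The dependence on the unit of measurement can be checked numerically. The following is a small sketch under assumptions: the helper `covariance` and the expenditure values are hypothetical and not taken from the supermarket data. Converting one variable from euros to cents (a = 100, b = 0) multiplies the covariance by 100.

```python
import math


def covariance(xs, ys):
    """Covariance with divisor n, as defined in this chapter."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical expenditures in euros, made up for illustration only.
euros_x = [1.5, 2.0, 3.5, 4.0]
euros_y = [2.0, 2.5, 3.0, 5.5]

a, b = 100.0, 0.0  # change of unit: euros -> cents
cents_x = [a * x + b for x in euros_x]

lhs = covariance(cents_x, euros_y)      # cov(aX + b, Y)
rhs = a * covariance(euros_x, euros_y)  # a * cov(X, Y)
print(lhs, rhs, math.isclose(lhs, rhs))
```

The association between the two variables has not changed, yet the covariance is 100 times larger, which is what makes it hard to interpret on its own.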

SLIDE 13

12.3 Correlation

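Definition: The correlation of $X$ and $Y$ is defined as

$$r = \operatorname{cor}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X) \cdot \operatorname{var}(Y)}}$$

It has the same sign as the covariance.

Reminder:

$$\operatorname{var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$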
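As a minimal sketch, the definition translates directly into Python. The function names and the data below are assumptions chosen for illustration; they do not come from the slides. The variance is computed as cov(X, X), which agrees with the reminder formula.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Covariance with divisor n."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def variance(xs):
    """var(X) = cov(X, X) = (1/n) * sum((x_i - x_bar)^2)."""
    return covariance(xs, xs)


def correlation(xs, ys):
    """r = cov(X, Y) / sqrt(var(X) * var(Y))."""
    denom = math.sqrt(variance(xs) * variance(ys))
    if denom == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance(xs, ys) / denom


# Hypothetical observations, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
r = correlation(x, y)
print(round(r, 3))
```

The explicit check for a zero denominator reflects that r is not defined when one of the variables does not vary at all.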

SLIDE 14

12.3 Correlation

Some properties of the correlation.

  • The sign of $\operatorname{cor}(X, Y)$ tells us in which direction $X$ and $Y$ are associated.
  • The correlation is normed: $-1 \le \operatorname{cor}(X, Y) \le +1$.
  • It holds that $\operatorname{cor}(X, Y) = \pm 1$ if and only if all points $(x_i, y_i)$ are on a straight line with positive (negative) slope.
  • The correlation is symmetric: $\operatorname{cor}(X, Y) = \operatorname{cor}(Y, X)$.
  • It holds that $\operatorname{cor}(aX + b, Y) = \operatorname{cor}(X, Y)$ for $a > 0$; in particular, the correlation does not depend on the unit of measurement.
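These properties can be illustrated numerically. The sketch below uses a hypothetical helper `correlation` and made-up data, neither of which is from the slides. It also shows the case a < 0, which the list above does not cover: a negative factor flips the sign of r.

```python
import math


def correlation(xs, ys):
    """r = cov(X, Y) / sqrt(var(X) * var(Y)); the 1/n factors cancel."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
r_rescaled = correlation([100 * v + 3 for v in x], y)  # a > 0: r unchanged
r_flipped = correlation([-2 * v for v in x], y)        # a < 0: sign of r flips
r_line_up = correlation(x, [3 * v + 1 for v in x])     # exact line, positive slope
r_line_down = correlation(x, [-3 * v + 1 for v in x])  # exact line, negative slope

print(r, r_rescaled, r_flipped, r_line_up, r_line_down)
```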

SLIDE 15

12.3 Correlation

Correlation patterns I: r > 0, i.e. the linear relation between X and Y is positive

r large (r ≈ 0.95):

[Figure: scatterplot; axes: X, Y.]

r smaller (r ≈ 0.75):

[Figure: scatterplot; axes: X, Y.]

SLIDE 16

12.3 Correlation

Correlation patterns II: r < 0, i.e. the linear relation between X and Y is negative

|r| large (r ≈ −0.95):

[Figure: scatterplot; axes: X, Y.]

|r| smaller (r ≈ −0.55):

[Figure: scatterplot; axes: X, Y.]

SLIDE 17

12.3 Correlation

Correlation patterns III: r close to 0, with no apparent relation between X and Y

$s_Y^2$ small; r ≈ −0.14:

[Figure: scatterplot; axes: X, Y.]

$s_Y^2$ larger; r ≈ −0.04:

[Figure: scatterplot; axes: X, Y.]

SLIDE 18

12.3 Correlation

Correlation patterns IV: r not meaningful because there is a nonlinear relation between X and Y

formally, r ≈ 0:

[Figure: scatterplot; axes: X, Y.]

formally, r ≈ 0:

[Figure: scatterplot; axes: X, Y.]

SLIDE 19

12.3 Correlation

Uncorrelated and independent are not the same.

  • Two variables are called uncorrelated if $\operatorname{cor}(X, Y) = 0$.
  • The last two figures show that being uncorrelated is a relatively weak property: There can be a strong non-linear relationship between uncorrelated variables.
  • Being independent is much stronger: Independent variables have no relation whatsoever.
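A small constructed example makes the difference concrete. In the sketch below (the helper `correlation` and the data are assumptions for illustration, not from the slides), Y is completely determined by X through Y = X², yet the correlation is zero because the x values are symmetric around 0.

```python
import math


def correlation(xs, ys):
    """r = cov(X, Y) / sqrt(var(X) * var(Y)); the 1/n factors cancel."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Constructed data: y depends on x exactly, but not linearly.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]

r = correlation(x, y)
print(r)  # zero: uncorrelated, although y is a function of x
```

The positive and negative rectangle areas cancel exactly, so r carries no information about the strong quadratic relation.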

SLIDE 20

12.4 Examples

Expenditure for bread and cheese.

[Figure: scatterplot of customer expenditure; axes: bread, cheese.]

r = 0.41

Moderate positive correlation.

SLIDE 21

12.4 Examples

Expenditure for dairy products and cheese.

[Figure: scatterplot of customer expenditure; axes: dairy, cheese.]

r = 0.05

Practically uncorrelated.

SLIDE 22

12.4 Examples

Expenditure for tea/coffee and fruit.

[Figure: scatterplot of customer expenditure; axes: tea & coffee, fruit.]

r = −0.13

The correlation is negative, but this has no meaningful interpretation.

SLIDE 23

12.4 Examples

Weekly returns on stock indices DAX and CAC 40.

[Figure: scatterplot; axes: return on DAX, return on CAC 40.]

r = 0.925

Returns on DAX and CAC 40 are highly correlated. Here, the correlation is very useful.

SLIDE 24

12.4 Examples

Another application of correlation.

  • We have seen the correlation as applied to two different variables $X$, $Y$.
  • The concept of correlation can also be applied to a series $(X_t) = X_1, X_2, X_3, \dots$
  • $\operatorname{cor}(X_t, X_{t+1})$ is called the autocorrelation (at lag 1) of the series $(X_t)$.
  • Autocorrelation is a very important tool in the analysis of a time series.
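Following the definition above, the lag-1 autocorrelation can be sketched as the ordinary correlation of the pairs $(x_t, x_{t+1})$. The function names and the series below are hypothetical and only serve as an illustration. Statistical software often uses a slightly different estimator (with the overall mean and variance of the whole series), so results may differ a little from this direct version.

```python
import math


def correlation(xs, ys):
    """r = cov(X, Y) / sqrt(var(X) * var(Y)); the 1/n factors cancel."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lag1_autocorrelation(series):
    """Correlation of the pairs (x_t, x_{t+1}), t = 1, ..., n - 1."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    return correlation(series[:-1], series[1:])


# Hypothetical series, made up for illustration only.
series = [12.0, 14.0, 13.0, 17.0, 19.0, 18.0, 22.0, 25.0]
r1 = lag1_autocorrelation(series)
print(round(r1, 3))
```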

SLIDE 25

12.4 Examples

Example: Monthly car sales in Turkey.

[Figure: scatterplot of monthly car sales; axes: $X_t$, $X_{t+1}$.]

r = 0.76

This high autocorrelation can be used for forecasting purposes.

SLIDE 26

12.5 Outlook

A final remark.

  • We have used correlation only in the context of descriptive statistics.
  • We shall come back to inductive statistics in the next chapter, which deals with a related topic.
