Visualization 1 Applied Multivariate Statistics Spring 2012 Goals - - PowerPoint PPT Presentation

SLIDE 1

Visualization 1

Applied Multivariate Statistics – Spring 2012

SLIDE 2

Goals

  • Covariance, Correlation (true / sample version)
  • Test for zero correlation: Fisher’s z-Transformation
  • Scatterplot / Scatterplot matrix
  • Covariance matrix / Correlation matrix
  • Multivariate Normal Distribution
  • Mahalanobis distance

  • Appl. Multivariate Statistics - Spring 2012
SLIDE 3

Visualization in 1d

SLIDE 4

Normal distribution in 1d: Most common model choice

$$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$$

SLIDE 5

Normal distribution in 1d: Most common model choice

$$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$$

  • Squared Mahalanobis distance = $(x-\mu)^2 / \sigma^2$
  • Sq. distance from mean in standard deviations
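
As a minimal sketch of the 1d density, the formula can be written out in R and compared against the built-in dnorm(). The function name phi and the numeric values are illustrative assumptions, not part of the slides.

```r
# 1d normal density written out from the formula (phi is a hypothetical helper name)
phi <- function(x, mu, sigma2) {
  md2 <- (x - mu)^2 / sigma2               # squared Mahalanobis distance in 1d
  exp(-0.5 * md2) / sqrt(2 * pi * sigma2)
}

# Illustrative values: mean 0.5, variance 4 (so standard deviation 2)
phi(1.3, mu = 0.5, sigma2 = 4)
dnorm(1.3, mean = 0.5, sd = 2)             # built-in density, parameterized by sd
```

Note that dnorm() takes the standard deviation, while the formula on the slide is written in terms of the variance.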

SLIDE 6

Two variables: Covariance and Correlation

  • Covariance: $\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] \in (-\infty, \infty)$
  • Correlation: $\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} \in [-1, 1]$
  • Sample covariance: $\widehat{\mathrm{Cov}}(x,y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
  • Sample correlation: $r_{xy} = \widehat{\mathrm{Cor}}(x,y) = \frac{\widehat{\mathrm{Cov}}(x,y)}{\hat{\sigma}_x \hat{\sigma}_y}$
  • Correlation is invariant to changes in units, covariance is not (e.g. kilogram/gram, meter/kilometer, etc.)
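
The sample versions and the effect of a unit change can be checked in a few lines of R. This is a sketch on simulated data; the variable names and the height/weight setup are assumptions made for illustration only.

```r
set.seed(1)
n <- 50
height_m  <- rnorm(n, mean = 1.75, sd = 0.1)          # hypothetical heights in meters
weight_kg <- 40 + 20 * height_m + rnorm(n, sd = 5)    # hypothetical weights in kilograms

# Sample covariance and correlation computed from the definitions
cov_manual <- sum((height_m - mean(height_m)) * (weight_kg - mean(weight_kg))) / (n - 1)
cor_manual <- cov_manual / (sd(height_m) * sd(weight_kg))

# Change of units: meters -> centimeters
height_cm <- 100 * height_m
cov(height_cm, weight_kg)   # 100 times the covariance in meters
cor(height_cm, weight_kg)   # unchanged
```

The hand-computed values agree with cov() and cor(), and only the covariance changes when the units do.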

SLIDE 7

Scatterplot: Correlation is scale invariant

SLIDE 8

Intuition and pitfalls for correlation

Correlation = LINEAR relation
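
One way to see the pitfall is a perfectly deterministic but nonlinear relation. The sketch below uses an assumed example, y = x^2 on a grid symmetric around zero, where the sample correlation is zero up to rounding even though y is a function of x.

```r
x <- seq(-1, 1, length.out = 101)   # symmetric grid around 0 (illustrative choice)
y_quadratic <- x^2                  # exact nonlinear dependence
y_linear <- 3 * x + 2               # exact linear dependence

cor(x, y_quadratic)                 # essentially 0: no LINEAR relation
cor(x, y_linear)                    # 1: perfect linear relation
```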

SLIDE 9

Test for zero correlation: Fisher’s z-Test

  • X, Y (bivariate) normally distributed with true correlation $\rho$
  • Take n samples
  • Compute sample correlation r
  • Compute $z = \frac{1}{2} \log\left(\frac{1+r}{1-r}\right)$ and $\xi = \frac{1}{2} \log\left(\frac{1+\rho}{1-\rho}\right)$
  • For large n: $\sqrt{n-1}\,(z - \xi) \sim N(0,1)$
  • Use cor.test() in R to test and get confidence intervals
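
The steps above can be sketched in R on simulated data. The data-generating choices are assumptions for illustration. The test statistic follows the large-n approximation from the slide, while the confidence interval uses the standard error 1/sqrt(n - 3), which is what cor.test() uses for its interval; the two scalings agree for large n.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)              # simulated data with positive true correlation

r <- cor(x, y)
z <- 0.5 * log((1 + r) / (1 - r))    # Fisher's z-transformation, identical to atanh(r)

# Test H0: rho = 0, i.e. xi = 0, with the large-n normal approximation
stat <- sqrt(n - 1) * (z - 0)
p_value <- 2 * pnorm(-abs(stat))

# 95% confidence interval for rho, back-transformed with tanh
ci <- tanh(z + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

ct <- cor.test(x, y)                 # built-in test
ct$conf.int                          # matches ci
```

The p-value reported by cor.test() comes from a t-test rather than from the z-statistic, so it is close to p_value above but not identical.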

SLIDE 10

Many dimensions: Scatterplot matrix

SLIDE 11

Covariance matrix / correlation matrix: Table of pairwise values

  • True covariance matrix: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$
  • True correlation matrix: $C_{ij} = \mathrm{Cor}(X_i, X_j)$
  • Sample covariance matrix: $S_{ij} = \widehat{\mathrm{Cov}}(x_i, x_j)$ (Diagonal: Variances)
  • Sample correlation matrix: $R_{ij} = \widehat{\mathrm{Cor}}(x_i, x_j)$ (Diagonal: 1)
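
In R, the sample matrices S and R come from cov() and cor() applied to a data matrix with one column per variable. The sketch below uses simulated data with made-up column names to check the statements about the diagonals.

```r
set.seed(2)
n <- 100
X <- cbind(a = rnorm(n), b = rnorm(n, sd = 3), c = runif(n))  # hypothetical data
X[, "b"] <- X[, "b"] + 2 * X[, "a"]                           # make a and b correlated

S <- cov(X)    # sample covariance matrix
R <- cor(X)    # sample correlation matrix

diag(S)        # variances of the columns
diag(R)        # all ones
cov2cor(S)     # rescaling S by the standard deviations gives R
```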

SLIDE 12

Multivariate Normal Distribution: Most common model choice

$$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}} \exp\left(-\frac{1}{2} \cdot (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

for $x \in \mathbb{R}^p$.
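
A minimal sketch of evaluating this density in base R, using solve(), t() and %*%. The helper name dmvn is hypothetical, and the normalizing constant is written for dimension p as sqrt((2*pi)^p * det(Sigma)), which reduces to the 1d formula when p = 1.

```r
# Multivariate normal density at a single point x (dmvn is a hypothetical helper name)
dmvn <- function(x, mu, Sigma) {
  p <- length(mu)
  d <- x - mu
  md2 <- as.numeric(t(d) %*% solve(Sigma) %*% d)   # squared Mahalanobis distance
  exp(-0.5 * md2) / sqrt((2 * pi)^p * det(Sigma))
}

# Standard bivariate normal at its mean: 1 / (2 * pi)
dmvn(c(0, 0), mu = c(0, 0), Sigma = diag(2))

# With a diagonal Sigma the density factorizes into 1d normal densities
dmvn(c(1, 2), mu = c(0, 0), Sigma = diag(c(25, 1)))
dnorm(1, mean = 0, sd = 5) * dnorm(2, mean = 0, sd = 1)
```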

SLIDE 13

Multivariate Normal Distribution: Funny facts

If X1, …, Xp are multivariate normal, then

  • every linear combination Y = a1 X1 + … + ap Xp is normally distributed
  • every projection onto a subspace is multivariate normally distributed

If the margins are normally distributed, it is NOT GUARANTEED that the underlying distribution is multivariate normal (i.e., “multivariate normal” is stronger than just normal margins).
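
The last point can be illustrated with a standard counterexample, sketched here under assumed simulation settings: take X standard normal and set Y = S * X with an independent random sign S. Both margins are standard normal, but X + Y equals exactly zero about half the time, so this linear combination is not normally distributed and (X, Y) cannot be multivariate normal.

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
s <- sample(c(-1, 1), n, replace = TRUE)   # independent random sign
y <- s * x                                 # also N(0, 1), by symmetry of the normal

sd(y)                                      # close to 1, as for a standard normal margin
frac_zero <- mean(x + y == 0)              # about 0.5: a point mass at zero
frac_zero
```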

SLIDE 14

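Multivariate Normal Distribution: Two examples

1000 random samples

$$\mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}$$

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$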
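
Samples like these can be drawn with mvrnorm() from the MASS package. The sketch below assumes the example with mean (5, 10) and covariance entries 10, 3, 3, 2; the seed is arbitrary.

```r
library(MASS)   # provides mvrnorm()

set.seed(3)
mu <- c(5, 10)
Sigma <- matrix(c(10, 3,
                   3, 2), nrow = 2, byrow = TRUE)

X <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)   # 1000 x 2 matrix, one sample per row

dim(X)
colMeans(X)   # close to mu
cov(X)        # close to Sigma
# plot(X) would show the corresponding scatterplot
```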

SLIDE 15

Multivariate Normal Distribution: Most common model choice

$$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}} \exp\left(-\frac{1}{2} \cdot (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

  • Sq. Mahalanobis distance $MD^2(x) = (x - \mu)^T \Sigma^{-1} (x - \mu)$
  • Sq. distance from mean in standard deviations IN DIRECTION OF X

SLIDE 16

Mahalanobis distance: Example

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$$

SLIDE 17

Mahalanobis distance: Example

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$$

Point (20, 0): MD = 4

SLIDE 18

Mahalanobis distance: Example

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$$

Point (0, 10): MD = 10

SLIDE 19

Mahalanobis distance: Example

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$$

Point (10, 7): MD = 7.3
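
The three distances in this example can be reproduced in R. The sketch assumes a zero mean and a diagonal covariance matrix with variances 25 and 1, which is consistent with the stated values MD = 4, 10 and 7.3; the helper name md is hypothetical.

```r
mu <- c(0, 0)
Sigma <- matrix(c(25, 0,
                   0, 1), nrow = 2, byrow = TRUE)

# Mahalanobis distance from the definition (md is a hypothetical helper name)
md <- function(x, mu, Sigma) {
  d <- x - mu
  sqrt(as.numeric(t(d) %*% solve(Sigma) %*% d))
}

md(c(20, 0), mu, Sigma)   # 4
md(c(0, 10), mu, Sigma)   # 10
md(c(10, 7), mu, Sigma)   # sqrt(53), about 7.3

# Built-in alternative: stats::mahalanobis() returns the SQUARED distance
pts <- rbind(c(20, 0), c(0, 10), c(10, 7))
sqrt(mahalanobis(pts, center = mu, cov = Sigma))
```

The point (20, 0) is farther from the mean in Euclidean terms than (0, 10), yet its Mahalanobis distance is smaller because the first coordinate has the larger standard deviation.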

SLIDE 20

Concepts to know

  • Covariance, Correlation (true / sample version)
  • Test for zero correlation: Fisher’s z-Transformation
  • Scatterplot / Scatterplot matrix
  • Covariance matrix / Correlation matrix
  • Multivariate Normal Distribution
  • Mahalanobis distance

SLIDE 21

R commands to know

  • read.csv, head, str, dim
  • colMeans, cov, cor
  • mvrnorm, t, solve, %*%
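
A short sketch of the data-handling and matrix commands in one place. The data set is made up and written to a temporary CSV file only so that the example is self-contained; with real data, read.csv() would point at an existing file.

```r
# Hypothetical data, written to a temporary file so read.csv() has something to read
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(1, 2, 3, 4), y = c(2, 1, 4, 3)), tmp, row.names = FALSE)

dat <- read.csv(tmp)
head(dat)        # first rows
str(dat)         # column types
dim(dat)         # number of rows and columns
colMeans(dat)    # column means

A <- cov(dat)    # sample covariance matrix
A_inv <- solve(A)   # matrix inverse
A %*% A_inv      # matrix product: the identity, up to rounding
t(A)             # transpose (equal to A, since A is symmetric)
```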
