

SLIDE 1

Principal Components Analysis (PCA)

Multivariate Fundamentals: Rotation

SLIDE 2

Objective: Find linear combinations of the original variables X1, X2, …, Xn to produce components Z1, Z2, …, Zn that are uncorrelated, ordered by importance, and that describe the variation in the original data.

The components decrease in the amount of variation they explain: the first component (Z1) explains the greatest amount of variation, the second component (Z2) explains the second greatest amount, and so forth.

Karl Pearson (1857-1936), Harold Hotelling (1895-1973)

SLIDE 3

The math behind PCA

Z1 = a11X1 + a12X2 + … + a1nXn

where Z1 is the first principal component (a column vector), X1, …, Xn are the column vectors of the original variables, and a11, …, a1n are the coefficients of the linear model.

Principal components are linear combinations of the original variables. Principal component 1 is NOT a replacement for variable 1 – all of the variables are used to calculate each principal component. For each component, the constraint a11^2 + a12^2 + … + a1n^2 = 1 ensures that Var(Z1) is as large as possible.
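The coefficients and the unit-norm constraint can be seen directly by taking an eigendecomposition of the correlation matrix. A minimal sketch in R, using the built-in USArrests data set as a stand-in for the deck's example data (an assumption; the deck's own data are not reproduced here):

```r
# Sketch: compute the first principal component "by hand" from the
# eigendecomposition of the correlation matrix (USArrests is built into R).
R <- cor(USArrests)
e <- eigen(R)

a1 <- e$vectors[, 1]    # the coefficients a11, a12, ..., a1n for Z1
sum(a1^2)               # the constraint: the squared coefficients sum to 1

X  <- scale(USArrests)  # standardized original variables
Z1 <- X %*% a1          # the first principal component scores
var(as.vector(Z1))      # equals the largest eigenvalue, e$values[1]
```

Among all coefficient vectors with squared coefficients summing to 1, the first eigenvector maximizes Var(Z1), which is why the constraint makes the variance "as large as possible" rather than unbounded.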

SLIDE 4

The math behind PCA

  • Z2 is calculated using the same formula and the same constraint on the a2n values; however, there is an additional condition that Z1 and Z2 have zero correlation for the data.

  • The zero-correlation condition continues for all successive principal components, i.e. Z3 is uncorrelated with both Z1 and Z2.

  • The number of principal components calculated will match the number of predictor variables included in the analysis.

  • The amount of variation explained decreases with each successive principal component.

  • Generally you base your inferences on the first two or three components, because they explain the most variation in your data.

  • Typically, when you include many predictor variables, the last few principal components explain very little (< 1%) of the variation in your data and are not useful.
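The properties listed above can be checked directly in R. A small sketch, again assuming the built-in USArrests data set as example data:

```r
# Sketch: verify the listed properties of principal components.
pca <- princomp(USArrests, cor = TRUE)

# As many PCs as predictor variables:
ncol(pca$scores) == ncol(USArrests)

# Variance explained decreases with each successive component:
vars <- pca$sdev^2
all(diff(vars) <= 0)

# The components are mutually uncorrelated (off-diagonals ~ 0):
round(cor(pca$scores), 10)
```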

SLIDE 5

PCA in R

PCA in R:

princomp(dataMatrix, cor = T/F) (stats package)

  • dataMatrix – the data matrix of predictor variables.

  • cor – defines whether the PCs should be calculated using the correlation or the covariance matrix (derived within the function from the data). You tend to use the covariance matrix when the variable scales are similar and the correlation matrix when the variables are on different scales: the correlation matrix standardizes the data before the PCs are calculated, removing the effect of the different units. (Note that princomp's default is the covariance matrix, cor = FALSE, so set cor = TRUE explicitly when variables are on different scales.)

  • You assign the results to an object once the PCs have been calculated, so that the loadings, scores, and variances can be inspected.
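A minimal call, assuming the built-in USArrests data set (whose variables – murders per 100,000 vs. percent urban population – are on different scales, so the correlation matrix is appropriate):

```r
# Sketch: run PCA on a data matrix with variables on different scales.
pca <- princomp(USArrests, cor = TRUE)

# The result is a "princomp" object holding the pieces discussed on the
# following slides: loadings, scores, and component standard deviations.
class(pca)
names(pca)
```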

SLIDE 6

PCA in R

SLIDE 7

PCA in R

Loadings – these are the correlations between the original predictor variables and the principal components (the eigenvectors). They identify which of the original variables are driving each principal component.

Example:

Comp.1 – negatively related to Murder, Assault, and Rape
Comp.2 – negatively related to UrbanPop
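The loadings in the example above come from the built-in USArrests data set; a sketch of how to inspect them:

```r
# Sketch: inspect the loadings (eigenvectors) of the fitted PCA.
pca <- princomp(USArrests, cor = TRUE)

pca$loadings           # printed view; very small loadings are blanked out
unclass(pca$loadings)  # the full numeric matrix of coefficients
```

Note that the overall sign of each column is arbitrary: depending on the platform, Comp.1 may come out positively or negatively related to Murder, Assault, and Rape. Only the relative signs within a component carry meaning.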

SLIDE 8

PCA in R

Scores – these are the calculated principal components Z1, Z2, …, Zn. These are the values we plot to make inferences.
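A sketch of pulling out the scores and plotting the first two components, again assuming the USArrests example:

```r
# Sketch: the scores are the Z values; plot Z1 vs Z2 for inference.
pca <- princomp(USArrests, cor = TRUE)

head(pca$scores[, 1:2])                 # Z1 and Z2 for the first rows
plot(pca$scores[, 1], pca$scores[, 2],
     xlab = "Comp.1", ylab = "Comp.2")  # one score pair per observation
```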

SLIDE 9

PCA in R

Variance – the summary of the output displays the variance explained by each principal component. This identifies how much weight you should put on each principal component.

Example:

Comp.1 – 62%, Comp.2 – 25%, Comp.3 – 9%, Comp.4 – 4%

(These proportions are the eigenvalues divided by the number of PCs: with the correlation matrix, the eigenvalues sum to the number of variables.)
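The percentages above match what summary() reports for the USArrests example; a sketch:

```r
# Sketch: variance explained per component, as reported by summary().
pca <- princomp(USArrests, cor = TRUE)
summary(pca)  # sd, proportion of variance, cumulative proportion

# With cor = TRUE, proportions are eigenvalues / number of PCs, because
# the eigenvalues of a correlation matrix sum to the number of variables.
props <- pca$sdev^2 / length(pca$sdev)
round(props, 2)  # approximately 0.62, 0.25, 0.09, 0.04
```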

SLIDE 10

PCA in R - Biplot

  • Data points are plotted by their Comp.1 and Comp.2 scores (row names are displayed).

  • The direction of the arrows (+/-) indicates the trend of the points: points lying toward an arrow have more of that variable.

  • If vector arrows are perpendicular, then the variables are not correlated.

  • If your original variables do not have some level of correlation, then PCA will NOT work for your analysis – i.e. you won't learn anything!
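A sketch of producing the biplot and checking the correlation caveat, again assuming the USArrests example:

```r
# Sketch: biplot of the first two components; row names label the points
# and arrows show how each original variable trends across the plot.
pca <- princomp(USArrests, cor = TRUE)
biplot(pca)

# The caveat above: PCA is only informative if the original variables are
# correlated to begin with. Here the crime variables are clearly related:
round(cor(USArrests), 2)
```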