Big Data Management & Analytics
EXERCISE 8 - TEXT PROCESSING, PCA
21st of December, 2015
Sabrina Friedl LMU Munich
Principal Component Analysis (PCA)
REVISION AND EXAMPLE
Goals of PCA
Find a lower-dimensional representation of data to:
features
(visualization is possible only for a few dimensions)
[Figure: panels for d=2 and d=3]
A good data representation retains the main differences between data points but eliminates irrelevant variances
variance = principal components: $v_1, v_2, \dots, v_k$
(n x d) * (d x k) = (n x k)
X = raw data matrix
$P = (v_1, v_2, \dots, v_k)$ transformation matrix
Y = k-dimensional representation of X
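The dimension rule (n x d) * (d x k) = (n x k) can be checked with a small NumPy sketch. The sizes and the stand-in matrix `P` below are assumptions chosen only for illustration; in PCA the columns of P would be the principal components.

```python
import numpy as np

n, d, k = 6, 3, 2                      # hypothetical sizes: 6 points, 3 features, 2 target dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))            # raw data matrix, one row per data point

# Stand-in transformation matrix: the first k standard basis vectors as columns.
# In PCA these columns would be the principal components v1, ..., vk.
P = np.eye(d)[:, :k]

Y = X @ P                              # (n x d) * (d x k) = (n x k)
print(X.shape, P.shape, Y.shape)       # (6, 3) (3, 2) (6, 2)
```

With this particular stand-in P, the projection simply keeps the first k features of X.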
[Figure: center data, then transform by P]
Calculate the eigenvalues and eigenvectors of the covariance matrix
$\Sigma = \mathrm{Cov}(X, X)$
Describes the pairwise covariance between all features.
For a centered data matrix $X$ with $\mu = 0$ we can calculate the covariance matrix as:
$\Sigma = \frac{1}{n} X^T X$
Sigma here is the name of the matrix, not the sum symbol!
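A minimal sketch of the covariance computation for centered data, using NumPy. The toy data is made up for illustration. The cross-check uses `np.cov` with `bias=True`, which selects the same 1/n normalisation as the formula on this slide.

```python
import numpy as np

# Hypothetical toy data: 5 points with 2 features.
X_raw = np.array([[2.0, 1.0],
                  [3.0, 4.0],
                  [5.0, 0.0],
                  [7.0, 6.0],
                  [8.0, 4.0]])

n = X_raw.shape[0]
X = X_raw - X_raw.mean(axis=0)         # center: every feature now has mean 0

Sigma = (X.T @ X) / n                  # covariance matrix, shape (d x d)

# Cross-check against NumPy; bias=True selects the 1/n normalisation
# (the default would be 1/(n-1)).
assert np.allclose(Sigma, np.cov(X_raw, rowvar=False, bias=True))
print(Sigma)                           # approximately [[5.2, 2.6], [2.6, 4.8]]
```

The diagonal holds the variance of each feature, the off-diagonal entries the covariance between the two features.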
For $d$ dimensions of $X$ we get $d$ eigenvalues and eigenvectors.
The transformation matrix is then constructed by putting the eigenvectors as columns into a matrix: $T = (v_1, v_2, \dots, v_d)$
Eigendecomposition: $\Sigma = T \Lambda T^T$
To get a k-dimensional representation Y of (centered) data X we take only the first k eigenvectors (principal components) of T, ordered by decreasing eigenvalue, and call this matrix P.
We calculate: $XP = Y$
To transform back: $Z = Y P^T$ (an approximation of X when k < d)
$\Sigma$ = covariance matrix
$T = (v_1, v_2, \dots, v_d)$ transformation matrix
$\Lambda$ = diagonal matrix with the eigenvalues on the diagonal
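The eigendecomposition and the forward and backward transformation can be sketched in NumPy as follows. The matrix `Sigma` and the centered data `X` are hypothetical toy values, and `np.linalg.eigh` is one possible choice of solver for symmetric matrices, not necessarily the one used in the lecture.

```python
import numpy as np

# Hypothetical centered data (5 points, 2 features) and its covariance matrix.
X = np.array([[-3.0, -2.0], [-2.0, 1.0], [0.0, -3.0], [2.0, 3.0], [3.0, 1.0]])
Sigma = (X.T @ X) / X.shape[0]

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)

order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
Lam = np.diag(eigvals[order])          # Lambda: eigenvalues on the diagonal
T = eigvecs[:, order]                  # eigenvectors as columns

assert np.allclose(T @ Lam @ T.T, Sigma)   # Sigma = T Lambda T^T
assert np.allclose(T.T @ T, np.eye(2))     # columns of T are orthonormal

P = T[:, :1]                           # keep k = 1 principal component
Y = X @ P                              # (5 x 1) representation
Z = Y @ P.T                            # back in 2 dimensions (lossy for k < d)
```

With k = d nothing is discarded, so `X @ T @ T.T` returns X exactly; with k < d the variance that is lost equals the sum of the dropped eigenvalues.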
1. Center the data $X$: subtract from each feature $x_i$ its mean $\mu_i$, i.e. $x_i - \mu_i$
2. Calculate the covariance matrix: $\Sigma = \frac{1}{n} X^T X$
3. Calculate the eigenvalues and eigenvectors of $\Sigma$
4. Select the $k$ eigenvectors with the biggest eigenvalues and create $P = (v_1, v_2, \dots, v_k)$
5. Transform the original (n x d) matrix $X$ to a (n x k) representation: $XP = Y$
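The five steps can be put together in one short function. This is a sketch under assumptions: the function name `pca`, its return values and the random test data are illustrative choices, not part of the exercise.

```python
import numpy as np

def pca(X_raw, k):
    """Project the (n x d) matrix X_raw onto its first k principal components."""
    X = X_raw - X_raw.mean(axis=0)             # 1. center the data
    Sigma = (X.T @ X) / X.shape[0]             # 2. covariance matrix (1/n convention)
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # 3. eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1][:k]      # 4. indices of the k biggest eigenvalues
    P = eigvecs[:, order]                      #    P = (v1, ..., vk)
    Y = X @ P                                  # 5. (n x d) -> (n x k)
    return Y, P, eigvals[order]

# Hypothetical data: 100 points, 3 features with very different spreads.
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.2])
Y, P, top = pca(X_raw, k=2)
print(Y.shape)                                 # (100, 2)
```

The covariance matrix of the transformed data Y is diagonal, with the selected eigenvalues on its diagonal: the new coordinates are uncorrelated.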
HDData.DimensionalityReduction.pdf
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf