SVD and PCA
Derek Onken and Li Xiong
January 29, 2018
Feature Extraction
Create new features (attributes) by combining/mapping existing ones
Common methods
Principal Component Analysis
Singular Value Decomposition
Other compression methods (time-frequency analysis)
Fourier transform (e.g., time series)
Discrete Wavelet Transform (e.g., 2D images)
Principal component analysis: find the dimensions that capture the most variance
A linear mapping of the data to a new coordinate system such that
the greatest variance lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.
Steps
Normalize input data: each attribute falls within the same range
Compute k orthonormal (unit) vectors, i.e., principal components
Each input data vector is a linear combination of the k principal component vectors
The principal components are sorted in order of decreasing “significance”
Weak components can be eliminated, i.e., those with low variance
Mathematically
Compute the covariance matrix
Find the eigenvectors of the covariance matrix that correspond to the large eigenvalues
[Figure: 2-D data in the X-Y plane with the leading eigenvector v drawn along the direction of greatest variance.]
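As a concrete illustration of these steps, here is a minimal NumPy sketch (the toy data, seed, and variable names are my own, not from the lecture):

```python
import numpy as np

# Toy data: 100 samples, 3 correlated attributes (an assumption for demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

# Step 1: normalize (here: center each attribute; scaling is optional).
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix of the attributes.
C = np.cov(Xc, rowvar=False)

# Step 3: eigenvectors of the covariance matrix, sorted so components of
# decreasing "significance" (variance) come first.
eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: eliminate weak components and project onto the top k.
k = 2
X_reduced = Xc @ eigvecs[:, :k]
print(eigvals)           # variance captured by each principal component
print(X_reduced.shape)   # (100, 2)
```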
How the eigenvalues and eigenvectors create a matrix decomposition (the eigendecomposition): A = Q Λ Q⁻¹
Columns of Q are eigenvectors
Λ contains eigenvalues
Columns of U are left-singular vectors
Columns of V are right-singular vectors
Σ contains the ordered singular values τ_j
For the eigendecomposition, A must be square, and we defined A = MᵀM. The v_j are eigenvectors of MᵀM; the u_i are eigenvectors of MMᵀ. The eigenvalues are the squares of the singular values: μ_j = τ_j².
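A small NumPy check of this relationship (the random matrix M is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(6, 4))

# Eigendecomposition needs a square matrix, so take A = M^T M.
A = M.T @ M
mu, Q = np.linalg.eigh(A)            # eigenvalues mu_j, eigenvectors in Q
mu = np.sort(mu)[::-1]               # decreasing order

# SVD of M itself gives the singular values tau_j directly.
U, tau, Vt = np.linalg.svd(M, full_matrices=False)

print(np.allclose(mu, tau**2))       # True: mu_j = tau_j^2
```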
The following slides are adapted from Jure Leskovec, Stanford University, CS246: Mining Massive Datasets.
It is always possible to decompose a real matrix A into A = U Σ Vᵀ, where:
U, Σ, V: unique
U, V: column orthonormal
UᵀU = I; VᵀV = I (I: identity matrix) (columns are orthogonal unit vectors)
Σ: diagonal
Entries (singular values) are positive and sorted in decreasing order (σ₁ ≥ σ₂ ≥ … ≥ 0)
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
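A quick sketch verifying these properties with NumPy's SVD (the random matrix A is just an example):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(3)))           # U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(3)))         # V^T V = I
print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))   # sorted, non-negative
print(np.allclose(A, U @ np.diag(s) @ Vt))       # A = U Sigma V^T
```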
Consider a matrix. What does SVD do?
A = U Σ Vᵀ example: Users to Movies
[Figure: an m × n ratings matrix A; rows are users (a SciFi-leaning group and a Romance-leaning group), columns are the movies Matrix, Alien, Serenity, Casablanca, and Amelie. The ‘concepts’, a.k.a. latent dimensions, a.k.a. latent factors, connect users to movies.]
[The next slides annotate the same decomposition step by step: the first columns of U and V correspond to a ‘SciFi-concept’, the second columns to a ‘Romance-concept’, and each diagonal entry of Σ gives the ‘strength’ of its concept.]
U: user-to-concept matrix
V: movie-to-concept matrix
Σ: its diagonal elements give the ‘strength’ of each concept
Fact: SVD gives ‘best’ axis to project on:
‘best’ = minimizing the sum of reconstruction errors
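A minimal sketch of this fact: projecting onto the top right-singular vector beats projecting onto an arbitrary axis (the toy point cloud and the competing axis e1 are assumptions):

```python
import numpy as np

# Elongated 2-D point cloud, centered.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
A = A - A.mean(axis=0)

_, _, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                       # top right-singular vector
e1 = np.array([1.0, 0.0])        # an arbitrary competing unit axis

def recon_error(A, v):
    """Sum of squared errors when A is reduced to its projection on v."""
    return np.sum((A - np.outer(A @ v, v)) ** 2)

print(recon_error(A, v1) <= recon_error(A, e1))   # True: v1 is 'best'
```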
[Figure: zeroing the smallest singular value in Σ produces a new Σ and hence the best lower-rank approximation.]
The reconstruction error is measured in the Frobenius norm:
$\|B - C\|_F = \sqrt{\sum_{jk} (B_{jk} - C_{jk})^2}$
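A sketch of this rank-reduction idea in NumPy (the random B and the choice k = 2 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(7, 5))

U, s, Vt = np.linalg.svd(B, full_matrices=False)

# Zero out all but the k largest singular values.
k = 2
s_k = np.where(np.arange(len(s)) < k, s, 0.0)
C = U @ np.diag(s_k) @ Vt                 # best rank-k approximation of B

# Frobenius-norm error, straight from the formula above; it equals the
# root-sum-of-squares of the dropped singular values.
frob = np.sqrt(np.sum((B - C) ** 2))
print(np.isclose(frob, np.sqrt(np.sum(s[k:] ** 2))))   # True
```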
Q: Find users that like ‘Matrix’
A: Map the query into ‘concept space’. How?
Columns: Matrix, Alien, Serenity, Casablanca, Amelie (SciFi fans in the top rows, Romance fans below).

A =
[ 1 1 1 0 0
  3 3 3 0 0
  4 4 4 0 0
  5 5 5 0 0
  0 2 0 4 4
  0 0 0 5 5
  0 1 0 2 2 ]

U =
[ 0.13  0.02 -0.01
  0.41  0.07 -0.03
  0.55  0.09 -0.04
  0.68  0.11 -0.05
  0.15 -0.59  0.65
  0.07 -0.73 -0.67
  0.07 -0.29  0.32 ]

Σ =
[ 12.4  0    0
   0    9.5  0
   0    0    1.3 ]

Vᵀ =
[ 0.56  0.59  0.56  0.09  0.09
  0.12 -0.02  0.12 -0.69 -0.69
  0.40 -0.80  0.40  0.09  0.09 ]
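This decomposition can be reproduced with NumPy (up to sign flips of the singular-vector columns, which are arbitrary in SVD):

```python
import numpy as np

# The users-to-movies matrix from the slide.
# Columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s[:3], 1))   # [12.4  9.5  1.3], the concept 'strengths'
```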
q = [5 0 0 0 0]  (a new user who rated only ‘Matrix’; columns: Matrix, Alien, Serenity, Casablanca, Amelie)
[Figure: q plotted on the Matrix-Alien plane together with the concept vectors v1 and v2.]
Project into concept space: inner product with each ‘concept’ vector v_i
[Figure: the projection q·v1 of q onto the first concept vector.]
Compactly, we have: q_concept = q V. E.g.:
Using the movie-to-concept factors (the first two columns of V):

q = [5 0 0 0 0]

V = [ 0.56  0.12
      0.59 -0.02
      0.56  0.12
      0.09 -0.69
      0.09 -0.69 ]

q_concept = q V = [2.8 0.6]   (2.8 on the SciFi-concept, 0.6 on the Romance-concept)
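The same projection in NumPy, using the rounded factors printed above (so the result is only approximate):

```python
import numpy as np

# First two columns of V (movie-to-concept factors), as on the slide.
V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

q = np.array([5, 0, 0, 0, 0])   # rated only 'Matrix'
print(q @ V)                    # [2.8  0.6]: mostly SciFi-concept
```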
How would the user d that rated (‘Alien’, ‘Serenity’) be handled? The same way: d_concept = d V. E.g., with the movie-to-concept factors V as above:

d = [0 4 5 0 0]

d_concept = d V = [5.2 0.4]
Observation: user d that rated (‘Alien’, ‘Serenity’) will be similar to user q that rated (‘Matrix’), although the two have zero ratings in common:

d = [0 4 5 0 0], q = [5 0 0 0 0]

In concept space: q_concept = [2.8 0.6] and d_concept = [5.2 0.4], so their similarity > 0.
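A sketch of this observation in code, again using the rounded factors from the slide:

```python
import numpy as np

V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

q = np.array([5, 0, 0, 0, 0])   # rated only 'Matrix'
d = np.array([0, 4, 5, 0, 0])   # rated 'Alien' and 'Serenity'

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(q, d))          # 0.0: no ratings in common
print(cosine(q @ V, d @ V))  # ~0.99: very similar in concept space
```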
+ Optimal low-rank approximation in terms of Frobenius norm
− Interpretability problem: a singular vector specifies a linear combination of all input columns or rows
− Lack of sparsity: singular vectors are dense!
[Figure: U, Σ, and Vᵀ drawn as fully dense matrices.]