SVD and PCA
Derek Onken and Li Xiong
January 29, 2018
Feature Extraction
Create new features (attributes) by combining/mapping existing ones
Common methods
Principal Component Analysis
Singular Value Decomposition
Other compression methods (time-frequency analysis)
Fourier transform (e.g., time series)
Discrete Wavelet Transform (e.g., 2D images)
Principal component analysis: find the dimensions that capture the most variance
A linear mapping of the data to a new coordinate system such that
the greatest variance lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.
Steps
Normalize input data: each attribute falls within the same range
Compute k orthonormal (unit) vectors, i.e., principal components
Each input data vector is a linear combination of the k principal component vectors
The principal components are sorted in order of decreasing “significance”
Weak components can be eliminated, i.e., those with low variance
Mathematically
Compute the covariance matrix
Find the eigenvectors of the covariance matrix that correspond to the large eigenvalues
[Figure: 2-D data in the X-Y plane with the leading eigenvector v drawn along the direction of greatest variance.]
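As a concrete illustration of these steps, here is a minimal NumPy sketch (the toy data, seed, and variable names are my own, not from the lecture):

```python
import numpy as np

# Toy data: 100 samples, 3 correlated attributes (an assumption for demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

# Step 1: normalize (here: center each attribute; scaling is optional).
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix of the attributes.
C = np.cov(Xc, rowvar=False)

# Step 3: eigenvectors of the covariance matrix, sorted so components of
# decreasing "significance" (variance) come first.
eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: eliminate weak components and project onto the top k.
k = 2
X_reduced = Xc @ eigvecs[:, :k]
print(eigvals)           # variance captured by each principal component
print(X_reduced.shape)   # (100, 2)
```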
How the eigenvalues and eigenvectors create a matrix decomposition (the eigendecomposition): A = Q Λ Q⁻¹
Columns of Q are eigenvectors
Λ contains eigenvalues
Columns of U are left-singular vectors
Columns of V are right-singular vectors
Σ contains the ordered singular values τ_j
For the eigendecomposition, A must be square, and we defined A = MᵀM. The v_j are eigenvectors of MᵀM; the u_i are eigenvectors of MMᵀ. The eigenvalues are the squares of the singular values: μ_j = τ_j².
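A small NumPy check of this relationship (the random matrix M is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(6, 4))

# Eigendecomposition needs a square matrix, so take A = M^T M.
A = M.T @ M
mu, Q = np.linalg.eigh(A)            # eigenvalues mu_j, eigenvectors in Q
mu = np.sort(mu)[::-1]               # decreasing order

# SVD of M itself gives the singular values tau_j directly.
U, tau, Vt = np.linalg.svd(M, full_matrices=False)

print(np.allclose(mu, tau**2))       # True: mu_j = tau_j^2
```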
The following slides are adapted from Jure Leskovec, Stanford University, CS246: Mining Massive Datasets.
It is always possible to decompose a real matrix A into A = U Σ Vᵀ, where:
U, Σ, V: unique
U, V: column orthonormal
UᵀU = I; VᵀV = I (I: identity matrix) (columns are orthogonal unit vectors)
Σ: diagonal
Entries (singular values) are positive and sorted in decreasing order (σ₁ ≥ σ₂ ≥ … ≥ 0)
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
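A quick sketch verifying these properties with NumPy's SVD (the random matrix A is just an example):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(3)))           # U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(3)))         # V^T V = I
print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))   # sorted, non-negative
print(np.allclose(A, U @ np.diag(s) @ Vt))       # A = U Sigma V^T
```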
Consider a matrix. What does SVD do?
A = U Σ Vᵀ example: Users to Movies
[Figure: an m × n ratings matrix A; rows are users (a SciFi-leaning group and a Romance-leaning group), columns are the movies Matrix, Alien, Serenity, Casablanca, and Amelie. The ‘concepts’, a.k.a. latent dimensions, a.k.a. latent factors, connect users to movies.]
[The next slides annotate the same decomposition step by step: the first columns of U and V correspond to a ‘SciFi-concept’, the second columns to a ‘Romance-concept’, and each diagonal entry of Σ gives the ‘strength’ of its concept.]
U: user-to-concept matrix
V: movie-to-concept matrix
Σ: its diagonal elements give the ‘strength’ of each concept
Fact: SVD gives ‘best’ axis to project on:
‘best’ = minimizing the sum of reconstruction errors
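A minimal sketch of this fact: projecting onto the top right-singular vector beats projecting onto an arbitrary axis (the toy point cloud and the competing axis e1 are assumptions):

```python
import numpy as np

# Elongated 2-D point cloud, centered.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
A = A - A.mean(axis=0)

_, _, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                       # top right-singular vector
e1 = np.array([1.0, 0.0])        # an arbitrary competing unit axis

def recon_error(A, v):
    """Sum of squared errors when A is reduced to its projection on v."""
    return np.sum((A - np.outer(A @ v, v)) ** 2)

print(recon_error(A, v1) <= recon_error(A, e1))   # True: v1 is 'best'
```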
[Figure: zeroing the smallest singular value in Σ produces a new Σ and hence the best lower-rank approximation.]
The reconstruction error is measured in the Frobenius norm:
$\|B - C\|_F = \sqrt{\sum_{jk} (B_{jk} - C_{jk})^2}$
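A sketch of this rank-reduction idea in NumPy (the random B and the choice k = 2 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(7, 5))

U, s, Vt = np.linalg.svd(B, full_matrices=False)

# Zero out all but the k largest singular values.
k = 2
s_k = np.where(np.arange(len(s)) < k, s, 0.0)
C = U @ np.diag(s_k) @ Vt                 # best rank-k approximation of B

# Frobenius-norm error, straight from the formula above; it equals the
# root-sum-of-squares of the dropped singular values.
frob = np.sqrt(np.sum((B - C) ** 2))
print(np.isclose(frob, np.sqrt(np.sum(s[k:] ** 2))))   # True
```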
Q: Find users that like ‘Matrix’
A: Map the query into ‘concept space’. How?
Columns: Matrix, Alien, Serenity, Casablanca, Amelie (SciFi fans in the top rows, Romance fans below).

A =
[ 1 1 1 0 0
  3 3 3 0 0
  4 4 4 0 0
  5 5 5 0 0
  0 2 0 4 4
  0 0 0 5 5
  0 1 0 2 2 ]

U =
[ 0.13  0.02 -0.01
  0.41  0.07 -0.03
  0.55  0.09 -0.04
  0.68  0.11 -0.05
  0.15 -0.59  0.65
  0.07 -0.73 -0.67
  0.07 -0.29  0.32 ]

Σ =
[ 12.4  0    0
   0    9.5  0
   0    0    1.3 ]

Vᵀ =
[ 0.56  0.59  0.56  0.09  0.09
  0.12 -0.02  0.12 -0.69 -0.69
  0.40 -0.80  0.40  0.09  0.09 ]
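This decomposition can be reproduced with NumPy (up to sign flips of the singular-vector columns, which are arbitrary in SVD):

```python
import numpy as np

# The users-to-movies matrix from the slide.
# Columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s[:3], 1))   # [12.4  9.5  1.3], the concept 'strengths'
```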
q = [5 0 0 0 0]  (a new user who rated only ‘Matrix’; columns: Matrix, Alien, Serenity, Casablanca, Amelie)
[Figure: q plotted on the Matrix-Alien plane together with the concept vectors v1 and v2.]
Project into concept space: inner product with each ‘concept’ vector v_i
[Figure: the projection q·v1 of q onto the first concept vector.]
Compactly, we have: q_concept = q V. E.g.:
Using the movie-to-concept factors (the first two columns of V):

q = [5 0 0 0 0]

V = [ 0.56  0.12
      0.59 -0.02
      0.56  0.12
      0.09 -0.69
      0.09 -0.69 ]

q_concept = q V = [2.8 0.6]   (2.8 on the SciFi-concept, 0.6 on the Romance-concept)
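The same projection in NumPy, using the rounded factors printed above (so the result is only approximate):

```python
import numpy as np

# First two columns of V (movie-to-concept factors), as on the slide.
V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

q = np.array([5, 0, 0, 0, 0])   # rated only 'Matrix'
print(q @ V)                    # [2.8  0.6]: mostly SciFi-concept
```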
How would the user d that rated (‘Alien’, ‘Serenity’) be handled? The same way: d_concept = d V. E.g., with the movie-to-concept factors V as above:

d = [0 4 5 0 0]

d_concept = d V = [5.2 0.4]
Observation: user d that rated (‘Alien’, ‘Serenity’) will be similar to user q that rated (‘Matrix’), although the two have zero ratings in common:

d = [0 4 5 0 0], q = [5 0 0 0 0]

In concept space: q_concept = [2.8 0.6] and d_concept = [5.2 0.4], so their similarity > 0.
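A sketch of this observation in code, again using the rounded factors from the slide:

```python
import numpy as np

V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

q = np.array([5, 0, 0, 0, 0])   # rated only 'Matrix'
d = np.array([0, 4, 5, 0, 0])   # rated 'Alien' and 'Serenity'

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(q, d))          # 0.0: no ratings in common
print(cosine(q @ V, d @ V))  # ~0.99: very similar in concept space
```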
+ Optimal low-rank approximation in terms of Frobenius norm
− Interpretability problem: a singular vector specifies a linear combination of all input columns or rows
− Lack of sparsity: singular vectors are dense!
[Figure: U, Σ, and Vᵀ drawn as fully dense matrices.]