
SLIDE 1

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

http://cs246.stanford.edu

SLIDE 2

• High-dimension == many features
• Find concepts/topics/genres:

Documents:
Features: thousands of words, millions of word pairs
Surveys – Netflix: 480k users x 177k movies

1/24/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets 2

SLIDE 3

• Compress / reduce dimensionality:

$10^6$ rows; $10^3$ columns; no updates
random access to any cell(s); small error: OK

SLIDE 4

• Assumption: Data lies on or near a low d-dimensional subspace
• Axes of this subspace are an effective representation of the data

SLIDE 5

• Why reduce dimensionality?

Discover hidden correlations/topics
  Words that occur commonly together
Remove redundant and noisy features
  Not all words are useful
Interpretation and visualization
Easier storage and processing of the data

SLIDE 6

$A_{[n \times m]} = U_{[n \times r]} \, \Sigma_{[r \times r]} \, (V_{[m \times r]})^T$

• A: Input data matrix

n x m matrix (e.g., n documents, m terms)

• U: Left singular vectors

n x r matrix (n documents, r concepts)

• Σ: Singular values

r x r diagonal matrix (strength of each ‘concept’)

(r: rank of the matrix A)

• V: Right singular vectors

m x r matrix (m terms, r concepts)
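As a rough illustration of the shapes listed above, the following NumPy sketch decomposes a small random matrix. The sizes `n = 6`, `m = 4` and all variable names are assumptions made for the example. Note that `np.linalg.svd` returns the singular values as a vector and the right singular vectors already transposed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                       # hypothetical sizes: 6 documents, 4 terms
A = rng.random((n, m))

# Thin SVD: U is n x p, Vt is p x m, with p = min(n, m).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.linalg.matrix_rank(A))  # number of non-zero singular values

# Keep only the r columns/rows that belong to non-zero singular values.
U_r = U[:, :r]                    # n x r
S_r = np.diag(s[:r])              # r x r diagonal
V_r = Vt[:r, :].T                 # m x r

A_rebuilt = U_r @ S_r @ V_r.T
print(U_r.shape, S_r.shape, V_r.shape)
```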

SLIDE 7

[Figure: the n x m matrix A shown as approximately equal to the product U Σ V^T]

SLIDE 8

$A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$

σi … scalar, ui … vector, vi … vector

SLIDE 9

It is always possible to decompose a real matrix A into $A = U \Sigma V^T$, where

• U, Σ, V: unique (Σ always; U and V up to sign flips of matching columns when the singular values are distinct)
• U, V: column orthonormal:

$U^T U = I$; $V^T V = I$ (I: identity matrix)
(Columns are orthogonal unit vectors)

• Σ: diagonal

Entries (singular values) are positive and sorted in decreasing order ($\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots$)
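The properties listed on this slide can be checked numerically. The sketch below uses the users x movies ratings matrix from the example slides. The rank tolerance `1e-10` and the variable names are assumptions. The last lines illustrate that flipping the sign of a matching pair of singular vectors leaves the product unchanged.

```python
import numpy as np

# Users x movies ratings matrix from the example slides.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))        # numerical rank (tolerance is an assumption)
U, s, V = U[:, :r], s[:r], Vt[:r, :].T

UtU = U.T @ U                     # should be the r x r identity
VtV = V.T @ V                     # should be the r x r identity
is_sorted = bool(np.all(np.diff(s) <= 0))

# Flip the sign of the first left/right singular vector pair.
signs = np.array([-1.0] + [1.0] * (r - 1))
A_flipped = (U * signs) @ np.diag(s) @ (V * signs).T
```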

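SLIDE 10

• $A = U \Sigma V^T$ - example: users x movies ratings matrix. Columns: Matrix, Alien, Serenity, Casablanca, Amelie. Rows 1-4: SciFi users; rows 5-7: Romance users.

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix}
9.64 & 0 \\
0 & 5.29
\end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$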
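The numbers in this example can be reproduced with NumPy. This is a minimal sketch; variable names are assumptions. Because singular vectors are only determined up to sign, the comparison with the slide uses absolute values.

```python
import numpy as np

# Rows: 4 SciFi users, 3 Romance users.
# Columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A has rank 2, so only the first two singular triplets matter.
U2, s2, Vt2 = U[:, :2], s[:2], Vt[:2, :]

print(np.round(np.abs(U2), 2))    # user-to-concept matrix (absolute values)
print(np.round(s2, 2))            # concept strengths
print(np.round(np.abs(Vt2), 2))   # movie-to-concept matrix, transposed
```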

SLIDE 11

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

The two concepts found are the SciFi-concept and the Romance-concept.

SLIDE 12

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

U is the user-to-concept similarity matrix; its two columns are the SciFi-concept and the Romance-concept.

SLIDE 13

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

The first diagonal entry of Σ (9.64) is the ‘strength’ of the SciFi-concept.

SLIDE 14

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

V is the movie-to-concept similarity matrix; the first row of $V^T$, (0.58, 0.58, 0.58, 0, 0), is the SciFi-concept.

SLIDE 15

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

V is the movie-to-concept similarity matrix (SciFi-concept highlighted).

SLIDE 16

‘movies’, ‘users’ and ‘concepts’:

• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements: ‘strength’ of each concept

SLIDE 17

SVD gives best axis to project on:

• ‘best’ = min sum of squares of projection errors
• minimum reconstruction error

[Figure: scatter plot with axes "Movie 1 rating" and "Movie 2 rating"; v1, the first singular vector, is drawn as the projection axis]
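The "best axis" claim can be probed numerically. The sketch below, under the assumption that points are projected onto a line through the origin (no mean-centering), compares the sum of squared projection errors for v1 against 1000 random directions. The helper name `sum_sq_projection_error` is hypothetical.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

def sum_sq_projection_error(A, w):
    """Sum of squared distances from the rows of A to the line spanned by w."""
    w = w / np.linalg.norm(w)
    projected = np.outer(A @ w, w)   # each row projected onto w
    return float(np.sum((A - projected) ** 2))

_, s, Vt = np.linalg.svd(A, full_matrices=False)
err_v1 = sum_sq_projection_error(A, Vt[0])

rng = np.random.default_rng(0)
err_random = [sum_sq_projection_error(A, rng.normal(size=A.shape[1]))
              for _ in range(1000)]

print(round(err_v1, 4), round(min(err_random), 4))
```

No random direction should do better than v1; the error for v1 equals the energy left in the remaining singular values.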

SLIDE 18

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

v1 is the first row of $V^T$: (0.58, 0.58, 0.58, 0, 0).

SLIDE 19

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

The first singular value (9.64) reflects the variance (‘spread’) on the v1 axis.

SLIDE 20

• $A = U \Sigma V^T$ - example (same matrices as on Slide 10):

UΣ gives the coordinates of the points on the projection axes.
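A small check of the statement that UΣ holds the coordinates of the points: since V has orthonormal columns, projecting the rows of A onto the concept vectors (A V) gives the same numbers as UΣ. Variable names are assumptions.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2                               # rank of the example matrix

coords = U[:, :r] * s[:r]           # U Σ: one row of concept coordinates per user
coords_via_V = A @ Vt[:r, :].T      # A V: rows of A projected onto v1, v2

print(np.round(np.abs(coords), 2))
```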

SLIDE 21

More details

• Q: How exactly is dim. reduction done? (Example: the matrices A, U and Σ from Slide 10.)

SLIDE 25

More details

• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero:

$$
A \approx
\begin{bmatrix} 0.18 \\ 0.36 \\ 0.18 \\ 0.90 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\times
\begin{bmatrix} 9.64 \end{bmatrix}
\times
\begin{bmatrix} 0.58 & 0.58 & 0.58 & 0 & 0 \end{bmatrix}
$$

SLIDE 26

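More details

• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero

$$
A =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\approx
B =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$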
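The step from A to B can be sketched as follows: zero out all but the largest singular value and multiply the factors back together. The variable names and the choice `k = 1` (matching this slide) are assumptions of the example.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                               # number of singular values to keep
s_trunc = s.copy()
s_trunc[k:] = 0.0                   # set the smallest singular values to zero

B = U @ np.diag(s_trunc) @ Vt
print(np.round(B, 2))
```

For this matrix the Romance block disappears entirely, because the SciFi and Romance blocks are exactly the two rank-1 terms of the decomposition.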

SLIDE 27

• Theorem: Let $A = U \Sigma V^T$ ($\sigma_1 \ge \sigma_2 \ge \dots$, rank(A) = n)

then $B = U S V^T$

S = diagonal n x n matrix where $s_i = \sigma_i$ (i = 1…k) else $s_i = 0$

is a best rank-k approximation to A:

B is a solution to $\min_B \|A - B\|_F$ where rank(B) = k

• Why?

$$
\min_{B,\ \mathrm{rank}(B)=k} \|A - B\|_F^2
= \min_{S} \|\Sigma - S\|_F^2
= \min_{s_i} \sum_{i=1}^{n} (\sigma_i - s_i)^2
$$

$$
= \min_{s_i} \left( \sum_{i=1}^{k} (\sigma_i - s_i)^2 + \sum_{i=k+1}^{n} \sigma_i^2 \right)
= \sum_{i=k+1}^{n} \sigma_i^2
$$
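The last equality of the argument can be checked on the ratings example: the squared Frobenius error of the truncated SVD should equal the sum of the squared singular values that were dropped. The helper name `best_rank_k` is hypothetical, and the sketch only verifies the error formula; it does not prove optimality.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

def best_rank_k(A, k):
    """Return (B, s): B = U S V^T with s_i = sigma_i for i < k, else 0."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_k = np.where(np.arange(len(s)) < k, s, 0.0)
    return U @ np.diag(s_k) @ Vt, s

for k in range(3):
    B, s = best_rank_k(A, k)
    err_sq = np.linalg.norm(A - B, "fro") ** 2   # squared Frobenius error
    tail = np.sum(s[k:] ** 2)                    # sum of dropped sigma_i^2
    print(k, round(err_sq, 6), round(tail, 6))
```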

SLIDE 28

Equivalent: ‘spectral decomposition’ of the matrix:

$$
A =
\begin{bmatrix} u_1 & u_2 \end{bmatrix}
\times
\begin{bmatrix} \sigma_1 & \\ & \sigma_2 \end{bmatrix}
\times
\begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}
$$

(A is the users x movies ratings matrix from Slide 10.)

SLIDE 29

Equivalent: ‘spectral decomposition’ of the matrix:

$$
A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots
$$

A is n x m; each $u_i$ is n x 1 and each $v_i^T$ is 1 x m
r terms; assume: $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots$
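The sum-of-rank-1-terms view can be written out directly with outer products. A minimal sketch on the ratings example, with `r = 2` taken from its rank; names are assumptions.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2                                # rank of the example matrix

# One n x m rank-1 matrix per concept: sigma_i * u_i * v_i^T.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(r)]
A_sum = sum(terms)

print(np.round(terms[0], 2))         # the SciFi block
print(np.round(terms[1], 2))         # the Romance block
```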

SLIDE 30

Q: How many σs to keep?
A: Rule of thumb: keep 80-90% of ‘energy’ ($= \sum_i \sigma_i^2$)

$$
A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots
$$

assume: $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots$
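One way to apply the energy rule of thumb is to take the smallest k whose cumulative share of $\sum_i \sigma_i^2$ reaches the chosen fraction. The function name `num_components_for_energy` and the thresholds used below are assumptions for illustration.

```python
import numpy as np

def num_components_for_energy(singular_values, fraction):
    """Smallest k such that the top-k singular values hold `fraction` of the energy."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / np.sum(energy)
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(singular_values))

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

s = np.linalg.svd(A, compute_uv=False)
share_first = s[0] ** 2 / np.sum(s ** 2)      # energy share of sigma_1
k80 = num_components_for_energy(s, 0.80)
print(round(float(share_first), 3), k80)
```

In the ratings example the first singular value alone holds about 77% of the energy, so an 80% threshold keeps both concepts.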

SLIDE 31

• To compute SVD:

$O(nm^2)$ or $O(n^2 m)$ (whichever is less)

• But:

Less work, if we just want singular values
or if we want first k singular vectors
or if the matrix is sparse

• Implemented:

Linear algebra packages like: LINPACK, Matlab, SPlus, Mathematica ...
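As one concrete instance of the "just want singular values" case, NumPy's `np.linalg.svd` accepts `compute_uv=False` and then returns only the singular values. NumPy is not one of the packages named on the slide; it is used here as an assumption for illustration. Truncated and sparse solvers are not shown.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# Singular values only: U and V^T are not returned.
s_only = np.linalg.svd(A, compute_uv=False)
print(np.round(s_only, 2))
```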

SLIDE 32

• SVD: $A = U \Sigma V^T$: unique

U: user-to-concept similarities
V: movie-to-concept similarities
Σ: strength of each concept

• Dimensionality reduction:

keep the few largest singular values (80-90% of ‘energy’)

SVD: picks up linear correlations

SLIDE 33

• SVD gives us:

$A = U \Sigma V^T$

• Eigen-decomposition:

$A = X \Lambda X^T$
A is symmetric
U, V, X are orthonormal ($U^T U = I$),
Λ, Σ are diagonal

• What is:

$A A^T = U \Sigma V^T (U \Sigma V^T)^T = U \Sigma V^T (V \Sigma^T U^T) = U \Sigma \Sigma^T U^T$
$A^T A = V \Sigma^T U^T (U \Sigma V^T) = V \Sigma \Sigma^T V^T$

Both products have the form $X \Lambda X^T$. So, $\lambda_i = \sigma_i^2$
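The relation $\lambda_i = \sigma_i^2$ is easy to confirm numerically. The sketch below compares the eigenvalues of $A^T A$ and $A A^T$ with the squared singular values of the ratings matrix. `np.linalg.eigvalsh` is used because both products are symmetric; variable names are assumptions.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

s = np.linalg.svd(A, compute_uv=False)

# eigvalsh returns eigenvalues in ascending order; reverse to match sigma ordering.
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]   # 5 eigenvalues
eig_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]   # 7 eigenvalues, the extra ones are 0

print(np.round(eig_AtA, 4))
print(np.round(s ** 2, 4))
```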

SLIDE 34

• $A A^T = U \Sigma^2 U^T$
• $A^T A = V \Sigma^2 V^T$
• $(A^T A)^k = V \Sigma^{2k} V^T$

E.g.: $(A^T A)^2 = V \Sigma^2 V^T V \Sigma^2 V^T = V \Sigma^4 V^T$

• $(A^T A)^k \sim v_1 \sigma_1^{2k} v_1^T$ for k >> 1
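The dominance of the first term for large k is what makes power iteration work. The sketch below first checks that $(A^T A)^k / \sigma_1^{2k}$ approaches $v_1 v_1^T$, then recovers $v_1$ by repeated multiplication. The values `k = 20`, 50 iterations and the all-ones start vector are assumptions chosen for this small example.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

M = A.T @ A
_, s, Vt = np.linalg.svd(A, full_matrices=False)
sigma1, v1 = s[0], Vt[0]

# (A^T A)^k scaled by sigma_1^(2k) should approach the rank-1 matrix v1 v1^T.
k = 20
Mk_scaled = np.linalg.matrix_power(M, k) / sigma1 ** (2 * k)
rank1 = np.outer(v1, v1)

# Power iteration: multiply by M and renormalize until the direction settles.
x = np.ones(M.shape[0])
for _ in range(50):
    x = M @ x
    x /= np.linalg.norm(x)

print(np.round(x, 2))
```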

SLIDE 35

Q: Find users that like ‘Matrix’ and ‘Alien’

(Example: the users x movies decomposition $A = U \Sigma V^T$ from Slide 10.)

SLIDE 36

Q: Find users that like ‘Matrix’ and ‘Alien’
A: Map query into a ‘concept space’ – how?

(Example: the users x movies decomposition $A = U \Sigma V^T$ from Slide 10.)

SLIDE 37

Q: Find users that like ‘Matrix’
A: Map query vectors into ‘concept space’ – how?

$q = \begin{bmatrix} 5 & 0 & 0 & 0 & 0 \end{bmatrix}$ (columns: Matrix, Alien, Serenity, Casablanca, Amelie)

Project into concept space: inner product with each ‘concept’ vector vi

[Figure: q, v1 and v2 drawn in the plane with axes Matrix and Alien]

SLIDE 38

Q: Find users that like ‘Matrix’
A: Map query vectors into ‘concept space’ – how?

$q = \begin{bmatrix} 5 & 0 & 0 & 0 & 0 \end{bmatrix}$ (columns: Matrix, Alien, Serenity, Casablanca, Amelie)

Project into concept space: inner product with each ‘concept’ vector vi

[Figure: q, v1 and v2 drawn in the plane with axes Matrix and Alien; the projection of q onto v1 is labeled q*v1]

SLIDE 39

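Compactly, we have: $q_{concept} = q V$. E.g.:

$$
\begin{bmatrix} 5 & 0 & 0 & 0 & 0 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0 \\
0.58 & 0 \\
0.58 & 0 \\
0 & 0.71 \\
0 & 0.71
\end{bmatrix}
=
\begin{bmatrix} 2.9 & 0 \end{bmatrix}
$$

q: ratings for (Matrix, Alien, Serenity, Casablanca, Amelie); V: movie-to-concept similarities; 2.9 is the SciFi-concept coordinate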

SLIDE 40

How would the user (‘Alien’, ‘Serenity’) be handled? $d_{concept} = d V$. E.g.:

$$
\begin{bmatrix} 0 & 4 & 5 & 0 & 0 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0 \\
0.58 & 0 \\
0.58 & 0 \\
0 & 0.71 \\
0 & 0.71
\end{bmatrix}
=
\begin{bmatrix} 5.22 & 0 \end{bmatrix}
$$

d: ratings for (Matrix, Alien, Serenity, Casablanca, Amelie); V: movie-to-concept similarities; 5.22 is the SciFi-concept coordinate

SLIDE 41

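Observation: User (‘Alien’, ‘Serenity’) will be retrieved by query (‘Matrix’), although this user did not rate ‘Matrix’!

$d = \begin{bmatrix} 0 & 4 & 5 & 0 & 0 \end{bmatrix} \rightarrow d_{concept} = \begin{bmatrix} 5.22 & 0 \end{bmatrix}$
$q = \begin{bmatrix} 5 & 0 & 0 & 0 & 0 \end{bmatrix} \rightarrow q_{concept} = \begin{bmatrix} 2.9 & 0 \end{bmatrix}$

(columns of d and q: Matrix, Alien, Serenity, Casablanca, Amelie; first concept coordinate: SciFi-concept)

d and q share no rated movie, yet both have a non-zero SciFi-concept coordinate, so their similarity in concept space is non-zero.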
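The query example can be worked through in a few lines. The sketch below maps q and d into concept space with the rounded V from the example and compares cosine similarity before and after the mapping. Using cosine similarity as the retrieval score is an assumption; the slides only state that the user will be retrieved.

```python
import numpy as np

# Movie-to-concept matrix V (rounded values from the example).
# Rows: Matrix, Alien, Serenity, Casablanca, Amelie. Columns: SciFi, Romance.
V = np.array([[0.58, 0.00],
              [0.58, 0.00],
              [0.58, 0.00],
              [0.00, 0.71],
              [0.00, 0.71]])

q = np.array([5.0, 0.0, 0.0, 0.0, 0.0])   # query: likes 'Matrix'
d = np.array([0.0, 4.0, 5.0, 0.0, 0.0])   # user: rated 'Alien' and 'Serenity'

q_concept = q @ V
d_concept = d @ V

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_movie_space = cosine(q, d)                     # no movie in common
sim_concept_space = cosine(q_concept, d_concept)   # both point along SciFi

print(q_concept, d_concept, sim_movie_space, sim_concept_space)
```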

SLIDE 42

+ Optimal low-rank approximation:
in L2 norm
- Interpretability problem:
A singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
Singular vectors are dense

[Figure: A = U Σ V^T]

SLIDE 43

• Goal:

Make $\|A - CUR\|_F$ small

[Figure: A ≈ C · U · R]

Frobenius norm:

$\|X\|_F = \sqrt{\sum_{ij} X_{ij}^2}$

SLIDE 44

• Goal:

Make $\|A - CUR\|_F$ small

U: pseudo-inverse of the intersection of C and R

[Figure: A ≈ C · U · R]

Frobenius norm:

$\|X\|_F = \sqrt{\sum_{ij} X_{ij}^2}$

SLIDE 45

• Let:

$A_k$ be the “best” rank k approximation to A (i.e., SVD)

Theorem [Drineas et al.]: CUR in O(mn) time achieves

$\|A - CUR\|_F \le \|A - A_k\|_F + \varepsilon \|A\|_F$

with probability at least $1 - \delta$, by picking

$O(k \log(1/\delta) / \varepsilon^2)$ columns, and
$O(k^2 \log^3(1/\delta) / \varepsilon^6)$ rows

SLIDE 46

• Sample columns (similarly for rows):

SLIDE 47

• Let W be the “intersection” of sampled columns C and rows R

Let SVD of $W = X \Sigma Y^T$

• Then:

$U = W^+ = Y \Sigma^+ X^T$

$\Sigma^+$: reciprocals of non-zero singular values: $\Sigma^+_{ii} = 1 / \Sigma_{ii}$

$W^+$ is the Moore–Penrose pseudoinverse of W

[Figure: A ≈ C · U · R, where W is the intersection of C and R and U = W+]
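A minimal CUR sketch following the construction above: C and R are actual columns and rows of A, W is their intersection, and U is the pseudoinverse of W built from its SVD. Several things here are assumptions rather than the lecture's exact algorithm: the helper names are hypothetical, sampling uses probabilities proportional to squared column/row norms, and the sampled columns and rows are not rescaled. The reconstruction check uses hand-picked indices so that W has the same rank as A.

```python
import numpy as np

def sample_indices(A, count, axis, rng):
    """Sample indices with replacement, probability proportional to squared norm.
    axis=0 samples columns, axis=1 samples rows (assumed sampling scheme)."""
    sq_norms = (A ** 2).sum(axis=axis)
    return rng.choice(len(sq_norms), size=count, replace=True,
                      p=sq_norms / sq_norms.sum())

def cur_from_indices(A, rows, cols):
    """Build C, U, R from chosen row and column indices (no rescaling)."""
    C = A[:, cols]
    R = A[rows, :]
    W = A[np.ix_(rows, cols)]                 # intersection of C and R
    X, sigma, Yt = np.linalg.svd(W, full_matrices=False)
    # Reciprocals of the non-zero singular values (tolerance is an assumption).
    sigma_plus = np.array([1.0 / v if v > 1e-10 else 0.0 for v in sigma])
    U = Yt.T @ np.diag(sigma_plus) @ X.T      # W+ = Y Sigma+ X^T
    return C, U, R

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# Hand-picked indices: one SciFi and one Romance row/column, so rank(W) = rank(A).
rows, cols = [3, 5], [0, 3]
C, U, R = cur_from_indices(A, rows, cols)
A_hat = C @ U @ R
print(np.round(A_hat, 2))

# Random sampling as it might be used on a large matrix.
rng = np.random.default_rng(0)
sampled_cols = sample_indices(A, 4, axis=0, rng=rng)
sampled_rows = sample_indices(A, 4, axis=1, rng=rng)
```

With these indices the product C U R reproduces A exactly; with random sampling the result is in general only an approximation.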

SLIDE 48

+ Easy interpretation
Since the basis vectors are actual columns and rows
+ Sparse basis
Since the basis vectors are actual columns and rows
- Duplicate columns and rows
Columns of large norms will be sampled many times

[Figure: a singular vector compared with an actual column]

SLIDE 49

• If we want to get rid of the duplicates:

Throw them away
Scale the columns/rows by the square root of the number of duplicates

[Figure: A with sampled Cd, Rd (with duplicates) reduced to Cs, Rs]

Construct a small U

SLIDE 50

SVD: $A = U \Sigma V^T$

A: huge but sparse; U, $V^T$: big and dense; Σ: sparse and small

CUR: A = C U R

A: huge but sparse; C, R: big but sparse; U: dense but small

SLIDE 51

• DBLP bibliographic data

Author-to-conference big sparse matrix
$A_{ij}$: Number of papers published by author i at conference j
428K authors (rows), 3659 conferences (columns)
Very sparse

• Want to reduce dimensionality

How much time does it take?
What is the reconstruction error?
How much space do we need?

SLIDE 52

• Accuracy: 1 – relative sum squared error
• Space ratio:

#output matrix entries / #input matrix entries

• CPU time

[Figure: plots comparing SVD, CUR, and CUR without duplicates]

More details: Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM ’07.

SLIDE 53

• SVD is limited to linear projections:

Lower-dimensional linear projection that preserves Euclidean distances

• Non-linear methods: Isomap

Data lies on a nonlinear low-dim curve, aka manifold
Use the distance as measured along the manifold
How?
  Build adjacency graph
  Geodesic distance is graph distance
  SVD/PCA the graph pairwise distance matrix
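The three Isomap steps can be sketched with NumPy alone. This is an illustration under stated assumptions, not a reference implementation: the adjacency graph is a symmetrized k-nearest-neighbor graph, shortest paths use Floyd–Warshall (cubic time, fine only for small inputs), and the "SVD/PCA" step is carried out as classical multidimensional scaling on the geodesic distance matrix. The function name `isomap_sketch` and the half-circle test data are hypothetical.

```python
import numpy as np

def isomap_sketch(X, n_neighbors, n_components):
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    # Step 1: adjacency graph with edges to the n_neighbors nearest points.
    G = np.full((n, n), np.inf)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # skip self at index 0
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]                 # keep the graph symmetric
    np.fill_diagonal(G, 0.0)

    # Step 2: geodesic distance = shortest-path distance in the graph.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])

    # Step 3: classical MDS, i.e. eigen-decompose the double-centered
    # squared distance matrix and keep the top components.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Points on a half circle: a 1-dimensional manifold embedded in 2-D.
t = np.linspace(0.0, np.pi, 30)
X = np.column_stack([np.cos(t), np.sin(t)])
embedding = isomap_sketch(X, n_neighbors=2, n_components=1)[:, 0]
corr = abs(np.corrcoef(embedding, t)[0, 1])
print(round(float(corr), 4))
```

The 1-D embedding should follow the position along the arc almost perfectly, which a single linear projection of the half circle does not achieve.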

SLIDE 54

• Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
• J. Sun, Y. Xie, H. Zhang, C. Faloutsos, Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007.
• P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas, Intra- and interpopulation genotype reconstruction from tagging SNPs, Genome Research, 17(1), 96-107 (2007).
• M. W. Mahoney, M. Maggioni, and P. Drineas, Tensor-CUR Decompositions For Tensor-Based Data, Proc. 12-th Annual SIGKDD, 327-336 (2006).