

  1. CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

  2.  High-dimensional == many features
      Find concepts/topics/genres:
      Documents. Features: thousands of words, millions of word pairs
      Surveys. Netflix: 480k users x 177k movies
     1/25/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets

  3.  Compress / reduce dimensionality:
      10^6 rows; 10^3 columns; no updates
      Random access to any cell(s); small error: OK

  4.  Assumption: data lies on or near a low d-dimensional subspace
      Axes of this subspace are an effective representation of the data

  5. Why reduce dimensions?
      Discover hidden correlations/topics: words that occur commonly together
      Remove redundant and noisy features: not all words are useful
      Interpretation and visualization
      Easier storage and processing of the data

  6. A [m x n] = U [m x r] Σ [r x r] (V [n x r])^T
      A: input data matrix; m x n (e.g., m documents, n terms)
      U: left singular vectors; m x r (m documents, r concepts)
      Σ: singular values; r x r diagonal matrix (strength of each 'concept'); r = rank of A
      V: right singular vectors; n x r (n terms, r concepts)

  7. [Diagram: A (m x n) factored as U (m x r) times Σ (r x r) times V^T (r x n)]

  8. Equivalently, A is a sum of rank-1 matrices: A = σ1 u1 v1^T + σ2 u2 v2^T + ...
      σi: scalar (singular value); ui: vector (left singular vector); vi: vector (right singular vector)
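This rank-1 expansion is easy to verify numerically. A minimal NumPy sketch, using the user-movie ratings matrix from the example slides below (the variable names are my own):

```python
import numpy as np

# User-movie ratings matrix from the running example
# (columns: Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of the rank-1 terms sigma_i * outer(u_i, v_i).
recon = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(recon, A)
```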

  9. It is always possible to decompose a real matrix A into A = U Σ V^T, where:
      U, Σ, V: unique
      U, V: column orthonormal: U^T U = I; V^T V = I (I: identity matrix); columns are orthogonal unit vectors
      Σ: diagonal; entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)

  10.  A = U Σ V^T example. Rows of A: users (the first four are SciFi fans, the last three Romance fans); columns: Matrix, Alien, Serenity, Casablanca, Amelie.

           1 1 1 0 0        0.18 0
           2 2 2 0 0        0.36 0
           1 1 1 0 0        0.18 0                         0.58 0.58 0.58 0    0
       A = 5 5 5 0 0   =    0.90 0     x  9.64 0      x    0    0    0    0.71 0.71
           0 0 0 2 2        0    0.53     0    5.29
           0 0 0 3 3        0    0.80
           0 0 0 1 1        0    0.27
                                U              Σ                     V^T
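NumPy recovers the same factors as the slide, up to sign and rounding; a quick sketch:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Two nonzero singular values: the 'strengths' of the two concepts.
print(np.round(s[:2], 2))            # approx [9.64, 5.29]
# First column of U (up to sign): user-to-SciFi-concept weights.
print(np.round(np.abs(U[:, 0]), 2))  # approx [0.18, 0.36, 0.18, 0.90, 0, 0, 0]
# First row of V^T (up to sign): the SciFi-concept axis over movies.
print(np.round(np.abs(Vt[0]), 2))    # approx [0.58, 0.58, 0.58, 0, 0]
```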

  11.  A = U Σ V^T example: the two concepts are a SciFi concept (Matrix, Alien, Serenity) and a Romance concept (Casablanca, Amelie). Decomposition as on the previous slide.

  12.  A = U Σ V^T example: U is the "user-to-concept" similarity matrix (decomposition as above).

  13.  A = U Σ V^T example: the singular values give the 'strength' of each concept: 9.64 for SciFi, 5.29 for Romance (decomposition as above).

  14.  A = U Σ V^T example: V is the "movie-to-concept" similarity matrix (decomposition as above).

  15.  A = U Σ V^T example: V is the "movie-to-concept" similarity matrix (slide repeated).

  16. 'Movies', 'users' and 'concepts':
      U: user-to-concept similarity matrix
      V: movie-to-concept similarity matrix
      Σ: its diagonal elements give the 'strength' of each concept

  17. SVD gives the best axis to project on: 'best' = minimum sum of squares of projection errors, i.e. minimum reconstruction error. [Figure: users plotted by Movie 1 rating vs. Movie 2 rating; the first singular vector v1 is the best projection axis.]

  18.  A = U Σ V^T example: v1, the first row of V^T (0.58 0.58 0.58 0 0), is the first projection axis (decomposition as above).

  19.  A = U Σ V^T example: σ1 = 9.64 measures the variance ('spread') of the data along the v1 axis (decomposition as above).

  20.  A = U Σ V^T example: U Σ gives the coordinates of the points along the projection axes (decomposition as above).
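The claim that U Σ holds the projected coordinates follows from A V = U Σ V^T V = U Σ. A small NumPy sketch checking this on the example matrix:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rows of U * Sigma are each user's coordinates in concept space.
coords = U @ np.diag(s)

# Equivalently: project the original rows onto the concept axes, A @ V.
assert np.allclose(coords, A @ Vt.T)
```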

  21. More details.  Q: How exactly is dimensionality reduction done? (Decomposition as above.)

  22. More details.  Q: How exactly is dimensionality reduction done?  A: Set the smallest singular values to zero (decomposition as above).

  23. More details.  Setting the smaller singular value σ2 = 5.29 to zero gives A ≈ U diag(9.64, 0) V^T.

  24. More details.  With σ2 zeroed, the approximation A ≈ U diag(9.64, 0) V^T remains (slide repeated with the approximation highlighted).

  25. More details.  Once a singular value is zeroed, the corresponding column of U, row of V^T, and entry of Σ can be dropped entirely, leaving a rank-1 product:

           1 1 1 0 0       0.18
           2 2 2 0 0       0.36
           1 1 1 0 0       0.18
       A = 5 5 5 0 0   ≈   0.90   x   9.64   x   0.58 0.58 0.58 0 0
           0 0 0 2 2       0
           0 0 0 3 3       0
           0 0 0 1 1       0
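The truncation step on these slides can be sketched in NumPy. Zeroing the tail of the singular values and dropping the corresponding columns/rows give the same rank-k matrix, but the second form is far cheaper to store:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
s_trunc = s.copy()
s_trunc[k:] = 0.0                # zero out the smallest singular values
B = U @ np.diag(s_trunc) @ Vt    # rank-k approximation of A

# Equivalent: keep only the first k columns of U, values of s, rows of Vt.
B2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.allclose(B, B2)
```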

  26. More details.  Q: How exactly is dimensionality reduction done?  A: Set the smallest singular values to zero. The resulting matrix B is close to A:

           1 1 1 0 0        1 1 1 0 0
           2 2 2 0 0        2 2 2 0 0
           1 1 1 0 0        1 1 1 0 0
       A = 5 5 5 0 0 ,  B = 5 5 5 0 0
           0 0 0 2 2        0 0 0 0 0
           0 0 0 3 3        0 0 0 0 0
           0 0 0 1 1        0 0 0 0 0

      Frobenius norm: ‖M‖_F = √(Σ_ij M_ij²); here ‖A − B‖_F = √(Σ_ij (A_ij − B_ij)²) is "small".

  27.  Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ ..., rank(A) = r), and let B = U S V^T, where S is the diagonal r x r matrix with s_i = σ_i for i = 1..k and s_i = 0 otherwise. Then B is a best rank-k approximation to A:
      B is a solution to min_B ‖A − B‖_F where rank(B) = k
      We will need two facts:
      ‖M‖_F² = Σ_i q_ii², where M = P Q R^T is the SVD of M (Q diagonal)
      U Σ V^T − U S V^T = U (Σ − S) V^T
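Both facts, and the error the theorem promises, can be checked numerically. A sketch on a random matrix: the squared Frobenius norm equals the sum of squared singular values, so the rank-k truncation's error is the square root of the discarded σ_i² (the Eckart-Young bound):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 5))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Fact: squared Frobenius norm == sum of squared singular values.
assert np.isclose(np.linalg.norm(M, 'fro') ** 2, np.sum(s ** 2))

# Rank-k truncation's error is sqrt(sum of the discarded sigma_i^2),
# the minimum achievable over all rank-k matrices.
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.isclose(np.linalg.norm(M - B, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```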

  28.  We will need two facts:
      ‖M‖_F² = Σ_i q_ii², where M = P Q R^T is the SVD of M
      U Σ V^T − U S V^T = U (Σ − S) V^T
      We apply these using: P column orthonormal; R row orthonormal; Q diagonal
