SLIDE 1

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

http://cs246.stanford.edu

Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

SLIDE 2

• Often, our data can be represented by an m-by-n matrix
• And this matrix can be closely approximated by the product of three matrices that share a small common dimension r

[Diagram: A (m×n) ≈ U (m×r) · S (r×r) · Vᵀ (r×n)]

SLIDE 3

• Compress / reduce dimensionality:
  - 10⁶ rows; 10³ columns; no updates
  - Random access to any cell(s); small error: OK

[Example matrix:
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1 ]

Note: The above matrix is really "2-dimensional": all rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1].

New representation, one coefficient pair per row: [1 0], [2 0], [1 0], [5 0], [0 2], [0 3], [0 1]

SLIDE 4

There are hidden, or latent, factors - latent dimensions - that, to a close approximation, explain why the values are as they appear in the data matrix.

SLIDE 5

The axes of these dimensions can be chosen as follows:
  - The first dimension is the direction in which the points exhibit the greatest variance
  - The second dimension is the direction, orthogonal to the first, in which the points show the 2nd greatest variance
  - And so on, until you have enough dimensions that the remaining variance is really low

SLIDE 6

• Q: What is the rank of a matrix A?
• A: The number of linearly independent rows of A
• Cloud of points in 3D space:
  - Think of the point coordinates as a matrix, one row per point: A, B, C
• We can rewrite the coordinates more efficiently!
  - Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
  - New basis vectors: [1 2 1], [-2 -3 1]
  - Then A has new coordinates [1 0], B: [0 1], C: [1 -1]
  - Notice: We reduced the number of coordinates per point!
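To make the change of basis concrete, here is a minimal NumPy sketch. The three 3-D points are assumptions inferred from the stated new coordinates (A = [1 0], B = [0 1], C = [1 -1]), since the slide's figure is not reproduced in this transcript:

  import numpy as np

  # New basis vectors from the slide
  basis = np.array([[ 1.0,  2.0, 1.0],
                    [-2.0, -3.0, 1.0]])

  # Points consistent with the stated new coordinates (assumed; the slide's
  # figure is not in this transcript): A = b1, B = b2, C = b1 - b2
  points = np.array([[ 1.0,  2.0, 1.0],    # A
                     [-2.0, -3.0, 1.0],    # B
                     [ 3.0,  5.0, 0.0]])   # C

  # Solve coords @ basis = points for each point's 2-D coordinates
  coords, *_ = np.linalg.lstsq(basis.T, points.T, rcond=None)
  print(np.round(coords.T, 6))   # -> [[1, 0], [0, 1], [1, -1]]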

SLIDE 7

• The goal of dimensionality reduction is to discover the axes of the data!

Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (its position along the red line in the figure). By doing this we incur a bit of error, as the points do not exactly lie on the line.

SLIDE 8

SLIDE 9

• SVD gives a decomposition of any matrix into a product of three matrices
• There are strong constraints on the form of each of these matrices
  - Results in a unique decomposition
• From this decomposition, you can choose any number r of intermediate concepts (latent factors) in a way that minimizes the reconstruction error

[Diagram: A (m×n) ≈ U (m×r) · S (r×r) · Vᵀ (r×n)]

SLIDE 10

• A: Input data matrix
  - m × n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  - m × r matrix (m documents, r concepts)
• S: Singular values
  - r × r diagonal matrix (strength of each 'concept'); r = rank of the matrix A
• V: Right singular vectors
  - n × r matrix (n terms, r concepts)

[Diagram: A (m×n) ≈ U (m×r) · S (r×r) · Vᵀ (r×n)]
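A minimal NumPy sketch of these shapes and properties, using the users-to-movies ratings matrix that appears on the following slides (only the standard numpy.linalg.svd API is assumed):

  import numpy as np

  A = np.array([[1, 1, 1, 0, 0],
                [3, 3, 3, 0, 0],
                [4, 4, 4, 0, 0],
                [5, 5, 5, 0, 0],
                [0, 2, 0, 4, 4],
                [0, 0, 0, 5, 5],
                [0, 1, 0, 2, 2]], dtype=float)

  U, s, Vt = np.linalg.svd(A, full_matrices=False)   # "thin" SVD
  print(U.shape, s.shape, Vt.shape)   # (7, 5) (5,) (5, 5)
  print(np.round(s, 1))               # -> [12.5  9.5  1.3  0.  0.]: rank(A) = 3
                                      # (the slides round these to 12.4, 9.5, 1.3)
  print(np.allclose(U.T @ U, np.eye(5)))    # True: U is column-orthonormal
  print(np.allclose(Vt @ Vt.T, np.eye(5)))  # True: V is column-orthonormal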

SLIDE 11

A ≈ σ₁ u₁ v₁ᵀ + σ₂ u₂ v₂ᵀ + …

  σᵢ … scalar
  uᵢ … vector
  vᵢ … vector

If we set σ₂ = 0, then the green columns (the σ₂ term) may as well not exist.

SLIDE 12

It is always possible to decompose a real matrix A into A = U S Vᵀ, where
• U, S, V: unique
• U, V: column orthonormal
  - UᵀU = I; VᵀV = I (I: identity matrix)
  - (Columns are orthogonal unit vectors)
• S: diagonal
  - Entries (singular values) are non-negative and sorted in decreasing order (σ₁ ≥ σ₂ ≥ … ≥ 0)

Nice proof of uniqueness: https://www.cs.cornell.edu/courses/cs322/2008sp/stuff/TrefethenBau_Lec4_SVD.pdf

SLIDE 13

• Consider a matrix. What does SVD do?

Ratings matrix A: each column is a movie, each row a user; the first 4 users prefer SciFi, the others Romance:

             Matrix  Alien  Serenity  Casablanca  Amelie
  SciFi        1       1       1          0         0
  SciFi        3       3       3          0         0
  SciFi        4       4       4          0         0
  SciFi        5       5       5          0         0
  Romance      0       2       0          4         4
  Romance      0       0       0          5         5
  Romance      0       1       0          2         2

A = U S Vᵀ, where the factors expose the "concepts", AKA latent dimensions, AKA latent factors.

SLIDE 14

• A = U S Vᵀ - example: Users to Movies

Columns of A: Matrix, Alien, Serenity, Casablanca, Amelie. Rows: SciFi users first, then Romance users.

  A                U                    S                 Vᵀ
  1 1 1 0 0        0.13  0.02 -0.01
  3 3 3 0 0        0.41  0.07 -0.03     12.4  0    0      0.56  0.59  0.56  0.09  0.09
  4 4 4 0 0    =   0.55  0.09 -0.04  x   0    9.5  0   x  0.12 -0.02  0.12 -0.69 -0.69
  5 5 5 0 0        0.68  0.11 -0.05      0    0    1.3    0.40 -0.80  0.40  0.09  0.09
  0 2 0 4 4        0.15 -0.59  0.65
  0 0 0 5 5        0.07 -0.73 -0.67
  0 1 0 2 2        0.07 -0.29  0.32

SLIDE 15

• A = U S Vᵀ - example: Users to Movies

[Same factorization as above.] The first column of U and first row of Vᵀ capture the SciFi-concept; the second pair captures the Romance-concept.

SLIDE 16

• A = U S Vᵀ - example:

[Same factorization as above.] U is the "user-to-concept" factor matrix; its two leading columns correspond to the SciFi-concept and the Romance-concept.

SLIDE 17

• A = U S Vᵀ - example:

[Same factorization as above.] The first diagonal entry of S, 12.4, is the "strength" of the SciFi-concept.

SLIDE 18

• A = U S Vᵀ - example:

[Same factorization as above.] V is the "movie-to-concept" factor matrix; its first column (first row of Vᵀ) is the SciFi-concept.

SLIDE 19

Movies, users and concepts:
• U: user-to-concept matrix
• V: movie-to-concept matrix
• S: its diagonal elements give the 'strength' of each concept

SLIDE 20

SLIDE 21

[Figure: user points plotted against Movie 1 rating and Movie 2 rating, with v₁, the first right singular vector, drawn through them]

• Instead of using two coordinates (x, y) to describe point positions, let's use only one coordinate
• A point's position is then its location along the vector v₁

SLIDE 22

• A = U S Vᵀ - example:
  - U: "user-to-concept" matrix
  - V: "movie-to-concept" matrix

[Figure: the same factorization as above, plus the points plotted in the (Movie 1 rating, Movie 2 rating) plane with v₁, the first right singular vector, drawn through them]

SLIDE 23

• A = U S Vᵀ - example:

[Same factorization and figure as above.] The first singular value, 12.4, captures the variance ('spread') on the v₁ axis.

SLIDE 24

A = U S Vᵀ - example:
• U S gives the coordinates of the points in the projection axes

  U S:
   1.61  0.19 -0.01
   5.08  0.66 -0.03
   6.82  0.85 -0.05
   8.43  1.04 -0.06
   1.86 -5.60  0.84
   0.86 -6.93 -0.87
   0.86 -2.75  0.41

The first column is the projection of the users on the "Sci-Fi" axis (v₁, the first right singular vector).
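A quick NumPy check of this projection, reusing A and the thin SVD (U, s) from the slide-10 sketch:

  coords = U[:, :3] * s[:3]    # U S with the 3 nonzero concepts; one row per user
  print(np.round(coords, 2))   # first column close to the slide's
                               # ±[1.61, 5.08, 6.82, 8.43, 1.86, 0.86, 0.86]
                               # (singular-vector signs are arbitrary, rounding differs)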

SLIDE 25

More details
• Q: How is dimensionality reduction done?

[Same factorization A = U S Vᵀ as above]

SLIDE 26

More details
• Q: How exactly is dimensionality reduction done?
• A: Set the smallest singular values to zero

[Same factorization A = U S Vᵀ as above]

SLIDE 27

More details
• Q: How exactly is dimensionality reduction done?
• A: Set the smallest singular values to zero

[Same factorization as above, now with ≈ in place of =: the smallest singular value, 1.3, is zeroed out]

SLIDE 28

More details
• Q: How exactly is dimensionality reduction done?
• A: Set the smallest singular values to zero

[Same factorization as above with σ₃ = 1.3 zeroed out]

This is the Rank-2 approximation to A. We could also do a Rank-1 approximation. The larger the rank, the more accurate the approximation.

SLIDE 29

More details
• Q: How exactly is dimensionality reduction done?
• A: Set the smallest singular values to zero

  A                U                S             Vᵀ
  1 1 1 0 0        0.13  0.02
  3 3 3 0 0        0.41  0.07      12.4  0        0.56  0.59  0.56  0.09  0.09
  4 4 4 0 0    ≈   0.55  0.09   x   0    9.5   x  0.12 -0.02  0.12 -0.69 -0.69
  5 5 5 0 0        0.68  0.11
  0 2 0 4 4        0.15 -0.59
  0 0 0 5 5        0.07 -0.73
  0 1 0 2 2        0.07 -0.29

This is the Rank-2 approximation to A: σ₃, the third column of U, and the third row of Vᵀ have been dropped. We could also do a Rank-1 approximation. The larger the rank, the more accurate the approximation.

SLIDE 30

More details
• Q: How exactly is dimensionality reduction done?
• A: Set the smallest singular values to zero

  A                 B (reconstructed data matrix)
  1 1 1 0 0          0.92  0.95  0.92  0.01  0.01
  3 3 3 0 0          2.91  3.01  2.91 -0.01 -0.01
  4 4 4 0 0    ≈     3.90  4.04  3.90  0.01  0.01
  5 5 5 0 0          4.82  5.00  4.82  0.03  0.03
  0 2 0 4 4          0.70  0.53  0.70  4.11  4.11
  0 0 0 5 5         -0.69  1.34 -0.69  4.78  4.78
  0 1 0 2 2          0.32  0.23  0.32  2.01  2.01

Reconstruction error is quantified by the Frobenius norm:
  ‖M‖_F = √(Σᵢⱼ Mᵢⱼ²)
Here ‖A − B‖_F = √(Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)²) is "small".

This is the Rank-2 approximation to A. We could also do a Rank-1 approximation. The larger the rank, the more accurate the approximation.
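A minimal sketch of this truncation, reusing A from slide 13:

  import numpy as np

  A = np.array([[1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0],
                [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2]], dtype=float)
  U, s, Vt = np.linalg.svd(A, full_matrices=False)

  k = 2                                   # keep the 2 strongest concepts
  B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # rank-2 reconstruction
  print(np.round(B, 2))                   # matches the reconstructed matrix above
  print(round(np.linalg.norm(A - B, 'fro'), 1))   # ≈ 1.3, the dropped singular value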

SLIDE 31

• Fact: SVD gives the 'best' axes to project on:
  - 'best' = minimizing the sum of reconstruction errors

[Diagram: the full SVD A = U S Vᵀ versus B = U S Vᵀ with the smallest singular values set to zero]

B is the best approximation of A at that rank, where the error is
  ‖A − B‖_F = √(Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)²)

SLIDE 32

• SVD: A = U S Vᵀ: unique
  - U: user-to-concept factors
  - V: movie-to-concept factors
  - S: strength of each concept
• Q: So what's a good value for r (the number of latent factors)?
• Let the energy of a set of singular values be the sum of their squares
• Pick r so the retained singular values have at least 90% of the total energy
• Back to our example:
  - With singular values 12.4, 9.5, and 1.3, total energy = 245.7
  - If we drop 1.3, whose square is only 1.7, we are left with energy 244, or over 99% of the total
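A small sketch of this energy rule (assuming, as with NumPy's output, that the singular values are sorted in decreasing order):

  import numpy as np

  def choose_rank(singular_values, energy_frac=0.90):
      # Smallest r whose leading singular values retain >= energy_frac of total energy
      energy = np.cumsum(np.square(singular_values))
      return int(np.searchsorted(energy, energy_frac * energy[-1]) + 1)

  print(choose_rank(np.array([12.4, 9.5, 1.3])))   # -> 2 (retains 244/245.7 ≈ 99%)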

SLIDE 33

SLIDE 34

• How do we actually compute the SVD?
• First we need a method for finding the principal eigenvalue (the largest one) and the corresponding eigenvector of a symmetric matrix
  - M is symmetric if mⱼₖ = mₖⱼ for all j and k
• Method (power iteration):
  - Start with any "guess eigenvector" x₀
  - Construct x_{l+1} = M xₗ / ‖M xₗ‖ for l = 0, 1, …
  - ‖…‖ denotes the Frobenius norm (for a vector, its length)
  - Stop when consecutive xₗ show little change

SLIDE 35

  M = [1 2; 2 3],   x₀ = [1; 1]

  M x₀ / ‖M x₀‖ = [3; 5] / √34 = [0.51; 0.86] = x₁
  M x₁ / ‖M x₁‖ = [2.23; 3.60] / √17.93 = [0.53; 0.85] = x₂
  …
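A minimal NumPy sketch of this iteration, reproducing the numbers above (it also computes the eigenvalue via λ = xᵀMx, anticipating the next slide):

  import numpy as np

  def power_iteration(M, tol=1e-9, max_iter=1000):
      # Principal eigenvector of a symmetric M by repeated normalized multiplication
      x = np.ones(M.shape[0])
      x /= np.linalg.norm(x)
      for _ in range(max_iter):
          x_next = M @ x
          x_next /= np.linalg.norm(x_next)
          if np.linalg.norm(x_next - x) < tol:
              break
          x = x_next
      return x_next

  M = np.array([[1.0, 2.0],
                [2.0, 3.0]])
  x = power_iteration(M)
  print(np.round(x, 2))        # ≈ [0.53, 0.85]
  print(round(x @ M @ x, 2))   # eigenvalue ≈ 4.24 (the slide gets 4.25 from the rounded vector)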

SLIDE 36

• Once you have the principal eigenvector x, you find its eigenvalue λ by λ = xᵀ M x
  - In proof: we know λx = Mx if λ is the eigenvalue; multiply both sides by xᵀ on the left
  - Since xᵀx = 1, we have λ = xᵀ M x
• Example: if we take xᵀ = [0.53 0.85], then

  λ = [0.53 0.85] [1 2; 2 3] [0.53; 0.85] = 4.25

SLIDE 37

• Eliminate the portion of the matrix M that can be generated by the first eigenpair, λ and x:
  M* := M − λ x xᵀ
• Recursively find the principal eigenpair for M*, eliminate the effect of that pair, and so on
• Example:

  M* = [1 2; 2 3] − 4.25 [0.53; 0.85] [0.53 0.85] = [−0.19 0.09; 0.09 −0.07]
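A sketch of the deflation step, continuing the power-iteration code above:

  lam = x @ M @ x                      # eigenvalue of the principal eigenvector x
  M_star = M - lam * np.outer(x, x)    # M* = M - lambda x x^T
  print(np.round(M_star, 2))           # ≈ [[-0.17, 0.11], [0.11, -0.07]]
                                       # (slide shows [-0.19 0.09; 0.09 -0.07]
                                       #  from the rounded x and lambda)

  # Iterating on M* finds the next eigenpair. Note: its dominant eigenvalue is
  # negative here, so the iterates flip sign each step and the stopping test
  # never fires; the returned direction and Rayleigh quotient are still correct.
  x2 = power_iteration(M_star)
  print(round(x2 @ M_star @ x2, 2))    # second eigenvalue ≈ -0.24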

SLIDE 38

• Start by supposing A = U S Vᵀ
• Aᵀ = (U S Vᵀ)ᵀ = (Vᵀ)ᵀ Sᵀ Uᵀ = V S Uᵀ
  - Why? (1) Rule for the transpose of a product; (2) the transpose of a transpose and the transpose of a diagonal matrix are both identity operations
• Aᵀ A = V S Uᵀ U S Vᵀ = V S² Vᵀ
  - Why? U is column-orthonormal, so UᵀU is an identity matrix
  - Also note that S² is a diagonal matrix whose j-th element is the square of the j-th element of S
• Aᵀ A V = V S² Vᵀ V = V S²
  - Why? V is also column-orthonormal

SLIDE 39

• Starting with Aᵀ A V = V S²
  - Note that therefore the j-th column of V is an eigenvector of AᵀA, and its eigenvalue is the j-th element of S²
• Thus, we can find V and S by finding the eigenpairs of AᵀA
  - Once we have the eigenvalues in S², we can find the singular values by taking the square root of these eigenvalues
• By a symmetric argument, A Aᵀ gives us U
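A sketch of this route to the SVD, checked against numpy.linalg.svd (np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order, so they are re-sorted here):

  import numpy as np

  A = np.array([[1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0],
                [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2]], dtype=float)

  evals, V = np.linalg.eigh(A.T @ A)     # eigenpairs of A^T A (ascending order)
  order = np.argsort(evals)[::-1]        # re-sort to match the SVD convention
  evals, V = evals[order], V[:, order]

  s = np.sqrt(np.clip(evals, 0.0, None))   # singular values = sqrt of eigenvalues
  print(np.round(s, 1))                    # ≈ [12.5  9.5  1.3  0.  0.]
  print(np.allclose(s, np.linalg.svd(A, compute_uv=False), atol=1e-6))   # True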

SLIDE 40

• To compute the full SVD using specialized methods:
  - O(nm²) or O(n²m) (whichever is less)
• But:
  - Less work if we just want the singular values,
  - or if we want only the first k singular vectors,
  - or if the matrix is sparse
• Implemented in linear algebra packages like
  - LINPACK, Matlab, SPlus, Mathematica, ...

SLIDE 41

SLIDE 42

• Q: Find users that like 'Matrix'
• A: Map the query into a 'concept space' - how?

[Same factorization A = U S Vᵀ as above, with SciFi/Romance user groups and movies Matrix, Alien, Serenity, Casablanca, Amelie]

SLIDE 43

• Q: Find users that like 'Matrix'
• A: Map the query into a 'concept space' - how?

A query vector over the movies (Matrix, Alien, Serenity, Casablanca, Amelie):
  q = [5 0 0 0 0]

Project into concept space: take the inner product with each 'concept' vector vᵢ

SLIDE 44

• Q: Find users that like 'Matrix'
• A: Map the query into a 'concept space' - how?

[Figure: q = [5 0 0 0 0] plotted in the (Matrix, Alien) plane, with its projection q·v₁ onto the first concept axis v₁]

Project into concept space: take the inner product with each 'concept' vector vᵢ

SLIDE 45

Compactly, we have: q_concept = q V. E.g.:

  q = [5 0 0 0 0]   (Matrix, Alien, Serenity, Casablanca, Amelie)

  movie-to-concept factors V (2 concepts kept):
    0.56  0.12
    0.59 -0.02
    0.56  0.12
    0.09 -0.69
    0.09 -0.69

  q_concept = q V = [2.8  0.6]   (SciFi-concept: 2.8; Romance-concept: 0.6)

SLIDE 46

• How would the user d that rated ('Alien', 'Serenity') be handled? d_concept = d V. E.g.:

  d = [0 4 5 0 0]   (Matrix, Alien, Serenity, Casablanca, Amelie)

  d_concept = d V = [5.2  0.4]   (using the same movie-to-concept factors V as above)
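A sketch of this mapping, reusing A from slide 13. Cosine similarity is an assumed choice for comparing the two concept vectors; the slides only state that the similarity is greater than zero:

  import numpy as np

  A = np.array([[1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0],
                [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2]], dtype=float)
  _, _, Vt = np.linalg.svd(A, full_matrices=False)
  V = Vt[:2].T                      # movie-to-concept factors, 2 concepts kept

  q = np.array([5.0, 0, 0, 0, 0])   # rated 'Matrix' only
  d = np.array([0.0, 4, 5, 0, 0])   # rated 'Alien' and 'Serenity' only

  q_c, d_c = q @ V, d @ V           # map into concept space
  print(np.round(q_c, 1), np.round(d_c, 1))   # ≈ ±[2.8 0.6] and ±[5.2 0.4]
                                              # (signs and rounding may differ)
  cos = (q_c @ d_c) / (np.linalg.norm(q_c) * np.linalg.norm(d_c))
  print(round(cos, 2))   # close to 1: similar users, despite zero ratings in common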

SLIDE 47

• Observation: User d that rated ('Alien', 'Serenity') will be similar to user q that rated ('Matrix'), although d and q have zero ratings in common!

  q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]
  d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]

Zero ratings in common, but similarity > 0: both users load mostly on the SciFi-concept.

SLIDE 48

+ Optimal low-rank approximation in terms of Frobenius norm
− Interpretability problem:
  - A singular vector specifies a linear combination of all input columns or rows
− Lack of sparsity:
  - Singular vectors are dense!

[Diagram: A = U S Vᵀ with dense factors U, S, Vᵀ]

SLIDE 49

SLIDE 50

• It is common for the matrix A that we wish to decompose to be very sparse
• But U and V from an SVD decomposition will not be sparse
• CUR decomposition solves this problem by using only (randomly chosen) rows and columns of A

SLIDE 51

• Goal: Express A as a product of matrices C, U, R:
  make ‖A − C · U · R‖_F small
• "Constraints" on C and R: they are built from actual columns and rows of A

[Diagram: A ≈ C · U · R]

Frobenius norm: ‖X‖_F = √(Σᵢⱼ Xᵢⱼ²)

SLIDE 52

• Goal: Express A as a product of matrices C, U, R:
  make ‖A − C · U · R‖_F small
• "Constraints" on C and R as above; U is built from the pseudo-inverse of the intersection of C and R

[Diagram: A ≈ C · U · R]

Frobenius norm: ‖X‖_F = √(Σᵢⱼ Xᵢⱼ²)

SLIDE 53

• Let W be the "intersection" of the sampled columns C and rows R
• Def: W⁺ is the pseudoinverse:
  - Let the SVD of W be W = X Z Yᵀ
  - Then: W⁺ = Y Z⁺ Xᵀ
  - Z⁺: reciprocals of the non-zero singular values: Z⁺ᵢᵢ = 1/Zᵢᵢ
• Let: U = Y (Z⁺)² Xᵀ

Why the intersection? These are high-magnitude numbers.
Why does the pseudoinverse work? If W = X Z Yᵀ, then W⁻¹ = (Yᵀ)⁻¹ Z⁻¹ X⁻¹. Due to orthonormality, X⁻¹ = Xᵀ and Y⁻¹ = Yᵀ; since Z is diagonal, Z⁻¹ᵢᵢ = 1/Zᵢᵢ. Thus, if W is nonsingular, the pseudoinverse is the true inverse.

[Diagram: A with sampled columns C, sampled rows R, and their intersection W]
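A sketch of building the middle matrix U from the intersection W, following the slide's formula U = Y (Z⁺)² Xᵀ; the 2×2 W below is a hypothetical example, not taken from the deck:

  import numpy as np

  def cur_middle(W):
      # U = Y (Z+)^2 X^T, where W = X Z Y^T is the SVD of the intersection block
      X, z, Yt = np.linalg.svd(W, full_matrices=False)
      # Reciprocals of the non-zero singular values (zeros stay zero)
      z_plus = np.divide(1.0, z, out=np.zeros_like(z), where=z > 1e-12)
      return Yt.T @ np.diag(z_plus ** 2) @ X.T

  W = np.array([[4.0, 0.0],      # hypothetical intersection of sampled columns/rows
                [1.0, 3.0]])
  print(cur_middle(W))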

SLIDE 54

• To decrease the expected error between A and its decomposition, we must pick rows and columns in a nonuniform manner
• The importance of a row or column of A is the square of its Frobenius norm
  - That is, the sum of the squares of its elements
• When picking rows and columns, the probabilities must be proportional to importance
• Example: [3,4,5] has importance 50, and [3,0,1] has importance 10, so pick the first 5 times as often as the second

SLIDE 55

• Sampling columns (similarly for rows):

[The algorithm box from this slide is not reproduced in this transcript; see the sketch below.]

Note: this is a randomized algorithm; the same column can be sampled more than once.
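Below is a sketch of the column-sampling step the missing box describes, based on slide 54's importance weighting. The 1/√(c·P(j)) rescaling of each sampled column is the usual convention in the CUR literature and is an assumption here, since this transcript does not spell it out:

  import numpy as np

  def column_select(A, c, seed=0):
      # Sample c columns with replacement, with probability proportional to
      # squared column norm (the column's "importance"); rescale to keep the
      # estimate unbiased.
      rng = np.random.default_rng(seed)
      importance = (A ** 2).sum(axis=0)          # sum of squares of each column
      p = importance / importance.sum()          # sampling probabilities
      idx = rng.choice(A.shape[1], size=c, p=p)  # the same column may repeat
      return A[:, idx] / np.sqrt(c * p[idx]), idx

  A = np.array([[1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0],
                [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2]], dtype=float)
  C, idx = column_select(A, c=2)
  print(idx, C.shape)   # two sampled column indices; C is 7 x 2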

SLIDE 56

• Rough and imprecise intuition behind CUR:
  - CUR is more likely to pick points away from the origin
  - Assuming smooth data with no outliers, these are the directions of maximum variation
• Example: Assume we have 2 point clouds at an angle
  - SVD dimensions are orthogonal and thus will lie in the middle of the two clouds
  - CUR will find the two clouds (but will be redundant)

[Figure: a singular vector between the clouds vs. actual columns along each cloud]

SLIDE 57

• For example:
  - Select c = O(k log k / ε²) columns of A using the ColumnSelect algorithm (slide 55)
  - Select r = O(k log k / ε²) rows of A using the RowSelect algorithm (the same procedure, applied to rows)
  - Set U = Y (Z⁺)² Xᵀ (slide 53)
• Then, with probability 98%:

  ‖A − C U R‖_F  ≤  (2 + ε) ‖A − Aₖ‖_F

  (left: CUR error; right: SVD error, where Aₖ is the best rank-k approximation)

In practice: pick 4k columns/rows for a "rank-k" approximation.

SLIDE 58

+ Easy interpretation
  - Since the basis vectors are actual columns and rows
+ Sparse basis
  - Since the basis vectors are actual columns and rows
− Duplicate columns and rows
  - Columns of large norms will be sampled many times

[Figure: singular vector vs. actual column]

SLIDE 59

SVD: A = U S Vᵀ
  A: huge but sparse; U, S, Vᵀ: big and dense

CUR: A = C U R
  A: huge but sparse; C: big but sparse; U: dense but small; R: sparse and small

SLIDE 60

• DBLP bibliographic data
  - Author-to-conference big sparse matrix
  - Aᵢⱼ: number of papers published by author i at conference j
  - 428K authors (rows), 3659 conferences (columns)
  - Very sparse
• Want to reduce dimensionality
  - How much time does it take?
  - What is the reconstruction error?
  - How much space do we need?

SLIDE 61

• Accuracy:
  - 1 − relative sum of squared errors
• Space ratio:
  - #output matrix entries / #input matrix entries
• CPU time

[Plots comparing SVD, CUR, and CUR with no duplicates on these three metrics]

Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM '07.