compsci 514: algorithms for data science



1. compsci 514: algorithms for data science
Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 14.

2. logistics
Exam/discuss solutions.
• Midterm grades are on Moodle.
• Average was 32.67, median 33, standard deviation 6.8.
• Come to office hours if you would like to see your exam.

3. summary
Last Few Weeks: Low-Rank Approximation and PCA.
• Compress data that lies close to a k-dimensional subspace.
• Equivalent to finding a low-rank approximation of the data matrix X: X ≈ XVV^T.
• Optimal solution via PCA (eigendecomposition of X^T X or, equivalently, SVD of X); a small numpy sketch follows this slide.
This Class: Non-linear dimensionality reduction.
• How do we compress data that does not lie close to a k-dimensional subspace?
• Spectral methods (SVD and eigendecomposition) are still key techniques in this setting.
• Spectral graph theory, spectral clustering.
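
A minimal numpy sketch of the compression step above, under illustrative assumptions (the data sizes, noise level, and k are made up): V holds the top k right singular vectors of X, equivalently the top k eigenvectors of X^T X, and the data is compressed to XV and reconstructed as XVV^T.

    import numpy as np

    # Toy data: n points in d dimensions lying near a k-dimensional subspace (assumed sizes).
    rng = np.random.default_rng(0)
    n, d, k = 500, 50, 5
    X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))  # exactly rank k
    X += 0.01 * rng.normal(size=(n, d))                    # plus a little noise

    # Top-k right singular vectors of X = top-k eigenvectors of X^T X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                                           # d x k

    X_compressed = X @ V            # n x k: the low-dimensional representation
    X_approx = X_compressed @ V.T   # n x d: the rank-k approximation X V V^T

    err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
    print(f"relative Frobenius error of the rank-{k} approximation: {err:.4f}")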

4. entity embeddings
End of Last Class: Embedding objects other than vectors into Euclidean space.
• Documents (for topic-based search and classification)
• Words (to identify synonyms, translations, etc.)
• Nodes in a social network
Usual Approach: Convert each item into a high-dimensional feature vector and then apply low-rank approximation (see the bag-of-words sketch below).
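
A small sketch of the usual approach for documents, with a made-up three-sentence corpus: each document becomes a binary bag-of-words feature vector, giving the document-word matrix X that the LSA slides below factor as X ≈ YZ^T.

    import numpy as np

    # Tiny corpus, invented for illustration.
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
    ]

    # Binary document-word matrix X (n docs x d words): X[i, a] = 1 if doc i contains word a.
    vocab = sorted({w for doc in docs for w in doc.split()})
    word_idx = {w: a for a, w in enumerate(vocab)}

    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for w in doc.split():
            X[i, word_idx[w]] = 1.0

    print(X.shape)  # (3, number of distinct words)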

5. example: latent semantic analysis

6. example: latent semantic analysis
• If the error ∥X − YZ^T∥_F is small, then on average, X_{i,a} ≈ (YZ^T)_{i,a} = ⟨y_i, z_a⟩ (checked numerically in the sketch below).
• I.e., ⟨y_i, z_a⟩ ≈ 1 when doc i contains word a.
• If doc i and doc j both contain word a, ⟨y_i, z_a⟩ ≈ ⟨y_j, z_a⟩ = 1.
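
A numerical check of the bullets above on toy data (the planted topic structure and all sizes are assumptions): Y and Z are taken from a truncated SVD of X, with Y = U_k and Z = V_k Σ_k as on the SVD slide below, and ⟨y_i, z_a⟩ is compared against X_{i,a}.

    import numpy as np

    # Toy document-word matrix with planted topic structure: X[i, a] = 1 if doc i contains word a.
    rng = np.random.default_rng(1)
    n_docs, n_words, n_topics = 200, 300, 5
    doc_topics = rng.random((n_docs, n_topics))
    word_topics = rng.random((n_words, n_topics))
    X = (doc_topics @ word_topics.T > 1.2).astype(float)

    # Truncated SVD: X ~ Y Z^T with Y = U_k (doc embeddings) and Z = V_k Sigma_k (word embeddings).
    k = n_topics
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Y = U[:, :k]
    Z = Vt[:k].T * S[:k]

    # When ||X - Y Z^T||_F is small, <y_i, z_a> tracks X[i, a]:
    # larger for words doc i contains, smaller for words it does not.
    approx = Y @ Z.T
    i = 0
    a_in = int(np.argmax(X[i]))    # a word doc 0 contains
    a_out = int(np.argmin(X[i]))   # a word doc 0 does not contain
    print(round(approx[i, a_in], 2), round(approx[i, a_out], 2))
    print("relative error:", round(np.linalg.norm(X - approx) / np.linalg.norm(X), 2))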

7. example: latent semantic analysis
If doc i and doc j both contain word a, ⟨y_i, z_a⟩ ≈ ⟨y_j, z_a⟩ = 1.
Another View: Each column of Y represents a 'topic'. y_i(j) indicates how much doc i belongs to topic j. z_a(j) indicates how much word a associates with that topic (illustrated in the sketch below).
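
To make the topics reading concrete, a sketch on a tiny hand-built matrix (vocabulary and documents are invented): column j of the word-embedding matrix Z scores every word against topic j, so printing the largest entries of each column surfaces that topic's words.

    import numpy as np

    # Invented vocabulary and binary document-word matrix: 4 "pet" docs and 3 "finance" docs.
    vocab = ["cat", "dog", "pet", "stock", "market", "price"]
    X = np.array([
        [1, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 0, 1, 1],
    ], dtype=float)

    # Word embeddings Z = V_k Sigma_k from the SVD of X; Z[a, j] is how much word a
    # associates with topic j.
    k = 2
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Z = Vt[:k].T * S[:k]

    for j in range(k):
        top = np.argsort(-np.abs(Z[:, j]))[:3]
        print(f"topic {j}:", [vocab[a] for a in top])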

8. example: latent semantic analysis
• Just like with documents, z_a and z_b will tend to have high dot product if word a and word b appear in many of the same documents.
• In an SVD decomposition we set Z = V_k Σ_k.
• The columns of V_k are, equivalently, the top k eigenvectors of X^T X. The eigendecomposition of X^T X is X^T X = V Σ^2 V^T.
• What is the best rank-k approximation of X^T X? I.e., arg min_{rank-k B} ∥X^T X − B∥_F.
• X^T X ≈ V_k Σ_k^2 V_k^T = ZZ^T (see the sketch below).
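
A sketch tying the bullets above together on toy data (random binary X, assumed sizes): the top k eigenvectors of X^T X are computed directly, Z is set to V_k Σ_k, and ZZ^T is compared against X^T X.

    import numpy as np

    # Toy binary document-word matrix (rows are documents).
    rng = np.random.default_rng(2)
    X = (rng.random((200, 300)) < 0.1).astype(float)
    k = 10

    # Eigendecomposition of the word-word matrix: X^T X = V Sigma^2 V^T.
    G = X.T @ X
    eigvals, eigvecs = np.linalg.eigh(G)                   # ascending order
    order = np.argsort(eigvals)[::-1]
    V_k = eigvecs[:, order[:k]]                            # top-k eigenvectors
    Sigma_k = np.sqrt(np.maximum(eigvals[order[:k]], 0))   # top-k singular values of X

    # Word embeddings Z = V_k Sigma_k, so Z Z^T = V_k Sigma_k^2 V_k^T,
    # the best rank-k approximation of X^T X.
    Z = V_k * Sigma_k

    err = np.linalg.norm(G - Z @ Z.T) / np.linalg.norm(G)
    print("relative error of the rank-k approximation of X^T X:", round(err, 3))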

9. example: word embedding
LSA gives a way of embedding words into k-dimensional space.
• Embedding is via low-rank approximation of X^T X, where (X^T X)_{a,b} is the number of documents that both word a and word b appear in.
• Think about X^T X as a similarity matrix (Gram matrix, kernel matrix) with entry (a, b) being the similarity between word a and word b.
• Many ways to measure similarity: the number of sentences both words occur in, the number of times both appear in the same window of w words, whether they appear in similar positions of documents in different languages, etc.
• Replacing X^T X with these different similarity metrics (sometimes appropriately transformed) leads to popular word embedding algorithms: word2vec, GloVe, fastText, etc. (a generic co-occurrence sketch follows below).
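
The last bullet gestures at word2vec/GloVe-style pipelines. Below is a hedged sketch of the general recipe only, not any specific algorithm: count word-word co-occurrences within a window of w words, apply a transform (here a simple log(1 + count); real methods use PMI-style weights and learned objectives), and take a low-rank factorization as the embedding. The corpus, window size, and transform are all illustrative choices.

    import numpy as np

    # Toy corpus and window size (illustrative).
    corpus = "the cat sat on the mat the dog sat on the log".split()
    w = 2

    vocab = sorted(set(corpus))
    idx = {word: a for a, word in enumerate(vocab)}
    d = len(vocab)

    # Count co-occurrences within w positions on either side.
    C = np.zeros((d, d))
    for pos, word in enumerate(corpus):
        for other in corpus[max(0, pos - w): pos + w + 1]:
            if other != word:
                C[idx[word], idx[other]] += 1

    # Transform the counts, then embed words using the top-k singular directions of the
    # transformed similarity matrix, just as with X^T X above.
    M = np.log1p(C)
    k = 2
    U, S, Vt = np.linalg.svd(M)
    Z = U[:, :k] * np.sqrt(S[:k])   # one row per word: its k-dimensional embedding

    print({word: np.round(Z[idx[word]], 2).tolist() for word in ["cat", "dog", "mat"]})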
