SLIDE 1

compsci 514: algorithms for data science

Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 14.

SLIDE 2

logistics

  • Midterm grades are on Moodle.
  • Average was 32.67, median 33, standard deviation 6.8.
  • Come to office hours if you would like to see your exam/discuss solutions.

SLIDE 3

summary

Last Few Weeks: Low-Rank Approximation and PCA

  • Compress data that lies close to a k-dimensional subspace.
  • Equivalent to finding a low-rank approximation of the data matrix X: X ≈ XVV^T.
  • Optimal solution via PCA (eigendecomposition of X^T X or, equivalently, SVD of X).

This Class: Non-linear dimensionality reduction.

  • How do we compress data that does not lie close to a k-dimensional subspace?
  • Spectral methods (SVD and eigendecomposition) are still key techniques in this setting.
  • Spectral graph theory, spectral clustering.
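
To make the linear case concrete, here is a minimal numpy sketch (not part of the original slides) of the rank-k approximation X ≈ XVV^T via the SVD; the matrix sizes and noise level below are made up for illustration.

```python
import numpy as np

# Hypothetical data: n points in d dimensions lying near a k-dimensional subspace.
n, d, k = 500, 50, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # exactly rank k
X += 0.01 * rng.standard_normal((n, d))                        # plus a little noise

# SVD of X: the top-k right singular vectors (columns of Vk) span the best-fit subspace.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T                                                  # d x k, orthonormal columns

# Optimal rank-k approximation: project the rows of X onto that subspace, X ≈ X Vk Vk^T.
X_k = X @ Vk @ Vk.T
print("relative Frobenius error:", np.linalg.norm(X - X_k) / np.linalg.norm(X))
```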

SLIDE 7

entity embeddings

End of Last Class: Embedding objects other than vectors into Euclidean space.

  • Documents (for topic-based search and classification)
  • Words (to identify synonyms, translations, etc.)
  • Nodes in a social network

Usual Approach: Convert each item into a high-dimensional feature vector and then apply low-rank approximation.


SLIDE 12

example: latent semantic analysis

  • If the error ∥X − YZ^T∥_F is small, then on average, X_{i,a} ≈ (YZ^T)_{i,a} = ⟨y_i, z_a⟩.
  • I.e., ⟨y_i, z_a⟩ ≈ 1 when doc_i contains word_a.
  • If doc_i and doc_j both contain word_a, then ⟨y_i, z_a⟩ ≈ ⟨y_j, z_a⟩ = 1.

SLIDE 16

example: latent semantic analysis

If doc_i and doc_j both contain word_a, ⟨y_i, z_a⟩ ≈ ⟨y_j, z_a⟩ = 1.

Another View: Each column of Y represents a 'topic'. y_i(j) indicates how much doc_i belongs to topic j, and z_a(j) indicates how much word_a associates with that topic.
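
A small numpy sketch of the LSA factorization on a made-up term-document count matrix (not from the slides). The toy documents and vocabulary are invented, and the split Y = U_k Σ_k, Z = V_k is one conventional choice (a later slide places Σ_k differently); either way X ≈ YZ^T and X_{i,a} ≈ ⟨y_i, z_a⟩.

```python
import numpy as np

# Toy term-document matrix: rows = documents, columns = words (hypothetical counts).
# Docs 0-1 are about sports, docs 2-3 about cooking.
#              ball  game  team  oven  recipe  flour
X = np.array([[  2,    1,    1,    0,     0,     0],   # doc 0
              [  1,    2,    1,    0,     0,     0],   # doc 1
              [  0,    0,    0,    1,     2,     1],   # doc 2
              [  0,    0,    0,    2,     1,     1]],  # doc 3
             dtype=float)

k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# One conventional choice (an assumption here): Y = U_k Sigma_k holds document
# embeddings, Z = V_k holds word embeddings, so X ≈ Y Z^T and X[i, a] ≈ <y_i, z_a>.
Y = U[:, :k] * s[:k]        # n x k document embeddings
Z = Vt[:k].T                # d x k word embeddings

print("doc 0 · doc 1 (same topic):     ", round(Y[0] @ Y[1], 2))
print("doc 0 · doc 2 (different topic):", round(Y[0] @ Y[2], 2))
print("X[0, 0] vs <y_0, z_ball>:       ", X[0, 0], round(Y[0] @ Z[0], 2))
```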

SLIDE 18

example: latent semantic analysis

  • Just like with documents, z_a and z_b will tend to have a high dot product if word_a and word_b appear in many of the same documents.
  • In an SVD decomposition we set Z = Σ_k V_k^T.
  • The columns of V_k are equivalently the top k eigenvectors of XX^T. The eigendecomposition of XX^T is XX^T = VΣ²V^T.
  • What is the best rank-k approximation of XX^T? I.e., arg min over rank-k B of ∥XX^T − B∥_F.
  • XX^T ≈ V_k Σ_k² V_k^T = ZZ^T.
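
The same embedding can be computed directly from the similarity matrix XX^T by eigendecomposition. A sketch, with two assumptions flagged: rows of X index words (so XX^T counts shared documents, matching the word embedding slide that follows), and Z = V_k Σ_k (the transpose of the ordering written above), so that the rows of Z are word embeddings and ZZ^T is the best rank-k approximation of XX^T.

```python
import numpy as np

# Hypothetical word-document incidence matrix: rows = words, columns = documents,
# so S = X X^T counts, for each word pair, how many documents they share.
rng = np.random.default_rng(1)
X = (rng.random((30, 200)) < 0.1).astype(float)   # 30 words, 200 documents
S = X @ X.T                                       # 30 x 30 word-word similarity matrix

# Eigendecomposition of the symmetric PSD matrix S (eigh returns ascending eigenvalues).
k = 5
evals, evecs = np.linalg.eigh(S)
Vk = evecs[:, -k:]                # top-k eigenvectors of XX^T
Lk = evals[-k:]                   # top-k eigenvalues (= squared singular values of X)

# Word embeddings Z = V_k Sigma_k: then Z Z^T = V_k Sigma_k^2 V_k^T is the best
# rank-k approximation of XX^T, as on the slide.
Z = Vk * np.sqrt(Lk)
err = np.linalg.norm(S - Z @ Z.T) / np.linalg.norm(S)
print(f"relative error of rank-{k} approximation of XX^T: {err:.3f}")
```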

SLIDE 24

example: word embedding

LSA gives a way of embedding words into k-dimensional space.

  • Embedding is via low-rank approximation of XX^T, where (XX^T)_{a,b} is the number of documents that both word_a and word_b appear in.
  • Think of XX^T as a similarity matrix (Gram matrix, kernel matrix) with entry (a, b) being the similarity between word_a and word_b.
  • Many ways to measure similarity: the number of sentences both occur in, the number of times both appear in the same window of w words, appearing in similar positions of documents in different languages, etc.
  • Replacing XX^T with these different metrics (sometimes appropriately transformed) leads to popular word embedding algorithms: word2vec, GloVe, fastText, etc.

SLIDE 28

example: word embedding

Note: word2vec is typically described as a neural-network method, but it is really just low-rank approximation of a specific similarity matrix. See "Neural word embedding as implicit matrix factorization", Levy and Goldberg.
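
A rough sketch of that observation in code (not from the slides): build a positive pointwise mutual information (PPMI) matrix from word-word co-occurrence counts and factor it with an SVD. The random counts and the PPMI/symmetric-split choices are illustrative; the exact word2vec (skip-gram with negative sampling) objective corresponds to a shifted PMI matrix, per Levy and Goldberg.

```python
import numpy as np

# Hypothetical symmetric word-word co-occurrence counts C (e.g., from a sliding window).
rng = np.random.default_rng(2)
C = rng.poisson(1.0, size=(40, 40)).astype(float)
C = C + C.T                                   # symmetrize

# Positive PMI: PPMI[a, b] = max(0, log( P(a, b) / (P(a) P(b)) )).
total = C.sum()
Pab = C / total
Pa = C.sum(axis=1) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(Pab / np.outer(Pa, Pa))
ppmi = np.maximum(pmi, 0.0)
ppmi[~np.isfinite(ppmi)] = 0.0                # zero counts -> treat as 0

# Rank-k factorization of the PPMI matrix; rows of W are the word embeddings.
k = 10
U, s, _ = np.linalg.svd(ppmi)
W = U[:, :k] * np.sqrt(s[:k])                 # split the singular values symmetrically
print("embedding matrix shape:", W.shape)
```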

SLIDE 30

similarity via graphs

A common way of encoding similarity is via a graph, e.g., a k-nearest neighbor graph.

  • Connect items to similar items, possibly with higher-weight edges when they are more similar.

Is this set of points compressible? Does it lie close to a low-dimensional subspace?
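
A minimal sketch of building a k-nearest-neighbor graph from raw points (not from the slides); the two-ring point set and k = 5 are made up for illustration.

```python
import numpy as np

# Hypothetical point set: two concentric rings -- data that is not close to any
# low-dimensional subspace, but whose structure a k-NN graph captures well.
rng = np.random.default_rng(3)
def ring(m, radius):
    t = rng.uniform(0, 2 * np.pi, m)
    pts = np.c_[radius * np.cos(t), radius * np.sin(t)]
    return pts + 0.05 * rng.standard_normal(pts.shape)
P = np.vstack([ring(100, 1.0), ring(100, 3.0)])
n = len(P)

# Symmetric k-nearest-neighbor adjacency matrix: connect each point to its k closest.
k = 5
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
np.fill_diagonal(D2, np.inf)                          # no self loops
A = np.zeros((n, n))
nbrs = np.argsort(D2, axis=1)[:, :k]                  # indices of the k nearest neighbors
for i in range(n):
    A[i, nbrs[i]] = 1.0
A = np.maximum(A, A.T)                                # symmetrize (undirected graph)
print("number of edges:", int(A.sum()) // 2)
```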

SLIDE 35

linear algebraic representation of a graph

Once we have connected n data points x_1, ..., x_n into a graph, we can represent that graph by its (weighted) adjacency matrix A ∈ R^{n×n}, with A_{i,j} = edge weight between nodes i and j.

In the LSA example, when X is the term-document matrix, X^T X is like an adjacency matrix: word_a and word_b are connected if they appear in at least one document together (the edge weight is the number of documents they appear in together).

SLIDE 38

normalized adjacency matrix

What is the sum of entries in the ith column of A? The (weighted) degree of vertex i.

Often, A is normalized as Ā = D^{-1/2} A D^{-1/2}, where D is the degree matrix.

Spectral graph theory is the field of representing graphs as matrices and applying linear algebraic techniques.
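
A small numeric sketch of this normalization (not from the slides); the 4-node weighted graph is made up, and the code assumes every vertex has nonzero degree.

```python
import numpy as np

# A small weighted adjacency matrix and its degree matrix D.
A = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
deg = A.sum(axis=1)                       # weighted degrees (row = column sums)
D = np.diag(deg)

# Normalized adjacency matrix A_bar = D^{-1/2} A D^{-1/2}.
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # assumes no isolated vertices
A_bar = D_inv_sqrt @ A @ D_inv_sqrt
print("eigenvalues of A_bar (always within [-1, 1]):",
      np.round(np.linalg.eigvalsh(A_bar), 3))
```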

SLIDE 44

adjacency matrix eigenvectors

How do we compute an optimal low-rank approximation of A?

  • Project onto the top k eigenvectors of A^T A = A². These are just the eigenvectors of A.

SLIDE 47

adjacency matrix eigenvectors

  • Similar vertices (close with regard to graph proximity) should have similar embeddings, i.e., V_k(i) should be similar to V_k(j).

SLIDE 49

spectral embedding
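
A sketch of such a spectral embedding on a made-up graph with two planted communities (not from the slides); the edge probabilities and the dense eigh call are illustrative choices.

```python
import numpy as np

# Hypothetical graph with two planted communities: dense within, sparse across.
rng = np.random.default_rng(4)
n = 100
labels = np.repeat([0, 1], n // 2)
prob = np.where(labels[:, None] == labels[None, :], 0.20, 0.02)  # edge probabilities
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1); A = A + A.T                                   # symmetric, no self loops

# Spectral embedding: the coordinates of vertex i are the ith entries of the
# top-k eigenvectors of A.
k = 2
evals, evecs = np.linalg.eigh(A)          # ascending eigenvalues
Vk = evecs[:, -k:]                        # n x k embedding, one row per vertex

# The eigenvector with the second-largest eigenvalue typically splits the two
# communities by sign (the leading one is roughly constant / degree-driven).
u = Vk[:, 0]
print("community 0 mean coordinate:", u[labels == 0].mean().round(3))
print("community 1 mean coordinate:", u[labels == 1].mean().round(3))
```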

SLIDE 50

spectral clustering

A very common task, aside from just embedding points via graph-based similarity and SVD, is to partition or cluster vertices based on this similarity.

Non-linearly separable data.

SLIDE 52

spectral clustering

A very common task, aside from just embedding points via graph-based similarity and SVD, is to partition or cluster vertices based on this similarity.

Community detection in naturally occurring networks.

SLIDE 53

cut minimization

Simple Idea: Partition clusters along a minimum cut in the graph. Small cuts are often not informative.

Solution: Encourage cuts that separate large sections of the graph.

  • Let v ∈ R^n represent a cut: v(i) = 1 if i ∈ S and v(i) = −1 if i ∈ T. We want v to have roughly equal numbers of 1s and −1s, i.e., v^T 1 ≈ 0.

SLIDE 57

the laplacian view

For a graph with adjacency matrix A and degree matrix D, L = D − A is the graph Laplacian. For any vector v,

    v^T L v = v^T D v − v^T A v = Σ_{i=1}^n d(i) v(i)² − Σ_{i=1}^n Σ_{j=1}^n A(i, j) · v(i) · v(j).

SLIDE 59

the laplacian view

For a cut indicator vector v ∈ {−1, 1}^n with v(i) = −1 for i ∈ S and v(i) = 1 for i ∈ T:

    v^T L v = Σ_{(i,j)∈E} (v(i) − v(j))² = 4 · cut(S, T).

So minimizing v^T L v over cut indicator vectors minimizes the cut size:

    arg min_{v ∈ {−1,1}^n} v^T L v.

Relaxing to unit-norm real vectors gives

    arg min_{v ∈ R^n, ∥v∥=1} v^T L v.

By the Courant-Fischer theorem, the relaxed minimizer v is the smallest eigenvector of L = D − A (the eigenvector with the smallest eigenvalue).
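
A quick numerical check of the identities above on a made-up random graph (not from the slides): v^T L v equals the sum of (v(i) − v(j))² over edges, which is 4 · cut(S, T) for a ±1 cut vector.

```python
import numpy as np

# Small random undirected graph (sizes and edge probability are arbitrary).
rng = np.random.default_rng(5)
n = 50
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T
D = np.diag(A.sum(axis=1))
L = D - A                                      # graph Laplacian

v = rng.choice([-1.0, 1.0], size=n)            # random cut: S = {i : v(i) = -1}
quad = v @ L @ v
edges = np.transpose(np.nonzero(np.triu(A)))   # each undirected edge listed once
edge_sum = sum((v[i] - v[j]) ** 2 for i, j in edges)
cut = sum(1 for i, j in edges if v[i] != v[j]) # number of edges crossing the cut
print(quad, edge_sum, 4 * cut)                 # all three values should agree
```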

SLIDE 63

smallest laplacian eigenvector

We have

    v_n = (1/√n) · 1 = arg min_{v ∈ R^n, ∥v∥=1} v^T L v,    with v_n^T L v_n = 0.
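
A two-line check on a made-up 4-node graph (not from the slides): every row of L = D − A sums to zero, so the normalized all-ones vector is an eigenvector with eigenvalue 0 and achieves v^T L v = 0.

```python
import numpy as np

# Tiny example graph (made up).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A
n = len(A)
v_n = np.ones(n) / np.sqrt(n)     # the normalized all-ones vector
print(L @ v_n)                    # ~ all zeros: L annihilates the all-ones direction
print(v_n @ L @ v_n)              # ~ 0, the minimum possible since L is PSD
```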

SLIDE 65

second smallest laplacian eigenvector

By Courant-Fischer, the second smallest eigenvector is obtained greedily:

    v_1 = arg min_{v ∈ R^n, ∥v∥=1} v^T L v
    v_2 = arg min_{v ∈ R^n, ∥v∥=1, v^T v_1 = 0} v^T L v

If v_2 were binary, in {−1, 1}^n, the orthogonality condition would ensure that there are an equal number of vertices on each side of the cut. When v_2 ∈ R^n, it enforces a 'relaxed' version of this constraint.

SLIDE 68

cutting with the second laplacian eigenvector

Find a good partition of the graph by computing

    v_2 = arg min_{v ∈ R^n, ∥v∥=1, v^T 1 = 0} v^T L v.

Set S to be all nodes with v_2(i) < 0 and T to be all nodes with v_2(i) ≥ 0.

The Shi-Malik normalized cuts algorithm is a commonly used variant of this approach, using the normalized Laplacian D^{-1/2} L D^{-1/2}.
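
A sketch of this partitioning step (the unnormalized version, not Shi-Malik) on a made-up two-community graph; for large graphs a sparse eigensolver such as scipy.sparse.linalg.eigsh would replace the dense eigh.

```python
import numpy as np

# Made-up graph with two planted communities.
rng = np.random.default_rng(6)
n = 100
truth = np.repeat([0, 1], n // 2)
prob = np.where(truth[:, None] == truth[None, :], 0.20, 0.02)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1); A = A + A.T

# Second-smallest eigenvector of L = D - A (the Fiedler vector), then threshold at 0.
L = np.diag(A.sum(axis=1)) - A
evals, evecs = np.linalg.eigh(L)         # ascending eigenvalues
v2 = evecs[:, 1]                         # second-smallest eigenvector

side = (v2 < 0).astype(int)              # S = vertices with v2(i) < 0, T = the rest
acc = max((side == truth).mean(), (side != truth).mean())   # agreement up to label swap
print("cluster sizes:", int(side.sum()), int(n - side.sum()),
      "| agreement with planted communities:", round(acc, 2))
```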

SLIDE 72

laplacian embedding

The smallest eigenvectors of L = D − A give the orthogonal 'functions' that are smoothest over the graph, i.e., they minimize

    v^T L v = Σ_{(i,j)∈E} (v(i) − v(j))².

Embedding points with coordinates given by [v_{n−1}(j), v_{n−2}(j), ..., v_{n−k}(j)] ensures that points connected by edges end up close in Euclidean distance.

  • Laplacian Eigenmaps
  • Locally linear embedding
  • Isomap
  • Etc.
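
A sketch of a Laplacian-eigenmap-style embedding using the k eigenvectors with the smallest nonzero eigenvalues (not from the slides); the ring graph is a stand-in for a real k-NN similarity graph.

```python
import numpy as np

# Made-up graph: a ring of n vertices, each joined to its two neighbors.
n, k = 30, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

# Coordinates from the k smallest nonzero eigenvectors of L (skip the all-ones one).
L = np.diag(A.sum(axis=1)) - A
evals, evecs = np.linalg.eigh(L)                  # ascending eigenvalues
coords = evecs[:, 1:k + 1]                        # n x k embedding

# Vertices joined by an edge get nearby coordinates (the ring embeds as a circle).
neighbor_d = np.linalg.norm(coords[1:] - coords[:-1], axis=1).mean()
all_pairs_d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1).mean()
print("mean embedded distance between neighbors:", neighbor_d.round(3))
print("mean embedded distance over all pairs:   ", all_pairs_d.round(3))
```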

SLIDE 76

Questions?