compsci 514: algorithms for data science
Cameron Musco, University of Massachusetts Amherst. Fall 2019.
Lecture 13
logistics
- Pass/Fail deadline is 10/29 for undergraduates and 10/31 for graduates. We will have your Problem Set 2 and midterm grades back before then.
- Will release Problem Set 3 next week, due ∼ 11/11.
- MAP Feedback:
  - Going to adjust a bit how I take questions in class.
  - Will try to more clearly identify important information (what will appear on exams or problem sets) vs. motivating examples.
  - Will try to use the iPad more to write out proofs in class.
1
summary
Last Few Classes: Low-Rank Approximation and PCA

- Discussed how to compress a dataset that lies close to a k-dimensional subspace.
- Optimal compression by projecting onto the top k eigenvectors of the covariance matrix X^T X (PCA).
- Saw how to calculate the error of the approximation by interpreting the spectrum of X^T X.

This Class: Low-rank approximation and its connection to the singular value decomposition.

- Show how PCA can be interpreted in terms of the singular value decomposition (SVD) of X.
- Applications to word embeddings, graph embeddings, document classification, recommendation systems.
2
review
Set Up: Assume that data points ⃗x1, . . . ,⃗xn lie close to a k-dimensional subspace V of R^d. Let X ∈ R^{n×d} be the data matrix. Let ⃗v1, . . . ,⃗vk be an orthonormal basis for V and V ∈ R^{d×k} be the matrix with these vectors as its columns.

- VV^T ∈ R^{d×d} is the projection matrix onto V.
- X ≈ X(VV^T): the closest approximation to X with rows in V.

⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: orthonormal basis for subspace V. V ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 3
review of last time
Low-Rank Approximation: Approximate X ≈ XVV^T.

- XVV^T is a rank-k matrix: all its rows fall in V.
- X's rows are approximately spanned by the columns of V.
- X's columns are approximately spanned by the columns of XV.

⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: orthonormal basis for subspace V. V ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 4
dual view of low-rank approximation
5
optimal low-rank approximation

Given ⃗x1, . . . ,⃗xn (the rows of X), we want to find a matrix V ∈ R^{d×k} with orthonormal columns (spanning a k-dimensional subspace V):

arg min_{orthonormal V ∈ R^{d×k}} ∥X − XVV^T∥_F^2 = arg max_{orthonormal V ∈ R^{d×k}} ∥XVV^T∥_F^2, where ∥XVV^T∥_F^2 = ∑_{i=1}^{n} ∥VV^T ⃗xi∥_2^2.

⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: orthonormal basis for subspace V. V ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 6
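To make the min/max equivalence concrete, here is a minimal NumPy sketch (mine, not from the slides; the data and dimensions are arbitrary). It checks that for any orthonormal V, ∥X∥_F^2 = ∥XVV^T∥_F^2 + ∥X − XVV^T∥_F^2, which is exactly why minimizing the error is the same as maximizing the norm of the projection.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 20, 5
X = rng.standard_normal((n, d))

# Random orthonormal V in R^{d x k}, via QR of a random matrix.
V, _ = np.linalg.qr(rng.standard_normal((d, k)))

proj = X @ V @ V.T                                  # rows projected onto span(V)
err = np.linalg.norm(X - proj, "fro") ** 2
kept = np.linalg.norm(proj, "fro") ** 2
total = np.linalg.norm(X, "fro") ** 2

print(np.isclose(err + kept, total))                # Pythagorean identity
# Row-by-row view: sum_i ||V V^T x_i||_2^2 equals ||X V V^T||_F^2.
print(np.isclose(kept, sum(np.linalg.norm(V @ V.T @ x) ** 2 for x in X)))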
optimal low-rank approximation

⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: orthonormal basis for subspace V. V ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 7
solution via eigendecomposition

The V minimizing the error ∥X − XVV^T∥_F^2 is given by:

arg max_{orthonormal V ∈ R^{d×k}} ∥XVV^T∥_F^2 = ∑_{i=1}^{k} ⃗vi^T X^T X ⃗vi.

Surprisingly, we can find the columns of V, ⃗v1, . . . ,⃗vk, greedily:

⃗v1 = arg max_{⃗v: ∥⃗v∥_2=1} ⃗v^T X^T X ⃗v,
⃗v2 = arg max_{⃗v: ∥⃗v∥_2=1, ⟨⃗v,⃗v1⟩=0} ⃗v^T X^T X ⃗v,
. . .
⃗vk = arg max_{⃗v: ∥⃗v∥_2=1, ⟨⃗v,⃗vj⟩=0 ∀j<k} ⃗v^T X^T X ⃗v.

These are exactly the top k eigenvectors of X^T X, by the Courant-Fischer principle.

⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: orthonormal basis for subspace V. V ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 8
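A small sketch of this recipe in NumPy (illustrative only; the synthetic data is an assumption): take the top-k eigenvectors of X^T X with eigh and confirm that the resulting subspace gives no more Frobenius error than a random orthonormal subspace of the same dimension.

import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 30, 5
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))   # generic data

# eigh returns eigenvalues of the symmetric matrix X^T X in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
Vk = eigvecs[:, -k:]                        # top-k eigenvectors as columns

def frob_err(V):
    return np.linalg.norm(X - X @ V @ V.T, "fro") ** 2

V_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random comparison subspace
print(frob_err(Vk) <= frob_err(V_rand) + 1e-8)           # PCA subspace is optimal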
eigendecomposition

Any symmetric matrix A can be decomposed as A = VΛV^T, where the columns of V are d orthonormal eigenvectors ⃗v1, . . . ,⃗vd. Typically we order the eigenvalues in decreasing order: λ1 ≥ λ2 ≥ . . . ≥ λd. When A = X^T X, all eigenvalues are ≥ 0. Why?
9
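A quick numerical illustration of the question above (a toy check of mine, not part of the lecture): a generic symmetric matrix can have negative eigenvalues, while X^T X cannot, since ⃗v^T X^T X ⃗v = ∥X⃗v∥_2^2 for every ⃗v.

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
A = (A + A.T) / 2                       # a generic symmetric matrix
X = rng.standard_normal((10, 6))

print(np.linalg.eigvalsh(A))            # typically has both signs
print(np.linalg.eigvalsh(X.T @ X))      # all >= 0: v^T X^T X v = ||X v||_2^2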
low-rank approximation via eigendecomposition
⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: top eigenvectors of X^T X, V_k ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 10
low-rank approximation via eigendecomposition
Upshot: Letting V_k have columns ⃗v1, . . . ,⃗vk corresponding to the top k eigenvectors of the covariance matrix X^T X, V_k is the orthonormal basis minimizing

∥X − XV_kV_k^T∥_F^2.

This is principal component analysis (PCA).

Last Time: Saw how to determine accuracy by looking at the eigenvalues (the ‘spectrum’) of X^T X.

⃗x1, . . . ,⃗xn ∈ R^d: data points, X ∈ R^{n×d}: data matrix, ⃗v1, . . . ,⃗vk ∈ R^d: top eigenvectors of X^T X, V_k ∈ R^{d×k}: matrix with columns ⃗v1, . . . ,⃗vk. 11
singular value decomposition
The Singular Value Decomposition (SVD) generalizes the eigendecomposition to asymmetric (even rectangular) matrices. Any matrix X ∈ R^{n×d} with rank(X) = r can be written as X = UΣV^T.

- U has orthonormal columns ⃗u1, . . . ,⃗ur ∈ R^n (left singular vectors).
- V has orthonormal columns ⃗v1, . . . ,⃗vr ∈ R^d (right singular vectors).
- Σ is diagonal with entries σ1 ≥ σ2 ≥ . . . ≥ σr > 0 (singular values).

The ‘swiss army knife’ of linear algebra.
12
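A minimal sketch of computing the SVD with NumPy (np.linalg.svd; the example matrix is arbitrary). With full_matrices=False it returns the economy factorization, with singular values already sorted in decreasing order.

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 5))

# Economy SVD: NumPy keeps min(n, d) columns; trailing singular values are
# zero when rank(X) < min(n, d).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)                   # (8, 5) (5,) (5, 5)
print(np.allclose(X, U @ np.diag(s) @ Vt))          # X = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(5)))              # orthonormal columns of U
print(np.allclose(Vt @ Vt.T, np.eye(5)))            # orthonormal columns of V
print(s)                                            # sigma_1 >= sigma_2 >= ...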
connection of the svd to eigendecomposition
Writing X ∈ R^{n×d} in its singular value decomposition X = UΣV^T:

X^T X = VΣU^T UΣV^T = VΣ^2V^T (the eigendecomposition).

Similarly: XX^T = UΣV^T VΣU^T = UΣ^2U^T.

The right and left singular vectors are thus the eigenvectors of the covariance matrix X^T X and the Gram matrix XX^T, respectively. So, letting V_k ∈ R^{d×k} have columns ⃗v1, . . . ,⃗vk, XV_kV_k^T is the best rank-k approximation to X (the PCA approximation). What about U_kU_k^T X, where U_k ∈ R^{n×k} has columns ⃗u1, . . . ,⃗uk? It gives exactly the same approximation!

X ∈ R^{n×d}: data matrix, U ∈ R^{n×rank(X)}: matrix with orthonormal columns ⃗u1,⃗u2, . . . (left singular vectors), V ∈ R^{d×rank(X)}: matrix with orthonormal columns ⃗v1,⃗v2, . . . (right singular vectors), Σ ∈ R^{rank(X)×rank(X)}: positive diagonal matrix containing the singular values of X. 13
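A sketch verifying this connection numerically (illustrative, with random data): the right singular vectors diagonalize X^T X with eigenvalues σ_i^2, and the two rank-k projections XV_kV_k^T and U_kU_k^T X coincide.

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 12))
k = 3

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, Vk = U[:, :k], Vt[:k, :].T

print(np.allclose(X.T @ X, Vt.T @ np.diag(s ** 2) @ Vt))   # X^T X = V Sigma^2 V^T
print(np.allclose(X @ X.T, U @ np.diag(s ** 2) @ U.T))     # X X^T = U Sigma^2 U^T
print(np.allclose(X @ Vk @ Vk.T, Uk @ Uk.T @ X))           # same rank-k approximation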
the svd and optimal low-rank approximation
The best rank-k approximation to X,

X_k = arg min_{rank-k B ∈ R^{n×d}} ∥X − B∥_F,

is given by:

X_k = XV_kV_k^T = U_kU_k^T X = U_kΣ_kV_k^T.

These correspond to projecting the rows (data points) onto the span of V_k, or the columns (features) onto the span of U_k.
14
the svd and optimal low-rank approximation
X ∈ R^{n×d}: data matrix, U ∈ R^{n×rank(X)}: matrix with orthonormal columns ⃗u1,⃗u2, . . . (left singular vectors), V ∈ R^{d×rank(X)}: matrix with orthonormal columns ⃗v1,⃗v2, . . . (right singular vectors), Σ ∈ R^{rank(X)×rank(X)}: positive diagonal matrix containing the singular values of X. 16
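A short sketch of the truncated SVD as the optimal rank-k approximation (toy data; the error formula ∥X − X_k∥_F^2 = ∑_{i>k} σ_i^2 is the standard fact behind reading accuracy off the spectrum).

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 25))
k = 6

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # X_k = U_k Sigma_k V_k^T

err = np.linalg.norm(X - Xk, "fro") ** 2
print(np.isclose(err, np.sum(s[k:] ** 2)))          # error = sum of tail sigma_i^2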
the svd and linear regression
SVD is a ‘swiss army knife’. Classic Linear Regression: Given X ∈ R^{n×d} where n > d (we have more data points than parameters) and a response vector ⃗y ∈ R^n, we want to find ⃗c ∈ R^d minimizing ∥X⃗c − ⃗y∥_2.

E.g., c1 · (# baths) + c2 · (sq. ft.) + c3 · (# floors) + . . . ≈ home price.
17
the svd and linear regression
Classic Linear Regression: Given X ∈ R^{n×d} where n > d (we have more data points than parameters) and a response vector ⃗y ∈ R^n, we want to find ⃗c ∈ R^d minimizing ∥X⃗c − ⃗y∥_2. The optimal solution is to choose ⃗c so that X⃗c = P_X⃗y, the projection of ⃗y onto the column span of X.

Writing the SVD X = UΣV^T, we have P_X = UU^T, so ⃗c = VΣ^{-1}U^T⃗y (the pseudoinverse of X applied to ⃗y) achieves this.

X ∈ R^{n×d}: data matrix, U ∈ R^{n×rank(X)}: matrix with orthonormal columns ⃗u1,⃗u2, . . . (left singular vectors), V ∈ R^{d×rank(X)}: matrix with orthonormal columns ⃗v1,⃗v2, . . . (right singular vectors), Σ ∈ R^{rank(X)×rank(X)}: positive diagonal matrix containing the singular values of X. 18
the svd and linear regression
X ∈ R^{n×d}: data matrix, U ∈ R^{n×rank(X)}: matrix with orthonormal columns ⃗u1,⃗u2, . . . (left singular vectors), V ∈ R^{d×rank(X)}: matrix with orthonormal columns ⃗v1,⃗v2, . . . (right singular vectors), Σ ∈ R^{rank(X)×rank(X)}: positive diagonal matrix containing the singular values of X. 19
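A minimal sketch of least squares via the SVD (toy data, compared against NumPy's own lstsq): ⃗c = VΣ^{-1}U^T⃗y, so X⃗c = UU^T⃗y is the projection of ⃗y onto X's column span.

import numpy as np

rng = np.random.default_rng(5)
n, d = 100, 4
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
c_svd = Vt.T @ ((U.T @ y) / s)                      # c = V Sigma^{-1} U^T y

c_np, *_ = np.linalg.lstsq(X, y, rcond=None)        # NumPy's least-squares solver
print(np.allclose(c_svd, c_np))
print(np.allclose(X @ c_svd, U @ (U.T @ y)))        # X c = P_X y = U U^T y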
applications of low-rank approximation
Rest of Class: Examples of how low-rank approximation is applied in a variety of data science applications.
- Used for many reasons other than dimensionality reduction/data compression.
20
matrix completion
Consider a matrix X ∈ R^{n×d} which we cannot fully observe, but believe is close to rank k (i.e., well approximated by a rank-k matrix). Classic example: the Netflix Prize problem. Solve:

Y = arg min_{rank-k B} ∑_{observed (j,k)} [X_{j,k} − B_{j,k}]^2.

Under certain assumptions, one can show that Y well approximates X on both the observed and (most importantly) the unobserved entries.
21
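One simple heuristic for this problem, sketched below (iterative SVD imputation on synthetic data; this is an illustration, not necessarily the estimator analyzed in lecture): repeatedly replace the unobserved entries with the values of the current best rank-k approximation.

import numpy as np

rng = np.random.default_rng(6)
n, d, k = 60, 40, 3
X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))   # exactly rank k
mask = rng.random((n, d)) < 0.5                                      # observed entries

def rank_k_approx(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

Y = np.where(mask, X_true, 0.0)              # unobserved entries start at zero
for _ in range(200):
    # keep observed entries fixed, overwrite the rest with the rank-k values
    Y = np.where(mask, X_true, rank_k_approx(Y, k))

gap = np.linalg.norm((Y - X_true)[~mask]) / np.linalg.norm(X_true[~mask])
print(gap)                                   # should be small if recovery succeeds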
entity embeddings
Dimensionality reduction embeds d-dimensional vectors into d′ dimensions. But what about when you want to embed objects other than vectors?
- Documents (for topic-based search and classification)
- Words (to identify synonyms, translations, etc.)
- Nodes in a social network
The classical approach is to convert each item into a high-dimensional feature vector and then apply low-rank approximation.
22
example: latent semantic analysis
- ⟨⃗yi,⃗za⟩ ≈ 1 when doci contains worda.
- If doci and docj both contain worda, then ⟨⃗yi,⃗za⟩ ≈ ⟨⃗yj,⃗za⟩ ≈ 1.
23
example: latent semantic analysis
- The columns ⃗z1,⃗z2, . . . give representations of words, with ⃗zi and ⃗zj tending to have high dot product if wordi and wordj appear in many of the same documents.
- Z corresponds to the top k right singular vectors: the eigenvectors of X^T X. Intuitively, what is X^T X?
- (X^T X)i,j = # documents that wordi and wordj co-occur in.
- A document-based similarity matrix.
24
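A toy LSA sketch under this setup (tiny hand-made corpus, rows = documents, columns = words; all choices here are illustrative): word vectors are taken from the top-k right singular vectors, so words that share many documents should end up with similar embeddings.

import numpy as np

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "stocks and bonds are investments",
    "the market moved stocks higher",
]
vocab = sorted({w for doc in docs for w in doc.split()})
# X_{i,a} = 1 if document i contains word a.
X = np.array([[1.0 if w in doc.split() else 0.0 for w in vocab] for doc in docs])

k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = (np.diag(s[:k]) @ Vt[:k, :]).T          # one k-dimensional row per word

def cos_sim(a, b):
    za, zb = Z[vocab.index(a)], Z[vocab.index(b)]
    return float(za @ zb / (np.linalg.norm(za) * np.linalg.norm(zb) + 1e-12))

# "cat" and "dog" live in similar documents, so the first similarity should
# typically come out larger than the second on this toy corpus.
print(cos_sim("cat", "dog"), cos_sim("cat", "stocks"))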
example: word embedding
Not obvious how to convert a word into a feature vector that captures the meaning of that word.

- In LSA, a word's feature vector is the set of documents that word appears in.
- The SVD of the term-document matrix X corresponds to an eigendecomposition of the document-based word similarity matrix X^T X.
- Many alternative similarities: how often wordi and wordj appear in the same sentence, in the same window of w words, in similar positions of documents in different languages, etc.
- Replacing X^T X with these different metrics (sometimes appropriately transformed) leads to popular word embedding algorithms: word2vec, GloVe, fastText, etc.
- Perform low-rank approximation of the similarity matrix directly.
25
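A small sketch of that last recipe (an illustrative toy, not word2vec or GloVe; the corpus and window choice are arbitrary): build a word-word co-occurrence matrix from sentences and take a low-rank approximation of it directly to get embeddings.

import numpy as np

sentences = [
    "deep learning needs data",
    "machine learning needs data",
    "stocks and bonds move markets",
    "markets move on data",
]
vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-word co-occurrence counts within a sentence (one of the similarities
# mentioned on the slide).
C = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    words = s.split()
    for a in words:
        for b in words:
            if a != b:
                C[idx[a], idx[b]] += 1.0

k = 2
U, sing, Vt = np.linalg.svd(C)              # low-rank approximation of C itself
W = U[:, :k] * sing[:k]                     # k-dimensional word embeddings
print({w: np.round(W[idx[w]], 2) for w in ["learning", "markets", "data"]})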