COMPSCI 514: Algorithms for Data Science



  1. COMPSCI 514: Algorithms for Data Science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 11.

  2. Logistics • Problem Set 2 is due this Friday 10/11. Will allow submissions until Sunday 10/13 at midnight with no penalty. • Midterm next Thursday 10/17. Will give some review exercises before the midterm. • Mean was 32.74/40 = 81%. • Mostly seem to have mastered Markov's, Chebyshev, etc. • Some difficulties with exponential tail bounds (Chernoff and Bernstein).

  3. Summary. Last Two Classes: Randomized Dimensionality Reduction • The Johnson-Lindenstrauss Lemma: reduce n data points in any dimension d to O(log(n/δ)/ε²) dimensions and preserve (with probability ≥ 1 − δ) all pairwise distances up to 1 ± ε. • Compression is linear, via multiplication with a random, data-oblivious matrix (linear compression). Next Two Classes: Low-rank approximation, the SVD, and principal component analysis • Compression is still linear – by applying a matrix. • Choose this matrix carefully, taking into account the structure of the dataset. • Can give better compression than random projection.
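A minimal numpy sketch of the random-projection idea reviewed here (all dimensions, the seed, and the scaling are illustrative choices, not from the slides):

```python
import numpy as np

# JL-style random projection: a random, data-oblivious Gaussian matrix,
# scaled by 1/sqrt(m), approximately preserves pairwise distances.
rng = np.random.default_rng(0)
n, d, m = 100, 1000, 200        # m plays the role of O(log(n/δ)/ε²)

X = rng.standard_normal((n, d))                  # arbitrary data points (rows)
Pi = rng.standard_normal((m, d)) / np.sqrt(m)    # random projection matrix Π

Y = X @ Pi.T                                     # compressed points in R^m

i, j = 0, 1
print(np.linalg.norm(X[i] - X[j]))               # original distance ...
print(np.linalg.norm(Y[i] - Y[j]))               # ... approximately preserved
```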

  4. Embedding with Assumptions. Assume that the data points x_1, ..., x_n lie in a k-dimensional subspace V of R^d. Recall: Let v_1, ..., v_k be an orthonormal basis for V and V ∈ R^{d×k} be the matrix with these vectors as its columns. For all x_i, x_j: ‖V^T x_i − V^T x_j‖_2 = ‖x_i − x_j‖_2. • V^T ∈ R^{k×d} is a linear embedding of x_1, ..., x_n into k dimensions with no distortion. • An actual projection, analogous to a JL random projection Π.
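A small numpy sketch of this claim (sizes and random seed are arbitrary choices for illustration): when the data lies exactly in the column span of V, the embedding x_i ↦ V^T x_i preserves pairwise distances exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 100, 5

# Random orthonormal basis V ∈ R^{d×k} via QR.
V, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Data points lying exactly in the subspace spanned by V's columns.
C = rng.standard_normal((n, k))      # coefficients
X = C @ V.T                          # data matrix, rows in the subspace

# Embed into k dimensions: rows of X @ V are the points V^T x_i.
Y = X @ V

i, j = 3, 17
print(np.linalg.norm(X[i] - X[j]))   # equal ...
print(np.linalg.norm(Y[i] - Y[j]))   # ... to this (up to floating point)
```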

  5. Embedding with Assumptions. Main Focus of Today: Assume that the data points x_1, ..., x_n lie close to a k-dimensional subspace V of R^d. Letting v_1, ..., v_k be an orthonormal basis for V and V ∈ R^{d×k} be the matrix with these vectors as its columns, V^T x_i ∈ R^k is still a good embedding for x_i ∈ R^d. This is the key idea behind low-rank approximation and principal component analysis (PCA). • How do we find the subspace V and the corresponding matrix V? • How good is the embedding?

  6. Low-Rank Factorization. Claim: x_1, ..., x_n lie in a k-dimensional subspace V ⇔ the data matrix X ∈ R^{n×d} has rank ≤ k. • Letting v_1, ..., v_k be an orthonormal basis for V, we can write any x_i as: x_i = c_{i,1} · v_1 + c_{i,2} · v_2 + ... + c_{i,k} · v_k. • So v_1, ..., v_k span the rows of X and thus rank(X) ≤ k. Notation (used throughout the following slides): x_1, ..., x_n ∈ R^d: data points; X ∈ R^{n×d}: data matrix; v_1, ..., v_k ∈ R^d: orthonormal basis for subspace V; V ∈ R^{d×k}: matrix with columns v_1, ..., v_k.

  7. Low-Rank Factorization. Claim: x_1, ..., x_n lie in a k-dimensional subspace V ⇔ the data matrix X ∈ R^{n×d} has rank ≤ k. • Every data point x_i (row of X) can be written as x_i = c_{i,1} · v_1 + ... + c_{i,k} · v_k, i.e., X = C V^T. • The columns of X are spanned by k vectors: the columns of C. • X can be represented by (n + d) · k parameters vs. n · d.

  8. Low-Rank Factorization. What is this coefficient matrix C? Claim: If x_1, ..., x_n lie in a k-dimensional subspace with orthonormal basis V ∈ R^{d×k}, the data matrix can be written as X = C V^T. • X = C V^T ⇒ X V = C V^T V. • V^T V = I, the identity (since V is orthonormal) ⇒ X V = C.
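A short numpy check of the factorization on these slides (sizes and seed are arbitrary): data in a k-dimensional subspace gives rank(X) ≤ k, and the coefficient matrix is recovered as C = X V.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 100, 5

V, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal basis, d × k
C_true = rng.standard_normal((n, k))
X = C_true @ V.T                                   # rows of X lie in the subspace

print(np.linalg.matrix_rank(X))                    # ≤ k (here exactly 5)

# X V = C V^T V = C, since V^T V = I.
C = X @ V
print(np.allclose(C, C_true))                      # True
print(np.allclose(C @ V.T, X))                     # X = C V^T: (n + d)·k numbers suffice
```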

  11. Projection View. Claim: If x_1, ..., x_n lie in a k-dimensional subspace V with orthonormal basis V ∈ R^{d×k}, the data matrix can be written as X = X (V V^T). • V V^T is a projection matrix, which projects the rows of X (the data points x_1, ..., x_n) onto the subspace V.
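A numpy sketch of the projection view (again with arbitrary sizes and seed): V V^T is idempotent, fixes points already in the subspace, and maps other points onto it.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 100, 5
V, _ = np.linalg.qr(rng.standard_normal((d, k)))

P = V @ V.T                                   # d × d projection matrix onto the subspace
print(np.allclose(P @ P, P))                  # projecting twice = projecting once

x_in = V @ rng.standard_normal(k)             # a point already in the subspace
print(np.allclose(P @ x_in, x_in))            # unchanged, so X = X (V V^T) row by row

x_out = rng.standard_normal(d)                # a generic point not in the subspace
print(np.linalg.norm(x_out - P @ x_out))      # nonzero residual
```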

  14. Low-Rank Approximation. Claim: If x_1, ..., x_n lie close to a k-dimensional subspace V with orthonormal basis V ∈ R^{d×k}, the data matrix can be approximated as X ≈ X (V V^T) = X P_V, where P_V = V V^T projects onto V. Note: X (V V^T) has rank k. It is a low-rank approximation of X, and it is the closest one with rows in V: X (V V^T) = arg min over B with rows in V of ‖X − B‖_F^2, where ‖X − B‖_F^2 = Σ_{i,j} (X_{i,j} − B_{i,j})^2.
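An illustrative numpy sketch (noise level, sizes, and seed are assumptions made up for this example): for data close to the subspace, X (V V^T) is a rank-k matrix whose Frobenius error against X is just the part of the data orthogonal to the subspace.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 200, 100, 5

V, _ = np.linalg.qr(rng.standard_normal((d, k)))
X = rng.standard_normal((n, k)) @ V.T          # low-rank "signal" in the subspace
X += 0.01 * rng.standard_normal((n, d))        # small noise pushes X off the subspace

X_approx = X @ (V @ V.T)                       # the rank-k approximation X (V V^T)
err = np.linalg.norm(X - X_approx, 'fro')**2   # Σ_{i,j} (X_{i,j} - B_{i,j})^2
print(np.linalg.matrix_rank(X_approx), err)    # rank k, small error
```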

  15. Low-Rank Approximation. So Far: If x_1, ..., x_n lie close to a k-dimensional subspace V with orthonormal basis V ∈ R^{d×k}, the data matrix can be approximated as X ≈ X (V V^T). This is the closest approximation to X with rows in V (i.e., in the column span of V). • Letting (X V V^T)_i, (X V V^T)_j be the i-th and j-th projected data points: ‖(X V V^T)_i − (X V V^T)_j‖_2 = ‖[(X V)_i − (X V)_j] V^T‖_2 = ‖(X V)_i − (X V)_j‖_2. • So X V ∈ R^{n×k} can be used as a compressed approximate data set. The key question is how to find the subspace V and the corresponding matrix V.
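A quick numpy check of the distance identity above (sizes and seed arbitrary): distances between the projected points equal distances between the compressed points X V, because V has orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 200, 100, 5
V, _ = np.linalg.qr(rng.standard_normal((d, k)))
X = rng.standard_normal((n, d))               # arbitrary data

proj = X @ V @ V.T                            # projected points, still in R^d
comp = X @ V                                  # compressed points, in R^k

i, j = 10, 20
print(np.linalg.norm(proj[i] - proj[j]))      # equal ...
print(np.linalg.norm(comp[i] - comp[j]))      # ... to this
```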

  16. Why Low-Rank Approximation? Question: Why might we expect x_1, ..., x_n to lie close to a k-dimensional subspace? • The rows of X can be approximately reconstructed from a basis of k vectors.

  17. Why Low-Rank Approximation? Question: Why might we expect x_1, ..., x_n to lie close to a k-dimensional subspace? Linearly Dependent Variables: • Equivalently, the columns of X are approximately spanned by k vectors.

  20. Best Fit Subspace. If x_1, ..., x_n are close to a k-dimensional subspace V with orthonormal basis V ∈ R^{d×k}, the data matrix can be approximated as X V V^T, and X V gives the optimal embedding of X in V. How do we find V (and V)? The best fit subspace is given by the orthonormal V ∈ R^{d×k} minimizing ‖X − X V V^T‖_F^2 = Σ_{i,j} (X_{i,j} − (X V V^T)_{i,j})^2 = Σ_{i=1}^{n} ‖x_i − V V^T x_i‖_2^2.
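A numpy sketch previewing the answer that the coming lectures develop via the SVD (the data generation here is an arbitrary assumption): by the Eckart–Young theorem, the minimizing V consists of the top-k right singular vectors of X.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 200, 100, 5
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))   # low-rank signal
X += 0.01 * rng.standard_normal((n, d))                         # plus small noise

_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_best = Vt[:k].T                              # d × k, orthonormal columns

best_err = np.linalg.norm(X - X @ V_best @ V_best.T, 'fro')**2
print(best_err)                                # minimal over all orthonormal V ∈ R^{d×k}

# Any other orthonormal V does no better, e.g. a random subspace:
V_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))
print(np.linalg.norm(X - X @ V_rand @ V_rand.T, 'fro')**2)      # larger error
```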
