COMPSCI 514: Algorithms for Data Science


1. COMPSCI 514: Algorithms for Data Science. Cameron Musco, University of Massachusetts Amherst. Fall 2020. Lecture 15.

2. Logistics

- Problem Set 3 is due next Friday 10/23 at 8pm.
- Problem set grades seem to be strongly correlated with whether people are working in groups. So if you don't have a group, I encourage you to join one. There are multiple people looking, so post on Piazza to find some.
- This week's quiz is due Monday at 8pm.

3. Summary

Last Class: Low-Rank Approximation
- When data lies in a $k$-dimensional subspace $\mathcal{V}$, we can perfectly embed it into $k$ dimensions using an orthonormal span $V \in \mathbb{R}^{d \times k}$.
- When data lies close to $\mathcal{V}$, the optimal approximation with rows in that space is given by projecting onto that space: $XVV^T = \arg\min_{B \text{ with rows in } \mathcal{V}} \|X - B\|_F^2$.

This Class:
- How do we find the best low-dimensional subspace to approximate $X$?
- PCA and its connection to eigendecomposition.
- Finding $V$ via eigendecomposition.

4. Basic Set Up

Reminder of set up: assume that $\vec{x}_1, \ldots, \vec{x}_n$ lie close to a $k$-dimensional subspace $\mathcal{V}$ of $\mathbb{R}^d$. Let $X \in \mathbb{R}^{n \times d}$ be the data matrix.

Let $\vec{v}_1, \ldots, \vec{v}_k$ be an orthonormal basis for $\mathcal{V}$ and $V \in \mathbb{R}^{d \times k}$ be the matrix with these vectors as its columns.
- $VV^T \in \mathbb{R}^{d \times d}$ is the projection matrix onto $\mathcal{V}$.
- $X \approx X(VV^T)$: gives the closest approximation to $X$ with rows in $\mathcal{V}$.

Notation: $\vec{x}_1, \ldots, \vec{x}_n \in \mathbb{R}^d$: data points; $X \in \mathbb{R}^{n \times d}$: data matrix; $\vec{v}_1, \ldots, \vec{v}_k \in \mathbb{R}^d$: orthogonal basis for subspace $\mathcal{V}$; $V \in \mathbb{R}^{d \times k}$: matrix with columns $\vec{v}_1, \ldots, \vec{v}_k$.
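A minimal numpy sketch of this set up (synthetic data and an arbitrary orthonormal basis chosen here for illustration, not the slides' example): it builds an orthonormal $V$ via QR, forms the projection matrix $VV^T$, and projects the rows of $X$ onto the subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10, 3

# Synthetic data that lies close to a k-dimensional subspace (illustrative only).
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
X += 0.01 * rng.standard_normal((n, d))      # small noise pushes X slightly off the subspace

# Any orthonormal basis V in R^{d x k} (here: QR of a random matrix).
V, _ = np.linalg.qr(rng.standard_normal((d, k)))

P = V @ V.T            # d x d projection matrix onto span(V)
X_proj = X @ P         # closest matrix to X with rows in span(V)
print(X_proj.shape)    # (n, d)
```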

5. Dual View of Low-Rank Approximation (figure slide)

6. Best Fit Subspace

If $\vec{x}_1, \ldots, \vec{x}_n$ are close to a $k$-dimensional subspace $\mathcal{V}$ with orthonormal basis $V \in \mathbb{R}^{d \times k}$, the data matrix can be approximated as $XVV^T$, and $XV$ gives the optimal embedding of $X$ in $\mathcal{V}$. How do we find $\mathcal{V}$ (equivalently $V$)?

$$\arg\min_{\text{orthonormal } V \in \mathbb{R}^{d \times k}} \|X - XVV^T\|_F^2 \;=\; \arg\max_{\text{orthonormal } V \in \mathbb{R}^{d \times k}} \|XV\|_F^2.$$
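The equivalence rests on the Pythagorean identity $\|X\|_F^2 = \|XVV^T\|_F^2 + \|X - XVV^T\|_F^2$, so minimizing the error is the same as maximizing $\|XV\|_F^2 = \|XVV^T\|_F^2$. A quick numerical check of that identity (synthetic $X$ and a random orthonormal $V$; an assumed example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 100, 10, 3
X = rng.standard_normal((n, d))
V, _ = np.linalg.qr(rng.standard_normal((d, k)))   # any orthonormal V

err = np.linalg.norm(X - X @ V @ V.T, "fro") ** 2
fit = np.linalg.norm(X @ V, "fro") ** 2            # equals ||X V V^T||_F^2 since V has orthonormal columns
print(np.isclose(err + fit, np.linalg.norm(X, "fro") ** 2))  # True
```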

7. Solution via Eigendecomposition

The $V$ minimizing $\|X - XVV^T\|_F^2$ is given by

$$\arg\max_{\text{orthonormal } V \in \mathbb{R}^{d \times k}} \|XV\|_F^2 = \sum_{i=1}^{n} \|V^T \vec{x}_i\|_2^2 = \sum_{j=1}^{k} \|X \vec{v}_j\|_2^2.$$

Surprisingly, we can find the columns of $V$, $\vec{v}_1, \ldots, \vec{v}_k$, greedily:

$$\vec{v}_1 = \arg\max_{\|\vec{v}\|_2 = 1} \|X\vec{v}\|_2^2 = \vec{v}^T X^T X \vec{v},$$
$$\vec{v}_2 = \arg\max_{\|\vec{v}\|_2 = 1,\ \langle \vec{v}, \vec{v}_1 \rangle = 0} \vec{v}^T X^T X \vec{v},$$
$$\vdots$$
$$\vec{v}_k = \arg\max_{\|\vec{v}\|_2 = 1,\ \langle \vec{v}, \vec{v}_j \rangle = 0\ \forall j < k} \vec{v}^T X^T X \vec{v}.$$

These are exactly the top $k$ eigenvectors of $X^T X$.
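A sketch of how one might compute these vectors in practice (synthetic data, names chosen for illustration): np.linalg.eigh returns eigenvalues in ascending order, so the top $k$ eigenvectors of $X^T X$ are its last $k$ columns, and they should achieve at least as large a value of $\|XV\|_F^2$ as any other orthonormal $V$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 20, 5
X = rng.standard_normal((n, d))

# Eigendecomposition of the symmetric matrix X^T X (eigh: ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
V_top = eigvecs[:, -k:]                                  # top-k eigenvectors as columns

V_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))    # a competing orthonormal basis

obj_top = np.linalg.norm(X @ V_top, "fro") ** 2
obj_rand = np.linalg.norm(X @ V_rand, "fro") ** 2
print(obj_top >= obj_rand)   # True: the eigenvector basis maximizes ||XV||_F^2
```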

8. Review of Eigenvectors and Eigendecomposition

Eigenvector: $\vec{x} \in \mathbb{R}^d$ is an eigenvector of a matrix $A \in \mathbb{R}^{d \times d}$ if $A\vec{x} = \lambda \vec{x}$ for some scalar $\lambda$ (the eigenvalue corresponding to $\vec{x}$).
- That is, $A$ just 'stretches' $\vec{x}$.
- If $A$ is symmetric, we can find $d$ orthonormal eigenvectors $\vec{v}_1, \ldots, \vec{v}_d$. Let $V \in \mathbb{R}^{d \times d}$ have these vectors as columns. Then

$$AV = \begin{bmatrix} A\vec{v}_1 & A\vec{v}_2 & \cdots & A\vec{v}_d \end{bmatrix} = \begin{bmatrix} \lambda_1 \vec{v}_1 & \lambda_2 \vec{v}_2 & \cdots & \lambda_d \vec{v}_d \end{bmatrix} = V\Lambda.$$

This yields the eigendecomposition: $AVV^T = A = V \Lambda V^T$.
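For a symmetric matrix, numpy's eigh returns exactly this orthonormal $V$ and the eigenvalues; a small check that $A = V\Lambda V^T$ holds numerically (random symmetric test matrix, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
A = B + B.T                                     # make A symmetric

lam, V = np.linalg.eigh(A)                      # orthonormal eigenvectors as columns of V
print(np.allclose(V @ np.diag(lam) @ V.T, A))   # A = V Lambda V^T
print(np.allclose(V.T @ V, np.eye(6)))          # columns are orthonormal
```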

9. Review of Eigenvectors and Eigendecomposition

Typically we order the eigenvectors in decreasing order of eigenvalue: $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_d$.

10. Courant-Fischer Principle

Courant-Fischer Principle: For symmetric $A$, the eigenvectors are given via the greedy optimization

$$\vec{v}_1 = \arg\max_{\|\vec{v}\|_2 = 1} \vec{v}^T A \vec{v},$$
$$\vec{v}_2 = \arg\max_{\|\vec{v}\|_2 = 1,\ \langle \vec{v}, \vec{v}_1 \rangle = 0} \vec{v}^T A \vec{v},$$
$$\vdots$$
$$\vec{v}_d = \arg\max_{\|\vec{v}\|_2 = 1,\ \langle \vec{v}, \vec{v}_j \rangle = 0\ \forall j < d} \vec{v}^T A \vec{v}.$$

- $\vec{v}_j^T A \vec{v}_j = \lambda_j \cdot \vec{v}_j^T \vec{v}_j = \lambda_j$, the $j$th largest eigenvalue.
- The first $k$ eigenvectors of $X^T X$ (corresponding to the largest $k$ eigenvalues) are exactly the directions of greatest variance in $X$ that we use for low-rank approximation.
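A quick empirical check of the first step of this principle (random symmetric $A$; an illustration, not a proof): no random unit vector should beat the top eigenvector's value of $\vec{v}^T A \vec{v}$.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((8, 8))
A = B + B.T                                   # symmetric test matrix

lam, V = np.linalg.eigh(A)
v1 = V[:, -1]                                 # eigenvector of the largest eigenvalue
best = v1 @ A @ v1                            # equals lam[-1]

# Compare against many random unit vectors.
u = rng.standard_normal((1000, 8))
u /= np.linalg.norm(u, axis=1, keepdims=True)
print(np.all((u * (u @ A)).sum(axis=1) <= best + 1e-9))   # True
```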

11. Low-Rank Approximation via Eigendecomposition (figure slide)

12. Low-Rank Approximation via Eigendecomposition

Upshot: Letting $V_k$ have columns $\vec{v}_1, \ldots, \vec{v}_k$ corresponding to the top $k$ eigenvectors of the covariance matrix $X^T X$, $V_k$ is the orthogonal basis minimizing $\|X - XV_kV_k^T\|_F^2$.

This is principal component analysis (PCA).

How accurate is this low-rank approximation? We can understand the error using the eigenvalues of $X^T X$.

Notation: $\vec{x}_1, \ldots, \vec{x}_n \in \mathbb{R}^d$: data points; $X \in \mathbb{R}^{n \times d}$: data matrix; $\vec{v}_1, \ldots, \vec{v}_k \in \mathbb{R}^d$: top eigenvectors of $X^T X$; $V_k \in \mathbb{R}^{d \times k}$: matrix with columns $\vec{v}_1, \ldots, \vec{v}_k$.
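A compact PCA-by-eigendecomposition sketch following this recipe (synthetic near-low-rank data; the mean-centering line is an assumption made so that $X^T X$ behaves like a covariance matrix, which the slides leave implicit):

```python
import numpy as np

def pca_topk(X, k):
    """Return V_k (top-k eigenvectors of X^T X, largest first) and the embedding X V_k."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
    V_k = eigvecs[:, -k:][:, ::-1]               # top-k eigenvectors, largest first
    return V_k, X @ V_k

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 30))   # rank-3 data
X = X - X.mean(axis=0)    # assumption: mean-center so X^T X acts as a covariance matrix

V_k, C = pca_topk(X, k=3)
X_approx = C @ V_k.T      # rank-k approximation X V_k V_k^T
print(np.linalg.norm(X - X_approx, "fro") / np.linalg.norm(X, "fro"))  # ~0 for rank-3 data
```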

13. Spectrum Analysis

Let $\vec{v}_1, \ldots, \vec{v}_k$ be the top $k$ eigenvectors of $X^T X$ (the top $k$ principal components). The approximation error is

$$\|X - XV_kV_k^T\|_F^2 = \|X\|_F^2 - \|XV_kV_k^T\|_F^2 = \operatorname{tr}(X^T X) - \operatorname{tr}(V_k^T X^T X V_k)$$
$$= \sum_{i=1}^{d} \lambda_i(X^T X) - \sum_{i=1}^{k} \vec{v}_i^T X^T X \vec{v}_i = \sum_{i=1}^{d} \lambda_i(X^T X) - \sum_{i=1}^{k} \lambda_i(X^T X) = \sum_{i=k+1}^{d} \lambda_i(X^T X).$$

- For any matrix $A$, $\|A\|_F^2 = \sum_{i=1}^{d} \|\vec{a}_i\|_2^2 = \operatorname{tr}(A^T A)$ (sum of diagonal entries = sum of eigenvalues).
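Numerically checking this chain of equalities (synthetic $X$, assumed sizes): the rank-$k$ error should match the sum of the smallest $d - k$ eigenvalues of $X^T X$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, k = 300, 15, 4
X = rng.standard_normal((n, d))

eigvals, eigvecs = np.linalg.eigh(X.T @ X)       # ascending order
V_k = eigvecs[:, -k:]                            # top-k eigenvectors

err = np.linalg.norm(X - X @ V_k @ V_k.T, "fro") ** 2
tail = eigvals[:-k].sum()                        # lambda_{k+1} + ... + lambda_d
print(np.isclose(err, tail))                     # True
```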

14. Spectrum Analysis

Claim: The error in approximating $X$ with the best rank-$k$ approximation (projecting onto the top $k$ eigenvectors of $X^T X$) is

$$\|X - XV_kV_k^T\|_F^2 = \sum_{i=k+1}^{d} \lambda_i(X^T X).$$

15. Spectrum Analysis

Plotting the spectrum of the covariance matrix $X^T X$ (its eigenvalues) shows how compressible $X$ is using low-rank approximation (i.e., how close $\vec{x}_1, \ldots, \vec{x}_n$ are to a low-dimensional subspace).
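A sketch of such a spectrum plot (matplotlib, synthetic data with a planted fast-decaying spectrum; purely illustrative, not the dataset used in lecture):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n, d = 500, 50

# Data with a decaying spectrum: most variance concentrated in a few directions.
scales = 1.0 / (np.arange(1, d + 1) ** 2)
X = rng.standard_normal((n, d)) * scales

eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]      # eigenvalues in descending order

plt.plot(np.arange(1, d + 1), eigvals, marker="o")
plt.xlabel("i")
plt.ylabel(r"$\lambda_i(X^T X)$")
plt.title("Eigenvalue spectrum of $X^T X$")
plt.show()
```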

16. Spectrum Analysis

Exercises:
1. Show that the eigenvalues of $X^T X$ are always nonnegative. Hint: use that $\lambda_j = \vec{v}_j^T X^T X \vec{v}_j$.
2. Show that for symmetric $A$, the trace is the sum of the eigenvalues: $\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i(A)$.

17. Summary

- Many (most) datasets can be approximated via projection onto a low-dimensional subspace.
- We find this subspace via a maximization problem: $\max_{\text{orthonormal } V} \|XV\|_F^2$.
- Greedy solution via eigendecomposition of $X^T X$.
- The columns of $V$ are the top eigenvectors of $X^T X$.
- The error of the best low-rank approximation (the compressibility of the data) is determined by the tail of $X^T X$'s eigenvalue spectrum.

18. Interpretation in Terms of Correlation

Recall: Low-rank approximation is possible when our data features are correlated.

Our compressed dataset is $C = XV_k$, where the columns of $V_k$ are the top $k$ eigenvectors of $X^T X$. What is the covariance of $C$?

$$C^T C = V_k^T X^T X V_k = V_k^T V \Lambda V^T V_k = \Lambda_k.$$

The covariance becomes diagonal, i.e., all correlations have been removed. Maximal compression.
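A quick check that the compressed data has a diagonal covariance (synthetic $X$, assumed example): $C^T C$ should equal $\Lambda_k$, the diagonal matrix of the top $k$ eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((400, 12))

eigvals, eigvecs = np.linalg.eigh(X.T @ X)       # ascending eigenvalues
k = 4
V_k = eigvecs[:, -k:][:, ::-1]                   # top-k eigenvectors, largest eigenvalue first

C = X @ V_k                                      # compressed dataset
CtC = C.T @ C
print(np.allclose(CtC, np.diag(eigvals[-k:][::-1])))   # True: covariance is the diagonal Lambda_k
```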
