
The Variational Nyström Method for Large-Scale Spectral Problems (PowerPoint presentation)



  1. The Variational Nyström Method for Large-Scale Spectral Problems
     Max Vladymyrov (Google Inc.) and Miguel Carreira-Perpiñán (EECS, UC Merced)
     June 20, 2016

  2. Graph-based dimensionality reduction methods
     Given high-dimensional data points $Y_{D\times N} = (y_1, \dots, y_N)$:
     1. Convert the data points to an affinity matrix $M_{N\times N}$.
     2. Find low-dimensional coordinates $X_{d\times N} = (x_1, \dots, x_N)$ so that their similarity is as close as possible to $M$.
     [Figure: high-dimensional input $Y \in \mathbb{R}^D$ → affinity $M$ → low-dimensional output $X \in \mathbb{R}^d$.]

  3. Spectral methods
     • Consider the spectral problem $\min_X \operatorname{tr}(X M X^T)$ s.t. $X X^T = I$,
       ‣ $M_{N\times N}$: symmetric psd affinity matrix.
     • Examples:
       ‣ Laplacian eigenmaps: $M$ is a graph Laplacian.
       ‣ ISOMAP: $M$ is given by a matrix of shortest-path distances.
       ‣ Kernel PCA, MDS, Locally Linear Embedding (LLE), etc.
     • The solution is unique (up to an orthogonal transformation of the rows of $X$) and can be found in closed form from the eigenvectors of $M$: $X = U_M^T$.
     For large $N$, solving the eigenproblem is infeasible, even if $M$ is sparse.
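The closed-form solve takes only a few lines of NumPy. This sketch follows the slide's min formulation literally, so it keeps the $d$ eigenvectors with the smallest eigenvalues. The test matrix is made up. The dense eigendecomposition costs $O(N^3)$ time and $O(N^2)$ memory, which is why it becomes infeasible for large $N$.

```python
import numpy as np

def exact_spectral_embedding(M: np.ndarray, d: int) -> np.ndarray:
    """Solve min_X tr(X M X^T) s.t. X X^T = I for a symmetric matrix M.

    Returns X of shape (d, N). np.linalg.eigh sorts eigenvalues in ascending
    order, so the first d eigenvectors minimize the trace.
    """
    _, eigvecs = np.linalg.eigh(M)    # dense O(N^3) eigendecomposition
    U_M = eigvecs[:, :d]              # N x d
    return U_M.T                      # X = U_M^T

# Toy example (made-up data): a random symmetric psd matrix.
rng = np.random.default_rng(0)
N, d = 200, 3
F = rng.standard_normal((N, N))
M = F @ F.T / N
X = exact_spectral_embedding(M, d)
print(X.shape)                           # (3, 200)
print(np.allclose(X @ X.T, np.eye(d)))   # True: the constraint holds
```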

  4. Learning with landmarks
     The goal is to find a fast, approximate solution for the embedding $X$ using only a subset of the original points from $Y$:
     select $L$ landmarks (e.g. a random subset) → compute the reduced $L \times L$ affinity matrix → learn the landmark representation → project the rest of the points.
     [Figure: the four steps illustrated, from the input space $\mathbb{R}^D$ to the embedding space $\mathbb{R}^d$.]

  5. Nyström method
     Writing the affinity matrix $M$ by blocks (landmarks first):
     $M = \begin{pmatrix} A & B_{21}^T \\ B_{21} & B_{22} \end{pmatrix}$, $\quad C = \begin{pmatrix} A \\ B_{21} \end{pmatrix}$,
     the approximation to the eigendecomposition is
     $\tilde U_M = \begin{pmatrix} U_A \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix}$, where $A = U_A \Lambda_A U_A^T$.
     Essentially, this is an out-of-sample formula:
     1. Solve the eigenproblem for a subset of points.
     2. Predict the rest of the points through the interpolation formula.
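Here is a minimal NumPy sketch of the two Nyström steps. It assumes a dense $M$ whose first $L$ rows and columns are the landmarks. It keeps the $d$ eigenpairs of $A$ with the largest eigenvalues, which is the usual choice for a psd kernel. The final check demonstrates a known property of the formula, not a result from the talk: if $M$ is psd with rank $L$ and $A$ is invertible, then $C A^{-1} C^T = M$. Keeping all $L$ eigenpairs therefore reproduces $M$ exactly.

```python
import numpy as np

def nystrom(M: np.ndarray, L: int, d: int):
    """Nyström approximation to d leading eigenpairs of a symmetric psd M.

    Assumes landmarks come first, so M = [[A, B21^T], [B21, B22]].
    Only the first L columns C = [A; B21] of M are read.
    """
    C = M[:, :L]                         # N x L
    A, B21 = C[:L], C[L:]                # L x L and (N-L) x L blocks
    lam, U_A = np.linalg.eigh(A)         # ascending eigenvalues of A
    top = np.argsort(lam)[::-1][:d]      # indices of the d largest
    lam, U_A = lam[top], U_A[:, top]
    # Landmarks keep U_A; the other points use the interpolation formula.
    U_tilde = np.vstack([U_A, B21 @ U_A / lam])   # column j divided by lam[j]
    return U_tilde, lam

# Check on made-up data: for a rank-L matrix, Nyström is exact, so
# U_tilde diag(lam) U_tilde^T reproduces M when all L eigenpairs are kept.
rng = np.random.default_rng(0)
N, L = 200, 10
F = rng.standard_normal((N, L))
M = F @ F.T
U_tilde, lam = nystrom(M, L, d=L)
M_hat = U_tilde @ np.diag(lam) @ U_tilde.T
print(np.linalg.norm(M_hat - M) / np.linalg.norm(M) < 1e-6)   # True
```

Only the $N \times L$ block $C$ is touched, so the cost is dominated by the $L \times L$ eigendecomposition and one $N \times L$ product.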

  6. Column Sampling method
     With the same block partition of $M$ (landmarks first) and $C = \begin{pmatrix} A \\ B_{21} \end{pmatrix}$, the approximation to the eigendecomposition is given by the left singular vectors of $C$:
     $C = U_C \Sigma_C V_C^T \;\Rightarrow\; \tilde U_M = U_C$.
     Column Sampling uses more information from the affinity matrix $M$ than Nyström, but it still ignores the non-landmark/non-landmark interaction block $B_{22}$.
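Column Sampling needs only a thin SVD of the $N \times L$ block $C$. The sketch below assumes landmarks come first and uses made-up data. The check uses a rank-$L$ matrix. In that case the span of $C$ equals the span of $M$, so the $L$ left singular vectors span exactly the leading eigenspace of $M$. For general $M$ the two agree only approximately.

```python
import numpy as np

def column_sampling(M: np.ndarray, L: int, d: int):
    """Column Sampling: left singular vectors of C = M[:, :L] (landmarks first)."""
    C = M[:, :L]
    U_C, s, _ = np.linalg.svd(C, full_matrices=False)   # singular values descending
    return U_C[:, :d], s[:d]

# Made-up rank-L matrix: span(C) == span(M) here.
rng = np.random.default_rng(0)
N, L = 200, 10
F = rng.standard_normal((N, L))
M = F @ F.T
U_C, s = column_sampling(M, L, d=L)
_, V = np.linalg.eigh(M)
U_M = V[:, -L:]                                  # L leading eigenvectors of M
# Compare subspaces through their orthogonal projectors (sign-independent).
print(np.allclose(U_C @ U_C.T, U_M @ U_M.T, atol=1e-8))   # True
```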

  7. Locally Linear Landmarks (LLL) (Vladymyrov & Carreira-Perpiñán, 2013)
     • Construct the local linear projection matrix $Z_{N\times L}$ from the input $Y$, with landmarks $\tilde y_l$:
       $y_n \approx \sum_{l=1}^{L} z_{nl}\, \tilde y_l$, $n = 1, \dots, N$ $\Rightarrow$ $Y \approx \tilde Y Z^T$.
     • Additional assumption: the same projection holds in the embedding space: $X = \tilde X Z^T$.
     • Plugging the projection into the original objective function:
       $\min_X \operatorname{tr}(X M X^T)$ s.t. $X X^T = I$, $X = \tilde X Z^T$ $\Rightarrow$ $\min_{\tilde X} \operatorname{tr}(\tilde X Z^T M Z \tilde X^T)$ s.t. $\tilde X Z^T Z \tilde X^T = I$.
     • The solution is given by the reduced generalized eigenproblem $\tilde X = \operatorname{eig}(Z^T M Z, Z^T Z)$.
     • The final embedding is predicted as $X = \tilde X Z^T$.
     • This solution is optimal given the constraint $X = \tilde X Z^T$.
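The following sketch of the LLL pipeline rests on assumptions that the slide does not state. $Z$ is built with LLE-style weights: each point is reconstructed from its $K$ nearest landmarks, the weights sum to one, and a small regularizer is added. `lll_weights`, `K` and `reg` are hypothetical names and values. The reduced problem uses `scipy.linalg.eigh(a, b)`, which returns ascending eigenvalues and eigenvectors normalized so that $U^T (Z^T Z) U = I$. That normalization is what makes $X X^T = I$ hold after projecting back.

```python
import numpy as np
from scipy.linalg import eigh, solve

def lll_weights(Y, landmarks, K=5, reg=1e-3):
    """Hypothetical LLE-style Z (N x L): each y_n is reconstructed from its
    K nearest landmarks with weights that sum to one."""
    N, L = Y.shape[1], len(landmarks)
    Y_l = Y[:, landmarks]                              # D x L landmarks
    Z = np.zeros((N, L))
    for n in range(N):
        dist = np.sum((Y_l - Y[:, [n]]) ** 2, axis=0)
        nbrs = np.argsort(dist)[:K]                    # K nearest landmarks
        G = Y_l[:, nbrs] - Y[:, [n]]                   # D x K local differences
        gram = G.T @ G
        gram += reg * (np.trace(gram) + 1e-12) * np.eye(K)   # keep it positive definite
        w = solve(gram, np.ones(K), assume_a="pos")
        Z[n, nbrs] = w / w.sum()                       # enforce sum-to-one
    return Z

def lll_embedding(M, Z, d):
    """Reduced L x L problem eig(Z^T M Z, Z^T Z), then X = X_tilde Z^T."""
    _, U = eigh(Z.T @ M @ Z, Z.T @ Z)   # ascending; U^T (Z^T Z) U = I
    X_tilde = U[:, :d].T                # d x L, smallest eigenvalues (min problem)
    return X_tilde @ Z.T                # d x N

# Made-up 2D data with an unnormalized graph Laplacian M = D - W.
rng = np.random.default_rng(0)
N, L, d = 500, 50, 2
Y = rng.standard_normal((2, N))
sq = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
W = np.exp(-sq / (2 * 0.5 ** 2))
M = np.diag(W.sum(axis=1)) - W
landmarks = rng.choice(N, L, replace=False)
Z = lll_weights(Y, landmarks)
X = lll_embedding(M, Z, d)
print(X.shape)                          # (2, 500)
print(np.allclose(X @ X.T, np.eye(d)))  # True: X X^T = I holds
```

For a graph Laplacian the smallest eigenvalue is $0$ with a constant eigenvector. Because the weights sum to one, $Z\mathbf{1} = \mathbf{1}$, so that trivial solution lies in the constrained set. The first row of $X$ is usually dropped for this reason.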

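  8. Generalizing approximations
     Nyström: expand the upper block:
     $\tilde U_M = \begin{pmatrix} U_A \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix} = \begin{pmatrix} A U_A \Lambda_A^{-1} \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix} = C\, U_A \Lambda_A^{-1}$, with $C$ of size $N \times L$ and $U_A \Lambda_A^{-1}$ of size $L \times d$.
     Column Sampling: rewrite using the eigendecomposition of the $L \times L$ matrix $C^T C$:
     $\tilde U_M = U_C = C V_C \Sigma_C^{-1} = C\, U_{C^TC} \Lambda_{C^TC}^{-1/2}$.
     LLL: rewrite the solution $X = \tilde X Z^T$ as $\tilde U_M = Z \tilde X^T$, where $\tilde X$ is computed optimally (given $Z$) as $\tilde X = \operatorname{eig}(Z^T M Z, Z^T Z)$.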
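The rewrites on this slide rest on two identities. Here is a short derivation, assuming $A$ is invertible for Nyström and $C$ has full column rank for Column Sampling:

```latex
% Nyström: A U_A = U_A \Lambda_A, so U_A = A U_A \Lambda_A^{-1}
\tilde U_M
  = \begin{pmatrix} U_A \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix}
  = \begin{pmatrix} A U_A \Lambda_A^{-1} \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix}
  = \begin{pmatrix} A \\ B_{21} \end{pmatrix} U_A \Lambda_A^{-1}
  = C\, U_A \Lambda_A^{-1}

% Column Sampling: C = U_C \Sigma_C V_C^T with U_C^T U_C = I and V_C^T V_C = I, hence
C^T C = V_C \Sigma_C^2 V_C^T
\;\Rightarrow\; U_{C^TC} = V_C,\quad \Lambda_{C^TC} = \Sigma_C^2
  \ \text{(up to column signs, for distinct singular values)}
\;\Rightarrow\; U_C = C V_C \Sigma_C^{-1} = C\, U_{C^TC} \Lambda_{C^TC}^{-1/2}
```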

  9. Generalizing approximations
     Nyström:
     1. Solve the smaller $L \times L$ eigendecomposition $A = U_A \Lambda_A U_A^T$.
     2. Apply the $N \times L$ out-of-sample matrix: $\tilde U_M = C U_A \Lambda_A^{-1}$.
     Column Sampling:
     1. Solve the smaller $L \times L$ eigendecomposition $C^T C = U_{C^TC} \Lambda_{C^TC} U_{C^TC}^T$.
     2. Apply the $N \times L$ out-of-sample matrix: $\tilde U_M = C U_{C^TC} \Lambda_{C^TC}^{-1/2}$.
     LLL:
     1. Solve the smaller $L \times L$ generalized eigenproblem $\tilde X = \operatorname{eig}(Z^T M Z, Z^T Z)$.
     2. Apply the $N \times L$ out-of-sample matrix: $\tilde U_M = Z \tilde X^T$.
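For the two $C$-based methods, the two-step recipe fits in one small function. It solves an $L \times L$ eigenproblem, scales the eigenvectors to get $Q$, and returns $CQ$. This sketch uses a made-up Gaussian-kernel matrix and takes the first $L$ points as landmarks. `two_step` is a hypothetical name. The check confirms that the $C^T C$ route spans the same space as the left singular vectors of $C$.

```python
import numpy as np

def two_step(C, L, d, method):
    """Generic recipe: L x L eigenproblem, then U_tilde = C Q with Q of size L x d.

    "nystrom":          eig of A = C[:L],  Q = U Lambda^{-1}
    "column_sampling":  eig of C^T C,      Q = U Lambda^{-1/2}
    Assumes landmarks come first, as on the slides.
    """
    if method == "nystrom":
        lam, U = np.linalg.eigh(C[:L])
        power = 1.0
    elif method == "column_sampling":
        lam, U = np.linalg.eigh(C.T @ C)
        power = 0.5
    else:
        raise ValueError(f"unknown method: {method}")
    top = np.argsort(lam)[::-1][:d]          # d leading eigenpairs
    Q = U[:, top] / lam[top] ** power        # scale column j by lam_j^(-power)
    return C @ Q                             # N x d

# Made-up Gaussian-kernel matrix on 2D points; landmarks are the first L points.
rng = np.random.default_rng(0)
N, L, d = 200, 20, 3
Y = rng.standard_normal((2, N))
sq = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
M = np.exp(-sq / 2.0)
C = M[:, :L]

U_nys = two_step(C, L, d, "nystrom")
U_cs = two_step(C, L, d, "column_sampling")
U_svd = np.linalg.svd(C, full_matrices=False)[0][:, :d]
print(np.allclose(U_cs @ U_cs.T, U_svd @ U_svd.T, atol=1e-8))   # True
```

Neither method reads $B_{22}$. Both touch only the $N \times L$ block $C$.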


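  11. Generalizing approximations
     Each approximation consists of the following steps:
     • define an out-of-sample matrix $Z_{N\times L}$,
     • compute some reduced eigenproblem and a matrix $Q_{L\times d}$ that depends on it,
     • the final approximation is $\tilde U_M = Z Q$.
     | Method | $Z_{N\times L}$ | Reduced eigenproblem $\mathcal{A} U = \mathcal{B} U \Lambda$: $(\mathcal{A}, \mathcal{B})$ | $Q_{L\times d}$ |
     |---|---|---|---|
     | Nyström | $C$ | $A$, $I$ | $U \Lambda^{-1}$ |
     | Column Sampling | $C$ | $Z^T Z$, $I$ | $U \Lambda^{-1/2}$ |
     | LLL | computed from $Y \approx \tilde Y Z^T$ | $Z^T M Z$, $Z^T Z$ | $U$ |
     | Random Projection | $\operatorname{qr}(M^q S)$, $S$ a random $N \times L$ matrix | $Z^T M Z$, $Z^T Z$ | $U$ |
     | Variational Nyström | $C$ | $Z^T M Z$, $Z^T Z$ | $U$ |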
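One way to implement the Random Projection row is a randomized range finder followed by a Rayleigh-Ritz step. This sketch assumes that $S$ is Gaussian and that the $d$ leading eigenpairs are wanted. `random_projection` is a hypothetical name. The code follows the spirit of the Halko($q$) baselines in the experiments, but it is not necessarily the exact variant the authors ran. Because $Z$ has orthonormal columns, $Z^T Z = I$, so the reduced problem is a standard eigenproblem.

```python
import numpy as np

def random_projection(M, L, d, q=1, seed=0):
    """Z = qr(M^q S), then the reduced problem eig(Z^T M Z, Z^T Z) with Z^T Z = I."""
    rng = np.random.default_rng(seed)
    sample = rng.standard_normal((M.shape[0], L))   # S: N x L Gaussian (assumed)
    for _ in range(q):
        sample = M @ sample                          # M^q S, one pass over M per power
    Z, _ = np.linalg.qr(sample)                      # N x L, orthonormal columns
    lam, U = np.linalg.eigh(Z.T @ M @ Z)             # ascending eigenvalues
    top = np.argsort(lam)[::-1][:d]                  # leading eigenpairs
    return Z @ U[:, top], lam[top]                   # U_tilde = Z Q with Q = U

# Made-up test matrix with a known, fast-decaying spectrum 0.5**k.
rng = np.random.default_rng(1)
N = 300
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
true_eigs = 0.5 ** np.arange(N)
M = (V * true_eigs) @ V.T
U_tilde, lam = random_projection(M, L=20, d=5, q=2)
print(np.allclose(lam, true_eigs[:5], atol=1e-6))   # True
```

Each extra power $q$ costs one more multiplication by $M$. For larger $q$, the usual safeguard against round-off is to re-orthonormalize between powers.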

  12. Variational Nyström
     Add the Nyström out-of-sample constraint to the spectral problem:
     $\min_X \operatorname{tr}(X M X^T)$ s.t. $X X^T = I$, $X = \tilde X C^T$ $\Rightarrow$ $\min_{\tilde X} \operatorname{tr}(\tilde X C^T M C \tilde X^T)$ s.t. $\tilde X C^T C \tilde X^T = I$.
     From the LLL perspective:
     • replace the custom-built out-of-sample matrix $Z$ with the readily available column matrix $C$,
     • abandon the local linearity assumption on the weights $Z$,
     • save the computation of $Z$,
     • $C$ is usually sparser than $Z$ (due to locality).
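Here is a minimal dense sketch of Variational Nyström: take $Z = C$ and solve the reduced problem $\operatorname{eig}(C^T M C, C^T C)$. It follows the slide's min formulation and keeps the smallest generalized eigenvalues, so the made-up $M$ is a symmetric normalized graph Laplacian. For an affinity matrix whose leading eigenvectors are wanted, you would keep the largest eigenvalues instead; that sign convention is our assumption. The last lines check the optimality claim on a small case. Any other feasible $X = R\,C^T$ with $X X^T = I$ gives an objective at least as large.

```python
import numpy as np
from scipy.linalg import eigh

def variational_nystrom(M, landmarks, d):
    """Constrain X = X_tilde C^T with C = M[:, landmarks]; solve eig(C^T M C, C^T C)."""
    C = M[:, landmarks]                       # N x L column block
    _, U = eigh(C.T @ M @ C, C.T @ C)         # ascending; U^T (C^T C) U = I
    X_tilde = U[:, :d].T                      # d smallest (min formulation)
    return X_tilde @ C.T, C                   # X (d x N) and C

def objective(X, M):
    return np.trace(X @ M @ X.T)

# Made-up data: symmetric normalized graph Laplacian I - D^{-1/2} W D^{-1/2}.
rng = np.random.default_rng(0)
N, L, d = 400, 40, 3
Y = rng.standard_normal((2, N))
sq = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
W = np.exp(-sq / (2 * 0.5 ** 2))
deg = W.sum(axis=1)
M = np.eye(N) - W / np.sqrt(np.outer(deg, deg))
landmarks = rng.choice(N, L, replace=False)
X, C = variational_nystrom(M, landmarks, d)

# A random feasible competitor X' = R C^T, orthonormalized with a Cholesky
# factor so that X' X'^T = I (it stays of the form (.) C^T).
R = rng.standard_normal((d, L))
X_rand = R @ C.T
chol = np.linalg.cholesky(X_rand @ X_rand.T)
X_rand = np.linalg.solve(chol, X_rand)
print(objective(X, M) <= objective(X_rand, M))   # True
```

With a sparse $M$, $C^T M C$ can be formed with sparse products, and the only dense eigenproblem is $L \times L$.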

  13. Variational Nyström
     Same constrained problem as on slide 12, now from the Nyström perspective:
     • use the same out-of-sample matrix $C$, but optimize the choice of the reduced eigenproblem,
     • for fixed landmarks $\tilde Y$, it gives a better approximation than Nyström or Column Sampling (it is optimal for the out-of-sample kernel $C$),
     • it uses all the elements of $M$ to construct the reduced eigenproblem,
     • it forgoes the interpolating property of Nyström.

  14. Subsampling graph Laplacian
     • Consider $M$ given by the normalized graph Laplacian matrix $M \propto D^{-1/2} W D^{-1/2}$, with
       - Gaussian affinity matrix: $w_{nm} = \exp(-\|y_n - y_m\|^2 / 2\sigma^2)$,
       - degree matrix: $D = \operatorname{diag}(\sum_{m=1}^{N} w_{nm})$.
     • It is one of the most widely used kernels (Laplacian eigenmaps, spectral clustering).
     • The graph Laplacian kernel is data-dependent: the $L \times L$ subset of the graph Laplacian constructed for $N$ points $\neq$ the $L \times L$ graph Laplacian computed for a subset of $L$ input points.
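The data dependence is easy to see numerically. This sketch uses made-up 2D points, and `normalized_affinity` is a hypothetical helper. It builds $D^{-1/2} W D^{-1/2}$ once for all $N$ points and once for the landmarks alone. The two $L \times L$ blocks differ because the degrees $d_n = \sum_m w_{nm}$ are summed over different point sets. For instance, every diagonal entry equals $1/d_l$, and a landmark's degree is smaller when only landmarks are counted.

```python
import numpy as np

def normalized_affinity(Y, sigma):
    """D^{-1/2} W D^{-1/2} with w_nm = exp(-||y_n - y_m||^2 / (2 sigma^2))."""
    sq = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)   # squared distances
    W = np.exp(-sq / (2 * sigma ** 2))
    deg = W.sum(axis=1)                                        # degrees d_n
    return W / np.sqrt(np.outer(deg, deg))

rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 300))
landmarks = rng.choice(300, 30, replace=False)

K_full = normalized_affinity(Y, sigma=0.5)
sub_of_full = K_full[np.ix_(landmarks, landmarks)]      # L x L block of the N x N matrix
built_on_sub = normalized_affinity(Y[:, landmarks], sigma=0.5)
print(np.allclose(sub_of_full, built_on_sub))           # False: degrees differ
```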

  15. Subsampling graph Laplacian
     • Data dependence can be a problem for methods that rely on subsampling:
       - Nyström,
       - Column Sampling,
       - Variational Nyström.
     • It is not a problem for methods that do no subsampling:
       - LLL,
       - Random Projection.
     Our solution: normalize the subsampled kernel separately, but in a way that interpolates over the landmarks and gives the exact solution when $L = N$:
     $D_1^{-1/2}\, C\, D_2^{-1/2} \to M$ as $L \to N$.

  16. Subsampling graph Laplacian
     $D_1^{-1/2}\, C\, D_2^{-1/2} \to M$ as $L \to N$
     • For Nyström and Column Sampling:
       - we propose different forms for $D_1$ and $D_2$,
       - we evaluate empirically which one is best.
     • For Variational Nyström:
       - we show that $D_2$ factors out,
       - any $D_1$ leads to the exact solution when $L = N$.
     For the graph Laplacian kernel, the Variational Nyström approximation is more general.
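The fact that $D_2$ factors out follows from a congruence. Let $Z' = Z D_2^{-1/2}$. Then the pair $(Z'^T M Z', Z'^T Z')$ equals $D_2^{-1/2} (Z^T M Z, Z^T Z) D_2^{-1/2}$. The generalized eigenvalues are therefore unchanged, and the eigenvectors map as $u' = D_2^{1/2} u$. This gives the same $\tilde U_M = Z' u' = Z u$. The sketch below checks the eigenvalue part numerically. The diagonal scalings are random stand-ins, because these slides do not give the specific forms of $D_1$ and $D_2$.

```python
import numpy as np
from scipy.linalg import eigh

def vn_eigenvalues(M, Z):
    """Generalized eigenvalues (ascending) of the VN reduced problem (Z^T M Z, Z^T Z)."""
    return eigh(Z.T @ M @ Z, Z.T @ Z, eigvals_only=True)

# Made-up normalized affinity D^{-1/2} W D^{-1/2} on 2D points.
rng = np.random.default_rng(0)
N, L = 300, 30
Y = rng.standard_normal((2, N))
sq = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
W = np.exp(-sq / (2 * 0.3 ** 2))
deg = W.sum(axis=1)
M = W / np.sqrt(np.outer(deg, deg))
C = M[:, rng.choice(N, L, replace=False)]

# Hypothetical positive diagonal scalings standing in for D_1 and D_2.
d1 = rng.uniform(0.5, 2.0, N)
d2 = rng.uniform(0.5, 2.0, L)
Z_1 = C / np.sqrt(d1)[:, None]         # D_1^{-1/2} C      (scales rows)
Z_12 = Z_1 / np.sqrt(d2)[None, :]      # D_1^{-1/2} C D_2^{-1/2}  (scales columns)
print(np.allclose(vn_eigenvalues(M, Z_1), vn_eigenvalues(M, Z_12), atol=1e-6))  # True
```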

  17. Experiments: Laplacian eigenmaps
     • Reduce the dimensionality of $N = 20\,000$ MNIST digits to $d = 10$.
     • Run 5 times with different randomly chosen landmarks, from $L = 11$ to $L = 19\,900$.
     [Figure: error with respect to the exact objective function vs. number of landmarks $L$ (log-log axes); methods: Nys, CS, LLL, oNys, Halko (q = 1), Halko (q = 2), Halko (q = 3).]

  18. Experiments: Laplacian eigenmaps
     Same setup as slide 17.
     [Figure: runtime vs. number of landmarks $L$ (log-log axes).]


  20. Experiments: Laplacian eigenmaps
     Same setup as slide 17.
     [Figure: error with respect to the exact objective function vs. runtime (log-log axes).]
     Variational Nyström is winning! 2x as fast as LLL!

  21. Experiments: Spectral clustering
     [Figure: original image; exact spectral clustering, $t = 512$ s; Variational Nyström, $t = 25$ s; Nyström, $t = 25$ s.]
     20x speedup!
