spectral clustering of large networks

  1. SPECTRAL CLUSTERING OF LARGE NETWORKS A. Fender, N. Emad, S. Petiton, M. Naumov May 8th, 2017

  2. Agenda: Introduction, Laplacian, Modularity, Conclusions

  3. THE CLUSTERING PROBLEM Example: detect relevant groups based on frequent co-purchasing on Amazon.com. Data: V. Krebs, 2004. Visualization: M. Bastian, S. Heymann, and M. Jacomy. “Gephi: An Open Source Software for exploring and manipulating networks.” 2009.

  4. THE CLUSTERING PROBLEM Legend: pink = liberal, yellow = neutral, green = conservative. Data: V. Krebs, 2004. Visualization: Gephi (M. Bastian, S. Heymann, and M. Jacomy, 2009).

  5. CLUSTERING ALGORITHMS • Spectral: build a matrix, solve an eigenvalue problem ($A x = \lambda x$), use eigenvectors for clustering • Hierarchical / Agglomerative: build a hierarchy (fine to coarse), partition the coarse level, propagate results back to the fine level • Local refinements: switch one node at a time

  6. Laplacian

  7. RATIO CUT COST FUNCTION Objective function: $\mathrm{Cost} = \sum_{i=1}^{p} \frac{|\partial E_i|}{|V_i|}$, where $|\partial E_i|$ is the number of edges cut and $|V_i|$ is the number of nodes in the $i$-th partition. Example: $\mathrm{Cost} = \frac{2}{2} + \frac{2}{3} = \frac{5}{3}$. A compromise between a small edge cut and balanced partitions. M. Naumov, T. Moon. “Parallel spectral graph partitioning.” Nvidia Technical Report, 2016.
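
The ratio cut cost can be checked numerically. The sketch below is an illustration only: the edge list is an assumption (inferred from the Laplacian entries of the 5-node example on the later slides), and `ratio_cut_cost` is a hypothetical helper name, not part of any library.

```python
from fractions import Fraction

# Assumed edge list for the 5-node example (inferred from the Laplacian
# entries shown on the later slides); nodes are numbered 1..5.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (3, 5)]


def ratio_cut_cost(edges, partitions):
    """Sum over partitions of (#edges leaving the partition) / (#nodes in it)."""
    cost = Fraction(0)
    for part in partitions:
        part = set(part)
        # An edge is cut by this partition if exactly one endpoint lies inside it.
        cut = sum(1 for u, v in edges if (u in part) != (v in part))
        cost += Fraction(cut, len(part))
    return cost


print(ratio_cut_cost(EDGES, [{1, 2}, {3, 4, 5}]))  # 5/3
```

Exact fractions are used so the result can be compared directly with the 5/3 on the slide.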

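  8. GRAPH LAPLACIAN $L = D - A$, where $D$ is the degree matrix and $A$ is the adjacency matrix. For the 5-node example graph:
     $$\underbrace{\begin{pmatrix} 1 & -1 & 0 & 0 & 0 \\ -1 & 3 & -1 & -1 & 0 \\ 0 & -1 & 3 & -1 & -1 \\ 0 & -1 & -1 & 2 & 0 \\ 0 & 0 & -1 & 0 & 1 \end{pmatrix}}_{L} = \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}}_{D} - \underbrace{\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}}_{A}$$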
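
A minimal sketch of assembling $L = D - A$ from an edge list follows. The edge list is an assumption inferred from the matrix entries on this slide, and `laplacian` is a hypothetical helper; a dense list-of-lists is used only because the example is tiny.

```python
def laplacian(n, edges):
    """Build the dense Laplacian L = D - A of an undirected, unweighted graph.

    Nodes are numbered 1..n; `edges` is a list of (u, v) pairs.
    """
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = u - 1, v - 1
        L[i][i] += 1  # each edge adds 1 to the degree of both endpoints (D)
        L[j][j] += 1
        L[i][j] -= 1  # and contributes -1 off the diagonal (-A)
        L[j][i] -= 1
    return L


# Assumed edge list for the 5-node example (inferred from the slide's matrix).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (3, 5)]
for row in laplacian(5, EDGES):
    print(row)
```

Every row of the result sums to zero, which is why the constant vector is always an eigenvector of $L$ with eigenvalue 0.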

  9. GRAPH LAPLACIAN For a vector $x$ with elements that are 0 or 1 (the indicator vector of partition $V_1$): number of edges cut $|\partial E_1| = x^T L x = 2$; number of elements $|V_1| = x^T x = 2$.
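
Why the quadratic form counts cut edges can be seen from a short derivation. This is a sketch for an unweighted, undirected graph; the symbols $d_u$ (degree of node $u$) and $x_u$ (component of $x$ for node $u$) are notation introduced here for illustration.

```latex
x^T L x \;=\; x^T (D - A)\, x
        \;=\; \sum_{u} d_u x_u^2 \;-\; 2 \sum_{(u,v) \in E} x_u x_v
        \;=\; \sum_{(u,v) \in E} (x_u - x_v)^2 ,
\qquad
x^T x \;=\; \sum_{u} x_u^2 .
```

For a 0/1 vector, $(x_u - x_v)^2$ is 1 exactly when the edge $(u,v)$ has one endpoint inside the partition and one outside, so the first sum is the number of cut edges, and $x_u^2 = x_u$ makes the second sum the number of nodes in the partition.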

  10. MINIMIZATION PROBLEM $\min \sum_{i=1}^{p} \frac{|\partial E_i|}{|V_i|} = \min \sum_{i=1}^{p} \frac{x_i^T L x_i}{x_i^T x_i}$, where $x_i \in \{0,1\}^n$ and $x_i \perp x_j$ for $i \neq j$. Next step: relax the requirements on $x$ and let $x_i$ take real values.
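
One common way to write the relaxed problem is sketched below. The normalization $X^T X = I$ and the trace form are assumptions based on the standard spectral partitioning argument, not something stated on the slide; $u_i$ and $\lambda_i$ denote eigenvectors and eigenvalues of $L$.

```latex
\min_{X \in \mathbb{R}^{n \times p},\; X^T X = I} \operatorname{Tr}(X^T L X)
  \;=\; \sum_{i=1}^{p} \lambda_i ,
\qquad
L u_i = \lambda_i u_i , \quad \lambda_1 \le \lambda_2 \le \dots \le \lambda_n .
```

With each column scaled to unit length, $\sum_i x_i^T L x_i / x_i^T x_i$ equals $\operatorname{Tr}(X^T L X)$, and the minimum is attained by taking the columns of $X$ to be the eigenvectors $u_1, \ldots, u_p$ of the $p$ smallest eigenvalues. This is why the next stage is an eigensolver.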

  11. K-MEANS POINTS CLUSTERING Points $p_1, p_2, \ldots, p_l$; centroids $c_1, \ldots, c_k$. Lloyd’s Algorithm: • Select centroids • Compute distance of points to centroids • Assign points to the closest centroid • Recompute centroid position
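
The steps of Lloyd's algorithm listed above can be sketched as follows. This is a minimal illustration, not the GPU implementation from the talk: `lloyd_kmeans` is a hypothetical name, the initial centroids are passed in by the caller because the slide does not say how they are selected, and an empty cluster simply keeps its previous centroid.

```python
def lloyd_kmeans(points, centroids, max_iter=100):
    """Cluster `points` (tuples of floats), starting from the given centroids.

    Returns (labels, centroids). Stops when assignments no longer change.
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the closest centroid (squared Euclidean distance).
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centroid as the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = lloyd_kmeans(points, centroids=points[:2])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

Seeding with the first two points is deliberately naive; it shows that the iteration recovers from a poor start on well-separated data.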

  12. EDGE CUT MINIMIZATION PIPELINE Graph → (Preprocessing) → Laplacian → (Eigensolver) → Points → (Clustering) → Clustering. For the 5-node example, the eigenvectors used as points are $x_1 = (1.0, 1.0, 1.0, 1.0, 1.0)^T$ and $x_2 = (-1.0, -0.3, 0.3, 0.0, 1.0)^T$.
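
The eigensolver stage of this pipeline can be illustrated on the 5-node example. The sketch below swaps in plain power iteration on $cI - L$ with the constant vector projected out; this is a simple stand-in chosen for readability, not the eigensolver used by the authors. The edge list is an assumption inferred from the Laplacian entries on the slides.

```python
import math

# Assumed edge list for the 5-node example (inferred from the slides).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (3, 5)]
N = 5


def laplacian_matvec(x):
    """Return L @ x using (L x)_u = sum over neighbors v of (x_u - x_v)."""
    y = [0.0] * len(x)
    for u, v in EDGES:
        d = x[u - 1] - x[v - 1]
        y[u - 1] += d
        y[v - 1] -= d
    return y


def second_eigenvector(iters=500):
    """Power iteration on (c*I - L), deflating the constant eigenvector."""
    c = 2.0 * 3  # 2 * (max degree) is an upper bound on the eigenvalues of L
    x = [float(i) for i in range(N)]  # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(x) / N
        x = [xi - mean for xi in x]                   # remove the constant component
        Lx = laplacian_matvec(x)
        x = [c * xi - lxi for xi, lxi in zip(x, Lx)]  # apply c*I - L
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    Lx = laplacian_matvec(x)
    eigenvalue = sum(a * b for a, b in zip(x, Lx))    # Rayleigh quotient, ||x|| = 1
    return eigenvalue, x


eigenvalue, x2 = second_eigenvector()
scaled = [xi / x2[-1] for xi in x2]  # scale so the last entry is 1, as on the slide
print(eigenvalue, scaled)
```

After scaling, the entries agree with the slide's second eigenvector to the printed precision. The entry for node 4 is numerically zero, so a sign-based split cannot place that node reliably; running k-means on the points is one way to resolve such ties.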

  13. SPECTRAL EDGE CUT MINIMIZATION Balanced cut minimization compared with ground truth: 80% hit rate.

  14. Modularity

  15. MODULARITY FUNCTION Measures the difference between how well vertices are assigned into clusters for the current graph $G = (V,E)$ versus a random graph $R = (V,F)$: $Q = \frac{1}{2\omega} \sum_{i,j} \left( w_{ij} - \frac{v_i v_j}{2\omega} \right) \delta(c(i), c(j))$ for some assignment $c(\cdot)$ into clusters. A. Fender, N. Emad, S. Petiton, M. Naumov. “Parallel Modularity Clustering.” ICCS, 2017.
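
A direct evaluation of the modularity sum is sketched below. The reading of the symbols is an assumption following the usual definition of modularity: $w_{ij}$ is the edge weight (1 for an edge, 0 otherwise, in this unweighted example), $v_i$ is the degree of node $i$, and $\omega$ is the total edge weight. The edge list is also assumed (inferred from the 5-node example), and `modularity` is a hypothetical helper.

```python
from fractions import Fraction

# Assumed edge list for the 5-node example (inferred from the slides).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (3, 5)]


def modularity(n, edges, cluster_of):
    """Q = 1/(2w) * sum_ij (w_ij - v_i*v_j/(2w)) * delta(c(i), c(j)).

    Unweighted, undirected graph with nodes 1..n.
    `cluster_of` maps each node to its cluster id.
    """
    A = [[0] * (n + 1) for _ in range(n + 1)]  # index 0 is unused
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    degree = [sum(row) for row in A]
    two_w = sum(degree)  # twice the number of edges
    q = Fraction(0)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if cluster_of[i] == cluster_of[j]:  # delta(c(i), c(j))
                q += A[i][j] - Fraction(degree[i] * degree[j], two_w)
    return q / two_w


print(modularity(5, EDGES, {1: 0, 2: 0, 4: 0, 3: 1, 5: 1}))  # 2/25
print(modularity(5, EDGES, {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}))  # 0
```

Putting every node in a single cluster always gives $Q = 0$, which makes a convenient sanity check.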

  16. MODULARITY MATRIX Let matrix $B = A - \frac{1}{2\omega} v v^T$; then modularity $Q = \frac{1}{2\omega} \mathrm{Tr}(X^T B X)$, where $\mathrm{Tr}(\cdot)$ is the trace (sum of diagonal elements) and $X = [x_1, \ldots, x_p]$ is such that $x_{ik} = 1$ if $c(i) = k$ and 0 otherwise. For the 5-node example, $v = (1, 3, 3, 2, 1)^T$.
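
The trace form can be connected to the sum form of modularity in a few steps. This is a sketch that assumes $A_{ij} = w_{ij}$, so that $B_{ij} = w_{ij} - v_i v_j / (2\omega)$.

```latex
\operatorname{Tr}(X^T B X)
  \;=\; \sum_{k=1}^{p} x_k^T B\, x_k
  \;=\; \sum_{k=1}^{p} \sum_{i,j} B_{ij}\, x_{ik}\, x_{jk}
  \;=\; \sum_{i,j} B_{ij}\, \delta(c(i), c(j)) .
```

The last step uses $\sum_k x_{ik} x_{jk} = 1$ when $i$ and $j$ are in the same cluster and 0 otherwise. Dividing by $2\omega$ gives the sum form of $Q$.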

  17. MODULARITY MAXIMIZATION PIPELINE Graph → (Preprocessing) → Modularity matrix $A - \frac{1}{2\omega} v v^T$ → (Eigensolver) → Points → (Clustering) → Clustering. [Figure: eigenvector entries of the 5-node example used as points.]

  18. SPECTRAL MODULARITY MAXIMIZATION Spectral modularity maximization compared with ground truth: 84% hit rate.

  19. PROFILING The eigensolver takes 90% of the time. The sparse matrix-vector multiplication takes 90% of the time in the eigensolver.
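
To make the dominant kernel concrete, here is a sequential sketch of sparse matrix-vector multiplication. The CSR (compressed sparse row) layout is an assumption; the slide does not name a storage format. `csr_matvec` is a hypothetical helper, and the matrix is the Laplacian of the assumed 5-node example.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Laplacian of the assumed 5-node example in CSR form (0-based column indices).
row_ptr = [0, 2, 6, 10, 13, 15]
col_idx = [0, 1,   0, 1, 2, 3,   1, 2, 3, 4,   1, 2, 3,   2, 4]
values = [1, -1,  -1, 3, -1, -1,  -1, 3, -1, -1,  -1, -1, 2,  -1, 1]

print(csr_matvec(row_ptr, col_idx, values, [1, 1, 1, 1, 1]))
# [0.0, 0.0, 0.0, 0.0, 0.0]
```

Each row's dot product is independent of the others, which is what makes this kernel a natural target for parallel hardware.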

  20. MODULARITY VS. LAPLACIAN CLUSTERING Modularity: • higher and steadier modularity score • 3x speedup over Laplacian. Laplacian: • homogeneous cluster sizes. Hardware: Nvidia Titan X (Pascal), Intel Core i7-3930K @3.2 GHz.

  21. SPEEDUP AND QUALITY VS. AGGLOMERATIVE* • 0.8 s on a network with 100 million edges on a single Titan X GPU • Speedup: 3x over the agglomerative* scheme • Tradeoff: speed vs. quality. *: D. LaSalle and G. Karypis. “Multi-threaded Modularity Based Graph Clustering Using the Multilevel Paradigm.” J. Parallel Distrib. Comput., Vol. 76, pp. 66-80, 2015. Hardware: Nvidia Titan X (Pascal), Intel Core i7-3930K @3.2 GHz.

  22. Spectral Clustering in CUDA Toolkit 9.0 release of nvGRAPH

  23. CUDA TOOLKIT 9.0 nvGRAPH API
      nvgraphStatus_t nvgraphSpectralClustering(
          nvgraphHandle_t handle,
          const nvgraphGraphDescr_t graph_descr,
          const size_t weight_index,
          const struct SpectralClusteringParameter *params,
          int *clustering,
          void *eig_vals,
          void *eig_vects
      );
      struct SpectralClusteringParameter {
          int n_clusters;
          int n_eig_vects;
          nvgraphSpectralClusteringType_t alg;
          float evs_tolerance;
          int evs_max_iter;
          float kmean_tolerance;
          …
      };

  24. Conclusions

  25. SPECTRAL CLUSTERING • Software framework similar for both • Laplacian, minimum balanced cut [1]: probably the most common metric, with balancing involved in the cost function; requires careful choice of eigensolver • Modularity maximization [2]: widely used in analysis of social networks; faster to compute. [1] M. Naumov, T. Moon. “Parallel spectral graph partitioning.” Nvidia Technical Report, 2016. [2] A. Fender, N. Emad, S. Petiton, M. Naumov. “Parallel Modularity Clustering.” ICCS, 2017.

  26. Thank you. H7129 - Accelerated Libraries: Monday and Wednesday @ 4:00, Pod B
