
Graph Clustering - PowerPoint PPT Presentation



  1. Graph Clustering

  2. What is clustering? • Finding patterns in data, or grouping similar data points together into clusters. • Clustering algorithms for numeric data: Lloyd's k-means, EM clustering, spectral clustering, etc.

  3. Examples of good clustering: • Image segmentation.

  4. Graph Clustering: • Graphical representation of data as undirected graphs. • Graph partitioning!

  5. Graph clustering: • Undirected graphs. • Clustering of vertices on the basis of edge structure. • Defining a graph cluster: in its loosest sense, a graph cluster is a connected component; in its strictest sense, it is a maximal clique of the graph. • Many edges within each cluster. • Few edges between clusters.

  6. Graph terminology:

  7. Graph partitioning:

  8. Graph Partitioning: • The optimization problem for normalized cuts is intractable (NP-hard). • Hence we resort to spectral clustering and approximation algorithms.

  9. More graph notation: • Adjacency matrix A. • Degree matrix D. • The properties of the Laplacian of a graph are more interesting for the characterization of a graph than those of the adjacency matrix. • The unnormalized graph Laplacian is defined as L = D − A.
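A minimal numerical sketch of this definition. The 4-vertex toy graph is an assumption chosen for illustration, not taken from the slides:

```python
import numpy as np

# Hypothetical 4-vertex undirected graph with two edges (0-1 and 2-3),
# i.e. two connected components.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))  # degree matrix: row sums of A on the diagonal
L = D - A                   # unnormalized graph Laplacian

print(L)
```

Each diagonal entry of L is a vertex degree, and each off-diagonal entry is −1 exactly where an edge exists.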

  10. Properties of the Laplacian: 1. For every vector f, f^T L f = (1/2) Σ_{i,j} w_ij (f_i − f_j)^2. 2. L is symmetric and positive semi-definite. 3. 0 is an eigenvalue of the Laplacian, with the constant one-vector as a corresponding eigenvector. 4. L has n non-negative eigenvalues.

  11. Number of Components:

  12. Graph spectra: • The multiplicity of the eigenvalue 0 gives the number of connected components in the graph.
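A quick numerical check of this fact, assuming the same kind of toy graph (two disjoint edges, hence two components):

```python
import numpy as np

# Toy graph with two connected components: edges 0-1 and 2-3 only.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)  # real eigenvalues, ascending (L is symmetric)
n_components = int(np.sum(np.isclose(eigvals, 0.0)))
print(n_components)  # multiplicity of eigenvalue 0 -> number of components
```

Here the spectrum is {0, 0, 2, 2}, so the zero eigenvalue has multiplicity 2, matching the two components.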

  13. Graph generation models: • Uniform random model: all edges equiprobable; Poissonian degree distribution; no cluster structure. • Planted partition model: l partitions of the vertex set; edge probabilities p and q. • Caveman graphs, R-MAT generation, etc. • Fuzzy graphs?
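The planted partition model is easy to sketch directly; the function name and interface below are illustrative, not from the slides:

```python
import numpy as np

def planted_partition(sizes, p, q, seed=0):
    """Sample an adjacency matrix with intra-block edge probability p
    and inter-block probability q (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)  # block label per vertex
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample upper triangle
    return (upper | upper.T).astype(float)            # symmetrize, no self-loops

A = planted_partition([10, 10], p=0.9, q=0.05)
print(A.shape)
```

With p much larger than q, the resulting graph has dense blocks and few cross edges, i.e. a clear planted cluster structure.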

  14. General clustering paradigms: • Hierarchical clustering vs. flat clustering. • Hierarchical: top-down or bottom-up.

  15. Overview: • Cut-based methods: become NP-hard with the introduction of size constraints; approximation algorithms minimizing graph conductance. • Maximum flow: using results by Goldberg and Tarjan; reasonable for small graphs. • Graph-spectrum-based methods: stable under perturbation analysis; good even when the graph is not exactly block-diagonal; typically, the second-smallest eigenvalue is taken as the graph characteristic; spectrum of the graph transition matrix for a blind walk.
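A minimal sketch of the "second-smallest eigenvalue" idea: the sign pattern of the corresponding eigenvector (the Fiedler vector) yields a two-way partition. The bridged-triangles toy graph is an assumption for illustration:

```python
import numpy as np

# Two triangles joined by a single bridge edge 2-3 (assumed toy graph).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A
vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
fiedler = vecs[:, 1]                   # eigenvector of 2nd-smallest eigenvalue
partition = (fiedler > 0).astype(int)  # sign pattern splits the vertex set
print(partition)
```

The two triangles end up on opposite sides of the split, with the bridge edge 2-3 as the cut.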

  16. Overview: • Could experiment with properties of different Laplacians. • Typically outperforms k-means and other traditional clustering algorithms. • Computationally infeasible for large graphs. • Roundabouts?

  17. Voltage-potential view: ☺ • Related to the 'betweenness' of edges. • Not stable to the placement of random sources and sinks.

  18. Markov random walks: • Vertices in the same cluster are quickly reachable from one another. • A random walk started in one of the clusters is likely to remain there for a long time. • The Perron-Frobenius theorem ensures that the largest eigenvalue associated with a transition matrix is always 1 (relation with the graph Laplacian). • Components of the second eigenvector of the transition matrix serve as a measure of absorption time.
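The Perron-Frobenius claim can be checked numerically; a sketch assuming a small bridged-triangles toy graph (not from the slides):

```python
import numpy as np

# Two triangles bridged by edge 2-3 (assumed toy graph).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Row-stochastic transition matrix P = D^-1 A of the random walk.
P = A / A.sum(axis=1, keepdims=True)
vals = np.sort(np.linalg.eigvals(P).real)[::-1]  # descending

print(round(vals[0], 6))  # Perron-Frobenius: top eigenvalue is exactly 1
```

P is similar to the symmetric matrix D^(-1/2) A D^(-1/2), so its eigenvalues are real, and the top one equals 1 for any connected graph.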

  19. Thank you.
