
Data Mining and Matrices 07: Graphs (Rainer Gemulla, Pauli Miettinen)



  1. Data Mining and Matrices 07 – Graphs Rainer Gemulla, Pauli Miettinen Jun 6, 2013

  2. Graph mining
     Graphs everywhere
       ◮ Internet
       ◮ World wide web
       ◮ Social networks
       ◮ Protein-protein interactions
       ◮ Similarity graphs
       ◮ ...
     Goals of graph mining
       ◮ As in data mining: classification, clustering, outliers, patterns
       ◮ Output is often also one or more graphs
       ◮ Interesting subgraphs (e.g., communities, near-cliques, clusters)
       ◮ Important vertices (e.g., influential bloggers, PageRank, outliers)
       ◮ Web mining (e.g., topic prediction, classification)
       ◮ Web usage mining (e.g., frequent subgraphs, patterns)
       ◮ Recommender systems (e.g., movie recommendation, edge prediction)
       ◮ ...
     Spectral analysis of matrices associated with graphs is an important tool in graph mining.
     Our focus: spectral clustering and link analysis.

  3. A graph is a matrix is a graph
     Let $G = (V, E)$ be a (weighted) graph
     Vertices $V = \{v_1, \ldots, v_n\}$
     Edge $(i, j) \in E$ has positive weight $w_{ij}$ (or 1 if the graph is unweighted)
     Convention: absent edges $(i, j) \notin E$ have weight $w_{ij} = 0$
     Adjacency matrix $W$ is the $n \times n$ matrix with $W_{ij} = w_{ij}$
     Undirected graph $\iff$ $W$ symmetric ($W = W^T$)
     Degree of vertex $i$ is given by $d_i = \sum_j w_{ij} = W_{i*} \mathbf{1}$
     Degree matrix $D$ is the $n \times n$ diagonal matrix with $D_{ii} = d_i$
     Example: graph $G$ on vertices $v_1, \ldots, v_5$ (figure omitted), with adjacency matrix $W$ and degree matrix $D$:
     $$
     W = \begin{pmatrix}
     0 & 0 & 0 & 0 & 0 \\
     1 & 0 & 1 & 1 & 0 \\
     0 & 1 & 0 & 0 & 0 \\
     0 & 1 & 1 & 0 & 0 \\
     0 & 0 & 0 & 1 & 0
     \end{pmatrix},
     \qquad
     D = \begin{pmatrix}
     0 & 0 & 0 & 0 & 0 \\
     0 & 3 & 0 & 0 & 0 \\
     0 & 0 & 1 & 0 & 0 \\
     0 & 0 & 0 & 2 & 0 \\
     0 & 0 & 0 & 0 & 1
     \end{pmatrix}
     $$
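The definitions on this slide can be checked with a few lines of code. The following is a minimal sketch in plain Python; the helper names (`adjacency_matrix`, `degrees`, `degree_matrix`, `is_symmetric`) are made up for illustration, and the edge list is an assumption chosen so that it reproduces the entries of $W$ as they appear in the example.

```python
# Hypothetical helpers for illustration; not part of the original slides.

def adjacency_matrix(n, edges):
    """Build the n x n matrix W with W[i][j] = w_ij; absent edges stay 0."""
    W = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        W[i][j] = w
    return W

def degrees(W):
    """d_i = sum_j w_ij, i.e. the i-th row sum of W."""
    return [sum(row) for row in W]

def degree_matrix(W):
    """Diagonal matrix D with D[i][i] = d_i."""
    d = degrees(W)
    n = len(W)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]

def is_symmetric(W):
    """True iff W equals its transpose."""
    n = len(W)
    return all(W[i][j] == W[j][i] for i in range(n) for j in range(n))

# Assumed edge list (i, j, w_ij), 0-based: vertex v_1 is index 0, and so on.
edges = [(1, 0, 1), (1, 2, 1), (1, 3, 1), (2, 1, 1),
         (3, 1, 1), (3, 2, 1), (4, 3, 1)]
W = adjacency_matrix(5, edges)
d = degrees(W)
D = degree_matrix(W)
print(d)
print(is_symmetric(W))
```

For these entries the row sums match the diagonal of $D$. The symmetry check fails (for instance, `W[1][0]` is 1 while `W[0][1]` is 0), so by the equivalence stated on the slide the example entries describe a directed graph.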

  4. Outline
     1. Spectral clustering
     2. Similarity Graphs
     3. Graph Laplacian
     4. Unnormalized Spectral Clustering
     5. Normalization
     6. Summary

  5. k-Means example (1)
     [Figure omitted: scatter plot]
     k-Means cannot detect non-convex clusters well.

  6. k-Means example (2)
     [Figure omitted: scatter plot]
     k-Means is sensitive to skew in cluster sizes.

  7. A better clustering
     [Figure omitted: scatter plot]
     In this clustering, points within a cluster are close to their neighbors, but not necessarily to all the points in the cluster.

  8. Graph-based clustering
     1. Given a dataset, construct a similarity graph modeling local neighborhood relationships
     2. Partition the similarity graph using suitable graph cuts
     [Figure omitted: two panels, "Similarity graph" and "Clustering"]
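Step 1 can be made concrete with a small sketch. The slide does not say how the similarity graph is built, so the construction below is an assumption: an unweighted, symmetrized k-nearest-neighbor graph under Euclidean distance. The function name `knn_similarity_graph` and the toy points are hypothetical.

```python
import math

def knn_similarity_graph(points, k):
    """Assumed construction: connect each point to its k nearest neighbors
    (Euclidean distance) and symmetrize, giving an undirected 0/1 matrix W."""
    n = len(points)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort the other points by distance to point i and keep the k closest.
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            W[i][j] = 1
            W[j][i] = 1  # symmetrize so that W = W^T
    return W

# Toy data: two well-separated groups on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
W = knn_similarity_graph(points, k=2)
for row in W:
    print(row)
```

On this toy data the graph only links points to nearby points, so no edge connects the first three points to the last three. That is the local neighborhood structure which step 2 then partitions.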

  9. Discussion
     Clustering
       1. Points within a cluster should be similar
       2. Points in different clusters should be dissimilar
     k-Means is global
       1. All points within a cluster should be similar (close)
       2. Points in different clusters should be dissimilar (far apart)
     Graph-based clustering is local
       1. Neighboring points within a cluster should be similar (close)
       2. Points in different clusters should be dissimilar (far apart)

  10. Which cut? (1)
      $G = (V, E)$: undirected, weighted similarity graph
      $A \subset V$, $\bar{A} = V \setminus A$
      $A$ and $\bar{A}$ form a partitioning of $V$ into two clusters
      Minimum cut
        $$\mathrm{cut}(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} w_{ij}$$
      Can be solved efficiently (in P)
      Often not useful in practice, e.g., may separate a single vertex
      → Need to balance cut weight and cluster sizes
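To see why the minimum cut may separate a single vertex, here is a small sketch under stated assumptions. The toy graph (two triangles joined by a bridge, plus one pendant vertex) and the helper names are hypothetical. The search enumerates all subsets by brute force, which is only feasible for tiny graphs and is not the efficient algorithm the slide alludes to.

```python
from itertools import combinations

def symmetric_weights(n, edges):
    """Undirected weighted graph as a symmetric n x n matrix."""
    W = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        W[i][j] = w
        W[j][i] = w
    return W

def cut_weight(W, A):
    """cut(A, A-bar): total weight of edges from A to its complement."""
    A = set(A)
    return sum(W[i][j] for i in A for j in range(len(W)) if j not in A)

def brute_force_min_cut(W):
    """Try every nonempty proper subset A (exponential; illustration only).
    Ties are broken in favor of the first, i.e. smallest, subset found."""
    n = len(W)
    best = None
    for size in range(1, n):
        for A in combinations(range(n), size):
            weight = cut_weight(W, A)
            if best is None or weight < best[0]:
                best = (weight, set(A))
    return best

# Assumed toy graph: triangles {0,1,2} and {3,4,5} with edge weight 2,
# a bridge 2-3 of weight 3, and a pendant vertex 6 attached to 5 with weight 2.
edges = [(0, 1, 2), (0, 2, 2), (1, 2, 2),
         (3, 4, 2), (3, 5, 2), (4, 5, 2),
         (2, 3, 3), (5, 6, 2)]
W = symmetric_weights(7, edges)
best = brute_force_min_cut(W)
print(best)
print(cut_weight(W, {0, 1, 2}))
```

In this graph the cheapest cut isolates the pendant vertex 6 (weight 2), while the more natural split between the two triangles costs 3. This is the imbalance the slide warns about.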

  11. Which cut? (2)
      Minimum ratio cut (penalize different sizes w.r.t. vertices)
        $$\mathrm{RatioCut}(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} w_{ij} \left( \frac{1}{|A|} + \frac{1}{|\bar{A}|} \right)$$
      Minimum normalized cut (penalize different sizes w.r.t. edges)
        $$\mathrm{Ncut}(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} w_{ij} \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(\bar{A})} \right),$$
        where $\mathrm{vol}(A) = \sum_{i \in A} d_i = \sum_{i \in A} \sum_{j \in V} w_{ij}$
      Unfortunately, both problems are NP-hard
      Spectral clustering is a relaxation of RatioCut or Ncut, is simple to implement, and can be solved efficiently.
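The two objectives can be evaluated directly from their definitions. The sketch below is illustrative only: the toy graph (two triangles joined by a bridge, plus a pendant vertex) and all function names are assumptions. It uses exact fractions so the values can be compared without rounding.

```python
from fractions import Fraction

def symmetric_weights(n, edges):
    """Undirected weighted graph as a symmetric n x n matrix."""
    W = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        W[i][j] = w
        W[j][i] = w
    return W

def cut_weight(W, A):
    """Total weight of edges between A and its complement."""
    return sum(W[i][j] for i in A for j in range(len(W)) if j not in A)

def volume(W, A):
    """vol(A) = sum of the degrees d_i of the vertices in A."""
    return sum(sum(W[i]) for i in A)

def ratio_cut(W, A):
    """cut(A, A-bar) * (1/|A| + 1/|A-bar|)."""
    A = set(A)
    B = set(range(len(W))) - A
    return cut_weight(W, A) * (Fraction(1, len(A)) + Fraction(1, len(B)))

def ncut(W, A):
    """cut(A, A-bar) * (1/vol(A) + 1/vol(A-bar))."""
    A = set(A)
    B = set(range(len(W))) - A
    return cut_weight(W, A) * (Fraction(1, volume(W, A)) + Fraction(1, volume(W, B)))

# Assumed toy graph: triangles {0,1,2} and {3,4,5} with edge weight 2,
# a bridge 2-3 of weight 3, and a pendant vertex 6 attached to 5 with weight 2.
edges = [(0, 1, 2), (0, 2, 2), (1, 2, 2),
         (3, 4, 2), (3, 5, 2), (4, 5, 2),
         (2, 3, 3), (5, 6, 2)]
W = symmetric_weights(7, edges)

for A in ({6}, {0, 1, 2}):
    print(sorted(A), cut_weight(W, A), ratio_cut(W, A), ncut(W, A))
```

Here the single-vertex split {6} has the smaller plain cut weight (2 versus 3), but both RatioCut (7/3 versus 7/4) and Ncut (17/16 versus 34/95) prefer the balanced split {0, 1, 2}. Evaluating a given partition is cheap; the hard part is the minimization over all partitions.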
