
Brendan Meeder Carnegie Mellon University Christos Faloutsos - PowerPoint PPT Presentation

Leman Akoglu (Carnegie Mellon University), Hanghang Tong (IBM T. J. Watson), Brendan Meeder (Carnegie Mellon University), Christos Faloutsos (Carnegie Mellon University). Given a graph with node attributes (features), e.g. social networks + user interests


  1. Leman Akoglu (Carnegie Mellon University), Hanghang Tong (IBM T. J. Watson), Brendan Meeder (Carnegie Mellon University), Christos Faloutsos (Carnegie Mellon University).

  2. Given a graph with node attributes (features), for example social networks + user interests, phone call networks + customer demographics, or gene interaction networks + gene expression info: find cohesive clusters, bridges, and anomalies. A cohesive cluster has similar connectivity and attribute coherence. (Slide footer: Leman Akoglu (CMU), PICS: Parameter-free Identification of Cohesive Subgroups.)

  3. [Figure: adjacency matrix A (people × people) and binary feature matrix F (people × features), with rows and columns arranged into groups.] Given adjacency matrix A and feature matrix F, find homogeneous blocks (clusters) in A and F. Requirements: parameter-free and scalable.

  4. Candidate approaches: flat clustering; graph clustering; additional feature nodes (heterogeneous graph); weighted edges by both connectivity and feature similarity (quadratic pairwise computations! choice of similarity function).

  5. [Comparison of existing methods; the per-method marks did not survive extraction.] Flat clustering (e.g. k-means); [Kriegel+], [Leeuwen+]; METIS [Karypis and Kumar], [Flake+], [Girvan and Newman], [Andersen+]; spectral [Ng+]; co-clustering [Dhillon+]; SA-cluster [Zhou+]; Spect. rel. clus. [Long+]; CoPaM [Moser+]; Gamer [Gunneman+]; Autopart and cross-associations [Chakrabarti+]; GraphScope [Sun+]; PaCK [He+].

  6. DETAILS. (1) How many node- and attribute-clusters? (2) How to assign nodes and attributes to clusters? Main idea: employ Minimum Description Length and minimize $L(M) + L(D|M)$, where $L(M)$ is the encoding length of the clustering and $L(D|M)$ is the encoding length of the blocks. Good clustering implies good compression.

  7. BACKGROUND. Given a database D and a set of models for D, MDL selects the model M that minimizes $L(M) + L(D|M)$, where $L(M)$ is the length in bits of the description of model M and $L(D|M)$ is the length in bits of the data encoded by M. Example (Bishop, PR&ML): a polynomial of degree d = 1, $a_1 x + a_0$ plus deltas, vs. degree d = 9, $a_9 x^9 + \dots + a_1 x + a_0$ with deltas {}.

  8. DETAILS. $L(M)$: model description cost.
     1. n: #nodes, f: #attributes.
     2. k: #node-clusters, l: #attribute-clusters.
     3. Size of node cluster i: $r_i$; size of attribute cluster j.
     With $p_i = r_i / n$, the optimal number of bits per node in cluster i is $-\log p_i = -\log \frac{r_i}{n}$, so the node-cluster cost is $-\sum_i r_i \log \frac{r_i}{n} = -n \sum_i \frac{r_i}{n} \log \frac{r_i}{n} = n H(P)$.
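
The node-cluster cost n·H(P) on this slide can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms (costs in bits); the helper name `node_cluster_cost` and the example cluster sizes are hypothetical and not from the slides.

```python
import math


def node_cluster_cost(cluster_sizes):
    """Bits needed to record each node's cluster: sum_i r_i * log2(n / r_i) = n * H(P)."""
    n = sum(cluster_sizes)
    # A node in cluster i costs -log2(r_i / n) bits; empty clusters contribute nothing.
    return sum(r * math.log2(n / r) for r in cluster_sizes if r > 0)


# Hypothetical example: 8 nodes split into clusters of sizes 4, 2 and 2.
# H(P) = 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits, so the cost is 8 * 1.5 = 12 bits.
print(node_cluster_cost([4, 2, 2]))  # 12.0
```

A single cluster costs 0 bits under this formula, and the cost grows as the clustering becomes finer, which is what lets the description length penalize having too many clusters.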

  9. DETAILS. $L(D|M)$: data description cost given the model. (1) For each block in A and F, count the #1s [symbol not recovered]. (2) Encoding cost of a block: [formula not recovered].
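
The block-cost formula itself did not survive in this transcript. One common choice in MDL-based co-clustering, assumed here rather than taken from the slide, is the entropy code: a block with n1 ones and n0 zeros costs n1·log2((n1+n0)/n1) + n0·log2((n1+n0)/n0) bits. The sketch below sums that cost over all blocks of a 0/1 matrix; `block_cost`, `data_cost`, and the toy matrix are hypothetical names and data.

```python
import math


def block_cost(n1, n0):
    """Entropy-code bits for a block with n1 ones and n0 zeros (assumed cost form)."""
    total = n1 + n0
    cost = 0.0
    for count in (n1, n0):
        if count > 0:  # treat 0 * log(0) as 0, so pure blocks cost nothing
            cost += count * math.log2(total / count)
    return cost


def data_cost(matrix, row_labels, col_labels):
    """Sum block costs over every (row-cluster, column-cluster) block of a 0/1 matrix."""
    ones, cells = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones[key] = ones.get(key, 0) + value
            cells[key] = cells.get(key, 0) + 1
    return sum(block_cost(ones[key], cells[key] - ones[key]) for key in cells)


# Toy matrix with two obvious groups of rows and columns.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

print(data_cost(A, [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0  (all blocks are pure)
print(data_cost(A, [0, 1, 0, 1], [0, 0, 1, 1]))  # 16.0 (every block is half ones)
```

Homogeneous blocks compress to almost nothing while mixed blocks are expensive, which is the sense in which a good clustering implies good compression.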

  10. DETAILS. $L(M)$: model description cost.
     1. n: #nodes, f: #attributes.
     2. k: #node-clusters, l: #attribute-clusters.
     3. Size of node-cluster i, size of attribute-cluster j.
     $L(D|M)$: data description cost given the model.
     1. For each block in A and F, count the #1s.
     2. Encoding cost of a block: [formula not recovered].
     A similar problem (column re-ordering for minimum total run length) is shown to be NP-hard [Johnson+] (reduction from Hamiltonian Path).

  11. The algorithm is iterative and monotonic: it will converge to a local optimum.
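
To illustrate what "iterative and monotonic" means, here is a generic greedy descent, not the authors' procedure: it moves one row at a time to another cluster only when the total encoding cost strictly drops, so the cost trace can never increase and the loop stops at a local optimum. The entropy-based block cost, the fixed number of row clusters `k`, the fixed column labels, and all function names are assumptions made for this sketch.

```python
import math


def total_cost(matrix, row_labels, col_labels):
    """Entropy-code bits summed over all blocks of a 0/1 matrix (assumed cost form)."""
    ones, cells = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones[key] = ones.get(key, 0) + value
            cells[key] = cells.get(key, 0) + 1
    bits = 0.0
    for key, size in cells.items():
        for count in (ones[key], size - ones[key]):
            if count > 0:
                bits += count * math.log2(size / count)
    return bits


def greedy_rows(matrix, row_labels, col_labels, k):
    """Reassign rows one at a time, accepting only strictly cost-reducing moves."""
    labels = list(row_labels)
    history = [total_cost(matrix, labels, col_labels)]
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for c in range(k):
                trial = labels[:i] + [c] + labels[i + 1:]
                cost = total_cost(matrix, trial, col_labels)
                if cost < history[-1] - 1e-12:  # strict decrease keeps the trace monotonic
                    labels, improved = trial, True
                    history.append(cost)
    return labels, history


A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

# Start from a deliberately bad row assignment; column labels are held fixed.
labels, history = greedy_rows(A, [0, 1, 0, 1], [0, 0, 1, 1], k=2)
print(labels, history)
```

Because every accepted move lowers the cost and there are finitely many labelings, the loop must terminate; the result is only locally optimal, since a different starting assignment can end somewhere else.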

  12. [Figure-only slide; no text recovered.]

  13. Computational complexity. [Plot: time per iteration (s) vs. # non-zeros.]

  14. Datasets (n: #nodes, f: #attributes, nnz: # non-zeros).
      #  Graph       Description       n      f     nnz
      1  Phone call  users, titles     94     7     391
      2  Device      users, titles     94     7     5K
      3  PolBooks    books, incl.      92     2     840
      4  PolBlogs    blogs, incl.      1.5K   2     20K
      5  Twitter     users, h-tags     9.6K   10K   82K
      6  YouTube     users, groups     77K    30K   1M
      7  YeastGene   genes, articles   844    17K   64K

  15. [Figure: books × book groups.] Liberal vs. conservative; “core and periphery”.

  16. [Figure: books × book groups.] Liberal vs. conservative; “core and periphery”. Examples of “core” liberal and conservative books. Examples of bridging ‘conservative’ books.

  17. [Figures: phone calls (subjects × title) and device scans (subjects × title).] Group labels: call-center, casual, business, grad.

  18. [Figure: yeast genes × yeast genes and yeast genes × articles; 844 genes, 17K articles.] Labels in the figure: 1, 2, 3; A1, A2, A3; survey.

  19. [Figure: Twitter users × @hashtags; 9.6K users, 10K hashtags.] Group labels: casual, Italian bloggers, heavy-hitters.

  20. [Figure: YouTube users × YouTube groups; 77K users, 30K groups.] Group labels: familiar strangers, anime lovers, bridges.

  21. Conclusions.
     Novel clustering model: PICS finds groups of nodes in an attributed graph with (1) similar connectivity and (2) attribute homogeneity. It also groups the node attributes into attribute-clusters.
     Parameter-free nature: no user input, e.g. number of clusters, similarity functions/thresholds.
     Effectiveness: insightful clusters, bridges and outliers in diverse real-world datasets including YouTube and Twitter.
     Scalability: run time grows linearly with graph + attribute size.

  22. Contact: lakoglu@cs.cmu.edu, http://www.cs.cmu.edu/~lakoglu/ Source code: www.cs.cmu.edu/~lakoglu/#pics
