Mining Attributed Networks, Part 1: Introduction

Overview Complex Network Analysis Outlook. Mining Attributed Networks, Part 1: Introduction. Rushed Kanawati, Martin Atzmueller. A3, Université Sorbonne Paris Cité, France; CSAI, Tilburg University, Netherlands. DSAA17, Tokyo 20


  1. Modularity Optimization: Limitations
     Hypotheses:
     - The best partition of a graph is the one that maximizes the modularity.
     - If a network has a community structure, then it is possible to find a precise partition with maximal modularity.
     - If a network has a community structure, then partitions inducing high modularity values are structurally similar.
     None of the three hypotheses holds [GdMC10, LF11].

  2. Propagation-Based Approaches
     Algorithm 1: Label propagation
     Require: G = <V, E>, a connected graph
       Initialize each node v with a unique label l_v
       while labels are not stable do
         for v ∈ V do
           l_v = arg max_l |Γ_l(v)|   /* random tie-breaking */
         end for
       end while
       return communities from labels
     Γ_l(v): set of neighbors of v having label l
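The label propagation pseudocode can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the graph is assumed to be a dict mapping each node to a set of neighbors, the names `label_propagation`, `max_iter` and `seed` are hypothetical, and a node keeps its current label when that label is already among the most frequent ones (one common stopping convention, assumed here).

```python
import random
from collections import Counter

def label_propagation(adj, max_iter=100, seed=None):
    """Asynchronous label propagation on an undirected graph.

    adj: dict mapping node -> set of neighbor nodes (assumed representation).
    Returns a list of communities (sets of nodes).
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # each node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):             # guard: convergence is not guaranteed
        changed = False
        rng.shuffle(nodes)                # visit nodes in random order
        for v in nodes:
            if not adj[v]:
                continue                  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [l for l, c in counts.items() if c == best]
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)   # random tie-breaking
                changed = True
        if not changed:                   # labels are stable
            break
    communities = {}
    for v, l in labels.items():
        communities.setdefault(l, set()).add(v)
    return list(communities.values())
```

Running it several times with different `seed` values on the same graph is a simple way to observe the low robustness mentioned for this family of methods.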

  3. Label Propagation
     Advantages:
     - Complexity: O(m)
     - Highly parallel
     Disadvantages:
     - No convergence guarantee (oscillation phenomena)
     - Low robustness: different runs yield very different community structures due to randomness

  4. Seed-Centric Algorithms (Kanawati, SCSM'2014)
     Algorithm 2: General seed-centric community detection algorithm
     Require: G = <V, E>, a connected graph
       C ← ∅
       S ← compute_seeds(G)
       for s ∈ S do
         C_s ← compute_local_com(s, G)
         C ← C + C_s
       end for
       return compute_community(C)
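The general seed-centric scheme can be read as a higher-order function with three pluggable steps. The sketch below assumes a dict-of-sets graph; the three plug-ins (`degree_leaders`, `ego_network`, `keep_distinct`) are hypothetical toy choices made only so the skeleton runs, not the strategies of any particular published algorithm.

```python
def seed_centric(adj, compute_seeds, compute_local_com, compute_community):
    """Generic seed-centric skeleton: seeds -> local communities -> final output."""
    local = []
    for s in compute_seeds(adj):
        local.append(compute_local_com(s, adj))
    return compute_community(local, adj)

# --- Toy plug-ins (assumptions for illustration only) ---

def degree_leaders(adj):
    """Seeds: nodes whose degree is strictly higher than that of every neighbor."""
    return [v for v in adj if all(len(adj[v]) > len(adj[u]) for u in adj[v])]

def ego_network(s, adj):
    """Local community of a seed: the seed plus its direct neighbors."""
    return {s} | set(adj[s])

def keep_distinct(local, adj):
    """Final step: drop duplicate local communities, keep overlaps as they are."""
    out = []
    for c in local:
        if c not in out:
            out.append(c)
    return out
```

With these toy plug-ins the result may contain overlapping communities, since a node adjacent to two seeds belongs to both ego networks.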

  5. Link Prediction
     - Structural: find hidden or missing links in a network, e.g. missing links in Wikipedia.
     - Temporal: predict new links that will appear at time t_p, based on the network state at instants t < t_p.
     Readings: M. Pujari et al., Link prediction in complex networks, chapter 3 in Advanced methods for complex networks analysis, N. Meghanathan (editor), IGI publishing, 2016.

  10. Outlook: Alternative Network Models
      Network science is mature enough to move towards more complex, expressive models:
      - K-partite networks
      - Dynamic networks
      - Heterogeneous networks
      - Multiplex networks
      - Attributed networks
      Next: a powerful model, the multiplex network.

  11. Bibliography I
      [BGLL08] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (2008), P10008.
      [Cla05] Aaron Clauset, Finding local community structure in networks, Physical Review E (2005).
      [CZG09] Jiyang Chen, Osmar R. Zaïane, and Randy Goebel, Local community identification in social networks, ASONAM, 2009, pp. 237–242.
      [GdMC10] B. H. Good, Y.-A. de Montjoye, and A. Clauset, The performance of modularity maximization in practical contexts, Physical Review E 81 (2010), 046106.
      [GN02] M. Girvan and M. E. J. Newman, Community structure in social and biological networks, PNAS 99 (2002), no. 12, 7821–7826.
      [LF11] Andrea Lancichinetti and Santo Fortunato, Limits of modularity maximization in community detection, CoRR abs/1107.1 (2011).

  12. Bibliography II
      [LWP08] Feng Luo, James Zijun Wang, and Eric Promislow, Exploring local community structures in large networks, Web Intelligence and Agent Systems 6 (2008), no. 4, 387–400.
      [Piz12] Clara Pizzuti, A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation 16 (2012), no. 3, 418–430.
      [PL06] Pascal Pons and Matthieu Latapy, Computing communities in large networks using random walks, J. Graph Algorithms Appl. 10 (2006), no. 2, 191–218.
      [YL12] Jaewon Yang and Jure Leskovec, Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.

  13. Overview Analysis Conclusion. Mining Attributed Networks, Part 2: Analysis of Multiplex Networks. Rushed Kanawati, Martin Atzmueller. A3, Université Sorbonne Paris Cité, France; CSAI, Tilburg University, Netherlands. DSAA'17, Tokyo, 20 October 2017

  14. Outline
      1. Multiplex networks: overview and definitions
      2. Analysis of multiplex networks
         2.1 Network measures
         2.2 Analysis tasks
         2.3 Evaluation
      3. Conclusions

  15. Multiplex Network: Definitions
      $G = \langle V, E, C \rangle$
      - $V$: set of nodes
      - $E = \{E_1, \dots, E_\alpha\}$, with $E_k \subseteq V \times V$ for all $k \in [1, \alpha]$
      - $C$: layer coupling links, from [Mucha et al., 2010]
      Coupling:
      - Ordinal coupling: diagonal inter-layer links among consecutive layers.
      - Categorical coupling: diagonal inter-layer links between all pairs of layers.
      - Generalized coupling? E.g. decay functions.

  16. Notation
      - $A^{[k]}$: adjacency matrix of slice $k$; $a^{[k]}_{ij} \neq 0$ iff $(v_i, v_j) \in E_k$, 0 otherwise.
      - $m^{[k]} = |E_k|$. Often, we have $m \sim n$.
      - Neighbors of $v$ in slice $k$: $\Gamma(v)^{[k]} = \{x \in V : (x, v) \in E_k\}$.
      - All neighbors of $v$: $\Gamma(v)^{tot} = \bigcup_{s \in \{1, \dots, \alpha\}} \Gamma(v)^{[s]}$
      - Node degree in slice $k$: $d^{k}_{v} = \|\Gamma(v)^{[k]}\|$
      - Total degree of node $v$: $d^{tot}_{v} = \|\Gamma^{tot}(v)\|$
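The notation translates directly into a few helper functions. In this sketch a multiplex network is assumed to be a list of layers, each layer a dict mapping a node to the set of its neighbors in that layer; the function names and the 0-based layer index are illustrative assumptions.

```python
def neighbors(layers, v, k):
    """Gamma(v)^[k]: neighbors of v in layer k (0-based index)."""
    return set(layers[k].get(v, ()))

def all_neighbors(layers, v):
    """Gamma(v)^tot: union of the neighbors of v over all layers."""
    out = set()
    for k in range(len(layers)):
        out |= neighbors(layers, v, k)
    return out

def degree(layers, v, k):
    """d_v^k: number of neighbors of v in layer k."""
    return len(neighbors(layers, v, k))

def total_degree(layers, v):
    """d_v^tot: number of distinct neighbors of v over all layers."""
    return len(all_neighbors(layers, v))

# Toy two-layer multiplex (hypothetical data)
layers = [
    {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
    {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}},
]
```

Note that with this definition the total degree counts a neighbor once even when it is reachable in several layers.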

  17. Multiplex Networks: Related Terms
      Recommended reading: Mikko Kivelä et al., Multilayer Networks, arXiv:1309.7233, March 2014.

  18. Power of the Multiplex Model: multi-relational networks. Example: European airports network.

  19. Power of the Multiplex Model: dynamic networks. Example: academic collaborations per year.

  20. Power of the Multiplex Model: heterogeneous networks. Example: DBLP author-centred multiplex network.

  21. Multiplex Networks: Measures
      - The usual measures need to be generalized: degree, neighborhood, centralities, paths and distances, clustering coefficient, ...
      - New layer-oriented questions to answer:
        - Which layers determine the centrality of a user?
        - Which layers are relevant to measure the similarity of two nodes?
        - How does one layer influence the evolution of another?
        - ...

  22. Approaches
      1. Transformation into a monoplex-centred problem
         - Layer aggregation approaches
         - Hypergraph transformation based approaches
         - Ensemble approaches
      2. Generalization of monoplex-oriented algorithms to multiplex networks

  23. Layer Aggregation

  24. Layer Aggregation: aggregation functions
      - $A_{ij} = 1$ if $\exists\, l,\ 1 \le l \le \alpha : A^{[l]}_{ij} \neq 0$; $A_{ij} = 0$ otherwise
      - $A_{ij} = \frac{1}{\alpha} \sum_{k=1}^{\alpha} w_k A^{[k]}_{ij}$
      - $A_{ij} = \|\{d : A^{[d]}_{ij} \neq 0\}\|$
      - $A_{ij} = sim(v_i, v_j)$
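The first three aggregation functions are easy to check on a small example. The sketch below assumes each layer is given as a square adjacency matrix (a list of lists) over the same node ordering; the function names are illustrative, and the weights default to 1 when none are supplied.

```python
def aggregate_binary(layers):
    """A_ij = 1 if the link (i, j) exists in at least one layer, else 0."""
    n = len(layers[0])
    return [[1 if any(A[i][j] != 0 for A in layers) else 0
             for j in range(n)] for i in range(n)]

def aggregate_weighted(layers, weights=None):
    """A_ij = (1/alpha) * sum_k w_k * A^[k]_ij."""
    alpha = len(layers)
    n = len(layers[0])
    w = weights if weights is not None else [1.0] * alpha
    return [[sum(w[k] * layers[k][i][j] for k in range(alpha)) / alpha
             for j in range(n)] for i in range(n)]

def aggregate_count(layers):
    """A_ij = number of layers in which the link (i, j) exists."""
    n = len(layers[0])
    return [[sum(1 for A in layers if A[i][j] != 0)
             for j in range(n)] for i in range(n)]

# Two toy layers over three nodes (hypothetical data)
A1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
A2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```

The binary variant discards how many layers support a link, while the count variant keeps that information as an edge weight.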

  25. K-Uniform Hypergraph Transformation: principle
      - A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactly k.
      - Map a multiplex to a 3-uniform hypergraph $H = (\mathcal{V}, \mathcal{E})$ such that $\mathcal{V} = V \cup \{1, \dots, \alpha\}$ and $(u, v, i) \in \mathcal{E}$ if $\exists\, l : A^{[l]}_{uv} \neq 0$, with $u, v \in V$, $i \in \{1, \dots, \alpha\}$.
      - Apply hypergraph analysis approaches (e.g. tensor-based approaches).

  26. Multiplex: Node Neighborhood (some options)
      - $\Gamma_{mux}(v) = \bigcup_{k=1}^{\alpha} \Gamma_k(v)$
      - $\Gamma_{mux}(v) = \bigcap_{k=1}^{\alpha} \Gamma_k(v)$
      - $\Gamma_{mux}(v) = \{x \in \Gamma(v)^{tot} : sim(x, v) \ge \delta\}$, $\delta \in [0, 1]$
      - $\Gamma_{mux}(v) = \{x \in \Gamma(v)^{tot} : \frac{|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}|}{|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}|} \ge \delta\}$
      - ...
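The last option, a neighborhood filtered by the Jaccard overlap of total neighborhoods, can be sketched as follows. The list-of-dicts layer representation and the function names are assumptions made for illustration.

```python
def total_neighbors(layers, v):
    """Gamma(v)^tot: union of the neighbors of v over all layers."""
    return set().union(*(layer.get(v, set()) for layer in layers))

def jaccard(a, b):
    """Jaccard index of two sets; defined as 0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mux_neighbors(layers, v, delta):
    """Neighbors x of v (in any layer) whose total neighborhood overlaps
    that of v with a Jaccard index of at least delta."""
    gv = total_neighbors(layers, v)
    return {x for x in gv if jaccard(gv, total_neighbors(layers, x)) >= delta}

# Toy two-layer multiplex (hypothetical data)
L0 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
L1 = {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}}
```

Setting `delta` to 0 recovers the union option, and raising it progressively removes neighbors that share few contacts with `v`.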

  27. Paths, Shortest Distance (some options)
      - Path in an aggregated network
      - $d_{average} = \frac{\sum_{\alpha=1}^{m} d(u, v)^{[\alpha]}}{m}$, $\forall u, v \in V$ and $(u, v) \notin E_i$
      - $path\text{-}length(u, v) = \langle r_1, r_2, \dots, r_\alpha \rangle$, where $r_i$ is the number of links in layer $i$
      - $path_x(u, v)$ dominates $path_y(u, v)$ iff $\exists j : r^x_j < r^y_j$ and $\forall k \neq j : r^x_k \le r^y_k$
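The dominance relation between path-length vectors is the usual Pareto dominance, which the following sketch makes concrete. Each path is assumed to be summarized as a tuple holding the number of links it uses in each layer; `dominates` and `pareto_front` are illustrative names.

```python
def dominates(rx, ry):
    """True if path x dominates path y: x uses no more links than y in every
    layer, and strictly fewer links in at least one layer."""
    no_worse = all(a <= b for a, b in zip(rx, ry))
    strictly_better = any(a < b for a, b in zip(rx, ry))
    return no_worse and strictly_better

def pareto_front(paths):
    """Paths between the same pair of nodes that no other path dominates."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]
```

For example, among the vectors `(1, 3)`, `(2, 2)`, `(2, 3)` and `(3, 3)`, only the first two are non-dominated, so a multiplex "shortest path" is in general a set of incomparable paths rather than a single one.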

  28. What About Communities? What is a dense subgraph in a multiplex network? [BCG11]

  29. Community Detection in Multiplex Networks: Approaches
      1. Transformation into a monoplex community detection problem
         - Layer aggregation approaches
         - Multi-objective optimization approach
         - Ensemble clustering approaches
      2. Generalization of monoplex-oriented algorithms to multiplex networks
         - Generalized modularity optimization
         - Generalized Infomap
         - Generalized Walktrap
         - Seed-centric approaches

  30. Multi-Objective Optimization Approach [AP14]
      1. Rank the set of α layers according to some importance criteria
      2. C_1 ← community(G^[1])
      3. for i ∈ [2, α] do: C_i ← optimize(community(G^[i]), similarity(C_{i-1}))
      4. return C_α

  31. Ensemble Clustering Approaches

  32. Ensemble Clustering Approaches: ensemble clustering [SG03]
      - CSPA: Cluster-based Similarity Partitioning Algorithm
      - HGPA: HyperGraph-Partitioning Algorithm
      - MCLA: Meta-Clustering Algorithm
      - ...

  33. Ensemble Clustering Approaches: CSPA (Cluster-based Similarity Partitioning Algorithm)
      - Let $K$ be the number of basic models and $C_i(x)$ the cluster in model $i$ to which $x$ belongs.
      - Define a similarity graph on objects: $sim(v, u) = \frac{\sum_{i=1}^{K} \delta(C_i(v), C_i(u))}{K}$
      - Cluster the obtained graph: either isolate connected components after pruning edges, or apply a community detection approach.
      - Complexity: $O(n^2 k r)$, with $n$ the number of objects, $k$ the number of clusters and $r$ the number of clustering solutions.
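A small sketch of the CSPA steps, using the "prune edges then take connected components" option for the final clustering. Each base partition is assumed to be a dict mapping a node to its cluster id; the `threshold` used for pruning and the function names are assumptions of this example, not parameters fixed by the slides.

```python
from itertools import combinations

def cspa_similarity(partitions, nodes):
    """sim(u, v): fraction of base partitions that put u and v in the same cluster."""
    K = len(partitions)
    sim = {}
    for u, v in combinations(nodes, 2):
        agree = sum(1 for p in partitions if p[u] == p[v])
        sim[(u, v)] = agree / K
    return sim

def consensus_components(partitions, nodes, threshold):
    """Keep edges with sim >= threshold, then return connected components."""
    sim = cspa_similarity(partitions, nodes)
    adj = {v: set() for v in nodes}
    for (u, v), s in sim.items():
        if s >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:                       # depth-first traversal
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

In the multiplex setting, one natural use is to take one base partition per layer, so that the consensus groups nodes that are clustered together in most layers.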

  34. CSPA: Example, from M. Seifi, Cœurs stables de communautés dans les graphes de terrain, Thèse de l'université Paris 6, 2012.

  35. Ensemble Clustering: Illustration I

  36. Ensemble Clustering: Illustration II

  37. Ensemble Clustering: Illustration III

  38. Ensemble Clustering: Illustration IV

  39. Ensemble Clustering: Illustration V

  40. Ensemble Clustering: Illustration VI

  41. Ensemble Clustering: Illustration VII

  42. Multiplex Modularity [MRM+10]: generalized modularity
      - $Q_{multiplex}(P) = \frac{1}{2\mu} \sum_{c \in P} \sum_{i, j \in c;\ k, l : 1 \to \alpha} \left[ \left( A^{[k]}_{ij} - \lambda_k \frac{d^{[k]}_i d^{[k]}_j}{2 m^{[k]}} \right) \delta_{kl} + \delta_{ij} C^{kl}_{ij} \right]$
      - $\mu = \sum_{j \in V;\ k, l : 1 \to \alpha} m^{[k]} + C^{l}_{jk}$
      - $C^{kl}_{ij}$: inter-slice coupling, $= 0\ \forall i \neq j$

  43. Seed-Centric Algorithms [Kan14]
      Algorithm 1: General seed-centric community detection algorithm
      Require: G = <V, E>, a connected graph
        C ← ∅
        S ← compute_seeds(G)
        for s ∈ S do
          C_s ← compute_local_com(s, G)
          C ← C + C_s
        end for
        return compute_community(C)

  44. The Licod Algorithm [YK14]
      1. Compute a set of seeds that are likely to be leaders in their communities. Heuristic: nodes having a higher degree centrality than their neighbors.
      2. Each node in the graph ranks the seeds according to its own preference: by increasing shortest-path length.
      3. Iterate until convergence: each node modifies its preference vector according to its neighbors' preferences, applying rank aggregation methods.

  45. MuxLicod
      - Multiplex degree centrality [BNL13]: $d^{multiplex}_i = - \sum_{k=1}^{\alpha} \frac{d^{[k]}_i}{d^{[tot]}_i} \log\left( \frac{d^{[k]}_i}{d^{[tot]}_i} \right)$
      - Multiplex shortest path: $SP(u, v)^{multiplex} = \frac{\sum_{k=1}^{\alpha} SP(u, v)^{[k]}}{\alpha}$
      - Multiplex neighborhood: $\Gamma_{mux}(v) = \{x \in \Gamma(v)^{tot} : \frac{|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}|}{|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}|} \ge \delta\}$
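The entropy-based degree centrality and the averaged shortest path can be sketched as below. One assumption is made loudly here: the total degree is taken as the sum of the per-layer degrees, so that the ratios form a probability distribution; layers where the node has degree 0 contribute nothing to the sum. The natural logarithm is used, and the function names are illustrative.

```python
import math

def multiplex_degree_centrality(layer_degrees):
    """Entropy of the distribution of a node's degree across layers.

    layer_degrees: list with the degree of the node in each layer.
    Assumption: d_tot is the sum of the per-layer degrees.
    """
    total = sum(layer_degrees)
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log(d / total)
                for d in layer_degrees if d > 0)

def multiplex_shortest_path(layer_distances):
    """Average over layers of the shortest-path length between two nodes."""
    return sum(layer_distances) / len(layer_distances)
```

A node active in a single layer scores 0, while a node whose links are spread evenly over all α layers reaches the maximum value log(α).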

  46. Rank Aggregation [PK12, DKNS01]

  47. Other Algorithms
      1. Random walk based approach (generalization of Walktrap) [KM15]
      2. Generalized Infomap [DLAR15]

  48. Evaluation Criteria I
      1. Multiplex modularity
      2. Redundancy [BCG11]: $\rho(c) = \sum_{(u, v) \in \bar{P}_c} \frac{\|\{k : A^{[k]}_{uv} \neq 0\}\|}{\alpha \times \|P_c\|}$, where $\bar{P}_c$ is the set of pairs $(u, v)$ that are directly connected in at least two layers.
      3. Complementarity: $\gamma(c) = V_c \times \varepsilon_c \times H_c$
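The redundancy of a community can be computed as in the sketch below. The slide does not define $P_c$; following [BCG11], it is assumed here to be the set of pairs of community members connected in at least one layer. Layers are assumed to be undirected dict-of-sets adjacency structures, and the function names are illustrative.

```python
from itertools import combinations

def layers_connecting(layers, u, v):
    """Number of layers in which u and v are directly connected."""
    return sum(1 for layer in layers if v in layer.get(u, ()))

def redundancy(layers, community):
    """rho(c): layer multiplicity of pairs linked in >= 2 layers,
    normalized by alpha * |P_c| (P_c: pairs linked in >= 1 layer, assumed)."""
    alpha = len(layers)
    counts = [layers_connecting(layers, u, v)
              for u, v in combinations(sorted(community), 2)]
    p_c = sum(1 for n in counts if n >= 1)
    if p_c == 0:
        return 0.0
    return sum(n for n in counts if n >= 2) / (alpha * p_c)

# Toy two-layer multiplex (hypothetical data)
L0 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
L1 = {"a": {"b"}, "b": {"a"}}
```

On this toy example the pair (a, b) is linked in both layers and (b, c) in one, so the redundancy of {a, b, c} is 2 / (2 × 2) = 0.5.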

  49. Evaluation Criteria II
      - Variety $V_c$: the proportion of occurrence of the community $c$ across the layers of the multiplex.
        $V_c = \sum_{s=1}^{\alpha} \frac{\|\exists (i, j) \in c \,/\, A^{[s]}_{ij} \neq 0\|}{\alpha - 1}$   (1)
      - Exclusivity $\varepsilon_c$: number of pairs of nodes, in community $c$, that are connected exclusively in one layer.
        $\varepsilon_c = \sum_{s=1}^{\alpha} \frac{\|P_{c,s}\|}{\|P_c\|}$   (2)
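Variety and exclusivity can be sketched from formulas (1) and (2) as written. The assumptions are: layers are undirected dict-of-sets structures, $P_{c,s}$ is the set of pairs of community members connected only in layer $s$, $P_c$ is the set of pairs connected in at least one layer, and there are at least two layers. The function names are illustrative.

```python
from itertools import combinations

def pair_layers(layers, community):
    """Map each connected pair of community members to the layers linking it."""
    out = {}
    for u, v in combinations(sorted(community), 2):
        ks = [k for k, layer in enumerate(layers) if v in layer.get(u, ())]
        if ks:
            out[(u, v)] = ks
    return out

def variety(layers, community):
    """Formula (1) as written: layers containing at least one intra-community
    link, divided by (alpha - 1). Requires at least two layers."""
    pl = pair_layers(layers, community)
    present = {k for ks in pl.values() for k in ks}
    return len(present) / (len(layers) - 1)

def exclusivity(layers, community):
    """Formula (2): share of connected pairs that are linked in exactly one layer."""
    pl = pair_layers(layers, community)
    if not pl:
        return 0.0
    return sum(1 for ks in pl.values() if len(ks) == 1) / len(pl)

# Toy three-layer multiplex; the third layer is empty (hypothetical data)
L0 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
L1 = {"a": {"b"}, "b": {"a"}}
L2 = {}
```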

  50. Evaluation Criteria III
      - Homogeneity $H_c$: how uniform the distribution of the number of edges per layer is within the community $c$.
        $H_c = 1$ if $\sigma_c = 0$, and $H_c = 1 - \frac{\sigma_c}{\sigma_c^{max}}$ otherwise   (3)
        with
        $avg_c = \sum_{s=1}^{\alpha} \frac{\|P_{c,s}\|}{\alpha}$
        $\sigma_c = \sqrt{\sum_{s=1}^{\alpha} \frac{(\|P_{c,s}\| - avg_c)^2}{\alpha}}$
        $\sigma_c^{max} = \sqrt{\frac{(\max(\|P_{c,d}\|) - \min(\|P_{c,d}\|))^2}{2}}$
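Formula (3) and the complementarity product can be checked numerically with a short sketch. The input is assumed to be the list of values $\|P_{c,s}\|$, one per layer; the function and parameter names are illustrative.

```python
import math

def homogeneity(pair_counts):
    """H_c from the per-layer counts ||P_{c,s}|| (one value per layer)."""
    alpha = len(pair_counts)
    avg = sum(pair_counts) / alpha
    sigma = math.sqrt(sum((p - avg) ** 2 for p in pair_counts) / alpha)
    if sigma == 0:
        return 1.0                      # perfectly uniform distribution
    # sigma > 0 implies max != min, so sigma_max is strictly positive
    sigma_max = math.sqrt((max(pair_counts) - min(pair_counts)) ** 2 / 2)
    return 1.0 - sigma / sigma_max

def complementarity(variety, exclusivity, homogeneity_value):
    """gamma(c) = V_c * eps_c * H_c."""
    return variety * exclusivity * homogeneity_value
```

For instance, counts of `[2, 2, 2]` give a homogeneity of 1, whereas `[0, 4]` gives 1 − 2/√8 ≈ 0.29.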

  51. Datasets: benchmark networks. Lazega lawyer network: 71 nodes, 3 layers.

  52. Datasets: physicians collaboration network: 246 nodes, 3 layers.

  53. Results: Redundancy

  54. Results: Complementarity

  55. Results: Multiplex Modularity

  56. Pareto Front

  57. Lazega Dataset: Comparative Study. Figure: NMI (lower triangular part), adjusted Rand (upper triangular part).

  58. Conclusions
      - Multiplex networks provide a rich representation of real-world interaction systems.
      - Much work remains to reformulate basic network concepts for multiplex settings, e.g. roles, random walks, PageRank.
      - Community evaluation is still an open problem.
      - Topics not covered: layer selection and compression, co-evolution models, dynamics on multiplex networks.
      - Ideas under exploration:
        - Multiplex of multiplexes
        - Interactive multiplex network visualisation
        - Benchmarking of available tools

  59. Bibliography I
      [AP14] Alessia Amelio and Clara Pizzuti, Community detection in multidimensional networks, IEEE 26th International Conference on Tools with Artificial Intelligence, 2014, pp. 352–359.
      [BCG11] Michele Berlingerio, Michele Coscia, and Fosca Giannotti, Finding and characterizing communities in multidimensional networks, ASONAM, IEEE Computer Society, 2011, pp. 490–494.
      [BNL13] Federico Battiston, Vincenzo Nicosia, and Vito Latora, Metrics for the analysis of multiplex networks, CoRR abs/1308.3182 (2013).
      [DKNS01] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar, Rank aggregation methods for the Web, WWW, 2001, pp. 613–622.
      [DLAR15] Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall, Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems, Phys. Rev. X 5 (2015), 011027.

  60. Bibliography II
      [Kan14] Rushed Kanawati, Seed-centric approaches for community detection in complex networks, 6th international conference on Social Computing and Social Media (Crete, Greece) (Gabriele Meiselwitz, ed.), vol. LNCS 8531, Springer, June 2014, pp. 197–208.
      [KM15] Zhana Kuncheva and Giovanni Montana, Community detection in multiplex networks using locally adaptive random walks, MANEM 2workshop, Proceedings of ASONAM 2015 (Paris), August 2015.
      [MRM+10] Peter J. Mucha, Thomas Richardson, Kevin Macon, Mason A. Porter, and Jukka-Pekka Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328 (2010), no. 5980, 876–878.
      [PK12] Manisha Pujari and Rushed Kanawati, Supervised rank aggregation approach for link prediction in complex networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1189–1196.
      [SG03] A. Strehl and J. Ghosh, Cluster ensembles: a knowledge reuse framework for combining multiple partitions, The Journal of Machine Learning Research 3 (2003), 583–617.

  61. Bibliography III
      [YK14] Zied Yakoubi and Rushed Kanawati, Licod: Leader-driven approaches for community detection, Vietnam Journal of Computer Science 1 (2014), no. 4, 241–256.

  62. MAN Tutorial Part III: Analysis of Attributed Networks. Rushed Kanawati, Martin Atzmueller. Université Sorbonne Paris Cité, France; Tilburg University, Netherlands. DSAA 2017, Tokyo, 2017-10-20

  63. Agenda
      - Overview/Recap: Attributed Networks
      - Compositional Subgroup Analysis
      - Community Detection
      - Link Prediction
      - Summary

  64. Terminology (Recap)
      - Network → graph
        - Set of atomic entities (actors) → nodes, vertices
        - Set of links/edges between nodes ("ties")
        - Edges model pairwise relationships
        - Edges: directed or undirected
      - Social network [Wasserman & Faust 1994]
        - Social structure capturing actor relations
        - Actors, and links given by dyadic ties between actors (friendship, kinship, organizational position, …) → set of nodes and edges
        - Abstract object, independent of representation

  65. Variables [Wasserman & Faust 1994]
      - Structural
        - Measure ties between actors (→ links)
        - Specific relation
        - Make up connections in graph/network
      - Compositional
        - Measure actor attributes: age, gender, ethnicity, affiliation, …
        - Describe actors

  66. Attributed Graphs
      - Graph: edge attributes and/or node attributes
      - Structure: ties/links (of respective relations)
      - Attributes: additional information
        - Actor attributes (node labels)
        - Link attributes (information about connections)
        - Attribute vectors for actors and/or links
        - … can be mapped from/to each other
      - Integration of heterogeneous data (networks + vectors)
      - Enables simultaneous analysis of relational and attribute data

  67. Attributed Network/Graph: Examples
      - Citation; attributes: (co-)authors, affiliation, country, gender, …
      - WWW: links, content (BoW), …
      (Newman 2003)

  68. Subgroups & Cohesive Subgroups [Wasserman & Faust 1994]
      - Subgroup
        - Subset of actors (and all their ties)
        - Define subgroups using specific criteria (homogeneity among members)
          - Compositional: actor attributes
          - Structural: using tie structures
      - Detection of cohesive subgroups and communities → structural aspects
      - Subgroup discovery → actor attributes
      - … attributed graph → can combine both
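The distinction between compositional and structural criteria can be made concrete with a small sketch. It assumes an attributed graph stored as a dict of node attribute dicts plus a set of undirected edges given as tuples; the function names and the toy data are hypothetical.

```python
def compositional_subgroup(attributes, predicate):
    """Actors selected by a criterion on their attributes (compositional)."""
    return {v for v, attrs in attributes.items() if predicate(attrs)}

def induced_ties(edges, members):
    """All ties among the members of a subgroup (structural view)."""
    return {(u, v) for (u, v) in edges if u in members and v in members}

# Toy attributed graph (hypothetical data)
attributes = {
    "ann": {"affiliation": "lab1", "age": 34},
    "bob": {"affiliation": "lab1", "age": 51},
    "eve": {"affiliation": "lab2", "age": 29},
}
edges = {("ann", "bob"), ("bob", "eve")}

lab1 = compositional_subgroup(attributes, lambda a: a["affiliation"] == "lab1")
```

Here the subgroup is defined purely by an attribute, and `induced_ties(edges, lab1)` then exposes its tie structure, which is the combination an attributed graph makes possible.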
