SLIDE 1

A multilevel approach for overlapping community detection

Alan Valejo, Jorge Valverde-Rebaza and Alneu de Andrade Lopes

Department of Computer Science ICMC, University of São Paulo C.P. 668, CEP 13560-970, São Carlos, SP, Brazil {alan,jvalverr,alneu}@icmc.usp.br

October, 2014

SLIDE 2

Outline

  • 1. Introduction
  • 2. Multilevel overlapping community detection
  • 3. Experiments
  • 4. Conclusion and Future Work

SLIDE 3

Introduction

SLIDE 4

Introduction Multilevel overlapping community detection Experiments Conclusion and Future Work

Introduction

Graph partitioning techniques aim to divide the set of vertices of a graph into k disjoint partitions.

Figure panels: Social network, Biological network, Information network, Technology network

  • Vertices belonging to the same partition share common properties and have similar roles
  • Graph partitioning is useful for understanding the topological structure and dynamic processes of networks

Valejo et al. 1 / 18

SLIDE 5

Introduction

Many real-world networks have an overlapping community structure, i.e., a vertex may belong to more than one community.

Figure: In (a), the network is partitioned into disjoint communities. In (b), the network has overlapping communities. The black vertices belong to more than one community.

SLIDE 6

For instance:

  • In social networks, users naturally create relationships with others from various communities, such as family, friends and colleagues [Reid et al., 2013].
  • In addition, online social network users may belong to many groups [Valverde-Rebaza and Lopes, 2014].
  • This also occurs in other types of complex networks, such as biological networks, where a large fraction of proteins belong to multiple complexes [Gavin et al., 2006].

SLIDE 7

Introduction

The graph partitioning problem is NP-complete

  • Identifying an optimal solution is computationally expensive
  • Finding an optimal solution is infeasible for large-scale networks

Big Data: Facebook, Web networks, biological and biomedical networks, ...
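To give a feel for why exhaustive search breaks down, the number of ways to split n labelled vertices into k non-empty, disjoint parts is the Stirling number of the second kind S(n, k), which grows exponentially in n for fixed k ≥ 2. The sketch below is our own illustration (the function name `stirling2` is hypothetical, not from the slides); it uses the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1) and counts only disjoint partitions, so the overlapping case admits even more candidate solutions.

```python
def stirling2(n: int, k: int) -> int:
    """Ways to partition n labelled vertices into k non-empty, disjoint sets."""
    # table[i][j] = S(i, j). Vertex i either joins one of the j existing parts
    # (j * S(i-1, j) ways) or starts a new part on its own (S(i-1, j-1) ways).
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


for n in (10, 20, 50, 100):
    print(f"n = {n:3d}: S(n, 2) = {stirling2(n, 2)}, S(n, 4) = {stirling2(n, 4)}")
```

For example, S(n, 2) = 2^(n−1) − 1, so even splitting 50 vertices into two groups already has about 5.6 × 10^14 candidates.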

SLIDE 8

Introduction

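| Algorithm | Reference | Complexity |
|---|---|---|
| CPM | [Palla et al., 2005] | O(n^2) |
| LFM | [Lancichinetti et al., 2009] | O(n^2) |
| HCL | [Ahn et al., 2010] | O(deg_max^2 · n) |
| Game | [Chen et al., 2010] | O(m^2) |
| iLCD | [Cazabet et al., 2010] | O(n k^2) |
| OSLOM | [Lancichinetti et al., 2011] | O(n^2) |
| NMF | [Psorakis et al., 2011] | O(c n^2) |
| UEOC | [Jin et al., 2011] | O(l n^2) |
| CIS | [Kelley et al., 2012] | O(n^2) |

(n: number of vertices; m: number of edges; deg_max: maximum vertex degree)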

Table: Algorithms for overlapping community detection with their respective computational complexity. Adapted from [Xie et al., 2013].

Most algorithms in the literature achieve good accuracy, but their high computational cost makes them prohibitive for large-scale problems

SLIDE 9

  • Recently, extensive research has emerged on multilevel strategies for partitioning large-scale networks (MLP) [Bichot, 2013]
  • However, this strategy has not been explored in the context of overlapping communities

We propose a multilevel approach to overlapping community detection

SLIDE 10

Proposal

Multilevel overlapping community detection

SLIDE 11

Proposal

Figure: The multilevel overlapping community detection scheme.
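The scheme in the figure chains three phases: coarsening, initial overlapping partitioning on the coarsest graph, and uncoarsening. A minimal Python sketch of that control flow, assuming each phase is passed in as a function; `multilevel_overlapping` and the toy phase functions are hypothetical names for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple, TypeVar

G = TypeVar("G")                              # graph type is left abstract
VertexMap = Dict[Hashable, Hashable]          # vertex of G_i -> super-vertex in G_{i+1}
Communities = List[Set[Hashable]]             # possibly overlapping vertex sets


def multilevel_overlapping(
    graph: G,
    levels: int,
    coarsen: Callable[[G], Tuple[G, VertexMap]],
    detect: Callable[[G], Communities],
    project: Callable[[Communities, VertexMap], Communities],
) -> Communities:
    """Coarsen `levels` times, detect overlapping communities on the coarsest
    graph, then project them back level by level onto the original graph."""
    maps: List[VertexMap] = []
    current = graph
    for _ in range(levels):                   # coarsening: G_0 -> G_1 -> ... -> G_N
        current, vertex_map = coarsen(current)
        maps.append(vertex_map)
    communities = detect(current)             # initial overlapping partition C on G_N
    for vertex_map in reversed(maps):         # uncoarsening: G_N -> ... -> G_0
        communities = project(communities, vertex_map)
    return communities


# Toy phases (hypothetical): a "graph" is just a sorted list of vertex ids,
# coarsening merges vertices 2j and 2j + 1, and detection returns two nested sets.
def toy_coarsen(vertices: List[int]) -> Tuple[List[int], VertexMap]:
    vertex_map = {v: v // 2 for v in vertices}
    return sorted(set(vertex_map.values())), vertex_map


def toy_detect(vertices: List[int]) -> Communities:
    return [{vertices[0]}, set(vertices)]


def toy_project(communities: Communities, vertex_map: VertexMap) -> Communities:
    return [{v for v, s in vertex_map.items() if s in comm} for comm in communities]


result = multilevel_overlapping(list(range(8)), 2, toy_coarsen, toy_detect, toy_project)
print(result)
```

Keeping the phases as parameters mirrors the point of the approach: an existing, expensive detector (such as CPM or HCL) only ever runs on the small coarsest graph. The experiments on the slides use three levels.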

SLIDE 12

Coarsening Phase

Figure panels: G_i and G_{i+1}

Figure: The coarsening process uses the concept of matching. All edge weights are 1. Here rf = 0.5, so the size of the matching is the number of vertices × 0.5.

  • The reduction factor rf limits the number of pairs of vertices merged
  • When rf = 0.5, the number of vertices in the graph is reduced to half
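The slides state only that coarsening uses a matching whose size is bounded by the reduction factor rf. A minimal sketch of one coarsening level, assuming heavy-edge matching (each vertex is paired with the unmatched neighbour joined by its heaviest edge) with vertices visited in random order; this matching rule, the `coarsen` function and its parameters are our assumptions, not details given in the presentation. Merged pairs become super-vertices, and parallel edges are combined by summing their weights (also an assumption).

```python
import random
from typing import Dict, Tuple

# Weighted undirected graph: vertex -> {neighbour: edge weight}, stored in both directions.
Graph = Dict[int, Dict[int, float]]


def coarsen(graph: Graph, rf: float, seed: int = 0) -> Tuple[Graph, Dict[int, int]]:
    """One coarsening level G_i -> G_{i+1}.

    Merges at most int(n * rf) matched vertex pairs (assumed heavy-edge matching).
    Returns the coarse graph and a map from each vertex to its super-vertex id.
    """
    rng = random.Random(seed)
    max_pairs = int(len(graph) * rf)
    partner: Dict[int, int] = {}
    order = list(graph)
    rng.shuffle(order)
    for u in order:
        if len(partner) // 2 >= max_pairs:
            break                                  # reduction factor reached
        if u in partner:
            continue
        # Unmatched neighbours of u; the heaviest edge wins (ties broken by id).
        free = [(w, v) for v, w in graph[u].items() if v != u and v not in partner]
        if free:
            _, v = max(free)
            partner[u], partner[v] = v, u

    # Each matched pair, or unmatched vertex, becomes one super-vertex.
    mapping: Dict[int, int] = {}
    next_id = 0
    for u in graph:
        if u not in mapping:
            mapping[u] = next_id
            if u in partner:
                mapping[partner[u]] = next_id
            next_id += 1

    # Coarse edge weights sum the fine edges they replace; edges inside a pair vanish.
    coarse: Graph = {sid: {} for sid in range(next_id)}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0.0) + w
    return coarse, mapping


# Example: complete graph on 4 vertices, all edge weights 1, rf = 0.5.
k4: Graph = {u: {v: 1.0 for v in range(4) if v != u} for u in range(4)}
coarse, mapping = coarsen(k4, rf=0.5)
print(len(coarse), coarse)  # 2 super-vertices joined by an edge of weight 4.0
```

Applied for L levels with the same rf, at best about n(1 − rf)^L vertices remain (fewer merges happen if the greedy matching runs out of free neighbours). With rf = 0.5 and three levels that is n/8, so an O(n^2) method from the complexity table would, as a rough estimate, face about 1/64 of the work on the coarsest graph.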

SLIDE 13

Initial Overlapping Partitioning Phase

Computes the initial partition C on the coarsest graph G_N. We adapted two overlapping community detection algorithms:

  • Clique Percolation Method (CPM) [Palla et al., 2005]
  • Hierarchical Link Clustering (HCL) [Ahn et al., 2010]

We name them CPM-MLP and HCL-MLP, respectively
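As an illustration of the kind of algorithm run on G_N, here is a small pure-Python sketch of the Clique Percolation Method: k-clique communities are unions of maximal cliques of size at least k that are chained by overlaps of at least k − 1 vertices. Finding maximal cliques with Bron–Kerbosch (with pivoting) and merging them with union-find are our implementation choices; the function names are hypothetical, edge weights of the coarse graph are ignored (an assumption), and HCL is not sketched here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set

Adjacency = Dict[Hashable, Set[Hashable]]  # symmetric, no self-loops


def maximal_cliques(adj: Adjacency) -> List[Set[Hashable]]:
    """All maximal cliques via Bron–Kerbosch with pivoting."""
    cliques: List[Set[Hashable]] = []

    def expand(r: Set[Hashable], p: Set[Hashable], x: Set[Hashable]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj: Adjacency, k: int = 3) -> List[FrozenSet[Hashable]]:
    """k-clique communities; a vertex may appear in several of them."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups: Dict[int, Set[Hashable]] = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


# Two triangles sharing vertex 2: vertex 2 belongs to both communities.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
adj: Adjacency = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(sorted(sorted(c) for c in clique_percolation(adj, k=3)))
```

If the two triangles shared an edge instead of a single vertex, their overlap would reach k − 1 = 2 and they would merge into one community.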

SLIDE 14

Overlapping Uncoarsening Phase

Figure panels: G_i and G_{i-1}

Figure: The uncoarsening process. Dashed ellipses represent communities. Black vertices belong to more than one community.
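The slides show this step only as a figure. A minimal sketch, assuming plain projection with no refinement (refinement is listed as future work): every vertex of G_{i-1} inherits all communities of the super-vertex it was merged into, so a super-vertex lying in several communities turns its constituent vertices into overlapping vertices. The function name `project_communities` is hypothetical.

```python
from typing import Dict, Hashable, List, Set


def project_communities(
    communities: List[Set[Hashable]], mapping: Dict[Hashable, Hashable]
) -> List[Set[Hashable]]:
    """Uncoarsening step G_i -> G_{i-1}.

    communities: vertex sets over the super-vertices of G_i (may overlap).
    mapping: vertex of G_{i-1} -> super-vertex of G_i, recorded during coarsening.
    """
    members: Dict[Hashable, List[Hashable]] = {}
    for fine, coarse in mapping.items():
        members.setdefault(coarse, []).append(fine)
    return [{f for c in comm for f in members.get(c, [])} for comm in communities]


# Super-vertex 1 (made of vertex 2) lies in both coarse communities,
# so vertex 2 is an overlapping vertex after projection.
mapping = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}
print(project_communities([{0, 1}, {1, 2}], mapping))
```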

SLIDE 15

  • Metrics
    • Sensitivity: fraction of the truly overlapping vertices that are detected as overlapping
    • Specificity: fraction of the truly non-overlapping vertices that are detected as non-overlapping
    • Accuracy: weighted average of sensitivity and specificity
    • Modularity: quality of the community structure, contrasting intra- and inter-community connectivity
    • Execution time (in seconds)
  • Our multilevel algorithms (CPM-MLP and HCL-MLP) were configured to perform three levels with reduction factors of 0.1, 0.2, 0.3, 0.4 and 0.5
  • We carried out experiments on two popular real-world networks:
    • Facebook (social network)
    • Yeast (biological network)
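The first three metrics treat overlap detection as a binary classification of vertices: positives are vertices that belong to more than one ground-truth community. A minimal sketch, with hypothetical function names and data; the slides do not state the weights of the "weighted average", so this sketch assumes class-size weights, which reduces to ordinary accuracy (TP + TN) / |V|. With equal weights it would instead be (sensitivity + specificity) / 2. Modularity is not sketched because the slides do not say which overlapping variant was used.

```python
from collections import Counter
from typing import Hashable, Iterable, Set, Tuple


def overlapping_vertices(communities: Iterable[Set[Hashable]]) -> Set[Hashable]:
    """Vertices that belong to more than one community."""
    counts = Counter(v for community in communities for v in community)
    return {v for v, n in counts.items() if n > 1}


def overlap_metrics(
    true_overlap: Set[Hashable], predicted_overlap: Set[Hashable], vertices: Set[Hashable]
) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy for detecting overlapping vertices."""
    true_non = vertices - true_overlap
    tp = len(true_overlap & predicted_overlap)
    fn = len(true_overlap - predicted_overlap)
    tn = len(true_non - predicted_overlap)
    fp = len(true_non & predicted_overlap)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Assumed class-size weighting of the two rates, i.e. (tp + tn) / |V|.
    accuracy = (tp + tn) / len(vertices) if vertices else 0.0
    return sensitivity, specificity, accuracy


# Hypothetical ground truth and detected communities on six vertices.
truth = [{0, 1, 2}, {1, 2, 3}, {4, 5}]
found = [{0, 1, 2, 3}, {2, 3, 4, 5}]
sens, spec, acc = overlap_metrics(
    overlapping_vertices(truth), overlapping_vertices(found), set(range(6))
)
print(sens, spec, acc)
```

Here the true overlapping vertices are {1, 2} and the detected ones are {2, 3}, giving a sensitivity of 1/2 and a specificity of 3/4.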

SLIDE 16

Facebook ego-networks

| Method | Sensitivity | Specificity | Accuracy | Modularity | Time (s) |
|---|---|---|---|---|---|
| CPM | 0.557 | 0.639 | 0.595 | 0.179 | 310.179 |
| CPM-MLP | 0.448 | 0.779 | 0.569 | 0.171 | 5.201 |
| HCL | 0.302 | 0.801 | 0.439 | 0.166 | 68.559 |
| HCL-MLP | 0.318 | 0.917 | 0.472 | 0.211 | 0.904 |

Figure: Modularity, accuracy and execution time (s) of HCL and CPM versus the reduction factor (0.0 to 0.5), with three levels.

SLIDE 17

Yeast protein complexes

| Method | Sensitivity | Specificity | Accuracy | Modularity | Time (s) |
|---|---|---|---|---|---|
| CPM | 0.606 | 0.586 | 0.596 | 0.438 | 7.09 |
| CPM-MLP | 0.809 | 0.444 | 0.627 | 0.593 | 0.19 |
| HCL | 0.419 | 0.658 | 0.538 | 0.497 | 8.26 |
| HCL-MLP | 0.678 | 0.667 | 0.672 | 0.642 | 0.30 |

Figure: Modularity, accuracy and execution time (s) of HCL and CPM versus the reduction factor (0.0 to 0.5), with three levels.
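The speed-ups implied by the execution times in the Facebook and Yeast tables can be checked with a few lines of arithmetic. The times below are copied from those tables; summarising them as the ratio of single-level time to multilevel time is our own choice.

```python
# Execution times in seconds, (single-level, multilevel), from the Facebook and Yeast tables.
times = {
    ("Facebook", "CPM"): (310.179, 5.201),
    ("Facebook", "HCL"): (68.559, 0.904),
    ("Yeast", "CPM"): (7.09, 0.19),
    ("Yeast", "HCL"): (8.26, 0.30),
}

# Speed-up = single-level time / multilevel time.
speedups = {key: single / multi for key, (single, multi) in times.items()}

for (network, method), s in speedups.items():
    print(f"{network:8s} {method}: {s:5.1f}x faster")
```

This gives roughly 60x (CPM) and 76x (HCL) on Facebook, and 37x (CPM) and 28x (HCL) on Yeast.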

SLIDE 18

Conclusion

SLIDE 19

Conclusion

  • We propose a multilevel approach to overlapping community detection
  • The strategy proposed here is the first to employ a multilevel scheme in the context of overlapping communities
  • Defining a multilevel strategy for the overlapping context makes it possible to use computationally expensive algorithms in large-scale applications without a significant impact on overall performance.

  • Experiments on two real-world networks suggest that our approach produces partitions comparable to, and in most cases better than, those of the single-level approach, while being substantially faster.

SLIDE 20

Future Work

  • In the uncoarsening phase, refinement algorithms could be used to improve solution quality

SLIDE 21

References I

Ahn, Y.-Y., Bagrow, J. P., and Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466:761–764.

Bichot, C.-E. (2013). A Partitioning Requiring Rapidity and Quality: The Multilevel Method and Partitions Refinement Algorithms, pages 27–63. John Wiley & Sons, Inc.

Cazabet, R., Amblard, F., and Hanachi, C. (2010). Detection of overlapping communities in dynamical social networks. In Social Computing (SocialCom), 2010 IEEE Second International Conference on, pages 309–314.

Chen, W., Liu, Z., Sun, X., and Wang, Y. (2010). A game-theoretic framework to identify overlapping communities in social networks. Data Mining and Knowledge Discovery, 21(2):224–240.

SLIDE 22

References II

Gavin, A. C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L. J., Bastuck, S., Dumpelfeld, B., Edelmann, A., Heurtier, M. A., Hoffman, V., Hoefert, C., Klein, K., Hudak, M., Michon, A. M., Schelder, M., Schirle, M., Remor, M., Rudi, T., Hooper, S., Bauer, A., Bouwmeester, T., Casari, G., Drewes, G., Neubauer, G., Rick, J. M., Kuster, B., Bork, P., Russell, R. B., and Superti-Furga, G. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084):631–636.

Jin, D., Yang, B., Baquero, C., Liu, D., He, D., and Liu, J. (2011). A Markov random walk under constraint for discovering overlapping communities in complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2011:P05031.

Kelley, S., Goldberg, M., Magdon-Ismail, M., Mertsalov, K., and Wallace, A. (2012). Defining and discovering communities in social networks. In Handbook of Optimization in Complex Networks, pages 139–168. Springer US.

Lancichinetti, A., Fortunato, S., and Kertész, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015.

SLIDE 23

References III

Lancichinetti, A., Radicchi, F., Ramasco, J. J., and Fortunato, S. (2011). Finding statistically significant communities in networks. PLoS ONE, 6(4):e18961.

Palla, G., Derényi, I., Farkas, I., and Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818.

Psorakis, I., Roberts, S., Ebden, M., and Sheldon, B. (2011). Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E, 83:066114.

Reid, F., McDaid, A., and Hurley, N. (2013). Partitioning breaks communities. In Mining Social Networks and Security Informatics, Lecture Notes in Social Networks, pages 79–105. Springer Netherlands.

Valverde-Rebaza, J. and Lopes, A. A. (2014). Link prediction in online social networks using group information. In ICCSA (6), pages 31–45.

SLIDE 24

References IV

Xie, J., Kelley, S., and Szymanski, B. K. (2013). Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Comput. Surv., 45(4):43:1–43:35.

SLIDE 25

Thanks!

Jorge Valverde-Rebaza jvalverr@icmc.usp.br