NETWORK COMMUNITY DETECTION IN PRACTICAL SCENARIOS, Lovro Šubelj

slide-1
SLIDE 1

NETWORK COMMUNITY DETECTION IN PRACTICAL SCENARIOS

Lovro Šubelj

University of Ljubljana Faculty of Computer and Information Science ARS ’15

slide-2
SLIDE 2

NETWORK COMMUNITY STRUCTURE

[Figure: five example networks: Karate club (nodes 1 to 34), Bottlenose dolphins, American football, Synthetic graph, Random graph. Node labels omitted.]

Communities are cohesive subgroups of sparse networks.

slide-3
SLIDE 3

NETWORK COMMUNITY DETECTION

  • graph partitioning,
  • hierarchical clustering,
  • modularity optimization,
  • statistical inference,
  • spectral methods,
  • map equation,
  • dynamics etc.

Girvan, M. & Newman, M. E. J., P. Natl. Acad. Sci. USA 99, 7821–7826 (2002).
Fortunato, S., Phys. Rep. 486, 75–174 (2010).

slide-4
SLIDE 4

LABEL PROPAGATION ALGORITHM

Raghavan, U. N., Albert, R. & Kumara, S., Phys. Rev. E 76, 036106 (2007).
Šubelj, L. & Bajec, M., Phys. Rev. E 83, 036103 (2011) etc.

[Figure: network with nodes labeled 1 to 34.]
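
The slide names the label propagation algorithm without spelling out its steps. A minimal sketch of the basic idea, as commonly described for Raghavan et al. (2007), follows: every node starts with a unique label and repeatedly adopts the label most frequent among its neighbours until no label changes. The function name, the adjacency-dict input, and the tie-breaking rule (keep the current label when it is among the most frequent, otherwise pick at random) are assumptions made for illustration, not details taken from the slides.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_iter=100):
    """Basic asynchronous label propagation (illustrative sketch).

    adj: dict mapping each node to a list of its neighbours (undirected).
    Returns a dict mapping each node to its final community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                # visit nodes in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated nodes keep their label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Assumed tie-breaking: keep the current label if it is already maximal.
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:                   # every label is maximal among neighbours
            break
    return labels


# Toy input (hypothetical): two disjoint triangles.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(toy)
```

On the toy graph each triangle ends up with a single label, and the two triangles keep different labels because labels only spread along edges.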
slide-5
SLIDE 5

LARGE-SCALE COMMUNITY DETECTION

Hric, D., Darst, R. K. & Fortunato, S., Phys. Rev. E 90, 062805 (2014).

slide-6
SLIDE 6

COMMUNITY DETECTION TASKS

[Diagram: EXPLORATORY TASK, shown with a training set; PREDICTIVE TASK, shown with a training set and test sets.]

slide-7
SLIDE 7

APS & WIKILEAKS NETWORKS

DATA        APS                   WikiLeaks
            1893-2013             1966-2010
NETWORK     citation              reference
            526,527 papers        52,416 cables
            5,989,263 citations   78,506 references
CLUSTERS    12 journals           3 privacy levels
            301 sections          263 embassies
SETTING     14 algorithms         26 algorithms
TRAINING    1893-2012             1966-2009
TEST        2013 (4%)             2010 (17%)

Non-overlapping and cohesive ground truth clusters.
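
The slides do not say how test nodes are classified in the predictive task. One plausible reading, suggested by the "beyond majority classification" item under future work, is a majority vote: each test node receives the most common ground truth label among the training nodes in its detected community. The sketch below illustrates that reading only; the function names, the dict-based inputs, and the fallback to the overall majority label are assumptions.

```python
from collections import Counter, defaultdict


def majority_classify(community, train_labels, test_nodes):
    """Predict a label for each test node by majority vote in its community.

    community:    dict node -> detected community id (training and test nodes)
    train_labels: dict training node -> known ground truth label
    test_nodes:   iterable of nodes whose labels are to be predicted
    """
    votes = defaultdict(Counter)
    for node, label in train_labels.items():
        votes[community[node]][label] += 1
    # Assumed fallback: the overall majority label, used when a community
    # contains no training node.
    overall = Counter(train_labels.values()).most_common(1)[0][0]
    predictions = {}
    for node in test_nodes:
        counts = votes.get(community[node])
        predictions[node] = counts.most_common(1)[0][0] if counts else overall
    return predictions


def accuracy(predictions, true_labels):
    """Fraction of test nodes whose predicted label matches the true label."""
    hits = sum(predictions[n] == true_labels[n] for n in predictions)
    return hits / len(predictions)


# Hypothetical toy data: three communities, labels "X" and "Y".
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
train = {"a": "X", "b": "X", "d": "Y"}
predicted = majority_classify(community, train, ["c", "e", "f"])
```

In a temporal split such as the one above, the training labels would come from the earlier years and the test nodes from the final year.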

slide-8
SLIDE 8

APS & WIKILEAKS RESULTS

[Figure: three panels, each reporting results for the four ground truths (12 journals, 301 sections, 3 privacy levels, 263 embassies).
  EXPLORATORY TASK: normalized mutual information, axis ticks 0.1 to 1.
  PREDICTIVE TASK: classification accuracy, axis 0% to 100%.
  TASK CORRELATION: Spearman correlation, axis −1 to 1.]
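
The exploratory task is scored by normalized mutual information (NMI) between detected communities and ground truth clusters. The slides do not state which normalization is used; the sketch below assumes the form common in community detection, NMI = 2 I(A;B) / (H(A) + H(B)), and assumes non-overlapping partitions given as node-to-cluster dicts. The function name and input format are illustrative.

```python
import math
from collections import Counter


def nmi(part_a, part_b):
    """Normalized mutual information between two partitions of the same nodes.

    part_a, part_b: dicts mapping node -> cluster id.
    Uses the assumed normalization 2 * I(A;B) / (H(A) + H(B)).
    """
    n = len(part_a)
    size_a = Counter(part_a.values())
    size_b = Counter(part_b.values())
    joint = Counter((part_a[v], part_b[v]) for v in part_a)

    def entropy(sizes):
        return -sum(c / n * math.log(c / n) for c in sizes.values())

    h_a, h_b = entropy(size_a), entropy(size_b)
    mutual = sum(
        c / n * math.log(c * n / (size_a[a] * size_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0            # both partitions are a single cluster
    return 2 * mutual / (h_a + h_b)


# Hypothetical toy partitions of four nodes.
detected = {1: "p", 2: "p", 3: "q", 4: "q"}
truth = {1: 0, 2: 0, 3: 1, 4: 1}
score = nmi(detected, truth)
```

The score depends only on how nodes are grouped, not on the cluster names, so partitions that agree up to relabeling score 1 and statistically independent ones score 0.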

slide-9
SLIDE 9

YOUTUBE, DBLP & JAVA NETWORKS

DATA        YouTube               DBLP                       java
NETWORK     social                collaboration              software
            39,841 users          317,080 authors            2,378 classes
            224,235 friendships   1,049,866 collaborations   14,619 dependencies
CLUSTERS    12,986 groups         98,326 venues              54 packages
SETTING     14 algorithms         14 algorithms              26 algorithms
TRAINING    leave-one-out         leave-one-out              leave-one-out

Overlapping or non-cohesive ground truth clusters.

slide-10
SLIDE 10

YOUTUBE, DBLP & JAVA RESULTS

[Figure: three panels, each reporting results for the three ground truths (12,986 categories, 98,326 venues, 54 packages).
  EXPLORATORY TASK: normalized mutual information, axis ticks 0.1 to 1.
  PREDICTIVE TASK: classification accuracy, axis 0% to 100%.
  TASK CORRELATION: Spearman correlation, axis −1 to 1.]
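
The task correlation panel reports a Spearman correlation. A natural reading, assumed here rather than stated in the slides, is that for each ground truth the algorithms' exploratory scores (NMI) are rank-correlated with their predictive scores (classification accuracy). The sketch below computes Spearman's coefficient as the Pearson correlation of average ranks; all names and the sample scores are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation of two equally long score lists."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores of four algorithms on one ground truth.
nmi_scores = [0.20, 0.35, 0.50, 0.65]
accuracy_scores = [0.40, 0.55, 0.60, 0.90]
rho = spearman(nmi_scores, accuracy_scores)
```

A coefficient near 1 would mean the algorithms that do well on the exploratory task also do well on the predictive task, and a coefficient near −1 would mean the rankings are reversed.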

slide-11
SLIDE 11

COMMUNITY DETECTION IN PRACTICE

Take-home message:

  • community information is useful in practice,
  • for lots of clusters, same algorithms for both tasks,
  • for few clusters, different algorithms for different tasks.

Future work:

  • beyond majority classification,
  • overlapping and non-cohesive clusters,
  • descriptive, inferential, causal and mechanistic tasks.
slide-12
SLIDE 12

LOVRO ŠUBELJ

lovro.subelj@fri.uni-lj.si http://lovro.lpt.fri.uni-lj.si