NETWORK COMMUNITY DETECTION IN PRACTICAL SCENARIOS
Lovro Šubelj
University of Ljubljana, Faculty of Computer and Information Science
ARS 15
NETWORK COMMUNITY STRUCTURE
Figure panels: Karate club, Bottlenose dolphins, American football, Synthetic graph, Random graph.
Communities are cohesive subgroups of sparse networks.
NETWORK COMMUNITY DETECTION
- graph partitioning,
- hierarchical clustering,
- modularity optimization,
- statistical inference,
- spectral methods,
- map equation,
- dynamics etc.
Girvan, M. & Newman, M. E. J., P. Natl. Acad. Sci. USA 99, 7821–7826 (2002). Fortunato, S., Phys. Rep. 486, 75–174 (2010).
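Modularity optimization, listed above, scores a partition by comparing the edges that fall inside communities with the number expected at random for the same degrees: Q = Σ_c [L_c/m − (d_c/2m)²] for a graph with m edges, where L_c counts the edges inside community c and d_c sums the degrees of its nodes. The sketch below only evaluates Q for a given partition, assuming a simple undirected graph stored as an edge list. It does not search for the best partition and is not one of the algorithms compared in this talk.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of a simple undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    Computes Q = sum_c [ L_c / m - (d_c / (2m))**2 ].
    """
    m = len(edges)
    inside = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by the edge 2-3, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, community), 4))  # prints 0.3571
```

Putting all nodes into a single community always gives Q = 0, which makes it a convenient baseline.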
LABEL PROPAGATION ALGORITHM
Raghavan, U. N., Albert, R. & Kumara, S., Phys. Rev. E 76, 036106 (2007). Šubelj, L. & Bajec, M., Phys. Rev. E 83, 036103 (2011) etc.
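The basic label propagation rule is as follows. Every node starts with a unique label. Nodes are then visited repeatedly in random order, and each adopts the label held by most of its neighbours, breaking ties uniformly at random. The process stops once every node holds one of its most frequent neighbour labels. The sketch below implements only this basic version. The function name, the adjacency-dict input and the `max_iter` safety cap are assumptions, and it does not include the refinements of the later papers cited above.

```python
import random
from collections import Counter

def label_propagation(adj, seed=None, max_iter=100):
    """Basic asynchronous label propagation in the spirit of Raghavan et al. (2007).

    adj: dict mapping each node to a list of its neighbours (undirected graph).
    Returns a dict mapping each node to its final label (community).
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts with a unique label

    def neighbour_counts(v):
        return Counter(labels[u] for u in adj[v])

    def is_stable(v):
        # stable if the node's label is among the most frequent neighbour labels
        counts = neighbour_counts(v)
        return not counts or counts[labels[v]] == max(counts.values())

    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each sweep
        for v in nodes:
            counts = neighbour_counts(v)
            if not counts:
                continue  # isolated nodes keep their own label
            best = max(counts.values())
            # adopt a most frequent neighbour label, ties broken uniformly at random
            labels[v] = rng.choice([lab for lab, c in counts.items() if c == best])
        if all(is_stable(v) for v in adj):
            break
    return labels

# Two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj, seed=1))
```

Because of the random order and tie-breaking, different seeds can return different partitions. This non-determinism is one motivation for the refinements in the follow-up work.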
LARGE-SCALE COMMUNITY DETECTION
Hric, D., Darst, R. K. & Fortunato, S., Phys. Rev. E 90, 062805 (2014).
COMMUNITY DETECTION TASKS
EXPLORATORY TASK
Figure: training set.
PREDICTIVE TASK
Figure: training set and test set.
APS & WIKILEAKS NETWORKS
DATA        APS                    WikiLeaks
            1893-2013              1966-2010
NETWORK     citation               reference
            526,527 papers         52,416 cables
            5,989,263 citations    78,506 references
CLUSTERS    12 journals            3 privacy levels
            301 sections           263 embassies
SETTING     14 algorithms          26 algorithms
TRAINING    1893-2012              1966-2009
TEST        2013 (4%)              2010 (17%)
Non-overlapping and cohesive ground truth clusters.
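For APS and WikiLeaks, the training and test sets are separated by time: training covers up to 2012 and 2009, and testing uses 2013 and 2010. The sketch below shows one way to make such a split. It assumes that each node carries a year and that edges touching test nodes are simply dropped from the training network. The function name and toy data are hypothetical.

```python
def temporal_split(years, edges, last_training_year):
    """Split a network into a training part and a set of test nodes by year.

    years: dict mapping each node to its year; edges: iterable of (u, v) pairs.
    Training nodes have year <= last_training_year; the training network keeps
    only edges between two training nodes (an assumption of this sketch).
    Returns (train_nodes, train_edges, test_nodes).
    """
    train = {v for v, y in years.items() if y <= last_training_year}
    test = set(years) - train
    train_edges = [(u, v) for u, v in edges if u in train and v in train]
    return train, train_edges, test

# Hypothetical toy citation data: paper -> year, and (citing, cited) pairs.
years = {"p1": 2010, "p2": 2012, "p3": 2013}
edges = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
train, train_edges, test = temporal_split(years, edges, 2012)
print(sorted(train), train_edges, sorted(test))
print(f"test share: {len(test) / len(years):.0%}")
```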
APS & WIKILEAKS RESULTS
EXPLORATORY TASK
Figure: normalized mutual information (0 to 1); panels: 12 journals, 301 sections, 3 privacy levels, 263 embassies.
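In the exploratory task, detected communities are compared with the ground-truth clusters using normalized mutual information. NMI has several normalizations, and the slides do not say which one is used. The sketch below assumes 2I(A;B)/(H(A)+H(B)), with both partitions given as per-node label lists in the same node order.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two partitions.

    labels_a, labels_b: equal-length sequences giving each node's cluster.
    """
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))  # joint cluster counts
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    mi = sum(c / n * log(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())
    if h_a + h_b == 0:
        return 1.0  # both partitions place all nodes in a single cluster
    return 2 * mi / (h_a + h_b)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition up to relabeling
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

NMI ignores label names, so only how the nodes are grouped matters.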
PREDICTIVE TASK
Figure: classification accuracy (0% to 100%); panels: 12 journals, 301 sections, 3 privacy levels, 263 embassies.
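In the predictive task, test nodes are classified and scored by accuracy, and the concluding slide refers to majority classification. The sketch below assumes that every test node has already been assigned to a detected community; how this assignment is done is not specified here. It also assumes that each community predicts the most common ground-truth cluster among its training nodes. The function name and toy data are hypothetical.

```python
from collections import Counter

def majority_accuracy(community, truth, train, test):
    """Fraction of test nodes whose ground-truth cluster equals the majority
    ground-truth cluster among the training nodes of their community.

    community: dict node -> detected community (test nodes included, by assumption).
    truth: dict node -> ground-truth cluster.
    train, test: disjoint lists of nodes.
    """
    votes = {}
    for v in train:
        votes.setdefault(community[v], Counter())[truth[v]] += 1
    # each community predicts its most common training cluster (ties broken arbitrarily)
    prediction = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    # test nodes in communities without training nodes get no prediction (counted wrong)
    correct = sum(1 for v in test if prediction.get(community[v]) == truth[v])
    return correct / len(test)

community = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
truth = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "x", "f": "y"}
print(majority_accuracy(community, truth, train=["a", "b", "d", "f"], test=["c", "e"]))  # prints 0.5
```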
TASK CORRELATION
Figure: Spearman correlation (−1 to 1); panels: 12 journals, 301 sections, 3 privacy levels, 263 embassies.
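The task correlation panels report Spearman correlations, presumably between how the algorithms score on the two tasks. Spearman's coefficient is the Pearson correlation of rank-transformed values, with tied values given their average rank. The sketch below computes it from scratch. The scores in the usage line are made up for illustration and are not results from the talk.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold equal values
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (undefined if either list is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up scores of four algorithms on the exploratory and predictive tasks.
print(spearman([0.62, 0.48, 0.71, 0.30], [0.55, 0.60, 0.80, 0.20]))
```

A value near 1 would mean that algorithms ranking high on one task also rank high on the other.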
YOUTUBE, DBLP & JAVA NETWORKS
DATA        YouTube                 DBLP                      Java
NETWORK     social                  collaboration             software
            39,841 users            317,080 authors           2,378 classes
            224,235 friendships     1,049,866 collaborations  14,619 dependencies
CLUSTERS    12,986 groups           98,326 venues             54 packages
SETTING     14 algorithms           14 algorithms             26 algorithms
TRAINING    leave-one-out           leave-one-out             leave-one-out
Overlapping or non-cohesive ground truth clusters.
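For YouTube, DBLP and Java, the training setting is leave-one-out. One reading, assumed here and not confirmed by the slides, is that each node's ground-truth cluster is hidden in turn. It is then predicted by majority vote over the other members of the node's detected community. The function name and toy data are hypothetical.

```python
from collections import Counter

def leave_one_out_accuracy(community, truth):
    """Leave-one-out majority classification accuracy over all nodes.

    community, truth: dicts node -> detected community / ground-truth cluster.
    A node is correct if its cluster is the most common one among the *other*
    members of its community; nodes alone in their community count as errors.
    """
    groups = {}
    for v, c in community.items():
        groups.setdefault(c, Counter())[truth[v]] += 1
    correct = 0
    for v, c in community.items():
        others = groups[c].copy()
        others[truth[v]] -= 1  # hold node v out
        others = +others       # unary plus drops zero counts
        # ties are broken arbitrarily by most_common
        if others and others.most_common(1)[0][0] == truth[v]:
            correct += 1
    return correct / len(community)

community = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2}
truth = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "z", "f": "z"}
print(leave_one_out_accuracy(community, truth))
```

Only node "d" is misclassified here, because the other members of its community all belong to cluster "x".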
YOUTUBE, DBLP & JAVA RESULTS
EXPLORATORY TASK
Figure: normalized mutual information (0 to 1); panels: 12,986 groups, 98,326 venues, 54 packages.
PREDICTIVE TASK
Figure: classification accuracy (0% to 100%); panels: 12,986 groups, 98,326 venues, 54 packages.
TASK CORRELATION
Figure: Spearman correlation (−1 to 1); panels: 12,986 groups, 98,326 venues, 54 packages.
COMMUNITY DETECTION IN PRACTICE
Take-home message:
- community information is useful in practice,
- for many clusters, the same algorithms suit both tasks,
- for few clusters, different algorithms suit different tasks.
Future work:
- beyond majority classification,
- overlapping and non-cohesive clusters,
- descriptive, inferential, causal and mechanistic tasks.