Crawling the Community Structure of Multiplex Networks Ricky - PowerPoint PPT Presentation

Crawling the Community Structure of Multiplex Networks
Ricky Laishram¹, Jeremy D. Wendt², Sucheta Soundarajan¹
¹ Syracuse University, Syracuse NY, USA
² Sandia National Laboratories, Albuquerque NM, USA

Multiplex Networks
Nodes have multiple types of edges between them [1]. Edges of the same type can be considered as belonging to the same 'layer'. A multiplex network is a special type of multilayer network in which every node can participate in all layers.
Example: a terrorist network, with layers for face-to-face communication, kinship, classmates, and mentors (Figure: NoordinTop multiplex network).
[1] Mucha, Peter J., et al. "Community structure in time-dependent, multiscale, and multiplex networks." Science 328.5980 (2010): 876-878.
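A multiplex network can be stored as one shared node set plus a separate adjacency structure for each layer. The following is a minimal Python sketch of that idea. The class name `Multiplex` and its methods are illustrative and are not taken from the authors' code.

```python
from collections import defaultdict


class Multiplex:
    """Shared node set with one undirected adjacency map per layer (illustrative)."""

    def __init__(self, num_layers):
        # layers[x][u] is the set of neighbors of node u in layer x.
        self.layers = [defaultdict(set) for _ in range(num_layers)]
        self.nodes = set()

    def add_edge(self, layer, u, v):
        self.nodes.update((u, v))
        self.layers[layer][u].add(v)
        self.layers[layer][v].add(u)

    def neighbors(self, layer, u):
        # .get avoids inserting u into the defaultdict; return a copy.
        return set(self.layers[layer].get(u, ()))

    def edges(self, layer):
        # Each undirected edge once, as a sorted tuple (assumes comparable node ids).
        return {tuple(sorted((u, v))) for u, nbrs in self.layers[layer].items() for v in nbrs}


# Hypothetical two-layer example: layer 0 = communication, layer 1 = kinship.
m = Multiplex(num_layers=2)
m.add_edge(0, "a", "b")
m.add_edge(1, "a", "c")
print(m.neighbors(0, "a"))  # {'b'}
print(m.edges(1))           # {('a', 'c')}
```

Every node belongs to the shared `nodes` set even when it has no edges in a given layer, which matches the multiplex assumption that nodes can participate in all layers.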

Data Collection in Multiplex Networks
Before a multiplex network can be analyzed, we need data! Challenges of data collection in multiplex networks:
1. Different layers have different data collection costs.
2. Data collected from different layers have different reliabilities.

Layer         | Cost of query | Reliability of response
Kinship       | Low           | High
Communication | High          | Low

Problem Definition
Let M be a multiplex network with layers L_0, L_1, ..., and let c_0, c_1, ... be the query costs of those layers. Given an initial set of nodes V', a query budget B, and a layer of interest L_0, how can we sample M by crawling so that the sample of L_0 is representative of the community structure of L_0, without exceeding the query budget?
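The budget constraint can be made concrete with a small simulation: every neighbor query in layer L_x costs c_x, and the crawler stops once no further query fits in the remaining budget B. In this sketch the next (layer, node) pair is chosen uniformly at random, which is only a placeholder policy and not the MCS strategy. The function name `budgeted_crawl` and the dictionary layout are assumptions.

```python
import random


def budgeted_crawl(adjacency, costs, seeds, budget, seed=0):
    """Crawl without exceeding `budget`; return observed edges per layer and cost spent.

    adjacency[x][u] -> neighbors of u in layer x; costs[x] -> cost c_x of one query.
    Placeholder policy: pick a random affordable (layer, unqueried known node) pair.
    """
    rng = random.Random(seed)
    known = set(seeds)                     # the initial node set V'
    queried = {x: set() for x in adjacency}
    sample = {x: set() for x in adjacency}
    spent = 0
    while True:
        options = [(x, u) for x in sorted(adjacency) if spent + costs[x] <= budget
                   for u in sorted(known - queried[x])]
        if not options:
            break                          # nothing affordable or nothing left to query
        x, u = rng.choice(options)
        spent += costs[x]
        queried[x].add(u)
        for v in adjacency[x].get(u, ()):
            known.add(v)
            sample[x].add(tuple(sorted((u, v))))
    return sample, spent


# Hypothetical example: layer 0 (the layer of interest) is five times as costly as layer 1.
adjacency = {0: {1: {2}, 2: {1, 3}, 3: {2}}, 1: {1: {3}, 3: {1}}}
sample, spent = budgeted_crawl(adjacency, costs={0: 5, 1: 1}, seeds=[1], budget=8)
print(spent <= 8)  # True
```

The check `spent + costs[x] <= budget` is what guarantees the budget is never exceeded; a smarter policy only changes which affordable option is picked.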

Query Response Models
1. Reliable Query Response (RQR): a query for the neighbors of a node returns all of its neighbors.
2. Unreliable Query Response (UQR): a query for the neighbors of a node may not return all of its neighbors. Every node has an uncertainty factor that determines the probability of including a neighbor in the response.
Figure: (a) Example of UQR; (b) Example of RQR.
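The two response models can be simulated as follows. The slides state only that a node's uncertainty factor determines the probability of including a neighbor; this sketch assumes each neighbor is included independently with probability 1 minus the queried node's uncertainty factor. The function names are hypothetical.

```python
import random


def query_rqr(adjacency, u):
    """Reliable Query Response: every neighbor of u is returned."""
    return set(adjacency.get(u, ()))


def query_uqr(adjacency, u, uncertainty, rng):
    """Unreliable Query Response (assumed model): each neighbor of u is returned
    independently with probability 1 - uncertainty[u]."""
    p_include = 1.0 - uncertainty[u]
    return {v for v in sorted(adjacency.get(u, ())) if rng.random() < p_include}


adjacency = {"a": {"b", "c", "d"}}
rng = random.Random(42)
print(query_rqr(adjacency, "a") == {"b", "c", "d"})                  # True
print(query_uqr(adjacency, "a", {"a": 0.5}, rng) <= {"b", "c", "d"})  # True: always a subset
```

An uncertainty of 0 makes UQR behave like RQR, and an uncertainty of 1 makes every response empty.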

Challenges
1. The layer of interest is costly to explore.
2. We need to balance the trade-off between exploring the layer of interest and exploring the other layers.
3. The true properties of many nodes are not known initially.*
4. In UQR, a queried node may still have unobserved neighbors.
* This challenge applies to data collection by crawling in general, not just to multiplex networks.

Contributions
1. We are the first to consider the problem of sampling a multiplex network to generate a sample that is representative of the community structure of the layer of interest.
2. We propose MultiComSample (MCS), a novel sampling algorithm for crawling the community structure of the layer of interest.
3. We perform extensive experimental evaluations and demonstrate that MCS outperforms all the baseline algorithms.

Methodology
MCS consists of two steps:
1. RNDSample: sample the 'cheaper' layers.
2. MABSample: sample the layer of interest using the information from RNDSample.

RNDSample
1. Each layer is allocated some fraction of the budget.
2. A random walk (with jumps) is run on each layer, using that layer's allocated budget.
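A minimal sketch of the two RNDSample steps for a single layer. The slides do not give the budget fractions, the jump probability, or the jump target, so `fractions`, `jump_prob=0.15`, and jumping to a uniformly random already-discovered node are all assumptions, as are the function names.

```python
import random


def allocate_budget(total, fractions):
    """Split a query budget across layers; the fractions are hypothetical inputs."""
    return {x: int(total * f) for x, f in fractions.items()}


def random_walk_with_jump(adjacency, start, num_queries, jump_prob=0.15, seed=0):
    """Query up to num_queries distinct nodes of one layer by a random walk.

    With probability jump_prob (or at a dead end) the walk jumps to a uniformly
    random node discovered so far; a crawler cannot jump to unseen nodes.
    Revisiting an already-queried node is free here (its response is cached).
    """
    rng = random.Random(seed)
    discovered = [start]
    seen = {start}
    queried = set()
    edges = set()
    current = start
    while len(queried) < num_queries:
        if current not in queried:
            queried.add(current)           # pay for one query in this layer
            for v in sorted(adjacency.get(current, ())):
                edges.add(tuple(sorted((current, v))))
                if v not in seen:
                    seen.add(v)
                    discovered.append(v)
        if len(queried) == len(seen):
            break                          # every discovered node has been queried
        neighbors = sorted(adjacency.get(current, ()))
        if neighbors and rng.random() >= jump_prob:
            current = rng.choice(neighbors)
        else:
            current = rng.choice(discovered)
    return edges


budgets = allocate_budget(100, {1: 0.5, 2: 0.25})
path = {1: {2}, 2: {1, 3}, 3: {2}}
print(budgets)                                                  # {1: 50, 2: 25}
print(sorted(random_walk_with_jump(path, start=1, num_queries=3)))  # [(1, 2), (2, 3)]
```

Rounding each layer's share down with `int` keeps the total allocation from exceeding the overall budget.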

MABSample: Overview
MABSample uses three multi-armed bandits:
1. LBandit: selects the layer that is most likely to have high edge overlap with L_0.
2. CBandit: selects a community in the layer selected by LBandit.
3. RBandit: selects a node in the community selected by CBandit.
Each layer has its own CBandit and RBandit.
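The three-level selection can be sketched with any standard bandit policy. The slides do not name the policy, so this sketch uses UCB1 purely as a stand-in. The community ids and the role labels (r in the MABSample flowchart) are hypothetical.

```python
import math


class UCB1:
    """UCB1 bandit: a stand-in policy, not necessarily the one MCS uses."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.totals = {a: 0.0 for a in self.arms}

    def select(self):
        # Try every arm once before trusting the confidence bounds.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        n = sum(self.counts.values())
        return max(self.arms, key=lambda a: self.totals[a] / self.counts[a]
                   + math.sqrt(2 * math.log(n) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


# One LBandit over the non-interest layers; one CBandit and one RBandit per layer.
lbandit = UCB1(arms=[1, 2])
cbandits = {1: UCB1(arms=["c0", "c1"]), 2: UCB1(arms=["c0"])}   # hypothetical communities
rbandits = {x: UCB1(arms=["hub", "periphery"]) for x in (1, 2)}  # hypothetical roles
x = lbandit.select()
k = cbandits[x].select()
r = rbandits[x].select()
print(x, k, r)  # 1 c0 hub
```

After each query, the reward for the chosen arms would be passed to the corresponding `update` calls, so that LBandit learns which layers overlap most with L_0.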

MABSample: Details
Figure: The flowchart for MABSample. Starting from the inputs L_0^S and C_0, it repeats the following steps until the termination condition is met:
1. L_x, k, r ← arms from LBandit, CBandit, and RBandit.
2. u ← a node in L_x from community k satisfying role r.
3. e ← the edges between u and Γ(u, L_0) in L_0.
4. Remove the edges {(u, x) : x ∈ V^S} from L_0^S.
5. Update L_0^S with u and e.
6. Update LBandit, CBandit, and RBandit with rewards.

MABSample: Rewards
Edge Overlap: measures how similar a layer L_x is to L_0, based on the observed edges.
Community Update Distance: the normalized partition distance between the communities before and after querying some nodes.

Bandit  | Reward
LBandit | Edge Overlap
CBandit | Community Update Distance
RBandit | Community Update Distance
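Both rewards can be computed from the observed samples. The slides do not give exact formulas, so this sketch assumes that Edge Overlap is the Jaccard similarity of the two layers' observed edge sets. It uses the Rand distance (the fraction of node pairs whose same-community status changed) as one possible normalized partition distance for the Community Update Distance.

```python
from itertools import combinations


def _undirected(edges):
    # Treat (u, v) and (v, u) as the same edge.
    return {frozenset(e) for e in edges}


def edge_overlap(edges_x, edges_0):
    """Assumed definition: Jaccard similarity of observed edges in L_x and L_0."""
    a, b = _undirected(edges_x), _undirected(edges_0)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def community_update_distance(before, after):
    """Rand distance between two partitions given as {node: community id}.

    Only nodes present in both partitions are compared; the result is in [0, 1].
    """
    nodes = sorted(set(before) & set(after))
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    changed = sum((before[u] == before[v]) != (after[u] == after[v]) for u, v in pairs)
    return changed / len(pairs)


print(edge_overlap([(1, 2), (2, 3)], [(2, 1), (3, 4)]))                   # 0.3333333333333333
print(community_update_distance({1: 0, 2: 0, 3: 1}, {1: 0, 2: 1, 3: 1}))  # 0.6666666666666666
```

Because the distance compares co-membership of pairs rather than raw labels, renaming communities between runs does not register as an update.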

MultiComSample (MCS)
Figure: The flowchart of the MCS algorithm. With inputs V_0 and C_max:
1. Make an initial budget allocation C_x.
2. Run RNDSample on L_x for x ∈ (0, l].
3. Update L_0^S from RNDSample.
4. If no budget remains, return the communities in L_0^S and stop.
5. Otherwise, run MABSample on L_0 to update L_0^S, update the budget allocation C_x for x ∈ (0, l], and return to step 2.

RQR vs UQR
RQR: once queried, a node is never queried in that layer again.
UQR: the uncertainty of the queried nodes is estimated, and already-queried nodes have some chance of being queried again in that layer.
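One way to realize "some chance of being queried again" under UQR is sketched below. The uncertainty estimate (the share of a node's known neighbors that its own response omitted) and the rule of re-admitting a queried node with probability equal to that estimate are assumptions, not details given in the slides. The function names are hypothetical.

```python
import random


def estimated_uncertainty(returned, known_neighbors):
    """Hypothetical estimate: share of u's known neighbors (learned from any
    response, including other nodes' responses) missing from u's own response."""
    if not known_neighbors:
        return 0.0
    return len(known_neighbors - returned) / len(known_neighbors)


def candidate_nodes(discovered, queried, uncertainty, rng):
    """Under RQR the candidates would be discovered - queried; under UQR each
    queried node is also re-admitted with probability equal to its uncertainty."""
    candidates = set(discovered) - set(queried)
    for u in sorted(queried):
        if rng.random() < uncertainty.get(u, 0.0):
            candidates.add(u)
    return candidates


rng = random.Random(0)
u_hat = estimated_uncertainty(returned={2}, known_neighbors={2, 3, 4, 5})
print(u_hat)  # 0.75
print(sorted(candidate_nodes({1, 2, 3}, {1}, {1: u_hat}, rng)))
```

With every uncertainty at 0, the candidate set reduces to the RQR rule of never re-querying a node.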

Datasets
Network    | Number of Nodes | Number of Layers | Max Budget
TwitterKP  | 2420            | 3                | 50%
TwitterOW  | 2182            | 3                | 50%
TwitterSC  | 2116            | 3                | 50%
TwitterTR  | 3036            | 3                | 50%
CaHepPhTh  | 1324            | 2                | 50%
NoordinTop | 120             | 5                | 50%
DBLP       | 6 × 10^5        | 2                | 5%
Table: Statistics of datasets used for experiments.

Baseline Algorithms
Name | Operates on             | Next node to query
SMD  | Layer of interest, L_0  | Node with most neighbors in L_0^S
SRW  | Layer of interest, L_0  | Random node in L_0^S
AMD  | Aggregate of all layers | Node with most neighbors in aggregated sample
ARW  | Aggregate of all layers | Random node in aggregated sample
MMD  | Multiplex network       | Node with most neighbors in selected layer
MRW  | Multiplex network       | Random node in selected layer
For MMD and MRW, the layer with the highest edge overlap is selected, and the chosen node is queried in both L_0 and the selected layer. Appropriate modifications are made to the set of candidate nodes in the case of UQR.
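The node-selection rules of the single-layer and aggregate baselines are simple to express. This sketch assumes each sample is stored as {node: set of observed neighbors} with every discovered node as a key. Following the RQR rule that a node is never queried twice in a layer, it only considers unqueried nodes. The function names are hypothetical.

```python
import random


def next_node_max_degree(sample_adj, queried):
    """SMD/AMD rule: the unqueried node with the most neighbors in the sample."""
    candidates = sorted(u for u in sample_adj if u not in queried)
    if not candidates:
        return None
    # max() keeps the first maximum, so ties go to the smallest node id.
    return max(candidates, key=lambda u: len(sample_adj[u]))


def next_node_random(sample_adj, queried, rng):
    """SRW/ARW rule as stated in the table: a random unqueried node in the sample."""
    candidates = sorted(u for u in sample_adj if u not in queried)
    return rng.choice(candidates) if candidates else None


def aggregate(samples):
    """Union of per-layer sample adjacencies, i.e. the aggregated sample for AMD/ARW."""
    agg = {}
    for adj in samples:
        for u, nbrs in adj.items():
            agg.setdefault(u, set()).update(nbrs)
    return agg


layer0 = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}
layer1 = {4: {5, 6}, 5: {4}, 6: {4}}
print(next_node_max_degree(layer0, queried={1}))  # 2
agg = aggregate([layer0, layer1])
print(next_node_max_degree(agg, queried={1}))     # 4
```

The same two rules, applied to the layer picked by edge overlap, would give the MMD and MRW selections.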

Performance Comparison
Figure: Comparison between MCS and the baselines on the TwitterKP dataset under (a) RQR and (b) UQR, plotting similarity against cost. MCS outperforms all the baselines in finding samples whose community structure is more similar to that of the original network.

Regret Analysis
Figure: Cumulative regret of MCS on TwitterKP and TwitterOW, plotted against the fraction of nodes queried. MCS gets close to the oracle after around 10-20% of the nodes have been queried.
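Cumulative regret sums, over the queries made so far, the gap between the reward an oracle would have obtained and the reward MCS actually obtained. This sketch assumes the per-query reward sequences are already available. The slides do not state how the plotted curves are normalized, so no normalization is applied here; the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_regret(oracle_rewards, obtained_rewards):
    """Running total of (oracle reward - obtained reward) after each query."""
    if len(oracle_rewards) != len(obtained_rewards):
        raise ValueError("reward sequences must have the same length")
    return list(accumulate(o - r for o, r in zip(oracle_rewards, obtained_rewards)))


print(cumulative_regret([1.0, 1.0, 0.5], [0.5, 1.0, 0.25]))  # [0.5, 0.5, 0.75]
```

A curve that flattens out, as described for MCS after 10-20% of the nodes, means the per-query gap to the oracle has shrunk toward zero.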

Conclusion
Addressed the problem of sampling the community structure of a layer of interest in a multiplex network.
Proposed a novel algorithm called MultiComSample (MCS).
Showed that MCS outperforms the baselines on multiple real-world networks.

Thank You. Questions? rlaishra@syr.edu
