Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph
Citation
- Published in volume 6 of the journal PLoS ONE, November 2011 edition
- Authors: Yang Wang, Zengru Di and Ying Fan, all from the Department of Systems Science, Beijing Normal University, China
Overview
- Networks represent the interaction structure among components in a wide range of real complex systems
- Exploring network communities
  - reveals the network's structure
  - provides a new perspective on dynamic processes
  - uncovers the relationships among the nodes
- This paper devises a new approach to identify the important nodes without knowing the exact partition of the network
Construction
- Based on the observation that the spectrum of the adjacency matrix indicates the community structure of a network
- Distinguishes two kinds of critical nodes:
  - community cores, via the eigenvalues of the adjacency matrix
  - bridges, via the graph Laplacian
- Experiments on synthetic and real networks
Definitions
- Eigenvector: a non-zero column vector v is an eigenvector of a square matrix A iff there exists a number λ such that Av = λv.
- Eigenvalue: the number λ is called the eigenvalue corresponding to the eigenvector v.
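As a quick numerical check of these definitions, the sketch below uses NumPy on a small made-up adjacency matrix (a 3-node path graph, chosen only for illustration) and verifies Av = λv for each eigenpair.

```python
import numpy as np

# Adjacency matrix of a 3-node path graph 0 - 1 - 2 (illustrative example only).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# eigh is meant for real symmetric matrices: it returns the eigenvalues in
# ascending order and orthonormal eigenvectors as the columns of `vecs`.
vals, vecs = np.linalg.eigh(A)

# Check A v = lambda v for every eigenpair.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)

largest = vals[-1]  # equals sqrt(2) for this path graph
```

Because the adjacency matrix of an undirected network is symmetric, `eigh` is a natural choice here: it guarantees real eigenvalues and orthonormal eigenvectors.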
Identifying important nodes
- Proposed method: a centrality metric based on the spectrum of the adjacency matrix
- Definitions: binary network G = (V, E)
- |V| = n, |E| = m
- Eigenvectors are orthogonal and normalized
- Objective function:
  - Maximize eigenvalues (λ) using perturbation theory
  - Pk is the relative change in the c largest eigenvalues when node k is removed
Centrality Metric
- vik is the kth element of eigenvector vi, and Pk lies in the interval [0,1]. If node k is important to the community structure, Pk will be large
- In a network with n nodes and c communities, the Pk sum to c over all nodes
- To scale the index so that it sums to 1, define Ik = Pk / c
- If the index Ik is larger than 1/n, node k is an important node
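The index can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes that Pk is the sum of the squared kth components of the eigenvectors belonging to the c largest eigenvalues, a form consistent with the properties stated above (Pk in [0,1], Ik summing to 1). The function name `community_centrality` and the test graph are hypothetical.

```python
import numpy as np

def community_centrality(A, c):
    """Return I_k = P_k / c for every node k.

    Assumption: P_k is the sum of the squared k-th components of the
    eigenvectors belonging to the c largest eigenvalues of A.
    """
    A = np.asarray(A, dtype=float)
    # Eigenvalues come back in ascending order; eigenvectors are orthonormal columns.
    _, vecs = np.linalg.eigh(A)
    top = vecs[:, -c:]               # eigenvectors of the c largest eigenvalues
    P = np.sum(top ** 2, axis=1)     # each P_k lies in [0, 1]
    return P / c                     # the I_k sum to 1 over all nodes

def hub_example():
    """Hypothetical graph: two 4-cliques plus node 8 linked to every other node."""
    n = 9
    A = np.zeros((n, n))
    for group in (range(0, 4), range(4, 8)):
        for i in group:
            for j in group:
                if i != j:
                    A[i, j] = 1.0
    A[8, :8] = 1.0
    A[:8, 8] = 1.0
    return A

A = hub_example()
I = community_centrality(A, c=2)
# Nodes whose index exceeds the 1/n threshold.
important = [k for k in range(len(I)) if I[k] > 1.0 / len(I)]
```

In this example only node 8, which touches both cliques, exceeds the 1/n threshold. The index alone does not say whether such a node is a core or a bridge.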
Distinguish two kinds of important nodes
- RatioCut technique: |Ci| is the size of community Ci. The RatioCut problem reduces to the MinCut problem when the sizes of the communities are almost the same.
- Case 1: c = 2
- The partition is encoded by an index vector s with n elements
Continued
- The RatioCut function can then be written in terms of s and L, where L is the graph Laplacian, defined as Lij = -Aij for i ≠ j and Lii = ki, with ki the degree of node i
- There are also two constraints on s
Continued
- The partition problem can be posed as a minimization problem over s
- The solution to this problem is found to be the eigenvector corresponding to the second-smallest eigenvalue of L, denoted by u2
- Community core nodes: |(u2)i| is relatively large
- Bridge nodes: |(u2)i| is near zero
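The c = 2 case can be illustrated with a minimal NumPy sketch. The 7-node graph is a hypothetical example (two triangles joined only through node 3), not one of the networks from the paper.

```python
import numpy as np

# Hypothetical 7-node example: triangles {0, 1, 2} and {4, 5, 6},
# joined only through node 3 (edges 2-3 and 3-4).
edges = [(0, 1), (0, 2), (1, 2), (4, 5), (4, 6), (5, 6), (2, 3), (3, 4)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Graph Laplacian: L_ii = k_i (degree of node i), L_ij = -A_ij for i != j.
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order, so column 1 of `vecs` is the
# eigenvector u2 of the second-smallest eigenvalue.
vals, vecs = np.linalg.eigh(L)
u2 = vecs[:, 1]

magnitude = np.abs(u2)                # the overall sign of u2 is arbitrary
bridge = int(np.argmin(magnitude))    # node with |(u2)_i| closest to zero
side = np.sign(np.round(u2, 12))      # +1 / -1 mark the two communities, 0 the bridge
```

Here node 3 has a zero component in u2 and is picked out as the bridge, while the triangle nodes farthest from it have the largest magnitudes.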
Continued
- Case 2: c > 2
- A new n × c index matrix S is defined as Sij = 1/√|Cj| if vertex i ∈ Cj, and 0 otherwise
- RatioCut = Tr(S^T L S)
- L is a symmetric matrix which can be written as L = U D U^T, where the columns of U are the eigenvectors of L and D is the diagonal matrix of eigenvalues, Dii = βi
- Substituting, RatioCut can be written as Tr(S^T U D U^T S) = Σj βj Σk [(U^T S)jk]^2
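The trace form can be checked numerically. The sketch below is an illustration under stated assumptions: the graph (a ring of three triangles) and its partition are made up, and RatioCut is taken to mean the sum over communities of (edges leaving Cj) / |Cj|. Other references scale it by a constant factor.

```python
import numpy as np

# Hypothetical graph: three triangles connected in a ring by single edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 3), (5, 6), (8, 0)]
n = 9
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

communities = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Index matrix S: S_ij = 1 / sqrt(|C_j|) if vertex i is in C_j, else 0.
S = np.zeros((n, len(communities)))
for j, members in enumerate(communities):
    S[members, j] = 1.0 / np.sqrt(len(members))

trace_form = np.trace(S.T @ L @ S)

# Direct count: edges leaving each community divided by its size.
direct = 0.0
for members in communities:
    inside = np.zeros(n, dtype=bool)
    inside[members] = True
    cut = A[inside][:, ~inside].sum()
    direct += cut / len(members)

# Spectral form: L = U D U^T, so Tr(S^T L S) = sum_j beta_j * sum_k (U^T S)_jk^2.
betas, U = np.linalg.eigh(L)
M = U.T @ S
spectral = float(np.sum(betas[:, None] * M ** 2))
reconstructed_ok = np.allclose(U @ np.diag(betas) @ U.T, L)
```

Each triangle has two outgoing edges and three vertices, so all three expressions evaluate to 3 × 2/3 = 2.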
Continued
- Define the vertex vector of vertex i as ri, with [ri]j = Uij
- The RatioCut can then be rewritten in terms of the vertex vectors, given that the network has communities of almost equal size [Gk: set of vertices in community k]
- Minimizing the RatioCut is equivalent to a maximization problem involving a parameter p. For a clear community structure, p = c can be chosen.
Continued
- If the community structure is quite clear, the magnitude |ri| of the vertex vector over its first p terms identifies the bridge nodes; this index is denoted by b
- If the index b of a given vertex is near zero, the presence of that node results in a large RatioCut, and hence it is a bridge node
Continued
- To scale the index to 1, a new term wk is defined as wk = bk / c
- Considering an ER random network with n nodes as a null model, the index of each node would be 1/n
- If the w-score of a node is smaller than 1/n, the vertex has nearly equal membership in more than one community and hence is a bridge node
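The w-score can be sketched in the same way. This is a hedged illustration rather than the authors' implementation: it assumes p = c and takes bk to be the squared magnitude of the vertex vector rk restricted to its first c entries (the eigenvectors of the c smallest Laplacian eigenvalues). The function name `w_score` and the test graph are hypothetical.

```python
import numpy as np

def w_score(A, c):
    """Return w_k = b_k / c for every node k, taking p = c.

    Assumption: b_k is the squared magnitude of the vertex vector r_k
    restricted to the eigenvectors of the c smallest Laplacian eigenvalues.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
    _, U = np.linalg.eigh(L)             # eigenvalues in ascending order
    b = np.sum(U[:, :c] ** 2, axis=1)    # first c entries of each vertex vector
    return b / c                         # the w_k sum to 1 over all nodes

# Hypothetical graph: two 4-cliques plus node 8 linked to every other node.
n = 9
A = np.zeros((n, n))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[8, :8] = 1.0
A[:8, 8] = 1.0

w = w_score(A, c=2)
# Nodes whose w-score falls below the 1/n null-model value.
bridges = [k for k in range(n) if w[k] < 1.0 / n]
```

In this example node 8 sits between the two cliques, so its w-score falls below 1/n and it is flagged as a bridge, while the clique members stay above the threshold.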
Pros of this approach
- Lower computational cost: O(mn)
Experimental Results
- Synthetic network
- The centrality metric I predicts nodes 1, 8 and 15 as important nodes
- The w-score identifies node 15 as the bridge node
- The ΔH index also gives the correct prediction, but requires significant computational cost
- M can identify cores only
Experimental Results (contd.)
Real-world network: Zachary’s karate club (social network) with c = 2
- The centrality metric I identifies the community cores: node 1 and node 34 (the instructor and the administrator, respectively)
- The w-score identifies node 3 as the overlapping node, i.e. the bridge between the two communities
Zachary’s karate club visualization
- The diameter of each vertex is proportional to I; a large diameter indicates an important vertex
- The color of each vertex is related to the w-score
- Red vertices behave like “overlapping” nodes or bridges
- Yellow vertices lie inside their own communities
Word Association Network
- Four communities: Intelligence, Astronomy, Light, Colors
- The word Bright is related to all of them, as is Sun
- Community critical nodes: Bright, Sun, Moon, Smart
- Community cores: Moon and Smart
- Bridges: Bright and Sun
Scientist Collaboration Network
- The network represents scientists whose research centers on the properties of networks of one kind or another
- Edges are placed between scientists who have published at least one paper together
- The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi
- Their w-score is not large, as they collaborate with scientists outside their own communities
C. elegans neural network
- The network is divided into 3 communities (sensory neurons, interneurons, motor neurons)
- Each node represents a neuron and each edge represents a synaptic connection between neurons
- High centrality metric I: important interneurons (AVA, AVB, … )
- The w-score of these nodes is very small: most of the important nodes act as bridges, since connections between communities are more necessary in this network
Applications in weighted networks: Artificial Network
- The adjacency matrix of an undirected network is real and symmetric
- The method works well in a small artificial network: 10 nodes with two communities
- A higher weight means a closer relationship between vertices
- Vertices 4 and 9 are the cores of the communities
- Vertex 11 is the bridge between the communities
Applications in weighted networks (Contd.): Real Network, SFI (Santa Fe Collaboration)
- SFI collaboration network
- Vertices 2, 12 and 24 are group leaders (community cores)
- Vertices 1, 9 and 11 are bridges
- The result differs from that of the corresponding unweighted network: edge weights might affect the results
Limitations
- When cluster sizes are highly heterogeneous, community identification fails
- This limitation results from a property of the adjacency matrix
- If Nsmall^2 < Nlarge, small communities cannot be detected (δ = Nlarge / Nsmall)
- The index I cannot identify the important nodes in the small communities when the communities are of very different sizes
Conclusion/Observation
- The proposed method works well in many cases without knowing the exact community structure
- The number of communities must be known, however
- The paper does not say anything about the effect of removing or adding any node
- Changes in the underlying community structure are not taken into consideration
- The directed case is not considered and is left for future research
- Identifying such key nodes is important and could potentially be used to identify the organizer of a community in social networks, to develop an immunization strategy for an epidemic process, or to identify key nodes in biological networks