Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph (PowerPoint presentation)

SLIDE 1

Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph

SLIDE 2

Citation

Published in PLoS ONE, volume 6, November 2011
Authors: Yang Wang, Zengru Di and Ying Fan, all from the Department of Systems Science, Beijing Normal University, China

SLIDE 3

Overview

Networks represent the interaction structure among components in a wide range of real complex systems
Exploring network communities
reveals the structure of the network
provides a new perspective on dynamic processes
uncovers the relationships among the nodes
This paper devises a new approach to identify the important nodes without knowing the exact partition of the network

SLIDE 4

Construction

Based on the implication that the spectrum of the adjacency matrix gives an indication of community structure in the network
Distinguishes the critical nodes as
community core - eigenvalues
bridge - graph Laplacian
Experiments on synthetic and real networks

SLIDE 5

Definitions

Eigenvector: A non-zero column vector v is an eigenvector of a matrix A iff there exists a number λ such that Av = λv.
Eigenvalue: The number λ is called the eigenvalue corresponding to the eigenvector v.
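The definition can be checked numerically. The following is a minimal sketch using NumPy; the 2 x 2 matrix is an arbitrary illustration and does not come from the paper.

```python
import numpy as np

# Small symmetric matrix chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices: eigenvalues come back in ascending
# order and the eigenvectors are the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each pair satisfies the defining relation A v = lambda v.
    print(lam, np.allclose(A @ v, lam * v))
```

For a symmetric matrix such as the adjacency matrix of an undirected network, the eigenvectors returned this way are also orthogonal and normalized.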

SLIDE 6

Identifying important nodes

Proposed method: a centrality metric based on the spectrum of the adjacency matrix
Definitions: binary network G = (V, E)
|V| = n, |E| = m
Eigenvectors are orthogonal and normalized
Objective function:
Maximize eigenvalues (λ) using perturbation theory, where Pk is the relative change in the c largest eigenvalues as node k is removed

SLIDE 7

Centrality Metric

Here vik is the kth element of the eigenvector vi, and Pk lies in the interval [0,1]. If a node k is important to the community structure, Pk will be large
In a network with n nodes and c communities, the index is scaled to 1 by defining Ik = Pk / c
If the index Ik is larger than 1/n, node k is an important node
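The index can be sketched in a few lines of NumPy. The formula for Pk is not preserved in these slides, so the code assumes Pk is the sum of vik squared over the eigenvectors of the c largest eigenvalues. This assumption is consistent with Pk lying in [0,1] and with the Ik summing to 1, but it is not confirmed by the slides. The toy graph and the function names are hypothetical.

```python
import numpy as np

def toy_graph():
    """Two 4-node cliques (0-3 and 5-8) plus node 4 linked to every other node."""
    n = 9
    A = np.zeros((n, n))
    for group in (range(0, 4), range(5, 9)):
        for i in group:
            for j in group:
                if i != j:
                    A[i, j] = 1.0
    for j in range(n):
        if j != 4:
            A[4, j] = A[j, 4] = 1.0
    return A

def centrality_index(A, c):
    """ASSUMED form: P_k = sum of v_ik^2 over the c largest eigenvalues, I_k = P_k / c."""
    _, vecs = np.linalg.eigh(A)      # orthonormal eigenvectors, eigenvalues ascending
    top = vecs[:, -c:]               # eigenvectors of the c largest eigenvalues
    P = np.sum(top ** 2, axis=1)     # each P_k lies in [0, 1]
    return P / c                     # the I_k sum to 1 over all nodes

A = toy_graph()
n = A.shape[0]
I = centrality_index(A, c=2)
important = [k for k in range(n) if I[k] > 1.0 / n]
print(np.round(I, 4), important)
```

In this toy graph only node 4, the node shared by both cliques, exceeds the 1/n threshold. The clique members are interchangeable, so none of them stands out as a core.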

SLIDE 8

Distinguish two kinds of important nodes

RatioCut Technique:

|Ci| is the size of the community Ci. The RatioCut problem reduces to the Mincut problem when the sizes of the communities are almost the same.
Case 1: c = 2

Index vector s with n elements

SLIDE 9

Continued

RatioCut function becomes:

L is the graph Laplacian, defined as Lij = -Aij for i ≠ j and Lii = ki, where ki is the degree of node i. There are also two constraints on s
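For reference, one standard way to write the two-community case is given below. The choice of s and the normalization are the usual ones from the spectral clustering literature and are assumed here; they may differ from the paper's by a constant factor. cut(C1, C2) denotes the number of edges between the two communities.

```latex
\[
s_i =
\begin{cases}
 \sqrt{|C_2|/|C_1|}  & \text{if } i \in C_1,\\[2pt]
 -\sqrt{|C_1|/|C_2|} & \text{if } i \in C_2,
\end{cases}
\qquad
\sum_{i} s_i = 0, \qquad s^{T}s = n,
\]
\[
s^{T} L s \;=\; \sum_{(i,j)\in E} (s_i - s_j)^2
\;=\; n\,\mathrm{cut}(C_1, C_2)\left(\frac{1}{|C_1|} + \frac{1}{|C_2|}\right).
\]
```

With this choice, minimizing the RatioCut over partitions is the same as minimizing the quadratic form in s under the two constraints.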

SLIDE 10

Continued

The partition problem can be formulated as minimizing the RatioCut function over s, subject to the two constraints
The solution to this problem is found to be the eigenvector corresponding to the second-smallest eigenvalue of L, denoted by u2
Community core nodes: |u_i^{(2)}|, the magnitude of the ith element of u2, is relatively large
Bridge nodes: |u_i^{(2)}| is near zero
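The c = 2 criterion can be tried on a small example. This is a minimal sketch in NumPy on a hypothetical toy graph (two 4-node cliques sharing one extra node), not one of the paper's networks.

```python
import numpy as np

def toy_graph():
    """Two 4-node cliques (0-3 and 5-8) plus node 4 linked to every other node."""
    n = 9
    A = np.zeros((n, n))
    for group in (range(0, 4), range(5, 9)):
        for i in group:
            for j in group:
                if i != j:
                    A[i, j] = 1.0
    for j in range(n):
        if j != 4:
            A[4, j] = A[j, 4] = 1.0
    return A

A = toy_graph()
degrees = A.sum(axis=1)
L = np.diag(degrees) - A            # L_ii = k_i, L_ij = -A_ij for i != j

vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
u2 = vecs[:, 1]                     # eigenvector of the second-smallest eigenvalue

for i, x in enumerate(np.abs(u2)):
    print(i, round(float(x), 4))
```

Node 4 sits symmetrically between the two cliques, so its entry in u2 vanishes, while every clique member has the same non-zero magnitude with opposite signs on the two sides.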

SLIDE 11

Continued

Case 2: c > 2

A new n × c index matrix S is defined as S_ij = 1/√|Cj| if vertex i ∈ Cj, else 0
RatioCut = Tr(S^T L S)
L is a symmetric matrix which can be written as L = U D U^T, where U is the matrix of eigenvectors of L and D is the diagonal matrix of eigenvalues, D_ii = βi
RatioCut can then be written as RatioCut = \sum_{j=1}^{n} \beta_j \sum_{k=1}^{c} ( \sum_{i=1}^{n} U_{ij} S_{ik} )^2
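The trace form can be verified numerically. This is a minimal sketch in NumPy on a hypothetical graph of three triangles joined in a ring. It compares Tr(S^T L S) with a direct count of cut edges divided by community size, and with the expansion of the trace over the eigenvalues βj of L. The convention assumed here counts each community's cut edges once; other conventions differ by a constant factor.

```python
import numpy as np

# Hypothetical toy graph: three triangles joined in a ring by single edges.
edges = [(0, 1), (0, 2), (1, 2),
         (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8),
         (2, 3), (5, 6), (8, 0)]
communities = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
n, c = 9, len(communities)

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Index matrix: S_ij = 1/sqrt(|C_j|) if vertex i is in C_j, else 0.
S = np.zeros((n, c))
for j, members in enumerate(communities):
    S[members, j] = 1.0 / np.sqrt(len(members))

trace_value = float(np.trace(S.T @ L @ S))

# Direct count: edges leaving each community, divided by its size.
label = {v: j for j, members in enumerate(communities) for v in members}
direct = 0.0
for j, members in enumerate(communities):
    cut = sum(1 for a, b in edges if (label[a] == j) != (label[b] == j))
    direct += cut / len(members)

# Same quantity through L = U D U^T: sum_j beta_j * sum_k (U^T S)_jk^2.
beta, U = np.linalg.eigh(L)
expansion = float(np.sum(beta * np.sum((U.T @ S) ** 2, axis=1)))

print(trace_value, direct, expansion)
```

All three numbers agree, which is the identity the derivation on this slide relies on.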

SLIDE 12

Continued

Define the vertex vector of vertex i as ri, with [ri]j = Uij
The equation can then be rewritten in terms of the vertex vectors, given that the network has communities of almost equal size. [Gk: set of vertices in community k]
Minimizing the RatioCut equates to a maximization problem in which p is a parameter. For a clear community structure, p = c can be chosen.

SLIDE 13

Continued

If the community structure is quite clear, the magnitude of the vertex vector |ri| in the first p terms gives the identity of bridge nodes, denoted by b
If the index b of a given vertex is near zero, the presence of that node results in a large RatioCut, and hence it is a bridge node.

SLIDE 14

Continued

In order to scale the index to 1, a new term wk is defined as wk = bk / c
Considering an ER random network with n nodes as a null model, the index of each node would be 1/n
If the w-score of any node is smaller than 1/n, this vertex has nearly equal membership in more than one community and hence it is a bridge node.
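A rough sketch of the w-score follows, under stated assumptions. The exact definition of bk is not preserved in these slides. The code assumes bk is the squared magnitude of the vertex vector rk restricted to the eigenvectors of the p smallest Laplacian eigenvalues, with p = c. That choice makes the wk sum to 1, which matches the 1/n null model, but it is an assumption. The toy graph and the function names are hypothetical.

```python
import numpy as np

def toy_graph():
    """Two 4-node cliques (0-3 and 5-8) plus node 4 linked to every other node."""
    n = 9
    A = np.zeros((n, n))
    for group in (range(0, 4), range(5, 9)):
        for i in group:
            for j in group:
                if i != j:
                    A[i, j] = 1.0
    for j in range(n):
        if j != 4:
            A[4, j] = A[j, 4] = 1.0
    return A

def w_score(A, c, p=None):
    """ASSUMED form: b_k = sum of U_kj^2 over the p smallest eigenvalues of L, w_k = b_k / c."""
    p = c if p is None else p
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)             # eigenvalues ascending, eigenvectors in columns
    b = np.sum(U[:, :p] ** 2, axis=1)    # squared length of r_k in its first p terms
    return b / c

A = toy_graph()
n = A.shape[0]
w = w_score(A, c=2)
bridges = [k for k in range(n) if w[k] < 1.0 / n]
print(np.round(w, 4), bridges)
```

On this graph the shared node 4 falls below 1/n and is flagged as the bridge, in line with the c = 2 criterion based on u2.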

SLIDE 15

Pros of this approach

Lower computational cost: O(mn)

SLIDE 16

Experimental Results

Synthetic Network

The centrality metric I predicts nodes 1, 8 and 15 as important nodes
The w-score identifies node 15 as the bridge node
The ΔH index also gives a correct prediction, but requires significant computational cost
M can identify cores only

SLIDE 17

Experimental Results (contd.)

Real World Network
Zachary’s karate club (social network) with c = 2

The centrality metric I identifies the community cores: node 1 and node 34 (the instructor and the administrator)
The w-score identifies node 3 as the overlapping node, i.e. the bridge between these two communities

SLIDE 18

Zachary’s karate club visualization

The diameter of each vertex is proportional to I
A large diameter indicates an important vertex
The color of each vertex is related to its w-score
Red vertices behave like “overlapping” nodes or bridges
Yellow vertices lie inside their own communities

SLIDE 19

Word Association Network

Four communities: Intelligence, Astronomy, Light, Colors
The word Bright is related to all of them, and likewise Sun
Community critical nodes: Bright, Sun, Moon, Smart
Community cores: Moon and Smart
Bridges: Bright and Sun

SLIDE 20

Scientist Collaboration Network

The network represents scientists whose research centers on the properties of networks of one kind or another
Edges are placed between scientists who have published at least one paper together
The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi
Their w-scores are not large, as they collaborate with scientists outside their own communities

SLIDE 21

C. Elegans neural network

The network is divided into 3 communities (sensory, interneuron, motor neuron)
Each node represents a neuron and each edge represents a synaptic connection between neurons
High centrality metric I: important interneurons (AVA, AVB, … )
The w-score is very small because most of the important nodes act as bridges, since the connections between communities are more necessary

SLIDE 22

Applications in weighted networks
Artificial Network

The adjacency matrix for an undirected network is real and symmetric
Works well in a small artificial network
10 nodes with two communities
Higher weight means a closer relationship between vertices
Vertices 4 and 9 are the cores of the communities
Vertex 11 is the bridge between the communities

SLIDE 23

Applications in weighted networks (contd.)
Real Network: SFI (Santa Fe Institute) collaboration network

Vertices 2, 12 and 24 are group leaders (community cores)
Vertices 1, 9 and 11 are bridges
The result is different from that of the corresponding unweighted network
Edge weights might affect the results

SLIDE 24

Limitations

In the case of very heterogeneous cluster sizes, the community identification fails
This limitation is a result of a property of the adjacency matrix: when Nsmall^2 < Nlarge, small communities cannot be detected
δ = Nlarge / Nsmall
The index I cannot identify the important nodes in the small communities when the communities are of very different sizes

SLIDE 25

Conclusion/Observation

The proposed method works well in many cases without knowing the exact community structure
However, the number of communities must be known
This paper does not say anything about the effect of removing or adding a node
Changes in the underlying community structure are not taken into consideration
The directed case is not considered and is a subject for future research
The identification of such key nodes is important and could potentially be used to identify the organizer of a community in social networks, to develop an immunization strategy in an epidemic process, or to identify key nodes in biological networks