Identifying and Characterizing Nodes Important to Community Using the Spectrum of the graph



slide-1
SLIDE 1

Identifying and Characterizing Nodes Important to Community Using the Spectrum of the graph

slide-2
SLIDE 2

Citation

  • Published in volume 6 of the journal PLoS ONE, November 2011 edition
  • Authors: Yang Wang, Zengru Di, Ying Fan, all from the Department of Systems Science, Beijing Normal University, China

slide-3
SLIDE 3

Overview

  • Networks represent the interaction structure among components in a wide range of real complex systems
  • Exploring network communities:
      • reveals the network
      • provides a new aspect of dynamic processes
      • uncovers the relationship among the nodes
  • This paper devises a new approach to identify the important nodes without knowing the exact partition of the network
slide-4
SLIDE 4

Construction

  • Based on the implication that the spectrum of the adjacency matrix gives an indication of the community structure in a network
  • Distinguishes two kinds of critical nodes:
      • community core (via the eigenvalues)
      • bridge (via the graph Laplacian)
  • Experiments on synthetic and real networks
slide-5
SLIDE 5

Definitions

  • Eigenvector: a non-zero column vector v is an eigenvector of a matrix A iff there exists a number λ such that Av = λv.
  • Eigenvalue: the number λ is called the eigenvalue corresponding to that eigenvector v.
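
The definition Av = λv can be checked numerically. The sketch below is only an illustration: the 2 x 2 matrix is made up, and NumPy's `numpy.linalg.eigh` is used because the matrices in this deck (adjacency matrix, graph Laplacian) are symmetric.

```python
import numpy as np

# Small symmetric matrix used purely as an illustration (not from the paper).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for symmetric matrices: it returns the eigenvalues in
# ascending order and the matching unit-length eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each pair should satisfy A v = lambda v up to rounding error.
    residual = np.linalg.norm(A @ v - lam * v)
    print(f"lambda = {lam:.3f}, ||Av - lambda*v|| = {residual:.1e}")
```

Because the returned eigenvectors are orthonormal, the same call also provides the "orthogonal and normalized" eigenvectors assumed on the following slides.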

slide-6
SLIDE 6

Identifying important nodes

  • Proposed method: a centrality metric based on the spectrum of the adjacency matrix
  • Definitions: binary network G = (V, E)
  • |V| = n, |E| = m
  • Eigenvectors are orthogonal and normalized
  • Objective function: maximize eigenvalues (λ) using perturbation theory
  • P_k is the relative change in the c largest eigenvalues when node k is removed

slide-7
SLIDE 7

Centrality Metric

  • v_ik is the kth element of eigenvector v_i, and P_k lies in the interval [0, 1]. If node k is important to the community structure, P_k will be large
  • In a network with n nodes and c communities, the index is scaled to 1 by defining I_k = P_k / c
  • If the index I_k is larger than 1/n, node k is an important node
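
A minimal sketch of the centrality index follows. The formula for P_k did not survive in these slides, so the code assumes P_k = Σ_{i=1..c} v_ik², the sum of squared kth components of the eigenvectors belonging to the c largest eigenvalues of the adjacency matrix. This form is an assumption, chosen because it agrees with the properties stated above (P_k in [0, 1], I_k = P_k / c summing to 1, threshold 1/n). The function names and the two-star test graph are hypothetical.

```python
import numpy as np

def community_centrality(A, c):
    """I_k = P_k / c with the ASSUMED form P_k = sum_i v_ik**2 over the
    eigenvectors of the c largest eigenvalues of the adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    _, vecs = np.linalg.eigh(A)      # eigenvalues ascending, columns orthonormal
    top = vecs[:, -c:]               # eigenvectors of the c largest eigenvalues
    P = np.sum(top ** 2, axis=1)
    return P / c

def two_stars():
    """Hypothetical graph: hubs 0 and 4 with three leaves each, hubs linked."""
    edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (0, 4)]
    A = np.zeros((8, 8))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

A = two_stars()
I = community_centrality(A, c=2)

# Nodes whose index exceeds the 1/n threshold are flagged as important.
important = np.flatnonzero(I > 1.0 / len(A))
print("I =", np.round(I, 3))
print("important nodes:", important.tolist())
```

In this toy graph only the two hubs exceed 1/n, which matches the intuition that a community core is the node its community is organized around.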

slide-8
SLIDE 8

Distinguish two kinds of important nodes

  • RatioCut technique: |C_i| is the size of community C_i. The RatioCut problem reduces to the Mincut problem when the sizes of the communities are almost the same.
  • Case 1: c = 2, using an index vector s with n elements (one per node)

slide-9
SLIDE 9

Continued

  • The RatioCut function can be expressed as a quadratic form in s involving the graph Laplacian L, defined as L_ij = -A_ij for i ≠ j and L_ii = k_i, where k_i is the degree of node i. There are also two constraints on s.
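
The link between RatioCut and the Laplacian can be checked on a small example. The exact index vector and constraints on the slide were lost, so this sketch assumes the standard spectral-clustering choice: s_i = +sqrt(|C2|/|C1|) for nodes in C1 and s_i = -sqrt(|C1|/|C2|) for nodes in C2, with RatioCut taken as cut(C1, C2) * (1/|C1| + 1/|C2|). Under those assumptions, sᵀLs = n * RatioCut, Σ s_i = 0 and sᵀs = n. The graph and the function names are hypothetical.

```python
import numpy as np

def laplacian(A):
    """L_ij = -A_ij for i != j and L_ii = k_i (the degree of node i)."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def ratio_cut_two(A, in_c1):
    """cut(C1, C2) * (1/|C1| + 1/|C2|) for a boolean membership mask."""
    A = np.asarray(A, dtype=float)
    in_c1 = np.asarray(in_c1, dtype=bool)
    cut = A[np.ix_(in_c1, ~in_c1)].sum()      # edges crossing the partition
    return cut * (1.0 / in_c1.sum() + 1.0 / (~in_c1).sum())

def index_vector(in_c1):
    """Assumed index vector: one constant per community, with opposite signs."""
    in_c1 = np.asarray(in_c1, dtype=bool)
    n1, n2 = in_c1.sum(), (~in_c1).sum()
    return np.where(in_c1, np.sqrt(n2 / n1), -np.sqrt(n1 / n2))

# Hypothetical graph: triangle {0, 1, 2} joined to the pair {3, 4} by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

in_c1 = np.array([True, True, True, False, False])
n = len(A)
L = laplacian(A)
s = index_vector(in_c1)
rc = ratio_cut_two(A, in_c1)

print("s^T L s       =", s @ L @ s)
print("n * RatioCut  =", n * rc)
print("sum(s), s^T s =", s.sum(), s @ s)
```

Relaxing s from this two-valued vector to any real vector that satisfies the same two constraints is what turns the partition problem into an eigenvalue problem.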

slide-10
SLIDE 10

Continued

  • The partition problem can be formulated as a minimization problem
  • The solution to this problem is found to be the eigenvector corresponding to the second-smallest eigenvalue of L, denoted by u2
  • Community core nodes: |(u2)_i| is relatively large
  • Bridge nodes: |(u2)_i| is near zero
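
The c = 2 rule can be sketched as follows: take the eigenvector u2 of the second-smallest eigenvalue of L and rank nodes by the magnitude of their component. The graph is hypothetical (two triangles joined through a single middle node) and is assumed to be connected, so that only one eigenvalue of L is zero.

```python
import numpy as np

def fiedler_vector(A):
    """Eigenvector of the second-smallest eigenvalue of L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1]

# Hypothetical graph: triangles {0, 1, 2} and {4, 5, 6}, with node 3
# attached to node 2 on one side and node 4 on the other.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

u2 = fiedler_vector(A)

# Nodes sorted from most bridge-like (|u2_i| near zero) to most core-like.
order = np.argsort(np.abs(u2))
print("|u2| =", np.round(np.abs(u2), 3))
print("most bridge-like node:", order[0])
```

Here the middle node sits symmetrically between the two triangles, so its component of u2 vanishes, while the nodes farthest from the other community have the largest magnitudes.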

slide-11
SLIDE 11

Continued

  • Case 2: c > 2
  • A new n x c index matrix S is defined as s_ij = 1/√|C_j| if vertex i ∈ C_j, else 0
  • RatioCut = Tr(S^T L S)
  • L is a symmetric matrix which can be written as L = U D U^T, where the columns of U are the eigenvectors of L and D is the diagonal matrix of eigenvalues, D_ii = β_i
  • RatioCut can therefore be written as Tr(S^T U D U^T S) = Σ_j β_j Σ_k [(U^T S)_jk]^2

slide-12
SLIDE 12

Continued

  • Define the vertex vector of node i as r_i, with [r_i]_j = U_ij. Given that the network has almost equal-sized communities, the RatioCut can be rewritten in terms of these vertex vectors. [G_k: set of vertices in community k]
  • Minimizing the RatioCut is then equivalent to a maximization problem in which p is a parameter. For a clear community structure, p = c can be chosen.
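
The trace form used for c > 2 can be verified directly. The sketch below builds the index matrix S from a known partition and compares Tr(SᵀLS) with a RatioCut computed by counting edges, here taken as Σ_j cut(C_j, rest) / |C_j|. It also evaluates the same quantity through the eigendecomposition L = U D Uᵀ. The ring-of-triangles graph, the labels and the function names are hypothetical.

```python
import numpy as np

def laplacian(A):
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def indicator_matrix(labels, c):
    """S[i, j] = 1/sqrt(|C_j|) if vertex i belongs to community j, else 0."""
    labels = np.asarray(labels)
    S = np.zeros((len(labels), c))
    for j in range(c):
        members = labels == j
        S[members, j] = 1.0 / np.sqrt(members.sum())
    return S

def ratio_cut_direct(A, labels, c):
    """Sum over communities of (edges leaving C_j) / |C_j|."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in range(c):
        members = labels == j
        total += A[np.ix_(members, ~members)].sum() / members.sum()
    return total

# Hypothetical graph: three triangles connected in a ring by single edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 3), (5, 6), (8, 0)]
A = np.zeros((9, 9))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

L = laplacian(A)
S = indicator_matrix(labels, 3)
trace_form = np.trace(S.T @ L @ S)

# Same quantity via L = U D U^T: sum_j beta_j * sum_k (U^T S)_jk ** 2.
beta, U = np.linalg.eigh(L)
spectral_form = np.sum(beta * np.sum((U.T @ S) ** 2, axis=1))

direct = ratio_cut_direct(A, labels, 3)
print(trace_form, spectral_form, direct)
```

The spectral form makes the role of the small eigenvalues explicit: a partition whose columns of S lie mostly in the span of the eigenvectors with the smallest β_j gives a small RatioCut.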

slide-13
SLIDE 13

Continued

  • If the community structure is quite clear, the magnitude |r_i| of the vertex vector over its first p terms identifies the bridge nodes. This index is denoted by b.
  • If the index b of a given vertex is near zero, the presence of that node results in a large RatioCut, and hence it is a bridge node.

slide-14
SLIDE 14

Continued

  • In order to scale the index to 1, a new term w_k is defined as w_k = b_k / c
  • Considering an ER random network with n nodes as a null model, the index of each node would be 1/n
  • If the w-score of any node is smaller than 1/n, this vertex has nearly equal membership in more than one community and hence it is a bridge node.
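
A hedged sketch of the w-score follows. The formula for b did not survive in these slides, so the code assumes b_k = Σ_{j=1..p} U_kj² with p = c, where the columns of U are the eigenvectors of the c smallest eigenvalues of L. This form is an assumption, chosen because it makes w_k = b_k / c sum to 1, in line with the 1/n null-model threshold above. The graph (three triangles attached to one central node) and the function name are hypothetical.

```python
import numpy as np

def w_score(A, c):
    """w_k = b_k / c with the ASSUMED form b_k = sum_{j<c} U_kj**2, where
    U holds the eigenvectors of the c smallest eigenvalues of L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)             # eigenvalues in ascending order
    b = np.sum(U[:, :c] ** 2, axis=1)    # squared magnitude over first c terms
    return b / c

# Hypothetical graph: triangles {0,1,2}, {3,4,5}, {6,7,8}; node 9 is linked
# to one node of each triangle (2, 5 and 8).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 9), (5, 9), (8, 9)]
A = np.zeros((10, 10))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

w = w_score(A, c=3)
n = len(A)
bridges = np.flatnonzero(w < 1.0 / n)
print("w =", np.round(w, 3))
print("nodes below 1/n:", bridges.tolist())
print("lowest w-score:", int(np.argmin(w)))
```

In this toy graph the central node has the lowest score, and its attachment points also fall below 1/n, so the threshold is better read as a ranking aid than as a hard classifier.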
slide-15
SLIDE 15

Pros of this approach

  • Lower computational cost: O(mn)
slide-16
SLIDE 16

Experimental Results

Synthetic network

  • The centrality metric I predicts nodes 1, 8 and 15 as important nodes
  • The w-score identifies node 15 as the bridge node
  • The ΔH index also gives a correct prediction, but requires significant computational cost
  • M can identify cores only

slide-17
SLIDE 17

Experimental Results (contd.)

Real-world network: Zachary’s karate club (social network) with c = 2

  • The centrality metric I identifies the community cores: node 1 and node 34 (instructor and administrator)
  • The w-score identifies node 3 as the overlapping node, i.e. the bridge between these two communities

slide-18
SLIDE 18

Zachary’s karate club visualization

  • The diameter of each vertex is proportional to I
  • A large diameter indicates an important vertex
  • The color of each vertex is related to its w-score
  • Red vertices behave like “overlapping” nodes or bridges
  • Yellow vertices lie inside their own communities

slide-19
SLIDE 19

Word Association Network

  • Four communities: Intelligence, Astronomy, Light, Colors
  • The word Bright is related to all of them; likewise Sun
  • Community critical nodes: Bright, Sun, Moon, Smart
  • Community cores: Moon and Smart
  • Bridges: Bright and Sun

slide-20
SLIDE 20

Scientist Collaboration Network

  • The network represents scientists whose research centers on the properties of networks of one kind or another
  • Edges are placed between scientists who have published at least one paper together
  • The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi
  • Their w-scores are not large, since they collaborate with scientists outside their own communities

slide-21
SLIDE 21

C. elegans neural network

  • The network is divided into 3 communities (sensory, interneuron, motor neuron)
  • Each node represents a neuron and each edge represents a synaptic connection between neurons
  • High centrality metric I: important interneurons (AVA, AVB, … )
  • The w-score is very small because most of the important nodes act as bridges, since the connection between communities is more necessary

slide-22
SLIDE 22

Applications in weighted networks: artificial network

  • The adjacency matrix of an undirected network is real and symmetric
  • Works well in a small artificial network
  • 10 nodes with two communities
  • A higher weight means a closer relationship between vertices
  • 4 and 9 are the cores of the communities
  • 11 is the bridge between the communities

slide-23
SLIDE 23

Applications in weighted networks (contd.): real network, SFI (Santa Fe Collaboration)

  • SFI collaboration network
  • Vertices 2, 12 and 24 are group leaders (community cores)
  • Vertices 1, 9 and 11 are bridges
  • The result is different from that of the corresponding unweighted network
  • Edge weights might affect the results

slide-24
SLIDE 24

Limitations

  • When cluster sizes are highly heterogeneous, community identification fails
  • This limitation is a result of a property of the adjacency matrix: when N_small^2 < N_large, small communities cannot be detected
  • δ = N_large / N_small
  • I cannot identify the important nodes in the small communities when the communities are of very different sizes

slide-25
SLIDE 25

Conclusion/Observation

  • The proposed method works well in many cases without knowing the exact community structure
  • However, the number of communities must be known
  • This paper does not say anything about the effect of removing/adding any node
  • The underlying change in community structure is not taken into consideration
  • The directed case is not considered and is left to future research
  • Identifying such key nodes is important and could potentially be used to identify the organizer of a community in social networks, to develop an immunization strategy in an epidemic process, or to identify key nodes in biological networks