Discrete Mathematics and Its Applications, Lecture 7: Graphs: Proximity




slide-1
SLIDE 1

Discrete Mathematics and Its Applications

Lecture 7: Graphs: Proximity

MING GAO

DaSE@ECNU (for course related communications) mgao@dase.ecnu.edu.cn

Jan. 6, 2019
slide-2
SLIDE 2

Outline

1. Community Structures
2. Node Proximity
  • Simple Approaches
  • Graph-theoretic Approaches
  • SimRank
  • Random Walk based Approaches

slide-7
SLIDE 7

Community Structures

Community structures

Definition: Community structure indicates that the network divides naturally into groups of nodes with dense connections internally and sparser connections between groups.

  • Global community structures: clustering-based approach, spectral clustering, modularity-based approach, etc.
  • Local community structures: node-centric community, group-centric community
      • Traditional network: clique, quasi-clique, k-clique, k-core, etc.
      • Bipartite network: bi-clique, quasi-bi-clique, etc.
      • Signed network: antagonistic community, quasi-antagonistic community, etc.

slide-14
SLIDE 14

Community Structures

Global community structures

Goal: Partition the nodes of a network into disjoint sets.

  • Clustering based on vertex similarity
  • Latent space models
  • Spectral clustering
  • Modularity maximization

In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally.

slide-19
SLIDE 19

Community Structures

Clustering based on vertex similarity

K-means

  • $v_1, \cdots, v_n$ are the vertices of a graph;
  • Each vertex $v_i$ is assigned to one and only one cluster;
  • $C(i)$ denotes the cluster number of vertex $v_i$;
  • Similarity or dissimilarity measure: Euclidean distance or Jaccard coefficient;
  • K-means minimizes the within-cluster point scatter:

$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} \|x_i - x_j\|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2,$$

where $x_i$ is the feature vector of vertex $v_i$, $N_k$ is the number of vertices in the $k$-th cluster, and $m_k$ is the mean vector of the $k$-th cluster.
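The equality between the pairwise form and the centroid form of the within-cluster scatter can be checked numerically. The sketch below is only an illustration: the points, the cluster labels and the helper names `scatter_pairwise` and `scatter_centroid` are made up for this example and are not part of the lecture.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def scatter_pairwise(points, labels):
    """W(C) as half the sum of squared distances over same-cluster pairs."""
    total = 0.0
    for k in set(labels):
        cluster = [p for p, c in zip(points, labels) if c == k]
        total += 0.5 * sum(sq_dist(p, q) for p in cluster for q in cluster)
    return total

def scatter_centroid(points, labels):
    """W(C) as N_k times the squared distances to the cluster mean m_k."""
    total = 0.0
    for k in set(labels):
        cluster = [p for p, c in zip(points, labels) if c == k]
        n_k = len(cluster)
        m_k = [sum(coord) / n_k for coord in zip(*cluster)]
        total += n_k * sum(sq_dist(p, m_k) for p in cluster)
    return total

# Hypothetical data: five 2-D points assigned to two clusters.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1]
w_pair = scatter_pairwise(points, labels)
w_cent = scatter_centroid(points, labels)
```

Both functions should return the same value for any assignment, which is why K-means can work with cluster means instead of all pairwise distances.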


slide-20
SLIDE 20

Community Structures

K-means example

slide-23
SLIDE 23

Community Structures

Spectral clustering

Let L be the normalized Laplacian of graph G. The algorithm partitions the nodes into two sets (B1, B2) based on the eigenvector v corresponding to the second-smallest eigenvalue of L. Partitioning may be done in various ways:

  • Assign every node whose component in v satisfies a given condition to B1, and every other node to B2. For example, the condition may be that the component is larger than the median, or the split may follow the sign of each entry of v.

The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
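A minimal sketch of one such bisection step, assuming the symmetric normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ and the sign rule for the split. The function name `spectral_bisection` and the example graph (two triangles joined by one edge) are hypothetical, and the code assumes a graph without isolated nodes.

```python
import numpy as np

def spectral_bisection(adj):
    """Split nodes by the sign of the eigenvector belonging to the
    second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)            # assumes every degree is > 0
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)     # eigenvalues in ascending order
    v = eigvecs[:, 1]                          # second-smallest eigenvalue
    b1 = {i for i in range(n) if v[i] >= 0}
    b2 = set(range(n)) - b1
    return b1, b2

# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
b1, b2 = spectral_bisection(adj)
```

The sign of an eigenvector is arbitrary, so which triangle ends up in B1 may differ between runs or library versions; only the split itself is meaningful.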

slide-27
SLIDE 27

Community Structures

Modularity

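Idea: A graph has community structure if it differs from a random graph, which is not expected to have community structure.

Modularity [Newman 2006]:

$$M = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{d(i)\,d(j)}{2m} \right) \delta(C_i, C_j),$$

where $m$ denotes the number of edges, $A$ is the adjacency matrix, $d(i)$ is the degree of node $i$, $C_i$ denotes the community of node $i$, and $\delta(C_i, C_j) = 1$ if nodes $i$ and $j$ belong to the same community and 0 otherwise.

  • It compares the number of edges within a community with the expected number of such edges in a corresponding random graph.
  • It can be used as a measure of the quality of the communities.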
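The modularity sum can be evaluated directly for a small graph. The following sketch assumes an undirected, unweighted graph without self-loops, with each edge listed once; the function name `modularity` and the example graph are illustrative choices, not taken from the lecture.

```python
def modularity(edges, community):
    """Modularity M for a node -> community-label assignment."""
    m = len(edges)
    degree = {node: 0 for node in community}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    total = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:       # delta(C_i, C_j) = 1
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "a" for node in range(6)}
m_split = modularity(edges, split)     # one community per triangle
m_single = modularity(edges, single)   # everything in one community
```

For this graph the two-triangle partition gives 5/14, while putting every node into one community gives 0, which matches the intuition that a single community is no better than the random baseline.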


slide-28
SLIDE 28

Community Structures

Louvain method

slide-32
SLIDE 32

Community Structures

Clique

Definition: A clique is a subset of vertices of an undirected graph such that its induced subgraph is complete. A maximal clique is a clique that cannot be extended by including one more adjacent vertex.

Cliques are normally used as a core or a seed to find larger communities:

  • Find all cliques of size k in the given network (the clique decision problem is NP-complete);
  • Construct a clique graph, in which two cliques are adjacent if they share k − 1 nodes;
  • Each connected component in the clique graph forms a community.
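The three steps can be sketched in a few lines for small graphs. This is a brute-force illustration under the assumption that node labels are sortable; the name `clique_communities` and the example graph are hypothetical, and the enumeration is exponential in general.

```python
from itertools import combinations

def clique_communities(edges, k):
    """Communities as connected components of the k-clique graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Step 1: enumerate all k-cliques by checking every k-subset of nodes.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # Step 2: two cliques are adjacent if they share k - 1 nodes.
    neighbors = {c: set() for c in cliques}
    for c1, c2 in combinations(cliques, 2):
        if len(c1 & c2) == k - 1:
            neighbors[c1].add(c2)
            neighbors[c2].add(c1)
    # Step 3: each connected component of the clique graph is a community.
    communities, seen = [], set()
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            clique = stack.pop()
            members |= clique
            for nxt in neighbors[clique]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        communities.append(members)
    return communities

# Hypothetical example: triangles {0,1,2} and {1,2,3} share an edge,
# triangle {4,5,6} hangs off node 3 through the bridge (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6)]
found = clique_communities(edges, 3)
```

With k = 3 the first two triangles merge into one community because they share two nodes, while the third triangle stays separate.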

slide-36
SLIDE 36

Community Structures

Extensions of clique

k-clique: A maximal subgroup in which the largest geodesic distance between any pair of nodes is not greater than k. It is a clique if k = 1.

Quasi-clique: Generalizes the clique to a dense subgraph, with different definitions (degree, density).

  • Node degree: every node in the induced subgraph is adjacent to at least γ(n − 1) other nodes.
  • Edge density: the number of edges in the subgraph is at least γn(n − 1)/2.

Here n denotes the number of nodes in the subgraph and γ ∈ (0, 1] is a threshold; γ = 1 recovers the clique.
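The two quasi-clique conditions are not equivalent, which a small check makes visible. The sketch below assumes each undirected edge is listed once; the function names and the example graph (a triangle with one pendant node) are made up for illustration.

```python
def is_degree_quasi_clique(nodes, edges, gamma):
    """Every node has at least gamma * (n - 1) neighbors inside the subgraph."""
    nodes = set(nodes)
    n = len(nodes)
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:      # keep only induced-subgraph edges
            degree[u] += 1
            degree[v] += 1
    return all(d >= gamma * (n - 1) for d in degree.values())

def is_density_quasi_clique(nodes, edges, gamma):
    """The subgraph has at least gamma * n * (n - 1) / 2 edges."""
    nodes = set(nodes)
    n = len(nodes)
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside >= gamma * n * (n - 1) / 2

# Hypothetical example: triangle {0, 1, 2} plus pendant node 3 attached to 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
by_density = is_density_quasi_clique({0, 1, 2, 3}, edges, 0.5)
by_degree = is_degree_quasi_clique({0, 1, 2, 3}, edges, 0.5)
```

With γ = 0.5 the four nodes pass the edge-density test (4 edges against a bound of 3) but fail the node-degree test, because the pendant node has only one neighbor against a bound of 1.5.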

slide-40
SLIDE 40

Community Structures

Local community structure in bipartite graph

Biclique: A biclique is a special kind of bipartite graph in which every vertex of the first set is connected to every vertex of the second set. A complete bipartite graph with partitions of size |V1| = m and |V2| = n is denoted $K_{m,n}$.

Quasi-bi-clique: The definition of a biclique is too strict. A quasi-bi-clique is a dense subgraph that relaxes the constraint on the vertices.

  • Relative version of quasi-bi-clique.
  • Absolute version of quasi-bi-clique.

slide-42
SLIDE 42

Node Proximity

Node proximity

Node proximity (= similarity or closeness, but ≠ distance) measures:

  • Information exchange
  • Latency/speed of information exchange
  • Likelihood of a future link
  • Propagation of a product/idea/service/disease
  • Relevance: ranking

Approaches

  • Simple approaches
  • Graph-theoretic approaches
  • SimRank
  • Random walk with restarts

slide-45
SLIDE 45

Node Proximity Simple Approaches

Simple approaches

Similarity metrics

Given a graph G = (V, E), $N(v_i)$ denotes the neighbors of node $v_i$.

  • Common neighbors: $|N(v_i) \cap N(v_j)|$.
  • Jaccard coefficient: $\dfrac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$.
  • Adamic/Adar: $\sum_{v \in N(v_i) \cap N(v_j)} \dfrac{1}{\log |N(v)|}$.
  • Preferential attachment: $|N(v_i)| \times |N(v_j)|$.

Drawbacks

The Jaccard coefficient treats a graph as a set of independent transactions. Thus, it loses the topological information of the graph.
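All four neighborhood-based scores need only the neighbor sets, as the following sketch shows. The helper names and the example edge list are hypothetical, and the code assumes an undirected graph without self-loops.

```python
import math

def neighbors(edges):
    """Build the neighbor sets N(v) from an undirected edge list."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def common_neighbors(nbrs, i, j):
    return len(nbrs[i] & nbrs[j])

def jaccard(nbrs, i, j):
    union = nbrs[i] | nbrs[j]
    return len(nbrs[i] & nbrs[j]) / len(union) if union else 0.0

def adamic_adar(nbrs, i, j):
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(nbrs[v])) for v in nbrs[i] & nbrs[j])

def preferential_attachment(nbrs, i, j):
    return len(nbrs[i]) * len(nbrs[j])

# Hypothetical example: nodes 0 and 1 share neighbors 2 and 3; node 4 hangs off 3.
nbrs = neighbors([(0, 2), (1, 2), (0, 3), (1, 3), (3, 4)])
cn = common_neighbors(nbrs, 0, 1)
jc = jaccard(nbrs, 0, 1)
aa = adamic_adar(nbrs, 0, 1)
pa = preferential_attachment(nbrs, 0, 1)
```

Adamic/Adar weights the low-degree common neighbor 2 more heavily than the higher-degree neighbor 3, which is the only place where these scores look beyond the two neighbor sets themselves.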

slide-48
SLIDE 48

Node Proximity Graph-theoretic Approaches

Graph-theoretic approaches

Idea: (s1, t1) is more similar than (s4, t4).

Simple metrics

  • Number of hops
  • Sum of weights of hops

Drawbacks

  • (s, t) in (b) is more similar than in (a) and (c), because the two nodes are linked via more paths.
  • In (c), s and t are probably unrelated, since they are linked only via a high-degree node.

slide-50
SLIDE 50

Node Proximity Graph-theoretic Approaches

Graph-theoretic approaches cont.

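Max-flow approach

Heavily weighted links and the number of paths matter, but path length doesn't. The max flow of (s1, t1) is the same as that of (s4, t4).

Katz

$$K_{v_i,v_j} = \sum_{l \ge 1} \beta^{l} \left| \mathrm{paths}^{\langle l \rangle}_{v_i,v_j} \right|,$$

where $\beta \in (0, 1)$ is a pre-defined parameter and $\mathrm{paths}^{\langle l \rangle}_{v_i,v_j}$ is the set of paths of length exactly $l$ from $v_i$ to $v_j$. In particular, $|\mathrm{paths}^{\langle 1 \rangle}_{v_i,v_j}| = 1$ if and only if $v_i$ and $v_j$ are connected by a link.

In matrix form, with $A$ the adjacency matrix:

$$K = \beta A + \beta^2 A^2 + \cdots = (I - \beta A)^{-1} - I.$$

The series converges, and the closed form holds, when $\beta$ is smaller than the reciprocal of the largest eigenvalue of $A$.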
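The closed form can be compared with a truncated version of the series. This is a sketch under the assumption that β is below the reciprocal of the largest eigenvalue of A; the function names and the 3-node path graph are illustrative only. Note that the entries of $A^l$ count walks of length $l$, which is how the matrix form interprets "paths".

```python
import numpy as np

def katz_closed_form(adj, beta):
    """K = (I - beta * A)^{-1} - I."""
    adj = np.asarray(adj, dtype=float)
    identity = np.eye(len(adj))
    return np.linalg.inv(identity - beta * adj) - identity

def katz_truncated(adj, beta, max_len):
    """Sum of beta^l * A^l for l = 1 .. max_len."""
    adj = np.asarray(adj, dtype=float)
    total = np.zeros_like(adj)
    term = np.eye(len(adj))
    for _ in range(max_len):
        term = beta * term @ adj       # now equals beta^l * A^l
        total += term
    return total

# Hypothetical example: path graph 0 - 1 - 2 (largest eigenvalue sqrt(2)).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
k_closed = katz_closed_form(adj, beta=0.1)
k_series = katz_truncated(adj, beta=0.1, max_len=60)
```

On this graph the directly linked pair (0, 1) scores higher than the two-hop pair (0, 2), since longer walks are damped by higher powers of β.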

slide-52
SLIDE 52

Node Proximity SimRank

SimRank [G. Jeh and J. Widom KDD 2002]

Idea: Two objects are similar if they are connected to similar objects.

Iterative computation with initial value

$$s^{(0)}(a, b) = \begin{cases} 1, & a = b; \\ 0, & a \neq b, \end{cases}$$

and update

$$s^{(k+1)}(a, b) = \begin{cases} 1, & a = b; \\ \dfrac{\alpha}{|N(a)|\,|N(b)|} \sum_{c \in N(a)} \sum_{d \in N(b)} s^{(k)}(c, d), & a \neq b, \end{cases}$$

where $\alpha \in [0, 1]$ is a constant.

However, it needs $O(n^3)$ runtime. Many works improve the runtime:

  • Simrank++ [VLDB 2008]
  • $S = C(A^T S A) + I$ with Kronecker product [EDBT 2010]
  • Parallel SimRank [VLDB 2015]
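The iteration can be written down almost literally for small graphs. The sketch below is the naive update, not one of the accelerated variants listed above; the function name `simrank`, the default values, the convention of scoring 0 for nodes without neighbors, and the star-shaped example are assumptions made for illustration.

```python
def simrank(nbrs, alpha=0.8, iterations=10):
    """Naive SimRank iteration on neighbor sets nbrs: node -> set of nodes."""
    nodes = list(nbrs)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not nbrs[a] or not nbrs[b]:
                    new[a][b] = 0.0    # assumed convention for isolated nodes
                else:
                    total = sum(sim[c][d] for c in nbrs[a] for d in nbrs[b])
                    new[a][b] = alpha * total / (len(nbrs[a]) * len(nbrs[b]))
        sim = new
    return sim

# Hypothetical example: a star with center 0 and leaves 1 and 2.
scores = simrank({0: {1, 2}, 1: {0}, 2: {0}}, alpha=0.8)
```

The two leaves share their only neighbor, so their score equals α, whereas the center and a leaf never meet at similar neighbors and keep a score of 0.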

slide-54
SLIDE 54

Node Proximity Random Walk based Approaches

Random walk

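Hitting time

The hitting time $H_{v_i,v_j}$ is the expected number of steps required for a random walk starting at $v_i$ to reach $v_j$. As with Katz, all paths need to be enumerated when the hitting time is computed, but longer paths have smaller probabilities:

$$H = W + 2W^2 + \cdots = \left((I - W)^{-1} - I\right)(I - W)^{-1},$$

where the closed form requires the series to converge, i.e. the spectral radius of $W$ must be below 1.

Commute time

The commute time $H_{v_i,v_j} + H_{v_j,v_i}$ is the expected number of steps required for a random walk starting at $v_i$ to reach $v_j$ and then return to $v_i$. The commute time is symmetric in $v_i$ and $v_j$ by construction, whereas the hitting time itself is in general not symmetric, even on an undirected graph. Similarly, we can compute the commute time for every pair of nodes.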
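Hitting and commute times can be computed for a small graph by solving one linear system per target node. Note that this sketch swaps the series above for the standard first-step equations $h = 1 + W_{\text{sub}} h$ on the non-target nodes; the function name `hitting_times` and the 3-node path graph are hypothetical, and the graph is assumed to be connected.

```python
import numpy as np

def hitting_times(adj):
    """hit[i, j] = expected steps for a walk from i to first reach j."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    walk = adj / adj.sum(axis=1, keepdims=True)    # row-stochastic D^{-1} A
    hit = np.zeros((n, n))
    for target in range(n):
        others = [i for i in range(n) if i != target]
        # First-step analysis: h = 1 + W_sub h on the non-target nodes.
        w_sub = walk[np.ix_(others, others)]
        h = np.linalg.solve(np.eye(n - 1) - w_sub, np.ones(n - 1))
        hit[others, target] = h
    return hit

# Hypothetical example: path graph 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
hit = hitting_times(adj)
commute = hit + hit.T
```

On this path graph the walk from the end node 0 reaches the middle node 1 in exactly one step, while the walk from 1 to 0 needs three steps on average, so only the commute time (four steps) treats the pair symmetrically.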


slide-55
SLIDE 55

Node Proximity Random Walk based Approaches

Personalized random walk with restarts

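Idea

$$r_i = c\,W r_i + (1 - c)\,e_i,$$

where $r_i \in \mathbb{R}^{n \times 1}$ is the ranking score vector w.r.t. $v_i$, $W$ is the transition probability matrix, and $e_i \in \mathbb{R}^{n \times 1}$ is the start vector w.r.t. $v_i$, whose $i$-th entry is 1 and whose other entries are 0. With $W = D^{-1}A$:

$$(I - cD^{-1}A)\,r_i = (1 - c)\,e_i, \qquad r_i = (1 - c)(I - cD^{-1}A)^{-1} e_i.$$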
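The closed-form solution and the fixed-point iteration of the same equation should agree. The sketch below assumes $W = D^{-1}A$ as in the formula, a graph without isolated nodes and $c \in (0, 1)$; the function names, the value c = 0.85 and the path graph are illustrative assumptions.

```python
import numpy as np

def rwr_closed_form(adj, i, c=0.85):
    """r_i = (1 - c) (I - c D^{-1} A)^{-1} e_i, via a linear solve."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    walk = adj / adj.sum(axis=1, keepdims=True)    # W = D^{-1} A
    e_i = np.zeros(n)
    e_i[i] = 1.0
    return (1 - c) * np.linalg.solve(np.eye(n) - c * walk, e_i)

def rwr_iterative(adj, i, c=0.85, iterations=200):
    """Repeat r <- c W r + (1 - c) e_i until it settles."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    walk = adj / adj.sum(axis=1, keepdims=True)
    e_i = np.zeros(n)
    e_i[i] = 1.0
    r = e_i.copy()
    for _ in range(iterations):
        r = c * walk @ r + (1 - c) * e_i
    return r

# Hypothetical example: path graph 0 - 1 - 2, scores with respect to node 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
r_closed = rwr_closed_form(adj, i=0)
r_iter = rwr_iterative(adj, i=0)
```

Solving the linear system avoids forming the inverse explicitly, and the iterative form is usually what scales to large sparse graphs. In this example the scores decrease with distance from the start node.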


slide-56
SLIDE 56

Take-home msg.

Take-home messages

Community detection

  • Global structure
  • Local structure

Node proximity

  • Simple approaches
  • Graph-theoretic approaches
  • SimRank
  • Random walk based approaches
