
slide-1
SLIDE 1

CS224W: Analysis of Networks Jure Leskovec, Stanford University

http://cs224w.stanford.edu

slide-2
SLIDE 2

¡ Three basic stages:

§ 1) Pre-processing

§ Construct a matrix representation of the graph

§ 2) Decomposition

§ Compute eigenvalues and eigenvectors of the matrix

§ Map each point to a lower-dimensional representation based on one or more eigenvectors

§ 3) Grouping

§ Assign points to two or more clusters, based on the new representation

¡ But first, let’s define the problem

11/15/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2

slide-3
SLIDE 3

¡ Undirected graph $G(V, E)$

¡ Bi-partitioning task:

§ Divide vertices into two disjoint groups $A$, $B$

¡ Questions:

§ How can we define a “good” partition of $G$?

§ How can we efficiently identify such a partition?

[Figure: example graph on nodes 1-6, partitioned into groups A and B]

slide-4
SLIDE 4

¡ What makes a good partition?

§ Maximize the number of within-group connections

§ Minimize the number of between-group connections

[Figure: example graph on nodes 1-6, partitioned into groups A and B]

slide-5
SLIDE 5

¡ Express partitioning objectives as a function of the “edge cut” of the partition

¡ Cut: the set of edges with only one vertex in a group:

$$\mathrm{cut}(A,B) = \sum_{i \in A,\, j \in B} w_{ij}$$

If the graph is weighted, $w_{ij}$ is the weight of edge $(i,j)$; otherwise, all $w_{ij} = 1$.

[Figure: example graph on nodes 1-6 split into A and B, with cut(A,B) = 2]
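A minimal Python sketch of the cut. It assumes the 6-node example graph with the edge list below, which we reconstructed from the Laplacian matrix on slide 16, and assumes the figure's partition is A = {1, 2, 3}, B = {4, 5, 6}; `EDGES` and `cut` are hypothetical names:

```python
# Example graph (nodes 1-6); edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def cut(edges, group_a, weights=None):
    """Total weight of edges with exactly one endpoint in group_a.

    weights: optional dict {edge tuple: w_ij}; if omitted, every w_ij = 1.
    """
    a = set(group_a)
    total = 0
    for edge in edges:
        i, j = edge
        if (i in a) != (j in a):  # the edge crosses from A to B
            total += 1 if weights is None else weights[edge]
    return total


print(cut(EDGES, {1, 2, 3}))  # 2: edges (1, 5) and (3, 4) cross the partition
print(cut(EDGES, {2}))        # also 2: isolating node 2 cuts (1, 2) and (2, 3)
```

The second call previews the weakness of minimum cut discussed next: a one-node group scores as well as the balanced split.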

slide-6
SLIDE 6

¡ Criterion: Minimum-cut

§ Minimize weight of connections between groups: $\arg\min_{A,B} \mathrm{cut}(A,B)$

¡ Degenerate case:

[Figure: “optimal” cut vs. minimum cut]

¡ Problem:

§ Only considers external cluster connections

§ Does not consider internal cluster connectivity

slide-7
SLIDE 7

¡ Criterion: Conductance [Shi-Malik, ’97]

§ Connectivity between groups relative to the density of each group:

$$\phi(A,B) = \frac{\mathrm{cut}(A,B)}{\min(\mathrm{vol}(A), \mathrm{vol}(B))}$$

$\mathrm{vol}(A)$: total weighted degree of the nodes in $A$: $\mathrm{vol}(A) = \sum_{i \in A} k_i$ (the number of edge end points in $A$)

§ Why use this criterion?

§ Produces more balanced partitions

¡ How do we efficiently find a good partition?

§ Problem: Computing the optimal cut is NP-hard
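A sketch of conductance on the same example graph (edge list reconstructed by us from the slide 16 Laplacian; unweighted, so $k_i$ is the plain degree). It separates the two partitions that minimum cut scores identically:

```python
# Example graph (nodes 1-6); edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def conductance(edges, group_a):
    """phi(A, B) = cut(A, B) / min(vol(A), vol(B)), where B is the complement of A."""
    a = set(group_a)
    cut = sum(1 for i, j in edges if (i in a) != (j in a))
    # vol(A) counts edge end points in A; the two volumes add up to 2|E|.
    vol_a = sum((i in a) + (j in a) for i, j in edges)
    vol_b = 2 * len(edges) - vol_a
    return cut / min(vol_a, vol_b)


print(conductance(EDGES, {1, 2, 3}))  # 2 / min(8, 8) = 0.25
print(conductance(EDGES, {2}))        # 2 / min(2, 14) = 1.0
```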

slide-8
SLIDE 8

¡ A: adjacency matrix of undirected G

§ $A_{ij} = 1$ if $(i, j)$ is an edge, else 0

¡ $x$ is a vector in $\mathbb{R}^n$ with components $(x_1, \dots, x_n)$

§ Think of it as a label/value of each node of $G$

¡ What is the meaning of $A \cdot x$?

¡ Entry $y_i$ is a sum of labels $x_j$ of neighbors of $i$:

$$y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$$
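A short NumPy check of this reading of $A \cdot x$, on the example graph (edge list reconstructed by us from the slide 16 Laplacian; node $i$ is stored at index $i - 1$, and the labels $x$ are arbitrary):

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1  # undirected: A is symmetric

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # one label per node
y = A @ x

# y_i is the sum of the labels x_j over the neighbors j of node i.
for i in range(n):
    assert y[i] == sum(x[j] for j in range(n) if A[i, j] == 1)
print(y)  # node 1 has neighbors 2, 3, 5, so y_1 = 20 + 30 + 50 = 100
```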

slide-9
SLIDE 9

¡ $j$th coordinate of $A \cdot x$:

§ Sum of the $x$-values of neighbors of $j$

§ Make this a new value at node $j$

¡ Spectral Graph Theory:

§ Analyze the “spectrum” of the matrix representing $G$

§ Spectrum: eigenvectors $x_i$ of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$:

$$A \cdot x = \lambda \cdot x$$

Note: We sort $\lambda_i$ in ascending (not descending) order!

slide-10
SLIDE 10

¡ Suppose all nodes in $G$ have degree $d$ and $G$ is connected

¡ What are some eigenvalues/vectors of $G$?

$A \cdot x = \lambda \cdot x$: What is $\lambda$? What $x$?

§ Let’s try: $x = (1, 1, \dots, 1)$

§ Then: $A \cdot x = (d, d, \dots, d) = \lambda \cdot x$. So: $\lambda = d$

§ We found an eigenpair of $G$: $x = (1, 1, \dots, 1)$, $\lambda = d$

¡ $d$ is the largest eigenvalue of $A$ (see next slide)

Remember the meaning of $y = A \cdot x$: $y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$

Note: this is just one eigenpair. An $n \times n$ matrix can have up to $n$ eigenpairs.
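To check the eigenpair numerically, take the cycle $C_6$ as a hypothetical connected 2-regular graph ($d = 2$). A sketch assuming NumPy:

```python
import numpy as np

n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):  # hypothetical example: cycle C_6, node i is linked to node (i + 1) mod n
    j = (i + 1) % n
    A[i, j] = A[j, i] = 1

ones = np.ones(n)
assert np.allclose(A @ ones, d * ones)  # (1, ..., 1) is an eigenvector with lambda = d

eigvals = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
print(eigvals)                   # the largest one is d = 2, and it appears only once
```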

slide-11
SLIDE 11

¡ $G$ is $d$-regular and connected, $A$ is its adjacency matrix

¡ Claim:

§ (1) $d$ has multiplicity 1 (there is only 1 eigenvector associated with eigenvalue $d$)

§ (2) $d$ is the largest eigenvalue of $A$

¡ Proof:

§ To obtain $d$ we needed $x_i = x_j$ for every $i, j$

§ This means $x = c \cdot (1, 1, \dots, 1)$ for some constant $c$

§ Define: set $S$ = nodes $i$ with the maximum value of $x_i$

§ Then consider some vector $y$ which is not a multiple of the vector $(1, \dots, 1)$. So not all nodes $i$ (with labels $y_i$) are in $S$

§ Since $G$ is connected, consider some node $j \in S$ and a neighbor $i \notin S$: then node $j$ gets a value strictly less than $d \cdot y_j$

§ So $y$ is not an eigenvector with eigenvalue $d$! And so $d$ has multiplicity 1

§ The same argument applied to the node with the largest $|y_j|$ gives $|\lambda| \cdot |y_j| \le d \cdot |y_j|$, so no eigenvalue exceeds $d$: $d$ is the largest eigenvalue!

Details!

slide-12
SLIDE 12

¡ What if $G$ is not connected?

§ $G$ has 2 components, each $d$-regular

¡ What are some eigenvectors?

§ $x$ = Put all 1s on $C$ and 0s on $B$, or vice versa

§ $x' = (1, \dots, 1, 0, \dots, 0)^T$, then $A \cdot x' = (d, \dots, d, 0, \dots, 0)^T$

§ $x'' = (0, \dots, 0, 1, \dots, 1)^T$, then $A \cdot x'' = (0, \dots, 0, d, \dots, d)^T$

§ And so in both cases the corresponding $\lambda = d$

¡ A bit of intuition:

[Figure, left: components $C$ and $B$ (sizes $|C|$ and $|B|$) with no edges between them: $\lambda_n = \lambda_{n-1}$. Right: $C$ and $B$ joined by a few edges: $\lambda_n - \lambda_{n-1} \approx 0$, i.e., the 2nd largest eigenvalue $\lambda_{n-1}$ now has a value very close to $\lambda_n$]
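A small check of the disconnected case, using two disjoint triangles as a hypothetical 2-regular graph with components $C$ and $B$ (our own example, not from the slides):

```python
import numpy as np

# Hypothetical example: two disjoint triangles, C = {0, 1, 2} and B = {3, 4, 5}; every degree is d = 2.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

x1 = np.array([1.0, 1, 1, 0, 0, 0])  # 1s on C, 0s on B
x2 = np.array([0.0, 0, 0, 1, 1, 1])  # 0s on C, 1s on B
assert np.allclose(A @ x1, 2 * x1) and np.allclose(A @ x2, 2 * x2)

eigvals = np.linalg.eigvalsh(A)  # ascending order
print(eigvals[-2:])              # lambda_{n-1} = lambda_n = d = 2
```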

slide-13
SLIDE 13

¡ More intuition:

§ If the graph is connected (right example), then we already know that $x_n = (1, \dots, 1)$ is an eigenvector

§ Since eigenvectors are orthogonal, the components of $x_{n-1}$ must sum to 0

§ Why? $x_n \cdot x_{n-1} = 0$, then $\sum_i x_n[i] \cdot x_{n-1}[i] = \sum_i x_{n-1}[i] = 0$

§ So we can look at the eigenvector of the 2nd largest eigenvalue and declare nodes with positive label in $C$ and negative label in $B$

§ So there is something interesting about $x_{n-1}$… (but there is still a lot for us to figure out)

[Figure: the same two examples as on the previous slide: disconnected $C$ and $B$ with $\lambda_n = \lambda_{n-1}$; weakly connected $C$ and $B$ with $\lambda_n - \lambda_{n-1} \approx 0$]
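To see the sign split on a connected $d$-regular graph, here is a hypothetical 3-regular example of our own: two copies of $K_4$, each missing one edge, re-joined by two bridge edges. A sketch assuming NumPy:

```python
import numpy as np

# Hypothetical 3-regular graph: two copies of K4 (C = {0,1,2,3}, B = {4,5,6,7}),
# each missing one edge, re-joined by the two bridge edges (0,4) and (1,5).
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # K4 on C minus edge (0,1)
         (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),  # K4 on B minus edge (4,5)
         (0, 4), (1, 5)]                          # bridges between C and B
n = 8
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
assert (A.sum(axis=1) == 3).all()  # d-regular with d = 3

eigvals, eigvecs = np.linalg.eigh(A)  # ascending eigenvalues
x_n1 = eigvecs[:, -2]                 # eigenvector of the 2nd largest eigenvalue
assert abs(x_n1.sum()) < 1e-9         # orthogonal to (1, ..., 1)

C = {i for i in range(n) if x_n1[i] > 0}
B = {i for i in range(n) if x_n1[i] < 0}
print(eigvals[-2], sorted(C), sorted(B))  # the sign pattern separates the two K4 halves
```

For this particular graph $\lambda_{n-1} = \sqrt{5}$; since eigenvectors are only defined up to sign, which half ends up labeled $C$ is arbitrary.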

slide-14
SLIDE 14

¡ Adjacency matrix (A):

§ $n \times n$ matrix

§ $A = [a_{ij}]$, $a_{ij} = 1$ if there is an edge between nodes $i$ and $j$

¡ Important properties:

§ Symmetric matrix

§ Eigenvectors are real-valued and orthogonal

[Figure: example graph on nodes 1-6 and its adjacency matrix]

     1  2  3  4  5  6
1    0  1  1  0  1  0
2    1  0  1  0  0  0
3    1  1  0  1  0  0
4    0  0  1  0  1  1
5    1  0  0  1  0  1
6    0  0  0  1  1  0

slide-15
SLIDE 15

¡ Degree matrix (D):

§ $n \times n$ diagonal matrix

§ $D = [d_{ii}]$, $d_{ii}$ = degree of node $i$

[Figure: example graph on nodes 1-6 and its degree matrix]

$D = \mathrm{diag}(3, 2, 3, 3, 3, 2)$

slide-16
SLIDE 16

¡ Laplacian matrix (L):

§ $n \times n$ symmetric matrix: $L = D - A$

¡ What is the trivial eigenpair?

§ $x = (1, \dots, 1)$, then $L \cdot x = 0$ and so $\lambda = \lambda_1 = 0$

¡ Important properties:

§ Eigenvalues are non-negative real numbers

§ Eigenvectors are real and orthogonal

[Figure: example graph on nodes 1-6 and its Laplacian matrix]

     1   2   3   4   5   6
1    3  -1  -1   0  -1   0
2   -1   2  -1   0   0   0
3   -1  -1   3  -1   0   0
4    0   0  -1   3  -1  -1
5   -1   0   0  -1   3  -1
6    0   0   0  -1  -1   2
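The construction $L = D - A$ and the properties listed above can be checked numerically. A sketch assuming NumPy and the example graph (edge list reconstructed by us from the Laplacian above; node $i$ is stored at index $i - 1$):

```python
import numpy as np

# Edge list reconstructed (our assumption) from the Laplacian on this slide.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

D = np.diag(A.sum(axis=1))  # degree matrix: diag(3, 2, 3, 3, 3, 2)
L = D - A                   # Laplacian

assert np.allclose(L @ np.ones(n), 0)  # trivial eigenpair: x = (1, ..., 1), lambda_1 = 0

eigvals, eigvecs = np.linalg.eigh(L)   # eigh: symmetric input, ascending eigenvalues
assert (eigvals > -1e-9).all()                      # non-negative up to rounding
assert np.allclose(eigvecs.T @ eigvecs, np.eye(n))  # real and orthonormal
print(eigvals.round(3))
```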

slide-17
SLIDE 17

(a) All eigenvalues are $\ge 0$

(b) $x^T L x = \sum_{ij} L_{ij} x_i x_j \ge 0$ for every $x$

(c) $L = N^T \cdot N$

§ That is, $L$ is positive semi-definite

¡ Proof: (the 3 facts are saying the same thing)

§ (c)⇒(b): $x^T L x = x^T N^T N x = (N x)^T (N x) \ge 0$

§ As it is just the square of the length of $N x$

§ (b)⇒(a): Let $\lambda$ be an eigenvalue of $L$ with eigenvector $x$. Then by (b), $x^T L x \ge 0$, so $x^T L x = x^T \lambda x = \lambda x^T x$ ⇒ $\lambda \ge 0$

§ (a)⇒(c): is also easy! Do it yourself.

Details!
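One concrete choice of $N$ (our assumption; the slide does not specify it) is the oriented edge-node incidence matrix, with one row per edge. A NumPy sketch on the example graph (edge list reconstructed by us from slide 16) checks (c) and the step (c)⇒(b):

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n, m = 6, len(EDGES)

# Oriented incidence matrix N (m x n): row e has +1 at one endpoint and -1 at the other.
# The orientation is arbitrary; it cancels in N^T N.
N = np.zeros((m, n))
for e, (i, j) in enumerate(EDGES):
    N[e, i - 1], N[e, j - 1] = 1, -1

A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
L = np.diag(A.sum(axis=1)) - A

assert np.allclose(N.T @ N, L)  # (c): L = N^T N

x = np.random.default_rng(0).normal(size=n)
quad = x @ L @ x
assert np.isclose(quad, np.sum((N @ x) ** 2)) and quad >= 0  # (c) => (b)
```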

slide-18
SLIDE 18

¡ Fact: For a symmetric matrix $M$:

$$\lambda_2 = \min_{x:\, x^T w_1 = 0} \frac{x^T M x}{x^T x}$$

($w_1$ is the eigenvector corresponding to $\lambda_1$; see next slide for a proof)

¡ What is the meaning of $\min x^T L x$ on $G$?

§ $x^T L x = \sum_{i,j=1}^{n} L_{ij} x_i x_j = \sum_{i,j=1}^{n} (D_{ij} - A_{ij}) x_i x_j$

§ $= \sum_i D_{ii} x_i^2 - \sum_{(i,j) \in E} 2 x_i x_j$

§ $= \sum_{(i,j) \in E} (x_i^2 + x_j^2 - 2 x_i x_j) = \sum_{(i,j) \in E} (x_i - x_j)^2$

Node $i$ has degree $d_i$, so the value $x_i^2$ needs to be summed up $d_i$ times. But each edge $(i,j)$ has two endpoints, so we need $x_i^2 + x_j^2$.
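A numerical check of the identity $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$, assuming NumPy, the example graph (edge list reconstructed by us from slide 16), and arbitrary node values:

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
L = np.diag(A.sum(axis=1)) - A

x = np.array([0.5, -1.0, 2.0, 0.0, 3.0, -2.5])  # arbitrary node values
lhs = x @ L @ x
rhs = sum((x[i - 1] - x[j - 1]) ** 2 for i, j in EDGES)  # each edge counted once
print(lhs, rhs)  # both 69.25
```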

slide-19
SLIDE 19

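¡ Write $x$ in the basis of eigenvectors $w_1, w_2, \dots, w_n$ of $M$. So, $x = \sum_{i}^{n} \alpha_i w_i$

¡ Then we get: $M x = \sum_i \alpha_i M w_i = \sum_i \alpha_i \lambda_i w_i$

¡ So, what is $x^T M x$?

§ $x^T M x = \left(\sum_i \alpha_i w_i\right)^T \left(\sum_i \alpha_i \lambda_i w_i\right) = \sum_{ij} \alpha_i \lambda_j \alpha_j w_i^T w_j = \sum_i \alpha_i^2 \lambda_i w_i^T w_i = \sum_i \lambda_i \alpha_i^2$, because $w_i^T w_j = 0$ if $i \ne j$, and 1 otherwise

§ To minimize this over all unit vectors $x$ orthogonal to $w_1$, minimize over choices of $(\alpha_1, \dots, \alpha_n)$ so that: $\sum_i \alpha_i^2 = 1$ (unit length) and $\alpha_1 = x^T w_1 = 0$ (orthogonal to $w_1$)

§ Since $\lambda_2 \le \lambda_3 \le \dots \le \lambda_n$, the minimum is reached by setting $\alpha_2 = 1$, and so $\sum_i \lambda_i \alpha_i^2 = \lambda_2$

Details!

$$\lambda_2 = \min_{x:\, x^T w_1 = 0} \frac{x^T M x}{x^T x}$$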

slide-20
SLIDE 20

¡ What else do we know about $x$?

§ $x$ is a unit vector: $\sum_i x_i^2 = 1$

§ $x$ is orthogonal to the 1st eigenvector $(1, \dots, 1)$, thus: $\sum_i x_i \cdot 1 = \sum_i x_i = 0$

¡ Remember:

$$\lambda_2 = \min_{\sum_i x_i = 0} \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$$

(the minimum is over all labelings of nodes $i$ so that $\sum_i x_i = 0$)

We want to assign values $x_i$ to nodes $i$ such that few edges cross 0 (we want $x_i - x_j$ to be small for most edges $(i,j)$).

[Figure: values $x_i$ and $x_j$ placed on either side of 0; the balance to minimize]

$$\lambda_2 = \min_{x:\, x^T w_1 = 0} \frac{x^T M x}{x^T x}$$
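An empirical check (not a proof) of this characterization, assuming NumPy and the example graph (edge list reconstructed by us from slide 16): random labelings with $\sum_i x_i = 0$ never beat $\lambda_2$, and the eigenvector of $\lambda_2$ attains it.

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
L = np.diag(A.sum(axis=1)) - A


def rayleigh(x):
    """x^T L x / x^T x, i.e. sum over edges of (x_i - x_j)^2 divided by sum_i x_i^2."""
    return (x @ L @ x) / (x @ x)


eigvals, eigvecs = np.linalg.eigh(L)
lam2, fiedler = eigvals[1], eigvecs[:, 1]

rng = np.random.default_rng(42)
for _ in range(1000):
    x = rng.normal(size=n)
    x -= x.mean()  # enforce sum_i x_i = 0 (orthogonal to (1, ..., 1))
    assert rayleigh(x) >= lam2 - 1e-9

print(lam2, rayleigh(fiedler))  # the minimum is attained by the eigenvector of lambda_2
```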

slide-21
SLIDE 21

¡ Back to finding the optimal cut

¡ Express partition $(A,B)$ as a vector

$$y_i = \begin{cases} +1 & \text{if } i \in A \\ -1 & \text{if } i \in B \end{cases}$$

¡ Enforce that $|A| = |B|$ ⇒ $\sum_i y_i = 0$

§ Equivalent to being orthogonal to the trivial eigenvector $(1, \dots, 1)$

¡ We can minimize the cut of the partition by finding a vector $y$ that minimizes:

$$\min_{y \in \{-1,+1\}^n} f(y) = \sum_{(i,j) \in E} (y_i - y_j)^2$$

[Figure: node $i$ with $y_i = -1$ and node $j$ with $y_j = +1$ on opposite sides of 0]

Can’t solve exactly. Let’s relax $y$ and allow it to take any real value.
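With $\pm 1$ labels each cut edge contributes $(1 - (-1))^2 = 4$ and every other edge contributes 0, so $f(y) = 4 \cdot \mathrm{cut}(A,B)$. A quick check on the example graph (edge list reconstructed by us from slide 16; $A = \{1, 2, 3\}$ is our assumed partition):

```python
# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
group_a = {1, 2, 3}  # assumed partition; B = {4, 5, 6}

y = {i: (+1 if i in group_a else -1) for i in range(1, 7)}
assert sum(y.values()) == 0  # |A| = |B|


def f(y):
    """f(y) = sum over edges (i, j) of (y_i - y_j)^2."""
    return sum((y[i] - y[j]) ** 2 for i, j in EDGES)


cut = sum(1 for i, j in EDGES if (i in group_a) != (j in group_a))
print(f(y), 4 * cut)  # each cut edge adds (1 - (-1))^2 = 4, other edges add 0
```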

slide-22
SLIDE 22

§ $\lambda_2 = \min_y f(y)$: The minimum value of $f(y)$ is given by the 2nd smallest eigenvalue $\lambda_2$ of the Laplacian matrix $L$

§ $x = \arg\min_y f(y)$: The optimal solution for $y$ is given by the corresponding eigenvector $x$, referred to as the Fiedler vector

§ Can use the sign of $x_i$ to determine the cluster assignment of node $i$

$$\min_{y \in \mathbb{R}^n:\ \sum_i y_i = 0,\ \sum_i y_i^2 = 1} f(y) = \sum_{(i,j) \in E} (y_i - y_j)^2 = y^T L y$$

(Slide 18)

$$\lambda_2 = \min_{x:\, x^T w_1 = 0} \frac{x^T M x}{x^T x}$$
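A sketch of the sign rule on the example graph, assuming NumPy (edge list reconstructed by us from slide 16). `np.linalg.eigh` returns eigenvalues in ascending order, so column 1 holds the Fiedler vector:

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)  # ascending: eigvals[1] is lambda_2
fiedler = eigvecs[:, 1]               # Fiedler vector (defined only up to sign)

pos = {i + 1 for i in range(n) if fiedler[i] > 0}
neg = {i + 1 for i in range(n) if fiedler[i] <= 0}
print(eigvals[1], sorted(pos), sorted(neg))  # lambda_2 = 1; groups {1, 2, 3} and {4, 5, 6}
```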

slide-23
SLIDE 23

¡ Suppose there is a partition of $G$ into $A$ and $B$ where $|A| \le |B|$, s.t.

$$\beta = \frac{\#\text{edges from } A \text{ to } B}{|A|}$$

then $\lambda_2 \le 2\beta$

§ This is the approximation guarantee of spectral clustering: spectral finds a cut that has at most twice the conductance of the optimal one, of conductance $\beta$.

¡ Proof:

§ Let: $a = |A|$, $b = |B|$ and $e$ = # edges from $A$ to $B$

§ It is enough to choose some $x_i$ based on $A$ and $B$ such that

$$\lambda_2 \le \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} \le 2\beta \quad \text{(while also } \textstyle\sum_i x_i = 0\text{)}$$

since $\lambda_2$ is the minimum of this ratio, and so can only be smaller

Details!

Note: $|A| \le |B|$

slide-24
SLIDE 24

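¡ Proof (continued):

§ 1) Let’s set:

$$x_i = \begin{cases} -\frac{1}{a} & \text{if } i \in A \\ +\frac{1}{b} & \text{if } i \in B \end{cases}$$

§ Let’s quickly verify that $\sum_i x_i = 0$: $a \left(-\frac{1}{a}\right) + b \left(\frac{1}{b}\right) = 0$

§ 2) Then (edges inside $A$ or inside $B$ contribute 0 to the numerator):

$$\frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} = \frac{\sum_{i \in A,\, j \in B,\, (i,j) \in E} \left(\frac{1}{b} + \frac{1}{a}\right)^2}{a \frac{1}{a^2} + b \frac{1}{b^2}} = \frac{e \left(\frac{1}{a} + \frac{1}{b}\right)^2}{\frac{1}{a} + \frac{1}{b}} = e \left(\frac{1}{a} + \frac{1}{b}\right) \le e \left(\frac{1}{a} + \frac{1}{a}\right) = e \frac{2}{a} = 2\beta$$

Details!

Which proves that the cost achieved by spectral is at most twice the OPT cost

$e$ … number of edges between $A$ and $B$. Note: $|A| \le |B|$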
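Plugging the example graph into the proof (edge list reconstructed by us from slide 16; $A = \{1, 2, 3\}$ is our assumed partition, so $a = b = 3$ and $e = 2$). A NumPy sketch of the test vector and the resulting bound:

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
adj = np.zeros((n, n))
for i, j in EDGES:
    adj[i - 1, j - 1] = adj[j - 1, i - 1] = 1
L = np.diag(adj.sum(axis=1)) - adj

group_a = {1, 2, 3}  # assumed partition with |A| <= |B|
a, b = len(group_a), n - len(group_a)
e = sum(1 for i, j in EDGES if (i in group_a) != (j in group_a))
beta = e / a

# Test vector from the proof: -1/a on A, +1/b on B.
x = np.array([-1 / a if i in group_a else 1 / b for i in range(1, n + 1)])
ratio = (x @ L @ x) / (x @ x)  # should equal e * (1/a + 1/b)

lam2 = np.linalg.eigvalsh(L)[1]
print(lam2, ratio, 2 * beta)  # lambda_2 <= ratio <= 2 * beta
```

Here the ratio and $2\beta$ coincide (both $4/3$) because $a = b$; $\lambda_2 = 1$ sits below them.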

slide-25
SLIDE 25

¡ Putting it all together: the Cheeger inequality

$$\frac{\beta^2}{2 k_{max}} \le \lambda_2 \le 2\beta$$

§ where $k_{max}$ is the maximum node degree in the graph

§ Note: we only proved the upper bound, $\lambda_2 \le 2\beta$

§ We did not prove $\frac{\beta^2}{2 k_{max}} \le \lambda_2$

§ Overall, this certifies that $\lambda_2$ always gives a useful bound

Details!
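On a graph this small, both sides of the inequality can be checked by brute force: enumerate every set $S$ with $|S| \le n/2$ to get the optimal $\beta$. A sketch assuming NumPy and the example graph (edge list reconstructed by us from slide 16):

```python
import itertools

import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
nodes = range(1, 7)
n = len(nodes)

A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
lam2 = np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A)[1]
k_max = A.sum(axis=1).max()


def edges_out(S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for i, j in EDGES if (i in S) != (j in S))


# Brute-force optimum: beta = min over |S| <= n/2 of edges_out(S) / |S|.
beta = min(
    edges_out(set(S)) / len(S)
    for size in range(1, n // 2 + 1)
    for S in itertools.combinations(nodes, size)
)
print(beta ** 2 / (2 * k_max), lam2, 2 * beta)  # lower bound <= lambda_2 <= upper bound
```

Exhaustive search is exponential in $n$, which is why the cheap bound from $\lambda_2$ matters on real graphs.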

slide-26
SLIDE 26

¡ How to define a “good” partition of a graph?

§ Minimize a given graph cut criterion

¡ How to efficiently identify such a partition?

§ Approximate using information provided by the eigenvalues and eigenvectors of a graph

¡ Spectral Clustering


slide-27
SLIDE 27

¡ Three basic stages:

§ 1) Pre-processing

§ Construct a matrix representation of the graph

§ 2) Decomposition

§ Compute eigenvalues and eigenvectors of the matrix

§ Map each point to a lower-dimensional representation based on one or more eigenvectors

§ 3) Grouping

§ Assign points to two or more clusters, based on the new representation


slide-28
SLIDE 28

¡ 1) Pre-processing:

§ Build the Laplacian matrix $L$ of the graph

¡ 2) Decomposition:

§ Find eigenvalues $\lambda$ and eigenvectors $x$ of the matrix $L$

§ Map vertices to the corresponding components of $x_2$, the eigenvector of $\lambda_2$

[Figure: the Laplacian $L$ of the example graph on nodes 1-6, its eigenvalues $\lambda = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0)$, and its eigenvector matrix $X$]

Components of $x_2$: node 1: 0.3, node 2: 0.6, node 3: 0.3, node 4: -0.3, node 5: -0.3, node 6: -0.6

How do we now find the clusters?
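Stages 1 and 2 on the example graph, as a NumPy sketch (edge list reconstructed by us from slide 16). Because an eigenvector is only defined up to sign, we flip $x_2$ so that node 1 is positive before comparing with the values above:

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

L = np.diag(A.sum(axis=1)) - A  # 1) pre-processing: Laplacian
eigvals, X = np.linalg.eigh(L)  # 2) decomposition: ascending eigenvalues, eigenvectors as columns
x2 = X[:, 1]
if x2[0] < 0:                   # eigenvectors are defined up to sign; make node 1 positive
    x2 = -x2

print(eigvals.round(3))  # approximately 0, 1, 3, 3, 4, 5
print(x2.round(1))       # approximately 0.3, 0.6, 0.3, -0.3, -0.3, -0.6 for nodes 1..6
```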

slide-29
SLIDE 29

¡ 3) Grouping:

§ Sort components of the reduced 1-dimensional vector

§ Identify clusters by splitting the sorted vector in two

¡ How to choose a splitting point?

§ Naïve approaches:

§ Split at 0 or median value

§ More expensive approaches:

§ Attempt to minimize normalized cut in 1 dimension (sweep over the ordering of nodes induced by the eigenvector)

[Figure: components of $x_2$: node 1: 0.3, node 2: 0.6, node 3: 0.3, node 4: -0.3, node 5: -0.3, node 6: -0.6]

Split at 0: Cluster A: positive points (nodes 1, 2, 3); Cluster B: negative points (nodes 4, 5, 6)
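A sketch of the three splitting rules on the example graph (edge list reconstructed by us from slide 16). For the sweep we score each prefix of the sorted order by conductance (slide 7); the slide mentions normalized cut, a closely related criterion, so this scoring choice is our substitution:

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
deg = A.sum(axis=1)
x2 = np.linalg.eigh(np.diag(deg) - A)[1][:, 1]  # Fiedler vector of L = D - A


def conductance(mask):
    """cut(S, rest) / min(vol(S), vol(rest)) for a boolean node mask S."""
    cut = A[mask][:, ~mask].sum()
    return cut / min(deg[mask].sum(), deg[~mask].sum())


# Naive splits: at 0 and at the median of x2.
split_zero = x2 > 0
split_median = x2 > np.median(x2)

# Sweep: sort nodes by x2 and score every prefix S_r (r = 1, ..., n - 1).
order = np.argsort(x2)
best_score, best_mask = np.inf, None
for r in range(1, n):
    mask = np.zeros(n, dtype=bool)
    mask[order[:r]] = True
    score = conductance(mask)
    if score < best_score:
        best_score, best_mask = score, mask

print(best_score, (np.flatnonzero(best_mask) + 1).tolist())  # 0.25 and one side of the split
```

On this small graph all three rules agree; the sweep costs $n - 1$ evaluations but does not depend on where 0 or the median happens to fall.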

slide-30
SLIDE 30

[Figure: value of $x_2$ plotted against each node's rank in $x_2$]

slide-31
SLIDE 31

[Figure: value of $x_2$ plotted against each node's rank in $x_2$; components of $x_2$]

slide-32
SLIDE 32

[Figure: components of $x_1$; components of $x_3$]

slide-33
SLIDE 33

¡ How do we partition a graph into $k$ clusters?

¡ Two basic approaches:

§ Recursive bi-partitioning [Hagen et al., ’92]

§ Recursively apply the bi-partitioning algorithm in a hierarchical divisive manner

§ Disadvantages: inefficient, unstable

§ Cluster multiple eigenvectors [Shi-Malik, ’00]

§ Build a reduced space from multiple eigenvectors

§ Commonly used in recent papers

§ A preferable approach…
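A minimal sketch of the "cluster multiple eigenvectors" approach, assuming NumPy: embed each node as its row in the first $k$ eigenvectors of $L$, then run k-means on those rows. The test graph (three 4-cliques chained by single edges) and the plain Lloyd's k-means with farthest-point initialization are our own choices; scikit-learn's `KMeans` is a common off-the-shelf alternative for the last step.

```python
import numpy as np


def clique_chain(k, size):
    """Hypothetical test graph: k cliques of `size` nodes, consecutive cliques joined by one edge."""
    n = k * size
    A = np.zeros((n, n))
    for c in range(k):
        block = range(c * size, (c + 1) * size)
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1
        if c + 1 < k:  # bridge edge to the next clique
            i, j = (c + 1) * size - 1, (c + 1) * size
            A[i, j] = A[j, i] = 1
    return A


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([points[labels == c].mean(axis=0) for c in range(k)])
    return labels


k = 3
A = clique_chain(k, 4)
L = np.diag(A.sum(axis=1)) - A
_, vecs = np.linalg.eigh(L)
embedding = vecs[:, :k]  # row i = node i in the reduced k-dimensional space
labels = kmeans(embedding, k)
print(labels)            # one label per clique
```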

slide-34
SLIDE 34

¡ Approximates the optimal cut [Shi-Malik, ’00]

§ Can be used to approximate optimal k-way normalized cut

¡ Emphasizes cohesive clusters

§ Increases the unevenness in the distribution of the data

§ Associations between similar points are amplified, associations between dissimilar points are attenuated

§ The data begins to “approximate a clustering”

¡ Well-separated space

§ Transforms data to a new “embedded space”, consisting of $k$ orthogonal basis vectors

¡ Multiple eigenvectors prevent instability due to information loss

slide-35
SLIDE 35

[Figure: eigenvalues (y-axis, 5 to 50) plotted against their index $k$ (x-axis, 1 to 20), with $\lambda_1$ and $\lambda_2$ marked]

¡ Eigengap:

§ The difference between two consecutive eigenvalues

¡ The most stable clustering is generally given by the value $k$ that maximizes the eigengap $\Delta_k$: $\Delta_k = |\lambda_k - \lambda_{k-1}|$

¡ Example: the largest gap is $\Delta_2 = |\lambda_2 - \lambda_1|$ ⇒ Choose $k = 2$
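A sketch of the eigengap heuristic, assuming NumPy and a hypothetical test graph of three 4-cliques chained by single edges. Conventions for indexing the gap differ; here we sort Laplacian eigenvalues in ascending order and take $k$ as the number of eigenvalues that come before the largest jump:

```python
import numpy as np


def clique_chain(k, size):
    """Hypothetical test graph: k cliques of `size` nodes, consecutive cliques joined by one edge."""
    n = k * size
    A = np.zeros((n, n))
    for c in range(k):
        block = range(c * size, (c + 1) * size)
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1
        if c + 1 < k:  # bridge edge to the next clique
            i, j = (c + 1) * size - 1, (c + 1) * size
            A[i, j] = A[j, i] = 1
    return A


A = clique_chain(3, 4)
L = np.diag(A.sum(axis=1)) - A
eigvals = np.linalg.eigvalsh(L)  # ascending
gaps = np.diff(eigvals)          # gaps[m] = lambda_{m+2} - lambda_{m+1} in 1-based notation
k = int(np.argmax(gaps)) + 1     # number of eigenvalues before the largest gap
print(eigvals[:5].round(3), k)   # three small eigenvalues, then a jump: k = 3
```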

slide-36
SLIDE 36
slide-37
SLIDE 37

¡ What if we want our clustering based on other patterns (not edges)?

[Figure: a graph split into A and B]

Small subgraphs (motifs, graphlets) are building blocks of networks [Milo et al., ’02]

slide-38
SLIDE 38

Find modules based on motifs!

[Figure: a network and a motif]

slide-39
SLIDE 39

Different motifs reveal different modular structures!


slide-40
SLIDE 40

Generalize cut and volume to motifs:

[Figure: edges cut vs. motifs cut]

$\mathrm{vol}(S)$ = #(edge end-points in $S$); $\mathrm{vol}_M(S)$ = #(motif end-points in $S$)

$$\phi(S) = \frac{\#(\text{edges cut})}{\mathrm{vol}(S)} \qquad \phi_M(S) = \frac{\#(\text{motifs cut})}{\mathrm{vol}_M(S)}$$

[Benson, Gleich, Leskovec, Science, 2016]
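A sketch comparing edge conductance and motif conductance, assuming the motif is an undirected triangle (our choice) and the example graph from earlier (edge list reconstructed by us from slide 16). Following the conductance definition on slide 7, we divide by the smaller of the two volumes; a motif instance counts as cut when it has nodes on both sides of $S$:

```python
import itertools

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
NODES = range(1, 7)


def triangles(edges, nodes):
    """All instances of the (assumed) triangle motif, as frozensets of 3 nodes."""
    edge_set = {frozenset(p) for p in edges}
    return [frozenset(t) for t in itertools.combinations(nodes, 3)
            if all(frozenset(p) in edge_set for p in itertools.combinations(t, 2))]


def edge_conductance(edges, S):
    cut = sum(1 for i, j in edges if (i in S) != (j in S))
    vol_s = sum((i in S) + (j in S) for i, j in edges)  # edge end-points in S
    return cut / min(vol_s, 2 * len(edges) - vol_s)


def motif_conductance(instances, S):
    """A motif instance is cut if it has nodes on both sides of S."""
    cut = sum(1 for m in instances if 0 < len(m & S) < len(m))
    vol_s = sum(len(m & S) for m in instances)       # motif end-points in S
    vol_rest = sum(len(m - S) for m in instances)
    return cut / min(vol_s, vol_rest)


tris = triangles(EDGES, NODES)
S = {1, 2, 3}
print(edge_conductance(EDGES, S), motif_conductance(tris, S))  # 0.25 vs 0.0
```

The two bridge edges are cut, but no triangle crosses the split, so the motif view rates this partition as perfect.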

slide-41
SLIDE 41

[Figure: a network and a motif, with the nodes split into A and B]

$$\phi_M(S) = \frac{\#(\text{motifs cut})}{\text{motif volume}} = \frac{1}{10}$$

slide-42
SLIDE 42

How do we find clusters of motifs?

§ Given a motif $M$ and a graph $G$

§ Find a set of nodes $S$ that minimizes motif conductance:

$$\phi_M(S) = \frac{\#(\text{motifs cut})}{\mathrm{vol}_M(S)}$$

Bad news: Finding the set $S$ with the minimal motif conductance is NP-hard!

slide-43
SLIDE 43

Solution: Motif Spectral Clustering

§ Input: Graph $G$ and motif $M$

§ Using $G$, form a new weighted graph $W$

§ Apply spectral clustering on $W$

§ Output the clusters

Theorem: The resulting clusters will obtain near-optimal motif conductance

slide-44
SLIDE 44

¡ Three steps:

§ 1) Pre-processing

§ $W_{ij}^{(M)}$ = # times edge $(i, j)$ participates in the motif

§ 2) Decomposition

§ Use standard spectral clustering (but on $W^{(M)}$)

§ 3) Grouping

§ Same as standard spectral clustering

[Figure: graph $G$ and the weighted graph $W^{(M)}$]

slide-45
SLIDE 45

[Figure: graph $G$ and the weighted graph $W^{(M)}$]

$W_{ij}^{(M)}$ = # of times edge $(i,j)$ participates in motif $M$
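For the undirected triangle motif (our assumed motif; directed motifs need a different count), $W^{(M)}$ has a compact form: $(A^2)_{ij}$ counts the common neighbors of $i$ and $j$, and multiplying elementwise by $A$ keeps only real edges. A NumPy sketch on the example graph (edge list reconstructed by us from slide 16):

```python
import numpy as np

# Edge list reconstructed (our assumption) from the slide 16 Laplacian.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6
A = np.zeros((n, n), dtype=int)
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

# Triangle motif: (A @ A)[i, j] = number of common neighbors = triangles that edge (i, j) closes.
W = (A @ A) * A

for i, j in EDGES:
    print((i, j), W[i - 1, j - 1])  # 1 inside the triangles {1,2,3} and {4,5,6}, 0 on (1,5) and (3,4)
```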

slide-46
SLIDE 46

Insight: Spectral clustering on the weighted graph $W^{(M)}$ finds clusters of low motif conductance:

$$\phi_M(S) = \frac{\#(\text{motifs cut})}{\mathrm{vol}_M(S)}$$

[Figure: weighted graph $W^{(M)}$, where $W_{ij}^{(M)}$ = # of times edge $(i,j)$ participates in motif $M$]

slide-47
SLIDE 47

Step 1: Form the weighted graph $W^{(M)}$

slide-48
SLIDE 48

Step 2: Compute the Fiedler vector $f^{(M)}$ associated with $\lambda_2$ of the normalized Laplacian of $W^{(M)}$

$$L^{(M)} = D^{-1/2} (D - W^{(M)}) D^{-1/2}, \qquad L^{(M)} z = \lambda_2 z, \qquad f^{(M)} = D^{-1/2} z$$

$D = \mathrm{diag}(W^{(M)} e)$: diagonal degree matrix ($e$ is the all-ones vector)

slide-49
SLIDE 49

Step 3: Sort nodes by their values in $f^{(M)}$: $f_1, f_2, \dots, f_n$. Let $S_r = \{f_1, \dots, f_r\}$ and compute the motif conductance of each $S_r$
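The three steps end to end, as a NumPy sketch under stated assumptions: the motif is the undirected triangle, every node lies in at least one triangle (so $D^{-1/2}$ exists), and the graph is a hypothetical example of our own, two 4-cliques plus two extra edges that create a single triangle crossing between them:

```python
import itertools

import numpy as np

# Hypothetical graph: two K4s ({0..3}, {4..7}) plus edges (3,4), (3,5) forming one crossing triangle.
n = 8
edges = [p for blk in (range(0, 4), range(4, 8)) for p in itertools.combinations(blk, 2)]
edges += [(3, 4), (3, 5)]
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Step 1: motif-weighted graph for the (assumed, undirected) triangle motif.
W = (A @ A) * A

# Step 2: normalized Laplacian of W and its Fiedler vector f = D^{-1/2} z.
d = W.sum(axis=1)  # assumes every node is in some triangle (d > 0)
D_inv_sqrt = np.diag(1 / np.sqrt(d))
L_M = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
_, Z = np.linalg.eigh(L_M)
f = D_inv_sqrt @ Z[:, 1]

# Step 3: sweep over prefixes S_r of the sorted order, scoring triangle conductance.
tris = [set(t) for t in itertools.combinations(range(n), 3)
        if all(A[i, j] for i, j in itertools.combinations(t, 2))]


def motif_conductance(S):
    cut = sum(1 for t in tris if 0 < len(t & S) < 3)  # triangles with nodes on both sides
    vol = sum(len(t & S) for t in tris)               # motif end-points in S
    return cut / min(vol, 3 * len(tris) - vol)


order = np.argsort(f)
sweep = [set(order[:r].tolist()) for r in range(1, n)]
best = min(sweep, key=motif_conductance)
print(sorted(best), motif_conductance(best))  # one K4, with motif conductance 1/13
```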

slide-50
SLIDE 50

Theorem: The algorithm finds a set of nodes $S$ for which

$$\phi_M(S) \le 4 \sqrt{\phi_M^*}$$

In other words: clusters $S$ found by our method are provably near optimal

$\phi_M(S)$ … motif conductance of $S$ found by our algorithm

$\phi_M^*$ … motif conductance of the optimal set $S^*$

slide-51
SLIDE 51

¡ Generalization of community detection to higher-order structures

¡ Motif-conductance objective admits a motif Cheeger inequality

¡ Simple, fast, and scalable

slide-52
SLIDE 52

1) We don’t know a motif of interest

§ Food webs and new applications

2) We know the motif of interest

§ Regulatory transcription networks, connectome, social networks


slide-53
SLIDE 53

Florida Bay food web:

¡ Nodes: species in the ecosystem

¡ Edges: carbon exchange (who eats whom)

Different motifs capture different energy flow patterns

slide-54
SLIDE 54

Which motif organizes the food web?

Approach:

¡ Run motif spectral clustering separately for each of the 13 motifs

¡ Examine the sweep profile (next slide) to see which motif gives the best clusters

slide-55
SLIDE 55

[Figure: panels A and B]

slide-56
SLIDE 56

Observation: Network organizes based on motif M6 (but not M5 or M8)

¡ There exist good cuts for M6 but not for M5 or M8

[Figure: panels A and B]

slide-57
SLIDE 57

M6 reveals known aquatic layers with higher accuracy (84% vs. 65%)

[Figure, panel B: clusters labeled Micronutrient sources, Benthic fishes, Benthic macroinvertebrates, and Pelagic fishes and benthic prey]

slide-58
SLIDE 58

Aquatic layers organize based on M6

¡ Many instances of M6 inside

¡ Few instances of M6 cross

[Figure: panels C and D]

slide-59
SLIDE 59

¡ Nodes are groups of genes in mRNA

¡ Edges are directed transcriptional regulation relationships

¡ The “feedforward loop” motif represents biological function [Alon ’07]

slide-60
SLIDE 60

97% detection accuracy (vs. 68-82%)

slide-61
SLIDE 61

¡ Feed-forward loops:

[Figure: panels C and D]

slide-62
SLIDE 62

¡ METIS:

§ Heuristic but works really well in practice

§ http://glaros.dtc.umn.edu/gkhome/views/metis

¡ Graclus:

§ Based on kernel k-means

§ http://www.cs.utexas.edu/users/dml/Software/graclus.html

¡ Louvain:

§ Based on Modularity optimization

§ http://perso.uclouvain.be/vincent.blondel/research/louvain.html

¡ Clique percolation method:

§ For finding overlapping clusters

§ http://angel.elte.hu/cfinder/
