http://cs224w.stanford.edu Three basic stages: 1) Pre-processing - PowerPoint PPT Presentation

CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

Three basic stages:
1) Pre-processing
   - Construct a matrix representation of the graph
2) Decomposition
   - Compute eigenvalues and eigenvectors of the matrix
   - Map each point to a lower-dimensional representation based on one or more eigenvectors
3) Grouping
   - Assign points to two or more clusters, based on the new representation

But first, let's define the problem.

11/15/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu
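The three stages can be sketched end to end as follows. This is a minimal illustration, not the course's reference implementation: it assumes NumPy, assumes the Laplacian (degree matrix minus adjacency matrix) as the matrix representation, and assumes a sign-based split on the eigenvector of the second-smallest eigenvalue. The function name `spectral_bipartition` and the toy graph (two triangles joined by one edge) are hypothetical.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split the nodes of an undirected graph into two groups (illustrative sketch)."""
    A = np.asarray(adjacency, dtype=float)

    # 1) Pre-processing: build a matrix representation (here: Laplacian = D - A).
    D = np.diag(A.sum(axis=1))
    L = D - A

    # 2) Decomposition: eigh handles symmetric matrices and returns
    #    eigenvalues in ascending order, eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    embedding = eigenvectors[:, 1]  # one-dimensional representation of each node

    # 3) Grouping: assign nodes to clusters by the sign of their coordinate.
    return embedding >= 0

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

labels = spectral_bipartition(A)
print(labels)
```

Which triangle ends up labelled `True` depends on the arbitrary sign of the eigenvector; only the grouping itself is meaningful.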

- Undirected graph $G(V, E)$
- Bi-partitioning task:
  - Divide vertices into two disjoint groups $A$, $B$
- Questions:
  - How can we define a "good" partition of $G$?
  - How can we efficiently identify such a partition?

[Figure: example graph on nodes 1-6, shown unpartitioned and split into groups A and B]

- What makes a good partition?
  - Maximize the number of within-group connections
  - Minimize the number of between-group connections

[Figure: the example graph on nodes 1-6 split into groups A and B]

- Express partitioning objectives as a function of the "edge cut" of the partition
- Cut: set of edges with exactly one endpoint in each group:
  $cut(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$
  - If the graph is weighted, $w_{ij}$ is the edge weight; otherwise all $w_{ij} = 1$

[Figure: the example graph split into groups A and B, with $cut(A, B) = 2$]
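As a small worked example of the cut, the sketch below sums the weights of edges that cross the partition. The edge list is read off the adjacency matrix of the 6-node example graph shown later in the slides; the grouping $A = \{1, 2, 3\}$ and the helper name `cut_weight` are assumptions made for illustration.

```python
def cut_weight(edges, group_a):
    """Sum the weights of edges with exactly one endpoint in group_a."""
    total = 0
    for u, v, w in edges:
        # An edge crosses the partition when its endpoints fall on different sides.
        if (u in group_a) != (v in group_a):
            total += w
    return total

# Unweighted example graph: every edge gets weight 1.
edges = [(1, 2, 1), (1, 3, 1), (1, 5, 1), (2, 3, 1),
         (3, 4, 1), (4, 5, 1), (4, 6, 1), (5, 6, 1)]
group_a = {1, 2, 3}  # assumed grouping; B is the remaining nodes {4, 5, 6}

cut_ab = cut_weight(edges, group_a)
print(cut_ab)  # edges (1,5) and (3,4) cross, so this prints 2
```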

- Criterion: Minimum-cut
  - Minimize the weight of connections between groups: $\arg\min_{A,B} cut(A, B)$
- Degenerate case: the minimum cut can differ from the "optimal" cut
- Problem:
  - Only considers external cluster connections
  - Does not consider internal cluster connectivity

[Figure: degenerate case contrasting the "optimal" cut with the minimum cut]

- Criterion: Conductance [Shi-Malik, '97]
  - Connectivity between groups relative to the density of each group:
    $\phi(A, B) = \frac{cut(A, B)}{\min(vol(A), vol(B))}$
  - $vol(A)$: total weighted degree of the nodes in $A$: $vol(A) = \sum_{i \in A} k_i$ (number of edge endpoints in $A$)
- Why use this criterion?
  - Produces more balanced partitions
- How do we efficiently find a good partition?
  - Problem: Computing the optimal cut is NP-hard
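To see why conductance favors balanced partitions, the sketch below compares two partitions of the 6-node example graph that have the same cut weight. It assumes the standard definition of conductance as the cut divided by the smaller of the two volumes; the helper names and the two groupings are hypothetical.

```python
def cut_weight(edges, group):
    """Total weight of edges with exactly one endpoint in group."""
    return sum(w for u, v, w in edges if (u in group) != (v in group))

def volume(edges, group):
    """Total weighted degree of the nodes in group (edge endpoints inside it)."""
    return sum(w * ((u in group) + (v in group)) for u, v, w in edges)

def conductance(edges, group_a, nodes):
    """Cut weight divided by the smaller of the two group volumes."""
    group_b = set(nodes) - set(group_a)
    return cut_weight(edges, group_a) / min(volume(edges, group_a),
                                            volume(edges, group_b))

nodes = {1, 2, 3, 4, 5, 6}
edges = [(1, 2, 1), (1, 3, 1), (1, 5, 1), (2, 3, 1),
         (3, 4, 1), (4, 5, 1), (4, 6, 1), (5, 6, 1)]

balanced = conductance(edges, {1, 2, 3}, nodes)  # cut 2, volumes 8 and 8
singleton = conductance(edges, {2}, nodes)       # cut 2, volumes 2 and 14
print(balanced, singleton)
```

Both partitions cut two edges, so minimum-cut cannot tell them apart, while conductance scores the balanced split at 0.25 and the single-node split at 1.0.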

- $A$: adjacency matrix of undirected $G$
  - $A_{ij} = 1$ if $(i, j)$ is an edge, else 0
- $x$ is a vector in $\mathbb{R}^n$ with components $(x_1, \dots, x_n)$
  - Think of it as a label/value of each node of $G$
- What is the meaning of $A \cdot x$?
  $y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$
- Entry $y_i$ is a sum of labels $x_j$ of neighbors of $i$
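The neighbor-sum reading of $A \cdot x$ can be checked numerically. This sketch assumes NumPy and uses the adjacency matrix of the 6-node example graph from the later slides; the label vector is a hypothetical choice in which node $i$ carries the value $i$.

```python
import numpy as np

# Adjacency matrix of the 6-node example graph (nodes 1..6 are rows 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

# Hypothetical labels: node i simply carries the value i.
x = np.array([1, 2, 3, 4, 5, 6])

# Matrix-vector product.
y = A @ x

# The same quantity written out as an explicit sum over each node's neighbors.
y_neighbors = np.array([sum(x[j] for j in range(6) if A[i, j]) for i in range(6)])

print(y)            # node 1 has neighbors 2, 3, 5, so y[0] = 2 + 3 + 5 = 10
print(y_neighbors)
```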

- $j$-th coordinate of $A \cdot x$:
  - Sum of the $x$-values of neighbors of $j$
  - Make this a new value at node $j$
  $A \cdot x = \lambda \cdot x$
- Spectral Graph Theory:
  - Analyze the "spectrum" of the matrix representing $G$
  - Spectrum: eigenvectors $x_i$ of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$: $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$
- Note: We sort $\lambda_i$ in ascending (not descending) order!

- Suppose all nodes in $G$ have degree $d$ and $G$ is connected
- What are some eigenvalues/vectors of $G$? $A \cdot x = \lambda \cdot x$. What is $\lambda$? What is $x$?
  - Let's try: $x = (1, 1, \dots, 1)$
  - Then: $A \cdot x = (d, d, \dots, d) = \lambda \cdot x$. So: $\lambda = d$
  - We found an eigenpair of $G$: $x = (1, 1, \dots, 1)$, $\lambda = d$
- $d$ is the largest eigenvalue of $A$ (see next slide)
- Remember the meaning of $y = A \cdot x$: $y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$
- Note, this is just one eigenpair. An $n \times n$ matrix can have up to $n$ eigenpairs.
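A quick numerical check of this eigenpair, assuming NumPy and using a 6-node cycle as a hypothetical example of a connected $d$-regular graph with $d = 2$ (the helper `ring_adjacency` is illustrative):

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of an n-node cycle, a connected 2-regular graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1
        A[(i + 1) % n, i] = 1
    return A

A = ring_adjacency(6)
d = 2
x = np.ones(6)

# Every node sums the labels of its d neighbors, all equal to 1.
Ax = A @ x
print(Ax)  # each entry equals d, so A x = d x
```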

Details!
- $G$ is $d$-regular connected, $A$ is its adjacency matrix
- Claim:
  - (1) $d$ has multiplicity of 1 (there is only 1 eigenvector associated with eigenvalue $d$)
  - (2) $d$ is the largest eigenvalue of $A$
- Proof:
  - To obtain $d$ we needed $x_i = x_j$ for every $i, j$
  - This means $x = c \cdot (1, 1, \dots, 1)$ for some const. $c$
  - Define: set $S$ = nodes $i$ with maximum value of $x_i$
  - Then consider some vector $y$ which is not a multiple of vector $(1, \dots, 1)$. So not all nodes $i$ (with labels $y_i$) are in $S$
  - Consider some node $j \in S$ and a neighbor $i \notin S$; then node $j$ gets a value $(A \cdot y)_j$ strictly less than $d \cdot y_j$
  - So $y$ is not an eigenvector for eigenvalue $d$! And so $d$ is the largest eigenvalue!
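The two claims can be sanity-checked on a small case. This sketch assumes NumPy and uses the complete graph on 5 nodes as a hypothetical connected $d$-regular graph with $d = 4$; it illustrates the claim on one example and is not a substitute for the proof.

```python
import numpy as np

n = 5
d = n - 1

# Complete graph on n nodes: every pair of distinct nodes is connected.
A = np.ones((n, n)) - np.eye(n)

# eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(A)
largest = eigenvalues[-1]

# Count how many eigenvalues coincide with d (up to floating-point tolerance).
multiplicity = int(np.sum(np.isclose(eigenvalues, d)))

print(eigenvalues)
print(largest, multiplicity)
```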

- What if $G$ is not connected?
  - $G$ has 2 components $B$ and $C$, each $d$-regular
- What are some eigenvectors?
  - $x$ = put all 1s on $C$ and 0s on $B$, or vice versa
  - $x' = (1, \dots, 1, 0, \dots, 0)^T$ (the first $|B|$ entries are 1), then $A \cdot x' = (d, \dots, d, 0, \dots, 0)^T$
  - $x'' = (0, \dots, 0, 1, \dots, 1)^T$ (the last $|C|$ entries are 1), then $A \cdot x'' = (0, \dots, 0, d, \dots, d)^T$
  - And so in both cases the corresponding $\lambda = d$
- A bit of intuition:
  - $B$ and $C$ disconnected: $\lambda_n = \lambda_{n-1}$
  - $B$ and $C$ joined by a few edges: the 2nd largest eigenvalue $\lambda_{n-1}$ now has a value very close to $\lambda_n$, so $\lambda_n - \lambda_{n-1} \approx 0$

[Figure: components B and C, disconnected (left) and joined by a few edges (right)]
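The eigenvalue gap can be observed numerically. This sketch assumes NumPy and a hypothetical graph made of two 10-node cliques (each 9-regular), first disconnected and then joined by a single bridge edge. Note that the bridge makes two nodes slightly non-regular, so this illustrates the intuition only.

```python
import numpy as np

def clique(m):
    """Adjacency matrix of a complete graph on m nodes ((m-1)-regular)."""
    return np.ones((m, m)) - np.eye(m)

m = 10
d = m - 1
zeros = np.zeros((m, m))

# Two disconnected d-regular components B (first m nodes) and C (last m nodes).
A_disconnected = np.block([[clique(m), zeros], [zeros, clique(m)]])

# Indicator vector of B is an eigenvector with eigenvalue d.
x_prime = np.concatenate([np.ones(m), np.zeros(m)])
Ax_prime = A_disconnected @ x_prime

eig = np.linalg.eigvalsh(A_disconnected)  # ascending order
gap_disconnected = eig[-1] - eig[-2]

# Join the components with one bridge edge between node m-1 and node m.
A_bridged = A_disconnected.copy()
A_bridged[m - 1, m] = A_bridged[m, m - 1] = 1.0
eig_bridged = np.linalg.eigvalsh(A_bridged)
gap_bridged = eig_bridged[-1] - eig_bridged[-2]

print(gap_disconnected, gap_bridged)
```

The gap is zero for the disconnected graph and small but positive once the bridge is added.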

- More intuition: when $B$ and $C$ are joined by a few edges, the 2nd largest eigenvalue $\lambda_{n-1}$ has a value very close to $\lambda_n$ ($\lambda_n - \lambda_{n-1} \approx 0$)
  - If the graph is connected (right example) then we already know that $x_n = (1, \dots, 1)$ is an eigenvector
  - Since eigenvectors are orthogonal, the components of $x_{n-1}$ must sum to 0
    - Why? $x_n \cdot x_{n-1} = 0$, so $\sum_i x_n[i] \cdot x_{n-1}[i] = \sum_i x_{n-1}[i] = 0$
  - So we can look at the eigenvector of the 2nd largest eigenvalue and declare nodes with positive label in $C$ and negative label in $B$
  - So there is something interesting about $x_{n-1}$ … (but there is still a lot for us to figure out)
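A sketch of this sign-based split, assuming NumPy and a hypothetical connected 3-regular graph: two 4-node cliques in which one internal edge per clique is replaced by an edge across, so that every node keeps degree 3.

```python
import numpy as np
from itertools import combinations

A = np.zeros((8, 8))

# Start from two 4-node cliques: B = {0,1,2,3} and C = {4,5,6,7}.
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i, j in combinations(group, 2):
        A[i, j] = A[j, i] = 1

# Rewire: drop one edge inside each clique, add two edges across,
# which keeps every node at degree 3 and connects the graph.
for i, j in [(0, 1), (4, 5)]:
    A[i, j] = A[j, i] = 0
for i, j in [(0, 4), (1, 5)]:
    A[i, j] = A[j, i] = 1

eigenvalues, eigenvectors = np.linalg.eigh(A)  # ascending order
x_n = eigenvectors[:, -1]          # eigenvector of the largest eigenvalue
x_n_minus_1 = eigenvectors[:, -2]  # eigenvector of the 2nd largest eigenvalue

# Orthogonality to the constant vector forces the components to sum to 0.
component_sum = x_n_minus_1.sum()

positive_side = set(np.flatnonzero(x_n_minus_1 > 0).tolist())
negative_side = set(np.flatnonzero(x_n_minus_1 < 0).tolist())
print(positive_side, negative_side)
```

Which clique lands on the positive side depends on the arbitrary sign of the computed eigenvector.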

- Adjacency matrix ($A$):
  - $n \times n$ matrix
  - $A = [a_{ij}]$, $a_{ij} = 1$ if there is an edge between nodes $i$ and $j$

        1   2   3   4   5   6
    1   0   1   1   0   1   0
    2   1   0   1   0   0   0
    3   1   1   0   1   0   0
    4   0   0   1   0   1   1
    5   1   0   0   1   0   1
    6   0   0   0   1   1   0

- Important properties:
  - Symmetric matrix
  - Eigenvectors are real-valued and orthogonal

[Figure: the example graph on nodes 1-6]

- Degree matrix ($D$):
  - $n \times n$ diagonal matrix
  - $D = [d_{ii}]$, $d_{ii}$ = degree of node $i$

        1   2   3   4   5   6
    1   3   0   0   0   0   0
    2   0   2   0   0   0   0
    3   0   0   3   0   0   0
    4   0   0   0   3   0   0
    5   0   0   0   0   3   0
    6   0   0   0   0   0   2

[Figure: the example graph on nodes 1-6]

- Laplacian matrix ($L$):
  - $n \times n$ symmetric matrix
  - $L = D - A$

        1   2   3   4   5   6
    1   3  -1  -1   0  -1   0
    2  -1   2  -1   0   0   0
    3  -1  -1   3  -1   0   0
    4   0   0  -1   3  -1  -1
    5  -1   0   0  -1   3  -1
    6   0   0   0  -1  -1   2

- What is the trivial eigenpair?
  - $x = (1, \dots, 1)$, then $L \cdot x = 0$ and so $\lambda = \lambda_1 = 0$
- Important properties:
  - Eigenvalues are non-negative real numbers
  - Eigenvectors are real and orthogonal

[Figure: the example graph on nodes 1-6]
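The construction $L = D - A$ and the trivial eigenpair can be reproduced in a few lines. This sketch assumes NumPy and uses the adjacency matrix of the 6-node example graph.

```python
import numpy as np

# Adjacency matrix of the 6-node example graph (nodes 1..6 are rows 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

# Degree matrix: row sums of A on the diagonal.
D = np.diag(A.sum(axis=1))

# Laplacian.
L = D - A

# Trivial eigenpair: the all-ones vector is mapped to the zero vector.
ones = np.ones(6)
L_ones = L @ ones

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues = np.linalg.eigvalsh(L)

print(L)
print(L_ones)
print(eigenvalues)
```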

Details!
- For the Laplacian $L$:
  - (a) All eigenvalues are $\ge 0$
  - (b) $x^T L x = \sum_{ij} L_{ij} x_i x_j \ge 0$ for every $x$
  - (c) $L = N^T \cdot N$
  - That is, $L$ is positive semi-definite
- Proof: (the 3 facts are saying the same thing)
  - (c) $\Rightarrow$ (b): $x^T L x = x^T N^T N x = (N x)^T (N x) \ge 0$
    - As it is just the square of the length of $N x$
  - (b) $\Rightarrow$ (a): Let $\lambda$ be an eigenvalue of $L$ with eigenvector $x$. Then by (b) $x^T L x \ge 0$, so $x^T L x = x^T \lambda x = \lambda x^T x \Rightarrow \lambda \ge 0$
  - (a) $\Rightarrow$ (c): is also easy! Do it yourself.
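Fact (c) can be made concrete with one possible choice of $N$. The sketch below assumes NumPy and assumes that $N$ is the oriented edge-node incidence matrix (one row per edge, $+1$ at one endpoint and $-1$ at the other); the slides do not spell out $N$, so this is an illustrative choice. The test vector `x` is arbitrary.

```python
import numpy as np

# Edges of the 6-node example graph, with nodes 1..6 renumbered 0..5.
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

# Assumed form of N: oriented incidence matrix, one row per edge.
N = np.zeros((len(edges), n))
for row, (i, j) in enumerate(edges):
    N[row, i] = 1
    N[row, j] = -1

# Laplacian built directly as D - A, for comparison with N^T N.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# (c) => (b): the quadratic form is the squared length of N x.
x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, -1.0])  # arbitrary test vector
quadratic_form = x @ L @ x
squared_length = np.sum((N @ x) ** 2)

# (a): the smallest eigenvalue is non-negative (up to rounding).
min_eigenvalue = np.linalg.eigvalsh(L)[0]

print(quadratic_form, squared_length, min_eigenvalue)
```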

- Fact: For symmetric matrix $M$ (see next slide for a proof):
  $\lambda_2 = \min_{x :\, x^T w_1 = 0} \frac{x^T M x}{x^T x}$
  ($w_1$ is the eigenvector corresponding to $\lambda_1$)
- What is the meaning of $\min x^T L x$ on $G$?
  - $x^T L x = \sum_{i,j=1}^{n} L_{ij} x_i x_j = \sum_{i,j=1}^{n} (D_{ij} - A_{ij}) x_i x_j$
  - $= \sum_i D_{ii} x_i^2 - \sum_{(i,j) \in E} 2 x_i x_j$
  - $= \sum_{(i,j) \in E} (x_i^2 + x_j^2 - 2 x_i x_j) = \sum_{(i,j) \in E} (x_i - x_j)^2$
  - Node $i$ has degree $d_i$. So, value $x_i^2$ needs to be summed up $d_i$ times. But each edge $(i, j)$ has two endpoints so we need $x_i^2 + x_j^2$
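The identity $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ and the variational characterization of $\lambda_2$ can both be checked on the 6-node example graph. This sketch assumes NumPy; the label vector `x` is a hypothetical choice in which node $i$ carries the value $i$.

```python
import numpy as np

# Edges of the 6-node example graph, with nodes 1..6 renumbered 0..5.
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# Hypothetical labels: node i carries the value i (1..6).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Left-hand side: quadratic form. Right-hand side: sum over edges.
quadratic_form = x @ L @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)

# The eigenvector of the 2nd smallest eigenvalue attains the minimum
# of the Rayleigh quotient among vectors orthogonal to w_1 = (1, ..., 1).
eigenvalues, eigenvectors = np.linalg.eigh(L)  # ascending order
x2 = eigenvectors[:, 1]
rayleigh = (x2 @ L @ x2) / (x2 @ x2)

print(quadratic_form, edge_sum)
print(rayleigh, eigenvalues[1])
```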
