CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 3
Degree distribution: P(k) Path length: h Clustering coefficient: C Connected components: s
Definitions will be presented for undirected graphs, sometimes we will explicitly mention extensions to directed graphs, and sometimes extensions will be obvious
¡ Degree distribution P(k): Probability that a randomly chosen node has degree k
Nk = # nodes with degree k
¡ Normalized histogram:
P(k) = Nk / N ➔ plot
[Plot: degree distribution histogram, P(k) vs. k]
For directed graphs we have separate in- and out-degree distributions.
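As a minimal sketch (pure Python; graphs stored as a dict of neighbor sets, and the toy graph is a made-up example), the normalized histogram P(k) = Nk / N can be computed as:

```python
from collections import Counter

def degree_distribution(adj):
    """P(k) = N_k / N from an adjacency structure {node: set_of_neighbors}."""
    n = len(adj)
    counts = Counter(len(neigh) for neigh in adj.values())   # N_k per degree k
    return {k: nk / n for k, nk in sorted(counts.items())}

# Hypothetical toy graph: a path A-B-C plus an isolated node D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
print(degree_distribution(adj))  # {0: 0.25, 1: 0.5, 2: 0.25}
```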
¡ A path is a sequence of nodes in which each node is linked to the next one
¡ A path can intersect itself and pass through the same edge multiple times
§ E.g.: ACBDCDEG
Pn = {i0, i1, i2, ..., in}
Pn = {(i0,i1), (i1,i2), (i2,i3), ..., (in−1,in)}
[Figure: example graph with nodes A–H]
¡ Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
§ If the two nodes are not connected, the distance is usually defined as infinite (or zero)
¡ In directed graphs, paths need to follow the direction of the arrows
§ Consequence: Distance is not symmetric: hB,C ≠ hC,B
[Figure: undirected graph with hB,D = 2 and hA,X = ∞ (X is in a different component); directed graph with hB,C = 1 but hC,B = 2]
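The BFS computation behind such distances can be sketched as follows (the directed toy graph is hypothetical; `inf` encodes "not connected"):

```python
from collections import deque

def bfs_distance(adj, src, dst):
    """Shortest-path (geodesic) distance via BFS; inf if dst is unreachable."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return float("inf")

# Hypothetical directed 3-cycle: distance is not symmetric.
adj = {"B": ["C"], "C": ["A"], "A": ["B"]}
print(bfs_distance(adj, "B", "C"))  # 1
print(bfs_distance(adj, "C", "B"))  # 2
print(bfs_distance(adj, "A", "X"))  # inf
```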
¡ Diameter: The maximum (shortest path) distance between any pair of nodes in a graph
¡ Average path length for a connected graph or a strongly connected directed graph
§ Many times we compute the average only over the connected pairs of nodes (that is, we ignore “infinite” length paths)
§ Note that this measure also applies to (strongly) connected components of a graph
h̄ = (1 / (2·Emax)) · Σ_{i≠j} h_ij

- h_ij is the distance from node i to node j
- Emax is the max number of edges (total number of node pairs) = n(n−1)/2
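The formula above can be checked with a small brute-force sketch (BFS from every node; disconnected pairs are ignored, as the slide notes; the path graph is a made-up example):

```python
from collections import deque
from itertools import combinations

def avg_path_length(adj):
    """Average h_ij over connected node pairs (infinite distances ignored)."""
    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist
    total, pairs = 0, 0
    for i, j in combinations(adj, 2):
        d = bfs(i).get(j)
        if d is not None:          # skip "infinite" (disconnected) pairs
            total += d
            pairs += 1
    return total / pairs

# Path graph 1-2-3-4: pair distances are 1,2,3,1,2,1 -> mean 10/6
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(avg_path_length(adj))
```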
¡ Clustering coefficient (for undirected graphs):
§ How connected are i’s neighbors to each other?
§ Node i with degree ki
§ Ci = 2ei / (ki(ki−1)), Ci ∈ [0, 1]
¡ Average clustering coefficient:
where ei is the number of edges between the neighbors of node i

C = (1/N) · Σ_i Ci

Clustering coefficient is undefined (or defined to be 0) for nodes with degree 0 or 1. Note ki(ki−1)/2 is the max number of edges between the ki neighbors.
¡ Clustering coefficient (for undirected graphs):
§ How connected are i’s neighbors to each other?
§ Node i with degree ki
§ Ci = 2ei / (ki(ki−1))
where ei is the number of edges between the neighbors of node i

[Figure: example graph with nodes A–H]
kB = 2, eB = 1, CB = 2/2 = 1
kD = 4, eD = 2, CD = 4/12 = 1/3
- Avg. clustering: C=0.33
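The worked example generalizes to this small sketch (the toy graph is hypothetical; nodes of degree < 2 get Ci = 0 by the convention above):

```python
def clustering(adj, i):
    """C_i = 2*e_i / (k_i*(k_i-1)); 0 for nodes of degree 0 or 1 by convention."""
    neigh = adj[i]
    k = len(neigh)
    if k < 2:
        return 0.0
    # Each edge between neighbors is seen from both ends, hence // 2.
    e = sum(1 for u in neigh for v in adj[u] if v in neigh) // 2
    return 2 * e / (k * (k - 1))

# Hypothetical graph: triangle A-B-C plus pendant node D attached to B.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(clustering(adj, "A"))  # 1.0  (its two neighbors B, C are connected)
print(clustering(adj, "B"))  # 0.333... (only A-C among the 3 neighbor pairs)
print(clustering(adj, "D"))  # 0.0  (degree 1: undefined, taken as 0)
```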
¡ Size of the largest connected component
§ Largest set where any two vertices can be joined by a path
¡ Largest component = Giant component
How to find connected components:
- Start from random node and perform
Breadth First Search (BFS)
- Label the nodes that BFS visits
- If all nodes are visited, the network is connected
- Otherwise find an unvisited node and repeat BFS
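The repeated-BFS procedure above can be sketched as (toy graph is a made-up example):

```python
from collections import deque

def connected_components(adj):
    """Repeated BFS: label visited nodes, restart from any unvisited node."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

# Two 2-node components plus an isolated node.
adj = {1: [2], 2: [1], 3: [4], 4: [3], 5: []}
comps = connected_components(adj)
print(max(len(c) for c in comps))  # size of the largest component: 2
```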
[Figure: example graph with several connected components]
Degree distribution: P(k) Path length: h Clustering coefficient: C Connected components: s
MSN Messenger:
¡ 1 month of activity
§ 245 million users logged in
§ 180 million users engaged in conversations
§ More than 30 billion conversations
§ More than 255 billion exchanged messages
Network: 180M people, 1.3B edges
[Figure: MSN contact network vs. conversation network]
Messaging as an undirected graph:
- Edge (u,v) if users u and v exchanged at least 1 msg
- N = 180 million people
- E = 1.3 billion edges
Note: We plotted the same data as on the previous slide, just the axes are now logarithmic.
Ck: average Ci of nodes i of degree k:

Ck = (1/Nk) · Σ_{i : ki = k} Ci
- Avg. clustering of the MSN: C = 0.1140
Number of links between pairs of nodes in the largest connected component
- Avg. path length 6.6
90% of the nodes can be reached in < 8 hops
Steps  #Nodes
0      1
1      10
2      78
3      3,96
4      8,648
5      3,299,252
6      28,395,849
7      79,059,497
8      52,995,778
9      10,321,008
10     1,955,007
11     518,410
12     149,945
13     44,616
14     13,740
15     4,476
16     1,542
17     536
18     167
19     71
20     29
21     16
22     10
23     3
24     2
25     3
# nodes as we do BFS out of a random node
Degree distribution: heavily skewed; avg. degree = 14.4
Path length: 6.6
Clustering coefficient: 0.11
Connectivity: giant component

Are these values “expected”? Are they “surprising”? To answer this we need a model!
- a. Undirected network: N = 2,018 proteins as nodes, E = 2,930 binding interactions as links
- b. Degree distribution: skewed; average degree <k> = 2.90
- c. Diameter: avg. path length = 5.8
- d. Clustering: avg. clustering = 0.12
- Connectivity: 185 components; the largest component has 1,647 nodes (81% of nodes)
¡ Erdös-Renyi Random Graphs [Erdös-Renyi, ‘60]
¡ Two variants:
§ Gnp: undirected graph on n nodes where each edge (u,v) appears i.i.d. with probability p
§ Gnm: undirected graph with n nodes, and m edges picked uniformly at random
What kind of networks do such models produce?
¡ n and p do not uniquely determine the graph!
§ The graph is a result of a random process
¡ We can have many different realizations given the same n and p
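A direct sketch of the Gnp process (each possible edge flipped i.i.d.; the seeds are arbitrary, chosen only to illustrate that the same n and p yield different realizations):

```python
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """Sample G_np: each of the n(n-1)/2 possible edges appears i.i.d. w.p. p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Same n and p, two different realizations of the random process:
g1 = gnp(10, 1/6, seed=0)
g2 = gnp(10, 1/6, seed=1)
print(sum(len(s) for s in g1.values()) // 2,
      sum(len(s) for s in g2.values()) // 2)   # edge counts of each realization
```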
[Figure: several realizations of Gnp with n = 10, p = 1/6]
Degree distribution: P(k) Path length: h Clustering coefficient: C What are the values of these properties for Gnp?
¡ Fact: Degree distribution of Gnp is binomial.
¡ Let P(k) denote the fraction of nodes with degree k:
P(k) = C(n−1, k) · p^k · (1−p)^(n−1−k)

- C(n−1, k): select k nodes out of n−1
- p^k: probability of having k edges
- (1−p)^(n−1−k): probability of missing the rest of the n−1−k edges
Mean, variance of a binomial distribution:

k̄ = p(n−1)
σ² = p(1−p)(n−1)
σ/k̄ = [((1−p)/p) · 1/(n−1)]^(1/2) ≈ 1/(n−1)^(1/2)

By the law of large numbers, as the network size increases, the distribution becomes increasingly narrow; we are increasingly confident that the degree of a node is in the vicinity of k̄.
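A quick numeric check of the binomial degree distribution and its mean (n = 50 and p = 0.1 are arbitrary illustration values):

```python
import math

def binom_pk(n, p, k):
    """P(k) = C(n-1, k) * p^k * (1-p)^(n-1-k): degree distribution of G_np."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n, p = 50, 0.1
# The pmf sums to 1, and its mean matches k_bar = p*(n-1) = 4.9.
print(sum(binom_pk(n, p, k) for k in range(n)))      # ~1.0
print(sum(k * binom_pk(n, p, k) for k in range(n)))  # ~4.9
```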
¡ Remember: Ci = 2ei / (ki(ki−1)), where ei is the number of edges between i’s neighbors
¡ Edges in Gnp appear i.i.d. with prob. p
¡ So, expected E[ei] is:

E[ei] = p · ki(ki−1)/2

(ki(ki−1)/2 is the number of distinct pairs of neighbors of node i of degree ki; each pair is connected with prob. p)

¡ Then E[Ci]:

E[Ci] = 2·E[ei] / (ki(ki−1)) = p = k̄/(n−1) ≈ k̄/n

Clustering coefficient of a random graph is small. If we generate bigger and bigger graphs with fixed avg. degree k̄ (that is, we set p = k̄ · 1/n), then C decreases with the graph size n.
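A small Monte Carlo check that the average clustering coefficient of Gnp is ≈ p (n = 400 and p = 0.05 are arbitrary illustration values):

```python
import random
from itertools import combinations

def avg_clustering_gnp(n, p, seed=0):
    """Sample G_np, then average C_i = 2 e_i / (k_i (k_i - 1)) over nodes with k_i >= 2."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    cs = []
    for i in adj:
        k = len(adj[i])
        if k < 2:
            continue   # C_i taken as undefined for degree 0 or 1
        e = sum(1 for u, v in combinations(sorted(adj[i]), 2) if v in adj[u])
        cs.append(2 * e / (k * (k - 1)))
    return sum(cs) / len(cs)

print(avg_clustering_gnp(400, 0.05))  # close to p = 0.05
```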
Degree distribution: P(k) = C(n−1, k) · p^k · (1−p)^(n−1−k)
Clustering coefficient: C = p = k̄/n
Path length: next!
Connectivity:
¡ Graph G(V, E) has expansion α if ∀ S ⊆ V:
# of edges leaving S ≥ α · min(|S|, |V\S|)
¡ Or equivalently:

α = min_{S⊆V} (# of edges leaving S) / min(|S|, |V\S|)

[Figure: a node set S and its complement V\S]
¡ Fact: In a graph on n nodes with expansion α, for all pairs of nodes, there is a path of length O((log n)/α).
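Expansion can be computed by brute force on tiny graphs (exponential in n, so illustration only; the complete graph K4 is a made-up example):

```python
from itertools import combinations

def expansion(adj):
    """alpha = min over nonempty S with |S| <= |V|/2 of (# edges leaving S) / |S|.
    Restricting to |S| <= |V|/2 is enough: the cut and min(|S|,|V\\S|) are
    symmetric under complementation."""
    nodes = list(adj)
    n = len(nodes)
    best = float("inf")
    for r in range(1, n // 2 + 1):
        for subset in combinations(nodes, r):
            S = set(subset)
            cut = sum(1 for u in S for v in adj[u] if v not in S)
            best = min(best, cut / len(S))
    return best

# K4: |S|=1 gives cut 3 -> ratio 3; |S|=2 gives cut 4 -> ratio 2; alpha = 2
adj = {i: [j for j in range(4) if j != i] for i in range(4)}
print(expansion(adj))  # 2.0
```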
¡ Random graph Gnp:
For log n > np > c, diam(Gnp) = O(log n / log (np))
§ Random graphs have good expansion so it takes a logarithmic number of steps for BFS to visit all nodes
[Figure: expansion: a set of S nodes has at least α·S edges leaving it]
Erdös-Renyi Random Graph can grow very large but nodes will be just a few hops apart
[Plot: average shortest path length vs. number of nodes, for n from 200,000 to 1,000,000]
Here n · p = constant. That is, avg. degree k̄ is constant.
Degree distribution: P(k) = C(n−1, k) · p^k · (1−p)^(n−1−k)
Path length: O(log n)
Clustering coefficient: C = p = k̄/n
Connected components: next!
¡ Graph structure of Gnp as p changes: ¡ Emergence of a giant component:
- avg. degree k=2E/n or p=k/(n-1)
§ k̄ = 1 − ε: all components are of size O(log n)
§ k̄ = 1 + ε: 1 component of size Ω(n), the others have size O(log n)
§ Each node has at least one edge in expectation
Graph structure of Gnp as p grows from 0 (empty graph) to 1 (complete graph):

- p = 1/(n−1): avg. deg = 1; giant component appears
- p = c/(n−1): avg. deg constant; lots of isolated nodes
- p = log(n)/(n−1): fewer isolated nodes
- p = 2·log(n)/(n−1): no isolated nodes
¡ Gnp, n=100,000, k=p(n-1) = 0.5 … 3
[Plot: fraction of nodes in the largest component vs. k̄ = p·(n−1); the giant component emerges at p·(n−1) = 1]
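The phase transition can be reproduced with a small simulation (n = 2000 and the k̄ values are arbitrary illustration choices):

```python
import random
from collections import deque
from itertools import combinations

def largest_component_fraction(n, k_avg, seed=0):
    """Fraction of nodes in the largest component of G_np with p = k_avg/(n-1)."""
    p = k_avg / (n - 1)
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, best = [False] * n, 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        q, size = deque([s]), 1
        while q:
            u = q.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    size += 1
                    q.append(v)
        best = max(best, size)
    return best / n

# Below k_avg = 1 all components stay tiny; above 1 a giant component emerges.
print(largest_component_fraction(2000, 0.5))  # subcritical: small fraction
print(largest_component_fraction(2000, 3.0))  # supercritical: giant component
```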
                       MSN              Gnp (n = 180M)
Degree distribution:   heavily skewed   binomial
Avg. path length:      6.6              O(log n); h ≈ 8.2
Avg. clustering coef.: 0.11             k̄/n ≈ 8·10⁻⁸
Largest conn. comp.:   99%              GCC exists when k̄ > 1; here k̄ ≈ 14
¡ Are real networks like random graphs?
§ Giant connected component: ☺
§ Average path length: ☺
§ Clustering coefficient: ☹
§ Degree distribution: ☹
¡ Problems with the random networks model:
§ Degree distribution differs from that of real networks
§ Giant component in most real networks does NOT emerge through a phase transition
§ No local structure – clustering coefficient is too low
¡ Most important: Are real networks random?
§ The answer is simply: NO!
¡ If Gnp is wrong, why did we spend time on it?
§ It is the reference model for the rest of the class
§ It will help us calculate many quantities that can then be compared to the real data
§ It will help us understand to what degree a particular property is the result of some random process
So, while Gnp is WRONG, it will turn out to be extremely USEFUL!
Can we have high clustering while also having short paths?
[Figure: lattice with high clustering coefficient and high diameter vs. random graph with low clustering coefficient and low diameter]
¡ MSN network has 7 orders of magnitude larger clustering than the corresponding Gnp!
¡ Other examples:
h … average shortest path length; C … average clustering coefficient; “actual” … real network; “random” … random graph with same avg. degree

- Actor collaborations (IMDB): N = 225,226 nodes, avg. degree k = 61
- Electrical power grid: N = 4,941 nodes, k = 2.67
- Network of neurons (C. elegans): N = 282 nodes, k = 14

Network       h_actual   h_random   C_random
Film actors   3.65       2.99       0.00027
Power Grid    18.70      12.40      0.005
C. elegans    2.65       2.25       0.05
¡ Consequence of expansion:
§ Short paths: O(log n)
§ This is the smallest diameter we can get if we keep the degree constant.
§ But clustering is low!
¡ But networks have “local” structure:
§ Triadic closure: friend of a friend is my friend
§ High clustering but diameter is also high
¡ How can we have both?
[Figure: random graph (low diameter, low clustering coefficient) vs. lattice (high clustering coefficient, high diameter)]
¡ Could a network with high clustering also be a small world (have log n diameter)?
§ How can we at the same time have high clustering and small diameter?
§ Clustering implies edge “locality”
§ Randomness enables “shortcuts”
Small-World Model [Watts-Strogatz ‘98] Two components to the model:
¡ (1) Start with a low-dimensional regular lattice
§ (In our case we are using a ring as a lattice)
§ Has high clustering coefficient
¡ (2) Rewire: Introduce randomness (“shortcuts”)
§ Add/remove edges to create shortcuts to join remote parts of the lattice
§ For each edge, with prob. p, move the other endpoint to a random node
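The two steps can be sketched as follows (one common variant, with made-up parameters: the far endpoint of an edge is rewired with probability p, and the rare duplicate edge is resolved by redrawing, so the edge count stays n·k/2):

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice: each node linked to its k nearest neighbors (k even);
    then each edge's far endpoint is rewired to a random node with prob. p."""
    rng = random.Random(seed)
    base = [(i, (i + j) % n) for i in range(n) for j in range(1, k // 2 + 1)]
    edges = set()
    for u, v in base:
        e = frozenset((u, v))
        if rng.random() < p or e in edges:
            # rewire (also used to resolve the rare duplicate edge)
            w = rng.randrange(n)
            while w == u or frozenset((u, w)) in edges:
                w = rng.randrange(n)
            e = frozenset((u, w))
        edges.add(e)
    return edges

g = watts_strogatz(20, 4, 0.1, seed=1)
print(len(g))  # exactly n*k/2 = 40 distinct edges
```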
[Watts-Strogatz, ‘98]
[Watts-Strogatz, ‘98]

(1) Regular lattice: high clustering, high diameter; C = 3/4, h ≈ N/(2k)
(2) Rewired small world: high clustering, low diameter
(3) Random graph: low clustering, low diameter; C = k/N, h ≈ log N / log k

Rewiring allows us to “interpolate” between a regular lattice and a random graph
[Plot: clustering coefficient and (scaled) average path length vs. rewiring probability p; there is a parameter region of high clustering and low path length]

Intuition: It takes a lot of randomness to ruin the clustering, but a very small amount to create shortcuts.
¡ Could a network with high clustering be at the same time a small world?
§ Yes! You don’t need more than a few random links
¡ The Watts-Strogatz model:
§ Provides insight on the interplay between clustering and the small-world effect
§ Captures the structure of many realistic networks
§ Accounts for the high clustering of real networks
§ Does not lead to the correct degree distribution
Generating large realistic graphs
¡ How can we think of network structure recursively? Intuition: Self-similarity
§ Object is similar to a part of itself: the whole has the same shape as one or more of the parts
¡ Mimic recursive graph/community growth
¡ Kronecker product is a way of generating self-similar matrices
[Figure: initial graph and its recursive expansion]
¡ Kronecker graphs:
§ A recursive model of network structure
[PKDD ‘05]

[Figure: initiator K1 (3 x 3 adjacency matrix) and its Kronecker powers (9 x 9 and 81 x 81 adjacency matrices)]
¡ Kronecker product of matrices B and C is given by

B ⊗ C =
[ b11·C  b12·C  …  b1m·C
  b21·C  b22·C  …  b2m·C
  …
  bn1·C  bn2·C  …  bnm·C ]

If B is N x M and C is K x L, then B ⊗ C is (N·K) x (M·L).

¡ Define a Kronecker product of two graphs as a Kronecker product of their adjacency matrices
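The definition can be written out directly for matrices stored as lists of lists (pure Python sketch; the 2 x 2 initiator is a made-up example):

```python
def kron(B, C):
    """Kronecker product of two matrices given as lists of lists."""
    nb, mb = len(B), len(B[0])
    nc, mc = len(C), len(C[0])
    # Entry (i, j) of B ⊗ C is B[i // nc][j // mc] * C[i % nc][j % mc].
    return [[B[i // nc][j // mc] * C[i % nc][j % mc]
             for j in range(mb * mc)]
            for i in range(nb * nc)]

K1 = [[1, 1], [1, 0]]
K2 = kron(K1, K1)     # (2*2) x (2*2) = 4 x 4
print(K2)
# [[1, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
```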
¡ Kronecker graph is obtained by growing a sequence of graphs by iterating the Kronecker product over the initiator matrix K1:
¡ Note: One can easily use multiple initiator matrices (K1’, K1’’, K1’’’) (even of different sizes)
[PKDD ‘05]

Km = Km−1 ⊗ K1 = K1 ⊗ K1 ⊗ … ⊗ K1 (m times) = K1^[m]
[PKDD ‘05]
Θ2 = Θ1 ⊗ Θ1 =
0.25 0.10 0.10 0.04
0.05 0.15 0.02 0.06
0.05 0.02 0.15 0.06
0.01 0.03 0.03 0.09
¡ Create an N1 x N1 probability matrix Θ1
¡ Compute the kth Kronecker power Θk
¡ For each entry puv of Θk include an edge (u, v) in Kk with probability puv
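These three steps can be sketched directly (this is the slow "one coin per entry" generator; the initiator values in the usage example are made up):

```python
import random

def kron(B, C):
    """Kronecker product of two matrices stored as lists of lists."""
    nb, mb, nc, mc = len(B), len(B[0]), len(C), len(C[0])
    return [[B[i // nc][j // mc] * C[i % nc][j % mc]
             for j in range(mb * mc)] for i in range(nb * nc)]

def stochastic_kronecker(theta1, k, seed=0):
    """Theta_k = k-th Kronecker power of the probability matrix theta1;
    flip one biased coin per entry p_uv to include directed edge (u, v)."""
    rng = random.Random(seed)
    theta = theta1
    for _ in range(k - 1):
        theta = kron(theta, theta1)
    n = len(theta)
    return {(u, v) for u in range(n) for v in range(n)
            if rng.random() < theta[u][v]}

theta1 = [[0.9, 0.6], [0.5, 0.3]]          # hypothetical initiator
edges = stochastic_kronecker(theta1, 3)    # directed graph on 2^3 = 8 nodes
print(len(edges))
```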
Θ1 =
0.5 0.2
0.1 0.3

Θ2 = Θ1 ⊗ Θ1 → instance (adjacency) matrix K2
Kronecker multiplication, then flip biased coins: probability of edge puv

[PKDD ‘05]
¡ How do we generate an instance of a (directed) stochastic Kronecker graph?
¡ Is there a faster way? YES!
¡ Idea: Exploit the recursive structure of Kronecker graphs
§ “Drop” edges onto the graph one by one
[Figure: matrix of edge probabilities puv next to a sampled 0/1 adjacency matrix]

Flip biased coins: probability of edge puv. Need to flip n² coins!! Way too slow!!
¡ A faster way to generate Kronecker graphs
¡ How to “drop” an edge into a graph G on n = 2^m nodes:

[Figure: the adjacency matrix of G is distributed as Θ ⊗ Θ ⊗ … ⊗ Θ; to place an edge, recursively descend into one quadrant at each level]
¡ We may get a few edges colliding. We simply reinsert them.
Fast Kronecker generator algorithm:
§ For generating directed graphs
¡ Insert 1 edge into graph G on n = 2^m nodes:
§ Create normalized matrix Luv = Θuv / (∑pq Θpq)
§ Start with x = 0, y = 0
§ For i = 1 … m:
  § Pick a row/column (u, v) with prob. Luv
  § Descend into quadrant (u, v) at level i of G
  § This means: x += u · 2^(m−i), y += v · 2^(m−i)
§ Add an edge (x, y) to G
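A sketch of this recursive descent (2 x 2 initiator, quadrants weighted by Θuv/ΣΘ; the initiator values in the usage example are made up):

```python
import random

def fast_kronecker_edges(theta1, m, num_edges, seed=0):
    """Drop edges one by one into a directed graph on n = 2^m nodes:
    at each of the m levels pick a quadrant (u, v) with probability
    theta1[u][v] / sum(theta1), descend, then add the resulting edge.
    Colliding edges are simply re-dropped."""
    rng = random.Random(seed)
    total = sum(sum(row) for row in theta1)
    cells = [(u, v, theta1[u][v] / total) for u in range(2) for v in range(2)]
    edges = set()
    while len(edges) < num_edges:
        x = y = 0
        for level in range(1, m + 1):
            r, acc = rng.random(), 0.0
            for u, v, pr in cells:
                acc += pr
                if r < acc:
                    break
            # Descend into quadrant (u, v) at this level.
            x += u * 2 ** (m - level)
            y += v * 2 ** (m - level)
        edges.add((x, y))   # a collision leaves the set unchanged -> re-drop
    return edges

theta1 = [[0.9, 0.5], [0.5, 0.1]]          # hypothetical initiator
g = fast_kronecker_edges(theta1, 4, 30)    # 30 edges on 2^4 = 16 nodes
print(len(g))  # 30
```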
¡ Real and Kronecker are very close:
Θ1 =
0.99 0.54
0.49 0.13

[ICML ‘07]