http://cs224w.stanford.edu Degree distribution: P(k) Path length:

  1. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. Degree distribution: P(k). Path length: h. Clustering coefficient: C. Connected components: s. Definitions will be presented for undirected graphs; sometimes we will explicitly mention extensions to directed graphs, and sometimes the extensions will be obvious. 9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu

  3. Degree distribution $P(k)$: the probability that a randomly chosen node has degree $k$. Let $N_k$ = # nodes with degree $k$. Normalized histogram: plot $P(k) = N_k / N$. [Figure: histogram of $P(k)$ versus $k$.] For directed graphs we have separate in- and out-degree distributions.
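As a minimal sketch of the normalized histogram $P(k) = N_k / N$, the following Python assumes the graph is stored as a dictionary mapping each node to the set of its neighbors. The function name `degree_distribution` and the small star-shaped example graph are hypothetical and are not taken from the slides.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: P(k)} for an undirected graph given as {node: set_of_neighbors}."""
    n = len(adj)
    # N_k: number of nodes whose degree equals k.
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: n_k / n for k, n_k in sorted(counts.items())}

# Hypothetical example (not from the slides): a star centered on B.
example = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
print(degree_distribution(example))
```

Three of the four nodes have degree 1 and one has degree 3, so the histogram puts mass 3/4 on k = 1 and 1/4 on k = 3.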

  4. A path is a sequence of nodes in which each node is linked to the next one: $P_n = \{i_0, i_1, i_2, \dots, i_n\}$, or equivalently, as a sequence of edges, $P_n = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \dots, (i_{n-1}, i_n)\}$. A path can intersect itself and pass through the same edge multiple times. E.g.: ACBDCDEG. [Figure: example graph on nodes A, B, C, D, E, F, G, H, X.]

  5. Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes. If the two nodes are not connected, the distance is usually defined as infinite (or zero). Example: $h_{B,D} = 2$, $h_{A,X} = \infty$. [Figure: undirected example graph on nodes A, B, C, D, X.] In directed graphs, paths need to follow the direction of the arrows. Consequence: distance is not symmetric, $h_{B,C} \neq h_{C,B}$. Example: $h_{B,C} = 1$, $h_{C,B} = 2$. [Figure: directed example graph on nodes A, B, C, D.]
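The distance definition can be sketched with a breadth-first search. This is an illustrative implementation, not the course's code: the function name `distance`, the adjacency-list representation, and the three-node directed cycle are all assumptions. The cycle is chosen only so that it reproduces the asymmetry $h_{B,C} = 1$, $h_{C,B} = 2$; it is not the graph drawn on the slide.

```python
from collections import deque
import math

def distance(adj, source, target):
    """Shortest-path distance in edges; math.inf if target is unreachable.

    adj maps a node to an iterable of its out-neighbors, so the same code
    covers directed graphs (undirected graphs list each edge both ways).
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return math.inf

# Hypothetical directed cycle B -> C -> D -> B, plus an isolated node X.
directed = {"B": ["C"], "C": ["D"], "D": ["B"], "X": []}
print(distance(directed, "B", "C"), distance(directed, "C", "B"))
print(distance(directed, "B", "X"))
```

This sketch returns infinity for unreachable pairs, which is one of the two conventions the slide mentions.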

  6. Diameter: the maximum (shortest path) distance between any pair of nodes in a graph. Average path length for a connected graph or a strongly connected directed graph: $\bar{h} = \frac{1}{2 E_{\max}} \sum_{i,\, j \neq i} h_{ij}$, where $h_{ij}$ is the distance from node $i$ to node $j$ and $E_{\max}$ is the max number of edges (total number of node pairs), $E_{\max} = n(n-1)/2$. Many times we compute the average only over the connected pairs of nodes (that is, we ignore “infinite” length paths). Note that this measure also applies to (strongly) connected components of a graph.
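A small sketch of both quantities, assuming an undirected graph stored as `{node: set_of_neighbors}`. It runs one BFS per node and averages only over connected ordered pairs, which for a connected graph coincides with the formula above because there are $n(n-1) = 2E_{\max}$ ordered pairs. The helper names and the four-node path graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances (in edges) from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def diameter_and_avg_path_length(adj):
    """Return (diameter, average distance) over connected ordered pairs i != j."""
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, (total / pairs if pairs else 0.0)

# Hypothetical example: the path graph A - B - C - D.
path_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(diameter_and_avg_path_length(path_graph))
```

One BFS per node costs O(N(N + E)) overall, which is fine for small graphs but is the reason large networks are usually handled by sampling source nodes.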

  7. Clustering coefficient (for undirected graphs): how connected are $i$’s neighbors to each other? For node $i$ with degree $k_i$: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of node $i$, and $C_i \in [0, 1]$. Note that $k_i (k_i - 1)/2$ is the max number of edges between the $k_i$ neighbors. The clustering coefficient is undefined (or defined to be 0) for nodes with degree 0 or 1. Average clustering coefficient: $C = \frac{1}{N} \sum_{i}^{N} C_i$.
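The definition translates directly into code. The sketch below assumes an undirected graph stored as `{node: set_of_neighbors}` and adopts the "defined to be 0" convention for nodes of degree 0 or 1. The function names and the example graph (a triangle with one pendant node) are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """C_i = 2 * e_i / (k_i * (k_i - 1)); taken to be 0 when k_i < 2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbors of the node.
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

def average_clustering(adj):
    """C = (1/N) * sum of C_i over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical example: triangle A-B-C with a pendant node D attached to C.
triangle_plus_tail = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(clustering_coefficient(triangle_plus_tail, "C"))
print(average_clustering(triangle_plus_tail))
```

Here A and B each have $C_i = 1$, C has one edge among its three neighbors so $C_C = 2/6 = 1/3$, and D contributes 0, giving an average of 7/12.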

  8. Clustering coefficient (for undirected graphs), worked example: for node $i$ with degree $k_i$, $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of node $i$. $k_B = 2$, $e_B = 1$, $C_B = 2/2 = 1$. $k_D = 4$, $e_D = 2$, $C_D = 4/12 = 1/3$. Avg. clustering: $C = 0.33$. [Figure: example graph on nodes A, B, C, D, E, F, G, H.]

  9. Size of the largest connected component: the largest set where any two vertices can be joined by a path. Largest component = giant component. How to find connected components: (1) start from a random node and perform Breadth First Search (BFS); (2) label the nodes that BFS visits; (3) if all nodes are visited, the network is connected; (4) otherwise find an unvisited node and repeat BFS. [Figure: example graph on nodes A, B, C, D, F, G, H.]
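The BFS labeling procedure above can be sketched as follows, assuming an undirected graph stored as `{node: set_of_neighbors}`. For reproducibility this sketch picks start nodes in dictionary order rather than at random, which does not change the components found. The function name and example graph are hypothetical.

```python
from collections import deque

def connected_components(adj):
    """Return a list of components, each a list of nodes, found by repeated BFS."""
    label = {}          # node -> index of the component it belongs to
    components = []
    for start in adj:
        if start in label:
            continue    # already visited by an earlier BFS
        comp_id = len(components)
        label[start] = comp_id
        members = [start]
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in label:
                    label[nxt] = comp_id
                    members.append(nxt)
                    queue.append(nxt)
        components.append(members)
    return components

# Hypothetical example: a triangle A-B-C and a separate edge F-G.
two_parts = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "F": {"G"}, "G": {"F"},
}
parts = connected_components(two_parts)
largest = max(parts, key=len)
print(len(parts), sorted(largest))
```

The graph is connected exactly when a single component is returned; the size of the largest component is `len(largest)`.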

  10. Degree distribution: P(k). Path length: h. Clustering coefficient: C. Connected components: s.

  11. MSN Messenger, 1 month of activity: 245 million users logged in; 180 million users engaged in conversations; more than 30 billion conversations; more than 255 billion exchanged messages.

  12. [Figure-only slide; no text recovered.]

  13. Network: 180M people, 1.3B edges.

  14. Messaging as an undirected graph: edge (u,v) if users u and v exchanged at least 1 msg. N = 180 million people; E = 1.3 billion edges. [Figure labels: Contact, Conversation.]

  15. [Figure-only slide; no text recovered.]

  16. Note: We plotted the same data as on the previous slide, just the axes are now logarithmic.

  17. Avg. clustering of the MSN: $C = 0.1140$. $C_k$: average $C_i$ of nodes $i$ of degree $k$: $C_k = \frac{1}{N_k} \sum_{i:\, k_i = k} C_i$.
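As a sketch of how $C_k$ could be computed, the code below groups nodes by degree and averages their clustering coefficients. It assumes an undirected graph stored as `{node: set_of_neighbors}` and treats $C_i$ as 0 for nodes of degree below 2. The function name and the small example graph are hypothetical; nothing here reproduces the MSN data.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adj):
    """Return {k: C_k}, the mean clustering coefficient of nodes with degree k."""
    sums = defaultdict(float)   # sum of C_i over nodes with k_i == k
    counts = defaultdict(int)   # N_k
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            c = 0.0
        else:
            e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
            c = 2 * e / (k * (k - 1))
        sums[k] += c
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}

# Hypothetical example: triangle A-B-C with a pendant node D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(clustering_by_degree(g))
```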

  18. [Figure-only slide; no text recovered.]

  19. Number of links between pairs of nodes in the largest connected component. The table lists the # of nodes reached at each step as we do BFS out of a random node. Avg. path length: 6.6. 90% of the nodes can be reached in < 8 hops.

      | Steps | #Nodes |
      |---|---|
      | 0 | 1 |
      | 1 | 10 |
      | 2 | 78 |
      | 3 | 3,96 |
      | 4 | 8,648 |
      | 5 | 3,299,252 |
      | 6 | 28,395,849 |
      | 7 | 79,059,497 |
      | 8 | 52,995,778 |
      | 9 | 10,321,008 |
      | 10 | 1,955,007 |
      | 11 | 518,410 |
      | 12 | 149,945 |
      | 13 | 44,616 |
      | 14 | 13,740 |
      | 15 | 4,476 |
      | 16 | 1,542 |
      | 17 | 536 |
      | 18 | 167 |
      | 19 | 71 |
      | 20 | 29 |
      | 21 | 16 |
      | 22 | 10 |
      | 23 | 3 |
      | 24 | 2 |
      | 25 | 3 |
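A table of this shape can be produced by counting how many nodes a BFS discovers at each step. The sketch below is illustrative only and assumes an undirected graph stored as `{node: set_of_neighbors}`; the function names and the five-node example are hypothetical and unrelated to the MSN data.

```python
from collections import Counter, deque

def nodes_per_bfs_step(adj, source):
    """Return a list whose s-th entry is the number of nodes at distance s from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    counts = Counter(dist.values())
    return [counts[s] for s in range(max(counts) + 1)]

def fraction_within(level_counts, max_hops):
    """Fraction of the reached nodes that lie at distance <= max_hops."""
    return sum(level_counts[: max_hops + 1]) / sum(level_counts)

# Hypothetical example: path A - B - C - D with an extra leaf E attached to B.
tree = {"A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D"}, "D": {"C"}, "E": {"B"}}
levels = nodes_per_bfs_step(tree, "A")
print(levels, fraction_within(levels, 2))
```

Starting from A, the BFS finds 1, 1, 2, and 1 nodes at steps 0 through 3, so 4 of the 5 nodes are within 2 hops.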

  20. Degree distribution: heavily skewed; avg. degree = 14.4. Path length: 6.6. Clustering coefficient: 0.11. Connectivity: giant component. Are these values “expected”? Are they “surprising”? To answer this we need a model!

  21. a. Undirected network: N = 2,018 proteins as nodes, E = 2,930 binding interactions as links. b. Degree distribution: skewed; average degree <k> = 2.90. c. Diameter: avg. path length = 5.8. d. Clustering: avg. clustering = 0.12. Connectivity: 185 components; the largest component has 1,647 nodes (81% of nodes).

  22. Erdős-Rényi Random Graphs [Erdős-Rényi, ‘60]. Two variants: $G_{np}$, an undirected graph on $n$ nodes where each edge $(u,v)$ appears i.i.d. with probability $p$; and $G_{nm}$, an undirected graph with $n$ nodes and $m$ edges picked uniformly at random. What kind of networks do such models produce?
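Both variants are short to sketch in Python. The functions below are an illustrative implementation under stated assumptions: nodes are the integers 0..n-1, the graph is returned as `{node: set_of_neighbors}`, and the names `gnp`, `gnm`, and `edge_count` are hypothetical. The quadratic loop over all node pairs is only suitable for small n.

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """G_np: include each of the n(n-1)/2 possible edges independently with prob. p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def gnm(n, m, rng):
    """G_nm: choose exactly m distinct edges uniformly at random."""
    adj = {v: set() for v in range(n)}
    all_pairs = list(combinations(range(n), 2))
    for u, v in rng.sample(all_pairs, m):
        adj[u].add(v)
        adj[v].add(u)
    return adj

def edge_count(adj):
    """Number of undirected edges (each edge is stored in two neighbor sets)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

rng = random.Random(0)  # seeded so that repeated runs give the same realization
print(edge_count(gnp(10, 1 / 6, rng)), edge_count(gnm(10, 7, rng)))
```

Calling `gnp` repeatedly with the same `n` and `p` but a different seed gives different graphs, whereas `gnm` always returns exactly `m` edges.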

  23. $n$ and $p$ do not uniquely determine the graph! The graph is a result of a random process. We can have many different realizations given the same $n$ and $p$. [Figure: several realizations for $n = 10$, $p = 1/6$.]

  24. Degree distribution: P(k). Path length: h. Clustering coefficient: C. What are the values of these properties for $G_{np}$?

  25. Fact: the degree distribution of $G_{np}$ is binomial. Let $P(k)$ denote the fraction of nodes with degree $k$: $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$. Here $\binom{n-1}{k}$ selects $k$ nodes out of $n-1$, $p^k$ is the probability of having $k$ edges, and $(1-p)^{n-1-k}$ is the probability of missing the rest of the $n-1-k$ edges. Mean and variance of a binomial distribution: $\bar{k} = p(n-1)$, $\sigma^2 = p(1-p)(n-1)$, so $\frac{\sigma}{\bar{k}} = \left[ \frac{1-p}{p} \cdot \frac{1}{n-1} \right]^{1/2} \approx \frac{1}{(n-1)^{1/2}}$. By the law of large numbers, as the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of $\bar{k}$.
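The binomial formula and its moments can be checked numerically. This is a minimal sketch using only the standard library; the function names are hypothetical, and the values n = 10, p = 1/6 are reused from the realization example purely for illustration.

```python
from math import comb, sqrt

def degree_pmf(n, p):
    """P(k) = C(n-1, k) * p^k * (1-p)^(n-1-k) for k = 0 .. n-1."""
    return [comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(n)]

def mean_and_std(n, p):
    """Mean p(n-1) and standard deviation sqrt(p(1-p)(n-1)) of the degree."""
    return p * (n - 1), sqrt(p * (1 - p) * (n - 1))

n, p = 10, 1 / 6
pmf = degree_pmf(n, p)
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
print(sum(pmf), mean_from_pmf, mean_and_std(n, p))

# Relative width sigma / mean shrinks as n grows with p held fixed.
for size in (100, 10_000):
    mean, std = mean_and_std(size, p)
    print(size, std / mean)
```

The probabilities sum to 1 and the mean computed from the distribution agrees with $p(n-1) = 1.5$, while the ratio $\sigma / \bar{k}$ falls roughly like $1/\sqrt{n-1}$.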

  26. Remember: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between $i$’s neighbors. Edges in $G_{np}$ appear i.i.d. with prob. $p$. So the expected $E[e_i]$ is $E[e_i] = p \, \frac{k_i (k_i - 1)}{2}$: each pair is connected with prob. $p$, and $\frac{k_i (k_i - 1)}{2}$ is the number of distinct pairs of neighbors of node $i$ of degree $k_i$. Then $E[C_i] = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$. The clustering coefficient of a random graph is small. If we generate bigger and bigger graphs with fixed avg. degree $\bar{k}$ (that is, we set $p = \bar{k} \cdot 1/n$), then $C$ decreases with the graph size $n$.
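The result $E[C_i] = p$ can be checked empirically on a sampled graph. The following is a rough simulation sketch, not a proof: the function names, the choice n = 200, p = 0.1, and the seed are all assumptions made for illustration, and nodes of degree below 2 are counted as $C_i = 0$.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """Sample G_np as {node: set_of_neighbors} on nodes 0 .. n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean of C_i = 2 e_i / (k_i (k_i - 1)), with C_i = 0 when k_i < 2."""
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k >= 2:
            e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
            total += 2 * e / (k * (k - 1))
    return total / len(adj)

def simulated_clustering(n, p, seed):
    """Average clustering of one seeded G_np realization."""
    return average_clustering(sample_gnp(n, p, random.Random(seed)))

# With n = 200 and p = 0.1 the average degree is about 20, so the
# estimate should land close to p.
print(simulated_clustering(200, 0.1, seed=0))
```

Holding the average degree fixed while increasing n means lowering p, so repeating the experiment with larger n and p = k/n should show the average clustering shrinking toward 0.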

  27. Degree distribution: $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$. Clustering coefficient: $C = p \approx \bar{k}/n$. Path length: next! Connectivity:
