http://cs224w.stanford.edu 10/25/2010 Jure Leskovec, Stanford



  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. 10/25/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

  3. [Faloutsos, Faloutsos and Faloutsos, 1999] Internet domain topology

  4. [Barabasi-Albert, 1999] Actor collaborations, Web graph, Power-grid

  5. [Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener, 2000]

  6. [Leskovec et al. KDD '08] Take a real network and plot a histogram of p_k vs. k. Flickr social network: n = 584,207, m = 3,555,115
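The histogram of p_k vs. k can be sketched as follows. This is a minimal illustration, assuming the network is already reduced to a list of node degrees; the function name `degree_distribution` and the toy degree sequence are hypothetical and are not the Flickr data.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: p_k}, where p_k is the fraction of nodes with degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy degree sequence, for illustration only.
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5]
for k, p_k in degree_distribution(toy_degrees).items():
    print(k, p_k)
```

Plotting the returned pairs on linear axes gives the histogram; on a real network most of the mass sits at small k.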

  7. [Leskovec et al. KDD '08] Plot the same data on log-log axes. Flickr social network: n = 584,207, m = 3,555,115

  8. Degrees are heavily skewed. A distribution P(X>x) is heavy tailed if: lim_{x→∞} P(X>x) / e^(-λx) = ∞ for every λ > 0

  9. [Clauset-Shalizi-Newman 2007] Power-law vs. exponential on log-log scales
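Why the two shapes differ on log-log axes can be seen by taking logarithms. The sketch below assumes natural logarithms and uses c and λ as generic constants that do not appear on the slide.

```latex
\begin{align*}
p(x) = c\,x^{-\alpha} &\;\Rightarrow\; \log p(x) = \log c - \alpha \log x \\
p(x) = c\,e^{-\lambda x} &\;\Rightarrow\; \log p(x) = \log c - \lambda\, e^{\log x}
\end{align*}
```

As a function of log x, the power law is a straight line with slope -α, while the exponential drops off faster than any straight line.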

  10. [Clauset-Shalizi-Newman 2007] Various names, kinds and forms: long tail, heavy tail, Zipf's law, Pareto's law. P(x) is proportional to:

  11. In social systems, lots of power-laws:
      - Pareto, 1897: wealth distribution
      - Lotka, 1926: scientific output
      - Yule, 1920s: biological taxa and subtaxa
      - Zipf, 1940s: word frequency
      - Simon, 1950s: city populations

  12. [Clauset-Shalizi-Newman 2007] Many other quantities follow heavy-tailed distributions

  13. [Chris Anderson, Wired, 2004]

  14. CMU grad students at the G20 meeting in Pittsburgh in Sept 2009

  15. Power-law degree exponent is typically 2 < α < 3:
      - Web graph: α_in = 2.1, α_out = 2.4 [Broder et al. 00]
      - Autonomous systems: α = 2.4 [Faloutsos, Faloutsos and Faloutsos, 99]
      - Actor collaborations: α = 2.3 [Barabasi-Albert 00]
      - Citations to papers: α ≈ 3 [Redner 98]
      - Online social networks: α ≈ 2 [Leskovec et al. 07]

  16. [Clauset-Shalizi-Newman 2007] What is the normalizing constant? P(x) = c x^(-α), c = ?
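One way to answer the question, sketched under assumptions the slide text does not state: x is treated as continuous, the power law holds only for x ≥ x_min (a hypothetical lower cutoff), and α > 1 so that the integral converges.

```latex
\begin{align*}
1 &= \int_{x_{\min}}^{\infty} c\,x^{-\alpha}\,dx
   = c\left[\frac{x^{1-\alpha}}{1-\alpha}\right]_{x_{\min}}^{\infty}
   = \frac{c\,x_{\min}^{\,1-\alpha}}{\alpha-1} \\
\Rightarrow\quad c &= (\alpha-1)\,x_{\min}^{\,\alpha-1},
\qquad
p(x) = \frac{\alpha-1}{x_{\min}}\left(\frac{x}{x_{\min}}\right)^{-\alpha}
\end{align*}
```

The lower cutoff is needed because x^(-α) is not integrable at 0 when α > 1.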

  17. [Clauset-Shalizi-Newman 2007] What is the expectation E[x] of a power-law random variable?
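A sketch of the calculation, assuming a continuous density p(x) = c x^(-α) on x ≥ x_min with c = (α-1) x_min^(α-1), the constant that makes the density integrate to 1. The cutoff x_min is an assumption introduced here for illustration.

```latex
\begin{align*}
E[x] &= \int_{x_{\min}}^{\infty} x\,c\,x^{-\alpha}\,dx
      = c\left[\frac{x^{2-\alpha}}{2-\alpha}\right]_{x_{\min}}^{\infty} \\
     &= \frac{c\,x_{\min}^{\,2-\alpha}}{\alpha-2}
      = \frac{\alpha-1}{\alpha-2}\,x_{\min}
      \qquad (\alpha > 2)
\end{align*}
```

For α ≤ 2 the bracket diverges at the upper limit, so the expectation is infinite.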

  18. Power-laws: infinite moments!
      - If α ≤ 2: E[x] = ∞
      - If α ≤ 3: Var[x] = ∞
      - Sample average of n samples from a power-law with exponent α: [figure]
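The behaviour of the sample average can be explored with a small simulation. This is a sketch under assumptions: samples are drawn from a continuous power law on [x_min, ∞) by inverse-transform sampling, and the names `sample_power_law` and `running_means` are hypothetical.

```python
import random

def sample_power_law(alpha, x_min, rng):
    """Draw one sample with density proportional to x^(-alpha) on [x_min, inf)."""
    u = rng.random()  # uniform in [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def running_means(alpha, n, x_min=1.0, seed=0):
    """Return the sample average after 1, 2, ..., n draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += sample_power_law(alpha, x_min, rng)
        means.append(total / i)
    return means

if __name__ == "__main__":
    # alpha = 1.8: infinite mean; alpha = 2.5: finite mean, infinite variance;
    # alpha = 4.0: finite mean and variance.
    for alpha in (1.8, 2.5, 4.0):
        means = running_means(alpha, 20000)
        print(alpha, [round(means[i - 1], 3) for i in (100, 1000, 20000)])
```

For α ≤ 2 the running average tends to keep jumping upward as rare, very large samples arrive; for larger α it settles down.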

  19. [Clauset-Shalizi-Newman 2007] Estimating α from data: 1. Fit a line on log-log axes using least squares. (BAD!)

  20. [Clauset-Shalizi-Newman 2007] Estimating α from data: 2. Plot the complementary CDF P(X>x) on log-log axes. Then α = 1 + α′, where -α′ is the slope of P(X>x). I.e., if P(X=x) ∝ x^(-α) then P(X>x) ∝ x^(-(α-1)). (OK)
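The complementary CDF is straightforward to compute from raw data. The sketch below assumes the data is a plain list of observed values (for example node degrees); the function name `empirical_ccdf` is hypothetical.

```python
from collections import Counter

def empirical_ccdf(values):
    """Return (xs, ccdf): the sorted distinct values and P(X > x) for each."""
    n = len(values)
    counts = Counter(values)
    xs = sorted(counts)
    ccdf = []
    remaining = n  # number of observations not yet passed
    for x in xs:
        remaining -= counts[x]  # drop observations equal to x
        ccdf.append(remaining / n)
    return xs, ccdf

xs, ccdf = empirical_ccdf([1, 1, 2, 3])
print(list(zip(xs, ccdf)))
```

The last point is always P(X > max) = 0 and has to be dropped before plotting on a logarithmic axis.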

  21. [Clauset-Shalizi-Newman 2007] Estimating power-law exponent α from data: 3. Use MLE: α̂ = 1 + n [Σ_i ln(x_i / x_min)]^(-1), where x_i is the degree of node i and x_min is the smallest value for which the power law holds. (Best)
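A minimal sketch of the maximum-likelihood estimate, assuming the continuous-variable form α̂ = 1 + n / Σ_i ln(x_i / x_min) from Clauset, Shalizi and Newman and a known cutoff x_min. The function name `mle_alpha` and the synthetic data are hypothetical; real degree data is discrete, so this is an approximation.

```python
import math
import random

def mle_alpha(xs, x_min):
    """Estimate the power-law exponent from the samples with x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum

if __name__ == "__main__":
    # Synthetic check: draw from a power law with a known exponent.
    rng = random.Random(0)
    true_alpha = 2.5
    samples = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
               for _ in range(50000)]
    print(mle_alpha(samples, x_min=1.0))
```

On synthetic data like this the estimate should land close to the exponent used to generate the samples.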

  22. [Figure panels: linear scale; log scale, α = 1.75; CCDF, log scale, α = 1.75; CCDF, log scale, α = 1.75, exp. cutoff]

  23. Not well characterized by the mean:
      - Avg. U.S. city size: 165k, StdDev = 410k
      - If human heights in the US were power-law distributed, we would expect 60k people as tall as 2.72m (world record), 10k people as tall as a giraffe, and 1 person as tall as the Empire State Building
      - Cannot arise from sums of independent events:
        - Recall: in G_np each pair of nodes is connected independently with prob. p
        - X: degree of node v, X_w: event that w links to v
        - X = Σ_w X_w, E[X] = Σ_w E[X_w] = (n-1)p
        - Now what is Pr[X = k]?
        - Central limit theorem: X_1, …, X_n: rnd. vars with mean μ, var σ²
        - S_n = Σ_i X_i: E[S_n] = nμ, var[S_n] = nσ², std.dev[S_n] = σ√n
        - P[S_n = E[S_n] + x·std.dev(S_n)] ~ 1/√(2π) exp(-x²/2)
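Since X is a sum of n-1 independent indicator variables, each equal to 1 with probability p, it is Binomial(n-1, p), and the central limit theorem says it is approximately normal. The sketch below compares the two; the function names and the example values n = 1001, p = 0.01 are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k, n_trials, p):
    """Pr[X = k] for X ~ Binomial(n_trials, p)."""
    return math.comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)

def normal_approx(k, n_trials, p):
    """Normal density with the same mean and variance, evaluated at k."""
    mu = n_trials * p
    sigma = math.sqrt(n_trials * p * (1 - p))
    z = (k - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

if __name__ == "__main__":
    n, p = 1001, 0.01  # each node has n - 1 = 1000 potential neighbors
    for k in (0, 5, 10, 20, 40):
        print(k, binomial_pmf(k, n - 1, p), normal_approx(k, n - 1, p))
```

Both columns fall off very quickly away from the mean (n-1)p, which is why a G_np degree distribution has no heavy tail.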

  24. Random network (Erdos-Renyi random graph): degree distribution is Binomial. Scale-free (power-law) network: degree distribution is Power-law. A function is scale free if: f(ax) = c f(x)
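A quick check that a power law satisfies the scale-free condition, written for the generic form f(x) = x^(-α) with a > 0:

```latex
\[
f(ax) = (ax)^{-\alpha} = a^{-\alpha}\,x^{-\alpha} = c\,f(x),
\qquad c = a^{-\alpha}
\]
```

Rescaling x changes only the constant in front, not the shape of the function.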

  25. What is a good model that gives rise to power-law degree distributions? What is the analog of the central limit theorem for power-laws?

  26. Preferential attachment [Price 1965, Albert-Barabasi 1999]:
      - Nodes arrive in order
      - A new node j creates m out-links
      - Prob. of linking to a previous node i is proportional to its degree d_i: P(j → i) = d_i / Σ_k d_k
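The process above can be sketched as a short simulation. Several details are assumptions not given on the slide: the graph starts from a clique on m+1 nodes, the m targets of a new node are distinct, and degree counts both ends of every link. The names `preferential_attachment` and `degrees` are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Return a list of edges (j, i), with j the newer node."""
    rng = random.Random(seed)
    edges = []
    endpoints = []  # node i appears here d_i times
    # Assumed seed graph: a clique on nodes 0..m.
    for u in range(m + 1):
        for v in range(u):
            edges.append((u, v))
            endpoints += [u, v]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # A uniform pick from endpoints selects i with prob. d_i / sum_k d_k.
            targets.add(rng.choice(endpoints))
        for i in sorted(targets):
            edges.append((j, i))
            endpoints += [j, i]
    return edges

def degrees(edges):
    """Count the degree of every node from the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

if __name__ == "__main__":
    deg = degrees(preferential_attachment(2000, 2))
    print("max degree:", max(deg.values()))
```

Keeping a list in which each node appears once per unit of degree turns the degree-proportional choice into a uniform draw.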

  27. New nodes are more likely to link to nodes that already have high degree
      - Herbert Simon's result: power-laws arise from "Rich get richer" (cumulative advantage)
      - Examples [Price 65]: Citations: new citations of a paper are proportional to the number it already has

  28. [Mitzenmacher, '03]
      - Pages are created in order 1, 2, 3, …, n
      - When node j is created it makes a single link to an earlier node i chosen:
        1) With prob. p, j links to i chosen uniformly at random (from among all earlier nodes)
        2) With prob. 1-p, node j chooses node i uniformly at random and links to the node i points to
      - Note that 2) is the same as saying: with prob. 1-p, node j links to node u with prob. proportional to d_u (the degree of u)
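A minimal simulation sketch of the model described above. One detail is an assumption: node 1 has no out-link to copy, so when the copy step picks i = 1 the new node links to node 1 directly. The names `copying_model` and `in_degrees` are hypothetical.

```python
import random
from collections import Counter

def copying_model(n, p, seed=0):
    """Return {j: target}, the single out-link of each node j = 2..n."""
    rng = random.Random(seed)
    link = {2: 1}  # node 2 can only link to node 1
    for j in range(3, n + 1):
        i = rng.randint(1, j - 1)  # uniformly chosen earlier node
        if rng.random() < p or i == 1:
            link[j] = i        # link to i itself (assumed fallback when i == 1)
        else:
            link[j] = link[i]  # copy: link to the node that i points to
    return link

def in_degrees(link):
    """Count how many links point to each node."""
    return Counter(link.values())

if __name__ == "__main__":
    deg = in_degrees(copying_model(5000, p=0.5))
    print("max in-degree:", max(deg.values()))
```

In the copy step a node is reached once for every link that points to it, so in this sketch the degree d_u that drives the "rich get richer" effect is the in-degree of u.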

  29. Claim: The described model generates networks where the fraction of nodes with degree k scales as P(d_i = k) ∝ k^(-(1 + 1/q)), where q = 1-p
