http://cs224w.stanford.edu 10/25/2010 Jure Leskovec, Stanford - PowerPoint PPT Presentation

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

10/25/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

[Faloutsos, Faloutsos and Faloutsos, 1999] Internet domain topology

[Barabasi-Albert, 1999] Actor collaborations, Web graph, Power-grid

[Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener, 2000]

[Leskovec et al. KDD '08] Take a real network and plot a histogram of p_k vs. k
Flickr social network: n = 584,207, m = 3,555,115

[Leskovec et al. KDD '08] Plot the same data on log-log axes
Flickr social network: n = 584,207, m = 3,555,115
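The two plots above can be reproduced for any edge list. The following is a minimal sketch, assuming an undirected graph given as a list of node pairs; the function names and the toy edge list are hypothetical and are not the Flickr data.

```python
from collections import Counter
import math

def degree_distribution(edges):
    """Return {k: p_k}, the fraction of nodes with degree k, for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # only nodes with at least one edge are counted
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}

def log_log_points(p_k):
    """Map (k, p_k) pairs to (log10 k, log10 p_k) for plotting on log-log axes."""
    return [(math.log10(k), math.log10(p)) for k, p in p_k.items() if k > 0 and p > 0]

# Toy graph (hypothetical data): node 0 is a small hub.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
p_k = degree_distribution(toy_edges)
points = log_log_points(p_k)
```

On a power-law network the points returned by `log_log_points` would fall roughly on a straight line.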

Degrees are heavily skewed. Distribution P(X>x) is heavy tailed if:
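One standard way to make this condition precise, assumed here to match the slide's intent, is that the tail decays more slowly than any exponential:

```latex
\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty
\quad \text{for all } \lambda > 0
```

A power-law tail satisfies this condition, whereas exponential and normal tails do not.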

[Clauset-Shalizi-Newman 2007] Power-law vs. exponential on log-log scales

[Clauset-Shalizi-Newman 2007] Various names, kinds and forms:
- Long tail, Heavy tail, Zipf's law, Pareto's law
- P(x) is proportional to:

In social systems, lots of power-laws:
- Pareto, 1897: Wealth distribution
- Lotka, 1926: Scientific output
- Yule, 1920s: Biological taxa and subtaxa
- Zipf, 1940s: Word frequency
- Simon, 1950s: City populations

[Clauset-Shalizi-Newman 2007] Many other quantities follow heavy-tailed distributions

[Chris Anderson, Wired, 2004]

CMU grad-students at the G20 meeting in Pittsburgh in Sept 2009

Power-law degree exponent is typically 2 < α < 3
- Web graph: α_in = 2.1, α_out = 2.4 [Broder et al. 00]
- Autonomous systems: α = 2.4 [Faloutsos^3, 99]
- Actor-collaborations: α = 2.3 [Barabasi-Albert 00]
- Citations to papers: α ≈ 3 [Redner 98]
- Online social networks: α ≈ 2 [Leskovec et al. 07]

[Clauset-Shalizi-Newman 2007] What is the normalizing constant?
$P(x) = c\,x^{-\alpha}$, c = ?
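A short worked derivation, assuming a continuous power law defined for $x \ge x_{\min}$ with $\alpha > 1$. The lower cutoff $x_{\min}$ is an assumption introduced here; it is needed because $x^{-\alpha}$ is not integrable down to zero.

```latex
1 = \int_{x_{\min}}^{\infty} c\, x^{-\alpha}\, dx
  = c \left[ \frac{x^{1-\alpha}}{1-\alpha} \right]_{x_{\min}}^{\infty}
  = \frac{c}{\alpha - 1}\, x_{\min}^{1-\alpha}
\quad\Longrightarrow\quad
c = (\alpha - 1)\, x_{\min}^{\alpha - 1}
```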

[Clauset-Shalizi-Newman 2007] What's the expectation of a power-law random variable?
E[x] =
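A sketch of the calculation, assuming a continuous power law $p(x) = c\,x^{-\alpha}$ on $x \ge x_{\min}$ with normalizing constant $c = (\alpha - 1)\,x_{\min}^{\alpha - 1}$. Both $x_{\min}$ and the continuous form are assumptions made for illustration.

```latex
E[x] = \int_{x_{\min}}^{\infty} x \cdot c\, x^{-\alpha}\, dx
     = c \left[ \frac{x^{2-\alpha}}{2-\alpha} \right]_{x_{\min}}^{\infty}
     = \frac{\alpha - 1}{\alpha - 2}\, x_{\min}
\qquad (\alpha > 2)
```

For $\alpha \le 2$ the integral diverges, so the expectation is infinite.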

Power-laws: Infinite moments!
- If α ≤ 2: E[x] = ∞
- If α ≤ 3: Var[x] = ∞
- Sample average of n samples from a power-law with exponent α:

[Clauset-Shalizi-Newman 2007] Estimating α from data:
1. Fit a line on log-log axis using least squares. BAD!

[Clauset-Shalizi-Newman 2007] Estimating α from data:
2. Plot the complementary CDF P(X>x) on log-log axes. Then α = 1 + α', where α' is the magnitude of the slope of P(X>x). I.e., if $P(X=x) \propto x^{-\alpha}$ then $P(X>x) \propto x^{-(\alpha-1)}$. OK
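The relation between the two exponents follows from integrating the density. A short derivation, assuming a continuous power law $p(x) = c\,x^{-\alpha}$ with $\alpha > 1$ (the continuous form is an assumption, since node degrees are integers):

```latex
P(X > x) = \int_{x}^{\infty} c\, t^{-\alpha}\, dt
         = \frac{c}{\alpha - 1}\, x^{-(\alpha - 1)}
\quad\Longrightarrow\quad
\log P(X > x) = \text{const} - (\alpha - 1) \log x
```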

[Clauset-Shalizi-Newman 2007] Estimating power-law exponent α from data:
3. Use MLE for α, where x_i is the degree of node i. Best
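A minimal sketch of the maximum-likelihood estimate, assuming the continuous-data form given by Clauset, Shalizi and Newman, $\hat{\alpha} = 1 + n \left[ \sum_i \ln (x_i / x_{\min}) \right]^{-1}$. The cutoff `x_min`, the helper names, and the synthetic sampler are assumptions for illustration; real node degrees are integers, for which this formula is only an approximation.

```python
import math
import random

def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n samples from p(x) ~ x^(-alpha), x >= x_min, by inverse-transform sampling."""
    rng = random.Random(seed)
    # The CCDF is (x / x_min)^-(alpha - 1); inverting it at a uniform draw gives x.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(xs, x_min=1.0):
    """Continuous MLE: alpha_hat = 1 + n / sum(ln(x_i / x_min)) over x_i >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic check: the estimate should land close to the true exponent.
samples = sample_power_law(100_000, alpha=2.5)
alpha_hat = mle_alpha(samples)
```

Unlike a least-squares line through a log-log histogram, this estimate uses every sample directly and does not depend on a choice of bins.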

[Figure panels: linear scale; log scale, α = 1.75; CCDF, log scale, α = 1.75, exp. cutoff]

Not well characterized by the mean:
- Avg. U.S. city size: 165k, StdDev = 410k
- If human heights in the US were power-law distributed, we would expect 60k people as tall as 2.72m (world record), 10k people as tall as a giraffe, and 1 person as tall as the Empire State Building
Cannot arise from sums of independent events:
- Recall: in G_np each pair of nodes is connected independently with prob. p
- X ... degree of node v; X_w ... event that w links to v
- $X = \sum_w X_w$, $E[X] = \sum_w E[X_w] = (n-1)p$
- Now what is Pr[X=k]?
- Central limit theorem: $X_1, \dots, X_n$ are rnd. vars with mean $\mu$ and variance $\sigma^2$
- $S_n = \sum_i X_i$: $E[S_n] = n\mu$, $\mathrm{var}[S_n] = n\sigma^2$, $\mathrm{std.dev}[S_n] = \sigma\sqrt{n}$
- $P[S_n = E[S_n] + x \cdot \mathrm{std.dev}(S_n)] \sim \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
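The question "what is Pr[X=k]?" has a closed form in G_np: the degree is a sum of n-1 independent coin flips, so it is Binomial(n-1, p). A minimal sketch, with hypothetical parameter values, showing how thin the resulting tail is:

```python
import math

def gnp_degree_pmf(n, p):
    """Pr[X = k] for the degree X of a node in G(n, p): Binomial(n - 1, p)."""
    m = n - 1  # number of potential neighbors
    return [math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

pmf = gnp_degree_pmf(n=200, p=0.05)  # hypothetical parameters
mean_degree = sum(k * q for k, q in enumerate(pmf))  # should equal (n - 1) p
tail_mass = sum(pmf[50:])  # Pr[X >= 50], about five times the mean degree
```

Here `tail_mass` is vanishingly small, which is why sums of independent events cannot produce the hubs seen in heavy-tailed degree distributions.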

Random network (Erdos-Renyi random graph): Degree distribution is Binomial
Scale-free (power-law) network: Degree distribution is Power-law
Function is scale free if: f(ax) = c f(x)

- What is a good model that gives rise to power-law degree distributions?
- What is the analog of central limit theorem for power-laws?

Preferential attachment [Price 1965, Albert-Barabasi 1999]:
- Nodes arrive in order
- A new node j creates m out-links
- Prob. of linking to a previous node i is proportional to its degree $d_i$:
  $P(j \to i) = \frac{d_i}{\sum_k d_k}$
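The rule above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the seed graph (a path on the first m+1 nodes) and the requirement that a new node's m targets be distinct are choices made here for illustration, and the function name is hypothetical.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph where node j links to m earlier nodes with prob. d_i / sum_k d_k."""
    rng = random.Random(seed)
    # Assumed seed graph: nodes 0..m joined in a path, so every node has degree >= 1.
    edges = [(i, i + 1) for i in range(m)]
    # Each node appears in `endpoints` once per incident edge, so a uniform draw
    # from this list selects node i with probability proportional to its degree.
    endpoints = [v for e in edges for v in e]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct targets (an assumption)
            targets.add(rng.choice(endpoints))
        for i in targets:
            edges.append((j, i))
            endpoints.extend((j, i))
    return edges

edges = preferential_attachment(200, 3)
```

Sampling from the endpoint list avoids recomputing the normalizing sum over all degrees at every step.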

New nodes are more likely to link to nodes that already have high degree
- Herbert Simon's result: Power-laws arise from "Rich get richer" (cumulative advantage)
- Examples [Price 65]: Citations: new citations of a paper are proportional to the number it already has

[Mitzenmacher, '03]
- Pages are created in order 1, 2, 3, ..., n
- When node j is created it makes a single link to an earlier node i chosen:
  1) With prob. p, j links to i chosen uniformly at random (from among all earlier nodes)
  2) With prob. 1-p, node j chooses node i uniformly at random and links to the node i points to
- Note this is same as saying:
  2) With prob. 1-p, node j links to node u with prob. proportional to d_u (the degree of u)
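The two steps above can be sketched as a short simulation. The function name is hypothetical, and one boundary rule is an assumption: page 1 has no out-link to copy, so when step 2 selects it, the sketch links to page 1 itself.

```python
import random

def copying_model(n, p, seed=0):
    """Return out_link, where out_link[j] is the single earlier page that page j links to."""
    rng = random.Random(seed)
    out_link = {1: None}  # page 1 has no earlier page to link to
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)  # uniform over all earlier pages
        if rng.random() < p or out_link[i] is None:
            # Step 1: link to i itself. Reusing this step when i has no
            # out-link is an assumed boundary rule, not part of the model.
            out_link[j] = i
        else:
            # Step 2: copy i's link, i.e. link to the page that i points to.
            out_link[j] = out_link[i]
    return out_link

out_link = copying_model(1000, p=0.3)
```

Copying a random page's link reaches a page with probability proportional to the number of links already pointing at it, which is how the model produces preferential attachment without tracking degrees.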

Claim: The described model generates networks where the fraction of nodes with degree k scales as:
$P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)}$, where q = 1-p
