http://cs224w.stanford.edu 10/31/2012 Jure Leskovec, Stanford - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

http://cs224w.stanford.edu

slide-2
SLIDE 2

10/31/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

slide-3
SLIDE 3

We will analyze the following model [Mitzenmacher, '03]:

■ Nodes arrive in order $1, 2, 3, \dots, n$
■ When node $j$ is created it makes a single out-link to an earlier node $i$ chosen:
  • 1) With prob. $p$, $j$ links to $i$ chosen uniformly at random (from among all earlier nodes)
  • 2) With prob. $1 - p$, node $j$ chooses node $i$ uniformly at random and links to a node $i$ points to

CLAIM: The model generates networks with a power-law degree distribution with exponent:

$$\alpha = 1 + \frac{1}{1 - p}$$
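The model above can be simulated directly. The following is a minimal sketch, not the lecture's own code: the function name `simulate_model`, the seed handling, and the fallback for node 1 (which has no out-link to copy, so the new node links to node 1 itself) are assumptions made for illustration.

```python
import random

def simulate_model(n, p, seed=0):
    """Grow the model to n nodes; return (target, in_degree), both 1-indexed lists."""
    rng = random.Random(seed)
    target = [0] * (n + 1)      # target[j] = node that j links to; 0 means "no out-link"
    in_degree = [0] * (n + 1)
    for j in range(2, n + 1):   # node 1 has no earlier node to link to
        i = rng.randint(1, j - 1)            # uniformly random earlier node
        # With prob. 1 - p follow i's out-link (assumed fallback: stay at i if it has none).
        if rng.random() >= p and target[i] != 0:
            i = target[i]
        target[j] = i
        in_degree[i] += 1
    return target, in_degree

target, in_degree = simulate_model(n=10_000, p=0.1)
print("max in-degree:", max(in_degree))
```

Plotting a histogram of `in_degree` on log-log axes is one way to eyeball the claimed power law.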

slide-4
SLIDE 4

■ Plan: Analyze $d_i(t)$: continuous deterministic in-degree of node $i$ at time $t > i$
■ Initial condition:
  • $d_i(t) = 0$ when $t = i$ (node $i$ just arrived)
■ Expected change of $d_i(t)$ over time:
  • With prob. $p$ node $t + 1$ links randomly:
    • Links to our node $i$ with prob. $1/t$
  • With prob. $1 - p$ node $t + 1$ links preferentially:
    • Links to our node $i$ with prob. $d_i(t)/t$

$$d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1 - p)\,\frac{d_i(t)}{t}$$

■ How does $d_i(t)$ change as $t \to \infty$?

slide-5
SLIDE 5

■ Expected change of $d_i(t)$:
  • $d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1 - p)\,\frac{d_i(t)}{t}$
  • Treat $t$ as continuous and write $q = 1 - p$:
    $\frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = p\,\frac{1}{t} + (1 - p)\,\frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$
  • Divide by $p + q\,d_i(t)$:
    $\frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \frac{1}{t}\,\mathrm{d}t$
  • Integrate:
    $\int \frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \int \frac{1}{t}\,\mathrm{d}t$
  • $\frac{1}{q}\ln\bigl(p + q\,d_i(t)\bigr) = \ln t + c$
  • Exponentiate and let $A = e^{qc}$:
    $p + q\,d_i(t) = A\,t^{q} \;\Rightarrow\; d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$

A = ?

slide-6
SLIDE 6

What is the value of the constant $A$ in $d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$?

■ We know: $d_i(i) = 0$
■ So: $d_i(i) = \frac{1}{q}\left(A\,i^{q} - p\right) = 0$
■ $\Rightarrow A = \frac{p}{i^{q}}$
■ And so $\Rightarrow d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^{q} - 1\right)$

Note: Old nodes (small $i$ values) have higher in-degrees $d_i(t)$
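As a sanity check on the continuous approximation, one can iterate the discrete recurrence for the expected in-degree and compare it with the closed form. This is only a sketch; the function names and the example values ($i = 100$, $t = 10000$, $p = 0.5$) are chosen for illustration.

```python
def degree_closed_form(i, t, p):
    """d_i(t) = (p/q) * ((t/i)^q - 1) with q = 1 - p."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)

def degree_recurrence(i, t, p):
    """Iterate d(s+1) = d(s) + p/s + (1-p)*d(s)/s from d(i) = 0 up to time t."""
    d = 0.0
    for s in range(i, t):
        d += p / s + (1.0 - p) * d / s
    return d

closed = degree_closed_form(100, 10_000, 0.5)
stepped = degree_recurrence(100, 10_000, 0.5)
print(closed, stepped)
```

The two values should agree closely, with a small gap that comes from replacing the difference equation by a differential equation.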

slide-7
SLIDE 7

■ What is $F(k)$, the fraction of nodes that have degree at least $k$ at time $t$?
  • How many nodes $i$ have degree $> k$?
  • $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^{q} - 1\right) > k$
  • Solve for $i$ and obtain: $i < t\left(\frac{q}{p}\,k + 1\right)^{-\frac{1}{q}}$
■ There are $t$ nodes total at time $t$, so the fraction $F(k)$ is:

$$F(k) = \left[\frac{q}{p}\,k + 1\right]^{-\frac{1}{q}}$$

Note: $F(k)$ is a CCDF of the degree distribution

slide-8
SLIDE 8

■ What is the fraction of nodes with degree exactly $k$?
  • Take the derivative of $-F(k)$ w.r.t. $k$
  • $F(k)$ is the CCDF, so $-F'(k)$ is the PDF

$$F(k) = \left[\frac{q}{p}\,k + 1\right]^{-\frac{1}{q}} \;\Rightarrow\; -F'(k) = \frac{1}{p}\left[\frac{q}{p}\,k + 1\right]^{-1-\frac{1}{q}} \;\Rightarrow\; \alpha = 1 + \frac{1}{1 - p}$$

q.e.d.
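The derivative step can be checked numerically. The sketch below (function names are illustrative) compares the stated PDF with a finite-difference derivative of the CCDF and estimates the log-log slope of the PDF for large $k$, which should approach $-\alpha = -(1 + 1/(1-p))$.

```python
import math

def ccdf(k, p):
    """F(k) = ((q/p) k + 1)^(-1/q), with q = 1 - p."""
    q = 1.0 - p
    return ((q / p) * k + 1.0) ** (-1.0 / q)

def pdf(k, p):
    """-F'(k) = (1/p) ((q/p) k + 1)^(-1 - 1/q)."""
    q = 1.0 - p
    return (1.0 / p) * ((q / p) * k + 1.0) ** (-1.0 - 1.0 / q)

def loglog_slope(f, k1, k2):
    """Slope of f between k1 and k2 on log-log axes."""
    return (math.log(f(k2)) - math.log(f(k1))) / (math.log(k2) - math.log(k1))

p = 0.5
h = 1e-5
finite_diff = -(ccdf(10 + h, p) - ccdf(10 - h, p)) / (2 * h)
slope = loglog_slope(lambda k: pdf(k, p), 1e6, 1e7)
print(finite_diff, pdf(10, p), slope)
```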


slide-9
SLIDE 9

■ Pref. attachment gives power-law degrees
■ Intuitively a reasonable process
■ Can tune $p$ to get the observed exponent
  • On the web, $P[\text{node has degree } k] \sim k^{-2.1}$
  • $2.1 = 1 + 1/(1 - p) \Rightarrow p \sim 0.1$
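Spelled out, tuning $p$ amounts to inverting the exponent formula. A short worked version, using only the relation $\alpha = 1 + 1/(1-p)$ from the derivation:

```latex
\alpha = 1 + \frac{1}{1 - p}
\;\Longrightarrow\; 1 - p = \frac{1}{\alpha - 1}
\;\Longrightarrow\; p = 1 - \frac{1}{\alpha - 1}
\qquad\text{so for } \alpha = 2.1:\quad p = 1 - \frac{1}{1.1} \approx 0.09
```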

slide-10
SLIDE 10

■ Two changes from Gnp:
  • (1) Growth
  • (2) Preferential attachment
■ Do we need both? Yes!
  • Add growth to Gnp (i.e., $p = 1$):
    • $x_j$ = degree of node $j$ at the end
    • $X_j(u) = 1$ if $u$ links to $j$, else $0$
    • $X_j = X_j(j+1) + X_j(j+2) + \dots + X_j(n)$
    • $E[X_j(u)] = P[u \text{ links to } j] = 1/(u - 1)$
    • $E[X_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = \frac{1}{j} + \frac{1}{j+1} + \dots + \frac{1}{n-1} = H_{n-1} - H_{j-1}$
    • $E[X_j] \approx \log(n - 1) - \log(j) = \log\bigl((n-1)/j\bigr)$, NOT $\left(\frac{n}{j}\right)^{\alpha}$

$H_n$ ... $n$-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
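The growth-only calculation is easy to verify numerically. The sketch below (function names and the values $j = 100$, $n = 10000$ are illustrative) sums the link probabilities directly and compares the result with the harmonic-number form and with the logarithmic approximation.

```python
import math

def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / k for k in range(1, n + 1))

def expected_degree_growth_only(j, n):
    """E[X_j] when every new node u > j links to a uniformly random earlier node."""
    return sum(1.0 / (u - 1) for u in range(j + 1, n + 1))

j, n = 100, 10_000
exact = expected_degree_growth_only(j, n)
print(exact, harmonic(n - 1) - harmonic(j - 1), math.log((n - 1) / j))
```

The expected degree grows only logarithmically in $n/j$, which is why growth alone does not produce a power law.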

slide-11
SLIDE 11

■ Preferential attachment is not so good at predicting network structure
  • Age-degree correlation: $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^{q} - 1\right)$
    • Solution: Node fitness (virtual degree)
  • Links among high degree nodes
    • On the web nodes sometimes avoid linking to each other
■ Further questions:
  • What is a reasonable model for how people sample through web-pages and link to them?
    • Short random walks
  • Effect of search engines: reaching pages based on the number of links to them

slide-12
SLIDE 12

Average path length $h$ as a function of the degree exponent $\alpha$:

$$h = \begin{cases} \text{const} & \alpha = 2 \\ \dfrac{\log\log n}{\log(\alpha - 1)} & 2 < \alpha < 3 \\ \dfrac{\log n}{\log\log n} & \alpha = 3 \\ \log n & \alpha > 3 \end{cases}$$

  • $\alpha = 2$: Size of the biggest hub is of order O(N). Most nodes can be connected within two steps, thus the average path length will be independent of the network size.
  • $2 < \alpha < 3$ (ultra small world): The average path length increases slower than logarithmically. In Gnp all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of paths go through the few high degree hubs, reducing the distances between nodes.
  • $\alpha = 3$: Some models produce $\alpha = 3$. This was first derived by Bollobas et al. for the network diameter in the context of a dynamical model, but it holds for the average path length as well.
  • $\alpha > 3$ (small world): The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.

slide-13
SLIDE 13

Figure: regimes along the degree-exponent axis (ticks at $\alpha = 1$, $\alpha = 2$, $\alpha = 3$):
  • $\alpha < 2$: average $\langle k \rangle$ diverges, second moment $\langle k^2 \rangle$ diverges; regime full of anomalies...
  • $2 < \alpha < 3$: $\langle k \rangle$ finite, $\langle k^2 \rangle$ diverges; ultra small world behavior; the scale-free behavior is relevant
  • $\alpha > 3$: $\langle k^2 \rangle$ finite; small world; behaves like a random network
  • Example networks marked on the axis: web, web, internet, actor, collaboration, metabolic, citation

slide-14
SLIDE 14

■ How does network connectivity change as nodes get removed? [Albert et al. 00; Palmer et al. 01]
■ Nodes can be removed:
  • Random failure:
    • Remove nodes uniformly at random
  • Targeted attack:
    • Remove nodes in order of decreasing degree
■ This is important for robustness of the internet as well as epidemiology
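The two removal strategies can be sketched on an adjacency-set graph. This is an illustrative sketch, not code from the cited papers: the helper names are hypothetical, connectivity is measured here by the size of the largest connected component (the lecture's plots use mean path length), and the targeted attack ranks nodes once by their initial degree rather than recomputing degrees after each removal.

```python
import random
from collections import deque

def largest_component(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:                       # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def random_failure(adj, k, seed=0):
    """Pick k nodes uniformly at random."""
    return set(random.Random(seed).sample(sorted(adj), k))

def targeted_attack(adj, k):
    """Pick the k nodes of highest (initial) degree."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])

# Toy example: a star with hub 0 and ten leaves.
star = {0: set(range(1, 11)), **{leaf: {0} for leaf in range(1, 11)}}
print(largest_component(star, set()),
      largest_component(star, targeted_attack(star, 1)),
      largest_component(star, random_failure(star, 1)))
```

On the star, removing the single hub shatters the graph, while a random removal most likely hits a leaf and leaves the rest connected.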

slide-15
SLIDE 15

■ Real networks are resilient to random failures
■ Gnp has better resilience to targeted attacks
  • Need to remove all pages of degree >5 to disconnect the Web
  • But this is a very small fraction of all web pages

Figure: mean path length vs. fraction of removed nodes, under random failures and targeted attack, for a Gnp network and the AS network.

slide-16
SLIDE 16

■ There is no universal degree exponent characterizing all networks
■ We need growth and preferential attachment for the emergence of the scale-free property
  • The mechanism is domain dependent
  • Many processes give rise to scale-free networks
■ Modeling real networks:
  • Identify microscopic processes that occur in the network
  • Measure their frequency from real data
  • Develop dynamical models that capture these processes
  • If the model is correct, it should predict the observations

slide-17
SLIDE 17

■ Copying mechanism (directed network)
  • Select a node and an edge of this node
  • Attach to the endpoint of this edge
■ Walking on a network (directed network)
  • The new node connects to a node, then to every first, second, ... neighbor of this node
■ Attaching to edges
  • Select an edge and attach to both endpoints of this edge
■ Node duplication
  • Duplicate a node with all its edges
  • Randomly prune edges of the new node

slide-18
SLIDE 18
slide-19
SLIDE 19

■ Preferential attachment is a model of a growing network
■ Can we find a more realistic model?
■ What governs network growth & evolution? [Leskovec, Backstrom, Kumar, Tomkins, 2008]
  • P1) Node arrival process:
    • When nodes enter the network
  • P2) Edge initiation process:
    • Each node decides when to initiate an edge
  • P3) Edge destination process:
    • The node determines the destination of the edge

slide-20
SLIDE 20

■ 4 online social networks with exact edge arrival sequence [Leskovec et al., KDD '08]: Flickr (F), Delicious (D), Answers (A), LinkedIn (L)
  • For every edge $(u,v)$ we know the exact time of its creation $t_{uv}$ (and so on for millions of edges...)
■ Directly observe mechanisms leading to global network properties

slide-21
SLIDE 21

Node arrival over time (figure panels):
  • (F) Flickr: Exponential
  • (D) Delicious: Linear
  • (A) Answers: Sub-linear
  • (L) LinkedIn: Quadratic

slide-22
SLIDE 22

■ How long do nodes live?
  • Node lifetime is the time between the 1st and the last edge of a node
■ How do nodes "wake up" to create links?

Figure: timeline of node i from its 1st edge to its last edge (the lifetime of the node), with marks where node i creates edges.

slide-23
SLIDE 23

■ Lifetime $a$: time between a node's first and last edge

Node lifetime is exponentially distributed: $p_l(a) = \lambda e^{-\lambda a}$ (figure: LinkedIn)
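Given observed lifetimes, the rate $\lambda$ of an exponential distribution can be estimated by maximum likelihood as one over the sample mean. The sketch below uses made-up lifetimes purely for illustration; the function names are not from the paper.

```python
import math

def fit_exponential_rate(lifetimes):
    """Maximum-likelihood estimate of lambda: number of samples / sum of samples."""
    return len(lifetimes) / sum(lifetimes)

def exponential_pdf(a, lam):
    """p_l(a) = lambda * exp(-lambda * a)."""
    return lam * math.exp(-lam * a)

lifetimes = [1.0, 2.0, 3.0]          # hypothetical lifetimes, arbitrary time units
lam = fit_exponential_rate(lifetimes)
print(lam, exponential_pdf(0.0, lam))
```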

slide-24
SLIDE 24

■ How do nodes "wake up" to create edges?
  • Edge gap $\delta_d(i)$: time between the $d$-th and $(d+1)$-st edge of node $i$:
    • Let $t_d(i)$ be the creation time of the $d$-th edge of node $i$
    • $\delta_d(i) = t_{d+1}(i) - t_d(i)$
  • $\delta_d$ is a distribution (histogram) of $\delta_d(i)$ over all nodes $i$

Figure: timeline of node i from its 1st edge to its last edge with gaps $\delta_1(i), \delta_2(i), \delta_3(i)$; the first gaps $\delta_1(u), \delta_1(v), \delta_1(w)$ of nodes u, v, w are pooled into the histogram $\delta_1$.
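The definition translates into a few lines of code. In this sketch the input format (a dict from node to its list of edge creation times) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def edge_gaps_by_degree(edge_times):
    """Map d -> list of gaps delta_d = t_{d+1} - t_d, pooled over all nodes."""
    gaps = defaultdict(list)
    for node, times in edge_times.items():
        ts = sorted(times)                     # ts[d-1] is the creation time of the d-th edge
        for d in range(1, len(ts)):
            gaps[d].append(ts[d] - ts[d - 1])
    return dict(gaps)

# Hypothetical creation times for two nodes.
example = {"u": [0, 2, 5], "v": [1, 4]}
gaps = edge_gaps_by_degree(example)
print(gaps)
```

Each list `gaps[d]` is the raw sample behind the histogram of $\delta_d$.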

slide-25
SLIDE 25

Edge gap $\delta_d$: inter-arrival time between the $d$-th and $(d+1)$-st edge

$$p_g(\delta_1) \propto \delta_1^{-\alpha}\, e^{-\beta \delta_1}$$

For every $d$ we make a separate histogram

Figure (LinkedIn): edge gap probability $P(\delta_1)$ vs. edge gap $\delta_1$

slide-26
SLIDE 26

■ How do $\alpha$ and $\beta$ change as a function of $d$?

To each plot of $\delta_d$ fit:

$$p_g(\delta_d) \propto \delta_d^{-\alpha_d}\, e^{-\beta_d \delta_d}$$

$\alpha$ is constant! $\beta$ linearly increases!

slide-27
SLIDE 27

■ $\alpha$ const., $\beta$ linear in $d$. What does this mean?
■ Gaps get smaller with $d$!

$$p_g(\delta_d) \propto \delta_d^{-\alpha}\, e^{-\beta \cdot d \cdot \delta_d}$$

Figure: $\log p_g(\delta_d)$ vs. $\log \delta_d$ for degree $d = 1$, $d = 2$, $d = 3$; the power-law part follows $\propto \delta_d^{-\alpha}$.
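To see why a cutoff rate that grows linearly in $d$ means shorter gaps, one can evaluate the unnormalized density on a grid and compare mean gaps for different $d$. This is a rough sketch: the integer grid of gap values, the truncation at `max_gap`, and the parameter values are arbitrary assumptions, not fitted values from the paper.

```python
import math

def gap_density(delta, d, alpha, beta):
    """Unnormalized p_g(delta_d) = delta^(-alpha) * exp(-beta * d * delta)."""
    return delta ** (-alpha) * math.exp(-beta * d * delta)

def mean_gap(d, alpha, beta, max_gap=1000):
    """Mean gap under the density restricted to integer gaps 1..max_gap."""
    grid = range(1, max_gap + 1)
    weights = [gap_density(x, d, alpha, beta) for x in grid]
    return sum(x * w for x, w in zip(grid, weights)) / sum(weights)

for d in (1, 2, 5):
    print(d, round(mean_gap(d, alpha=1.0, beta=0.01), 2))
```

The mean gap shrinks as $d$ grows, matching the observation that higher-degree nodes create edges faster.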

slide-28
SLIDE 28

■ Source node i wakes up and creates an edge
■ How does i select a target node j?
  • What is the degree of the target j?
    • Does preferential attachment really hold?
  • How many hops away is the target j?
    • Are edges attaching locally?

slide-29
SLIDE 29

■ Are edges more likely to connect to higher degree nodes? YES! [Leskovec et al., KDD '08]

$$p_e(k) \propto k^{\tau}$$

Network     τ
Gnp
PA          1
Flickr      1
Delicious   1
Answers     0.9
LinkedIn    0.6

(Figure panels: Gnp, PA, Flickr)

slide-30
SLIDE 30

■ Just before the edge (u,w) is placed, how many hops are between u and w? [Leskovec et al., KDD '08]

Fraction of triad-closing edges:

Network     % Δ
Flickr      66%
Delicious   28%
Answers     23%
LinkedIn    50%

Real edges are local! Most of them close triangles!

(Figure panels: Gnp, PA, Flickr; diagram of nodes u, v, w)

slide-31
SLIDE 31

■ Focus only on triad-closing edges [Leskovec et al., KDD '08]
■ New triad-closing edge (u,w) appears next
■ Model this as 2 independent choices:
  • 1. u chooses neighbor v
  • 2. v chooses neighbor w
  and connect u to w
  • E.g.: Under Random-Random:
    • $p(u,w) = \frac{1}{5} \cdot \frac{1}{2} + \frac{1}{5} \cdot 1 = \frac{3}{10}$
■ Under a particular pair of "strategies":

$$\text{Likelihood of the graph} = \prod_{(u,w) \in E} p(u,w)$$

(Diagram: nodes u, v, v', w)
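The Random-Random number can be reproduced on a small graph. The graph below is a hypothetical reconstruction chosen to match the arithmetic on the slide (u has 5 neighbors, v has 2 candidates, v' has 1); it also assumes that the second step never steps back to u, which the slide's numbers suggest but do not state.

```python
from fractions import Fraction

def random_random_prob(adj, u, w):
    """P(u closes a triangle to w): u picks a neighbor v uniformly, v picks a neighbor (not u) uniformly."""
    total = Fraction(0)
    for v in adj[u]:
        candidates = adj[v] - {u}              # assumption: v does not pick u itself
        if w in candidates:
            total += Fraction(1, len(adj[u])) * Fraction(1, len(candidates))
    return total

# Hypothetical graph: "v2" plays the role of v' in the diagram.
adj = {
    "u": {"v", "v2", "a", "b", "c"},
    "v": {"u", "w", "x"},
    "v2": {"u", "w"},
    "a": {"u"}, "b": {"u"}, "c": {"u"},
    "w": {"v", "v2"},
    "x": {"v"},
}
prob = random_random_prob(adj, "u", "w")
print(prob)
```

Multiplying such probabilities over all observed triad-closing edges gives the likelihood of the graph under the chosen pair of strategies.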

slide-32
SLIDE 32

■ Improvement in log-likelihood over baseline: [Leskovec et al., KDD '08]
  • Baseline: Pick a random node 2 hops away

Strategies to pick a neighbor:
  • random: uniformly at random
  • deg: proportional to its degree
  • com: prop. to the number of common friends
  • last: prop. to time since last activity
  • comlast: prop. to com*last

(Table axes: strategy to select v (1st node) vs. strategy to select w (2nd node); diagram of nodes u, v, w)

slide-33
SLIDE 33

■ The model of network evolution [Leskovec et al., KDD '08]

Process                Model
P1) Node arrival       Node arrival function is given
P2) Edge initiation    Node lifetime is exponential; edge gaps get smaller as the degree increases
P3) Edge destination   Pick edge destination using random-random

slide-34
SLIDE 34

■ Theorem: Exponential node lifetimes and power-law with exponential cutoff edge gaps lead to power-law degree distributions [Leskovec et al., KDD '08]
■ Comments:
  • The proof is based on a combination of exponentials (see HW3)
  • Interesting as temporal behavior predicts a structural network property

slide-35
SLIDE 35

■ Given the model one can take an existing network and continue its evolution
■ Compare true and predicted (based on the theorem) degree exponent:

slide-36
SLIDE 36

■ How do networks evolve at the macro level?
  • What are global phenomena of network growth?
■ Questions:
  • What is the relation between the number of nodes n(t) and number of edges e(t) over time t?
  • How does diameter change as the network grows?
  • How does degree distribution evolve as the network grows?

slide-37
SLIDE 37

■ N(t) ... nodes at time t
■ E(t) ... edges at time t
■ Suppose that N(t+1) = 2 * N(t)
■ Q: What is E(t+1)? Is it 2 * E(t)?
■ A: over-doubled! [Leskovec et al., KDD 05]
  • But obeying the Densification Power Law

slide-38
SLIDE 38

■ What is the relation between the number of nodes and the edges over time?
■ First guess: constant average degree over time
■ Networks become denser over time [Leskovec et al., KDD 05]
■ Densification Power Law: $E(t) \propto N(t)^{a}$
  • $a$ ... densification exponent ($1 \le a \le 2$)

Figure: E(t) vs. N(t) for the Internet (a = 1.2) and Citations (a = 1.6)

slide-39
SLIDE 39

■ Densification Power Law [Leskovec et al. KDD 05]
  • The number of edges grows faster than the number of nodes: average degree is increasing
  • $E(t) \propto N(t)^{a}$, or equivalently $\log E(t) = a \log N(t) + \text{const}$
■ $a$ ... densification exponent, $1 \le a \le 2$:
  • a=1: linear growth, constant out-degree (traditionally assumed)
  • a=2: quadratic growth, clique
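Because the law is a straight line on log-log axes, the exponent $a$ can be estimated as the least-squares slope of $\log E(t)$ against $\log N(t)$. The sketch below uses synthetic snapshots generated with a known exponent; the function name and data are illustrative only.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log(edges) vs. log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic snapshots of a graph that densifies with a = 1.2.
nodes = [100, 200, 400, 800]
edges = [n ** 1.2 for n in nodes]
a = densification_exponent(nodes, edges)
print(a, 2 ** a)   # 2**a is the factor by which edges grow when nodes double
```

With $a > 1$ the factor $2^a$ exceeds 2, which is the "over-doubling" of edges when the number of nodes doubles.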

slide-40
SLIDE 40

■ Prior models and intuition say that the network diameter slowly grows (like log N, log log N)
■ Diameter shrinks over time [Leskovec et al. KDD 05]
  • As the network grows the distances between the nodes slowly decrease

Figure: diameter vs. time (Internet) and diameter vs. size of the graph (Citations)

slide-41
SLIDE 41

Is shrinking diameter just a consequence of densification? [Leskovec et al. TKDD 07]

Figure: diameter vs. size of the graph for an Erdos-Renyi random graph with densification exponent a = 1.3

Densifying random graph has increasing diameter ⇒ There is more to shrinking diameter than just densification

slide-42
SLIDE 42

Is it the degree sequence? Compare diameter of a:
  • True network (red)
  • Random network with the same degree distribution (blue)

Figure (Citations): diameter vs. size of the graph

Densification + degree sequence give shrinking diameter

slide-43
SLIDE 43

■ How does degree distribution evolve to allow for densification? [Leskovec et al. TKDD 07]
■ Option 1) Degree exponent $\alpha_t$ is constant:
  • Fact 1: For $1 < \alpha_t < 2$ constant, then: $a = 2/\alpha$

A consequence of what we learned in last class:
  ■ Power-laws with exponents <2 have infinite expectations.
  ■ So, by maintaining constant degree exponent $\alpha$ the average degree grows.

(Figure: Email network)

slide-44
SLIDE 44

■ How does degree distribution evolve to allow for densification? [Leskovec et al. TKDD 07]
■ Option 2) $\alpha_t$ evolves with graph size $n$:
  • Fact 2: For $\alpha_t = \dfrac{4 n_t^{a-1} - 1}{2 n_t^{a-1} - 1}$, where $a$ is the densification exponent

Remember, expected degree is: $E[x] = \frac{\alpha - 1}{\alpha - 2}\, x_m$
So $\alpha$ has to decay as a function of graph size for the avg. degree to go up

(Figure: Citation network)
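Plugging the size-dependent exponent into the expected-degree formula shows the densification explicitly: algebraically $\frac{\alpha_t - 1}{\alpha_t - 2} = 2 n_t^{a-1}$, so the average degree grows like $n^{a-1}$ and the edge count like $n^{a}$. The sketch below checks this numerically; the function names and the choice $x_m = 1$ are illustrative.

```python
import math

def alpha_for_size(n, a):
    """Degree exponent alpha_t = (4 n^(a-1) - 1) / (2 n^(a-1) - 1)."""
    m = n ** (a - 1)
    return (4 * m - 1) / (2 * m - 1)

def expected_degree(alpha, x_min):
    """Mean of a power law with exponent alpha > 2 and minimum value x_min."""
    if alpha <= 2:
        return math.inf
    return (alpha - 1) / (alpha - 2) * x_min

for n in (1_000, 10_000, 100_000):
    alpha = alpha_for_size(n, a=1.5)
    print(n, round(alpha, 4), round(expected_degree(alpha, 1.0), 2))
```

As $n$ grows, $\alpha_t$ decays toward 2 from above while the expected degree keeps increasing.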

slide-45
SLIDE 45

■ Want to model graphs that densify and have shrinking diameters [Leskovec et al. TKDD 07]
■ Intuition:
  • How do we meet friends at a party?
  • How do we identify references when writing papers?

(Diagram: nodes v, w)

slide-46
SLIDE 46

■ The Forest Fire model has 2 parameters: [Leskovec et al. TKDD 07]
  • p ... forward burning probability
  • r ... backward burning probability
■ The model:
  • Each turn a new node v arrives
  • Uniformly at random chooses an "ambassador" w
  • Flip 2 geometric coins (draws from a geometric distribution) to determine the number of in- and out-links of w to follow
  • "Fire" spreads recursively until it dies
  • New node v links to all burned nodes
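The steps above can be sketched as follows. This is a simplified illustration, not the reference implementation: the exact parameterization of the two geometric draws is not given on the slide, so the sketch assumes "number of successes before the first failure" with success probability p for out-links and r for in-links, and all function names are hypothetical. Both probabilities must be below 1 for the draws to terminate.

```python
import random

def geometric(rng, prob):
    """Number of successes before the first failure (mean prob / (1 - prob))."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count

def forest_fire(n, p, r, seed=0):
    """Return out_links: dict node -> set of earlier nodes it links to."""
    rng = random.Random(seed)
    out_links = {0: set()}
    in_links = {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)          # uniformly random existing node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                        # fire spreads until no new node burns
            w = frontier.pop()
            fwd = [x for x in sorted(out_links[w]) if x not in burned]
            bwd = [x for x in sorted(in_links[w]) if x not in burned]
            chosen = rng.sample(fwd, min(len(fwd), geometric(rng, p)))
            chosen += rng.sample(bwd, min(len(bwd), geometric(rng, r)))
            for x in chosen:
                burned.add(x)
                frontier.append(x)
        out_links[v] = set(burned)             # v links to every burned node
        in_links[v] = set()
        for x in burned:
            in_links[x].add(v)
    return out_links

graph = forest_fire(n=200, p=0.35, r=0.2)
print(sum(len(targets) for targets in graph.values()), "edges on", len(graph), "nodes")
```

With p = r = 0 the fire never spreads and each new node links only to its ambassador; larger p produces more edges per node.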

slide-47
SLIDE 47

■ Forest Fire generates graphs that densify and have shrinking diameter

Figure: densification, E(t) vs. N(t) with slope 1.32; diameter vs. N(t)

slide-48
SLIDE 48

■ Forest Fire also generates graphs with power-law degree distribution

Figure: in-degree (log count vs. log in-degree) and out-degree (log count vs. log out-degree)

slide-49
SLIDE 49

■ Fix backward probability r and vary forward burning prob. p
■ Notice a sharp transition between sparse and clique-like graphs
■ Sweet spot is very narrow

Figure: from sparse graph (increasing diameter) through constant diameter to clique-like graph (decreasing diameter)