SLIDE 1

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

CSE 6240: Web Search and Text Mining. Spring 2020

Random Graph Models

Prof. Srijan Kumar

SLIDE 2

Today’s Lecture: Networks

Networks introduction
Web as a network
Networks properties
Random graph model: Erdos-Renyi Random Graph Model
Random graph model: Small-world Random Graph Model

Some slides are inspired by Prof. Jure Leskovec’s slides

SLIDE 3

Simplest Model of Graphs

Erdös-Renyi Random Graphs [Erdös-Renyi, 1960]

Two variants:

– Gn,p: undirected graph on n nodes and each edge (u,v) appears i.i.d. with probability p
– Gn,m: undirected graph with n nodes and m edges, where edges are picked uniformly at random

What kind of networks do such models produce?
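
The two variants can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `sample_gnp` and `sample_gnm` are hypothetical, and nodes are assumed to be labeled 0..n-1.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """Return the edge set of one G(n,p) realization on nodes 0..n-1."""
    # Flip an independent coin with success probability p for every node pair.
    return {(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p}


def sample_gnm(n, m, rng):
    """Return the edge set of one G(n,m) realization: m distinct edges, uniformly chosen."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return set(rng.sample(all_pairs, m))


rng = random.Random(0)  # fixed seed only so that the run is repeatable
g1 = sample_gnp(10, 1 / 6, rng)
g2 = sample_gnp(10, 1 / 6, rng)
g3 = sample_gnm(10, 7, rng)
print(len(g1), len(g2), len(g3))  # the G(n,m) sample always has exactly m edges
```

Two calls with the same n and p generally return different edge sets, while the Gn,m sample fixes the edge count and randomizes only which edges appear.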

SLIDE 4

Random Graph Models: Intuition

n and p do not uniquely determine the graph!

– The graph is a result of a random process

We can have many different realizations given the same n and p

[Figure: example realizations for n = 10, p = 1/6]

SLIDE 5

Random Graph Model: Edges

How likely is a graph on E edges?
P(E): the probability that a given Gnp generates a graph on exactly E edges:

$$P(E) = \binom{E_{max}}{E} \, p^{E} \, (1-p)^{E_{max}-E}$$

where Emax=n(n-1)/2 is the maximum possible number of edges in an undirected graph of n nodes

P(E) is a Binomial distribution: Number of successes in a sequence of Emax independent yes/no experiments
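
As a quick numerical check of the binomial form of P(E), the sketch below evaluates it for every possible edge count. The values n = 10 and p = 1/6 are borrowed from the earlier example, and `prob_num_edges` is a hypothetical helper name.

```python
from math import comb


def prob_num_edges(n, p, e):
    """P(E = e) for G(n,p): a Binomial(Emax, p) probability evaluated at e."""
    e_max = n * (n - 1) // 2
    return comb(e_max, e) * p**e * (1 - p) ** (e_max - e)


n, p = 10, 1 / 6
e_max = n * (n - 1) // 2  # 45 possible edges for n = 10
pmf = [prob_num_edges(n, p, e) for e in range(e_max + 1)]
expected_edges = sum(e * q for e, q in enumerate(pmf))
print(f"sum of P(E) = {sum(pmf):.6f}")
print(f"expected number of edges = {expected_edges:.3f} (p * Emax = {p * e_max:.3f})")
```

The probabilities sum to 1 and the mean equals p·Emax, as expected for a binomial distribution.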

SLIDE 6

Node Degrees in a Random Graph

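What is the expected degree of a node?
Probability of node u linking to node v is p
u can link (flips a coin) to all other (n-1) nodes
Thus, the expected degree of node u is: p(n-1)

$$E[X_v] = \sum_{u=1}^{n-1} E[X_{vu}] = (n-1)p$$

where X_{vu} indicates whether the edge (v,u) is present and X_v is the degree of node v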

SLIDE 7

Key Network Properties

Degree distribution: P(k)
Clustering coefficient: C
Path length: h

What are the values of these properties for Gnp?

SLIDE 8

Degree Distribution

Degree distribution of Gnp is binomial
Let P(k) denote the fraction of nodes with degree k:

$$P(k) = \binom{n-1}{k} \, p^{k} \, (1-p)^{n-1-k}$$

– $\binom{n-1}{k}$: select k nodes out of n-1
– $p^{k}$: probability of having k edges
– $(1-p)^{n-1-k}$: probability of missing the rest of the n-1-k edges

Mean and variance of a binomial distribution:

$$\bar{k} = p(n-1), \qquad \sigma^2 = p(1-p)(n-1)$$
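
One way to see the binomial degree distribution is to sample a single graph and compare its degree histogram with the formula. The sketch below does this under assumed parameters (n = 500, p = 0.02, fixed seed); the helper names are hypothetical.

```python
import itertools
import random
from collections import Counter
from math import comb


def binomial_degree_pmf(n, p, k):
    """P(k) = C(n-1, k) p^k (1-p)^(n-1-k)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def sample_degrees(n, p, rng):
    """Node degrees of one G(n,p) realization."""
    deg = [0] * n
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            deg[u] += 1
            deg[v] += 1
    return deg


n, p = 500, 0.02
deg = sample_degrees(n, p, random.Random(0))
mean_deg = sum(deg) / n
var_deg = sum((d - mean_deg) ** 2 for d in deg) / n
print(f"mean degree: {mean_deg:.2f}  (theory {p * (n - 1):.2f})")
print(f"variance:    {var_deg:.2f}  (theory {p * (1 - p) * (n - 1):.2f})")

hist = Counter(deg)
for k in range(5, 16):
    print(k, f"empirical {hist[k] / n:.3f}", f"binomial {binomial_degree_pmf(n, p, k):.3f}")
```

With only 500 nodes the empirical fractions are noisy, but they should track the binomial values around the mean degree.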

SLIDE 9

Degree Distribution

As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of the average degree $\bar{k}$.

$$\frac{\sigma}{\bar{k}} = \left[ \frac{1-p}{p} \cdot \frac{1}{n-1} \right]^{1/2} \approx \frac{1}{(n-1)^{1/2}}$$

[Figure: P(k) vs. k]
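
The narrowing can be checked numerically. The short sketch below evaluates σ divided by the mean degree for growing n, holding p fixed at 0.1 (an illustrative choice, not a value from the slides).

```python
from math import sqrt


def relative_width(n, p):
    """sigma / k_bar for the Binomial(n-1, p) degree distribution."""
    k_bar = p * (n - 1)
    sigma = sqrt(p * (1 - p) * (n - 1))
    return sigma / k_bar


p = 0.1  # held fixed while n grows
for n in (100, 10_000, 1_000_000):
    print(n, f"{relative_width(n, p):.4f}")
```

The ratio shrinks like 1/sqrt(n-1), up to the constant factor sqrt((1-p)/p).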

SLIDE 10

Clustering Coefficient of Gnp

Clustering coefficient:

$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$

– Where e_i is the number of edges between i’s neighbors

In Gnp, the expected number of such edges is

$$e_i = p \cdot \frac{k_i (k_i - 1)}{2}$$

– $k_i (k_i - 1)/2$: number of distinct pairs of neighbors of node i of degree k_i
– p: each pair is connected with prob. p

So,

$$C = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$$

Clustering coefficient of a random graph is small

– Bigger graphs with the same average degree k have lower clustering coefficient
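
The claim that the clustering coefficient of Gnp is about p can be checked empirically. The sketch below samples one graph (assumed values n = 400, p = 0.05) and averages the local clustering coefficients; the helper names are hypothetical, and nodes with fewer than two neighbors are counted as having clustering 0.

```python
import itertools
import random


def sample_gnp_adjacency(n, p, rng):
    """Adjacency sets of one G(n,p) realization."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken to be 0 here when k_i < 2."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    # e_i: number of neighbor pairs of i that are themselves connected.
    e = sum(1 for a, b in itertools.combinations(adj[i], 2) if b in adj[a])
    return 2 * e / (k * (k - 1))


n, p = 400, 0.05
adj = sample_gnp_adjacency(n, p, random.Random(0))
avg_c = sum(local_clustering(adj, i) for i in range(n)) / n
print(f"average clustering = {avg_c:.3f}, p = {p}")
```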

SLIDE 11

Key Network Properties

Degree distribution:

$$P(k) = \binom{n-1}{k} \, p^{k} \, (1-p)^{n-1-k}$$

Clustering coefficient: C=p=k/n
Path length: h

What are the values of these properties for Gnp?

SLIDE 12

Average Shortest Path

Average path length = O(log n)
Erdös-Renyi networks can grow to be very large but nodes will be just a few hops apart

[Figure: average shortest path vs. number of nodes]
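
A rough way to observe the slow growth of path lengths is to run breadth-first search on sampled graphs of increasing size. The sketch below assumes a constant average degree of 10 and estimates the average from 50 random source nodes; these choices and the helper names are illustrative only.

```python
import itertools
import math
import random
from collections import deque


def sample_gnp_adjacency(n, p, rng):
    """Adjacency lists of one G(n,p) realization."""
    adj = [[] for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj, sources):
    """Mean hop distance from the given sources to all nodes they can reach."""
    total = count = 0
    for s in sources:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())
        count += len(dist) - 1  # exclude the source itself
    return total / count


rng = random.Random(0)
k_bar = 10  # average degree held constant as n grows
results = {}
for n in (250, 500, 1000):
    adj = sample_gnp_adjacency(n, k_bar / (n - 1), rng)
    sources = rng.sample(range(n), 50)
    results[n] = average_path_length(adj, sources)
    print(n, f"h = {results[n]:.2f}", f"ln n / ln k = {math.log(n) / math.log(k_bar):.2f}")
```

Quadrupling n adds well under one hop to the average distance, in line with logarithmic growth.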

SLIDE 13

MSN Network Properties vs. Gnp Properties

Property                  MSN        Gnp
Degree distribution       (figure)   (figure)
Path length               6.6        O(log n) ~ 8.2
Clustering coefficient    0.11       k / n ≈ 8·10^-8

SLIDE 14

Clustering Implies Edge Locality

MSN network has 7 orders of magnitude larger clustering than the corresponding Gnp!

Other examples:

– Actor Collaborations (IMDB): N = 225,226 nodes, avg. degree k = 61
– Electrical power grid: N = 4,941 nodes, k = 2.67
– Network of neurons: N = 282 nodes, k = 14

Network        hactual    hrandom    Cactual    Crandom
Film actors    3.65       2.99       …          0.00027
Power Grid     18.70      12.40      …          0.005
C. elegans     2.65       2.25       …          0.05

SLIDE 15

Gnp Simulation Experiment: Giant Component

n=100,000, k=p(n-1) = 0.5 … 3
Emergence of a giant component: average degree k=2E/n or p=k/(n-1)

– When k=1-ε: all components are of size O(log n)
– When k=1+ε: 1 component of size Ω(n), others have size O(log n)

[Figure: fraction of nodes in the largest component vs. average degree k; the transition occurs at p·(n-1) = 1]
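
The experiment can be approximated with a short simulation. As a simplification, the sketch below uses the Gn,m variant with m = round(k·n/2) edges (so generation is linear in the number of edges), a smaller n = 20,000 than the slide, and a union-find structure to track components. All names and parameter values are illustrative assumptions.

```python
import random


def largest_component_fraction(n, k_bar, rng):
    """Fraction of nodes in the largest component of a sparse random graph.

    Assumption: uses G(n,m) with m = round(k_bar * n / 2) edges instead of
    G(n,p), so that generation takes O(m) time rather than O(n^2).
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    m = round(k_bar * n / 2)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue  # no self-loops
        edge = (min(u, v), max(u, v))
        if edge in edges:
            continue  # no duplicate edges
        edges.add(edge)
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru  # union by size
            size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


rng = random.Random(0)
for k_bar in (0.5, 1.0, 1.5, 2.0, 3.0):
    frac = largest_component_fraction(20_000, k_bar, rng)
    print(f"k = {k_bar}: largest component holds {frac:.3f} of the nodes")
```

Below k = 1 the largest component is a vanishing fraction of the graph; above it, a single component quickly absorbs most nodes.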

SLIDE 16

Real Networks vs. Gnp

Are real networks like random graphs?

– Giant connected component: YES
– Average path length: YES
– Clustering Coefficient: NO
– Degree Distribution: NO

SLIDE 17

Real Networks vs. Gnp

Problems with the random networks model:

– Degree distribution differs from that of real networks
– Giant component in most real networks does NOT emerge through a phase transition
– No local structure: clustering coefficient is too low

Most important: Are real networks random?

– The answer is simply: NO!

SLIDE 18

Real Networks vs. Gnp

If Gnp is wrong, why did we spend time on it?

– It is the reference model for the rest of the class.
– It will help us calculate many quantities that can then be compared to the real data
– It will help us understand to what degree a particular property is the result of some random process

While Gnp is not realistic, it is useful

SLIDE 19

Problem with the ER Model

Gnp model has short paths: O(log n)

– This is the smallest diameter we can get if we have a constant degree.
– But clustering is low!

But real networks have “local” structure

– Triadic closure: Friend of a friend is my friend
– High clustering but diameter is also high

Can we generate graphs with high clustering coefficient while having short paths (low diameter)?

Solution: Small-World Model

[Figure labels: low diameter, low clustering coefficient | high clustering coefficient, high diameter]

SLIDE 20

Today’s Lecture: Networks

Networks introduction
Web as a network
Networks properties
Random graph model: Erdos-Renyi Random Graph Model
Random graph model: Small-world Random Graph Model

Some slides are inspired by Prof. Jure Leskovec’s slides

SLIDE 21

Six Degrees of Kevin Bacon

Origins of a small-world idea:

The Bacon number:

– Create a network of Hollywood actors
– Connect two actors if they co-appeared in a movie
– Bacon number: number of steps to Kevin Bacon

As of Dec 2007, the highest Bacon number reported is 8

Only approx. 12% of all actors cannot be linked to Bacon

SLIDE 22

Erdos Number

Erdos Number: number of hops in scientific co-author graph to reach Paul Erdos

Srijan’s Erdos number is 4.
Find out your Erdos number:

http://www.ams.org/mathscinet/collaborationDistance.html

SLIDE 23

The Small-World Experiment

What is the typical shortest path length between any two people?

– Experiment on the global friendship network

Can’t measure, need to probe explicitly
Small-world experiment [Milgram ’67]

– Picked 300 people in Omaha, Nebraska and Wichita, Kansas
– Asked them to get a letter to a stock-broker in Boston by passing it through friends only

How many steps do you think it took?

SLIDE 24

The Small-World Experiment

64 chains completed (letters reached)

– It took 6.2 steps on the average, thus “6 degrees of separation”

Further observations:

– People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 6.7
– People from the Boston area have even closer paths: 4.4

On average, you are 6 hops away from anyone in the world!

Milgram’s small world experiment

SLIDE 25

The Small-World Experiment #2: Columbia

In 2003 Dodds, Muhamad and Watts performed similar experiments using e-mail:

– 18 targets of various backgrounds
– 24,000 first steps (~1,500 per target)
– 65% dropout per step
– 384 chains completed (1.5%)

Average chain length = 4.01

Problem: People stop participating

[Figure: number of chains n(h) vs. path length h]

SLIDE 26

6-Degrees: Should We Be Surprised?

Assume each human is connected to 100 other people, then:

– Step 1: reach 100 people
– Step 2: reach 100·100 = 10,000 people
– Step 3: reach 100·100·100 = 1,000,000 people
– Step 4: reach 100·100·100·100 = 100M people
– In 5 steps we can reach 10 billion people
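
The same back-of-the-envelope count can be written as a tiny loop. The sketch assumes, like the estimate above, that every step reaches only new people (no overlapping friendships); the population figure of 7.8 billion is an assumed round number, and `steps_to_reach` is a hypothetical helper.

```python
def steps_to_reach(population, friends_per_person):
    """Smallest h with friends_per_person**h >= population.

    Idealization: every step reaches entirely new people, i.e. friend
    circles never overlap.
    """
    reached, steps = 1, 0
    while reached < population:
        reached *= friends_per_person
        steps += 1
    return steps


for h in range(1, 6):
    print(h, f"{100**h:,}")
print(steps_to_reach(7_800_000_000, 100))  # prints 5
```

Real friendship circles overlap heavily, so this count is an optimistic lower bound on the number of steps.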

SLIDE 27

Small-World: How?

Could a network with high clustering be at the same time a small world?

– How can we at the same time have high clustering and small diameter?

Intuition:

– Clustering implies edge “locality”
– Randomness enables “shortcuts”

[Figure labels: high clustering, high diameter | low clustering, low diameter]

SLIDE 28

The Small-World Model [Watts-Strogatz ‘98]

Two components to the model:

(1) Start with a low-dimensional regular lattice

– In our case, we are using a ring as a lattice
– Has high clustering coefficient, but has high diameter

(2) Rewire: introduce randomness (“shortcuts”)

– Add/remove edges to create shortcuts to join remote parts of the lattice
– For each edge, with probability p move the other end to a random node
– Reduces the diameter by adding shortcuts
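
The two steps can be sketched as follows. This is one possible reading of the rewiring rule, not necessarily the exact procedure of the original paper: each lattice edge (u, v) keeps the endpoint u and, with probability p, moves the other end to a node chosen uniformly among those not already adjacent to u. The function name and the parameter k (number of ring neighbors per node, assumed even and smaller than n) are illustrative.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring of n nodes, each joined to its k nearest neighbors (k even, k < n),
    then every lattice edge is rewired with probability p."""
    adj = [set() for _ in range(n)]
    # Step 1: regular ring lattice.
    for u in range(n):
        for offset in range(1, k // 2 + 1):
            v = (u + offset) % n
            adj[u].add(v)
            adj[v].add(u)
    # Step 2: visit each lattice edge once and rewire it with probability p.
    for offset in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + offset) % n
            if v in adj[u] and rng.random() < p:
                # New endpoint: not u itself and not already a neighbor of u.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if not candidates:
                    continue
                w = rng.choice(candidates)
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj


lattice = watts_strogatz(20, 4, 0.0, random.Random(0))
rewired = watts_strogatz(20, 4, 0.2, random.Random(0))
print(sorted(lattice[0]))  # ring neighbors of node 0: [1, 2, 18, 19]
print(sum(len(a) for a in rewired) // 2)  # rewiring keeps the edge count at n*k/2 = 40
```

Because an edge is moved rather than added, the total number of edges stays fixed while some of them become long-range shortcuts.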

SLIDE 29

Tuning Randomness in Small-World Model

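Rewiring allows us to “interpolate” between a regular lattice and a random graph

– Regular lattice: high clustering, high diameter; $h = \frac{N}{2\bar{k}}$, $C = \frac{3}{4}$
– Small-world network: high clustering, low diameter
– Random graph: low clustering, low diameter; $h = \frac{\log N}{\log \alpha}$, $C = \frac{\bar{k}}{N}$

C = 1/2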

SLIDE 30

Randomness vs Clustering

Intuition: It takes a lot of randomness to ruin the clustering, but a very small amount to create shortcuts.

Clustering coefficient: $C = \frac{1}{n} \sum_i C_i$

[Figure: clustering coefficient and average path length vs. probability of rewiring p; the parameter region of high clustering and low path length is marked]
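
The trade-off can be reproduced on a small example. The sketch below builds a ring of n = 200 nodes with k = 4 neighbors each, rewires with several probabilities p, and reports the average clustering coefficient and average path length. The rewiring rule (keep one endpoint, move the other to a uniformly chosen non-neighbor), the parameter values, and all function names are assumptions made for illustration.

```python
import itertools
import random
from collections import deque


def watts_strogatz(n, k, p, rng):
    """Ring lattice (k nearest neighbors, k even) with each edge rewired w.p. p."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for offset in range(1, k // 2 + 1):
            adj[u].add((u + offset) % n)
            adj[(u + offset) % n].add(u)
    for offset in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + offset) % n
            if v in adj[u] and rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def average_clustering(adj):
    """C = (1/n) * sum of C_i, with C_i = 0 for nodes of degree below 2."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in itertools.combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS hop distance over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


n, k = 200, 4
results = {}
for p in (0.0, 0.01, 0.1, 1.0):
    adj = watts_strogatz(n, k, p, random.Random(0))
    results[p] = (average_clustering(adj), average_path_length(adj))
    print(f"p={p:<5} C={results[p][0]:.3f} h={results[p][1]:.2f}")
```

At p = 0 the ring with k = 4 has C = 1/2 exactly; a small p already shortens paths considerably while most of the clustering survives.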

SLIDE 31

Diameter of the Watts-Strogatz Model

Alternative formulation of the model:
1. Start with a square grid
2. Add 1 random long-range edge per node

– Each node has 1 spoke. Then randomly connect them.

Each node has 8 + 1 = 9 edges
The 8 grid neighbors of each node have 12 edges among them
Clustering:

$$C_i = \frac{2 \cdot e_i}{k_i (k_i - 1)} = \frac{2 \cdot 12}{9 \cdot 8} \geq 0.33$$

Diameter: O(log(n))

– Why?
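
The count of 12 edges among the grid neighbors can be verified directly. The sketch below assumes the grid links each cell to its 8 surrounding cells (horizontal, vertical and diagonal), which is what the degree 8 + 1 = 9 implies; the helper names are hypothetical.

```python
import itertools


def grid_neighbors(x, y):
    """The 8 cells around (x, y) on a grid where diagonal cells are also linked."""
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def adjacent(a, b):
    """Grid adjacency: distinct cells that differ by at most 1 in each coordinate."""
    return a != b and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1


nbrs = grid_neighbors(0, 0)
e_i = sum(1 for a, b in itertools.combinations(nbrs, 2) if adjacent(a, b))
k_i = len(nbrs) + 1  # 8 grid edges plus 1 long-range edge
c_lower = 2 * e_i / (k_i * (k_i - 1))
print(e_i, k_i, round(c_lower, 3))  # prints: 12 9 0.333
```

The value is a lower bound because the long-range neighbor is counted in k_i but assumed to contribute no extra edges among the neighbors.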

SLIDE 32

Proof: O(log n) Diameter of Small World Model

Convert 2x2 subgraphs into supernodes:

– Each supernode has 4 long-range edges sticking out: a 4-regular random graph!

Ignore the edges between neighboring supernodes

– Recall Gnp: short paths between supernodes
⇒ Path in the original graph: add at most 2 steps per long-range edge (by traversing within supernodes)
⇒ Diameter of the model is O(2 · log n) = O(log n)

Edges between neighboring supernodes: these edges will reduce the diameter further.

[Figure: a supernode; the 4-regular random graph on supernodes]

SLIDE 33

Small-World: Summary

Could a network with high clustering be at the same time a small world?

– Yes! You don’t need more than a few random links

The Watts-Strogatz/Small-World Model:

– Provides insight on the interplay between clustering and the small-world
– Captures the structure of many realistic networks
– Accounts for the high clustering of real networks
– Does not lead to the correct degree distribution

SLIDE 34

Today’s Lecture: Networks

Networks introduction
Web as a network
Networks properties
Random graph model: Erdos-Renyi Random Graph Model
Random graph model: Small-world Random Graph Model