Random Graph Models, Prof. Srijan Kumar (PowerPoint presentation)



SLIDE 1

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

CSE 6240: Web Search and Text Mining. Spring 2020

Random Graph Models

  • Prof. Srijan Kumar

SLIDE 2

Today’s Lecture: Networks

  • Networks introduction
  • Web as a network
  • Networks properties
  • Random graph model: Erdos-Renyi Random Graph Model
  • Random graph model: Small-world Random Graph Model

Some slides are inspired by Prof. Jure Leskovec’s slides

SLIDE 3

Simplest Model of Graphs

  • Erdös-Renyi Random Graphs [Erdös-Renyi, 1960]

  • Two variants:

– Gn,p: undirected graph on n nodes where each edge (u,v) appears i.i.d. with probability p
– Gn,m: undirected graph with n nodes and m edges, where the edges are picked uniformly at random

  • What kind of networks do such models produce?
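
The two variants can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names `sample_gnp` and `sample_gnm`, the fixed seed, and the example values n = 10, p = 1/6, m = 7 are assumptions chosen for the example.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """Gn,p: keep each of the n(n-1)/2 possible edges independently with probability p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]


def sample_gnm(n, m, rng):
    """Gn,m: pick exactly m distinct edges uniformly at random."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


rng = random.Random(0)  # fixed seed (arbitrary choice) so runs are repeatable
gnp_edges = sample_gnp(10, 1 / 6, rng)
gnm_edges = sample_gnm(10, 7, rng)
print(len(gnp_edges), "edges in the Gn,p sample (changes with the seed)")
print(len(gnm_edges), "edges in the Gn,m sample (always m)")
```

In Gn,p the number of edges is itself random, while in Gn,m it is fixed and only the placement of the edges is random.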

SLIDE 4

Random Graph Models: Intuition

  • n and p do not uniquely determine the graph!

– The graph is a result of a random process

  • We can have many different realizations given the same n and p

Figure: realizations of Gnp with n = 10, p = 1/6

SLIDE 5

Random Graph Model: Edges

  • How likely is a graph on E edges?
  • P(E): the probability that a given Gnp generates a graph on exactly E edges:

$$P(E) = \binom{E_{\max}}{E}\, p^{E} (1-p)^{E_{\max}-E}$$

where $E_{\max} = n(n-1)/2$ is the maximum possible number of edges in an undirected graph of n nodes

  • P(E) is a Binomial distribution: number of successes in a sequence of $E_{\max}$ independent yes/no experiments
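
As a quick numerical check of the binomial form of P(E), the sketch below evaluates it for the earlier example values n = 10, p = 1/6. The function name `edge_count_pmf` is made up for this illustration. The probabilities should sum to 1, and the mean number of edges should equal p times the maximum number of edges.

```python
from math import comb


def edge_count_pmf(n, p, E):
    """P(E) for Gnp: Binomial(E_max, p) evaluated at E, with E_max = n(n-1)/2."""
    e_max = n * (n - 1) // 2
    return comb(e_max, E) * p**E * (1 - p) ** (e_max - E)


n, p = 10, 1 / 6
e_max = n * (n - 1) // 2
pmf = [edge_count_pmf(n, p, E) for E in range(e_max + 1)]
expected_edges = sum(E * q for E, q in enumerate(pmf))
print(f"sum of P(E) = {sum(pmf):.6f}")
print(f"expected edges = {expected_edges:.3f}, p * E_max = {p * e_max:.3f}")
```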

SLIDE 6

Node Degrees in a Random Graph

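  • What is the expected degree of a node?
  • Probability of node u linking to node v is p
  • u can link (flips a coin) to all other (n-1) nodes
  • Thus, the expected degree of node u is: p(n-1)

$$E[X_v] = \sum_{u=1}^{n-1} E[X_{vu}] = (n-1)\,p$$

where $X_{vu}$ indicates whether the edge (v,u) is present and $X_v$ is the degree of node v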

SLIDE 7

Key Network Properties

  • Degree distribution: P(k)
  • Clustering coefficient: C
  • Path length: h

What are the values of these properties for Gnp?

SLIDE 8

Degree Distribution

  • Degree distribution of Gnp is binomial
  • Let P(k) denote the fraction of nodes with degree k:

$$P(k) = \binom{n-1}{k}\, p^{k} (1-p)^{n-1-k}$$

– $\binom{n-1}{k}$: select k nodes out of n-1
– $p^{k}$: probability of having k edges
– $(1-p)^{n-1-k}$: probability of missing the rest of the n-1-k edges

  • Mean and variance of a binomial distribution:

$$\bar{k} = p(n-1), \qquad \sigma^2 = p(1-p)(n-1)$$
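
The binomial mean and variance can be compared against one sampled graph. This is a rough sketch: the helper names and the values n = 1000, p = 0.01 are assumptions picked for illustration, and a single sample only matches the formulas approximately.

```python
import itertools
import random
from math import comb


def binomial_degree_pmf(n, p, k):
    """P(k) = C(n-1, k) p^k (1-p)^(n-1-k): probability that a node has degree k."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def sample_degrees(n, p, rng):
    """Sample one Gnp graph and return the list of node degrees."""
    degree = [0] * n
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            degree[u] += 1
            degree[v] += 1
    return degree


n, p = 1000, 0.01
degrees = sample_degrees(n, p, random.Random(0))
mean = sum(degrees) / n
var = sum((d - mean) ** 2 for d in degrees) / n
print(f"empirical mean {mean:.2f} vs p(n-1) = {p * (n - 1):.2f}")
print(f"empirical variance {var:.2f} vs p(1-p)(n-1) = {p * (1 - p) * (n - 1):.2f}")
```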

SLIDE 9

Degree Distribution

  • As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of $\bar{k}$.

$$\frac{\sigma}{\bar{k}} = \left[\frac{1-p}{p}\,\frac{1}{n-1}\right]^{1/2} \approx \frac{1}{(n-1)^{1/2}}$$

Figure: degree distribution P(k) versus k
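
To see how quickly the distribution narrows, the sketch below evaluates the ratio of standard deviation to mean degree for growing n. The function name `relative_width` and the fixed value p = 0.1 are assumptions for illustration. The approximation 1/(n-1)^(1/2) drops the constant factor ((1-p)/p)^(1/2).

```python
from math import sqrt


def relative_width(n, p):
    """sigma / mean degree = sqrt((1 - p) / (p * (n - 1))) for the binomial degree distribution."""
    return sqrt((1 - p) / (p * (n - 1)))


for n in (100, 10_000, 1_000_000):
    print(n, round(relative_width(n, p=0.1), 5))
```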

SLIDE 10

Clustering Coefficient of Gnp

  • Clustering coefficient

$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$

– Where $e_i$ is the number of edges between i’s neighbors

  • In Gnp, the expected number of edges between the neighbors of node i is

$$e_i = p\,\frac{k_i (k_i - 1)}{2}$$

– $k_i (k_i - 1)/2$: number of distinct pairs of neighbors of node i of degree $k_i$
– p: each pair is connected with prob. p

  • So,

$$C = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$$

  • Clustering coefficient of a random graph is small

– Bigger graphs with the same average degree $\bar{k}$ have lower clustering coefficient
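
The claim that the clustering coefficient of Gnp is about p can be checked on a sampled graph. This is a sketch under assumptions: the helper names, n = 500, p = 0.05, and the convention that nodes with fewer than two neighbors get a clustering coefficient of 0 are all choices made for this example.

```python
import itertools
import random


def sample_gnp_adjacency(n, p, rng):
    """Sample Gnp and return a list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); set to 0 here when k_i < 2 (a convention)."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbors of node i
    e = sum(1 for u, v in itertools.combinations(adj[i], 2) if v in adj[u])
    return 2 * e / (k * (k - 1))


n, p = 500, 0.05
adj = sample_gnp_adjacency(n, p, random.Random(0))
avg_c = sum(local_clustering(adj, i) for i in range(n)) / n
print(f"average clustering {avg_c:.4f} vs p = {p}")
```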

SLIDE 11

Key Network Properties

  • Degree distribution:

$$P(k) = \binom{n-1}{k}\, p^{k} (1-p)^{n-1-k}$$

  • Clustering coefficient: $C = p \approx \bar{k}/n$
  • Path length: h

What are the values of these properties for Gnp?

SLIDE 12

Average Shortest Path

  • Average path length = O(log n)
  • Erdös-Renyi networks can grow to be very large but nodes will be just a few hops apart

Figure: average shortest path (y-axis, 5 to 20) versus number of nodes (x-axis, up to 1,000,000)
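
The slow growth of the average shortest path can be reproduced at small scale with breadth-first search. This is only a sketch: the average degree of 8, the graph sizes, and the use of 20 random BFS sources to estimate the average are assumptions made to keep the example fast, not the setup behind the plot.

```python
import itertools
import random
from collections import deque


def sample_gnp_adjacency(n, p, rng):
    """Sample Gnp and return a list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def estimate_avg_path_length(adj, sources):
    """Average hop count from the given sources to the nodes they can reach."""
    total, count = 0, 0
    for s in sources:
        for target, d in bfs_distances(adj, s).items():
            if target != s:
                total += d
                count += 1
    return total / count if count else float("nan")


rng = random.Random(0)
avg_degree = 8  # assumed value, chosen only for illustration
for n in (100, 400, 1600):
    adj = sample_gnp_adjacency(n, avg_degree / (n - 1), rng)
    sources = rng.sample(range(n), 20)
    print(n, round(estimate_avg_path_length(adj, sources), 2))
```

Growing n by a factor of 16 should lengthen the estimated average path by only a small additive amount, which is the O(log n) behavior stated above.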

SLIDE 13

MSN Network Properties vs. Gnp Properties

Property                  MSN        Gnp
Degree distribution       (figure)   (figure)
Path length               6.6        O(log n) ~ 8.2
Clustering coefficient    0.11       k / n ≈ 8·10^-8

SLIDE 14

Clustering Implies Edge Locality

  • MSN network has about 6 orders of magnitude larger clustering than the corresponding Gnp (0.11 vs. 8·10^-8, a ratio of roughly 10^6)!

  • Other examples:

– Actor Collaborations (IMDB): N = 225,226 nodes, avg. degree k = 61
– Electrical power grid: N = 4,941 nodes, k = 2.67
– Network of neurons: N = 282 nodes, k = 14

Network        h_actual   h_random   C_actual    C_random
Film actors    3.65       2.99       (missing)   0.00027
Power Grid     18.70      12.40      (missing)   0.005
C. elegans     2.65       2.25       (missing)   0.05

SLIDE 15

Gnp Simulation Experiment: Giant Component

  • n=100,000, k=p(n-1) = 0.5 … 3
  • Emergence of a giant component: average degree k=2E/n or p=k/(n-1)

– When k=1-ε: all components are of size O(log n)
– When k=1+ε: 1 component of size Θ(n), others have size O(log n)

Figure: fraction of nodes in the largest component versus average degree k; the giant component emerges at p(n-1) = 1
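
A scaled-down version of this experiment can be sketched with a union-find structure. The assumptions are stated in the code: n is reduced to 20,000 for speed, and instead of flipping a coin for every node pair, the sketch draws roughly k·n/2 random pairs, which approximates Gnp only in the sparse regime. The function name is made up for this example.

```python
import random


def largest_component_fraction(n, avg_degree, rng):
    """Fraction of nodes in the largest component of a sparse random graph.

    Assumption: about avg_degree * n / 2 random node pairs are drawn, which
    approximates Gnp with p = avg_degree / (n - 1) when the graph is sparse.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(int(avg_degree * n / 2)):
        u, v = rng.randrange(n), rng.randrange(n)
        ru, rv = find(u), find(v)
        if ru != rv:  # self-loops and repeated pairs change nothing
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


rng = random.Random(0)
for k in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(k, round(largest_component_fraction(20_000, k, rng), 3))
```

Below k = 1 the largest component should hold a negligible fraction of the nodes, and above it the fraction should rise quickly.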

SLIDE 16

Real Networks vs. Gnp

  • Are real networks like random graphs?

– Giant connected component: YES
– Average path length: YES
– Clustering Coefficient: NO
– Degree Distribution: NO

SLIDE 17

Real Networks vs. Gnp

  • Problems with the random networks model:

– Degree distribution differs from that of real networks
– Giant component in most real networks does NOT emerge through a phase transition
– No local structure: clustering coefficient is too low

  • Most important: Are real networks random?

– The answer is simply: NO!

SLIDE 18

Real Networks vs. Gnp

  • If Gnp is wrong, why did we spend time on it?

– It is the reference model for the rest of the class.
– It will help us calculate many quantities that can then be compared to the real data.
– It will help us understand to what degree a particular property is the result of some random process.

  • While Gnp is not realistic, it is useful

SLIDE 19

Problem with the ER Model

  • Gnp model has short paths: O(log n)

– This is the smallest diameter we can get if we have a constant degree.
– But clustering is low!

  • But real networks have “local” structure

– Triadic closure: Friend of a friend is my friend
– High clustering but diameter is also high

  • Can we generate graphs with high clustering coefficient while having short paths (low diameter)?

  • Solution: Small-World Model

Figure: two example graphs, one with low diameter and low clustering coefficient, the other with high clustering coefficient and high diameter

SLIDE 20

Today’s Lecture: Networks

  • Networks introduction
  • Web as a network
  • Networks properties
  • Random graph model: Erdos-Renyi Random Graph Model
  • Random graph model: Small-world Random Graph Model

Some slides are inspired by Prof. Jure Leskovec’s slides

SLIDE 21

Six Degrees of Kevin Bacon

Origins of a small-world idea:

  • The Bacon number:

– Create a network of Hollywood actors
– Connect two actors if they co-appeared in a movie
– Bacon number: number of steps to Kevin Bacon

  • As of Dec 2007, the highest Bacon number reported is 8
  • Only approx. 12% of all actors cannot be linked to Bacon

SLIDE 22

Erdos Number

  • Erdos Number: number of hops in the scientific co-author graph to reach Paul Erdos
  • Srijan’s Erdos number is 4.
  • Find out your Erdos number: http://www.ams.org/mathscinet/collaborationDistance.html

SLIDE 23

The Small-World Experiment

  • What is the typical shortest path length between any two people?

– Experiment on the global friendship network

  • Can’t measure, need to probe explicitly
  • Small-world experiment [Milgram ’67]

– Picked 300 people in Omaha, Nebraska and Wichita, Kansas
– Asked them to get a letter to a stock-broker in Boston by passing it through friends only

  • How many steps do you think it took?

SLIDE 24

The Small-World Experiment

  • 64 chains completed (letters reached)

– It took 6.2 steps on average, thus “6 degrees of separation”

  • Further observations:

– People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 6.7
– People from the Boston area had even shorter paths: 4.4

  • On average, you are 6 hops away from anyone in the world!

Figure: Milgram’s small world experiment

SLIDE 25

The Small-World Experiment #2: Columbia

  • In 2003 Dodds, Muhamad and Watts performed similar experiments using e-mail:

– 18 targets of various backgrounds
– 24,000 first steps (~1,500 per target)
– 65% dropout per step
– 384 chains completed (1.5%)

Average chain length = 4.01

Problem: People stop participating

Figure: n(h) versus path length h

SLIDE 26

6-Degrees: Should We Be Surprised?

  • Assume each human is connected to 100 other people, then:

– Step 1: reach 100 people
– Step 2: reach 100*100 = 10,000 people
– Step 3: reach 100*100*100 = 1,000,000 people
– Step 4: reach 100*100*100*100 = 100M people
– In 5 steps we can reach 10 billion people
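
The step counts above are just powers of 100. A tiny sketch, with a made-up function name `reachable`, makes the arithmetic explicit. It assumes, as the slide does, that no two people share friends, so it is an upper bound rather than a realistic estimate.

```python
def reachable(friends_per_person, steps):
    """Upper bound on people reached after `steps` hops, assuming friend circles never overlap."""
    return friends_per_person ** steps


for h in range(1, 6):
    print(f"step {h}: {reachable(100, h):,} people")
```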

SLIDE 27

Small-World: How?

  • Could a network with high clustering be at the same time a small world?

– How can we at the same time have high clustering and small diameter?

  • Intuition:

– Clustering implies edge “locality”
– Randomness enables “shortcuts”

Figure: one graph with high clustering and high diameter, another with low clustering and low diameter

SLIDE 28

The Small-World Model [Watts-Strogatz ’98]

Two components to the model:

  • (1) Start with a low-dimensional regular lattice

– In our case, we are using a ring as a lattice
– Has high clustering coefficient, but has high diameter

  • (2) Rewire: introduce randomness (“shortcuts”)

– Add/remove edges to create shortcuts to join remote parts of the lattice
– For each edge, with probability p, move the other end to a random node
– Reduces the diameter by adding shortcuts
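
The two steps can be sketched as follows. This is an illustration under assumptions, not the reference construction: the function names, the ring with k = 4 nearest neighbors, n = 20 and p = 0.1 are chosen for the example, and the sketch additionally avoids self-loops and duplicate edges when picking the new endpoint.

```python
import random


def ring_lattice(n, k):
    """Step 1: ring of n nodes, each linked to its k nearest neighbors (k even)."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def rewire(adj, p, rng):
    """Step 2: with probability p, move the far end of each lattice edge to a random node."""
    n = len(adj)
    edges = [(u, v) for u in range(n) for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            # Assumption: skip targets that would create a self-loop or a duplicate edge.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj


rng = random.Random(0)
graph = rewire(ring_lattice(20, 4), p=0.1, rng=rng)
print(sum(len(nbrs) for nbrs in graph) // 2, "edges (rewiring keeps the edge count fixed)")
```

Setting p = 0 leaves the lattice untouched, while p = 1 moves every edge, which is the interpolation between a regular lattice and a random graph.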

SLIDE 29

Tuning Randomness in Small-World Model

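  • Rewiring allows us to “interpolate” between a regular lattice and a random graph

– Regular lattice (high clustering, high diameter): $h = \frac{N}{2\bar{k}}$, $C = \frac{3}{4}$
– Small world (high clustering, low diameter)
– Random graph (low clustering, low diameter): $h = \frac{\log N}{\log \alpha}$, $C = \frac{\bar{k}}{N}$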

C = ½

SLIDE 30

Randomness vs Clustering

  • Intuition: It takes a lot of randomness to ruin the clustering, but a very small amount to create shortcuts.

Clustering coefficient: $C = \frac{1}{n} \sum_i C_i$

Figure: clustering coefficient and average path length versus probability of rewiring p, with the parameter region of high clustering and low path length marked

SLIDE 31

Diameter of the Watts-Strogatz Model

  • Alternative formulation of the model:
  • 1. Start with a square grid
  • 2. Add 1 random long-range edge per node
  • Each node has 1 spoke. Then randomly connect the spokes.
  • Each node has 8 + 1 = 9 edges
  • Each node’s grid neighbors have 12 edges among them
  • Clustering:

$$C_i = \frac{2 e_i}{k_i (k_i - 1)} = \frac{2 \cdot 12}{9 \cdot 8} \geq 0.33$$

  • Diameter: O(log(n))

– Why?
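
The count of 12 edges among a node's grid neighbors can be verified by brute force. The sketch assumes that "8 grid edges per node" means each node is linked to its 8 surrounding cells (horizontal, vertical and diagonal); the names `OFFSETS` and `are_grid_neighbors` are made up for this example.

```python
import itertools

# The 8 cells surrounding a node placed at (0, 0).
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def are_grid_neighbors(a, b):
    """Two distinct cells are linked if they differ by at most 1 in each coordinate."""
    return a != b and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1


edges_among_neighbors = sum(
    1 for a, b in itertools.combinations(OFFSETS, 2) if are_grid_neighbors(a, b)
)
k_i = len(OFFSETS) + 1  # 8 grid edges plus 1 long-range edge
c_lower_bound = 2 * edges_among_neighbors / (k_i * (k_i - 1))
print(edges_among_neighbors, round(c_lower_bound, 3))  # prints: 12 0.333
```

The long-range edge adds a ninth neighbor without removing any of the 12 grid edges, so 2·12/(9·8) is a lower bound on the clustering coefficient of each node.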

SLIDE 32

Proof: O(log n) Diameter of Small World Model

  • Convert 2x2 subgraphs into supernodes:

– Each supernode has 4 long-range edges sticking out: a 4-regular random graph!

  • Ignore the edges between neighboring supernodes

– Recall Gnp: short paths between supernodes
⇒ Path in the original graph = add at most 2 steps per long-range edge (by traversing within supernodes)
⇒ Diameter of the model is O(2 log n) = O(log n)

  • Edges between neighboring supernodes: these edges will reduce the diameter further.

Figure: 4-regular random graph on supernodes

SLIDE 33

Small-World: Summary

  • Could a network with high clustering be at the same time a small world?

– Yes! You don’t need more than a few random links

  • The Watts-Strogatz/Small-World Model:

– Provides insight on the interplay between clustering and the small-world
– Captures the structure of many realistic networks
– Accounts for the high clustering of real networks
– Does not lead to the correct degree distribution

SLIDE 34

Today’s Lecture: Networks

  • Networks introduction
  • Web as a network
  • Networks properties
  • Random graph model: Erdos-Renyi Random Graph Model
  • Random graph model: Small-world Random Graph Model

Some slides are inspired by Prof. Jure Leskovec’s slides