Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining
CSE 6240: Web Search and Text Mining. Spring 2020
Random Graph Models
- Prof. Srijan Kumar
Today's Lecture: Networks
- Networks introduction
- Web as a network
- Erdős-Rényi Random Graphs [Erdős-Rényi, 1960]
Probability that the graph has exactly E edges:
$P(E) = \binom{E_{max}}{E}\, p^{E} (1-p)^{E_{max}-E}$
where $E_{max} = n(n-1)/2$ is the maximum possible number of edges in an undirected graph of n nodes
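As a rough illustration of the edge count, the sketch below samples a random graph and compares the number of edges with p · E_max. It assumes the G(n, p) form of the Erdős-Rényi model, in which each of the E_max node pairs becomes an edge independently with probability p; the function names `sample_gnp` and `edge_count_pmf` and the parameter values are hypothetical choices for this example.

```python
import itertools
import math
import random


def sample_gnp(n, p, rng):
    """Return the edge list of one G(n, p) sample on nodes 0..n-1 (assumed model)."""
    # Each of the n(n-1)/2 unordered pairs becomes an edge independently with probability p.
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]


def edge_count_pmf(n, p, num_edges):
    """Binomial probability that the sample has exactly num_edges edges."""
    e_max = n * (n - 1) // 2
    return math.comb(e_max, num_edges) * p**num_edges * (1 - p) ** (e_max - num_edges)


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
    n, p = 200, 0.05        # illustrative values, not from the lecture
    e_max = n * (n - 1) // 2
    edges = sample_gnp(n, p, rng)
    print("E_max:", e_max)
    print("expected edges p * E_max:", p * e_max)
    print("sampled edges:", len(edges))
```

The sampled edge count should land near p · E_max, with fluctuations that shrink relative to the mean as n grows.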
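Expected degree of node v: $E[X_v] = \sum_{u=1}^{n-1} E[X_{vu}] = (n-1)\,p$, where $X_{vu}$ indicates whether the edge between v and u exists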
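$P(k) = \binom{n-1}{k}\, p^{k} (1-p)^{n-1-k}$
- $\binom{n-1}{k}$: select k nodes out of n-1
- $p^{k}$: probability of having k edges
- $(1-p)^{n-1-k}$: probability of missing the rest of the n-1-k edges
Mean: $\bar{k} = p(n-1)$
Variance: $\sigma^2 = p(1-p)(n-1)$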
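The mean and variance of the binomial degree distribution can be checked numerically. The sketch below evaluates the pmf directly and recomputes both moments from it; the helper names `degree_pmf` and `degree_mean_and_variance` are hypothetical, and the values of n and p in the usage note are arbitrary.

```python
import math


def degree_pmf(n, p, k):
    """P(k): probability that a node has degree k when each of its
    n-1 possible edges exists independently with probability p."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def degree_mean_and_variance(n, p):
    """Mean and variance computed by summing over the pmf."""
    pmf = [degree_pmf(n, p, k) for k in range(n)]  # degrees 0..n-1
    mean = sum(k * q for k, q in enumerate(pmf))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, variance
```

For example, `degree_mean_and_variance(50, 0.1)` should agree with p(n-1) and p(1-p)(n-1) up to floating-point error.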
[Figure: degree distribution P(k) versus degree k]
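A short worked step, offered as a sketch: dividing the standard deviation by the mean of the binomial degree distribution shows how the distribution narrows around the mean as n grows. It uses only the mean p(n-1) and the variance p(1-p)(n-1); holding p fixed in the last step is an assumption of this example.

```latex
\frac{\sigma}{\bar{k}}
  = \frac{\sqrt{p(1-p)(n-1)}}{p(n-1)}
  = \left[\frac{1-p}{p}\cdot\frac{1}{n-1}\right]^{1/2}
  \propto \frac{1}{(n-1)^{1/2}}
```

So for larger networks the degree of a node is increasingly likely to be close to the mean degree.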
$C_i = \frac{2 e_i}{k_i (k_i - 1)}$
$E[e_i] = p \cdot \frac{k_i (k_i - 1)}{2}$
- $\frac{k_i (k_i - 1)}{2}$: number of distinct pairs of neighbors of node i of degree $k_i$
- $p$: each pair is connected with prob. p
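Since each pair of neighbors is connected with probability p, the expected clustering coefficient of a node is p. The sketch below checks this empirically on one sampled graph. It assumes the G(n, p) model with independent edges; the function names are hypothetical, and defining C_i as 0 for nodes with fewer than two neighbors is a convention chosen for this example.

```python
import itertools
import random


def sample_gnp_adjacency(n, p, rng):
    """Adjacency sets of one G(n, p) sample (assumed model)."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken to be 0 when k_i < 2."""
    k_i = len(adj[i])
    if k_i < 2:
        return 0.0
    # e_i counts the edges that exist among the neighbors of node i.
    e_i = sum(1 for u, v in itertools.combinations(adj[i], 2) if v in adj[u])
    return 2 * e_i / (k_i * (k_i - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

For instance, `average_clustering(sample_gnp_adjacency(300, 0.1, random.Random(0)))` should come out close to 0.1.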
Degree distribution: $P(k) = \binom{n-1}{k}\, p^{k} (1-p)^{n-1-k}$
[Figure: average shortest path (y-axis, 5 to 20) versus num nodes (x-axis, 200000 to 1000000)]
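A small-scale version of this measurement can be sketched with breadth-first search. The code below assumes a G(n, p) random graph and estimates the average shortest path from a handful of source nodes, averaging over reachable nodes only; the function names and the graph sizes in the usage note are hypothetical and far smaller than those in the plot.

```python
import itertools
import random
from collections import deque


def sample_gnp_adjacency(n, p, rng):
    """Adjacency sets of one G(n, p) sample (assumed model)."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(adj, sources):
    """Mean distance from the given sources to the nodes they can reach."""
    total, count = 0, 0
    for s in sources:
        for node, d in bfs_distances(adj, s).items():
            if node != s:
                total += d
                count += 1
    return total / count if count else float("nan")
```

Calling `average_shortest_path(sample_gnp_adjacency(n, 8 / (n - 1), random.Random(0)), range(20))` for increasing n (say 200, 400, 800) should show the path length growing slowly with the number of nodes.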
Network       h_actual   h_random   C_actual   C_random
Film actors   3.65       2.99       --         0.00027
Power Grid    18.70      12.40      --         0.005
--            2.65       2.25       --         0.05
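Emergence of a giant component as the average degree p(n-1) grows:
- p(n-1) = 1 - ε: all components are of size Ω(log n)
- p(n-1) = 1 + ε: one component of size Ω(n); the others have size Ω(log n)
[Figure: fraction of nodes in the largest component versus p(n-1), with the transition at p(n-1) = 1]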
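The transition at p(n-1) = 1 can be observed on a modest sample. The sketch below assumes the G(n, p) model and measures the fraction of nodes in the largest connected component for average degrees below and above 1; the function names, the graph size, and the average degrees tried are hypothetical choices for this example.

```python
import itertools
import random
from collections import deque


def sample_gnp_adjacency(n, p, rng):
    """Adjacency sets of one G(n, p) sample (assumed model)."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the component containing `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)


if __name__ == "__main__":
    n = 1000
    rng = random.Random(0)  # arbitrary fixed seed
    for avg_degree in (0.5, 1.0, 2.0):
        adj = sample_gnp_adjacency(n, avg_degree / (n - 1), rng)
        print(avg_degree, largest_component_fraction(adj))
```

Below the threshold the largest component should hold only a small fraction of the nodes; above it, a single component should hold most of them.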
[Figure: two networks, one with low diameter and low clustering coefficient, one with high clustering coefficient and high diameter]
the movie
Bacon
Milgram’s small world experiment
Average chain length = 4.01
Problem: People stop participating
[Figure: n(h) versus path length, h]
[Figure: two networks, one with high clustering and high diameter, one with low clustering and low diameter]
- High clustering, high diameter: $h = \frac{N}{2\bar{k}}$, $C = \frac{3}{4}$
- High clustering, low diameter
- Low clustering, low diameter: $h = \frac{\log N}{\log \alpha}$, $C = \frac{\bar{k}}{N}$
C = ½
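The value C = ½ is what one gets for a ring lattice in which every node links to its 4 nearest neighbors: of the 6 pairs among a node's neighbors, 3 are connected. That the pictured high-clustering network is such a lattice is an assumption of this sketch, and the function names are hypothetical.

```python
import itertools


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors (k even, k < n)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n  # neighbor j steps clockwise; the link is undirected
            adj[v].add(w)
            adj[w].add(v)
    return adj


def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken to be 0 when k_i < 2."""
    k_i = len(adj[i])
    if k_i < 2:
        return 0.0
    e_i = sum(1 for u, v in itertools.combinations(adj[i], 2) if v in adj[u])
    return 2 * e_i / (k_i * (k_i - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

Here `average_clustering(ring_lattice(20, 4))` evaluates to 0.5. Larger k gives higher values, approaching 3/4 for wide neighborhoods on a large ring.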
Clustering coefficient: $C = \frac{1}{n} \sum_i C_i$
[Figure: clustering coefficient and average path length versus probability of rewiring, p, marking the parameter region of high clustering and low path length]
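The rewiring experiment can be sketched as follows. This is one plausible reading of the procedure, not necessarily the exact variant used in the lecture: start from a ring lattice, then with probability p move the far endpoint of each lattice edge to a uniformly chosen node, avoiding self-loops and duplicate edges. The function names and parameters are hypothetical, and path lengths are averaged over reachable pairs only in case rewiring disconnects the graph.

```python
import itertools
import random
from collections import deque


def watts_strogatz(n, k, p, rng):
    """Ring lattice (k nearest neighbors, k even) with each edge rewired with prob. p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + j) % n
            if rng.random() < p:
                # Candidate endpoints: not v itself and not already linked to v.
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new_w = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new_w)
                    adj[new_w].add(v)
    return adj


def average_clustering(adj):
    """C = (1/n) * sum of C_i, with C_i = 0 for nodes of degree < 2."""
    total = 0.0
    for i, nbrs in adj.items():
        k_i = len(nbrs)
        if k_i >= 2:
            e_i = sum(1 for u, v in itertools.combinations(nbrs, 2) if v in adj[u])
            total += 2 * e_i / (k_i * (k_i - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean hop distance over all ordered pairs that are connected."""
    total, count = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary fixed seed
    for p in (0.0, 0.01, 0.1, 1.0):
        g = watts_strogatz(200, 4, p, rng)
        print(p, average_clustering(g), average_path_length(g))
```

For small p the path length should drop sharply while clustering stays close to its lattice value, which is the high-clustering, low-path-length region.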
them.
will reduce the diameter further.
[Figure: 4-regular random graph on supernodes; label: Super node]