Sampling Large Graphs: Algorithms and Applications
Don Towsley, College of Information & Computer Science, UMass Amherst
Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan
Measuring, analyzing large networks - large networks can be
www.flickr.com/
High school friendship network
node pairs in this work
[Figure: CCDF, true distribution vs. BFS (depth = 3)]
[Figure: CCDF, RW sampling]
Markov model at steady state visits edges
[Figure: log(CCDF) vs. log(degree); panels: random walk, node sampling]
RW - estimates tail well
node sampling - estimates small degrees well
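The contrast between the two samplers comes from the fact that a simple random walk at steady state visits a node with probability proportional to its degree, so high-degree nodes are oversampled and each visit has to be reweighted by 1/degree. The following is a minimal sketch of that correction, assuming an undirected graph stored as an adjacency dict; the function names and the toy graph are illustrative, not taken from the talk.

```python
import random
from collections import defaultdict

def random_walk(adj, start, steps, rng):
    """Simple random walk: each step moves to a uniformly chosen neighbor."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited

def rw_degree_distribution(adj, visited):
    """Estimate P(degree = d), weighting each visit by 1/degree to undo the bias."""
    weight = defaultdict(float)
    for v in visited:
        weight[len(adj[v])] += 1.0 / len(adj[v])
    total = sum(weight.values())
    return {d: w / total for d, w in weight.items()}

def ccdf(dist):
    """P(degree > d) for every degree d present in dist."""
    return {d: sum(p for x, p in dist.items() if x > d) for d in dist}

# Toy graph (illustrative): hub 0 joined to nodes 1..4, plus the edge 1-2.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
walk = random_walk(adj, start=0, steps=100_000, rng=random.Random(0))
estimate = rw_degree_distribution(adj, walk)
print(ccdf(estimate))
```

The sketch skips burn-in and treats consecutive (correlated) visits as samples, which is acceptable for a long walk on a small graph but would need care on a real network.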
networks to personal behavior on the web. In WWW 2008 (MSN)
Can infer characteristics and make recommendations:
friendship prediction, interest recommendation, …
average distance
effective diameter (the 90th percentile of all distances)
small world
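Both distance statistics can be computed exactly on a small graph by running a breadth-first search from every node. The sketch below assumes an unweighted, undirected adjacency dict and takes the effective diameter as the smallest distance covering at least 90% of reachable pairs, without interpolation; other definitions interpolate between integer distances. All names are illustrative.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def all_distances(adj):
    """Sorted hop distances over all ordered pairs of distinct, reachable nodes."""
    out = []
    for s in adj:
        out.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    return sorted(out)

def average_distance(distances):
    return sum(distances) / len(distances)

def effective_diameter(distances, q=0.9):
    """Smallest distance h such that at least a fraction q of pairs are within h."""
    ordered = sorted(distances)
    return ordered[math.ceil(q * len(ordered)) - 1]

# Path graph 0-1-2-3 (illustrative).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dists = all_distances(path)
print(average_distance(dists), effective_diameter(dists))
```

All-pairs BFS costs O(|V| * |E|) time, which is why large graphs call for sampled node pairs rather than exact computation.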
all pairs - $S = \{(u, v) : u, v \in V, u \neq v\}$
one-hop pairs - pairs of connected nodes
two-hop pairs - pairs of nodes with at least one common neighbor
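The pair sets can be enumerated directly on a small graph. This sketch assumes that a two-hop pair is a pair of distinct nodes sharing at least one common neighbor, and that the adjacency dict has no self-loops or repeated neighbors; the function names are illustrative.

```python
from itertools import combinations

def one_hop_pairs(adj):
    """Unordered pairs of connected nodes."""
    return {frozenset((u, v)) for u in adj for v in adj[u]}

def two_hop_pairs(adj):
    """Unordered pairs of nodes with at least one common neighbor."""
    pairs = set()
    for x in adj:                      # x is the shared neighbor
        for u, v in combinations(adj[x], 2):
            pairs.add(frozenset((u, v)))
    return pairs

# Toy graph (illustrative): hub 0 joined to nodes 1..4, plus the edge 1-2.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
s1 = one_hop_pairs(adj)
s2 = two_hop_pairs(adj)
print(len(s1), len(s2))
```

Under this definition a pair can belong to both sets: here nodes 1 and 2 are connected and also share neighbor 0.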
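$S$: $(\omega_1, \ldots, \omega_K)$
$S^{(1)}$: $(\omega_1^{(1)}, \ldots, \omega_K^{(1)})$
$S^{(2)}$: $(\omega_1^{(2)}, \ldots, \omega_K^{(2)})$
$\omega_k$, $\omega_k^{(1)}$, $\omega_k^{(2)}$ - fractions of node pairs in $S$, $S^{(1)}$, $S^{(2)}$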
Facebook, Google+, Twitter
UVS (Uniform Vertex Sampling):
crawling - RW: sampling bias
independent WVS (IWVS) (if we have topology)
Metropolis-Hastings WVS (MHWVS) (if not):
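When the topology is not available up front, a Metropolis-Hastings walk can draw nodes from a weighted distribution using only neighbor queries. The sketch below is a generic Metropolis-Hastings walk, not necessarily the MHWVS procedure from the talk: the `weight` function is a hypothetical stand-in for whatever node weights the method uses, and it must be strictly positive.

```python
import random
from collections import Counter

def mh_weighted_walk(adj, weight, start, steps, rng):
    """Metropolis-Hastings walk with stationary distribution proportional to weight(v)."""
    u, samples = start, []
    for _ in range(steps):
        v = rng.choice(adj[u])  # proposal: uniform neighbor, q(u -> v) = 1/deg(u)
        # Acceptance ratio: weight(v) * q(v -> u) / (weight(u) * q(u -> v)).
        ratio = (weight(v) * len(adj[u])) / (weight(u) * len(adj[v]))
        if rng.random() < ratio:  # accept with probability min(1, ratio)
            u = v
        samples.append(u)        # a rejected move repeats the current node
    return samples

# Toy graph (illustrative): hub 0 joined to nodes 1..4, plus the edge 1-2.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
samples = mh_weighted_walk(adj, weight=lambda v: 1.0, start=0,
                           steps=200_000, rng=random.Random(1))
freq = {v: c / len(samples) for v, c in Counter(samples).items()}
print(freq)
```

With constant weights the walk targets the uniform distribution over nodes; passing `weight=lambda v: len(adj[v])` makes every proposal accepted and recovers the plain random walk.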
[Equation residue: a sum $\sum_{i=1}^{n}$ and a vector $(\ldots : u \in V)$ whose entries are defined from $d_u$]
$\ldots^{(1)} = \frac{1}{\ldots} \sum_{i=1}^{n} \ldots$
$\ldots^{(1)} = \omega_k^{(1)}$, $k = 1, \ldots, K$
$\ldots^{(2)}$, $k = 1, \ldots, K$
API not provided
user IDs sparsely distributed
random walk: walker moves to a random neighbor
we saw for connected non-bipartite graph
$\ldots^{*} = \frac{1}{\ldots} \sum_{i=1}^{n} \ldots$
$\ldots^{*}$ - asymptotically unbiased estimate
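One way to see why a crawl can give asymptotically unbiased estimates for one-hop pairs: on a connected, non-bipartite, undirected graph, a simple random walk at steady state traverses every edge equally often, so the fraction of traversed edges carrying label k converges to the fraction of one-hop pairs with that label. The sketch below illustrates this with a plain average and is not necessarily the estimator from the talk; the `label` function (here, the number of common neighbors) is a hypothetical choice.

```python
import random
from collections import Counter

def rw_edge_label_fractions(adj, label, start, steps, rng):
    """Fraction of edges traversed by a simple random walk that carry each label."""
    counts = Counter()
    u = start
    for _ in range(steps):
        v = rng.choice(adj[u])       # traverse edge (u, v)
        counts[label(u, v)] += 1
        u = v
    return {k: c / steps for k, c in counts.items()}

# Toy graph (illustrative): hub 0 joined to nodes 1..4, plus the edge 1-2.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}

def common_neighbors(u, v):
    """Hypothetical pair label: number of neighbors shared by u and v."""
    return len(set(adj[u]) & set(adj[v]))

est = rw_edge_label_fractions(adj, common_neighbors, start=0,
                              steps=100_000, rng=random.Random(2))
print(est)
```

In the toy graph three of the five edges have one common neighbor and two have none, so the estimate should approach 0.6 and 0.4. No burn-in is discarded here, which matters more on large, slowly mixing graphs.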
[Figure: walk over edges; edge $(u_i, v_i)$ at step i, between step i-1 and step i+1]
current edge $(u, v)$
next edge: select randomly
RW on graph with edges as
$\ldots^{(2*)} = \frac{1}{\ldots} \sum_{i=1}^{n} \ldots$
$\ldots^{(2)}$, $1 \le k \le K$
B - number of sampled node pairs
|S| - total number of node pairs
error metric - $\mathrm{NMSE}(\hat{\omega}_k) = \sqrt{E[(\hat{\omega}_k - \omega_k)^2]} / \omega_k$
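The error metric can be approximated empirically by repeating the sampling experiment and comparing each estimate with the known true value. This sketch assumes the usual root-mean-square form of NMSE, with the expectation replaced by an average over independent runs; the function name and the numbers are illustrative.

```python
import math

def nmse(estimates, true_value):
    """Root of the mean squared error over runs, normalized by the true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

# Two hypothetical runs estimating a true fraction of 0.6.
print(nmse([0.5, 0.7], 0.6))
```

Because the error is divided by the true value, the metric is undefined for a fraction of zero and grows quickly for rare labels.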
[Figure: CCDF of common interest vs. # common interests]
consequence of
IWVS better for small
requires knowledge of topology
produce asymptotically unbiased estimates
Markov Chain Mixing Times
other more “powerful” & “elegant” sampling methods: Frontier
Efficiently Estimating Motif Statistics of Large Networks in the
Design of Efficient Sampling Methods on Hybrid Social-
measuring, maximizing group closeness centrality over disk-