An Efficient Reconciliation Algorithm for Social Networks
Silvio Lattanzi (Google Research NY) Joint work with: Nitish Korula (Google Research NY) ICERM Stochastic Graph Models
Outline: Graph reconciliation. Model and theoretical results.
Stochastic Graph Models, ICERM
Intra-language network
Inter-language network
social networks
Ground truth: 24000 matchings across the two social networks.
80 me-links.
They could re-identify 30.8% of the mappings.
Algorithm:
[Figure: step-by-step illustration of the algorithm, with nodes labelled 1 and 2]
Why? Is it necessary to have high-degree me-links?
Underlying social network.
Two copies (labelled p1 and p2): delete the edges independently.
Initial matchings.
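The model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say whether p1 and p2 are deletion or survival probabilities, so the sketch assumes each edge survives in copy i with probability p_i, and it assumes the symbol l (which appears in the parameter list later in the talk) is the probability that a node gets an initial matching. The helper names `gnp_edges` and `make_copies` are hypothetical.

```python
import random

def gnp_edges(n, p, rng):
    """Edges of a G(n, p) graph, used here only as a stand-in underlying network."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def make_copies(edges, nodes, p1, p2, l, rng):
    """Build two noisy copies of the underlying network plus initial matchings.

    Assumption: an edge is kept in copy i with probability p_i (independently),
    and each node is identified across the copies with probability l.
    """
    g1 = [e for e in edges if rng.random() < p1]
    g2 = [e for e in edges if rng.random() < p2]
    seeds = {v: v for v in nodes if rng.random() < l}
    return g1, g2, seeds

rng = random.Random(0)
underlying = gnp_edges(200, 0.1, rng)
g1, g2, seeds = make_copies(underlying, range(200), 0.7, 0.6, 0.1, rng)
print(len(underlying), len(g1), len(g2), len(seeds))
```

Both copies share node identifiers here only so the result can be checked; a reconciliation algorithm would of course see the second copy with its labels hidden.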
Algorithm: Narayanan-Shmatikov + degree bucketing + acceptance threshold
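One way to read these three ingredients is sketched below. It is a guess at the structure, not the authors' implementation: candidate pairs are scored by the number of already-matched neighbour pairs they share, degree buckets are processed from high to low by halving a degree cutoff, and a pair is accepted only when it is a mutual best candidate with score at least `threshold`. The function name `reconcile`, the parameters `threshold` and `rounds`, and the halving schedule are all assumptions.

```python
from collections import defaultdict

def reconcile(adj1, adj2, seeds, threshold=2, rounds=2):
    """adj1, adj2: dict node -> set of neighbours; seeds: dict node1 -> node2."""
    match = dict(seeds)
    degrees = [len(nb) for nb in adj1.values()] + [len(nb) for nb in adj2.values()]
    max_deg = max(degrees, default=1)
    for _ in range(rounds):
        bucket = max_deg                      # degree bucketing: high degrees first
        while bucket >= 1:
            matched2 = set(match.values())
            scores = defaultdict(int)
            # Every matched pair (w1, w2) votes for pairs of its unmatched neighbours.
            for w1, w2 in match.items():
                for u in adj1.get(w1, ()):
                    if u in match or len(adj1.get(u, ())) < bucket:
                        continue
                    for v in adj2.get(w2, ()):
                        if v in matched2 or len(adj2.get(v, ())) < bucket:
                            continue
                        scores[(u, v)] += 1
            # Acceptance threshold plus mutual-best check (ties: first seen wins).
            best1, best2 = {}, {}
            for (u, v), s in scores.items():
                if s < threshold:
                    continue
                if u not in best1 or s > scores[(u, best1[u])]:
                    best1[u] = v
                if v not in best2 or s > scores[(best2[v], v)]:
                    best2[v] = u
            for u, v in best1.items():
                if best2.get(v) == u:
                    match[u] = v
            bucket //= 2
    return match

adj1 = {0: {3, 4}, 1: {3, 4}, 2: {3, 5}, 3: {0, 1, 2}, 4: {0, 1}, 5: {2}}
adj2 = {k + 10: {x + 10 for x in nb} for k, nb in adj1.items()}  # relabelled copy
result = reconcile(adj1, adj2, {0: 10, 1: 11, 2: 12})
print(result)
```

In this toy input, node 5 has only one matched neighbour, so with `threshold=2` it stays unmatched; the threshold trades recall for fewer wrong matches.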
Parameters: $p$, $p_1$, $p_2$.

$$\frac{c \log n}{n} \le p \le \frac{1}{6}, \qquad l, p_1, p_2 \in O(1)$$

$$x = (n-2)\,p^2 p_1 p_2$$

$$P = \Pr\left[\sum_{i=1}^{n} B_i \le 2\right] = (1-x)^n + n x (1-x)^{n-1} + \binom{n}{2} x^2 (1-x)^{n-2} = 1 - n^3 x^3 - o(n^3 x^3)$$
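The three-term expression above can be checked numerically. The sketch below assumes the B_i are independent Bernoulli variables with parameter x (the slide does not define them), evaluates the probability of at most two successes, and compares the complement with the leading binomial term C(n,3) x^3, which is of order n^3 x^3 when nx is small. The function name and the sample values of n and x are illustrative only.

```python
from math import comb

def prob_at_most_two(n, x):
    """P[B_1 + ... + B_n <= 2] for independent Bernoulli(x) variables (assumed)."""
    return ((1 - x) ** n
            + n * x * (1 - x) ** (n - 1)
            + comb(n, 2) * x ** 2 * (1 - x) ** (n - 2))

n, x = 1000, 1e-5                 # illustrative values with n * x much smaller than 1
tail = 1 - prob_at_most_two(n, x)  # probability of three or more successes
leading = comb(n, 3) * x ** 3      # leading term of that tail
print(tail, leading, (n * x) ** 3)
```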
self-loops
edges with probability proportional to the current degrees
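A minimal sketch of a preferential attachment process with self-loops follows. It assumes the standard construction in which each new vertex adds one edge whose endpoint is chosen with probability proportional to current degree (the new vertex itself counting with weight one, which yields a self-loop), and in which the graph with m edges per node is obtained by merging every m consecutive vertices. The slide text is fragmentary, so this may differ in detail from the variant used in the talk; the function names are hypothetical.

```python
import random

def pa_edges_m1(steps, rng):
    """One edge per new vertex; target chosen proportionally to current degree."""
    endpoints = []   # each vertex appears once per unit of degree
    edges = []
    for t in range(steps):
        # 2t existing endpoint slots plus one slot for the new vertex (self-loop).
        r = rng.randrange(len(endpoints) + 1)
        target = t if r == len(endpoints) else endpoints[r]
        edges.append((t, target))
        endpoints.extend((t, target))
    return edges

def pa_graph(n, m, seed=0):
    """Merge every m consecutive vertices of the one-edge process into one node."""
    rng = random.Random(seed)
    return [(u // m, v // m) for u, v in pa_edges_m1(n * m, rng)]

edges = pa_graph(50, 3, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1   # a self-loop adds 2 to the degree
print(len(edges), max(degree.values()))
```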
[Figure: the preferential attachment graphs $G^m_1, \dots, G^m_{n-1}, G^m_n$]
neighbors.
in turn help to detect small degree nodes.
Nodes inserted after time $\varphi n$, for constant $\varphi$, have degree in [bound missing from the extracted slide].
For nodes of degree greater than $\log^2 n$, a constant fraction of their neighbors have been inserted after time $\epsilon n$, for constant $\epsilon$.
All nodes inserted before time $n^{0.3}$ have degree at least $\log^3 n$.
[Figure: timeline from $G^m_1$ to $G^m_n$ with markers at $\varphi n$ and $\lambda n$]
The probability that a node increases its degree is dominated by the probability of heads in a toss of a biased coin that comes up heads with probability $\frac{3 d_i}{\varphi n}$.
[Figure: timeline from $G^m_1$ to $G^m_n$ with markers $\epsilon n$ and $\frac{1}{2}d$]
The probability that the node increases its degree is dominated by the probability of heads in a toss of a biased coin that comes up heads with probability $\frac{d}{2nm}$.
From Cooper and Frieze's result on the cover time of PA graphs:

$$D_k = d_{nm}(v_1) + d_{nm}(v_2) + \cdots + d_{nm}(v_k), \qquad \Pr\left(\left|D_k - 2\sqrt{2kn}\right| \ge 3\sqrt{mn \log mn}\right) \le (mn)^{-2}$$

$$\Pr\left(d_n(v_{k+1}) = d + 1 \mid D_k - 2k = s\right) \le \frac{s + d}{2N - 2k - s - d}$$

Playing a bit with the algebra, we can get the final result.
[Figure: timeline from $G^m_1$ to $G^m_n$ with a marker at $n^{0.3}$ and further markers whose exponents combine 0.3 with $4/3$ and $3/2 - \epsilon$]

$$\sum_{i=n^a}^{n^b} \sum_{j=n^a}^{n^b} \sum_{k=n^a}^{n^b} \left(\frac{\log^3 n}{i-1}\right)^2 \left(\frac{\log^3 n}{j-1}\right)^2 \left(\frac{\log^3 n}{k-1}\right)^2 \approx n^{2b-3a} \in o(1)$$
[Figure: timelines from $G^m_1$ to $G^m_n$ with markers at $n^{0.3}$ and $n^{0.25}$]
If the underlying network is a G(n,p) graph, it is possible to reconcile it completely.
If the underlying network is a PA graph, it is possible to reconcile a large fraction of it.
Experiments on different graphs:
Are our theoretical results robust?
How does the algorithm scale with the size of the graph?
How does the algorithm perform if the underlying graph is a social network? 80% recall! Can we explain it in theory?
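The slides do not spell out how recall is computed. A common reading, assumed here, is the fraction of ground-truth pairs (excluding the initial seeds) that the algorithm recovers, with precision being the fraction of newly proposed pairs that are correct. The function name and the toy data are hypothetical.

```python
def precision_recall(found, truth, seeds=()):
    """found, truth: dict node1 -> node2; seeds: nodes matched before the run."""
    seeds = set(seeds)
    new = {u: v for u, v in found.items() if u not in seeds}
    target = {u: v for u, v in truth.items() if u not in seeds}
    correct = sum(1 for u, v in new.items() if target.get(u) == v)
    precision = correct / len(new) if new else 1.0
    recall = correct / len(target) if target else 1.0
    return precision, recall

truth = {1: "a", 2: "b", 3: "c", 4: "d"}
found = {1: "a", 2: "b", 3: "x"}      # two correct pairs, one wrong, one missing
precision, recall = precision_recall(found, truth)
print(precision, recall)
```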
What happens if we generate the underlying network using a cascade process? We recover almost all of the graph in the intersection. Can we explain it in theory?
What happens if we delete all the edges inside a subset of the communities? More than 80% recall. Can we explain it in theory?
DBLP: we generate two co-authorship graphs, one considering only publications in even years and the other only publications in [text cut off in the slide].
Gowalla: we generate two co-checkin graphs, one considering only check-ins in even years and the other only check-ins in [text cut off in the slide].
German/French Wikipedia: we crawl the inter-language links, use a few of them as seeds, and check how many links we can recover.
Recall for Wikipedia ~30%
We have really good performance for high degree nodes
How can we model this more general setting?