SLIDE 1 Efficient Algorithms for Public-Private Social Networks
KDD2015 — Sydney, Australia — August 11, 2015
Flavio Chierichetti (Sapienza University), Alessandro Epasto (Brown University), Ravi Kumar (Google), Silvio Lattanzi (Google), Vahab Mirrokni (Google)
SLIDE 2
Idealized vision
Private-Public networks
SLIDE 9 Reality
Private-Public networks
~52% of NYC Facebook users hide their friends.
"My friends are private." "Only my friends can see my friends." "We are a private group."
There is no such thing as the Social Network!
SLIDE 10 Social network of User A
Each user has his/her own personal Social Network!
SLIDE 11 Social network of User B
Each user has his/her own personal Social Network!
SLIDE 12
Computational implication
The algorithms need to respect the privacy of the users. We can only use the data that the user can access. Naively, we need to run the algorithms once for each user on a different (and huge) graph!
SLIDE 13 Application: Friend suggestion
Network signals are very useful: number of common neighbors, Personalized PageRank, etc.
[Figure: example network with user A ("My friends are private") and users B, C, D]
SLIDE 14 Application: Friend suggestion
Common Neighbors - Ideal World
1) Run the algorithm (in parallel) on the graph G.
2) For each user, suggest the top k users by number of common neighbors.
… but there is no such graph G.
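The ideal-world procedure is easy to write down when a single graph G is fully known. A minimal Python sketch, assuming an undirected edge list; `build_adjacency` and `suggest` are hypothetical names, and ties in the ranking are broken arbitrarily:

```python
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets for a (hypothetical) fully known graph G."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def suggest(adj, user, k):
    """Top-k non-friends of `user`, ranked by number of common neighbors in G."""
    friends = adj[user]
    counts = Counter()
    for friend in friends:
        for candidate in adj[friend]:
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    return [candidate for candidate, _ in counts.most_common(k)]


# Ideal world: every edge is visible. Calls for different users are
# independent, which is what makes step 1) easy to run in parallel.
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")])
print(suggest(adj, "A", 2))  # ['D', 'E']
```

D shares two neighbors with A (B and C) and E shares one (C), so D ranks first.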
SLIDE 15 Application: Friend suggestion
Common Neighbors - Real World
Multiple graphs = multiple answers! How many common neighbors do B and C have?
Answer for A: one common neighbor, me!
SLIDE 16 Application: Friend suggestion
Common Neighbors - Real World
Multiple graphs = multiple answers! How many common neighbors do B and C have?
Answer for B: zero common neighbors!
We cannot suggest C to B as a friend based on common neighbors!
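The "multiple graphs = multiple answers" point can be checked on a toy instance. In this Python sketch (hypothetical names and data layout: a list of public edges plus a dict mapping each user to the edges only that user can see), each user counts common neighbors on their own G ∪ G_u:

```python
from collections import defaultdict


def user_view(public_edges, private_edges, user):
    """Adjacency of G ∪ G_user: public edges plus the edges only `user` can see."""
    adj = defaultdict(set)
    for a, b in list(public_edges) + list(private_edges.get(user, [])):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def common_neighbors(adj, x, y):
    """Number of common neighbors of x and y in one user's view of the graph."""
    return len(adj[x] & adj[y])


# Hypothetical example: A's friendships with B and C are private.
# A sees both; B and C each see only their own edge to A.
public = [("B", "D")]
private = {
    "A": [("A", "B"), ("A", "C")],
    "B": [("A", "B")],
    "C": [("A", "C")],
}
for viewer in ("A", "B"):
    print(viewer, common_neighbors(user_view(public, private, viewer), "B", "C"))
# A 1
# B 0
```

User C, who also sees only its own edge to A, gets 0 as well.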
SLIDE 17 Naive approaches
1) Running the algorithms N times (once per user) is infeasible.
2) Ignoring all private data is very ineffective!
From user A's perspective there are interesting signals: E and D are good suggestions!
SLIDE 18 Naive approaches
From the public-data perspective there are no signals! No suggestions for the user!
SLIDE 19
Public-Private Graph Model
SLIDE 21 Private-Public model
There is a public graph $G$; in addition, every node $u$ has access to a private graph $G_u$.
We assume the private graph $G_u$ to lie within 2 hops of $u$.
SLIDE 23 Private-Public model
For each $u$ we would like to execute computation on $G \cup G_u$. This respects the privacy of each user. We want the computation to be efficient.
SLIDE 24 Two-Step Approach
Precompute a data structure for $G$ so that we can solve problems in $G \cup G_u$ efficiently.
Preprocessing: public graph $G$ → synopsis of the public graph.
Query for user $u$: synopsis + private graph $G_u$ → fast computation → output for user $u$.
SLIDE 25
Private-Public problem
Ideally:
Preprocessing time: $\tilde{O}(|E_G|)$
Preprocessing space: $\tilde{O}(|V_G|)$
Query time: $\tilde{O}(|E_{G_u}|)$
SLIDE 26
Warm-up: # connected components
SLIDE 27 Warm-up: # connected components
Precompute component IDs in $G$ (in the example, three components with IDs A, B, C).
SLIDE 28 Warm-up: # connected components
Add private edges and merge connected components.
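A minimal Python sketch of the warm-up, assuming an undirected public graph and that every endpoint of a private edge is also a node of $G$; the function names are hypothetical. Preprocessing stores one component ID per node ($O(|V_G|)$ space); a query runs union-find only over the component IDs that $G_u$ touches, so its cost depends on $|E_{G_u}|$ rather than on the size of $G$:

```python
class UnionFind:
    """Disjoint sets with path halving; items are created on first use."""

    def __init__(self, items=()):
        self.parent = {x: x for x in items}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return True if they were different."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def preprocess(nodes, public_edges):
    """Preprocessing on G: a component ID per node and the number of components."""
    uf = UnionFind(nodes)
    for a, b in public_edges:
        uf.union(a, b)
    comp_id = {v: uf.find(v) for v in nodes}
    return comp_id, len(set(comp_id.values()))


def query(comp_id, num_components, private_edges):
    """Number of components of G ∪ G_u; each successful merge removes one."""
    uf = UnionFind()  # over component IDs only, never over all of V_G
    merges = sum(uf.union(comp_id[a], comp_id[b]) for a, b in private_edges)
    return num_components - merges


# Three public components with 7, 5 and 4 nodes, as in the slide's picture.
nodes = range(16)
public_edges = (
    [(i, i + 1) for i in range(0, 6)]      # component {0..6}
    + [(i, i + 1) for i in range(7, 11)]   # component {7..11}
    + [(i, i + 1) for i in range(12, 15)]  # component {12..15}
)
comp_id, count = preprocess(nodes, public_edges)
print(count)                            # 3
print(query(comp_id, count, [(6, 7)]))  # 2
```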
SLIDE 30 Results
Algorithms: reachability, approximate all-pairs shortest paths, correlation clustering, social affinity.
Heuristics: Personalized PageRank, centrality measures.
SLIDE 33 Reachability
How many nodes can I reach from $u$?
We have to handle overlaps: the same node may be reachable along several paths, so counts cannot simply be added up.
SLIDE 37 Reachability
Key idea: use a size-estimation sketch [Cohen JCSS97].
Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set. This gives a composable sketch of size k.
Example with k = 2: two overlapping sets have sketches [0.1, 0.2] and [0.15, 0.2]; their union has sketch [0.1, 0.15].
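A minimal Python sketch of this bottom-k sketch, assuming each node draws its random value once and every set containing the node reuses it (this is what makes sketches composable). The estimator (k − 1)/x_k, where x_k is the k-th smallest value, is one standard choice (it is unbiased for uniform values); the exact variant in [Cohen JCSS97] may differ, and all names are hypothetical:

```python
import heapq
import random


def node_values(nodes, seed=0):
    """Every node samples one uniform value in [0, 1); all sets reuse it."""
    rng = random.Random(seed)
    return {v: rng.random() for v in nodes}


def bottom_k(values, k):
    """Sketch of a set: its k smallest distinct values, in increasing order."""
    return heapq.nsmallest(k, set(values))


def merge(sketch_a, sketch_b, k):
    """Sketch of the union of two sets, computed from their sketches alone."""
    return bottom_k(sketch_a + sketch_b, k)


def estimate_size(sketch, k):
    """(k - 1) / x_k; exact count when the set has fewer than k elements."""
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[-1]


print(merge([0.1, 0.2], [0.15, 0.2], 2))  # [0.1, 0.15]

# Two overlapping sets of 600 nodes each, 1000 distinct nodes in total.
values = node_values(range(1000))
a = bottom_k((values[x] for x in range(0, 600)), 64)
b = bottom_k((values[x] for x in range(400, 1000)), 64)
print(estimate_size(merge(a, b, 64), 64))
```

The first print reproduces the slide's k = 2 example. In the second, the shared nodes 400..599 are not double-counted, so the estimate should land near 1000 (with k = 64 its relative error is typically around 10 to 15 percent) rather than near 1200.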
SLIDE 38 Reachability
How many nodes can I reach from $u$?
Precompute a sketch for each node in the public graph (in the example, with k = 2: [0.1, 1.0], [0.7, 1.0], [0.2, 0.3], [0.8, 1.0]).
SLIDE 39 Reachability
How many nodes can I reach from $u$?
Compose the sketches of the nodes reachable in the private graph; in the example, the four sketches compose into [0.1, 0.2].
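The two reachability steps can be sketched in Python as follows, assuming a directed public graph given as an edge list, a hypothetical sketch size K = 16, and the (k − 1)/x_k estimator. Following the slide, the query composes the public sketches of u and of every node reachable from u inside $G_u$; private edges whose tail is reached only through public edges would need extra handling that this sketch omits. The per-node BFS in preprocessing is a quadratic stand-in chosen for brevity:

```python
import heapq
import random
from collections import defaultdict, deque

K = 16  # sketch size k; an arbitrary illustrative choice


def bottom_k(values, k=K):
    """k smallest distinct values, in increasing order."""
    return heapq.nsmallest(k, set(values))


def estimate_size(sketch, k=K):
    """Exact count for sets smaller than k, else the (k - 1) / x_k estimate."""
    return float(len(sketch)) if len(sketch) < k else (k - 1) / sketch[-1]


def reachable(adj, source):
    """All nodes reachable from `source` (itself included), by BFS."""
    seen, queue = {source}, deque([source])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def preprocess(nodes, public_edges, seed=0):
    """Public graph G (directed): one random value per node, one sketch per node."""
    rng = random.Random(seed)
    value = {v: rng.random() for v in nodes}
    adj = defaultdict(list)
    for a, b in public_edges:
        adj[a].append(b)
    # Toy version: one BFS per node, quadratic overall. The size-estimation
    # framework of [Cohen JCSS97] builds all these sketches far more cheaply.
    return {v: bottom_k(value[x] for x in reachable(adj, v)) for v in nodes}


def query(sketches, u, private_edges):
    """Estimated number of nodes reachable from u: compose the public sketches
    of u and of every node reachable from u inside G_u (nodes assumed in G)."""
    private_adj = defaultdict(list)
    for a, b in private_edges:
        private_adj[a].append(b)
    merged = []
    for x in reachable(private_adj, u):
        merged = bottom_k(merged + sketches[x])
    return estimate_size(merged)


sketches = preprocess(range(6), [(0, 1), (1, 2), (3, 4), (4, 5)])
print(query(sketches, 0, []))        # 3.0: only 0, 1, 2
print(query(sketches, 0, [(0, 3)]))  # 6.0: the private edge adds 3, 4, 5
```

The query touches only the private edges and k-entry sketches, never the public edge list.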
SLIDE 40
Experiments: Personalized PageRank
Approximating the PPR stationary distribution: up to 4 orders of magnitude faster than the naive approach.
SLIDE 41
Conclusions
- New model for practical problems.
- Some algorithms designed using sampling and sketching techniques.
- Large speed-up in practice.
SLIDE 42
Future work
- New algorithms for other problems.
- Not only graph problems.
- Study the limits of the model (lower bounds).
SLIDE 43
Thanks!
SLIDE 44 Personalized PageRank
$PPR(v, z)$ is the probability of visiting $z$ in the following lazy random walk:
- with probability $\alpha$, jump to $v$;
- with probability $1 - \alpha$, jump to a random neighbor.
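One standard way to compute $PPR(v, \cdot)$ on a known graph is power iteration on the walk just described. A minimal Python sketch, assuming an undirected graph stored as a dict of neighbor lists with no isolated nodes, and an illustrative α = 0.15 (not a value taken from the talk):

```python
def personalized_pagerank(adj, v, alpha=0.15, iterations=100):
    """Stationary distribution PPR(v, .) of the walk: with probability alpha
    jump to v, otherwise move to a uniformly random neighbor.

    adj maps every node to a non-empty list of neighbors. Power iteration of
    p <- alpha * e_v + (1 - alpha) * p P, where P is the random-neighbor step.
    """
    p = {z: 0.0 for z in adj}
    p[v] = 1.0
    for _ in range(iterations):
        nxt = {z: 0.0 for z in adj}
        nxt[v] += alpha  # restart mass
        for y, neighbors in adj.items():
            share = (1 - alpha) * p[y] / len(neighbors)
            for z in neighbors:
                nxt[z] += share  # mass moving from y to each neighbor z
        p = nxt
    return p


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print({z: round(x, 4) for z, x in personalized_pagerank(triangle, 0).items()})
# {0: 0.4035, 1: 0.2982, 2: 0.2982}
```

The error shrinks by a factor of at least 1 − α per iteration, so 100 iterations are ample here. Running this once per user on each $G \cup G_u$ is exactly the naive approach the talk wants to avoid.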
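SLIDE 54 Personalized PageRank
Nice property [Jeh and Widom WWW03]:
$$PPR_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} d_{G \cup G_u}(y)^{-1} \, PPR_{G \cup G_u}(v, y) + \alpha \mathbf{1}_v(z)$$
where $N(z)$ is the neighborhood of $z$ in $G \cup G_u$ and $\mathbf{1}_v(z)$ is 1 if $z = v$ and 0 otherwise.
We don't have $PPR_{G \cup G_u}(v, y)$.
Simple heuristic, using the public graph distribution:
$$PPR_{G \cup G_u}(v, z) \approx (1 - \alpha) \sum_{y \in N(z)} d_{G \cup G_u}(y)^{-1} \, PPR_{G}(v, y) + \alpha \mathbf{1}_v(z)$$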
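The heuristic amounts to one application of the recurrence on $G \cup G_u$, with the precomputed public vector $PPR_G(v, \cdot)$ on the right-hand side. A minimal Python sketch with hypothetical names, an undirected graph, an illustrative α, and plain power iteration standing in for however the public vector is actually precomputed:

```python
from collections import defaultdict


def adjacency(edges):
    """Undirected adjacency sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def personalized_pagerank(adj, v, alpha, iterations=200):
    """Power iteration for PPR(v, .): p <- alpha * e_v + (1 - alpha) * p P."""
    p = dict.fromkeys(adj, 0.0)
    p[v] = 1.0
    for _ in range(iterations):
        nxt = dict.fromkeys(adj, 0.0)
        nxt[v] += alpha
        for y, neighbors in adj.items():
            for z in neighbors:
                nxt[z] += (1 - alpha) * p[y] / len(neighbors)
        p = nxt
    return p


def ppr_heuristic(public_ppr, union_adj, v, alpha):
    """One step of the recurrence on G ∪ G_u, using PPR_G(v, y) in place of the
    unknown PPR_{G ∪ G_u}(v, y); nodes absent from G contribute 0."""
    return {
        z: (1 - alpha) * sum(public_ppr.get(y, 0.0) / len(union_adj[y]) for y in neighbors)
        + (alpha if z == v else 0.0)
        for z, neighbors in union_adj.items()
    }


alpha = 0.15
public_edges = [(0, 1), (1, 2), (2, 3)]  # public path
private_edges = [(0, 3)]                 # hypothetical G_u closing a cycle
public_ppr = personalized_pagerank(adjacency(public_edges), 0, alpha)
union_adj = adjacency(public_edges + private_edges)
estimate = ppr_heuristic(public_ppr, union_adj, 0, alpha)
exact = personalized_pagerank(union_adj, 0, alpha)  # the costly per-user baseline
print(estimate)
print(exact)
```

Since one application of the recurrence shrinks L1 distances to the exact vector by a factor of at least 1 − α, the estimate is never farther (in L1) from $PPR_{G \cup G_u}(v, \cdot)$ than $PPR_G(v, \cdot)$ is, although it remains an approximation.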
SLIDE 56
Social affinity
Which connection is stronger? It is important to consider the number of paths and their lengths
SLIDE 57 Social affinity
$A_\theta(v, w)$ is the maximum fraction $p$ of edges that it is possible to delete (each edge independently with probability $p$) and still have $v$ and $w$ connected with probability at least $\theta$.
SLIDE 62 Social affinity
How can we compute it? [Panigrahy et al. WSDM12] For each deletion probability $p$ in a geometric grid with ratio $1 + \epsilon$, repeat $\log n$ times: delete each edge of the graph with probability $p$ and store, for each node, the ID of its connected component (e.g., $v$: [C, A, …], $w$: [B, A, …]).
With $\log n$ samples we can estimate the connection probability.
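A minimal Python sketch of this sampling scheme, with its assumptions labelled: the grid is taken to be 0 plus the powers $(1+\epsilon)^{-i}$ down to a cutoff `p_min` (an assumption about the exact grid), component IDs are representative nodes from union-find, and the function names and sketch layout are hypothetical. `affinity` returns the largest grid value whose estimated connection probability is at least θ:

```python
import random


def components(nodes, edges):
    """Component ID (a representative node) for every node, via union-find."""
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return {v: find(v) for v in nodes}


def deletion_grid(eps, p_min):
    """Hypothetical grid: 0 plus the powers (1 + eps)^-i that are >= p_min."""
    grid, p = [0.0], 1.0
    while p >= p_min:
        grid.append(p)
        p /= 1 + eps
    return sorted(grid)


def build_sketches(nodes, edges, grid, samples, seed=0):
    """sketch[v][(i, s)] = component ID of v in sample s at deletion prob. grid[i]."""
    rng = random.Random(seed)
    sketch = {v: {} for v in nodes}
    for i, p in enumerate(grid):
        for s in range(samples):
            kept = [e for e in edges if rng.random() >= p]  # delete each edge w.p. p
            comp = components(nodes, kept)
            for v in nodes:
                sketch[v][(i, s)] = comp[v]
    return sketch


def affinity(sketch, v, w, grid, samples, theta):
    """Largest grid value p whose estimated Pr[v, w connected] is at least theta."""
    best = None
    for i, p in enumerate(grid):
        hits = sum(sketch[v][(i, s)] == sketch[w][(i, s)] for s in range(samples))
        if hits / samples >= theta:
            best = p if best is None else max(best, p)
    return best


path = list(range(5))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
grid = deletion_grid(eps=0.5, p_min=0.05)
sketch = build_sketches(path, edges, grid, samples=200)
print(affinity(sketch, 0, 1, grid, 200, theta=0.5))
print(affinity(sketch, 0, 4, grid, 200, theta=0.5))
```

The toy call uses 200 samples rather than $\log n$ only to make a five-node estimate stable. On a path, the adjacent pair (0, 1) is expected to get a clearly larger affinity than the endpoints (0, 4), since one deleted edge disconnects the former while any of four disconnects the latter.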
SLIDE 64 Social affinity
Using sketches of size $\log^2 n$ per node we can estimate social affinity. When we add $G_u$ we have to update the sketches; it is enough to update the connected components!
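Updating for $G_u$ can be sketched in Python, assuming the public sketch stores, for every node, its component ID for each (grid index, sample) pair and that every private-edge endpoint is a node of $G$. Private edges undergo the same random deletion; the surviving ones only merge component IDs, so a query costs on the order of (grid size) × (samples) × $|E_{G_u}|$. Names and the hand-made sketch are hypothetical:

```python
import random


def _find(parent, x):
    """Union-find lookup with path halving; unseen IDs start as singletons."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def affinity_with_private(sketch, grid, samples, private_edges, v, w, theta, seed=0):
    """Estimate A_theta(v, w) on G ∪ G_u from the public sketches.

    sketch[x][(i, s)] is x's public component ID in sample s at deletion
    probability grid[i] (hypothetical layout). In each sample, every private
    edge is also deleted with probability p; surviving private edges only
    merge component IDs, so the public sketches are never recomputed.
    """
    rng = random.Random(seed)
    best = None
    for i, p in enumerate(grid):
        hits = 0
        for s in range(samples):
            parent = {}  # union-find over the component IDs touched by G_u
            for a, b in private_edges:
                if rng.random() >= p:  # this private edge survives the sample
                    ra = _find(parent, sketch[a][(i, s)])
                    rb = _find(parent, sketch[b][(i, s)])
                    if ra != rb:
                        parent[ra] = rb
            if _find(parent, sketch[v][(i, s)]) == _find(parent, sketch[w][(i, s)]):
                hits += 1
        if hits / samples >= theta:
            best = p if best is None else max(best, p)
    return best


# Hand-made sketch for grid [0.0, 1.0], one sample each. Node 2 is a public
# neighbor of w = 1: same component when nothing is deleted (p = 0), alone
# when everything is deleted (p = 1).
sketch = {
    0: {(0, 0): "A", (1, 0): "A"},
    1: {(0, 0): "B", (1, 0): "B"},
    2: {(0, 0): "B", (1, 0): "C"},
}
print(affinity_with_private(sketch, [0.0, 1.0], 1, [], 0, 1, theta=0.5))        # None
print(affinity_with_private(sketch, [0.0, 1.0], 1, [(0, 2)], 0, 1, theta=0.5))  # 0.0
```

Without private edges, $v = 0$ and $w = 1$ are never connected; the private edge (0, 2) connects them whenever it and the public edge survive, which here happens only at $p = 0$.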