Efficient Algorithms for Public-Private Social Networks

SLIDE 1

Efficient Algorithms for Public-Private Social Networks

KDD2015 — Sydney, Australia — August 11, 2015

Flavio Chierichetti (Sapienza University), Alessandro Epasto (Brown University), Ravi Kumar (Google), Silvio Lattanzi (Google), Vahab Mirrokni (Google)

SLIDE 2

Idealized vision

Private-Public networks

SLIDE 3

Reality

Private-Public networks

My friends are private

SLIDE 4

Reality

Private-Public networks

My friends are private

[Figure: users A, B, C]

SLIDE 5

Reality

Private-Public networks

My friends are private

Only my friends can see my friends

SLIDE 6

Reality

Private-Public networks

My friends are private

Only my friends can see my friends

[Figure: users A, B, C, D]

SLIDE 7

Reality

Private-Public networks

We are a private group

My friends are private

Only my friends can see my friends

SLIDE 8

Reality

Private-Public networks

~52% of NYC Facebook users hide their friends

We are a private group

My friends are private

Only my friends can see my friends

SLIDE 9

Reality

Private-Public networks

~52% of NYC Facebook users hide their friends

We are a private group

My friends are private

Only my friends can see my friends

There is no such thing as the Social Network!

SLIDE 10

Social network of User A

Each user has his/her own personal Social Network!

SLIDE 11

Social network of User B

Each user has his/her own personal Social Network!

[Figure: the networks of User B and User A]

SLIDE 12

Computational implication

The algorithms need to respect the privacy of the users: we can only use the data that each user can access. Naively, we need to run the algorithms once for each user, each time on a different (and huge) graph!

SLIDE 13

Application: Friend suggestion

Network signals are very useful:
  • Number of common neighbors
  • Personalized PageRank, etc.

[Figure: users A, B, C, D]

My friends are private

SLIDE 14

Application: Friend suggestion

[Figure: users A, B, C, D]

My friends are private

Common Neighbors - Ideal World

1) Run the algorithm (in parallel) on the graph G.
2) For each user, suggest the top k users by number of common neighbors.

… but there is no such graph G.
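For reference, the ideal-world procedure could look like the minimal Python sketch below. It assumes that the single graph G exists and fits in memory as an undirected edge list; the function name `suggest_by_common_neighbors` and the tie-breaking rule are illustrative choices, not taken from the talk.

```python
from collections import defaultdict


def suggest_by_common_neighbors(edges, k):
    """Suggest, for every user, the top-k non-friends by number of common neighbors.

    Assumes a single undirected graph G given as an edge list (the "ideal world").
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    suggestions = {}
    for u in list(adj):
        counts = defaultdict(int)
        # Every path u - w - x contributes one common neighbor (w) to the pair (u, x).
        for w in adj[u]:
            for x in adj[w]:
                if x != u and x not in adj[u]:
                    counts[x] += 1
        # Highest count first; ties broken by node id so the output is deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        suggestions[u] = [x for x, _ in ranked[:k]]
    return suggestions


print(suggest_by_common_neighbors([("A", "B"), ("A", "C"), ("A", "D")], k=2))
# {'A': [], 'B': ['C', 'D'], 'C': ['B', 'D'], 'D': ['B', 'C']}
```

Running this once on G is cheap; the difficulty raised on the next slides is that each user may only see a different graph.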

SLIDE 15

Application: Friend suggestion

[Figure: users A, B, C, D]

My friends are private

Common Neighbors - Real World

Multiple graphs = Multiple answers! How many common neighbors do B and C have?

Answer for A: one common neighbor, me!

SLIDE 16

Application: Friend suggestion

[Figure: users A, B, C, D]

My friends are private

Common Neighbors - Real World

Multiple graphs = Multiple answers! How many common neighbors do B and C have?

Answer for B: zero common neighbors!

We cannot suggest C to B as a friend based on common neighbors!
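The "multiple graphs = multiple answers" point can be made concrete with a small Python sketch. The toy data is hypothetical: A's friendships are private, and we assume each user sees the public edges plus the private edges visible to them (here, just their own friendships), so each user's query runs on a different graph $G \cup G_u$. The helper names `common_neighbors` and `view` are illustrative.

```python
from collections import defaultdict


def common_neighbors(edges, x, y):
    """Common neighbors of x and y in the undirected graph given by `edges`."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj[x] & adj[y]


# Hypothetical toy data for the slide's example: A's friend list is private,
# so the edges A-B, A-C, A-D are not public. Each user sees the public edges
# plus the private edges visible to them (assumption: their own friendships).
public_edges = set()
visible_private = {
    "A": {("A", "B"), ("A", "C"), ("A", "D")},
    "B": {("A", "B")},
    "C": {("A", "C")},
    "D": {("A", "D")},
}


def view(u):
    """The edge set of G ∪ G_u, the graph on which user u's computation must run."""
    return public_edges | visible_private.get(u, set())


print(common_neighbors(view("A"), "B", "C"))  # {'A'}
print(common_neighbors(view("B"), "B", "C"))  # set()
```

The same question about B and C gets two different answers, depending on whose view the computation runs in.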

SLIDE 17

Naive approaches

[Figure: users A, B, C, D, E]

My friends are private

From user A’s perspective there are interesting signals: E and D are good suggestions!

1) Running the algorithms N times (once per user) is infeasible.
2) Ignoring all private data is very ineffective!

SLIDE 18

Naive approaches

[Figure: users A, B, C, D, E]

My friends are private

From the public data's perspective there are no signals: no suggestions for the user!

1) Running the algorithms N times (once per user) is infeasible.
2) Ignoring all private data is very ineffective!

SLIDE 19

Public-Private Graph Model

SLIDE 20

There is a public graph $G$.

Private-Public model

SLIDE 21

There is a public graph $G$; in addition, every node $u$ has access to a private graph $G_u$.

Private-Public model

We assume the private graph $G_u$ to lie within 2 hops of $u$.

SLIDE 22

For each $u$ we would like to execute the computation on $G \cup G_u$.

Private-Public model

SLIDE 23

For each $u$ we would like to execute the computation on $G \cup G_u$.

Private-Public model

This respects the privacy of each user. We want the computation to be efficient.

SLIDE 24

Two-Step Approach

Precompute a data structure for $G$ so that we can solve problems in $G \cup G_u$ efficiently.

[Figure: preprocessing turns the public graph $G$ into a synopsis of $G$; a query for user $u$ combines the synopsis with the private graph $G_u$ (fast computation) to produce the output for user $u$]

SLIDE 25

Private-Public problem

Ideally:
  • Preprocessing time: $\tilde{O}(|E_G|)$
  • Preprocessing space: $\tilde{O}(|V_G|)$
  • Query time: $\tilde{O}(|E_{G_u}|)$

SLIDE 26

Warm-up: # connected components

SLIDE 27

Warm-up: # connected components

Precompute the component ID of every node in $G$.

[Figure: nodes of $G$ labeled with their component IDs A, B, C]

SLIDE 28

Warm-up: # connected components

Add private edges and merge connected components.

[Figure: nodes of $G$ labeled with their component IDs A, B, C]

SLIDE 29

Warm-up: # connected components

Add private edges and merge connected components.

[Figure: component labels after the merge]
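A minimal Python sketch of the warm-up, assuming an undirected public graph and assuming that every endpoint of a private edge is also a node of $G$. Preprocessing labels components by BFS; a query runs a small union-find over component IDs only, so its cost grows with the number of private edges rather than with the size of $G$. The function names are hypothetical.

```python
from collections import defaultdict, deque


def preprocess_components(nodes, public_edges):
    """Preprocessing: label every node of the public graph G with a component ID."""
    adj = defaultdict(list)
    for a, b in public_edges:
        adj[a].append(b)
        adj[b].append(a)
    comp, next_id = {}, 0
    for s in nodes:
        if s in comp:
            continue
        comp[s] = next_id  # BFS from s labels its whole component
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in comp:
                    comp[y] = next_id
                    queue.append(y)
        next_id += 1
    return comp, next_id  # synopsis: IDs plus the number of components of G


def query_num_components(synopsis, private_edges):
    """Query for one user: number of connected components of G ∪ G_u."""
    comp, num_components = synopsis
    parent = {}  # union-find over the component IDs touched by private edges

    def find(c):
        while c in parent:
            c = parent[c]
        return c

    merges = 0
    for a, b in private_edges:
        ra, rb = find(comp[a]), find(comp[b])
        if ra != rb:  # the private edge joins two different components
            parent[ra] = rb
            merges += 1
    return num_components - merges


synopsis = preprocess_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
print(synopsis[1])                               # 3
print(query_num_components(synopsis, [(3, 4)]))  # 2
```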

SLIDE 30

Results

Algorithms:
  • Reachability
  • Approximate all-pairs shortest paths
  • Correlation clustering
  • Social affinity

Heuristics:
  • Personalized PageRank
  • Centrality measures

SLIDE 32

Reachability

How many nodes can I reach from $u$?

SLIDE 33

Reachability

How many nodes can I reach from $u$?

We have to handle overlaps: a node reachable along several routes must be counted only once.

SLIDE 34

Reachability

Key idea: use a size-estimation sketch [Cohen JCSS97].

Every node samples a random number in [0, 1].

[Figure: sampled values 0.1, 0.9, 0.5, 0.2, 0.3, 0.33, 0.23]

SLIDE 35

Reachability

Key idea: use a size-estimation sketch [Cohen JCSS97].

Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set.

[Figure: sampled values 0.1, 0.9, 0.5, 0.2, 0.3, 0.33, 0.23; sketch [0.1, 0.2]]

SLIDE 36

Reachability

Key idea: use a size-estimation sketch [Cohen JCSS97].

Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set. The sketch of size k is composable.

[Figure: sampled values 0.1, 0.9, 0.5, 0.2, 0.3, 0.33, 0.23, 0.7, 0.5, 0.9, 0.33, 0.15; sketches of two sets: [0.15, 0.2] and [0.1, 0.2]]

SLIDE 37

Reachability

Key idea: use a size-estimation sketch [Cohen JCSS97].

Every node samples a random number in [0, 1]. Look at the k-th smallest value and use it to estimate the size of the set. The sketch of size k is composable.

[Figure: sampled values 0.1, 0.9, 0.5, 0.2, 0.3, 0.33, 0.23, 0.7, 0.5, 0.9, 0.33, 0.15; sketches of two sets: [0.15, 0.2] and [0.1, 0.2]; sketch of their union: [0.1, 0.15]]
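The following Python sketch shows a bottom-k sketch under stated assumptions: ranks are uniform in [0, 1), and the size estimate uses the common estimator $(k-1)/\tau_k$, where $\tau_k$ is the k-th smallest rank. The slides only say that the k-th smallest value is used, so this particular estimator is an assumption here. The last lines reproduce the values shown on the slides.

```python
import heapq
import random


def assign_ranks(nodes, seed=0):
    """Every node draws an independent uniform random number in [0, 1)."""
    rng = random.Random(seed)
    return {v: rng.random() for v in nodes}


def sketch(node_set, rank, k):
    """Bottom-k sketch: the k smallest ranks among the nodes of the set (sorted)."""
    return heapq.nsmallest(k, {rank[v] for v in node_set})


def merge(s1, s2, k):
    """Sketch of the union, computed from the two sketches alone.

    A node shared by both sets has the same rank in both, so it is kept once.
    """
    return sorted(set(s1) | set(s2))[:k]


def estimate_size(s, k):
    """Estimate |S| as (k - 1) / (k-th smallest rank); assumes k >= 2."""
    if len(s) < k:
        return len(s)  # fewer than k nodes: the sketch contains the whole set
    return (k - 1) / s[-1]


slide_ranks = dict(zip("abcdefg", [0.1, 0.9, 0.5, 0.2, 0.3, 0.33, 0.23]))
print(sketch(slide_ranks, slide_ranks, k=2))  # [0.1, 0.2]
print(merge([0.1, 0.2], [0.15, 0.2], k=2))    # [0.1, 0.15]
```

Because `merge` needs only the two sketches, sketches of overlapping sets can be combined without double counting, which is what the reachability query relies on.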

SLIDE 38

Reachability

How many nodes can I reach from $u$?

Precompute a sketch for each node in the public graph.

[Figure: node sketches [0.1, 1.0], [0.7, 1.0], [0.2, 0.3], [0.8, 1.0]]

SLIDE 39

Reachability

Compose the sketches of the nodes reachable in the private graph.

[Figure: node sketches [0.1, 1.0], [0.7, 1.0], [0.2, 0.3], [0.8, 1.0]; composed sketch [0.1, 0.2]]

How many nodes can I reach from $u$?
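A rough Python sketch of the reachability query for a directed public graph, using bottom-k sketches (the helpers are included so the block runs on its own). Two simplifications are assumptions of this sketch: the public sketches are built naively with one BFS per node, whereas [Cohen JCSS97] describes a near-linear-time construction; and the query follows the slide literally, merging the public sketches of the nodes reachable from $u$ through private edges, so it does not follow paths that take a private edge after a public segment. All names are hypothetical.

```python
import heapq
import random
from collections import defaultdict, deque


def reachable(adj, src):
    """Set of nodes reachable from src (including src) in a directed graph."""
    seen, queue = {src}, deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def bottom_k(values, k):
    """The k smallest distinct values, sorted."""
    return heapq.nsmallest(k, set(values))


def preprocess_sketches(nodes, public_edges, k, seed=0):
    """Bottom-k sketch of the public reachable set R_G(x) for every node x."""
    rng = random.Random(seed)
    rank = {v: rng.random() for v in nodes}
    adj = defaultdict(list)
    for a, b in public_edges:
        adj[a].append(b)
    # Naive construction: one BFS per node, O(|V| * |E|) time.
    return {x: bottom_k((rank[y] for y in reachable(adj, x)), k) for x in rank}


def query_reach_size(sketches, private_edges, u, k):
    """Estimate how many nodes u reaches, merging public sketches along private edges."""
    padj = defaultdict(list)
    for a, b in private_edges:
        padj[a].append(b)
    merged = []
    for x in reachable(padj, u):  # u plus nodes reachable from u via private edges
        merged = bottom_k(merged + sketches[x], k)
    if len(merged) < k:
        return len(merged)  # small sets are counted exactly
    return (k - 1) / merged[-1]


sketches = preprocess_sketches(range(6), [(0, 1), (1, 2), (3, 4)], k=16)
print(query_reach_size(sketches, [], 0, k=16))        # 3
print(query_reach_size(sketches, [(0, 3)], 0, k=16))  # 5
```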

SLIDE 40

Experiments: Personalized PageRank

Approximating the PPR stationary distribution: up to 4 orders of magnitude faster than the naive approach.

SLIDE 41

Conclusions

  • New model for practical problems;
  • Some algorithms designed using sampling and sketching techniques;
  • Large speed-up in practice.

SLIDE 42

Future work

  • New algorithms for other problems;
  • Not only graph problems;
  • Study the limits of the model (lower bounds).

SLIDE 43

Thanks!

SLIDE 44

Personalized PageRank

$\mathrm{PPR}(v, z)$ is the probability of visiting $z$ in the following lazy random walk started at $v$:

  • with probability $\alpha$, jump to $v$;
  • with probability $1 - \alpha$, jump to a random neighbor.
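To make the definition concrete, here is a small Monte Carlo sketch in Python that simulates the walk and reports the fraction of steps it spends at each node. The treatment of nodes without neighbors (jump back to $v$) and the function name `ppr_monte_carlo` are assumptions of this sketch.

```python
import random
from collections import Counter


def ppr_monte_carlo(adj, v, alpha, steps, seed=0):
    """Estimate PPR(v, z) for all z as the long-run fraction of time the walk spends at z.

    adj maps each node to a list of its neighbors. At every step the walk
    jumps back to v with probability alpha, otherwise it moves to a uniformly
    random neighbor (assumption: a node with no neighbors also jumps back to v).
    """
    rng = random.Random(seed)
    visits = Counter()
    x = v
    for _ in range(steps):
        neighbors = adj.get(x, [])
        if rng.random() < alpha or not neighbors:
            x = v
        else:
            x = rng.choice(neighbors)
        visits[x] += 1
    return {z: count / steps for z, count in visits.items()}


est = ppr_monte_carlo({0: [1], 1: [0]}, v=0, alpha=0.15, steps=200_000)
print(est[0])  # should be close to 1 / (2 - 0.15), about 0.54
```

For the single edge between 0 and 1 with $v = 0$, the balance equations $\mathrm{PPR}(0,0) = \alpha + (1-\alpha)\,\mathrm{PPR}(0,1)$ and $\mathrm{PPR}(0,1) = (1-\alpha)\,\mathrm{PPR}(0,0)$ give $\mathrm{PPR}(0,0) = 1/(2-\alpha)$, which the estimate approaches as `steps` grows.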

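SLIDE 51

Nice property [Jeh and Widom WWW03]

Personalized PageRank

$$\mathrm{PPR}_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} \frac{\mathrm{PPR}_{G \cup G_u}(v, y)}{d_{G \cup G_u}(y)} + \alpha \, \mathbf{1}_v(z)$$

where $N(z)$ is the set of neighbors of $z$ in $G \cup G_u$, $d_{G \cup G_u}(y)$ is the degree of $y$, and $\mathbf{1}_v(z)$ is 1 if $z = v$ and 0 otherwise.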
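The identity above also suggests a way to compute PPR: start from the indicator of $v$ and apply the right-hand side repeatedly (power iteration). Below is a minimal Python sketch for an undirected graph without isolated nodes. Each application shrinks the $L_1$ error by a factor $1 - \alpha$, so for $\alpha = 0.15$ the error after 100 iterations is below $10^{-6}$. The function name and the fixed iteration count are illustrative.

```python
from collections import defaultdict


def ppr_power_iteration(edges, v, alpha, iters=100):
    """Iterate PPR(v, z) <- (1 - alpha) * sum_{y in N(z)} PPR(v, y) / d(y) + alpha * [z == v]."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    p = {z: 1.0 if z == v else 0.0 for z in adj}  # start from the indicator of v
    for _ in range(iters):
        p = {
            z: (1 - alpha) * sum(p[y] / len(adj[y]) for y in adj[z]) + (alpha if z == v else 0.0)
            for z in adj
        }
    return p


p = ppr_power_iteration([(0, 1)], v=0, alpha=0.15)
print(round(p[0], 4))  # 0.5405
```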

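SLIDE 53

Nice property [Jeh and Widom WWW03]

Personalized PageRank

$$\mathrm{PPR}_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} \frac{\mathrm{PPR}_{G \cup G_u}(v, y)}{d_{G \cup G_u}(y)} + \alpha \, \mathbf{1}_v(z)$$

We don’t have it: the right-hand side needs $\mathrm{PPR}_{G \cup G_u}(v, y)$, the very quantity we want to compute.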

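SLIDE 54

Nice property [Jeh and Widom WWW03]

Personalized PageRank

$$\mathrm{PPR}_{G \cup G_u}(v, z) = (1 - \alpha) \sum_{y \in N(z)} \frac{\mathrm{PPR}_{G \cup G_u}(v, y)}{d_{G \cup G_u}(y)} + \alpha \, \mathbf{1}_v(z)$$

Simple heuristic, using the public graph distribution:

$$\mathrm{PPR}_{G \cup G_u}(v, z) \approx (1 - \alpha) \sum_{y \in N(z)} \frac{\mathrm{PPR}_{G}(v, y)}{d_{G \cup G_u}(y)} + \alpha \, \mathbf{1}_v(z)$$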
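A minimal Python sketch of the heuristic, under these assumptions: undirected graphs, $\mathrm{PPR}_G$ computed by power iteration on the public graph (any public PPR routine would do), and nodes missing from the public graph treated as having public PPR 0. The example compares the heuristic with the value computed directly on $G \cup G_u$ for a tiny hypothetical graph; the function names are illustrative.

```python
from collections import defaultdict


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def ppr(adj, v, alpha, iters=100):
    """PPR vector of v by power iteration on the fixed-point identity."""
    p = {z: 1.0 if z == v else 0.0 for z in adj}
    for _ in range(iters):
        p = {z: (1 - alpha) * sum(p[y] / len(adj[y]) for y in adj[z]) + (alpha if z == v else 0.0)
             for z in adj}
    return p


def heuristic_ppr(public_ppr, combined_adj, v, alpha, targets):
    """One application of the identity on G ∪ G_u, with PPR_G on the right-hand side."""
    return {
        z: (1 - alpha) * sum(public_ppr.get(y, 0.0) / len(combined_adj[y]) for y in combined_adj[z])
        + (alpha if z == v else 0.0)
        for z in targets
    }


public = [(0, 1), (1, 2)]  # hypothetical public graph: a path 0 - 1 - 2
private = [(0, 2)]         # hypothetical private edge of the user
combined = build_adj(public + private)
approx = heuristic_ppr(ppr(build_adj(public), 0, 0.15), combined, 0, 0.15, list(combined))
exact = ppr(combined, 0, 0.15)
print(round(approx[2], 3), round(exact[2], 3))  # 0.342 0.298
```

In this toy case the public value for node 2 is about 0.195, so one application of the identity moves the estimate noticeably toward the value on $G \cup G_u$ without recomputing PPR on the combined graph.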

SLIDE 55

Social affinity

Which connection is stronger?

SLIDE 56

Social affinity

Which connection is stronger? It is important to consider the number of paths and their lengths.

SLIDE 57

Social affinity

$A_\theta(v, w)$ is the maximum fraction of edges that can be deleted at random while $v$ and $w$ remain connected with probability at least $\theta$.

SLIDE 58

Social affinity

How can we compute it?

[Figure: nodes $v$ and $w$]

SLIDE 59

Social affinity

How can we compute it? [Panigrahy et al. WSDM12]

For each $p \in \{0, 1 + \epsilon, (1 + \epsilon)^2, \dots\}$, repeat $\log n$ times: delete each edge of the graph with probability $p$, and store for each node the ID of its connected component.

[Figure: nodes $v$ and $w$]

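SLIDE 60

Social affinity

How can we compute it? [Panigrahy et al. WSDM12]

For each $p \in \{0, 1 + \epsilon, (1 + \epsilon)^2, \dots\}$, repeat $\log n$ times: delete each edge of the graph with probability $p$, and store for each node the ID of its connected component.

[Figure: nodes $v$ and $w$ in components labeled A, B, C; stored component IDs so far: [C, …] and [B, …]]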

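SLIDE 61

Social affinity

How can we compute it? [Panigrahy et al. WSDM12]

For each $p \in \{0, 1 + \epsilon, (1 + \epsilon)^2, \dots\}$, repeat $\log n$ times: delete each edge of the graph with probability $p$, and store for each node the ID of its connected component.

[Figure: nodes $v$ and $w$ in components labeled A, B, C; stored component IDs so far: [C, A, …] and [B, A, …]]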

SLIDE 62

Social affinity

How can we compute it? [Panigrahy et al. WSDM12]

For each $p \in \{0, 1 + \epsilon, (1 + \epsilon)^2, \dots\}$, repeat $\log n$ times: delete each edge of the graph with probability $p$, and store for each node the ID of its connected component.

[Figure: nodes $v$ and $w$; stored component IDs: [C, A, …] and [B, A, …]]

With $\log n$ samples we can estimate the connection probability.
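The sampling scheme above can be sketched in Python as follows. The grid of deletion probabilities, the number of samples, and the function names are hypothetical choices for illustration; the slide's grid and its $\log n$ sample count are not reproduced exactly. Component IDs are union-find roots, and the affinity estimate is the largest grid value at which the empirical connection probability is at least $\theta$.

```python
import random


def _root(parent, x):
    """Follow parent pointers until a root (a key missing from `parent`)."""
    while x in parent:
        x = parent[x]
    return x


def sample_component_ids(nodes, edges, p, rng):
    """Delete each edge independently with probability p; return node -> component id."""
    parent = {}
    for a, b in edges:
        if rng.random() >= p:  # the edge survives with probability 1 - p
            ra, rb = _root(parent, a), _root(parent, b)
            if ra != rb:
                parent[ra] = rb
    return {x: _root(parent, x) for x in nodes}


def build_affinity_sketches(nodes, edges, grid, samples, seed=0):
    """sketch[x][p] = list of x's component ids, one per random deletion sample."""
    rng = random.Random(seed)
    sketch = {x: {p: [] for p in grid} for x in nodes}
    for p in grid:
        for _ in range(samples):
            comp = sample_component_ids(nodes, edges, p, rng)
            for x in nodes:
                sketch[x][p].append(comp[x])
    return sketch


def connection_probability(sketch, v, w, p):
    """Fraction of samples in which v and w ended up in the same component."""
    same = sum(a == b for a, b in zip(sketch[v][p], sketch[w][p]))
    return same / len(sketch[v][p])


def affinity(sketch, v, w, grid, theta):
    """Largest grid value p at which v and w stay connected with estimated prob >= theta."""
    feasible = [p for p in grid if connection_probability(sketch, v, w, p) >= theta]
    return max(feasible, default=None)  # None: not connected often enough for any p


nodes = [0, 1, 2, 3]
edges = [(0, 1)] * 5 + [(0, 2)]     # five parallel 0-1 edges, one 0-2 edge, 3 isolated
grid = [i / 10 for i in range(10)]  # hypothetical grid of deletion probabilities
sk = build_affinity_sketches(nodes, edges, grid, samples=200)
print(affinity(sk, 0, 1, grid, theta=0.9), affinity(sk, 0, 2, grid, theta=0.9))
```

The many parallel paths between 0 and 1 survive much heavier deletion than the single 0-2 edge, which matches the slides' point that both the number of paths and their lengths matter.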

SLIDE 63

Social affinity

Using sketches of size $\log^2 n$ per node we can estimate affinity.

[Figure: sketches of $v$ and $w$: [C, A, …] and [B, A, …]]

SLIDE 64

Social affinity

Using sketches of size $\log^2 n$ per node we can estimate social affinity. When we add $G_u$ we have to update the sketches; it is enough to update the connected components!

[Figure: sketches of $v$ and $w$: [C, A, …] and [B, A, …]; private graph $G_u$]
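One way to realise the update for a single user, sketched in Python under stated assumptions: the public sketches store, for each node, deletion probability $p$ and sample $i$, the node's component ID (the public preprocessing is included so the block runs on its own); every endpoint of a private edge is a node of $G$; and each private edge is deleted with probability $p$ independently in each sample, like the public ones. The per-user merges are kept separately, so the public sketches are never modified and each sample costs work proportional to $|E_{G_u}|$. All names are hypothetical.

```python
import random


def _root(parent, x):
    """Follow parent pointers until a root (a key missing from `parent`)."""
    while x in parent:
        x = parent[x]
    return x


def build_affinity_sketches(nodes, edges, grid, samples, seed=0):
    """Public preprocessing: sketch[x][p][i] = component id of x in sample i at deletion prob p."""
    rng = random.Random(seed)
    sketch = {x: {p: [] for p in grid} for x in nodes}
    for p in grid:
        for _ in range(samples):
            parent = {}
            for a, b in edges:
                if rng.random() >= p:  # each edge survives with probability 1 - p
                    ra, rb = _root(parent, a), _root(parent, b)
                    if ra != rb:
                        parent[ra] = rb
            for x in nodes:
                sketch[x][p].append(_root(parent, x))
    return sketch


def private_merges(sketch, private_edges, grid, samples, seed=1):
    """Per-user update: union public component ids along the surviving private edges."""
    rng = random.Random(seed)
    merges = {}
    for p in grid:
        for i in range(samples):
            parent = {}
            for a, b in private_edges:
                if rng.random() >= p:  # private edges are sampled like public ones
                    ra = _root(parent, sketch[a][p][i])
                    rb = _root(parent, sketch[b][p][i])
                    if ra != rb:
                        parent[ra] = rb
            merges[(p, i)] = parent
    return merges


def affinity_with_private(sketch, merges, v, w, grid, samples, theta):
    """Affinity of v and w in G ∪ G_u, reading component ids through the per-sample merges."""
    best = None
    for p in grid:
        same = sum(
            _root(merges[(p, i)], sketch[v][p][i]) == _root(merges[(p, i)], sketch[w][p][i])
            for i in range(samples)
        )
        if same / samples >= theta:
            best = p if best is None else max(best, p)
    return best


grid = [i / 10 for i in range(10)]  # hypothetical grid of deletion probabilities
sk = build_affinity_sketches([0, 1, 2], [(0, 1)] * 5, grid, samples=100)
no_private = private_merges(sk, [], grid, 100)
with_private = private_merges(sk, [(1, 2)] * 5, grid, 100)
print(affinity_with_private(sk, no_private, 0, 2, grid, 100, theta=0.9))  # None
print(affinity_with_private(sk, with_private, 0, 2, grid, 100, theta=0.9))
```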